100 Days of Gen AI: Day 44

After getting laid out with strep throat, I’m back. Today I want to explore what I can do with Replit, a code generation and execution platform. Code execution has always been a missing piece when developing code with a standard LLM chat.

My development flow when working with a traditional LLM chat has been:

  1. prompt LLM for code modification
  2. copy/paste into my IDE
  3. git diff to see changes
  4. git commit if it looks correct
  5. restart server and manually test the change
  6. repeat from step 1

Then at some point:

  1. set up deployment
  2. set secrets
  3. deploy
  4. test deployment

Let’s see if Replit can integrate all these steps into a seamless experience. I’d like to try to develop a simple CRUD app.

Let’s start with a CRUD app todo list:

CRUD to-do app prompt

I like that it has checkpoints. This has been a missing feature in development with a pure chatbot. I’ve always wanted a git commit after each new AI prompt and the resulting code change.

It looks like it can iterate on its own: it generates code, attempts to run it, then resolves errors itself without any user prompting or input. So freakin cool!

It defaulted to TypeScript and React, which is interesting. It didn’t even ask me about the language choice. I wonder if it can write well-functioning code in other languages, or if it’s trained primarily on this toolset.

To-do webview

Their onboarding view popped up on refresh. This was interesting:

To-do webview

25 cents per checkpoint. Interesting pricing model. You’re basically paying for one code iteration and commit. My recent siri-playlist-actions project took 53 commits to get to a working version in production, so that would have cost me somewhere on the order of $13.25. Though many of those commits probably represent several iterations of LLM prompting. Let’s ballpark that I did maybe 3 LLM prompts per commit, so let’s 3x that to $39.75. TBH that would probably be worth it to me for the extra ergonomics of working in one IDE vs copy/pasting from an LLM chat session, assuming it can actually handle the full lifecycle of development, testing, and deployment. Let’s see how well it can handle building more complex features, testing, secrets management, and deployment.
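Here’s that ballpark math as a quick sketch. The helper name `estimateCostUsd` is made up for illustration, and the only inputs are the numbers above (25 cents per checkpoint, 53 commits, a guess of 3 prompts per commit).

```typescript
// Back-of-the-envelope checkpoint cost, using the numbers from this post.
const PRICE_PER_CHECKPOINT_CENTS = 25;

// Hypothetical helper: estimated cost in dollars for a project, given the
// number of commits and a guess at how many prompts went into each commit.
function estimateCostUsd(commits: number, promptsPerCommit: number = 1): number {
  // Work in whole cents so the result is not affected by floating-point drift.
  const cents = PRICE_PER_CHECKPOINT_CENTS * commits * promptsPerCommit;
  return cents / 100;
}

console.log(estimateCostUsd(53).toFixed(2));    // 13.25
console.log(estimateCostUsd(53, 3).toFixed(2)); // 39.75
```

The 3 prompts per commit figure is a guess, so treat the second number as a rough upper-ish estimate rather than a real bill.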

Back to testing. I can add a new todo list item fine:

To-do create

And I can list items:

To-do list

However, when I go to edit an existing item, it is blank. We’ll fix that later.

To-do edit

Next let’s set up a database. It sets up a Postgres db.

Now let’s fix that bug when editing a todo list.

When I test the fix I get an error: “Invalid time value”. Let’s see if I can give a vague description and whether it can figure out the error and fix it.

Error invalid time value

It fixed the original error but now gets a new error.

Error invalid due date format

I was eventually able to fix the bug with a few cycles of prompting.
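For context, “Invalid time value” is the `RangeError` JavaScript throws when you call `toISOString()` on a `Date` that failed to parse, for example one built from an empty form field. I don’t know what the generated code actually looked like or what the final fix was, so this is only a minimal sketch of the kind of guard that avoids the error. The helper name `toIsoDueDate` is hypothetical.

```typescript
// Hypothetical helper: turn a raw form value into an ISO string, or null when
// the field is empty or cannot be parsed as a date.
function toIsoDueDate(raw: string | null | undefined): string | null {
  if (!raw || raw.trim() === "") return null;
  const parsed = new Date(raw);
  // An invalid Date has a NaN timestamp. Calling toISOString() on it would
  // throw "RangeError: Invalid time value", so check before converting.
  if (Number.isNaN(parsed.getTime())) return null;
  return parsed.toISOString();
}

console.log(toIsoDueDate("2025-01-15")); // 2025-01-15T00:00:00.000Z
console.log(toIsoDueDate(""));           // null
console.log(toIsoDueDate("not a date")); // null
```

Returning `null` for a missing due date is one design choice among several. The app could just as well reject the form and show a message instead.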

I’m 30 min into my project and it looks like I’ve nearly used up my starter plan already :/

Starter plan have used up

I really like how it shows a code diff of the changes. This has also been a longtime annoyance with using a standard LLM chat.

Code diff

It looks like just trying to fix a simple issue of editing a todo took 5 commits.

5 commits to fix bug

That’s $1.25 to fix a very simple bug. I can’t imagine how many iterations it would take to solve a properly difficult bug…

I’ll also note that the lack of unit testing is a gap. The Replit agent seems to rely on screenshots of the UI to determine whether the app is performing correctly. This is fine for checking the UI but means it can’t actually test any user actions. I’d love to see some unit tests generated as part of this. Or some kind of Selenium headless browser testing.
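To make that concrete, here’s a rough sketch of the kind of unit test I have in mind. Everything in it is hypothetical: `TodoStore` is an in-memory stand-in I made up, not the code Replit generated, and the tiny `check` helper stands in for a real test framework. It covers the bug I hit earlier, where editing an existing item came up blank.

```typescript
// Hypothetical in-memory model of the to-do list; names are illustrative only.
interface Todo {
  id: number;
  title: string;
  done: boolean;
}

class TodoStore {
  private todos: Todo[] = [];
  private nextId = 1;

  create(title: string): Todo {
    const todo: Todo = { id: this.nextId++, title, done: false };
    this.todos.push(todo);
    return { ...todo };
  }

  // Return copies so callers cannot mutate the store by accident.
  list(): Todo[] {
    return this.todos.map((t) => ({ ...t }));
  }

  // Load a single item, e.g. to pre-fill the edit form.
  get(id: number): Todo | undefined {
    const found = this.todos.find((t) => t.id === id);
    return found ? { ...found } : undefined;
  }

  update(id: number, changes: Partial<Omit<Todo, "id">>): Todo {
    const todo = this.todos.find((t) => t.id === id);
    if (!todo) throw new Error(`No todo with id ${id}`);
    Object.assign(todo, changes);
    return { ...todo };
  }

  remove(id: number): void {
    this.todos = this.todos.filter((t) => t.id !== id);
  }
}

// Minimal assertion helper so the sketch runs without a test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(`FAILED: ${message}`);
}

function runTests(): void {
  const store = new TodoStore();
  const created = store.create("Buy milk");

  // Loading an item for editing should return its data, not a blank item.
  const loaded = store.get(created.id);
  check(loaded !== undefined && loaded.title === "Buy milk", "edit form gets the title");

  // Updating should change only the fields that were passed in.
  const updated = store.update(created.id, { done: true });
  check(updated.done === true && updated.title === "Buy milk", "update keeps the title");

  // Deleting should remove the item from the list.
  store.remove(created.id);
  check(store.list().length === 0, "list is empty after delete");

  console.log("all tests passed");
}

runTests();
```

Tests like these would not replace browser-level checks, but they run in milliseconds and would catch a regression without the agent needing a screenshot.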

Let’s try adding a more complicated feature – user authentication.

When it needs to use a third-party API, Replit provides a set of instructions and a secret input field. That’s cool!

Sendgrid

I’m having issues with SendGrid account creation (denied), so I’m going to skip this for now. Let’s see if Replit can deploy what we currently have.

When I ask it to deploy, it instead wants to run a database migration. Cool that it can do this, though I’m not sure why this is relevant to deploying the existing code.

Looks like I have to upgrade to a $25/mo plan in order to deploy. Fine. I’ll try it.

Here’s what the autoscale option looks like. Presumably they have some type of serverless function like a lambda that runs this behind the scenes.

auto-scale

Deploying…

deploying

And it’s deployed!

deployed

Here’s a look at how pricing works. It’s designed to scale so cost is near-zero for low-traffic projects, which is great to see.

deployed

Let’s try again to add user authentication. This is a pretty comprehensive plan, though the email sending part is going to be an issue if it relies on SendGrid.

user authentication plan

They also have a mobile app, so I can presumably develop and deploy straight from my phone!

Replit mobile

In trying to build this user auth page, I have been in the same error loop for 3 cycles where it’s unable to correct the issue despite prompting. I’ve seen similar behavior when developing code via OpenAI, where it seems unable to actually correct the bug and break out of the error loop.

error loop

3 more attempts (for a total of 6) and it still can’t fix the validation error issue. This should be an especially easy fix as it’s a pure front-end issue, with no interactions with other components of the system.

Now the “Agent’s memory is getting full.” Interesting.

And that’s all the time I have for today.

Takeaways

Overall this seems like a powerful tool. I was able to build some impressive things quickly (front-end UI, basic CRUD functionality, database), though other things proved difficult (email integration, user authentication, form validation).

I’m sure a good bit of this is the result of user error on my part, and if I learn to use the tool better, I’ll get better results.

As soon as it goes off the rails or struggles with a feature, a background in software development becomes essential. I wonder how effective a non-technical user can be as soon as they try to build anything of complexity.