100 Days of Gen AI: Day 44
After getting laid out with strep throat, I’m back. Today I want to explore what I can do with Replit, a code generation and execution platform. Code execution has been always been a missing piece when developing code with a standard LLM chat.
My development flow when working with a traditional LLM chat has been:
- prompt LLM for code modification
- copy/paste into my IDE
- git diff to see changes
- git commit if it looks correct
- restart server and manually test the change
- repeat from step 1
Then at some point:
- setup deployment
- set secrets
- deploy
- test deployment
Let’s see if Replit can integrate all these steps into a seamless experience. I’d like to try to develop a simple CRUD app.
Let’s start with a CRUD app todo list:
I like that it has checkpoints. This has been a missing feature in development with a pure chatbot. I’ve always wanted a git commit after each new AI prompt and the resulting code change.
It looks like it can iterate on its own by generating code, attempting to run it, then self-resolve errors without any user prompting or input. So freakin cool!
It defaulted to Typescript and React, which is interesting. It didn’t even ask me about the language choice. I wonder if it can write well-functioning code in other languages, or if it’s trained primarily on this toolset.
Their onboarding view popped up on refresh. This was interesting:
25 cents per checkpoint. Interesting pricing model. You’re basically paying for one code iteration and commit. My recent siri-playlist-actions project was 53 commits to get to a working version in production, so that would have cost me somewhere on the order of $13.25. Though many of those commits probably represent several iterations of LLM prompting. Let’s ballpark that maybe I did 3 LLM prompts per commit so let’s 3x that to $39.75. TBH that would probably be worth it to me for the extra ergonomics of working in one IDE vs copy/pasting from an LLM chat session assuming it can actually handle the full lifecycle of development, testing, and deployment. Let’s see how well it can handle building more complex features, testing, secrets management, and deployment.
Back to testing. I can add a new todo list item fine:
And I can list items:
However when I go to edit an existing item it is blank. We’ll fix that later.
Next let’s setup a database. It sets up a postgres db.
Now let’s fix that bug when editing a todo list.
When I test the fix I get an error Invalid time value
. Let’s see if I can give a vague description and whether it can figure out the error and fix it.
It fixed the original error but now gets a new error
I was eventually able to fix the bug with a few cycles of prompting.
I’m 30 min into my project and looks like I’ve nearly used up my starter plan already :/
I really like how it shows a code diff of the changes. This has also been a longtime annoyance with using a standard LLM chat.
It looks like just trying to fix a simple issue of editing a todo took 5 commits.
That’s $1.25 to fix a very simple bug. I can’t imagine how many iterations it would take to solve a properly difficult bug…
I’ll also note that lack of unit testing is a gap. The Replit agent seems to rely on screenshots of the UI to determine whether the app is performing correctly. This is fine for UI but means it can’t actually test any user actions. I’d love to see some unit tests generated as part of this. Or some kind of Selenium headless browser testing.
Let’s try adding a more complicated feature – user authentication.
When it must use a third-party API Replit provides a set of instructions and secret input field. That’s cool!
I’m having issues with Sendgrid account creation (denied), so I’m going to skip this for now. Let’s see if Replit can deploy what we currently have.
When asking it to deploy, it instead wants to run a database migration. Cool that it can do this, though not sure why this is relevant to deploying the existing code.
Looks like I have to upgrade to a $25/mo plan in order to deploy. Fine. I’ll try it.
Here’s what the autoscale option looks like. Presumably they have some type of serverless function like a lambda that runs this behind the scenes.
Deploying…
And it’s deployed!
Here’s a look at how pricing work. It’s designed to scale so cost is near-zero for low-traffic projects, which is great to see.
Let’s try again to add user authentication. This is a pretty comprehensive plan, though the email sending part is going to be an issue if it relies on sendgrid.
They also have a mobile app, so I can presumably develop and deploy straight from my phone!
In trying to build this user auth page, I have been in the same error loop for 3 cycles where it’s unable to correct the issue despite prompting. I’ve seen a similar behavior when developing code via OpenAI where it seems unable to actually correct the bug and break out of the error loop.
3 more attempts (for a total of 6) and it still can’t fix the validation error issue. This should be an especially easy fix as it’s a pure front-end issue, with no interactionss with other components of the system.
Now the “Agent’s memory is getting full.” Interesting.
And that’s all the time I have for today.
Takeaways
- Replit seems to be most effective when given small discreet tasks and then given subsequent prompts to iteratively build on the previous features
- Does not write unit tests (though maybe I can prompt it to do so)
- Defaults to Typescript & React
- Must subscribe and pay $25/mo to build anything beyond a few features
Overall this seems like a powerful tool. I was able to build some impressive things quickly (front-end UI, basic CRUD functionality, database). Though other things proved difficult (email integration, user authentication, form validation).
I’m sure a good bit of this is the result of user error on my part, and if I learn to use the tool better, I’ll get better results.
As soon as it goes off the rails or struggles with a feature, a background in software development becomes essential. I wonder how effective a non-technical user can be as soon as they try to build anything of complexity.