Feeling late to the party, I finally found a weekend to spend an entire day playing with vibe coding. Thanks to Itay for organizing and Paul for inviting me to the mini hackday in SF.
As someone who has been programming professionally for more than 25 years, I’d say this is pretty impressive and everyone should give it a try.
Previously, I had used code completion tools like continue.dev. This is a quick note about the experience of playing with editors with more vibe(?), a few areas to pay attention to, and how this workflow actually resembles the typical development workflow in data systems (e.g., ETL/ELT and tools like dbt).
What is Vibe Coding?
Since LLMs demonstrated the ability to create working code packages, coding assistants are evolving into full-fledged IDEs or agentic workflows, which can modify multiple files, create PRs, run tests and diagnose output. They often require your approval for potentially unsafe actions like editing files or running shell commands. By accepting those requests automatically, you’re now vibe coding.
Here’s how I tested this out. If you worked with or built any frontend frameworks a decade ago, you probably know the todo app example.
Jasmine provided this prompt that I used with Bolt, Cline, and Windsurf, while she tried it on Devin.
Objective: Build a basic to-do list web app with Vite React frontend and Node.js Express backend.
Frontend (React):
–Create components: TaskList, TaskItem, AddTask
–Implement basic CRUD operations
–Style with CSS - using Material-ui
Backend (Node.js):
– Set up Express server
– Create API endpoints for CRUD operations
– Use a simple database using SQLite
Minimum Features:
– Add tasks
– Display tasks
– Mark tasks as complete
– Delete tasks
Tools I Played With
Bolt
This was the first one I tried, given the hype around it reaching $4M ARR in weeks. It spat out code and opened a browser preview of a vanilla Express server, which looked like the API endpoint. I pasted the error back and it made further edits, but I didn’t get a working app.
Update: Opening the same project a day later, it seemed to be a fully working todo app! I realized it was automatically previewing the API endpoint, not the frontend endpoint.
Cline + gemini-2.0-pro
I then tried using the Cline VSCode extension with the new shiny (and free) gemini-2.0-pro from Google.
It seemed knowledgeable, but it tried to execute shell commands with the & characters HTML-entity escaped, and I couldn’t convince it to fix the escaping.
As Paul said, nothing is really free. I paid with data and effort trying it out. Hopefully, the Gemini team finds out that shell commands shouldn’t be HTML-entity escaped.
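To make the escaping problem concrete, here is a minimal Python sketch. The command itself is hypothetical (I'm assuming two steps chained with `&&`); the standard library's `html.unescape` shows what the shell should have received.

```python
import html

# Hypothetical command as emitted by the model, with HTML entities in it
escaped = "cd frontend &amp;&amp; npm install"

# Decoding the entities recovers the command the shell actually needs
command = html.unescape(escaped)
print(command)  # cd frontend && npm install
```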
Cline + sonnet 3.5
I then decided to pay for OpenRouter and use Claude Sonnet-3.5 as suggested.
After several back and forths and ~$1.9, we had a todo app! It also launched a browser within the IDE to check behaviors during development. The main issue I had to ask it to fix was white-on-white text, which is hard for humans to read. Do androids dream in white-on-white text?
Cline + sonnet 3.5 + Memory bank
Scott recommended the memory bank custom prompt to help the LLM serialize intentions and progress across sessions. Before the LLM starts, you can optionally edit the sequence of tasks and focus.
However, this created overhead in token usage and probably cost $4 to get the todo app working ($1 for scaffolding, $2 to build it, and $1 for unsuccessful tests).
The main back and forth was over an nvm environment setup issue, which caused many invocations to reinstall npm dependencies and re-edit the Vite config. So having a reliable devcontainer would help.
Cline + o3-mini
While Cline was improving the tests, I ran out of credits on OpenRouter. I then switched to o3-mini with the native OpenAI endpoint to see how well it could get the tests passing. Unfortunately, it insisted all tests passed despite 21 failed and 10 skipped tests.
Windsurf
Wow, Windsurf actually got everything working on the first try!
Book Ranking Tracker
At this point, I switched to building a more realistic side project: tracking the ranking of my wife’s newly published book in Taiwan. I had previously curl’ed an online bookstore’s ranking page periodically, so I had some raw historical rankings to start with.
I asked Windsurf to examine two HTML files to extract the rankings and book entities. It built a Python script to process the files into JSON, and did additional validation by itself to ensure the data was correct.
I then asked it to build a Streamlit app to plot the rank movement over time. The app worked initially, but the chart looked off and was slow. I assumed it wasn’t pre-aggregating the rankings on a per-day basis, so I asked it to compute the daily rank data first.
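A rough sketch of that pre-aggregation step, using only the standard library. It assumes each snapshot is an (ISO timestamp, rank) pair and that the best (lowest) rank per day is the value to plot; the sample data is made up, and `daily_best_rank` is a hypothetical name rather than what Windsurf generated.

```python
from collections import defaultdict
from datetime import datetime

def daily_best_rank(snapshots):
    """Collapse (ISO timestamp, rank) snapshots to the best rank per day."""
    by_day = defaultdict(list)
    for ts, rank in snapshots:
        day = datetime.fromisoformat(ts).date().isoformat()
        by_day[day].append(rank)
    # A lower rank number is better, so take the minimum for each day
    return {day: min(ranks) for day, ranks in sorted(by_day.items())}

# Made-up sample data for illustration
snapshots = [
    ("2025-02-01T08:00:00", 12),
    ("2025-02-01T20:00:00", 9),
    ("2025-02-02T08:00:00", 15),
]
print(daily_best_rank(snapshots))
# {'2025-02-01': 9, '2025-02-02': 15}
```

Plotting one point per day instead of one per snapshot keeps the chart readable and cuts the number of rows the app has to draw.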
Finally, I asked it to turn the curl script into a dlt source that can read live HTML and also backfill from pre-downloaded HTML snapshots. By this time, I had a DuckDB database of 60k rows, so I asked it to switch Streamlit’s data source to DuckDB.
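The shape of such a source can be sketched as a plain Python generator. dlt resources are generator functions, typically wrapped with the `@dlt.resource` decorator; the decorator is left out here so the sketch runs with the standard library alone. The function name, the record fields, and the snapshot file naming are all assumptions, not the code Windsurf produced.

```python
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen

def ranking_pages(snapshot_dir=None, live_url=None):
    """Yield {"fetched_at", "html"} records from snapshots and/or a live page."""
    if snapshot_dir is not None:
        # Backfill: assumes files are named by fetch time,
        # e.g. 2025-02-01T08-00-00.html, so sorting by name sorts by time
        for path in sorted(Path(snapshot_dir).glob("*.html")):
            yield {
                "fetched_at": path.stem,
                "html": path.read_text(encoding="utf-8"),
            }
    if live_url is not None:
        # Live: fetch the current page, as the curl script did
        with urlopen(live_url) as resp:
            yield {
                "fetched_at": datetime.now(timezone.utc).isoformat(),
                "html": resp.read().decode("utf-8"),
            }
```

Calling `ranking_pages(snapshot_dir="snapshots")` would replay the history, while passing `live_url` as well appends the current page, so one code path serves both backfill and periodic runs.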
This was done in 20 minutes.
I also heard recommendations for custom rules to ensure Windsurf always keeps a todo file updated, similar to Cline’s memory bank approach. But I didn’t get to try that out.
Conclusion
Vibe coding is a thing! It’s astonishing that we can build things by streaming millions of tokens at low costs.
Jasmine even made some stickers!
However, there’s still a long way to go before we can make production software this way. Right now, it’s a superpower for developers at mid-level and above. As in many fields, we’ll soon have a huge skill gap: junior developers might become prompt wizards but struggle to gain experience and develop taste for the underlying technologies and architectures.
A couple of areas we could pay attention to:
Dev Environment & Sandbox
Some native way for coding agents to work with reliable sandbox environments like devcontainers would save a lot of pain and be safer.
Keeping Context and Progress
For slightly complicated projects, having the coding agent maintain context and progress is a great way for it to be explicit about what it is doing, and the shared artifacts also help human-machine communication. This mimics how great teams work: having aligned context and focus areas.
Taste for the architecture and component choices
This probably separates good and great use of coding agents. But who knows, maybe we’ll just let deep research figure out the right stack given the goals and constraints before getting the coding agents to work.
Final Thoughts
Working with data systems, doing ETL/ELT, dbt modeling etc., is actually pretty similar to vibe coding, but manual. You roughly check the output, and there’s little established rigor in making and verifying changes. We studied how the Cal-ITP project established best practices for verifying changes.
It’s funny that while the industry has spent a few years discussing how to learn from software engineering practices and adapt them to data workflows, a significant amount of software creation in the near future is likely to be more vibe-based.