Spec-driven Vibe-coding
Giving a name to an amorphous new thing is a powerful act. And that’s what Karpathy did, via tweet:
> There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
“I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works” – this is the “veni, vidi, vici” of the modern AI age.
This is how a lot of us have been programming for a while already. It will become more entrenched as the models get better. Having to constantly hit “Accept All” will become the number one user complaint.
I spent the weekend vibe-coding a side project I’ve wanted to implement for a while: a time-tracking app that periodically looks at my computer screen and infers what activities I’m doing, and whether that’s deep work, light work, or just entertainment. The project itself is not important for this post; I want to focus on the methodology for building it.
I built it using a twist on vibe-coding that I call spec-driven vibe-coding. The overall flow is this:
First, write a spec. Actually, two specs:
- Prompt the LLM to take a basic outline of the idea and propose a full product spec. Iterate in Artifacts as necessary, though for me the first cut was pretty much there.
- Then give the product spec to o1-pro to produce an engineering design document.
Armed with these two documents, you can zero-shot vibe-code the product. At this point I moved to VS Code with agents. I put the above two docs into a “docs” folder in my repo, attached them as context in Copilot Chat, and simply asked it to implement the spec.
In this case, I also prompted it again to write tests to cover all the code it had generated. It took a few rounds of back and forth to get all the tests to pass, then a few more to resolve runtime errors and get the app actually working. All in all, this was about a Saturday afternoon’s worth of combined time: from 3–4 bullet points of the basic idea, to product spec, to software design, to running code.
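For a sense of what that test pass looks like, here is a minimal sketch, assuming pytest and a hypothetical `classify_activity` function in a hypothetical `tracker` module; the actual tests the model generated are beside the point here.

```python
# Illustrative only: `tracker.classify_activity` is a hypothetical function
# that maps a vision-model description of the screen to an activity category.
import pytest

from tracker import classify_activity


@pytest.mark.parametrize("description, expected", [
    ("Editing Python files in VS Code with a terminal open", "deep work"),
    ("Reading and replying to email in the browser", "light work"),
    ("Watching a YouTube video in full screen", "entertainment"),
])
def test_classify_activity(description, expected):
    assert classify_activity(description) == expected
```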
And then I started adding features. I kept the same basic principle in mind: first write a spec, then update the software design doc, then ask the LLM to write code that meets the spec and the design.
Throughout the entire process I did not write any code by hand. I found bugs and issues, yes, but I gave the stack traces or test failures back to the model to solve. I was getting tired of hitting “Accept”.
This is the worst this capability will ever be. This was a tiny weekend project. (Actually, if I had to do this by hand before the AI age, I’m sure it wouldn’t have fit in a weekend, and hence would never have gotten built.) This basic flow will work for larger and larger codebases over time. Raw-dogging code by hand will become about as common as writing assembly.
If you are a programmer, you should be embracing AI to increase your impact and agency.
Appendix: Prompts and Results
Prompt to take basic outline of idea and produce a product spec:
Canvas help me write product spec for the following idea. I am also attaching a diagram describing the flow I’m thinking of. This product spec will then be given to a software engineer to design and implement, so it should have enough detailed description of the product for an engineer to design and implement it fully.
Idea for Time Tracking
#quantified-self
### Goal
Automatically measure which apps and tasks I’m spending my time on, on my computer
### Outline
- Take screenshots of desktop every ~1min (configurable time interval)
- Have a vision LLM describe that image
- Then do some more summarization and analysis on that
- This should give me a picture of what I’m spending my time on
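For concreteness, the outline above translates into a loop roughly like the one below. This is an illustrative sketch, not the code the model generated: it assumes the macOS `screencapture` command for capture and the OpenAI Python SDK for the vision step, and it skips the later summarization and analysis entirely.

```python
import base64
import os
import subprocess
import tempfile
import time

from openai import OpenAI

INTERVAL_SECONDS = 60  # capture interval; "~1min (configurable)" per the outline

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def capture_screenshot() -> bytes:
    """Grab the current desktop using the macOS `screencapture` CLI."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "screen.png")
        subprocess.run(["screencapture", "-x", path], check=True)
        with open(path, "rb") as f:
            return f.read()


def describe_activity(png_bytes: bytes) -> str:
    """Ask a vision model to describe the screenshot and classify the activity."""
    image_b64 = base64.b64encode(png_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe what the user is doing on this screen and "
                         "classify it as deep work, light work, or entertainment."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def main() -> None:
    while True:
        try:
            print(f"{time.strftime('%H:%M:%S')}  {describe_activity(capture_screenshot())}")
        except Exception as exc:  # keep the tracker alive across transient failures
            print(f"capture failed: {exc}")
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```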
Prompt to take product spec and produce a detailed engineering design document:
i will give you a product spec below, as a markdown file. thoroughly read it and then produce a software design document to implement that spec. this design doc will be handed off to a programmer for implementation, so it should have enough detail that the programmer can implement it without any ambiguity.
basic tech choices: this will run in mac os x, should be written in python, run as a command line app (no gui necessary).
<paste in product spec here>