Vivek Haldar

How I use AI (late 2024)

This is how I’ve been using AI (well, mostly LLMs) in the day-to-day of my professional and personal life, as of November 2024.

I’m currently an engineering manager at Emergence AI. Before that I was at Google in various roles (SWE, TL, TLM, EM). Most of my professional work is organizational and technical architecture/design, with a smattering of coding. I do code regularly for personal tools and explorations.

Programming

I use LLMs for programming a lot. It is now a basic tool, on par with having an IDE. I sometimes see techno-primitivist takes on how using AI/LLMs for coding is “bad”. I can only hope that folks getting into programming now aren’t swayed by that.

My two main tools are Copilot in VS Code and aider on the command line.

I’m a happy GitHub Copilot subscriber. Pretty much all my programming these days is in Python, and Copilot works great for that. Just like the writing use-case below, having Copilot means never having to start from a blank slate. I’ve been writing Python for more than two decades, so I have an eye for when it produces code I don’t like (it almost never produces flat-out wrong code). Depending on how lazy I’m feeling, I either edit that code myself or simply re-prompt. I exercise the full range of Copilot features: the core tab-completion (usually driven by a comment or a prompt), as well as the chat features for asking general programming questions, reviewing code, and explaining code. Using “/explain” often saves me a round-trip to Google or to library and package documentation.

But when I need some very heavy lifting done, particularly for project-wide edits across multiple files, I turn to aider. Aider with Claude-3.5-Sonnet is a very powerful combination. Things for which I turn to aider:

  • scaffolding out a new project with basic functionality working (FastAPI, uvicorn, gunicorn, etc.)
  • generating tests from scratch
  • reviewing: take a messy first pass, prompt aider to be a thorough reviewer and a stickler for idiomatic usage, and have it rewrite the code to be clean and modular
  • refactoring across multiple files. Works great if you first generate tests.

Aider reports the cost of a session based on token usage (you bring your own API key for the underlying model and pay for it), and it is heart-warming to see how often it generates code that would have taken me 30-90 minutes to write, for something like 7 cents.

Note that Copilot and aider represent two very different modalities for programming with LLMs: copilots and agents. Copilot is, well, a copilot. You, the programmer, ask it to write or rewrite code, and it does so in the same place where you run your edit-compile-run cycle, namely the IDE. Aider, on the other hand, lives outside your IDE and is somewhat more autonomous. It runs in the terminal and executes agentic workflows, often taking multiple steps (each using LLMs) to carry out the programming task you give it. With tools like Cursor, these two modes are starting to unify.

Media consumption - reading and watching

I read stuff on the web and watch videos on YouTube. I’ve written a couple of utilities for myself to save time on both.

  • HN summarizer: given an HN story ID, fetch the content of the article it links to as well as the comment thread, and summarize both (sketched below).
  • YouTube video summarizer: this one is more complex. I wanted something better than a naive video summarizer, which would transcribe the video and then simply prompt an LLM for a textual summary. I wanted a video summary that was edited down from the original video. So I use whisper to transcribe the video with word-level timestamps, prompt an LLM to construct a summary consisting only of direct quotes from the transcript, and then combine the two to output a chopped-up version of the original video (a condensed sketch follows the HN one below). Works great. I use it daily on multiple videos, usually long lectures or podcasts.
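
Here is a minimal sketch of the HN summarizer, assuming the public Algolia HN API and the llm library I mention below; the model name and truncation limits are illustrative, not what I necessarily use:

    # hn_summarize.py -- sketch of the HN summarizer described above.
    import sys
    import requests
    import llm

    def fetch_item(story_id: str) -> dict:
        # Algolia returns the story plus its full comment tree in one call.
        return requests.get(f"https://hn.algolia.com/api/v1/items/{story_id}").json()

    def flatten_comments(node: dict, out: list[str]) -> None:
        for child in node.get("children", []):
            if child.get("text"):
                out.append(child["text"])
            flatten_comments(child, out)

    def main(story_id: str) -> None:
        item = fetch_item(story_id)
        article = requests.get(item["url"]).text  # raw HTML; good enough for an LLM
        comments: list[str] = []
        flatten_comments(item, comments)
        model = llm.get_model("gpt-4o-mini")  # any llm-supported model works
        response = model.prompt(
            "Summarize this article, then summarize the main themes of the "
            f"discussion.\n\nARTICLE:\n{article[:20000]}\n\n"
            f"COMMENTS:\n{' '.join(comments)[:20000]}"
        )
        print(response.text())

    if __name__ == "__main__":
        main(sys.argv[1])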
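
And a condensed sketch of the video summarizer’s pipeline, assuming openai-whisper, the llm library, and ffmpeg on the PATH. The quote matching here is deliberately naive (a real version needs fuzzy matching, and the model’s JSON output needs validation):

    # video_summarize.py -- condensed sketch of the quote-based video summarizer.
    import json
    import subprocess
    import whisper
    import llm

    def transcribe(path: str) -> list[dict]:
        model = whisper.load_model("small")
        result = model.transcribe(path, word_timestamps=True)
        # Flatten to one list of {"word", "start", "end"} dicts.
        return [w for seg in result["segments"] for w in seg["words"]]

    def pick_quotes(words: list[dict]) -> list[str]:
        transcript = "".join(w["word"] for w in words)
        response = llm.get_model("gpt-4o-mini").prompt(
            "Summarize this transcript using ONLY direct, verbatim quotes. "
            "Return a JSON list of quote strings.\n\n" + transcript
        )
        return json.loads(response.text())

    def locate(quote: str, words: list[dict]) -> tuple[float, float]:
        # Naive match: find a window whose first and last words line up.
        tokens = quote.split()
        text = [w["word"].strip() for w in words]
        for i in range(len(text) - len(tokens) + 1):
            if text[i] == tokens[0] and text[i + len(tokens) - 1] == tokens[-1]:
                return words[i]["start"], words[i + len(tokens) - 1]["end"]
        raise ValueError(f"quote not found: {quote!r}")

    def cut(path: str, clips: list[tuple[float, float]]) -> None:
        # Cut each quoted span; the clips can then be joined with
        # ffmpeg's concat demuxer.
        for n, (start, end) in enumerate(clips):
            subprocess.run(["ffmpeg", "-y", "-i", path, "-ss", str(start),
                            "-to", str(end), "-c", "copy", f"clip_{n:03d}.mp4"],
                           check=True)

    if __name__ == "__main__":
        import sys
        words = transcribe(sys.argv[1])
        cut(sys.argv[1], [locate(q, words) for q in pick_quotes(words)])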

Another common workflow is quickly understanding research papers. I’ll throw a PDF of the paper into the ChatGPT desktop app, then interrogate it. It helps me quickly understand the key findings, assumptions and methodology in a paper. This is somewhere between just reading the abstract and doing a quick page-by-page read.

Since its introduction, ChatGPT Search has also eaten significantly into my Google search volume. I now trust web-grounded RAG enough for many day-to-day queries.

Writing

Two main themes here: going from unstructured to structured, and never starting from a blank page.

I will sometimes do an unstructured stream-of-consciousness braindump (either typed-out text or voice, via ChatGPT’s advanced voice mode) and ask the model to rewrite it cleanly. This is helpful for capturing crisp summaries and action items from complex discussions.
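
For the typed-text version, this is nearly a one-liner with the llm library I praise below; a minimal sketch, with an illustrative model name:

    # tidy.py -- minimal sketch: clean up a braindump piped in on stdin.
    import sys
    import llm

    raw = sys.stdin.read()
    print(llm.get_model("gpt-4o-mini").prompt(
        "Rewrite this stream-of-consciousness braindump as clean prose, "
        "then list any action items as bullets:\n\n" + raw
    ).text())

On a Mac this composes nicely with the clipboard: pbpaste | python tidy.py.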

When writing (design docs, requirements, blog posts), the LLM helps me get over the activation energy of starting from a blank page: I ask it for an outline, or use it as a Socratic partner to flesh out ideas and pressure-test them.

Personal utilities

Miscellaneous use-cases and personal utilities that don’t fit neatly in the above buckets:

  • create a math quiz on a given topic for my elementary school kids
  • given a transcript of a video I’ve recorded for my YouTube channel, suggest chapter titles and timestamp markers
  • trawl my screenshots directory (where every filename is of the form “Screenshot-<date>.png”) and use a combination of a vision model and a language model to give each file a short, descriptive filename (sketched after this list). From chaos, order!
  • create a chatty TODO tracker (also sketched below). I braindump a bunch of things I need to get done and get back a clean, itemized TODO list, then conversationally update it as I get things done. I can ask things like “what’s left to do?” or give it natural-language updates like “move item X to tomorrow”.
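
A minimal sketch of the screenshot renamer, assuming a vision-capable model via the llm library (version 0.17 or later, which added attachment support); the directory path and model name are illustrative:

    # rename_screenshots.py -- sketch of the screenshot renamer.
    import re
    from pathlib import Path
    import llm

    SCREENSHOTS = Path.home() / "Screenshots"
    model = llm.get_model("gpt-4o-mini")  # any vision-capable model works

    for png in SCREENSHOTS.glob("Screenshot-*.png"):
        desc = model.prompt(
            "Describe this screenshot in 3-6 words, lowercase, suitable "
            "for a filename.",
            attachments=[llm.Attachment(path=str(png))],
        ).text()
        # Slugify the description so it is safe as a filename.
        slug = re.sub(r"[^a-z0-9]+", "-", desc.lower()).strip("-")
        png.rename(png.with_name(f"{slug}.png"))
        print(f"{png.name} -> {slug}.png")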
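
And a sketch of the chatty TODO tracker, built on the llm library’s conversation support (which carries prior turns as context). In this toy version the list lives only in the conversation; a real version would persist it to disk:

    # todo_chat.py -- sketch of the conversational TODO tracker.
    import llm

    model = llm.get_model("gpt-4o-mini")
    chat = model.conversation()

    # Seed the conversation with the tracker's job and an initial braindump.
    print(chat.prompt(
        "You are my TODO tracker. Turn what I tell you into a clean, "
        "itemized list, and update it as I report progress.\n\n"
        "Braindump: renew passport, fix flaky test in CI, buy birthday gift"
    ).text())

    while True:
        update = input("> ")  # e.g. "done with the gift" or "what's left?"
        print(chat.prompt(update).text())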

Huge shout-out to Simon Willison’s excellent llm command-line tool, which makes it a breeze to invoke pretty much any LLM from the command line, and to do so in Unix-tool fashion, composable with pipes and all. It also has a clean, simple Python API that has become my go-to library for abstracting over vendor-specific LLM client libraries.
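
That Python API really is about as small as it gets; a minimal example (the model name is just one choice among the installed plugins):

    import llm

    # One interface across providers: swap the model name (and plugin) freely.
    model = llm.get_model("gpt-4o-mini")
    print(model.prompt("Summarize Unix pipes in one sentence.").text())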

Barriers and wishlist

As you can see from the above, I’m a heavy AI/LLM user, and I derive a ton of utility from it. I’d like for that to be even more frictionless. Here are my major barriers to even deeper usage of AI (or a wishlist):

  • deployment: this is my biggest pain-point. I can quickly whip up a Python script that strings together some LLM calls and does some local file I/O. But that only works on that local machine. What if I want to wrap it up as a web app living in the cloud, so that I can use it even if I happen to be working on another computer? Then I immediately land in the quagmire of containerization and fighting with AWS/GCP etc. to get it running in the cloud. I wish there were a quicker, easier way to wrap up these things for cloud deployment.
  • prompting: it seems funny to say, but prompting is still a bit of a black art. I have to beg the LLM gods. There are magic incantations like “take a deep breath”. There is a lot of trial and error. Part of this is clarifying my own thinking so that I can instruct the LLM clearly about what exactly I want, but there is still a bag of tricks one must know.
  • background running: related to the deployment point above, but there are some use-cases I have in mind where I’d like to trigger LLMs with tasks periodically or based on a condition. This is essentially hooking up scripts to a cron schedule or other trigger, but it comes back to the pain of deployment.

How I feel about it

I’ll confess, for all my techno-optimism, I was taken aback when GPT-4 was released, and they emphasized how good it was at coding. I wrote back then:

The release of GPT-4 yesterday and the subsequent demo has made me feel a mix of excitement, melancholy and disorientation.

A couple of years ago I thought I had a decent grasp of the trajectory of programming as a field, and as a career path. Languages, frameworks and libraries would get better at modeling and abstracting more complex programs, safety vs performance would be less and less of a conflicting tradeoff, compilers would get better, IDEs and tools would get better, and underneath it all, even if Moore’s Law was hitting a wall, we’d just continue throwing more cores at larger problems. But at the end of the day, the immense cognitive leverage that comes from encoding something into executable software would ensure that programmers were always in short supply. But now I feel unsure and disoriented, not certain about where I stand.

I’m beginning to get more and more convinced of Matt Welsh’s take on how LLMs spell the end of programming as we know it (YT video of his talk). The last few decades saw steady linear improvements in programming languages and tooling. The next few years (who knows, maybe months?) are going to be uncharted territory, a step change. Like the saying goes – may you live in interesting times.

More than a year and a half later, after having used these tools almost daily, I couldn’t be more optimistic. Don’t believe the doomers and naysayers! As Ethan Mollick likes to say, “Always bring AI to the table”.