Vivek Haldar

Metaphors for thinking about LLMs

Not so long ago, I suggested that when it comes to using LLMs for programming, a useful metaphor is to think of them as compilers. I recently came across a couple of papers that explore that line of thinking in more depth.

In their most general sense, compilers are translators from one language to another. We usually narrow that meaning to only allow translators that output low-level executable machine code, but a program (like an LLM!) that took natural language and emitted executable code in a high-level language could also very well be classified as a compiler.

In "Large Language Models: Compilers for the 4th Generation of Programming Languages?", the authors pursue exactly that line of reasoning:

…this is a speculative paper discussing whether large language models could be considered a higher level of programming language in relation to current high-level languages. In short, assembly language (2nd generation) replaced punch-card programming (1st generation) by introducing mnemonics. These allowed larger and more complex programs to be created in less time. High level languages (3rd generation) in turn replaced assembly language by introducing structured English constraints. The hypothesis explored in this paper is that large language models could be a 4th generation language, replacing high-level languages by allowing natural language specifications.

The most obvious objection to this line of reasoning is that compilers are deterministic and (modulo bugs) will emit “correct” code, whereas LLMs are probabilistic and will sometimes emit erroneous or incorrect code. The authors propose that feedback be used as a central mechanism. Syntax errors should be fed back to get correct code. Code that does not adhere to a spec (as expressed in unit tests or other checking mechanisms) should be fed back to improve the spec, i.e. the input prompt.

It can also be argued that the deterministic translator is primarily concerned with syntax (i.e. the structure of sentences based on the Chomsky hierarchy) and the probabilistic translator with semantics (i.e. the relationship between words based on the distributional hypothesis). Therefore, it is not a case of replacement, but of composing these two translation strategies. As a result, if a source is generated with syntactic errors, the generator would produce an improved source by using the error messages as feedback.
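To make the feedback idea concrete, here is a minimal sketch of such a loop, not the paper's implementation. The deterministic side is Python's built-in `compile()`, which catches syntax errors; the probabilistic side is a stand-in function. Everything named here (`generate_with_feedback`, `fake_llm`, `check_add`, the "Constraint:" convention) is made up for illustration, and a real setup would call an actual model where `fake_llm` is.

```python
from typing import Callable, Optional

# Hypothetical signatures: a generator takes (prompt, feedback) and returns
# source code; a spec check returns None on success or a failure message.
Generator = Callable[[str, Optional[str]], str]
SpecCheck = Callable[[dict], Optional[str]]


def generate_with_feedback(generate: Generator, prompt: str,
                           check_spec: SpecCheck, max_rounds: int = 4) -> str:
    feedback = None
    for _ in range(max_rounds):
        source = generate(prompt, feedback)
        # Deterministic check: syntax errors go back to the generator.
        try:
            code = compile(source, "<generated>", "exec")
        except SyntaxError as err:
            feedback = f"syntax error on line {err.lineno}: {err.msg}"
            continue
        # Spec check: a failure is used to refine the prompt itself.
        namespace: dict = {}
        exec(code, namespace)
        failure = check_spec(namespace)
        if failure is None:
            return source
        prompt = f"{prompt}\nConstraint: {failure}"
        feedback = None
    raise RuntimeError("no acceptable program within the round limit")


def fake_llm(prompt: str, feedback: Optional[str]) -> str:
    """Stand-in for a model: wrong twice, then right once the prompt is refined."""
    if "Constraint:" in prompt:
        return "def add(a, b):\n    return a + b\n"
    if feedback is not None:
        return "def add(a, b):\n    return a - b\n"  # valid syntax, wrong behavior
    return "def add(a, b) return a + b"  # missing colon


def check_add(namespace: dict) -> Optional[str]:
    """A tiny 'unit test' standing in for the spec."""
    add = namespace.get("add")
    if add is None:
        return "no function named add"
    if add(2, 3) != 5:
        return "add(2, 3) should be 5"
    return None


if __name__ == "__main__":
    print(generate_with_feedback(fake_llm, "Write add(a, b).", check_add))
```

The two checks are kept separate on purpose: a syntax error only tells the generator to try again, while a spec failure changes the prompt, which mirrors the distinction the authors draw. Running generated code with `exec` is fine for a toy like this, but anything real would want a sandbox.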

In "What is it like to program with artificial intelligence?", the authors cast their net wider and try on a few different metaphors to see how they might fit LLMs.

  • LLM as a pair programmer: Copilot’s original pitch. Sometimes you drive, sometimes you navigate. It’s your supercharged rubber duck.
  • LLM as search: Skip Google and StackOverflow, ask ChatGPT/Copilot.
  • LLM as compiler (there it comes up again!): the march of programming to higher levels of abstraction and being declarative continues unabated, only to end up in… English?

The authors point out scenarios where each of the above metaphors applies, and where it breaks down. Because none of them is a perfect fit, they say that programming with LLMs is an entirely new, distinct way of programming. It certainly feels that way.

I hope to make videos soon covering these papers in a bit more detail.