Vivek Haldar

Subject-object-verb, or– how we use AI

The “AI as a product” vs. “AI as a feature” lens is going mainstream, with Humane and Rabbit exemplifying the former and Apple Intelligence representing the latter. But there’s another lens to consider: subject-object-verb.

I frame it as follows: in an AI interaction there are three parties (human, AI, artifact). Which of them is the subject, which is the object, and what is the verb or action relating the two? It is a straightforward application of the same concepts from grammar. A secondary aspect of this framing is the degree to which the verb is constrained.
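
To make the framing concrete, here is a minimal sketch in TypeScript. The names (Party, VerbConstraint, Interaction) are mine, chosen purely for illustration; the point is that every example below is just a different assignment of subject, object, and verb constraint.

```typescript
// The three parties in any AI interaction.
type Party = "human" | "ai" | "artifact";

// How constrained the verb is: a free-form prompt, a fixed menu of
// actions, or a single implicit action.
type VerbConstraint = "free-form" | "menu" | "implicit";

// An interaction names who acts (subject), who or what is acted upon
// (object), the verb relating the two, and how constrained that verb is.
interface Interaction {
  subject: Party;
  object: Party;
  verb: string;
  constraint: VerbConstraint;
}
```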

Let’s look at a few concrete examples:

Generic chat: probably the dominant modality right now. A human comes to ChatGPT (or Claude, or Gemini) and stares at an empty text box. The verb in this case is completely unconstrained, determined entirely by the prompt the human provides to the AI. What about the subject and the object? In the typical scenario, the human acts as the subject, initiating the action, and the object is the thing or goal the subject wants out of the interaction. This, by the way, is why the Artifacts feature in Claude is groundbreaking: it clearly separates the object of the conversation out as a distinguished entity.

But while the human acting as the subject is the most obvious path, there are many valid and useful scenarios where the human is the object.

Consider conversations where the AI is performing the role of a tutor, mentor or coach. Even though the human initiates the conversation, he is the object. The human is being taught, or mentored, or coached. The AI is the subject, performing that action upon the human.

The arc of improving the UX of AI seems to be one of constraining the verb and making the object explicit. Prominent examples are Gemini in Google Workspace and Apple Intelligence. The way the AI is situated and presented makes the artifact being worked on (the object) readily obvious: it’s the document or email or spreadsheet you have open. The verbs are also constrained, to some degree, to common actions: summarize, rewrite, adjust tone, suggest formulae. Where there is free-form chat, the UI makes clear that the chat happens within the context of that artifact, nudging the user towards asking questions about the open document or sheet rather than generic free-form questions.
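
As a rough sketch of what a constrained-verb menu looks like in these products (hypothetical names, not any real Workspace or Apple Intelligence API), the menu is essentially a fixed map from verbs to prompt templates, with the open artifact as the implicit object:

```typescript
// Hypothetical menu of constrained verbs for a document-centric AI feature.
// The open artifact (document, email, spreadsheet) is always the object.
const verbMenu = {
  summarize: (doc: string) => `Summarize the following document:\n\n${doc}`,
  rewrite: (doc: string) => `Rewrite the following document more clearly:\n\n${doc}`,
  adjustTone: (doc: string) => `Rewrite the following document in a more formal tone:\n\n${doc}`,
};

// The user picks a verb from the menu rather than facing an empty prompt box.
function runMenuAction(verb: keyof typeof verbMenu, openDocument: string): string {
  return verbMenu[verb](openDocument);
}
```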

AI for developers, with products like GitHub Copilot, falls somewhere in between. The subject is almost always the human programmer, and the object is almost always the code being worked upon. The verb is a mix of constrained and free-form. It is constrained for inline completion suggestions: there is implicitly only one verb, “complete what I’m typing”. The other half of Copilot’s feature set is Chat, which is open-ended, but even chat has pre-defined verbs in the form of commands like “/explain” or “/refactor”.
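
A small sketch of that mix (again hypothetical, not Copilot’s actual implementation): inline completion has a single implicit verb, while chat first checks for a pre-defined slash command and only falls back to free-form if none matches.

```typescript
// The three kinds of verbs in a Copilot-style developer tool.
type DeveloperVerb =
  | { kind: "complete" }                             // inline: one implicit verb
  | { kind: "command"; name: string; args: string }  // chat: constrained, e.g. /explain
  | { kind: "freeform"; prompt: string };            // chat: unconstrained

// Pre-defined chat verbs (only the two named in this post).
const knownCommands = new Set(["explain", "refactor"]);

// Interpret a chat message: a known "/verb" is a constrained action,
// anything else is treated as a free-form prompt.
function parseChatMessage(message: string): DeveloperVerb {
  const match = message.match(/^\/(\w+)\s*(.*)$/);
  if (match && knownCommands.has(match[1])) {
    return { kind: "command", name: match[1], args: match[2] };
  }
  return { kind: "freeform", prompt: message };
}
```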

Looked at through the lens of constraining the verb, this is a replay of the age-old CLI vs. GUI evolution. A free-form CLI, with little to no constraint on utterances, is meant for experts and daunting to most users. For the masses there is a crafted, constrained UI with pre-determined happy paths and a menu of verbs (actions) the user can pick from.

If the user is typing in a prompt, they are at the AI CLI. If not, they’re likely using a product UI.