Skills, CLIs, and MCP: What They're Actually For
There’s been a lot of discourse in the AI agent space lately about skills, CLIs, and MCP. Some of it is a category error. Some of it is a false dichotomy. And some of it simply ignores which environment each of these tools is meant for.
I’ve been building in this space for a while and I have some thoughts that I’m hoping will clear some of these things up. Let’s start with skills. What are skills really for?
Skills are desire paths
Skills are a formalization of something people were already doing. That’s why I call them a desire path: you look for where people are walking, where the grass is worn down, and that’s where you build the sidewalks.
Before skills existed as a spec, people were stuffing additional context into markdown files and feeding them to the model through whatever harness or agent they were using. Anthropic formalized that into a proper spec, while adding context-saving mechanics like progressive disclosure.
The whole point of skills is to articulate either procedural knowledge or informational knowledge. Procedural knowledge is a set of instructions for how to do something — you already know what to do, and you want to simply tell the model. You don’t want the model to have to plan or think or reason through a process that is largely pre-determined. Informational knowledge is general domain context that helps the model on a task.
To take a common example: if you were building a data analysis agent that queries a bunch of tables, you’d give it knowledge about what the column names mean inside your company, because those meanings are specific to your domain. That’s informational knowledge. But you’d also give it very specific instructions for how to, say, close the books at the end of the month. That’s not something you want the agent to plan or reason about — you just spell out your process. That’s a procedural skill.
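To make this concrete, here’s a sketch of what such a skill might look like as a SKILL.md file. The frontmatter fields follow Anthropic’s skills spec; the table names, column meanings, and thresholds are invented for illustration:

```markdown
---
name: monthly-close
description: What our revenue columns mean, and how to close the books at month end.
---

## Column meanings (informational)

- `rev_gross`: bookings before refunds, in cents.
- `rev_net`: `rev_gross` minus refunds and chargebacks.

## Closing the books (procedural)

1. Run the reconciliation query against `finance.ledger` for the month.
2. Compare the total to the month's `rev_net`.
3. If they differ by more than $1, flag the discrepancy and stop; do not post the close.
4. Otherwise, write the close entry and notify the finance channel.
```

Note the two halves map directly onto the two kinds of knowledge: the column glossary is context the model reads, while the closing checklist is a process it follows without re-deriving it.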
The important thing about skills is that they’re almost always something that is not in world knowledge. It’s additional domain information — whether informational or procedural — that you are making explicit.
SkillsBench showed empirically that curated, human-authored skills bring a huge jump in task performance, whereas self-generated skills (skills the model writes for itself) bring no benefit and in some cases even hurt. That makes sense if you follow the logic above: skills are supposed to carry knowledge from outside the model’s world knowledge. You don’t want the model to generate skills for you; you want to tell it what you know.
A great example is Anthropic’s own PDF skill. They spell out common PDF operations — merging, splitting, extracting — and specify to use PyPDF. The model doesn’t have to go hunt for PDF libraries or plan and reason about well-understood operations. The skill just says: here’s how to do it.
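Here’s a sketch of how such a skill might phrase one of those operations. The exact wording and code are invented; the point is the pattern of naming the library and spelling out the call rather than letting the model improvise:

````markdown
## Merging PDFs

Use the `pypdf` library; do not shell out to other tools.

```python
from pypdf import PdfWriter

writer = PdfWriter()
for path in input_paths:
    writer.append(path)  # appends every page of each input
writer.write("merged.pdf")
```
````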
MCP
MCP has been a smashing success, so most people now have a reasonable understanding of it. The core idea is to bring context to the model. And once again, like the skills desire path, MCP is a formalization of function calling. Before MCP, each model provider — OpenAI, Gemini, Anthropic — had their own slightly different syntax and API. MCP formalized that into a spec.
The other thing MCP does, and this is less known, is serve up resources — HTML pages, JavaScript, CSS. This is now the basic mechanism behind MCP apps and ChatGPT apps: they are MCP servers that serve UI via resources.
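For readers who haven’t looked at the wire format: MCP is JSON-RPC underneath, and both tools and resources are advertised via list calls. Here’s a minimal sketch of the two result shapes; the field names follow the MCP spec, but the tool and resource themselves are invented:

```python
import json

# A server advertising one tool and one UI resource, in the shape of the
# MCP spec's tools/list and resources/list results.
tools_result = {
    "tools": [{
        "name": "search_issues",          # invented example tool
        "description": "Search issues in a tracker.",
        "inputSchema": {                  # JSON Schema for the arguments
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }]
}

resources_result = {
    "resources": [{
        "uri": "ui://app/index.html",     # invented URI scheme
        "name": "App UI",
        "mimeType": "text/html",          # HTML page served as a resource
    }]
}

print(json.dumps(tools_result, indent=2))
print(json.dumps(resources_result, indent=2))
```

The second shape is what makes MCP apps possible: the client fetches the resource and renders it, using the same protocol it already speaks for tools.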
So why did MCP get a bad reputation? Two technical reasons.
Context hogging. When an MCP server had a large list of tools, just loading it would eat a chunk of your context window. The most well-known culprit was the GitHub MCP server with something like 70 tools — loading it consumed roughly a third of your context window. I’d argue that’s not MCP’s fault. That’s bad MCP server design. If you’re building an MCP server, you probably should not make it a kitchen sink.
Context dumping. When you call an MCP tool, the results land in the model’s context window. Unfortunately, plenty of badly designed servers produced very large tool results, often far larger than they needed to be.
Fortunately, both of these problems have been largely mitigated by code mode. The idea was initially proposed by Cloudflare, and Anthropic later adopted it as the canonical way to invoke MCP tools. Instead of tool results flowing straight into the model’s context, the model writes a small amount of code that makes the tool calls and interprets the results; only the code’s output comes back. Doing it in code shields the model from both problems.
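A toy sketch of the difference. The tool and its output are mocked here; in real code mode the snippet runs in a sandbox against generated tool bindings:

```python
# Mock of a chatty MCP tool: returns far more than the model needs.
def list_issues(repo: str) -> list[dict]:
    return [{"id": i, "title": f"Issue {i}", "body": "x" * 500,
             "state": "open" if i % 2 else "closed"} for i in range(200)]

# Direct tool call: all of this (roughly 100 KB) would land in context.
raw = list_issues("acme/widgets")

# Code mode: the model's code calls the tool and returns only what it
# actually needs, so the context sees a couple of lines, not the dump.
open_count = sum(issue["state"] == "open" for issue in raw)
first_titles = [issue["title"] for issue in raw if issue["state"] == "open"][:5]
summary = f"{open_count} open issues; first 5: {first_titles}"
print(summary)
```

The same trick handles context hogging: tool definitions become API bindings the code can call, rather than descriptions sitting in the context window.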
So I think the initial reasons MCP got a bad rap have been largely addressed. A lot of it was just bad server design, and code mode fixes the underlying mechanism.
Where do CLIs work really well?
That brings us to CLIs. Peter Steinberger famously said he prefers CLIs over MCP. So let’s look at where CLIs actually work well.
They work really well for what I call personal agents. These are agents running on my own machine — a Mac Mini, a VM, whatever — where I control the environment. I’m signed into all the CLIs — GitHub, Google Cloud, Cloudflare — and my agents can just invoke them on my behalf. The auth story on personal machines is clean and seamless.
The other advantage is context efficiency. Agents can do agentic search over a CLI’s function space by invoking --help and walking the subcommand tree. Most modern CLIs are structured as a top-level command with subcommands, each handling a subset of functionality. Agents are surprisingly good at this tree-based exploration. They don’t suffer from the context hogging problem of loading dozens of MCP tool descriptions at once.
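A toy illustration of why this scales: the agent only ever loads the --help text for the branch it’s on, never the whole command surface at once. The command tree here is invented:

```python
# Invented CLI command tree: each node has help text and subcommands.
CLI = {
    "help": "cloud - manage cloud resources",
    "sub": {
        "compute": {"help": "compute - manage VMs",
                    "sub": {"create": {"help": "create a VM", "sub": {}},
                            "delete": {"help": "delete a VM", "sub": {}}}},
        "storage": {"help": "storage - manage buckets", "sub": {}},
    },
}

def walk_help(path: list[str]) -> str:
    """Simulate running `cloud <path...> --help`: help for one node only."""
    node = CLI
    for part in path:
        node = node["sub"][part]
    subs = ", ".join(node["sub"]) or "(none)"
    return f"{node['help']}\nSubcommands: {subs}"

# The agent explores one branch at a time instead of loading everything:
print(walk_help([]))           # top level only: compute, storage
print(walk_help(["compute"]))  # one subtree deeper: create, delete
```

Each step costs a few lines of context, and the agent descends only into the subtree relevant to its task. Compare that with eagerly loading 70 tool descriptions up front.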
Where MCP wins
But CLIs don’t work everywhere. In enterprise scenarios, you really do want MCP.
When you’re doing a proper production deployment, you’re deploying your own binary. You’re not also deploying a bunch of CLIs with some auth credentials alongside your application. You want proper service separation.
An organizational pattern I see across enterprise deployments: one team stands up the official MCP servers for the entire company, and other teams build agents on top. This buys you central governance — one official source of truth for the underlying data. You don’t want people spinning up side MCP servers all over the place, just like in the old days you didn’t want everyone creating side tables and fragmenting the source of truth.
It also buys you separation of concerns. The teams building agents don’t need to worry about standing up MCP servers, and the team building MCP servers doesn’t need to know about every agent use case across the company.
It’s not either/or
The takeaway: these are complementary tools operating at different layers of the agent stack and in different environments, not competing alternatives. Skills encode domain knowledge the model doesn’t have. CLIs work great for personal agents on machines you control. MCP is the right abstraction for enterprise deployment with governance and service separation.