
Sign in to join the discussion.
Model Context Protocol (MCP) is an open-source standard, created by Anthropic in November 2024, that defines how AI applications connect to external systems: data sources, APIs, file systems, databases, and any other tool a developer wants to expose to a language model.[1] Before MCP, every integration between an AI model and an external service required custom code, bespoke adapters, and a fresh round of engineering work each time. MCP replaces that with a single, consistent protocol - one interface that any compliant AI client can speak to any compliant server.
The official documentation reaches for the USB-C analogy, and it is genuinely apt: just as USB-C lets you plug a laptop into a monitor, a keyboard, or a power supply without caring which manufacturer built any of them, MCP lets an AI agent query a database, run a search engine, or write to a file system without caring which AI model is running the show.[1] That portability is the point. By early 2026, MCP had reached 97 million monthly SDK downloads and was supported by Anthropic, OpenAI, Google, Microsoft, Amazon, IBM, and Salesforce - a cross-industry adoption rate that no previous developer protocol in the AI space had achieved so quickly.[2]
What makes this worth understanding in depth is not the adoption curve. It is what MCP actually does to the trust model of an AI system, and what that means for every developer and enterprise plugging agents into the world.
MCP follows a three-tier client-server architecture. The MCP Host is the AI application itself - Claude Desktop, Visual Studio Code, Cursor, or any custom agent runtime. The MCP Client is a component inside the host that maintains a dedicated connection to a single external server. The MCP Server is a lightweight program that exposes capabilities - tools, resources, and prompts - to whatever client connects to it.[3]
Each host can manage multiple clients simultaneously, one per connected server. When VS Code connects to both a Sentry MCP server and a local filesystem server, it spins up two independent client objects, each with its own connection lifecycle. This separation is intentional: it prevents one server's data from leaking into another's context, and it allows servers to run locally (via stdio transport, with no network exposure) or remotely (via Streamable HTTP, with OAuth-based authentication).[3]
Everything a server can offer an AI client falls into one of three categories, called primitives:
Tools: Executable functions the AI can invoke - file operations, API calls, database queries, code execution. The AI calls a tool by name with typed parameters; the server runs the function and returns structured output.
Resources: Read-only data sources that provide context - file contents, database records, API responses. Resources are attached to the conversation as background information rather than executed.
Prompts: Reusable interaction templates - system prompts, few-shot examples, structured instructions - that servers can offer to shape how the model behaves for a given task.
Servers also expose two client-facing primitives: Sampling, which lets a server request a language model completion from the host without embedding its own LLM dependency; and Elicitation, which lets a server ask the user a follow-up question mid-task. These two mechanisms are what make MCP genuinely two-directional, not just a fancy API wrapper.[3]
MCP's data layer runs on JSON-RPC 2.0 - a lightweight remote procedure call protocol that encodes every message as JSON. The communication lifecycle begins with an initialization handshake: the client sends an initialize request declaring its protocol version and supported capabilities; the server responds with its own capabilities, including which primitives it supports and whether it can send real-time notifications. Only after both sides confirm compatibility does data start flowing.[3]
This stateful handshake is worth noting. Unlike a stateless REST API, MCP maintains a live session. The server knows which client it is talking to and can push unsolicited notifications - for example, alerting a client that its tool list has changed mid-conversation, so the model can discover new capabilities without restarting. That dynamism is what sets MCP apart from earlier tool-calling approaches like OpenAPI integrations, where the tool manifest was static and declared up front.[4]
The honest answer is that MCP solves a real and previously intractable problem. Before a common protocol existed, building an AI agent that could query a database, send a Slack message, and search the web required three separate integrations, three different authentication schemes, and three different error-handling conventions - and none of it transferred to a different AI model. The engineering cost of connecting AI to the existing world was prohibitive enough that most companies either gave up or accepted severely limited agents.
MCP's value is not that it makes AI more capable. The model's reasoning is unchanged. What it does is make context cheap to supply and tool access portable. A developer who builds an MCP server for their internal database can, in theory, expose that server to Claude, GPT-4o, Gemini, and any future model that adopts the standard - without rebuilding anything. That portability lowers the cost of integration from "engineering project" to "configuration task," which is what actually drives adoption at scale.
By December 2025, Google had announced official MCP support across Vertex AI, Gemini CLI, and Google Cloud services - including fully managed remote MCP servers for BigQuery, Google Maps, GKE, and Compute Engine - confirming that the standard had escaped its Anthropic origins and become genuinely cross-industry infrastructure.[5]
Here is the thing nobody talks about in the breathless adoption coverage: MCP does not just connect AI to tools. It hands tool metadata directly into the language model's context window. Every tool description, every parameter name, every server-provided annotation becomes part of the prompt that the model reasons over. This is not a flaw in MCP's implementation - it is a structural consequence of how tool-calling works in any LLM system. But MCP's dynamic, runtime-discovery design makes it a uniquely rich attack surface.
Prompt injection is the attack class where malicious instructions are embedded in data the model reads and interprets as commands. In MCP systems, the attack vector is the tool description itself. Security researcher Johann Rehberger demonstrated this concretely in May 2025: a malicious MCP server can embed executable instructions inside a tool's description field - instructions that are invisible to human inspection but that Claude, Copilot, and Cursor will faithfully execute.[4] The attack requires no exploit of the model's weights. It exploits the trust the model extends to its own system prompt, where tool metadata lives.
Rehberger's more unsettling finding was that this attack fires before any tool is called. Simply connecting to a malicious MCP server - which causes its tool definitions to be loaded into the system prompt - is sufficient for the server to hijack the AI's behavior. He aptly nicknamed this the "Model Control Protocol": the server controls the client, not the other way around.[4]
Tool poisoning extends this concept to tool metadata manipulation: an attacker modifies the description or behavioral parameters of a registered tool to cause the agent to invoke a compromised or unauthorized action. Security researchers using the MCPTox benchmark found that poisoned tool definitions pass seamlessly into AI agent contexts with high frequency, resulting in unauthorized execution or data leakage.[6]
Two attack patterns specific to MCP's dynamic architecture deserve attention. A rug pull exploits the protocol's support for live tool-list updates. The attack follows a precise sequence: a server registers a benign tool, gains user trust and broad permissions, operates as advertised for days or weeks - then begins serving a modified tool description containing hidden instructions. No package update is required; the server simply returns different content from its tools/list endpoint. The MCP client loads the changed description into the model's context without re-prompting the user for approval, because user consent was granted based on the original description and the protocol has no mechanism to re-trigger it. According to Elastic Security Labs, most MCP clients do not re-prompt when tool descriptions change mid-session - which means rug pulls work reliably in practice.[7]
Tool shadowing occurs in multi-server setups where a malicious server registers a tool with the same name as a legitimate one from a trusted server. The model, resolving a naming conflict, may invoke the attacker's version instead. The 2025 Supabase incident illustrates the real-world cost: a Cursor agent running with privileged service-role access processed support tickets containing user-supplied input. Attackers embedded SQL instructions in those tickets that the agent executed, exfiltrating sensitive integration tokens into a public thread - a classic combination of over-permissioned tools, untrusted input, and an external communication channel.[6]
JFrog Security Research disclosed a vulnerability in oatpp-mcp, a popular C++ MCP server implementation, that enabled a novel attack technique they called Prompt Hijacking. The flaw: the server's session IDs, used to route responses to the correct client over Server-Sent Events connections, could be leaked. An attacker who obtained a valid session ID could inject malicious tool-call requests into an active MCP session without having ever authenticated. The victim's AI client would process the injected calls as legitimate server responses.[8]
Rehberger also documented a harder-to-detect variant: embedding malicious instructions using Unicode Tag characters (code points in the range U+E0000-U+E007F) inside tool descriptions. These characters are invisible in every UI that renders tool metadata - the user inspecting the tool description sees nothing. But Claude, at inference time, interprets the hidden text and acts on it. Anthropic was notified of this technique over 14 months before the public disclosure and declined to classify it as a security vulnerability.[4]
The MCP ecosystem depends on a distributed supply chain of servers hosted across npm, PyPI, GitHub, and community registries. A significant portion of these servers are third-party, unaudited, and subject to version drift. Checkmarx's security research team identified over a dozen attack vectors specific to MCP's supply chain, including fake or malicious tools slipped into registries, rug-pull dependency updates, and parasitic toolchain attacks where a chain of infected tools propagates malicious commands through multi-agent pipelines.[9]
Standardization: A single protocol replaces hundreds of bespoke integrations. Build once, connect to any compliant AI client.
Dynamic discovery: Servers can advertise new tools at runtime without restarting the session, enabling flexible, adaptive agents.
Model-agnostic portability: MCP is now supported by every major AI vendor. An MCP server built for Claude works with Gemini, GPT-4o, and any future adopter of the spec.
Low implementation barrier: A minimal MCP server in TypeScript or Python requires approximately 50 lines of code. Official SDKs abstract the JSON-RPC layer entirely.
Separation of concerns: Tool execution logic lives in the server; the model supplies only the intent. This makes tools auditable, testable, and replaceable independently of the model.
Two-way communication: The Sampling and Elicitation primitives allow servers to interact with the user and the model, enabling richer workflows than a simple request-response API.
No native authorization model: MCP's transport layer supports OAuth 2.0 for authentication, but the protocol has no built-in mechanism for scoping what a tool is permitted to do after the connection is established. Authorization is left entirely to the implementer.
Implicit trust in tool metadata: Tool descriptions are loaded into the model's context unconditionally. There is no sandboxing of tool metadata, no cryptographic attestation of description integrity, and no UI-enforced warning when metadata changes mid-session.
Stateful sessions create new attack windows: The same statefulness that makes dynamic discovery useful also makes rug-pull attacks possible. A session that outlasts the initial user approval becomes a window for mid-session compromise.
Registry governance is immature: There is no centralized, authoritative MCP registry with verified publishers. The community registries that exist have widely varying vetting standards.
Over-permissioned tools are the norm: In practice, most MCP server deployments grant tools broader access than they need, because the path of least resistance is permissive configuration. The principle of least privilege is a recommendation, not an enforcement mechanism.
Limited scope by design: MCP handles agent-to-tool communication only. Agent-to-agent coordination requires a separate protocol (Google's A2A); commercial transactions require yet another (ACP or UCP). A complete agentic architecture already needs four protocols to function.[2]
In early 2026, a meaningful and growing segment of the developer community declared MCP not merely imperfect but architecturally misguided. The critics are not hobbyists bouncing off a tutorial - they include Perplexity CTO Denis Yarats, who announced at the Ask 2026 conference on March 11 that his company was abandoning MCP internally for production workloads, citing context window overhead and authentication friction as the decisive factors.[10] Alongside that announcement, Perplexity launched its Agent API - a single endpoint routing to models from six providers - OpenAI, Anthropic, Google, xAI, NVIDIA, and Perplexity's own Sonar - with built-in tooling, no MCP server management, and no tool schemas polluting the system prompt. The subtext was hard to miss: Perplexity solved the problem MCP was supposed to solve by eliminating MCP entirely.
Y Combinator CEO Garry Tan reached the same conclusion independently, building a CLI for his use case rather than integrating through MCP - citing reliability and speed as the deciding factors.[10] These are not obscure voices. The argument deserves to be heard at full strength.
The latency cost of MCP's intermediary layer is not the 20-50ms overhead on a single call - it is what happens when those calls stack. A production agent completing a non-trivial task may chain dozens of tool invocations: fetch context, query a database, call an API, parse a result, repeat. Each hop through the MCP server doubles the network round-trips required, and at 20 or 30 sequential calls the cumulative delay becomes a structural bottleneck rather than a rounding error. For latency-sensitive workloads - automated pipelines where decisions depend on prior results, or user-facing agents expected to respond in near-real-time - this is not theoretical overhead. It is a billing line item and a user experience problem.[11] Perplexity's workloads - search, fetch, extract, summarize, draft - are exactly the stress test that exposes this most brutally. Yarats' citation of high context usage and "clunky authentication" as the core issues driving the shift is an empirical observation from production infrastructure, not theoretical concern.[10]
When an MCP client connects to a server, it loads the full tool manifest - name, description, and JSON schema for every parameter of every tool - into the model's context window before the agent processes a single token of the user's request. As Simon Willison documented in August 2025, the GitHub MCP server alone defines 93 tools and consumes 55,000 tokens, leaving roughly 121,000 tokens of working context - the practical ceiling in tools like Amp or Cursor is closer to 176,000 tokens once the base system prompt is subtracted from Claude 4's 200,000-token window, and the GitHub MCP alone consumes 55,000 of those.[12] A modest custom server exposing 40 tools burns roughly 8,000 additional tokens. In a multi-agent system where an orchestrating agent spawns sub-agents each connected to multiple servers, total schema overhead can exceed 70,000 tokens per task - driving up API costs, degrading reasoning quality in measurable ways, and slowing every response.[11]
Bigger context windows did not fix this. They just let teams be sloppy for longer before the bill arrived. Willison's own conclusion after working with the problem hands-on was to stop using MCP entirely for coding agents, finding CLI utilities a more effective path for the same goals.[12]
The CLI alternative makes the token gap concrete. For well-known tools already in the model's training data - git, gh, curl, psql, aws - the agent understands the interface without any schema declaration. Connecting via a CLI exposes all that functionality for a token cost approaching zero.[12] The honest counter-argument is that custom, bespoke tools not in the training data still require some form of schema - CLI help text or MCP schema - and the gap narrows. But as Charles Chen of Motion pointed out after walking through the analysis carefully, the agents still end up traversing a tree of --help output progressively, incurring context costs that simply arrive later rather than up front.[13] For the large category of standard developer tooling, though, the savings are real and the structural advantage of the CLI approach holds.
MCP's original design assumed persistent, stateful connections - a reasonable assumption when Claude Desktop talks to a single local server on the same machine. Enterprise deployments do not look like that. They route traffic through load balancers across many server instances. When MCP sessions are stateful and a load balancer routes the next request to a different server instance than the one that initialized the session, that instance has no record of the session and the connection breaks. The workarounds - sticky sessions, shared Redis session stores, distributed state management - add operational complexity that teams did not budget for. The MCP 2026 roadmap names this a top priority, which is an acknowledgment that it was not solved at launch.[11]
Any tool that already has a working REST API pays a hidden tax when wrapped in MCP: a new server process must be authored, deployed, kept alive, updated when the upstream API changes, and hardened against the security risks described above. This is not a one-time cost - it is a recurring operational commitment per tool, per environment, per team. Chen's experience at Motion is instructive: his team bypassed MCP entirely for their initial integrations, writing thin tool wrappers directly against REST endpoints. The result was faster iteration and simpler debugging, without the overhead of managing a parallel server layer. For teams maintaining dozens of integrations, that per-server burden compounds into an engineering cost that predates the protocol's arrival - and persists long after the initial enthusiasm fades.[13] For teams managing dozens of tools, this per-server overhead accumulates into a material ongoing engineering burden that did not exist before the protocol arrived.
The most technically pointed version of the DOA argument is not that MCP is poorly implemented but that its core architectural assumption - that an intermediary server should sit between an agent and every tool - is the wrong abstraction entirely. The Universal Tool Calling Protocol (UTCP), launched in July 2025, proposes a different model: instead of routing all tool calls through a proxy server, a UTCP "manual" is a simple JSON document that tells the AI agent how to call the tool directly via its native interface, whether that is HTTP, gRPC, WebSocket, or CLI. No intermediary server. No double hop. No wrapper to maintain.[14]
In UTCP's benchmarks, simple API calls incur roughly 50ms of latency versus 75ms through an MCP proxy - a 50% overhead difference that grows more pronounced for file operations and streaming data. More importantly, UTCP eliminates the infrastructure requirement entirely: exposing an existing API to an AI agent requires adding one JSON endpoint, not deploying, monitoring, and scaling a new server process.[14]
The honest answer: dead for some use cases, indispensable for others - and the distinction matters more than the headline. The DOA critics are largely right about local stdio MCP wrapping existing REST APIs. In that scenario, the protocol adds complexity without adding value. Chen's own conclusion after working through the analysis was nuanced: he ended up defending remote MCP over Streamable HTTP for enterprise settings, where centralized secrets management, OAuth-gated access control, organization-wide telemetry, and dynamic prompt delivery justify the infrastructure overhead in ways that a CLI simply cannot replicate.[13]
What the DOA camp gets most right is the governance critique: MCP arrived with a thriving community registry and essentially no enforcement mechanism for security or quality. What the defenders get most right is that alternatives like UTCP shift the complexity burden to the AI client, which must now understand multiple transport mechanisms, handle authentication natively, and manage retries and errors without a uniform execution layer. Neither abstraction is free.
The most damaging version of the DOA argument is not about performance or architecture at all. It is about timing. Anthropic's advanced tool use features - including the Tool Search Tool for on-demand tool discovery, announced in November 2025 - introduce a progressive tool discovery model designed to address exactly the context window bloat problem that MCP's upfront declaration creates. If Anthropic is building a layer on top of MCP to fix MCP's most visible flaw, the question is whether the 2024 spec is a long-term standard or a bridge architecture waiting to be superseded by its own successors.
Most MCP deployments start from a developer tutorial and never graduate to production-grade design. These are the patterns that separate hobby integrations from reliable systems.
Every MCP tool should expose only the specific capability it needs - read access to one database table, not the entire schema; write access to one API endpoint, not all endpoints under the same credential. This is not merely a security principle; it also makes tool descriptions shorter and more precise, which improves the model's ability to choose the right tool at inference time. Vague, over-broad tool descriptions are both a security risk and a reliability problem.
Before connecting to any third-party MCP server, inspect its tool definitions in raw JSON - not through a UI that renders only visible characters. Check for unusual Unicode, anomalously long descriptions, and instructions embedded in parameter names or type annotations. For internal servers, implement cryptographic signing of tool manifests so clients can detect unauthorized description changes.
The most common performance bottleneck in MCP deployments is repeated tool discovery calls. Every new session triggers a tools/list request to every connected server. For remote servers over HTTP, this adds latency to every conversation start. Cache tool manifests with a short TTL and subscribe to tools/list_changed notifications for invalidation rather than polling. Connection pooling across sessions reduces the overhead of repeated OAuth handshakes for high-traffic deployments.[3]
MCP's stdio transport runs a server as a subprocess on the same machine as the host, with zero network exposure. For tools that access credentials, internal databases, or proprietary data, local stdio servers are significantly safer than remote HTTP servers. The trade-off is scalability: a local server serves one client; a remote server can serve many. For enterprise deployments, the right architecture typically uses local servers for sensitive tools and remote servers only for tools accessing non-sensitive, externally available data.
The MCP specification supports an Elicitation primitive for exactly this purpose: servers can pause execution and request user confirmation before performing a destructive or irreversible action. Claude's implementation already requires explicit user approval before each tool invocation by default. Preserve this behavior. Disabling it - or enabling "always allow" modes in clients like GitHub Copilot - removes the last human checkpoint between an injected instruction and its execution.
Standard application logs capture what your code does, not what the model instructed. MCP-specific observability means logging every tools/call invocation with its full parameter set, tracking which server supplied which tool definition at the time of the call, and alerting on tool description changes mid-session. Tools like MCPTox and MindGuard are specifically designed to detect tool poisoning patterns that general APM tools miss.[6]
MCP's 97 million downloads are real, but it is worth being precise about what MCP can and cannot do, because the ecosystem has already outgrown it in important respects. MCP handles agent-to-tool communication. It does not handle agent-to-agent coordination, task delegation between agents, or commercial transactions. Three additional protocols fill those gaps in a complete enterprise agent stack:[2]
A2A (Agent-to-Agent): Google's protocol for agent discovery and task delegation, launched April 2025 with 50+ partners. Agents publish "Agent Cards" describing their capabilities; orchestrators query these cards to find capable sub-agents and delegate work. A2A handles the orchestration layer that MCP explicitly does not.
ACP (Agent Commerce Protocol): IBM and Linux Foundation's protocol for agent-to-agent commercial transactions - pricing negotiation, payment confirmation, purchase orders. Governed as a community standard, not a vendor-controlled spec.
UCP (Universal Commerce Protocol): Google's commerce layer for agents transacting within Google's ecosystem - Shopping, Merchant Center, Knowledge Graph.
A full production agent stack in 2026 uses all four protocols: MCP for tool access, A2A for agent coordination, and ACP or UCP for commercial transactions. This is not fragmentation - each protocol has a distinct scope - but it does mean that "we support MCP" describes only the bottom layer of a genuinely complex infrastructure stack.
Here is the prediction that will age either very well or very badly: MCP's security crisis is not coming - it is already structurally identical to the npm supply chain crisis, and the industry is repeating every mistake in the same order.
Consider the parallels. In 2010, npm solved a real and painful problem: package reuse in JavaScript was fragmented, manual, and expensive. Within five years the registry had hundreds of thousands of packages, most of them unreviewed, many of them dependencies of dependencies of dependencies. By 2016 a single developer deleting an 11-line package called left-pad broke thousands of production applications. By 2021, malicious packages with names designed to confuse automated installers were being downloaded millions of times before detection.
MCP is at the 2012 moment of that arc. The protocol solves a real problem. Adoption is explosive. Community registries are proliferating with minimal vetting. Developers are installing third-party MCP servers from GitHub with the same nonchalance they once used to run npm install. The difference - and it is a significant one - is that a malicious npm package needs to be executed to cause harm. A malicious MCP server causes harm the moment an AI loads its tool definitions into context. The attack is faster, less visible, and does not require the user to do anything after the initial installation.
The bold prediction: within 18 months, there will be a documented, high-profile incident in which a widely-used third-party MCP server is found to have been silently exfiltrating data or manipulating AI outputs across thousands of installations - and it will trigger the same industry reckoning that the event-stream incident triggered for npm in 2018. The response will be the same: belated investment in registry governance, cryptographic signing of server manifests, and a generation of "MCP security" tooling that should have shipped with the protocol.[9]
The more uncomfortable implication is that this outcome is not preventable by better developer education. The incentive structure is wrong. Building a useful, widely-adopted MCP server is rewarding and fast. Auditing every server you connect to is slow, technical, and invisible to users. In the absence of enforced signing, verified registries, and automatic revocation mechanisms, the rational choice for most developers will be to skip the audit - just as it was for npm. The MCP ecosystem needs infrastructure that makes the secure path the easy path, not guidance that makes the secure path the responsible path. Those are not the same thing, and only one of them scales.
Anthropic, Google, Microsoft, and the other major adopters have the leverage to demand signed registries and mandatory capability declarations as conditions of continued ecosystem support. Whether they exercise that leverage before or after the reckoning is the real question.
MCP in 2026: Rise, Security Flaws and What Comes Next - Andrew Baker Inline ↗
MCP Architecture Overview - Model Context Protocol Official Docs Inline ↗
MCP: Untrusted Servers and Confused Clients, Plus a Sneaky Exploit - Johann Rehberger (Embrace The Red) Inline ↗
Announcing Model Context Protocol (MCP) Support for Google Services - Google Cloud Blog Inline ↗
MCP Security Vulnerabilities: How to Prevent Prompt Injection and Tool Poisoning Attacks in 2026 - Practical DevSecOps Inline ↗
Securing MCP: A Defense-First Architecture Guide - Christian Schneider Inline ↗
CVE-2025-6515: Prompt Hijacking Attack Affects MCP Ecosystem - JFrog Security Research Inline ↗
11 Emerging AI Security Risks with MCP - Checkmarx Zero Inline ↗
Perplexity CTO Moves Away from MCP Toward APIs and CLIs - Awesome Agents