Part II

The Stack

Chapter 4

Hands, Voice, Workplace

In January 2025, Elena Marchetti — a senior infrastructure engineer at a Series C logistics startup in Milan — was tasked with building her company's first agent-powered workflow. The agent needed to read shipment data from PostgreSQL, check inventory via an internal REST API, and post status updates to Slack.

Elena spent six weeks writing custom integrations. Not the agent logic — that took a day. The plumbing: a PostgreSQL connector with connection pooling and query sanitization. A REST client with retry logic and auth token refresh. A Slack tool that handled threading, rate limits, and file attachments. Six weeks of bespoke infrastructure that had nothing to do with her actual goal.

In Berlin, a developer named Marcus was building the same PostgreSQL connector. In Bangalore, a team at an e-commerce company was writing the same Slack integration. In San Francisco, three separate startups were building the same file-reading tool.

Hundreds of teams. Thousands of custom integrations. All fundamentally incompatible.

This is the same waste of engineering effort that preceded every major protocol standardization in computing history. Think about what computers looked like before USB. Printers required parallel ports with 25-pin D-sub connectors. Mice used PS/2 ports, a 6-pin mini-DIN connector physically identical to the keyboard port but not interchangeable with it. External drives needed SCSI chains with terminators. Every device required its own cable, its own driver, its own prayer to the hardware gods.

In 1996, a consortium of seven companies (Compaq, DEC, IBM, Intel, Microsoft, NEC, and Nortel) published the USB 1.0 specification. One plug. One protocol. Any device. Within a few years, the problem was gone.

[Figure: three legacy connectors tied to proprietary code snippets on the left; on the right, a single USB-C connector linked to a node labeled "The Agent Standard."]

In late November 2024, Anthropic released the Model Context Protocol. MCP said the same thing for agents: one protocol, any tool.

An MCP server exposes tools with typed schemas — JSON descriptions of what inputs the tool accepts, what outputs it produces, and what the tool does. Any MCP-compatible agent can discover and use those tools without custom integration code. Install an MCP server for Slack, one for PostgreSQL, one for your file system. The agent discovers them, reads their schemas, and knows how to use them.
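To make "typed schema" concrete, here is a minimal sketch of the idea in plain Python. It is not the MCP SDK and does not speak the MCP wire protocol. The descriptor borrows the `name`, `description`, and `inputSchema` shape of an MCP tool listing, while `ToolRegistry`, `validate_arguments`, and the `post_status_update` tool are hypothetical names invented for illustration, and the validator covers only a small subset of JSON Schema.

```python
from typing import Any, Callable

# Hypothetical tool descriptor: a name, a description, and a JSON Schema
# describing the inputs the tool accepts.
POST_STATUS_TOOL = {
    "name": "post_status_update",
    "description": "Post a shipment status update to a chat channel.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "channel": {"type": "string"},
            "shipment_id": {"type": "integer"},
            "message": {"type": "string"},
        },
        "required": ["channel", "shipment_id", "message"],
    },
}

# Maps JSON Schema type names to Python types (only the subset used here).
_JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}


def validate_arguments(schema: dict, arguments: dict) -> list:
    """Return a list of problems; an empty list means the call is well-formed."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unknown field: {name}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integer fields.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems


class ToolRegistry:
    """In-process stand-in for a server that lists and calls tools."""

    def __init__(self) -> None:
        self._tools = {}

    def register(self, descriptor: dict, handler: Callable[..., Any]) -> None:
        self._tools[descriptor["name"]] = (descriptor, handler)

    def list_tools(self) -> list:
        # Discovery: the agent reads descriptors, not handler code.
        return [descriptor for descriptor, _ in self._tools.values()]

    def call_tool(self, name: str, arguments: dict) -> Any:
        descriptor, handler = self._tools[name]
        problems = validate_arguments(descriptor["inputSchema"], arguments)
        if problems:
            raise ValueError("; ".join(problems))
        return handler(**arguments)


def post_status_update(channel: str, shipment_id: int, message: str) -> str:
    # Placeholder handler; a real one would call a chat API.
    return f"[{channel}] #{shipment_id}: {message}"


registry = ToolRegistry()
registry.register(POST_STATUS_TOOL, post_status_update)
```

The point of the sketch is the division of labor: the agent only ever sees what `list_tools()` returns, so swapping the handler for a real Slack client changes nothing on the agent side.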

MCP gave agents hands.

But hands aren't enough. What happens when one agent needs to delegate a task to another agent? When your research agent finds something that needs coding, how does it hand off to your implementation agent?

Google released the Agent2Agent Protocol, or A2A. A2A provides agent cards: standardized JSON descriptions of what an agent can do, like a machine-readable resume. It also provides a structured task lifecycle with explicit states, including submitted, working, completed, and failed. When Agent A delegates to Agent B, both sides know exactly where the task stands at every moment.
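The two ideas in that paragraph, a card that advertises skills and a task that moves through explicit states, can be sketched in a few lines of Python. This is an illustration of the pattern, not the A2A specification: `AgentCard`, `Task`, `pick_agent`, and the transition table are simplified, hypothetical structures, and only the four states named above are modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed moves between states; completed and failed are terminal.
ALLOWED_TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class AgentCard:
    """Hypothetical, simplified card: who the agent is and what it can do."""
    name: str
    description: str
    skills: list


@dataclass
class Task:
    task_id: str
    assigned_to: str
    state: TaskState = TaskState.SUBMITTED
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the starting state so the history is complete.
        self.history.append(self.state)

    def advance(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


def pick_agent(cards: list, skill: str) -> AgentCard:
    """Return the first agent whose card advertises the requested skill."""
    for card in cards:
        if skill in card.skills:
            return card
    raise LookupError(f"no agent advertises skill: {skill}")


cards = [
    AgentCard("researcher", "Finds and summarizes sources.", ["research"]),
    AgentCard("implementer", "Writes and tests code.", ["coding"]),
]
target = pick_agent(cards, "coding")
task = Task(task_id="t-1", assigned_to=target.name)
task.advance(TaskState.WORKING)
task.advance(TaskState.COMPLETED)
```

Because the transition table rejects illegal moves, neither side can quietly reopen a finished task; both agents share one unambiguous view of where the work stands.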

A2A gave agents voice.

In December 2025, six months after the protocol proliferation began, the Linux Foundation created the Agentic AI Foundation (AAIF), co-founded by OpenAI, Anthropic, Google, Microsoft, AWS, and Block. MCP and A2A now live under one roof. The protocol war that took REST and SOAP a decade to resolve took agent protocols twelve months. [10]

[10] AAIF founding and protocol convergence. DEV Community guide.

[Figure: The Peace Treaty. The Linux Foundation's AAIF brings MCP and A2A under one roof, co-founded by every major AI lab.]

But hands and voice aren't enough. A person with hands and a voice still needs a workplace — a structure that provides safety, tools, supervision, and accountability.

That's the agent harness. The infrastructure that wraps around the agent to manage everything the model can't manage itself: context windows, tool orchestration, permission enforcement, sub-agent lifecycle, and — critically — trajectory logging. Claude Code is a harness. Cursor is a harness. OpenClaw is a harness.
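Two of those responsibilities, permission enforcement and trajectory logging, fit in a short sketch. The `Harness` class below is hypothetical and does not reflect how Claude Code, Cursor, or any other product is built; it assumes tools are plain Python callables and that permissions are a simple allowlist of tool names.

```python
import json
import time
from typing import Any, Callable


class PermissionDenied(Exception):
    """Raised when the agent asks for a tool it is not allowed to use."""


class Harness:
    def __init__(self, tools: dict, allowed: set) -> None:
        self._tools = tools        # tool name -> callable
        self._allowed = allowed    # tool names the agent may invoke
        self.trajectory = []       # one entry per attempted call

    def call(self, tool_name: str, **arguments: Any) -> Any:
        # Log the attempt first, so denied and failed calls are recorded too.
        entry = {"ts": time.time(), "tool": tool_name, "arguments": arguments}
        self.trajectory.append(entry)

        if tool_name not in self._allowed:
            entry["outcome"] = "denied"
            raise PermissionDenied(tool_name)

        try:
            result = self._tools[tool_name](**arguments)
        except Exception as exc:
            entry["outcome"] = "error"
            entry["error"] = repr(exc)
            raise

        entry["outcome"] = "ok"
        entry["result"] = result
        return result

    def dump_trajectory(self) -> str:
        """Serialize the trajectory as one JSON object per line."""
        return "\n".join(json.dumps(e, default=str) for e in self.trajectory)


def read_file(path: str) -> str:
    # Placeholder tool; a real one would touch the file system.
    return f"contents of {path}"


def delete_file(path: str) -> str:
    return f"deleted {path}"


harness = Harness(
    tools={"read_file": read_file, "delete_file": delete_file},
    allowed={"read_file"},
)
harness.call("read_file", path="manifest.csv")
try:
    harness.call("delete_file", path="manifest.csv")
except PermissionDenied:
    pass  # the attempt is still in the trajectory
```

The design choice worth noticing is that logging happens before the permission check. A trajectory that only records successful calls cannot tell you what the agent tried to do.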

The harness gave agents a workplace.

[Figure: The Anatomy of an Agent. A central agent core connected to Hands (tools and action), Voice (communication), and Workplace (governance and structure).]

The three-layer pattern

Every composition paradigm in computing history has eventually settled into exactly three layers. Not by design. By emergence. The same way every government settles into legislative, executive, and judicial branches — because three is the minimum number of layers that produce stable composition.

The Grand Unified Theory of Composition — Contract, Communication, Orchestration across OOP, Services, and Agents — Three layers. Three eras. The same pattern emerges every time.

Layer	OOP	Services	Agents
Contract	Interface	OpenAPI	MCP Schema
Communication	Method call	HTTP / REST	A2A Protocol
Orchestration	Design patterns	API gateway	Agent Harness

If any single layer is missing, undefined, or custom-built without standard protocols, the entire system becomes fragile. If you can name all three layers in your agent system, you understand its architecture. If you can't, that's where your next failure lives. [11]

[11] Three-layer pattern. Shaw & Garlan, Software Architecture, 1996. Wikipedia.
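The naming test can be turned into a checklist. The sketch below is only a thinking aid under that framing: `AgentStack` and `missing_layers` are hypothetical names, and the values are free-text labels rather than references to real components.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentStack:
    """One slot per layer in the table; None means the layer is unnamed."""
    contract: Optional[str] = None
    communication: Optional[str] = None
    orchestration: Optional[str] = None

    def missing_layers(self) -> list:
        # A layer counts as missing if it is None or an empty string.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


stack = AgentStack(contract="MCP schema", orchestration="agent harness")
gaps = stack.missing_layers()
```

Here `gaps` names the one layer that was never filled in, which by the argument above is where to look first when the system starts to crack.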

Architectural Fragility — if any layer is missing, the system cracks


