Agent Tool Catalogues Are Risk Inventories
By wGrow Project Team
Action-type MCP tools — those that write, execute, spend, or delete — are the part of the agent tool catalogue that most deserves security review. A prompt injection cannot drop a database if the agent lacks the tool to do it. The moment an agent holds execution rights, the tool catalogue becomes the primary risk surface. Everything else — prompt filters, model cards, dashboard telemetry — plays a supporting role.
The Action Tool Explosion
The MCP ecosystem has shifted. Early use cases were retrieval and reasoning: search, summarise, classify. The current wave adds payment APIs, database write endpoints, file system mutations, and infrastructure control planes. As models get better at multi-step tool use, the stakes of which tools they hold rise with them.
CTOs deploying these systems typically spend months on model selection, red-teaming prompts, and reviewing vendor trust pages. Comparatively little time goes into auditing function call schemas. That’s the allocation error. The prompt is a suggestion to the model. The tool catalogue is a hard capability boundary. If a tool exists in the array, the agent will tend to call it — under unexpected context, at unexpected scale, triggered by unexpected input. We’ve seen this pattern appear consistently across production deployments, including our own.
The security perimeter for an agent system is not the network boundary. It is the array of JSON schemas exposed to the LLM at runtime.
Classify by Consequence, Not Vendor

Standard practice groups tools by vendor: Stripe tools, Slack tools, Xero tools. That grouping carries no security signal. An integration labelled “Xero” could mean reading a list of invoices or voiding one. Both are legitimate tool calls with fundamentally different blast radii.
We classify every tool deployed in a wGrow agent system against five consequence tiers: read, reason, act, spend, delete. Read pulls data without mutation. Reason performs analysis or scoring without side effects. Act triggers an external event — an email send, a webhook, a status change. Spend commits financial or resource value. Delete removes or overwrites persistent state.
A tool that reads an invoice from Xero is a read. A tool that voids that invoice is a delete. Calling both “Xero tools” conflates two risks that differ by an order of magnitude.
Some tools sit awkwardly at tier boundaries — a notification webhook that also writes a log entry, or a status-change call that implies downstream spend. The taxonomy requires judgment at the margins. What matters is that the decision gets made explicitly and recorded in the schema, not left implicit in an integration label.
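One way to make that decision explicit is to store the tier, and the reason for it, next to the tool name. The following is a minimal sketch in Python rather than our production code: the `ToolRecord` type, the tool names, and the severity ordering of the enum values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Consequence(IntEnum):
    """The five tiers. The numeric ordering is an assumption for illustration."""
    READ = 1
    REASON = 2
    ACT = 3
    SPEND = 4
    DELETE = 5


@dataclass(frozen=True)
class ToolRecord:
    name: str                 # hypothetical tool name
    vendor: str               # kept for reference; carries no security signal
    consequence: Consequence  # the explicit tier decision
    rationale: str            # why this tier was chosen, useful at the margins


CATALOGUE = [
    ToolRecord("xero_get_invoice", "Xero", Consequence.READ, "no mutation"),
    ToolRecord("xero_void_invoice", "Xero", Consequence.DELETE,
               "overwrites persistent state"),
    ToolRecord("notify_webhook", "internal", Consequence.ACT,
               "external event; the log write is incidental"),
]


def by_consequence(catalogue):
    """Group tool names by tier instead of by vendor."""
    groups = {}
    for tool in catalogue:
        groups.setdefault(tool.consequence.name.lower(), []).append(tool.name)
    return groups
```

Grouping this catalogue by tier puts the two Xero tools in different buckets, which is the signal the vendor label hides.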
The classification serves a second function: it is the incident response map. If an agent behaves unexpectedly in production, the first question isn’t “which model?” or “what was in the prompt?” It’s: what consequence tier of tools did this agent hold? A rogue read-only agent has a bounded impact window. A rogue agent with act and spend tools does not. The consequence tags define the containment perimeter before an incident occurs.
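The incident question above can be answered mechanically once tiers are recorded. This sketch assumes a simple mapping from tool name to tier for each agent; the `containment_summary` function, the tool names, and the choice to treat `act` and above as side-effecting are illustrative assumptions.

```python
from enum import IntEnum
from typing import Dict


class Consequence(IntEnum):
    """Five tiers; the numeric ordering is an assumption for illustration."""
    READ = 1
    REASON = 2
    ACT = 3
    SPEND = 4
    DELETE = 5


def containment_summary(agent_tools: Dict[str, Consequence]) -> dict:
    """Report the highest tier an agent holds and which tools have side effects."""
    if not agent_tools:
        return {"highest_tier": None, "side_effect_tools": []}
    highest = max(agent_tools.values())
    # Tools at act or above can change something outside the agent.
    side_effects = sorted(
        name for name, tier in agent_tools.items() if tier >= Consequence.ACT
    )
    return {
        "highest_tier": highest.name.lower(),
        "side_effect_tools": side_effects,
    }
```

Running this over each deployed agent ahead of time gives responders a list of which tools to revoke first.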
The Read-Only Exfiltration Risk
We built an internal compliance bot to parse wGrow’s HR and infosec policies. The initial architecture was deliberately conservative: no write APIs, no spend APIs, no delete capabilities. The bot held read tools against internal document stores and two outbound tools we considered benign — a web search endpoint for policy lookups and a structured logging endpoint for audit trails.
We treated a read-only agent as secure by default. That assumption was wrong.
The flaw surfaced during a red-team exercise. A hijacked prompt can instruct the agent to retrieve sensitive data from the document store and exfiltrate it through one of those benign outbound tools. The web search tool accepts a URL — a sufficiently crafted prompt can append sensitive context as a query parameter to an attacker-controlled domain. The logging endpoint writes structured JSON to an external sink. Either channel becomes an exfiltration vector the moment the context window is compromised. The agent never needs a write API. It needs only a read tool and an outbound path.
The fix was egress filtering on the execution environment, applied at the infrastructure layer rather than the prompt layer. All outbound calls from the compliance bot now route through an allowlisted proxy. The web search tool resolves only against an approved domain list. The logging endpoint writes only to an internal sink with no external route. The tools themselves didn’t change. The network boundary around them did.
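The allowlist decision itself is small. The sketch below shows the kind of host check a proxy might apply to each outbound URL; it is an illustration under assumptions, not our proxy configuration. The host names and the `is_egress_allowed` function are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical approved hosts; a real list would live in proxy configuration.
ALLOWED_HOSTS = frozenset({"search.example.com", "logs.example.internal"})


def is_egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests whose exact host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname strips userinfo and port, so "allowed@evil.test" resolves to evil.test
    host = (parts.hostname or "").lower()
    return host in allowed_hosts
```

Matching on the exact parsed host, rather than on a substring of the URL, is what stops look-alike domains and userinfo tricks. The check belongs in the proxy, outside the agent process, so a compromised context window cannot skip it.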
The broader lesson: any agent with read access to sensitive data and outbound network paths requires egress review, regardless of whether it holds write tools. A read-only label describes mutation capability. It says nothing about exfiltration potential.
Spend-and-Send Boundaries

We built an automated quoting agent for an SME client. The agent scans CRM records, retrieves product pricing, and generates custom proposals. Straightforward pipeline. The risk vectors are not.
Generating a quote is a spend consequence: it implies a commercial commitment the business may not have verified. Emailing the quote is an act consequence: it initiates an external relationship with that commitment attached. Without a hard gate between those two tools, the agent can chain reasoning to spend to act in a single unreviewed pass. An incorrect CRM record or stale pricing table becomes a sent commercial document.
The boundary we implemented separates the agent’s reasoning scope from its execution scope. The agent can draft a quote and stage the email — full reasoning pipeline, all context loaded. It cannot invoke the send tool without a human approval token injected at runtime. The approval interface is a simple web screen. The agent logic has no path around it: the send tool validates the token signature before executing. Absent or expired token, the call returns a hard failure.
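A signed, expiring approval token can be sketched with an HMAC. This is a minimal illustration of the gate described above, assuming a shared secret between the approval screen and the send tool; the token format, the `SECRET` value, and the function names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-managed-secret"  # hypothetical; never hard-code in practice


class ApprovalRequired(Exception):
    """Raised when the send tool is called without a valid approval token."""


def issue_token(quote_id: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Called by the approval screen, never by the agent."""
    payload = f"{quote_id}.{expires_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(token: str, quote_id: str, now: Optional[float] = None,
                 secret: bytes = SECRET) -> bool:
    """Check signature, quote binding, and expiry."""
    try:
        tok_quote, tok_exp, tok_sig = token.rsplit(".", 2)
        expires_at = int(tok_exp)
    except ValueError:
        return False
    expected = hmac.new(secret, f"{tok_quote}.{tok_exp}".encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tok_sig.encode(), expected.encode()):
        return False
    if tok_quote != quote_id:
        return False
    current = time.time() if now is None else now
    return current < expires_at


def send_quote_email(quote_id: str, token: Optional[str],
                     now: Optional[float] = None) -> str:
    """The send tool: hard failure on an absent, invalid, or expired token."""
    if not token or not verify_token(token, quote_id, now):
        raise ApprovalRequired(f"no valid approval for quote {quote_id}")
    return f"sent:{quote_id}"  # placeholder for the real email send
```

Binding the token to a specific quote identifier means an approval for one quote cannot be replayed to send another.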
The separation of concerns is architectural, not instructional. We do not tell the agent to wait for approval. We remove its ability to proceed without one. Prompts can be overridden by a sufficiently crafted input. Execution gates operate independently of LLM logic.
Blast-Radius Metadata in JSON Schemas
```json
{
  "name": "generate_quote",
  "description": "Creates pricing quote",
  "x-wgrow-consequence": "spend",
  "parameters": {
    "type": "object"
  }
}
```
The orchestrator strictly checks the x-wgrow-consequence tag before routing a call to execution.
An OpenAPI spec or JSON schema describes parameters and response types. It does not describe risk. That gap is still open in current standards.
Every tool definition in a wGrow deployment carries a mandatory custom metadata block: a consequence field from the five-tier taxonomy, a blast_radius field describing the scope of unrecoverable impact, and a requires_human_gate boolean. These are not documentation fields. The agent runner reads them at execution time.
If the consequence is spend or delete, the runner triggers an elevated logging pathway and holds the call for human review. If blast_radius extends beyond a single record — a bulk-delete or bulk-send, for instance — the runner applies a transaction limit before the LLM ever calls the function.
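The routing rules above can be sketched as a small policy function. This is an illustration under assumptions, not the runner itself: the `ToolMeta` and `ExecutionPolicy` types, the `"single_record"` value for blast_radius, and the limit of 10 records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_RISK_TIERS = {"spend", "delete"}
BULK_TRANSACTION_LIMIT = 10  # hypothetical cap on records per call


@dataclass(frozen=True)
class ToolMeta:
    name: str
    consequence: str           # read | reason | act | spend | delete
    blast_radius: str          # assumed values: "single_record" or "bulk"
    requires_human_gate: bool


@dataclass(frozen=True)
class ExecutionPolicy:
    elevated_logging: bool
    hold_for_review: bool
    max_records: Optional[int]


def policy_for(meta: ToolMeta) -> ExecutionPolicy:
    """Derive the policy from metadata alone, before any call is made."""
    high_risk = meta.consequence in HIGH_RISK_TIERS
    bulk = meta.blast_radius != "single_record"
    return ExecutionPolicy(
        elevated_logging=high_risk,
        hold_for_review=high_risk or meta.requires_human_gate,
        max_records=BULK_TRANSACTION_LIMIT if bulk else None,
    )


def admit(meta: ToolMeta, record_count: int, approved: bool = False) -> bool:
    """Decide whether a specific call may proceed to execution."""
    policy = policy_for(meta)
    if policy.max_records is not None and record_count > policy.max_records:
        return False
    if policy.hold_for_review and not approved:
        return False
    return True
```

Because the policy is computed from the schema metadata and not from model output, nothing the LLM generates can change which pathway a call takes.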
Any modification to a tool tagged spend or delete is gated at the infrastructure level, requiring a separate review cycle independent of the engineering change that introduced the new capability. An agent needing new functionality does not automatically inherit the right to expand the blast radius of tools it already holds. The metadata scheme adds overhead to schema maintenance — but that overhead is the point. It forces explicit risk decisions at authoring time rather than during incident review.
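A review gate of this kind can be approximated with a check that compares two versions of the catalogue. The sketch below assumes each catalogue is a plain mapping from tool name to its metadata; the `needs_risk_review` function and the field layout are hypothetical, and a real gate would sit in CI or the deployment pipeline.

```python
HIGH_RISK_TIERS = {"spend", "delete"}


def needs_risk_review(old: dict, new: dict) -> list:
    """Return names of added, removed, or changed tools that are tagged
    spend or delete in either version of the catalogue."""
    flagged = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue  # unchanged tools need no new review
        tiers = {
            definition.get("consequence")
            for definition in (before, after)
            if definition is not None
        }
        if tiers & HIGH_RISK_TIERS:
            flagged.append(name)
    return flagged
```

Checking the tier in both versions catches escalation as well as edits: a tool that moves from act to spend is flagged even though its old definition looked low-risk.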
Treat tool definitions exactly like firewall rules. A firewall no one reviews offers no protection. A tool schema no one audits is an execution boundary that exists only on paper.
The MCP ecosystem is less than two years old. Action tools change the risk model because model output becomes a real-world side effect. Governance frameworks are not keeping pace. The response isn’t to slow tool adoption — in most production contexts, that’s not realistic. It’s to mandate blast-radius metadata at the schema level, classify every tool by consequence before it ships, and implement execution gates that operate independently of LLM logic.
The agent does not need to be told what it cannot do. Its catalogue needs to make it structurally impossible.