Postgres JSONB as the Scratchpad for Multi-Agent State
Infra & Security · 5 June 2026 · 6 min


By wGrow Project Team

The Split State Penalty


Our internal HR agent crew handled leave applications for staff: an orchestrator delegated to a retrieval agent for policy lookup, a logic agent for balance checks, and a write agent that committed deductions to the HR system. Standard multi-agent pattern. Retrieval and memory ran on a managed vector database. Core HR data and tool execution logs lived in Postgres.

The architecture lasted one week.

It seemed sensible at design time. The vector store handled semantic search over HR policy documents and persisted conversational scratchpads between agent turns. Postgres handled employee records, leave balances, and the audit trail we needed for compliance. Two databases, two responsibilities, clean separation. We'd seen this layout recommended in several agent framework guides.

Multi-agent systems do not respect clean separation. Tool calls from different agents run concurrently, and when the leave logic agent committed a balance deduction at the same moment the orchestrator flushed a conversation summary to the vector store, two writes were in flight across two systems with no coordinating transaction. The vector database had no concept of the Postgres commit, and Postgres had no concept of what the vector store was doing. We got race conditions. We got state drift.

Splitting state across specialised databases imposes a coordination tax. At the session volumes most agent crews actually run (dozens to low hundreds per day), that tax rarely pays for itself. We went back to Postgres.

Autopsy of the HR Crew

Figure: Split architecture. The HR agent crew wrote memory to a vector store and business data to a SQL database.

The failure that finally killed our confidence in the split architecture was a partial write during a network hiccup. Nothing exotic.

The leave logic agent approved a leave application. The balance deduction committed to Postgres. Simultaneously, the orchestrator attempted to write the updated conversation context — including the approval — into the vector store scratchpad. The connection timed out. The write failed silently. Retry logic fired asynchronously, after the agent had already moved to the next turn.

Postgres recorded the approved leave and the deducted balance. The vector store held the previous turn’s context. The agent’s next response treated the leave as still pending. A staff member submitted the same application twice and got it approved twice.

We spent roughly 40 engineering hours across two weeks writing reconciliation scripts and retry wrappers. Those scripts became their own maintenance liability: every schema change to the vector store metadata required a matching update to the sync logic. More glue. More surface area to break.

Dedicated vector databases excel at approximate nearest-neighbour search at low latency; Pinecone and Milvus are built for exactly that problem. They are not designed to be transactional state stores for business logic that touches leave balances or money, and they cannot take part in the same transaction as a Postgres business write. That is the design contract we violated: the vector store could persist the scratchpad update, but nothing could make that write and the leave-balance deduction commit or roll back as one unit.

Moving the Scratchpad to Postgres JSONB

Postgres Table Structure
```sql
-- The VECTOR type comes from the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,            -- raw conversational memory and tool execution logs
    embedding  VECTOR(1536),              -- semantic search alongside relational state data
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

The second crew gave us a chance to rebuild from scratch. An SME quotation generator handling pricing inquiries, inventory checks, and quote assembly for a local distribution client. Same multi-agent pattern. Different architectural choice.

Everything moved to a single Postgres table. We dropped the standalone vector cluster entirely.

One table, agent_sessions, holds a JSONB column called scratchpad. That column stores the full agent state: raw message history, tool call logs with inputs and outputs, intermediate reasoning traces, metadata tags. Each row is a session. Updates are standard Postgres writes inside transactions.
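A minimal sketch of one session row follows. The key names `tool_calls` and `memory_snapshots` come from the text; `messages`, `reasoning`, `metadata`, the session UUID, and the client tag are hypothetical stand-ins for whatever shape your agents need. The embedding stays NULL until the hosted model API returns a vector, and the setup lines repeat the assumed schema so the snippet runs on its own.

```sql
-- Setup: the agent_sessions schema described in the text (assumed DDL).
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,
    embedding  VECTOR(1536),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- One row per session; the whole agent state lives in one JSONB document.
INSERT INTO agent_sessions (session_id, scratchpad)
VALUES (
    '6f1c2b9e-0d4a-4c55-9a7e-2f3b8c1d0e11',          -- hypothetical session id
    jsonb_build_object(
        'messages', jsonb_build_array(
            jsonb_build_object('role', 'user',
                               'content', 'Quote for 200 units of SKU-118?')),
        'tool_calls',       '[]'::jsonb,            -- inputs and outputs per call
        'reasoning',        '[]'::jsonb,            -- intermediate reasoning traces
        'memory_snapshots', '[]'::jsonb,            -- context-window flush summaries
        'metadata',         jsonb_build_object('kind', 'quote',
                                               'client', 'example-distributor')
    )
)
ON CONFLICT (session_id) DO NOTHING;             -- safe to re-run
```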

Semantic retrieval — still needed for policy document lookup and historical quote matching — lives in the same table via an embedding column using pgvector. We generate embeddings via a hosted model API and store the vectors alongside the structured data. A single SQL query can combine a cosine similarity filter on the embedding column with a JSONB condition on the scratchpad metadata. One round trip. No cross-system coordination.
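The single round trip might look like the sketch below, which wraps the query in a hypothetical SQL function `similar_sessions` and assumes sessions are tagged with a `metadata.kind` key in the scratchpad. It ranks rows by pgvector's cosine distance operator `<=>` (smaller means more similar) and applies a JSONB containment filter (`@>`) in the same statement. The seeded constant vector is a stand-in for a real embedding from the hosted model API.

```sql
-- Setup: the agent_sessions schema described in the text (assumed DDL).
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,
    embedding  VECTOR(1536),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Seed one quote session with a stand-in embedding (real vectors come from the model API).
INSERT INTO agent_sessions (session_id, scratchpad, embedding)
VALUES ('0b7d3e42-5a16-4f2c-8e91-7c4d2a6b9f30',
        '{"metadata": {"kind": "quote"}}',
        array_fill(0.1::real, ARRAY[1536])::vector)
ON CONFLICT (session_id) DO NOTHING;

-- Hypothetical helper: the k nearest sessions of a given kind, in one query.
CREATE OR REPLACE FUNCTION similar_sessions(query_vec vector, session_kind text, k int)
RETURNS TABLE (session_id uuid, distance float8)
LANGUAGE sql STABLE AS $$
    SELECT s.session_id,
           s.embedding <=> query_vec                    -- cosine distance
    FROM agent_sessions AS s
    WHERE s.scratchpad @> jsonb_build_object(           -- JSONB metadata filter
              'metadata', jsonb_build_object('kind', session_kind))
      AND s.embedding IS NOT NULL
    ORDER BY s.embedding <=> query_vec
    LIMIT k
$$;

-- Usage: the five prior quote sessions closest to a query embedding.
SELECT * FROM similar_sessions(array_fill(0.1::real, ARRAY[1536])::vector, 'quote', 5);
```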

The JSONB scratchpad imposes no rigid schema on agent developers. When a tool call completes, the result is appended to the tool_calls array. When a context-window flush happens, the summary is appended to memory_snapshots. Additional keys land without migrations. The structure grows with the agent's needs instead of having to be designed up front.
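The append path can be sketched with standard JSONB functions: `jsonb_set` writes the extended `tool_calls` array back, and `COALESCE` covers sessions that have no such key yet. The second statement shows a brand-new top-level key arriving through a shallow merge (`||`) with no `ALTER TABLE`. The session UUID, tool name, SKU, and `pricing_rules_version` key are hypothetical.

```sql
-- Setup: the agent_sessions schema described in the text (assumed DDL) and one session.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,
    embedding  VECTOR(1536),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO agent_sessions (session_id, scratchpad)
VALUES ('a3e5c7d9-1b2f-4e6a-9c8d-5f7e1a3b2c4d', '{}')
ON CONFLICT (session_id) DO NOTHING;

-- Append one tool-call record; COALESCE handles a missing tool_calls key.
UPDATE agent_sessions
SET scratchpad = jsonb_set(
        scratchpad,
        '{tool_calls}',
        COALESCE(scratchpad -> 'tool_calls', '[]'::jsonb)
          || jsonb_build_array(jsonb_build_object(
                 'tool',   'check_inventory',             -- hypothetical tool
                 'input',  jsonb_build_object('sku', 'SKU-118', 'qty', 200),
                 'output', jsonb_build_object('available', 340),
                 'at',     now()))),
    updated_at = now()
WHERE session_id = 'a3e5c7d9-1b2f-4e6a-9c8d-5f7e1a3b2c4d';

-- A new key needs no migration: merge it into the document.
UPDATE agent_sessions
SET scratchpad = scratchpad || '{"pricing_rules_version": "2026-06"}',
    updated_at = now()
WHERE session_id = 'a3e5c7d9-1b2f-4e6a-9c8d-5f7e1a3b2c4d';
```

Note that `||` merges only at the top level; nested objects with the same key are replaced, not combined, which is why arrays are extended with `jsonb_set` instead.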

ACID Guarantees for Agentic Tool Calls


The quotation generator does real work with real financial consequences. Quotes lock pricing. Inventory reservations get committed. External supplier APIs get called. When agent state and business state diverge, we issue a wrong quote or overcommit stock — both of which cost us client relationships we can’t afford to burn.

A Postgres transaction block addresses the local write path directly. The execution pattern:

  1. BEGIN.
  2. The agent retrieves relevant prior quotes via pgvector similarity search.
  3. The tool executes: pricing logic runs and the inventory API responds.
  4. The tool result, the reservation record, and the scratchpad update all write in the same transaction.
  5. COMMIT.

If that local commit fails, the entire block rolls back: the scratchpad returns to its pre-call state and no reservation row lands in the database. The external supplier call already happened outside the transaction boundary, though, so it needs its own idempotency key and a compensating action if the local commit doesn't follow through. Postgres keeps the agent's local state consistent; distributed side-effect safety is a separate problem you still have to solve.
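The local write path can be sketched as one transaction, assuming a hypothetical `inventory_reservations` table whose primary key doubles as the idempotency key sent to the supplier API. `SELECT ... FOR UPDATE` locks the session row so two agents cannot interleave scratchpad updates for the same session. The pgvector lookup, pricing logic, and API call run in application code at the marked point; only their results are written here. All identifiers and values besides `agent_sessions` are illustrative.

```sql
-- Setup: the agent_sessions schema described in the text (assumed DDL).
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,
    embedding  VECTOR(1536),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS inventory_reservations (      -- hypothetical business table
    idempotency_key TEXT PRIMARY KEY,                     -- also sent to the supplier API
    session_id      UUID NOT NULL REFERENCES agent_sessions (session_id),
    sku             TEXT NOT NULL,
    qty             INT  NOT NULL CHECK (qty > 0)
);
INSERT INTO agent_sessions (session_id, scratchpad)
VALUES ('d4c3b2a1-9e8f-4a7b-8c6d-1e2f3a4b5c6d', '{"tool_calls": []}')
ON CONFLICT (session_id) DO NOTHING;

BEGIN;

-- Lock the session row: other writers to this session wait here until COMMIT.
SELECT scratchpad FROM agent_sessions
WHERE session_id = 'd4c3b2a1-9e8f-4a7b-8c6d-1e2f3a4b5c6d'
FOR UPDATE;

-- ...application code: pgvector lookup, pricing logic, inventory API call...

-- Business write. A retry reusing the key violates the primary key and
-- aborts the whole transaction, so the reservation cannot land twice.
INSERT INTO inventory_reservations (idempotency_key, session_id, sku, qty)
VALUES ('quote-d4c3b2a1-line-1', 'd4c3b2a1-9e8f-4a7b-8c6d-1e2f3a4b5c6d', 'SKU-118', 200);

-- Agent-state write in the same transaction.
UPDATE agent_sessions
SET scratchpad = jsonb_set(scratchpad, '{tool_calls}',
        COALESCE(scratchpad -> 'tool_calls', '[]'::jsonb)
          || jsonb_build_array(jsonb_build_object(
                 'tool', 'reserve_inventory',
                 'idempotency_key', 'quote-d4c3b2a1-line-1',
                 'sku', 'SKU-118', 'qty', 200))),
    updated_at = now()
WHERE session_id = 'd4c3b2a1-9e8f-4a7b-8c6d-1e2f3a4b5c6d';

COMMIT;   -- both rows become visible together, or neither does
```

Holding the row lock while an external API responds keeps other writers on that session waiting; at a few hundred sessions per week that is cheap, but it is the first thing to revisit if call latency grows.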

This local atomicity is difficult to replicate in a split architecture. When Postgres rolls back cleanly, the vector store has already accepted a write with no corresponding rollback mechanism. You’re left with an orphaned embedding representing a tool call that never happened. Your retrieval agent finds it on the next similarity search and acts on stale state.
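For contrast, here is a small demonstration of what Postgres does on the same failure, assuming the `agent_sessions` schema and a hypothetical session. A PL/pgSQL block writes a new embedding and a scratchpad flag, then hits a simulated failure in the business step. Because a block with an `EXCEPTION` clause runs as a subtransaction, Postgres discards both writes before the handler runs, so no orphaned embedding is left for a later similarity search to find.

```sql
-- Setup: the agent_sessions schema described in the text (assumed DDL) and one session.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,
    embedding  VECTOR(1536),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO agent_sessions (session_id, scratchpad)
VALUES ('e1f2a3b4-c5d6-4e7f-8a9b-0c1d2e3f4a5b', '{"status": "pending"}')
ON CONFLICT (session_id) DO NOTHING;

DO $$
BEGIN
    -- Agent-state write: a new embedding plus a scratchpad flag.
    UPDATE agent_sessions
    SET embedding  = array_fill(0.2::real, ARRAY[1536])::vector,
        scratchpad = scratchpad || '{"status": "reserved"}'
    WHERE session_id = 'e1f2a3b4-c5d6-4e7f-8a9b-0c1d2e3f4a5b';

    -- Simulated failure of the business write that should commit alongside it.
    RAISE EXCEPTION 'inventory reservation failed';
EXCEPTION WHEN OTHERS THEN
    -- By the time this handler runs, the UPDATE above has been rolled back.
    RAISE NOTICE 'rolled back: %', SQLERRM;
END $$;
```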

That is a hallucination sourced from your infrastructure, not from the model.

LLM hallucination is a probabilistic problem. You address it with better prompts, retrieval grounding, and evaluation harnesses. Database state corruption is an engineering choice. You address it by not creating the conditions for it.

Cutting Infrastructure Complexity

Infrastructure Outcomes
  - AWS DB cost: −40%
  - DB clusters: 1 (down from 2)
  - Transactional state: atomic (local writes)

AWS Cost Explorer recorded a 40% reduction in database spend for that workload over the first full billing cycle after migration, measured against the managed vector cluster line item only and excluding LLM API costs. That cluster ran as a managed service with a high-availability configuration, two replicas, and dedicated network egress. It made up the majority of infrastructure cost for a crew that, according to application logs, handled 200 to 400 sessions per week across the four-week pilot.

pgvector on an existing RDS Postgres instance added negligible marginal cost. The extension installs in seconds, and index creation on our embedding column took under two minutes. At our session volume, similarity query latency is not the bottleneck; the LLM API round trip dominates by an order of magnitude. At much higher vector counts (millions of rows with high-throughput recall requirements), a dedicated vector database's ANN index can outperform pgvector's HNSW implementation, and the trade-off shifts. Our workloads are not there yet.
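For reference, enabling pgvector and building the similarity index comes down to two statements. This sketch assumes the `agent_sessions` table and cosine distance (`vector_cosine_ops`, which matches the `<=>` operator); the index name is hypothetical, and `m`, `ef_construction`, and `hnsw.ef_search` are written out at pgvector's documented defaults rather than at values we tuned.

```sql
-- Enable pgvector in this database (HNSW indexes need pgvector 0.5.0 or later).
CREATE EXTENSION IF NOT EXISTS vector;

-- Assumed table, as described in the text.
CREATE TABLE IF NOT EXISTS agent_sessions (
    session_id UUID PRIMARY KEY,
    scratchpad JSONB NOT NULL,
    embedding  VECTOR(1536),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index for cosine distance; m and ef_construction are pgvector's defaults.
CREATE INDEX IF NOT EXISTS agent_sessions_embedding_hnsw
    ON agent_sessions USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Query-time recall/speed trade-off for this session (pgvector default: 40).
SET hnsw.ef_search = 40;
```

Similarity queries only use this index when they `ORDER BY` the same distance operator (`<=>` here) and include a `LIMIT`.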

The operational simplification mattered as much as the cost saving, maybe more. One set of Postgres migrations covers both relational schema changes and embedding column management. One backup schedule. One set of monitoring alerts. One on-call runbook. Engineers who already know Postgres can work on the agent state layer without learning a second operational surface. That last point is underrated — the cognitive load of a second system compounds over time, and it compounds fastest when something breaks at 2am.

The pattern generalises. Agent scratchpads, tool call logs, semantic memory, and structured business data are not fundamentally different kinds of information. They are rows with different shapes. Postgres has been storing rows with different shapes since 1996. pgvector added the embedding dimension. The infrastructure for this problem existed long before anyone labelled it an “agentic workload.”

Teams that consolidate around Postgres spend less time writing synchronisation glue and more time shipping. AI workloads do not require reinventing data persistence. The boring choice is the right choice.

It usually is.