Infra & Security 24 April 2026 · 6 min

Redis Task Queues for Inter-Crew AI Handoffs

By wGrow Project Team · 24 April 2026

When our production monitoring showed completed quoting runs per hour down 23%, our engineers blamed LLM inference latency. They were looking at the wrong metric.

We run six concurrent AI agent crews in production — the BD Crew, Article Crew, Finance Crew, WaterDoctor Analytics Crew, Alerting Crew, and Image Crew — passing structured data between them dozens of times per hour. When we finally profiled the full pipeline, inference was fine. The LLM providers were holding up their end. The latency was sitting in the gaps — the state handoff layer nobody had bothered to instrument.

Over eighteen months we installed three different AI orchestration libraries. Then we uninstalled all three and replaced them with Redis queues. Throughput stabilized within hours.

The False Promise of Native Agent Memory

Every first-generation AI orchestration library makes the same bet: abstract away state management so developers don’t have to think about it. The agent class holds context. You invoke the next agent. The library shuttles the payload in-process. Clean. Simple.

Until it isn’t.

The failure mode is not dramatic: no hard error, just slow degradation under concurrent load. Large JSON context payloads bloat in-process memory. Python’s garbage collector pauses at exactly the wrong moments. Node’s event loop stalls under allocation pressure. Engineers instrument the LLM API calls, see response times between 800 ms and 1.2 s, and conclude the model is the bottleneck. Meanwhile, the actual failure point, the inter-crew communication layer, goes uninstrumented because it looks like application code, not infrastructure. It doesn’t show up on dashboards. It just quietly costs you throughput.

We ran six crews across distributed containers, and that’s where native in-memory handoffs really fall apart: they don’t cross container boundaries reliably. Each library we tried had its own partial workaround — one used a SQLite-backed memory store, one bundled a Redis-compatible layer, one pickled Python objects onto a shared volume. None gave us visibility into message lag. None gave us backpressure. None gave us replay on failure.

To be fair: for single-container, low-concurrency pipelines, native orchestration memory may be adequate. The moment you’re running multiple crews across distributed processes under real load, the abstraction breaks down.

Fixing the wGrow Quoting Pipeline

Handoff Architecture

Our internal quoting pipeline runs a strict sequential handoff. The Sales Crew ingests a client transcript, extracts requirements, and structures a project scope as JSON. That scope must pass cleanly to the Finance Crew, which generates pricing based on scope complexity, resource estimates, and margin rules.

With the native orchestration library’s memory abstraction in place, the handoff was synchronous and invisible: it looked like a function call. In one peak-load window, with three or more quoting runs in flight simultaneously, our pipeline logs showed that 4% of finalized scope payloads had no corresponding Finance Crew receipt event. Each dropped payload meant the Finance Crew never received the scope, no quote was generated, and someone had to find and reprocess the gap by hand.

Four percent sounds small. Over a month of sales activity, it was dozens of lost or delayed quotes.

We replaced the native handoff with Redis Streams. The Sales Crew now pushes the finalized scope JSON to a dedicated stream (quoting:scope:v1) once it has validated the payload structure. The Finance Crew runs a consumer group against that stream and only ACKs a message after the generated quote is written to the database.
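The push-then-ACK handoff can be sketched in Python. This is a minimal illustration rather than the production implementation: it assumes a redis-py style client created with decode_responses=True on the default RESP2 protocol, and the group name finance_crew, the payload field, and the write_quote callback are hypothetical names chosen for the example.

```python
import json
from typing import Any, Callable

STREAM = "quoting:scope:v1"  # stream name from the article
GROUP = "finance_crew"       # hypothetical consumer group name


def ensure_group(client: Any) -> None:
    """Create the consumer group (and the stream) if it does not exist yet."""
    try:
        client.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except Exception as exc:
        # Redis answers BUSYGROUP when the group already exists; that is fine.
        if "BUSYGROUP" not in str(exc):
            raise


def push_scope(client: Any, scope: dict) -> str:
    """Sales side: append an already-validated scope, return its entry ID."""
    return client.xadd(STREAM, {"payload": json.dumps(scope)})


def process_batch(client: Any, consumer: str,
                  write_quote: Callable[[dict], None], count: int = 10) -> int:
    """Finance side: read new entries, persist each quote, then ACK it."""
    handled = 0
    # ">" asks for entries never delivered to any consumer in this group.
    reply = client.xreadgroup(GROUP, consumer, {STREAM: ">"},
                              count=count, block=5000)
    for _stream, entries in reply or []:
        for entry_id, fields in entries:
            scope = json.loads(fields["payload"])
            write_quote(scope)                    # must succeed before the ACK
            client.xack(STREAM, GROUP, entry_id)  # only now is the entry done
            handled += 1
    return handled
```

If write_quote raises, the entry is never acknowledged and stays in the group's pending list, which is what makes the drop visible and recoverable instead of silent.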

Payload drop rate fell to zero. We also got a full audit log — every scope ever pushed, with timestamps. When a quote is disputed, we can replay the exact payload that was priced.

Unblocking WaterDoctor Predictive Maintenance

The WaterDoctor system is a different kind of problem. It’s not a sequential pipeline; it’s a continuous stream. IoT sensors in water treatment facilities generate telemetry at variable rates. The Analytics Crew processes incoming readings, runs anomaly detection, and flags events requiring client notification. The Alerting Crew consumes those flagged events and dispatches them.

Early on, we used a bespoke agent-to-agent message bus from one of the orchestration frameworks. That bus choked at 50 messages per second, with alert latency spiking above 3 seconds. For a predictive maintenance product, that’s not a performance issue — it’s a product failure.

The architecture now runs two Redis patterns in parallel. Redis Pub/Sub handles volatile real-time alerts: the Analytics Crew publishes to a channel and Alerting Crew instances subscribe. If a subscriber is down when a message arrives, the message is gone — and that’s acceptable, because a fresh sensor reading will confirm the same condition within seconds. Redis Streams handles persistent logs: every flagged anomaly is written to a stream with a 30-day retention window, giving us a replay-capable audit trail and the ability to backfill missed alerts after a recovery.
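A rough Python sketch of the dual-write step follows. It assumes a redis-py style client and Redis 6.2 or later (for MINID trimming); the channel and stream names and the event field are hypothetical. Redis Streams have no built-in time-based expiry, so the 30-day window is modeled here by trimming on each write, which is one possible approach and not necessarily the one used in production.

```python
import json
import time
from typing import Any, Optional

ALERT_CHANNEL = "waterdoctor:alerts"      # hypothetical channel name
ANOMALY_STREAM = "waterdoctor:anomalies"  # hypothetical stream name
RETENTION_MS = 30 * 24 * 60 * 60 * 1000   # 30-day window, in milliseconds


def flag_anomaly(client: Any, event: dict,
                 now_ms: Optional[int] = None) -> str:
    """Publish a volatile alert and append the same event to the durable log."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    body = json.dumps(event)

    # Fire-and-forget: subscribers that are offline never see this message.
    client.publish(ALERT_CHANNEL, body)

    # Durable copy. Stream IDs start with a millisecond timestamp, so a
    # MINID cutoff drops entries older than the retention window.
    cutoff = now_ms - RETENTION_MS
    return client.xadd(ANOMALY_STREAM, {"event": body},
                       minid=str(cutoff), approximate=True)
```

With approximate trimming Redis may keep somewhat more than 30 days of entries, which is usually the cheaper trade for an audit log.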

Under our internal benchmark — same payload schema and deployment class as production — the crews reached 400 messages per second with queue lag and CPU holding within our operating thresholds. At this scale, the ceiling is Redis capacity, not agent throughput.

Treating Agents as Distributed Microservices

The underlying principle predates LLMs by decades — it just keeps having to be rediscovered.

Agent crews should be treated as distributed microservices. They should expose clean interfaces, consume from well-defined queues, and know nothing about the internal state of other crews. Coupling the Sales Crew to the Finance Crew — sharing objects in memory, passing live references across agent boundaries — is the same mistake engineers made with tightly coupled monolithic services twenty years ago. The lesson was hard-won. It shouldn’t need to be relearned just because the services now contain LLMs.

Redis provides a lightweight, language-agnostic boundary. The Sales Crew is a Node.js process. The Finance Crew could be Python. The message format is JSON. The boundary is a Redis stream. Neither crew cares about the other’s runtime.

The durability guarantee matters as much as the decoupling. If the Finance Crew crashes mid-generation, the scope payload stays in the stream, unacknowledged, sitting in the consumer group’s pending entries list. When the worker restarts under the same consumer name, it drains that pending list before reading new messages; that is the recovery path for a clean restart. If the crashed consumer never returns, its entries stay assigned to it, so a separate recovery process must claim the stale ones, for example with XAUTOCLAIM, before another worker can reprocess them. End-to-end durability still depends on your AOF fsync policy, RDB snapshot cadence, and replication configuration. This isn’t an AI-specific pattern. It’s how production message queues have worked since the 1990s.
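Both recovery paths can be sketched in Python. The functions below assume a redis-py style client with decode_responses=True and Redis 6.2 or later for XAUTOCLAIM; the function names, the handle callback, and the 60-second idle threshold are illustrative assumptions.

```python
from typing import Any, Callable


def drain_own_pending(client: Any, stream: str, group: str, consumer: str,
                      handle: Callable[[dict], None], count: int = 10) -> int:
    """Clean restart: reprocess entries delivered to this consumer, never ACKed."""
    handled = 0
    while True:
        # ID "0" (instead of ">") replays this consumer's own pending entries.
        reply = client.xreadgroup(group, consumer, {stream: "0"}, count=count)
        entries = reply[0][1] if reply else []
        if not entries:
            return handled
        for entry_id, fields in entries:
            handle(fields)
            client.xack(stream, group, entry_id)  # removes it from the pending list
            handled += 1


def reclaim_stale(client: Any, stream: str, group: str, consumer: str,
                  handle: Callable[[dict], None],
                  min_idle_ms: int = 60_000, count: int = 10) -> int:
    """Dead consumer: take over entries idle longer than min_idle_ms."""
    handled = 0
    cursor = "0-0"
    while True:
        reply = client.xautoclaim(stream, group, consumer,
                                  min_idle_time=min_idle_ms,
                                  start_id=cursor, count=count)
        cursor, entries = reply[0], reply[1]
        for entry_id, fields in entries:
            handle(fields)
            client.xack(stream, group, entry_id)
            handled += 1
        if cursor == "0-0":  # Redis returns 0-0 once the scan is complete
            return handled
```

A worker would typically call drain_own_pending once at startup, while reclaim_stale runs on a timer in whichever process owns recovery. Because an entry can be delivered more than once on either path, the handler needs to be idempotent.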

Implementation Rules for Redis State Handoff

State Handoff Payload

{
  "crew_id": "sales_01",
  "status": "finalized",
  "context_url": "s3://wgrow/quote_1093.pdf",    ← ①
  "structured_data": {                            ← ②
    "scope_items": 12,
    "est_hours": 45
  }
}

① Document context offloaded to object storage
② Parsed strict JSON, kept under 500 KB

A few rules that have held up across our deployments.

Never pass raw LLM output between crews. Force the origin crew to parse its output into validated JSON before pushing to Redis. Raw LLM text is unpredictably formatted. A downstream crew should never be parsing prose — eventually, it will break.
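As an illustration of that gate, here is a small standard-library sketch that refuses anything that is not a JSON object with the expected fields. The field names come from the payload shown earlier; the REQUIRED table and the function name are assumptions for the example, and a real deployment might use a schema library instead.

```python
import json

# Required top-level fields and their expected Python types (illustrative).
REQUIRED = {"crew_id": str, "status": str, "structured_data": dict}


def parse_llm_output(raw: str) -> dict:
    """Turn raw model text into a validated dict, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, expected in REQUIRED.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return data
```

Only the return value of a function like this should ever reach the queue; a ValueError is the origin crew's cue to retry or escalate, before anything is pushed.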

Keep payloads under 500 KB. In our deployment topology, reads at this ceiling have consistently stayed within our sub-millisecond budget — network path, persistence settings, and command mix will shift that number in yours. Large contexts — full transcripts, lengthy research compilations — go to object storage. The Redis message carries a reference URL, not the bytes.
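One way to enforce the ceiling at the producer is sketched below. The upload callback is hypothetical: it stands in for whatever stores the bytes in object storage and returns a reference URL. The 500 KB limit is taken here as 500,000 bytes, which is an assumption about the intended unit.

```python
import json
from typing import Callable

MAX_BYTES = 500_000  # 500 KB ceiling, assumed to mean decimal kilobytes


def build_message(crew_id: str, structured_data: dict, context: bytes,
                  upload: Callable[[bytes], str]) -> dict:
    """Offload bulky context, then check the remaining message fits the budget."""
    message = {
        "crew_id": crew_id,
        "status": "finalized",
        "context_url": upload(context),  # a reference, not the bytes
        "structured_data": structured_data,
    }
    size = len(json.dumps(message).encode("utf-8"))
    if size > MAX_BYTES:
        raise ValueError(f"message is {size} bytes, over the {MAX_BYTES} limit")
    return message
```

Failing loudly at the producer keeps oversized payloads out of the stream altogether, so the consumer never has to cope with them.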

Use consumer groups for horizontal scaling. When the WaterDoctor Alerting Crew falls behind during a surge, we attach another instance to the same consumer group. Redis distributes messages automatically. Consumer groups do add operational overhead — you need to manage group state and handle dead-letter scenarios — but for any workload that actually sees load spikes, the scaling headroom is worth it.
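The dead-letter handling mentioned above could look something like this. It is a sketch under assumptions: a redis-py style client, a hypothetical dead-letter stream name passed in by the caller, and an arbitrary threshold of five deliveries before an entry is set aside.

```python
from typing import Any, List


def move_to_dead_letter(client: Any, stream: str, group: str, dead_stream: str,
                        max_deliveries: int = 5, count: int = 100) -> List[str]:
    """Park entries that have been delivered too many times without an ACK."""
    moved = []
    # Each item describes one pending entry, including how often it was delivered.
    pending = client.xpending_range(stream, group, min="-", max="+", count=count)
    for item in pending:
        if item["times_delivered"] < max_deliveries:
            continue
        entry_id = item["message_id"]
        found = client.xrange(stream, min=entry_id, max=entry_id)
        if found:  # the entry may already have been trimmed from the stream
            _id, fields = found[0]
            client.xadd(dead_stream, {**fields, "original_id": entry_id})
        client.xack(stream, group, entry_id)  # stop redelivering it
        moved.append(entry_id)
    return moved
```

Scaling out itself needs no extra code: a second worker reading with the same group name and a different consumer name simply starts receiving its share of new entries.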

Instrument the queue, not just the LLMs. Track message lag, acknowledgement rate, and pending counts per consumer group. These metrics show where your pipeline is actually slow. LLM API dashboards won’t.
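A starting point for those metrics, assuming a redis-py style client with decode_responses=True: XINFO GROUPS reports the pending count for each group, and on Redis 7.0 or later also a lag figure (entries not yet delivered to the group). The function name and the output shape are illustrative.

```python
from typing import Any, Dict, List


def queue_metrics(client: Any, stream: str) -> List[Dict[str, Any]]:
    """Collect per-group backlog figures for one stream."""
    rows = []
    for group in client.xinfo_groups(stream):
        rows.append({
            "group": group["name"],
            "consumers": group["consumers"],
            "pending": group["pending"],  # delivered but not yet ACKed
            "lag": group.get("lag"),      # None on servers older than Redis 7.0
        })
    return rows
```

Polling this on a short interval and exporting the numbers to an existing dashboard is usually enough to see which crew is falling behind. An acknowledgement rate is not reported by Redis directly and would have to come from counters kept by the consumers themselves.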

Inference speed is a provider problem, and providers are actively working on it. State handoff is an engineering problem whose solutions (message queues, consumer groups, durable streams) have existed for decades. When you hit a multi-agent throughput wall, the answer is not to wait for AI orchestration libraries to catch up. Wire your agent crews to infrastructure your backend engineers have trusted for years, instrument the gaps between them, and move forward.
