Inside the BD Crew: six narrow scouts, a verifier, and a human in the only seat that signs.
By Timothy Mo
How we built the bilingual EN/Chinese BD pipeline that ships into WaterDoctor's CRM every week: six lane-specialist scouts, a deduping editor, a corroborating verifier, and a registry that grows by proposal, not by crawl.
The BD crew runs every Monday. Six narrow scouts sweep grants, public tenders, corporate intelligence, regulatory triggers, industry events and academic collaborations across Singapore, mainland China, Hong Kong, Taiwan and ASEAN. An editor agent dedupes, scores, fills bilingual gaps. A verifier agent corroborates every claim with fresh independent searches, checks every link is live, plausibility-tests every deadline. A human BD owner promotes the verified pipeline into CRM follow-up. The live instance ships into WaterDoctor’s BD pipeline. The shape ports to anyone who needs a curated opportunity feed instead of a crawler dump.
This is what’s actually wired underneath, why each piece exists, and the failure modes the eval is built around.
Why “one BD agent” failed
The first version of this was, predictably, a single agent. “Find me grants and tenders relevant to aquaculture in Singapore and China this week.” The output was confident-looking, broad, and largely useless. The dominant failure modes:
Hallucinated grants. The model would pattern-match a real Enterprise Singapore programme name with a vaguely-plausible deadline that didn’t exist. The opportunity looked legitimate until you tried to find the actual call.
Stale links served as live. Grant pages from 2022 with the right URL shape and the wrong content. The model had no concept of “is this still open”; it pattern-matched against historical instances of the URL.
Category leak. Asked for grants, it would surface tenders. Asked for tenders, it would surface industry events. A single agent answering a single broad question collapses categories that ought to stay separate, because the human reader of a BD pipeline reads different categories with different criteria.
No corroboration. A single agent has no second opinion. If it pulled a fact from one place, that fact stayed pulled from one place.
The fix was structural. Six scouts, one lane each, with a curated source list per lane. An editor that dedupes across lanes without paraphrasing the underlying data. A verifier that runs fresh independent searches after the editor, against every surviving opportunity, with the verdict on the record.
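In code, the weekly shape compresses to something this small. A minimal sketch, not the production crew: `Candidate` is an illustrative record, and `run_scout`, `edit` and `verify` are stand-ins for the LLM-backed steps.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    lane: str  # grants, tenders, corporate, regulatory, events, academic
    title: str
    url: str

LANES = ["grants", "tenders", "corporate", "regulatory", "events", "academic"]

def run_scout(lane: str) -> list[Candidate]:
    """One narrow scout per lane, reading only its curated source list."""
    return []  # LLM-backed in the real crew

def edit(candidates: list[Candidate]) -> list[Candidate]:
    """Editor: dedupe across lanes and score, without paraphrasing the data."""
    return candidates

def verify(candidates: list[Candidate]) -> list[Candidate]:
    """Verifier: fresh independent searches, link checks, deadline sanity."""
    return candidates

def weekly_run() -> list[Candidate]:
    candidates = [c for lane in LANES for c in run_scout(lane)]
    return verify(edit(candidates))  # the human BD owner signs off after this
```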
The shape
The grants scout watches Singapore Food Agency, Enterprise Singapore, A*STAR, MARA’s 渔业发展补助资金 (Fisheries Development Subsidy Funds), MOST’s 蓝色粮仓 (Blue Granary) programme, NSFC, provincial Departments of Science and Technology, and ADB blue-economy technical assistance windows. The tenders scout watches GeBIZ alongside the China, Indonesia, Philippines, Thailand, Vietnam and Malaysia procurement portals. Corporate intelligence, regulatory triggers, events and academic each have their own brief and their own source list.
One scout per lane is the right cardinality. Two scouts merging at the boundary is where category leak comes back. A scout that’s too narrow produces a thin feed; a scout that’s too broad collapses categories. We tuned the briefs over six months until each one produces between five and twenty candidates per week — enough signal to be worth deduping, not so much that the editor is doing primary filtering.
The registry, not the crawl
This is the part that matters more than the agent design.
A naïve BD crew sends agents at the open web. Ours doesn’t. Each scout reads a curated source list — tier-graded, language-tagged, region-tagged. The grants scout for Singapore reads roughly forty named portals; the grants scout for China reads roughly sixty. The list is in the database. The agent cannot read sources outside it without a proposal step.
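A registry row, sketched minimally. The field names are illustrative, not our schema; the point is that a scout's reading list is a database query, not a crawl frontier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    url: str
    lane: str      # grants, tenders, corporate, regulatory, events, academic
    tier: int      # 1 (government primary) to 5 (blogs, social)
    language: str  # "en" or "zh"
    region: str    # "SG", "CN", "HK", "TW", "ASEAN", ...

def reading_list(registry: list[Source], lane: str, region: str) -> list[Source]:
    """A scout's entire universe. No URL outside this list is ever fetched."""
    return [s for s in registry if s.lane == lane and s.region == region]
```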
When a scout encounters a source it didn’t know about — a new provincial DoST page, a new sectoral grant programme — it can propose the source. The proposal lands in a queue with the URL, a one-paragraph rationale, and the language. It only becomes a scanned source after the human operator approves it. The crew suggests; the operator extends the registry.
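The proposal queue, sketched against the same illustrative `Source` record. Status values and fields are assumptions; the invariant is real: only the human approval step appends to the registry.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SourceProposal:
    url: str
    lane: str
    language: str
    region: str
    rationale: str           # the one-paragraph case for the source
    proposed_on: date = field(default_factory=date.today)
    status: str = "pending"  # pending -> approved | rejected

def approve(proposal: SourceProposal, tier: int, registry: list[Source]) -> None:
    """Human-only path. A scout can create a SourceProposal; only the
    operator calls this, and only this appends to the registry."""
    proposal.status = "approved"
    registry.append(Source(url=proposal.url, lane=proposal.lane, tier=tier,
                           language=proposal.language, region=proposal.region))
```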
This sounds like overhead. It is overhead. It is also why we have a real signal-to-noise ratio. A crawler-style approach pollutes the pipeline with low-tier sources the moment something interesting links to something less interesting. A registry refuses to.
The registry has a tier per source. Tier-1: government primary sources and official funding agencies. Tier-2: legitimate aggregators (GovTech-curated lists, sectoral association programmes). Tier-3: mainstream media coverage of the same thing. Tier-4: trade press. Tier-5: blogs and social. The scouts read tiers 1–3 by default; the verifier accepts tier 1 alone, or two independent sources from tiers 2–3, as corroboration; tiers 4 and 5 are surfacing-only, never load-bearing.
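The corroboration rule is small enough to show whole. A sketch, taking the tiers of the live, independent sources found for a candidate:

```python
def is_corroborated(live_independent_tiers: list[int]) -> bool:
    """Tier 1 alone suffices; otherwise two sources from tiers 2-3.
    Tiers 4-5 never count, however many of them agree."""
    if 1 in live_independent_tiers:
        return True
    return sum(1 for t in live_independent_tiers if t in (2, 3)) >= 2

assert is_corroborated([1])            # a live tier-1 primary is enough
assert is_corroborated([2, 3])         # two independent tier-2/3 sources
assert not is_corroborated([4, 4, 5])  # trade press and blogs never carry it
```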
The verification gate
The verifier runs four checks against every candidate. Title sanity is the cheapest reject — empty title or title equal to the URL is REJECTED before any corroborating search is spent. Deadline plausibility flags missing deadlines and rejects deadlines more than five years out. We do not let evergreen-looking links pass as live grants.
Corroboration is the load-bearing check. A tier-1 primary that’s still live the week we send it is sufficient on its own. Otherwise the verifier requires two live independent corroborators, and independent means not from the same parent organisation or aggregator. URL liveness is the final stop — every URL gets a HEAD or GET this week, and dead URLs block VERIFIED status.
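Stripped of the agent scaffolding, the gate looks roughly like this. `Opportunity` and the two helper functions are illustrative stand-ins; the ordering is the real design, cheapest checks first, searches last.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum

class Verdict(Enum):
    VERIFIED = "verified"
    FLAGGED = "flagged"
    REJECTED = "rejected"  # dropped before the owner's screen

@dataclass
class Opportunity:
    title: str
    url: str
    deadline: date | None

def live_independent_corroborator_tiers(opp: Opportunity) -> list[int]:
    """Stand-in for the fresh independent searches; returns the tiers found."""
    return []

def url_is_live(url: str) -> bool:
    """Stand-in for this week's HEAD (falling back to GET) check."""
    return True

def gate(opp: Opportunity) -> tuple[Verdict, str]:
    # 1. Title sanity: the cheapest reject, before any search is spent.
    title = opp.title.strip()
    if not title or title == opp.url:
        return Verdict.REJECTED, "empty or URL-shaped title"
    # 2. Deadline plausibility: missing flags, more than five years out rejects.
    if opp.deadline is None:
        return Verdict.FLAGGED, "missing deadline"
    if opp.deadline > date.today() + timedelta(days=5 * 365):
        return Verdict.REJECTED, "implausible deadline"
    # 3. Corroboration: a live tier-1 primary alone, or two live independent
    #    tier-2/3 sources. Independent means a different parent organisation.
    tiers = live_independent_corroborator_tiers(opp)
    if 1 not in tiers and sum(1 for t in tiers if t in (2, 3)) < 2:
        return Verdict.FLAGGED, "single source or tier-too-low corroboration"
    # 4. URL liveness: a dead URL blocks VERIFIED status.
    if not url_is_live(opp.url):
        return Verdict.FLAGGED, "dead URL"
    return Verdict.VERIFIED, "corroborated and live"
```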
The 70% auto-retry catches the same upstream-flakiness failure mode the WaterDoctor crew protects against. When more than seven of every ten items in a batch come back REJECTED, the rejected subset re-runs once before being persisted. A bad search-grounding hour should not be written into the record as a quality signal.
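The retry policy is about a dozen lines. A sketch, reusing the `Verdict` enum from the gate above; `run_verifier` stands in for a full verdict pass over a batch:

```python
def verify_batch(items: list, run_verifier, reject_threshold: float = 0.7) -> list:
    """One verdict per item. If more than reject_threshold of a batch comes
    back REJECTED, re-run the rejected subset once before persisting."""
    verdicts = run_verifier(items)
    rejected_idx = [i for i, v in enumerate(verdicts) if v is Verdict.REJECTED]
    if items and len(rejected_idx) / len(items) > reject_threshold:
        retried = run_verifier([items[i] for i in rejected_idx])  # one retry only
        for i, verdict in zip(rejected_idx, retried):
            verdicts[i] = verdict
    return verdicts  # persisted after this point, retried or not
```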
The verdict is on the record. The owner sees VERIFIED with full corroboration, FLAGGED with the reason (missing deadline, single source, tier-too-low corroboration), or nothing — REJECTED items are dropped before the owner’s screen.
Implementation phases
Phase 1 — registry first. Three weeks. Before any agent ran, we built the source registry — about three hundred curated sources across the six lanes, tagged by tier, language and region. The registry was hand-walked: each source loaded, classified, sanity-checked. This was tedious work that paid off immediately.
Phase 2 — one scout, one lane, one verdict. Four weeks. Grants scout only. Single language. No editor, no verifier — we wanted to know if the scout could read its registry and produce candidates we’d actually pursue. About 30% of week-one output was usable, climbing to 65% by week four with prompt iteration.
Phase 3 — six scouts in parallel. Three weeks. We added the other five lanes. The category-leak failure mode came back hard the first week — the corporate intelligence scout was returning grants, the events scout was returning conferences-that-were-actually-tenders. The fix was brief specificity in tier-01 memory: explicit “you do not surface X, X belongs to scout Y” rules per lane.
Phase 4 — editor and verifier. Four weeks. The editor went in first — dedupe across lanes by source URL plus a fit-urgency-value score on a 1–5 scale. The verifier came two weeks later. The editor and verifier both read tier-04 (the registry, the eval rubric, the refusal list) and both write to a per-piece scratchpad in tier-03.
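The editor's mechanical half is small enough to sketch here. The dedupe key, the source URL, is the real one; the record shape and the ranking are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ScoredItem:
    url: str
    lane: str
    fit: int      # 1-5
    urgency: int  # 1-5
    value: int    # 1-5

def dedupe_and_rank(candidates: list[ScoredItem]) -> list[ScoredItem]:
    """Dedupe across lanes by source URL, keeping the first sighting.
    The editor scores; it never paraphrases the underlying data."""
    seen: dict[str, ScoredItem] = {}
    for c in candidates:
        seen.setdefault(c.url, c)
    return sorted(seen.values(),
                  key=lambda c: (c.fit, c.urgency, c.value), reverse=True)
```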
Phase 5 — bilingual and the proposal queue. Three weeks. EN ⇌ Chinese across every field went in once we had the verifier holding the line. The proposal queue — a scout’s path to extending the registry — went in last, because we wanted to be sure the registry was the constraint we wanted before we built a path around it.
Total: about four months from first scout prompt to weekly cadence. The registry build was longer than any single agent build.
What I’d change next time
Three.
Score the registry, not just the opportunities. We score every opportunity 1–5 on fit, urgency and value. We do not score sources on their hit-rate. A source that produces twelve opportunities a year, of which one is pursued and none close, should be flagged for review. We are starting to track this; we should have been tracking it from day one.
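The metric is cheap to define. A sketch, with illustrative thresholds:

```python
from dataclasses import dataclass

@dataclass
class SourceStats:
    surfaced: int = 0  # opportunities the source produced this year
    pursued: int = 0   # of those, promoted into CRM follow-up
    closed: int = 0    # of those, actually won

def needs_review(stats: SourceStats) -> bool:
    """Flag a source whose output never converts; the thresholds are
    illustrative, the twelve-surfaced-zero-closed case is the target."""
    return stats.surfaced >= 12 and stats.closed == 0
```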
Build the proposal queue earlier. Operators were proposing new sources by Slack message for the first three months. Those messages got lost, debated and forgotten. A proposal queue with a real workflow — reason, evaluator, decision, decision-on-the-record — should be in place before scouts go live.
Move corroboration weighting into a config, not the prompt. “Tier-1 alone or two of tier-2/3” is in the verifier’s tier-01 memory today. It should be a config value the operator can change quarterly without touching the prompt. As it stands, the scout and verifier prompts conflate what to do with how strict to be; those concerns should be separate.
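What that separation could look like: strictness lives in a config the operator edits quarterly, and the prompt only says to apply the current policy. A sketch, with illustrative field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorroborationPolicy:
    sufficient_alone: tuple[int, ...] = (1,)  # tiers that corroborate by themselves
    paired_tiers: tuple[int, ...] = (2, 3)    # tiers that corroborate in pairs
    paired_count: int = 2

def is_corroborated(tiers: list[int], policy: CorroborationPolicy) -> bool:
    """Today's rule as a default; tomorrow's strictness is an operator edit."""
    if any(t in policy.sufficient_alone for t in tiers):
        return True
    return sum(1 for t in tiers if t in policy.paired_tiers) >= policy.paired_count
```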
The shape ports. Six scouts becomes four or eight depending on the domain. The lanes change — clinical-trial calls instead of grants, M&A signals instead of corporate intelligence, RFI publications instead of tenders. The mechanics — narrow scouts on a curated registry, an editor that dedupes without paraphrasing, a verifier that corroborates with fresh searches, a human in the only seat that signs — hold across every BD function we have looked at.
— Timothy Mo, wGrow