Inside the Image Crew: eight e-commerce roles, one anchor, brand-locked across six channels.
By Timothy Mo
How we wired the Image Crew on the ArightAI platform — three analyst agents brief the shoot, a render agent ships the frame, a grader catches drift, an art director signs off. The anchor-first pattern that keeps two thousand SKUs in lock with one brand reference set.
The Image Crew is the artwork team an e-commerce seller would otherwise hire — a brand strategist, a product analyst, a visual researcher, a creative director, a prompt engineer, an image-gen technician, a brand-fit grader, an art director. We packed eight seats into one agent crew. It runs on the ArightAI platform across Shopee, TikTok Shop, Amazon, Tokopedia, Shopify and Lazada, and it ships every surface a listing actually needs: hero, lifestyle, infographic, feature shots, model photography, banners, ad creatives, short video.
This is the wiring underneath the platform — why eight specialists, why the anchor-first pattern is load-bearing, and where the grader catches the drift that breaks brand consistency at scale.
What “one image gen with a long prompt” failed at
The first version was what most teams ship — a long prompt, a frontier image model, a render. It worked for the first SKU. It failed at twenty.
Drift between SKUs. A camera position, a lighting key, a colour temperature established for SKU 1’s hero would gently wander by SKU 5. By SKU 20 the catalogue looked like a stitched-together photo essay, not a brand. Image models are stateless; without an anchor to lock against, every render is a fresh negotiation with the model’s distribution.
Marketplace-spec violations. Shopee accepts vibrant, badge-heavy product shots. Amazon requires a pure white background with no overlays. TikTok Shop favours motion-friendly framing. Shopify DTC leans premium minimal. A single prompt that satisfied the seller’s brand brief frequently violated a channel’s listing spec — pure-white-background renders that picked up a soft grey gradient, hero shots where props crept in that Amazon would reject.
Erased product text. Removing a watermark, a price sticker, or a seller overlay is the most common request. Models also routinely erased the actual product text — ingredient lists, certifications, model numbers, care instructions. The seller’s competitive moat lives in those labels; we can’t lose them.
Invented compliance marks. The opposite failure: a model that noticed a “premium” framing and decorated the product with an “ISO 9001”-shaped badge that was never on the physical product. This is the failure mode that would get a seller suspended.
The fix was structural. Three analyst agents that read the shop, the product and the brand reference set in parallel. A creative director that synthesises into a strategy. A prompt engineer that converts strategy to a precise prompt. A render agent that produces three variants per shot anchored to an approved anchor. A grader that scores brand fit, colour accuracy, anatomy and prop integrity, and marketplace-spec compliance. An art director that signs off.
The shape
The three analysts run in parallel, not in sequence, because their inputs are independent. The shop analyst reads shop name, marketplace, region, description; it knows the visual conventions of the channel it’s shipping into. The product analyst reads product name, description, specs, selling points; it categorises the treatment — electronics sleek, fashion aspirational, food warm, beauty luxurious — and pulls out the differentiators worth a shot. The visual-reference analyst studies the seller’s reference images with a vision model; it names the palette, lighting, textures, composition, mood, and flags blurry or off-angle reference shots so they don’t poison the brief downstream.
Each analyst emits a 3–5-bullet brief. The creative director consumes all three briefs and writes a single creative strategy plus a meaningfully-different shot per variation — not just an angle change. The prompt engineer converts strategy to the precise image-gen prompt. The render agent produces three variants per shot. The grader scores. The art director signs off.
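A minimal sketch of that shape, with hypothetical names throughout (the ArightAI internals are not public): the three analyst briefs fan out concurrently because their inputs are independent, and the rest of the pipeline consumes them in sequence.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Brief:
    role: str
    bullets: list[str]  # each analyst emits a short 3-5-bullet brief


async def run_analyst(role: str, inputs: dict) -> Brief:
    # Placeholder for a real agent call; each analyst sees only its own inputs.
    return Brief(role=role, bullets=[f"{role}: finding {i}" for i in range(3)])


async def brief_the_shoot(shop: dict, product: dict, refs: dict) -> list[Brief]:
    # The three analysts run concurrently, not in sequence,
    # because shop data, product data and reference images are independent.
    return await asyncio.gather(
        run_analyst("shop", shop),
        run_analyst("product", product),
        run_analyst("visual-reference", refs),
    )


briefs = asyncio.run(brief_the_shoot({}, {}, {}))
print([b.role for b in briefs])  # ['shop', 'product', 'visual-reference']
```

Downstream, the creative director is just the next awaited call that takes all three briefs; nothing after the fan-out is parallel.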
The anchor-first pattern
This is the architectural decision, not an agent-design detail.
For any product, the first hero shot generated is the anchor. The art director approves the anchor before any variant runs. Every subsequent shot — lifestyle, infographic, feature, ad, banner, video frame — is generated against the anchor, not against the original product photograph.
The anchor pattern is what kills drift. The render agent is given the anchor image plus the variant prompt; it must keep camera, lighting, palette and composition consistent with the anchor, while changing the scene appropriately for the variant. A lifestyle shot is the same product, the same lighting language, in a contextual scene — it is not a fresh negotiation with the model.
The anchor is also where the brand reference set locks in. A seller’s reference set is a small library — five to twenty images that capture the brand’s visual voice. The anchor inherits from the reference set. Every downstream shot inherits from the anchor. A new SKU in a hundred-SKU catalogue produces a new anchor, but the reference set is the same; the new anchor visually agrees with the previous ninety-nine. We have shipped catalogues at this scale; they hold.
The art director approves anchors. The art director can also send an anchor back with notes: “lighting too cool”, “prop too prominent”, “composition off-axis.” The render agent re-rolls. Once the anchor is approved, the variants run.
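The control flow above can be sketched in a few lines — all names hypothetical, a sketch of the pattern rather than the production code: the hero render loops through art-director review until approved, and only then do variants render, each carrying the anchor.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    prompt: str
    anchor: "Frame | None" = None  # a variant always carries its anchor
    notes: list[str] = field(default_factory=list)


def render(prompt: str, anchor: "Frame | None" = None) -> Frame:
    # Placeholder for the image-gen call; a real render would pass
    # the anchor image alongside the variant prompt.
    return Frame(prompt=prompt, anchor=anchor)


def approve_anchor(candidate: Frame, reviewer) -> Frame:
    # The art director can send the anchor back with notes; re-roll until approved.
    while (note := reviewer(candidate)) is not None:
        candidate = render(candidate.prompt + f" | fix: {note}")
        candidate.notes.append(note)
    return candidate


# Scripted reviewer: reject once ("lighting too cool"), then approve.
verdicts = iter(["lighting too cool", None])
anchor = approve_anchor(render("hero shot"), lambda f: next(verdicts))

# Only after approval do the variants run, each locked to the anchor.
variants = [render(p, anchor=anchor) for p in ("lifestyle", "infographic", "feature")]
```

The important property is that there is no path where a variant renders before the anchor is approved.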
The grader
Four axes, scored before the human ever sees a frame:
Brand fit. The grader compares the variant against the anchor and the reference set on palette consistency, lighting language, mood. A frame that looks beautiful but inconsistent with the anchor scores low here and is regenerated.
Colour accuracy. Specifically: does the product’s actual colour, as seen in the original photograph, survive through the render? A model can drift a navy blue toward a friendlier teal because it makes for a better-looking frame; the buyer who receives a navy product they thought was teal returns it. The grader holds the line.
Anatomy and prop integrity. For model photography: hands have five fingers, feet have toes in the right places, the model holds the product in a posture a human would. For props: the product hasn’t sprouted an extra label, an extra cap, a phantom cable. Image models still occasionally get this wrong; the grader catches it.
Marketplace-spec compliance. Aspect ratio, resolution, file weight, file format — and channel-specific rules. Amazon’s listing-card spec rejects shots with overlays. Shopee’s first image must be square. TikTok Shop wants 9:16 ready. A frame that fails the marketplace spec doesn’t ship, no matter how it scored elsewhere.
Below threshold goes back to render. Above threshold is packaged in marketplace-ready dimensions. The grader’s scores travel with the asset; the art director sees them.
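As a sketch of the gate logic — axis names, threshold value and the hard-gate rule for marketplace spec are my illustrative assumptions, not the production scoring:

```python
from dataclasses import dataclass
from statistics import mean

AXES = ("brand_fit", "colour_accuracy", "anatomy_prop_integrity", "marketplace_spec")
THRESHOLD = 0.8  # illustrative; a real gate would be tuned per channel


@dataclass
class GradeReport:
    scores: dict[str, float]

    @property
    def passed(self) -> bool:
        # Marketplace spec is a hard gate: a spec failure never ships,
        # no matter how the frame scored on the other axes.
        if self.scores["marketplace_spec"] < THRESHOLD:
            return False
        return mean(self.scores.values()) >= THRESHOLD


report = GradeReport({"brand_fit": 0.95, "colour_accuracy": 0.90,
                      "anatomy_prop_integrity": 0.85, "marketplace_spec": 0.40})
print(report.passed)  # False: fails the spec gate despite strong brand scores
```

A failing report routes the asset back to render; a passing one travels with the asset so the art director sees the scores.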
Implementation phases
Phase 1 — render-then-grade only. Five weeks. We started with a render agent and a grader, no analysts. The seller pasted a long brief, we generated, the grader scored, the art director approved. It worked for one or two SKUs at a time. It was unworkable above ten — the seller couldn’t keep providing detailed briefs, and the prompts varied enough between briefs that drift came back.
Phase 2 — analysts in front. Four weeks. We added the shop, product and visual-reference analysts. The seller now submits raw shop and product data plus a reference set; the analysts brief the shoot. This phase was where category-leak between analysts surfaced — the product analyst sometimes editorialised on brand, the shop analyst commented on product specs. Tier-01 prompts got tighter on what each agent does and does not do.
Phase 3 — anchor-first. Three weeks. We had been generating each shot as a fresh negotiation with the model. We refactored to render the hero first, gate it, then render every other shot with the approved hero as a reference image plus the variant prompt. Drift dropped sharply. This was the highest-leverage change in the whole build.
Phase 4 — channel registry. Three weeks. Per-channel rules — Amazon vs Shopee vs TikTok Shop vs Shopify — moved from being baked into prompts to being a registry the shop analyst reads. Adding a new channel is now a config change, not a prompt rewrite. We added Tokopedia and Lazada in this phase against the registry, not against new prompts.
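The registry idea is simple enough to show directly — the specific rule values below are illustrative placeholders, not the real channel specs:

```python
# Per-channel rules live in data, not prompts. Adding a channel is a new row,
# not a prompt rewrite. (Values here are illustrative, not the real specs.)
CHANNEL_REGISTRY = {
    "amazon":  {"first_image": "pure_white_bg",   "overlays": False, "aspect": "1:1"},
    "shopee":  {"first_image": "square",          "overlays": True,  "aspect": "1:1"},
    "tiktok":  {"first_image": "motion_ready",    "overlays": True,  "aspect": "9:16"},
    "shopify": {"first_image": "premium_minimal", "overlays": False, "aspect": "1:1"},
}


def channel_rules(channel: str) -> dict:
    # The shop analyst reads this instead of carrying channel lore in its prompt.
    return CHANNEL_REGISTRY[channel]
```

Adding Tokopedia or Lazada under this scheme is one more dictionary entry.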
Phase 5 — refusal list. Two weeks. The “won’t ship” rules — no invented certifications, no erased product text, no hero shots from behind, no fake testimonial headshots — moved into tier-04 shared memory and surfaced on the art director’s review screen as positive checks. The grader runs them; the art director sees the verdict; refusals are on the record.
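A sketch of how the refusal list surfaces as positive checks — rule identifiers and the verdict shape are my assumptions for illustration:

```python
# The "won't ship" rules, surfaced as positive checks on the review screen.
REFUSALS = (
    "no_invented_certifications",
    "no_erased_product_text",
    "no_hero_from_behind",
    "no_fake_testimonial_headshots",
)


def refusal_verdict(violations: set[str]) -> dict[str, bool]:
    # True means the rule held; any False is on the record before sign-off.
    return {rule: rule not in violations for rule in REFUSALS}


verdict = refusal_verdict({"no_erased_product_text"})
print(verdict["no_erased_product_text"])  # False: product text was erased
```

The grader fills in the violations; the art director sees every rule with its verdict rather than an absence of warnings.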
Total: about four months. The anchor-first change in Phase 3 was roughly a week of actual refactoring inside that three-week phase, and it lifted everything that came after.
What I’d change next time
Three.
Build the anchor-first pattern from day one. We learned anchor-first the hard way. For the next render-heavy crew, the anchor is the first abstraction. Every render is “render against anchor X with variant prompt Y” — there is no “render this prompt fresh” path in the API.
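What “no fresh-render path in the API” looks like at the type level — a hypothetical signature, not the real one:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    image_id: str


def render_variant(anchor: Anchor, variant_prompt: str) -> dict:
    # The only render entry point: every call must name its anchor.
    # There is deliberately no render(prompt) overload to drift through.
    return {"anchor": anchor.image_id, "prompt": variant_prompt}


shot = render_variant(Anchor("hero-001"), "lifestyle, kitchen counter, morning light")
```

Making the anchor a required positional argument means drift-prone call sites fail at review time, not at SKU twenty.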
The brand reference set is a tier-04 artefact, not a prompt. Some early implementations passed the reference set inline with each prompt. The reference set is a shared resource of the crew, read by analysts and render agent both, written by the seller and the art director only. It belongs in shared memory with the same write-protection as the registry on the BD crew.
Channel-spec is a config registry, not prompts. Marketplace conventions — aspect ratios, overlay rules, white-bg compliance — change. Prompts that bake these in age badly. A registry is one row to update; a prompt rewrite is a quality risk every time. We moved here in Phase 4; we should have started here.
The shape ports. Eight roles for e-commerce imagery becomes a different number for catalogue photography, food photography, real-estate listings, lookbook content. The anchor-first pattern, the analyst-then-direct pipeline, the four-axis grader, the human-as-only-signer — those hold across surfaces. The Image Crew is the most-instrumented version we have running, and the one I’d reach for first when somebody asks “how do you keep two thousand SKUs in lock with one brand?”
— Timothy Mo, wGrow