Case study · April 2026
Filemender — a B2B SaaS where Claude agents run growth.
Eight scheduled Claude agents. Five blog posts a week. 50–100 cold emails a day. Hundreds of researched leads. Built solo, runs unattended.
The product
Filemender is a web-based SaaS for validating, analysing, and repairing corrupted or non-compliant media files. It's built for post-production studios, ad agencies, VFX houses, and audio engineers — anyone whose job involves video, audio, image, or document files where the wrong codec, frame rate, or naming convention will get a delivery rejected by a network or platform.
The core workflow: upload a file, Filemender runs it through a handler pipeline, identifies what's wrong (codec issues, corruption, spec violations, naming convention failures), and either flags the problems in a detailed QC report or attempts a repair. Pricing is credit-based across four tiers — Starter (30 credits/month), Pro (200), Agency (1,000), Enterprise (5,000) — and agencies can stand up branded upload portals for their own clients to submit files directly.
Stack: Vue 3 on the frontend, FastAPI + Celery + Redis on the backend, PostgreSQL via Supabase, DigitalOcean Spaces for object storage, Stripe for billing, Resend for transactional email. I built and shipped it solo.
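To make the handler-pipeline idea concrete, here is a minimal sketch of the dispatch pattern in Python. The handler names and rules are toy examples; the production pipeline runs far more checks across video, audio, image, and document formats.

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    code: str        # e.g. "BAD_NAMING", "WRONG_CONTAINER"
    detail: str
    repairable: bool

@dataclass
class QCReport:
    filename: str
    issues: list[Issue] = field(default_factory=list)

def check_naming(filename: str) -> list[Issue]:
    # Toy rule: no spaces or uppercase in a deliverable filename.
    if " " in filename or filename != filename.lower():
        return [Issue("BAD_NAMING", "spaces or uppercase in filename", repairable=True)]
    return []

def check_container(filename: str) -> list[Issue]:
    # Toy rule: this hypothetical delivery spec only accepts .mov or .mxf.
    if not filename.endswith((".mov", ".mxf")):
        return [Issue("WRONG_CONTAINER", "expected .mov or .mxf", repairable=False)]
    return []

HANDLERS = [check_naming, check_container]

def run_pipeline(filename: str) -> QCReport:
    # Each handler contributes its findings to one QC report.
    report = QCReport(filename=filename)
    for handler in HANDLERS:
        report.issues.extend(handler(filename))
    return report
```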
The problem this case study is about
Building the product is half the job. The other half is finding and converting customers — and for a solo-founded B2B SaaS with no marketing budget, that's where most projects quietly die. I had a few options: hire a marketing agency I couldn't afford, do it all manually and never have time to ship product, or build an AI-powered growth stack and treat the whole marketing function as an engineering problem.
I picked option three. This case study is about what I built, what's running, what I got wrong, and what I'd change if I were building the same thing for a client.
Architecture
The growth stack runs as a set of scheduled Claude agents orchestrated through Cowork (Anthropic's desktop scheduling layer for Claude). Each agent has a single, narrow job. They share state through the Filemender database and a small set of structured artifacts — lead tables, draft queues, content calendar entries. Nothing is "one big agent." That pattern is brittle, expensive, and impossible to debug. Everything here is small, monitored, and individually replaceable.
- Blog writer: long-form SEO articles, 3×/week. Opus drafts, Haiku fact-checks.
- Lead researcher: 25–30 new UK prospects/week. Haiku shortlists, Opus qualifies.
- Email drafter: personalised 3-email cold sequences, sector-tailored.
- Email sender: Python cron via Resend, 25/day max. Deterministic, no LLM.
- LinkedIn writer: 5 founder-voice posts/week. Voice trained on labelled samples.
- LinkedIn poster: Chrome automation, jittered timing, peak engagement windows.
- LinkedIn DMs: personalised openers referencing posts, job changes, mutuals.
- Twitter monitor: searches for venting about file corruption, queues replies.
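To show what "small and individually replaceable" looks like in practice, here is a minimal sketch of the registry pattern in Python. The schedules, model labels, and field names are illustrative only; this is not Cowork's actual configuration format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    name: str           # one narrow job per agent
    model: str          # which Claude model the agent calls ("none" = no LLM)
    schedule: str       # cron-style schedule (illustrative values)
    needs_review: bool  # True if output waits in a human approval queue

AGENTS = [
    AgentSpec("blog_writer",     "opus",  "0 7 * * 1,3,5",     needs_review=True),
    AgentSpec("lead_researcher", "haiku", "0 6 * * 1-5",       needs_review=False),
    AgentSpec("email_drafter",   "opus",  "0 8 * * 1-5",       needs_review=True),
    AgentSpec("email_sender",    "none",  "*/30 9-17 * * 1-5", needs_review=False),
    # ...one entry per agent; replacing or pausing one never touches the others.
]
```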
What's running today
- 5 blog posts published per week to filemender.com/blog
- 50–100 cold emails per day through Resend, paced to protect deliverability
- 3 LinkedIn posts per week, plus around 50 DMs
- Hundreds of newly researched leads added to the database every week
Everything runs on a schedule. Nothing requires me to be online. I check the queues daily for anything that needs human approval — anything outbound that goes out in my name has a review gate — and the rest happens on its own.
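Below is a minimal sketch of how the review gate and daily cap fit together on the sending side. fetch_approved_drafts and send_via_resend are hypothetical stand-ins for the drafts-table query and the Resend call, and the cap and delay values are illustrative.

```python
import random
import time

DAILY_CAP = 25  # illustrative ceiling per sending identity, to protect deliverability

def fetch_approved_drafts(limit: int) -> list[dict]:
    """Return only drafts a human has already approved in the review queue."""
    return []  # placeholder: replace with a real query against the drafts table

def send_via_resend(draft: dict) -> None:
    """Hypothetical thin wrapper around the Resend send call."""
    raise NotImplementedError

def run_daily_send() -> None:
    for sent, draft in enumerate(fetch_approved_drafts(limit=DAILY_CAP), start=1):
        send_via_resend(draft)
        if sent >= DAILY_CAP:
            break
        # Jitter between sends so the outbound pattern doesn't look machine-paced.
        time.sleep(random.uniform(60, 300))
```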
Three things I'd build differently for a client
This section matters more than the architecture. Most case studies are puff. Here's where the first version was wrong.
Lesson 01
Model selection: I started with Opus for everything. I shouldn't have.
The first version of the stack ran Opus across every agent. Within a week the API bill was uncomfortable enough to make me audit. About 70% of the work — initial shortlisting, headline generation, simple categorisation, sentiment checks — was being done at premium prices when Haiku would do it indistinguishably. I kept Opus for jobs where output quality was judged by a human (final email drafts, blog post bodies, LinkedIn posts) and migrated everything else to Haiku. Costs dropped by roughly an order of magnitude and I couldn't tell the difference in output quality.
For a client, I'd build the cost monitoring in from day one — per-agent budget caps, alerts on anomalies, a weekly spend report. I didn't, and got a surprise bill that taught me to.
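For illustration, here is a minimal sketch of what per-agent routing and budget caps can look like. The model labels, cap values, and agent names are placeholders, not figures from the Filemender stack.

```python
from collections import defaultdict

# Route premium models only to output a human will judge; cheap models elsewhere.
MODEL_FOR_AGENT = {
    "blog_writer":     "opus",
    "email_drafter":   "opus",
    "lead_shortlist":  "haiku",   # shortlisting and categorisation
    "sentiment_check": "haiku",
}

# Illustrative daily caps per agent, in dollars.
DAILY_BUDGET_USD = {
    "blog_writer": 5.0, "email_drafter": 3.0,
    "lead_shortlist": 1.0, "sentiment_check": 0.5,
}

_spend_today: dict[str, float] = defaultdict(float)

def record_spend(agent: str, cost_usd: float) -> None:
    """Track spend per agent and stop the agent rather than run up the bill."""
    _spend_today[agent] += cost_usd
    if _spend_today[agent] > DAILY_BUDGET_USD[agent]:
        raise RuntimeError(f"{agent} exceeded its daily budget; alert and pause")
```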
Lesson 02
Hallucinated leads: the first version of the lead researcher invented contact details.
When you ask an LLM to find a CTO's email address, it will sometimes confidently make one up. The first version of the lead researcher produced lists with a non-trivial percentage of fabricated emails, fabricated job titles, and, on at least one occasion, an entirely fabricated company. Sending cold email to fabricated addresses ruins your sender reputation immediately.
Fix: every claim the LLM makes about a prospect — email, role, company URL — gets verified against an external source before it lands in the leads table. Email goes through a verification API. Company URL gets a real-time fetch. Job title gets cross-referenced against LinkedIn. Anything that fails verification gets dropped or flagged for manual review. The agent went from a creative writer back to a researcher.
This is the lesson I'd carry into any LLM integration: you don't trust the LLM with anything that can break the world downstream. Treat its output as a candidate, verify it, then accept or reject.
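As a sketch of that candidate-then-verify shape, here is roughly what the acceptance gate looks like in Python. verify_email and title_matches_linkedin are hypothetical stand-ins for an email-verification API and a LinkedIn cross-check; only the URL check is shown end to end.

```python
import requests

def url_resolves(url: str) -> bool:
    """The claimed company URL only counts if it actually responds."""
    try:
        return requests.head(url, timeout=10, allow_redirects=True).status_code < 400
    except requests.RequestException:
        return False

def verify_email(address: str) -> bool:
    """Hypothetical call to an email-verification / deliverability API."""
    raise NotImplementedError

def title_matches_linkedin(lead: dict) -> bool:
    """Hypothetical cross-reference of the claimed role against LinkedIn."""
    raise NotImplementedError

def accept_or_reject(lead: dict) -> str:
    """LLM output is a candidate; it only reaches the leads table after checks."""
    if not url_resolves(lead["company_url"]) or not verify_email(lead["email"]):
        return "rejected"
    if not title_matches_linkedin(lead):
        return "needs_manual_review"
    return "accepted"
```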
Lesson 03
Cold email quality: the first version was templated personalisation that fooled no one.
The first email drafter used the kind of personalisation that's everywhere in cold outreach now: "{{first_name}}, I saw you work at {{company}}…" The reply rate was awful. Worse, several recipients replied just to point out the email was obviously generated.
V2 changed the order of operations. Before the drafter writes anything, a research step actually reads the prospect's website, recent LinkedIn posts, and any recent press, then writes a short paragraph of what's specifically interesting about this company right now. The drafter writes the email with that paragraph as context. Reply rates improved meaningfully. More importantly, the emails read like they were written by someone who'd actually looked at the company — because in a way, they were.
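A minimal sketch of that order of operations, using the Anthropic Python SDK. The prompts are condensed, the model is left as a parameter, and fetch_prospect_context is a hypothetical stand-in for the scraping step.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def fetch_prospect_context(prospect: dict) -> str:
    """Hypothetical scraping step: site copy, recent LinkedIn posts, press."""
    raise NotImplementedError

def research_brief(prospect: dict, model: str) -> str:
    """Step 1: read about the prospect, write a short 'why this company now' paragraph."""
    raw = fetch_prospect_context(prospect)
    msg = client.messages.create(
        model=model, max_tokens=400,
        messages=[{"role": "user", "content":
            f"In one short paragraph, what is specifically interesting about "
            f"{prospect['company']} right now?\n\n{raw}"}])
    return msg.content[0].text

def draft_email(prospect: dict, brief: str, model: str) -> str:
    """Step 2: the drafter writes with the research paragraph as context."""
    msg = client.messages.create(
        model=model, max_tokens=600,
        messages=[{"role": "user", "content":
            f"Write a short, plain cold email to {prospect['name']} at "
            f"{prospect['company']}. Ground it in this brief, no template filler:\n\n{brief}"}])
    return msg.content[0].text
```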
The lesson for clients: most "AI personalisation" is templated mad-libs and prospects can tell. If you're going to use LLMs for outbound, the LLM has to be doing the work a human would do — reading and thinking about the recipient — not slotting variables into a template.
What this maps to for B2B SaaS
Most companies I talk to don't need an LLM-powered product feature. They need an LLM-powered internal workflow: lead research, content generation, support triage, document parsing, reporting, customer onboarding emails. The pattern that works is the one I used for Filemender's growth stack — small narrow agents doing one job each, with deterministic verification layers around anything that touches the world. Not one giant agent. Not raw LLM output piped straight to production.
If you're building this kind of stack for the first time, the things that will bite you are model cost, hallucination on anything factual, and the gap between "looks impressive in a demo" and "still works for the hundredth unattended run." I've hit all three in production and have a strong opinion on how to engineer around them. That's the work I'm available to do.
Want this for your product?
Book a 15-min intro call.