Harry Benham.

AI Integration Engineer · UK · Booking now

I ship AI features into your business.

Already scoped an AI feature, or want to pick one off the menu of 50+? Either way: fixed price, shipped in 1–4 weeks, yours to keep (or I'll host it for you).

Production, not demos

I built and run my own B2B SaaS. I know what AI work actually costs — and how it breaks.

Start where you are

Roadmap if you're stuck. Sprint if you've scoped it. Either way, fixed price.

You own it after

Code, prompts, AI agents — maintained by me or handed over so your team runs it without me.

Try it · paste a URL or describe your business

Paste any company's homepage URL. I'll read the page and propose one AI feature I'd build for that team — in about 8 seconds.

You'll get back

01

A one-paragraph read on what your team does

02

One AI feature I'd build for you

03

Which package it fits — Roadmap / Sprint / Build-out

04

Approach sketch — model, data, risks

01 / About

I'm Harry. I build the AI features that actually ship.

Seven years as a software engineer across finance, regulated industries, analytics, and telecoms. I work in Python (FastAPI, Django, Flask) and TypeScript (Node, Next.js), and I've spent the last year specialising in AI features, agents, and workflows that ship into production — not demos.

Most of my clients don't have a dedicated AI team. They're law firms drowning in document review, agencies running outbound by hand, e-commerce ops teams triaging support tickets, accountants categorising invoices, founders shipping their first AI feature. They know AI should help. They don't know where to start. That's the gap I sit in — translating “we should do something with AI” into a working tool their team uses on Monday.

I also built Filemender, a B2B SaaS for post-production teams, solo and end to end, and I still run it. The growth stack you can read about below runs my marketing function. I know what production AI work actually costs, where it breaks, and how to engineer around the failure modes, because I've hit them in my own product.

I work remotely across European business hours, take on one client at a time, and I'd rather turn down work than over-promise on a timeline. If we're a fit, I'll tell you. If we're not, I'll point you at someone who is.

Models & agents I build with · default Claude

  • Claude · Anthropic · DEFAULT
  • GPT · OpenAI · BENCHMARK
  • Gemini · Google · BENCHMARK
  • Grok · xAI · BENCHMARK
  • Llama · Meta (open source) · ON-PREM
  • DeepSeek · DeepSeek (open source) · ON-PREM
  • Cursor · Coding agent · AGENT
  • ClickUp · Workflow agent · AGENT

Claude is the default for production work (evals, long-context, tool use). I benchmark against GPT, Gemini, and Grok per project, and route to open-source Llama or DeepSeek for on-prem or cost-sensitive workloads. Day-to-day I work inside Cursor. Model choice is part of the deliverable, not an assumption.

02 / Recent work
Live preview (looped) · 8 scheduled Claude agents shipping into one business · Haiku for cheap triage · Sonnet for extraction + drafting · eval suite green · cost cap £100/mo · verified, 0 fabricated

Filemender — a B2B SaaS where Claude agents run growth.

I built and operate Filemender, a media file validation and repair tool used by post-production studios and agencies. The interesting part isn't the product — it's the AI agents and workflows running on top of it.

Eight scheduled Claude agents handle SEO content, lead research, cold email drafting, LinkedIn posting, and Twitter monitoring. The whole marketing function runs as engineering — and the case study walks through what I got right, what I got wrong, and what I'd build differently for a client.

Case study · ~12 min read

Filemender — eight Claude agents running growth

Architecture, costs, what broke, and what I'd build differently for a client.

8

Scheduled Claude agents in production, each with a single narrow job

~10×

Cost reduction after migrating routine work from Opus to Haiku

0

Fabricated leads after adding the verification layer

What a user said

“FileMender saved us from a disastrous broadcast rejection on a Friday night. It fixed the loudness levels automatically in seconds.”
James D.

Post-production lead

03 / Other work

Smaller builds, same playbook.

Document review · 10 days

Contract-clause extraction for a law firm

Replaced a paralegal workflow that took 3 hours per contract. Claude reads the PDF, extracts 18 named clauses into a structured review table, and flags anything unusual. Now runs in under 90 seconds per doc.

Claude Opus · PDF parsing · Postgres

Classification · 10 days

Inbound-lead triage for a B2B agency

Replaced a hand-written rules engine with a two-model classifier: Haiku for cheap triage, Sonnet for ambiguous cases. Cut ops time per lead from 4 minutes to 18 seconds.

Claude Haiku · Claude Sonnet · Zapier

Agents · ongoing

Outbound agent for a pre-seed founder

Daily agent that researches ICP accounts, writes a first-draft cold email, and queues it for the founder's review. Runs on under £30/mo of model spend.

Claude Sonnet · Playwright · Cron
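The inbound-lead triage build above sends every lead to a cheap model first and only escalates the ambiguous ones. A minimal sketch of that routing in Python, assuming each model call is wrapped to return a label plus a confidence score; the `fake_haiku` and `fake_sonnet` stubs, the labels, and the 0.8 threshold are hypothetical placeholders, not the production classifier or a real SDK call:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    label: str          # e.g. "qualified", "nurture", "reject" (hypothetical labels)
    confidence: float   # 0.0-1.0, as returned by the model wrapper
    model: str          # which tier produced the verdict: "cheap" or "strong"


# A tier is any callable that takes the lead text and returns (label, confidence).
# In production these would wrap Haiku and Sonnet calls; here they are stubs.
Classifier = Callable[[str], tuple[str, float]]


def triage(lead: str, cheap: Classifier, strong: Classifier, threshold: float = 0.8) -> Verdict:
    """Run the cheap model first; escalate to the stronger model only when unsure."""
    label, confidence = cheap(lead)
    if confidence >= threshold:
        return Verdict(label, confidence, "cheap")
    label, confidence = strong(lead)
    return Verdict(label, confidence, "strong")


# Hypothetical stand-ins for the two model calls.
def fake_haiku(lead: str) -> tuple[str, float]:
    if "budget" in lead.lower():
        return ("qualified", 0.95)
    return ("nurture", 0.55)  # low confidence, so triage() escalates


def fake_sonnet(lead: str) -> tuple[str, float]:
    return ("reject" if "student" in lead.lower() else "nurture", 0.9)


if __name__ == "__main__":
    print(triage("We have budget for Q3", fake_haiku, fake_sonnet))
    # -> Verdict(label='qualified', confidence=0.95, model='cheap')
    print(triage("I'm a student researching agencies", fake_haiku, fake_sonnet))
    # -> Verdict(label='reject', confidence=0.9, model='strong')
```

The threshold is the cost lever: raise it and more leads escalate to the stronger (pricier) model; lower it and more stay on the cheap tier.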

More case studies on request · Ask to see one ↓

04 / How I work

Three packages. Fixed scope. Start where you are.

Most AI work that fails in real businesses fails for the same reasons: no clear scope, no evals, runaway costs, no handover. I package the work to remove those failure modes upfront — and I'll start with a roadmap if you don't yet know what to build.

▸ Every Sprint & Build-out ships with the learning kit so your team can run it without me

▸ See the full AI Agent Menu (PDF) — 50+ agents I can ship, tagged by package and time

Start here

AI Roadmap

£1,500

1 week · ship one small agent

Best for: you want to see AI working in your business this week. Pick a small agent off the menu and I'll ship it. If you genuinely don't know what you need yet, swap it for a strategy plan instead.

  • 1 Small agent shipped end-to-end (default)
  • OR a 5–8 page strategy plan with 3 ranked use cases
  • Eval suite + handover doc included
  • 14 days of bug-fix support
  • 100% credited toward a Sprint if you proceed

Roadmap examples from the menu

Most popular

AI Sprint

£3,000

2 weeks · 1 medium OR 2 small

Best for: a specific AI agent you've scoped, or a pair of smaller ones. Most of the menu lives here — contract clause extraction, lead qualifier, support triage, CRM enrichment, that kind of thing.

  • 1 Medium agent OR 2 Small agents shipped
  • Eval suite covering the critical paths
  • Cost monitoring with per-feature spend caps
  • Plain-English handover doc — model, prompts, failure modes
  • 30 days of bug-fix support after handover
Or grab the agent menu (PDF) ↓

Sprint examples from the menu

  • A2 · Contract clause extraction
  • A5 · Form auto-fill from documents
  • A6 · Compliance / risk reviewer
  • A7 · Foreign-document translator + summary
  • See all 50+ in the menu →
Bigger scope

AI Build-out

£6,000

3–4 weeks · 2 medium OR 1 large

Best for: a connected multi-agent setup or one ambitious agent. Two Medium agents bundled, OR one Large from the menu (e.g. account health scoring, anomaly detection). Narrow and deep, not sprawling.

  • 2 Medium agents OR 1 Large agent shipped
  • Evals at every junction, not just the output
  • Cost dashboards your finance team can read
  • Full handover: code, infra, runbooks, training session
  • 60 days of bug-fix support after handover

Build-out examples from the menu

  • C6 · New-hire onboarding bot
  • D5 · Anomaly detector + plain-English alerts
  • G2 · Expansion / upsell signal detector
  • G3 · Account health scoring
  • See all 50+ in the menu →

▸ Care subscription · £50/mo per agent · most clients pick this

I host the agent on my standardised stack, monitor it 24/7, and fix bugs free. New requests are quoted separately as a mini-Sprint. Cancel any month. Best for teams without an in-house tech function — which is most of them.

05 / The learning kit

No black box. You own what I ship.

Every Sprint and Build-out ships with a full handover kit — six concrete artifacts that document everything I built, the decisions behind it, and how to run it. Whether your team runs the agent (Self-host) or I run it for you (Care), the IP, the code, and the prompts are yours from day one.

Included with every Sprint & Build-out · yours regardless of how it's hosted

01

Architecture walkthrough

30-minute Loom recorded during handover. Every decision explained — model choice, routing, fallbacks, why this and not that.

02

Annotated prompt library

Every prompt in the system with comments on why it's structured that way. Lives in your GitHub if you Self-host, or your shared docs if I run it on Care. When it drifts, you know which knob to turn.

03

Plain-English runbook

Step-by-step for the day-to-day ops: re-triggering a job, checking spend, rolling back a prompt, adding a new case. Yours to follow if you Self-host; mine to follow if you've added Care.

04

Failure-mode playbook

What breaks, what the signal looks like, what to do. Every hallucination class and edge case hit during build — documented, not hidden.

05

Eval suite, runnable

The test harness lives in your repo. Change a prompt or swap a model? Re-run in one command, see the delta, ship with confidence.

06

Post-handover support

14 days on a Roadmap, 30 on a Sprint, 60 on a Build-out. If anything stops the agent doing what the SOW says it should, I fix it at no cost to you. Care extends this indefinitely (£50/mo per agent).
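The runnable eval suite (item 05) boils down to fixed cases, one command, and a diff between runs. A minimal sketch of that loop, assuming exact-match grading for simplicity; the `Case` shape, the currency-detection cases, and the `old_prompt` / `new_prompt` stand-ins are all hypothetical, and real suites often need fuzzier or model-graded checks:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    name: str       # stable ID so results can be compared across runs
    text: str       # input fed to the system under test
    expected: str   # required answer (exact match, case-insensitive)


def run_suite(system: Callable[[str], str], cases: list[Case]) -> dict[str, bool]:
    """Run every case once and record pass/fail by case name."""
    return {c.name: system(c.text).strip().lower() == c.expected.lower() for c in cases}


def delta(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Cases that flipped between two runs, e.g. after a prompt or model swap."""
    return {
        "fixed": sorted(n for n in after if after[n] and not before.get(n, False)),
        "broken": sorted(n for n in after if not after[n] and before.get(n, False)),
    }


# Hypothetical stand-ins for "the agent with the old prompt" and "with the new prompt".
def old_prompt(text: str) -> str:
    return "gbp" if "£" in text else "usd"


def new_prompt(text: str) -> str:
    if "£" in text:
        return "gbp"
    return "eur" if "€" in text else "usd"


CASES = [
    Case("gbp", "Invoice total £120", "gbp"),
    Case("eur", "Invoice total €90", "eur"),
    Case("usd", "Invoice total $45", "usd"),
]

if __name__ == "__main__":
    print(delta(run_suite(old_prompt, CASES), run_suite(new_prompt, CASES)))
    # -> {'fixed': ['eur'], 'broken': []}
```

The "broken" list is the one to watch: a prompt change that fixes one case but silently breaks another shows up there before it ships.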

06 / Questions I get a lot

Before the call.

The answers most people want before they'll spend 15 minutes with a freelancer. If yours isn't here, just ask it on the call.

We don't really know what we want to build yet. Is it too early to talk?

No — that's exactly what the AI Roadmap is for. Or, if anything in the Agent Menu (harrybenham.dev/menu) caught your eye, we can ship that as the Roadmap deliverable instead. 100% credited toward a Sprint if you proceed.

How much will the AI itself cost to run each month?

Most agents on the menu run on £20–£100/mo of API + hosting. Per-feature spend caps go into every Sprint, so a runaway loop can't run up a surprise bill on your card. You get a costed estimate per use case on the Roadmap.
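A per-feature spend cap can be as simple as a running total that refuses the next call once the budget would be exceeded. A minimal sketch, assuming you can read token counts off each model response and know your per-million-token prices; `SpendGuard`, `estimate_cost_gbp`, and the prices in the usage lines are illustrative, not a real price list or library:

```python
from dataclasses import dataclass


class SpendCapExceeded(RuntimeError):
    """Raised when a call would push a feature over its monthly budget."""


@dataclass
class SpendGuard:
    cap_gbp: float            # monthly cap for one feature, e.g. 100.0
    spent_gbp: float = 0.0    # running total for the current month

    def charge(self, cost_gbp: float) -> None:
        """Record a call's cost, refusing it if the cap would be exceeded."""
        if self.spent_gbp + cost_gbp > self.cap_gbp:
            raise SpendCapExceeded(
                f"cap £{self.cap_gbp:.2f} reached (spent £{self.spent_gbp:.2f})"
            )
        self.spent_gbp += cost_gbp


def estimate_cost_gbp(input_tokens: int, output_tokens: int,
                      in_per_mtok: float, out_per_mtok: float) -> float:
    """Cost of one call from token counts and per-million-token prices."""
    return (input_tokens * in_per_mtok + output_tokens * out_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Placeholder prices, not real rates: a £100/mo cap for one feature.
    guard = SpendGuard(cap_gbp=100.0)
    guard.charge(estimate_cost_gbp(2_000, 500, in_per_mtok=1.0, out_per_mtok=5.0))
    print(f"spent so far: £{guard.spent_gbp:.4f}")
```

In a real deployment the running total would need to live somewhere durable (a database row keyed by feature and month, say) so it survives restarts; an in-memory counter is just the smallest version that shows the idea.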

Where does the agent live after we ship?

Default for most clients is Care (£50/month per agent): I host the agent on my standardised stack, monitor 24/7, fix bugs free, you never log into a hosting dashboard. New requests are quoted separately as a mini-Sprint. Cancel any month. If you have an in-house tech team and prefer Self-host, that's free — you own everything from day 30.

What does “fixed price” actually mean if the scope changes?

Scope is locked in writing on Day 0 — what's in, what's out, the eval set, success criteria. New features mid-build get quoted as a small add-on Sprint, never billed surprise-style. If I under-quoted, that's my problem, not yours.

What if the output is wrong or the model hallucinates?

The single most important question — which is why evals are non-negotiable in every package. I build a test suite covering the critical paths. For high-stakes outputs (legal, financial, customer-facing) I add a verification layer or a human-in-the-loop step.
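For extraction work, one common shape for that verification layer is to make the model return a supporting quote with every value, then check the quote against the source before trusting it. A minimal sketch under that assumption; the `Extraction` shape and the whitespace-insensitive substring check are illustrative choices, not the production design, and anything that fails is routed to a human rather than silently dropped:

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    name: str    # e.g. "notice_period"
    value: str   # what the model extracted
    quote: str   # the passage the model says supports the value


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so PDF line breaks don't cause false alarms."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify(items: list[Extraction], source: str) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into (grounded, needs_review).

    Grounded means the quote appears verbatim in the source (ignoring whitespace
    and case) and the extracted value appears inside that quote.
    """
    src = _normalise(source)
    grounded: list[Extraction] = []
    needs_review: list[Extraction] = []
    for item in items:
        quote = _normalise(item.quote)
        ok = bool(quote) and quote in src and _normalise(item.value) in quote
        (grounded if ok else needs_review).append(item)
    return grounded, needs_review


if __name__ == "__main__":
    contract = "12. Termination.\nEither party may terminate on 30 days'\nwritten notice."
    real = Extraction("notice_period", "30 days", "terminate on 30 days' written notice")
    invented = Extraction("notice_period", "90 days", "terminate on 90 days' written notice")
    passed, review = verify([real, invented], contract)
    print([e.value for e in passed], [e.value for e in review])
    # -> ['30 days'] ['90 days']
```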

Do you need access to our codebase?

Almost never. Most menu agents run in standard channels (web chat, email, WhatsApp, Slack, your CRM via API) and don't touch your product code. If something does need deeper integration, I work alongside your engineers — you stay in control of your codebase.

What if our team has zero technical capability?

That's the most common case and exactly what Care is built for. £50/month per agent, I host and monitor it, you never touch infrastructure. New work gets quoted separately so there are no surprises. The agent just works — same way you don't manage your own email server.

Are our documents and data safe? NDA?

Yes on both. Happy to sign an NDA before discovery. Client data never leaves your infrastructure unless you explicitly want it to. I route model calls through enterprise endpoints with zero data retention where requested. UK GDPR DPA available on request.

07 / Let's talk

Pick a time that works.

Fifteen minutes, no prep needed. Tell me what you're trying to build (or that you don't know yet), and I'll tell you whether I'm a fit, what it would cost, and how long it would take.

  • No slide deck

    Just a conversation. Camera optional.

  • No hard sell

    If we’re not a fit, I’ll say so on the call.

  • No follow-up spam

    One email after with notes. Then it’s on you.