AI Integration Engineer · UK · Booking now

I ship AI features
into your business.

Already scoped an AI feature, or want to pick one off the menu of 50+? Either way — fixed price, shipped in 1–3 weeks, yours to keep (or I'll host it for you).

See the AI Agent Menu · 50+ agents · PDF

▸ Production, not demos

I built and run my own B2B SaaS. I know what AI work actually costs — and how it breaks.

▸ Start where you are

Roadmap if you're stuck. Sprint if you've scoped it. Either way, fixed price.

▸ You own it after

Code, prompts, AI agents — maintained by me or handed over so your team runs it without me.

Try it · paste a URL or describe your business

Paste any company's homepage URL. I'll read the page and propose one AI feature I'd build for that team — in about 8 seconds.

You'll get back

A one-paragraph read on what your team does

One AI feature I'd build for you

Which package it fits — Roadmap / Sprint / Build-out

Approach sketch — model, data, risks

01 / About

Harry Benham — AI integration engineer for B2B SaaS

↓ Download · 8 pages · PDF

The Harry Benham brochure

The longer story — who I am, how I work, Filemender case study, FAQ.

Grab the brochure →

I'm Harry. I build the AI features that actually ship.

Seven years as a software engineer across finance, regulated industries, analytics, and telecoms. I work in Python (FastAPI, Django, Flask) and TypeScript (Node, Next.js), and I've spent the last year specialising in AI features, agents, and workflows that ship into production — not demos.

Most of my clients don't have a dedicated AI team. They're law firms drowning in document review, agencies running outbound by hand, e-commerce ops teams triaging support tickets, accountants categorising invoices, founders shipping their first AI feature. They know AI should help. They don't know where to start. That's the gap I sit in — translating “we should do something with AI” into a working tool their team uses on Monday.

I also built and run Filemender, a B2B SaaS for post-production teams, end to end and solo. The growth stack you can read about below runs my marketing function. I know what production AI work actually costs, where it breaks, and how to engineer around the failure modes, because I've hit them in my own product.

I work remotely across European business hours, take on one client at a time, and I'd rather turn down work than over-promise on a timeline. If we're a fit, I'll tell you. If we're not, I'll point you at someone who is.

Models & agents I build with · default Claude

Claude · Anthropic · ▸ DEFAULT
GPT · OpenAI · ▸ BENCHMARK
Gemini · Google · ▸ BENCHMARK
Grok · xAI · ▸ BENCHMARK
Llama · Meta · Open-src · ▸ ON-PREM
DeepSeek · DeepSeek · OSS · ▸ ON-PREM
Cursor · Coding agent · ▸ AGENT
ClickUp · Workflow agent · ▸ AGENT

Claude is the default for production work (evals, long-context, tool use). I benchmark against GPT, Gemini, and Grok per project, and route to open-source Llama or DeepSeek for on-prem or cost-sensitive workloads. Day-to-day I work inside Cursor. Model choice is part of the deliverable, not an assumption.

02 / Recent work

8 scheduled agents · running live · model · Opus

▸ Shipping to your business

Monthly spend · £18.00

● haiku · cheap triage
● sonnet · extraction + drafting
● eval suite · green
● cost cap · £100/mo
● verified · 0 fabricated
● 8 agents · 1 function

Live · 8 scheduled Claude agents shipping into one business · looped preview

Filemender — a B2B SaaS where Claude agents run growth.

I built and operate Filemender, a media file validation and repair tool used by post-production studios and agencies. The interesting part isn't the product — it's the AI agents and workflows running on top of it.

Eight scheduled Claude agents handle SEO content, lead research, cold email drafting, LinkedIn posting, and Twitter monitoring. The whole marketing function runs as engineering — and the case study walks through what I got right, what I got wrong, and what I'd build differently for a client.

Case study · ~12 min read

Filemender — eight Claude agents running growth

Architecture, costs, what broke, and what I'd build differently for a client.

8 · Scheduled Claude agents in production, each with a single narrow job

~10× · Cost reduction after migrating routine work from Opus to Haiku

0 · Fabricated leads after adding the verification layer

What a user said

“Filemender saved us from a disastrous broadcast rejection on a Friday night. It fixed the loudness levels automatically in seconds.”

James D.

Post-production lead

03 / Other work

Smaller builds, same playbook.

Short engagements where I shipped a single AI feature end-to-end. Scope pinned, price pinned, handed over with evals.

Document review · 10 days

Contract-clause extraction for a law firm

Replaced a paralegal workflow that took 3 hours per contract. Claude reads the PDF, extracts 18 named clauses into a structured review table, and flags anything unusual. Now runs in under 90 seconds per doc.

Claude Opus · PDF parsing · Postgres

Classification · 10 days

Inbound-lead triage for a B2B agency

Replaced a hand-written rules engine with a two-model classifier: Haiku for cheap triage, Sonnet for ambiguous cases. Cut ops time per lead from 4 min to 18 seconds.

Claude Haiku · Claude Sonnet · Zapier
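
As a rough illustration of the two-model pattern described in this build (cheap triage first, stronger model for ambiguous cases), here is a minimal sketch. It is not the code shipped for the agency. The `Verdict` type, the `triage` function, the 0.8 confidence threshold, and the two stand-in classifiers are all assumptions made for the example; in a real build each classifier would wrap a model call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    label: str         # e.g. "qualified" or "unclear"
    confidence: float  # 0.0 to 1.0, as reported by the classifier


# Hypothetical interface: any callable that maps lead text to a Verdict.
Classifier = Callable[[str], Verdict]


def triage(lead: str, cheap: Classifier, strong: Classifier,
           threshold: float = 0.8) -> Tuple[Verdict, str]:
    """Try the cheap tier first; escalate only when it is unsure."""
    first = cheap(lead)
    if first.confidence >= threshold:
        return first, "cheap"
    return strong(lead), "strong"


# Stand-ins for real model calls, so the sketch runs offline.
def fake_cheap(lead: str) -> Verdict:
    if "pricing" in lead.lower():
        return Verdict("qualified", 0.95)
    return Verdict("unclear", 0.40)


def fake_strong(lead: str) -> Verdict:
    return Verdict("needs-follow-up", 0.90)


if __name__ == "__main__":
    for lead in ["Can you send pricing for 20 seats?", "hello, quick question"]:
        verdict, tier = triage(lead, fake_cheap, fake_strong)
        print(f"{tier}: {verdict.label}")
```

Returning the tier alongside the verdict makes it easy to log how often leads escalate, which is the number that drives model spend in a setup like this.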

Agents · ongoing

Outbound agent for a pre-seed founder

Daily agent that researches ICP accounts, writes a first-draft cold email, and queues it for the founder's review. Runs for <£30/mo of model spend.

Claude Sonnet · Playwright · Cron

More case studies on request · Ask to see one ↓

04 / How I work

Three packages. Fixed scope. Start where you are.

Most AI work that fails in real businesses fails for the same reasons: no clear scope, no evals, runaway costs, no handover. I package the work to remove those failure modes upfront — and I'll start with a roadmap if you don't yet know what to build.

▸ Every Sprint & Build-out ships with the learning kit so your team can run it without me

▸ See the full AI Agent Menu (PDF) — 50+ agents I can ship, tagged by package and time

Start here

AI Roadmap

£1,500

1 week · ship one small agent

Best for: you want to see AI working in your business this week. Pick a small agent off the menu and I'll ship it. If you genuinely don't know what you need yet, swap it for a strategy plan instead.

▸ 1 small agent shipped end-to-end (default)
▸ OR a 5–8 page strategy plan with 3 ranked use cases
▸ Eval suite + handover doc included
▸ 14 days of bug-fix support
▸ 100% credited toward a Sprint if you proceed

Roadmap examples from the menu

A1 · Document Q&A bot
A3 · Receipt + invoice extraction
A4 · ID document checker
B1 · Meeting notes + action items
See all 50+ in the menu →

No black box. You own what I ship.

Every Sprint and Build-out ships with a full handover kit — six concrete artifacts that document everything I built, the decisions behind it, and how to run it. Whether your team runs the agent (Self-host) or I run it for you (Care), the IP, the code, and the prompts are yours from day one.

Included with every Sprint & Build-out · yours regardless of how it's hosted

▸ 01

Architecture walkthrough

30-minute Loom recorded during handover. Every decision explained — model choice, routing, fallbacks, why this and not that.

▸ 02

Annotated prompt library

Every prompt in the system with comments on why it's structured that way. Lives in your GitHub if you Self-host, or your shared docs if I run it on Care. When it drifts, you know which knob to turn.

▸ 03

Plain-English runbook

Step-by-step for the day-to-day ops: re-triggering a job, checking spend, rolling back a prompt, adding a new case. Yours to follow if you Self-host; mine to follow if you've added Care.

▸ 04

Failure-mode playbook

What breaks, what the signal looks like, what to do. Every hallucination class and edge case hit during build — documented, not hidden.

▸ 05

Eval suite, runnable

The test harness lives in your repo. Change a prompt or swap a model? Re-run in one command, see the delta, ship with confidence.

▸ 06

Post-handover support

14 days on a Roadmap, 30 on a Sprint, 60 on a Build-out. If anything stops the agent doing what the SOW said, I fix it at my own cost. Care extends this indefinitely (£50/mo per agent).
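
To make the "re-run in one command, see the delta" idea from the eval suite item concrete, here is a minimal sketch under stated assumptions. The case format, the `run_evals` and `delta` helpers, and the two toy predictors are hypothetical and stand in for a real prompt or model; the actual harness in a handover may look quite different.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]  # (input text, expected label)


def run_evals(predict: Callable[[str], str], cases: List[Case]) -> Dict[str, bool]:
    """Run every case and record pass/fail keyed by the input text."""
    return {text: predict(text) == expected for text, expected in cases}


def delta(before: Dict[str, bool], after: Dict[str, bool]) -> Dict[str, List[str]]:
    """Compare two runs: which cases were fixed and which regressed."""
    return {
        "fixed": sorted(k for k in after if after[k] and not before.get(k, False)),
        "regressed": sorted(k for k in after if not after[k] and before.get(k, False)),
    }


# Toy eval set and two toy "prompt versions" so the sketch runs offline.
CASES: List[Case] = [
    ("invoice #12 attached", "invoice"),
    ("refund please", "support"),
    ("hi there", "other"),
]


def old_predict(text: str) -> str:
    return "invoice" if "invoice" in text else "other"


def new_predict(text: str) -> str:
    if "invoice" in text:
        return "invoice"
    if "refund" in text:
        return "support"
    return "other"


if __name__ == "__main__":
    report = delta(run_evals(old_predict, CASES), run_evals(new_predict, CASES))
    print(report)
```

Reporting regressions separately from fixes matters because a prompt change that raises the overall pass rate can still break a case that used to work.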

06 / Questions I get a lot

Before the call.

The answers most people want before they'll spend 15 minutes with a freelancer. If yours isn't here, just ask it on the call.

We don't really know what we want to build yet. Is it too early to talk?

No — that's exactly what the AI Roadmap is for. Or, if anything in the Agent Menu (harrybenham.dev/menu) caught your eye, we can ship that as the Roadmap deliverable instead. 100% credited toward a Sprint if you proceed.

How much will the AI itself cost to run each month?

Most agents on the menu run on £20–£100/mo of API + hosting. Per-feature spend caps go into every Sprint so a runaway loop can't run away with your card. You get a costed estimate per use case on the Roadmap.
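
A per-feature spend cap of the kind mentioned above could be enforced with a guard like the following. This is a simplified sketch, not the implementation used in any Sprint: the `SpendCap` class, its method names, and the idea of passing an estimated cost before each call are assumptions for illustration. Amounts are in pence to avoid floating-point rounding.

```python
class SpendCapExceeded(RuntimeError):
    """Raised when a charge would push spend past the cap."""


class SpendCap:
    """Hypothetical per-feature budget guard. Amounts are integer pence."""

    def __init__(self, monthly_cap_pence: int) -> None:
        self.cap = monthly_cap_pence
        self.spent = 0

    def charge(self, cost_pence: int) -> None:
        """Record a cost, or refuse it if it would exceed the cap."""
        if cost_pence < 0:
            raise ValueError("cost must be non-negative")
        if self.spent + cost_pence > self.cap:
            raise SpendCapExceeded(
                f"cap {self.cap}p reached: spent {self.spent}p, "
                f"refused {cost_pence}p"
            )
        self.spent += cost_pence

    def reset(self) -> None:
        """Call at the start of each billing month."""
        self.spent = 0


if __name__ == "__main__":
    cap = SpendCap(monthly_cap_pence=10_000)  # a £100/mo cap
    cap.charge(9_990)
    try:
        cap.charge(20)  # would take spend to £100.10
    except SpendCapExceeded as err:
        print("blocked:", err)
```

Checking the cap before the model call, rather than after, is what stops a runaway loop: the refused call never happens, so the worst case is bounded by the cap.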

Where does the agent live after we ship?

Default for most clients is Care (£50/month per agent): I host the agent on my standardised stack, monitor 24/7, fix bugs free, you never log into a hosting dashboard. New requests are quoted separately as a mini-Sprint. Cancel any month. If you have an in-house tech team and prefer Self-host, that's free — you own everything from day 30.

What does “fixed price” actually mean if the scope changes?

Scope is locked in writing on Day 0 — what's in, what's out, the eval set, success criteria. New features mid-build get quoted as a small add-on Sprint, never billed surprise-style. If I under-quoted, that's my problem, not yours.

What if the output is wrong or the model hallucinates?

The single most important question — which is why evals are non-negotiable in every package. I build a test suite covering the critical paths. For high-stakes outputs (legal, financial, customer-facing) I add a verification layer or a human-in-the-loop step.

Do you need access to our codebase?

Almost never. Most menu agents run in standard channels (web chat, email, WhatsApp, Slack, your CRM via API) and don't touch your product code. If something does need deeper integration, I work alongside your engineers — you stay in control of your codebase.

What if our team has zero technical capability?

That's the most common case and exactly what Care is built for. £50/month per agent, I host and monitor it, you never touch infrastructure. New work gets quoted separately so there are no surprises. The agent just works — same way you don't manage your own email server.

Are our documents and data safe? NDA?

Yes on both. Happy to sign an NDA before discovery. Client data never leaves your infrastructure unless you explicitly want it to. I route model calls through enterprise endpoints with zero data retention where requested. UK GDPR DPA available on request.

07 / Let's talk

Pick a time that works.

Fifteen minutes, no prep needed. Tell me what you're trying to build (or that you don't know yet), and I'll tell you whether I'm a fit, what it would cost, and how long it would take.

No slide deck
Just a conversation. Camera optional.
No hard sell
If we’re not a fit, I’ll say so on the call.
No follow-up spam
One email after with notes. Then it’s on you.
