How Monzo Bank Built Their Own AI Agent For Live Customer Support - Without Losing Control

Monzo's Ops Agent now handles routine work across 150+ customer intents. The lesson for deploying conversational AI in a regulated setting is in the guardrails, not the model.


Most companies talking about AI customer agents are still demoing. Monzo has shipped one into production, and a new engineering post from the bank lays out how it got there - a useful read for any CX or AI team wrestling with the gap between a clever prototype and something you can safely point at real customers.

The "Monzo Ops Agent" started life as a question-answering tool and has grown into a system that executes end-to-end operational processes across more than 150 customer intents, from routine Pot management (Pots are Monzo's ring-fenced spaces for setting money aside, used much like savings accounts) to fraud investigations. The headline result: A 10-percentage-point improvement in resolution rate for transaction-related queries once the team gave the agent intent-gated access to customer context.
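To make "intent-gated access" concrete, here is a minimal sketch of one way it could work, assuming the agent's detected intent is mapped to an allowlist of customer-context fields. The intent names, field names and the `gated_context` helper are hypothetical, not taken from Monzo's post. The point is the deny-by-default shape: an intent the system does not recognise sees no customer data at all.

```python
from dataclasses import dataclass

# Hypothetical allowlist: which customer-context fields each intent may read.
# Intent and field names are illustrative, not Monzo's.
CONTEXT_ALLOWLIST: dict[str, frozenset[str]] = {
    "transaction_query": frozenset({"recent_transactions", "merchant_details"}),
    "pot_management": frozenset({"pot_balances"}),
    "card_replacement": frozenset({"card_status", "delivery_address"}),
}


@dataclass
class CustomerContext:
    fields: dict[str, object]


def gated_context(intent: str, ctx: CustomerContext) -> dict[str, object]:
    """Return only the fields the detected intent is allowed to see.

    Unrecognised intents fall back to an empty allowlist (deny by default).
    """
    allowed = CONTEXT_ALLOWLIST.get(intent, frozenset())
    return {name: value for name, value in ctx.fields.items() if name in allowed}


ctx = CustomerContext({
    "recent_transactions": ["Coffee Co 4.20"],
    "pot_balances": {"Holiday": 120},
    "delivery_address": "1 Example Street",
})
print(gated_context("transaction_query", ctx))  # {'recent_transactions': ['Coffee Co 4.20']}
```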

But the more instructive part is the restraint.

Three phases, not one big bang

Monzo's team grew the agent's autonomy deliberately. Version one only answered questions, with a human reviewing every single message before it reached a customer. The second phase introduced human-readable Markdown "processes" - a design the team credits to the emerging Agent Skills pattern - letting the agent follow defined procedures rather than improvise. Only in the third phase did it gain the ability to call tools and take real actions, such as ordering a replacement card or filing a fraud report.
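One way to picture that staged rollout is as capabilities that accumulate phase by phase, with every requested action checked against the current phase. This is a sketch under assumptions: the `Phase` values and action names are hypothetical labels for the three phases described, not identifiers from Monzo's system.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical labels for the three rollout phases."""
    ANSWER_ONLY = 1  # answers questions; humans review every message
    PROCESSES = 2    # may also follow vetted Markdown processes
    TOOLS = 3        # may also call tools that take real actions


def allowed_actions(phase: Phase) -> set[str]:
    """Capabilities accumulate: each phase keeps everything before it."""
    actions = {"answer"}
    if phase >= Phase.PROCESSES:
        actions.add("follow_process")
    if phase >= Phase.TOOLS:
        actions.add("call_tool")
    return actions


def is_permitted(phase: Phase, action: str) -> bool:
    """Gate every requested action against the current phase."""
    return action in allowed_actions(phase)


# Ordering a replacement card is a tool call, so phase two refuses it.
print(is_permitted(Phase.PROCESSES, "call_tool"))  # False
```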

That sequencing matters. Each phase banked a set of lessons before the next layer of risk was added.

Guardrails as a first-class component

The architecture treats safety as core plumbing, not an afterthought. Input and output guardrails screen for hallucinations and non-compliant content, and the system escalates to a human whenever its confidence drops. Triage logic decides whether to respond, hand off, or close a case. Answer generation draws only on a knowledge base vetted by subject-matter experts.
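A simplified sketch of how triage and an output guardrail might fit together, assuming each draft carries a model confidence score and the knowledge-base article IDs it relies on. The threshold, field names and grounding check are all hypothetical, and folding the guardrail into triage is a simplification; a production guardrail would also screen for non-compliant wording, which this toy check leaves out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    RESPOND = "respond"
    HAND_OFF = "hand_off"
    CLOSE = "close"


@dataclass
class Draft:
    text: str
    confidence: float   # hypothetical 0-1 score attached to the draft
    sources: list[str]  # knowledge-base article IDs the draft relies on


CONFIDENCE_FLOOR = 0.8  # hypothetical; would be tuned against evals


def passes_output_guardrail(draft: Draft, vetted_kb: set[str]) -> bool:
    """Toy grounding check: at least one source, all from the vetted KB."""
    return bool(draft.sources) and all(s in vetted_kb for s in draft.sources)


def triage(draft: Optional[Draft], vetted_kb: set[str], resolved: bool) -> Decision:
    if resolved:
        return Decision.CLOSE
    if draft is None or draft.confidence < CONFIDENCE_FLOOR:
        return Decision.HAND_OFF  # low confidence: escalate to a human
    if not passes_output_guardrail(draft, vetted_kb):
        return Decision.HAND_OFF  # ungrounded drafts never reach the customer
    return Decision.RESPOND


kb = {"kb-card-delivery", "kb-pots-faq"}
print(triage(Draft("Your card is on its way.", 0.93, ["kb-card-delivery"]), kb, resolved=False))
# Decision.RESPOND
```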

Notably, the team found that self-correcting guardrails actually reduced unnecessary handovers rather than drowning human agents in escalations - a common failure mode when teams bolt on safety checks too aggressively.
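What "self-correcting" might look like in practice: rather than escalating on the first failed check, the guardrail's objection is fed back to the model for another attempt, and only a repeated failure goes to a human. This is a minimal sketch assuming hypothetical `generate` and `check` callables; Monzo's post does not describe its retry logic at this level of detail.

```python
from typing import Callable, Optional

# Hypothetical interfaces:
#   generate(feedback) -> draft reply (feedback is None on the first attempt)
#   check(draft) -> None if the draft passes, else a description of the problem
Generate = Callable[[Optional[str]], str]
Check = Callable[[str], Optional[str]]


def self_correcting_reply(generate: Generate, check: Check,
                          max_attempts: int = 2) -> Optional[str]:
    """Return a draft that passes the guardrail, or None to signal hand-off."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        draft = generate(feedback)
        problem = check(draft)
        if problem is None:
            return draft
        feedback = problem  # give the guardrail's objection to the next attempt
    return None  # still failing after every attempt: escalate to a human


# Toy stand-ins for the model and the guardrail.
def fake_generate(feedback: Optional[str]) -> str:
    if feedback is None:
        return "Your new card is guaranteed to arrive tomorrow."
    return "Your new card is on its way."


def fake_check(draft: str) -> Optional[str]:
    return "Don't promise delivery dates." if "guaranteed" in draft else None


print(self_correcting_reply(fake_generate, fake_check))  # Your new card is on its way.
```

With `max_attempts=1` this collapses to escalate-on-first-failure, the behaviour that floods human agents with handovers.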

Evaluation before deployment

Monzo built a "golden set" of 100 expert-approved conversations as a benchmark, then layered evaluations at the component, answer-generation, and end-to-end levels. For the harder cases, they ran simulated user interactions to stress-test longer, messier conversations, and built realistic state simulation to test tool calls safely. Quality-assurance sampling started at 100% of messages and was dialled down only as confidence grew.
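A component-level eval against a golden set can be as simple as comparing the component's output with the expert verdict on each case, and the QA dial-down can be expressed as a sampling policy. Both halves of this sketch are hypothetical: the `GoldenCase` fields, the 98% quality target, the halving schedule and the 5% floor are assumptions for illustration, not figures from Monzo.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    customer_message: str
    expected_decision: str  # the expert-approved verdict, e.g. "hand_off"


def golden_set_accuracy(cases: list[GoldenCase],
                        component: Callable[[str], str]) -> float:
    """Fraction of golden cases where the component agrees with the experts."""
    if not cases:
        raise ValueError("golden set is empty")
    hits = sum(component(c.customer_message) == c.expected_decision for c in cases)
    return hits / len(cases)


def next_sample_rate(current: float, qa_pass_rate: float,
                     target: float = 0.98, floor: float = 0.05) -> float:
    """Hypothetical QA policy: halve sampling after a period that meets the
    target, never below the floor; snap back to 100% if quality slips."""
    if qa_pass_rate < target:
        return 1.0
    return max(floor, current / 2)


def should_sample(rate: float, rng: random.Random) -> bool:
    """Send this message for human QA with probability `rate`."""
    return rng.random() < rate


rate = 1.0  # start by reviewing every message
for pass_rate in (0.99, 0.99, 0.95, 0.99):
    rate = next_sample_rate(rate, pass_rate)
print(rate)  # 0.5
```

Snapping back to full review after a bad period is a deliberately conservative choice: confidence has to be re-earned rather than assumed.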

One pragmatic engineering call worth flagging: The team chose non-thinking models over more expensive reasoning models for triage, finding them comparably effective for the job. A reminder that the biggest model is not always the right answer.

The takeaway for CX teams

The strategic framing is one CX leaders will recognise: Automation handles the simple, high-volume workflows faster, which frees human specialists for the complex, sensitive cases where judgement and empathy actually matter. The goal is not to remove people from the loop but to spend their time better.

For anyone building conversational AI in a regulated industry, Monzo's account is a quiet rebuke to the move-fast crowd: The model is the easy part. The eval stack, the guardrails and the phased rollout are what let you (ideally) sleep at night.

Finally, it's a good reminder that you don't always have to outsource this to a specialist provider. It's great to see Monzo doing what it has always famously done - building it themselves!

Based on Monzo's engineering blog post, authored by Jamie McDonald-Gibson, Joost van Oorschot, Robin Dhamankar and Tom Leitch.

(This post was published with the help of Eddie, the Conversational AI agent powered by my editorial MARVIN harness running on top of Claude Code.)