What gets measured gets managed. But what if you’re measuring (and managing) the wrong thing?
For the last two decades, customer service has been ruled by metrics like average handle time, tickets closed, and first-call resolution. Those numbers showed up in dashboards. Those dashboards shaped behavior.
But now those numbers are starting to fall apart. The best service is no service, and the latest AI puts that vision within reach. We touched on this “customerless customer experience” in last week’s email.
The bot that deflects a call doesn’t show up on your handle time report.
The assistant that prevents a ticket doesn’t count as a resolution.
The math hasn’t caught up to the moment.
Which is why it may feel like everything’s getting better… except the metrics.
This week, Fish wrote a sharp piece about what happens when AI reaches the “last mile” – the point where everything looks right but still doesn’t work.
He was building an AI-powered tool for his neighborhood. Everything moved fast. The app looked great. But it kept breaking.
The assistant kept confidently fixing the code. But the code wasn’t the problem. The issue was buried in an AWS policy – outside both the code and the context the model was working with.
Sound familiar?
AI can only fix what it can see. So it optimizes for the visible issues – code, syntax, structure – just like our CX dashboards optimize for visible metrics: calls, tickets, handle time.
But often, the real problems (and the real wins) are hiding just out of frame.
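To make that concrete, here’s a minimal Python sketch of the same failure mode. (Purely illustrative: the bucket name, key, and setup are our invention, not Fish’s actual project.)

```python
# This code can be 100% "correct" and still fail, because the real
# constraint lives in an IAM policy the model never sees.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

try:
    # Hypothetical bucket and key, for illustration only.
    obj = s3.get_object(Bucket="neighborhood-app-data", Key="events.json")
    print(obj["Body"].read())
except ClientError as err:
    if err.response["Error"]["Code"] == "AccessDenied":
        # No amount of rewriting this file fixes an AccessDenied.
        # The fix is in the bucket/role policy – outside the code,
        # and outside the model's context window.
        print("Blocked by policy, not by code:", err)
    else:
        raise
```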
This is a big issue in customer service, customer experience, and AI right now.
The old dashboards will tell you everything’s fine.
Call volume: steady. Containment: high. Average handle time (AHT): low.
Meanwhile, your competitor is rolling out customer experiences that take longer (because Gen AI enables more complex conversations), are more comfortable to use (less bogus containment from frustrated customers), and escalate more difficult problems (so AHT goes up).
Worse figures, but a better outcome.
And that’s a place the old metrics will never take you. If you’re still thinking in terms of front doors and handoffs, you’re missing the bigger picture.
The magic is in the margins: before the ticket, during the moment, after the resolution. Invisible to the metric… but not to the customer.
So, what should we measure?
• Clear answers and useful outcomes, not just speed
• Future inquiries prevented (this one’s tougher to measure – see the sketch below)
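What would “future inquiries prevented” even look like in a report? One rough proxy is the repeat-contact rate: how often a customer has to come back after we supposedly solved their problem. Here’s a minimal Python sketch, assuming nothing more than a log of (customer ID, contact timestamp) pairs – the data and names are ours, for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical contact log: (customer_id, timestamp of contact).
contacts = [
    ("cust-1", datetime(2025, 6, 1)),
    ("cust-1", datetime(2025, 6, 12)),  # back within 30 days
    ("cust-2", datetime(2025, 6, 3)),
    ("cust-3", datetime(2025, 6, 5)),
]

def repeat_contact_rate(contacts, window=timedelta(days=30)):
    """Share of contacts followed by another contact from the same
    customer within `window`. A falling rate suggests problems are
    actually being solved (or prevented), even while AHT "worsens"."""
    by_customer = {}
    for cust, ts in sorted(contacts, key=lambda c: c[1]):
        by_customer.setdefault(cust, []).append(ts)

    repeats = total = 0
    for times in by_customer.values():
        for i, ts in enumerate(times):
            total += 1
            # Did this customer come back within the window?
            if any(later - ts <= window for later in times[i + 1:]):
                repeats += 1
    return repeats / total if total else 0.0

print(f"30-day repeat-contact rate: {repeat_contact_rate(contacts):.0%}")
# -> 30-day repeat-contact rate: 25%
```

Track it per intent, and watch the trend rather than the absolute number.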
If AI is working the way it should, your numbers should look strange.
Don’t blame the tech. Blame the lens.
Damian
PS: If you’re new here, this newsletter brings you the best from Waterfield Tech experts at the frontier of AI, CX, and IT. Also, Kerry posts weekly at The Dualist, and Fish and Dan share their thoughts every other week at Outside Shot and Daichotome.
PPS: In case you missed it, all three of those guys went live on LinkedIn last week to discuss MCP (Model Context Protocol) and what it means for customer experiences. Plus, they even did a quick demo. The full recording is up, and we think it’s 30 minutes well spent.
Here’s what went down this week.
Bleeding Edge
Early signals you should keep on your radar.
OpenAI and Meta are raiding each other for top AI talent. A wave of leadership exits and poaching shows the frontier AI race is boiling down to who can keep their smartest people. Some of these compensation packages are rumored to be $100M. It pays to know AI.
Hedge funds are offering million-dollar packages to lure AI talent. Firms like Point72 and Millennium are aggressively recruiting AI and machine learning experts, offering base salaries between $200,000 and $400,000, plus bonuses. These companies clearly see the leverage that AI can provide and are willing to pay top dollar to harness it (and we do, too).
Leading Edge
Proven moves you can copy today.
Disney, Walmart, and Carnival UK share how they’re using AI to improve CX. From real-time personalization to smarter automation, they’re building AI-first to make customers happier. As are we!
Cobb County is using AI to train 911 dispatchers. The system runs real emergency scenarios to help staff handle high-pressure calls faster and more accurately. We’re seeing a lot of pull for this kind of solution – it’s a way to leverage Gen AI while dodging some of the bigger implementation and compliance issues.
Off the Ledge
Hype and headaches we’re steering clear of.
Many open-source AI agents still struggle with real-world tasks. A new benchmark found they failed more than half of basic jobs like travel booking and doc summarization – a reminder that the models are capable, but, like we said the other day, you still need to play Happy Hogan.
Salesforce says many LLMs aren’t ready for customer service. In internal tests, even top models fumbled basic CRM tasks – and sometimes made things worse. A stark reminder that you can’t just slap AI on top of existing systems and hope for success.