Hi, it’s Kerry here again, standing in for Damian, who’s still road tripping around the US of A 😎
Meanwhile, at the AtCoder World Tour Finals 2025 in Tokyo, Polish superstar Przemysław “Psyho” Dębiak edged out an AI coding agent that OpenAI had entered into the competition. He shared on the social media platform X:
“Humanity has prevailed (for now!) I’m completely exhausted. I figured I had 10 hours of sleep in the last 3 days, and I’m barely alive”
Obviously the AI agent didn’t break a sweat, and it’s currently helping teenagers cheat on their homework 😜
Now, that was an unreleased model, but it’s safe to say that the AI available to anyone in the world for $20 per month is probably a better coder than 90% of your dev team.
This is just the tip of a huge iceberg heading for software development and SaaS companies. AI can code better than most humans. So do we need software anymore? Should systems integrators like Waterfield Tech shut up shop?
Not so fast.
For any of us who’ve been involved in software delivery – whether as customer, supplier, developer or manager, which is probably most of us here – it’s pretty impressive seeing AI do in seconds what even the best developers take hours or days to deliver. But coding capability isn’t enough.
My co-conspirator and CPO at Waterfield Tech – Michael (Fish) Fisher – was stoked when his AI assistant assembled a functional prototype, yet when it tried to send an email the whole thing crashed because AWS kept the service in sandbox mode. It’s a classic example of AI doing the first 90% of the work and falling over in the last mile.
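For anyone curious about the failure mode: new AWS accounts keep the email service in a sandbox that only delivers to addresses you’ve verified – exactly the kind of production rule an AI prototype doesn’t know about. Here’s a toy sketch of the pre-flight check a human would add (hypothetical function, simplified from the real AWS check):

```python
def can_send(recipient: str, production_access: bool, verified: set) -> bool:
    """Hypothetical pre-flight check before attempting an email send.

    In sandbox mode (production_access=False), delivery only works for
    addresses the account has already verified; with production access,
    any recipient is fair game.
    """
    return production_access or recipient in verified
```

In real code the production flag would come from AWS itself rather than being passed in – but the point is that someone has to know to ask.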
In his Substack article “Breaking the Loop: When AI Development Reaches the Last Mile Reality,” Fish analyzes why. Applying his characteristic systems-thinking approach, he explains that AI excels at analysis – repeating patterns and optimizing code – but struggles with synthesis: understanding the broader system, the compliance rules, and the messy reality of production.
The good news is there are two ways to beat the last‑mile curse.
In his article, Fish suggests we leverage a constellation of AI agents and human helpers to wrest control of the complex software stacks and infrastructure sprawl common in most businesses.
But there’s another approach I’ve been piloting, where we dumb down the problem so the AI doesn’t need to do the synthesis. The trick is to create an environment where AI can shine. Instead of throwing an LLM into your entire tech stack, carve out a controlled environment with a limited set of APIs and data sources and precise instructions. We limit what it can do, so it’s less likely to get stuck or get it wrong.
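To make that concrete, here’s a minimal Python sketch of the idea, with entirely hypothetical tool names: the model can only trigger functions on an explicit allowlist, so an off-script request fails fast instead of wandering into the wider stack.

```python
# Hypothetical "carved-out" environment: the agent can only invoke tools
# on this allowlist, each wrapping one approved API or data source.
ALLOWED_TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "business_hours": lambda: "Mon-Fri, 9am-5pm CT",
}

def run_tool(name, *args):
    """Dispatch a tool call requested by the model, rejecting anything off-list."""
    if name not in ALLOWED_TOOLS:
        # The AI never gets to improvise against the wider stack.
        raise PermissionError(f"tool {name!r} is not in the allowlist")
    return ALLOWED_TOOLS[name](*args)
```

A real deployment would add argument validation and logging, but the principle holds: limit what the AI can do, and it’s far less likely to get stuck or get it wrong.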
In the future, I think we’ll combine both approaches.
These breakthroughs are part of the continued march towards better, faster, cheaper AI that’s making the AI-first contact center more achievable than ever.
But most of us are still building like it’s 2022.
Join our LinkedIn Live today as we explore what’s stopping you from building an AI-first contact center – and what might set you free to do it.
Sign up now, and join live, or check out the replay later: “The AI First Contact Center: From Hype to How.”
Kerry
PS: If you’re new here, this newsletter brings you the best from Waterfield Tech experts at the frontier of AI, CX, and IT. Kerry posts weekly at The Dualist, and Fish and Dan share their thoughts every other week at Outside Shot and Daichotome.
Here’s what went down this week.
Bleeding Edge
Early signals you should keep on your radar
Reverse robocalling. Google is quietly rolling out a Gemini‑powered calling agent that will call local businesses for you. Search “pet groomers near me,” tap “Have AI check pricing,” answer a few questions and the bot dials, announces itself as a robot and returns the details. It’s a glimpse of how personal AI could swamp your contact center, unless you go AI first! (Dataconomy)
Provably private AI for enterprise. A startup called Confident Security launched CONFSEC, an enterprise‑grade implementation of Apple’s Private Cloud Compute architecture to ensure prompts and metadata are never stored or used for training. The company argues that trust must be built into AI infrastructure itself, especially for healthcare and finance. (Businesswire)
Leading Edge
Proven moves you can copy today.
Coding goes command‑line. Terminal‑based AI tools are surging as labs like Anthropic, DeepMind and OpenAI release Claude Code, Gemini CLI and Codex CLI. Unlike code‑editor assistants, these tools run directly in the terminal where developers live. We’ve been leveraging these tools for a few months now and it really is transformational. (AInvest)
ChatGPT’s new agent is pay‑walled. OpenAI combined its Operator and Deep Research tools into a general‑purpose agent mode that can analyze your calendar, generate slides or order groceries. It runs on a virtual computer, narrates its actions and connects to apps like Gmail and GitHub, but you need a Pro, Plus or Team subscription to turn it on. I’ve used it extensively, and if you prompt it right, the output is phenomenal. I’ve yet to give it my credit card number 🤣 (Cybernews)
Off the Ledge
Hype and headaches we’re steering clear of.
Open models? Not so fast. OpenAI’s much‑anticipated open‑weight model was supposed to ship last week. CEO Sam Altman now says it’s delayed indefinitely while OpenAI addresses risk and safety concerns. That suggests to me it’s gonna be a super powerful model if/when we do get it. (AInvest)
Researchers warn we’re losing sight of AI’s “thoughts.” A coalition of top scientists from OpenAI, DeepMind and Anthropic released a position paper warning that the transparency we currently enjoy with reasoning models – where we get to see what they’re thinking – is at risk, and should be preserved for the sake of safety. Unlike with humans, we can currently watch the AI think… let’s not lose that insight, or it could get devious and we wouldn’t even know it 😬 (AI Insider)