Author: Kerry Robinson
Something big happened on Tuesday. Your contact center will never be the same.
In a closed-door, invite-only event with no livestream, OpenAI announced the availability of their ‘real-time’ API for GPT-4o.
This is the enterprise equivalent of ‘advanced voice mode’ which was finally pushed out to the majority of paying ChatGPT subscribers last week.
Advanced voice mode gets rid of the three separate steps we’re used to: Speech to Text, LLM inference, and then Text to Speech.
Instead, the AI model natively receives audio and text, and outputs audio and text. This cuts the gap between the user finishing speaking and the system responding to around 300 milliseconds – about the same as in human conversation. And with advanced voice mode, we get incredible voices that sound nearly indistinguishable from real people.
But there’s more: Advanced voice mode can both detect and express emotion. It can tell the difference between the same words said in different ways, and respond appropriately. It can laugh, whisper, and produce useful backchannel sounds, like ‘aha’ and ‘mm-hmm’.
And finally, with yesterday’s announcement, this kind of power is available to you and your business to serve your customers.
Their ‘real-time’ API currently accepts voice and text input, and outputs voice and text. Images and video are apparently in the pipeline.
It’s hard to overestimate the impact. Until now, our IVR, Conversational AI, and even Gen AI-powered voice agents couldn’t respond fast enough to fully leverage the cooperative nature of conversation. But now they can.
They couldn’t respond appropriately and empathetically. But now they can.
I think we might look back at this as a tipping point. A point when clunky old IVRs and only-slightly-less-clunky natural language routing solutions become totally unacceptable, and chatbots lose their charm.
I might shoot a text message or WhatsApp to my bank or a retailer, or read one they send me… but to get something sorted, why would I rely on typing on my phone, tablet, or keyboard when I can have a fast, efficient, engaging conversation about it? Seriously, done right, AI agents built with the real-time API are gonna solve your problem before you can listen to the long, confusing list of options in a competitor’s IVR 🤯
In last week’s email, I warned we may see a bifurcation into two types of B2C businesses:
1. Those that provide commodity services that are purchased by AI, on behalf of consumers, at the lowest cost
2. Those who manage to build and maintain customer relationships that are so deep, meaningful, and experiential that consumers choose to engage with them directly.
With the real-time API, we are starting to get access to the kind of tech that may allow you to attain, or retain, your position as a B2C brand that gets direct access to your customers.
But it’s not cheap. A rough calculation suggests the API will cost around 18 cents per minute! You can get offshore agents for less. But AI gets better, faster, and cheaper all the time. GPT-4-class models have crashed in price at an annualized rate of 90%.
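To put those two numbers side by side, here is a back-of-the-envelope sketch in Python. It takes the rough 18 cents per minute figure and simply assumes the 90% annualized price drop seen for GPT-4-class models carries over to this API, which is a guess on my part and not anything OpenAI has announced. The function and variable names are made up for the illustration.

```python
def projected_cost_per_minute(current_cost, annual_decline, years):
    """Cost per minute after `years`, if the price falls by `annual_decline` each year."""
    return current_cost * (1 - annual_decline) ** years


COST_PER_MINUTE = 0.18  # rough estimate from the text above, in US dollars
ANNUAL_DECLINE = 0.90   # assumption: the GPT-4-class trend also applies to this API

for years in (0, 1, 2):
    per_minute = projected_cost_per_minute(COST_PER_MINUTE, ANNUAL_DECLINE, years)
    # Multiply by 60 to compare against an hourly agent cost.
    print(f"Year {years}: ${per_minute:.4f}/min, ${per_minute * 60:.2f}/hour of talk time")
```

At today's estimate that works out to $10.80 per hour of talk time. If the trend held for just one year, it would be about $1.08 per hour.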
And if you’re only focused on cost savings, you’re missing the point.
More to come next week once we’ve had some time to build with these new models.
Kerry
PS: You are building with genAI right now, aren’t you? If not, what’s stopping you? Check out our latest blog on gen-AI blockers or sign up for a complimentary Strategy Workshop to help you get started.
PPS: If you want a more regular dose of insights, follow or connect with me on LinkedIn for regular posts on conversational AI, mindset, and egg juggling, among other things! If someone forwarded this to you, please subscribe yourself for weekly insights that’ll make you think differently about your IVR, voice, and chatbots. Helping you get maximum ROI from conversational AI — whatever the platform.