There’s something interesting going on in my inbox.
On the one hand I’ve got Gary Marcus telling me LLMs are dumb because they can’t color maps properly.
But then AI platform providers and tech bros keep calling for the end of software, work, poverty… you name it!
So who’s right?
Both!
Welcome to the ‘jagged frontier’ of Gen AI capabilities, a term coined by Ethan Mollick and his collaborators in a study of Boston Consulting Group (BCG) consultants.
It was a decent-sized study: more than 750 consultants – 7% of BCG’s consulting workforce – took part. They were given typical consulting tasks. One group had no AI. Another had AI without instruction. The third got AI and an introduction to prompt engineering.
The results were impressive. Those using AI:
– finished 12.2% more tasks on average
– completed tasks 25.1% more quickly
– produced 40% higher quality results
Except there was a catch. One of the tasks was deliberately chosen because the AI (GPT-4 in this study) repeatedly failed at it.
Then the tables were turned.
Those with AI performed 19% worse on that task.
So do we just need to give them better AI?
Unfortunately that’s not a panacea. In another study, Harvard researcher Fabrizio Dell’Acqua gave some recruitment consultants an AI rigged to be correct 75% of the time, and others an AI rigged to be right 85% of the time.
Those with the lower-performing AI did better.
Commenting on the study, Ethan Mollick summarized:
“recruiters who used high-quality AI became lazy, careless, and less skilled in their own judgment. They missed out on some brilliant applicants and made worse decisions than recruiters who used low-quality AI or no AI at all”
Fabrizio calls it ‘falling asleep at the wheel’.
So AI can out-code, out-math and out-science humans, but it makes stupid mistakes when we least expect it and makes us so lazy we don’t spot them.
Ouch.
So is Gen AI a dud, as Gary Marcus likes to claim?
Of course not. We have a technology that fits right into the gap between computers, which can do logical stuff very fast and reliably but can’t do fuzzy stuff, and humans, who can do fuzzy stuff quite fast and reliably but are slow and unreliable at computation.
Gen AI does the fuzzy stuff faster, sometimes better, and much cheaper than humans can. And it can talk code, so it can leverage the stuff computers do really well and act as the bridge between humans and computers.
But we need to work out how to make all three – humans, computers, and AI – play nicely, without pushing any of them so far outside their sweet spot that they start to fail.
That’s probably the biggest unsolved problem in tech right now, but don’t let that stop you, because you don’t need a general solution to the problem that works for everyone in every situation.
You need a specific solution that delivers benefit to you, your team, your business and your customers.
To stay on the right side of the jagged frontier, and stop your people falling asleep at the wheel, your approach should:
Stay inside the jagged frontier – you can’t just ‘design an AI system’. You have to test and learn what works – and do extensive evaluations – to make sure that AI really is capable of the work you’re giving it.
Put in hard constraints – the hard rules of legacy workflow automations and the strict input/output formats of APIs provide useful checkpoints for AI. If certain steps must be completed in a certain order, you can enforce that with a rule-based workflow or API call. It doesn’t matter how confused the AI gets if the workflow rules or API formats are there to keep it on track. If it does something wrong: computer says no!
Let humans ‘own the output’ – whenever you’re close to the frontier of AI capabilities, you’re at risk of your AI doing something dumb. Staying inside the frontier and putting in hard constraints will help, but sometimes you want to push to the edge. Then you need to design the process so humans don’t just ‘stay in the loop’ – they own the output of the AI-driven process, have sufficient checkpoints, and are required to provide meaningful inputs along the way so they don’t ‘fall asleep at the wheel’.
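To make the last two points concrete, here’s a minimal sketch of what those guardrails can look like in code. Everything here – the step names, the field names, the 20-character threshold – is hypothetical, not a prescription: a rule-based workflow that refuses out-of-order steps, a strict output check on whatever the AI produces, and a sign-off step that forces the human reviewer to say what they actually checked.

```python
# Hypothetical guardrails around an AI step: hard constraints plus
# a human sign-off that can't be an absent-minded rubber stamp.
from dataclasses import dataclass

# Rule-based workflow: only these step transitions are legal.
ALLOWED_TRANSITIONS = {"draft": {"review"}, "review": {"publish"}}

def advance(current: str, requested: str) -> str:
    """'Computer says no' to any out-of-order step."""
    if requested not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} -> {requested}")
    return requested

def validate_ai_output(payload: dict) -> dict:
    """Strict I/O contract: reject malformed AI output,
    however confident the model sounded."""
    if not isinstance(payload.get("summary"), str) or not payload["summary"].strip():
        raise ValueError("Missing or empty 'summary'.")
    return payload

@dataclass
class Review:
    reviewer: str
    note: str

def human_sign_off(payload: dict, review: Review) -> dict:
    """Humans own the output: a substantive note is required,
    so sign-off demands real engagement."""
    if len(review.note.strip()) < 20:  # arbitrary threshold, tune per process
        raise ValueError("Review note too short - say what you actually checked.")
    payload["approved_by"] = review.reviewer
    return payload
```

In use, the AI’s draft has to pass `validate_ai_output`, a named human has to add a meaningful note via `human_sign_off`, and only then does `advance("review", "publish")` succeed – the format checks catch a confused AI, and the mandatory note keeps the human awake at the wheel.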
AI isn’t dumb because it gets stuff wrong; that’s just a symptom of the jagged frontier. Putting humans in the loop can help, but they risk ‘falling asleep at the wheel’ – so think carefully about how to keep the AI on track, and how to keep human overseers engaged and alert enough to catch it when it fails.
This is a good example of the difference between the possible, the practical, and the profitable when it comes to AI deployments, something that Fish called out in his article that I shared last week. Check it out and hit ‘subscribe’ because I got a sneak peek at his next article and it’s a good’un!
Kerry
PS: If you want a more regular dose of insights, follow or connect with me on LinkedIn for regular posts on conversational AI, mindset, and egg juggling, among other things!
PPS: You are building with GenAI right now, aren’t you? If not, what’s stopping you? Check out our blog on Gen-AI blockers, or sign up for a complimentary Strategy Workshop to help you get started.
If someone forwarded this to you, please subscribe yourself for weekly insights that’ll make you think differently about your IVR, voice, and chatbots.
Helping you get maximum ROI from conversational AI — whatever the platform.