The biggest obstacle preventing AI from making the leap from experiment to production, and how to overcome it
Artificial intelligence is the last digital transformation of our lifetimes, but the gap between AI in experiments and AI in production feels like a widening chasm.
It's worth saying that the high failure rate identified in the S&P data isn't stopping companies from experimenting. Most are still investing in generative AI, but the stumbling block seems to be firmly wedged between the proof-of-concept phase and actual production.
In my experience, many experiments get as far as concluding "this kind of works, 85% of the time". Business users are very interested in the promise of AI, but is it any surprise that many are reluctant to take the next step and commit to more extensive projects when the risk of error or hallucination is still high?
I have a theory about why this is happening: the technology is evolving so fast that it can feel like we're in constant catch-up mode. It's one thing to successfully launch something that works in experiment mode. It's another to put it to work in the business, and to trust it with your customers' data and your own.
Because fundamentally, that's what it comes down to: a question of trust.
And for that, you still need people. With AI, you can insert automations within a process to speed it up, but you soon reach a point where you need to verify that it's producing the right output. So how do we test that?
As it happens, Phased AI and Each&Other are working together on exactly this challenge: we've been finding the human pain points our customers have, and figuring out where automation can address them effectively and accurately.
What we've found is that most of the work is not in building a prompt or an automation to start the process: it's in robustly evaluating the output at the far side.
We work with our customers to ensure that, at the end of the experimentation phase, they arrive at a genuinely trustworthy application that works with human workflows. Together with those businesses, we sketch out the best ideas for how they can use AI with high impact but, crucially, low risk.
Typically, customers will have played around with ChatGPT and wondered: "Could it do this task I have in mind? Can I make it work with the API and make it safe?" Some have a specific use case, and they think generative AI can solve it, or make it faster, but don't know how.
In a workshop, we establish whether this is a low- or high-risk use case, and we map out their current human process: is it a spreadsheet, or is it another application? Then we identify the hypothesis: can AI do this? Typically, we build a very quick prototype for them, taking in data that is as near to real as we can get.
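To make that concrete, here is a minimal sketch of what such a prototype can look like, assuming the OpenAI Python client and a spreadsheet exported to CSV. The model name, prompt and file names are illustrative placeholders, not details from a real engagement.

```python
# A quick prototype over near-to-real data: run each row of the customer's
# spreadsheet through a model and keep the outputs beside the inputs for review.
# Prompt, model and file names below are placeholders for illustration only.
import csv
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_model(row_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "Summarise this support ticket in one sentence."},
            {"role": "user", "content": row_text},
        ],
    )
    return response.choices[0].message.content

with open("near_real_data.csv") as src, open("prototype_output.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["input", "output"])
    for row in csv.reader(src):
        text = row[0]  # assumes the text to process sits in the first column
        writer.writerow([text, run_model(text)])
```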
Next, we test the AI output at the other end to see if it's accurate. In parallel, Each&Other tests how this will work from a UX point of view, evaluating how the user will experience it.
After the hypothesis and the rapid proof of concept, we build out a bigger dataset, scale the test, put together a quick UI, and have the users check for flaws. That can be a simple checkbox exercise for the test cases: did the expected output match the real output, yes or no? Or you can give it a score on a sliding scale.
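A sketch of what that checkbox exercise can look like in code is below. The matching rule is deliberately crude, and run_model stands in for whatever prompt or automation is under test, as in the earlier sketch.

```python
# An illustrative checkbox-style evaluation: for each test case, record whether
# the expected output matched the real output, yes or no.
from typing import Callable

def run_checklist(run_model: Callable[[str], str], cases: list[tuple[str, str]]) -> None:
    """Each case is a (prompt, expected output) pair."""
    passes = 0
    for prompt, expected in cases:
        actual = run_model(prompt)
        # Checkbox: did the expected output match the real output, yes or no?
        matched = actual.strip().lower() == expected.strip().lower()
        passes += matched
        print(f"{'PASS' if matched else 'FAIL'} | {prompt!r}")
    # Where yes/no is too blunt, swap the boolean for a reviewer's 0-5 score
    # on a sliding scale and report the average instead.
    print(f"{passes}/{len(cases)} expected outputs matched")
```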
This part is so important: even at the proof-of-concept stage, we're always testing whether the AI is consistently giving the output that we think it should. And at the end of the proof of concept, when we hand over the tool to the customer to try in real-world scenarios, it comes with a report on our test cases and our assessment.
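One hypothetical way to probe that consistency is to run the same prompt repeatedly and measure how often the model agrees with itself. The run count and threshold here are arbitrary, and run_model again stands in for the automation under test.

```python
# A simple consistency probe: repeat the same prompt and report what share of
# runs produced the most common answer. 1.0 means every run agreed.
from collections import Counter
from typing import Callable

def consistency(run_model: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    outputs = [run_model(prompt).strip() for _ in range(runs)]
    commonest_count = Counter(outputs).most_common(1)[0][1]
    return commonest_count / runs
    # e.g. hold back the handover while consistency(run_model, case) < 0.9
```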
After that, the next stage is making this production-grade: establishing the level of accuracy that is acceptable when the system is in front of customers, and what guardrails need to be in place if it is interacting with them.
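As an illustration of what such guardrails might look like, here is a simple sketch that checks every response before it reaches a customer and falls back to a human when any check fails. The specific rules and the escalation path are invented placeholders, not a production policy.

```python
# An illustrative guardrail layer for customer-facing output: run cheap checks
# on every reply and route to a human when any check fails.
from typing import Callable

BLOCKED_PHRASES = ("guaranteed refund", "legal advice")  # placeholder policy

def escalate_to_human(prompt: str) -> str:
    # Placeholder: in practice this would open a ticket or hand off to an agent.
    return "I'll pass this to a colleague who can help."

def guarded_reply(run_model: Callable[[str], str], prompt: str) -> str:
    reply = run_model(prompt)
    if not reply.strip():                                  # empty output: don't guess
        return escalate_to_human(prompt)
    if any(p in reply.lower() for p in BLOCKED_PHRASES):   # out-of-policy content
        return escalate_to_human(prompt)
    if len(reply) > 1000:                                  # suspiciously long answer
        return escalate_to_human(prompt)
    return reply
```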
For years, the tech industry's motto was "move fast and break things". Right now, I believe we're at the very beginning of the transformation that AI can bring, and companies understandably want to be a part of it. The key to getting it right is to move slowly and intentionally, and to make sure you are evaluating properly at the other end.
AI has the potential to make us all quicker and more efficient, but the current state of the art is not replacing humans. None of this can simply be handed over to a machine: there's a job to be done in understanding where the human needs to step in.