How do you measure AI’s real impact?

Last week, I read an upbeat post by John Battelle, one of the Nestors of the Internet age, titled 'Can GenAI Change Big Companies?'

The question posed was: 'Is there any proof that the technology is actually paying dividends for large businesses?'

John used two examples: one from Air India, the other from Bayer AG.

Air India, the privatized national carrier, embarked on its own journey to modernize and expand its fleet and, above all, to improve customer satisfaction. Anyone who has ever had to call an airline hotline knows the amazing feeling of being on hold for hours, only to be disconnected or redirected to the janitor's department, where apparently they can't change your flight.

People at Air India decided to test, and then put into production, Microsoft's Copilot 'that would allow Air India to query previously inaccessible silos of crucial performance data.' The result is ASK AI.g, where one can ask generic questions about booking, flights or check-in (among many others).

I tried a few simple ones:

- Do I need a visa to fly to India?
- What are prohibited items that you can’t bring into India?
- What is the earliest flight from Vancouver to India?

The first answer was reasonably accurate and easy to verify.

The answer to the second question provided a list of items prohibited in the cabin, but not of items you can't bring into the country.

The third answer was 'Yes, we do have flights from Vancouver to India', which is not what I asked. Obviously, everyone hopes that one day the Copilot will graduate to Pilot, with better answers.
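
Spot checks like mine can be turned into something repeatable. Below is a minimal smoke-test sketch in Python; the `ask_aig` client, its URL and its response shape are all hypothetical, since Air India does not publish a public API for the chatbot:

```python
# Minimal smoke-test sketch for a customer-facing chatbot.
# NOTE: ask_aig(), its URL and its response shape are hypothetical;
# Air India does not publish a public API for ASK AI.g.

import requests

def ask_aig(question: str) -> str:
    """Hypothetical client: POST a question, return the answer text."""
    resp = requests.post(
        "https://example.com/ask-aig",   # placeholder URL
        json={"question": question},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["answer"]

# Pair each question with keywords a correct answer should contain.
# Keyword matching is crude; it is a tripwire, not a quality metric.
TESTS = [
    ("Do I need a visa to fly to India?", ["visa"]),
    ("What items are prohibited from being brought into India?", ["customs"]),
    ("What is the earliest flight from Vancouver to India?", ["earliest", "departs"]),
]

for question, keywords in TESTS:
    answer = ask_aig(question).lower()
    ok = all(k in answer for k in keywords)
    print(f"{'PASS' if ok else 'FAIL'}: {question}")
```

Even a tripwire this crude, run daily, would flag the Vancouver answer the moment it regresses.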

According to Air India, the chatbot answers about 10,000 questions a day. Customer satisfaction and the quality of answers have increased, all without hiring extra people. Just as important, the technology gives the team better visibility into the questions people are actually asking, and that new data can feed further training, operational improvements or marketing activities.
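
The visibility claim is worth unpacking: once every question is logged, even naive aggregation shows what customers actually struggle with. A rough sketch follows, with topic keywords that are purely my invention; a real pipeline would presumably use proper intent classification rather than keyword matching:

```python
# Rough sketch: bucket logged chatbot questions into coarse topics.
# The topics and keywords below are invented for illustration only.

from collections import Counter

TOPICS = {
    "baggage": ["baggage", "luggage", "bag"],
    "visa": ["visa", "passport"],
    "booking": ["book", "ticket", "fare"],
    "check-in": ["check-in", "check in", "boarding"],
}

def topic_of(question: str) -> str:
    q = question.lower()
    for topic, keywords in TOPICS.items():
        if any(k in q for k in keywords):
            return topic
    return "other"

def summarize(log: list[str]) -> Counter:
    """Count questions per topic; feeds training and ops priorities."""
    return Counter(topic_of(q) for q in log)

print(summarize([
    "How much baggage can I bring?",
    "Do I need a visa to fly to India?",
    "Can I change my ticket online?",
]))
# Counter({'baggage': 1, 'visa': 1, 'booking': 1})
```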

(Air India's claims sound promising, though there's an issue with them that I'll get to below.)

The second example is from Bayer AG, a German multinational pharmaceutical and biotechnology company. Bayer partnered with Google and built Calantic, 'an orchestrated suite of AI radiology solutions that will focus on quality of care today and help transform radiology tomorrow. Our cloud-based marketplace, with service line AI apps, are integrated into the radiologist’s workflow with the Calantic Viewer integrated in the PACS viewport.'

It is meant to help radiologists prioritize and flag suspected cases and enhance their diagnostic capabilities. It integrates into Health Information Technology (HIT) systems and manages workflows. The list of benefits is long.

Unlike the Air India chatbot, this technology is difficult to test if you are not an expert working in a hospital.

The question to ask both organizations is 'How do you actually know that your AI system is working?'

This is a question that governments, regulators and companies are all trying to answer. You might recall my article The EU AI Act. An orgy of bureaucracy, or the one where I wrote about NIST's Artificial Intelligence Risk Management Framework. Or you can read a great article (not by me, for once), Why Is Everyone Suddenly Furious About AI Regulation?, discussing California's upcoming SB 1047.

Some companies are proactive and publish a model specification. Here is one from OpenAI. It is interesting to read how the developers of ChatGPT are trying to grapple with the intricacies of humanity. A nice example is how they decide what is porn and what is art (without relying on the old standby, 'I know it when I see it').

But back to the question: 'How do you actually know that your AI system is working?'

If I take the Air India example, I'd say that out of three questions it got one and a half right. It appears to work, somehow, but based on my test I know it is not 100% accurate.
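
To put a number on it, here is the arithmetic behind 'one and a half out of three', with the partial credit being my own judgment call rather than any official rubric:

```python
# Partial-credit scoring of my three test questions (grades are my own).
grades = {
    "visa requirements": 1.0,   # reasonably accurate, easy to verify
    "prohibited items": 0.5,    # answered cabin rules, not customs rules
    "earliest flight": 0.0,     # answered a different question entirely
}

score = sum(grades.values()) / len(grades)
print(f"Graded accuracy: {score:.0%}")  # Graded accuracy: 50%
```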

Is there now an implicit requirement to cross-reference every answer with another source? The reason I'm asking: do you remember when Air Canada was sued over its chatbot's wrong answer and tried to distance itself from responsibility?

Reading all the regulations from governments and regulators only shows how little they understand the technology, its capabilities and its limitations. The rules are vague and mostly useless.

My suggestion? Transparency. Don't hide behind flowery marketing language or a fortress of legalese. In simple terms, tell your customers the intended use, how the system was trained, how you are monitoring its quality and accuracy, and how people can get issues resolved when the system doesn't do what it is supposed to do.

Provide a place where everyone can see how the system is improving over time, and how you are meeting the quality bar you set at the beginning of the project.
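
What might such a place look like? Here is one illustrative sketch of a machine-readable transparency record, loosely inspired by the model-card idea; every field name is my own suggestion, not a standard, and not anything Air India or Bayer actually publishes:

```python
# Illustrative sketch of a public transparency record for an AI system.
# Field names are my own suggestion (loosely inspired by model cards).

from dataclasses import dataclass, field

@dataclass
class QualitySnapshot:
    date: str          # e.g. "2024-06-01"
    accuracy: float    # share of sampled answers graded correct
    sample_size: int   # how many answers were graded

@dataclass
class TransparencyRecord:
    intended_use: str
    training_summary: str          # what data the system was trained on
    monitoring: str                # how quality and accuracy are tracked
    escalation: str                # how customers resolve bad answers
    quality_over_time: list[QualitySnapshot] = field(default_factory=list)

record = TransparencyRecord(
    intended_use="Answer routine booking, check-in and travel questions.",
    training_summary="Company FAQs, policy documents and past support chats.",
    monitoring="Weekly graded sample of live answers against a rubric.",
    escalation="One-click handoff to a human agent; issues tracked publicly.",
    quality_over_time=[QualitySnapshot("2024-06-01", 0.78, 200)],
)
```

Publishing something like this, updated on a schedule, is how 'we monitor quality' stops being a marketing sentence and becomes a verifiable claim.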

You didn't define these things? What kind of business are you running then?

The recurring pattern? Good business practices will keep your customers happy, and the tech will help create a positive impact on your company's growth.
