The pitfalls of AI search
Before we resume our regular programming, I have to issue an apology to you, my dear reader. I was misled and, in turn, I passed misleading information on to you.
You might remember my post Teaching an old monkey new tricks, where I described how the company Epoch AI put together a new math benchmark named FrontierMath.
It is a set of math problems, created by mathematicians, each with a simple, machine-checkable answer. The problems were meant never to have been seen by any AI model before testing. Until then, all the models had been solving known math problems scraped from public sources, claiming success rates above 90%. Once confronted with FrontierMath problems, they all dropped to single digits.
Well, there was one tiny thing omitted from the announcement. This detail was added only after it came out that OpenAI had paid for the benchmark: 'We gratefully acknowledge OpenAI for their support in creating the benchmark.'
Once that was discovered, the articles started coming in:
OpenAI quietly funded independent math benchmark before setting record with o3
OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model
AI benchmarking organization criticized for waiting to disclose funding from OpenAI
The Epoch AI people issued a statement that the integrity of FrontierMath was still intact, but it had 'made a mistake' by not being more transparent. And, 'We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible.'
You think so?
Epoch AI misled all the people who trusted their benchmark, and they misled the mathematicians, who were never told for whom they were creating these tests.
Shame on you people at Epoch AI!
Now back to our regular post.
It would appear that the more AI we are trying to use, the more people we need.
The latest can be found in the Wired article This New AI Search Engine Has a Gimmick: Humans Answering Questions.
This new AI/human search engine Pearl (pearls of wisdom, I guess?) mixes AI with humans in a strange way. You pose a question, which gets answered by the AI. After reading the answer, you can decide to hit the 'Verify' button, which sends the answer together with your question to a human expert. The system locates an expert somewhere in its own vast network. While that is happening, you get a notification that 'Your AI answer is under review by a professional in our network.' After a few minutes, you get 'Your Expert trust score,' ranging from 0 to 5. Then you are nudged: 'Your AI answer is just the start. Connect with an Expert to ensure you’re taking the right steps or continue chatting with AI.' There is a link to 'Connect with an Expert now' which takes you to a payment page where, for $28/month, you can chat with an expert about your question.
At the time of writing this post, there were 12 experts online (so much for a vast network of experts).
You are also informed about the benefit of this amazing membership:
Free follow-up chats with your favorited Experts
Access to advanced AI
No hidden commitments—cancel anytime
Naturally, I tried the system with a simple question: 'What is Hiswai?'
The answer was a little lengthy, but the first paragraph stated that 'Hiswai is a social media monitoring and analytics platform designed for businesses and organizations. It provides tools to track, analyze, and gain insights from various social media platforms and online sources.' It also stated that Hiswai provides 'Real-time monitoring of social media mentions and conversations' and 'Sentiment analysis to gauge public opinion.'
I can tell you that it isn't accurate. I know what I built.
When verified by an expert, Scott, the answer got a trust score of 4 out of 5...
Do you think that this Pearl of wisdom system could really work?
The Pearl system uses ChatGPT, which is trained on questions and answers collected from the JustAnswer website, all the way from 2003. On the JustAnswer website you can pay a monthly fee ranging from $55/month to $125/month (legal advice is $79/month while a tutor is $125/month??) for individual expertise, while on Pearl you pay a flat fee of $28/month. For that you have 12 people with the designation of an expert waiting to pass judgment on any question which people can come up with.
I am failing to see how this model can work.
Imagine that you ask a question for which you truly don't know the answer, you get a vague answer from AI (with the disclaimer ‘AI can make mistakes. Check responses') that gets confirmed by an expert whose expertise credentials are manifested by the word ‘expert’ next to his/her name. And the same expert for $28/month will feed you with more nonsense.
It is a dead end. Results are not good and because of the requirement to have an expert on every possible topic, the system is not scalable.
This new recurrent pattern emerging for AI search is that in order to use it, you must know the correct answer before you even ask the question. Otherwise you'll be happy with the answer 42.