
AI Systems Are Hallucinating More, and No One Fully Understands Why

As chatbots get smarter, they're also getting more unpredictable.

AI hallucinations have always been a stumbling block for generative models: the same architecture that lets them be creative and produce text and images also leaves them prone to making things up. And the hallucination problem isn't improving as AI advances; it's actually getting worse.

In a new technical report from OpenAI (as reported by The New York Times), the company reveals that its latest o3 and o4-mini models hallucinate at staggering rates of 51% and 79%, respectively, on an AI benchmark called SimpleQA. The older o1 model's SimpleQA hallucination rate is 44%.

Those numbers are concerning, and they're moving in the wrong direction. These models are known for their reasoning abilities, but it appears that all this deliberation leaves more room for errors and inaccuracies to creep in.

False facts aren't limited to OpenAI and ChatGPT. I've had no trouble getting Google's AI Overviews search feature to stumble, and AI's struggles to reliably pull accurate information from the web are well documented. Recently, a support bot for the AI coding app Cursor announced a policy change that hadn't actually been made.

You won't find many mentions of these hallucinations in the announcements of AI companies' latest innovations. Along with energy use and copyright infringement, hallucinations are something the big names in AI would rather not discuss.

While I haven't noticed too many errors in my own use of AI search and bots (the error rate is certainly nowhere near 79%), this seems to be a problem that may never go away, particularly since the teams building these AI models don't fully understand why hallucinations happen.

In tests run by AI platform developer Vectara, the results are better but still not perfect: many models show hallucination rates between 1 and 3 percent. OpenAI's o3 model stands at 6.8%, with the smaller o4-mini at 4.6%. That's more in line with my experience of these tools, but even a very low rate of hallucination can mean serious problems, especially as we hand more responsibilities over to AI systems.

Tracking Down the Causes of Hallucinations

No one seems to know how to fix hallucinations, or even to pin down exactly what causes them: these models aren't built to follow rules set by their developers; they work out their own ways of operating and responding. Vectara CEO Amr Awadallah told The New York Times that AI models will "always hallucinate" and that these problems will "never disappear."

University of Washington professor Hannaneh Hajishirzi, who is working on ways to reverse-engineer AI responses, told the NYT that "we still don't fully understand how these models work." As with fixing your car or computer, you can't solve a problem without knowing what's gone wrong.

According to researcher Neil Chowdhury of the AI analysis lab Transluce, the way reasoning models are built may be making the problem worse. "Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines," he told TechCrunch.

OpenAI's own performance report notes that these models have "less world knowledge," and that the o3 model tends to make more claims overall than its predecessor, which in turn leads to more hallucinations. Ultimately, though, "more research is needed to understand the causes of these results," according to OpenAI.

Plenty of researchers are working on the problem. Academics at Oxford University, for example, have proposed a way to estimate the probability of hallucination by measuring the variation between multiple outputs from the same AI. This comes at a cost in time and processing power, though, and it doesn't exactly fix hallucinations; it just tells you when they're more likely to be occurring.
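To make that idea concrete, here is a minimal sketch, assuming you've already collected several answers to the same question from a model; it is not the Oxford researchers' actual implementation. It clusters answers that look alike and computes an entropy-style variation score. The token-overlap similarity check, the 0.5 threshold, and the sample answers are all illustrative stand-ins.

    import math
    import re

    def similar(a: str, b: str) -> float:
        """Toy stand-in for semantic equivalence: Jaccard overlap of word tokens.
        (The published approach uses a language model to judge whether two
        answers mean the same thing; this is only an illustration.)"""
        ta = set(re.findall(r"[a-z0-9]+", a.lower()))
        tb = set(re.findall(r"[a-z0-9]+", b.lower()))
        return len(ta & tb) / len(ta | tb) if ta and tb else 0.0

    def cluster(answers, threshold=0.5):
        """Greedily group answers that appear to say the same thing."""
        groups = []
        for ans in answers:
            for g in groups:
                if similar(ans, g[0]) >= threshold:
                    g.append(ans)
                    break
            else:
                groups.append([ans])
        return groups

    def variation_score(answers):
        """Entropy over answer clusters: near zero when the model keeps giving
        the same answer, higher when its answers scatter (a warning sign that
        it may be hallucinating)."""
        groups = cluster(answers)
        n = len(answers)
        score = -sum((len(g) / n) * math.log(len(g) / n) for g in groups)
        return max(0.0, score)  # clamp the -0.0 float artifact for a single cluster

    # Hypothetical usage: ask the model the same question several times at a
    # temperature above zero, then score how much the answers disagree.
    consistent = ["Paris", "Paris, France", "Paris", "Paris"]
    scattered = ["Paris", "Rome", "Madrid", "Paris, Texas"]
    print(f"consistent answers: {variation_score(consistent):.2f}")  # low
    print(f"scattered answers:  {variation_score(scattered):.2f}")   # higher

In a real pipeline, the crude word-overlap check would be replaced with a proper equivalence judge, and the threshold would be calibrated against questions with known answers.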

Allowing AI models to double-check their facts against the web can help in some situations, but they're not particularly good at that either. They lack the basic human common sense that says glue shouldn't be put on pizza, or that $410 for a Starbucks coffee is obviously wrong.

What's clear is that AI bots can't always be trusted, despite their confident tone, whether they're offering news summaries, legal advice, or interview transcripts. That's important to remember as these models spread through our personal and professional lives, and it's a good idea to limit AI to use cases where hallucinations matter less.

Disclosure: Lifehacker's parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging that OpenAI infringed Ziff Davis copyrights in training and operating its AI systems.

  1. OpenAI's latest o3 and o4-mini models hallucinate at staggering rates of 51% and 79%, respectively, on the SimpleQA benchmark, compared with 44% for the older o1 model, showing that hallucination remains a persistent problem for AI.
  2. TechCrunch reports that researchers at the AI analysis lab Transluce believe the way reasoning models are built may be amplifying hallucination issues that standard post-training usually mitigates but doesn't fully erase.
  3. University of Washington professor Hannaneh Hajishirzi, who is working on reverse-engineering AI responses, admits that "we still don't fully understand how these models work."
  4. OpenAI's own performance report cites "less world knowledge" as one factor behind the o3 model's higher hallucination rate, and says more research is needed to understand the causes.