AI is getting more powerful, but its hallucinations are getting worse

By Cade Metz and Karen Weise
New York Times
17 May, 2025 07:00 PM

There is still no way of ensuring that AI bots are producing accurate information. Photo / Erik Carter, The New York Times

A new wave of “reasoning” systems from companies like OpenAI is producing incorrect information more often. Even the companies don’t know why.

Last month, an artificial intelligence bot that handles tech support for Cursor, an up-and-coming tool for computer programmers, alerted several customers about a change in company policy. It said they were no longer allowed to use Cursor on more than one computer.

In angry posts to internet message boards, the customers complained. Some cancelled their Cursor accounts. And some got even angrier when they realised what had happened: the AI bot had announced a policy change that did not exist.

“We have no such policy. You’re of course free to use Cursor on multiple machines,” the company’s CEO and co-founder, Michael Truell, wrote in a Reddit post. “Unfortunately, this is an incorrect response from a front-line AI support bot.”

More than two years after the arrival of ChatGPT, tech companies, office workers and everyday consumers are using AI bots for an increasingly wide array of tasks. But there is still no way of ensuring that these systems produce accurate information.


The newest and most powerful technologies – so-called reasoning systems from companies including OpenAI, Google and the Chinese startup DeepSeek – are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.

Today’s AI bots are based on complex mathematical systems that learn their skills by analysing enormous amounts of digital data. They do not – and cannot – decide what is true and what is false. Sometimes, they just make stuff up, a phenomenon some AI researchers call hallucinations. On one test, the hallucination rates of newer AI systems were as high as 79%.

These systems use mathematical probabilities to guess the best response, not a strict set of rules defined by human engineers. So they make a certain number of mistakes. “Despite our best efforts, they will always hallucinate,” said Amr Awadallah, CEO of Vectara, a startup that builds AI tools for businesses, and a former Google executive. “That will never go away.”
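
To make that concrete, here is a minimal sketch in Python of the probabilistic guessing described above. The candidate words and their probabilities are invented for illustration; a real chatbot scores tens of thousands of tokens using billions of learned parameters rather than a hand-written table.

```python
import random

# Toy next-word distribution for the prompt "The capital of Australia is".
# These probabilities are assumptions made up for illustration; a real
# model derives them from its training data, not from a lookup table.
next_word_probs = {
    "Canberra": 0.70,    # correct
    "Sydney": 0.25,      # plausible but wrong
    "Melbourne": 0.05,   # plausible but wrong
}

def sample_next_word(probs: dict[str, float]) -> str:
    """Pick a word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Sampled repeatedly, the system answers wrongly a predictable fraction
# of the time: nothing in the mechanism checks what is actually true.
answers = [sample_next_word(next_word_probs) for _ in range(1000)]
print("wrong answers:", sum(a != "Canberra" for a in answers) / 1000)
```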

Amr Awadallah, the chief executive of Vectara, which builds AI tools for businesses, believes AI “hallucinations” will persist. Photo / Cayce Clifford, The New York Times

For several years, this phenomenon has raised concerns about the reliability of these systems. Though they are useful in some situations – like writing term papers, summarising office documents and generating computer code – their mistakes can cause problems.

The AI bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.

Those hallucinations may not be a big problem for many people, but they are a serious issue for anyone using the technology with court documents, medical information or sensitive business data.

“You spend a lot of time trying to figure out which responses are factual and which aren’t,” said Pratik Verma, co-founder and CEO of Okahu, a company that helps businesses navigate the hallucination problem. “Not dealing with these errors properly basically eliminates the value of AI systems, which are supposed to automate tasks for you.”

Cursor and Truell did not respond to requests for comment.

For more than two years, companies such as OpenAI and Google steadily improved their AI systems and reduced the frequency of these errors. But with the use of new reasoning systems, errors are rising. The latest OpenAI systems hallucinate at a higher rate than the company’s previous system, according to the company’s own tests.

The company found that o3 – its most powerful system – hallucinated 33% of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI’s previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48%.

When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51% and 79%. The previous system, o1, hallucinated 44% of the time.

Since the arrival of ChatGPT, the phenomenon of hallucination has raised concerns about the reliability of AI systems. Photo / Kelsey McClellan, The New York Times

In a paper detailing the tests, OpenAI said more research was needed to understand the cause of these results. Because AI systems learn from more data than people can wrap their heads around, technologists struggle to determine why they behave in the ways they do.

“Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini,” a company spokesperson, Gaby Raila, said. “We’ll continue our research on hallucinations across all models to improve accuracy and reliability.”

Hannaneh Hajishirzi, a professor at the University of Washington and a researcher with the Allen Institute for Artificial Intelligence, is part of a team that recently devised a way of tracing a system’s behaviour back to the individual pieces of data it was trained on. But because systems learn from so much data – and because they can generate almost anything – this new tool can’t explain everything. “We still don’t know how these models work exactly,” she said.

Tests by independent companies and researchers indicate that hallucination rates are also rising for reasoning models from companies such as Google and DeepSeek.

Since late 2023, Awadallah’s company, Vectara, has tracked how often chatbots veer from the truth. The company asks these systems to perform a straightforward task that is readily verified: summarise specific news articles. Even then, chatbots persistently invent information.

Vectara’s original research estimated that in this situation chatbots made up information at least 3% of the time and sometimes as much as 27%.

In the year and a half since, companies such as OpenAI and Google pushed those numbers down into the 1% or 2% range. Others, such as San Francisco startup Anthropic, hovered around 4%. But hallucination rates on this test have risen with reasoning systems. DeepSeek’s reasoning system, R1, hallucinated 14.3% of the time. OpenAI’s o3 climbed to 6.8%.
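
As a rough illustration of the kind of check Vectara runs, the sketch below flags summary sentences whose content words never appear in the source article. The word-overlap heuristic is a crude, assumed stand-in; the article does not describe Vectara's actual scoring method, which would use far more sophisticated machinery.

```python
# Illustrative summarisation-faithfulness check: flag summary sentences
# whose content words do not appear in the source article. This overlap
# heuristic is an assumption for demonstration only, not Vectara's method.

def content_words(text: str) -> set[str]:
    stopwords = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was"}
    return {w.strip(".,").lower() for w in text.split()} - stopwords

def unsupported_sentences(article: str, summary: str) -> list[str]:
    source = content_words(article)
    flagged = []
    for sentence in summary.split(". "):
        words = content_words(sentence)
        # If most of a sentence's content words never occur in the
        # article, treat the sentence as potentially invented.
        if words and len(words & source) / len(words) < 0.5:
            flagged.append(sentence)
    return flagged

article = "The mayor opened the new bridge on Tuesday after two years of construction."
summary = "The mayor opened the bridge. The project cost 90 million dollars."
print(unsupported_sentences(article, summary))  # flags the invented cost claim
```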

(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement regarding news content related to AI systems. OpenAI and Microsoft have denied those claims.)

For years, companies like OpenAI relied on a simple concept: the more internet data they fed into their AI systems, the better those systems would perform. But they used up just about all the English text on the internet, which meant they needed a new way of improving their chatbots.

So these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behaviour through trial and error. It is working well in certain areas, such as maths and computer programming. But it is falling short in other areas.
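
Purely as an illustration of trial-and-error learning, not of how OpenAI or Google actually train their chatbots, the sketch below has a toy agent learn which of two strategies earns more reward by trying both and reinforcing the one that pays off. All of the numbers are assumptions.

```python
import random

# Toy trial-and-error loop in the spirit of reinforcement learning: the
# agent tries actions, observes a reward, and shifts its preferences
# toward whatever was rewarded. Real chatbot training is far more complex.
actions = ["show working", "guess immediately"]
value = {a: 0.0 for a in actions}   # estimated reward per action
counts = {a: 0 for a in actions}

def reward(action: str) -> float:
    # Assumed environment: careful working pays off 90% of the time,
    # guessing only 40%.
    p = 0.9 if action == "show working" else 0.4
    return 1.0 if random.random() < p else 0.0

for step in range(2000):
    # Explore occasionally; otherwise exploit the best-looking action.
    a = random.choice(actions) if random.random() < 0.1 else max(actions, key=value.get)
    r = reward(a)
    counts[a] += 1
    value[a] += (r - value[a]) / counts[a]  # running average of reward

print(value)  # the agent learns to prefer "show working"
```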

“The way these systems are trained, they will start focusing on one task – and start forgetting about others,” said Laura Perez-Beltrachini, a researcher at the University of Edinburgh who is among a team closely examining the hallucination problem.

Another issue is that reasoning models are designed to spend time “thinking” through complex problems before settling on an answer. As they try to tackle a problem step by step, they run the risk of hallucinating at each step. The errors can compound as they spend more time thinking.
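
The arithmetic behind that compounding is simple: if each step is independently correct with probability p, a chain of n steps is entirely correct with probability p to the power of n. The 98% per-step figure below is an assumed number, chosen only to show how quickly small errors stack up.

```python
# Assumed per-step accuracy; real models' rates vary widely by task.
p = 0.98
for n in (1, 5, 10, 20):
    # Probability that every one of the n reasoning steps is correct.
    print(f"{n:2d} steps: {p ** n:.0%} chance the whole chain is right")
```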

The latest bots reveal each step to users, which means the users may see each error, too. Researchers have also found that in many cases, the steps displayed by a bot are unrelated to the answer it eventually delivers.

“What the system says it is thinking is not necessarily what it is thinking,” said Aryo Pradipta Gema, an AI researcher at the University of Edinburgh and a fellow at Anthropic.

This article originally appeared in The New York Times.

Written by: Cade Metz and Karen Weise

Photographs by: Erik Carter, Cayce Clifford and Kelsey McClellan

©2025 THE NEW YORK TIMES
