NZ Herald

Five AI bots took our tough reading test. One was smartest – and it wasn’t ChatGPT

By Geoffrey A. Fowler
Washington Post
4 Jun, 2025 09:43 PM

AI summaries frequently left out important information and overemphasised the positive (while ignoring the negative). Photo / 123rf

All of the most popular artificial intelligence (AI) chatbots let you upload documents, from legal contracts to an entire book, and have them summarised. The tech promises to give you a kind of speed-reading superpower. But do any of the bots really understand what they’re reading?

To figure out which AI tools you can trust as a reading assistant, I held a competition. I challenged five bots to read four very different types of writing and then tested their comprehension. The reading spanned the liberal arts, including a novel, medical research, legal agreements and speeches by President Donald Trump.

To judge the AI tools’ summaries and analysis, I gathered a panel of experts – including the original authors of the book and scientific reports.

All told, I asked 115 questions about the assigned reading to ChatGPT, Claude, Copilot, Meta AI and Gemini. Some of the AI responses were astoundingly good. Others were so clueless they sounded like Seinfeld’s George Costanza.

All the bots, barring one, made up – or “hallucinated” – information, a persistent AI problem. But facts were only one part of the challenge; my questions also asked the AI to provide analysis, such as recommending improvements to the contracts and spotting factual problems in Trump’s speeches. (In March, I ran a similar test asking AI to write tough emails. Send me an email about what you’d like me to test next.)

Wait, shouldn’t people be doing their own reading? There’s still no substitute for reading yourself, particularly if you’re trying to learn or experience art. But for better or worse, people are turning to AI for help when they want to get up to speed on a new topic, need help decoding jargon or need to cheat their way through a meeting. Summarisation is emerging as a core use for AI, and chatbots promise to be a kind of CliffsNotes where you can ask follow-up questions.

If you use AI, this test offers a real-world assessment of what the current tech can – and cannot – reliably accomplish. (The Washington Post has a content partnership with ChatGPT’s maker, OpenAI.)

Here’s how the bots performed on each topic, followed by an overall champion and our judges’ conclusions.

Literature

Best: ChatGPT

Literature was the worst subject overall for the bots. Only Claude got all the facts right about Chris Bohjalian’s 2025 Civil War love story, The Jackal’s Mistress.

Gemini, which wrote very short responses to our questions, was most often guilty of what Bohjalian called inaccurate, misleading and sloppy reading. In one summary, Gemini described a man who just had a leg amputated “appearing” on another character’s doorstep. Bohjalian says the answer reminded him of the Seinfeld episode where Costanza watches the Breakfast at Tiffany’s movie instead of reading the novel and ends up embarrassing himself at the book club.

Even the best overall summary of the book, which came from ChatGPT, left something to be desired. “The response could be copy for the dust jacket. But it also discusses only three of the five major characters, ignoring the important role of the two formerly enslaved people,” says Bohjalian. In fact, he noticed the overly “positive” AI helpers often failed to address slavery and the Civil War.

That said, the quality of answers to more analytical questions by both ChatGPT and Claude left Bohjalian gobsmacked. Prompted to describe how the book’s epilogue “made you feel,” both bots appeared to have “all the feels”, Bohjalian says.

“These responses express precisely what I was trying to convey,” says Bohjalian.

Scores, out of 10: ChatGPT 7.8; Claude 7.3; Meta AI 4.3; Copilot 3.5; Gemini 2.3

Law

Best: Claude

Sterling Miller, a long-time corporate lawyer, judged our AI tools’ understanding of two common legal contracts that people often have to handle without a lawyer’s help. What he found was inconsistency.

At times, Meta AI and ChatGPT tried to reduce complex parts of the contracts to one-line summaries. “That is basically useless,” Miller says.

Worse, the bots sometimes didn’t seem to appreciate significant nuances. In our test rental agreement, Meta AI skipped several sections entirely and missed that a landlord could enter the property at any time. ChatGPT forgot to mention a key clause in a contractor agreement about who owned inventions.

Claude won overall by offering the most consistently decent answers to our questions. And it did its best work on our most complex request: suggesting changes to our test rental agreement. Miller said Claude’s answer was complete, picked up on nuance and laid things out exactly like he would.

On that prompt, it came the closest to being a “good substitute for a lawyer,” Miller says. “The problem is none of the tools got 10s across the board.”

Scores, out of 10: Claude 6.9; Gemini 6.1; Copilot 5.4; ChatGPT 5.3; Meta AI 2.6

Health science

Best: Claude

On average, the AI tools scored better at analysing scientific research than on any other subject. In our test of two papers co-written by judge Eric Topol, less than two points separated the best and worst performances.

It’s hard to say exactly why. AI might have access to a lot of scientific papers in its training data. Research reports were also the only documents in our tests that follow a very predictable structure, including their own human-written summary introduction.

Topol’s lowest score of 4 went to Gemini for its summary of a study on Parkinson’s disease. The response didn’t introduce hallucinations, but it left out key descriptions of the study and why it mattered.

Claude was the only AI tool to earn a score of 10 out of 10. Topol gave that for its summary of his paper on long covid, which helpfully broke down the results for different kinds of patients and highlighted the most important takeaway from the paper for doctors treating covid patients.

However, on an analytical question about how one study accounted for racial differences, Claude scored only a 5. “I was very surprised at how different the responses were for the different prompts,” says Topol.

Scores, out of 10: Claude 7.7; ChatGPT 7.2; Copilot 7; Gemini 6.5; Meta AI 6

Politics

Best: ChatGPT

Trump’s speeches can be so meandering, they’ve garnered their own stylistic nickname: “the weave”. Cat Zakrzewski, a Washington Post White House reporter, judged whether AI could make out what he was actually asserting and analyse what it meant.

For example, we asked the bots to analyse Trump’s 100-day rally in Michigan, in which he mentioned jobs returning to the state about a dozen times. But how many jobs? Copilot incorrectly said thousands by conflating some comments Trump made about keeping an Air Force base open. Meta AI answered best by reporting that Trump never specified, while also highlighting what he did suggest about auto jobs.

ChatGPT stood out from the pack with impressive responses to about half of our questions. For example, when we asked it to identify what rival Democrats wouldn’t like about Trump’s unscripted 100-day rally, it produced a bullet-point list that hit all the right notes. “This answer does a good job of drawing specific examples from the speech, and it provides accurate context,” Zakrzewski says. What’s more, it “accurately fact-checks Trump’s false claims that he won the 2020 election”.

The bots got into the most trouble conveying Trump’s tone. For example, Copilot’s summary of the 100-day rally was factually accurate but didn’t capture its charged nature. “If you only read this summary, you might not believe Trump delivered this speech,” says Zakrzewski.

Scores, out of 10: ChatGPT 7.2; Claude 6.2; Meta AI 5.2; Gemini 5; Copilot 3.7

And the overall winner is …

Claude edged out ChatGPT and left the others in the dust.

Overall winner Claude was also the only model that never hallucinated.

What did we learn?

So is that good or bad? Both Claude and ChatGPT produced some analysis that knocked it out of the park, the judges said.

During his evaluations of those two tools, Bohjalian was flabbergasted. “Okay, I’m done. Whole human race is. Stick a fork in us,” he noted.

But you could also see the results this way: none of the bots scored higher than 70% overall – the typical cutoff for a D+.

Beyond hallucinations, a number of limitations echoed across the tests. AI summaries frequently left out important information and overemphasised the positive (while ignoring the negative). Too often, Bohjalian says, you could “really see the robot hiding behind the human mask” pretending to be an expert in something it didn’t actually understand.

And an AI tool’s capability in one field didn’t necessarily translate to another. ChatGPT, for example, might have been tops in politics and literature but ranked near the bottom in law.
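
The per-subject scores reported above are enough to check that claim for yourself. Here is a minimal sketch in Python that uses only the published section scores; the function name `ranks_for` is an illustrative choice, not part of the original test, and the ranking says nothing about how the judges arrived at the overall result.

```python
# Per-subject scores (out of 10), copied from the "Scores" lines in this article.
SCORES = {
    "Literature": {"ChatGPT": 7.8, "Claude": 7.3, "Meta AI": 4.3, "Copilot": 3.5, "Gemini": 2.3},
    "Law": {"Claude": 6.9, "Gemini": 6.1, "Copilot": 5.4, "ChatGPT": 5.3, "Meta AI": 2.6},
    "Health science": {"Claude": 7.7, "ChatGPT": 7.2, "Copilot": 7.0, "Gemini": 6.5, "Meta AI": 6.0},
    "Politics": {"ChatGPT": 7.2, "Claude": 6.2, "Meta AI": 5.2, "Gemini": 5.0, "Copilot": 3.7},
}


def ranks_for(bot):
    """Return {subject: rank} for one bot, where rank 1 is the highest score."""
    result = {}
    for subject, scores in SCORES.items():
        # Sort bot names by score, highest first, then find this bot's position.
        ordered = sorted(scores, key=scores.get, reverse=True)
        result[subject] = ordered.index(bot) + 1
    return result


if __name__ == "__main__":
    for bot in ("ChatGPT", "Claude"):
        print(bot, ranks_for(bot))
```

Run as a script, it shows ChatGPT placing first in literature and politics but fourth of five in law, while Claude places first or second in every subject.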

The judges highlight the inconsistency as a reason for caution.

Miller says AI is not a substitute for a lawyer. “If paying an attorney is out of the question or if you just want to have something in hand while you also read through the agreement or document,” he says, “then using generative AI is an ‘okay’ solution.”

I’d also recommend running your document through at least two AI tools, so you can compare the results. And for anything that’s actually important in your life, it’s definitely worth taking the time to read it yourself.

© Copyright 2025 NZME Publishing Limited