Kiwis' Google searches revealed to researchers the outcome of last year's cannabis referendum - and one prediction came within 0.2 per cent of the final "no" result days before it was announced.
But the lead author of a new study that used Google Trends to pick the outcome feels the tool still shouldn't be seen as an alternative to polling - and told the Herald he was surprised at his team's accurate forecast.
Launched 14 years ago, the Google Trends website analyses the popularity of top search queries across regions and languages, and uses graphs to compare volumes of different searches over time.
While there's been much hype over what the tool can tell us about election results and other issues, Dr Jacques Raubenheimer's studies are beginning to show that most of what researchers are doing with it is flawed.
Raubenheimer, a University of Sydney senior research fellow whose work includes studying what Google search data can reveal about drug use, turned with colleagues to New Zealand search trends around the country's referendum on legalising recreational cannabis.
"Our study did not seek, per se, to see whether we could predict the outcome of the referendum, but to compare the ability of various methods of accessing Google Trends data across various timeframes to predict the outcome," he said.
"Our hypothesis was that a prediction within 2 per cent of the final outcome would be a good prediction."
"We thought this was a good test case for the use of Google Trends data for predictions like this, because there was another referendum at the same time, and given the big issues of last year, the referendum was relatively low on the list of priorities."
Raubenheimer and his Australian, Kiwi and US colleagues began by focusing on queries that included either "cannabis AND referendum AND yes" or "cannabis AND referendum AND no".
Datasets based on those searches - the most frequent related ones in the lead-up to the October vote - were downloaded from Google Trends' open platform, as well as from another one only available to licensed members.
The data were then pooled into three sets: searches over the 90 days up to the last day of voting, the entire period of voting, and the last week of voting.
Next, they calculated their predictions as proportions of those two "yes" and "no" search queries combined, and explored how the results varied between different time periods and measures.
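The proportion calculation described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `no_share`, the choice to sum volumes over the period before taking the proportion, and the search volumes themselves are all assumptions made for the example.

```python
def no_share(yes_volume, no_volume):
    """Return the "no" share, in per cent, of the combined yes/no search volume."""
    total = yes_volume + no_volume
    if total == 0:
        raise ValueError("no search volume recorded for either query")
    return 100.0 * no_volume / total


# Hypothetical relative search volumes for each day of a period.
# These are made-up numbers, not figures from the study.
yes_searches = [20, 30, 50]
no_searches = [30, 40, 50]

# Assumption: volumes are summed over the period before taking the proportion.
predicted_no = no_share(sum(yes_searches), sum(no_searches))
print(f"Predicted 'no' share: {predicted_no:.1f} per cent")
```

Running the same function over the 90-day, whole-voting-period and last-week datasets would give one prediction per timeframe, which is the kind of comparison the study set out to make.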
On October 20, the researchers predicted a "no" result of between 49.0 and 55.4 per cent using daily data.
One specific prediction, calculated from Google's website accessed via the open-source PyTrends package, and using daily data over the entire voting period, came in at 51.4 per cent - just above the referendum's actual final "no" result of 51.17 per cent.
In another prediction, an aggregation of 500 samples of Google Trends Extended for Health data - which Raubenheimer felt could be more trusted as it had a smaller margin of error - gave a 51.8 per cent "no" prediction.
"To tell you the truth, I was sceptical of actually pulling this off," he said.
"We made a big point of making our prediction well in advance - it really was made on 20 October and by that time, most of the first draft of the paper had already been written."
To guard against publication bias - the tendency for studies with negative results to go unpublished - they submitted the paper before the preliminary results were announced on October 30.
"We faced the very real prospect that we would not get an accurate prediction. That we did - to an extent - was a pleasant surprise."
He added that not all of the predictions hit the mark.
Those based on hourly data proved much less accurate: one set, aggregated as the median over the time period, picked a 60 per cent "no" vote, while another, calculated as the mean over the same period, picked a 49 per cent "no".
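Why the mean and the median can point in different directions is easier to see with a small example. The Python sketch below is an assumption-laden illustration rather than the study's method: the hourly volumes are invented, and it assumes each series is first aggregated over the period and the "no" proportion is taken afterwards.

```python
from statistics import mean, median


def no_share(yes_value, no_value):
    """Return the "no" share, in per cent, of the combined yes/no value."""
    total = yes_value + no_value
    if total == 0:
        raise ValueError("no search volume recorded for either query")
    return 100.0 * no_value / total


# Hypothetical hourly relative volumes. Hourly search data tends to be
# sparse and spiky, so many hours are zero and a few are very large.
yes_hourly = [0, 0, 10, 0, 100, 5, 0, 20]
no_hourly = [0, 5, 12, 0, 30, 8, 0, 25]

# Assumption: aggregate each series first, then take the proportion.
share_from_mean = no_share(mean(yes_hourly), mean(no_hourly))
share_from_median = no_share(median(yes_hourly), median(no_hourly))

print(f"'No' share from hourly means:   {share_from_mean:.1f} per cent")
print(f"'No' share from hourly medians: {share_from_median:.1f} per cent")
```

In this made-up case a single large spike in "yes" searches drags the mean-based prediction well below 50 per cent, while the median largely ignores the spike and lands well above it, so the same data yields opposite calls depending on the summary chosen.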
That was important as researchers often only picked one data source or timeframe to make their own predictions - which he likened to playing a lottery.
"And this means that our study's results will have to be repeated several times to figure out what really is the best data source and time frame to look at."
Asked whether those results indicated Google Trends data could become a different way of polling, he said: "Definitely no."
"One of my biggest frustrations was that I had very little data to work with during those initial months — Google Trends really only started showing a signal that we could use in the very short window before and during the referendum, which is what we then used in the paper," he explained.
"Polls, on the other hand, were being conducted from 2018. Polls have longevity, but not necessarily speed. Google Trends has speed, but zero longevity."
Raubenheimer added that people's use of search engines like Google was complicated, as Seth Stephens-Davidowitz had noted in his book Everybody Lies.
"He makes the claim that people tend to treat Google more as a confessional than a search engine," he said.
"And the underlying premise here is that when people are googling the cannabis referendum - especially with one of the yes or no voting options - they are also subconsciously or consciously declaring their voting intention."
Whether that really was so or not, he said, was something that should be deliberated and debated.
"I do think, though, that Google Trends will never stand alone as a prediction tool for election or referendum results - but journalists all over the world are, for better or worse, already doing just that."