Another week, another massive data leak. This time, someone had left some 1.2 billion personal data records in open, internet-exposed databases hosted on Google Cloud.
Of the records, over 622 million contained an email address, including mine. I know that because I have alerts enabled from the excellent Have I Been Pwned? service; if you haven't signed up to HIBP, do so now.
Sooner or later, almost any service provider you use will leak data, and it's good to know when that happens so you can change passwords or delete accounts on compromised hosts.
In this case, the leaking organisation isn't known, but the data came from two companies: People Data Labs (PDL) and Oxydata.
The two provide a service called data enrichment. If you know someone's name or, better yet, their email address, it's easy to look them up in giant databases and add more information about them.
That's the enrichment. Lots of companies collect data on you and trade it for marketing purposes. Adobe - yes, the Photoshop creator - is big on marketing and offers lots of specific lists with data on individuals, as researcher Wolfie Christl pointed out.
PDL and Oxydata think the data comes from one of their customers, but they don't know who left so much personal information accessible to anyone with an internet connection. A fast one, that is, as the data trove weighed in at 4 terabytes.
That's pretty amazingly careless, but massive data leaks like that happen at regular intervals so nobody should be surprised.
Oxydata is pretty anonymous: its website, which brags about 380 million profiles on individuals, gives no indication as to who's behind the company.
PDL, on the other hand, is more open. It names its co-founder Sean Thorne on the site and even offers free access, for up to 1000 queries a month, to its 1.5 billion records on individuals.
For New Zealand, PDL holds 2,930,945 records in its database.
The data includes links to LinkedIn and Facebook profiles, birth dates, phone numbers, locations, work and personal email addresses and much more in some records, much less in others.
According to PDL, the information has been gathered from customers who volunteered it, and from public data sources. Think about that for a while: if, like me, you had never heard of PDL before, the next thing you wonder is who gave or sold them your data.
It's free and simple to peek in PDL's database with minimal coding required, so I did.
There's plenty of detail on me in there, though not as much as I thought. One particular record looked like it had been merged with someone else's data and was really wrong as a result, which is arguably worse as it could lead to mistakes and confusion with other people.
PDL holds far less data on Jacinda Ardern, but has almost as much on Privacy Commissioner John Edwards (yes, I notified him about this) as it does on me. For Edwards and me, the PDL data looks similar to what's on our LinkedIn profiles, which might be an indication of its origin. LinkedIn suffered a data breach in 2012, with 167 million account credentials being leaked.
All the records I looked at have unique identifiers and SHA-256 hash values computed from email addresses.
I am going to go out on a limb here and say that neither Ardern nor Edwards has given permission for PDL to aggregate their data for profile enrichment or other purposes.
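For the curious, here is a minimal sketch of how a SHA-256 hash of an email address can be computed with Python's standard library. How PDL prepares addresses before hashing isn't stated, so the trimming and lower-casing step is an assumption, and the function name and sample address are made up for illustration.

```python
import hashlib


def email_sha256(email: str) -> str:
    """Return the hex SHA-256 digest of an email address."""
    # Assumption: strip surrounding whitespace and lower-case the address
    # first, so that "Someone@Example.com" and "someone@example.com" hash
    # to the same value. The broker's actual normalisation is not known.
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Hypothetical address, used only to show the shape of the output.
digest = email_sha256("  Someone@Example.com ")
print(digest)  # 64 hexadecimal characters
```

A hash like this lets two databases match records on the same email address without storing it in plain text, but it offers little protection on its own: anyone holding a list of known addresses can hash them the same way and compare.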
I most definitely have not done that, and am waiting to hear back from PDL with an explanation as to how they got the data, and an assurance that it will be deleted.
Even if the data is deleted from PDL's database, though, it will still be stored elsewhere. By whom? By mysterious, irresponsible organisations that are completely clueless about basic information security, I wager. I don't want that to happen, for obvious reasons.
Sure, the cat's out of the bag on this one, but the potential for misuse of data is massive, and not just by surveillance capitalists and hypertargeting marketroids. Spammers and phishers also love data that makes their scammy offerings look less bogus, and scaling them up to billion-target levels must seem like a dream to criminals.
We need to think about what it means to have an unregulated personal information broking industry selling personal data on just about anyone, cheaply and efficiently, and ask ourselves if this is really what we want - or need.
At the very minimum, we should be told who holds our data and why, and given a chance to opt out. That's for organisations operating above the line of course; the others need to be reported to the authorities and closed down.