Doug Cutting, the person to blame for the Big Data wave sweeping organisations along with it, isn't your typical digital evangelist.

In Auckland for Spark's analytics offshoot Qrious, which will work with Cloudera, the big data company that employs Cutting, he comes across as very much an old-fashioned Silicon Valley engineer - straightforward and modest.

Cutting has worked on search engines since the 1980s, straight out of college, at the first wave of IT giants to emerge, including Yahoo and Apple, as well as the famed Xerox PARC research centre, whose inventions and concepts pretty much defined modern computing.

He tried drinking the startup Kool-Aid but "wasn't an entrepreneur at heart", and instead gave away the things he had come up with when Excite, the internet portal where he worked at the time, went bust.


One of the things that Cutting published under a very liberal open source licence was a system of computers that could store and process large amounts of data.

It was designed to work with commodity computers and survive hardware failures, and to scale up to massive sizes as needed.
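The core processing model behind that system is what became known as MapReduce. As a rough, single-machine illustration only (a real Hadoop job distributes these phases across many commodity machines, with the data itself replicated to survive failures), the shape of such a job looks something like this:

```python
# Toy sketch of the map/shuffle/reduce pattern behind Hadoop's original
# processing engine. Everything runs in one process here; the point is
# the shape of the job, not the distribution.
from collections import defaultdict

def map_phase(records):
    # Each mapper emits (key, value) pairs - here, (word, 1).
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # The framework groups values by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reducer combines the values for one key - here, by summing.
    return {key: sum(values) for key, values in groups.items()}

records = ["hadoop stores data", "hadoop processes data"]
result = reduce_phase(shuffle(map_phase(records)))
# result counts each word across all records
```

Because each phase works on independent chunks, the same job can run on one laptop or thousands of servers - which is what let the system scale the way it did.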

That was some 11 years ago, and by his own admission, Cutting had created something of a monster, "with lots of shortcomings".

These included no security, a difficult-to-use application programming interface, and a system that was hard to deploy and monitor - generally a bear to deal with.

Cutting's son named the system after a yellow stuffed toy elephant called Hadoop, and the rest is IT history.

Hadoop copped plenty of criticism initially for being difficult to use, requiring experts to program it, and skilful administrators to keep the system going.

That was then, and Cutting promises that Hadoop is becoming easier to deploy and to use, through high-level coding frameworks like Apache Spark, which is also open source.

Marketers, banks, insurers and other organisations are among those that collect lots of data and want to put it to good use with Hadoop. Some of the real-world applications for Hadoop include risk and customer churn analysis, fraud and anomaly detection, and even understanding how customers use - or don't use - certain products, Cutting said.
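To give a flavour of what anomaly detection means in practice, here is a deliberately simple, single-machine sketch - the transaction amounts and the two-standard-deviation threshold are invented for illustration; a real Hadoop or Spark job would apply the same idea across billions of records:

```python
# Toy anomaly detection: flag transaction amounts that sit far above
# the rest. Data and threshold are made up for illustration.
from statistics import mean, stdev

amounts = [12.0, 15.5, 14.2, 13.8, 250.0, 16.1, 14.9]

mu = mean(amounts)
sigma = stdev(amounts)

# Flag anything more than two standard deviations above the mean.
anomalies = [a for a in amounts if a > mu + 2 * sigma]
```

Fraud detection at scale layers far more sophistication on top, but the underlying pattern - learn what "normal" looks like from the bulk of the data, then flag outliers - is the same.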


Companies such as electric car maker Tesla upload just about everything their customers do to their company servers, in order to improve their products, Cutting added.

Data science tools are being developed, and Hadoop users can code in the popular Python and R languages. Machine learning and the ability to run Hadoop in the cloud are in the works too, with proprietary software vendor Microsoft, ironically, being hot on the open source big data platform.

One key attraction of Hadoop is that it is cheaper than traditional databases - one tenth the cost of the older technology, Cutting claimed.

What do local companies interested in dipping their toes into the big data lake need then, to get started with Hadoop?

Cutting provided a salutary reminder that he moves in a very different world when he explained that companies with a billion dollars in revenue would absolutely benefit from Hadoop (that's US dollars, by the way).

"Digital tech is pervading society, and if you want to understand what products and markets and the world is doing, you need the data," Cutting said.

"Otherwise, it's like closing your eyes, and putting your fingers in your ears," he added.

Such billion-dollar companies generate more than enough good, accurate data as part of their business, but yes, they are a bit bigger than the vast majority of New Zealand enterprises.

Qrious head David Leach assured me that Hadoop solutions can be tailored for local businesses as well, small as they are, and that they too would benefit from big data with better efficiencies and lower costs.

If you want to control your own fate though, Cutting was adamant that outsourcing isn't the right way to do it. It's similar to cloud computing: it can definitely save money, but you need someone on board with systems administration skills to make it work and to decide what's right for your business.

Instead, it's necessary to build big data skills in-house, and consider those to be part of the core set of business know-how needed to operate in 2017.

In other words, your next hire might have to be a Big Data engineer or scientist - or both. Start looking now, because those people are rarer than hen's teeth.