Last week Yahoo! closed down its directory pages, ending one of the defining services of the 1990s internet and showing how the internet has changed since the first dot-com boom.
The Yahoo! Directory was a victim of a fundamental change in how we manage data, as Google showed it wasn’t necessary to tag and label every piece of information before it could be used.
Yahoo!’s Directory was a classic case of applying old methods to new technologies – in this case carrying out a librarian’s function of cataloguing and categorising every web page.
One problem with that way of saving information is you need to know part of the answer before you can start searching; you need to have some idea of what category your query comes under or the name of the business or person you’re looking for.
That pain was exploited by the Yellow Pages, where licensees around the world harvested a healthy cash flow from businesses forced to list under a dozen different categories to make sure prospective customers found them.
With the arrival of Google, that way of structuring information came to an end as Sergey Brin and Larry Page’s smart algorithm showed it wasn’t necessary to pigeonhole information into highly structured databases.
Rather than being structured, data is now becoming ‘unstructured’ and instead of employing an army of clerks to categorise information it’s now the job of computers to analyse that raw information and pick out what we need for our businesses and lives.
As information pours into companies from increasingly diverse sources, a flood that’s becoming so great it’s being referred to as the ‘data lake’, it’s become clear the battle to structure data is lost.
At the Splunk Conference in Las Vegas this week, the term ‘data lake’ is being used a lot as the company explains its technology for analysing business information.
Splunk, along with services like IBM’s Watson and Tableau Software, is one of the companies capitalising on businesses’ need to manage unstructured data by giving customers the tools to analyse their information without first having to shoehorn it into a database.
“Thanks to Google we got to look at data a different way,” says Splunk’s CEO and Chairman Godfrey Sullivan. “You don’t have to know the question before you start the search.”
Diving into the data lake
It’s always dangerous applying simple labels to computing technologies but some terms, like ‘Cloud Computing’, don’t do a bad job of describing the principles involved and so it is with the ‘data lake’.
Rather than a nice, orderly world where everything can be pigeonholed, we now have a fluid environment where it wouldn’t be possible to label everything even if we wanted to. A lake is a good description of the mass of data pouring into our lives.
The web was an early example of having to manage that data lake and Google showed how it could be done. Now it’s the turn of other companies to apply the principles to business.
Google fatally damaged both Yahoo! and the Yellow Pages; other companies that are stuck in the age of structured data are going to find the future equally dismal. Don’t drown in that data lake.
Paul travelled to Las Vegas as a guest of Splunk