Adrift in the data lake

We’re awash with data and businesses have to figure out how not to drown in it

Last week Yahoo! closed down their directory pages, ending one of the defining services of the 1990s internet and showing how the internet has changed since the first dot com boom.

The Yahoo! Directory was a victim of a fundamental change in how we manage data, as Google showed it wasn’t necessary to tag and label every piece of information before it could be used.

Yahoo!’s Directory was a classic case of applying old methods to new technologies – in this case carrying out a librarian’s function of cataloguing and categorising every web page.

One problem with that way of saving information is you need to know part of the answer before you can start searching; you need to have some idea of what category your query comes under or the name of the business or person you’re looking for.

That pain was exploited by the Yellow Pages, where licensees around the world harvested a healthy cash flow from businesses forced to list under a dozen different categories to make sure prospective customers found them.

With the arrival of Google that way of structuring information came to an end as Sergey Brin and Larry Page’s smart algorithm showed it wasn’t necessary to pigeonhole information into highly structured databases.
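
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the pages, the categories and the `build_index` and `search` names are invented for the example, and a simple inverted index stands in for the far more sophisticated ranking a real search engine uses.

```python
from collections import defaultdict

# Hypothetical pages, invented for this illustration.
PAGES = {
    "joes-plumbing": "emergency plumber fixing burst pipes and blocked drains",
    "city-bakery": "fresh bread and cakes baked daily",
    "pipe-supplies": "wholesale pipes and fittings for builders",
}

# Directory approach: someone files each page under a category by hand,
# and the searcher has to guess the right category to find it.
directory = {
    "Trades > Plumbers": ["joes-plumbing"],
    "Food > Bakeries": ["city-bakery"],
    "Building > Suppliers": ["pipe-supplies"],
}


def build_index(pages):
    """Map every word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages containing every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


index = build_index(PAGES)
print(search(index, "pipes"))
```

A search for "pipes" finds both the plumber and the supplier without anyone having decided in advance which category either belongs to.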

Unstructured data

Rather than being structured, data is now becoming ‘unstructured’ and instead of employing an army of clerks to categorise information it’s now the job of computers to analyse that raw information and pick out what we need for our businesses and lives.

As information pours into companies from increasingly diverse sources, a flood that’s becoming so great it’s being referred to as the ‘data lake’, it’s become clear the battle to structure data is lost.

At the Splunk Conference in Las Vegas this week, the term ‘data lake’ is being used a lot as the company explains its technology for analysing business information.

Splunk, along with services like IBM’s Watson and Tableau Software, is one of the companies capitalising on businesses’ need to manage unstructured data by giving customers the tools to analyse their information without first having to shoehorn it into a database.

“Thanks to Google we got to look at data a different way,” says Splunk’s CEO and Chairman Godfrey Sullivan. “You don’t have to know the question before you start the search.”

Diving into the data lake

It’s always dangerous applying simple labels to computing technologies but some terms, like ‘Cloud Computing’, don’t do a bad job of describing the principles involved and so it is with the ‘data lake’.

Rather than a nice, orderly world where everything can be pigeonholed, we now have a fluid environment where it wouldn’t be possible to label everything even if we wanted to. A lake is a good description of the mass of data pouring into our lives.

The web was an early example of having to manage that data lake and Google showed how it could be done. Now it’s the turn of other companies to apply the principles to business.

Google fatally damaged both Yahoo! and the Yellow Pages; other companies that are stuck in the age of structured data are going to find the future equally dismal. Don’t drown in that data lake.

Paul travelled to Las Vegas as a guest of Splunk


Building community knowledge

Google’s Waze is a good example of shared intelligence

One of the promises of big data and the internet of things is that local governments will be able to gather information about the state of their infrastructure.

A good working example of this is Google’s Waze, the Israeli traffic monitoring startup bought by the search engine giant two years ago.

Waze gathers information about traffic delays and transit times from users, then aggregates it to give a picture of commuting times. It has always been a good example of how collaborative data can work.
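
As a rough sketch of that kind of aggregation, the Python below groups user reports by road segment and takes the median travel time. The segment names, the figures and the `typical_transit_times` function are all invented for illustration; nothing here describes how Waze itself processes its data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical user reports: (road segment, minutes taken to travel it).
reports = [
    ("harbour-bridge", 12.0),
    ("harbour-bridge", 15.0),
    ("harbour-bridge", 41.0),  # one driver stuck behind a breakdown
    ("main-street", 4.0),
    ("main-street", 6.0),
]


def typical_transit_times(reports):
    """Group reports by segment and return the median time for each."""
    by_segment = defaultdict(list)
    for segment, minutes in reports:
        by_segment[segment].append(minutes)
    # The median is used so a single unusual trip does not skew the picture.
    return {segment: median(times) for segment, times in by_segment.items()}


times = typical_transit_times(reports)
for segment, minutes in sorted(times.items()):
    print(f"{segment}: {minutes} minutes")
```

The more users report on a segment, the more reliable its figure becomes, which is the appeal of collaborative data.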

This week Google announced the service will share its information with a handful of transit agencies and councils to improve their knowledge of the traffic choke points in their cities.

In return the agencies will give their transit information to Waze.

Waze’s story is a good example of how sensors and people, in this case smartphones and their users, are going to gather information on infrastructure and cities. The key is going to be in making sure that data isn’t locked into proprietary databases.


The limits of big data

A story of lost school books illustrates the limits of big data

A story in the Atlantic – Why Poor Schools Can’t Win At Standardized Testing – illustrates the limits of Big Data.

When Meredith Broussard tried to computerise the textbook inventory of her son’s school district she found the project limited by poor systems, fragmented record keeping and siloed management.

Broussard found the records were manually collated, collected in Microsoft Word documents and emailed to an under-resourced office that entered the details into an Excel spreadsheet.

The Philadelphia schools don’t just have a textbook problem. They have a data problem—which is actually a people problem. We tend to think of data as immutable truth. But we forget that data and data-collection systems are created by people.

The human factor is a key limitation with any technology; if people aren’t collecting or using data properly then the best computer system in the world is useless. Garbage In, Garbage Out is a long-standing IT industry saying.

Management systems are more than computer networks; they go to the very core of an organisation’s culture, which in itself is probably a better indicator of how well a company or institution will survive the current period of change.

Were the Philadelphia public school system a business it would be a very good example of a company on its way to being digital roadkill; that it’s an educational network should worry anybody concerned about the economy’s future. That’s a bigger issue than Big Data.

 


Staying healthy with Big Data

Doctors are starting to match shopping patterns to health problems

US medical centre chain Carolinas HealthCare has started mining patients’ credit card data to predict health outcomes, reports Bloomberg Businessweek.

The idea is that by looking at credit information and purchasing records, doctors can anticipate what ailments their patients will present with.

Carolinas HealthCare’s matching of spending patterns to health is an obvious application of Big Data, one that illustrates some of the benefits that mining information can deliver for individuals and the community.

Should the project overcome patients’ valid privacy concerns, this is the sort of application that is going to be increasingly common as organisations figure out how to apply software to their mountains of information.


A bot named Willy and the risk of trusting data

Allegations of Bitcoin market manipulation are a reminder of the risks in blindly trusting data.

For two years we were captivated by the spectacular rise of the Bitcoin virtual currency. Allegations that those gains were a result of market fixing raise important questions about the integrity of our data networks.

The Coin Desk website discusses how the Mt Gox Bitcoin exchange was being ramped by a computer bot network nicknamed Willy.

Rampant market ramping – where stock prices are pushed up to attract suckers before those in the know sell at a profit – has a proud financial market history; during the 1920s US stock boom, fortunes were made by inside players before the crash and the practice’s subsequent banning in 1934.

So it wouldn’t be a surprise that some smart players would try to ramp the Bitcoin market to make a buck, and using a botnet – a network of infected computers – to run the trades is a good technological twist.

Blindly trusting data

The Willy botnet though is a worry for those of us watching the connected economy as it shows a number of weaknesses in a world where data is blindly trusted.

As Quinn Norton writes on Medium, everything in the software industry is broken and blindly trusting the data pouring into servers could be a risky move.

The internet of things is based upon the idea of sensors gathering data for smart services to make decisions – one of those decisions is buying and selling securities.

Feeding false information

It’s not too hard to see a scenario where a compromised service feeds false data such as steel shipments, pork belly consumption or energy usage to manipulate market prices or to damage a competitor’s business.

Real world ramifications of bad data could see not only honest investors out of pocket but also steel workers out of work, abattoirs sitting on unsold stocks of pig carcasses or blackouts as energy companies miscalculate demand.

The latter has happened before, with Enron manipulating the Californian electricity market in 2000 and 2001.

When your supply chain depends upon connected devices reporting accurate information then the integrity of data becomes critical.
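
One common way of checking that a reading hasn’t been altered on its way from a device is to attach a keyed signature to it. The Python below is a minimal sketch using the standard library’s `hmac` module; the shared key, the sensor name and the `sign_reading` and `verify_reading` functions are assumptions made up for the example, not part of any particular product.

```python
import hashlib
import hmac
import json

# Hypothetical secret shared by the sensor and the server (an assumption
# for this sketch; real deployments need proper key management).
SHARED_KEY = b"example-key-known-to-sensor-and-server"


def sign_reading(reading, key=SHARED_KEY):
    """Serialise a reading and compute an HMAC-SHA256 tag over it."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_reading(payload, tag, key=SHARED_KEY):
    """Return True only if the tag matches the payload under the key."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


payload, tag = sign_reading({"sensor": "meter-17", "kwh": 42.5})
tampered = payload.replace(b"42.5", b"99.9")

print(verify_reading(payload, tag))
print(verify_reading(tampered, tag))
```

A check like this only shows the reading came from a holder of the key and wasn’t changed in transit. It does nothing if the device itself has been compromised and is reporting false figures, which is why trust remains the underlying problem.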

Like much in the computer world, the world of big data and the internet of things is based upon trust; the Mt Gox Bitcoin manipulation reminds us that we can’t always trust the data we receive.


The what and the why

SurveyMonkey CEO David Goldberg believes we’re still in the early days of understanding the new economy

“People are drowning in big data,” SurveyMonkey’s CEO Dave Goldberg says in the latest Decoding The New Economy video.

Goldberg sees SurveyMonkey as bringing order to the world of big data by allowing organisations to put their information in context. “We want people to ask the right questions so we can get better data.”

“Here’s a question I need to answer – how happy are my employees? What do customers think of my new product? What are my students doing at school this year?”

Growing the survey industry

One group that’s uncomfortable with the rise of SurveyMonkey, a privately held company worth $1.3 billion after a capital raising last year, is traditional market research firms, who see the service as putting a powerful tool in inexperienced hands. Goldberg sees it as an opportunity for the market research industry.

“We’re not replacing market researchers,” says Goldberg, “most people who come to SurveyMonkey haven’t used a market researcher before. It actually probably creates more demand for more sophisticated research down the line.”

Goldberg himself isn’t from a market research background; instead he hails from the tech sector, having set up LAUNCH, one of the early music streaming companies, in 1994. He sold it to Yahoo! in 2001 and became the company’s Director of Music.

He left Yahoo! in 2007 and spent two years in the venture capital industry before joining SurveyMonkey as CEO in 2009.

Understanding the data

From his experience, Goldberg sees understanding data as the key business skill for today’s workers, firmly believing that kids should be taught statistics rather than coding.

“Everyone is going to have to learn how to use data,” says Goldberg. “Someone was asking me the other day about what sort of skills should we teach our kids to prepare them for the future and I think the thing we’re not doing enough of is teaching them how to use and analyze data.”

To Goldberg we’re still in the early days of understanding how mobile and social media are going to change business, with understanding data being one of the great opportunities.

“Implicit data is really interesting but it tells you ‘what’, it doesn’t tell you the ‘why’,” believes Goldberg. “We think what we do is the explicit side, we gotta ask people to get the ‘why’.”

 


The Australian Internet of Things Forum

The first Australian Internet of Things Forum is launched

The first Australian Internet of Things Forum was held in Newcastle today. I MC’d the event and managed to give a quick presentation on my Geek’s Tour of Barcelona.

Big Data was the big message from all the day’s sessions with every speaker touching on the challenge of understanding and securing the vast amounts of data collected.

It’s interesting how the technologists — and most of the material was quite high level — have identified this as the main problem facing management with the Internet of Things.

A key takeaway from the forum is that the clear opportunity for entrepreneurs with the IoT lies in giving businesses the tools to understand the data.

One of the reasons for the event was to launch the Kaooma Project that aims to link local businesses to the Internet of Things. The local business angle is something that needs to be explored in more depth.
