Garbage In and Garbage Out

The success of using Big Data depends upon the quality of both the data and the algorithm


UK tech site The Register reports that Google Flu Trends has been a dismal failure, with the service over-reporting the incidence of influenza by a factor of nearly 12.

The problem is that the algorithm used to detect a flu outbreak relies on people searching for the terms ‘flu’ or ‘influenza’, and it turns out we tend to over-react to a dose of the sniffles.

Google Flu Trends’ failure illustrates two things that matter for big data: the veracity of the data coming into the system and the validity of the assumptions underlying the algorithms that process it.

In the case of Google Flu Trends both were flawed: the algorithm was based on incorrect assumptions, while the incoming data was at best dubious.

The latter point is an important factor for the Internet of Machines. Instead of humans entering search terms, millions of sensors are pumping data into the system, so bad data from one sensor can have catastrophic effects on the rest of the network.
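One common defence is to check each reading against its neighbours before trusting it. What follows is only a minimal sketch of that idea in Python, not a description of how any real sensor network works: the sensor names, the readings and the cut-off of three deviations are all made up for illustration. It sets aside any reading that sits too far from the median of the group.

```python
from statistics import median


def filter_readings(readings, max_deviations=3.0):
    """Split readings into (trusted, rejected) by distance from the median.

    `readings` maps a hypothetical sensor id to its latest value.
    `max_deviations` is an assumed cut-off, not a recommended setting.
    """
    centre = median(readings.values())
    # Median absolute deviation: the typical distance from the centre.
    mad = median(abs(value - centre) for value in readings.values())
    # If every sensor agrees exactly the MAD is zero, so use a tiny tolerance.
    spread = mad or 1e-9

    trusted, rejected = {}, {}
    for sensor_id, value in readings.items():
        within_range = abs(value - centre) <= max_deviations * spread
        (trusted if within_range else rejected)[sensor_id] = value
    return trusted, rejected


# Illustrative values only: four plausible readings and one faulty sensor.
readings = {
    "sensor-01": 40.0,
    "sensor-02": 42.0,
    "sensor-03": 41.0,
    "sensor-04": 39.0,
    "sensor-05": 900.0,
}
trusted, rejected = filter_readings(readings)
print("trusted:", trusted)
print("rejected:", rejected)
```

The median is used rather than the average because a single wild reading drags an average towards itself, which is exactly the failure this check is meant to catch. A rule like this is itself an assumption about the data, though, so it deserves the same scrutiny as any other algorithm.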

As managing data becomes a greater task for businesses and governments, making sure that data is trustworthy will be essential and the rules that govern how the information is used will have to be robust.

Hopefully the lessons of Google Flu Trends will save us from more serious mistakes as we come to depend on what algorithms tell us about the data.


Author: Paul Wallbank

Paul Wallbank is a speaker and writer charting how technology is changing society and business. Paul has four regular technology advice radio programs on ABC, a weekly column on the smartcompany.com.au website and has published seven books.
