UK tech site The Register reports that Google Flu Trends has been a dismal failure, with the service over-reporting the incidence of influenza by a factor of nearly 12.
The problem lies in the algorithm used to detect a flu outbreak: it relies on people searching for the terms ‘flu’ or ‘influenza’, and it turns out we tend to over-react to a dose of the sniffles.
Google Flu Trends’ failure illustrates two important things about big data – the veracity of the data coming into the system and the validity of the assumptions underlying the algorithms processing the information.
In the case of Google Flu Trends both were flawed; the algorithm was based on incorrect assumptions while the incoming data was at best dubious.
The latter point is an important factor for the Internet of Machines. Instead of humans entering search terms, millions of sensors are pumping data into the system, so bad data from one sensor can have catastrophic effects on the rest of the network.
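To make the sensor problem concrete, here is a minimal sketch of one way a system might flag a suspect reading before it spreads through the network. It is an illustration only, not how any particular platform does it: the function name, the sensor ids and the readings are all hypothetical, and the check used (a modified z-score built on the median absolute deviation, with the commonly quoted cut-off of 3.5) is just one of many possible sanity tests.

```python
from statistics import median


def flag_suspect_readings(readings, threshold=3.5):
    """Return the ids of sensors whose reading sits far from the group median.

    `readings` maps a (hypothetical) sensor id to its latest value.
    The median and the median absolute deviation (MAD) are used because,
    unlike the mean, they are barely moved by the bad reading itself.
    """
    values = list(readings.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)

    if mad == 0:
        # At least half the sensors agree exactly, so any differing value is suspect.
        return [sid for sid, v in readings.items() if v != centre]

    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        sid
        for sid, v in readings.items()
        if 0.6745 * abs(v - centre) / mad > threshold
    ]


# Made-up temperatures from five sensors; "d" is clearly out of line.
latest = {"a": 20.1, "b": 19.8, "c": 20.3, "d": 95.0, "e": 20.0}
suspects = flag_suspect_readings(latest)
```

A check like this only catches a sensor that disagrees with its neighbours. It does nothing about the other half of the Flu Trends problem, where every input is skewed in the same direction.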
As managing data becomes a greater task for businesses and governments, making sure that data is trustworthy will be essential, and the rules that govern how the information is used will have to be robust.
Hopefully the lessons of Google Flu Trends will save us from more serious mistakes as we come to depend on what algorithms tell us about the data.