King Canute and Google: When the algorithm is wrong

As society and business drown in big data, we’re relying on algorithms and computer programs to help us wade through the masses of information. Could that be a weakness?

As society and business drown in big data, we’re relying on algorithms and computer programs to help us wade through a flood of information. Could that reliance be a weakness?

British archaeology site Digital Digging discusses how Google displays Manchester United winger Ryan Giggs in the search results for Cnut, the ancient king of Denmark better known in the English-speaking world as King Canute.

Apparently Giggs appears in the search results for Canute because of the footballer’s futile attempt to hold back a tide of information about his love life.

While Google’s algorithm seems to have made a mistake, it’s only doing what it’s been programmed to do. A lot of trusted websites have used the term ‘Canute’ or ‘Cnut’ in relation to Giggs so the machine presents his picture as being relevant to the search.
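
To see how that kind of association can arise, here is a toy sketch in Python. It is not Google’s algorithm; the pages, the trust weights and the `rank_entities` function are all invented for illustration. It simply credits a person whenever a page mentions both the search term and that person, weighted by how much the page is trusted.

```python
from collections import defaultdict

# Hypothetical pages: (trust weight between 0 and 1, page text).
PAGES = [
    (0.9, "Ryan Giggs plays King Canute against a tide of tweets"),
    (0.8, "Like Canute, Giggs could not hold back the tide"),
    (0.9, "Cnut the Great ruled Denmark, England and Norway"),
    (0.2, "King Canute themed fancy dress ideas"),
]

# Hypothetical people the ranker knows about, with the names that identify them.
PEOPLE = {
    "Ryan Giggs": ["giggs"],
    "Cnut the Great": ["cnut the great"],
}


def rank_entities(query_terms, pages, people):
    """Score each person by the trust of pages mentioning them alongside the query."""
    scores = defaultdict(float)
    for trust, text in pages:
        lowered = text.lower()
        if not any(term in lowered for term in query_terms):
            continue  # page does not mention the search term at all
        for person, names in people.items():
            if any(name in lowered for name in names):
                scores[person] += trust
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_entities(["canute", "cnut"], PAGES, PEOPLE)
print(ranking[0][0])  # the footballer outranks the king in this toy data
```

Nothing in the sketch is broken. It does exactly what it was told to do, and the odd result comes entirely from what the trusted pages chose to write.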

Confusing Ryan Giggs and King Canute is mildly amusing until we consider how critical algorithms like Google Search have become to decision making. There is no shortage of stories about people being wrongly billed, detained or even gaoled on the basis of bad information from computers.

The stakes in making mistakes based on bad information are being raised all the time as processes become more automated. A chilling technology roadmap for the US military, reported in Vice Magazine, describes the future of ‘autonomous warfare’.

By the end of 2021, just eight years away, the Pentagon sees “autonomous missions worldwide” as being one of its objectives.

Autonomous missions mean local commanders and drones being able to make decisions to kill people or attack communities based on what their computers tell them. The consequences of a bad result from a computer algorithm suddenly become very stark indeed.

While most decisions based on algorithms may not have the life-or-death consequences that a computer-ordered drone strike on a family picnic might have, mistakes could cost businesses money and individuals much inconvenience.

So it’s worthwhile considering how we build cultural and technological checks and balances into how we use big data and the algorithms necessary to analyse it, so that we minimise mistakes.

Contrary to legend, King Canute didn’t try to order the tide not to come in. He was trying to demonstrate to his obsequious court that he was fallible and as subject to the laws of nature and god as any other man.

Like the court of King Canute, we should be aware of the foibles and weaknesses of the technologies that increasingly guide us. The computer isn’t always right.

Big Data needs big databases

Investors are making big bets on the databases that underpin Big Data

While the tech industry’s startup hype this week has been focused on the impending Twitter Initial Public Offering, a much more fascinating company quietly completed a major capital raising.

MongoDB provides an open-source document database program and last week raised another $150 million from investors in a deal that values the company at $1.2 billion.

Databases lie at the heart of Big Data and businesses need better computer programs to manage the overwhelming amount of information that’s pouring in every day.

As every business is unique, larger corporations find they spend huge amounts of money on their databases. The enterprise that buys an Oracle, IBM or SAP system usually spends tens, if not hundreds, of millions of dollars in adapting the system to work for them, often with less than spectacular results.

While implementing MongoDB or any other open source program doesn’t eliminate implementation costs, it is often easier to set up and maintain as most of the information about the system is shared and freely available rather than locked inside the vendor’s proprietary knowledge bases.

Probably most important of all, the data structures themselves are open so customers don’t find themselves locked into a relationship with one vendor because all their information is in a format that can only be read by one system.

Open source is where Big Data, social media and cloud computing intersect – without the data itself being open and accessible, most cloud computing and social media services will almost certainly fail.

So MongoDB and the other open source products are the quiet, back of house technologies that keep the internet as we know it ticking along.

Bloomberg Businessweek reports there are some very serious investors in MongoDB.

The deal attracted new investors such as EMC Corp. (EMC:US) and Salesforce.com Inc. (CRM:US), along with previous backers Red Hat Inc. (RHT:US), Intel Corp. (INTC:US), New Enterprise Associates and Sequoia Capital, according to MongoDB.

Sequoia Capital is one of the longest lasting Silicon Valley venture capital firms, and its greatest success was being one of the first investors in Apple Computer. New Enterprise Associates has a similar pedigree with companies like 3Com, Juniper Networks and Vonage. Investment by industry leaders like Intel, Red Hat, Salesforce and EMC also shows MongoDB isn’t the standard Silicon Valley Greater Fool play.

When there’s a gold rush, it’s those selling the shovels who make the big money and the investors in MongoDB and similar services are hoping they’ve found some of the modern day shovels.

That may well turn out to be the case and while the smart folk make more money from the technologies that drive social media and cloud computing services, the rest of us are distracted by the latest shiny thing.

Exploring the internet of everything

What does the internet of everything mean for businesses? Cisco’s Ken Boal explains.

As part of the Decoding the New Economy video series, I had the opportunity of interviewing Ken Boal, the head of Cisco Australia and New Zealand, about the Internet of Everything and how it will change business.

“The internet of everything is about things, it’s about people, process and it’s about data,” says Ken. “Compounding together to create new capabilities and drive opportunities for nations, enterprises, government and right down to consumers.”

“It’s a huge transition in the internet’s evolution.”

Reducing the road toll

A previous Cisco presentation looked at some of the ways the internet of everything can reduce road deaths. Ken sees both private and public sector benefits of the connected economy flowing to consumers and the community.

“When you think about things like traffic congestion, health care and how education is delivered we know there’s huge opportunities for greater efficiency,” says Ken.

“Just on road safety, when we’ve got all the vehicles and trucks connected, when the traffic lights and traffic control systems are all connected,” suggests Ken, “then consumers are going to be better informed about what is the most efficient route to work.”

“Cars will be communicating with each other to reduce fatalities and collisions in the future as well.”

Bringing together industrial, consumer and public safety technologies creates a grid of connected devices, including cars, that improve public safety while making industries more efficient.

Of course these connected services come with risks to privacy, particularly when multiple points of data can be triangulated despite each individual item being anonymous on its own.

What Ken finds particularly important is the current value of these technologies, with Cisco predicting $1.4 billion in productivity gains through the internet of everything this year, half of which are available for businesses.

A warning for Australia

For Australia, the concern is that business and the economy in general are falling behind. Cisco’s recent Internet of Everything Value Index rated Australia among the BRIC countries in adopting the new technologies.

“We’ve always counted Aussies as fairly innovative and leading edge,” says Ken. “Australia is ranked tenth out of the twelve largest economies in the adoption of internet of everything capabilities.”

The leading countries, such as Japan, Germany and the United States, have had a solid record of investing in technology. “In Australia, we’ve had that in the past but we’ve lost our mojo,” Ken says. “IT has been viewed more as a problem – a cost to business – rather than a provider of productivity for the long term.”

How business can adapt

For businesses, the question is how can they take advantage of the internet of everything? “You don’t have to start from scratch,” says Ken. “There are a whole heap of use cases for every vertical.”

“Start to drive some innovation. Think about your business processes at the front end where you touch your customers, look at your supply chains and your back office arrangements driving workforce productivity and how fast can you deliver new innovations to the market.”

“Internet of everything themes can address a whole host of these different processes in different parts of your business.”

Privacy’s still beating heart and the social media challenge

The changing habits of younger web surfers are challenging the assumptions underlying social media services.

“I’m not a very public person,” twenty-two-year-old Walter Woodman tells the New Yorker in How A Relationship Dies on Facebook.

One of the assumptions of the social media industry is that digital natives, those born after 1990, have little if any expectations of privacy. The New Yorker story challenges that idea.

Much of the New Yorker’s background is taken from the Pew Centre’s May 2013 report Teens, Social Media and Privacy which interviewed 802 US teens and their parents to identify young adults’ attitudes towards privacy.

As the Pew Centre’s Mary Madden wrote in a follow-up post to that report, US teenagers aren’t about to abandon Facebook yet but they are concerned about privacy and the work involved in managing an online persona.

While some of our teen focus group participants reported positive feelings about their use of Facebook, many spoke negatively about an increasing adult presence, the high stakes of managing self-presentation on the site, the burden of negative social interactions (“drama”), or feeling overwhelmed by friends who share too much.

This suggests a far more mature, and complex, understanding of privacy by teenagers than many of the social media boosters assumed when declaring that privacy is irrelevant in the Facebook era.

Like their parents, teenagers and young adults know there are consequences for sharing too much online, which challenges the social media platforms that have built their businesses around users spilling everything about themselves into the big data pot.

It turns out digital natives are just as conscious of the risks as their parents, although how they handle it may manifest in different ways, and the assumptions of many social media businesses aren’t quite as robust as they appeared not so long ago.

A trillion points of data

As shopping centres, social media services and police forces collect greater amounts of information about us, we need to understand and manage the risks involved.

Last night, current affairs program Four Corners had a look at the risks to families in the age of Big Data.

Earlier in the day I had the opportunity to speak on ABC 702 Sydney with the program’s reporter, Geoff Thompson, to discuss some of the issues and take listeners’ calls about Big Data and security.

What stood out from the audience’s comments is how most people don’t understand the extent of how data is being shared. The frightening thing is the Four Corners program itself understated the extent of how information is being distributed around the internet.

Looking beyond social media

Social media sites like Facebook are an obvious and legitimate area of concern, with most people not understanding the ramifications of the terms and conditions of these services. However, Big Data is far more than what you share on LinkedIn or Instagram.

A major point of the program was how the New South Wales police force’s Automatic Number Plate Recognition (ANPR) equipment stores photographs of car license plates.

One of the applications of ANPR shown during the program was how an officer can be warned that a vehicle is owned by someone potentially dangerous or has been used in a suspicious situation, allowing them to be more cautious if they decide to pull a car over. Probably the greatest application is getting unregistered, uninsured or unlicensed drivers off the road.

Those sorts of uses are the positive side of Big Data and its role in reducing the road toll. The example also illustrates how data points are coming together with the internet of machines as traffic lights, road signs and cars themselves communicate with each other and those police databases.

When that information is put together there’s a lot of valuable intelligence, and that’s why people are concerned that the NSW Police are storing millions of apparently useless images of car number plates with the time and location of the photographs.

These technologies aren’t just being used by police. In shopping centres, instore mobile phone tracking combined with the same numberplate recognition the police use, watching who is entering the carparks, makes it possible to predict buying patterns and target offers to shoppers.

Couple that information with store loyalty cards, add in rapidly developing facial recognition, and retailers have a very powerful way of monitoring how their customers behave.

“What instore analytics does is it takes the same kind of capabilities that e-commerce sites have had for more than a decade and apply them to brick and mortar stores,” says Retail Next’s Tim Callen. Using the store’s CCTV system, the company applies facial recognition software to track shoppers’ behaviour.

Securing the data feeds

The immediate concern is the security of this data. We’ve covered the hackable baby monitor, and the Four Corners program examined Troy Hunt’s exposure of security flaws in Westfield Shopping Centres’ Find My Car App. Similar security concerns surround government databases like the NSW Police’s numberplate store.

As we’ve seen with the repeated data breaches of 2011, the management of big and small organisations like Sony or Stratfor doesn’t take security seriously. It’s hard to recall any senior public servant being held accountable for a security breach by their department.

A billion points of data

On their own, each of these data points means little, but a motivated marketer, tenacious police officer or determined stalker who pulls those separate information sources together can build an accurate picture of a person’s private information, habits and beliefs.
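
A minimal Python sketch shows how little effort that pulling together can take. Everything here is hypothetical: the two sighting logs, the identifiers and the `link_identifiers` function are invented for illustration and don’t describe any real police or retail system. The idea is simply that two separate ‘anonymous’ logs can be linked by counting how often a number plate and a phone turn up at the same place at the same time.

```python
from collections import Counter
from itertools import product

# Hypothetical "anonymous" logs: (identifier, place, hour of day).
PLATE_SIGHTINGS = [
    ("PLATE-1", "shopping centre carpark", 9),
    ("PLATE-1", "cinema carpark", 15),
    ("PLATE-1", "supermarket carpark", 18),
    ("PLATE-2", "shopping centre carpark", 9),
]
PHONE_SIGHTINGS = [
    ("PHONE-A", "shopping centre carpark", 9),
    ("PHONE-A", "cinema carpark", 15),
    ("PHONE-A", "supermarket carpark", 18),
    ("PHONE-B", "shopping centre carpark", 9),
]


def link_identifiers(plates, phones):
    """Count how often each plate and phone were seen at the same place and hour."""
    matches = Counter()
    for (plate, p_place, p_hour), (phone, f_place, f_hour) in product(plates, phones):
        if p_place == f_place and p_hour == f_hour:
            matches[(plate, phone)] += 1
    return matches


links = link_identifiers(PLATE_SIGHTINGS, PHONE_SIGHTINGS)
best_pair, shared_sightings = links.most_common(1)[0]
print(best_pair, shared_sightings)
```

In this made-up data one plate and one phone keep appearing together, so the two records almost certainly belong to the same person even though neither log contains a name.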

Almost all the collectors of this data claim the information is anonymised or isn’t personal information. Unfortunately there’s a mismatch between the definition of private data and reality: number plates and mobile phone MAC addresses are not considered private, yet they provide enough insight for an individual to be identified.

That aspect isn’t understood by most people. The final caller to the ABC Radio spot asked why she should be bothered worrying about privacy when, in her view, it doesn’t matter.

As French politician Cardinal Richelieu said in the seventeenth century, “If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.”

Today we each have six million points of data that can hang us; in a decade it could easily be a billion. We need to understand and manage the risks this presents while enjoying the benefits.

Realising value from the internet of everything

How will businesses benefit from the internet of everything?

How much opportunity does connecting all our machines to the internet really offer businesses and society?

Cisco’s Internet of Everything index, released last week, looks at one of the great opportunities facing today’s managers: realising business value from these new technologies.

On Cisco’s calculations, the internet of everything is worth over $14.4 trillion to the world economy and nearly half the business benefits are going to waste.

Germany and Japan lead the pack and, as discussed yesterday, Australia wallows between China and Russia.

Cisco comparison of countries

Despite German businesses being the leaders, Cisco estimates $33bn, or nearly 40% of the potential gains, isn’t being realised even in that country.

How different industries are using the internet of machines is notable as well, with Cisco claiming the biggest benefits are currently being realised by the IT industry while the greatest potential lies in the service, logistics and manufacturing industries.

Internet of everything value by industry

If anything, these projections could be on the conservative side with Cisco estimating fifty billion devices connected to the net by 2020. Given the rate at which smartphones are being sold and with everything from vending machines to clothing going online, it may well be ten or even a hundred times that number.

The real challenge for businesses in all these projections is how individual organisations can realise this value in their operations.

For some businesses, there are plenty of existing opportunities with well established services in areas like field services and logistics tracking the locations of staff and packages. These are relatively simple to incorporate into existing operations.

In other applications, businesses will find things more complex as the connected devices will tie into analytics and Big Data plays. These won’t be simple.

One particularly important area for the workforce as a whole is business process automation, where many tasks currently done by humans can be carried out by machines talking to each other.

This is already happening in fields like fast moving consumer goods and hospitality where stock levels can be automatically monitored and replacement stock ordered in without staff being involved. As the technology becomes more widespread this will threaten the roles of many previously well paid managers.
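
As a rough illustration of how simple that kind of automation can be, the Python sketch below applies a basic reorder point rule: when the monitored stock level falls to a set threshold, an order is raised without anyone checking the shelves. The `StockItem` fields, the thresholds and the sample bar stock are all assumptions made up for the example, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class StockItem:
    name: str
    on_hand: int           # units currently reported by the shelf or till sensor
    reorder_point: int     # order more once stock falls to this level
    reorder_quantity: int  # how many units to order each time


def orders_needed(items):
    """Return {item name: quantity to order} for every item at or below its reorder point."""
    return {
        item.name: item.reorder_quantity
        for item in items
        if item.on_hand <= item.reorder_point
    }


# Hypothetical bar stock, invented for illustration.
stock = [
    StockItem("lager kegs", on_hand=2, reorder_point=3, reorder_quantity=10),
    StockItem("house red", on_hand=40, reorder_point=12, reorder_quantity=24),
    StockItem("lemons", on_hand=5, reorder_point=5, reorder_quantity=50),
]
orders = orders_needed(stock)
print(orders)
```

A real system would also need to handle supplier lead times and orders already in transit, but the decision once made by a manager walking the storeroom has become a single comparison.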

Many of those managers though will be challenged anyway unless they’re prepared to deal with the changes that the internet of things is bringing to their businesses.

How do you think the internet of everything will change your business?

Coming to your city – the internet of machines

A chart by sensor manufacturer Libelium illustrates how the internet of machines is growing

An intriguing infographic from Spanish sensor manufacturer Libelium – which to Australian ears sounds like a new age defamation law firm – illustrates how the internet of things is being used in all walks of life from shipping containers to park benches.

The notable thing about the diagram is pretty well all of the sensor applications have been available for years – in some cases decades – and it’s only with the arrival of cheap sensors and pervasive internet access that widespread monitoring has become possible.

Libelium smart world infographic

With affordable, even disposable, sensors coupled with internet projects like Google Loon and Australia’s National Broadband Network, these networks are now possible at a price that won’t sink a government’s budget.

In fact these sensor networks will probably improve councils’ and governments’ budgets as they promise to improve the efficiency of services like rubbish collection and street repairs.

The real challenge is managing all the data this equipment gathers; that’s going to be one of the big jobs of the next decade.

702 Sydney – Green computing and how we’re being watched online

On 702 Sydney, Linda Mottram and I talk about Internet spying and green computing.

This morning on 702 Sydney I’m talking to Linda Mottram about Internet spying and green computing.

How Green is the internet looks at the claims from Google and other companies about cloud computing’s energy use.

The Internet snooping story broke two weeks ago with The Guardian NSA files.

An early part of the story was about the use of telephone company metadata: information about phone calls, not the actual content, which intelligence agencies and law enforcement can use to draw a picture of a person’s activities.

For Australians, there’s additional cause for concern as the Telecommunications Act gives government agencies the powers to access anyone’s information.

If you’re worried about the way data is being collected about you online, Duck Duck Go is a private search engine and Box Free IT has some great suggestions on securing cloud computing services.

For those who want to seriously cover their online tracks, the Tor project and PGP encryption are more advanced privacy tools.

We’d love to hear your views so join the conversation with your on-air questions, ideas or comments; phone in on 1300 222 702 or post a question on ABC702 Sydney’s Facebook page.

If you’re a social media user, you can also follow the show on Twitter via @paulwallbank and @702Sydney.

How Green is the Internet?

What are the environmental costs of the internet, cloud computing and big data?

Earlier this month Google hosted “How Green is the Internet?”, a summit which looked at the environmental costs of the connected society and technologies like cloud computing and Big Data.

The environmental impact of the internet and related technologies is a subject worth exploring. As in all industries, there are real costs to the planet which usually aren’t borne by those who make the profits or reap the benefits.

In complex modern supply chains which often span the globe, the costs are not often apparent either. What appears to be a relatively clean, innocuous product to city consumers could have terrible environmental consequences for others.

Google’s summit is a good example of overlooking many external costs in that most of the conversations looked at reducing energy usage, understandable given the company’s dependence on power hungry data centres which drive their cloud computing services.

Energy usage is important in the discussion about digital technologies. The business of bits and bytes relies almost wholly upon having constant and reliable electricity supplies, and power generation is one of the most environmentally damaging activities we engage in.

Energy consumption, though, is not the only aspect we need to look at when examining how green the internet is. There are many other costs in building the supply chain that enables us to watch funny cat videos in our homes or offices.

The entire supply chain is complex and the session on infrastructure costs by Jon Koomey of Stanford University touched on this. There are the environmental costs of building data centres, of manufacturing routers, of laying cables and, probably the most difficult question of all, what we do with the e-waste generated by obsolete equipment.

Little of this was touched on in the Google conference and it’s interesting that the tech industry is focusing on the energy costs while overlooking other effects of a global, complex industry.

That isn’t to say the energy story isn’t valid. A number of the Google speakers emphasised the indirect energy savings as cloud computing and Big Data allow more intelligent business decisions that make industries and daily life more efficient.

A favourite example is the use of car parking apps where drivers save energy and reduce pollution because they aren’t driving around looking for the parking spaces. This puts Google’s acquisition of traffic app Waze into perspective.

Reducing driving times is just one area where the internet is improving energy efficiency and these are important factors when considering the ‘greenness’ of the web.

However, without considering the full impact of building, maintaining and disposing of the equipment that we need to operate the internet, we aren’t really looking at the entire impact the internet is having on the planet.

Google’s conference though is a good starting point for that discussion which is one that every industry should be having.

Smart cities and the sensors in your pocket

Community wide sensors promise to change government

National Public Radio’s Parallels program has a story on how the Spanish city of Santander is wiring itself as a ‘smart city’ with a network of sensors covering everything from garbage bins to parking spots.

The hope with the sensors is they’ll improve local government services, allowing things like more efficient garbage collection and better pricing of parking meters.

What’s notable about the story is that smartphones are included as ‘sensors’ with Santander residents being able to submit data from their handsets.

The idea of smartphones as sensors isn’t new, as pothole reporting apps were early to the iPhone, but the increased sophistication of handsets and improved tracking technology is making them more powerful.

So we have another Big Data problem with local councils being flooded with information.

Processing all this information is going to require the community pitching in, so the data is going to have to be open.

Once governments make the data open it also creates opportunities for smart entrepreneurs to create new services and technologies.

Creating new opportunities is a hope of government sensor programs around the world, including Tasmania’s Sense-T project.

With factors like water quality and weather being monitored, existing sectors become more efficient and new industries are being created.

Hopefully the urge to hoard this rich, community data will be resisted by governments.

Big data’s big truths

There’s a lot of hype around Big Data but it doesn’t mean we should ignore the risks or opportunities.

One thing former Obama 2012 campaign CTO Harper Reed cannot be accused of is subtlety so his statement at the Sydney CeBIT conference last week that Big Data is Bullshit wasn’t wholly surprising.

Reed has a good point – like all IT industry buzzwords there is a fair degree of hype and BS around Big Data although his referring to it as a storage problem misses the point.

Data storage is a problem largely solved; when we’re talking about Big Data today, we’re talking more about analysing the information and managing the life cycle of an organisation’s data.

Not that these issues are new; the tech industry has been dealing with the challenges of storing, managing and analysing data since computers first appeared. In fact, that’s the reason computers were invented.

An excellent NY Times Bits blog post expands on Harper’s views and rebuts many of the myths and hype around big data.

Most important is the point that big data is not the truth; we can torture those bits and bytes to tell us anything we like.

Claims that Big Data can tell us everything or that it will conquer discrimination and make cities smarter are fanciful. It all depends on how we choose to use the data.

There are downsides with Big Data too — we live in an age where it’s easier to let the algorithm do the work and if the computer says ‘no’, then we can shrug and say “sorry it’s beyond our control.”

Letting the algorithms run our lives is one of many risks, but it doesn’t change the opportunities Big Data presents for businesses, governments and communities. If we can understand our world better, we can do smarter things.

That’s the real opportunity with Big Data and we don’t need the hype to tell us that.

ABC 702 mornings – Storage and your computer

How we deal with the information explosion in the age of Big Data is the topic of today’s 702 Sydney segment with Linda Mottram

This morning on 702 Sydney I’m talking to Linda Mottram on the decidedly unsexy topic of storage – hard drives, cloud computing and the struggle to keep up with ever expanding file sizes of documents, photos and downloads.

It’s an opportunity to revisit the How Much Data Does The Internet Need topic which I covered for Radio National last year, although almost certainly that needs updating.

Earlier this year networking vendor Cisco released its 2013 Visual Networking Index, which forecast global data traffic growing fourteen-fold over the next five years.

Those bytes slopping around the internet have to come to rest on someone’s hard drive and this is what’s driving the storage crisis.

Yesterday US business site Venture Beat had an op-ed by an executive from Seagate, the world’s biggest hard drive manufacturer, in which he discussed the storage challenges and cited a claim from industry consultants IDC that worldwide computer storage is 2.7 zettabytes.

A zettabyte is a trillion gigabytes, or a one followed by twenty-one zeros when counted in bytes. It’s the equivalent of a billion of the one terabyte hard drives that are standard on most cheap desktop computers.
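
For anyone who wants to check the arithmetic, here is a short Python sketch. It assumes the decimal (SI) definitions of the units, where each step up is a factor of a thousand; the variable names are just for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE  # a trillion
drives_per_zettabyte = ZETTABYTE // TERABYTE     # a billion one terabyte drives

# IDC's 2.7 zettabyte figure, written as 27 tenths to keep the arithmetic in whole numbers.
drives_for_idc_estimate = 27 * ZETTABYTE // (10 * TERABYTE)

print(f"{gigabytes_per_zettabyte:,} gigabytes in a zettabyte")
print(f"{drives_per_zettabyte:,} one terabyte drives in a zettabyte")
print(f"{drives_for_idc_estimate:,} one terabyte drives for 2.7 zettabytes")
```

On those assumptions, IDC’s 2.7 zettabytes works out to 2.7 billion one terabyte drives.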

Where those hard drives are located is the big challenge: are they in your laptop, in your smartphone or somewhere on a cloud service?

The other big challenge is what do you do with all this information – which is where the Big Data discussion comes in.

While data storage is a mundane topic, it’s a big one that matters. I hope you can tune in.
