Oct 14, 2014

Last week we looked at how the way we organise information is changing in the face of exploding data volumes.

One of the consequences of the data explosion is that structured databases are beginning to struggle as information sources and business needs are becoming more diverse.

Yesterday, cloud Customer Relationship Management company Salesforce announced its Wave analytics product, saying that “with its schema-free architecture, data no longer has to be pre-sorted or organized in some narrowly defined manner before it can be analyzed.”

The end of the database era

Salesforce’s move is interesting for a company whose success has been based upon structured databases to run its CRM and other services.

The company’s move could be interpreted as meaning that the age of the database is over; that organising data is a fool’s errand as it becomes harder to sort and categorise the information pouring into businesses.

This was the theme at the previous week’s Splunk conference in Las Vegas where the company’s CTO, Todd Papaioannou, told Decoding The New Economy how the world is moving away from structured databases.

“We’re going through a sea change in the analytics space,” Papaioannou said. “What characterised the last thirty years was what I call the ‘schema on write’ era; big databases that have a schema where you have to load the data into that schema and transform it before you can ask questions of it.”
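
Papaioannou’s distinction can be illustrated with a minimal sketch. The log format, field names and values below are invented for illustration; the point is only that schema-on-write demands the structure up front, while schema-on-read imposes it at query time.

```python
import sqlite3

# Schema-on-write: the table's shape is fixed up front, and data must be
# transformed to fit it before any question can be asked of it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT, ms INTEGER)")
conn.execute("INSERT INTO events VALUES ('alice', 'login', 120)")
slow = conn.execute("SELECT user FROM events WHERE ms > 100").fetchall()

# Schema-on-read: raw log lines are stored as-is; structure is imposed
# only when a question is asked, so a new field needs no remodelling.
raw_logs = [
    "user=alice action=login ms=120",
    "user=bob action=search ms=45 query=widgets",  # extra field, no schema change
]

def parse(line):
    # Turn "key=value key=value" into a dict at query time.
    return dict(pair.split("=", 1) for pair in line.split())

slow_raw = [p["user"] for p in map(parse, raw_logs) if int(p.get("ms", 0)) > 100]
```

Both queries find the same slow event, but only the second approach coped with the unexpected `query` field without anyone redesigning a schema first.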

Breaking the structure

The key with products like Salesforce and other database-driven offerings such as SAP and Oracle is that both the data structures (the schema) and the questions are largely pre-configured. With the unstructured model, what matters is Google-like queries on the stored data.

For companies like Salesforce this means a fundamental change to their underlying product and possibly their business models as well.

It may well be that Salesforce, a company that defined itself with the ‘No Software’ slogan, is now being challenged by the No Database era.

Paul travelled to San Francisco and Las Vegas as a guest of Salesforce and Splunk respectively

Oct 12, 2014

A cute little story appeared on the BBC website today about the Teatreneu club, a comedy venue in Barcelona that is using facial recognition technology to charge for laughs.

In a related story, the Wall Street Journal reports on how marketers are scanning online pictures to identify the people engaging with their brands and the context in which those brands are being used.

With the advances in recognition technology and deeper, faster analytics, it’s now becoming feasible that anything you do that’s posted online or captured by things like CCTV can identify you, the products you’re using and the place you’re using them.

Throw all of the data gathered by these technologies into the stew of information that marketers, companies and governments are already collecting and there are myriad good and bad applications to which it could be put.

What both stories show is that technology is moving fast, certainly faster than regulatory agencies and the bulk of the public realise. This is going to present challenges in the near future, not least with privacy issues.

For the Teatreneu club, the experiment should be interesting given rich people tend to laugh less; they may find the folk who laugh the most are the people least able to pay 30 Euro cents a giggle.

Oct 08, 2014
How can business survive rough seas?

Last week Yahoo! closed down their directory pages ending one of the defining services of the 1990s internet and showing how the internet has changed since the first dot com boom.

The Yahoo! Directory was a victim of a fundamental change in how we manage data, as Google showed it wasn’t necessary to tag and label every piece of information before it could be used.

Yahoo!’s Directory was a classic case of applying old methods to new technologies – in this case carrying out a librarian’s function of cataloguing and categorising every web page.

One problem with that way of saving information is you need to know part of the answer before you can start searching; you need to have some idea of what category your query comes under or the name of the business or person you’re looking for.

That pain was exploited by the Yellow Pages, where licensees around the world harvested a healthy cash flow from businesses forced to list under a dozen different categories to make sure prospective customers found them.

With the arrival of Google that way of structuring information came to an end as Sergey Brin and Larry Page’s smart algorithm showed it wasn’t necessary to pigeonhole information into highly structured databases.

Unstructured data

Rather than being structured, data is now becoming ‘unstructured’ and instead of employing an army of clerks to categorise information it’s now the job of computers to analyse that raw information and pick out what we need for our businesses and lives.

As information pours into companies from increasingly diverse sources, a flood that’s becoming so great it’s being referred to as the ‘data lake’, it’s become clear the battle to structure data is lost.

At the Splunk Conference in Las Vegas this week, the term ‘data lake’ is being used a lot as the company explains its technology for analysing business information.

Splunk, along with services like IBM’s Watson and Tableau Software, is one of the companies capitalising on businesses’ need to manage unstructured data by giving customers the tools to analyse their information without first having to shoehorn it into a database.

“Thanks to Google we got to look at data a different way,” says Splunk’s CEO and Chairman Godfrey Sullivan. “You don’t have to know the question before you start the search.”

Diving into the data lake

It’s always dangerous applying simple labels to computing technologies but some terms, like ‘Cloud Computing’, don’t do a bad job of describing the principles involved and so it is with the ‘data lake’.

Rather than a nice, orderly world where everything can be pigeonholed, we now have a fluid environment where it wouldn’t be possible to label everything even if we wanted to. A lake is a good description of the mass of data pouring into our lives.

The web was an early example of having to manage that data lake and Google showed how it could be done. Now it’s the turn of other companies to apply the principles to business.

Google fatally damaged both Yahoo! and the Yellow Pages; other companies stuck in the age of structured data are going to find the future equally dismal. Don’t drown in that data lake.

Paul travelled to Las Vegas as a guest of Splunk

Oct 04, 2014
google self driving car

One of the promises of big data and the internet of things is that local governments will be able to gather information about the state of their infrastructure.

A good working example of this is Google’s Waze, the Israeli traffic monitoring startup bought by the search engine giant last year.

Waze gathers information about traffic delays and transit times from users then aggregates them to give a picture of commuting times. It has always been a good example of how collaborative data can work.
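
The aggregation idea is simple enough to sketch. The segment names, times and function below are hypothetical, not Waze’s actual method; they just show how individual user reports can be pooled into an average transit time per road segment.

```python
from collections import defaultdict

# Hypothetical crowd-sourced reports: (road segment, transit minutes)
# as they might arrive from individual drivers' phones.
reports = [
    ("harbour_bridge", 14.0),
    ("harbour_bridge", 18.0),
    ("main_st", 4.0),
    ("harbour_bridge", 16.0),
]

def average_transit_times(reports):
    # Accumulate a running sum and count per segment, then average.
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [sum, count]
    for segment, minutes in reports:
        totals[segment][0] += minutes
        totals[segment][1] += 1
    return {seg: s / n for seg, (s, n) in totals.items()}

average_transit_times(reports)  # harbour_bridge averages 16.0 minutes
```

No single report is authoritative, but the pooled average gives a transit agency a live picture of where the choke points are.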

This week Google announced the service will share its information with a handful of transit agencies and councils to improve their knowledge of the traffic choke points in their cities.

In return the agencies will give their transit information to Waze.

Waze’s story is a good example of how sensors and people, in this case smartphones and their users, are going to gather information on infrastructure and cities. The key is going to be in making sure that data isn’t locked into proprietary databases.

Jul 16, 2014
Big data takes in our online, shopping and social media use; making sense of it is the business challenge of our time

A story in the Atlantic – Why Poor Schools Can’t Win at Standardized Testing – illustrates the limits of Big Data.

When Meredith Broussard tried to computerise the textbook inventory of her son’s school district she found the project limited by poor systems, fragmented record keeping and siloed management.

Broussard found the records were manually collated, collected in Microsoft Word documents and emailed to an under-resourced office that entered the details into an Excel spreadsheet.

As Broussard writes: “The Philadelphia schools don’t just have a textbook problem. They have a data problem—which is actually a people problem. We tend to think of data as immutable truth. But we forget that data and data-collection systems are created by people.”

The human factor is a key limitation with any technology; if people aren’t collecting or using data properly then the best computer system in the world is useless. Garbage In, Garbage Out is a long-standing IT industry saying.

Management systems are more than computer networks; they go to the very core of an organisation’s culture, which in itself is probably a better indicator of how well a company or institution will survive the current period of change.

Were the Philadelphia public school system a business, it would be a very good example of a company on its way to becoming digital roadkill; that it’s an educational network should worry anybody concerned about the economy’s future. That’s a bigger issue than Big Data.


Jul 05, 2014
supermarket checkouts

US medical centre chain Carolinas HealthCare has started mining patients’ credit card data to predict health outcomes, reports Bloomberg Businessweek.

The idea is that by looking at credit information and purchasing records, doctors can anticipate what ailments their patients will present with.

Carolinas HealthCare’s matching of spending patterns to health outcomes is an obvious application of Big Data which illustrates some of the benefits that mining information can deliver for individuals and the community.

Should the project overcome patients’ valid privacy concerns, this is the sort of application that is going to be increasingly common as organisations figure out how to apply software to their mountains of information.

May 27, 2014
understanding data with computers

For two years we were captivated by the spectacular rise of the Bitcoin virtual currency. Allegations that those gains were the result of market fixing raise important questions about the integrity of our data networks.

The CoinDesk website discusses how the Mt Gox Bitcoin exchange was being ramped by a computer bot network nicknamed Willy.

Rampant market ramping – where stock prices are pushed up to attract suckers before those in the know sell at a profit – has a proud financial market history; during the 1920s US stock boom, fortunes were made by inside players before the crash, and the practice was banned in 1934.

So it wouldn’t be a surprise if some smart players tried to ramp the Bitcoin market to make a buck, and using a botnet – a network of infected computers – to run the trades is a good technological twist.

Blindly trusting data

The Willy botnet, though, is a worry for those of us watching the connected economy, as it shows a number of weaknesses in a world where data is blindly trusted.

As Quinn Norton writes on Medium, everything in the software industry is broken and blindly trusting the data pouring into servers could be a risky move.

The internet of things is based upon the idea of sensors gathering data for smart services to make decisions – one of those decisions is buying and selling securities.

Feeding false information

It’s not too hard to see a scenario where a compromised service feeds false data such as steel shipments, pork belly consumption or energy usage to manipulate market prices or to damage a competitor’s business.

Real-world ramifications of bad data could see not only honest investors out of pocket but also steel workers out of work, abattoirs sitting on unsold stocks of pig carcasses, or blackouts as energy companies miscalculate demand.

The latter has happened before, with Enron manipulating the Californian electricity market in 2000 and 2001.

When your supply chain depends upon connected devices reporting accurate information then the integrity of data becomes critical.

Like much in the computer world, big data and the internet of things are based upon trust; the Mt Gox Bitcoin manipulation reminds us that we can’t always trust the data we receive.