Hacking the connected vehicle

American farmers hacking their tractors with Ukrainian software are a taste of what’s to come in the connected economy.

What happens when a vehicle manufacturer locks down their products’ software? John Deere’s customers are finding out, with American farmers turning to Ukrainian vendors for software to maintain their tractors.

John Deere’s behaviour is extreme: almost every component of a modern tractor has a software element, which leaves farmers at the mercy of the company’s dealers and authorised mechanics.

So understandably the farmers are finding ways to hack their equipment to reduce downtime and costs, something permitted in the US after an exemption to the Digital Millennium Copyright Act (DMCA) was granted for vehicle software.

Vendor control over connected vehicles is a bigger problem for consumers than software maintenance alone: as the information collected from these devices becomes more valuable, who controls that data becomes more important.

With global supply chains, increased regulatory requirements and demanding markets, the agricultural industries are probably leading the world in applying the Internet of Things and Big Data, so the challenges faced by farmers are things which will affect us all.

As everything from toasters to motor cars become connected and dependent upon code, the conflict between proprietary software, open markets and user rights is going to grow.

Consumers and the free market can only do so much to control the flows of data and who owns them. It’s hard to see how governments can avoid becoming involved in how information is owned, traded and stored.

Deeper in data and debt

Data tools are getting more powerful as the information collected about us grows. It presents us with some important choices.

Data collection agency Experian’s deal with Finicity to collect and process borrower information is an example of how Big Data is being used by the financial services sector.

Recently I wrote a piece for Fairfax Media on the Science of Money which included some quotes from Experian’s Australian managers. They were quite explicit about their use of data.

That a company like Experian is adopting more advanced analytics isn’t surprising given the power of the tools available. What’s also driving the adoption is the proliferation of devices available to track people.

Notable among those devices are personal assistants. As David Pogue writes in Scientific American, household technologies like Amazon’s Alexa, Google Home and Apple’s Siri are vacuuming up huge amounts of data on our behaviour, likes and dislikes.

Increasingly all of this is being fed into machines that determine our suitability for marketing campaigns, credit and financial services.

For companies like Experian this is a massive opportunity although the focus on credit suitability betrays a mindset more suited to the 1980s finance boom than the more complex times of the early 21st century.

It’s hard not to think that, given the choice, the finance sector will happily use these tools to take us into another subprime lending crisis. That would be a shame, as these technologies’ potential for allowing us to make better decisions is immense.

How we use these tools will define our businesses, economies and communities over the next thirty years. We need to be careful about some of the choices we make.

When governments misuse data

The Australian government’s misuse of data in harassing welfare recipients is something that should worry all citizens

Last year the Australian Federal government had a smart idea. To fix its chronic budget deficit, it would use data matching to claw back an estimated three billion dollars in social security overspending.

Unfortunately for tens of thousands of Australians the reality has turned out to be very different, with the system mistakenly flagging former claimants as debtors.

How the Australian government messed up its welfare debt recovery is a cautionary tale of misusing data.

Data mis-match

At its core, the problem is due to bureaucrats mismatching information.

Australia’s social security system requires unemployment and sickness benefit claimants to file a fortnightly income statement with Centrelink, the agency that administers the system, and their payments are adjusted accordingly.

Most of those on benefits only spend a short time on them. According to the Department of Social Services, two thirds of recipients are off welfare within twelve months of starting.

Flawed numbers

Despite knowing this, the bureaucrats decided to take annual tax returns, average the individual’s income across the year and match the result against the fortnightly payment.

That obviously flawed and dishonest method has meant thousands of former welfare recipients have been falsely accused of receiving overpayments.

Compounding the problem, the system frequently mis-identifies income because it fails to recognise employers may use different legal names, leading to people having their wages double counted and being accused of not reporting work.
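The averaging flaw is easy to see in a few lines of code. This is a hypothetical sketch of the logic described above – the figures and the income test threshold are illustrative, not Centrelink’s actual rates. It models someone who worked for half the year, then truthfully declared zero income while on benefits:

```python
# Hypothetical sketch of the flawed income-averaging logic.
# A person works for six months at $2,000 a fortnight, then spends
# six months on benefits with no income. Smearing the annual tax
# return evenly across all 26 fortnights makes it look as though
# they were earning while claiming.

FORTNIGHTS_PER_YEAR = 26
INCOME_FREE_AREA = 104  # illustrative threshold, not the real rate

def averaged_fortnightly_income(annual_income: float) -> float:
    """The flawed method: average annual income across every fortnight."""
    return annual_income / FORTNIGHTS_PER_YEAR

def flags_debt(declared_fortnightly: float, annual_income: float) -> bool:
    """Flag a 'debt' when the averaged figure exceeds the declared income."""
    return averaged_fortnightly_income(annual_income) > declared_fortnightly + INCOME_FREE_AREA

# Thirteen fortnights of work at $2,000, then honest declarations of zero.
annual_income = 2000 * 13            # $26,000, all earned before claiming
declared_while_on_benefits = 0       # correctly reported no income

print(averaged_fortnightly_income(annual_income))              # 1000.0
print(flags_debt(declared_while_on_benefits, annual_income))   # True - a false positive
```

The claimant reported accurately every fortnight, yet the averaged figure of $1,000 a fortnight makes it appear they under-declared their income for the entire year.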

Shock and awe

Under pressure from their political masters, the aggressive tactics of Centrelink and its debt collectors have left many of those accused shocked and distressed.

“I can barely breathe when I think about this. My time period to pay is up tomorrow. I asked them for proof before I pay and I have heard horror stories of debt collection agencies, people being asked to pay so much, people being told there will be a black mark on their credit. I am so terrified. It’s so stupid for me to be terrified but I can’t help it. I am a student, I can’t afford anything!”

Reading the minister’s response to criticisms, it’s hard not to come to the conclusion that intimidation was a key objective.

The numbers of people involved are staggering. The Department of Social Services reported 732,100 Australians received the Newstart unemployment allowance in 2015-16. Should two thirds of those have moved off the benefit during the tax year, then up to 488,000 people will receive ‘please explain’ notices.

Nearly half a million people being falsely accused of welfare fraud is bad enough, but those are only last year’s figures – due to a law change by the previous Labor government, there is no limit to how far back Centrelink can go to recover alleged debts.

The system is working

Claiming the Centrelink debacle is a failure of Big Data and IT systems is wrong – the system is working as designed. The false positives are the result of a deliberate decision by agency bosses and their ministers to feed flawed data into the system.

How this will work out for the Australian government as tens of thousands more people receive unreasonable demands remains to be seen. Recent comments from the minister indicate they are hoping their ‘tough on welfare cheats’ line will resonate with the electorate.

Regardless of how well it turns out for the Australian government, the misuse of data by its agencies is a worrying example of how governments can use the information they collect to harass citizens for short term political advantage.

Beyond welfare

While many Australians can dismiss the travails of Centrelink ‘clients’ as not concerning them, the same data matching techniques have long been used by other agencies – not least the Australian Taxation Office.

With the Federal Treasurer threatening a campaign against corporate tax dodging and the failure of the welfare crackdown to deliver the promised funds, it’s not hard to see small and medium businesses being caught in a similar campaign using inappropriate data.

More importantly, the Australian Public Service’s senior management’s incompetence, lack of ethics and proven inability to manage data systems is something that should deeply concern the nation’s taxpayers.

In a connected age, where masses of information are being collected on all of us, this is something every citizen should be objecting to.

Connecting 400 points of voter data

US political parties are showing how organisations can use data in a targeted, sophisticated way

As the 2016 US Presidential race enters its final stages, it’s interesting to see how data is being used by American political candidates and what this means for business.

During last week’s Oracle Open World in San Francisco, a panel hosted by the company’s Political Action Committee featured Stephanie Cutter, who worked on Obama’s 2008 and 2012 campaigns, and Mike Murphy, a Republican operative who most recently worked on Jeb Bush’s primary effort against Donald Trump.

While the discussion mainly focused on the politics – “Crazy times seem to require crazy candidates” says Murphy – it was the technology aspect of modern elections that was notable.

Setting the data standard

The Obama campaign of 2008 set the standard for how modern political campaigns use social media and information. “We revolutionized how data analytics helps predict how people will vote and how they will persuade voters to turn out,” Cutter said.

“We put a big investment into it and Republicans have caught up,” she continued. “The key though was we relied on our own data and nothing that was out in the public domain. We didn’t rely on one piece of data, we had multiple sources. We had an analytics program where we were making 9,000 calls a night where we were predicting the votes.”

Murphy agreed about campaigns’ use of data. “The kind of polling you see in the media has kind of vanished in campaigns where they have money to spend on research,” he said. “We don’t do telephone polling any more because we have so much data we can collect.”

Capturing everything

“We capture everything. We have about four hundred data points on the American voter and we’ll have five hundred in the next two years. We’ll be able to build massive data models without phone polling,” Murphy pointed out. “We’re waiting for the tech folk to get ahead on AI so we can predict what voters are going to do in two weeks.”

Despite the amount of data collected by US political parties, the real key to success is the candidate’s organisation and management. Cutter made a strong point about the strength of Obama’s campaign team in both the 2008 and 2012 campaigns.

The way the US political parties use data points to how businesses will be managing it in the future. Increasingly, using information well is going to be the measure of successful organisations in both politics and industry.

Thinking about networked thinking

In a world awash with data managers may have to start thinking about networked thinking

“We want to be the Waze of enterprise software” is the line being repeated by executives at the Inforum 2016 conference in New York today.

This is an interesting strategy for Infor, which provides a range of enterprise software tools to help companies track what is going on in their business. Waze is built upon aggregating user data to identify traffic problems and improve commuting times – it’s no surprise that Google bought the company a few years ago.

Infor’s position though is slightly different, as it’s aggregating individual clients’ data for them. In a world where organisations are struggling not to be overwhelmed by information, Infor is in a good position, even if its executives do overdo it on the buzzwords.

Which leads us to another buzzphrase – design thinking – which has been drifting in and out of fashion over recent years. During the opening keynotes one of the comments was about the rise of “network thinking.”

“Eighty percent of what most companies do deals with data from outside of their organisation,” says Kurt Cavano, Infor’s General Manager of the company’s commerce cloud division. “We’ve seen the power of networks with sites like Facebook, LinkedIn and Waze.”

“Nobody wants to be on a network but everyone’s on a network. It takes a long time to build but once you have one it’s magical. That’s what we’re thinking for business, they need to evolve.”

In one respect this is another take on the ecosystem idea – that one vital corporate asset in the connected world is an ecosystem of partners, suppliers and users. However, the Infor view articulated by Cavano is much more about the flow of data than the goodwill of a community.

So we may well be entering a world of ‘networked thinking’ where thinking about the effects of data flows and being able to understand them – if not manage them – becomes a key executive skill.

Paul travelled to New York as a guest of Infor

Evolving into a data centric company

The newly demerged HP Enterprise is dealing with a shifting market and a change in product focus.

I’m currently at the HP Enterprise Seize the Data roadshow in Singapore where the recently split company is showing off its range of data analytics tools.

Like companies such as IBM and Google, HPE is looking to make money out of data feeds and analytics, with a key part being a platform for developers to create applications.

In launching its Haven OnDemand service, HPE is entering a crowded field, with IBM, Salesforce, AWS and Splunk – among others – offering similar products. What compelling difference HPE will bring to the field is something I’ll be asking the company’s executives later.

One of the other services, HP Vertica, focuses on running data analytics against structured and ‘semi-structured’ sources. Again, this is a field where other companies are well established and have an advantage in being able to examine unstructured data.

The overwhelming question though is how big, and how lucrative, the market is for these data products. It’s not clear exactly how all of these companies are going to monetise these services or, should they manage to, how profitable they will be.

As a company finding its feet less than a year after being split in two, with the added problem of seeing its core server hardware business eroded, HP Enterprise is realigning its business around data analytics and cloud services.

The challenge for the company is differentiating itself and providing competitive products in these markets – and that will be tough.

Guessing ethnic affinity

Big data can create big risks, particularly when a service like Facebook starts racially profiling

What’s your ethnic affinity? Apparently Facebook thinks its algorithm can guess your race based upon the nature of your posts.

This application is an interesting, and dangerous, development, although it shouldn’t be expected to be any more accurate than the plethora of ‘guess your age/nationality/star sign’ sites that trawl through Facebook pages.

Guessing a user’s race is clumsy and obvious, but it’s clear that services like Google, LinkedIn and Facebook have a mass of data on each of their millions of users that enables them to crunch some big numbers and come up with all manner of conclusions.

Some of these conclusions will be useful to governments, marketers and businesses, and in some cases they may lead to unforeseen consequences.

The truth may lie in the data but if we don’t understand the questions we’re asking, we risk creating a whole new range of problems.

Actuaries and the future of Public Relations

Will actuaries become the most valued profession in the PR industry? Some think so.

One of the truisms of modern industry is that we’re going to need more workers with data skills. Could it be that actuaries will be the profession of the information age?

Much of the focus on how companies will deal with an information-rich age comes down to the need for ‘data scientists’ – those with the combination of statistical, analytical and coding skills required to coax insights out of complex and rapidly changing data sets.

At a Future of PR meetup in Sydney earlier this week, one of the panellists raised the possibility that tomorrow’s most valued agency employees will be actuaries as data analytics comes to dominate the industry.

That boring old actuaries – one particularly cruel joke is that actuaries are accountants who failed the personality test – could be the hottest profession in the sexy PR industry is quite a delicious scenario.

Should that turn out to be the case though, it won’t just be the PR industry chasing actuaries – almost every industry is going to be demanding the same set of skills.

In a strange way it could be the staid professions of today that are the exciting jobs of tomorrow. We’ll reserve judgement on the actuaries, though.

Redefining sports media

The Australian Open tennis tournament illustrates how the world of sports broadcasting is changing

Over the last 50 years the relationship between professional sport and television broadcasters has been defined by broadcasting rights. Like most other media business models that relationship is now under threat.

Touring the Australian Open tennis tournament this week, it was striking how the relationship between sports organisations and broadcasters has changed as the internet changes distribution models and data starts to become a valuable asset in itself.

A tour of the data infrastructure behind the tournament as a guest of sponsor and service provider IBM showed how sporting organisations are hoping to use data to improve their fans’ experience and add value for sponsors and competitors.

Last year the Australian Open collected 23 terabytes of data, a 136 percent increase on 2014, which the organisers distribute on their MatchCenter web platform along with analysis through their SlamTracker system.

Using IBM’s Bluemix development platform and the company’s Watson artificial intelligence service, the Australian Open website analyses factors ranging from the audience’s social media sentiment through to predicting competitors’ performance based on historical data.

This wealth of data gives the event organisers a great platform to engage with statistics-hungry fans, and it was notable when talking to the Australian Open staffers how they now see the television broadcasters as much their competitors as their partners.

When coupled with the changes to broadcasting rights, this has put the television and pay-TV networks in a far less powerful position. Like most sports organisations, the Australian Open has moved to the model pioneered by Major League Baseball of providing its own video feeds rather than engaging a host broadcaster to record the events and distribute the video.

For the sports organisations those broadcast rights deals are still by far their most lucrative income stream, but the days of the host broadcasters holding power over the events are slipping away.

One telling statistic was the shift to mobile platforms. Kim Trengrove, the digital manager for Tennis Australia, pointed out how in 2015 online traffic was split equally between desktop and mobile use, while in 2016 it appears to be 60 percent mobile. That change in itself has major ramifications for the market.

In the future, as the data becomes more valuable and the video feeds can be distributed across web browsers and even virtual reality headsets, the late twentieth century broadcast model becomes even more tenuous.

For the television networks it means their power and income are reduced, while those collecting, processing and distributing data become more important. However, it may be that the software companies managing the information aren’t able to pay the immense sums the broadcasters have offered for the last fifty years.

One thing a tour of the Australian Open did show was how dramatically the business model of professional sports is changing. A data driven world is going to be very different to that of the last fifty years.

Calculating the threat score

Applying Big Data marketing tools to law enforcement presents some risks

Forget credit scores – police are now running ‘threat scores’, reports the Washington Post.

This isn’t surprising given the risks involved for officers attending an incident or detaining a suspect. Now, with treasure troves of data available, police forces and public safety agencies are able to evaluate what threats are present.

However there are real concerns about these databases and tools, particularly in how the algorithm determines what a ‘threat’ is. As the Washington Post explains, one package will give a military veteran a greater risk rating, as veterans are more likely than the general population to be suffering post traumatic stress disorder.

In promotional materials, Intrado writes that Beware could reveal that the resident of a particular address was a war veteran suffering from post-traumatic stress disorder, had criminal convictions for assault and had posted worrisome messages about his battle experiences on social media. The “big data” that has transformed marketing and other industries has now come to law enforcement.
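The mechanism behind that concern is easy to illustrate. The following is a hypothetical sketch of a naive additive scoring model; it does not reflect Intrado’s actual, proprietary algorithm, and the factors and weights are invented purely to show how a proxy attribute like military service can inflate a score:

```python
# Hypothetical additive 'threat score' - factors and weights are invented
# for illustration and do not reflect any real vendor's algorithm.

RISK_WEIGHTS = {
    "assault_conviction": 40,
    "worrisome_social_posts": 20,
    "military_veteran": 15,   # a proxy for PTSD risk: penalises service itself
}

def threat_score(attributes: set) -> int:
    """Sum the weights of whatever attributes the data matching turns up."""
    return sum(RISK_WEIGHTS.get(a, 0) for a in attributes)

# Two residents with identical behaviour; one happens to be a veteran.
civilian = {"worrisome_social_posts"}
veteran = {"worrisome_social_posts", "military_veteran"}

print(threat_score(civilian))   # 20
print(threat_score(veteran))    # 35 - rated higher purely for having served
```

Once a characteristic like veteran status is baked into the weights, every person with that attribute is scored as more dangerous regardless of their actual conduct – which is precisely the objection raised against these tools.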

The marketing industry’s use of Big Data has been, and continues to be, problematic from a privacy and security point of view; that public agencies are using the same tools raises bigger concerns.

Over time, we’re going to need rigorous supervision of how these tools are used. The stakes for individual citizens are high.

The limitations of algorithms

Companies like Facebook and Uber are finding there are limits to what computer algorithms can achieve

Are algorithms getting too complex? asks Forbes Magazine’s Kalev Leetaru in an examination of how the formulas that increasingly govern our lives have grown beyond the understanding of their creators.

With computer code now controlling most of the devices and processes we rely on in daily life, understanding the assumptions and limitations of those programs and formulas becomes essential for designers, managers and users.

Leetaru cites the Apollo 13 malfunction and Volvo’s recent embarrassment where a self-driving car nearly ran over a group of journalists. However, there’s no shortage of more tragic consequences of software design decisions: the crash of Air France 447 over the Atlantic Ocean with the loss of 228 lives, after the two pilots stalled their plane through misunderstanding the characteristics of their cockpit, is one recent sad example.

As business and government becomes more dependent on software, more risks will arise from managers not understanding the limitations of the algorithms they use in their business.

Similarly, a range of industries exploiting the quirks of algorithm-driven markets is developing. The Search Engine Optimisation business, designed to exploit quirks in Google’s search algorithm, is an established example, but more will come to the fore as people find ways to profit by anticipating price movements.

However, algorithms have a way to go before they fully take over. As Salon’s examination of Facebook’s news feed reveals, a key part of how the social media service decides what appears on users’ screens is the decisions of around a thousand ‘power users’.

The news feed algorithm had blind spots that Facebook’s data scientists couldn’t have identified on their own. It took a different kind of data—qualitative human feedback—to begin to fill them in.

While Facebook falls back on large focus groups to fill in the algorithm’s gaps, Uber has found a different problem in estimating driver arrival times where it’s currently not possible to accurately calculate estimated times of arrival in real time.

“The best way to minimise time differential issue is to communicate statistically expected time, which will result in almost always being different than actual (i.e. wrong), but will be less different/wrong on average,” says Uber CEO Travis Kalanick.

Uber’s and Facebook’s challenges with their algorithms illustrate there’s some way to go before all critical business functions can be handed over to software. But as automation becomes standard in many areas, not least autonomous vehicles, the limitations of programs and the assumptions of programmers will become apparent.

Open sourcing artificial intelligence

The opening of artificial intelligence platforms is going to see increased development of the technologies

Silicon Valley leaders including Peter Thiel, Elon Musk and Reid Hoffman have pledged a billion dollars towards the OpenAI foundation to open source the development of Artificial Intelligence.

With one of the greatest challenges facing business, political and community leaders in coming years being how to deal with the massive amounts of data generated by the Internet of Things and pervasive computing, this is a major step in making the tools available to everyone.

With both Google and Facebook opening their AI platforms in recent weeks, it seems the consensus in the tech industry is that open source is the way to develop these technologies. As a consequence we may see them become commonplace a lot faster than expected.