Viktor Mayer-Schönberger: 'More data is being collected and stored about each one of us than ever before'
Viktor Mayer-Schönberger and Kenneth Cukier
Eamon Dolan / Houghton Mifflin Harcourt
Consider the Stasi, the hated secret police force in the former German Democratic Republic. Officially known as the ministry for state security, the agency was a particularly pernicious force in an already oppressive communist state. Its modus operandi was simple: to intimidate, infiltrate and gather intelligence on East German citizens using the most invasive methods. For four post-war decades, the Stasi ruled everyday life with a clenched fist. Only the collapse of the Berlin Wall and the simultaneous fall of the Iron Curtain derailed its operations.
"Employing around a hundred thousand full-time staff, the Stasi watched from cars and streets. It opened letters and peeked into bank accounts, bugged apartments and wiretapped phone lines. Its files - including at least 39 million index cards and 70 miles of documents - recorded and detailed the most intimate aspects of the lives of ordinary people," write Viktor Mayer-Schönberger and Kenneth Neil Cukier in Big Data: A Revolution That Will Transform How We Live, Work and Think, their new book on the effects of collecting and manipulating data.
Now consider an average night out at your favourite restaurant or hotel. You're having a meal with a group of friends and family. At some stage, as the conversation ebbs a little, you might reach for your smartphone and look at your social media feed. You might "check in" at the particular location you find yourself at, you might update your status with a funny anecdote one of your companions has just told you or post a picture on your wall.
Now compare these two scenarios.
Where once an Eastern Bloc state built an elaborate and expensive surveillance machine to keep track of its citizens, now Facebook's one billion-strong worldwide community is, for the most part, freely giving up the often intimate details of their lives and whereabouts.
One would shudder, of course, to draw any real comparison, however slight, between the Stasi and Facebook - one was a particularly corrosive force, the other is a contemporary cultural powerhouse - but the issues of the right to privacy, industrial-scale data collection and the possible manipulation of that information are central questions in Mayer-Schönberger and Cukier's fascinating new book.
"Twenty years after East Germany's demise, more data is being collected and stored about each one of us than ever before," write the pair in Big Data.
Reduced to a basic definition, big data is the digital footprint each one of us leaves. The possibility this information offers is "to harness [it] in novel ways to produce useful insights or goods of significant value".
According to the authors - the former is a professor of internet governance at Oxford University, the latter is the data editor of The Economist - society stands on the precipice of a major transformation brought on by big data.
"The ground beneath our feet is shifting. Old certainties are being questioned. Big data requires fresh discussion on the nature of decision-making, destiny, justice ... the possession of knowledge is coming to mean an ability to predict the future," they write, before asserting that this decade marks the moment when the information age finally delivers on its enormous promise.
This shift is under way because of the sheer volume of digital data that now sloshes around our lives, a resource which continues to grow almost beyond comprehension.
As recently as 1986, around 40 per cent of the world's computing power resided in humble pocket calculators made by Texas Instruments, Casio et al. Since then bigger, better, more powerful computers (and our ever-growing dependence on such devices) have brought with them ever greater quantities of data.
By 2007, more than 300 exabytes (each exabyte is the equivalent of 1bn gigabytes) of stored data was estimated to be in existence. This year that figure is expected to quadruple. Such growth is likely to continue almost unfettered.
Speaking by telephone from his study in Oxford, Mayer-Schönberger characterises the erosion of privacy and freedom as "the dark side to big data".
"With Facebook, my worry is not that they capture data, but that they so far have been singularly unable to uncover the value in what they have. Soon enough they will look at the data and uncover the value and that value, that use, might be much less benign than just putting advertisements in their right-hand column."
Time and again, his book cites the power of complex algorithms and data processing: Google's ability to track the worldwide spread of the H1N1 virus via spikes in flu-related search requests; MasterCard's potential to work out that if a credit-card holder in the US filled up with petrol at around 4pm they would, in all probability, spend more than US$35 (Dh128) in a supermarket or restaurant in the following hour; Amazon's use of the information it retained from customer searches to work out that a shopper who was browsing through Ernest Hemingway's back catalogue would also be interested in works by F Scott Fitzgerald. The website didn't necessarily have to understand why customers often linked the two authors; it just had to know that they did and then work out a corporate response to that eventuality.
Then there is Moneyball, the 2011 film of Michael Lewis's best-selling book, which represents perhaps the most high-gloss demonstration of big data's power.
The film tracks the progress of the Oakland A's major league baseball team after they set aside the hunches of the scouting team, in 2002, and replaced them with a new statistics-driven method of valuing a player's potential contribution to the team.
In common with most American sports, although amplified in this case, baseball has always been a numbers-driven game. Fans and coaches often spend hours poring over game stats in the manner of a Second World War cryptographer trying to crack the Enigma code.
The trick the A's general manager Billy Beane and his backroom staff managed to perform was to use a whole new set of numbers to model the efficiency of the team: "Out went time-honoured stats like 'batting average' and in came seemingly odd ways of thinking about the game like 'on-base percentage'," write the authors. Beane's systems are now in use throughout the league, neutralising his competitive advantage, but vindicating his methodology.
Another of the book's more arresting examples of big data adoption occurs in New York City where, in 2007, a utilities company was struggling to deal with a rash of exploding manholes around its network: "Sometimes the cast-iron covers explode into the air before crashing to the ground. This is not a good thing," write the authors, particularly when the city was populated by more than 50,000 such units and network health checks were carried out pretty much at random.
A team of big data analysts drew up a long list of likely problems - most notably the age of cables housed within a manhole and whether a particular site had experienced issues before - and used the statistics to predict future trouble spots accurately.
As the authors rightly point out, in this example both of these predictive causes seem fairly obvious. The same might be true of the political unrest of the Arab Spring. Surely the prevailing factors of the uprisings - high unemployment wedded to large numbers of disaffected young people using social media to talk about their issues - should have made them easy to spot for a big data analyst?
Mayer-Schönberger disagrees. "People were looking at [the Arab Spring] and saying this was a Twitter revolution when less than one per cent of the people were Twitter users," he says. "Then someone said it was really an Al Jazeera revolution, which probably is closer to the truth, but we really don't know because we don't have the data."
Smartphone usage, rather than relatively low levels of internet adoption, appears to hold the key to what might happen next in this region.
"The more smartphones are being used and the more data they collect, the more we'll know," he says. "That data will provide transparency and with that comes a better ability to predict the future."
Early in the book, the authors write that they are not "big data's evangelists" but rather its "messengers". This is a viewpoint Mayer-Schönberger upholds during our conversation.
His biggest concern, he says, "is the combination of the dictatorship of data and punishment by propensity".
Big data "can't supplant human beings in coming up with innovative ideas", and some of the hype that surrounds it will, he says, inevitably deflate, just as it did after the dotcom bubble burst.
"Somebody said to me that with big data you can predict everything. That's not right. For some areas, big data is not particularly useful."
Whatever the case, though, big data is here to stay.
Nick March is editor of The Review.
Updated: April 27, 2013 04:00 AM