Computer program puts pen to the sword
At some point in the recent past, humanity slipped imperceptibly from the Information Age into the Era of Big Data. We are today in the midst of a statistical revolution that's changing our world.
The shift is less about election polls and jobs numbers, share prices and box office than about the unseen tools monitoring our virtual and real-world location, our shopping habits, the scale of rush-hour traffic jams in Mumbai and the precise placement of the home team's fielders in the eighth inning of a Major League Baseball game.
Experts estimate the world's store of data is growing at a rate of about five trillion bits per second, doubling every two years. The data created each day could fill a billion books of 10 million pages each, according to Nate Silver, author of the just-published The Signal and the Noise. Yet our ability to tease out the lessons in all those numbers - to find the signal in the noise - has lagged behind. That's where Narrative Science comes in.
"People haven't been thinking about using data to communicate," says Kristian Hammond, the company's co-founder and chief technology officer, during a recent interview at the two-year-old company's Chicago headquarters. "They've been thinking about how do I show you the data, how do I expose data. We're all about taking the next step, the very human step of examining the data, drawing insight out of it and then crafting reporting out of that insight."
As the demand for teasing out and relaying the value and meaning of numbers increases exponentially, man's monopoly on writing is nearing its end. Narrative Science and a few other firms have devised computerised tools that transform data into relatively sophisticated narratives - raising the possibility, yet again, that the news reporter as we know him may soon become extinct.
"You're going to see more tools like this," says Matt Waite, a journalism professor at the University of Nebraska. "I don't see the trend shifting anytime soon, and inevitably they're going to get better."
The great irony of Narrative Science is that it was incubated in part at Northwestern University's Medill School of Journalism, one of America's better training grounds for reporters. While teaching a 2009 course on journalism and programming, co-instructors Hammond and Larry Birnbaum (now Narrative Science's chief scientific adviser) urged their students to create a system that could turn data into reporting.
One student group came up with a tool called Stats Monkey, which could transform baseball statistics into game recaps (with the byline, "The Machine"). "Once we did baseball we knew exactly what we had," says Hammond, referring to the program's ability to handle all manner of data-heavy narratives.
Though its media output gets most of the attention, about 80 per cent of Narrative Science's total revenue - which has jumped four-fold in the past year - comes from big data clients. "We're not moving away from media but we're growing the rest of the company way faster," says Hammond.
One fast-food client, for example, receives detailed weekly reports tailored to sales and operations at each of its 14,000 franchises. Narrative Science's chief executive is Stuart Frankel, a former DoubleClick executive who brings his tech industry experience to bear. His company is now moving into medical reports and student testing and recently began creating personalised reports for business conference attendees based on Twitter comments.
Narrative Science is not the only player in this field. The US Department of Defense's Defense Advanced Research Projects Agency, or Darpa, has a team of scientists at the Massachusetts Institute of Technology working on a computer tool to transform raw data into clear, concise writing. Wikipedia uses bots to troll for errors in its millions of entries.
Yet Narrative Science goes much further. Its newswriting system, called Quill, begins with a set of tools governing each topic and the relationships between them. Input is routed into the appropriate category and Quill makes deductions based on the numbers, which it then turns into prose according to a set of templates and topic-specific vocabulary devised by Narrative Science programmers.
"By the time it hits the point where it's picking phrases, it usually has at least a half-dozen structural ways it's going to say what it wants to say," says Hammond. "At the micro-level, it knows how to pull words in and out to put in variable details, and beyond that it knows what it's already said earlier in the story."
The final product - US$10 (Dh36) for each 500-word article - reads something like this: "Analysts have become increasingly bullish on Discover Financial Services (DFS) in the month leading up to the company's third quarter earnings announcement scheduled for Thursday, September 27, 2012. The consensus earnings per share estimate has moved up from $1.02 a share to the current expectation of earnings of $1.04 a share."
Not exactly Hemingway, but as a market report it's perfectly acceptable, human even. In addition to Forbes.com, for which the above was written, Narrative Science's media clients include the Big Ten Network, the financial information firm Markit, and GameChanger, which expects to use Quill to produce 1.5 million recaps of children's baseball games this year.
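The process Hammond describes, in which the system draws a conclusion from the numbers, chooses among several ways of phrasing it, fills in the variable details and keeps track of what it has already said, can be sketched in a few lines of Python. This is a toy illustration only, not Quill's actual code: the template wording, the function names deduce_trend and write_sentence, and the record fields are all assumptions made for the example.

```python
import random

# Hypothetical phrase templates, grouped by the conclusion they express.
# The wording is invented for this sketch and is not Narrative Science's.
TEMPLATES = {
    "up": [
        "Analysts have grown more bullish on {name} ({ticker}): the consensus "
        "estimate has moved up from ${old:.2f} a share to ${new:.2f}.",
        "The consensus earnings estimate for {name} ({ticker}) has risen "
        "from ${old:.2f} to ${new:.2f} a share.",
    ],
    "down": [
        "Analysts have grown more cautious on {name} ({ticker}): the consensus "
        "estimate has slipped from ${old:.2f} a share to ${new:.2f}.",
        "The consensus earnings estimate for {name} ({ticker}) has fallen "
        "from ${old:.2f} to ${new:.2f} a share.",
    ],
    "flat": [
        "The consensus estimate for {name} ({ticker}) is unchanged at "
        "${new:.2f} a share.",
        "Analysts have held their estimate for {name} ({ticker}) steady at "
        "${new:.2f} a share.",
    ],
}


def deduce_trend(old, new):
    """Turn two numbers into a conclusion the story can be built around."""
    if new > old:
        return "up"
    if new < old:
        return "down"
    return "flat"


def write_sentence(record, already_used, rng):
    """Pick an unused phrasing for the deduced trend and fill in the details."""
    trend = deduce_trend(record["old"], record["new"])
    # Prefer phrasings this story has not used yet, so it does not repeat itself.
    candidates = [t for t in TEMPLATES[trend] if t not in already_used]
    if not candidates:
        candidates = TEMPLATES[trend]
    template = rng.choice(candidates)
    already_used.add(template)
    return template.format(**record)


if __name__ == "__main__":
    # Figures taken from the sample report quoted in the article.
    record = {
        "name": "Discover Financial Services",
        "ticker": "DFS",
        "old": 1.02,
        "new": 1.04,
    }
    print(write_sentence(record, set(), random.Random()))
```

Passing the same already_used set to successive calls is what lets the sketch avoid repeating a phrasing within one story; a real system would presumably track far more than that.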
Since the Big Ten Network began using Stats Monkey in 2010, the sports-focused website has seen its traffic increase by more than 30 per cent.
Narrative Science's main competitor, Automated Insights (AI), also started with sports reporting and has moved on to other media areas. The company just began a partnership with Yahoo! under which it expects to produce 50 million personalised recaps of American football games over the next few months. AI also started a partnership this summer to produce as many as a million stories a year for a national online estate agent.
As the number of computer-generated stories reaches into the tens of millions, these tools may start to reshape the news landscape. Clients of Narrative Science and Automated Insights are already able to customise a story's tone, from a wry, seen-it-all veteran Premier League football reporter to an overenthusiastic financial correspondent. Soon they'll start to personalise the news, tailoring stories to each reader's neighbourhood, profession and politics.
A recent Gallup poll found that more than 60 per cent of Americans distrust their news sources, which suggests an opening for computer-generated content. This could lead to countless millions of personalised stories with built-in bias - and a world of people reading only the news they like.
Analysts argue this trend could push opposing groups further out of touch, reinforcing their own views and marginalising those outside the "news filter bubble". "Yes, it's possible to insert bias into our stuff," explains Hammond.
"Machines are scary, and part of our job and the content we produce is to make them less scary, and you make them less scary by making them more human."
But in making his machines more human, Hammond may also make them scarier - at least for media professionals. He says he could adjust Quill to add the "human element" in stories, focusing on a single victim of a factory closing, for instance. Even so, it may be a long time before a computer understands complicated human ideas and expressions.
"Sarcasm, empathy, that sort of thing is really difficult for a computer to grasp," says Waite, the University of Nebraska journalism professor, who co-founded the Pulitzer Prize-winning news site Politifact. "Until computers are able to sense the subtle differences in the way we express ourselves, I'm really not afraid of Narrative Science taking away our jobs."
With editorial staffs shrinking, Waite sees Narrative Science as a positive development for newsrooms. Hammond predicts that within 15 years tools like Quill will be creating as much as 90 per cent of all news stories.
It's hard to see how a string of algorithms could report from a natural or man-made disaster, so that number may be a bit high. And it will be a long time before Quill can punch out the closely reported, long-form journalism that often wins awards. But most analysts estimate that by early 2014 every media company will have incorporated automated content into its newsroom in some way.
And Hammond has repeatedly predicted that a computerised reporting tool - able to comb oceans of data with much greater speed and efficiency than humans and potentially uncover powerful, previously unattainable stories - will win a Pulitzer Prize by 2015. "That's three years away," he says with a grin, making the sound of a ticking clock.
The name Quill fits, as its arrival echoes Johannes Gutenberg's invention of the printing press. In advancing the technology that put words to paper, Gutenberg's press nearly made feather-and-ink scribes obsolete - though that was not his intention.
Certainly, the reporter will live on, in one form or another, but journalists of the future will need to be more disciplined and better trained. They'll need their own style, their own brand, to keep from being automated.
After all, a computer is infinitely faster and more responsive to requests. It does not make mistakes, tire or grow lazy, whine about a dull assignment, take holidays or become frustrated with office politics. The Machine is coming.
David Lepeska is a freelance writer who contributes to The New York Times and Financial Times, and previously served as The National's Qatar correspondent.
Updated: November 3, 2012 04:00 AM