Preserving literature

We look at the debate on how best to archive the world's books and historical documents.

Of the hundreds of thousands of books published worldwide this year (not to mention all the newspapers, magazines and websites), which of them will our grandchildren be able to dig out and read in a hundred years' time? Many of us assume that someone, somewhere is archiving all this material, but a debate is raging about how this should best be done to ensure that the world's literature and historical documents are preserved for future generations - and it's far from simple.

The latest chapter in the story is a recent announcement by Brewster Kahle, the man behind the huge, non-profit digital library Internet Archive, who said that the organisation will be storing physical copies of every book it is able to acquire (including those scanned and kept in digital form). "Books are being thrown away, or sometimes packed away, as digitised versions become more available," Kahle said in a statement.

Internet Archive's motto is "universal access to all knowledge", and it has digitised versions of nearly three million public domain books - a project that started in 1996, long before Google Books launched with a similar goal in 2004.

But, according to Kahle, after a book has been digitised and sent back to a library, it's often moved off-site or thrown away altogether, and sometimes bindings are cut off books to speed up the process. You'd think that such an early adopter of digital methods would be fine with consigning real-life books to the dustbin of history, but in his article, Kahle outlines the reasons it's important to keep an ink-and-paper library.

Firstly, a dispute about the fidelity of differing digital versions of a book could be solved only by consulting an original printed copy, he says. Secondly, once technology improves, archivists may want to scan books at a higher DPI. And thirdly, digital material can degrade just as paper can: perhaps more easily, as problems such as "bit rot" and "flipped bits" can change data even if the physical hard drives are perfectly preserved.

Kahle is backed up by the well-reviewed 2001 book Double Fold: Libraries and the Attack on Paper by Nicholson Baker, which uses extensive research to argue that the microfilming boom of the 1980s and 1990s resulted in the destruction of many documents that, in his view, should have been preserved. He finishes by saying, ominously: "The second major wave of book wastage and mutilation, comparable to the microfilm wave but potentially much more extensive, is just beginning."

According to an estimate produced by Google in 2010, there are nearly 130 million separate titles out there, and Internet Archive hopes to find storage for around 10 million of them, in various languages, as well as audio and video. The books will be packed into climate-controlled storage containers in a facility in Richmond, California, as of this month.

It's true that there are other organisations with vast copyright libraries, such as universities and governments, but Kahle says the more archives there are, the better: each will have different uses and functions. Internet Archive's library won't be for browsing or borrowing (for that, users can go online). Its primary goal is long-term preservation.

"A seed bank might be conceptually closest to what we have in mind," Kahle says, "storing important objects in safe ways to be used for redundancy, authority, and in case of catastrophe." In other words: we want to avoid another calamity like the burning of the ancient library at Alexandria, which destroyed books - by Aristotle and Aeschylus, for example - whose loss to humanity is incalculable.

In some countries, such as the UK and Australia, copyright law requires that a copy of every book published in the country be kept in the national library, but other countries, such as America - the world's biggest publisher - are selective about what they keep.

The British Library and the Library of Congress in the United States store about three million additional books a year. With an increasing amount of published material surfacing each year - not only books but websites, blogs, music and film - will it continue to be feasible to store the world's entire artistic output, and would we even want to? Kahle, for one, is willing to give it a go.

There are other problems, though. In 2005, The Authors Guild and others filed a lawsuit claiming that Google Books was infringing the copyright of books it had scanned, even though it provides only snippets of these books to readers unless they are in the public domain. Google tried to settle for $125 million, but the settlement was rejected, and the supervising judge recommended that the case go to the US Congress.

Although the Internet Archive doesn't intend to make its digitised books widely available, the outcome of the Google case will have wide-reaching effects for the future of digital (and perhaps physical) archives. The question isn't just whether our grandchildren will have access to today's books and newspapers; it's also at what price, and who will grant them this access. A global company that relies on advertising for revenue, like Google, might not be the best information gatekeeper - but, as Kahle says, the more archives around, the better for all of us.