Digital Preservation or Conservation?

Last week’s episode of the Spark podcast featured a segment on digital preservation, a concept I’m interested in from both an organizational and a practical point of view. The host interviewed Seamus Ross, Dean of the Faculty of Information at the University of Toronto. In the course of the interview, Ross mentioned Digital Preservation and Nuclear Disaster, an animation about digital preservation, and the problem of bitrot, where storage media degrade to the point that software can no longer interpret the bitstream because some of the information has been lost. Ross also suggested that we should be storing entire databases of information (medical records, tax returns, etc.) for posterity because historians:

“are going to be very interested in large data sets, because embedded in these data-sets is the ability to look at our society at high levels of granularity. You can see the individuals, but you can also see the trends. And they can ask new and original questions that help them to understand who and what we were better. It’s in that base of information that the greatest knowledge about our contemporary society is being held.”

This issue first came up for me when the site transitioned from the Bush administration to the Obama administration. Many people wondered (and still wonder) what happened to all the information that used to be at that website. Suggestions have ranged from archiving these sites and moving them to new domains to keeping them as subsites of the current site. But the larger problem is whether storing large data sets is practical, given how rapidly large amounts of data are generated. I am all for archiving and preserving information for history’s sake, but if we do this, we’ll need digital curators just as much as we’ll need the physical resources to hold the data. What we don’t want is vast storage of junk tweets, blog posts, comments, Facebook wall posts, etc. Perhaps we should be considering digital conservation.


