Last week I read with interest on e-Science News that Harvard University has begun the extensive task of preserving its rapidly aging digital materials.
Higher education is one of many areas in which this issue is gaining traction, as banks of research data continue to grow while the digital formats in which the material is stored begin to age.
At the beginning of May I was delighted to hear the Vatican discuss its intentions to digitally preserve the contents of its library, and so it is encouraging to also hear Harvard, another world-renowned institution, describing the task of ensuring that its assets ‘live on’ as ‘one of the most pressing issues in preservation science’.
After all, if any university is going to take the task of preserving its digital assets seriously, it’s going to be Harvard.
The Harvard libraries and archives contain an immense volume of digital information that has been gathered over several decades, and that is therefore stored in hundreds of different formats that are quickly becoming outdated.
When this digital material first began to enter libraries in the 1980s, on floppy disks and tapes, it was largely logged and tucked away as simply a growing collection of artefacts, and so a substantial amount of data may not have been accessed for 30 years, let alone archived or converted to a sustainable format.
As a result, the Harvard librarians are now scrambling to move this data from the quickly ageing formats upon which it is currently held to a modern medium that we can be confident will still be accessible in the near future, and understandably so.
A recent study written by Timothy Vines has found that, for data sets published in the last 22 years, the odds of the data being retrievable fall by 17% with every passing year. The degradation of a dated piece of digital data is therefore a process that may not set in for several years but can then take hold suddenly, rapidly and irreversibly; a looming threat that is driving the librarians’ urgent work.
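To put that figure in perspective, here is a minimal Python sketch of how a 17% annual fall in odds compounds over time. It assumes the decline is constant from year to year and applies to odds rather than directly to probability. The function names and the 50% starting probability used in the example are purely illustrative and are not taken from the study.

```python
def remaining_odds_fraction(years, annual_decline=0.17):
    """Fraction of the original odds left after `years`, assuming the
    odds fall by a constant `annual_decline` each year (an assumption)."""
    return (1.0 - annual_decline) ** years


def retrieval_probability(initial_probability, years, annual_decline=0.17):
    """Convert a starting probability to odds, apply the compounded
    decline, and convert the result back to a probability."""
    initial_odds = initial_probability / (1.0 - initial_probability)
    odds = initial_odds * remaining_odds_fraction(years, annual_decline)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical starting point: an even chance of retrieval in year 0.
    for years in (0, 5, 10, 20):
        p = retrieval_probability(0.5, years)
        print(f"after {years:2d} years: {p:.1%} chance of retrieval")
```

Under these assumptions the odds after ten years are only around a sixth of their starting value, which illustrates how quickly a seemingly modest annual rate adds up.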
However, I am concerned that Harvard is so intent on ensuring that this data is retrievable today that it may be failing to fully explore the need to archive and preserve it in a way that will be secure for generations to come. The Vatican is reportedly preserving its digital assets using open-source, non-proprietary software with the specific aim of ensuring that the data is still accessible in 50 years, and I feel that a similar solution should also be the next stage in Harvard’s plan.
Currently, sophisticated digital forensic software is being used to successfully retrieve the valuable data from failing formats before copying it to a modern device. This is delicate and time-consuming work, and the Harvard librarians are doing an inspiring job, but I cannot stress enough the importance of also consulting a provider of specialist services designed for the long-term storage of this data after it has been retrieved.
Without the implementation of storage services, Harvard must consider the inevitable fact that the modern formats upon which its data is now being saved will themselves become obsolete in the not-so-distant future, and that future generations of librarians will be tackling exactly the same preservation problems that are plaguing their predecessors today.
However, it is important to remember that Harvard is not the only university facing the task of rescuing digital material from dated formats, and indeed that universities are not the only institutions to be tackling this issue.
For example, only 50% of American films shot before 1950 were expected to survive past the year 2000, and it is believed that around 80% of silent movies made in the 1910s and 1920s have now been irreversibly lost, largely due to neglect. Vint Cerf, a vice-president of Google, is concerned that without a rise in awareness of the importance of correctly preserving digital materials, future generations will have little record of the 20th century and will enter ‘a digital dark age’.
I feel I must urge Harvard to investigate the use of storage facilities that will ensure all of its data is accessible in 10, 25 or even 50 years’ time, so that this wealth of knowledge can be professionally preserved to the highest possible standard, and stored in such a way that it can all be returned quickly, easily and in exactly the same condition as that in which it was left.
The efforts currently being made by Harvard’s librarians to stop the decay of this unique digital data are an excellent first step, but effectively planning a long-term archiving strategy today, similar to that being devised by the Vatican, is the only way to be certain that digital material will be safe for the use of future generations.