The Wellcome Library holds one of the world’s largest collections on the history of medical science and its role in society, and is maintained to support the Wellcome Trust’s mission of achieving “extraordinary improvements in health by supporting the brightest minds”.
Like all libraries, it is increasingly tasked with curating digital material. In 2009, it embarked on an ambitious project to digitise its entire collection, starting with the works of DNA co-discoverer Francis Crick. Meanwhile, new submissions – which could be the personal journals of eminent scientists or research data from important experiments – often arrive in natively digital formats.
The Library’s eventual aim is to allow researchers and students to access all of its contents, barring any personally sensitive material, through its website.
To achieve that, the Wellcome Library is currently building a new information architecture that consists of four pillars: a workflow system that helps library staff process new submissions, a search layer that will help users navigate materials, a pool of storage resources and an archiving system.
One interesting facet of the Library’s work is archiving digital material held in obsolete formats. This is a frequent challenge for the organisation: scientists tend to donate their work towards the end of their careers, so their files are often stored in formats that are decades old.
“We know that in the future people will want to look back at the work that is being done now in digital formats,” explains Robert Kiley, head of digital services at the Wellcome Library. “And we know from our own experience how difficult dealing with obsolete formats can be.”
One particularly tricky recent submission came in the form of documents that had been created by an application called Locoscript and stored on 5¼” Amstrad floppy disks – both completely unsupported by modern machines. In that case, it took over a year of collaboration with other institutions, including Oxford’s Bodleian Library, to access the documents.
Document formats are falling out of use all the time. For example, project plans created in Microsoft Project 4, which was discontinued as recently as 1998, are now entirely unreadable by contemporary computers.
For this reason, preserving digital materials for future generations requires an encyclopedic knowledge of the history of file formats – something that is beyond the scope of even the largest of libraries.
Happily, the UK’s National Archives provide that knowledge in the form of a web service. PRONOM is an online registry describing the structure of file formats and characteristics of the software used to create them. Combined with its associated discovery tool DROID, it allows archivists to analyse files and determine how best to convert them into a more usable format.
For example, creating a PDF of a file from an obsolete word processing system might make it easy to read in future, but if there is supplementary data contained in the file, such as tracked changes, they could be lost.
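To make the idea of format identification concrete, here is a minimal Python sketch of signature-based matching, the general technique that tools of this kind rely on. It is not DROID's implementation and does not query PRONOM; the signature table, the labels and the function name `identify_format` are illustrative assumptions, and a real registry covers far more formats with much richer rules.

```python
from typing import Optional

# Illustrative signature table (an assumption for this sketch). Each entry pairs
# the leading "magic" bytes of a format with a human-readable label.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container"),
    # Container used by many legacy Microsoft Office files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a label for the first signature that matches the start of the data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    # Unknown formats are reported as None rather than guessed at.
    return None
```

In practice the bytes would come from the first few kilobytes of a file, and an unrecognised result would be a cue for an archivist to investigate by hand.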
The Wellcome Library uses an archiving system developed by Oxford-based technology consultancy Tessella that automatically interrogates PRONOM to find out how a given file should be treated. The system, named Safety Deposit Box, is also supported by a community website that allows archivists to share tools for preserving particular documents.
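The decision step can be pictured as a policy lookup: given an identified format, choose a target format and flag any features that the conversion might drop. The Python sketch below is a hypothetical illustration only. It does not reflect Safety Deposit Box's actual interface or PRONOM's data model, and the `POLICY` table, `MigrationPlan` class and `plan_for` function are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MigrationPlan:
    target_format: str          # format the file should be converted to
    at_risk: Tuple[str, ...]    # features the conversion may not carry over


# Hypothetical policy table (an assumption for this sketch).
POLICY = {
    "legacy word processor": MigrationPlan("PDF", ("tracked changes",)),
    "PDF document": MigrationPlan("keep as-is", ()),
}


def plan_for(format_label: str, features_present: Iterable[str]) -> Tuple[str, List[str]]:
    """Return the target format plus warnings for features that could be lost."""
    plan = POLICY.get(format_label)
    if plan is None:
        # No policy on record: leave the decision to an archivist.
        return ("manual review", [])
    present = set(features_present)
    warnings = [feature for feature in plan.at_risk if feature in present]
    return (plan.target_format, warnings)
```

A file with tracked changes would therefore come back with a warning attached, prompting the archivist to keep the original alongside the converted copy.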
The Library is now in the process of extending the Safety Deposit Box implementation to cover its digitised material too.
One of the advantages of digitising this material, Kiley explains, is that researchers can access all the documents relevant to a given topic – which may be physically located all around the world – from a single place.
For example, the Wellcome Library is now working with the American library that holds the archives of James Watson, the other co-discoverer of the DNA double helix, to allow researchers to access Crick and Watson’s work together.
As that reveals, the process of digitising these historical documents is not merely a matter of preserving them for the future, but also of increasing their usefulness today.