The Natural History Museum is home to over 80 million specimens from across the globe, and the organisation is on a constant mission to make its collection more accessible.
By digitising its collections, the Natural History Museum will make better use of them and become a major access point for scientists and research institutes around the globe.
Inselect is a cross-platform, open source desktop application that automates the cropping of images of specimens from whole-drawer scans and similar images that are generated by digitisation of museum collections.
Speeding up a creepy-crawly process
The Inselect application was initially developed to serve a seemingly simple function – it identifies individual specimens from a drawer of samples, so that they can be digitally categorised individually.
But this isn’t a quick task; the Natural History Museum houses an estimated 33 million insect specimens in 130,000 drawers. Processed manually, it takes about an hour to categorise a drawer of specimens. Inselect, on the other hand, can do the same job in five to ten minutes, depending on the complexity of the drawer.
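To put those figures in perspective, the short sketch below scales the per-drawer times up to the whole insect collection. It is a back-of-the-envelope illustration only: it assumes every one of the 130,000 drawers takes exactly the quoted time and ignores scanning, handling and quality checks. The constant and function names are made up for this example.

```python
# Figures quoted in the article.
DRAWERS = 130_000
MANUAL_MINUTES = 60          # "about an hour" per drawer by hand
INSELECT_MINUTES = (5, 10)   # "five to ten minutes" per drawer with Inselect


def total_hours(drawers: int, minutes_per_drawer: float) -> float:
    """Total processing time in hours, assuming a constant time per drawer."""
    return drawers * minutes_per_drawer / 60


manual_hours = total_hours(DRAWERS, MANUAL_MINUTES)
best_hours = total_hours(DRAWERS, INSELECT_MINUTES[0])
worst_hours = total_hours(DRAWERS, INSELECT_MINUTES[1])

# Speed-up factor is simply the ratio of the per-drawer times.
best_speedup = MANUAL_MINUTES / INSELECT_MINUTES[0]
worst_speedup = MANUAL_MINUTES / INSELECT_MINUTES[1]

print(f"Manual:   {manual_hours:,.0f} hours")
print(f"Inselect: {best_hours:,.0f} to {worst_hours:,.0f} hours")
print(f"Speed-up: {worst_speedup:.0f}x to {best_speedup:.0f}x")
```

Under these simplifying assumptions, the quoted times correspond to a six- to twelve-fold reduction in processing time per drawer.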
Inselect combines image processing, barcode reading, validation of user-defined metadata and batch processing to automate this work. The application adds clarity as well as speed: scanning whole drawers without segmentation produces large files that cannot change as the collections change. Segmenting the scans keeps imaging quick but yields specimen images, which can follow the specimens around the collection, rather than drawer images, which cannot. The challenge is to get a single, high-quality image of each object along with its associated metadata, and Inselect does just this.
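The general idea of segmenting a drawer scan and checking metadata can be illustrated with a toy example. The sketch below is not Inselect's code or algorithm: it assumes a tiny binary "scan" stored as nested lists, finds each connected blob with a simple flood fill, crops it out, and flags records that lack a required field. All function names and the `catalog_number` field are hypothetical.

```python
from collections import deque


def find_boxes(mask):
    """Return (top, left, bottom, right) boxes of 4-connected regions of 1s.

    bottom and right are exclusive, so a box can be used directly for slicing.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return boxes


def crop(image, box):
    """Cut one specimen out of the drawer image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def missing_fields(metadata, required=("catalog_number",)):
    """List required metadata fields that are absent or empty."""
    return [field for field in required if not metadata.get(field)]


# A toy 4x6 "drawer" containing two specimens (1 = specimen, 0 = background).
drawer = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

boxes = find_boxes(drawer)
specimens = [crop(drawer, box) for box in boxes]
```

Each cropped specimen then carries its own bounding box and metadata record, which is what lets a specimen image follow the specimen rather than the drawer. A real tool would work on photographic scans and use far more robust image processing than this flood fill.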
Open source, open access
As Inselect offers a means of digitising whole drawers and cropping out specific specimen data, the application is not limited to insects – it can be used for all sorts of projects that require cataloguing and categorising for digital collections. In addition to being a flexible tool that can be applied across a range of digital collections, Inselect can operate on Windows and macOS and is open-source, allowing people from across the globe to access and collaborate on the project.
Because it is developed on GitHub as an open source project, Inselect can be used internationally, enabling scientists and research institutes in any country to access unique and rare specimens with a click of the mouse, and giving a significant boost to the Natural History Museum’s digitisation plans.
Collaborating on GitHub has been a major factor in this success. As the Natural History Museum is a non-profit organisation, open sourcing the project has enabled it to digitise its collections with Inselect at comparatively little cost in time and money, and those accessing the collection do not need to pay a licence fee.
“Inselect is a great example of how much sense it makes to open source software on GitHub,” comments Mike McQuaid, Senior Open Source Engineer at GitHub. “Doing so in the Natural History Museum’s case has made a tool that’s not only useful for them, but also to a wide array of other users for free. Additionally, they’ve been able to receive additional contributions from the community, and that makes Inselect all the better for everyone using it.”
Scaling up (and down)
As Inselect is an open source project, the Natural History Museum’s insect collection scans can be accessed across the globe via the Museum’s Data Portal. But what other opportunities are there for open source projects to help the organisation?
Housing 80 million specimens, The Natural History Museum is by no means short of material to digitise. For example, Inselect has been adapted to look at slide digitisation too, and has been used on around 100,000 microscopic slides.
The Digital Collections Programme at the museum is looking into digitising more than just insects; much larger artefacts such as fossils and skeletons can be made available online as well. The scale of these artefacts, however, presents an entirely different challenge, but one that future open source software may well be able to solve.
By making Inselect an open source project, the Natural History Museum has enabled other organisations to make use of its tools. The museum has also endorsed the Science International Open Data Accord and operates an open-by-default policy on its scientific collections.
This has enabled the University of Sheffield to conduct a project: ‘Mark my bird’. By accessing birds from the museum’s collection, the university researched why bird bills are so diverse. As this project relied on crowdsourcing, GitHub played a key role in ensuring there was clear communication amongst those collaborating on the research.
Of course, the sharing possibilities on GitHub reach far beyond code. Collaboration, collective thinking and the sharing of thoughts and ideas can enable common solutions to complex, large-scale challenges across academia and business alike. After all, two (or more) heads are always better than one.