Neil Cobb
Aug 1, 2021
The NSF-Innovation: Bioinformatics: Collaborative Research: LightningBug, An Integrated Pipeline to Overcome the Biodiversity Digitization Gap. August 2021 to August 2024. $1,076,032 NSF Awards # 2104149 (Rios), 2104152 (Guralnick), 2104151 (Hereld), 2104150 (Pierce).
PROJECT SUMMARY
Overview:
IIBR Keywords: Multi-View Imaging, Label Reconstruction, Specimen Reconstruction, Transcription, Mobilization
Insects comprise over half of the >500 million scientific specimens held in US natural history collections, yet only a fraction of these insects are digitized because we lack the technology needed to efficiently capture specimen data. Most of the data needed to assess global change impacts on insects are locked up in the specimens and their labels. The two key parts to mobilizing the data are transcribing specimen labels and imaging specimens. Transcription currently involves humans directly entering label data into databases, or imaging the labels for later manual data entry; specimen images are 2D and restricted to 1-3 aspects of exemplars. Consequently, at present, only 5% of specimen labels are transcribed and less than 1% of specimens are imaged. With the status quo, it will take well over 50 years to digitize our insect collections, and this will also result in subpar specimen imaging. LightningBug addresses the need for higher rates and better quality by developing a robotic-informatics system that will increase transcription efficiency 10-fold and dramatically reduce costs. LightningBug will also produce 100-fold more research-quality image suites that capture a full 3D view of specimens. The increase in number and quality of images, coupled with the transcribed label data, will usher in a new era of biodiversity-related research, from bioinspired robotics to evolutionary ecology.
Intellectual Merit:
LightningBug will provide an innovative, performant solution to the pinned insect digitization impediment that showcases production of research-ready data. The LightningBug model leverages recent advances in automated imaging to efficiently acquire multi-view imagery from pinned insects. Further, it connects those imaging advances to downstream steps that will deliver rich data to multiple communities. We aim to: (1) create a simple robotic system to capture multi-view images of both labels and specimens; (2) process fragmentary views of multiple labels into single integrated “virtual labels”; (3) connect virtual labels to structured text extraction services; and (4) apply photogrammetric analysis to deduce the 3D shape and structure of specimens. Two science use cases (bioluminescent beetles, gossamer-winged butterflies) will highlight the use of specimen-based multi-view imaging in studies of global change and functional morphology. The resultant digital objects will present new challenges for data preservation and access, but they will also catalyze new solutions for preservation and efficient access to petabytes of research images. We will address these challenges via a partnership with MorphoSource, and develop a linked institutional repository model for data access to large digital assets such as multi-view imaging.
Broader Impacts:
The immediate impact of LightningBug will be to provide natural history collections around the world with a set of robotic and informatics tools to exponentially increase digitization capacity. The LightningBug model will be affordable for small to large insect collections, and many aspects of hardware and software development will be transferable to all types of natural history collections. LightningBug will be tested at the Yale Peabody Museum (YPM) and the Harvard Museum of Comparative Zoology (MCZ), and we will actively collaborate with other collections to spark early adoption and build a community of practice around innovations for biocollections digitization. We will engage the natural history collections community via online webinars and the Entomological Collections Network. LightningBug will stimulate research in robotics, imaging, text processing, data standards and archiving, and in the longer term will promote new research initiatives, including ecomorphology and multidimensional trait analysis. We will collaborate with an ongoing education program (Libraries of Life) to develop 3D augmented reality designs integrated into educational learning resource pipelines. Complementary educational activities will reach K–12 and undergraduate students and community scientists via the YPM, MCZ, and the community science platform Notes from Nature.