The LightningBug Project
Insects comprise over half of the hundreds of millions of scientific specimens in US natural history collections, yet only a fraction of the holdings are digitized because we lack the technology needed to efficiently capture specimen data.
The Importance of Digitization
Most of the data needed to assess global change impacts on insects are locked up in collected specimens and their labels. Mobilizing these data has two key parts: transcribing specimen labels and imaging specimens. Transcription currently involves people entering label data directly into spreadsheets, or imaging the labels for later manual data entry; specimen images are 2D and restricted to 1-3 aspects of exemplars. Consequently, only 5% of specimen labels are transcribed and less than 1% of specimens are imaged.
At current rates, it will take well over 50 years to digitize our insect collections, and the resulting specimen imaging will be subpar. The LightningBug project addresses the need for higher rates and better quality by developing a robotic-informatics system that will increase the efficiency of transcription 10-fold and dramatically reduce costs. LightningBug will also produce 100-fold more research-quality image suites that capture a full 3D view of specimens. The increase in the number and quality of images, coupled with their transcribed label data, will usher in a new era of biodiversity-related research and education, from bioinspired robotics to evolutionary ecology.
Intellectual Merit
The intellectual merit of the work proposed here is the development of an innovative, performant solution to the pinned-insect digitization impediment that showcases production of research-ready data. This solution leverages recent advances in automated imaging to efficiently acquire multi-view imagery from pinned insects. Further, it connects those imaging advances to downstream steps that will deliver rich data to multiple communities.
We aim to: 1) create a simple robotic imaging system to capture multi-view images of both labels and specimens; 2) process fragmentary views of multiple labels into single integrated “virtual labels”; 3) connect virtual labels to structured text extraction services; and 4) apply photogrammetric analysis to deduce the 3D shape and structure of specimens. Two science use cases, ranging from bioluminescent beetles to gossamer-winged butterflies, will highlight the use of specimen-based multi-view imaging in global change and functional morphological analyses and inform development of best practices for producing immediately usable research data.
These digital objects will present new challenges for data preservation and access, but we view this positively, as a catalyst for new solutions that preserve them and provide efficient access to petabytes of research images. We will address key challenges through a partnership with MorphoSource to develop a linked institutional repository model of data access for large digital assets such as multi-view imaging.
Broader Impacts
Insects comprise over half of the hundreds of millions of scientific specimens in US natural history collections, yet only a fraction of the holdings are digitized because we lack the technology needed to efficiently capture specimen data. Most of the data needed to assess global change impacts on insects are locked up in these specimens and their labels. Mobilizing these data has two key parts: transcribing specimen labels and imaging specimens. Transcription currently involves people entering label data directly into spreadsheets, or imaging the labels for later manual data entry; specimen images are 2D and restricted to 1-3 aspects of exemplars. Consequently, only 5% of specimen labels are transcribed and less than 1% of specimens are imaged.