An Integrated Pipeline to Overcome the Biodiversity Digitization Gap

through semi-automated label transcription and 3D photogrammetric specimen imaging.

How it Works

The LightningBug system significantly reduces the human labor needed to acquire species occurrence records and morphological trait data from insect specimens. 

Label processing

Processing label images involves two general steps. First, the letters and words on each label will be transcribed into text. This combines text recognition software (OCR/HCR) with human vetting of the transcriptions (Notes from Nature). The second, more challenging step will be to develop software that sorts the words and parses them into the appropriate fields in a database.
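The parsing step can be illustrated with a minimal sketch. It is not the LightningBug parser: the regular expressions, the sample label text, and the choice of Darwin Core style field names (verbatimEventDate, decimalLatitude, decimalLongitude, recordedBy) are all assumptions made for illustration, and real labels vary far more than these patterns allow.

```python
import re

# Hypothetical patterns for three common label elements.
# Date such as "12-VII-1998" or "3 Jun 2001" (day, month as Roman numeral or 3 letters, year).
DATE_RE = re.compile(r"\b(\d{1,2})[-. ]([A-Za-z]{3}|[IVX]+)[-. ](\d{4})\b")
# Decimal-degree coordinates such as "31.883°N, 109.206°W".
LATLON_RE = re.compile(
    r"(\d{1,2}\.\d+)\s*°?\s*([NS])[,\s]+(\d{1,3}\.\d+)\s*°?\s*([EW])"
)
# Collector introduced by "coll." or "leg." up to the end of that line.
COLLECTOR_RE = re.compile(
    r"\b(?:coll|leg)\b\.?:?\s*(.+)$", re.IGNORECASE | re.MULTILINE
)


def parse_label(text):
    """Sort transcribed label text into a dict of database-style fields.

    Fields that cannot be recognized are simply left out, so a human
    reviewer can see what still needs attention.
    """
    record = {"verbatimLabel": text}

    date_match = DATE_RE.search(text)
    if date_match:
        # Keep the date exactly as written; normalization is a separate step.
        record["verbatimEventDate"] = date_match.group(0)

    coord_match = LATLON_RE.search(text)
    if coord_match:
        lat = float(coord_match.group(1))
        lon = float(coord_match.group(3))
        # Southern and western hemispheres are negative in decimal degrees.
        record["decimalLatitude"] = lat if coord_match.group(2) == "N" else -lat
        record["decimalLongitude"] = lon if coord_match.group(4) == "E" else -lon

    collector_match = COLLECTOR_RE.search(text)
    if collector_match:
        record["recordedBy"] = collector_match.group(1).strip()

    return record
```

Keeping the untouched transcription in verbatimLabel alongside the parsed fields means a failed or wrong parse never loses information from the label.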

Each collection can develop the pre- and post-processing curation workflow that works best for it. A good curation workflow will be important for efficiency. For example, each run should process specimens that are similar in size and shape so that LightningBug can be set to obtain research-quality images.

Specimens and labels are imaged simultaneously but then processed separately. The labels undergo OCR/HCR and structured text parsing, while the specimen images are either stored for future use or run through a photogrammetric process. Labels are left on the pin for imaging; label capture will occur as long as there is ~2 mm of separation between the labels. After imaging, specimens should be curated so that they are easy to locate in case re-processing is required.
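As a small pre-imaging check, the separation rule could be sketched as follows. The function name, the idea of recording each label's height along the pin in millimeters, and the exact 2.0 mm threshold are assumptions for illustration; the text only gives an approximate figure.

```python
MIN_GAP_MM = 2.0  # approximate threshold taken from the "~2 mm" guideline


def labels_with_tight_gaps(label_heights_mm, min_gap_mm=MIN_GAP_MM):
    """Return (lower, upper) height pairs of adjacent labels that sit too close.

    label_heights_mm: height of each label along the pin, in any order.
    An empty result means every adjacent pair has at least min_gap_mm between them.
    """
    heights = sorted(label_heights_mm)
    return [
        (lower, upper)
        for lower, upper in zip(heights, heights[1:])
        if upper - lower < min_gap_mm
    ]
```

A specimen that returns a non-empty list would need its labels spread on the pin before the run.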

Image Processing

Images of specimens are linked to their label images but processed separately. The end user can control image quality by increasing the number of angles imaged, implementing focus stacking, and changing the sensors used. Images are linked to labels and then stored (MorphoSource) for photogrammetric processing.
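To show how the number of angles and focus stacking drive the size of an imaging run, here is a minimal sketch of a capture plan. The Shot type, the capture_plan function, and the split into azimuth steps and elevation rings are hypothetical; they do not describe LightningBug's actual control software.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Shot:
    azimuth_deg: float    # rotation around the pin axis
    elevation_deg: float  # camera tilt above the horizontal
    focus_slice: int      # index within the focus stack at this viewpoint


def capture_plan(n_azimuths, elevations_deg, focus_slices=1):
    """List every shot for evenly spaced azimuths at each elevation.

    Total images = n_azimuths * len(elevations_deg) * focus_slices, so each
    quality setting multiplies acquisition time and storage.
    """
    if n_azimuths < 1 or focus_slices < 1:
        raise ValueError("n_azimuths and focus_slices must be at least 1")
    step = 360.0 / n_azimuths
    azimuths = [i * step for i in range(n_azimuths)]
    return [
        Shot(azimuth, elevation, focus)
        for elevation, azimuth, focus in product(
            elevations_deg, azimuths, range(focus_slices)
        )
    ]
```

For example, 8 azimuths at 2 elevations with a 3-slice focus stack yields 48 images per specimen, which makes the trade-off between quality and throughput explicit.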

Research Use of Specimens

We will sample broadly across the insect orders and families and develop taxon-specific workflows suited to the respective taxa. Specimen images and labels will be made easily available to a variety of biodiversity content management systems. Images will either go directly to GBIF or indirectly through a primary aggregator (e.g., SCAN). The goal is to provide researchers with high-quality images that can be used for trait analyses and species-level identification.

Photogrammetry is a technique that allows the three-dimensional shape of specimens, or parts thereof, to be constructed from a series of two-dimensional photographs taken from multiple viewpoints. Photogrammetrically derived models can be used to overcome many of the limitations of traditional 2D photography and are increasingly being used in studies of morphological evolution.

Photogrammetry relies upon the identification and matching of features across a series of images taken from varying viewpoints. Additional processing is then done to generate dense point clouds and/or depth maps which can be used to produce a 3D mesh.
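The feature-matching idea can be sketched in a few lines. This is an illustration only: it assumes each image has already been reduced to a list of numeric feature descriptors, and it uses nearest-neighbor matching with a ratio test (a common approach, often attributed to Lowe's SIFT work), which may differ from what any particular photogrammetry package does. The function name and the 0.75 threshold are assumptions.

```python
import math


def match_features(desc_a, desc_b, ratio=0.75):
    """Match descriptors from image A to image B.

    For each descriptor in A, find its two closest descriptors in B by
    Euclidean distance. The match is kept only if the best distance is
    clearly smaller than the second best (ratio test), which discards
    ambiguous features such as repeated textures.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        (best, j_best), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches
```

Matched pairs like these are what later stages use to estimate camera positions and triangulate 3D points before dense reconstruction.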