Published July 18, 2023 | Version v1
Dataset Open

Machine learning analysis of wing venation patterns accurately identifies Sarcophagidae, Calliphoridae and Muscidae fly species

  • 1. University of Malaya
  • 2. Universiti Teknologi MARA
  • 3. Natural History Museum
  • 4. Airlangga University
  • 5. International Department of Dipterology*

Description

In medical, veterinary, and forensic entomology, the ease and affordability of image data acquisition have made whole-image analysis an invaluable approach for species identification. Krawtchouk moment invariants are a classical mathematical transformation that can extract local features from an image, thus allowing subtle species-specific biological variations to be accentuated for subsequent analyses. We extracted Krawtchouk moment invariant features from binarised wing images of 759 male fly specimens from the Calliphoridae, Sarcophagidae, and Muscidae families (13 species and a species variant). Subsequently, we trained the Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) random forests classifier using linear discriminants derived from these features and inferred the species identity of specimens from the test samples. Five-fold cross-validation results show a 98.56 ± 0.38% (standard error) mean identification accuracy at the family level, and a 91.04 ± 1.33% mean identification accuracy at the species level. The mean F1-score of 0.89 ± 0.02 reflects a good balance between the model's precision and recall. The present study consolidates findings from previous small pilot studies on the usefulness of wing venation patterns for inferring species identities. Thus, the stage is set for the development of a mature data analytic ecosystem for routine computer image-based identification of fly species that are of medical, veterinary, and forensic importance.
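The "mean ± standard error" figures quoted above summarise per-fold scores from five-fold cross-validation. The following is a minimal sketch of that summary in Python, assuming the standard error is the sample standard deviation (n - 1 denominator) divided by the square root of the number of folds. The description does not state which estimator was used. The function name and the per-fold values are hypothetical and are not taken from the dataset.

```python
import math
import statistics


def mean_and_standard_error(fold_scores):
    """Return (mean, standard error of the mean) for per-fold scores.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    n = len(fold_scores)
    if n < 2:
        raise ValueError("at least two folds are needed for a standard error")
    mean = statistics.fmean(fold_scores)
    # statistics.stdev uses the n - 1 (sample) denominator.
    standard_error = statistics.stdev(fold_scores) / math.sqrt(n)
    return mean, standard_error


# Hypothetical per-fold accuracies in percent (illustrative only).
fold_accuracies = [90.0, 92.0, 88.0, 94.0, 91.0]
mean_acc, se_acc = mean_and_standard_error(fold_accuracies)
print(f"{mean_acc:.2f} ± {se_acc:.2f}%")
```

The same helper could be applied to per-fold F1-scores to obtain a summary in the form reported above.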

Notes

The binarised image files can be read using R for subsequent analyses. The raw image files are in TIF or PNG format and the binarised image files are in PNG format. Both formats can be opened using standard image software.

Files

preprocessedData.zip

Files (2.7 GB)

Name Size
md5:d522a2cef787d68c49cddeb3453e3ec8 3.5 MB
md5:30e7c54cc1b734ffb882cb66e3130eb9 2.7 GB
md5:6fd4f0726338c46c17336712289e06ef 1.8 kB
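The md5 values in the file listing can be used to check that a download completed without corruption. The following is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the listing above does not show which file name belongs to which checksum, so the local path must be supplied by the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:30e7c54cc1b734ffb882cb66e3130eb9")` should return `True` if `local_path` points to an intact copy of the 2.7 GB file.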

Additional details

Related works

Is derived from
10.5281/zenodo.7519142 (DOI)
Is source of
10.5281/zenodo.7519144 (DOI)