Published November 23, 2020 | Version v1
Dataset Open

Random forests for predicting species identity of forensically important blow flies (Diptera: Calliphoridae) and flesh flies (Sarcophagidae) using geometric morphometric data: proof of concept

  • 1. University of Malaya
  • 2. Gene Express Sdn. Bhd.*

Description

Wing shape variation has been shown to be useful for delineating forensically important fly species in two Diptera families: Calliphoridae and Sarcophagidae. Compared to DNA-based identification, geometric morphometric data acquisition and analysis cost much less because the tools required are basic and stable software is available. However, to date, an explicit demonstration of using wing geometric morphometric data for species identity prediction in these two families has been lacking. Here, geometric morphometric data from 19 homologous landmarks on the left wing of males from seven species of Calliphoridae (n=55) and eight species of Sarcophagidae (n=40) were obtained and processed using Generalized Procrustes Analysis. The allometric effect was removed by regressing the Procrustes coordinates on log10-transformed centroid size. Subsequently, principal component analysis of the allometry-adjusted Procrustes variables was performed, and the first 15 principal components were used to train a random forests model for species prediction. On a real test sample of 33 male fly specimens collected around a human corpse at a crime scene, the estimated concordance between species identities predicted by the random forests model and those inferred by DNA-based identification was about 80.6% (approximate 95% confidence interval = [68.9%, 92.2%]). In contrast, baseline concordance using naive majority-class prediction was 36.4%. The results provide proof of concept that geometric morphometric data have good potential to complement morphological and DNA-based identification of blow flies and flesh flies in forensic work.
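
The allometry adjustment mentioned in the description can be illustrated with a minimal sketch. It assumes the adjustment means fitting a simple linear regression of each Procrustes shape variable on log10 centroid size and keeping the residuals. The record does not state which software or exact procedure the authors used, so the function name `remove_allometry` and the toy numbers below are hypothetical and are not taken from the data set.

```python
import math


def remove_allometry(shapes, centroid_sizes):
    """Return size-adjusted shape variables (hypothetical helper).

    shapes: list of rows, one per specimen, each a list of shape variables
            (e.g. flattened Procrustes coordinates).
    centroid_sizes: list of positive centroid sizes, one per specimen.
    Each shape variable is regressed on log10(centroid size) by ordinary
    least squares, and the residuals are returned.
    """
    log_cs = [math.log10(cs) for cs in centroid_sizes]
    n = len(log_cs)
    mean_x = sum(log_cs) / n
    sxx = sum((x - mean_x) ** 2 for x in log_cs)
    if sxx == 0:
        raise ValueError("centroid sizes must not all be identical")

    n_vars = len(shapes[0])
    residuals = [[0.0] * n_vars for _ in range(n)]
    for j in range(n_vars):
        column = [row[j] for row in shapes]
        mean_y = sum(column) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_cs, column))
        slope = sxy / sxx
        for i in range(n):
            fitted = mean_y + slope * (log_cs[i] - mean_x)
            residuals[i][j] = column[i] - fitted  # observed minus fitted
    return residuals


# Toy data (made up): three specimens, two shape variables.
# The first variable is exactly linear in log10 size; the second is unrelated.
toy_sizes = [10.0, 100.0, 1000.0]
toy_shapes = [[0.1, 0.5], [0.2, 0.2], [0.3, 0.5]]
toy_adjusted = remove_allometry(toy_shapes, toy_sizes)
```

In this toy case the first variable is fully explained by size, so its residuals vanish, while the second keeps its variation around its mean. The residuals are centred on zero. A real workflow might add the mean shape back before the principal component analysis, which does not change the principal component scores.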

Notes

The data set can be used as it is. See the README.txt file for descriptions of file contents.

Funding provided by: Universiti Malaya
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100004386
Award Number: PG074-2015A

Files

DNA_data.zip

Files (20.8 MB)

Checksum Size
md5:af93d54ccc87c69ef10a54499f8de1a7 4.8 kB
md5:358ab4edb6882a17a2dd5f60bf26e4f9 19.2 kB
md5:7090a8c309e18c7d0d16000dd18697f6 20.7 MB
md5:5d016c6a35c64e5ac5a68d0ec047e890 15.7 kB
md5:8c08675a6b4dde803a8aaeebe2dd3459 2.9 kB
md5:ebb64f5a1f87b0a03617f07cb336a2fe 41.5 kB