Published December 4, 2022 | Version v1
Dataset Open

Benchmark datasets for detection and identification of insects from camera trap images with deep learning

Description

Insect benchmark datasets for training, validation, and testing (train1201.zip, val1201.zip, and test1201.zip) with time-lapse images, as described in the paper:

Bjerge K, Alison J, Dyrmann M, Frigaard C.E., Mann H. M. R., Høye T.T., Accurate detection and identification of insects from camera trap images with deep learning, bioRxiv:10.1101/2022.10.25.513484v1

Labels are in YOLO format; see ultralytics/yolov5: label format.
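As a rough illustration of the YOLO label format, each line of a label (.txt) file holds a class id followed by a bounding box given as center x, center y, width, and height, all normalized to the range 0 to 1 relative to the image size. The sketch below parses such lines and converts a box to pixel corners. The names `YoloBox`, `parse_label_text`, and `to_pixel_corners` are hypothetical helpers, not part of the dataset or of YOLOv5.

```python
from typing import List, NamedTuple, Tuple


class YoloBox(NamedTuple):
    """One annotation: class id plus a box normalized to [0, 1]."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_label_text(text: str) -> List[YoloBox]:
    """Parse the content of a YOLO label file; blank lines are skipped."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        boxes.append(YoloBox(int(parts[0]), *(float(p) for p in parts[1:])))
    return boxes


def to_pixel_corners(box: YoloBox, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a normalized box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * img_w
    y_min = (box.y_center - box.height / 2) * img_h
    x_max = (box.x_center + box.width / 2) * img_w
    y_max = (box.y_center + box.height / 2) * img_h
    return x_min, y_min, x_max, y_max
```

An empty label file, as used for background images, simply yields an empty list.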

The annotated training and validation datasets contain insects of nine different species, as listed below:

Coccinellidae septempunctata
Apis mellifera
Bombus lapidarius
Bombus terrestris
Eupeodes corolla
Episyrphus balteatus
Aglais urticae
Vespula vulgaris
Eristalis tenax

The test dataset contains additional classes of insects.

9 Non-Bombus Anthophila
10 Bombus spp.
11 Syrphidae
12 Fly spp.
13 Unclear insect
14 Mixed animals, comprising:
Rhopalocera
Non-Anthophila Hymenoptera
Non-Syrphidae Diptera
Non-Coccinellidae Coleoptera
Coccinellidae
Other animals
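To summarize annotations by class, a lookup from class id to name can be handy. The sketch below assumes that the nine species are numbered 0 to 8 in the order listed above (an inference from the test-only classes starting at 9, not something stated here); the authoritative names are in insects-1201val.yaml. `CLASS_NAMES` and `count_by_class` are hypothetical names.

```python
from collections import Counter

# Assumption: species ids 0-8 follow the order of the list in this description.
SPECIES = [
    "Coccinellidae septempunctata",
    "Apis mellifera",
    "Bombus lapidarius",
    "Bombus terrestris",
    "Eupeodes corolla",
    "Episyrphus balteatus",
    "Aglais urticae",
    "Vespula vulgaris",
    "Eristalis tenax",
]

# Additional classes that occur only in the test dataset (ids as listed).
TEST_ONLY = {
    9: "Non-Bombus Anthophila",
    10: "Bombus spp.",
    11: "Syrphidae",
    12: "Fly spp.",
    13: "Unclear insect",
    14: "Mixed animals",
}

CLASS_NAMES = {**dict(enumerate(SPECIES)), **TEST_ONLY}


def count_by_class(label_text: str) -> Counter:
    """Count annotations per class name in the text of one YOLO label file."""
    counts = Counter()
    for line in label_text.splitlines():
        if line.strip():
            class_id = int(line.split()[0])
            counts[CLASS_NAMES.get(class_id, f"unknown ({class_id})")] += 1
    return counts
```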

There are two naming conventions for image (.jpg) and label (.txt) files.

Background images without insects are named:
X_Seq-YYYYMMDDHHMMSS-snapshot
E.g.:
Background image: 12_13-20190704172200-snapshot.jpg
Empty label file: 12_13-20190704172200-snapshot.txt

Images annotated with insects are named:
SZ_IP-MonthDate_C_Seq-YYYYMMDDHHMMSS
E.g.:
Image file: S1_146-Aug23_1_156-20190822133230.jpg
Label file: S1_146-Aug23_1_156-20190822133230.txt

Abbreviations:

YYYYMMDDHHMMSS – Capture timestamp with year, month, day, hour, minute, and second
Seq – Sequence number created by the motion program to separate images
C – Camera Id (0 or 1) identifying one of the two cameras in the system given by SZ_IP
MonthDate – Name of the folder where the original images were stored in the system
SZ_IP – Identification of five camera systems: S1_123, S2_146, S3_194, S4_199, S5_187 (Two cameras in each system)
X – An index number related to a specific camera and folder ensuring unique file names of background images from different camera systems.

The important information in a filename is system (SZ_IP), camera Id (C) and timestamp (YYYYMMDDHHMMSS).
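The two naming conventions above can be parsed with regular expressions. The following is a minimal sketch based only on the two example filenames; it assumes that MonthDate is a run of letters followed by digits (as in Aug23) and that the system identifier has the form S<digit(s)>_<digits>. `parse_filename` is a hypothetical helper, not part of the dataset tooling.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Any, Dict

# Background images: X_Seq-YYYYMMDDHHMMSS-snapshot
BACKGROUND_RE = re.compile(
    r"^(?P<index>\d+)_(?P<seq>\d+)-(?P<timestamp>\d{14})-snapshot$"
)
# Annotated images: SZ_IP-MonthDate_C_Seq-YYYYMMDDHHMMSS
ANNOTATED_RE = re.compile(
    r"^(?P<system>S\d+_\d+)-(?P<folder>[A-Za-z]+\d+)_(?P<camera>[01])_(?P<seq>\d+)"
    r"-(?P<timestamp>\d{14})$"
)


def parse_filename(name: str) -> Dict[str, Any]:
    """Split an image or label filename into its documented fields."""
    stem = Path(name).stem  # drop .jpg / .txt
    for kind, pattern in (("annotated", ANNOTATED_RE), ("background", BACKGROUND_RE)):
        match = pattern.match(stem)
        if match:
            fields: Dict[str, Any] = match.groupdict()
            fields["kind"] = kind
            fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y%m%d%H%M%S")
            for key in ("index", "seq", "camera"):
                if key in fields:
                    fields[key] = int(fields[key])
            return fields
    raise ValueError(f"unrecognized filename: {name!r}")
```

For example, `parse_filename("S1_146-Aug23_1_156-20190822133230.jpg")` yields the system, camera Id, and timestamp that the description singles out as the important fields.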

The three best YOLOv5 models (YOLOv5models.zip) from the paper are available in PyTorch format.

All models were tested with YOLOv5 release v7.0 (22-11-2022): ultralytics/yolov5: YOLOv5 in PyTorch

insect1201-bestF1-640v5m.pt: Model no. 6 in Table 2 (F1=0.912)
insect1201-bestF1-1280v5m6.pt: Model no. 8 in Table 2 (F1=0.925)
insect1201-bestF1-1280v5m6.pt: Model no. 10 in Table 2 (F1=0.932)

insects-1201val.yaml: YAML file with label names to train YOLOv5

trainInsects-1201m.sh: Linux bash shell script with parameters to train YOLOv5m6
valInsectsF1-1201.sh: Linux bash shell script with parameters to validate models

 

Files

Files (13.8 GB)

Size       Checksum
3.0 GB     md5:d940cac65cf067a3baf356ecaa9944e3
9.0 GB     md5:6831b05cab0988743a113819eb23be75
1.2 GB     md5:88317db11fd10fab4976edb4d8d4a71f
513.2 MB   md5:bc2194e94bfbe0ba93e4a66df6eb6f1b
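Given the size of the archives, it may be worth checking a download against its md5 checksum before unpacking. The sketch below uses Python's standard `hashlib` and reads the file in chunks so that multi-gigabyte archives are not loaded into memory at once. `md5_of_file` and `matches_checksum` are hypothetical helper names, and which checksum belongs to which archive has to be taken from the record page.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Union[str, Path], expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum("train1201.zip", "md5:...")` with the checksum copied from the file list returns `True` only when the download is intact.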

Additional details

Related works

References
Preprint: 10.1101/2022.10.25.513484 (DOI)

Funding

European Commission
EcoStack – Stacking of ecosystem services: mechanisms and interactions for optimal crop protection, pollination enhancement, and productivity 773554
European Commission
MAMBO – Modern Approaches to the Monitoring of BiOdiversity 101060639

References

  • Bjerge K, Alison J, Dyrmann M, Frigaard C.E., Mann H. M. R., Høye T.T., Accurate detection and identification of insects from camera trap images with deep learning, bioRxiv:10.1101/2022.10.25.513484v1