Validation Dataset for "Entomoscope 2.0 and ENIMAS 2.0: An Open-Source, AI-Integrated Platform for Rapid and Affordable Insect Digitization"
Description
This repository contains the validation and training datasets used in the manuscript "Entomoscope 2.0 and ENIMAS 2.0: An Open-Source, AI-Integrated Platform for Rapid and Affordable Insect Digitization".
The data is divided into two primary components. Both are provided to ensure full reproducibility of the study's results and to support the training of future AI models for insect digitization.
1. Workflow Validation Dataset (Efficiency Benchmark)
This subset comprises 54 insect specimens digitized using two distinct workflows to benchmark operational efficiency:
- Entomoscope 2.0: A low-cost, open-source platform using the AI-integrated ENIMAS 2.0 software.
- Keyence VHX-7000: A high-end commercial digital microscopy system using a manual workflow.
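The efficiency comparison boils down to a ratio of digitization times between the two workflows. The sketch below shows one way to compute a fold speedup from per-specimen timings. It assumes the speedup is the mean Keyence time divided by the mean Entomoscope time over the same specimens; the study's exact definition may differ. The function name `fold_speedup` and the timing values are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def fold_speedup(baseline_seconds, candidate_seconds):
    """Return mean(baseline) / mean(candidate).

    Assumption: both lists hold per-specimen durations in seconds for the
    same set of specimens. A result > 1 means the candidate is faster.
    """
    if len(baseline_seconds) != len(candidate_seconds):
        raise ValueError("both workflows must cover the same specimens")
    if not baseline_seconds:
        raise ValueError("no timing data")
    return mean(baseline_seconds) / mean(candidate_seconds)


# Hypothetical per-specimen times in seconds (NOT values from this dataset).
keyence = [300.0, 240.0, 360.0]
entomoscope = [120.0, 100.0, 140.0]

print(fold_speedup(keyence, entomoscope))
```

With equal specimen counts, the ratio of means equals the ratio of total times, so either view of the time logs gives the same figure.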
2. YOLO-Fast Training Dataset
- Images and Labels: A custom dataset of 257 manually annotated insect images. This data was used to fine-tune the YOLOv8 object detection model, which powers the "YOLO-Fast" automated specimen cropping method described in the study.
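Cropping a specimen from a detection means converting a bounding box into pixel coordinates. The sketch below assumes the labels follow the standard YOLO text format used by YOLOv8 training (one line per box: `class x_center y_center width height`, with coordinates normalized to 0 to 1). That format is an assumption; check the label files before relying on it. The helper names and the 4000 x 3000 image size are hypothetical.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max).

    Assumes normalized center/size coordinates; output is in pixels,
    clamped to the image bounds so it can be used directly as a crop box.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Normalized center and size -> absolute corner coordinates.
    x_min = max(0, round((xc - w / 2) * img_w))
    y_min = max(0, round((yc - h / 2) * img_h))
    x_max = min(img_w, round((xc + w / 2) * img_w))
    y_max = min(img_h, round((yc + h / 2) * img_h))
    return class_id, x_min, y_min, x_max, y_max


def read_labels(path, img_w, img_h):
    """Read every non-empty line of a (hypothetical) label file."""
    with open(path) as fh:
        return [yolo_to_pixel_box(ln, img_w, img_h) for ln in fh if ln.strip()]


print(yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", 4000, 3000))
```

The returned corner tuple matches the `(left, upper, right, lower)` order that common imaging libraries expect for a crop.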
Contents:
- Raw Images: Original captures from both systems.
- Processed Images: Output of AI cropping, background removal, and uniform background generation.
- Morphometric Data: Automated oriented bounding box (OBB) measurements (Entomoscope) vs. manual measurements (Keyence).
- Time Logs: Detailed timing data used to calculate the 2.28-fold speedup reported in the paper.
- YOLO-Fast Training Data: The 257 original images and their corresponding bounding box labels used for model training.
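An oriented bounding box gives a specimen's length and width directly as the lengths of its two adjacent sides. The sketch below assumes the four OBB corners are available in perimeter order in pixel units and that a millimetres-per-pixel calibration factor is known. Both the input layout and the `mm_per_px` value are hypothetical; the dataset's own measurement files may store results differently.

```python
import math


def obb_dimensions(corners, mm_per_px):
    """Return (length_mm, width_mm) for an oriented bounding box.

    `corners` are four (x, y) pixel points in perimeter order (assumption);
    `mm_per_px` is a hypothetical calibration factor. Length is the longer
    of two adjacent sides, width the shorter.
    """
    if len(corners) != 4:
        raise ValueError("an OBB needs exactly four corners")
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    side_a = math.hypot(x1 - x0, y1 - y0)
    side_b = math.hypot(x2 - x1, y2 - y1)
    return max(side_a, side_b) * mm_per_px, min(side_a, side_b) * mm_per_px


# A box rotated off-axis whose sides measure 500 px and 200 px.
corners = [(0, 0), (300, 400), (140, 520), (-160, 120)]
length_mm, width_mm = obb_dimensions(corners, mm_per_px=0.01)
print(length_mm, width_mm)
```

Because the box rotates with the specimen, these dimensions do not depend on how the insect is oriented in the frame, which is what makes them comparable to manual measurements.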