There is a newer version of this record available.

Dataset Open Access

Sticky Pi -- Machine Learning Data, Configuration and Models

Quentin Geissmann

Dataset for the Machine Learning section of the Sticky Pi project (https://doc.sticky-pi.com/)

Contains the dataset for the three algorithms described in the publication: Universal Insect Detector, Siamese Insect Matcher and Insect Tuboid Classifier.

Universal Insect Detector:

`universal_insect_detector/` contains training/validation data, configuration files to train the model, and the model as trained and used for publication.

  • `data/` – A set of SVG images, each containing the embedded raw JPEG image and a set of non-intersecting polygons around the labelled insects
  • `output/`
    • `model_final.pth` – the model as trained for the publication
  • `config/`
    • `config.yaml` – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
    • `mask_rcnn_R_101_C4_3x.yaml` – the base configuration file from which config is derived
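
As an illustration of the `data/` format described above, the following is a minimal sketch of how the embedded image and the insect outlines might be pulled out of one annotation file. It assumes the JPEG is stored as a base64 data URI on an SVG `<image>` element and that each labelled insect is an SVG `<path>` element; neither detail is stated in this record, so check an actual file before relying on it. The function name is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def extract_svg_annotation(svg_data):
    """Return (embedded image bytes, outline path strings) from SVG text or bytes."""
    root = ET.fromstring(svg_data)
    images = []
    for img in root.iter(f"{{{SVG_NS}}}image"):
        # The data URI may sit under xlink:href (older SVG) or plain href.
        href = img.get(f"{{{XLINK_NS}}}href") or img.get("href") or ""
        if href.startswith("data:") and ";base64," in href:
            images.append(base64.b64decode(href.split(";base64,", 1)[1]))
    # Assumption: one <path> element per labelled insect.
    outlines = [p.get("d") for p in root.iter(f"{{{SVG_NS}}}path") if p.get("d")]
    return images, outlines
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing the bytes lets the XML parser handle the encoding declaration itself.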

 

Siamese Insect Matcher:

`siamese_insect_matcher/` contains training/validation data, configuration files to train the model, and the model as trained and used for publication.

  • `data/` – A set of SVG images, each containing two embedded raw JPEG images, vertically stacked, that correspond to two frames in a series. Each predicted insect is labelled as a polygon. Insects labelled as the same instance in both frames are grouped (i.e. in an SVG group). The filename of each image is `<device>.<datetime_frame_1>.<datetime_frame_2>.svg`
  • `output/`
    • `model_final.pth` – the model as trained for the publication
  • `config/`
    • `config.yaml` – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
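
The filename convention `<device>.<datetime_frame_1>.<datetime_frame_2>.svg` can be split into its three fields with a few lines of Python. This is only a sketch: it assumes that none of the fields contains a dot, and it leaves the datetimes as strings because their exact format is not specified here. The function name is hypothetical.

```python
from pathlib import Path


def parse_pair_filename(path):
    """Split '<device>.<datetime_frame_1>.<datetime_frame_2>.svg' into its fields."""
    name = Path(path).name
    stem, dot, ext = name.rpartition(".")
    if not dot or ext.lower() != "svg":
        raise ValueError(f"not an svg file: {name}")
    parts = stem.split(".")
    # Assumption: the device and datetime fields contain no dots themselves.
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated fields: {name}")
    device, frame_1, frame_2 = parts
    return {
        "device": device,
        "datetime_frame_1": frame_1,
        "datetime_frame_2": frame_2,
    }
```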

 

Insect Tuboid Classifier:

`insect_tuboid_classifier/` contains images of insect tuboids, a database file describing their taxonomy, a configuration file to train the model, and the model as trained and used for publication.

  • `data/`
    • `database.db` – an SQLite file with a single table `ANNOTATIONS`. The table maps the unique identifier of each tuboid (`tuboid_id`) to a set of manually annotated taxonomic variables.
    • A directory tree of the form: `<series_id>/<tuboid_id>/`. Each terminal directory contains:
        • `tuboid.jpg` – a jpeg image made of 224 x 224 tiles representing all the shots in a tuboid, left to right, top to bottom – might be padded with empty images
        • `metadata.txt` – a csv text file with columns:
            • parrent_image_id – <device>.<UTC_datetime>
            • X – the X coordinates of the object centroid
            • Y – the Y coordinates of the object centroid
            • scale – the scaling factor applied between the original image and the 224 x 224 tile (>1 means the image was enlarged)
        • `context.jpg` – a representation of the first whole image of a series, with a box around the first tuboid shot (this is for debugging/labelling purposes)
  • `output/`
    • `model_final.pth` – the model as trained for the publication
  • `config/`
    • `config.yaml` – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
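
To make the classifier data layout concrete, here is a minimal sketch that reads the `ANNOTATIONS` table and computes where each 224 x 224 shot sits inside a `tuboid.jpg`. Only the table name, the `tuboid_id` column and the tile size come from the description above; the helper names are hypothetical, and the other columns of the table are returned as-is because they are not listed here.

```python
import sqlite3

TILE = 224  # tile edge in pixels, as stated in the dataset description


def load_annotations(db_path):
    """Return every row of the ANNOTATIONS table as a list of dicts."""
    con = sqlite3.connect(db_path)
    try:
        con.row_factory = sqlite3.Row
        rows = con.execute("SELECT * FROM ANNOTATIONS").fetchall()
        return [dict(row) for row in rows]
    finally:
        con.close()


def tile_boxes(width, height, tile=TILE):
    """List (left, upper, right, lower) boxes, left to right, then top to bottom."""
    if width % tile or height % tile:
        raise ValueError("image size is not a multiple of the tile size")
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]
```

The boxes follow the common (left, upper, right, lower) convention, so they can be passed to an image library's crop function. Trailing boxes may correspond to the empty padding images mentioned above.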

Files (9.0 GB)

  • `insect-tuboid-classifier.zip` – 6.9 GB, md5:f125654fefb6a94c5c9b1014c812344b
  • `siamese-insect-matcher.zip` – 1.1 GB, md5:edf7b5fa94bd074e5e52284e96510c0e
  • `universal-insect-detector.zip` – 1.1 GB, md5:2621daac4341ea3f7777c2b84e1c8568
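
The md5 checksums listed for the three archives can be used to confirm that a download completed without corruption. The sketch below hashes a file in chunks so that the multi-gigabyte archives need not fit in memory; the function names are hypothetical, and the archives are assumed to sit in the current working directory.

```python
import hashlib

# Checksums as listed for this record.
EXPECTED_MD5 = {
    "insect-tuboid-classifier.zip": "f125654fefb6a94c5c9b1014c812344b",
    "siamese-insect-matcher.zip": "edf7b5fa94bd074e5e52284e96510c0e",
    "universal-insect-detector.zip": "2621daac4341ea3f7777c2b84e1c8568",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `is_intact("siamese-insect-matcher.zip", EXPECTED_MD5["siamese-insect-matcher.zip"])` should return `True` for an undamaged download.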