Published March 25, 2022 | Version 2
Dataset Open

Sticky Pi -- Machine Learning Data, Configuration and Models

  • 1. University of British Columbia

Description

Dataset for the Machine Learning section of the Sticky Pi project (https://doc.sticky-pi.com/).

Contains the datasets for the three algorithms described in the publication: Universal Insect Detector, Siamese Insect Matcher and Insect Tuboid Classifier.

Universal Insect Detector:

`universal_insect_detector/` contains training/validation data, configuration files to train the model, and the model as trained and used for publication.

  • `data/` – A set of SVG images, each containing the embedded raw JPEG image and a set of non-intersecting polygons around the labelled insects
  • `output/`
    • `model_final.pth` – the model as trained for the publication
  • `config/`
    • `config.yaml` – The configuration file defining the hyperparameters to train the model
    • `mask_rcnn_R_101_C4_3x.yaml` – The base configuration file from which `config.yaml` is derived
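
As an illustration of how the annotated SVG files in `data/` might be read, here is a minimal sketch using only the Python standard library. It assumes (this is not confirmed by the description) that the raw image is stored in an `<image>` element as a base64 data URI and that each labelled insect is a `<path>` or `<polygon>` element. The function name `read_annotated_svg` is hypothetical.

```python
import base64
import re
import xml.etree.ElementTree as ET

# Namespaced attribute name used by older SVG files for image links.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local_name(tag: str) -> str:
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_annotated_svg(svg_text: str):
    """Return (jpeg_bytes, shapes) from one annotated SVG document.

    Assumption: the image is a base64 data URI on an <image> element,
    and every label is a <path> (attribute "d") or a <polygon>
    (attribute "points"). Shapes are returned as raw attribute strings.
    """
    root = ET.fromstring(svg_text)
    jpeg_bytes = None
    shapes = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "image" and jpeg_bytes is None:
            href = element.get(XLINK_HREF) or element.get("href") or ""
            match = re.match(r"data:image/jpe?g;base64,(.*)", href, re.DOTALL)
            if match:
                jpeg_bytes = base64.b64decode(match.group(1))
        elif name == "path":
            shapes.append(element.get("d", ""))
        elif name == "polygon":
            shapes.append(element.get("points", ""))
    return jpeg_bytes, shapes
```

If the files use a different layout (for example, shapes nested in groups with extra metadata), the element names above would need adjusting after inspecting one file by hand.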

 

Siamese Insect Matcher:

`siamese_insect_matcher/` contains training/validation data, configuration files to train the model, and the model as trained and used for publication.

  • `data/` – A set of SVG images, each containing two vertically stacked embedded raw JPEG images that correspond to two frames in a series. Each predicted insect is labelled as a polygon. Insects labelled as the same instance across the two frames are grouped (i.e. placed in the same SVG group). The filename of each image is `<device>.<datetime_frame_1>.<datetime_frame_2>.svg`
  • `output/`
    • `model_final.pth` – the model as trained for the publication
  • `config/`
    • `config.yaml` – The configuration file defining the hyperparameters to train the model
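
The `<device>.<datetime_frame_1>.<datetime_frame_2>.svg` naming scheme can be split into its three fields with a few lines of Python. This is a sketch that assumes none of the fields contains a dot; the datetime format is not specified here, so the values are kept as plain strings. `FramePair` and `parse_pair_filename` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class FramePair(NamedTuple):
    """The three dot-separated fields of a matcher training file name."""
    device: str
    datetime_frame_1: str
    datetime_frame_2: str


def parse_pair_filename(path: str) -> FramePair:
    """Split '<device>.<datetime_frame_1>.<datetime_frame_2>.svg'.

    Assumption: the fields themselves contain no '.' characters.
    """
    name = Path(path).name
    if not name.endswith(".svg"):
        raise ValueError(f"not an SVG file name: {name!r}")
    parts = name[: -len(".svg")].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated fields in {name!r}")
    return FramePair(*parts)
```

Grouping the parsed names by `device` is one way to check how many frame pairs each device contributed.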

Insect Tuboid Classifier:

`insect_tuboid_classifier/` contains images of insect tuboids, a database file describing their taxonomy, a configuration file to train the model, and the model as trained and used for publication.

  • `data/`
    • `database.db` – an SQLite file with a single table, `ANNOTATIONS`. The table maps the unique identifier of each tuboid (`tuboid_id`) to a set of manually annotated taxonomic variables.
    • A directory tree of the form: `<series_id>/<tuboid_id>/`. Each terminal directory contains:
        • `tuboid.jpg` – a jpeg image made of 224 x 224 tiles representing all the shots in a tuboid, left to right, top to bottom – might be padded with empty images
        • `metadata.txt` – a csv text file with columns:
            • parrent_image_id – <device>.<UTC_datetime>
            • X – the X coordinates of the object centroid
            • Y – the Y coordinates of the object centroid
            • scale – the scaling factor applied between the original image and the 224 x 224 tile (>1 => image was enlarged)
        • `context.jpg` – a representation of the first whole image of a series, with a box around the first tuboid shot (this is for debugging/labelling purposes)
  • `output/`
    • `model_final.pth` – the model as trained for the publication
  • `config/`
    • `config.yaml` – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
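
The tile layout of `tuboid.jpg` and the `ANNOTATIONS` table can be explored with the standard library alone. The sketch below assumes that the image width and height are exact multiples of 224 pixels, and that trailing tiles may be empty padding, as described above. Apart from `tuboid_id`, the column names of `ANNOTATIONS` are not listed here, so all columns are returned as found. `tile_boxes` and `load_annotations` are hypothetical helper names.

```python
import sqlite3

TILE_SIZE = 224  # tile edge in pixels, as stated in the description


def tile_boxes(image_width, image_height, tile_size=TILE_SIZE):
    """List (left, upper, right, lower) boxes for every tile.

    Boxes are ordered left to right, then top to bottom, matching the
    order of the shots in the tuboid. Trailing boxes may be padding.
    """
    if image_width % tile_size or image_height % tile_size:
        raise ValueError("image size is not a multiple of the tile size")
    return [
        (x, y, x + tile_size, y + tile_size)
        for y in range(0, image_height, tile_size)
        for x in range(0, image_width, tile_size)
    ]


def load_annotations(db_path):
    """Return every row of the ANNOTATIONS table as a list of dicts."""
    connection = sqlite3.connect(db_path)
    try:
        connection.row_factory = sqlite3.Row  # rows become name-addressable
        rows = connection.execute("SELECT * FROM ANNOTATIONS").fetchall()
        return [dict(row) for row in rows]
    finally:
        connection.close()
```

If Pillow is available, each box can be passed to `Image.crop` to extract one shot, and the rows of `metadata.txt` (read with the `csv` module) should then line up with the non-padding tiles in the same order. That alignment is an assumption worth checking on a few tuboids.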

Notes

Second version. Added data to the Universal Insect Detector (UID) and the Siamese Insect Matcher (SIM). Minor changes to the configurations.

Files

insect-tuboid-classifier.zip

Files (11.0 GB)

  • md5:f125654fefb6a94c5c9b1014c812344b – 6.9 GB
  • md5:847e05350bf8894e6ea3877af41c8f74 – 2.4 GB
  • md5:af04c62fc6b1e8e9453a202f7af6900a – 1.8 GB
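
Since each archive is listed with an MD5 checksum, a download can be verified before unpacking. The following is a minimal sketch using Python's `hashlib`; the helper names are hypothetical, and the file path and the checksum to compare against have to be supplied by the reader from the listing above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A mismatch usually means the download was truncated or corrupted, and the archive should be fetched again.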