# Sticky Pi -- Machine Learning Data, Configuration and Models

Quentin Geissmann

Dataset for the Machine Learning section of the Sticky Pi project (https://doc.sticky-pi.com/)

It contains the datasets for the three algorithms described in the publication: the Universal Insect Detector, the Siamese Insect Matcher and the Insect Tuboid Classifier.

Universal Insect Detector:

universal_insect_detector/ contains training/validation data, configuration files to train the model, and the model as trained and used for publication.

• data/ – A set of SVG images, each containing the embedded raw JPEG image and a set of non-intersecting polygons around the labelled insects
• output/
  • model_final.pth – The model as trained for the publication
• config/
  • config.yaml – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
  • mask_rcnn_R_101_C4_3x.yaml – The base configuration file from which config.yaml is derived
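
As an illustration of the data/ format described above, the following is a minimal sketch of how the embedded image and the annotation shapes might be read with the Python standard library. It assumes that the raw image is stored as a base64 data URI on an SVG `<image>` element and that polygons are stored as `<path>` or `<polygon>` elements. Neither detail is stated in this description, so check both against an actual file. The function name `read_annotated_svg` is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def read_annotated_svg(svg_text):
    """Return (embedded_images, shapes) for one annotated SVG document.

    embedded_images: list of decoded image bytes, in document order.
    shapes: list of (tag, raw_geometry_string) tuples.
    """
    root = ET.fromstring(svg_text)

    images = []
    for img in root.iter(SVG_NS + "image"):
        href = img.get(XLINK_HREF) or img.get("href") or ""
        # Assumption: the image is inlined as "data:image/jpeg;base64,<payload>".
        if href.startswith("data:") and ";base64," in href:
            images.append(base64.b64decode(href.split(";base64,", 1)[1]))

    shapes = []
    # Assumption: annotations are <path d="..."> or <polygon points="...">.
    for tag, attr in (("path", "d"), ("polygon", "points")):
        for element in root.iter(SVG_NS + tag):
            geometry = element.get(attr)
            if geometry:
                shapes.append((tag, geometry))

    return images, shapes
```

The geometry strings are returned unparsed, because the exact path syntax used by the annotation tool is not documented here.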

Siamese Insect Matcher:

siamese_insect_matcher/ contains training/validation data, configuration files to train the model, and the model as trained and used for publication.

• data/ – A set of SVG images, each containing two embedded raw JPEG images, stacked vertically, that correspond to two frames in a series. Each predicted insect is labelled as a polygon. Insects labelled as the same instance in both frames are grouped together (i.e. in an SVG group). The filename of each image is <device>.<datetime_frame_1>.<datetime_frame_2>.svg
• output/
  • model_final.pth – The model as trained for the publication
• config/
  • config.yaml – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
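
The filename convention above can be split into its three fields as sketched below. This assumes that neither the device identifier nor the datetime strings contain a dot. The datetime format is not specified here, so the two timestamps are returned as plain strings rather than parsed. The function name `parse_pair_filename` is hypothetical.

```python
from pathlib import Path


def parse_pair_filename(path):
    """Split '<device>.<datetime_frame_1>.<datetime_frame_2>.svg' into fields.

    Returns (device, datetime_frame_1, datetime_frame_2) as strings.
    """
    name = Path(path).name
    if not name.endswith(".svg"):
        raise ValueError(f"not an SVG file: {name!r}")

    # Assumption: the fields themselves contain no '.' characters.
    fields = name[: -len(".svg")].split(".")
    if len(fields) != 3:
        raise ValueError(f"expected 3 dot-separated fields, got {len(fields)}")

    device, datetime_frame_1, datetime_frame_2 = fields
    return device, datetime_frame_1, datetime_frame_2
```

Raising an error on an unexpected number of fields, instead of guessing, makes files that break the assumption easy to spot.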

Insect Tuboid Classifier:

insect_tuboid_classifier/ contains images of insect tuboids, a database file describing their taxonomy, a configuration file to train the model, and the model as trained and used for publication.

• data/
  • database.db – An SQLite file with a single table, ANNOTATIONS. The table maps the unique identifier of each tuboid (tuboid_id) to a set of manually annotated taxonomic variables.
  • A directory tree of the form <series_id>/<tuboid_id>/. Each terminal directory contains:
    • tuboid.jpg – A JPEG image made of 224 x 224 tiles representing all the shots in a tuboid, ordered left to right, top to bottom. It might be padded with empty tiles
    • metadata.txt – A CSV text file with columns:
      • parrent_image_id – <device>.<UTC_datetime>
      • X – The X coordinate of the object centroid
      • Y – The Y coordinate of the object centroid
      • scale – The scaling factor applied between the original image and the 224 x 224 tile (>1 means the image was enlarged)
    • context.jpg – A representation of the first whole image of a series, with a box around the first tuboid shot (this is for debugging/labelling purposes)
• output/
  • model_final.pth – The model as trained for the publication
• config/
  • config.yaml – The configuration file defining the hyperparameters to train the model as well as the taxonomic labels
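
The layout above could be read along the following lines, using only the Python standard library. This is a sketch under several assumptions that are not confirmed by this description: metadata.txt is comma-separated with a header row, it holds one row per shot, and the tiles in tuboid.jpg are 224 x 224 pixels with no margin between them. The function names are hypothetical.

```python
import csv
import sqlite3

TILE = 224  # tile side length, assumed to be in pixels


def load_annotations(db_path):
    """Map each tuboid_id to its row of the ANNOTATIONS table (as a dict)."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row
    try:
        rows = con.execute("SELECT * FROM ANNOTATIONS").fetchall()
    finally:
        con.close()
    return {row["tuboid_id"]: dict(row) for row in rows}


def read_metadata(path):
    """Read metadata.txt as a list of dicts keyed by column name.

    Assumption: comma-delimited, first line is the header.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def tile_boxes(width, height, n_shots, tile=TILE):
    """Return (left, top, right, bottom) boxes for the first n_shots tiles.

    Tiles are enumerated left to right, then top to bottom; trailing
    padding tiles are skipped by stopping after n_shots.
    """
    cols, rows = width // tile, height // tile
    if cols == 0 or n_shots > cols * rows:
        raise ValueError("image is too small for the requested number of shots")
    boxes = []
    for index in range(n_shots):
        row, col = divmod(index, cols)
        boxes.append((col * tile, row * tile, (col + 1) * tile, (row + 1) * tile))
    return boxes
```

Each box can then be passed to an image library of your choice to crop a single shot. With the one-row-per-shot assumption, `n_shots` would be the number of rows returned by `read_metadata`.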
