There is a newer version of this record available.

Dataset Open Access

Sticky Pi -- Machine Learning Data, Configuration and Models

Quentin Geissmann


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p><strong>Dataset for the Machine Learning section of the Sticky Pi project (https://doc.sticky-pi.com/)</strong></p>\n\n<p>Contains the dataset for the three algorithms described in the publication: Universal Insect Detector, Siamese Insect Matcher and Insect Tuboid Classifier.</p>\n\n<p><strong>Universal Insect Detector:</strong></p>\n\n<p>`universal_insect_detector/` contains training/validation data, configuration files to train the model, and the model as trained and used for publication.</p>\n\n<ul>\n\t<li>`data/` &ndash; A set of svg images that contain the embedded jpg raw image, and a set of non-intersecting polygon around the labelled insects</li>\n\t<li>`output/`\n\t<ul>\n\t\t<li>`model_final.pth` &ndash; the model as trained for the publication</li>\n\t</ul>\n\t</li>\n\t<li>`config/`\n\t<ul>\n\t\t<li>`config.yaml` &ndash; The configuration file defining the hyperparameters to train the model as well as the taxonomic labels</li>\n\t\t<li>`config.yaml `&ndash; The configuration file defining the hyperparameters to train the model</li>\n\t\t<li>`mask_rcnn_R_101_C4_3x.yaml` &ndash; the base configuration file from which config is derived</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>&nbsp;</p>\n\n<p><strong>Siamese Insect Matcher</strong></p>\n\n<p>`siamese_insect_matcher/` contains training/validation data, configuration files to train the model, and the model as trained and used for publication.</p>\n\n<ul>\n\t<li>`data/` &ndash; a set of svg images that contain two embedded jpg raw images vertically stacked corresponding to two frames in a series. Each predicted insect is labelled as a polygon. Insects that are labelled as the same instance, between the two frames, are grouped (i.e. SVG group). 
The filename of each image is `&lt;device&gt;.&lt;datetime_frame_1&gt;.&lt;datetime_frame_2&gt;.svg`</li>\n\t<li>`output/`\n\t<ul>\n\t\t<li>`model_final.pth` &ndash; the model as trained for the publication</li>\n\t</ul>\n\t</li>\n\t<li>`config/`\n\t<ul>\n\t\t<li>`config.yaml` &ndash; The configuration file defining the hyperparameters to train the model as well as the taxonomic labels</li>\n\t\t<li>`config.yaml` &ndash; The configuration file defining the hyperparameters to train the model</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>&nbsp;</p>\n\n<p><strong>Insect Tuboid Classifier:</strong></p>\n\n<p>`insect_tuboid_classifier/` contains images of insect tuboid, a database file describing their taxonomy, a configuration file to train the model, and the model as trained and used for publication.</p>\n\n<ul>\n\t<li>`data/`\n\t<ul>\n\t\t<li>`database.db`: a sqlite file with a single table `ANNOTATIONS`. The table maps a unique identifier of each tuboid (tuboid_id) to a set of manually annotated taxonomic variables.</li>\n\t\t<li>A directory tree of the form: `&lt;series_id&gt;/&lt;tuboid_id&gt;/`. 
Each terminal directory contains:\n\t\t<ul>\n\t\t\t<li>\n\t\t\t<ul>\n\t\t\t\t<li>`tuboid.jpg` &ndash; a jpeg image made of 224 x 224 tiles representing all the shots in a tuboid, left to right, top to bottom &ndash; might be padded with empty images</li>\n\t\t\t\t<li>`metadata.txt` &ndash; a csv text file with columns:\n\t\t\t\t<ul>\n\t\t\t\t\t<li>\n\t\t\t\t\t<ul>\n\t\t\t\t\t\t<li>parrent_image_id &ndash; &lt;device&gt;.&lt;UTC_datetime&gt;</li>\n\t\t\t\t\t\t<li>X &ndash; the X coordinates of the object centroid</li>\n\t\t\t\t\t\t<li>Y &ndash; the Y coordinates of the object centroid</li>\n\t\t\t\t\t</ul>\n\t\t\t\t\t</li>\n\t\t\t\t</ul>\n\t\t\t\t</li>\n\t\t\t\t<li>scale &ndash; The scaling factor applied between the original and image and the 224 x 224 tile (&gt;1 =&gt; image was enlarged)</li>\n\t\t\t\t<li>`context.jpg` &ndash; a representation of the first whole image of a series, with a box around the first tuboid shot (this is for debugging/labelling purposes)</li>\n\t\t\t</ul>\n\t\t\t</li>\n\t\t</ul>\n\t\t</li>\n\t</ul>\n\t</li>\n\t<li>`output/`\n\t<ul>\n\t\t<li>`model_final.pth` &ndash; the model as trained for the publication</li>\n\t</ul>\n\t</li>\n\t<li>config/\n\t<ul>\n\t\t<li>`config.yaml` &ndash; The configuration file defining the hyperparameters to train the model as well as the taxonomic labels</li>\n\t</ul>\n\t</li>\n</ul>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of British Columbia", 
      "@id": "https://orcid.org/0000-0001-6546-4306", 
      "@type": "Person", 
      "name": "Quentin Geissmann"
    }
  ], 
  "url": "https://zenodo.org/record/4680119", 
  "datePublished": "2021-04-12", 
  "keywords": [
    "instect traps", 
    "behavioral ecology"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/2de26f9c-875f-4ebe-86f1-bcd66475c977/insect-tuboid-classifier.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/2de26f9c-875f-4ebe-86f1-bcd66475c977/siamese-insect-matcher.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/2de26f9c-875f-4ebe-86f1-bcd66475c977/universal-insect-detector.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.4680119", 
  "@id": "https://doi.org/10.5281/zenodo.4680119", 
  "@type": "Dataset", 
  "name": "Sticky Pi -- Machine Learning Data, Configuration and Models"
}
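
The layout described in the record can be explored with a few lines of Python. The following is a minimal sketch, not code from the Sticky Pi project: the function names are hypothetical, the filename parser assumes that none of the three fields contains a dot, and the database reader assumes the `ANNOTATIONS` table has a column literally named `tuboid_id`, as the description suggests. The tile helper only computes where the 224 x 224 tiles of a `tuboid.jpg` would sit for a given image size; it does not decode the image.

```python
import sqlite3
from contextlib import closing

TILE_SIZE = 224  # tile edge length stated in the record description


def split_pair_filename(name):
    """Split '<device>.<datetime_frame_1>.<datetime_frame_2>.svg'.

    Assumption: none of the three fields contains a '.' character.
    """
    stem, ext = name.rsplit(".", 1)
    if ext != "svg":
        raise ValueError(f"expected an .svg file, got: {name}")
    device, frame_1, frame_2 = stem.split(".")
    return device, frame_1, frame_2


def load_annotations(db_path):
    """Return {tuboid_id: row_as_dict} from the ANNOTATIONS table.

    Assumption: the identifier column is named 'tuboid_id'.
    """
    with closing(sqlite3.connect(db_path)) as con:
        con.row_factory = sqlite3.Row
        rows = con.execute("SELECT * FROM ANNOTATIONS").fetchall()
    return {row["tuboid_id"]: dict(row) for row in rows}


def tile_boxes(width, height, tile=TILE_SIZE):
    """List (left, top, right, bottom) boxes, left to right, top to bottom.

    Trailing tiles may be padding (empty images), per the description.
    """
    cols, rows = width // tile, height // tile
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]
```

For example, `tile_boxes(448, 224)` yields two boxes side by side. Each box could then be passed to an image library's crop function to recover the individual shots of a tuboid.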
                    All versions    This version
Views               210             128
Downloads           505             276
Data volume         1.1 TB          435.8 GB
Unique views        187             118
Unique downloads    162             38
