Published September 14, 2023 | Version 1.0.0
Dataset Open

Sainfoin Fruit Processing - Object Detection Dataset

Description

This dataset consists of 500 images of sainfoin (Onobrychis viciifolia) seed pods, seeds, and split seeds. The images were taken as part of an experiment to determine the minimum sample size of seed pods needed to accurately estimate the heritability of the pod threshing trait within sainfoin breeding lines.

The experiment was a complete factorial design with the following factors:

  • Sainfoin named varieties: AAC Mountainview, Delaney, Eski, Rocky Mountain Remont, and Shoshone
  • Sample Size: 1, 2, 3, 4, and 5 grams of dried seed pods
  • Threshing type: belt thresher (processed 3X), Haldrup impact thresher (35 s at speed 9)

This gives 5 varieties X 5 sample sizes X 2 threshing types = 50 factorial combinations.

Each combination comprised 10 replicates, where each replicate was a unique, random sample of seed of the same mass (for example, 10 random 2 g samples of Eski seed processed by the belt thresher, or 10 random 5 g samples of Delaney seed processed by the Haldrup thresher). This gives a total of 500 experimental units in the sample set.
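The arithmetic of the design can be checked by enumerating it. The following is a minimal sketch; the thresher labels and variable names are shorthand chosen for illustration and do not come from the dataset files.

```python
from itertools import product

# Factor levels as listed in the description.
varieties = ["AAC Mountainview", "Delaney", "Eski", "Rocky Mountain Remont", "Shoshone"]
sample_sizes_g = [1, 2, 3, 4, 5]
threshers = ["belt", "haldrup"]  # shorthand labels (assumption, not dataset keys)
REPLICATES = 10

# Every variety x sample size x thresher combination.
combinations = list(product(varieties, sample_sizes_g, threshers))

# One experimental unit (and one image) per replicate of each combination.
units = [
    (variety, grams, thresher, rep)
    for (variety, grams, thresher) in combinations
    for rep in range(1, REPLICATES + 1)
]

print(len(combinations))  # 50
print(len(units))         # 500
```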

Once the seeds were sampled, weighed, and processed through the threshing equipment, they were weighed again and imaged.

The threshed seeds were scattered onto an imaging platform with a blue background, lit by 2 LED panels, and photographed with a Sony ILCE-7RM2 at the following settings:

  • ISO: 100
  • Exposure: 1/40s
  • Focal Length: 55mm
  • Format: TIFF
  • Size: 7968x5320

The raw images were converted from TIFF to JPEG format and annotated in image labeling software. The seed objects were annotated with bounding boxes, each assigned to one of the following classes:

  1. pod: an enclosed seed pod
  2. seed: a seed which was successfully threshed from the legume pod carpel
  3. split: a seed threshed from the pod, but which split in two halves during the threshing process

All image annotations were exported into the convenient COCO format.
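As a rough illustration of working with COCO-style annotations, the sketch below counts bounding boxes per class. It relies only on the standard COCO keys (`categories`, `annotations`, `category_id`, `bbox`); the helper names and the tiny in-memory example are hypothetical, and the name of the annotation file inside the archive is not stated here, so any path passed to `load_coco` is an assumption.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO annotation file from disk (path is supplied by the user)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_objects_per_class(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


# Hand-made example for illustration only; these are not records from the dataset.
example = {
    "categories": [
        {"id": 1, "name": "pod"},
        {"id": 2, "name": "seed"},
        {"id": 3, "name": "split"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [50, 60, 30, 40]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [90, 15, 12, 14]},
    ],
}

print(count_objects_per_class(example))
```

COCO bounding boxes are stored as `[x, y, width, height]` in pixels, with the origin at the top-left corner of the image.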

No further image processing was performed.

The image set was split 80/20 into training and validation sets using `scikit-learn` in Python 3.11, stratifying the sets equally over the experimental factor levels.
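The exact scikit-learn call used is not given. To show what a stratified 80/20 split over the factor combinations implies, here is a standard-library sketch that stands in for it; the function name, the integer image ids, and the stratum ids are hypothetical. With 50 strata of 10 images each, every stratum contributes 8 training and 2 validation images.

```python
import random
from collections import defaultdict


def stratified_split(items, strata, val_fraction=0.2, seed=0):
    """Split items into train/val so each stratum keeps the same val fraction."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item, stratum in zip(items, strata):
        groups[stratum].append(item)

    train, val = [], []
    for stratum in sorted(groups):
        members = groups[stratum][:]
        rng.shuffle(members)
        n_val = round(len(members) * val_fraction)
        val.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, val


# Hypothetical ids: 500 images, 10 consecutive images per factor combination.
image_ids = list(range(500))
stratum_ids = [i // 10 for i in image_ids]

train_ids, val_ids = stratified_split(image_ids, stratum_ids)
print(len(train_ids), len(val_ids))  # 400 100
```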

The zip file 'train_val_images.zip' contains a 'train' folder with 400 training images, a 'val' folder with 100 validation images, an image named 'color_test.jpg' taken with a color correction card, and a JSON file with all the annotations.

Another file called 'seed_weights.csv' contains the image_name to global-key mapping in tabular format as well as the before and after threshing seed weights for each experimental sample.

Labeling Metrics:

  • Pod (48.58%)
    • 36,599 objects
  • Seed (33.83%)
    • 25,488 objects
  • Split (17.59%)
    • 13,255 objects
  • TOTAL (100%)
    • 75,342 objects
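The class shares above follow directly from the object counts, as this short check shows (variable names are illustrative):

```python
# Object counts per class, as listed in the labeling metrics.
counts = {"pod": 36599, "seed": 25488, "split": 13255}

total = sum(counts.values())
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(total)   # 75342
print(shares)  # {'pod': 48.58, 'seed': 33.83, 'split': 17.59}
```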

Files (897.5 MB)

  • 44.6 kB (md5:1bdf522c1532c2b9252daae3dc9f1291)
  • 897.4 MB (md5:0f9dc94110dff02d5c3734a1c44b2135)

Additional details

Dates

Updated
2023-10-16
Updated to v1.0.0