Published February 12, 2026 | Version v1
Dataset | Open

Dataset for R workflow for data analysis and comparison studies of phenotypic traits of Faba bean seeds

Description

This record provides the data used with the R workflows and PDF summaries that compare faba bean seed measurements across different methods, including ground truth (GT-MM, GT-DM) and automated pipelines (CP, SVD, min, colorbox). Two sets of analyses are included: one for Length, Width, and Area, and another for Perimeter, Aspect-Ratio, and Circularity. Both workflows generate boxplots, scatterplots, and Bland-Altman plots with regression metrics and confidence intervals to assess measurement accuracy and agreement.
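As a rough illustration of the agreement statistics behind a Bland-Altman plot, the sketch below computes the bias, limits of agreement, and a simple regression for one pair of methods. It is not taken from the repository: the vectors `gt_mm` and `cp`, their values, and the helper `bland_altman()` are hypothetical, and the 1.96 multiplier (conventional 95% limits of agreement) is an assumption, since the description does not state which interval is used.

```r
# Hypothetical paired measurements (mm) of the same seeds; values are illustrative only.
gt_mm <- c(10.2, 11.5, 9.8, 12.1, 10.9)  # stand-in for a ground-truth column, e.g. Length-GT-MM
cp    <- c(10.0, 11.9, 9.5, 12.4, 11.0)  # stand-in for an automated column, e.g. Length-CP

# Hypothetical helper: Bland-Altman summary for two paired numeric vectors.
bland_altman <- function(reference, method) {
  ok    <- complete.cases(reference, method)  # keep pairs where both values are present
  diffs <- method[ok] - reference[ok]         # per-seed difference (method minus reference)
  means <- (method[ok] + reference[ok]) / 2   # per-seed mean of the two methods
  bias  <- mean(diffs)
  sd_d  <- sd(diffs)
  list(
    n     = sum(ok),
    means = means,
    diffs = diffs,
    bias  = bias,
    lower = bias - 1.96 * sd_d,  # assumed 95% limits of agreement
    upper = bias + 1.96 * sd_d
  )
}

ba <- bland_altman(gt_mm, cp)

# Simple linear regression of the automated method on the ground truth,
# giving an R-squared value and a 95% confidence interval for the slope.
fit       <- lm(cp ~ gt_mm)
r_squared <- summary(fit)$r.squared
slope_ci  <- confint(fit)["gt_mm", ]
```

Plotting `ba$means` against `ba$diffs`, with horizontal lines at `ba$bias`, `ba$lower`, and `ba$upper`, would give the usual Bland-Altman view for that pair of methods.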

Data Cleaning and Filtering: The workflow begins by handling missing and invalid entries: zeros and blank fields are systematically converted to NA across all measurement columns. To prepare the data for accurate comparison, the workflow employs two key strategies:

Group Sorting: Measurements within each seed group are sorted independently by column to ensure that data points corresponding to the same physical seed are correctly aligned across all measurement methods. 

Localized Group Removal (Filtering): This step maximizes the sample size N. Unlike row-wise deletion, which discards an entire data row if any single measurement is NA, this method works column by column: an entire seed group is removed only if data are missing in the two specific columns currently being compared (e.g., Length-CP vs. Length-GT-MM), so missing values in other columns do not reduce N for that comparison.
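The cleaning, sorting, and filtering steps above can be sketched in base R as follows. This is a minimal illustration, not the repository's code: the data frame `raw`, its column names, and the helpers `to_na()`, `sort_within_groups()`, and `filter_groups_for_pair()` are hypothetical. It also assumes that a group is dropped when either of the two compared columns has a missing value anywhere in that group, which is one possible reading of the description.

```r
# Hypothetical raw table: one row per seed, read as text so blanks are visible.
raw <- data.frame(
  group        = c("A", "A", "A", "B", "B", "C", "C"),
  Length_GT_MM = c("10.2", "9.8", "11.0", "0", "12.1", "8.9", "9.4"),
  Length_CP    = c("9.9", "11.2", "10.1", "12.0", "", "", "0"),
  Width_CP     = c("7.1", "6.8", "7.4", "8.0", "8.2", "6.5", "6.9"),
  stringsAsFactors = FALSE
)
measure_cols <- setdiff(names(raw), "group")

# Step 1: convert blanks and zeros to NA in every measurement column.
to_na <- function(x) {
  x <- suppressWarnings(as.numeric(trimws(as.character(x))))  # blank or non-numeric -> NA
  x[!is.na(x) & x == 0] <- NA                                 # zero -> NA
  x
}
clean <- raw
clean[measure_cols] <- lapply(clean[measure_cols], to_na)

# Step 2: sort each measurement column independently within each seed group.
# NAs are kept (placed last) so every group keeps its original number of rows.
sort_within_groups <- function(df, group_col, cols) {
  for (col in cols) {
    df[[col]] <- ave(df[[col]], df[[group_col]],
                     FUN = function(v) sort(v, na.last = TRUE))
  }
  df
}
sorted <- sort_within_groups(clean, "group", measure_cols)

# Step 3: localized group removal for one comparison.
# Assumption: a group is dropped if either compared column has an NA in that group.
filter_groups_for_pair <- function(df, group_col, col_x, col_y) {
  incomplete <- is.na(df[[col_x]]) | is.na(df[[col_y]])
  bad_groups <- unique(df[[group_col]][incomplete])
  df[!(df[[group_col]] %in% bad_groups), c(group_col, col_x, col_y)]
}

pair_len   <- filter_groups_for_pair(sorted, "group", "Length_GT_MM", "Length_CP")
pair_mixed <- filter_groups_for_pair(sorted, "group", "Length_GT_MM", "Width_CP")

# For contrast: row-wise deletion across all measurement columns.
rowwise <- sorted[complete.cases(sorted[measure_cols]), ]
```

In this toy table, group C has no usable Length_CP values, so it drops out of the Length_GT_MM vs. Length_CP comparison but is retained for Length_GT_MM vs. Width_CP, whereas row-wise deletion would discard its rows for every comparison. Note that aligning seeds by sorting each column independently relies on the rank order of seeds within a group being the same across methods.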

 

Files

Files (9.4 MB)

md5:2a351000b2ea7f32c3a46a683dff2eff (9.4 MB)

Additional details

Funding

Western Grains Research Foundation
VarD2329

Dates

Available
2026-02-12

Software

Repository URL
https://github.com/AAFC-Bioinfo-AAC/faba-bean-image-classification
Programming language
R
Development Status
Active