Dataset for R workflow for data analysis and comparison studies of phenotypic traits of Faba bean seeds
Authors/Creators
Description
The data have been used in R workflows, with accompanying PDF summaries, to compare faba bean seed measurements across different methods, including ground truth (GT-MM, GT-DM) and automated pipelines (CP, SVD, min, colorbox). Two sets of analyses are included: one for Length, Width, and Area, and another for Perimeter, Aspect-Ratio, and Circularity. Both workflows generate boxplots, scatterplots, and Bland-Altman plots with regression metrics and confidence intervals to assess measurement accuracy and agreement.
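As a rough illustration of the agreement statistics behind such plots, the base R sketch below computes the Bland-Altman bias and 95% limits of agreement, plus a few regression metrics, for one pair of methods. The vectors `gt_mm` and `cp`, the helper names `bland_altman` and `plot_bland_altman`, and all values are hypothetical and are not taken from the dataset or from the repository code.

```r
# Hypothetical paired lengths (mm) for the same five seeds; NOT real data.
gt_mm <- c(10.2, 11.5, 9.8, 12.1, 10.9)  # assumed ground truth (GT-MM)
cp    <- c(10.0, 11.9, 9.5, 12.4, 11.0)  # assumed automated pipeline (CP)

# Bland-Altman statistics: bias (mean difference) and 95% limits of agreement.
bland_altman <- function(x, y) {
  ok <- complete.cases(x, y)          # keep only pairs with both values present
  x <- x[ok]
  y <- y[ok]
  d <- y - x                          # per-seed difference (method minus reference)
  bias <- mean(d)
  s <- sd(d)
  list(
    n         = length(d),
    mean_pair = (x + y) / 2,          # x axis of the Bland-Altman plot
    diff      = d,                    # y axis of the Bland-Altman plot
    bias      = bias,
    loa_lower = bias - 1.96 * s,
    loa_upper = bias + 1.96 * s
  )
}

# Optional plot helper: points plus bias and limit-of-agreement lines.
plot_bland_altman <- function(ba) {
  plot(ba$mean_pair, ba$diff,
       xlab = "Mean of the two methods", ylab = "Difference (method - reference)")
  abline(h = c(ba$loa_lower, ba$bias, ba$loa_upper), lty = c(2, 1, 2))
}

ba <- bland_altman(gt_mm, cp)

# Regression metrics for the matching scatterplot.
fit      <- lm(cp ~ gt_mm)
r2       <- summary(fit)$r.squared            # coefficient of determination
rmse     <- sqrt(mean((cp - gt_mm)^2))        # error relative to the reference
slope_ci <- confint(fit)["gt_mm", ]           # 95% confidence interval of the slope
```

Calling `plot_bland_altman(ba)` draws the plot. The 1.96 multiplier assumes approximately normal differences, which is the usual convention and may differ from what the published workflows use.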
Data Cleaning and Filtering: The workflow begins by handling missing and invalid entries: zeros and blank fields are systematically converted to NA across all measurement columns. To prepare the data for accurate comparison, the workflow employs two key strategies:
Group Sorting: Measurements within each seed group are sorted independently by column to ensure that data points corresponding to the same physical seed are correctly aligned across all measurement methods.
Localized Group Removal (Filtering): This filter is used to maximize the sample size N. Unlike row-wise deletion, which discards an entire data row if any single measurement is NA, this method works on one pair of columns at a time. An entire seed group is removed only if data is missing in the two specific columns currently undergoing a comparison (e.g., Length-CP vs. Length-GT-MM).
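The cleaning, sorting, and filtering steps described above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the repository's implementation: the data frame `seeds`, its column names, and the helpers `to_na`, `sort_within_group`, and `compare_pair` are hypothetical. It also assumes that a group is dropped when either of the two compared columns has a missing value in that group, which is one possible reading of the description.

```r
# Hypothetical toy data: three seed groups, three seeds each; NOT real data.
seeds <- data.frame(
  group       = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Length_GTMM = c("10.2", "9.8", "11.0", "12.1", "", "11.7", "8.9", "9.4", "9.1"),
  Length_CP   = c(9.9, 11.2, 10.1, 12.0, 11.5, 11.9, 9.0, 9.3, 8.8),
  Width_CP    = c(7.1, 0, 6.8, 8.0, 8.2, 7.9, 6.5, 6.6, 6.4),
  stringsAsFactors = FALSE
)
measure_cols <- setdiff(names(seeds), "group")

# Step 1: convert blank fields and zeros to NA in every measurement column.
to_na <- function(x) {
  x <- suppressWarnings(as.numeric(trimws(as.character(x))))  # "" becomes NA
  x[!is.na(x) & x == 0] <- NA                                 # 0 becomes NA
  x
}
seeds[measure_cols] <- lapply(seeds[measure_cols], to_na)

# Step 2: sort each column independently within each seed group (NAs last).
sort_within_group <- function(x, g) {
  ave(x, g, FUN = function(v) sort(v, na.last = TRUE))
}
seeds[measure_cols] <- lapply(seeds[measure_cols], sort_within_group,
                              g = seeds$group)

# Step 3: localized group removal for one pair of columns.
# Assumption: a group is dropped if EITHER compared column has an NA in it.
compare_pair <- function(df, col_x, col_y, group_col = "group") {
  missing    <- is.na(df[[col_x]]) | is.na(df[[col_y]])
  bad_groups <- unique(df[[group_col]][missing])
  df[!(df[[group_col]] %in% bad_groups), c(group_col, col_x, col_y)]
}

len_pair <- compare_pair(seeds, "Length_GTMM", "Length_CP")  # drops group B only
wid_pair <- compare_pair(seeds, "Length_CP", "Width_CP")     # drops group A only

# For contrast: dropping every group with an NA in ANY measurement column.
any_na_groups <- unique(seeds$group[!complete.cases(seeds[measure_cols])])
global_filter <- seeds[!(seeds$group %in% any_na_groups), ]  # keeps group C only
```

In this toy example each pairwise comparison keeps two of the three groups, whereas filtering on all measurement columns at once keeps only one, which is the sample-size benefit the localized filter is meant to provide.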
Files
(9.4 MB)
| Name | Size |
|---|---|
| md5:2a351000b2ea7f32c3a46a683dff2eff | 9.4 MB |
Additional details
Funding
- Western Grains Research Foundation
- VarD2329
Dates
- Available
- 2026-02-12
Software
- Repository URL
- https://github.com/AAFC-Bioinfo-AAC/faba-bean-image-classification
- Programming language
- R
- Development Status
- Active