Weed Growth Stage Dataset
Description
This dataset is a comprehensive collection of temporally annotated weed imagery designed for computer vision applications in precision agriculture. Developed at the SIU Horticulture Research Center in 2024, the dataset captures the complete growth progression of 16 agriculturally significant weed species over an 11-week developmental cycle. The associated paper has been published in Scientific Reports, and the code is available on the project's GitHub page.
Dataset Composition
The dataset covers 16 weed species from 8 botanical families, identified by their EPPO codes: AMAPA (Palmer amaranth), AMARE (redroot pigweed), AMATU (waterhemp), DIGSA (large crabgrass), ECHCG (barnyardgrass), CHEAL (common lambsquarters), ABUTH (velvetleaf), AMBEL (common ragweed), CYPES (yellow nutsedge), ERICA (horseweed), PANDI (fall panicum), SETFA (giant foxtail), SETPU (yellow foxtail), SIDSP (prickly sida), SORHA (johnsongrass), and SORVU (shattercane). Each species was monitored from BBCH stage 11 ("first true leaf unfolded") through stage 60 ("first flower open"). The dataset contains 203,567 high-quality images across 174 classes, split 80/10/10 into training (184,719 images), validation (23,090 images), and testing (23,090 images) sets.
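The description gives the split ratios but not the procedure used to produce them. Below is a minimal sketch of one way to build an 80/10/10 split, assuming each image carries a single species-week class label and that the split is stratified per class with a fixed random seed. The function name `split_80_10_10` and the seed are hypothetical and not taken from the project's code.

```python
import random
from collections import defaultdict


def split_80_10_10(items, seed=0):
    """Stratified 80/10/10 split of (image_path, class_label) pairs.

    Hypothetical helper: per-class stratification and the seed are assumptions,
    not the dataset authors' documented procedure.
    """
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n = len(paths)
        n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
        n_val = n // 10
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        # The remainder (about 10%, plus any rounding leftovers) goes to test.
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits
```

Stratifying per class keeps every species-week label represented in all three subsets in roughly the same proportion, which matters when some classes are much smaller than others.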
Data Collection and Processing
High-resolution 4K video clips (3840×2160 pixels) were captured with an iPhone 15 Pro Max positioned 1.5 feet (about 46 cm) above the plants, recording 360-degree views during weekly imaging sessions. Plants were grown under controlled greenhouse conditions in standardized pots filled with Pro-Mix BX potting soil. Preprocessing relied on traditional computer vision techniques: conversion from RGB to HSV color space for better discrimination of green hues, green-area detection by calibrated thresholding (hue: 25/360 to 160/360, minimum saturation: 0.20, minimum value: 0.20), and morphological operations with disk-shaped structuring elements together with connected component analysis to refine the detected regions.
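The thresholding and cleanup steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the project's code: hue, saturation, and value are normalized to [0, 1] (matching the 25/360 to 160/360 notation), the thresholds are treated as inclusive, the morphological step is assumed to be a binary opening, components use 8-connectivity, and `radius` and `min_area` are hypothetical parameters. Morphology and labeling are written in plain NumPy to keep the example self-contained; libraries such as `scikit-image` and `scipy.ndimage` provide equivalent functions.

```python
from collections import deque

import numpy as np

# Thresholds quoted in the dataset description; HSV channels normalized to [0, 1].
HUE_MIN, HUE_MAX = 25 / 360, 160 / 360
SAT_MIN, VAL_MIN = 0.20, 0.20


def rgb_to_hsv(rgb):
    """Convert an (..., 3) RGB array in [0, 1] to HSV in [0, 1] (colorsys convention)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc, minc = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = maxc - minc
    s = np.divide(delta, maxc, out=np.zeros_like(maxc), where=maxc > 0)
    d = np.where(delta > 0, delta, 1.0)  # avoid division by zero for gray pixels
    # Hue sector depends on which channel is the maximum (red checked first).
    h = np.select(
        [maxc == r, maxc == g],
        [((g - b) / d) % 6.0, (b - r) / d + 2.0],
        default=(r - g) / d + 4.0,
    )
    h = np.where(delta > 0, h / 6.0, 0.0)
    return np.stack([h, s, maxc], axis=-1)


def green_mask(rgb):
    """Boolean mask of pixels inside the green HSV window (inclusive bounds assumed)."""
    hsv = rgb_to_hsv(rgb)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h >= HUE_MIN) & (h <= HUE_MAX) & (s >= SAT_MIN) & (v >= VAL_MIN)


def disk(radius):
    """Disk-shaped structuring element: offsets with x^2 + y^2 <= radius^2."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def _shifted(mask, se):
    """Yield copies of `mask` shifted by every offset in the structuring element."""
    r = se.shape[0] // 2
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    for dy, dx in zip(*np.nonzero(se)):
        yield padded[dy:dy + h, dx:dx + w]


def binary_opening(mask, se):
    """Erosion then dilation; a symmetric disk needs no reflection for dilation."""
    eroded = np.all(np.stack(list(_shifted(mask, se))), axis=0)
    return np.any(np.stack(list(_shifted(eroded, se))), axis=0)


def label_components(mask):
    """8-connected component labeling by breadth-first search; 0 = background."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    n = 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue
        n += 1
        labels[i, j] = n
        queue = deque([(i, j)])
        while queue:
            y, x = queue.popleft()
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = n
                        queue.append((ny, nx))
    return labels, n


def segment_plant(rgb, radius=2, min_area=50):
    """Green threshold -> disk opening -> drop components smaller than `min_area`."""
    mask = binary_opening(green_mask(rgb), disk(radius))
    labels, n = label_components(mask)
    areas = np.bincount(labels.ravel(), minlength=n + 1)
    keep = areas >= min_area
    keep[0] = False  # never keep the background label
    return keep[labels]
```

For a real video frame stored as 8-bit RGB, the input would first be scaled to [0, 1], for example `frame.astype(float) / 255`. The opening removes isolated green specks smaller than the disk, and the area filter then discards any small fragments that survive.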
Annotation and Quality Control
Each image is annotated in Pascal VOC XML format with a label that combines the species code and the week of observation (e.g., "AMAPA_week_5"). This week-wise labeling maps each observation time directly to a phenological development stage. Labels were first generated automatically and then manually verified in LabelImg to ensure accuracy.
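A minimal sketch of reading these annotations with Python's standard library is shown below. It assumes the combined label is stored in each `<object>`'s `<name>` tag with a standard `<bndbox>` (the usual Pascal VOC layout that LabelImg writes) and that species codes are five uppercase letters like those listed above. `WeedBox`, `parse_label`, and `parse_voc` are hypothetical names, not part of the project's code.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Species code (5 uppercase letters, as in the species list) + "_week_" + week number.
LABEL_RE = re.compile(r"([A-Z]{5})_week_(\d+)")


@dataclass(frozen=True)
class WeedBox:
    species: str
    week: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_label(label):
    """Split a label such as 'AMAPA_week_5' into ('AMAPA', 5)."""
    m = LABEL_RE.fullmatch(label.strip())
    if m is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return m.group(1), int(m.group(2))


def parse_voc(xml_text):
    """Return one WeedBox per <object> in a Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        species, week = parse_label(obj.findtext("name", default=""))
        bndbox = obj.find("bndbox")
        # VOC coordinates are sometimes written as floats; truncate them to ints.
        coords = [int(float(bndbox.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(WeedBox(species, week, *coords))
    return boxes
```

For example, `parse_label("SETFA_week_3")` returns `("SETFA", 3)`, so the same label can drive either a species classifier or a growth-stage (week) classifier.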
Applications and Significance
This dataset addresses gaps in agricultural computer vision by providing temporally structured weed imagery for growth stage classification, species identification, and developmental tracking. Its broad species coverage and weekly temporal resolution make it well suited for training robust machine learning models for precision agriculture applications.
Technical Specifications
- Total Images: 203,567 with comprehensive annotations
- Species Coverage: 16 weed species with standardized codes
- Temporal Resolution: Weekly progression over 11 weeks
- Image Quality: Captured from 4K (3840×2160) video with 360-degree views
- Annotation Format: Pascal VOC XML with species codes and temporal labels
- Preprocessing: Traditional computer vision with HSV conversion and morphological operations
This dataset represents a valuable resource for advancing precision agriculture technologies and automated weed management systems.
Citation
If you use this dataset in your research, please cite the original paper:
@article{islam2025weedswin,
  title={WeedSwin hierarchical vision transformer with SAM-2 for multi-stage weed detection and classification},
  author={Islam, Taminul and Sarker, Toqi Tahamid and Ahmed, Khaled R and Rankrape, Cristiana Bernardi and Gage, Karla},
  journal={Scientific Reports},
  volume={15},
  number={1},
  pages={23274},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
or
Islam, T., Sarker, T.T., Ahmed, K.R. et al. WeedSwin hierarchical vision transformer with SAM-2 for multi-stage weed detection and classification. Sci Rep 15, 23274 (2025). https://doi.org/10.1038/s41598-025-05092-z
Additional details
Related works
- Is supplement to: Journal article 10.1038/s41598-025-05092-z (DOI)
Dates
- Accepted: 2025-05-30 (Scientific Reports)
- Available: 2025-07-02 (Scientific Reports)
Software
- Repository URL: https://github.com/taminulislam/weedswin
- Programming language: Python
- Development Status: Active