Actinidia chinensis Phenology Balanced Dataset (Portugal, 2025)
Description
The Actinidia chinensis Phenology Balanced Dataset is a stratified and class-balanced subset derived from the Multi-Modal Actinidia chinensis Phenology Dataset, prepared explicitly for training hierarchical detection models that target female flower phenological staging.
The source material comprises the Labelled Images component from the original dataset, which contains smartphone-acquired imagery of kiwifruit reproductive structures annotated in Pascal VOC format according to a 17-class hierarchical taxonomy organised across three levels: structure (bud, flower, fruit), gender (female, male), and BBCH-adapted phenological stage.
The balanced dataset was generated through a two-stage processing pipeline:
- Split stage: The source imagery was partitioned into test (99 images) and train+validation (1 556 images) subsets using stratified sampling to ensure proportional class representation in the test set across all hierarchical classification levels.
- Balancing stage: Images with a disproportionately high annotation density for over-represented classes were removed from the train+validation subset so that the remaining classes could be balanced (1 311 of 1 556 images retained). Augmentation was then applied iteratively until the target class distribution was reached, adding 649 augmented images to the 1 311 originals (1 960 images in total). Each augmentation was a composite of one geometric transformation (flip, scale_rotate, or downscale) and one appearance transformation (bright_contrast, grid_distortion, grid_dropout, unsharp, or motion_blur). All augmentations were implemented with the Albumentations library, which transforms bounding box coordinates together with the image.
| Operation | Description |
| --- | --- |
| flip | Horizontal and vertical reflection. |
| scale_rotate | Affine transformation with shift (±6.25%), scale (±10%), and rotation (±15°). |
| bright_contrast | Brightness and contrast modulation (±40%). |
| downscale | Resolution reduction (50%) with interpolation. |
| grid_distortion | Elastic grid-based spatial distortion. |
| grid_dropout | Grid-based region dropout (20% ratio). |
| unsharp | Unsharp masking for edge enhancement. |
| motion_blur | Directional motion blur simulation. |
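The pairing rule for composite operations, and the kind of bounding box update that has to accompany a geometric transformation, can be sketched in plain Python. This is an illustration only and does not call Albumentations: the operation names are taken from the table above, while `hflip_box` is a hypothetical helper that assumes boxes are given as Pascal VOC `(xmin, ymin, xmax, ymax)` pixel coordinates.

```python
from itertools import product

# Operation names as listed in the table above, grouped as in the text.
GEOMETRIC = ["flip", "scale_rotate", "downscale"]
APPEARANCE = ["bright_contrast", "grid_distortion", "grid_dropout",
              "unsharp", "motion_blur"]

# Every composite pairs exactly one geometric with one appearance operation.
COMPOSITES = list(product(GEOMETRIC, APPEARANCE))


def hflip_box(box, image_width):
    """Mirror a (xmin, ymin, xmax, ymax) box about the vertical image axis.

    Hypothetical helper for illustration; in the dataset pipeline this kind
    of update is delegated to the augmentation library.
    """
    xmin, ymin, xmax, ymax = box
    return (image_width - xmax, ymin, image_width - xmin, ymax)


print(len(COMPOSITES))                                   # 15
print(hflip_box((100, 50, 300, 200), image_width=1000))  # (700, 50, 900, 200)
```

Three geometric and five appearance operations give 15 possible composites. How often each one was drawn during balancing is not stated in this description.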
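Phenological stage labels for bud and male flower classes were consolidated to their parent categories, and the fruit class is not carried over as a target class (marked "-" in the mapping). This reduces the taxonomy from 17 to 9 object classes (17 minus 4 bud stages, 3 male flower stages, and fruit) while preserving full phenological granularity for the female flower pathway.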
| Original label | New label |
| --- | --- |
| bud_53 | bud |
| bud_55 | bud |
| bud_56 | bud |
| bud_57 | bud |
| bud | bud |
| flower_female_60 | flower_female_60 |
| flower_female_61 | flower_female_61 |
| flower_female_67 | flower_female_67 |
| flower_female_68 | flower_female_68 |
| flower_female_69 | flower_female_69 |
| flower_female | flower_female |
| flower_male_60 | flower_male |
| flower_male_61 | flower_male |
| flower_male_67 | flower_male |
| flower_male | flower_male |
| flower | flower |
| fruit | - |
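The mapping above follows a simple rule, which the sketch below expresses as a function. It is a minimal illustration, not the authors' code: `remap_label` is a hypothetical name, and treating the "-" entry for fruit as "no target label" (returned as `None`) is an assumption.

```python
import re

# Parent classes whose numeric stage suffix is collapsed.
COLLAPSED_PARENTS = ("bud", "flower_male")
# Assumption: "-" in the mapping table means the class has no target label.
DROPPED = {"fruit"}

ORIGINAL_LABELS = [
    "bud_53", "bud_55", "bud_56", "bud_57", "bud",
    "flower_female_60", "flower_female_61", "flower_female_67",
    "flower_female_68", "flower_female_69", "flower_female",
    "flower_male_60", "flower_male_61", "flower_male_67", "flower_male",
    "flower", "fruit",
]


def remap_label(label):
    """Map a 17-class label to its 9-class label, or None if it has no target."""
    if label in DROPPED:
        return None
    for parent in COLLAPSED_PARENTS:
        # Matches the parent itself or the parent plus a numeric stage suffix.
        if re.fullmatch(rf"{parent}(_\d+)?", label):
            return parent
    return label


new_labels = {remap_label(label) for label in ORIGINAL_LABELS} - {None}
print(len(ORIGINAL_LABELS), len(new_labels))  # 17 9
```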
The processed dataset is organised into two directories (test and train+validation), each containing an Images folder with JPEG files and an Annotations folder with corresponding Pascal VOC XML files.
| Set | Images | Annotations | Classes |
| --- | --- | --- | --- |
| test | 99 | 942 | 9 |
| train + validation | 1 960 | 16 694 | 9 |
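Reading an annotation file needs only the standard library. The sketch below parses the usual Pascal VOC layout (`object` elements with a `name` and a `bndbox`); the `SAMPLE` document, its file name, and its coordinates are made up for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, shaped like a standard Pascal VOC file.
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>flower_female_61</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>360</xmax><ymax>310</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_objects(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) from VOC XML text."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # float() first, so coordinates written as "120.0" are accepted too.
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), coords))
    return objects


print(read_voc_objects(SAMPLE))
# [('flower_female_61', (120, 80, 360, 310))]
```

For files on disk, the same function can be fed the text of each XML file in an Annotations folder.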
The hierarchical detection pathway comprises three sequential classification tasks. The following tables present the class distribution for each level, demonstrating the balance achieved through the processing pipeline. Image counts indicate images containing at least one annotation of that class; individual images may contain multiple classes.
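One way to derive the per-level targets from a 9-class label is sketched below. `level_labels` is a hypothetical helper, and returning `None` for levels in which an object does not take part is an assumption about how the tasks are framed, based on the class lists of the three level tables.

```python
def level_labels(label):
    """Split a 9-class label into its (level 1, level 2, level 3) targets.

    None means the object is not a target at that level.
    """
    parts = label.split("_")
    level1 = parts[0]  # "bud" or "flower"
    # Gender is only classified for flowers; a plain "flower" stays generic.
    level2 = "_".join(parts[:2]) if level1 == "flower" else None
    # Stage is only classified for female flowers; a plain "flower_female"
    # is the female class without a recorded stage.
    level3 = label if level2 == "flower_female" else None
    return level1, level2, level3


for name in ("bud", "flower", "flower_male", "flower_female", "flower_female_61"):
    print(name, level_labels(name))
# bud ('bud', None, None)
# flower ('flower', 'flower', None)
# flower_male ('flower', 'flower_male', None)
# flower_female ('flower', 'flower_female', 'flower_female')
# flower_female_61 ('flower', 'flower_female', 'flower_female_61')
```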
The first level performs structure detection, distinguishing between bud and flower structures:
| Class | test | train+validation |
| --- | --- | --- |
| bud | 67 images with 473 annotations | 1 256 images with 7 771 annotations |
| flower | 174 images with 460 annotations | 2 892 images with 8 827 annotations |
| background | 11 images | 98 images |
The second level performs gender classification on detected flowers. Images containing only bud, fruit, or no annotations serve as background for this classification task:
| Class | test | train+validation |
| --- | --- | --- |
| flower_female | 128 images with 228 annotations | 1 694 images with 3 497 annotations |
| flower_male | 27 images with 199 annotations | 704 images with 4 295 annotations |
| flower | 19 images with 33 annotations | 494 images with 1 035 annotations |
| background | 83 images | 1 373 images |
The third level performs phenological stage classification on detected female flowers. Images containing only male flowers, generic flowers, buds, fruit, or no annotations serve as background for this classification task:
| Class | test | train+validation |
| --- | --- | --- |
| flower_female | 23 images with 32 annotations | 332 images with 537 annotations |
| flower_female_60 | 23 images with 38 annotations | 318 images with 485 annotations |
| flower_female_61 | 29 images with 45 annotations | 360 images with 683 annotations |
| flower_female_67 | 19 images with 35 annotations | 297 images with 797 annotations |
| flower_female_68 | 14 images with 26 annotations | 213 images with 489 annotations |
| flower_female_69 | 20 images with 52 annotations | 174 images with 506 annotations |
| background | 130 images | 1 559 images |
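The three tables can be cross-checked against one another. In the figures above, each parent row (flower at level 1, flower_female at level 2) equals the sum of its child rows at the next level, for both image and annotation counts. This suggests that parent image counts are accumulated per child class, which would explain why they can exceed the number of images in a set; that reading is an inference from the numbers, not something stated by the authors. The sketch below reproduces the arithmetic with values copied from the tables.

```python
# (images, annotations) per class and subset, copied from the tables above.
LEVEL1_FLOWER = {"test": (174, 460), "trainval": (2892, 8827)}

LEVEL2 = {
    "flower_female": {"test": (128, 228), "trainval": (1694, 3497)},
    "flower_male": {"test": (27, 199), "trainval": (704, 4295)},
    "flower": {"test": (19, 33), "trainval": (494, 1035)},
}

LEVEL3 = {
    "flower_female": {"test": (23, 32), "trainval": (332, 537)},
    "flower_female_60": {"test": (23, 38), "trainval": (318, 485)},
    "flower_female_61": {"test": (29, 45), "trainval": (360, 683)},
    "flower_female_67": {"test": (19, 35), "trainval": (297, 797)},
    "flower_female_68": {"test": (14, 26), "trainval": (213, 489)},
    "flower_female_69": {"test": (20, 52), "trainval": (174, 506)},
}


def column_sum(table, subset):
    """Sum the (images, annotations) pairs of one subset over all classes."""
    images = sum(row[subset][0] for row in table.values())
    annotations = sum(row[subset][1] for row in table.values())
    return images, annotations


for subset in ("test", "trainval"):
    # Level 2 rows add up to the level 1 flower row.
    assert column_sum(LEVEL2, subset) == LEVEL1_FLOWER[subset]
    # Level 3 rows add up to the level 2 flower_female row.
    assert column_sum(LEVEL3, subset) == LEVEL2["flower_female"][subset]

print("tables are consistent")
```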
The train+validation subset should be used with k-fold cross-validation for model development and hyperparameter optimisation. The test subset is reserved for final model evaluation and should not be used during training or validation. Users requiring the complete 17-class taxonomy or male flower phenological staging should refer to the original dataset.
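A k-fold split of the train+validation images can be produced with the standard library alone. The sketch below is one possible setup, not a prescribed protocol: the fold count of 5, the seed, and the use of integer indices in place of file names are all assumptions.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, validation) lists for each of k shuffled folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Striding gives k folds whose sizes differ by at most one item.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Indices stand in for the 1 960 train+validation images.
splits = list(k_fold_splits(range(1960), k=5))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 1568 392
```

With a plain random split, an augmented image and the original it was derived from can land on opposite sides of a fold. Grouping images by their source before splitting is one way to avoid that, although how augmented files can be traced back to their originals is not described here.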