Published January 12, 2026 | Version v1
Dataset Restricted

Actinidia chinensis Phenology Balanced Dataset (Portugal, 2025)

  • 1. INESC TEC
  • 2. University of Trás-os-Montes and Alto Douro
  • 3. Faculty of Engineering - University of Porto (FEUP)
  • 4. Universidade do Porto Faculdade de Ciências

Description

The Actinidia chinensis Phenology Balanced Dataset is a stratified and class-balanced subset derived from the Multi-Modal Actinidia chinensis Phenology Dataset, prepared explicitly for training hierarchical detection models that target female flower phenological staging.

The source material comprises the Labelled Images component from the original dataset, which contains smartphone-acquired imagery of kiwifruit reproductive structures annotated in Pascal VOC format according to a 17-class hierarchical taxonomy organised across three levels: structure (bud, flower, fruit), gender (female, male), and BBCH-adapted phenological stage.

The balanced dataset was generated through a two-stage processing pipeline:

  1. Split stage: The source imagery was partitioned into test (99 images) and train+validation (1 556 images) subsets using stratified sampling to ensure proportional class representation in the test set across all hierarchical classification levels.
  2. Balancing stage: Images with a disproportionate annotation density for over-represented classes were removed from the train+validation subset to facilitate subsequent balancing (1 311 images retained). Augmentation operations were then applied iteratively until the target class distributions were reached (1 311 original + 649 augmented = 1 960 images). Each composite operation combined one geometric transformation (flip, scale_rotate, or downscale) with one appearance transformation (bright_contrast, grid_distortion, grid_dropout, unsharp, or motion_blur). All augmentations were implemented with the Albumentations library, which also transforms the bounding box coordinates.

| Operation | Description |
| --- | --- |
| flip | Horizontal and vertical reflection. |
| scale_rotate | Affine transformation with shift (±6.25%), scale (±10%), and rotation (±15°). |
| bright_contrast | Brightness and contrast modulation (±40%). |
| downscale | Resolution reduction (50%) with interpolation. |
| grid_distortion | Elastic grid-based spatial distortion. |
| grid_dropout | Grid-based region dropout (20% ratio). |
| unsharp | Unsharp masking for edge enhancement. |
| motion_blur | Directional motion blur simulation. |
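
The pairing rule for composite operations can be sketched as every combination of one geometric and one appearance operation. This is a minimal illustration that uses only the operation names listed above; the `composite_name` helper and its naming scheme are hypothetical, and the sketch does not reproduce the Albumentations calls or parameters used to build the dataset.

```python
from itertools import product

# Operation names as listed in the description above.
GEOMETRIC = ["flip", "scale_rotate", "downscale"]
APPEARANCE = [
    "bright_contrast",
    "grid_distortion",
    "grid_dropout",
    "unsharp",
    "motion_blur",
]


def composite_name(geometric: str, appearance: str) -> str:
    """Hypothetical naming scheme for one composite operation."""
    return f"{geometric}+{appearance}"


# Each composite pairs exactly one geometric with one appearance operation.
COMPOSITES = [composite_name(g, a) for g, a in product(GEOMETRIC, APPEARANCE)]

if __name__ == "__main__":
    print(len(COMPOSITES))  # 3 geometric x 5 appearance = 15 pairings
    print(COMPOSITES[0])
```

Under this reading, three geometric and five appearance operations give 15 possible composites; whether all of them were actually used is not stated in the description.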

Phenological stage labels for bud and male flower classes were consolidated to their parent categories, reducing the taxonomy from 17 to 9 object classes while preserving full phenological granularity for the female flower pathway.

| Original label | New label |
| --- | --- |
| bud_53 | bud |
| bud_55 | bud |
| bud_56 | bud |
| bud_57 | bud |
| bud | bud |
| flower_female_60 | flower_female_60 |
| flower_female_61 | flower_female_61 |
| flower_female_67 | flower_female_67 |
| flower_female_68 | flower_female_68 |
| flower_female_69 | flower_female_69 |
| flower_female | flower_female |
| flower_male_60 | flower_male |
| flower_male_61 | flower_male |
| flower_male_67 | flower_male |
| flower_male | flower_male |
| flower | flower |
| fruit | - |
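
The consolidation above can be expressed as a simple lookup. The sketch below is illustrative only: the names `LABEL_MAP`, `consolidate`, and `NEW_CLASSES` are hypothetical, and mapping `fruit` to `None` assumes that the "-" entry means the class has no counterpart in the reduced taxonomy.

```python
# Original 17-class label -> consolidated label.
# None is an assumption for the "-" entry (class not carried over).
LABEL_MAP = {
    "bud_53": "bud",
    "bud_55": "bud",
    "bud_56": "bud",
    "bud_57": "bud",
    "bud": "bud",
    "flower_female_60": "flower_female_60",
    "flower_female_61": "flower_female_61",
    "flower_female_67": "flower_female_67",
    "flower_female_68": "flower_female_68",
    "flower_female_69": "flower_female_69",
    "flower_female": "flower_female",
    "flower_male_60": "flower_male",
    "flower_male_61": "flower_male",
    "flower_male_67": "flower_male",
    "flower_male": "flower_male",
    "flower": "flower",
    "fruit": None,
}


def consolidate(label: str):
    """Return the consolidated label, or None if the class is not kept."""
    return LABEL_MAP[label]


# Distinct labels that remain after consolidation.
NEW_CLASSES = sorted({v for v in LABEL_MAP.values() if v is not None})

if __name__ == "__main__":
    print(len(LABEL_MAP), len(NEW_CLASSES))
```

Counting the keys and the distinct non-empty values reproduces the reduction from 17 to 9 object classes.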

The processed dataset is organised into two directories (test and train+validation), each containing an Images folder with JPEG files and an Annotations folder with corresponding Pascal VOC XML files.
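
As an illustration of how the annotation files might be read, the following sketch parses a Pascal VOC style XML document with the Python standard library. The sample XML, the file name `example.jpg`, and the `read_objects` helper are hypothetical; the actual files may contain additional fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation content in the usual Pascal VOC layout.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>flower_female_61</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
  <object>
    <name>bud</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>260</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>"""


def read_objects(xml_text: str):
    """Return a list of (class name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Coordinates are parsed via float() in case a tool wrote decimals.
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), coords))
    return objects


if __name__ == "__main__":
    for name, coords in read_objects(SAMPLE_XML):
        print(name, coords)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the rest of the loop unchanged.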

| Set | Images | Annotations | Classes |
| --- | --- | --- | --- |
| test | 99 | 942 | 9 |
| train+validation | 1 960 | 16 694 | 9 |

The hierarchical detection pathway comprises three sequential classification tasks. The following tables present the class distribution for each level, demonstrating the balance achieved through the processing pipeline. Image counts indicate images containing at least one annotation of that class; individual images may contain multiple classes.

The first level performs structure detection, distinguishing between bud and flower structures:

| Class | test | train+validation |
| --- | --- | --- |
| bud | 67 images with 473 annotations | 1 256 images with 7 771 annotations |
| flower | 174 images with 460 annotations | 2 892 images with 8 827 annotations |
| background | 11 images | 98 images |

The second level performs gender classification on detected flowers. Images containing only bud, fruit, or no annotations serve as background for this classification task:

| Class | test | train+validation |
| --- | --- | --- |
| flower_female | 128 images with 228 annotations | 1 694 images with 3 497 annotations |
| flower_male | 27 images with 199 annotations | 704 images with 4 295 annotations |
| flower | 19 images with 33 annotations | 494 images with 1 035 annotations |
| background | 83 images | 1 373 images |

The third level performs phenological stage classification on detected female flowers. Images containing only male flowers, generic flowers, buds, fruit, or no annotations serve as background for this classification task:

| Class | test | train+validation |
| --- | --- | --- |
| flower_female | 23 images with 32 annotations | 332 images with 537 annotations |
| flower_female_60 | 23 images with 38 annotations | 318 images with 485 annotations |
| flower_female_61 | 29 images with 45 annotations | 360 images with 683 annotations |
| flower_female_67 | 19 images with 35 annotations | 297 images with 797 annotations |
| flower_female_68 | 14 images with 26 annotations | 213 images with 489 annotations |
| flower_female_69 | 20 images with 52 annotations | 174 images with 506 annotations |
| background | 130 images | 1 559 images |
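
One way to derive the per-level targets from a single 9-class label is sketched below. The function name `hierarchy_targets` and the use of `None` for "not a target at this level" are assumptions made for illustration; they are not part of the dataset itself.

```python
FEMALE_STAGES = {
    "flower_female_60",
    "flower_female_61",
    "flower_female_67",
    "flower_female_68",
    "flower_female_69",
}


def hierarchy_targets(label: str):
    """Map a 9-class label to (level 1, level 2, level 3) targets.

    None marks a level at which the annotation is not a target.
    """
    if label == "bud":
        return ("bud", None, None)
    if label == "flower":
        # Generic flower: no gender, no stage.
        return ("flower", "flower", None)
    if label == "flower_male":
        return ("flower", "flower_male", None)
    if label == "flower_female" or label in FEMALE_STAGES:
        # Level 3 keeps the label itself, staged or generic female.
        return ("flower", "flower_female", label)
    raise ValueError(f"unknown label: {label}")


if __name__ == "__main__":
    for name in ("bud", "flower", "flower_male", "flower_female_67"):
        print(name, hierarchy_targets(name))
```

This projection is consistent with the annotation counts in the tables: for the test set, the six third-level classes sum to 228 annotations (the `flower_female` count at level two), and the three second-level classes sum to 460 (the `flower` count at level one).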

The train+validation subset should be used with k-fold cross-validation for model development and hyperparameter optimisation. The test subset is reserved for final model evaluation and should not be used during training or validation. Users requiring the complete 17-class taxonomy or male flower phenological staging should refer to the original dataset.
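
A plain k-fold partition of the train+validation images could look like the sketch below. The number of folds (`k=5`), the seed, and the `k_fold_splits` helper are assumptions, since the description does not prescribe them. The sketch is not stratified and does not keep augmented images in the same fold as their originals, both of which may be worth handling in practice.

```python
import random


def k_fold_splits(image_ids, k=5, seed=0):
    """Yield (train_ids, val_ids) pairs for k-fold cross-validation."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a given seed
    folds = [ids[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        val_ids = folds[i]
        train_ids = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_ids, val_ids


if __name__ == "__main__":
    # 1 960 placeholder identifiers stand in for the train+validation images.
    for train_ids, val_ids in k_fold_splits(range(1960), k=5):
        print(len(train_ids), len(val_ids))
```

Each image appears in exactly one validation fold, and the test subset is never touched by this procedure.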

Files

Restricted

The record is publicly accessible, but files are restricted.