Predicting Five Key Biomass Components from Pasture Images: A Dual-Stream Ensemble Approach for the CSIRO Image2Biomass Challenge
Authors/Creators
Description
This technical report presents an independent entry to the CSIRO Image2Biomass Prediction Kaggle competition (October 2025 – January 2026), which tasked participants with predicting five pasture biomass components — dry green vegetation, dry dead material, dry clover biomass, green dry matter (GDM), and total dry biomass — from high-resolution field images.
We propose a dual-stream ensemble combining: (1) a Vision Transformer backbone (vit_large_patch16_dinov3_qkvb) augmented with a Feature-wise Linear Modulation (FiLM) module that fuses the left/right panoramic images, and (2) a frozen SigLIP semantic feature extractor coupled with a gradient boosting ensemble (CatBoost, LightGBM, HistGradientBoosting, GradientBoosting). The two streams are combined by a weighted average (88.5% / 11.5%). A physics-constrained post-processing step enforces biological mass-balance constraints (GDM = Dry_Green + Dry_Clover, Dry_Total = GDM + Dry_Dead) and applies a state-based rule to Western Australia samples.
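To make the FiLM fusion in stream (1) more concrete, here is a minimal NumPy sketch of feature-wise modulation between two view embeddings. It is an illustration under stated assumptions, not the report's implementation: the feature width `d`, the random weights, the identity-centred scale, and the symmetric "each view modulates the other, then concatenate" layout are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical feature width; a real ViT embedding is far wider

# Hypothetical FiLM generator weights (learned in practice, random here).
w_gamma = rng.normal(scale=0.1, size=(d, d))
w_beta = rng.normal(scale=0.1, size=(d, d))

def film(features, condition):
    """Scale and shift `features` with parameters predicted from `condition`."""
    gamma = 1.0 + condition @ w_gamma   # per-feature scale, centred on identity
    beta = condition @ w_beta           # per-feature shift
    return gamma * features + beta      # elementwise modulation

def fuse(left, right):
    """Assumed layout: each view modulates the other, results are concatenated."""
    return np.concatenate([film(left, right), film(right, left)], axis=-1)

left = rng.normal(size=(4, d))    # batch of 4 left-view embeddings
right = rng.normal(size=(4, d))   # matching right-view embeddings
fused = fuse(left, right)         # shape (4, 2 * d)
```

The modulation itself is elementwise, so it costs O(d) per feature vector; in this sketch the dense layers that produce `gamma` and `beta` are more expensive than that.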
Key contributions include: a robust preprocessing pipeline with conditional inpainting of orange timestamps (HSV masking followed by the Telea algorithm, applied to 26.7% of images) and bottom-crop artifact removal; a FiLM-based dual-stream fusion module enabling cross-view interaction at O(d) cost; text-guided semantic feature extraction via SigLIP cosine similarity probing; and physics-constrained post-processing via orthogonal projection.
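The blending and mass-balance steps described above can be sketched as follows. This is a hedged illustration rather than the report's code: the column order, the example predictions, and the assignment of the 88.5% weight to the first stream are assumptions, and the Western Australia rule is not covered because the description does not specify it.

```python
import numpy as np

# Assumed column order for the five targets.
TARGETS = ["Dry_Green", "Dry_Dead", "Dry_Clover", "GDM", "Dry_Total"]

# Each row encodes one mass-balance constraint in the form A @ y = 0.
A = np.array([
    [1.0, 0.0, 1.0, -1.0, 0.0],   # Dry_Green + Dry_Clover - GDM = 0
    [0.0, 1.0, 0.0, 1.0, -1.0],   # Dry_Dead + GDM - Dry_Total = 0
])

# Orthogonal projector onto the subspace where both constraints hold.
P = np.eye(len(TARGETS)) - A.T @ np.linalg.inv(A @ A.T) @ A

def blend(pred_a, pred_b, w_a=0.885):
    """Weighted average of two streams (assumes stream A gets the larger weight)."""
    return w_a * pred_a + (1.0 - w_a) * pred_b

def project(pred):
    """Move each row to the closest point (in L2) that satisfies A @ y = 0."""
    return pred @ P   # P is symmetric, so rows are projected individually

# Made-up predictions for a single image, in grams.
vit_pred = np.array([[10.0, 5.0, 2.0, 12.5, 18.0]])
gbm_pred = np.array([[9.0, 6.0, 3.0, 11.0, 16.0]])

blended = blend(vit_pred, gbm_pred)
fixed = project(blended)
```

Projection changes every component by the smallest total squared amount needed to satisfy both equalities. It does not by itself keep values non-negative, and clipping afterwards would break the equalities again, so that case would need separate handling.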
The pipeline achieved a weighted R² of 0.7169 on the public leaderboard and 0.6172 (unofficial, top ~8%, silver medal zone) on the private leaderboard, ranking 167 out of 3802 teams. Out-of-fold (OOF) predictions from 5-fold cross-validation yielded a weighted global R² of 0.8785 on the 357 training images. Dry_Dead_g was the hardest target (R² = 0.698) because of its visual ambiguity with bare soil.
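For readers unfamiliar with the metric, one plausible reading of a "weighted global R²" is sketched below: residual and total sums of squares are pooled over every (image, target) pair, each pair is weighted by its target's weight, and the baseline is the weighted mean of all true values. Both this definition and the per-target weights are assumptions for illustration; neither is stated in the description above.

```python
# Per-target weights: an illustrative assumption, not taken from the report.
# Assumed order: Dry_Green, Dry_Dead, Dry_Clover, GDM, Dry_Total.
WEIGHTS = [0.1, 0.1, 0.1, 0.2, 0.5]

def weighted_mean(rows, weights):
    """Weighted mean over all (row, target) values."""
    total_weight = sum(weights) * len(rows)
    return sum(w * v for row in rows for w, v in zip(weights, row)) / total_weight

def weighted_r2(y_true, y_pred, weights):
    """Pooled R^2 with one weight per target column."""
    mean = weighted_mean(y_true, weights)
    ss_res = sum(
        w * (t - p) ** 2
        for row_t, row_p in zip(y_true, y_pred)
        for w, t, p in zip(weights, row_t, row_p)
    )
    ss_tot = sum(
        w * (t - mean) ** 2
        for row_t in y_true
        for w, t in zip(weights, row_t)
    )
    return 1.0 - ss_res / ss_tot

# Made-up values for two images, in grams.
y_true = [[10.0, 5.0, 2.0, 12.0, 17.0], [20.0, 8.0, 0.0, 20.0, 28.0]]
y_pred = [[11.0, 4.0, 2.5, 13.5, 17.5], [18.0, 9.0, 0.5, 18.5, 27.5]]
score = weighted_r2(y_true, y_pred, WEIGHTS)
```

Under this definition a heavily weighted target dominates the score, so a low per-target R² on a lightly weighted component lowers the global figure less than the same error on a heavily weighted one.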
All code is openly available at https://github.com/gtom-pandas/image2biomass. The dataset is provided by CSIRO, MLA, and FrontierSI under CC BY-SA 4.0.
Files
| Name | Size | Checksum |
|---|---|---|
| image2biomass technical paper GRACI.pdf | 2.7 MB | md5:24d8273bfdd3ffcd67b1570ea22bd45b |
Additional details
Related works
- Is derived from (Event): https://www.kaggle.com/competitions/csiro-biomass
- Is supplemented by (Computational notebook): https://github.com/gtom-pandas/image2biomass
Software
- Repository URL: https://github.com/gtom-pandas/image2biomass
- Programming language: Python
- Development Status: Inactive