BioCLIP 2

Gu, Jianyang; Stevens, Samuel; Campolongo, Elizabeth G.; Thompson, Matthew J.; Zhang, Net; Wu, Jiaman; Mai, Zheda

doi:10.5281/zenodo.15644364

Published June 11, 2025 | Version v1.0.0

Software | Open access

Foundation models trained at scale exhibit remarkable emergent behaviors, learning new capabilities beyond their initial training objectives. We find such emergent behaviors in biological vision models via large-scale contrastive vision-language training. To achieve this, we first curate TreeOfLife-200M, comprising 214 million images of living organisms, the largest and most diverse biological organism image dataset to date. We then train BioCLIP 2 on TreeOfLife-200M to distinguish different species. Despite the narrow training objective, BioCLIP 2 yields extraordinary accuracy when applied to various biological visual tasks such as habitat classification and trait prediction. We identify emergent properties in the learned embedding space of BioCLIP 2. At the inter-species level, the embedding distribution of different species aligns closely with functional and ecological meanings (e.g., beak sizes and habitats). At the intra-species level, instead of being diminished, the intra-species variations (e.g., life stages and sexes) are preserved and better separated in subspaces orthogonal to inter-species distinctions. We provide a formal proof and analyses to explain why hierarchical supervision and contrastive objectives encourage these emergent properties. Crucially, our results reveal that these properties become increasingly significant with larger-scale training data, leading to a biologically meaningful embedding space.

Notes

If you use this software, please cite both the article and the software itself.

Article citation:

@article{gu2025bioclip,
  title = {{B}io{CLIP} 2: Emergent Properties from Scaling Hierarchical Contrastive Learning},
  author = {Jianyang Gu and Samuel Stevens and Elizabeth G Campolongo and Matthew J Thompson and Net Zhang and Jiaman Wu and Andrei Kopanev and Zheda Mai and Alexander E. White and James Balhoff and Wasila M Dahdul and Daniel Rubenstein and Hilmar Lapp and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  year = {2025},
  eprint = {2505.23883},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2505.23883},
}

Files

Name	Size	Checksum
Imageomics/bioclip-2-v1.0.0.zip	3.4 MB	md5:1d89051d2c8c488960d74c043bbf8b63
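
The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the local filename `bioclip-2-v1.0.0.zip` and the helper names `md5_of_file` and `verify_archive` are assumptions made for illustration and are not part of the BioCLIP 2 release.

```python
import hashlib
from pathlib import Path

# MD5 value copied from the file listing of this record.
EXPECTED_MD5 = "1d89051d2c8c488960d74c043bbf8b63"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("bioclip-2-v1.0.0.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is suitable here only as an integrity check against accidental corruption; it is not a guarantee of authenticity.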

Additional details

Related works

Is supplement to: Preprint: 10.48550/arXiv.2505.23883 (DOI)
Is version of: Software: https://github.com/Imageomics/bioclip-2/tree/v1.0.0 (URL)

Funding

U.S. National Science Foundation
HDR Institute: Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning (award 2118240)

Software

Repository URL: https://github.com/Imageomics/bioclip-2

References

Stevens, S., Wu, J., Thompson, M. J., Campolongo, E. G., Song, C. H., Carlyn, D. E., Dong, L., Dahdul, W. M., Stewart, C., Berger-Wolf, T., Chao, W., & Su, Y. (2024). BioCLIP: A Vision Foundation Model for the Tree of Life [Conference paper]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Ilharco, G., Wortsman, M., Wightman, R., Gordon, C., Carlini, N., Taori, R., Dave, A., Shankar, V., Namkoong, H., Miller, J., Hajishirzi, H., Farhadi, A., & Schmidt, L. (2021). OpenCLIP (Version v0.1) [Computer software]. https://doi.org/10.5281/zenodo.5143773
