There is a newer version of the record available.

Published July 19, 2024 | Version v2
Dataset Open

Data from: BioEncoder: a metric learning toolkit for comparative organismal biology

  • 1. Florida Museum of Natural History
  • 2. Lund University
  • 3. University of Oslo
  • 4. Università di Catania

Description

Abstract - In the realm of biological image analysis, deep learning (DL) has become a core toolkit, e.g., for segmentation and classification. However, conventional DL methods are challenged by large biodiversity datasets characterized by unbalanced classes and hard-to-distinguish phenotypic differences between them. Here we present BioEncoder, a user-friendly toolkit for metric learning, which overcomes these challenges by focussing on learning relationships between individual data points rather than on the separability of classes. BioEncoder is released as a Python package, created for ease of use and flexibility across diverse datasets. It features taxon-agnostic data loaders, custom augmentation options, and simple hyperparameter adjustments through text-based configuration files. The toolkit's significance lies in its potential to unlock new research avenues in biological image analysis while democratizing access to advanced deep metric learning techniques. BioEncoder addresses the urgent need for toolkits that bridge the gap between complex DL pipelines and practical applications in biological research.

Dataset - This data repository includes two things: a snapshot of the BioEncoder package (BioEncoder-main.zip, version 1.0.0, downloaded from https://github.com/agporto/BioEncoder on 2024-07-19 at 17:20), and the damselfly dataset used for the case study presented in the paper (bioencoder_data.zip). The dataset archive also encompasses the configuration files and the final model checkpoints from the case study, as well as a script to reproduce the results and figures presented in the paper.

How to use - Get started by consulting the GitHub repository for information on how to install BioEncoder, then download the data archive and run the script. Some parts of the script can be executed using the model checkpoints; for other parts, the training routine needs to be run.
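
Before running the script, it can help to check what the data archive contains. The following is a minimal sketch and not part of BioEncoder: it uses only the Python standard library to count the files in a zip archive by extension, which gives a quick overview of how many scripts, configuration files and checkpoints it holds. The function name summarize_archive is hypothetical, and the file name in the usage comment is taken from the Dataset paragraph and may differ from the name of the downloaded file.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a zip archive by (lower-cased) file extension.

    Hypothetical helper for illustration; it only reads the archive index
    and does not extract anything.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so that only real files are counted.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    # Files without an extension are grouped under "(none)".
    return Counter(PurePosixPath(name).suffix.lower() or "(none)" for name in names)


# Example usage (assumed file name, adjust to the actual download):
#   for extension, count in summarize_archive("bioencoder_data.zip").most_common():
#       print(extension, count)
```

Reading only the archive index keeps the check fast even for a multi-gigabyte file, since nothing is decompressed.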

Files

BioEncoder-data.zip

Files (2.5 GB)

Size     Checksum
2.5 GB   md5:48d36b385fe871698d53d44c534f98fd
1.5 MB   md5:a875d1b98c23358bae11b4f404ea5ecc
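
The MD5 checksums listed above can be used to confirm that a download completed intact. Below is a minimal sketch using only the Python standard library. The helper names are hypothetical, and because the file table does not show which checksum belongs to which file name, the check only tests whether a file matches either of the published digests.

```python
import hashlib

# MD5 digests copied from the file table of this record.
PUBLISHED_MD5 = {
    "48d36b385fe871698d53d44c534f98fd",  # listed with the 2.5 GB file
    "a875d1b98c23358bae11b4f404ea5ecc",  # listed with the 1.5 MB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file's MD5 equals one of the digests listed in the record."""
    return md5_of_file(path) in PUBLISHED_MD5


# Example usage (assumed file name, adjust to the actual download):
#   print(matches_published_checksum("bioencoder_data.zip"))
```

A mismatch usually points to an interrupted or corrupted download, in which case downloading the file again is the simplest remedy.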

Additional details

Funding

European Commission
PhenoDim – Phenomics and evolution of sexual dimorphism and female colour polymorphism in damselflies (grant no. 898932)
The Research Council of Norway
Grant no. 314499
Nvidia (United States)
Hardware Grant

Dates

Created
2024-03-28
version 0.1.0 released
Updated
2024-07-19
version 1.0.0 released

Software

Repository URL
https://github.com/agporto/BioEncoder
Programming language
Python
Development Status
Active