Published January 25, 2023
Version v2 · Dataset · Open access
Species-aware DNA language modeling - data
Creators
- 1. School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- 2. School of Computation, Information and Technology, Technical University of Munich, Garching, Germany; Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany; Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany; Munich Center for Machine Learning, Germany
Description
Data accompanying the publication Species-aware DNA language modeling.
For the code, see https://github.com/DennisGankin/species-aware-DNA-LM (latest version) or the code.zip file.
The data directory contains model checkpoints, baseline models, evaluation results, and the datasets used for training, testing, and downstream tasks. It has the following structure:
data/: datasets and subdirectories
- data/results/: results from different test runs
- data/models/: model checkpoints
- data/baselines/: baseline models
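As a minimal sketch, assuming the archive has been extracted so that data/ sits directly under a local root directory (the archive name and extraction location are not specified in this record), the following Python helper checks that the three subdirectories listed above are present and lists their direct entries. The function names `check_data_layout` and `list_top_level` are hypothetical and are not part of the accompanying code; the sketch does not assume anything about the file formats inside each subdirectory.

```python
from pathlib import Path
import sys

# Subdirectories of data/ as described in this record.
EXPECTED_SUBDIRS = ("results", "models", "baselines")


def check_data_layout(root):
    """Return the expected data/ subdirectories that are missing under root.

    Hypothetical helper: `root` is assumed to be the directory that
    contains data/ after extraction.
    """
    data_dir = Path(root) / "data"
    return [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]


def list_top_level(root):
    """Map each expected subdirectory to the sorted names of its direct entries.

    Missing subdirectories map to an empty list.
    """
    data_dir = Path(root) / "data"
    listing = {}
    for name in EXPECTED_SUBDIRS:
        sub = data_dir / name
        listing[name] = sorted(p.name for p in sub.iterdir()) if sub.is_dir() else []
    return listing


if __name__ == "__main__":
    # Usage: python check_layout.py /path/to/extracted/root
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = check_data_layout(root)
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        for name, entries in list_top_level(root).items():
            print(f"data/{name}/: {len(entries)} entries")
```

Checking only for directory presence keeps the sketch independent of how checkpoints or results are stored; loading the contents is left to the code in the GitHub repository or code.zip.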