Published January 25, 2023 | Version v2
Dataset | Open access

Species-aware DNA language modeling - data

  • 1. School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  • 2. School of Computation, Information and Technology, Technical University of Munich, Garching, Germany; Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany; Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany; Munich Center for Machine Learning, Germany

Description

Data accompanying the publication "Species-aware DNA language modeling".

For the code, see https://github.com/DennisGankin/species-aware-DNA-LM (latest version) or the code.zip file in this record.

The data directory contains model checkpoints, baseline models, evaluation results, and the datasets used for training, testing, and downstream tasks. It has the following structure:

data/                 Datasets and subdirectories
  - data/results/     Results from different test runs
  - data/models/      Model checkpoints
  - data/baselines/   Baseline models

Files (22.4 GB)

Size       MD5 checksum
785.0 kB   bd1d5225f184bf8e826af117be80362a
22.4 GB    4248f0e52f8d0675f0b0d9e4aedb8097
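Each file in this record is listed with an MD5 checksum. Because the larger file is about 22.4 GB, it can be worth confirming that a download completed intact before extracting it. The following is a minimal sketch that uses only the Python standard library. The function names and the size-based labels in `LISTED_MD5` are illustrative assumptions, not part of the released code, and the file names themselves are not assumed.

```python
import hashlib

# Checksums as listed on this record. The labels describe each file by its
# listed size only; they are illustrative and are not the actual file names.
LISTED_MD5 = {
    "785.0 kB file": "bd1d5225f184bf8e826af117be80362a",
    "22.4 GB file": "4248f0e52f8d0675f0b0d9e4aedb8097",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks so a
    multi-gigabyte archive never has to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's digest with an expected checksum. Accepts either a
    bare hex string or the 'md5:<hex>' form shown on the record page."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


def identify_listed_file(path):
    """Return the label of the listed entry whose checksum matches the file,
    or None if the file matches neither listed checksum."""
    actual = md5sum(path)
    for label, checksum in LISTED_MD5.items():
        if actual == checksum:
            return label
    return None
```

For example, `identify_listed_file("downloads/archive.zip")` (a hypothetical path) returns the matching label, or `None` if the checksum matches neither entry, which usually means the download should be repeated. Reading in fixed-size chunks keeps memory use constant regardless of archive size.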