Published August 26, 2024 | Version v3
Software Open

DNA language model GROVER learns sequence context in the human genome - the code to the paper

  • 1. TU Dresden
  • 2. Helmholtz-Zentrum Dresden-Rossendorf

Description

Code accompanying the paper: https://www.nature.com/articles/s42256-024-00872-0

Python was used for the model, performance assessment, and data generation. R was used for scripting and data visualisation. All input data for the R scripts are provided separately, so that the data-intensive and computationally intensive steps do not have to be repeated.
For the Python code, the folder finetuning_tasks was split into four folders because of upload problems. After decompression, these four folders have to be combined into a single finetuning_tasks folder again.
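
Recombining the split folder can be sketched as follows. This is only a minimal illustration, not the authors' procedure: the record does not state the names of the four decompressed folders, so the glob pattern finetuning_tasks_part* and the helper merge_parts are hypothetical and need to be adapted to the actual folder names.

```python
import shutil
from pathlib import Path


def merge_parts(part_dirs, target):
    """Copy the contents of every part directory into one target directory."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for part in part_dirs:
        # dirs_exist_ok=True (Python 3.8+) lets later parts add files to
        # subfolders that an earlier part already created. Files with the
        # same relative path are overwritten by the later part.
        shutil.copytree(part, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    # Hypothetical naming scheme: adjust the pattern to the real folder names.
    parts = sorted(Path(".").glob("finetuning_tasks_part*"))
    if parts:
        merge_parts(parts, "finetuning_tasks")
```

Copying rather than moving leaves the decompressed parts untouched, so the merge can be repeated if something goes wrong; the parts can be deleted once the combined folder has been checked.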

A tutorial on how to use GROVER as a foundation model can be found at: https://doi.org/10.5281/zenodo.8373159

The pretrained model can be found at: https://doi.org/10.5281/zenodo.8373117

The data for the tokenised genome are at: https://doi.org/10.5281/zenodo.8373053

Files

chr21.zip

Files (26.7 GB)

MD5 checksum                            Size
md5:2cf08e20a807f1c5a9d93f2885709b4a    130.8 MB
md5:fbdb587bf99e8fc0e2ba47eda0bde5d1    169.3 MB
md5:9298ba6caea3430266fccc7c3c81ae39    2.9 GB
md5:0bbb6b3b44babd90f378425aee7de018    6.8 GB
md5:3030588b853e4d26447d7419624fb95b    4.3 GB
md5:47a85174306ab13fbd193d53d1eb083b    3.9 GB
md5:c8d3f84a410f46d4d8b03a268e313e35    205 Bytes
md5:f0b3a834ad20344f8c9d6a95ca7fba63    112.5 kB
md5:1e7e63425b12273de2bd857d3c065cd6    30.7 MB
md5:b93073f88f6cd3edce3347adee1fa873    83.6 kB
md5:e7c324b82ce7acc347d5c179222e7ea2    2.9 GB
md5:0b5b8b16dcf7154cda801426feb0973f    4.4 GB
md5:fe15f704e7355cc6bf0b3979ab289851    751 Bytes
md5:cb59c4eb8d1a844de9996071918824c0    4.4 kB
md5:9f8cb6eb0b22d2befcdb1b9fa8e8b8ab    1.3 GB
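
Because several of the files are multiple gigabytes in size, it can be worth checking a download against the MD5 checksum listed for it. The following is a minimal sketch using only the Python standard library; the helper names md5sum and verify are illustrative, and the expected checksum has to be copied from the entry of the file that was actually downloaded.

```python
import hashlib
import sys


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use constant for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 digest with the checksum listed for it."""
    # Accept the checksum with or without the "md5:" prefix used in the list.
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


if __name__ == "__main__":
    # Usage: python verify_md5.py <downloaded file> <checksum from the list>
    if len(sys.argv) == 3:
        ok = verify(sys.argv[1], sys.argv[2])
        print("checksum OK" if ok else "checksum MISMATCH")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before decompression.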

Additional details

Related works

Is cited by
Preprint: https://www.biorxiv.org/content/10.1101/2023.07.19.549677v1 (URL)
Is supplement to
Software documentation: 10.5281/zenodo.8373159 (DOI)
Requires
Software: 10.5281/zenodo.8373117 (DOI)
Dataset: 10.5281/zenodo.8373053 (DOI)