DNA language model GROVER learns sequence context in the human genome - the code to the paper
Description
This record contains the code accompanying the paper: https://www.nature.com/articles/s42256-024-00872-0
Python was used for the model, performance assessment and data generation. R was used for scripting and data visualisation. All input data for the R scripts are provided separately, so that the data-intensive and computationally expensive steps do not have to be repeated.
For the Python code, the folder finetuning_tasks was split into four folders because of upload problems. The four parts have to be combined again after decompression.
A tutorial on how to use GROVER as a foundation model can be found at: https://doi.org/10.5281/zenodo.8373159
The pretrained model can be found at: https://doi.org/10.5281/zenodo.8373117
The data for the tokenised genome are at: https://doi.org/10.5281/zenodo.8373053
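The recombination of the split finetuning_tasks folder described above could be done along these lines. This is a minimal sketch, not part of the published code: the function name merge_parts and the part folder names in the usage comment are hypothetical, since the record does not state how the four parts are named.

```python
import shutil
from pathlib import Path


def merge_parts(part_dirs, target):
    """Copy the contents of every part directory into one target directory."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for part in part_dirs:
        # dirs_exist_ok=True (Python 3.8+) lets a later part add files to
        # subfolders that an earlier part already created. Files with the
        # same relative path are overwritten by the later part.
        shutil.copytree(part, target, dirs_exist_ok=True)
    return target


# Usage with hypothetical folder names (adjust to the decompressed parts):
# merge_parts(
#     ["finetuning_tasks_1", "finetuning_tasks_2",
#      "finetuning_tasks_3", "finetuning_tasks_4"],
#     "finetuning_tasks",
# )
```

Copying rather than moving keeps the decompressed parts untouched, at the cost of temporary extra disk space.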
Files
Total size: 26.7 GB
MD5 checksum | Size |
---|---|
md5:2cf08e20a807f1c5a9d93f2885709b4a | 130.8 MB |
md5:fbdb587bf99e8fc0e2ba47eda0bde5d1 | 169.3 MB |
md5:9298ba6caea3430266fccc7c3c81ae39 | 2.9 GB |
md5:0bbb6b3b44babd90f378425aee7de018 | 6.8 GB |
md5:3030588b853e4d26447d7419624fb95b | 4.3 GB |
md5:47a85174306ab13fbd193d53d1eb083b | 3.9 GB |
md5:c8d3f84a410f46d4d8b03a268e313e35 | 205 Bytes |
md5:f0b3a834ad20344f8c9d6a95ca7fba63 | 112.5 kB |
md5:1e7e63425b12273de2bd857d3c065cd6 | 30.7 MB |
md5:b93073f88f6cd3edce3347adee1fa873 | 83.6 kB |
md5:e7c324b82ce7acc347d5c179222e7ea2 | 2.9 GB |
md5:0b5b8b16dcf7154cda801426feb0973f | 4.4 GB |
md5:fe15f704e7355cc6bf0b3979ab289851 | 751 Bytes |
md5:cb59c4eb8d1a844de9996071918824c0 | 4.4 kB |
md5:9f8cb6eb0b22d2befcdb1b9fa8e8b8ab | 1.3 GB |
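
Since each file in the table above is listed with an MD5 checksum, a download can be checked against it before any long-running step is started. The sketch below uses only Python's standard hashlib module; the helper names md5_of_file and matches_checksum and the file path in the usage comment are illustrative assumptions, not part of the published code.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so multi-gigabyte archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a listed value such as 'md5:2cf0...'."""
    # Accept the value with or without the 'md5:' prefix used in the table.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage with a hypothetical local path and the first checksum from the table:
# matches_checksum("path/to/downloaded_file",
#                  "md5:2cf08e20a807f1c5a9d93f2885709b4a")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.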
Additional details
Related works
- Is cited by
  - Preprint: https://www.biorxiv.org/content/10.1101/2023.07.19.549677v1 (URL)
- Is supplement to
  - Software documentation: 10.5281/zenodo.8373159 (DOI)
- Requires
  - Software: 10.5281/zenodo.8373117 (DOI)
  - Dataset: 10.5281/zenodo.8373053 (DOI)