Published March 6, 2026 | Version v1
Dataset Open

Genomic language models improve cross-species gene expression prediction and accurately capture regulatory variant effects in Brachypodium mutant lines

  • 1. Aarhus University
  • 2. Department of Agroecology, Aarhus University, Slagelse 4200, Denmark
  • 3. State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People's Republic of China
  • 4. Nanjing Agricultural University

Description

This record contains the datasets, trained models, code, and supporting materials for the manuscript “Genomic language models improve cross-species gene expression prediction and accurately capture regulatory variant effects in Brachypodium mutant lines”. The datasets comprise sequence embeddings across 17 plant species used for training, hyperparameter optimization, cross-validation tests, and experimental in planta validation of predictions, together with gene expression predictions across the 17 species from each of the 4 EMPRES model types. The record also includes the trained models, analysis scripts, and the conda environment file. See README.md for full details.

Files

codes.zip

Files (129.4 GB)

Checksum                              Size
md5:7769d82ef3329dffd510a9fbc291da99  12.6 kB
md5:34ae441140f2eaf3db64a1a54d5b878e  16.1 kB
md5:fd3cb7e1e05f4f2b8222a84f34bc6686  2.4 MB
md5:e3f8d2d7b061b0971c0eb55803b05e93  10.2 MB
md5:a98619ded2f189484b8ca7b66406e0d5  796.2 kB
md5:35229a66d039e156c200fa632f61a600  796.2 kB
md5:292b514feb2c8bdb39de60d75a5af610  796.1 kB
md5:c39fee4c8c6d560ab323c5b7f971b530  796.1 kB
md5:4fdc38a37000fca7fbd87235bdb09459  795.9 kB
md5:5a812723242a18f23bec52b4c615e30b  2.4 MB
md5:9b2a9e7139c9c197b59fe07cd3b6acd6  20.1 kB
md5:589bc8cd6d4f47b3587e5eae046cb196  14.7 kB
md5:54dbaa4c2648c392b52f91441a75e6d3  14.9 kB
md5:428839e73f3607b60ed23228b0ad2478  15.6 kB
md5:af88b649d5cd3ca33297d8504a3d95bb  3.7 GB
md5:4ae315898b940c07ca3c97389c1a3218  322.9 MB
md5:c03325bdfa248c44c434d9c70fa33722  279.1 kB
md5:1648cebc326b4c3c66330ad1a5f2f0df  4.7 kB
md5:ce111a79002c59968dfa307717f695fe  50.7 MB
md5:71f773a61fcd7f7b5a30a9be9e224540  7.4 kB
md5:ef0e5cee1f243964898f25c9685c6e97  5.3 kB
md5:3de315931848049818e5bdd65cb6b9c9  1.6 kB
md5:822b5a646e3eae2126d5e1d48b5b0e5e  1.8 GB
md5:82a09105a47233b625b9f92282fd7cf7  2.4 MB
md5:2cd2b8770acac4c409f49514cabc1e1c  43.6 GB
md5:53ef00b78e349f593d91e0e7defbd5ae  18.1 GB
md5:27ad183197b2827e04c2fe23286bb9e0  47.1 MB
md5:02a6c76909ab7ec8dceb991f8694721c  43.6 GB
md5:d8d84406dd303d112e3de7782ad1a145  18.1 GB
md5:14b0a0fc5eba5fab8b410e8f9f415823  47.1 MB
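
The md5 checksums listed above can be used to confirm that a download completed without corruption, which matters most for the multi-gigabyte archives. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_checksum` are illustrative and are not part of the record or its repository. Because the listing here does not pair checksums with file names, the caller supplies both the local path and the checksum shown for that file on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks (1 MiB by default) so that very large
    archives do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected_hex = expected.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex
```

Calling `matches_checksum` with the path of a downloaded file and the checksum string listed for it returns `True` when the file is intact and `False` otherwise, in which case the file should be downloaded again.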

Additional details

Software

Repository URL: https://github.com/behroozvahedi/Regeffects
Programming language: Python
Development Status: Active