Published June 23, 2025 | Version v1
Dataset Open

Simple simulated genotype-phenotype datasets for DL scaling work

Contributors

Data collector:

  • 1. Arcadia Science

Description

This release accompanies the pub "Deep learning scaling behaviour in a quantitative genetics framework".

This dataset contains the simulated data underlying all three in-silico 'experiments' reported in the pub. The files 'alphasimr_dilution_input.tar.xz' and 'alphasimr_pleio_input.tar.xz' correspond to the data generated for the dilution and pleiotropy/genetic correlation experiments, respectively. The data generated for the first (scaling) experiment are split across two files: 'alphasimr_scaling_1e06_input.tar.xz' contains the largest simulation replicates, with 1,000,000 sampled genomes, and 'alphasimr_scaling_input.tar.xz' contains all smaller simulation replicates.

Each compressed directory includes simulated genotypes, phenotypes, and QTL effect sizes saved in .txt format for each simulation replicate.  
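As a minimal sketch of how one of these archives could be inspected and unpacked, the Python below uses only the standard-library tarfile module. The helper names list_txt_members and extract_archive are hypothetical, and the internal directory layout of the archives is not documented here, so the code only lists whatever .txt members it finds rather than assuming particular file names.

```python
import tarfile
from pathlib import Path


def list_txt_members(archive_path):
    """Return the sorted names of regular .txt files inside a .tar.xz archive."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".txt")
        )


def extract_archive(archive_path, destination):
    """Extract a .tar.xz archive into destination and return that directory."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:xz") as archive:
        # Only extract archives obtained from a source you trust.
        archive.extractall(destination)
    return destination
```

For example, list_txt_members("alphasimr_dilution_input.tar.xz") would return the .txt paths in that archive, assuming it has been downloaded to the working directory. Listing first is likely the cheaper option for the largest archive, since it avoids writing the uncompressed data to disk.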

 

 

Files

Files (17.4 GB)

MD5 checksum                             Size
md5:a8f9c8cdd284bf2bfb590cdd7ee883a4     146.1 MB
md5:9c95b97a072474d9603edfe1c7356d60     1.6 GB
md5:adec01ac5afd6cd98ae2dc0ed6b75fa0     14.8 GB
md5:2cd5f3ce0e5bb9837f1a74eb5bb325c8     955.9 MB
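The MD5 checksums listed for the files can be used to confirm that a download completed intact. The following is a sketch only, using Python's standard hashlib module; the helper names md5_of_file and matches_checksum are hypothetical, and the listing here does not state which checksum belongs to which file name, so that pairing has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value such as 'md5:ab12...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as matches_checksum(downloaded_path, "md5:a8f9c8cdd284bf2bfb590cdd7ee883a4") returns True when the file at downloaded_path (a placeholder for wherever the archive was saved) has that digest. MD5 is suitable here for detecting corrupted or truncated downloads, not for security purposes.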

Additional details

Related works

Is supplement to
Publication: 10.57844/arcadia-25nt-guw3 (DOI)