Published August 27, 2024 | Version 1.0
Dataset Open

F2 simulated populations with Popsimul

  • 1. Institut de Recherche pour le Développement
  • 2. Cirad

Description

TAR archive containing 168 files of simulated genotypes of 84 F2 bi-parental populations of 300 individuals each.

The main aim of this dataset is to test the accuracy of different genotype imputation methods, such as TASSEL-FSFhap (doi: 10.3835/plantgenome2014.05.0023) or NOISYmputer (https://gitlab.cirad.fr/noisymputer).

Genotypes were simulated with Popsimul (https://forge.ird.fr/diade/recombination_landscape/popsimul).

The imaginary species is diploid, and its genome has one chromosome. The two parental lines of the populations are pure lines, that is, 100% homozygous. The F2 individuals derive from the self-fertilization of the F1 hybrid between the parental lines. Genotypes are represented by Single Nucleotide Polymorphism (SNP) markers obtained by virtual genome re-sequencing. 
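The crossing scheme described above can be illustrated with a minimal sketch. This is not Popsimul's algorithm: it assumes a uniform per-interval crossover probability (the hypothetical parameter crossover_prob), evenly spaced markers, and a dosage coding (0, 1, 2 copies of the parent B allele) chosen only for this example.

```python
import random

def simulate_gamete(n_markers, crossover_prob, rng):
    """Return one F1 gamete as a list of parental origins (0 = parent A, 1 = parent B)."""
    origin = rng.randint(0, 1)  # parental haplotype the gamete starts on
    gamete = []
    for i in range(n_markers):
        # Assumption: the same crossover probability between every pair of adjacent markers.
        if i > 0 and rng.random() < crossover_prob:
            origin = 1 - origin
        gamete.append(origin)
    return gamete

def simulate_f2(n_individuals, n_markers, crossover_prob=0.01, seed=0):
    """Simulate F2 genotypes as counts of the parent B allele (0, 1 or 2) per marker."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_individuals):
        # Selfing the F1: both gametes come from the same heterozygous F1 hybrid.
        g1 = simulate_gamete(n_markers, crossover_prob, rng)
        g2 = simulate_gamete(n_markers, crossover_prob, rng)
        population.append([a + b for a, b in zip(g1, g2)])
    return population

f2 = simulate_f2(n_individuals=300, n_markers=1000)
```

Because the parents are pure lines, every marker is heterozygous in the F1, so each F2 genotype is fully determined by the parental origin of its two gametes.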

Simulations were carried out over ranges of SNP density, sequencing depth, and sequencing error rate. These three parameters appear in this order in the file names and give 84 combinations in total. For each combination, an error-free version was produced first; it serves as the ground truth for estimating imputation accuracy.
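One way to score an imputed file against its error-free counterpart is the fraction of genotype calls that agree. The sketch below is only an illustration under stated assumptions: genotypes are VCF-style GT strings, allele order and phasing are ignored, sites missing in the ground truth are skipped, and a call left missing by the imputation counts as wrong. The function names are hypothetical, and other accuracy definitions are possible.

```python
def normalize(gt):
    """Return an order-independent genotype, or None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))

def imputation_accuracy(truth, imputed):
    """Fraction of ground-truth genotypes reproduced exactly by the imputed calls."""
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    compared = correct = 0
    for t, i in zip(truth, imputed):
        t_norm = normalize(t)
        if t_norm is None:
            continue  # nothing to compare against
        compared += 1
        if normalize(i) == t_norm:
            correct += 1
    return correct / compared if compared else float("nan")

truth = ["0/0", "0/1", "1/1", "0/1"]
imputed = ["0/0", "1/0", "0/1", "./."]
accuracy = imputation_accuracy(truth, imputed)  # 2 of 4 calls agree
```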

The genotypes are encoded following the Variant Call Format (VCF). Each simulated file is compressed (.gz format) with BCFtools.
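Compressed VCF files can be read with Python's standard gzip module. The sketch below parses a tiny made-up VCF written to a temporary file; the sample names, the positions and the AD field are assumptions for illustration and may not match the FORMAT fields actually present in the dataset.

```python
import gzip
import os
import tempfile

def read_vcf_genotypes(path):
    """Yield (chrom, pos, genotypes) for each record of a gzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, fmt = fields[0], int(fields[1]), fields[8]
            gt_index = fmt.split(":").index("GT")
            genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
            yield chrom, pos, genotypes

# Made-up example: two samples, two SNPs (not taken from the dataset).
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n"
    "1\t100\t.\tA\tG\t.\t.\t.\tGT:AD\t0/1:3,2\t./.:0,0\n"
    "1\t250\t.\tC\tT\t.\t.\t.\tGT:AD\t1/1:0,4\t0/0:5,0\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.vcf.gz")
    with gzip.open(path, "wt") as out:
        out.write(example)
    records = list(read_vcf_genotypes(path))
```

Reading line by line keeps memory use low, which matters for files of this size.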

Files

Files (8.7 GB)

md5:6184d4b205e4edcf3c19c0f8f731a4a9
8.7 GB
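After downloading, the archive can be checked against the MD5 checksum listed above. The sketch below streams the file in chunks so that the 8.7 GB archive does not need to fit in memory. The helper names are hypothetical, and the self-check runs on a small temporary file rather than on the real archive.

```python
import hashlib
import os
import tempfile

EXPECTED_MD5 = "6184d4b205e4edcf3c19c0f8f731a4a9"  # checksum from the file listing

def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return file_md5(path) == expected.lower()

# Self-check on a small temporary file (not the real archive).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as out:
        out.write(b"hello")
    demo_digest = file_md5(demo_path)
    demo_ok = matches_expected(demo_path, expected=hashlib.md5(b"hello").hexdigest())
```

For the real download, calling matches_expected with the local path of the archive should return True if the transfer was not corrupted.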

Additional details