Published February 20, 2024 | Version v1
Dataset Open

10 Synthetic Genomics Datasets

  • 1. Centre for Research and Technology Hellas
  • 2. National and Kapodistrian University of Athens

Description

These are 10 synthetic genomics datasets generated with NEAT v3, based on the TP53 gene of Homo sapiens, for benchmarking somatic variant callers. For more about the generation framework, see the synth4bench GitHub repository.

The datasets explore how intrinsic NGS data parameters affect tumor-only somatic variant calling algorithms. Of the 10 datasets, 5 vary the coverage while keeping all other parameters fixed, and 5 vary the read length at a fixed coverage of 1000x. The reads in all datasets are paired-end.

Name of File   Coverage   Length of Reads
300_30_10      300x       150
700_70_10      700x       150
1000_100_10    1000x      150
3000_300_10    3000x      150
5000_500_10    5000x      150
1000_50        1000x      50
1000_100       1000x      100
1000_170       1000x      170
1000_200       1000x      200
1000_300       1000x      300
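
For scripted benchmarking, the table above can be captured as a small lookup structure. The sketch below is illustrative only: the `Dataset` class and the helper names are hypothetical, the split into two series (read length 150 versus everything else) is one reading of the table, and `estimated_read_pairs` uses the common approximation coverage ≈ (number of reads × read length) / region length, which is not necessarily how NEAT derives its read counts. The `region_length` argument is an assumption supplied by the caller; the record does not state the length of the simulated region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str          # file name as listed in the table
    coverage: int      # target depth; 300 means 300x
    read_length: int   # length of each read


# Values transcribed from the table in this record.
DATASETS = [
    Dataset("300_30_10", 300, 150),
    Dataset("700_70_10", 700, 150),
    Dataset("1000_100_10", 1000, 150),
    Dataset("3000_300_10", 3000, 150),
    Dataset("5000_500_10", 5000, 150),
    Dataset("1000_50", 1000, 50),
    Dataset("1000_100", 1000, 100),
    Dataset("1000_170", 1000, 170),
    Dataset("1000_200", 1000, 200),
    Dataset("1000_300", 1000, 300),
]


def coverage_series(datasets):
    """Datasets that vary coverage at the fixed read length of 150."""
    return [d for d in datasets if d.read_length == 150]


def read_length_series(datasets):
    """Datasets that vary read length (all listed at 1000x)."""
    return [d for d in datasets if d.read_length != 150]


def estimated_read_pairs(coverage, read_length, region_length):
    """Rough number of read pairs needed to reach `coverage`.

    Assumes each pair contributes 2 * read_length bases over a region of
    `region_length` bases (a caller-supplied, hypothetical value).
    """
    bases_needed = coverage * region_length
    bases_per_pair = 2 * read_length
    return -(-bases_needed // bases_per_pair)  # ceiling division
```

For example, `estimated_read_pairs(300, 150, 1000)` gives the pair count for 300x over a hypothetical 1000-base region; substitute the real region length before relying on the number.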

Files

Files (55.9 MB)

Name Size
md5:c1d7f7638bfbf48c4e785b8605ad89ca
25.7 kB
md5:ca9ee4b2f062238c1f1d73915b186cbb
55.9 MB
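
After downloading, the MD5 checksums listed above can be used to confirm that each file arrived intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `matches_checksum` are hypothetical, and because the file names are not shown in this listing, the path in the usage note is a placeholder to be replaced with the actual downloaded file.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected_hex
```

A call such as `matches_checksum("path/to/downloaded_file", "md5:ca9ee4b2f062238c1f1d73915b186cbb")` returns `True` only when the local file matches the listed checksum; the path here is a placeholder.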

Additional details

Software

Programming language
Python
Development Status
Active