Core resources for genome analysis
Description
The Wellcome Sanger Institute will sequence and assemble tens of thousands of genomes over the next decade through large-scale biodiversity sequencing projects such as Darwin Tree of Life. These projects span the entire tree of eukaryotic life, and the resulting genomes will enable unprecedented science. But assembling a genome is only the first step. Most studies analyse features annotated on the genomes rather than the raw DNA sequences themselves. To facilitate this, we will run a set of elementary sequence analyses on every genome produced by the institute and publish the resulting tracks on a publicly available server through Track Hubs. Envisaged tracks include sequence composition (and, by extension, k-mer frequencies for several values of k) and the distributions of repeats, genes, and variants. The goal is to save the community effort and resources by providing a uniform dataset, whilst reducing the overall carbon footprint of genome analysis.
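As a rough illustration of what a sequence composition track and a k-mer frequency table involve, the sketch below computes GC content in fixed, non-overlapping windows and counts k-mers in a single sequence. It is not the institute's pipeline: the function names (`gc_windows`, `kmer_counts`, `to_bedgraph`), the window size, and the choice of bedGraph-style lines as output are assumptions made for this example only.

```python
from collections import Counter


def gc_windows(seq, window):
    """Yield (start, end, gc_fraction) for consecutive non-overlapping windows.

    Coordinates are 0-based and half-open, as in BED/bedGraph.
    The last window may be shorter than `window`.
    """
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, start + len(chunk), gc / len(chunk)


def kmer_counts(seq, k):
    """Count every k-mer made only of A, C, G and T (k-mers with N are skipped)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def to_bedgraph(name, seq, window):
    """Format the GC windows of one sequence as tab-separated bedGraph-style lines."""
    return [
        f"{name}\t{start}\t{end}\t{value:.3f}"
        for start, end, value in gc_windows(seq, window)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence; a real run would stream records from a FASTA file.
    toy = "GGCCAATTACGTNACGT"
    for line in to_bedgraph("toy_scaffold", toy, window=4):
        print(line)
    for k in (2, 3):  # "several k", as in the description
        print(k, kmer_counts(toy, k).most_common(3))
```

Per-window values in this form can be converted to a binary format such as bigWig before being served from a Track Hub; that conversion step is outside the scope of the sketch.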
Files
| Name | Size | Checksum |
|---|---|---|
| 2021-09-26 - Biodiversity Genomics 2021 - Core Analyses Pipelines.pdf | 1.4 MB | md5:26aad6d77bd5a0738dc47673812f8010 |
Additional details
Funding
- Wellcome Trust: Darwin Tree of Life (218328)