A phased genome assembly for allele-specific analysis in Trypanosoma brucei
- 1. Department of Veterinary Sciences, Experimental Parasitology, Ludwig-Maximilians-Universität München, Lena-Christ-Str. 48, 82152 Planegg-Martinsried, Germany; Biomedical Center Munich, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Großhaderner Str. 9, 82152 Planegg-Martinsried, Germany
Description
This repository contains the data analysis workflows, the supplementary tables, and the genome and annotation versions from the manuscript entitled "A phased genome assembly for allele-specific analysis in Trypanosoma brucei" (https://doi.org/10.1101/2021.04.13.439624).
Because of space limitations on Zenodo, the full datasets could not be uploaded for some workflows. For those workflows, we provide the directory tree of the complete data analysis folder.
Abstract
Many eukaryotic organisms are diploid or even polyploid, i.e. they harbour two or more independent copies of each chromosome. Yet, to date most reference genome assemblies represent a mosaic consensus sequence in which the homologous chromosomes have been collapsed into one sequence. This procedure generates sequence artefacts and impedes analyses of allele-specific mechanisms. Here, we report the allele-specific genome assembly of the diploid unicellular protozoan parasite Trypanosoma brucei.
As a first step, we called variants on the allele-collapsed assembly of the T. brucei Lister 427 isolate using short-read error-corrected PacBio reads. We identified ~96,000 heterozygous variants across the genome (average of 4.2 variants / kb), and observed that the variant density along the chromosomes was highly uneven. Several long (>100 kb) regions of loss of heterozygosity (LOH) were identified, suggesting recent recombination events between the alleles. By analysing available genomic sequencing data of multiple Lister 427-derived clones, we found that most LOH regions were conserved, except for some that were specific to clones adapted to the insect lifecycle stage. Surprisingly, we also found that some Lister 427 clones were aneuploid. We found evidence of trisomy in chromosome five (Chr5), Chr2, Chr6 and Chr7. Moreover, by analysing RNA-seq data, we showed that the transcript level is proportional to the ploidy, indicating the lack of a general expression control at the transcript level in T. brucei.
As a second step, to generate an allele-specific genome assembly, we used two powerful datatypes for haplotype reconstruction: raw long reads (PacBio) and chromosome conformation (Hi-C) data. With this approach, we were able to assign 99.5% of all the heterozygous variants to a specific homologous chromosome, building a 66 Mb T. brucei Lister 427 allele-specific genome assembly. Using this assembly, we identified genes with allele-specific premature termination codons and showed that differences in allele-specific expression at the level of transcription and translation can be accurately monitored with the fully phased genome assembly.
The resulting reference-grade allele-specific genome assembly of T. brucei will enable the analysis of allele-specific phenomena, as well as a better understanding of recombination and evolutionary processes. Furthermore, it will serve as a standard to ‘benchmark’ much-needed automatic genome assembly pipelines for highly heterozygous wild species isolates.
Files
01_Genome_correction_pipeline_complete_tree.txt
Total size: 1.0 GB
Checksum | Size |
---|---|
md5:e77696985f1506f6ede8e04724ae36a1 | 71.9 MB |
md5:5468c0ee0ea61b468f351d24e4907c99 | 220.0 kB |
md5:57aafcf8e64fc8c3fcb520b68805f673 | 38.1 MB |
md5:25422d45a4bc8ba87d87bc68f3f15626 | 804.6 kB |
md5:d3d8755aeef83f0eccd5b7fb30cb3d44 | 68.6 MB |
md5:3700033efc1f10caffba38f9653f88e7 | 163.6 MB |
md5:a7baa33efffe4ba1ab406a8bf5ea5ec1 | 47.0 MB |
md5:c7718d04a838b01c075afad3e8fbf923 | 59.5 MB |
md5:f056df71010d256c304d766f87569d64 | 200.4 MB |
md5:731a9e4e7c24a328bc60ab0f4f900a3c | 5.2 MB |
md5:b2385a982665d2caf43eef939c6750a2 | 47.3 MB |
md5:1b9c62431eb7c42874dcd97eac956b9e | 1.2 MB |
md5:244ae000b809e666177b935edadc1e80 | 36.0 MB |
md5:ab3027dbbd93e9b99e5271562e440493 | 19.3 MB |
md5:6f782cc5a1656c195c2700bb9f3dfa0e | 22.4 kB |
md5:c0360d4511f8bd85e629fb848cb3c7f1 | 38.3 MB |
md5:83f744d2ed65a96b32f39be2ac383e39 | 4.0 kB |
md5:053d5b5fdba9bb4ff16c7e6e057f4d4c | 5.7 MB |
md5:616279c200e6c6d9752ad239a5c3b26d | 92.5 MB |
md5:30abcd59d702f508961e49258c0643bc | 18.7 kB |
md5:aedbe2cb1c689d0b33b1d7018aa55103 | 131.9 MB |
md5:abe5b800f12f126f6daba140785da616 | 759.6 kB |
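
The MD5 checksums listed for the files above can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, assuming the file has already been saved locally; the function names `md5_of_file` and `verify_download` and the path in the usage note are hypothetical and not part of the repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a local file against a checksum string.

    `expected` may be given with or without the 'md5:' prefix
    used in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_download("downloaded_file", "md5:e77696985f1506f6ede8e04724ae36a1")` would return `True` only if the local file (a placeholder name here) matches the first checksum in the listing.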
Additional details
Related works
- Is referenced by
- Preprint: 10.1101/2021.04.13.439624 (DOI)