Published April 9, 2021 | Version v1.0
Dataset Open

A phased genome assembly for allele-specific analysis in Trypanosoma brucei

  • 1. Department of Veterinary Sciences, Experimental Parasitology, Ludwig-Maximilians-Universität München, Lena-Christ-Str. 48, 82152 Planegg-Martinsried, Germany; Biomedical Center Munich, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Großhaderner Str. 9, 82152 Planegg-Martinsried, Germany

Description

This repository contains the data analysis workflows, the supplementary tables, and the genome and annotation versions from the manuscript entitled "A phased genome assembly for allele-specific analysis in Trypanosoma brucei" (https://doi.org/10.1101/2021.04.13.439624).

Because of space limitations on Zenodo, the full datasets could not be uploaded for some workflows. For those workflows, we provide the directory tree of the complete data analysis folder.

Abstract

Many eukaryotic organisms are diploid or even polyploid, i.e. they harbour two or more independent copies of each chromosome. Yet, to date most reference genome assemblies represent a mosaic consensus sequence in which the homologous chromosomes have been collapsed into one sequence. This procedure generates sequence artefacts and impedes analyses of allele-specific mechanisms. Here, we report the allele-specific genome assembly of the diploid unicellular protozoan parasite Trypanosoma brucei.

As a first step, we called variants on the allele-collapsed assembly of the T. brucei Lister 427 isolate using short-read error-corrected PacBio reads. We identified ~96 thousand heterozygous variants across the genome (average of 4.2 variants / kb), and observed that the variant density along the chromosomes was highly uneven. Several long (>100 kb) regions of loss-of-heterozygosity (LOH) were identified, suggesting recent recombination events between the alleles. By analysing available genomic sequencing data of multiple Lister 427 derived clones, we found that most LOH regions were conserved, except for some that were specific to clones adapted to the insect lifecycle stage. Surprisingly, we also found that some Lister 427 clones were aneuploid. We found evidence of trisomy in chromosome five (Chr5), Chr2, Chr6 and Chr7. Moreover, by analysing RNA-seq data, we showed that the transcript level is proportional to the ploidy, indicating the lack of a general expression control at the transcript level in T. brucei.
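The idea of scanning for long stretches without heterozygous variants can be illustrated with a small sketch. This is not the workflow used in the manuscript; it is a minimal illustration that assumes 0-based variant positions on a single chromosome, and the function name, the 10 kb window size, and the zero-variant threshold are all hypothetical choices. Only the 100 kb minimum length is taken from the text above.

```python
from typing import Iterable, List, Tuple


def low_variant_regions(
    positions: Iterable[int],
    chrom_length: int,
    window: int = 10_000,       # hypothetical window size
    max_variants: int = 0,      # hypothetical per-window threshold
    min_length: int = 100_000,  # ">100 kb" as stated in the abstract
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals made of consecutive windows
    that each contain at most `max_variants` variants and together span
    at least `min_length` bases. Positions are assumed to be 0-based and
    smaller than `chrom_length`."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1

    regions: List[Tuple[int, int]] = []
    run_start = None  # index of the first window in the current low-variant run
    for i, count in enumerate(counts):
        if count <= max_variants:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            start, end = run_start * window, i * window
            if end - start >= min_length:
                regions.append((start, end))
            run_start = None
    # Close a run that extends to the end of the chromosome.
    if run_start is not None:
        start, end = run_start * window, chrom_length
        if end - start >= min_length:
            regions.append((start, end))
    return regions


# Toy chromosome of 500 kb with 4 variants / kb, except for a
# variant-free stretch between 200 kb and 350 kb.
toy_positions = list(range(0, 200_000, 250)) + list(range(350_000, 500_000, 250))
toy_regions = low_variant_regions(toy_positions, chrom_length=500_000)
```

On this toy input, `toy_regions` holds the single interval from 200 kb to 350 kb. Real data would need a less strict threshold, since sequencing errors and mapping artefacts can leave occasional variant calls inside genuine LOH regions.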

As a second step, to generate an allele-specific genome assembly, we used two powerful datatypes for haplotype reconstruction: raw long reads (PacBio) and chromosome conformation (Hi-C) data. With this approach, we were able to assign 99.5% of all the heterozygous variants to a specific homologous chromosome, building a 66 Mb long T. brucei Lister 427 allele-specific genome assembly. Using this assembly, we identified genes with allele-specific premature termination codons and showed that differences in allele-specific expression at the level of transcription and translation can be accurately monitored with the fully phased genome assembly.

The obtained reference-grade allele-specific genome assembly of T. brucei will enable the analysis of allele-specific phenomena, as well as the better understanding of recombination and evolutionary processes. Furthermore, it will serve as a standard to ‘benchmark’ much needed automatic genome assembly pipelines for highly heterozygous wild species isolates.

Notes

The work was funded by an ERC Starting Grant (3D_Tryps 715466). R.O.C. was supported by a Georg Forster Fellowship (Humboldt Foundation).

Files

01_Genome_correction_pipeline_complete_tree.txt

Files (1.0 GB)

Checksum Size
md5:e77696985f1506f6ede8e04724ae36a1 71.9 MB
md5:5468c0ee0ea61b468f351d24e4907c99 220.0 kB
md5:57aafcf8e64fc8c3fcb520b68805f673 38.1 MB
md5:25422d45a4bc8ba87d87bc68f3f15626 804.6 kB
md5:d3d8755aeef83f0eccd5b7fb30cb3d44 68.6 MB
md5:3700033efc1f10caffba38f9653f88e7 163.6 MB
md5:a7baa33efffe4ba1ab406a8bf5ea5ec1 47.0 MB
md5:c7718d04a838b01c075afad3e8fbf923 59.5 MB
md5:f056df71010d256c304d766f87569d64 200.4 MB
md5:731a9e4e7c24a328bc60ab0f4f900a3c 5.2 MB
md5:b2385a982665d2caf43eef939c6750a2 47.3 MB
md5:1b9c62431eb7c42874dcd97eac956b9e 1.2 MB
md5:244ae000b809e666177b935edadc1e80 36.0 MB
md5:ab3027dbbd93e9b99e5271562e440493 19.3 MB
md5:6f782cc5a1656c195c2700bb9f3dfa0e 22.4 kB
md5:c0360d4511f8bd85e629fb848cb3c7f1 38.3 MB
md5:83f744d2ed65a96b32f39be2ac383e39 4.0 kB
md5:053d5b5fdba9bb4ff16c7e6e057f4d4c 5.7 MB
md5:616279c200e6c6d9752ad239a5c3b26d 92.5 MB
md5:30abcd59d702f508961e49258c0643bc 18.7 kB
md5:aedbe2cb1c689d0b33b1d7018aa55103 131.9 MB
md5:abe5b800f12f126f6daba140785da616 759.6 kB

Additional details

Related works

Is referenced by
Preprint: 10.1101/2021.04.13.439624 (DOI)

Funding

3D_Tryps – The role of three-dimensional genome architecture in antigenic variation 715466
European Commission