Published September 30, 2021 | Version v7
Dataset | Open access

Acanthamoeba castellanii genome assembly and infection by Legionella pneumophila

  • 1. Institut Pasteur, Department of Genomes and Genetics


Data associated with the publication "Regulation of the Acanthamoeba castellanii genome upon infection by Legionella pneumophila". The record contains four archives, each associated with a GitHub repository, a "shared assets" archive containing processed files used by some of the repositories, and a supplementary analyses archive (`supp_analyses.tar.gz`). The code from each GitHub repository is embedded in the corresponding tarball, along with its input and output data. Each part of the analysis is organized as an independent Snakemake pipeline.


For convenient reanalysis, the genomes, annotations and merged contact maps used in the publication are provided in the `shared_assets.tar.gz` archive. The infection analysis results are located in the `data/output` folder of `Acastellanii_legionella_infection.tar.gz`.
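A minimal Python sketch, using only the standard-library `tarfile` module, for unpacking a downloaded archive and locating the infection results. It assumes the archives have been downloaded to the working directory. The function names `members_under` and `extract_archive` are hypothetical, and because the name of the top-level folder inside each tarball is not documented here, the code matches `data/output` at any depth rather than assuming a fixed prefix.

```python
import tarfile
from pathlib import Path, PurePosixPath


def members_under(archive_path, folder="data/output"):
    """Return names of archive members located inside `folder`, at any depth.

    No top-level directory name is assumed: a member matches if the
    components of `folder` appear consecutively in its path and are
    followed by at least one more component (a file or subfolder).
    """
    target = PurePosixPath(folder).parts
    matches = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            # Slide over path components looking for the folder sequence.
            for i in range(len(parts) - len(target)):
                if parts[i:i + len(target)] == target:
                    matches.append(member.name)
                    break
    return matches


def extract_archive(archive_path, dest):
    """Extract a .tar.gz archive into `dest` and return the destination Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safe "data" extraction filter when this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    return dest
```

For example, `members_under("Acastellanii_legionella_infection.tar.gz")` would list the result files, and `extract_archive("shared_assets.tar.gz", "shared_assets")` would unpack the shared genomes, annotations and contact maps into a local folder.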

All archives can be downloaded at the bottom of the page.


Hybrid genome assembly:

Code and output data of the hybrid pipeline used to assemble the genomes of two A. castellanii strains (Neff and C3), combining Illumina shotgun reads, Hi-C reads and Oxford Nanopore long reads.


Archive: Acastellanii_hybrid_assembly.tar.gz


Genome annotation:

Genome annotation pipeline used for functional annotation of A. castellanii strains C3 and Neff, and associated output files.


Archive: Acastellanii_genome_annotation.tar.gz


Genome analyses:

Code and data related to general analyses of genomic properties of A. castellanii strains C3 and Neff.


Archive: Acastellanii_genome_analysis.tar.gz


Infection analyses:

Code and data related to the analysis of structural changes in the A. castellanii C3 genome during infection by L. pneumophila.


Archive: Acastellanii_legionella_infection.tar.gz

Shared assets:

This archive contains processed files (genomes, annotations, Hi-C matrices, differential expression results) that are useful for reanalysis and are automatically retrieved when the pipelines of some repositories are executed.

Archive: shared_assets.tar.gz


Supp. analyses:

Code and data related to short ad-hoc analyses on the genomic location of specific sequences in the genomes of C3 and Neff. The archive contains two subfolders: `telomere_repeats` where we analyse the distribution of TTAGGG subtelomeric repeats throughout the A. castellanii assemblies, and `C3_exclusive_regions` where we visualize the genomic distribution of C3-specific sequences (i.e. absent from Neff) along the C3 assembly.
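The kind of scan involved in the `telomere_repeats` analysis can be illustrated with a short Python sketch. This is not the authors' code: it simply reports tandem runs of TTAGGG (or its reverse complement, CCCTAA) in a sequence and counts motif occurrences per window, which is one way to look at how the repeats are distributed along a contig. The function names and the `min_copies` and `window` parameters are hypothetical choices made for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_telomeric_runs(seq, motif="TTAGGG", min_copies=3):
    """Return sorted (start, end, strand) tuples for tandem runs of `motif`.

    '+' runs are repeats of `motif`, '-' runs are repeats of its reverse
    complement. Coordinates are 0-based and half-open. `min_copies` is an
    illustrative threshold, not a value taken from the original analysis.
    """
    seq = seq.upper()
    runs = []
    for strand, unit in (("+", motif), ("-", reverse_complement(motif))):
        # e.g. "(?:TTAGGG){3,}": at least `min_copies` consecutive copies
        pattern = re.compile(f"(?:{unit}){{{min_copies},}}")
        for m in pattern.finditer(seq):
            runs.append((m.start(), m.end(), strand))
    return sorted(runs)


def motif_counts_per_window(seq, window, motif="TTAGGG"):
    """Count motif occurrences on both strands in consecutive windows."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    counts = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        # str.count is non-overlapping; occurrences crossing a window edge are missed.
        counts.append(chunk.count(motif) + chunk.count(rc))
    return counts
```

For instance, `find_telomeric_runs(contig_seq)` on a contig read with any FASTA parser returns `(start, end, strand)` tuples that can be compared with the contig length to see whether a run sits near one of its ends.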


Archive: supp_analyses.tar.gz


Files (1.8 GB)

Size
334.8 MB
249.0 MB
116.4 MB
347.6 MB
691.0 MB
30.7 MB