Published December 5, 2023 | Version v1.0.1
Dataset Open

Roving methyltransferases generate a mosaic epigenetic landscape and influence evolution in Bacteroides fragilis group

  • 1. Bacterial Pathogenesis and Antimicrobial Resistance Unit, LCIM, NIAID, NIH, Bethesda, MD
  • 2. National Institutes of Health Clinical Center, National Institutes of Health, Bethesda, MD

Description

This repository contains the code and data for reproducing the results and figures in the associated manuscript:

Roving methyltransferases generate a mosaic epigenetic landscape and influence evolution in Bacteroides fragilis group

BFG-Analysis-main/ includes scripts and data to process Nanopore and Illumina reads, assemble BFG genomes, polish those genomes, and correct out-of-frame ORFs for MLST alignment. It also includes the GenBank reference genomes referred to in the manuscript.

tree_files/ includes phylogenetic tree files and aligned sequence files used in Figures 1, 5, and 6.

acessory_regions/ includes a .fasta file of the accessory regions in each genome from the study for which they could be calculated (using PPanGGOLiN/panRGP, per the methods in the manuscript).

genomes/ contains different versions of BFG genomes with and without different types of polishing and frame-correction:

  • genomes/pacbio_uncorrected/ contains genomes sequenced with PacBio and assembled with PacBio software.
    • Analyzed for MLST trees in Figures: 1, 5, 6
    • Analyzed in Figures: 5, 6, S7, S9 - S16
  • genomes/nanopore_racon_medaka/ contains genomes sequenced with Nanopore, assembled with Flye, then polished with racon and medaka.
    • Analyzed in Figures: 5, 6, S7, S9 - S16
  • genomes/nanopore_racon_medaka_pilon/ contains genomes sequenced with Nanopore, assembled with Flye, polished with racon and medaka, then polished with Illumina reads with pilon.
    • Analyzed in Figures: 5, 6, S7, S9 - S16
  • genomes/proovframe_BFG_genomes/ contains genomes from the pacbio_uncorrected/, nanopore_racon_medaka/, and nanopore_racon_medaka_pilon/ directories that were frame-corrected with Proovframe.
    • Analyzed for Figures: 2, 3, 4, S2, S3, S4, S5, S6, S8
  • genomes/nanopore_MEGAN_corrected/ contains genomes from the nanopore_racon_medaka/ and nanopore_racon_medaka_pilon/ directories that were frame-corrected with MEGAN.
    • Analyzed for MLST trees in Figures: 1, 5, 6

nanodisco_difference_files/ contains Nanodisco intermediate files reporting the difference in nanopore signal between native and PCR-generated gDNA at each genomic position. The positions refer to the genomes in genomes/nanopore_racon_medaka_pilon/, genomes/nanopore_racon_medaka/, and genomes/pacbio_uncorrected/. Each isolate has a genome in only one of these directories.
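
Because each isolate's reference genome sits in exactly one of the three directories, a small lookup helper can help pair a Nanodisco difference file with the right genome. The following is a minimal sketch, not part of the deposited code. It assumes that genome file names begin with the isolate identifier followed by a dot (for example a hypothetical isoA.fasta); the function name find_reference_genome and the root argument (the folder where the archive was extracted) are illustrative and do not come from the repository.

```python
from pathlib import Path

# Directories named in the description above, relative to the extracted archive.
GENOME_DIRS = (
    "genomes/nanopore_racon_medaka_pilon",
    "genomes/nanopore_racon_medaka",
    "genomes/pacbio_uncorrected",
)


def find_reference_genome(root, isolate):
    """Return the single genome file for `isolate` under `root`.

    ASSUMPTION: the text before the first '.' in a file name is the
    isolate identifier. Adjust the matching rule to the real file names.
    """
    matches = []
    for rel in GENOME_DIRS:
        directory = Path(root) / rel
        if not directory.is_dir():
            continue
        for path in sorted(directory.iterdir()):
            if path.is_file() and path.name.split(".", 1)[0] == isolate:
                matches.append(path)
    # The description says each isolate appears in only one directory,
    # so anything other than exactly one hit is treated as an error.
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one genome for {isolate!r}, found {len(matches)}"
        )
    return matches[0]
```

Raising an error on zero or multiple matches, rather than returning the first hit, keeps a mismatched naming assumption from silently pairing a difference file with the wrong genome.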

Files

Tisza_Smith_et_al_BFG_extra.zip (2.6 GB)
md5:ae501727b52bc859de6f25eb1be5fd1b
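
The MD5 checksum listed above can be used to confirm that the 2.6 GB archive downloaded completely. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and archive_is_intact are illustrative, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum listed for Tisza_Smith_et_al_BFG_extra.zip on this record.
EXPECTED_MD5 = "ae501727b52bc859de6f25eb1be5fd1b"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Stream the file so the whole archive never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, archive_is_intact("Tisza_Smith_et_al_BFG_extra.zip") should return True for a complete download saved under that name in the current directory.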