Published January 21, 2025 | Version v2
Dataset Open

Exploring the global metaplasmidome: unravelling plasmid landscapes and the spread of antibiotic resistance genes across diverse ecosystems

Authors/Creators

Description

Plasmid content was predicted from assembled data that were either already publicly available or constructed from reads for this study. The assemblies supplied by Pasolli and colleagues (Pasolli et al., 2019), the MetaSUB consortium (Danko et al., 2020) and Tara Oceans (Tully et al., 2018) were used for the human microbiome, the built environment and the marine ecosystem, respectively. For assembly in the current study, reads from metagenomes were selected from two main databases. For the soil ecosystem, the metagenomes were selected from the dedicated curated database “TerrestrialMetagenomeDB” (Corrêa et al., 2020).

If the metagenomes were not already assembled, reads were assembled using MEGAHIT 1.2.9 with the meta-large preset (Li et al., 2015) after cleaning the data with bbduk2 (qtrim=rl trimq=28 minlen=25 maq=20 ktrim=r k=25 mink=11 and a list of adapters to remove) from the BBTools suite (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/).
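The cleaning and assembly step above can be sketched as two command lines. The short Python helper below only builds the argument lists and does not run anything. The trimming options are copied from the description; everything else is an assumption made for illustration: the file names, the use of `bbduk.sh` with `in=`, `out=` and `ref=` (the record names bbduk2, whose exact invocation is not given), and single-end input passed to MEGAHIT with `-r`.

```python
import shlex
from typing import List

# Trimming and filtering options quoted from the dataset description.
BBDUK_OPTIONS = [
    "qtrim=rl", "trimq=28", "minlen=25", "maq=20",
    "ktrim=r", "k=25", "mink=11",
]


def build_cleaning_command(reads_in: str, reads_out: str, adapters: str,
                           program: str = "bbduk.sh") -> List[str]:
    """Return a read-cleaning command as an argument list.

    Assumption: bbduk.sh with in=/out=/ref= stands in for the bbduk2
    call, which the record does not spell out in full.
    """
    return [program, f"in={reads_in}", f"out={reads_out}",
            f"ref={adapters}", *BBDUK_OPTIONS]


def build_assembly_command(clean_reads: str, out_dir: str) -> List[str]:
    """Return a MEGAHIT command using the meta-large preset.

    Assumption: single-end reads (-r); paired-end data would use -1/-2.
    """
    return ["megahit", "--presets", "meta-large",
            "-r", clean_reads, "-o", out_dir]


# Hypothetical file names, shown only to display the resulting commands.
print(shlex.join(build_cleaning_command("raw.fq.gz", "clean.fq.gz", "adapters.fa")))
print(shlex.join(build_assembly_command("clean.fq.gz", "assembly_out")))
```

Returning argument lists rather than shell strings means the result could be passed directly to `subprocess.run` without quoting problems.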

Plasmids were predicted for each assembly using both reference-based and reference-free approaches, as described in previous work (Hilpert et al., 2021; Hennequin et al., 2022) and available on GitHub (https://github.com/meb-team/PlasSuite/). The databases used for the reference-based approach included chromosomes (archaea and bacteria) and plasmids from RefSeq, as well as the MOB-suite tool (Robertson and Nash, 2018), SILVA (Quast et al., 2013) and phylogenetic markers carried by chromosomes (Wu et al., 2013). The database created for this purpose is available at https://github.com/meb-team/PlasSuite/?tab=readme-ov-file#1-prepare-or-download-your-databases. Two reference-free methods, PlasFlow (Krawczyk et al., 2018) and PlasClass (Pellow et al., 2020), were applied to contigs that the first step had assigned neither to chromosomes (discarded) nor to plasmids (retained). Previously undetected viruses were removed using ViralVerify (https://github.com/ablab/viralVerify) (Antipov et al., 2020), which also provides a plasmid/non-plasmid classification. This step would also remove potential plasmid-phage elements as described by Pfeifer et al. (2021), but would minimise false positives. Eukaryotic contamination was removed by aligning the sequences against the NT database and human chromosomes (GRCh38) using minimap2 (Li, 2018) with the -x asm5 option. Contigs mapping with 95% identity over at least 80% coverage were removed. The predicted plasmids, hereafter referred to as plasmid-like sequences (PLSs), were grouped by the 27 "scientific names" defined in the SRA metadata (air, lake, wetland…), subsequently called ecosystems. These ecosystems were grouped into 9 biomes (Supplementary Table 4). The data were then dereplicated by ecosystem using cd-hit-est with a threshold of 99%.
The dereplicated PLSs were then clustered using MMseqs2 (Steinegger and Söding, 2017) with 80% coverage and 90% identity (--min-seq-id 0.90 -c 0.8 --cov-mode 1 --cluster-mode 2 --alignment-mode 3 --kmer-per-seq-scale 0.2) to define plasmid-like clusters (PLCs).
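The eukaryotic-contamination filter described above (95% identity over at least 80% coverage) can be illustrated with a minimal sketch that reads minimap2 output in PAF format. The record does not say how identity and coverage were computed, so the definitions here are assumptions: identity is taken as matching residues divided by alignment block length (PAF columns 10 and 11), coverage as the aligned fraction of the query contig, and each alignment is evaluated on its own rather than merged with other hits of the same contig. The function name is hypothetical.

```python
from typing import Iterable, Set


def contaminant_contigs(paf_lines: Iterable[str],
                        min_identity: float = 0.95,
                        min_coverage: float = 0.80) -> Set[str]:
    """Return the names of contigs with at least one qualifying alignment.

    PAF columns used (1-based): 1 query name, 2 query length,
    3 query start, 4 query end, 10 residue matches, 11 block length.
    """
    flagged: Set[str] = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query_name = fields[0]
        query_length = int(fields[1])
        query_start, query_end = int(fields[2]), int(fields[3])
        matches, block_length = int(fields[9]), int(fields[10])
        if query_length == 0 or block_length == 0:
            continue
        identity = matches / block_length                      # assumed definition
        coverage = (query_end - query_start) / query_length    # assumed definition
        if identity >= min_identity and coverage >= min_coverage:
            flagged.add(query_name)
    return flagged
```

Contigs returned by this function would be the ones removed before dereplication; reading the alignments from a file is a matter of passing the open file handle as `paf_lines`.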

The PLC sequences are included in the file "predicted_PLC.fasta" and their main features are described in the file "metadata_PLC.tsv":

  • fasta_id: fasta identification of the PLC
  • ecosystem: ecosystem from which the PLC originates
  • biome: biome of the ecosystem
  • latitude, longitude: GPS coordinates of the ecosystem
  • length: PLC length
  • map_markers: plasmid marker genes detected by PlasSuite (Hilpert et al., 2021)
  • map_ncbi: PLCs present in the RefSeq plasmid database (Hilpert et al., 2021)
  • nb_genes: number of genes detected by Prokka as implemented in PlasSuite
  • nb_args: ARGs detected by PlasSuite
  • plascad: results from plascad (Che et al., 2021)
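As an illustration of how the columns listed above might be used, the following sketch summarises PLCs per biome from "metadata_PLC.tsv". It assumes that the file is tab-separated with a header row carrying exactly these column names, that `length` is an integer, and that `nb_args` is a count (non-numeric or empty values are treated as zero). None of this is confirmed by the record, and the function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, TextIO


def _to_int(value) -> int:
    """Convert a field to int, treating missing or non-numeric values as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def summarize_by_biome(handle: TextIO) -> Dict[str, Dict[str, int]]:
    """Count PLCs, ARG-carrying PLCs and total length for each biome."""
    reader = csv.DictReader(handle, delimiter="\t")
    summary = defaultdict(lambda: {"plcs": 0, "with_args": 0, "total_length": 0})
    for row in reader:
        stats = summary[row["biome"]]
        stats["plcs"] += 1
        stats["total_length"] += _to_int(row.get("length"))
        # Assumption: nb_args holds the number of ARGs found on the PLC.
        if _to_int(row.get("nb_args")) > 0:
            stats["with_args"] += 1
    return dict(summary)
```

A typical call would open the file with `open("metadata_PLC.tsv", newline="")` and pass the handle to `summarize_by_biome`; the same pattern works with `ecosystem` in place of `biome`.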

 

 

Antipov, D., Raiko, M., Lapidus, A., and Pevzner, P.A. (2020) MetaviralSPAdes: assembly of viruses from metagenomic data. Bioinformatics 36: 4126–4129.

Che, Y., Yang, Y., Xu, X., Břinda, K., Polz, M.F., Hanage, W.P., and Zhang, T. (2021) Conjugative plasmids interact with insertion sequences to shape the horizontal transfer of antimicrobial resistance genes. Proceedings of the National Academy of Sciences 118: e2008731118.

Corrêa, F.B., Saraiva, J.P., Stadler, P.F., and da Rocha, U.N. (2020) TerrestrialMetagenomeDB: a public repository of curated and standardized metadata for terrestrial metagenomes. Nucleic Acids Res 48: D626–D632.

Danko, D., Bezdan, D., Afshinnekoo, E., Ahsanuddin, S., Bhattacharya, C., Butler, D.J., et al. (2020) Global Genetic Cartography of Urban Metagenomes and Anti-Microbial Resistance. bioRxiv 724526.

Hennequin, C., Forestier, C., Traore, O., Debroas, D., and Bricheux, G. (2022) Plasmidome analysis of a hospital effluent biofilm: Status of antibiotic resistance. Plasmid 122: 102638.

Hilpert, C., Bricheux, G., and Debroas, D. (2021) Reconstruction of plasmids by shotgun sequencing from environmental DNA: which bioinformatic workflow? Briefings in Bioinformatics 22: bbaa059.

Krawczyk, P.S., Lipinski, L., and Dziembowski, A. (2018) PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res 46: e35.

Li, D., Liu, C.-M., Luo, R., Sadakane, K., and Lam, T.-W. (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31: 1674–1676.

Li, H. (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094–3100.

Pasolli, E., Asnicar, F., Manara, S., Zolfo, M., Karcher, N., Armanini, F., et al. (2019) Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle. Cell 176: 649-662.e20.

Pellow, D., Mizrahi, I., and Shamir, R. (2020) PlasClass improves plasmid sequence classification. PLOS Computational Biology 16: e1007781.

Pfeifer, E., Moura de Sousa, J.A., Touchon, M., and Rocha, E.P.C. (2021) Bacteria have numerous distinctive groups of phage–plasmids with conserved phage and variable plasmid gene repertoires. Nucleic Acids Res 49: 2655–2673.

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41: D590–D596.

Robertson, J. and Nash, J.H.E. (2018) MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microbial Genomics 4:.

Steinegger, M. and Söding, J. (2017) MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology.

Tully, B.J., Graham, E.D., and Heidelberg, J.F. (2018) The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Scientific Data 5: 170203.

Wu, D., Jospin, G., and Eisen, J.A. (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS One 8:.

 

Files (7.7 GB)

md5:f6a837076702b2475b9d1b5fd55bfbb7 (52.2 MB)
md5:19834991c0c7dfa730c1b860feafac2a (7.7 GB)