There is a newer version of this record available.

Dataset Open Access

AMPSphere : the worldwide survey of prokaryotic antimicrobial peptides

Santos-Júnior, Célio Dias; Schmidt, Thomas S.B.; Fullam, Anthony; Duan, Yiqian; Bork, Peer; Zhao, Xing-Ming; Coelho, Luis Pedro

 

INTRODUCTION

AMPSphere is a comprehensive catalog of antimicrobial peptides (AMPs) predicted using Macrel (DOI: 10.7717/peerj.10555) from 63,410 public metagenomes, the proGenomes v2.2 database (82,400 high-quality microbial genomes), and ca. 4,000 non-whitelisted microbial genomes from NCBI.

 

GENERATION

Peptides were predicted using Macrel. Singleton peptides were removed, except those with a direct hit to the DRAMP database.

Redundant peptides were hierarchically clustered using CD-HIT (version 4.6) at 100%, 85%, 75% and 50% amino acid identity (with 90% overlap of the shorter peptide). The resulting clusters were sorted by decreasing size and numbered as families. Each level of clustering was called a SPHERE and was used to understand the structure of AMPs according to their orthology.
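
As a rough illustration of the clustering levels described above, the following sketch builds one CD-HIT command per identity level. It is not the pipeline used to build AMPSphere. The output file names, the word sizes (`-n`, taken from the general CD-HIT guidance for each identity range), and the choice to feed each level's representatives into the next are assumptions. Only the identity levels and the 90% overlap of the shorter peptide (`-aS 0.9`) come from the description.

```python
# (identity in percent, CD-HIT word size); word sizes are assumed, not
# taken from the AMPSphere pipeline.
LEVELS = [(100, 5), (85, 5), (75, 5), (50, 3)]

def cdhit_commands(fasta, prefix="sphere"):
    """Return one cd-hit argument list per clustering level.

    Each level clusters the representatives written by the previous one
    (an assumed reading of "hierarchically clustered").
    """
    commands = []
    current = fasta
    for percent, word_size in LEVELS:
        output = f"{prefix}_{percent}.faa"  # hypothetical output name
        commands.append([
            "cd-hit",
            "-i", current,                # input FASTA
            "-o", output,                 # representatives of this level
            "-c", f"{percent / 100:.2f}", # identity threshold
            "-n", str(word_size),         # word size
            "-aS", "0.9",                 # coverage of the shorter sequence
            "-d", "0",                    # keep full names in .clstr output
        ])
        current = output
    return commands
```

Each returned list can be passed to `subprocess.run` on a machine where `cd-hit` is installed.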

Nucleotide sequences of the most frequent variant of each AMP are also included in this version of AMPSphere.

 

STATISTICS

AMPSphere v.2021-02 contains 863,498 sequences (average length: 36 amino acids; range: 8-98). The DRAMP database was used to identify sequences with strict homology to confirmed reference peptides. This approach showed that 2,488 peptides in our dataset had been previously confirmed.
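
The length statistics above can be recomputed from the peptide FASTA file. The sketch below is a minimal reader, assuming a standard multi-FASTA layout. The function names are illustrative and not part of any AMPSphere tooling.

```python
import gzip
from statistics import mean

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def length_summary(lines):
    """Return count, mean, minimum and maximum sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {"n": len(lengths), "mean": mean(lengths),
            "min": min(lengths), "max": max(lengths)}

def summarize_gzipped(path):
    """Summarize a gzip-compressed FASTA file such as the .faa.gz file."""
    with gzip.open(path, "rt") as handle:
        return length_summary(handle)
```

For example, `summarize_gzipped("AMPSphere_v.2021-02.faa.gz")` would report the number of peptides and their length range for a downloaded copy of the file.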

 

IDENTIFIERS

Peptides are named:

>AMP10.XXX_XXX

Where `XXX_XXX` is a unique numerical identifier (starting at zero). Numbers were assigned in order of decreasing number of copies, so that the lower the number, the more copies of that peptide were present in the input data.
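
A small helper can convert between accessions and their numerical identifiers. This is a sketch, assuming that `XXX_XXX` always consists of two groups of three digits. The function names are hypothetical.

```python
import re

# Assumed layout: "AMP10." followed by two three-digit groups.
_ACCESSION = re.compile(r"^>?AMP10\.(\d{3})_(\d{3})$")

def accession_rank(accession):
    """Return the numerical identifier of an accession as an integer."""
    match = _ACCESSION.match(accession.strip())
    if match is None:
        raise ValueError(f"not an AMPSphere accession: {accession!r}")
    return int(match.group(1) + match.group(2))

def format_accession(rank):
    """Format an integer identifier as an AMP10.XXX_XXX accession."""
    if not 0 <= rank <= 999_999:
        raise ValueError(f"identifier out of range: {rank}")
    digits = f"{rank:06d}"
    return f"AMP10.{digits[:3]}_{digits[3:]}"
```

The leading `>` of a FASTA header line is accepted, so header lines can be passed in directly.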

 

FILES

README.md
This file.

 

AMPSphere_v.2021-02.fna
Multi-fasta with AMPSphere gene sequences (nucleotide).

 

AMPSphere_v.2021-02.faa
Multi-fasta with AMPSphere peptide sequences (amino acid).

 

AMPSphere_v.2021-02.features.tsv
Table relating each AMP accession to the features used for its prediction.
Columns:

  1. AMP accession
  2. tinyAA
  3. smallAA
  4. aliphaticAA
  5. aromaticAA
  6. nonpolarAA
  7. polarAA
  8. chargedAA
  9. basicAA
  10. acidicAA
  11. charge
  12. pI
  13. aindex
  14. instaindex
  15. boman
  16. hydrophobicity
  17. hmoment
  18. SA.Group1.residue0
  19. SA.Group2.residue0
  20. SA.Group3.residue0
  21. HB.Group1.residue0
  22. HB.Group2.residue0
  23. HB.Group3.residue0

For more details about these features, see the Macrel manuscript (reference 1).
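
The column list above can be used to read the features table into per-peptide dictionaries. The sketch below assumes that all 22 feature values are numeric and that any line whose first field does not start with `AMP10.` (such as a header) can be skipped. Neither assumption is confirmed by this README.

```python
import csv

FEATURE_COLUMNS = [
    "accession", "tinyAA", "smallAA", "aliphaticAA", "aromaticAA",
    "nonpolarAA", "polarAA", "chargedAA", "basicAA", "acidicAA",
    "charge", "pI", "aindex", "instaindex", "boman", "hydrophobicity",
    "hmoment", "SA.Group1.residue0", "SA.Group2.residue0",
    "SA.Group3.residue0", "HB.Group1.residue0", "HB.Group2.residue0",
    "HB.Group3.residue0",
]

def read_features(lines):
    """Yield (accession, {feature: value}) from tab-separated lines."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].startswith("AMP10."):
            continue  # blank line or possible header (assumption)
        if len(row) != len(FEATURE_COLUMNS):
            raise ValueError(f"expected {len(FEATURE_COLUMNS)} fields, got {len(row)}")
        values = {name: float(value)
                  for name, value in zip(FEATURE_COLUMNS[1:], row[1:])}
        yield row[0], values
```

A gzip-compressed copy can be read by passing `gzip.open(path, "rt", newline="")` as `lines`.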


AMPSphere_v.2021-02.origin_samples.tsv
TSV table relating AMP accession, sequence and their origins in terms of prokaryotic genome or metagenome sample.

Columns:

  1. AMP accession
  2. GMSC accession  (comma separated list)
  3. metagenome samples  (comma separated list)
  4. proGenomes2 genomes  (comma separated list)
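
Because columns 2 to 4 hold comma-separated lists, each line needs a second split after the tab split. A minimal sketch, assuming that an empty field means "no entries" and that trailing empty columns may be missing:

```python
def split_list(field):
    """Split a comma-separated field, dropping empty items."""
    return [item.strip() for item in field.split(",") if item.strip()]

def parse_origin_line(line):
    """Parse one line of the origin_samples table into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (4 - len(fields))  # pad missing trailing columns
    accession, gmsc, samples, genomes = fields[:4]
    return {
        "accession": accession,
        "gmsc": split_list(gmsc),
        "metagenome_samples": split_list(samples),
        "progenomes2_genomes": split_list(genomes),
    }
```

The dictionary keys are illustrative names chosen for this example, not column headers taken from the file.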

 

AMPSphere_v.2021-02.species.tsv
TSV table relating AMP name, sequence and the species in which they were detected.

Columns:

  1. AMP accession
  2. proGenomes2 genomes
  3. SpecI cluster

Note that AMPSphere was generated from the complete proGenomes v2 database.

However, after the initial release, many genomes were removed due to quality-control issues, leading to version 2.2 used for constructing this table.

 

DRAMP_anno_AMPSphere_v.2021-02.parsed.tsv
TSV table relating each AMP name (query) to the hits obtained with BLAST against the DRAMP database. The format is BLAST `outfmt6` (columns 1-12), followed by four DRAMP annotation columns (13-16).

Columns:

  1. query
  2. target
  3. identity
  4. alignment length
  5. mismatches
  6. gaps
  7. query start
  8. query end
  9. target start
  10. target end
  11. e-value
  12. score
  13. target annotation
  14. target function
  15. target biochemical targets
  16. target origin reference

 

AMPSphere_v.2021-02.hosts.tsv
TSV table relating AMPs with the hosts of host-associated metagenomes via metadata.

Columns:

  1. AMP accession
  2. host common name
  3. host scientific name
  4. host NCBI taxid                                                                            
  5. counts

Column 5 (counts) is the number of identical variants of a given peptide assigned to a common host.
   
 

AMPSphere_v.2021-02.locations.tsv
TSV table relating AMP name and their geographic location from metadata annotation of metagenome samples.

Columns:

  1. AMP accession
  2. geographic location
  3. copies

Geographic location refers to the locale where the gene was found through metagenomics. It was assigned as a broad location such as a country, ocean or continent (e.g. US, Atlantic Ocean, Arctic, Australia).

Column 3 (copies) is the number of identical variants of a given peptide assigned to a common location.

 

AMPSphere_v.2021-02.microontology.tsv
Table relating AMP name and their habitat of origin.

Columns:

  1. AMP accession
  2. microontology
  3. counts

Microontology is a scheme used to annotate environments; it has different levels of complexity separated by ':'.

Column 3 (counts) is the number of identical variants of a given peptide assigned to a common habitat.
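
Since microontology terms are hierarchical, counts can be aggregated at a chosen depth by truncating each term at the ':' separator. A minimal sketch follows. The term names used in the usage note are hypothetical and do not come from the dataset.

```python
from collections import Counter

def ontology_levels(term):
    """Split a microontology term into its levels."""
    return [part.strip() for part in term.split(":") if part.strip()]

def ontology_prefixes(term):
    """Return every ancestor of a term, from broadest to most specific."""
    levels = ontology_levels(term)
    return [":".join(levels[:i]) for i in range(1, len(levels) + 1)]

def counts_at_depth(rows, depth):
    """Sum counts per term truncated to `depth` levels.

    `rows` is an iterable of (accession, term, count) tuples.
    """
    totals = Counter()
    for _accession, term, count in rows:
        key = ":".join(ontology_levels(term)[:depth])
        totals[key] += int(count)
    return totals
```

With rows such as `("AMP10.000_000", "soil:forest", 3)` and `("AMP10.000_001", "soil:desert", 2)`, calling `counts_at_depth(rows, 1)` groups both under `soil`.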

 

CONTACT

You can contact us via our discussion group.

AMPSphere main developers:

- Célio Dias Santos Júnior
- Yiqian Duan
- Luis Pedro Coelho

 

COPYRIGHT NOTICE

AMPSphere - the worldwide survey of prokaryotic antimicrobial peptides.

This work is a joint effort of the Big Data Biology group from the Institute of
Science and Technology for Brain-Inspired Intelligence (ISTBI) - Fudan
University, Shanghai, China, and the Structural and Computational Biology Unit
(Heidelberg) - European Molecular Biology Laboratory (EMBL).

Copyright (C) 2019-2021 The Authors

   AMPSphere IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
   OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
   DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
   OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
   USE OR OTHER DEALINGS IN THE SOFTWARE.

   This database is free; you can redistribute it and/or modify it
   as you wish, under the terms of the CC BY 4.0 license.

   You are allowed to:

   Share — copy and redistribute the material in any medium or format

   Adapt — remix, transform, and build upon the material for any purpose,
                 even commercially.

   You may also obtain a copy of the CC BY 4.0 license here.

 

REFERENCES CITED

  1. Macrel: Santos-Júnior CD, Pan S, Zhao X, Coelho LP. 2020. Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ 8:e10555. https://doi.org/10.7717/peerj.10555
  2. ProGenomes: Mende DR, Letunic I, Maistrenko OM et al. 2020. proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes.  Nucleic Acids Research 48(D1): D621–D625. https://doi.org/10.1093/nar/gkz1002
  3. DRAMP: Kang X, Dong F, Shi C et al. 2019. DRAMP 2.0, an updated data repository of antimicrobial peptides. Sci Data 6, 148. https://doi.org/10.1038/s41597-019-0154-y

FUNDING

This work was supported by the National Key R&D Program of China (2020YFA0712403, 2018YFC0910500), the National Natural Science Foundation of China (61932008, 61772368), the Shanghai Science and Technology Innovation Fund (19511101404) and the Shanghai Municipal Science and Technology Major Project (2018SHZDZX01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the dataset.

Files (269.9 MB)

| Name | Size | Checksum |
|---|---|---|
| AMPSphere_v.2021-02.faa.gz | 22.2 MB | md5:77444d8bd8fb09d50ead2f75faa8e10e |
| AMPSphere_v.2021-02.features.tsv.gz | 76.8 MB | md5:9f5273ff0fc99b1e29c6b2ca41e1905d |
| AMPSphere_v.2021-02.fna.xz | 96.1 MB | md5:36d6895b8bedbbfe662a7f5f8284de36 |
| AMPSphere_v.2021-02.hosts.tsv.gz | 2.3 MB | md5:f3820ca388dc69d93298911d221be682 |
| AMPSphere_v.2021-02.locations.tsv.gz | 5.6 MB | md5:56b714bbe774697f4d4622810a4cdb67 |
| AMPSphere_v.2021-02.microontology.tsv.gz | 4.2 MB | md5:b7aa9ea9c3fb4857822dbfd69585358e |
| AMPSphere_v.2021-02.origin_samples.tsv.gz | 62.2 MB | md5:81e09b42663182090cdd57d36c6dce22 |
| AMPSphere_v.2021-02.species.tsv.gz | 189.2 kB | md5:dab53d6757c39e8cc8ad8a8877ca0251 |
| DRAMP_anno_AMPSphere_v.2021-02.parsed.tsv.gz | 296.4 kB | md5:bea7d1cb04fe2a57c40ecc87ffd3b86c |
| README.md | 7.4 kB | md5:9d416ef7e4845ef0b36d351149b18166 |
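
The md5 values listed for each file can be checked after download. The sketch below uses Python's standard `hashlib` module. The function names are illustrative.

```python
import hashlib

def md5_of_chunks(chunks):
    """Return the hexadecimal MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not loaded whole."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))

def verify(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:7744...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify("AMPSphere_v.2021-02.faa.gz", "md5:77444d8bd8fb09d50ead2f75faa8e10e")` returns `True` when the downloaded file is intact.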