AMPSphere: the worldwide survey of prokaryotic antimicrobial peptides
AMPSphere is a comprehensive catalog of antimicrobial peptides (AMPs) predicted using Macrel (DOI: 10.7717/peerj.10555) from 63,410 public metagenomes, the proGenomes v2.2 database (82,400 high-quality microbial genomes) and ca. 4,000 non-whitelisted microbial genomes from NCBI.
Singleton peptides were removed, except those with a direct hit to the DRAMP database.
Redundant peptides were hierarchically clustered using CD-HIT (version 4.6) at 100%, 85%, 75% and 50% amino acid identity (with 90% overlap of the shorter peptide). The resulting clusters were sorted by decreasing size and
numbered as families. Each level of clustering is called a SPHERE and was used to understand the structure of the AMPs according to their orthology.
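The family numbering step can be illustrated with a minimal Python sketch. It is not the pipeline used to build AMPSphere: the input mapping (peptide to cluster representative), the tie-breaking rule and the choice to start numbering at zero are all assumptions made for illustration.

```python
from collections import defaultdict


def number_families(assignments):
    """Number clusters as families, largest cluster first.

    `assignments` maps each peptide id to the representative of its
    cluster (a hypothetical in-memory form of the clustering output).
    """
    members = defaultdict(list)
    for peptide, representative in assignments.items():
        members[representative].append(peptide)
    # Sort by decreasing size; ties are broken by representative name
    # (an assumption, chosen only to make the order deterministic).
    ordered = sorted(members, key=lambda rep: (-len(members[rep]), rep))
    # Numbering from zero is also an assumption.
    return {rep: index for index, rep in enumerate(ordered)}


# Toy clustering: three clusters of sizes 3, 1 and 2.
example = {"p1": "p1", "p2": "p1", "p3": "p1",
           "p4": "p4", "p5": "p5", "p6": "p5"}
families = number_families(example)
```

Here the cluster represented by `p1` becomes family 0, `p5` family 1 and `p4` family 2.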
Nucleotide sequences of the most frequent variant of each AMP were also included in this version of AMPSphere.
AMPSphere v.2021-02 contains 863,498 sequences (average length: 36 amino acids; range: 8-98). The DRAMP database was used to find previously confirmed sequences by strict homology to a reference. This approach showed that 2,488 peptides in our dataset had been previously confirmed.
Peptides are named:
Where `XXX_XXX` is a unique numerical identifier (starting at zero). Numbers were assigned in order of decreasing
number of copies, so that the lower the number, the higher the number of copies of that peptide present in the input data.
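As an illustration of this numbering, the sketch below assigns zero-based `XXX_XXX` identifiers so that the peptide with the most copies receives the lowest number. The function name, the toy sequences and the tie-breaking rule are assumptions, and the name prefix is deliberately left out.

```python
def assign_identifiers(copy_counts):
    """Return a {sequence: "XXX_XXX"} mapping, most abundant first.

    `copy_counts` maps each peptide sequence to its number of copies.
    Ties are broken alphabetically by sequence (an assumption).
    """
    ordered = sorted(copy_counts, key=lambda seq: (-copy_counts[seq], seq))
    identifiers = {}
    for index, sequence in enumerate(ordered):
        digits = f"{index:06d}"  # six digits, zero-padded
        identifiers[sequence] = f"{digits[:3]}_{digits[3:]}"
    return identifiers


# Toy copy counts (made-up sequences, for illustration only).
counts = {"KKLLKK": 12, "GIGAVLKV": 40, "RRWWRR": 3}
ids = assign_identifiers(counts)
```

With these toy counts, `GIGAVLKV` (40 copies) gets `000_000` and `RRWWRR` (3 copies) gets `000_002`.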
Multi-fasta with AMPSphere gene sequences (nucleotide).
Multi-fasta with AMPSphere peptide sequences (amino acid).
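A small, dependency-free Python sketch for reading a multi-FASTA file and summarizing peptide lengths is shown below. It could be used to check figures such as the number of sequences and the length range. The function names and the toy records are hypothetical; in practice the handle would come from `open()` on the downloaded file.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_summary(records):
    """Return count, mean, minimum and maximum sequence length."""
    lengths = [len(sequence) for _, sequence in records]
    return {
        "count": len(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Toy input with one wrapped sequence (names and sequences are made up).
toy = io.StringIO(">pep_a\nGIGAVLKV\n>pep_b\nKKLLKKLL\nKK\n")
summary = length_summary(read_fasta(toy))
```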
Table relating each AMP name to the features used for its prediction.
For more details about these features, see the Macrel manuscript.
TSV table relating AMP accession, sequence and their origins in terms of prokaryotic genome or metagenome sample.
TSV table relating AMP name, sequence and the species in which they were detected.
Note that AMPSphere was generated from the complete proGenomes v2 database.
However, after the initial release, many genomes were removed due to quality-control issues, leading to version 2.2, which was used to construct this table.
TSV table relating AMP names (as queries) to the hits obtained with BLAST against the DRAMP database. The format is BLAST `outfmt6`.
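For readers unfamiliar with `outfmt6`, the sketch below parses it in Python. It assumes the table has no header row and uses the 12 default BLAST tabular columns. Both points are assumptions about this particular file, and the toy row is made up.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def read_outfmt6(handle):
    """Yield one dict per hit, with numeric columns converted."""
    reader = csv.DictReader(handle, fieldnames=OUTFMT6_COLUMNS, delimiter="\t")
    for row in reader:
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        for key in FLOAT_COLUMNS:
            row[key] = float(row[key])
        yield row


# Toy hit line (identifiers and values are hypothetical).
toy = io.StringIO("query_1\tsubject_9\t100.000\t20\t0\t0\t1\t20\t1\t20\t1.2e-08\t45.1\n")
hits = list(read_outfmt6(toy))
```

Each element of `hits` can then be filtered, for example by `pident` or `evalue`, before joining with the other tables on the query name.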
TSV table relating AMPs with the hosts of host-associated metagenomes via metadata.
Column 5 (counts) gives the number of identical variants of a given peptide assigned to a common host.
TSV table relating AMP name and their geographic location from metadata annotation of metagenome samples.
Geographic location refers to the locale where the gene was found through metagenomics. It was assigned as a broad location such as a country, ocean or continent (e.g. US, Atlantic Ocean, Arctic, Australia).
Counts are the number of identical variants of a given peptide assigned to a common location.
Table relating AMP name and their habitat of origin.
Microontology is a scheme used to annotate environments; it has different levels of complexity separated by ':'.
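A short Python sketch of how a ':'-separated annotation might be split into its levels is given below. The placeholder term is hypothetical, and the assumption that levels run from broadest (left) to most specific (right) is not stated in this description.

```python
def split_levels(term):
    """Split a ':'-separated annotation into its individual levels."""
    return [level.strip() for level in term.split(":")]


def ancestors(term):
    """Return the annotation truncated at each depth, shortest first."""
    levels = split_levels(term)
    return [":".join(levels[:depth]) for depth in range(1, len(levels) + 1)]


# Placeholder annotation; real habitat terms come from the table itself.
example = "level_a:level_b:level_c"
prefixes = ancestors(example)
```

Grouping counts by one of these truncated terms would give habitat summaries at a coarser level of the scheme.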
Counts are the number of identical variants of a given peptide assigned to a common habitat.
You can contact us via our discussion group.
AMPSphere main developers:
AMPSphere - the worldwide survey of prokaryotic antimicrobial peptides.
This work is a joint effort of the Big Data Biology group from the Institute of
Science and Technology for Brain-Inspired Intelligence (ISTBI) - Fudan
University, Shanghai, China, and the Structural and Computational Biology Unit
(Heidelberg) - European Molecular Biology Laboratory (EMBL).
Copyright (C) 2019-2021 The Authors
AMPSphere IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.
This database is free; you can redistribute it and/or modify it
as you wish, under the terms of the CC BY 4.0 license.
You are allowed to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially.
You may also obtain a copy of the CC BY 4.0 license here.