Other Open Access

Regression models generated by APRANK (computational prioritization of antigenic proteins and peptides from complete pathogen proteomes)

Ricci, Alejandro; Agüero, Fernán

The availability of highly parallelized immunoassays has renewed interest in the discovery of serology-based biomarkers for infectious diseases. Protein and peptide microarrays now provide a high-throughput platform for immunological screening of potential antigens and B-cell epitopes. However, there is still a need to prioritize relevant probes when designing these arrays. In this work we describe a computational method called APRANK (Antigenic Protein and Peptide Ranker), which integrates multiple molecular features to prioritize antigenic targets starting from a given pathogen proteome. These features include subcellular localization, presence of repetitive motifs, natively disordered regions, secondary structure, transmembrane spans and predicted interaction with the immune system. We applied this method to the prioritization of potential diagnostic antigens and peptides in a number of pathogen proteomes and human diseases: Borrelia burgdorferi (Lyme disease), Brucella melitensis (Brucellosis), Coxiella burnetii (Q fever), Escherichia coli (Gastroenteritis), Francisella tularensis (Tularemia), Leishmania braziliensis (Leishmaniasis), Leptospira interrogans (Leptospirosis), Mycobacterium leprae (Leprosy), Mycobacterium tuberculosis (Tuberculosis), Plasmodium falciparum (Malaria), Porphyromonas gingivalis (Periodontal disease), Staphylococcus aureus (Bacteremia), Streptococcus pyogenes (Group A Streptococcal infections), Toxoplasma gondii (Toxoplasmosis) and Trypanosoma cruzi (Chagas disease). After training a linear regression model, the method achieves good to excellent performance on most species, measured by the enrichment of validated antigens at the top of the ranking. An unbiased validation using independent data sets shows that APRANK is successful in predicting antigenicity for all pathogen species tested. We make APRANK available to facilitate the identification of novel diagnostic antigens in infectious diseases.
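The abstract measures performance by how strongly validated antigens concentrate at the top of the ranking. As a rough illustration of that idea, the sketch below computes a generic fold enrichment in the top k positions. This is not necessarily the metric used by the authors; the function name `top_k_enrichment` and its arguments are hypothetical and chosen only for this example.

```python
def top_k_enrichment(ranked_ids, validated, k):
    """Fold enrichment of validated antigens in the top k of a ranking.

    ranked_ids: protein identifiers ordered from highest to lowest score.
    validated:  set of identifiers known to be antigens.
    A value above 1 means validated antigens are over-represented in the
    top k compared with the ranking as a whole.
    (Illustrative helper, not part of APRANK.)
    """
    if not 0 < k <= len(ranked_ids):
        raise ValueError("k must be between 1 and the ranking length")
    hits_top = sum(1 for pid in ranked_ids[:k] if pid in validated)
    hits_all = sum(1 for pid in ranked_ids if pid in validated)
    if hits_all == 0:
        raise ValueError("no validated antigens found in the ranking")
    # Rate of validated antigens in the top k, relative to the overall rate.
    return (hits_top / k) / (hits_all / len(ranked_ids))
```

With a ranking of ten proteins in which the only two validated antigens occupy the first two positions, `top_k_enrichment(ranking, validated, 2)` is well above 1, reflecting a ranking that places known antigens first.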

These files contain:

  1. R data structures that can be loaded into R. They contain generalized linear models derived from curated (validated) antigens from 15 different human pathogens. Most likely, users of these files will want to use our APRANK software (Antigenic Protein and Peptide Ranker, https://github.com/trypanosomatics/aprank), a pipeline that ranks proteins and peptides from a complete proteome based on predicted antigenicity.
  2. Antigenicity scores for all 15 organisms analyzed in this work. Scores are provided for all proteins in the proteomes of these 15 human pathogens, and for the top-scoring peptides (score > 0.7, and not less than 1% of the total peptides).
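The peptide selection rule in item 2 can be read as "keep every peptide scoring above 0.7, but never fewer than 1% of all peptides". The sketch below implements that reading in Python. It is an interpretation of the description, not code taken from APRANK; the function name `select_top_peptides` and the dictionary input format are assumptions made for illustration.

```python
import math


def select_top_peptides(scores, threshold=0.7, min_fraction=0.01):
    """Return (peptide_id, score) pairs, best first.

    scores: dict mapping a peptide identifier to its antigenicity score.
    Keeps every peptide scoring above `threshold`; if that yields fewer
    than `min_fraction` of all peptides, the next-best peptides are added
    until that minimum is reached.
    (Hypothetical helper reflecting one reading of the selection rule.)
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n_above = sum(1 for _, s in ranked if s > threshold)
    n_min = math.ceil(min_fraction * len(ranked))
    return ranked[:max(n_above, n_min)]
```

For a proteome where few or no peptides exceed the threshold, the 1% floor ensures the output still contains the best-ranked candidates.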

Funding provided by: Division of Intramural Research, National Institute of Allergy and Infectious Diseases
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100006492
Award Number: AI123070

Funding provided by: Agencia Nacional de Promoción Científica y Tecnológica
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100003074
Award Number: PICT-2017-0175

Funding provided by: Agencia Nacional de Promoción Científica y Tecnológica
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100003074
Award Number: PICT-2013-1193

Files (62.2 MB)