The salivary proteome of the green peach aphid/peach-potato aphid (Myzus persicae) (Sulzer, 1776) (Hemiptera, Aphididae).

Liu, Qun; Goldberg, Jay K.; Mugford, Sam T.; Saalbach, Gerhard; Martins, Carlo; Singh, Archana; Kaithakottil, Gemy G.; Swarbreck, David; Hogenhout, Saskia A.

doi:10.5281/zenodo.13269257

Published August 8, 2024 | Version v1

Dataset Open

1. John Innes Centre
2. Earlham Institute
3. University of Cambridge

*for correspondence: saskia.hogenhout@jic.ac.uk

Introduction

The green peach aphid/peach-potato aphid Myzus persicae colonizes hundreds of plant species, an ability that is in part due to the delivery of saliva proteins – often referred to as effectors – into the host plant that suppress plant defence. As a generalist herbivore with a remarkable ability to colonize new host plants (Dedryver et al., 2010), M. persicae represents an outstanding model system for studying the molecular mechanisms underlying plant-insect interactions.

Recent advances in mass spectrometry instrumentation (Yu et al., 2020) and database search software (Frejno et al., 2024), along with a new high-quality reference genome assembly for M. persicae (Mathers et al., 2021) and a simplified method for improved aphid saliva recovery that we describe here, together allow saliva proteins to be detected with greater sensitivity and specificity than was previously possible.

Here, we present a high-quality salivary proteome of M. persicae generated using a nanoLC-MS/MS analysis in combination with an updated annotation of the Myzus persicae clone O genome.

Saliva from over 10,000 M. persicae aphids was collected and analysed using nanoLC-MS/MS (Figure 1). We identified 1557 peptide sequences mapped to the M. persicae clone O genome v2.1 annotation (Table 1); 210 of those peptides additionally appeared as modified by oxidation (M) and/or carbamidomethylation (C) so that a total of 1767 peptide forms mapping to M. persicae were detected with high confidence (combined from both search engines). Of those peptides, 120 were only identified by the CHIMERYS search engine, and 52 were only identified by the Mascot search engine (all with high confidence). Most peptides were detected in the concentrated sample; of the 1767 peptide forms, only 411 were detected in the unconcentrated sample and only 10 of those were exclusively identified in the unconcentrated sample.
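The peptide counts above follow from simple set arithmetic. A minimal sketch, assuming each search engine's high-confidence output is available as a set of peptide-form strings (the function name, variable names and toy sequences are hypothetical and are not taken from Table 1):

```python
def compare_engines(chimerys: set[str], mascot: set[str]) -> dict[str, int]:
    """Summarise the overlap between two sets of peptide identifications."""
    return {
        "combined": len(chimerys | mascot),       # found by either engine
        "shared": len(chimerys & mascot),         # found by both engines
        "chimerys_only": len(chimerys - mascot),  # unique to CHIMERYS
        "mascot_only": len(mascot - chimerys),    # unique to Mascot
    }

# Toy, made-up peptide forms used only to exercise the function
chimerys = {"AAK", "LLGR", "M(ox)TEK"}
mascot = {"LLGR", "VVDK"}
summary = compare_engines(chimerys, mascot)

# Reported totals: unmodified sequences plus additionally detected modified forms
total_forms = 1557 + 210  # 1767 peptide forms
```

Applied to the real peptide lists, the same function would reproduce the engine-specific counts reported above.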

Those peptides were assigned to a total of 423 M. persicae proteins with high confidence and at least 1 unique peptide including all proteins with shared peptides. The software generated 169 protein groups each represented by one master protein. The master protein is the largest protein with the most peptide matches in the group. Of the 169 protein groups, 126 groups were identified with at least 2 unique peptides and 43 groups were identified with only 1 unique peptide. Protein groups that included products of a single gene model were identified from the peptide output of Proteome Discoverer. The 169 protein groups included proteins encoded by 219 gene models which are listed in Table 3, of which 155 had peptides that did not match to any other gene model.
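The gene-model step can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes a mapping from each peptide to the protein isoforms it matches and an explicit isoform-to-gene-model lookup, and all identifiers below are hypothetical.

```python
from collections import Counter

def gene_specific_peptide_counts(
    peptide_to_proteins: dict[str, list[str]],
    protein_to_gene: dict[str, str],
) -> Counter:
    """Count, per gene model, the peptides that match that gene model only."""
    counts: Counter = Counter()
    for peptide, proteins in peptide_to_proteins.items():
        genes = {protein_to_gene[p] for p in proteins}
        if len(genes) == 1:  # peptide maps to isoforms of a single gene model
            counts[next(iter(genes))] += 1
    return counts

# Hypothetical toy data: geneA has two isoforms, geneB has one
protein_to_gene = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
peptide_to_proteins = {
    "PEPTIDEONE": ["geneA.1", "geneA.2"],  # specific to geneA (both isoforms)
    "PEPTIDETWO": ["geneA.1", "geneB.1"],  # shared between gene models
    "PEPTIDETHREE": ["geneB.1"],           # specific to geneB
}
counts = gene_specific_peptide_counts(peptide_to_proteins, protein_to_gene)
```

Gene models with a non-zero count correspond to those described above as having peptides that did not match any other gene model.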

Of note, 8 cathepsin B cysteine peptidases are listed (highlighted in Table 3); members of this family have previously been shown to play an important role in colonisation of host plants (Mathers et al. 2017). This is a recently diversified gene family with 28 members in the Myzus persicae genome. The detected proteins belong to several protein groups that include highly similar or identical proteins with shared peptides, and they also include CathB17 (MYZPE13164_O_EIv2.1_0124700), which was excluded from the database because it is identical to CathB18 (MYZPE13164_O_EIv2.1_0124690).

Together, these findings suggest that aphid saliva contains enzymes that likely alter plant physiology by interacting with both plant proteins and small molecules. Future mechanistic studies will be able to precisely characterize the role of these salivary proteins to understand the pathways, hormones, and chemical defences that may be suppressed by aphids.

Materials and Methods

Aphid rearing

M. persicae Clone O colonies were reared on Arabidopsis thaliana Col-0 plants in a growth chamber maintained at 20°C with a 14-hour light/10-hour dark cycle and 75% relative humidity. The A. thaliana plants were grown at short-day conditions (10 hours light/14 hours dark) at 22°C and 70% relative humidity. For aphid rearing, 4-week-old A. thaliana plants were used, and plants were replaced every two weeks to ensure optimal conditions for aphid growth.

Saliva collection

Figure 1 gives a schematic overview of our sample collection and preparation workflow. M. persicae Clone O aphids were transferred to 4-week-old A. thaliana plants and maintained for 2-3 weeks to enable reproduction. Approximately 100 aphids were then placed into a 50 mm Petri dish, which was sealed with parafilm and had a hole in the base for introducing the aphids. The hole was subsequently covered with mesh to allow ventilation while preventing aphid escape. A 300 μL aliquot of artificial diet (15% w/v sucrose in Milli-Q water, sterilized via 0.22 μm filtration) was added to the inverted lid of the Petri dish. The dish, with the aphid chamber positioned over the lid, was set up so that the parafilm made contact with the artificial diet. This setup was kept under long-day conditions (14 hours light/10 hours dark) at 20°C and 75% relative humidity for 24 hours.

After 24 hours, the artificial diet, now containing aphid saliva, was collected and pooled to produce an unconcentrated saliva sample. A portion of this sample was then concentrated using a Vivaspin concentrator with a 3 kDa molecular weight cut-off (MWCO) at 4°C. The concentrated saliva was snap-frozen in liquid nitrogen and stored at -80°C until further analysis.

This procedure was repeated until saliva was collected from approximately 10,000 aphids.

Saliva preparation and nanoLC-MS/MS analysis

Saliva samples were precipitated by adding 4 volumes of methanol and 1 volume of chloroform, followed by centrifugation at maximum speed for 10 minutes (Wessel and Flügge, 1984). After removing the supernatant, the pellet was washed once with acetone before proceeding to trypsin digestion. The protein pellet was resuspended in 50 µL of 1.5% sodium deoxycholate (SDC; Merck) in 0.2 M EPPS buffer (Merck), pH 8.5, and vortexed under heating. Cysteine residues were reduced with dithiothreitol, alkylated with iodoacetamide, and proteins were digested with trypsin in the SDC buffer following standard protocols.

After digestion, SDC was precipitated by adjusting the solution to 0.2% trifluoroacetic acid (TFA). The clear supernatant was then subjected to C18 solid-phase extraction (SPE) using OMIX 10-100 μL C18 pipette tips (Agilent).

The samples were analysed using nanoLC-MS/MS on an Orbitrap Eclipse™ Tribrid™ mass spectrometer coupled to an UltiMate® 3000 RSLCnano LC system (Thermo Fisher Scientific, Hemel Hempstead, UK). Samples were loaded onto a trap column (nanoEase M/Z Symmetry C18 Trap Column, Waters) with 0.1% TFA at a flow rate of 15 µL/min for 3 minutes. The trap column was then switched in-line with the analytical column (nanoEase M/Z HSS C18 T3, 1.8 µm, 100 Å, 250 mm × 75 µm, Waters) for separation at 40°C. Separation used solvent A (water with 0.1% formic acid) and solvent B (80% acetonitrile with 0.1% formic acid) at a flow rate of 0.2 µL/min with the following gradient: 0-3 minutes at 3% B (parallel to trapping); 3-10 minutes with B increasing to 8%; 10-130 minutes with B linearly increasing to 45%; 130-145 minutes with B linearly increasing to 55%; followed by a ramp to 99% B and re-equilibration to 0% B, for a total runtime of 180 minutes.
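The stated gradient can be written as a piecewise-linear function of time. The sketch below covers only the 0-145 minute portion, because the timing of the final ramp and re-equilibration is not given; it also assumes the 3-10 minute segment is linear, which the text does not state explicitly.

```python
# (time in minutes, % solvent B) breakpoints taken from the gradient description
GRADIENT = [(0.0, 3.0), (3.0, 3.0), (10.0, 8.0), (130.0, 45.0), (145.0, 55.0)]

def percent_b(t: float) -> float:
    """Return % solvent B at time t (minutes) by linear interpolation."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-145 min range covered by this sketch")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")

midpoint = percent_b(70.0)  # halfway through the 10-130 min segment
```

Because solvent B is 80% acetonitrile, the acetonitrile content at any time is 0.8 times the value returned.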

Mass spectrometry data were acquired in positive ion mode with the following settings: Orbitrap resolution at 120K, profile mode, mass range m/z 300-1800, normalized AGC target at 100%, and a maximum injection time of 50 ms. For MS2 analysis in IT Turbo mode, parameters included quadrupole isolation window of 1.2 Da, charge states 2-5, threshold at 1.9e4, HCD CE and CID CE both set to 33 in parallel, AGC target at 1e4, maximum injection time of 35 ms, and dynamic exclusion set to 1 count for 15 seconds with a mass tolerance of ±7 ppm.
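A tolerance in ppm is relative to m/z, so the absolute width of the exclusion window grows with the precursor m/z. A minimal sketch of the conversion (the function name is illustrative and not part of the instrument software):

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6  # absolute half-width in m/z units
    return mz - delta, mz + delta

# A precursor at m/z 500 with the ±7 ppm tolerance stated above
low, high = ppm_window(500.0, 7.0)
```

At m/z 500 the half-width is 0.0035, and it scales linearly with m/z.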

M. persicae v2.1 annotation

The chromosome-scale genome assembly of M. persicae clone O (Mathers et al. 2021) was re-annotated for accurate gene prediction as follows. Illumina short-read RNAseq data from M. persicae used for the previous annotation (Mathers et al. 2017; EBI ENA SAMEA4469192) were used in addition to: stranded RNAseq reads of M. persicae clone O feeding on 9 different host plant species (Chen et al. 2020; NCBI GEO GSE129669); reads from males, alate asexual females and winged asexual females, and nymphs (Mathers et al. 2019; NCBI SRA PRJNA437622); reads from dissected organs from winged and alate asexual female M. persicae (EBI ENA PRJEB79119); and PacBio Isoseq RNAseq data from asexual female M. persicae (EBI ENA PRJEB79119).

Candidate transcript sequences were assembled from RNA-seq reads with Scallop (Shao and Kingsford 2017) and StringTie (Pertea et al. 2015) using a genome-guided approach. A filtered set of non-redundant transcripts was derived using Mikado (Venturini et al. 2018) to give the final transcript set for annotation. Mikado models, together with aligned proteins and repeat annotation, were provided as hints to Augustus (http://bioinf.uni-greifswald.de/augustus/). Multiple Augustus gene builds were created from alternative evidence inputs or weightings. These were supplemented with gene models derived directly from protein alignments and high-confidence models from the Mikado transcript selection stage. Metrics were generated to assess how well each gene model was supported by the available evidence, and an integrated set of models was produced by Mikado.

Long non-coding (lnc) RNAs were identified from the assembled RNAseq transcripts. Transcripts with open reading frames (Transdecoder, https://transdecoder.github.io) showing similarity to arthropod protein-coding genes (BlastP e<1e-5), or with HMMER hits against the Pfam database, were excluded. Remaining transcripts with coding potential >0.5 (CPC2, Kang et al. 2017) or that were shorter than 200 bp were also excluded. Transcripts mapping to rRNA, tRNA, miRNA or transposon loci were excluded using Mikado (Venturini et al. 2018).
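The exclusion rules above amount to a chain of filters. The following sketch shows that logic on pre-computed per-transcript values; the `Transcript` record and its field names are hypothetical, and the sketch does not run Transdecoder, BlastP, HMMER, CPC2 or Mikado itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transcript:
    tid: str
    length_bp: int
    best_blastp_evalue: Optional[float]  # None means no arthropod BlastP hit
    has_pfam_hit: bool                   # ORF has a hit against Pfam
    coding_potential: float              # coding probability, e.g. from CPC2
    overlaps_excluded_locus: bool        # rRNA, tRNA, miRNA or transposon locus

def is_lncrna_candidate(t: Transcript) -> bool:
    """Apply the exclusion criteria described in the text, in order."""
    if t.best_blastp_evalue is not None and t.best_blastp_evalue < 1e-5:
        return False  # similar to an arthropod protein-coding gene
    if t.has_pfam_hit:
        return False  # contains a known protein domain
    if t.coding_potential > 0.5 or t.length_bp < 200:
        return False  # likely coding, or too short
    if t.overlaps_excluded_locus:
        return False  # other non-coding class or transposon
    return True

# Made-up example transcripts
transcripts = [
    Transcript("t1", 850, None, False, 0.10, False),   # passes every filter
    Transcript("t2", 900, 1e-20, False, 0.90, False),  # BlastP hit
    Transcript("t3", 150, None, False, 0.05, False),   # shorter than 200 bp
    Transcript("t4", 600, None, False, 0.20, True),    # overlaps tRNA locus
]
lncrnas = [t.tid for t in transcripts if is_lncrna_candidate(t)]
```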

In total we identified 37,720 genes (with 58,609 splice variant isoforms), including 22,796 protein-coding genes (47,508 isoforms encoding 39,681 unique proteins) and 7,990 non-coding genes (11,101 isoforms). Further details of the annotation process and statistics can be found in CloneO_v2.1_annotation_summary_stats.txt and Myzus_persicae_O_annotation_readme.docx.

Mass spectrometry data processing

The mass spectrometry raw data were processed and quantified in Proteome Discoverer 3.1 (Thermo); all tools mentioned in the following workflows are nodes of the proprietary Proteome Discoverer (PD) software. The database search was performed using the search engines CHIMERYS (MSAID, Munich, Germany) and Mascot Server 2.8.3 (Matrix Science, London) in parallel on the following databases: MYZPE13164_O_EIv2.1.annotation.gff3.pep.fasta (39,681 entries after removal of duplicate protein sequences) and common contaminants (MaxQuant.org, 20240812, 246 entries). The databases were imported into PD, adding a reversed sequence database for decoy searches. The processing workflow started with spectrum recalibration on the Myzus protein database, followed by Minora Feature Detection (min. trace length 7, S/N 2.5, PSM confidence high) and a Top N Peak Filter with 20 peaks per 100 Da. For CHIMERYS, the inferys_3.0.0_fragmentation prediction model was used with FDR targets 0.01 (strict) and 0.05 (relaxed), a fragment tolerance of 0.3 Da, enzyme trypsin with 2 missed cleavages, variable modification oxidation (M), and fixed modification carbamidomethyl (C). For Mascot, the same parameters were used, with a precursor tolerance of 5 ppm and a fragment tolerance of 0.5 Da; validation was performed using Percolator based on q-values and FDR targets 0.01 (strict) and 0.05 (relaxed).
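Two of the database preparation steps mentioned above, removal of duplicate protein sequences and addition of reversed decoy sequences, can be illustrated in a few lines. In the workflow described here the decoys are generated inside Proteome Discoverer; the sketch below is only a stand-alone illustration, and the function names, the `REV_` prefix and the toy records are assumptions.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def deduplicate(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first entry for each distinct protein sequence."""
    seen: set[str] = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique

def add_reversed_decoys(records, prefix: str = "REV_"):
    """Append a reversed-sequence decoy for every target entry."""
    return records + [(prefix + h, s[::-1]) for h, s in records]

# Toy input: protB is identical to protA and is therefore dropped
fasta = ">protA\nMKTLLV\nAAG\n>protB\nMKTLLVAAG\n>protC\nMSSPQR\n"
targets = deduplicate(read_fasta(fasta))
database = add_reversed_decoys(targets)
```

Keeping only the first of two identical sequences is the kind of step that leaves one of a pair of identical proteins out of the search database.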

The consensus workflow in the PD software was used to evaluate the peptide identifications and to measure the abundances of the peptides based on the LC-peak intensities. For identification, an FDR of 0.01 was used as strict threshold. Protein abundance was calculated using the Top3 most abundant peptides. The results were exported into Microsoft Excel including data for protein abundances, number of peptides, protein coverage, the search identification score and other important values (Tables 1 and 2). Identification of protein groups with members encoded by a single gene model was performed by first identifying peptides that mapped to a single gene model, then counting the number of peptides that were specific to each gene model.
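A Top3 protein abundance can be sketched as below. The text does not say whether the three peptide abundances are averaged or summed; this sketch assumes the mean, and the function name and values are illustrative only.

```python
from typing import Optional

def top_n_abundance(peptide_abundances: list[float], n: int = 3) -> Optional[float]:
    """Mean abundance of the n most abundant peptides (assumed aggregation)."""
    top = sorted(peptide_abundances, reverse=True)[:n]
    if not top:
        return None  # no quantified peptides for this protein
    return sum(top) / len(top)

# Made-up peptide abundances for one protein
abundance = top_n_abundance([10.0, 40.0, 20.0, 30.0])
```

Proteins with fewer than three quantified peptides are averaged over the peptides available in this sketch; other conventions are possible.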

Data availability statement

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (https://www.ebi.ac.uk/pride/) partner repository with the dataset identifier PXD055051 and 10.6019/PXD055051.

Acknowledgements

We would like to thank the Informatics and Entomology technology platforms at the John Innes Centre for technical support.

Conflicts of Interest

The authors declare that no conflicts of interest exist.

Funding Information

This work was funded by UK Research and Innovation (UKRI) Biotechnology and Biological Sciences Research Council (BBSRC) grants to S.A.H. (BB/V008544/1 and BB/R009481/1). Additional support was provided by the BBSRC Institute Strategy Programmes (BBS/E/J/000PR9797 and BBS/E/JI/230001B) awarded to the John Innes Centre (JIC). The JIC is grant-aided by the John Innes Foundation.

Figures and legends

Fig. 1. Experimental procedure for detecting M. persicae secretome. (A) The workflow of M. persicae saliva collection and concentration for nanoLC-MS/MS. (B) The diagrammatic representation of mass spectrometry data analysis of M. persicae saliva.

Table 1. Full list of detected peptides.

List of peptides detected in M. persicae saliva from Mascot and CHIMERYS searches of the M. persicae clone O v2.1 database.

Table 2. Full list of detected proteins.

List of proteins detected in M. persicae saliva. Closely related proteins with shared peptides are grouped into protein groups by Proteome Discoverer; the highest-confidence member is the master protein of the group. Master proteins of the 169 protein groups are indicated in the ‘Master’ column as ‘Master protein’; other proteins belonging to these groups have shared peptides. Proteins belonging to groups containing only different isoforms (i.e. splice variants) from the same gene model are indicated by the number of peptides that specifically match that gene model.

Table 3. Annotated list of unique gene models representing detected proteins.

The 219 gene models that encode at least one protein detected in M. persicae saliva. For each gene model, the highest-confidence detected protein is shown: either a master protein of the protein group, or else the longest isoform of that protein. Annotation is derived from InterProScan, including descriptions from Pfam, GO and BlastP hits, and includes the identities of candidate effector proteins (e.g. Mp1, Mp2) identified in Bos et al. (2010). Cathepsin B proteins are highlighted in green.

Supplementary files:

Genome annotation files:

Myzus_persicae_O_v2.0.scaffolds.fasta (as described in Mathers et al. 2021)

CloneO_v2.1_annotation_summary_stats.txt

MYZPE13164_O_EIv2.1.annotation.gff3

MYZPE13164_O_EIv2.1.annotation_w.functions.gff3

MYZPE13164_O_EIv2.1.annotation.gff3.cdna.fasta

MYZPE13164_O_EIv2.1.annotation.gff3.cds.fasta

MYZPE13164_O_EIv2.1.annotation.gff3.metrics.txt

MYZPE13164_O_EIv2.1.annotation.gff3.pep.fasta

MYZPE13164_O_EIv2.1.annotation.gff3.cdna.LTPG.fasta

MYZPE13164_O_EIv2.1.annotation.gff3.cds.LTPG.fasta

MYZPE13164_O_EIv2.1.annotation.gff3.pep.LTPG.fasta

Myzus_persicae_O_annotation_readme.docx

Literature Cited

Chen Y, Singh A, Kaithakottil GG, Mathers TC, Gravino M, Mugford ST, van Oosterhout C, Swarbreck D, Hogenhout SA. (2020). An aphid RNA transcript migrates systemically within plants and is a virulence factor. Proc Natl Acad Sci U S A. 117(23):12763-12771. doi: 10.1073/pnas.1918410117.

Dedryver, C.A., Le Ralec, A., and Fabre, F. (2010). The conflicting relationships between aphids and men: a review of aphid damage and control strategies. C R Biol 333, 539-553.

Frejno, M., Berger, M.T., Tüshaus, J., … and Wilhelm, M. (2024). Unifying the analysis of bottom-up proteomics data with CHIMERYS. bioRxiv 2024.05.27.596040; doi: https://doi.org/10.1101/2024.05.27.596040

Kang, Y. J., Yang, D. C., Kong, L., Hou, M., Meng, Y. Q., Wei L., and Gao, G. (2017). CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Research 45, W12–W16.

Mathers, T.C., Chen, Y., Kaithakottil, G., … Hogenhout, S.A. (2017). Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species. Genome Biol 18, 27. https://doi.org/10.1186/s13059-016-1145-3

Mathers TC, Mugford ST, Percival-Alwyn L, Chen Y, Kaithakottil G, Swarbreck D, Hogenhout SA, van Oosterhout C. (2019). Sex-specific changes in the aphid DNA methylation landscape. Mol Ecol. 28(18):4228-4241. doi: 10.1111/mec.15216.

Mathers, T.C., Wouters, R.H.M., Mugford, S.T., Swarbreck, D., van Oosterhout, C., Hogenhout, S.A. (2021). Chromosome-Scale Genome Assemblies of Aphids Reveal Extensively Rearranged Autosomes and Long-Term Conservation of the X Chromosome, Molecular Biology and Evolution 38, 856–875.

Perez-Riverol Y, Bai J, Bandla C, … Vizcaíno JA. (2022). The PRIDE database resources in 2022: A Hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 50(D1):D543-D552 (PubMed ID: 34723319).

Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T. and Salzberg, S.L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290-295. doi:10.1038/nbt.3122

Shao, M., Kingsford, C. (2017). Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol 35, 1167–1169. https://doi.org/10.1038/nbt.4020

Venturini, L., Caim, S., Kaithakottil, G.G., Mapleson, D.L., and Swarbreck, D. (2018). Leveraging multiple transcriptome assembly methods for improved gene structure annotation. GigaScience 7, giy093. https://doi.org/10.1093/gigascience/giy093

Wessel, D., and Flügge, U.I. (1984). A method for the quantitative recovery of protein in dilute solution in the presence of detergents and lipids. Analytical Biochemistry 138, 141-143.

Yu, Q., Paulo, J.A., Navarrete-Perea, J., McAlister, G.C., … Gygi, S.P., and Schweppe, D.K. (2020). Benchmarking the Orbitrap Tribrid Eclipse for Next Generation Multiplexed Proteomics. Analytical Chemistry 92, 6478-6485. doi: 10.1021/acs.analchem.9b05685

Files

Files (1.0 GB)

Name	MD5	Size
CloneO_v2.1_annotation_summary_stats.txt	591974294670719b593ebb2baaabb61b	3.6 kB
Figure_1.png	38453de068a06b69b7f0ea86b399cfa8	324.1 kB
MYZPE13164_O_EIv2.1.annotation.gff3	022123e2c548d57f486ce187fa29e2fe	126.5 MB
MYZPE13164_O_EIv2.1.annotation.gff3.cdna.fasta	abf28bc700be3ae6993e0a31340e7588	136.5 MB
MYZPE13164_O_EIv2.1.annotation.gff3.cdna.LTPG.fasta	4ede5c68f8b365d5550b83e5198a8b45	66.4 MB
MYZPE13164_O_EIv2.1.annotation.gff3.cds.fasta	4392e86307c28a2dc6ff25f635323fe1	80.0 MB
MYZPE13164_O_EIv2.1.annotation.gff3.cds.LTPG.fasta	c4af021bfc0af6796645b0bd8f808427	39.4 MB
MYZPE13164_O_EIv2.1.annotation.gff3.pep.fasta	9660b55187efd64a7f0b95b0ad1440e2	28.6 MB
MYZPE13164_O_EIv2.1.annotation.gff3.pep.LTPG.fasta	3378488cb42a864be29b852e59802bea	13.7 MB
MYZPE13164_O_EIv2.1.annotation_w.functions.gff3	f45f94fad7efffa2a4394d98d4f3ace8	136.5 MB
Myzus_persicae_O_annotation_readme.docx	5c810e5466135e8e77726f9770e825f1	176.2 kB
Myzus_persicae_O_v2.0.scaffolds.fasta	40da822be3fb33a086df7d5d5aa1636c	395.1 MB
Table_1.xlsx	4c860ee6acb0ce2313afa930c8056844	524.0 kB
Table_2.xlsx	569d261840889d0efe1e564adb5fb739	185.9 kB
Table_3.xlsx	092a208a799b0baca194d3516912c02d	150.9 kB

Additional details

UK Research and Innovation
All Aphid Effectors on DEK BB/V008544/1
UK Research and Innovation
Resistance: DNA methylation and the evolution of pesticide-resistance genes in aphids BB/R009481/1
UK Research and Innovation
BBSRC Institute Strategy Programmes BBS/E/J/000PR9797
UK Research and Innovation
BBSRC Institute Strategy Programmes BBS/E/JI/230001B
