Roberto Biello
Thomas C. Mathers
Sam T. Mugford
Qun Liu
Ana S. B. Rodrigues
Ana Carina Neto
Maria Teresa Rebelo
Octávio S. Paulo
Sofia G. Seabra
Saskia A. Hogenhout
2020-01-31
<p>We sequenced the genome of the meadow spittlebug, <em>Philaenus spumarius</em> (Linnaeus, 1758), the main insect vector of <em>Xylella fastidiosa</em> Wells et al. 1987 in Europe (Saponari et al., 2014), using 10x Chromium linked-reads. A single <em>P. spumarius</em> adult female from Portugal (Fontanelas, Sintra; GPS location: 38°50'15.75"N; 9°25'20.77"W), collected in September 2018, was selected for genome sequencing. This population was initially surveyed for colour polymorphism in 1988 (Quartau & Borges, 1997) and was later included in phylogeographic and population genomic studies of this species (Rodrigues et al., 2014; Seabra et al., unpublished). It is also geographically close to the population from which the individual used for the first partial genome assembly was collected (Rodrigues et al., 2016). The availability of this previous genetic information contributed to the choice of this population as the source of genomic material for whole genome sequencing. A subset of males from the same collection date was analysed for genitalia morphology to confirm species identification, as the best diagnostic characters are the appendages of the aedeagus (Drosopoulos & Quartau, 2002).</p>
<p>Genomic DNA of the <em>P. spumarius</em> adult from Sintra was extracted using the Illustra Nucleon Phytopure kit (GE Healthcare) according to the manufacturer’s instructions. We assessed the quality and concentration of the DNA using a Femto fragment analyser (Agilent). 10x Chromium library preparation and Illumina genome sequencing (HiSeq X, 150 bp paired-end) were performed by Novogene Bioinformatics Technology Co, Beijing, China, in accordance with standard protocols.</p>
<p>To create the <em>de novo</em> 10x Chromium assembly, we ran Supernova 2.1.1 (Weisenfeld et al., 2017) on the 10x Chromium linked-read data with default parameters, using 1.0 billion reads, corresponding to 56X coverage. To improve the initial Supernova assembly, we performed iterative scaffolding using all of the 10x raw data (2.3 billion reads). We ran two rounds of Scaff10x (https://github.com/wtsi-hpag/Scaff10X), followed by mis-assembly detection and correction with Tigmint (Jackman et al., 2018) and a final round of scaffolding with ARCS (Yeo et al., 2018). The assembly was checked for contamination using the BlobTools pipeline (version 0.9.19; Laetsch & Blaxter, 2017; Kumar et al., 2013), and k-mer content was analysed with the KAT comp tool (Mapleson et al., 2017). For these analyses, we first removed the 10x barcodes from the linked reads with the script process_10xReads.py (https://github.com/ucdavis-bioinformatics/proc10xG). We assessed the quality of our draft genome assembly by searching for conserved, single-copy arthropod genes (n=1,066) with Benchmarking Universal Single-Copy Orthologs (BUSCO) v3.0 (Waterhouse et al., 2018).</p>
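<p>As a rough cross-check of the coverage figure quoted above, nominal sequencing depth can be approximated as (number of reads × read length) / genome size. The sketch below is illustrative only: it assumes that "1.0 billion reads" counts individual 150 bp reads and that the genome size is about 2.7 Gb (the assembly length reported in this record). Supernova derives its own coverage estimate, so this is not its calculation, and the function name is hypothetical.</p>

```python
def estimated_coverage(n_reads: float, read_length_bp: int, genome_size_bp: float) -> float:
    """Nominal sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Assumed inputs: 1.0 billion reads of 150 bp, genome size of ~2.7 Gb.
coverage = estimated_coverage(n_reads=1.0e9, read_length_bp=150, genome_size_bp=2.7e9)
print(f"Estimated nominal coverage: {coverage:.0f}X")
```

<p>Under these assumptions the estimate agrees with the reported 56X; barcode trimming and read filtering would lower the effective depth somewhat.</p>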
<p>With the above assembly procedure, we obtained a final assembly of 2.7 Gb with a scaffold N50 of 116 kb (contig N50 = 18 kb) and a longest scaffold of 3.7 Mb. The length of the assembly was consistent with the genome size estimated by flow cytometry (Rodrigues et al., 2016). The k-mer distribution indicated high heterozygosity, estimated at 2.3%. BlobTools analyses revealed the presence of contigs assigned to <em>Sodalis</em> spp. (Enterobacteriaceae), a symbiont in members of tribe Philaenini (Koga et al., 2013). These contigs were removed from the final assembly. The gene completeness assessment showed that 956 (89.6%) of the 1,066 BUSCOs were found as complete copies, with only 26 (2.4%) missing. Of the BUSCOs that were detected, 878 (82.4%) were complete and single-copy, 78 (7.3%) were complete and duplicated and 84 (7.9%) were fragmented.</p>
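<p>For readers unfamiliar with the N50 statistic used above: it is the length L such that sequences of length L or longer together contain at least half of the total assembly length. A minimal sketch of that definition follows; the scaffold lengths are made-up toy values, not data from this assembly, and the function name is hypothetical.</p>

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

<p>In practice the input would be the scaffold (or contig) lengths read from the assembly FASTA file.</p>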
<p>In conclusion, due in part to its high heterozygosity (2.3%), the <em>P. spumarius</em> version 1 genome assembly is highly fragmented. Nonetheless, the BUSCO results indicate that the assembly is largely complete in terms of gene content and is likely to contain the majority of <em>P. spumarius</em> genes.</p>
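<p>The BUSCO percentages reported above can be recomputed from the category counts given in this record (878 single-copy, 78 duplicated, 84 fragmented, 26 missing, out of 1,066). The sketch below is only an arithmetic check, not BUSCO's own reporting code; the variable names are hypothetical, and the last digit may differ slightly from the published values depending on rounding.</p>

```python
# BUSCO category counts as reported in this record.
busco_counts = {
    "complete_single_copy": 878,
    "complete_duplicated": 78,
    "fragmented": 84,
    "missing": 26,
}
total_buscos = 1066  # arthropod BUSCO set size stated in the record

# The four categories should account for every BUSCO searched.
assert sum(busco_counts.values()) == total_buscos

# Percentage of the full set in each category, rounded to one decimal place.
busco_percent = {
    name: round(100 * count / total_buscos, 1)
    for name, count in busco_counts.items()
}

# "Complete" is the sum of single-copy and duplicated BUSCOs.
complete = busco_counts["complete_single_copy"] + busco_counts["complete_duplicated"]
complete_percent = 100 * complete / total_buscos

for name, pct in busco_percent.items():
    print(f"{name}: {pct}%")
print(f"complete (total): {complete} of {total_buscos}")
```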
Data Availability
Short Illumina linked-reads and the genome assembly are available at the National Center for Biotechnology Information (NCBI) with the BioProject number PRJNA602656. The BioSample is available at NCBI with accession number SAMN13900937.
Acknowledgments
We thank José A. Quartau and Sara E. Silva for help with collection of the field samples of P. spumarius. Financial support for sample collection was obtained from CESAM (UID/AMB/50017/2019), cE3c (UID/BIA/00329/2019) and FCT/MCTES through national funds (Norma Transitoria – DL57/2016/CP1479), with co-funding by FEDER, within the PT2020 Partnership Agreement and Compete 2020. The collaboration between the research groups and the overall research are funded through the BRIGIT project by UK Research and Innovation through the Strategic Priorities Fund, by a grant from BBSRC, with support from the Department for Environment, Food and Rural Affairs and the Scottish Government (BB/S016325/1). Additional support was received from a BBSRC Future Leader Fellowship (BB/R01227X/1) to T.C.M., the BBSRC Institute Strategy Programme (BB/P012574/1) and the John Innes Foundation.
https://doi.org/10.5281/zenodo.3368385
oai:zenodo.org:3368385
Zenodo
https://doi.org/10.5281/zenodo.3368384
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Hemiptera
Spittlebug
Genome Assembly
Pest
Xylella fastidiosa
Insect Vector
Xylem
Plant Disease
Pathogen
Draft genome assembly version 1 of the meadow spittlebug Philaenus spumarius (Linnaeus, 1758) (Hemiptera, Aphrophoridae)
info:eu-repo/semantics/other