Annotated genome assemblies for Arachnopeziza aurata and Arachnopeziza aurelia
Description
Obligate biotrophic pathogens such as the phytopathogenic powdery mildew fungi depend closely on their plant hosts and have lost the ability to survive and reproduce independently. Consequently, these organisms are currently not amenable to in vitro cultivation, which is required for effective genetic modification and functional molecular studies. Saprophytic fungi of the family Arachnopezizaceae are the closest known extant relatives of the powdery mildew fungi. We hypothesize that these fungi hold great potential for studying the genetic components underlying the obligate biotrophic lifestyle of powdery mildews.
In this work, we established telomere-to-telomere genome assemblies for two representatives of this family, Arachnopeziza aurata and A. aurelia. In contrast to the powdery mildews, these fungi possess compact genomes with a repeat content below 5% and show signs of functional repeat-induced point mutation (RIP), which limits the spread of transposable elements. We succeeded in cultivating both species in liquid and on solid standard media and show that they are sensitive to common antifungal compounds such as hygromycin and fenhexamid. Further, we used a standard protocol for the genetic modification of fungi, polyethylene glycol (PEG)-mediated protoplast transformation, to confer hygromycin resistance and express a red fluorescent protein in A. aurata. Overall, we demonstrated that Arachnopeziza species are amenable to genetic transformation, which may in the future enable gene replacement, gene modification, and gene complementation.
We established a potential model system that promises to sidestep the need for genetic modification of powdery mildew fungi by using Arachnopeziza species as a proxy. Our work also provides high-quality genomic resources for A. aurata and A. aurelia, which will be valuable for the fungal research community.
Methods
Whole genome sequencing and genome assembly
High molecular weight genomic DNA was obtained from A. aurata and A. aurelia cultivated in PDA at 80 rpm and 28 °C for 7–14 days. Mycelial balls were flash-frozen in liquid N2 and ground to a fine powder with a mortar and pestle. DNA was then isolated with a CTAB protocol according to Feehan et al. (2017), with the modifications described by Frantzeskakis et al. (2018). The DNA was further purified using the NucleoBond HMW DNA kit (Macherey-Nagel, Düren, Germany). DNA integrity was assessed on a 0.6% agarose gel stained with ethidium bromide and run at 30 V for 3 h. DNA quantity was determined with the Qubit dsDNA BR assay kit on a Qubit 4 fluorometer (Thermo Fisher Scientific, Langerwehe, Germany).
DNA shotgun sequencing was performed on an Illumina NovaSeq 6000 with 1 µg of input DNA by the service provider CeGaT (Tübingen, Germany), yielding 150-bp paired-end reads. We trimmed the raw reads using Trimmomatic v0.39 (Bolger et al., 2014) and assessed read quality with FastQC v0.12.1 (Babraham Bioinformatics, Cambridge, UK). Long-read sequencing was performed on a MinION (Oxford Nanopore Technologies, Oxford, UK) with R9.4.1 flow cells and the Ligation Sequencing Kit SQK-LSK112; basecalling was done using Guppy v0.15.3. All raw reads are available at NCBI/ENA/DDBJ under BioProject accession PRJNA1128938.
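Both read trimming and quality control operate on the per-base Phred scores stored in each FASTQ record. As a minimal sketch (not the code of Trimmomatic or FastQC), the example below parses four-line FASTQ records and computes a read's mean Phred score, assuming the Phred+33 encoding used by current Illumina instruments. The function names `parse_fastq` and `mean_phred` and the toy records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line, ignored
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:].split()[0], seq, qual


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one quality string (Phred+33 assumed by default)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


records = ["@r1 1:N:0\n", "ACGT\n", "+\n", "IIII\n",
           "@r2\n", "ACG\n", "+\n", "#I5\n"]
for read_id, seq, qual in parse_fastq(records):
    print(read_id, len(seq), round(mean_phred(qual), 1))
```

A generator keeps memory use flat, which matters for NovaSeq-scale files; real tools additionally report per-position quality rather than only per-read means.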
We generated draft genome assemblies from the long reads with Canu v2.2 (Koren et al., 2017); Flye v2.9.2 (Kolmogorov et al., 2019) with the options ‘--iterations 3 --threads 12 --genome-size 42m --asm-coverage 50 -m 10000’; and NextDenovo v2.5.2 (Hu et al., 2024) with the configuration options ‘sort_options = -m 10g -t 8’ and ‘nextgraph_options = -a 1 -q 10 -E 5000’. We then merged the assemblies with quickmerge v0.3 (Chakraborty et al., 2016) to obtain the best draft assembly. Finally, we mapped the Illumina short reads to the respective merged assemblies using ‘bwa mem’ of BWA v0.7.17-r1188 (Li & Durbin, 2009) and polished the assemblies with Pilon v1.24 (Walker et al., 2014).
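The Flye options ‘--genome-size 42m --asm-coverage 50’ cap the read data used for the initial (disjointig) assembly stage at roughly 50-fold coverage of the assumed 42 Mb genome, i.e., about 2.1 Gb of sequence taken from the longest reads. The sketch below illustrates that selection rule under this assumption; it is not Flye's implementation, and the helper names `parse_genome_size` and `select_longest_reads` are hypothetical.

```python
from typing import List, Sequence


def parse_genome_size(text: str) -> int:
    """Parse size strings such as '42m', '1.5g' or '5000k' into base pairs."""
    units = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def select_longest_reads(read_lengths: Sequence[int], genome_size: int,
                         target_coverage: float) -> List[int]:
    """Return indices of reads taken longest-first until their summed length
    reaches target_coverage * genome_size (or the reads run out)."""
    budget = genome_size * target_coverage
    order = sorted(range(len(read_lengths)),
                   key=lambda i: read_lengths[i], reverse=True)
    chosen, total = [], 0
    for i in order:
        if total >= budget:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen


genome = parse_genome_size("42m")
print(genome * 50)  # 2100000000 bp budget for 50x coverage
print(select_longest_reads([10, 50, 30, 20], genome_size=10, target_coverage=7))  # [1, 2]
```

Preferring the longest reads for the initial stage trades total coverage for read length, which helps span repeats while keeping memory use bounded.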
We obtained basic assembly statistics using QUAST v5.2.0 (Gurevich et al., 2013) and estimated assembly quality with CRAQ v1.0.9 (Li et al., 2023). Further, we identified the 5.8S, 18S, and 28S nuclear ribosomal DNA (nrDNA) and ITS sequences using the nrDNA sequences of A. aurata CBS127674 and A. aurelia CBS127675 from GenBank (accessions MH864617.1, MH876055.1, MH864618.1, and MH876056.1). Genome completeness was estimated with compleasm v0.2.6 (Huang & Li, 2023) using the 1,706 ascomycete core genes of the ascomycota_odb10 database (Simão et al., 2015).
Telomere identification
Telomeres were identified manually at the ends of assembled contigs as tandem telomeric repeats, 5’-TTAGGG-3’ or its reverse complement 5’-CCCTAA-3’. To complete contig ends where no telomeric repeats were found, we used teloclip v0.0.4 (https://github.com/Adamtaranto/teloclip). Briefly, we mapped the ONT long reads to the respective assembly using Minimap2 v2.26-r1175 (Li, 2018) with the options ‘-k 20 -ax map-ont’ to retrieve nanopore reads containing telomeric repeats (5’-TTAGGG-3’) at both ends of the assembled sequences, and processed the SAM files with SAMtools v1.18 (Li et al., 2009). Teloclip was then run with the options ‘--motifs TTAGGG,TTAAGGG --matchAny’ to filter reads mapping to chromosome ends, and with ‘--extractReads --extractDir SplitOverhang’ to extract these reads. The reads from each chromosome end were aligned by multiple sequence alignment with MAFFT v7.520 (Katoh & Standley, 2013), and the alignments were inspected in Jalview v2.11.3.2 (Waterhouse et al., 2009) to manually extend the scaffold ends up to the last aligned telomeric repeat, where available.
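The manual screen for telomeres can be approximated in code by counting telomeric repeats near each contig end: on the sequence as written in the assembly, repeats at the left end read 5’-CCCTAA-3’ and repeats at the right end read 5’-TTAGGG-3’. The sketch below is an illustration only; the 100-bp window and the threshold of three repeats are arbitrary assumptions, not parameters from this work, and `telomere_status` is a hypothetical name.

```python
FORWARD = "TTAGGG"  # telomeric repeat as read at the right (3') end of a contig
REVERSE = "CCCTAA"  # its reverse complement, as read at the left (5') end


def count_motif(window: str, motif: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of motif."""
    return window.upper().count(motif)


def telomere_status(seq: str, window: int = 100, min_repeats: int = 3):
    """Return (left_has_telomere, right_has_telomere) for one contig.

    window and min_repeats are illustrative assumptions, not tuned values.
    """
    left = count_motif(seq[:window], REVERSE)
    right = count_motif(seq[-window:], FORWARD)
    return left >= min_repeats, right >= min_repeats


complete = "CCCTAA" * 5 + "ACGT" * 50 + "TTAGGG" * 5
one_sided = "ACGT" * 50 + "ttaggg" * 4
print(telomere_status(complete))   # (True, True)
print(telomere_status(one_sided))  # (False, True)
```

Contigs flagged as lacking a telomere at one end would be the candidates for read-based extension with teloclip as described above.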
Gene annotation
We used BRAKER3 v3.0.8 (Gabriel et al., 2021, 2023; Bruna et al., 2023) for evidence-based gene annotation of A. aurata and A. aurelia. The RNA-seq data obtained in this work for each species and the OrthoDB v11 Fungi protein dataset (https://www.orthodb.org/) (Kuznetsov et al., 2023) served as evidence for the BRAKER3 predictions. RNA-seq reads were prepared for annotation by mapping them to the respective genome assembly with HISAT2 (Kim et al., 2015) using the options ‘--max-intronlen 1000 -k 10’ and processing the SAM files with SAMtools v1.18 (Li et al., 2009). We assessed the completeness of the gene annotations using BUSCO v5.5.0 (Simão et al., 2015) with ‘-m protein’ and the ascomycota_odb10 database.
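In spliced alignments such as those produced by HISAT2, introns appear in the SAM CIGAR string as ‘N’ (skipped reference region) operations, and ‘--max-intronlen 1000’ limits how long such gaps may be. The sketch below extracts intron lengths from a CIGAR string and checks them against that limit, for example to sanity-check alignments before annotation. It is an illustration under these assumptions, not part of the published workflow, and the function names are hypothetical.

```python
import re
from typing import List

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar: str) -> List[int]:
    """Return the lengths of all 'N' (skipped region) operations in a CIGAR."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return []
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return [int(n) for n, op in ops if op == "N"]


def within_max_intron(cigar: str, max_intron: int = 1000) -> bool:
    """True if every intron in the alignment is at most max_intron bp long."""
    return all(length <= max_intron for length in intron_lengths(cigar))


print(intron_lengths("50M120N30M500N20M"))  # [120, 500]
print(within_max_intron("10M1500N10M"))     # False
```

A short maximum intron length suits compact fungal genomes, whose introns are typically far shorter than those of plants or animals.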
Files
Total size: 285.0 MB

| Size | MD5 checksum |
|---|---|
| 24.5 MB | d92f3f6508ffc2332514ed8f9c2c08de |
| 50.1 MB | 7137a44e46b0842de5d37d5a831aa28d |
| 21.6 MB | 904887d22161cc7488989a1ecaa15b9f |
| 8.4 MB | 44a9b7ba4be1f61d482529d99c7b3164 |
| 78.9 kB | 1d48564f3fd80d35a76d97c47ea7ec83 |
| 43.8 MB | bd5ea7f3c12b8a1b2ed1467bf9cb65db |
| 21.8 MB | 4f61b1c81b711cf990922ced2725af08 |
| 42.6 MB | e877c8c8bfacd3c06ab79756ac18442f |
| 17.5 MB | 134d039859a4d1f5fe64201266a97bca |
| 7.5 MB | 881d5c58e50f5c059dfff6bdc493c47a |
| 69.0 kB | b7cb5fbe528c06987ea136e167cd493d |
| 47.1 MB | dd9623f6f75d92a2402e801651a8dc1e |
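Each file in this record is listed with an MD5 checksum, so downloads can be verified before use. A minimal sketch with Python's standard `hashlib` follows; the file name in the usage example is a hypothetical placeholder, since the original file names are not preserved in this listing.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical file name; compare against the checksum listed in the table.
    target = Path("downloaded_file.fasta.gz")
    if target.exists():
        print(verify_md5(target, "d92f3f6508ffc2332514ed8f9c2c08de"))
```

Reading in chunks keeps memory use constant even for the larger (around 50 MB) files in the record.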
Additional details
Related works
- Is published in
- Publication: 10.1111/1755-0998.70045 (DOI)
- Is supplement to
- Workflow: https://github.com/stefankusch/arachnopeziza_analysis (URL)
- Dataset: PRJNA1128938 (Other)
Funding
- Deutsche Forschungsgemeinschaft: 274444799
- Deutsche Forschungsgemeinschaft: StartUP SPP1819
Software
- Development status: Concept
References
- Feehan JM, Scheibel KE, Bourras S, Underwood W, Keller B, Somerville SC. 2017. Purification of high molecular weight genomic DNA from powdery mildew for long-read sequencing. Journal of Visualized Experiments: e55463.
- Frantzeskakis L, Kracher B, Kusch S, Yoshikawa-Maekawa M, Bauer S, Pedersen C, Spanu PD, Maekawa T, Schulze-Lefert P, Panstruga R. 2018. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen. BMC Genomics 19: 381.
- Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
- Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research 27: 722–736.
- Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37: 540–546.
- Hu J, Wang Z, Sun Z, Hu B, Ayoola AO, Liang F, Li J, Sandoval JR, Cooper DN, Ye K, et al. 2024. NextDenovo: An efficient error correction and accurate assembly tool for noisy long reads. Genome Biology 25: 107.
- Chakraborty M, Baldwin-Brown JG, Long AD, Emerson JJ. 2016. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Research 44: e147.
- Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760.
- Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman JR, Young SK, et al. 2014. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9: e112963.
- Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: Quality assessment tool for genome assemblies. Bioinformatics 29: 1072–1075.
- Li K, Xu P, Wang J, Yi X, Jiao Y. 2023. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nature Communications 14: 6556.
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31: 3210–3212.
- Huang N, Li H. 2023. compleasm: A faster and more accurate reimplementation of BUSCO. Bioinformatics 39: btad595.
- Li H. 2018. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094–3100.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
- Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30: 772–780.
- Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. 2009. Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189–1191.
- Bruna T, Lomsadze A, Borodovsky M. 2023. A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.
- Gabriel L, Brůna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, Stanke M. 2023. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA.
- Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. 2021. TSEBRA: Transcript selector for BRAKER. BMC Bioinformatics 22: 566.
- Kuznetsov D, Tegenfeldt F, Manni M, Seppey M, Berkeley M, Kriventseva EV, Zdobnov EM. 2023. OrthoDB v11: Annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Research 51: D445–D451.
- Kim D, Langmead B, Salzberg SL. 2015. HISAT: A fast spliced aligner with low memory requirements. Nature Methods 12: 357–360.