Annotated genome assemblies for Arachnopeziza aurata and Arachnopeziza aurelia

Kusch, Stefan; Loos, Anne; Doykova, Ella; Qian, Jiangzhao; Kümmel, Florian; Ibrahim, Heba; Kiss, Levente; Panstruga, Ralph

doi:10.5281/zenodo.15303401

Published May 31, 2025 | Version v1

Dataset Open

1. Forschungszentrum Jülich
2. RWTH Aachen University
3. Max Planck Institute for Plant Breeding Research
4. KU Leuven
5. Cairo University Faculty of Agriculture
6. University of Southern Queensland

Obligate biotrophic pathogens such as the phytopathogenic powdery mildew fungi depend closely on their plant hosts and have lost the ability to survive and reproduce independently. As a result, these organisms cannot currently be cultivated in vitro, which is required for effective genetic modification and functional molecular studies. Saprophytic fungi of the family Arachnopezizaceae are the closest known extant relatives of the powdery mildew fungi. We hypothesize that these fungi hold great potential for studying the genetic components of the obligate biotrophic lifestyle of the powdery mildews.

In this work, we established telomere-to-telomere genome assemblies for two representatives of this family, Arachnopeziza aurata and A. aurelia. We discovered that, in contrast to the powdery mildews, these fungi possess compact genomes with a repeat content below 5% and signs of functioning repeat-induced point mutation (RIP), which limits the spread of transposable elements (TEs). We succeeded in cultivating both fungal species in liquid and on solid standard media and show that they are sensitive to common fungicides such as hygromycin and fenhexamid. Further, we were able to use a standard protocol for the genetic modification of fungi, polyethylene glycol-mediated protoplast transformation, to confer hygromycin resistance and express a red fluorescent protein in A. aurata. Overall, we demonstrated that Arachnopeziza species are amenable to genetic alterations that may include gene replacement, gene modification, and gene complementation in the future.

We established a potential model system that promises to sidestep the need for genetic modification of powdery mildew fungi by using Arachnopeziza species as a proxy. Our work also provides high-quality genomic resources for A. aurata and A. aurelia, which will be valuable for the fungal research community.

Methods

Whole genome sequencing and genome assembly

High molecular weight genomic DNA was obtained from A. aurata and A. aurelia, each cultivated in PDA at 80 rpm and 28 °C for 7-14 days. Mycelial balls were flash-frozen in liquid N₂ and ground to a fine powder with mortar and pestle. DNA was then isolated with a CTAB protocol according to Feehan et al. (2017), with the modifications indicated in Frantzeskakis et al. (2018). The DNA was further purified using the NucleoBond HMW DNA kit (Macherey-Nagel, Düren, Germany). DNA integrity was tested on a 0.6% agarose gel with ethidium bromide as intercalating dye, run at 30 V for 3 h. DNA quantity was determined using the Qubit dsDNA-BR assay kit on a Qubit 4 (Thermo Fisher Scientific, Langerwehe, Germany).

DNA shotgun sequencing was performed using Illumina NovaSeq (NovaSeq 6000) technology with 1 µg input DNA at the service provider CeGaT (CeGaT, Tübingen, Germany), yielding 150-bp paired-end reads. We trimmed raw reads using Trimmomatic v0.39 (Bolger et al., 2014) and assessed read quality with FastQC v0.12.1 (Babraham Bioinformatics, Cambridge, UK). Long-read sequencing was performed on a MinION device (Oxford Nanopore Technologies, Oxford, UK) with R9.4.1 flow cells and the Ligation Sequencing Kit SQK-LSK112; basecalling was done using guppy v0.15.3. All raw reads are available at NCBI/ENA/DDBJ under project accession PRJNA1128938.

We generated draft genome assemblies from the long reads with three assemblers: Canu v2.2 (Koren et al., 2017); Flye v2.9.2 (Kolmogorov et al., 2019) with options ‘--iterations 3 --threads 12 --genome-size 42m --asm-coverage 50 -m 10000’; and NextDenovo v2.5.2 (Hu et al., 2024) with configuration options ‘sort_options = -m 10g -t 8’ and ‘nextgraph_options = -a 1 -q 10 -E 5000’. We then merged the assemblies with quickmerge v0.3 (Chakraborty et al., 2016) to obtain the best draft assembly. Finally, we remapped the Illumina short reads to the respective merged assemblies using the function ‘bwa mem’ of BWA v0.7.17-r1188 (Li & Durbin, 2009) and polished the assembly using pilon v1.24 (Walker et al., 2014).
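As an illustration of how the Flye options quoted above could be combined into a full command line, the following minimal Python sketch builds (but does not run) an argument list. The read file name, the output directory, and the choice of Flye's `--nano-raw` input mode are assumptions made for the example; the record does not state which input mode or paths were used.

```python
import shlex

# Options exactly as quoted in the Methods text.
FLYE_OPTIONS = "--iterations 3 --threads 12 --genome-size 42m --asm-coverage 50 -m 10000"


def flye_command(reads, out_dir, read_mode="--nano-raw"):
    """Build a Flye argument list.

    `reads`, `out_dir` and `read_mode` are hypothetical placeholders;
    only FLYE_OPTIONS comes from the dataset description.
    """
    return ["flye", read_mode, reads, "--out-dir", out_dir, *shlex.split(FLYE_OPTIONS)]


cmd = flye_command("ont_reads.fastq.gz", "flye_out")
# Print a shell-safe version of the command for inspection or logging.
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.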

We obtained basic assembly statistics using Quast v5.2.0 (Gurevich et al., 2013) and assembly quality estimations with CRAQ v1.0.9 (Li et al., 2023). Further, we identified the 5.8S, 18S, 28S nuclear ribosomal DNA (nrDNA) and ITS sequences using the nrDNA sequences of A. aurata CBS127674 and A. aurelia CBS127675 from GenBank (accessions MH864617.1, MH876055.1, MH864618.1, and MH876056.1). Genome completeness was estimated using 1,706 ascomycete core genes from the ascomycota_odb10 database with compleasm v0.2.6 (Simão et al., 2015; Huang & Li, 2023).
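One of the basic assembly statistics commonly reported alongside total length and contig count is the N50. The short Python sketch below shows how that value is defined; it is an illustration only, not the QUAST implementation, and the contig lengths are made-up numbers rather than values from these assemblies.

```python
def n50(lengths):
    """Return the N50: the length of the contig at which the running sum
    of contig lengths, sorted longest first, reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Hypothetical contig lengths in bp (not taken from the dataset).
contigs = [5_000_000, 4_000_000, 3_000_000, 2_000_000, 1_000_000]
print(n50(contigs))  # 4000000
```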

Telomere identification

Telomeres were manually identified at the ends of assembled contigs as telomeric repeats 5’-TTAGGG-3’ or, on the complementary strand, 5’-CCCTAA-3’. To complete the sequence ends where telomeric repeats were not found, we used teloclip v0.0.4 (https://github.com/Adamtaranto/teloclip). Briefly, we mapped the long ONT reads to the respective assembly using Minimap2 v2.26-r1175 (Li, 2018) with options ‘-k 20 -ax map-ont’ and parsed the SAM files with SAMtools v1.18 (Li et al., 2009) to retrieve nanopore reads containing telomeric repeats (5’-TTAGGG-3’) at both ends of the assembly sequences. Then, teloclip was used with options ‘--motifs TTAGGG,TTAAGGG --matchAny’ to filter reads mapping to chromosome ends, and with options ‘--extractReads --extractDir SplitOverhang’ to extract these reads. The reads of each chromosome end were then aligned using MAFFT v7.520 (Katoh & Standley, 2013), and Jalview v2.11.3.2 (Waterhouse et al., 2009) was employed to manually identify and extend scaffold ends via read alignment until the last aligning telomeric repeat, where available.
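The manual check for telomeric repeats at contig ends can be approximated programmatically. The following Python sketch looks for tandem runs of the motif in a window at each end of a sequence. The window size, the minimum number of repeats, and the convention that the C-rich form (CCCTAA) is expected at the start and TTAGGG at the end are assumptions for illustration; this is not the procedure used by teloclip.

```python
import re

FORWARD = "TTAGGG"  # motif named in the text, expected at the 3' end of a contig
REVERSE = "CCCTAA"  # its reverse complement, expected at the 5' end


def longest_tandem_run(segment, motif):
    """Return the number of motif copies in the longest uninterrupted run."""
    runs = re.findall(f"(?:{motif})+", segment.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def telomere_status(seq, window=200, min_repeats=3):
    """Report whether each contig end carries a telomere-like repeat run.

    `window` and `min_repeats` are arbitrary illustrative thresholds.
    """
    start_run = longest_tandem_run(seq[:window], REVERSE)
    end_run = longest_tandem_run(seq[-window:], FORWARD)
    return {"start": start_run >= min_repeats, "end": end_run >= min_repeats}


# Toy contig: telomeric repeats at both ends around a made-up core sequence.
contig = "CCCTAA" * 5 + "ACGTACGTAC" * 10 + "TTAGGG" * 4
print(telomere_status(contig))
```

Contigs that report `False` for one end would be the candidates for extension with overhanging long reads, as described above.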

Gene annotation

We used BRAKER3 v3.0.8 (Gabriel et al., 2021, 2023; Bruna et al., 2023) for evidence-based gene annotation of both A. aurata and A. aurelia. The respective RNA-seq data obtained in this work and the OrthoDB v11 Fungi protein dataset (https://www.orthodb.org/) (Kuznetsov et al., 2023) served as evidence datasets for BRAKER3 predictions. RNA-seq reads were prepared for annotation by mapping them to the respective genome assembly with HISAT2 (Kim et al., 2015) with options ‘--max-intronlen 1000 -k 10’ and parsing the SAM files with SAMtools v1.18 (Li et al., 2009). We assessed the completeness of the gene annotations using BUSCO v5.5.0 (Simão et al., 2015) with ‘-m protein’ and the ascomycota_odb10 database.
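For a quick look at an annotation file such as the GFF files provided with this record, the feature types in the third column can be tallied. The Python sketch below assumes a standard nine-column, tab-separated GFF/GTF layout; the example lines are invented for illustration and do not come from the actual files, whose feature types may differ.

```python
from collections import Counter


def count_feature_types(lines):
    """Count entries per feature type (column 3) in GFF/GTF-style lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Invented example records (not from the dataset).
example = [
    "##gff-version 3\n",
    "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "chr1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n",
    "chr1\tAUGUSTUS\tCDS\t100\t400\t.\t+\t0\tParent=g1.t1\n",
    "chr1\tAUGUSTUS\tCDS\t500\t900\t.\t+\t2\tParent=g1.t1\n",
    "chr2\tGeneMark.hmm3\tgene\t50\t700\t.\t-\t.\tID=g2\n",
]
feature_counts = count_feature_types(example)
print(dict(feature_counts))
```

To apply this to a downloaded file, the function can be given an open file handle, for example `count_feature_types(open("Aaurata_braker.gff"))`.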

Files (285.0 MB)

Name	MD5	Size
Aaurata_braker.cds.fa	d92f3f6508ffc2332514ed8f9c2c08de	24.5 MB
Aaurata_braker.gbk	7137a44e46b0842de5d37d5a831aa28d	50.1 MB
Aaurata_braker.gff	904887d22161cc7488989a1ecaa15b9f	21.6 MB
Aaurata_braker.protein.fa	44a9b7ba4be1f61d482529d99c7b3164	8.4 MB
Aaurata_v2-3.mtchr.fasta	1d48564f3fd80d35a76d97c47ea7ec83	78.9 kB
Aaurata_v2-3.pilon_teloclip.sort.fa	bd5ea7f3c12b8a1b2ed1467bf9cb65db	43.8 MB
Aaurelia_braker.cds.fa	4f61b1c81b711cf990922ced2725af08	21.8 MB
Aaurelia_braker.gbk	e877c8c8bfacd3c06ab79756ac18442f	42.6 MB
Aaurelia_braker.gff	134d039859a4d1f5fe64201266a97bca	17.5 MB
Aaurelia_braker.protein.fa	881d5c58e50f5c059dfff6bdc493c47a	7.5 MB
Aaurelia_v2-3.mtchr.fasta	b7cb5fbe528c06987ea136e167cd493d	69.0 kB
Aaurelia_v2-3.pilon_teloclip.sort.fa	dd9623f6f75d92a2402e801651a8dc1e	47.1 MB
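
The MD5 checksums listed with the files above can be used to confirm that a download is intact. A minimal Python sketch using only the standard library is given below; the helper names `md5_of_file` and `matches_checksum` are introduced here for illustration and are not part of the dataset or its workflow.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large FASTA or GenBank files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example call, assuming the file has been downloaded to the working directory:
# matches_checksum("Aaurata_braker.gff", "904887d22161cc7488989a1ecaa15b9f")
```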

Additional details

DOI: 10.1101/2025.05.08.652889

Is published in: Publication: 10.1111/1755-0998.70045 (DOI)
Is supplement to: Workflow: https://github.com/stefankusch/arachnopeziza_analysis (URL); Dataset: PRJNA1128938 (Other)

Funding: Deutsche Forschungsgemeinschaft (274444799); Deutsche Forschungsgemeinschaft (StartUP SPP1819)

Development Status: Concept

References

Feehan JM, Scheibel KE, Bourras S, Underwood W, Keller B, Somerville SC. 2017. Purification of high molecular weight genomic DNA from powdery mildew for long-read sequencing. Journal of Visualized Experiments: JoVE: e55463.
Frantzeskakis L, Kracher B, Kusch S, Yoshikawa-Maekawa M, Bauer S, Pedersen C, Spanu PD, Maekawa T, Schulze-Lefert P, Panstruga R. 2018. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen. BMC Genomics 19: 381.
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research 27: 722–736.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37: 540–546.
Hu J, Wang Z, Sun Z, Hu B, Ayoola AO, Liang F, Li J, Sandoval JR, Cooper DN, Ye K, et al. 2024. NextDenovo: An efficient error correction and accurate assembly tool for noisy long reads. Genome Biology 25: 107.
Chakraborty M, Baldwin-Brown JG, Long AD, Emerson JJ. 2016. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Research 44: e147.
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman JR, Young SK, et al. 2014. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement (J Wang, Ed.). PLoS ONE 9: e112963.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: Quality assessment tool for genome assemblies. Bioinformatics 29: 1072–1075.
Li K, Xu P, Wang J, Yi X, Jiao Y. 2023. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nature Communications 14: 6556.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics (Oxford, England) 31: 3210–3212.
Huang N, Li H. 2023. compleasm: A faster and more accurate reimplementation of BUSCO (T Marschall, Ed.). Bioinformatics 39: btad595.
Li H. 2018. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094–3100.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30: 772–780.
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. 2009. Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189–1191.
Bruna T, Lomsadze A, Borodovsky M. 2023. A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.
Gabriel L, Brůna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, Stanke M. 2023. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA.
Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. 2021. TSEBRA: Transcript selector for BRAKER. BMC Bioinformatics 22: 566.
Kuznetsov D, Tegenfeldt F, Manni M, Seppey M, Berkeley M, Kriventseva EV, Zdobnov EM. 2023. OrthoDB v11: Annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Research 51: D445–D451.
Kim D, Langmead B, Salzberg SL. 2015. HISAT: A fast spliced aligner with low memory requirements. Nature Methods 12: 357–360.

