Published February 25, 2021 | Version v1
Dataset Open

Single-nucleotide polymorphisms, genome assemblies, genome annotations, and gene predictions of Pyricularia oryzae isolates from rice

Description

We analyzed genetic diversity in isolates of the rice blast fungus (Pyricularia oryzae) covering a broad geographical range. We used genotyping data (Infinium beadchip) to infer population structure and whole genome resequencing data (Illumina sequencing) to investigate differences in repertoires of putative pathogenicity effectors.

List of files:

isolates_with_genotyping_data.xlsx: isolates with Infinium genotyping data

isolates_with_resequencing_data.xlsx: isolates with whole genome sequencing data (including European Nucleotide Archive identifiers).

Infinium-and-sequencing_SNPs_lineage1.txt: Allelic states of 204 P. oryzae isolates from lineage 1, genotyped at 3,686 markers using Infinium genotyping or whole genome sequencing. Genomic coordinates (#CHROM and POS) give the positions of the markers in the 70-15 reference genome (EnsemblFungi, assembly MG8). The 'clonal_groups' line assigns isolates to clonal groups, i.e. groups of isolates sharing a multilocus genotype that was observed more than once.

Infinium-and-sequencing_SNPs.txt: Allelic states of 123 P. oryzae isolates genotyped at 3,686 markers using Infinium genotyping or whole genome sequencing. Genomic coordinates (#CHROM and POS) indicate position of the markers in the 70-15 reference genome (EnsemblFungi, assembly MG8).
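
As an illustration of how the two SNP tables might be read, the sketch below parses a table with #CHROM and POS columns followed by one column per isolate. The layout is an assumption, since this record does not describe it: tab-delimited text, a single header row, one row per marker, and any non-marker row (such as 'clonal_groups') recognisable because its POS field is not an integer. The function name read_snp_table is hypothetical.

```python
import csv
import io


def read_snp_table(handle, delimiter="\t"):
    """Parse a SNP table with #CHROM and POS columns plus one column per isolate.

    Assumed layout (not confirmed by the record): tab-delimited text with a
    single header row. Rows whose POS field is not an integer (for example a
    'clonal_groups' row) are returned separately, keyed by their first field.
    """
    rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    header = next(row for row in rows if "#CHROM" in row and "POS" in row)
    chrom_i, pos_i = header.index("#CHROM"), header.index("POS")
    isolates = [name for i, name in enumerate(header) if i not in (chrom_i, pos_i)]

    markers, other_rows = [], {}
    for row in rows:
        if row is header:
            continue
        if len(row) == len(header) and row[pos_i].isdigit():
            alleles = [v for i, v in enumerate(row) if i not in (chrom_i, pos_i)]
            markers.append((row[chrom_i], int(row[pos_i]), dict(zip(isolates, alleles))))
        else:
            other_rows[row[0]] = row[1:]
    return isolates, markers, other_rows


# Made-up miniature table, only to show the call.
demo = "#CHROM\tPOS\tiso1\tiso2\nclonal_groups\t\tg1\tg1\n1\t1200\tA\tG\n"
isolates, markers, other_rows = read_snp_table(io.StringIO(demo))
```

For the real files, the handle would come from open(path, newline=""); the delimiter should be checked against the file before relying on the parse.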

Assembly.zip: Fasta-format genome assemblies. Low-quality reads were removed using the software CUTADAPT. Reads were assembled with ABYSS 2.2.3 using several K-mer sizes; for each isolate, the assembly with the highest N50 was retained for further analyses. Repeated regions were masked using REPEATMASKER 4.1.0.
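
The selection of one assembly per isolate by N50 can be sketched as follows. This is a minimal illustration, not the authors' script: N50 is computed with the usual definition (the length of the contig at which the cumulative length, taken from the longest contig downwards, reaches half of the total assembly length), and the K-mer sizes in the example are made up.

```python
import io


def fasta_lengths(handle):
    """Return the length of every sequence in a Fasta-format handle."""
    lengths, current, seen_header = [], 0, False
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                lengths.append(current)
            seen_header, current = True, 0
        elif line and seen_header:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the cumulative length reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def best_kmer(assemblies):
    """Pick the K-mer size whose assembly has the highest N50.

    `assemblies` maps a K-mer size to the list of contig lengths of the
    assembly produced with that size.
    """
    return max(assemblies, key=lambda k: n50(assemblies[k]))


# Hypothetical contig lengths for two K-mer sizes of one isolate.
chosen = best_kmer({41: [10, 10, 10], 61: [25, 5]})
```

Ties between K-mer sizes are resolved here by dictionary order; the record does not say how ties were handled.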

Annotation.zip: GFF-format gene models, and corresponding protein and DNA sequences. Genes were predicted with BRAKER 2.1.5 using RNA-seq data (Pordel et al. 2020 doi:10.1094/PHYTO-09-20-0423-R) as extrinsic evidence for model refinement. Genes were also predicted with AUGUSTUS 3.4.0 (training set=Magnaporthe grisea), and AUGUSTUS gene models that did not overlap any BRAKER gene model were added to the GFF file generated with BRAKER.
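
The combination of the two gene sets can be illustrated with a small interval check. The record does not define "overlap", so this sketch assumes the simplest reading: two gene models overlap if they lie on the same sequence and share at least one base, regardless of strand. The Gene type and the function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive, as in GFF
    end: int    # inclusive
    source: str


def overlaps(a, b):
    """True if two genes share at least one base on the same sequence."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def merge_gene_models(braker, augustus):
    """Keep every BRAKER gene; add AUGUSTUS genes overlapping no BRAKER gene."""
    braker_by_seq = defaultdict(list)
    for gene in braker:
        braker_by_seq[gene.seqid].append(gene)

    merged = list(braker)
    for gene in augustus:
        if not any(overlaps(gene, b) for b in braker_by_seq[gene.seqid]):
            merged.append(gene)
    return sorted(merged, key=lambda g: (g.seqid, g.start, g.end))


# Made-up coordinates: the first AUGUSTUS gene overlaps the BRAKER gene and is dropped.
braker = [Gene("contig1", 100, 500, "BRAKER")]
augustus = [
    Gene("contig1", 450, 900, "AUGUSTUS"),
    Gene("contig1", 600, 900, "AUGUSTUS"),
    Gene("contig2", 100, 200, "AUGUSTUS"),
]
merged = merge_gene_models(braker, augustus)
```

The pairwise comparison is quadratic per sequence, which is acceptable for a sketch; an interval tree or a sorted sweep would scale better on full annotations.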

secretome_aln.zip: Alignments of all groups of orthologs corresponding to putative effector proteins. Homology relationships among predicted genes were established using ORTHOFINDER v2.4.0. Putative effector genes were identified as genes encoding proteins that were predicted to be secreted by at least two of three methods (SIGNALP 4.1, TARGETP and PHOBIUS), had no predicted transmembrane domain based on TMHMM analysis, had no predicted endoplasmic reticulum retention motif based on PS-SCAN, and had no CAZy annotation based on DBCAN V7. Sequences for each group of orthologs were aligned and cleaned with TRANSLATORX using default parameters.
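
The filtering rule for putative effectors can be written as a single predicate. The sketch below assumes that the output of each tool has already been reduced to one boolean per protein; the class and field names are hypothetical and do not correspond to any tool's output format.

```python
from dataclasses import dataclass


@dataclass
class ProteinPredictions:
    """Per-protein summary of tool outputs (field names are illustrative)."""
    signalp_secreted: bool
    targetp_secreted: bool
    phobius_secreted: bool
    has_tm_domain: bool           # from TMHMM
    has_er_retention_motif: bool  # from PS-SCAN
    has_cazy_annotation: bool     # from the CAZy annotation step


def is_putative_effector(p):
    """Secreted according to at least two of three methods, and no excluding feature."""
    secretion_votes = sum(
        [p.signalp_secreted, p.targetp_secreted, p.phobius_secreted]
    )
    return (
        secretion_votes >= 2
        and not p.has_tm_domain
        and not p.has_er_retention_motif
        and not p.has_cazy_annotation
    )


# Two of three secretion predictions and no excluding feature: retained.
example = ProteinPredictions(True, False, True, False, False, False)
keep = is_putative_effector(example)
```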

non-secretome_aln.zip: Alignments of all groups of orthologs corresponding to genes that do not encode putative effector proteins (i.e. the portion of the gene space that is the complement of what is included in secretome_aln.zip).

More details are available in the associated preprint: https://doi.org/10.1101/2020.06.02.129296

Files

Files (13.7 GB)

Checksum Size
md5:56c70e014fcdf331d53c1bf7b5b536c2 8.9 GB
md5:bc1976d25a34b8687e03022a8638268f 4.8 GB
md5:a094b2583662e908c34cfff9969f13cf 984.7 kB
md5:7dfe8a3e0881a6f6f30fb1c848b9bdb8 1.6 MB
md5:695c0a7a5d2b796fa367be6054f15a93 6.6 MB
md5:e6326c1e3eb2cfc2df213d2b6e7fddd1 154.3 kB
md5:b872e7ad421f59e209fcb83c12f531c5 28.7 kB
md5:005324c1538ae681dd39ddd9614d1f15 30.3 MB
md5:34a34ed7c11336fe73ea63d1c7ba543d 4.0 MB
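
A downloaded file can be checked against the md5 values listed above. The sketch below uses Python's standard hashlib module and reads the file in chunks, since the largest archives are several gigabytes. The function names are hypothetical, and the pairing of a local file with a listed checksum is left to the user because the listing above does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file with a checksum written as 'md5:<hex>' or as bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call would look like matches_listed_checksum("local_copy.zip", "md5:..."), with the checksum copied from the listing; a False result indicates either an incomplete download or a pairing with the wrong checksum.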

Additional details

Related works

Cites
Journal article: 10.1038/s41396-018-0100-6 (DOI)
Journal article: 10.1094/PHYTO-09-20-0423-R (DOI)
Journal article: 10.1128/mBio.01806-17 (DOI)
Is supplement to
Preprint: 10.1101/2020.06.02.129296 (DOI)