Published February 24, 2022 | Version 1
Dataset Open

Datasets for "The Venturia inaequalis effector repertoire is expressed in waves, and is dominated by expanded families with predicted structural similarity to avirulence proteins from other fungi"

  • 1. Massey University
  • 2. The New Zealand Institute for Plant and Food Research Limited
  • 3. La Trobe University

Description

Datasets for the preprint entitled "The Venturia inaequalis effector repertoire is expressed in waves, and is dominated by expanded families with predicted structural similarity to avirulence proteins from other fungi".

1) ViAnnotation.gff3
Gene annotation of Venturia inaequalis MNH120 (https://genome.jgi.doe.gov/Venin1/Venin1.home.html) generated as part of the study "The Venturia inaequalis effector repertoire is expressed in waves, and is dominated by expanded families with predicted structural similarity to avirulence proteins from other fungi".   
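GFF3 is a nine-column, tab-separated format in which column 3 holds the feature type and column 9 holds key=value attributes. The following is a minimal reader sketch for summarising ViAnnotation.gff3; it assumes the file follows the GFF3 specification, and the feature type names it would report (gene, mRNA, CDS and so on) are not confirmed by this description. The helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import unquote


def _feature_rows(lines):
    """Yield the nine tab-separated columns of each GFF3 feature line."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # GFF3 allows an embedded FASTA section after this directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) == 9:
            yield cols


def parse_attributes(field):
    """Parse column 9 (key=value pairs separated by ';'), undoing %-escapes."""
    attrs = {}
    for item in field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[unquote(key.strip())] = unquote(value.strip())
    return attrs


def count_feature_types(lines):
    """Count features by their type (column 3)."""
    return Counter(cols[2] for cols in _feature_rows(lines))


def gene_ids(lines, feature_type="gene"):
    """Return the ID attribute of every feature of the given type."""
    return [
        parse_attributes(cols[8]).get("ID")
        for cols in _feature_rows(lines)
        if cols[2] == feature_type
    ]
```

Because an open text file iterates line by line, `count_feature_types(open("ViAnnotation.gff3"))` would give a per-type tally without loading the whole annotation into memory.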

Gene reannotation was performed to include genes that were likely missed in the previous annotation by Deng et al. (2017), especially genes encoding putative effector proteins, which are difficult to predict. For this purpose, we used a three-step approach. In the first step, coding sequences (CDSs) from V. inaequalis isolate 05/172, predicted as part of a previous study by Passey et al. (2018) (https://journals.asm.org/doi/full/10.1128/MRA.01062-18), were downloaded from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/nuccore/QFBF00000000.1/) and mapped to the MNH120 genome using GMAP v2021-02-22. In the second step, RNA-seq reads from one biological replicate of each in planta time point of Malus domestica infection by V. inaequalis (12 hours post-inoculation [hpi], 24 hpi, 2 days post-inoculation [dpi], 3 dpi, 5 dpi and 7 dpi), as well as from one sample representing growth of the fungus in culture, were mapped to the MNH120 genome using HISAT2 v2.2.1. A genome-guided de novo transcriptome assembly was then performed using Trinity v2.12.0, and likely CDSs were identified using TransDecoder v5.5.0 (https://github.com/TransDecoder/TransDecoder) with a minimum open reading frame (ORF) length of 50 amino acids. Finally, in the third step, all annotations were visualized in Geneious v9.05, together with the previous annotation from Deng et al. (2017), and manually curated to create a consensus prediction. Note: this reannotation was generated with the aim of identifying as many genes as possible and, as a result, it contains many spurious genes.
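To illustrate what the 50-amino-acid minimum ORF length means, here is a simplified sketch that scans the three forward reading frames for complete ORFs (ATG through a stop codon). It is not TransDecoder's algorithm: TransDecoder also scans the reverse strand, reports partial ORFs and applies further scoring, none of which is modelled here. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=50):
    """Find complete forward-strand ORFs (ATG ... stop) encoding >= min_aa residues.

    Returns (start, end) as 0-based, half-open coordinates that include the
    stop codon. In each frame the first ATG after the previous stop (or the
    sequence start) opens the ORF, so only the longest ORF ending at a given
    stop codon is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_residues = (i - start) // 3  # codons before the stop
                if n_residues >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Lowering `min_aa` admits shorter ORFs, which is the trade-off behind the note above: small effector genes are retained, at the cost of more spurious predictions.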

2) Protein_sequences_ViAnnotation.fasta
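No description accompanies this file; its name suggests it holds the protein translations of the genes in ViAnnotation.gff3. A minimal parsing sketch follows, assuming standard multi-record FASTA in which the first word of each header line is the identifier; whether the sequences end in a `*` stop symbol is not stated, so the length helper tolerates either case. Both function names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited word of the header line.
    """
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def protein_lengths(records):
    """Residue counts, ignoring a trailing '*' stop symbol if one is present."""
    return {name: len(seq.rstrip("*")) for name, seq in records.items()}
```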

3) ECs_Families_AlphaFold.zip

This dataset consists of predicted protein tertiary structures of the main member of each up-regulated V. inaequalis effector candidate family. Structures were predicted using AlphaFold2 via the ColabFold notebook (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb#scrollTo=rowN0bVYLe9n). Where an effector candidate had fewer than 30 proteins with amino acid sequence similarity in the NCBI database, a custom multiple sequence alignment (MSA) was generated and used as input for AlphaFold2. Mature protein sequences were used as input.
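AlphaFold2 writes its per-residue confidence score (pLDDT, 0 to 100) into the B-factor column of the PDB files it produces. The sketch below summarises model confidence for an archive such as this one, assuming (not confirmed by this description) that the zip contains PDB-format models; it averages the value over Cα atoms only, so each residue counts once. The function names are hypothetical.

```python
import zipfile


def mean_plddt(pdb_text):
    """Average the B-factor column over CA atoms of a PDB-format model.

    For AlphaFold2 models the B-factor field holds pLDDT, so this gives one
    mean confidence value per model.
    """
    values = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, B-factor in 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms found")
    return sum(values) / len(values)


def plddt_per_model(zip_path):
    """Return {member name: mean pLDDT} for every .pdb file in a zip archive."""
    scores = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".pdb"):
                text = archive.read(member).decode("utf-8", errors="replace")
                scores[member] = mean_plddt(text)
    return scores
```

For example, `plddt_per_model("ECs_Families_AlphaFold.zip")` would list a mean pLDDT per family representative, if the archive layout matches the assumption above.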

4) singletons_AlphaFold_OpenSourceCASP14.zip

This dataset consists of predicted protein tertiary structures of up-regulated V. inaequalis singleton effector candidates. Structures were predicted using the AlphaFold open-source code (https://github.com/deepmind/alphafold), v2.0.1 and v2.1.0, with the casp14 preset and a maximum template date (max_template_date) of 2020-05-14. Mature protein sequences were used as input.
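For secreted effector candidates, the mature sequence is usually the protein after removal of its predicted N-terminal signal peptide. The description does not say how the cleavage site was predicted, so `signal_peptide_length` below is a hypothetical input (for instance, the position after which a predictor such as SignalP places the cleavage site). This sketch trims the sequence and writes it as FASTA, the input format that AlphaFold's open-source pipeline reads; both helper names are hypothetical.

```python
def mature_sequence(full_seq, signal_peptide_length):
    """Drop the first `signal_peptide_length` residues (the predicted signal
    peptide), i.e. cleave between that position and the next one."""
    if not 0 < signal_peptide_length < len(full_seq):
        raise ValueError("signal peptide length must lie within the sequence")
    return full_seq[signal_peptide_length:]


def to_fasta(name, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width`."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```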

5) ECs_Avrs_phytopathogens_AlphaFold.zip

Predicted tertiary structures of avirulence (Avr) proteins and candidate Avr proteins from other fungal pathogens included in the "The Venturia inaequalis effector repertoire is expressed in waves, and is dominated by expanded families with predicted structural similarity to avirulence proteins from other fungi" study. These structures were predicted using AlphaFold2 via the ColabFold notebook (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb#scrollTo=rowN0bVYLe9n). Mature protein sequences were used as input.
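The description does not state which tool was used to assess structural similarity between V. inaequalis effector candidates and these Avr proteins. As a generic illustration only, the sketch below computes the root-mean-square deviation (RMSD) of two coordinate sets after optimal superposition with the Kabsch algorithm. It assumes the residues are already paired one-to-one (equal-length Cα lists), which dedicated structural aligners do not require, so it is not a substitute for them.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two equal-length, already-paired coordinate sets (N x 3)
    after optimal superposition (translation + rotation, Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # d = -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

The determinant correction forces a proper rotation, so mirror-image structures are not reported as identical.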

If you have any questions about the datasets, please contact us.
Mercedes Rocafort: m.rocafort.ferrer@massey.ac.nz
Carl Mesarich: c.mesarich@massey.ac.nz

Files (2.5 GB)

md5:a377e2976a7b5f78c2c94045d093300f  3.7 MB
md5:aaac14b3aaeaeffc172420ab3c409c98  76.3 MB
md5:dd50b51e70d001c81651ba7353e5357f  7.0 MB
md5:b143383fe3bc10af0f76d85d63860608  2.4 GB
md5:77948bcf55fe146136b156bf68e87fb1  11.7 MB
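Each file in the listing has an MD5 checksum, which can be used to confirm that a download, particularly the 2.4 GB archive, arrived intact. A minimal sketch using Python's standard hashlib module follows; it reads in chunks so large archives need not fit in memory. The helper names are hypothetical.

```python
import hashlib


def md5_of_file(source, chunk_size=1 << 20):
    """Hex MD5 digest of a file path or an open binary stream, read in chunks."""
    digest = hashlib.md5()
    if hasattr(source, "read"):
        stream, close = source, False
    else:
        stream, close = open(source, "rb"), True
    try:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    finally:
        if close:
            stream.close()
    return digest.hexdigest()


def listed_checksums(listing_text):
    """Collect the hex digests from entries such as 'md5:<32 hex digits>'."""
    return {
        word[len("md5:"):].lower()
        for word in listing_text.split()
        if word.startswith("md5:")
    }
```

A downloaded file is intact if, for example, `md5_of_file("ViAnnotation.gff3") in listed_checksums(listing)` is true, where `listing` is the checksum listing copied as text.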