Published February 8, 2022 | Version v1
Dataset Open

Transposable element annotation Rhynchosporium commune isolate UK7

  • 1. Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
  • 2. School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

Description

To obtain a consensus sequence for each TE family, RepeatModeler v. open-4.0.7 (http://www.repeatmasker.org/RepeatModeler/) was run on the R. commune UK7 reference genome. The classification was based on the GIRI Repbase (v. 2018) using RepeatMasker v. open-4.0.7 (Smit, Hubley, and P. 2015; Bao, Kojima, and Kohany 2015). We used WICKERsoft to finalize the classification of TE consensus sequences (Breen et al. 2010). Specifically, we used WICKERsoft to screen for copies of known consensus sequences from other fungal species with blastn, filtering for sequence identity > 80% and sequence length > 80% (Altschul et al. 1997). Then, using WICKERsoft, flanks of 10,000 bp were added and visually inspected for sequence similarity and terminal repeats with dot plots. Subsequent multiple sequence alignments were performed with 10-15 sequences using ClustalW (Thompson, Higgins, and Gibson 1994). Alignment boundaries were visually inspected in WICKERsoft and trimmed if necessary. Using WICKERsoft, consensus sequences were classified according to the presence and type of terminal repeats, as well as homology of the encoded proteins based on blastx against the NCBI protein database. Consensus sequences were named according to the three-letter classification system (Wicker et al. 2007). The reference genome was annotated with the curated consensus sequences using RepeatMasker v. open-4.0.7 with a cut-off value of 250 (Smit, Hubley, and P. 2015). Simple repeats, low-complexity regions and annotated elements shorter than 100 bp were filtered out. Adjacent identical TEs overlapping by more than 100 bp were merged as belonging to the same TE family. Different TE families overlapping by more than 100 bp were considered nested insertions and were renamed accordingly. Identical elements separated by less than 200 bp are indicative of interrupted elements and were grouped into a single element. TEs overlapping genes were recovered using the bedtools v. 2.27.1 suite and the “overlap” function (Quinlan and Hall 2010).
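
The filtering and merging rules described above can be illustrated with a minimal Python sketch. This is not the script used to produce the dataset: the `Hit` record, the function name `filter_and_merge`, the 0-based half-open coordinates and the family names in the example are assumptions made for illustration. Only the thresholds (100 bp minimum length, more than 100 bp overlap, less than 200 bp separation) come from the description. Renaming of nested insertions between different families is not covered, and same-family overlaps of 1 to 100 bp are left unmerged, following a literal reading of the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the description above.
MIN_LENGTH = 100   # annotated elements shorter than this (bp) are dropped
MIN_OVERLAP = 100  # same-family copies overlapping by more than this are merged
MAX_GAP = 200      # same-family copies separated by less than this are grouped


@dataclass
class Hit:
    """One annotated element (hypothetical record layout)."""
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # 0-based, exclusive (assumed convention)
    family: str


def filter_and_merge(hits: List[Hit]) -> List[Hit]:
    """Drop short elements, then merge or group copies of the same family."""
    kept = [h for h in hits if h.end - h.start >= MIN_LENGTH]
    # Sort so that copies of one family on one sequence are consecutive.
    kept.sort(key=lambda h: (h.chrom, h.family, h.start))

    merged: List[Hit] = []
    for h in kept:
        prev = merged[-1] if merged else None
        if prev and prev.chrom == h.chrom and prev.family == h.family:
            overlap = prev.end - h.start  # positive: overlap, negative: gap
            gap = -overlap
            if overlap > MIN_OVERLAP or 0 <= gap < MAX_GAP:
                # Extend the previous element instead of adding a new one.
                prev.end = max(prev.end, h.end)
                continue
        # Append a copy so the caller's input records are never modified.
        merged.append(Hit(h.chrom, h.start, h.end, h.family))

    return sorted(merged, key=lambda h: (h.chrom, h.start))
```

Calling `filter_and_merge` on a list of `Hit` records parsed from RepeatMasker output would return one record per merged or grouped element, sorted by position.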

Files

Files (57.5 MB)

md5:0394006467aaeb612fe675744d2c9e06 (56.3 MB)
md5:25d2dc718c56d686c915a25c03228d71 (1.1 MB)