Distinguishing between canonical and non-canonical tRNA genes reveals that Thermococcaceae adhere to the standard archaeal tRNA gene set
Creators
- 1. Centrum Wiskunde en Informatica
- 2. University of Amsterdam Institute for Biodiversity and Ecosystem Dynamics
- 3. University of Applied Sciences Leiden
- 4. Royal Netherlands Institute for Sea Research
- 5. Max Planck Institute for Evolutionary Biology
Description
Abstract
Automated genome annotation is an essential tool for extracting biological information from sequence data. The identification and annotation of tRNA genes is frequently performed by the software package tRNAscan-SE, the output of which is listed, for selected genomes, in the Genomic tRNA database (GtRNAdb). Given the central role of tRNA in molecular biology, the accuracy and proper application of tRNAscan-SE are important both for interpretation of the output and for continued improvement of the software. Here, we report a manual annotation of the predicted tRNA gene sets for 20 complete genomes from the archaeal taxon Thermococcaceae. According to GtRNAdb, these 20 genomes contain a number of putative deviations from the standard set of canonical tRNA genes in Archaea. However, manual annotation reveals that only one represents a true divergence; the other instances are either (i) non-canonical tRNA genes resulting from the integration of horizontally transferred genetic elements or from CRISPR-Cas activity, or (ii) attributable to errors in the input DNA sequence. To distinguish between canonical and non-canonical archaeal tRNA genes, we recommend combining the automated pseudogene detection of tRNAscan-SE with the tRNAscan-SE isotype score. This greatly reduces manual annotation effort and leads to improved predictions of tRNA gene sets in Archaea.
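The recommendation at the end of the abstract can be illustrated with a minimal Python sketch. It shows one possible reading of "combining" the two signals: a prediction is kept as a canonical candidate only if it is not flagged as a pseudogene and its isotype score reaches a cutoff. The record type `TrnaPrediction`, its field names, and the cutoff value `MIN_ISOTYPE_SCORE` are assumptions made for illustration; the abstract does not state a threshold, and the authors' actual workflow is in the Rmd file described below.

```python
from dataclasses import dataclass


@dataclass
class TrnaPrediction:
    """One predicted tRNA gene. Field names are illustrative, not tRNAscan-SE's own."""
    name: str
    isotype: str
    isotype_score: float
    is_pseudogene: bool


# Placeholder cutoff: the abstract gives no numeric threshold, so this is an assumption.
MIN_ISOTYPE_SCORE = 50.0


def is_canonical_candidate(pred, min_score=MIN_ISOTYPE_SCORE):
    """Keep a prediction only if it passes both checks."""
    return (not pred.is_pseudogene) and pred.isotype_score >= min_score


def split_predictions(preds, min_score=MIN_ISOTYPE_SCORE):
    """Return (canonical candidates, predictions flagged for manual review)."""
    canonical = [p for p in preds if is_canonical_candidate(p, min_score)]
    flagged = [p for p in preds if not is_canonical_candidate(p, min_score)]
    return canonical, flagged
```

Predictions in the second list are not necessarily wrong; under this sketch they are simply the ones that would still need manual inspection.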
Repository contents
01_workflow_tRNAscanSE_predictions_210archaea.html contains the workflow and graphical output for tRNA gene set predictions in 20 Thermococcaceae genomes and 210 archaeal genomes. Files 03 to 06 below are the files referenced in this workflow.
02_workflow_tRNAscanSE_predictions_210archaea.Rmd contains the markdown file associated with 01_workflow_tRNAscanSE_predictions_210archaea.html above.
03_thermo_trnas_GtRNAdb.txt contains the predicted tRNA gene sets of 20 Thermococcaceae genomes as listed on GtRNAdb (Data Release 19 (June 2021)).
04_Archaea_genome_list.txt contains the details of all 217 archaeal genomes listed on GtRNAdb (Data Release 19 (June 2021)). The seven genomes for which the NCBI genome sequences were no longer available are indicated by #### preceding the name.
05_thermo_tRNAs_genome.txt contains the predicted tRNA gene sets of 20 Thermococcaceae genomes as predicted by locally run tRNAscan-SE (version 2.0.6), with standard settings for Archaea (option -A). To display the output, options -H and --detail were added. We note that pseudogene detection is active under these conditions.
06_Archaea_210_GtRNAdb_tRNAs.txt contains the predicted tRNA gene sets of the 210 archaeal genomes as listed on GtRNAdb (Data Release 19 (June 2021)).
07_Archaea_210genomes_tRNAs.txt contains the predicted tRNA gene sets of the 210 archaeal genomes as predicted by locally run tRNAscan-SE (version 2.0.6), with standard settings for Archaea (option -A). To display the output, options -H and --detail were added. We note that pseudogene detection is active under these conditions.
08_NCBI_genomes.zip contains the NCBI GenBank genome sequence files used in this study. These include the 20 Thermococcaceae genomes, the wider 210 archaeal genomes, and several others of interest.
09_phylogeny.tar.zip contains the data used to draw a phylogenetic tree for the 20 Thermococcaceae organisms. The folder includes a file listing the details of all data in the folder (Readme.md), a workflow file (workflow_UndinMarkers_v2.md), and data folders.
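For readers who want to regenerate the locally run predictions described above, the following Python sketch assembles a tRNAscan-SE command line with the options named in this record (-A, -H, --detail). The `-o` output option, the FASTA input file, and the function names are assumptions that are not stated in the record; in particular, the genome files in 08_NCBI_genomes.zip are GenBank files, so this sketch assumes a FASTA version of the sequence is already available.

```python
import shutil
import subprocess


def build_trnascan_command(fasta_path, output_path):
    """Assemble the command line; -A, -H and --detail are the options named in the record."""
    return [
        "tRNAscan-SE",
        "-A",          # archaeal search mode
        "-H",          # option added by the authors to display the output
        "--detail",    # option added by the authors to display the output
        "-o", output_path,  # assumption: write the results to this file
        fasta_path,    # assumption: genome sequence in FASTA format
    ]


def run_trnascan(fasta_path, output_path):
    """Run tRNAscan-SE if it is installed; raise a clear error otherwise."""
    if shutil.which("tRNAscan-SE") is None:
        raise FileNotFoundError("tRNAscan-SE was not found on PATH")
    subprocess.run(build_trnascan_command(fasta_path, output_path), check=True)
```

Building the command separately from running it makes it easy to print and inspect the exact invocation before any genome is processed. Results may differ from files 05 and 07 if a tRNAscan-SE version other than 2.0.6 is installed.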
Notes
The extended TIGRFAM database referred to in the phylogenetic tree construction process can be found at https://zenodo.org/record/3839790#.YjByaVzMI3g
The perl script used during phylogenetic tree construction, catfasta2phyml.pl, is available in the GitHub repository https://github.com/nylander/catfasta2phyml
tRNAscan-SE is freely available online (http://lowelab.ucsc.edu/tRNAscan-SE/)
GtRNAdb is a publicly accessible resource available online (http://gtrnadb.ucsc.edu/)
NCBI is a publicly accessible resource available online (https://www.ncbi.nlm.nih.gov/)
rrnDB is a publicly accessible resource available online (https://rrndb.umms.med.umich.edu/)
BLAST is a publicly accessible resource available online (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
Files
(201.1 MB)
Checksum | Size |
---|---|
md5:a238b9220eae9b107699421c1405208c | 1.3 MB |
md5:23dd47e0726f5af6feecec1a8a70c4ac | 17.6 kB |
md5:2947ffd7bc2b1ab33cea2b1021292340 | 74.6 kB |
md5:0008576c72d5a4eb96e514bbe68420c8 | 19.5 kB |
md5:45ec72b43bdc2b569e474c617eceb1af | 93.0 kB |
md5:f5dcc4eca8a2b893ff372db9f02057b8 | 846.9 kB |
md5:ff8d25273ec4613fb194dffc8e2b5f42 | 1.0 MB |
md5:57884053b86306ad81708716703e696c | 183.4 MB |
md5:eb2cac2d49cc32bb855ae1b1a7cb58b1 | 14.4 MB |
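The md5 values listed for the files can be used to check a download for corruption. The Python sketch below computes the MD5 digest of a local file and compares it with a listed value. The function names are illustrative, and because the listing does not pair checksums with file names, the caller has to supply the checksum that belongs to the file being checked.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<32 hex characters>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks matters mainly for the largest archive (183.4 MB), which would otherwise be loaded into memory in one piece.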