Published July 5, 2022 | Version v1
Dataset Open

Distinguishing between canonical and non-canonical tRNA genes reveals that Thermococcaceae adhere to the standard archaeal tRNA gene set

  • 1. Centrum Wiskunde en Informatica
  • 2. University of Amsterdam Institute for Biodiversity and Ecosystem Dynamics
  • 3. University of Applied Sciences Leiden
  • 4. Royal Netherlands Institute for Sea Research
  • 5. Max Planck Institute for Evolutionary Biology

Description

Abstract

Automated genome annotation is an essential tool for extracting biological information from sequence data. The identification and annotation of tRNA genes are frequently performed by the software package tRNAscan-SE, the output of which is listed, for selected genomes, in the Genomic tRNA Database (GtRNAdb). Given the central role of tRNA in molecular biology, the accuracy and proper application of tRNAscan-SE are important for both the interpretation of its output and the continued improvement of the software. Here, we report a manual annotation of the predicted tRNA gene sets for 20 complete genomes from the archaeal taxon Thermococcaceae. According to GtRNAdb, these 20 genomes contain a number of putative deviations from the standard set of canonical tRNA genes in Archaea. However, manual annotation reveals that only one represents a true divergence; the other instances are either (i) non-canonical tRNA genes resulting from the integration of horizontally transferred genetic elements or from CRISPR-Cas activity, or (ii) attributable to errors in the input DNA sequence. To distinguish between canonical and non-canonical archaeal tRNA genes, we recommend combining automated pseudogene detection by tRNAscan-SE with the tRNAscan-SE isotype score, which greatly reduces manual annotation effort and leads to improved predictions of tRNA gene sets in Archaea.

 

Repository contents

01_workflow_tRNAscanSE_predictions_210archaea.html contains the workflow and graphical output for tRNA gene set predictions in 20 Thermococcaceae genomes and 210 archaeal genomes. Files 03 to 06 below are the files referenced in this workflow.

02_workflow_tRNAscanSE_predictions_210archaea.Rmd contains the markdown file associated with 01_workflow_tRNAscanSE_predictions_210archaea.html above.

03_thermo_trnas_GtRNAdb.txt contains the predicted tRNA gene sets of 20 Thermococcaceae genomes as listed on GtRNAdb (Data Release 19, June 2021).

04_Archaea_genome_list.txt contains the details of all 217 archaeal genomes listed on GtRNAdb (Data Release 19, June 2021). The seven genomes for which the NCBI genome sequences were no longer available are indicated by #### preceding the name.
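A minimal sketch of how the genome list could be split into available and unavailable entries. It assumes one genome per line with the #### marker at the very start of the line; the function name split_genome_list is hypothetical, and the real column layout of 04_Archaea_genome_list.txt may differ.

```python
from typing import Iterable, List, Tuple

MARKER = "####"  # prefix used in the genome list for unavailable genomes


def split_genome_list(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split genome-list lines into (available, unavailable).

    Assumption: one genome per line, with unavailable genomes
    prefixed by MARKER. Blank lines are skipped.
    """
    available: List[str] = []
    unavailable: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(MARKER):
            # Drop the marker and any whitespace that follows it.
            unavailable.append(line[len(MARKER):].strip())
        else:
            available.append(line)
    return available, unavailable
```

Passing an open file handle for 04_Archaea_genome_list.txt to split_genome_list should, if the file has no header line and matches the assumed layout, give 210 available and 7 unavailable entries.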

05_thermo_tRNAs_genome.txt contains the predicted tRNA gene sets of 20 Thermococcaceae genomes as predicted by locally run tRNAscan-SE (version 2.0.6), with standard settings for Archaea (option -A). To display the output, options -H and --detail were added. We note that pseudogene detection is active under these conditions.

06_Archaea_210_GtRNAdb_tRNAs.txt contains the predicted tRNA gene sets of the 210 archaeal genomes as listed on GtRNAdb (Data Release 19, June 2021).

07_Archaea_210genomes_tRNAs.txt contains the predicted tRNA gene sets of the 210 archaeal genomes as predicted by locally run tRNAscan-SE (version 2.0.6), with standard settings for Archaea (option -A). To display the output, options -H and --detail were added. We note that pseudogene detection is active under these conditions.
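The local tRNAscan-SE runs described for files 05 and 07 can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact invocation: the input is assumed to be a FASTA file, the file names are hypothetical, and the -o option is assumed as the way the results table was saved (the description does not say how output was captured).

```python
import shutil
import subprocess
from typing import List


def build_trnascan_command(genome_fasta: str, output_path: str) -> List[str]:
    """Build a tRNAscan-SE argument list with the options named in the text.

    -A        archaeal search mode
    -H        show the breakdown of score components
    --detail  show the detailed output view
    -o        write the results table to output_path (assumed, see lead-in)
    """
    return ["tRNAscan-SE", "-A", "-H", "--detail", "-o", output_path, genome_fasta]


def run_trnascan(genome_fasta: str, output_path: str) -> None:
    """Run tRNAscan-SE if the executable is installed and on PATH."""
    if shutil.which("tRNAscan-SE") is None:
        raise FileNotFoundError("tRNAscan-SE executable not found on PATH")
    subprocess.run(build_trnascan_command(genome_fasta, output_path), check=True)
```

Building the command as a list and passing it to subprocess.run avoids shell quoting issues with genome file names. No flag is needed for pseudogene detection here, since the description notes it is already active under these settings.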

08_NCBI_genomes.zip contains the NCBI GenBank genome sequence files used in this study. These include the 20 Thermococcaceae genomes, the wider 210 archaeal genomes, and several others of interest. 

09_phylogeny.tar.zip contains the data used to draw a phylogenetic tree for the 20 Thermococcaceae organisms. The folder includes a file listing the details of all data in the folder (Readme.md), a workflow file (workflow_UndinMarkers_v2.md), and data folders.

 

Notes

The extended TIGRFAM database referred to in the phylogenetic tree construction process can be found at https://zenodo.org/record/3839790#.YjByaVzMI3g

The perl script used during phylogenetic tree construction, catfasta2phyml.pl, is available in the GitHub repository https://github.com/nylander/catfasta2phyml

tRNAscan-SE is freely available online (http://lowelab.ucsc.edu/tRNAscan-SE/)

GtRNAdb is a publicly accessible resource available online (http://gtrnadb.ucsc.edu/)

NCBI is a publicly accessible resource available online (https://www.ncbi.nlm.nih.gov/)

rrnDB is a publicly accessible resource available online (https://rrndb.umms.med.umich.edu/)

BLAST is a publicly accessible resource available online (https://blast.ncbi.nlm.nih.gov/Blast.cgi)

Files

Files (201.1 MB)

Checksum                               Size
md5:a238b9220eae9b107699421c1405208c   1.3 MB
md5:23dd47e0726f5af6feecec1a8a70c4ac   17.6 kB
md5:2947ffd7bc2b1ab33cea2b1021292340   74.6 kB
md5:0008576c72d5a4eb96e514bbe68420c8   19.5 kB
md5:45ec72b43bdc2b569e474c617eceb1af   93.0 kB
md5:f5dcc4eca8a2b893ff372db9f02057b8   846.9 kB
md5:ff8d25273ec4613fb194dffc8e2b5f42   1.0 MB
md5:57884053b86306ad81708716703e696c   183.4 MB
md5:eb2cac2d49cc32bb855ae1b1a7cb58b1   14.4 MB
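The md5 values listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the pairing of each checksum with its file should be taken from the Zenodo record itself, since the listing here does not show file names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:abc...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Chunked reading matters mainly for the largest archive (183.4 MB); the small text files could be hashed in one read.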

Additional details

Funding

European Commission: ASymbEL – A multilevel approach to address the role of Archaeal Symbionts in the Evolution of Life (grant no. 947317)