
Published February 4, 2025 | Version v1
Dataset | Open access

MetaTIS: A tool to predict eukaryotic translation initiation sites

  • 1. Saarland University

Description

When scanning mRNA sequences, ribosomes typically start translation at an AUG codon flanked by a so-called Kozak region. However, the first AUG of an mRNA is not always effective in initiating translation, and in rare cases near-cognate codons may also be recognized as start sites. The ribosome profiling technique, in which translation elongation is stalled by chemical agents, can identify actively used translation start sites. In the past, various bioinformatics classifiers have been trained on such data to predict putative translation initiation sites from mRNA sequence features. Here, we formulated a stacking approach that differentiates between false and true translation initiation sites. It was trained on experimental data for translation initiation in HEK293 cells produced by the so-called TISCA protocol. Our classifier performed well on its own test set (accuracy 0.93) as well as on multiple external validation sets. Moreover, it was able to predict almost quantitatively whether overlapping open reading frames suppress translation from the main ORF for 11 genes in HeLa cells, as validated by experimental luciferase assays. The MetaTIS tool is publicly available as a webserver.
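To make the notion of candidate start sites concrete, the following Python sketch lists every AUG and, optionally, every near-cognate codon in a sequence. It is not part of MetaTIS: the function names are hypothetical, and "near-cognate" is assumed here to mean the nine codons that differ from AUG at exactly one position. Deciding which of these candidates is actually used is the classification task the model addresses.

```python
from __future__ import annotations

START = "AUG"


def near_cognate_codons(codon: str = START) -> set[str]:
    """Return all codons that differ from `codon` at exactly one position."""
    result = set()
    for i, base in enumerate(codon):
        for sub in "ACGU":
            if sub != base:
                result.add(codon[:i] + sub + codon[i + 1:])
    return result


def candidate_start_sites(mrna: str, include_near_cognate: bool = False) -> list[tuple[int, str]]:
    """Return (0-based position, codon) for every AUG, and optionally every
    near-cognate codon, in all three frames, ordered 5' to 3'."""
    # Accept DNA-style input by converting T to U.
    mrna = mrna.upper().replace("T", "U")
    allowed = {START}
    if include_near_cognate:
        allowed |= near_cognate_codons()
    return [
        (i, mrna[i:i + 3])
        for i in range(len(mrna) - 2)
        if mrna[i:i + 3] in allowed
    ]
```

For example, in `GCCACCAUGGCUGAA` the sketch reports the AUG at position 6 and, with `include_near_cognate=True`, an additional CUG at position 10.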

The FlanksERF, KmersERF, and MetaTIS models, together with their training data, are provided below. For information on how these models are used, please refer to the GitHub repository. Each dataset consists of 229 columns:

  • Columns 1–168: upstream (U) and downstream (D) k-mers of sizes 1 to 3 (4 + 16 + 64 = 84 k-mers per side).
  • Column 169: the start codon used.
  • Column 170: the normalized efficiency value from Noderer et al., based on the flanking region.
  • Columns 171–210: the 20 upstream (U) and 20 downstream (D) nucleotides relative to the initiation site.
  • Columns 211–229: the relative binding scores of the 9 RNA-binding proteins (RBPs) considered. Some RBPs have multiple binding motifs, which is why there are 19 scores for 9 proteins.
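The column layout above can be sketched in Python as follows. This is an illustrative reconstruction, not code from the MetaTIS repository. The column names (`U_AUG`, `D_nt1`, `RBP_1` and so on) are hypothetical. Ordering k-mers by length and then lexicographically within each block is an assumption. Raw k-mer counts are used here, although the released files may store normalized frequencies. The length of the upstream and downstream windows used for k-mer counting is not stated, so `kmer_counts` simply counts over whatever sequence it is given.

```python
from __future__ import annotations

from itertools import product

# Assumption: RNA alphabet; the released files may use T instead of U.
BASES = "ACGU"


def kmer_vocab(max_k: int = 3) -> list[str]:
    """All k-mers of length 1..max_k, ordered by length, then lexicographically
    (4 + 16 + 64 = 84 for max_k = 3)."""
    return ["".join(p) for k in range(1, max_k + 1) for p in product(BASES, repeat=k)]


def kmer_counts(seq: str, max_k: int = 3) -> dict[str, int]:
    """Count overlapping k-mers of length 1..max_k in `seq`.
    K-mers containing characters outside BASES are skipped."""
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(kmer_vocab(max_k), 0)
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    return counts


def column_layout(n_flank: int = 20, n_rbp_scores: int = 19) -> list[str]:
    """Hypothetical column names mirroring the 229-column description."""
    vocab = kmer_vocab()
    cols = [f"U_{m}" for m in vocab] + [f"D_{m}" for m in vocab]  # 168 k-mer columns
    cols += ["start_codon", "noderer_efficiency"]                   # 2 columns
    cols += [f"U_nt{i}" for i in range(1, n_flank + 1)]             # 20 upstream nucleotides
    cols += [f"D_nt{i}" for i in range(1, n_flank + 1)]             # 20 downstream nucleotides
    cols += [f"RBP_{j}" for j in range(1, n_rbp_scores + 1)]        # 19 RBP binding scores
    return cols
```

`len(column_layout())` returns 229 (168 + 2 + 40 + 19), matching the column count stated in the description.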

Files

DownstreamNegatives.zip

Files (11.5 GB)

MD5 checksum                        Size
e1d56b02105bb82c38bf0c6adc3ba57a    9.9 GB
4c07f28ee3ffc40500a6674e6cf149bc    641.9 MB
7d360aaf75b11ce50234036cb772c578    263.0 MB
bc77862508b7f6f8772495db3307d4db    4.9 MB
b20d238ebb8506c228ba3bfc756334b4    688.0 MB
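Given the size of the record (11.5 GB in total), it is worth checking each download against the MD5 checksums listed above. The following minimal Python sketch uses the standard `hashlib` module and reads the file in chunks, so the 9.9 GB archive is never held in memory at once. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks of `chunk_size` bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected_md5: str) -> bool:
    """True if the file's MD5 digest matches `expected_md5` (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("path/to/downloaded_file.zip", "<checksum listed for that file>")` returns `True` only when the download is intact.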