Published August 12, 2021 | Version Pre-Print_Stage
Preprint Open

PRE-PRINT: Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species

  • 1. sharmaji@iscb.org
  • 2. suresh.kp@icar.gov.in
  • 3. divakar.hemadri@gmail.com
  • 4. sharanspin13@gmail.com
  • 5. dranirban.dadf@gmail.com, DAHD (NADCP), Govt of India, New Delhi

Description

Abstract:

Background-

So far, several research efforts have addressed Triplet Codon Block Shannon Entropy (TCBShE) computations in the context of genetic codons [6]. Although the dependence of block Shannon entropy values on GC% has been assessed, the position-specific values GC-1%, GC-2% and GC-3% have not yet been taken into consideration. Here, we use datasets from GC.evoBase (Dapeng Wang, 2016) to determine typical TCBShE values and arrive at a mathematical and numerical correlation worthy of biological interpretation.

Results-

We carried out a comprehensive survey of the GC-1,2,3% values of 1118 species from the GC.evoBase datasets, spanning 5 clades: 735 Fungi genomes, 68 Metazoa genomes, 44 Plant genomes, 186 Protist genomes and 85 Vertebrate genomes (Ensembl release). To compute TCBShE, we apply the appropriate formula based on the 64 codon trimers, binarily classified into 8 sets of 3 blocks: {000, 001, 010, 011, 100, 101, 110, 111}. We observe that the harmonic mean (HM) of these entropy values, which in the language of information theory and coding is the ratio of "mutual information to the complement of normalized variation of information", converges approximately to Napier's constant, the base of natural logarithms. For many model organisms, the TCBShE values themselves do the same. The HM of TCBShE for Protists is nearest to e ≈ 2.71828...
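The abstract does not state the TCBShE formula, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that bit i of a block is 1 when codon position i holds G or C, and that the three positions are independent, so that each of the 8 blocks 000 to 111 gets a probability built from GC-1%, GC-2% and GC-3%. The function name `block_entropy_bits` and the GC triples are hypothetical and are not GC.evoBase data.

```python
import math
from itertools import product
from statistics import harmonic_mean


def block_entropy_bits(gc1, gc2, gc3):
    """Shannon entropy (in bits) over the 8 binary blocks 000..111.

    Assumptions (not taken from the source): bit i is 1 when codon
    position i is G or C, and the three positions are independent.
    Inputs are GC percentages in the range 0..100.
    """
    p_gc = [gc1 / 100.0, gc2 / 100.0, gc3 / 100.0]
    entropy = 0.0
    for bits in product((0, 1), repeat=3):
        # Probability of this block under the independence assumption.
        p_block = 1.0
        for bit, p in zip(bits, p_gc):
            p_block *= p if bit else 1.0 - p
        if p_block > 0.0:  # a zero-probability block contributes nothing
            entropy -= p_block * math.log2(p_block)
    return entropy


# Hypothetical GC-1,2,3 % triples, for illustration only.
examples = [(50.0, 50.0, 50.0), (55.0, 42.0, 60.0), (48.0, 40.0, 35.0)]
values = [block_entropy_bits(*gc) for gc in examples]
print(values)
print(harmonic_mean(values))  # harmonic mean across the example "species"
```

Under these assumptions the entropy is bounded above by 3 bits, reached when all three positions sit at 50% GC, so any value near e ≈ 2.718 bits corresponds to a moderate departure from uniform GC usage.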

Conclusions-

Here, the approximation to Napier's constant that we obtain by considering the HM of TCBShE is a sort of lower bound and is expressed in bits. This may be corroborated by the direct implications of solving the HyperProteoGenomic equation, as follows:

https://www.wolframalpha.com/input/?i=Solve+4%5E%284%5Ex%29+%3D+20%5E%2820%5E1%29

where, in the equation above, 4^(4^x) = 20^(20^1), 4 = number of cDNA nucleotides (A|C|G|T) and 20 = number of amino acids; interestingly, the solution x is 99.9455% close to e, Napier's constant.
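The equation in the link can also be solved in closed form by taking logarithms twice: 4^(4^x) = 20^20 gives 4^x · ln 4 = 20 · ln 20, hence x = log_4(20 · ln 20 / ln 4). The short check below evaluates this numerically. The function and parameter names are illustrative, and "closeness" is taken here to mean the ratio x / e, which is an assumption about how the quoted percentage was derived.

```python
import math


def solve_tower_equation(n_nucleotides=4, n_amino_acids=20):
    """Solve n^(n^x) = a^(a^1) for x, with n = 4 and a = 20 by default."""
    # First logarithm: n^x * ln(n) = a * ln(a)
    inner = n_amino_acids * math.log(n_amino_acids) / math.log(n_nucleotides)
    # Second logarithm: x = log_n(inner)
    return math.log(inner) / math.log(n_nucleotides)


x = solve_tower_equation()
closeness = 100.0 * x / math.e  # assumed definition of "% close to e"
print(f"x = {x:.4f}")                 # x = 2.7168
print(f"closeness = {closeness:.3f}%")
```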

Moreover, we may envisage "predicting" the modulo-3 class (0, 1, 2) for the 1st, 2nd and 3rd codon positions by whole-exome sequencing: mapping SNPs/mutations to genomic loci (in 1-based indexing) leads us to GC-poor positions (when SNPs are biased to A/T) or GC-rich positions (when SNPs are biased to G/C), so that such point-mutational perturbations within codon positions might imply TCBShE --> e.

Notes

https://zenodo.org/record/5179552

Files

TCBShE.pdf (161.9 kB)
md5:ddbb4c837d3ab0979b430cc320b0f736