PRE-PRINT: Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species
Creators
- 1. sharmaji@iscb.org
- 2. suresh.kp@icar.gov.in
- 3. divakar.hemadri@gmail.com
- 4. sharanspin13@gmail.com
- 5. dranirban.dadf@gmail.com, DAHD (NADCP), Govt of India, New Delhi
Description
Abstract:
Background-
Several research efforts have addressed the computation of Triplet Codon Block Shannon Entropy (TCBShE) in the context of the genetic code [6]. Although the dependence of block Shannon entropy on GC% has been assessed, the position-specific values GC-1%, GC-2% and GC-3% have not yet been taken into consideration. Here, we use datasets from GC.evoBase (Dapeng Wang, 2016) to determine typical TCBShE values, and we arrive at a mathematical and numerical correlation that merits biological interpretation.
Results-
We surveyed the GC-1, GC-2 and GC-3 % values of 1118 species across 5 clades in the GC.evoBase datasets: 735 Fungi genomes, 68 Metazoa genomes, 44 Plant genomes, 186 Protist genomes and 85 Vertebrate (Ensembl-release) genomes. To compute TCBShE, we apply the appropriate formula, based on the 64 codon trimers classified in binary into 8 three-bit blocks: { 000, 001, 010, 011, 100, 101, 110 and 111 }. We observe that the Harmonic Mean (HM) of these entropy values converges approximately to Napier’s constant, the base of natural logarithms; in the language of information theory and coding, this HM is the ratio of “Mutual Information to complement of Normalized Variation of Information”. For many Model Organisms, the TCBShE values themselves also approximate this constant. The HM of TCBShE for “Protists” is nearest to e ~ 2.71828...
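The abstract does not reproduce the TCBShE formula, so the following is only a minimal sketch under stated assumptions: each codon position is coded 1 for G/C and 0 for A/T, the three positions are treated as independent, and the probability of each of the 8 blocks is the product of the per-position fractions. The function name `tcbshe_bits` and the sample GC values are hypothetical and are not taken from GC.evoBase.

```python
import math
from itertools import product
from statistics import harmonic_mean


def tcbshe_bits(gc1: float, gc2: float, gc3: float) -> float:
    """Shannon entropy (in bits) over the 8 binary codon blocks 000..111.

    ASSUMPTION (not from the source): bit 1 = G/C, bit 0 = A/T at each
    codon position, and the three positions are independent.
    Inputs are percentages in the range 0..100.
    """
    fractions = [gc1 / 100.0, gc2 / 100.0, gc3 / 100.0]
    entropy = 0.0
    for block in product((0, 1), repeat=3):
        # Block probability = product of per-position G/C or A/T fractions.
        p = math.prod(f if bit else 1.0 - f for bit, f in zip(block, fractions))
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical GC-1, GC-2, GC-3 percentages for three illustrative species.
sample_species = [(55.0, 42.0, 60.0), (48.0, 40.0, 35.0), (60.0, 45.0, 75.0)]
entropies = [tcbshe_bits(*gc) for gc in sample_species]
hm = harmonic_mean(entropies)
print("per-species entropy (bits):", [round(h, 4) for h in entropies])
print("harmonic mean (bits):", round(hm, 4))
```

Under these assumptions the entropy is bounded above by 3 bits, reached when all three GC fractions equal 50%, so values near e ~ 2.718 fall inside the attainable range.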
Conclusions-
Here, the approximation to Napier’s constant obtained from the HM of TCBShE is a kind of lower bound and is expressed in bits. This may be corroborated by the direct implications of solving the HyperProteoGenomic equation, as follows:
https://www.wolframalpha.com/input/?i=Solve+4%5E%284%5Ex%29+%3D+20%5E%2820%5E1%29
where, in the equation above, 4 = number of cDNA nucleotides (A|C|G|T) and
20 = number of amino acids; interestingly, x is 99.9455% close to e, Napier’s constant.
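The equation encoded in the link, 4^(4^x) = 20^(20^1), can also be checked numerically. The sketch below takes logarithms of both sides, which gives 4^x · ln 4 = 20 · ln 20 and hence x = log_4(20 · ln 20 / ln 4). The helper name `solve_double_exponential` is an illustrative choice and does not come from the preprint.

```python
import math


def solve_double_exponential(a: float, b: float) -> float:
    """Solve a**(a**x) == b**b for x.

    Taking natural logs: a**x * ln(a) = b * ln(b),
    so x = log_a(b * ln(b) / ln(a)).
    """
    return math.log(b * math.log(b) / math.log(a), a)


# 4 nucleotides (A|C|G|T) and 20 amino acids, as in the text.
x = solve_double_exponential(4, 20)
closeness = x / math.e * 100.0
print(f"x = {x:.5f}, e = {math.e:.5f}, x/e = {closeness:.4f}%")
```

This gives x of about 2.7168, in agreement with the stated closeness to e.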
Moreover, we may envisage “predicting” the Modulo-3 class (0, 1, 2) for the 1st, 2nd and 3rd codon positions by Whole Exome-Seq. Mapping SNPs/mutations to genomic loci (in 1-based indexing) indicates whether a position becomes GC-poor (when SNPs are biased to A/T) or GC-rich (when SNPs are biased to G/C), so that such point-mutational perturbations within codon positions might imply TCBShE --> e.
Files
TCBShE.pdf (161.9 kB)
md5:ddbb4c837d3ab0979b430cc320b0f736