CodonU: A Python Package for Codon Usage Analysis

Codon usage analysis (CUA) is served by several web servers and standalone programs written in different programming languages. This diversity points to the need for reusable software that can read and manipulate the relevant data and file formats and act as a pipeline for them. Such analyses currently rely on multiple tools to address the multifaceted aspects of CUA. We therefore propose CodonU, a package written in Python that integrates these aspects. It is compatible with existing file formats and can be used on its own or alongside other such packages. The package incorporates the statistical measures needed for codon usage analysis. The measures depend on the nature of the sequences: for nucleotide sequences, the codon adaptation index (CAI), codon bias index (CBI), tRNA adaptation index (tAI), etc., and for protein sequences, the Gravy score, etc. Users can also perform correspondence analysis (COA). The package additionally lets users generate graphics and build phylogenetic trees. Its capabilities were checked thoroughly on a genomic set of Staphylococcus aureus.


Souradipto Choudhuri and Keya Sau

Index Terms-Codon bias, codon usage, codon usage analysis, CodonW, correspondence analysis, phylogenetic analysis, tRNA analysis.

I. INTRODUCTION
SYNONYMOUS codons are sets of codons that code for the same amino acid. While the resulting protein remains unchanged, the preferential usage of particular codons can vary across species, genes, and even within different regions of a single gene. This phenomenon is known as codon usage bias (CUB) and has been investigated in bioinformatics for several decades [1]. The first work on CUB can be traced back to Grantham et al. in 1981 [2]. The analysis of CUB is known as codon usage analysis (CUA). It can be divided into two major parts: sequence-based analyses and correspondence analysis (COA).
Arguably the earliest software for calculating measures of CUB was CODONS [3], published in 1992. One of the most promising software packages for analysing biological data and preparing it for publication was GCG [4], developed by the University of Wisconsin in 1994. The first breakthrough for CUA software was CodonW by Peden [5], [6]. Prior to that, McInerney wrote a program named GCUA [7], which could show the COA data. In the 2000s, with the growing popularity of web servers, the Sequence Manipulation Suite (SMS) [8] was implemented. This suite includes Codon Usage (available at https://www.bioinformatics.org/sms2), which can calculate the usage bias. Around the same time, EMBOSS [9], a well-known suite for bioinformatics analysis, was developed. Coderet [10] (available at https://www.bioinformatics.nl/cgi-bin/emboss/coderet), a tool for extracting coding DNA sequences (CDS), is part of this suite. JEMBOSS [11] was later proposed as a graphical user interface (GUI) version of EMBOSS. Another GUI-based web server, ACUA (Automated Codon Usage Analysis) [12], was made public in 2007. In the same year, another package named CodonO [13] was developed. However, both the web server and the package are now unavailable. CAIcal [14], a web tool focusing on CAI (discussed in II-B2), was published in 2008. Some of the most recent tools for CUA are BCAWT [15], COUSIN [16], and CUBAP [17].
Within the landscape of CUA software, CodonW, which is implemented in ANSI C, remains a prominent choice among researchers owing to its established reputation. However, several challenges have arisen over time. The software restricts the characters permitted in gene names, which can be a constraint when dealing with diverse genomic data. It primarily supports CUA for specific species, which may not cover the full spectrum of organisms of interest. To use the software, users must extract CDS from genomic files before supplying them to the program, which adds a preprocessing step. The more recent package BCAWT offers a streamlined approach for calculating commonly used measures and generating essential plots, but it lacks certain capabilities. For instance, it does not include a measure for tAI (discussed in II-B5), which is significant in CUA. It also omits phylogenetic tree development and the associated visual representation of data, which are vital for comprehensive analysis. COUSIN introduces a novel measure bearing the same name but does not address COA, phylogenetic tree development, or visual data representation. Meanwhile, CUBAP focuses on statistics related to tRNA and excludes other measures and analytical dimensions. In all cases, multiple CDSs typically have to be extracted from parent GenBank files with tools such as Coderet, discussed earlier. For phylogenetic tree development, researchers often turn to other software such as Mega [18], underscoring the need for multiple tools to address the multifaceted aspects of CUA.
The authors thus propose CodonU, an integrated tool for CUA. It is inspired by CodonW and the other tools mentioned earlier. CodonU, developed in Python 3, calculates the essential CUA measures, as detailed in subsequent sections, and its added features are useful for such analyses. The tool streamlines the acquisition of CDSs by letting users retrieve them directly via NCBI accession IDs for the specified organisms, which expedites data preparation. Researchers can use codon tables recognized by NCBI through their specific ID number [19] or integrate custom codon tables of their own, which makes the tool adaptable to diverse genomic contexts. The tool generates high-resolution plots, discussed in subsequent sections, which offer clarity and visual appeal and facilitate data interpretation. It also enables users to perform tAI statistics, COA and phylogenetic tree development. For calculating tAI, the tool supports data retrieval from two major tRNA databases, i.e., tRNADB-CE [20] and GtRNAdb [21], [22]. The development methodology adheres to a functional programming approach, chosen for its user-friendliness, particularly for newcomers. For the same reason, the tool is intentionally not a command-line tool. The authors aim to provide a comprehensive platform that covers all facets of CUA, serving as a one-stop solution for researchers in this domain.

II. BRIEF THEORY
To provide a comprehensive understanding of the package, it is essential to first discuss some theoretical aspects. The analyses that can be performed using the package fall into two broad categories: sequence-based analysis and correspondence analysis. Sequence-based analyses can be further divided by the nature of the sequence, either nucleotide or protein. This section covers these analyses in detail.

A. Biological Viewpoint
The genetic code consists of 64 codons, including 3 stop codons (viz. UAA, UAG, UGA). Of the remaining 61 codons, two encode Met and Trp, each of which is encoded by a single codon (AUG and UGG, respectively). The other 59 codons encode 18 amino acids. The subset of codons that encode the same amino acid is known as synonymous codons. Within a subset of synonymous codons, a preference for a single codon may be observed; this preference can be species-specific, and the favoured codon is referred to as the preferred codon. When analyzing codon bias, two hypothetical events are often considered. These are:
• No Bias ($H_0$): The first hypothetical event assumes that all 20 amino acids are encoded equally by the 61 codons. In this scenario, there is no observed bias in the codon usage for encoding amino acids, hence the term "no bias" event.
• Extreme Bias ($H^*$): The second hypothetical event assumes that all 20 amino acids are encoded by only 20 codons, with extreme bias in the codon usage for encoding amino acids. This scenario is referred to as the "extreme bias" event.
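The codon counts quoted above can be checked with a short sketch. It assumes the standard genetic code (NCBI translation table 1) written in the DNA alphabet, so T stands in for U; the variable names are illustrative and are not part of CodonU.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode; '*' marks a stop codon.
synonymous = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonymous[aa].append(codon)

stop_codons = synonymous.pop("*")
sense_codons = [c for codons in synonymous.values() for c in codons]
single_codon_aas = {aa: cs[0] for aa, cs in synonymous.items() if len(cs) == 1}
degenerate_codons = [c for cs in synonymous.values() if len(cs) > 1 for c in cs]

print(len(CODON_TABLE), len(stop_codons), len(sense_codons))  # 64 3 61
print(single_codon_aas)                                       # {'W': 'TGG', 'M': 'ATG'}
print(len(degenerate_codons), len(synonymous) - len(single_codon_aas))  # 59 18
```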

B. Sequence-Based Analyses
Sequence-based analyses examine parameters that can be calculated directly from the sequence, as the name suggests. They are as follows:
1) RSCU: The concept of Relative Synonymous Codon Usage (RSCU) was introduced by Sharp et al. in 1987 [23]. RSCU is calculated as the ratio of the observed frequency of a codon to its expected frequency, assuming no bias in codon usage. This metric is widely used to evaluate codon bias and often serves as a starting point for further analyses. Hence,
$$ RSCU_{ij} = \frac{x_{ij}}{\frac{1}{n_i} \sum_{k=1}^{n_i} x_{ik}} $$
where $x_{ij}$ is the observed frequency of the $j$th codon for the $i$th amino acid and $n_i$ is the number of codons in the subset of synonymous codons for the $i$th amino acid. The minimum value of RSCU is 0. If RSCU > 1, it indicates a positive bias for that particular codon, while RSCU < 1 indicates a negative bias. If RSCU = 1, the codon is used as expected under the assumption of no bias. See appendix for details regarding the values.
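As an illustration of the RSCU definition, the following minimal sketch computes RSCU for a toy sequence under the standard genetic code. The function name `rscu` and the toy sequence are assumptions made for this example; this is not CodonU's `calculate_rscu`.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def rscu(sequence):
    """Return RSCU for every codon of each amino acid observed in `sequence`."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    counts = Counter(codons)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU is undefined
        expected = total / len(family)  # equal usage of all synonymous codons
        for c in family:
            values[c] = counts[c] / expected
    return values

toy = "GCT" * 6 + "GCC" * 2 + "AAA" * 3 + "AAG"
values = rscu(toy)
print(values["GCT"], values["GCC"], values["GCA"])  # 3.0 1.0 0.0
print(values["AAA"], values["AAG"])                 # 1.5 0.5
```

In the toy sequence, Ala occurs 8 times across a four-codon family, so each codon is expected twice; GCT is observed six times and therefore has an RSCU of 3.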
2) CAI: The Codon Adaptation Index (CAI) is a quantitative measure that provides a more precise assessment of codon usage bias than RSCU. It is computed as the ratio of the geometric mean of the observed RSCU values to the maximum possible geometric mean of RSCU. This measure of bias was first proposed by Sharp et al. [23]. Hence,
$$ CAI = \frac{CAI_{obs}}{CAI_{max}} $$
where
$$ CAI_{obs} = \left( \prod_{k=1}^{L} RSCU_k \right)^{1/L} $$
and
$$ CAI_{max} = \left( \prod_{k=1}^{L} RSCU_{k\max} \right)^{1/L} $$
where $RSCU_k$ is the RSCU value for the $k$th codon, $RSCU_{k\max}$ is the maximum RSCU value for the amino acid encoded by the $k$th codon, and $L$ is the number of codons present in the gene. The value of CAI lies between 0 and 1; a gene that uses only the codon with the maximum RSCU for every amino acid (extreme bias) has a CAI of 1. See appendix for details regarding the values.
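The ratio of geometric means can equivalently be written as the geometric mean of the per-codon ratios $RSCU_k / RSCU_{k\max}$, which is how the sketch below computes it. The two toy synonymous families, the reference counts and the function names are assumptions for illustration only, and every reference count is taken to be non-zero so that the logarithm is defined.

```python
import math

# Toy synonymous families (Ala and Lys only) to keep the sketch short.
FAMILIES = {"A": ["GCT", "GCC", "GCA", "GCG"], "K": ["AAA", "AAG"]}

def relative_adaptiveness(reference_counts):
    """RSCU_k / RSCU_kmax, which reduces to count_k / largest count in the family."""
    weights = {}
    for family in FAMILIES.values():
        top = max(reference_counts[c] for c in family)
        for c in family:
            weights[c] = reference_counts[c] / top
    return weights

def cai(codons, weights):
    """Geometric mean of the weights over the L codons of a gene, in log space."""
    logs = [math.log(weights[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

# Hypothetical codon counts from a reference set of genes.
reference = {"GCT": 40, "GCC": 20, "GCA": 10, "GCG": 10, "AAA": 30, "AAG": 10}
w = relative_adaptiveness(reference)

print(w["GCT"], w["GCC"])                              # 1.0 0.5
print(cai(["GCT", "AAA", "GCT"], w))                   # 1.0
print(round(cai(["GCC", "GCC", "AAA", "AAA"], w), 4))  # 0.7071
```

Working in log space avoids numerical underflow when the product runs over thousands of codons.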
3) CBI: The Codon Bias Index (CBI) is a measure of codon bias from the perspective of a specific amino acid. It was initially proposed by Bennetzen and Hall [24] for yeast in 1982. Owing to its quantitative nature and ease of interpretation, it gained popularity quickly. CBI is calculated as the ratio of the occurrence of the preferred codon minus its expected occurrence in a non-biased situation to the occurrence of synonymous codons minus the same expected occurrence. Hence,
$$ CBI = \frac{n_{opt} - n_0}{n_{syn} - n_0} $$
where $n_{opt}$ is the occurrence of the preferred codon, $n_{syn}$ is the total occurrence of codons in the synonymous subset, and $n_0$ is the expected occurrence of the preferred codon in the no-bias situation. The value of CBI lies between 0 (no bias) and 1 (extreme bias). See appendix for details regarding the values.
4) ENc: ENc, or Effective Number of Codons, is a measure of codon usage bias that takes into account both the number of synonymous codons for each amino acid and their relative frequencies. It was first introduced by Wright in 1990 [25] and has since been widely used in bioinformatics research.
Codons can be classified into different groups based on the number of synonymous codons that encode a particular amino acid. This grouping is known as the synonymous family (SF) categorization. For example, amino acids encoded by a single codon belong to SF1, those encoded by two synonymous codons belong to SF2, and so on. The number of amino acids in each SF category is denoted here by $K_i$. ENc is defined as the effective number of codons used by a gene, taking into account the different frequencies of codons in each SF category. Mathematically, it is calculated as:
$$ ENc = \sum_{i} \frac{K_i}{\bar{F}_i} $$
where $\bar{F}_i$ is the arithmetic mean of homozygosity for SF type $i$. For the standard genetic code this reduces to Wright's expression $2 + 9/\bar{F}_2 + 1/\bar{F}_3 + 5/\bar{F}_4 + 3/\bar{F}_6$. The homozygosity of one subset of synonymous codons is defined as:
$$ F = \frac{n \sum_{j=1}^{k} p_j^2 - 1}{n - 1} $$
where $k$ is the number of codons present in the subset, $n$ is the total number of codons present, and $p_j$ is the relative frequency of the $j$th codon within the subset. If the subset contains four synonymous codons (SF4), the sum runs over those four codons, and so forth. It is worth noting at this point that SF3 contains only Ile, which is encoded by AUU, AUC and AUA. It is not unlikely that any of these codons is absent from a gene. This is known as the missing codon problem. Wright proposed to compute $F_3$ as the average of $F_2$ and $F_4$. However, he provided no proof, and the proposal was based on intuition. Later, Anders Fuglsang pointed out that although this looks correct at first glance, it is wrong [26], and proposed an alternative expression for the missing-codon case. CodonU implements this function. The value of ENc lies between 20 (extreme bias) and 61 (no bias). See appendix for details regarding the values.
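A minimal sketch of Wright's calculation for the standard genetic code follows. It is not CodonU's `calculate_enc`: it does not attempt any missing-codon correction, it assumes every SF class is observed with a non-zero mean homozygosity, and it caps the result at 61 as Wright suggested. All names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)

def homozygosity(counts):
    """Wright's F for one amino acid: (n * sum(p_j^2) - 1) / (n - 1)."""
    n = sum(counts)
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)

def enc(sequence):
    """Effective number of codons; raises ZeroDivisionError if an SF class is unusable."""
    counts = Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))
    f_by_class = defaultdict(list)  # SF size -> homozygosity of each amino acid
    class_size = Counter()          # SF size -> number of amino acids in the class
    for family in FAMILIES.values():
        k = len(family)
        class_size[k] += 1
        observed = [counts[c] for c in family]
        if k > 1 and sum(observed) > 1:
            f_by_class[k].append(homozygosity(observed))
    total = float(class_size[1])    # single-codon amino acids count one codon each
    for k, n_amino_acids in class_size.items():
        if k > 1:
            mean_f = sum(f_by_class[k]) / len(f_by_class[k])
            total += n_amino_acids / mean_f
    return min(total, 61.0)

# Extreme bias: every amino acid uses a single codon (each written twice).
biased = "".join(family[0] * 2 for family in FAMILIES.values())
print(enc(biased))  # 20.0
```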

5) tAI:
The tRNA adaptation index (tAI) is a crucial metric in CUA. It was first introduced in 2003 by Reis et al. [27], [28]. It quantifies how well an organism's tRNA pool is adapted to the codon usage patterns in its protein-coding genes. The tAI is based on the hypothesis that organisms have evolved their tRNA populations to match the codon frequencies of highly expressed genes, optimizing translation efficiency. This index provides valuable insights into the translational dynamics of an organism. Reis himself noted that this measure is inspired by CAI. To calculate it, the absolute adaptiveness ($W_i$) is first calculated as
$$ W_i = \sum_{j=1}^{n_i} (1 - s_{ij}) \, tGCN_{ij} \tag{8} $$
where $n_i$ denotes the number of tRNA isoacceptors able to recognize the $i$th codon, $tGCN_{ij}$ represents the gene copy number of the $j$th tRNA that decodes the $i$th codon, and $s_{ij}$ is a variable that characterizes the selective constraint on the efficiency of the interaction between the codon and its corresponding anticodon. The relative adaptiveness ($w_i$) is then defined with the help of (8) as
$$ w_i = \begin{cases} W_i / W_{max} & \text{if } W_i \neq 0 \\ W_{mean} & \text{otherwise} \end{cases} \tag{9} $$
where $W_{max}$ is the highest value of $W_i$, and $W_{mean}$ denotes the geometric mean of all $w_i$ with $W_i \neq 0$. Lastly, tAI for gene $g$ is defined with the help of (9) as
$$ tAI_g = \left( \prod_{k=1}^{l_g} w_{i_{kg}} \right)^{1/l_g} \tag{10} $$
where $i_{kg}$ denotes the $k$th triplet in gene $g$ and $l_g$ is the length of the gene in codons, excluding the stop codon. In (8), the term $s_{ij}$ is species-specific. Reis calculated this term for Saccharomyces cerevisiae. In 2023, a new way of calculating tAI with a genetic algorithm was introduced by Anwar et al. [29]. Its authors named this new measure gtAI. CodonU implements this technique.
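The three steps of the classical tAI (absolute adaptiveness, relative adaptiveness, geometric mean over the gene) can be sketched as follows. The gene copy numbers and constraint values are made up for illustration, the function names are hypothetical, and the sketch does not cover the genetic-algorithm fitting used by gtAI.

```python
import math

def absolute_adaptiveness(trna_copies, constraints):
    """W_i: sum of (1 - s_ij) * tGCN_ij over the tRNAs that read codon i."""
    return sum((1 - s) * tgcn for tgcn, s in zip(trna_copies, constraints))

def relative_adaptiveness(W):
    """w_i = W_i / W_max; zero entries take the geometric mean of the non-zero w_i."""
    w_max = max(W.values())
    nonzero = [v / w_max for v in W.values() if v != 0]
    w_mean = math.exp(sum(math.log(v) for v in nonzero) / len(nonzero))
    return {codon: (v / w_max if v != 0 else w_mean) for codon, v in W.items()}

def tai(codons, w):
    """Geometric mean of w over the codons of a gene (stop codon excluded)."""
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

# Hypothetical tRNA gene copy numbers and constraint values for four Ala codons.
W = {
    "GCT": absolute_adaptiveness([8], [0.0]),  # 8.0
    "GCC": absolute_adaptiveness([8], [0.5]),  # 4.0
    "GCA": absolute_adaptiveness([2], [0.0]),  # 2.0
    "GCG": absolute_adaptiveness([], []),      # 0: no tRNA decodes this codon
}
w = relative_adaptiveness(W)

print(w["GCT"], w["GCC"], w["GCA"], round(w["GCG"], 4))  # 1.0 0.5 0.25 0.5
print(round(tai(["GCT", "GCC"], w), 4))                  # 0.7071
```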
6) Gravy Score: The Gravy score is a measure of the hydrophobicity of a protein sequence. While the measures mentioned previously apply to nucleotide sequences, the Gravy score applies specifically to protein sequences. Various scales are available for computing this score, but the most widely used was proposed by Kyte and Doolittle [30]. CodonU implements this scale to calculate the Gravy score.
7) Aromaticity Score: The aromaticity score is a metric used to assess the abundance of aromatic amino acids, such as Phe, Tyr, and Trp, in a protein sequence. This score provides insight into the aromatic nature of the protein. Various scales have been proposed to calculate this score, and CodonU employs the scale developed by Lobry and Gautier [31].
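Both protein-level scores reduce to simple averages over the sequence, as the following sketch shows. It uses the Kyte and Doolittle hydropathy values and takes aromaticity to be the relative frequency of Phe, Trp and Tyr; the function names are illustrative and are not CodonU's `calculate_gravy` or `calculate_aromaticity`.

```python
# Kyte-Doolittle hydropathy values, one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")

def gravy(protein):
    """Grand average of hydropathy: mean hydropathy value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)

def aromaticity(protein):
    """Relative frequency of the aromatic residues Phe, Trp and Tyr."""
    return sum(aa in AROMATIC for aa in protein) / len(protein)

print(round(gravy("MKFLW"), 2))  # 0.74
print(aromaticity("MKFLW"))      # 0.4
```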

C. Correspondence Analysis
Correspondence analysis (COA) is a statistical method used for data visualization and dimension reduction. It helps to understand the relationship between two or more categorical variables. A detailed description of this method in the context of codon analysis can be found in the book by Lobry [32]. The application of COA differs for nucleotide and protein sequences.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
In the case of DNA or RNA, COA works by calculating the relative frequencies of the nucleotides or codons and plotting them in a multidimensional space. The distances between the points in the space represent the degree of similarity between the sequences. Mathematically, this works as an $\mathbb{R}^{59} \to \mathbb{R}^{2}$ projection. It should be noted that
$$ n_{Components} = n_{TotCodons} - n_{StopCodons} - n_{SF1} $$
where $n_{Components}$ is the number of components for COA, $n_{TotCodons}$ is the total number of codons in the codon table, $n_{StopCodons}$ is the number of stop codons and $n_{SF1}$ is the number of codons belonging to SF1 (SF is discussed in II-B4).
CodonU implements this projection in terms of codon frequency and codon RSCU value. Because the tool supports custom codon tables, it lets the user provide the number of components for COA.
In the case of protein sequences, COA works by calculating the frequency of occurrence of each amino acid and plotting them in a multidimensional space. The distances between the points in the space represent the degree of similarity between the sequences based on their amino acid composition. Mathematically, this works as an $\mathbb{R}^{20} \to \mathbb{R}^{2}$ projection. CodonU implements this projection in terms of amino acid frequency.
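The reason a custom codon table changes the dimension can be seen by counting codons directly. The sketch below, with a hypothetical helper `coa_dimensions`, subtracts stop codons and SF1 codons from the table size; the vertebrate mitochondrial code (NCBI translation table 2) is used as an example of a non-standard table.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def coa_dimensions(codon_table):
    """Total codons minus stop codons minus SF1 codons for a codon -> amino acid map."""
    family_size = Counter(aa for aa in codon_table.values() if aa != "*")
    n_stop = sum(aa == "*" for aa in codon_table.values())
    n_sf1 = sum(size == 1 for size in family_size.values())
    return len(codon_table) - n_stop - n_sf1

# Vertebrate mitochondrial code: AGA/AGG are stops, ATA encodes Met, TGA encodes Trp.
MITOCHONDRIAL = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

print(coa_dimensions(STANDARD))       # 59
print(coa_dimensions(MITOCHONDRIAL))  # 60
```

In the mitochondrial table Met and Trp each gain a second codon, so no amino acid remains in SF1 and the count rises to 60 despite the extra stop codon.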

III. IMPLEMENTATION
A solid understanding of the theoretical concepts discussed in Section II is crucial for comprehending the potential of CodonU. This section aims to elucidate the importance of such software, along with an overview of the implementation of CodonU, and the requisite third-party software and packages.

A. Background
As highlighted in Section I, the analysis of codon usage has become increasingly complex for new users, requiring them to keep track of numerous files and project schemas in order to achieve successful results. Additionally, previous software, such as CodonW, generated files that are not compatible with contemporary software and are therefore not suitable for further investigation. These constraints have hindered progress in the field and necessitate the development of more user-friendly and compatible software. The proposed CodonU package addresses these issues and offers improved functionality over its predecessors.

B. Language
CodonU has been implemented using the popular and versatile programming language Python [33]. Python was chosen for its widespread use and flexibility, which enable easy incorporation of future developments by other developers or research groups. The package is compatible with any version of Python 3, although it is recommended to use the latest version to avoid potential errors.

D. Third Party Programs
CodonU does not implement phylogenetic tree development directly. Instead, it uses popular and readily available software, namely Clustal [47], [48]. CodonU supports two variants of Clustal, namely ClustalW (also known as ClustalX) and ClustalΩ. Users can perform multiple sequence alignment (MSA) with these variants, provided the corresponding binary files are present on their system; the binaries can be obtained from the official Clustal website at http://www.clustal.org. Once the MSA has been done, users can use the package to plot the tree. Relying on established software instead of implementing MSA from scratch ensures the reliability and accuracy of the analysis, while also reducing the burden of implementation and maintenance on the developers.
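For orientation, the sketch below shows one way an external Clustal Omega binary could be driven from Python. The helper names and file names are hypothetical, and this is not how CodonU's `phy_clustal_o` is necessarily written; the options used (`-i`, `-o`, `--outfmt`, `--force`) are Clustal Omega command-line options, and the binary path must point at a local installation.

```python
import subprocess

def clustal_omega_command(binary, fasta_in, alignment_out):
    """Build the argument list for a Clustal Omega run (hypothetical helper)."""
    return [
        str(binary),
        "-i", str(fasta_in),       # unaligned sequences in FASTA format
        "-o", str(alignment_out),  # where the alignment is written
        "--outfmt=clu",            # Clustal alignment format
        "--force",                 # overwrite an existing output file
    ]

def run_alignment(command):
    """Run the external binary; raises CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)

command = clustal_omega_command("clustalo", "sequences.fasta", "alignment.aln")
print(" ".join(command))
# run_alignment(command)  # uncomment once the binary and the input file exist
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths that contain spaces.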

IV. FUNCTIONALITIES
In this section, we describe the basic functionalities of CodonU, which are grouped into five main categories: Analyze, Correspondence Analysis, Phylogenetic Tree Development, File Handle and Visualize. Notably, the names of the subsections mirror the names of the sub-packages contained in CodonU. Owing to space constraints, we are unable to discuss all available functions in detail. However, users can refer to the documentation and examples provided in the Git repository for a more comprehensive understanding of the software and its functionalities. The links are provided in Section IX.

A. Analyze
This sub-package of CodonU focuses on the analysis of sequences based on their nature. The name of the sub-package is analyzer. Some functions are as follows:
• custom_codon_table: Registers a new codon table as provided by the user.
Apart from the above-mentioned functions, g3, a3, gc_123, at_123, etc. are available for measuring sequence length, bases and their relative abundance at specific positions; they are discussed in the examples provided in the GitHub repository. It is worth noting that users can analyse an individual gene or the whole genome. For convenience, the generate_report and generate_report_summary functions perform all the above-mentioned computations; the former performs gene analysis and the latter genome analysis. Also, the number of codons or amino acids (aas) required to consider a sequence a functional gene varies between studies. Most functions in this sub-package have an argument named min_len_threshold, through which users provide the number of codons or aas that must be present for the sequence to be considered a gene or the protein generated by the gene, respectively. Its default value is 200 for nucleotide sequences and 66 for protein sequences.
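Two of the ideas above, position-specific base composition and the length threshold, can be sketched in a few lines. The helper names are illustrative and are not CodonU's gc_123 or its internal filter; in particular, whether CodonU counts the stop codon and whether it compares with "at least" or "more than" are assumptions here.

```python
def gc_by_position(sequence):
    """GC fraction at codon positions 1, 2 and 3 of a coding sequence."""
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    fractions = []
    for offset in range(3):
        bases = sequence[offset:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)

def passes_threshold(sequence, min_len_threshold=200):
    """Keep a nucleotide sequence only if it has at least the given number of codons."""
    return len(sequence) // 3 >= min_len_threshold

gc1, gc2, gc3 = gc_by_position("ATGGCGTTC")  # codons ATG, GCG, TTC
print(round(gc1, 3), round(gc2, 3), gc3)     # 0.333 0.333 1.0
print(passes_threshold("ATG" * 200), passes_threshold("ATG" * 199))  # True False
```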

B. Correspondence Analysis
This sub-package of CodonU focuses on the correspondence analysis of sequences based on their nature. The name of the sub-package is correspondence_analysis. Among its functions is one that creates the contingency table of codon frequency. It is worth noting that, as discussed in Section II-C, users can provide the number of components for the analysis. If the number of components is higher than the total dimensions of the data, the results may not be of use.

C. Development of Phylogenetic Tree
This sub-package of CodonU can be used to develop phylogenetic trees from multiple sequence alignments (MSA). Users can perform MSA for various sequences and use the result to plot a phylogenetic tree (discussed in Section IV-E). Although this sub-package deals with MSA, it is named phylogenetic_analysis because it supports phylogenetic analysis in the context of CUA. Some functions are as follows:

D. File Handle
This sub-package of CodonU deals with retrieving data and writing it to the local machine. The name of the sub-package is file_handler. Some functions are as follows:
• set_entrez_param: Sets the Entrez parameters, i.e., the NCBI-registered email and the API key of the user. If they are not set, retrieval will be slower.
• write_nucleotide_fasta: Given the accession ID of the organism, creates a FASTA file of nucleotides if it does not already exist or is empty.
• write_protein_fasta: Given the accession ID of the organism, creates a FASTA file of proteins if it does not already exist or is empty.
• write_exome_fasta: Given the accession ID of the organism, creates a FASTA file of all exons if it does not already exist or is empty. Users can also exclude stop codons if they wish.
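The "create the file only if it does not exist or is empty" behaviour described for the write functions can be sketched as below. The helper `write_fasta_if_needed`, its arguments and the record names are hypothetical; the sketch leaves out the download step and only shows the guard and the FASTA layout.

```python
from pathlib import Path

def write_fasta_if_needed(path, records, width=70):
    """Write `records` (name -> sequence) as FASTA unless `path` already holds data.

    Returns True if the file was written, False if an existing non-empty file was kept.
    """
    path = Path(path)
    if path.exists() and path.stat().st_size > 0:
        return False
    with path.open("w") as handle:
        for name, sequence in records.items():
            handle.write(f">{name}\n")
            # Wrap the sequence at a fixed line width, as is conventional for FASTA.
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")
    return True
```

A call such as `write_fasta_if_needed("Nucleotide/example.fasta", {"gene1": "ATGGCT"})` would write the file on the first run and leave it untouched on later runs, which avoids repeated downloads of the same record.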

E. Visualize
This sub-package of CodonU primarily deals with visualization. The name of the sub-package is vizualizer. All generated images can be previewed or saved, according to the user's preference. The resolution of the generated graphics is set to 500 dpi, which is considered high quality. Graphics with a minimum resolution of 300 dpi are typically suitable for publication, so the generated graphics can readily be used for publication. Examples of the graphics generated by this sub-package are presented in Fig. 1. All the functions used to create these graphics, and many others, are shown in the examples provided on GitHub, the availability of which is discussed in Section IX. High-quality graphics are critical, as they ensure that the information presented is clear and easily understandable. Some functions are as follows:
The default number of components for plotting COA results is two, as two axes are enough for plotting results.

V. FILE HANDLING
CodonU provides a convenient way for users to handle files. Users can create a .csv file containing the name and accession ID of their organism of interest and provide it to CodonU. CodonU then automatically fetches the GenBank file, extracts the CDS, and saves them. Users may save the CDS as individual entries or as a whole-exome entry, both in .fasta format. The results generated by CodonU are provided in .txt format for human readability and .xlsx format for machine readability. During

VI. RESULTS
The software was tested rigorously using genomic data from Staphylococcus aureus subsp. aureus str. Newman (https://www.ncbi.nlm.nih.gov/nuccore/AP009351.1/). The package was used to obtain the nucleotide file for this organism; at a network speed of approximately 97 KBps, the 2.9 MB file was downloaded in about 30 seconds.
To analyze the computational efficiency of the functions within CodonU, the nucleotide files, which contain CDS, were further partitioned into three categories. The first file contains the ten longest CDS, with an average gene length of 7745 base pairs (bp). The second file contains the one hundred longest CDS, with an average length of 3255 bp. The third file contains the one thousand longest CDS, with an average length of 1509 bp.
Protein files were generated in the same way. The first file contains the ten longest aa sequences, with an average protein length of 2580 aa. The second file contains the one hundred longest aa sequences, with an average length of 1084 aa. The third file contains the one thousand longest aa sequences, with an average length of 502 aa.
The computational efficiency of the analytical functions of CodonU was assessed and is presented in Table 1.

TABLE 2 CONSUMED MEMORY ANALYSIS
In Table 1, the 'Hits' column gives the number of times each function was called to analyze all genes in the dataset. The 'Time' column records the cumulative time in seconds required for the execution of each function. Finally, the 'Per Hit' column is the ratio of 'Time' to 'Hits' and indicates the average time consumed per call. Arguably, CAI is the most time-intensive process. These measurements were made using line-profiler [49], a Python tool.
Memory usage was also tested for a single codon or aa against the longest nucleotide and protein sequences, whose lengths are 21096 bp and 7031 aa respectively; the results are documented in Table 2. The 'Peak Memory' column gives the peak memory usage in KB. It is evident from the table that the tRNA analysis is the most memory-intensive process. These measurements were made using tracemalloc, a built-in Python tool.
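A peak-memory measurement of this kind can be reproduced with a small wrapper around tracemalloc. The wrapper `peak_memory_kb`, the function being measured and the synthetic sequence are assumptions for illustration; only the sequence length is taken from the text, and the measured value will differ from those in Table 2.

```python
import tracemalloc
from collections import Counter

def peak_memory_kb(func, *args):
    """Run `func(*args)` and return (result, peak traced memory in KB)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / 1024

def count_codons(sequence):
    """Stand-in for an analysis function: tally the codons of a sequence."""
    return Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))

# Synthetic 21096 bp sequence, the same length as the longest CDS mentioned above.
longest = "ATGGCTAAA" * 2344
counts, peak_kb = peak_memory_kb(count_codons, longest)
print(len(longest), sum(counts.values()), f"{peak_kb:.1f} KB")
```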

VII. DISCUSSION
Since the initial release of version 0.0.1 on February 21, 2023, CodonU has gained significant attention and has crossed the milestone of 5000 downloads, indicating its growing popularity and relevance in the scientific community. As of October 1, no issues have been reported to the developers. Usage statistics for the package can be found at https://pepy.tech/project/CodonU.
CodonU has received positive remarks from highly qualified individuals with a strong research interest in codon usage analysis, which speaks to the effectiveness and reliability of the package in meeting the needs of experts in the field. Two such comments follow.
Prof. Dr. Subrata Sau, Biological Sciences, Bose Institute, commented that, "The tool is competent to provide clues about the factors responsible for codon usage biases in the genes and genomes. CodonW, another computational tool reported nearly two decades ago, has been efficiently and robustly used for similar purposes. CodonU, like CodonW, is equally efficient in analyzing the nucleotide and protein sequences. However, the former, compared to the latter, is less cumbersome, less time-consuming, user-friendly, and takes care of not only the analysis of sequences but also makes the output presentable. This all-in-one software will definitely take the place of CodonW to understand the codon usage bias and evolutionary relationship in near future."
Prof. (Retd.) Dr. Tapash Chandra Ghosh, Bioinformatics Centre, Bose Institute, commented that, "I recently had the opportunity to explore the utility of CodonU. In my opinion it's a very useful tool for the codon usage analysis. The developers did an excellent job in creating this user-friendly and comprehensive package. Indeed, this simplifies the workflow of codon usage analysis.
The package provides very valuable insights into various statistical measures for both nucleotides and proteins. The integration of correspondence analysis (COA) further enhances the capabilities of CodonU. It enables users to perform in-depth explorations of sequence data, uncovering hidden relationships and patterns. The graphical visualization options provided in CodonU are also commendable, as they facilitate the clear presentation of analysis results.
Overall, I highly recommend CodonU to the researchers and the scientists interested in codon usage analysis. Moreover, as an open-source project, CodonU invites collaboration and further development from the scientific community, paving the way for continuous improvements and expansion of its functionalities. I am excited to see how CodonU evolves and contributes to further advancements in the field of genomics."
These comments highlight the value of CodonU in accelerating research and improving the accuracy of codon usage analysis.

VIII. CONCLUSION
In conclusion, CodonU is a user-friendly package that simplifies the process of codon usage analysis, an integral part of genomic analysis. The package combines various steps in this type of analysis, utilizes readily available third-party software, and yields fewer types of files, thus streamlining the workflow and facilitating the construction of other computational pipelines. With its easy-to-use interface and useful functionalities, CodonU is a valuable tool for researchers working in genomics and related fields.

Functions of the analyzer sub-package (Section IV-A):
• calculate_rscu: Calculates RSCU values for each codon.
• calculate_cai: Calculates CAI values for each codon.
• calculate_cbi: Calculates CBI values for each amino acid and the preferred codon.
• calculate_enc: Calculates ENc value for a given sequence.
• get_anticodon_count_dict: Retrieves the tRNA count from two major databases, i.e. tRNADB-CE and GtRNAdb.
• calculate_gtai: Calculates gtAI for each codon for a given sequence.
• calculate_gravy: Calculates the gravy score for a given protein sequence.
• calculate_aromaticity: Calculates the aromaticity score for a given protein sequence.

Functions of the correspondence_analysis sub-package (Section IV-B):
• Creates the contingency table of codon frequency.

Functions of the phylogenetic_analysis sub-package (Section IV-C):
• phy_clustal_w: Performs the multiple sequence alignment with ClustalW.
• phy_clustal_o: Performs the multiple sequence alignment with ClustalΩ.
• generate_phylo_input: Generates the alignment input file for tree generation.

Functions of the vizualizer sub-package (Section IV-E):
• plot_enc: Plots ENc curve from given fasta file.
• plot_neutrality: Plots neutrality plot from given fasta file.
• plot_ca_codon_freq_codon: Plots COA of codon frequency for codons with frequency as scale.
• plot_ca_codon_freq_gene: Plots COA of codon frequency for genes with gene length as scale.
• plot_ca_codon_rscu_codon: Plots COA of codon RSCU for codons with frequency as scale.
• plot_ca_codon_rscu_gene: Plots COA of codon RSCU for genes with gene length as scale.
• plot_ca_aa_freq_aa: Plots COA of aa frequency for genes with frequency as scale.
• plot_ca_aa_freq_gene: Plots COA for genes with aromaticity or gravy score as scale.