Journal article Open Access

# Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Shohei Maruyama; Yasuo Matsuyama; Sachiyo Aburatani

### Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:creator>Shohei Maruyama</dc:creator>
<dc:creator>Yasuo Matsuyama</dc:creator>
<dc:creator>Sachiyo Aburatani</dc:creator>
<dc:date>2015-02-01</dc:date>
<dc:description>Developing methods to estimate gene functions is
an important task in bioinformatics. One approach to functional
annotation is to identify the metabolic pathway in which a gene is
involved. Since gene expression data reflect various intracellular
phenomena, these data are considered to be related to gene
function. However, it has been difficult to estimate gene function
with high accuracy. The low accuracy is thought to stem from the
difficulty of measuring gene expression accurately: even when
measured under the same condition, expression values usually vary.
In this study, we proposed a feature extraction method that focuses
on the variability of gene expression in order to estimate each
gene's metabolic pathway accurately. First, we estimated the
distribution of each gene's expression from replicate data. Next, we
calculated the similarity between all gene pairs using KL
divergence, a measure for comparing probability distributions.
Finally, we used the resulting similarity vectors as feature vectors
and trained a multiclass SVM to identify each gene's metabolic
pathway. To evaluate the method, we applied it to budding yeast and
trained the multiclass SVM to identify seven metabolic pathways.
The accuracy obtained with our method was higher than that obtained
from the raw gene expression data. Thus, our method combined with
KL divergence is useful for identifying the metabolic pathways of
genes.</dc:description>
<dc:identifier>https://zenodo.org/record/1099834</dc:identifier>
<dc:identifier>10.5281/zenodo.1099834</dc:identifier>
<dc:identifier>oai:zenodo.org:1099834</dc:identifier>
<dc:language>eng</dc:language>
<dc:relation>doi:10.5281/zenodo.1099833</dc:relation>
<dc:relation>url:https://zenodo.org/communities/waset</dc:relation>
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
<dc:subject>Metabolic pathways</dc:subject>
<dc:subject>gene expression data</dc:subject>
<dc:subject>microarray</dc:subject>
<dc:subject>Kullback–Leibler divergence</dc:subject>
<dc:subject>KL divergence</dc:subject>
<dc:subject>support vector machines</dc:subject>
<dc:subject>SVM</dc:subject>
<dc:subject>machine learning</dc:subject>
<dc:title>Application of KL Divergence for Estimation of Each Metabolic Pathway Genes</dc:title>
<dc:type>info:eu-repo/semantics/article</dc:type>
<dc:type>publication-article</dc:type>
</oai_dc:dc>
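
The feature extraction step summarized in the description can be illustrated with a minimal sketch. The record does not say how each gene's distribution is modeled, so this example assumes a univariate Gaussian fitted to the replicate values and uses the closed-form KL divergence between two Gaussians. The function names (`fit_gaussian`, `kl_gaussian`, `kl_feature_vectors`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def fit_gaussian(replicates):
    """Estimate (mean, sample variance) of one gene's expression from replicates.

    Assumption: a Gaussian model; the original work may use another estimator.
    """
    return mean(replicates), variance(replicates)


def kl_gaussian(p, q):
    """KL(p || q) for univariate Gaussians given as (mean, variance) pairs.

    Both variances must be strictly positive.
    """
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * (
        math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def kl_feature_vectors(expression):
    """Map each gene to its vector of KL divergences against every gene.

    `expression` maps a gene name to a list of replicate measurements.
    Vector entries follow the sorted order of the gene names.
    """
    genes = sorted(expression)
    params = {g: fit_gaussian(expression[g]) for g in genes}
    return {
        g: [kl_gaussian(params[g], params[h]) for h in genes] for g in genes
    }


# Hypothetical replicate measurements for three genes (illustrative only).
toy_expression = {
    "geneA": [1.0, 1.2, 0.9, 1.1],
    "geneB": [1.1, 1.0, 1.2, 0.9],
    "geneC": [3.0, 3.4, 2.8, 3.1],
}
features = kl_feature_vectors(toy_expression)
```

Each row of `features` could then serve as the input vector for a multiclass SVM, as the description outlines. Note that KL divergence is asymmetric and grows as distributions become less alike, so a smaller value indicates a more similar pair of genes.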
