Journal article Open Access

Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Shohei Maruyama; Yasuo Matsuyama; Sachiyo Aburatani

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.1099834</identifier>
      <creatorName>Shohei Maruyama</creatorName>
      <creatorName>Yasuo Matsuyama</creatorName>
      <creatorName>Sachiyo Aburatani</creatorName>
    <title>Application of KL Divergence for Estimation of Each Metabolic Pathway Genes</title>
    <subject>Metabolic pathways</subject>
    <subject>gene expression data</subject>
    <subject>Kullback–Leibler divergence</subject>
    <subject>KL divergence</subject>
    <subject>support vector machines</subject>
    <subject>machine learning</subject>
    <date dateType="Issued">2015-02-01</date>
  <resourceType resourceTypeGeneral="Text">Journal article</resourceType>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1099833</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf"></relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">&lt;p&gt;Developing methods to estimate gene function is an important task in bioinformatics. One approach to annotation is to identify the metabolic pathway in which a gene is involved. Because gene expression data reflect various intracellular phenomena, they are considered to be related to gene function. However, estimating gene function with high accuracy has been difficult. The low accuracy is thought to stem from the difficulty of measuring gene expression accurately: even under the same conditions, measured expression levels usually vary. In this study, we propose a feature extraction method that focuses on the variability of gene expression to estimate each gene's metabolic pathway accurately. First, we estimated the distribution of each gene's expression from replicate data. Next, we calculated the similarity between all gene pairs using KL divergence, which quantifies how much one probability distribution differs from another. Finally, we used the resulting similarity vectors as feature vectors and trained a multiclass SVM to identify each gene's metabolic pathway. To evaluate the method, we applied it to budding yeast and trained the multiclass SVM to distinguish seven metabolic pathways. The accuracy obtained with our method was higher than that obtained from the raw gene expression data. Thus, the proposed method based on KL divergence is useful for identifying the metabolic pathways of genes.&lt;/p&gt;</description>
    <description descriptionType="Other">{"references": ["T. Obayashi, Y. Okamura, S. Ito, S. Tadaka, Y. Aoki, M. Shirota, and K. Kinoshita, \"ATTED-II in 2014: evaluation of gene coexpression in agriculturally important plants,\" Plant Cell Physiol., vol. 55, no. 1, p. e6, Jan. 2014.", "K. Aoki, Y. Ogata, and D. Shibata, \"Approaches for extracting practical information from gene co-expression networks in plant biology,\" Plant Cell Physiol., vol. 48, no. 3, pp. 381-390, Mar. 2007.", "K. Saito, M. Y. Hirai, and K. Yonekura-Sakakibara, \"Decoding genes with coexpression networks and metabolomics - 'majority report by precogs',\" Trends Plant Sci., vol. 13, no. 1, pp. 36-43, Jan. 2008.", "C. Cortes and V. Vapnik, \"Support-vector networks,\" Mach. Learn., vol. 20, pp. 273-297, 1995.", "C.-C. Chang and C.-J. Lin, \"LIBSVM: a library for support vector machine,\" ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1-27, Apr. 2011.", "M. P. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares, and D. Haussler, \"Knowledge-based analysis of microarray gene expression data by using support vector machines,\" Proc. Natl. Acad. Sci. U. S. A., vol. 97, no. 1, pp. 262-267, Jan. 2000.", "S. Kullback and R. A. Leibler, \"On information and sufficiency,\" Annals of Mathematical Statistics, vol. 22, pp. 79-86, 1951.", "R. Edgar, M. Domrachev, and A. E. Lash, \"Gene Expression Omnibus: NCBI gene expression and hybridization array data repository,\" Nucleic Acids Res., vol. 30, pp. 207-210, 2002.", "E. Hubbell, W. M. Liu, and R. Mei, \"Robust estimators for expression analysis,\" Bioinformatics, vol. 18, pp. 1585-1592, 2002.", "S. D. Pepper, E. K. Saunders, L. E. Edwards, C. L. Wilson, and C. J. Miller, \"The utility of MAS5 expression summary and detection call algorithms,\" BMC Bioinformatics, vol. 8, p. 273, 2007.", "M. Kanehisa, S. Goto, Y. Sato, M. Kawashima, M. Furumichi, and M. Tanabe, \"Data, information, knowledge and principle: back to metabolism in KEGG,\" Nucleic Acids Res., vol. 42, no. Database issue, pp. D199-205, Jan. 2014."]}</description>
</resource>
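
The abstract outlines three steps: fit a distribution to each gene's replicate measurements, compute the KL divergence for every gene pair, and use each gene's vector of divergences as its feature vector. The record does not say which distribution family or estimator was used, so the following is only a minimal sketch that assumes a univariate Gaussian per gene. The function names and the replicate values are hypothetical and chosen purely for illustration.

```python
import math
from statistics import mean, variance


def fit_gaussian(replicates):
    """Estimate (mean, variance) from one gene's replicate measurements.

    Assumption of this sketch: expression is modelled as a univariate
    Gaussian. Needs at least two replicates with non-zero spread.
    """
    return mean(replicates), variance(replicates)  # sample variance (n - 1)


def kl_gaussian(p, q):
    """Closed-form KL(P || Q) for two univariate Gaussians (mean, variance)."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * (
        math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def kl_feature_vectors(expression):
    """Map each gene to its KL divergences against every gene (sorted order)."""
    genes = sorted(expression)
    params = {g: fit_gaussian(expression[g]) for g in genes}
    return {
        g: [kl_gaussian(params[g], params[h]) for h in genes] for g in genes
    }


# Hypothetical replicate expression values, not taken from the study.
expression = {
    "geneA": [5.1, 4.9, 5.3],
    "geneB": [5.0, 5.2, 4.8],
    "geneC": [9.7, 10.4, 10.1],
}

features = kl_feature_vectors(expression)
for gene, vector in features.items():
    print(gene, [round(v, 3) for v in vector])
```

Each row of `features` could then be passed, together with known pathway labels, to a multiclass SVM. KL divergence is asymmetric, so KL(P || Q) and KL(Q || P) generally differ. The abstract does not state which direction was used or whether the values were symmetrized.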