Journal article Open Access

Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Shohei Maruyama; Yasuo Matsuyama; Sachiyo Aburatani


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/7e56895c-3bbd-464e-a631-884824744d27/10000800.pdf"
      }, 
      "checksum": "md5:288e9650786f21d27bfa04e38403eeed", 
      "bucket": "7e56895c-3bbd-464e-a631-884824744d27", 
      "key": "10000800.pdf", 
      "type": "pdf", 
      "size": 254115
    }
  ], 
  "owners": [
    32148
  ], 
  "doi": "10.5281/zenodo.1099834", 
  "stats": {
    "version_unique_downloads": 10.0, 
    "unique_views": 27.0, 
    "views": 31.0, 
    "version_views": 31.0, 
    "unique_downloads": 10.0, 
    "version_unique_views": 27.0, 
    "volume": 2541150.0, 
    "version_downloads": 10.0, 
    "downloads": 10.0, 
    "version_volume": 2541150.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.1099834", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.1099833", 
    "bucket": "https://zenodo.org/api/files/7e56895c-3bbd-464e-a631-884824744d27", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.1099833.svg", 
    "html": "https://zenodo.org/record/1099834", 
    "latest_html": "https://zenodo.org/record/1099834", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.1099834.svg", 
    "latest": "https://zenodo.org/api/records/1099834"
  }, 
  "conceptdoi": "10.5281/zenodo.1099833", 
  "created": "2018-01-16T16:52:22.178959+00:00", 
  "updated": "2020-01-20T17:34:29.390115+00:00", 
  "conceptrecid": "1099833", 
  "revision": 5, 
  "id": 1099834, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.1099834", 
    "description": "<p>Development of a method to estimate gene functions is<br>\nan important task in bioinformatics. One of the approaches for the<br>\nannotation is the identification of the metabolic pathway that genes are<br>\ninvolved in. Since gene expression data reflect various intracellular<br>\nphenomena, those data are considered to be related with genes&rsquo;<br>\nfunctions. However, it has been difficult to estimate the gene function<br>\nwith high accuracy. It is considered that the low accuracy of the<br>\nestimation is caused by the difficulty of accurately measuring a gene<br>\nexpression. Even though they are measured under the same condition,<br>\nthe gene expressions will vary usually. In this study, we proposed a<br>\nfeature extraction method focusing on the variability of gene<br>\nexpressions to estimate the genes&#39; metabolic pathway accurately. First,<br>\nwe estimated the distribution of each gene expression from replicate<br>\ndata. Next, we calculated the similarity between all gene pairs by KL<br>\ndivergence, which is a method for calculating the similarity between<br>\ndistributions. Finally, we utilized the similarity vectors as feature<br>\nvectors and trained the multiclass SVM for identifying the genes&#39;<br>\nmetabolic pathway. To evaluate our developed method, we applied the<br>\nmethod to budding yeast and trained the multiclass SVM for<br>\nidentifying the seven metabolic pathways. As a result, the accuracy<br>\nthat calculated by our developed method was higher than the one that<br>\ncalculated from the raw gene expression data. Thus, our developed<br>\nmethod combined with KL divergence is useful for identifying the<br>\ngenes&#39; metabolic pathway.</p>", 
    "language": "eng", 
    "title": "Application of KL Divergence for Estimation of Each Metabolic Pathway Genes", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "1099833"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "1099834"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "waset"
      }
    ], 
    "version": "10000800", 
    "references": [
      "T. Obayashi, Y. Okamura, S. Ito, S. Tadaka, Y. Aoki, M. Shirota, and K.\nKinoshita, \"ATTED-II in 2014: evaluation of gene coexpression in\nagriculturally important plants,\" Plant Cell Physiol., vol. 55, no. 1, p. e6,\nJan. 2014.", 
      "K. Aoki, Y. Ogata, and D. Shibata, \"Approaches for extracting practical\ninformation from gene co-expression networks in plant biology,\" Plant\nCell Physiol., vol. 48, no. 3, pp. 381-390, Mar. 2007.", 
      "K. Saito, M. Y. Hirai, and K. Yonekura-Sakakibara, \"Decoding genes\nwith coexpression networks and metabolomics - 'majority report by\nprecogs',\" Trends Plant Sci., vol. 13, no. 1, pp. 36-43, Jan. 2008.", 
      "C. Cortes and V. Vapnik, \"Support-vector networks,\" Mach. Learn., vol.\n20, pp. 273-297, 1995.", 
      "C.-C. Chang and C.-J. Lin, \"LIBSVM: a library for support vector\nmachine,\" ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1-27, Apr.\n2011.", 
      "M. P. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S.\nFurey, M. Ares, and D. Haussler, \"Knowledge-based analysis of\nmicroarray gene expression data by using support vector machines,\" Proc.\nNatl. Acad. Sci. U. S. A., vol. 97, no. 1, pp. 262-267, Jan. 2000.", 
      "S. Kullback, and R. A. Leibler, \"On information and sufficiency,\" Annals\nof Mathematical Statistics, vol. 22, pp. 79-86, 1951.", 
      "R. Edgar, M. Domrachev, and A. E. Lash, \"Gene Expression Omnibus:\nNCBI gene expression and hybridization array data repository,\" Nucleic\nAcids Res., vol. 30, pp. 207-210, 2002.", 
      "E. Hubbell, W. M. Liu, and R. Mei, \"Robust estimators for expression\nanalysis,\" Bioinformatics, vol. 18, pp. 1585-1592, 2002.\n[10] S. D. Pepper, E. K. Saunders, L. E. Edwards, C. L. Wilson, and C. J.\nMiller, \"The utility of MAS5 expression summary and detection call\nalgorithms,\" BMC Bioinformatics, vol. 8, p. 273, 2007.\n[11] M. Kanehisa, S. Goto, Y. Sato, M. Kawashima, M. Furumichi, and M.\nTanabe, \"Data, information, knowledge and principle: back to\nmetabolism in KEGG,\" Nucleic Acids Res., vol. 42, no. Database issue,\npp. D199-205, Jan. 2014."
    ], 
    "keywords": [
      "Metabolic pathways", 
      "gene expression data", 
      "microarray", 
      "Kullback\u2013Leibler divergence", 
      "KL divergence", 
      "support\nvector machines", 
      "SVM", 
      "machine learning."
    ], 
    "publication_date": "2015-02-01", 
    "creators": [
      {
        "name": "Shohei Maruyama"
      }, 
      {
        "name": "Yasuo Matsuyama"
      }, 
      {
        "name": "Sachiyo Aburatani"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "subtype": "article", 
      "type": "publication", 
      "title": "Journal article"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.1099833", 
        "relation": "isVersionOf"
      }
    ]
  }
}
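
The feature-extraction steps summarized in the record's description (fit a distribution to each gene's replicate measurements, then compare every pair of genes by KL divergence and use the resulting rows as feature vectors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the description does not say which distribution family was fitted, so a univariate Gaussian is assumed here, and the gene names, the toy values, and the helper names `fit_gaussian`, `kl_gaussian`, and `kl_feature_vectors` are all hypothetical. The multiclass SVM training step is not shown.

```python
import math
from statistics import mean, variance


def fit_gaussian(replicates):
    """Return (mean, variance) estimated from one gene's replicate values.

    Assumption: a univariate Gaussian; the record does not name the family.
    The variance is floored to avoid division by zero for constant replicates.
    """
    return mean(replicates), max(variance(replicates), 1e-12)


def kl_gaussian(p, q):
    """KL(p || q) for two univariate Gaussians given as (mean, variance)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * (
        math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def kl_feature_vectors(expression):
    """Map {gene: [replicates]} to {gene: [KL to each gene, sorted by name]}."""
    genes = sorted(expression)
    params = {g: fit_gaussian(expression[g]) for g in genes}
    return {
        g: [kl_gaussian(params[g], params[h]) for h in genes] for g in genes
    }


# Hypothetical toy data: three genes, three replicates each.
expression = {
    "geneA": [1.0, 1.2, 0.8],
    "geneB": [1.1, 1.3, 0.9],
    "geneC": [5.0, 5.5, 4.5],
}
features = kl_feature_vectors(expression)
for gene, row in features.items():
    print(gene, [round(v, 3) for v in row])
```

Each row of `features` could then be passed, together with known pathway labels, to a multiclass SVM library such as the LIBSVM package cited in the references. Note that KL divergence is asymmetric, so the row for a gene is generally not the same as the corresponding column.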