Dataset Open Access

Single-Cell Gene Expression Profiles for Classification Problems

Gualandi, Stefano; Codegoni, Andrea; Vercesi, Eleonora


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/c0b51f22-9ef2-4736-8348-4531c1abe1bc/gmd_v1.0.0.zip"
      }, 
      "checksum": "md5:91b47965e75517ed653f139774ac2e0e", 
      "bucket": "c0b51f22-9ef2-4736-8348-4531c1abe1bc", 
      "key": "gmd_v1.0.0.zip", 
      "type": "zip", 
      "size": 78747182
    }
  ], 
  "owners": [
    203372
  ], 
  "doi": "10.5281/zenodo.4604569", 
  "stats": {
    "version_unique_downloads": 9.0, 
    "unique_views": 106.0, 
    "views": 115.0, 
    "version_views": 115.0, 
    "unique_downloads": 9.0, 
    "version_unique_views": 106.0, 
    "volume": 866219002.0, 
    "version_downloads": 11.0, 
    "downloads": 11.0, 
    "version_volume": 866219002.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.4604569", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4604568", 
    "bucket": "https://zenodo.org/api/files/c0b51f22-9ef2-4736-8348-4531c1abe1bc", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4604568.svg", 
    "html": "https://zenodo.org/record/4604569", 
    "latest_html": "https://zenodo.org/record/4604569", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.4604569.svg", 
    "latest": "https://zenodo.org/api/records/4604569"
  }, 
  "conceptdoi": "10.5281/zenodo.4604568", 
  "created": "2021-03-15T10:00:20.394171+00:00", 
  "updated": "2021-03-16T07:48:36.590025+00:00", 
  "conceptrecid": "4604568", 
  "revision": 3, 
  "id": 4604569, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.4604569", 
    "description": "<p>This repository contains a collection of three datasets that we use to introduce the Gene Mover&#39;s Distance (GMD) in [1] and that are described below. The three datasets are exported in a basic text-based format (.csv files), like other public datasets widely used in the Machine Learning community.</p>\n\n<p>The three datasets are extracted from the Gene Expression Omnibus (GEO) database [2], where they appear with accession numbers GSE116256 (blood leukemia, [3]), GSE84133 (human pancreas, [4]), and GSE67835 (human brain, [5]), respectively. In GEO, the datasets are split across several files, which contain many more details than this version reports.</p>\n\n<p>However, the proposed format should make the data easier for other researchers to use.</p>\n\n<p>The Gene Mover&#39;s Distance is a measure of similarity between a pair of cells based on their gene expression profiles obtained via single-cell RNA sequencing. The underlying idea of GMD is to interpret the gene expression array of a single cell as a discrete probability measure. The distance between two cells is hence computed by solving an Optimal Transport problem between the two corresponding discrete measures. The Gene Mover&#39;s Distance can be used, for instance, to solve two classification problems: the classification of cells according to their condition and according to their type.</p>\n\n<p>The repository contains a Python script to check the basic statistics of the data.</p>\n\n<p>&nbsp;</p>\n\n<p>[1] Bellazzi, R., Codegoni, A., Gualandi, S., Nicora, G., Vercesi, E. <em>The Gene Mover&#39;s Distance: Single-cell similarity via Optimal Transport</em>. <a href=\"https://arxiv.org/abs/2102.01218\">https://arxiv.org/abs/2102.01218</a></p>\n\n<p>[2] Gene Expression Omnibus (GEO) database, <a href=\"http://www.ncbi.nlm.nih.gov/geo\">http://www.ncbi.nlm.nih.gov/geo</a></p>\n\n<p>[3] van Galen, P., Hovestadt, V., Wadsworth II, M.H., Hughes, T.K., Griffin, G.K., Battaglia, S., Verga, J.A., Stephansky, J., Pastika, T.J., Story, J.L. and Pinkus, G.S., 2019. <em>Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity</em>. Cell, 176(6), pp.1265-1281.</p>\n\n<p>[4] Baron, M., Veres, A., Wolock, S.L., Faust, A.L., Gaujoux, R., Vetere, A., Ryu, J.H., Wagner, B.K., Shen-Orr, S.S., Klein, A.M. and Melton, D.A., 2016. <em>A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure</em>. Cell Systems, 3(4), pp.346-360.</p>\n\n<p>[5] Darmanis, S., Sloan, S.A., Zhang, Y., Enge, M., Caneda, C., Shuer, L.M., Gephart, M.G.H., Barres, B.A. and Quake, S.R., 2015. <em>A survey of human brain transcriptome diversity at the single cell level</em>. Proceedings of the National Academy of Sciences, 112(23), pp.7285-7290.</p>", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "title": "Single-Cell Gene Expression Profiles for Classification Problems", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4604568"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4604569"
          }
        }
      ]
    }, 
    "version": "v1.0.0", 
    "keywords": [
      "Gene-expression-profile, Leukemia, Brain, Pancreas, Gene Mover Distance"
    ], 
    "publication_date": "2021-03-15", 
    "creators": [
      {
        "orcid": "0000-0002-2111-3528", 
        "affiliation": "University of Pavia", 
        "name": "Gualandi, Stefano"
      }, 
      {
        "affiliation": "University of Pavia", 
        "name": "Codegoni, Andrea"
      }, 
      {
        "affiliation": "University of Pavia", 
        "name": "Vercesi, Eleonora"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "url", 
        "identifier": "http://arxiv.org/abs/2102.01218", 
        "relation": "isSupplementTo"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4604568", 
        "relation": "isVersionOf"
      }
    ]
  }
}
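
The description in the record characterises the Gene Mover's Distance only in words: each cell's expression array is read as a discrete probability measure, and the distance is the value of an Optimal Transport problem between two such measures. One way to write this down is sketched here. The symbols x, y, μ, ν, π and the ground cost c_ij between genes i and j are notation assumed for illustration; the record itself does not say how the ground cost is chosen, so see [1] for the authors' actual definition.

```latex
% Assumed notation: x, y \in \mathbb{R}^n_{\ge 0} are the expression arrays
% of two cells over the same n genes, each with a positive total.
\mu_i = \frac{x_i}{\sum_{k=1}^{n} x_k}, \qquad
\nu_j = \frac{y_j}{\sum_{k=1}^{n} y_k}

% c_{ij} \ge 0 is an assumed ground cost between gene i and gene j.
\mathrm{GMD}(x, y) = \min_{\pi \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}\, \pi_{ij}
\quad \text{s.t.} \quad
\sum_{j=1}^{n} \pi_{ij} = \mu_i \;\; \forall i, \qquad
\sum_{i=1}^{n} \pi_{ij} = \nu_j \;\; \forall j
```

Normalising by the total expression is what turns each array into a probability measure, so both marginals carry total mass 1 and the transport problem is always feasible.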
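
The `files` entry of the export above lists a `checksum` and a `size` for `gmd_v1.0.0.zip`, which can be used to check a download before unpacking it. The following is a minimal sketch, assuming the archive has already been saved locally; the function names `file_digest` and `matches_record` are hypothetical helpers introduced here, not part of the dataset or of any Zenodo tooling.

```python
import hashlib
from pathlib import Path

# Values copied from the "files" entry of the record.
EXPECTED_CHECKSUM = "md5:91b47965e75517ed653f139774ac2e0e"
EXPECTED_SIZE = 78747182  # bytes


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so a large archive need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, checksum=EXPECTED_CHECKSUM, size=EXPECTED_SIZE):
    """Return True if the file has the size and checksum given in the record."""
    # The record stores the checksum as "<algorithm>:<hex digest>".
    algorithm, _, expected_hex = checksum.partition(":")
    if Path(path).stat().st_size != size:
        return False  # cheap check first: a truncated download fails here
    return file_digest(path, algorithm) == expected_hex.lower()
```

Calling `matches_record("gmd_v1.0.0.zip")`, with the path adjusted to wherever the archive was saved, returns `True` only when both the size and the MD5 digest agree with the record. MD5 detects accidental corruption but is not a safeguard against deliberate tampering.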