Conference paper Open Access

I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

Chard, Kyle; D'Arcy, Mike; Heavner, Ben; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Rodriguez, Alexis; Soiland-Reyes, Stian; Goble, Carole; Clark, Kristi; Deutsch, Eric W.; Dinov, Ivo; Price, Nathan; Toga, Arthur


Citation Style Language JSON Export

{
  "publisher": "IEEE", 
  "DOI": "10.1109/BigData.2016.7840618", 
  "ISBN": "978-1-4673-9005-7", 
  "container_title": "2016 IEEE International Conference on Big Data (Big Data)", 
  "title": "I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets", 
  "issued": {
    "date-parts": [
      [
        2016, 
        12, 
        5
      ]
    ]
  }, 
  "abstract": "<p><em>Big data workflows</em> often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified.</p>\n\n<p>We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (<strong>BDBags</strong>), data descriptions (<strong>Research Objects</strong>), and simple persistent identifiers (<strong>Minids</strong>) to create a powerful ecosystem of tools and services for big data analysis and sharing.</p>\n\n<p>We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.</p>", 
  "author": [
    {
      "family": "Chard, Kyle"
    }, 
    {
      "family": "D'Arcy, Mike"
    }, 
    {
      "family": "Heavner, Ben"
    }, 
    {
      "family": "Foster, Ian"
    }, 
    {
      "family": "Kesselman, Carl"
    }, 
    {
      "family": "Madduri, Ravi"
    }, 
    {
      "family": "Rodriguez, Alexis"
    }, 
    {
      "family": "Soiland-Reyes, Stian"
    }, 
    {
      "family": "Goble, Carole"
    }, 
    {
      "family": "Clark, Kristi"
    }, 
    {
      "family": "Deutsch, Eric W."
    }, 
    {
      "family": "Dinov, Ivo"
    }, 
    {
      "family": "Price, Nathan"
    }, 
    {
      "family": "Toga, Arthur"
    }
  ], 
  "page": "319-328", 
  "type": "paper-conference", 
  "id": "820878"
}
Views 305
Downloads 148
Data volume 105.6 MB
Unique views 291
Unique downloads 135
