Conference paper Open Access

I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

Chard, Kyle; D'Arcy, Mike; Heavner, Ben; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Rodriguez, Alexis; Soiland-Reyes, Stian; Goble, Carole; Clark, Kristi; Deutsch, Eric W.; Dinov, Ivo; Price, Nathan; Toga, Arthur


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="URL">https://zenodo.org/record/820878</identifier>
  <creators>
    <creator>
      <creatorName>Chard, Kyle</creatorName>
      <givenName>Kyle</givenName>
      <familyName>Chard</familyName>
      <affiliation>The University of Chicago and Argonne National Laboratory, Chicago IL, USA</affiliation>
    </creator>
    <creator>
      <creatorName>D'Arcy, Mike</creatorName>
      <givenName>Mike</givenName>
      <familyName>D'Arcy</familyName>
      <affiliation>University of Southern California, Los Angeles, CA, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Heavner, Ben</creatorName>
      <givenName>Ben</givenName>
      <familyName>Heavner</familyName>
      <affiliation>Institute for Systems Biology, Seattle, WA, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Foster, Ian</creatorName>
      <givenName>Ian</givenName>
      <familyName>Foster</familyName>
      <affiliation>The University of Chicago and Argonne National Laboratory, Chicago IL, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Kesselman, Carl</creatorName>
      <givenName>Carl</givenName>
      <familyName>Kesselman</familyName>
      <affiliation>University of Southern California, Los Angeles, CA, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Madduri, Ravi</creatorName>
      <givenName>Ravi</givenName>
      <familyName>Madduri</familyName>
      <affiliation>The University of Chicago and Argonne National Laboratory, Chicago IL, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Rodriguez, Alexis</creatorName>
      <givenName>Alexis</givenName>
      <familyName>Rodriguez</familyName>
      <affiliation>The University of Chicago and Argonne National Laboratory, Chicago IL, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Soiland-Reyes, Stian</creatorName>
      <givenName>Stian</givenName>
      <familyName>Soiland-Reyes</familyName>
      <affiliation>The University of Manchester, Manchester, UK</affiliation>
    </creator>
    <creator>
      <creatorName>Goble, Carole</creatorName>
      <givenName>Carole</givenName>
      <familyName>Goble</familyName>
      <affiliation>The University of Manchester, Manchester, UK</affiliation>
    </creator>
    <creator>
      <creatorName>Clark, Kristi</creatorName>
      <givenName>Kristi</givenName>
      <familyName>Clark</familyName>
      <affiliation>University of Southern California, Los Angeles, CA, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Deutsch, Eric W.</creatorName>
      <givenName>Eric W.</givenName>
      <familyName>Deutsch</familyName>
      <affiliation>Institute for Systems Biology, Seattle, WA, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Dinov, Ivo</creatorName>
      <givenName>Ivo</givenName>
      <familyName>Dinov</familyName>
      <affiliation>The University of Michigan, Ann Arbor, MI, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Price, Nathan</creatorName>
      <givenName>Nathan</givenName>
      <familyName>Price</familyName>
      <affiliation>Institute for Systems Biology, Seattle, WA, USA</affiliation>
    </creator>
    <creator>
      <creatorName>Toga, Arthur</creatorName>
      <givenName>Arthur</givenName>
      <familyName>Toga</familyName>
      <affiliation>University of Southern California, Los Angeles, CA, USA</affiliation>
    </creator>
  </creators>
  <titles>
    <title>I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2016</publicationYear>
  <subjects>
    <subject>Big Data</subject>
    <subject>data analysis</subject>
    <subject>BDBags</subject>
    <subject>Big Data analysis</subject>
    <subject>Big Data bags</subject>
    <subject>Big Data sharing</subject>
    <subject>Minid</subject>
    <subject>data assembling</subject>
    <subject>data collections</subject>
    <subject>data descriptions</subject>
    <subject>datasets</subject>
    <subject>identifiers</subject>
    <subject>research objects</subject>
    <subject>Encoding</subject>
    <subject>Metadata</subject>
    <subject>Payloads</subject>
    <subject>Robustness</subject>
    <subject>Software</subject>
    <subject>Uniform resource locators</subject>
    <subject>bdbag</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2016-12-05</date>
  </dates>
  <resourceType resourceTypeGeneral="Text">Conference paper</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/820878</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsIdenticalTo">https://static.aminer.org/pdf/fa/bigdata2016/BigD418.pdf</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsIdenticalTo">https://www.research.manchester.ac.uk/portal/files/45989205/bagminid.pdf</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsSupplementedBy">http://bd2k.ini.usc.edu/tools/</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsSupplementedBy">https://github.com/ini-bdds/bdbag</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://www.research.manchester.ac.uk/portal/en/publications/ill-take-that-to-go(8335e672-1d85-4649-a245-56fbdb1bd423).html</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="Cites">https://w3id.org/ro/bagit</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.1109/BigData.2016.7840618</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/bioexcel</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/linkeddata</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="http://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;em&gt;Big data workflows&lt;/em&gt; often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified.&lt;/p&gt;

&lt;p&gt;We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (&lt;strong&gt;BDBags&lt;/strong&gt;), data descriptions (&lt;strong&gt;Research Objects&lt;/strong&gt;), and simple persistent identifiers (&lt;strong&gt;Minids&lt;/strong&gt;) to create a powerful ecosystem of tools and services for big data analysis and sharing.&lt;/p&gt;

&lt;p&gt;We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.&lt;/p&gt;</description>
  </descriptions>
  <fundingReferences>
    <fundingReference>
      <funderName>European Commission</funderName>
      <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/501100000780</funderIdentifier>
      <awardNumber awardURI="info:eu-repo/grantAgreement/EC/H2020/675728/">675728</awardNumber>
      <awardTitle>Centre of Excellence for Biomolecular Research</awardTitle>
    </fundingReference>
  </fundingReferences>
</resource>
Views 314
Downloads 150
Data volume 107.0 MB
Unique views 300
Unique downloads 137
