10.5281/zenodo.1484403
https://zenodo.org/records/1484403
oai:zenodo.org:1484403
Kyle Chard
0000-0002-7370-4805
Ravi Madduri
Michael D'Arcy
Segun Jung
Alexis Rodriguez
Dinanath Sulakhe
Eric Deutsch
Cory Funk
Ben Heavner
Matthew Richards
Paul Shannon
Gustavo Glusman
Nathan Price
Carl Kesselman
Ian Foster
Reproducible big data science: A case study in continuous FAIRness
Zenodo
2018
2018-10-29
Presentation
10.1101/268755
10.1109/BigData.2016.7840618
10.1038/sdata.2016.18
10.5281/zenodo.1310034
10.1101/252023
http://bd2k.ini.usc.edu/tools/bdbag/
10.1101/gr.134890.111
http://identifiers.org/ark/ark:/57799/b9dt2t
http://identifiers.org/ark/ark:/57799/b9vx04
http://identifiers.org/ark/ark:/57799/b9496p
http://identifiers.org/ark/ark:/57799/b9v398
http://identifiers.org/ark/ark:/57799/b97957
http://identifiers.org/ark/ark:/57799/b9p09p
http://identifiers.org/ark/ark:/57799/b93m4q
http://identifiers.org/ark/ark:/57799/b9jd6f
http://identifiers.org/ark/ark:/57799/b97x0j
http://identifiers.org/ark/ark:/57799/b9zh5t
http://identifiers.org/ark/ark:/57799/b9fx1s
https://github.com/fair-research/encode2bag
https://github.com/fair-research/encode2bag-service
10.5281/zenodo.1484402
https://zenodo.org/communities/ro
Creative Commons Attribution 4.0 International
Big biomedical data create exciting opportunities for discovery but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data.
In this talk, we will describe enhancements to the Galaxy framework that support working with datasets referenced by minids, analyzing BagIt-based research objects called BDBags, and executing software encapsulated in Docker containers with unique identifiers. We will also describe the tools and services developed to create end-to-end reproducible analysis pipelines that adhere to FAIR principles.
Accepted talk at RO2018