Defining Big Data for Generalist Repositories
Description
This white paper defines Big Data and its key characteristics, including volume, variety, velocity, and veracity. It explains that Generalist Repositories (GRs) are suitable for handling Big Data due to their data-agnostic nature and ability to store large, heterogeneous datasets. The document highlights the increasing importance of Big Data in research and the efforts of the NIH-funded Generalist Repository Ecosystem Initiative (GREI) to support its sharing and reuse.
The paper also outlines features and challenges specific to Big Data in GRs, such as storage size, file count, data updates, access methods, and interoperability with cloud platforms, and addresses data packaging, metadata creation, moderated access, curation support, retention periods, and associated costs. It links to additional GREI resources for comparing and selecting repositories, and acknowledges the sources used to develop the definition of Big Data and the NIH funding that supports the initiative.
Files
Name | Size | Checksum |
---|---|---|
Defining Big Data for Generalist Repositories.pdf | 294.7 kB | md5:ff21c49f22b21c8c6ddaa38bf057f6eb |
Additional details
Funding
- National Institutes of Health: Advancing Figshare and the generalist repository landscape to meet research community needs, award 3OT2DB000006-01S1
- National Institutes of Health: Center for Open Science (COS) Proposal for the NIH Generalist Repository Ecosystem Initiative (GREI), award 3OT2DB000001-01S1
- National Institutes of Health: The Generalist Repository Ecosystem Initiative (GREI), award 1OT2DB000002-01
- National Institutes of Health: Zenodo and the Generalist Repository Ecosystem Initiative (GREI), award 1OT2DB000013-01
- National Institutes of Health: Dryad and the Generalist Repository Ecosystem Initiative (GREI), award 3OT2DB000005-01S1
- National Institutes of Health: Vivli: A Generalist Repository For Clinical Trials Data, award 3OT2DB000003-01S1
- National Institutes of Health: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons, award 3OT2DB000004-01S1