Published March 4, 2025 | Version v1
Publication | Open access

Defining Big Data for Generalist Repositories

  • 1. Figshare (United Kingdom)
  • 2. Northwestern University
  • 3. Northwestern Medicine
  • 4. Mendeley Data

Description

This white paper defines Big Data and its key characteristics, including volume, variety, velocity, and veracity. It explains that Generalist Repositories (GRs) are suitable for handling Big Data due to their data-agnostic nature and ability to store large, heterogeneous datasets. The document highlights the increasing importance of Big Data in research and the efforts of the NIH-funded Generalist Repository Ecosystem Initiative (GREI) to support its sharing and reuse.

The document outlines specific features and challenges of Big Data in GRs, including storage size, number of files, data updates, access methods, and interoperability with cloud platforms. It also addresses data packaging, metadata creation, moderated access, curation support, retention periods, and associated costs, and it links to additional GREI resources for comparing and selecting repositories. Finally, it acknowledges the sources used to develop the definition of Big Data and the NIH funding that supports the initiative.

Files

Defining Big Data for Generalist Repositories.pdf (294.7 kB)
md5:ff21c49f22b21c8c6ddaa38bf057f6eb

Additional details

Funding

National Institutes of Health
Advancing Figshare and the generalist repository landscape to meet research community needs 3OT2DB000006-01S1
National Institutes of Health
Center for Open Science (COS) Proposal for the NIH Generalist Repository Ecosystem Initiative (GREI) 3OT2DB000001-01S1
National Institutes of Health
THE GENERALIST REPOSITORY ECOSYSTEM INITIATIVE (GREI) 1OT2DB000002-01
National Institutes of Health
Zenodo and the Generalist Repository Ecosystem Initiative (GREI) 1OT2DB000013-01
National Institutes of Health
Dryad and the Generalist Repository Ecosystem Initiative (GREI) 3OT2DB000005-01S1
National Institutes of Health
Vivli: A Generalist Repository For Clinical Trials Data 3OT2DB000003-01S1
National Institutes of Health
The Harvard Dataverse repository: A generalist repository integrated with a Data Commons 3OT2DB000004-01S1