Published April 27, 2022 | Version v1
Poster Open

Librarians and Researchers Working Together: Applying FAIR Principles to Cohort Research Projects

  • 1. Harvard Medical School
  • 2. Harvard T.H. Chan School of Public Health

Description

Objectives: Long-term cohorts provide a valuable scientific resource. Cohort datasets can be analyzed to answer numerous questions beyond the initial study’s aims. However, this potential is realized only if thorough data documentation and accessibility strategies are implemented. Because cohort studies can span several decades, proper documentation is essential: data may be stored in multiple file types, and those formats are not always stable over time. The NIH continues to develop data management guidelines to ensure research results are Findable, Accessible, Interoperable, and Reusable (the FAIR Data Principles). Recent funding from NIH provides opportunities to incorporate new expertise into research projects to enhance overall data management.

Methods: A new collaboration between the Boston Children's Hospital Neuroepidemiology Unit and Countway Library directly links researchers and data science librarians to implement recommended data management practices for research studies (R01ES027825, 2018-2022, PI: Dr. Maitreyi Mazumdar). Combining these two perspectives provides a comprehensive approach to all aspects of the data lifecycle. Data management methods used in this study include preparing data documentation in long-term stable file formats with associated readme files, documenting research analysis code with markdown and Jupyter Notebook, applying biomedical ontologies to ensure data are findable and interoperable, and exploring the use of data repositories, such as the Harvard Dataverse.

Results: We created a comprehensive framework that includes best practices for data documentation, retention, and file formats. Successes included combining different datasets and describing them with metadata, and using Medical Subject Headings (MeSH) to label different parts of the study with consistent terminology. However, challenges arose around data file longevity; data identification; institutional preferences for data sharing; and informed consent for longitudinal studies. Examples of the project’s data management framework and documentation are presented, including strategies for overcoming some of the challenges inherent in human subjects studies.

Conclusions: The approaches to applying the FAIR Principles discussed in this study are applicable to a broad range of epidemiologic research. These efforts also align with the recently proposed provisions for an NIH Data Management and Sharing Policy, which include data management and sharing plans, data preservation and access, and data sharing agreements. Additional information will be shared on ways to enhance data documentation while limiting the extra time required from a study’s principal investigator or research staff.

Notes

Funding: NIEHS R01ES027825-S1 supported this work. Content does not necessarily represent the official views of NIH.

Files

2022-MLA-Poster-Goldman-Obrycki.pdf (238.4 kB)
md5:a66d24c909e71b307a3e7454ece44cca

Additional details

Funding

Arsenic related cystic fibrosis 5R01ES027825-02
National Institutes of Health