Documentation to Foster Sharing and Use of Open Earth Science Data: Quality Information
- 1. Socioeconomic Data and Applications Center, Center for International Earth Science Information Network, Columbia University
- 2. University of Alabama at Huntsville & NASA Marshall Space Flight Center, IMPACT Project
- 3. Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA
- 4. Science Systems and Applications, Inc. & NASA Goddard Space Flight Center
- 5. Oak Ridge National Laboratory, Oak Ridge, TN
Description
Reusing open Earth science data makes it possible to leverage observations from previously conducted research to conduct new research. By supporting data producers in their efforts to share the data that they have collected, scientific data repositories enable broader audiences to reuse these data. However, simply providing access to data is not enough to facilitate understanding of the data by diverse users who did not collect the data and are not trained in the discipline that the data represent. Nor is it enough to foster reuse by transdisciplinary users, practitioners, learners, and members of the general public who could benefit from reusing research data. If plans for the dissemination of data include reuse across disciplinary boundaries, including non-scientific reuse, then the data must be packaged, described, and made discoverable in a way that allows potential users to understand the data in terms of their own objectives. Effectively packaging, documenting, and indexing data facilitates understanding of the data and fosters reuse of data products and services by broader audiences as well as by those who are more familiar with the data. Such data curation activities can also reduce the potential for misuse of the data.
What is often missing is the documentation of data quality that is needed to support the effective use of open data. In addition to ensuring that data are designated as open, the quality of datasets must also be described to foster more informed and proper usage. Information about the quality of open data should be clearly documented and discoverable so that it is available to all potential users of the data. When searching for open data to complete a project, potential users need quality information to determine whether the data are sufficient to meet their needs. Providing data quality information reduces the need to make assumptions when selecting among candidate data products and services for potential reuse. Similarly, data users require information on data quality when deciding on the applicability of data and on the methods to be used for analyzing and interpreting the data. Information about data quality should be described clearly so that it can be easily understood by potential users from cross-disciplinary fields, as well as by practitioners such as operational forecasters, planners, and decision makers.
Reuse scenarios demonstrate the need for providing understandable data quality information along with the data. For example, when identifying open data products for possible integration to create new data products or services, interoperable data integration workflows are developed to ensure that the quality of each candidate dataset is described. Data quality descriptions enable candidate data products to be assessed in terms of their compatibility with each other and their applicability for the purposes of the proposed data integration project. In these workflows, the quality of the resulting integrated dataset must be described in appropriately curated discovery metadata and in the data documentation to foster traceable, targeted, and efficient decision making about the potential of each data product and to support the data integration effort. The importance of documenting data quality is described, and the value of effectively documenting and packaging data quality information is demonstrated with real-world research and operational use cases.
Files
Downs-IDCC21DocShareUseESDataQualityInfoSlides20210419.pdf
Additional details
References
- Wilkinson, et al. 2016. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Scientific Data 3. https://doi.org/10.1038/sdata.2016.18
- Carroll, et al. 2020. The CARE Principles for Indigenous Data Governance. Data Science Journal 19(1). http://doi.org/10.5334/dsj-2020-043
- Lin, et al. 2020. The TRUST Principles for Digital Repositories. Scientific Data 7, 144. https://doi.org/10.1038/s41597-020-0486-7
- GEOSS Data Sharing Principles. 2016. Group on Earth Observations. https://earthobservations.org/open_eo_data.php
- GEOSS Data Management Principles. 2015. Group on Earth Observations. https://earthobservations.org/open_eo_data.php