Survey on FAIRness of CFReDS Portal datasets' metadata - 2022
Description
This dataset consists of a database (.SQL format) containing the results of an analysis of the metadata of 212 datasets in the Computer Forensic Reference DataSet Portal (CFReDS, NIST). The survey behind this dataset was carried out by Samuele Mombelli between 21 December 2022 and 28 January 2023. It analyzed the metadata associated with these datasets and assessed their compliance with the FAIR Principles (Findability, Accessibility, Interoperability and Reusability). The data were collected using a purpose-built checklist whose criteria represent the authors' own implementation of the FAIR principles.
This dataset accompanies the following article: Mombelli S., Lyle J. R., and Breitinger F. (2024). FAIRness in digital forensics datasets’ metadata – and how to improve it. Forensic Science International: Digital Investigation 48. DFRWS EU 2024 - Selected Papers from the 11th Annual Digital Forensics Research Conference Europe: 301681. https://doi.org/10.1016/j.fsidi.2023.301681.
Further details on the criteria used and the structure of the data can be found in the documentation associated with the database.
MD5 checksum of the SQL database: FBC41CFB9FF8F4CB1BE08D779DA7EB56
SHA-256 checksum of the SQL database: 23498F297F42CDC3F058A2804FE5092DCA8087BC2BC718C6DCB377B5B1207154
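The published checksums can be used to confirm that a downloaded copy of the database is intact. The following is a minimal sketch in Python using only the standard library. The function names (file_digests, matches_published_checksums) and the file name in the usage comment are hypothetical and chosen for illustration; this record does not state the actual file name of the SQL database.

```python
import hashlib

# Checksums as published in this record (hex digests, compared case-insensitively).
EXPECTED_MD5 = "FBC41CFB9FF8F4CB1BE08D779DA7EB56"
EXPECTED_SHA256 = "23498F297F42CDC3F058A2804FE5092DCA8087BC2BC718C6DCB377B5B1207154"


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) for the file at `path`.

    The file is read in chunks so that large files do not have to fit in memory.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def matches_published_checksums(path):
    """Return True only if both digests of `path` equal the published values."""
    md5_hex, sha256_hex = file_digests(path)
    return (
        md5_hex.upper() == EXPECTED_MD5
        and sha256_hex.upper() == EXPECTED_SHA256
    )


# Usage (hypothetical file name; replace with the path of the downloaded file):
# print(matches_published_checksums("cfreds_fairness_survey.sql"))
```

A mismatch in either digest suggests that the file was altered or corrupted in transit, or that a different version of the file was downloaded.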
Two documentation files are attached to the SQL database to describe its data structure: a human-readable TXT file and a machine-readable JSON file.
Abstract
The availability of research data (datasets) and compliance with FAIR principles (i.e., Findability, Accessibility, Interoperability, and Reusability) are critical to the advancement of digital forensics. This study assesses metadata completeness and adherence to FAIR principles based on the 212 datasets listed in CFReDS from NIST. The findings underscore deficiencies in metadata quality and FAIR compliance, emphasizing the need for improved data management standards. Based on our critical review, we then propose and discuss various approaches to improve the status quo.
Results
This article has shed light on the current practices for sharing datasets in digital forensics. While there are well-established repositories providing datasets that contribute to the findability of data, we have identified a significant deficiency in how the field addresses the FAIR principles as a whole.
Our study began by addressing three fundamental research questions, namely, the availability and comprehensiveness of metadata for digital forensics datasets, the compliance of this metadata with the FAIR principles, and strategies to enhance such compliance. Through a meticulous examination of all 212 datasets referenced in the CFReDS Portal, we have uncovered a sobering reality: current practices in this domain are far from ideal.
Our contributions to the field are twofold. First, we have identified the existing deficiencies in metadata quality, thereby expanding the understanding of data quality challenges within the research community. This insight serves as a valuable starting point for addressing these issues comprehensively.
Second, we offer a practical set of recommendations aimed at improving metadata quality. These recommendations can help dataset creators, curators, and data users enhance the completeness and quality of the metadata associated with datasets.
Our contributions serve as a foundation for fostering better data practices, ultimately advancing the FAIR principles within the digital forensics community. We hope that this research will stimulate further discussion, innovation, and collaboration.