DOME Copilot - Data Files (V1)
Description
Archive Content: DOME_Copilot_Data_Package.zip
The main data package is organised into the following directories and files:
1. Metadata Index
- DOME_Copilot_Data_Package_Metadata.csv: A comprehensive index mapping every PMCID (PubMed Central ID) in the dataset to the directories where its data can be found. Use this to quickly locate resources.
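A lookup against the index can be sketched as follows. This is a minimal illustration, not part of the package: the function name `find_pmcid_rows` and the column name `PMCID` are assumptions, since the CSV header is not documented here. Check the actual header row and pass the right name via `id_column`.

```python
import csv


def find_pmcid_rows(csv_path, pmcid, id_column="PMCID"):
    """Return every index row whose ID column equals the given PMCID.

    `id_column` is an assumed column name; adjust it to match the
    header of the metadata CSV.
    """
    # utf-8-sig reads files both with and without a byte-order mark.
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        if id_column not in (reader.fieldnames or []):
            raise KeyError(
                f"Column {id_column!r} not found; available: {reader.fieldnames}"
            )
        # Rows come back as dicts keyed by the CSV header names.
        return [row for row in reader if (row[id_column] or "").strip() == pmcid]
```

Calling `find_pmcid_rows("DOME_Copilot_Data_Package_Metadata.csv", some_pmcid)` would return the matching rows as dictionaries, whose remaining columns should indicate where that paper's data lives. An empty list means the PMCID is not in the index.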
2. Registry & Ground Truth
- DOME_Registry_Human_Reviews_258_20260205.json: The core reference dataset containing expert human-reviewed metadata annotations for the DOME Registry papers (258 entries).
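As a quick sanity check after download, the entry count can be verified with a short script. This sketch assumes the file's top level is a JSON array or object with one element per review; the internal layout is not described here, so treat that as an assumption. The helper name `count_entries` is illustrative.

```python
import json


def count_entries(json_path):
    """Count top-level entries in a JSON file.

    Assumes the top level is a list or an object with one element
    per entry; any other layout raises a TypeError.
    """
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, (list, dict)):
        raise TypeError(f"Unexpected top-level JSON type: {type(data).__name__}")
    return len(data)
```

If the assumption holds, `count_entries("DOME_Registry_Human_Reviews_258_20260205.json")` should match the 258 entries stated above.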
3. Copilot Output (AI Generated)
- Copilot_Processed_Datasets_JSON/:
  - Contains the structured JSON metadata generated by the DOME Copilot pipeline.
  - Includes processed versions for the Registry Cohort (222 papers) and the Experimental Cohort (1012 papers).
  - Note: API-repaired metadata files are also available.
4. Supplementary Data Material
This archive contains the raw supplementary material downloaded from PMC for the study cohorts, which serves as input data for the DOME analysis.
- DOME_Registry_PMC_Supplementary/: Main PDF & Supplementary files for the 222 articles in the DOME Registry (Source Truth).
- Positive_PMC_Supplementary/: Main PDF & Supplementary files for the 1012 "Positive" articles (Machine Learning relevant - Experimental Cohort).
- Negative_PMC_Supplementary/: Main PDF & Supplementary files for the 1012 "Negative" articles (Control set / Non-ML).
5. Evaluation Data
- 30_Evaluation_Source_JSONs_Human_and_Copilot_Including_PDFs/:
  - The specific source files used for the deep-dive comparative evaluation (Human vs. AI) on a diverse subset of 30 of the 222 processed papers.
  - Contains the Copilot JSONs, Human JSONs, and original PDFs used in the evaluation interface.
Notes
- Matching: The DOME_Copilot_Data_Package_Metadata.csv file is the key to linking entries across these folders. Not all papers have supplementary data.
- Privacy: User-specific data (profiles, emails) has been excluded.