Scaling Sensor Metadata Extraction for Exposure Health Using LLMs
Authors/Creators
Description
This repository contains resources supporting the manuscript “Scaling Sensor Metadata Extraction for Exposure Health Using Large Language Models.” It provides the workflow and supporting files for automating the extraction and harmonization of sensor metadata from exposure health literature.
Contents:
- Paper List (Excel): the 20 research papers used in the study. Users should download the full-text PDF of each paper.
- Extraction Code: Python scripts that use the OpenAI API to process the downloaded PDFs, extract sensor metadata, and write the results to an Excel file.
- Postprocessing Code: scripts that process the GPT-generated outputs, extract the metadata fields for each attribute, and compile them into structured Excel files.
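To illustrate what "attribute-level" postprocessing could look like, here is a minimal sketch that regroups per-paper JSON outputs into one table per metadata attribute. The JSON schema (`sensors`, `name`, `manufacturer`, `sampling_rate`), the paper identifiers, and the function `group_by_attribute` are assumptions made for illustration; the actual output format of the scripts in this repository is not described here.

```python
import json
from collections import defaultdict

# Hypothetical raw outputs: one JSON string per paper. The real schema used by
# the repository's scripts is not given in this record.
RAW_OUTPUTS = {
    "paper_01": '{"sensors": [{"name": "PM2.5 monitor", "manufacturer": "Acme", "sampling_rate": "1 min"}]}',
    "paper_02": '{"sensors": [{"name": "Wrist accelerometer", "manufacturer": "Beta"}]}',
}


def group_by_attribute(raw_outputs):
    """Return {attribute: [row, ...]}; each row records paper, sensor, and value."""
    tables = defaultdict(list)
    for paper_id, raw in raw_outputs.items():
        record = json.loads(raw)
        for sensor in record.get("sensors", []):
            sensor_name = sensor.get("name", "")
            for attribute, value in sensor.items():
                if attribute == "name":
                    continue  # the sensor name identifies the row; it is not an attribute
                tables[attribute].append(
                    {"paper": paper_id, "sensor": sensor_name, "value": value}
                )
    return dict(tables)


tables = group_by_attribute(RAW_OUTPUTS)
for attribute, rows in tables.items():
    print(attribute, rows)
```

Each resulting table could then be written to its own Excel sheet, for example with pandas `DataFrame.to_excel`, if that library is available.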
Usage:
- Download the listed papers in PDF format.
- Run instrument_entity.py to generate the raw metadata outputs.
- Run the postprocessing script json_to_xlsx.py to organize the extracted metadata into attribute-level Excel tables.
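The first two steps amount to looping over a folder of downloaded PDFs and saving one raw output per paper. The sketch below shows that loop under stated assumptions: the folder layout, the function `run_extraction`, and the stand-in `fake_extract` are hypothetical and are not taken from instrument_entity.py. The model call is passed in as a function, so the example runs offline without an API key.

```python
import json
from pathlib import Path


def run_extraction(pdf_dir, out_dir, extract_fn):
    """Apply extract_fn to every PDF in pdf_dir and save one JSON file per paper.

    extract_fn is a placeholder for the model call: it takes a PDF path and
    returns a dict of extracted metadata.
    """
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        metadata = extract_fn(pdf_path)
        out_path = out_dir / f"{pdf_path.stem}.json"
        out_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
        written.append(out_path)
    return written


def fake_extract(pdf_path):
    # Stand-in for the OpenAI API call (hypothetical output shape).
    return {"source": pdf_path.name, "sensors": []}
```

A call such as `run_extraction("papers", "raw_outputs", fake_extract)` would write one `.json` file per PDF found in `papers`; replacing `fake_extract` with a function that sends the paper text to the model gives the real extraction step.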
Files
Evaluation.zip
Additional details
Dates
- Submitted: 2025-08-22
Software
- Programming language: Python