Published February 5, 2026 | Version v2
Journal article Open

Scaling Sensor Metadata Extraction for Exposure Health Using LLMs

  • 1. University of Utah
  • 2. University of Utah

Description

This repository contains resources supporting the manuscript “Scaling Sensor Metadata Extraction for Exposure Health Using Large Language Models.” It provides the workflow and supporting files for automating the extraction and harmonization of sensor metadata from exposure health literature.

Contents:

  • Paper List (Excel): list of the 20 research papers used. Users should download the full-text PDFs of these papers.

  • Extraction Code: Python scripts that use the OpenAI API to process the downloaded PDFs, extract sensor metadata, and write the results to an Excel file.

  • Postprocessing Code: Scripts that process the GPT-generated outputs, extract metadata fields for each attribute, and compile them into structured Excel files.
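The extraction step listed above can be pictured roughly as follows. This is a minimal sketch, not the repository's code: the attribute names in `ATTRIBUTES`, the prompt wording, the `{"sensors": [...]}` reply shape, and the helper names `build_prompt` and `extract_metadata` are all assumptions made for illustration. The model call is passed in as a plain function (`ask_model`) so the sketch does not depend on any particular client library or model.

```python
import json
from typing import Callable

# Hypothetical attribute names; the repository's actual schema may differ.
ATTRIBUTES = ["sensor_name", "manufacturer", "measured_quantity"]


def build_prompt(paper_text: str, attributes: list[str]) -> str:
    """Ask for one JSON object holding a "sensors" list, one entry per sensor."""
    fields = ", ".join(attributes)
    return (
        "Extract every sensor described in the paper below. "
        'Return a JSON object of the form {"sensors": [...]}, where each '
        f"entry has the keys: {fields}. Use null for values the paper "
        "does not report.\n\n" + paper_text
    )


def extract_metadata(paper_text: str, ask_model: Callable[[str], str]) -> list[dict]:
    """Send the prompt through `ask_model` and parse the JSON reply."""
    reply = ask_model(build_prompt(paper_text, ATTRIBUTES))
    try:
        sensors = json.loads(reply)["sensors"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return []  # malformed reply: return nothing so the paper can be reviewed by hand
    if not isinstance(sensors, list):
        return []
    # Keep only the expected keys; attributes missing from the reply become None.
    return [
        {key: sensor.get(key) for key in ATTRIBUTES}
        for sensor in sensors
        if isinstance(sensor, dict)
    ]
```

In practice `ask_model` would wrap a request to the OpenAI API, which the description says the extraction scripts use; the model, prompt, and output schema actually used are not stated on this page.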

Usage:

  1. Download the listed papers in PDF format.

  2. Run instrument_entity.py to generate the raw metadata outputs.

  3. Run the postprocessing script json_to_xlsx.py to organize the extracted metadata into attribute-level Excel tables.
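As an illustration of what "attribute-level tables" in the last step could mean, the sketch below regroups per-paper JSON outputs so that each attribute gets its own table of rows. It is an assumption-laden example rather than the logic of json_to_xlsx.py: the input is assumed to be a JSON list of sensor objects per paper, and the function name `to_attribute_tables` and the row keys `paper`, `sensor_index`, and `value` are hypothetical.

```python
import json
from collections import defaultdict


def to_attribute_tables(raw_outputs: dict[str, str]) -> dict[str, list[dict]]:
    """Map each attribute name to rows of (paper, sensor_index, value).

    `raw_outputs` maps a paper identifier to the JSON text produced for it,
    assumed here to be a list of sensor objects.
    """
    tables = defaultdict(list)
    for paper, text in raw_outputs.items():
        for index, sensor in enumerate(json.loads(text)):
            for attribute, value in sensor.items():
                tables[attribute].append(
                    {"paper": paper, "sensor_index": index, "value": value}
                )
    return dict(tables)


# Example with made-up values, one paper describing two sensors.
raw = {"paper_01": '[{"sensor_name": "A", "manufacturer": "X"}, {"sensor_name": "B"}]'}
tables = to_attribute_tables(raw)
```

Each entry of `tables` could then be written to its own worksheet, for example with pandas `DataFrame.to_excel`; that step is left out here to keep the sketch free of third-party dependencies.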

Files

Evaluation.zip

Files (230.1 kB)

Checksum Size
md5:9a3fd7347e418ae5027d49a09791bf7e 210.4 kB
md5:a78249eee62bf29643cfe44afca81399 7.7 kB
md5:8860fc6e0bbd22efbbd15a2f721e351c 1.5 kB
md5:52a1075b0e2f6a1ee1b6558f49c72ec0 10.5 kB
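The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of` and `matches` are illustrative, and which checksum belongs to which file has to be read from the record itself.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    return md5_of(path) == expected.removeprefix("md5:").lower()
```

For example, `matches(Path("downloaded_file"), "md5:9a3fd7347e418ae5027d49a09791bf7e")` returns True only if the local file (the path here is a placeholder) has exactly that digest.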

Additional details

Dates

Submitted
2025-08-22

Software

Programming language
Python