Published August 4, 2025 | Version V1
Software Open

SDS Toolbox: End-to-End SDS Retrieval and Structured Data Extraction Using LLMs

  • 1. Edelweiss Connect (Switzerland)

Description

SDS Toolbox is an automated system for retrieving Safety Data Sheets (SDS) and extracting structured data from them, starting from a chemical identifier such as a CAS number or IUPAC name. It combines search-engine-based retrieval with LLM-based data extraction in a modular architecture.

Modules

The toolbox consists of three core modules:

1. SDS-FIND

  • Function: Searches for SDS files online using a CAS number or IUPAC name.

  • Sources: PDF files are retrieved from trusted sources indexed by search engines.

  • Search Engines Supported: Currently uses SerpAPI (Google) or Brave Search.
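As an illustration of how an identifier might be prepared before it is sent to a search engine, the sketch below validates a CAS number with the standard check-digit rule and builds a query string. The function names and the query wording are assumptions made for this example, not the toolbox's actual code, and the `filetype:pdf` operator is assumed to be honored by the chosen engine.

```python
import re

# CAS format: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(identifier: str) -> bool:
    """Return True if the string is a well-formed CAS number with a correct check digit."""
    match = CAS_PATTERN.match(identifier.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit
    # before the check digit; the weighted sum modulo 10 is the check digit.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


def build_sds_query(identifier: str) -> str:
    """Build a search query for SDS PDFs (hypothetical wording, for illustration only)."""
    term = identifier.strip()
    if is_valid_cas(term):
        term = f'"{term}"'  # quote CAS numbers so they are matched exactly
    return f"{term} safety data sheet filetype:pdf"


if __name__ == "__main__":
    print(is_valid_cas("7732-18-5"))
    print(build_sds_query("64-17-5"))
```

Rejecting malformed CAS numbers before searching avoids spending search-API calls on identifiers that cannot match any substance.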

2. SDS-STRUCT

  • Function: Parses and extracts structured data from SDS PDF files.

  • Technology: Utilizes LLMs to convert unstructured PDF content into structured formats (e.g., CSV, JSON).

  • Data Output: Extracted data includes chemical identifiers, hazard statements, manufacturer info, and safety classifications.
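A minimal sketch of the structured-output step follows. It assumes the LLM has been prompted to return a JSON object, then loads that object into a typed record and serializes records to CSV. The `SDSRecord` schema and its field names are hypothetical and chosen only to mirror the data categories listed above; the LLM call itself is left out.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SDSRecord:
    # Hypothetical schema: field names are illustrative, not the toolbox's own.
    cas_number: str = ""
    product_name: str = ""
    manufacturer: str = ""
    hazard_statements: List[str] = field(default_factory=list)
    ghs_classifications: List[str] = field(default_factory=list)


def parse_llm_response(raw: str) -> SDSRecord:
    """Parse a JSON object returned by an LLM into an SDSRecord, ignoring unknown keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    known = {key: data[key] for key in SDSRecord.__dataclass_fields__ if key in data}
    return SDSRecord(**known)


def records_to_csv(records: List[SDSRecord]) -> str:
    """Serialize records to CSV text, joining list fields with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(SDSRecord.__dataclass_fields__))
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["hazard_statements"] = "; ".join(record.hazard_statements)
        row["ghs_classifications"] = "; ".join(record.ghs_classifications)
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    raw = '{"cas_number": "64-17-5", "product_name": "Ethanol", "hazard_statements": ["H225", "H319"]}'
    print(records_to_csv([parse_llm_response(raw)]))
```

Dropping unknown keys and defaulting missing ones keeps the output columns stable even when the model's response varies between documents.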

3. SDS-FLOW

  • Function: An end-to-end pipeline that combines search (SDS-FIND) and extraction (SDS-STRUCT) into a single seamless process.

  • Input: CAS number or IUPAC name.

  • Output: Archived SDS files + extracted structured data.

  • Features:

    • Automated retries

    • Logging

    • Zip output with downloadable results

    • Email delivery (optional)
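The retry, logging, and zip-output features listed above could be sketched roughly as follows. This is an assumption-laden illustration using only the Python standard library: `with_retries` and `build_zip` are hypothetical helper names, and the real pipeline may structure these steps differently.

```python
import io
import logging
import time
import zipfile
from typing import Callable, Dict, TypeVar

T = TypeVar("T")
logger = logging.getLogger("sds_flow_sketch")  # hypothetical logger name


def with_retries(task: Callable[[], T], attempts: int = 3, delay_seconds: float = 1.0) -> T:
    """Run task, retrying on any exception; re-raise after the final failed attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
    raise RuntimeError("unreachable")


def build_zip(files: Dict[str, bytes]) -> bytes:
    """Pack named byte payloads (SDS PDFs, extracted data) into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

Building the archive in memory means the resulting bytes can be handed directly to a download button or attached to an email without writing temporary files.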

Key Features

  • Supports both single requests and batch processing

  • Modular and extensible architecture

  • Works with both local PDFs and online searches

  • Fully integrated with a Streamlit UI

  • Optional email delivery of results
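One way to serve single requests and batches through the same code path is to normalize the input into a list and collect failures per identifier, so that one failed lookup does not abort the rest. The sketch below assumes a caller-supplied `process` function standing in for the pipeline; all names here are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Union


def normalize_identifiers(identifiers: Union[str, Iterable[str]]) -> List[str]:
    """Accept one identifier or many; strip whitespace, drop blanks and duplicates, keep order."""
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    seen = set()
    result = []
    for item in identifiers:
        cleaned = item.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def run_batch(
    identifiers: Union[str, Iterable[str]],
    process: Callable[[str], object],
) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Apply process to each identifier; return (results, errors) keyed by identifier."""
    results: Dict[str, object] = {}
    errors: Dict[str, str] = {}
    for identifier in normalize_identifiers(identifiers):
        try:
            results[identifier] = process(identifier)
        except Exception as exc:
            # Record the failure and continue with the remaining identifiers.
            errors[identifier] = str(exc)
    return results, errors
```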

Files

SDS1.png

Files (518.8 kB)

Checksum Size
md5:9ca70491c2bc2a22091b8bdf7f2f1ed8 107.0 kB
md5:3a8e32f26beebd9f5e30f953643a0cec 207.2 kB
md5:88593cd22313eaac970ed90210286253 204.5 kB