Published October 7, 2025 | Version 1.3
Dataset Open

Evaluation Dataset for ChemGraph: An Agentic Framework for Computational Chemistry Workflows

  • 1. Argonne National Laboratory
  • 2. Argonne Leadership Computing Facility

Description

This dataset provides scripts, reference data, and evaluation tools for benchmarking ChemGraph, an LLM-based molecular simulation framework. It includes outputs generated by four language models: GPT-4o-mini, Claude-3.5-haiku, Qwen2.5-14B, and GPT-4o.

Main Files and Descriptions

- data_from_pubchempy.json: Structured chemical information obtained from PubChemPy. Serves as an input dataset for each experiment.

- manual_workflow.json: A manually constructed reference workflow containing the ground-truth tool-call sequences and outputs. Used as the reference when benchmarking LLM results.

- llm_workflow_[...].json: JSON files containing the tool-use outputs generated by the different LLMs. Each file also includes metadata such as the model name, timestamps, and system prompt.
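
Because the record does not document the internal schema of these JSON files, a reasonable first step is to inspect their top-level shape before writing any comparison code. The sketch below is an illustration only: `load_json` and `summarize` are hypothetical helper names, not part of ChemGraph, and the code assumes nothing about the files beyond their being valid JSON stored in the current directory.

```python
import json


def load_json(path):
    """Parse one of the dataset's JSON files into Python objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data, max_keys=10):
    """Describe the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "object", "size": len(data), "keys": sorted(data)[:max_keys]}
    if isinstance(data, list):
        return {"type": "array", "size": len(data), "keys": []}
    # Scalars (string, number, boolean, null) have no substructure to list.
    return {"type": type(data).__name__, "size": 1, "keys": []}


if __name__ == "__main__":
    # File names are taken from the list above; download the files first.
    for name in ("data_from_pubchempy.json", "manual_workflow.json"):
        try:
            print(name, summarize(load_json(name)))
        except FileNotFoundError:
            print(name, "not found in the current directory")
```

Running the same summary on an llm_workflow_[...].json file and on manual_workflow.json shows whether the two share a top-level layout, which is worth confirming before comparing tool-call sequences entry by entry.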

 

Update history:

  • October 7th, 2025:
    • Uploaded the ChemGraph source code release associated with the manuscript.
  • October 6th, 2025:
    • Uploaded plotting data in the manuscript: evaluation_plot_data.json.
  • August 29th, 2025:
    • Expanded benchmark from 260 to 360. 
    • Reran all evaluations. Added GPT-4o multi-agent evaluation.

Files

ChemGraph-1.0.0.zip

Files (14.7 MB)

Checksum Size
md5:c5e3d10699ffe91e9e96662a87a33d60 891.7 kB
md5:c4f7a2865ef388900cc62f3dbefc3932 13.8 MB
md5:d79dacb901e0dc6317b88c18a2af744d 36.4 kB
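
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names (`file_chunks`, `md5_of_chunks`, `matches_published`) are hypothetical. Because the listing does not pair each checksum with a file name, the check only tests whether a file's digest appears among the published values.

```python
import hashlib

# MD5 digests as listed on the record page, with the "md5:" prefix removed.
PUBLISHED_MD5 = {
    "c5e3d10699ffe91e9e96662a87a33d60",
    "c4f7a2865ef388900cc62f3dbefc3932",
    "d79dacb901e0dc6317b88c18a2af744d",
}


def file_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size binary chunks (1 MiB by default)."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """True if the file's MD5 digest is one of the published values."""
    return md5_of_chunks(file_chunks(path)) in PUBLISHED_MD5
```

Reading in chunks keeps memory use flat regardless of file size. MD5 is adequate for detecting accidental corruption, but it is not a guarantee against deliberate tampering.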

Additional details

Dates

Created: 2025-06-02 (initial submission)