Evaluation Dataset for ChemGraph: An Agentic Framework for Computational Chemistry Workflows
Authors/Creators
Description
This dataset provides scripts, reference data, and evaluation tools for benchmarking ChemGraph, an LLM-based molecular simulation framework. It includes outputs generated by four different language models: GPT-4o-mini, Claude-3.5-haiku, Qwen2.5-14B, and GPT-4o.
Main Files and Descriptions
- data_from_pubchempy.json: Structured chemical information obtained via PubChemPy. Serves as the input dataset for each experiment.
- manual_workflow.json: A manually constructed reference workflow representing true tool call sequences and outputs. Used for benchmarking LLM results.
- llm_workflow_[...].json: JSON files containing the tool-use outputs generated by each LLM. Each file includes additional metadata such as the model name, timestamps, and the system prompt.
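To compare an LLM-generated workflow against the manual reference, one can load both JSON files and check whether the tool-call sequences match. The sketch below is illustrative only: the key names `tool_calls` and `name` are assumptions about the record structure, not confirmed by the dataset.

```python
import json

def load_workflow(path):
    """Load a workflow JSON file, e.g. manual_workflow.json."""
    with open(path) as f:
        return json.load(f)

def tool_sequence(entry):
    # "tool_calls" and "name" are assumed key names for illustration.
    return [call["name"] for call in entry.get("tool_calls", [])]

def sequence_match(reference, predicted):
    """True if the predicted workflow reproduces the reference tool-call order exactly."""
    return tool_sequence(reference) == tool_sequence(predicted)

# Illustrative records; the shapes are assumptions, not the dataset schema.
ref = {"tool_calls": [{"name": "smiles_to_structure"}, {"name": "run_optimization"}]}
llm = {"tool_calls": [{"name": "smiles_to_structure"}, {"name": "run_optimization"}]}
print(sequence_match(ref, llm))
```

In practice, `load_workflow` would be pointed at the manual reference and one of the `llm_workflow_[...].json` files, and a stricter comparison could also check tool arguments and outputs rather than names alone.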
Update history
- October 7th, 2025:
- Uploaded the ChemGraph source code release associated with the manuscript.
- October 6th, 2025:
- Uploaded the plotting data used in the manuscript: evaluation_plot_data.json.
- August 29th, 2025:
- Expanded the benchmark from 260 to 360 entries.
- Reran all evaluations and added a GPT-4o multi-agent evaluation.
Files
ChemGraph-1.0.0.zip
Additional details
Dates
- Created: 2025-06-02 (initial submission)
Software
- Repository URL
- https://github.com/argonne-lcf/ChemGraph