Autonomous extraction and building of machine-readable molecular models from publications using large language models

Fleckenstein, Florian; Rodin, Volodymyr; Dieudonne, Isabell; Al Machot, Fadi; Chiacchiera, Silvia; Akagic, Amila; Stephan, Simon

doi:10.5281/zenodo.17178832

Published September 22, 2025 | Version v1

Preprint, open access

1. Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
2. University of Sarajevo
3. Norwegian University of Life Sciences
4. Science and Technology Facilities Council (STFC)

Force field models are one of the central elements of molecular simulations. A
large number of molecular force field models have been developed over the past
decades, mostly published in scientific papers. Building machine-readable force
field input files for simulation engines is a tedious and error-prone task. We
developed a method for autonomously extracting and building force field files from
publications using large language models (LLMs). We tested the new method by
extracting 114 force field models from 21 scientific publications. The studied
force fields comprise 6 to 74 parameters. We compared the performance of different
LLMs, namely Gemini 2.5 Pro, Claude 4 Sonnet, GPT-4o, Claude 3.7 Sonnet, and
Gemini 2.5 Flash. Overall, they yield similar performance, yet there are important
differences in individual cases. The best overall performance was obtained with
Gemini 2.5 Pro, which extracted and identified the force field parameters with an
accuracy of 89.1%. The new autonomous extraction method drastically reduces the
time required for building force field files and does not depend on the experience
of the simulator.
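
The abstract reports a parameter-level accuracy but does not define how it is computed. As a minimal sketch, assuming accuracy means the fraction of reference parameters that were extracted under the correct name with a matching numerical value, the scoring could look as follows. The function name `parameter_accuracy`, the parameter names, and all numerical values are hypothetical and not taken from the preprint.

```python
from math import isclose


def parameter_accuracy(extracted, reference, rel_tol=1e-6):
    """Fraction of reference parameters matched by name and value.

    Assumption: a parameter counts as correct only if it is present in
    `extracted` under the same name and agrees within `rel_tol`.
    """
    if not reference:
        raise ValueError("reference must contain at least one parameter")
    correct = 0
    for name, ref_value in reference.items():
        value = extracted.get(name)
        if value is not None and isclose(value, ref_value, rel_tol=rel_tol):
            correct += 1
    return correct / len(reference)


# Hypothetical two-site Lennard-Jones model; one extracted value is wrong.
reference = {"sigma_1": 3.4, "epsilon_1": 120.0, "sigma_2": 3.0, "epsilon_2": 80.0}
extracted = {"sigma_1": 3.4, "epsilon_1": 120.0, "sigma_2": 3.0, "epsilon_2": 8.0}

accuracy = parameter_accuracy(extracted, reference)
print(f"accuracy = {accuracy:.1%}")
```

Matching by name and value is only one possible convention; a real evaluation would also need to settle how units, missing parameters, and misassigned interaction sites are counted.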

Files

ExtractingForceFieldsFromPapersUsingLLMs.pdf

Files (2.4 MB)

Name	Size
ExtractingForceFieldsFromPapersUsingLLMs.pdf (md5:046487281d96e2b7b3ed87a5923f1135)	2.4 MB

	All versions	This version
Views	30	30
Downloads	38	37
Data volume	148.8 MB	146.5 MB
