Autonomous extraction and building of machine-readable molecular models from publications using large language models
Authors/Creators
Description
Force field models are one of the central elements of molecular simulations. A
large number of molecular force field models have been developed in the past decades,
mostly published in scientific papers. Building machine-readable force field input
files for simulation engines is a tedious and error-prone task. We developed a method
for autonomously extracting and building force field files from publications using large
language models (LLMs). We tested the new method by extracting 114 force
field models from 21 scientific publications. The studied force fields comprise 6 to 74
parameters. We compared the performance of different LLMs, namely Gemini
2.5 Pro, Claude 4 Sonnet, GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Flash. Overall,
they yield similar performance, yet show important differences in individual cases. The
overall best performance was obtained by Gemini 2.5 Pro, which extracted and
identified the force field parameters with an accuracy of 89.1%. The new autonomous
extraction method drastically reduces the time
required for building force field files and does not depend on the experience of the
simulator.
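The description reports a parameter-level accuracy but does not say how it is computed. As a minimal sketch of one plausible reading, the snippet below scores an extracted parameter set against a hand-checked reference as the fraction of reference parameters recovered with a matching value. The function name `parameter_accuracy`, the tolerance, and the example parameter names and values are all assumptions made for illustration; they are not taken from the publication.

```python
from math import isclose


def parameter_accuracy(extracted, reference, rel_tol=1e-6):
    """Return the fraction of reference parameters recovered with a matching value.

    Assumed metric (not confirmed by the source): a parameter counts as correct
    only if it is present in `extracted` and numerically equal to the reference
    value within `rel_tol`. Missing parameters count as errors.
    """
    if not reference:
        raise ValueError("reference must contain at least one parameter")
    correct = 0
    for name, ref_value in reference.items():
        value = extracted.get(name)
        if value is not None and isclose(value, ref_value, rel_tol=rel_tol, abs_tol=1e-12):
            correct += 1
    return correct / len(reference)


# Hypothetical parameter sets, for illustration only.
reference = {"sigma": 3.4, "epsilon": 0.238, "charge": -0.5, "bond_length": 1.2}
extracted = {"sigma": 3.4, "epsilon": 0.238, "charge": 0.5}  # wrong sign, one missing

print(parameter_accuracy(extracted, reference))
```

Counting a missing parameter as an error keeps the score from rewarding incomplete extractions. A stricter or looser definition (for example, per-model rather than per-parameter scoring) would give different numbers.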
Files
| Name | Size | Checksum |
|---|---|---|
| ExtractingForceFieldsFromPapersUsingLLMs.pdf | 2.4 MB | md5:046487281d96e2b7b3ed87a5923f1135 |