Molecular geometries and energies from quantum mechanical calculations and small molecule force field evaluations.
Authors/Creators
- 1. Department of Chemistry, University of California, Irvine
- 2. Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
- 3. OpenEye Scientific, Santa Fe, NM 87507
Description
Force fields are used in a wide variety of contexts for classical molecular simulation, including studies on protein-ligand binding, membrane permeation, and thermophysical property prediction.
The quality of these studies relies on the quality of the force fields used to represent the systems.
Focusing on small molecules with fewer than 50 heavy atoms, this dataset compares nine force fields: GAFF, GAFF2, MMFF94, MMFF94S, OPLS3e, SMIRNOFF99Frosst, and the Open Force Field Parsley versions 1.0, 1.1, and 1.2.
On a dataset comprising 22,675 molecular structures of 3,271 molecules, we analyzed force field-optimized geometries and conformer energies compared to reference quantum mechanical (QM) data.
The data were generated using scripts from the benchmarkff GitHub repository.
A corresponding manuscript has been submitted; a preprint is available on ChemRxiv:
Lim, Victoria T.; Hahn, David F.; Tresadern, Gary; Bayly, Christopher I.; Mobley, David (2020): Benchmark Assessment of Molecular Geometries and Energies from Small Molecule Force Fields. ChemRxiv. Preprint
See below, or the file README.md, for further information and a description of the contents:
# README
Version: 04 Nov 2020
For Python scripts that are NOT found in these directories, please check the
[BenchmarkFF Github repo](https://github.com/MobleyLab/benchmarkff/tree/master/tools).
## Procedure
1. Prepare the OPLS3e file for analysis: standardize the format with OpenEye tools in case of differences,
and convert energies from kJ/mol to kcal/mol.
```
cd 00_prep
python convert_extension.py -i opls3e_minimized.sd -o opls3e.sdf
```
2. Remove molecules that could not be parameterized by every force field, keeping only those common to all.
```
python get_by_tag.py -i opls3e.sdf -s "SMILES QCArchive" -list trim3.txt -o trim3_full_opls3e.sdf
```
3. Run analysis.
```
conda activate parsley
# calc ddE, RMSD, and TFD distributions
python compare_ffs.py -i match.in -t 'SMILES QCArchive' --plot > metrics.out
# match_minima, only in 01_analysis_all and 02_analysis_all_smaller_cutoff
python match_minima.py -i match.in --plot --cutoff 1.0 --readpickle
# look at specific subsets, only in 01_analysis_all
python color_by_moiety.py -i match.in -p metrics.pickle -s N-N.dat azetidine.dat octahydrotetracene.dat -o scatter_tfd_3_
# look at outliers, only in 01_analysis_all and 02_analysis_all_smaller_cutoff
python tailed_parameters.py -i refdata_trim_overlap_full_openff_unconstrained-1.2.0.sdf -f <offxml file> --metric 'TFD' --cutoff 0.12 --tag "TFD to trim_overlap_full_qcarchive.sdf" --tag_smiles "SMILES QCArchive" > output_tfd.dat
```
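The unit conversion in step 1 and the `ddE` metric in step 3 can be illustrated with a minimal sketch. It assumes that ddE for a conformer is the force field energy relative to a reference conformer minus the corresponding QM relative energy, and that the reference is the lowest-energy QM conformer. The function names and example energies are hypothetical and are not taken from `compare_ffs.py`.

```python
# Minimal sketch (assumptions labeled): unit conversion and relative
# conformer energy differences (ddE) for a single molecule.

KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ


def kj_to_kcal(energy_kj_per_mol):
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj_per_mol / KJ_PER_KCAL


def relative_energies(energies, ref_index):
    """Return energies relative to the conformer at ref_index."""
    ref = energies[ref_index]
    return [e - ref for e in energies]


def dde(ff_energies, qm_energies):
    """ddE_i = (E_FF,i - E_FF,ref) - (E_QM,i - E_QM,ref).

    Assumption: the reference conformer is the lowest-energy QM conformer.
    Both lists must be in the same units and the same conformer order.
    """
    if len(ff_energies) != len(qm_energies):
        raise ValueError("FF and QM lists must have the same length")
    if not qm_energies:
        raise ValueError("at least one conformer is required")
    ref_index = min(range(len(qm_energies)), key=qm_energies.__getitem__)
    ff_rel = relative_energies(ff_energies, ref_index)
    qm_rel = relative_energies(qm_energies, ref_index)
    return [f - q for f, q in zip(ff_rel, qm_rel)]


# Hypothetical energies for three conformers of one molecule.
qm_kcal = [0.0, 1.0, 2.5]          # QM energies, already in kcal/mol
ff_kj = [41.84, 50.208, 46.024]    # force field energies in kJ/mol
ff_kcal = [kj_to_kcal(e) for e in ff_kj]
print([round(x, 3) for x in dde(ff_kcal, qm_kcal)])
```

A positive ddE means the force field places the conformer higher above the reference than QM does; a negative value means it places it lower.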
## Brief description of contents
* High level:
```
.
├── 00_prep
│ ├── convert_extension.py
│ ├── opls3e_minimized.sd OPLS3e minimized structures from Schrodinger Maestro
│ ├── opls3e.sdf standardized through OpenEye tools
│ └── opt_openff*.sdf OpenFF minimized conformations
├── 01_analysis_all compare all ffs (qm, GAFF(2), MMFF94(S), Smirnoff, OpenFF-X.X, OPLS3e)
├── 02_analysis_all_smaller_cutoff compare all ffs (qm, GAFF(2), MMFF94(S), Smirnoff, OpenFF-X.X, OPLS3e) with a smaller cutoff of 0.3 for match_minima
├── 03_analysis_latest_ffs compare only the latest versions of ffs (qm, GAFF2, MMFF94S, OpenFF-1.2, OPLS3e)
├── 04_analysis_openff_only compare only OpenFF ffs (qm, Smirnoff, OpenFF-X.X)
└── README.md
```
* Inside an output directory:
```
YY_analysis_* various output files of above mentioned scripts, some are listed and described below:
├── bar*.png parameter coverage bar plots
├── ddE.dat relative energies data
├── fig_density_*.png scatter plots of ddE vs (RMSD or TFD) for each force field
├── match.in input file for compare_ffs.py
├── metrics.out output file for compare_ffs.py
├── metrics.pickle pickle file for compare_ffs.py -- you can read this into compare_ffs instead of rerunning the full analysis
├── refdata_*.sdf output SDF files with stored RMSD / TFD scores with reference to QM for each structure
├── relene_*.dat relative energies of matched conformers
├── ridge_dde.png compared energies plot
├── ridge_rmsd.svg compared rmsds plot
├── ridge_tfd.svg compared tfds plot
├── fig_scatter_*.png scatter plots of ddE vs (RMSD or TFD). these are noisy; I don't use these
├── trim3_*.sdf input SDF files for compare_ffs.py listed in match.in file
└── violin*.* violin plot showing ddE distributions
```
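The layout of `metrics.pickle` is not documented here, so the following is only a sketch of how one might inspect it with the standard library before using it further. The helper name `summarize_pickle` is hypothetical. Loading may require the same Python environment that produced the file, and pickle files should only be loaded from trusted sources.

```python
# Minimal sketch: report the top-level structure of a pickle file
# without assuming anything about its contents.
import os
import pickle


def summarize_pickle(path):
    """Load a pickle file and return a few lines describing its structure."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)

    lines = [f"top-level type: {type(obj).__name__}"]
    if isinstance(obj, dict):
        # One line per key, with the type of the stored value.
        for key, value in obj.items():
            lines.append(f"  {key!r}: {type(value).__name__}")
    elif isinstance(obj, (list, tuple)):
        lines.append(f"  length: {len(obj)}")
        if obj:
            lines.append(f"  first element type: {type(obj[0]).__name__}")
    return lines


# Only runs when the file is present in the current directory.
if os.path.exists("metrics.pickle"):
    print("\n".join(summarize_pickle("metrics.pickle")))
```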
Files
| Size | Checksum |
|---|---|
| 893.8 MB | md5:7c2f9446743ac24aeb59ae7ec5e37276 |
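The MD5 checksum listed above can be used to verify the download. The sketch below uses Python's standard `hashlib` module and reads the file in chunks, which keeps memory use low for an archive of this size. The archive file name is not given on this page, so `ARCHIVE_PATH` is a hypothetical placeholder to be replaced with the actual downloaded file.

```python
# Minimal sketch: verify a downloaded file against the published MD5 checksum.
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_MD5 = "7c2f9446743ac24aeb59ae7ec5e37276"  # from the file listing
ARCHIVE_PATH = "downloaded_archive"  # hypothetical name; replace with your file

# Only runs when the placeholder path actually exists.
if os.path.exists(ARCHIVE_PATH):
    actual = md5_of_file(ARCHIVE_PATH)
    print("checksum OK" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
```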
Additional details
Funding
- National Institutes of Health
- Alchemical free energy methods for efficient drug lead optimization 1R01GM108889-01
- National Institutes of Health
- Advancing predictive physical modeling through focused development of model systems to drive new modeling innovations 1R01GM124270-01A1