Published January 30, 2026 | Version v1
Dataset Open

QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.3.0

Authors/Creators

  • 1. Open Molecular Software Foundation

Contributors

Data curator:

  • 1. Open Molecular Software Foundation
  • 2. Open Force Field

Description

This record contains quantum chemical (QC) optimization and torsiondrive datasets generated at the OpenFF default level of theory, B3LYP-D3BJ/DZVP, and curated to train parameters in OpenFF 2.3.0 Sage with the NAGL partial charge model AshGC v1.0.
 
The records compiled in this dataset are a subset sourced from the QCArchive datasets listed in the Technical info section. There is overlap with the datasets used to train previous versions of Sage (see References).
 
Additional details can be found in the GitHub dataset repository and the Force field repository.
 
General Information
* Date: 2026-01-27
* Name: OpenFF SMIRNOFF Sage 2.3.0
* Dataset submitter: Jennifer A Clark
* Dataset curator: Lily Wang
 
* Class: OpenFF Optimization Dataset
* Dataset Type: optimization
* Purpose: B3LYP-D3BJ/DZVP conformers for training OpenFF 2.3.0 Sage with AshGC v1.0 NAGL partial charge model.
* Number of unique molecules: 4696
* Number of conformers: 4696
* Number of conformers (min, mean, max): 1.00, 1.00, 1.00
* Molecular weight (min, mean, max): 32.05, 207.67, 878.25
* Charges: -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0
 
* Class: OpenFF TorsionDrive Dataset
* Dataset Type: torsiondrive
* Purpose: B3LYP-D3BJ/DZVP conformers for training torsion drives for OpenFF 2.3.0 Sage with AshGC v1.0 NAGL partial charge model.
* Number of unique molecules: 1265
* Number of driven torsions: 1371
* Number of conformers: 1265
* Number of conformers (min, mean, max): 1, 1, 1
* Molecular weight (min, mean, max): 32.05, 165.33, 511.27
* Charges: -3.0, -2.0, -1.0, 0.0, 1.0, 2.0

Notes

Manifest
  • OpenFF-SMIRNOFF-Sage-2.3.0_optimization_view.sqlite : QCFractal view of optimization dataset 469
  • OpenFF-SMIRNOFF-Sage-2.3.0_torsiondrive_view.sqlite : QCFractal view of torsiondrive dataset 470
  • docker_handle_dataset_views.tar.gz : Compressed docker image providing a notebook ready to view and handle the data
  • README.txt : Instructions on how to launch the docker image and access the data

Instructions for Dataset Handling

This data is stored as an sqlite file that is directly readable with QCFractal as a dataset view. Because sqlite is a general format, the data can also be accessed with any sqlite tool, in which case the compressed contents are Python pickled JSON strings. To make viewing and handling the data straightforward, a docker image with a jupyter notebook entry point has been provided.

Load the docker image with:

$ docker load -i docker_handle_dataset_views.tar.gz

Then run the docker image to start the jupyter notebook:

$ mkdir views; mv *sqlite views/
$ mkdir outputs
$ docker run -p 8888:8888 -v ./views:/workspace/views -v ./outputs:/workspace/outputs docker_handle_dataset_views

The -p flag maps port 8888 inside the docker container to the same port number on the host.
The -v flag exposes a host directory (in this case ./views, so put your dataset views there) to a directory inside the container so that the jupyter notebook can access them.
The ./outputs directory provides another shared directory that can be useful for passing output files.
If using an Apple silicon (M-series) Mac, the flag --platform=linux/amd64 may be required.

Entering the URL that starts with http://127.0.0.1:8888... in a web browser should open a jupyterlab instance.
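
For readers who want a quick look at a view file without the docker image, the following is a minimal sketch that uses only Python's standard sqlite3 module. It lists the tables in an sqlite file together with their row counts. The function name summarize_sqlite is hypothetical, and no assumption is made about the table names or schema that QCFractal uses inside a view; decoding the compressed record contents is not attempted here.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(path):
    """Return {table_name: row_count} for every table in an sqlite file.

    Hypothetical helper for illustration; it inspects the generic sqlite
    catalog only and knows nothing about the QCFractal view layout.
    """
    # Open the file read-only so the dataset view is never modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # Quote each table name, since it is interpolated into the query.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling summarize_sqlite("views/OpenFF-SMIRNOFF-Sage-2.3.0_optimization_view.sqlite") after moving the files as described above should return a dictionary of table names and row counts, which is a reasonable first check that the download is intact and readable.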

Technical info

Optimization Datasets

The optimization records were selected to maximize chemical diversity using a selection of record IDs listed in the Sage 2.3.0 repository. These records came from the following datasets:

 
Torsiondrive Datasets

The torsiondrive records were selected to maximize chemical diversity using a selection of record IDs listed in the Sage 2.3.0 repository. These records came from the following datasets:

Files (17.4 GB)

  • md5:c004eeb3d8d46751da3ce309024c9675 : 2.6 GB
  • md5:e4ebe5f641f589c69aeaf299b236388c : 40.5 MB
  • md5:f566dcc834c7d47f87fb959403b88f84 : 14.7 GB
  • md5:79c6d3b1aaa0a3fb92ea8be9c851ef0e : 1.4 kB
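
Because the largest files are several gigabytes, it can be worth checking a download against the md5 values listed above before loading it. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_checksum are hypothetical, and the file is read in chunks so that it never has to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_checksum(downloaded_file, "md5:c004eeb3d8d46751da3ce309024c9675") returns True only if downloaded_file (a placeholder for the local path) has that digest.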

Additional details

References

  • Boothroyd, S., Maat, J., Jang, H., Stern, C. D., Behara, P. K., Qiu, Y., Tjanaka, B., Madin, O., Hahn, D., Gapsys, V., Horton, J. T., Dotson, D., Gokey, T., Clark, J., Mitchell, J., Wang, L., Shirts, M., Cole, D., Chodera, J., & Mobley, D. (2025). QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.2.0 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15635099
  • Boothroyd, S., Maat, J., Jang, H., Stern, C. D., Behara, P. K., Qiu, Y., Tjanaka, B., Madin, O., Hahn, D., Gapsys, V., Horton, J. T., Dotson, D., Gokey, T., Clark, J., Mitchell, J., Wang, L., Shirts, M., Cole, D., Chodera, J., & Mobley, D. (2025). QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.1.0 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15633037
  • Boothroyd, S., Maat, J., Jang, H., Behara, P. K., Madin, O., Hahn, D., Gapsys, V., Horton, J. T., Dotson, D., Gokey, T., Clark, J., Mitchell, J., Wang, L., Shirts, M., Cole, D., & Mobley, D. (2025). QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.0.0 [Data set]. In Journal of Chemical Theory and Computation (Vol. 19, pp. 3251–3275). Zenodo. https://doi.org/10.5281/zenodo.15611784