Published September 10, 2024 | Version v1
Dataset Open

Datasets and geometries for "MORE-Q, Dataset for molecular olfactorial receptor engineering by quantum mechanics"

Description

We introduce the MORE-Q dataset, a quantum-mechanical (QM) dataset encompassing the structural and electronic data of non-covalent molecular sensors formed by combining 18 mucin-derived olfactorial receptors with 102 body odor volatilome (BOV) molecules. To better understand their intra- and inter-molecular interactions, we performed accurate QM calculations at different stages of the sensor design. Accordingly, MORE-Q is divided into three subsets: i) MORE-Q-G1: QM data of 18 receptors and 102 BOV molecules, ii) MORE-Q-G2: QM data of 23,838 BOV-receptor configurations, and iii) MORE-Q-G3: QM data of 1,836 BOV-receptor-graphene systems. Each subset contains geometries optimized using GFN2-xTB with D4 dispersion correction and up to 39 physicochemical properties, including global and local properties as well as binding features, all computed at the tightly converged PBE+D3 level of theory. By addressing BOV-receptor-graphene systems from a QM perspective, MORE-Q can serve as a benchmark dataset for state-of-the-art machine learning methods developed to predict binding features. This, in turn, can provide valuable insights for developing next-generation mucin-derived olfactory receptor sensing devices.

The dataset is provided as three HDF5-based files. The record also includes a README file with technical usage details and examples of how to access the information stored in the dataset (see createDF.py).
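Before turning to createDF.py, it can help to see what an HDF5 file contains. The following is a minimal sketch, not the authors' access script: it assumes the third-party h5py package is installed and makes no assumption about the internal layout of the MORE-Q files. The function name list_hdf5_contents and the file name in the usage comment are hypothetical.

```python
import h5py  # third-party package; assumed to be installed


def list_hdf5_contents(path):
    """Return (name, kind, shape, dtype) for every group and dataset in an HDF5 file."""
    entries = []

    def visit(name, obj):
        # visititems calls this once per object below the root group.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", None, None))

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return entries


# Usage (hypothetical file name; substitute one of the three downloaded files):
# for name, kind, shape, dtype in list_hdf5_contents("MORE-Q-G1.hdf5"):
#     print(name, kind, shape, dtype)
```

Listing the groups and datasets first shows which keys and array shapes are actually present, which is useful before building data frames from them.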

Files

README.txt

Files (1.4 GB)

Size        MD5 checksum
6.9 kB      694cdcc163caf969e274fb8a5687ed06
118.5 MB    a0467f77f41786560f10f6132cb6d4b4
2.1 MB      88496ebeffe8dd8a4f53295e0fe60114
1.1 GB      1ed0ea82b6c00e73664a45b2dccee115
179.3 MB    d498cc7a3705ab0d40d6807409f023d4
1.7 kB      22f60b2e08e6fa27fc37f1bb6a7ed0d6
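
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The function names md5_of_file and verify_md5 and the file name in the usage comment are hypothetical; the file names themselves are not given in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches `expected` (with or without an 'md5:' prefix)."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local file name for the 1.1 GB entry):
# verify_md5("downloaded_file.hdf5", "md5:1ed0ea82b6c00e73664a45b2dccee115")
```

Reading in chunks matters here because the largest file is about 1.1 GB and need not fit in memory at once.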