GALP Validation Set for Binding Site Identification in Metal Organic Frameworks
Contributors
Data collector (2):
Related person:
Supervisor:
- 1. University of Ottawa
Description
This dataset contains the binding-site outputs generated with the in-house GALP code to validate the algorithm across several molecular guests. For every experimental MOF and guest in the validation set, it includes the folded adsorption density profiles, the volumetric overlap data used during fitting, and the final processed binding-site results. The guests included are C2H2.T298.P1, C3H6.T373.P1, C3H8.T373.P1, CH4.T298.P1, CH4.T298.P65, CO2.T298.P1, Kr.T298.P1, Xe.T298.P1, and N2.T298.P0.75.
These files correspond to the data used in the validation of the GALP algorithm described in the paper titled “A robust and automated tool for localizing binding sites from adsorbate probability distributions generated from molecular simulation of metal organic frameworks.” A separate overview PDF summarizes which MOFs appear in the validation set for each guest. Note that for N2.T298.P0.75, ten additional MOFs were included beyond the core set of one hundred used for the other guests.
Each guest also has a companion CSV file with information specific to each MOF–guest combination. These CSV files report the optimized fitting parameters used for that combination (Sigma, Radius, Cutoff, OV_Tol, RMSD), a boolean localization flag from manual assessment, the Tanimoto and entropy values for each adsorbate probability distribution (ADP), and general adsorption metrics such as uptake and heat of adsorption.
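The companion CSV files can be loaded with Python's standard csv module. The sketch below assumes the header row uses the parameter names listed above (Sigma, Radius, Cutoff, OV_Tol, RMSD) and that the localization flag is written as True/False. The flag column name "Localized" and all helper names are hypothetical and should be adjusted to the actual headers.

```python
import csv
from pathlib import Path

# Fitting-parameter columns named in the dataset description. The exact
# header spelling in the real CSV files is an assumption.
PARAM_COLUMNS = ("Sigma", "Radius", "Cutoff", "OV_Tol", "RMSD")


def _to_bool(text):
    """Interpret common spellings of a boolean flag (True/False, 1/0, yes/no)."""
    return str(text).strip().lower() in {"true", "1", "yes", "y", "t"}


def parse_guest_rows(lines, flag_column="Localized"):
    """Parse CSV lines into dicts, converting the fitting parameters to float.

    `flag_column` is a hypothetical name for the manual localization flag.
    All other columns (MOF name, uptake, etc.) are kept as strings.
    """
    rows = []
    for record in csv.DictReader(lines):
        for name in PARAM_COLUMNS:
            if record.get(name) not in (None, ""):
                record[name] = float(record[name])
        if record.get(flag_column) is not None:
            record[flag_column] = _to_bool(record[flag_column])
        rows.append(record)
    return rows


def load_guest_table(path, flag_column="Localized"):
    """Read one companion CSV file from disk."""
    with Path(path).open(newline="") as handle:
        return parse_guest_rows(handle, flag_column)


def localized_only(rows, flag_column="Localized"):
    """Keep only the rows whose manual-assessment flag is True."""
    return [row for row in rows if row.get(flag_column) is True]
```

Separating `parse_guest_rows` from `load_guest_table` keeps the parsing testable on in-memory lines; `localized_only` then selects the MOFs judged to show localized binding.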
Each compressed archive (C2H2.T298.P1.tar.gz, C3H6.T373.P1.tar.gz, C3H8.T373.P1.tar.gz, CH4.T298.P1.tar.gz, CO2.T298.P1.tar.gz, Kr.T298.P1.tar.gz, Xe.T298.P1.tar.gz, N2.T298.P0.75.tar.gz) contains one directory per MOF, with the following contents.
Notes on pseudoatom labels:
The label “O” refers to the methane adsorbate, which is treated as a single site. This is an internal naming choice and does not represent oxygen.
For C3H6, the labels Cx, Cy, and Cz refer to a coarse-grain model. Cx corresponds to CH2, Cy corresponds to CH, and Cz corresponds to CH3.
For C3H8, Cx corresponds to CH2 and Cy corresponds to the two CH3 groups.
The labels Xex and Krx refer to xenon and krypton, respectively.
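When the cube or CIF files are processed programmatically, the label notes above can be encoded as a small lookup table. In the sketch below the guest is taken from the prefix of the directory name. The table and function names are hypothetical, and labels not covered by the notes (for example the atom names used for C2H2, CO2, or N2) are returned unchanged.

```python
# Pseudoatom labels documented for this dataset, keyed by the guest prefix
# of the directory name (e.g. "C3H6" from "C3H6.T373.P1").
PSEUDOATOM_LABELS = {
    "CH4": {"O": "CH4 (single-site methane, not oxygen)"},
    "C3H6": {"Cx": "CH2", "Cy": "CH", "Cz": "CH3"},
    "C3H8": {"Cx": "CH2", "Cy": "CH3 (both terminal groups)"},
    "Xe": {"Xex": "Xe"},
    "Kr": {"Krx": "Kr"},
}


def describe_label(guest_dir, label):
    """Return the chemical meaning of a pseudoatom label for a guest directory.

    Labels that the notes do not cover are returned unchanged.
    """
    guest = guest_dir.split(".", 1)[0]
    return PSEUDOATOM_LABELS.get(guest, {}).get(label, label)
```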
Files
- FIELD -- A file in DLPOLY FIELD format defining the force field and interaction parameters used for guest–host binding energy calculations.
- CONTROL -- DLPOLY CONTROL file defining the GCMC simulation. It is not required by GALP but is parsed when present, which reduces the number of parameters needed in GALA.inp.
- GALA.inp -- Input file specifying the parameters used by the GALP code for binding-site extraction.
- gala.log -- Log file recording all operations and diagnostics from the GALA run.
- Tanimoto.txt -- Text file containing the Tanimoto values for each of the ADPs in the system.
- Entropy.txt -- Text file containing the entropy values for each of the ADPs in the system.
- gala.err -- Error file recording any warnings or errors that would otherwise be printed to the command line.
- Prob_Guest_<guest>_Site_<site>_folded.cube -- Raw folded adsorbate probability distributions for each site in the guest molecule.
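The folded cube files can be inspected without running GALP. The sketch below assumes they follow the standard Gaussian cube layout (two comment lines, an origin line, three voxel-axis lines, one line per atom, then the values with the last axis varying fastest) and uses NumPy. The helper names are hypothetical, and the entropy and Tanimoto functions use textbook definitions (Shannon entropy in nats of the normalized grid, and the continuous Tanimoto coefficient between two grids). They are not guaranteed to reproduce the values in Entropy.txt and Tanimoto.txt, whose exact definitions in GALP may differ.

```python
import numpy as np


def parse_cube(text):
    """Parse Gaussian cube text into (origin, axes, grid).

    origin: (3,) array; axes: (3, 3) array of voxel step vectors, one row per
    grid axis; grid: (n1, n2, n3) array with the last index varying fastest.
    """
    lines = text.splitlines()
    header = lines[2].split()
    natoms = int(header[0])
    origin = np.array(header[1:4], dtype=float)
    shape, axes = [], []
    for line in lines[3:6]:
        parts = line.split()
        shape.append(abs(int(parts[0])))  # a negative count flags Angstrom units
        axes.append([float(x) for x in parts[1:4]])
    # Skip the atom block; a negative atom count adds one extra header line.
    start = 6 + abs(natoms) + (1 if natoms < 0 else 0)
    values = np.array(" ".join(lines[start:]).split(), dtype=float)
    return origin, np.array(axes), values.reshape(shape)


def read_cube(path):
    """Read a cube file from disk."""
    with open(path) as handle:
        return parse_cube(handle.read())


def shannon_entropy(grid):
    """Shannon entropy (nats) of the grid after normalizing it to unit sum."""
    p = np.clip(grid, 0.0, None).ravel()
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def tanimoto(grid_a, grid_b):
    """Continuous Tanimoto coefficient between two grids of equal shape."""
    a = np.ravel(grid_a)
    b = np.ravel(grid_b)
    dot = float(a @ b)
    return dot / (float(a @ a) + float(b @ b) - dot)
```

For two site ADPs of the same MOF, `tanimoto(grid_a, grid_b)` is 1 for identical grids and 0 for grids with no overlap; it is undefined if both grids are entirely zero.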
Subdirectories
- DL_poly_BS/ -- Directory generated by GALA to compute guest–host interaction energies for each binding site. Each subdirectory within DL_poly_BS contains DLPOLY input/output files for the framework and for each binding-site configuration.
  - The <framework>_<guest> subdirectory contains the framework with a randomly placed guest configuration, used to calculate the framework’s electrostatic reference energy. This energy is subtracted from each binding-site configuration energy in GALA. In this reference configuration, all guest charges are set to zero. (See the GALP documentation for more details on the energy decomposition and binding energy calculation procedure.)
- GALA_Output/ -- Directory generated by GALA containing the processed binding-site information.
  - <guest>_binding_sites.cif -- Binding sites in CIF format, listed in order of decreasing occupancy.
  - <guest>_binding_sites_fractional.xyz -- Fractional coordinates of all binding sites within the MOF unit cell.
  - *.vesta -- Visualization files that combine the fitted binding sites with the corresponding volumetric ADPs.
  - <guest>_gala_binding_sites.xyz -- Cartesian coordinates of each binding site, ordered by decreasing occupancy. This file also reports:
    - Total binding energy (Ebind)
    - Electrostatic contribution percentage (esp%)
    - van der Waals energy (Evdw)
    - Electrostatic energy (Eesp)
    - Relative occupancy in percent (occ)
  - <guest>_guest_information.xyz -- Contains guest information and binding sites listed in order of increasing occupancy, including absolute occupancy values and both fractional and Cartesian coordinates.
  - <guest>_gala_local_maxima.cif -- Raw local maxima extracted from the probability map prior to any pruning or molecular fitting.
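The local-maxima file records peaks taken from the probability map before pruning. As a rough illustration of what such a step involves, the sketch below finds strict local maxima on a 3D grid with periodic wrap-around using NumPy. It is a generic technique chosen for illustration, not GALP's own implementation: the function name `periodic_local_maxima` and its `threshold` parameter are hypothetical, and GALP may smooth the grid, use a different neighborhood, or apply other acceptance criteria.

```python
import numpy as np


def periodic_local_maxima(grid, threshold=0.0):
    """Return (index, value) pairs for voxels that are strict local maxima.

    Each voxel is compared with its 26 neighbors, wrapping around the box
    edges because a folded ADP describes a periodic unit cell. Voxels at or
    below `threshold` are ignored, and flat plateaus yield no maxima.
    Results are sorted by decreasing value.
    """
    is_max = grid > threshold
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if dx == dy == dz == 0:
                    continue
                # np.roll shifts the grid so each voxel lines up with one neighbor.
                neighbor = np.roll(grid, shift=(dx, dy, dz), axis=(0, 1, 2))
                is_max &= grid > neighbor
    peaks = [
        (tuple(int(i) for i in idx), float(grid[tuple(idx)]))
        for idx in np.argwhere(is_max)
    ]
    return sorted(peaks, key=lambda item: item[1], reverse=True)
```

Because of the wrap-around, a voxel on one face of the grid competes with its periodic images on the opposite face, so a peak split across a cell boundary is reported once rather than twice.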
Directory Naming Convention
Each guest directory follows the format:
<Guest>.T<Temperature in K>.P<Pressure in bar>
For example, N2.T298.P0.75 denotes N2 at 298 K and 0.75 bar. This naming scheme identifies the exact simulation conditions associated with the files.
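Names such as N2.T298.P0.75 carry a T prefix on the temperature and a P prefix on the pressure, and the pressure itself may contain a decimal point, so a plain split on "." is not reliable. The sketch below parses the convention with a regular expression; `parse_condition_name` and `Conditions` are hypothetical helpers, not part of GALP. It also strips a trailing .tar.gz so archive names can be passed directly.

```python
import re
from typing import NamedTuple

# <Guest>.T<temperature in K>.P<pressure in bar>, e.g. "N2.T298.P0.75".
# The pressure may itself contain a decimal point, so str.split(".") is unsafe.
_NAME_RE = re.compile(
    r"^(?P<guest>[^.]+)\.T(?P<temp>\d+(?:\.\d+)?)\.P(?P<pressure>\d+(?:\.\d+)?)$"
)


class Conditions(NamedTuple):
    guest: str
    temperature_k: float
    pressure_bar: float


def parse_condition_name(name):
    """Split a guest directory or archive name into its simulation conditions."""
    if name.endswith(".tar.gz"):
        name = name[: -len(".tar.gz")]
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized condition name: {name!r}")
    return Conditions(match["guest"], float(match["temp"]), float(match["pressure"]))
```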
Additional details
Software
- Repository URL: https://github.com/uowoolab/GALA
- Programming language: Python
- Development status: Active