PURPOSE OF THE DATASET:
The Pose Selector (PS) workflow computed absolute binding free energy (ABFE) estimates for a large-scale dataset of docking poses. These estimates are intended as training data for a machine-learning (ML) model that predicts the ABFE of a ligand docking pose in a protein binding pocket from the protein structure and the atom positions of the ligand.
Starting from the docking poses, you can reproduce the MD simulations (eight replicas of 100 ps per pose) that were carried out within LIGATE for >800,000 docking poses of 4022 protein-ligand complexes and for the experimental poses of 4549 protein-ligand complexes, as well as the subsequent free-energy calculations with the Generalised Born model of implicit solvation. Because the system setup, the MD simulations, the post-processing and the final free-energy calculations were all automated in the scripts of the PS workflow, re-running these steps on the docking poses shared here should yield the ABFE estimates in kcal/mol listed in absoluteBindingFreeEnergyEstimates.tar.gz.
Moreover, the structure files provided in structureFiles_dockingPoses1.tar.gz, structureFiles_dockingPoses2.tar.gz and structureFiles_experimentalStructures.tar.gz together with the ABFE estimates in absoluteBindingFreeEnergyEstimates.tar.gz can be used without further modification to train an ML model predicting the ABFE of docking poses of protein-ligand complexes.

ACCESS TO THE UNDERLYING FREE-ENERGY WORKFLOW:
The scripts implementing the PS workflow are available at https://github.com/LigateProject/Pose-Selector-workflow.

DESCRIPTION OF THE DATA FORMATS:
1) The docking poses obtained with LiGen for the entire PDBbind 2020 dataset (dockingPosesPDBBind2020.tar.gz) are grouped into folders, one per protein-ligand complex, each labelled by a number and the PDB ID of the complex. Each folder contains four files named <folder name>_ligand_cleaned.mol, <folder name>_protein.pdb, ligen_poses.mol2 and ligen_scores.txt. <folder name>_protein.pdb contains the protein structure, <folder name>_ligand_cleaned.mol provides the experimental binding pose of the ligand, ligen_poses.mol2 provides all docking poses (usually 256; for a few complexes only 16 or 1), and ligen_scores.txt provides the docking scores of the poses (the higher the score, the better the pose).
2) The ABFE estimates in absoluteBindingFreeEnergyEstimates.tar.gz are grouped into two folders: one for ABFE estimates obtained for docking poses and one for ABFE estimates obtained for the experimental poses observed in the crystal structures of the protein-ligand complexes. Each folder contains a collection of CSV files. Each CSV file is dedicated to one protein-ligand complex, i.e. it contains either the ABFE estimates calculated for all docking poses of that complex or the ABFE estimates calculated for its experimental pose. For each pose, eight replicate 100 ps simulations were performed, so the CSV files contain eight ABFE estimates per pose. The columns of the CSV files contain a pose identifier, the replica number, the ABFE component, the average ABFE in the trajectory, the corrected standard deviation (Prop.), the standard deviation, the corrected standard error of the mean (Prop.) and the standard error of the mean. Because the PS workflow analysed only the final structure of each 100 ps simulation with the Generalised Born model of implicit solvation, each average is taken over a single frame and all error estimates provided by gmx_MMPBSA are zero. A meaningful error estimate can instead be obtained by calculating the standard deviation or the standard error of the mean of the eight ABFE values computed for each pose. All ABFE estimates are provided in kcal/mol.
3) The structure files to be employed as input data to train a machine-learning model (structureFiles_dockingPoses1.tar.gz, structureFiles_dockingPoses2.tar.gz, structureFiles_experimentalStructures.tar.gz) contain the atom positions of the protein-ligand complex that were used as the initial coordinates in the 100 ps MD simulations. The protein structure is provided as a PDB file, and the ligand coordinates are provided as a MOL2 file.
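
EXAMPLE (ILLUSTRATIVE SKETCH, NOT PART OF THE PS WORKFLOW):
The file naming described in item 1 and the per-pose error estimate suggested in item 2 could be handled along the following lines. This is a minimal sketch under stated assumptions: the function names expected_files and replicate_statistics are hypothetical, the folder name passed in is a placeholder, and the eight ABFE values are assumed to have already been read from the CSV file of one complex (the exact CSV column headers are not specified here, so no parsing is shown). The standard deviation is computed as the sample standard deviation (n-1 denominator), which is one reasonable choice for eight replicas.

```python
import math
import statistics


def expected_files(folder_name):
    """Return the four file names that item 1 lists for one complex folder.

    folder_name is the name of the folder inside dockingPosesPDBBind2020.tar.gz
    (a number plus the PDB ID); its exact format is not assumed here.
    """
    return [
        f"{folder_name}_ligand_cleaned.mol",  # experimental ligand pose
        f"{folder_name}_protein.pdb",         # protein structure
        "ligen_poses.mol2",                   # all docking poses
        "ligen_scores.txt",                   # docking scores (higher is better)
    ]


def replicate_statistics(abfe_values):
    """Summarise the replicate ABFE estimates (kcal/mol) of a single pose.

    Returns (mean, sample standard deviation, standard error of the mean).
    At least two values are required for the standard deviation.
    """
    values = list(abfe_values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample SD, n-1 denominator
    sem = sd / math.sqrt(len(values))    # standard error of the mean
    return mean, sd, sem


if __name__ == "__main__":
    # Made-up numbers for illustration only; not taken from the dataset.
    example_replicas = [-30.0, -32.0, -28.0, -31.0, -29.0, -33.0, -27.0, -30.0]
    mean, sd, sem = replicate_statistics(example_replicas)
    print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f} kcal/mol")
```

Whether the mean of the eight replicas or some other summary is the better training label is a modelling decision that this sketch does not settle.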