Published January 27, 2026 | Version v2.0.3
Dataset Open

KinFragLib: Combinatorial library

Description

KinFragLib: Exploring the Kinase Inhibitor Space Using Subpocket-Focused Fragmentation and Recombination.

Project description.

Protein kinases play a crucial role in many cell signaling processes, making them one of the most important families of drug targets. In this context, fragment-based drug design strategies have been successfully applied to develop novel kinase inhibitors, usually following a knowledge-driven approach to optimize a focused set of fragments to a potent kinase inhibitor.

Alternatively, KinFragLib is a data-driven method for exploring and extending the chemical space of kinase inhibitors through fragmentation and recombination. It builds on the structural kinome data available in the KLIFS database, covering over 3,200 kinase DFG-in complexes. The computational fragmentation method splits the co-crystallized non-covalent kinase inhibitors into fragments according to their 3D proximity to six predefined, functionally relevant subpocket centers. The resulting fragment library consists of six subpocket pools with over 9,000 fragments, available at https://github.com/volkamerlab/KinFragLib.
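The proximity-based assignment described above can be illustrated with a minimal sketch. It is not the KinFragLib implementation: the center coordinates are made up, and representing a fragment by the centroid of its atom coordinates is an assumption made only for this example. The subpocket labels (AP, FP, SE, GA, B1, B2) follow the naming used in the KinFragLib publication.

```python
import math

# Hypothetical subpocket centers (x, y, z in angstroms). Real centers depend
# on the kinase structure and are not given in this dataset description.
SUBPOCKET_CENTERS = {
    "AP": (0.0, 0.0, 0.0),
    "FP": (6.0, 0.0, 0.0),
    "SE": (0.0, 6.0, 0.0),
    "GA": (0.0, 0.0, 6.0),
    "B1": (-6.0, 0.0, 0.0),
    "B2": (0.0, -6.0, 0.0),
}


def centroid(coords):
    """Return the arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def nearest_subpocket(fragment_coords, centers=SUBPOCKET_CENTERS):
    """Assign a fragment to the subpocket whose center is closest to its centroid.

    Using the fragment centroid is an assumption of this sketch.
    """
    point = centroid(fragment_coords)
    return min(centers, key=lambda name: math.dist(point, centers[name]))


# Made-up atom coordinates for one fragment.
fragment = [(5.0, 0.5, 0.0), (6.5, -0.5, 0.2), (7.0, 0.0, -0.2)]
print(nearest_subpocket(fragment))
```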

KinFragLib offers two main applications: (i) in-depth analyses of the chemical space of known kinase inhibitors, subpocket characteristics and connections, and (ii) subpocket-informed recombination of fragments to generate potential novel inhibitors. For the latter, recombining a subset of only 722 representative fragments generated a combinatorial library of 11.3 million molecules. Besides some known kinase inhibitors, this library contains more than 99% novel chemical matter compared to ChEMBL, and 56% of its molecules comply with Lipinski's rule of five.
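For readers unfamiliar with Lipinski's rule of five, the sketch below checks the four standard criteria on precomputed descriptor values. The `Descriptors` class, the function names, and the default of tolerating one violation are assumptions of this example; the KinFragLib notebooks define the exact rule applied to the library.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # number of hydrogen bond donors
    h_acceptors: int   # number of hydrogen bond acceptors


def ro5_criteria(d):
    """Evaluate each Lipinski criterion separately."""
    return {
        "mol_weight": d.mol_weight <= 500,
        "logp": d.logp <= 5,
        "h_donors": d.h_donors <= 5,
        "h_acceptors": d.h_acceptors <= 10,
    }


def fulfills_ro5(d, max_violations=1):
    """Return True if at most `max_violations` criteria are violated.

    The tolerance of one violation is an assumption of this sketch.
    """
    violations = sum(not ok for ok in ro5_criteria(d).values())
    return violations <= max_violations


# Made-up descriptor values for illustration.
ligand = Descriptors(mol_weight=350.4, logp=3.2, h_donors=2, h_acceptors=5)
print(ro5_criteria(ligand), fulfills_ro5(ligand))
```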

Combinatorial library dataset.

The dataset offered here is part of the KinFragLib GitHub repository (https://github.com/volkamerlab/KinFragLib) and contains the metadata and properties of the KinFragLib combinatorial library.

1. Raw data

  • combinatorial_library.json: Full combinatorial library. For detailed information about this data format, please refer to notebooks/4_1_combinatorial_library_data_preparation.ipynb at https://github.com/volkamerlab/KinFragLib.
  • combinatorial_library_deduplicated.json: Deduplicated combinatorial library (based on InChIs).
  • chembl_standardized_inchi.csv: Standardized ChEMBL 36 molecules in the form of InChI strings.
  • klifs_download_summary.csv: PDB codes of all KLIFS structures used to generate the KinFragLib fragmentation library. 
  • combinatorial_library_custom_sampled.sdf: Combinatorial library created from a sampled subset of CustomKinFragLib fragments. 
  • combinatorial_library_rejected_sampled.sdf: Combinatorial library created from a sampled subset of fragments rejected by the CustomKinFragLib filtering pipeline. 
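InChI-based deduplication, as used for combinatorial_library_deduplicated.json, can be sketched as follows. The record layout (a list of dictionaries with `id` and `inchi` keys) is hypothetical; the actual JSON format is documented in the data preparation notebook.

```python
def deduplicate_by_inchi(records, key="inchi"):
    """Keep the first record for each distinct InChI string, preserving order."""
    seen = set()
    unique = []
    for record in records:
        inchi = record[key]
        if inchi not in seen:
            seen.add(inchi)
            unique.append(record)
    return unique


# Hypothetical records; the field names are assumptions of this sketch.
records = [
    {"id": 0, "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},  # ethanol
    {"id": 1, "inchi": "InChI=1S/CH4/h1H4"},                   # methane
    {"id": 2, "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},  # ethanol again
]
print([r["id"] for r in deduplicate_by_inchi(records)])
```

Keeping the first occurrence is a design choice of this example; any deterministic tie-break would serve the same purpose.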

2. Processed data

Data extracted from combinatorial_library_deduplicated.json. The extraction is performed in notebooks/4_1_combinatorial_library_data_preparation.ipynb at https://github.com/volkamerlab/KinFragLib.

  • n_atoms.csv: Number of atoms for each recombined ligand.
  • ro5.csv: Number of ligands that fulfill Lipinski's rule of five (Ro5) and its individual criteria; number of ligands in total.
  • subpockets.csv: Number of ligands per subpocket combination.
  • original_exact.json: Ligands with exact matches in original ligands, i.e. KLIFS ligands that were used for the fragmentation.
  • original_substructure.json: Ligands with substructure matches in original ligands, i.e. KLIFS ligands that were used for the fragmentation.
  • chembl_exact.json: Ligands with exact matches in ChEMBL.
  • chembl_most_similar.json: Most similar ligand in ChEMBL for each recombined ligand.
  • chembl_highly_similar.json: Most similar ligand in ChEMBL for each recombined ligand, restricted to pairs with a similarity greater than 0.9.
  • custom_enamine_search_sampled.csv: Most similar molecule from Enamine REAL Space for each molecule in the CustomKinFragLib combinatorial library. 
  • reference_enamine_search_sampled.csv: Most similar molecule from Enamine REAL Space for each molecule in the rejected fragments combinatorial library. 
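The relationship between chembl_most_similar.json and chembl_highly_similar.json amounts to a threshold filter, sketched below. The mapping layout, the key names, and the placeholder identifiers are hypothetical and do not reflect the actual file format.

```python
def filter_highly_similar(most_similar, threshold=0.9):
    """Keep only ligands whose best ChEMBL hit has similarity above the threshold."""
    return {
        ligand_id: hit
        for ligand_id, hit in most_similar.items()
        if hit["similarity"] > threshold
    }


# Hypothetical input: recombined ligand ID -> its most similar ChEMBL ligand.
most_similar = {
    "ligand_1": {"chembl_id": "CHEMBL_A", "similarity": 0.95},
    "ligand_2": {"chembl_id": "CHEMBL_B", "similarity": 0.90},
    "ligand_3": {"chembl_id": "CHEMBL_C", "similarity": 0.42},
}
print(sorted(filter_highly_similar(most_similar)))
```

The comparison is strict, matching the "greater than 0.9" wording, so a pair with a similarity of exactly 0.9 is excluded.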

Usage.

This dataset can be used to run the notebooks available on https://github.com/volkamerlab/KinFragLib.

  1. Clone the KinFragLib repository.
  2. Download the tar.bz2 file provided here.
  3. Extract the archive contents into the combinatorial library folder of your local KinFragLib clone, then run the notebooks:
tar -xvf combinatorial_library.tar.bz2 -C /path_to_kinfraglib/data/combinatorial_library/
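As an alternative to the tar command, the download can be checked and unpacked from Python using only the standard library. This is a sketch under the assumption that the MD5 checksum listed under Files refers to the tar.bz2 archive; the function names are illustrative and not part of KinFragLib.

```python
import hashlib
import tarfile
from pathlib import Path

# Checksum as listed under Files; assumed here to belong to the tar.bz2 archive.
EXPECTED_MD5 = "80d2962526583efcfc6c06b3eebe0f7c"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, target_dir, expected_md5=EXPECTED_MD5):
    """Extract a bzip2-compressed tar archive after checking its MD5 digest."""
    actual = md5_of_file(archive)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(target_dir)


# Example call, using the placeholder path from the tar command above:
# verify_and_extract(
#     "combinatorial_library.tar.bz2",
#     "/path_to_kinfraglib/data/combinatorial_library/",
# )
```

Reading the file in chunks keeps memory use low for the 1.9 GB archive.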

Citation.

This dataset is part of the KinFragLib publications:

Sydow, D., Schmiel, P., Mortier, J., and Volkamer, A. KinFragLib: Exploring the Kinase Inhibitor Space Using Subpocket-Focused Fragmentation and Recombination. J. Chem. Inf. Model. 2020. https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00839

Kramer, P. L., Buchthal, K., Sydow, D., Leo, K. S., Volkamer, A. CustomKinFragLib: Filtering the Kinase-Focused Fragmentation Library. ChemRxiv preprint. 2025. https://doi.org/10.26434/chemrxiv-2025-3gz92-v3 

Files

Total size: 1.9 GB
md5:80d2962526583efcfc6c06b3eebe0f7c