Published March 12, 2026 | Version v3
Dataset | Open access

SAFR: Structurally Augmented Fragment Repository

  • 1. Chemotargets, S.L.
  • 2. Universitat de Girona

Description


SAFR_v1.0 is a public dataset of high-confidence protein-fragment pairs with predicted 3D structures, where the fragments are derived from known active compounds for each protein target. SAFR contains 231,901 unique pairs and a total of 818,385 3D structures, curated from the ChEMBL and BindingDB databases and placed in their binding sites by template docking.

The repository has the following structure:

SAFR_v1.0/
├── A0A1L8F5J9/          <-- Folder named by UniProt accession number
│   ├── 3qel.sdf         <-- File named by PDB code (reference frame);
│   │                        contains multiple fragment molecules aligned to 3qel
│   ├── 3qem.sdf
│   └── 5ewj.sdf
│
├── P09601/
│   ├── 3k4f.sdf
│   └── ...
│
└── [UniProt_ID]/
    └── [PDB_ID].sdf
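Because the layout has exactly two levels (a target folder, then one .sdf file per reference structure), the repository can be enumerated with a plain directory walk. The sketch below uses only the Python standard library; the function name `iter_sdf_files` is hypothetical, and it assumes the archive has been extracted so that each UniProt folder sits directly under the root directory passed in.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_sdf_files(root: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (uniprot_accession, pdb_code, sdf_path) for every .sdf file
    stored directly inside a target folder under `root`."""
    root_path = Path(root)
    # Each sub-directory of the root is one UniProt target; sort for a stable order.
    for target_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        # Each file stem is the PDB code of the reference frame, e.g. "3qel".
        for sdf_path in sorted(target_dir.glob("*.sdf")):
            yield target_dir.name, sdf_path.stem, sdf_path
```

For example, `for uniprot, pdb, path in iter_sdf_files("SAFR_v1.0"): ...` visits every reference frame of every target, where "SAFR_v1.0" is an assumed local extraction path. Loose files at the root level are skipped, since only directories are treated as targets.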
    
In each .sdf file, every 3D fragment pose carries the following properties:

Molecule Name [0-818,385]: Custom sequential ID, unique for each fragment pose.

InchiKey: InChIKey of the fragment.

confidence_score [0.6-1]: Confidence score of the predicted pose, calculated with our custom scoring function. All fragments in the repository have already been filtered to this range, so their poses are considered high-confidence.

exp_activity [>5]: Experimental pActivity of the parent ligand (the ligand from which the fragment is derived) against this protein target.

ligand_inchikey: InChIKey of the parent ligand.

pdb: PDB code of the protein structure that serves as the spatial frame of reference (matches the name of the .sdf file).

uniprot: UniProt accession number of the protein target.

BindingDB_Compound_id: Ligand ID in BindingDB, if available.

CHEMBL_Compound_id: Ligand ID in ChEMBL, if available.
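These properties are assumed to be stored as standard SD data items (a `> <tag>` line followed by the value and a blank line). A minimal standard-library reader is sketched below; where RDKit is available, `Chem.SDMolSupplier` is the more usual tool, and keeping the title line under `_Name` mirrors RDKit's convention. The helper names, the stricter 0.8 cut-off in `select_poses`, and the reading of a "unique pair" as a distinct (`uniprot`, `InchiKey`) combination are illustrative assumptions, not the dataset's own tooling. `pactivity_to_molar` assumes the usual definition pActivity = -log10(activity in mol/L).

```python
import re
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Matches SD data headers such as "> <confidence_score>" or ">  <pdb>  (12)".
_HEADER = re.compile(r"^>.*?<([^>]+)>")


def _parse_record(lines: List[str]) -> Dict[str, str]:
    """Turn the lines of one SDF record into a dict of its data items.
    The title line (the pose ID, "Molecule Name") is stored under "_Name"."""
    record: Dict[str, str] = {"_Name": lines[0].strip() if lines else ""}
    i = 1  # line 0 is the title; data items follow the molfile block
    while i < len(lines):
        match = _HEADER.match(lines[i])
        i += 1
        if match:
            values: List[str] = []
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i].strip())
                i += 1
            record[match.group(1)] = "\n".join(values)
    return record


def iter_sd_records(text: str) -> Iterator[Dict[str, str]]:
    """Split SDF text on "$$$$" delimiter lines and parse each record."""
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                yield _parse_record(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # last record without "$$$$"
        yield _parse_record(current)


def pactivity_to_molar(p_activity: float) -> float:
    """Invert pActivity = -log10(activity in M); exp_activity > 5 means < 10 µM."""
    return 10.0 ** (-p_activity)


def select_poses(records: Iterable[Dict[str, str]], min_confidence: float = 0.8) -> List[Dict[str, str]]:
    """Keep poses at or above a (hypothetical, stricter) confidence cut-off."""
    return [r for r in records if float(r["confidence_score"]) >= min_confidence]


def unique_pairs(records: Iterable[Dict[str, str]]) -> Set[Tuple[str, str]]:
    """Distinct (UniProt accession, fragment InChIKey) combinations."""
    return {(r["uniprot"], r["InchiKey"]) for r in records}
```

Because every record in SAFR already satisfies confidence_score >= 0.6 and exp_activity > 5, re-applying those bounds changes nothing; `select_poses` is only useful with a stricter threshold. Per-file pair sets from `unique_pairs` can be unioned across files to count pairs over the whole repository.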

 

Files

SAFR_v1.0.1.zip

Files (426.3 MB)

  • 307.2 MB, md5:daaaac13d12cea2d6254dfc864d07fac
  • 119.0 MB, md5:c428ce75cea8e9f0bef9ccb211be3d57
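The checksums above can be used to confirm a download is intact before extracting it. The sketch below is a standard-library example (the function names are hypothetical) that streams the file through `hashlib.md5` in chunks and accepts checksums written with or without the `md5:` prefix used in the listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so that
    large archives are never held in memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Check a file against a checksum written with or without the "md5:" prefix."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

Since the listing does not pair each checksum with a file name, a downloaded archive can be checked against both, e.g. `any(verify_md5("SAFR_v1.0.1.zip", c) for c in ("md5:daaaac13d12cea2d6254dfc864d07fac", "md5:c428ce75cea8e9f0bef9ccb211be3d57"))`, assuming the file was saved under that name in the working directory.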

Additional details

Funding

Ministerio de Ciencia, Innovación y Universidades
PID2023-153094OB-I00
Agència de Gestió d'Ajuts Universitaris i de Recerca
2024DI00015