PROTAC-8K
Description
To evaluate the performance of the proposed DegradeMaster and baseline methods, we collected data from the PROTAC-DB 3.0 database. The latest release of PROTAC-DB 3.0 comprises 9,380 PROTAC entries, including 569 warheads, 107 E3 ligands, and 5,753 linkers. Each entry includes detailed information such as the PROTAC's SMILES representation and the UniProt IDs of the protein of interest (POI) and the E3 ligase.
We first removed entries that lack critical information, e.g., the UniProt ID of the POI or E3 ligase. To derive degradation labels, we used both explicit DC50/Dmax values and implicit values inferred from experimental descriptions. A PROTAC is labeled as having low degradation activity if its DC50 is greater than or equal to 100 nM and its Dmax is below 80%; otherwise, it is labeled as having high degradation activity. Crystal structures of POIs and E3 ligases were sourced from the Protein Data Bank (PDB), and proteins without an available crystal structure were supplemented with predicted structures generated by AlphaFold 2. We applied Smina to dock the warhead to the POI and the E3 ligand to the E3 ligase.
Using these criteria, we constructed a supervised PROTAC dataset consisting of 620 high-activity entries and 1,011 low-activity entries. Additionally, we curated a semi-supervised PROTAC dataset containing 8,603 entries in total, incorporating the same labeled subset as the supervised dataset.
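The labeling rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the code used to build the dataset: the function name and argument names are hypothetical, and it assumes that both DC50 (in nM) and Dmax (in percent) are available as numbers. How entries with a missing or only implicitly described value are handled is not specified here, so the sketch does not cover that case.

```python
def label_degradation_activity(dc50_nm: float, dmax_percent: float) -> str:
    """Return "low" or "high" following the thresholds in the description.

    Assumption: both measurements are present as plain numbers.
    """
    # Low activity requires BOTH conditions: DC50 >= 100 nM and Dmax < 80%.
    is_low = dc50_nm >= 100.0 and dmax_percent < 80.0
    # Every other combination is labeled as high activity.
    return "low" if is_low else "high"
```

Because the two conditions are joined by "and", a PROTAC with a DC50 of 500 nM but a Dmax of 90% is labeled high activity under this rule, as is one with a DC50 of 50 nM and a Dmax of 50%.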
Files
Files (65.4 MB)
Name | Size |
---|---|
PROTAC-8K.zip (md5:25cb9d8c1718ba2d18871a4d525d53b5) | 65.4 MB |
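After downloading the archive, its integrity can be checked against the MD5 checksum in the file listing. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing for PROTAC-8K.zip.
EXPECTED_MD5 = "25cb9d8c1718ba2d18871a4d525d53b5"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("PROTAC-8K.zip")` returns `True` only if the local copy matches the published checksum; a `False` result suggests an incomplete or corrupted download.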
Additional details
Dates
- Submitted: 2025-01-22. Dataset of the ISMB/ECCB 2025 submitted paper "Accurate PROTAC targeted degradation prediction with DegradeMaster".