Published February 27, 2026 | Version 2.0.0 | Dataset | Open
MPLID: Membrane Protein-Lipid Interface Dataset v2.0.0
Description
A large-scale dataset of lipid-contacting residues in membrane proteins, with labels derived from lipids resolved in experimentally determined structures in the Protein Data Bank.
Labels are 100% experimental, with no dependencies on computational databases.
Dataset Statistics (v2.0.0)
- Proteins: 4,704
- Total residues: 8,055,325
- Contact residues: 80,439
- Contact rate: 1.00%
- Sequence clusters: 813 (30% identity)
- Lipid codes recognized: 117
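The contact rate follows directly from the two residue counts above, and it implies a strong class imbalance for anyone training a classifier on these labels. The snippet below is a minimal sketch of that arithmetic; the function names are illustrative and not part of the dataset.

```python
# Counts copied from the v2.0.0 statistics above.
TOTAL_RESIDUES = 8_055_325
CONTACT_RESIDUES = 80_439


def contact_rate(contacts: int, total: int) -> float:
    """Fraction of residues labeled as lipid contacts."""
    return contacts / total


def negatives_per_positive(contacts: int, total: int) -> float:
    """Number of non-contact residues for each contact residue."""
    return (total - contacts) / contacts


rate = contact_rate(CONTACT_RESIDUES, TOTAL_RESIDUES)
ratio = negatives_per_positive(CONTACT_RESIDUES, TOTAL_RESIDUES)
print(f"contact rate: {rate:.2%}")
print(f"negatives per positive: {ratio:.1f}")
```

A ratio of this size is often used as a starting point for a positive-class weight in a loss function, although the right weighting depends on the model and the metric.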
Train/Validation/Test Splits
- Train: 2,578 proteins, 4,907,696 residues
- Val: 1,051 proteins, 1,403,838 residues
- Test: 1,075 proteins, 1,743,791 residues
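The record does not spell out how clusters were assigned to splits. As a sketch under stated assumptions, one common way to build splits of this kind is to assign whole sequence clusters to a single split, so that no cluster is shared between train, validation, and test. The function name, the input mapping, and the target fractions below are hypothetical; the fractions only roughly mirror the published split sizes.

```python
import random
from collections import defaultdict


def cluster_aware_split(protein_to_cluster, fractions=(0.55, 0.22, 0.23), seed=0):
    """Assign whole clusters to train/val/test so no cluster spans two splits.

    protein_to_cluster: dict mapping a protein id to its cluster id
    (hypothetical input format, not the dataset's actual file layout).
    """
    names = ("train", "val", "test")

    # Group proteins by cluster.
    clusters = defaultdict(list)
    for protein, cluster in protein_to_cluster.items():
        clusters[cluster].append(protein)

    # Shuffle reproducibly, then place the largest clusters first
    # (the sort is stable, so ties keep the shuffled order).
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)
    cluster_ids.sort(key=lambda c: len(clusters[c]), reverse=True)

    total = len(protein_to_cluster)
    targets = {n: f * total for n, f in zip(names, fractions)}
    splits = {n: [] for n in names}
    for cid in cluster_ids:
        # Put the cluster into the split that is furthest below its target size.
        name = max(names, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(clusters[cid])
    return splits
```

The greedy placement is a design choice for this sketch: it keeps split sizes close to the targets while guaranteeing that related sequences stay together.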
Key Features
- Labels derived 100% from experimentally resolved lipids in PDB structures
- Contacts defined by a 4.0 Å distance cutoff between heavy atoms (all non-hydrogen atoms considered)
- 4,704 proteins across all membrane protein classes
- Cluster-aware splits prevent data leakage
- Fully reproducible from public PDB data
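To make the labeling rule concrete, here is a minimal sketch of a heavy-atom distance cutoff. It assumes a residue counts as a contact when any of its non-hydrogen atoms lies within 4.0 Å of any non-hydrogen lipid atom, and that the cutoff is inclusive. The tuple-based input format and the function names are hypothetical, and this is not the authors' pipeline.

```python
import math

CUTOFF = 4.0  # Angstrom, as stated in the feature list


def is_heavy(element: str) -> bool:
    """Treat every atom other than hydrogen (or deuterium) as a heavy atom."""
    return element.upper() not in {"H", "D"}


def contact_residues(protein_atoms, lipid_atoms, cutoff=CUTOFF):
    """Return ids of residues with a heavy atom within `cutoff` of a lipid heavy atom.

    protein_atoms: iterable of (residue_id, element, (x, y, z))
    lipid_atoms:   iterable of (element, (x, y, z))
    """
    lipid_xyz = [xyz for element, xyz in lipid_atoms if is_heavy(element)]
    contacts = set()
    for residue_id, element, xyz in protein_atoms:
        # Skip hydrogens and residues already known to be in contact.
        if residue_id in contacts or not is_heavy(element):
            continue
        if any(math.dist(xyz, other) <= cutoff for other in lipid_xyz):
            contacts.add(residue_id)
    return contacts


# Toy coordinates: residue 1 has a carbon 3.9 Angstrom from a lipid carbon,
# residue 2 is only near a lipid hydrogen, residue 3 has only a hydrogen nearby.
protein = [
    (1, "C", (0.0, 0.0, 0.0)),
    (1, "H", (0.0, 0.0, 1.0)),
    (2, "N", (10.0, 0.0, 0.0)),
    (3, "H", (0.0, 0.0, 3.5)),
]
lipid = [("C", (0.0, 0.0, 3.9)), ("H", (9.5, 0.0, 0.0))]
print(contact_residues(protein, lipid))  # {1}
```

The brute-force pairwise loop is fine for illustration; for full structures a spatial index such as a k-d tree would be the usual choice.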
Files (112.2 MB)
Previewed file: DATA_DICTIONARY.txt
| MD5 | Size |
|---|---|
| 345982a32292f499034e9a6c3d27c434 | 4.3 kB |
| e249a876631dede45a3077efd8118309 | 1.3 kB |
| 9f35324ae0263b1fad2dd57c15a91f93 | 216.3 kB |
| fd0775092c4236bcbf747ea75118d4a5 | 3.9 kB |
| 69c0863eeef4cf0ec32048445a51166d | 24.0 MB |
| e6f3c0660b1740d43d49e7d3a9daf780 | 68.6 MB |
| 048c3881d0944d0b97c565ba66aaee2b | 19.3 MB |
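Each file in the listing above carries an MD5 checksum, which can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`; the helper names are illustrative, and the local file path is whatever name the downloaded file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_md5(local_path, "md5:345982a32292f499034e9a6c3d27c434")` should return `True` for an intact copy of the 4.3 kB file, where `local_path` is a placeholder for the saved file.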
Additional details
Related works
- Is supplement to: https://github.com/omagebright/MPLID