Published February 16, 2024 | Version v1
Dataset
Open
Dataset of AlphaFold's internal representations of 4,581 proteins relevant for drug discovery
Description
This dataset contains the outputs of the AlphaFold model for 4,581 proteins that are relevant targets in drug discovery.
More information on the dataset can be found in the following repository: https://github.com/andriusbern/foldedPapyrus
Dataset structure:
↓ data/* -> main data directory
↓ data/PID/* -> data of a single protein of length L
Filename | Description | Tensor shape | Lightweight |
---|---|---|---|
single.npy | (s_i) evoformer single representation | [L x 384] | ✔️ |
structure.npy | (a_i) output of the last layer of structure module | [L x 384] | ✔️ |
msa.npy*** | (m_si) processed MSA representation | [N x L x 256] | |
pair.npy*** | (z_ij) evoformer pair representation | [L x L x 128] | |
PID.pdb | 3D protein structure prediction | | ✔️ |
PID_unrelaxed.pdb | 3D protein structure prediction w/o relaxation step (D) | | ✔️ |
confidence.npy* | confidence in structure prediction (0-100) | 1 | ✔️ |
plldt.npy* | confidence in structure prediction per residue | [L] | ✔️ |
PID.fasta | protein amino acid sequence and metadata | | ✔️ |
timings.json | Processing log | | ✔️ |
↓ data/PID2/* -> data of protein #2
...
*Note: L: sequence length, N: number of aligned sequences via MSA.
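As a rough illustration of how the per-protein directory layout described above might be read, the following sketch loads the `.npy` files of one protein with NumPy and checks that they agree on the sequence length L. The function names `load_protein` and `sequence_length` are hypothetical helpers, not part of the dataset's repository, and the sketch assumes the arrays are plain numeric `.npy` files with the shapes listed in the file table.

```python
from pathlib import Path

import numpy as np

# File stems taken from the dataset's file listing; the split into
# "lightweight" and "heavy" groups mirrors the Lightweight column.
LIGHTWEIGHT = ("single", "structure", "confidence", "plldt")
HEAVY = ("msa", "pair")


def load_protein(data_dir, pid, include_heavy=False):
    """Load the .npy arrays found in data_dir/pid into a dict keyed by file stem.

    Hypothetical helper: files that are missing are skipped silently.
    """
    protein_dir = Path(data_dir) / pid
    names = LIGHTWEIGHT + (HEAVY if include_heavy else ())
    arrays = {}
    for name in names:
        path = protein_dir / f"{name}.npy"
        if path.exists():
            arrays[name] = np.load(path)
    return arrays


def sequence_length(arrays):
    """Return L from the single representation and verify the other arrays match.

    Assumes the shapes listed in the file table:
    single [L x 384], structure [L x 384], plldt [L],
    pair [L x L x 128], msa [N x L x 256].
    """
    length = arrays["single"].shape[0]
    expected = {
        "single": (length, 384),
        "structure": (length, 384),
        "plldt": (length,),
        "pair": (length, length, 128),
    }
    for name, shape in expected.items():
        if name in arrays and arrays[name].shape != shape:
            raise ValueError(f"{name}.npy has shape {arrays[name].shape}, expected {shape}")
    if "msa" in arrays and arrays["msa"].shape[1:] != (length, 256):
        raise ValueError(f"msa.npy has shape {arrays['msa'].shape}, expected [N x {length} x 256]")
    return length
```

A call such as `load_protein("data", "PID")` would return only the lightweight arrays; passing `include_heavy=True` additionally reads `msa.npy` and `pair.npy`, which grow with N x L and L x L respectively and can be large.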
Files
Name | Size | Checksum |
---|---|---|
FoldedPapyrus_4581_v01.zip | 6.1 GB | md5:4bccb348b2a0dfed4f0e1b0f9d9253f4 |
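The MD5 checksum listed for the archive can be used to confirm that a download completed without corruption. The sketch below computes the digest with Python's standard `hashlib` module, reading the file in chunks so the 6.1 GB archive does not need to fit in memory. The helper name `md5_of_file` and the assumption that the archive sits in the current directory are illustrative only.

```python
import hashlib

# Checksum as listed on the record page for FoldedPapyrus_4581_v01.zip.
EXPECTED_MD5 = "4bccb348b2a0dfed4f0e1b0f9d9253f4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("FoldedPapyrus_4581_v01.zip")` should return `True` for an uncorrupted download, assuming the file was saved under its original name.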
Additional details
Software
- Repository URL: https://github.com/andriusbern/foldedPapyrus
- Programming language: Python