Published March 28, 2020 | Version v1 | Dataset | Open
                  Protein Graphs Dataset from PDB
Description
This dataset contains the protein graphs, constructed from the Protein Data Bank (PDB, www.rcsb.org/pdb), that were used in the paper:
Nilothpal Talukder and Mohammed J. Zaki. A distributed approach for graph mining in massive networks. Data Mining and Knowledge Discovery: Special Issue on ECML/PKDD 2016 Journal Track Papers, 30(5):1024–1052, 2016. URL: http://link.springer.com/article/10.1007/s10618-016-0466-x.
Graphs are stored in the following line-based format:
t # GID
v VID VLABEL
e VID1 VID2 ELABEL
where
- GID is the graph identifier (integer).
- VID is a vertex identifier (integer), and VLABEL is that vertex's label (integer).
- VID1 and VID2 are the two vertices joined by an edge, and ELABEL is the edge label (integer).
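The record format above can be read with a small streaming parser. The Python sketch below assumes whitespace-separated fields, one record per line, and that every `v` and `e` line follows the `t` line of the graph it belongs to. The `Graph` class and `parse_graphs` function are hypothetical names chosen for illustration; they are not part of the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Graph:
    """One graph from the t/v/e format (hypothetical container for illustration)."""
    gid: int
    vertices: Dict[int, int] = field(default_factory=dict)           # VID -> VLABEL
    edges: List[Tuple[int, int, int]] = field(default_factory=list)  # (VID1, VID2, ELABEL)


def parse_graphs(lines: Iterable[str]) -> Iterator[Graph]:
    """Yield graphs from 't # GID' / 'v VID VLABEL' / 'e VID1 VID2 ELABEL' lines.

    Assumes whitespace-separated integer fields and that each 'v'/'e' line
    belongs to the most recent 't' line.
    """
    current: Optional[Graph] = None
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        tag = parts[0]
        if tag == "t":
            # 't # GID' starts a new graph, so emit the previous one first.
            if current is not None:
                yield current
            current = Graph(gid=int(parts[2]))
        elif current is None:
            raise ValueError(f"line {lineno}: {tag!r} record before any 't' line")
        elif tag == "v":
            current.vertices[int(parts[1])] = int(parts[2])
        elif tag == "e":
            # Edges are stored exactly as listed; no reverse copy is added.
            current.edges.append((int(parts[1]), int(parts[2]), int(parts[3])))
        else:
            raise ValueError(f"line {lineno}: unknown record type {tag!r}")
    if current is not None:
        yield current  # emit the last graph in the input


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the dataset.
    sample = ["t # 0", "v 0 5", "v 1 7", "e 0 1 2", "t # 1", "v 0 3"]
    graphs = list(parse_graphs(sample))
    print(len(graphs), graphs[0].edges)  # prints: 2 [(0, 1, 2)]
```

Because `parse_graphs` is a generator, a large file can be processed one graph at a time, for example `with open(path) as f:` followed by `for g in parse_graphs(f): ...`, where `path` is whatever name the downloaded file has.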
Files
| Checksum | Size |
|---|---|
| md5:3699b0cf9b747310cae5a27e9b7c52ac | 959.2 MB |
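After downloading, the file can be checked against the MD5 checksum listed above. A minimal sketch using Python's standard `hashlib` module follows. The function names `md5_of_file` and `verify_download` are hypothetical, and because the file name is not given here, the path is left to the caller.

```python
import hashlib

# Checksum from the Files table (the "md5:" prefix removed).
EXPECTED_MD5 = "3699b0cf9b747310cae5a27e9b7c52ac"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat even for a file of roughly 1 GB; a call such as `verify_download("<downloaded file>")` returns `True` only when the digest matches the published value.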
Additional details
Funding
- U.S. National Science Foundation, grant 1302231: III: Medium: Mining petabytes of data using cloud computing and a massively parallel cyberinstrument
References
- Nilothpal Talukder and Mohammed J. Zaki. A distributed approach for graph mining in massive networks. Data Mining and Knowledge Discovery: Special Issue on ECML/PKDD 2016 Journal Track Papers, 30(5):1024–1052, 2016. URL: http://link.springer.com/article/10.1007/s10618-016-0466-x.