Published December 7, 2022 | Version v1
Dataset Open

CAFA3 based dataset

  • 1. Institute of Computing, University of Campinas

Description

Dataset based on the CAFA3 challenge database. Only functions present in at least 50 were kept, and duplicated proteins were removed from the different sets.

Each file name contains the ontology (bp for Biological Process, cc for Cellular Component, and mf for Molecular Function) and the set (training, validation, or test). Files generated for augmentation are identified by augmented_training, as in bp-augmented_training.csv.
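The naming scheme can be sketched in Python as a small parser. This is only an illustration: the `<ontology>-<set>.csv` pattern is inferred from the single visible name bp-augmented_training.csv, the exact spellings of the set names are assumptions, and `DatasetFile` and `parse_file_name` are hypothetical helpers, not part of the dataset.

```python
from typing import NamedTuple

# Ontology codes as given in the description.
ONTOLOGIES = {
    "bp": "Biological Process",
    "cc": "Cellular Component",
    "mf": "Molecular Function",
}

# Assumed spellings of the set names; only "augmented_training"
# is confirmed by a visible file name.
SPLITS = ("training", "validation", "test", "augmented_training")


class DatasetFile(NamedTuple):
    ontology: str
    split: str


def parse_file_name(name: str) -> DatasetFile:
    """Split a name such as 'bp-augmented_training.csv' into its parts."""
    stem, dot, extension = name.rpartition(".")
    if not dot or extension != "csv":
        raise ValueError(f"expected a .csv file name, got {name!r}")
    ontology, separator, split = stem.partition("-")
    if not separator or ontology not in ONTOLOGIES or split not in SPLITS:
        raise ValueError(f"unrecognized file name: {name!r}")
    return DatasetFile(ontology, split)


if __name__ == "__main__":
    parsed = parse_file_name("bp-augmented_training.csv")
    print(ONTOLOGIES[parsed.ontology], parsed.split)
```

Rejecting unknown codes with an exception, rather than guessing, keeps a mistyped file name from silently selecting the wrong ontology or set.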

Files

bp-augmented_training.csv

Files (1.7 GB)

Checksum Size
md5:a34dbccbf7ab4eec7031f1a4dc938195 787.9 MB
md5:e9a4b239cd47a7ac80975f63e259581e 20.4 MB
md5:85c19594547a503956226b9c225efc5d 407.3 MB
md5:c2674223770d6a8cf680dd9335d51ebe 44.9 MB
md5:398795bfa6179ec170d02ab6fb35ebb5 124.9 MB
md5:0e5dc8528ca95e8897b10cddaa12a775 2.1 MB
md5:074b13dd50fad4a6a4f13e4d8d4105d6 75.2 MB
md5:cdc8ceefcab4fb8c9278dd07c184327f 8.3 MB
md5:b1bed0704fb72d77350c20d25fcca0df 34.1 MB
md5:c3c7004cf5989bdef82420cf9198f59f 105.9 MB
md5:2735e408dd57f6de29b1538f6b150d68 2.1 MB
md5:b31a8f22b5934aef61b76ec3b89296da 62.1 MB
md5:897921ce5df8174672200320926ccc87 6.9 MB
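The md5 values listed above can be used to check a download for corruption. The following is a minimal sketch using Python's standard hashlib module; `md5_of_file` and `matches_listing` are hypothetical helper names, and because the listing does not show which checksum belongs to which file, the pairing of a local file with a checksum is left to the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:a34d...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listing("downloads/some-file.csv", "md5:...")` returns True only when the local copy is byte-for-byte identical to the published file; both the path and the checksum here are placeholders to be filled in from the record.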