There is a newer version of the record available.

Published October 10, 2023 | Version v1
Dataset Open

Benchmark dataset for CATH hierarchical clustering tools (GeMMA/FunFHMMer, MARC, FRAN and CATH-eMMA)

  • 1. University College London

Description

Benchmark dataset for CATH superfamily 3.40.50.620 (HUPs).

Contains Functional Family alignments and hidden Markov models generated by GeMMA/FunFHMMer, MARC, FRAN and CATH-eMMA, together with Python code covering the assessment of their quality (EC purity, DOPS, Neff) and the intermediate steps of the MARC and FRAN pipelines (pooling, randomisation, renaming).
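
As an illustration of one of the quality measures named above, the sketch below computes an EC purity score for a single family. The record does not define the measure, so the definition used here is an assumption: the fraction of EC-annotated sequences in a family that carry the family's most frequent EC number. The function name ec_purity and its input format are hypothetical and are not taken from the deposited code.

```python
from collections import Counter


def ec_purity(ec_numbers):
    """Return the share of annotated sequences carrying the most frequent EC number.

    ASSUMPTION: this definition is illustrative only; the deposited code may
    define EC purity differently. Unannotated sequences (None or "") are ignored.
    """
    annotated = [ec for ec in ec_numbers if ec]
    if not annotated:
        # No EC annotation in this family, so purity is undefined.
        return None
    most_common_count = Counter(annotated).most_common(1)[0][1]
    return most_common_count / len(annotated)


# Hypothetical family: three annotated members, one unannotated.
family = ["6.1.1.1", "6.1.1.1", "6.1.1.2", None]
print(ec_purity(family))
```

Under this reading, a family whose annotated members all share a single EC number scores 1.0, and lower values indicate a functionally mixed family.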

3.40.50.620_embedded.pt contains embeddings for the HUPs superfamily generated using the ESM2 protein language model.

Files (41.2 MB)

md5:433547804445216fa7045c51f35657ab (19.8 MB)
md5:eb9a4d07428eed7ded5b0860ec207a22 (21.4 MB)
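
A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The helper names md5_of_file and is_listed_download are hypothetical, and because the listing does not show file names, the check only asks whether a file's digest matches either listed checksum.

```python
import hashlib

# Checksums copied from the file listing; the file names are not shown there,
# so any path passed to these helpers is a placeholder chosen by the user.
EXPECTED_MD5 = {
    "433547804445216fa7045c51f35657ab",
    "eb9a4d07428eed7ded5b0860ec207a22",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_download(path):
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

Reading in chunks keeps memory use flat for files of this size. A result of False suggests an incomplete or corrupted download, or a file from a different version of the record.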