Published June 6, 2024 | Version v2
Dataset
Open
Benchmark dataset for CATH hierarchical clustering tools (GeMMA/FunFHMMer, MARC, FRAN and eMMA)
Authors/Creators
- 1. University College London
Description
Benchmark dataset for CATH superfamily 3.40.50.620 (HUPs).
Contains Functional Family (FunFam) alignments and hidden Markov models generated by GeMMA/FunFHMMer, MARC, FRAN and CATH-eMMA, together with the Python code used to assess their quality (EC purity, DOPS, Neff) and the intermediate steps of the MARC and FRAN pipelines (pooling, randomisation, renaming).
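As an illustration of the kind of quality measure listed above, here is a minimal sketch of an EC purity score for a single FunFam. It assumes purity means the fraction of EC-annotated sequences that share the family's most common EC number; the function name `ec_purity` and the example annotations are hypothetical, and the definition used by the code in this record may differ.

```python
from collections import Counter


def ec_purity(ec_numbers):
    """Return the fraction of annotated sequences sharing the most common EC number.

    Assumption: unannotated sequences (None or empty string) are ignored.
    Returns None when no sequence in the family carries an EC annotation.
    """
    annotated = [ec for ec in ec_numbers if ec]
    if not annotated:
        return None
    # Count of the single most frequent EC number in the family.
    top_count = Counter(annotated).most_common(1)[0][1]
    return top_count / len(annotated)


# Hypothetical annotations for one family: three annotated, one unannotated.
family = ["6.1.1.1", "6.1.1.1", "6.1.1.2", None]
purity = ec_purity(family)
```

Under this definition a family whose annotated members all share one EC number scores 1.0, and lower values indicate functionally mixed families.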
3.4.50.620_full_superfamily_sequences.fasta contains all HUPs superfamily sequences; the FunFam sequences are a subset of these.
all_starting_clusters_sequences.fasta contains the sequences included in the starting clusters used in the analyses.
3.40.50.620_embedded.pt includes embeddings for the HUPs superfamily generated using the ESM2 Protein Language Model.
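The subset relationship between the FASTA files described above can be checked with a short script. This is a sketch using only the standard library; it assumes that both files identify a sequence by the first whitespace-delimited token of its header line, which has not been verified against the actual files. The function names are illustrative.

```python
def read_fasta_ids(path):
    """Collect sequence identifiers (first token after '>') from a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip malformed headers with no identifier
                    ids.add(fields[0])
    return ids


def missing_from_superfamily(subset_path, full_path):
    """Return identifiers present in subset_path but absent from full_path."""
    return read_fasta_ids(subset_path) - read_fasta_ids(full_path)


# Example call with the file names given in this record:
# missing = missing_from_superfamily(
#     "all_starting_clusters_sequences.fasta",
#     "3.4.50.620_full_superfamily_sequences.fasta",
# )
# An empty set would mean every starting-cluster sequence is in the full set.
```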
Files
(197.5 MB)
| Checksum | Size |
|---|---|
| md5:433547804445216fa7045c51f35657ab | 19.8 MB |
| md5:eb15bea9a23f7f8154b18e93d20ffb46 | 146.5 MB |
| md5:51fcc0ec83e5ffa4ff11fcc25ac137c0 | 9.8 MB |
| md5:eb9a4d07428eed7ded5b0860ec207a22 | 21.4 MB |
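A downloaded file can be checked against its listed MD5 checksum before use. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local path in the commented example are placeholders, and the mapping from each checksum to a file name is not given in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:abc...' or 'abc...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example call (the local path is a placeholder):
# ok = matches_checksum("path/to/downloaded_file", "md5:433547804445216fa7045c51f35657ab")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.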