Published January 19, 2022 | Version 1.1
Dataset Open

Metaclusters by DPCfam clustering of UniRef50 v 2017_07

  • 1. SISSA, Trieste (IT); Area Science Park, Trieste (IT)
  • 2. Area Science Park, Trieste (IT)
  • 1. Area Science Park, Trieste (IT)
  • 2. SISSA, Trieste (IT)
  • 3. Center for Omics Sciences, IRCCS San Raffaele Hospital Milan (IT)
  • 4. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK

Description

Metaclusters obtained from the DPCfam clustering of UniRef50, v. 2017_07.
Metaclusters represent putative protein families automatically derived using the DPCfam method, as described in "Unsupervised protein family classification by Density Peak clustering", Russo ET, 2020, PhD thesis, http://hdl.handle.net/20.500.11767/116345. Supervisors: Alessandro Laio, Marco Punta.

See also https://dpcfam.areasciencepark.it/ to browse the data interactively.

VERSION 1.1 changes:

  • Added the DPCfamB database, which includes all small metaclusters with 25<=N<50 seed sequences. DPCfamB files are named with the prefix B_
  • Added an AlphaFold representative, based on AlphaFoldDB, for each MC

FILES DESCRIPTION:

1) Standard DPCfam database

  • metaclusters_xml.tar.gz Metaclusters' seeds, unaligned, in an XML table. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported. Metacluster entries also include some statistical information about each MC (such as size, average length, low-complexity fraction, etc.) and a Pfam comparison (Dominant Architecture). A README file describing the data is included. A parser is included to transform the XML data into space-separated tables. The XML schema is included.
  • metaclusters_msas.tar.gz Metaclusters' multiple sequence alignments, in FASTA format. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported.
  • metaclusters_hmms.tar.gz Metaclusters' profile HMMs. One ".hmm" file for each metacluster. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported.
  • all_metaclusters_hmm.tar.gz Collective metaclusters' profile HMMs. A single .hmm file collecting all MCs' profile HMMs. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported.
  • uniref50_annotated.xml.gz UniRef50 v. 2017_07 database annotated with Pfam families and DPCfam metaclusters. A README file describing the data is included. A parser is included to transform the XML data into space-separated tables. The XML schema is included; it is derived from UniProt's UniRef50 XML schema.
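As a rough illustration of how an alignment from metaclusters_msas.tar.gz might be inspected, the sketch below parses FASTA text and computes the number of seeds and their average ungapped length, the two quantities used in the inclusion criteria above. It makes assumptions not stated in the record: that gaps are written as '-' or '.', and that each alignment has already been extracted to plain text. The function names read_fasta and seed_stats are hypothetical and are not part of the parser shipped with the archives.

```python
from typing import Iterable, List, Tuple

GAP_CHARS = "-."  # assumption: alignment gaps are written as '-' or '.'


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Start a new record; the header is everything after '>'.
            records.append((line[1:], []))
        elif records:
            # Sequence line belonging to the most recent header.
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def seed_stats(records: List[Tuple[str, str]]) -> Tuple[int, float]:
    """Return (number of sequences, average length ignoring gap characters)."""
    lengths = [sum(1 for c in seq if c not in GAP_CHARS) for _, seq in records]
    if not lengths:
        return 0, 0.0
    return len(lengths), sum(lengths) / len(lengths)


if __name__ == "__main__":
    example = ">s1\nAC-D\nEF\n>s2\n--GH\n"
    n, avg = seed_stats(read_fasta(example.splitlines()))
    print(n, avg)
```

For a real file, the lines of an opened alignment file can be passed directly to read_fasta. The README inside each archive remains the authoritative description of the data layout.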

2) DPCfamB database

  • B_metaclusters_xml.tar.gz Metaclusters' seeds, unaligned, in an XML table. All metaclusters are listed. Metacluster entries also include some statistical information about each MC (such as size, average length, low-complexity fraction, etc.) and a Pfam comparison (Dominant Architecture). A README file describing the data is included. A parser is included to transform the XML data into space-separated tables. The XML schema is included.
  • B_metaclusters_msas.tar.gz Metaclusters' multiple sequence alignments, in FASTA format. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 amino acids are reported.
  • B_metaclusters_hmms.tar.gz Metaclusters' profile HMMs. One ".hmm" file for each metacluster. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 amino acids are reported.
  • B_all_metaclusters_hmm.tar.gz Collective metaclusters' profile HMMs. A single .hmm file collecting all MCs' profile HMMs. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 amino acids are reported.
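The size and length thresholds quoted for the MSA and HMM archives can be summarised in a few lines. The following is a minimal sketch that reads those thresholds literally; the function name assign_database is hypothetical. Note that, read literally, a metacluster with exactly 50 seeds satisfies neither "more than 50" nor "25<=N<50". The sketch leaves that case unassigned rather than guessing, and this may simply reflect imprecise wording in the description.

```python
from typing import Optional

MIN_AVG_LENGTH = 50.0  # average seed length must be greater than 50 amino acids


def assign_database(n_seeds: int, avg_length: float) -> Optional[str]:
    """Return which MSA/HMM archive family a metacluster would fall into.

    Thresholds are taken literally from the file descriptions:
    standard DPCfam needs more than 50 seeds, DPCfamB needs 25 <= N < 50.
    Returns None when neither criterion is met.
    """
    if avg_length <= MIN_AVG_LENGTH:
        return None
    if n_seeds > 50:
        return "DPCfam"
    if 25 <= n_seeds < 50:
        return "DPCfamB"
    # Covers N < 25 and the literal boundary case N == 50.
    return None


if __name__ == "__main__":
    for n, length in [(120, 80.0), (30, 80.0), (30, 40.0), (50, 80.0)]:
        print(n, length, assign_database(n, length))
```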

Files (18.8 GB)

Checksum Size
md5:f45fdcefda4774732518fb4540611105 868.6 MB
md5:69ab9002ae2723a0e09cdfb68fd53278 612.1 MB
md5:dc011864bfd3b5227f8eb6d367ba5c03 616.6 MB
md5:0495a913905493d93acac4040c38ae49 112.3 MB
md5:2d90e2e2701fed2808f581530d1532b1 129.7 MB
md5:a2355250a9b6c59a85e3809d36c0f0d6 874.6 MB
md5:a9c29ec54c6692426760aac2e03082d6 1.5 GB
md5:9155aafb636717bf65ad6b8dd79fff6d 1.7 GB
md5:3977c848d3cdc4c24329196321c9cbff 12.4 GB
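Because several of the archives are large, it can be useful to check a download against the MD5 checksums listed above. The sketch below streams a file in chunks with Python's standard hashlib module so that even the largest archive does not need to fit in memory. The helper names md5_of_file and matches_checksum are hypothetical, and which checksum belongs to which file name is not recoverable from this listing, so the pairing must be taken from the original record.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value, with or without 'md5:' prefix."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

A call such as matches_checksum("downloaded_archive.tar.gz", "md5:...") returns True only when the local file is byte-for-byte identical to the published one; the path here is a placeholder.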