DPCfam PUA_UR50 and P53_UR50 datasets and metaclusters

Elena Tea Russo

doi:10.5281/zenodo.3934399

Published July 7, 2020 | Version v1

Other Open

Elena Tea Russo¹

1. SISSA, Trieste

This archive contains the data used in DPCfam to analyse the PUA_UR50 and P53_UR50 datasets (paper in submission).

Each folder, named after its dataset, contains two files:
- all.fasta (FASTA file containing the query sequences of the dataset)
- all_blasted_out.txt (alignments produced by running BLAST on the respective all.fasta file)

To analyze these files, you can use the DPCfam0 program at https://gitlab.com/ETRu/dpcfam (see the repository README for how to use these data).

In addition, each folder contains an "MCs" folder.
It stores the final metaclusters (MCs), filtered at 95 PI with CD-HIT. Each MC file is a FASTA file named after the numbered MC discussed in the paper. Each sequence is named using its protein's UniRef50 identifier followed, after a | separator, by the starting and ending positions of the sequence along that protein. Note that the reported sequences are NOT the full proteins, but only the regions located between the starting and ending positions given in the sequence name.
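
The naming convention described above can be read with a short script. The following is a minimal sketch, not part of DPCfam: the names `Region`, `parse_header` and `read_mc` are hypothetical, and the exact way the two positions are written after the | is an assumption (the sketch simply takes the first two integers it finds there, so it accepts both a "start-end" and a "start|end" layout).

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class Region(NamedTuple):
    """One MC entry: a region of a UniRef50 protein (hypothetical helper type)."""
    protein_id: str
    start: int
    end: int
    sequence: str


def parse_header(header: str) -> Tuple[str, int, int]:
    """Split a FASTA header into protein identifier, start and end.

    Assumption: everything before the first '|' is the identifier, and the
    remainder holds exactly two integers (start, then end).
    """
    protein_id, _, positions = header.lstrip(">").strip().partition("|")
    numbers = re.findall(r"\d+", positions)
    if len(numbers) != 2:
        raise ValueError(f"unexpected header layout: {header!r}")
    return protein_id, int(numbers[0]), int(numbers[1])


def read_mc(lines: Iterable[str]) -> Iterator[Region]:
    """Yield one Region per FASTA record, joining multi-line sequences."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield Region(*parse_header(header), "".join(chunks))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield Region(*parse_header(header), "".join(chunks))
```

To use it on a real file, pass an open file handle, for example `with open(path_to_mc_file) as handle: regions = list(read_mc(handle))`, where `path_to_mc_file` points to one of the files in an "MCs" folder.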

Finally, note that the MC numbering used here matches the numbering in Tables 2 and 4 of the paper, and NOT the numbering produced by the algorithm.

Files (1.2 GB)

Name	Size	MD5
DPCfam-PUA-P53-data.tar.gz	1.2 GB	c4471f4f17b649864dcf15e162b5d13c
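
Since an MD5 checksum is listed for the archive, the download can be checked before unpacking. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_archive` are hypothetical, and the file is assumed to have been saved locally under its original name.

```python
import hashlib

# Checksum listed for DPCfam-PUA-P53-data.tar.gz on this record.
EXPECTED_MD5 = "c4471f4f17b649864dcf15e162b5d13c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that the 1.2 GB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("DPCfam-PUA-P53-data.tar.gz")` returns `True` only if the local copy matches the listed checksum; a `False` result suggests an incomplete or corrupted download.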

