Published July 22, 2024 | Version v1
Dataset Open

ProteinCartography data accompanying "Identification of capsid-like proteins in venomous and parasitic animals"

Description

This is the data for the ProteinCartography analysis in the pub "Identification of capsid-like proteins in venomous and parasitic animals." Note that ProteinCartography (v0.4.2) was run in "Cluster" mode (also called "From-folder" mode) using the parameters set in config_ff.yml and the Ornithodoros turicata proteins. Notebooks used to fetch data and prepare custom plots can be found in the capsids GitHub repository.

 

Files in this data repository include: 

output.zip is a zipped folder containing all of the output files for the Ornithodoros ProteinCartography run, including the maps, aggregated features files, and the all-v-all similarity matrix.

structures.zip is a zipped folder containing all the structures used in the ProteinCartography analysis. Structures beginning with "VOG" are viral capsid proteins folded using ESMFold.

ornithodoros_aggregated_features.tsv is a file containing all the metadata gathered for each protein in the analysis from either UniProt or the VOG database.

ornithodoros_aggregated_features_pca_umap.html is the final map of the capsid proteins and the Ornithodoros proteins, with metadata overlays.

ornithodoros_aggregated_features_pca_umap.tsv is a file containing all of the metadata gathered for each protein in the analysis, as well as the coordinates for the map.

ornithodoros_leiden_similarity.html is a heatmap showing the average within-cluster and between-cluster similarity for every cluster in the analysis.

tick_or_virus_umap.html is a version of the final map that specifically highlights which proteins are from capsids and which proteins are from Ornithodoros.

uniprot_features1.tsv is a file containing all of the UniProt metadata fetched for the Ornithodoros proteins, as well as the viral capsid VOG identifiers. This file is used as an input for the ProteinCartography analysis.

ornithodoros.txt is a file containing all of the Ornithodoros proteins fetched from the AlphaFold database.

config_ff.yml is the configuration file for the ProteinCartography run. 
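As a rough illustration of how the tab-separated files listed above might be inspected, the sketch below reads a TSV with Python's standard csv module and reports its column names and row count. The helper name summarize_tsv is hypothetical, and no particular column names are assumed, since this description does not list them.

```python
import csv
from pathlib import Path


def summarize_tsv(path):
    """Return (column_names, row_count) for a tab-separated file.

    Hypothetical helper for illustration; it makes no assumption
    about which columns the file contains.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Count data rows (the header line is consumed as field names).
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # File name taken from the list above; the local path is an assumption.
    target = Path("ornithodoros_aggregated_features_pca_umap.tsv")
    if target.exists():
        columns, row_count = summarize_tsv(target)
        print(f"{row_count} rows, {len(columns)} columns")
        print(columns)
```

The same helper should work for the other .tsv files in this repository, provided they are plain tab-separated text with a header row.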

Files

Files (2.3 GB)

Checksum Size
md5:cb19fd0f227d6219147cdfc763ebd31f 2.1 kB
md5:62ccde02be538183d4bb62b3ce74799e 83.6 kB
md5:e730e287afea4165fd9ee5487f1d0454 13.2 MB
md5:dc6802552b83b5e76997f279c007a4cb 38.6 MB
md5:844c181c9f83e2a67cc95e5cfe377478 12.6 MB
md5:d49888b37bebeaec7ec46eb4076ee5ba 3.6 MB
md5:1f48974c8c1c3dc7e44129f2da6e8099 1.1 GB
md5:5bf41e293f19ba00bf6e391c04bca118 1.1 GB
md5:a7376a6309b63036456295aee498ef3b 5.3 MB
md5:9323e5e7ebbbfa2fd26d1fe3d9fbdca7 12.3 MB
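
The md5 values listed above can be used to check that a download completed intact. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of_file and matches_checksum are hypothetical. This listing does not say which checksum belongs to which file, so the file and checksum must be paired by the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that the ~1 GB archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum("structures.zip", "md5:<value from the list>") would return True only if the local copy is byte-for-byte identical to the published file.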