Published November 29, 2024 | Version v1
Dataset | Open access

pLM-Repeat: Exploiting the sequence representations of protein language models for sensitive repeat detection

Authors/Creators

  • 1. Max Planck Institute for Biology

Description

deeprepeat_train_seq.fasta & deeprepeat_test_seq.fasta: sequence datasets used for training and testing the deeprepeat model

deeprepeat_weight.pt: deeprepeat model weights

repeatsdb_0.9.fasta & pdb30_after_symd4_sample.fasta: positive and negative datasets, respectively, used to benchmark HHrepID, RADAR and pLM-Repeat

afdb90_filtered_domain.csv: the dataset remaining after filtering for sequence novelty, structural novelty and coiled-coil fraction
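The FASTA files in this record can be read with any standard parser. As a minimal sketch, assuming standard FASTA formatting with a unique record ID as the first token of each header, the Python below loads a file into a dictionary and computes one common benchmark summary, ROC AUC, for a repeat detector on the positive (repeatsdb_0.9.fasta) and negative (pdb30_after_symd4_sample.fasta) sets. All function names are hypothetical, the per-sequence `scores` dictionary stands in for whatever score HHrepID, RADAR or pLM-Repeat produces, and this page does not state which metric the benchmark actually reports.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    The record ID is assumed to be the first whitespace-delimited token
    after '>'; multi-line sequences are concatenated.
    """
    record_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def read_fasta(path: str) -> Dict[str, str]:
    """Load a FASTA file into {record_id: sequence}; later duplicate IDs overwrite earlier ones."""
    with open(path, encoding="utf-8") as handle:
        return dict(parse_fasta(handle))


def roc_auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    labelled = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda pair: pair[0],
    )
    pos_rank_sum = 0.0
    i = 0
    while i < len(labelled):
        # Extend j over the run of scores tied with labelled[i].
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of positions i..j
        pos_rank_sum += avg_rank * sum(label for _, label in labelled[i:j + 1])
        i = j + 1
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def benchmark_auc(scores: Dict[str, float],
                  positive_ids: Iterable[str],
                  negative_ids: Iterable[str]) -> float:
    """AUC of per-sequence detector scores on a positive/negative benchmark.

    Raises KeyError if any benchmark sequence has no score.
    """
    return roc_auc([scores[i] for i in positive_ids],
                   [scores[i] for i in negative_ids])
```

For example, `read_fasta("repeatsdb_0.9.fasta")` and `read_fasta("pdb30_after_symd4_sample.fasta")` return dictionaries whose keys can be passed directly as `positive_ids` and `negative_ids`. The rank-based AUC avoids comparing every positive with every negative, so it stays fast on benchmark sets of several thousand sequences.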

Files (80.9 MB)

Size        Checksum
1.2 MB      md5:6ce8d4ffeb99452b707eceab57c9ac1a
263.9 kB    md5:3fbd5922acdc8ecd2ecb578da0f8051e
2.5 MB      md5:4c327706f1881d40625b18c4f7aa7c7b
75.8 MB     md5:37781e314bae9e8b92946a3bffce58b6
436.3 kB    md5:80d88f91ec774f62d212b44f3b356ade
783.7 kB    md5:fea65a7e52f4f989623173b9ba481338
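Because the file names are not shown next to the checksums in the table above, a downloaded file can be checked either against one specific checksum or against the whole listed set. The following is a minimal Python sketch using the standard-library `hashlib` module; the helper names are hypothetical, and only the six checksum values are taken from this record.

```python
import hashlib
from typing import Optional

# MD5 checksums as listed in the file table of this record.
LISTED_MD5 = {
    "6ce8d4ffeb99452b707eceab57c9ac1a",
    "3fbd5922acdc8ecd2ecb578da0f8051e",
    "4c327706f1881d40625b18c4f7aa7c7b",
    "37781e314bae9e8b92946a3bffce58b6",
    "80d88f91ec774f62d212b44f3b356ade",
    "fea65a7e52f4f989623173b9ba481338",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """True if the file's MD5 equals `expected`; an optional 'md5:' prefix is ignored."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


def matching_listed_checksum(path: str) -> Optional[str]:
    """Return the listed checksum that this file matches, or None if it matches none."""
    actual = md5_of_file(path)
    return actual if actual in LISTED_MD5 else None
```

A `None` result from `matching_listed_checksum` suggests an incomplete or corrupted download. Reading in 1 MiB chunks keeps memory use constant even for the 75.8 MB file.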