pLM-Repeat: Exploiting the sequence representations of protein language models for sensitive repeat detection

Published November 29, 2024 | Version v1

Dataset Open

deeprepeat_train_seq.fasta & deeprepeat_test_seq.fasta: sequence datasets used for training and testing the deeprepeat model, respectively

deeprepeat_weight.pt: trained weights of the deeprepeat model

repeatsdb_0.9.fasta & pdb30_after_symd4_sample.fasta: positive and negative datasets, respectively, used for benchmarking HHrepID, RADAR, and pLM-Repeat

afdb90_filtered_domain.csv: the domains that remain after filtering for sequence novelty, structural novelty, and coiled-coil fraction
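
The sequence files listed above carry the .fasta extension, so they can presumably be read with any standard FASTA parser. The following is a minimal sketch rather than part of the published pipeline: it assumes the conventional layout (a header line starting with ">" followed by one or more sequence lines), and the helper name read_fasta is hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Assumes the conventional layout: each record starts with a '>' header
    line, followed by one or more lines of sequence.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the final record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Assumes the file has been downloaded into the current directory.
    records = list(read_fasta("deeprepeat_test_seq.fasta"))
    print(f"{len(records)} sequences read")
```

Because the function is a generator, the larger training file can be streamed record by record without loading it into memory at once.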

Files

Name	MD5	Size
afdb90_filtered_domain.csv	6ce8d4ffeb99452b707eceab57c9ac1a	1.2 MB
deeprepeat_test_seq.fasta	3fbd5922acdc8ecd2ecb578da0f8051e	263.9 kB
deeprepeat_train_seq.fasta	4c327706f1881d40625b18c4f7aa7c7b	2.5 MB
deeprepeat_weight.pt	37781e314bae9e8b92946a3bffce58b6	75.8 MB
pdb30_after_symd4_sample.fasta	80d88f91ec774f62d212b44f3b356ade	436.3 kB
repeatsdb_0.9.fasta	fea65a7e52f4f989623173b9ba481338	783.7 kB
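
The MD5 digests in the file listing above can be used to check that each download arrived intact. The sketch below is one way to do that with Python's standard library. The digests are copied from the listing; the function names md5_of_file and verify_downloads are hypothetical, and the script assumes the files sit in the current directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing above.
EXPECTED_MD5 = {
    "afdb90_filtered_domain.csv": "6ce8d4ffeb99452b707eceab57c9ac1a",
    "deeprepeat_test_seq.fasta": "3fbd5922acdc8ecd2ecb578da0f8051e",
    "deeprepeat_train_seq.fasta": "4c327706f1881d40625b18c4f7aa7c7b",
    "deeprepeat_weight.pt": "37781e314bae9e8b92946a3bffce58b6",
    "pdb30_after_symd4_sample.fasta": "80d88f91ec774f62d212b44f3b356ade",
    "repeatsdb_0.9.fasta": "fea65a7e52f4f989623173b9ba481338",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, status in verify_downloads(".").items():
        print(f"{status:8s} {name}")
```

Reading in chunks keeps memory use flat even for the 75.8 MB weight file. A file reported as "mismatch" is probably a truncated or corrupted download and is worth fetching again.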