Published November 29, 2024 | Version v1 | Dataset | Open
pLM-Repeat: Exploiting the sequence representations of protein language models for sensitive repeat detection
Description
deeprepeat_train_seq.fasta and deeprepeat_test_seq.fasta: sequence datasets used to train and test the deeprepeat model
deeprepeat_weight.pt: weights of the deeprepeat model
repeatsdb_0.9.fasta and pdb30_after_symd4_sample.fasta: positive and negative datasets used to benchmark HHrepID, RADAR, and pLM-Repeat
afdb90_filtered_domain.csv: the dataset that remains after filtering by sequence novelty, structural novelty, and coiled-coil fraction
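
As a quick sanity check after downloading, the FASTA files can be read with a few lines of standard-library Python. This is a minimal sketch that assumes the files follow the usual FASTA layout (a `>` header line followed by one or more sequence lines). The helper names `read_fasta` and `summarize_fasta` are hypothetical and are not part of the pLM-Repeat code.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file.

    Assumes plain-text FASTA: '>' starts a record, and the sequence may
    span several lines. Blank lines are ignored.
    """
    header = None
    chunks = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Emit the previous record before starting a new one.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    # Emit the final record, if any.
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(path: str) -> Dict[str, int]:
    """Return the record count and the shortest/longest sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {
        "count": len(lengths),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
    }
```

For example, `summarize_fasta("deeprepeat_test_seq.fasta")` would report how many sequences the test set contains and their length range, assuming the file has been downloaded to the working directory.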
Files
Total size: 80.9 MB

| MD5 checksum | Size |
|---|---|
| md5:6ce8d4ffeb99452b707eceab57c9ac1a | 1.2 MB |
| md5:3fbd5922acdc8ecd2ecb578da0f8051e | 263.9 kB |
| md5:4c327706f1881d40625b18c4f7aa7c7b | 2.5 MB |
| md5:37781e314bae9e8b92946a3bffce58b6 | 75.8 MB |
| md5:80d88f91ec774f62d212b44f3b356ade | 436.3 kB |
| md5:fea65a7e52f4f989623173b9ba481338 | 783.7 kB |
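
The MD5 checksums in the file listing can be used to confirm that a download is intact. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the listing here does not pair each checksum with a file name, the pairing should be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry the 'md5:' prefix used in the listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(local_path, "md5:...")`, with the checksum copied from the row for that file, returns `True` only when the downloaded bytes match the published digest.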