Published March 17, 2025 | Version 1.1
Dataset
Open
Pretraining data for PeptideCLM (UPDATED)
Description
This version update changes Generated_peptides.csv to fix cyclization. In the prior upload, ring closures were not generated correctly in the SMILES strings. The model in the publication was trained on the dataset containing these errors; however, to support the community, we decided it would be best to release a 10M-peptide SMILES dataset for use in future pretraining applications. All strings should now load correctly to mol files with RDKit.
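One way to spot-check the claim that every string loads in RDKit is sketched below. It is only an illustration: the column name `SMILES` and the helper names are assumptions, not something documented for this dataset, so check the CSV header before running it. RDKit's `Chem.MolFromSmiles` returns `None` when a string cannot be parsed, which is what the check relies on.

```python
import csv
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, Tuple


def rdkit_parser() -> Callable[[str], object]:
    """Return RDKit's SMILES parser (imported lazily; needs the rdkit package)."""
    from rdkit import Chem
    return Chem.MolFromSmiles


def read_smiles_column(path: str, column: str = "SMILES",
                       limit: Optional[int] = None) -> Iterator[str]:
    """Stream one CSV column row by row, so the multi-GB file is never held in memory.

    `column` is an assumed header name; adjust it to match the actual file.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        for row in islice(reader, limit):
            yield row[column]


def find_unparsable(smiles: Iterable[str],
                    parse: Callable[[str], object]) -> Iterator[Tuple[int, str]]:
    """Yield (index, smiles) for every string the parser rejects by returning None."""
    for index, string in enumerate(smiles):
        if parse(string) is None:
            yield index, string


# Example usage (assumes the file has been downloaded and rdkit is installed):
#   rows = read_smiles_column("Generated_peptides.csv", limit=100_000)
#   failures = list(find_unparsable(rows, rdkit_parser()))
#   print(f"{len(failures)} unparsable strings in the sample")
```

Passing the parser in as an argument keeps the check independent of RDKit itself, so the same loop can be reused with a stricter validator, for example one that also sanitizes the molecule.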
Files
Generated_peptides.csv
Total size: 14.0 GB
MD5 checksum | Size |
---|---|
md5:f891628037f968145a7fdc0b8b099f8c | 10.8 GB |
md5:e3e045b4a2c18a84d1134f261063f031 | 763.4 MB |
md5:c2b81725a458a9b38e49f6e72bc110cd | 455.7 MB |
md5:d683dc67487320dddb3a105faa2da2f0 | 743.0 MB |
md5:c595e8175d42c65d94801791962f712a | 1.2 GB |
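The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and because the listing does not show which file name belongs to which checksum, the pairing in the usage comment is an assumption to verify against the record page.

```python
import hashlib
from typing import Iterable, Iterator


def iter_file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks (1 MiB by default)."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_listing(computed_hex: str, listed: str) -> bool:
    """Compare a computed digest with a listed value such as 'md5:f891...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return computed_hex.lower() == expected


# Example usage (the file-to-checksum pairing here is an assumption):
#   computed = md5_of_chunks(iter_file_chunks("Generated_peptides.csv"))
#   print(matches_listing(computed, "md5:f891628037f968145a7fdc0b8b099f8c"))
```

Hashing in chunks matters here because the largest file is over 10 GB and reading it in one call would exhaust memory on most machines.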
Additional details
Dates
- Available: 2024-11-20