Published March 17, 2026 | Version 03-17-2026
Software
Open
BERTology of Molecular Property Prediction (Tokenization Effect)
Description
Tokenization Effects on Pre-Training
This record pertains to the Tokenization Effects Experiments. It includes the following files:
- tiny-bpe-vocab30522-ms-1234-ds-1234.tar.gz
- tiny-bpe-vocab30522-ms-1234-ds-2345.tar.gz
- tiny-bpe-vocab30522-ms-2345-ds-1234.tar.gz
- small-bpe-vocab30522-ms-1234-ds-1234.tar.gz
- small-bpe-vocab30522-ms-1234-ds-2345.tar.gz
- small-bpe-vocab30522-ms-2345-ds-1234.tar.gz
- base-bpe-vocab30522-ms-1234-ds-1234.tar.gz
- base-bpe-vocab30522-ms-1234-ds-2345.tar.gz
- base-bpe-vocab30522-ms-2345-ds-1234.tar.gz
Each tar file contains all model artifacts (checkpoints, random-number generator states, optimizer states, etc.), training logs (TensorBoard, MLflow, and Weights & Biases), evaluation results, configuration files, run scripts, SLURM sbatch driver scripts, and any additional artifacts generated during the experiments.
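The archive names above follow a regular pattern, so a short Python sketch can split them into fields and peek inside a downloaded archive without unpacking it. The helper names `parse_archive_name` and `list_top_level` are hypothetical and not part of the repository. The record does not define the `ms` and `ds` fields, so the sketch keeps them under those names as plain integers. The internal layout of the archives is not documented here either, so the listing function only reports whatever top-level entries it finds.

```python
import re
import tarfile
from typing import NamedTuple


class ArchiveName(NamedTuple):
    """Fields read from an archive file name (hypothetical helper type)."""
    model_size: str   # "tiny", "small" or "base" in this record
    tokenizer: str    # "bpe" in this record
    vocab_size: int   # 30522 in this record
    ms: int           # meaning not stated in the record; kept as-is
    ds: int           # meaning not stated in the record; kept as-is


# Pattern inferred from the nine file names listed in this record.
_PATTERN = re.compile(
    r"^(?P<model_size>[a-z]+)-(?P<tokenizer>[a-z]+)-vocab(?P<vocab_size>\d+)"
    r"-ms-(?P<ms>\d+)-ds-(?P<ds>\d+)\.tar\.gz$"
)


def parse_archive_name(filename: str) -> ArchiveName:
    """Split a name such as 'tiny-bpe-vocab30522-ms-1234-ds-2345.tar.gz'."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return ArchiveName(
        match["model_size"],
        match["tokenizer"],
        int(match["vocab_size"]),
        int(match["ms"]),
        int(match["ds"]),
    )


def list_top_level(path: str) -> list[str]:
    """Return the sorted top-level entries of a .tar.gz without extracting it."""
    tops = set()
    with tarfile.open(path, "r:gz") as archive:
        for name in archive.getnames():
            # Some tar writers prefix members with "./"; drop it before splitting.
            head = name.removeprefix("./").split("/", 1)[0]
            if head and head != ".":
                tops.add(head)
    return sorted(tops)
```

Listing members requires reading through the whole compressed stream, so this may take a while on the multi-gigabyte archives.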
Preprint: https://doi.org/10.48550/arXiv.2603.13627
Files
(34.6 GB)
| Checksum | Size |
|---|---|
| md5:00eea6bf1c8b482efa6807bd39ab8d94 | 15.4 GB |
| md5:846cd7c2b9c3d1e0edb8637831e98e48 | 3.5 GB |
| md5:d97d84d7c871c8ae16dd96e51c5eb523 | 7.7 GB |
| md5:35b3e2dc27f755932dc06084528be5c9 | 2.2 GB |
| md5:6c18bca50de0d089eca8614fa2036209 | 2.2 GB |
| md5:7208c1fb27df0d6fe32771d297868bb6 | 2.2 GB |
| md5:0cdc6289889635ce0a7d6d0b05a1c33c | 530.0 MB |
| md5:08acbd42294e8726b64d0e48507c721c | 538.0 MB |
| md5:d6f40e463074f868d911ab9acc6a6488 | 537.6 MB |
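The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal Python sketch using only the standard library. The function names `md5_of_file` and `matches_checksum` are hypothetical. The file is read in chunks so that the larger archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected_hex
```

This listing does not pair each checksum with a file name, so the expected value for a given archive should be taken from the record page itself before calling `matches_checksum`.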
Additional details
Software
- Repository URL: https://github.com/molssi-ai/bertology
- Programming language: Python
- Development status: Active