Published March 17, 2026 | Version 03-17-2026
Software | Open access

BERTology of Molecular Property Prediction (Tokenization Effect)

  • 1. Molecular Sciences Software Institute
  • 2. Virginia Tech

Description

Tokenization Effects on Pre-Training

This record contains the outputs of the Tokenization Effects experiments.
It includes the following nine archives:
  • tiny-bpe-vocab30522-ms-1234-ds-1234.tar.gz
  • tiny-bpe-vocab30522-ms-1234-ds-2345.tar.gz
  • tiny-bpe-vocab30522-ms-2345-ds-1234.tar.gz
  • small-bpe-vocab30522-ms-1234-ds-1234.tar.gz
  • small-bpe-vocab30522-ms-1234-ds-2345.tar.gz
  • small-bpe-vocab30522-ms-2345-ds-1234.tar.gz
  • base-bpe-vocab30522-ms-1234-ds-1234.tar.gz
  • base-bpe-vocab30522-ms-1234-ds-2345.tar.gz
  • base-bpe-vocab30522-ms-2345-ds-1234.tar.gz
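The archive names follow a regular pattern: model size (tiny, small, base), tokenizer (bpe), vocabulary size (30522), and two numeric fields tagged "ms" and "ds". The record does not say what "ms" and "ds" stand for; the sketch below assumes they are a model seed and a data seed. The names parse_archive_name and RunInfo are hypothetical helpers, not part of the bertology repository.

```python
import re
from typing import NamedTuple

# Matches names such as "tiny-bpe-vocab30522-ms-1234-ds-2345.tar.gz".
ARCHIVE_RE = re.compile(
    r"^(?P<size>tiny|small|base)-(?P<tokenizer>[a-z]+)-vocab(?P<vocab>\d+)"
    r"-ms-(?P<ms>\d+)-ds-(?P<ds>\d+)\.tar\.gz$"
)


class RunInfo(NamedTuple):
    model_size: str
    tokenizer: str
    vocab_size: int
    ms: int  # assumption: model-initialization seed
    ds: int  # assumption: data (shuffling/ordering) seed


def parse_archive_name(name: str) -> RunInfo:
    """Split an archive file name into its run settings."""
    m = ARCHIVE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized archive name: {name!r}")
    return RunInfo(
        model_size=m["size"],
        tokenizer=m["tokenizer"],
        vocab_size=int(m["vocab"]),
        ms=int(m["ms"]),
        ds=int(m["ds"]),
    )
```

For example, parse_archive_name("small-bpe-vocab30522-ms-2345-ds-1234.tar.gz") returns RunInfo(model_size='small', tokenizer='bpe', vocab_size=30522, ms=2345, ds=1234). Grouping the nine names by model_size yields three runs per size, one for each of the seed pairs (1234, 1234), (1234, 2345), and (2345, 1234).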
 
Each archive contains all model artifacts (checkpoints, random-number generator states, optimizer states, etc.), training logs (TensorBoard, MLflow, and Weights & Biases), evaluation results, configuration files, run scripts, SLURM sbatch driver scripts, and any additional artifacts generated during the experiments.
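Because the larger archives run to several gigabytes, it can help to look at an archive's layout before extracting it. Below is a minimal sketch using Python's standard tarfile module, assuming the archives are ordinary gzip-compressed tarballs as the .tar.gz extension suggests. The internal directory layout is not documented here, so the function just counts regular files per directory prefix; summarize_archive is a hypothetical helper.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path: str, depth: int = 2) -> Counter:
    """Count regular files per directory prefix (up to `depth` levels) in a .tar.gz.

    Streaming mode ("r|gz") reads the archive once, front to back, and
    extracts nothing to disk, which suits multi-gigabyte files.
    """
    counts: Counter = Counter()
    with tarfile.open(path, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # Directory part of the member path, truncated to `depth` levels.
            prefix = PurePosixPath(member.name).parent.parts[:depth]
            counts["/".join(prefix) or "."] += 1
    return counts
```

Calling summarize_archive("tiny-bpe-vocab30522-ms-1234-ds-1234.tar.gz") after download gives a per-directory file count. Streaming mode was chosen over the default random-access mode because it never seeks backwards, so a 15 GB archive is decompressed only once.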
 
Preprint: https://doi.org/10.48550/arXiv.2603.13627

Files

Total size: 34.6 GB

MD5 checksum                        Size
00eea6bf1c8b482efa6807bd39ab8d94    15.4 GB
846cd7c2b9c3d1e0edb8637831e98e48    3.5 GB
d97d84d7c871c8ae16dd96e51c5eb523    7.7 GB
35b3e2dc27f755932dc06084528be5c9    2.2 GB
6c18bca50de0d089eca8614fa2036209    2.2 GB
7208c1fb27df0d6fe32771d297868bb6    2.2 GB
0cdc6289889635ce0a7d6d0b05a1c33c    530.0 MB
08acbd42294e8726b64d0e48507c721c    538.0 MB
d6f40e463074f868d911ab9acc6a6488    537.6 MB
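After downloading, each archive can be checked against its published MD5 checksum. A minimal sketch using the standard hashlib module; it reads the file in 1 MiB chunks so that even the 15.4 GB archive never has to fit in memory. The helper names md5sum and verify are hypothetical.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with a published checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

Usage: verify("base-bpe-vocab30522-ms-1234-ds-1234.tar.gz", "<checksum listed for that file>"). MD5 detects accidental corruption during transfer; it is not a defense against deliberate tampering.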

Additional details

Software

Repository URL: https://github.com/molssi-ai/bertology
Programming language: Python
Development status: Active