BinaryCorp small train processed
Description
The processed BinaryCorp small training split, prepared for fine-tuning on binary code similarity detection.
This dataset was used in the paper "Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning":
@inproceedings{jiang2025nova,
  title     = {Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning},
  author    = {Nan Jiang and Chengxiao Wang and Kevin Liu and Xiangzhe Xu and Lin Tan and Xiangyu Zhang and Petr Babkin},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025}
}
The dataset was originally obtained from the paper "jTrans: Jump-Aware Transformer for Binary Code Similarity Detection":
@inproceedings{10.1145/3533767.3534367,
  author    = {Wang, Hao and Qu, Wenjie and Katz, Gilad and Zhu, Wenyu and Gao, Zeyu and Qiu, Han and Zhuge, Jianwei and Zhang, Chao},
  title     = {jTrans: Jump-Aware Transformer for Binary Code Similarity Detection},
  year      = {2022},
  publisher = {Association for Computing Machinery},
  url       = {https://doi.org/10.1145/3533767.3534367},
  doi       = {10.1145/3533767.3534367},
  booktitle = {Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis},
  pages     = {1--13},
  numpages  = {13},
  series    = {ISSTA 2022}
}
Files
| Name | Size | MD5 checksum |
|---|---|---|
| BinaryCorp_small_train.zip | 1.0 GB | c7f35543b4fae05c2003e09dae43d4cb |
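Because the archive is about 1.0 GB, it may be worth checking the download against the published MD5 checksum before unpacking it. The sketch below uses only Python's standard `hashlib` module and streams the file in chunks so it never has to hold the whole archive in memory. The helper names `md5_of_file` and `verify_archive`, and the assumption that the archive sits in the current directory under its original name, are illustrative only and not part of the dataset release.

```python
import hashlib
import sys
from pathlib import Path

# Checksum published for BinaryCorp_small_train.zip in the file table above.
EXPECTED_MD5 = "c7f35543b4fae05c2003e09dae43d4cb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical default location: the archive saved under its original name.
    archive = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("BinaryCorp_small_train.zip")
    if archive.exists():
        status = "OK" if verify_archive(archive) else "MISMATCH (re-download the file)"
        print(f"{archive}: {status}")
    else:
        print(f"{archive} not found; pass the archive path as the first argument.")
```

A mismatch usually means the download was truncated or corrupted. MD5 is adequate for detecting accidental corruption but is not a defense against deliberate tampering.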