Published October 20, 2022 | Version v1
Software | Open Access

BinT5: Binary Code Summarisation Model

  • 1. Delft University of Technology
  • 2. University of California, Davis

Description

BinT5

This record is published as part of the paper: "Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries".

It includes the fine-tuned CodeT5 checkpoints, packaged in a single .zip file.


For each of the models, a `pytorch.bin` file is provided in its respective folder.

These models can be loaded into CodeT5 and used for inference or further training.
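To see which model folders the archive contains, a small helper such as the one below can be run after extracting it (for example with `unzip BinT5.zip -d BinT5`). This is a sketch under stated assumptions: the function name `list_bint5_models` is hypothetical, and because the folder names inside the archive are not listed on this page, it searches for checkpoint files instead of assuming a particular layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every folder under an extracted BinT5.zip that
# contains a checkpoint. Matches both pytorch.bin and pytorch_model.bin,
# since the folder layout and exact file names are assumptions here.
list_bint5_models() {
  local root="$1"
  # Find checkpoint files, print the folder that holds each one, sorted.
  find "$root" -type f \( -name 'pytorch.bin' -o -name 'pytorch_model.bin' \) \
    -exec dirname {} \; | sort
}
```

Usage: `list_bint5_models BinT5` prints one folder path per model; any of these folders can be used in the setup steps.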

To utilise the models, clone the reference CodeT5-base model (`Salesforce/codet5-base`) from Hugging Face:

> GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Salesforce/codet5-base

  • This clones the repository but, because of `GIT_LFS_SKIP_SMUDGE=1`, leaves `pytorch_model.bin` as a small Git LFS pointer file instead of downloading the original weights. This file is replaced in the next step.
  • From the extracted `BinT5.zip`, select the folder of the model you wish to use. Copy its checkpoint file over `pytorch_model.bin` in the local `codet5-base` directory cloned in the previous step.
  • Load the local model instead of the one hosted on Hugging Face: in the CodeT5 repository, change line 66 of `sh/exp_with_args.sh` to the path of the local `codet5-base` directory configured in the previous step. The tokenizer does not need to be replaced.
  • The model can now be run by executing `sh/run_exp.py`.
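The first two steps can be scripted roughly as follows. This is a minimal sketch, not part of the release: the function names are hypothetical, the model folder path is a placeholder you choose from the extracted archive, and the edit to line 66 of `sh/exp_with_args.sh` is deliberately left as a manual step because its contents are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of setup steps 1 and 2; paths are placeholders.
set -euo pipefail

# Step 1: clone CodeT5-base. GIT_LFS_SKIP_SMUDGE=1 leaves pytorch_model.bin
# as a Git LFS pointer file instead of downloading the original weights.
clone_codet5_base() {
  local dest="${1:-codet5-base}"
  GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Salesforce/codet5-base "$dest"
}

# Step 2: copy a BinT5 checkpoint over the pointer file. Both file names are
# accepted, since the release describes the checkpoint as pytorch.bin.
install_checkpoint() {
  local model_dir="$1" target_dir="$2" src="" name
  for name in pytorch.bin pytorch_model.bin; do
    if [ -f "$model_dir/$name" ]; then
      src="$model_dir/$name"
      break
    fi
  done
  if [ -z "$src" ]; then
    echo "no checkpoint found in $model_dir" >&2
    return 1
  fi
  if [ ! -d "$target_dir" ]; then
    echo "target directory $target_dir does not exist" >&2
    return 1
  fi
  # Only the weights are replaced; the tokenizer files stay as cloned.
  cp "$src" "$target_dir/pytorch_model.bin"
}
```

Usage: `clone_codet5_base codet5-base && install_checkpoint BinT5/<model-folder> codet5-base`, then point line 66 of `sh/exp_with_args.sh` at the `codet5-base` directory by hand and run `sh/run_exp.py`.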

License

Copyright 2022 ##########

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Files

BinT5.zip (4.5 GB)
md5:fe42b04dcf8fa5d78de14ecd6cfed07c
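After downloading, the archive can be checked against the MD5 checksum listed above. The helper below is a hypothetical sketch that assumes GNU coreutils `md5sum` is available; on macOS, `md5 -q FILE` prints the bare digest instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<digest>  <file>"; keep only the digest field.
  actual=$(md5sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}
```

Usage: `verify_md5 BinT5.zip fe42b04dcf8fa5d78de14ecd6cfed07c && echo "checksum OK"`.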