On the Compression of Language Models for Code: An Empirical Study on CodeBERT

Giordano d'Aloisio; Luca Traini; Federica Sarro; Antinisca Di Marco

doi:10.5281/zenodo.14357478

Published December 10, 2024 | Version 0.1

This repository contains the data and scripts used in the paper "On the Compression of Language Models for Code: An Empirical Study on CodeBERT", accepted at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2025).

Repository Structure

The repository is structured as follows:

analysis: this folder contains the Jupyter notebooks used to analyze the data and produce the figures and tables presented in the paper.
Code-Code: this folder contains the code to fine-tune, compress, and evaluate CodeBERT on the vulnerability detection task. Refer to the README.md file in this folder for more details.
Code-Text: this folder contains the code to fine-tune, compress, and evaluate CodeBERT on the code summarization task. Refer to the README.md file in this folder for more details.
Text-Code: this folder contains the code to fine-tune, compress, and evaluate CodeBERT on the code search task. Refer to the README.md file in this folder for more details.

Setup

Install the required dependencies by running one of the following commands:

pip

pip install -r requirements.txt

conda

conda env create -f environment.yml
conda activate lm_compress

Next, refer to the README.md file in each of the Code-Code, Code-Text and Text-Code subfolders to download the datasets for each task.
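
Before downloading the datasets, it can help to confirm that the extracted archive has the layout described above. The following is a minimal sketch, not part of the repository: the helper name `folders_missing_readme` is hypothetical, and it assumes only that each of the three task folders sits directly under the repository root and contains a README.md file.

```python
from pathlib import Path

# Task folders named in the "Repository Structure" section.
TASK_FOLDERS = ("Code-Code", "Code-Text", "Text-Code")


def folders_missing_readme(repo_root):
    """Return the task folders under repo_root that lack a README.md file.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    root = Path(repo_root)
    return [
        name
        for name in TASK_FOLDERS
        if not (root / name / "README.md").is_file()
    ]


# Example call, assuming the archive was extracted into the current directory:
# print(folders_missing_readme("."))
```

An empty list means that all three per-task README.md files are in place; any folder named in the result was either not extracted or is missing its instructions.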

Files

Name	Size	MD5
giordanoDaloisio/lm-compression-evaluation-0.1.zip	44.7 MB	7a1ba9729890daec3be6328b15366bec
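
The MD5 checksum listed for the archive can be used to check that a download completed without corruption. The sketch below uses only the Python standard library. The function names `md5_of_file` and `archive_is_intact` are hypothetical, as is the local file name in the commented example, which depends on where the archive was saved.

```python
import hashlib

# MD5 checksum listed for lm-compression-evaluation-0.1.zip on the record page.
EXPECTED_MD5 = "7a1ba9729890daec3be6328b15366bec"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Example call, assuming the archive was saved under this (hypothetical) local name:
# print(archive_is_intact("lm-compression-evaluation-0.1.zip"))
```

A result of False suggests an incomplete or altered download, in which case fetching the archive again is the simplest remedy. MD5 is used here only because it is the checksum the record publishes; it guards against accidental corruption, not deliberate tampering.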

Additional details

Is supplement to (software): https://github.com/giordanoDaloisio/lm-compression-evaluation/tree/0.1

Repository URL: https://github.com/giordanoDaloisio/lm-compression-evaluation
