Artifact for Article (CI-DD-Perses): Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models
Description
This artifact contains the proposed prediction-preserving program reduction framework for code intelligence (CI) models, together with the reduced data produced by the Perses and Delta Debugging (DD) algorithms. It supports the findings of our paper 'Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models', accepted at MAPS'22. DOI: https://doi.org/10.1145/3520312.3534869
Given a set of input programs, the proposed approach reduces each input program using DD or Perses while preserving the CI model's prediction. The main insight is that, by reducing input programs of a target label, we can identify the key input features that the CI model relies on for that label. The approach removes irrelevant parts of an input program and keeps the minimal code snippet that the CI model needs to preserve its prediction. DD is syntax-unaware and does not follow the syntax of the programming language during reduction; therefore, an additional validity check of each candidate program is required. Perses, on the other hand, is syntax-guided and follows the syntax of the programming language during reduction. Also, DD reduces token-by-token (or char-by-char) from the deltas of the input program, while Perses reduces node-by-node from the tree of the input program. Because Perses uses its knowledge of program syntax to avoid generating syntactically invalid programs, it runs faster than DD and works only with valid programs. As a result, Perses and DD end up with different sets of features for explaining the model's prediction.
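The token-level, syntax-unaware reduction described above can be illustrated with a minimal sketch. This is not the artifact's implementation: `toy_model`, `is_valid`, and the sample token list are hypothetical stand-ins for a real CI model (such as Code2Vec), a real parser, and a real Java method, and `ddmin` is a simplified variant of the classic delta debugging loop.

```python
from typing import Callable, List, Sequence


def split(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous, nearly equal, non-empty chunks (n <= len(items))."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(tokens: Sequence[str], oracle: Callable[[List[str]], bool]) -> List[str]:
    """Simplified ddmin: shrink tokens while oracle(candidate) stays True."""
    current = list(tokens)
    if not oracle(current):
        raise ValueError("the original input must satisfy the oracle")
    n = 2  # current granularity (number of chunks)
    while len(current) >= 2:
        chunks = split(current, n)
        reduced = False
        for chunk in chunks:  # try to keep a single chunk
            if oracle(chunk):
                current, n, reduced = chunk, 2, True
                break
        if not reduced and n > 2:  # try to drop a single chunk
            for i in range(n):
                rest = [t for j, c in enumerate(chunks) if j != i for t in c]
                if oracle(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(current):
                break  # finest granularity reached, nothing more to remove
            n = min(len(current), n * 2)
    return current


def is_valid(tokens: List[str]) -> bool:
    """Hypothetical validity check: balanced brackets stand in for a real parser."""
    closing = {")": "(", "}": "{"}
    stack = []
    for t in tokens:
        if t in ("(", "{"):
            stack.append(t)
        elif t in closing:
            if not stack or stack.pop() != closing[t]:
                return False
    return not stack


def toy_model(tokens: List[str]) -> str:
    """Hypothetical classifier standing in for a CI model."""
    return "getName" if "return" in tokens and "name" in tokens else "other"


tokens = "String getName ( ) { log ( name ) ; return this . name ; }".split()
target = toy_model(tokens)


def oracle(candidate: List[str]) -> bool:
    # DD is syntax-unaware, so validity must be checked on every candidate.
    return is_valid(candidate) and toy_model(candidate) == target


reduced = ddmin(tokens, oracle)
print(len(tokens), "->", len(reduced), "tokens:", " ".join(reduced))
```

In this sketch, many candidates are rejected by `is_valid` before the model is ever consulted, which is the overhead that a syntax-guided reducer avoids by removing whole tree nodes instead of arbitrary token ranges.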
The proposed approach is model-agnostic and can be applied to various tasks and programming datasets. To evaluate the approach, we study two well-known code intelligence models (Code2Vec and Code2Seq), a popular code intelligence task (MethodName), and one commonly used programming language dataset (Java-Large) with different types of input programs (Frequent, Rare, Large, Small). We first provide a systematic comparison between syntax-guided program reduction (Perses) and syntax-unaware program reduction (DD) in terms of token reduction, reduction steps, and reduction time. Then, we summarize the extracted input features and their effects on multiple explanations and adversarial attacks.
Files
Name | Size |
---|---|
mdrafiqulrabin/CI-DD-Perses-v1.0.zip (md5:8ff06ad6d28b1cf2adb884528c706b14) | 43.9 MB |
Additional details
Related works
- Is derived from
  - Conference paper: 10.1145/3520312.3534869 (DOI)
  - Preprint: https://arxiv.org/abs/2205.14374 (URL)
- Is supplement to
  - Project deliverable: https://github.com/mdrafiqulrabin/CI-DD-Perses/tree/v1.0 (URL)