Published May 27, 2023 | Version v1
Dataset Open

Artifacts for the ISSTA 2023 Paper: An Empirical Study on the Effects of Obfuscation on Static Machine Learning-based Malicious JavaScript Detectors

Creators

  • 1. Huazhong University of Science and Technology

Description

An Empirical Study on the Effects of Obfuscation on Static Machine Learning-Based Malicious JavaScript Detectors

This repository contains the evaluation script and the corresponding data of the ISSTA'23 paper "An Empirical Study on the Effects of Obfuscation on Static Machine Learning-Based Malicious JavaScript Detectors".

Abstract

Machine learning is increasingly being applied to malicious JavaScript detection in response to the growing number of Web attacks and the costly manual identification they entail. In practice, to hide malicious behaviors or to protect intellectual property, both malicious and benign scripts tend to be obfuscated before release. While obfuscation is beneficial, it also introduces additional code features (e.g., dead code) into the code. When machine learning is employed to learn a malicious JavaScript detector, these additional features can affect the model and make it less effective. However, there is still a lack of clear understanding of how robust existing machine learning-based detectors are against different obfuscators.

In this paper, we conduct the first empirical study to figure out how obfuscation affects machine learning detectors based on static features. From the results, we make several observations: 1) Obfuscation has a significant impact on the effectiveness of detectors, causing an increase in both the false negative rate (FNR) and the false positive rate (FPR), and obfuscation bias in the training set induces detectors to detect obfuscation rather than malicious behaviors. 2) Common measures, such as improving the quality of the training set by adding relevant obfuscated samples and leveraging state-of-the-art deep learning models, do not work well. 3) The root cause of the obfuscation effects on these detectors is that the feature spaces they use reflect only shallow differences in code rather than the nature of benign and malicious behavior, and these shallow features are easily affected by the differences introduced by obfuscation. 4) Obfuscation has a similar effect on real-world detectors in VirusTotal, indicating that this is a common real-world problem.

Getting Started

Requirements

Install Python 3 (version 3.9.12 was used) and the Python dependencies:

pip3 install -r requirements.txt

Install Node.js and npm, then the JavaScript dependencies:

npm install escodegen
npm install esprima

Step 1: Generating PDGs for JStap

cd detectors/jstap/pdg_generation

python generate_PDGs.py
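
For orientation, here is a minimal sketch of the first stage behind PDG generation: parsing a JavaScript sample into an AST. It uses the Python port of esprima purely for illustration (pip install esprima); generate_PDGs.py itself drives the Node.js esprima/escodegen packages installed above and goes on to build full program dependence graphs, not just ASTs.

# Illustration only: AST parsing, the first step toward a PDG.
# The actual pipeline in detectors/jstap/pdg_generation uses the
# Node.js esprima/escodegen packages and adds control/data-flow edges.
import esprima

sample = """
var url = "http://example.com/payload";
eval(unescape(url));
"""

# parseScript returns the AST as nested node objects.
ast = esprima.parseScript(sample)

# Walk the top-level statements and print their node types.
for node in ast.body:
    print(node.type)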

Step 2: Getting the results for RQ1: What Impact Does Obfuscation Have on Static Machine Learning Malicious JavaScript Detectors?

cd RQ1/

1. Detectors' Performance on Obfuscated Samples.

To train the models:

python RQ1_1_train.py

To get the results:

python RQ1_1_test.py

2. Different Machine Learning Algorithms.

To train the models:

python RQ1_2_train.py

To get the results:

python RQ1_2_test.py

3. Biased Training Sets

To train the models:

python RQ1_3_train.py

To get the results:

python RQ1_3_test.py

All the trained models will be stored in RQ1/models/.

All the results will be stored in RQ1/results/.

Step 3: Getting the results for RQ2: Are the Common Measures to Mitigate the Impact of Obfuscation Effective?

cd RQ2/

1. Training and Testing Detectors on Samples with Same Types of Obfuscation.

To train the models:

python RQ2_1_train.py

To get the results:

python RQ2_1_test.py

2. Training and Testing Detectors on Samples with Different Types of Obfuscation.

If you followed the previous steps, these models are already trained.

To get the results:

python RQ2_2_test.py

3. BERT Variants.

To get the results:

python RQ2_3.py

All the trained models will be stored in RQ2/models/.

All the results will be stored in RQ2/results/.


Step 4: Getting the results for RQ3: What Is the Root Cause of Obfuscation Affecting Static Machine Learning Malicious JavaScript Detectors?

To get the results for vector visualization, the top ten features, and the distances between vector sets:

cd RQ3

python visulization.py

The figures of vectors visualization will be stored in RQ3/results/.

Other results will be shown in the console.


Step 5: Getting the results for RQ4: How Does Obfuscation Affect Real-world Static Malicious JavaScript Detectors?

To get the results, submit the samples under the folder samples/ to VirusTotal.


Detailed Instructions

detectors

The detectors under the folder detectors/ (CUJO, ZOZZLE, JAST, and JSTAP) are the main projects evaluated in our paper.

Detailed setup and usage instructions are described in README.md in the corresponding folder.

samples

The files under the folder samples/ are a random tenth of the dataset used in our paper.

Results can be obtained quickly using these samples. These results will not be exactly the same as in the paper, but they are similar.

RQ1

The code under the folder RQ1/ investigates how obfuscation affects these detectors.

RQ1_1_train.py trains the four detectors with unobfuscated samples.

RQ1_1_test.py tests these trained detectors with unobfuscated and obfuscated samples.
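
As a reminder of the metrics reported in the paper, the following sketch shows how the false negative rate and false positive rate of a detector can be computed with scikit-learn. The label convention and the toy predictions are illustrative assumptions, not the repository's code.

# Illustration only: computing FNR/FPR for a malicious-JS detector.
# Convention assumed here: label 1 = malicious, label 0 = benign.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0]   # ground-truth labels of the test samples
y_pred = [1, 0, 0, 1, 1, 0]   # detector predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
fnr = fn / (fn + tp)   # malicious samples missed by the detector
fpr = fp / (fp + tn)   # benign samples flagged as malicious
print(f"FNR = {fnr:.2f}, FPR = {fpr:.2f}")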

RQ1_2_train.py trains the detector ZOZZLE with different machine learning algorithms.

RQ1_2_test.py tests these trained models with unobfuscated and obfuscated samples.
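
The idea behind this experiment is to keep the static features fixed and only swap the learning algorithm. Below is a minimal sketch of that setup with scikit-learn; the random stand-in features and the particular classifier list are assumptions for illustration, not necessarily what RQ1_2_train.py uses.

# Illustration only: training several classifiers on the same feature matrix.
# Random binary features stand in for the ZOZZLE-style static features.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50))   # stand-in feature matrix
y = rng.integers(0, 2, size=200)         # stand-in labels (1 = malicious)

classifiers = {
    "naive_bayes": BernoulliNB(),
    "linear_svm": LinearSVC(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")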

RQ1_3_train.py trains the detectors on two biased training sets: one containing only unobfuscated benign samples and obfuscated malicious samples, and one containing only obfuscated benign samples and unobfuscated malicious samples.

RQ1_3_test.py uses these detectors to detect unobfuscated benign samples, obfuscated benign samples, unobfuscated malicious samples, and obfuscated malicious samples, respectively.
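
To make the biased-training-set setup concrete, here is a sketch of how one of the two biases (unobfuscated benign plus obfuscated malicious) could be assembled. The folder layout and helper function are hypothetical and do not reflect the actual logic of RQ1_3_train.py.

# Illustration only: a training set in which obfuscation correlates
# perfectly with the class label, so a detector can score well by
# recognising obfuscation artifacts instead of malicious behavior.
from pathlib import Path

def load_samples(folder):
    """Return the JavaScript files under a (hypothetical) sample folder."""
    return sorted(Path(folder).glob("*.js"))

benign_unobf  = load_samples("samples/benign/unobfuscated")   # placeholder paths
malicious_obf = load_samples("samples/malicious/obfuscated")

train_files  = benign_unobf + malicious_obf
train_labels = [0] * len(benign_unobf) + [1] * len(malicious_obf)
print(f"{len(train_files)} training samples, {sum(train_labels)} labelled malicious")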

RQ2

The code under the folder RQ2/ studies whether the two common measures to mitigate the impact of obfuscation are effective.

RQ2_1_train.py uses obfuscated samples to train four detectors.

RQ2_1_test.py tests these detectors on the same type of obfuscated samples.

RQ2_2_test.py tests these detectors on types of obfuscated samples different from those they were trained on.

RQ2_3.py uses the BERT variants to generate code representations of unobfuscated samples, trains the detector on these representations, and tests the trained detector with the representations of obfuscated samples.
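
As an illustration of that pipeline, the sketch below embeds a script with one publicly available BERT variant (CodeBERT) through the Hugging Face transformers library; the resulting vector would then be fed to a downstream classifier. The model name and the use of the first-token embedding are assumptions made here for illustration; RQ2_3.py may use different variants and pooling.

# Illustration only: turning a JavaScript sample into a fixed-size
# code representation with a BERT variant (CodeBERT as one example).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = 'var s = unescape("%61%6c%65%72%74"); eval(s + "(1)");'
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Use the first-token ([CLS]) hidden state as the script's representation.
embedding = outputs.last_hidden_state[:, 0, :].squeeze(0)
print(embedding.shape)   # torch.Size([768])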

RQ3

The code under the folder RQ3/ visualizes the vectors, extracts the ten most important features, and calculates the distances between different sets of vectors.
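
For a sense of what the visualization step produces, here is a minimal sketch that projects feature vectors to 2D with t-SNE and plots unobfuscated versus obfuscated samples. The random stand-in vectors and the choice of t-SNE are assumptions; visulization.py may use a different projection and styling.

# Illustration only: projecting feature vectors to 2D to see whether
# samples cluster by obfuscation rather than by maliciousness.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
unobf = rng.normal(0.0, 1.0, size=(100, 50))   # stand-in: unobfuscated vectors
obf   = rng.normal(0.5, 1.0, size=(100, 50))   # stand-in: obfuscated vectors

proj = TSNE(n_components=2, random_state=0).fit_transform(np.vstack([unobf, obf]))

plt.scatter(proj[:100, 0], proj[:100, 1], s=10, label="unobfuscated")
plt.scatter(proj[100:, 0], proj[100:, 1], s=10, label="obfuscated")
plt.legend()
plt.savefig("tsne_example.png")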

RQ4

There is no code related to RQ4 here because the actual operation of RQ4 is to submit the samples to VirusTotal.
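
If you want to script the submission instead of using the VirusTotal web interface, the sketch below uploads one sample via the public VirusTotal v3 files endpoint; a personal API key is required, and the sample path is a placeholder.

# Illustration only: submitting a sample to VirusTotal via the v3 API.
import requests

API_KEY = "YOUR_VIRUSTOTAL_API_KEY"      # personal API key (placeholder)
sample_path = "samples/example.js"       # placeholder sample path

with open(sample_path, "rb") as f:
    response = requests.post(
        "https://www.virustotal.com/api/v3/files",
        headers={"x-apikey": API_KEY},
        files={"file": (sample_path, f)},
    )

response.raise_for_status()
print(response.json()["data"]["id"])     # analysis id to poll for the report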


The whole dataset is available at https://drive.google.com/file/d/1a7pNUwzikiJyY9L7dIu53I6_MR0oDpgi/view?usp=sharing.


Cite this work

@inproceedings{staticanalysis,
  author    = {Kunlun Ren and Weizhong Qiang and Yueming Wu and Yi Zhou and Deqing Zou and Hai Jin},
  title     = {An Empirical Study on the Effects of Obfuscation on Static Machine Learning-Based Malicious JavaScript Detectors},
  booktitle = {Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'23)},
  year      = {2023}
}


Files

Obfucation_effects.zip (139.8 MB)

md5:ed01619706f287a51686f7080e5d3155