Software Open Access

[Artifact] Nessie: Automatically Testing JavaScript APIs with Asynchronous Callbacks

Anonymous

This is a description of the artifact associated with paper #415, "Nessie: Automatically Testing JavaScript APIs with Asynchronous Callbacks". It includes setup/installation instructions for our tool Nessie, and some example usage.
Note: this readme is also included in the tarball itself, as Readme.pdf and Readme.md (these versions both have slightly nicer formatting :) )

It also includes instructions for rerunning our experiments in the paper, and reproducing the plots. Note that since there are non-deterministic elements of Nessie (we use a random number generator in argument selection, etc.), the results produced by rerunning the experiments will not be exactly the same as those in the paper. We've also included the original data from when we ran our experiments, so that the plots in the paper can be reproduced exactly.

We use the package fs-extra as our working example walking through all of the experiments, but the process is the same for all the packages we tested.

Relevant Contents

RegressionTesting directory: raw data from the regression testing experiment

CoverageExperiment directory: raw data from the coverage experiment

MiningDataUseExperiment directory: raw data from the mined data use experiment

data_analysis.py: script for interacting with the data

ToolCode: code base for the tool

paper.pdf: paper submission associated with this artifact

supplementary.pdf: supplementary materials to go along with the paper

REQUIREMENTS.md: system requirements

INSTALL.md: instructions for installing the Nessie test generator, and a simple example of using it

STATUS.md: file stating what kind of badges we are applying for

README.md: this file, describes how to use Nessie, run the experiments in the paper, and reproduce our graphs

Docker files

Dockerfile, build.sh, runDocker.sh: files for building and running the Nessie docker container

System Requirements

This information is also in the REQUIREMENTS.md file.

We provide instructions for building a docker container that runs Nessie, so the only real requirement is that you have docker installed on your machine.

Experiments

We ran our experiments on a machine with two 32-core 2.35GHz AMD EPYC 7452 CPUs and 128GB RAM, running CentOS 7.8.2003 and Node.js 14.16.1.

You don't need this much RAM to run anything: we tested the artifact on a ThinkPad P43s with 32GB RAM running ArchLinux (kernel 5.14.8-arch1-1) and the time estimates for the provided examples are from this machine; the longest experiment takes 10 minutes. That being said, the examples we give in the artifact are not the full experiments from the paper, as these would take too long -- this is all explained in the experiments sections below.

Installation

This information is also in the INSTALL.md file.

First, unzip the artifact tarball and enter the generated directory. From there, build the docker container:

docker build -t nessie .


Then, enter the docker container. We've provided a script for this:

./runDocker.sh


To use our tool, once you're in the docker:

cd ToolCode


To install/build, run:

npm install

npm run clean
npm run compile


Test generation

To use the tool to generate tests for a repo:

Make sure you're still in the ToolCode directory.

First, download a repo to generate tests for. This example will be with fs-extra:

git clone https://github.com/jprichardson/node-fs-extra.git


Then, generate the tests. From the ToolCode directory, run

# ./genTestsForRepoCommit.sh [package-name] [path-to-package-directory] [number-of-tests] [type-of-test] [commit-hash-to-gen-for]

./genTestsForRepoCommit.sh fs-extra node-fs-extra 10 NonEmptyCallback master


This will generate 10 tests for the fs-extra package, in a directory called _fs_extra_test. The relevant contents of the generated directory are the test files test[0-9].js and the test runner metatest.js.

To run the generated tests:

mocha _fs_extra_test/metatest.js


You should see the 10 tests passing, with output printing the state of the arguments, return types, callbacks executing, etc.

RQ1: API discovery experiment

The lists of all the signatures manually and automatically discovered, for all APIs, are in Section 6 of the supplementary materials. This data is summarized in Table 2 in the paper.

RQ2: Coverage experiment

In our evaluation of Nessie in the paper, we did an experiment to determine the effect of including nesting and sequencing on the code coverage achieved by the generated tests, and compared it to the coverage achieved by using the previous state of the art tool LambdaTester. These results are summarized in Table 3, with Figure 3 as a package-specific example, in the paper.

This experiment took about an hour for each package tested, and we have 10 packages in the evaluation. We include instructions below on how to rerun the experiments, but we also show how to run a small demonstrative example of the experiment (so that it can finish in a reasonable amount of time). We've included all of the data from the original experiments too, and instructions on interacting with this data so you can reproduce the plots from the paper.

Rerunning coverage experiments

For each package tested, we generate 1000 tests and compute the cumulative coverage using the nyc Istanbul command line coverage tool.
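Here, "cumulative coverage" means the coverage achieved by the union of all tests generated so far, so the series is non-decreasing in the number of tests. As an illustrative sketch (this is not code from the artifact; it models each test's coverage as a set of statement ids rather than using nyc), the accumulation looks like:

```python
# Hypothetical model of cumulative coverage: each test covers a set of
# statement ids; the cumulative value after k tests is the coverage of
# the union of the first k sets.
def cumulative_coverage(per_test_covered, total_statements):
    """Return the cumulative coverage percentage after each test."""
    covered = set()
    series = []
    for stmts in per_test_covered:
        covered |= stmts  # union with everything covered so far
        series.append(100.0 * len(covered) / total_statements)
    return series

# Three tests over a 10-statement module:
print(cumulative_coverage([{0, 1, 2}, {1, 2, 3}, {7, 8}], 10))
# → [30.0, 40.0, 60.0]
```

This is why the coverage curves in the paper flatten out: later tests mostly re-cover statements already hit by earlier ones.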

If you want to reproduce the results from the paper (modulo the nondeterminism of the approach) then you should run the test generator at the commit at which we tested the repo. Table 1 in the paper lists the SHA of the commit we tested each library on.

We use the package fs-extra as a demonstrative example.

First: make sure you're in the ToolCode directory, and download and install the package you wish to generate tests for.

git clone https://github.com/jprichardson/node-fs-extra/ Evaluation/node-fs-extra


Then, use Nessie to generate the tests and determine sequential (cumulative) coverage. This will take about an hour. If you want to do a shorter version of the experiment, we recommend trying it with 50 tests instead -- this should take about 3 minutes. The example command below uses 50 tests; if you want to run the full 1000 test experiment change the 50 to 1000 in the argument list.

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback NestSeq 6bffcd8


The results will be in the newly generated file Evaluation/fs-extra_50tests_NestSeq_coverage.txt.
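The data analysis script in this artifact parses these coverage files as one floating-point coverage percentage per line, so the last line is the cumulative coverage after all generated tests. A small sketch of reading one (demonstrated on a synthetic file; `final_coverage` is a hypothetical helper, not part of the artifact, but its parsing logic mirrors the analysis snippets used later in this README):

```python
# Quick check of a generated coverage file: one coverage percentage per
# line, so the last value is the cumulative coverage after all tests.
def final_coverage(path):
    with open(path) as f:
        values = [float(i) for i in f.read().split("\n") if len(i) > 0]
    return values[-1] if values else None

# Demo on a synthetic file (the real one would be, e.g.,
# Evaluation/fs-extra_50tests_NestSeq_coverage.txt):
with open("demo_coverage.txt", "w") as f:
    f.write("12.5\n30.0\n41.2\n")
print(final_coverage("demo_coverage.txt"))  # → 41.2
```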

For this experiment, we compared Nessie with nesting+sequencing, with just sequencing, and the old LambdaTester approach.

If you want to rerun the whole experiment you'll need to rerun the above command with both other options (each of these should also take about 3 minutes):

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback Seq 6bffcd8
./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback LambdaTester 6bffcd8


This generates the corresponding files Evaluation/fs-extra_50tests_Seq_coverage.txt and Evaluation/fs-extra_50tests_LambdaTester_coverage.txt.

Instructions for examining this new data are included in the next section.

Data analysis / reproducing the plots

Return to the home directory of the docker: cd /home/nessie.

To run the data analysis script, run ipython3 -i data_analysis.py from the home directory of the docker.

To generate the coverage chart for a particular package (for example, fs-extra), run the following code in the ipython session:

pkgname = "fs-extra"
with open("CoverageExperiment/"+pkgname+"_coverage_nested.out") as f:
    nest_cov = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("CoverageExperiment/"+pkgname+"_coverage_nonest.out") as f:
    nonest_cov = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("CoverageExperiment/"+pkgname+"_coverage_sync.out") as f:
    LT_cov = [float(i) for i in f.read().split("\n") if len(i) > 0]

graph_multi_cov([[nest_cov, "NES (seq+nest)", '-x'], [nonest_cov, "NES (seq)", '-o'], [LT_cov, "LT", '-*']],
                MB_LBS.get(pkgname, {}).get("LB", 0), pkgname, [0, 100])


If you decided to rerun the experiments and want to look at your data from these instead of the data from the paper, simply load those generated files instead of the provided data files. For our running example:

pkgname = "fs-extra"
num_tests = 50
with open("ToolCode/Evaluation/"+pkgname+"_"+str(num_tests)+"tests_NestSeq_coverage.txt") as f:
    nest_cov = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("ToolCode/Evaluation/"+pkgname+"_"+str(num_tests)+"tests_Seq_coverage.txt") as f:
    nonest_cov = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("ToolCode/Evaluation/"+pkgname+"_"+str(num_tests)+"tests_LambdaTester_coverage.txt") as f:
    LT_cov = [float(i) for i in f.read().split("\n") if len(i) > 0]

graph_multi_cov([[nest_cov, "NES (seq+nest)", '-x'], [nonest_cov, "NES (seq)", '-o'], [LT_cov, "LT", '-*']],
                MB_LBS.get(pkgname, {}).get("LB", 0), pkgname, [0, 100])


Note that if you graph these results they will not perfectly match the charts in the paper, due to the non-determinism in the Nessie test generator. Note also that if you only ran 50 tests, this will look like the very beginning of the graph in the paper (as it goes up to 1000 tests in our evaluation).

When you're done, you can exit the ipython session with CTRL-D and selecting y when prompted.

RQ3: Regression testing experiments

In our evaluation, we also did an experiment to compare Nessie with nesting + sequencing, with just sequencing, and the LambdaTester approach in terms of their efficacy in generating tests that flag behavioural differences in the code bases through regression testing. This consisted of generating tests at different commits, and then doing a diff on the test output to identify differences. We then parsed these differences to determine what kind of difference we are seeing: for example, if a callback is executed in one commit version of the code base and not the next, if there is an error in API call execution at one commit and not the next, etc. The results of this experiment are summarized in Tables 4 and 5, with a package-specific example plot in Figure 4 in the paper.

For each library in our evaluation, we generated 100 tests and ran each of them 10 times (to avoid false differences caused by async execution), over 100 sequential source-code-modifying commits. This experiment took between 3 and 4 hours per library (depending on how many source-code-modifying commits a library has). As with the code coverage experiment, we include instructions for rerunning all of the experiments, but we've also included a shorter, demonstrative example of the experiment. The full data from the evaluation and instructions on how to reproduce the plots from the paper are included below.
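The role of the repetitions can be sketched as follows: a difference between two commits is only trustworthy if it shows up consistently across reps, since differences caused by nondeterministic async scheduling vary from run to run. This is an illustrative sketch of that filtering idea, not the artifact's actual diff-analysis code:

```python
# Hypothetical illustration of why multiple reps help: flag a difference
# between two commits only if it appears in every rep, so nondeterministic
# async-ordering noise (which varies run to run) is filtered out.
def stable_diffs(diffs_per_rep):
    """Intersect the diff sets observed in each rep."""
    stable = set(diffs_per_rep[0])
    for rep in diffs_per_rep[1:]:
        stable &= set(rep)
    return stable

reps = [
    {"cb_not_executed:copy", "error:move"},    # rep 1
    {"cb_not_executed:copy", "timing_noise"},  # rep 2 (noise differs)
    {"cb_not_executed:copy", "error:move"},    # rep 3
]
print(sorted(stable_diffs(reps)))  # → ['cb_not_executed:copy']
```

Only the difference that persists across every rep survives; run-specific noise is dropped.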

Rerunning regression testing experiments

Return to the ToolCode directory: cd ToolCode

In our experiments we ran, for each library, 100 tests with 10 reps each, over 100 commits. This experiment takes a very long time (3-4 hours depending on the library) because of all the pairwise diffs required. If you want to do a mini example, you can run:

# general case
# ./regTestRelCommits.sh libName repoSrcDir numTests feedbackMode numTestReps branch numCommitsToAnalyze testGenStrat

./regTestRelCommits.sh fs-extra Evaluation/node-fs-extra/ 10 NonEmptyCallback 2 master 10 NestSeq


This generates 10 tests for the fs-extra running example, for 10 sequential commits for which there are changes to the source code (i.e., where the git log includes modifications to JS/TS files), with 2 reps each. It should take about 9 minutes.

The output is the following set of files:

• fs-extra_10_<commit SHA 1>_<commit SHA 2>_seqDiffs.out: these files are the lists of differences determined between the source code at commit SHA 1 and commit SHA 2. In our regression testing experiments, these files are the input for the chart generation; when we provide the data from our evaluation, it is these seqDiff files.
• fs-extra_commits.out: the sequence of commits analyzed
• testlog_testfs-extra_<commit SHA 1>_<commit SHA 2>_<rep number>.log: the output from running each rep of each test suite at each set of commits. These files are what get analyzed by the diff analysis to produce the seqDiff files; you can ignore these.
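Since each seqDiffs file encodes a commit pair in its name, the pairs can be recovered by parsing the filenames. A hypothetical helper for doing so (the SHAs in the demo are made up; only the filename pattern is taken from the list above):

```python
import re

# Hypothetical helper: recover the (commit SHA 1, commit SHA 2) pairs from
# seqDiffs filenames of the form <pkg>_<numTests>_<sha1>_<sha2>_seqDiffs.out.
def commit_pairs(filenames, pkg, num_tests):
    pat = re.compile(
        re.escape(pkg) + "_" + str(num_tests) + r"_([0-9a-f]+)_([0-9a-f]+)_seqDiffs\.out"
    )
    pairs = []
    for name in filenames:
        m = pat.fullmatch(name)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

names = ["fs-extra_10_ab12cd3_ef45ab6_seqDiffs.out", "fs-extra_commits.out"]
print(commit_pairs(names, "fs-extra", 10))  # → [('ab12cd3', 'ef45ab6')]
```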

This command generates the tests for the NestSeq generation strategy: Nessie with both nesting and sequencing. If you want to generate the same files for the Seq and LambdaTester strategies, just replace NestSeq with the respective strategy name. Note, though, that you'll need to move the NestSeq files into a new directory if you want to avoid them being overwritten.

If you want to rerun all these experiments and look at graphs of the results, follow the instructions for reproducing the graphs from the paper (see below), but replace the paths to the directories containing our experimental data with the paths to your new data.

Data analysis / reproducing the plots

Return to the home directory of the docker: cd /home/nessie.

To run the data analysis script, run ipython3 -i data_analysis.py from the home directory of the docker.

To generate the regression testing chart for a particular commit (for example, commit 2c38bf4 for jsonfile, which is Figure 4 in the paper), run the following code in the ipython session:

pkgname = "jsonfile"
k = 40
(nested_ys, all_lists, all_indices) = get_lists_data("RegressionTesting/SeqDiffsNesting/", pkgname)
(nonest_ys, all_lists, all_indices) = get_lists_data("RegressionTesting/SeqDiffsNoNesting/", pkgname)
(oldLT_ys, all_lists, all_indices) = get_lists_data("RegressionTesting/SeqDiffsOldLT/", pkgname)
graph_multi_diffs([[[nested_ys[k]], "NES (seq+nest)", '-x'], [[nonest_ys[k]], "NES (seq)", '-o'], [[oldLT_ys[k]], "LT", '-*']],
                  pkgname, all_indices[k])


And, to generate the regression testing chart for all commits (for example, for jsonfile), it is the same graphing command but we ignore the particular index k.

graph_multi_diffs([[nested_ys, "NES (seq+nest)",'-x'], [nonest_ys, "NES (seq)",'-o'], [oldLT_ys, "LT",'-*']], pkgname)


RQ4: Mined Data

All of the nesting examples we mined are included in Section 5 in the supplemental materials.

We also did an experiment very similar to the coverage experiment: for each package tested, we generated 1000 tests and computed the cumulative coverage for Nessie while using the mined data a specified percentage of the time when generating nested function calls. In the experiment, we tested this percentage at 0%, 25%, 50%, 75%, and 100%; the results are in Table 6 in the paper.
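The "percentage of mined data use" parameter can be understood as a biased coin flip each time a nested call is generated: with probability p, reuse a mined nesting example; otherwise fall back to random generation. A hypothetical sketch of that selection logic (not the actual Nessie implementation):

```python
import random

# Hypothetical sketch of "use mined data p% of the time": each time a nested
# call is generated, flip a biased coin to decide between a mined nesting
# example and a randomly generated one.
def pick_nesting(p_mined, mined_examples, random_gen, rng=random):
    if mined_examples and rng.random() < p_mined:
        return rng.choice(mined_examples)
    return random_gen()

# At the extremes the choice is deterministic:
print(pick_nesting(1.0, ["mined"], lambda: "random"))  # → mined
print(pick_nesting(0.0, ["mined"], lambda: "random"))  # → random
```

At 0% this degenerates to purely random nested-call generation, and at 100% every nested call comes from the mined corpus, matching the two endpoints tested in Table 6.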

Rerunning mined data use experiments

If you want to reproduce the results from the paper (modulo the nondeterminism of the approach) then you should run the test generator with the additional argument of the mined data use (the default is 50%). We use the package fs-extra as a demonstrative example again.

Make sure you're in the ToolCode directory: cd /home/nessie/ToolCode

Then, use Nessie to generate the tests and determine sequential (cumulative) coverage. This will take about an hour. Like the coverage experiment, if you want to do a shorter version of the experiment, we recommend trying it with 50 tests instead -- this should take about 3 minutes. The example command below uses 50 tests. Notice that it is the same command as in the coverage experiment, with the additional argument specifying the percentage of mined data use (here, 0).

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback NestSeq 6bffcd8 0


The results will be in the newly generated file Evaluation/fs-extra_50tests_NestSeq_coverage.txt.

If you want to rerun the whole experiment you'll need to rerun the above command with the other mined data use options (each of these should also take about 3 minutes). Note that you'll need to save the previously generated file, as it will be overwritten otherwise:

mv Evaluation/fs-extra_50tests_NestSeq_coverage.txt Evaluation/fs-extra_50tests_NestSeq_coverage_mined0.txt

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback NestSeq 6bffcd8 0.25
mv Evaluation/fs-extra_50tests_NestSeq_coverage.txt Evaluation/fs-extra_50tests_NestSeq_coverage_mined25.txt

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback NestSeq 6bffcd8 0.5
mv Evaluation/fs-extra_50tests_NestSeq_coverage.txt Evaluation/fs-extra_50tests_NestSeq_coverage_mined50.txt

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback NestSeq 6bffcd8 0.75
mv Evaluation/fs-extra_50tests_NestSeq_coverage.txt Evaluation/fs-extra_50tests_NestSeq_coverage_mined75.txt

./genTestsAndCumulativeCoverage.sh fs-extra Evaluation/node-fs-extra 50 NonEmptyCallback NestSeq 6bffcd8 1
mv Evaluation/fs-extra_50tests_NestSeq_coverage.txt Evaluation/fs-extra_50tests_NestSeq_coverage_mined100.txt


If you want to rerun all these experiments and look at graphs of the results, follow the instructions for reproducing the graphs from the paper (see below), but replace the paths to the files containing our experimental data with the paths to your new data.

Data analysis / reproducing the plots in the supplementary materials

We don't have any plots of the effect of mined data use in the paper itself; the results are summarized in Table 6 in the paper. However, we do have plots in Section 3 of the supplementary materials. These are the same style of coverage plots as in the coverage experiment, but varying the mined data use instead of the nesting/sequencing strategy.

Return to the home directory of the docker: cd /home/nessie.

To run the data analysis script, again run ipython3 -i data_analysis.py from the home directory of the docker.

To generate the coverage chart for a particular package (for example, fs-extra), run:

pkgname = "fs-extra"
with open("MiningDataUseExperiment/"+pkgname+"_coverage_mining0.out") as f:
    mining0 = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("MiningDataUseExperiment/"+pkgname+"_coverage_mining25.out") as f:
    mining25 = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("MiningDataUseExperiment/"+pkgname+"_coverage_mining50.out") as f:
    mining50 = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("MiningDataUseExperiment/"+pkgname+"_coverage_mining75.out") as f:
    mining75 = [float(i) for i in f.read().split("\n") if len(i) > 0]

with open("MiningDataUseExperiment/"+pkgname+"_coverage_mining100.out") as f:
    mining100 = [float(i) for i in f.read().split("\n") if len(i) > 0]

graph_multi_cov([[mining0, "0%", '-o'], [mining50, "50%", '-x'], [mining100, "100%", '-*']],
                MB_LBS.get(pkgname, {}).get("LB", 0), pkgname)


The graphs in the supplementary materials only include data from mining levels 0%, 50%, and 100%. If you want to see all of the data (like that in Table 6 in the paper), just add the 25% and 75% data to the graphing command:

graph_multi_cov([[mining0, "0%", '-o'], [mining25, "25%", '-.'], [mining50, "50%", '-x'],
                 [mining75, "75%", '-^'], [mining100, "100%", '-*']],
                MB_LBS.get(pkgname, {}).get("LB", 0), pkgname)


RQ5: Performance of Nessie

We report the performance of Nessie using the Linux time command. We ran the test generator for 100 tests, timed it with time, repeated this 10 times (per library), and took the average of the user time. These results were obtained by running Nessie on the CentOS machine described above; if you choose to rerun this experiment in the provided docker container, you will likely get different performance numbers.
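The averaging step can be sketched as follows, assuming the `user` lines come from bash's `time` builtin in its default `0m12.345s` format (the helper below is illustrative, not part of the artifact):

```python
# Hypothetical sketch of the averaging step: given the `user` lines reported
# by bash's `time` builtin over the repeated runs (e.g. "user 0m12.345s"),
# convert each to seconds and average them.
def avg_user_time(lines):
    secs = []
    for line in lines:
        value = line.split()[1]       # e.g. "0m12.345s"
        mins, rest = value.split("m")
        secs.append(60 * int(mins) + float(rest.rstrip("s")))
    return sum(secs) / len(secs)

print(avg_user_time(["user 0m10.000s", "user 0m12.000s"]))  # → 11.0
```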

Files

ExperimentalData.tgz (2.2 MB, md5:b86401913110a83478e35efd7a79e658)