Demo: Detecting Third-Party Library Problems with Combined Program Analysis

Third-party libraries ease the software development process and have thus become an integral part of modern software engineering. Unfortunately, they are not usually vetted by human developers and are therefore often responsible for introducing bugs, vulnerabilities, or attacks into programs that eventually reach end-users. In this demonstration, we present a combined static and dynamic program analysis for inferring and enforcing third-party library permissions in server-side JavaScript. The analysis is centered around an RWX (read, write, execute) permission system across library boundaries. We demonstrate that our tools can detect zero-day vulnerabilities injected into popular libraries, which state-of-the-art tools such as snyk test and npm audit often miss.


INTRODUCTION
Modern software development relies heavily on third-party libraries. Applications use several dozen or even hundreds of libraries, created by many different authors and accessed via public repositories. The heavy use of libraries is particularly common in JavaScript applications [6,8,12,13,15], and especially in those running on the Node.js platform [16,19], where developers have millions of libraries at their fingertips through the npm package manager.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
CCS '21, November 15-19, 2021

Security Problems: Reliance on libraries introduces several security risks, ranging from dynamic compromise (the runtime exploitation of a benign library via its inputs) to full-fledged malicious library operation, that affect the security of the entire application and its broader operating environment. For example, consider a (de)serialization library that uses JavaScript's built-in eval function to parse a string into a runtime object. Although the library itself is benign and accesses no external API other than eval, an attacker may pass a malicious serialized object to the deserialization function, which in turn passes it to eval. As a result, the library may be subverted into malicious behavior, e.g., accessing the file system or the network, that goes far beyond what a (de)serialization library is supposed to do. The underlying problem is that every library running on Node.js has all the privileges offered by the JavaScript language and its runtime environment. In particular, each library can access any built-in API, global variables, and the APIs of other imported libraries, and can even import additional libraries.
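The following minimal sketch illustrates this ambient authority. The function `innocentHelper` is a hypothetical stand-in for library code, not taken from any real package: it reads a global built-in, writes a global variable, and imports and calls a built-in module, none of which Node.js asks it to declare.

```javascript
const os = require("os"); // a library may import any built-in module

// Hypothetical library function: its stated job is only to double a number.
function innocentHelper(x) {
  // Reads a global built-in object (the Node.js process).
  const nodeVersion = globalThis.process.version;
  // Writes a global variable that every other module can now observe.
  globalThis.leakedByLibrary = true;
  // Calls an API of the imported module.
  const host = os.hostname();
  return { result: x * 2, nodeVersion, host };
}

const out = innocentHelper(21);
console.log(out.result); // 42
```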
Overview: In this demo, we show how to leverage a combined static and dynamic program analysis to understand program behavior before the program's production execution and to enforce this behavior during it. Our techniques stand in sharp contrast to state-of-the-art vulnerability detection tools such as npm audit [9] and snyk [11]: while these tools scan a program's dependencies for known attacks, collected from publicly accessible vulnerability reports, our tools can detect previously unseen, zero-day attacks and notify developers about them, as we show during the demonstration.
Demo Outline: The demonstration starts by exemplifying the use of third-party libraries that is common in server-side Node.js development today. It then shows the expected (normal and benign) behavior of these libraries as part of larger applications, followed by their unexpected (abnormal and malicious) behavior when subverted by attackers; for example, an attacker can read and exfiltrate the contents of /etc/passwd. It then applies state-of-the-art vulnerability detection tools such as npm audit and snyk test, which report no risks because both tools report only known vulnerabilities. Finally, the demo shows a combined program analysis designed to report the permissions used by third-party libraries, i.e., the set of permissions required for a library's normal operation, thereby delineating normal from malicious operation. All the tools presented in this demonstration are open-source software.

RELATED WORK
This section briefly outlines static and dynamic analysis techniques.
Static analysis: Static program analysis is a technique for extracting information about the behavior of a program by inspecting its source code. Static analysis tends to focus on invariants that hold across all executions of the program, but often misses information related to dynamic program behavior. Several static analysis systems have been developed for Node.js [5,7,18].
Figure 1: A static analysis tool extracts the static permissions from a Node.js third-party library. The dynamic analysis tool then extracts the corresponding dynamic permissions using the test cases. The combination of permissions is used as input to the policy enforcement component.
Dynamic analysis: Dynamic program analysis is a technique for extracting information about a program by instrumenting its execution. Because it observes concrete executions, dynamic analysis can extract a wealth of information about a single execution, but (1) this information might not generalize to other executions, and (2) the instrumentation might impose significant runtime overhead on the program's execution. Several dynamic analysis frameworks have been developed for JavaScript [2,10,14,17].
Combined analysis: While both static and dynamic analysis are necessarily imprecise approximations of program behavior, their relative trade-offs make them complementary tools in a programmer's arsenal [3,4]. By combining these two synergistic approaches, as our demo shows, we aim to provide improved analysis results with minimal-to-zero developer effort. Fig. 1 shows an overview of our proposed techniques and how they are applied to a real use case. Our techniques start by running a static program analysis on the source code of the target library to extract a first set of candidate permissions (Fig. 1, (1)). This phase analyzes the source code of the library and its dependencies to extract the set of interfaces (e.g., functions, global objects, language built-ins) used by the library.
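As a rough illustration of what the static phase extracts, the sketch below approximates it with pattern matching over a hand-picked list of names. `TRACKED`, `inferStaticPermissions`, and the regex approach are all our assumptions, not the demo's implementation. A call counts as read plus execute, an assignment as write, and any other use as read. Applied to the one-line eval-based library shown later in this demo, it yields the same three entries as the static permission set reported for that library.

```javascript
// Toy static permission inference (illustrative only).
// Assumption: a regex scan over a hand-picked list of names stands in for a
// real parser-based analysis; it ignores scoping, aliasing, strings, and comments.
const TRACKED = ["eval", "require", "process", "module.exports", "module"];

const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Merge two mode strings, keeping canonical "rwx" order without duplicates.
const mergeMode = (a, b) =>
  ["r", "w", "x"].filter((m) => a.includes(m) || b.includes(m)).join("");

function inferStaticPermissions(source) {
  const perms = {};
  for (const name of TRACKED) {
    // Whole-name match, optionally followed by a call "(" or an assignment "=".
    const re = new RegExp(
      `(?<![\\w$.])${escapeRegExp(name)}(?![\\w$])\\s*(\\(|=(?!=))?`,
      "g"
    );
    for (const m of source.matchAll(re)) {
      // A call reads the reference and executes it (rx); an assignment
      // writes (w); any other use only reads (r).
      const mode = m[1] === "(" ? "rx" : m[1] === "=" ? "w" : "r";
      perms[name] = mergeMode(perms[name] || "", mode);
    }
  }
  return perms;
}

const librarySource =
  "module.exports = { dec: (str) => { let obj; obj = eval(str); return obj; } }";
console.log(inferStaticPermissions(librarySource));
// → { eval: 'rx', 'module.exports': 'w', module: 'r' }
```

A real analysis would parse the code and resolve scopes and aliases; this regex version is easily fooled by strings, comments, and renamed references.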

TOOL OVERVIEW
Our techniques then pair this static permission set with a second set gathered via dynamic program analysis (Fig. 1, (2)). During this phase, dynamic analysis is applied to the library's testing infrastructure, which encodes the library behaviors anticipated by its developers. We augment these test inputs with inputs gathered via active learning [15], a critical addition for libraries that do not have test cases.
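One common way to observe such accesses at run time is to wrap objects in ES `Proxy` objects that log reads, writes, and calls. The sketch below assumes, for simplicity, that the library receives its environment through an `env` object we control; `recordingProxy`, `parseAndCount`, and the test inputs are hypothetical, and we do not claim the demo's tooling works this way.

```javascript
// Dynamic permission recording with ES Proxies (illustrative sketch only).
// Assumption: the library reaches its environment through objects we wrap;
// real tools must also handle nested objects, proxy invariants, and require().
const mergeMode = (a, b) =>
  ["r", "w", "x"].filter((m) => a.includes(m) || b.includes(m)).join("");

function recordingProxy(target, path, log) {
  const note = (name, mode) => { log[name] = mergeMode(log[name] || "", mode); };
  return new Proxy(target, {
    get(obj, prop) {
      const name = `${path}.${String(prop)}`;
      note(name, "r"); // property read
      const value = Reflect.get(obj, prop);
      if (typeof value !== "function") return value;
      // Wrap functions so that calls are recorded as executions.
      return new Proxy(value, {
        apply(fn, _thisArg, args) {
          note(name, "x");
          return Reflect.apply(fn, obj, args);
        },
      });
    },
    set(obj, prop, value) {
      note(`${path}.${String(prop)}`, "w"); // property write
      return Reflect.set(obj, prop, value);
    },
  });
}

// Hypothetical library function and the inputs its test suite might use.
function parseAndCount(env, input) {
  const obj = env.JSON.parse(input);
  env.state.lastKeyCount = Object.keys(obj).length;
  return obj;
}

const log = {};
const env = {
  JSON: recordingProxy(JSON, "JSON", log),
  state: recordingProxy({}, "state", log),
};
for (const input of ['{"a":1}', '{"a":1,"b":null}']) parseAndCount(env, input);
console.log(log); // { 'JSON.parse': 'rx', 'state.lastKeyCount': 'w' }
```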
Finally, our techniques enforce the permission sets gathered by both analysis phases by instrumenting program execution (Fig. 1, (3)). When this enforcement framework detects an access outside the generated permission set, it throws an exception that aborts the execution of the program.
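Enforcement can be sketched with the same kind of wrapper, now checking each access against an allowed set and throwing on violations. `enforcingProxy`, `PermissionError`, and the one-entry permission set are illustrative assumptions; the set format mirrors the JSON permission sets shown later in this demo.

```javascript
// Enforcement sketch: each access is checked against an allowed permission
// set, and anything outside it throws. Names and set are hypothetical.
class PermissionError extends Error {}

function enforcingProxy(target, path, allowed) {
  const check = (name, mode) => {
    if (!(allowed[name] || "").includes(mode)) {
      throw new PermissionError(`permission "${mode}" denied for ${name}`);
    }
  };
  return new Proxy(target, {
    get(obj, prop) {
      const name = `${path}.${String(prop)}`;
      check(name, "r");
      const value = Reflect.get(obj, prop);
      if (typeof value !== "function") return value;
      // Calls are allowed only if the name also carries "x".
      return new Proxy(value, {
        apply(fn, _thisArg, args) {
          check(name, "x");
          return Reflect.apply(fn, obj, args);
        },
      });
    },
    set(obj, prop, value) {
      check(`${path}.${String(prop)}`, "w");
      return Reflect.set(obj, prop, value);
    },
  });
}

// Hypothetical permission set, e.g. produced by the inference phases.
const allowed = { "JSON.parse": "rx" };
const safeJSON = enforcingProxy(JSON, "JSON", allowed);

console.log(safeJSON.parse('{"a":1}').a); // 1 (read and call are permitted)
try {
  safeJSON.stringify({ a: 1 }); // JSON.stringify is not in the set
} catch (e) {
  console.log(e instanceof PermissionError, e.message);
}
```

Because the exception propagates out of the library call, an unguarded violation aborts the program, matching the behavior described above.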

A REAL EXAMPLE
This section applies our techniques to dynamic library subversion, a common attack vector in libraries that evaluate user input.
A deserialization library: Consider a Node.js application that uses a third-party (de)serialization library for converting serialized strings into in-memory objects. The (de)serialization library is fed client-generated strings, which may lead to remote code execution (RCE) attacks. The relevant application fragment first imports the serialization library. It then creates a web server that receives user-provided values arriving from the network as strings, which are deserialized into in-memory objects. Values containing a special token are printed to the console.
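The original application listing is not reproduced here, so the following is a hypothetical reconstruction of the behavior just described, not the demo's actual code. Assumptions: the library is inlined as `serial` with the eval-based `dec` discussed next (in the demo it is a separate third-party package), the token value and the `key` field are made up, and the handler logic lives in `handleBody` so it can be exercised without a network.

```javascript
const http = require("http");

// Stand-in for the third-party library: deserializes by calling eval.
const serial = { dec: (str) => eval(str) };

const TOKEN = "a1b2c3"; // hypothetical special token

// Deserialize one request body; print and return it if it carries the token.
function handleBody(body) {
  const value = serial.dec(body);
  if (value && value.key === TOKEN) {
    console.log(value);
    return value;
  }
  return null;
}

// Web server that collects each request body as a string and deserializes it.
const server = http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => { body += chunk; });
  req.on("end", () => {
    handleBody(body);
    res.end("ok\n");
  });
});
// server.listen(8080); // uncomment to start serving (port is arbitrary)
```

Because `dec` passes its input straight to eval, an object must be sent as a parenthesized expression such as `({"key":"a1b2c3"})`; an unparenthesized `{...}` would be parsed as a block and fail.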
Unfortunately, this deserialization functionality is provided by serial.dec, which is implemented in a third-party library written by programmers other than the application's own developers. Internally, this function uses Node.js's unsafe eval primitive, which evaluates any valid JavaScript code:

    module.exports = {
      // Evaluates any JavaScript contained in str and returns the result
      dec: (str) => { let obj; obj = eval(str); return obj; }
    }

Benign vs. malicious operation: Benign user requests work as expected; e.g., the following request will cause the value to be printed:

    let key = 'a1b2c33d4e5f6g7h8i9jakblc');
    request.write(payload); // part of a request

However, adversaries can pass Turing-complete programs that will execute on the host environment; e.g., the following input will create a file pwned.txt using the fs library of Node.js:

    let payload = 'require("fs").
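To see why the same function serves both kinds of input, the sketch below feeds the eval-based `dec` a benign object literal and a code-bearing string. The malicious payload is a harmless stand-in of our own that merely reads process information; it is not the truncated pwned.txt payload above, but it runs with the same host privileges.

```javascript
// The library's eval-based dec, copied from the listing above.
const dec = (str) => { let obj; obj = eval(str); return obj; };

// Benign: a serialized object literal (parenthesized so eval sees an expression).
const benign = dec('({ key: "abc", value: 42 })');
console.log(benign.value); // 42

// Malicious: arbitrary code runs with the host process's privileges.
const stolen = dec("({ pid: process.pid, cwd: process.cwd() })");
console.log(stolen.pid === process.pid); // true
```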

Applying state-of-the-art tools: Running snyk test on the application reports no issues:

    Tested 1 dependencies for known issues, no vulnerable paths found
The results are similar for npm audit:

    found 0 vulnerabilities in 1 scanned packages

These tools fail to report any risks because the dependencies of our program have no known vulnerabilities.
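The reasoning can be made concrete with a deliberately simplified model of advisory-based auditing; this is not how snyk or npm audit are implemented. The `audit` function, the advisory entry, and the version number are hypothetical, and versions are matched exactly rather than as semver ranges.

```javascript
// Simplified advisory-based audit: report only dependencies that match a
// known advisory. All data below is hypothetical.
function audit(dependencies, advisories) {
  return Object.entries(dependencies).flatMap(([name, version]) =>
    advisories.filter((a) => a.name === name && a.versions.includes(version))
  );
}

const advisories = [
  { name: "some-old-lib", versions: ["1.0.0"], title: "Prototype pollution" },
];
const deps = { serialization: "1.0.0" };

const findings = audit(deps, advisories);
console.log(
  `found ${findings.length} vulnerabilities in ${Object.keys(deps).length} scanned packages`
); // found 0 vulnerabilities in 1 scanned packages
```

Adding an advisory for `serialization` would make the same scan report it, which is exactly the information that does not exist for a zero-day.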
Applying static analysis: We first run perm.js -s, our static permission inference analysis, to extract the first set of permissions for the serialization library:

    {
      "~/libs/serialization/index.js": {
        "eval": "rx",
        "module": "r",
        "module.exports": "w"
      }
    }

The inferred permissions show the use of eval and module.exports for evaluating code and exporting library functionality.
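The permission set can be read as a per-file map from names to subsets of `rwx`. The hypothetical helper `permits` below checks single accesses against the set printed above, treating missing names as having no permissions; under that reading, the `require("fs")` call in the malicious payload needs a permission that the library never uses.

```javascript
// The static permission set reported above, parsed from JSON.
const staticPerms = JSON.parse(`{
  "~/libs/serialization/index.js": {
    "eval": "rx", "module": "r", "module.exports": "w"
  }
}`);

// Hypothetical helper: does `file` hold permission `mode` on `name`?
function permits(perms, file, name, mode) {
  return ((perms[file] || {})[name] || "").includes(mode);
}

const lib = "~/libs/serialization/index.js";
console.log(permits(staticPerms, lib, "eval", "x"));    // true: dec calls eval
console.log(permits(staticPerms, lib, "require", "x")); // false: never used
```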
Applying dynamic analysis: We then run perm.js -d, our dynamic permission inference analysis, on the provided test cases to extract permissions from the third-party library serialization. As all of the test inputs are JSON objects, the only additional permissions relate to a few built-in primitives, such as the Array constructor and the value null.
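Pairing the two sets can be sketched as a per-name union of modes. The dynamic entries below (`Array` and `null`, both with mode `r`) are hypothetical placeholders for the built-ins mentioned above, since the actual dynamic output is not shown, and `unionPermissions` is an illustrative name.

```javascript
// Merge two mode strings, keeping canonical "rwx" order without duplicates.
const mergeMode = (a, b) =>
  ["r", "w", "x"].filter((m) => a.includes(m) || b.includes(m)).join("");

// Per-file, per-name union of two permission sets.
function unionPermissions(a, b) {
  const out = {};
  for (const set of [a, b]) {
    for (const [file, names] of Object.entries(set)) {
      out[file] = out[file] || {};
      for (const [name, mode] of Object.entries(names)) {
        out[file][name] = mergeMode(out[file][name] || "", mode);
      }
    }
  }
  return out;
}

const lib = "~/libs/serialization/index.js";
const staticSet = { [lib]: { eval: "rx", module: "r", "module.exports": "w" } };
const dynamicSet = { [lib]: { eval: "rx", Array: "r", null: "r" } }; // hypothetical
console.log(unionPermissions(staticSet, dynamicSet)[lib]);
```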

DISCUSSION & CONCLUSION
We hope that our demo will form the basis of a discussion around third-party library practices. We outline a few potential discussion threads below. First, what is the best way for developers to incorporate specific standards into the libraries they develop and make available to the community? The goal here is to minimize supply-chain attacks due to developer mistakes. A formalization could include language-specific standards (e.g., minimizing the use of eval, or requiring test cases with adequate coverage) for ameliorating security problems before libraries are shared. Second, what steps could library repositories take to shield the community against these problems? The goal here is to identify a set of simple steps that repositories can take to mitigate many of these problems with minimal overhead for the end user.
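As one illustration of the "minimize eval" standard, a library whose inputs are strict JSON, as the serialization library's test inputs are, could replace eval with `JSON.parse`, which builds data but never executes code. This is our own example rather than a fix proposed by the demo, and it narrows the accepted input format (no functions, no unquoted keys).

```javascript
// Illustrative eval-free variant of the example library's interface.
// In the library itself, this object would be assigned to module.exports.
const safeSerial = {
  // JSON.parse only constructs data; it never runs the input as code.
  dec: (str) => JSON.parse(str),
};

console.log(safeSerial.dec('{"key":"abc"}').key); // abc

try {
  safeSerial.dec('require("fs")'); // code, not JSON: rejected
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```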
Finally, what is a good way to improve checks on program updates? As the SolarWinds [1] attack has demonstrated, discovering and mitigating vulnerabilities related to program updates is of paramount importance, and thus automating these checks to the extent possible would provide significant security benefits.
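One possible automated update check, sketched here as our own illustration rather than part of the presented tools, is to compare the permission sets inferred for two versions of a library and flag every newly required access. The version sets `v1` and `v2` are invented for the example.

```javascript
// Hypothetical update check: report permissions present in the new
// version's inferred set but absent from the old one.
function newPermissions(oldSet, newSet) {
  const added = {};
  for (const [name, mode] of Object.entries(newSet)) {
    const before = oldSet[name] || "";
    const extra = [...mode].filter((m) => !before.includes(m)).join("");
    if (extra) added[name] = extra;
  }
  return added;
}

const v1 = { eval: "rx", module: "r", "module.exports": "w" };
const v2 = { ...v1, require: "rx", "process.env": "r" }; // made-up update
console.log(newPermissions(v1, v2)); // { require: 'rx', 'process.env': 'r' }
```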
We hope that our demonstration of a combined static and dynamic program analysis in the context of real Node.js applications will serve to kick off a targeted discussion around the problems of third-party libraries and possible ways to mitigate them.