Detecting Missing Dependencies and Notifiers in Puppet Programs

Puppet is a popular computer system configuration management tool. It provides abstractions that enable administrators to set up their computer systems declaratively. Its use suffers from two potential pitfalls. First, if ordering constraints are not specified whenever an abstraction depends on another, the non-deterministic application of abstractions can lead to race conditions. Second, if a service is not tied to its resources through notification constructs, the system may operate in a stale state whenever a resource gets modified. Such faults can degrade a computing infrastructure's availability and functionality. We have developed an approach that identifies these issues through the analysis of a Puppet program and its system call trace. Specifically, we present a formal model for traces, which allows us to capture the interactions of Puppet abstractions with the file system. By analyzing these interactions we identify (1) abstractions that are related to each other (e.g., operate on the same file), and (2) abstractions that should act as notifiers so that changes are correctly propagated. We then check the relationships from the trace's analysis against the program's dependency graph: a representation containing all the ordering constraints and notifications declared in the program. If a mismatch is detected, our system reports a potential fault. We have evaluated our method on a large set of Puppet modules, and discovered 57 previously unknown issues in 30 of them. Benchmarking further shows that our approach can analyze, in minutes, real-world configurations comprising thousands of lines and millions of system calls.


INTRODUCTION
The prevalence of cloud computing and the advent of microservices have made the management of multiple deployment and testing environments a challenging and time-consuming task [5,29,32,39]. Infrastructure as Code (IaC) methods and tools automate the setup and provision of these environments, promoting reliability, documentation, and reuse [39]. Specifically, IaC (1) boosts the reliability of an infrastructure, because it minimizes human intervention, which is both time-consuming and error-prone; (2) ensures the predictability and consistency of the final product, because it eases the repetition of the steps followed to produce a specific outcome; and (3) allows the documentation and reuse of a system's configuration, because it associates the system's configuration with modular code [22,29,39,41].
Puppet [27] is one of the most popular system configuration tools used in the IaC context [33,37]. Puppet abstracts the actual system resources through a declarative approach. It collects all the declared abstractions from a program, and applies them one-by-one so that the system eventually reaches the desired state.
By default, any execution sequence of abstractions is valid, unless there are specific ordering constraints imposed by the interdependencies among them. For example, an Apache service should run only after the installation of the corresponding package. Therefore, developers need to declare any ordering constraints between abstractions in their programs to remove erroneous execution sequences, e.g., trying to start a service before the installation of its package. Conceptually, Puppet captures all the ordering relationships defined in a program through a directed acyclic graph and applies each abstraction in topological order. In this context, all the unrelated abstractions are processed non-deterministically. Furthermore, Puppet allows programmers to apply certain abstractions whenever specific events take place via a feature called notification. Notifications propagate changes to related resources, ensuring that their state is up-to-date. For instance, when a configuration file changes, the corresponding service has to be notified so that it will run with the new settings.
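To make the ordering model concrete, the following Python sketch simulates how a set of abstractions could be applied in a topological order while unrelated abstractions are shuffled. It is only an illustration under our own assumptions: the function name apply_order, the edge representation, and the use of a random seed are hypothetical and do not describe Puppet's actual scheduler.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def apply_order(resources, before_edges, seed=None):
    """Return one valid application order for the given resources.

    before_edges holds (a, b) pairs meaning "a must be applied before b".
    Resources that become ready together are shuffled to mimic the
    non-deterministic processing of unrelated abstractions.
    """
    rng = random.Random(seed)
    sorter = TopologicalSorter({r: set() for r in resources})
    for first, second in before_edges:
        sorter.add(second, first)  # second depends on first
    sorter.prepare()  # raises CycleError if the constraints are cyclic
    order = []
    while sorter.is_active():
        ready = list(sorter.get_ready())
        rng.shuffle(ready)
        for resource in ready:
            order.append(resource)
            sorter.done(resource)
    return order


resources = ["Package[apache2]", "File[apache2.conf]", "Service[apache2]"]
edges = [("Package[apache2]", "Service[apache2]")]
order = apply_order(resources, edges, seed=0)
```

In every order produced this way the package precedes the service, whereas the position of the file relative to the package is left to chance, because no constraint relates the two.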
Tracking all the required ordering constraints and notifications is a complicated task though, mostly because developers are not always aware of the actual interactions of Puppet abstractions with the underlying operating system. Notably, such errors can have a negative impact on the reliability of an organization's infrastructure, leading to inconsistencies [37] and outages [18]. For example, GitHub's services became unavailable when a missing notifier in their Puppet codebase caused a chain of failures such as DNS timeouts [18].
Approaches that automatically detect these issues in production code [20,37] have significant room for improvement, facing limitations that prevent them from being practical. Rehearsal [37] employs static code verification and cannot handle realistic Puppet programs. In particular, it cannot reason about programs that abstract arbitrary shell commands. Additionally, the model-based testing approach adopted by Citac [20] imposes a significant overhead and restrictions on the supported Puppet programs under test (they must be able to run in Docker containers). It also requires extra instrumentation on the execution engine of Puppet. Finally, none of those tools addresses missing notification issues.
We have developed a practical and effective approach to identify faults involving ordering violations and notifiers in Puppet programs. To do so, we examine the system call trace produced by a single execution. The stepping stone of our approach is FStrace, a language for modeling a sequence of system call traces. We employ FStrace and operate in the following steps. First, we model the system call trace of a Puppet program in FStrace. Through FStrace, we derive an analysis that captures the interactions of higher-level programming constructs (Puppet abstractions) with the file system, and estimates the set of the expected relationships among them. Then, for a given Puppet program, we build the dependency graph which reflects all the ordering relationships and notifications that have been specified by the developer. Finally, we verify whether the expected relationships (as specified from the analysis of traces) hold with respect to the dependency graph. Unlike previous tools [20,37], our approach (1) can reason about which system resources are affected by the execution of a program and how, and (2) requires only a single Puppet run for discovering issues.
Contributions. Our work makes the following contributions:
• We introduce FStrace, a language for modeling system call traces. The interpretation rules of FStrace allow us to infer the impact that higher-level building blocks have on the file system. The model proposed is generic and can be leveraged, apart from Puppet programs, by other domains (Section 3).
• We design a framework for detecting faults regarding ordering violations and notifiers in Puppet programs. To the best of our knowledge, it is the first approach to deal with issues involving notifiers (Section 4).
• We provide an open-source implementation of our approach (Section 5).
• We demonstrate the effectiveness and performance of our tool on a large set of Puppet modules. Specifically, our tool was able to detect 57 previously unknown faults in 30 Puppet modules. We provided fixes for 21 projects and 16 of them were accepted and integrated. This implies that our tool is capable of discovering issues that are useful to developers (Section 6).

OVERVIEW
Here is a brief overview of Puppet, a motivating example that demonstrates the types of defects our approach detects, and how our approach is structured.
Puppet. Puppet enables developers to describe the desired state of a system declaratively.
The code above indicates that the apache2 package should be installed in the host, the file apache2.conf should exist in the /etc/apache2/ path, and that the Apache server should be running. There are different types for abstracting system resources, including, but not limited to, file, package, service, and exec. Beyond that, the Puppet language provides conditionals, loops, and, for reusability, supports the creation of new abstractions and classes.
Puppet code is stored in files called manifests. Puppet compiles manifests into catalogs that specify all the abstractions it needs to apply in a particular system to reach the desired state [26]. Then, it evaluates the compiled catalogs and applies potential changes, if the system is not in the appropriate state. For example, if a file does not exist at a certain location, Puppet will create it. The execution of a catalog must be idempotent [25], so that the evaluation proceeds only if the current and the desired state of the system do not match.
Figure 1: A Puppet program that manifests a missing ordering relationship and notifier. We omit irrelevant code.
Motivating Example. In the following, we present a motivating example that demonstrates the issues that our approach addresses.
Missing Ordering Relationships (MOR) occur when a developer fails to define a happens-before relation between two Puppet abstractions that depend on each other.This can lead to unstable code that behaves correctly in some circumstances, but breaks in others depending on the order that Puppet processes resources.
Consider the code snippet shown in Figure 1. This fragment is taken from a real-world Puppet module [12], which defines a class that sets up a Network Time Protocol (ntp) service. This class expects the String parameter $defaults_file as an argument (line 1), which stands for the path where the service's default configuration file is created. Notice that the default value of $defaults_file is /etc/default/ntp. Initially, at line 2, the program installs the ntp package. Then, it creates a configuration file at the path /etc/ntp.conf (lines 3-6). If the variable $defaults_file is defined, Puppet generates a file at the location specified by $defaults_file (lines 7-13). Note that if Puppet finds an already defined variable in the condition of an if statement, it implicitly coerces it to true. Puppet evaluates both file abstractions after the package resource. This is expressed through the require property at lines 5 and 11.
In lines 18 to 22, the program declares that an ntp service should be in a running state. Notice the subscribe parameter at line 21, where the ntp service subscribes to the variable $service_subscribe, which in turn is computed at lines 14-17. The subscribe construct states that the service depends on the Puppet abstractions included in the $service_subscribe variable.
The initial intention of the programmer is that when $defaults_file is defined, the service should subscribe to both File[$defaults_file] and File["/etc/ntp.conf"] (line 15). However, unlike if statements, the operand of the "?" operator (i.e., $defaults_file) at line 14 never evaluates to true because it is a String variable. Therefore, the program considers only the default case (line 16), where the subscribers' list only contains File[/etc/ntp.conf]. As a result, the dependency between the service and the /etc/default/ntp file is never created. In other words, Puppet might apply the service before the configuration of /etc/default/ntp. A fix to this problem is to replace the "?" operator with an if statement.
Missing Notifiers (MN). Notifiers are necessary for many entities such as configuration files and services. An update to a configuration file should trigger the restart of the corresponding service, because these files typically describe settings processed during a service's startup. For example, the configuration file of an Apache service lists additional modules that should be loaded into memory.
A missing notifier issue is illustrated in Figure 1. The subscribe primitive (line 21) creates a notifier that restarts the ntp service whenever there is a change to the resources included in the $service_subscribe list. Although the programmer's intention was correct, the programming error at lines 14-17 causes unexpected behavior: the service does not restart even if the configuration /etc/default/ntp changes, because the service subscribes only to /etc/ntp.conf.
Framework. To address these issues, we propose a framework, illustrated in Figure 2, that operates as follows. First, it monitors the system calls of the Puppet process and its descendants. Then, the framework employs two components: the trace analyzer and the fault detector. The trace analyzer takes as input a system call trace derived from the application of a Puppet configuration, and it interprets each system call based on the model described in Section 3. The analyzer is instantiated with the block tagger component, which splits system calls into different blocks that correspond to Puppet abstractions. The analysis output is the set of the effects that Puppet abstractions have on the file system. The fault detector generates the directed acyclic graph containing all the ordering constraints and notifications declared in a Puppet program, and compares it against the expected relationships inferred from the output of the trace analysis. If a mismatch is identified, the fault detector reports a potential fault.
Trace Example. To generate traces, we use a system call tracing program [28,35], namely strace. Figure 3 presents an excerpt from the trace of the program of Figure 1. Each line denotes an invocation of a system call along with the process (pid) that triggered it. For example, the entry 103 close(7) = 0 states that the process with id = 103 invoked close with 7 as an argument, and that the system call returned 0. By further inspecting Figure 3, we observe that Puppet initially processes the File[/etc/default/ntp] resource (lines 2-7), and then the ntp service (lines 8-16). Observe the calls of write at lines 2, 7-8, and 16. These calls correspond to messages printed to the standard output by Puppet, indicating the points where the application of each abstraction starts and ends, respectively. We exploit these points to classify system calls according to the Puppet abstraction they come from (Section 3.2).
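As a rough illustration of how such trace entries can be turned into structured data, the Python sketch below parses lines of the form pid syscall(args) = ret. The regular expression and the function name parse_entry are our own assumptions; the sketch ignores strace features such as unfinished and resumed calls, and it is not the parser used by our tool.

```python
import re

# pid, system call name, raw argument string, and return value.
TRACE_LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>0x[0-9a-fA-F]+|-?\d+|\?)"
)


def parse_entry(line):
    """Parse one trace line; return None if the line has another shape."""
    match = TRACE_LINE.match(line.strip())
    if match is None:
        return None
    ret = match.group("ret")
    return {
        "pid": int(match.group("pid")),
        "syscall": match.group("name"),
        "args": match.group("args"),
        # "?" means the call never returned (e.g., exit); base 0 accepts hex.
        "ret": None if ret == "?" else int(ret, 0),
    }


entry = parse_entry("103 close(7) = 0")
```

For the entry discussed above, the result holds pid 103, the system call close, the argument string 7, and the return value 0.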

MODELING SYSTEM CALL TRACES
We formally introduce FStrace, the language we use to model system calls, and discuss how we model traces stemming from Puppet programs.

The FStrace Language
FStrace primitives are designed to model system calls that operate on file system resources. Some of the constructs have direct correspondence with the actual system calls, while others are generic enough so that they can represent a family of system calls. We group system calls into execution blocks, each with a unique id. FStrace assumes that within a block, all system calls are processed sequentially. However, the execution order at the level of blocks is not deterministic. Therefore, there is no guarantee that a block b_1 is always processed before b_2, even if the former appears before the latter in traces. FStrace processes every execution block atomically, and nested blocks are not allowed.
3.1.1 Syntax and Domains. Figure 4 shows the syntax of FStrace. The language consists of file names, paths (which are sequences of file names), and file descriptors represented by either an integer or the at_fdcwd construct. We also include (1) flags (e.g., read-mode, write-mode, o_trunc) that indicate how a file is opened, (2) the constructs fd and cwd that provide information for cloning a process, (3) primitives (consumed, produced, expunged) that stand for the types of the effect that a system call has on a file, and (4) an infinite set of unique identifiers for execution blocks. A trace is a sequence of blocks. A block has the following syntax: begin b (z, s)* end, where b denotes its id and (z, s)* is a sequence of trace entries. Each pair (z, s) consists of a process id (pid), which is a positive integer, and a system call. Finally, FStrace models every system call s ∈ Sys using the following eleven constructs.
• chdir p changes the working directory of the current process to p.
• clone c* f spawns a new process whose pid is f. The given flags c* reveal what kind of information is shared between the parent and the child process.
• close f disassociates the file descriptor f from the corresponding resource.
• dupfd f_1 f_2 creates a new file descriptor f_2 as a copy of f_1. This construct models a number of system calls such as dup, dup2, dup3, and fcntl(fd, F_DUPFD).
• hpath d p m captures the effect m that a system call has on the path p. If the given path name is not absolute, we interpret it as relative to the file descriptor d. If the value of d is at_fdcwd, we consider p relative to the current working directory. Otherwise, if p is absolute, we ignore d. In this way, we can represent the system calls whose suffix is "at" (e.g., linkat, renameat) or system calls that take relative paths as arguments. For instance, the system call stat("foo", ...), which retrieves the main information and attributes of the file foo, is represented as hpath at_fdcwd foo consumed. On the other hand, we represent the system call mkdir("/foo/bar"), which creates a new directory at path /foo/bar, as hpath at_fdcwd ("/", "foo", "bar") produced.
• nop (no operation) does not affect the state. We use nop to model all system calls that we do not need to take into account, e.g., getpid, sync.
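The correspondence between system calls and constructs can be sketched in Python as follows. The tuple encoding, the function name to_fstrace, and the choice to map every unlisted call to nop are assumptions made for illustration; only the translations of stat, mkdir, getpid, and sync are taken from the description above, and paths are kept as plain strings rather than sequences of file names.

```python
AT_FDCWD = "at_fdcwd"


def to_fstrace(name, args):
    """Translate one system call into a tuple that mimics an FStrace construct."""
    if name == "chdir":
        return ("chdir", args[0])
    if name == "close":
        return ("close", int(args[0]))
    if name == "dup2":
        # dup2(oldfd, newfd): newfd becomes a copy of oldfd.
        return ("dupfd", int(args[0]), int(args[1]))
    if name == "stat":
        # Reading file metadata consumes the path.
        return ("hpath", AT_FDCWD, args[0], "consumed")
    if name == "mkdir":
        # Creating a directory produces the path.
        return ("hpath", AT_FDCWD, args[0], "produced")
    # Simplification of this sketch: anything else (getpid, sync, ...) is a nop.
    return ("nop",)


constructs = [
    to_fstrace("stat", ["foo"]),
    to_fstrace("mkdir", ["/foo/bar"]),
    to_fstrace("getpid", []),
]
```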
Figure 5 illustrates the semantic domains of FStrace. FStrace introduces six major components. An inode table τ ∈ INodeT is a map of a pair, consisting of an inode and a file name, to another inode. An inode is a positive integer that acts as the identifier for a certain file system resource. Note that we also keep the special inode ι_r, which corresponds to the inode of the root directory "/". The inode table mimics the inode structure implemented in Unix-like operating systems. In this context, the first element of the key is the inode of the directory where the file name exists. For example, the inode of the /foo file, whose value is 3, is stored as follows: [(ι_r, "foo") → 3]. A file descriptor table π ∈ FdT maps an address and a file descriptor to an inode. We use this component to map open file descriptors of a process to the resource they handle. The CwdT element maps an address to an inode. That inode stands for the current working directory of a process.
Observe that we do not use the pid found in the trace entries as the key of the two definitions above. Instead, we have an indirection: each process points to a pair of addresses (see the domain ProcT). The first element of the pair is the address that stores the file descriptor table of the process. The second element of the pair reflects the address where the current working directory of the process is located. Therefore, two different processes might share the same file descriptor table or working directory. For example, in the following entries: [z_1 → (α_1, α_2), z_2 → (α_1, α_3)], the processes z_1 and z_2 point to the same file descriptor table because the first elements of their pairs are identical (i.e., α_1). Similarly, since their second addresses do not match (i.e., α_2 ≠ α_3), we presume that they do not share the same working directory; thus, a change imposed by any process does not affect the other one.
A table of symbolic links κ ∈ SymT is a partial map of inodes to paths. This domain holds the path names that symbolic links (identified by their inodes) point to. The last component of FStrace (ρ ∈ Res) maps path names to an element of the power set of blocks and effects. Specifically, this component tracks where and how each path is accessed. For example, the entry /foo → {(produced, b_1), (consumed, b_2)} indicates that the path /foo is produced in the block b_1 and consumed in b_2. The state ⟨τ, π, ϕ, ν, κ, ρ⟩ is a tuple consisting of the six components described above.
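The Python sketch below mirrors some of these components with plain dictionaries, to show how the inode table is traversed and how the address indirection lets two processes share a file descriptor table. The concrete values, the constant ROOT_INODE, and the helper names are hypothetical; the sketch omits the symbolic link table and the effect table.

```python
ROOT_INODE = 1  # plays the role of the special root inode


def lookup_inode(path, inode_table):
    """Resolve an absolute path to an inode by walking the inode table."""
    inode = ROOT_INODE
    for name in (part for part in path.split("/") if part):
        inode = inode_table.get((inode, name))
        if inode is None:
            return None  # some component of the path does not exist
    return inode


def inode_of_fd(pid, fd, proc_table, fd_table):
    """Follow the indirection pid -> address -> (address, fd) -> inode."""
    fd_address, _cwd_address = proc_table[pid]
    return fd_table.get((fd_address, fd))


inode_table = {(ROOT_INODE, "foo"): 3, (3, "bar"): 4}    # (dir inode, name) -> inode
proc_table = {1: ("a1", "a2"), 2: ("a1", "a3")}          # pid -> (fd addr, cwd addr)
fd_table = {("a1", 5): 4}                                # (addr, fd) -> inode
cwd_table = {"a2": ROOT_INODE, "a3": 3}                  # addr -> inode
```

Here processes 1 and 2 resolve file descriptor 5 to the same inode because both point to the address a1, while their working directories differ because they point to a2 and a3 respectively.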

Preliminary Definitions.
A number of specific operations apply to FStrace's domains. The binary operator :: denotes the addition of an element to a set, while ↓_i denotes the projection of the i-th element. Also, we define the following auxiliary functions:
• I(p, τ): returns the inode to which the path p points based on the inode table τ.
• P(ι, τ): returns the paths that point to the inode ι according to the inode table τ.
• join(p_1, p_2): joins the two paths p_1 and p_2.
• dir(p): returns the parent directory of the path p.
• base(p): returns the base name of the path p.
We also define the function Ab(d, p, l, r, τ), which gives the absolute path for a given path name p and a file descriptor d with regard to the provided open file descriptors l and the current working directory r of a process.
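A simplified version of the path helpers can be sketched in Python with the standard posixpath module. Unlike Ab, which resolves file descriptors and the working directory through inodes and the inode table τ, this sketch assumes that both are already available as path strings; the name absolute_path and the parameter fd_paths are hypothetical.

```python
import posixpath

AT_FDCWD = "at_fdcwd"


def absolute_path(d, p, fd_paths, cwd):
    """Compute the absolute path of p with respect to descriptor d.

    fd_paths maps open file descriptors to directory paths and cwd is the
    current working directory (both simplifications of this sketch).
    """
    if posixpath.isabs(p):
        return posixpath.normpath(p)  # d is ignored for absolute paths
    base = cwd if d == AT_FDCWD else fd_paths[d]
    return posixpath.normpath(posixpath.join(base, p))


# Counterparts of dir(p) and base(p).
parent = posixpath.dirname("/foo/bar")
name = posixpath.basename("/foo/bar")
```

For example, absolute_path(AT_FDCWD, "foo", {}, "/tmp") yields /tmp/foo, whereas a relative path given together with a descriptor that refers to /etc is resolved under /etc.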
Finally, the function Op(m) gives the effect that the open system call has on a file based on the flags m.
Figure 6 shows the semantics of FStrace. We present a subset of our rules for brevity. Each rule describes a transition of the form ⟨τ, π, ϕ, ν, κ, ρ⟩ →^{b,e} ⟨τ′, π′, ϕ′, ν′, κ′, ρ′⟩. The relation →^{b,e} indicates that given a trace entry e (a pair of a pid and a system call) in execution block b, the initial state ⟨τ, π, ϕ, ν, κ, ρ⟩ transitions to a new state ⟨τ′, π′, ϕ′, ν′, κ′, ρ′⟩.
[CHDIR] changes the working directory of the current process z. First, it inspects the process table ν to get the address that holds the value of the current process's working directory. Then, it updates ϕ so that the address α points to the inode of the path p.
[CLONE-COPY] demonstrates the case where we spawn a new process f by passing the empty sequence c = ∅. In this case, f shares neither the file descriptor table nor the working directory with the parent process z. So, we make copies of those values by creating two fresh addresses α_1, α_2. Then, we update the process table ν so that the new process f points to those new addresses.
[CLONE-SHARE] behaves similarly to [CLONE-COPY]. However, this time, the new process f shares the open file descriptors (flag fd) and the working directory (flag cwd) with z. Therefore, the freshly-created process f points to the same addresses as z.
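The two clone rules can be illustrated with the Python sketch below, which represents the state as a dictionary of tables and uses integers as addresses. The layout of the state and the counter next_addr are assumptions of the sketch. It also treats the fd and cwd flags independently, which generalizes the two rules presented here and is our own simplification.

```python
def clone(state, parent, child, flags):
    """Spawn `child` from `parent`, sharing or copying tables per `flags`."""
    fd_addr, cwd_addr = state["proc"][parent]
    if "fd" not in flags:
        # Copy the parent's file descriptor table under a fresh address.
        new_fd_addr = state["next_addr"]
        state["next_addr"] += 1
        for (addr, fd), inode in list(state["fds"].items()):
            if addr == fd_addr:
                state["fds"][(new_fd_addr, fd)] = inode
        fd_addr = new_fd_addr
    if "cwd" not in flags:
        # Copy the parent's working directory under a fresh address.
        new_cwd_addr = state["next_addr"]
        state["next_addr"] += 1
        state["cwd"][new_cwd_addr] = state["cwd"][cwd_addr]
        cwd_addr = new_cwd_addr
    state["proc"][child] = (fd_addr, cwd_addr)


def new_state():
    """A tiny state: process 1 has fd 3 -> inode 42 and cwd inode 7."""
    return {"proc": {1: (0, 1)}, "fds": {(0, 3): 42}, "cwd": {1: 7}, "next_addr": 2}


copied = new_state()
clone(copied, parent=1, child=2, flags=set())          # in the spirit of [CLONE-COPY]
shared = new_state()
clone(shared, parent=1, child=2, flags={"fd", "cwd"})  # in the spirit of [CLONE-SHARE]
```

After the copying clone, a later change to the child's working directory leaves the parent untouched; after the sharing clone, both processes point to the same pair of addresses.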
[DUPFD] involves the scenario where we duplicate a provided file descriptor. Specifically, we look up the address α of the current process's file descriptor table. Then, we retrieve the inode ι pointed to by the file descriptor f_1. Finally, we add the file descriptor f_2, whose inode value is ι, to the file descriptor table of z.
[OPEN] opens a file and returns a new file descriptor. First, it inspects the addresses α_1, α_2 where the file descriptor table and the working directory of the process z are located. Through the Ab function, it computes the absolute path p′ using the file descriptor d and the path p. This computation is boilerplate, so we abbreviate it as Ab(d, p, ...) in the next rules. Given the flags o*, it estimates the effect m that open has on the path p′ (via the function Op). In turn, it performs two updates. First, it adds f to the file descriptor table of the process z using the address α_1. Notice that f points to the inode of the path p′ (f → I(p′)). Second, it updates the ρ element: the path p′ receives the effect m in the block b.
[HPATH] records the effect m that a system call has in the current execution block b. It handles the case when the given effect m is not expunged. Specifically, it determines the absolute path p′ through Ab(d, p, ...). It then inspects the symbolic link table to check whether the path p′ points to another path or not. If this is the case (i.e., κ(p′) ≠ undef), we associate the resulting resource p′′ with the effect m in the current execution block b. Note that hpathsym operates similarly, but it does not check whether p′ is a symbolic link or not; it just considers the path p′.
[HPATH-EXPNG] illustrates the case where we expunge the provided file. As before, we first compute the absolute path p′ associated with that resource. Subsequently, we remove all the effects associated with p′ in the current execution block b, leading to the set l′ (i.e., l′ = {m | ∀m ∈ ρ(p′) : m ↓_2 ≠ b}). We add the expunged effect to l′, and finally, we unlink the path p′ from the inode table. For unlinking, the pair (I(p_1), p_2) refers to undef. Notice that p_1 is the parent directory of p′, and p_2 is the base name of p′.
[LINK] creates a hard link between two files. As a starting point, we take the absolute paths p′_1 and p′_2, where p′_1 corresponds to the existing file, while p′_2 stands for the path where we create the hard link. Then, the inode of the new path p′_2 is identical to that of p′_1. We also change the table ρ so that the path p′_2 is produced in the current execution block b.
[SYMLINK] creates a new symbolic link that points to the path p_1. It first estimates the absolute path p′_2 of the fresh symbolic link. Then, it creates a new inode ι which the symbolic link points to by updating the inode table τ. It also changes the table κ so that the new inode ι targets the path p_1. Finally, the path p′_2 is produced in the current execution block b, leading to the new table ρ′.
[RENAME] renames a given file. First, it retrieves the absolute paths corresponding to the old and the new path names (i.e., p′_1 and p′_2). Then, it updates the inode table τ: the inode of p′_2 is the same as that of p′_1. In turn, it removes p′_1 from the inode table (i.e., p′_1 points to undef). For these updates, it is necessary to estimate: (1) the inodes of their parent directories, and (2) their base names. Finally, it updates the component ρ. In particular, it removes any effects on path p′_1 that took place within the block b, and it marks p′_1 as expunged and p′_2 as produced in b.
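To illustrate how the rules above update the component ρ, the following Python sketch keeps only the effect table and ignores the inode table, file descriptors, and symbolic links. The function names record, expunge, and rename are hypothetical, and the paths are assumed to be absolute already.

```python
from collections import defaultdict


def record(res, path, effect, block):
    """Associate `effect` with `path` in `block`, as [HPATH] does for rho."""
    res[path].add((effect, block))


def expunge(res, path, block):
    """Drop the effects of `block` on `path` and mark the path as expunged."""
    res[path] = {(m, b) for (m, b) in res[path] if b != block}
    res[path].add(("expunged", block))


def rename(res, old, new, block):
    """Effect-table part of [RENAME]: expunge the old path, produce the new one."""
    expunge(res, old, block)
    record(res, new, "produced", block)


res = defaultdict(set)
record(res, "/foo", "produced", "b1")
record(res, "/foo", "consumed", "b2")
rename(res, "/foo", "/bar", "b2")
```

After these steps, /foo is still marked as produced in b1 but its consumed effect in b2 has been replaced by expunged, and /bar is marked as produced in b2.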

Modeling Puppet Traces
To leverage FStrace, we need to group system calls into blocks corresponding to higher-level programming constructs. Specifically for Puppet artifacts, it makes sense to classify system calls according to the Puppet abstraction they come from. In this context, we presume that an execution block begins or ends when the evaluation of a Puppet abstraction starts or terminates. Thus, the name of the execution block corresponds to the name of the Puppet abstraction.
It is easy to identify the points where the evaluation of a Puppet abstraction starts or finishes by decoding Puppet's debug messages. Recall from Figure 3 that those messages appear in the execution traces as writes to the standard output. We have developed a block tagger for Puppet that detects those debug messages and marks them as the entry and exit points of execution blocks. For example, consider again the traces in Figure 3. We can model the trace entry at line 2 as the entry point of an execution block whose name is "File[/etc/default/ntp]", whereas the system call at line 8 signals the ending of that execution block. Hence, all system calls that appear between lines 2 and 8 are included in the aforementioned block.
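The idea behind the block tagger can be sketched in Python as follows. The exact wording of the start and end messages is an assumption made for this illustration (the patterns START and END below are hypothetical and may not match the messages of a given Puppet version), as is the representation of trace entries as (pid, syscall, args) triples.

```python
import re

# Hypothetical marker messages; the real debug output may differ.
START = re.compile(r"(?P<name>\w+\[[^\]]+\]): Starting to evaluate the resource")
END = re.compile(r"(?P<name>\w+\[[^\]]+\]): Evaluated in")


def tag_blocks(entries):
    """Group (pid, syscall, args) entries into blocks named after resources."""
    blocks = []
    current = None  # (block name, entries collected so far)
    for pid, syscall, args in entries:
        if syscall == "write":
            start = START.search(args)
            if start and current is None:
                current = (start.group("name"), [])
                continue
            if END.search(args) and current is not None:
                blocks.append(current)
                current = None
                continue
        if current is not None:
            current[1].append((pid, syscall, args))
    return blocks


entries = [
    (103, "write", '1, "Info: /Stage[main]/Ntp/File[/etc/default/ntp]: Starting to evaluate the resource", 86'),
    (103, "open", '"/etc/default/ntp", O_WRONLY|O_CREAT'),
    (103, "close", "7"),
    (103, "write", '1, "Info: /Stage[main]/Ntp/File[/etc/default/ntp]: Evaluated in 0.01 seconds", 78'),
    (103, "getpid", ""),
]
blocks = tag_blocks(entries)
```

The sketch yields a single block named File[/etc/default/ntp] that contains the open and close calls; system calls outside any pair of markers are not assigned to a block.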

DETECTING FAULTS
We locate faults in Puppet programs by combining the trace analysis output with the dependency graph: a directed acyclic graph used to capture all the ordering and notification relationships between the abstractions of a given Puppet program.

The Dependency Graph
We consider the dependency graph as an element of the following power set: д ∈ DG = P(P × P × L) where P is the set of Puppet abstractions, and L = {notify, before}.
An entry (p_1, p_2, l) ∈ д means that the Puppet abstraction p_2 is dependent on p_1. The label l shows the relationship's type between p_1 and p_2: if l = before, p_1 is processed before p_2, whereas if l = notify, apart from preceding p_2, p_1 also sends notifications to p_2. We define two binary relations on the nodes of a dependency graph д ∈ DG. The relation ≺_д captures ordering: p_1 ≺_д p_2 holds when д contains a path from p_1 to p_2. The relation →_д captures notifications: p_1 →_д p_2 holds when p_1 sends notifications to p_2. For that purpose, it only considers paths in д with notify edges. Figure 7 depicts the dependency graph of the program of Figure 1. For brevity, the node conf1 stands for the /etc/ntp.conf file, and conf2 is the /etc/default/ntp file. We observe that the resource /etc/default/ntp has neither an ordering nor a notification relationship with the service, because the corresponding nodes are not connected to each other. Also, package does not send notifications to service because there is no path from the former to the latter where all edges have notify labels.
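The two kinds of reachability queries on a dependency graph can be sketched in Python as shown below. The graph is encoded as a set of (source, target, label) triples; its edges are our reading of the relationships declared in the program of Figure 1 (the package precedes both files, and conf1 notifies the service), so they should be treated as an assumption rather than a faithful copy of Figure 7.

```python
def reachable(graph, src, dst, labels):
    """True if dst can be reached from src through edges whose label is in labels."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for a, b, label in graph:
            if a != node or label not in labels:
                continue
            if b == dst:
                return True
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def happens_before(graph, p1, p2):
    """Ordering query: any path made of before or notify edges counts."""
    return reachable(graph, p1, p2, {"before", "notify"})


def notifies(graph, p1, p2):
    """Notification query: only paths made entirely of notify edges count."""
    return reachable(graph, p1, p2, {"notify"})


graph = {
    ("package", "conf1", "before"),
    ("package", "conf2", "before"),
    ("conf1", "service", "notify"),
}
```

With this graph, the package happens before the service but does not notify it, and conf2 is related to the service by neither query.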

Combining FStrace with the Dependency Graph
The dependency graph is a key element in our fault detection approach. Recall that the execution blocks in FStrace are not totally ordered. For example, consider two blocks b_1, b_2 that affect the same file: b_1 produces it and b_2 reads its contents. In this case, b_2 can be processed first, because FStrace does not define a temporal relation between the two blocks. As a result, there will be a failure because b_2 will attempt to consume a file that does not exist. Thankfully, the dependency graph of a Puppet program can be employed to define the ordering relationships of two execution blocks (expressed through the ≺_д relation). Specifically, we need to check whether the ≺_д relation is defined for b_1, b_2 to identify missing ordering relationships.

Algorithm 1 Detecting Faults
Input: ρ ∈ Res, д ∈ DG
1: for all p, l in ρ do
2:   c ← {b | (consumed, b) ∈ l}
3:   t ← {b | (produced, b) ∈ l}
4:   for all (b_1, b_2) ∈ t × c do
5:     if b_1 ⊀_д b_2 and b_2 ⊀_д b_1 then
6:       report MOR between b_1, b_2 on path p
7:     end if
8:     if isService(b_2) then
9:       if b_1 →_д b_2 does not hold then
10:        report MN between b_1, b_2 on path p
11:      end if
12:    end if
13:  end for
14: end for

For missing notifiers, we first need to identify pairs of Puppet abstractions where the application of the first element should trigger the application of the second one. To this end, we look for blocks that produce a particular resource p. If the same resource p is consumed by a block that maps to a service, presumably, the blocks that produced p should have notification relationships with the service block. That is, if they produce an update to p, the service should be refreshed to consume the new version of p.
Algorithm 1 summarizes our fault detection approach. The algorithm expects as input the map ρ ∈ Res (as specified from the analysis of traces) and a dependency graph д ∈ DG. Then it iterates over every key-value pair of ρ. Recall that ρ is a map of a path p to a set of pairs l; each pair (m, b) ∈ l stands for the effect m that took place in the block b. For a certain path p, we retrieve the set of blocks c that consumed p (line 2). Then, at line 3, we do the same in order to compute the set of blocks t that produced p. In turn, for every block pair (b_1, b_2) of the Cartesian product t × c, we check whether there is a happens-before relation between b_1 and b_2. If b_2 is not dependent on b_1 (b_1 ⊀_д b_2) and vice versa, we report a missing ordering relationship (line 6).
As a next step, the algorithm checks for missing notifiers. If the block b_2, which consumed p, corresponds to a service (line 8), the algorithm examines whether the relation b_1 →_д b_2 holds (line 9). If the block b_1, which produced p, does not send notifications to the service b_2, the algorithm reports a missing notifier (line 10).
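The steps of Algorithm 1 can be sketched in Python as follows. To keep the sketch independent of any graph representation, the ordering and notification relations are passed in as predicates; these parameters, the tuple format of the reports, and the rule that skips a block paired with itself are assumptions of the sketch rather than parts of the algorithm as stated.

```python
def detect_faults(res, precedes, notifies, is_service):
    """Report missing ordering relationships (MOR) and missing notifiers (MN).

    res maps each path to a set of (effect, block) pairs; precedes(a, b) and
    notifies(a, b) answer the ordering and notification queries.
    """
    faults = []
    for path, effects in res.items():
        consumers = {b for (m, b) in effects if m == "consumed"}
        producers = {b for (m, b) in effects if m == "produced"}
        for b1 in sorted(producers):
            for b2 in sorted(consumers):
                if b1 == b2:
                    continue  # assumption: a block is never paired with itself
                if not precedes(b1, b2) and not precedes(b2, b1):
                    faults.append(("MOR", b1, b2, path))
                if is_service(b2) and not notifies(b1, b2):
                    faults.append(("MN", b1, b2, path))
    return faults


# Relations for the motivating example, written out explicitly.
ordered = {("package", "conf1"), ("package", "conf2"),
           ("conf1", "service"), ("package", "service")}
notified = {("conf1", "service")}
res = {
    "/etc/ntp.conf": {("produced", "conf1"), ("consumed", "service")},
    "/etc/default/ntp": {("produced", "conf2"), ("consumed", "service")},
}
faults = detect_faults(
    res,
    precedes=lambda a, b: (a, b) in ordered,
    notifies=lambda a, b: (a, b) in notified,
    is_service=lambda b: b == "service",
)
```

For these inputs the sketch reports one missing ordering relationship and one missing notifier, both between conf2 and the service on the path /etc/default/ntp, which corresponds to the two issues of the motivating example.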

IMPLEMENTATION
Here are our method's implementation details and their current limitations.

Details
We have developed a prototype that implements our approach in the OCaml programming language. Our tool consists of three different components: (1) an executor that traces Puppet programs: it takes a Puppet manifest as input and executes it under strace to collect traces; (2) an analyzer that receives a sequence of system calls, models them in FStrace, and implements the interpretation rules presented in Figure 6; (3) a fault detector for Puppet, which takes the analyzer's output and follows the steps of Algorithm 1. Note that we build the dependency graph of a Puppet program through a simple analysis of the catalogs produced by Puppet after the compilation of the manifests. (Catalogs are JSON documents that list all Puppet abstractions that are going to be applied along with their dependencies.)
We have implemented our method with efficiency in mind. Our tool is able to handle GB-sized traces with reasonable time and space requirements (see Section 6.4). This was made possible through a number of optimizations, such as the use of streams to process and analyze traces, a reversed inode table to look up paths based on their inodes, and function memoization.

Current Limitations
Currently, our tool can only support Linux distributions because strace is a utility for Linux-based operating systems. However, we can easily extend it to support other POSIX-compliant environments such as FreeBSD or Solaris. Also, as we will observe in Section 6, our tool might produce false positives when two Puppet abstractions operate on the same file, but they are commutative to each other, i.e., the application order does not matter. Even though commutative pairs of abstractions are not so common (see Section 6), we plan to address this issue in future work by examining Puppet catalogs to identify such pairs.

EVALUATION
We have evaluated our framework by examining a large number of Puppet modules in order to answer the following research questions.
RQ1 Is the proposed approach effective for finding faults in Puppet manifests? (Section 6.2)
RQ2 How can we categorize the detected faults? (Section 6.3)
RQ3 What is the performance of our analysis? (Section 6.4)

Experimental Setup
We collected a large number of Puppet modules from the Forge API and GitHub. We were particularly interested in modules that support Debian Stretch, because Debian is one of the most popular Linux distributions [1]. We used Docker to spawn a clean Debian environment efficiently. Then, we ran our framework on every module separately. We monitored the Puppet process and collected the system call trace of every program through strace. Finally, we ran each step (trace analysis and fault detection) and logged the reports generated by our framework. Through this process, we successfully ran and analyzed 351 Puppet modules in total.
To measure the performance of our approach, we ran the trace analysis and fault detection steps ten times to get reliable measurements. By examining the standard deviation, we observed that the running times did not vary significantly among different executions. All the experiments were run on a virtual machine with a 2.1 GHz 8-core processor and 8 GB of RAM.
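The repeated-measurement procedure can be sketched as follows. This is a generic Python timing harness and not the benchmarking code used in the evaluation; the function name `measure` and the stand-in workload are hypothetical.

```python
import statistics
import time


def measure(step, runs=10):
    """Run `step` several times and return (mean, stdev, samples) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples), samples


# Stand-in workload; in practice `step` would run the analysis on one trace.
mean, stdev, samples = measure(lambda: sum(range(100_000)))
print(f"mean={mean:.6f}s stdev={stdev:.6f}s over {len(samples)} runs")
```

A small standard deviation relative to the mean indicates that the running times are stable across executions.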

Fault Detection Results
Our framework detected 57 previously unknown issues in 30 Puppet modules. Table 1 presents the analysis results for each module. Notably, this is the first study that led to the disclosure of such a large number of faults in Puppet repositories. Our framework marks 43 out of 57 faults as missing ordering relationships (column mor). We observe that ordering violations are the most prevalent issue in the inspected Puppet manifests. The remaining 14 faults are related to missing notifiers (column mn).
Based on the reports of our tool, we reproduced each case and manually verified that each reported fault can lead to a problematic situation. We provided fixes for 21 projects, and 16 of them were accepted by their development teams and integrated into their code. This indicates that our tool produces reports that are meaningful to developers. At the time of submission, none of our patches has been rejected.

Fault Patterns
Below, we categorize and discuss some of the faults identified by our framework. Most represent fault patterns that were previously unknown to us and that we learned through our tool.

Missing Ordering Relationships. We have observed two types of missing ordering relationship issues.
Generate-Use Violation. The use of a resource must always follow its creation. Many modules fail to preserve that ordering relationship. We observed this violation in 16 Puppet modules, such as alertlogic-al_agents and hardening-os_hardening. Figure 8 shows a fragment from alertlogic-al_agents [7]. The code first fetches a .deb package (a Debian archive) using the wget command (lines 1-5). The package is stored at the path specified by the $package_path variable, whose value is /tmp/al-agent. Then, the code installs the Debian archive on the system (lines 6-10) through dpkg (a package management system for Debian-based operating systems). It is easy to see that package depends on exec, because it requires $package_path (the .deb file) to exist in the system (line 9) so that it can install the package successfully.

    1 $package_path = "/tmp/al-agent"
    2 exec {"download":
    3   command => "/usr/bin/wget -O ${package_path} ${pkg_url}",
    4   creates => $package_path
    5 }
    6 package {"al-agent":
    7   ensure => "installed",
    8   provider => "dpkg",
    9   source => $package_path,
    10 }
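The missing relationship in Figure 8 can be illustrated as a reachability check over the dependency graph. The Python sketch below is a simplification and not the tool's algorithm: it assumes that the trace analysis has already established that the exec block produces /tmp/al-agent and the package block consumes it, and the helper names are hypothetical.

```python
from collections import deque


def happens_before(graph, src, dst):
    """True if dst is reachable from src through one or more ordering edges."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def missing_ordering(graph, producer, consumer):
    """A producer that is not guaranteed to run before its consumer is a fault."""
    return not happens_before(graph, producer, consumer)


# Figure 8 declares no relationship between the two resources.
declared = {}
print(missing_ordering(declared, "Exec[download]", "Package[al-agent]"))  # True

# After declaring that the package requires the exec, the edge exists.
fixed = {"Exec[download]": {"Package[al-agent]"}}
print(missing_ordering(fixed, "Exec[download]", "Package[al-agent]"))  # False
```

Because reachability is transitive, an ordering that is implied through intermediate resources also satisfies the check.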
The Generate-Use category produces observable errors (errors that manifest during the catalog's application) when Puppet applies abstractions in the erroneous order. For example, when it processes package before exec, the application of the catalog fails with the following error: "dpkg: error: cannot access archive "/tmp/al-agent": No such file or directory".

Configure-Use Violation. The configuration of a file must precede its use. For example, when a service starts, all the files consumed by that service have to be properly configured. This category differs from the previous one because, when a Puppet abstraction attempts to use the file, the file already exists in the system. However, it is not in the expected state (e.g., it does not have the right contents or permissions). This error pattern appears in four modules: saz-ntp, vpgrp-influxdbrelay, olivierHa-influxdb, and ploperations-puppet.
Figure 1 illustrates a program with an issue related to this category. When the ntp service starts, the configuration files are guaranteed to be there because the abstraction package creates them during installation. However, it is possible that the ntp service does not read the desired contents of the /etc/default/ntp file specified by content => "conf content..." (line 11), because there is a missing ordering relationship between file and service. Note that this category, unlike the previous one, might lead to unexpected behaviors silently, i.e., the application of the catalog does not produce any error messages.

Missing Notifiers. We have identified four different categories of issues related to notifiers.
Configuration Files. A configuration file must always send notifications to a service so that any change to that file triggers the restart of the corresponding service. Although this is a standard pattern, we observed that in four modules (shown in rows 6, 7, 11, 12, 30 of Table 1) this is not the case. As an example, recall the program discussed in Section 2.
Log Files. Typically, services log various events in dedicated files. For instance, the log file of an Apache server records, among other things, every incoming HTTP request. Log files are very beneficial for debugging and monitoring purposes. When a service starts, it opens the corresponding log file and keeps it open while the service is up, in order to write any events that take place.
We discovered issues related to logging in two popular Puppet modules (puppetlabs-apache [11] and deric-zookeeper [13]). These modules declare the log files for the apache and zookeeper services in their manifests. However, the log files do not have a notifier for their associated services. This may lead to a problematic situation. Consider the case where the log file of a service is removed from the host. When we remove an open file, the underlying system call (unlink) does not update the file descriptors associated with the removed file, even though Puppet will create a new one. This means that although the file disappears from the file system, the service still holds a file descriptor that points to the inode of the original file. The issue is that after removal, the inode becomes an orphan (i.e., it is not linked with any file), which means that it is no longer accessible through a file path. Therefore, in the case of a missing notifier, the log history of the upcoming events is lost because the service writes to an orphan inode. To fix that issue, the log file should notify the service so that the service opens the newly created log file.
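The orphan inode scenario can be reproduced with a short Python sketch. It assumes a POSIX system, where an open file can be unlinked; the file name and the roles played by the two handles ("service" and "Puppet") are illustrative only.

```python
import os
import tempfile


def demo_orphaned_log():
    """Show that writes through a stale handle never reach the recreated file."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "service.log")

        # The "service" opens its log file and keeps the handle open.
        service_log = open(log_path, "a")
        try:
            service_log.write("before removal\n")
            service_log.flush()

            # The log file is removed and then recreated (as Puppet would do).
            os.unlink(log_path)
            open(log_path, "w").close()

            # Without a restart, the service still writes to the old inode.
            service_log.write("after removal\n")
            service_log.flush()

            old_inode = os.fstat(service_log.fileno()).st_ino
            new_inode = os.stat(log_path).st_ino
            with open(log_path) as recreated:
                new_contents = recreated.read()
        finally:
            service_log.close()
        return old_inode == new_inode, new_contents


same_inode, new_contents = demo_orphaned_log()
print(same_inode, repr(new_contents))
```

The recreated file stays empty because the stale handle refers to a different inode. Reopening the log, which is what a service restart does, is what makes later events visible again.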
Init Scripts. Init scripts specify how a service starts, stops, or restarts. In practice, they are wrapper shell scripts that set up the required environment and invoke the actual executables of the services with the appropriate parameters.
Puppet manifests that manage init scripts should notify the corresponding service whenever there is a change to those scripts. The camptocamp-tomcat [16] and alexharvey-disable_transparent_hugepage [9] modules fail to follow that pattern. Consider the code listed in Figure 9, which comes from the camptocamp-tomcat module. The figure shows fragments from two different files. First, the config.pp Puppet manifest defines a custom abstraction named tomcat::instance::config, which takes the variables $basedir and $javahome as parameters (line 1). This abstraction configures the init script of the tomcat service (lines 2-8), whose contents are determined by the tomcat/tomcat.init.erb template (lines 9-12). By examining this template, we see that, before the init script starts tomcat, it sets some environment variables based on the values of the Puppet parameters $basedir and $javahome (lines 10-12). When there is an update to the init script (e.g., the $javahome variable has a different value), Tomcat should restart in order to operate in the new environment, e.g., to use a different version of Java.
Packages. When Puppet applies a package abstraction, the service that depends on that package should restart. In this way, we ensure that a service gets all the necessary updates, including security patches and new features. Our tool identified this kind of issue in example42-apache [14], saz-ntp [12], and puppet-telegraf [8]. Specifically, the package abstractions that were responsible for installing Apache, ntp, and telegraf did not notify the running instances whenever there was a new version of those packages.

Performance Evaluation
Figure 10 shows the running times (in seconds) of the trace analysis and fault detection phases relative to the size of the provided traces (in MB). We observe that the relationship between trace size and analysis time is almost linear. Notice that our framework is able to handle a large volume of traces (more than 1.2 GB) in a reasonable amount of time (< 3 minutes). The average trace size and analysis time of the inspected modules are 84 MB and 9 seconds, respectively.
There are 4 cases out of 351 (ceritsc-dkms, datadog-datadog_agent, nexcess-ksplice, puppet-rabbitmq) where the execution times were relatively high compared to the rest of the modules. Nevertheless, they all remain within acceptable limits. By examining the characteristics of the traces obtained by the execution of these modules, we observed that they contain more unlink system calls than the rest of the modules. Notably, such calls involve more expensive operations on the analysis state (they are modeled as hpath d, p expunged; recall Figure 6).
Overall, the overhead of the analysis is relatively small. We argue that our approach is practical and can be used as part of the testing process for Puppet manifests.

False Positives
We have manually inspected the reported issues and identified a potential source of false positives. Consider two abstractions that are commutative. For example, in the claranet-varnish module [10], the developers use two different abstractions to partially configure a certain file. On the one hand, they use file to set the permissions and ownership of the file, and on the other, they use exec to initialize its contents. In this case, the order in which Puppet processes the abstractions does not matter. Specifically, Puppet can first use exec to create the file with the desired contents and then apply file to set the appropriate file attributes, or vice versa. Our approach reported false positives in only 7 out of 351 cases. Therefore, we argue that this pattern (i.e., configuring a file through the combination of abstractions) is not particularly common.
We noticed one more false positive, which was related to missing notifiers. The developers of bodgit-dbus [15] use a custom command (expressed via exec) to reload the configuration of the service. Consequently, the configuration files notify the exec abstraction instead of service. We did not observe this case elsewhere, because Puppet programmers typically exploit the restart parameter of the service type to define a custom restart command in the following manner: "service { restart => "/custom/cmd", ... }".
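The notion of commutative abstractions can be made concrete with a toy Python model. The sketch below represents a file as a dictionary and each abstraction as a function on that dictionary; the attribute values and function names are hypothetical, and the model ignores everything else an abstraction may do.

```python
def set_attributes(state):
    """Models a file abstraction that only sets ownership and permissions."""
    new = dict(state)
    new["owner"], new["mode"] = "root", "0644"  # illustrative values
    return new


def init_contents(state):
    """Models an exec abstraction that only initializes the file contents."""
    new = dict(state)
    new["content"] = "initial contents"
    return new


def overwrite_contents(state):
    """Models a second writer of the same contents; used as a counterexample."""
    new = dict(state)
    new["content"] = "other contents"
    return new


def commute(f, g, state):
    """Two operations commute on `state` if both orders give the same result."""
    return g(f(state)) == f(g(state))


print(commute(set_attributes, init_contents, {}))      # True
print(commute(init_contents, overwrite_contents, {}))  # False
```

Pairs that touch disjoint aspects of the same file commute and are safe in either order, whereas two writers of the same contents are not.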

RELATED WORK
Our work is related to three research areas, namely quality in IaC, trace analysis, and modeling of file system operations.
Quality in IaC. With the proliferation of the IaC process, there have been numerous attempts to identify defects and quality concerns in configuration code.
A number of studies focus on maintainability issues. Sharma et al. [38] design and implement a code-smell detection scheme for Puppet, which searches for issues related to naming conventions, code design, indentation, etc. Their findings suggest that such anti-patterns, as in traditional programs, exist in many IaC repositories. Van der Bent et al. [40] introduce a quality model for Puppet programs, which is empirically evaluated by interviewing practitioners from industry. Schwarz et al. [36] do similar work focusing on Chef recipes. Endeavors have recently moved to the identification of security issues. Rahman et al. [33] define and classify security smells into seven categories, such as hard-coded passwords and the use of weak cryptographic algorithms, and then build a tool for statically detecting these smells in Puppet repositories.
Other studies attempt to extract error patterns and source code properties from the analysis of defective IaC programs. Rahman et al. [34] employ machine learning and text processing techniques to identify properties that faulty Puppet programs hold. Then, they build a prediction model for asserting whether IaC scripts manifest faults or not. Chen et al. [3] identify error patterns in Puppet manifests by following a different approach. First, they inspect the code changes from repositories' commits. Second, they construct an unsupervised learning model to detect error patterns based on the clustering of the proposed fixes. Their approach is based on the assumption that similar faults are fixed with similar patches [19].
Few automated techniques have been proposed for improving the reliability of configuration management programs. Rehearsal [37] statically verifies that a given Puppet configuration is deterministic and idempotent. Rehearsal models a given Puppet manifest in a small language called fs and then constructs logical formulas based on the semantics of each primitive of the language. Finally, an SMT solver decides whether the initial program is non-deterministic or not. Compared to our approach, Rehearsal is less effective and practical. Specifically, Rehearsal employs a form of static analysis that can only handle a subset of Puppet programs. For example, the analysis does not support exec abstractions because it cannot reason about the file system resources that exec processes. Unlike Rehearsal, our approach operates on the actual system calls rather than Puppet manifests; thus, it can effectively determine which files are affected by a Puppet run and how.
Other advances [20,23] adopt a model-based testing approach for checking whether configuration scripts meet certain properties. Hummer et al. [23] focus on testing the idempotence of Chef scripts. Their proposed framework generates multiple test cases that explore different task schedules. By tracking the changes in the system (they compare the system state before and after execution), they determine if idempotence holds for a given program. Hanappi et al. [20] extend the work of Hummer et al. and introduce Citac, a framework that can be applied to Puppet manifests to examine the convergence of programs. Convergence states that the system reaches a desired state even in the presence of failed Puppet abstractions. They formally express the properties of idempotence and convergence, and through test case generation, they verify if the provided manifests violate those properties. Contrary to Citac, we adopt a more lightweight and practical approach, applying manifests only once. Finally, neither Rehearsal nor Citac detects issues involving missing notifiers.
Trace Analysis. Analysis of system call traces has been widely used in the past, especially for intrusion and malware detection [2,6,24,42]. Mutlu et al. [30] collect execution traces from JavaScript applications. Their traces do not track system calls, but they capture memory and storage (e.g., cookies) accesses in the context of the browser. They split traces into blocks, where each block describes the execution of an asynchronous callback (e.g., an Ajax handler). As the execution of each handler is partially ordered, they apply a simple data-flow analysis over traces to join the states coming from different handlers. In this manner, they effectively detect data races by identifying handler pairs where the merges of their corresponding states result in different values of the same variable. In our work, we also separate the trace sequences into blocks. However, we are interested in file system operations instead of reads and writes to memory locations. Also, we apply a different methodology for discovering execution blocks that might lead to harmful scenarios.
Modeling File System Operations. Several researchers have designed specifications for the POSIX file system [17,21,31]. The specifications mainly focus on program reasoning and verification. Furthermore, Shambaugh et al. [37] have introduced fs, a small language used to model the effects of Puppet abstractions on the file system. In this work, we model system calls rather than Puppet abstractions.

CONCLUSION
We have introduced a novel and practical approach for identifying missing dependencies and notifiers in Puppet programs. Our method collects the system calls invoked by a Puppet program and models them in FStrace. Through FStrace, we capture how higher-level programming constructs, such as Puppet abstractions, interact with the operating system and derive their relationships. Then, our method checks the inferred relationships against the program's dependency graph and reports potential mismatches.
The effectiveness of our approach is exemplified by the uncovering of 57 previously unknown issues in 30 projects. Notably, we provided fixes for 21 modules, and 16 of them were accepted by the developers. We have further shown that our tool can handle realistic traces in a reasonable time. Our results indicate that our tool can be used as part of the testing process for Puppet programs.
FStrace is a generic model that can be applied to other domains with partially ordered constructs, such as the asynchronous callbacks of JavaScript. Recent work [4,43] has shown that many concurrency faults in Node.js applications are caused by data races that appear in files instead of memory locations. As future work, we are planning to leverage our method to detect such concurrency faults in JavaScript server-side applications.

Figure 2: The Abstract Architecture of our Framework.

Figure 3: An example of a trace produced by strace.

Figure 4: The syntax of FStrace.
hpathsym $d$ $p$ $m$ operates in a way similar to hpath. In hpathsym though, if the given path $p$ is a symbolic link, we do not dereference it. Through hpathsym we express system calls that do not follow symbolic links, such as lstat, lchown, and lgetxattr.
link $d_1$ $p_1$ $d_2$ $p_2$ creates a hard link that points to the same resource as the file indicated by the file descriptor $d_1$ and the path $p_1$.
open $d$ $p$ $o^*$ $f$ associates the file indicated by the path $p$ with the file descriptor $f$. A sequence of flags $o^*$ captures the operations that can be performed on the file.
rename $d_1$ $p_1$ $d_2$ $p_2$ arranges that a given file, specified through the path defined by the file descriptor $d_1$ and path $p_1$, is accessed through the new path defined by $d_2$ and $p_2$.
symlink $p_1$ $d$ $p_2$ creates a symbolic link file at the location specified by the file descriptor $d$ and path $p_2$, pointing to the path $p_1$.

Figure 8: A Missing Ordering Relationship between package and exec.

Figure 9: Manifest and its template taken from camptocamp-tomcat. We omit irrelevant code for brevity.

Figure 10: The trace analysis and fault detection time as a function of the trace size. Each spot shows the average time spent on both the trace analysis and fault detection phases for a given trace obtained by the execution of a module.

where $p_1, p_2, p_3 \in P$. The relation $\prec_g$ forms a happens-before relation between two Puppet abstractions, i.e., if there is a path (of any length) from $p_1$ to $p_2$, then the former is executed before the latter.

Table 1: Faults found in Puppet modules