Online and Oﬄine Stream Runtime Veriﬁcation of Synchronous Systems (cid:63)

. We revisit Stream Runtime Veriﬁcation for synchronous systems. Stream Runtime Veriﬁcation (SRV) is a declarative formalism to express monitors using streams, which aims to be a simple and expressive speciﬁcation language. The goal of SRV is to allow engineers to describe both correctness/failure assertions and interesting statistical measures for system proﬁling and coverage analysis. The monitors generated are useful for testing, under actual deployment, and to analyze logs. The main observation that enables SRV is that the steps in the algorithms to monitor temporal logics (which generate Boolean verdicts) can be generalized to compute statistics of the trace if a diﬀerent data domain is used. Hence, the fundamental idea of SRV is to separate the temporal dependencies in the monitoring algorithm from the concrete operations to be performed at each step. In this paper we revisit the pioneer SRV speciﬁcation language Lola and present in detail the online and oﬄine monitoring algorithms. The algorithm for online monitoring Lola speciﬁcations uses a partial evaluation strategy, by incrementally constructing output streams from input streams, maintaining a storage of partially evaluated expressions. We identify syntactically a class of speciﬁcations for which the online algo-rithm is trace length independent, that is, the memory requirement does not depend on the length of the input streams. Then, we extend the principles of the online algorithm to create an eﬃcient oﬄine monitoring algorithm for large traces, which consist on scheduling trace length independent passes on a dumped log.


Introduction
Runtime Verification (RV) is an applied formal method for software reliability that analyzes the system by processing one trace at a time. In RV a specification is transformed automatically into a monitor, and algorithms are presented This research has been partially supported by: the EU H2020 project Elastest (num. 731535), by the Spanish MINECO Project "RISCO (TIN2015-71819-P)" and by the EU ICT COST Action IC1402 ARVI (Runtime Verification beyond Monitoring).
to evaluate monitors against traces of observations from the system. There are two kinds of monitoring algorithms in RV depending on when the trace is generated and processed. In online monitoring the monitor checks the trace while the system runs, while in offline monitoring a finite collection of previously generated traces are analyzed. Online monitoring is used to detect violations of the specification when the system is in operation, while offline monitoring is used in post-mortem analysis and for testing large systems before deployment.
Static verification techniques like model checking intend to show that every (infinite) run of a system satisfies a given specification, while runtime verification is concerned only with a single (finite) trace. Thus, RV sacrifices completeness to provide an applicable formal extension of testing. See [24,21] for modern surveys on runtime verification and the recent book [4].
The first specification languages studied for runtime verification were based on temporal logics, typically LTL [22,13,6], regular expressions [28], timed regular expressions [1], rules [3], or rewriting [27]. In this paper we revisit the Stream Runtime Verification specification formalism, in particular the Lola specification language for synchronous systems [12]. The Lola language can express properties involving both the past and the future and their arbitrary combination. In SRV, specifications declare explicitly the dependencies between input streams of values-that represent the observations from the system-and output streams of values-that represent monitoring outputs, like error reports and diagnosis information. The fundamental idea of SRV is to cleanly separate the temporal reasoning from the individual operations to be performed at each step. The temporal aspects are handled in a small number of constructs to express the offsets between observations and their uses. For the data, SRV uses off-the-self domains with interpreted functions so function symbols can be used as constructors to create expressions, and their interpretation is used for evaluation during the monitoring process. The domains used for SRV are not restricted to Booleans and allow richer domains like Integers, Reals (for computing quantitative verdicts) and even queues, stacks, etc. These domains do not involve any reasoning about time. The resulting expressiveness of SRV surpasses that of temporal logics and many other existing formalisms including finite-state automata. The restriction of SRV to the domain of Booleans is studied in [10], including the expressivity, the comparison with logics and automata and the complexity of the decision problems.
The online monitoring problem of past specifications can be solved efficiently using constant space and linear time in the trace size. For future properties, on the other hand, the space requirement depends on the length of the trace for rich types (even though for LTL, that is for the verdict domain of the Booleans, one can use automata techniques to reduce the necessary space to exponential in the size of the specification). Consequently, online monitoring of future temporal formulas quickly becomes intractable in practical applications with long traces. On the other hand, the offline monitoring problem for LTL-like logics is known to be easy for purely past or purely future properties. We detail in the paper a syntactic characterization of efficiently monitorable specifications (introduced in [12]), for which the space requirement of the online monitoring algorithm is independent of the size of the trace, and linear in the specification size. This property was later popularized as trace length independence [5] and is a very desirable property as it allows online monitors to scale to arbitrarily large traces. In practice, most properties of interest in online monitoring can be expressed as efficiently monitorable properties. For the offline monitoring problem, we show an efficient monitoring strategy in the presence of arbitrary past and future combinations by scheduling trace length independent passes. We describe here the algorithm and results using the Lola specification language. An execution of the monitor extracted from a Lola specification computes data values at each position by evaluating the expressions over streams of input, incrementally computing the output streams.
Two typical specifications are properties that specify correct behavior, and statistical measures that allow profiling the system that produces the input streams. One important limitation of runtime verification is that liveness properties can never be violated on a finite trace. Hence, most of these properties have been typically considered as non-monitorable (for violation) as every finite prefix can be extended to a satisfying trace, at least if the system is considered as a black box and can potentially generate any suffix. An appealing solution that SRV supports is to compute quantitative measures from the observed trace. For example, apart from "there are only finitely many retransmissions of each package," which is vacuously true over finite traces, SRV allows to specify "what is the average number of retransmissions." Following this trend, runtime monitors can be used not only for bug-finding, but also for profiling, coverage, vacuity and numerous other analyses. An early approach for combining proving properties with data collection, which inspired SRV, appeared in [16].
In the present paper we present a simplified semantics of Lola [12] together with a detailed presentation of the monitoring algorithms as well as the necessary definitions and proofs. In the rest of the paper we use SRV and Lola interchangeably.
Related Work. The expressions that declare the dependencies between input streams and output streams in SRV are functional, which resemble synchronous languagues-which are also functional reactive stream computation languageslike Lustre [20], Esterel [9] and Signal [17], with additional features that are relevant to monitoring. The main difference is that synchronous languages are designed to express behaviors and therefore assume the causality assumption and forbid future references, while in SRV future references are allowed to describe dependencies on future observations. This requires additional expressiveness in the language and the evaluation strategies to represent that the monitor cannot decide a verdict without observing future values. These additional verdicts were also introduced for this purpose in LTL-based logics, like LTL 3 and LTL 4 [7,6,8], to encode that the monitor is indecisive.
An efficient method for the online evaluation of past LTL properties is presented in [22], which exploits that past LTL can be recursively defined using only values in the previous state of the computation. The efficiently monitorable fragment of SRV specifications generalize this idea, and apply it uniformly to both verification and data collection. One of the early systems that most closely resembles Lola is Eagle [3], which allows the description of monitors using greatest and least fixed points of recursive definitions. Lola differs from Eagle in the descriptive nature of the language, and in that Lola is not restricted to checking logical formulas, but can also express numerical queries.
The initial application domain of Lola was the testing of synchronous hardware by generating traces of circuits and evaluating monitors against these traces. Temporal testers [26] were later proposed as a monitoring technique for LTL based on Boolean streams. Copilot [25] is a domain-specific language that, similar to Lola, declares dependencies between streams in a Haskell-based style, to generate C monitors that operate in constant time and space (the fragment of specifications that Copilot can describe is efficiently monitorable). See also [18].
The simple version of Lola presented here does not allow to quantify over objects and instantiate monitors to follow the actual objects observed, like in Quantified Event Automata [2]. Lola2.0 [14] is an extension of Lola that allows to express parametrized streams and dynamically generates monitors that instantiate these streams for the observed data items. The intended application of Lola2.0 is network monitoring.
Stream runtime verification has a also been extended recently to asynchronous and real-time systems. RTLola [15] extends SRV from the synchronous domain to timed streams. In RTLola streams are computed at predefined periodic instants of time, collecting aggregations between these predefined instants using a library of building blocks. TeSSLa [11] also offers a small collection of primitives for expressing stream dependencies (see also [23]) but allows to compute timedstreams at arbitrary real-time instants. The intended application of TeSSLa is hardware based monitoring. Striver [19] offers a Lola-like language with time offsets, that allows to express explicit instants of time in the expressions between streams. Striver is aimed at testing and monitoring of cloud based systems.
The rest of the paper is structured as follows. Section 2 revisits the syntax and semantics of SRV. Section 3 presents the online monitoring of SRV specifications, including the notion of efficient monitorability. Section 4 presents the algorithm for offline monitoring, and finally Section 5 concludes.

Overview of Stream Runtime Verification
In this section we describe SRV using the Lola specification language. The monitoring algorithms will be presented in Sections 3 and 4.

Specification Language: Syntax
We use many-sorted first order interpreted theories to describe data domains. A theory is given by a finite collection T of types and a finite collection F of function symbols. Since our theories are interpreted every type T is associated with a domain D of values and every symbol f is associated with a computable function that, given elements of the domains of the arguments compute a value of the domain of the resulting type. We use sort and type interchangeably in this paper.
For example, the theory Boolean uses the type Bool associated with the Boolean domain with two values { , ⊥}, and has constant symbols true and false, and binary function symbols ∧, and ∨, unary function symbol ¬, etc all with their usual interpretations. A more sophisticated theory is Naturals, the theory of the Natural numbers, that uses two types Nat and Bool. The type Nat is associated with the domain {0, 1, . . .} of the Natural numbers, and has constant symbols 0, 1, 2, . . . and binary symbols +, * , etc of type Nat × Nat → Nat. Other function symbols in this theory are predicates <, ≤, . . . of type Nat×Nat → Bool. All our theories include equality and also, for every type T , a ternary predicate if · then · else · of type Bool × T → T . For simplicity we restrict the rest of the paper to types Nat and Bool.
Definition 1 (Stream Expression). Given a finite set Z of stream variable (each with a given type) the set of stream expressions is defined as follows: -Variable: If s is a stream variable of type T , then s is a stream expression of type T ; -Function Application: Let f : T 1 × T 2 × · · · × T k → T be a k-ary function symbol. If for 1 ≤ i ≤ k, e i is an expression of type T i , then f (e 1 , . . . , e k ) is a stream expression of type T .  A Lola specification describes a relation between input streams and output streams. A stream σ of type T and length N is a finite sequence of values from the domain of T ; σ(i), i ≥ 0 denotes the value of the stream at time step i. a finite set I of typed independent stream variables; a collection O of typed dependent stream variables; and a collections E of defining expressions, with one expression E y ∈ Expr (I∪O) for each output variable y ∈ O, where y and E y must have the same type.
We write y := E y to denote that the stream y is defined by its defining expression E y , which can use every stream variable in I ∪ O (including y itself) as atomic terms. Sometimes, Lola specifications include a collection of triggers defined by expressions of type Bool over the stream variables, with the intended meaning of informing the user when the corresponding expressions become true, but we do not use triggers in the presentation in this paper. Independent variables refer to input streams and dependent variables refer to output streams. It is often convenient to partition the dependent variables into output variables and intermediate variables to distinguish streams that are of interest to the user from those that are used only to facilitate the description of other streams. However, for the semantics and the algorithm this distinction is not important, and hence we will ignore this classification in the rest of the paper.
Example 1. Let x 1 and x 2 be stream variables of type Boolean and x 3 be a stream variable of type integer. The following is an example of a Lola specification with I = {x 1 , x 2 , x 3 } as independent variables, O = {y 1 , . . . , y 10 } as dependent variables and the following defining equations: Stream variable y 1 denotes a stream whose value is true at all positions, while y 2 denotes a stream whose values are the same at all positions as those in x 3 . The values of the streams corresponding to y 3 , . . . , y 6 are obtained by evaluating their defining expressions place-wise at each position. The stream corresponding to y 7 is obtained by taking at each position i the value of the stream corresponding to x 1 at position i + 1, except at the last position, which assumes the default value false. Similarly for the stream for y 8 , whose values are equal to the values of the stream for x 1 shifted by one position, except that the value at the first position is the default value true. The stream specified by y 9 counts the number of odd entries in the stream assigned to x 3 by accumulating (x 3 mod 2). Finally, y 10 denotes the stream that gives at each position the value of the temporal formula x 1 U x 2 with the stipulation that unresolved eventualities be regarded as satisfied at the end of the trace.
To present formal results, it is sometimes convenient to work with a simpler class of specifications.

Definition 3 (Flat).
A specification is flat if each defining expression E y is one of the following Definition 4 (Normalized). A specification is normalized if it only contains offsets 1 or −1.
Any Lola specification can be converted into a flat specification by introducing extra stream variables as place-holders for complex sub-expressions. Similarly, any Lola specification can be converted into a normalized specification by introducing additional stream variables defined to carry value n−1 for offsets of n > 1 (and n + 1 for offsets of n < −1). This transformation also preserves flatness so every Lola specification can be converted into a normalized flat specification.

Specification Language: Semantics
In order to define the semantics of SRV specifications we first define how to evaluate expressions. Consider a map σ I that assigns one stream σ x of type T and length N for each input stream variable x of type T , and a map σ O : {. . . , σ y , . . .} that contains one stream σ y of length N for each defined stream variable y (again of the same type as y). We call (σ I , σ O ) an interpretation of ϕ, and use σ as the map that assigns the corresponding stream as σ I or σ O (depending on whether the stream variable is an input variable or an output variable).

Definition 5 (Valuation).
Given an interpretation (σ I , σ O ) a valuation is a map · that assigns to each expression a stream of length N of the type of the expression as follows: We now can define when an interpretation (σ I , σ O ) of ϕ is an evaluation model, which gives denotational semantics to Lola specifications.

Definition 6 (Evaluation Model). An interpretation
For a given set of input streams, a Lola specification may have zero, one, or multiple evaluation models. where x has type Nat and y has type Bool .
For any given input stream σ x , ϕ 1 has exactly one evaluation model (σ x , σ y ), where σ y (i) = true if and only if σ x (i) ≤ 10, for 1 ≤ i ≤ N . The specification ϕ 2 , however, may give rise to multiple evaluation models for a given input stream. For example, for input stream σ x : 0, 15, 7, 18 , both σ y : false, false, false, false and σ y : false, true, false, true make (σ x , σ y ) an evaluation model of ϕ 2 . The specification ϕ 3 , on the other hand, has no evaluation models, because there is no solution to the equations σ y (i) = ¬σ y (i).

Well-definedness and Well-formedness
SRV specifications are meant to define monitors, which intuitively correspond to queries of observations of the system under analysis (input streams) for which we want to compute a unique answer (the output streams). Therefore, the intention of a specification is to define a function from input streams to output streams, and this requires that there is a unique evaluation model for each instance of the input streams. The following definition captures this intuition.
Definition 7 (Well-defined). A Lola specification ϕ is well-defined if for any set of appropriately typed input streams σ I of the same length N > 0, there exists a unique valuation σ O of the defined streams such that (σ I , σ O ) |= ϕ.
A well-defined Lola specification maps a set of input streams to a unique set of output streams. Unfortunately well-definedness is a semantic condition that is hard to check in general (even undecidable for rich types). Therefore, we define a more restrictive (syntactic) condition called well-formedness, that can be easily checked on every specification ϕ and implies well-definedness. We first add an auxiliary definition.
An edge y k → v encode that y at a particular position potentially depends on the value of v, offset by k positions. Note that there can be multiple edges between x and y with different weights on each edge. Also note that vertices that correspond to input variables do not have outgoing edges.
A walk of a graph is a sequence v 1 The weight of a walk is the sum of the weights of its edges. A simple walk is a walk in which no vertex is repeated. A cycle is a simple closed walk.
Definition 9 (Well-Formed Specifications). A Lola specification ϕ is wellformed if its dependency graph has no closed walk with weight zero.
Its normalized specification is The dependency graph of the normalized specifications is: This specification has a zero-weight closed walk, namely y 1 0 − → y 9 0 − → y 2 0 − → y 1 , and hence the specification is not well-formed.
To prove that well-formedness implies well-definedness, we first define the notion of an evaluation graph which captures the dependencies for a given input length N .
Definition 10 (Evaluation graph). Given a specification ϕ and a length N , the evaluation graph is the directed graph G N : V, E where V contains one vertex v j for each position j of each stream variable v and there is an edge y j → v j if E y contains v as an atom, and there is an edge The vertices v j are called position variables as they encode the value of stream variable v at position j. We will prove later that a specification is guaranteed to be well-defined if no evaluation graph for any length contains a cycle, because in this case the value of each position variable can be uniquely determined. The following lemma relates this acyclicity notion with the absence of zero-weight cycles in the dependency graph. Lemma 1. Let ϕ be a specification with dependency graph D, let N be a trace length and G N the explicit dependency graph. If G N has a cycle then D has a zero-weight closed walk.
The corresponding closed walk in D is Note that the closed walk induced in D needs not be a cycle since some of the intermediate nodes may be repeated, if they correspond to the same y k for different position j.
Lemma 2. Let ϕ be a specification and N a length. If G N has no cycles, then for every tuple σ I of input streams of length N , there is a unique evaluation model.
Proof. Assume G N has no cycles, so G N is a DAG. Then we can define a topological order > on G N by taking the transitive closure of →. We prove by induction on this order that the value of each vertex is uniquely determined, either because this value is obtained directly from an input stream or constant value in the specification, or because the value can be computed from values computed before according to >.
For the base case, the value of a vertex v j without outgoing edges does not depend on other streams. The only possible value is either the value of an input stream (if v is an input stream variable) at position j, or a value obtained from an equation with no variables or offsets as atoms. In all these cases the value is uniquely determined.
For the inductive case, the value of v j can be computed uniquely from the values of its adjacent vertices in G N . Indeed, by Definition 10, if the value of v j depends on the value of v k then there exists an edge v j → v k in G N and thus v j > v k and, by the inductive hypothesis, the value of v k is uniquely determined. Then, since every atom in E v (j) is uniquely determined, the value of E v (j) is uniquely determined. Since this value has been computed only from inputs, this is the only possible value for σ v (j) to form an evaluation model.
Consider now a well-formed specification ϕ. Then, by Lemma 1, no evaluation graph has cycles, and thus by Lemma 2 for every set of input streams, there exists a unique solution for the output streams, and hence there is exactly one evaluation model. Theorem 1. Every well-formed Lola specification is well-defined.
Note that the converse of Theorem 1 does not hold. First, the absence of cycles in G N does not imply the absence of a zero-weight closed walk in D. For example, the evaluation graph for the specification for N < k has no cycles (since it has no edges), but it is easy to see that D has a zero-weight closed walk. Second, a cycle in G N does not necessarily imply that ϕ is not well-defined. For example, the evaluation graph of the specification y := (z ∨ ¬z) ∧ x z := y has a cycle for all N , but for every input stream, ϕ has exactly one evaluation model, namely σ y = σ z = σ x , and thus, by definition, the specification is welldefined.

Checking Well-Formedness
A Lola specification ϕ is well-formed if its dependency graph D has no closed walks, so checking well-formedness is reduced to construct D and check for closed walks. In turn, this can be reduced to checking for cycles as follows.
Let a gez-cycle be a cycle in which the sum of the weight of the edges is greater than or equal to zero, and let a gz-cycle be a cycle in which the sum of the weight of the edges is strictly greater than zero. Similarly, a lez-cycle is a cycle where the sum is less than or equal zero and a lz-cycle is one where the sum is less than zero. The reduction is based on the observation that a graph has a zero-weight closed walk if and only if it has a maximally strongly component (MSCC) with both a gez-cycle and a lez-cycle. Proof. (⇒) Assume v is part of gez-cycle C 1 and lez-cycle C 2 , with weights w 1 ≥ 0 and w 2 ≤ 0, respectively. The closed walk consisting of traversing w 1 times C 2 and then traversing |w 2 | times C 1 has weight w 1 w 2 + |w 2 |w 1 = 0, as desired.
(⇐) Assume D has a zero-weight closed walk. If D has a zero-weight cycle C we are done, as C is both a gez-cycle and a lez-cycle and any vertex in C has the desired property.
For the other case, assume D has no zero-weight cycles. It is easy to show by induction in the length of W that every closed walk can be decomposed into cycles that share one vertex. If one of these cycles is a lez-cycle or a gez-cycle the result follows. Now, not all the cycles can be strictly positive, because then the total weight of W would not be zero. Consequently there must a positive cycle and a negative cycle, and therefore there must be two consecutive cycles C 1 and C 2 that share one node and C 1 is positive and C 2 is negative. Proof. (⇒) Consider an arbitrary MSCC with only gz-cycles (the case for only lz-cycles is analogous). By the proof of Lemma 3, a closed walk is the multiset union of one or more cycles with weight the sum of the weights of the cycles. Hence the weight of any closed walk within the MSCC must be strictly greater than zero. Since any closed walk must stay within an MSCC, the weight of any closed walk must be strictly greater than zero.
(⇐) Assume D has no zero-weight closed walk. Then, by Lemma 3, D has no vertex that lies on both a gez-cycle and a lez-cycle. Suppose D has an MSCC with a gz-cycle C 1 and a lz-cycle C 2 . Consider an arbitrary vertex v 1 on C 1 and v 2 on C 2 . If v 1 = v 2 = v, v lies on both a gez-cycle and a lez-cycle, a contradiction. If v 1 = v 2 , since v 1 and v 2 are in the same MSCC, there exists a cycle C 3 that contains both v 1 and v 2 . C 3 is either a zero-weight cycle, a gz-cycle or a lz-cycle. In all three cases either v 1 or v 2 or both lie on both a gez-cycle and a lez-cycle, a contradiction.
Thus to check well-formedness of a SRV specification ϕ it is sufficient to check that each MSCC in G has only gz-cycles or only lz-cycles. This can be checked efficiently, even for large dependency graphs.

Online Monitoring
We distinguish two situations for monitoring-online and offline monitoring. In online monitoring, the traces from the system under observation are received as the system run, and the monitor works in tandem with the system. This leads to the following restriction for online monitoring: the traces are available a few points at a time starting at the initial instant on-wards, and need to be processed to make way for more incoming data. In particular, random access to the traces is not available. The length of the trace is assumed to be unknown upfront and very large.
In offline monitoring, on the other hand, we assume that the system has run to completion and the trace of data has been dumped to a storage device. Offline monitoring is covered in Section 4.

Monitoring Algorithm
We start by exhibiting a general monitoring algorithm for arbitrary Lola specifications, and then study its efficiency. Let ϕ be a Lola specification with independent stream variables I, dependent stream variables O and defining expressions E. Let j be the current position, at which the latest data is available from all input streams. The monitoring algorithm maintains two sets of equations as storage: An equation (v k , c) stored in R denotes that stream variable v at position k in the trace has been determined to have value c. This happens in two cases: input streams whose reading has been performed, and dependent stream variables whose value has been computed. Equations in U relate position variables y kwhere y is a dependent stream variable-with a (possibly partially simplified) expression over position variables whose values have not yet been determined. Note that if (y k , e) is in U then e must necessarily contain at least one position variable, because otherwise e is a ground expression and the interpreted functions from the domain can transform e into a value. The monitoring algorithm is shown in Algorithm 1. After initializing the U and R stores to empty and j to 0, the monitoring algorithm executes repeatedly the main loop (lines 5 to 11). This main loop first reads values for all inputs at the current position and adds these values to R (line 6). Then, it instantiates the defining equations for all outputs and adds these to U (line 7). Finally, it propagates new known values (v k , c) in R by substituting all occurrences of v k in unresolved equations by c and then simplifies resulting equations (procedure Propagate). This procedure simply uses all the information in R to substitute occurrences of known values in unresolved equations. In some cases, these equations become resolved (the term becomes a value) and the corresponding pair is moved to R (lines 23 and 24). Then, the procedure Prune is used to eliminate unnecessary information from R as described below. Finally, procedure Finalize is invoked at the end of the trace. This procedure is used to determine whether a given offset expression that remains in an unresolved equation falls beyond the end of the trace, which is converted into its default value. This procedure also performs a final call to Propagate, which is guaranteed (see below) to resolve all position variables, and therefore U becomes empty.
Procedure Inst, shown in Algorithm 2, instantiates the defining equation for v into the corresponding equation for v j at given position j by propagating the value into the atomic stream variable references and offsets atoms, which become instance variables. Note that the default value c is recorded in line 57 in case the computed position k + j falls beyond the end of the trace N , which is not known at the point of the instantiation. Whether k + j is inside the trace will be determined after k steps or resolved by Finalize.

Algorithm 1 Monitoring algorithm
while not finished do 6: R ← R ∪ {(y j , σy(j)) | for every y ∈ I} Add new inputs to R 7: Inst(ex, j)) | for every x ∈ O} Add output instances to U 8: Propagate() 9: Prune(j) 10: j ← j + 1 11: end while 12: N ← j + 1 13: Finalize(N ) 14: end procedure 15: procedure Propagate 16: repeat 17: change ← false 18: for all (v k , e) ∈ U do Try to resolve every v k in U 19: e ← simplify(subst(e, R)) 20: if e is value then 22: change ← true v k is resolved 23: return f (Inst(e1, j), . . . , Inst(en, j)) 51: case v 52: return v j 53: case v[k, c] 54: if k + j < 0 then 55: return c 56: else 57: return v k+j c 58: end if 59: end procedure We show now how the resolved storage R can be pruned by removing information that is no longer necessary. The back reference distance of a stream variable represents the maximum time steps that its value needs to be remembered.
Definition 11 (Back Reference Distance). Given a specification ϕ with dependency graph D the back reference distance ∇v of a vertex v is Example 5. We illustrate the use of back reference distances for pruning R (lines 31 and 32) revisiting Example 4. The back reference distances are ∇y 1 = ∇y 10 = ∇y 11 = ∇y 12 = ∇y 13 = ∇y 14 = ∇y 15 = ∇y 16 = ∇x 1 = 0 and ∇y 2 = ∇x 2 = 2. Consequently, all equations (v j , c) are removed from R in the same time step that they are entered in R, except for y j 2 and x j 2 , which must remain in R for two time steps until instant j + 2.
Example 6. Consider the following specification which computes p U q. For input streams σ p : false, false, true, false and σ q : true, false, false, false the equations in stores R and U at the completion of step (3) of the algorithm at each position are: Since the back reference distance of all stream variables is 0, all equations can be removed from R at each position.

Theorem 3 (Correctness).
Let ϕ be a specification and σ I be input streams of length N . If ϕ is well-formed, then Algorithm 1 computes the unique evaluation model of ϕ for σ I . That is, at the end of the trace the unique value has been computed for each y k , and U is empty.
Proof. Assume ϕ is well-formed. By Definition 9 the dependency graph D has no zero-weight closed walks and hence by Lemma 1, the evaluation graph G N has no cycles, and we can define a topological order < in G N . As in the proof of Lemma 1, every vertex of G N can be mapped to the corresponding value of the unique evaluation model. We prove by induction on G N that at the end of the trace each of these values has been computed and that each value has been available in R at some point j < N during the computation.
For the base case, leaf vertices v j correspond to either input stream variables or values from equations of the form x = c or x = y[k, c] such that j + k < 0. In both cases the value is uniquely obtained and the corresponding equation is added to R.
For the inductive case, the value for vertex v j is uniquely computed from the values for vertices w k such that v j → w k , and hence w k < v j and by the inductive hypothesis, the value for w k is uniquely computed or obtained and is at some point available in R. It remains to be shown that these values are available in R for substitution. We distinguish three cases: 1. j = k. In this case (v j , e) and (w k , e ) are added to U (or R) at position j (either in line 6 or in line 7). If (w k , e ) is added to R, the value of w k in e is substituted in e in line 19. If (w k , e ) is added to U , by the inductive hypothesis, it is available at some later point in the computation. Then it must be moved to R in line 23, and hence in the same step it is substituted in e. 2. j < k. In this case (w k , e ) is added to U (or R) after (v j , e) is added to U . Again, by the inductive hypothesis, (w k , c) will be resolved and become available in R at some position l < N and thus at that same position is substituted in e if (v j , e) is still in U . 3. j > k. In this case E v contains w[i, c] and thus k = j + i (i.e. i < 0). Now, (w k , c) is added to R or U before (v j , e) is added to U . Again, by the inductive hypothesis, w k will be resolved at some position l ≤ N , which must be after k. By the definition of k, (w k , c) will be in R at least until k + ∇w which is guaranteed to be at j or after and hence be available when v j is added to U .
This finishes the proof.

Efficiently Monitorable Specifications
In the general case the algorithm Monitor described above is linear in both time and space in the length of the trace and the size of the specification. In these bounds, we assume that the value of a type can be stored in a single register of the type, and that a single function is computed in a single step.
In online monitoring, since the traces are assumed to be large, it is generally assumed that a specification can be monitored efficiently only if the memory requirements are independent of the trace length. In general, for any input stream σ x , output stream σ z has all its values equal to the last value of σ x . However, for all j, equations (w j , z j+1 0 ) and (z j , if last j then x j else w j ) remain unresolved until the end of the trace, and thus the memory requirements of Algorithm 1 for this specification are linear in the length of the trace.
The worst-case memory usage of a Lola specification for a given trace length can be derived from the evaluation graph with the aid of the following definitions.
Definition 12 (Fan and Latency). The fan of a vertex v j of an evaluation graph G N is the set of vertices reachable v j : The latency of a position variable v j is the difference between j and the position of the furthest vertex in fan(v j ): The fan of v j is an over-approximation of the set of vertices on which the value of v j depends. The latency is an upper-bound on the number of trace steps it takes before a value at a given position is guaranteed to be resolved.
Theorem 4. If a vertex v j has latency k, then the corresponding equation (v j , e) will be fully resolved by Monitor at or before step j + k.
Proof. Since the specification is well-formed the evaluation graph is acyclic. We show the results by induction on a topological order < of the evaluation graph. Note that if v j → w i then lat(v j ) ≥ lat(w i ) directly by the definition of latency. Then, at position j + k it is guaranteed that w i is resolved. Since all atoms in the expression e of equation (v j , e) are resolved at j + k or before, the corresponding values are subsituted in e (line 19) at step j + k or before, so e is simplified into a value at j + k or before.
Example 8. Consider again the specification of Example 7. The latency of z 2 is N − 2, so equations for z 2 may reside in U for N − 2 steps, so this specification cannot be monitored online in a trace-length independent manner.

Definition 13 (Efficiently Monitorable).
A Lola specification is efficiently monitorable if the worst case memory usage of Monitor is independent of the length of the trace.
Some specifications that are not efficiently monitorable may be rewritten into equivalent efficiently monitorable form, as illustrated by the following example.
Example 9. Consider the specification "Every request must be eventually followed by a grant before the trace ends", expressed as ϕ 1 as follows: reqgrant := if request then evgrant else true evgrant := grant ∨ nextgrant nextgrant := evgrant [1, false] This specification encodes the temporal assertion (request → grant). Essentially, evgrant captures grant and reqgrant corresponds to (request → grant) (see [12] and [10] for a description of translation from LTL to Boolean SRV). An alternative specification ϕ 2 of the same property is It is easy to see that, for the same input, ended ∧ waitgrant is true at the end of the trace (for ϕ 2 ) if and only if ¬nextgrant is true at the beginning of the trace for ϕ 1 . Hence, both specifications can report a violation at the end of the trace if a request was not granted. The second specification, however, is efficiently monitorable, while the first one is not.
Similar to the notion of well-definedness, checking whether a specification is efficiently monitorable is a semantic condition and cannot be checked easily in general. Therefore we define a syntactic condition based on the dependency graph of a specification that guarantees that a specification is efficiently monitorable.
Definition 14 (Future Bounded). A well-formed specification ϕ is future bounded if its dependency graph D has no positive-weight cycles.
We show that every future bounded specification is efficiently monitorable by showing that in the absence of positive-weight cycles every vertex in the dependency graph can be mapped to a non-negative integer that provides an upper-bound on the number of trace steps required to resolve the equation for the corresponding instance variable.
Definition 15 (Look-ahead Distance). Given a future bounded specification with dependency graph D, the look-ahead distance ∆v of a vertex v is the maximum weight of a walk starting from v (or zero if the maximum weight is negative).
Note that the look-ahead distance is well defined only in the absence of positive-weight cycles. The look-ahead distance of a vertex can be computed easily using shortest path traversals on the dependency graph D. The dependency graph D of this specification is: Consequently, the values of the look-ahead distance are: ∆y 4 = 1 ∆y 5 = 0 ∆y 6 = 3 ∆y 7 = 2 ∆y 8 = 0 ∆y 9 = 7 which are easily computer from D.
The look-ahead distance provides an upper-bound on the number of equations that may simultaneously be in U . Example 11. Consider again the specification of Example 10. The back reference distance is 0 for all variables except for x 2 and y 3 , which are ∇x 2 = 1, ∇y 3 = 7. Hence, at the end of every main loop, R only contains one instance of x 2 and seven instances of y 3 . Additionally, the look-ahead distance of a stream variable v bounds linearly the number of instances of v in U .

Corollary 1. Every future-bounded specification is efficiently monitorable.
Note that the converse does not hold. In practice, it is usually possible to rewrite an online monitoring specification with a positive cycle into one without positive cycles, as illustrated in Example 9.

Offline Monitoring
In offline monitoring we assume that all trace data is available on tape, and therefore we can afford more flexibility in accessing the data. In this section we show that every well-formed SRV specification can be monitored efficiently offline, in contrast to online monitoring where we required that the dependency graph not have any positive-weight cycles. The reason why we can efficiently monitor in an offline manner all specifications is that we can perform both forward and backward passes over the trace. We will show that every well-formed specification can be decomposed into sub-specifications such that each sub-specification needs to be checked only once and can be done so efficiently by either traversing the trace in a forward or in a backward direction. In this manner, all values of the output streams of a sub-specification can written to tape and are accessible for subsequent traversals.
We first define the notions of reverse efficiently monitorable and its corresponding syntactic condition, past bounded, as the duals of efficiently monitorable and future bounded. A reverse monitoring algorithm RevMonitor can be easily obtained by initializing j to N (line 4) decreasing j (line 10), pruning j on the dual of the back-reference distance in line 31 and performing substitutions when the offset becomes negative (so Finalize is not necessary for reverse monitoring). This is essentially the same algorithm as Monitor but performing the index transformation j = N − (j + 1).

Definition 16 (Reverse Efficiently Monitorable).
A Lola specification is reverse efficiently monitorable if its worst-case memory requirement when applying RevMonitor is independent of the length of the trace.
Definition 17 (Past Bounded). A well-formed Lola specification is past bounded if its dependency graph has no negative-weight cycles.
Proof. The dual of the argument for Corollary 1.
We construct now an offline algorithm that can check a well-formed Lola specification in a sequence of forward and reverse passes over the tapes, such that the number of passes is linear in the size of the specification and each pass is trace-length indepedent.
Let ϕ be a well-formed specification with dependency graph D. From the definition of well-formedness it follows that D has no zero-weight cycles, so each MSCC consists of only negative-weight or only positive-weight cycles. Let G M : {V p , V n } , E M be the graph induced by the MSCCs of D defined as follows. For each positive-weight MSCC in D there is a vertex in V p and for each negative-weight MSCC in D there is a vertex in V n . For each edge between two MSCCs there is an edge in E M connecting the corresponding vertices. Clearly, G M is a DAG. Now we assign each MSCC a stage that will determine the order of computing the output for each MSCC following the topological order of G M . Positive MSCCs will be assigned even numbers and negative MSCCs will be assigned odd numbers. Every MSCC will be assigned the lowest stage possible that is higher than that of all its descendants with opposite polarity. In other words, the stage of an MSCC v is at least the number of alternations in a path in G M from v. Formally, let the opposite descendants be defined as follows: which can be computed following a topological order on G M . Each vertex v in G M can be viewed as representing a sub-specification ϕ v whose defining equations refer only to stream variables in sub-specifications with equal or lower stage processing order. Based on this processing order we construct the following algorithm.
Algorithm 3 Offline Trace Processing 1: procedure OfflineMon 2: for i = 0 to max(stage(v)) with increment 2 do 3: for all v with stage(v) = i do 4: Monitor(ϕv) Forward pass 5: end for 6: for all v with stage(v) = i + 1 do 7: RevMonitor(ϕv) Backward pass 8: end for 9: end for 10: end procedure Theorem 6. Given a well-formed specification, a trace can be monitored in time linear in the size of the specification and the length of the trace, with memory requirements linear in the size of the specification and independent of the length of the trace.
Proof. Follows directly from Lemmas 1 and 5 and Algorithm 3.
Example 12. Figure 1 shows the dependency graph of a Lola specification and its decomposition into MSCCs, along with its induced graph G M annotated with the processing order of the vertices. MSCCs G 1 and G 4 are positive, while G 2 and G 5 are negative. G 3 is a single node MSCC with no edges, which can be chosen to be either positive or negative. The passes are: G 5 is first monitored forward because it is efficiently monitorable. Then, G 4 is monitored backwards. After that, G 3 and G 2 are monitored forward. Finally, G 1 is monitored backwards.

Conclusions
We have revisited Stream Runtime Verification, a formalism for runtime verification based on expressing the functional relation between input streams and output streams, and we have presented in detail evaluation strategies for online and offline monitoring. SRV allows both runtime verification of temporal specifications and collection of statistical measures that estimate coverage and specify complex temporal patterns. The Lola specification language is sufficiently expressive to specify properties of interest to applications like large scale testing, and engineers find the language easy to use. Even specifications with more than 200 variables could be constructed and understood relatively easily by engineers. Even though the language allows ill-defined specifications, SRV provides a syntactic condition that is easy to check and that guarantees well-definedness, using the notion of a dependency graph. Dependency graphs are also used to check whether a specification is efficiently monitorable online, that is, in space independent of the trace length. In practical applications most specifications of interest are in fact efficiently monitorable or can be rewritten into an efficiently monitorable fashion. We revisited the online algorithm for Lola specifications, and presented an algorithm for offline monitoring whose memory requirements are independent of the trace length for any well-formed specification.
The design of the Lola specification language was governed by ease of use by engineers. In runtime verification, unlike in static verification, one is free to choose Turing-complete specification languages. As a result researchers have explored the entire spectrum from temporal logics to programming languages. The advantage of a programming language in comparison with a temporal logic is that a declarative programming language is more familiar to engineers and large specifications are easier to write and understand. The disadvantage is that the semantics is usually tied to the evaluation strategy (typically in an informal implicit manner) and the complexity is hard to determine, while the semantics of a logic is independent of the evaluation strategy and upper bounds for its complexity are known. In practice, the choice is motivated by the intended use. Stream Runtime Verification is usually employed to facilitate the task of writing large specifications for engineers, so the natural choice was a programming-language. SRV retains, however, most of the advantages of a logic: the semantics is independent of the evaluation strategy and the efficiently monitorable specifications provide a clear bound on complexity. See [10] where we study decision procedures and complexities of decision problems for Boolean Stream Runtime Verification. For example, comparing Lola with specification languages at the other end of the spectrum, such as Eagle [3], Lola usually allows simpler specifications, as illustrated in Figure 2.
Eagle encoding of the operator 2pF with intended meaning that F holds with at least probability p over a given sequence [3]: