Striver: Stream Runtime Verification for Real-Time Event-Streams

Abstract. We study the problem of monitoring rich properties of real-time event streams, and propose a solution based on Stream Runtime Verification (SRV), where observations are described as output streams of data computed from input streams of data. SRV allows a clean separation between the temporal dependencies among incoming events and the concrete operations that are performed during the monitoring. SRV specification languages typically assume that all streams share a global synchronous clock and that input events arrive in a synchronous manner. In this paper we generalize the time assumption to cover real-time event streams, but keep the essential explicit time dependencies present in synchronous SRV languages. We introduce Striver, which shares with SRV its simplicity and the separation between timing reasoning and the data domain. Striver is a general language in which other real-time monitoring languages can be expressed. We show in this paper translations from other formalisms for (piecewise-constant) signals and timed event streams. Finally, we report an empirical evaluation of an implementation of Striver.


Introduction
Runtime verification (RV) is a lightweight formal method that studies the problem of whether a single trace from the system under analysis satisfies a formal specification. From the point of view of coverage, static verification must consider all possible executions of the system while RV only considers the traces observed. In this manner, RV sacrifices completeness but offers a readily applicable formal method that can be combined with testing or debugging. See [16,20] for surveys on RV, and the recent book [2]. Early specification languages proposed in RV were based on temporal logics [17,12,7], regular expressions [25], timed regular expressions [3], rules [5], or rewriting [24].
Stream runtime verification (SRV), pioneered by Lola [11], defines monitors by declaring the dependencies between output streams (results) and input streams (observations). The main idea of SRV is that the same sequence of operations performed during the monitoring of a temporal logic formula can be followed to compute statistics of the input trace, if the data type and the operations are changed. The generalization of the outcome of the monitoring process to richer verdict values brings runtime verification closer to monitoring and data stream processing. See [22,15,8] for further works on SRV. Temporal testers [23] were later proposed as a monitoring technique for LTL based on Boolean streams. SRV was initially conceived for monitoring synchronous systems, where all observations proceed in cycles. In this paper we present a specification formalism for timed asynchronous observations, where streams are sequences of timed events that do not necessarily happen at the same time in all input streams, but where all time-stamps are totally ordered according to a global clock (following the timed asynchronous model of distributed systems [10]). The formalism that we propose targets outline, non-intrusive monitoring (see [18] for definitions), where the model of time is that of timed asynchronous distributed systems. Our target applications are the monitoring and testing of cloud systems and multi-core hardware monitoring, where this assumption is reasonable.
Related work. The work [19] presents an asynchronous evaluation engine for a simple event stream language for timed events, based on a collection of language constructs that compute aggregations. This language does not allow explicit time references and offsets. Moreover, recursion is not permitted and all recursive computations are encapsulated implicitly in the language constructs. A successor of [19] is TeSSLa [9], which allows recursion and offers a smaller collection of language constructs. Still, TeSSLa precludes explicit offset dependencies, and its target application domain is hardware-based monitoring. We sketch later in the paper that Striver subsumes TeSSLa. Another similar work is RTLola [14], which also aims to extend SRV from the synchronous domain to timed streams. However, in RTLola defined streams are computed at predefined periodic instants of time, collecting aggregations between these predefined instants using language constructs. In this manner, the output streams in RTLola are isochronous, while in Striver defined streams are computed at the specific real-time instants where they are required, resulting in a completely asynchronous SRV system (in the sense that streams can tick at arbitrary time points). Striver can be used as a low-level language to compile TeSSLa, RTLola and similar specifications.
The rest of the paper is organized as follows. Section 2 describes the Striver specification language. Section 3 presents a trace-length independent online monitoring algorithm. Section 4 shows some extensions of Striver. Section 5 reports on an empirical evaluation and Section 6 concludes the paper.

The Striver Specification Language
In this section we introduce Striver, a specification language that allows defining efficiently monitorable specifications [11], those for which all streams can be resolved immediately. We show in Section 3 an online monitoring algorithm and prove that this algorithm is also trace length independent.

Preliminaries
The main idea behind SRV is to separate two concerns: the temporal dependencies and the data manipulated, for which we use data domains.
Data Domains. We use many-sorted first-order logic to describe data domains. A simple theory, Booleans, has only one sort, Bool, two constants true and false, binary functions ∧ and ∨, unary function ¬, etc. A more sophisticated signature is Naturals, which consists of two sorts (Nat and Bool), with constant symbols 0, 1, 2, ... of sort Nat, binary symbols +, *, etc. (of sort Nat × Nat → Nat), as well as predicates <, ≤, etc. of sort Nat × Nat → Bool, with their usual interpretation. All theories have equality and are typically (e.g. Naturals, Booleans, Queues, Stacks, etc.) equipped with a ternary symbol if · then · else ·. In the case of Naturals, the if · then · else · symbol has sort Bool × Nat × Nat → Nat.
Our theories are interpreted, so each sort S is associated with a domain $D_S$ (a concrete set of values), and each function symbol f is interpreted as a total computable function of the given arity that, given elements of the argument domains, produces a value in the domain of the result. For simplicity, we omit the sort S from $D_S$.
We will use stream variables with an associated sort, but from the point of view of the theories, these stream variables are atoms. As usual, given a set of sorted atoms A and a theory, the set of terms is the smallest set containing A and closed under the use of the function symbols of the theory as constructors (respecting sorts).
We consider a special time domain T, whose interpretation is a (possibly infinite, possibly dense) set with a total order and a minimal element 0, and a binary addition symbol +. Examples of time domains are $\mathbb{R}^+_0$, $\mathbb{Q}^+_0$ and $\mathbb{N}_0$ with their usual order. Given $t_a, t_b \in T$ we use $[t_a, t_b] = \{t \in T \mid t_a \leq t \leq t_b\}$, and also $(t_a, t_b)$, $[t_a, t_b)$ and $(t_a, t_b]$ with the usual meaning. We say that a set of time points $S \subseteq T$ does not contain bounded infinite subsets whenever, for every $t_a, t_b \in T$, the set $S \cap [t_a, t_b]$ is finite, in which case we say that S is a non-Zeno set.
We extend every domain D into $D_\bot$, which includes two special fresh symbols $\bot^{D}_{notick}$ and $\bot^{D}_{outside}$. These new symbols capture, respectively, when a stream does not generate an event, and when a time offset falls off the beginning or the end of the trace.
Streams. Monitors observe sequences of events as inputs, where each event is time-stamped and contains a data value from its domain.
Definition 1 (Event stream). An event stream of sort D is a partial function $\eta : T \rightharpoonup D$ such that dom(η) does not contain bounded infinite subsets, where dom(η) is the subset of T where η is defined.
We introduce some notation for event streams. We write $E_D$ for the set of event streams of sort D. The functions $prev_<$ and $prev_\leq$, with type $E_D \times T \to T_\bot$, are defined as follows:

$$prev_<(\sigma, t) = \sup(dom(\sigma) \cap [0, t)) \qquad prev_\leq(\sigma, t) = \sup(dom(\sigma) \cap [0, t])$$

Note that these functions can return a value in $T_\bot$ because sup returns $\bot^{T}_{outside}$ when the stream has no event in the interval provided.
Essentially, given a stream σ and a time instant t ∈ T, the expression $prev_<(\sigma, t)$ provides the nearest time instant in the past of t at which σ is defined. Similarly, $prev_\leq(\sigma, t)$ returns t if t ∈ dom(σ), and otherwise behaves as $prev_<$.
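As an illustration of the two functions above, the following Python sketch computes them over a finite stream whose domain is kept as a sorted list of time-stamps. The names `prev_lt`, `prev_le` and the `OUTSIDE` marker are hypothetical and chosen only for this example; they are not part of Striver.

```python
from bisect import bisect_left, bisect_right

OUTSIDE = None  # hypothetical stand-in for the "outside" value of the time domain


def prev_lt(times, t):
    """Latest event time strictly before t, or OUTSIDE if there is none."""
    i = bisect_left(times, t)       # number of events with time < t
    return times[i - 1] if i > 0 else OUTSIDE


def prev_le(times, t):
    """Latest event time at or before t, or OUTSIDE if there is none."""
    i = bisect_right(times, t)      # number of events with time <= t
    return times[i - 1] if i > 0 else OUTSIDE


ticks = [1.0, 2.5, 4.0]             # sorted domain of an example stream
print(prev_lt(ticks, 2.5), prev_le(ticks, 2.5), prev_lt(ticks, 1.0))
```

Because the domain of an event stream is non-Zeno, any bounded prefix is finite, so a sorted list with binary search is one reasonable representation for such a prefix.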
Synchronous SRV. In synchronous SRV, specifications are given by associating every output stream variable y with a defining equation that, once the input streams are known, associates y to an output stream. A typical example defines two output streams: always_p, which calculates whether the Boolean input stream p was true at every point in the past (that is, the past-time property "historically p"), and count_p, which counts the number of times p was true in the past. Offset expressions like count_p[-1,0] refer to streams at a different position (in this case the previous position), with a default value for when there is no previous position (at the beginning of the trace). In this paper we introduce a similar formalism for timed event streams. Our goal is to provide a simple language with few constructs, including explicit references to the previous position at which some stream contains an event, in contrast to other stream languages like TeSSLa [9] and RTLola [14], which preclude reasoning about real-time instants. We say that Striver is an explicit time SRV formalism.
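To make the synchronous setting concrete, the following Python sketch evaluates the two streams over a finite Boolean trace. It assumes the defining equations are "always_p is p and always_p at offset -1 (default true)" and "count_p is count_p[-1,0] plus one when p holds"; these equations and the function name `monitor_sync` are assumptions made for the illustration.

```python
def monitor_sync(p):
    """Evaluate always_p and count_p over a finite synchronous Boolean trace p."""
    always_p, count_p = [], []
    for i, p_now in enumerate(p):
        # Offset -1 with a default value for the first position of the trace.
        prev_always = always_p[i - 1] if i > 0 else True
        prev_count = count_p[i - 1] if i > 0 else 0
        always_p.append(p_now and prev_always)
        count_p.append(prev_count + (1 if p_now else 0))
    return always_p, count_p


print(monitor_sync([True, True, False, True]))
```

Every stream here has a value at every position, which is exactly the assumption that no longer holds for timed event streams.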

Syntax of Striver
A Striver specification describes the relation between input event-streams and output event-streams, where an input stream is a sequence of observations from the system under analysis.
The key idea in Striver is to associate each defined stream variable y with a ticking expression $T_y$, which captures when the stream may contain an event, and a value expression $V_y$, which defines the value contained in the event.
Note that in synchronous SRV only a value expression is necessary, because every stream has a value at every clock tick. In practice, it is very useful to let $T_y$ define an over-approximation of the set of instants at which y ticks, and then allow the value expression to evaluate to $\bot^{D}_{notick}$. The stream associated with y does not contain an event at t if $V_y$ evaluates to $\bot^{D}_{notick}$ at t, even if t is in $T_y$. For example, if one wishes y to filter out events from a given stream x, it is simple to define in $T_y$ that y ticks whenever x does, and to delegate to $V_y$ the decision of whether an event is relevant or should be filtered out.
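The filtering idiom just described can be sketched in Python as follows. The ticking expression is modelled as a set of candidate instants and the value expression as a function that may return a `NOTICK` marker; `evaluate`, `NOTICK` and the example stream `x` are hypothetical names used only for this sketch.

```python
NOTICK = object()  # hypothetical stand-in for the notick value


def evaluate(tick_candidates, value_at):
    """Keep only the candidate instants whose value expression is not NOTICK."""
    out = {}
    for t in sorted(tick_candidates):
        v = value_at(t)
        if v is not NOTICK:
            out[t] = v
    return out


x = {0.5: 3, 1.2: -7, 2.0: 10}  # example input stream: time -> value

# y ticks whenever x does, but the value expression drops negative values.
y = evaluate(x.keys(), lambda t: x[t] if x[t] >= 0 else NOTICK)
print(y)
```

The candidate set over-approximates the ticks of y, and the value expression makes the final decision at each candidate instant.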
Expressions. We fix a set of stream variables Z = I ∪ O. Apart from ticking expressions and value expressions, offset expressions (used inside value expressions) allow defining temporal dependencies between ticking instants.
- Ticking Expressions:

$$\alpha ::= \{c\} \mid v.\mathit{ticks} \mid \mathit{delay}(w) \mid \alpha \; U \; \alpha$$

where c ∈ T is a time constant, v is an arbitrary stream variable, w is a stream variable of type $T_\epsilon$, and U is used for the union of sets of ticks. The type $T_\epsilon$ is defined as $T_\epsilon = \{t \mid t \geq \epsilon\}$ for a given $\epsilon > 0$. This restriction on the argument of delay guarantees that the ticking instants are non-Zeno if all their inputs are non-Zeno (see Section 3).
- Offset Expressions, which allow fetching previous events from streams:

$$\tau ::= t \mid \tau_x \qquad \tau_x ::= x \mathbin{<<} \tau \mid x \mathbin{<\!\sim} \tau$$

Offset expressions have sort T. Here, t represents the current value of the clock. The intended meaning of x << τ is to refer to the previous instant strictly in the past of τ where x ticks (or $\bot^{T}_{outside}$ if there is no such instant). The expression x <~ τ also considers the present as a candidate.
- Value Expressions, which give the value of a defined stream at a given ticking point candidate:

$$E ::= d \mid x(\tau_x) \mid f(E_1, \ldots, E_k) \mid t \mid \tau_x \mid outside_D \mid notick_D$$

where d is a constant of type D, x ∈ Z is a stream variable of type D and f is a function symbol of return type D. Note that in $x(\tau_x)$ the value of stream x is fetched at an offset expression indexed by x, which captures the ticking points of x and guarantees the existence of an event. Expressions t and $\tau_x$ build expressions of sort T. The two additional constants $outside_D$ and $notick_D$ allow reasoning about accesses beyond the ends of the streams, and about not generating an event at a ticking candidate instant.
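We also use the following syntactic sugar: x(~e) abbreviates x(x <~ e) and x(<e) abbreviates x(x << e), with variants such as x(~e, d) and x(<e, d) that return the default value d when no such instant exists. Essentially, x(~t) provides the value of x at the previous ticking instant of x (including the present), and x(<t) is similar but does not include the present. As an example, one can define a stream clock that emits an event every second since time 0.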

Semantics
As is common in SRV, the semantics is first defined denotationally. This semantics establishes whether a given input and a given output satisfy the specification, and is defined in terms of valuations. Given a set of variables Z, a valuation σ is a map that associates every x in Z of sort D with an event stream from $E_D$. Given a valuation σ we define the result of evaluating an expression for σ. We define three evaluation maps, one for each type of expression, all written $[\![\cdot]\!]_\sigma$.
- Ticking Expressions. The semantic map $[\![\cdot]\!]_\sigma$ assigns a set of time instants to each ticking expression.
- Offset Expressions. For offset expressions, $[\![\cdot]\!]_\sigma$ provides, given a time instant t, another time instant.
- Value Expressions. Finally, value expressions are evaluated into event streams of the appropriate type, for each given instant t.
Note that $[\![x(e)]\!]_\sigma$ includes the possibility that (1) the expression cannot be evaluated because the time instant given by $[\![e]\!]_\sigma(t)$ is outside the boundaries of the domain of the stream, and (2) the expression is not defined because the stream does not tick at t. It is easy to see that the cases for $[\![x(e)]\!]_\sigma$ are exhaustive because, when $[\![e]\!]_\sigma(t)$ yields a time instant $t'$, $\sigma_x(t')$ is guaranteed to be defined.
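Definition 2 (Evaluation model). Given a valuation σ of the variables I ∪ O, the evaluation of the equations for a stream y ∈ O is

$$[\![T_y, V_y]\!]_\sigma = \{(t, d) \mid t \in [\![T_y]\!]_\sigma,\ d = [\![V_y]\!]_\sigma(t) \text{ and } d \neq \bot^{D}_{notick}\}$$

An evaluation model is a valuation σ such that for every y ∈ O: $\sigma_y = [\![T_y, V_y]\!]_\sigma$.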
The goal of a Striver specification is to define a monitor, that intuitively should be a computable function from input streams into output streams. The following definition captures whether a specification indeed corresponds to such a function.
Definition 3 (Well-defined). A specification ϕ is well-defined if for all $\sigma_I$ there is a unique $\sigma_O$ such that $\sigma_I \cup \sigma_O$ is an evaluation model of ϕ.
As with synchronous SRV, specifications can be ill-defined. For example, the specification (define bool a := not a) admits no evaluation model, and (define bool a := a) admits many evaluation models. Additionally, a specification is efficiently monitorable if the output at time t only depends on the input up to time t, which enables the incremental computation of the output stream.

Definition 4 (Efficiently monitorable). A well-defined specification ϕ is efficiently monitorable whenever for every two inputs $\sigma_I$ and $\sigma'_I$ with evaluation models $\sigma_O$ and $\sigma'_O$, and for every time t, if σ

Well-formedness
The condition of well-definedness is a semantic condition, which is not easy to check for a given specification (undecidable for expressive enough domains). We present here a syntactic condition, called well-formedness, that is easy to check on input specifications and guarantees that specifications are well-defined. Most specifications encountered in practice are well-formed.
We first define a subset of the offset expressions, called the Present subset, as the smallest subset that contains t and such that if e ∈ Present then (x <~ e) ∈ Present. We say that an output stream variable y directly depends on a stream variable x (and we write x → y) if x appears in $T_y$ or $V_y$. We say that y has a present direct dependency on x (and write $x \xrightarrow{0} y$) if x → y and either
- x.ticks appears in $T_y$, or
- (x <~ e) appears in $V_y$ and e ∈ Present.
A present direct dependency captures that, in order to compute the value of stream variable y at position t, it is necessary to know the value of stream variable x up to t. If x → y but not $x \xrightarrow{0} y$, we say that y directly depends on x in the past (and we write $x \xrightarrow{-} y$). Closed paths in the dependency graph correspond to dependencies between a stream and itself in the specification ϕ. These closed paths do not create problems if the path corresponds to accessing the strict past of the stream. Note that if one removes the $\xrightarrow{-}$ edges from the dependency graph of a well-formed specification, the resulting graph is necessarily a DAG. In other words, the transitive closure of $\xrightarrow{0}$ is irreflexive. The following lemma formally captures the information that is sufficient to determine the value of a given stream at a given time instant.

Definition 5 (Dependency Graph). The dependency graph of a specification ϕ is a graph
Lemma 1. Let y be an output stream variable of a specification ϕ, and let σ, σ' be two evaluation models of ϕ such that, for a time instant t: (i) for every variable x, $\sigma_x(t') = \sigma'_x(t')$ for every $t' < t$, and (ii) for every x such that $x \xrightarrow{0}{}^{*} y$, $\sigma_x(t') = \sigma'_x(t')$ for every $t' \leq t$. Then $\sigma_y(t) = \sigma'_y(t)$.
The proof proceeds by structural induction on expressions, with the observation that only values in the past are necessary, as in conditions (i) and (ii). We are now ready to show that well-formed specifications cannot have two different evaluation models.

Theorem 1. Every well-formed Striver specification is well-defined.
The proof proceeds by showing that for well-formed specifications two evaluation models must be equal. This is shown by induction on the events in the traces to prove that the i-th event must be identical. Lemma 1 guarantees that induction can be applied.
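The syntactic condition of this section lends itself to a mechanical check. The Python sketch below assumes that well-formedness amounts to every closed path of the dependency graph containing at least one past edge, which is equivalent to the graph restricted to present edges being acyclic. The edge encoding `(x, y, kind)` and the function name `is_well_formed` are hypothetical.

```python
def is_well_formed(edges):
    """edges: iterable of (x, y, kind), meaning y depends on x.

    kind is 'present' or 'past'. Returns True when the graph restricted
    to present edges is acyclic (assumed reading of well-formedness).
    """
    succ = {}
    for x, y, kind in edges:
        succ.setdefault(x, [])
        succ.setdefault(y, [])
        if kind == "present":
            succ[x].append(y)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in succ}

    def has_cycle(v):
        # Depth-first search; reaching a GREY node means a back edge, hence a cycle.
        color[v] = GREY
        for w in succ[v]:
            if color[w] == GREY or (color[w] == WHITE and has_cycle(w)):
                return True
        color[v] = BLACK
        return False

    return not any(color[v] == WHITE and has_cycle(v) for v in succ)


# A counter that reads its own strict past is fine; a present self-loop is not.
print(is_well_formed([("p", "count", "present"), ("count", "count", "past")]))
print(is_well_formed([("a", "a", "present")]))
```

The check is linear in the size of the dependency graph, in line with the remark that the condition is easy to check on input specifications.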

Operational semantics
The semantics of Striver specifications introduced in the previous section is denotational, in the sense that it associates with a given input stream valuation exactly one output stream valuation, but it does not provide a procedure to compute the output streams, let alone to do so incrementally. We provide in this section an operational semantics that computes the output incrementally. We fix a specification ϕ with dependency graph G and we let $G^=$ be its pruned dependency graph (obtained from G by removing the past edges $\xrightarrow{-}$, so that only present dependencies remain). We also fix < to be an arbitrary total order between stream variables that is a reverse topological order of $G^=$.
We first present an online monitoring algorithm that stores the full history computed so far for every output stream variable. Later we will provide bounds on the portion of the history that needs to be remembered by the monitor, showing that only a bounded number of events needs to be recorded, and that this bound depends only on the size of the specification (number of streams) and not on the length of the trace. This modified algorithm is a trace-length independent monitor for efficiently monitorable Striver specifications.

Algorithm 1 Monitor: Online Monitor
 1: procedure Monitor
 2:   $H_s \leftarrow \langle\rangle$ for every s
 3:   $t_q \leftarrow -\infty$
 4:   loop                                        (Step)
 5:     $t_q \leftarrow \min_{s \in O} \{t \mid t = vote(H, T_s, t_q)\}$
 6:     if $t_q = \infty$ then break
 7:     for s in $G^=$ following < do
 8:       if $t_q \in [\![T_s]\!]_H$ then
 9:         $v \leftarrow [\![V_s]\!]_H(t_q)$
10:         if $v \neq \bot^{D}_{notick}$ then
11:           $H_s \leftarrow H_s$ ++ $(t_q, v)$          (Updates history H)
12:           emit($t_q$, v, s)
13:     end for
14:   end loop

The algorithm maintains the following state $(H, t_q)$:
- History: H is a finite event stream, one for each output stream variable. We use $H_y$ for the event stream prefix of stream variable y.
- Quiescence time: $t_q$ is the time up to which all output streams have been computed.
The monitor runs a main loop, calculating first the next relevant time $t_q$ for the monitoring evaluation and then computing all outputs (if any) for time $t_q$. We show that no event exists in any stream in the interval between two consecutive quiescence time instants. We assume that at time t, the next event for every input stream is available to the monitor, even though knowing that there is no event up to some $t_q$ is sufficient.
The core observation follows from Lemma 1, which limits the information that is necessary to compute whether stream y at instant t contains an event (t, d). All this information is contained in H, so we write $[\![T_y]\!]_H$ and $[\![V_y]\!]_H$ to remark that only H is needed to compute $[\![T_y]\!]_\sigma$ and $[\![V_y]\!]_\sigma$.
The main algorithm, Monitor, is shown in Algorithm 1. Lines 2 and 3 set the initial history and quiescence time. The main loop continues until no more events can be generated. Line 5 computes the next quiescence time, by taking the minimum instant after the last quiescence time at which some output stream may tick. A stream y "votes" (see Algorithm 2) for the next possible instant at which its ticking equation $T_y$ can possibly contain a value. Consequently, if no stream votes for an earlier time, it is guaranteed that no ticking equation will contain a value t lower than the lowest vote. Note that the recursive calls at line 26 terminate because the graph $G^=$ is acyclic (recall that the specification is well-formed).

Algorithm 2 vote
15: function vote(H, expr, t)
16:   switch expr
17:     case delay(s)
18:       if $(t' + v) > t$ (where $(t', v) = last(H_s)$) then return $t' + v$
19:       else return ∞
20:     case {c}
21:       if c > t then return c
22:       else return ∞
23:     case a ∪ b
24:       return min(vote(H, a, t), vote(H, b, t))
25:     case y.ticks with y ∈ O
26:       return vote(H, $T_y$, t)
27:     case s.ticks with s ∈ I
28:       return $succ_>(\sigma_s, t_q)$

The algorithm follows a topological order over $G^=$, so the information about the past required in Lemma 1 is contained in H. The following result shows that, assuming that $\sigma_I$ is non-Zeno, the output is also non-Zeno. Hence, for every instant t, the algorithm eventually reaches $t_q > t$ in a finite number of executions of the main loop.
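The voting procedure can be sketched in Python as follows. This is a minimal illustration rather than the implementation evaluated later: ticking expressions are encoded as tuples (`("const", c)`, `("union", a, b)`, `("ticks", name)`, `("delay", name)`), the argument of delay is assumed to be an output stream whose history is in `H`, and input streams are given as sorted lists of time-stamps. All of these encodings are assumptions made for the example.

```python
import math
from bisect import bisect_right


def vote(H, T, inputs, expr, t):
    """Earliest instant after t at which expr may tick, or math.inf if none.

    H: output name -> list of (time, value) computed so far
    T: output name -> ticking expression
    inputs: input name -> sorted list of event times
    """
    kind = expr[0]
    if kind == "const":
        return expr[1] if expr[1] > t else math.inf
    if kind == "union":
        return min(vote(H, T, inputs, expr[1], t),
                   vote(H, T, inputs, expr[2], t))
    if kind == "delay":
        hist = H[expr[1]]
        if hist:
            t_last, v = hist[-1]        # only the last event can be pending
            if t_last + v > t:
                return t_last + v
        return math.inf
    if kind == "ticks":
        name = expr[1]
        if name in T:                   # output stream: recurse on its ticking expression
            return vote(H, T, inputs, T[name], t)
        times = inputs[name]            # input stream: next event strictly after t
        i = bisect_right(times, t)
        return times[i] if i < len(times) else math.inf
    raise ValueError(f"unknown ticking expression: {kind}")


# A clock that ticks at 0 and then one time unit after each of its own events.
T = {"clock": ("union", ("const", 0), ("delay", "clock")), "y": ("ticks", "x")}
H = {"clock": [], "y": []}
inputs = {"x": [0.5, 2.0]}
print(vote(H, T, inputs, T["clock"], -math.inf))
```

The recursion on output streams terminates only when present dependencies are acyclic, which is where the well-formedness assumption is used.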

Lemma 2. Monitor generates non-Zeno output for a given non-Zeno input.
The proof proceeds by contradiction, assuming an instant t with Zeno output and taking the minimum output stream in $G^=$ that has Zeno output, and then showing that there must also be Zeno output for $t - \epsilon$. This argument can be applied repeatedly ($t/\epsilon$ times) to conclude that there is Zeno output before 0, which is a contradiction.
We finally show that the output of Monitor is an evaluation model. We use $H^i_s(\sigma_I)$ for the history of events $H_s$ after the i-th execution of the loop body, and $H^*_s(\sigma_I)$ for the sequence of events generated by a continuous execution of the monitor. Note that $H^*_s(\sigma_I)$ can be a finite sequence of events (if the input is bounded and no repetition is introduced in the specification using delay) or an infinite sequence of events. In the first case, the vote is eventually ∞ and the monitoring algorithm halts. The proof proceeds by induction on the number of rounds of the loop, showing that the output is an evaluation model up to the quiescence time. Putting together Theorem 2, Lemma 1 and Lemma 2 we obtain the following result.
Corollary 1. Let ϕ be a well-formed specification, $\sigma_I$ a non-Zeno input stream and $H^*$ the result of Monitor. Then $H^*$ is the only evaluation model for input $\sigma_I$, and $H^*$ is non-Zeno.
Trace Length Independent Monitoring. The algorithm Monitor shown above incrementally computes the only possible evaluation model for a given input stream, but it stores the whole prefix $H_y$ for every output stream variable y. We now show a modification of the algorithm that is trace-length independent, based on flattening the specification. A specification is flat if every occurrence of an offset expression in every $V_y$ is either x(<~t) or x(<<t). In other words, there can be no nested term of the form x(<~(y<~t)), x(<~(y<<t)), x(<<(y<~t)) or x(<<(y<<t)). We first show that every specification can be transformed into a flat specification. The flattening incrementally applies the following steps to every nested term x(E(y<<t)), where E is an arbitrary offset term:
1. introduce a fresh stream s with equations $T_s$ = y.ticks and $V_s$ = x(E(t));
2. replace every occurrence of x(E(y<<t)) by s(<t).
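The two flattening steps can be sketched in Python over a small, hypothetical term encoding: an offset expression is either the string `"t"` or a triple `(op, stream, inner)` with `op` in `{"<<", "<~"}`, and an access x(τ) is the pair `(x, τ)`. The encoding, the helper names and the generalisation of the rule to an innermost `<~` are assumptions made for this illustration.

```python
import itertools


def depth(tau):
    """Nesting depth of an offset expression ('t' has depth 0)."""
    return 0 if tau == "t" else 1 + depth(tau[2])


def split_innermost(tau):
    """Return (E, innermost): tau with its innermost offset replaced by 't', and that offset."""
    op, z, inner = tau
    if inner == "t":
        return "t", tau
    ctx, innermost = split_innermost(inner)
    return (op, z, ctx), innermost


def flatten(x, tau, fresh, defs):
    """Flatten the access x(tau); fresh stream definitions are appended to defs.

    Each definition is (name, ticking expression, flat access used as its value).
    """
    if depth(tau) <= 1:
        return (x, tau)                               # already flat
    ctx, (op, y, _) = split_innermost(tau)
    s = next(fresh)                                   # step 1: fresh stream s
    defs.append((s, f"{y}.ticks", flatten(x, ctx, fresh, defs)))
    return (s, (op, s, "t"))                          # step 2: access s at its own previous tick


fresh = (f"s{i}" for i in itertools.count())
defs = []
# commits(commits << (push << t)), i.e. the last commit before the last push
result = flatten("commits", ("<<", "commits", ("<<", "push", "t")), fresh, defs)
print(result, defs)
```

Each application removes one level of nesting, so the process terminates with a number of fresh streams bounded by the nesting depth of the original term.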
Example 3. Consider the following specification of a continuous integration process in software engineering. The intended meaning is to report in faulty those commits to a repository that fail the unit tests.
    input commit_id commits, unit push, bool tests
    ticks faulty := tests.ticks
    define commit_id faulty := if tests(~t) then notick else commits(<push<<t)

After applying the flattening process the specification becomes:

    define commit_id faulty := if tests(~t) then notick else s(<t)
    ticks s := push.ticks
    define commit_id s := commits(<t)

Here, s stores the commit_id of the last commit at the point of a push, which is precisely the information to report at the time of a faulty commit.

We first sketch how to define the most complex operator of TeSSLa: x = delay(s0, s1), which creates an event stream x which will tick at an instant t if there is an event $(t', v)$ in s0 such that $t' + v = t$ and also $dom(s_1) \cap (t', t) = \emptyset$. TeSSLa does not handle explicit time and offsets but builds specifications from building blocks like delay. Given inputs s0 and s1, the Striver specification is:

    ticks aux := s0.ticks U s1.ticks
    define Time_eps aux :=
      if isticking(s1) then infty
      else if aux(<t, infty) = infty || aux(<t) + aux<<t <= t then s0(~t)
      else notick
    ticks x := delay x_aux
    define unit x := ()

We now present three extensions to the basic Striver introduced previously.
Accessing successors. The first extension allows accessing future events, via the duals of the offset operators, x >~ e and x >> e, and the syntactic sugar to access the successor value: x(e>), x(e~), x(e, d>) and x(e, d~). As for Lola, well-formedness can be guaranteed as long as all strongly connected components in the dependency graph contain only
All Delays. This extension allows defining tick sets that consider all delays. The ticking expressions are extended with an operator delayall. This extension only requires changing vote to accommodate a set of pending delays rather than a single delay. In general, this cannot be implemented in finite memory for arbitrary event rates and delays, but Monitor works directly for the online monitoring of this construct.
Windows. The last extension allows implementing computations over precise windows, like "count the number of events in every window of one second". This cannot be described in TeSSLa [9], which is limited to finite memory monitors, or in RTLola [14], because this specification is not isochronous. Note that this property cannot be monitored by splitting the time into intervals of one second and counting the events in each of the intervals obtained (as in RTLola), as this approach misses the case of a window that spans part of one interval and part of the adjacent interval. The main idea of this extension is to enrich time expressions with a tag, in such a way that every tick carries an additional value (we call this extension dependent time). Then, delay and delayall are enriched with the ability to use tagged time streams, with the caveat that the U combinator must now indicate how to combine tags. Consider the following example with input int s:

    ticks wcount := (const 1 s) U delayall (const (-1,5) s)
    define int wcount t aux := wcount(<t,0) + aux

The stream wcount must only be computed when a new event arrives in s (adding 1) or when an event leaves the window (subtracting 1). This is monitored with a constant number of operations per event, but requires storing a number of events that depends on the event rate.
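The behaviour of wcount can be imitated offline with a short Python sketch: every event contributes a +1 tick when it arrives and a -1 tick when it leaves the window, and ticks that coincide are combined by adding their tags. The function name `window_count` and the choice to sum coinciding tags are assumptions made for the example; an online monitor would keep the pending -1 ticks in a queue instead of sorting the whole trace.

```python
from itertools import groupby


def window_count(event_times, width):
    """Return [(t, count)], where count is the number of events in (t - width, t]."""
    # Each event enters the window at t (+1) and leaves it at t + width (-1).
    ticks = sorted([(t, 1) for t in event_times] +
                   [(t + width, -1) for t in event_times])
    out, count = [], 0
    for t, group in groupby(ticks, key=lambda td: td[0]):
        count += sum(delta for _, delta in group)   # combine tags of coinciding ticks
        out.append((t, count))
    return out


print(window_count([0, 1, 3], 5))
```

The number of pending -1 ticks at any instant equals the number of events currently inside the window, which illustrates why the memory needed depends on the event rate.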
Signal Temporal Logic (STL) [21,1], when interpreted over piecewise-constant signals, is subsumed by Striver. First note that event streams have a dual interpretation as piecewise-constant signals, where the signal only changes at the points where events are produced. The translation to Striver opens the door to a quantitative computation of STL by enriching the data types of expressions and verdicts. We show the operator $x \, \mathcal{U}_{[0,b]} \, y$:

    ticks v := x U y U delayall -b x U delayall -b y
    define bool v t :=
      if y(~t, false) then true
      else if !x(~t, false) then false
      else let t' := yT(t~) in
        if t' == outside || t' > t + b then false
        else t' <= xF(t~)

Empirical Evaluation
We report an empirical evaluation of a prototype sequential Striver implementation, written in the Go programming language. We measure the memory usage and time per event for two collections of specifications. The first collection, from Example 1, computes the stocks of p independent products. These specifications contain a number of streams proportional to p, where each defining equation is of constant size. The second collection computes the average of the last k sales of a fixed product, via streams that tick at the selling instants and compute the sum of the last k sales (see the appendix for the concrete specs). The resulting specifications have depth proportional to k. We instantiate k and p from 10 to 500 and run each resulting specification with a set of generated input traces. We run the experiments on a virtual machine on top of an Intel Xeon at 3GHz with 32GB of RAM, and measure the average memory usage (using the OS) and the number of events processed per second.
In the first experiment, we run the synthesized monitors with traces of varying length (top two plots in Figure 1). The results illustrate that the memory needed to monitor each specification is independent of the length of the trace (the curves are roughly constant). Also, the rate of events processed is independent of the length of the trace. In the second experiment, we fix a trace of length 1 million events and run the specifications with k and p ranging from 250 to 550. The results (lower diagrams) indicate that the memory needed to monitor stock p is independent of the number of products, while the memory needed to monitor each avg k specification grows linearly with k. Recall that, theoretically, all specifications can be monitored with memory linear in the size of the specification.
Figure 1 (panel titles): (a) Memory wrt trace length; (b) Event ratio wrt trace length.

Conclusion and Future Work
We have introduced Striver, a specification language with explicit time and offset references for the stream runtime verification of timed event streams. We have presented a trace-length independent online monitoring algorithm for the efficiently monitorable fragment. Future work includes the extension of the language with parametrization (as in QEA [4], MFOTL [6] and Lola2.0 [13]), to dynamically instantiate monitors for observed data items. We are also studying offline evaluation algorithms, and algorithms that tolerate deviations in the time-stamps and asynchronous arrival of events from the different input streams.