Robust trajectory-based density estimation for geometric structure recovery

We propose a method to both quickly and robustly extract geometric information from trajectory data. While point density may be of interest in some applications, trajectories provide different guarantees about our data such as path densities as opposed to location densities provided by points. We aim to utilize the concise nature of quadtrees in two dimensions to reduce run time complexity of counting trajectories in a neighborhood. We compare the accuracy of our methodology to a common current practice for subsampling a structure. Our results show that the proposed method is able to capture the geometric structure. We find an improvement in performance over the current practice in that our method is able to extract only the salient data and ignore trajectory outliers.


Introduction
The increasing availability of large data sets containing spatio-temporal data is driving the need for data analysis methodologies. Spatio-temporal data has been widely used in various applications: traffic monitoring [5,21,20], trajectory similarity search used in route searching and semantic understanding, [11,3,10,28], weather and pollution monitoring [25], and anomaly detection [7,8,24].
Trajectory clustering is an efficient method to analyze trajectory data. By counting the number of trajectories passing through regions of the data space, both the spread and density of the data can be evaluated. Researchers have explored applying the principles of clustering trajectories to determine animal movement patterns [29,30,22], mapping roads from vehicle GPS data [9,1,22], and describing environmental characteristics such as fault lines [1], and hurricane forecasts [26].
Trajectory clustering techniques are often used to supply a set of descriptive trajectories, however this information can also be obtained by learning the region's trajectory density. Density based methods have been used in the context of clustering as in DBSCAN [14], where density of points has been taken into account for purpose of clustering. The density based method TRA-CLUS [19] is used to cluster segments of larger trajectories. Similarly, in [17], the authors use density of edges accumulated by aggregating segmentations by an ensemble scheme. Each of these practices has drawbacks associated with finding the density of trajectories in an arXiv:2210.00343v1 [eess.SP] 1 Oct 2022 Fig. 1 Pipelines for Map Reconstruction and Activity Recognition Applications. Along the top, we apply our method to raw GPS trajectories of Chicago (a) and show the hierarchical structure (b) of the red highlighted region. The skeleton of the reconstructed map from our method is shown in (c). We show our method applied to an activity recognition pipeline which performs Time Delay Embedding (TDE) on inertial signals of a wearable devices (d). Our hierarchical structure (e) and subsampled points (f) are used to analyze the shape of the TDE for activity recognition purposes. efficient manner. In this paper, we explore the issues associated with using a point clustering algorithm, and aim to present an algorithm which is not highly sensitive to input parameters or trajectories, but is instead deterministic and robust. In [19], it is noted that the TRACLUS algorithm is sensitive to input trajectories where short input trajectories may produce undesirable clustering results. Many of the density estimation approaches have large computational complexity which we aim to reduce.
Evaluating the shape of a density estimation can be computationally expensive. Therefore in real-time application, evaluation of the geometry of a data set often requires subsampling the data. Subsampling methods aim to downsample the data to keep the most representative data points while removing outliers [31,12]. A k-nearest neighbors density based subsampling together with the maxmin landmark selection algorithm (KNNmaxmin) is utilized in [12]. Many devices in real-time applications cannot guarantee a set sampling frequency and point-based methods fail in these cases because if we only consider densities of points we will end up with more samples in the segments where the sampling frequency was higher. The need to minimize the affect of sampling rate on the density of paths is another motivation for the work present in this paper. We compare our proposed method to the maxmin subsampling algorithm [12] to highlight these issues and also to validate our method.
The strategy presented in this paper aims to generate a level set density function of time series data sets. We can utilize the density function to subsample points in efficient manner which captures the structure of the data. By utilizing the density function, we are able to uniformly subsample the entire structure of the data without costly pointwise distance comparisons. Furthermore, the sampling frequency does not affect the shape of the density function and thus our process is robust to changes in sampling frequency along a trajectory while point based subsampling may be affected by uneven sampling.
As an example application, activity recognition is a common problem in wearable sensing applications. The authors of [13] show that the structure of the person's joint data can be recovered by topological feature extraction. This method allows for real-time activity recognition however, suffers from the issues of point-based subsampling method. We improve on this method by employing our trajectory based subsampling strategy and speed the process by 75% while achieving similar performance.
We provide an offline as well as an update implementation 1 of our existing geometric structure. Offline processing can be utilized in the case of large data sets or for performing exploratory data structure analysis as in the case of generating an initial road map from GPS trajectories, see (a-c) in Figure 1, and it can be updated once new trajectory becomes available. Real time data applications can benefit from updating an existing or expected hierarchical data structure representing the underlying geometry, see (d-f) in Figure 1.
The rest of this article is organized as follows; we present the problem formulation as well as our density function in Section 2, followed by the implementation details in Section 3. Section 4 describes the synthetic data generation and evaluation of recovering the underlying shape from data with harmonic and impulse noise. Section 5 presents the activity recognition data set, pre-processing steps, and the evaluation of our method on the data set. Section 6 presents the application of our method on Street Map generation. We conclude in Section 7 to highlight the benefits and limitations presented in the paper.

Mathematical Formulation
Given a set of trajectories in R d , our objective is to characterize how their density varies across the space. We will characterize the density by counting the number of trajectory segments in a neighborhood of radius r ∈ R + at a location x ∈ R d in the space with coordinates (x 1 , . . . , x d ). This defines a local count function over the space. For our analysis, we aim to (1) develop a computationally efficient way to estimate these counts over the entire space, (2) show that these functions are stable, and (3) define a sampling strategy that produces points that have a similar structure to that of a specific level set of the count function. As we will see later, we will analyze and compare the structure of a level set by characterizing the local topological structure.
In order to formalize the problem, we introduce some notation. In particular, for a set {γ k } N k=1 of continuous The green portions of the trajectories are the elements of the set A S r 2 . As shown C B r 1 = 2, C B r 2 = 3, C S r 1 = 4 and C S r 1 ,S r 2 = 3. As specified in Theorem 1 C B r 1 ≤ C S r 1 ,S r 2 ≤ C B r 2 .
trajectories, we define γ k : [0, T k ] → R d and their trace over an interval I as γ k (I) = {γ k (t) | t ∈ I} ⊂ R d . For simplicity, the analysis will focus on a single trajectory (i.e., we drop the index k in γ k ) in R 2 that has continuous second derivatives, but the definition and results can be directly generalized to multiple trajectories and higher dimensions. In order to define our local count function, we will make use of the closed disk B r (x) = {y ∈ R 2 | (y 1 − x 1 ) 2 + (y 2 − x 2 ) 2 ≤ r 2 }. We consider the pair of entry / exit points where by maximal we mean that there is no larger in- . Formally, we define the local trajectory count function C Br : R 2 → N as: where |A| is the cardinality of set A. Figure 2 (a) shows an example of the segments obtained as part of the set A r (x).

Approximating the Local Count Function
Directly computing C Br (x) over multiple scale parameters r would require determining the segments of the trajectories within a given region (B r ) for every location (x) in the space. This can be computationally intensive. Hence, it would be beneficial to make use of hierarchical data structures such as quadtrees, which can partition trajectories into segments at a coarse scale and then subpartition these segments when transitioning to a finer scale. However, quadtrees partition the space into square regions, so we require a count based on square neighborhoods (instead of disks) in order to exploit the computational efficiency of the representation. Figure 1 (top) illustrates how to use a hierarchical structure that allows the efficient computations of these counts. Let us define the closed square S r (x) = {y ∈ R 2 | max(|y 1 − x 1 | , y 2 r}. We can define a count C Sr (x) in a similar way as we did in Equation 2. However, as we will see in section 2.2, this function is sensitive to small perturbations of the trajectories, which can make the counts arbitrarily large. Hence, we define a robust local square count function where r 1 < r 2 and A Sr 1 / ∼ is the equivalence class corresponding to the set of intervals  Figure 2 illustrates this square count and its relationship to the disk counts. We can guarantee, given a bound on curvature, that the robust local square count function approximates the disk count as it is shown in the theorem below. The proof of this result is outlined in the Appendix A.
Theorem 1 Given that √ 2 · r 1 ≤ r 2 < 1 κmax , where κ max is a bound on the maximum curvature of a γ, then for all x ∈ R 2 we have that

Stability of Local Count Functions
When trajectory densities are counted, it is important to define a notion of stability of the count in a region such that the change in a region's count is bounded when a small -perturbation is applied. We begin by formalizing our notion of stability.

Definition 21
An -perturbation to a trajectory γ with maximum curvature bounded by κ max produces a new trajectoryγ with the same maximum curvature bound such that ||γ(t) − γ(t)|| 2 ≤ . We say that the -perturbation is small if < 1 κmax . Definition 22 A count function is stable if for any trajectory γ with maximum curvature bounded by κ max the change in its value due to a small -perturbation is bounded everywhere. The bound may depend on the actual trajectory. Otherwise, the count is said to be unstable.
It may seem that a bound depending on γ may seem too relaxed of a condition. However as can be seen from the proof of the theorem below, a bound on the change of the count C Br that is independent of γ can be found if we constrain our definition to curves of fixed length.
The following theorem shows that indeed the count over a disk neighborhood is stable, and the count over a square unstable while the robust square count is stable. Figure 3 provides an example in which an -perturbation can be constructed to cause an unbounded increase on the square count. This may seem like an unlikely scenario. However, we observed situations in our experimental validation in which the square counts were much larger than the disk counts, which corresponded to scenarios similar to the one in the figure. Proof of this theorem is outlined in the Appendix B.
Theorem 2 C Sr is an unstable count. C Br is stable given that r < 1 κmax . C Sr 1 ,Sr 2 is stable to perturbations of a trajectory given that r 2 < 1 κmax . Fig. 3 Illustration of unstability of square count. Let γ be a trajectory (shown in black) that moves along the boundary of the square S r (shaded in gray). The count for this trajectory is 1. We can create a sequence of -perturbations such that the count function is not bounded. An initial trajectory can be constructed by perturbing the trajectory to match a number of arcs of circles with radius r > 1 κ max (as shown in green). The count of γ 1 is 2 (counting the left end point). By construction, the perturbation will be small. We can create a new trajectory (shown in red) by increasing the number of arcs to be twice as many. The count of γ 2 is 3. This process can be repeated indefinitely.

Density-based Sampling
Our method also provides a simple choice for subsample points when a small set of geometrically representative points are required for shape analysis. As it will be described in the next section, a hierarchical representation based on a quad-tree structure is constructed as Figure 1 illustrates. At the coarser scale, segments of a trajectory within a given square region will be extracted. Then, the region will be further split into smaller squares, and the process will be repeated. The number of segments at each scale will be counted using our robust local square counting scheme. The centers of squares at the finest scale with a count higher than specified threshold (i.e., regions with high enough density) will be taken as samples from the targeted superlevel set of the trajectory density function. We refer to this scheme as the Trajectory-based Representation for Estimation of Density (TRED) representation.

Implementation Details
For our implementation of TRED, we use the leaf-unbalanced quadtree data structure introduced by Finkel and Bentley [15] to refine a search space in order to extract the underlying geometric structure of the data set. The generalizable nature of the quadtree allows for it to be run in higher dimensions with the need to define only a few parameters. We refer to the nodes of the quadtree as bins which are always square regions. Algorithms 1 and 2 provide pseudocode. The parameters required are: 1. The threshold τ used for specifying a stop criterion for splitting the bins. If we want to compute the counts for all scales then we can select τ = 0. 2. A length for the base square R that encloses the entire trajectory. It specifies the initial region to be split. 3. A maximum depth M of the quadtree. We can select this depth such that at the finest scale the square size r 1 = R · 2 −M satisfies any assumptions about our curvature constraints. 4. A radius offset δ r (·) used to specify the difference between r 1 and r 2 at every scale. If we assume that the finest scale satisfies the curvature constraints then a good choice for this offset is a constant Let us review the parameters required for a successful application of TRED. Trajectory threshold defines how many times a set of trajectories needs to pass through a region before it should be split into child bins. Thus threshold indicates the density required to be included in a resulting superlevel set, where higher thresholds produce structures with more restricted superlevel sets. It follows that the density of the trajectories is the most influential factor in choosing an effective threshold. The choice of bin depth defines at what point a bin no longer splits into child bins after it exceeds the Algorithm 1: TRED Offline Pseudocode trajectory threshold. Bin depth is closely coupled to maximum curvature and the size of the region as the finest bin side is a function of the region size and the maximum bin depth. Thus treating the depth accordingly, the region occupied by the trajectories along with the maximum curvature and desired resolution of the space can be used to estimate an effective bin depth. We expect that larger bins will include higher counts of trajectories as compared to the children of the bin. As a direct result, modifying the bin depth or region size may affect the optimal choice of trajectory threshold. Once the first three parameters are known, the value for radius offset must be decided. We have shown the importance of defining maximum curvature in Section 2.1, and as such it is vital to choose an effective offset. Estimated maximum curvature should be used to determine the radius offset once the region size and maximum depth are known utilizing the aforementioned formula for δ r . Of course, from a data-driven approach, we can consider all these variables as hyper-parameters to tune.
Offline Version. Algorithm 1 stores all segments in the list S. These segments correspond to the squares S r2m used in the robust count computation. The list Λ is used to keep track of the active bins to be refined at each scale based on the count criteria used in line 11. For each scale and active bin, counts and corresponding segments for the children are computed. Line 10 calls a function that returns the segments associated with S r2m (x m,p ) and the counts associated with C Sr 1m ,Sr 2m (x m,p ), where x m,p is the center of the bin at scale m and index p. The inputs for this function are the segments in the parent bin, center for the child bin, and the corresponding radii. Finally, line 12 updates the list of active bins at the corresponding scale given that the count criterion is satisfied. The algorithm returns the lists containing counts C, segments S, and active bins Λ.
Update Version. Since we are also considering applications in which trajectory data will be incrementally provided to us (e.g., one trajectory at the time after a vehicle completes a trip for road reconstruction), we also provide an update version of the algorithm that can update the counts by iterating over consecutive samples over time (see Algorithm 2). Additionally, we ambition streaming versions of the algorithm for applications such as activity recognition in which data points could be added and removed over time.
In the following sections, we run the algorithms presented using a computer with 16 GB of RAM and an Intel i7-3770 CPU at 3.4 GHz. TRED was implemented using python 3.6 and compared against existing im-

Synthetic Data Evaluation
We utilize synthetic data to evaluate the ability of TRED to recover an underlying geometric shape given noisy data. TRED is evaluated against a brute force trajectory local density estimation. The brute force method calculates C Br (x) for points in a dense regular grid. We will refer to our brute force method as Trajectory-based Local Density Estimation (TLDE) for ease of reference. We note that TLDE may be compared to a kernel density estimation (KDE) with a uniform kernel followed by a threshold applied to generate a binary grid. The comparison aims to identify local topological differences between the ground truth and the two geometric recovery techniques (TRED and TLDE). Below, we provide details on the data generation followed by the evaluation.

Data Generation
We evaluate the performance using four synthetic shapes shown in Figure 4. The circle was generated with a radius of 1. The ellipse was generated with a semi-major axis of 1 and a semi-minor axis of 0.5. The "Eight" is a "Lemniscate of Bernoulli" with width 1. The "Peanut" is a "Cassini Oval" with distances between the centers set to 0.92, product of distances from centers set to 1, and then the x-axis is scaled such that its range is [−1, 1]. A reference trajectory is generated by creating the shape without any perturbations.
For each shape, 200 samples were generated where each sample contains 100 cycles of the shape. Each sample contains both harmonic and random noise added perpendicularly to the direction of travel. The magnitude of the harmonic noise for each sample is selected using a uniform random sampling from 0.05 to 0.1 for the peanut and 0.02 to 0.07 for the circle, eight, and ellipse. The random noise is additionally added to the harmonic noise. The random noise for each sample has between 10 and 80 pulses with magnitude between 0.1 and 0.5 where the quantity and magnitude of pulses is selected using a uniform random sampling.

Evaluation
For TRED, we set R = 6 for all shapes and use a maximum depth M = 5. For a radius of r 1 = R · 2 −M , if the count function C Sr 1 ,r 2 (x) is greater than the given threshold then all locations within B r2 (x) are set to 1. Similarly for TLDE with r = 0.3, all locations within B r (x) are set to 1 if C Br (x) is greater than the given threshold. Given these mappings over the dense regular grid containing the trajectories, we can compute the local homology differences between TRED and the ground truth as well as between TLDE and the ground truth.
The level sets produced for both TRED and TLDE as described above are compared to the ground truth level set for each shape. The ground truth level sets are produced using TLDE with a threshold of 1 run on the reference trajectory for the shape. Note the reference trajectory has no noise, so the count for TLDE is C Br (x) ∈ {0, 100}. When comparing the level set of TRED or TLDE against the ground truth, we use Fig. 4 Four synthetic trajectories with perturbations (black) with the reference trajectory (red) for each shape. Next to the shapes, we show TRED and TLDE recoveries using the thresholds 5, 25, and 65. Note that the ground truth is often very similar to TLDE at a threshold of 25. a topologically aware metric presented by Ge et a. [18] with a Lipschitz deformation bound K d = 0.096 and the radius of the region ρ = 0.48. The parameters for the metric are selected to quantify what a meaningful local homology difference is, and for this reason different parameters could be selected to relax the metric.
The results of both TRED and TLDE compared against the ground truth are shown in Figure 5. As expected, the threshold needs to be increased in order to reduce the effect of noise on the resulting level set. We use Figure 6 to show how effective TRED and TLDE are at extracting the underlying geometry across all shapes. The values plotted indicate the difference between the values shown in Figure 5 at each threshold for each shape.
We expect the local density estimation to improve rapidly, and produce level sets with the same local topology structure as the reference signal as the perturbations are filtered out due to the higher threshold. Due to the small number of pulse perturbations with a large magnitude, the metric improves more rapidly at smaller thresholds. Yet as the threshold increases above 50, we see a large local homology difference due to the decreasing width of the level set. In Figure 4, the level sets with a threshold of 25 have low local homology difference while level sets with a threshold of 65 have a large local homology difference. Despite the same global topology, these level sets have too great a difference in their widths to have a similar local topology. Note the gaps in TRED mask with a threshold of 65. Due to the high threshold and the granularity of the data structure, some gaps appear which are not present in the higher resolution TLDE. For this reason, it is important to choose an effective threshold for a particular depth.
As shown in Figure 5, we see TRED and TLDE have similar trends where the local homology difference improves until the threshold imposes too strict a constraint on level set. Yet, overall we see that TRED is able to outperform TLDE when extracting the geometric structure of a shape. As TLDE increases the threshold, it approaches the ground truth until the threshold surpasses 50 at which point the level set is too thin to be considered locally similar. Yet even at the optimal TLDE thresholds, TRED is able to perform comparably. The two strategies do not have the same optimal thresholds, and as such it is important to find the best parameters for TRED using experimental validation and application knowledge.
To conclude, we consider the run time complexity to compare these two approaches. TRED has an O(n) runtime on the synthetic dataset while TLDE has a runtime complexity of O(nm) where m is the number of locations at which C r1 (x) is calculated. For this experiment m = 500 2 which effectively squares the runtime.
At lower values of m we would expect to see an improved runtime, but the resulting level set would have lower resolution. When the resolutions between TRED and TLDE are comparable, TRED still maintains a superior run time complexity due to the hierarchical nature of the data structure.   6 TRED average local homology difference subtracted from TLDE. As shown, TRED performs better at lower thresholds, and TLDE performs better at higher thresholds.

Activity Recognition Evaluation
In this section, we show how TRED can be utilized for activity recognition. Our goal is to show that TRED performs comparably or even better than the existing subsampling strategy (maxmin) while achieving a significant gain in computational speed.

Activity Recognition Data Set
The NCSU-ADL data set presented by Lokare et al. [23] is used for real-time activity recognition. The data set contains physiological and motion data for daily living activities from 11 healthy individuals. From the data set, the 3-axis accelerometer measurements of the devices on an individual's left wrist and right leg are utilized to classify activity. Single windows of 1000 and 2000 samples are of interest which amount to 5 and 10 seconds respectively. Table 1 describes the activities of interest in the NCSU-ADL data set categorized by activity type, activity frequency, and sampling frequency. Activity frequency is extracted using the first zero crossing of the autocorrelation function. This is averaged over all the subjects.The activity frequency gives us the time period, on average, needed for a subject to complete a full motion for a particular activity.
The NCSU-ADL data set is processed using the method developed by Dirafzoon et al. [13]. The accelerometer data is windowed over time into the 1000 and 2000 sample windows previously mentioned. Transitions of activity (windows containing multiple activities) are discarded from the study since they are very noisy even after filtering. We perform a 4-fold cross validation scheme over the entire data set where each subject is equally represented in each partition of data.
The activity classification algorithm used in [13,27] is shown in Figure 7 and is summarized as follows: -Thread the x, y, and z accelerometer from the right knee and left wrist as features. For each one of the six features, perform time delay embedding (to a three dimensional space) using the specified windows (1000 or 2000 samples). -Subsample the embedding utilizing either maxmin subsampling or TRED. -Extract a feature based on the topologically persistent hole for each feature using Persistent Topology. -Classify the window using a majority vote over the k-nearest neighbors.
In this article we expand on the analysis presented in [27] for the above pipeline.

Activity Recognition Evaluation
The activity prediction method used in [13] is applied on the NCSU-ADL data set as described in Section 5.1. We compare our method TRED against the maxmin subsampling method [12]. Two window sizes of 1000 and 2000 samples which correspond to 5 and 10 seconds, respectively, are chosen to compare the two methods. Tables 2 and 3 show the comparison of TRED against the maxmin subsampling strategy categorized by subject and activity, respectively. Periodic activities are of interest to us when using time delay embedding classification. We show the F1 score performance of both methods for non-periodic as well as complex activities for completion. Figure 8 shows the confusion matrix for both methods.
Overall we see that TRED outperforms maxmin subsampling in classification performance for periodic activities. Among the periodic activities, we see performance of TRED is lower for "Rowing" activity as compared to maxmin subsampling in case of the 5 sec window. Table 1 shows that rowing has a long period (2.5 seconds) which results in generation of 2 cycles when time delay embedding is performed. As described above, it is important for trajectories to be clustered in high numbers to be successfully subsampled in TRED. Due to the long period of rowing combined with the inherent noise of the sensors, subsampling with TRED utilizing 5 seconds is prone to misclassification of windows where the waveform is perturbed either due to noise or inconsistency of a periodic motion.
We see significant improvement in performance of TRED over maxmin subsampling in the "Carrying a Box" and "Walking" activities. Differentiating "Walking" from the complex activity of "Carrying a Box" which contains the same movement in the lower body is a notable advantage to utilizing the TRED subsampling method. We see the improvement due to TRED's noise filtering property where single large variations in the time delay embedding are no longer considered representative sections of the embedding. The filtering of noise is most notably seen in the wrist. Walking has a consistent movement with a clear structure in the time delay embedding, yet the wrist movement in carrying a box is localized and unstructured with some large variations. TRED returns a subsampling only containing points in the localized region with most of the data  while maxmin subsampling returns points in the localized region as well as the large variations which incorrectly suggests a larger and more structured time delay embedding.
The degradation in performance in TRED when increasing the window size is somewhat surprising until  inspected closely. It is expected that increasing the window size should increase the number of cycles resulting in a more desirable subsampling. For smaller windows, inconsistent movement can affect the embedding delay τ d , but result in a recognizable structure. For larger windows, the same inconsistent movement has an affect on the τ d which has a larger effect on the structure by perturbing the pattern which results in less overlap. Figure 9 shows the effect of combining two windows which were classified correctly, but the calculation of τ d over the combination of the windows differs resulting in a more perturbed embedding. The same effect on τ d can be attributed to TRED outperforming maxmin in the 10 sec window as the embedding demonstrates more perturbed, but the outliers are removed in TRED. By comparing trajectories of the same activity with variable noise, we are able to see a realistic example of TRED effectively capturing the underlying geometric structure of a noisy trajectory as was the focus of Section 4. We see a similar effect on the performance of maxmin subsampling when increasing the window size, however this effect is quite drastic in the case of maxmin subsampling. The increased window size causes the performance to drop by 6% on average while the performance of TRED only drops 3%. It should be noted that maxmin subsampling is not deterministic and thus running maxmin subsampling again on the same data could produce results which perform either better or worse than the presented data. The ability to reproduce results is a beneficial characteristic of any algorithm, and as such we see that TRED's deterministic nature is preferred over non-deterministic subsampling.

Computational Complexity
When comparing these two approaches of subsampling for use in activity classification, we find TRED to more reliably extract the underlying geometry and in turn give an effective subsampling. It should be noted that TRED is 4x more efficient than maxmin subsampling. TRED runs at an average rate of 0.0707 seconds per five second window while maxmin subsampling takes an average of 0.2913 seconds. As such TRED's effectiveness in real time subsampling applications is shown even without incorporating the more efficient online insertion and deletion algorithm for streaming applications. The run time improvement is expected for the type of trajectory data that we are considering as TRED should follow the computational complexity of octrees in this case which is O(n) considering the depth is a relatively small constant while maxmin subsampling has a computational complexity of O(n 2 ).

Street Map Evaluation
We conclude the evaluations by exploring the effectiveness of TRED in extracting the underlying road map of a GPS data set. We show that TRED is not only effective at filtering out noise (mostly due to the low GPS sampling rate), but also is able to extract a representative map from the raw GPS trajectories. While TRED may not be a domain specific algorithm, we aim to achieve comparable performance with a significant improvement in computational complexity when compared to algorithms written specifically for the purpose of extracting the road network from GPS traces.

Streetmap Data Set
Ahmed, Karagiorgou, Pfoser, and Wenk [4] provide a collection of GPS tracking data sets along with state of the art algorithms for extracting a road network from the raw GPS data. The data sets used in this paper contain both the tracking data set as well as the ground truth with the associated evaluation metrics. The road networks in these data sets vary in covered area from 15.6 km 2 to 168 km 2 . We provide an overview of the data collection characteristics for each of the maps and the corresponding characteristics for the ground truth data structures from OpenStreetMap in Table 4. We note the methods of data collection vary slightly among the data sets with Athens large and small being collected from the routes of school buses, Berlin collected from a taxi fleet, and Chicago collected from university shuttle buses.

Streetmap Reconstruction Evaluation
We show the results of TRED as evaluated using two metrics presented by Ahmed et. al [3], directed Hausdorff distance and path-based distance. We now give a short overview of the metrics developed by Ahmed et. al for evaluation. Let the ground truth graph G have optimal path G * between two points and the reconstructed graph H have optimal path H * between two points. The directed Hausdorff distance can be calculated by taking all points in G * and finding their closest neighboring point in H * . The maximum distance over a pair of closest point is the Hausdorff distance. The path-based distance uses the Fréchet distance to calculate the metric along G * and H * such that the points along the paths are considered in a monotonic manner. The Fréchet distance here is often described as the minimum length leash required for a person to walk along the path G * while their dog walks along the path H * from the beginning to end of the path in a monotonic way.
In Table 5, we append our results to the results presented by Ahmed et. al [3] for comparing the success of TRED in extracting the underlying road network from GPS trajectories. As noted in [3], many algorithms failed to produce maps using the GPS traces from both Athens Large and Athens Small. Similarly, TRED was not evaluated on these datasets due to the small number of trajectories and large portions of the maps with only single trajectories. As a desired property of TRED, those regions with single trajectories are not recovered due to their low trajectoy density. For this reason, the evaluation of TRED on such datasets does not accurately evaluate TRED's ability to extract the underlying geometry of the space which is why those maps were excluded.
Despite comparing TRED to algorithms which are specifically written to generate road networks from noisy GPS traces, we see TRED is able to perform comparably in both the path based and directed Hausdorff distances. While performing comparably in the distance metrics for reconstruction accuracy, TRED is able to generate the maps at a vastly reduced computational cost as shown in the last column of Table 5. Figure 10 shows the output of our method.
We now show how the parameters for TRED align with physical attributes of the underlying maps. For each of the maps, the parameter R which represents the length for the base square that encloses the entire map is defined by the input trajectories. We empirically determined the maximum depth, M, of the quadtree which defined our radius of the finest scale square and the radius offset as r 1 = R · 2 −M and r 2 = (1 − √ 2) · R · 2 −M . As an analysis on the meaning of our selection of M, we compare the radius of finest scale square with the width of a road as our bins should aim to capture a single road. We show the parameters used to generate the maps in Table 6 as well as the comparison of bin with to the city road width. It is clear that the selected bin depth produces the bin width closest to the underlying physical city road width. Thus while parameters can be determined empirically, it may be beneficial to utilize known conditions of the space being reconstructed to guide the parameter selection.
It should be noted that TRED could be applied to some of the algorithms presented in Table 5 or even used as the base structure for an algorithm written specifically for map generation from noisy GPS data. Yet we show the comparison to highlight the effectiveness of our algorithm's success even when compared against tailored and domain specific algorithms. With the effectiveness of a generic TRED algorithm in mind, it would be a valuable candidate for extension into a domain specific algorithm as it currently performs comparably with improved runtime. As such a domain specific extension should either see a superior performance to existing algorithms or a top performance with a superior run time.
We conclude the evaluation of the map construction by noting run time comparisons. For available source codes, the runtimes on the Berlin and Chicago datasets are appended to Table 5 for comparison to TRED. From the runtimes gathered, it is clear that TRED is a far more efficient algorithm while still performing comparably on recovering the underlying road network. We see TRED runs in under one minute for each dataset and is well over 30 times faster than even the most efficient algorithm presented. From these performance and runtime comparisons, it is clear that TRED is an extremely efficient algorithm which is able to successfully extract underlying geometry of road networks even without being written as a road extraction algorithm.

Conclusion and Discussion
We propose a method that is able to generate a representative subsampling of points that captures the structure of a trajectory-based density function in space. TRED is robust to sampling frequency both when considering noise due to sampling frequency as well as point density due to sampling frequency. For this reaa) Berlin overlay b) Chicago overlay Fig. 10 The TRED reconstruction (blue lines) utilizing the GPS trajectories of vehicles overlaid on the ground truth network (black).  [6] due to legacy dependency issues. They are seen to be comparable to runtimes presented by [4] which are comparable to runtimes of algorithms run on the specified personal computer. son, TRED is a good candidate for datasets which have noise introduced through sampling techniques as well as inherent inaccuracy of the measurement devices.
We showed TRED is applicable to large data sets which need to be processed in an offline manner as well as streaming data which needs real-time processing. Our algorithm produces a deterministic datastructure which is desirable in reproducing results and creating guarantees about the performance of the algorithms for some given input data. The evaluation of TRED on real-time applications, namely activity recognition and street map generation, shows promising results. Adopting TRED for any spatio-temporal datasets which have an underlying geometry of interest will provide an efficient runtime complexity due to the well understood quadtree data structure while remaining robust to noise. This proof will built on the theoretical framework developed by Agarwal et al. [2] for the analysis of curvature-constrained shortest paths in convex polygons, so we introduce some necessary notation. For a trajectory to be moderate it cannot violate its maximum curvature constraint. Let P be a closed convex polygon with boundary ∂P . A trajectory whose trace is entirely contained within P is called free. If a trajectory is both moderate and free, then we will refer to it as feasible. Pockets are constructed from P and the boundary of a ball ∂B r whose radius r = 1 κ max . Let ∂B r intersect ∂P in at least two locations. Consider two consecutive intersection locations on ∂B r where the arc joining the points with length less than 1 κ max · π lies inside P . We call these points A and B. If the arc joining A and B is clockwise and the turning angle of ∂P [A, B] in the clockwise direction is less than π, Without loss of generality, let the maximum curvature of γ be 1. We drop the variable x from our notation since all sets will be centered around this point. We utilize the following inclusion relationship throughout the proof: We begin by showing the right hand side of Equation 4 from the main article holds true. In particular, the steps below show that C S r 1 ,S r 2 ≤ C S r 1 ,B r 2 ≤ C B r 2 , where C S r 1 ,B r 2 is constructed in a similar way to C S r 1 ,S r 2 except that the identification is done by using the segments within B r 2 . [ Step 1] The bound C S r 1 ,B r 2 ≤ C B r 2 holds: Let us define A S r 1 ,B r 2 = A S r 1 / ∼ which has as elements the intervals corresponding to segments in A S r 1 that are identified using segments corresponding to A B r 2 . There is a clear injective map between A S r 1 ,B r 2 and A B r 2 (i.e., map the elements in A S r 1 ,B r 2 to the interval in A B r 2 that is used on its identification). Hence, we get C S r 1 ,B r 2 ≤ C B r 2 . [ Step 2] The bound C S r 1 ,S r 2 ≤ C S r 1 ,B r 2 holds: We proceed by showing that there is an injective mapping i : A S r 1 ,S r 2 → A S r 1 ,B r 2 . Let us take an element p ∈ A S r 1 ,S r 2 , then this element was formed by taking a subset of intervals {[a j ]} j ⊂ S r 1 and identifying them using a single interval [c (p) , d (p) ] ⊂ S r 2 . There is also an element q ∈ A S r 1 ,B r 2 associated with the interval [a 1 ]). We also have a single interval [e (q) , f (q) ] ⊂ B r 2 associated with q. We let i(p) = q. In order to show that this map is injective, we assume that there exist two elements p 1 and p 2 ∈ A S r 1 ,S r 2 that map to the same q, and show that p 1 = p 2 . By contradiction, we assume that p 1 and p 2 are different. In this case, we must have that [c (p 1 ) , d (p 1 ) ] = [c (p 2 ) , d (p 2 ) ]. If not, then all the segments that each [c (p k ) , d (p k ) ] identify would be the same, hence making p 1 = p 2 . Furthermore, remember that the in- and [e (q) , f (q) ] = ∅ (due to the fact that it is associated with q) then we have that [c (p 1 ) , d (p 1 ) ] ∩ [c (p 2 ) , d (p 2 ) ] = ∅, which is a contradiction. Hence, p 1 = p 2 which implies that the mapping is injective and so C S r 1 ,S r 2 ≤ C S r 1 ,B r 2 .
By combining steps 1 and 2, we conclude that C S r 1 ,S r 2 ≤ C B r 2 . (2) Next, we show that C B r 1 ≤ C S r 1 ,S r 2 . As done previously, we can show this by constructing an injective mapping j : A B r 1 → A S r 1 ,S r 2 . For any element p ∈ A B r 1 associated to the interval [a, b] ⊂ B r 1 , we can find an interval [c, d] ⊂ S r 1 that contains [a, b] (since B r 1 ⊂ S r 1 ), which in turn is identified to an element q ∈ A S r 1 ,S r 2 . We want to show that the mapping j(p) → q is injective. Hence, we want to show that if j(p 1 ) = q = j(p 2 ) then [a 1 , b 1 ] = [a 2 , b 2 ]. By contradiction, if this was not the case then we would have a segment of the trajectory γ that is contained in S r 2 and that escaped and entered B r 1 while remaining in S r 2 . It needs to escape and enter B r 1 in order to form two different components in B r 1 , and it needs to remain within S r 2 in order to be identified as a single element in A S r 1 ,S r 2 . We will show that this is not possible. If the trajectory segment in question exists then it either escapes and enters B r 1 by remaining in B r 2 , or also leaves B r 2 while remaining in S r 2 . The later case means that there is a subsegment that escapes and enters B r 2 while entering and escaping a pocket of S r 2 − B r 2 (see Figure 2 for an illustration). The following steps show that neither of these cases can happen. [ Step 3] There cannot exist a trajectory segment of γ that escapes and enters B r 1 and remains entirely in B r 2 : For a trajectory segment to escape B r 1 and reenter while staying in B r 2 , it would mean that it would have to curve beyond its curvature limits since r 2 ≤ 1/κ max . [ Step 4] There cannot exist a trajectory segment of γ that leaves and enters B r 2 by entering and escaping a pocket of S r 2 − B r 2 : Such a trajectory segment cannot travel between pockets of S r 2 − B r 2 because the set is four disconnected subsets of S r 2 . Since the trajectory must escape B r 2 but remain in S r 2 , it must enter one pocket of S r 2 − B r 2 and reenter B r 2 (i.e., escape the pocket). By Lemma 1, a feasible trajectory that enters a pocket from B r 2 cannot reenter through B r 2 (i.e., escape). That is, such a trajectory segment is not possible.
Hence, by steps 3 and 4, we must have that [a 1 , b 1 ] = [a 2 , b 2 ], j is injective, and so C B r 1 ≤ C S r 1 ,S r 2 . ( This completes the proof.
B Proof of Stability Theorem 2 Figure 3 from the main article provides a constructive proof showing that C S r is an unstable count. Furthermore, if C B r is stable for r < 1 κ max then by Theorem 1 from the main article, it is immediate that C S r 1 ,S r 2 has an upper bound which is controlled by C B r 2 . Hence, it follows that C S r 1 ,S r 2 would be stable for r 2 < 1 κ max . Therefore, the rest of the section focuses on proving that C B r is stable for r < 1 κ max .
Since we are only considering small -perturbation, we only need to consider the segments of γ that are in B r+ . Anything outside this ball cannot affect the count. Since each one of these segments has a fixed length, and perturbations cannot increase its length indefinitely since they are small (i.e., a perturbed curve cannot wrap around itself indefinitely otherwise it would violate the curvature constraints), then we only need to show that a segment of finite length in B r+ can only generate a finite count. Alternatively, such segment can only enter and escape the ball B r a finite number of times.
Without loss of generality, let us assume that the segment starts in B r and escapes at location y 1 and reenters B r through y 2 . We show that the distance between y 1 and y 2 has a lower bounded and hence having an infinite number of entry and exit events is impossible since the segment has finite length. Using the Fortune and Wilfong's algorithm [16], we can find the reachable set for the trajectory escaping through y 1 . Figure 3 shows the reachable set restricted by curvature and trajectories entirely within B (y 1 ). Note that none of these trajectory has reentered B r yet, and they all have length greater or equal to . Hence, this implies that there can only be a finite number of escape and entry points, and the count is bounded and stable. This concludes the proof.