Data Mining of Network Events with Space-Time Cube Application

Scientific and practical principles of data analysis are proposed using the space-time cube construction methodology, a type of data mining for data with a spatio-temporal distribution. Applying this method to information from subscribers of one of the major mobile operator networks makes it possible to carry out statistical analysis and to detect statistically significant spatio-temporal clusters in the data. These clusters can be used when structuring data in order to ensure safety and respond quickly to hazardous situations.


I. INTRODUCTION
Since 2013, Ukraine's national security has become increasingly important. Every day, citizens of Ukraine face a wide range of threats of a natural, technological, social and military character. Dangerous processes, extreme events, catastrophes, virtual and real terrorism, and so on: these are exactly the things to which public authorities must respond practically every day. Moreover, a rapid response to an emergency significantly increases the chances of reducing the number of victims and other negative consequences at every scale, from an individual person to the state as a whole.
The modern world is extremely rich in diverse information, vast arrays of which people collect, store and analyze, and on the basis of which they try to make forecasts and predictions. This is especially true of information that directly or indirectly affects human security. Such information is equally important for government bodies and for public and private institutions, since its correct analysis allows them to take a step in the right (progressive) direction. However, working with big data dispersed not only over a large area but also over a long time interval usually makes it difficult to reach the right decision, or significantly increases the time needed to reach it.
Security is one of the main issues of our time, and it often requires quick and balanced decisions, especially when eliminating the consequences of a disaster, where sometimes every minute counts. For example, international statistics show that the number of people rescued after an earthquake depends directly on when rescue operations begin. If rescuers arrive in the earthquake zone within the first three hours, they can save up to 90% of the survivors; within six hours, 50%. After that, the chances of rescue drop sharply. Rapid response alone can reduce the number of victims by 20-30 percent [1][2]. That is why it is important to obtain information and respond immediately after an emergency rather than after a delay, and, moreover, to prevent in advance emergencies that could lead to a significant number of victims.

II. BACKGROUND
Security issues have recently become increasingly significant due to the growing number of threats to ordinary people and to the region or the country as a whole. One option for addressing this issue is to study, analyze and forecast events by building a space-time cube. The space-time cube was first proposed by T. Hägerstrand in the early 1970s [3], who described its possibilities in his work "What about people in regional science?". However, until the active development of geographic information systems (hereafter GIS), its use was limited. Only in the 2000s did work on the use of the space-time cube in GIS appear. The works of this period presented new possibilities for using the space-time cube in GIS, including earthquake studies [4][5][6][7].
The next step was the application of the space-time cube method to the data mining of data of various kinds: crime analysis, infrastructure studies, animal behavior analysis, visualization of human movement, studies of dependence on changing weather conditions over time, and much more [8][9][10][11][12][13]. In Ukraine, the work of the Institute for Applied System Analysis of NTUU "KPI" and of the World Data Center for Geoinformatics and Sustainable Development is widely known in the field of data mining [14][15][16][17][18].

III. GOAL AND TASKS
The goal of the work is to analyze spatio-temporal regularities in the distribution of events in the Vodafone network using the space-time cube construction methodology.
The tasks are:
• to study the methodology of using the space-time cube for data mining of spatio-temporal data;
• to study the peculiarities of applying the space-time cube construction method to the analysis of spatio-temporal data series generated by users of the Vodafone telecommunication network;
• to use space-time cube construction to analyze the distribution of spatial and temporal patterns in mobile data for the purpose of emergency response to natural and social emergencies.

IV. SPACE-TIME CUBE
The space-time cube is a 3D visualization technique designed to represent the spatial and temporal characteristics of motion simultaneously. Accordingly, trajectory points are displayed in three-dimensional space, where the vertical axis usually expresses time [19].
In the early 1970s, T. Hägerstrand [3] proposed a graphical approach that represents time as an addition to the spatial dimensions. He developed a three-dimensional diagram, the space-time cube, which allows space-time events and the interactions of processes to be explored visually. The base of the cube represents the flat geographic dimensions, and its height represents time. Initially, the tool was designed for drawing graphics by hand. Today, there are several approaches to building such models automatically with the tools of modern GIS.
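The geometry of the cube can be illustrated with a minimal sketch, assuming a trajectory given as hypothetical (x, y, timestamp) records in a projected plane. Each point is lifted to (x, y, t), where t (hours since the first record) is the vertical axis. In this representation a vertical segment of the path means the object stayed in place, and a flatter segment means faster movement. The function names are illustrative only.

```python
import math
from datetime import datetime


def to_cube_coords(points, t0):
    """Lift (x, y, timestamp) records to (x, y, t) cube coordinates.

    The cube base is the (x, y) plane; the vertical axis t is the number
    of hours elapsed since the reference time t0.
    """
    return [(x, y, (ts - t0).total_seconds() / 3600.0) for x, y, ts in points]


def segment_speeds(cube_points):
    """Horizontal distance per hour for each consecutive pair of cube points.

    A speed of 0 corresponds to a vertical segment: the object did not move.
    """
    speeds = []
    for (x1, y1, t1), (x2, y2, t2) in zip(cube_points, cube_points[1:]):
        speeds.append(math.hypot(x2 - x1, y2 - y1) / (t2 - t1))
    return speeds


# Hypothetical trajectory in planar units (e.g. kilometres).
track = [
    (0.0, 0.0, datetime(2017, 6, 1, 8, 0)),
    (3.0, 4.0, datetime(2017, 6, 1, 9, 0)),
    (3.0, 4.0, datetime(2017, 6, 1, 11, 0)),
]
cube = to_cube_coords(track, t0=track[0][2])
print(cube)                  # [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (3.0, 4.0, 3.0)]
print(segment_speeds(cube))  # [5.0, 0.0]
```

The second segment is vertical in the cube: the object spent two hours at the same position.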
Using the space-time cube to analyze events requires both spatial and temporal data. Examples of such events include earthquakes, road accidents, cases of disease, and sightings of rare animals [6].
T. Hägerstrand proposed applying the space-time cube to data on the movement of objects, that is, to changes of their location over time. In this paper, the authors propose applying Hägerstrand's concept to another type of data, namely, to the analysis of network events.
In addition, the space-time cube makes it possible to answer the three questions of Peuquet [5] posed of spatio-temporal data [20]:
• when + where → what: a description of the objects or set of objects present at a given location or set of locations at a given time or time interval;
• when + what → where: a description of the location or set of locations occupied by a given object or set of objects at a given time or time interval;
• where + what → when: a description of the time or time interval at which a given object or set of objects occupied a given location or set of locations.
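The three question types can be made concrete with a minimal sketch over a hypothetical list of (object, location, time) observations: each query fixes two of the three components and returns the set of values of the third. The record layout, place names and function names are illustrative assumptions, not part of the authors' toolkit.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    obj: str       # "what": an object, e.g. a subscriber
    location: str  # "where": a place or area
    time: int      # "when": e.g. a time-step index


def query_what(observations, times, places):
    """when + where -> what: objects present at the given places and times."""
    return {o.obj for o in observations if o.time in times and o.location in places}


def query_where(observations, times, objects):
    """when + what -> where: locations occupied by the given objects at the given times."""
    return {o.location for o in observations if o.time in times and o.obj in objects}


def query_when(observations, places, objects):
    """where + what -> when: times at which the given objects occupied the given places."""
    return {o.time for o in observations if o.location in places and o.obj in objects}


observations = [
    Observation("subscriber_A", "Lviv", 1),
    Observation("subscriber_A", "Ternopil", 2),
    Observation("subscriber_B", "Lviv", 2),
]
print(query_what(observations, {2}, {"Lviv"}))                      # {'subscriber_B'}
print(sorted(query_where(observations, {1, 2}, {"subscriber_A"})))  # ['Lviv', 'Ternopil']
print(sorted(query_when(observations, {"Lviv"}, {"subscriber_A", "subscriber_B"})))  # [1, 2]
```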
In addition to the space-time cube, a number of other methods are also used to display the dynamics of events in time.
The Time Slices model is one of the first spatio-temporal data models. Its main features are:
1. Storage of data at regular time intervals.
2. Separate data sets for each specific time interval.
The research uses data provided by Vodafone, which have spatial and temporal references as well as some attribute information. The processed database contains 1.5 million calls, messages and Internet sessions from a wide variety of devices and subscribers. Practically all "events" are concentrated in the western regions of Ukraine and are referenced to the WGS84 geographic coordinate system.
To achieve these goals, the authors use a set of tools for in-depth analysis of spatial and temporal regularities in the ArcMap 10.5 software. This toolset contains statistical tools for analyzing data distributions and identifying patterns in a space-time context. It includes the Create Space Time Cube and Emerging Hot Spot Analysis tools.
The dataset combines attributes that characterize the nature of the communication event, its location, the features of calls and devices, and subscriber preferences. A network event is described by its type: incoming or outgoing call, SMS, or Internet usage. The location is described by the direction and coordinates of the receiving base station. The features of network events include the tariff plan, the number category, the volume of Internet traffic, the cost of use, and the device type, which distinguishes ordinary mobile phones from various types of smartphones. The subscriber's personal preferences are represented by three attributes giving the interests ranked first, second and third for that subscriber. Examples of such preferences are the categories science, culture, tourism, travel, football, etc.
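The record structure described above can be sketched as a typed data class. This is a hypothetical schema assembled from the attribute groups listed in the text; the field names, units and validation rules are assumptions, not the actual layout of the Vodafone dataset.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Tuple


class EventType(Enum):
    INCOMING_CALL = "incoming_call"
    OUTGOING_CALL = "outgoing_call"
    SMS = "sms"
    INTERNET = "internet"


@dataclass(frozen=True)
class NetworkEvent:
    # Nature of the event and its time binding
    event_type: EventType
    timestamp: datetime
    # Location: receiving station coordinates (WGS84 degrees) and direction
    station_lon: float
    station_lat: float
    direction_deg: float
    # Features of the event, tariff and device
    tariff_plan: str
    number_category: str
    traffic_mb: float   # Internet traffic volume; 0.0 for calls and SMS
    cost: float
    device_type: str    # e.g. "mobile_phone" or a smartphone type
    # Subscriber interests ranked first, second and third
    interests: Tuple[str, str, str]

    def __post_init__(self):
        if len(self.interests) != 3:
            raise ValueError("interests must hold exactly three ranked categories")
        if self.traffic_mb < 0:
            raise ValueError("traffic_mb must be non-negative")


event = NetworkEvent(
    event_type=EventType.INTERNET,
    timestamp=datetime(2017, 7, 15, 14, 30),
    station_lon=24.03, station_lat=49.84, direction_deg=120.0,
    tariff_plan="plan_X", number_category="regular",
    traffic_mb=12.5, cost=0.8, device_type="smartphone",
    interests=("football", "travel", "science"),
)
```

Making the record immutable (`frozen=True`) suits event logs, which are appended but not edited after the fact.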
A space-time cube is created by aggregating event points in space and time into a cubic structure stored in the netCDF format. The hot spot analysis tool uses this cubic structure to detect statistically significant trends over time. This type of analysis is well suited to studying crimes, disease outbreaks and events in social networks.
The basic unit of the cube is the space-time bin (Fig. 1), which holds the number of points falling at each location within each time step; trends in these counts over time are then evaluated with the Mann-Kendall statistic [20]. The space-time cube consists of rows, columns and time steps, which together determine the total number of bins in the cube. The rows and columns correspond to the placement of objects in the latitude-longitude plane, and the height of the cube corresponds to the time period. If an event occurs within a certain period of time, it is recorded in the bin with the corresponding spatial and temporal characteristics.
A bin of data is created wherever at least one space-time event is present. A bin without data receives a count of zero but can be kept in the common structure to maintain data continuity. Information on the number of such empty bins is reported as the data sparseness.
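The aggregation step can be sketched as follows, assuming events already projected to planar metres and a regular grid anchored at a hypothetical origin (x0, y0). Each event is assigned to a (row, column, time step) bin, every bin is emitted (empty ones with a count of zero), and sparseness is taken here, as one simple measure, to be the share of empty bins. The key names `location_id`, `time_step_id` and `count` are illustrative; this is not the ArcGIS implementation.

```python
from collections import Counter
from datetime import datetime, timedelta


def aggregate_to_bins(events, x0, y0, t0, cell_size, time_step,
                      n_rows, n_cols, n_steps):
    """Count projected events (x, y, timestamp) per space-time bin.

    Bins are indexed by (row, col, step); every bin is returned, including
    empty ones (count 0), so the cube keeps a continuous structure.
    Events outside the cube extent are ignored.
    """
    counts = Counter()
    for x, y, ts in events:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        step = (ts - t0) // time_step  # timedelta // timedelta -> int
        if 0 <= row < n_rows and 0 <= col < n_cols and 0 <= step < n_steps:
            counts[(row, col, step)] += 1

    bins = []
    for row in range(n_rows):
        for col in range(n_cols):
            for step in range(n_steps):
                bins.append({
                    "location_id": row * n_cols + col,
                    "time_step_id": step,
                    "count": counts[(row, col, step)],
                })
    return bins


def sparseness(bins):
    """Share of bins with a zero count (one simple sparseness measure)."""
    return sum(b["count"] == 0 for b in bins) / len(bins)


t0 = datetime(2017, 6, 1)
events = [
    (100.0, 100.0, t0 + timedelta(days=1)),
    (150.0, 900.0, t0 + timedelta(days=2)),
    (1500.0, 200.0, t0 + timedelta(days=6)),
]
bins = aggregate_to_bins(events, x0=0.0, y0=0.0, t0=t0, cell_size=1000.0,
                         time_step=timedelta(days=5), n_rows=2, n_cols=2, n_steps=2)
print([b for b in bins if b["count"] > 0])
print(sparseness(bins))  # 0.75
```

Keeping the empty bins is what gives every location a complete time series, which the trend test applied to each location needs.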

V. MANN-KENDALL TEST
Only point feature classes describing events that have taken place can be used as input. Such events may include network events, emergencies, trade transactions and other events referenced in time and space. Time binding is done using an attribute of the Date type. The toolkit handles from 60 to 2 billion events, which provides sufficient flexibility in data processing. To obtain valid distance calculations, projected (rectangular) coordinate systems are used.
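Because the source data are in WGS84 degrees, distances must be computed in a planar system. As a minimal sketch (an assumption, not the projection the authors used), a local equirectangular approximation converts degrees to metres around a reference point and agrees closely with the great-circle distance over short ranges. A real workflow would instead project the data to a proper projected coordinate system, such as an appropriate UTM zone.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # approximate mean Earth radius, metres


def local_equirectangular(lon, lat, lon0, lat0):
    """Project WGS84 degrees to planar metres around the origin (lon0, lat0).

    A local approximation only: accurate over small extents, with distortion
    growing away from the origin.
    """
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance on a sphere of radius EARTH_RADIUS_M, for comparison."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two hypothetical station positions about 1.3 km apart in western Ukraine.
lon0, lat0 = 24.00, 49.80
x, y = local_equirectangular(24.01, 49.81, lon0, lat0)
print(round(math.hypot(x, y)), round(haversine_m(lon0, lat0, 24.01, 49.81)))
```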
An important part of the tool's operation is the analysis of the data bins used during modeling. The basic operation is determining the overall trend of the data, which is calculated from the time series of each location. Trend analysis reveals positive or negative trends in the number of events and is based on the Mann-Kendall statistic.
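The non-parametric Mann-Kendall test is commonly employed to detect monotonic trends in series of data. The null hypothesis, $H_0$, is that the data come from a population with independent realizations and are identically distributed. The alternative hypothesis, $H_A$, is that the data follow a monotonic trend. The Mann-Kendall test statistic is calculated according to:
$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k), \qquad \operatorname{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$
The mean of $S$ is $E[S] = 0$ and the variance is
$$\sigma^2 = \frac{1}{18}\left\{ n(n-1)(2n+5) - \sum_{p=1}^{q} t_p (t_p - 1)(2t_p + 5) \right\}$$
where $q$ is the number of tied groups in the data set and $t_p$ is the number of data points in the $p$-th tied group. The statistic $S$ is approximately normally distributed provided that the following Z-transformation is employed:
$$Z = \begin{cases} \dfrac{S - 1}{\sigma}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sigma}, & S < 0 \end{cases}$$
The statistic $S$ is closely related to Kendall's $\tau$ as given by:
$$\tau = \frac{S}{D}$$
where
$$D = \left[ \frac{1}{2} n(n-1) - \frac{1}{2} \sum_{p=1}^{q} t_p (t_p - 1) \right]^{1/2} \left[ \frac{1}{2} n(n-1) \right]^{1/2}$$
The resulting Vodafone data set covers the period from June 1, 2017 to August 31, 2017. For convenience of analysis, the authors used a 5-day time step. As a result, the tool built a cube with a height of 19 bins (Fig. 2).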
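The Mann-Kendall definitions in this section can be checked with a direct O(n²) implementation. This is a minimal sketch that follows the definitions of S, its tie-corrected variance, Z and τ; the two-sided p-value from the normal approximation is an added convenience. The hypothetical helper `n_time_steps` confirms the arithmetic behind the cube height, assuming both end dates are included: June 1 to August 31, 2017 spans 92 days, and 92 / 5 = 18.4 rounds up to 19 time steps.

```python
import math
from collections import Counter
from datetime import date


def mann_kendall(x):
    """Mann-Kendall trend test for a series x ordered in time.

    Returns (S, Var(S), Z, tau, p), where p is the two-sided p-value of Z
    under the normal approximation.
    """
    n = len(x)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)  # sgn(x_j - x_k)

    # t_p: sizes of the tied groups (values occurring more than once)
    ties = [t for t in Counter(x).values() if t > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18

    # Continuity-corrected Z-transformation
    if var_s == 0 or s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)

    pairs = 0.5 * n * (n - 1)
    d = math.sqrt(pairs - 0.5 * sum(t * (t - 1) for t in ties)) * math.sqrt(pairs)
    tau = s / d if d > 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, var_s, z, tau, p


def n_time_steps(start, end, step_days):
    """Number of time steps covering start..end, both dates included."""
    days = (end - start).days + 1
    return math.ceil(days / step_days)


print(mann_kendall([3, 5, 4, 8, 9, 12]))
print(n_time_steps(date(2017, 6, 1), date(2017, 8, 31), 5))  # 19
```

In the cube, such a test would be run on the count series of each bin location, one value per time step.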

VI. HOT SPOTS ANALYSIS
The Emerging Hot Spot Analysis tool identifies trends in the clustering of point densities (counts) or summary fields in a cube. The categories of hot and cold spots include the following [21]: new, consecutive, intensifying, persistent, diminishing, sporadic, oscillating and historical (Fig. 3). The tool identifies variation in the input cube using the Getis-Ord Gi* statistic, which is calculated for each bin in the cube with respect to its neighbors.
The Hot Spot method calculates a statistic for each event in the data set. The resulting p-values (probabilities) and z-scores (standard deviations) indicate where in space events with high or low values cluster [5]. The method works by analyzing each event in the context of neighboring events. To be a statistically significant hot spot, an event must have a high value and be surrounded by other events with high values as well. The local sum for an event and its neighbors is compared proportionally to the sum of all events; when the local sum differs greatly from the expected local sum, and the difference is too large to be the result of a random process, a statistically significant z-score results. The hot spot statistic is computed as:
$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left( \sum_{j=1}^{n} w_{i,j} \right)^2}{n - 1}}}$$
where $x_j$ is the attribute value for event $j$, $w_{i,j}$ is the spatial weight between events $i$ and $j$, $n$ is the total number of events, $\bar{X}$ is the arithmetic mean of the values, and $S$ is their standard deviation:
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
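The Gi* formula in this section translates directly into code. The sketch below computes it for a hypothetical row of five bins with simple binary weights (a bin and its immediate neighbors, including itself, as Gi* requires). It illustrates the formula only and is not the ArcGIS implementation, which also handles space-time neighborhoods and multiple-testing corrections.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for every feature.

    values:  attribute values x_j (e.g. bin counts).
    weights: n x n matrix, weights[i][j] = w_ij. For Gi* the feature itself
             is included in its own neighborhood (weights[i][i] > 0).
    Note: the denominator is zero if a row gives every feature the same
    weight, because the local sum then equals the global sum.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical row of five bins: high counts on the left, low on the right.
counts = [10, 12, 11, 1, 2]
# Binary weights: a bin and its immediate left/right neighbors.
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
print([round(z, 2) for z in getis_ord_gi_star(counts, weights)])
```

The bins on the high-valued side receive positive z-scores and those on the low-valued side negative ones, as the text describes.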
The Gi* statistic gives each object in the set its own z-score. For a positive z-score, the larger its value, the more intense the clustering of high values (a hot spot). For a negative z-score, the smaller its value, the more intense the clustering of low values (a cold spot).
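Turning a z-score into a hot or cold spot label requires a significance level. A minimal sketch, assuming a two-sided test under the standard normal distribution: at the 95% level this is equivalent to |z| > 1.96. The function names and the default `alpha` are illustrative choices.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def classify_spot(z, alpha=0.05):
    """Label a Gi* z-score as a hot spot, cold spot, or not significant."""
    if two_sided_p(z) >= alpha:
        return "not significant"
    return "hot spot" if z > 0 else "cold spot"


for z in (2.7, 1.0, -2.1):
    print(z, round(two_sided_p(z), 4), classify_spot(z))
```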
The output features are added to the Table of Contents and summarize the spatio-temporal analysis for all analyzed locations. In addition to the output feature class, a summary of the analysis results is written to the Results window (Fig. 4).

VII. LOCAL OUTLIER ANALYSIS
The toolset also includes the Local Outlier Analysis tool, which identifies statistically significant clusters and outliers in both space and time. To detect statistically significant outliers, it uses the Anselin Local Moran's I statistic, which evaluates the value of each bin relative to its neighbors.
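The statistic is computed as:
$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1, j \ne i}^{n} w_{i,j} (x_j - \bar{X})$$
where $x_i$ is the attribute value for feature $i$, $\bar{X}$ is the mean of the corresponding attribute, $w_{i,j}$ is the spatial weight between features $i$ and $j$, and
$$S_i^2 = \frac{\sum_{j=1, j \ne i}^{n} (x_j - \bar{X})^2}{n - 1}$$
with $n$ equal to the total number of features.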
The $z_{I_i}$-score for the statistic is computed as:
$$z_{I_i} = \frac{I_i - E[I_i]}{\sqrt{V[I_i]}}$$
where:
$$E[I_i] = -\frac{\sum_{j=1, j \ne i}^{n} w_{i,j}}{n - 1}, \qquad V[I_i] = E[I_i^2] - E[I_i]^2$$
A positive value of $I_i$ indicates that a feature has neighboring features with similarly high or low values and may be part of a cluster. A negative value indicates that the feature's value differs from the values of its neighbors, that is, the feature is an outlier. In all cases, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.
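The Local Moran's I formulas in this section can be sketched directly for a hypothetical row of bins with binary adjacency weights. The function returns $I_i$, $E[I_i]$ and a quadrant label (HH and LL for clusters, HL and LH for outliers) obtained by comparing each value with the weighted sum of its neighbors' deviations. Significance testing is deliberately left out; this is an illustration of the statistic, not the tool's implementation.

```python
def local_morans_i(values, weights):
    """Anselin Local Moran's I_i and its expectation E[I_i] for each feature.

    weights[i][j] = w_ij; the diagonal is ignored (sums run over j != i).
    Returns a list of (I_i, E[I_i], quadrant) tuples. The quadrant compares
    the feature with its weighted neighbors: HH / LL (cluster, I_i > 0) or
    HL / LH (outlier, I_i < 0). A deviation of exactly 0 is labelled "L".
    """
    n = len(values)
    mean = sum(values) / n
    results = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        s2 = sum((values[j] - mean) ** 2 for j in others) / (n - 1)
        lag = sum(weights[i][j] * (values[j] - mean) for j in others)
        dev = values[i] - mean
        moran_i = dev / s2 * lag
        expected = -sum(weights[i][j] for j in others) / (n - 1)
        quadrant = ("H" if dev > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((moran_i, expected, quadrant))
    return results


# Hypothetical row of bins with one unusually low value in the middle.
counts = [10, 11, 1, 12, 10]
weights = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
for i, (moran_i, expected, quadrant) in enumerate(local_morans_i(counts, weights)):
    print(i, round(moran_i, 2), expected, quadrant)
```

The middle bin comes out as an LH outlier with a negative $I_i$: a low value surrounded by high ones.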
To assign bins to clusters, a conceptualization of spatial relationships is first defined; it determines which bins are treated as neighbors and hence to which cluster a bin can belong. The values of the bins are then evaluated in the vicinity of the cluster center.
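One possible conceptualization of space-time relationships can be sketched as binary weights. This is an assumption for illustration: a bin counts as a neighbor if its cell lies within a given Euclidean distance (in cells) of the target bin and its time step is the same or up to a given number of steps earlier. The function and parameter names are hypothetical.

```python
def space_time_neighbors(bin_index, n_rows, n_cols, distance, time_window):
    """Neighbors of a (row, col, step) bin under binary space-time weights.

    A bin is a neighbor if its cell lies within `distance` cells (Euclidean)
    and its time step is the same or up to `time_window` steps earlier.
    The bin itself is excluded.
    """
    r, c, t = bin_index
    neighbors = []
    for tt in range(max(0, t - time_window), t + 1):
        for rr in range(max(0, r - distance), min(n_rows, r + distance + 1)):
            for cc in range(max(0, c - distance), min(n_cols, c + distance + 1)):
                within = (rr - r) ** 2 + (cc - c) ** 2 <= distance ** 2
                if within and (rr, cc, tt) != (r, c, t):
                    neighbors.append((rr, cc, tt))
    return neighbors


nbrs = space_time_neighbors((1, 1, 1), n_rows=3, n_cols=3, distance=1, time_window=1)
print(len(nbrs))  # 9
```

With a distance of one cell the diagonal cells are excluded, so each time step contributes the four adjacent cells, plus the bin's own cell in the previous step.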
Bins with high local outlier values reflect abnormal changes in user behavior, which may be either positive or negative in nature. Combined with classifiers and social news distribution channels, such anomalies can be identified and reported to the relevant government agencies and services. The space-time cube toolkit provides a convenient visual interface for data mining of big data. The space-time cube can be applied in virtually all areas where it is necessary to analyze the behavior of objects and events that change their location in space and time.
The example of spatio-temporal analysis of events in a mobile network, here the Vodafone network, shows that such data can be used more effectively, primarily for security purposes. This will be useful to government organizations for the rapid detection or prevention of dangerous situations (such as terrorism, emergencies, catastrophes, etc.). In the future, a space-time cube built from mobile operator data can be used to analyze statistical outliers in subscribers' call or Internet activity tied to a particular territory, which will make it possible to identify anomalies and respond accordingly.

3. Time-dependent (cross-time) classification of data storage objects. Such a model is convenient at the stage of transition from a spatial to a space-time model.
The time series model is a model with a base state and subsequent changes. Unlike the time slices model, it stores only the base state of objects and their changes, which occur at generally irregular time intervals. Thus, the time series model contains much less redundant data than the time slices model.
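The difference in redundancy between the time slices and time series models can be shown with a minimal sketch, assuming object states are stored as hypothetical dicts of `{object_id: value}`. Regular time slices are converted into a base state plus a list of changes, and any slice can be rebuilt from them. Names and the use of `None` as a deletion marker are illustrative assumptions.

```python
def slices_to_base_and_changes(snapshots):
    """Convert time slices (one full state per step) into base state + changes.

    snapshots: list of dicts {object_id: value}, one per time step.
    Returns (base, changes) where changes is a list of (step, object_id, value);
    value None marks an object that disappeared (so None cannot be a real value).
    """
    base = dict(snapshots[0])
    changes = []
    for step in range(1, len(snapshots)):
        prev, cur = snapshots[step - 1], snapshots[step]
        for key in sorted(prev.keys() | cur.keys()):
            if prev.get(key) != cur.get(key):
                changes.append((step, key, cur.get(key)))
    return base, changes


def state_at(base, changes, step):
    """Rebuild the full state at a given step from the base state and changes."""
    state = dict(base)
    for change_step, key, value in changes:
        if change_step > step:
            break  # changes are stored in step order
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


slices = [
    {"bin_A": 5, "bin_B": 0},
    {"bin_A": 5, "bin_B": 0},
    {"bin_A": 7, "bin_B": 0},
]
base, changes = slices_to_base_and_changes(slices)
print(changes)  # [(2, 'bin_A', 7)]
```

Here the three slices store six values, while the time series form stores the two-value base state and a single change.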

Fig. 2. 3D visualization of space-time cube for western regions of Ukraine

Fig. 3. Transformation of space-time cube for analysis through hot spots
Each bin in the cube has position and time properties recorded in a three-dimensional structure, stored in the attributes LOCATION_ID, time_step_ID and COUNT. Bins with the same spatial and temporal identifier values can be associated with the corresponding rows and

Fig. 4. Map of hot spot occurrence resulting from the space-time cube analysis
The cluster analysis tool divides bins and sets of objects into cluster categories with high and low trend values. In the process, statistical outliers in the spatial data are also identified. Based on the z- and p-values of the Anselin Local Moran's I statistic, each time series receives a coded value indicating its membership in a particular cluster, together with the corresponding statistical value. The local Moran's I statistic of spatial association is defined in Section VII.

Fig. 5. Map of local outliers created as a result of the space-time cube analysis