Border surveillance using computer vision enabled robotic swarms for semantically enriched situational awareness



Introduction
Political instabilities, war conflicts, economic crises and the maximization of personal profit are among the main causes of increased illegal activity at border territories. Cross-border crime refers to any serious crime with a cross-border dimension committed at or along the external borders [20].
Towards maximizing the overall profit, such activities involve in many cases the [...] Such systems mostly involve video and thermal cameras, dedicated sensors for motion, pressure etc., RFID tags, radars and satellite images. Despite their sufficient effectiveness, each system displays either environmental restrictions or limited capacities due to spatial heterogeneity. In addition, the majority of these sensory systems are static, resulting in restricted monitored areas that depend strictly on their technical specifications. As a result, border authorities are currently adopting novel technologies that render existing infrastructure legacy systems. Unmanned Vehicles (UxV) provide such cutting-edge technologies and are utilized as complete border surveillance solutions. In this book chapter, we introduce and analyse relevant robotic technologies combined with swarm intelligence for a completely autonomous border surveillance system. In addition, pioneering visual detection approaches are presented for increased efficiency, while semantic data representation models upgrade the overall capacities for optimum situation awareness.
The rest of the chapter is organized as follows. Section 2 introduces swarm intelligence as an autonomous navigation scheme, while Section 3 presents enhanced visual detection models. Section 4 describes semantic enrichment models towards increased situation awareness, while Section 5 concludes the chapter by highlighting the benefits of such technologies.

Swarm intelligence for autonomous navigation
The utilization of different UxVs is gaining popularity in missions that demand immediate situation awareness or are considered hazardous to human life. Thanks to these technologies, data acquisition from the operational areas of interest is now safer, faster and more affordable. However, despite the convenience that a UxV can offer, such systems require a specialized operator to command and manipulate the assets. The complexity of the process increases in missions where multiple UxVs are commanded to complete one major objective. In such cases, not only does the number of operators increase accordingly, but the personnel must also be in continuous communication to achieve the overall mission.
An autonomous, yet safe and secure, navigation system for operating UxVs has proven essential in numerous application fields. Introducing autonomy for navigation objectives decreases the operator's interference in the overall operations, since the operator's role shifts from low-level control to the management of higher-level objectives for the defined missions, without requiring a priori knowledge of how to operate multiple and heterogeneous UxVs. After the identification of high-level objectives, the navigation system designs robot trajectories in order to successfully complete the overall goal of the defined mission. During the execution of the defined mission, the operator acts only as a supervisor; nonetheless, for safety reasons, the system remains responsive to operator intervention at any moment. Thus, the process is more effective, since the operator can utilize multiple UxVs without any special expertise or training, while the efficiency of the mission is increased and the operational time is reduced.
The presented autonomous navigation system, developed specifically for border security operations, supports three different types of missions:
- Strictly user-defined paths to be executed separately by the UxVs.
- Complete coverage of a polygonal Region of Interest (ROI) over a map, utilizing multiple UxVs.
- Continuous surveillance of an unknown, dynamically changing ROI, utilizing multiple UxVs.
For the first and simplest mission type, the operator/practitioner identifies a set of waypoints for a UxV over a map corresponding to the area of interest.
The module provides high-level controls for the UxVs without the need for special training courses or awareness of technical limitations. Moreover, operating multiple UxVs simultaneously is simplified, and multiple operators are no longer required. This mission type is considered appropriate for objectives in which specific locations must be monitored continuously.
The second type of mission provides the capability of commanding a swarm of UxVs to completely scan a user-defined ROI. Thus, the module is appropriate for covering wide, arbitrarily defined territories, benefiting from the number of UxVs in order to significantly limit the overall execution time of the mission and constrain human interference. In addition, it is suitable for different types of UxVs, requiring just minor adjustments to the mission's parameters according to the UxVs' specifications. The overall mission is reduced to a multi-robot Coverage Path Planning (CPP) [5] problem. Receiving as input a polygon for the ROI, the number of UxVs and a scanning density (the distance between two sequential trajectories), the polygon is represented on a grid optimized for the specific problem, in which each cell is marked as either free space or an obstacle. The entire region is divided into exclusive sub-regions, one for every UxV, with the DARP algorithm [14]. For every sub-region, an independent Spanning Tree Coverage (STC) [8] problem is solved: a Minimum Spanning Tree (MST) [9] is constructed and a path circumnavigating it is outlined. These paths incorporate energy-aware features, making them resource efficient (Fig. 1).
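The grid representation, the division into exclusive sub-regions and the per-region spanning tree can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the chapter: the grid, the robot positions and the function names are hypothetical; cells are assigned to the nearest robot by Manhattan distance, which is only a simple stand-in and not the DARP algorithm; and the tree is built by breadth-first search, which is a valid minimum spanning tree only because all edges between neighbouring cells are given the same weight. The circumnavigating path and the energy-aware features are not reproduced.

```python
from collections import deque

FREE, OBSTACLE = 0, 1


def assign_cells(grid, robots):
    """Map every free cell to the index of the nearest robot (Manhattan distance).

    Stand-in for the area division step; this is NOT the DARP algorithm.
    """
    assignment = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == OBSTACLE:
                continue
            dists = [abs(r - rr) + abs(c - rc) for rr, rc in robots]
            assignment[(r, c)] = dists.index(min(dists))
    return assignment


def spanning_tree(cells, root):
    """Return a child -> parent map of a BFS spanning tree over 4-connected cells.

    With uniform edge weights, any spanning tree is also a minimum one.
    """
    cells = set(cells)
    parent = {root: None}
    queue = deque([root])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in cells and nb not in parent:
                parent[nb] = (r, c)
                queue.append(nb)
    return parent


# Hypothetical 3x4 grid with one obstacle and two robots (row, column).
grid = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
robots = [(0, 0), (2, 3)]

assignment = assign_cells(grid, robots)
trees = []
for idx, start in enumerate(robots):
    own = [cell for cell, owner in assignment.items() if owner == idx]
    trees.append(spanning_tree(own, start))
```

One limitation of this stand-in is that a nearest-robot split can produce disconnected sub-regions when obstacles are present, in which case cells unreachable from the robot's start cell are left out of its tree. Producing connected sub-regions of comparable size is the job of the actual area division algorithm.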
Finally, the third mission type allows the operator to select a region over the map, and the system continuously calculates the optimal monitoring position for every UxV in order to provide complete situation awareness of the region. The morphology of the region may be completely unknown and dynamically changing, while the number of UxVs may similarly be modified even during the mission. The autonomous navigator reallocates the available resources to provide the best possible result and fulfill the overall objective.
The module described above was implemented according to a distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions [15]. This algorithm extracts a sub-cost function individually for each UxV and achieves the overall objective of the swarm by optimizing them jointly. Towards this objective, a distributed methodology based on the cognitive-based adaptive optimization (CAO) algorithm [16] is implemented, which approximates the evolution of each robot's cost function and optimizes its decision variables accordingly. The entire training procedure is performed online, focusing only on problem-specific characteristics that affect the completion of mission objectives. The fast convergence of the algorithm ensures fast adaptation of the swarm to the mission, not only during the first stage but also during modifications of the ROI or of the swarm itself (Fig. 2). As a result, border personnel acting as operators can leverage such systems without requiring specialized training courses.

Visual detection capabilities
Similarly, due to the heterogeneity of the identified threats, systems utilized by border practitioners should exhibit enhanced capabilities in identifying specific objects of interest. Considering also that a deployed surveillance system relies on robotic technologies, navigation systems are closely tied to object detection capacities for completeness in the context of autonomous functionalities.
In principle, an object detection model corresponds to a schema for simultaneous recognition and localization over the projection plane of objects of interest within a visual representation. Therefore, the real objective of object detection is to scan the acquired images for identifying any appearance of objects of interest and localizing the detected instances in the processed images. The localization result corresponds to a bounding box surrounding each object of interest, which can be provided in various formats, for example in upper left and lower right coordinates, center coordinates and width and height of the bounding box etc. (Fig. 3). There are two main categories for visual object detectors: two-step and single-step approaches.
The former perform an additional initial step that scores the objectness of the area included in a bounding box in order to determine the best candidates for objects included in the image. The latter category performs both area selection and label assignment (classification) in the same step. The predominant method belonging to the first category is Faster RCNN [22], and typical examples of the second category are the Single Shot Detector (SSD) [17] and the You Only Look Once (YOLO) detector [21], with the latter having several improved versions. The object detector output is a list of bounding boxes along with their corresponding class labels and confidence scores. A confidence score roughly represents how confident the model is in the label assigned to the bounding box. Object detection as a capacity is considered precise overall; nonetheless, depending on the severity of certain limitations, it can be inefficient. Thus, a typical approach is to combine this functionality with a tracking module in order to monitor the identified objects. A tracker is a module that is provided with an initial bounding box for each detected object and attempts to estimate its motion across a sequence of images or a video stream. In most cases, in systems that require visual identification of specific objects, applying an object tracker is computationally cheaper than continuously feeding an object detector with sequential frames. A typical, yet efficient and fast, tracker relies on Kernelized Correlation Filters (KCF) [12].

Fig. 3. Example result of a visual detector.
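The two bounding-box formats mentioned earlier in this section, and the Intersection-over-Union overlap that is used later in the section to compare a detector box with a tracker box, can be sketched as follows. The function names and the default threshold of 0.5 are assumptions made for illustration; the chapter only states that a fixed threshold is used and does not give its value.

```python
def corners_to_center(box):
    """Convert (x1, y1, x2, y2) corners to (cx, cy, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corners(box):
    """Convert (cx, cy, width, height) to (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection-over-Union of two boxes given in corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def same_object(detector_box, tracker_box, threshold=0.5):
    """Decide whether two boxes cover the same object.

    The threshold value is an assumption, not a value from the chapter.
    """
    return iou(detector_box, tracker_box) >= threshold
```

For instance, two 10 x 10 boxes shifted by half their width overlap on 50 of 150 covered units, giving an IoU of one third, which would fall below the assumed threshold.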
Towards identifying the most efficient object detection model for border surveillance applications, multiple relevant models were deployed and evaluated with respect to both accuracy and execution time. After extensive experiments and evaluations, Faster RCNN [22] yielded the best results for the objects of interest: the objects to be identified are typically small (due to the height and angle of perception), and the model is reported as the most efficient for this objective.
Towards decreasing the overall execution time of the visual identification system, a KCF tracker [12] is applied between two subsequent key-frames. At every key-frame, an object id is assigned to each distinct object in order to uniquely identify its presence. During the tracking frames, which are typically larger in number than the key-frames, the object id remains unchanged. At the next key-frame, the bounding box produced by the object detector and the one produced by the tracker are compared through their Intersection-over-Union against a fixed threshold, in order to estimate whether both bounding boxes enclose the same object. The entire scheme is depicted in Fig. 4.

For the evaluation of the adequacy of the module, the PASCAL VOC evaluation metric was used [6]. The resulting object detection accuracy values are provided in Table 1.

More specifically, ontologies are a means for specifying a vocabulary for conceptualizing and representing a shared domain of discourse [10] in a formal, structured and semantically enriched way. Knowledge in ontologies is modelled via knowledge graphs by defining common components, like classes (objects, concepts and other entities existing in a domain of interest), properties (attributes and relationships that hold between them), axioms (expressed in a logical form) and rules (if-then statements for logical inferences). With the use of semantic reasoners such as FaCT++ [24], Pellet [23] and HermiT [19], logical consequences and new assertions (facts) that are not explicitly expressed in an ontology can be derived.
Ontologies play a key role in facilitating the understanding, sharing and reuse of knowledge between different components within complex systems such as swarm robotics. They have been widely used for situation awareness [7], decision-making [13], IoT infrastructures [4], natural language processing [11] and many more. They demonstrate multiple benefits and capabilities in improved searching, data integration, interoperability, multilinguality and dynamic content generation in an extensive range of areas such as security, healthcare [18], telecommunications, archive portals and law [3]. In the current work, we focus on the semantic representation and enrichment of sensor-based data sourced from different surveillance components (swarm robotics, additional sensors etc.) in order to extract potential threats and alerts in the surveillance area, thereby enhancing the detections derived from the sensors and improving the situation awareness of the end-users.
Therefore, the corresponding service for increased situation awareness is strictly dependent on the application and the described operational scenarios. More specifically, an ontology was developed for the representation and semantic integration of heterogeneous data generated and exchanged across the cooperative surveillance systems. The proposed semantic model is compliant with and extends the EUCISE2020 data model [1], a CISE (Common Information [...] persons. For demonstration purposes, we consider one rather common scenario in maritime surveillance that involves the detection of an oil spill over the sea surface. Whenever an oil spill is detected, an instance of the PollutionIncident class (Fig. 7) is created, which involves an incident of OilSpill and is associated with the respective PollutionType and NatureType instances. Also, an instance of Detection is created (Fig. 8), which is associated with all relevant information populated in the AttachedDocument, Geometry and OperationalAsset classes (the latter being the asset that made the detection) via the appropriate data and object properties, including hasAnalysisDataset, hasStartLocation and hasSource.

Fig. 8. An instance of Detection type associated with an operational asset, a document of reporting and the location of interest.
On the basis of the implemented ontology, semantic reasoning techniques (SPARQL rules and constraints) may additionally be adopted to aggregate data from various sources and to achieve both low-level fusion with external resources (such as geospatial services) and high-level fusion by combining information from geographically dispersed and heterogeneous sensors. This approach facilitates the automatic detection and inference of complex events of interest such as threats, abnormal activities and illegal border trespassing.
In detail, SPARQL is a highly expressive RDF query language that allows us to query the linked data by matching one or more patterns against the relationships of the knowledge base, while supporting features like aggregation, negation, filtering, constraints and property paths. In the case of the oil spill detection, we may infer, for example, whether the total size of the detected oil spill(s) is higher than a specific value in cubic meters (m³), or retrieve the number of entities (persons, vessels) detected close to the oil spill or close to the shore.
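The idea of matching patterns against the knowledge base and aggregating the results can be illustrated without an RDF store. In the sketch below, plain Python tuples stand in for RDF triples and a small `match` helper stands in for a SPARQL triple pattern; a real deployment would use an actual triple store and SPARQL queries. The class names and the properties hasSource, hasStartLocation and hasAnalysisDataset come from the text, although their pairing with target classes is only inferred from the order in which they are listed. The identifiers, the predicates `type`, `involves`, `hasSize` and `closeTo`, the size values and the threshold are all hypothetical.

```python
# (subject, predicate, object) tuples standing in for RDF triples.
triples = [
    ("incident1", "type", "PollutionIncident"),
    ("incident1", "involves", "spill1"),       # hypothetical predicate
    ("spill1", "type", "OilSpill"),
    ("spill1", "hasSize", 120.0),              # hypothetical predicate and value
    ("spill2", "type", "OilSpill"),
    ("spill2", "hasSize", 45.5),
    ("detection1", "type", "Detection"),
    ("detection1", "hasSource", "uav1"),             # OperationalAsset instance
    ("detection1", "hasStartLocation", "geom1"),     # Geometry instance
    ("detection1", "hasAnalysisDataset", "doc1"),    # AttachedDocument instance
    ("vessel7", "type", "Vessel"),
    ("vessel7", "closeTo", "spill1"),          # hypothetical predicate
    ("person3", "type", "Person"),
    ("person3", "closeTo", "spill1"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position of the pattern."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def total_spill_size(triples):
    """Sum the hasSize values of all subjects typed as OilSpill."""
    spills = {s for s, _, _ in match(triples, p="type", o="OilSpill")}
    return sum(o for s, _, o in match(triples, p="hasSize") if s in spills)


def entities_close_to(triples, target):
    """List the subjects related to the target through the closeTo predicate."""
    return sorted(s for s, _, _ in match(triples, p="closeTo", o=target))


SIZE_THRESHOLD = 150.0  # hypothetical alert threshold
alert = total_spill_size(triples) > SIZE_THRESHOLD
nearby = entities_close_to(triples, "spill1")
```

Here the two spill sizes add up to 165.5, which exceeds the assumed threshold, and two entities are reported close to the first spill. In SPARQL the same questions would be expressed with an aggregate (`SUM`, `COUNT`) over the matched patterns combined with a `FILTER` on the result.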

Conclusions
Recent technology advancements are considered to be sufficiently mature for integration in many systems and applications. Even in very complex operational scenarios like border surveillance, cutting-edge technologies can perform adequately. The relevant practitioners can benefit from such systems to improve their operational capabilities. As the challenges that they have to confront are significantly diverse, the surveillance systems they use must integrate specialized capacities.
Towards this objective, swarm robotics can broaden the solutions that are provided to border practitioners. Such systems, enhanced with additional features, can be used effectively to monitor distant territories. In this chapter, three different pillars of services at different levels of implementation were presented towards describing a fully autonomous and operational surveillance system. More specifically, an optimizer for the autonomous navigation of a swarm was presented. The service provides high-level commands to the practitioner in order to mitigate the complexity of operating such systems while retaining their effectiveness in monitoring tasks. In addition, visual recognition of objects of interest can increase the detection capabilities of the overall system, leading to a truly autonomous surveillance framework. Finally, the integration of semantics improves the practitioners' perception of the identified events, increasing the level of situation awareness. These three types of technology have proven particularly efficient in monitoring tasks, leading to optimal surveillance solutions.