The H2020 ANITA platform: generating knowledge about crime through user-centred innovative tools

: This paper illustrates a novel user-centred and secure investigation platform to discover relevant data sources disseminated on the Web (including the Dark Web) and to analyse, enrich and correlate them to support knowledge-generation and investigations on illegal trafficking activities. The platform was designed and implemented in the context of the Horizon 2020 project ANITA (https://www.anita-project.eu/) to effectively support Law Enforcement Agencies in better understanding and investigating the online illegal trafficking framework and further improve their capacity to analyse complex criminal scenarios and big amount of relevant data. The paper illustrates the ANITA platform capabilities through some of the ‘horizontal criminal scenarios’ defined by the Project to explore connections between illegal trafficking activities concerning different goods. These ‘scenarios’ refer to cases of the same vendor distributing both firearms and synthetic drugs on the same or in different crypto markets, the same profiles found to be active on both some Dark Web crypto markets and Surface platforms or terrorist attacks performed with firearms bought on the Dark Web.


I. INTRODUCTION
Over the recent years, online illegal trafficking activities have hugely elaborated and expanded. The illegal trade of drugs and firearms represents a serious international threat that involves countless unscrupulous criminal networks. The production, trafficking and distribution of illegal drugs and firearms are even more interrelated and key enablers for other criminal activities and violent acts, including terrorist attacks [1]. Law Enforcement Agencies (LEAs) have been invested both on new technologies and in training activities to equip officers and practitioners with the necessary tools and professional skills for fighting the rapid increasing of the illegal trafficking phenomenon. To efficiently understand the organisational structure, the criminal behaviours and dynamics and their interconnections and interactions, it is of vital importance to collect and analyse all relevant online available clues in near real-time and to correlate them with closed-source information provided by the LEAs, in order to identify hidden relationships, recurrent strategies and emerging patterns in support of illegal activities. Among the most emblematic and widespread aspects of online illegal trafficking are the cases of counterfeit/falsified medicines, drugs, and Novel Psychoactive Substances (NPS), weapons and firearms, and terrorism funding. Surface Web, Deep Web and Dark Nets can be regarded as key crime-facilitators in this context [2]. They offer new opportunities to the organised criminal groups, assisting them in: a) cooperating in a more effective, efficient, anonymous and secure way, although dislocated in diverse countries (e.g., through the usage of P2P or Tor networks); b) to manage criminal activities, with anonymous online payments, through crypto-currencies (e.g., Bitcoin); and c) to enlarge the crime-asa-service activities and the related black markets (e.g., Tochka, Agartha market [3]) in the Dark Web. Scope of this article is to present the H2020 ANITA Project's platform, which aims to provide LEAs with an integrated platform equipped with tools and services to detect, investigate and mitigate online illegal trafficking activities, especially in black markets, through the automated monitoring of Surface and Dark Web source, analysis and correlation of acquired information, and generation of new knowledge to propose to end users to support investigative activities, while maintaining the chain of custody on acquired contents throughout their journey in the platform, from their acquisition to their export. Based on ontological inference and reasoning mechanisms, the process of knowledge generation and management starts from acquired information and is conducted through a continuous interaction with the user, which is the final recipient of such knowledge. Indeed, generated knowledge is proposed to the user under the form of "suggestions", which the user can validate or not. If the user validates the proposed suggestion, it becomes validated knowledge, that can be further reused by the platform to generate and propose new suggestions to the user. For this reason, ANITA platform can be considered knowledgecentred and user-oriented and the quality level of the knowledge produced depends on the continuous interaction between the user's contribution and the platform capabilities. The rest of the paper is organized as follow: Section II provides an overview of state-of-the-art frameworks and platforms currently used to support LEAs activities in combating online illegal activities. Then, the ANITA platform is illustrated in Section III. Operational scenarios in which ANITA has been tested to support LEAs are depicted in Section IV. Conclusions are reported in Section V.

ILLEGAL ACTIVITIES
Criminal networks apply constantly changing operational models related to online illicit trafficking to get in contact with victims, anonymise their identities, promote their products on online markets, share information both in clear (videos, handbooks, etc.) and encrypted (e.g., PGP-encrypted emails) way, and manage fake social profiles to get financing. These criminal phenomena increase the exigency of solutions that enable LEAs and security organisations to fight them in the most rapid and accurate way. Some research approaches are emerging for the identification of suspected web sites and markets in the Dark web [4] [5], sometimes linked with Surface web sites to gain more visibility and attract more sellers [6]. However, the maturity level is usually low, most of them only proofs on concept, and need additional refinements to be used in operational environments. More comprehensive platforms have been also built and are continuously updated and revised according to these needs. Such solutions usually comprehend one or more applications dedicated to information collection, curation and exploration. Paterva Maltego [7] is a software for open-source intelligence and forensics, discovery of data from open sources, and visualisation of information in a graph format, suitable for link analysis and data mining. The focus of the application is analysing real-world relationships between people, groups, websites, domains, networks, internet infrastructure, and affiliations with online services exploiting open-source intelligence (OSINT) techniques (DNS records, whois records, search engines, social networks, various online APIs). Maltego does not provide integrated features for multimedia analysis, crawling on Deep web and Dark nets, visual analytics. IBM i2 Intelligence Analysis Portfolio [8] consists in a suite of products for intelligence and investigative operations, like managing information on people, groups, companies, websites, vehicles, etc., and their relationships through graph 1 The project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement n° 787061 and topic SEC-12-FCT-2016-2017 visualization, generating alerts whenever an item changes, creating customised charts, and performing data mining. However, it does not support source crawling and monitoring, nor multimedia analysis and chain of custody. Palantir Technologies [9] offers a general-purpose suite of software tools for integrating, visualising and analysing information; some applications in the law enforcement have been built by means of this suite. It does not offer integrated tools for online information acquisition, or video analysis. SAS Intelligence and Investigation Management [10] provides data management, advanced analytics and operational solutions for gathering and managing intelligence, conducting complex investigations, and improving investigative processes and workflows. It does not include monitoring capabilities and multimedia analysis. Additional software solutions, more or less similar to the previous ones, are also available in the market of investigation management. However, all these solutions, due to their commercial nature (and then born to be acquired by a broader set of customers), can address generic tasks related to intelligence and investigative activities, without providing relevant support in specific contexts, like for instance illicit trafficking domain. ANITA platform aims to be complementary, focusing on real-time analysis of OSINT as well as deep and dark web sources, providing an integrated platform to LEAs and potential stakeholders. Additionally, ANITA focuses specifically on monitoring, analysing, and detecting dangerous activity considering knowledge of the illegal trafficking domain with all its peculiarities, and therefore respond the requirements of LEAs in this field. Furthermore, its multimodal analysis capability that includes cryptocurrency transactions allows a complete and precise investigative reconstruction usable for prosecution purposes thanks to a lawful chain of custody.
III. ANITA PLATFORM ANITA stands for Advanced tools for fighting oNline Illegal TrAfficking and aims to provide Law Enforcement Agencies with an integrated platform equipped with tools to boost investigative work to combat illicit trafficking activities. The project has received funding from the European Union's Horizon 2020 programme 1 and focuses on three main use cases related to drugs, weapons, and terrorism financing, and are used to validate the capabilities and potentialities of the platform. ANITA provides officers with a set of interactive applications to support them in the discovery of illegal trafficking activities conducted online, specifically for searching and monitoring over time online sources of interest (both on Surface and Dark web), extracting information from multimedia (documents, videos, images and audio files) contents through advanced and context-oriented analysis tools, generating new knowledge under the form of suggestions to propose to users, based on previously collected information and fresh analysis results, visualizing and exploring analytics on current events and (Technologies for prevention, investigation, and mitigation in the context of fight against crime and terrorism).
trends. It assumes the continuous interaction with users, that is responsible for seeing and validating the proposed suggestions. Once validated, the suggestions become available knowledge to be reused to generate further suggestions (Fig. 1).

Fig. 1: Knowledge validation process
The platform is case-oriented: users can operate inside dedicated workspaces, which can be assigned with specific permissions. Access to the acquired information is provided in a user-friendly and meaningful way through graph-oriented visualization and exploration patterns (Fig. 2). Finally, users can share and use information externally to the platform by exporting collected data and the related chain of custody, under the form of documental report (for prosecution purposes) or machine-readable package (to be read by external software systems or other ANITA installations). Multiple tasks running in background can be started to monitor online sources. These monitoring processes leverage on embedded crawlers able to download and reconstruct the navigability of crawled sites, enabling users to browse offline those sites. Contents can also be searched online as well as uploaded by users inside the platform. Data acquisition is performed by SONAR (Source iNtegration And monitoring fRamework), which is a framework designed and developed by Engineering Ingegneria Informatica SpA that consists in several crawlers/scrapers integrated as third-party services and managed by a single node (the SONAR core) in order to separate the crawlers' layer. This way, each crawler can have its specific custom crawling process for a specific source, and it can even be developed and coded with different languages. The advantage of this layer separation is the possibility to add or remove crawlers on the fly, creating, in fact, a plug-n-play framework. Moreover, a strong benefit of the framework is the implementation of unified data models for requests and data collected, making it a middleware between the ANITA platform and the internet. Whenever a user performs a search or a source monitoring request in ANITA, the platform contacts the SONAR core with the same request data model, then the core routes the request to crawlers that can fulfil it. Data collected by crawlers are very heterogeneous since they come from different internet sources (e.g., forums, social media, blogs, Dark web pages), and they can be retrieved in different ways (e.g., traditional scraping, API, third party services), so each crawler performs an homogeneity process, mapping the collected data in a single and unified data model as general as possible to describe data coming from most of the internet sources. Search and monitoring are the two possible requests allowed: the former is a simple "one-shot" search task, while the latter represents a search over time, very useful when users want to track a specific source and all its changes by collecting different versions of the monitored source. Moreover, SONAR has a granular management of tasks with the opportunity to edit, delete, suspend, resume, and check running tasks. Once acquired, contents are automatically analysed by integrated tools able to extract information on vendors, products, locations, organizations, and other items of interest (like phone numbers, crypto addresses, email addresses, pills, medicines, rifles, guns, laboratories, 3D printers, but also stylometric writeprints over anonymous texts). Extracted information is then correlated to that already stored inside the platform: result of these correlations consists in a knowledge graph, which supports users in connecting-the-dots about a case. The process of knowledge enrichment, useful to reconstruct criminal activities, consists in the exploitation of reasoning mechanisms applied to the information stored in the knowledge graph for the generation of suggestions that are proposed to the user and that can open new avenues for the resolution of the case. A suggestion is a "virtual" sub-graph that connects entities already present in the knowledge graph (Fig.  3). Such a sub-graph is virtual because it is not (yet) part of the investigation, it does not represent real knowledge and needs to be validated by the user before being added to the knowledge graph. Since it is a sub-graph, a suggestion consists of one or more relationships and nodes: the simplest form of a suggestion is a single relationship, linking two nodes in the knowledge graph. Produced suggestions are proposed as grey parts of the knowledge graph, waiting for users to validate them. A user can decide to validate or not a suggestion, as well as to leave it as is in the knowledge graph and decide on it later. This is because in an investigation the collection of the required information to validate a suggestion could come at different time frames. Suggestions are produced on the basis of a) similarity metrics between nodes, b) matches between information stored in the knowledge graph and crawled contents, c) ontological inference and topological patterns identified on the basis of how nodes are interlinked between them. For the latter case, a dedicated ontology has been realised, focused on illegal trafficking domain and events of interest. The ontology includes entities managed inside the knowledge graph (Fig. 4) [11] and on the marketing and use of explosives precursors [12] currently in force. Starting from this ontology, it was possible to set ontological rules to identify patterns of events and entities in the knowledge graph and derive more complex illegal trafficking-related activities from them. A knowledge pattern (KP) is the topological definition of a sub-graph in the knowledge graph that can be relevant to the domain of application. Generally, a sub-graph contains more information than the sum of single entities that compose it, since it also involves the relationships (with types and metadata) existing among these entities. This makes KPs suitable to represent complex information that exists among one or more entities and that would not be represented with the same level of accuracy by considering each single entity at time. By defining domain-oriented KPs and including a set of rules based on the temporal, spatial and causal relationships between events, it has been possible to recognize high-level illegal activities starting from low-level information about resources, people, digital identities, groups, organizations, objects, events and their relationships (Fig. 6). KPs that represent complex illegal activities have then a wellknown topology. Starting from knowledge of these expected KP topologies, it is also possible to identify incomplete KPs. Incomplete patterns can be useful to alert users about potential suspicious activities that can occur and to suggest users which kind of missing information to look for to complete a pattern. In the former case, users can prevent that an illicit activity occurs. In the latter case, in case of successful pattern completion, users can verify that an illegal activity has really occurred. This way, the process can guide not only in "recognizing", but also in "tracking" emerging illegal activities. Once the user validates a suggestion, nodes and relationships belonging to that suggestion are stored in the knowledge graph, ready to be exploited for further tasks. That way, both the user and the platform take advantages from this iterative and interactive approach: the user receives support from the platform in discovering hidden links and considering additional investigative pathways and strategies, while the platform benefits from user validation feedback to minimize the risk of false positives presence, thus maintaining only high-quality information in the knowledge graph.
IV. OPERATIONAL SCENARIOS ANITA platform has been tested by users during two pilot sessions planned for the project. Two realistic scenarios were prepared which combine the project use cases: drugs trafficking, weapons trafficking, and terrorist funding. The scenarios include actors that are active both in Dark web markets and on the Surface web. The collection of information, their analysis and understanding -with the support of ANITA -can provide useful insights for both monitoring and investigative purposes. The added value is represented by the capacity to connect-the-dots, experiment with different paths of analysis and validate diverse hypothesis, through a user-centred approach. The first scenario focuses on a man, arrested because in possess of cocaine bought on a specific black market. Police want to know more on who is behind this illegal business. First steps that are performed by a user through ANITA is starting a monitoring task on the monitored market and uploading the suspicious emails found in the laptop of the arrested man. ANITA starts retrieving contents from the selected market, as well as analysing uploaded emails, from which information is extracted, added in the knowledge graph, and compared to that collected from Agartha market. A match between an email address and a vendor nickname is found and proposed to the user as suggestion, as well as the same vendor and the cocaine product (Fig. 7). Through the continuous interaction between the system, through the generation of suggestions, and the user, through the validation of proposed suggestion, it is found that the vendor has also left traces on Reddit, as well as on Surface web sites, where other email addresses are found, leading to the hypothesis that a criminal network, instead of single individual, could be behind the selling of cocaine (and possibly other illegal products). The second scenario is about a person stopped by police during a daily roadblock. Police found a rifle inside his car. During the interrogation, the car driver admitted having bought the rifle on Darkmarket. The police want to gather more information -and possibly evidence -about vendors selling this type of rifle, so to better assess if the business is done legally or not. From the monitoring of Darkmarket, a vendor is found to sell the same rifle found in the car, but also other kinds of weapons, fake identity cards, drive licences and credit cards (Fig. 8). Then, the user starts searching through ANITA more information on the vendor, finding that two web markets on the Surface web are active, registered under two different aliases (one of them equal to the vendor nickname), but with "About us" page with identical description and with one equal phone number (multiple numbers are present in the Contact page of both markets). Then, it is found that one of the other phone numbers is referring to a Pinterest account collecting pins related to identity cards of different nationality. This leads the user to the hypothesis that the vendor could be linked to other types of illegal activities and continue to investigate on that with the support of ANITA. ANITA enables to demonstrate in court the entire investigative reconstruction thanks to its own lawful chain of custody and evidence (CoC/CoE). The need of embedding a CoC/CoE mechanism into ANITA system originates from the danger that stored data could be accessed and manipulated in unauthorised and malicious way to compromise current investigations. Data stored into the ANITA system by the CoC/CoE can be divided in two main categories: multimedia resources and metadata generated manually or added after validation of analysis results. Chain of evidence consists in the calculation of a digital mark that uniquely identifies a resource every time it is modified. Its hash value is continuously recalculated and compared to the known one, to verify its integrity. Whenever a hash value of a resource does not correspond to that computed at acquisition time, this means that the resource has been corrupted in unauthorized way. Integrity of resources forensically marked in this way can be proved and ensured through comparison of hash values. Instead, chain of custody must be maintained for both resources and metadata. Chain of custody refers to mechanisms for logging activities that have been performed by a user or the system over stored information. Chain of custody, together with chain of evidence in case of resources, enables the forensic reuse of information stored into ANITA system. Functionalities to handle the evidence export have been realised to allow users to export information of the investigative reconstruction. The output of the export services consists in an archive package containing metadata of the investigation to which the entities being exported belong, metadata of those entities, potential relationships among them tracked into ANITA, and the chain of custody (Fig. 9). The ANITA export package format was realised according to the two assumptions: • any LEA could have its own ANITA instance running on their premises, most likely running within a secured networking infrastructure directly managed by the IT team of that LEA; • because of different legislations and procedures in force into any nation, it would be advisable that the sharing of information is managed at organizational level: organisations could agree through signed documents the sharing of information.
Due to these assumptions, the solution adopted to enable information sharing capabilities between two physically separated ANITA instances was to include in the system complementary export and import services to be used by end users whenever needed (Fig. 10). ANITA platform has been realized to support LEAs' activities in combating illegal trafficking activities, as well as related crimes with different nature but somehow linked. Through a knowledge-based and user-centred approach, it was shown how the users and the platform itself can benefit from each other in investigative cases. Their continuous interaction enables to connect-the-dots in faster and higher-quality way, while always keeping in the loop the human factor in the validation of suggestions proposed by the platform.