Detection of Terrorism-related Twitter Communities using Centrality Scores

Social media are widely used among terrorists to communicate and disseminate their activities. User-to-user interaction (e.g. mentions, follows) leads to the formation of complex networks, with topology that reveals key-players and key-communities in the terrorism domain. Both the administrators of social media platforms and Law Enforcement Agencies seek to identify not only single users but groups of terrorism-related users so that they can reduce the impact of their information exchange efforts. To this end, we propose a novel framework that combines community detection with key-player identification to retrieve communities of terrorism-related social media users. Experiments show that most of the members of each retrieved key-community are already suspended by Twitter, violating its terms, and are hence associated with terrorism-oriented content with high probability.


INTRODUCTION
The rapid growth of the Internet has resulted in modern forms of communication and exchange of information, realized mainly through social media networking platforms (e.g. Twitter, Facebook), which have dominated the online world during the past few years. Social media networks have made communication possible among people across nationalities, religions, cultures or residences; however, their great power and reach have also made them attractive to terrorist and extremist organizations for disseminating propaganda, recruiting and radicalizing new members, raising funds, organizing operations, and publishing information and instructions exploited by lone-wolf terrorists when preparing and committing acts of terror [27][28][29].
Because it permits the inexpensive communication of multimedia messages (i.e. tweets) to users worldwide, Twitter has been used by such organizations primarily for promoting and spreading their propaganda, typically using a top-down approach, with a core group of members spreading the group's messages, which are then re-shared by other affiliated accounts. Both the administrators of the social media networking platform itself (Twitter), on the one hand, and the Law Enforcement Agencies (LEAs), on the other, are interested in monitoring terrorism-related activities taking place through the platform. In the former case, the goal is to detect material that violates the platform's terms and conditions regarding extremist content, while in the latter case such information may be very useful in investigations for prosecuting the perpetrators of terrorist attacks. In both cases, it is of vital significance to detect the communities in the social networks and their most prominent users (i.e. key players) who disseminate terrorism-related information, so as to prevent terrorist groups from spreading their propaganda (to the extent possible) by shutting down accounts that are found to play a central role in this information exchange.
Over the past two decades, several research efforts have discussed the network structure of terrorist organizations. One of the early efforts examined the network structure of the 9/11 hijackers along with their accomplices and detected the ring leaders of the terrorist attacks based on their social associations [15]. Later work focused on using social network analysis for examining the basic characteristics of terrorist groups or organizations [26]. More recent research has examined the survival mechanisms of the Global Salafi Jihad (GSJ) terrorist network, even after being severely damaged by the authorities, by analyzing its network structure and topology [30]. In addition, several works have been conducted for studying the use of social media, and especially Twitter, by terrorist organizations. Specifically, a work has examined the significant role of Twitter in facilitating terrorists to execute their attack in Mumbai (November 2008), by monitoring and exploiting situational information which was broadcast through Twitter [19]. More recent research has studied the Islamic State's (IS) strategy for communicating their propaganda for radicalizing and recruiting Twitter users [6]. Furthermore, the significant role played by feeder accounts of terrorist organizations for exchanging information from the Syria insurgency zone is pointed out in [14]. Key player identification in complex networks, on the other hand, has been mainly addressed through the use of different centrality measures; e.g. recent work [10] has used several centrality measures to rank terrorism-related Twitter accounts based on their location in the network and the topology of the network of user-to-user mentions.
This work aims at identifying groups of terrorism-related users exchanging information through social media platforms by detecting the key players of a social media network and the interrelated communities of users interacting with them. To this end, we extend the approach of [10] and propose a hybrid framework which first retrieves the key network players and then enriches the retrieved results by adding the members of a user's detected community based on the combination of centrality scores with community detection algorithms. These centrality measures, which aim to identify key-players in the terrorism domain, are estimated on social media networks based on user mentions and are compared with other popularity measures (i.e. number of followers, number of friends) used for identifying very important users within the structure of these networks. This work also presents a case study on a social media network formed by Twitter accounts based on a set of terrorism-related Arabic keywords provided by LEAs and domain experts, for demonstrating the performance of our proposed framework based on evidence related to the suspension of the majority of the retrieved Twitter accounts.

KEY TERRORISM COMMUNITY DETECTION FRAMEWORK
In this work, entropy-based centrality measures are first exploited to retrieve a list of key-players, and a community detection algorithm is then used to enrich the initial set of results. Our framework is presented in Figure 1, where keyword-based search provides a set of social media posts. Based on these posts, a network of mentions is created, using the user-to-user interactions they contain. In the resulting network, each user is represented by a node, and a link between two users $(i, k)$ exists if user $n_i$ mentions or is mentioned by user $n_k$. We first use entropy-based centralities to identify key-players [10] and then extend the method by associating key-players with their community.
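The construction of the network of mentions can be illustrated with a minimal sketch. It assumes that each post is available as an (author, text) pair and that mentions appear as `@handle` tokens; the function name `build_mention_network`, the regular expression, the choice to ignore self-mentions and the choice to keep authors without mentions as isolated nodes are all illustrative assumptions, not details taken from the framework itself.

```python
import re
from collections import defaultdict

# Assumption: a mention is an "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")

def build_mention_network(posts):
    """Return an undirected, unweighted adjacency map: user -> set of neighbours."""
    adj = defaultdict(set)
    for author, text in posts:
        author = author.lower()
        adj[author]  # touch the key so authors without mentions still become nodes
        for mentioned in MENTION.findall(text):
            mentioned = mentioned.lower()
            if mentioned != author:  # assumption: self-mentions are ignored
                adj[author].add(mentioned)
                adj[mentioned].add(author)
    return dict(adj)

# Hypothetical toy posts, for illustration only.
posts = [
    ("alice", "news via @bob and @carol"),
    ("bob", "replying to @alice"),
    ("dave", "no mentions here"),
]
network = build_mention_network(posts)
```

Because a link exists whenever one user mentions or is mentioned by another, the direction of the mention is dropped and repeated mentions collapse into a single link.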

Centrality-based key player identification
We denote by $G(N, L)$ the network of mentions with $N$ nodes (user accounts) and $L$ links. The network is unweighted and undirected, capturing only the user-to-user interactions in Twitter or any other social media domain. The degree of a node $n_k$ is denoted by $\deg(n_k)$ and is equal to the number of its adjacent links. The degree is normalized to define the degree centrality as follows [9]:

$$DC_k = \frac{\deg(n_k)}{N - 1} \qquad (1)$$

The degree simply counts the number of neighbors of a node and is not affected by the position of a hub in the network. In contrast, the betweenness centrality [9] of a node $n_k$ is based on the number of shortest paths $g_{ij}(n_k)$ from node $n_i$ to node $n_j$ that pass through node $n_k$, divided by the number of all shortest paths $g_{ij}$ from node $n_i$ to node $n_j$, summed over all pairs of nodes $(n_i, n_j)$ and normalized by its maximum value:

$$BC_k = \frac{2}{(N - 1)(N - 2)} \sum_{i < j,\; i \neq k \neq j} \frac{g_{ij}(n_k)}{g_{ij}} \qquad (2)$$

Nodes with high betweenness centrality are very important for the communication in a network [1], because their removal strongly affects the network connectivity and robustness. Other centrality measures have also been proposed, based on the mutual distances of all nodes (closeness centrality) [9], on the influence of a node (eigenvector centrality) [4], or motivated by the importance of a Web page (PageRank) [5].
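As a rough illustration of the two basic measures, the sketch below computes the degree centrality (degree divided by $N - 1$) and the betweenness centrality on a small adjacency map. Betweenness is computed with Brandes' algorithm, which is a standard technique chosen here for convenience and is not stated in the text; the normalization by $(N - 1)(N - 2)/2$, the maximum for an undirected network, is likewise an assumption. The four-node path graph is a toy example.

```python
from collections import deque

def degree_centrality(adj):
    """DC_k = deg(n_k) / (N - 1) for an undirected adjacency map."""
    n = len(adj)
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}

def betweenness_centrality(adj):
    """Shortest-path betweenness (Brandes), normalised to [0, 1]. Needs N >= 3."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair was visited from both endpoints, so the raw sum is
    # twice the pair count; dividing by (N-1)(N-2) halves it and normalises.
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}

# Toy path graph a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
dc = degree_centrality(adj)
bc = betweenness_centrality(adj)
```

In this toy graph the inner nodes `b` and `c` each lie on two of the three shortest paths that do not start or end at them, so both obtain a betweenness of 2/3, while the end nodes obtain 0.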
In the context of this work, we propose the use of entropy-based centrality measures, such as the Mapping Entropy (ME) and the Mapping Entropy Betweenness (MEB), which also take into account the neighborhood $N(n_k)$ of a node $n_k$. Mapping Entropy centrality [18] is defined as a function of the degree centrality:

$$ME_k = -DC_k \sum_{n_i \in N(n_k)} \log DC_i \qquad (3)$$

whereas Mapping Entropy Betweenness centrality [10] is defined as a function of betweenness centrality:

$$MEB_k = -BC_k \sum_{n_i \in N(n_k)} \log BC_i \qquad (4)$$

Intuitively, to interpret Equations (3) and (4), one may think of a random walker on the network, standing at node $n_k$, who picks his/her next step with probability $DC_i$ ($BC_i$). Then, the weight $-\log DC_i$ ($-\log BC_i$) is interpreted as the Shannon information of the event that the random walker picked node $n_i$, and is summed over all neighbors of node $n_k$. These two measures consider the information that is communicated through nodes that act as a hub (bridge), i.e. those with high values of degree (betweenness) centrality between any two members. In particular, the MEB centrality considers the betweenness centrality of a node and exploits local information from its neighborhood; hence, high MEB values indicate that a particular node can act as a bridge for disseminating information, even if its degree centrality is low [22].
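A short sketch may help make the entropy-based scores concrete. It assumes the form described above: the centrality of a node multiplied by the sum of $-\log$ of the centralities of its neighbors, with degree centrality giving ME and betweenness centrality giving MEB. The helper name `mapping_entropy` is illustrative, and skipping neighbors whose centrality is zero (for which the logarithm is undefined) is an assumption of this sketch rather than something the text specifies. The centrality values are hard-coded for a toy path graph a - b - c - d.

```python
import math

def mapping_entropy(centrality, adj):
    """Score_k = -C_k * sum over neighbours i of log(C_i).

    Pass degree centralities to obtain ME, betweenness centralities for MEB.
    """
    scores = {}
    for k, neighbours in adj.items():
        # Assumption: neighbours with zero centrality are skipped, as log(0)
        # is undefined.
        total = sum(math.log(centrality[i]) for i in neighbours if centrality[i] > 0)
        scores[k] = -centrality[k] * total
    return scores

# Toy path graph a - b - c - d with its degree and betweenness centralities.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
dc = {"a": 1 / 3, "b": 2 / 3, "c": 2 / 3, "d": 1 / 3}
bc = {"a": 0.0, "b": 2 / 3, "c": 2 / 3, "d": 0.0}

me = mapping_entropy(dc, adj)   # Mapping Entropy
meb = mapping_entropy(bc, adj)  # Mapping Entropy Betweenness
ranking = sorted(meb, key=meb.get, reverse=True)
```

Sorting users by decreasing score gives the key-player ranking; in the toy graph the two inner nodes rank above the end nodes under both measures.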
In the following, we combine the key-player identification methods with community detection approaches that are able to cluster the network into communities of densely connected user accounts.

Community detection around key players
In parallel to the key-player identification, a community detection algorithm is used to divide the network into groups of users (communities). The top-ranked key-player is then used to enrich the retrieved results by retrieving the community to which the key-player belongs.

[Figure 1: Key terrorism-related community detection on the network of Twitter mentions. Diagram labels: Mapping Entropy Betweenness, Community Detection.]
Community detection in complex networks aims to identify groups of nodes that are more densely connected to each other within a group than to the rest of the network outside of the group [20]. The groups are communities of users in the social media domain, sharing a common property or playing similar roles within the network [8]. Community structure is very popular in many fields, including sociology and biology [12], as well as computer science [17], and in any domain where systems or items admit a network representation. Detecting communities in complex networks is often viewed as a graph partitioning problem, where all nodes are assigned to a community, but density-based approaches leave out noise, i.e. do not assign all nodes to communities. In our experiments, we shall present and compare both approaches.
Several community detection algorithms have been proposed (e.g. [2,8,12,13,16,21,23,25]). The network is partitioned into communities using either the maximization of modularity [2,17], the minimization of codelength [24] or density-based approaches [11]. We present in the experiments the key-community, defined as the community that the key-player belongs to, as provided by the algorithms FastGreedy [7], Walktrap [21], Infomap [3,24,25], Louvain [2] and DBSCAN*-Martingale [11]. The most popular methods are those aiming at the maximization of modularity, defined as [7]:

$$Q = \sum_{i=1}^{c} \left( e_{ii} - a_i^2 \right) \qquad (5)$$

where $e_{ij}$ is the fraction of links between a node in community $i$ and a node in community $j$, $a_i = \sum_j e_{ij}$ is the fraction of link ends attached to members of community $i$, both fractions are taken with respect to $m = \sum_k \deg(n_k)$, and $c$ is the number of communities. We adopt the modularity maximization community detection approach as a fast and scalable approach that admits hierarchical and iterative methods [2,20] to maximize the objective function of Equation (5). Assuming the key-player is a member of the $k$-th community, our framework returns all its members $n_{k_1}, n_{k_2}, \ldots, n_{k_l}$, all of which are marked as the final list of accounts with suspicious activity.
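The following sketch illustrates the two ingredients of this step under stated assumptions: evaluating the modularity of a given partition, and returning the key-community of a key-player. It assumes the modularity takes the form $Q = \sum_i (e_{ii} - a_i^2)$ of [7], with fractions taken over the sum of degrees. It does not implement any of the cited detection algorithms; the partition is supplied by hand, and the names `modularity` and `key_community` are illustrative.

```python
from collections import defaultdict

def modularity(adj, community_of):
    """Q = sum_i (e_ii - a_i^2), with fractions taken over m = sum of degrees."""
    m = sum(len(neighbours) for neighbours in adj.values())
    e_in = defaultdict(float)  # fraction of link ends on links inside community i
    a = defaultdict(float)     # fraction of link ends attached to community i
    for u, neighbours in adj.items():
        cu = community_of[u]
        a[cu] += len(neighbours) / m
        for v in neighbours:
            if community_of[v] == cu:
                e_in[cu] += 1 / m
    return sum(e_in[c] - a[c] ** 2 for c in a)

def key_community(community_of, key_player):
    """All members of the community that contains the key-player."""
    target = community_of[key_player]
    return sorted(n for n, c in community_of.items() if c == target)

# Toy network: two triangles (1, 2, 3) and (4, 5, 6) joined by the link 3 - 4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
community_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

q = modularity(adj, community_of)
members = key_community(community_of, key_player=4)
```

For this partition each triangle holds 6 of the 14 link ends internally and 7 of the 14 link ends overall, so $Q = 2\,(6/14 - 1/4) = 5/14$.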

EXPERIMENTS
We evaluate our framework on a network of terrorism-related Twitter accounts formed from user mentions. As ground truth we use information from Twitter, which marks user accounts as suspended; the suspension process is applied when an account violates Twitter rules by exhibiting abusive behavior, including posting content related to violent threats and hate speech (Twitter suspended 360,000 terrorism-related accounts from mid-2015 until August 2016). Our data were collected by executing queries on the Twitter API based on a set of five Arabic keywords related to terrorist propaganda. These keywords were provided by LEAs and domain experts and are related to the Caliphate State, its news, publications, and photos from the Caliphate area. The collected dataset consists of 9,528 Twitter posts by 4,400 users. The top-100 user accounts are retrieved in the key-player identification step using the ranking methods of Table 1 and are then combined with the community detection approaches of Table 2. The evaluation is performed by assessing whether these accounts are suspended, active or no longer exist (i.e. accounts which have been temporarily or permanently deactivated).
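The evaluation of a ranking method can be sketched as follows: take the top-k accounts by score and report the position of the first suspended one. The scores, account names and status labels below are hypothetical toy data, and breaking ties by account name is an assumption made only to keep the ordering deterministic.

```python
def top_k(scores, k):
    """Accounts sorted by decreasing score; ties broken by name (assumption)."""
    return sorted(scores, key=lambda user: (-scores[user], user))[:k]

def first_rank_with_status(ranking, status, wanted="suspended"):
    """1-based position of the first account with the wanted status, or None."""
    for position, user in enumerate(ranking, start=1):
        if status.get(user) == wanted:
            return position
    return None

# Hypothetical toy data: centrality scores and account status labels.
scores = {"u1": 0.9, "u2": 0.7, "u3": 0.7, "u4": 0.1}
status = {"u1": "active", "u2": "not_found", "u3": "suspended", "u4": "suspended"}

ranking = top_k(scores, k=3)
first_suspended = first_rank_with_status(ranking, status)
```

A method that finds no suspended account within its top-k list yields `None`, which corresponds to the case where no suspended user appears in the top-100 positions.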
The first part of our framework evaluates several centrality measures, including the proposed Mapping Entropy and Mapping Entropy Betweenness, as well as popularity measures, such as the number of friends and followers, in terms of their ability to retrieve suspended users. The results in Table 1 indicate that the entropy-based centralities ME and MEB are able to retrieve the first suspended user at position 16, while PageRank follows at position 19. Other centrality and popularity measures, such as closeness, eigenvector and number of followers, do not find any suspended user in the top-100 positions of their retrieved users. We observe that the network is very spread out, with many bridges and a diameter equal to 27, so key-players are expected to be positioned in between many pairs of nodes in the network, exploiting also their neighborhood's high betweenness centrality. The $K$-th order neighborhood $N_K$ of node $n_j$ is the set of all nodes that are reachable from $n_j$ within $K - 1$ intermediate nodes:

$$N_K(n_j) = \{ n_i : d(n_i, n_j) \leq K \} \qquad (6)$$

where $d(n_i, n_j)$ is the network distance of any two nodes. In Figure 2 we show the first ($K = 1$), second ($K = 2$), third ($K = 3$), fifth ($K = 5$) and tenth ($K = 10$) order neighborhoods of the first suspended user and the largest connected component. Although the ME and MEB centralities both retrieve a suspended user at rank 16, the user does not correspond to the same Twitter account. In fact, the Twitter user at the 16th position of ME centrality leads to a disconnected component of two users, where one of them is suspended and the other is not. However, the neighborhood of the suspended user (Figure 2) from the MEB centrality is part of the largest connected component of the network with 1,334 accounts. Therefore, we proceed to the next step by considering the MEB centrality measure and not ME.
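The K-th order neighborhood can be obtained with a breadth-first search that stops expanding once distance K is reached. The sketch below assumes the neighborhood is the set of nodes at network distance at most K, which includes the starting node itself at distance 0; whether the node itself is counted is an assumption of this sketch. The function name and the toy path graph are illustrative.

```python
from collections import deque

def kth_order_neighbourhood(adj, source, k):
    """Nodes within network distance k of source (source included, distance 0)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # nodes at distance k are kept but not expanded further
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

# Toy path graph a - b - c - d - e.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}

first_order = kth_order_neighbourhood(adj, "a", 1)
second_order = kth_order_neighbourhood(adj, "a", 2)
whole_component = kth_order_neighbourhood(adj, "a", 10)
```

Once K is at least the largest distance from the starting node, the neighborhood coincides with that node's connected component, which is how a user leading to a two-node disconnected component can be told apart from one inside the largest connected component.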
Given the first identified suspended user in the MEB ranking, we explore the community to which the user belongs. The results are reported in Table 2, along with the community size per community detection method. We observe that in all cases examined, the majority of accounts are already suspended and some of them no longer exist. In particular, the modularity maximization methods (FastGreedy, Louvain) are able to retrieve the largest communities and thus more accounts with potentially illegal activity. The percentage of suspended users is 82.76% for the modularity maximization approaches and 78% for Walktrap and DBSCAN*-Martingale, indicating a marginal advantage for the former. The community provided by Infomap is very small compared to the other communities, but still the share of active accounts (not yet suspended) is only 20%. Figure 3 depicts sample content from such active accounts that have not been marked as suspended by Twitter. One may note that their content is military-themed, indicating potentially suspicious user activity even in non-suspended accounts.
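The per-community percentages of the kind reported here can be computed with a few lines, given the members of a retrieved community and a status label per account. The community, the labels and the helper name `status_shares` are hypothetical and do not reproduce the data behind Table 2.

```python
from collections import Counter

def status_shares(members, status):
    """Percentage of community members per account status."""
    counts = Counter(status.get(user, "unknown") for user in members)
    return {label: 100.0 * count / len(members) for label, count in counts.items()}

# Hypothetical toy community with three suspended accounts and one deleted one.
community = ["u1", "u2", "u3", "u4"]
status = {"u1": "suspended", "u2": "suspended", "u3": "suspended", "u4": "not_found"}

shares = status_shares(community, status)
```

Reporting the shares together with the community size matters, because a small community can show a high suspended share from only a handful of accounts.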

CONCLUSIONS
We proposed a hybrid model that combines MEB centrality with community detection to retrieve groups of social media user accounts that are key-players in the terrorism domain. We found that centrality measures on the network of mentions perform better than other popularity measures (number of followers or friends) in finding key-players in the terrorism domain. Given a terrorism-related user, his/her network community, as provided by a community detection method, reveals a group of additional terrorism-related users, with modularity maximization methods outperforming density-based and other methods.