Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls

Over the past few years, extensive anecdotal evidence has emerged suggesting the involvement of state-sponsored actors (or "trolls") in online political campaigns with the goal of manipulating public opinion and sowing discord. Recently, Twitter and Reddit released ground truth data about Russian and Iranian state-sponsored actors that were active on their platforms. In this paper, we analyze these ground truth datasets across several axes to understand how these actors operate, how they evolve over time, who their targets are, how their strategies changed over time, and what influence they had on the Web's information ecosystem. Among other things, we find that: a) the campaigns of these actors were influenced by real-world events; b) these actors employed different tactics and had different targets over time, thus their automated detection is not straightforward; and c) Russian trolls were clearly pro-Trump, whereas Iranian trolls were anti-Trump. Finally, using Hawkes Processes, we quantify the influence that these actors had on four Web communities: Reddit, Twitter, 4chan's Politically Incorrect board (/pol/), and Gab, finding that Russian trolls were more influential than Iranians with the exception of /pol/.


Introduction
Recent political events and elections have been increasingly accompanied by reports of disinformation campaigns attributed to state-sponsored actors [16]. In particular, "troll farms," allegedly employed by Russian state agencies, have been actively commenting and posting content on social media to further the Kremlin's political agenda [44].
Despite the growing relevance of state-sponsored disinformation, the activity of accounts linked to such efforts has not been thoroughly studied. Previous work has mostly looked at campaigns run by bots [16,22,37]. However, automated content diffusion is only a part of the issue. In fact, recent research has shown that human actors are actually key in spreading false information on Twitter [41]. Overall, many aspects of state-sponsored disinformation remain unclear, e.g., how do state-sponsored trolls operate? What kind of content do they disseminate? How does their behavior change over time? And, perhaps more importantly, is it possible to quantify the influence they have on the overall information ecosystem on the Web?
In this paper, we aim to address these questions by relying on two different sources of ground truth data about state-sponsored actors. First, we use 10M tweets posted by Russian and Iranian trolls between 2012 and 2018 [20]. Second, we use a list of 944 Russian trolls, identified by Reddit, and find all their posts between 2015 and 2018 [38]. We analyze the two datasets across several axes in order to understand their behavior and how it changes over time, their targets, and the content they shared. For the latter, we leverage word embeddings to understand in what context specific words/hashtags are used and to shed light on the ideology of the trolls. Finally, we use Hawkes Processes [29] to model the influence that the Russian and Iranian trolls had over multiple Web communities, namely Twitter, Reddit, 4chan's Politically Incorrect board (/pol/) [23], and Gab [52].
Main findings. Our study leads to several key observations:
1. Our influence estimation experiments reveal that Russian trolls were extremely influential and efficient in spreading URLs on Twitter. Also, when we compare their influence and efficiency to Iranian trolls, we find that Russian trolls were more efficient and influential in spreading URLs on Twitter, Reddit, and Gab, but not on /pol/.
2. By leveraging word embeddings, we find ideological differences between Russian and Iranian trolls. For instance, we find that Russian trolls were pro-Trump, while Iranian trolls were anti-Trump.
3. We find evidence that the Iranian campaigns were motivated by real-world events. Specifically, campaigns against France and Saudi Arabia coincided with real-world events that affected the relations between these countries and Iran.
4. We observe that the behavior of trolls varies over time. We find substantial changes in the use of language and Twitter clients over time for both Russian and Iranian trolls. These insights allow us to understand the targets of the orchestrated campaigns for each type of troll over time.
5. We find that the topics of interest and discussion vary across Web communities. For example, we find evidence that Russian trolls on Reddit extensively discussed cryptocurrencies, while this applies to a much lesser extent to the Russian trolls on Twitter.

Related Work
We now review previous work on opinion manipulation as well as politically motivated disinformation on the Web.
Opinion manipulation. The practice of swaying opinion in Web communities has become a hot-button issue as malicious actors are intensifying their efforts to push their subversive agenda. Kumar et al. [27] study how users create multiple accounts, called sockpuppets, that actively participate in some communities with the goal to manipulate users' opinions. Mihaylov et al. [31] show that trolls can indeed manipulate users' opinions in online forums. In follow-up work, Mihaylov and Nakov [32] highlight two types of trolls: those paid to operate and those that are called out as such by other users. Then, Volkova and Bell [47] aim to predict the deletion of Twitter accounts because they are trolls, focusing on those that shared content related to the Russia-Ukraine crisis. Elyashar et al. [15] distinguish authentic discussions from campaigns to manipulate the public's opinion, using a set of similarity functions alongside historical data. Also, Steward et al. [42] focus on discussions related to the Black Lives Matter movement and how content from Russian trolls was retweeted by other users. Using community detection techniques, they unveil that Russian trolls infiltrated both left and right leaning communities, setting out to push specific narratives. Finally, Varol et al. [46] aim to identify memes (ideas) that become popular due to coordinated efforts, and achieve a 75% AUC score before memes become trending and a 95% AUC score afterwards.
False information on the political stage. Conover et al. [9] focus on Twitter activity over the six weeks leading to the 2010 US midterm elections and the interactions between right and left leaning communities. Ratkiewicz et al. [37] study political campaigns using multiple controlled accounts to disseminate support for an individual or opinion. Specifically, they use machine learning to detect the early stages of false political information spreading on Twitter. Wong et al. [50] aim to quantify the political leanings of users and news outlets during the 2012 US presidential election on Twitter by formulating the problem as an ill-posed linear inverse problem, and using an inference engine that considers tweeting and retweeting behavior of articles. Yang et al. [51] investigate the topics of discussions on Twitter for 51 US political persons showing that Democrats and Republicans are active in a similar way on Twitter, although the former tend to use hashtags more frequently. Le et al. [28] study 50M tweets pertaining to the 2016 US election primaries and highlight the importance of three factors in political discussions on social media, namely the party (e.g., Republican or Democrat), policy considerations (e.g., foreign policy), and personality of the candidates (e.g., intelligent or determined).
Howard and Kollanyi [24] study the role of bots in Twitter conversations during the 2016 Brexit referendum. They find that most tweets are in favor of Brexit, that there are bots with various levels of automation, and that 1% of the accounts generate 33% of the overall messages. Also, Hegelich and Janetzko [22] investigate whether bots on Twitter are used as political actors. By exposing and analyzing 1.7K bots on Twitter during the Russia-Ukraine conflict, they uncover their political agenda and show that bots exhibit various behaviors, e.g., trying to hide their identity, promoting topics through the use of hashtags, and retweeting messages with particularly interesting content. Badawy et al. [3] aim to predict users that are likely to spread information from state-sponsored actors, while Dutt et al. [13] focus on the Facebook platform and analyze ads shared by Russian trolls in order to find the cues that make them effective. Finally, a large body of work focuses on social bots [5,10,17,16,45] and their role in spreading political disinformation, highlighting that they can manipulate the public's opinion at a large scale, thus potentially affecting the outcome of political elections.
Remarks. Unlike previous work, our study focuses on a set of Russian and Iranian trolls that were suspended by Twitter and Reddit. To the best of our knowledge, this constitutes the first effort not only to characterize a ground truth of troll accounts independently identified by Twitter and Reddit, but also to quantify their influence on the greater Web, specifically, on Twitter as well as on other communities like Reddit, 4chan, and Gab.

Background
In this section, we provide a brief overview of the social networks studied in this paper, i.e., Twitter, Reddit, 4chan, and Gab, which we choose because they are impactful actors in the Web's information ecosystem [54,53,52,23]. Note that the latter two Web communities are only used in our influence estimation experiments (see Section 6), where we aim to understand the influence that trolls had on these Web communities.
Twitter. Twitter is a mainstream social network, where users can broadcast short messages, called "tweets," to their followers. Tweets may contain hashtags, which enable easy indexing and searching of messages, as well as mentions, which refer to other users on Twitter.
Reddit. Reddit is a news aggregator with several social features. It allows users to post URLs along with a title; posts can get up- and down-votes, which dictate the popularity and order in which they appear on the platform. Reddit is divided into "subreddits," which are forums created by users that focus on a particular topic (e.g., /r/The_Donald is about discussions around Donald Trump).
4chan. 4chan is an imageboard forum, organized in communities called "boards," each with a different topic of interest. A user can create a new post by uploading an image with or without some text; others can reply below with or without images. 4chan is an anonymous community, and several of its boards are reportedly responsible for a substantial amount of hateful content [23]. In this work we focus on the Politically Incorrect board (/pol/) mainly because it is the main board for the discussion of politics and world events. Furthermore, 4chan is ephemeral, i.e., there is a limited number of active threads and all threads are permanently deleted after a week. We collect our 4chan dataset between June 30, 2016 and October 20, 2018, using the methodology described in [23], ultimately collecting 98M posts.
Table 1: We report the overall number of identified trolls, the trolls that had at least one tweet/post, and the overall number of tweets/posts.
Gab. Gab is a social network launched in August 2016 that aims to provide a platform for free speech and explicitly welcomes users banned from other communities. It combines features from Twitter (broadcast of 300-character messages, called "gabs") and Reddit (content popularity according to a voting system). It also has extremely lax moderation policies; it allows everything except illegal pornography, terrorist propaganda, and doxing [40]. Overall, Gab attracts alt-right users, conspiracy theorists, and high volumes of hate speech [52]. We collect 46M posts, posted on Gab between August 10, 2016 and October 20, 2018, using the same methodology as in [52].

Troll Datasets
In this section, we describe our dataset of Russian and Iranian trolls on Twitter and Reddit.
Twitter. On October 17, 2018, Twitter released a large dataset of Russian and Iranian troll accounts [20]. Although the exact methodology used to determine that these accounts were state-sponsored trolls is unknown, based on the most recent Department of Justice indictment [11], the dataset appears to have been constructed in such a manner that we can assume essentially no false positives, while we cannot make any claims about false negatives. Table 1 summarizes the troll dataset.
Reddit. On April 10, 2018, Reddit released a list of 944 accounts which they determined were operated by actors working on behalf of the Russian government [38]. We recover the submissions, comments, and account details for these accounts using two mechanisms: 1) dumps of Reddit provided by Pushshift [35]; and 2) crawling the user pages of those accounts. Although omitted for lack of space, we note that the union of these two data sources reveals some gaps in both, likely due to a combination of subreddit moderators removing posts and the troll users themselves deleting them, which would affect the two data sources in different ways. In any case, for our purposes, we merge the two datasets, with Table 1 describing the final dataset. Note that only about one third (335) of the accounts released by Reddit had at least one submission or comment in our dataset. We suspect the rest were simply dedicated upvote/downvote accounts used in an effort to push (or bury) specific content.
Ethics. Although we only work with publicly available data, we follow standard ethical guidelines [39] and make no attempt to de-anonymize users.

Analysis
In this section, we present an in-depth analysis of the activities and the behavior of Russian and Iranian trolls on Twitter and Reddit.

Accounts Characteristics
First we explore when the accounts appeared, what they posed as, and how many followers/friends they had on Twitter.
Account Creation. Fig. 1 plots the creation dates of the Russian and Iranian troll accounts on Twitter and Reddit. We observe that the majority of Russian troll accounts were created around the time of the Ukrainian conflict: 80% have an account creation date earlier than 2016. That said, there are some meaningful peaks in account creation during 2016 and 2017. 57 accounts were created between July 3-17, 2016, which was right before the start of the Republican National Convention (July 18-21) where Donald Trump was named the Republican nominee for President [48]. Later, 190 accounts were created between July 2017 and August 2017, during the run-up to the infamous Unite the Right rally in Charlottesville [49]. Taken together, this might be evidence of coordinated activities aimed at manipulating users' opinions on Twitter with respect to specific events. This is further evidenced when examining the Russian trolls on Reddit: 75% of Russian troll accounts on Reddit were created in a single massive burst in the first half of 2015. Also, there are a few smaller spikes occurring just prior to the 2016 US Presidential election. For the Iranian trolls on Twitter we observe that they are much "younger," with the larger bursts of account creation occurring after the 2016 US Presidential election.
Account Information. To avoid being obvious, state-sponsored trolls might attempt to present a persona that masks their true nature or otherwise ingratiates them with their target audience. By examining the profile descriptions of trolls we can get a feeling for how they might have cultivated this persona. In Table 2, we report the top ten words and bigrams that appear in profile descriptions of trolls on Twitter. Note that we do this only for Twitter trolls as we do not have descriptions for Reddit accounts. From the table we see that a relatively large number of Russian trolls pose as news outlets, with "news" (1.3%) and "breaking news" (0.8%) appearing in their description. Further, they seem to use their profile description to more explicitly increase their reach on Twitter, by nudging users to follow them (e.g., "follow me" appearing in almost 6.4% of profile descriptions). Finally, 3.4% of the Russian trolls describe themselves as Trump supporters: see the terms "trump" (4.4%) and "maga" (3.4%). Iranian trolls are even more likely to pose as news outlets or journalists: 3.6% have "journalist" and 3.2% have "news" in their profile descriptions. This highlights that accounts that pose as news outlets may in fact be controlled by state-sponsored actors, hence regular users should think critically when assessing whether an account is credible.
Followers/Friends. Fig. 2 plots the CDF of the number of followers and friends for both Russian and Iranian trolls. 25% of Iranian trolls had more than 1k followers, while the same applies to only 15% of the Russian trolls. In general, Iranian trolls tend to have more followers than Russian trolls (median of 392 and 132, respectively). Both Russian and Iranian trolls tend to follow a large number of users, probably in an attempt to increase their follower count via reciprocal follows. Iranian trolls have a median followers-to-friends ratio of 0.51, while Russian trolls have a ratio of 0.74. This might indicate that Iranian trolls were more effective in acquiring followers without resorting to mass following of other users, or perhaps that they took advantage of services that offer followers for sale [43].
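The follower statistics quoted above (the share of accounts above a follower threshold and the median followers-to-friends ratio) can be computed along the following lines. This is only a minimal sketch: the `accounts` list is made-up illustrative data rather than values from the released datasets, and the helper names `share_above` and `median_ratio` are hypothetical.

```python
from statistics import median

# Hypothetical (followers, friends) pairs; NOT taken from the released datasets.
accounts = [(1200, 2000), (150, 300), (90, 100), (4000, 3500), (10, 50)]

def share_above(values, threshold):
    """Fraction of values strictly above threshold, i.e. 1 - empirical CDF(threshold)."""
    return sum(v > threshold for v in values) / len(values)

def median_ratio(pairs):
    """Median followers-to-friends ratio, skipping accounts that follow nobody."""
    return median(followers / friends for followers, friends in pairs if friends > 0)

followers_only = [followers for followers, _ in accounts]
print(share_above(followers_only, 1000))  # share of accounts with more than 1k followers
print(median_ratio(accounts))             # median followers-to-friends ratio
```

Accounts with zero friends are skipped here to avoid a division by zero; whether the original analysis handled them the same way is not stated in the text.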

Temporal Analysis
We next explore aggregate troll activity over time, looking for behavioral patterns. Fig. 3(a) plots the (normalized) volume of tweets/posts shared per week in our dataset. We observe that both Russian and Iranian trolls on Twitter became active during the Ukrainian conflict. Although lower in overall volume, an increasing trend starts around August 2016 and continues through the summer of 2017. We also see three major spikes in activity by Russian trolls on Reddit. The first is during the latter half of 2015, approximately around the time that Donald Trump announced his candidacy for President. Next, we see solid activity through the middle of 2016, trailing off shortly before the election. Finally, we see another burst of activity in late 2017 through early 2018, at which point the trolls were detected and had their accounts locked by Reddit.
Next, we examine the hours of the day and of the week at which the trolls post. Fig. 3(b) shows that Russian trolls on Twitter are active throughout the day, while on Reddit they are particularly active during the first hours of the day. Similarly, Iranian trolls on Twitter tend to be active from early morning until 13:00 UTC. In Fig. 3(c), we report temporal characteristics based on hour of the week, finding that Russian trolls on Twitter follow a diurnal pattern with slightly less activity on Sundays. In contrast, Russian trolls on Reddit and Iranian trolls on Twitter are particularly active during the first days of the week, while their activity decreases during the weekend. For Iranians this is likely due to the Iranian work week running from Saturday to Wednesday with a half day on Thursday.
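The hour-of-day and hour-of-week profiles discussed above can be derived from post timestamps roughly as follows. This is a sketch under the assumption that timestamps are already timezone-aware UTC `datetime` objects; the `sample` values and the function name `activity_profiles` are illustrative only.

```python
from collections import Counter
from datetime import datetime, timezone

def activity_profiles(timestamps):
    """Return (share per hour of day, share per hour of week) for UTC datetimes."""
    by_hour, by_hour_of_week = Counter(), Counter()
    for ts in timestamps:
        by_hour[ts.hour] += 1
        # weekday(): Monday == 0 ... Sunday == 6, so hour of week runs 0..167.
        by_hour_of_week[ts.weekday() * 24 + ts.hour] += 1
    total = len(timestamps)
    return ({h: n / total for h, n in by_hour.items()},
            {h: n / total for h, n in by_hour_of_week.items()})

# Illustrative timestamps, not real troll activity.
sample = [
    datetime(2016, 7, 18, 9, 30, tzinfo=timezone.utc),  # Monday
    datetime(2016, 7, 18, 9, 45, tzinfo=timezone.utc),  # Monday
    datetime(2016, 7, 19, 9, 5, tzinfo=timezone.utc),   # Tuesday
    datetime(2016, 7, 24, 13, 0, tzinfo=timezone.utc),  # Sunday
]
hour_of_day, hour_of_week = activity_profiles(sample)
print(hour_of_day)
print(hour_of_week)
```

Note that this indexes the week from Monday; a week starting on Sunday, as is common in some plots, would only shift the index.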
But are all trolls in our dataset active throughout the span of our datasets? To answer this question, we plot the percentage of unique troll accounts that are active per week in Fig. 4 from which we draw the following observations. First, the Russian troll campaign on Twitter targeting Ukraine was much more diverse in terms of accounts when compared to later campaigns. There are several possible explanations for this. One explanation is that trolls learned from their Ukrainian campaign and became more efficient in later campaigns, perhaps relying on large networks of bots in their earlier campaigns which were later abandoned in favor of more focused campaigns like project Lakhta [12]. Another explanation could be that attacks on the US election might have required "better trained" trolls, perhaps those that could speak English more convincingly. The Iranians, on the other hand, seem to be slowly building their troll army over time. There is a steadily increasing number of active trolls posting per week over time. We speculate that this is due to their troll program coming online in a slow-but-steady manner, perhaps due to more effective training. Finally, on Reddit we see most Russian trolls posted irregularly, possibly performing other operations on the platform like manipulating votes on other posts.
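The weekly share of active accounts described above can be sketched as follows. The input format, the ISO-week bucketing, and the name `weekly_active_share` are assumptions made for illustration; the text does not specify how weeks were delimited.

```python
from collections import defaultdict
from datetime import date

def weekly_active_share(posts, total_accounts):
    """posts: iterable of (account_id, date).
    Returns {(iso_year, iso_week): percentage of all accounts that posted that week}."""
    active = defaultdict(set)
    for account, day in posts:
        iso = day.isocalendar()
        active[(iso[0], iso[1])].add(account)  # sets count each account once per week
    return {week: 100.0 * len(accounts) / total_accounts
            for week, accounts in sorted(active.items())}

# Illustrative posts from a hypothetical population of 4 known troll accounts.
posts = [
    ("a", date(2014, 3, 3)),
    ("a", date(2014, 3, 4)),
    ("b", date(2014, 3, 5)),
    ("a", date(2014, 3, 10)),
]
shares = weekly_active_share(posts, total_accounts=4)
print(shares)
```

Dividing by the total number of identified accounts, rather than by accounts that ever posted, is a design choice here; either denominator could be intended by the text.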
Next, we investigate the point in time when each troll in our dataset made its first and last post. Fig. 5 shows the number of users that made their first/last post for each week in our dataset, which highlights when trolls became active as well as when they "retired." We see that Russian trolls on Twitter made their first posts during early 2014, almost certainly in response to the Ukrainian conflict. When looking at the last tweets of Russian trolls on Twitter we see that a substantial portion of the trolls "retired" by the end of 2015. In all likelihood this is because the Ukrainian conflict was over and Russia turned its information warfare arsenal to other targets (e.g., the USA; this is also aligned with the increase in the use of the English language, see Section 5.3). When looking at Russian trolls on Reddit, we do not see a substantial spike in first posts close to the time that the majority of the accounts were created (see Fig. 1). This indicates that the newly created Russian trolls on Reddit became active gradually (in terms of posting behavior).
Finally, we assess whether Russian and Iranian trolls mention or retweet each other, and how this behavior changes over time. Fig. 6 shows the number of tweets that mentioned or retweeted other trolls' tweets over the course of our datasets. Russian trolls were particularly fond of this strategy during 2014 and 2015, while Iranian trolls started using this strategy after August 2017. This again highlights how the strategies employed by trolls adapt and evolve with new campaigns.

Languages and Clients
In this section, we study the languages that Russian and Iranian Twitter trolls posted in, as well as the Twitter clients they used to post tweets (this information is not available for Reddit).
Languages. First we study the languages used by trolls, as this provides an indication of their targets. The language information is included in the datasets released by Twitter. Fig. 7(a) plots the CDF of the number of languages used by troll accounts. We find that 80% and 75% of the Russian and Iranian trolls, respectively, use more than 2 languages. Next, we note that in general, Iranian trolls tend to use fewer languages than Russian trolls. The most popular language for Russian trolls is Russian (53% of all tweets), followed by English (36%), German (1%), and Ukrainian (0.9%). For Iranian trolls we find that French is the most popular language (28% of tweets), followed by English (24%), Arabic (13%), and Turkish (8%). Fig. 8 plots the use of different languages over time. Fig. 8(a) and Fig. 8(b) plot the percentage of tweets that were in a given language in a given week for Russian and Iranian trolls, respectively, in a stacked fashion, which lets us see how the usage of different languages changed over time relative to each other. Fig. 8(c) and Fig. 8(d) plot the language use from a different perspective: normalized to the overall number of tweets in a given language. This view gives us a better idea of how the use of each particular language changed over time. From the plots we make the following observations. First, there is a clear shift in targets based on the campaign. For example, Fig. 8(a) shows that the overwhelming majority of early tweets by Russian trolls were in Russian, with English only reaching the volume of Russian language tweets in 2016. This coincides with the "retirement" of several Russian trolls on Twitter (see Fig. 5). Next, we see evidence of other campaigns; for example, German language tweets begin showing up in early to mid 2016. For the Iranians, we see more obvious evidence of multiple campaigns.
For example, although Turkish and English are present for most of the timeline, French quickly becomes a commonly used language in the latter half of 2013, becoming the dominant language used from around May 2014 until the end of 2015. This is likely due to political events that happened during this time period. For example, in November 2013 France blocked a stopgap deal related to Iran's uranium enrichment program [21], leading to some fiery rhetoric from Iran's government (and apparently the launch of a troll campaign targeting French speakers). As tweets in French fall off, we also observe a dramatic increase in the use of Arabic in early 2016. This coincides with an attack on the Saudi embassy in Tehran [33], the primary reason the two countries ended diplomatic relations. When looking at the language usage normalized by the total number of tweets in that language, we can get a more focused perspective. In particular, from Fig. 8(c) it becomes strikingly clear that the initial burst of Russian troll activity was targeted at Ukraine, with the majority of Ukrainian language tweets coinciding directly with the Crimean conflict [4]. From Fig. 8(d) we observe that English language tweets from Iranian trolls, while consistently present over time, have a relative peak corresponding with French language tweets, likely indicating an attempt to influence non-French speakers with respect to the campaign against French speakers.
Client usage. Finally, we analyze the clients used to post tweets. When looking at the most popular clients, we find that Russian and Iranian trolls mostly use the main Twitter Web Client (28.5% for Russian trolls, and 62.2% for Iranian trolls). This is in contrast with what normal users use: using a random set of Twitter users, we find that mobile clients make up a large chunk of tweets (48%), followed by the TweetDeck dashboard (32%). We next look at how many different clients trolls use throughout our dataset: in Fig. 7(b), we plot the CDF of the number of clients used per user. 25% and 21% of the Russian and Iranian trolls, respectively, use only one client, while in general Russian trolls tend to use more clients than Iranians. Fig. 9 plots the usage of clients over time in terms of weekly tweets by Russian and Iranian trolls. We observe that the Russians (Fig. 9(a)) started off with almost exclusive use of the "twitterfeed" client. Usage of this client drops off when it was shut down in October 2016. During the Ukrainian crisis, however, we see several new clients come into the mix. Iranians (Fig. 9(b)) started off almost exclusively using the "facebook" Twitter client. To the best of our knowledge, this is a client that automatically tweets any posts made on Facebook, indicating that Iranians likely started with a campaign on Facebook. At the beginning of 2014, we see a shift to using the Twitter Web Client, which only begins to decrease towards the end of 2015. Of particular note in Fig. 9(b) is the appearance of "dlvr.it," an automated social media manager, in the beginning of 2015. This corresponds with the creation of IUVM [25], which is a fabricated ecosystem of (fake) news outlets and social media accounts created by the Iranians, and might indicate that Iranian trolls stepped up their game around that time, starting to use services that allowed for better account orchestration and thus more effective campaigns.
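The two views of language usage described for Fig. 8 earlier in this section (share of each language within a week, and share of each week within a language) differ only in the denominator. A minimal sketch, assuming tweets are given as hypothetical `(week, language)` pairs and using the illustrative name `language_views`:

```python
from collections import Counter

def language_views(tweets):
    """tweets: iterable of (week, language) pairs.
    Returns two dicts keyed by (week, language):
      within_week: count divided by all tweets posted in that week
      within_lang: count divided by all tweets posted in that language"""
    counts = Counter(tweets)
    per_week, per_lang = Counter(), Counter()
    for (week, lang), n in counts.items():
        per_week[week] += n
        per_lang[lang] += n
    within_week = {key: n / per_week[key[0]] for key, n in counts.items()}
    within_lang = {key: n / per_lang[key[1]] for key, n in counts.items()}
    return within_week, within_lang

# Illustrative data only: week labels and language codes are made up.
tweets = [("w1", "ru")] * 3 + [("w1", "en")] + [("w2", "en")] * 3
within_week, within_lang = language_views(tweets)
print(within_week[("w1", "ru")])  # share of week w1 that is Russian
print(within_lang[("w2", "en")])  # share of all English tweets that fall in week w2
```

The first normalization shows which language dominates at a given time; the second shows when each language was used most, regardless of its overall volume.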

Geographical Analysis
We then study users' locations, relying on the self-reported location field in their profiles. Note that this field is not required, and users are also able to change it whenever they like, so we look at locations for each tweet. Note that 16.8% and 20.9% of the tweets from Russian and Iranian trolls, respectively, do not include a self-reported location. To infer the geographical location from the self-reported text, we use pigeo [36], which provides geographical information (e.g., latitude, longitude, country, etc.) given the text that corresponds to a location. Specifically, we extract 626 self-reported locations for the Russian trolls and 201 locations for the Iranian trolls. Then, we use pigeo to systematically obtain a geographical location (and its associated coordinates) for each text that corresponds to a location. Fig. 10 shows the locations inferred for Russian trolls (red circles) and Iranian trolls (green triangles). The size of the shapes on the map indicates the number of tweets that appear at each location. We observe that most of the tweets from Russian trolls come from locations within Russia (34%), the USA (29%), and some from European countries, like the United Kingdom (16%), Germany (0.8%), and Ukraine (0.6%). This suggests that Russian trolls may be pretending to be from certain countries, e.g., the USA or the United Kingdom, aiming to pose as locals and effectively manipulate opinions. A similar pattern exists with Iranian trolls, which were particularly active in France (26%), Brazil (9%), the USA (8%), Turkey (7%), and Saudi Arabia (7%). It is also worth noting that Iranian trolls, unlike Russian trolls, did not report locations from their own country, indicating that these trolls were primarily used for campaigns targeting foreign countries. Finally, we note that the location-based findings are in line with the findings of the language analysis (see Section 5.3), further evidencing that both Russian and Iranian trolls were specifically targeting different countries over time.

Content Analysis
Word Embeddings. Recent indictments by the US Department of Justice have indicated that troll messaging was crafted, with certain phrases and terminology designated for use in certain contexts. To get a better handle on how this was expressed, we build two word2vec models on the corpus of tweets: one for the Russian trolls and one for the Iranian trolls. To train the models, we first extract the tweets posted in English, according to the data provided by Twitter. Then, we remove stop words, perform stemming, tokenize the tweets, and keep only words that appear at least 500 and 100 times for the Russian and Iranian trolls, respectively. Table 3 shows the top 10 most similar terms to "maga" for each model. We see a marked difference between its usage by Russian and Iranian trolls. Russian trolls are clearly pushing heavily in favor of Donald Trump, while it is the exact opposite with Iranians.
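The preprocessing described above (tokenization, stop word removal, and a minimum frequency cutoff) can be sketched as follows before the token lists are handed to a word2vec implementation. This is an illustration under several assumptions: the stop word list is a tiny placeholder, stemming is omitted because the standard library has no stemmer, and the names `tokenize` and `build_corpus` are hypothetical. The text uses cutoffs of 500 and 100; the example uses 2 so that the toy data survives filtering.

```python
import re
from collections import Counter

# Tiny placeholder list; a real pipeline would use a full stop word list.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "in"}

def tokenize(text):
    """Lowercase, split into word/hashtag tokens, and drop stop words."""
    return [tok for tok in re.findall(r"#?\w+", text.lower()) if tok not in STOP_WORDS]

def build_corpus(tweets, min_count):
    """Tokenize tweets and keep only tokens seen at least min_count times overall."""
    tokenized = [tokenize(tweet) for tweet in tweets]
    freq = Counter(tok for toks in tokenized for tok in toks)
    return [[tok for tok in toks if freq[tok] >= min_count] for toks in tokenized]

# Made-up example tweets, not taken from the dataset.
tweets = ["MAGA is the way", "Vote MAGA in November", "The way to vote"]
corpus = build_corpus(tweets, min_count=2)
print(corpus)
```

Each inner list is one tweet's surviving tokens, which is the sentence-per-list shape that word2vec trainers typically expect.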
Hashtags. Next, we aim to understand the use of hashtags with a focus on the ones written in English. In Table 4, we report the top 20 English hashtags for both Russian and Iranian trolls. State-sponsored trolls appear to use hashtags to disseminate news (9.5%) and politics (3.0%) related content, but also use several that might be indicators of propaganda and/or controversial topics, e.g., #BlackLivesMatter. For instance, one notable example is: "WATCH: Here is a typical #BlackLivesMatter protester: 'I hope I kill all white babes!' #BatonRouge <url>" on July 17, 2016. Note that <url> denotes a link. Fig. 11 shows a visualization of hashtag usage built from the two word2vec models. Here, we show hashtags used in a similar context, by constructing a graph where nodes are words that correspond to hashtags from the word2vec models, and edges are weighted by the cosine distances (as produced by the word2vec models) between the hashtags. After trimming out all edges between nodes with weight less than a threshold, based on methodology from [18], we run the community detection heuristic presented in [7], and mark each community with a different color. Finally, the graph is laid out with the ForceAtlas2 algorithm [26], which takes into account the weight of the edges when laying out the nodes in 2-dimensional space. Note that the size of the nodes is proportional to the number of times the hashtag appeared in each dataset.
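The graph construction described above can be sketched in a few lines. Several things here are assumptions rather than the paper's method: the vectors are made-up two-dimensional stand-ins for word2vec embeddings, edge weights are taken to be cosine similarities, the threshold of 0.9 is arbitrary, and plain connected components are used as a simple stand-in for the community detection heuristic of [7]. Layout with ForceAtlas2 is not covered.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_graph(vectors, threshold):
    """Return {(tag_a, tag_b): weight}, keeping only edges with weight >= threshold."""
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        weight = cosine(vectors[a], vectors[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges

def components(nodes, edges):
    """Connected components: a stand-in grouping, NOT the heuristic from [7]."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            current = stack.pop()
            if current not in group:
                group.add(current)
                stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical 2-d "embeddings" for four hashtags.
vectors = {"#maga": [1.0, 0.1], "#trump": [0.9, 0.2],
           "#news": [0.1, 1.0], "#sports": [0.2, 0.9]}
edges = build_graph(vectors, threshold=0.9)
groups = components(vectors, edges)
print(sorted(edges))
print(groups)
```

A modularity-based method would generally split dense graphs more finely than connected components do; the stand-in is only meant to show where the grouping step sits in the pipeline.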
Figure 11: Visualization of the top hashtags used by a) Russian trolls on Twitter (see [2] for an interactive version) and b) Iranian trolls on Twitter (see [1] for an interactive version).
We first observe that, in Fig. 11(a), there is a central mass of what we consider "general audience" hashtags (see green community in the center of the graph): hashtags related to a holiday or a specific trending (but non-political) topic. In the bottom right portion of the plot we observe "general news" related categories; in particular American sports related hashtags (e.g., "baseball"). Next, we see a community of hashtags (light blue, towards the bottom left of the graph) clearly related to Trump's attacks on Hillary Clinton. The Iranian trolls again show different behavior. There is a community of hashtags related to nuclear talks (orange), a community related to Palestine (light blue), and a community that is clearly anti-Trump (pink). The central green community exposes some of the ways they pushed the IUVM fake news network by using innocuous hashtags like "#MyDatingProfileSays" as well as politically motivated ones like "#JerusalemIsTheEternalCapitalOfPalestine."
We also study when these hashtags are used by the trolls, finding that most of them are well distributed over time. However, we find some interesting exceptions. We highlight a few of these in Fig. 12, which plots the top ten hashtags that Russian and Iranian trolls posted with substantially different rates before and after the 2016 US Presidential election. The set of hashtags was determined by examining the relative change in posting volume before and after the election. From the plots we make several observations. First, we note that more general audience hashtags remain a staple of Russian trolls before the election (the relative decrease corresponds to the overall relative decrease in troll activity following the Crimea conflict). They also use relatively innocuous/ephemeral hashtags like #IHatePokemonGoBeacause, likely in an attempt to hide the true nature of their accounts.
That said, we also see them attaching to politically divisive hashtags like #BlackLivesMatters around the time that Donald Trump won the Republican Presidential primaries in June 2016. In the ramp up to the 2016 election, we see a variety of clearly political hashtags, with #MAGA seeing substantial peaks starting in early 2017 (higher than any peak during the 2016 Presidential campaigns). We also see a large number of politically ephemeral hashtags attacking Obama and a campaign to push the border wall with Mexico. In addition to these politically oriented hashtags, we again see the usage of ephemeral hashtags related to holidays. #SurvivalGuideToThanksgiving in late November 2016 is particularly interesting, as it was heavily used for discussing how to deal with family members with wildly different viewpoints on the recent election results. This hashtag was exclusively used to give trolls a vector to sow discord. When it comes to Iranian trolls, we note that, prior to the 2016 election, they share many posts with hashtags related to Hillary Clinton (see Fig. 12(c)). After the election they shift to posting negatively about Donald Trump (see Fig. 12(d)).
Figure 12: Top ten hashtags that appear a), c) substantially more times before the US elections than after the elections; and b), d) substantially more times after the elections than before.
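The before/after comparison used to select the hashtags in Fig. 12 can be sketched as follows. The paper does not specify the exact formula, so the normalized score below, the observation window, and the function names are assumptions made for illustration only; the election date (November 8, 2016) is the only value taken from the real world.

```python
from datetime import date

ELECTION_DAY = date(2016, 11, 8)

def posting_rates(post_dates, start, end, split=ELECTION_DAY):
    """Posts per day before and after the split date within [start, end]."""
    before = sum(1 for d in post_dates if d < split)
    after = len(post_dates) - before
    return before / (split - start).days, after / (end - split).days

def change_score(rate_before, rate_after):
    """Hypothetical score in [-1, 1]: -1 = only before, +1 = only after."""
    total = rate_before + rate_after
    return 0.0 if total == 0 else (rate_after - rate_before) / total

# Made-up posting dates for two hashtags.
usage = {
    "hashtag_a": [date(2016, 3, 1), date(2016, 5, 2), date(2016, 9, 9)],
    "hashtag_b": [date(2017, 1, 20), date(2017, 2, 1)],
}
start, end = date(2016, 1, 1), date(2017, 12, 31)
scores = {tag: change_score(*posting_rates(dates, start, end))
          for tag, dates in usage.items()}
# Most "before-heavy" hashtags first; reverse the order for "after-heavy" ones.
ranked = sorted(scores, key=scores.get)
print(scores, ranked)
```

Comparing per-day rates rather than raw counts avoids favoring whichever side of the election covers the longer time span.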
LDA analysis. We also use the Latent Dirichlet Allocation (LDA) model [6] to analyze tweets' semantics. We train an LDA model for each of the datasets and extract ten distinct topics with ten words each, as reported in Table 5. While both Russian and Iranian trolls tweet about politics related topics, for Iranian trolls this seems to be focused more on regional, and possibly even internal, issues. For example, "iran" itself is a common term in several of the topics, as are "israel," "saudi," "yemen," and "isis." Both sets of trolls discuss the proxy war in Syria (in which both states are involved). However, while the Iranian trolls have topics pertaining to Russia and Putin, the Russian trolls do not make any mention of Iran, instead focusing on more vague political topics like gun control and racism. For Russian trolls on Reddit (see Table 6) we again find topics related to politics, as well as some topics related to discussions about cryptocurrencies (see topics 9 and 10).
Subreddits. Fig. 13 shows the top 20 subreddits that Russian trolls on Reddit exploited and their respective percentage of posts over the whole dataset. The most popular subreddit is /r/uncen (11% of posts), which is a subreddit created by a specific Russian troll and, via manual examination, appears to be primarily used to disseminate news articles of questionable credibility. Other popular subreddits include general audience subreddits like /r/funny (6%) and /r/AskReddit (4%), likely in an attempt to obfuscate the fact that they are state-sponsored trolls, in the same way that innocuous hashtags were used on Twitter. Finally, it is worth noting that the Russian trolls were particularly active on communities related to cryptocurrencies like /r/CryptoCurrency (3.6%) and /r/Bitcoin (1%), possibly attempting to influence the prices of specific cryptocurrencies. This is particularly noteworthy considering that cryptocurrencies have reportedly been used to launder money, evade capital controls, and perhaps evade sanctions [34,8].
URLs. We next analyze the URLs included in the tweets/posts. In Table 7, we report the top 20 domains for both Russian and Iranian trolls. Livejournal (5.4%) is the most popular domain in the Russian trolls dataset on Twitter, likely due to the Ukrainian campaign. Overall, we can observe the impact of the Crimean conflict, with essentially all domains posted by the Russian trolls being Russian language or Russian oriented. One exception to Russian language sites is RT, the Russian-controlled propaganda outlet. The Iranian trolls similarly post more "localized" domains, for example, jordan-times, but we also see them heavily pushing the IUVM fake news network. When it comes to Russian trolls on Reddit, we find that they were mostly posting random images through Imgur (an image-hosting site, 16% of the posts), likely in an attempt to accumulate karma score. We also note that a substantial portion of posts contained URLs to (fake) news sites linked with the Internet Research Agency like blackmattersus.com (5.7%) and donotshootus.us (2.5%).
[...] information that they share (e.g., lure other users into posting similar content) [14]. Therefore, we now set out to determine their impact in terms of the dissemination of information on Twitter, and on the greater Web.

Influence Estimation
To assess their influence, we look at three different groups of URLs: 1) URLs shared by Russian trolls on Twitter, 2) URLs shared by Iranian trolls on Twitter, and 3) URLs shared by both Russian and Iranian trolls on Twitter. We then find all posts that include any of these URLs in the following Web communities: Reddit, Twitter (from the 1% Streaming API, with posts from confirmed Russian and Iranian trolls removed), Gab, and 4chan's Politically Incorrect board (/pol/). For Reddit and Twitter our dataset spans January 2016 to October 2018, for /pol/ it spans July 2016 to October 2018, and for Gab it spans August 2016 to October 2018. We select these communities as previous work shows they play an important and influential role in the dissemination of news [54] and memes [53]. Table 8 summarizes the number of events (i.e., occurrences of a given URL) for each community/group of users that we consider (Russia refers to Russian trolls on Twitter, while Iran refers to Iranian trolls on Twitter). Note that we decouple The Donald from the rest of Reddit, as previous work showed that it is quite efficient in pushing information to other communities [53]. From the table we make several observations: 1) Twitter has the largest number of events in all groups of URLs, mainly because it is the largest community; and 2) Gab has a considerably large number of events, more than /pol/ and The Donald, which are bigger communities.
For each unique URL, we fit a statistical model known as Hawkes Processes [29,30], which allows us to estimate the strength of connections between each of these communities in terms of how likely an event (the URL being posted by either trolls or normal users to a particular platform) is to cause subsequent events in each of the groups. We fit each Hawkes model using the methodology presented by [53]. In a nutshell, by fitting a Hawkes model we obtain all the necessary parameters that allow us to assess the root cause of each event (i.e., the community that is "responsible" for the creation of the event). By aggregating the root causes for all events we are able to measure the influence and efficiency of each Web community we considered.
Figure 15: Influence from source to destination community, normalized by the number of events in the source community for URLs shared by a) Russian trolls; b) Iranian trolls; and c) Both Russian and Iranian trolls. We also include the total external influence of each community.
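To make the notion of a root cause concrete, the following sketch attributes events to root communities given Hawkes parameters that are assumed to be already fitted. It is not the fitting procedure of [53]: the exponential kernel, the parameter names `mu`, `W`, and `omega`, and the toy values are assumptions chosen for illustration.

```python
import math

def root_cause_matrix(events, mu, W, omega):
    """Expected root-cause attribution for a multivariate Hawkes process.

    events: list of (time, community_index), sorted by time.
    mu[k]: background rate of community k (must be > 0).
    W[j][k]: expected number of events in k triggered by one event in j.
    omega: decay rate of the (assumed) exponential kernel.
    Returns R, where R[src][dst] is the expected number of events in dst
    whose chain of causes starts with a background event in src.
    """
    n = len(mu)
    roots = []  # roots[i][src]: probability that event i has its root in src
    R = [[0.0] * n for _ in range(n)]
    for i, (t_i, k) in enumerate(events):
        # Intensity contributed by each earlier event at time t_i.
        trig = [W[c_j][k] * omega * math.exp(-omega * (t_i - t_j))
                for (t_j, c_j) in events[:i]]
        lam = mu[k] + sum(trig)
        dist = [0.0] * n
        dist[k] += mu[k] / lam  # background event: community k is its own root
        for j, contrib in enumerate(trig):
            for src in range(n):
                # Event i inherits the root of its (probabilistic) parent j.
                dist[src] += (contrib / lam) * roots[j][src]
        roots.append(dist)
        for src in range(n):
            R[src][k] += dist[src]
    return R

# Toy example: community 0 posts a URL at t=0, community 1 posts it at t=1.
events = [(0.0, 0), (1.0, 1)]
R = root_cause_matrix(events, mu=[0.5, 0.5], W=[[0.0, 0.5], [0.0, 0.0]], omega=1.0)
print(R)
```

Each column of `R` sums to the number of events observed in that community, so the matrix can be read as a decomposition of every community's events by originating community.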
We demonstrate our results with two different metrics: 1) the absolute influence, or percentage of events on the destination community caused by events on the source community and 2) the influence relative to size, which shows the number of events caused on the destination platform as a percent of the number of events on the source platform. The latter can also be interpreted as a measure of how efficient a community is in pushing URLs to other communities. Fig. 14 reports our results for the absolute influence for each group of URLs. When looking at the influence for the URLs shared by Russian trolls on Twitter (Fig. 14(a)), we find that Russian trolls were particularly influential to users from Gab (1.9%), the rest of Twitter (1.29%), and /pol/ (1.08%). When looking at the communities that influenced the Russian trolls we find the rest of Twitter (7%) followed by Reddit (4%). By looking at URLs shared by Iranian trolls on Twitter (Fig. 14(b)), we find that Iranian trolls were most successful in pushing URLs to The Donald (1.52%), the rest of Reddit (1.39%), and Gab (1.05%), somewhat ironic considering The Donald and Gab's zealous pro-Trump leanings and the Iranian trolls' clear anti-Trump leanings [19,52]. Similarly to Russian trolls, the Iranian trolls were most influenced by Reddit (5.6%) and the rest of Twitter (4.6%). When looking at the URLs posted by both Russian and Iranian trolls we find that, overall, the Russian trolls were more influential in spreading URLs to the other Web communities with the exception of (again, somewhat ironically) /pol/.
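The two metrics defined above reduce to simple ratios once the number of events attributed to each source community is known. The sketch below assumes such an attribution table is available; the table layout, the function name, and all numbers are hypothetical and are not results from the paper.

```python
def influence_metrics(caused, event_counts):
    """Compute both influence metrics, in percent, for every (source, destination) pair.

    caused[src][dst]: number of events in dst attributed to src.
    event_counts[c]: total number of events observed in community c.
    """
    absolute, relative = {}, {}
    for src, row in caused.items():
        for dst, n_caused in row.items():
            # Share of the destination's events that were caused by the source.
            absolute[(src, dst)] = 100.0 * n_caused / event_counts[dst]
            # Events caused in the destination per event created in the source.
            relative[(src, dst)] = 100.0 * n_caused / event_counts[src]
    return absolute, relative

# Made-up attribution counts for two communities.
caused = {
    "trolls": {"trolls": 90, "gab": 20},
    "gab": {"trolls": 10, "gab": 980},
}
event_counts = {"trolls": 100, "gab": 1000}
absolute, relative = influence_metrics(caused, event_counts)
print(absolute[("trolls", "gab")], relative[("trolls", "gab")])
```

In this toy case the small source community accounts for only 2% of the destination's events in absolute terms, yet causes 20 destination events per 100 of its own, which is why the two metrics can rank communities differently.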
But how do these results change when we normalize the influence with respect to the number of events that each community creates? Fig. 15 shows the influence relative to size for each pair of communities/groups of users. For URLs shared by Russian trolls (Fig. 15(a)) we find that Russian trolls were particularly efficient in spreading the URLs to Twitter (10.4%), which is not a surprise given that the accounts operate directly on this platform, and Gab (3.19%). For the URLs shared by Iranian trolls, we again observe that they were most efficient in pushing the URLs to Twitter (3.6%) and the rest of Reddit (2.04%). Also, it is worth noting that in both groups of URLs The Donald had the highest external influence on the other platforms. This highlights that The Donald is an impactful actor in the information ecosystem and is quite possibly exploited by trolls as a vector to push specific information to other communities. Finally, when looking at the URLs shared by both Russian and Iranian trolls, we find that Russian trolls were more efficient (greater impact relative to the number of URLs posted) at spreading URLs in all the communities with the exception of /pol/, where Iranians were more efficient.

Discussion & Conclusion
In this paper, we analyzed the behavior and evolution of Russian and Iranian trolls on Twitter and Reddit over the course of several years. We shed light on the target campaigns of each group of trolls, and we examined how their behavior evolved over time and what content they disseminated. Furthermore, we find some interesting differences between the trolls depending on their origin and the platform from which they operate. For instance, for the latter, we find discussions related to cryptocurrencies only on Reddit by Russian trolls, while for the former we find that Russian trolls were pro-Trump and Iranian trolls anti-Trump. Also, we quantify the influence that these state-sponsored trolls had on several mainstream and alternative Web communities (Twitter, Reddit, /pol/, and Gab), showing that Russian trolls were more efficient and influential in spreading URLs on other Web communities than Iranian trolls, with the exception of /pol/.
Our findings have serious implications for society at large. First, our analysis shows that while troll accounts use peculiar tactics and talking points to further their agendas, these are not completely disjoint from those of regular users, and therefore developing automated systems to identify and block such accounts remains an open challenge. Second, our results also indicate that automated systems to detect trolls are likely to be difficult to realize: trolls change their behavior over time, and thus even a classifier that works perfectly on one campaign might not catch future campaigns. Third, and perhaps most worrying, we find that state-sponsored trolls have a meaningful amount of influence on fringe communities like The Donald, 4chan's /pol/, and Gab, and that the topics pushed by the trolls resonate strongly with these communities. This might be due to users on these communities who sympathize with the views the trolls aim to share (i.e., "useful idiots") or to unidentified state-sponsored actors on these communities. In either case, considering recent tragic events like the Tree of Life Synagogue shooting, perpetrated by a Gab user seemingly influenced by content posted there, the potential for mass societal upheaval cannot be overstated. Because of this, we implore the research community, as well as governments and non-government organizations, to expend whatever resources are at their disposal to develop technology and policy to address this new, and effective, form of digital warfare.