Fırat Duruşan; Ali Hürriyetoğlu; Erdem Yörük; Osman Mutlu; Çağrı Yoltar; Burak Gürel; Alvaro Comin
This document is the annotation manual developed within the scope of the Emerging Markets Welfare (EMW) project. The project investigates the effects of contentious politics on welfare state programs in countries of the Global South. It hypothesizes that government response to social contention is a significant factor that shapes welfare policies. Mapping the dynamics of social contention in a given country is therefore crucial, and constitutes a fundamental component of the entire project. Investigating the causal relationship between social contention and government policy involves more than a simple correlation, particularly if the focus is on specific government action, namely welfare policies. A map of social contention adequate for such an understanding should thus go beyond laying out basic trends in the ebb and flow of social contention over space and time. It should also provide insight into particularities such as the types of action repertoires, the levels of violence, the characteristics of the actors or social groups that engage in contentious politics, and the characteristics of the demands that they raise.
The purpose of the second work package of the EMW Project is to draw the aforementioned map of social contention. To this end, we created a database of contentious politics events by extracting information from news reports featured in the most prominent online sources of each focus country. The Global Contentious Politics Database (GLOCON) records contentious politics events (referred to as protest events for the sake of brevity) that take place within the borders of our focus countries, with all the information available in the source about the events’ time and place, actor, type, demands raised, and violence level. At present, the GLOCON database contains protest event data from India, China, South Africa, Argentina, and Brazil. It features data in three languages: English for the India, China, and South Africa data, Spanish for the Argentina data, and Portuguese for the Brazil data. The database was designed to accommodate the addition of other focus countries and/or news sources in the future.
The database was created with automated text processing tools that detect whether a news article contains a protest event, locate protest information within the article, and extract pieces of information about the detected protest events. The basis for training and testing the automated tools is the GLOCON Gold Standard Corpus (GSC), which contains news articles from multiple sources from each focus country. The articles in the GSC were manually coded by skilled annotators in both classification and extraction tasks, with the accuracy and consistency that automated tool development demands. To ensure this accuracy and consistency, the annotation manuals in this document lay out the rules according to which annotators code the news articles. Annotators refer to the manuals at all times for all annotation tasks and apply the rules that they contain.
Despite the EMW Project's focus on the countries of the Global South, and the initial choice of a limited number of countries to be featured in the GSC, none of the rules or principles contained in this manual is more or less applicable to certain countries, sources or periods than others. The GLOCON database aims to be inclusive and capable of expanding. Securing consistency, reliability, and validity of data in the face of temporal and spatial expansion requires that annotation principles are generally applicable and that they are applied consistently.
The annotation process is composed of three main levels for each news report document. The document-level annotation determines the news articles that contain information on actual (past or ongoing) protest events. The sentence-level annotation aims to locate sentences that contain protest event-related information. In the final phase, words or phrases that give concrete information about protest events are detected.
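The three levels described above can be pictured as nested records. The following is only a minimal illustrative sketch in Python, not the format actually used by the GSC or the project's tools: the class names, field names, the example sentence, and the rule that lower levels are only read when the higher level is flagged are all assumptions made for this illustration. The label names ("actor", "place", "time") are borrowed from the kinds of information the database records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Word/phrase-level annotation (hypothetical structure)."""
    start: int   # index of the first token in the sentence
    end: int     # index one past the last token
    label: str   # illustrative label, e.g. "actor", "place", "time"


@dataclass
class Sentence:
    tokens: list[str]
    has_event_info: bool = False  # sentence-level annotation
    spans: list[Span] = field(default_factory=list)


@dataclass
class Document:
    sentences: list[Sentence]
    has_protest: bool = False     # document-level annotation


def extract(doc: Document) -> list[tuple[str, str]]:
    """Return (label, text) pairs, descending document -> sentence -> phrase.

    Assumption for this sketch: phrases are read only from sentences
    flagged at the sentence level, inside documents flagged at the
    document level.
    """
    if not doc.has_protest:
        return []
    results = []
    for sentence in doc.sentences:
        if not sentence.has_event_info:
            continue
        for span in sentence.spans:
            text = " ".join(sentence.tokens[span.start:span.end])
            results.append((span.label, text))
    return results


# Invented example document, for illustration only.
doc = Document(
    has_protest=True,
    sentences=[
        Sentence(
            tokens="Workers marched in Durban on Monday".split(),
            has_event_info=True,
            spans=[Span(0, 1, "actor"), Span(3, 4, "place"), Span(5, 6, "time")],
        ),
        Sentence(tokens="The weather was mild".split()),
    ],
)
print(extract(doc))
```

Nesting the records this way mirrors the order of the annotation levels: a document is judged first, then its sentences, then the words or phrases inside the flagged sentences.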
You can find more information on the project at https://emw.ku.edu.tr/. You can stay updated by following the Twitter account at https://twitter.com/EmergingWelfare and the YouTube channel at https://www.youtube.com/channel/UC1SDR9yjXAFTAGRVHSyefuw
The database can be found at https://glocon.ku.edu.tr/.
This manual was published together with the following article: Hürriyetoğlu, A., Yörük, E., Mutlu, O., Duruşan, F., Yoltar, Ç., Yüret, D. and Gürel, B.: Cross-context news corpus for protest event-related knowledge base construction. Data Intelligence 3(2), 2021. doi: 10.1162/dint_a_00092. See the final section for the list of publications from the EMW Project that refer to it. For questions about the contents of the manual, please contact Fırat Duruşan (email: email@example.com).