Event Detection with Entity Markers

Event detection involves the identification of instances of specified types of events in text and their classification into event types. In this paper, we approach the event detection task as a relation extraction task. In this context, we assume that the clues brought by the entities participating in an event are important and could improve the performance of event detection. Therefore, we propose to exploit entity information explicitly for detecting event triggers by marking the entities at different levels while fine-tuning a pre-trained language model. The experimental results show that our approach obtains state-of-the-art results on the ACE 2005 dataset.


Introduction
Event detection (ED) aims to identify the instances of specified types of events in text. An event is represented by an event mention (a text that contains an event of a specific type and subtype), an event trigger (the word that expresses the event mention), an event argument (a participant in the event of a specific type), and an argument role (the role of the entity in the event). For instance, according to the ACE 2005 annotation guidelines, in the sentence "She's been convicted of obstruction of justice.", an event detection system should be able to recognize the word convicted as a trigger for the specific event type Convict.
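To make these definitions concrete, the ACE example above can be written out as a simple structure (a minimal, hypothetical sketch; the field names are ours and do not reflect the official ACE 2005 annotation format):

```python
# Illustrative representation of one ACE-style event annotation.
# Field names are our own, not the official ACE 2005 schema.
event = {
    "event_mention": "She's been convicted of obstruction of justice.",
    "event_type": "Justice",
    "event_subtype": "Convict",
    "trigger": "convicted",
    "arguments": [
        {"text": "She", "role": "Defendant", "entity_type": "PER"},
        {"text": "obstruction of justice", "role": "Crime", "entity_type": "Crime"},
    ],
}

# The trigger is a word inside the event mention.
assert event["trigger"] in event["event_mention"]
```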
A main challenge arises because the same event may appear in the form of various trigger expressions, and the same expression may represent different event types in different contexts. For example, transfer could refer to transferring ownership of an item, transferring money, or transferring personnel from one location to another. Each sense of the word is linked with an event type. In the same manner, fired can correspond to an attack type of event as in "an American tank fired on the street" or it can express the dismissal of an employee from a job as in "Hillary Clinton was fired from the House Judiciary Committee's Watergate investigation".
Therefore, we assume that, in such cases, significant clues can be provided by the context of a candidate trigger and by the presence of the event participants in this context, e.g. named entities. To analyze the importance of these indicators of the existence of an event in a sentence, we adopt a relation extraction model to perform event detection by taking advantage of the participants in the event (the event arguments).

Related Work
Most current state-of-the-art systems perform event detection as a stand-alone task [2,17,6], where the entities are either ignored or considered helpful only in joint models.
Some works made use of gold-standard entities in different manners. Better results can be obtained with gold-standard entity types [17] by concatenating randomly initialized embeddings for the entity types with the token embeddings. A graph neural network (GNN) based on dependency trees [18] has also been proposed to perform event detection, with a pooling method that relies on entity mentions to aggregate the convolution vectors. Arguments also provided significant clues in the supervised attention mechanism of [11], which exploits argument information explicitly for event detection while also using events from FrameNet.
Although some joint learning-based methods have been proposed that tackle event detection and argument extraction simultaneously, these approaches usually yield significant improvements only for argument extraction, with little gain for event detection. They typically combine the loss functions of the two tasks and are jointly trained under the supervision of annotated triggers and arguments. For instance, event triggers and their arguments are predicted at the same time in a joint framework [15] that combines bidirectional recurrent neural networks (Bi-RNNs) with a convolutional neural network (CNN) and systematically investigates the use of memory vectors/matrices to store prediction information during the labeling of sentence features.
The architecture adopted in [12] jointly extracts multiple event triggers and event arguments by introducing syntactic shortcut arcs derived from dependency parse trees to enhance the information flow in an attention-based graph convolution network (GCN) model. The gold-standard entity types are embedded as features for trigger and argument prediction. Argument information was also exploited explicitly for event detection in [11], which experiments with different strategies for adding supervised attention mechanisms; the authors exploit the annotated entity information by concatenating the token embeddings with randomly initialized entity type embeddings.
Recently, different approaches that include external resources and features at the sub-word representation level have been proposed. Generative adversarial networks (GANs) have been applied to event detection [24,8], and reinforcement learning (RL) is used in [24] to create an end-to-end entity and event extraction framework. The approach in [23] builds on the BERT model with an automatic generation of labeled data: prototypes are edited to produce labeled samples through argument replacement, and the samples are filtered by ranking their quality. A similar framework is proposed in [22], where the information is encoded by BERT or a CNN, suggesting a growing interest in adversarial models. Simultaneously, the integration of a distillation technique to enhance the adversarial prediction was explored in [13].
Although recent advances rely on multiple techniques, several BERT-based architectures have been proposed [21,23,22]. In this work, we demonstrate that BERT can be improved by adding extra information, namely by explicitly marking the entities in the input text. We continue with the presentation of our proposed model in Section 3. The experimental setup and the results are detailed in Section 4, and we conclude with some perspectives in Section 5.

Approach
We implemented a BERT-based model with EntityMarkers, adapting the method presented in [19] for relation classification to perform event detection. First, our model extends the BERT model [3] applied to sequential data. BERT itself is a stack of Transformer layers [20]; we refer the readers to the original paper for a more detailed description. We modify BERT by adding a conditional random field (CRF) layer instead of the dense layer commonly used in other works on sequence labeling [9,14], to ensure output consistency. Next, the EntityMarkers model [19] consists in augmenting the input data with a series of special tokens. Thus, if we consider a sentence x = [x_0, x_1, ..., x_n] with n tokens, we augment x with two reserved word pieces to mark the beginning and the end of each event argument mention in the sentence. Note that we only use the input representation of [19] and consider a token-level output, which is not addressed in [19].
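The marker insertion itself can be sketched in a few lines of Python (a simplified illustration; the function name and marker strings are ours, and the actual model operates on BERT word pieces with markers registered as reserved vocabulary tokens):

```python
def add_entity_markers(tokens, span, start_marker="[E_start]", end_marker="[E_end]"):
    """Insert marker tokens around an argument span (i, j), inclusive,
    and return the augmented tokens plus the shifted span indices."""
    i, j = span
    augmented = (
        tokens[:i] + [start_marker] + tokens[i:j + 1] + [end_marker] + tokens[j + 1:]
    )
    # After inserting the start marker, the span shifts right by one position.
    return augmented, (i + 1, j + 1)

tokens = ["She", "'s", "been", "convicted", "of", "obstruction", "of", "justice", "."]
augmented, new_span = add_entity_markers(tokens, (0, 0))
# augmented starts with ['[E_start]', 'She', '[E_end]', "'s", ...]; new_span is (1, 1)
```

The returned indices correspond to the updated entity positions E = (i + 1, j + 1) mentioned below.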
In the ACE 2005 dataset, an event argument is defined as an entity mention, a temporal expression, or a value (e.g. Crime, Sentence, Job-Title) that is involved in an event (as a participant or an attribute with a specific role in an event mention). An event argument has an entity type and a role. For example, in a Conflict.Attack event type, one event argument can be an Attacker with three possible types: PER, ORG, GPE (Person, Organization, Geo-political Entity). Thus, we introduce three types of markers: (1) Entity Position Markers, e.g. [E_start] and [E_end], where E represents an entity of any type; (2) Entity Type Markers, e.g. [PER_start] and [PER_end], where PER represents an entity of type Person; and (3) Argument Role Markers, tested for the case where the event argument roles are known beforehand, e.g. [Defendant_start] and [Defendant_end],
where Defendant is an event argument role. For an argument mention spanning tokens i to j, we modify x to give x̂ = [x_0, ..., x_{i-1}, [E_start], x_i, ..., x_j, [E_end], x_{j+1}, ..., x_n], and we feed this token sequence into BERT instead of x. We also update the entity indices E = (i + 1, j + 1) to account for the inserted tokens, as shown in Figure 1 for the model with Entity Position Markers.
As an example, in the sentence "She's been convicted of obstruction of justice.", where She has the argument role of a Defendant and obstruction of justice is an argument of type Crime, the sentence augmented with Argument Role Markers becomes "[Defendant_start] She [Defendant_end] 's been convicted of [Crime_start] obstruction of justice [Crime_end] .". For the Argument Role Markers, if an entity has different roles in different events that are present in the same sentence, we mark the entity with all the argument roles that it has.
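The three marker levels applied to this example sentence can be sketched as follows (an illustrative, self-contained snippet; the helper name `mark` and the exact marker strings are our own convention):

```python
sentence = ["She", "'s", "been", "convicted", "of", "obstruction", "of", "justice", "."]
# Argument spans (inclusive token indices) with their ACE annotations.
arguments = [
    {"span": (0, 0), "type": "PER", "role": "Defendant"},
    {"span": (5, 7), "type": "Crime", "role": "Crime"},
]

def mark(tokens, args, label_of):
    """Wrap each argument span with markers derived from label_of(arg).
    Spans are processed right-to-left so earlier indices stay valid."""
    out = list(tokens)
    for arg in sorted(args, key=lambda a: a["span"][0], reverse=True):
        i, j = arg["span"]
        label = label_of(arg)
        out = out[:i] + [f"[{label}_start]"] + out[i:j + 1] + [f"[{label}_end]"] + out[j + 1:]
    return out

position = mark(sentence, arguments, lambda a: "E")        # Entity Position Markers
typed = mark(sentence, arguments, lambda a: a["type"])     # Entity Type Markers
roles = mark(sentence, arguments, lambda a: a["role"])     # Argument Role Markers
# roles == ['[Defendant_start]', 'She', '[Defendant_end]', "'s", 'been',
#           'convicted', 'of', '[Crime_start]', 'obstruction', 'of',
#           'justice', '[Crime_end]', '.']
```

Processing spans from right to left keeps the insertion logic simple: markers inserted near the end of the sentence do not shift the indices of spans that come earlier.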
We compare our proposed models with markers against several state-of-the-art neural models for event detection that do not use external resources, more specifically the following CNN- and RNN-based models: the CNN-based model [17] with and without the addition of gold-standard entities, the dynamic multi-pooling CNN model [2], the bidirectional joint RNNs [15], the non-consecutive CNN [16], the hybrid model [7], the GAIL model [24], the gated cross-lingual attention model [10], and the graph CNN [18]. We also compare our approach with recently proposed BERT-based models: the fine-tuned BERT-base-uncased baseline [5], QA-BERT [5], where the task is approached as question answering, the two models with adversarial training for weakly supervised event detection [22], and the BERT and LSTM approaches [21] that model text spans and capture within-sentence and cross-sentence context.
Among the BERT-based baseline models presented in Table 1, it is worth noticing that the cased models perform better than the uncased ones, which could confirm that named entities, which are usually capitalized, are an important clue for the event detection task. Moreover, the results are similar to those of the BERT-base-uncased model in [5] (the same F1 value and similar precision and recall scores) and in [21].
The full results of our model and its comparison against the state of the art are presented in Table 2. There is a significant gain in trigger classification: 9.04% over the stand-alone BERT-based model and 5.99% over the best previously reported models. These results demonstrate the effectiveness of our method in incorporating the argument information.
Moreover, the improvements are consistent regardless of the type of encoder (BERT or other) used to represent the inputs. For our first model (Entity Position Markers), where the entities are surrounded by a general marker that does not depend on the entity type, the results improve by three percentage points, revealing that the position of the entities is relevant for the trigger detection task. Furthermore, when we mark the entities with their argument roles (Argument Role Markers), the recall and F1 increase by around one absolute percentage point. However, this setting is substantially optimistic, as it assumes that argument roles were correctly identified and typed.

Conclusions and Perspectives
We presented an approach for integrating entity information into the event detection task by adding entity markers at different levels: their positions, their types, and finally, their argument roles. Considering the results, we can conclude that marking entities in a sentence can significantly improve the F1 scores and obtain state-of-the-art values. Further analysis remains to be done in order to understand in which cases the markers bring informative features. As future work, we propose to tackle the drawbacks of our current approach by introducing the recognition and typing of the entities into the model.