Published June 1, 2022 | Version v1
Presentation Open

Whose 'I' is it anyway? Comparing a rule-based approach and a BERT token-classifier for quote detection in Dutch newspapers

  • 1. Kim
  • 2. Herbert
  • 3. Frank
  • 4. Marcel

Description

In this paper, we compare a rule-based approach and a BERT token-classifier for quote detection in Dutch newspapers in order to automatically identify personal journalism. Personal journalism comprises all journalism in which journalists explicitly refer to themselves using first-person pronouns. Amid journalism's current struggle to maintain its authority and commercial viability, personal journalism has become increasingly prevalent. This type of journalism could have far-reaching consequences for what is considered trustworthy journalistic knowledge. Because newspaper data is abundant and close reading is time-consuming, we apply computational methods to find personal journalism in newspapers. To distinguish between first-person pronouns that refer to the journalist and those that refer to their sources, we employ automatic quote extraction. We train and evaluate the rule-based method and the BERT token-classifier on a dataset of manually labelled articles from three newspapers, published in 1999, 2009 and 2019.
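The idea of separating the journalist's first-person pronouns from those of quoted sources can be illustrated with a minimal sketch. This is not the rule-based method or the BERT token-classifier described above; it is a simplified stand-in that assumes direct quotes are marked with straight or curly double quotes, and the pronoun list and the function names (`pronouns_outside_quotes`, `is_personal_candidate`) are hypothetical and chosen for illustration only.

```python
import re

# Assumed, non-exhaustive set of Dutch first-person pronouns (illustrative only).
FIRST_PERSON = {"ik", "mij", "me", "mijn", "wij", "we", "ons", "onze"}

# Matches direct quotes delimited by straight or curly double quotes.
# Single-quoted and indirect speech are deliberately not handled here.
QUOTE_RE = re.compile(r'"[^"]*"|“[^”]*”')
TOKEN_RE = re.compile(r"\w+")


def pronouns_outside_quotes(text):
    """Return the first-person pronouns that occur outside direct quotes."""
    # Blank out quoted spans so that only the journalist's own words remain.
    unquoted = QUOTE_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(unquoted.lower())
    return [t for t in tokens if t in FIRST_PERSON]


def is_personal_candidate(text):
    """Flag a text as a personal-journalism candidate under this heuristic."""
    return bool(pronouns_outside_quotes(text))


if __name__ == "__main__":
    example = 'Ik liep door de stad. "Ik ben moe", zei de bakker.'
    print(pronouns_outside_quotes(example))  # ['ik']
```

A heuristic of this kind will miss quotes marked with single quotation marks and quotes that span paragraphs or lack closing marks, which is one plausible reason to compare it against a learned token classifier.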

Files

md5:a5aa01e32a6ab40bb82369016987a7ac (1.7 MB)