Dataset Open Access

Contextualizing Trending Entities in News Stories

Ponza, Marco; Ceccarelli, Diego; Ferragina, Paolo; Meij, Edgar; Kothari, Sambhav

This repository contains the enrichments of The New York Times Annotated Corpus that were developed for the paper:

“Marco Ponza, Diego Ceccarelli, Paolo Ferragina, Edgar Meij, Sambhav Kothari. Contextualizing Trending Entities in News Stories. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM 2021).”

The dataset includes 149 trends comprising 120K entities in total. The task is to retrieve, for a given trending entity, a set of entities ranked by how useful they are in explaining why that entity is trending.

Format

The repository contains the enrichments in JSON format.

The New York Times news stories from which these enrichments were derived are available from the Linguistic Data Consortium (LDC).
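
Since the enrichments are distributed as JSON inside a zip archive, they can be read without unpacking the archive first. The following is a minimal sketch, not the authors' loader: it assumes that the archive contains files ending in .json and that each such file holds a single JSON document. The function name load_json_members is hypothetical, and the internal layout and schema of the files are not described on this page, so the sketch only reports what it finds.

```python
import json
import zipfile
from pathlib import Path


def load_json_members(archive_path):
    """Return {member_name: parsed_json} for every .json file in the archive.

    Assumption: each .json member is one complete JSON document.
    """
    loaded = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                with archive.open(name) as handle:
                    loaded[name] = json.load(handle)
    return loaded


if __name__ == "__main__":
    # File name as listed in the Files section of this record.
    archive = Path("contextualizing-trending-entities.zip")
    if archive.exists():
        for name, payload in load_json_members(archive).items():
            # Print only the member name and top-level JSON type,
            # because the schema is not documented here.
            print(name, type(payload).__name__)
```

Inspecting the top-level type of each file first is a cautious way to learn the structure before writing code that depends on specific keys.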

Data Splits

We perform two kinds of evaluation.

  1. Unsupervised evaluation, where we use the complete dataset of 149 trends as a benchmark.
  2. Supervised evaluation, where we train and tune our models on the training and development sets and test them on the test set:
     • The training set contains 50 trends comprising 36.3K entities from 1996 to 2000.
     • The development set contains 34 trends comprising 26.7K entities from 2000 to 2002.
     • The test set contains 65 trends comprising 57K entities from 2002 to 2007.
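
As a quick consistency check, the three supervised splits should add up to the full benchmark of 149 trends and 120K entities. The sketch below encodes the figures stated above and sums them. The entity counts are the rounded values given in the text (for example 36.3K is written as 36_300), and the names SPLITS and totals are illustrative only.

```python
# Split sizes as stated in this record; entity counts are rounded figures.
SPLITS = {
    "training": {"trends": 50, "entities": 36_300, "years": (1996, 2000)},
    "development": {"trends": 34, "entities": 26_700, "years": (2000, 2002)},
    "test": {"trends": 65, "entities": 57_000, "years": (2002, 2007)},
}


def totals(splits):
    """Sum the trend and entity counts over all splits."""
    trends = sum(split["trends"] for split in splits.values())
    entities = sum(split["entities"] for split in splits.values())
    return trends, entities


total_trends, total_entities = totals(SPLITS)
print(total_trends, total_entities)  # 149 120000
```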

Use

Please cite the dataset and the accompanying paper if you find the resources in this repository useful:

@inproceedings{ponza2021,
     title = {Contextualizing Trending Entities in News Stories},
     author = {Ponza, Marco and Ceccarelli, Diego and Ferragina, Paolo and Meij, Edgar and Kothari, Sambhav},
     booktitle = {Proceedings of the 14th ACM International Conference on Web Search and Data Mining},
     year = {2021},
}

Files (19.0 MB)

Name: contextualizing-trending-entities.zip
Size: 19.0 MB
md5: 3cc71b7e1461637531143549f6c4c5ba
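
The published md5 checksum can be used to confirm that the archive downloaded intact. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and verify_download are hypothetical, and the archive is assumed to sit in the current working directory under the file name listed above.

```python
import hashlib
from pathlib import Path

# Checksum as published in the Files section of this record.
EXPECTED_MD5 = "3cc71b7e1461637531143549f6c4c5ba"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("contextualizing-trending-entities.zip")
    if archive.exists():
        print("checksum ok" if verify_download(archive) else "checksum mismatch")
```

Reading in chunks keeps memory use constant, which matters little for a 19.0 MB file but makes the helper reusable for larger archives.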