Dataset Open Access

Contextualizing Trending Entities in News Stories

Ponza, Marco; Ceccarelli, Diego; Ferragina, Paolo; Meij, Edgar; Kothari, Sambhav


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4422045", 
  "language": "eng", 
  "title": "Contextualizing Trending Entities in News Stories", 
  "issued": {
    "date-parts": [
      [
        2021, 
        1, 
        6
      ]
    ]
  }, 
  "abstract": "<p>This repository contains the enrichments for the dataset <a href=\"https://catalog.ldc.upenn.edu/LDC2008T19\">The New York Times Annotated Corpus</a> developed for the paper:</p>\n\n<p>&ldquo;Marco Ponza, Diego Ceccarelli, Paolo Ferragina, Edgar Meij, Sambhav Kothari. Contextualizing Trending Entities in News Stories. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM 2021).&rdquo;</p>\n\n<p>It includes a total of 149 trends constituted by 120K entities. The goal is to retrieve a set of entities ranked with respect to their usefulness in explaining why a given trending entity is actually trending.</p>\n\n<p><strong>Format</strong></p>\n\n<p>The repository contains the enrichments in JSON format.</p>\n\n<p>The news stories of the New York Times from which these enrichments have been developed are available from <a href=\"https://catalog.ldc.upenn.edu/LDC2008T19\">LDC</a>.</p>\n\n<p><strong>Data Splits</strong></p>\n\n<p>We perform two kinds of evaluation.</p>\n\n<ol>\n\t<li>Unsupervised evaluation, where we use the complete dataset of 149 trends as a benchmark.</li>\n\t<li>Supervised evaluation, where we train/tune our models on a training/development set and we test them on a test set.</li>\n</ol>\n\n<ul>\n\t<li>The training set contains 50 trends constituted by 36.3K entities from 1996 to 2000.</li>\n\t<li>The development set contains 34 trends constituted by 26.7K entities from 2000 to 2002.</li>\n\t<li>The test set contains 65 trends constituted by 57K entities from 2002 to 2007.</li>\n</ul>\n\n<p><strong>Use</strong></p>\n\n<p>Please cite the dataset and the accompanying paper if you find the resources in this repository useful:</p>\n\n<p>@inproceedings{ponza2021,<br>\n&nbsp;&nbsp;&nbsp;&nbsp; Title = {Contextualizing Trending Entities in News Stories},<br>\n&nbsp;&nbsp;&nbsp;&nbsp; author = {Ponza, Marco and Ceccarelli, Diego and Ferragina, Paolo and Meij, Edgar and Kothari, Sambhav},<br>\n&nbsp;&nbsp;&nbsp;&nbsp; Booktitle = {Proceedings of the 14th ACM International Conference on Web Search and Data Mining},<br>\n&nbsp;&nbsp;&nbsp;&nbsp; Year = {2021},<br>\n}</p>", 
  "author": [
    {
      "family": "Ponza", 
      "given": "Marco"
    }, 
    {
      "family": "Ceccarelli", 
      "given": "Diego"
    }, 
    {
      "family": "Ferragina", 
      "given": "Paolo"
    }, 
    {
      "family": "Meij", 
      "given": "Edgar"
    }, 
    {
      "family": "Kothari", 
      "given": "Sambhav"
    }
  ], 
  "id": "4422045", 
  "version": "v1", 
  "type": "dataset", 
  "event": "14th ACM International Conference on Web Search and Data Mining (WSDM 2021)"
}
