Published May 27, 2024 | Version v2
Dataset
Open
ETimeline: An Extensive Timeline Generation Dataset based on Large Language Model
Creators
Contributors
Contact person:
Description
Timeline generation is important for understanding how events develop over time. Its goal is to organize news chronologically, which helps reveal patterns and trends that may be obscured when news items are viewed in isolation, and makes it easier to track how stories develop and how key events relate to one another. Timelines already appear in many commercial products, but academic research in this area is noticeably scarce, and existing datasets need improvement in both effectiveness and scale. We propose the ETimeline dataset, which contains over 13,000 news articles covering 600 bilingual timelines across 23 news domains. We collected more than 120,000 news articles as a candidate news pool and used a large language model (LLM) pipeline to enhance performance, ultimately obtaining ETimeline; the news pool data will also be provided. This work contributes to the advancement of timeline generation research and supports a wide range of tasks, including topic generation and event relationship analysis. We believe this dataset will serve as a catalyst for innovative research and help bridge the gap between academia and industry in understanding the practical application of technology services.
Files
(9.1 MB)
| Name | Size |
|---|---|
| md5:b9b01a89cf9557a96b207d41294e2250 | 9.1 MB |
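The record lists an MD5 digest for the downloadable file, so a local copy can be checked against it after download. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and the local file path is an assumption, since the file name is not shown on this page.

```python
import hashlib

# MD5 digest as listed in the Files table of this record.
EXPECTED_MD5 = "b9b01a89cf9557a96b207d41294e2250"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never fully loaded in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("path/to/downloaded_file")` (a hypothetical path) should return `True` only when the local file is byte-identical to the published one. MD5 is suitable here for detecting corrupted or incomplete downloads, not as a defense against deliberate tampering.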