There is a newer version of the record available.

Published August 24, 2022 | Version v1
Dataset Open

Long document similarity dataset, Wikipedia excerptions for movies collections

Creators

Description

Movie-related articles extracted from Wikipedia.

For all articles, the figures and tables have been filtered out, as well as the categories and "see also" sections.

The article structure, in particular the sub-titles and paragraphs, is kept in these datasets.

Movies

The Wikipedia Movies dataset consists of 100,371 articles describing various movies. Each article may consist of text passages describing the plot, cast, production, reception, soundtrack, and more.

Files

movies.txt (448.1 MB)
md5:26f8a220e12013a7962f5e3b804285da
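
Because the file is large, it may be worth checking a downloaded copy against the md5 checksum listed for movies.txt. The following is a minimal sketch, not part of the record itself: the local path `movies.txt` and the helper names `md5_of_file` and `verify_download` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum listed for movies.txt in the Files section of this record.
EXPECTED_MD5 = "26f8a220e12013a7962f5e3b804285da"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so the whole 448.1 MB file is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    local_copy = Path("movies.txt")  # assumed download location
    if local_copy.exists():
        print("checksum OK" if verify_download(local_copy) else "checksum mismatch")
    else:
        print(f"{local_copy} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, or to a copy that comes from a different version of the record.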