Published August 24, 2022 | Version v1
Dataset
Open
Long document similarity dataset, Wikipedia excerptions for movies collections
Creators
Description
Movies-related articles extracted from Wikipedia.
For all articles, the figures and tables have been filtered out, as well as the categories and "see also" sections.
The article structure, particularly the subtitles and paragraphs, is preserved in the dataset.
Movies
The Wikipedia Movies dataset consists of 100,371 articles describing various movies. Each article may consist of text passages describing the plot, cast, production, reception, soundtrack, and more.
Files
Name | Size | Checksum |
---|---|---|
movies.txt | 448.1 MB | md5:26f8a220e12013a7962f5e3b804285da |
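Because the record publishes an MD5 checksum for movies.txt, a downloaded copy can be checked for corruption before use. The following is a minimal sketch, assuming the file has been saved locally as `movies.txt`; the helper names `md5_of_file` and `verify_download` are illustrative and not part of the dataset. The file is read in chunks so that the 448.1 MB download is never loaded into memory at once.

```python
import hashlib

# Checksum as listed on the dataset record (without the "md5:" prefix).
EXPECTED_MD5 = "26f8a220e12013a7962f5e3b804285da"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("movies.txt")` should return `True` for an intact download; the local path is an assumption about where the file was saved.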