Published February 19, 2021 | Version v1
Dataset Open

Pokémon Story Corpus

  • 1. University of Helsinki

Description

The larger corpus consists of fan-written stories about Pokémon. It is sentence- and word-tokenized, and the order of the sentences is shuffled for copyright reasons. The smaller corpus contains descriptions of the first 151 Pokémon.
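The record does not document the internal layout of the JSON files, so it may help to inspect the top-level structure before writing any processing code. The following is a minimal sketch: the helper names `load_corpus` and `describe_json` are illustrative, and the file name in the usage comment is an assumption taken from the preview title in the Files section.

```python
import json


def load_corpus(path):
    """Parse a JSON corpus file into Python objects (assumes UTF-8 encoding)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe_json(value, max_items=3):
    """Summarize the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(value, dict):
        sample_keys = list(value)[:max_items]
        return f"object with {len(value)} keys, e.g. {sample_keys}"
    if isinstance(value, list):
        return f"array with {len(value)} items"
    return f"scalar of type {type(value).__name__}"


# Hypothetical usage; the path is an assumption, not confirmed by the record:
# corpus = load_corpus("pokemon-descriptioncorpus.json")
# print(describe_json(corpus))
```

The larger file is 2.3 GB, so parsing it in one call with `json.load` may need several times that much memory. Trying the approach on the small description corpus first is the safer choice.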

Sources: https://www.fanfiction.net/ and https://www.giantbomb.com/

Please cite the following paper if you use the resources:

Hämäläinen, M., Alnajjar, K. & Partanen, N. (2021). Nettikorpuksen avulla tuotettuja sanavektorimalleja Pokémonien ominaisuuksien kuvaamiseksi. In Saarikivi, T. & Saarikivi, J. (eds.) Turhan tiedon kirja — Tutkimuksista pois jätettyjä sivuja. pp. 199-214. SKS Kirjat.

Translation of the paper in English

 

Files

pokemon-descriptioncorpus.json

Files (2.3 GB)

Size       Checksum
253.4 kB   md5:a88d22c82736f01053c1b4b029d0b867
2.3 GB     md5:1ce50304d97164aced67be2e784ae9d4
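The MD5 checksums listed above can be used to confirm that a download completed without corruption, which matters most for the 2.3 GB file. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the local file path in the usage comment is an assumption, since the record does not pair file names with checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Hypothetical usage; the local path is an assumption:
# ok = matches_checksum("pokemon-descriptioncorpus.json",
#                       "md5:a88d22c82736f01053c1b4b029d0b867")
# print("checksum OK" if ok else "checksum mismatch, download again")
```

MD5 is adequate here for detecting accidental corruption in transit. It is not a safeguard against deliberate tampering.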