Dataset Open Access

Pokémon Story Corpus

Hämäläinen, Mika; Alnajjar, Khalid; Partanen, Niko

The larger corpus consists of fan-written stories about Pokémon. It is sentence- and word-tokenized, and the order of its sentences is shuffled for copyright reasons. The smaller corpus contains descriptions of the first 151 Pokémon.
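Because the story corpus is large (2.3 GB), it is probably easier to stream it than to load it into memory. The sketch below assumes one sentence per line with tokens separated by whitespace; this layout is an assumption and is not documented on this page, so inspect the first few lines of the file before relying on it. The function name iter_tokenized_sentences is hypothetical.

```python
def iter_tokenized_sentences(path, encoding="utf-8"):
    """Yield each non-empty line of the file as a list of tokens.

    ASSUMPTION: one sentence per line, tokens separated by whitespace.
    The actual file layout is not specified in the dataset description.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            tokens = line.split()
            if tokens:  # skip blank lines
                yield tokens


if __name__ == "__main__":
    # Print the first three sentences without reading the whole file.
    for index, sentence in enumerate(iter_tokenized_sentences("pokemon-storycorpus.txt")):
        print(sentence)
        if index == 2:
            break
```

Since the sentence order is shuffled, consecutive lines should not be treated as belonging to the same story.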

Sources: https://www.fanfiction.net/ and https://www.giantbomb.com/

Please cite the following paper if you use the resources:

Hämäläinen, M., Alnajjar, K. & Partanen, N. (2021). Nettikorpuksen avulla tuotettuja sanavektorimalleja Pokémonien ominaisuuksien kuvaamiseksi. In Saarikivi, T. & Saarikivi, J. (eds.) Turhan tiedon kirja — Tutkimuksista pois jätettyjä sivuja. pp. 199-214. SKS Kirjat.

Translation of the paper in English

 

Files (2.3 GB)

Name                              Size       MD5
pokemon-descriptioncorpus.json    253.4 kB   a88d22c82736f01053c1b4b029d0b867
pokemon-storycorpus.txt           2.3 GB     1ce50304d97164aced67be2e784ae9d4
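The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and verify_downloads are hypothetical, and the code assumes both files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "pokemon-descriptioncorpus.json": "a88d22c82736f01053c1b4b029d0b867",
    "pokemon-storycorpus.txt": "1ce50304d97164aced67be2e784ae9d4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use flat even for the 2.3 GB story corpus.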