Dataset Open Access

FinMeter models

Hämäläinen, Mika; Alnajjar, Khalid

This record contains the data files needed by FinMeter.

These files complement the FinMeter Python library, which is described in:

Mika Hämäläinen and Khalid Alnajjar (2019). Let's FACE it. Finnish Poetry Generation with Aesthetics and Framing. In Proceedings of the 12th International Conference on Natural Language Generation.

Sources:

The pretrained vectors for Finnish (distributed here as es.bin, despite the "es" prefix) and English (en.bin) are from E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages. Creative Commons Attribution-Share-Alike License 3.0. See https://fasttext.cc/docs/en/crawl-vectors.html

The word2vec model trained on the Finnish Internet ParseBank is from Kanerva, Jenna; Luotolahti, Juhani; Laippala, Veronika; Ginter, Filip: Syntactic N-gram Collection from a Large-Scale Corpus of Internet Finnish. Proceedings of the Sixth International Conference Baltic HLT. 2014. Creative Commons Attribution-ShareAlike 4.0 International License. See http://bionlp.utu.fi/finnish-internet-parsebank.html

The Finnish concreteness data has been automatically translated from Brysbaert, Marc, Amy Beth Warriner, and Victor Kuperman. "Concreteness ratings for 40 thousand generally known English word lemmas." Behavior Research Methods 46.3 (2014): 904-911. Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. See http://crr.ugent.be/archives/1330

Files (10.2 GB)

Name                      Size       MD5
en.bin                    5.4 GB     d72ddc55d7f32e26dcb11e2f2b5c138d
es.bin                    1.5 GB     4c1d1570e1f7456f3a48d92868f0fa62
fi_concreteness.txt       1.8 MB     836745563679b08550de13bb7713e227
fin-word2vec-lemma.bin    2.7 GB     882670227a07af80d23852f9051b61cf
rel_matrix_n_csr.hkl      663.7 MB   549ef9dfec64d5e6febedcf7e19ba1f3
unigrams_sorted_5k.txt    801.5 kB   40199a8b76838f5faaf295f1832dd747
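
Because several of these files are multiple gigabytes, it can be worth checking a download against the MD5 checksums listed with the files above before using it. The following is a minimal sketch, not part of FinMeter itself: the function names md5_of_file and verify_downloads are hypothetical, and it assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "en.bin": "d72ddc55d7f32e26dcb11e2f2b5c138d",
    "es.bin": "4c1d1570e1f7456f3a48d92868f0fa62",
    "fi_concreteness.txt": "836745563679b08550de13bb7713e227",
    "fin-word2vec-lemma.bin": "882670227a07af80d23852f9051b61cf",
    "rel_matrix_n_csr.hkl": "549ef9dfec64d5e6febedcf7e19ba1f3",
    "unigrams_sorted_5k.txt": "40199a8b76838f5faaf295f1832dd747",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Hypothetical usage: check the files in the current directory.
    for name, status in verify_downloads(".").items():
        print(f"{name}: {status}")
```

Reading in fixed-size chunks keeps memory use small even for the 5.4 GB file; a "mismatch" result usually points to an interrupted or corrupted download.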
                    All versions   This version
Views               33             33
Downloads           151            151
Data volume         207.4 GB       207.4 GB
Unique views        28             28
Unique downloads    40             40
