Dataset Open Access

FinMeter models

Hämäläinen, Mika; Alnajjar, Khalid


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">fin</subfield>
  </datafield>
  <controlfield tag="005">20200124192514.0</controlfield>
  <controlfield tag="001">3473456</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Helsinki</subfield>
    <subfield code="0">(orcid)0000-0002-7986-2994</subfield>
    <subfield code="a">Alnajjar, Khalid</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">5393166296</subfield>
    <subfield code="z">md5:d72ddc55d7f32e26dcb11e2f2b5c138d</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/en.bin</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1497954280</subfield>
    <subfield code="z">md5:4c1d1570e1f7456f3a48d92868f0fa62</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/es.bin</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1824059</subfield>
    <subfield code="z">md5:836745563679b08550de13bb7713e227</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/fi_concreteness.txt</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2681551329</subfield>
    <subfield code="z">md5:882670227a07af80d23852f9051b61cf</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/fin-word2vec-lemma.bin</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">663652524</subfield>
    <subfield code="z">md5:549ef9dfec64d5e6febedcf7e19ba1f3</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/rel_matrix_n_csr.hkl</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">801539</subfield>
    <subfield code="z">md5:40199a8b76838f5faaf295f1832dd747</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/unigrams_sorted_5k.txt</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-10-04</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:3473456</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Helsinki</subfield>
    <subfield code="0">(orcid)0000-0001-9315-1278</subfield>
    <subfield code="a">Hämäläinen, Mika</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">FinMeter models</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This contains data files needed for FinMeter.&lt;/p&gt;

&lt;p&gt;This data is complementary for FinMeter Python library described in:&lt;/p&gt;

&lt;p&gt;Mika H&amp;auml;m&amp;auml;l&amp;auml;inen and Khalid Alnajjar (2019).&amp;nbsp;Let&amp;#39;s FACE it. Finnish Poetry Generation with Aesthetics and Framing. In &lt;em&gt;Proceedings of the 12th International Conference on Natural Language Generation&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Sources:&lt;/p&gt;

&lt;p&gt;The pretrained vectors for Finnish (es.bin; the file name is a known misnomer) and English (en.bin) are from&amp;nbsp;E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov,&amp;nbsp;&lt;em&gt;&lt;a href="https://arxiv.org/abs/1802.06893"&gt;Learning Word Vectors for 157 Languages&lt;/a&gt;&lt;/em&gt;.&amp;nbsp;Creative Commons Attribution-Share-Alike License 3.0. See&amp;nbsp;&lt;a href="https://fasttext.cc/docs/en/crawl-vectors.html"&gt;https://fasttext.cc/docs/en/crawl-vectors.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The word2vec model trained on the Finnish Internet ParseBank is from&amp;nbsp;Kanerva, Jenna; Luotolahti, Juhani; Laippala, Veronika; Ginter, Filip: Syntactic N-gram Collection from a Large-Scale Corpus of Internet Finnish. Proceedings of the Sixth International Conference Baltic HLT. 2014.&amp;nbsp;&lt;a href="http://ebooks.iospress.nl/volumearticle/38025"&gt;paper&lt;/a&gt;.&amp;nbsp;&amp;nbsp;Creative Commons Attribution-ShareAlike 4.0 International License. See&amp;nbsp;&lt;a href="http://bionlp.utu.fi/finnish-internet-parsebank.html"&gt;http://bionlp.utu.fi/finnish-internet-parsebank.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Finnish concreteness data has been&amp;nbsp;automatically translated from&amp;nbsp;Brysbaert, Marc, Amy Beth Warriner, and Victor Kuperman. &amp;quot;&lt;a href="http://crr.ugent.be/papers/Brysbaert_Warriner_Kuperman_BRM_Concreteness_ratings.pdf"&gt;Concreteness ratings for 40 thousand generally known English word lemmas.&lt;/a&gt;&amp;quot;&amp;nbsp;&lt;em&gt;Behavior research methods&lt;/em&gt;&amp;nbsp;46.3 (2014): 904-911.&amp;nbsp;Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. See&amp;nbsp;&lt;a href="http://crr.ugent.be/archives/1330"&gt;http://crr.ugent.be/archives/1330&lt;/a&gt;&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isSupplementedBy</subfield>
    <subfield code="a">10.5281/zenodo.3473449</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3473455</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3473456</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
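
Each 856 datafield in the export above pairs a download URL (subfield `u`) with what appears to be the file size in bytes (subfield `s`) and an `md5:`-prefixed checksum (subfield `z`). The following is a minimal sketch, not part of the record or of FinMeter, of how those fields could be read and used to check a downloaded file. The helper names `list_files` and `md5_matches` are hypothetical, and the interpretation of `s` as a byte count is an assumption based on this export only.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace used by the MARC21 "slim" XML export shown above.
MARC = "{http://www.loc.gov/MARC21/slim}"

# One 856 entry copied from the record, used here as sample input.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1824059</subfield>
    <subfield code="z">md5:836745563679b08550de13bb7713e227</subfield>
    <subfield code="u">https://zenodo.org/record/3473456/files/fi_concreteness.txt</subfield>
  </datafield>
</record>"""


def list_files(marc_xml):
    """Return one dict per 856 datafield: url, size (assumed bytes), md5 hex digest."""
    root = ET.fromstring(marc_xml)
    files = []
    for field in root.iter(MARC + "datafield"):
        if field.get("tag") != "856":
            continue
        # Map subfield code -> text for this datafield.
        sub = {s.get("code"): (s.text or "") for s in field.findall(MARC + "subfield")}
        files.append({
            "url": sub.get("u", ""),
            "size": int(sub.get("s", "0")),
            # Drop the "md5:" prefix, keeping only the hex digest.
            "md5": sub.get("z", "").split(":", 1)[-1],
        })
    return files


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Hash the file at `path` in chunks and compare with the expected hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

After downloading a file, `md5_matches(local_path, entry["md5"])` with an `entry` from `list_files(...)` would report whether the local copy matches the checksum listed in the record. Hashing in chunks keeps memory use flat, which matters for the multi-gigabyte `.bin` files.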
Statistic         All versions   This version
Views             50             50
Downloads         163            163
Data volume       210.1 GB       210.1 GB
Unique views      45             45
Unique downloads  48             48
