Dataset Open Access

Human and Machine Judgements for Russian Semantic Relatedness

Panchenko, Alexander; Ustalov, Dmitry; Arefyev, Nikolay; Paperno, Denis; Konstantinova, Natalia; Loukachevitch, Natalia; Biemann, Chris


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">semantic relatedness</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">semantic similarity</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">distributional semantics</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">word2vec</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">russian language</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">evaluation</subfield>
  </datafield>
  <controlfield tag="005">20190410034845.0</controlfield>
  <controlfield tag="001">163857</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Ustalov, Dmitry</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Arefyev, Nikolay</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Paperno, Denis</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Konstantinova, Natalia</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Loukachevitch, Natalia</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Biemann, Chris</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1884528636</subfield>
    <subfield code="z">md5:530d03982f35552c762a64a0a7f8417b</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/all.norm-sz500-w10-cb0-it3-min5.w2v.vocab_1100000_similar250.gz</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">15205</subfield>
    <subfield code="z">md5:2ccd92a09182b581423bb86186e9b394</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1091</subfield>
    <subfield code="z">md5:1ed4b7d36b485b3e8bbc151140e6b50e</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj-mc.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2379</subfield>
    <subfield code="z">md5:fed9990e1e4b67cd415ae4862dde4e95</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj-rg.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">9532</subfield>
    <subfield code="z">md5:c6d3437cd205fbd53c622bfcf99ee4cf</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj-wordsim353-relatedness.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">7391</subfield>
    <subfield code="z">md5:365c7be0528a28328f252a9c57ef3ed1</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj-wordsim353-similarity.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">594231</subfield>
    <subfield code="z">md5:7cf92fd365d18105cbdb3226957e7239</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/mj.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">4109127</subfield>
    <subfield code="z">md5:826bc67f4a774a9f9ef167b784894ba6</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/rt.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">350485</subfield>
    <subfield code="z">md5:d065c9dae0e6ee06f74f0e4e582b7550</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/rt-test.csv</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2016-11-26</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-zenodo</subfield>
    <subfield code="o">oai:zenodo.org:163857</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Panchenko, Alexander</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Human and Machine Judgements for Russian Semantic Relatedness</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-zenodo</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples (word_i, word_j, similarity_ij). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.&lt;/p&gt;

&lt;p&gt;For more details see: &lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The web page of the RUSSE evaluation campaign: http://russe.nlpub.ru/downloads&lt;/li&gt;
	&lt;li&gt;The original publication "Panchenko A., Ustalov D., Arefyev N., Paperno D., Konstantinova N., Loukachevitch N. and Biemann C. Human and Machine Judgements about Russian Semantic Relatedness. In Proceedings of the 5th Conference on Analysis of Images, Social Networks and Texts (AIST'2016). Communications in Computer and Information Science (CCIS). Springer-Verlag Berlin Heidelberg": https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/aist_2016_hmj.pdf&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.163857</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
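
The file entries in the export above can be read programmatically. The following is a minimal sketch, not part of the original record, that lists each file's URL, size, and checksum from the 856 fields. It assumes that subfield `u` holds the download URL, `s` the size in bytes, and `z` the MD5 checksum, which is how the values in this record appear to be used. The function name `list_files` and the two-entry `SAMPLE` excerpt are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARC21 "slim" schema, as declared in the record.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Shortened excerpt of the export (two of the nine 856 fields), for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">15205</subfield>
    <subfield code="z">md5:2ccd92a09182b581423bb86186e9b394</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1091</subfield>
    <subfield code="z">md5:1ed4b7d36b485b3e8bbc151140e6b50e</subfield>
    <subfield code="u">https://zenodo.org/record/163857/files/hj-mc.csv</subfield>
  </datafield>
</record>"""


def list_files(marc_xml):
    """Return one dict per 856 field with keys url, size and md5.

    Assumption: subfield s is the size in bytes and subfield z is
    an MD5 checksum prefixed with "md5:", as in the record above.
    """
    root = ET.fromstring(marc_xml)
    files = []
    for field in root.findall("marc:datafield[@tag='856']", NS):
        # Map subfield code -> text for this field.
        sub = {
            s.get("code"): (s.text or "").strip()
            for s in field.findall("marc:subfield", NS)
        }
        files.append(
            {
                "url": sub["u"],
                "size": int(sub["s"]),
                # Drop the "md5:" prefix, keeping only the hex digest.
                "md5": sub["z"].split(":", 1)[-1],
            }
        )
    return files


if __name__ == "__main__":
    for entry in list_files(SAMPLE):
        print(entry["url"], entry["size"], entry["md5"])
```

The extracted digest can then be compared with the output of `hashlib.md5` over a downloaded file to confirm that the transfer was complete.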