Dataset Open Access

PrevDistro - Preverb Distributions in Hungarian

Kalivoda, Ágnes


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.6349410</identifier>
  <creators>
    <creator>
      <creatorName>Kalivoda, Ágnes</creatorName>
      <givenName>Ágnes</givenName>
      <familyName>Kalivoda</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2520-5523</nameIdentifier>
      <affiliation>Hungarian Research Centre for Linguistics</affiliation>
    </creator>
  </creators>
  <titles>
    <title>PrevDistro - Preverb Distributions in Hungarian</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2021</publicationYear>
  <subjects>
    <subject>linguistics</subject>
    <subject>Hungarian</subject>
    <subject>preverb constructions</subject>
    <subject>preverb</subject>
    <subject>verbal prefix</subject>
    <subject>verbal particle</subject>
    <subject>construction</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2021-06-21</date>
  </dates>
  <language>hu</language>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/6349410</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf" resourceTypeGeneral="Text">10.15774/PPKE.BTK.2021.019</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.6349409</relatedIdentifier>
  </relatedIdentifiers>
  <version>2.0.0</version>
  <rightsList>
    <rights rightsURI="https://opensource.org/licenses/GPL-3.0">GNU General Public License v3.0 or later</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;PrevDistro (Preverb Distributions) is an open-source dataset containing 41.5 million corpus occurrences of 49 preverb-verb construction types. It consists of the following columns:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;1 &lt;em&gt;sid&lt;/em&gt;: ID&lt;/li&gt;
	&lt;li&gt;2 &lt;em&gt;constype&lt;/em&gt;: construction type&lt;/li&gt;
	&lt;li&gt;3 &lt;em&gt;subtype&lt;/em&gt;: construction subtype&lt;/li&gt;
	&lt;li&gt;4 &lt;em&gt;prevpos&lt;/em&gt;: preverb position&lt;/li&gt;
	&lt;li&gt;5 &lt;em&gt;prev&lt;/em&gt;: preverb&lt;/li&gt;
	&lt;li&gt;6 &lt;em&gt;verb&lt;/em&gt;: verb lemma&lt;/li&gt;
	&lt;li&gt;7 &lt;em&gt;intervening&lt;/em&gt;: intervening words (as lemmas)&lt;/li&gt;
	&lt;li&gt;8 &lt;em&gt;actform&lt;/em&gt;: actual form (the same content as column 10, but lowercased)&lt;/li&gt;
	&lt;li&gt;9 &lt;em&gt;left&lt;/em&gt;: left context&lt;/li&gt;
	&lt;li&gt;10 &lt;em&gt;kwic&lt;/em&gt;: keyword in context&lt;/li&gt;
	&lt;li&gt;11 &lt;em&gt;right&lt;/em&gt;: right context&lt;/li&gt;
	&lt;li&gt;12 &lt;em&gt;docid&lt;/em&gt;: document ID from the Hungarian Gigaword Corpus&lt;/li&gt;
	&lt;li&gt;13 &lt;em&gt;title&lt;/em&gt;: document title&lt;/li&gt;
	&lt;li&gt;14 &lt;em&gt;style&lt;/em&gt;: document style (e.g. official, press, ...)&lt;/li&gt;
	&lt;li&gt;15 &lt;em&gt;region&lt;/em&gt;: document region (e.g. Transylvania, Subcarpathia, ...)&lt;/li&gt;
	&lt;li&gt;16 &lt;em&gt;year&lt;/em&gt;: year of publication (sometimes several years can be found in one document)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first row is the header. If a cell&amp;#39;s value is unspecified, it is marked with an underscore (_).&lt;/p&gt;</description>
    <description descriptionType="Other">PrevDistro 1.0.0 (deprecated) can be found at https://science-data.hu/dataset.xhtml?persistentId=doi:10.5072/FK2/TRSD50
In PrevDistro 2.0.0, several new columns were added and some fixes were made to the existing data.</description>
  </descriptions>
</resource>
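
The column layout described in the abstract can be read with a few lines of Python. The following is a minimal sketch, not the author's tooling. It assumes the data file is tab-separated, which the record does not state. The function name `read_prevdistro` and the sample row are invented for illustration. Only the 16 column names, the header row, and the underscore convention for unspecified cells come from the description.

```python
import csv
import io

# Column names in the order listed in the dataset description.
COLUMNS = [
    "sid", "constype", "subtype", "prevpos", "prev", "verb",
    "intervening", "actform", "left", "kwic", "right",
    "docid", "title", "style", "region", "year",
]


def read_prevdistro(handle, delimiter="\t"):
    """Yield one dict per data row, mapping "_" cells to None.

    ASSUMPTION: the tab delimiter is a guess; the record does not name one.
    """
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader)  # the first row is the header
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    for row in reader:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(row)}")
        # An underscore marks an unspecified value.
        yield {
            name: (None if cell == "_" else cell)
            for name, cell in zip(COLUMNS, row)
        }


# Invented sample (not taken from the dataset): a header plus one row.
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join([
    "1", "_", "_", "_", "meg", "csinál", "_", "megcsinálta",
    "_", "Megcsinálta", "_", "doc1", "_", "press", "_", "2010",
]) + "\n"

rows = list(read_prevdistro(io.StringIO(SAMPLE)))
print(rows[0]["prev"], rows[0]["verb"], rows[0]["constype"])
```

For the real file, pass an open text handle instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the download was saved. Because the function is a generator, rows are streamed rather than loaded at once, which matters for a dataset of this size.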
                   All versions   This version
Views              39             39
Downloads          3              3
Data volume        39.7 GB        39.7 GB
Unique views       28             28
Unique downloads   3              3
