Published June 12, 2021 | Version v1 | Dataset | Open
WikiMorph: Learning to Decompose Words into Morphological Structures
Description
WikiMorph is a JSON dataset of word breakdowns for English words. Each breakdown consists primarily of morphological compounds (drawn both from English and from the word's etymology), each paired with its definition. Entries also include other fields, such as syllables and part-of-speech tags. The dataset contains 505,033 entries covering 355,782 unique words. The data collection process is described in the paper "WikiMorph: Learning to Decompose Words into Morphological Structures", with some additional updates made after publication. A sample entry:
{
  "Word": "abduction",
  "PoS": "Noun",
  "Syllables": [
    "ab",
    "duc",
    "tion"
  ],
  "Definition": "The act of abducing or abducting; a drawing apart; the movement which separates a limb or other part from the axis, or middle line, of the body.",
  "Morphemes": [
    {
      "Affix": "abduct",
      "Language": "en",
      "PoS": "Verb",
      "Meaning": "To draw away, as a limb or other part, from the median axis of the body.",
      "Etymology Compounds": [
        {
          "Affix": "ab",
          "Language": "la",
          "Decoded": "ab",
          "PoS": null,
          "Meaning": "away"
        },
        {
          "Affix": "duco",
          "Language": "la",
          "Decoded": "duco",
          "PoS": null,
          "Meaning": "to lead"
        }
      ]
    },
    {
      "Affix": "-ion",
      "Language": "en",
      "PoS": "Suffix",
      "Meaning": "an action or process, or the result of an action or process",
      "Etymology Compounds": [
        {
          "Affix": "-iō",
          "Language": "la",
          "Decoded": "-io",
          "PoS": "Suffix",
          "Meaning": "Used to form abstract nouns from verbs."
        }
      ]
    }
  ]
}
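The nested layout of the sample entry (top-level "Morphemes", each with optional "Etymology Compounds") can be walked recursively. The following is a minimal sketch, not part of the dataset's tooling: the helper name `iter_morphemes` is hypothetical, the embedded `SAMPLE` is a trimmed copy of the entry above, and the code assumes every entry follows the same field names as that single example. How entries are arranged at the top level of wiki_morph.json is not stated here, so the sketch operates on one entry at a time.

```python
# Trimmed copy of the sample entry shown above (only the fields used below).
SAMPLE = {
    "Word": "abduction",
    "Morphemes": [
        {
            "Affix": "abduct",
            "Language": "en",
            "Meaning": "To draw away, as a limb or other part, "
                       "from the median axis of the body.",
            "Etymology Compounds": [
                {"Affix": "ab", "Language": "la", "Meaning": "away"},
                {"Affix": "duco", "Language": "la", "Meaning": "to lead"},
            ],
        },
        {
            "Affix": "-ion",
            "Language": "en",
            "Meaning": "an action or process, or the result of an action or process",
            "Etymology Compounds": [
                {"Affix": "-iō", "Language": "la",
                 "Meaning": "Used to form abstract nouns from verbs."},
            ],
        },
    ],
}


def iter_morphemes(entry):
    """Yield (depth, affix, language, meaning) for every morpheme in an entry.

    Depth 0 is a top-level morpheme; deeper levels are its etymology compounds.
    Hypothetical helper: assumes the field names seen in the sample entry.
    """
    def walk(nodes, depth):
        for node in nodes:
            yield depth, node["Affix"], node["Language"], node.get("Meaning")
            # A missing or null "Etymology Compounds" field is treated as empty.
            yield from walk(node.get("Etymology Compounds") or [], depth + 1)

    yield from walk(entry.get("Morphemes") or [], 0)


rows = list(iter_morphemes(SAMPLE))
for depth, affix, language, meaning in rows:
    print("  " * depth + f"{affix} [{language}]: {meaning}")
```

The recursion does not assume a fixed nesting depth, so it would also handle compounds that themselves carry an "Etymology Compounds" list, should such entries exist.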
Files
Name | Size | Checksum |
---|---|---|
wiki_morph.json | 607.2 MB | md5:bd6538695555bea131c0261e1c89c2f8 |
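Since the record lists an MD5 checksum for wiki_morph.json, a download can be checked against it before use. The sketch below uses Python's standard `hashlib` and reads the file in chunks so the 607.2 MB file is never held in memory at once. The function names `md5_of_file` and `verify_download` are hypothetical, and the script assumes the file was saved under its original name in the current directory.

```python
import hashlib
import os

# MD5 checksum listed for wiki_morph.json in the Files section.
EXPECTED_MD5 = "bd6538695555bea131c0261e1c89c2f8"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    local_path = "wiki_morph.json"
    if os.path.isfile(local_path):
        status = "OK" if verify_download(local_path) else "MISMATCH"
        print(f"{local_path}: {status}")
```

A mismatch usually points to an incomplete or corrupted download rather than a problem with the dataset itself.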