WikiMorph: Learning to Decompose Words into Morphological Structures

Jeff Yarbro; Andrew Olney

WikiMorph is a JSON dataset of word breakdowns for English words. Each breakdown consists primarily of morphological compounds, drawn both from English and from the word's etymology, with each compound paired with its definition. Entries also include other potentially useful fields, such as syllables and part-of-speech tags. The dataset contains 505,033 entries covering 355,782 unique words, so a word may appear in more than one entry. The data collection process is described in the paper "WikiMorph: Learning to Decompose Words into Morphological Structures"; the dataset has received some additional updates since publication.
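The entry and word counts above can be checked with a short script. This is a minimal sketch that assumes the file is a single top-level JSON array of entry objects shaped like the example entry on this page; the page does not state the file layout, and the helper names load_entries and summarize are hypothetical. Note that json.load reads the whole 607.2 MB file into memory, which may require several gigabytes of RAM.

```python
import json
from collections import Counter


def load_entries(path):
    """Load every WikiMorph entry into memory.

    ASSUMPTION: the file is one top-level JSON array of entry objects
    shaped like the example entry; this page does not state the layout.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(entries):
    """Return (number of unique words, number of entries, PoS counts)."""
    unique_words = {entry["Word"] for entry in entries}
    pos_counts = Counter(entry.get("PoS") for entry in entries)
    return len(unique_words), len(entries), pos_counts
```

Usage: n_unique, n_total, pos_counts = summarize(load_entries("wiki_morph.json")). If the layout assumption holds, n_unique and n_total should match the 355,782 unique words and 505,033 entries stated above.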

Example entry for the word "abduction":

    {
        "Word": "abduction",
        "PoS": "Noun",
        "Syllables": [
            "ab",
            "duc",
            "tion"
        ],
        "Definition": "The act of abducing or abducting; a drawing apart; the movement which separates a limb or other part from the axis, or middle line, of the body.",
        "Morphemes": [
            {
                "Affix": "abduct",
                "Language": "en",
                "PoS": "Verb",
                "Meaning": "To draw away, as a limb or other part, from the median axis of the body.",
                "Etymology Compounds": [
                    {
                        "Affix": "ab",
                        "Language": "la",
                        "Decoded": "ab",
                        "PoS": null,
                        "Meaning": "away"
                    },
                    {
                        "Affix": "duco",
                        "Language": "la",
                        "Decoded": "duco",
                        "PoS": null,
                        "Meaning": "to lead"
                    }
                ]
            },
            {
                "Affix": "-ion",
                "Language": "en",
                "PoS": "Suffix",
                "Meaning": "an action or process, or the result of an action or process",
                "Etymology Compounds": [
                    {
                        "Affix": "-iō",
                        "Language": "la",
                        "Decoded": "-io",
                        "PoS": "Suffix",
                        "Meaning": "Used to form abstract nouns from verbs."
                    }
                ]
            }
        ]
    }
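Each entry nests its breakdown two levels deep: the word's direct morphemes under "Morphemes", and each morpheme's etymological sources under "Etymology Compounds". The sketch below walks that nesting for the "abduction" example. It assumes, as in the example, that etymology compounds do not nest any further; the function name morpheme_chain is hypothetical and not part of the dataset.

```python
def morpheme_chain(entry):
    """Yield (depth, affix, language, meaning) for each morpheme in an entry.

    Depth 0 is a direct morpheme of the word; depth 1 is an etymology
    compound nested under it. Assumes, as in the example, that etymology
    compounds are not nested any further.
    """
    for morpheme in entry.get("Morphemes") or []:
        yield 0, morpheme["Affix"], morpheme["Language"], morpheme.get("Meaning")
        for compound in morpheme.get("Etymology Compounds") or []:
            yield 1, compound["Affix"], compound["Language"], compound.get("Meaning")


# Trimmed copy of the "abduction" entry (Syllables, PoS, Definition and
# Decoded omitted because this walk does not use them).
abduction = {
    "Word": "abduction",
    "Morphemes": [
        {
            "Affix": "abduct",
            "Language": "en",
            "Meaning": "To draw away, as a limb or other part, from the median axis of the body.",
            "Etymology Compounds": [
                {"Affix": "ab", "Language": "la", "Meaning": "away"},
                {"Affix": "duco", "Language": "la", "Meaning": "to lead"},
            ],
        },
        {
            "Affix": "-ion",
            "Language": "en",
            "Meaning": "an action or process, or the result of an action or process",
            "Etymology Compounds": [
                {"Affix": "-iō", "Language": "la", "Meaning": "Used to form abstract nouns from verbs."},
            ],
        },
    ],
}

# Print the breakdown as an indented tree.
for depth, affix, language, meaning in morpheme_chain(abduction):
    print("  " * depth + f"{affix} [{language}]: {meaning}")
```

This prints "abduct" and "-ion" at the top level, each followed by its Latin sources indented beneath it.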

 

Files
wiki_morph.json (607.2 MB)
MD5 checksum: bd6538695555bea131c0261e1c89c2f8
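A downloaded copy can be checked against the MD5 checksum listed for wiki_morph.json using Python's standard hashlib module. This is a minimal sketch; the helper names md5sum and verify are hypothetical. Reading the file in 1 MiB chunks keeps memory use small even for the 607.2 MB file.

```python
import hashlib

# Checksum listed for wiki_morph.json on this page.
EXPECTED_MD5 = "bd6538695555bea131c0261e1c89c2f8"


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

Usage: verify("wiki_morph.json") returns True only if the download is byte-for-byte identical to the published file.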