Published June 12, 2021 | Version v1
Dataset Open

WikiMorph: Learning to Decompose Words into Morphological Structures

  • 1. University of Memphis

Description

WikiMorph is a JSON dataset of word breakdowns for English words. Each breakdown consists primarily of morphological compounds (drawn both from English and from the word's etymology), along with the definition associated with each compound. Entries also include other potentially useful fields, such as syllables and part-of-speech tags. The dataset contains 505,033 entries in total, covering 355,782 unique words. The data collection process is described in the paper "WikiMorph: Learning to Decompose Words into Morphological Structures"; some additional updates were made after publication.

 

    {
        "Word": "abduction",
        "PoS": "Noun",
        "Syllables": [
            "ab",
            "duc",
            "tion"
        ],
        "Definition": "The act of abducing or abducting; a drawing apart; the movement which separates a limb or other part from the axis, or middle line, of the body.",
        "Morphemes": [
            {
                "Affix": "abduct",
                "Language": "en",
                "PoS": "Verb",
                "Meaning": "To draw away, as a limb or other part, from the median axis of the body.",
                "Etymology Compounds": [
                    {
                        "Affix": "ab",
                        "Language": "la",
                        "Decoded": "ab",
                        "PoS": null,
                        "Meaning": "away"
                    },
                    {
                        "Affix": "duco",
                        "Language": "la",
                        "Decoded": "duco",
                        "PoS": null,
                        "Meaning": "to lead"
                    }
                ]
            },
            {
                "Affix": "-ion",
                "Language": "en",
                "PoS": "Suffix",
                "Meaning": "an action or process, or the result of an action or process",
                "Etymology Compounds": [
                    {
                        "Affix": "-iō",
                        "Language": "la",
                        "Decoded": "-io",
                        "PoS": "Suffix",
                        "Meaning": "Used to form abstract nouns from verbs."
                    }
                ]
            }
        ]
    }
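The nesting in the sample entry above (a word, its English morphemes, and each morpheme's etymology compounds) can be walked with a few lines of Python. The following is a minimal sketch, not part of the dataset's tooling: the helper name `flatten_morphemes` is hypothetical, the entry is an abridged copy of the sample, and the code assumes other entries use the same keys. The `Decoded` field is read here on the assumption, based only on this one sample, that it holds a diacritic-free form of `Affix`.

```python
import json

# Abridged copy of the sample entry shown above (long definitions omitted).
ENTRY_JSON = """
{
  "Word": "abduction",
  "PoS": "Noun",
  "Syllables": ["ab", "duc", "tion"],
  "Morphemes": [
    {"Affix": "abduct", "Language": "en", "PoS": "Verb",
     "Etymology Compounds": [
       {"Affix": "ab", "Language": "la", "Decoded": "ab",
        "PoS": null, "Meaning": "away"},
       {"Affix": "duco", "Language": "la", "Decoded": "duco",
        "PoS": null, "Meaning": "to lead"}
     ]},
    {"Affix": "-ion", "Language": "en", "PoS": "Suffix",
     "Etymology Compounds": [
       {"Affix": "-i\u014d", "Language": "la", "Decoded": "-io",
        "PoS": "Suffix", "Meaning": "Used to form abstract nouns from verbs."}
     ]}
  ]
}
"""


def flatten_morphemes(entry):
    """Return one (affix, part_of_speech, roots) tuple per morpheme.

    `roots` is a list of (decoded_form, language, meaning) tuples taken
    from the morpheme's "Etymology Compounds". Missing or null fields are
    tolerated, since other entries may not fill every key (an assumption).
    """
    rows = []
    for morpheme in entry.get("Morphemes") or []:
        roots = [
            (c.get("Decoded"), c.get("Language"), c.get("Meaning"))
            for c in morpheme.get("Etymology Compounds") or []
        ]
        rows.append((morpheme.get("Affix"), morpheme.get("PoS"), roots))
    return rows


entry = json.loads(ENTRY_JSON)  # JSON null becomes Python None
rows = flatten_morphemes(entry)

for affix, pos, roots in rows:
    print(affix, pos, roots)
```

The helper works on a single entry because the top-level layout of `wiki_morph.json` (for example, a list of entries versus an object keyed by word) is not stated in this description and should be checked before iterating over the whole file.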

 

Files

wiki_morph.json (607.2 MB)
md5:bd6538695555bea131c0261e1c89c2f8
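Because the download is large, it may be worth comparing the local copy against the MD5 checksum listed above before parsing it. The sketch below is one way to do that with Python's standard `hashlib`; the function names `md5_of_file` and `is_intact` are hypothetical, and the file is read in chunks so the whole 607.2 MB does not need to fit in memory at once.

```python
import hashlib

# Checksum as published in the Files section of this record.
EXPECTED_MD5 = "bd6538695555bea131c0261e1c89c2f8"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Assuming the file was saved under its published name in the working directory, `is_intact("wiki_morph.json")` returns `True` for an uncorrupted download. MD5 is adequate for detecting transfer errors but is not a defense against deliberate tampering.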