Published January 31, 2023 | Version 1
Dataset Open

Animacy data for animacy classification

Authors/Creators

  • 1. University of Zurich

Description

This is the training data for an animacy classifier (see the LREC entry under References).

1) gold_actor       7468 nouns denoting animate entities
2) gold_nonactor    5511 nouns denoting non-animate entities

Subsets of 1:

gold_direct         6897 nouns directly denoting animate entities
gold_metonym         587 metonymy trigger nouns

gold_female         3738 nouns denoting female actors
gold_male           2830 nouns denoting male actors
gold_nogender        329 nouns denoting female or male actors (often plural)


Format: just lists
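
Since the data come as plain lists, a labeled training set can be assembled in a few lines. The following is a minimal sketch, not the authors' pipeline. It assumes one noun per line in UTF-8 text files; the file names in the usage comment (gold_actor.txt, gold_nonactor.txt) and both helper functions are hypothetical and should be adapted to the actual contents of the download.

```python
from pathlib import Path


def read_noun_list(path):
    """Read a noun list, assuming one noun per line (blank lines are skipped)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_training_pairs(actor_nouns, nonactor_nouns):
    """Pair each noun with a label: 1 for animate, 0 for non-animate."""
    pairs = [(noun, 1) for noun in actor_nouns]
    pairs += [(noun, 0) for noun in nonactor_nouns]
    return pairs


# Usage (hypothetical file names, adjust to the downloaded files):
#   actors = read_noun_list("gold_actor.txt")
#   nonactors = read_noun_list("gold_nonactor.txt")
#   data = build_training_pairs(actors, nonactors)
```

The subset lists (gold_direct, gold_metonym, and the gender lists) could be read the same way if finer-grained labels are needed.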

Note: although some person names are in the data, a separate NER system should be used for person names.

References:

@inproceedings{LREC,
           month = {June},
          author = {Manfred Klenner and Anne G{\"o}hring},
       booktitle = {Proceedings of the Language Resources and Evaluation Conference},
         address = {Marseille, France},
           title = {Animacy Denoting {G}erman Nouns: Annotation and Classification},
       publisher = {European Language Resources Association},
           pages = {1360--1364},
            year = {2022},
        language = {english},
             url = {https://doi.org/10.5167/uzh-219148},
        abstract = {In this paper, we introduce a gold standard for animacy detection comprising almost 14,500 German nouns that might be used to denote either animate entities or non-animate entities. We present inter-annotator agreement of our crowd-sourced seed annotations (9,000 nouns) and discuss the results of machine learning models applied to this data.}
}
 

Files (138.0 kB)

md5:303525eaa9aacf4dd0d4797dd9018503    138.0 kB
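
The MD5 checksum listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and the path of the downloaded file must be supplied by the user, since the file name is not shown on this page.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "303525eaa9aacf4dd0d4797dd9018503"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 digest equals the listed checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

A mismatch usually points to an incomplete or corrupted download, in which case the file should be fetched again.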