Encoding polylexical units with TEI Lex-0: A case study

doi:10.5281/zenodo.4453143

Published January 20, 2021 | Version v1

Journal article Open

1. Belgrade Center for Digital Humanities
2. NOVA CLUNL, Faculdade de Ciências Sociais e Humanas, Universidade NOVA de Lisboa

The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately or in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between those units which are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels, etc.). We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
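The transparency distinction described in the abstract can be pictured as a simple branching rule. The following is a minimal sketch only, not the authors' encoding scheme: the element names form, entry and gram come from the abstract, while the function name, the orth, sense and def children, the attribute values ("polylexical", "relatedEntry", "lemma") and the English sample unit are all assumptions made for illustration and are not taken from the paper or the dictionary.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def encode_polylexical(text: str, definition: Optional[str] = None) -> ET.Element:
    """Return a form-like element for a unit without an explicit definition,
    and an entry-like element for a unit that carries one.

    All attribute values below are hypothetical placeholders.
    """
    if definition is None:
        # No explicit definition: encode as a form-like construct.
        form = ET.Element("form", {"type": "polylexical"})  # hypothetical value
        ET.SubElement(form, "orth").text = text
        return form

    # Explicit definition present: encode as an entry-like construct,
    # which has room for senses, labels and other constraints.
    entry = ET.Element("entry", {"type": "relatedEntry"})  # hypothetical value
    form = ET.SubElement(entry, "form", {"type": "lemma"})  # hypothetical value
    ET.SubElement(form, "orth").text = text
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = definition
    return entry


if __name__ == "__main__":
    # Illustrative English unit, not drawn from the dictionary under study.
    undefined_unit = encode_polylexical("by and large")
    defined_unit = encode_polylexical("by and large", "on the whole")
    print(ET.tostring(undefined_unit, encoding="unicode"))
    print(ET.tostring(defined_unit, encoding="unicode"))
```

Keeping the decision in one function makes the single criterion (presence or absence of an explicit definition) easy to see; a real encoder would also need to handle the macro- versus microstructural placement and the gram labels that the abstract mentions.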
