Overview of writing systems and character encoding standards and article formats used in the domain
- 1. Universität zu Köln
- 2. Berlin-Brandenburgische Akademie der Wissenschaften
Description
Writing systems are the visual representations of language. Lexical resources have to represent, at least potentially, the entire lexical system of a language in digital form. As a consequence, they may have to handle the entirety of the writing system used for that language and represent it in a standardized digital manner. The information on every lexeme of a language has to be captured in one or more lexical entries, which in turn need to be represented digitally in a coherent way.
Close to three hundred writing systems may be known from current and historical use. These writing systems differ in appearance and inventory, but more crucially in their structure and organizing principles. In digital systems, writing systems are represented through character encodings, which differ in scope and technical implementation. Computer fonts are then used to render the encoded characters in a specific typeface. Font formats likewise differ in their ability to represent the characteristics of different writing systems.
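As a minimal sketch of how character encodings differ in scope and technical implementation, the following Python snippet encodes the same word under three encodings and checks whether a narrower encoding can represent a given character at all. The function names `byte_lengths` and `can_encode` and the sample strings are illustrative assumptions and are not taken from the document.

```python
# Illustrative sketch only: the helper names and sample strings are
# assumptions made for this example, not part of the described document.

def byte_lengths(text, codecs=("utf-8", "utf-16-le", "latin-1")):
    """Return the number of bytes the text occupies under each codec."""
    return {codec: len(text.encode(codec)) for codec in codecs}


def can_encode(text, codec):
    """Report whether every character of the text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    word = "K\u00f6ln"  # "Köln", written with an escape for U+00F6
    for codec, size in byte_lengths(word).items():
        print(f"{codec:10} {size} bytes")
    # Latin-1 covers only 256 characters, so a CJK character is out of scope.
    print(can_encode("\u8a9e", "latin-1"))
    print(can_encode("\u8a9e", "utf-8"))
```

The same four characters take a different number of bytes in each encoding, and an encoding with a limited repertoire rejects characters from writing systems it was not designed for.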
Besides character encodings for representing different writing systems and computer fonts for rendering the encoded characters, article formats for lexicographical entries constitute a third important component of all digital lexical resources. These article formats structure and organize the information contained in a lexical entry so that it is represented in a standardized and coherent manner.
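To illustrate what structuring the information of a lexical entry can look like, here is a hedged sketch that builds a small XML entry with Python's standard library. The element names (`entry`, `form`, `pos`, `sense`) and the helper functions are hypothetical placeholders chosen for this example. They do not reproduce any particular article format discussed in the document.

```python
# Illustrative sketch only: element names and helpers are hypothetical
# and do not correspond to a specific standardized article format.
import xml.etree.ElementTree as ET


def build_entry(lemma, pos, definition):
    """Build one lexical entry as a small XML tree."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "form").text = lemma        # headword
    ET.SubElement(entry, "pos").text = pos           # part of speech
    ET.SubElement(entry, "sense").text = definition  # short definition
    return entry


def serialize(entry):
    """Serialize the entry to a Unicode string."""
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(serialize(build_entry("Haus", "noun", "house")))
```

Because every entry carries the same named fields, software can locate the headword or the part of speech without parsing free text.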
This document provides an overview of writing systems, recommended character encodings, and standardized article formats used in the domain of lexical resources. It discusses the importance of these components for the representation and processing of language data and provides examples of how they are used in practice.
Files
Name | Size |
---|---|
LR_2.3_Overview_of_Writing_Systems_and_Character_Encoding_Standards_and_Article_Formats_Used_in_the_domain.pdf (md5:68a0583622e30194e53fd3695deca35d) | 893.2 kB |