Federated Content Search for Lexical Resources (LexFCS): Specification
Creators
- 1. Saxon Academy of Sciences and Humanities in Leipzig
- 2. Berlin-Brandenburg Academy of Sciences and Humanities
- 3. Leibniz-Institut für Deutsche Sprache
- 4. Universität Trier
- 5. Universität zu Köln
Description
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions can make it difficult to query and use these valuable resources efficiently. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific problem of accessing lexical resources in a diverse and heterogeneous landscape, with a variety of participating institutions and established technical solutions, is addressed by the development of the federated search and query framework LexFCS. LexFCS extends the established CLARIN Federated Content Search, which already allows access to spatially distributed text corpora through a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
The FCS specification (Schonefeld et al. 2014) will be extended with regard to announcing, querying and retrieving lexical resources. Specifically, this entails:
- Specifying the query language, a “CQL Context Set” of the Contextual Query Language (standardized by the US Library of Congress) dedicated to querying lexical entries. Its specification includes agreements on the accessible fields of information for a lexeme (such as part-of-speech, definitions, and (semantically) related entries) and on how to combine them into complex queries. This is especially challenging because of the inherently hierarchical structure of lexical data.
- Specifying common data formats for a unified result presentation. At the basic level, this is achieved by a mandatory KWIC representation that allows information types to be annotated inline, and by an advanced tabular representation of all fields in key-value style. In most cases these representations can only provide a simplified view of the data. Endpoints are therefore encouraged to provide records in their complex native representation as well, for example in different TEI dialects including TEI Lex-0, in OntoLex/Lemon, or in other formats.
- Extending the core FCS specification while remaining compatible with the overall architecture, so that features such as access control for restricted resources or automatic registration of endpoints within the FCS system can be reused.
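To illustrate the first point, the following minimal sketch shows how field-based clauses might be combined into a single CQL query string. It relies only on generic CQL syntax (index, relation, quoted term, boolean operators). The index names `pos` and `def`, as well as the helper names `Clause`, `quote_term`, and `combine`, are hypothetical and chosen for illustration only. The actual field names are defined in the specification itself.

```python
from dataclasses import dataclass


def quote_term(term: str) -> str:
    """Quote a CQL search term, escaping backslashes and double quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


@dataclass(frozen=True)
class Clause:
    """A single CQL search clause: index, relation, term."""
    index: str           # hypothetical field name, e.g. "pos" or "def"
    term: str
    relation: str = "="

    def to_cql(self) -> str:
        return f"{self.index} {self.relation} {quote_term(self.term)}"


def combine(operator: str, *clauses: Clause) -> str:
    """Join clauses with a boolean operator (AND or OR) into one query."""
    op = operator.upper()
    if op not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {operator}")
    if not clauses:
        raise ValueError("at least one clause is required")
    parts = [clause.to_cql() for clause in clauses]
    if len(parts) == 1:
        return parts[0]
    # Parenthesize so the result can be nested inside a larger query.
    return "(" + f" {op} ".join(parts) + ")"


# Assumed example: entries tagged as nouns whose definition mentions "tree".
query = combine("AND", Clause("pos", "NOUN"), Clause("def", "tree"))
print(query)
```

A flat composition like this only covers the simple case. It does not address the hierarchical structure of lexical entries mentioned above, for instance constraining a definition to belong to a particular sense.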
Files
lexfcs-specification.pdf (1.6 MB)
Checksum | Size |
---|---|
md5:775fa1294fe5e6fad2e76adc63bd6140 | 1.6 MB |
md5:30086b07dd59489e53befe0c26450d60 | 17.1 kB |
Additional details
Related works
- Obsoletes
- 10.5281/zenodo.7849754 (DOI)
- 10.5281/zenodo.7923699 (DOI)