FLExModules
Creators
- 1. Humboldt-Universität zu Berlin, Department of German Studies and Linguistics, Berlin, Germany
Description
Pepper is a conversion framework for linguistic data. pepperModules-FLExModules is a plugin for Pepper and provides an importer for FLEx XML, i.e., the XML export format from SIL Fieldworks Language Explorer. The format is used frequently for persisting language documentation data.
With the pepperModules-FLExModules importer, the data stored in FLEx XML interlinear text files can be transferred to another format. This way, the data can be re-used for other purposes (such as adding different annotation types), or visualized and analyzed, e.g., in ANNIS, a search and visualization platform for linguistic data. For a list of available format converters for Pepper, see the list of known Pepper modules.
Context
The development of pepperModules-FLExModules was initiated in the MelaTAMP research project.
Requirements
Pepper >= 3.1.1-SNAPSHOT
Usage
- Create a Pepper workflow file for the conversion, with the importer set to FLExImporter. Configure properties as needed.
- Download Pepper, and run it with the workflow file.
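The first step can be sketched in Python as follows. This is a minimal sketch, not part of the module: the element names (pepper-job, importer, customization, exporter) follow the usual Pepper workflow file layout as an assumption, and should be checked against the Pepper documentation. The paths and the ANNISExporter target are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_workflow(corpus_path, output_path, properties):
    """Return a Pepper workflow document as a string.

    Assumption: the workflow layout (pepper-job > importer > customization
    > property) follows the usual Pepper conventions; verify it against
    the Pepper documentation before use.
    """
    job = ET.Element("pepper-job", {"version": "1.0"})
    importer = ET.SubElement(
        job, "importer", {"name": "FLExImporter", "path": corpus_path}
    )
    if properties:
        customization = ET.SubElement(importer, "customization")
        for key, value in properties.items():
            prop = ET.SubElement(customization, "property", {"key": key})
            prop.text = value
    # Hypothetical export target; any installed Pepper exporter could go here.
    ET.SubElement(job, "exporter", {"name": "ANNISExporter", "path": output_path})
    return ET.tostring(job, encoding="unicode")


# Hypothetical paths, with the property values used as examples in this README.
workflow_xml = build_workflow(
    "./flex-corpus/",
    "./annis-output/",
    {"languageMap": "ENGLISH=en,NORTH-AMBRYM=mmg", "typeMap": "txt=tx,gls=ge"},
)
```

The resulting string can be written to a file and passed to Pepper as the workflow file.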
Importer
Requirements, assumptions, behaviour
Annotation mapping
FLEx XML has features that necessitate a certain importer behaviour with regard to annotation namespaces and names.
In Salt, the data model onto which data is mapped during import, annotations can have a namespace and a name. In FLEx XML, one and the same annotation name, i.e., the 'type' of an <item>, can be used on different levels, i.e., <phrase>, <word>, <morph>, etc. Additionally, an <item> also has a 'lang', so three attributes in FLEx XML (level, 'lang', 'type') must be mapped onto two attributes in Salt annotations.
To preserve the level information of an annotation during conversion, the FLExImporter adds the container (node/edge) of the annotation to a layer named after the level, i.e., phrase, word, or morph. Annotations on the document (FLEx level interlinear-text) are made on the Salt document (SDocument), which itself cannot be added to a layer, since a layer is a node in an SDocument's graph. Instead, all annotations on the document itself can be assumed to belong to the interlinear-text level.
At the same time, the 'lang' information is recorded in the namespace of the Salt annotation.
Therefore, clients such as exporters that need to re-combine this information must retrieve three things: the language from the namespace of the annotation, the type from the name of the annotation, and the level from the name of a layer that the annotation's container belongs to, or from the fact that the annotation is attached to an SDocument. The importer creates exactly one layer for each level, named phrase, word, and morph (according to the XML schema XSD file supplied by SIL, paragraphs cannot have annotations).
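The mapping described in this section can be illustrated with a short Python sketch. It does not use Salt or the importer's code; it only shows how the triple (level, 'lang', 'type') ends up as (layer, namespace, name). The sample XML is a simplified, hypothetical FLEx XML fragment, and the dictionary keys are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical FLEx XML fragment for illustration only.
SAMPLE = """<document>
  <interlinear-text>
    <item type="title" lang="en">Sample text</item>
    <paragraphs><paragraph><phrases>
      <phrase>
        <item type="gls" lang="en">hello world</item>
        <words>
          <word>
            <item type="txt" lang="mmg">abc</item>
            <morphemes>
              <morph>
                <item type="gls" lang="en">hello</item>
              </morph>
            </morphemes>
          </word>
        </words>
      </phrase>
    </phrases></paragraph></paragraphs>
  </interlinear-text>
</document>"""

# Levels for which a layer is created, as described above.
LAYER_LEVELS = {"phrase", "word", "morph"}


def map_items(xml_text):
    """Map each <item> to a (layer, namespace, name) style record."""
    root = ET.fromstring(xml_text)
    records = []
    for parent in root.iter():
        # Only direct <item> children belong to this level.
        for item in parent.findall("item"):
            level = parent.tag
            records.append({
                "layer": level if level in LAYER_LEVELS else None,
                "on_document": level == "interlinear-text",
                "namespace": item.get("lang"),
                "name": item.get("type"),
                "value": item.text,
            })
    return records


records = map_items(SAMPLE)
```

In this sample, the two 'gls' items share a namespace and a name, and can only be told apart by their layer (phrase vs. morph), which is why the level has to be preserved separately.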
Properties
languageMap: A map with original 'lang' strings and the target strings the original should be changed to during conversion. E.g., <property key="languageMap">ENGLISH=en,NORTH-AMBRYM=mmg</property>
typeMap: A map with original 'type' strings and the target strings the original should be changed to during conversion. E.g., <property key="typeMap">txt=tx,gls=ge</property>
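How such map values can be read is sketched below in Python. This is not the importer's actual parser: the function names are hypothetical, and the behaviour for unmapped strings (kept unchanged) is an assumption.

```python
def parse_map(value):
    """Parse a 'SOURCE=target,SOURCE=target' property value into a dict."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas
        source, sep, target = pair.partition("=")
        if not sep:
            raise ValueError(f"Missing '=' in map entry: {pair!r}")
        mapping[source.strip()] = target.strip()
    return mapping


def rename(original, mapping):
    """Return the mapped string; assumption: unmapped strings stay as they are."""
    return mapping.get(original, original)


language_map = parse_map("ENGLISH=en,NORTH-AMBRYM=mmg")
type_map = parse_map("txt=tx,gls=ge")
```

With these maps, rename("NORTH-AMBRYM", language_map) yields "mmg", and rename("gls", type_map) yields "ge".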
One document per file
FLExText files can contain n documents (corresponding to the XML element interlinear-text). However, files with more than one interlinear-text element cannot currently be processed by the FLExImporter.
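Before running a conversion, files can be checked against this restriction. The following Python sketch counts the interlinear-text elements in a file's content; the function name and the sample strings are hypothetical, and the check is independent of the importer itself.

```python
import xml.etree.ElementTree as ET


def count_interlinear_texts(xml_text):
    """Count the interlinear-text elements anywhere in the XML content."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("interlinear-text"))


# Hypothetical minimal file contents for illustration.
single = "<document><interlinear-text/></document>"
multiple = "<document><interlinear-text/><interlinear-text/></document>"

single_count = count_interlinear_texts(single)
multiple_count = count_interlinear_texts(multiple)
```

A file whose count is greater than 1 would need to be split into one file per interlinear-text element before import.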
Files
pepperModules-FLExModules-1.0.0.zip
Files (81.7 kB)
Checksum | Size |
---|---|
md5:8629e6e348053ea3a0df4e8a2c08e578 | 17.6 kB |
md5:69dfe1e956c83d6a761b4f6972252053 | 23.1 kB |
md5:4263516c8e7bab2a4b822ac88871e5be | 41.0 kB |
Additional details
Related works
- Is supplement to
- https://github.com/sdruskat/pepperModules-FLExModules/releases/tag/1.0.0 (URL)