2247370
doi
10.5281/zenodo.2247370
oai:zenodo.org:2247370
FLExModules (Version 1.0.8)
Druskat, Stephan
Humboldt-Universität zu Berlin, Department of German Studies and Linguistics, Berlin, Germany
url:https://github.com/sdruskat/pepperModules-FLExModules/releases/tag/1.0.8
info:eu-repo/semantics/openAccess
Apache License 2.0
http://www.apache.org/licenses/LICENSE-2.0
Pepper
linguistics
corpus
format
SIL Fieldworks
FLEx
XML
importer
<p><a href="http://corpus-tools.org/pepper">Pepper</a> is a conversion framework for linguistic data. <em>pepperModules-FLExModules</em> is a plugin for <em>Pepper</em> and provides an importer for <strong>FLEx XML</strong>, i.e., the XML export format from <a href="https://software.sil.org/fieldworks/">SIL Fieldworks Language Explorer</a>. The format is used frequently for persisting language documentation data.</p>
<p>With the <em>pepperModules-FLExModules</em> importer, the data stored in FLEx XML interlinear text files can be transferred to another format. This way, the data can be re-used for other purposes (such as adding different annotation types), or visualized and analyzed, e.g., in <a href="http://corpus-tools.org/annis">ANNIS</a>, a search and visualization platform for linguistic data. For a list of available format converters for Pepper, see the <a href="http://corpus-tools.org/pepper/knownModules.html">list of known Pepper modules</a>.</p>
<p><strong>Context</strong></p>
<p>The development of pepperModules-FLExModules was initiated in the <a href="https://hu.berlin/melatamp">MelaTAMP research project</a>.</p>
<p><strong>Requirements</strong></p>
<p><code>Pepper >= 3.2.7</code></p>
<p><strong>Usage</strong></p>
<ul>
<li>Create a <a href="http://corpus-tools.org/pepper/userGuide.html#workflow_file">Pepper workflow file</a> for the conversion, with the importer set to <code>FLExImporter</code>. Configure the importer properties (see <em>Properties</em> below) as needed.</li>
<li><a href="http://corpus-tools.org/pepper/">Download Pepper</a>, and run it with the workflow file.</li>
</ul>
<p><strong>Importer</strong></p>
<p><em><strong>Requirements, assumptions, behaviour</strong></em></p>
<p><em>Annotation mapping</em></p>
<p>FLEx XML has features that necessitate a certain importer behaviour with regard to annotation namespace and names.</p>
<p>In <em>Salt</em>, the data model onto which data is mapped during import, annotations can have a <code>namespace</code> and a <code>name</code>. In <em>FLEx XML</em>, one and the same annotation name, i.e., the <code>'type'</code> of an <code><item></code>, can be used on different <em>levels</em>, i.e., <code><phrase></code>, <code><word></code>, <code><morph></code>, etc. Additionally, an <code><item></code> also has a <code>'lang'</code>, so three attributes in <em>FLEx XML</em> (<em>level</em>, <em>‘lang’</em>, <em>‘type’</em>) must be mapped onto two attributes in <em>Salt</em> annotations.</p>
<p>To preserve the <em>level</em> information of an annotation during conversion, the <em>FLExImporter</em> adds the container (node/edge) of the annotation to a layer named after the level, i.e., <code>phrase</code>, <code>word</code>, or <code>morph</code>. Annotations on the document (FLEx level <code>interlinear-text</code>) are attached to the Salt document (<code>SDocument</code>) itself, which cannot be added to a layer, because a layer is a node in an <code>SDocument</code>’s graph. Instead, all annotations on the document itself can be assumed to belong to the <code>interlinear-text</code> level.</p>
<p>At the same time, the <em>‘lang’</em> information is recorded in the namespace of the <em>Salt</em> annotation.</p>
<p>Therefore, clients such as exporters that need to recombine this information must retrieve the language from the namespace of the annotation, the type from its name, and the <em>level</em> either from the name of the level layer among the layers that the container of the annotation belongs to, or, for document-level annotations, from the fact that the annotation is attached to an <code>SDocument</code>. The importer creates exactly one layer for each level, named <code>phrase</code>, <code>word</code>, and <code>morph</code> respectively (according to the XML Schema (XSD) file supplied by SIL, paragraphs cannot have annotations).</p>
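<p>To make the three-to-two mapping concrete, the following Java sketch (Java 16 or later, for records) models it in both directions. <code>FlexItem</code>, <code>SaltAnnotation</code> and <code>AnnotationMapping</code> are hypothetical, simplified stand-ins, not the Salt API or the importer's own classes; the sketch only assumes the behaviour described above: the level becomes a layer name on the container (or, for <code>interlinear-text</code>, a document-level annotation), ‘lang’ becomes the namespace, and ‘type’ becomes the name.</p>

```java
import java.util.Set;

// Hypothetical stand-in for a FLEx <item>: its level element plus 'lang' and 'type'.
record FlexItem(String level, String lang, String type, String value) {}

// Hypothetical, simplified stand-in for a Salt annotation: namespace, name, value,
// the layer names of its container, and whether it sits on the SDocument itself.
record SaltAnnotation(String namespace, String name, String value,
                      Set<String> containerLayers, boolean onDocument) {}

final class AnnotationMapping {
    // The level layers described above (paragraphs carry no annotations).
    static final Set<String> LEVEL_LAYERS = Set.of("phrase", "word", "morph");
    static final String DOCUMENT_LEVEL = "interlinear-text";

    private AnnotationMapping() {}

    // FLEx -> Salt: 'lang' becomes the namespace, 'type' the name,
    // and the level a layer name (or a document-level annotation).
    static SaltAnnotation toSalt(FlexItem item) {
        boolean onDocument = item.level().equals(DOCUMENT_LEVEL);
        Set<String> layers = onDocument ? Set.of() : Set.of(item.level());
        return new SaltAnnotation(item.lang(), item.type(), item.value(), layers, onDocument);
    }

    // Salt -> FLEx: how a client could recombine the three FLEx attributes.
    static FlexItem toFlex(SaltAnnotation annotation) {
        String level = annotation.onDocument()
                ? DOCUMENT_LEVEL
                : annotation.containerLayers().stream()
                        .filter(LEVEL_LAYERS::contains) // ignore unrelated layers
                        .findFirst()
                        .orElseThrow(() -> new IllegalArgumentException(
                                "Container is not in a phrase, word or morph layer"));
        return new FlexItem(level, annotation.namespace(), annotation.name(), annotation.value());
    }
}
```

<p>A round trip such as <code>AnnotationMapping.toFlex(AnnotationMapping.toSalt(item))</code> returns the original item, which is the property an exporter relies on when it recombines level, ‘lang’ and ‘type’.</p>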
<p><em>Properties</em></p>
<p><code>languageMap</code>: A map from original ‘lang’ strings to the target strings they should be changed to during conversion. E.g., <code><property key="languageMap">ENGLISH=en,NORTH-AMBRYM=mmg</property></code></p>
<p><code>typeMap</code>: A map from original ‘type’ strings to the target strings they should be changed to during conversion. E.g., <code><property key="typeMap">txt=tx,gls=ge</property></code></p>
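<p>Both map properties use a comma-separated <code>key=value</code> syntax. The following Java sketch shows one way to parse and apply such a value. <code>MapProperty</code> is a hypothetical helper, not the importer's code, and it assumes that entries are separated by commas and split at the first <code>=</code>, and that strings without an entry are left unchanged.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapProperty {
    private MapProperty() {}

    // Parses "KEY=value,KEY2=value2" into an insertion-ordered map.
    // Assumption: entries are comma-separated, split at the first '=', and trimmed.
    static Map<String, String> parse(String raw) {
        Map<String, String> map = new LinkedHashMap<>();
        if (raw == null || raw.isBlank()) {
            return map;
        }
        for (String entry : raw.split(",")) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            map.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return map;
    }

    // Returns the target string, or the original if the map has no entry for it.
    static String apply(Map<String, String> map, String original) {
        return map.getOrDefault(original, original);
    }
}
```

<p>With the <code>languageMap</code> example above, <code>MapProperty.apply(MapProperty.parse("ENGLISH=en,NORTH-AMBRYM=mmg"), "ENGLISH")</code> yields <code>en</code>.</p>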
<p><code>dropAnnotations</code>: A list of annotations that should be ignored during conversion. Annotations are defined as <code>{phrase|word|morph}::{language}:name</code>, where the layer (first element) and the language (second element) are optional. <code>languages</code> is a reserved name and will drop all language meta annotations from the child elements of <code><languages/></code>. E.g., <code><property key="dropAnnotations">languages,morph::en:hn,fr:gls,morph::dro,xxx</property></code></p>
<p><code>annotationMap</code>: A map whose keys are FLEx annotations, defined as for <code>dropAnnotations</code>, and whose values are the annotations they should be mapped to. E.g., <code><property key="annotationMap">word::en:gls=ge,morph::en:gls=ps</property></code></p>
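<p><code>dropAnnotations</code> and the keys of <code>annotationMap</code> use the same <code>{layer}::{language}:name</code> pattern. The Java sketch below (Java 16 or later) parses that pattern and applies both properties. <code>AnnotationSpec</code> and <code>AnnotationRules</code> are hypothetical, and the matching semantics are assumptions rather than the importer's documented behaviour: an omitted layer or language matches any value, the reserved <code>languages</code> entry is never matched against item types, and <code>annotationMap</code> values are treated as replacement names, with the first matching entry winning.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsed form of "{layer}::{language}:name"; null means "any".
record AnnotationSpec(String layer, String language, String name) {

    static AnnotationSpec parse(String spec) {
        String rest = spec.trim();
        String layer = null;
        int dbl = rest.indexOf("::"); // the layer is separated by a double colon
        if (dbl >= 0) {
            layer = rest.substring(0, dbl);
            rest = rest.substring(dbl + 2);
        }
        String language = null;
        int colon = rest.indexOf(':'); // the language by a single colon
        if (colon >= 0) {
            language = rest.substring(0, colon);
            rest = rest.substring(colon + 1);
        }
        return new AnnotationSpec(layer, language, rest);
    }

    static List<AnnotationSpec> parseList(String raw) {
        List<AnnotationSpec> specs = new ArrayList<>();
        for (String s : raw.split(",")) {
            specs.add(parse(s));
        }
        return specs;
    }

    // "languages" is reserved: it targets language meta annotations, not an item type.
    boolean isReservedLanguages() {
        return layer == null && language == null && name.equals("languages");
    }

    // Assumption: unspecified (null) parts match any level or language.
    boolean matches(String level, String lang, String type) {
        return (layer == null || layer.equals(level))
                && (language == null || language.equals(lang))
                && name.equals(type);
    }
}

final class AnnotationRules {
    private AnnotationRules() {}

    static boolean isDropped(List<AnnotationSpec> drops, String level, String lang, String type) {
        return drops.stream().anyMatch(s -> !s.isReservedLanguages() && s.matches(level, lang, type));
    }

    // Assumption: the first matching annotationMap entry wins and replaces the name only.
    static String mappedName(String rawMap, String level, String lang, String type) {
        for (String entry : rawMap.split(",")) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            if (AnnotationSpec.parse(entry.substring(0, eq)).matches(level, lang, type)) {
                return entry.substring(eq + 1).trim();
            }
        }
        return type; // unmapped annotations keep their name
    }
}
```

<p>With the <code>dropAnnotations</code> example above, an <code>hn</code> item in English is dropped on the <code>morph</code> level but kept on the <code>word</code> level, while <code>xxx</code> is dropped on every level and in every language.</p>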
<p><strong>One document per file</strong></p>
<p><em>FLExText</em> files can contain <code>n</code> documents (corresponding to the XML element <code>interlinear-text</code>). However, files with more than one <code>interlinear-text</code> element cannot currently be processed by the FLExImporter.</p>
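<p>Because of this restriction, it can help to check input files before starting a conversion. The following Java sketch counts <code>interlinear-text</code> elements with the JDK's DOM parser (<code>javax.xml.parsers</code>). <code>FlexTextCheck</code> and its methods are hypothetical helpers, not part of pepperModules-FLExModules; the check counts matching elements anywhere in the file and does not validate against the FLEx XSD.</p>

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class FlexTextCheck {
    private FlexTextCheck() {}

    // Counts the <interlinear-text> elements, i.e. the documents in a FLExText file.
    static int countDocuments(InputSource source) throws IOException, SAXException {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getElementsByTagName("interlinear-text").getLength();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    // True if the file respects the one-document-per-file restriction.
    static boolean hasAtMostOneDocument(InputSource source) throws IOException, SAXException {
        return countDocuments(source) <= 1;
    }
}
```

<p>For a file on disk, the source could be created with, e.g., <code>new InputSource(new java.io.FileInputStream("text.flextext"))</code>; files that fail the check would need to be split into one <code>interlinear-text</code> per file before import.</p>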
<p><strong>Javadoc Documentation</strong></p>
<p>The Javadoc documentation can be found at <a href="https://sdruskat.github.io/pepperModules-FLExModules">https://sdruskat.github.io/pepperModules-FLExModules</a>.</p>
Zenodo
2018-12-13
info:eu-repo/semantics/other
1297385
1.0.8
1579980058.671057
26860
md5:d184dcd0174f5fb1bde20641acdaa44d
https://zenodo.org/records/2247370/files/pepperModules-FLExModules-1.0.8.jar
19166
md5:f8b1820b7fe7c69ea43c6b1668ce391d
https://zenodo.org/records/2247370/files/pepperModules-FLExModules-1.0.8-sources.jar
137633
md5:7f11bb19ebf57f22dc066110f5366c2a
https://zenodo.org/records/2247370/files/pepperModules-FLExModules-1.0.8-javadoc.jar
public
https://github.com/sdruskat/pepperModules-FLExModules/releases/tag/1.0.8
Is supplement to
url
10.5281/zenodo.1297385
isVersionOf
doi