public interface FLExText
This interface is a dictionary for files following the model of FLExText.
The FLExText model has the following structure, according to FlexInterlinear.xsd:
(! = required, ? = optional, () = fixed)
document
  interlinear-text 1..*
    item 0..1
    paragraphs 1..1
      paragraph 0..*
        phrases 1..1
          phrase 0..*
            item 0..*
            words 1..1
              scrMilestone 0..*
              word 0..*
                item 0..*
                morphemes 0..1
                  morph 0..*
                    item 0..*
                item 0..* (Order seems to be important here!)
    languages 0..1
      language 0..*
    media-files 0..*
      media 0..*
item (non-nillable)
The resulting Salt model will look like the following:
+------------------------------------------------------------------------------+
| SCorpus document                                                             |
+------------------------------------------------------------------------+-----+
| SDocument interlinear-text                                             |     | *Currently, only
| annotations:                                                           | ... | one interlinear-text
| item "type"_"lang":value                                               |     | per file is implemented!
+------------------------------------------------------------------+-----+-----+
| SSpan phrase "item 'segnum'"                                     |     |
| annotations:                                                     | ... |
| item "type"_"lang":value                                         |     |
+------------------------------------------------------------+-----+-----+
| SToken word                                                |     |
| annotations:                                               | ... |
| item "type"_"lang":value                                   |     |
+-↓-↓-↓------------------------------------------------------+-↓-↓-+-----------+
| STimeline timeline                                                           |
| [Ties together "word" tokens (above) and "morph" tokens (below)              |
|  to interlinearize on the real data source]                                  |
+-↑-↑-↑----------------------+-↑-↑-↑----------------------+-↑-↑-+--------------+
| SToken morph               | SToken morph               |     |
| annotations:               | annotations:               | ... |
| item "type"_"lang":value   | item "type"_"lang":value   |     |
+----------------------------+----------------------------+-----+--------------+
| STextualDS "word" (compiled from word > item type="txt")                     |
+------------------------------------------------------------------------------+
| STextualDS "morph" (compiled from morph > item type="txt")                   |
+------------------------------------------------------------------------------+
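The diagram only states that the STimeline ties word tokens and morph tokens together. A minimal sketch of how such an alignment might be computed follows, assuming that every morph occupies exactly one timeline point and that a word without morphemes still occupies one point of its own. It uses plain Java; `TimelineSketch`, `TimelineSpan` and `wordSpans` are hypothetical names, not part of Salt or of this interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Half-open interval [start, end) of timeline points (hypothetical helper, not Salt API). */
final class TimelineSpan {
    final int start;
    final int end;

    TimelineSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + "," + end + ")";
    }
}

final class TimelineSketch {
    /**
     * Returns the timeline span of each word, given how many morphs each word has.
     * Assumption: one timeline point per morph; a word with no morphs still gets one point.
     */
    static List<TimelineSpan> wordSpans(int[] morphCounts) {
        List<TimelineSpan> spans = new ArrayList<>();
        int next = 0; // next free timeline point
        for (int count : morphCounts) {
            int width = Math.max(count, 1);
            spans.add(new TimelineSpan(next, next + width));
            next += width;
        }
        return spans;
    }

    public static void main(String[] args) {
        // Three words with 2, 1 and 0 morphs respectively.
        System.out.println(wordSpans(new int[] {2, 1, 0})); // [[0,2), [2,3), [3,4)]
    }
}
```

Under this assumption the n-th morph covers the point interval [n, n+1), so a word and its morphs always share the same timeline range.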
‘item’ elements, i.e., linguistically relevant annotations, are mapped onto Salt SAnnotations. These have a namespace, a name and a value.
During the mapping, the ‘lang’ attribute of an ‘item’ is mapped onto the SAnnotation namespace, its ‘type’ attribute onto the SAnnotation name, and its text content onto the SAnnotation value.
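The mapping of a single ‘item’ onto an SAnnotation can be illustrated with a small sketch. `AnnotationSketch` is a hypothetical stand-in for Salt's SAnnotation (it is not the Salt API), and the `namespace::name=value` rendering in `toString()` is for display only.

```java
import java.util.Objects;

/** Hypothetical stand-in for a Salt SAnnotation: namespace, name and value. */
final class AnnotationSketch {
    final String namespace;
    final String name;
    final String value;

    AnnotationSketch(String namespace, String name, String value) {
        this.namespace = namespace;
        this.name = name;
        this.value = value;
    }

    /** Maps an item's 'lang' to the namespace, its 'type' to the name and its text to the value. */
    static AnnotationSketch fromItem(String type, String lang, String text) {
        // An item without a type could not be named, so reject it early.
        Objects.requireNonNull(type, "type");
        return new AnnotationSketch(lang, type, text);
    }

    @Override
    public String toString() {
        return namespace + "::" + name + "=" + value;
    }

    public static void main(String[] args) {
        // Corresponds to <item type="gls" lang="en">dog</item>
        System.out.println(fromItem("gls", "en", "dog")); // en::gls=dog
    }
}
```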
The level of the annotation in the FLEx XML structure is represented by assigning the annotation’s container (i.e., the respective node) an item layer. This can be used to retrieve the level of the annotation during downstream manipulation and export.
Item layers represent the levels of the FLEx XML which can have ‘item’ elements, i.e., linguistically relevant annotations. They are (bottom to top):
morph
word
phrase
interlinear-text
‘item’ elements on the interlinear-text level cannot be given an item layer, because the container of their annotations, an SDocument, cannot be added to a layer inside its own document graph. Therefore, ‘item’s on the interlinear-text level are mapped to annotations on the SDocument itself, which is sufficient to identify their level during downstream manipulation and export.
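The level bookkeeping described above amounts to a lookup from the element that owns an ‘item’ to its item layer, with interlinear-text as the exception. This is a sketch only: the layer names below are placeholders, since the actual values of ITEM_LAYER_MORPH, ITEM_LAYER_WORD and ITEM_LAYER_PHRASE are not given here, and `ItemLevelSketch` is a hypothetical class.

```java
import java.util.Optional;

final class ItemLevelSketch {
    // Placeholder layer names; the real values of the ITEM_LAYER_* constants are not shown here.
    static final String LAYER_MORPH = "morph";
    static final String LAYER_WORD = "word";
    static final String LAYER_PHRASE = "phrase";

    /**
     * Returns the item layer for the FLEx element that owns an 'item', or empty for
     * interlinear-text, whose items become annotations on the SDocument instead.
     */
    static Optional<String> itemLayerFor(String element) {
        switch (element) {
            case "morph":
                return Optional.of(LAYER_MORPH);
            case "word":
                return Optional.of(LAYER_WORD);
            case "phrase":
                return Optional.of(LAYER_PHRASE);
            case "interlinear-text":
                return Optional.empty(); // document level: no layer
            default:
                throw new IllegalArgumentException("element cannot carry items: " + element);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemLayerFor("word"));             // Optional[word]
        System.out.println(itemLayerFor("interlinear-text")); // Optional.empty
    }
}
```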
As the import of FLEx XML creates two token layers in Salt (and two different data source nodes), one for lexical and one for morphological tokens, each of these is added to a dedicated layer. These layers are called lexial-data and morphological-data.
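The split into two data sources can be sketched as follows: each source accumulates its own text, and every token records its layer and its character offsets in that text. The single-space separator between tokens and the class `DataSourceSketch` are assumptions for illustration; the layer names are taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

final class DataSourceSketch {
    /** Hypothetical token: its layer and its [start, end) character offsets in its own text. */
    static final class Tok {
        final String layer;
        final int start;
        final int end;

        Tok(String layer, int start, int end) {
            this.layer = layer;
            this.start = start;
            this.end = end;
        }
    }

    final StringBuilder text = new StringBuilder();
    final List<Tok> tokens = new ArrayList<>();
    private final String layer;

    DataSourceSketch(String layer) {
        this.layer = layer;
    }

    /** Appends one token's form, separated by a single space (an assumption), and records its offsets. */
    Tok add(String form) {
        if (text.length() > 0) {
            text.append(' ');
        }
        int start = text.length();
        text.append(form);
        Tok t = new Tok(layer, start, text.length());
        tokens.add(t);
        return t;
    }

    public static void main(String[] args) {
        // One data source and one layer per kind of token.
        DataSourceSketch lexical = new DataSourceSketch("lexial-data");
        DataSourceSketch morphological = new DataSourceSketch("morphological-data");
        lexical.add("dogs");
        morphological.add("dog");
        morphological.add("-s");
        System.out.println(lexical.text + " | " + morphological.text); // dogs | dog -s
    }
}
```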
Modifier and Type | Field and Description |
---|---|
static String | FLEX__ANALYSIS_STATUS_ATTR: Constant for the ‘analysisStatus’ attribute in ‘item’ elements. |
static String | FLEX__LANG_ATTR: Constant for the ‘lang’ attribute in ‘item’ elements. |
static String | FLEX__TYPE_ATTR: Constant for the ‘type’ attribute in ‘item’ elements. |
static String | FLEX_ITEM_TYPE__PUNCT: Constant for the ‘punct’ value of the ‘type’ attribute in ‘item’ elements. |
static String | FLEX_ITEM_TYPE__TXT: Constant for the ‘txt’ value of the ‘type’ attribute in ‘item’ elements. |
static String | FLEX_LANGUAGE__ENCODING_ATTR: Constant for the ‘encoding’ attribute in ‘language’ elements. |
static String | FLEX_LANGUAGE__FONT_ATTR: Constant for the ‘font’ attribute in ‘language’ elements. |
static String | FLEX_LANGUAGE__VERNACULAR_ATTR: Constant for the ‘vernacular’ attribute in ‘language’ elements. |
static String | ITEM_LAYER_MORPH: A constant for the name of the layer that includes the morphological STokens. |
static String | ITEM_LAYER_PHRASE: A constant for the name of the layer that includes the phrase segmentation SSpans. |
static String | ITEM_LAYER_WORD: A constant for the name of the layer that includes the lexical STokens. |
static String | PROCESSING__ACTIVE_ELEMENT_VALUE: Constant for the ‘activeElement’ variable used during processing. |
static String | PROCESSING__KEY_VALUE_SEPARATOR: Constant for the key-value separator ‘=’ used during processing. |
static String | PROCESSING__UNDERSCORE: Constant for the underscore character used to compose annotation values dynamically during processing. |
static String | TAG_INTERLINEAR_TEXT: Constant to address the xml-element interlinear-text. |
static String | TAG_ITEM: Constant to address the xml-element item. |
static String | TAG_LANGUAGE: Constant to address the xml-element language. |
static String | TAG_LANGUAGES: Constant to address the xml-element languages. |
static String | TAG_MORPH: Constant to address the xml-element morph. |
static String | TAG_MORPHEMES: Constant to address the xml-element morphemes. |
static String | TAG_PARAGRAPH: Constant to address the xml-element paragraph. |
static String | TAG_PHRASE: Constant to address the xml-element phrase. |
static String | TAG_SEQNUM: A constant for the name of the annotation recording the sequential numbering of segments. |
static String | TAG_WORD: Constant to address the xml-element word. |
static String | TAG_WORDS: Constant to address the xml-element words. |
static String | TOKEN_LAYER_LEXICAL: A constant for the name of the layer that includes the lexical data nodes, used for processing the data source. |
static String | TOKEN_LAYER_MORPHOLOGICAL: A constant for the name of the layer that includes the morphological data nodes, used for processing the data source. |
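As a usage sketch, constants like these can serve as element and attribute names when reading a FLExText file with the JDK's StAX API (`javax.xml.stream`). The constant values below are assumptions (each is taken to equal the XML name it addresses), and `FlexItemReader` with its `type/lang=text` output format is hypothetical; it is not the importer that uses this interface.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class FlexItemReader {
    // Assumed values: each constant is taken to equal the XML name it addresses.
    static final String TAG_WORD = "word";
    static final String TAG_ITEM = "item";
    static final String FLEX__TYPE_ATTR = "type";
    static final String FLEX__LANG_ATTR = "lang";

    /** Collects "type/lang=text" for every 'item' that is a direct child of a 'word'. */
    static List<String> wordItems(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // names of the currently open elements
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (TAG_ITEM.equals(name) && TAG_WORD.equals(open.peek())) {
                    String type = r.getAttributeValue(null, FLEX__TYPE_ATTR);
                    String lang = r.getAttributeValue(null, FLEX__LANG_ATTR);
                    // getElementText() consumes the item and stops on its END_ELEMENT,
                    // so this item is never pushed and never popped.
                    out.add(type + "/" + lang + "=" + r.getElementText());
                } else {
                    open.push(name);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                open.pop();
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<document><interlinear-text><paragraphs><paragraph><phrases><phrase><words>"
                + "<word><item type=\"txt\" lang=\"en\">dogs</item>"
                + "<morphemes><morph><item type=\"txt\" lang=\"en\">dog</item></morph></morphemes>"
                + "<item type=\"gls\" lang=\"de\">Hunde</item></word>"
                + "</words></phrase></phrases></paragraph></paragraphs></interlinear-text></document>";
        System.out.println(wordItems(xml)); // [txt/en=dogs, gls/de=Hunde]
    }
}
```

Tracking the open elements on a stack is what keeps the morph-level ‘item’ ("dog") out of the result, since only items whose direct parent is ‘word’ are collected.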
static final String ITEM_LAYER_MORPH
A constant for the name of the layer that includes the morphological STokens.
static final String ITEM_LAYER_WORD
A constant for the name of the layer that includes the lexical STokens.
static final String ITEM_LAYER_PHRASE
A constant for the name of the layer that includes the phrase segmentation SSpans.
static final String TOKEN_LAYER_LEXICAL
A constant for the name of the layer that includes the lexical data nodes, used for processing data source.
static final String TOKEN_LAYER_MORPHOLOGICAL
A constant for the name of the layer that includes the morphological data nodes, used for processing data source.
static final String TAG_PARAGRAPH
Constant to address the xml-element paragraph.
‘paragraph’ elements are the top-level segments in a document.
static final String TAG_ITEM
Constant to address the xml-element item.
‘item’ is a generic element, so items carry their domain information in their ‘type’ attribute.
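Because ‘item’ is generic, code that consumes items has to branch on the ‘type’ attribute. The following sketch assumes that FLEX_ITEM_TYPE__TXT and FLEX_ITEM_TYPE__PUNCT hold the strings ‘txt’ and ‘punct’; `ItemTypeSketch` and its three categories are hypothetical.

```java
final class ItemTypeSketch {
    // Assumed values, following the descriptions of FLEX_ITEM_TYPE__TXT and FLEX_ITEM_TYPE__PUNCT.
    static final String FLEX_ITEM_TYPE__TXT = "txt";
    static final String FLEX_ITEM_TYPE__PUNCT = "punct";

    /** Hypothetical classification of an item by the value of its 'type' attribute. */
    static String classify(String type) {
        if (FLEX_ITEM_TYPE__TXT.equals(type)) {
            return "text"; // the form of a word or morph
        }
        if (FLEX_ITEM_TYPE__PUNCT.equals(type)) {
            return "punctuation";
        }
        return "annotation:" + type; // any other type, e.g. "gls"
    }

    public static void main(String[] args) {
        for (String type : new String[] {"txt", "punct", "gls"}) {
            System.out.println(type + " -> " + classify(type));
        }
    }
}
```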
static final String TAG_LANGUAGES
Constant to address the xml-element languages.
‘languages’ is a container for ‘language’ elements.
static final String TAG_SEQNUM
A constant for the name of the annotation recording sequential numbering of segments.
static final String TAG_WORDS
Constant to address the xml-element words.
‘words’ is a container for ‘word’ elements.
static final String TAG_LANGUAGE
Constant to address the xml-element language.
static final String TAG_MORPHEMES
Constant to address the xml-element morphemes.
‘morphemes’ is a container for ‘morph’ elements.
static final String TAG_INTERLINEAR_TEXT
Constant to address the xml-element interlinear-text.
This corresponds to the Salt element SDocument.
‘interlinear-text’ can, to current knowledge, have the following child elements:
- paragraphs
- languages
- item
static final String TAG_PHRASE
Constant to address the xml-element phrase.
static final String TAG_MORPH
Constant to address the xml-element morph.
static final String TAG_WORD
Constant to address the xml-element word.
static final String FLEX__TYPE_ATTR
Constant for the ‘type’ attribute in ‘item’ elements.
static final String FLEX__LANG_ATTR
Constant for the ‘lang’ attribute in ‘item’ elements.
static final String FLEX__ANALYSIS_STATUS_ATTR
Constant for the ‘analysisStatus’ attribute in ‘item’ elements.
static final String FLEX_LANGUAGE__ENCODING_ATTR
Constant for the ‘encoding’ attribute in ‘language’ elements.
static final String FLEX_LANGUAGE__FONT_ATTR
Constant for the ‘font’ attribute in ‘language’ elements.
static final String FLEX_LANGUAGE__VERNACULAR_ATTR
Constant for the ‘vernacular’ attribute in ‘language’ elements.
static final String FLEX_ITEM_TYPE__TXT
Constant for the ‘txt’ value of the ‘type’ attribute in ‘item’ elements.
static final String FLEX_ITEM_TYPE__PUNCT
Constant for the ‘punct’ value of the ‘type’ attribute in ‘item’ elements.
static final String PROCESSING__KEY_VALUE_SEPARATOR
Constant for the key-value separator ‘=’ used during processing.
static final String PROCESSING__ACTIVE_ELEMENT_VALUE
Constant for the ‘activeElement’ variable used during processing.
static final String PROCESSING__UNDERSCORE
Constant for the underscore character used to compose annotation values dynamically during processing.
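How the two processing constants are combined is not documented here. One hypothetical use, mirroring the item "type"_"lang":value notation in the Salt model diagram, is to join an item's type and lang with the underscore and to render the result with the key-value separator. The values ‘_’ and ‘=’ follow the descriptions above; `KeySketch` is an illustrative class, not part of this interface.

```java
final class KeySketch {
    // Assumed values, following the descriptions of the two processing constants.
    static final String PROCESSING__UNDERSCORE = "_";
    static final String PROCESSING__KEY_VALUE_SEPARATOR = "=";

    /** Builds a hypothetical "type_lang" key. */
    static String key(String type, String lang) {
        return type + PROCESSING__UNDERSCORE + lang;
    }

    /** Renders a hypothetical key-value pair such as "gls_en=dog". */
    static String pair(String type, String lang, String value) {
        return key(type, lang) + PROCESSING__KEY_VALUE_SEPARATOR + value;
    }

    public static void main(String[] args) {
        System.out.println(pair("gls", "en", "dog")); // gls_en=dog
    }
}
```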
Copyright © 2011–2018 Humboldt-Universität zu Berlin. All rights reserved.