Formats

Corpus Gesproken Nederlands

exception pynlpl.formats.cgn.InvalidFeatureException
exception pynlpl.formats.cgn.InvalidTagException
pynlpl.formats.cgn.parse_cgn_postag(rawtag, raisefeatureexceptions=False)

GIZA++

class pynlpl.formats.giza.GizaModel(filename, encoding='utf-8')
class pynlpl.formats.giza.GizaSentenceAlignment(sourceline, targetline, index)
getalignedtarget(index)

Returns target range only if source index aligns to a single consecutive range of target tokens.
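
The rule above can be illustrated without any GIZA++ data. This is a standalone sketch, not the pynlpl implementation: the function name `aligned_target_range` and the representation of an alignment as a list of (source, target) index pairs are assumptions made for illustration.

```python
def aligned_target_range(pairs, source_index):
    """Return (first, last) target indices aligned to source_index,
    or None unless they form one consecutive, gap-free range."""
    targets = sorted({t for s, t in pairs if s == source_index})
    if not targets:
        return None
    # Consecutive means every index between first and last is present.
    if targets[-1] - targets[0] + 1 != len(targets):
        return None
    return (targets[0], targets[-1])


# Hypothetical alignment: source 0 -> targets 0 and 1; source 1 -> targets 2 and 4.
pairs = [(0, 0), (0, 1), (1, 2), (1, 4)]
print(aligned_target_range(pairs, 0))  # one consecutive range
print(aligned_target_range(pairs, 1))  # gap at target 3, so no range
```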

intersect(other)
class pynlpl.formats.giza.IntersectionAlignment(source2target, target2source, encoding=False)
reset()
class pynlpl.formats.giza.MultiWordAlignment(filename, encoding=False)

Source to Target alignment: reads source-target.A3.final files, in which each source word may be aligned to multiple target words (adapted from code by Sander Canisius)

reset()
targetword(index, targetwords, alignment)

Return the aligned target word for a specified index in the source words. Multiple words are concatenated with a single space in between.

targetwords(index, targetwords, alignment)

Return the aligned targetwords for a specified index in the source words
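
The difference between targetwords() and targetword() can be sketched in plain Python. This is an illustration only: the helper names and the dictionary mapping a source index to a list of target indices are assumptions, not the data structures pynlpl necessarily uses.

```python
def target_words(index, targetwords, alignment):
    """Return the list of target words aligned to the source word at `index`."""
    return [targetwords[i] for i in alignment.get(index, [])]


def target_word(index, targetwords, alignment):
    """Same as target_words(), but joined into one space-separated string."""
    return " ".join(target_words(index, targetwords, alignment))


# Hypothetical data: source index -> aligned target indices.
targetwords = ["het", "grote", "huis"]
alignment = {0: [0], 1: [1, 2]}
print(target_words(1, targetwords, alignment))
print(target_word(1, targetwords, alignment))
```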

class pynlpl.formats.giza.WordAlignment(filename, encoding=False)

Target to Source alignment: reads target-source.A3.final files, in which each source word is aligned to one target word

reset()
targetword(index, targetwords, alignment)

Return the aligned targetword for a specified index in the source words

pynlpl.formats.giza.parseAlignment(tokens)

Moses

class pynlpl.formats.moses.PTFactory(phrasetable)
protocol

alias of PTProtocol

class pynlpl.formats.moses.PTProtocol
lineReceived(phrase)
class pynlpl.formats.moses.PhraseTable(filename, quiet=False, reverse=False, delimiter='|||', score_column=3, max_sourcen=0, sourceencoder=None, targetencoder=None, scorefilter=None)
class pynlpl.formats.moses.PhraseTableClient(host='localhost', port=65432)
class pynlpl.formats.moses.PhraseTableServer(phrasetable, port=65432)

SoNaR

class pynlpl.formats.sonar.Corpus(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
class pynlpl.formats.sonar.CorpusDocument(filename, encoding='iso-8859-15')

This class represents one document/text of the Corpus (read-only)

paragraphs(with_id=False)

Extracts paragraphs, returns list of plain-text(!) paragraphs

sentences()

Iterate over all sentences in the document, yielding (sentence_id, sentence) pairs. Each sentence is a list of 4-tuples (word, id, pos, lemma).

words()
class pynlpl.formats.sonar.CorpusDocumentX(filename, tree=None, index=True)

This class represents one document/text of the Corpus, loaded into memory at once and retaining the full structure

paragraphs(node=None)

Iterate over paragraphs

save(filename=None, encoding='iso-8859-15')
sentences(node=None)

Iterate over sentences

validate(formats_dir='../formats/')

Checks if the document is valid

words(node=None)

Iterate over words

xpath(expression)

Executes an xpath expression using the correct namespaces

class pynlpl.formats.sonar.CorpusFiles(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
class pynlpl.formats.sonar.CorpusX(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
pynlpl.formats.sonar.ns(namespace)

Resolves the namespace identifier to a full URL

FoLiA

See folia.html

Taggerdata

class pynlpl.formats.taggerdata.Taggerdata(filename, encoding='utf-8', mode='r')
align(referencewords, datatuple)

Align the reference sentence with the tagged data

close()
next()
reset()
write(sentence)

TiMBL

class pynlpl.formats.timbl.TimblOutput(stream, delimiter=' ', ignorecolumns=[], ignorevalues=[])

A class for reading TiMBL classifier output. Supports the +v+db option and ignores comments starting with #.

parseDistribution(instance, start, end=None)
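
What reading such output involves can be sketched as follows. This is not the TimblOutput implementation. It assumes a line layout of feature values, the reference class, the predicted class, and then a braced class distribution such as `{ N 3.00000, ADJ 1.00000 }`; the function name `parse_timbl_line` is hypothetical.

```python
def parse_timbl_line(line, delimiter=" "):
    """Parse one output line into (features, gold, predicted, distribution).

    Returns None for blank lines and for comments starting with '#'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    distribution = {}
    if "{" in line and "}" in line:
        head, rest = line.split("{", 1)
        body = rest.split("}", 1)[0]
        for entry in body.split(","):
            entry = entry.strip()
            if entry:
                # Each entry is assumed to be "<class label> <score>".
                label, score = entry.rsplit(None, 1)
                distribution[label] = float(score)
    else:
        head = line
    fields = head.strip().split(delimiter)
    return fields[:-2], fields[-2], fields[-1], distribution


print(parse_timbl_line("de grote N N { N 3.00000, ADJ 1.00000 }"))
```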

FoLiA

class formats.folia.AbstractAnnotation(doc, *args, **kwargs)
class formats.folia.AbstractAnnotationLayer(doc, *args, **kwargs)

Annotation layers for Span Annotation are derived from this abstract base class

OPTIONAL_ATTRIBS = (0, 6)
PRINTABLE = False
ROOTELEMENT = False
add(child, *args, **kwargs)
alternatives(Class=None, set=None)

Generator over alternatives, either all or only those of a specific annotation type, optionally restricted further by set.

Arguments:
  • Class - The Class you want to retrieve (e.g. PosAnnotation). Or set to None to select all alternatives regardless of what type they are.
  • set - The set you want to retrieve (defaults to None, which selects regardless of set)
Returns:
Generator over Alternative elements
annotation(type, set=None)

Will return a single annotation (even if there are multiple). Raises a NoSuchAnnotation exception if none was found

annotations(Class, set=None)

Obtain annotations. Very similar to select() but raises an error if the annotation was not found.

Arguments:
  • Class - The Class you want to retrieve (e.g. PosAnnotation)
  • set - The set you want to retrieve (defaults to None, which selects regardless of set)
Returns:
A list of elements
Raises:
NoSuchAnnotation if the specified annotation does not exist.
append(child, *args, **kwargs)
findspan(*words)

Returns the span element which spans over the specified words or morphemes

hasannotation(Class, set=None)

Returns an integer indicating whether such an annotation exists, and if so, how many. See annotations() for a description of the parameters.

classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None, origclass=None)

Returns a RelaxNG definition for this element (as an XML element (lxml.etree) rather than a string)

xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.AbstractCorrectionChild(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.AbstractTokenAnnotation'>, <class 'formats.folia.AbstractSpanAnnotation'>, <class 'formats.folia.Word'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
OPTIONAL_ATTRIBS = (2, 3, 5, 4)
PRINTABLE = True
ROOTELEMENT = False
TEXTDELIMITER = None
class formats.folia.AbstractDefinition
class formats.folia.AbstractElement(doc, *args, **kwargs)

This is the abstract base class from which all FoLiA elements are derived. This class should not be instantiated directly, but can be useful if you want to check whether a variable is an instance of any FoLiA element: isinstance(x, AbstractElement). It contains methods and variables that are commonly inherited.

ACCEPTED_DATA = ()
ANNOTATIONTYPE = None
AUTH = True
OCCURRENCES = 0
OCCURRENCESPERSET = 1
OPTIONAL_ATTRIBS = ()
PRINTABLE = False
REQUIRED_ATTRIBS = ()
ROOTELEMENT = True
TEXTCONTAINER = False
TEXTDELIMITER = None
XMLTAG = None
add(child, *args, **kwargs)

High-level function that adds (appends) an annotation to an element. For token annotation elements that fit within the scope, it simply calls append(). For span annotation, it finds or creates the proper annotation layer and inserts the element there.

classmethod addable(Class, parent, set=None, raiseexceptions=True)

Tests whether a new element of this class can be added to the parent. Returns a boolean, or raises a ValueError exception unless raiseexceptions is set to False.

This will use OCCURRENCES, but may be overridden for more customised behaviour.

This method is mostly for internal use.

addidsuffix(idsuffix, recursive=True)
addtoindex(norecurse=[])

Makes sure this element (and all subelements), are properly added to the index

ancestor(*Classes)

Find the most immediate ancestor of the specified type, multiple classes may be specified

ancestors(Class=None)

Generator yielding all ancestors of this element, effectively back-tracing its path to the root element. A tuple of multiple classes may be specified.

append(child, *args, **kwargs)

Append a child element. Returns the added element

Arguments:
  • child - Instance or class

If an instance is passed as first argument, it will be appended. If a class derived from AbstractElement is passed as first argument, an instance will first be created and then appended.

Keyword arguments:
  • alternative= - If set to True, the element will be made into an alternative.

Generic example, passing a pre-generated instance:

word.append( folia.LemmaAnnotation(doc,  cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL ) )

Generic example, passing a class to be generated:

word.append( folia.LemmaAnnotation, cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL )

Generic example, setting text with a class:

word.append( "house", cls='original' )
context(size, placeholder=None, scope=None)

Returns this word in context, {size} words to the left, the current word, and {size} words to the right
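
The shape of the returned window can be sketched on a plain list of words. This is an illustration only: the standalone function below is hypothetical, it ignores the scope argument, and it assumes that missing positions at the edges are padded with the placeholder when one is given.

```python
def context(words, index, size, placeholder=None):
    """Return `size` words left of words[index], the word itself,
    and `size` words to its right, padding with `placeholder` if set."""
    left = words[max(0, index - size):index]
    right = words[index + 1:index + 1 + size]
    if placeholder is not None:
        left = [placeholder] * (size - len(left)) + left
        right = right + [placeholder] * (size - len(right))
    return left + [words[index]] + right


words = ["a", "b", "c", "d"]
print(context(words, 0, 2, placeholder="<pad>"))
print(context(words, 3, 1))
```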

copy(newdoc=None, idsuffix='')

Make a deep copy of this element and all its children. If idsuffix is a string, it is appended to the IDs of the copied elements; if set to True, a random idsuffix is generated, including a random 32-bit hash.

copychildren(newdoc=None, idsuffix='')

Generator creating a deep copy of the children of this element. If idsuffix is a string, it is appended to the IDs of the copied elements; if set to True, a random idsuffix is generated, including a random 32-bit hash.

count(Class, set=None, recursive=True, ignore=True, node=None)

Like select, but instead of returning the elements, it merely counts them

deepvalidation()
description()

Obtain the description associated with the element, will raise NoDescription if there is none

feat(subset)

Obtain the feature value of the specific subset. If a feature occurs multiple times, the values will be returned in a list.

Example:

sense = word.annotation(folia.Sense)
synset = sense.feat('synset')
classmethod findreplaceables(Class, parent, set=None, **kwargs)

Find replaceable elements. Auxiliary function used by replace(). Can be overridden for more fine-grained control. Mostly for internal use.

getindex(child, recursive=True, ignore=True)

Returns the index at which an element occurs. Recursive by default.

gettextdelimiter(retaintokenisation=False)

May return a customised text delimiter instead of the default for this class.

hastext(cls='current')

Does this element have text (of the specified class)?

incorrection()

Is this element part of a correction? If it is, it returns the Correction element (evaluating to True), otherwise it returns None

insert(index, child, *args, **kwargs)

Insert a child element at specified index. Returns the added element

If an instance is passed as the child argument, it will be inserted. If a class derived from AbstractElement is passed instead, an instance will first be created and then inserted.

Arguments:
  • index
  • child - Instance or class
Keyword arguments:
  • alternative= - If set to True, the element will be made into an alternative.
  • corrected= - Used only when passing strings to be made into TextContent elements.

Generic example, passing a pre-generated instance:

word.insert( 3, folia.LemmaAnnotation(doc,  cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL ) )

Generic example, passing a class to be generated:

word.insert( 3, folia.LemmaAnnotation, cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL )

Generic example, setting text:

word.insert( 3, "house" )
items(founditems=[])

Returns a depth-first flat list of all items below this element (not limited to AbstractElement)

json(attribs=None, recurse=True)
leftcontext(size, placeholder=None, scope=None)

Returns the left context for an element, as a list. This method crosses sentence/paragraph boundaries by default, which can be restricted by setting scope

next(Class=True, scope=True, reverse=False)

Returns the next element, if it is of the specified type and if it does not cross the boundary of the defined scope. Returns None if no next element is found. Non-authoritative elements are never returned.

Arguments:
  • Class: The class to select; any Python class subclassed off AbstractElement, may also be a tuple of multiple classes. Set to True to constrain to the same class as that of the current instance, set to None to not constrain at all
  • scope: A list of classes which are never crossed looking for a next element. Set to True to constrain to a default list of structure elements (Sentence, Paragraph, Division, Event, ListItem, Caption), set to None to not constrain at all.
originaltext()

Alias for retrieving the original, uncorrected text

classmethod parsexml(Class, node, doc)

Internal class method used for turning an XML element into an instance of the Class.

Args:
  • node - XML Element
  • doc - Document
Returns:
An instance of the current Class.
postappend()

This method will be called after an element is added to another. It can do extra checks and if necessary raise exceptions to prevent addition. By default makes sure the right document is associated.

This method is mostly for internal use.

previous(Class=True, scope=True)

Returns the previous element, if it is of the specified type and if it does not cross the boundary of the defined scope. Returns None if no previous element is found. Non-authoritative elements are never returned.

Arguments:
  • Class: The class to select; any Python class subclassed off AbstractElement. Set to True to constrain to the same class as that of the current instance, set to None to not constrain at all
  • scope: A list of classes which are never crossed looking for a previous element. Set to True to constrain to a default list of structure elements (Sentence, Paragraph, Division, Event, ListItem, Caption), set to None to not constrain at all.
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None, origclass=None)

Returns a RelaxNG definition for this element (as an XML element (lxml.etree) rather than a string)

remove(child)

Removes the child element

replace(child, *args, **kwargs)

Appends a child element like append(), but replaces any existing child element of the same type and set. If no such child element exists, this will act the same as append()

Keyword arguments:
  • alternative - If set to True, the replaced element will be made into an alternative. Simply use append() if you want the added element to be an alternative.

See append() for more information.

resolveword(id)
rightcontext(size, placeholder=None, scope=None)

Returns the right context for an element, as a list. This method crosses sentence/paragraph boundaries by default, which can be restricted by setting scope

select(Class, set=None, recursive=True, ignore=True, node=None)

Select child elements of the specified class.

A further restriction can be made based on set. Whether or not to apply recursively (by default enabled) can also be configured, optionally with a list of elements never to recurse into.

Arguments:
  • Class: The class to select; any Python class subclassed off AbstractElement

  • set: The set to match against, only elements pertaining to this set will be returned. If set to None (default), all elements regardless of set will be returned.

  • recursive: Select recursively? Descending into child elements? Boolean defaulting to True.

  • ignore: A list of Classes to ignore. If set to True instead of a list, all non-authoritative elements are skipped (this is the default behaviour). It is common not to want to recurse into the following elements: folia.Alternative, folia.AlternativeLayers, folia.Suggestion, and folia.Original; the elements contained in these are never authoritative. You may also include the boolean True as a member of a list if you want to skip additional tags along with the non-authoritative ones.

  • node: Reserved for internal usage, used in recursion.

Returns:
A generator of elements (instances)

Example:

text.select(folia.Sense, 'cornetto', True, [folia.Original, folia.Suggestion, folia.Alternative] )
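
The recursive selection with an ignore list described above can be sketched on a toy element tree. The classes and the standalone `select` function below are hypothetical stand-ins, not the FoLiA classes, and the sketch leaves out the set filter and the True shortcut for non-authoritative elements.

```python
class Node:
    def __init__(self, *children):
        self.children = list(children)


class Word(Node):
    pass


class Sentence(Node):
    pass


class Alternative(Node):
    pass


def select(node, cls, recursive=True, ignore=()):
    """Yield descendants of `node` that are instances of `cls`,
    never descending into (or yielding) instances of classes in `ignore`."""
    for child in node.children:
        if isinstance(child, tuple(ignore)):
            continue  # skip this element and everything below it
        if isinstance(child, cls):
            yield child
        if recursive:
            yield from select(child, cls, recursive, ignore)


w1, w2, w3 = Word(), Word(), Word()
tree = Node(Sentence(w1, Alternative(w2)), w3)
print(len(list(select(tree, Word))))
print(len(list(select(tree, Word, ignore=(Alternative,)))))
```

As in the method documented above, the result is a generator, so wrap it in list() when the elements are needed more than once.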
setdoc(newdoc)

Set a different document. There is usually no need to call this directly; it is invoked implicitly by copy().

setdocument(doc)

Associate a document with this element

setparents()

Correct all parent relations for elements within the scope. There is usually no need to call this directly; it is invoked implicitly by copy().

settext(text, cls='current')

Set the text for this element (and class)

stricttext(cls='current')

Get the text strictly associated with this element (of the specified class). Does not recurse into children, with the sole exception of Correction/New.

text(cls='current', retaintokenisation=False, previousdelimiter='')

Get the text associated with this element (of the specified class), will always be a unicode instance. If no text is directly associated with the element, it will be obtained from the children. If that doesn’t result in any text either, a NoSuchText exception will be raised.

If retaintokenisation is True, the space attribute on words will be ignored, otherwise it will be adhered to and text will be detokenised as much as possible.
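
The effect of retaintokenisation can be sketched with tokens that carry a space flag. This is an illustration under assumptions: the function name `join_tokens` and the (text, space) pair representation are hypothetical, with `space` standing in for the space attribute on words.

```python
def join_tokens(tokens, retaintokenisation=False):
    """Join (text, space) pairs into a string.

    `space` tells whether the token is followed by a space in the
    detokenised text. With retaintokenisation=True the flag is ignored
    and every token boundary is rendered as a space.
    """
    parts = []
    for i, (text, space) in enumerate(tokens):
        parts.append(text)
        is_last = i == len(tokens) - 1
        if not is_last and (retaintokenisation or space):
            parts.append(" ")
    return "".join(parts)


tokens = [("Hello", False), (",", True), ("world", False), ("!", True)]
print(join_tokens(tokens))
print(join_tokens(tokens, retaintokenisation=True))
```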

textcontent(cls='current')

Get the text explicitly associated with this element (of the specified class). Returns the TextContent instance rather than the actual text. Raises NoSuchText exception if not found.

Unlike text(), this method does not recurse into child elements (with the sole exception of the Correction/New element), and it returns the TextContent instance rather than the actual text!

toktext(cls='current')

Alias for text with retaintokenisation=True

updatetext()

Internal method, recompute textual value. Only for elements that are a TEXTCONTAINER

xml(attribs=None, elements=None, skipchildren=False)

Serialises the FoLiA element to XML, by returning an XML Element (in lxml.etree) for this element and all its children. For string output, consider the xmlstring() method instead.

xmlstring(pretty_print=False)

Serialises this FoLiA element to XML, returns a (unicode) string with XML representation for this element and all its children.

class formats.folia.AbstractExtendedTokenAnnotation(doc, *args, **kwargs)
class formats.folia.AbstractSpanAnnotation(doc, *args, **kwargs)

Abstract element, all span annotation elements are derived from this class

OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = ()
add(child, *args, **kwargs)
addtoindex(norecurse=None)
annotation(type, set=None)

Will return a single annotation (even if there are multiple). Raises a NoSuchAnnotation exception if none was found

append(child, *args, **kwargs)
copychildren(newdoc=None, idsuffix='')

Generator creating a deep copy of the children of this element. If idsuffix is a string, it is appended to the IDs of the copied elements; if set to True, a random idsuffix is generated, including a random 32-bit hash.

hasannotation(Class, set=None)

Returns an integer indicating whether such an annotation exists, and if so, how many. See annotations() for a description of the parameters.

setspan(*args)

Sets the span of the span element anew, erases all data inside

wrefs(index=None)

Returns a list of word references, these can be Words but also Morphemes or Phonemes.

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning the list of all
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.AbstractSpanRole(doc, *args, **kwargs)
OPTIONAL_ATTRIBS = (0, 2, 4, 5)
REQUIRED_ATTRIBS = ()
ROOTELEMENT = False
class formats.folia.AbstractStructureElement(doc, *args, **kwargs)

Abstract element, all structure elements inherit from this class. Never instantiated directly.

OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = (0,)
TEXTDELIMITER = '\n\n'
append(child, *args, **kwargs)

See AbstractElement.append()

hasannotationlayer(annotationtype=None, set=None)

Does the specified annotation layer exist?

layers(annotationtype=None, set=None)

Returns a list of annotation layers found directly under this element, does not include alternative layers

paragraphs(index=None)

Returns a generator of Paragraph elements found (recursively) under this element.

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning the generator of all
resolveword(id)
sentences(index=None)

Returns a generator of Sentence elements found (recursively) under this element

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning a generator of all
words(index=None)

Returns a generator of Word elements found (recursively) under this element.

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning the generator of all
class formats.folia.AbstractSubtokenAnnotation(doc, *args, **kwargs)

Abstract element, all subtoken annotation elements are derived from this class

OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = ()
class formats.folia.AbstractTextMarkup(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.AbstractTextMarkup'>,)
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = ()
ROOTELEMENT = False
TEXTCONTAINER = True
TEXTDELIMITER = ''
json(attribs=None, recurse=True)
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
resolve()
settext(text)
text()

Obtain the text (unicode instance)

xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.AbstractTokenAnnotation(doc, *args, **kwargs)

Abstract element, all token annotation elements are derived from this class

OCCURRENCESPERSET = 1
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
REQUIRED_ATTRIBS = (1,)
append(child, *args, **kwargs)

See AbstractElement.append()

class formats.folia.ActorFeature(doc, *args, **kwargs)

Actor feature, to be used within Event

SUBSET = 'actor'
XMLTAG = None
class formats.folia.AlignReference(doc, *args, **kwargs)
REQUIRED_ATTRIBS = (0,)
XMLTAG = 'aref'
json(attribs=None, recurse=True)
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
resolve(alignmentcontext)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.Alignment(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.AlignReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 28
OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = False
REQUIRED_ATTRIBS = ()
XMLTAG = 'alignment'
json(attribs=None)
resolve()
class formats.folia.AllowCorrections
correct(**kwargs)

Apply a correction (TODO: documentation to be written still)

class formats.folia.AllowGenerateID

Classes inherited from this class allow for automatic ID generation, using the convention of adding a period, the name of the element, another period, and a sequence number.

generate_id(cls)
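
The ID convention described above can be sketched as follows. This is not the pynlpl implementation: the class name `IdGenerator` is hypothetical, and starting the sequence at 1 with one counter per element name is an assumption.

```python
from collections import defaultdict


class IdGenerator:
    """Builds IDs of the form <parent id>.<element name>.<sequence number>."""

    def __init__(self, parent_id):
        self.parent_id = parent_id
        self.counters = defaultdict(int)  # one counter per element name

    def generate_id(self, element_name):
        self.counters[element_name] += 1
        return "%s.%s.%d" % (self.parent_id, element_name, self.counters[element_name])


gen = IdGenerator("example.p.1")
first_sentence = gen.generate_id("s")
second_sentence = gen.generate_id("s")
first_word = gen.generate_id("w")
print(first_sentence, second_sentence, first_word)
```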
class formats.folia.AllowTokenAnnotation

Elements that allow token annotation (including extended annotation) must inherit from this class

alternatives(Class=None, set=None)

Generator over alternatives, either all or only those of a specific annotation type, optionally restricted further by set.

Arguments:
  • Class - The Class you want to retrieve (e.g. PosAnnotation). Or set to None to select all alternatives regardless of what type they are.
  • set - The set you want to retrieve (defaults to None, which selects regardless of set)
Returns:
Generator of Alternative elements
annotation(type, set=None)

Will return a single annotation (even if there are multiple). Raises a NoSuchAnnotation exception if none was found

annotations(Class, set=None)

Obtain annotations. Very similar to select() but raises an error if the annotation was not found.

Arguments:
  • Class - The Class you want to retrieve (e.g. PosAnnotation)
  • set - The set you want to retrieve (defaults to None, which selects regardless of set)
Returns:
A generator of elements
Raises:
NoSuchAnnotation if the specified annotation does not exist.
hasannotation(Class, set=None)

Returns an integer indicating whether such an annotation exists, and if so, how many. See annotations() for a description of the parameters.

class formats.folia.Alternative(doc, *args, **kwargs)

Element grouping alternative token annotation(s). Multiple alternative elements may occur, each denoting a different alternative. Elements grouped inside an alternative block are considered dependent.

ACCEPTED_DATA = [<class 'formats.folia.AbstractTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.MorphologyLayer'>]
ANNOTATIONTYPE = 19
AUTH = False
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = False
REQUIRED_ATTRIBS = ()
XMLTAG = 'alt'
class formats.folia.AlternativeLayers(doc, *args, **kwargs)

Element grouping alternative subtoken annotation(s). Multiple altlayers elements may occur, each denoting a different alternative. Elements grouped inside an alternative block are considered dependent.

ACCEPTED_DATA = (<class 'formats.folia.AbstractAnnotationLayer'>,)
AUTH = False
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = False
REQUIRED_ATTRIBS = ()
XMLTAG = 'altlayers'
class formats.folia.AnnotationType
ALIGNMENT = 28
ALTERNATIVE = 19
CHUNKING = 14
COMPLEXALIGNMENT = 29
COREFERENCE = 30
CORRECTION = 16
DEPENDENCY = 24
DIVISION = 2
DOMAIN = 11
ENTITY = 15
ERRORDETECTION = 18
EVENT = 23
FIGURE = 5
GAP = 26
LANG = 33
LEMMA = 10
LINEBREAK = 7
LIST = 4
METRIC = 32
MORPHOLOGICAL = 22
NOTE = 27
PARAGRAPH = 3
PART = 37
PHON = 20
POS = 9
SEMROLE = 31
SENSE = 12
SENTENCE = 8
STRING = 34
STYLE = 36
SUBJECTIVITY = 21
SUGGESTION = 17
SYNTAX = 13
TABLE = 35
TEXT = 0
TIMESEGMENT = 25
TOKEN = 1
WHITESPACE = 6
class formats.folia.AnnotatorType
AUTO = 1
MANUAL = 2
UNSET = 0
class formats.folia.Attrib
ALL = (0, 1, 2, 4, 3, 5)
ANNOTATOR = 2
CLASS = 1
CONFIDENCE = 3
DATETIME = 5
ID = 0
N = 4
SETONLY = 6
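
The numeric tuples listed for OPTIONAL_ATTRIBS and REQUIRED_ATTRIBS throughout this reference appear to be built from the Attrib constants above. A small standalone sketch for decoding them into names follows; the lookup table copies the values listed here, and the helper `attrib_names` is hypothetical.

```python
# Values copied from the Attrib constants listed above.
ATTRIB_NAMES = {
    0: "ID",
    1: "CLASS",
    2: "ANNOTATOR",
    3: "CONFIDENCE",
    4: "N",
    5: "DATETIME",
    6: "SETONLY",
}


def attrib_names(values):
    """Translate a tuple of Attrib values into their constant names."""
    return [ATTRIB_NAMES[v] for v in values]


# OPTIONAL_ATTRIBS of AbstractAnnotationLayer and AbstractCorrectionChild.
print(attrib_names((0, 6)))
print(attrib_names((2, 3, 5, 4)))
```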
class formats.folia.BegindatetimeFeature(doc, *args, **kwargs)

Begindatetime feature, to be used within Event

SUBSET = 'begindatetime'
XMLTAG = None
class formats.folia.BypassLeakFile
read(n=0)
readline()
class formats.folia.Caption(doc, *args, **kwargs)

Element used for captions for figures or tables, contains sentences

ACCEPTED_DATA = (<class 'formats.folia.Sentence'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Description'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Gap'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
OCCURRENCES = 1
XMLTAG = 'caption'
class formats.folia.Cell(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.Paragraph'>, <class 'formats.folia.Head'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.Word'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Event'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Linebreak'>, <class 'formats.folia.Whitespace'>, <class 'formats.folia.Gap'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 35
REQUIRED_ATTRIBS = ((),)
TEXTDELIMITER = ' | '
XMLTAG = 'cell'
class formats.folia.Chunk(doc, *args, **kwargs)

Chunk element, span annotation element to be used in ChunkingLayer

ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 14
REQUIRED_ATTRIBS = ()
XMLTAG = 'chunk'
class formats.folia.ChunkingLayer(doc, *args, **kwargs)

Chunking Layer: Annotation layer for Chunk span annotation elements

ACCEPTED_DATA = (<class 'formats.folia.Chunk'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 14
XMLTAG = 'chunking'
class formats.folia.ClassDefinition(id, label, constraints=[], subclasses=[])
json()
classmethod parsexml(Class, node, constraintindex)
class formats.folia.ConstraintDefinition(id, restrictions={}, exceptions={})
json()
classmethod parsexml(Class, node, constraintindex)
class formats.folia.Content(doc, *args, **kwargs)
OCCURRENCES = 1
XMLTAG = 'content'
json(attribs=None, recurse=True)
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.CoreferenceChain(doc, *args, **kwargs)

Coreference chain. Consists of coreference links.

ACCEPTED_DATA = (<class 'formats.folia.CoreferenceLink'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 30
REQUIRED_ATTRIBS = ()
XMLTAG = 'coreferencechain'
class formats.folia.CoreferenceLayer(doc, *args, **kwargs)

Coreference Layer: Annotation layer for CoreferenceChain span annotation elements

ACCEPTED_DATA = (<class 'formats.folia.CoreferenceChain'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 30
XMLTAG = 'coreferences'
class formats.folia.CoreferenceLink(doc, *args, **kwargs)

Coreference link. Used in coreferencechain.

ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Headspan'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.ModalityFeature'>, <class 'formats.folia.TimeFeature'>, <class 'formats.folia.LevelFeature'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 30
OPTIONAL_ATTRIBS = (2, 4, 5)
REQUIRED_ATTRIBS = ()
ROOTELEMENT = False
XMLTAG = 'coreferencelink'
class formats.folia.Corpus(corpusdir, extension='xml', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False, **kwargs)

A corpus of various FoLiA documents. Yields a Document on each iteration. Suitable for sequential processing.

class formats.folia.CorpusFiles(corpusdir, extension='xml', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False, **kwargs)

A corpus of various FoLiA documents. Yields the filenames on each iteration.

class formats.folia.CorpusProcessor(corpusdir, function, threads=None, extension='xml', restrict_to_collection='', conditionf=<function CorpusProcessor.<lambda>>, maxtasksperchild=100, preindex=False, ordered=True, chunksize=1)

Processes a corpus of various FoLiA documents using parallel processing. Calls a user-defined function with the three-tuple (filename, args, kwargs) for each file in the corpus. The user-defined function is itself responsible for instantiating a FoLiA document. The args and kwargs received by the custom function are set through the run() method, which yields the result of the custom function on each iteration.

execute()
run(*args, **kwargs)
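
The calling convention described above, one (filename, args, kwargs) three-tuple per file with results yielded in order, can be sketched with the standard library alone. This is not the CorpusProcessor implementation: the names `run` and `name_length` are hypothetical, and a thread pool is used here only to keep the sketch self-contained.

```python
from multiprocessing.pool import ThreadPool


def name_length(task):
    """Example worker: receives the three-tuple and returns a result.

    A real worker would load the FoLiA document from `filename` itself.
    """
    filename, args, kwargs = task
    return filename, len(filename) * kwargs.get("factor", 1)


def run(filenames, function, threads, args=(), kwargs=None):
    """Yield function((filename, args, kwargs)) for each file, in input order."""
    kwargs = kwargs or {}
    tasks = [(filename, args, kwargs) for filename in filenames]
    with ThreadPool(threads) as pool:
        # imap() preserves input order, comparable to ordered=True above.
        for result in pool.imap(function, tasks):
            yield result


results = list(run(["a.xml", "bb.xml"], name_length, 2, kwargs={"factor": 2}))
print(results)
```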
class formats.folia.Correction(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.New'>, <class 'formats.folia.Original'>, <class 'formats.folia.Current'>, <class 'formats.folia.Suggestion'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 16
OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = ()
ROOTELEMENT = True
TEXTDELIMITER = None
XMLTAG = 'correction'
append(child, *args, **kwargs)

See AbstractElement.append()

current(index=None)
gettextdelimiter(retaintokenisation=False)

May return a customised text delimiter instead of the default for this class.

hascurrent()
hasnew()
hasoriginal()
hassuggestions()
new(index=None)
original(index=None)
suggestions(index=None)
text(cls='current', retaintokenisation=False, previousdelimiter='')
textcontent(cls='current')

Get the text explicitly associated with this element (of the specified class). Returns the TextContent instance rather than the actual text. Raises NoSuchText exception if not found.

Unlike text(), this method does not recurse into child elements (with the sole exception of the Correction/New element), and it returns the TextContent instance rather than the actual text!

class formats.folia.Current(doc, *args, **kwargs)
OCCURRENCES = 1
OPTIONAL_ATTRIBS = ((),)
REQUIRED_ATTRIBS = ((),)
XMLTAG = 'current'
classmethod addable(Class, parent, set=None, raiseexceptions=True)
exception formats.folia.DeepValidationError
class formats.folia.DependenciesLayer(doc, *args, **kwargs)

Dependencies Layer: Annotation layer for Dependency span annotation elements. For dependency entities.

ACCEPTED_DATA = (<class 'formats.folia.Dependency'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 24
XMLTAG = 'dependencies'
class formats.folia.Dependency(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.Headspan'>, <class 'formats.folia.DependencyDependent'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 24
REQUIRED_ATTRIBS = ()
XMLTAG = 'dependency'
dependent()

Returns the dependent of the dependency relation. Instance of DependencyDependent

head()

Returns the head of the dependency relation. Instance of DependencyHead

class formats.folia.DependencyDependent(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 24
XMLTAG = 'dep'
formats.folia.DependencyHead

alias of Headspan

class formats.folia.Description(doc, *args, **kwargs)

Description is an element that can be used to associate a description with almost any other FoLiA element

OCCURRENCES = 1
XMLTAG = 'desc'
json(attribs=None, recurse=True)
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.Division(doc, *args, **kwargs)

Structure element representing some kind of division. Divisions may be nested at will, and may include almost all kinds of other structure elements.

ACCEPTED_DATA = (<class 'formats.folia.Division'>, <class 'formats.folia.Quote'>, <class 'formats.folia.Gap'>, <class 'formats.folia.Event'>, <class 'formats.folia.Head'>, <class 'formats.folia.Paragraph'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.List'>, <class 'formats.folia.Figure'>, <class 'formats.folia.Table'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Description'>, <class 'formats.folia.Linebreak'>, <class 'formats.folia.Whitespace'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 2
OPTIONAL_ATTRIBS = (1, 4)
REQUIRED_ATTRIBS = (0,)
TEXTDELIMITER = '\n\n\n'
XMLTAG = 'div'
head()
class formats.folia.Document(*args, **kwargs)

This is the FoLiA Document; all elements have to be associated with a FoLiA document. Besides holding elements, the document holds metadata, including declarations, and an index of all IDs.

IDSEPARATOR = '.'
append(text)

Add a text to the document.

Example 1:

    doc.append(folia.Text)

Example 2:

    doc.append( folia.Text(doc, id='example.text') )
count(Class, set=None)
create(Class, *args, **kwargs)

Create an element associated with this Document. This method may be obsolete and removed later.

date(value=None)

With no argument, gets the document’s date from the metadata. With an argument, sets the document’s date in the metadata.

declare(annotationtype, set, **kwargs)
declared(annotationtype, set)
defaultannotator(annotationtype, set=None)
defaultannotatortype(annotationtype, set=None)
defaultdatetime(annotationtype, set=None)
defaultset(annotationtype)
findwords(*args, **kwargs)
items()

Returns a depth-first flat list of all items in the document

json()
jsondeclarations()
language(value=None)

With no argument, gets the document’s language (ISO-639-3) from the metadata. With an argument, sets the document’s language (ISO-639-3) in the metadata.

license(value=None)

With no argument, gets the document’s license from the metadata. With an argument, sets the document’s license in the metadata.

load(filename)

Load a FoLiA or D-Coi XML file

paragraphs(index=None)

Return a generator of all paragraphs found in the document.

If an index is specified, return the n’th paragraph only (starting at 0)

parsemetadata(node)
parsexml(node, ParentClass=None)

Main XML parser, will invoke class-specific XML parsers. For internal use.

parsexmldeclarations(node)
publisher(value=None)

With no argument, gets the document’s publisher from the metadata. With an argument, sets the document’s publisher in the metadata.

save(filename=None)

Save the document to FoLiA XML.

Arguments:
  • filename=: The filename to save to. If not set (None), saves to the same file as loaded from.
select(Class, set=None, recursive=True, ignore=True)
sentences(index=None)

Return a generator of all sentences found in the document, except for sentences in quotes.

If an index is specified, return the n’th sentence only (starting at 0)

setcmdi(filename)
setimdi(node)
text(retaintokenisation=False)

Returns the text of the entire document (returns a unicode instance)

title(value=None)

With no argument, gets the document’s title from the metadata. With an argument, sets the document’s title in the metadata.

words(index=None)

Return a generator of all active words found in the document. Does not descend into annotation layers, alternatives, originals, suggestions.

If an index is specified, return the n’th word only (starting at 0)

xml()
xmldeclarations()
xmlmetadata()
xmlstring()
xpath(query)

Run an XPath expression and parse the resulting elements. Don’t forget to use the FoLiA namespace in your expressions, using folia: or the short form f:
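Why the namespace prefix matters can be shown with the standard library's ElementTree, which supports a limited XPath subset. This sketch does not call Document.xpath(); the sample document is a hand-written minimal fragment, and the namespace URI http://ilk.uvt.nl/folia is assumed to be the one the f: prefix is bound to.

```python
import xml.etree.ElementTree as ET

# Assumed FoLiA namespace URI, bound here to the short prefix f.
FOLIA_NS = {"f": "http://ilk.uvt.nl/folia"}

# Hand-written minimal fragment for illustration only.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Hello</t></w>
      <w xml:id="example.s.1.w.2"><t>world</t></w>
    </s>
  </text>
</FoLiA>"""

root = ET.fromstring(SAMPLE)
# Without the f: prefix the query matches nothing, because every
# element lives in the FoLiA namespace.
unprefixed = root.findall(".//w")
prefixed = root.findall(".//f:w", FOLIA_NS)
words = [w.find("f:t", FOLIA_NS).text for w in prefixed]
print(len(unprefixed), words)
```

The unprefixed query returns an empty list, while the prefixed query finds both words.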

class formats.folia.DomainAnnotation(doc, *args, **kwargs)

Domain annotation: an extended token annotation element

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 11
XMLTAG = 'domain'
exception formats.folia.DuplicateAnnotationError
exception formats.folia.DuplicateIDError

Exception raised when an identifier that is already in use is assigned again to another element
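The condition this exception guards against can be sketched with a plain dictionary. Both classes below are local, hypothetical stand-ins written for illustration; they are not the library's Document index or its exception class.

```python
class DuplicateIDError(Exception):
    """Local stand-in for the library exception of the same name."""


class IdIndex:
    """Hypothetical ID index: maps each identifier to exactly one element."""

    def __init__(self):
        self._elements = {}

    def register(self, identifier, element):
        # Refuse to bind an identifier that is already in use.
        if identifier in self._elements:
            raise DuplicateIDError(identifier)
        self._elements[identifier] = element


index = IdIndex()
index.register("example.s.1", object())
try:
    index.register("example.s.1", object())
    outcome = "accepted"
except DuplicateIDError as error:
    outcome = "duplicate: " + str(error)
print(outcome)
```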

class formats.folia.EnddatetimeFeature(doc, *args, **kwargs)

Enddatetime feature, to be used within Event

SUBSET = 'enddatetime'
XMLTAG = None
class formats.folia.EntitiesLayer(doc, *args, **kwargs)

Entities Layer: Annotation layer for Entity span annotation elements. For named entities.

ACCEPTED_DATA = (<class 'formats.folia.Entity'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 15
XMLTAG = 'entities'
class formats.folia.Entity(doc, *args, **kwargs)

Entity element, for named entities, span annotation element to be used in EntitiesLayer

ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 15
REQUIRED_ATTRIBS = ()
XMLTAG = 'entity'
class formats.folia.ErrorDetection(doc, *args, **kwargs)
ANNOTATIONTYPE = 18
OCCURRENCESPERSET = 0
ROOTELEMENT = True
XMLTAG = 'errordetection'
class formats.folia.Event(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.Event'>, <class 'formats.folia.Paragraph'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.Division'>, <class 'formats.folia.Word'>, <class 'formats.folia.Head'>, <class 'formats.folia.List'>, <class 'formats.folia.Figure'>, <class 'formats.folia.Table'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Feature'>, <class 'formats.folia.ActorFeature'>, <class 'formats.folia.BegindatetimeFeature'>, <class 'formats.folia.EnddatetimeFeature'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Metric'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 23
OCCURRENCESPERSET = 0
XMLTAG = 'event'
class formats.folia.External(doc, *args, **kwargs)
ACCEPTED_DATA = []
AUTH = True
OPTIONAL_ATTRIBS = ()
PRINTABLE = True
REQUIRED_ATTRIBS = ()
XMLTAG = 'external'
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
select(Class, set=None, recursive=True, ignore=True, node=None)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.Feature(doc, *args, **kwargs)

Feature elements can be used to associate subsets and subclasses with almost any annotation element

OCCURRENCESPERSET = 0
SUBSET = None
XMLTAG = 'feat'
json(attribs=None, recurse=True)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml()
class formats.folia.Figure(doc, *args, **kwargs)

Element for the representation of a graphical figure. Structure element.

ACCEPTED_DATA = (<class 'formats.folia.Sentence'>, <class 'formats.folia.Description'>, <class 'formats.folia.Caption'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 5
XMLTAG = 'figure'
caption()
json(attribs=None, recurse=True)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.FunctionFeature(doc, *args, **kwargs)

Function feature, to be used with morphemes

SUBSET = 'function'
XMLTAG = None
class formats.folia.Gap(doc, *args, **kwargs)

Gap element. Represents skipped portions of the text. Contains Content and Desc elements

ACCEPTED_DATA = (<class 'formats.folia.Content'>, <class 'formats.folia.Description'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 26
OPTIONAL_ATTRIBS = (0, 1, 2, 3, 4)
XMLTAG = 'gap'
content()
class formats.folia.Head(doc, *args, **kwargs)

Head element. A structure element. Acts as the header/title of a division. There may be one per division. Contains sentences.

ACCEPTED_DATA = (<class 'formats.folia.Sentence'>, <class 'formats.folia.Word'>, <class 'formats.folia.Description'>, <class 'formats.folia.Event'>, <class 'formats.folia.Reference'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Linebreak'>, <class 'formats.folia.Whitespace'>, <class 'formats.folia.Gap'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
OCCURRENCES = 1
TEXTDELIMITER = ' '
XMLTAG = 'head'
class formats.folia.HeadFeature(doc, *args, **kwargs)

Head feature, to be used within PosAnnotation

SUBSET = 'head'
XMLTAG = None
class formats.folia.Headspan(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>)
XMLTAG = 'hd'
class formats.folia.Label(doc, *args, **kwargs)

Element used for labels, mostly within list items. Contains words.

ACCEPTED_DATA = (<class 'formats.folia.Word'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Description'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
XMLTAG = 'label'
class formats.folia.LangAnnotation(doc, *args, **kwargs)

Language annotation: an extended token annotation element

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 33
XMLTAG = 'lang'
class formats.folia.LemmaAnnotation(doc, *args, **kwargs)

Lemma annotation: a token annotation element

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 10
XMLTAG = 'lemma'
class formats.folia.LevelFeature(doc, *args, **kwargs)

Level feature, to be used with coreferences

SUBSET = 'level'
XMLTAG = None
class formats.folia.Linebreak(doc, *args, **kwargs)

Line break element, signals a line break

ACCEPTED_DATA = ()
ANNOTATIONTYPE = 7
REQUIRED_ATTRIBS = ()
TEXTDELIMITER = '\n'
XMLTAG = 'br'
class formats.folia.List(doc, *args, **kwargs)

Element for enumeration/itemisation. Structure element. Contains ListItem elements.

ACCEPTED_DATA = (<class 'formats.folia.ListItem'>, <class 'formats.folia.Description'>, <class 'formats.folia.Caption'>, <class 'formats.folia.Event'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 4
TEXTDELIMITER = '\n'
XMLTAG = 'list'
class formats.folia.ListItem(doc, *args, **kwargs)

Single element in a List. Structure element. Contained within List element.

ACCEPTED_DATA = (<class 'formats.folia.List'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.Description'>, <class 'formats.folia.Label'>, <class 'formats.folia.Event'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Gap'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 4
XMLTAG = 'item'
exception formats.folia.MalformedXMLError
class formats.folia.MetaDataType
CMDI = 1
IMDI = 2
NATIVE = 0
class formats.folia.Metric(doc, *args, **kwargs)

Metric elements allow the annotation of any kind of metric with any kind of annotation element, allowing for example statistical measures to be added to elements as annotation.

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.ValueFeature'>, <class 'formats.folia.Description'>)
ANNOTATIONTYPE = 32
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
REQUIRED_ATTRIB = (1,)
XMLTAG = 'metric'
class formats.folia.ModalityFeature(doc, *args, **kwargs)

Modality feature, to be used with coreferences

SUBSET = 'modality'
XMLTAG = None
class formats.folia.Mode
ITERATIVE = 2
MEMORY = 0
XPATH = 1
exception formats.folia.ModeError
class formats.folia.Morpheme(doc, *args, **kwargs)

Morpheme element, represents one morpheme in morphological analysis, subtoken annotation element to be used in MorphologyLayer

ACCEPTED_DATA = (<class 'formats.folia.FunctionFeature'>, <class 'formats.folia.Feature'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.AbstractTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Description'>)
ANNOTATIONTYPE = 22
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5)
REQUIRED_ATTRIBS = ((),)
XMLTAG = 'morpheme'
findspans(type, set=None)

Find span annotations of the specified type that include this word

class formats.folia.MorphologyLayer(doc, *args, **kwargs)

Morphology Layer: Annotation layer for Morpheme subtoken annotation elements. For morphological analysis.

ACCEPTED_DATA = (<class 'formats.folia.Morpheme'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 22
XMLTAG = 'morphology'
class formats.folia.NativeMetaData(*args, **kwargs)
items()
class formats.folia.New(doc, *args, **kwargs)
OCCURRENCES = 1
OPTIONAL_ATTRIBS = ((),)
REQUIRED_ATTRIBS = ((),)
XMLTAG = 'new'
classmethod addable(Class, parent, set=None, raiseexceptions=True)
exception formats.folia.NoDefaultError
exception formats.folia.NoDescription
exception formats.folia.NoSuchAnnotation

Exception raised when the requested type of annotation does not exist for the selected element

exception formats.folia.NoSuchText

Exception raised when the requested type of text content does not exist for the selected element

class formats.folia.Note(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.Paragraph'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.Word'>, <class 'formats.folia.Head'>, <class 'formats.folia.List'>, <class 'formats.folia.Figure'>, <class 'formats.folia.Table'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Feature'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Metric'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 27
OCCURRENCESPERSET = 0
XMLTAG = 'note'
class formats.folia.Original(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.AbstractTokenAnnotation'>, <class 'formats.folia.AbstractSpanAnnotation'>, <class 'formats.folia.Word'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
AUTH = False
OCCURRENCES = 1
OPTIONAL_ATTRIBS = ((),)
REQUIRED_ATTRIBS = ((),)
XMLTAG = 'original'
classmethod addable(Class, parent, set=None, raiseexceptions=True)
class formats.folia.Paragraph(doc, *args, **kwargs)

Paragraph element. A structure element. Represents a paragraph and holds all its sentences (and possibly other structure such as Whitespace and Quotes).

ACCEPTED_DATA = (<class 'formats.folia.Sentence'>, <class 'formats.folia.Quote'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Description'>, <class 'formats.folia.Linebreak'>, <class 'formats.folia.Whitespace'>, <class 'formats.folia.Gap'>, <class 'formats.folia.List'>, <class 'formats.folia.Figure'>, <class 'formats.folia.Event'>, <class 'formats.folia.Head'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 3
TEXTDELIMITER = '\n\n'
XMLTAG = 'p'
class formats.folia.Part(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.AbstractStructureElement'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 37
XMLTAG = 'part'
class formats.folia.Pattern(*args, **kwargs)
This class describes a pattern over words to be searched for. The Document.findwords() method can subsequently be called with this pattern, and it will return all the words that match. An example will best illustrate this. First, a trivial example of searching for one word:

    for match in doc.findwords( folia.Pattern('house') ):
        for word in match:
            print(word.id)
        print("----")

The same can be done for a sequence::

    for match in doc.findwords( folia.Pattern('a','big', 'house') ):
        for word in match:
            print(word.id)
        print("----")

The boolean value ``True`` acts as a wildcard, matching any word::

    for match in doc.findwords( folia.Pattern('a',True,'house') ):
        for word in match:
            print(word.id, word.text())
        print("----")

Alternatively, and more constraining, you may also specify a tuple of alternatives::

    for match in doc.findwords( folia.Pattern('a',('big','small'),'house') ):
        for word in match:
            print(word.id, word.text())
        print("----")

Or even a regular expression using the ``folia.RegExp`` class::

    for match in doc.findwords( folia.Pattern('a', folia.RegExp('b?g'),'house') ):
        for word in match:
            print(word.id, word.text())
        print("----")


Rather than searching on the text content of the words, you can search on the
classes of any kind of token annotation using the keyword argument
``matchannotation=``::

    for match in doc.findwords( folia.Pattern('det','adj','noun',matchannotation=folia.PosAnnotation ) ):
        for word in match:
            print(word.id, word.text())
        print("----")

The set can be restricted by adding the additional keyword argument
``matchannotationset=``. Case sensitivity, by default disabled, can be enabled by setting ``casesensitive=True``.

Things become even more interesting when different Patterns are combined. A
match will have to satisfy all patterns::

    for match in doc.findwords( folia.Pattern('a', True, 'house'), folia.Pattern('det','adj','noun',matchannotation=folia.PosAnnotation ) ):
        for word in match:
            print(word.id, word.text())
        print("----")


The ``findwords()`` method can be instructed to also return left and/or right context for any match. This is done using the ``leftcontext=`` and ``rightcontext=`` keyword arguments, whose values are the number of context words to include on each side of a match. For instance, we can look for the word house and return its immediate neighbours as follows::

    for match in doc.findwords( folia.Pattern('house') , leftcontext=1, rightcontext=1):
        for word in match:
            print(word.id)
        print("----")

A match here would thus always consist of three words instead of just one.
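The matching rules described so far (plain strings, the ``True`` wildcard, tuples of alternatives, and optional context) can be sketched over a plain list of tokens. This is an illustration only, not the library's algorithm: ``token_matches`` and ``find_matches`` are hypothetical names, matching is case-insensitive by default as stated above, and context is simply truncated at the sequence boundaries, which is an assumption.

```python
def token_matches(token, item, casesensitive=False):
    """Check one token against one pattern item."""
    if item is True:  # wildcard: matches any single word
        return True
    # A tuple lists alternatives; a plain string is a single alternative.
    options = item if isinstance(item, tuple) else (item,)
    if not casesensitive:
        token = token.lower()
        options = tuple(option.lower() for option in options)
    return token in options


def find_matches(tokens, pattern, leftcontext=0, rightcontext=0):
    """Yield token slices matching a fixed-width pattern, plus context."""
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(token_matches(t, p) for t, p in zip(window, pattern)):
            # Assumption: context is truncated at the sequence boundaries.
            begin = max(0, start - leftcontext)
            yield tokens[begin:start + width + rightcontext]


tokens = "We saw a big house and a small house today".split()
print(list(find_matches(tokens, ("a", ("big", "small"), "house"))))
print(list(find_matches(tokens, ("house",), leftcontext=1, rightcontext=1)))
```

With one word of context on each side, each match for ``house`` in this token list is three words long, in line with the remark above.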

Last, ``Pattern`` also has support for variable-width gaps; the asterisk symbol
has special meaning to this end::

    for match in doc.findwords( folia.Pattern('a','*','house') ):
        for word in match:
            print(word.id)
        print("----")

Unlike the pattern ``('a',True,'house')``, which by definition is a pattern of
three words, the pattern in the example above will match gaps of any length (up
to a certain built-in maximum), so this might include matches such as *a very
nice house*.
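One way to think about a variable-width gap is as a family of fixed-width patterns, one per gap length. The sketch below expands each ``'*'`` into runs of ``True`` wildcards. It is a hypothetical illustration: ``expand_gaps`` is not a library function, and both the minimum gap length of one word and the ``max_gap`` limit are assumptions, since the built-in maximum is not stated here.

```python
from itertools import product


def expand_gaps(pattern, max_gap=3):
    """Expand each '*' into runs of 1..max_gap True wildcards.

    Returns a list of fixed-width patterns as tuples. Both the minimum
    gap of one word and max_gap are assumptions made for this sketch.
    """
    choices = []
    for item in pattern:
        if item == "*":
            choices.append([(True,) * n for n in range(1, max_gap + 1)])
        else:
            choices.append([(item,)])
    # Concatenate one choice per position into a flat tuple.
    return [sum(combo, ()) for combo in product(*choices)]


for fixed in expand_gaps(("a", "*", "house"), max_gap=2):
    print(fixed)
```

With ``max_gap=2`` this yields a three-word and a four-word pattern; the latter is the one that would cover *a very nice house*.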

Some remarks on these methods of querying are in order. These searches are
pretty exhaustive and are done by simply iterating over all the words in the
document. The entire document is loaded in memory and no special indices are involved.
For single documents this is okay, but when iterating over a corpus of
thousands of documents, this method is too slow, especially for real-time
applications. For huge corpora, clever indexing and database management systems
will be required. This however is beyond the scope of this library.
resolve(size, distribution)

Resolve a variable sized pattern to all patterns of a certain fixed size

variablesize()
variablewildcards()
class formats.folia.PosAnnotation(doc, *args, **kwargs)

Part-of-Speech annotation: a token annotation element

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.HeadFeature'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 9
XMLTAG = 'pos'
class formats.folia.Query(files, expression)

An XPath query on one or more FoLiA documents

class formats.folia.Quote(doc, *args, **kwargs)

Quote: a structure element. For quotes/citations. May hold words, sentences or paragraphs.

ACCEPTED_DATA = (<class 'formats.folia.Word'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.Paragraph'>, <class 'formats.folia.Division'>, <class 'formats.folia.Quote'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Gap'>, <class 'formats.folia.Description'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
REQUIRED_ATTRIBS = ()
XMLTAG = 'quote'
append(child, *args, **kwargs)
gettextdelimiter(retaintokenisation=False)
resolveword(id)
class formats.folia.Reader(filename, target, *args, **kwargs)

Streaming FoLiA reader. The reader allows you to read a FoLiA Document without holding the whole tree structure in memory. The document will be read and the elements you seek returned as they are found. If you are querying a corpus of large FoLiA documents for a specific structure, then it is strongly recommended to use the Reader rather than the standard Document.
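The streaming idea, yielding elements as they are found instead of building the full tree, can be sketched with the standard library's iterparse. This does not use the Reader class; stream_words is a hypothetical name, the sample is a hand-written minimal fragment, and the namespace URI is assumed to be http://ilk.uvt.nl/folia.

```python
import io
import xml.etree.ElementTree as ET

# Assumed FoLiA namespace, in ElementTree's {uri}tag notation.
NS = "{http://ilk.uvt.nl/folia}"

# Hand-written minimal fragment for illustration only.
SAMPLE = b"""<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Hello</t></w>
      <w xml:id="example.s.1.w.2"><t>world</t></w>
    </s>
  </text>
</FoLiA>"""


def stream_words(stream):
    """Yield the text of each <w> element as soon as it has been parsed."""
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == NS + "w":
            yield element.findtext(NS + "t")
            element.clear()  # release the subtree we no longer need


print(list(stream_words(io.BytesIO(SAMPLE))))
```

Clearing each word element after use keeps memory consumption roughly flat, which is the point of streaming over a large corpus.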

findwords(*args, **kwargs)
initdoc()
openstream(filename)
class formats.folia.Reference(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
OPTIONAL_ATTRIBS = (0, 2, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = ()
XMLTAG = 'ref'
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
resolve()
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.RegExp(regexp)
class formats.folia.Row(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.Cell'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 35
REQUIRED_ATTRIBS = ((),)
TEXTDELIMITER = '\n'
XMLTAG = 'row'
class formats.folia.SemanticRole(doc, *args, **kwargs)

Semantic Role

ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Headspan'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 31
REQUIRED_ATTRIBS = (1,)
XMLTAG = 'semrole'
class formats.folia.SemanticRolesLayer(doc, *args, **kwargs)

Semantic Roles Layer: Annotation layer for SemanticRole span annotation elements

ACCEPTED_DATA = (<class 'formats.folia.SemanticRole'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 31
XMLTAG = 'semroles'
class formats.folia.SenseAnnotation(doc, *args, **kwargs)

Sense annotation: a token annotation element

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.SynsetFeature'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 12
XMLTAG = 'sense'
class formats.folia.Sentence(doc, *args, **kwargs)

Sentence element. A structure element. Represents a sentence and holds all its words (and possibly other structure such as LineBreaks, Whitespace and Quotes)

ACCEPTED_DATA = (<class 'formats.folia.Word'>, <class 'formats.folia.Quote'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Gap'>, <class 'formats.folia.Description'>, <class 'formats.folia.Linebreak'>, <class 'formats.folia.Whitespace'>, <class 'formats.folia.Event'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 8
TEXTDELIMITER = ' '
XMLTAG = 's'
corrections()

Are there corrections in this sentence?

correctwords(originalwords, newwords, **kwargs)

Generic correction method for words. You most likely want to use the helper functions splitword(), mergewords(), deleteword(), insertword() instead.

deleteword(word, **kwargs)

TODO: Write documentation

division()

Obtain the division this sentence is a part of (None otherwise)

insertword(newword, prevword, **kwargs)
insertwordleft(newword, nextword, **kwargs)
mergewords(newword, *originalwords, **kwargs)

TODO: Write documentation

paragraph()

Obtain the paragraph this sentence is a part of (None otherwise)

resolveword(id)
splitword(originalword, *newwords, **kwargs)

TODO: Write documentation

class formats.folia.SetDefinition(id, type, classes=[], subsets=[], constraintindex={})
json()
classmethod parsexml(Class, node)
testclass(cls)
testsubclass(cls, subset, subclass)
exception formats.folia.SetDefinitionError
class formats.folia.SetType
CLOSED = 0
MIXED = 2
OPEN = 1
class formats.folia.String(doc, *args, **kwargs)

String

ACCEPTED_DATA = (<class 'formats.folia.TextContent'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Correction'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>)
ANNOTATIONTYPE = 34
OCCURRENCES = 0
OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 3, 5)
PRINTABLE = True
REQUIRED_ATTRIBS = ()
XMLTAG = 'str'
class formats.folia.StyleFeature(doc, *args, **kwargs)
SUBSET = 'style'
XMLTAG = None
class formats.folia.SubjectivityAnnotation(doc, *args, **kwargs)

Subjectivity annotation/Sentiment analysis: a token annotation element

ACCEPTED_DATA = (<class 'formats.folia.Feature'>, <class 'formats.folia.Description'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 21
XMLTAG = 'subjectivity'
class formats.folia.SubsetDefinition(id, type, classes=[], constraints=[])
json()
classmethod parsexml(Class, node, constraintindex={})
class formats.folia.Suggestion(doc, *args, **kwargs)
ANNOTATIONTYPE = 17
AUTH = False
OCCURRENCES = 0
OCCURRENCESPERSET = 0
XMLTAG = 'suggestion'
class formats.folia.SynsetFeature(doc, *args, **kwargs)

Synset feature, to be used within Sense

SUBSET = 'synset'
XMLTAG = None
class formats.folia.SyntacticUnit(doc, *args, **kwargs)

Syntactic Unit, span annotation element to be used in SyntaxLayer

ACCEPTED_DATA = (<class 'formats.folia.SyntacticUnit'>, <class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 13
REQUIRED_ATTRIBS = ()
XMLTAG = 'su'
class formats.folia.SyntaxLayer(doc, *args, **kwargs)

Syntax Layer: Annotation layer for SyntacticUnit span annotation elements

ACCEPTED_DATA = (<class 'formats.folia.SyntacticUnit'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 13
XMLTAG = 'syntax'
class formats.folia.Table(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.TableHead'>, <class 'formats.folia.Row'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 35
XMLTAG = 'table'
class formats.folia.TableHead(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.Row'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.Part'>)
ANNOTATIONTYPE = 35
REQUIRED_ATTRIBS = ((),)
XMLTAG = 'tablehead'
class formats.folia.Text(doc, *args, **kwargs)

A full text. This is a high-level element (not to be confused with TextContent!). This element may contain divisions, paragraphs, sentences, etc.

ACCEPTED_DATA = (<class 'formats.folia.Gap'>, <class 'formats.folia.Event'>, <class 'formats.folia.Division'>, <class 'formats.folia.Paragraph'>, <class 'formats.folia.Quote'>, <class 'formats.folia.Sentence'>, <class 'formats.folia.Word'>, <class 'formats.folia.List'>, <class 'formats.folia.Figure'>, <class 'formats.folia.Table'>, <class 'formats.folia.Note'>, <class 'formats.folia.Reference'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.AbstractExtendedTokenAnnotation'>, <class 'formats.folia.Description'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Correction'>)
OPTIONAL_ATTRIBS = (4,)
REQUIRED_ATTRIBS = (0,)
TEXTDELIMITER = '\n\n\n'
XMLTAG = 'text'
class formats.folia.TextContent(doc, *args, **kwargs)

Text content element (t), holds text to be associated with whatever element the text content element is a child of.

Text content elements on structure elements such as Paragraph and Sentence are by definition untokenised. Only at Word level and deeper are they by definition tokenised.

Text content elements can specify an offset that refers to text at a higher parent level. Use the following keyword arguments:
  • ref=: The instance to point to. This is the element holding the text content element, not the text content element itself.
  • offset=: The offset at which this text is found. Offsets start at 0.
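
As a rough illustration of what an offset expresses, the sketch below checks whether a child's text occurs in a parent's text at a given 0-based position. It uses plain strings only; the helper name offset_matches is hypothetical and is not part of pynlpl, and the library's own validation (see validateref()) may differ in detail.

```python
def offset_matches(parent_text, child_text, offset):
    """Return True if child_text occurs in parent_text at exactly `offset`.

    Hypothetical helper for illustration; offsets are 0-based, as stated
    in the TextContent description.
    """
    if offset < 0:
        return False
    return parent_text[offset:offset + len(child_text)] == child_text


sentence_text = "The quick brown fox"

# "quick" starts at character 4 of the sentence text, "brown" at 10.
quick_ok = offset_matches(sentence_text, "quick", 4)
brown_ok = offset_matches(sentence_text, "brown", 10)

# A wrong offset does not match; a real validator would report this
# as an unresolvable reference.
wrong = offset_matches(sentence_text, "quick", 5)
```

In this sketch the sentence plays the role of the element passed as ref=, and the word's text is the text content carrying the offset.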
ACCEPTED_DATA = (<class 'formats.folia.AbstractTextMarkup'>, <class 'formats.folia.Linebreak'>)
ANNOTATIONTYPE = 0
OCCURRENCES = 0
OCCURRENCESPERSET = 0
OPTIONAL_ATTRIBS = (1, 2, 3, 5)
ROOTELEMENT = True
TEXTCONTAINER = True
XMLTAG = 't'
finddefaultreference()

Find the default reference for text offsets: the parent of the current text content’s parent (counting only Structure Elements and Subtoken Annotation Elements).

Note: This returns the parent element, not a TextContent element. Whether that text content actually exists is checked later, elsewhere.

classmethod findreplaceables(Class, parent, set, **kwargs)

(Method for internal usage, see AbstractElement)

json(attribs=None, recurse=True)
classmethod parsexml(Class, node, doc)

(Method for internal usage, see AbstractElement)

postappend()

(Method for internal usage, see AbstractElement.postappend())

classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
settext(text)
text()

Obtain the text (unicode instance)

validateref()

Validates the Text Content’s references. Raises UnresolvableTextContent when invalid

xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.TextCorrectionLevel
CORRECTED = 0
INLINE = 3
ORIGINAL = 2
UNCORRECTED = 1
class formats.folia.TextMarkupCorrection(doc, *args, **kwargs)
ANNOTATIONTYPE = 16
XMLTAG = 't-correction'
json(attribs=None, recurse=True)
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.TextMarkupError(doc, *args, **kwargs)
ANNOTATIONTYPE = 18
XMLTAG = 't-error'
class formats.folia.TextMarkupGap(doc, *args, **kwargs)
ANNOTATIONTYPE = 26
XMLTAG = 't-gap'
class formats.folia.TextMarkupString(doc, *args, **kwargs)
ANNOTATIONTYPE = 34
XMLTAG = 't-str'
class formats.folia.TextMarkupStyle(doc, *args, **kwargs)
ANNOTATIONTYPE = 36
XMLTAG = 't-style'
class formats.folia.TimeFeature(doc, *args, **kwargs)

Time feature, to be used with coreferences

SUBSET = 'time'
XMLTAG = None
class formats.folia.TimeSegment(doc, *args, **kwargs)
ACCEPTED_DATA = (<class 'formats.folia.WordReference'>, <class 'formats.folia.Description'>, <class 'formats.folia.Feature'>, <class 'formats.folia.ActorFeature'>, <class 'formats.folia.BegindatetimeFeature'>, <class 'formats.folia.EnddatetimeFeature'>, <class 'formats.folia.Metric'>)
ANNOTATIONTYPE = 25
OCCURRENCESPERSET = 0
XMLTAG = 'timesegment'
formats.folia.TimedEvent

alias of TimeSegment

class formats.folia.TimingLayer(doc, *args, **kwargs)

Timing Layer: Annotation layer for TimeSegment span annotation elements.

ACCEPTED_DATA = (<class 'formats.folia.TimeSegment'>, <class 'formats.folia.Description'>, <class 'formats.folia.Correction'>)
ANNOTATIONTYPE = 25
XMLTAG = 'timing'
exception formats.folia.UnresolvableTextContent
class formats.folia.ValueFeature(doc, *args, **kwargs)

Value feature, to be used within Metric

SUBSET = 'value'
XMLTAG = None
class formats.folia.Whitespace(doc, *args, **kwargs)

Whitespace element, signals a vertical whitespace

ACCEPTED_DATA = ()
ANNOTATIONTYPE = 6
REQUIRED_ATTRIBS = ()
TEXTDELIMITER = '\n\n'
XMLTAG = 'whitespace'
class formats.folia.Word(doc, *args, **kwargs)

Word (aka token) element. Holds a word/token and all its related token annotations.

ACCEPTED_DATA = (<class 'formats.folia.AbstractTokenAnnotation'>, <class 'formats.folia.Correction'>, <class 'formats.folia.TextContent'>, <class 'formats.folia.String'>, <class 'formats.folia.Alternative'>, <class 'formats.folia.AlternativeLayers'>, <class 'formats.folia.Description'>, <class 'formats.folia.AbstractAnnotationLayer'>, <class 'formats.folia.Alignment'>, <class 'formats.folia.Metric'>, <class 'formats.folia.Reference'>)
ANNOTATIONTYPE = 1
XMLTAG = 'w'
division()

Obtain the deepest division this word is a part of, otherwise return None

domain(set=None)

Shortcut: returns the FoLiA class of the domain annotation (will return only one if there are multiple!)

findspans(type, set=None)

Find span annotation of the specified type that includes this word

getcorrection(set=None, cls=None)
getcorrections(set=None, cls=None)
gettextdelimiter(retaintokenisation=False)

Returns the text delimiter

json(attribs=None, recurse=True)
lemma(set=None)

Shortcut: returns the FoLiA class of the lemma annotation (will return only one if there are multiple!)

morpheme(index, set=None)

Returns the morpheme at the given index (within the particular set, if specified).

morphemes(set=None)

Generator yielding all morphemes (in a particular set if specified). For retrieving one specific morpheme by index, use morpheme() instead

paragraph()

Obtain the paragraph this word is a part of, otherwise return None

classmethod parsexml(Class, node, doc)
pos(set=None)

Shortcut: returns the FoLiA class of the PoS annotation (will return only one if there are multiple!)

classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
resolveword(id)
sense(set=None)

Shortcut: returns the FoLiA class of the sense annotation (will return only one if there are multiple!)

sentence()

Obtain the sentence this word is a part of, otherwise return None

split(*newwords, **kwargs)
xml(attribs=None, elements=None, skipchildren=False)
class formats.folia.WordReference(doc, *args, **kwargs)

Word reference. Used to refer to words or morphemes from span annotation elements. The Python class is only used when a word reference cannot be resolved; when it can be resolved, Word or Morpheme objects are used instead.

REQUIRED_ATTRIBS = (0,)
XMLTAG = 'wref'
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
formats.folia.c

alias of Division

formats.folia.commonancestors(Class, *args)

Generator over common ancestors, of the Class specified, of the current element and the other specified elements
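
The idea of a generator over common ancestors can be sketched with a minimal tree of parent pointers. Everything below (the Element, Division, Paragraph and Word stand-in classes and their methods) is hypothetical and written only for illustration; it is not the pynlpl implementation and assumes each element knows its parent.

```python
class Element:
    """Minimal tree node with a parent pointer (hypothetical stand-in)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self, cls=None):
        """Yield ancestors from nearest to root, optionally filtered by class."""
        node = self.parent
        while node is not None:
            if cls is None or isinstance(node, cls):
                yield node
            node = node.parent

    def commonancestors(self, cls, *others):
        """Yield ancestors of type cls shared by self and all other elements."""
        for candidate in self.ancestors(cls):
            if all(any(candidate is a for a in other.ancestors(cls))
                   for other in others):
                yield candidate


class Division(Element):
    pass


class Paragraph(Element):
    pass


class Word(Element):
    pass


div = Division("d1")
p1 = Paragraph("p1", div)
p2 = Paragraph("p2", div)
w1 = Word("w1", p1)
w2 = Word("w2", p1)
w3 = Word("w3", p2)

# Words in the same paragraph share that paragraph.
same_paragraph = [e.name for e in w1.commonancestors(Paragraph, w2)]
# Words in different paragraphs share only the enclosing division.
across_paragraphs = [e.name for e in w1.commonancestors(Element, w3)]
```

The same nearest-ancestor walk is one plausible way to think about shortcuts such as Word.sentence() or Word.paragraph(), which return the enclosing element or None.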

formats.folia.findwords(doc, worditerator, *args, **kwargs)
formats.folia.isncname(name)
formats.folia.loadsetdefinition(filename)
formats.folia.makeelement(E, tagname, **kwargs)
formats.folia.parse_datetime(s)

Returns (datetime, tz offset in minutes) or (None, None).
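
To make the return shape concrete, here is a stand-in with the same contract, built on the standard library. The function name parse_datetime_sketch is hypothetical, the accepted input format (ISO 8601 as understood by datetime.fromisoformat) is an assumption, and so is returning None for the offset when the string carries no timezone; the real function may behave differently on all three points.

```python
from datetime import datetime


def parse_datetime_sketch(s):
    """Return (naive datetime, tz offset in minutes) or (None, None).

    Hypothetical stand-in, not pynlpl's implementation.
    """
    try:
        dt = datetime.fromisoformat(s)
    except (TypeError, ValueError):
        # Unparseable input: mirror the documented (None, None) result.
        return None, None
    offset = dt.utcoffset()
    # Assumption: no timezone in the input means no offset is reported.
    minutes = int(offset.total_seconds() // 60) if offset is not None else None
    return dt.replace(tzinfo=None), minutes


parsed, tz_minutes = parse_datetime_sketch("2011-07-21T19:12:34+02:00")
failed = parse_datetime_sketch("not a date")
```

Callers can unpack the pair directly and test the first item against None to detect a failed parse.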

formats.folia.parsecommonarguments(object, doc, annotationtype, required, allowed, **kwargs)

Internal function, parses common FoLiA attributes and sets up the instance accordingly

formats.folia.relaxng(filename=None)
formats.folia.relaxng_declarations()
formats.folia.validate(filename, schema=None, deep=False)
formats.folia.xmltreefromfile(filename, bypassleak=False)
formats.folia.xmltreefromstring(s, bypassleak=False)
