Published October 9, 2025 | Version Full text
Preprint
Open
Enhancing complex XML documents with linguistic annotation
Description
When building corpora from annotated XML documents, corpus compilers are usually confronted with the inability of most tools for linguistic analysis and parsing (tokenizers, lemmatizers, PoS taggers, etc.) to process anything other than plain text input. Various single-purpose solutions have been created to work around this limitation. We tried to develop a general set of scripts to assist with the task of enriching documents containing complex XML annotation with linguistic annotation generated by automatic analyzers. We present the challenges we met and the solutions we chose, discussing their advantages, disadvantages, and limits.
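To make the task concrete, the following is a minimal sketch of one possible approach, not the scripts described in the paper. It assumes a hypothetical `<w>` element for tokens and uses a regular expression as a stand-in for a real tokenizer; the function names `tokenize`, `make_tokens`, and `annotate` are illustrative only. It walks the tree and wraps each token found in text nodes in `<w>`, leaving the existing elements and attributes in place. Whitespace between tokens is dropped, and tokens that span element boundaries are not handled, both of which a real solution would have to address.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in tokenizer (assumption): words, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split plain text into tokens; a real pipeline would call an external tool."""
    return TOKEN_RE.findall(text or "")


def make_tokens(text):
    """Turn a plain-text string into a list of hypothetical <w> elements."""
    elems = []
    for tok in tokenize(text):
        w = ET.Element("w")
        w.text = tok
        elems.append(w)
    return elems


def annotate(elem):
    """Wrap every token in elem's text nodes in <w>, keeping existing markup."""
    children = list(elem)
    # Tokens from the text before the first child element.
    new_children = make_tokens(elem.text)
    elem.text = None
    for child in children:
        annotate(child)  # recurse into the existing markup
        tail = child.tail
        child.tail = None
        new_children.append(child)
        # Tokens from the text that follows this child element.
        new_children.extend(make_tokens(tail))
    # Rebuild the child list with tokens and original elements interleaved.
    for child in children:
        elem.remove(child)
    elem.extend(new_children)
    return elem


doc = ET.fromstring('<p>Hello <hi rend="bold">brave new</hi> world!</p>')
annotate(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

For the sample input, the `<hi rend="bold">` element survives unchanged while the words inside and around it each become a separate `<w>` element.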
Files
Xml_annotation_LREC2026.pdf
| Name | Size |
|---|---|
| Xml_annotation_LREC2026.pdf (md5:270a28ede0a54d64c444995af8139926) | 132.0 kB |
Additional details
Dates
- Submitted: 2025-10-03 (LREC 2026)