An Introduction to Schematron & Schematron QuickFix

David Maus dmaus@dmaus.name

TEI Members' Meeting 2019

Administrivia

About me

  • senior software engineer working at the Herzog August Bibliothek Wolfenbüttel
  • markup & web technologies
  • author of SchXslt, an XSLT-based Schematron processor

About you

Agenda

  1. Introduction
  2. Core features
  3. Advanced features
  4. Schematron QuickFix

Workshop files

https://github.com/dmj/workshop-tei2019-public

Schematron – An Introduction

What is Schematron

  • an ISO-standardized language to express constraints on structured documents
  • a rule-based language to find (assert, report) patterns in XML documents
  • uses XPath as its expression language

Reasons to use Schematron

  • express constraints other schema languages can't express
  • different requirements at different stages of a document's lifecycle
  • manage legal but unusual variations of a document
  • generate reports about your documents

Example 1

The @to and @notAfter attributes cannot be used together.
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- bind the tei prefix used in the rule context -->
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:*[@to]">
            <report test="@notAfter" role="nonfatal">
                The @to and @notAfter attributes cannot be used together.
            </report>
        </rule>
    </pattern>
</schema>
File: examples/1-introduction/1-attributes.sch

Example 2

Every footnote must end with a punctuation mark.
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- bind the tei prefix used in the rule context -->
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:note[@type='footnote']">
            <assert test="ends-with(., '.') or ends-with(., '?') or ends-with(., '!')">
                A footnote must end with a punctuation mark.
            </assert>
        </rule>
    </pattern>
</schema>
File: examples/1-introduction/2-footnote.sch

Example 3

Tell me if the document contains characters that require a special font to render correctly.
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern>
        <rule context="text()|@*">
            <report test="matches(., '\p{Co}')" role="info">
                This text node or attribute contains characters
                from the Unicode Private Use Area.
            </report>
        </rule>
    </pattern>
</schema>
File: examples/1-introduction/3-unicode.sch

Schematron is a feather duster to reach the corners that other schema languages cannot reach.

Schematron resources

  • ISO/IEC 19757-3:2016
  • http://www.schematron.com
  • Schematron Users Meetup at XML Prague
  • Hedler et al. (2011): Schematron: Effiziente Business Rules für XML-Dokumente

Using Schematron

A language, not a program

  • a Schematron schema specifies the tests to be made on your XML documents
  • a Schematron processor reads the schema, applies the tests to your document, and reports back

Simple processing architecture

Schematron in <oXygen/>

  • well integrated, based on an implementation by Rick Jelliffe, Oliver Becker and others
  • usable like any other schema technology
  • also supports Schematron QuickFix

Schematron from the command line

  • SchXslt CLI based on SchXslt, a Schematron processor implementation by David Maus
  • Java-based command line tool
  • schxslt-cli.jar in the workshop's bin/ folder

Calling SchXslt CLI


java -jar bin/schxslt-cli.jar
usage: name.dmaus.schxslt.cli.Main [-d <arg>] [-o <arg>] [-p <arg>] [-r] -s <arg> [-v]
 -d,--document <arg>     Path to document
 -o,--output <arg>       Output file (SVRL report)
 -p,--phase <arg>        Validation phase
 -r,--repl               Run as REPL
 -s,--schematron <arg>   Path to schema
 -v,--verbose            Verbose output

Other processors

  • Schematron Ant / SchXslt Ant
  • XProc p:validate-with-schematron
  • Batch files for XSLT-based processors

Core features

Overview

  • Assertions: the tests you want to make on your document
  • Messages: human-readable text you get back when a test succeeds or fails
  • Rules: select a part of your document as context for a set of assertions
  • Patterns: collect a list of related rules

Structure of a simple Schematron


schema
  title?
  p*
  ns*
  pattern+
    rule+
      (assert | report)+
                        

Top-level elements

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <title>Example schema</title>
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
      …
    </pattern>
</schema>

  • xmlns: namespace declaration for the Schematron vocabulary
  • @queryBinding: selects the query language for schema expressions; common values are xslt (default) and xslt2, the recommended value is xslt2
  • title: optional element for documentation purposes; p does the same for paragraphs
  • ns: declares the namespaces used in schema expressions
  • pattern: container for related rules

The Schematron Rule

  • a rule is said to fire if the expression in @context matches a node
  • a rule can match every kind of node: elements, attributes, comments, processing-instructions, text nodes
  • this node acts as the context for the rule's assertions

The Schematron Rule

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
   <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
   <pattern>
       <rule context="tei:titleStmt" role="info">…</rule>
       <rule context="@ref">…</rule>
       <rule context="text()">…</rule>
   </pattern>
</schema>
  • @role allows for a classification of the rule
  • commonly used to indicate severity
  • no standardized vocabulary, common values are info, warn, error etc.

The Schematron Assertion

  • umbrella term for tests checking things that should be there (assert) and things you want to be told about (report)
  • assert: if the expression in the @test attribute evaluates to false, the assertion is unmet and will be reported
  • report: if the expression in the @test attribute evaluates to true, the assertion is met and will be reported

The Schematron Assertion

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:titleStmt">
            <assert test="empty(tei:title[@type = 'short']/tei:*)">…</assert>
        </rule>
        <rule context="text()">
            <report test="contains(., '...') or contains(., '…')">…</report>
        </rule>
    </pattern>

</schema>
File: examples/2-core/1-assertions.sch

The Schematron Message

  • natural-language messages targeted at consumers of the validation report
  • can contain dynamic text via name (name of the context node) and value-of elements (arbitrary XPath expression)
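
A minimal sketch of a message with dynamic text. The rule, the date format it enforces, and the wording are illustrative assumptions and not taken from the workshop files.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:date[@when]">
            <!-- hypothetical project rule: require a full YYYY-MM-DD date -->
            <assert test="matches(@when, '^\d{4}-\d{2}-\d{2}$')">
                <!-- name inserts the name of the context node,
                     value-of inserts the result of an XPath expression -->
                The @when attribute of <name/> has the value
                '<value-of select="@when"/>', which is not of the form YYYY-MM-DD.
            </assert>
        </rule>
    </pattern>
</schema>
```

For a date element with when="2019-09" the report would then name the element and quote the offending value, instead of a fixed text.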

Caution: Order of rules matters

  • only the first rule in a pattern that matches a particular node in your document fires
  • rules in a pattern act like an if-then-else statement
  • don't just add rules without thinking!
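
The pitfall can be sketched as follows. The two rules are hypothetical and only serve to illustrate the behavior; they are not the content of the workshop files.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <!-- fires for every titleStmt -->
        <rule context="tei:titleStmt">
            <assert test="tei:title[@type = 'main']">
                A titleStmt must contain a main title.
            </assert>
        </rule>
        <!-- never fires: every titleStmt with an author is also a titleStmt
             and was already matched by the rule above -->
        <rule context="tei:titleStmt[tei:author]">
            <assert test="tei:author/@ref">
                An author must have a @ref attribute.
            </assert>
        </rule>
    </pattern>
</schema>
```

A document with an author that lacks @ref passes validation here, although the schema looks as if it checked for it.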

Order of rules example

  • second rule in examples/2-core/02-rule-order-error.sch does not fire because the titleStmt was already matched by the first rule
  • hard to detect, can give the impression that a document is valid while it is not
  • files 03-… to 05-… present different solutions to the problem
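
One possible way to avoid a shadowed rule, shown as an illustrative sketch and not necessarily one of the solutions in the workshop files: put the rules into separate patterns. The first-match restriction applies per pattern, so each rule gets its own chance to fire. The rules themselves are hypothetical.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:titleStmt">
            <assert test="tei:title[@type = 'main']">
                A titleStmt must contain a main title.
            </assert>
        </rule>
    </pattern>
    <!-- a second pattern: the same titleStmt can be matched again -->
    <pattern>
        <rule context="tei:titleStmt[tei:author]">
            <assert test="tei:author/@ref">
                An author must have a @ref attribute.
            </assert>
        </rule>
    </pattern>
</schema>
```

Other options are to merge the assertions into a single rule or to make the rule contexts mutually exclusive.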

Advanced Schematron

Variables

  • reuse an XPath expression more than once
  • expression in @value is evaluated in the context of a rule or the XML document, depending on where it is defined
  • use a variable by adding a '$' in front of its name

Variables

Child of   Context                     Scope
schema     document root               global
pattern    document root               pattern
rule       node selected by @context   rule
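
A minimal sketch with one global and one rule-level variable. The variable names and the check itself are assumptions made for illustration; the sketch also assumes that @ref holds a local pointer of the form #id.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <!-- global variable: evaluated against the document root -->
    <let name="person-ids" value="//tei:listPerson/tei:person/@xml:id"/>
    <pattern>
        <rule context="tei:persName[@ref]">
            <!-- rule variable: evaluated for each node the rule fires on -->
            <let name="target" value="substring-after(@ref, '#')"/>
            <assert test="$target = $person-ids">
                The reference '<value-of select="@ref"/>' does not point
                to a person listed in this document.
            </assert>
        </rule>
    </pattern>
</schema>
```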

Variables

Example

examples/3-advanced/01-let.sch

Abstract Rules

  • a rule can be declared abstract by setting the attribute @abstract to 'true'
  • an abstract rule collects assertions but has no @context
  • @context is provided by a rule that extends an abstract rule
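
A sketch of an abstract rule and two rules that extend it. The id has-ref and the assertions are hypothetical and not taken from the workshop files.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <!-- abstract rule: has an @id but no @context -->
        <rule abstract="true" id="has-ref">
            <assert test="@ref">The element <name/> must have a @ref attribute.</assert>
        </rule>
        <!-- concrete rules provide the context and pull in the assertions -->
        <rule context="tei:persName">
            <extends rule="has-ref"/>
        </rule>
        <rule context="tei:placeName">
            <extends rule="has-ref"/>
            <assert test="normalize-space()">A placeName must not be empty.</assert>
        </rule>
    </pattern>
</schema>
```

The second concrete rule shows that an extending rule can add assertions of its own.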

Abstract Rules

Example

examples/3-advanced/03-abstract-rule.sch

Abstract Patterns

  • a pattern can be declared abstract by setting the attribute @abstract to 'true'
  • an abstract pattern collects rules
  • these rules and their assertions can use placeholders that are replaced in a pattern that instantiates the abstract pattern
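
A sketch of an abstract pattern that is instantiated twice. The pattern id and the parameter names element and attribute are assumptions chosen for illustration.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <!-- abstract pattern with the placeholders $element and $attribute -->
    <pattern abstract="true" id="required-attribute">
        <rule context="$element">
            <assert test="@$attribute">
                The element <name/> is missing a required attribute.
            </assert>
        </rule>
    </pattern>
    <!-- each instance supplies values for the placeholders -->
    <pattern is-a="required-attribute">
        <param name="element" value="tei:persName"/>
        <param name="attribute" value="ref"/>
    </pattern>
    <pattern is-a="required-attribute">
        <param name="element" value="tei:date"/>
        <param name="attribute" value="when"/>
    </pattern>
</schema>
```

The first instance behaves like a rule with context="tei:persName" and test="@ref", the second like a rule with context="tei:date" and test="@when".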

Abstract Patterns

Example

examples/3-advanced/04-abstract-pattern.sch

Abstract Patterns

  • placeholders and variables use the same notation but are different
  • variables are calculated during validation
  • placeholders are simply substituted with their values
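
The difference can be sketched in a single abstract pattern that uses both. The names element and others and the uniqueness check are hypothetical.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern abstract="true" id="unique-value">
        <rule context="$element">
            <!-- $element is a placeholder: its text is pasted in before validation starts.
                 $others is a variable: its value is computed for each context node. -->
            <let name="others" value="preceding::$element"/>
            <assert test="not(. = $others)">
                The value '<value-of select="."/>' occurs more than once.
            </assert>
        </rule>
    </pattern>
    <!-- after instantiation the rule reads context="tei:idno"
         and the variable is defined as preceding::tei:idno -->
    <pattern is-a="unique-value">
        <param name="element" value="tei:idno"/>
    </pattern>
</schema>
```

Because both use the '$' notation, it seems advisable to pick parameter names that do not clash with variable names.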

Schematron QuickFix

QuickFix

  • Schematron extension language to define corrections to Schematron errors
  • designed by Nico Kutscherauer, Octavian Nadolu and others
  • well integrated into <oXygen/>

Interactive

  • specify assertions
  • specify possible corrections
  • let the user choose which correction to apply

QuickFix workflow

Source: Schematron Quick Fixes Specification, Draft 2018

QuickFix resources

  • http://schematron-quickfix.github.io/sqf
  • https://www.youtube.com/user/oxygenxml
  • Schematron Users Meetup at XML Prague

QuickFix in action

Example

examples/4-quickfix/xx-complete.xml

Structure of a simple QuickFix


rule
  (assert | report)+   with @sqf:fix pointing to the @id of a fix
  sqf:fix*
    sqf:description
      sqf:title
    ( sqf:add | sqf:delete | sqf:replace | sqf:stringReplace )+

Adding a node (sqf:add)

  • @match: select a context node other than the default (i.e. rule context)
  • @node-type: type of the new node (attribute, element, …)
  • @target: name of the new node
  • @select: value of the new node
  • @position: position of the new node
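
A minimal sketch of a fix that adds an attribute. The rule, the fix id add-type, and the default value are assumptions for illustration and not the content of the workshop files.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
        queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:note">
            <!-- @sqf:fix links the assertion to the fix below -->
            <assert test="@type" sqf:fix="add-type">
                A note must have a @type attribute.
            </assert>
            <sqf:fix id="add-type">
                <sqf:description>
                    <sqf:title>Add @type with the value 'footnote'</sqf:title>
                </sqf:description>
                <!-- no @match: the new attribute is added to the rule context -->
                <sqf:add node-type="attribute" target="type" select="'footnote'"/>
            </sqf:fix>
        </rule>
    </pattern>
</schema>
```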

Adding a node (sqf:add)

Example

examples/4-quickfix/01-add-attribute.sch

examples/4-quickfix/02-user-entry.sch

Removing a node (sqf:delete)

  • @match: select a node other than the default (i.e. rule context)
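
A sketch that offers two alternative fixes for the @to/@notAfter conflict from the introduction. The fix ids and titles are hypothetical.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
        queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:*[@to]">
            <!-- two fix ids: the user chooses which one to apply -->
            <report test="@notAfter" sqf:fix="delete-notAfter delete-to">
                The @to and @notAfter attributes cannot be used together.
            </report>
            <sqf:fix id="delete-notAfter">
                <sqf:description>
                    <sqf:title>Delete the @notAfter attribute</sqf:title>
                </sqf:description>
                <sqf:delete match="@notAfter"/>
            </sqf:fix>
            <sqf:fix id="delete-to">
                <sqf:description>
                    <sqf:title>Delete the @to attribute</sqf:title>
                </sqf:description>
                <sqf:delete match="@to"/>
            </sqf:fix>
        </rule>
    </pattern>
</schema>
```

Without @match, sqf:delete would remove the rule context itself, here the whole element.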

Removing a node (sqf:delete)

Example

examples/4-quickfix/03-del-attribute.sch

Replacing a node (sqf:replace)

  • @match: select a context node other than the default (i.e. rule context)
  • @node-type: type of the new node (attribute, element, …)
  • @target: name of the new node
  • @select: value of the new node
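
A sketch of a fix that unwraps elements by replacing each of them with its own child nodes. It reuses the short-title assertion from the core features; the fix id and the wording are assumptions, and the workshop file may solve this differently.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
        queryBinding="xslt2">
    <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    <pattern>
        <rule context="tei:title[@type = 'short']">
            <assert test="empty(tei:*)" sqf:fix="unwrap">
                A short title must not contain markup.
            </assert>
            <sqf:fix id="unwrap">
                <sqf:description>
                    <sqf:title>Remove the markup, keep the text</sqf:title>
                </sqf:description>
                <!-- each child element is replaced by its own child nodes;
                     without @node-type and @target no new node is created -->
                <sqf:replace match="tei:*" select="node()"/>
            </sqf:fix>
        </rule>
    </pattern>
</schema>
```

Nested markup is only unwrapped one level per application, so the assertion may fail again and offer the fix a second time.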

Replacing a node (sqf:replace)

Example

examples/4-quickfix/04-unwrap-elements.sch