Schema and guidelines for creating a staticSearch engine for your HTML5 site
Martin Holmes
Joey Takeda
2019-2021

This documentation explains how to use the Project Endings staticSearch Generator to add a fully functional search ‘engine’ to your website without any dependency on server-side code such as a database.

Appendix A Schema specification and tag documentation

Appendix A.1 Elements

Appendix A.1.1 <config>

<config> (The root element for the Search Generator configuration file.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
May contain
ss: contexts excludes params rules
Content model
<content>
 <elementRef key="params"/>
 <elementRef key="rules" minOccurs="0"/>
 <elementRef key="contexts" minOccurs="0"/>
 <elementRef key="excludes" minOccurs="0"/>
</content>
    
Schema Declaration
element config { params, rules?, contexts?, excludes? }
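As a rough illustration, a minimal configuration file that fits this content model might look like the following sketch. Only params is required, and within it only searchFile and recurse. The file name search.html and the rule shown are hypothetical placeholders, and the sketch assumes unprefixed HTML element names can be used in match patterns.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A minimal sketch of a staticSearch configuration file.
     search.html and the rule below are hypothetical placeholders. -->
<config xmlns="http://hcmc.uvic.ca/ns/staticSearch">
  <!-- params is required; inside it, only searchFile and recurse are required. -->
  <params>
    <searchFile>search.html</searchFile>
    <recurse>true</recurse>
  </params>
  <!-- rules, contexts and excludes are optional, but if present
       they must follow params in this order.
       Assumption: unprefixed HTML element names work in match. -->
  <rules>
    <rule match="footer" weight="0"/>
  </rules>
</config>
```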

Appendix A.1.2 <context>

<context> (A context definition. Its match attribute identifies the context, so that keyword-in-context fragments can be bounded by that specific context.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Attributes att.match (@match) att.labelled (@label)
context
Status Optional
Datatype boolean
Contained by
ss: contexts
May contain Empty element
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element context
{
   att.match.attributes,
   att.labelled.attributes,
   attribute context { text }?,
   empty
}

Appendix A.1.3 <contexts>

<contexts> (The set of contexts, expressed as XPath in match, that controls the identification of contexts for keyword-in-context fragments.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: config
May contain
ss: context
Content model
<content>
 <elementRef key="context" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element contexts { context+ }
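A contexts block might look something like the sketch below, in which keyword-in-context extracts are bounded by paragraphs and list items. The choice of elements is a hypothetical assumption about the site's HTML, the sketch assumes unprefixed HTML element names can be used in match, and the label values are arbitrary descriptive strings.

```xml
<!-- Hypothetical sketch: bound keyword-in-context extracts by
     paragraphs and list items rather than by larger containers.
     Assumption: unprefixed HTML element names work in match. -->
<contexts>
  <context match="p" label="Paragraphs"/>
  <context match="li" label="List items"/>
</contexts>
```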

Appendix A.1.4 <createContexts>

<createContexts> (Whether to include keyword-in-context extracts in the index. This increases the size of the index considerably, but it allows for more user-friendly search results, as well as phrasal searches.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element createContexts { xsd:boolean }

Appendix A.1.5 <dictionaryFile>

<dictionaryFile> (The location of a dictionary file (one word per line) which will be used to check tokens when indexing.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD anyURI
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Schema Declaration
element dictionaryFile { xsd:anyURI }

Appendix A.1.6 <exclude>

<exclude> (An exclusion definition, which excludes either documents or filters as defined by an XPath in match.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Attributes att.match (@match)
type
Status Required
Legal values are:
index
(Index exclusion) An exclusion that specifies an HTML fragment (which may itself be the root HTML element) to exclude from the document index.
filter
(Filter exclusion) An exclusion that matches an HTML meta tag to exclude from the filter controls on the search page.
Contained by
ss: excludes
May contain Empty element
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element exclude
{
   att.match.attributes,
   attribute type { "index" | "filter" },
   empty
}

Appendix A.1.7 <excludes>

<excludes> (The set of exclusions, each expressed as an XPath in match, that control which documents or document fragments are left out of the index and which filters are left off the search page.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: config
May contain
ss: exclude
Content model
<content>
 <elementRef key="exclude" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element excludes { exclude+ }
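The two exclusion types can be combined in one excludes block, as in this sketch. The class value, id value and meta name are hypothetical, and the sketch assumes unprefixed HTML element names can be used in match. Matching the root html element excludes a whole document, as described for type="index".

```xml
<!-- Hypothetical sketch; the class, id and meta name are placeholders.
     Assumption: unprefixed HTML element names work in match. -->
<excludes>
  <!-- type="index": matching the root html element drops whole documents. -->
  <exclude type="index" match="html[@class = 'draft']"/>
  <!-- type="index": drop a single fragment from every document. -->
  <exclude type="index" match="div[@id = 'site-navigation']"/>
  <!-- type="filter": hide this meta tag from the search page's filter controls. -->
  <exclude type="filter" match="meta[@name = 'Document type']"/>
</excludes>
```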

Appendix A.1.8 <indentJSON>

<indentJSON> (Whether or not to indent code in the JSON index files. Indenting increases the file size, but it can be useful if you need to read the files for debugging purposes.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element indentJSON { xsd:boolean }

Appendix A.1.9 <kwicTruncateString>

<kwicTruncateString> (The string that will be used to signal ellipsis at the beginning and end of a keyword-in-context extract. Conventionally three periods, or an ellipsis character.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain Character data only
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element kwicTruncateString { text }

Appendix A.1.10 <linkToFragmentId>

<linkToFragmentId> (Whether to link keyword-in-context extracts to the nearest id in the document. Default is true.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element linkToFragmentId { xsd:boolean }

Appendix A.1.11 <maxKwicsToHarvest>

<maxKwicsToHarvest> (This controls the maximum number of keyword-in-context extracts that will be stored for each term. If phrasalSearch is set to true, this parameter is ignored, because phrasal searches will only work properly if all contexts are stored.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD nonNegativeInteger
Content model
<content>
 <dataRef name="nonNegativeInteger"/>
</content>
    
Schema Declaration
element maxKwicsToHarvest { xsd:nonNegativeInteger }

Appendix A.1.12 <maxKwicsToShow>

<maxKwicsToShow> (This controls the maximum number of keyword-in-context extracts that will be shown on the search page for each hit document returned.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD nonNegativeInteger
Content model
<content>
 <dataRef name="nonNegativeInteger"/>
</content>
    
Schema Declaration
element maxKwicsToShow { xsd:nonNegativeInteger }
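The keyword-in-context parameters work together, and a params block that uses them might look like the following sketch. The numeric values are arbitrary illustrations, not recommended defaults, and search.html is a hypothetical file name. The elements appear in the order the params content model requires.

```xml
<!-- Sketch only: values are arbitrary, search.html is hypothetical. -->
<params>
  <searchFile>search.html</searchFile>
  <recurse>true</recurse>
  <!-- Store keyword-in-context extracts in the index. -->
  <createContexts>true</createContexts>
  <!-- Store at most 5 extracts per term (ignored if phrasalSearch is true). -->
  <maxKwicsToHarvest>5</maxKwicsToHarvest>
  <!-- Show at most 3 extracts per hit document on the search page. -->
  <maxKwicsToShow>3</maxKwicsToShow>
  <!-- Mark truncated extracts with an ellipsis character. -->
  <kwicTruncateString>…</kwicTruncateString>
</params>
```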

Appendix A.1.13 <outputFolder>

<outputFolder> (The name of the folder into which the index data and JavaScript for the site search will be placed. This must be a valid XML NCName, i.e. an XML name containing no colons.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD NCName
Content model
<content>
 <dataRef name="NCName"/>
</content>
    
Schema Declaration
element outputFolder { xsd:NCName }

Appendix A.1.14 <params>

<params> (Element containing most of the settings which enable the Generator to find the target website content and process it appropriately.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: config
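May contain
ss: createContexts dictionaryFile indentJSON kwicTruncateString linkToFragmentId maxKwicsToHarvest maxKwicsToShow outputFolder phrasalSearch recurse replacementsFile scoringAlgorithm scrollToTextFragment searchFile stemmerFolder stopwordsFile totalKwicLength verbose versionFile wildcardSearch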
Content model
<content>
 <elementRef key="searchFile"/>
 <elementRef key="versionFile"
  minOccurs="0"/>
 <elementRef key="stemmerFolder"
  minOccurs="0"/>
 <elementRef key="recurse"/>
 <elementRef key="linkToFragmentId"
  minOccurs="0"/>
 <elementRef key="scrollToTextFragment"
  minOccurs="0"/>
 <elementRef key="scoringAlgorithm"
  minOccurs="0"/>
 <elementRef key="phrasalSearch"
  minOccurs="0"/>
 <elementRef key="wildcardSearch"
  minOccurs="0"/>
 <elementRef key="createContexts"
  minOccurs="0"/>
 <elementRef key="maxKwicsToHarvest"
  minOccurs="0"/>
 <elementRef key="maxKwicsToShow"
  minOccurs="0"/>
 <elementRef key="totalKwicLength"
  minOccurs="0"/>
 <elementRef key="kwicTruncateString"
  minOccurs="0"/>
 <elementRef key="verbose" minOccurs="0"/>
 <elementRef key="stopwordsFile"
  minOccurs="0"/>
 <elementRef key="dictionaryFile"
  minOccurs="0"/>
 <elementRef key="replacementsFile"
  minOccurs="0"/>
 <elementRef key="indentJSON" minOccurs="0"/>
 <elementRef key="outputFolder"
  minOccurs="0"/>
</content>
    
Schema Declaration
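element params
{
   searchFile,
   versionFile?,
   stemmerFolder?,
   recurse,
   linkToFragmentId?,
   scrollToTextFragment?,
   scoringAlgorithm?,
   phrasalSearch?,
   wildcardSearch?,
   createContexts?,
   maxKwicsToHarvest?,
   maxKwicsToShow?,
   totalKwicLength?,
   kwicTruncateString?,
   verbose?,
   stopwordsFile?,
   dictionaryFile?,
   replacementsFile?,
   indentJSON?,
   outputFolder?
}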
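Because the params content model is a sequence, its children must appear in the order listed, with searchFile and recurse always present. The sketch below uses a selection of optional parameters in a valid order. All file names and paths are hypothetical; check the main documentation for how relative paths are resolved.

```xml
<!-- Sketch of a fuller params block; all file names are hypothetical. -->
<params>
  <searchFile>search.html</searchFile>
  <versionFile>version.txt</versionFile>
  <!-- The default English stemmer lives in stemmers/en. -->
  <stemmerFolder>en</stemmerFolder>
  <recurse>false</recurse>
  <linkToFragmentId>true</linkToFragmentId>
  <scoringAlgorithm>tf-idf</scoringAlgorithm>
  <verbose>false</verbose>
  <stopwordsFile>stopwords.txt</stopwordsFile>
  <dictionaryFile>dictionary.txt</dictionaryFile>
  <indentJSON>false</indentJSON>
  <!-- Must be an NCName: no colons, no slashes. -->
  <outputFolder>staticSearch</outputFolder>
</params>
```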

Appendix A.1.15 <phrasalSearch>

<phrasalSearch> (Whether or not to support phrasal searches. If this is true, then the maxKwicsToHarvest setting will be ignored, because all contexts are required to properly support phrasal search.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element phrasalSearch { xsd:boolean }

Appendix A.1.16 <recurse>

<recurse> (Whether to recurse into subdirectories of the collection directory or not.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element recurse { xsd:boolean }

Appendix A.1.17 <rule>

<rule> (A rule that specifies a document path as XPath in match, and provides weighting for search terms found in that context.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Attributes att.match (@match)
weight (The weighting to give to a search token found in the context specified by the match attribute. Set to 0 to completely suppress indexing for a specific context, or greater than 1 to give stronger weighting.)
Status Required
Datatype nonNegativeInteger
Contained by
ss: rules
May contain Empty element
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element rule { att.match.attributes, attribute weight { text }, empty }

Appendix A.1.18 <rules>

<rules> (The set of rules, expressed as XPath in match, that control weighting of search terms found in specific contexts.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: config
May contain
ss: rule
Content model
<content>
 <elementRef key="rule" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element rules { rule+ }
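A rules block that boosts headings and suppresses site-wide boilerplate might look like this sketch. The selected elements and weights are illustrative assumptions, and the sketch assumes unprefixed HTML element names can be used in match. A weight of 0 suppresses indexing for the matched context, and a weight greater than 1 strengthens it.

```xml
<!-- Illustrative sketch; element choices and weights are assumptions.
     Assumption: unprefixed HTML element names work in match. -->
<rules>
  <!-- Terms found in top-level headings count double. -->
  <rule match="h1 | h2 | h3" weight="2"/>
  <!-- Weight 0: do not index navigation or footer text at all. -->
  <rule match="nav" weight="0"/>
  <rule match="footer" weight="0"/>
</rules>
```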

Appendix A.1.19 <scoringAlgorithm>

<scoringAlgorithm> (Which scoring algorithm to use. Default is "raw" (i.e. weighted counts))
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain Character data only
Content model
<content>
 <valList type="closed">
  <valItem ident="raw">
   <desc>raw score</desc>
   <gloss>Default: Calculate the score based on the weighted number of
       instances of a term in a text.</gloss>
  </valItem>
  <valItem ident="tf-idf">
   <gloss>Calculate the score based on the tf-idf scoring algorithm.</gloss>
  </valItem>
 </valList>
</content>
    
Legal values are:
raw
(Raw score) Default: calculate the score based on the weighted number of instances of a term in a text.
tf-idf
(Calculate the score based on the tf-idf scoring algorithm.)
Schema Declaration
element scoringAlgorithm { "raw" | "tf-idf" }

Appendix A.1.20 <scrollToTextFragment>

<scrollToTextFragment> (WARNING: Experimental technology. This turns on a feature currently only supported by a subset of browsers, enabling links from keyword-in-context results directly to the specific text string in the target document.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element scrollToTextFragment { xsd:boolean }

Appendix A.1.21 <searchFile>

<searchFile> (The search file (i.e. the search page) that will be the primary access point for staticSearch. This page must be at the root of the collection directory.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD anyURI
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Schema Declaration
element searchFile { xsd:anyURI }

Appendix A.1.22 <stemmerFolder>

<stemmerFolder> (The name of a folder inside the staticSearch /stemmers/ folder, in which the JavaScript and XSLT implementations of stemmers can be found. If left blank, then the staticSearch default English stemmer will be used (stemmers/en).)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD NCName
Content model
<content>
 <dataRef name="NCName"/>
</content>
    
Schema Declaration
element stemmerFolder { xsd:NCName }

Appendix A.1.23 <stopwordsFile>

<stopwordsFile> (The location of a text file containing a list of stopwords (words to be ignored when indexing). These are typically words too common to be worth searching for, but every site will also have some specific terms which are used so widely across the site that they should be suppressed to control the index size. The list should be in plain text with one word per line.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD anyURI
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Schema Declaration
element stopwordsFile { xsd:anyURI }

Appendix A.1.24 <totalKwicLength>

<totalKwicLength> (If createContexts is set to true, then this parameter controls how long the contexts will be.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD nonNegativeInteger
Content model
<content>
 <dataRef name="nonNegativeInteger"/>
</content>
    
Schema Declaration
element totalKwicLength { xsd:nonNegativeInteger }

Appendix A.1.25 <verbose>

<verbose> (Turns on more detailed reporting during the indexing process.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element verbose { xsd:boolean }

Appendix A.1.26 <versionFile>

<versionFile> (The relative path to a text file containing a single version identifier (such as 1.5, 123456, or 06ad419). This will be used to create unique filenames for JSON resources, so that when a site is updated, the browser will not use cached versions of older index files.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD anyURI
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Schema Declaration
element versionFile { xsd:anyURI }

Appendix A.1.27 <wildcardSearch>

<wildcardSearch> (Whether or not to support wildcard searches. Note that wildcard searches are more effective when phrasal searching is also turned on, because the contexts available for phrasal searches are also used to provide wildcard results.)
Namespace http://hcmc.uvic.ca/ns/staticSearch
Module ss — Schema specification and tag documentation
Contained by
ss: params
May contain
XSD boolean
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Schema Declaration
element wildcardSearch { xsd:boolean }
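Since wildcard results draw on the contexts stored for phrasal searches, and contexts are what make phrasal searches possible, a site that wants both features might enable them together, as in this sketch. search.html is a hypothetical file name; the elements follow the order required by params.

```xml
<!-- Sketch: phrasal and wildcard search enabled together.
     search.html is a hypothetical file name. -->
<params>
  <searchFile>search.html</searchFile>
  <recurse>true</recurse>
  <!-- All contexts are stored, so maxKwicsToHarvest would be ignored. -->
  <phrasalSearch>true</phrasalSearch>
  <!-- Wildcard results reuse the contexts stored for phrasal search. -->
  <wildcardSearch>true</wildcardSearch>
  <!-- Keyword-in-context extracts are needed for phrasal searches. -->
  <createContexts>true</createContexts>
</params>
```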

Appendix A.2 Attribute classes

Appendix A.2.1 att.labelled

att.labelled (A class providing a label attribute that can be used to identify/describe contexts and other things which benefit from description.)
Module ss — Schema specification and tag documentation
Members context
Attributes
label (A string identifier which may be descriptive.)
Status Optional
Datatype string

Appendix A.2.2 att.match

att.match (A class providing attributes that enable specification of document locations.)
Module ss — Schema specification and tag documentation
Members context exclude rule
Attributes
match (An XPath equivalent to the @match attribute of an xsl:template, which specifies a context in a document.)
Status Required
Datatype string