fts_indexer − A tool to index Emdros databases for Full Text Search
fts_indexer [ options ] [filters]
fts_indexer is a command-line tool to prepare an Emdros database for Full Text Search using the techniques of the fts_search tool (or the libharvest library).
The fts_indexer tool needs a source database to read from, an optional target database to write to (which by default is the same as the source database), and an optional output file name to (also) dump the MQL to.
The program also needs a "bookcase object type", which must consist of objects that are somehow "larger" than the "indexed object type". That is, the "indexed object type" (for example, "Word") must have objects which are wholly contained within the "bookcase object type" (for example, "document"). In effect, the "bookcase object type" provides the frames of monads within which to index the "indexed object type". In addition, the "indexed object type" must have a concomitant "indexed feature", which is the feature on which to index. For example, "surface" could be a feature on "Word" which could be ideal to index on.
The program will issue a series of "CREATE OBJECT TYPE" and "CREATE OBJECTS WITH OBJECT TYPE" statements on the target database. Optionally, with the --nodb switch, one can skip the actual creation of the data in the database. This is usually used with the -o switch, which dumps the MQL statements to a file. It is possible both to dump the MQL statements to a file, and to create the data in the database on-the-fly at the same time.
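A dump-only run, as described above, combines --nodb with -o. The following is a minimal sketch of how such a command line could be assembled from a script. The helper name build_indexer_argv, the database name "mydb", and the file name "fts_index.mql" are hypothetical; only the switches themselves come from this page.

```python
from typing import List, Optional


def build_indexer_argv(dbname: str,
                       bookcase_otn: str,
                       indexed_otn: str,
                       indexed_features: List[str],
                       output: Optional[str] = None,
                       nodb: bool = False) -> List[str]:
    """Assemble an fts_indexer command line (hypothetical helper)."""
    argv = [
        "fts_indexer",
        "-d", dbname,
        "--bookcase-otn", bookcase_otn,
        "--indexed-otn", indexed_otn,
        # Several features are passed as one comma-separated list.
        "--indexed-feature", ",".join(indexed_features),
    ]
    if nodb:
        # Skip creation of object types and objects in the target database.
        argv.append("--nodb")
    if output is not None:
        # Dump the generated MQL statements to this file.
        argv += ["-o", output]
    return argv


# Dump the MQL to a file without touching the target database.
dump_only = build_indexer_argv("mydb", "document", "Word", ["surface"],
                               output="fts_index.mql", nodb=True)
print(" ".join(dump_only))
```

Leaving out nodb=True while keeping the output argument would correspond to the case where the MQL is both dumped and executed on-the-fly.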
fts_indexer supports the following command-line switches:
−−help
show help
−V , −−version
show version
−fe , −−fts−engine fts−engine−version
Use the given version of the FTS engine. Currently, versions 1 and 2 are supported. Version 1 is the current default.
−−nodb
Do not create objects or object types in the target database. In fact, do not connect to the target database at all for creation of objects or object types.
−−no−vacuum
Do not emit a VACUUM DATABASE ANALYZE GO statement at the end of the indexing. This can be useful if more than one invocation is needed on a particular database, and one intends to run VACUUM DATABASE ANALYZE only on the last invocation. This avoids the cost of vacuuming after every intermediate invocation.
−d , −−dbname dbname
set database from which to read, and to which to write (unless -td or --nodb is given).
−td , −−target-dbname dbname
set target database to which to write. Normally, the target database name is the same as the source database, but this switch allows you to override that behavior and choose another database. The database should exist.
−−bookcase−otn object−type−name
Use object-type-name as the object type which surrounds the "indexed object type".
−−indexed−otn object−type−name
Use object-type-name as the object type which must be indexed.
−−indexed−feature feature−name[,feature−name...]
Use feature-names in the comma-separated list as the feature(s) on the indexed object type for which to generate the index. That is, if the indexed object type is, for example, "Word", then the indexed feature could be "surface", and then "Word" must have a feature called "surface". Alternatively, the feature-name list could be "surface,stemmed_surface", meaning that the two features "surface" and "stemmed_surface" would both be indexed in the same index.
−o , −−output filename
dump MQL statements to file filename. If this switch is not given, then the MQL is just executed within the target database.
−h , −−host hostname
set source db back-end hostname to connect to (default is ’localhost’). Has no effect on SQLite or SQLite 3. If --nodb is not given, and if -th or --target-host is not given, then the target database host will be the same as the source database host.
−th , −−target−host hostname
set target db back-end hostname to connect to (default is ’localhost’). Has no effect on SQLite or SQLite 3. Has no effect if --nodb is given. If neither -th nor --target-host is given, then the target database host will be the same as the source database host.
−u , −−user user
set source database user to connect as (default is ’emdf’). Has no effect on SQLite or SQLite 3. Will be used for the target database user as well, unless --nodb, -tu, or --target-user is given.
−tu , −−target−user user
set target database user to connect as (default is ’emdf’). Has no effect on SQLite or SQLite 3. Will override the source database user if given. Has no effect if --nodb is given.
−p , −−password password
set source database password to use for the source database user. Has no effect on SQLite or SQLite 3, unless you have an encryption-enabled SQLite (3), in which case this gets passed as the key. Will be used for the target database password, unless one of --nodb, -tp or --target-password is given.
−tp , −−target−password password
set target password to use for the target database user. Has no effect on SQLite (3), unless you have an encryption-enabled SQLite (3), in which case this gets passed as the key. Will override the source database password for the target database. Has no effect if --nodb is given.
−b , −−backend backend
set source database backend to ‘backend’. The target database backend is also given with this switch, unless provided by the -tb switch. Valid values are: For PostgreSQL: "p", "pg", "postgres", and "postgresql". For MySQL: "m", "my", and "mysql". For SQLite 2.X.X: "2", "s", "l", "lt", "sqlite", and "sqlite2". For SQLite 3.X.X: "3", "s3", "lt3", and "sqlite3".
−tb , −−target−backend backend
set target database backend to ‘backend’. The target database backend is usually the same as the source database backend, but this switch makes it possible to use another database backend for the target. Valid values are the same as for -b.
−sf , −−stylesheet-filename stylesheet-filename
Gives the name of the file containing the JSON stylesheet to use. See the man-page fts_filters(5) to learn the syntax of that file.
This option must be used in conjunction with the -s option. If it is used, you cannot also use filters at the end of the command line.
−s , −−stylesheet stylesheet-name
This option tells the program which stylesheet within the stylesheet file provided with the -sf option must be used.
This option must be used in conjunction with the -sf option. If it is used, you cannot also use filters at the end of the command line.
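The paired source/target switches above (-d/-td, -h/-th, -u/-tu, -p/-tp, -b/-tb) all follow the same rule: the target setting falls back to the source setting unless the target switch is given, and --nodb disables the target connection altogether. The sketch below restates that rule in code. It only mirrors the behavior documented on this page and is not the tool's actual implementation; the function name resolve_target and the dictionary layout are assumptions.

```python
from typing import Dict, Optional

# Defaults documented for the source connection.
SOURCE_DEFAULTS = {"host": "localhost", "user": "emdf"}

SETTINGS = ("dbname", "host", "user", "password", "backend")


def resolve_target(source: Dict[str, str],
                   target_overrides: Dict[str, str],
                   nodb: bool = False) -> Optional[Dict[str, str]]:
    """Return the effective target settings, or None when --nodb is given.

    Hypothetical helper: 'source' holds values from -d, -h, -u, -p, -b and
    'target_overrides' holds values from -td, -th, -tu, -tp, -tb.
    """
    if nodb:
        # With --nodb there is no connection to the target database.
        return None
    effective = dict(SOURCE_DEFAULTS)
    effective.update(source)
    for key in SETTINGS:
        if key in target_overrides:
            # A target switch overrides the value inherited from the source.
            effective[key] = target_overrides[key]
    return effective


target = resolve_target({"dbname": "mydb", "backend": "sqlite3"},
                        {"dbname": "mydb_fts"})
print(target)
```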
After the last option, you have the possibility to specify any number of filters through which the indexed feature string must be passed before being stored.
Some filters take an argument, in which case the argument must come directly after the filter name, separated by whitespace on the command line.
You cannot use both filters on the command line and the -s and -sf options.
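The constraints on -s, -sf, and command-line filters can be summarized as a small validation routine. This is a sketch of the documented rules only; the function name check_filter_source is hypothetical, and how fts_indexer itself reports a violation is not shown here.

```python
from typing import Optional, Sequence


def check_filter_source(stylesheet_file: Optional[str],
                        stylesheet_name: Optional[str],
                        filters: Sequence[str]) -> str:
    """Return which filter source applies: 'stylesheet', 'filters' or 'none'.

    Raises ValueError when the combination violates the documented rules.
    """
    has_sf = stylesheet_file is not None   # value of -sf
    has_s = stylesheet_name is not None    # value of -s
    if has_sf != has_s:
        raise ValueError("-s and -sf must be used together")
    if has_sf and filters:
        raise ValueError("-s/-sf cannot be combined with command-line filters")
    if has_sf:
        return "stylesheet"
    return "filters" if filters else "none"


print(check_filter_source(None, None, ["strip-whitespace", "lowercase"]))
```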
The following filters are available:
strip−whitespace
Takes no parameter; just strips whitespace from either end of the string.
strip−chars chars−to−strip
Takes 1 parameter; strips all characters in the parameter from either end of the string.
lowercase
Takes no parameter; makes the string lower-case. Only works with ASCII letters.
uppercase
Takes no parameter; makes the string upper-case. Only works with ASCII letters.
For example, the following chain:
strip-whitespace lowercase strip-chars ’.,;:?!’
will first strip whitespace from either end, then make all ASCII letters lower-case, then strip any of the punctuation chars ’.,;:?!’ from either end of the string.
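To make the left-to-right order of the chain concrete, here is a small emulation of the four filters. It follows the descriptions on this page and is not the tool's own code. The function name apply_filters is hypothetical, and it assumes that "whitespace" means whatever Python's str.strip() removes, which may differ from the tool's definition.

```python
import string
from typing import List

# ASCII-only case mapping, matching the "Only works with ASCII letters" note.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def apply_filters(text: str, chain: List[str]) -> str:
    """Pass text through the filters in 'chain', left to right."""
    i = 0
    while i < len(chain):
        name = chain[i]
        if name == "strip-whitespace":
            text = text.strip()
        elif name == "strip-chars":
            # The argument follows the filter name directly.
            i += 1
            if i >= len(chain):
                raise ValueError("strip-chars needs one argument")
            text = text.strip(chain[i])
        elif name == "lowercase":
            text = text.translate(_ASCII_LOWER)
        elif name == "uppercase":
            text = text.translate(_ASCII_UPPER)
        else:
            raise ValueError("unknown filter: " + name)
        i += 1
    return text


chain = ["strip-whitespace", "lowercase", "strip-chars", ".,;:?!"]
result = apply_filters("  Hello, World!  ", chain)
print(result)
```

Order matters: if strip-chars ran before strip-whitespace, trailing whitespace would shield the punctuation behind it from being stripped.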
0 Success
1 Wrong usage
2 Connection to backend server could not be established
3 An exception occurred (the type is printed on stderr)
4 Could not open file
5 Database error
6 Compiler error (internal error)
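A calling script can translate these return values into messages. The sketch below shows one way to do that. The table is copied from the list above; the helper names describe_exit_status and run_indexer are hypothetical.

```python
import subprocess
from typing import List

# Return values as listed on this page.
EXIT_STATUS = {
    0: "Success",
    1: "Wrong usage",
    2: "Connection to backend server could not be established",
    3: "An exception occurred (the type is printed on stderr)",
    4: "Could not open file",
    5: "Database error",
    6: "Compiler error (internal error)",
}


def describe_exit_status(code: int) -> str:
    """Map an fts_indexer return value to its documented meaning."""
    return EXIT_STATUS.get(code, "Unknown exit status %d" % code)


def run_indexer(argv: List[str]) -> str:
    """Run the given command line and describe its return value.

    'argv' is expected to start with the program name, e.g. 'fts_indexer'.
    """
    completed = subprocess.run(argv)
    return describe_exit_status(completed.returncode)


print(describe_exit_status(2))
```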
Written by Ulrik Sandborg-Petersen (ulrikp@emdros.org).