CWB
#include "globals.h"
Functions
unsigned char * cl_string_maptable (CorpusCharset charset, int flags)
    Gets a specified character mapping table for use in regular expressions.
int cl_iso_char_is_alphanumeric (unsigned char c, CorpusCharset charset)
    Checks whether a character is alphanumeric in the given ISO-8859 character set.
int cl_iso_char_is_alphanumeric (unsigned char c, CorpusCharset charset)
Checks whether a character is alphanumeric in the given ISO-8859 character set.
This function is exported, but not via cl.h: it is intended only for use by CWB utilities and is not part of the standard API.
Returns false if charset is utf8, whatever the value of c.
Parameters
c        The character to check.
charset  The character set to check against.
References charset, checktable_is_alphanum, and utf8.
Referenced by scancorpus_word_is_regular().
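The References line above points to a per-charset lookup table (checktable_is_alphanum). The following is a minimal sketch of that table-driven approach, not CWB's implementation. SketchCharset, alnum_tab and iso_char_is_alphanumeric_sketch are hypothetical names. The sketch models only ASCII and Latin-1 and, to keep it simple, ignores the Latin-1 letter-like signs ª, µ and º.

```c
/* Hypothetical stand-in for CWB's CorpusCharset enum (the real one has more members). */
typedef enum { ascii, latin1, utf8 } SketchCharset;

/* Hypothetical 256-entry table: nonzero means "alphanumeric in Latin-1". */
static unsigned char alnum_tab[256];
static int alnum_tab_ready = 0;

static void alnum_tab_init(void)
{
  int c;
  for (c = 0; c < 256; c++) {
    int ok = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    /* Latin-1 letters occupy 0xC0..0xFF, except the signs x (0xD7) and / (0xF7). */
    if (c >= 0xC0 && c != 0xD7 && c != 0xF7)
      ok = 1;
    alnum_tab[c] = (unsigned char) ok;
  }
  alnum_tab_ready = 1;
}

int iso_char_is_alphanumeric_sketch(unsigned char c, SketchCharset charset)
{
  if (charset == utf8)
    return 0;              /* a single byte is not a whole character in UTF-8 */
  if (!alnum_tab_ready)
    alnum_tab_init();      /* build the lookup table lazily on first use */
  if (charset == ascii && c >= 0x80)
    return 0;              /* the upper half has no letters in plain ASCII */
  return alnum_tab[c];
}
```

Precomputing the table makes every call a single array lookup, so the check stays cheap even when it runs once per corpus token.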
unsigned char* cl_string_maptable (CorpusCharset charset, int flags)
Gets a specified character mapping table for use in regular expressions.
Returns a pointer to a static mapping table for the given flags (IGNORE_CASE and/or IGNORE_DIAC) and character set.
Removed from the public API in version 3.2.0, because it cannot work when the CorpusCharset is UTF-8. The prototype was moved to special-chars.h.
Tables exist for all character sets, but for every set except Latin1 and ASCII they are currently identical to the ASCII tables (i.e. the case and accent relationships in the upper half of each character set have not yet been added).
Parameters
charset  The character set of this corpus. Currently ignored.
flags    The flags that specify which table is required. Can be IGNORE_CASE and/or IGNORE_DIAC.
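The References list suggests four static tables (identity, no-case, no-diacritics, and both) that are initialised once and then selected by flags. The following is a minimal sketch of that pattern, not the CWB source. The flag values SKETCH_IGNORE_CASE and SKETCH_IGNORE_DIAC, the table names and maptable_sketch are hypothetical. The charset parameter is left out, matching its "currently ignored" status. Diacritic stripping covers only a few sample Latin-1 letters (the a and e families).

```c
/* Hypothetical flag values; CWB defines its own IGNORE_CASE / IGNORE_DIAC constants. */
#define SKETCH_IGNORE_CASE 1
#define SKETCH_IGNORE_DIAC 2

static unsigned char identity_map[256], nocase_map[256], nodiac_map[256], nocase_nodiac_map[256];
static int maps_ready = 0;

/* Lowercase ASCII and the Latin-1 capitals 0xC0..0xDE (except 0xD7, the multiplication sign). */
static unsigned char fold_case(unsigned char c)
{
  if (c >= 'A' && c <= 'Z') return (unsigned char)(c + 32);
  if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return (unsigned char)(c + 32);
  return c;
}

/* Strip diacritics from a sample of Latin-1 letters only (illustrative, not complete). */
static unsigned char strip_diac(unsigned char c)
{
  if (c >= 0xE0 && c <= 0xE5) return 'a';
  if (c >= 0xC0 && c <= 0xC5) return 'A';
  if (c >= 0xE8 && c <= 0xEB) return 'e';
  if (c >= 0xC8 && c <= 0xCB) return 'E';
  return c;
}

static void init_maps(void)
{
  int c;
  for (c = 0; c < 256; c++) {
    unsigned char u = (unsigned char) c;
    identity_map[c] = u;
    nocase_map[c] = fold_case(u);
    nodiac_map[c] = strip_diac(u);
    nocase_nodiac_map[c] = strip_diac(fold_case(u)); /* fold first, then strip */
  }
  maps_ready = 1;
}

/* Return a pointer to one of the static tables; callers must not free it. */
unsigned char *maptable_sketch(int flags)
{
  if (!maps_ready)
    init_maps();
  if ((flags & SKETCH_IGNORE_CASE) && (flags & SKETCH_IGNORE_DIAC))
    return nocase_nodiac_map;
  if (flags & SKETCH_IGNORE_CASE)
    return nocase_map;
  if (flags & SKETCH_IGNORE_DIAC)
    return nodiac_map;
  return identity_map;
}
```

Returning a pointer to a static table means no allocation per call, but every caller shares the same table. The tables must therefore be treated as read-only.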
References ascii, charset, identity_tab, identity_tab_init, IGNORE_CASE, IGNORE_DIAC, maptable_init_both(), maptable_init_identity(), nocase_nodiac_tab, nocase_nodiac_tab_init, nocase_tab, nodiac_tab, and utf8.
Referenced by cl_string_canonical().
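cl_string_canonical() is listed as the only caller. One plausible way such a caller uses the returned table is to pass every byte of a string through it in place. This usage is an assumption, not the CWB source, and apply_maptable_sketch is a hypothetical name. The sketch also shows why the function cannot support UTF-8. A 256-entry table maps single bytes, but UTF-8 encodes accented letters as multi-byte sequences, so mapping bytes one at a time cannot fold them correctly.

```c
/* Rewrite every byte of a NUL-terminated string through a 256-entry table.
   Assumes the table never maps a nonzero byte to 0; otherwise the string
   would be silently truncated. */
void apply_maptable_sketch(char *s, const unsigned char *map)
{
  unsigned char *p;
  for (p = (unsigned char *) s; *p != '\0'; p++)
    *p = map[*p];   /* index with an unsigned byte so values >= 0x80 are valid */
}
```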