CWB
cwb-check-input.c File Reference

This was a temporary, experimental "fiddle" with Unicode programming.

#include <glib.h>
#include "../cl/globals.h"
#include "../cl/list.h"

Macros

#define MAX_INPUT_LINE_LENGTH   65536
 Input buffer size: copied from cwb-encode.
 

Functions

void cwbci_file_write_abort (void)
 convenience function with which to abort the program if file-write fails.

int cwbci_encoding_ok (char *str)
 checks whether the encoding of a given string is OK.

int cwbci_is_wordchar (char c)

int cwbci_begins_with_blank (char *str)
 Function for inner-loop in cwbci_check_line().

void cwbci_report_error_fixable (char *msg)

void cwbci_report_error_unfixable (char *msg)

void cwbci_check_line (char *line)

void cwbci_usage (void)

void cwbci_parse_options (int argc, char **argv)
 Parses commandline options for cwb-check-input and sets global variables accordingly.

int main (int argc, char **argv)
 Main function for cwb-check-input.
 

Variables

int line_no = 0
 line number of the line in the input file currently being checked; first == 1

int established_number_of_p_atts = 0
 first p-att line established number of tags; anything that deviates then counts as an error

int silent = 0
 hide messages

int verbose = 0
 show messages about fixable errors in repair mode

int print_fixable_errors = 0
 deduced from mode, silent & verbose

int print_unfixable_errors = 0
 deduced from mode, silent & verbose

int errors_detected = 0
 number of errors found so far

int xml_aware = 0
 ignore <? and <! lines

int skip_empty_lines = 0
 check for empty lines

int strip_blanks = 0
 check for leading and trailing blanks in input and token annotations?

int check_nesting = 0
 check perfect nesting of XML?

FILE * input_fd = NULL
 file handle for the input file

char * input_file = NULL
 filename of the input file

FILE * output_fd = NULL
 file handle for the output file; also used for boolean tests on whether we are repairing or not

char * output_file = NULL
 filename of the output file

char * charset_label = "ascii"
 label of character set used for checking encoding

CorpusCharset charset
 character set used for checking encoding

cl_string_list hierarchy = NULL
 string list for keeping track of the XML hierarchy

char * progname = NULL
 name of the currently running program
 

Detailed Description

This was a temporary, experimental "fiddle" with Unicode programming.

Macro Definition Documentation

#define MAX_INPUT_LINE_LENGTH   65536

Input buffer size: copied from cwb-encode.

Referenced by cwbci_check_line(), and main().

Function Documentation

int cwbci_begins_with_blank ( char *  str)

Function for inner-loop in cwbci_check_line().

IMPORTANT NOTE: this function would need adapting before being used elsewhere, because it assumes that all UTF-8 has already been validated and that blanks will be deleted from the line, starting with the first character.

References charset, and utf8.

Referenced by cwbci_check_line().
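
The note above describes a pattern in which blanks are removed one at a time from the front of the line. A minimal sketch of that pattern follows. It is not the actual implementation: the `sk_` names are hypothetical, only ASCII space and tab are treated as blanks, and the input is assumed to be already validated.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if str starts with an ASCII space or tab,
   else 0. Only these two blanks are considered in this sketch. */
int sk_begins_with_blank(const char *str)
{
  return str[0] == ' ' || str[0] == '\t';
}

/* Deletes leading blanks in place, always working on the first character,
   and returns how many were removed. */
int sk_strip_leading_blanks(char *line)
{
  int removed = 0;
  while (sk_begins_with_blank(line)) {
    /* strlen(line) bytes cover the remaining text plus the terminating NUL */
    memmove(line, line + 1, strlen(line));
    removed++;
  }
  return removed;
}
```

Because the test always looks at the first character, it only stays correct as long as the caller really does delete each blank before asking again, which is the assumption the note warns about.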

void cwbci_check_line ( char *  line)

int cwbci_encoding_ok ( char *  str)

checks whether the encoding of a given string is OK.

(This might be moved to the CL later, in which case the charset should become a parameter, since a global variable cannot be assumed in all programs.) Returns a boolean.

References ascii, charset, latin1, and utf8.

Referenced by cwbci_check_line().
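
To illustrate what a per-charset encoding check can look like, here is a self-contained sketch covering the three charsets referenced above. It is an illustration under stated assumptions, not the code in cwb-check-input.c: the `sk_charset` enum stands in for the CL's CorpusCharset, the charset is passed as a parameter (as the note suggests a CL version would need), and the UTF-8 validator is hand-written rather than taken from GLib.

```c
/* Hypothetical stand-in for CorpusCharset; only three values are sketched. */
typedef enum { SK_ASCII, SK_LATIN1, SK_UTF8 } sk_charset;

/* Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
static int sk_utf8_ok(const unsigned char *s)
{
  while (*s) {
    int extra;           /* number of continuation bytes expected */
    unsigned long cp;    /* code point being assembled */
    unsigned long min;   /* smallest code point allowed for this length */

    if (*s < 0x80) { s++; continue; }
    else if ((*s & 0xE0) == 0xC0) { extra = 1; cp = *s & 0x1F; min = 0x80; }
    else if ((*s & 0xF0) == 0xE0) { extra = 2; cp = *s & 0x0F; min = 0x800; }
    else if ((*s & 0xF8) == 0xF0) { extra = 3; cp = *s & 0x07; min = 0x10000; }
    else return 0;       /* stray continuation byte or invalid lead byte */
    s++;

    while (extra-- > 0) {
      if ((*s & 0xC0) != 0x80) return 0;   /* also catches a premature NUL */
      cp = (cp << 6) | (*s & 0x3F);
      s++;
    }
    if (cp < min || cp > 0x10FFFF) return 0;      /* overlong or out of range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogate */
  }
  return 1;
}

/* Returns 1 if str is acceptable in charset cs, else 0. */
int sk_encoding_ok(const char *str, sk_charset cs)
{
  const unsigned char *p = (const unsigned char *)str;
  switch (cs) {
  case SK_ASCII:
    for (; *p; p++)
      if (*p > 0x7F) return 0;   /* ASCII is 7-bit only */
    return 1;
  case SK_LATIN1:
    return 1;   /* every byte value decodes in ISO-8859-1; nothing to reject here */
  case SK_UTF8:
    return sk_utf8_ok(p);
  }
  return 0;
}
```

Passing the charset explicitly keeps the function independent of any global state, which is the design point raised in the description above.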

void cwbci_file_write_abort ( void  )

convenience function with which to abort the program if file-write fails.

References input_fd, output_fd, and output_file.

Referenced by cwbci_check_line().

int cwbci_is_wordchar ( char  c)

Referenced by cwbci_check_line().

void cwbci_parse_options ( int  argc, char **  argv )

void cwbci_report_error_fixable ( char *  msg)

void cwbci_report_error_unfixable ( char *  msg)

void cwbci_usage ( void  )

References progname.

Referenced by cwbci_parse_options().

int main ( int  argc, char **  argv )

Variable Documentation

CorpusCharset charset

char* charset_label = "ascii"

label of character set used for checking encoding

Referenced by cwbci_parse_options().

int check_nesting = 0

check perfect nesting of XML?

Referenced by cwbci_check_line(), and cwbci_parse_options().

int errors_detected = 0

number of errors found so far

Referenced by cwbci_report_error_fixable(), cwbci_report_error_unfixable(), and main().

int established_number_of_p_atts = 0

first p-att line established number of tags; anything that deviates then counts as an error

Referenced by cwbci_check_line().
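
The "first line establishes the count" rule can be sketched as follows. This is an illustration only: it assumes that token lines are tab-separated, and the `sk_` names and the static counter are hypothetical stand-ins for the global described above.

```c
/* Counts tab-separated fields in a token line; a line without tabs has one field. */
int sk_count_fields(const char *line)
{
  int n = 1;
  for (; *line; line++)
    if (*line == '\t') n++;
  return n;
}

/* 0 means "not yet established", mirroring the initial value of the global. */
static int sk_established = 0;

/* Returns 1 if the line matches the established field count, 0 otherwise.
   The first line checked establishes the count and always passes. */
int sk_check_field_count(const char *line)
{
  int n = sk_count_fields(line);
  if (sk_established == 0) {
    sk_established = n;
    return 1;
  }
  return n == sk_established;
}
```

With this scheme a malformed first line sets the wrong expectation for the whole file, so every later line would be reported instead.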

cl_string_list hierarchy = NULL

string list for keeping track of the XML hierarchy
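
Tracking the XML hierarchy amounts to keeping a stack of open element names. The sketch below shows the idea with a fixed-size array instead of a cl_string_list; the `sk_` names and the size limits are assumptions made for illustration, not part of the CL.

```c
#include <string.h>

#define SK_MAX_DEPTH 64   /* hypothetical limit on nesting depth */
#define SK_MAX_NAME  64   /* hypothetical limit on element name length, incl. NUL */

static char sk_stack[SK_MAX_DEPTH][SK_MAX_NAME];
static int  sk_depth = 0;

/* Records an opening tag. Returns 0 if the stack is full or the name is too long. */
int sk_open_tag(const char *name)
{
  if (sk_depth >= SK_MAX_DEPTH || strlen(name) >= SK_MAX_NAME) return 0;
  strcpy(sk_stack[sk_depth++], name);
  return 1;
}

/* Handles a closing tag. Returns 1 only if it closes the innermost open
   element; otherwise the stack is left unchanged and 0 is returned. */
int sk_close_tag(const char *name)
{
  if (sk_depth == 0 || strcmp(sk_stack[sk_depth - 1], name) != 0) return 0;
  sk_depth--;
  return 1;
}
```

Perfect nesting then means that every call to `sk_close_tag` succeeds and the stack is empty at the end of the input.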

FILE* input_fd = NULL

file handle for the input file

Referenced by cwbci_file_write_abort(), lexdecode_show(), and main().

char* input_file = NULL

filename of the input file

Referenced by cwbci_parse_options(), and main().

int line_no = 0

line number of the line in the input file currently being checked; first == 1

Referenced by cwbci_report_error_fixable(), cwbci_report_error_unfixable(), and main().

FILE* output_fd = NULL

file handle for the output file; also used for boolean tests on whether we are repairing or not

Referenced by cwbci_check_line(), cwbci_file_write_abort(), and main().

char* output_file = NULL

filename of the output file

Referenced by cwbci_file_write_abort(), cwbci_parse_options(), and main().

int print_fixable_errors = 0

deduced from mode, silent & verbose

Referenced by cwbci_parse_options(), and cwbci_report_error_fixable().

int print_unfixable_errors = 0

deduced from mode, silent & verbose

Referenced by cwbci_parse_options(), and cwbci_report_error_unfixable().
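
This page does not spell out how the two print flags are deduced. One plausible reading of the descriptions of silent and verbose is sketched below. The rule itself is an assumption, as are the `sk_` names; `repairing` stands for the repair mode that the program detects by testing whether output_fd is non-NULL.

```c
/* Hypothetical container for the two derived flags. */
struct sk_flags {
  int print_fixable;
  int print_unfixable;
};

/* Assumed rule, not confirmed by the source:
   - silent suppresses all messages;
   - in repair mode, fixable errors are reported only if verbose is set;
   - in plain checking mode, both kinds of error are reported. */
struct sk_flags sk_derive_flags(int repairing, int silent, int verbose)
{
  struct sk_flags f;
  f.print_unfixable = !silent;
  f.print_fixable   = !silent && (!repairing || verbose);
  return f;
}
```

Computing the flags once during option parsing means the reporting functions only have to test a single integer each.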

char* progname = NULL

name of the currently running program

Referenced by cwbci_usage(), and main().

int silent = 0

hide messages

Referenced by cwbci_parse_options().

int skip_empty_lines = 0

check for empty lines

Referenced by cwbci_check_line(), and cwbci_parse_options().

int strip_blanks = 0

check for leading and trailing blanks in input and token annotations?

Referenced by cwbci_check_line(), and cwbci_parse_options().

int verbose = 0

show messages about fixable errors in repair mode

Referenced by cwbci_parse_options().

int xml_aware = 0

ignore <? and <! lines

Referenced by cwbci_check_line(), and cwbci_parse_options().
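
The test implied by this flag can be sketched as a simple prefix check. The function name is hypothetical, and the sketch assumes that the relevant lines start with the marker in the first column.

```c
#include <string.h>

/* Returns 1 if the line starts with "<?" or "<!", which covers XML
   declarations, processing instructions, comments and DOCTYPE lines. */
int sk_is_xml_meta_line(const char *line)
{
  return strncmp(line, "<?", 2) == 0 || strncmp(line, "<!", 2) == 0;
}
```

A caller would consult this only when xml_aware is set, and pass such lines through without further checks.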