CWB
makecomps.c File Reference

This file contains functions for creating four different P-attribute components: namely CompLexiconSrt, CompCorpusFreqs, CompRevCorpus, and CompRevCorpusIdx.

#include <ctype.h>
#include <sys/types.h>
#include "globals.h"
#include "endian.h"
#include "macros.h"
#include "storage.h"
#include "fileutils.h"
#include "corpus.h"
#include "attributes.h"
#include "cdaccess.h"
#include "makecomps.h"

Macros

#define BUFSIZE   0x10000
 

Functions

static int scompare (const void *idx1, const void *idx2)
 Compares two lexicon entries using cl_strcmp().
 
int creat_sort_lexicon (Component *lexsrt)
 Creates a sorted index from the (already existing) lexicon index of the Attribute.
 
int creat_freqs (Component *freqs)
 Creates the CompCorpusFreqs component (list of type frequencies for a given p-attribute).
 
int creat_rev_corpus (Component *revcorp)
 Creates a reversed corpus component.
 
int creat_rev_corpus_idx (Component *revcidx)
 Creates the index for the reversed corpus.
 

Variables

char errmsg [CL_MAX_LINE_LENGTH]
 
static MemBlob * SortLexicon
 
static MemBlob * SortIndex
 

Detailed Description

This file contains functions for creating four different P-attribute components: namely CompLexiconSrt, CompCorpusFreqs, CompRevCorpus, and CompRevCorpusIdx.

These are all produced by permutation of a previously encoded attribute (CompCorpus, CompLexicon, etc.).

Macro Definition Documentation

#define BUFSIZE   0x10000

Referenced by creat_freqs().

Function Documentation

int creat_freqs ( Component * freqs)

Creates the CompCorpusFreqs component (list of type frequencies for a given p-attribute).

int creat_rev_corpus ( Component * revcorp)

Creates a reversed corpus component.

This function should only be invoked by the makeall tool (via create_component()), which must make sure that the lexicon and the (possibly compressed) token stream have already been created, so that CL access to the token stream works.

See also
create_component
makeall_do_attribute
Returns
The number of passes made through the corpus.

References TComponent::attribute, cl_cpos2id(), cl_debug, cl_free, cl_id2freq(), cl_malloc(), cl_max_cpos(), cl_max_id(), cl_memory_limit, CompCorpusFreqs, TComponent::corpus, TMblob::data, TComponent::data, ensure_component(), NwriteInt(), NwriteInts(), and TComponent::path.

Referenced by create_component().
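
The page does not spell out how the reversed corpus is laid out, so the following is only a minimal sketch of the general technique, assuming that the reversed corpus lists all corpus positions grouped by lexicon ID and that the reversed-corpus index holds the start offset of each ID's group. The function build_reversed_corpus() and its parameters are hypothetical and not part of CWB. It works on plain in-memory arrays in a single pass, whereas creat_rev_corpus() reads the token stream through cl_cpos2id() and, judging by its return value and its reference to cl_memory_limit, may need several passes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a CWB function).
 *   corpus[cpos] : lexicon ID of the token at corpus position cpos
 *   freqs[id]    : out, number of occurrences of id
 *   rev_idx[id]  : out, offset in rev[] where the positions of id start
 *   rev[]        : out, corpus positions grouped by id, ascending per id
 * Assumes n_types > 0. Returns 0 on success, -1 if allocation fails. */
int build_reversed_corpus(const int *corpus, int n_tokens, int n_types,
                          int *freqs, int *rev_idx, int *rev)
{
    int id, cpos, offset = 0;
    int *next = malloc((size_t)n_types * sizeof *next);

    if (next == NULL)
        return -1;

    /* Step 1: count how often each lexicon ID occurs. */
    for (id = 0; id < n_types; id++)
        freqs[id] = 0;
    for (cpos = 0; cpos < n_tokens; cpos++) {
        assert(corpus[cpos] >= 0 && corpus[cpos] < n_types);
        freqs[corpus[cpos]]++;
    }

    /* Step 2: running sum of the frequencies gives each ID's start offset. */
    for (id = 0; id < n_types; id++) {
        rev_idx[id] = offset;
        next[id] = offset;      /* next free slot for this ID */
        offset += freqs[id];
    }

    /* Step 3: drop every corpus position into the next free slot of its ID. */
    for (cpos = 0; cpos < n_tokens; cpos++)
        rev[next[corpus[cpos]]++] = cpos;

    free(next);
    return 0;
}
```

With this layout, all positions of a given ID are found in rev[rev_idx[id]] up to rev[rev_idx[id] + freqs[id] - 1], which is why the frequency list has to exist before the reversed corpus can be built.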

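int creat_rev_corpus_idx ( Component * revcidx)

Creates the index for the reversed corpus.

int creat_sort_lexicon ( Component * lexsrt)

Creates a sorted index from the (already existing) lexicon index of the Attribute.

static int scompare ( const void * idx1, const void * idx2 )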

Compares two lexicon entries using cl_strcmp().

This function is for use with qsort().

References cl_strcmp(), and TMblob::data.

Referenced by creat_sort_lexicon().
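
Because qsort() passes no context argument to its comparison function, a comparator like this one has to reach the lexicon data through file-scope variables. The following is a minimal sketch of that pattern, not the CWB code: the names sort_strings, sort_offsets, compare_ids() and sort_lexicon_ids() are hypothetical, the lexicon is assumed to be a block of NUL-terminated strings addressed by per-ID offsets, and the standard strcmp() stands in for cl_strcmp().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* File-scope state read by the comparator (hypothetical stand-ins for
 * the static SortLexicon and SortIndex variables of this file). */
static const char *sort_strings;  /* NUL-terminated strings, back to back */
static const int  *sort_offsets;  /* sort_offsets[id] = start of string id */

/* qsort() comparator: each element is a lexicon ID; order by its string. */
static int compare_ids(const void *a, const void *b)
{
    int id1 = *(const int *)a;
    int id2 = *(const int *)b;

    return strcmp(sort_strings + sort_offsets[id1],
                  sort_strings + sort_offsets[id2]);
}

/* Fills ids[] with 0..n_types-1 and sorts them by the strings they name. */
void sort_lexicon_ids(const char *lexicon, const int *offsets,
                      int *ids, int n_types)
{
    int i;

    assert(lexicon != NULL && offsets != NULL && ids != NULL);

    sort_strings = lexicon;
    sort_offsets = offsets;

    for (i = 0; i < n_types; i++)
        ids[i] = i;

    qsort(ids, (size_t)n_types, sizeof *ids, compare_ids);
}
```

The sorted array holds IDs rather than strings, so the lexicon itself is never moved. The price of the file-scope state is that the sort is not reentrant.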

Variable Documentation

char errmsg[CL_MAX_LINE_LENGTH]

static MemBlob* SortIndex

static MemBlob* SortLexicon