s_attribute_decode.Rd
Get a data.frame with left and right corpus positions (cpos) for structural attributes and values.
s_attribute_decode(corpus, data_dir, s_attribute, encoding = NULL, registry = Sys.getenv("CORPUS_REGISTRY"), method = c("R", "Rcpp"))
Argument | Description |
---|---|
corpus | a CWB corpus |
data_dir | data directory where binary files for corpus are stored |
s_attribute | a structural attribute |
encoding | encoding of the values ("latin-1" or "utf-8") |
registry | registry directory |
method | character vector, whether to use "R" or "Rcpp" implementation |
A data.frame with three columns. Column cpos_left holds the start corpus positions of a structural annotation, column cpos_right the end corpus positions. Column value holds the value of the annotation.
a character vector
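The three-column result lends itself to simple region arithmetic. The following is a minimal sketch that does not call the package: `regions` is a hypothetical, hand-made stand-in for the returned data.frame, and the numbers and values are invented for illustration. It assumes that corpus positions are inclusive on both ends.

```r
# Hypothetical stand-in for the data.frame returned by s_attribute_decode();
# positions and values are made up for illustration only.
regions <- data.frame(
  cpos_left = c(0L, 92L, 247L),
  cpos_right = c(91L, 246L, 301L),
  value = c("usa", "uk", "usa"),
  stringsAsFactors = FALSE
)

# Number of tokens per region, assuming both cpos_left and cpos_right
# belong to the region (inclusive bounds).
regions$n_tokens <- regions$cpos_right - regions$cpos_left + 1L

# Total number of tokens covered by each annotation value.
tokens_by_value <- tapply(regions$n_tokens, regions$value, sum)
```

With real output, the same two steps would apply to the object returned by the function instead of `regions`.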
Two approaches are implemented: A pure R solution will decode the files directly in the directory specified by data_dir. An implementation using Rcpp will use the registry file for corpus to find the data directory.
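To illustrate what decoding files directly in a data directory can look like, here is a minimal base R sketch. It is not the package's code. It assumes, without confirmation from this page, that region boundaries are stored in a file named after the structural attribute with the extension `.rng`, as consecutive pairs of big-endian 32-bit integers. The temporary directory and the file `places.rng` written here are hypothetical test data.

```r
# Create a tiny hypothetical region file with two regions: [0, 4] and [5, 9].
tmp_dir <- tempfile()
dir.create(tmp_dir)
rng_file <- file.path(tmp_dir, "places.rng")
writeBin(c(0L, 4L, 5L, 9L), rng_file, size = 4L, endian = "big")

# Read all 4-byte integers back; the file size determines how many there are.
n_int <- file.info(rng_file)[["size"]] / 4L
cpos <- readBin(rng_file, what = integer(), n = n_int, size = 4L, endian = "big")

# Reshape into one row per region: left and right corpus position.
region_matrix <- matrix(
  cpos, ncol = 2L, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)
```

The annotation values would have to be read from separate files, which this sketch leaves out.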
registry <- if (!check_pkg_registry_files()) use_tmp_registry() else get_pkg_registry()
Sys.setenv(CORPUS_REGISTRY = registry)

# pure R implementation (Rcpp implementation fails on Windows in vanilla mode)
b <- s_attribute_decode(
  data_dir = system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "reuters"),
  s_attribute = "places",
  method = "R"
)