The Corpus Workbench (CWB) is a classic indexing and query engine for working efficiently with large, linguistically annotated corpora. The cwbtools package offers a set of tools to conveniently create, modify and manage CWB-indexed corpora from within R. It complements R packages that use the CWB as a backend for text mining with R, namely the RcppCWB package for low-level access to CWB-indexed corpora, and polmineR as a toolset to implement common text mining workflows.

Installation

The package is a “GitHub only” package at this stage. The easiest way to install it is to use the installation mechanism offered by the devtools package. The procedure is the same for Windows, Linux, and macOS. On Windows, you may need to have Rtools installed to use the full functionality of devtools.

First, check that devtools is installed …

if (!"devtools" %in% installed.packages()[,"Package"]) install.packages("devtools")

Then install the cwbtools package.

devtools::install_github("PolMine/cwbtools")
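To confirm that the installation succeeded, something like the following minimal sketch may help. The helper `pkg_status()` is a hypothetical name introduced here for illustration and is not part of cwbtools; it relies only on `requireNamespace()` and `utils::packageVersion()` from base R.

```r
# Hypothetical helper (not part of cwbtools): report whether a package
# can be loaded and, if so, which version is installed.
pkg_status <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    sprintf("%s %s is installed", pkg, as.character(utils::packageVersion(pkg)))
  } else {
    sprintf("%s is not installed", pkg)
  }
}

# Print the status of cwbtools after running the installation step.
message(pkg_status("cwbtools"))
```

If the message says the package is not installed, re-running the `devtools::install_github()` call and reading its output is a reasonable first step.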

Acknowledgements

The CWB is a classic indexing and query engine. Its character as an open-source project is of great value for the community working with corpora. The enduring effort of the developers of the CWB is gratefully acknowledged!