CWB corpora and openNLP

Andreas Blätte (andreas.blaette@uni-due.de)

2021-02-22

Required packages

library(cwbtools)
library(RcppCWB)
library(NLP)
library(openNLP)
library(data.table)

Interfacing to openNLP

Decode p-attribute ‘word’

Reconstruct string
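One way to reconstruct a string from a token stream is sketched below in base R. It is an illustration under assumptions, not necessarily the procedure used here: the `tokens` vector is hypothetical (in practice it would be the decoded p-attribute 'word'), and joining tokens with single spaces is a simplification. The character offsets are kept so that annotations can later be mapped back to tokens.

```r
# Hypothetical token stream; in practice this would be the decoded
# p-attribute 'word' of a CWB corpus.
tokens <- c("Angela", "Merkel", "spoke", "in", "Berlin", ".")

# Join tokens with single spaces to obtain the string for the annotators.
txt <- paste(tokens, collapse = " ")

# Character offsets of every token within txt (1-based, inclusive).
n <- nchar(tokens)
end <- cumsum(n + 1L) - 1L
start <- end - n + 1L

# Keep offsets alongside zero-based corpus positions (cpos).
offsets <- data.frame(
  cpos = seq_along(tokens) - 1L,
  word = tokens,
  start = start,
  end = end
)
```

Because the offsets are derived from the same tokens that make up the string, `substring(txt, start, end)` returns the original tokens, which is a cheap consistency check.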

Match token annotation
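The following base R sketch shows one possible way to map annotation spans, given as character offsets, back to corpus positions. Both data frames are hypothetical stand-ins: `tokens` for the token offsets of the reconstructed string, `spans` for annotations reported by an annotator. It assumes that span boundaries coincide with token boundaries.

```r
# Hypothetical tokens and their 1-based, inclusive character offsets in the
# string "Angela Merkel spoke in Berlin ." (tokens joined by single spaces).
tokens <- data.frame(
  cpos  = 0:5,  # CWB corpus positions are zero-based
  word  = c("Angela", "Merkel", "spoke", "in", "Berlin", "."),
  start = c(1L, 8L, 15L, 21L, 24L, 31L),
  end   = c(6L, 13L, 19L, 22L, 29L, 31L)
)

# Hypothetical annotation spans as an annotator might report them.
spans <- data.frame(
  type  = c("person", "location"),
  start = c(1L, 24L),
  end   = c(13L, 29L)
)

# For each span, look up the first and the last token it covers.
spans$cpos_left  <- tokens$cpos[match(spans$start, tokens$start)]
spans$cpos_right <- tokens$cpos[match(spans$end, tokens$end)]
```

An `NA` in `cpos_left` or `cpos_right` would indicate a span boundary that does not fall on a token boundary, which is worth checking before anything is written back to the corpus.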

Create annotations

Sentences

Part-of-speech

Named entities

Add annotation to corpus

# Collect the POS tag stored in the features of each annotation in `p`
pos <- unlist(lapply(as.data.frame(p)[["features"]], `[[`, "POS"))
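To illustrate what the extraction line above does, here is a small sketch that applies it to a hand-built `Annotation` object from the NLP package. The object `p` and its tags are made up for illustration. In the actual workflow `p` would presumably come from running the openNLP annotators.

```r
library(NLP)

# Hand-built annotation: one sentence and three word tokens with POS features.
# Offsets and tags are illustrative only.
p <- Annotation(
  id = 1:4,
  type = c("sentence", "word", "word", "word"),
  start = c(1L, 1L, 8L, 15L),
  end = c(19L, 6L, 13L, 19L),
  features = list(
    list(),              # the sentence annotation carries no POS feature
    list(POS = "NNP"),
    list(POS = "NNP"),
    list(POS = "VBD")
  )
)

# Annotations without a POS feature yield NULL, which unlist() drops.
pos <- unlist(lapply(as.data.frame(p)[["features"]], `[[`, "POS"))
```

Since annotations without a POS feature are dropped silently, `pos` can be shorter than the annotation object. Subsetting to `type == "word"` first keeps the tags aligned with the tokens.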