dfm_compress.Rd
"Compresses" a dfm or fcm by combining the documents or features whose names are identical. Duplicate names may arise, for instance, if features are made equivalent through application of a thesaurus. Compression could also be needed after a `cbind.dfm` or `rbind.dfm` operation. In most cases, you will not need to call `dfm_compress` yourself, since it is called automatically by functions that change the dimensions of the dfm, e.g. `dfm_tolower`.
dfm_compress(x, margin = c("both", "documents", "features"))
fcm_compress(x)
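Argument | Description |
---|---|
x | input object, a dfm or fcm |
margin | character indicating on which margin to compress a dfm, either "documents", "features", or "both" (the default) |
... | additional arguments passed from generic to specific methods |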
`dfm_compress` returns a dfm whose dimensions have been recombined by summing the cells across identical dimension names (docnames or featnames). The docvars are preserved when combining by features, but not when documents are combined.
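The summing rule described above can be sketched in base R on an ordinary dense matrix. This is only an illustration of the arithmetic, not quanteda's implementation: `compress_rows()` and `compress_cols()` are hypothetical helpers built on base R's `rowsum()`, and the toy matrix copies the counts of `dfmat` from the examples.

```r
# Toy count matrix with duplicated row and column names,
# copying the counts of dfmat from the examples.
m <- matrix(
  c(1, 2, 0, 0, 0,
    1, 0, 2, 1, 1,
    0, 1, 5, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("text1", "text2", "text1"),
                  c("b", "a", "c", "a", "b"))
)

# Hypothetical helper: sum rows that share a name,
# keeping names in order of first appearance.
compress_rows <- function(mat) {
  rowsum(mat, group = rownames(mat), reorder = FALSE)
}

# Hypothetical helper: sum columns that share a name,
# by transposing, summing rows, and transposing back.
compress_cols <- function(mat) {
  t(rowsum(t(mat), group = colnames(mat), reorder = FALSE))
}

by_docs <- compress_rows(m)            # analogous to margin = "documents"
by_feats <- compress_cols(m)           # analogous to margin = "features"
both <- compress_cols(compress_rows(m))  # analogous to margin = "both"
both
```

Under these assumptions `both` has rows `text1`, `text2` and columns `b`, `a`, `c`, with the same cell values as the `dfm_compress(dfmat)` output in the examples. A real dfm is a sparse matrix with docvars attached, which this sketch ignores.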
`fcm_compress` returns an fcm whose features have been recombined by summing the counts of identical features. `fcm_compress` works only when the fcm was created with a document context.
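For a square co-occurrence matrix, the same names index both rows and columns, so identical features have to be merged on both margins. The base R sketch below illustrates that idea on a dense matrix holding the counts of `fcmat2` from the examples. `compress_square()` is a hypothetical helper, not part of quanteda, and it makes no claim about how `fcm_compress` is implemented internally.

```r
# Feature names with "fox" and "jumped" duplicated, as in fcmat2.
feats <- c("the", "fox", "jumped", "over", "fox", "dog", "jumped")

# Dense copy of the co-occurrence counts shown for fcmat2.
co <- matrix(
  c(0, 2, 1, 2, 2, 2, 1,
    0, 0, 1, 2, 2, 2, 1,
    0, 0, 0, 1, 1, 1, 0,
    0, 0, 0, 0, 2, 2, 1,
    0, 0, 0, 0, 0, 2, 1,
    0, 0, 0, 0, 0, 0, 1,
    0, 0, 0, 0, 0, 0, 0),
  nrow = 7, byrow = TRUE,
  dimnames = list(feats, feats)
)

# Hypothetical helper: merge identical features on rows, then on columns.
compress_square <- function(mat) {
  stopifnot(identical(rownames(mat), colnames(mat)))
  by_row <- rowsum(mat, group = rownames(mat), reorder = FALSE)
  t(rowsum(t(by_row), group = colnames(by_row), reorder = FALSE))
}

compressed <- compress_square(co)
compressed

# Merging only moves counts between cells, so the grand total is unchanged.
sum(co) == sum(compressed)
```

With these inputs the result is a 5 by 5 matrix over `the`, `fox`, `jumped`, `over`, `dog`, whose cells agree with the `fcm_compress(fcmat2)` output in the examples. Note that the merged matrix is no longer upper triangular, because counts from the duplicated `fox` and `jumped` entries land below the diagonal.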
# dfm_compress examples
dfmat <- rbind(dfm(c("b A A", "C C a b B"), tolower = FALSE),
               dfm("A C C C C C", tolower = FALSE))
colnames(dfmat) <- char_tolower(featnames(dfmat))
dfmat
#> Document-feature matrix of: 3 documents, 5 features (46.7% sparse).
#> 3 x 5 sparse Matrix of class "dfm"
#>        features
#> docs    b a c a b
#>   text1 1 2 0 0 0
#>   text2 1 0 2 1 1
#>   text1 0 1 5 0 0
dfm_compress(dfmat, margin = "documents")
#> Document-feature matrix of: 2 documents, 5 features (30.0% sparse).
#> 2 x 5 sparse Matrix of class "dfm"
#>        features
#> docs    b a c a b
#>   text1 1 3 5 0 0
#>   text2 1 0 2 1 1
dfm_compress(dfmat, margin = "features")
#> Document-feature matrix of: 3 documents, 3 features (22.2% sparse).
#> 3 x 3 sparse Matrix of class "dfm"
#>        features
#> docs    b a c
#>   text1 1 2 0
#>   text2 2 1 2
#>   text1 0 1 5
dfm_compress(dfmat)
#> Document-feature matrix of: 2 documents, 3 features (0.0% sparse).
#> 2 x 3 sparse Matrix of class "dfm"
#>        features
#> docs    b a c
#>   text1 1 3 5
#>   text2 2 1 2

# no effect if no compression needed
dfmatsubset <- dfm(data_corpus_inaugural[1:5])
dim(dfmatsubset)
#> [1] 5 1948
dim(dfm_compress(dfmatsubset))
#> [1] 5 1948

# compress an fcm
fcmat1 <- fcm(tokens("A D a C E a d F e B A C E D"),
              context = "window", window = 3)
## this will produce an error:
# fcm_compress(fcmat1)

txt <- c("The fox JUMPED over the dog.",
         "The dog jumped over the fox.")
toks <- tokens(txt, remove_punct = TRUE)
fcmat2 <- fcm(toks, context = "document")
colnames(fcmat2) <- rownames(fcmat2) <- tolower(colnames(fcmat2))
colnames(fcmat2)[5] <- rownames(fcmat2)[5] <- "fox"
fcmat2
#> Feature co-occurrence matrix of: 7 by 7 features.
#> 7 x 7 sparse Matrix of class "fcm"
#>         features
#> features the fox jumped over fox dog jumped
#>   the      0   2      1    2   2   2      1
#>   fox      0   0      1    2   2   2      1
#>   jumped   0   0      0    1   1   1      0
#>   over     0   0      0    0   2   2      1
#>   fox      0   0      0    0   0   2      1
#>   dog      0   0      0    0   0   0      1
#>   jumped   0   0      0    0   0   0      0
fcm_compress(fcmat2)
#> Feature co-occurrence matrix of: 5 by 5 features.
#> 5 x 5 sparse Matrix of class "fcm"
#>         features
#> features the fox jumped over dog
#>   the      0   4      2    2   2
#>   fox      0   2      3    2   4
#>   jumped   0   1      0    1   1
#>   over     0   2      1    0   2
#>   dog      0   0      1    0   0