Combine documents in a dfm by a grouping variable, which can also be one of the docvars attached to the dfm. This is identical in functionality to using the "groups" argument in dfm.
dfm_group(x, groups = NULL, fill = FALSE)
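Argument | Description |
---|---|
x | a dfm |
groups | either a character vector containing the names of document variables to be used for grouping, or a factor (or an object that can be coerced into a factor) whose length equals the number of documents. See groups for details. |
fill | logical; if TRUE and groups is a factor, all levels of the factor are used to form the new documents, so levels with no observed documents become documents with zero counts. |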
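Both accepted forms of groups can be thought of as reducing to a single grouping factor with one value per document. The base R sketch below illustrates that idea under stated assumptions: resolve_groups is a hypothetical helper, not a quanteda function, and the data frame dv merely stands in for the dfm's docvars. When several variable names are given, their unique observed combinations become the groups.

```r
# Hypothetical helper (NOT part of quanteda): reduce a `groups` argument to a factor.
# `docvars` is a plain data frame standing in for the dfm's document variables.
resolve_groups <- function(groups, docvars) {
  if (is.character(groups) && all(groups %in% names(docvars))) {
    # Names of document variables: combine the named columns into one factor
    # whose levels are the observed combinations (unused combinations dropped)
    return(interaction(docvars[groups], drop = TRUE, sep = "."))
  }
  # Otherwise treat `groups` as one value per document and coerce it to a factor
  if (length(groups) != nrow(docvars)) {
    stop("groups must have one value per document")
  }
  as.factor(groups)
}

dv <- data.frame(grp  = c("grp1", "grp1", "grp2", "grp2"),
                 year = c(2000, 2001, 2000, 2000))

resolve_groups("grp", dv)             # levels: grp1, grp2
resolve_groups(c("grp", "year"), dv)  # three observed grp/year combinations
resolve_groups(c(1, 1, 2, 2), dv)     # a plain vector, coerced to a factor
```

In this sketch a character vector is treated as variable names only when every element matches a column of dv; that rule is an assumption made for illustration.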
dfm_group returns a dfm whose documents are the unique group combinations and whose cell values are the sums of the original counts within each group. This currently erases any docvars in the dfm.
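The summing step can be sketched in base R with rowsum(), which adds together all rows of a matrix that share a group value and returns one row per unique group. This is an illustration on a plain matrix, assumed to hold the same counts as the dfm in the examples, and not quanteda's own implementation.

```r
# A plain count matrix standing in for the example dfm
# (rows = documents, columns = features)
counts <- matrix(c(2, 1, 0, 0,   # "a a b"
                   1, 1, 2, 0,   # "a b c c"
                   1, 0, 1, 2,   # "a c d d"
                   1, 0, 2, 1),  # "a c c d"
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("text", 1:4), c("a", "b", "c", "d")))
grp <- c("grp1", "grp1", "grp2", "grp2")

# rowsum() sums the rows sharing each value of `grp`;
# the result has one row per unique group, sorted by group value
grouped <- rowsum(counts, group = grp)
grouped
```

The two resulting rows, grp1 (3 2 2 0) and grp2 (2 0 3 3), agree with the output of dfm_group(mydfm, groups = "grp") in the examples.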
Setting fill = TRUE offers a way to "pad" a dfm with document groups that were not observed but for which an empty document is still needed. If groups is a factor of dates, for instance, then fill = TRUE ensures that the new dfm has one row per date, regardless of whether any documents previously existed with that date.
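The padding behaviour of fill = TRUE can be sketched in base R by summing the observed groups with rowsum() and then copying those sums into an all-zero matrix that has one row per factor level. The helper group_counts and the date factor day are hypothetical illustrations, not part of quanteda.

```r
# Same counts as the example dfm (rows = documents, columns = features)
counts <- matrix(c(2, 1, 0, 0,
                   1, 1, 2, 0,
                   1, 0, 1, 2,
                   1, 0, 2, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("text", 1:4), c("a", "b", "c", "d")))

# A factor of dates with one level ("2020-01-02") that no document has
day <- factor(c("2020-01-01", "2020-01-01", "2020-01-03", "2020-01-03"),
              levels = c("2020-01-01", "2020-01-02", "2020-01-03"))

# Hypothetical helper: sum rows by group, optionally padding unobserved levels
group_counts <- function(x, groups, fill = FALSE) {
  out <- rowsum(x, group = as.character(groups))  # observed groups only
  if (fill && is.factor(groups)) {
    # one all-zero row per factor level, then copy in the observed sums
    padded <- matrix(0, nrow = nlevels(groups), ncol = ncol(x),
                     dimnames = list(levels(groups), colnames(x)))
    padded[rownames(out), ] <- out
    out <- padded
  }
  out
}

group_counts(counts, day)               # 2 rows: only the observed dates
group_counts(counts, day, fill = TRUE)  # 3 rows: 2020-01-02 is all zeros
```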
mycorpus <- corpus(c("a a b", "a b c c", "a c d d", "a c c d"),
                   docvars = data.frame(grp = c("grp1", "grp1", "grp2", "grp2")))
mydfm <- dfm(mycorpus)
dfm_group(mydfm, groups = "grp")
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfmSparse"
#>       features
#> docs   a b c d
#>   grp1 3 2 2 0
#>   grp2 2 0 3 3
dfm_group(mydfm, groups = c(1, 1, 2, 2))
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfmSparse"
#>     features
#> docs a b c d
#>    1 3 2 2 0
#>    2 2 0 3 3