Combine documents in a dfm by a grouping variable, which can also be
one of the docvars attached to the dfm. This is identical in
functionality to using the "groups" argument in dfm.
dfm_group(x, groups = NULL, fill = FALSE)
Argument | Description |
---|---|
x | a dfm |
groups | either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. See groups for details. |
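fill | logical; if TRUE and groups is a factor, all levels of the factor are used when forming the documents of the grouped dfm, including levels for which no documents were observed |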
dfm_group returns a dfm whose documents are the unique group
combinations, and whose cell values are the original feature counts
summed within each group. Document-level variables that do not vary
within groups are retained in docvars.
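The summing behaviour described above can be illustrated with a plain count matrix. This is only a base R analogy using rowsum, not how dfm_group is implemented; the matrix m and the vector grp are hypothetical names, with counts chosen to match the four example texts further down this page.

```r
# Base R analogy (an assumption for illustration, not quanteda internals):
# summing the rows of a document-feature count matrix by group.
m <- matrix(
  c(2, 1, 0, 0,   # "a a b"
    1, 1, 2, 0,   # "a b c c"
    1, 0, 1, 2,   # "a c d d"
    1, 0, 2, 1),  # "a c c d"
  nrow = 4, byrow = TRUE,
  dimnames = list(docs = paste0("text", 1:4),
                  features = c("a", "b", "c", "d"))
)

# Hypothetical grouping vector: one entry per document (row).
grp <- c("grp1", "grp1", "grp2", "grp2")

# rowsum() adds together all rows that share a group label.
grouped <- rowsum(m, group = grp)
print(grouped)
```

Each row of grouped corresponds to one group, and each cell holds the total count of that feature across the group's documents.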
Setting fill = TRUE offers a way to "pad" a dfm with document groups
that may not have been observed, but for which an empty document is
needed. If groups is a factor of dates, for instance, then using
fill = TRUE ensures that the grouped dfm contains one row per date,
regardless of whether any documents previously existed with that date.
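A minimal sketch of the date case, assuming the quanteda package is installed and that groups accepts a factor as described in the arguments above. The object names (toks, x, day) and the dates are hypothetical; the factor declares three levels, but no document falls on the middle one.

```r
library(quanteda)

# Three short documents; the names d1..d3 are illustrative only.
toks <- tokens(c(d1 = "a a b", d2 = "a b c c", d3 = "a c d d"))
x <- dfm(toks)

# Hypothetical date factor: three declared levels, only two observed.
day <- factor(
  c("2020-01-01", "2020-01-01", "2020-01-03"),
  levels = c("2020-01-01", "2020-01-02", "2020-01-03")
)

# Without fill, only the observed dates become documents.
observed <- dfm_group(x, groups = day)

# With fill = TRUE, every factor level should get a row,
# and the unobserved date should be an all-zero document.
padded <- dfm_group(x, groups = day, fill = TRUE)

ndoc(observed)
ndoc(padded)
docnames(padded)
```

Comparing ndoc(observed) with ndoc(padded) shows whether the unobserved level was added as an empty document.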
mycorpus <- corpus(c("a a b", "a b c c", "a c d d", "a c c d"),
                   docvars = data.frame(grp = c("grp1", "grp1", "grp2", "grp2")))
mydfm <- dfm(mycorpus)
dfm_group(mydfm, groups = "grp")
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>       features
#> docs   a b c d
#>   grp1 3 2 2 0
#>   grp2 2 0 3 3
dfm_group(mydfm, groups = c(1, 1, 2, 2))
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>     features
#> docs a b c d
#>    1 3 2 2 0
#>    2 2 0 3 3