The dfm class of object is a type of Matrix-class object with additional slots, described below. quanteda uses two subclasses of the dfm class, depending on whether the object can be represented by a sparse matrix, in which case it is a dfm class object, or if dense, then a dfmDense object. See Details.
# S4 method for dfm
t(x)

# S4 method for dfm
colSums(x, na.rm = FALSE, dims = 1L, ...)

# S4 method for dfm
rowSums(x, na.rm = FALSE, dims = 1L, ...)

# S4 method for dfm
colMeans(x, na.rm = FALSE, dims = 1L, ...)

# S4 method for dfm
rowMeans(x, na.rm = FALSE, dims = 1L, ...)

# S4 method for dfm,numeric
+(e1, e2)

# S4 method for numeric,dfm
+(e1, e2)

# S4 method for dfm,index,index,missing
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,index,index,logical
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,missing,missing,missing
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,missing,missing,logical
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,index,missing,missing
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,index,missing,logical
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,missing,index,missing
[(x, i, j, ..., drop = FALSE)

# S4 method for dfm,missing,index,logical
[(x, i, j, ..., drop = FALSE)
Argument | Description |
---|---|
x | the dfm object |
na.rm | if TRUE, omit missing values from the calculations |
dims | ignored |
... | additional arguments not used here |
e1 | first quantity in the "+" operation for a dfm |
e2 | second quantity in the "+" operation for a dfm |
i | index for documents |
j | index for features |
drop | always set to FALSE |
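As a rough illustration of the methods and arguments listed above, the sketch below builds a very small dfm and applies the summary, transpose, arithmetic and subsetting methods to it. The two toy documents and the names `d1`, `d2`, `feature_totals`, `doc_totals`, `x_t`, `x_plus` and `sub` are made up for this example, and it assumes an installed quanteda in which `tokens()` and `dfm()` behave as in the Examples.

```r
library(quanteda)

# Two toy documents (hypothetical); the vector names become the document names.
toks <- tokens(c(d1 = "a b b c", d2 = "b c c d"))
x <- dfm(toks)

feature_totals <- colSums(x)  # total count of each feature across documents
doc_totals     <- rowSums(x)  # total feature count within each document
x_t            <- t(x)        # transpose: features in rows, documents in columns
x_plus         <- x + 1       # the documented "+" method for a dfm and a number

# i selects documents and j selects features; drop is fixed at FALSE,
# so a single-row selection keeps both dimensions.
sub <- x["d1", c("b", "c")]
dim(sub)
```

Selecting by name rather than by position, as in the last call, is only one option; integer and logical indexes are handled by the same `[` methods.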
The dfm class is a virtual class that will contain dgCMatrix-class.
settings
settings that govern corpus handling and subsequent downstream operations, including the settings used to clean and tokenize the texts, and to create the dfm. See settings.

weighting
the feature weighting applied to the dfm. Default is "frequency", indicating that the values in the cells of the dfm are simple feature counts. To change this, use the dfm_weight method.

smooth
a smoothing parameter, defaults to zero. Can be changed using either the smooth or the dfm_weight methods.

Dimnames
These are inherited from Matrix-class but are named docs and features respectively.
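A minimal sketch of how the named dimensions can be inspected, assuming an installed quanteda; the toy documents and the variable names `x` and `dn` are hypothetical. `docnames()` and `featnames()` are the quanteda accessors for the two sets of names, and `dimnames()` is the base R generic.

```r
library(quanteda)

# Toy dfm (hypothetical documents d1 and d2).
x <- dfm(tokens(c(d1 = "a b b c", d2 = "b c c d")))

dn <- dimnames(x)  # a list with one element per dimension
names(dn)          # the dimension names described above: docs and features
docnames(x)        # document names (rows)
featnames(x)       # feature names (columns)
```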
# dfm subsetting
x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"), remove_punct = TRUE))
x[1:2, ]
#> Document-feature matrix of: 2 documents, 16 features (59.4% sparse).
#> 2 x 16 sparse Matrix of class "dfm"
#>        features
#> docs    this contains lots of stopwords no if and or but about it a third
#>   text1    1        1    1  1         1  0  0   0  0   0     0  0 0     0
#>   text2    0        0    1  0         0  1  1   1  1   1     1  1 0     0
#>        features
#> docs    document is
#>   text1        0  0
#>   text2        0  0
x[1:2, 1:5]
#> Document-feature matrix of: 2 documents, 5 features (40% sparse).
#> 2 x 5 sparse Matrix of class "dfm"
#>        features
#> docs    this contains lots of stopwords
#>   text1    1        1    1  1         1
#>   text2    0        0    1  0         0

# fcm subsetting
y <- fcm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots"), remove_punct = TRUE))
y[1:3, ]
#> Feature co-occurrence matrix of: 3 by 12 features.
#> 3 x 12 sparse Matrix of class "fcm"
#>           features
#> features   this contains lots of stopwords no if and or but about it
#>   this        0        1    1  1         1  0  0   0  0   0     0  0
#>   contains    0        0    1  1         1  0  0   0  0   0     0  0
#>   lots        0        0    0  1         1  1  1   1  1   1     1  1
y[4:5, 1:5]
#> Feature co-occurrence matrix of: 2 by 5 features.
#> 2 x 5 sparse Matrix of class "fcm"
#>            features
#> features    this contains lots of stopwords
#>   of           0        0    0  0         1
#>   stopwords    0        0    0  0         0