textmodel_ca
textmodel_ca implements correspondence analysis scaling on a dfm. The method is a fast, sparse version of the function ca from the ca package and returns a special class of ca object.
textmodel_ca(x, smooth = 0, nd = NA, sparse = FALSE, threads = 1, residual_floor = 0.1)
Argument | Description |
---|---|
x | the dfm on which the model will be fit |
smooth | a smoothing parameter for word counts; defaults to zero. |
nd | number of dimensions to be included in the output; if NA (the default), the maximum possible number of dimensions is included. |
sparse | retains the sparsity of the input if set to TRUE |
threads | the number of threads to be used; set to 1 to use a serial version of the function; only applicable when sparse = TRUE |
residual_floor | specifies the threshold for the residual matrix used in calculating the truncated SVD. A larger value reduces memory and time costs but may sacrifice accuracy; only applicable when sparse = TRUE |
The svds function from the RSpectra package is used for fast computation of the SVD.
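For orientation, here is a minimal dense sketch in base R of the computation behind correspondence analysis: the counts are converted to a correspondence matrix, standardized residuals are formed, and their SVD yields the dimensions. The helper ca_sketch and the toy counts matrix are hypothetical; the sketch uses base svd() rather than RSpectra's svds and does not reproduce the package's sparse or parallel code paths. Its smooth and nd arguments only mirror the names in the table above by assumption.

```r
# Minimal dense correspondence analysis in base R (an illustrative sketch,
# not the textmodel_ca implementation).
ca_sketch <- function(x, smooth = 0, nd = 2) {
  x <- as.matrix(x) + smooth       # optional additive smoothing of counts
  P <- x / sum(x)                  # correspondence matrix
  r <- rowSums(P)                  # row masses (documents)
  cm <- colSums(P)                 # column masses (features)
  E <- outer(r, cm)                # expected proportions r_i * c_j
  S <- (P - E) / sqrt(E)           # standardized residuals
  dec <- svd(S, nu = nd, nv = nd)  # full SVD, keep the first nd vectors
  list(
    sv = dec$d[seq_len(nd)],       # singular values of retained dimensions
    rowcoord = dec$u / sqrt(r),    # standard coordinates for documents
    colcoord = dec$v / sqrt(cm)    # standard coordinates for features
  )
}

# toy document-feature counts (3 documents x 4 features)
counts <- matrix(c(10, 2, 0, 3,
                    1, 8, 4, 0,
                    0, 3, 9, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3), paste0("w", 1:4)))
fit <- ca_sketch(counts, nd = 2)
round(fit$sv, 3)
```

Because this sketch materializes the full residual matrix S, its memory use grows with the product of the matrix dimensions, which is the cost that the sparse option and residual_floor are meant to reduce.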
Setting threads larger than 1 (when sparse = TRUE) triggers parallel computation, which retains the sparsity of all involved matrices. With a very large dfm, you may need to increase residual_floor so that less important information is ignored, which reduces the memory cost.
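The trade-off can be illustrated with a hypothetical thresholding rule in which residuals whose absolute value falls below the floor are set to zero, so that fewer entries need to be stored. This rule is an assumption made for illustration; the exact thresholding that textmodel_ca applies may differ. The helper sparsify_residuals and the random stand-in matrix are hypothetical.

```r
# Hypothetical thresholding rule (an assumption, not textmodel_ca's code):
# residuals with absolute value below `floor` are set to zero.
sparsify_residuals <- function(S, floor = 0.1) {
  S[abs(S) < floor] <- 0
  S
}

set.seed(1)
# stand-in for a matrix of standardized residuals (10 x 20)
S <- matrix(rnorm(200, sd = 0.1), nrow = 10)

# share of entries that survive (and would need storing) at each floor
for (f in c(0.05, 0.1, 0.2)) {
  kept <- mean(sparsify_residuals(S, f) != 0)
  cat(sprintf("floor = %.2f: %.0f%% of entries kept\n", f, 100 * kept))
}
```

A higher floor keeps fewer entries, which is why raising it lowers memory use at the price of discarding small residuals.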
If fitting the model fails because the matrix is too large, the likely cause is the memory demand of computing the \(V \times V\) residual matrix. To avoid this, increase residual_floor in steps of 0.1 until the model can be fit.
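A quick estimate shows why a dense \(V \times V\) matrix becomes a problem: stored as 8-byte doubles, it needs \(8V^2\) bytes. The helper dense_gb below is hypothetical and ignores R's small per-object overhead.

```r
# Approximate memory (in decimal gigabytes) for a dense n x m matrix of
# doubles at 8 bytes per entry, ignoring R's small per-object overhead.
dense_gb <- function(n, m) n * m * 8 / 1e9

dense_gb(20000, 20000)  # 3.2
dense_gb(50000, 50000)  # 20
```

Because the cost grows with \(V^2\), going from 20,000 to 50,000 features multiplies the memory requirement by 6.25.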
Nenadic, O. and Greenacre, M. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. Journal of Statistical Software, 20 (3), http://www.jstatsoft.org/v20/i03/.