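Internal functions used in textstat_keyness. keyness_chi2_dt computes \(\chi^2\) with Yates' continuity correction for 2x2 tables; the other functions compute the alternative association measures described under Details.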

keyness_chi2_dt(x, correction = c("default", "yates", "williams", "none"))

keyness_chi2_stats(x)

keyness_exact(x)

keyness_lr(x, correction = c("default", "yates", "williams", "none"))

keyness_pmi(x)

Arguments

x

a dfm object

correction

the continuity correction to apply for 2x2 tables; one of "default", "yates", "williams", or "none"

Value

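a data.frame containing the association statistic (chi2, G2, or pmi, depending on the function), its p-value, and the target and reference counts, with rows named for each feature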

Details

keyness_chi2_dt uses vectorized computation from data.table objects.

keyness_chi2_stats uses element-by-element application of chisq.test.

keyness_exact computes Fisher's exact test using element-by-element application of fisher.test, returning the odds ratio.

keyness_lr computes the \(G^2\) likelihood ratio statistic using vectorized computation.

keyness_pmi computes the Pointwise Mutual Information statistic using vectorized computation.
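As an illustration of the vectorized approach, the following base R sketch computes a signed \(\chi^2\) for every feature at once from the two count vectors. It is not the package's implementation: signed_chi2 is a hypothetical helper, and the rule for when the Yates correction is applied (some expected cell count below 5 and \(|ad - bc| \ge N/2\)) is an assumption, chosen because it is consistent with the values printed in the Examples for this input.

```r
# Hypothetical helper, not part of quanteda: signed chi-squared per feature,
# given named count vectors for the target (a) and reference (b) documents.
signed_chi2 <- function(a, b) {
  cc <- sum(a) - a                 # all other tokens in the target
  dd <- sum(b) - b                 # all other tokens in the reference
  N  <- a + b + cc + dd            # total tokens (same for every feature)
  E  <- (a + b) * (a + cc) / N     # expected target count under independence
  denom <- (a + b) * (cc + dd) * (a + cc) * (b + dd)

  # ASSUMPTION: Yates' correction is applied only when the smallest expected
  # cell count is below 5 and |ad - bc| is at least N / 2.
  min_expected <- pmin((a + b) * (a + cc), (a + b) * (b + dd),
                       (cc + dd) * (a + cc), (cc + dd) * (b + dd)) / N
  use_yates <- min_expected < 5 & abs(a * dd - b * cc) >= N / 2

  diff <- abs(a * dd - b * cc) - ifelse(use_yates, N / 2, 0)
  # The sign records direction: positive when the feature is over-represented
  # in the target relative to its expected count.
  chi2 <- N * diff^2 / denom * ifelse(a > E, 1, -1)

  data.frame(chi2 = chi2,
             p = pchisq(abs(chi2), df = 1, lower.tail = FALSE),
             target = a, reference = b,
             row.names = names(a))
}

# Feature counts of the two example documents d1 and d2
target    <- c(a = 3, b = 2, c = 6, d = 1, e = 1, f = 1, g = 1, h = 2)
reference <- c(a = 2, b = 1, c = 2, d = 4, e = 1, f = 1, g = 0, h = 1)
res <- signed_chi2(target, reference)
print(res)
```

Because every step operates on whole vectors, no per-feature loop or call to chisq.test is needed, which is the practical difference between the vectorized and element-by-element variants.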

References

https://en.wikipedia.org/wiki/Yates's_correction_for_continuity

http://influentialpoints.com/Training/g-likelihood_ratio_test.htm

Examples

mydfm <- dfm(c(d1 = "a a a b b c c c c c c d e f g h h",
               d2 = "a a b c c d d d d e f h"))
quanteda:::keyness_chi2_dt(mydfm)
#>           chi2         p target reference
#> a  0.004738562 0.9451192      3         2
#> b  0.089303670 0.7650643      2         1
#> c  0.467298378 0.4942327      6         2
#> d -2.040247141 0.1531848      1         4
#> e -0.065813362 0.7975330      1         1
#> f -0.065813362 0.7975330      1         1
#> g  0.731092437 0.3925293      1         0
#> h  0.089303670 0.7650643      2         1
quanteda:::keyness_chi2_stats(mydfm)
#> Error in get(".SigLength", envir = env): object '.SigLength' not found
quanteda:::keyness_exact(mydfm)
#> Error in get(".SigLength", envir = env): object '.SigLength' not found
quanteda:::keyness_lr(mydfm)
#>             G2         p target reference
#> a  0.004751467 0.9450446      3         2
#> b  0.091241952 0.7626042      2         1
#> c  0.477346514 0.4896267      6         2
#> d -2.028083548 0.1544152      1         4
#> e -0.064900531 0.7989117      1         1
#> f -0.064900531 0.7989117      1         1
#> g  1.093291040 0.2957432      1         0
#> h  0.091241952 0.7626042      2         1
quanteda:::keyness_pmi(mydfm)
#>           pmi         p target reference
#> a  0.02325686 0.8787910      3         2
#> b  0.12861738 0.7198699      2         1
#> c  0.24640041 0.6196211      6         2
#> d -1.07535542 0.2997389      1         4
#> e -0.15906469 0.6900191      1         1
#> f -0.15906469 0.6900191      1         1
#> g  0.53408249 0.4648955      1         0
#> h  0.12861738 0.7198699      2         1
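
The G2 and pmi columns above can be approximated by hand in base R, which may help when checking the statistics on a small table. This is a sketch under stated assumptions and not the package's code: pmi is taken as the log of the observed over the expected target count, and G2 is assumed to use a Yates-style adjustment of 0.5 toward the expected counts whenever some expected cell count is below 5 and \(|ad - bc| \ge N/2\). Both assumptions were chosen because they are consistent with the printed values for this input. The helper term and all variable names are illustrative.

```r
# Feature counts of the two example documents d1 and d2
target    <- c(a = 3, b = 2, c = 6, d = 1, e = 1, f = 1, g = 1, h = 2)
reference <- c(a = 2, b = 1, c = 2, d = 4, e = 1, f = 1, g = 0, h = 1)

a  <- target
b  <- reference
cc <- sum(a) - a                  # other tokens in the target
dd <- sum(b) - b                  # other tokens in the reference
N  <- a + b + cc + dd

# Expected cell counts under independence (row total * column total / N)
Ea <- (a + b) * (a + cc) / N
Eb <- (a + b) * (b + dd) / N
Ec <- (cc + dd) * (a + cc) / N
Ed <- (cc + dd) * (b + dd) / N

# ASSUMPTION: pointwise mutual information as log(observed / expected)
pmi <- log(a / Ea)

# ASSUMPTION: move each observed count 0.5 toward its expected value when the
# smallest expected count is below 5 and |ad - bc| >= N / 2.
adjust <- pmin(Ea, Eb, Ec, Ed) < 5 & abs(a * dd - b * cc) >= N / 2
shift  <- ifelse(adjust, 0.5 * sign(a * dd - b * cc), 0)

# o * log(o / e), with the convention that a zero count contributes zero
term <- function(o, e) ifelse(o > 0, o * log(o / e), 0)

# Signed G2: positive when the feature is over-represented in the target
G2 <- 2 * (term(a - shift, Ea) + term(b + shift, Eb) +
           term(cc + shift, Ec) + term(dd - shift, Ed)) *
  ifelse(a > Ea, 1, -1)

print(data.frame(G2 = G2, pmi = pmi, target = a, reference = b))
```

The zero-count guard in term matters for feature g, which does not occur in the reference document; without it, 0 * log(0) evaluates to NaN in R.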