Published July 4, 2019 | Version v1.5.0
Software
Open
quanteda/quanteda: CRAN v1.5.0
Authors/Creators
- Kenneth Benoit (1)
- Kohei Watanabe (2)
- Haiyan Wang (3)
- Paul Nulty (4)
- Adam Obeng (5)
- Stefan Müller (6)
- Jiong Wei Lua (7)
- Aki Matsuo (8)
- Christian Mueller (1)
- Will Lowe (9)
- Pablo Barberá (10)
- Tyler Rinker (11)
- mark padgham (12)
- Christopher Gandrud (13)
- José Tomás Atria
- Tom Paskhalis (1)
- nicmer
- lindbrook
- hofaichan
- etienne-s
- hotzeplotz
- Thomas J. Leeper
- Stas Malavin (14)
- Michael W. Kearney (15)
- Michael Chirico (16)
- Katrin Leinweber (17)
- Johannes Gruber (18)
- 1. London School of Economics and Political Science
- 2. Waseda University
- 3. Tracr
- 4. University College Dublin
- 5. Columbia University, London School of Economics
- 6. University of Zurich
- 7. London School of Economics
- 8. Institute for Analytics and Data Science, University of Essex
- 9. Princeton University
- 10. University of Southern California
- 11. Campus Labs
- 12. @ATFutures
- 13. @zalando
- 14. Soil Cryology Lab
- 15. @MUDSA
- 16. @myteksi
- 17. @TIBHannover
- 18. University of Glasgow
Description
New features
- Add `flatten` and `levels` arguments to `as.list.dictionary2()` to enable more flexible conversion of dictionary objects. (#1661)
- In `corpus_sample()`, the `size` argument now works with the `by` argument, to control the size of units sampled from each group.
- Improvements to `textstat_dist()` and `textstat_simil()`, see below.
- Long tokens are no longer discarded automatically in the call to `tokens()`. (#1713)
- `textstat_dist()` and `textstat_simil()` now return sparse symmetric matrix objects using classes from the Matrix package. This replaces the former structure based on the `dist` class. Computation of these classes is now also based on the fast implementation in the proxyC package. When computing similarities, the new `min_simil` argument allows a user to ignore certain values below a specified similarity threshold. A new coercion method `as.data.frame.textstat_simildist()` now exists for converting these returns into a data.frame of pairwise comparisons. Existing methods such as `as.matrix()`, `as.dist()`, and `as.list()` work as they did before.
- We have removed the "faith", "chi-squared", and "kullback" methods from `textstat_dist()` and `textstat_simil()` because these were either not symmetric or not invariant to document or feature ordering. Finally, the `selection` argument has been deprecated in favour of a new `y` argument.
- `textstat_readability()` now defaults to `measure = "Flesch"` if no measure is supplied. This makes it consistent with `textstat_lexdiv()`, which also takes a default measure ("TTR") if none is supplied. (#1715)
- The default values for `max_nchar` and `min_nchar` in `tokens_select()` are now NULL, meaning they are not applied if the user does not supply values. Fixes #1713.
- `kwic.corpus()` and `kwic.tokens()` behaviour is now aligned, meaning that dictionaries are correctly faceted by key instead of by value. (#1684)
- Improved formatting of `tokens()` verbose output. (#1683)
- Subsetting and printing of subsetted kwic objects is more robust. (#1665)
- The "Bormuth" and "DRP" measures are now fixed for `textstat_readability()`. (#1701)
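As a rough illustration of the `textstat_simil()` changes listed above, the following sketch assumes quanteda v1.5.0 is installed. The toy documents and object names (`txt`, `dfmat`, `sim_high`, and so on) are hypothetical and chosen only for this example; the "cosine" method and the 0.5 threshold are arbitrary choices, not recommendations from the release notes.

```r
library(quanteda)  # assumption: quanteda v1.5.0

# Hypothetical toy documents, for illustration only
txt <- c(d1 = "the cat sat on the mat",
         d2 = "the cat lay on the rug",
         d3 = "stocks fell sharply on monday")
dfmat <- dfm(tokens(txt))

# Pairwise document similarities, returned as a sparse symmetric matrix object
sim <- textstat_simil(dfmat, method = "cosine", margin = "documents")

# New min_simil argument: ignore similarities below the chosen threshold
sim_high <- textstat_simil(dfmat, method = "cosine", min_simil = 0.5)

# New coercion method: a data.frame of pairwise comparisons
pairs <- as.data.frame(sim_high)

# New y argument (replacing the deprecated selection argument):
# compare every document against a single target document
sim_d1 <- textstat_simil(dfmat, y = dfmat["d1", ], method = "cosine")

# Existing coercion methods such as as.matrix() still apply
mat <- as.matrix(sim)
```

Printing `pairs` shows one row per retained document pair, which is usually easier to filter or join than the matrix form.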
Files
(37.0 MB)
| Name | Size |
|---|---|
| quanteda/quanteda-v1.5.0.zip (md5:5b7294057e21e230fdb349076e556c1e) | 37.0 MB |
Additional details
Related works
- Is supplement to: https://github.com/quanteda/quanteda/tree/v1.5.0 (URL)