Published January 30, 2019 | Version v1.4
Software
Open
quanteda/quanteda: CRAN v1.4.0
Creators
- Kenneth Benoit (1)
- Kohei Watanabe (2)
- Haiyan Wang (3)
- Paul Nulty (4)
- Adam Obeng (5)
- Stefan Müller (6)
- Jiong Wei Lua
- Aki Matsuo (7)
- Christian Mueller (1)
- Will Lowe (8)
- Pablo Barberá (9)
- Tyler Rinker (10)
- mark padgham (11)
- Christopher Gandrud (12)
- Tom Paskhalis (1)
- nicmer
- lindbrook
- hofaichan
- etienne-s
- hotzeplotz
- Thomas J. Leeper
- Stas Malavin (13)
- Michael W. Kearney (14)
- Michael Chirico (15)
- Katrin Leinweber (16)
- 1. London School of Economics and Political Science
- 2. Waseda University
- 3. LSE
- 4. University of Cambridge
- 5. Columbia University, London School of Economics
- 6. University of Zurich
- 7. Department of Methodology, London School of Economics
- 8. Princeton University
- 9. London School of Economics
- 10. Campus Labs
- 11. @ATFutures
- 12. @zalando
- 13. Soil Cryology Lab
- 14. @MUDSA
- 15. @myteksi
- 16. @TIBHannover
Description
Bug fixes and stability enhancements
- Fixed a bug in `dfm_compress()` and `dfm_group()` that changed or deleted docvars attributes of dfm objects (#1506).
- Fixed a bug in `textplot_xray()` that caused incorrect facet labels when a pattern contained multiple list elements or values (#1514).
- `kwic()` now correctly returns the pattern associated with each match as the `"keywords"` attribute, for all `pattern` types (#1515).
- Improved the efficiency of `textstat_simil()` and `textstat_dist()`, and their computation of unusual edge cases.
- `textstat_lexdiv()` now works on tokens objects, not just dfm objects. New measures of lexical diversity include MATTR (the Moving-Average Type-Token Ratio, Covington & McFall 2010) and MSTTR (Mean Segmental Type-Token Ratio).
- New function `tokens_split()` splits single tokens into multiple tokens based on a pattern match (#1500).
- New function `tokens_chunk()` splits tokens into new documents of equally-sized "chunks" (#1520).
- New function `textstat_entropy()` computes entropy for a dfm across feature or document margins.
- The documentation for `textstat_readability()` is vastly improved, now detailing all formulas and providing full references.
- New function `dfm_match()` allows a user to specify the features in a dfm according to a fixed vector of feature names, including those of another dfm. Replaces `dfm_select(x, pattern)` where `pattern` was a dfm.
- A new argument `vertex_labelsize` was added to `textplot_network()` to allow more precise control of label sizes, either globally or individually.
- `tokens.tokens(x, remove_hyphens = TRUE)`, where `x` was generated with `remove_hyphens = FALSE`, now behaves similarly to how the same tokens would be handled had this option been applied to character input via `tokens.character(x, remove_hyphens = TRUE)` (#1498).
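
To make the two lexical diversity measures named above more concrete, here is a minimal base R sketch of the ideas behind MATTR and MSTTR. It is not quanteda's implementation: the helper names `ttr()`, `mattr()`, and `msttr()` and their arguments are hypothetical, and dropping a trailing partial segment in MSTTR is an assumption made for this illustration.

```r
# Illustrative sketch only; these helpers are hypothetical and are not
# part of quanteda.

# Type-token ratio: number of unique tokens divided by number of tokens.
ttr <- function(tokens) length(unique(tokens)) / length(tokens)

# MATTR: mean TTR over every window of `window` consecutive tokens,
# sliding forward one token at a time.
mattr <- function(tokens, window) {
  n <- length(tokens)
  stopifnot(window >= 1, window <= n)
  starts <- seq_len(n - window + 1)
  mean(vapply(starts,
              function(i) ttr(tokens[i:(i + window - 1)]),
              numeric(1)))
}

# MSTTR: mean TTR over consecutive, non-overlapping segments of `segment`
# tokens. Assumption: a trailing partial segment is dropped.
msttr <- function(tokens, segment) {
  stopifnot(segment >= 1)
  n_seg <- length(tokens) %/% segment
  stopifnot(n_seg >= 1)
  starts <- (seq_len(n_seg) - 1) * segment + 1
  mean(vapply(starts,
              function(i) ttr(tokens[i:(i + segment - 1)]),
              numeric(1)))
}

toks <- c("a", "b", "a", "c", "a", "b", "d", "a")
mattr(toks, window = 4)   # averages the TTR of 5 overlapping windows
msttr(toks, segment = 4)  # averages the TTR of 2 disjoint segments
```

The only difference between the two measures in this sketch is how the text is cut up: MATTR uses overlapping windows, so every token position contributes, while MSTTR uses disjoint segments and is cheaper to compute.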
Files (32.7 MB)
Name | Size |
---|---|
quanteda/quanteda-v1.4.zip (md5:d73bcfb636cb589134cbec939f27495f) | 32.7 MB |
Additional details
Related works
- Is supplement to: https://github.com/quanteda/quanteda/tree/v1.4 (URL)