There is a newer version of the record available.

Published January 27, 2017 | Version v0.9.9-17
Software Open

kbenoit/quanteda: CRAN v0.9.9-17

  • 1. London School of Economics and Political Science
  • 2. University of Cambridge
  • 3. Columbia University, London School of Economics
  • 4. London School of Economics
  • 5. University of Southern California
  • 6. Department of Methodology, LSE
  • 7. Harvard IQSS (@IQSS)
  • 8. University at Buffalo
  • 9. Finnish Museum of Natural History, University of Helsinki

Description

Bug fixes and minor feature additions.

Changes since v0.9.9-3

Bug fixes
  • Fixed a bug causing dfm and tokens to break on > 10,000 documents. (#438)
  • Fixed a bug in tokens(x, what = "character", removeSeparators = TRUE) that returned an empty string.
  • Fixed a bug in corpus.VCorpus if the VCorpus contains a single document. (#445)
  • Fixed a bug in dfm_compress in which the function failed on documents that contained zero feature counts. (#467)
  • Fixed a bug in textmodel_NB that caused the class priors Pc to be refactored alphabetically instead of in the order of assignment (#471), also affecting predicted classes (#476).
New features
  • New textstat function textstat_keyness() discovers words that occur at differential rates between partitions of a dfm (using chi-squared, Fisher's exact test, and the G^2 likelihood ratio test to measure the strength of associations).
  • Added 2017-Trump to the inaugural corpus datasets (data_corpus_inaugural and data_char_inaugural).
  • Improved the groups argument in texts() (and in dfm() that uses this function), which will now coerce to a factor rather than requiring one.
  • Added a dfm constructor from dfm objects, with the option of collapsing by groups.
  • Added new arguments to sequences(): ordered and max_length, the latter to prevent memory leaks from extremely long sequences.
  • dictionary() now accepts YAML as an input file format.
  • dfm_lookup and tokens_lookup now accept a levels argument to determine which level of a hierarchical dictionary should be applied.
  • Added min_nchar and max_nchar arguments to dfm_select.
  • dictionary() now accepts its named entries directly as arguments, without explicitly wrapping them in list().
  • fcm now works directly on a dfm object when context = "documents".
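
To illustrate the kind of statistic textstat_keyness() reports, here is a minimal Python sketch of the chi-squared and G^2 measures on a 2x2 table of counts for one word. It is not quanteda's R implementation: the function name keyness_2x2 and its arguments are hypothetical, no continuity correction is applied, and all row and column totals are assumed to be nonzero.

```python
import math

def keyness_2x2(a, b, c, d):
    """Chi-squared and G^2 for one word across two partitions.

    a: count of the word in the target partition
    b: count of the word in the reference partition
    c: count of all other words in the target partition
    d: count of all other words in the reference partition
    (Hypothetical helper; assumes every margin total is > 0.)
    """
    n = a + b + c + d
    word_total, other_total = a + b, c + d
    target_total, reference_total = a + c, b + d

    observed = [a, b, c, d]
    # Expected counts if the word were independent of the partition.
    expected = [
        word_total * target_total / n,
        word_total * reference_total / n,
        other_total * target_total / n,
        other_total * reference_total / n,
    ]

    # Pearson chi-squared, without a continuity correction.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # G^2 likelihood ratio; cells with zero observed count contribute 0.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return chi2, g2

# A word seen 20 times in 100 target tokens and 10 times in 100 reference tokens.
chi2, g2 = keyness_2x2(20, 10, 80, 90)
print(round(chi2, 3), round(g2, 3))
```

Both values are zero when the word occurs at the same rate in the two partitions and grow as the rates diverge.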

Files

kbenoit/quanteda-v0.9.9-17.zip (12.1 MB)
md5:e7d71c3d5273fbb6d3e99f58a55b4217
