There is a newer version of this record available.

Software Open Access

quanteda/quanteda: CRAN v1.5.0

Kenneth Benoit; Kohei Watanabe; Haiyan Wang; Paul Nulty; Adam Obeng; Stefan Müller; Jiong Wei Lua; Aki Matsuo; Christian Mueller; Will Lowe; Pablo Barberá; Tyler Rinker; mark padgham; Christopher Gandrud; José Tomás Atria; Tom Paskhalis; nicmer; lindbrook; hofaichan; etienne-s; hotzeplotz; Thomas J. Leeper; Stas Malavin; Michael W. Kearney; Michael Chirico; Katrin Leinweber; Johannes Gruber


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3268686", 
  "title": "quanteda/quanteda: CRAN v1.5.0", 
  "issued": {
    "date-parts": [
      [
        2019, 
        7, 
        4
      ]
    ]
  }, 
  "abstract": "New features\n<ul>\n<li>Add <code>flatten</code> and <code>levels</code> arguments to <code>as.list.dictionary2()</code> to enable more flexible conversion of dictionary objects. (#1661)</li>\n<li>In <code>corpus_sample()</code>, the <code>size</code> now works with the <code>by</code> argument, to control the size of units sampled from each group.</li>\n<li>Improvements to <code>textstat_dist()</code> and <code>textstat_simil()</code>, see below.</li>\n<li>Long tokens are not discarded automatically in the call to <code>tokens()</code>. (#1713)</li>\n</ul>\nBehaviour changes\n<ul>\n<li><code>textstat_dist()</code> and <code>textstat_simil()</code> now return sparse symmetric matrix objects using classes from the <strong>Matrix</strong> package.  This replaces the former structure based on the <code>dist</code> class.  Computation of these classes is now also based on the fast implementation in the <strong>proxyC</strong> package.  When computing similarities, the new <code>min_simil</code> argument allows a user to ignore certain values below a specified similarity threshold.  A new coercion method <code>as.data.frame.textstat_simildist()</code> now exists for converting these returns into a data.frame of pairwise comparisons.  Existing methods such as <code>as.matrix()</code>, <code>as.dist()</code>, and <code>as.list()</code> work as they did before.</li>\n<li>We have removed the \"faith\", \"chi-squared\", and \"kullback\" methods from <code>textstat_dist()</code> and <code>textstat_simil()</code> because these were either not symmetric or not invariant to document or feature ordering. Finally, the <code>selection</code> argument has been deprecated in favour of a new <code>y</code> argument.  </li>\n<li><code>textstat_readability()</code> now defaults to <code>measure = \"Flesch\"</code> if no measure is supplied.  This makes it consistent with <code>textstat_lexdiv()</code> that also takes a default measure (\"TTR\") if none is supplied. (#1715)</li>\n<li>The default values for <code>max_nchar</code> and <code>min_nchar</code> in <code>tokens_select()</code> are now NULL, meaning they are not applied if the user does not supply values.  Fixes #1713.</li>\n</ul>\nBug fixes and stability enhancements\n<ul>\n<li><code>kwic.corpus()</code> and <code>kwic.tokens()</code> behaviour now aligned, meaning that dictionaries are correctly faceted by key instead of by value. (#1684)</li>\n<li>Improved formatting of <code>tokens()</code> verbose output. (#1683)</li>\n<li>Subsetting and printing of subsetted kwic objects is more robust. (#1665)</li>\n<li>The \"Bormuth\" and \"DRP\" measures are now fixed for <code>textstat_readability()</code>. (#1701)</li>\n</ul>", 
  "author": [
    {
      "family": "Kenneth Benoit"
    }, 
    {
      "family": "Kohei Watanabe"
    }, 
    {
      "family": "Haiyan Wang"
    }, 
    {
      "family": "Paul Nulty"
    }, 
    {
      "family": "Adam Obeng"
    }, 
    {
      "family": "Stefan M\u00fcller"
    }, 
    {
      "family": "Jiong Wei Lua"
    }, 
    {
      "family": "Aki Matsuo"
    }, 
    {
      "family": "Christian Mueller"
    }, 
    {
      "family": "Will Lowe"
    }, 
    {
      "family": "Pablo Barber\u00e1"
    }, 
    {
      "family": "Tyler Rinker"
    }, 
    {
      "family": "mark padgham"
    }, 
    {
      "family": "Christopher Gandrud"
    }, 
    {
      "family": "Jos\u00e9 Tom\u00e1s Atria"
    }, 
    {
      "family": "Tom Paskhalis"
    }, 
    {
      "family": "nicmer"
    }, 
    {
      "family": "lindbrook"
    }, 
    {
      "family": "hofaichan"
    }, 
    {
      "family": "etienne-s"
    }, 
    {
      "family": "hotzeplotz"
    }, 
    {
      "family": "Thomas J. Leeper"
    }, 
    {
      "family": "Stas Malavin"
    }, 
    {
      "family": "Michael W. Kearney"
    }, 
    {
      "family": "Michael Chirico"
    }, 
    {
      "family": "Katrin Leinweber"
    }, 
    {
      "family": "Johannes Gruber"
    }
  ], 
  "version": "v1.5.0", 
  "type": "article", 
  "id": "3268686"
}
                  All versions   This version
Views             557            8
Downloads         125            1
Data volume       3.3 GB         37.0 MB
Unique views      522            8
Unique downloads  42             1
