3268686
doi
10.5281/zenodo.3268686
oai:zenodo.org:3268686
Kohei Watanabe
Waseda University
Haiyan Wang
Tracr
Paul Nulty
University College Dublin
Adam Obeng
Columbia University, London School of Economics
Stefan Müller
University of Zurich
Jiong Wei Lua
London School of Economics
Aki Matsuo
Institute for Analytics and Data Science, University of Essex
Christian Mueller
London School of Economics and Political Science
Will Lowe
Princeton University
Pablo Barberá
University of Southern California
Tyler Rinker
Campus Labs
mark padgham
@ATFutures
Christopher Gandrud
@zalando
José Tomás Atria
Tom Paskhalis
London School of Economics and Political Science
nicmer
lindbrook
hofaichan
etienne-s
hotzeplotz
Thomas J. Leeper
Stas Malavin
Soil Cryology Lab
Michael W. Kearney
@MUDSA
Michael Chirico
@myteksi
Katrin Leinweber
@TIBHannover
Johannes Gruber
University of Glasgow
quanteda/quanteda: CRAN v1.5.0
Kenneth Benoit
London School of Economics and Political Science
url:https://github.com/quanteda/quanteda/tree/v1.5.0
info:eu-repo/semantics/openAccess
Other (Open)
New features
<ul>
<li>Add <code>flatten</code> and <code>levels</code> arguments to <code>as.list.dictionary2()</code> to enable more flexible conversion of dictionary objects. (#1661)</li>
<li>In <code>corpus_sample()</code>, the <code>size</code> argument now works with the <code>by</code> argument to control the number of units sampled from each group.</li>
<li>Improvements to <code>textstat_dist()</code> and <code>textstat_simil()</code>, see below.</li>
<li>Long tokens are no longer discarded automatically in the call to <code>tokens()</code>. (#1713)</li>
</ul>
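As a rough illustration of the <code>corpus_sample()</code> change listed above, the sketch below draws one document from each group. The toy corpus and the docvar name <code>grp</code> are made up for this example, and it assumes quanteda 1.5.0 or later is installed.

```r
library(quanteda)

# Hypothetical toy corpus; the texts and the `grp` docvar are illustrative only.
corp <- corpus(
  c(d1 = "First text.", d2 = "Second text.",
    d3 = "Third text.", d4 = "Fourth text."),
  docvars = data.frame(grp = c("a", "a", "b", "b"),
                       stringsAsFactors = FALSE)
)

# With `by`, `size` is applied within each group of `grp`
# rather than to the corpus as a whole.
set.seed(1)
samp <- corpus_sample(corp, size = 1, by = "grp")

# Inspect which documents were drawn.
docnames(samp)
```

Setting a seed only makes the draw repeatable; which documents are selected still depends on the random number generator.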
Behaviour changes
<ul>
<li><code>textstat_dist()</code> and <code>textstat_simil()</code> now return sparse symmetric matrix objects using classes from the <strong>Matrix</strong> package. This replaces the former structure based on the <code>dist</code> class. Computation of these objects is now also based on the fast implementation in the <strong>proxyC</strong> package. When computing similarities, the new <code>min_simil</code> argument allows a user to ignore values below a specified similarity threshold. A new coercion method, <code>as.data.frame.textstat_simildist()</code>, converts these returns into a data.frame of pairwise comparisons. Existing methods such as <code>as.matrix()</code>, <code>as.dist()</code>, and <code>as.list()</code> work as they did before.</li>
<li>We have removed the "faith", "chi-squared", and "kullback" methods from <code>textstat_dist()</code> and <code>textstat_simil()</code> because these were either not symmetric or not invariant to document or feature ordering. The <code>selection</code> argument has also been deprecated in favour of a new <code>y</code> argument.</li>
<li><code>textstat_readability()</code> now defaults to <code>measure = "Flesch"</code> if no measure is supplied. This makes it consistent with <code>textstat_lexdiv()</code>, which also uses a default measure ("TTR") if none is supplied. (#1715)</li>
<li>The default values for <code>max_nchar</code> and <code>min_nchar</code> in <code>tokens_select()</code> are now <code>NULL</code>, meaning they are not applied unless the user supplies values. (#1713)</li>
</ul>
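The new return type of <code>textstat_simil()</code> described above might be used roughly as follows. This is a minimal sketch assuming quanteda 1.5.0 or later; the three toy texts and the 0.5 threshold are arbitrary choices for illustration, not values from the release notes.

```r
library(quanteda)

# Hypothetical toy documents, used only for illustration.
txts <- c(d1 = "the cat sat on the mat",
          d2 = "the cat sat on the rug",
          d3 = "stock markets fell sharply today")
dfmat <- dfm(txts)

# Cosine similarity between documents; `min_simil` asks for values
# below the threshold to be ignored.
simil <- textstat_simil(dfmat, method = "cosine",
                        margin = "documents", min_simil = 0.5)

# Coerce to a data.frame of pairwise comparisons.
pairs <- as.data.frame(simil)
pairs

# The existing coercion methods remain available.
as.matrix(simil)
```

A threshold is mostly useful on large collections, where keeping only the stronger similarities keeps the result sparse.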
Bug fixes and stability enhancements
<ul>
<li>The behaviour of <code>kwic.corpus()</code> and <code>kwic.tokens()</code> is now aligned, meaning that dictionaries are correctly faceted by key instead of by value. (#1684)</li>
<li>Improved formatting of <code>tokens()</code> verbose output. (#1683)</li>
<li>Subsetting of kwic objects, and printing of the subsetted objects, are now more robust. (#1665)</li>
<li>The "Bormuth" and "DRP" measures are now fixed for <code>textstat_readability()</code>. (#1701)</li>
</ul>
Zenodo
2019-07-04
info:eu-repo/semantics/other
596731
v1.5.0
1680907735.103657
37034944
md5:5b7294057e21e230fdb349076e556c1e
https://zenodo.org/records/3268686/files/quanteda/quanteda-v1.5.0.zip
public
https://github.com/quanteda/quanteda/tree/v1.5.0
Is supplement to
url
10.5281/zenodo.596731
isVersionOf
doi