There is a newer version of this record available.

Software Open Access

quanteda/quanteda: CRAN v1.5.0

Kenneth Benoit; Kohei Watanabe; Haiyan Wang; Paul Nulty; Adam Obeng; Stefan Müller; Jiong Wei Lua; Aki Matsuo; Christian Mueller; Will Lowe; Pablo Barberá; Tyler Rinker; mark padgham; Christopher Gandrud; José Tomás Atria; Tom Paskhalis; nicmer; lindbrook; hofaichan; etienne-s; hotzeplotz; Thomas J. Leeper; Stas Malavin; Michael W. Kearney; Michael Chirico; Katrin Leinweber; Johannes Gruber


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <controlfield tag="005">20190730123356.0</controlfield>
  <controlfield tag="001">3268686</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Waseda University</subfield>
    <subfield code="a">Kohei Watanabe</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Tracr</subfield>
    <subfield code="a">Haiyan Wang</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University College Dublin</subfield>
    <subfield code="a">Paul Nulty</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Columbia University, London School of Economics</subfield>
    <subfield code="a">Adam Obeng</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Zurich</subfield>
    <subfield code="a">Stefan Müller</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">London School of Economics</subfield>
    <subfield code="a">Jiong Wei Lua</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Institute for Analytics and Data Science, University of Essex</subfield>
    <subfield code="a">Aki Matsuo</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">London School of Economics and Political Science</subfield>
    <subfield code="a">Christian Mueller</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Princeton University</subfield>
    <subfield code="a">Will Lowe</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Southern California</subfield>
    <subfield code="a">Pablo Barberá</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Campus Labs</subfield>
    <subfield code="a">Tyler Rinker</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">@ATFutures</subfield>
    <subfield code="a">mark padgham</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">@zalando</subfield>
    <subfield code="a">Christopher Gandrud</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">José Tomás Atria</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">London School of Economics and Political Science</subfield>
    <subfield code="a">Tom Paskhalis</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">nicmer</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">lindbrook</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">hofaichan</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">etienne-s</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">hotzeplotz</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Thomas J. Leeper</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Soil Cryology Lab</subfield>
    <subfield code="a">Stas Malavin</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">@MUDSA</subfield>
    <subfield code="a">Michael W. Kearney</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">@myteksi</subfield>
    <subfield code="a">Michael Chirico</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">@TIBHannover</subfield>
    <subfield code="a">Katrin Leinweber</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Glasgow</subfield>
    <subfield code="a">Johannes Gruber</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">37034944</subfield>
    <subfield code="z">md5:5b7294057e21e230fdb349076e556c1e</subfield>
    <subfield code="u">https://zenodo.org/record/3268686/files/quanteda/quanteda-v1.5.0.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-07-04</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o">oai:zenodo.org:3268686</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">London School of Economics and Political Science</subfield>
    <subfield code="a">Kenneth Benoit</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">quanteda/quanteda: CRAN v1.5.0</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="a">Other (Open)</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">New features
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;flatten&lt;/code&gt; and &lt;code&gt;levels&lt;/code&gt; arguments to &lt;code&gt;as.list.dictionary2()&lt;/code&gt; to enable more flexible conversion of dictionary objects. (#1661)&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;corpus_sample()&lt;/code&gt;, the &lt;code&gt;size&lt;/code&gt; argument now works together with the &lt;code&gt;by&lt;/code&gt; argument to control the number of units sampled from each group.&lt;/li&gt;
&lt;li&gt;Improvements to &lt;code&gt;textstat_dist()&lt;/code&gt; and &lt;code&gt;textstat_simil()&lt;/code&gt; (see "Behaviour changes" below).&lt;/li&gt;
&lt;li&gt;Long tokens are no longer discarded automatically in the call to &lt;code&gt;tokens()&lt;/code&gt;. (#1713)&lt;/li&gt;
&lt;/ul&gt;
Behaviour changes
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;textstat_dist()&lt;/code&gt; and &lt;code&gt;textstat_simil()&lt;/code&gt; now return sparse symmetric matrix objects using classes from the &lt;strong&gt;Matrix&lt;/strong&gt; package, replacing the former structure based on the &lt;code&gt;dist&lt;/code&gt; class. Computation is now based on the fast implementation in the &lt;strong&gt;proxyC&lt;/strong&gt; package. When computing similarities, the new &lt;code&gt;min_simil&lt;/code&gt; argument lets users ignore values below a specified similarity threshold. A new coercion method, &lt;code&gt;as.data.frame.textstat_simildist()&lt;/code&gt;, converts these return values into a data.frame of pairwise comparisons. Existing methods such as &lt;code&gt;as.matrix()&lt;/code&gt;, &lt;code&gt;as.dist()&lt;/code&gt;, and &lt;code&gt;as.list()&lt;/code&gt; work as before.&lt;/li&gt;
&lt;li&gt;We have removed the "faith", "chi-squared", and "kullback" methods from &lt;code&gt;textstat_dist()&lt;/code&gt; and &lt;code&gt;textstat_simil()&lt;/code&gt; because they were either not symmetric or not invariant to document or feature ordering. The &lt;code&gt;selection&lt;/code&gt; argument has also been deprecated in favour of a new &lt;code&gt;y&lt;/code&gt; argument.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;textstat_readability()&lt;/code&gt; now defaults to &lt;code&gt;measure = "Flesch"&lt;/code&gt; if no measure is supplied. This makes it consistent with &lt;code&gt;textstat_lexdiv()&lt;/code&gt;, which also uses a default measure ("TTR") when none is supplied. (#1715)&lt;/li&gt;
&lt;li&gt;The default values for &lt;code&gt;max_nchar&lt;/code&gt; and &lt;code&gt;min_nchar&lt;/code&gt; in &lt;code&gt;tokens_select()&lt;/code&gt; are now NULL, so these limits are not applied unless the user supplies values. Fixes #1713.&lt;/li&gt;
&lt;/ul&gt;
Bug fixes and stability enhancements
&lt;ul&gt;
&lt;li&gt;The behaviour of &lt;code&gt;kwic.corpus()&lt;/code&gt; and &lt;code&gt;kwic.tokens()&lt;/code&gt; is now aligned, so dictionaries are correctly faceted by key instead of by value. (#1684)&lt;/li&gt;
&lt;li&gt;Improved formatting of &lt;code&gt;tokens()&lt;/code&gt; verbose output. (#1683)&lt;/li&gt;
&lt;li&gt;Subsetting and printing of subsetted kwic objects are now more robust. (#1665)&lt;/li&gt;
&lt;li&gt;The "Bormuth" and "DRP" measures in &lt;code&gt;textstat_readability()&lt;/code&gt; are now fixed. (#1701)&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">url</subfield>
    <subfield code="i">isSupplementTo</subfield>
    <subfield code="a">https://github.com/quanteda/quanteda/tree/v1.5.0</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.596731</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3268686</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>
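The export above is MARC21 slim XML. In this format, tag 100 holds the main author, 700 the additional authors, 245 the title, and 024 (here with subfield 2 set to "doi") the DOI. Within 100 and 700, subfield `a` is the name and `u` the affiliation. The snippet below is a minimal sketch, assuming Python 3's standard-library `xml.etree.ElementTree`, of pulling those fields back out of such a record. The function names `subfields` and `summarize_record` are hypothetical helpers, not part of any Zenodo or MARC tooling.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <record> in the MARC21 slim export.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfields(datafield, code):
    """Return the text of every <subfield> with the given code (hypothetical helper)."""
    return [sf.text for sf in datafield.findall(f"marc:subfield[@code='{code}']", NS)]


def summarize_record(xml_text):
    """Extract title, DOI, and author list from one MARC21 slim record (hypothetical helper).

    The main author (tag 100) is placed first even though it appears after
    the 700 fields in the export, matching the author order shown on the page.
    """
    record = ET.fromstring(xml_text)
    summary = {"title": None, "doi": None, "authors": []}
    main_author, added_authors = [], []
    for df in record.findall("marc:datafield", NS):
        tag = df.get("tag")
        if tag == "245":
            summary["title"] = subfields(df, "a")[0]
        elif tag == "024" and "doi" in subfields(df, "2"):
            summary["doi"] = subfields(df, "a")[0]
        elif tag == "100":
            main_author.extend(subfields(df, "a"))
        elif tag == "700":
            added_authors.extend(subfields(df, "a"))
    summary["authors"] = main_author + added_authors
    return summary
```

Applied to the full record above, this would yield the title "quanteda/quanteda: CRAN v1.5.0", the DOI 10.5281/zenodo.3268686, and Kenneth Benoit followed by the 700-field contributors in record order. Affiliations (subfield `u`) could be collected the same way with `subfields(df, "u")`.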

Cite as