There is a newer version of this record available.

Software Open Access

quanteda/quanteda: CRAN v1.5.0

Kenneth Benoit; Kohei Watanabe; Haiyan Wang; Paul Nulty; Adam Obeng; Stefan Müller; Jiong Wei Lua; Aki Matsuo; Christian Mueller; Will Lowe; Pablo Barberá; Tyler Rinker; mark padgham; Christopher Gandrud; José Tomás Atria; Tom Paskhalis; nicmer; lindbrook; hofaichan; etienne-s; hotzeplotz; Thomas J. Leeper; Stas Malavin; Michael W. Kearney; Michael Chirico; Katrin Leinweber; Johannes Gruber


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.3268686</identifier>
  <creators>
    <creator>
      <creatorName>Kenneth Benoit</creatorName>
      <affiliation>London School of Economics and Political Science</affiliation>
    </creator>
    <creator>
      <creatorName>Kohei Watanabe</creatorName>
      <affiliation>Waseda University</affiliation>
    </creator>
    <creator>
      <creatorName>Haiyan Wang</creatorName>
      <affiliation>Tracr</affiliation>
    </creator>
    <creator>
      <creatorName>Paul Nulty</creatorName>
      <affiliation>University College Dublin</affiliation>
    </creator>
    <creator>
      <creatorName>Adam Obeng</creatorName>
      <affiliation>Columbia University, London School of Economics</affiliation>
    </creator>
    <creator>
      <creatorName>Stefan Müller</creatorName>
      <affiliation>University of Zurich</affiliation>
    </creator>
    <creator>
      <creatorName>Jiong Wei Lua</creatorName>
      <affiliation>London School of Economics</affiliation>
    </creator>
    <creator>
      <creatorName>Aki Matsuo</creatorName>
      <affiliation>Institute for Analytics and Data Science, University of Essex</affiliation>
    </creator>
    <creator>
      <creatorName>Christian Mueller</creatorName>
      <affiliation>London School of Economics and Political Science</affiliation>
    </creator>
    <creator>
      <creatorName>Will Lowe</creatorName>
      <affiliation>Princeton University</affiliation>
    </creator>
    <creator>
      <creatorName>Pablo Barberá</creatorName>
      <affiliation>University of Southern California</affiliation>
    </creator>
    <creator>
      <creatorName>Tyler Rinker</creatorName>
      <affiliation>Campus Labs</affiliation>
    </creator>
    <creator>
      <creatorName>mark padgham</creatorName>
      <affiliation>@ATFutures</affiliation>
    </creator>
    <creator>
      <creatorName>Christopher Gandrud</creatorName>
      <affiliation>@zalando</affiliation>
    </creator>
    <creator>
      <creatorName>José Tomás Atria</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>Tom Paskhalis</creatorName>
      <affiliation>London School of Economics and Political Science</affiliation>
    </creator>
    <creator>
      <creatorName>nicmer</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>lindbrook</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>hofaichan</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>etienne-s</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>hotzeplotz</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>Thomas J. Leeper</creatorName>
      <affiliation></affiliation>
    </creator>
    <creator>
      <creatorName>Stas Malavin</creatorName>
      <affiliation>Soil Cryology Lab</affiliation>
    </creator>
    <creator>
      <creatorName>Michael W. Kearney</creatorName>
      <affiliation>@MUDSA</affiliation>
    </creator>
    <creator>
      <creatorName>Michael Chirico</creatorName>
      <affiliation>@myteksi</affiliation>
    </creator>
    <creator>
      <creatorName>Katrin Leinweber</creatorName>
      <affiliation>@TIBHannover</affiliation>
    </creator>
    <creator>
      <creatorName>Johannes Gruber</creatorName>
      <affiliation>University of Glasgow</affiliation>
    </creator>
  </creators>
  <titles>
    <title>quanteda/quanteda: CRAN v1.5.0</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2019</publicationYear>
  <dates>
    <date dateType="Issued">2019-07-04</date>
  </dates>
  <resourceType resourceTypeGeneral="Software"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3268686</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsSupplementTo">https://github.com/quanteda/quanteda/tree/v1.5.0</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.596731</relatedIdentifier>
  </relatedIdentifiers>
  <version>v1.5.0</version>
  <rightsList>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">New features
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;flatten&lt;/code&gt; and &lt;code&gt;levels&lt;/code&gt; arguments to &lt;code&gt;as.list.dictionary2()&lt;/code&gt; to enable more flexible conversion of dictionary objects. (#1661)&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;corpus_sample()&lt;/code&gt;, the &lt;code&gt;size&lt;/code&gt; argument now works with the &lt;code&gt;by&lt;/code&gt; argument, to control the number of units sampled from each group.&lt;/li&gt;
&lt;li&gt;Improvements to &lt;code&gt;textstat_dist()&lt;/code&gt; and &lt;code&gt;textstat_simil()&lt;/code&gt;, see below.&lt;/li&gt;
&lt;li&gt;Long tokens are no longer discarded automatically in the call to &lt;code&gt;tokens()&lt;/code&gt;. (#1713)&lt;/li&gt;
&lt;/ul&gt;
Behaviour changes
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;textstat_dist()&lt;/code&gt; and &lt;code&gt;textstat_simil()&lt;/code&gt; now return sparse symmetric matrix objects using classes from the &lt;strong&gt;Matrix&lt;/strong&gt; package.  This replaces the former structure based on the &lt;code&gt;dist&lt;/code&gt; class.  Computation of these classes is now also based on the fast implementation in the &lt;strong&gt;proxyC&lt;/strong&gt; package.  When computing similarities, the new &lt;code&gt;min_simil&lt;/code&gt; argument allows a user to ignore certain values below a specified similarity threshold.  A new coercion method &lt;code&gt;as.data.frame.textstat_simildist()&lt;/code&gt; now exists for converting these returns into a data.frame of pairwise comparisons.  Existing methods such as &lt;code&gt;as.matrix()&lt;/code&gt;, &lt;code&gt;as.dist()&lt;/code&gt;, and &lt;code&gt;as.list()&lt;/code&gt; work as they did before.&lt;/li&gt;
&lt;li&gt;We have removed the "faith", "chi-squared", and "kullback" methods from &lt;code&gt;textstat_dist()&lt;/code&gt; and &lt;code&gt;textstat_simil()&lt;/code&gt; because these were either not symmetric or not invariant to document or feature ordering. Finally, the &lt;code&gt;selection&lt;/code&gt; argument has been deprecated in favour of a new &lt;code&gt;y&lt;/code&gt; argument.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;textstat_readability()&lt;/code&gt; now defaults to &lt;code&gt;measure = "Flesch"&lt;/code&gt; if no measure is supplied.  This makes it consistent with &lt;code&gt;textstat_lexdiv()&lt;/code&gt;, which also takes a default measure ("TTR") if none is supplied.  (#1715)&lt;/li&gt;
&lt;li&gt;The default values for &lt;code&gt;max_nchar&lt;/code&gt; and &lt;code&gt;min_nchar&lt;/code&gt; in &lt;code&gt;tokens_select()&lt;/code&gt; are now NULL, meaning they are not applied if the user does not supply values.  Fixes #1713.&lt;/li&gt;
&lt;/ul&gt;
Bug fixes and stability enhancements
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kwic.corpus()&lt;/code&gt; and &lt;code&gt;kwic.tokens()&lt;/code&gt; behaviour now aligned, meaning that dictionaries are correctly faceted by key instead of by value. (#1684)&lt;/li&gt;
&lt;li&gt;Improved formatting of &lt;code&gt;tokens()&lt;/code&gt; verbose output. (#1683)&lt;/li&gt;
&lt;li&gt;Subsetting and printing of subsetted kwic objects is more robust. (#1665)&lt;/li&gt;
&lt;li&gt;The "Bormuth" and "DRP" measures are now fixed for &lt;code&gt;textstat_readability()&lt;/code&gt;. (#1701)&lt;/li&gt;
&lt;/ul&gt;</description>
  </descriptions>
</resource>