Software Open Access
Kenneth Benoit; Kohei Watanabe; Haiyan Wang; Paul Nulty; Adam Obeng; Stefan Müller; Jiong Wei Lua; Aki Matsuo; Christian Mueller; Will Lowe; Pablo Barberá; Christopher Gandrud; mark padgham; Tyler Rinker; José Tomás Atria; Johannes Gruber; Katrin Leinweber; Michael Chirico; Michael W. Kearney; Stas Malavin; Thomas J. Leeper; hotzeplotz; Chung-hong Chan; etienne-s; hofaichan; lindbrook; nicmer; Tom Paskhalis
Changes

- Added `block_size` to `quanteda_options()` to control the number of documents in blocked tokenization.
- Fixed `print.dictionary2()` to control the printing of nested levels with `max_nkey` (#1967).
- Added `textstat_summary()` to provide detailed information about dfm, tokens and corpus objects. It will replace `summary()` in future versions.
- Fixed a performance issue that slowed tokenization (using the default `what = "word"`) of corpora with large numbers of documents that contain social media tags and URLs that needed to be preserved (such as a large corpus of Tweets).
- Updated the default "word" tokenizer to better preserve hashtags and usernames in non-ASCII text, and made these patterns user-configurable in `quanteda_options()`. The following are now preserved: "#政治" as well as Weibo-style hashtags such as "#英国首相#".
- `convert(x, to = "data.frame")` now outputs the first column as "doc_id" rather than "document", since "document" is a commonly occurring term in many texts (#1918).
- Added `char_select()`, `char_keep()`, and `char_remove()` for easy manipulation of character vectors.
- Added `dictionary_edit()` for easy, interactive editing of dictionaries, plus the functions `char_edit()` and `list_edit()` for editing character and list of character objects.
- Added a method to `textplot_wordcloud()` that plots objects from `textstat_keyness()`, to visualize keywords either by comparison or for the target category only.
- Improved the performance of `kwic()` (#1840).
- Added a new `logsmooth` scheme to `dfm_weight()`.
- Added a new `textstat_summary()` method, which returns summary information about the tokens, types, features, etc. in an object. It also caches the summary information so that it can be retrieved on subsequent calls rather than recomputed.

Bug fixes and stability enhancements

- Stopped returning `NA` for non-existent features when `n > nfeat(x)` in `textstat_frequency(x, n)` (#1929).
- Fixed a problem in `dfm_lookup()` and `tokens_lookup()` in which an error was caused when no dictionary key returned a single match (#1946).
- Fixed a bug that caused a `textstat_simil/dist` object converted to a data.frame to drop its `document2` labels (#1939).
- Fixed a bug causing `dfm_match()` to fail on a dfm that included "pads" (`""`) (#1960).
- Updated the `data_dfm_lbgexample` object using more modern dfm internals.
- Updated `textstat_readability()`, `textstat_lexdiv()`, and `nscrabble()` so that empty texts are not dropped in the result (#1976).

Name | Size |
---|---|
quanteda/quanteda-v2.1.0.zip (md5:d47f7a5422db03c600904e1c7e9a4828) | 38.4 MB |
Statistic | All versions | This version |
---|---|---|
Views | 1,543 | 22 |
Downloads | 210 | 1 |
Data volume | 6.1 GB | 38.4 MB |
Unique views | 1,445 | 21 |
Unique downloads | 120 | 1 |