Published May 28, 2017 | Version v0.9.9.65
Software
Open
kbenoit/quanteda: CRAN v0.9.9.65
Creators
- 1. London School of Economics and Political Science
- 2. University of Cambridge
- 3. Columbia University, London School of Economics
- 4. LSE
- 5. London School of Economics
- 6. Department of Methodology, London School of Economics
- 7. Trinity College Dublin
- 8. University of Southern California
- 9. Harvard IQSS (@IQSS)
- 10. University at Buffalo
- 11. Soil Cryology Lab
Description
Changes since v0.9.9-50
New features
- Corpus construction using `corpus()` now works for a `tm::SimpleCorpus` object. (#680)
- Added `corpus_trim()` and `char_trim()` functions for selecting documents or subsets of documents based on sentence, paragraph, or document lengths.
- Conversion of a dfm to an stm object now passes docvars through in the `$meta` of the return object.
- New `dfm_group(x, groups = )` command, a convenience wrapper around `dfm.dfm(x, groups = )` (#725).
- Methods for extending quanteda functions to readtext objects updated to match CRAN release of readtext package.
- Corpus constructor methods for data.frame objects now conform to the "text interchange format" for corpus data.frames, automatically recognizing `doc_id` and `text` fields, which also provides interoperability with the readtext package. Corpus construction methods are now more explicitly tailored to input object classes.
Bug fixes and stability enhancements
- `dfm_lookup()` behaves more robustly on different platforms, especially for keys whose values match no features (#704).
- `textstat_simil()` and `textstat_dist()` no longer take the `n` argument, as this was not sorting features in the correct order.
- Fixed failure of `tokens(x, what = "character")` when `x` included Twitter characters `@` and `#` (#637).
- Fixed bug #707 where `ntype.dfm()` produced an incorrect result.
- Fixed bug #706 affecting `textstat_readability()` and `textstat_lexdiv()` for single-document returns when `drop = TRUE`.
- Improved the robustness of `corpus_reshape()`.
- `print`, `head`, and `tail` methods for `dfm` are more robust (#684).
- Fixed bug in `convert(x, to = "stm")` caused by zero-count documents and zero-count features in a dfm (#699, #700, #701). This also removes docvar rows from `$meta` when this is passed through the dfm, for zero-count documents.
- Corrected broken handling of nested Yoshikoder dictionaries in `dictionary()`. (#722)
- `dfm_compress` now preserves a dfm's docvars if collapsing only on the features margin, which means that `dfm_tolower()` and `dfm_toupper()` no longer remove the docvars.
- `fcm_compress()` now retains the fcm class, and generates an error when an asymmetric compression is attempted (#728).
- `textstat_collocations()` now returns the collocations as character, not as a factor (#736).
- Fixed a bug in `dfm_lookup(x, exclusive = FALSE)` wherein an empty dfm was returned when there was no match (#116).
- Argument passing through `dfm()` to `tokens()` is now robust, and preserves variables defined in the calling environment (#721).
- Fixed issues related to dictionaries failing when applying `str()`, `names()`, or other indexing operations, which started happening on Linux and Windows platforms following the CRAN move to R 3.4.0. (#744)
- Dictionary import using the LIWC format is more robust to improperly formatted input files (#685).
- Weights applied using `dfm_weight()` now print friendlier error messages when the weight vector contains features not found in the dfm. See this Stack Overflow question for the use case that sparked this improvement.
Files
Name | Size |
---|---|
kbenoit/quanteda-v0.9.9.65.zip (md5:da376abc2069d0d6fa93b5266d08c5bd) | 15.9 MB |
Additional details
Related works
- Is supplement to: https://github.com/kbenoit/quanteda/tree/v0.9.9.65 (URL)