Tyler Rinker; Colin Gillespie; Craig Citro
CHANGES IN qdap VERSION 2.1.1
syllable_count returned the sentence (recycled) in the words column of the output. This behavior has been fixed. See GitHub issue #188 for details.
syn returned antonyms for some words. This was caused by the dictionary: qdapDictionaries::key.syn contained antonyms and elements that were error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
The pres_debates2012 data set contained three errors in speech attribution. These have been corrected, as has the turn of talk (tot) variable.
word_stats would throw an error if no poly-syllable words existed. This has been corrected (reported by Nicolas Turenne).
qdap_df and %&% added to mimic some of the functionality of dplyr's tbl_df and chaining pipe in a more specific, less flexible, qdap-oriented way.
Text added to view and change the text.var attribute of a data.frame of class qdap_df.
cumulative generic method added to view cumulative scores over time.
formality picks up a cumulative method.
polarity picks up a cumulative method.
end_mark picks up a class (end_mark), plot method, and a cumulative method.
syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a class, plot method, and a cumulative method.
wfm becomes a generic method, currently applied to a text.var that is a character vector, a factor (coerced to character), or a wfdf.
unbag added as a complement to bag_o_words and friends for undoing string splitting. A convenience wrapper for paste(collapse = " ").
as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and as.Corpus.wfm added to convert a matrix format to a tm::Corpus.
exclude becomes a generic method for various classes. Functionality is the same but with improved code readability.
check_spelling_interactive, check_spelling, which_misspelled, and correct added to allow the user to identify potentially misspelled words and optionally suggest replacements.
random_data & random_sent added to generate random sentence data sets and vectors.
comma_spacer added to ensure strings with commas contain a space after them.
check_text added to identify potential problems in text.
replace_ordinal added to convert numeric ordinal representations of 1 through 100 to their word form (e.g., "1st" becomes "first").
A vignette, "Cleaning Text & Debugging", was added to help users clean text and debug problems in qdap.
pronoun_type, subject_pronoun_type, and object_pronoun_type added to examine usage of subject/object pronouns by grouping variable.
wfm gains a speedup through generic classes and tm package integration (strip is no longer used in wfm).
as.tdm.character and as.dtm.character gain a speed boost with a tm package integration.
as.data.frame.Corpus now gives a message when end marks are missing, suggesting the use of sent.split = FALSE.
The as.Corpus family of functions did not always respect document names and sometimes used a numeric sequence instead. The introduction of a reader via tm::readTabular has fixed this.
sentSplit now gives warnings for text that may contain anomalies such as non-ASCII characters, factors, missing punctuation, empty cells, and no alphabetic characters.
read.transcript now gives a warning when reading from a .docx file and the separator (sep) is still found in the text, as this may indicate the data did not split correctly.
dispersion_plot now takes a named list of vectors of terms as the argument to match.terms. The terms in each vector are combined into a single theme, named by the corresponding name in the list supplied to match.terms.
as.data.frame.Corpus's default value for sent.split is now FALSE.
The state column in the qdap::DATA2 data set is now character (previously factor).