qdap Version 2.1.0

Tyler Rinker; Colin Gillespie; Craig Citro



  • syllable_count returned the sentence (recycled) in the words column of the output. This behavior has been fixed. See GitHub issue #188 for details.

  • syn returned antonyms for some words. This was caused by the dictionary: qdapDictionaries::key.syn contained antonyms and elements that were error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)

  • The pres_debates2012 data set contained three errors in speech attribution. These have been corrected, as has the turn of talk (tot).

  • word_stats would throw an error if no poly-syllable words existed. This has been corrected (reported by Nicolas Turenne).


  • qdap_df and %&% added to mimic some of the functionality of dplyr's tbl_df and chaining pipe in a more specific, less flexible, qdap-oriented way.

  • Text added to view and change the text.var attribute of a data.frame of the class qdap_df.

  • cumulative generic method added to view cumulative scores over time.

  • formality picks up a cumulative method.

  • polarity picks up a cumulative method.

  • end_mark picks up a class (end_mark), plot method, and a cumulative method.

  • syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a class, plot method, and a cumulative method.

  • wfm becomes a generic method currently applied to a text.var that is: character, factor (coerced to character), or wfdf.

  • unbag added as a complement to bag_o_words and friends for undoing string splitting. A convenience wrapper for paste(collapse = " ").

  • as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and as.Corpus.wfm added to convert a matrix format to a tm::Corpus.

  • exclude becomes a generic method for various classes. Functionality is the same but with improved code readability.

  • check_spelling_interactive, check_spelling, which_misspelled, and correct allow the user to identify potentially misspelled words and optionally suggest replacements.

  • random_data & random_sent added to generate random sentence data sets and vectors.

  • comma_spacer added to ensure strings with commas contain a space after them.

  • check_text added to identify potential problems in text.

  • replace_ordinal added to convert ordinal representations of 1 through 100 to strictly ordinal text (e.g., "1st" becomes "first").

  • A vignette: Cleaning Text & Debugging was added to assist users with cleaning and debugging problems in qdap.

  • pronoun_type, subject_pronoun_type, and object_pronoun_type added to examine usage of subject/object pronouns by grouping variable.
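
The unbag and comma_spacer entries above describe simple string operations, so a base R approximation may help show the intended behavior. This is a minimal sketch only: unbag_sketch and comma_spacer_sketch are hypothetical helper names, the regular expression is an assumption, and neither is taken from qdap's source.

```r
# Base R sketch of the behavior described for unbag and comma_spacer.
# Helper names are hypothetical; this is not qdap's implementation.

# unbag is described as a wrapper for paste(collapse = " "):
# it rejoins a vector of words into a single string.
unbag_sketch <- function(x) paste(x, collapse = " ")

# comma_spacer is described as ensuring a space follows each comma.
# Assumed approach: add a space after any comma that is directly
# followed by a non-whitespace character.
comma_spacer_sketch <- function(x) gsub(",(?=\\S)", ", ", x, perl = TRUE)

words <- c("the", "quick", "brown", "fox")
joined <- unbag_sketch(words)
spaced <- comma_spacer_sketch("one,two, three")

print(joined)
print(spaced)
```

Note that this simplified pattern would also insert a space inside numbers such as "1,000"; whether qdap treats that case differently is not stated in these notes.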



  • wfm gains a speedup through generic classes and tm package integration (strip is no longer used in wfm).

  • as.tdm.character and as.dtm.character gain a speed boost with a tm package integration.

  • Added a message for missing end-marks suggesting the use of: sent.split = FALSE.

  • as.Corpus family of functions didn't necessarily respect document names and sometimes used a numeric sequence instead. The introduction of a reader via tm::readTabular has fixed this.

  • sentSplit now gives warnings for text that may contain anomalies such as: non-ASCII characters, factors, missing punctuation, empty cells, and no alphabetic characters found.

  • read.transcript now gives a warning when reading from a .docx file if the separator (sep) is still found in the text, as this may indicate the data did not split correctly.

  • dispersion_plot now takes a named list of vectors of terms as the argument to match.terms. The vectors are combined as a unified theme named with the names of the list supplied to match.terms.
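
To illustrate the shape of the named list described in the dispersion_plot entry, the following base R sketch groups several terms under one theme name and records the word positions at which each theme occurs. The sample text, the theme names, and the exact whole-word matching are all assumptions made for illustration; this does not reproduce how dispersion_plot matches terms internally.

```r
# Base R sketch: a named list of term vectors, each treated as one theme.
# Data and names are hypothetical; matching here is exact, whole-word,
# which may differ from qdap's own term matching.
text <- c("I love the night", "Love conquers death", "We all die someday")

match_terms <- list(
  love      = c("love", "loves"),
  mortality = c("die", "death")
)

# Split the text into lowercase words and drop empty strings.
words <- unlist(strsplit(tolower(text), "[^a-z']+"))
words <- words[nzchar(words)]

# For each theme, find the positions of words matching any of its terms.
theme_positions <- lapply(match_terms, function(terms) which(words %in% terms))

print(theme_positions)
```

The list names (love, mortality) become the theme labels, mirroring the entry's description that the combined vectors are named with the names of the list supplied to match.terms.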


  •'s default value for sent.split is now FALSE.

  • The state column in the qdap::DATA2 data-set is now character (previously factor).
