Package-level

quanteda-package

An R package for the quantitative analysis of textual data

quanteda_options()

Get or set package options for quanteda

Data

Built-in data objects.

data_char_sampletext

A paragraph of text for testing various text-based functions

data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos

data-relocated

Formerly included data objects

data_corpus_inaugural

US presidential inaugural address texts

data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)

Corpus functions

Functions for constructing and manipulating corpus class objects.

corpus()

Construct a corpus object

corpus_reshape()

Recast the document units of a corpus

corpus_sample()

Randomly sample documents from a corpus

corpus_segment() char_segment()

Segment texts on a pattern match

corpus_subset()

Extract a subset of a corpus

corpus_trim() char_trim()

Remove sentences based on their token lengths or a pattern match

docvars() `docvars<-`() `$`(<corpus>) `$<-`(<corpus>) `$`(<tokens>) `$<-`(<tokens>) `$`(<dfm>) `$<-`(<dfm>)

Get or set document-level variables

head(<corpus>) tail(<corpus>)

Return the first or last part of a corpus

meta() `meta<-`() metacorpus() `metacorpus<-`()

Get or set object metadata

metadoc() `metadoc<-`()

Get or set document-level metadata

texts() `texts<-`() as.character(<corpus>)

Get or assign corpus texts
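
As a rough illustration of how the corpus functions above fit together, the following sketch builds a small corpus from a named character vector, attaches a document variable, and then subsets and reshapes it. The texts, document names, and the `party` variable are made up for the example and are not part of the package.

```r
library(quanteda)

# Two toy documents; the names and the `party` variable are illustrative only.
txt <- c(doc_a = "Immigration policy was debated at length.",
         doc_b = "The budget was passed. Taxes will rise next year.")
corp <- corpus(txt, docvars = data.frame(party = c("X", "Y")))

ndoc(corp)      # number of documents in the corpus
docvars(corp)   # data.frame of document-level variables

# Keep only the documents whose `party` docvar equals "Y".
sub <- corpus_subset(corp, party == "Y")

# Recast the document units from whole documents to sentences.
sent <- corpus_reshape(corp, to = "sentences")
ndoc(sent)
```

Reshaping to sentences changes the unit of analysis, so the document count after `corpus_reshape()` reflects sentences rather than the original texts.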

Tokens functions

Functions for constructing and manipulating tokens class objects.

tokens()

Construct a tokens object

tokens_chunk()

Segment a tokens object into chunks of a given size

tokens_compound()

Convert token sequences into compound tokens

tokens_lookup()

Apply a dictionary to a tokens object

tokens_ngrams() char_ngrams() tokens_skipgrams()

Create ngrams and skipgrams from tokens

tokens_select() tokens_remove() tokens_keep()

Select or remove tokens from a tokens object

tokens_replace()

Replace tokens in a tokens object

tokens_sample()

Randomly sample documents from a tokens object

tokens_split()

Split tokens by a separator pattern

tokens_subset()

Extract a subset of a tokens object

tokens_tolower() tokens_toupper()

Convert the case of tokens

tokens_tortl() char_tortl()

[Experimental] Change direction of words in tokens

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

types()

Get word types from a tokens object

as.list(<tokens>) as.character(<tokens>) is.tokens() unlist(<tokens>) `+`(<tokens>) c(<tokens>) as.tokens()

Coercion, checking, and combining functions for tokens objects
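
A minimal sketch of a typical tokens workflow using several of the functions listed above: tokenize, lowercase, remove selected tokens, and form bigrams. The example sentence and the removal pattern are arbitrary choices for illustration.

```r
library(quanteda)

# Tokenize one sentence, dropping punctuation.
toks <- tokens("The quick brown fox jumps over the lazy dog.",
               remove_punct = TRUE)

# Normalize case, then remove two function words by exact (glob) match.
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = c("the", "over"))

# Inspect the remaining tokens of the first (only) document.
remaining <- as.list(toks)[[1]]
remaining

# Form bigrams from adjacent tokens; the default concatenator is "_".
bigrams <- as.list(tokens_ngrams(toks, n = 2))[[1]]
bigrams
```

Because removal happens before the n-grams are formed, the bigrams join tokens that were not adjacent in the original sentence (for example `jumps_lazy`).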

Character functions

Functions for constructing and manipulating character objects.

char_tolower() char_toupper()

Convert the case of character objects

corpus_segment() char_segment()

Segment texts on a pattern match

tokens_ngrams() char_ngrams() tokens_skipgrams()

Create ngrams and skipgrams from tokens

char_select() char_remove() char_keep()

Select or remove elements from a character vector

corpus_trim() char_trim()

Remove sentences based on their token lengths or a pattern match

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

Text matrix functions

Functions for constructing and manipulating document-feature matrix (dfm) or feature co-occurrence matrix (fcm) objects.

dfm()

Create a document-feature matrix

dfm_compress() fcm_compress()

Recombine a dfm or fcm by combining identical dimension elements

dfm_group()

Combine documents in a dfm by a grouping variable

dfm_lookup()

Apply a dictionary to a dfm

dfm_match()

Match the feature set of a dfm to given feature names

dfm_select() dfm_remove() dfm_keep() fcm_select() fcm_remove() fcm_keep()

Select features from a dfm or fcm

dfm_replace()

Replace features in a dfm

dfm_sample()

Randomly sample documents or features from a dfm

dfm_subset()

Extract a subset of a dfm

dfm_sort()

Sort a dfm by frequency of one or more margins

dfm_tfidf()

Weight a dfm by tf-idf

dfm_tolower() dfm_toupper() fcm_tolower() fcm_toupper()

Convert the case of the features of a dfm or fcm and combine the resulting identical features

dfm_trim()

Trim a dfm using frequency threshold-based feature selection

dfm_weight() dfm_smooth()

Weight the feature frequencies in a dfm

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

docfreq()

Compute the (weighted) document frequency of a feature

featfreq()

Compute the frequencies of features

head(<dfm>) tail(<dfm>)

Return the first or last part of a dfm

as.dfm() is.dfm()

Coercion and checking functions for dfm objects

as.data.frame(<dfm>)

Convert a dfm to a data.frame

as.matrix(<dfm>)

Coerce a dfm to a matrix or data.frame

fcm()

Create a feature co-occurrence matrix

fcm_sort()

Sort an fcm in alphabetical order of the features

as.fcm()

Coercion and checking functions for fcm objects
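
The following sketch shows one way the matrix functions above might be combined: build a dfm from tokens, inspect it, trim infrequent features, and build a document-context fcm. The three toy documents are invented for the example.

```r
library(quanteda)

# Three toy documents with a tiny vocabulary.
txt <- c(d1 = "apple banana apple",
         d2 = "banana cherry",
         d3 = "apple cherry cherry")
toks <- tokens(txt)

# Document-feature matrix: rows are documents, columns are features.
dfmat <- dfm(toks)
ndoc(dfmat)
nfeat(dfmat)
topfeatures(dfmat, n = 2)

# Keep only features that occur at least 3 times across all documents.
dfmat_trimmed <- dfm_trim(dfmat, min_termfreq = 3)
featnames(dfmat_trimmed)

# Dense matrix view, convenient for small examples only.
m <- as.matrix(dfmat)

# Feature co-occurrence matrix counting co-occurrence within documents.
fcmat <- fcm(toks, context = "document")
```

Converting with `as.matrix()` gives up the sparse representation, so it is best reserved for small objects like this one.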

Text statistics

Functions for computing statistics from texts and dfm objects.

textstat_collocations() is.collocations()

Identify and score multi-word expressions

textstat_simil() textstat_dist() as.list(<textstat_proxy>) as.data.frame(<textstat_proxy>)

Similarity and distance computation between documents or features

textstat_entropy()

Compute entropies of documents or features

textstat_lexdiv()

Calculate lexical diversity

textstat_frequency()

Tabulate feature frequencies

textstat_keyness()

Calculate keyness statistics

textstat_readability()

Calculate readability

textstat_summary()

Summarize documents

sparsity()

Compute the sparsity of a document-feature matrix

topfeatures()

Identify the most frequent features in a dfm

Text models

Functions for fitting models to textual data.

textmodels

Models for scaling and classification of textual data

Dictionary functions

Constructor and utility functions for working with dictionaries.

dictionary()

Create a dictionary

dictionary_edit() list_edit() char_edit()

Conveniently edit dictionaries

as.dictionary() is.dictionary()

Coercion and checking functions for dictionary objects

as.yaml()

Convert quanteda dictionary objects to the YAML format
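
As a hedged illustration of how a dictionary is created and applied, the sketch below defines two keys and looks them up in a tokens object and in a dfm. The keys `fruit` and `veg` and their values are hypothetical; the wildcard relies on the default glob-style matching.

```r
library(quanteda)

# A small dictionary; "carrot*" is a glob pattern matching "carrot", "carrots", ...
dict <- dictionary(list(fruit = c("apple", "banana"),
                        veg   = c("carrot*")))

toks <- tokens("An apple, two carrots and a banana.")

# Replace matched tokens with their dictionary keys; unmatched tokens are dropped.
looked_up <- as.list(tokens_lookup(toks, dictionary = dict))[[1]]
looked_up

# The same lookup applied to a dfm yields counts per dictionary key.
key_counts <- as.matrix(dfm_lookup(dfm(toks), dictionary = dict))
key_counts
```

Applying the dictionary at the tokens stage keeps word order, while applying it to a dfm only returns counts per key.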

Phrase discovery functions

Functions for exploring and detecting keywords and phrases.

textstat_collocations() is.collocations()

Identify and score multi-word expressions

kwic() is.kwic()

Locate keywords-in-context
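
A short sketch of keyword-in-context search, assuming a tokens object as input. The sentence is invented; `phrase()` is used for the multi-word pattern so that it is matched as a sequence of tokens rather than as a single token.

```r
library(quanteda)

toks <- tokens("The cat sat on the mat. The dog sat on the rug.")

# Every occurrence of "sat" with two tokens of context on each side.
kw <- kwic(toks, pattern = "sat", window = 2)
kw

# A multi-word pattern must be wrapped in phrase() to match a token sequence.
kw_phrase <- kwic(toks, pattern = phrase("sat on"), window = 2)
nrow(kw_phrase)
```

Each row of the result corresponds to one match, so `nrow()` gives the number of hits.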

Text plot functions

Plot functions for representing text and the analysis of texts.

textplot_keyness()

Plot word keyness

textplot_network() as.network(<fcm>) as.igraph(<fcm>)

Plot a network of feature co-occurrences

textplot_wordcloud()

Plot features as a wordcloud

textplot_xray()

Plot the dispersion of key word(s)

Utility functions

R-like functions to return counts and object information.

ndoc() nfeat()

Count the number of documents or features

nscrabble()

Count the Scrabble letter values of text

nsentence()

Count the number of sentences

nsyllable()

Count syllables in a text

ntoken() ntype()

Count the number of tokens or types

print(<corpus>) print(<dfm>) show(<dfm>) print(<dictionary2>) show(<dictionary2>) print(<fcm>) show(<fcm>) print(<tokens>)

Print methods for quanteda core objects

docnames() `docnames<-`()

Get or set document names

docnames(<spacyr_parsed>) ndoc(<spacyr_parsed>) ntoken(<spacyr_parsed>) ntype(<spacyr_parsed>) nsentence(<spacyr_parsed>)

Extensions for and from spacy_parse objects

featnames()

Get the feature labels from a dfm

Miscellaneous functions

phrase() is.phrase()

Declare a compound character to be a sequence of separate pattern matches

convert()

Convert quanteda objects to non-quanteda formats

bootstrap_dfm()

Bootstrap a dfm

docnames(<spacyr_parsed>) ndoc(<spacyr_parsed>) ntoken(<spacyr_parsed>) ntype(<spacyr_parsed>) nsentence(<spacyr_parsed>)

Extensions for and from spacy_parse objects