Tidy a corpus object from the quanteda package. tidy returns a tbl_df with one row per document, with a text column containing the document's text and one column for each document-level metadata field. glance returns a one-row tbl_df with corpus-level metadata, such as source and created. For Corpus objects from the tm package, see tidy.Corpus.

# S3 method for corpus
tidy(x, ...)

# S3 method for corpus
glance(x, ...)

Arguments

x

A corpus object from the quanteda package

...

Extra arguments, not used

Details

For the most part, the tidy output is equivalent to the "documents" data frame in the corpus object, except that it is converted to a tbl_df and the texts column is renamed to text, for consistency with other uses in tidytext.

Similarly, the glance output is simply the "metadata" object, with NULL fields removed and turned into a one-row tbl_df.
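The two transformations described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `documents` and `metadata` below are hypothetical stand-ins for the corresponding parts of a corpus object, with made-up values, and the conversion to tbl_df is applied only if the tibble package happens to be installed.

```r
# Hypothetical stand-in for the "documents" data frame of a corpus object.
documents <- data.frame(
  texts = c("first document text", "second document text"),
  Year = c(1789, 1793),
  President = c("Washington", "Washington"),
  stringsAsFactors = FALSE
)

# tidy-like step: rename `texts` to `text`; metadata columns are left as-is.
tidied <- documents
names(tidied)[names(tidied) == "texts"] <- "text"

# Hypothetical stand-in for the corpus-level "metadata" object.
# `notes` is NULL on purpose, to show that it gets dropped.
metadata <- list(source = "example source", created = "example date", notes = NULL)

# glance-like step: remove NULL fields, then build a one-row data frame.
non_null <- Filter(Negate(is.null), metadata)
glanced <- as.data.frame(non_null, stringsAsFactors = FALSE)

# Convert to tbl_df only when tibble is available (an assumption of this sketch).
if (requireNamespace("tibble", quietly = TRUE)) {
  tidied <- tibble::as_tibble(tidied)
  glanced <- tibble::as_tibble(glanced)
}
```

Here `tidied` has the columns text, Year and President, and `glanced` has a single row with only the non-NULL fields source and created.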

Examples

if (requireNamespace("quanteda", quietly = TRUE)) {
  data("data_corpus_inaugural", package = "quanteda")

  data_corpus_inaugural

  tidy(data_corpus_inaugural)
}
#> # A tibble: 58 x 4
#>    text
#>  * <chr>
#>  1 "Fellow-Citizens of the Senate and of the House of Representatives:\n\nAmong the
#>  2 "Fellow citizens, I am again called upon by the voice of my country to execute t
#>  3 "When it was first perceived, in early times, that no middle course for America
#>  4 "Friends and Fellow Citizens:\n\nCalled upon to undertake the duties of the firs
#>  5 "Proceeding, fellow citizens, to that qualification which the Constitution requi
#>  6 "Unwilling to depart from examples of the most revered authority, I avail myself
#>  7 "About to add the solemnity of an oath to the obligations imposed by a second ca
#>  8 "I should be destitute of feeling if I was not deeply affected by the strong pro
#>  9 "Fellow citizens, I shall not attempt to describe the grateful emotions which th
#> 10 "In compliance with an usage coeval with the existence of our Federal Constituti
#> # ... with 48 more rows, and 3 more variables: Year <dbl>, President <chr>,
#> #   FirstName <chr>