Extensions of base R functions for corpus objects.
# S3 method for corpus
print(x, ...)

is.corpus(x)

is.corpuszip(x)

# S3 method for summary.corpus
print(x, ...)

# S3 method for corpus
+(c1, c2)

# S3 method for corpus
c(..., recursive = FALSE)

# S3 method for corpus
[(x, i, j = NULL, ..., drop = TRUE)

# S3 method for corpus
[[(x, i, ...)

# S3 method for corpus
[[(x, i) <- value

# S3 method for corpus
str(object, ...)
Argument | Description |
---|---|
x | a corpus object |
... | not used |
c1 | corpus one to be added |
c2 | corpus two to be added |
recursive | logical used by `c()` method, always set to `FALSE` |
i | index for documents or rows of document variables |
j | index for column of document variables |
drop | if |
value | a vector that will form a new docvar |
object | the corpus about which you want structural information |
`is.corpus` returns `TRUE` if the object is a corpus.

`is.corpuszip` returns `TRUE` if the object is a compressed corpus.
The `+` operator for a corpus object will combine two corpus objects, resolving any non-matching `docvars` or `metadoc` fields by making them into `NA` values for the corpus lacking that field. Corpus-level metadata is concatenated, except for `source` and `notes`, which are stamped with information pertaining to the creation of the new joined corpus.
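The `NA`-filling behaviour described above can be illustrated with plain data frames. The following is a minimal base R sketch, not quanteda's implementation: `combine_docvars()` is a hypothetical helper, and the two data frames are made-up stand-ins for the document variables of two corpora.

```r
# Hypothetical helper (not part of quanteda): row-bind two data frames of
# document variables, filling columns missing from either side with NA.
combine_docvars <- function(d1, d2) {
  all_cols <- union(names(d1), names(d2))
  for (col in setdiff(all_cols, names(d1))) d1[[col]] <- NA
  for (col in setdiff(all_cols, names(d2))) d2[[col]] <- NA
  rbind(d1[all_cols], d2[all_cols])
}

# Made-up document variables: only dv1 has "party", only dv2 has "speaker".
dv1 <- data.frame(year = c(2010, 2010), party = c("FF", "FG"),
                  row.names = c("doc1", "doc2"), stringsAsFactors = FALSE)
dv2 <- data.frame(year = 2011, speaker = "Burton",
                  row.names = "doc3", stringsAsFactors = FALSE)

combined <- combine_docvars(dv1, dv2)
print(combined)
```

In the result, `doc3` has `NA` for `party` and `doc1` and `doc2` have `NA` for `speaker`, which mirrors how a field is filled for the corpus that lacks it.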
The `c()` method is also defined for corpus class objects, and provides
an easy way to combine multiple corpus objects.
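One way a `c()` method of this kind could be built on top of a binary `+` is to fold the operator over its arguments. The sketch below assumes a toy S3 class, `toycorpus`, invented here for illustration; it is not quanteda's corpus class and makes no claim about how quanteda implements `c()`.

```r
# Toy S3 class standing in for a corpus: a list holding a named character
# vector of texts. The class name and constructor are hypothetical.
toycorpus <- function(texts) structure(list(texts = texts), class = "toycorpus")

# Binary combination: concatenate the texts of two toy corpora.
`+.toycorpus` <- function(c1, c2) toycorpus(c(c1$texts, c2$texts))

# c() method: fold `+` over any number of toy corpora, left to right.
c.toycorpus <- function(..., recursive = FALSE) Reduce(`+`, list(...))

a <- toycorpus(c(d1 = "first text"))
b <- toycorpus(c(d2 = "second text"))
d <- toycorpus(c(d3 = "third text"))

combined <- c(a, b, d)
names(combined$texts)
```

Defining `c()` in terms of `+` keeps the merging rules in one place, so any number of objects is combined the same way as a pair.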
There are some issues that need to be addressed in future revisions of quanteda concerning the use of factors to store document variables and metadata. Currently most or all of these are not recorded as factors, because we use `stringsAsFactors = FALSE` in the `data.frame` calls that create and store the document-level information: the texts should always be stored as character vectors and never as factors.
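The effect of `stringsAsFactors` on a text column can be seen directly in base R. This is a small sketch with made-up texts, independent of quanteda's internals:

```r
# Made-up texts standing in for corpus documents.
texts <- c(doc1 = "A first text.", doc2 = "A second text.")

# With stringsAsFactors = FALSE the column stays a character vector.
as_chr <- data.frame(texts = texts, stringsAsFactors = FALSE)
class(as_chr$texts)

# With stringsAsFactors = TRUE the same column is converted to a factor,
# which stores integer codes plus a set of levels rather than the strings.
as_fct <- data.frame(texts = texts, stringsAsFactors = TRUE)
class(as_fct$texts)
```

Passing the argument explicitly, as here, avoids depending on its default value, which changed from `TRUE` to `FALSE` in R 4.0.0.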
# concatenate corpus objects
corpus1 <- corpus(data_char_ukimmig2010[1:2])
corpus2 <- corpus(data_char_ukimmig2010[3:4])
corpus3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corpus1, corpus2, corpus3))
#> Corpus consisting of 6 documents:
#>
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    679        21
#>        Labour   298    683        29
#>        LibDem   251    483        14
#>
#> Source:  Concatenation by c.corpus()
#> Created: Fri Oct 6 09:35:46 2017
#> Notes:

# ways to index corpus elements
data_corpus_inaugural["1793-Washington"]    # 2nd Washington inaugural speech
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "
data_corpus_inaugural[2]                    # same
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "

# access the docvars from data_corpus_irishbudget2010
data_corpus_irishbudget2010[, "year"]
#>  [1] "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010"
#> [11] "2010" "2010" "2010" "2010"
# same
data_corpus_irishbudget2010[["year"]]
#>                                       year
#> 2010_BUDGET_01_Brian_Lenihan_FF       2010
#> 2010_BUDGET_02_Richard_Bruton_FG      2010
#> 2010_BUDGET_03_Joan_Burton_LAB        2010
#> 2010_BUDGET_04_Arthur_Morgan_SF       2010
#> 2010_BUDGET_05_Brian_Cowen_FF         2010
#> 2010_BUDGET_06_Enda_Kenny_FG          2010
#> 2010_BUDGET_07_Kieran_ODonnell_FG     2010
#> 2010_BUDGET_08_Eamon_Gilmore_LAB      2010
#> 2010_BUDGET_09_Michael_Higgins_LAB    2010
#> 2010_BUDGET_10_Ruairi_Quinn_LAB       2010
#> 2010_BUDGET_11_John_Gormley_Green     2010
#> 2010_BUDGET_12_Eamon_Ryan_Green       2010
#> 2010_BUDGET_13_Ciaran_Cuffe_Green     2010
#> 2010_BUDGET_14_Caoimhghin_OCaolain_SF 2010

# create a new document variable
data_corpus_irishbudget2010[["govtopp"]] <-
    ifelse(data_corpus_irishbudget2010[["party"]] %in% c("FF", "Greens"),
           "Government", "Opposition")
docvars(data_corpus_irishbudget2010)
#>                                       year debate number      foren     name
#> 2010_BUDGET_01_Brian_Lenihan_FF       2010 BUDGET     01      Brian  Lenihan
#> 2010_BUDGET_02_Richard_Bruton_FG      2010 BUDGET     02    Richard   Bruton
#> 2010_BUDGET_03_Joan_Burton_LAB        2010 BUDGET     03       Joan   Burton
#> 2010_BUDGET_04_Arthur_Morgan_SF       2010 BUDGET     04     Arthur   Morgan
#> 2010_BUDGET_05_Brian_Cowen_FF         2010 BUDGET     05      Brian    Cowen
#> 2010_BUDGET_06_Enda_Kenny_FG          2010 BUDGET     06       Enda    Kenny
#> 2010_BUDGET_07_Kieran_ODonnell_FG     2010 BUDGET     07     Kieran ODonnell
#> 2010_BUDGET_08_Eamon_Gilmore_LAB      2010 BUDGET     08      Eamon  Gilmore
#> 2010_BUDGET_09_Michael_Higgins_LAB    2010 BUDGET     09    Michael  Higgins
#> 2010_BUDGET_10_Ruairi_Quinn_LAB       2010 BUDGET     10     Ruairi    Quinn
#> 2010_BUDGET_11_John_Gormley_Green     2010 BUDGET     11       John  Gormley
#> 2010_BUDGET_12_Eamon_Ryan_Green       2010 BUDGET     12      Eamon     Ryan
#> 2010_BUDGET_13_Ciaran_Cuffe_Green     2010 BUDGET     13     Ciaran    Cuffe
#> 2010_BUDGET_14_Caoimhghin_OCaolain_SF 2010 BUDGET     14 Caoimhghin OCaolain
#>                                       party    govtopp
#> 2010_BUDGET_01_Brian_Lenihan_FF          FF Opposition
#> 2010_BUDGET_02_Richard_Bruton_FG         FG Opposition
#> 2010_BUDGET_03_Joan_Burton_LAB          LAB Opposition
#> 2010_BUDGET_04_Arthur_Morgan_SF          SF Opposition
#> 2010_BUDGET_05_Brian_Cowen_FF            FF Opposition
#> 2010_BUDGET_06_Enda_Kenny_FG             FG Opposition
#> 2010_BUDGET_07_Kieran_ODonnell_FG        FG Opposition
#> 2010_BUDGET_08_Eamon_Gilmore_LAB        LAB Opposition
#> 2010_BUDGET_09_Michael_Higgins_LAB      LAB Opposition
#> 2010_BUDGET_10_Ruairi_Quinn_LAB         LAB Opposition
#> 2010_BUDGET_11_John_Gormley_Green     Green Opposition
#> 2010_BUDGET_12_Eamon_Ryan_Green       Green Opposition
#> 2010_BUDGET_13_Ciaran_Cuffe_Green     Green Opposition
#> 2010_BUDGET_14_Caoimhghin_OCaolain_SF    SF Opposition
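
In the recorded output every document is labelled "Opposition", including the FF and Green speakers. A plausible explanation, assuming `[["party"]]` returns a one-column data frame (as the `[["year"]]` output suggests), is that `%in%` then compares the data frame as a list of columns rather than element by element; the lookup value "Greens" also differs from the stored value "Green". The base R sketch below reproduces the effect with a made-up data frame and does not use quanteda:

```r
# Made-up stand-in for the one-column data frame that `[["party"]]` appears
# to return in the example output.
party_df <- data.frame(party = c("FF", "FG", "LAB", "Green"),
                       stringsAsFactors = FALSE)

# %in% treats the data frame as a list of columns, so it yields a single
# value for the single column instead of one value per document.
whole <- party_df %in% c("FF", "Greens")
length(whole)

# ifelse() therefore returns one label; assigning it as a new column would
# recycle that label to every row.
ifelse(whole, "Government", "Opposition")

# Comparing the underlying character vector, with the party label spelled
# as it is stored, gives one result per document.
per_doc <- ifelse(party_df$party %in% c("FF", "Green"),
                  "Government", "Opposition")
per_doc
```

For a corpus, the equivalent fix would be to extract the party values as a plain vector before applying `%in%`; which accessor returns a vector depends on the quanteda version in use.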