Returns subsets of a corpus that meet certain conditions, including direct logical operations on docvars (document-level variables). corpus_subset functions identically to subset.data.frame, using non-standard evaluation to evaluate conditions based on the docvars in the corpus.
```r
corpus_subset(x, subset, select, ...)
```
| Argument | Description |
|---|---|
| x | corpus object to be subsetted |
| subset | logical expression indicating the documents to keep; missing values are taken as false |
| select | expression indicating the docvars to select from the corpus |
| ... | not used |
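Because corpus_subset is described as working like subset.data.frame, the sketch below illustrates those semantics with base R's subset() on a plain data frame that stands in for a corpus's docvars. It does not call quanteda, and the data frame and its values are hypothetical, not taken from data_corpus_inaugural. It shows three things: column names are used unquoted (non-standard evaluation), a row whose condition evaluates to NA is dropped, and select keeps only the named column.

```r
# Hypothetical docvars-like data frame; values are illustrative only
docvars_df <- data.frame(
  Year = c(1933, 1937, NA, 1981),
  President = c("Roosevelt", "Roosevelt", "Roosevelt", "Reagan"),
  row.names = c("doc1", "doc2", "doc3", "doc4")
)

# Year and President are looked up as columns of docvars_df (non-standard evaluation).
# doc3 has Year = NA, so its condition is NA and the row is dropped (NA taken as FALSE).
# doc4 fails the President condition. select = Year keeps only the Year column.
res <- subset(docvars_df, Year > 1930 & President == "Roosevelt", select = Year)
print(res)
```

With corpus_subset, the same kind of expression is evaluated against the docvars of the corpus rather than against data frame columns.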
Value: a corpus object containing the subset of documents (and docvars) selected according to the arguments.
See also: subset.data.frame
```r
summary(corpus_subset(data_corpus_inaugural, Year > 1980))
#> Corpus consisting of 10 documents:
#>
#>         Text Types Tokens Sentences Year President FirstName
#>  1981-Reagan   902   2790       128 1981    Reagan    Ronald
#>  1985-Reagan   925   2921       123 1985    Reagan    Ronald
#>    1989-Bush   795   2681       141 1989      Bush    George
#> 1993-Clinton   642   1833        81 1993   Clinton      Bill
#> 1997-Clinton   773   2449       111 1997   Clinton      Bill
#>    2001-Bush   621   1808        97 2001      Bush George W.
#>    2005-Bush   773   2319       100 2005      Bush George W.
#>   2009-Obama   938   2711       110 2009     Obama    Barack
#>   2013-Obama   814   2317        88 2013     Obama    Barack
#>   2017-Trump   582   1660        88 2017     Trump Donald J.
#>
#> Source: Gerhard Peters and John T. Woolley. The American Presidency Project.
#> Created: Tue Jun 13 14:51:47 2017
#> Notes: http://www.presidency.ucsb.edu/inaugurals.php

summary(corpus_subset(data_corpus_inaugural, Year > 1930 & President == "Roosevelt", select = Year))
#> Corpus consisting of 4 documents:
#>
#>           Text Types Tokens Sentences Year
#> 1933-Roosevelt   743   2062        85 1933
#> 1937-Roosevelt   725   1997        96 1937
#> 1941-Roosevelt   526   1544        68 1941
#> 1945-Roosevelt   275    647        26 1945
#>
#> Source: Gerhard Peters and John T. Woolley. The American Presidency Project.
#> Created: Tue Jun 13 14:51:47 2017
#> Notes: http://www.presidency.ucsb.edu/inaugurals.php
```