load_truth.Rd
By default, for the US hub, the resulting data.frame contains data for weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (HealthData) at the county, state and national levels. For the ECDC hub, the default resulting data.frame contains data for weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (ECDC) for all European countries.
    load_truth(
      truth_source = NULL,
      target_variable = NULL,
      as_of = NULL,
      truth_end_date = NULL,
      temporal_resolution = NULL,
      locations = NULL,
      data_location = NULL,
      local_repo_path = NULL,
      hub = c("US", "ECDC")
    )
truth_source | character vector specifying where the truths will
be loaded from: currently support |
---|---|
target_variable | string specifying target type. It should be one or more of
|
as_of | character vector of "as of" dates to use for querying truths in
format 'yyyy-mm-dd'. For each spatial unit and temporal reporting unit, the last
available data with an issue date on or before the given |
truth_end_date | date to include the last available truth point in 'yyyy-mm-dd' format.
If |
temporal_resolution | character specifying temporal resolution
to include: currently support |
locations | vector of valid location codes.
If |
data_location | character specifying the location of truth data.
Currently only supports |
local_repo_path | path to local clone of the |
hub | character, which hub to use. Default is |
data.frame with columns model, target_variable, target_end_date, location, value, location_name, population, and extra columns in the following cases:
If hub = "US", it returns extra columns geo_type, geo_value, abbreviation and full_location_name.
If truth_source = "ECDC", this function returns the extra column week_start. However, when target_variable is only "inc hosp", there are no extra columns appended to the resulting data frame.
"inc hosp" is only available from "HealthData" and "ECDC", and this function does not load data for other target variables from "HealthData".
When loading data for multiple target variables for the US hub, temporal_resolution will be applied to all target variables but "inc hosp". In that case, the function will return daily incident hospitalization counts along with other data.
For the US hub, weekly temporal resolution will be applied to "inc hosp" if the user specifies "inc hosp" as the only target_variable. On the other hand, temporal_resolution will be applied to "inc hosp" in all cases for the ECDC hub.
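The resolution rules described above can be summarized as a small decision function. This is only an illustrative sketch: `effective_hosp_resolution` is a hypothetical helper, not part of covidHubUtils. It assumes that "inc hosp" is among the requested target variables and that temporal_resolution takes the values "daily" or "weekly".

```r
# Hypothetical helper (not part of covidHubUtils): returns the temporal
# resolution that the notes above imply for "inc hosp" data.
# Assumes "inc hosp" is one of the requested target variables.
effective_hosp_resolution <- function(hub, target_variable, temporal_resolution) {
  if (hub == "ECDC") {
    # ECDC hub: the requested resolution is applied in all cases.
    return(temporal_resolution)
  }
  if (identical(target_variable, "inc hosp")) {
    # US hub with "inc hosp" as the only target: the requested resolution applies.
    return(temporal_resolution)
  }
  # US hub with several target variables: hospitalizations stay daily.
  "daily"
}

us_only_hosp <- effective_hosp_resolution("US", "inc hosp", "weekly")
us_mixed     <- effective_hosp_resolution("US", c("inc case", "inc hosp"), "weekly")
ecdc_mixed   <- effective_hosp_resolution("ECDC", c("inc case", "inc hosp"), "weekly")
```

Under these assumptions, only the US call with several target variables falls back to daily hospitalization counts.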
When loading weekly data, if there are not enough observations for a week, the corresponding weekly count will be NA in the resulting data frame.
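The NA rule for incomplete weeks can be illustrated with base R alone. The sketch below is not the package's implementation: `weekly_sum` is a hypothetical helper, and it assumes that weeks run Sunday through Saturday (in line with the Saturday target_end_date values in the example output) and that "not enough observations" means fewer than seven daily values.

```r
# Hypothetical helper (not part of covidHubUtils): aggregate daily counts to
# weekly totals, returning NA for any week without a full set of observations.
weekly_sum <- function(dates, values, days_required = 7) {
  dates <- as.Date(dates)
  # as.POSIXlt()$wday: 0 = Sunday, ..., 6 = Saturday
  wday <- as.POSIXlt(dates)$wday
  # Saturday that ends the Sunday-Saturday week containing each date
  week_end <- dates + (6 - wday)
  totals <- tapply(values, format(week_end), function(v) {
    if (length(v) < days_required || anyNA(v)) NA_real_ else sum(v)
  })
  data.frame(
    target_end_date = as.Date(names(totals)),
    value = as.numeric(totals)
  )
}

# Eleven daily observations starting on Sunday 2020-07-05:
# one complete week followed by a partial week of four days.
daily_dates <- seq(as.Date("2020-07-05"), by = "day", length.out = 11)
weekly <- weekly_sum(daily_dates, 1:11)
```

Here the first week is complete and gets a total, while the second week has only four observations and is reported as NA.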
as_of is only supported when data_location = "covidData". Otherwise, this function will issue a warning.
    library(covidHubUtils)

    # load for US
    load_truth(
      truth_source = c("JHU", "HealthData"),
      target_variable = c("inc case", "inc death", "inc hosp")
    )
    #> # A tibble: 539,622 × 11
    #>    model target_variable target_end_date location value location_name population
    #>    <chr> <chr>           <date>          <chr>    <dbl> <chr>              <dbl>
    #>  1 Obse… inc hosp        2020-07-21      19          33 Iowa             3155070
    #>  2 Obse… inc hosp        2020-07-21      72           0 Puerto Rico      3754939
    #>  3 Obse… inc hosp        2020-07-19      10           0 Delaware          973764
    #>  4 Obse… inc hosp        2020-07-17      15           4 Hawaii           1415872
    #>  5 Obse… inc hosp        2020-07-16      50           6 Vermont           623989
    #>  6 Obse… inc hosp        2020-07-15      02           1 Alaska            731545
    #>  7 Obse… inc hosp        2020-08-05      22         181 Louisiana        4648794
    #>  8 Obse… inc hosp        2020-09-03      41           7 Oregon           4217737
    #>  9 Obse… inc hosp        2020-08-29      25          12 Massachusetts    6892503
    #> 10 Obse… inc hosp        2020-09-14      48         527 Texas           28995881
    #> # … with 539,612 more rows, and 4 more variables: geo_type <chr>,
    #> #   geo_value <chr>, abbreviation <chr>, full_location_name <chr>

    # load for ECDC
    load_truth(
      truth_source = c("JHU"),
      target_variable = c("inc case", "inc death"),
      hub = "ECDC"
    )
    #> # A tibble: 5,184 × 7
    #>    model location target_end_date target_variable value location_name population
    #>    <chr> <chr>    <date>          <chr>           <dbl> <chr>              <int>
    #>  1 Obse… AT       2020-01-25      inc case            0 Austria          8809212
    #>  2 Obse… AT       2020-02-01      inc case            0 Austria          8809212
    #>  3 Obse… AT       2020-02-08      inc case            0 Austria          8809212
    #>  4 Obse… AT       2020-02-15      inc case            0 Austria          8809212
    #>  5 Obse… AT       2020-02-22      inc case            0 Austria          8809212
    #>  6 Obse… AT       2020-02-29      inc case            9 Austria          8809212
    #>  7 Obse… AT       2020-03-07      inc case           70 Austria          8809212
    #>  8 Obse… AT       2020-03-14      inc case          576 Austria          8809212
    #>  9 Obse… AT       2020-03-21      inc case         2159 Austria          8809212
    #> 10 Obse… AT       2020-03-28      inc case         5457 Austria          8809212
    #> # … with 5,174 more rows