By default, for the US hub, the resulting data.frame contains weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (HealthData) at the county, state and national levels. For the ECDC hub, the default resulting data.frame contains weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (ECDC) for all European countries.

load_truth(
  truth_source = NULL,
  target_variable = NULL,
  as_of = NULL,
  truth_end_date = NULL,
  temporal_resolution = NULL,
  locations = NULL,
  data_location = NULL,
  local_repo_path = NULL,
  hub = c("US", "ECDC")
)

Arguments

truth_source

character vector specifying where the truth data will be loaded from. Currently supports "JHU", "USAFacts", "NYTimes", "HealthData" and "ECDC". If NULL, the default for the US hub is c("JHU", "HealthData") and the default for the ECDC hub is c("JHU").

target_variable

string specifying the target type. It should be one or more of "cum death", "inc case", "inc death", "inc hosp". If NULL, the default for the US hub is c("inc case", "inc death", "inc hosp") and the default for the ECDC hub is c("inc case", "inc death").

as_of

character vector of "as of" dates to use for querying truths, in 'yyyy-mm-dd' format. For each spatial unit and temporal reporting unit, the last available data with an issue date on or before the given as_of date are returned. This is currently only available when data_location is "covidData".
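The selection rule for as_of can be illustrated with a small base R sketch. The data frame `versions`, its `issue_date` column and the helper `select_as_of()` are hypothetical and the values are made up; this is one way to read the rule, not a description of how covidData performs the lookup.

```r
# Toy versioned data: the same location/date pair reported under several issue dates.
# All values and the column name `issue_date` are made up for illustration.
versions <- data.frame(
  location   = c("25", "25", "25", "48", "48"),
  date       = as.Date(rep("2020-07-04", 5)),
  issue_date = as.Date(c("2020-07-05", "2020-07-12", "2020-07-19",
                         "2020-07-05", "2020-07-19")),
  value      = c(10, 12, 13, 40, 45)
)

# Hypothetical helper: for each location and date, keep the row with the
# latest issue date that is on or before `as_of`.
select_as_of <- function(df, as_of) {
  as_of <- as.Date(as_of)
  eligible <- df[df$issue_date <= as_of, ]
  # Sort so the most recent eligible issue comes first within each group.
  eligible <- eligible[order(eligible$location, eligible$date,
                             -as.numeric(eligible$issue_date)), ]
  first_in_group <- !duplicated(eligible[, c("location", "date")])
  result <- eligible[first_in_group, ]
  rownames(result) <- NULL
  result
}

snapshot <- select_as_of(versions, "2020-07-15")
print(snapshot)
```

With as_of set to "2020-07-15", the issue dated 2020-07-19 is ignored for both locations, so each location keeps its most recent earlier issue.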

truth_end_date

date of the last truth point to include, in 'yyyy-mm-dd' format. If NULL, defaults to the system date.

temporal_resolution

character specifying the temporal resolution to include. Currently supports "weekly" and "daily". If NULL, defaults to "weekly" for cases and deaths and "daily" for hospitalizations. Weekly temporal_resolution will not be applied to "inc hosp" when multiple target variables are specified. "ECDC" truth data is weekly by default; daily data is not available from that source.

locations

vector of valid location codes. If NULL, defaults to all locations with available forecasts. The US hub uses FIPS codes and the ECDC hub uses country name abbreviations.

data_location

character specifying the location of truth data. Currently only supports "local_hub_repo", "remote_hub_repo" and "covidData". If NULL, default to "remote_hub_repo".

local_repo_path

path to a local clone of the reichlab/covid19-forecast-hub repository. Only used when data_location is "local_hub_repo".

hub

character specifying which hub to use. The default is "US"; the other option is "ECDC".
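As a sketch of how several of these arguments combine, the wrapper below requests weekly JHU incident deaths for selected US locations up to a chosen end date. The function name `load_weekly_deaths()` and its parameters are hypothetical and not part of covidHubUtils; only the load_truth() arguments documented above are used, and the call needs the package installed plus network access.

```r
# Hypothetical wrapper (not part of covidHubUtils): weekly JHU incident deaths
# for a set of US locations, truncated at a chosen end date.
load_weekly_deaths <- function(fips, end_date) {
  if (!requireNamespace("covidHubUtils", quietly = TRUE)) {
    stop("covidHubUtils must be installed to load truth data")
  }
  covidHubUtils::load_truth(
    truth_source = "JHU",
    target_variable = "inc death",
    truth_end_date = end_date,
    temporal_resolution = "weekly",
    locations = fips,                    # FIPS codes for the US hub
    data_location = "remote_hub_repo",   # the documented default, spelled out
    hub = "US"
  )
}

# Example call (needs network access), using the FIPS codes for
# Massachusetts ("25") and Texas ("48"):
# ma_tx <- load_weekly_deaths(c("25", "48"), "2021-01-02")
```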

Value

data.frame with columns model, target_variable, target_end_date, location, value, location_name and population, plus extra columns in the following cases:

  • If hub = "US", it returns the extra columns geo_type, geo_value, abbreviation and full_location_name.

  • If truth_source = "ECDC", it returns the extra column week_start. However, when "inc hosp" is the only target_variable, no extra columns are appended to the resulting data frame.

Details

  • "inc hosp" is only available from "HealthData" and "ECDC", and this function does not load any other target variables from "HealthData".

  • When loading data for multiple target variables for the US hub, temporal_resolution will be applied to all target variables but "inc hosp". In that case, the function will return daily incident hospitalization counts along with other data.

  • For the US hub, weekly temporal resolution will be applied to "inc hosp" if the user specifies "inc hosp" as the only target_variable. For the ECDC hub, on the other hand, temporal_resolution will be applied to "inc hosp" in all cases.

  • When loading weekly data, if there are not enough observations for a week, the corresponding weekly count will be NA in the resulting data frame.

  • as_of is only supported when data_location = "covidData". Otherwise, this function will return a warning.
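The weekly aggregation rule above (an NA count for weeks without enough observations) can be sketched in base R. The sketch assumes that weeks end on Saturday and that a week needs seven daily observations to be complete; both are assumptions made for illustration, and the helpers `week_end()` and `to_weekly()` are hypothetical, not part of the package.

```r
# Toy daily counts: one full Sunday-to-Saturday week followed by a partial week.
# Values are made up for illustration.
daily <- data.frame(
  date  = seq(as.Date("2020-03-01"), as.Date("2020-03-10"), by = "day"),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

# Map each date to the Saturday that ends its week.
# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday.
week_end <- function(d) d + (6 - as.POSIXlt(d)$wday)

# Sum daily values per week; weeks with fewer than `days_required`
# observations get NA instead of a partial sum (assumed threshold: 7).
to_weekly <- function(df, days_required = 7) {
  df$target_end_date <- week_end(df$date)
  ends <- sort(unique(df$target_end_date))
  totals <- vapply(seq_along(ends), function(i) {
    obs <- df$value[df$target_end_date == ends[i]]
    if (length(obs) < days_required) NA_real_ else sum(obs)
  }, numeric(1))
  data.frame(target_end_date = ends, value = totals)
}

weekly <- to_weekly(daily)
print(weekly)
```

The first week (ending 2020-03-07) has all seven days and sums to 28, while the second week (ending 2020-03-14) has only three observations and is reported as NA.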

Examples

library(covidHubUtils)

# load for US
load_truth(
  truth_source = c("JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp")
)
#> Rows: 21232 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): location, location_name
#> dbl (1): value
#> date (1): date
#>
#> Use `spec()` to retrieve the full column specification for this data.
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 1808000 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): location, location_name
#> dbl (1): value
#> date (1): date
#>
#> Use `spec()` to retrieve the full column specification for this data.
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 1808000 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): location, location_name
#> dbl (1): value
#> date (1): date
#>
#> Use `spec()` to retrieve the full column specification for this data.
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 539,622 × 11
#>    model target_variable target_end_date location value location_name population
#>    <chr> <chr>           <date>          <chr>    <dbl> <chr>              <dbl>
#>  1 Obse… inc hosp        2020-07-21      19          33 Iowa             3155070
#>  2 Obse… inc hosp        2020-07-21      72           0 Puerto Rico      3754939
#>  3 Obse… inc hosp        2020-07-19      10           0 Delaware          973764
#>  4 Obse… inc hosp        2020-07-17      15           4 Hawaii           1415872
#>  5 Obse… inc hosp        2020-07-16      50           6 Vermont           623989
#>  6 Obse… inc hosp        2020-07-15      02           1 Alaska            731545
#>  7 Obse… inc hosp        2020-08-05      22         181 Louisiana        4648794
#>  8 Obse… inc hosp        2020-09-03      41           7 Oregon           4217737
#>  9 Obse… inc hosp        2020-08-29      25          12 Massachusetts    6892503
#> 10 Obse… inc hosp        2020-09-14      48         527 Texas           28995881
#> # … with 539,612 more rows, and 4 more variables: geo_type <chr>,
#> #   geo_value <chr>, abbreviation <chr>, full_location_name <chr>

# load for ECDC
load_truth(
  truth_source = c("JHU"),
  target_variable = c("inc case", "inc death"),
  hub = "ECDC"
)
#> Rows: 18176 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): location, location_name
#> dbl (1): value
#> date (1): date
#>
#> Use `spec()` to retrieve the full column specification for this data.
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 18176 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): location, location_name
#> dbl (1): value
#> date (1): date
#>
#> Use `spec()` to retrieve the full column specification for this data.
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 5,184 × 7
#>    model location target_end_date target_variable value location_name population
#>    <chr> <chr>    <date>          <chr>           <dbl> <chr>              <int>
#>  1 Obse… AT       2020-01-25      inc case            0 Austria          8809212
#>  2 Obse… AT       2020-02-01      inc case            0 Austria          8809212
#>  3 Obse… AT       2020-02-08      inc case            0 Austria          8809212
#>  4 Obse… AT       2020-02-15      inc case            0 Austria          8809212
#>  5 Obse… AT       2020-02-22      inc case            0 Austria          8809212
#>  6 Obse… AT       2020-02-29      inc case            9 Austria          8809212
#>  7 Obse… AT       2020-03-07      inc case           70 Austria          8809212
#>  8 Obse… AT       2020-03-14      inc case          576 Austria          8809212
#>  9 Obse… AT       2020-03-21      inc case         2159 Austria          8809212
#> 10 Obse… AT       2020-03-28      inc case         5457 Austria          8809212
#> # … with 5,174 more rows