This function can be used to prepare R objects from remote or local data
sources. Its purpose is to provide a reproducible version of
a series of commonly used steps for getting, loading, and processing data.
The function has two stages: getting data (downloading, extracting from archives,
loading into R) and post-processing (for Spatial* and Raster*
objects, this is crop, reproject, mask/intersect).
To trigger the first stage, provide url or archive.
To trigger the second stage, provide studyArea or rasterToMatch.
See examples.
prepInputs(targetFile = NULL, url = NULL, archive = NULL, alsoExtract = NULL, destinationPath = ".", fun = NULL, quick = getOption("reproducible.quick"), overwrite = FALSE, purge = FALSE, useCache = getOption("reproducible.useCache", FALSE), ...)
| Argument | Description |
|---|---|
| targetFile | Character string giving the path to the eventual file (raster, shapefile, csv, etc.) after downloading and extracting from a zip or tar archive. This is the file before it is passed to … |
| url | Optional character string indicating the URL to download from. Normally, if used within a module, this url should be explicitly given as sourceURL for an … |
| archive | Optional character string giving the path of an archive containing … |
| alsoExtract | Optional character string naming files other than … |
| destinationPath | Character string of a directory in which to download and save the file that comes from … |
| fun | Character string indicating the function to use to load … |
| quick | Logical. This is passed internally to … |
| overwrite | Logical. Should downloading and all the other actions occur even if they pass the checksums or the files are all there? |
| purge | Logical or Integer. See the options for purging the CHECKSUMS.txt file below. |
| useCache | Passed to Cache in various places. Default getOption("reproducible.useCache", FALSE). |
| ... | Additional arguments passed to … |
This function is still experimental: use with caution.
The first stage (getting data) consists of four steps:
1. Download from the web via either drive_download or download.file;
2. Extract from archive using unzip or untar;
3. Load into R using raster, shapefile, or any other function passed in with fun;
4. Checksum all files during this process. The checksums are put into a
   CHECKSUMS.txt file in the destinationPath, appending if it is
   already there, and overwriting the entries for the same files if entries already exist.
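The append-or-overwrite behaviour of the checksumming step can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the helper updateChecksums() and the two-column table layout are hypothetical, and the real CHECKSUMS.txt written by prepInputs may contain different columns. Only tools::md5sum and base file functions are used.

```r
# Hypothetical helper (NOT part of the 'reproducible' package): record MD5
# checksums for `files` in a table on disk, overwriting rows for files that
# are already listed and appending rows for new ones.
updateChecksums <- function(checksumFile, files) {
  new <- data.frame(file = basename(files),
                    checksum = unname(tools::md5sum(files)),
                    stringsAsFactors = FALSE)
  if (file.exists(checksumFile)) {
    old <- read.table(checksumFile, header = TRUE,
                      colClasses = "character", stringsAsFactors = FALSE)
    # keep only the old entries that are not being replaced
    old <- old[!old$file %in% new$file, , drop = FALSE]
    new <- rbind(old, new)
  }
  write.table(new, checksumFile, row.names = FALSE, quote = TRUE)
  invisible(new)
}

# Illustration with two small files in a temporary directory
dPath <- file.path(tempdir(), "checksumDemo")
dir.create(dPath, showWarnings = FALSE)
f1 <- file.path(dPath, "a.csv")
f2 <- file.path(dPath, "b.csv")
writeLines("x,y", f1)
writeLines("1,2", f2)
chk <- file.path(dPath, "CHECKSUMS.txt")

first <- updateChecksums(chk, f1)          # creates the file with one entry
writeLines("x,y,z", f1)                    # change a.csv on disk
second <- updateChecksums(chk, c(f1, f2))  # a.csv entry overwritten, b.csv appended
```

After the second call the table holds one row per file, and the row for a.csv carries the checksum of the modified file.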
The second stage (post-processing) is triggered if either rasterToMatch or studyArea
is supplied. It consists of:
1. Fix errors. Currently, the only errors fixed are for SpatialPolygons,
   using buffer(..., width = 0);
2. Crop using cropInputs;
3. Project using projectInputs;
4. Mask using maskInputs;
5. Determine the file name with determineFilename, via the filename2 argument;
6. Optionally, write the object to that file name on disk via writeOutputs.
NOTE: checksumming does not occur during the post-processing stage, as
there are no file downloads. To achieve fast results, wrap
prepInputs with Cache.
NOTE: sf objects are still very experimental.
For Raster* and Spatial* objects: if rasterToMatch or studyArea is supplied, this will
trigger several subsequent functions, specifically the sequence
crop, reproject, mask, which appears to be a common sequence in
spatial simulation. See postProcess.spatialObjects.
Understanding various combinations of rasterToMatch
and/or studyArea:
Please see postProcess.spatialObjects.
Options for the purge argument, which controls purging of the CHECKSUMS.txt file, are:
| purge | Effect |
|---|---|
| 0 | keep file |
| 1 | delete file |
| 2 | delete entry for targetFile |
| 3 | delete entry for archive |
| 4 | delete entry for alsoExtract |
| 5 | delete entry for targetFile & alsoExtract |
| 6 | delete entry for targetFile, alsoExtract & archive |
| 7 | delete entry that is failing (i.e., for the file downloaded by the url) |
Purging individual entries only removes entries in the CHECKSUMS.txt that are associated with
targetFile, alsoExtract or archive. When prepInputs is called, it writes a
CHECKSUMS.txt file, or appends to it if one already exists. If the CHECKSUMS.txt is not correct, use
this argument to remove it.
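The option table above can be restated as a small lookup, which may help when deciding which value to pass. This is an illustrative sketch only: purgeTargets() is a hypothetical helper, not a function in the package, and it simply encodes the table as written. Treating FALSE as 0 and TRUE as 1 is an assumption based on R's usual logical-to-integer conversion.

```r
# Hypothetical helper (NOT part of 'reproducible'): return which CHECKSUMS.txt
# entries a given purge value refers to, following the table above.
purgeTargets <- function(purge) {
  # Assumption: logical values map as FALSE -> 0 and TRUE -> 1
  purge <- as.integer(purge)
  switch(as.character(purge),
    "0" = character(0),                               # keep file
    "1" = "CHECKSUMS.txt",                            # delete the whole file
    "2" = "targetFile",
    "3" = "archive",
    "4" = "alsoExtract",
    "5" = c("targetFile", "alsoExtract"),
    "6" = c("targetFile", "alsoExtract", "archive"),
    "7" = "failing entry",                            # entry for the file downloaded by url
    stop("purge must be a logical or an integer from 0 to 7")
  )
}

purgeTargets(5)
purgeTargets(FALSE)
```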
# This function works within a module; however, currently,
# sourceURL is not yet working as desired. Use url.
# NOT RUN {
library(reproducible) # provides prepInputs, Cache and asPath

# download a zip file from internet, unzip all files, load as shapefile, Cache the call
# First time: don't know all files - prepInputs will guess, if download file is an archive,
# then extract all files, then if there is a .shp, it will load with raster::shapefile
dPath <- file.path(tempdir(), "ecozones")
shpEcozone <- prepInputs(destinationPath = dPath,
                         url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip")

# Robust to partial file deletions:
unlink(dir(dPath, full.names = TRUE)[1:3])
shpEcozone <- prepInputs(destinationPath = dPath,
                         url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip")
unlink(dPath, recursive = TRUE)

# Once this is done, can be more precise in operational code:
# specify targetFile, alsoExtract, and fun, wrap with Cache
ecozoneFilename <- file.path(dPath, "ecozones.shp")
ecozoneFiles <- c("ecozones.dbf", "ecozones.prj", "ecozones.sbn",
                  "ecozones.sbx", "ecozones.shp", "ecozones.shx")
shpEcozone <- prepInputs(targetFile = ecozoneFilename,
                         url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip",
                         alsoExtract = ecozoneFiles,
                         fun = "shapefile",
                         destinationPath = dPath)
unlink(dPath, recursive = TRUE)

# Add a study area to Crop and Mask to
# Create a "study area"
library(sp)
library(raster)
coords <- structure(c(-122.98, -116.1, -99.2, -106, -122.98,
                      59.9, 65.73, 63.58, 54.79, 59.9),
                    .Dim = c(5L, 2L))
Sr1 <- Polygon(coords)
Srs1 <- Polygons(list(Sr1), "s1")
StudyArea <- SpatialPolygons(list(Srs1), 1L)
crs(StudyArea) <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

# specify targetFile, alsoExtract, and fun, wrap with Cache
ecozoneFilename <- file.path(dPath, "ecozones.shp")
# Note, you don't need to "alsoExtract" the archive... if the archive is not there, but the
# targetFile is there, it will not redownload the archive.
ecozoneFiles <- c("ecozones.dbf", "ecozones.prj", "ecozones.sbn",
                  "ecozones.sbx", "ecozones.shp", "ecozones.shx")
shpEcozoneSm <- Cache(prepInputs,
                      url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip",
                      targetFile = reproducible::asPath(ecozoneFilename),
                      alsoExtract = reproducible::asPath(ecozoneFiles),
                      studyArea = StudyArea,
                      fun = "shapefile",
                      destinationPath = dPath,
                      filename2 = "EcozoneFile.shp") # passed to determineFilename
plot(shpEcozone)
plot(shpEcozoneSm, add = TRUE, col = "red")
unlink(dPath, recursive = TRUE) # recursive = TRUE is needed to delete a directory

# Big Raster, with crop and mask to Study Area - no reprojecting (lossy) of raster,
# but the StudyArea does get reprojected, need to use rasterToMatch
dPath <- file.path(tempdir(), "LCC")
lcc2005Filename <- file.path(dPath, "LCC2005_V1_4a.tif")
url <- file.path("ftp://ftp.ccrs.nrcan.gc.ca/ad/NLCCLandCover",
                 "LandcoverCanada2005_250m/LandCoverOfCanada2005_V1_4.zip")

# messages received below may help for filling in more arguments in the subsequent call
LCC2005 <- prepInputs(url = url,
                      destinationPath = asPath(dPath),
                      studyArea = StudyArea)
plot(LCC2005)

# if wrapped with Cache, will be fast second time, very fast 3rd time (via memoised copy)
LCC2005 <- Cache(prepInputs, url = url,
                 targetFile = lcc2005Filename,
                 archive = asPath("LandCoverOfCanada2005_V1_4.zip"),
                 destinationPath = asPath(dPath),
                 studyArea = StudyArea)
# }