Source: R/cache.R, R/pipe.R
Cache method that accommodates environments, S4 methods, Rasters, & nested caching
The special assign operator %<% is equivalent to Cache. See examples at the end.

    Cache(FUN, ..., notOlderThan = NULL, objects = NULL, outputObjects = NULL,
      algo = "xxhash64", cacheRepo = NULL, length = 1e+06,
      compareRasterFileLength, userTags = c(), digestPathContent,
      omitArgs = NULL, classOptions = list(), debugCache = character(),
      sideEffect = FALSE, makeCopy = FALSE,
      quick = getOption("reproducible.quick", FALSE),
      verbose = getOption("reproducible.verbose", FALSE), cacheId = NULL,
      useCache = getOption("reproducible.useCache", TRUE), showSimilar = NULL)

    # S4 method for ANY
    Cache(FUN, ..., notOlderThan = NULL, objects = NULL, outputObjects = NULL,
      algo = "xxhash64", cacheRepo = NULL, length = 1e+06,
      compareRasterFileLength, userTags = c(), digestPathContent,
      omitArgs = NULL, classOptions = list(), debugCache = character(),
      sideEffect = FALSE, makeCopy = FALSE,
      quick = getOption("reproducible.quick", FALSE),
      verbose = getOption("reproducible.verbose", FALSE), cacheId = NULL,
      useCache = getOption("reproducible.useCache", TRUE), showSimilar = NULL)

    lhs %<% rhs

| Argument | Description |
|---|---|
| FUN | Either a function or an unevaluated function call (e.g., using quote). |
| ... | Arguments of FUN. |
| notOlderThan | Load an artifact from the database only if it was created after notOlderThan. |
| objects | Character vector of objects to be digested. This is only applicable if there is a list, environment or simList with named objects within it. Only this/these objects will be considered for caching, i.e., only use a subset of the list, environment or simList objects. |
| outputObjects | Optional character vector indicating which objects to return. This is only relevant for |
| algo | The algorithms to be used; currently available choices are |
| cacheRepo | A repository used for storing cached objects. This is optional if |
| length | Numeric. If the element passed to Cache is a |
| compareRasterFileLength | Being deprecated; use |
| userTags | A character vector with Tags. These Tags will be added to the repository along with the artifact. |
| digestPathContent | Being deprecated. Use |
| omitArgs | Optional character string of arguments in FUN to omit from the digest. |
| classOptions | Optional list. This will pass into |
| debugCache | Character or Logical. Either |
| sideEffect | Logical or path. Determines where the function will look for new files following function completion. See Details. NOTE: this argument is experimental and may change in future releases. |
| makeCopy | Logical. If |
| quick | Logical. If |
| verbose | Logical. This will output much more information about the internals of Caching, which may help diagnose Caching challenges. |
| cacheId | Character string. If passed, this will override the calculated hash of the inputs, and return the result from this cacheId in the cacheRepo. In general, this is not used; however, in some particularly finicky situations where Cache is not correctly detecting unchanged inputs, this can stabilize the return value. |
| useCache | Logical. If |
| showSimilar | A logical or numeric. Useful for debugging. If |
| lhs | A name to assign to. |
| rhs | A function call. |
As with archivist::cache, returns the value of the
function call or the cached version (i.e., the result from a previous call
to this same cached function with identical arguments).
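The core behaviour (evaluate once, then hand back the saved result for identical arguments) can be sketched in a few lines of base R. This is a hypothetical in-memory toy, memoiseToy, not how Cache works internally: Cache hashes its inputs and stores results in a cacheRepo, whereas this sketch simply keys on the deparsed arguments and keeps results in a list.

```r
# Hypothetical toy memoizer: illustrates the idea only, not Cache's implementation
memoiseToy <- function(f) {
  store <- list()                         # saved results, keyed by deparsed arguments
  function(...) {
    key <- paste(deparse(list(...)), collapse = "\n")
    if (!key %in% names(store)) {
      store[[key]] <<- f(...)             # first call with these arguments: evaluate and save
    }
    store[[key]]                          # identical later calls: return the saved copy
  }
}

cachedRnorm <- memoiseToy(rnorm)
ranNumsA <- cachedRnorm(10, 16)
ranNumsB <- cachedRnorm(10, 16)   # same arguments: returns the saved copy
ranNumsC <- cachedRnorm(10, 17)   # different arguments: evaluates rnorm again
```

Because the key depends only on the arguments, a stochastic function such as rnorm returns the same numbers on the second call, which is exactly the trade-off discussed under stochasticity below.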
Caching R objects using archivist::cache has five important limitations:
1. the archivist package detects different environments as different;
2. it also does not detect S4 methods correctly due to method inheritance;
3. it does not detect objects that have file-based storage of information (specifically RasterLayer-class objects);
4. the default hashing algorithm is relatively slow;
5. heavily nested function calls may want Cache arguments to propagate through to the inner Cache calls.
This version of the Cache function accommodates these five special, though quite common, cases by:
1. converting any environments into list equivalents;
2. identifying the dispatched S4 method (including those made through inheritance) before hashing, so the correct method is cached;
3. hashing the linked file, rather than the Raster object. Currently, only file-backed Raster* objects are digested this way (e.g., not ff objects, or any other R object where the data are on disk instead of in RAM);
4. using fastdigest internally when the object is in RAM, which can be up to ten times faster than digest. Note that file-backed objects are still hashed using digest;
5. saving the arguments passed by the user in a hidden environment. Any nested Cache functions will use arguments in this order: 1) actual arguments passed at each Cache call, 2) any inherited arguments from an outer Cache call, 3) the default values of the Cache function. See the section on Nested Caching.
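The S4 point matters because a class may define no method of its own and inherit one instead. The sketch below uses hypothetical classes Shape and Square and a hypothetical generic area to show base R's methods package locating the inherited method that dispatch will actually run. Keying on that method's body is an assumption made for illustration, not reproducible's code.

```r
library(methods)

# Hypothetical classes: Square inherits from Shape and has no area method of its own
setClass("Shape", representation(side = "numeric"))
setClass("Square", contains = "Shape")
invisible(setGeneric("area", function(shape) standardGeneric("area")))
invisible(setMethod("area", "Shape", function(shape) shape@side^2))

directForSquare    <- existsMethod("area", "Square")  # exact signature only: FALSE
inheritedForSquare <- hasMethod("area", "Square")     # allows inheritance: TRUE

# The method that dispatch will actually use for a Square object
dispatched <- selectMethod("area", "Square")
# Key on the dispatched method's body, so editing the Shape method changes the key
methodKey <- paste(deparse(body(dispatched)), collapse = "\n")

sq <- new("Square", side = 3)
area(sq)   # 9, computed by the inherited Shape method
```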
If Cache is called within a SpaDES module, then the cached entry will automatically
get 3 extra userTags: eventTime, eventType, and moduleName.
These can then be used in clearCache to selectively remove cached objects
by eventTime, eventType or moduleName.
Cache will add a tag to the artifact in the database called accessed,
which records the time that it was last accessed, whether by a read or a write.
That way, artifacts can be shown (using showCache) or removed (using
clearCache) selectively, based on their access dates, rather than only
by their creation dates. See example in clearCache.
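To see why the accessed tag is useful, consider a hypothetical metadata table for three artifacts. The data frame, its column names, and the dates are invented for illustration; in practice the real tags are inspected with showCache. A rule based on creation date would flag an old artifact that is still in regular use, while a rule based on access date keeps it.

```r
# Hypothetical metadata for three cached artifacts (values invented for illustration)
artifacts <- data.frame(
  cacheId  = c("a1", "b2", "c3"),
  created  = as.POSIXct(c("2020-01-01", "2020-01-01", "2020-06-01"), tz = "UTC"),
  accessed = as.POSIXct(c("2020-01-02", "2020-07-01", "2020-06-02"), tz = "UTC"),
  stringsAsFactors = FALSE
)
cutoff <- as.POSIXct("2020-06-15", tz = "UTC")

# b2 is old but was read recently, so only the creation-date rule selects it
byCreation <- artifacts$cacheId[artifacts$created < cutoff]   # "a1" "b2" "c3"
byAccess   <- artifacts$cacheId[artifacts$accessed < cutoff]  # "a1" "c3"
```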
Cache (uppercase C) is used here so that it is not confused with, and does
not mask, the archivist::cache function.
As indicated above, several objects require pre-treatment before
caching will work as expected. The function .robustDigest accommodates this.
It is an S4 generic, meaning that developers can produce their own methods for
different classes of objects. Currently, there are methods for several types
of classes; see .robustDigest for the specifics of each class.
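The idea of an S4 generic with class-specific pre-treatment can be sketched with a hypothetical generic, toyKey, that is not .robustDigest and does not hash anything; it only builds the string a real implementation would then hash. Its environment method converts the environment into a name-sorted list first, which is the list-equivalent conversion described above.

```r
library(methods)

# Hypothetical generic illustrating class-specific pre-treatment before keying
invisible(setGeneric("toyKey", function(object) standardGeneric("toyKey")))

# Default method: deparse the object into a string (a real implementation would hash it)
invisible(setMethod("toyKey", "ANY", function(object) {
  paste(deparse(object), collapse = "\n")
}))

# Environments compare by identity, so convert to a name-sorted list first
invisible(setMethod("toyKey", "environment", function(object) {
  toyKey(as.list(object, all.names = TRUE, sorted = TRUE))
}))

e1 <- new.env(); assign("a", 1, envir = e1); assign("b", "x", envir = e1)
e2 <- new.env(); assign("b", "x", envir = e2); assign("a", 1, envir = e2)
e3 <- new.env(); assign("a", 2, envir = e3); assign("b", "x", envir = e3)
```

Here e1 and e2 are distinct environments (identical() is FALSE) with the same contents, so they get the same key, while e3 differs in one value and gets a different key. A developer could add further setMethod calls for other classes in the same way.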
Commonly, Caching is nested, i.e., an outer function is wrapped in a Cache
function call, and one or more inner functions are also wrapped in a Cache
function call. A user can always specify arguments in every Cache function
call, but this can get tedious and can be prone to errors. Normally, R uses
the arguments passed by the user, if any, and default values for all arguments
that were not passed. We have inserted a middle step. The order of precedence
for any given Cache function call is 1. user arguments, 2. inherited arguments,
3. default arguments. At this time, the top-level Cache arguments will propagate
to all inner functions unless each individual Cache call has other arguments
specified, i.e., "middle" nested Cache function calls don't propagate their
arguments to further "inner" Cache function calls. See example.
userTags is unique among the arguments: its values are appended to the
inherited userTags rather than replacing them.
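The precedence rule can be sketched as a plain list merge. resolveCacheArgs and the argument values below are hypothetical; reproducible keeps inherited arguments in a hidden environment rather than passing lists around. Following the text, inheritedArgs stands for the arguments of the top-level Cache call only.

```r
# Hypothetical sketch of the precedence rule, not reproducible's internal code
resolveCacheArgs <- function(userArgs, inheritedArgs, defaults) {
  resolved <- defaults                               # 3. defaults (lowest priority)
  resolved[names(inheritedArgs)] <- inheritedArgs    # 2. inherited from the top-level call
  resolved[names(userArgs)] <- userArgs              # 1. passed to this call (highest)
  # userTags are appended to the inherited ones instead of replacing them
  resolved$userTags <- unique(c(defaults$userTags, inheritedArgs$userTags,
                                userArgs$userTags))
  resolved
}

defaults  <- list(quick = FALSE, algo = "xxhash64", userTags = c())
topLevel  <- list(quick = TRUE, userTags = "outer")
innerCall <- list(algo = "md5", userTags = "inner")

resolved <- resolveCacheArgs(innerCall, topLevel, defaults)
# quick is inherited (TRUE), algo comes from the inner call ("md5"),
# and userTags combine to c("outer", "inner")
```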
Caching speed may become a critical aspect of a final product. For example,
if the final product is a shiny app, rerunning the entire project may need
to take no more than a few seconds. There are 3 arguments that affect
Cache speed: quick, length, and
algo. quick is passed to .robustDigest, which currently
only affects Path and Raster* class objects. In both cases, quick
means that little or no disk-based information will be assessed.
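The length argument trades accuracy for speed. Assuming it behaves like the length argument of digest::digest (cap how much of a file is processed), the effect can be sketched with a hypothetical helper, partialFileKey, that reads at most length bytes; a real implementation would hash these bytes rather than return them.

```r
# Hypothetical helper: key a file on at most its first `length` bytes
partialFileKey <- function(path, length = 1e6) {
  readBin(path, what = "raw", n = length)   # reads no more than `length` bytes
}

fileA <- tempfile(); fileB <- tempfile()
writeBin(as.raw(c(1:100, 1)), fileA)   # 101 bytes
writeBin(as.raw(c(1:100, 2)), fileB)   # differs from fileA only in the last byte

shortA <- partialFileKey(fileA, length = 100)  # cheap, but misses the difference
shortB <- partialFileKey(fileB, length = 100)
fullA  <- partialFileKey(fileA)                # whole file: difference detected
fullB  <- partialFileKey(fileB)
```

With length = 100 the two files look identical; reading the whole file tells them apart, at the cost of more disk I/O for large files.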
If a function has a path argument, there is some ambiguity about what should be done. Possibilities include:
- hash the string as is (this will be very system specific, meaning a Cache call will not work if copied between systems or directories);
- hash the basename(path);
- hash the contents of the file.
If paths are passed in as is (i.e., as a character string), the result will not be predictable.
Instead, one should use the wrapper function asPath(path), which sets the
class of the string to a Path, and one should decide whether one wants
to digest the content of the file (using quick = FALSE),
or just the filename (quick = TRUE). See examples.
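The three possibilities behave differently when the same file lives in two directories. The helpers keyString, keyBasename and keyContent are hypothetical stand-ins for the three strategies, not reproducible functions; the comments mark which one corresponds to quick = TRUE and quick = FALSE for a Path, per the text above.

```r
# Same file content stored under two different project directories
dirA <- tempfile("projA"); dirB <- tempfile("projB")
dir.create(dirA); dir.create(dirB)
pathA <- file.path(dirA, "data.csv")
pathB <- file.path(dirB, "data.csv")
writeLines(c("x,y", "1,2"), pathA)
invisible(file.copy(pathA, pathB))

# Hypothetical keying strategies for a path argument
keyString   <- function(p) p                         # the string as is (system specific)
keyBasename <- function(p) basename(p)               # file name only (like quick = TRUE)
keyContent  <- function(p) unname(tools::md5sum(p))  # file contents (like quick = FALSE)

sameString   <- identical(keyString(pathA), keyString(pathB))      # FALSE
sameBasename <- identical(keyBasename(pathA), keyBasename(pathB))  # TRUE
sameContent  <- identical(keyContent(pathA), keyContent(pathB))    # TRUE

# After editing one copy, only the content key notices the change
writeLines(c("x,y", "3,4"), pathB)
sameBasenameAfterEdit <- identical(keyBasename(pathA), keyBasename(pathB))  # TRUE
sameContentAfterEdit  <- identical(keyContent(pathA), keyContent(pathB))    # FALSE
```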
In general, it is expected that caching will only be used when stochasticity
is not relevant, or if a user has achieved sufficient stochasticity (e.g., via a
sufficient number of calls to experiment) such that no new explorations
of stochastic outcomes are required. It will also be very useful in a
reproducible workflow.
sideEffect

If sideEffect is not FALSE, then metadata about any files added to the sideEffect
location will be added as an attribute to the cached copy. Subsequent calls to this
function will check for the presence of the new files in the sideEffect location.
If the files are identical (quick = FALSE) or their file sizes are
identical (quick = TRUE), then the cached copy of the function will
be returned (and no files changed). If there are missing or incorrect files,
then the function will re-run. This accommodates the situation where the
function call is identical, but somehow the side-effect files were modified.
If sideEffect is logical, then the function will check the
cacheRepo; if it is a path, then it will check the path. The function will
assess whether the files to be downloaded are found locally
prior to download. If this local check fails, it will try to recover the files
from a local copy if makeCopy was set to TRUE the first time the function was run.
Currently, local recovery only works if makeCopy was set to TRUE the first time
Cache was run. Default is FALSE.
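The side-effect check can be sketched as "record a fingerprint per file, then compare later". recordSideEffects and sideEffectsIntact are hypothetical helpers, not reproducible's implementation; they use file sizes when quick = TRUE and MD5 checksums of the contents when quick = FALSE, mirroring the rule above. A caching wrapper built on them would re-run the function whenever the check returns FALSE.

```r
# Hypothetical helpers; not reproducible's implementation
recordSideEffects <- function(dir, quick = FALSE) {
  files <- list.files(dir, full.names = TRUE)
  fingerprint <- if (quick) {
    as.character(file.size(files))     # quick = TRUE: compare file sizes only
  } else {
    unname(tools::md5sum(files))       # quick = FALSE: compare file contents
  }
  data.frame(file = basename(files), fingerprint = fingerprint,
             stringsAsFactors = FALSE)
}

# TRUE if every recorded file is still present with a matching fingerprint
sideEffectsIntact <- function(dir, recorded, quick = FALSE) {
  current <- recordSideEffects(dir, quick)
  all(recorded$file %in% current$file) &&
    identical(current$fingerprint[match(recorded$file, current$file)],
              recorded$fingerprint)
}

outDir <- tempfile("sideEffectDemo")
dir.create(outDir)
outFile <- file.path(outDir, "out.txt")
writeLines("abc", outFile)
recFull  <- recordSideEffects(outDir)
recQuick <- recordSideEffects(outDir, quick = TRUE)

writeLines("xyz", outFile)                                    # same size, new content
fullOk  <- sideEffectsIntact(outDir, recFull)                 # FALSE: content changed
quickOk <- sideEffectsIntact(outDir, recQuick, quick = TRUE)  # TRUE: size unchanged

invisible(file.remove(outFile))
missingOk <- sideEffectsIntact(outDir, recQuick, quick = TRUE) # FALSE: file missing
```

The quickOk result shows the cost of quick = TRUE: a same-sized edit goes unnoticed, while the full content check catches it.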

    library(reproducible)
    tmpDir <- file.path(tempdir())

    # Basic use
    ranNumsA <- Cache(rnorm, 10, 16, cacheRepo = tmpDir)

    # All same
    ranNumsB <- Cache(rnorm, 10, 16, cacheRepo = tmpDir) # recovers cached copy
    ranNumsC <- rnorm(10, 16) %>% Cache(cacheRepo = tmpDir) # recovers cached copy
    ranNumsD <- Cache(quote(rnorm(n = 10, 16)), cacheRepo = tmpDir) # recovers cached copy

    # For more in-depth uses, see the vignette (not run)
    # browseVignettes(package = "reproducible")

    # Equivalent: a Cache() call and the %<% assign operator
    a <- Cache(rnorm, 1)
    b %<% rnorm(1)