ropensci/drake: Minor release: new file API, tidyselect, and internal fixes

doi:10.5281/zenodo.1205521

Published March 22, 2018 | Version v5.1.0

Software Open

1. Eli Lilly and Company @EliLillyCo
2. Indiana Commission for Higher Education
3. University of California at Berkeley Agricultural and Resource Economics
4. University of Zürich
5. Generali China AMC @GCAMC

Version 5.1.0

Add a reduce_plan() function to do pairwise reductions on collections of targets.
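To make the idea of a pairwise reduction concrete, here is a minimal base-R sketch. It does not call drake: pairwise_reduce() is a hypothetical helper written only for illustration, and it reduces values rather than generating workflow plan rows. It combines neighbours two at a time until one result remains.

```r
# Hypothetical helper (not part of drake): reduce a list pairwise,
# combining neighbours in rounds until a single value is left.
pairwise_reduce <- function(values, op) {
  while (length(values) > 1) {
    n <- length(values)
    idx <- seq(1, n - 1, by = 2)
    merged <- lapply(idx, function(i) op(values[[i]], values[[i + 1]]))
    # Carry an unpaired last element over to the next round.
    if (n %% 2 == 1) merged <- c(merged, values[n])
    values <- merged
  }
  values[[1]]
}

total <- pairwise_reduce(as.list(1:8), `+`)
total  # 36
```

Each round halves the number of pieces. In a workflow plan, that shape means the intermediate combinations within a round do not depend on each other.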
Forcibly exclude the dot (.) from being a dependency of any target or import. This enforces more consistent behavior in the face of the current static code analysis functionality, which sometimes detects . and sometimes does not.
Use ignore() to optionally ignore pieces of workflow plan commands and/or imported functions. Use ignore(some_code) to
1. Force drake to not track dependencies in some_code, and
2. Ignore any changes in some_code when it comes to deciding which targets are out of date.
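A short sketch of how ignore() might be used in practice, assuming drake is installed and that make() runs in the environment where x is defined. The variable x and the target y are placeholders chosen for illustration.

```r
library(drake)

x <- 4
# x sits inside ignore(), so drake does not track it as a dependency of y.
plan <- drake_plan(y = sqrt(4) + ignore(x))

make(plan)  # builds y

x <- 5
make(plan)  # y should be skipped: the change to x is ignored
```

The command still uses x when it is evaluated. Only the dependency tracking is switched off, so ignore() is best reserved for code whose changes really should not invalidate the target.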
Force drake to only look for imports in environments inheriting from envir in make() (plus explicitly namespaced functions).
Force loadd() to ignore foreign imports (imports not explicitly found in envir when make() last imported them).
Reduce default verbosity. Only targets are printed out by default. Verbosity levels are integers ranging from 0 through 4.
Change loadd() so that only targets (not imports) are loaded if the ... and list arguments are empty.
Add a check to drake_plan() for duplicate targets.
Add a .gitignore file containing "*" to the default .drake/ cache folder every time new_cache() is called. This means the cache will not be automatically committed to git. Users need to remove the .gitignore file to allow unforced commits; subsequent make()s on the same cache will then respect the user's wishes and not add another .gitignore. This only works for the default cache and is not supported for manual storrs.
Add a new experimental "future" backend with a manual scheduler.
Implement dplyr-style tidyselect functionality in loadd(), clean(), and build_times(). For build_times(), there is an API change: for tidyselect to work, we needed to insert a new ... argument as the first argument of build_times().
Deprecate the single-quoting API for files. Users should now use formal API functions in their commands:
- file_in() for file inputs to commands or imported functions (for imported functions, the input file needs to be an imported file, not a target).
- file_out() for output file targets (ignored if used in imported functions).
- knitr_in() for knitr/rmarkdown reports. This tells drake to look inside the source file for target dependencies in code chunks (explicitly referenced with loadd() and readd()). Treated as a file_in() if used in imported functions.
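The three file functions might be combined in one workflow plan as sketched below. This assumes drake is installed. The file names (raw.csv, summary.csv, report.Rmd, report.html) and the target names are hypothetical placeholders. Building the plan does not run the commands, so the files need not exist yet.

```r
library(drake)

# File names below are hypothetical placeholders.
plan <- drake_plan(
  # file_in() marks raw.csv as an input file dependency.
  data = read.csv(file_in("raw.csv")),
  # file_out() marks summary.csv as a file produced by this command.
  summary_file = write.csv(summary(data), file_out("summary.csv")),
  # knitr_in() asks drake to scan report.Rmd for loadd()/readd() calls.
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)

plan
```

Depending on the drake version, the targets whose commands contain file_out() may be renamed after the output file, as the release notes describe.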
Change drake_plan() so that it automatically fills in any target names that the user does not supply. Also, any file_out()s become the target names automatically (double-quoted internally).
Make read_drake_plan() (rather than an empty drake_plan()) the default plan argument in all functions that accept a plan.
Add support for active bindings: loadd(..., lazy = "bind"). That way, when you have a target loaded in one R session and hit make() in another R session, the target in your first session will automatically update.
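The mechanism behind this can be sketched with base R's makeActiveBinding(). This is only an illustration of active bindings in general, not drake's implementation: the environment store stands in for the cache, and the names store and target are hypothetical.

```r
# `store` stands in for the on-disk cache (an assumption for illustration).
store <- new.env()
store$value <- 1

# Every read of `target` re-runs the function, so it always reflects
# the current contents of `store`.
makeActiveBinding("target", function() store$value, environment())

target            # 1
store$value <- 2  # simulate another session rebuilding the target
target            # 2
```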
Use tibbles for workflow plan data frames and the output of dataframes_graph().
Return warnings, errors, and other context of each build, all wrapped up with the usual metadata. diagnose() will take on the role of returning this metadata.
Deprecate the read_drake_meta() function in favor of diagnose().
Add a new expose_imports() function to optionally force drake to detect deeply nested functions inside specific packages.
Move the "quickstart.Rmd" vignette to "example-basic.Rmd". The so-called "quickstart" didn't end up being very quick, and it was all about the basic example anyway.
Move drake_build() to be an exclusively user-side function.
Add a replace argument to loadd() so that objects already in the user's environment need not be replaced.
When the graph is cyclic, print out all the cycles.
Prune self-referential loops (and duplicate edges) from the workflow graph. That way, recursive functions are allowed.
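The effect of this pruning can be illustrated with igraph. This is a sketch on a toy graph, not drake's internal code. It assumes igraph is installed, and igraph::simplify() is used here as one plausible way to do the pruning. The vertex names f, g, and h are placeholders.

```r
library(igraph)

# f calls itself (self-loop) and the edge f -> g appears twice.
edges <- data.frame(
  from = c("f", "f", "f", "g"),
  to   = c("f", "g", "g", "h")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Drop self-loops and duplicate edges.
pruned <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE)

ecount(g)       # 4
ecount(pruned)  # 2
is_dag(pruned)  # TRUE
```

With the self-loop removed, the recursive function f no longer makes the graph cyclic.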
Add a seed argument to make(), drake_config(), and load_basic_example(). Also hard-code a default seed of 0. That way, the pseudo-randomness in projects should be reproducible across R sessions.
Cache the pseudo-random seed at the time the project is created and use that seed to build targets until the cache is destroyed.
Add a new drake_read_seed() function to read the seed from the cache. Its examples illustrate what drake is doing to try to ensure reproducible random numbers.
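The principle behind a fixed, cached seed can be shown in base R. The sketch below only demonstrates that resetting the same seed reproduces the same pseudo-random draws. It does not show how drake derives or stores seeds internally.

```r
# Same seed, same draws: the property a fixed project seed relies on.
set.seed(0)
first_run <- rnorm(3)

set.seed(0)
second_run <- rnorm(3)

identical(first_run, second_run)  # TRUE
```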
Evaluate the quasiquotation operator !! for the ... argument to drake_plan(). Suppress this behavior using tidy_evaluation = FALSE or by passing commands through the list argument.
Preprocess workflow plan commands with rlang::expr() before evaluating them. That means you can use the quasiquotation operator !! in your commands, and make() will evaluate them according to the tidy evaluation paradigm.
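What !! does inside rlang::expr() can be seen in isolation. The sketch assumes the rlang package is installed. The variable n and the command are placeholders.

```r
library(rlang)

n <- 5
# !! splices the current value of n into the expression
# before it is ever evaluated.
cmd <- expr(rnorm(!!n))

deparse(cmd)  # "rnorm(5)"
```

Without !!, the expression would keep the symbol n, and its value would be looked up only when the command runs.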
Restructure drake_example("basic"), drake_example("gsp"), and drake_example("packages") to demonstrate how to set up the files for serious drake projects. More guidance was needed in light of this issue.
Improve the examples of drake_plan() in the help file (?drake_plan).

Version 5.0.0

Transfer drake to rOpenSci: https://github.com/ropensci/drake
Several functions now require an explicit config argument, which you can get from drake_config() or make(). Examples:
- outdated()
- missed()
- rate_limiting_times()
- predict_runtime()
- vis_drake_graph()
- dataframes_graph()
Always process all the imports before building any targets. This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
Major speed improvement: dispense with internal inventories and rely on cache$exists() instead.
Let the user define a trigger for each target to customize when make() decides to build targets.
Document triggers and other debugging/testing tools in the new debug vignette.
Restructure the internals of the storr cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of storr namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.4.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
Use formatR::tidy_source() instead of parse() in tidy_command() (originally tidy() in R/dependencies.R). Previously, drake was having problems with an edge case: as a command, the literal string "A" was interpreted as the symbol A after tidying. With tidy_source(), literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
Speed up clean() by refactoring the cache inventory and using light parallelism.
Implement rescue_cache(), exposed to the user and used in clean(). This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
Change the default cpu and elapsed arguments of make() to NULL. This solves an elusive bug in how drake imposes timeouts.
Allow users to set target-level timeouts (overall, cpu, and elapsed) with columns in the workflow plan data frame.
Document timeouts and retries in the new debug vignette.
Add a new graph argument to functions make(), outdated(), and missed().
Export a new prune_graph() function for igraph objects.
Delete long-deprecated functions prune() and status().
Deprecate and rename functions:
- analyses() => plan_analyses()
- as_file() => as_drake_filename()
- backend() => future::plan()
- build_graph() => build_drake_graph()
- check() => check_plan()
- config() => drake_config()
- evaluate() => evaluate_plan()
- example_drake() => drake_example()
- examples_drake() => drake_examples()
- expand() => expand_plan()
- gather() => gather_plan()
- plan(), workflow(), workplan() => drake_plan()
- plot_graph() => vis_drake_graph()
- read_config() => read_drake_config()
- read_graph() => read_drake_graph()
- read_plan() => read_drake_plan()
- render_graph() => render_drake_graph()
- session() => drake_session()
- summaries() => plan_summaries()
Disallow output and code as names in the workflow plan data frame. Use target and command instead. This naming switch has been formally deprecated for several months prior.
Deprecate the ..analysis.. and ..dataset.. wildcards in favor of analysis__ and dataset__, respectively. The new wildcards are stylistically better and pass linting checks.
Add new functions drake_quotes(), drake_unquote(), and drake_strings() to remove the silly dependence on the eply package.
Add a skip_safety_checks flag to make() and drake_config(). Increases speed.
In sanitize_plan(), remove rows with blank targets "".
Add a purge argument to clean() to optionally remove all target-level information.
Add a namespace argument to cached() so users can inspect individual storr namespaces.
Change verbose to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everything.
Add a new next_stage() function to report the targets to be made in the next parallelizable stage.
Add a new session_info argument to make(). Apparently, sessionInfo() is a bottleneck for small make()s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
Add a new log_progress argument to make() to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
Add an optional namespace argument to loadd() and readd(). You can now load and read from non-default storr namespaces.
Add drake_cache_log(), drake_cache_log_file(), and make(..., cache_log_file = TRUE) as options to track changes to targets/imports in the drake cache.
Detect knitr code chunk dependencies in response to commands with rmarkdown::render(), not just knit().
Add a new general best practices vignette to clear up misconceptions about how to use drake properly.

Files

Name: ropensci/drake-v5.1.0.zip
Size: 6.6 MB
md5: 1df7dd9f16aa0d406953e782f5fde9e6

Additional details

Is supplement to: https://github.com/ropensci/drake/tree/v5.1.0 (URL)
