Functions to set up optimisers (which find parameters that maximise the joint density of a model) and change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the SciPy optimiser docs or the TensorFlow optimiser docs.
nelder_mead()

powell()

cg()

bfgs()

newton_cg()

l_bfgs_b(maxcor = 10, maxls = 20)

tnc(max_cg_it = -1, stepmx = 0, rescale = -1)

cobyla(rhobeg = 1)

slsqp()

gradient_descent(learning_rate = 0.01)

adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)

adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)

adagrad_da(learning_rate = 0.8, global_step = 1L,
  initial_gradient_squared_accumulator_value = 0.1,
  l1_regularization_strength = 0, l2_regularization_strength = 0)

momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)

adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)

ftrl(learning_rate = 1, learning_rate_power = -0.5,
  initial_accumulator_value = 0.1, l1_regularization_strength = 0,
  l2_regularization_strength = 0)

proximal_gradient_descent(learning_rate = 0.01,
  l1_regularization_strength = 0, l2_regularization_strength = 0)

proximal_adagrad(learning_rate = 1, initial_accumulator_value = 0.1,
  l1_regularization_strength = 0, l2_regularization_strength = 0)

rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0, epsilon = 1e-10)
Argument | Description
---|---
maxcor | maximum number of 'variable metric corrections' used to define the approximation to the Hessian matrix
maxls | maximum number of line search steps per iteration
max_cg_it | maximum number of Hessian-vector product evaluations per iteration
stepmx | maximum step for the line search
rescale | log10 scaling factor used to trigger rescaling of the objective
rhobeg | reasonable initial changes to the variables
learning_rate | the size of steps (in parameter space) towards the optimal value
rho | the decay rate
epsilon | a small constant used to condition gradient updates
initial_accumulator_value | initial value of the 'accumulator' used to tune the algorithm
global_step | the current training step number
initial_gradient_squared_accumulator_value | initial value of the accumulators used to tune the algorithm
l1_regularization_strength | L1 regularisation coefficient (must be 0 or greater)
l2_regularization_strength | L2 regularisation coefficient (must be 0 or greater)
momentum | the momentum of the algorithm
use_nesterov | whether to use Nesterov momentum
beta1 | exponential decay rate for the 1st moment estimates
beta2 | exponential decay rate for the 2nd moment estimates
learning_rate_power | power on the learning rate (must be 0 or less)
decay | discounting factor for the gradient
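As a rough illustration of how these tuning parameters are used, the sketch below constructs a few optimiser objects with non-default settings. The constructor names and argument names are those listed above; the particular values are arbitrary illustrations, not recommendations.

library(greta)

# override tuning parameters when constructing the optimiser
# (values here are arbitrary illustrations, not recommended settings)
op_lbfgs <- l_bfgs_b(maxcor = 20, maxls = 40)
op_adam  <- adam(learning_rate = 0.01, beta1 = 0.9, beta2 = 0.999)
op_rms   <- rms_prop(learning_rate = 0.05, decay = 0.9, momentum = 0.5)

# each object is then supplied to opt() via its 'optimiser' argument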
An optimiser object that can be passed to opt().
The cobyla() optimiser does not provide information about the number of iterations or about convergence, so these elements of the output are set to NA.
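As a rough sketch of what that means in practice (reusing the model m defined in the example below, and assuming the output elements are named iterations and convergence, which this page does not confirm):

# a sketch only: element names other than 'par' are assumptions
res <- opt(m, optimiser = cobyla())
res$par          # parameter estimates are still returned
res$iterations   # assumed element name; NA for cobyla()
res$convergence  # assumed element name; NA for cobyla()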
# NOT RUN {
# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# configure optimisers & parameters via 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())

# compare results with the analytic solution
opt_res$par
c(mean(x), sd(x))
# }
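The same model can also be fitted with one of the TensorFlow optimisers by changing only the optimiser argument. This is a sketch only: the learning rate below is an arbitrary illustration, and a first-order optimiser may need more iterations than a second-order method to reach a comparable result.

# swap in a first-order TensorFlow optimiser; learning_rate is illustrative
opt_adam <- opt(m, optimiser = adam(learning_rate = 0.05))
opt_adam$par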