opt#

Generic implementations of optimization routines.

Generic implementations of optimization algorithms, such as conjugate gradient and line search, that can be reused between domain-specific modules. In the future, this module may be replaced by the Operator Discretization Library (ODL) solvers library.

tike.opt.adadelta(g, d0=None, v=None, m=None, decay=0.9, eps=1e-06)[source]#

Return the adadelta algorithm direction.

Used to provide a better search direction to stochastic gradient descent.

Parameters
  • g (vector) – The current gradient.

  • d0 (vector) – The previous search direction.

  • v (vector) – The adadelta gradient weights.

  • m (vector) – The adadelta direction weights.

  • decay (float [0, 1)) – A factor which determines how quickly information from previous steps decays.

  • eps (float) – A tiny constant to prevent zero division.

Returns

  • d (vector) – The new search direction.

  • v (vector) – The new gradient weights.

References

Zeiler, Matthew D. “Adadelta: an adaptive learning rate method.” arXiv preprint arXiv:1212.5701 (2012).
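
For reference, the update rule from Zeiler (2012) can be sketched in plain NumPy as below. This is the textbook Adadelta rule, not necessarily the exact implementation of this function; the name adadelta_sketch and the handling of the accumulators are illustrative.

import numpy as np

def adadelta_sketch(g, v, m, decay=0.9, eps=1e-6):
    """Textbook Adadelta update (Zeiler 2012); an illustrative sketch only."""
    v = decay * v + (1 - decay) * g * g            # running average of squared gradients
    d = -np.sqrt(m + eps) / np.sqrt(v + eps) * g   # scaled search direction
    m = decay * m + (1 - decay) * d * d            # running average of squared updates
    return d, v, m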

tike.opt.adagrad(g, v=None, m=None, eps=1e-06)[source]#

Return the adaptive gradient algorithm direction.

Used to provide a better search direction to stochastic gradient descent.

Parameters
  • g (vector) – The current gradient.

  • v (vector) – The adagrad gradient weights.

  • eps (float) – A tiny constant to prevent zero division.

  • m (Union[None, numpy.typing.NDArray]) –

Returns

  • d (vector) – The new search direction.

  • v (vector) – The new gradient weights.

References

Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization.” Journal of machine learning research 12, no. 7 (2011).
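
For reference, the textbook AdaGrad rule from Duchi et al. (2011) looks like the sketch below; it is illustrative rather than the exact implementation of this function.

import numpy as np

def adagrad_sketch(g, v, eps=1e-6):
    """Textbook AdaGrad update (Duchi et al. 2011); an illustrative sketch only."""
    v = v + g * g                  # accumulate squared gradients
    d = -g / (np.sqrt(v) + eps)    # per-element scaled search direction
    return d, v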

tike.opt.adam(g, v=None, m=None, vdecay=0.999, mdecay=0.9, eps=1e-08)[source]#

Return the adaptive moment estimation direction.

Used to provide a better search direction to stochastic gradient descent.

Parameters
  • g (vector) – The current gradient.

  • v (vector) – Second moment estimate.

  • m (vector) – First moment estimate.

  • vdecay (float [0, 1)) – The decay rate of the second moment estimate; determines how quickly information from previous steps decays.

  • mdecay (float [0, 1)) – The decay rate of the first moment estimate; determines how quickly information from previous steps decays.

  • eps (float) – A tiny constant to prevent zero division.

Returns

  • d (vector) – The new search direction.

  • v (vector) – The new gradient weights.

  • m (vector) – The new momentum weights.

Return type

Tuple[numpy.typing.NDArray, numpy.typing.NDArray, numpy.typing.NDArray]

References

Kingma, Diederik P., and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014).
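
For reference, the textbook Adam update from Kingma and Ba (2014) is sketched below, including bias correction; the actual implementation may handle the step count and bias correction differently.

import numpy as np

def adam_sketch(g, v, m, t, vdecay=0.999, mdecay=0.9, eps=1e-8):
    """Textbook Adam update (Kingma & Ba 2014); t is the 1-based step count."""
    m = mdecay * m + (1 - mdecay) * g        # first moment estimate
    v = vdecay * v + (1 - vdecay) * g * g    # second moment estimate
    mhat = m / (1 - mdecay**t)               # bias-corrected first moment
    vhat = v / (1 - vdecay**t)               # bias-corrected second moment
    d = -mhat / (np.sqrt(vhat) + eps)        # search direction
    return d, v, m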

tike.opt.batch_indicies(n, m=1, use_random=True)[source]#

Return list of indices [0…n) as m groups.

>>> batch_indicies(10, 3)
[array([2, 4, 7, 3]), array([1, 8, 9]), array([6, 5, 0])]
tike.opt.conjugate_gradient(array_module, x, cost_function, grad, direction_dy=<function direction_dy>, dir_multi=<function dir_single>, update_multi=<function update_single>, num_iter=1, step_length=1, num_search=None, cost=None)[source]#

Use conjugate gradient to estimate x.

Parameters
  • array_module (module) – The Python module that will provide array operations.

  • x (array_like) – The object to be recovered.

  • cost_function (func(x) -> float) – The function being minimized to recover x.

  • grad (func(x) -> array_like) – The gradient of cost_function.

  • direction_dy (func(xp, grad1, grad0, dir_) -> array_like) – A function that returns a new search direction from the current and previous gradients; defaults to the Dai-Yuan direction.

  • dir_multi (func(x) -> list_of_array) – A function that returns the search direction on all GPUs.

  • update_multi (func(x) -> list_of_array) – A function that returns the updated subimages on all GPUs.

  • num_iter (int) – The number of steps to take.

  • num_search (int) – The number of iterations during which to perform the line search.

  • step_length (float) – The initial multiplier of the search direction.

  • cost (float) – The current loss function estimate.
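
A hedged usage sketch on a small quadratic cost is shown below. The keyword names follow the signature above; the return value is not documented here, so the final assignment is an assumption and should be checked against the source.

import numpy as np
import tike.opt

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def cost(x):
    # 0.5 x^T A x - b^T x is minimized where A x = b.
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

x0 = np.zeros(2)
# The assignment of the result is an assumption; inspect the source for the exact contract.
result = tike.opt.conjugate_gradient(np, x=x0, cost_function=cost, grad=grad, num_iter=16)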

tike.opt.dir_single(x)[source]#
tike.opt.direction_dy(xp, grad1, grad0=None, dir_=None)[source]#

Return the Dai-Yuan search direction.

Parameters
  • xp (module) – The Python module that will provide array operations.

  • grad1 (array_like) – The gradient from this step.

  • grad0 (array_like) – The gradient from the previous step.

  • dir_ (array_like) – The previous search direction.
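
For reference, the Dai-Yuan direction can be sketched as below; this is the textbook formula, and the actual implementation may add safeguards against division by zero.

import numpy as np

def direction_dy_sketch(grad1, grad0=None, dir_=None):
    """Textbook Dai-Yuan conjugate gradient direction; an illustrative sketch only."""
    if grad0 is None or dir_ is None:
        return -grad1                                            # first step: steepest descent
    beta = np.vdot(grad1, grad1) / np.vdot(dir_, grad1 - grad0)  # Dai-Yuan coefficient
    return -grad1 + beta * dir_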

tike.opt.fit_line_least_squares(y, x)[source]#

Return the slope, intercept pair that best fits y, x to a line.

y = slope * x + intercept

Parameters
  • y (List[float]) – The dependent variable values.

  • x (List[float]) – The independent variable values.

Return type

Tuple[float, float]
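
A short usage example; the slope, intercept ordering follows the description above.

import tike.opt

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2 * x + 1
slope, intercept = tike.opt.fit_line_least_squares(y=y, x=x)
print(slope, intercept)   # approximately 2.0 and 1.0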

tike.opt.get_batch(x, b, n)[source]#

Returns x[:, b[n]]; for use with map().

tike.opt.is_converged(algorithm_options)[source]#

Return True if the cost slope is non-negative within the window.

Every half-window, fit a line to the last window of cost values (averaged cost values if mini-batch) and check its slope. If this slope is non-negative, return True; otherwise return False.

This is a smoothed absolute-differences convergence criterion because it considers the change between consecutive cost values (absolute differences) per epoch.
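
The core of the criterion can be sketched as below; this illustration omits the half-window cadence and any per-epoch averaging, and it assumes a plain list of cost values rather than the algorithm_options object.

import numpy as np

def is_converged_sketch(costs, window):
    """Illustrative version of the smoothed convergence test described above."""
    if len(costs) < window:
        return False
    recent = np.array(costs[-window:], dtype=float)
    # Slope of the least-squares line through the last `window` cost values.
    slope, _ = np.polyfit(np.arange(window), recent, 1)
    return slope >= 0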

tike.opt.line_search(f, x, d, step_length, step_shrink, cost)[source]#

Return a new step_length found using a backtracking line search.

Parameters
  • f (function(x)) – The function being optimized.

  • x (vector) – The current position.

  • d (vector) – The search direction.

  • step_length (float) – The initial step_length.

  • step_shrink (float) – Decrease the step_length by this fraction at each iteration.

  • cost (float) – f(x) if it is already known.

Returns

  • step_length (float) – The optimal step length along d.

  • cost (float) – The new value of the cost function after stepping along d.

  • x (vector) – The new value of x after stepping along d.

References

https://en.wikipedia.org/wiki/Backtracking_line_search
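
A minimal sketch of the backtracking strategy described above; it uses a simple decrease test rather than the full Armijo condition, and the stopping threshold is illustrative.

def line_search_sketch(f, x, d, step_length=1.0, step_shrink=0.5, cost=None):
    """Illustrative backtracking line search; shrink the step until the cost decreases."""
    if cost is None:
        cost = f(x)
    while step_length > 1e-32:
        x_new = x + step_length * d
        cost_new = f(x_new)
        if cost_new < cost:           # simple decrease test; Armijo adds a slope term
            return step_length, cost_new, x_new
        step_length *= step_shrink    # shrink the step and try again
    return step_length, cost, x       # no improving step found; keep the current point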

tike.opt.momentum(g, v, m, vdecay=None, mdecay=0.9)[source]#

Add momentum to the gradient direction.

Parameters
  • g (vector) – The current gradient.

  • m (vector) – The momentum.

  • mdecay (float [0, 1)) – A factor which determines how quickly information from previous steps decays.
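
One common formulation of the momentum update is sketched below; it is illustrative and may not match the exact weighting used by this function.

def momentum_sketch(g, m, mdecay=0.9):
    """Illustrative momentum update: blend the current gradient into a running average."""
    m = mdecay * m + (1 - mdecay) * g    # exponential moving average of gradients
    d = m                                # use the averaged gradient as the search direction
    return d, m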

tike.opt.put_batch(y, x, b, n)[source]#

Assigns y into x[:, b[n]]; for use with map().
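
These helpers are intended to be used together with batch_indicies; a hedged round-trip sketch, relying only on the x[:, b[n]] convention documented above, follows.

import numpy as np
import tike.opt

x = np.arange(20.0).reshape(2, 10)          # data with the batch axis along dimension 1
batches = tike.opt.batch_indicies(10, m=3)  # three index groups covering [0, 10)

chunk = tike.opt.get_batch(x, batches, 0)   # equivalent to x[:, batches[0]]
chunk = chunk * 2                           # ...process the mini-batch...
tike.opt.put_batch(chunk, x, batches, 0)    # write the processed values back into x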

tike.opt.update_single(x, step_length, d)[source]#