Nesterov

class seli.opt.Nesterov(lr: float = 0.001, beta: float = 0.9)

Bases: Optimizer

Nesterov Accelerated Gradient optimizer.

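Improves on standard momentum by evaluating the gradient at a “lookahead” position (the point the current momentum would carry the parameters to) rather than at the current parameters. This correction typically damps oscillations and speeds up convergence on smooth objectives.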
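As a rough illustration of the lookahead idea, the sketch below writes out the classic Nesterov update in plain Python. It is not taken from seli's source: the function name `nesterov_minimize`, the `grad_fn` argument, and the toy objective are assumptions made for this example, while `lr` and `beta` are meant to play the same roles as the constructor arguments above. The arithmetic is written so that it should work on plain floats as well as NumPy or JAX arrays.

```python
def nesterov_minimize(grad_fn, x0, lr=0.001, beta=0.9, steps=100):
    """Lookahead form of Nesterov momentum (illustrative sketch only)."""
    x = x0
    v = 0.0 * x0  # velocity starts at zero, same shape as x
    for _ in range(steps):
        # Evaluate the gradient where momentum alone would carry x.
        lookahead = x - lr * beta * v
        g = grad_fn(lookahead)
        # Accumulate the velocity, then take the step.
        v = beta * v + g
        x = x - lr * v
    return x


# Toy objective f(x) = 0.5 * x**2, whose gradient is simply x.
x_final = nesterov_minimize(lambda x: x, x0=1.0, lr=0.1, beta=0.9, steps=200)
```

The only difference from standard momentum is the point at which `grad_fn` is called: standard momentum would use `grad_fn(x)` instead of `grad_fn(lookahead)`.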

Methods Summary

call_param(key, grad, **_)

Process the gradients of a single parameter.

Methods Documentation

call_param(key: str, grad: Float[Array, '*s'], **_) → Float[Array, '*s']

Process the gradients of a single parameter. This function is useful for implementing custom optimizers that apply the same update rule to every parameter independently, which is the case for most well-known optimizers.

Parameters:
  • loss (Float[Array, ""]) – The absolute loss value.

  • key (str) – The key of the parameter.

  • grad (Float[Array]) – The gradients of the parameter.

  • param (Float[Array]) – The parameter values.

Returns:

grad – The processed gradients of the parameter.

Return type:

Float[Array]
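To make the per-parameter pattern concrete, here is a minimal sketch of an optimizer that keeps one velocity per parameter key and processes each gradient through a `call_param`-style hook. The class `NesterovSketch` is hypothetical and is not seli's implementation. Because the hook only receives the gradient at the current parameters, the sketch assumes a common reformulation of Nesterov momentum (the one used, for example, by PyTorch's `SGD` with `nesterov=True`); it also assumes that the returned value is already scaled by `lr` and is subtracted from the parameter by the caller. Whether seli makes the same choices is not stated in this page.

```python
class NesterovSketch:
    """Hypothetical stand-in for illustration, NOT seli.opt.Nesterov."""

    def __init__(self, lr=0.001, beta=0.9):
        self.lr = lr
        self.beta = beta
        self.velocity = {}  # per-parameter state, keyed by parameter name

    def call_param(self, key, grad, **_):
        # Extra keyword arguments (e.g. loss, param) are accepted and ignored.
        v_prev = self.velocity.get(key, 0.0 * grad)
        # Update the running velocity for this parameter.
        v = self.beta * v_prev + grad
        self.velocity[key] = v
        # Nesterov correction: combine the fresh gradient with the new velocity.
        return self.lr * (grad + self.beta * v)


# Usage on the toy objective f(w) = 0.5 * w**2, whose gradient is w.
opt = NesterovSketch(lr=0.1, beta=0.9)
first_step = opt.call_param("w", grad=1.0)

opt = NesterovSketch(lr=0.1, beta=0.9)  # fresh state for the full run
w = 1.0
for _ in range(200):
    step = opt.call_param("w", grad=w, loss=0.5 * w * w, param=w)
    w = w - step  # assumption: the caller applies the processed gradient
```

The mutable dictionary keeps the example short; an optimizer meant to run under `jax.jit` would more likely carry its state as an explicit, immutable structure.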