# lagom

## Agent

class lagom.BaseAgent(config, env, device, **kwargs)[source]

Base class for all agents.

The agent could select an action from some data (e.g. observation) and update itself by defining a certain learning mechanism.

Any agent should subclass this class, e.g. policy-based or value-based.

Parameters:
• config (dict) – a dictionary of configurations
• env (Env) – environment object
• device (Device) – a PyTorch device
• **kwargs – keyword arguments used to specify the agent
choose_action(x, **kwargs)[source]

Returns the selected action given the data.

Note

It’s recommended to handle all dtype/device conversions between CPU/GPU or Tensor/Numpy here.

The output is a dictionary containing useful items.

Parameters:
• obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
• **kwargs – keyword arguments to specify action selection.

Returns: a dictionary of action selection output. It contains all useful information (e.g. action, action_logprob, state_value). This allows the API to be generic and compatible with different kinds of runners and agents.

Return type: dict
learn(D, **kwargs)[source]

Defines the learning mechanism to update the agent from batched data.

Parameters:
• D (list) – a list of batched data to train the agent, e.g. in policy gradient, this can be a list of Trajectory.
• **kwargs – keyword arguments to specify the learning mechanism.

Returns: a dictionary of learning output. This could contain the loss and other useful metrics.

Return type: dict
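To make this contract concrete, here is a minimal stand-alone sketch of an agent following the choose_action/learn interface. It does not use lagom itself; the class, the 'actions' config key, and the 'raw_action' output key are hypothetical illustrations, not lagom's actual names:

```python
import random

class ToyAgent:
    """A toy agent mirroring the BaseAgent interface (hypothetical stand-in)."""

    def __init__(self, config, env=None, device=None, **kwargs):
        self.config = config
        self.actions = config['actions']  # e.g. a discrete action set

    def choose_action(self, obs, **kwargs):
        # Return a dict so the API stays generic: runners can pick out
        # whatever fields they need (action, log-prob, state value, ...).
        return {'raw_action': [random.choice(self.actions) for _ in obs]}

    def learn(self, D, **kwargs):
        # A real agent would compute a loss over the batched data D and
        # update its parameters; here we only report a dummy metric.
        return {'loss': 0.0, 'num_batches': len(D)}

agent = ToyAgent({'actions': [0, 1]})
out = agent.choose_action([None, None, None])  # batch of 3 observations
assert len(out['raw_action']) == 3
```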
class lagom.RandomAgent(config, env, device, **kwargs)[source]

A random agent samples action uniformly from action space.

choose_action(x, **kwargs)[source]

Returns the selected action given the data.

Note

It’s recommended to handle all dtype/device conversions between CPU/GPU or Tensor/Numpy here.

The output is a dictionary containing useful items.

Parameters:
• obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
• **kwargs – keyword arguments to specify action selection.

Returns: a dictionary of action selection output. It contains all useful information (e.g. action, action_logprob, state_value). This allows the API to be generic and compatible with different kinds of runners and agents.

Return type: dict
learn(D, **kwargs)[source]

Defines the learning mechanism to update the agent from batched data.

Parameters:
• D (list) – a list of batched data to train the agent, e.g. in policy gradient, this can be a list of Trajectory.
• **kwargs – keyword arguments to specify the learning mechanism.

Returns: a dictionary of learning output. This could contain the loss and other useful metrics.

Return type: dict

## Data

class lagom.StepType[source]

An enumeration.

class lagom.TimeStep(step_type: lagom.data.StepType, observation: object, reward: float, done: bool, info: dict)[source]
class lagom.Trajectory[source]
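TimeStep is a small data container for a single environment transition. As an illustration, here is a stand-alone dataclass mirroring the documented signature; note that the StepType members shown are assumptions, since the enumeration's members are not listed in this reference:

```python
from dataclasses import dataclass
from enum import Enum

class StepType(Enum):
    # Hypothetical members; the real enumeration's members are not
    # documented here.
    FIRST = 0
    MID = 1
    LAST = 2

@dataclass
class TimeStep:
    # Fields copied from the documented constructor signature.
    step_type: StepType
    observation: object
    reward: float
    done: bool
    info: dict

step = TimeStep(StepType.LAST, observation=[0.0, 1.0], reward=1.0,
                done=True, info={})
assert step.done and step.reward == 1.0
```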

## Logger

class lagom.Logger[source]

Log the information in a dictionary.

If a key is logged more than once, then the new value will be appended to a list.

Note

It uses pickle to serialize the data. Empirically, pickle is 2x faster than numpy.save; other alternatives such as YAML are too slow, and JSON does not support numpy arrays.

Warning

Storing hierarchical structures (e.g. a list of dicts of lists of ndarrays) is discouraged, because pickling such complex and large data structures is extremely slow. Put dictionaries only at the topmost level; large numpy arrays should be saved separately.

Example:

• Default:

>>> logger = Logger()
>>> logger('iteration', 1)
>>> logger('train_loss', 0.12)
>>> logger('iteration', 2)
>>> logger('train_loss', 0.11)
>>> logger('iteration', 3)
>>> logger('train_loss', 0.09)

>>> logger
OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])])

>>> logger.dump()
Iteration: [1, 2, 3]
Train Loss: [0.12, 0.11, 0.09]

• With indentation:

>>> logger.dump(indent=1)
Iteration: [1, 2, 3]
Train Loss: [0.12, 0.11, 0.09]

• With specific keys:

>>> logger.dump(keys=['iteration'])
Iteration: [1, 2, 3]

• With specific index:

>>> logger.dump(index=0)
Iteration: 1
Train Loss: 0.12

• With specific list of indices:

>>> logger.dump(index=[0, 2])
Iteration: [1, 3]
Train Loss: [0.12, 0.09]

__call__(key, value)[source]

Log the information with given key and value.

Note

The key should be semantic, with words separated by underscores (_).

Parameters:
• key (str) – key of the information
• value (object) – value to be logged
clear()[source]

Remove all loggings in the dictionary.

dump(keys=None, index=None, indent=0, border='')[source]

Dump the loggings to the screen.

Parameters:
• keys (list, optional) – a list of selected keys. If None, use all keys. Default: None
• index (int/list, optional) – the index of logged values. Scalar: a specific index (-1 means the last element). List: a list of indices. None: all indices. Default: None
• indent (int, optional) – the number of tab indentations. Default: 0
• border (str, optional) – the string to print as header and footer. Default: ''
save(f)[source]

Save loggings to a file.

Parameters:
• f (str) – file path
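To illustrate the append-on-repeated-key and pickle-based save behaviors described above, here is a stripped-down stand-in (not lagom's Logger itself; the class name and internals are hypothetical):

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class MiniLogger:
    """A stripped-down stand-in for lagom.Logger (hypothetical)."""
    def __init__(self):
        self.logs = OrderedDict()

    def __call__(self, key, value):
        # A key logged more than once appends to a list, as documented.
        self.logs.setdefault(key, []).append(value)

    def save(self, f):
        # Serialize with pickle, as the documented Logger does.
        with open(f, 'wb') as fp:
            pickle.dump(self.logs, fp)

logger = MiniLogger()
logger('train_loss', 0.12)
logger('train_loss', 0.11)

path = os.path.join(tempfile.mkdtemp(), 'logs.pkl')
logger.save(path)
with open(path, 'rb') as fp:
    restored = pickle.load(fp)
assert restored['train_loss'] == [0.12, 0.11]
```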

## Engine

class lagom.BaseEngine(config, **kwargs)[source]

Base class for all engines.

It defines the training and evaluation process.

eval(n=None, **kwargs)[source]

Evaluation process for one iteration.

Note

It is recommended to use Logger to store loggings.

Note

All parameterized modules should be called .eval() to specify evaluation mode.

Parameters:
• n (int, optional) – n-th iteration of evaluation.
• **kwargs – keyword arguments used for logging.

Returns: a dictionary of evaluation output

Return type: dict
train(n=None, **kwargs)[source]

Training process for one iteration.

Note

It is recommended to use Logger to store loggings.

Note

All parameterized modules should be called .train() to specify training mode.

Parameters:
• n (int, optional) – n-th iteration of training.
• **kwargs – keyword arguments used for logging.

Returns: a dictionary of training output

Return type: dict
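A minimal sketch of the train/eval contract; the class body and the returned dictionary keys are hypothetical (a real engine would roll out the agent, switch modules between .train() and .eval() mode, and log with a Logger):

```python
class MiniEngine:
    """Sketch of the BaseEngine contract (hypothetical stand-in)."""
    def __init__(self, config, **kwargs):
        self.config = config

    def train(self, n=None, **kwargs):
        # One training iteration: collect data, update the agent,
        # and return a dictionary of training output.
        return {'train/iteration': n, 'train/loss': 0.5 / (n + 1)}

    def eval(self, n=None, **kwargs):
        # One evaluation iteration: roll out the agent in eval mode
        # and return a dictionary of evaluation output.
        return {'eval/iteration': n, 'eval/return': 10.0 * (n + 1)}

engine = MiniEngine(config={})
for i in range(3):
    train_out = engine.train(n=i)
    eval_out = engine.eval(n=i)
assert train_out['train/iteration'] == 2
```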

## Runner

class lagom.BaseRunner[source]

Base class for all runners.

A runner is a data collection interface between the agent and the environment.

__call__(agent, env, **kwargs)[source]

Defines data collection via interactions between the agent and the environment.

Parameters:
• agent (BaseAgent) – agent
• env (Env) – environment
• **kwargs – keyword arguments for more specifications.
class lagom.EpisodeRunner[source]
class lagom.StepRunner(reset_on_call=True)[source]
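The runner contract can be sketched with a toy agent and environment. Everything below is a hypothetical stand-in (lagom's EpisodeRunner/StepRunner are not used), showing a step-based collection loop that resets the environment when an episode ends:

```python
class MiniRunner:
    """Collect T environment steps via agent/env interaction (hypothetical)."""
    def __call__(self, agent, env, T, **kwargs):
        D = []
        obs = env.reset()
        for _ in range(T):
            out = agent.choose_action([obs])  # batch of one observation
            action = out['raw_action'][0]
            obs, reward, done, info = env.step(action)
            D.append((action, reward, done))
            if done:
                obs = env.reset()
        return D

class ToyEnv:
    """Terminates every 3 steps (hypothetical)."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t % 3 == 0, {}

class ToyAgent:
    def choose_action(self, obs, **kwargs):
        return {'raw_action': [0 for _ in obs]}

D = MiniRunner()(ToyAgent(), ToyEnv(), T=5)
assert len(D) == 5
```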

## Evolution Strategies

class lagom.BaseES[source]

Base class for all evolution strategies.

Note

The optimization is treated as minimization, e.g. maximizing reward is equivalent to minimizing negative reward.

Note

For painless parallelization, we highly recommend using concurrent.futures.ProcessPoolExecutor, with a few practical tips:

• Set the max_workers argument to control the maximum parallelization capacity.
• When execution gets stuck, try wrapping the objective function with CloudpickleWrapper, particularly for lambdas and class methods.
• Enter the ProcessPoolExecutor context (with) once, around the entire loop of ES generations. Creating a new pool internally for each generation can slow down the parallelization dramatically due to overheads.
• To reduce overheads further (e.g. for PyTorch models and gym environments):
  • Recreating such models for each generation is very expensive.
  • Use an initializer function for ProcessPoolExecutor.
  • Within the initializer function, define PyTorch models and gym environments as global variables. Note that these globals are defined in each worker independently.
• Don't forget to use torch.no_grad() to increase forward-pass speed.
ask()[source]

Sample a set of new candidate solutions.

Returns: a list of sampled candidate solutions

Return type: list
result

Return a namedtuple of all results for the optimization.

It contains:
• xbest: best solution evaluated
• fbest: objective function value of the best solution
• evals_best: evaluation count when xbest was evaluated
• evaluations: number of evaluations overall
• iterations: number of iterations
• xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
• stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
• solutions (list/ndarray) – candidate solutions returned from ask()
• function_values (list) – a list of objective function values evaluated for the sampled solutions.
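The ask/tell loop can be illustrated with a tiny self-contained strategy. This is a simplistic cross-entropy-style sketch in pure Python, NOT lagom's BaseES or CMAES; note the minimization convention mentioned above:

```python
import random

class ToyES:
    """A tiny 1-D ask/tell strategy (hypothetical, cross-entropy-style)."""
    def __init__(self, x0, sigma0, popsize=20):
        self.mu, self.sigma, self.popsize = x0, sigma0, popsize

    def ask(self):
        # Sample a new population of candidate solutions around the mean.
        return [random.gauss(self.mu, self.sigma)
                for _ in range(self.popsize)]

    def tell(self, solutions, function_values):
        # Move the mean toward the elite (lowest-value) solutions;
        # the optimization is treated as MINIMIZATION.
        ranked = sorted(zip(function_values, solutions))
        elite = [x for _, x in ranked[:self.popsize // 4]]
        self.mu = sum(elite) / len(elite)
        self.sigma *= 0.9  # shrink the search distribution

random.seed(0)
es = ToyES(x0=0.0, sigma0=2.0)
for _ in range(30):
    solutions = es.ask()
    values = [(x - 3.0) ** 2 for x in solutions]  # minimize (x - 3)^2
    es.tell(solutions, values)
assert abs(es.mu - 3.0) < 0.5
```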
class lagom.CMAES(x0, sigma0, opts=None)[source]

Implements CMA-ES algorithm.

Note

It is a wrapper of the original CMA-ES implementation.

Parameters:
• x0 (list) – initial solution
• sigma0 (list) – initial standard deviation
• opts (dict) – a dictionary of options, e.g. [‘popsize’, ‘seed’]
ask()[source]

Sample a set of new candidate solutions.

Returns: a list of sampled candidate solutions

Return type: list
result

Return a namedtuple of all results for the optimization.

It contains:
• xbest: best solution evaluated
• fbest: objective function value of the best solution
• evals_best: evaluation count when xbest was evaluated
• evaluations: number of evaluations overall
• iterations: number of iterations
• xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
• stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
• solutions (list/ndarray) – candidate solutions returned from ask()
• function_values (list) – a list of objective function values evaluated for the sampled solutions.
class lagom.CEM(x0, sigma0, opts=None)[source]
ask()[source]

Sample a set of new candidate solutions.

Returns: a list of sampled candidate solutions

Return type: list
result

Return a namedtuple of all results for the optimization.

It contains:
• xbest: best solution evaluated
• fbest: objective function value of the best solution
• evals_best: evaluation count when xbest was evaluated
• evaluations: number of evaluations overall
• iterations: number of iterations
• xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
• stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
• solutions (list/ndarray) – candidate solutions returned from ask()
• function_values (list) – a list of objective function values evaluated for the sampled solutions.