lagom¶
Agent¶
-
class lagom.BaseAgent(config, env, device, **kwargs)[source]¶
Base class for all agents.
The agent can select an action from a given observation and update itself by defining a certain learning mechanism.
Any agent should subclass this class, e.g. policy-based or value-based agents.
Note
All agents should by default handle batched data, e.g. batched observations returned from a VecEnv and a batched action for each sub-environment of a VecEnv.
Parameters: - config (dict) – a dictionary of configurations
- env (VecEnv) – environment object
- device (Device) – a PyTorch device
- **kwargs – keyword arguments used to specify the agent
-
choose_action(obs, **kwargs)[source]¶
Returns a (batched) action selected by the agent from the received (batched) observation.
Note
Tensor conversion should be handled here, instead of in the policy or network forward pass.
The output is a dictionary containing useful items, e.g. action, action_logprob, state_value.
Parameters: - obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
- **kwargs – keyword arguments to specify action selection.
Returns: out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic API of the runner classes for all kinds of agents. Note that everything should be batched, even a scalar loss, i.e. scalar_loss -> [scalar_loss].
Return type: dict
-
learn(D, **kwargs)[source]¶
Defines the learning mechanism to update the agent from batched data.
Parameters: - D (list) – a list of batched data to train the agent, e.g. in policy gradient this can be a list of Trajectory or Segment
- **kwargs – keyword arguments to specify the learning mechanism
Returns: out – a dictionary of learning output. This could contain the loss.
Return type: dict
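The subclass contract above can be sketched in a few lines. EchoAgent and its trivial action rule are hypothetical stand-ins (BaseAgent, VecEnv, and a real policy are not reproduced here), kept self-contained for illustration:

```python
# A minimal, hypothetical agent following the BaseAgent contract:
# batched observation in, dictionary of batched items out.
class EchoAgent:
    def __init__(self, config, env=None, device=None, **kwargs):
        self.config = config

    def choose_action(self, obs, **kwargs):
        # The first dimension of `obs` is the batch dimension; every
        # output item stays batched, one entry per sub-environment.
        action = [0 for _ in obs]
        return {'action': action}

    def learn(self, D, **kwargs):
        # D is a list of batched data; return a dict of learning output.
        # Even a scalar loss is batched: scalar_loss -> [scalar_loss].
        return {'loss': [0.0]}

agent = EchoAgent(config={})
out = agent.choose_action(obs=[[0.1, 0.2], [0.3, 0.4]])
```

Because choose_action returns a plain dictionary, a runner can store whatever items an agent produces without knowing the agent's type.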
-
class lagom.RandomAgent(config, env, device, **kwargs)[source]¶
A random agent samples actions uniformly from the action space.
-
choose_action(obs, **kwargs)[source]¶
Returns a (batched) action selected by the agent from the received (batched) observation.
Note
Tensor conversion should be handled here, instead of in the policy or network forward pass.
The output is a dictionary containing useful items, e.g. action, action_logprob, state_value.
Parameters: - obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
- **kwargs – keyword arguments to specify action selection.
Returns: out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic API of the runner classes for all kinds of agents. Note that everything should be batched, even a scalar loss, i.e. scalar_loss -> [scalar_loss].
Return type: dict
-
learn(D, **kwargs)[source]¶
Defines the learning mechanism to update the agent from batched data.
Parameters: - D (list) – a list of batched data to train the agent, e.g. in policy gradient this can be a list of Trajectory or Segment
- **kwargs – keyword arguments to specify the learning mechanism
Returns: out – a dictionary of learning output. This could contain the loss.
Return type: dict
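RandomAgent's behavior can be sketched assuming a gym-like action space with a sample() method; TinySpace and random_choose_action are illustrative stand-ins, not part of lagom or gym:

```python
# A sketch of uniform random action selection over a batched observation.
import random

class TinySpace:
    """Hypothetical stand-in for a gym-like discrete action space."""
    def __init__(self, n):
        self.n = n
    def sample(self):
        return random.randrange(self.n)

def random_choose_action(action_space, obs):
    # One uniformly sampled action per sub-environment in the batch.
    return {'action': [action_space.sample() for _ in obs]}

random.seed(0)
space = TinySpace(4)
out = random_choose_action(space, obs=[[0.0], [1.0], [2.0]])
```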
Logger¶
-
class lagom.Logger[source]¶
Log information in a dictionary.
If a key is logged more than once, the new value is appended to a list.
Note
It uses pickle to serialize the data. Empirically, pickle is 2x faster than numpy.save; other alternatives like yaml are too slow, and JSON does not support numpy arrays.
Warning
It is discouraged to store hierarchical structures, e.g. a list of dicts of lists of ndarrays, because pickling such a complex and large data structure is extremely slow. Put dictionaries only at the topmost level. Large numpy arrays should be saved separately.
Example:
Default:
>>> logger = Logger()
>>> logger('iteration', 1)
>>> logger('train_loss', 0.12)
>>> logger('iteration', 2)
>>> logger('train_loss', 0.11)
>>> logger('iteration', 3)
>>> logger('train_loss', 0.09)
>>> logger
OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])])
>>> logger.dump()
Iteration: [1, 2, 3]
Train Loss: [0.12, 0.11, 0.09]
With indentation:
>>> logger.dump(indent=1)
    Iteration: [1, 2, 3]
    Train Loss: [0.12, 0.11, 0.09]
With specific keys:
>>> logger.dump(keys=['iteration'])
Iteration: [1, 2, 3]
With specific index:
>>> logger.dump(index=0)
Iteration: 1
Train Loss: 0.12
With specific list of indices:
>>> logger.dump(index=[0, 2])
Iteration: [1, 3]
Train Loss: [0.12, 0.09]
-
__call__(key, value)[source]¶
Log the information with the given key and value.
Note
The key should be semantic, with each word separated by _.
Parameters: - key (str) – key of the information
- value (object) – value to be logged
-
dump(keys=None, index=None, indent=0, border='')[source]¶
Dump the loggings to the screen.
Parameters: - keys (list, optional) – a list of selected keys. If None, then use all keys. Default: None
- index (int/list, optional) – the index of logged values. It has the following use cases:
  scalar: a specific index. If -1, then use the last element.
  list: a list of indices.
  None: all indices.
- indent (int, optional) – the number of tab indentations. Default: 0
- border (str, optional) – the string to print as header and footer
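The append-on-repeat semantics described above can be sketched in a few lines. MiniLogger is a simplified stand-in, not lagom's Logger (it returns the dumped lines instead of printing, and omits indent/border):

```python
# A minimal sketch of the logging semantics: repeated keys append to a
# list, and dump(index=...) selects logged values.
from collections import OrderedDict

class MiniLogger(OrderedDict):
    def __call__(self, key, value):
        # If a key is logged more than once, append the new value.
        self.setdefault(key, []).append(value)

    def dump(self, keys=None, index=None):
        keys = list(self) if keys is None else keys
        lines = []
        for key in keys:
            values = self[key] if index is None else self[key][index]
            # 'train_loss' -> 'Train Loss', matching the dumped style
            name = ' '.join(w.capitalize() for w in key.split('_'))
            lines.append(f'{name}: {values}')
        return lines

logger = MiniLogger()
logger('iteration', 1)
logger('train_loss', 0.12)
logger('iteration', 2)
lines = logger.dump()
```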
Engine¶
-
class lagom.BaseEngine(config, **kwargs)[source]¶
Base class for all engines.
It defines the training and evaluation processes.
-
eval(n=None, **kwargs)[source]¶
Evaluation process for one iteration.
Note
It is recommended to use Logger to store loggings.
Note
All parameterized modules should have .eval() called on them to specify evaluation mode.
Parameters: - n (int, optional) – n-th iteration for evaluation
- **kwargs – keyword arguments used for logging
Returns: out – evaluation output
Return type: dict
-
train(n=None, **kwargs)[source]¶
Training process for one iteration.
Note
It is recommended to use Logger to store loggings.
Note
All parameterized modules should have .train() called on them to specify training mode.
Parameters: - n (int, optional) – n-th iteration for training
- **kwargs – keyword arguments used for logging
Returns: out – training output
Return type: dict
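The way an engine's train/eval pair is typically driven can be sketched as follows; TinyEngine and its config keys are hypothetical stand-ins, not lagom's BaseEngine:

```python
# A toy engine exposing per-iteration train/eval, driven by an outer loop.
class TinyEngine:
    def __init__(self, config, **kwargs):
        self.config = config

    def train(self, n=None, **kwargs):
        # In a real engine: set modules to .train(), collect data, update.
        return {'train_iteration': n, 'loss': [0.5 / (n + 1)]}

    def eval(self, n=None, **kwargs):
        # In a real engine: set modules to .eval() and roll out the policy.
        return {'eval_iteration': n}

engine = TinyEngine(config={'train.iter': 3})
train_logs, eval_logs = [], []
for n in range(engine.config['train.iter']):
    train_logs.append(engine.train(n))
    eval_logs.append(engine.eval(n))
```

Each call returns a dictionary, so the outer loop can hand the outputs straight to a Logger.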
Evolution Strategies¶
-
class lagom.BaseES[source]¶
Base class for all evolution strategies.
Note
The optimization is treated as minimization, e.g. maximizing rewards is equivalent to minimizing negative rewards.
Note
For painless parallelization, we highly recommend using concurrent.futures.ProcessPoolExecutor, with a few practical tips:
- Set the max_workers argument to control the maximum parallelization capacity.
- When execution gets stuck, try wrapping the objective function with CloudpickleWrapper, particularly for lambdas and class methods.
- Use with ProcessPoolExecutor once to wrap the entire loop of iterative ES generations. Using it internally for each generation can slow down the parallelization dramatically due to overheads.
- To reduce overheads further (e.g. PyTorch models, gym environments):
  - Recreating such models for each generation would be very expensive.
  - Use an initializer function for ProcessPoolExecutor.
  - Within the initializer function, define PyTorch models and gym environments as global variables. Note that the global variables are defined in each worker independently.
  - Don't forget to use with torch.no_grad to increase forward pass speed.
-
ask()[source]¶
Sample a set of new candidate solutions.
Returns: solutions – sampled candidate solutions
Return type: list
-
result¶
Return a namedtuple of all results for the optimization. It contains:
- xbest: best solution evaluated
- fbest: objective function value of the best solution
- evals_best: evaluation count when xbest was evaluated
- evaluations: evaluations overall done
- iterations: number of iterations
- xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
- stds: effective standard deviations
-
tell(solutions, function_values)[source]¶
Update the parameters of the population for a new generation, based on the values of the objective function evaluated for the sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions
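The ask/tell loop can be sketched with a toy Gaussian strategy; ToyES is illustrative only, not one of lagom's implementations:

```python
# A toy evolution strategy following the ask/tell interface; minimization,
# as in BaseES: the mean moves toward the elite half of each generation.
import random

class ToyES:
    def __init__(self, x0, sigma0, popsize=20):
        self.mean = list(x0)
        self.sigma = sigma0
        self.popsize = popsize

    def ask(self):
        # Sample a set of new candidate solutions around the current mean.
        return [[m + self.sigma * random.gauss(0, 1) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, solutions, function_values):
        # Minimization: keep the half with the lowest objective values,
        # move the mean to their average, then shrink the search spread.
        ranked = [s for _, s in sorted(zip(function_values, solutions))]
        elite = ranked[:self.popsize // 2]
        self.mean = [sum(x) / len(elite) for x in zip(*elite)]
        self.sigma *= 0.95

random.seed(0)
es = ToyES(x0=[2.0, -1.5], sigma0=1.0)
for _ in range(100):
    solutions = es.ask()
    function_values = [sum(x * x for x in s) for s in solutions]  # sphere
    es.tell(solutions, function_values)
```

The same ask/evaluate/tell loop drives the CMAES and CEM classes below.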
-
class lagom.CMAES(x0, sigma0, opts=None)[source]¶
Implements the CMA-ES algorithm.
Note
It is a wrapper of the original CMA-ES implementation.
Parameters: - x0 (list) – initial solution
- sigma0 (list) – initial standard deviation
- opts (dict) – a dictionary of options, e.g. ['popsize', 'seed']
-
ask()[source]¶
Sample a set of new candidate solutions.
Returns: solutions – sampled candidate solutions
Return type: list
-
result¶
Return a namedtuple of all results for the optimization. It contains:
- xbest: best solution evaluated
- fbest: objective function value of the best solution
- evals_best: evaluation count when xbest was evaluated
- evaluations: evaluations overall done
- iterations: number of iterations
- xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
- stds: effective standard deviations
-
tell(solutions, function_values)[source]¶
Update the parameters of the population for a new generation, based on the values of the objective function evaluated for the sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions
-
class lagom.CEM(x0, sigma0, opts=None)[source]¶
Implements the cross-entropy method (CEM).
-
ask()[source]¶
Sample a set of new candidate solutions.
Returns: solutions – sampled candidate solutions
Return type: list
-
result¶
Return a namedtuple of all results for the optimization. It contains:
- xbest: best solution evaluated
- fbest: objective function value of the best solution
- evals_best: evaluation count when xbest was evaluated
- evaluations: evaluations overall done
- iterations: number of iterations
- xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
- stds: effective standard deviations
-
tell(solutions, function_values)[source]¶
Update the parameters of the population for a new generation, based on the values of the objective function evaluated for the sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions