lagom¶
Agent¶
-
class lagom.BaseAgent(config, env, device, **kwargs)[source]¶
Base class for all agents.
The agent can select an action from a given observation and update itself by defining a certain learning mechanism.
Any agent should subclass this class, e.g. policy-based or value-based agents.
Note
All agents should by default handle batched data, e.g. batched observations returned from a VecEnv and a batched action for each sub-environment of a VecEnv.
Parameters: - config (dict) – a dictionary of configurations
- env (VecEnv) – environment object
- device (Device) – a PyTorch device
- **kwargs – keyword arguments used to specify the agent
-
choose_action(obs, **kwargs)[source]¶
Returns a (batched) action selected by the agent from the received (batched) observation.
Note
Tensor conversion should be handled here, instead of in the policy or network forward pass.
The output is a dictionary containing useful items, e.g. action, action_logprob, state_value.
Parameters: - obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
- **kwargs – keyword arguments to specify action selection.
Returns: out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic API of the runner classes for all kinds of agents. Note that everything should be batched, even a scalar loss, i.e. scalar_loss -> [scalar_loss].
Return type: dict
-
learn(D, **kwargs)[source]¶
Defines the learning mechanism to update the agent from batched data.
Parameters: - D (list) – a list of batched data to train the agent, e.g. in policy gradient this can be a list of Trajectory or Segment
- **kwargs – keyword arguments to specify the learning mechanism
Returns: out – a dictionary of learning output. This could contain the loss.
Return type: dict
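The subclass contract above can be sketched in a few lines. EchoAgent and its trivial action rule are hypothetical stand-ins (BaseAgent, VecEnv, and a real policy are not reproduced here), kept self-contained for illustration:

```python
# A minimal, hypothetical agent following the BaseAgent contract:
# batched observation in, dictionary of batched items out.
class EchoAgent:
    def __init__(self, config, env=None, device=None, **kwargs):
        self.config = config

    def choose_action(self, obs, **kwargs):
        # The first dimension of `obs` is the batch dimension; every
        # output item stays batched, one entry per sub-environment.
        action = [0 for _ in obs]
        return {'action': action}

    def learn(self, D, **kwargs):
        # D is a list of batched data; return a dict of learning output.
        # Even a scalar loss is batched: scalar_loss -> [scalar_loss].
        return {'loss': [0.0]}

agent = EchoAgent(config={})
out = agent.choose_action(obs=[[0.1, 0.2], [0.3, 0.4]])
```

Because choose_action returns a plain dictionary, a runner can store whatever items an agent produces without knowing the agent's type.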
-
class lagom.RandomAgent(config, env, device, **kwargs)[source]¶
A random agent samples actions uniformly from the action space.
-
choose_action(obs, **kwargs)[source]¶
Returns a (batched) action selected by the agent from the received (batched) observation.
Note
Tensor conversion should be handled here, instead of in the policy or network forward pass.
The output is a dictionary containing useful items, e.g. action, action_logprob, state_value.
Parameters: - obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
- **kwargs – keyword arguments to specify action selection.
Returns: out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic API of the runner classes for all kinds of agents. Note that everything should be batched, even a scalar loss, i.e. scalar_loss -> [scalar_loss].
Return type: dict
-
learn(D, **kwargs)[source]¶
Defines the learning mechanism to update the agent from batched data.
Parameters: - D (list) – a list of batched data to train the agent, e.g. in policy gradient this can be a list of Trajectory or Segment
- **kwargs – keyword arguments to specify the learning mechanism
Returns: out – a dictionary of learning output. This could contain the loss.
Return type: dict
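RandomAgent's behavior can be sketched assuming a gym-like action space with a sample() method; TinySpace and random_choose_action are illustrative stand-ins, not part of lagom or gym:

```python
# A sketch of uniform random action selection over a batched observation.
import random

class TinySpace:
    """Hypothetical stand-in for a gym-like discrete action space."""
    def __init__(self, n):
        self.n = n
    def sample(self):
        return random.randrange(self.n)

def random_choose_action(action_space, obs):
    # One uniformly sampled action per sub-environment in the batch.
    return {'action': [action_space.sample() for _ in obs]}

random.seed(0)
space = TinySpace(4)
out = random_choose_action(space, obs=[[0.0], [1.0], [2.0]])
```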
Logger¶
-
class lagom.Logger[source]¶
Log information in a dictionary.
If a key is logged more than once, the new value is appended to a list.
Note
It uses pickle to serialize the data. Empirically, pickle is 2x faster than numpy.save; other alternatives like yaml are too slow, and JSON does not support numpy arrays.
Warning
It is discouraged to store hierarchical structures, e.g. a list of dicts of lists of ndarrays, because pickling such a complex and large data structure is extremely slow. Put dictionaries only at the topmost level. Large numpy arrays should be saved separately.
Example:
Default:
>>> logger = Logger()
>>> logger('iteration', 1)
>>> logger('train_loss', 0.12)
>>> logger('iteration', 2)
>>> logger('train_loss', 0.11)
>>> logger('iteration', 3)
>>> logger('train_loss', 0.09)
>>> logger
OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])])
>>> logger.dump()
Iteration: [1, 2, 3]
Train Loss: [0.12, 0.11, 0.09]
With indentation:
>>> logger.dump(indent=1)
    Iteration: [1, 2, 3]
    Train Loss: [0.12, 0.11, 0.09]
With specific keys:
>>> logger.dump(keys=['iteration'])
Iteration: [1, 2, 3]
With specific index:
>>> logger.dump(index=0)
Iteration: 1
Train Loss: 0.12
With specific list of indices:
>>> logger.dump(index=[0, 2])
Iteration: [1, 3]
Train Loss: [0.12, 0.09]
-
__call__(key, value)[source]¶
Log the information with the given key and value.
Note
The key should be semantic, with each word separated by _.
Parameters: - key (str) – key of the information
- value (object) – value to be logged
-
dump(keys=None, index=None, indent=0, border='')[source]¶
Dump the loggings to the screen.
Parameters: - keys (list, optional) – a list of selected keys. If None, then use all keys. Default: None
- index (int/list, optional) – the index of logged values. It has the following use cases:
  scalar: a specific index. If -1, then use the last element.
  list: a list of indices.
  None: all indices.
- indent (int, optional) – the number of tab indentations. Default: 0
- border (str, optional) – the string to print as header and footer
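The append-on-repeat semantics described above can be sketched in a few lines. MiniLogger is a simplified stand-in, not lagom's Logger (it returns the dumped lines instead of printing, and omits indent/border):

```python
# A minimal sketch of the logging semantics: repeated keys append to a
# list, and dump(index=...) selects logged values.
from collections import OrderedDict

class MiniLogger(OrderedDict):
    def __call__(self, key, value):
        # If a key is logged more than once, append the new value.
        self.setdefault(key, []).append(value)

    def dump(self, keys=None, index=None):
        keys = list(self) if keys is None else keys
        lines = []
        for key in keys:
            values = self[key] if index is None else self[key][index]
            # 'train_loss' -> 'Train Loss', matching the dumped style
            name = ' '.join(w.capitalize() for w in key.split('_'))
            lines.append(f'{name}: {values}')
        return lines

logger = MiniLogger()
logger('iteration', 1)
logger('train_loss', 0.12)
logger('iteration', 2)
lines = logger.dump()
```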
Engine¶
-
class lagom.BaseEngine(config, **kwargs)[source]¶
Base class for all engines.
It defines the training and evaluation processes.
-
eval(n=None, **kwargs)[source]¶
Evaluation process for one iteration.
Note
It is recommended to use Logger to store loggings.
Note
All parameterized modules should have .eval() called on them to specify evaluation mode.
Parameters: - n (int, optional) – n-th iteration for evaluation
- **kwargs – keyword arguments used for logging
Returns: out – evaluation output
Return type: dict
-
train(n=None, **kwargs)[source]¶
Training process for one iteration.
Note
It is recommended to use Logger to store loggings.
Note
All parameterized modules should have .train() called on them to specify training mode.
Parameters: - n (int, optional) – n-th iteration for training
- **kwargs – keyword arguments used for logging
Returns: out – training output
Return type: dict
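The way an engine's train/eval pair is typically driven can be sketched as follows; TinyEngine and its config keys are hypothetical stand-ins, not lagom's BaseEngine:

```python
# A toy engine exposing per-iteration train/eval, driven by an outer loop.
class TinyEngine:
    def __init__(self, config, **kwargs):
        self.config = config

    def train(self, n=None, **kwargs):
        # In a real engine: set modules to .train(), collect data, update.
        return {'train_iteration': n, 'loss': [0.5 / (n + 1)]}

    def eval(self, n=None, **kwargs):
        # In a real engine: set modules to .eval() and roll out the policy.
        return {'eval_iteration': n}

engine = TinyEngine(config={'train.iter': 3})
train_logs, eval_logs = [], []
for n in range(engine.config['train.iter']):
    train_logs.append(engine.train(n))
    eval_logs.append(engine.eval(n))
```

Each call returns a dictionary, so the outer loop can hand the outputs straight to a Logger.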
Evolution Strategies¶
-
class lagom.BaseES[source]¶
Base class for all evolution strategies.
Note
The optimization is treated as minimization, e.g. maximizing rewards is equivalent to minimizing negative rewards.
Note
For painless parallelization, we highly recommend using concurrent.futures.ProcessPoolExecutor, with a few practical tips:
- Set the max_workers argument to control the maximum parallelization capacity.
- When execution gets stuck, try wrapping the objective function with CloudpickleWrapper, particularly for lambdas and class methods.
- Use with ProcessPoolExecutor once to wrap the entire loop of iterative ES generations. Using it internally for each generation can slow down the parallelization dramatically due to overheads.
- To reduce overheads further (e.g. PyTorch models, gym environments):
  - Recreating such models for each generation would be very expensive.
  - Use an initializer function for ProcessPoolExecutor.
  - Within the initializer function, define PyTorch models and gym environments as global variables. Note that the global variables are defined in each worker independently.
  - Don't forget to use with torch.no_grad to increase forward pass speed.
-
ask()[source]¶
Sample a set of new candidate solutions.
Returns: solutions – sampled candidate solutions
Return type: list
-
result¶
Return a namedtuple of all results for the optimization. It contains:
- xbest: best solution evaluated
- fbest: objective function value of the best solution
- evals_best: evaluation count when xbest was evaluated
- evaluations: evaluations overall done
- iterations: number of iterations
- xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
- stds: effective standard deviations
-
tell(solutions, function_values)[source]¶
Update the parameters of the population for a new generation, based on the values of the objective function evaluated for the sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions
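The ask/tell loop can be sketched with a toy Gaussian strategy; ToyES is illustrative only, not one of lagom's implementations:

```python
# A toy evolution strategy following the ask/tell interface; minimization,
# as in BaseES: the mean moves toward the elite half of each generation.
import random

class ToyES:
    def __init__(self, x0, sigma0, popsize=20):
        self.mean = list(x0)
        self.sigma = sigma0
        self.popsize = popsize

    def ask(self):
        # Sample a set of new candidate solutions around the current mean.
        return [[m + self.sigma * random.gauss(0, 1) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, solutions, function_values):
        # Minimization: keep the half with the lowest objective values,
        # move the mean to their average, then shrink the search spread.
        ranked = [s for _, s in sorted(zip(function_values, solutions))]
        elite = ranked[:self.popsize // 2]
        self.mean = [sum(x) / len(elite) for x in zip(*elite)]
        self.sigma *= 0.95

random.seed(0)
es = ToyES(x0=[2.0, -1.5], sigma0=1.0)
for _ in range(100):
    solutions = es.ask()
    function_values = [sum(x * x for x in s) for s in solutions]  # sphere
    es.tell(solutions, function_values)
```

The same ask/evaluate/tell loop drives the CMAES and CEM classes below.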
-
class lagom.CMAES(x0, sigma0, opts=None)[source]¶
Implements the CMA-ES algorithm.
Note
It is a wrapper of the original CMA-ES implementation.
Parameters: - x0 (list) – initial solution
- sigma0 (list) – initial standard deviation
- opts (dict) – a dictionary of options, e.g. ['popsize', 'seed']
-
ask()[source]¶
Sample a set of new candidate solutions.
Returns: solutions – sampled candidate solutions
Return type: list
-
result¶
Return a namedtuple of all results for the optimization. It contains:
- xbest: best solution evaluated
- fbest: objective function value of the best solution
- evals_best: evaluation count when xbest was evaluated
- evaluations: evaluations overall done
- iterations: number of iterations
- xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
- stds: effective standard deviations
-
tell(solutions, function_values)[source]¶
Update the parameters of the population for a new generation, based on the values of the objective function evaluated for the sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions
-
class lagom.CEM(x0, sigma0, opts=None)[source]¶
Implements the cross-entropy method (CEM).
-
ask()[source]¶
Sample a set of new candidate solutions.
Returns: solutions – sampled candidate solutions
Return type: list
-
result¶
Return a namedtuple of all results for the optimization. It contains:
- xbest: best solution evaluated
- fbest: objective function value of the best solution
- evals_best: evaluation count when xbest was evaluated
- evaluations: evaluations overall done
- iterations: number of iterations
- xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
- stds: effective standard deviations
-
tell(solutions, function_values)[source]¶
Update the parameters of the population for a new generation, based on the values of the objective function evaluated for the sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions