# lagom

## Agent

class lagom.BaseAgent(config, env, device, **kwargs)[source]

Base class for all agents.

The agent could select an action from some data (e.g. observation) and update itself by defining a certain learning mechanism.

Any agent should subclass this class, e.g. policy-based or value-based.

Parameters:
• config (dict) – a dictionary of configurations
• env (Env) – environment object
• device (Device) – a PyTorch device
• **kwargs – keyword arguments used to specify the agent
choose_action(x, **kwargs)[source]

Returns the selected action given the data.

Note

It’s recommended to handle all dtype/device conversions between CPU/GPU or Tensor/Numpy here.

The output is a dictionary containing useful items.

Parameters:
• obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
• **kwargs – keyword arguments to specify action selection.

Returns: a dictionary of action selection output. It contains all useful information (e.g. action, action_logprob, state_value). This allows the API to be generic and compatible with different kinds of runners and agents.

Return type: dict
learn(D, **kwargs)[source]

Defines the learning mechanism to update the agent from batched data.

Parameters:
• D (list) – a list of batched data to train the agent, e.g. in policy gradient, this can be a list of Trajectory.
• **kwargs – keyword arguments to specify the learning mechanism.

Returns: a dictionary of learning output. This could contain the loss and other useful metrics.

Return type: dict
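To make this contract concrete, here is a minimal stand-alone sketch of an agent following the choose_action/learn interface. It does not use lagom itself; the class, the 'actions' config key, and the 'raw_action' output key are hypothetical illustrations, not lagom's actual names:

```python
import random

class ToyAgent:
    """A toy agent mirroring the BaseAgent interface (hypothetical stand-in)."""

    def __init__(self, config, env=None, device=None, **kwargs):
        self.config = config
        self.actions = config['actions']  # e.g. a discrete action set

    def choose_action(self, obs, **kwargs):
        # Return a dict so the API stays generic: runners can pick out
        # whatever fields they need (action, log-prob, state value, ...).
        return {'raw_action': [random.choice(self.actions) for _ in obs]}

    def learn(self, D, **kwargs):
        # A real agent would compute a loss over the batched data D and
        # update its parameters; here we only report a dummy metric.
        return {'loss': 0.0, 'num_batches': len(D)}

agent = ToyAgent({'actions': [0, 1]})
out = agent.choose_action([None, None, None])  # batch of 3 observations
assert len(out['raw_action']) == 3
```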
class lagom.RandomAgent(config, env, device, **kwargs)[source]

A random agent samples action uniformly from action space.

choose_action(x, **kwargs)[source]

Returns the selected action given the data.

Note

It’s recommended to handle all dtype/device conversions between CPU/GPU or Tensor/Numpy here.

The output is a dictionary containing useful items.

Parameters:
• obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
• **kwargs – keyword arguments to specify action selection.

Returns: a dictionary of action selection output. It contains all useful information (e.g. action, action_logprob, state_value). This allows the API to be generic and compatible with different kinds of runners and agents.

Return type: dict
learn(D, **kwargs)[source]

Defines the learning mechanism to update the agent from batched data.

Parameters:
• D (list) – a list of batched data to train the agent, e.g. in policy gradient, this can be a list of Trajectory.
• **kwargs – keyword arguments to specify the learning mechanism.

Returns: a dictionary of learning output. This could contain the loss and other useful metrics.

Return type: dict

## Data

class lagom.StepType[source]

An enumeration.

class lagom.TimeStep(step_type: lagom.data.StepType, observation: object, reward: float, done: bool, info: dict)[source]
class lagom.Trajectory[source]
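TimeStep is a small data container for a single environment transition. As an illustration, here is a stand-alone dataclass mirroring the documented signature; note that the StepType members shown are assumptions, since the enumeration's members are not listed in this reference:

```python
from dataclasses import dataclass
from enum import Enum

class StepType(Enum):
    # Hypothetical members; the real enumeration's members are not
    # documented here.
    FIRST = 0
    MID = 1
    LAST = 2

@dataclass
class TimeStep:
    # Fields copied from the documented constructor signature.
    step_type: StepType
    observation: object
    reward: float
    done: bool
    info: dict

step = TimeStep(StepType.LAST, observation=[0.0, 1.0], reward=1.0,
                done=True, info={})
assert step.done and step.reward == 1.0
```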

## Logger

class lagom.Logger[source]

Log the information in a dictionary.

If a key is logged more than once, then the new value will be appended to a list.

Note

It uses pickle to serialize the data. Empirically, pickle is 2x faster than numpy.save; other alternatives such as YAML are too slow, and JSON does not support numpy arrays.

Warning

Storing hierarchical structures (e.g. a list of dicts of lists of ndarrays) is discouraged, because pickling such complex and large data structures is extremely slow. Put dictionaries only at the topmost level; large numpy arrays should be saved separately.

Example:

• Default:

>>> logger = Logger()
>>> logger('iteration', 1)
>>> logger('train_loss', 0.12)
>>> logger('iteration', 2)
>>> logger('train_loss', 0.11)
>>> logger('iteration', 3)
>>> logger('train_loss', 0.09)

>>> logger
OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])])

>>> logger.dump()
Iteration: [1, 2, 3]
Train Loss: [0.12, 0.11, 0.09]

• With indentation:

>>> logger.dump(indent=1)
Iteration: [1, 2, 3]
Train Loss: [0.12, 0.11, 0.09]

• With specific keys:

>>> logger.dump(keys=['iteration'])
Iteration: [1, 2, 3]

• With specific index:

>>> logger.dump(index=0)
Iteration: 1
Train Loss: 0.12

• With specific list of indices:

>>> logger.dump(index=[0, 2])
Iteration: [1, 3]
Train Loss: [0.12, 0.09]

__call__(key, value)[source]

Log the information with given key and value.

Note

The key should be semantic, with words separated by underscores (_).

Parameters:
• key (str) – key of the information
• value (object) – value to be logged
clear()[source]

Remove all loggings in the dictionary.

dump(keys=None, index=None, indent=0, border='')[source]

Dump the loggings to the screen.

Parameters:
• keys (list, optional) – a list of selected keys. If None, use all keys. Default: None
• index (int/list, optional) – the index of logged values. Scalar: a specific index (-1 means the last element). List: a list of indices. None: all indices. Default: None
• indent (int, optional) – the number of tab indentations. Default: 0
• border (str, optional) – the string to print as header and footer. Default: ''
save(f)[source]

Save loggings to a file.

Parameters:
• f (str) – file path
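To illustrate the append-on-repeated-key and pickle-based save behaviors described above, here is a stripped-down stand-in (not lagom's Logger itself; the class name and internals are hypothetical):

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class MiniLogger:
    """A stripped-down stand-in for lagom.Logger (hypothetical)."""
    def __init__(self):
        self.logs = OrderedDict()

    def __call__(self, key, value):
        # A key logged more than once appends to a list, as documented.
        self.logs.setdefault(key, []).append(value)

    def save(self, f):
        # Serialize with pickle, as the documented Logger does.
        with open(f, 'wb') as fp:
            pickle.dump(self.logs, fp)

logger = MiniLogger()
logger('train_loss', 0.12)
logger('train_loss', 0.11)

path = os.path.join(tempfile.mkdtemp(), 'logs.pkl')
logger.save(path)
with open(path, 'rb') as fp:
    restored = pickle.load(fp)
assert restored['train_loss'] == [0.12, 0.11]
```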

## Engine

class lagom.BaseEngine(config, **kwargs)[source]

Base class for all engines.

It defines the training and evaluation process.

eval(n=None, **kwargs)[source]

Evaluation process for one iteration.

Note

It is recommended to use Logger to store loggings.

Note

All parameterized modules should be called .eval() to specify evaluation mode.

Parameters:
• n (int, optional) – n-th iteration of evaluation.
• **kwargs – keyword arguments used for logging.

Returns: a dictionary of evaluation output

Return type: dict
train(n=None, **kwargs)[source]

Training process for one iteration.

Note

It is recommended to use Logger to store loggings.

Note

All parameterized modules should be called .train() to specify training mode.

Parameters:
• n (int, optional) – n-th iteration of training.
• **kwargs – keyword arguments used for logging.

Returns: a dictionary of training output

Return type: dict
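A minimal sketch of the train/eval contract; the class body and the returned dictionary keys are hypothetical (a real engine would roll out the agent, switch modules between .train() and .eval() mode, and log with a Logger):

```python
class MiniEngine:
    """Sketch of the BaseEngine contract (hypothetical stand-in)."""
    def __init__(self, config, **kwargs):
        self.config = config

    def train(self, n=None, **kwargs):
        # One training iteration: collect data, update the agent,
        # and return a dictionary of training output.
        return {'train/iteration': n, 'train/loss': 0.5 / (n + 1)}

    def eval(self, n=None, **kwargs):
        # One evaluation iteration: roll out the agent in eval mode
        # and return a dictionary of evaluation output.
        return {'eval/iteration': n, 'eval/return': 10.0 * (n + 1)}

engine = MiniEngine(config={})
for i in range(3):
    train_out = engine.train(n=i)
    eval_out = engine.eval(n=i)
assert train_out['train/iteration'] == 2
```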

## Runner

class lagom.BaseRunner[source]

Base class for all runners.

A runner is a data collection interface between the agent and the environment.

__call__(agent, env, **kwargs)[source]

Defines data collection via interactions between the agent and the environment.

Parameters:
• agent (BaseAgent) – agent
• env (Env) – environment
• **kwargs – keyword arguments for more specifications.
class lagom.EpisodeRunner[source]
class lagom.StepRunner(reset_on_call=True)[source]
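The runner contract can be sketched with a toy agent and environment. Everything below is a hypothetical stand-in (lagom's EpisodeRunner/StepRunner are not used), showing a step-based collection loop that resets the environment when an episode ends:

```python
class MiniRunner:
    """Collect T environment steps via agent/env interaction (hypothetical)."""
    def __call__(self, agent, env, T, **kwargs):
        D = []
        obs = env.reset()
        for _ in range(T):
            out = agent.choose_action([obs])  # batch of one observation
            action = out['raw_action'][0]
            obs, reward, done, info = env.step(action)
            D.append((action, reward, done))
            if done:
                obs = env.reset()
        return D

class ToyEnv:
    """Terminates every 3 steps (hypothetical)."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t % 3 == 0, {}

class ToyAgent:
    def choose_action(self, obs, **kwargs):
        return {'raw_action': [0 for _ in obs]}

D = MiniRunner()(ToyAgent(), ToyEnv(), T=5)
assert len(D) == 5
```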

## Evolution Strategies

class lagom.BaseES[source]

Base class for all evolution strategies.

Note

The optimization is treated as minimization, e.g. maximizing reward is equivalent to minimizing negative reward.

Note

For painless parallelization, we highly recommend using concurrent.futures.ProcessPoolExecutor, with a few practical tips:

• Set the max_workers argument to control the maximum parallelization capacity.
• When execution gets stuck, try wrapping the objective function with CloudpickleWrapper, particularly for lambdas and class methods.
• Enter the ProcessPoolExecutor context (with) once, around the entire loop of ES generations. Creating a new pool internally for each generation can slow down the parallelization dramatically due to overheads.
• To reduce overheads further (e.g. for PyTorch models and gym environments):
  • Recreating such models for each generation is very expensive.
  • Use an initializer function for ProcessPoolExecutor.
  • Within the initializer function, define PyTorch models and gym environments as global variables. Note that these globals are defined in each worker independently.
• Don't forget to use torch.no_grad() to increase forward-pass speed.
ask()[source]

Sample a set of new candidate solutions.

Returns: a list of sampled candidate solutions

Return type: list
result

Return a namedtuple of all results for the optimization.

It contains:
• xbest: best solution evaluated
• fbest: objective function value of the best solution
• evals_best: evaluation count when xbest was evaluated
• evaluations: number of evaluations overall
• iterations: number of iterations
• xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
• stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
• solutions (list/ndarray) – candidate solutions returned from ask()
• function_values (list) – a list of objective function values evaluated for the sampled solutions.
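The ask/tell loop can be illustrated with a tiny self-contained strategy. This is a simplistic cross-entropy-style sketch in pure Python, NOT lagom's BaseES or CMAES; note the minimization convention mentioned above:

```python
import random

class ToyES:
    """A tiny 1-D ask/tell strategy (hypothetical, cross-entropy-style)."""
    def __init__(self, x0, sigma0, popsize=20):
        self.mu, self.sigma, self.popsize = x0, sigma0, popsize

    def ask(self):
        # Sample a new population of candidate solutions around the mean.
        return [random.gauss(self.mu, self.sigma)
                for _ in range(self.popsize)]

    def tell(self, solutions, function_values):
        # Move the mean toward the elite (lowest-value) solutions;
        # the optimization is treated as MINIMIZATION.
        ranked = sorted(zip(function_values, solutions))
        elite = [x for _, x in ranked[:self.popsize // 4]]
        self.mu = sum(elite) / len(elite)
        self.sigma *= 0.9  # shrink the search distribution

random.seed(0)
es = ToyES(x0=0.0, sigma0=2.0)
for _ in range(30):
    solutions = es.ask()
    values = [(x - 3.0) ** 2 for x in solutions]  # minimize (x - 3)^2
    es.tell(solutions, values)
assert abs(es.mu - 3.0) < 0.5
```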
class lagom.CMAES(x0, sigma0, opts=None)[source]

Implements CMA-ES algorithm.

Note

It is a wrapper of the original CMA-ES implementation.

Parameters:
• x0 (list) – initial solution
• sigma0 (list) – initial standard deviation
• opts (dict) – a dictionary of options, e.g. [‘popsize’, ‘seed’]
ask()[source]

Sample a set of new candidate solutions.

Returns: a list of sampled candidate solutions

Return type: list
result

Return a namedtuple of all results for the optimization.

It contains:
• xbest: best solution evaluated
• fbest: objective function value of the best solution
• evals_best: evaluation count when xbest was evaluated
• evaluations: number of evaluations overall
• iterations: number of iterations
• xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
• stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
• solutions (list/ndarray) – candidate solutions returned from ask()
• function_values (list) – a list of objective function values evaluated for the sampled solutions.
class lagom.CEM(x0, sigma0, opts=None)[source]
ask()[source]

Sample a set of new candidate solutions.

Returns: a list of sampled candidate solutions

Return type: list
result

Return a namedtuple of all results for the optimization.

It contains:
• xbest: best solution evaluated
• fbest: objective function value of the best solution
• evals_best: evaluation count when xbest was evaluated
• evaluations: number of evaluations overall
• iterations: number of iterations
• xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
• stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
• solutions (list/ndarray) – candidate solutions returned from ask()
• function_values (list) – a list of objective function values evaluated for the sampled solutions.