lagom.envs

lagom.envs.flatdim(space)[source]
lagom.envs.flatten(space, x)[source]
lagom.envs.unflatten(space, x)[source]
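
These three helpers carry no docstrings here; the following is a minimal usage sketch, assuming they mirror the usual gym.spaces flattening semantics (flatdim gives the flattened size, flatten maps a sample to a 1-D array, unflatten inverts it):

>>> import gym
>>> from lagom.envs import flatdim, flatten, unflatten
>>> space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2, 3))
>>> flatdim(space)                         # assumed: 6, the number of elements in a sample
>>> x = space.sample()
>>> flatten(space, x)                      # assumed: 1-D array of length 6
>>> unflatten(space, flatten(space, x))    # assumed: array with the original shape (2, 3)
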
class lagom.envs.VecEnv(list_make_env)[source]

A vectorized environment that runs all sub-environments serially.

Each observation returned from the vectorized environment is a batch of observations, one from each sub-environment. Similarly, step() expects a batch of actions, one for each sub-environment.

Note

All sub-environments must share identical observation and action spaces. In other words, a vector of multiple different environments is not supported.
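
A minimal usage sketch (not from the original docstring; it assumes make_vec_env below and gym's CartPole-v1):

>>> import gym
>>> from lagom.envs import make_vec_env
>>> env = make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
>>> observations = env.reset()                                   # list of 3 initial observations
>>> actions = [env.action_space.sample() for _ in range(3)]      # one action per sub-environment
>>> observations, rewards, dones, infos = env.step(actions)      # each is a list of length 3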

Parameters:
  • list_make_env (list) – a list of functions, each returning an instantiated environment.
  • observation_space (Space) – observation space of the environment
  • action_space (Space) – action space of the environment
close()[source]

Close all environments.

It closes all existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function itself does not close the environments; that should be handled in close_extras(). This is useful for parallelized environments.

Note

This will be called automatically when the object is garbage collected or the program exits.

close_extras()[source]

Clean up extra resources, e.g. beyond what is handled in this base class.

get_images()[source]

Returns a batched RGB array with shape [N, H, W, C] from all environments.

Returns:imgs – a batched RGB array with shape [N, H, W, C]
Return type:ndarray
get_viewer()[source]

Returns an instantiated ImageViewer.

Returns:viewer – an image viewer
Return type:ImageViewer
render(mode='human')[source]

Render all the environments.

It first retrieves RGB images from all environments, uses GridImage to combine them into a single grid image, and then either returns the image array or displays it on the screen via ImageViewer.

See the docstring in Env for more details about rendering.

reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
unwrapped

Unwrap this vectorized environment.

This is useful when multiple wrappers are applied sequentially; it provides access to the original vectorized environment underneath.

class lagom.envs.VecEnvWrapper(env)[source]

Wraps the vectorized environment to allow a modular transformation.

This class is the base class for all wrappers of vectorized environments. A subclass can override selected methods to change the behavior of the original vectorized environment without touching the original code.

Note

Don’t forget to call super().__init__(env) if the subclass overrides __init__().
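
A minimal subclass sketch (a hypothetical DoubleReward wrapper; the `self.env` attribute name is assumed, not confirmed by this page):

from lagom.envs import VecEnvWrapper

class DoubleReward(VecEnvWrapper):
    """Hypothetical wrapper that doubles every reward (illustration only)."""
    def __init__(self, env):
        super().__init__(env)  # do not forget this call when overriding __init__()

    def step(self, actions):
        # Delegate to the wrapped vectorized environment (attribute name `self.env` is assumed),
        # then transform the per-environment rewards.
        observations, rewards, dones, infos = self.env.step(actions)
        rewards = [2.0 * r for r in rewards]
        return observations, rewards, dones, infos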

close_extras()[source]

Clean up extra resources, e.g. beyond what is handled in this base class.

get_images()[source]

Returns a batched RGB array with shape [N, H, W, C] from all environments.

Returns:imgs – a batched RGB array with shape [N, H, W, C]
Return type:ndarray
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
unwrapped

Unwrap this vectorized environment.

This is useful when multiple wrappers are applied sequentially; it provides access to the original vectorized environment underneath.

lagom.envs.make_vec_env(make_env, num_env, init_seed)[source]

Create a vectorized environment whose sub-environments are each seeded with a different random seed.

Example:

>>> import gym
>>> make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
<VecEnv: 3, CartPole-v1>
Parameters:
  • make_env (function) – a function to create an environment
  • num_env (int) – number of environments to create.
  • init_seed (int) – initial seed for Seeder to sample random seeds.
Returns: env – created vectorized environment
Return type: VecEnv

Wrappers

lagom.envs.wrappers.get_wrapper(env, name)[source]

Return the wrapper with the given name from a chain of wrapped environments.

Note

If no such wrapper is found, None is returned.

Parameters:
  • env (Env) – environment
  • name (str) – name of the wrapper
Returns: out – wrapped environment at the level of the named wrapper
Return type: Env

lagom.envs.wrappers.get_all_wrappers(env)[source]

Returns a list of wrapper names of a wrapped environment.

Parameters:env (Env) – wrapped environment
Returns:out – list of string names of wrappers
Return type:list
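
A hedged usage sketch of these two helpers (assuming a gym Pendulum-v0 environment and the ClipAction and ScaleReward wrappers documented below):

>>> import gym
>>> env = gym.make('Pendulum-v0')
>>> env = ClipAction(env)
>>> env = ScaleReward(env, scale=0.1)
>>> get_all_wrappers(env)              # assumed: ['ScaleReward', 'ClipAction'], outermost first
>>> get_wrapper(env, 'ClipAction')     # the environment wrapped at the ClipAction level, or None if absent
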
class lagom.envs.wrappers.ClipAction(env)[source]

Clip the continuous action within the valid bound.

class lagom.envs.wrappers.ClipReward(env, min_r, max_r)[source]

Clip the reward to [min_r, max_r].

class lagom.envs.wrappers.SignClipReward(env)[source]

Bin the reward to {-1, 0, +1} by its sign.

class lagom.envs.wrappers.FlattenObservation(env)[source]

Observation wrapper that flattens the observation.

class lagom.envs.wrappers.NormalizeAction(env)[source]

Rescale the continuous action from the normalized range [-1, 1] to the environment's native action bounds.
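
These simple wrappers can be chained; a hedged composition sketch (assuming a gym continuous-control environment such as Pendulum-v0):

>>> import gym
>>> env = gym.make('Pendulum-v0')
>>> env = NormalizeAction(env)         # the agent now acts in [-1, 1]
>>> env = ClipReward(env, -1.0, 1.0)   # clip rewards into [-1, 1]
>>> env = FlattenObservation(env)      # flatten observations into a 1-D array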

class lagom.envs.wrappers.LazyFrames(frames, lz4_compress=False)[source]

Ensures common frames are only stored once to optimize memory use.

To further reduce memory use, lz4 compression can optionally be enabled to compress the observations.

Note

This object should only be converted to a numpy array right before the forward pass.
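
A small sketch of the intended usage (assuming FrameStack below, whose stacked observations are LazyFrames objects):

>>> import gym
>>> import numpy as np
>>> env = FrameStack(gym.make('PongNoFrameskip-v0'), 4, lz4_compress=True)
>>> obs = env.reset()          # obs is a LazyFrames object, not yet materialized
>>> frames = np.asarray(obs)   # convert to a numpy array only right before the forward pass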

class lagom.envs.wrappers.FrameStack(env, num_stack, lz4_compress=False)[source]

Observation wrapper that stacks the observations in a rolling manner.

For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. For the environment ‘Pendulum-v0’, the original observation is an array with shape [3], so if we stack 4 observations, the processed observation has shape [3, 4].

Note

To be memory efficient, the stacked observations are wrapped in LazyFrames.

Note

The observation space must be of Box type. If the environment uses a Dict observation space, apply FlattenDictWrapper first.

Example:

>>> import gym
>>> env = gym.make('PongNoFrameskip-v0')
>>> env = FrameStack(env, 4)
>>> env.observation_space
Box(4, 210, 160, 3)
Parameters:
  • env (Env) – environment object
  • num_stack (int) – number of stacks
reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class lagom.envs.wrappers.GrayScaleObservation(env, keep_dim=False)[source]

Convert the image observation from RGB to gray scale.

class lagom.envs.wrappers.ResizeObservation(env, size)[source]

Downsample the image observation to a square image.

class lagom.envs.wrappers.ScaleReward(env, scale=0.01)[source]

Scale the reward.

Note

This is very important and can drastically impact performance, e.g. for PPO.

Example:

>>> from lagom.envs import make_gym_env
>>> env = make_gym_env(env_id='CartPole-v1', seed=0)
>>> env = ScaleReward(env, scale=0.1)
>>> env.reset()
>>> observation, reward, done, info = env.step(env.action_space.sample())
>>> reward
0.1
Parameters:
  • env (Env) – environment
  • scale (float) – reward scaling factor
class lagom.envs.wrappers.ScaledFloatFrame(env)[source]

Convert image frames to the float range [0, 1] by dividing by 255.

Warning

Do NOT use this wrapper for DQN! It will break the memory optimization.

class lagom.envs.wrappers.TimeAwareObservation(env)[source]

Augment the observation with current time step in the trajectory.

Note

Currently this only works with one-dimensional observation spaces; pixel observation spaces are not supported yet.
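
A hedged sketch of the effect (assuming a one-dimensional observation such as CartPole-v1's, with the current time step appended as an extra entry):

>>> import gym
>>> env = TimeAwareObservation(gym.make('CartPole-v1'))
>>> obs = env.reset()   # assumed: shape (5,), the original 4 entries plus the time step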

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class lagom.envs.wrappers.VecMonitor(env, deque_size=100)[source]

Record the episode reward, horizon, and elapsed time, and report them when an episode terminates.

reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
class lagom.envs.wrappers.VecStandardizeObservation(env, clip=10.0, constant_moments=None)[source]

Standardize the observations with a running estimate of their mean and variance.

Warning

To evaluate an agent trained on standardized observations, remember to save and load the observation scaling statistics; otherwise the measured performance will be incorrect.

Parameters:
  • env (VecEnv) – a vectorized environment
  • clip (float) – clipping range of standardized observation, i.e. [-clip, clip]
  • constant_moments (tuple) – a tuple of constant mean and variance used to standardize the observation. If provided, the running average is ignored.
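
A hedged usage sketch (assuming make_vec_env above and gym's CartPole-v1):

>>> import gym
>>> from lagom.envs import make_vec_env
>>> env = make_vec_env(lambda: gym.make('CartPole-v1'), 4, 0)
>>> env = VecStandardizeObservation(env, clip=10.0)
>>> observations = env.reset()   # standardized (and clipped) observations, one per sub-environment
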
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
class lagom.envs.wrappers.VecStandardizeReward(env, clip=10.0, gamma=0.99, constant_var=None)[source]

Standardize the reward with a running estimate of its variance.

Warning

We do not subtract the running mean from the reward; we only divide it by the running standard deviation, because subtracting the mean would alter the reward structure and might degrade performance. Note that this transformation is applied from the second incoming reward onward, keeping the first reward unchanged; otherwise the first reward would have too large a magnitude (and simply get clipped), since the mean is not subtracted.

Note

On each reset(), we do not clear the self.all_returns buffer. Because of the discount factor (\(< 1\)), the running averages converge after some iterations. Therefore, a discount factor of \(1.0\) is not allowed, since it would lead to an unbounded explosion of the reward running averages.
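
To see why this converges (a quick sanity check, assuming the buffer tracks discounted returns \(R_t = \gamma R_{t-1} + r_t\)): if rewards are bounded by \(r_{\max}\), then \(|R_t| \le r_{\max} / (1 - \gamma)\), which is finite only for \(\gamma < 1\); with \(\gamma = 1\) the running sum can grow without bound.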

Parameters:
  • env (VecEnv) – a vectorized environment
  • clip (float) – clipping range of standardized reward, i.e. [-clip, clip]
  • gamma (float) – discount factor. Note that the value 1.0 should not be used.
  • constant_var (ndarray) – constant variance used to standardize the reward. If provided, the running average is ignored.
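
A hedged usage sketch, stacking the reward standardization on top of the observation standardization (all names as assumed above):

>>> import gym
>>> from lagom.envs import make_vec_env
>>> env = make_vec_env(lambda: gym.make('CartPole-v1'), 4, 0)
>>> env = VecStandardizeObservation(env, clip=10.0)
>>> env = VecStandardizeReward(env, clip=10.0, gamma=0.99)
>>> observations = env.reset()
>>> actions = [env.action_space.sample() for _ in range(4)]
>>> observations, rewards, dones, infos = env.step(actions)   # rewards are divided by the running std
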
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
class lagom.envs.wrappers.StepInfo(done: bool, info: dict)[source]

Defines a set of information for each time step.

A StepInfo is returned from each step and reset of an environment. It contains properties of the transition and additional information.
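
A minimal construction sketch (attribute access is assumed from the dataclass-style signature above; the 'TimeLimit.truncated' key is only an illustrative example):

>>> step_info = StepInfo(done=True, info={'TimeLimit.truncated': True})
>>> step_info.done   # True
>>> step_info.info   # {'TimeLimit.truncated': True}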

class lagom.envs.wrappers.VecStepInfo(env)[source]
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.