lagom.envs

class lagom.envs.RecordEpisodeStatistics(env, deque_size=100)[source]
reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns: the initial observation.
Return type: observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:
a tuple of (observation, reward, done, info)
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after the previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type: tuple
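The bookkeeping behind an episode-statistics wrapper can be sketched in plain Python: accumulate the reward and step count of the current episode, and push the totals into a bounded deque (of size deque_size) whenever done is observed. This is an illustrative sketch of the mechanism, not lagom's actual implementation; the class and method names here are hypothetical.

```python
from collections import deque

class EpisodeStatsRecorder:
    """Sketch of per-episode bookkeeping: accumulate reward and length,
    and push the totals into a bounded deque when the episode ends."""
    def __init__(self, deque_size=100):
        self.episode_return = 0.0
        self.episode_length = 0
        self.return_queue = deque(maxlen=deque_size)
        self.length_queue = deque(maxlen=deque_size)

    def after_step(self, reward, done):
        self.episode_return += reward
        self.episode_length += 1
        if done:
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
            self.episode_return = 0.0
            self.episode_length = 0

rec = EpisodeStatsRecorder(deque_size=2)
for reward, done in [(1.0, False), (2.0, True), (0.5, True), (3.0, True)]:
    rec.after_step(reward, done)
print(list(rec.return_queue))  # [0.5, 3.0] -- the oldest episode was evicted
```

Note how the deque's maxlen caps the history: once more than deque_size episodes finish, the oldest statistics are silently dropped.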
class lagom.envs.NormalizeObservation(env, clip=5.0, constant_moments=None)[source]
class lagom.envs.NormalizeReward(env, clip=10.0, gamma=0.99, constant_var=None)[source]
reset()[source]

Resets the state of the environment and returns an initial observation.

Returns: the initial observation.
Return type: observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:
a tuple of (observation, reward, done, info)
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after the previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type: tuple
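The core of normalization wrappers like these is an online running estimate of mean and variance, with the normalized value clipped to a range such as [-clip, clip]. Below is a minimal sketch of that idea using Welford-style running moments; the class name and the exact update rule are illustrative assumptions, and lagom's implementation (e.g. its handling of constant_moments, constant_var, or the gamma-discounted return for rewards) may differ.

```python
class RunningNormalizer:
    """Sketch of online mean/variance normalization with clipping,
    in the spirit of NormalizeObservation(clip=5.0)."""
    def __init__(self, clip=5.0, eps=1e-8):
        self.clip = clip
        self.eps = eps
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x):
        # Welford's online update: numerically stable single-pass moments.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / self.count if self.count > 1 else 1.0
        z = (x - self.mean) / ((var + self.eps) ** 0.5)
        # Clip extreme values so outliers cannot destabilize training.
        return max(-self.clip, min(self.clip, z))

norm = RunningNormalizer(clip=5.0)
for x in [1.0, 2.0, 3.0, 4.0]:
    norm.update(x)
print(norm.mean)  # 2.5
```

A wrapper would call update() on every observation (or discounted return) seen in step(), then return the normalized value in its place.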
class lagom.envs.TimeStepEnv(env)[source]
reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns: the initial observation.
Return type: observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:
a tuple of (observation, reward, done, info)
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after the previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type: tuple
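A TimeStep-style wrapper typically bundles the classic 4-tuple from step() into a single named record, so downstream code can write ts.reward instead of unpacking positionally. The sketch below uses a hypothetical namedtuple for illustration; lagom's actual TimeStep type and its fields may differ.

```python
from collections import namedtuple

# Hypothetical TimeStep container bundling what (observation, reward,
# done, info) would otherwise carry as a bare tuple.
TimeStep = namedtuple('TimeStep', ['observation', 'reward', 'done', 'info'])

def wrap_step(observation, reward, done, info):
    """Bundle the classic 4-tuple from step() into one named record."""
    return TimeStep(observation, reward, done, info)

ts = wrap_step([0.1, -0.2], 1.0, False, {})
print(ts.reward, ts.done)  # 1.0 False
```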
class lagom.envs.VecEnv(list_make_env)[source]

A vectorized environment that runs each sub-environment serially.

Each observation returned from the vectorized environment is a batch of observations, one per sub-environment. Likewise, step() expects a batch of actions, one per sub-environment.

Note

All sub-environments should share the identical observation and action spaces. In other words, a vector of multiple different environments is not supported.

Parameters:
  • list_make_env (list) – a list of functions, each returning an instantiated environment.
  • observation_space (Space) – observation space of the environment
  • action_space (Space) – action space of the environment
close()[source]

Close all environments.

It closes all the existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function itself does not close the environments; that should be handled in close_extras(). This is useful for parallelized environments.

Note

This will be called automatically when the object is garbage collected or the program exits.

close_extras()[source]

Clean up extra resources, e.g. beyond what’s in this base class.

get_images()[source]

Returns a batched RGB array with shape [N, H, W, C] from all environments.

Returns:a batched RGB array with shape [N, H, W, C]
Return type:ndarray
get_viewer()[source]

Returns an instantiated ImageViewer.

Returns:an image viewer
Return type:ImageViewer
render(mode='human')[source]

Render all the environments.

It first retrieves RGB images from all environments and uses GridImage to tile them into a single image. It then either returns the image array or displays the image on the screen via ImageViewer.

See the docstring in Env for more details about rendering.

reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
a tuple of (observations, rewards, dones, infos)
  • observations (list): a list of observations, each returned from one environment after executing the given action.
  • rewards (list): a list of scalar rewards, each returned from one environment.
  • dones (list): a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list): a list of dictionaries of additional information, each returned from one environment.
Return type:tuple
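Since VecEnv runs its sub-environments serially, step() amounts to fanning the list of actions out over a Python loop and collecting the four result lists. The sketch below illustrates that batching contract with hypothetical names and a toy sub-environment; it is not lagom's actual implementation (which also manages seeding, rendering, and cleanup).

```python
class SerialVecEnv:
    """Sketch of the serial batching that VecEnv describes: hold a list
    of sub-environments and fan a list of actions out to them, one each."""
    def __init__(self, list_make_env):
        self.envs = [make() for make in list_make_env]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return observations, rewards, dones, infos

# A toy environment standing in for a gym-style Env:
class CountEnv:
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += action
        return self.t, float(action), self.t >= 3, {}

venv = SerialVecEnv([CountEnv, CountEnv])
venv.reset()
obs, rewards, dones, infos = venv.step([1, 3])
print(obs, dones)  # [1, 3] [False, True]
```

Note that every returned quantity is a list of length N (the number of sub-environments), matching the batched contract documented above.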
unwrapped

Unwrap this vectorized environment.

Useful when several wrappers have been applied in sequence; it provides access to the original vectorized environment underneath them.

class lagom.envs.VecEnvWrapper(env)[source]

Wraps the vectorized environment to allow a modular transformation.

This class is the base class for all wrappers of vectorized environments. A subclass can override some methods to change the behavior of the original vectorized environment without touching the original code.

Note

Don’t forget to call super().__init__(env) if the subclass overrides __init__().

close_extras()[source]

Clean up extra resources, e.g. beyond what’s in this base class.

get_images()[source]

Returns a batched RGB array with shape [N, H, W, C] from all environments.

Returns:a batched RGB array with shape [N, H, W, C]
Return type:ndarray
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still in progress, it will be aborted.

Returns:a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
a tuple of (observations, rewards, dones, infos)
  • observations (list): a list of observations, each returned from one environment after executing the given action.
  • rewards (list): a list of scalar rewards, each returned from one environment.
  • dones (list): a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list): a list of dictionaries of additional information, each returned from one environment.
Return type:tuple
unwrapped

Unwrap this vectorized environment.

Useful when several wrappers have been applied in sequence; it provides access to the original vectorized environment underneath them.
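The wrapper pattern this class describes is: keep a reference to the wrapped environment, delegate by default, and override only the method whose behavior you want to change. The standalone sketch below illustrates that pattern with a hypothetical reward-scaling wrapper and a minimal stand-in base class; a real subclass of lagom's VecEnvWrapper would instead call super().__init__(env) as the Note above requires.

```python
class VecEnvBase:
    """Minimal stand-in for the vectorized-environment interface,
    so the wrapper pattern below can run standalone."""
    def step(self, actions):
        observations = list(actions)
        return observations, [1.0] * len(actions), [False] * len(actions), [{}] * len(actions)

class RewardScaleWrapper:
    """Sketch of the VecEnvWrapper pattern: wrap an env, delegate by
    default, and override only step() to transform the rewards."""
    def __init__(self, env, scale=0.1):
        self.env = env  # the wrapped vectorized environment
        self.scale = scale

    def step(self, actions):
        observations, rewards, dones, infos = self.env.step(actions)
        # Only the rewards are modified; everything else passes through.
        return observations, [r * self.scale for r in rewards], dones, infos

    @property
    def unwrapped(self):
        # Peel off any nesting of wrappers down to the base environment.
        return getattr(self.env, 'unwrapped', self.env)

venv = RewardScaleWrapper(VecEnvBase(), scale=0.5)
obs, rewards, dones, infos = venv.step([0, 1])
print(rewards)  # [0.5, 0.5]
```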

lagom.envs.make_vec_env(make_env, num_env, init_seed)[source]

Create a vectorized environment whose sub-environments are each seeded with a different random seed.

Example:

>>> import gym
>>> make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
<VecEnv: 3, CartPole-v1>
Parameters:
  • make_env (function) – a function to create an environment
  • num_env (int) – number of environments to create.
  • init_seed (int) – initial seed for Seeder to sample random seeds.
Returns: created vectorized environment
Return type: VecEnv
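The role of init_seed here is to deterministically derive one seed per sub-environment, so the N copies do not all play out identical trajectories. The sketch below shows one possible derivation using Python's stdlib RNG; the function name and scheme are illustrative assumptions, and lagom's Seeder may work differently.

```python
import random

def make_seeds(init_seed, num_env):
    """Sketch: derive one seed per sub-environment from a single
    init_seed, deterministically (same init_seed -> same seeds)."""
    rng = random.Random(init_seed)
    return [rng.randrange(2**31) for _ in range(num_env)]

seeds = make_seeds(0, 3)
print(len(seeds))  # 3
```

Each seed would then be passed to one sub-environment at creation time, so the whole vectorized environment is reproducible from init_seed alone.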