lagom.envs
class lagom.envs.VecEnv(list_make_env)

    A vectorized environment that runs each sub-environment serially.

    Each observation returned from the vectorized environment is a batch of observations, one per sub-environment, and step() expects a batch of actions, one per sub-environment.

    Note: All sub-environments must share identical observation and action spaces. In other words, a vector of multiple different environments is not supported.

    Parameters:
        - list_make_env (list) – a list of functions, each returning an instantiated environment.
        - observation_space (Space) – observation space of the environment
        - action_space (Space) – action space of the environment
    close()

        Close all environments.

        It closes all existing image viewers, then calls close_extras() and sets closed to True.

        Warning: This function itself does not close the environments; that should be handled in close_extras(). This is useful for parallelized environments.

        Note: This is called automatically when the object is garbage collected or the program exits.
    get_images()

        Returns a batched RGB array with shape [N, H, W, C] from all environments.

        Returns: imgs (ndarray) – a batched RGB array with shape [N, H, W, C]

    get_viewer()

        Returns an instantiated ImageViewer.

        Returns: viewer (ImageViewer) – an image viewer
    render(mode='human')

        Render all the environments.

        It first retrieves RGB images from all environments and uses GridImage to tile them into a single image, then either returns the image array or displays the image on screen via ImageViewer. See the docstring of Env for more details about rendering.
    reset()

        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations (list) – a list of initial observations from all environments.

    step(actions)

        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.

        Returns:
            - observations (list) – a list of observations, each returned from one environment after executing the given action.
            - rewards (list) – a list of scalar rewards, each returned from one environment.
            - dones (list) – a list of booleans indicating whether the episode has terminated, each returned from one environment.
            - infos (list) – a list of dictionaries of additional information, each returned from one environment.
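To illustrate the batched lists-in, lists-out contract of step() and reset(), here is a minimal serial sketch. Both `ToyEnv` and `SerialVecEnv` are hypothetical stand-ins for illustration, not lagom's actual implementation:

```python
class ToyEnv:
    """Hypothetical sub-environment: counts steps, echoes the action as reward."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3  # terminate after 3 steps
        return self.t, float(action), done, {}


class SerialVecEnv:
    """Sketch of a serial vectorized environment: one list entry per sub-env."""
    def __init__(self, list_make_env):
        self.envs = [make() for make in list_make_env]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        # transpose a list of (obs, reward, done, info) tuples into four lists
        observations, rewards, dones, infos = map(list, zip(*results))
        return observations, rewards, dones, infos


venv = SerialVecEnv([ToyEnv for _ in range(3)])
obs = venv.reset()                                  # [0, 0, 0]
obs, rewards, dones, infos = venv.step([1, 2, 3])   # rewards == [1.0, 2.0, 3.0]
```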
    unwrapped

        Unwrap this vectorized environment.

        Useful when sequential wrappers are applied; it gives access to information from the original vectorized environment.
class lagom.envs.VecEnvWrapper(env)

    Wraps a vectorized environment to allow modular transformations.

    This class is the base class for all wrappers of vectorized environments. A subclass can override methods to change the behavior of the original vectorized environment without touching the original code.

    Note: Don't forget to call super().__init__(env) if the subclass overrides __init__().
    get_images()

        Returns a batched RGB array with shape [N, H, W, C] from all environments.

        Returns: imgs (ndarray) – a batched RGB array with shape [N, H, W, C]

    reset()

        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations (list) – a list of initial observations from all environments.

    step(actions)

        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.

        Returns:
            - observations (list) – a list of observations, each returned from one environment after executing the given action.
            - rewards (list) – a list of scalar rewards, each returned from one environment.
            - dones (list) – a list of booleans indicating whether the episode has terminated, each returned from one environment.
            - infos (list) – a list of dictionaries of additional information, each returned from one environment.

    unwrapped

        Unwrap this vectorized environment.

        Useful when sequential wrappers are applied; it gives access to information from the original vectorized environment.
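The subclassing pattern, including the required super().__init__(env) call, can be sketched as follows. The class names here are illustrative stand-ins, not lagom's actual classes:

```python
class BaseWrapper:
    """Stand-in for VecEnvWrapper: delegates to the wrapped env by default."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, actions):
        return self.env.step(actions)


class DoubleReward(BaseWrapper):
    """Example override: doubles every reward, leaves everything else untouched."""
    def __init__(self, env):
        super().__init__(env)  # don't forget this call when overriding __init__

    def step(self, actions):
        observations, rewards, dones, infos = self.env.step(actions)
        return observations, [2.0 * r for r in rewards], dones, infos


class FakeVecEnv:
    """Toy vectorized environment for demonstration."""
    def reset(self):
        return [0, 0]

    def step(self, actions):
        return [1, 1], [1.0, 1.0], [False, False], [{}, {}]


wrapped = DoubleReward(FakeVecEnv())
_, rewards, _, _ = wrapped.step([0, 0])  # rewards == [2.0, 2.0]
```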
lagom.envs.make_vec_env(make_env, num_env, init_seed)

    Create a vectorized environment, with each sub-environment associated with a different random seed.

    Example:

    >>> import gym
    >>> make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
    <VecEnv: 3, CartPole-v1>

    Parameters:
        - make_env (function) – a function to create an environment
        - num_env (int) – number of environments to create
        - init_seed (int) – initial seed for Seeder to sample random seeds

    Returns: env (VecEnv) – created vectorized environment
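The per-environment seeding pattern can be sketched as follows. This is a hypothetical reconstruction: it assumes each environment exposes a seed() method, and uses the stdlib random module in place of lagom's actual Seeder:

```python
import random

def make_vec_env_sketch(make_env, num_env, init_seed):
    """Create num_env environments, each seeded differently but reproducibly."""
    rng = random.Random(init_seed)
    seeds = [rng.randrange(2**31) for _ in range(num_env)]
    envs = []
    for seed in seeds:
        env = make_env()
        env.seed(seed)  # assumes the environment has a seed() method
        envs.append(env)
    return envs


class SeedableEnv:
    """Toy environment that just records its seed."""
    def seed(self, seed):
        self.seed_value = seed


envs = make_vec_env_sketch(SeedableEnv, 4, 0)
# same init_seed -> same per-environment seeds, so runs are reproducible
```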
Wrappers
lagom.envs.wrappers.get_wrapper(env, name)

    Return the wrapper of a given name from a wrapped environment.

    Note: If no such wrapper is found, None is returned.

    Parameters:
        - env (Env) – environment
        - name (str) – name of the wrapper

    Returns: out (Env) – wrapped environment
lagom.envs.wrappers.get_all_wrappers(env)

    Return a list of wrapper names of a wrapped environment.

    Parameters: env (Env) – wrapped environment

    Returns: out (list) – list of string names of wrappers
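Both helpers can be understood as walking the chain of wrapped environments. A sketch, assuming (as in gym-style wrappers) that each wrapper stores the inner environment in self.env; the demo wrapper classes below are hypothetical:

```python
def get_wrapper_sketch(env, name):
    """Walk the wrapper chain; return the first wrapper whose class name matches."""
    while hasattr(env, 'env'):
        if env.__class__.__name__ == name:
            return env
        env = env.env
    return None  # no such wrapper found

def get_all_wrappers_sketch(env):
    """Collect the class names of all wrappers around the base environment."""
    names = []
    while hasattr(env, 'env'):
        names.append(env.__class__.__name__)
        env = env.env
    return names


class BaseEnv:
    """Toy unwrapped environment (no .env attribute)."""

class ScaleRewardDemo:
    def __init__(self, env):
        self.env = env

class ClipActionDemo:
    def __init__(self, env):
        self.env = env


env = ClipActionDemo(ScaleRewardDemo(BaseEnv()))
get_all_wrappers_sketch(env)  # ['ClipActionDemo', 'ScaleRewardDemo']
```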
class lagom.envs.wrappers.ClipAction(env)

    Clip a continuous action to the valid bounds of the action space.
class lagom.envs.wrappers.FlattenObservation(env)

    Observation wrapper that flattens the observation.
class lagom.envs.wrappers.NormalizeAction(env)

    Rescale a continuous action from [-1, 1] to the environment's native action range.
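The transformations behind ClipAction and NormalizeAction are one-liners; a sketch for scalar actions with bounds [low, high] (function names are illustrative, not lagom's API):

```python
def rescale_action(action, low, high):
    """Affine map from [-1, 1] to the environment's native range [low, high]."""
    return low + 0.5 * (action + 1.0) * (high - low)

def clip_action(action, low, high):
    """Clip an action to the valid bound, as ClipAction does."""
    return max(low, min(high, action))


rescale_action(0.0, -2.0, 2.0)  # midpoint of [-1, 1] maps to midpoint of [-2, 2]
clip_action(5.0, -2.0, 2.0)     # out-of-range action is clipped to the upper bound
```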
class lagom.envs.wrappers.LazyFrames(frames, lz4_compress=False)

    Ensures common frames are stored only once, to optimize memory use.

    To further reduce memory use, lz4 compression of the observations can optionally be turned on.

    Note: This object should only be converted to a numpy array just before the forward pass.
class lagom.envs.wrappers.FrameStack(env, num_stack, lz4_compress=False)

    Observation wrapper that stacks observations in a rolling manner.

    For example, if the number of stacks is 4, the returned observation contains the most recent 4 observations. For the environment 'Pendulum-v0', the original observation is an array with shape [3], so stacking 4 observations gives a processed observation of shape [3, 4].

    Note: To be memory efficient, the stacked observations are wrapped by LazyFrames.

    Note: The observation space must be of Box type. If Dict is used as the observation space, apply FlattenDictWrapper first.

    Example:

    >>> import gym
    >>> env = gym.make('PongNoFrameskip-v0')
    >>> env = FrameStack(env, 4)
    >>> env.observation_space
    Box(4, 210, 160, 3)

    Parameters:
        - env (Env) – environment object
        - num_stack (int) – number of stacks
    reset(**kwargs)

        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation.

    step(action)

        Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment's state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters: action (object) – an action provided by the agent

        Returns:
            - observation (object) – agent's observation of the current environment
            - reward (float) – amount of reward returned after the previous action
            - done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
            - info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
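The rolling stack can be sketched with a fixed-length deque: on reset the initial observation is repeated to fill the stack, and each new observation pushes out the oldest one. This is a toy version without LazyFrames or lz4 compression:

```python
from collections import deque

class FrameStackSketch:
    """Keep the most recent num_stack observations, padding at reset."""
    def __init__(self, num_stack):
        self.num_stack = num_stack
        self.frames = deque(maxlen=num_stack)

    def reset(self, observation):
        # pad the stack by repeating the first observation
        for _ in range(self.num_stack):
            self.frames.append(observation)
        return list(self.frames)

    def observe(self, observation):
        # deque with maxlen drops the oldest frame automatically
        self.frames.append(observation)
        return list(self.frames)


stack = FrameStackSketch(4)
stack.reset(0)      # [0, 0, 0, 0]
stack.observe(1)    # [0, 0, 0, 1]
```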
class lagom.envs.wrappers.GrayScaleObservation(env, keep_dim=False)

    Convert the image observation from RGB to grayscale.
class lagom.envs.wrappers.ResizeObservation(env, size)

    Downsample the image observation to a square image.
class lagom.envs.wrappers.ScaleReward(env, scale=0.01)

    Scale the reward by a constant factor.

    Note: Reward scaling is incredibly important and can drastically impact performance, e.g. for PPO.

    Example:

    >>> from lagom.envs import make_gym_env
    >>> env = make_gym_env(env_id='CartPole-v1', seed=0)
    >>> env = ScaleReward(env, scale=0.1)
    >>> env.reset()
    >>> observation, reward, done, info = env.step(env.action_space.sample())
    >>> reward
    0.1

    Parameters:
        - env (Env) – environment
        - scale (float) – reward scaling factor
class lagom.envs.wrappers.ScaledFloatFrame(env)

    Convert an image frame to floats in the range [0, 1] by dividing by 255.

    Warning: Do NOT use this wrapper for DQN! It will break the memory optimization.
class lagom.envs.wrappers.TimeAwareObservation(env)

    Augment the observation with the current time step in the trajectory.

    Note: Currently this only works with one-dimensional observation spaces; pixel observation spaces are not yet supported.
    reset(**kwargs)

        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation.

    step(action)

        Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment's state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters: action (object) – an action provided by the agent

        Returns:
            - observation (object) – agent's observation of the current environment
            - reward (float) – amount of reward returned after the previous action
            - done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
            - info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
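Appending the time step can be sketched as follows, for observations represented as plain Python lists (the class is a hypothetical toy, not lagom's actual wrapper):

```python
class TimeAwareSketch:
    """Append the current time step to a one-dimensional observation."""
    def __init__(self):
        self.t = 0

    def reset(self, observation):
        self.t = 0  # the counter restarts with each trajectory
        return observation + [self.t]

    def observe(self, observation):
        self.t += 1
        return observation + [self.t]


w = TimeAwareSketch()
w.reset([0.5])    # [0.5, 0]
w.observe([0.7])  # [0.7, 1]
```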
class lagom.envs.wrappers.VecMonitor(env, deque_size=100)

    Record episode reward, horizon, and elapsed time, and report them when an episode terminates.
    reset()

        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations (list) – a list of initial observations from all environments.

    step(actions)

        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.

        Returns:
            - observations (list) – a list of observations, each returned from one environment after executing the given action.
            - rewards (list) – a list of scalar rewards, each returned from one environment.
            - dones (list) – a list of booleans indicating whether the episode has terminated, each returned from one environment.
            - infos (list) – a list of dictionaries of additional information, each returned from one environment.
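The per-environment episode bookkeeping behind VecMonitor can be sketched as follows. This is a hypothetical simplification: lagom's actual VecMonitor also records elapsed time, while this sketch tracks only return and horizon:

```python
from collections import deque

class EpisodeMonitorSketch:
    """Track per-env running return and horizon; log them when an episode ends."""
    def __init__(self, num_env, deque_size=100):
        self.returns = [0.0] * num_env
        self.horizons = [0] * num_env
        # keep only the most recent episodes, as VecMonitor's deque_size does
        self.episode_returns = deque(maxlen=deque_size)
        self.episode_horizons = deque(maxlen=deque_size)

    def record(self, rewards, dones):
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.returns[i] += reward
            self.horizons[i] += 1
            if done:
                self.episode_returns.append(self.returns[i])
                self.episode_horizons.append(self.horizons[i])
                self.returns[i] = 0.0
                self.horizons[i] = 0


m = EpisodeMonitorSketch(num_env=2)
m.record([1.0, 0.5], [False, False])
m.record([1.0, 0.5], [True, False])   # env 0 finished a 2-step episode, return 2.0
```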
class lagom.envs.wrappers.VecStandardizeObservation(env, clip=10.0, constant_moments=None)

    Standardize the observations with a running estimate of their mean and variance.

    Warning: To evaluate an agent trained on standardized observations, remember to save and load the observation scalings; otherwise the measured performance will be incorrect.

    Parameters:
        - env (VecEnv) – a vectorized environment
        - clip (float) – clipping range of the standardized observation, i.e. [-clip, clip]
        - constant_moments (tuple) – a tuple of constant mean and variance used to standardize the observation. If provided, the running average is ignored.
    reset()

        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations (list) – a list of initial observations from all environments.

    step(actions)

        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.

        Returns:
            - observations (list) – a list of observations, each returned from one environment after executing the given action.
            - rewards (list) – a list of scalar rewards, each returned from one environment.
            - dones (list) – a list of booleans indicating whether the episode has terminated, each returned from one environment.
            - infos (list) – a list of dictionaries of additional information, each returned from one environment.
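Running-moments standardization of this kind is commonly implemented with Welford-style incremental estimates. A scalar, per-feature sketch (the class and function names are illustrative, not lagom's API):

```python
class RunningMomentsSketch:
    """Incrementally estimate mean and variance of a stream of scalars (Welford)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        return self.m2 / self.n if self.n > 0 else 1.0


def standardize(x, moments, clip=10.0):
    """Standardize and clip a value, as VecStandardizeObservation does per feature."""
    z = (x - moments.mean) / ((moments.var + 1e-8) ** 0.5)
    return max(-clip, min(clip, z))


rm = RunningMomentsSketch()
for x in [1.0, 2.0, 3.0, 4.0]:
    rm.update(x)
# rm.mean == 2.5, rm.var == 1.25 (population variance of the stream)
```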
class lagom.envs.wrappers.VecStandardizeReward(env, clip=10.0, gamma=0.99, constant_var=None)

    Standardize the reward with a running estimate of its variance.

    Warning: We do not subtract the running mean from the reward; we only divide it by the running standard deviation, because subtracting the mean would alter the reward shape and might degrade performance. Note that the transformation is applied from the second incoming reward onward, keeping the first reward unchanged; otherwise the first reward would have too large a magnitude (and simply be clipped), since the mean is not subtracted.

    Note: On each reset(), the self.all_returns buffer is not cleared. Because of the discount factor (< 1), the running averages converge after some iterations. For this reason, a discount factor of 1.0 is not allowed, since it would lead to unbounded growth of the reward running averages.

    Parameters:
        - env (VecEnv) – a vectorized environment
        - clip (float) – clipping range of the standardized reward, i.e. [-clip, clip]
        - gamma (float) – discount factor. The value 1.0 must not be used.
        - constant_var (ndarray) – constant variance used to standardize the reward. If provided, the running average is ignored.
    reset()

        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations (list) – a list of initial observations from all environments.

    step(actions)

        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.

        Returns:
            - observations (list) – a list of observations, each returned from one environment after executing the given action.
            - rewards (list) – a list of scalar rewards, each returned from one environment.
            - dones (list) – a list of booleans indicating whether the episode has terminated, each returned from one environment.
            - infos (list) – a list of dictionaries of additional information, each returned from one environment.
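The scaling by the standard deviation of the discounted return can be sketched as follows, for a single scalar reward stream. This is a hypothetical simplification of the wrapper described above (no mean subtraction; the first reward passes through unchanged):

```python
class RewardScalerSketch:
    """Divide rewards by the running std of the discounted return."""
    def __init__(self, gamma=0.99, clip=10.0):
        assert gamma < 1.0, 'gamma=1.0 would let the running return grow unbounded'
        self.gamma = gamma
        self.clip = clip
        self.ret = 0.0  # running discounted return
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def scale(self, reward):
        # accumulate the discounted return and update its running variance (Welford)
        self.ret = self.gamma * self.ret + reward
        self.n += 1
        delta = self.ret - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (self.ret - self.mean)
        if self.n == 1:
            return reward  # first reward passes through unchanged (see warning above)
        std = (self.m2 / self.n + 1e-8) ** 0.5
        return max(-self.clip, min(self.clip, reward / std))


s = RewardScalerSketch(gamma=0.9)
s.scale(1.0)  # first reward unchanged: 1.0
s.scale(1.0)  # subsequent rewards divided by the running std, then clipped
```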
class lagom.envs.wrappers.StepInfo(done: bool, info: dict)

    Defines a set of information for each time step.

    A StepInfo is returned from each step and reset of an environment. It contains properties of the transition and additional information.
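A container of this shape can be sketched with a dataclass. The convenience `__getitem__` below is a hypothetical addition for illustration, not lagom's exact interface:

```python
from dataclasses import dataclass, field

@dataclass
class StepInfoSketch:
    """Per-timestep transition properties plus auxiliary info."""
    done: bool
    info: dict = field(default_factory=dict)

    def __getitem__(self, key):
        # convenience access into the auxiliary info dict
        return self.info[key]


s = StepInfoSketch(done=False, info={'raw_reward': 1.0})
s.done             # False
s['raw_reward']    # 1.0
```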
class lagom.envs.wrappers.VecStepInfo(env)
    reset()

        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations (list) – a list of initial observations from all environments.

    step(actions)

        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.

        Returns:
            - observations (list) – a list of observations, each returned from one environment after executing the given action.
            - rewards (list) – a list of scalar rewards, each returned from one environment.
            - dones (list) – a list of booleans indicating whether the episode has terminated, each returned from one environment.
            - infos (list) – a list of dictionaries of additional information, each returned from one environment.