lagom.networks: Networks¶

class lagom.networks.Module(**kwargs)[source]¶

Wraps PyTorch nn.Module to provide additional helper functions.

from_vec(x)[source]¶

Set the network parameters from a single flattened vector.

Parameters:	x (Tensor) – A single flattened vector of the network parameters; its size must match the total number of network parameters.

load(f)[source]¶

Load the network parameters from a file.

It follows the approach recommended in the PyTorch documentation for saving and loading a model.

Parameters:	f (str) – file path.

num_params¶: Returns the total number of parameters in the neural network.

num_trainable_params¶: Returns the total number of trainable parameters in the neural network.

num_untrainable_params¶: Returns the total number of untrainable parameters in the neural network.
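As a rough illustration of how the three counts relate, the sketch below tallies parameters from their shapes. The `ParamSpec` record is hypothetical, and reading "trainable" as `requires_grad=True` is an assumption; this is not lagom's implementation.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ParamSpec:
    """Hypothetical stand-in for a parameter tensor: a shape plus a trainable flag."""
    shape: tuple
    requires_grad: bool = True


def count_params(params):
    """Return (total, trainable, untrainable) element counts."""
    total = sum(prod(p.shape) for p in params)
    trainable = sum(prod(p.shape) for p in params if p.requires_grad)
    return total, trainable, total - trainable


# Shapes of nn.Linear(3, 4) followed by a frozen nn.Linear(4, 2).
params = [
    ParamSpec((4, 3)), ParamSpec((4,)),
    ParamSpec((2, 4), requires_grad=False), ParamSpec((2,), requires_grad=False),
]
total, trainable, untrainable = count_params(params)
print(total, trainable, untrainable)  # 26 16 10
```

The total is always the sum of the trainable and untrainable counts.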

save(f)[source]¶

Save the network parameters to a file.

It follows the approach recommended in the PyTorch documentation for saving a model.

Note

It uses the highest pickle protocol to serialize the network parameters.

Parameters:	f (str) – file path.

to_vec()[source]¶: Returns the network parameters as a single flattened vector.
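The round trip between to_vec and from_vec can be pictured with plain Python lists. In this sketch the parameters are held in a hypothetical dict that maps a name to a (shape, flat values) pair. It only illustrates the flatten-and-slice idea and is not lagom's code. PyTorch ships comparable helpers, torch.nn.utils.parameters_to_vector and torch.nn.utils.vector_to_parameters; whether lagom uses them internally is not stated here.

```python
from math import prod


def to_vec(params):
    """Concatenate all parameters (name -> (shape, flat values)) into one flat list."""
    return [v for _, values in params.values() for v in values]


def from_vec(params, x):
    """Write the flat vector x back into params, slicing by each parameter's size."""
    expected = sum(prod(shape) for shape, _ in params.values())
    if len(x) != expected:
        raise ValueError(f"expected {expected} values, got {len(x)}")
    offset = 0
    for name, (shape, _) in list(params.items()):
        size = prod(shape)
        params[name] = (shape, list(x[offset:offset + size]))
        offset += size


params = {"weight": ((2, 2), [1.0, 2.0, 3.0, 4.0]), "bias": ((2,), [0.5, -0.5])}
vec = to_vec(params)                     # six values, weight first, then bias
from_vec(params, [2 * v for v in vec])   # write a modified vector back
```

A vector of the wrong length is rejected, which is the "consistent size" requirement in practice.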

lagom.networks.ortho_init(module, nonlinearity=None, weight_scale=1.0, constant_bias=0.0)[source]¶

Applies orthogonal initialization for the parameters of a given module.

Parameters:

module (nn.Module) – A module to apply orthogonal initialization over its parameters.
nonlinearity (str, optional) – Nonlinearity that follows the forward pass of the module. When nonlinearity is not None, the gain is calculated from it and weight_scale is ignored. Default: None
weight_scale (float, optional) – Scaling factor to initialize the weight. Ignored when nonlinearity is not None. Default: 1.0
constant_bias (float, optional) – Constant value to initialize the bias. Default: 0.0

Note

Currently, only elementary neural network layers are supported, e.g. nn.Linear, nn.Conv2d, nn.LSTM. Modules that contain submodules are not supported.

Example:

>>> a = nn.Linear(2, 3)
>>> ortho_init(a)
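To make "orthogonal initialization" concrete, the dependency-free sketch below builds a weight matrix whose rows are mutually orthogonal and scaled by a gain. It uses classical Gram-Schmidt purely for illustration; it is not how lagom or PyTorch builds the matrix. The function name `orthogonal_rows` is hypothetical. The gain of sqrt(2) is the value PyTorch's nn.init.calculate_gain returns for 'relu'.

```python
import math
import random


def orthogonal_rows(rows, cols, gain=1.0, seed=0):
    """Return a rows x cols matrix (rows <= cols) with orthogonal rows of norm `gain`."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        # Gram-Schmidt: remove the components along the rows found so far.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:  # skip (very unlikely) degenerate draws
            basis.append([vi / norm for vi in v])
    return [[gain * vi for vi in b] for b in basis]


W = orthogonal_rows(2, 3, gain=math.sqrt(2.0))
row_dot = sum(a * b for a, b in zip(W[0], W[1]))   # close to 0
row_sq_norm = sum(a * a for a in W[0])             # close to gain ** 2
```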

lagom.networks.linear_lr_scheduler(optimizer, N, min_lr)[source]¶

Defines a linear learning rate scheduler.

Parameters:

optimizer (Optimizer) – optimizer
N (int) – maximum bound for the scheduling iteration, e.g. total number of epochs, iterations or time steps.
min_lr (float) – lower bound of the learning rate
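One plausible reading of this schedule, written as plain arithmetic: the learning rate decays linearly from its initial value towards zero at step N and is clipped from below at min_lr. The exact formula lagom uses is not given above, so the function `linear_lr` and its `initial_lr` argument are assumptions made for illustration.

```python
def linear_lr(n, initial_lr, N, min_lr):
    """Learning rate at step n: linear decay towards 0 at step N, never below min_lr."""
    return max(min_lr, initial_lr * (1.0 - n / N))


# Hypothetical settings: start at 1.0, schedule over 10 steps, floor at 0.2.
schedule = [linear_lr(n, initial_lr=1.0, N=10, min_lr=0.2) for n in range(11)]
print(schedule[0], schedule[5], schedule[10])  # 1.0 0.5 0.2
```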

lagom.networks.make_fc(input_dim, hidden_sizes)[source]¶

Returns a ModuleList of fully connected layers.

Note

All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

Example:

>>> make_fc(3, [4, 5, 6])
ModuleList(
  (0): Linear(in_features=3, out_features=4, bias=True)
  (1): Linear(in_features=4, out_features=5, bias=True)
  (2): Linear(in_features=5, out_features=6, bias=True)
)

Parameters:

input_dim (int) – input dimension in the first fully connected layer.
hidden_sizes (list) – a list of hidden sizes, each for one fully connected layer.

Returns:	fc – A ModuleList of fully connected layers.
Return type:	nn.ModuleList
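The layer dimensions in the example above follow a simple chaining rule: each layer takes the previous layer's output size as its input size. A minimal sketch of that rule follows; `fc_layer_dims` is a hypothetical helper, not part of lagom.

```python
def fc_layer_dims(input_dim, hidden_sizes):
    """Return the (in_features, out_features) pair of each fully connected layer."""
    in_dims = [input_dim] + list(hidden_sizes[:-1])
    return list(zip(in_dims, hidden_sizes))


print(fc_layer_dims(3, [4, 5, 6]))  # [(3, 4), (4, 5), (5, 6)]
```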

lagom.networks.make_cnn(input_channel, channels, kernels, strides, paddings)[source]¶

Returns a ModuleList of 2D convolution layers.

Note

All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

Example:

>>> make_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0])
ModuleList(
  (0): Conv2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
)

Parameters:

input_channel (int) – input channel in the first convolution layer.
channels (list) – a list of channels, each for one convolution layer.
kernels (list) – a list of kernels, each for one convolution layer.
strides (list) – a list of strides, each for one convolution layer.
paddings (list) – a list of paddings, each for one convolution layer.

Returns:	cnn – A ModuleList of 2D convolution layers.
Return type:	nn.ModuleList
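When choosing kernels, strides and paddings, it helps to track the spatial size after each layer. The sketch below applies the output-size formula documented for nn.Conv2d with dilation 1 to the example configuration. The 32 x 32 input is a hypothetical choice, and `conv2d_out_size` is not a lagom function.

```python
def conv2d_out_size(size, kernel, stride, padding):
    """Output height/width of nn.Conv2d with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # hypothetical square input, 32 x 32
sizes = []
for k, s, p in zip([4, 3], [2, 1], [1, 0]):
    size = conv2d_out_size(size, k, s, p)
    sizes.append(size)
print(sizes)  # [16, 14]
```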

lagom.networks.make_transposed_cnn(input_channel, channels, kernels, strides, paddings, output_paddings)[source]¶

Returns a ModuleList of 2D transposed convolution layers.

Note

All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

Example:

>>> make_transposed_cnn(input_channel=3,
...                     channels=[16, 32],
...                     kernels=[4, 3],
...                     strides=[2, 1],
...                     paddings=[1, 0],
...                     output_paddings=[1, 0])
ModuleList(
  (0): ConvTranspose2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
  (1): ConvTranspose2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
)

Parameters:

input_channel (int) – input channel in the first transposed convolution layer.
channels (list) – a list of channels, each for one transposed convolution layer.
kernels (list) – a list of kernels, each for one transposed convolution layer.
strides (list) – a list of strides, each for one transposed convolution layer.
paddings (list) – a list of paddings, each for one transposed convolution layer.
output_paddings (list) – a list of output paddings, each for one transposed convolution layer.

Returns:	transposed_cnn – A ModuleList of 2D transposed convolution layers.
Return type:	nn.ModuleList
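The effect of output_paddings is easiest to see in the spatial-size arithmetic. The sketch below applies the output-size formula documented for nn.ConvTranspose2d with dilation 1 to the example configuration. The 8 x 8 input is a hypothetical choice, and `conv_transpose2d_out_size` is not a lagom function.

```python
def conv_transpose2d_out_size(size, kernel, stride, padding, output_padding):
    """Output height/width of nn.ConvTranspose2d with dilation 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


size = 8  # hypothetical square input, 8 x 8
sizes = []
for k, s, p, op in zip([4, 3], [2, 1], [1, 0], [1, 0]):
    size = conv_transpose2d_out_size(size, k, s, p, op)
    sizes.append(size)
print(sizes)  # [17, 19]
```

Without the output padding of 1, the first layer would produce 16 instead of 17.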

class lagom.networks.MDNHead(in_features, out_features, num_density, device, **kwargs)[source]¶

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

loss(logit_pi, mean, std, target)[source]¶

Calculate the MDN loss function.

The loss function (negative log-likelihood) is defined by:

\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\prod_{d=1}^{D} \pi_{k}(x_{n, d}) \mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right)\]

For better numerical stability, we could use log-scale:

\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\exp \left\{ \sum_{d=1}^{D} \ln\pi_{k}(x_{n, d}) + \ln\mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right\} \right)\]

Note

One should always use the second formula, evaluated with the log-sum-exp trick. The first formula is numerically unstable and can produce +/- Inf or NaN values.

The log-sum-exp trick is defined by

\[\log\sum_{i=1}^{N}\exp(x_i) = a + \log\sum_{i=1}^{N}\exp(x_i - a)\]

where \(a = \max_i(x_i)\)
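The identity above can be checked with a few lines of plain Python. This is a minimal sketch rather than the library's implementation; in PyTorch code one would normally call torch.logsumexp.

```python
import math


def logsumexp(xs):
    """Compute log(sum(exp(x))) by shifting with the maximum, a = max(xs)."""
    a = max(xs)
    return a + math.log(sum(math.exp(x - a) for x in xs))


xs = [1000.0, 1001.0, 1002.0]
# Naive evaluation fails here: math.exp(1000.0) raises OverflowError.
print(logsumexp(xs))  # about 1002.4076
```

After the shift, the largest exponent is exactly 0, so the sum lies between 1 and N and cannot overflow.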

Parameters:

logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
target (Tensor) – target tensor, shape [N, D]

Returns:	loss – calculated loss
Return type:	Tensor
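For readers who want to trace the second (log-scale) formula term by term, the sketch below evaluates it on nested Python lists with the shapes listed above. It is an illustration under stated assumptions, not lagom's code: the mixing coefficients are taken to be a softmax of logit_pi over the K components, and the Gaussian density is evaluated at the target.

```python
import math


def logsumexp(xs):
    a = max(xs)
    return a + math.log(sum(math.exp(x - a) for x in xs))


def log_normal(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def mdn_loss(logit_pi, mean, std, target):
    """Negative log-likelihood; logit_pi, mean, std are [N][K][D] lists, target is [N][D]."""
    N, K, D = len(logit_pi), len(logit_pi[0]), len(logit_pi[0][0])
    total = 0.0
    for n in range(N):
        per_component = []
        for k in range(K):
            s = 0.0
            for d in range(D):
                # log pi_k: log-softmax of the logits over the K components (assumption).
                log_pi = logit_pi[n][k][d] - logsumexp([logit_pi[n][j][d] for j in range(K)])
                s += log_pi + log_normal(target[n][d], mean[n][k][d], std[n][k][d])
            per_component.append(s)
        # Outer log-sum-exp over the K components.
        total += logsumexp(per_component)
    return -total / N


# One sample, one standard-normal component, target at the mean.
loss = mdn_loss([[[0.0]]], [[[0.0]]], [[[1.0]]], [[0.0]])
print(loss)  # 0.5 * ln(2 * pi), about 0.9189
```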

sample(logit_pi, mean, std, tau=1.0)[source]¶

Sample from Gaussian mixtures using the reparameterization trick.

First, sample categorically over the mixing coefficients to select a specific Gaussian component.
Then, sample from the selected Gaussian distribution.

Parameters:

logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
tau (float) – temperature during sampling; it controls uncertainty.
    * If \(\tau > 1\): increase uncertainty
    * If \(\tau < 1\): decrease uncertainty

Returns:	x – sampled data with shape [N, D]
Return type:	Tensor
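The two sampling steps can be sketched for a single one-dimensional mixture as follows. How tau enters the computation is not specified above. This sketch assumes the logits are divided by tau and the standard deviation is scaled by sqrt(tau), a convention found in some MDN implementations. The function `sample_mixture` is hypothetical.

```python
import math
import random


def sample_mixture(logit_pi, mean, std, tau=1.0, rng=random):
    """Draw one value from a 1-D Gaussian mixture given per-component lists of length K."""
    # Temperature-scaled softmax weights (shifted by the max for stability).
    scaled = [logit / tau for logit in logit_pi]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # Step 1: pick a component index; choices() normalizes the weights itself.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Step 2: reparameterized draw from the chosen Gaussian, mean + eps * std.
    eps = rng.gauss(0.0, 1.0)
    return mean[k] + eps * std[k] * math.sqrt(tau)


rng = random.Random(0)
x = sample_mixture([0.0, 5.0], [-1.0, 1.0], [0.1, 0.1], tau=0.5, rng=rng)
```

Under this assumption, a larger tau flattens the component weights and widens each Gaussian, which matches the "increase uncertainty" behaviour described for \(\tau > 1\).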

Recurrent Neural Networks¶

class lagom.networks.LayerNormLSTMCell(input_size, hidden_size)[source]¶

class lagom.networks.LSTMLayer(cell, *cell_args)[source]¶

class lagom.networks.StackedLSTM(num_layers, layer, first_layer_args, other_layer_args)[source]¶

lagom.networks.make_lnlstm(input_size, hidden_size, num_layers=1)[source]¶

RL components¶

class lagom.networks.CategoricalHead(feature_dim, num_action, device, **kwargs)[source]¶

Defines a module for a Categorical (discrete) action distribution.

Example

>>> import torch
>>> action_head = CategoricalHead(30, 4, 'cpu')
>>> action_head(torch.randn(2, 30))
Categorical(probs: torch.Size([2, 4]))

Parameters:

feature_dim (int) – number of input features
num_action (int) – number of discrete actions
device (torch.device) – PyTorch device
**kwargs – keyword arguments for more specifications.

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class lagom.networks.DiagGaussianHead(feature_dim, action_dim, device, std0, **kwargs)[source]¶

Defines a module for a diagonal Gaussian (continuous) action distribution whose standard deviation is state independent.

The network outputs the mean \(\mu(x)\) and holds a state-independent logarithm of the standard deviation \(\log\sigma\). Optimizing in log-space lets the parameter take any real value, negative or positive.

The standard deviation is obtained by applying the exponential function, \(\sigma = \exp(\log\sigma)\), which is always positive.

Example

>>> import torch
>>> action_head = DiagGaussianHead(10, 4, 'cpu', 0.45)
>>> action_dist = action_head(torch.randn(2, 10))
>>> action_dist.base_dist
Normal(loc: torch.Size([2, 4]), scale: torch.Size([2, 4]))
>>> action_dist.base_dist.stddev
tensor([[0.4500, 0.4500, 0.4500, 0.4500],
        [0.4500, 0.4500, 0.4500, 0.4500]], grad_fn=<ExpBackward>)

Parameters:

feature_dim (int) – number of input features
action_dim (int) – flat dimension of actions
device (torch.device) – PyTorch device
std0 (float) – initial standard deviation
**kwargs – keyword arguments for more specifications.
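The log-space parameterization and the diagonal structure can be illustrated with scalar arithmetic. The sketch assumes the log-standard deviation is initialized to log(std0), which is consistent with the stddev of 0.45 shown in the example above. It also assumes the log-probability of an action is the sum of per-dimension Gaussian log-densities. Neither detail is confirmed by the text, and `diag_gaussian_log_prob` is a hypothetical helper.

```python
import math

std0 = 0.45
log_std = math.log(std0)   # the state-independent parameter being optimized
std = math.exp(log_std)    # positive for any real value of log_std


def diag_gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian: sum of per-dimension log-densities."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((a - m) / std) ** 2
        for a, m in zip(action, mean)
    )


# Two action dimensions, evaluated at the mean with unit standard deviation.
lp = diag_gaussian_log_prob([0.0, 0.0], [0.0, 0.0], 1.0)
print(lp)  # -ln(2 * pi), about -1.8379
```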

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.