lagom.networks: Networks¶

class lagom.networks.Module(**kwargs)[source]¶
Wrap PyTorch nn.Module to provide more helper functions.

from_vec(x)[source]¶
Set the network parameters from a single flattened vector.
Parameters: x (Tensor) – a single flattened vector of the network parameters with consistent size.
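The round-trip behind from_vec (and its to_vec counterpart) can be sketched in plain Python. The helper names below are hypothetical; the real Module operates on torch Tensors (e.g. via torch.nn.utils.parameters_to_vector / vector_to_parameters):

```python
# Sketch of the flatten/unflatten round-trip behind to_vec/from_vec.
# Pure Python for illustration; helper names are hypothetical, not lagom's.

def flatten_params(params):
    """Concatenate parameter lists into one flat vector, remembering sizes."""
    flat, sizes = [], []
    for p in params:
        sizes.append(len(p))
        flat.extend(p)
    return flat, sizes

def unflatten_params(flat, sizes):
    """Inverse of flatten_params: split the flat vector back by stored size."""
    params, i = [], 0
    for n in sizes:
        params.append(flat[i:i + n])
        i += n
    return params

weights = [[0.1, 0.2], [0.3, 0.4, 0.5]]
vec, sizes = flatten_params(weights)
restored = unflatten_params(vec, sizes)
```

The "consistent size" requirement above corresponds to the flat vector's length matching the sum of the stored parameter sizes.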

load(f)[source]¶
Load the network parameters from a file.
It complies with the recommended approach for saving a model in the PyTorch documentation.
Parameters: f (str) – file path.

num_params¶
Returns the total number of parameters in the neural network.

num_trainable_params¶
Returns the total number of trainable parameters in the neural network.

num_untrainable_params¶
Returns the total number of untrainable parameters in the neural network.

save(f)[source]¶
Save the network parameters to a file.
It complies with the recommended approach for saving a model in the PyTorch documentation.
Note
It uses the highest pickle protocol to serialize the network parameters.
Parameters: f (str) – file path.


lagom.networks.ortho_init(module, nonlinearity=None, weight_scale=1.0, constant_bias=0.0)[source]¶
Applies orthogonal initialization to the parameters of a given module.
Parameters:
 module (nn.Module) – a module to apply orthogonal initialization over its parameters.
 nonlinearity (str, optional) – nonlinearity followed by the forward pass of the module. When nonlinearity is not None, the gain will be calculated and weight_scale will be ignored. Default: None
 weight_scale (float, optional) – scaling factor to initialize the weight. Ignored when nonlinearity is not None. Default: 1.0
 constant_bias (float, optional) – constant value to initialize the bias. Default: 0.0
Note
Currently, the only supported modules are elementary neural network layers, e.g. nn.Linear, nn.Conv2d, nn.LSTM. Submodules are not supported.
Example:
>>> a = nn.Linear(2, 3)
>>> ortho_init(a)

lagom.networks.linear_lr_scheduler(optimizer, N, min_lr)[source]¶
Defines a linear learning rate scheduler.
Parameters:
 optimizer (Optimizer) – optimizer
 N (int) – maximum bound for the scheduling iteration, e.g. total number of epochs, iterations or time steps.
 min_lr (float) – lower bound of the learning rate
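A minimal sketch of what such a linear schedule could compute, in plain Python. The exact decay rule used by lagom is an assumption here: the learning rate decays linearly from its initial value lr0 toward min_lr over N steps, then stays at min_lr.

```python
# Hypothetical sketch of a linear learning-rate schedule. The precise rule
# inside lagom.networks.linear_lr_scheduler is assumed, not verified.

def linear_lr(lr0, min_lr, N, step):
    """Linearly interpolate from lr0 (step 0) down to min_lr (step >= N)."""
    frac = min(step, N) / N       # fraction of the schedule completed, capped at 1
    return lr0 + frac * (min_lr - lr0)

# Learning rate at selected steps for lr0=0.1, min_lr=0.001, N=100:
rates = [linear_lr(0.1, 0.001, 100, s) for s in (0, 50, 100, 200)]
```

In PyTorch such a rule is typically wired into the optimizer via a LambdaLR-style scheduler, which calls a function like this once per step.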

lagom.networks.make_fc(input_dim, hidden_sizes)[source]¶
Returns a ModuleList of fully connected layers.
Note
All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.
Example:
>>> make_fc(3, [4, 5, 6])
ModuleList(
  (0): Linear(in_features=3, out_features=4, bias=True)
  (1): Linear(in_features=4, out_features=5, bias=True)
  (2): Linear(in_features=5, out_features=6, bias=True)
)
Parameters:
 input_dim (int) – input dimension of the first fully connected layer.
 hidden_sizes (list) – a list of hidden sizes, each for one fully connected layer.
Returns: fc – A ModuleList of fully connected layers.
Return type: nn.ModuleList

lagom.networks.make_cnn(input_channel, channels, kernels, strides, paddings)[source]¶
Returns a ModuleList of 2D convolution layers.
Note
All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.
Example:
>>> make_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0])
ModuleList(
  (0): Conv2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
)
Parameters:
 input_channel (int) – input channel of the first convolution layer.
 channels (list) – a list of channels, each for one convolution layer.
 kernels (list) – a list of kernels, each for one convolution layer.
 strides (list) – a list of strides, each for one convolution layer.
 paddings (list) – a list of paddings, each for one convolution layer.
Returns: cnn – A ModuleList of 2D convolution layers.
Return type: nn.ModuleList

lagom.networks.make_transposed_cnn(input_channel, channels, kernels, strides, paddings, output_paddings)[source]¶
Returns a ModuleList of 2D transposed convolution layers.
Note
All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.
Example:
>>> make_transposed_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0], output_paddings=[1, 0])
ModuleList(
  (0): ConvTranspose2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
  (1): ConvTranspose2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
)
Parameters:
 input_channel (int) – input channel of the first transposed convolution layer.
 channels (list) – a list of channels, each for one transposed convolution layer.
 kernels (list) – a list of kernels, each for one transposed convolution layer.
 strides (list) – a list of strides, each for one transposed convolution layer.
 paddings (list) – a list of paddings, each for one transposed convolution layer.
 output_paddings (list) – a list of output paddings, each for one transposed convolution layer.
Returns: transposed_cnn – A ModuleList of 2D transposed convolution layers.
Return type: nn.ModuleList

class lagom.networks.MDNHead(in_features, out_features, num_density, device, **kwargs)[source]¶

forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should afterwards call the Module instance rather than this function, since the former takes care of running the registered hooks while the latter silently ignores them.

loss(logit_pi, mean, std, target)[source]¶
Calculate the MDN loss function.
The loss function (negative log-likelihood) is defined by:
\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\prod_{d=1}^{D} \pi_{k}(x_{n, d}) \mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right)\]
For better numerical stability, we could use log-scale:
\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\exp \left\{ \sum_{d=1}^{D} \ln\pi_{k}(x_{n, d}) + \ln\mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right\} \right)\]
Note
One should always use the second formula via the logsumexp trick. The first formula is numerically unstable, resulting in +/- Inf and NaN errors.
The logsumexp trick is defined by
\[\log\sum_{i=1}^{N}\exp(x_i) = a + \log\sum_{i=1}^{N}\exp(x_i - a)\]
where \(a = \max_i(x_i)\).
Parameters:
 logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
 mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
 std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
 target (Tensor) – target tensor, shape [N, D]
Returns: loss – calculated loss
Return type: Tensor
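The logsumexp trick above is easy to verify numerically in plain Python: on well-behaved inputs both formulas agree, but for large magnitudes the naive formula overflows while the shifted version stays finite.

```python
import math

def naive_logsumexp(xs):
    # Direct formula: overflows once any x_i exceeds ~710 in double precision.
    return math.log(sum(math.exp(x) for x in xs))

def stable_logsumexp(xs):
    # Shift by a = max(x_i) before exponentiating, then add a back.
    a = max(xs)
    return a + math.log(sum(math.exp(x - a) for x in xs))

small = [1.0, 2.0, 3.0]
# naive_logsumexp(small) and stable_logsumexp(small) agree here,
# but only stable_logsumexp survives large logits such as [1000.0, 1001.0].
```

This is exactly why the second loss formula is preferred: the per-component log-probabilities inside the sum over k can be very negative, and exponentiating them directly underflows or overflows.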

sample(logit_pi, mean, std, tau=1.0)[source]¶
Sample from Gaussian mixtures using the reparameterization trick.
 First, sample categorically over the mixing coefficients to select a specific Gaussian.
 Then, sample from the selected Gaussian distribution.
Parameters:
 logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
 mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
 std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
 tau (float) – temperature during sampling; it controls uncertainty.
  * If \(\tau > 1\): increase uncertainty
  * If \(\tau < 1\): decrease uncertainty
Returns: x – sampled data with shape [N, D]
Return type: Tensor
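The two-step procedure above can be sketched in plain Python for a single data point and scalar output (using random/math rather than torch). Where exactly tau enters is an assumption here; in this sketch it only scales the categorical step. The Gaussian step uses the reparameterized form x = mu + sigma * eps, which is what lets gradients flow through mean and std in the real Tensor version.

```python
import math
import random

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax over mixing-coefficient logits
    # (shifting by the max is the same stability trick as in the loss).
    m = max(logits)
    exps = [math.exp((l - m) / tau) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_mixture(logit_pi, mean, std, tau=1.0, rng=random):
    """Sketch of MDN sampling for one point: K components, scalar output."""
    # Step 1: categorical sample over the K mixing coefficients.
    pi = softmax(logit_pi, tau)
    k = rng.choices(range(len(pi)), weights=pi)[0]
    # Step 2: reparameterized Gaussian sample from the chosen component.
    eps = rng.gauss(0.0, 1.0)
    return mean[k] + std[k] * eps
```

With two components centered at 0 and 10 and tiny std, every sample lands near one of the two means, reflecting the pick-a-component-then-sample structure.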

Recurrent Neural Networks¶
RL components¶

class lagom.networks.CategoricalHead(feature_dim, num_action, device, **kwargs)[source]¶
Defines a module for a Categorical (discrete) action distribution.
Example
>>> import torch
>>> action_head = CategoricalHead(30, 4, 'cpu')
>>> action_head(torch.randn(2, 30))
Categorical(probs: torch.Size([2, 4]))
Parameters:
 feature_dim (int) – number of input features
 num_action (int) – number of discrete actions
 device (torch.device) – PyTorch device
 **kwargs – keyword arguments for more specifications.

forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should afterwards call the Module instance rather than this function, since the former takes care of running the registered hooks while the latter silently ignores them.

class lagom.networks.DiagGaussianHead(feature_dim, action_dim, device, std0, **kwargs)[source]¶
Defines a module for a diagonal Gaussian (continuous) action distribution whose standard deviation is state-independent.
The network outputs the mean \(\mu(x)\) and the state-independent logarithm of the standard deviation \(\log\sigma\) (allowing optimization in log-space, i.e. over both negative and positive values).
The standard deviation is obtained by applying the exponential function \(\exp(x)\).
Example
>>> import torch
>>> action_head = DiagGaussianHead(10, 4, 'cpu', 0.45)
>>> action_dist = action_head(torch.randn(2, 10))
>>> action_dist.base_dist
Normal(loc: torch.Size([2, 4]), scale: torch.Size([2, 4]))
>>> action_dist.base_dist.stddev
tensor([[0.4500, 0.4500, 0.4500, 0.4500],
        [0.4500, 0.4500, 0.4500, 0.4500]], grad_fn=<ExpBackward>)
Parameters:
 feature_dim (int) – number of input features
 action_dim (int) – flat dimension of actions
 device (torch.device) – PyTorch device
 std0 (float) – initial standard deviation
 **kwargs – keyword arguments for more specifications.

forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should afterwards call the Module instance rather than this function, since the former takes care of running the registered hooks while the latter silently ignores them.