lagom.metric: Metrics

lagom.metric.returns(gamma, traj)
lagom.metric.bootstrapped_returns(gamma, traj, last_V)

Return the (discounted) accumulated returns of a batch of episodic transitions, bootstrapped from the value of the last state.

Formally, given the rewards $$(r_1, \dots, r_T)$$, a discount factor $$\gamma$$, and the value $$V(s_{T+1})$$ of the last state (last_V), it computes

$Q_t = r_t + \gamma r_{t+1} + \dots + \gamma^{T - t} r_T + \gamma^{T - t + 1} V(s_{T+1})$

Note

The state values of terminal states are masked out (set to zero).
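
The bootstrapped return above can be computed with the backward recursion $$Q_t = r_t + \gamma Q_{t+1}$$, starting from $$Q_{T+1} = V(s_{T+1})$$. The following is a minimal NumPy sketch of that recursion, not lagom's implementation: bootstrapped_returns_sketch is a hypothetical name, and it assumes a single episode segment given as a plain reward array plus a reach_terminal flag instead of lagom's traj object.

```python
import numpy as np


def bootstrapped_returns_sketch(gamma, rewards, last_V, reach_terminal):
    """Hypothetical helper (not lagom's API).

    rewards: r_1..r_T of one episode segment.
    last_V: V(s_{T+1}), the value of the state after the last reward.
    reach_terminal: if True, s_{T+1} is terminal and its value is masked to zero.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Start the recursion from the (masked) bootstrap value Q_{T+1} = V(s_{T+1}).
    running = 0.0 if reach_terminal else float(last_V)
    out = np.zeros_like(rewards)
    # Walk backwards: Q_t = r_t + gamma * Q_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out
```

For example, with gamma=0.5, rewards [1, 1, 1] and last_V=4, the sketch gives (2.25, 2.5, 3.0), matching $$Q_1 = 1 + 0.5 + 0.25 + 0.125 \cdot 4$$; with reach_terminal=True the bootstrap term vanishes and it gives (1.75, 1.5, 1.0). The backward loop costs O(T) instead of summing each series separately.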

lagom.metric.td0_target(gamma, traj, Vs, last_V)

Calculate TD(0) targets of a batch of episodic transitions.

Let $$r_1, r_2, \dots, r_T$$ be a list of rewards, let $$V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_{T})$$ be a list of state values including the last state value, and let $$\gamma$$ be a discount factor. The TD(0) targets are calculated as follows:

$r_t + \gamma V(s_t), \forall t = 1, 2, \dots, T$

Note

The state values of terminal states are masked out (set to zero).
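
A minimal NumPy sketch of the TD(0) targets, assuming (not confirmed by this page) that Vs holds $$V(s_0), \dots, V(s_{T-1})$$ and last_V holds $$V(s_T)$$. td0_target_sketch and its reach_terminal flag are hypothetical and take plain arrays rather than lagom's traj object.

```python
import numpy as np


def td0_target_sketch(gamma, rewards, Vs, last_V, reach_terminal):
    """Hypothetical helper (not lagom's API).

    rewards: r_1..r_T; Vs: V(s_0)..V(s_{T-1}); last_V: V(s_T).
    Entry i of the result is the target r_{i+1} + gamma * V(s_{i+1}).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    Vs = np.asarray(Vs, dtype=np.float64)
    if rewards.shape != Vs.shape:
        raise ValueError("rewards and Vs must have the same length T")
    # Mask V(s_T) to zero when the segment ends in a terminal state.
    next_V = 0.0 if reach_terminal else float(last_V)
    # V(s_1)..V(s_T): drop V(s_0) and append the (masked) last value.
    next_Vs = np.append(Vs[1:], next_V)
    return rewards + gamma * next_Vs
```

With gamma=0.5, rewards [1, 1, 1], Vs [2, 4, 6] and last_V=8, the sketch gives (3, 4, 5); if the last state is terminal, the final target drops to 1.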

lagom.metric.td0_error(gamma, traj, Vs, last_V)

Calculate TD(0) errors of a batch of episodic transitions.

Let $$r_1, r_2, \dots, r_T$$ be a list of rewards, let $$V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_{T})$$ be a list of state values including the last state value, and let $$\gamma$$ be a discount factor. The TD(0) errors are calculated as follows:

$\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \forall t = 0, 1, \dots, T-1$

Note

The state values of terminal states are masked out (set to zero).
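
The TD(0) error is the TD(0) target minus the current state value. A minimal NumPy sketch under the same assumptions as a plain-array version of the target (Vs holds $$V(s_0), \dots, V(s_{T-1})$$, last_V holds $$V(s_T)$$); td0_error_sketch and reach_terminal are hypothetical names, not lagom's API.

```python
import numpy as np


def td0_error_sketch(gamma, rewards, Vs, last_V, reach_terminal):
    """Hypothetical helper (not lagom's API).

    rewards: r_1..r_T; Vs: V(s_0)..V(s_{T-1}); last_V: V(s_T).
    Entry t of the result is delta_t for t = 0..T-1.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    Vs = np.asarray(Vs, dtype=np.float64)
    if rewards.shape != Vs.shape:
        raise ValueError("rewards and Vs must have the same length T")
    # Mask V(s_T) to zero when the segment ends in a terminal state.
    next_V = 0.0 if reach_terminal else float(last_V)
    # V(s_{t+1}) for t = 0..T-1.
    next_Vs = np.append(Vs[1:], next_V)
    # delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
    return rewards + gamma * next_Vs - Vs
```

With gamma=0.5, rewards [1, 1, 1], Vs [2, 4, 6] and last_V=8, the errors are (1, 0, -1); masking a terminal last state gives (1, 0, -5).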

lagom.metric.gae(gamma, lam, traj, Vs, last_V)

Calculate the Generalized Advantage Estimation (GAE) of a batch of episodic transitions.

Let $$\delta_t$$ be the TD(0) error at time step $$t$$ and let $$\lambda$$ (lam) be the GAE decay parameter. The GAE at time step $$t$$ is calculated as follows:

$A_t^{\mathrm{GAE}(\gamma, \lambda)} = \sum_{k=0}^{\infty}(\gamma\lambda)^k \delta_{t + k}$
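
For a finite batch, only $$\delta_0, \dots, \delta_{T-1}$$ exist, so the sum is truncated and can be computed backwards as $$A_t = \delta_t + \gamma\lambda A_{t+1}$$ with $$A_T = 0$$. The following is a minimal NumPy sketch of that recursion, not lagom's implementation; gae_sketch, reach_terminal and the plain-array inputs are hypothetical, and it assumes Vs holds $$V(s_0), \dots, V(s_{T-1})$$ and last_V holds $$V(s_T)$$.

```python
import numpy as np


def gae_sketch(gamma, lam, rewards, Vs, last_V, reach_terminal):
    """Hypothetical helper (not lagom's API).

    rewards: r_1..r_T; Vs: V(s_0)..V(s_{T-1}); last_V: V(s_T),
    masked to zero when s_T is terminal.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    Vs = np.asarray(Vs, dtype=np.float64)
    if rewards.shape != Vs.shape:
        raise ValueError("rewards and Vs must have the same length T")
    next_V = 0.0 if reach_terminal else float(last_V)
    # TD(0) errors: delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * np.append(Vs[1:], next_V) - Vs
    advantages = np.zeros_like(deltas)
    running = 0.0  # A_T = 0: no TD errors exist past the last step
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With gamma=0.5, rewards [1, 1, 1], Vs [2, 4, 6] and last_V=8, lam=0.5 gives (0.9375, -0.25, -1.0). The two extremes are useful sanity checks: lam=0 reduces to the TD(0) errors (1, 0, -1), and lam=1 reduces to the bootstrapped discounted return minus $$V(s_t)$$, here (0.75, -0.5, -1.0).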