UGTC — Uncertainty-Gated Temporal Credit

Key Features

🔌

Backbone-Agnostic

Drop UGTC into any actor-critic algorithm by replacing its advantage (or value-baseline) computation. Tested with PPO, TD3, and SAC.

🎯

Adaptive Credit Assignment

Blends short-horizon (λ_fast) and long-horizon (λ_slow) GAE estimates through a soft, per-state gate driven by ensemble uncertainty.

📐

Fixed Hyperparameters

λ_fast=0.80, λ_slow=0.99, M=3, β=5.0, α=0.99. The same values are used across all benchmarks; no per-task tuning is required.

🔬

Ensemble Uncertainty

Slow critic ensemble disagreement provides calibrated uncertainty estimates without Bayesian inference.

⚡

Lightweight Overhead

One fast critic plus three small MLP value heads for the slow ensemble. Parameter and compute overhead is small relative to the actor network.

🌐

Multi-Language

Reference implementations in Python, C++ (header-only), and Java for portability.

Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 UGTC MODULE                                 │
│                                                                             │
│   Input: s (observation)                                                    │
│                                                                             │
│   ┌──────────────────┐      ┌────────────────────────────────────────────┐  │
│   │   Fast Critic    │      │   Slow Ensemble (M=3)                      │  │
│   │   V_fast(s)      │      │   V¹(s)    V²(s)    V³(s)                  │  │
│   │   λ_fast = 0.80  │      │   (independent parameters, λ_slow = 0.99)  │  │
│   └────────┬─────────┘      └─────────────────────┬──────────────────────┘  │
│            │                                      │                         │
│            │                    ┌─────────────────┴─────────────────┐       │
│            │                    │  σ(s) = std(V¹(s), V²(s), V³(s))  │       │
│            │                    │  Ensemble disagreement            │       │
│            │                    └─────────────────┬─────────────────┘       │
│            │                                      │                         │
│            │                    ┌─────────────────▼─────────────────┐       │
│            │                    │  EMA normalization                │       │
│            │                    │  σ_EMA ← α·σ_EMA + (1-α)·mean(σ)  │       │
│            │                    │  σ̂(s) = σ(s) / (σ_EMA + ε)        │       │
│            │                    └─────────────────┬─────────────────┘       │
│            │                                      │                         │
│            │                    ┌─────────────────▼─────────────────┐       │
│            │                    │  Sigmoid gate                     │       │
│            │                    │  u(s) = sigmoid(-β·(σ̂(s) - 1))    │       │
│            │                    └─────────────────┬─────────────────┘       │
│            │                                      │                         │
│   ┌────────▼──────────────────────────────────────▼──────────────────────┐  │
│   │   A^UGTC = u(s) · A^slow  +  (1 - u(s)) · A^fast                     │  │
│   │   Blended advantage estimate                                         │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
```

Gate Behavior

Low uncertainty (σ̂ well below 1)

u → 1 → use A^slow (low bias)

Medium uncertainty (σ̂ = 1)

u = 0.5 → equal blend

High uncertainty (σ̂ well above 1)

u → 0 → use A^fast (low variance)
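To make the three regimes concrete, here is a minimal Python sketch. It evaluates the gate formula u(s) = sigmoid(-β·(σ̂(s) - 1)) with the fixed β = 5.0 for a few normalized uncertainty values. The function name `gate_value` is hypothetical and is not part of the `ugtc` package.

```python
import math

BETA = 5.0  # gate temperature from the fixed hyperparameters


def gate_value(sigma_hat: float, beta: float = BETA) -> float:
    """u = sigmoid(-beta * (sigma_hat - 1)), written as 1 / (1 + exp(beta * (sigma_hat - 1)))."""
    return 1.0 / (1.0 + math.exp(beta * (sigma_hat - 1.0)))


# Print the gate value for a range of normalized uncertainties
for sigma_hat in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"sigma_hat = {sigma_hat:.1f}  ->  u = {gate_value(sigma_hat):.4f}")
```

With β = 5.0, σ̂ = 0 already gives u ≈ 0.993 and σ̂ = 2 gives u ≈ 0.007. The gate therefore acts almost like a hard switch once per-state uncertainty is twice its running average.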

Mathematical Foundation

Generalized Advantage Estimation

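\[ \delta_t = r_t + \gamma V(s_{t+1})(1 - d_t) - V(s_t) \]

\[ A_t^{\text{GAE}} = \sum_{k=0}^{\infty} (\gamma\lambda)^k \delta_{t+k} \]

Within a finite rollout the sum is cut off at episode ends, which gives the backward recursion \(A_t = \delta_t + \gamma\lambda(1 - d_t)A_{t+1}\).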
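A minimal NumPy sketch of the backward recursion follows. The (1 - d_t) mask stops credit from flowing across episode boundaries. This is the standard GAE algorithm, not code taken from the `ugtc` package. The function name `gae` and its argument layout are assumptions.

```python
import numpy as np


def gae(rewards, values, next_values, dones, gamma, lam):
    """Generalized Advantage Estimation for one rollout of length T.

    values[t] = V(s_t), next_values[t] = V(s_{t+1}); all inputs have shape (T,).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    next_values = np.asarray(next_values, dtype=np.float64)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)

    # One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) * (1 - d_t) - V(s_t)
    deltas = rewards + gamma * next_values * not_done - values

    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        # A_t = delta_t + gamma * lam * (1 - d_t) * A_{t+1}
        running = deltas[t] + gamma * lam * not_done[t] * running
        advantages[t] = running
    return advantages
```

UGTC runs this estimator twice. The first pass uses V_fast with λ_fast = 0.80; the second uses the slow ensemble mean with λ_slow = 0.99.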

UGTC Dual-Stream Computation

\[ A_t^{\text{fast}} = \text{GAE}\!\left(\tau,\, V_{\text{fast}},\, \lambda_{\text{fast}} = 0.80\right) \]

\[ A_t^{\text{slow}} = \text{GAE}\!\left(\tau,\, \bar{V}_{\text{slow}},\, \lambda_{\text{slow}} = 0.99\right) \]

where \(\tau\) is the collected trajectory and \(\bar{V}_{\text{slow}} = \frac{1}{M}\sum_{m=1}^{M} V^m_{\text{slow}}\) is the ensemble mean (M = 3).

Uncertainty Gate

\[ \sigma(s) = \text{std}\!\left(V^1_{\text{slow}}(s),\, \ldots,\, V^M_{\text{slow}}(s)\right) \]

\[ \hat{\sigma}(s) = \frac{\sigma(s)}{\sigma_{\text{EMA}} + \varepsilon}, \qquad \sigma_{\text{EMA}} \leftarrow \alpha \cdot \sigma_{\text{EMA}} + (1-\alpha)\cdot\mathbb{E}[\sigma(s)] \]

\[ u(s) = \text{sigmoid}\!\left(-\beta \cdot (\hat{\sigma}(s) - 1)\right), \qquad \text{sigmoid}(x) = \frac{1}{1 + e^{-x}} \]
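The gate can be sketched as a small stateful helper. This minimal NumPy version rests on several assumptions, none of which are specified above:

- population standard deviation (ddof=0) across heads;
- the EMA is seeded with the first batch's mean disagreement;
- the EMA is updated before normalizing;
- ε = 1e-8.

The class name `UncertaintyGate` is likewise hypothetical.

```python
import numpy as np


class UncertaintyGate:
    """Hypothetical helper: ensemble std -> EMA normalization -> sigmoid gate."""

    def __init__(self, beta: float = 5.0, alpha: float = 0.99, eps: float = 1e-8):
        self.beta = beta
        self.alpha = alpha
        self.eps = eps
        self.sigma_ema = None  # running average of the batch-mean disagreement

    def __call__(self, slow_values) -> np.ndarray:
        """slow_values: shape (M, T), row m holds V^m_slow(s_t). Returns u(s_t), shape (T,)."""
        slow_values = np.asarray(slow_values, dtype=np.float64)
        sigma = slow_values.std(axis=0)   # per-state disagreement (ddof=0, assumed)
        batch_mean = float(sigma.mean())  # estimate of E[sigma(s)] over this batch

        # Assumption: seed the EMA with the first batch, then update before normalizing
        if self.sigma_ema is None:
            self.sigma_ema = batch_mean
        else:
            self.sigma_ema = self.alpha * self.sigma_ema + (1.0 - self.alpha) * batch_mean

        sigma_hat = sigma / (self.sigma_ema + self.eps)
        z = -self.beta * (sigma_hat - 1.0)
        # Numerically stable logistic sigmoid: sigmoid(z) = 0.5 * (1 + tanh(z / 2))
        return 0.5 * (1.0 + np.tanh(0.5 * z))
```

As an example, take two heads that disagree by σ = [1, 2, 3] on three states (M = 2 here only to keep the numbers simple). After the first call σ_EMA = 2, so the middle state sits at the gate midpoint (u ≈ 0.5). The first state leans toward A^slow and the last toward A^fast.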

Blended Advantage

\[ \boxed{A_t^{\text{UGTC}} = u(s_t) \cdot A_t^{\text{slow}} + (1 - u(s_t)) \cdot A_t^{\text{fast}}} \]
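Putting the two streams and the blend together, here is a minimal NumPy sketch of the UGTC advantage for one rollout. It assumes the fast-critic values, the per-head slow values and the gate values u(s_t) have already been computed, for example as described in the Uncertainty Gate section. The names `ugtc_advantages` and `gae` and their argument layouts are hypothetical; they are not the `ugtc` package API.

```python
import numpy as np


def gae(rewards, values, next_values, dones, gamma, lam):
    """Standard GAE backward recursion with (1 - d_t) masking at episode ends."""
    not_done = 1.0 - dones
    deltas = rewards + gamma * next_values * not_done - values
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * not_done[t] * running
        adv[t] = running
    return adv


def ugtc_advantages(rewards, dones, v_fast, v_fast_next, v_slow, v_slow_next, u,
                    gamma=0.99, lam_fast=0.80, lam_slow=0.99):
    """Blend fast and slow GAE streams with the per-state gate u.

    rewards, dones, u:    shape (T,)
    v_fast, v_fast_next:  shape (T,)   fast critic at s_t and s_{t+1}
    v_slow, v_slow_next:  shape (M, T) slow ensemble heads at s_t and s_{t+1}
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    u = np.asarray(u, dtype=np.float64)
    v_fast = np.asarray(v_fast, dtype=np.float64)
    v_fast_next = np.asarray(v_fast_next, dtype=np.float64)
    # The slow stream bootstraps from the ensemble mean V̄_slow
    v_slow_mean = np.asarray(v_slow, dtype=np.float64).mean(axis=0)
    v_slow_next_mean = np.asarray(v_slow_next, dtype=np.float64).mean(axis=0)

    a_fast = gae(rewards, v_fast, v_fast_next, dones, gamma, lam_fast)
    a_slow = gae(rewards, v_slow_mean, v_slow_next_mean, dones, gamma, lam_slow)
    # A^UGTC = u * A^slow + (1 - u) * A^fast
    return u * a_slow + (1.0 - u) * a_fast
```

Setting u to all ones recovers A^slow, and all zeros recovers A^fast. Any u in between interpolates linearly per state.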

Fixed Hyperparameters

| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Fast λ | \(\lambda_{\text{fast}}\) | 0.80 | GAE λ for the fast critic (low variance) |
| Slow λ | \(\lambda_{\text{slow}}\) | 0.99 | GAE λ for the slow ensemble (low bias) |
| Ensemble size | \(M\) | 3 | Number of slow critic heads |
| Gate temperature | \(\beta\) | 5.0 | Sigmoid sharpness |
| EMA momentum | \(\alpha\) | 0.99 | Momentum of the running uncertainty normalizer |
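Because these values are fixed, they can live in one immutable object. This minimal sketch assumes a frozen dataclass. The name `UGTCHyperparams` and the validation rules are hypothetical, not taken from the `ugtc` package. ε is left out because its value is not given in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UGTCHyperparams:
    """Hypothetical container for the fixed UGTC hyperparameters."""

    lambda_fast: float = 0.80  # GAE lambda for the fast critic
    lambda_slow: float = 0.99  # GAE lambda for the slow ensemble
    ensemble_size: int = 3     # M, number of slow critic heads
    beta: float = 5.0          # gate temperature
    ema_alpha: float = 0.99    # EMA momentum for uncertainty normalization

    def __post_init__(self):
        # Sanity checks (assumptions, not requirements stated by UGTC itself)
        if not 0.0 <= self.lambda_fast < self.lambda_slow <= 1.0:
            raise ValueError("expected 0 <= lambda_fast < lambda_slow <= 1")
        if self.ensemble_size < 2:
            raise ValueError("a standard deviation across heads needs at least 2 heads")
        if self.beta <= 0.0:
            raise ValueError("beta must be positive")
        if not 0.0 <= self.ema_alpha < 1.0:
            raise ValueError("ema_alpha must lie in [0, 1)")


DEFAULTS = UGTCHyperparams()
```

Freezing the dataclass makes accidental per-task edits raise an error, which fits the no-tuning design.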

RL Algorithm Integrations

UGTC-PPO

On-policy

A^UGTC replaces standard GAE in the clipped surrogate objective. All UGTC critics (the fast critic and the slow ensemble) are trained through the same regression pipeline.
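In UGTC-PPO only the advantage fed into the objective changes. Below is a minimal NumPy sketch of the standard PPO clipped surrogate with A^UGTC passed in as `advantages`. The clip range of 0.2 is a common PPO default assumed here, not a value given in this README, and the function name is hypothetical.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate: -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # r_t = pi_new / pi_old
    advantages = np.asarray(advantages, dtype=np.float64)          # A^UGTC in place of GAE
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

In a real training loop the same expression would be written with an autograd library so that gradients flow through `logp_new`.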

UGTC-TD3

Off-policy

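UGTC adds a baseline-correction term to the actor loss: L = -(Q_min + η·A^UGTC), where Q_min is the smaller of the twin critics' estimates and η weights the UGTC term. TD3's twin-Q critics and delayed policy updates are preserved.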
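The UGTC-TD3 actor objective can be written directly from its formula. This minimal NumPy sketch only evaluates the loss value. The weight η = 0.1 is a placeholder, since no value is given in this document, and the function name is hypothetical. In an actual TD3 update, Q₁ and Q₂ would be evaluated at the actor's action inside an autograd framework; treating A^UGTC as a constant there is an assumption.

```python
import numpy as np


def ugtc_td3_actor_loss(q1, q2, a_ugtc, eta=0.1):
    """L = -(Q_min + eta * A^UGTC), averaged over the batch.

    q1, q2: twin critic values Q1(s, pi(s)) and Q2(s, pi(s)), shape (B,)
    a_ugtc: UGTC advantage per sample, shape (B,)
    eta:    placeholder weight for the UGTC term (not specified in this document)
    """
    q_min = np.minimum(np.asarray(q1, dtype=np.float64), np.asarray(q2, dtype=np.float64))
    return -np.mean(q_min + eta * np.asarray(a_ugtc, dtype=np.float64))
```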

UGTC-SAC

Off-policy

V^UGTC replaces the implicit value baseline in the entropy-regularized actor loss. SAC's automatic entropy-temperature tuning is unchanged.

UGTC-DDPG

Extension

A proposed extension that follows the TD3 integration logic. It is not benchmarked in the paper and should be treated as an implementation assumption.

Quick Start

Installation

```bash
git clone https://github.com/ethosoftai/ugtc.git
cd ugtc
pip install -e .
```

Minimal Usage

```python
from ugtc import UGTCModule

# Create the UGTC module (obs_dim=11 for Hopper-v4)
ugtc = UGTCModule(obs_dim=11)

# Replace standard GAE in your PPO update
# (obs, next_obs, rewards, dones come from your rollout buffer):
advantages = ugtc.compute_advantages(
    obs=obs,            # (T, obs_dim)
    next_obs=next_obs,  # (T, obs_dim)
    rewards=rewards,    # (T,)
    dones=dones,        # (T,)
    gamma=0.99,
)

# Same as before: normalize, then use in the clipped surrogate
advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
```

Run an Example

```bash
# UGTC-PPO on CartPole-v1 (no MuJoCo needed)
python examples/ugtc_ppo_cartpole.py

# UGTC-PPO on Hopper-v4 (requires MuJoCo)
python examples/ugtc_ppo_mujoco.py --env Hopper-v4

# UGTC-TD3 on Pendulum-v1
python examples/ugtc_td3_pendulum.py
```

Citation

```bibtex
@misc{dalar2026ugtc,
  author    = {Dalar, Yağız Ekrem},
  title     = {{UGTC}: Uncertainty-Gated Temporal Credit},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19715116},
  url       = {https://doi.org/10.5281/zenodo.19715116},
  note      = {Accepted — Ulysseus Young Explorers in Science (UYES) Journal.
               Journal DOI forthcoming.}
}
```