Key Features
Backbone-Agnostic
Drop UGTC into any actor-critic algorithm by replacing the advantage computation. Tested with PPO, TD3, SAC.
Adaptive Credit Assignment
Blends short-horizon and long-horizon GAE estimates per state, weighting them by ensemble uncertainty.
Fixed Hyperparameters
λ_fast=0.80, λ_slow=0.99, M=3, β=5.0. Same across all benchmarks; no per-task tuning required.
Ensemble Uncertainty
Slow critic ensemble disagreement provides calibrated uncertainty estimates without Bayesian inference.
Lightweight Overhead
Three small MLP value heads. Minimal parameter and compute overhead relative to actor network.
Multi-Language
Reference implementations in Python, C++ (header-only), and Java for portability.
Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│                               UGTC MODULE                                │
│                                                                          │
│  Input: s (observation)                                                  │
│                                                                          │
│  ┌──────────────────┐   ┌────────────────────────────────────────────┐   │
│  │   Fast Critic    │   │            Slow Ensemble (M=3)             │   │
│  │    V_fast(s)     │   │          V¹(s)    V²(s)    V³(s)           │   │
│  │  λ_fast = 0.80   │   │     (independent parameters, λ = 0.99)     │   │
│  └────────┬─────────┘   └─────────────────────┬──────────────────────┘   │
│           │                                   │                          │
│           │                    ┌──────────────┴──────────────┐           │
│           │                    │  σ(s) = std(V¹,V²,V³)(s)    │           │
│           │                    │  Ensemble Disagreement      │           │
│           │                    └──────────────┬──────────────┘           │
│           │                                   │                          │
│           │                    ┌──────────────▼──────────────┐           │
│           │                    │  EMA Normalization          │           │
│           │                    │  σ_EMA ← α·σ_EMA + (1-α)·σ  │           │
│           │                    │  σ̂(s) = σ(s) / (σ_EMA + ε)  │           │
│           │                    └──────────────┬──────────────┘           │
│           │                                   │                          │
│           │                    ┌──────────────▼──────────────┐           │
│           │                    │  Sigmoid Gate               │           │
│           │                    │  u(s) = σ(-β·(σ̂(s) - 1))    │           │
│           │                    └──────────────┬──────────────┘           │
│           │                                   │                          │
│  ┌────────▼───────────────────────────────────▼──────────────────────┐   │
│  │           A^UGTC = u(s) · A^slow + (1 - u(s)) · A^fast            │   │
│  │                    Blended Advantage Estimate                     │   │
│  └───────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────┘
Gate Behavior
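A small numeric sketch of how the gate responds to normalized disagreement, assuming the sigmoid form shown in the architecture diagram and β = 5.0. The function name `gate` and the argument name `sigma_norm` (the EMA-normalized ensemble standard deviation) are illustrative and not taken from the `ugtc` package.

```python
import math

BETA = 5.0  # gate temperature listed under Fixed Hyperparameters


def gate(sigma_norm: float, beta: float = BETA) -> float:
    """Weight on the slow (long-horizon) advantage: sigmoid(-beta * (sigma_norm - 1))."""
    # sigmoid(-x) == 1 / (1 + exp(x)), with x = beta * (sigma_norm - 1)
    return 1.0 / (1.0 + math.exp(beta * (sigma_norm - 1.0)))


for s in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"sigma_norm={s:.1f}  u={gate(s):.3f}")
```

The gate equals 0.5 when the disagreement matches its running average (sigma_norm = 1). It moves toward 1 (mostly slow stream) as the ensemble agrees and toward 0 (mostly fast stream) as disagreement grows.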
Mathematical Foundation
Generalized Advantage Estimation
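For reference, the standard GAE definition that both UGTC streams build on, written for a generic critic \(V\), discount \(\gamma\), and trace parameter \(\lambda\):

\[
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
A_t^{\text{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta_{t+l}
\]

A smaller \(\lambda\) discounts later TD errors more strongly (lower variance, more bias from \(V\)); \(\lambda\) close to 1 approaches the Monte Carlo return minus the baseline (lower bias, higher variance).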
UGTC Dual-Stream Computation
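The two streams can be sketched as below. This is a reconstruction from the architecture diagram and the hyperparameter table, not a transcription of the paper's equations. In particular, using the ensemble mean as the slow-stream baseline is an assumption.

\[
\delta_t^{\text{fast}} = r_t + \gamma V_{\text{fast}}(s_{t+1}) - V_{\text{fast}}(s_t), \qquad
A_t^{\text{fast}} = \sum_{l=0}^{\infty} (\gamma\lambda_{\text{fast}})^{l}\, \delta_{t+l}^{\text{fast}}
\]

\[
\delta_t^{\text{slow}} = r_t + \gamma \bar{V}_{\text{slow}}(s_{t+1}) - \bar{V}_{\text{slow}}(s_t), \qquad
A_t^{\text{slow}} = \sum_{l=0}^{\infty} (\gamma\lambda_{\text{slow}})^{l}\, \delta_{t+l}^{\text{slow}}
\]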
where \(\bar{V}_{\text{slow}} = \frac{1}{M}\sum_{m=1}^{M} V^m_{\text{slow}}\) (ensemble mean, M = 3)
Uncertainty Gate
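The gate equations, as they appear in the architecture diagram, written out in LaTeX. The normalized disagreement is denoted \(\hat{\sigma}\) here, and the logistic function is spelled out as \(\operatorname{sigmoid}\) to keep it distinct from the standard deviation \(\sigma\); both notational choices are assumptions of this sketch.

\[
\sigma(s) = \operatorname{std}\!\big(V^{1}(s), V^{2}(s), V^{3}(s)\big), \qquad
\sigma_{\text{EMA}} \leftarrow \alpha\, \sigma_{\text{EMA}} + (1-\alpha)\, \sigma
\]

\[
\hat{\sigma}(s) = \frac{\sigma(s)}{\sigma_{\text{EMA}} + \varepsilon}, \qquad
u(s) = \operatorname{sigmoid}\!\big(-\beta\,(\hat{\sigma}(s) - 1)\big)
\]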
Blended Advantage
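The blend \(A^{\text{UGTC}} = u(s)\, A^{\text{slow}} + (1 - u(s))\, A^{\text{fast}}\) can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the function names, the list-based inputs, the use of the ensemble mean as the slow baseline, the population standard deviation, and a caller-maintained `sigma_ema` are all choices made for this sketch.

```python
import math
from statistics import pstdev


def gae(rewards, values, next_values, dones, gamma, lam):
    """Backward-recursive GAE over one rollout stored as plain lists."""
    adv, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        adv[t] = running
    return adv


def gate(sigma_norm, beta=5.0):
    """sigmoid(-beta * (sigma_norm - 1)), with the exponent clamped to avoid overflow."""
    x = max(-60.0, min(60.0, beta * (sigma_norm - 1.0)))
    return 1.0 / (1.0 + math.exp(x))


def ugtc_advantages(rewards, dones, v_fast, v_fast_next, v_slow, v_slow_next,
                    sigma_ema, gamma=0.99, lam_fast=0.80, lam_slow=0.99,
                    beta=5.0, eps=1e-8):
    """Blend the fast and slow GAE streams per timestep.

    v_slow and v_slow_next contain one list of values per ensemble member.
    sigma_ema is the running disagreement scale, assumed to be kept by the caller.
    """
    mean_slow = [sum(col) / len(col) for col in zip(*v_slow)]
    mean_slow_next = [sum(col) / len(col) for col in zip(*v_slow_next)]
    a_fast = gae(rewards, v_fast, v_fast_next, dones, gamma, lam_fast)
    a_slow = gae(rewards, mean_slow, mean_slow_next, dones, gamma, lam_slow)
    blended = []
    for t, member_values in enumerate(zip(*v_slow)):
        sigma = pstdev(member_values)  # population std; the estimator is an assumption
        u = gate(sigma / (sigma_ema + eps), beta)
        blended.append(u * a_slow[t] + (1.0 - u) * a_fast[t])
    return blended
```

When the ensemble members agree, the result sits close to the slow (λ = 0.99) estimate; when they disagree strongly relative to `sigma_ema`, it falls back to the fast (λ = 0.80) estimate.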
Fixed Hyperparameters
| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Fast λ | \(\lambda_{\text{fast}}\) | 0.80 | GAE lambda for fast critic (low variance) |
| Slow λ | \(\lambda_{\text{slow}}\) | 0.99 | GAE lambda for slow ensemble (low bias) |
| Ensemble size | M | 3 | Number of slow critic heads |
| Gate temperature | β | 5.0 | Sigmoid sharpness |
| EMA momentum | α | 0.99 | Running uncertainty normalization |
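One way to keep these fixed values in a single place is a frozen dataclass. The class name `UGTCConfig`, the field names, and the `update_sigma_ema` helper are hypothetical and are not taken from the `ugtc` package; only the numeric defaults and the EMA rule come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UGTCConfig:  # hypothetical container, not part of the published API
    lam_fast: float = 0.80      # GAE lambda for the fast critic
    lam_slow: float = 0.99      # GAE lambda for the slow ensemble
    ensemble_size: int = 3      # number of slow critic heads (M)
    beta: float = 5.0           # gate temperature
    ema_momentum: float = 0.99  # alpha in the running disagreement average

    def update_sigma_ema(self, sigma_ema: float, sigma_batch: float) -> float:
        """One EMA step: alpha * sigma_ema + (1 - alpha) * sigma_batch."""
        return self.ema_momentum * sigma_ema + (1.0 - self.ema_momentum) * sigma_batch


cfg = UGTCConfig()
print(cfg)
```

Freezing the dataclass mirrors the claim that the values are not tuned per task: any attempt to reassign a field raises an error.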
RL Algorithm Integrations
UGTC-PPO
On-policy
A^UGTC replaces standard GAE in the clipped surrogate objective. All UGTC critics are trained via the same regression pipeline.
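A minimal sketch of where the UGTC advantages enter the PPO objective. The function name `clipped_surrogate`, the list-based inputs, and the `clip_eps=0.2` default are illustrative assumptions; the page does not state which clip range the experiments use.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped objective (to be maximized) for one minibatch.

    advantages would be the (normalized) A^UGTC values instead of standard GAE.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Nothing else in the objective changes: only the advantage values passed in differ from a standard PPO update.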
UGTC-TD3
Off-policy
UGTC provides baseline correction for the actor: L = -(Q_min + η·A^UGTC). Twin-Q and delayed update preserved.
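The actor loss above can be written as a short batch-averaged function. This is a sketch only: the function name `td3_actor_loss` and the list-based inputs are hypothetical, and the default `eta=0.1` is a placeholder, since this page does not give a value for η.

```python
def td3_actor_loss(q1, q2, adv_ugtc, eta=0.1):
    """Batch mean of L = -(Q_min + eta * A^UGTC).

    q1, q2: twin critic values for the actor's actions.
    adv_ugtc: UGTC advantages for the same states.
    eta: placeholder weight; the value used in the paper is not stated here.
    """
    terms = [min(a, b) + eta * adv for a, b, adv in zip(q1, q2, adv_ugtc)]
    return -sum(terms) / len(terms)


print(td3_actor_loss([1.0, 2.0], [0.5, 3.0], [1.0, -1.0], eta=0.5))
```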
UGTC-SAC
Off-policy
V^UGTC replaces implicit value baseline in the entropy-regularized actor loss. Auto-α entropy tuning unchanged.
UGTC-DDPG
Extension
Proposed extension following TD3 integration logic. Not benchmarked in the paper; labeled as an implementation assumption.
Quick Start
Installation
git clone https://github.com/ethosoftai/ugtc.git
cd ugtc
pip install -e .
Minimal Usage
from ugtc import UGTCModule
# Create UGTC module (obs_dim=11 for Hopper-v4)
ugtc = UGTCModule(obs_dim=11)
# Replace standard GAE in your PPO update:
advantages = ugtc.compute_advantages(
    obs=obs,            # (T, obs_dim)
    next_obs=next_obs,  # (T, obs_dim)
    rewards=rewards,    # (T,)
    dones=dones,        # (T,)
    gamma=0.99,
)
# Same as before: normalize and use in clipped surrogate
advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
Run an Example
# UGTC-PPO on CartPole-v1 (no MuJoCo needed)
python examples/ugtc_ppo_cartpole.py
# UGTC-PPO on Hopper-v4 (requires MuJoCo)
python examples/ugtc_ppo_mujoco.py --env Hopper-v4
# UGTC-TD3 on Pendulum-v1
python examples/ugtc_td3_pendulum.py
Citation
@misc{dalar2026ugtc,
  author    = {Dalar, Yağız Ekrem},
  title     = {{UGTC}: Uncertainty-Gated Temporal Credit},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19715116},
  url       = {https://doi.org/10.5281/zenodo.19715116},
  note      = {Accepted: Ulysseus Young Explorers in Science (UYES) Journal.
               Journal DOI forthcoming.}
}