# Dreamer: Model-Based RL with Latent Dynamics Dreamer is a model-based reinforcement learning algorithm that learns a latent dynamics model from images and trains a behavior policy entirely in the latent space. Based on papers: - [Dreamer: Learning Latent Dynamics for Planning from Pixels](https://arxiv.org/abs/1912.01603) (Hafner et al., 2019) - [Mastering Atari with Discrete World Models](https://arxiv.org/abs/2010.02193) (DreamerV2, Hafner et al., 2020) ## Key Idea Dreamer learns: 1. **World Model**: Latent dynamics model that predicts future latent states 2. **Value Model**: Estimates expected returns from any latent state 3. **Policy**: Actions that maximize expected returns in latent space The key innovation is learning behaviors purely in imagination - no gradients flow from the environment. ## Architecture ``` ┌─────────────────────────────────────────────────────────────────────┐ │ World Model (RSSM) │ │ │ │ ┌─────────┐ ┌───────────────────┐ ┌────────────────────────┐ │ │ │Encoder │ │ Latent Model │ │ Decoder │ │ │ │ (CNN) │ -> │ (GRU + Stoch) │ -> │ (Transposed CNN) │ │ │ │ 64x64 │ │ h_t = f(h_{t-1}, │ │ │ │ │ │ │ │ s_{t-1}, a)│ │ p(x_t | s_t, h_t) │ │ │ └─────────┘ └───────────────────┘ └────────────────────────┘ │ │ │ │ s_t ~ p(s_t | h_t) h_t ~ p(h_t | s_{t-1}, a_{t-1}) │ └─────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ Imagination Rollout │ │ │ │ s_0 ──► a_0 ──► s_1 ──► a_1 ──► s_2 ──► ... ──► s_H │ │ │ │ │ │ └────────┴──────────────────────────────────────┐ │ │ ▼ │ │ ┌─────────────────────────────┐ │ │ │ λ-return target │ │ │ │ G_t = r_t + γ(1-λ)v + λG_{t+1} │ │ └─────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ Actor-Critic Learning │ │ │ │ Actor: π(a_t | s_t, h_t) ──► REINFORCE with baseline │ │ Critic: v(s_t, h_t) ──► MSE on λ-returns │ └─────────────────────────────────────────────────────────────────────┘ ``` ## Components ### 1. Recurrent State Space Model (RSSM) The core world model combining: - **Deterministic hidden state** (h_t): Recurrent state (GRU) - **Stochastic latent state** (s_t): Discrete or continuous latent variables **Dynamics**: `h_t = f(h_{t-1}, s_{t-1}, a_{t-1})` **Posterior**: `s_t ~ q(s_t | h_t, x_t)` **Prior**: `s_t ~ p(s_t | h_t)` ### 2. Encoder/Decoder - **Encoder**: CNN that maps images to latent embeddings - **Decoder**: Transposed CNN that reconstructs images from latents - Both use ReLU activations and residual connections ### 3. Reward/Discount Heads - **Reward model**: Predicts reward from latent state - **Discount model**: Predicts episode termination (DreamerV2) ## Training ```python from world_models.models import DreamerAgent from world_models.configs import DreamerConfig cfg = DreamerConfig() cfg.env_backend = "gym" cfg.env = "Pendulum-v1" cfg.total_steps = 1_000_000 agent = DreamerAgent(cfg) agent.train() ``` ### Key Hyperparameters | Parameter | Default | Description | |-----------|---------|-------------| | `stoch_size` | 30 | Stochastic latent dimensions | | `deter_size` | 200 | Deterministic hidden size | | `embed_size` | 1024 | Encoder embedding size | | `imagine_horizon` | 15 | Imagination rollout length | | `discount` | 0.99 | Discount factor γ | | `td_lambda` | 0.95 | λ-return parameter | | `kl_loss_coeff` | 1.0 | KL divergence weight | ### Learning Objectives **World Model Loss**: ``` L_world = L_reconstruction + L_reward + β * L_KL ``` **Actor Loss** (REINFORCE): ``` L_actor = -E[log π(a|s) * (G - V(s))] ``` **Critic Loss** (MSE): ``` L_critic = E[(G - V(s))²] ``` ## DreamerV2 Enhancements DreamerV2 introduces several improvements: 1. **Discrete latents**: Categorical latent variables instead of Gaussian 2. **KL balancing**: Separate weighting for prior/posterior KL 3. **Discount model**: Learns to predict episode termination 4. **Layer normalization**: More stable training ## Environment Support Dreamer supports multiple backends: ```python cfg = DreamerConfig() cfg.env_backend = "dmc" # DeepMind Control Suite cfg.env = "walker-walk" cfg.env_backend = "gym" # Gym/Gymnasium cfg.env = "Pendulum-v1" cfg.env_backend = "unity_mlagents" # Unity ML-Agents cfg.unity_file_name = "env.exe" ``` ## References - Hafner, D., Lillicrap, T., Fischer, I., Vuong, Q., Held, D., Haarnoja, T., & Abbeel, P. (2019). Dreamer: Learning Latent Dynamics for Planning from Pixels. - Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2020). Mastering Atari with Discrete World Models.