# Dreamer: Model-Based RL with Latent Dynamics

Dreamer is a model-based reinforcement learning algorithm that learns a latent dynamics model from images and trains a behavior policy entirely in the latent space.

Based on papers:

- [Dream to Control: Learning Behaviors by Latent Imagination](https://arxiv.org/abs/1912.01603) (Dreamer, Hafner et al., 2019)
- [Mastering Atari with Discrete World Models](https://arxiv.org/abs/2010.02193) (DreamerV2, Hafner et al., 2020)

## Key Idea

Dreamer learns:

1. **World Model**: Latent dynamics model that predicts future latent states
2. **Value Model**: Estimates expected returns from any latent state
3. **Policy**: Actions that maximize expected returns in latent space

The key innovation is learning behaviors purely in imagination: the actor and critic train on latent rollouts predicted by the world model, so no gradients flow from the environment.

## Architecture

1. **World Model (RSSM)**: Encoder (CNN, 64x64 input) → GRU plus stochastic latent model → Decoder (transposed CNN)
2. **Imagination Rollout**: State s0 → Action a0 → Imagined future states → Lambda-return target
3. **Actor-Critic Learning**: Actor (policy) and Critic (value model), both trained on the imagined rollout
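
The lambda-return target at the end of the imagination rollout can be illustrated with a small pure-Python sketch. It assumes the common recursive form, in which each target blends the one-step bootstrap with the longer return from the next step, and it bootstraps from the critic's value at the final imagined state. The function name `lambda_returns` is hypothetical and is not part of the TorchWM API; the argument names `discount` and `td_lambda` follow the hyperparameter names used in this document.

```python
def lambda_returns(rewards, values, discount=0.99, td_lambda=0.95):
    """Compute lambda-return targets for one imagined rollout (illustrative sketch).

    rewards: H predicted rewards r_0 .. r_{H-1}
    values:  H + 1 critic estimates V(s_0) .. V(s_H); the last one bootstraps
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly one more entry than rewards")
    targets = [0.0] * len(rewards)
    next_return = values[-1]  # bootstrap from the value of the final imagined state
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda-return from step t + 1.
        blended = (1.0 - td_lambda) * values[t + 1] + td_lambda * next_return
        next_return = rewards[t] + discount * blended
        targets[t] = next_return
    return targets


rewards = [1.0, 1.0, 1.0]
values = [0.0, 2.0, 4.0, 10.0]
# td_lambda=0 reduces to one-step TD targets: r_t + discount * V(s_{t+1})
print(lambda_returns(rewards, values, discount=0.5, td_lambda=0.0))  # [2.0, 3.0, 6.0]
# td_lambda=1 reduces to the discounted reward sum plus the final bootstrap value
print(lambda_returns(rewards, values, discount=0.5, td_lambda=1.0))  # [3.0, 4.0, 6.0]
```

Intermediate `td_lambda` values trade off the bias of short bootstrapped targets against the variance of long imagined returns.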

## Components

### 1. Recurrent State Space Model (RSSM)

The core world model combining:

- **Deterministic hidden state** (h_t): Recurrent state (GRU)
- **Stochastic latent state** (s_t): Discrete or continuous latent variables

**Dynamics**:

```{math}
\mathbf{h}_t = f(\mathbf{h}_{t-1}, \mathbf{s}_{t-1}, \mathbf{a}_{t-1})
```

**Posterior**:

```{math}
\mathbf{s}_t \sim q(\mathbf{s}_t \mid \mathbf{h}_t, \mathbf{x}_t)
```

**Prior**:

```{math}
\mathbf{s}_t \sim p(\mathbf{s}_t \mid \mathbf{h}_t)
```

The posterior conditions on the current observation x_t and is used while training on real experience. The prior sees only h_t, so it is the distribution used to roll the model forward in imagination.

### 2. Encoder/Decoder

- **Encoder**: CNN that maps images to latent embeddings
- **Decoder**: Transposed CNN that reconstructs images from latents
- Both use ReLU activations and residual connections

### 3. Reward/Discount Heads

- **Reward model**: Predicts reward from latent state
- **Discount model**: Predicts episode termination (DreamerV2)

## Training

```python :class: thebe
from torchwm import DreamerAgent, DreamerConfig

# Start from the default configuration and override only what is needed.
cfg = DreamerConfig()
cfg.env_backend = "gym"
cfg.env = "Pendulum-v1"
cfg.total_steps = 1_000_000

# Build the agent and run the full training loop.
agent = DreamerAgent(cfg)
agent.train()
```

### Key Hyperparameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `stoch_size` | 30 | Stochastic latent dimensions |
| `deter_size` | 200 | Deterministic hidden size |
| `embed_size` | 1024 | Encoder embedding size |
| `imagine_horizon` | 15 | Imagination rollout length |
| `discount` | 0.99 | Discount factor γ |
| `td_lambda` | 0.95 | λ-return parameter |
| `kl_loss_coeff` | 1.0 | KL divergence weight |

### Learning Objectives

**World Model Loss**:

```{math}
\begin{aligned}
\mathcal{L}_\mathrm{world} &= \mathcal{L}_\mathrm{reconstruction} + \mathcal{L}_\mathrm{reward} + \beta \cdot \mathcal{L}_\mathrm{KL}
\end{aligned}
```

Here β is the KL weight (`kl_loss_coeff`).

**Actor Loss** (REINFORCE):

```{math}
\mathcal{L}_\mathrm{actor} = -\mathbb{E}\left[\log \pi(\mathbf{a} \mid \mathbf{s}) \cdot (G - V(\mathbf{s}))\right]
```

**Critic Loss** (MSE):

```{math}
\mathcal{L}_\mathrm{critic} = \mathbb{E}\left[(G - V(\mathbf{s}))^2\right]
```

In both losses, G is the λ-return target computed over the imagined rollout and V(s) is the critic's value estimate.

## DreamerV2 Enhancements

DreamerV2 introduces several improvements:

1. **Discrete latents**: Categorical latent variables instead of Gaussian
2. **KL balancing**: Separate weighting for prior/posterior KL
3. **Discount model**: Learns to predict episode termination
4. **Layer normalization**: More stable training

## Environment Support

Dreamer supports multiple backends:

```python :class: thebe
from torchwm import DreamerConfig

cfg = DreamerConfig()

# Set one backend per config; the groups below are alternatives.
cfg.env_backend = "dmc"  # DeepMind Control Suite
cfg.env = "walker-walk"

cfg.env_backend = "gym"  # Gym/Gymnasium
cfg.env = "Pendulum-v1"

# MuJoCo example:
cfg.env_backend = "mujoco"  # MuJoCo task ids or native MJCF/MJB files
cfg.env = "Humanoid-v4"  # or "models/cartpole.xml"
cfg.mujoco_camera = None  # native MJCF/MJB only
cfg.mujoco_frame_skip = 4  # native MJCF/MJB only

# Gymnasium Robotics example (all ids registered by installed package):
cfg.env_backend = "robotics"
cfg.env = "HalfCheetah-v2"

# Brax example:
cfg.env_backend = "brax"  # JAX/Brax
cfg.env = "ant"
cfg.brax_backend = "generalized"

# Unity ML-Agents example:
cfg.env_backend = "unity_mlagents"
cfg.unity_file_name = "env.exe"
```

For MuJoCo tasks, Dreamer delegates adapter construction to `make_mujoco_env_from_config`. This keeps `make_env` focused on backend selection, while the MuJoCo module decides whether the source is a task id or an XML/MJB file. Use Gymnasium task ids such as `Humanoid-v4` for standard benchmark rewards, or use native MJCF/MJB sources plus `MuJoCoImageEnv` callbacks for custom rewards and termination logic. Legacy MuJoCo v2/v3 ids and other Gymnasium Robotics tasks can use `env_backend="robotics"`; TorchWM lists those ids dynamically from the installed `gymnasium-robotics` package.

## References

- Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). Dream to Control: Learning Behaviors by Latent Imagination.
- Hafner, D., Lillicrap, T., Norouzi, M., & Ba, J. (2020). Mastering Atari with Discrete World Models.