Environment Backends

TorchWM ships environment adapters for pixel-based world-model training, model-based reinforcement learning, and benchmark collection. Each backend page explains the installation requirements, factory functions, observation and action conventions, configuration fields, and common troubleshooting steps for that backend.

Choosing a backend

| Backend | Best for | Primary APIs | Typical observations | Typical actions |
| --- | --- | --- | --- | --- |
| DeepMind Control Suite | Dreamer-style continuous-control tasks with state and rendered image observations | `DeepMindControlEnv`, `env_backend="dmc"` | Dict with DMC state keys plus `image` | Continuous `Box` from the DMC action spec |
| DeepMind Lab | 3D navigation and puzzle tasks from DeepMind Lab | `DMLabEnv`, `make_dmlab_env`, `env_backend="dmlab"` | Dict with `image` plus requested Lab observations | Normalized one-hot `Box[-1, 1]` over native Lab actions |
| Gym and Gymnasium | Classic control, Box2D, custom Gym environments, and generic rendered tasks | `GymImageEnv`, `make_gym_env`, `env_backend="gym"` | Dict with `image` only | Original continuous `Box` or one-hot vector for discrete actions |
| DeepMind BSuite | Small diagnostic RL benchmark tasks such as `catch/0` and `deep_sea/0` | `BSuiteImageEnv`, `make_bsuite_env`, `env_backend="bsuite"` | Dict with synthetic `image` only | One-hot vector for discrete actions |
| Brax | JAX/Brax continuous-control tasks through a Gym-like image adapter | `BraxImageEnv`, `make_brax_env`, `env_backend="brax"` | Dict with synthesized `image`; raw vector in `info["vector_observation"]` | Continuous `Box[-1, 1]` matching `env.action_size` |
| Atari | Atari 2600 environments through Gymnasium/ALE | `make_atari_env`, `make_atari_vector_env` | ALE RGB/RAM observations | Discrete Atari actions |
| Procgen | Procedurally generated benchmark games | `ProcgenImageEnv`, `make_procgen_env`, `env_backend="procgen"` | Dict with `image` | One-hot vector for discrete Procgen actions |
| MuJoCo | Gymnasium MuJoCo task ids and native MJCF/MJB models | `make_mujoco_env` | Image dict via `GymImageEnv`/`MuJoCoImageEnv` | Continuous `Box` |
| Gymnasium Robotics | All ids registered by the installed Gymnasium Robotics package, including moved legacy MuJoCo v2/v3 ids | `make_robotics_env`, `list_gymnasium_robotics_envs` | Image dict via `GymImageEnv` | Continuous `Box` |
| Unity ML-Agents | External Unity executable simulations with continuous-control behaviors | `UnityMLAgentsEnv`, `env_backend="unity_mlagents"` | Dict with `image` | Continuous `Box[-1, 1]` |
| Vectorized environments | Multiprocess/vector rollout collection and native ALE vectorization | `TorchVectorizedEnv`, `make_atari_vector_env` | Batched observations | Batched actions |
| World Model Env | Model-based RL, policy optimization, and evaluation inside learned dynamics | `WorldModelEnv`, `make_world_model_env`, `env_backend="world-model"` | Adapter-defined Gymnasium space | Adapter-defined Gymnasium space |
| Wrappers | Shared preprocessing, action conversion, time limits, reward observations, and image transforms | `world_models.envs.wrappers` | Backend-dependent | Backend-dependent |
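
Several backends in the table expose discrete actions as one-hot vectors. The snippet below is a minimal sketch of that encoding and its inverse, assuming the adapter recovers the discrete index with an argmax. The helper names `one_hot` and `to_discrete` are hypothetical and are not part of the TorchWM API.

```python
import numpy as np


def one_hot(index: int, num_actions: int) -> np.ndarray:
    """Encode a discrete action index as a float32 one-hot vector."""
    if not 0 <= index < num_actions:
        raise ValueError(f"index {index} out of range for {num_actions} actions")
    vec = np.zeros(num_actions, dtype=np.float32)
    vec[index] = 1.0
    return vec


def to_discrete(action: np.ndarray) -> int:
    """Recover the discrete index from a one-hot (or soft) action vector.

    Assumption: the largest entry marks the selected action.
    """
    return int(np.argmax(action))


# Round trip: index -> one-hot vector -> index.
action = one_hot(2, num_actions=4)
assert to_discrete(action) == 2
```

Taking the argmax also tolerates policy outputs that are only approximately one-hot, which is one plausible reason to prefer it over an exact equality check.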

Shared conventions

Most TorchWM training code expects image observations as a dictionary entry named `image` with channel-first shape `(3, H, W)` and `uint8` values. Backend adapters that wrap vector-only environments synthesize an image representation so pixel-based agents can still run.
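
As an illustration of the image convention, the sketch below converts a channel-last render into a channel-first `uint8` array and stores it under the `image` key. The helper `to_chw_uint8` is hypothetical, and the assumption that float frames are scaled to `[0, 1]` is ours rather than something the backends guarantee.

```python
import numpy as np


def to_chw_uint8(frame: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) frame to a contiguous (3, H, W) uint8 array."""
    if frame.ndim != 3 or frame.shape[-1] != 3:
        raise ValueError(f"expected shape (H, W, 3), got {frame.shape}")
    if np.issubdtype(frame.dtype, np.floating):
        # Assumption: float frames lie in [0, 1]; scale them to [0, 255].
        frame = np.rint(np.clip(frame, 0.0, 1.0) * 255.0)
    # Move the channel axis to the front and force a contiguous copy.
    return np.ascontiguousarray(frame.astype(np.uint8).transpose(2, 0, 1))


# A float render in [0, 1], as many simulators return it.
frame = np.random.default_rng(0).random((64, 64, 3))
obs = {"image": to_chw_uint8(frame)}
assert obs["image"].shape == (3, 64, 64)
assert obs["image"].dtype == np.uint8
```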

DIAMOND-style Atari support is documented on the Atari page as a preprocessing helper for Atari rollouts. It is not a separate environment backend.

Dreamer environment construction applies a standard wrapper stack after creating DMC, DMLab, Gym/Gymnasium, MuJoCo, Gymnasium Robotics, Procgen, BSuite, Brax, or Unity environments:

- `ActionRepeat` repeats each selected action for `cfg.action_repeat` environment steps.
- `NormalizeActions` exposes finite continuous action bounds as normalized `[-1, 1]` policy outputs.
- `TimeLimit` truncates episodes after `cfg.time_limit // cfg.action_repeat` wrapper steps.
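
The arithmetic behind the three wrappers can be sketched in a few standalone functions. This is not the TorchWM implementation: the function names are hypothetical, and the details marked as assumptions (linear rescaling, summing rewards, stopping early when the episode ends) follow common Dreamer-style conventions rather than anything stated above.

```python
import numpy as np


def denormalize_action(action, low, high):
    """Map a policy output in [-1, 1] to native bounds [low, high].

    Assumption: a linear rescale applied to every (finite) dimension.
    """
    action = np.clip(action, -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)


def repeat_action(step_fn, action, repeats):
    """Apply one action `repeats` times.

    Assumption: rewards are summed and repetition stops once the
    episode ends. `step_fn(action)` returns `(reward, done)`.
    """
    total, done = 0.0, False
    for _ in range(repeats):
        reward, done = step_fn(action)
        total += reward
        if done:
            break
    return total, done


def wrapper_step_limit(time_limit, action_repeat):
    """Number of wrapper steps before truncation: time_limit // action_repeat."""
    return time_limit // action_repeat


# 1000 environment steps with action_repeat=2 means 500 wrapper steps.
assert wrapper_step_limit(1000, 2) == 500

low, high = np.zeros(3), np.full(3, 2.0)
native = denormalize_action(np.array([-1.0, 0.0, 1.0]), low, high)
assert np.allclose(native, [0.0, 1.0, 2.0])
```

Dividing the time limit by the repeat count keeps the episode length constant in environment steps regardless of how many times each action is repeated.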

Use `torchwm envs list` to print the lightweight backend catalog used by the CLI.