Environment Backends

TorchWM ships environment adapters for pixel-based world-model training, model-based reinforcement learning, and benchmark collection. Each backend page explains the installation requirements, factory functions, observation and action conventions, configuration fields, and common troubleshooting steps for that backend.

Choosing a backend

| Backend | Best for | Primary APIs | Typical observations | Typical actions |
| --- | --- | --- | --- | --- |
| DeepMind Control Suite | Dreamer-style continuous-control tasks with state and rendered image observations | `DeepMindControlEnv`, `env_backend="dmc"` | Dict with DMC state keys plus `image` | Continuous `Box` from the DMC action spec |
| Gym and Gymnasium | Classic control, Box2D, custom Gym environments, and generic rendered tasks | `GymImageEnv`, `make_gym_env`, `env_backend="gym"` | Dict with `image` only | Original continuous `Box` or one-hot vector for discrete actions |
| Brax | JAX/Brax continuous-control tasks through a Gym-like image adapter | `BraxImageEnv`, `make_brax_env`, `env_backend="brax"` | Dict with synthesized `image`; raw vector in `info["vector_observation"]` | Continuous `Box[-1, 1]` matching `env.action_size` |
| Atari | Atari 2600 environments through Gymnasium/ALE | `make_atari_env`, `make_atari_vector_env` | ALE RGB/RAM observations | Discrete Atari actions |
| MuJoCo | Gymnasium MuJoCo task ids and native MJCF/MJB models | `make_mujoco_env` | Image dict via `GymImageEnv`/`MuJoCoImageEnv` | Continuous `Box` |
| Gymnasium Robotics | All ids registered by the installed Gymnasium Robotics package, including moved legacy MuJoCo v2/v3 ids | `make_robotics_env`, `list_gymnasium_robotics_envs` | Image dict via `GymImageEnv` | Continuous `Box` |
| Unity ML-Agents | External Unity executable simulations with continuous-control behaviors | `UnityMLAgentsEnv`, `env_backend="unity_mlagents"` | Dict with `image` | Continuous `Box[-1, 1]` |
| Vectorized environments | Multiprocess/vector rollout collection and native ALE vectorization | `TorchVectorizedEnv`, `make_atari_vector_env` | Batched observations | Batched actions |
| Wrappers | Shared preprocessing, action conversion, time limits, reward observations, and image transforms | `world_models.envs.wrappers` | Backend-dependent | Backend-dependent |
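
The Gym and Gymnasium row above mentions one-hot vectors for discrete actions. The sketch below illustrates that convention with plain NumPy. The helper names `index_to_one_hot` and `one_hot_to_index` are hypothetical and not part of TorchWM, and decoding with `argmax` is an assumption about how such a vector might be mapped back to a discrete index.

```python
import numpy as np


def index_to_one_hot(index: int, num_actions: int) -> np.ndarray:
    """Encode a discrete action index as a float32 one-hot vector."""
    if not 0 <= index < num_actions:
        raise ValueError(f"index {index} is outside [0, {num_actions})")
    one_hot = np.zeros(num_actions, dtype=np.float32)
    one_hot[index] = 1.0
    return one_hot


def one_hot_to_index(action: np.ndarray) -> int:
    """Decode a one-hot (or soft) action vector with argmax (assumed rule)."""
    action = np.asarray(action, dtype=np.float32)
    if action.ndim != 1:
        raise ValueError(f"expected a 1-D action vector, got shape {action.shape}")
    return int(np.argmax(action))


# Hypothetical two-action task: the policy emits a vector,
# and the underlying environment receives an integer index.
policy_output = index_to_one_hot(1, num_actions=2)
env_action = one_hot_to_index(policy_output)
```

Using `argmax` also tolerates soft policy outputs such as `[0.2, 0.8]`, which is one reason a one-hot action space can be convenient for agents that otherwise emit continuous vectors.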

Shared conventions

Most TorchWM training code expects image observations as a dictionary entry named `image` with channel-first shape `(3, H, W)` and `uint8` values. Backend adapters that wrap vector-only environments synthesize an image representation so pixel-based agents can still run.
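
As a minimal sketch of this convention, the snippet below converts a channel-last RGB frame, as most renderers return it, into the dictionary layout described above. The function name `to_image_obs` is hypothetical and not a TorchWM API, and the handling of float frames (assumed to lie in `[0, 1]`) is an assumption made for illustration.

```python
import numpy as np


def to_image_obs(frame_hwc: np.ndarray) -> dict:
    """Wrap an (H, W, 3) RGB frame as {"image": uint8 array of shape (3, H, W)}."""
    frame = np.asarray(frame_hwc)
    if frame.ndim != 3 or frame.shape[-1] != 3:
        raise ValueError(f"expected an (H, W, 3) frame, got shape {frame.shape}")
    if np.issubdtype(frame.dtype, np.floating):
        # Assumption: float frames are scaled to [0, 1].
        frame = np.round(np.clip(frame, 0.0, 1.0) * 255.0).astype(np.uint8)
    elif frame.dtype != np.uint8:
        raise TypeError(f"unsupported frame dtype {frame.dtype}")
    # Move channels first: (H, W, 3) -> (3, H, W).
    return {"image": np.ascontiguousarray(frame.transpose(2, 0, 1))}


# A dummy 64x64 render stands in for a real environment frame.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
obs = to_image_obs(frame)
```

Checking `obs["image"].shape` and `obs["image"].dtype` before feeding a custom environment into training code is a cheap way to catch channel-order mistakes early.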

DIAMOND-style Atari support is documented on the Atari page as a preprocessing helper for Atari rollouts. It is not a separate environment backend.

Dreamer environment construction applies a standard wrapper stack after creating DMC, Gym/Gymnasium, Brax, or Unity environments:

- `ActionRepeat` repeats each selected action for `cfg.action_repeat` environment steps.
- `NormalizeActions` exposes finite continuous action bounds as normalized `[-1, 1]` policy outputs.
- `TimeLimit` truncates episodes after `cfg.time_limit // cfg.action_repeat` wrapper steps.
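
The arithmetic behind these three wrappers can be sketched in plain Python. This is not the TorchWM implementation: the function names are hypothetical, and summing rewards and stopping early when an episode ends are assumptions borrowed from common Dreamer-style setups.

```python
import numpy as np


def denormalize_action(action, low, high):
    """Map a policy output in [-1, 1] onto finite bounds [low, high]."""
    action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)


def wrapper_step_limit(time_limit: int, action_repeat: int) -> int:
    """Number of wrapper steps before truncation, using integer division."""
    return time_limit // action_repeat


def repeat_step(step_fn, action, action_repeat: int):
    """Apply one action for up to action_repeat environment steps.

    Assumptions: rewards are summed and the loop stops once done is True.
    step_fn is a stand-in for env.step that returns (reward, done).
    """
    total_reward, done = 0.0, False
    for _ in range(action_repeat):
        reward, done = step_fn(action)
        total_reward += reward
        if done:
            break
    return total_reward, done


# A torque range of [-2, 2]: a policy output of 0.5 maps to 1.0.
low, high = np.array([-2.0]), np.array([2.0])
env_action = denormalize_action([0.5], low, high)

# time_limit=1000 with action_repeat=2 gives 500 wrapper steps.
limit = wrapper_step_limit(1000, 2)

# Four repeats of a dummy step that always pays reward 1.0.
total_reward, done = repeat_step(lambda a: (1.0, False), env_action, 4)
```

Because the time limit is divided by the action repeat, the episode length measured in underlying environment steps stays roughly constant when `cfg.action_repeat` changes.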

Use `torchwm envs list` to print the lightweight backend catalog used by the CLI.