Environment Wrappers

torchwm provides reusable environment preprocessing wrappers used by Dreamer and other environment pipelines. These wrappers let you compose time limits, action repeats, action normalization, observation dictionaries, one-hot actions, reward observations, and image transforms.

Standard Dreamer wrapper stack

Dreamer environment construction applies wrappers in this order:

env = ActionRepeat(env, cfg.action_repeat)
env = NormalizeActions(env)
env = TimeLimit(env, cfg.time_limit // cfg.action_repeat)

This creates a stable interface for policies that emit normalized actions and train on fixed-length episodes.
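The division in the last line keeps the limit expressed in simulator steps. A minimal sketch of the arithmetic, assuming `cfg.time_limit` counts base-environment steps (the `cfg` object and its values below are hypothetical stand-ins for a Dreamer config):

```python
from types import SimpleNamespace

# Hypothetical config values, chosen only for illustration.
cfg = SimpleNamespace(action_repeat=2, time_limit=1000)

# TimeLimit counts wrapper steps; each wrapper step runs
# action_repeat simulator steps because it wraps ActionRepeat.
wrapper_steps = cfg.time_limit // cfg.action_repeat
simulator_steps = wrapper_steps * cfg.action_repeat

# Floor division drops the remainder when the values do not divide evenly.
uneven_wrapper_steps = 1000 // 3
uneven_simulator_steps = uneven_wrapper_steps * 3
```

With these values the policy acts 500 times per episode while the simulator advances 1000 steps. With a repeat of 3, the episode would cover 999 simulator steps rather than 1000.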

Common wrappers

Wrapper	Purpose
`TimeLimit`	End an episode after a fixed number of wrapper steps
`ActionRepeat`	Repeat each action and accumulate reward
`NormalizeActions`	Map normalized `[-1, 1]` actions back to finite environment bounds
`ObsDict`	Convert plain observations into a dictionary under a named key
`OneHotAction`	Convert one-hot vectors into discrete action indices
`RewardObs`	Add the latest reward to the observation dictionary under `reward`
`ResizeImage`	Resize image entries in an observation dictionary
`RenderImage`	Add `env.render("rgb_array")` output to observations
`SelectAction`	Select a named action entry before passing it to the environment
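To make the first two rows concrete, the following is a rough sketch of how a repeat wrapper and a time-limit wrapper can be written. It is not torchwm's implementation: the class names `ActionRepeatSketch`, `TimeLimitSketch`, and `CountingEnv` are hypothetical, and the sketch assumes a `step` method that returns a 4-tuple `(obs, reward, done, info)`.

```python
class CountingEnv:
    """Toy environment (hypothetical): the observation is the base step count."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.steps

    def step(self, action):
        self.steps += 1
        return self.steps, 1.0, False, {}


class ActionRepeatSketch:
    """Apply the same action `amount` times (amount >= 1) and sum the rewards."""

    def __init__(self, env, amount):
        self.env = env
        self.amount = amount

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total = 0.0
        for _ in range(self.amount):
            obs, reward, done, info = self.env.step(action)
            total += reward
            if done:  # stop repeating once the episode has ended
                break
        return obs, total, done, info


class TimeLimitSketch:
    """Mark the episode as done after `duration` calls to step."""

    def __init__(self, env, duration):
        self.env = env
        self.duration = duration
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.duration:
            done = True
        return obs, reward, done, info


# Run one episode: 3 wrapper steps, each repeating the action twice.
env = TimeLimitSketch(ActionRepeatSketch(CountingEnv(), amount=2), duration=3)
env.reset()
done, wrapper_steps, total_reward = False, 0, 0.0
while not done:
    obs, reward, done, info = env.step(0)
    wrapper_steps += 1
    total_reward += reward
```

The episode ends after 3 wrapper steps, by which point the toy environment has advanced 6 base steps and the accumulated reward is 6.0.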

Example composition

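from torchwm import ActionRepeat, NormalizeActions, TimeLimit, make_gym_env

env = make_gym_env("Pendulum-v1", size=(64, 64), render_mode="rgb_array")
# Apply each policy action twice and sum the rewards.
env = ActionRepeat(env, amount=2)
# Accept actions in [-1, 1] and map them to the environment's bounds.
env = NormalizeActions(env)
# End the episode after 500 wrapper steps (1000 simulator steps with amount=2).
env = TimeLimit(env, duration=500)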

Action wrappers

Use NormalizeActions for continuous-control environments when the policy emits normalized values but the simulator expects task-specific bounds. Use OneHotAction for raw discrete environments when your policy outputs one-hot action vectors.

GymImageEnv already provides a one-hot-style action space for discrete base environments, so avoid double-applying discrete conversion unless you intentionally bypass GymImageEnv.
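The two conversions described in this section can be sketched as plain functions. This assumes a linear rescaling from `[-1, 1]` to `[low, high]` and an argmax over the one-hot vector. Both are common choices, but they are assumptions here rather than confirmed torchwm behavior, and the function names `denormalize` and `one_hot_to_index` are hypothetical.

```python
def denormalize(action, low, high):
    """Linearly map each component from [-1, 1] to its [low, high] range."""
    return [
        lo + (a + 1.0) / 2.0 * (hi - lo)
        for a, lo, hi in zip(action, low, high)
    ]


def one_hot_to_index(one_hot):
    """Return the position of the largest entry in a one-hot vector."""
    return max(range(len(one_hot)), key=lambda i: one_hot[i])


# Example bounds of [-2, 2] for a single torque dimension.
low, high = [-2.0], [2.0]
torque_mid = denormalize([0.5], low, high)
torque_min = denormalize([-1.0], low, high)
torque_max = denormalize([1.0], low, high)

# A one-hot vector over four discrete actions selects index 2.
action_index = one_hot_to_index([0.0, 0.0, 1.0, 0.0])
```

A normalized action of 0.5 lands three quarters of the way through the range, at 1.0, and the endpoints -1 and 1 map to the lower and upper bounds.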

Observation wrappers

Use ObsDict when a base environment returns a plain array but downstream model code expects a dictionary. Use RewardObs when the model should observe the previous reward. Use RenderImage and ResizeImage to add or resize images for pixel-based training.
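As an illustration of the first two wrappers, here is a minimal sketch of wrapping a plain observation in a dictionary and then adding the reward to it. The classes `VectorEnv`, `ObsDictSketch`, and `RewardObsSketch` are hypothetical and not torchwm's code. The key name `"state"`, the 4-tuple `step` return, and the reward of 0.0 reported on reset are all assumptions; only the `reward` key comes from the table above.

```python
class VectorEnv:
    """Toy environment (hypothetical) that returns plain list observations."""

    def reset(self):
        return [0.0, 0.0]

    def step(self, action):
        return [1.0, 2.0], 0.5, False, {}


class ObsDictSketch:
    """Place the plain observation in a dictionary under `key`."""

    def __init__(self, env, key="state"):
        self.env = env
        self.key = key

    def reset(self):
        return {self.key: self.env.reset()}

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return {self.key: obs}, reward, done, info


class RewardObsSketch:
    """Copy the latest reward into the observation dictionary under 'reward'."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        obs = dict(self.env.reset())
        obs["reward"] = 0.0  # assumption: no reward has been received yet
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        obs = dict(obs)  # copy so the inner wrapper's dict is not mutated
        obs["reward"] = reward
        return obs, reward, done, info


env = RewardObsSketch(ObsDictSketch(VectorEnv(), key="state"))
first_obs = env.reset()
next_obs, reward, done, info = env.step(0)
```

Order matters in this sketch: `RewardObsSketch` expects a dictionary observation, so it has to be applied after `ObsDictSketch`.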

Troubleshooting

Episode lengths are shorter than expected: account for ActionRepeat; Dreamer divides time_limit by action_repeat before applying TimeLimit.
Actions outside bounds: add NormalizeActions or check whether your policy already emits native environment actions.
Missing observation keys: inspect the environment after each wrapper in your stack to verify the expected dictionary keys.
Image shape mismatch: confirm whether the wrapper returns HWC or CHW images and transpose before model input if needed.
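For the last two items, a pair of small debugging helpers can be useful. The sketch below is not part of torchwm: `describe_obs`, `guess_layout`, and the `FakeImage` stand-in are hypothetical names, and the layout guess relies on the heuristic that a channel axis has size 1, 3, or 4.

```python
def describe_obs(obs):
    """Map each observation key to its shape, or to its type name if it has none."""
    summary = {}
    for key, value in obs.items():
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary


def guess_layout(shape):
    """Guess 'HWC' or 'CHW' for a 3-D image shape; return None when ambiguous."""
    if len(shape) != 3:
        return None
    channel_sizes = (1, 3, 4)
    first_is_channel = shape[0] in channel_sizes
    last_is_channel = shape[-1] in channel_sizes
    if last_is_channel and not first_is_channel:
        return "HWC"
    if first_is_channel and not last_is_channel:
        return "CHW"
    return None


class FakeImage:
    """Stand-in for an array object; only the .shape attribute is used."""

    shape = (64, 64, 3)


# Print or log this summary after each wrapper to see which keys it adds.
summary = describe_obs({"image": FakeImage(), "reward": 0.0})
layout = guess_layout(summary["image"])
```

If the guessed layout is HWC and the model expects CHW, a NumPy image can be converted with `image.transpose(2, 0, 1)`.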