# Gym and Gymnasium

The Gym/Gymnasium backend adapts standard Gym-like environments to TorchWM's image-first training interface. It accepts either an environment ID string or a pre-built environment instance and returns observations as `{"image": ...}`.

## Install

Gymnasium and Gym are included in TorchWM's base dependency set. Optional environment families may require extras:

```bash
pip install "torchwm[gym]"
pip install "gymnasium[classic-control,box2d,atari]"
```

Install the extras needed by the specific Gymnasium environment ID you plan to use.

## Main APIs

```python
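from torchwm import make_gym_env

# Build the wrapped environment directly from a Gym/Gymnasium environment ID.
env = make_gym_env("Pendulum-v1", seed=0, size=(64, 64), render_mode="rgb_array")
obs = env.reset()  # observation dict with an "image" entry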
```

You can also wrap an already-created environment:

```python
import gymnasium as gym
from torchwm import GymImageEnv

base_env = gym.make("CartPole-v1", render_mode="rgb_array")
env = GymImageEnv(base_env, seed=123, size=(64, 64))
```

## Dreamer configuration

```python
from torchwm import DreamerConfig

cfg = DreamerConfig()
cfg.env_backend = "gym"
cfg.env = "Pendulum-v1"
cfg.gym_render_mode = "rgb_array"
cfg.image_size = 64
```

`env_backend` can be `"gym"`, `"gymnasium"`, or `"generic"`. If `cfg.env_instance` is provided, Dreamer wraps that instance with `GymImageEnv` regardless of the backend string.

## Observation conversion

`GymImageEnv` always exposes:

```python
{"image": ...}  # uint8 array with shape (3, H, W)
```

The wrapper handles several observation styles:

- Tuple reset/step outputs from Gymnasium by taking the first item as the observation.
- Dict observations by preferring image-like keys such as `image`, `pixels`, `rgb`, `observation`, or `state`.
- Vector observations by rendering simple vertical intensity bands into an RGB image.
- HWC, CHW, grayscale, and RGBA images by converting to RGB, resizing, and transposing to CHW.
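
The image-layout handling in the last bullet can be illustrated with a minimal NumPy sketch. The helper name `to_chw_uint8`, the nearest-neighbour resize, the `(height, width)` reading of `size`, and the assumption that float frames lie in `[0, 1]` are all illustrative choices, not TorchWM's actual implementation:

```python
import numpy as np


def to_chw_uint8(frame, size=(64, 64)):
    """Hypothetical helper: convert a frame to a (3, H, W) uint8 array."""
    arr = np.asarray(frame)
    if arr.ndim == 2:
        # Grayscale (H, W): add a trailing channel axis.
        arr = arr[..., None]
    elif arr.ndim == 3 and arr.shape[0] in (1, 3, 4) and arr.shape[-1] not in (1, 3, 4):
        # Channel-first (C, H, W): move channels to the last axis.
        arr = np.transpose(arr, (1, 2, 0))
    if arr.ndim != 3:
        raise ValueError(f"unsupported frame shape: {arr.shape}")
    if arr.shape[-1] == 1:
        # Single channel: replicate it into R, G, and B.
        arr = np.repeat(arr, 3, axis=-1)
    elif arr.shape[-1] == 4:
        # RGBA: drop the alpha channel.
        arr = arr[..., :3]
    if np.issubdtype(arr.dtype, np.floating):
        # Assumption: float frames are scaled to [0, 1].
        arr = arr * 255.0
    arr = np.clip(arr.astype(np.float32), 0, 255).astype(np.uint8)
    # Nearest-neighbour resize via integer index arrays (no extra dependencies).
    height, width = arr.shape[:2]
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    arr = arr[rows][:, cols]
    # HWC -> CHW.
    return np.transpose(arr, (2, 0, 1))
```

A shape such as `(3, 5, 3)` is ambiguous between HWC and CHW; this sketch treats it as HWC.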

When the wrapped environment supports `render()`, TorchWM attempts to use rendered frames for visual observations. If rendering fails or only vector observations are available, it falls back to vector-to-image synthesis.
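
One way to picture the vector-to-image fallback is the sketch below. It is only an illustration of the "vertical intensity bands" idea: the function name `vector_to_bands`, the `tanh` squashing of unbounded values, and the `(height, width)` reading of `size` are assumptions, and TorchWM's own synthesis may differ.

```python
import numpy as np


def vector_to_bands(vec, size=(64, 64)):
    """Hypothetical helper: draw a 1-D observation as vertical bands, shape (3, H, W)."""
    vec = np.nan_to_num(np.asarray(vec, dtype=np.float32).ravel())
    if vec.size == 0:
        raise ValueError("cannot render an empty observation vector")
    # Assumption: squash unbounded values into [0, 255] with tanh.
    intensity = ((np.tanh(vec) + 1.0) * 0.5 * 255.0).astype(np.uint8)
    height, width = size
    # Assign each image column to one vector element, producing vertical bands.
    band_index = np.arange(width) * vec.size // width
    row = intensity[band_index]
    # Repeat the row over every image row and all three colour channels.
    return np.ascontiguousarray(np.broadcast_to(row, (3, height, width)))
```

If the vector has more elements than the image has columns, this sketch skips some elements; a real implementation would need to pool or tile them.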

## Action conversion

For continuous action spaces, `GymImageEnv.action_space` mirrors the wrapped environment's `Box` bounds.

For discrete action spaces, `GymImageEnv.action_space` is a continuous `Box` of shape `(n,)` in `[-1, 1]`. The wrapper expects a one-hot-like action vector and converts it to the discrete index with `argmax` before stepping the base environment. Its `sample()` method returns one-hot vectors with `1.0` at the selected action and `-1.0` elsewhere.
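
The discrete-action convention described above can be sketched with plain NumPy. The function names `one_hot_to_index` and `sample_one_hot` are hypothetical stand-ins for the wrapper's internal logic, not part of TorchWM's API:

```python
import numpy as np


def one_hot_to_index(action):
    """Pick the discrete action index from a one-hot-like vector via argmax."""
    return int(np.argmax(np.asarray(action)))


def sample_one_hot(n, rng):
    """Sample a random action: 1.0 at the chosen index, -1.0 elsewhere."""
    vec = np.full(n, -1.0, dtype=np.float32)
    vec[rng.integers(n)] = 1.0
    return vec


rng = np.random.default_rng(0)
action = sample_one_hot(4, rng)    # vector of length 4 with values in {-1.0, 1.0}
index = one_hot_to_index(action)   # integer passed to the base environment
```

Because only the `argmax` matters, a policy can emit raw logits or `tanh` outputs of length `n` without normalizing them first.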

## Example environments

The lightweight catalog lists common IDs such as:

- Classic control: `CartPole-v1`, `Pendulum-v1`, `Acrobot-v1`, `MountainCarContinuous-v0`
- MuJoCo-style IDs: `HalfCheetah-v4`, `Humanoid-v4`, `Hopper-v4`, `Walker2d-v4`, `Ant-v4`
- Box2D: `LunarLander-v3`, `LunarLanderContinuous-v3`, `BipedalWalker-v3`, `CarRacing-v3`
- Toy text: `Blackjack-v1`, `FrozenLake-v1`, `Taxi-v3`

## CLI collection

The CLI can collect random-policy rollouts from Gym-like environments:

```bash
torchwm collect --env CartPole-v1 --steps 1000 --out cartpole.npz
```

The command first tries `torchwm.make_env()` and falls back to `gym.make()`.

## Troubleshooting

- **Black frames or missing render output**: create the environment with `render_mode="rgb_array"` and pass the same render mode to `GymImageEnv`.
- **Box2D import errors**: install the Box2D Gymnasium extra.
- **Discrete policies produce invalid actions**: emit vectors of length `env.action_space.shape[0]`; the wrapper chooses `argmax`.
- **Custom environment reset signatures**: Gymnasium-style `(obs, info)` and Gym-style `obs` resets are both supported by the wrapper.
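
When debugging a custom environment, it can help to see how the two reset conventions (and the matching step conventions) might be normalized. The sketch below is an assumption about the general approach, not the wrapper's actual code, and the names `unpack_reset` and `unpack_step` are hypothetical:

```python
def unpack_reset(result):
    """Return (obs, info) for both Gymnasium-style and Gym-style resets."""
    # Gymnasium returns (obs, info) with info as a dict; legacy Gym returns obs only.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result
    return result, {}


def unpack_step(result):
    """Return (obs, reward, done, info) for 5-tuple and 4-tuple step outputs."""
    if len(result) == 5:
        # Gymnasium: terminated and truncated are reported separately.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    # Legacy Gym: a single done flag.
    obs, reward, done, info = result
    return obs, reward, bool(done), info
```

An environment whose observation is itself a 2-tuple ending in a dict would be misread by this check, so such environments are best wrapped to return a single array or dict observation.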