NuPlan Dataset#

The NuPlan dataset backend provides a PyTorch interface to the Motional NuPlan autonomous driving dataset. It loads real-world driving scenarios, rasterises local map tiles, and returns structured samples containing ego vehicle history, agent trajectories, and planning targets. These samples are suited to training world models for behavioural planning.

Install#

pip install nuplan-devkit

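nuplan-devkit is not part of TorchWM’s minimal dependencies, so it must be installed separately. You also need a local copy of the NuPlan dataset. Download it from nuplan.org and unpack it to ~/nuplan/dataset, or unpack it elsewhere and set $NUPLAN_DATA_ROOT to that location. Map data is read from ~/nuplan/maps unless $NUPLAN_MAP_ROOT is set.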

The dataset is ~1.8 TB for the full split. For prototyping, use the mini split (~/nuplan/dataset/mini, ~13 GB).
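The lookup order described above (environment variable first, then the default path) can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_data_root` and `mini_split_dir` are hypothetical helper names, and only the variable name and default path come from this page.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location stated on this page.
DEFAULT_DATA_ROOT = "~/nuplan/dataset"


def resolve_data_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return $NUPLAN_DATA_ROOT if set and non-empty, else the default path.

    Hypothetical helper: the real backend may resolve the root differently.
    """
    env = os.environ if env is None else env
    raw = env.get("NUPLAN_DATA_ROOT") or DEFAULT_DATA_ROOT
    return Path(raw).expanduser()


def mini_split_dir(root: Path) -> Path:
    """Location of the mini split under a dataset root."""
    return root / "mini"


if __name__ == "__main__":
    root = resolve_data_root()
    print("dataset root:", root)
    print("mini split present:", mini_split_dir(root).is_dir())
```

Running the script before constructing a dataset is a quick way to check which directory would be used and whether the mini split has been unpacked there.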

Main API#

from world_models.datasets.nuplan import NuPlanDataset, make_nuplan_dataloader

# Build a dataset over the mini split.
dataset = NuPlanDataset(
    split="train",
    planning_horizon=80,   # 8 seconds at 10 Hz
    past_horizon=20,       # 2 seconds at 10 Hz
    map_extent=(100.0, 100.0),
    map_resolution=0.1,
    max_agents=32,
    limit_scenarios=100,   # remove for full dataset
)

sample = dataset[0]
# sample.map_raster      -> (3, H, W)  float32
# sample.ego_past        -> (20, 6)    float32
# sample.ego_future      -> (80, 2)    float32
# sample.agents_past     -> (32, 20, 6) float32
# sample.agents_future   -> (32, 80, 6) float32
# sample.agents_mask     -> (32,)      bool
# sample.planning_target -> (80, 2)    float32

# Or create a DataLoader directly.
dataset, loader = make_nuplan_dataloader(
    split="train",
    batch_size=16,
    num_workers=4,
    limit_scenarios=100,
)

Configuration#

| Parameter | Default | Description |
|---|---|---|
| data_root | $NUPLAN_DATA_ROOT or ~/nuplan/dataset | Dataset root directory |
| map_root | $NUPLAN_MAP_ROOT or ~/nuplan/maps | Map data root |
| split | train | One of train, val, test |
| planning_horizon | 80 | Future steps at 10 Hz |
| past_horizon | 20 | Past steps at 10 Hz |
| map_extent | (100.0, 100.0) | Raster crop half-extent in metres |
| map_resolution | 0.1 | Metres per pixel |
| max_agents | 32 | Max agents per sample (zero-padded) |
| limit_scenarios | None | Cap on total scenarios |
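The arithmetic behind these defaults can be checked with a short sketch. The 10 Hz rate comes from this page. The raster size calculation is an assumption: it supposes the crop spans twice the half-extent along each axis, which this page does not state explicitly, so verify it against `sample.map_raster.shape` before relying on it. The helper names are hypothetical.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 10.0  # sampling rate stated on this page


def horizon_seconds(steps: int, rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Convert a horizon expressed in steps to seconds."""
    return steps / rate_hz


def raster_shape(map_extent: Tuple[float, float], map_resolution: float) -> Tuple[int, int]:
    """Return (H, W) in pixels.

    ASSUMPTION: map_extent is (width, height) half-extents, so the crop
    covers 2 * half-extent metres along each axis.
    """
    half_w, half_h = map_extent
    width = round(2 * half_w / map_resolution)
    height = round(2 * half_h / map_resolution)
    return height, width


def raster_bytes(shape: Tuple[int, int], channels: int = 3, bytes_per_value: int = 4) -> int:
    """Memory footprint of one float32 raster with the given shape."""
    height, width = shape
    return channels * height * width * bytes_per_value


if __name__ == "__main__":
    shape = raster_shape((100.0, 100.0), 0.1)
    print(horizon_seconds(80), horizon_seconds(20), shape, raster_bytes(shape))
```

Under that assumption the default raster would be large per sample, so a coarser map_resolution or a smaller map_extent may be worth considering while prototyping.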

Sample structure#

class world_models.datasets.nuplan.NuPlanSample(scenario_name, map_raster, ego_past, ego_future, agents_past, agents_future, agents_mask, agent_types, planning_target)

Bases: object

A single training sample from the NuPlan dataset.

Parameters:
  • scenario_name (str)

  • map_raster (Tensor)

  • ego_past (Tensor)

  • ego_future (Tensor)

  • agents_past (Tensor)

  • agents_future (Tensor)

  • agents_mask (Tensor)

  • agent_types (Tensor)

  • planning_target (Tensor)

scenario_name: str
map_raster: Tensor
ego_past: Tensor
ego_future: Tensor
agents_past: Tensor
agents_future: Tensor
agents_mask: Tensor
agent_types: Tensor
planning_target: Tensor

Dataset class#

class world_models.datasets.nuplan.NuPlanDataset(data_root=None, map_root=None, split='train', db_files=None, map_version='nuplan-maps-v1.0', planning_horizon=80, past_horizon=20, map_extent=(100.0, 100.0), map_resolution=0.1, max_agents=32, limit_scenarios=None)

Bases: Dataset[NuPlanSample]

PyTorch Dataset over NuPlan scenarios for world model training.

Each sample contains rasterised map tiles, ego and agent history, and future planning targets at 10 Hz.

Parameters:
  • data_root (str | Path | None) – Path to the NuPlan dataset root. Defaults to $NUPLAN_DATA_ROOT, falling back to ~/nuplan/dataset.

  • map_root (str | Path | None) – Path to NuPlan map data. Defaults to $NUPLAN_MAP_ROOT, falling back to ~/nuplan/maps.

  • split (str) – "train", "val", or "test". The mini split is used automatically when data_root / "mini" exists.

  • db_files (list[str] | None) – Explicit list of .db files. When None the builder auto-discovers files under data_root / split.

  • map_version (str) – Map version string, e.g. "nuplan-maps-v1.0".

  • planning_horizon (int) – Number of future steps at 10 Hz (default 80 = 8 s).

  • past_horizon (int) – Number of past steps at 10 Hz (default 20 = 2 s).

  • map_extent (Tuple[float, float]) – Raster crop half-extent in metres (width, height).

  • map_resolution (float) – Metres per pixel for the raster.

  • max_agents (int) – Maximum agents per sample; fewer are zero-padded.

  • limit_scenarios (int | None) – Cap on total scenarios (useful for prototyping).
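The auto-discovery described for db_files can be pictured with the sketch below. It is an illustration under stated assumptions, not the backend's implementation: `discover_db_files` is a hypothetical name, and the non-recursive `*.db` glob and sorted order are assumptions about behaviour this page does not specify.

```python
from pathlib import Path
from typing import List


def discover_db_files(data_root: Path, split: str) -> List[Path]:
    """List .db files directly under data_root / split, sorted by name.

    ASSUMPTION: discovery is non-recursive and sorted; the real builder
    may search subdirectories or order files differently.
    """
    split_dir = Path(data_root) / split
    if not split_dir.is_dir():
        return []
    return sorted(split_dir.glob("*.db"))


if __name__ == "__main__":
    root = Path("~/nuplan/dataset").expanduser()
    for db_path in discover_db_files(root, "mini"):
        print(db_path.name)
```

Passing an explicit list through db_files remains the way to pin a dataset to a known set of logs, for example to keep a prototype run reproducible.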

DataLoader factory#

world_models.datasets.nuplan.make_nuplan_dataloader(data_root=None, split='train', batch_size=32, num_workers=4, **dataset_kwargs)

Create a NuPlan DataLoader.

Parameters:
  • data_root (str | Path | None) – Root of the NuPlan dataset (default: $NUPLAN_DATA_ROOT, falling back to ~/nuplan/dataset).

  • split (str) – Dataset split.

  • batch_size (int) – Batch size.

  • num_workers (int) – Worker count for the DataLoader.

  • **dataset_kwargs (Any) – Extra arguments forwarded to NuPlanDataset.

Return type:

tuple of (NuPlanDataset, DataLoader)
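The training loop on this page reads each batch as a dict keyed by NuPlanSample field names. One way such batching could work is a custom collate function that stacks tensor fields and keeps non-tensor fields as lists. The sketch below is an assumption about the mechanism, not the library's actual collate function. `ToySample` is a reduced stand-in for NuPlanSample and `collate_samples` is a hypothetical name.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List

import torch
from torch import Tensor


@dataclass
class ToySample:
    """Reduced stand-in for NuPlanSample (hypothetical, three fields only)."""
    scenario_name: str
    ego_past: Tensor
    agents_mask: Tensor


def collate_samples(samples: List[Any]) -> Dict[str, Any]:
    """Stack tensor fields along a new batch axis; keep other fields as lists."""
    batch: Dict[str, Any] = {}
    for field in fields(samples[0]):
        values = [getattr(sample, field.name) for sample in samples]
        if isinstance(values[0], Tensor):
            batch[field.name] = torch.stack(values, dim=0)
        else:
            batch[field.name] = values
    return batch


if __name__ == "__main__":
    toy = [
        ToySample(f"scenario_{i}", torch.zeros(20, 6), torch.ones(32, dtype=torch.bool))
        for i in range(4)
    ]
    out = collate_samples(toy)
    print(out["ego_past"].shape, out["agents_mask"].shape, out["scenario_name"])
```

A function like this can be passed to `torch.utils.data.DataLoader` through its `collate_fn` argument when building a loader by hand instead of using the factory.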

Observation contract#

Each sample returned by NuPlanDataset.__getitem__ is a NuPlanSample dataclass. Its main tensor fields are:

  • map_raster: (3, H, W) float32 tensor. Channels 0, 1, and 2 are binary masks for drivable area, lane centre-lines, and crosswalks respectively. The crop is centred on the ego vehicle and oriented in the ego frame.

  • ego_past: (past_horizon, 6) float32 tensor with columns (x, y, yaw, vx, vy, yaw_rate).

  • ego_future: (planning_horizon, 2) float32 tensor of (x, y) waypoints relative to the last known ego position.

  • agents_past / agents_future: zero-padded float32 tensors of shape (max_agents, past_horizon, 6) and (max_agents, planning_horizon, 6).

  • agents_mask: (max_agents,) bool tensor. Use it to distinguish real agents from padding.

  • planning_target: alias of ego_future for supervised trajectory prediction.
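To make "relative to the last known ego position" and "oriented in the ego frame" concrete, the sketch below converts global (x, y) points into an ego-centred frame by translating to the ego position and rotating by the negative ego yaw. This page only says the waypoints are relative to the last ego position, so the rotation step is an assumption. `to_ego_frame` is a hypothetical helper, not part of the library.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def to_ego_frame(points: Iterable[Point], ego_x: float, ego_y: float, ego_yaw: float) -> List[Point]:
    """Express global points in a frame centred on the ego pose.

    ASSUMPTION: +x points along the ego heading and +y to the ego's left.
    """
    cos_yaw, sin_yaw = math.cos(ego_yaw), math.sin(ego_yaw)
    local: List[Point] = []
    for x, y in points:
        dx, dy = x - ego_x, y - ego_y
        # Rotate the offset by -ego_yaw.
        local.append((cos_yaw * dx + sin_yaw * dy, -sin_yaw * dx + cos_yaw * dy))
    return local


if __name__ == "__main__":
    # Ego at (10, 5) facing the global +y direction.
    print(to_ego_frame([(10.0, 7.0), (9.0, 5.0)], 10.0, 5.0, math.pi / 2))
```

In this example a point 2 m ahead of the ego maps to roughly (2, 0), and a point 1 m to its left maps to roughly (0, 1).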

Integration with world models#

The NuPlan dataset can be used to train world models that operate on structured driving scenarios. The map raster and agent trajectories serve as context inputs, while the future ego trajectory provides a regression target for the planning head. A typical workflow:

from world_models.datasets.nuplan import make_nuplan_dataloader

dataset, loader = make_nuplan_dataloader(split="train", batch_size=32)

# model, optimizer, and num_epochs are defined by the user.
for epoch in range(num_epochs):
    for batch in loader:
        # batch is a dict with keys matching the NuPlanSample fields.
        map_raster = batch["map_raster"]         # (B, 3, H, W)
        ego_past = batch["ego_past"]             # (B, T_past, 6)
        agents_past = batch["agents_past"]       # (B, N, T_past, 6)
        agents_mask = batch["agents_mask"]       # (B, N)
        planning_target = batch["planning_target"]  # (B, T_future, 2)

        # Forward through a world model that returns a scalar loss.
        optimizer.zero_grad()
        loss = model(map_raster, ego_past, agents_past, agents_mask, planning_target)
        loss.backward()
        optimizer.step()
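Inside the model, padded agent slots should not influence the context that feeds the planning head. One common pattern is a masked mean over the agent axis, sketched below. This is an illustrative technique rather than anything TorchWM prescribes: `masked_agent_mean` is a hypothetical helper, and the sketch assumes agents_mask is True for real agents, which this page does not state.

```python
import torch
from torch import Tensor


def masked_agent_mean(features: Tensor, agents_mask: Tensor) -> Tensor:
    """Average per-agent features over real agents only.

    features:    (B, N, D) float tensor, one feature vector per agent slot.
    agents_mask: (B, N) bool tensor, ASSUMED True for real agents.
    Returns a (B, D) tensor; rows with no real agents come out as zeros.
    """
    weights = agents_mask.unsqueeze(-1).to(features.dtype)  # (B, N, 1)
    total = (features * weights).sum(dim=1)                  # (B, D)
    count = weights.sum(dim=1).clamp(min=1.0)                # (B, 1)
    return total / count


if __name__ == "__main__":
    # Use the most recent past state of each agent as a simple feature.
    agents_past = torch.randn(2, 32, 20, 6)
    agents_mask = torch.zeros(2, 32, dtype=torch.bool)
    agents_mask[:, :5] = True
    pooled = masked_agent_mean(agents_past[:, :, -1, :], agents_mask)
    print(pooled.shape)
```

Clamping the count avoids a division by zero for scenes with no surrounding agents.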

Environment variables#

  • NUPLAN_DATA_ROOT — Path to the dataset root (default: ~/nuplan/dataset).

  • NUPLAN_MAP_ROOT — Path to the map data (default: ~/nuplan/maps).