Waymo Dataset Support (#21)
* Install waymo devkit

* Create dataset_specific folders for waymo dataset

* Implement functions for waymo

* More method implementations

* Format agent data

* Drop agents with only one detection for waymo

* More waymo map implementation

* Refactor and add parallelization to waymo data loading

* Add name check for waymo dataset

* Refactor cache_map

* Implement traffic light status extraction from waymo

* Revert accidental change

* Add tqdm to extract_vectorized function

* Iterate to get the max/min point in waymo map

* Set origin shifting to be 0

* Remove shallow copy

* Update

* Remove commented code and add linear interpolation

* Refactor waymo data loading

* Refactor waymo_dataset implementation

* Fix major bugs

* More debugging

* Add interpolation and remove hard-coded value

* Fix interpolation and first/last timesteps

* Fix interpolation

* Update agent data extraction

* Fix first/last_timestep

* Remove unnecessary lines

* Lots of quality of life improvements in preparation for merging (formatting, a bit of refactoring and speedups, parallelism for maps, etc). Remaining big todo is fixing map element boundaries...

* Small formatting post-merge.

* Initial solution for Waymo boundaries.

* Saving work quickly to try something else.

* We'll take it. Just remaining is the connectivity fix and then we can call it.

* Traffic light saving done.

* Small comment fix.

* Fixing lane connectivity, will evaluate it later with improved visualizations.

* Removing TODO comment.

* Lessened the number of hops in the map_api_example.

* Updating version, README, and DATASETS with Waymo info.

* Adding newline.

---------

Co-authored-by: songgua7 <[email protected]>
Co-authored-by: Boris Ivanovic <[email protected]>
Co-authored-by: Boris Ivanovic <[email protected]>
4 people authored Apr 22, 2023
1 parent 5a0567b commit 860908a
Showing 13 changed files with 1,010 additions and 14 deletions.
26 changes: 24 additions & 2 deletions DATASETS.md
@@ -1,7 +1,7 @@
# Supported Datasets and Required Formats

## nuScenes
Nothing special needs to be done for the nuScenes dataset; simply install it as per [the instructions in the devkit README](https://github.com/nutonomy/nuscenes-devkit#nuscenes-setup).
Nothing special needs to be done for the nuScenes dataset; simply download it as per [the instructions in the devkit README](https://github.com/nutonomy/nuscenes-devkit#nuscenes-setup).

It should look like this after downloading:
```
@@ -54,8 +54,30 @@ It should look like this after downloading:

**Note**: Not all dataset splits need to be downloaded. For example, you can download only the nuPlan Mini Split in case you only need a small sample dataset.

## Waymo Open Motion Dataset
Nothing special needs to be done for the Waymo Open Motion Dataset; simply download v1.1 as per [the instructions on the dataset website](https://waymo.com/intl/en_us/open/download/).

It should look like this after downloading:
```
/path/to/waymo/
├── training/
│   ├── training.tfrecord-00000-of-01000
│   ├── training.tfrecord-00001-of-01000
│   └── ...
├── validation/
│   ├── validation.tfrecord-00000-of-00150
│   ├── validation.tfrecord-00001-of-00150
│   └── ...
└── testing/
    ├── testing.tfrecord-00000-of-00150
    ├── testing.tfrecord-00001-of-00150
    └── ...
```

**Note**: Not all parts of the dataset need to be downloaded; only the necessary directories in [the Google Cloud Bucket](https://console.cloud.google.com/storage/browser/waymo_open_dataset_motion_v_1_1_0/uncompressed/scenario) are required (e.g., `validation` for the validation split).

## Lyft Level 5
Nothing special needs to be done for the Lyft Level 5 dataset; simply install it as per [the instructions on the dataset website](https://woven-planet.github.io/l5kit/dataset.html).
Nothing special needs to be done for the Lyft Level 5 dataset; simply download it as per [the instructions on the dataset website](https://woven-planet.github.io/l5kit/dataset.html).

It should look like this after downloading:
```
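As a quick sanity check (not part of this diff), the Waymo layout above can be verified with a short script; the root path below is a hypothetical placeholder:

```python
from pathlib import Path

# Hypothetical dataset root; point this at wherever the tfrecord shards were downloaded.
waymo_root = Path("/path/to/waymo")

# Count the tfrecord shards present in each split directory that was downloaded.
for split in ("training", "validation", "testing"):
    split_dir = waymo_root / split
    if split_dir.is_dir():
        shards = sorted(split_dir.glob(f"{split}.tfrecord-*"))
        print(f"{split}: {len(shards)} shards")
    else:
        print(f"{split}: not downloaded")
```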
14 changes: 10 additions & 4 deletions README.md
@@ -13,16 +13,19 @@ The easiest way to install trajdata is through PyPI with
pip install trajdata
```

In case you would also like to use datasets such as nuScenes and Lyft Level 5 (which require their own devkits to access raw data), the following will also install the respective devkits.
In case you would also like to use datasets such as nuScenes, Lyft Level 5, or the Waymo Open Motion Dataset (which require their own devkits to access raw data), the following will also install the respective devkits.
```sh
# For nuScenes
pip install "trajdata[nusc]"

# For Lyft
pip install "trajdata[lyft]"

# Both
pip install "trajdata[nusc,lyft]"
# For Waymo
pip install "trajdata[waymo]"

# All
pip install "trajdata[nusc,lyft,waymo]"
```
Then, download the raw datasets (nuScenes, Lyft Level 5, ETH/UCY, etc) in case you do not already have them. For more information about how to structure dataset folders/files, please see [`DATASETS.md`](./DATASETS.md).

@@ -84,9 +87,12 @@ Currently, the dataloader supports interfacing with the following datasets:
| Dataset | ID | Splits | Add'l Tags | Description | dt | Maps |
|---------|----|--------|------------|-------------|----|------|
| nuScenes Train/TrainVal/Val | `nusc_trainval` | `train`, `train_val`, `val` | `boston`, `singapore` | nuScenes prediction challenge training/validation/test splits (500/200/150 scenes) | 0.5s (2Hz) | :white_check_mark: |
| nuScenes Test | `nusc_test` | `test` | `boston`, `singapore` | nuScenes' test split, no annotations (150 scenes) | 0.5s (2Hz) | :white_check_mark: |
| nuScenes Test | `nusc_test` | `test` | `boston`, `singapore` | nuScenes test split, no annotations (150 scenes) | 0.5s (2Hz) | :white_check_mark: |
| nuScenes Mini | `nusc_mini` | `mini_train`, `mini_val` | `boston`, `singapore` | nuScenes mini training/validation splits (8/2 scenes) | 0.5s (2Hz) | :white_check_mark: |
| nuPlan Mini | `nuplan_mini` | `mini_train`, `mini_val`, `mini_test` | `boston`, `singapore`, `pittsburgh`, `las_vegas` | nuPlan mini training/validation/test splits (942/197/224 scenes) | 0.05s (20Hz) | :white_check_mark: |
| Waymo Open Motion Training | `waymo_train` | `train` | N/A | Waymo Open Motion Dataset `training` split | 0.1s (10Hz) | :white_check_mark: |
| Waymo Open Motion Validation | `waymo_val` | `val` | N/A | Waymo Open Motion Dataset `validation` split | 0.1s (10Hz) | :white_check_mark: |
| Waymo Open Motion Testing | `waymo_test` | `test` | N/A | Waymo Open Motion Dataset `testing` split | 0.1s (10Hz) | :white_check_mark: |
| Lyft Level 5 Train | `lyft_train` | `train` | `palo_alto` | Lyft Level 5 training data - part 1/2 (8.4 GB) | 0.1s (10Hz) | :white_check_mark: |
| Lyft Level 5 Train Full | `lyft_train_full` | `train` | `palo_alto` | Lyft Level 5 training data - part 2/2 (70 GB) | 0.1s (10Hz) | :white_check_mark: |
| Lyft Level 5 Validation | `lyft_val` | `val` | `palo_alto` | Lyft Level 5 validation data (8.2 GB) | 0.1s (10Hz) | :white_check_mark: |
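For context (not part of this diff), a minimal sketch of loading one of the new Waymo splits; the `UnifiedDataset` keywords follow the usage shown elsewhere in the README, and the data directory is a hypothetical placeholder:

```python
from trajdata import UnifiedDataset

# Load the Waymo Open Motion training split at its native 0.1s (10Hz) timestep.
dataset = UnifiedDataset(
    desired_data=["waymo_train"],
    desired_dt=0.1,
    data_dirs={"waymo_train": "/path/to/waymo"},  # hypothetical path
)
print(f"Loaded {len(dataset):,} agent-centric samples.")
```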
2 changes: 1 addition & 1 deletion examples/map_api_example.py
@@ -162,7 +162,7 @@ def main():
ax.imshow(map_img, alpha=0.5, origin="lower")
vec_map.visualize_lane_graph(
origin_lane=np.random.randint(0, len(vec_map.lanes)),
num_hops=10,
num_hops=5,
raster_from_world=raster_from_world,
ax=ax,
)
5 changes: 5 additions & 0 deletions requirements.txt
@@ -17,6 +17,11 @@ nuscenes-devkit==1.1.9
protobuf==3.19.4
l5kit==1.5.0

# Waymo devkit
tensorflow==2.11.0
waymo-open-dataset-tf-2-11-0
intervaltree

# Development
black
isort
6 changes: 5 additions & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = trajdata
version = 1.3.0
version = 1.3.1
author = Boris Ivanovic
author_email = [email protected]
description = A unified interface to many trajectory forecasting datasets.
@@ -48,3 +48,7 @@ nusc =
lyft =
protobuf==3.19.4
l5kit==1.5.0
waymo =
tensorflow==2.11.0
waymo-open-dataset-tf-2-11-0
intervaltree
2 changes: 1 addition & 1 deletion src/trajdata/caching/df_cache.py
@@ -757,7 +757,7 @@ def finalize_and_cache_map(
cache_path, vector_map.env_name, vector_map.map_name, raster_resolution
)

pbar_kwargs = {"position": 2, "leave": False}
pbar_kwargs = {"position": 2, "leave": False, "disable": True}
rasterized_map: RasterizedMap = raster_utils.rasterize_map(
vector_map, raster_resolution, **pbar_kwargs
)
4 changes: 2 additions & 2 deletions src/trajdata/dataset.py
@@ -46,7 +46,6 @@
scene_collate_fn,
)
from trajdata.dataset_specific import RawDataset
from trajdata.maps import VectorMap
from trajdata.maps.map_api import MapAPI
from trajdata.parallel import ParallelDatasetPreprocessor, scene_paths_collate_fn
from trajdata.utils import agent_utils, env_utils, scene_utils, string_utils
@@ -187,7 +186,8 @@ def __init__(
self.raster_map_params = (
raster_map_params
if raster_map_params is not None
else {"px_per_m": DEFAULT_PX_PER_M}
# Allowing for parallel map processing in case the user specifies num_workers.
else {"px_per_m": DEFAULT_PX_PER_M, "num_workers": num_workers}
)
self.incl_vector_map = incl_vector_map
self.vector_map_params = (
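To illustrate the comment above (a sketch, not part of this diff): when a user passes `num_workers` but no explicit `raster_map_params`, the default now carries that worker count into map rasterization. Keyword names other than those visible in the diff are assumptions:

```python
from trajdata import UnifiedDataset

# num_workers is forwarded into the default raster_map_params, so rasterized map
# preprocessing can run in parallel without any extra configuration.
dataset = UnifiedDataset(
    desired_data=["waymo_val"],
    data_dirs={"waymo_val": "/path/to/waymo"},  # hypothetical path
    incl_raster_map=True,  # assumed flag name for enabling rasterized maps
    num_workers=8,
)
```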
6 changes: 6 additions & 0 deletions src/trajdata/dataset_specific/scene_records.py
@@ -23,6 +23,12 @@ class LyftSceneRecord(NamedTuple):
data_idx: int


class WaymoSceneRecord(NamedTuple):
name: str
length: str
data_idx: int


class NuPlanSceneRecord(NamedTuple):
name: str
location: str
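A brief usage sketch of the new record (values are hypothetical); as with `LyftSceneRecord` above, the scene length is stored as a string:

```python
from trajdata.dataset_specific.scene_records import WaymoSceneRecord

# Hypothetical values, for illustration only.
record = WaymoSceneRecord(name="training.tfrecord-00000-of-01000_0", length="91", data_idx=0)
print(record.name, record.length, record.data_idx)
```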
1 change: 1 addition & 0 deletions src/trajdata/dataset_specific/waymo/__init__.py
@@ -0,0 +1 @@
from .waymo_dataset import WaymoDataset