cleaned code, improved readme
Thomas Hossler committed Oct 14, 2021
1 parent 513ebf8 commit 53c1eba
Showing 3 changed files with 29 additions and 24 deletions.
32 changes: 17 additions & 15 deletions README.md
@@ -2,7 +2,7 @@

## Data

For this project, we will be using data from the [Waymo Open dataset](https://waymo.com/open/). The files can be downloaded directly from the website as tar files or from the [Google Cloud Bucket](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files/) as individual tf records.

## Structure

@@ -44,14 +44,15 @@ In the classroom workspace, every library and package should already be installed

### Download and process the data

**Note:** This first step is already done for you in the classroom workspace. You can find the downloaded and processed files within the `/data/waymo/` directory (note that this is different from the `/home/workspace/data` directory you'll use for splitting).
**Note:** This first step is already done for you in the classroom workspace. You can find the downloaded and processed files within the `/data/waymo/` directory (note that this is different from the `/home/workspace/data` directory you'll use for splitting). If you are using the workspace, you can move directly to the next section (Exploratory Data Analysis).

The first goal of this project is to download the data from Waymo's Google Cloud bucket to your local machine. For this project, we only need a subset of the data provided (for example, we do not need to use the Lidar data). Therefore, we are going to download and immediately trim each file. In `download_process.py`, you can view the `create_tf_example` function, which performs this processing. This function takes the components of a Waymo Tf record and saves them in the Tf Object Detection API format. An example of such a function is described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). We are already providing the `label_map.pbtxt` file.
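For orientation, a minimal sketch of what such a conversion can look like is shown below. The function and argument names are made up for illustration; only the feature keys follow the Tf Object Detection API convention used in the linked tutorial.
```
import tensorflow as tf
from object_detection.utils import dataset_util


def create_tf_example_sketch(filename, encoded_jpeg, annotations, height, width):
    """Illustrative only: build a tf.train.Example in the Tf Object Detection API format.

    `annotations` is assumed to be a list of dicts with normalized box corners
    and an integer class id, e.g. {'xmin': 0.1, 'xmax': 0.3, 'ymin': 0.2, 'ymax': 0.5, 'class': 1}.
    """
    feature = {
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpeg),
        'image/format': dataset_util.bytes_feature(b'jpeg'),
        'image/object/bbox/xmin': dataset_util.float_list_feature([a['xmin'] for a in annotations]),
        'image/object/bbox/xmax': dataset_util.float_list_feature([a['xmax'] for a in annotations]),
        'image/object/bbox/ymin': dataset_util.float_list_feature([a['ymin'] for a in annotations]),
        'image/object/bbox/ymax': dataset_util.float_list_feature([a['ymax'] for a in annotations]),
        'image/object/class/label': dataset_util.int64_list_feature([a['class'] for a in annotations]),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```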

You can run the script using the following (you will need to add your desired directory names):
```
python download_process.py --data_dir {processed_file_location} --temp_dir {temp_dir_for_raw_files}
python download_process.py --data_dir {files location} --size {number of files to download}
```
**Note:** `--size` is not a required parameter. If it is not specified, the script will download 100 files. The `/data/waymo` folder already contains those 100 files.

You are downloading 100 files, so be patient! Once the script is done, you can look inside your `data_dir` folder to see whether the files have been downloaded and processed correctly.

@@ -60,23 +61,24 @@ You are downloading 100 files so be patient! Once the script is done, you can look inside your data_dir folder to see if the files have been downloaded and processed correctly.

Now that you have downloaded and processed the data, you should explore the dataset! This is the most important task of any machine learning project. To do so, open the `Exploratory Data Analysis` notebook. In this notebook, your first task will be to implement a `display_instances` function to display images and annotations using `matplotlib`. This should be very similar to the function you created during the course. Once you are done, feel free to spend more time exploring the data and report your findings. Report anything relevant about the dataset in the writeup.
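As a rough sketch of what such a helper might look like (the argument names, the normalized `[ymin, xmin, ymax, xmax]` box format, and the class-to-color mapping are all assumptions):
```
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def display_instances(image, boxes, classes, colormap=None):
    """Illustrative only: show an image with its bounding boxes drawn on top."""
    colormap = colormap or {1: 'red', 2: 'blue', 4: 'green'}  # assumed class ids
    height, width = image.shape[0], image.shape[1]
    _, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(image)
    for box, cls in zip(boxes, classes):
        ymin, xmin, ymax, xmax = box
        rect = patches.Rectangle(
            (xmin * width, ymin * height),   # lower-left corner in pixels
            (xmax - xmin) * width,           # box width in pixels
            (ymax - ymin) * height,          # box height in pixels
            linewidth=1,
            edgecolor=colormap.get(cls, 'yellow'),
            facecolor='none')
        ax.add_patch(rect)
    ax.axis('off')
    plt.show()
```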

Keep in mind that you should refer to this analysis to create the different splits (training, testing and validation).


### Create the splits

Now you have become one with the data! Congratulations! How will you use this knowledge to create the different splits: training, validation and testing? There is no single answer to this question, but you will need to justify your choice in your submission. You will need to implement the `split_data` function in the `create_splits.py` file. Once you have implemented this function, run it using:
Now you have become one with the data! Congratulations! How will you use this knowledge to create the different splits: training, validation and testing? There is no single correct answer to this question, but you will need to justify your choice in your submission. You will need to implement the `split_data` function in the `create_splits.py` file. Once you have implemented this function, run it using:
```
python create_splits.py --data_dir /home/workspace/data/
python create_splits.py --source /data/waymo/ --destination /home/workspace/data/
```

NOTE: Keep in mind that your storage is limited. The files should be <ins>moved</ins> and not copied.
**Note:** If you are using the workspace, you cannot **move** files from the `/data/waymo/` folder as this folder is **Read-Only**. Your code should copy the data from the source directory to the destination directory instead of moving it.
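For illustration only, a shuffle-and-copy split along these lines could look like the sketch below. The 80/10/10 ratios, the `*.tfrecord` glob pattern and the use of `shutil.copy` are assumptions, not requirements; the point is that the destination ends up with `train`, `val` and `test` sub folders.
```
import glob
import os
import random
import shutil


def split(source, destination):
    """Illustrative only: shuffle the processed records and copy them into train/val/test."""
    files = glob.glob(os.path.join(source, '*.tfrecord'))  # assumed file extension
    random.shuffle(files)

    n_train = int(0.8 * len(files))
    n_val = int(0.1 * len(files))
    splits = {
        'train': files[:n_train],
        'val': files[n_train:n_train + n_val],
        'test': files[n_train + n_val:],
    }

    for name, split_files in splits.items():
        split_dir = os.path.join(destination, name)
        os.makedirs(split_dir, exist_ok=True)
        for path in split_files:
            shutil.copy(path, split_dir)  # copy rather than move: the source is read-only
```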


### Edit the config file

Now you are ready for training. As explained during the course, the Tf Object Detection API relies on **config files**. The config that we will use for this project is `pipeline.config`, which is the config for a SSD Resnet 50 640x640 model. You can learn more about the Single Shot Detector [here](https://arxiv.org/pdf/1512.02325.pdf).

First, let's download the [pretrained model](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz) and move it to `training/pretrained-models/`.

Now we need to edit the config files to change the location of the training and validation files, as well as the location of the label_map file and the pretrained weights. We also need to adjust the batch size. To do so, run the following:
@@ -86,7 +88,7 @@ A new config file has been created, `pipeline_new.config`.
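For reference, a minimal sketch of what such a config edit can look like using the Tf Object Detection API protos is shown below. This is not the project's own edit script, and all paths and the batch size are placeholders:
```
import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# load the reference config
pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile('pipeline.config', 'r') as f:
    text_format.Merge(f.read(), pipeline)

# point it at the data splits, label map and pretrained weights (placeholder paths)
pipeline.train_config.batch_size = 2
pipeline.train_config.fine_tune_checkpoint = 'training/pretrained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0'
pipeline.train_input_reader.label_map_path = 'label_map.pbtxt'
pipeline.train_input_reader.tf_record_input_reader.input_path[:] = ['data/train/*.tfrecord']
pipeline.eval_input_reader[0].label_map_path = 'label_map.pbtxt'
pipeline.eval_input_reader[0].tf_record_input_reader.input_path[:] = ['data/val/*.tfrecord']

# write the updated config next to the original
with tf.io.gfile.GFile('pipeline_new.config', 'w') as f:
    f.write(text_format.MessageToString(pipeline))
```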

### Training

You will now launch your very first experiment with the Tensorflow object detection API. Create a folder `training/reference`. Move the `pipeline_new.config` to this folder. You will then have to launch two processes:
* a training process:
```
python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=training/reference/pipeline_new.config
```
@@ -98,15 +100,15 @@ python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=

NOTE: both processes will display some Tensorflow warnings.
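For reference, evaluation with the TF2 Object Detection API typically reuses `model_main_tf2.py` with an added `--checkpoint_dir` flag pointing at the training folder; this is a sketch, and the project's exact evaluation command may differ:
```
python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=training/reference/pipeline_new.config --checkpoint_dir=training/reference/
```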

To monitor the training, you can launch a TensorBoard instance by running `tensorboard --logdir=training`. You will report your findings in the writeup.

### Improve the performance

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the config file to improve this model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file contains the different data augmentation methods available in the Tf Object Detection API. To help you visualize these augmentations, we are providing a notebook: `Explore augmentations.ipynb`. Using this notebook, try different data augmentation combinations and select the one you think is optimal for our dataset. Justify your choices in the writeup.
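Augmentations are enabled by adding `data_augmentation_options` entries to the `train_config` section of the pipeline config. The two options below only illustrate the syntax and are not a recommended set:
```
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  random_adjust_brightness {
    max_delta: 0.2
  }
}
```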

Keep in mind that the following are also available:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc. (see the sketch after this list)
* experiment with the architecture. The Tf Object Detection API [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) offers many architectures. Keep in mind that the `pipeline.config` file is unique for each architecture and you will have to edit it.
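As a sketch of the first bullet, the optimizer settings live in the `train_config` section of `pipeline.config` and can be tuned there; the values below are placeholders, not recommendations:
```
optimizer {
  momentum_optimizer {
    learning_rate {
      cosine_decay_learning_rate {
        learning_rate_base: 0.04
        total_steps: 25000
        warmup_learning_rate: 0.013333
        warmup_steps: 2000
      }
    }
    momentum_optimizer_value: 0.9
  }
  use_moving_average: false
}
```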


### Creating an animation
@@ -135,10 +137,10 @@ This section should contain a quantitative and qualitative description of the da
#### Cross validation
This section should detail the cross validation strategy and justify your approach.

### Training
#### Reference experiment
This section should detail the results of the reference experiment. It should include training metrics and a detailed explanation of the algorithm's performance.

#### Improve on the reference
This section should highlight the different strategies you adopted to improve your model. It should contain relevant figures and details of your findings.

19 changes: 11 additions & 8 deletions create_splits.py
@@ -8,23 +8,26 @@
from utils import get_module_logger


def split(data_dir):
def split(source, destination):
"""
Create three splits from the processed records. The files should be moved to new folders in the
same directory. These folders should be named train, val and test.
args:
- data_dir [str]: data directory, /mnt/data
- source [str]: source data directory, contains the processed tf records
- destination [str]: destination data directory, contains 3 sub folders: train / val / test
"""
# TODO: Implement function


if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Split data into training / validation / testing')
parser.add_argument('--data_dir', required=True,
help='data directory')
parser.add_argument('--source', required=True,
help='source data directory')
parser.add_argument('--destination', required=True,
help='destination data directory')
args = parser.parse_args()

logger = get_module_logger(__name__)
logger.info('Creating splits...')
split(args.data_dir)
split(args.source, args.destination)
2 changes: 1 addition & 1 deletion download_process.py
@@ -134,7 +134,7 @@ def download_and_process(filename, data_dir):
process_tfr(local_path, data_dir)
# remove the original tf record to save space
logger.info(f'Deleting {local_path}')
# os.remove(local_path)
os.remove(local_path)


if __name__ == "__main__":
