cleaned code, improved readme
Thomas Hossler committed Oct 14, 2021
1 parent 513ebf8 commit 53c1eba
Showing 3 changed files with 29 additions and 24 deletions.
32 changes: 17 additions & 15 deletions README.md
@@ -2,7 +2,7 @@

## Data

For this project, we will be using data from the [Waymo Open dataset](https://waymo.com/open/). The files can be downloaded directly from the website as tar files or from the [Google Cloud Bucket](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files/) as individual tf records.

## Structure

@@ -44,14 +44,15 @@ In the classroom workspace, every library and package should already be installed

### Download and process the data

**Note:** This first step is already done for you in the classroom workspace. You can find the downloaded and processed files within the `/data/waymo/` directory (note that this is different from the `/home/workspace/data` directory you'll use for splitting).
**Note:** This first step is already done for you in the classroom workspace. You can find the downloaded and processed files within the `/data/waymo/` directory (note that this is different from the `/home/workspace/data` directory you'll use for splitting). If you are using the workspace, you can move directly to the next section (Exploratory Data Analysis).

The first goal of this project is to download the data from Waymo's Google Cloud bucket to your local machine. For this project, we only need a subset of the data provided (for example, we do not need to use the Lidar data). Therefore, we are going to download and immediately trim each file. In `download_process.py`, you can view the `create_tf_example` function, which performs this processing. This function takes the components of a Waymo Tf record and saves them in the Tf Object Detection API format. An example of such a function is described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). We are already providing the `label_map.pbtxt` file.
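For orientation, a minimal sketch of what such a conversion can look like is shown below. The function and argument names are made up for illustration; only the feature keys follow the Tf Object Detection API convention used in the linked tutorial.
```
import tensorflow as tf
from object_detection.utils import dataset_util


def create_tf_example_sketch(filename, encoded_jpeg, annotations, height, width):
    """Illustrative only: build a tf.train.Example in the Tf Object Detection API format.

    `annotations` is assumed to be a list of dicts with normalized box corners
    and an integer class id, e.g. {'xmin': 0.1, 'xmax': 0.3, 'ymin': 0.2, 'ymax': 0.5, 'class': 1}.
    """
    feature = {
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpeg),
        'image/format': dataset_util.bytes_feature(b'jpeg'),
        'image/object/bbox/xmin': dataset_util.float_list_feature([a['xmin'] for a in annotations]),
        'image/object/bbox/xmax': dataset_util.float_list_feature([a['xmax'] for a in annotations]),
        'image/object/bbox/ymin': dataset_util.float_list_feature([a['ymin'] for a in annotations]),
        'image/object/bbox/ymax': dataset_util.float_list_feature([a['ymax'] for a in annotations]),
        'image/object/class/label': dataset_util.int64_list_feature([a['class'] for a in annotations]),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```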

You can run the script using the following (you will need to add your desired directory names):
```
python download_process.py --data_dir {processed_file_location} --temp_dir {temp_dir_for_raw_files}
python download_process.py --data_dir {files location} --size {number of files to download}
```
**Note:** `--size` is not a required parameter. If it is not specified, the script will download 100 files. The `/data/waymo` folder already contains those 100 files.

You are downloading 100 files, so be patient! Once the script is done, you can look inside your `data_dir` folder to see whether the files have been downloaded and processed correctly.

@@ -60,23 +61,24 @@ You are downloading 100 files so be patient! Once the script is done, you can look inside your data_dir folder to see if the files have been downloaded and processed correctly.

Now that you have downloaded and processed the data, you should explore the dataset! This is the most important task of any machine learning project. To do so, open the `Exploratory Data Analysis` notebook. In this notebook, your first task will be to implement a `display_instances` function to display images and annotations using `matplotlib`. This should be very similar to the function you created during the course. Once you are done, feel free to spend more time exploring the data and report your findings. Report anything relevant about the dataset in the writeup.
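As a rough sketch of what such a helper might look like (the argument names, the normalized `[ymin, xmin, ymax, xmax]` box format, and the class-to-color mapping are all assumptions):
```
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def display_instances(image, boxes, classes, colormap=None):
    """Illustrative only: show an image with its bounding boxes drawn on top."""
    colormap = colormap or {1: 'red', 2: 'blue', 4: 'green'}  # assumed class ids
    height, width = image.shape[0], image.shape[1]
    _, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(image)
    for box, cls in zip(boxes, classes):
        ymin, xmin, ymax, xmax = box
        rect = patches.Rectangle(
            (xmin * width, ymin * height),   # lower-left corner in pixels
            (xmax - xmin) * width,           # box width in pixels
            (ymax - ymin) * height,          # box height in pixels
            linewidth=1,
            edgecolor=colormap.get(cls, 'yellow'),
            facecolor='none')
        ax.add_patch(rect)
    ax.axis('off')
    plt.show()
```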

Keep in mind that you should refer to this analysis to create the different splits (training, testing and validation).


### Create the splits

Now you have become one with the data! Congratulations! How will you use this knowledge to create the different splits: training, validation and testing? There is no single answer to this question, but you will need to justify your choice in your submission. You will need to implement the `split_data` function in the `create_splits.py` file. Once you have implemented this function, run it using:
Now you have become one with the data! Congratulations! How will you use this knowledge to create the different splits: training, validation and testing? There is no single correct answer to this question, but you will need to justify your choice in your submission. You will need to implement the `split_data` function in the `create_splits.py` file. Once you have implemented this function, run it using:
```
python create_splits.py --data_dir /home/workspace/data/
python create_splits.py --source /data/waymo/ --destination /home/workspace/data/
```

NOTE: Keep in mind that your storage is limited. The files should be <ins>moved</ins> and not copied.
**Note:** If you are using the workspace, you cannot **move** files from the `/data/waymo/` folder as this folder is **Read-Only**. Your code should copy the data from the source directory to the destination directory instead of moving it.
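For illustration only, a shuffle-and-copy split along these lines could look like the sketch below. The 80/10/10 ratios, the `*.tfrecord` glob pattern and the use of `shutil.copy` are assumptions, not requirements; the point is that the destination ends up with `train`, `val` and `test` sub folders.
```
import glob
import os
import random
import shutil


def split(source, destination):
    """Illustrative only: shuffle the processed records and copy them into train/val/test."""
    files = glob.glob(os.path.join(source, '*.tfrecord'))  # assumed file extension
    random.shuffle(files)

    n_train = int(0.8 * len(files))
    n_val = int(0.1 * len(files))
    splits = {
        'train': files[:n_train],
        'val': files[n_train:n_train + n_val],
        'test': files[n_train + n_val:],
    }

    for name, split_files in splits.items():
        split_dir = os.path.join(destination, name)
        os.makedirs(split_dir, exist_ok=True)
        for path in split_files:
            shutil.copy(path, split_dir)  # copy rather than move: the source is read-only
```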


### Edit the config file

Now you are ready for training. As explained during the course, the Tf Object Detection API relies on **config files**. The config that we will use for this project is `pipeline.config`, which is the config for a SSD Resnet 50 640x640 model. You can learn more about the Single Shot Detector [here](https://arxiv.org/pdf/1512.02325.pdf).

First, let's download the [pretrained model](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz) and move it to `training/pretrained-models/`.

Now we need to edit the config files to change the location of the training and validation files, as well as the location of the label_map file and the pretrained weights. We also need to adjust the batch size. To do so, run the following:
@@ -86,7 +88,7 @@ A new config file has been created, `pipeline_new.config`.
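For reference, a minimal sketch of what such a config edit can look like using the Tf Object Detection API protos is shown below. This is not the project's own edit script, and all paths and the batch size are placeholders:
```
import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# load the reference config
pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile('pipeline.config', 'r') as f:
    text_format.Merge(f.read(), pipeline)

# point it at the data splits, label map and pretrained weights (placeholder paths)
pipeline.train_config.batch_size = 2
pipeline.train_config.fine_tune_checkpoint = 'training/pretrained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0'
pipeline.train_input_reader.label_map_path = 'label_map.pbtxt'
pipeline.train_input_reader.tf_record_input_reader.input_path[:] = ['data/train/*.tfrecord']
pipeline.eval_input_reader[0].label_map_path = 'label_map.pbtxt'
pipeline.eval_input_reader[0].tf_record_input_reader.input_path[:] = ['data/val/*.tfrecord']

# write the updated config next to the original
with tf.io.gfile.GFile('pipeline_new.config', 'w') as f:
    f.write(text_format.MessageToString(pipeline))
```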

### Training

You will now launch your very first experiment with the Tensorflow object detection API. Create a folder `training/reference`. Move the `pipeline_new.config` to this folder. You will then have to launch two processes:
* a training process:
```
python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=training/reference/pipeline_new.config
```
@@ -98,15 +100,15 @@ python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=

NOTE: both processes will display some Tensorflow warnings.
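For reference, evaluation with the TF2 Object Detection API typically reuses `model_main_tf2.py` with an added `--checkpoint_dir` flag pointing at the training folder; this is a sketch, and the project's exact evaluation command may differ:
```
python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=training/reference/pipeline_new.config --checkpoint_dir=training/reference/
```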

To monitor the training, you can launch a TensorBoard instance by running `tensorboard --logdir=training`. You will report your findings in the writeup.

### Improve the performance

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the config file to improve this model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file contains the different data augmentation methods available in the Tf Object Detection API. To help you visualize these augmentations, we are providing a notebook: `Explore augmentations.ipynb`. Using this notebook, try different data augmentation combinations and select the one you think is optimal for our dataset. Justify your choices in the writeup.
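Augmentations are enabled by adding `data_augmentation_options` entries to the `train_config` section of the pipeline config. The two options below only illustrate the syntax and are not a recommended set:
```
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  random_adjust_brightness {
    max_delta: 0.2
  }
}
```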

Keep in mind that the following are also available:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc. (see the sketch after this list)
* experiment with the architecture. The Tf Object Detection API [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) offers many architectures. Keep in mind that the `pipeline.config` file is unique for each architecture and you will have to edit it.
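As a sketch of the first bullet, the optimizer settings live in the `train_config` section of `pipeline.config` and can be tuned there; the values below are placeholders, not recommendations:
```
optimizer {
  momentum_optimizer {
    learning_rate {
      cosine_decay_learning_rate {
        learning_rate_base: 0.04
        total_steps: 25000
        warmup_learning_rate: 0.013333
        warmup_steps: 2000
      }
    }
    momentum_optimizer_value: 0.9
  }
  use_moving_average: false
}
```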


### Creating an animation
@@ -135,10 +137,10 @@ This section should contain a quantitative and qualitative description of the da
#### Cross validation
This section should detail the cross validation strategy and justify your approach.

### Training
#### Reference experiment
This section should detail the results of the reference experiment. It should include training metrics and a detailed explanation of the algorithm's performance.

#### Improve on the reference
This section should highlight the different strategies you adopted to improve your model. It should contain relevant figures and details of your findings.

19 changes: 11 additions & 8 deletions create_splits.py
@@ -8,23 +8,26 @@
from utils import get_module_logger


def split(data_dir):
def split(source, destination):
"""
Create three splits from the processed records. The files should be moved to new folders in the
same directory. These folders should be named train, val and test.
args:
- data_dir [str]: data directory, /mnt/data
- source [str]: source data directory, contains the processed tf records
- destination [str]: destination data directory, contains 3 sub folders: train / val / test
"""
# TODO: Implement function


if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Split data into training / validation / testing')
parser.add_argument('--data_dir', required=True,
help='data directory')
parser.add_argument('--source', required=True,
help='source data directory')
parser.add_argument('--destination', required=True,
help='destination data directory')
args = parser.parse_args()

logger = get_module_logger(__name__)
logger.info('Creating splits...')
split(args.data_dir)
split(args.source, args.destination)
2 changes: 1 addition & 1 deletion download_process.py
@@ -134,7 +134,7 @@ def download_and_process(filename, data_dir):
process_tfr(local_path, data_dir)
# remove the original tf record to save space
logger.info(f'Deleting {local_path}')
# os.remove(local_path)
os.remove(local_path)


if __name__ == "__main__":
