First commit

JunshengFu · Mar 21, 2017 · 23a37d7

Showing 40 changed files with 1,848 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,197 @@
+# **Vehicle Detection for Autonomous Driving**
+
+## Objective
+
+#### A demo of a vehicle detection system: a monocular camera is used for detecting vehicles.
+
+
+#### [**(1) Highway Drive (with Lane Departure Warning)**](https://youtu.be/Brh9-uab7Qs)
+
+[![gif_demo1][demo1_gif]](https://youtu.be/Brh9-uab7Qs)
+
+#### [**(2) City Drive (Vehicle Detection only)**](https://youtu.be/2wOxK86LcaM)
+[![gif_demo2][demo2_gif]](https://youtu.be/2wOxK86LcaM)
+
+---
+
+### Code & Files
+
+#### 1. My project includes the following files
+* [main.py](main.py) is the main code for the demos
+* [svn_pipeline.py](svn_pipeline.py) is the car detection pipeline with a linear SVM
+* [yolo_pipeline.py](yolo_pipeline.py) is the car detection pipeline with a deep net [YOLO (You Only Look Once)](https://arxiv.org/pdf/1506.02640.pdf)
+* [visualization.py](visualization.py) contains the code for adding visualizations
+* [README.md](README.md) summarizes the results
+
+---
+The other files are the same as in the [Lane Departure Warning System](https://github.com/JunshengFu/autonomous-driving-lane-departure-warning) repository:
+* [calibration.py](calibration.py) contains the script to calibrate the camera and save the calibration results
+* [lane.py](lane.py) contains the lane class
+* [examples](examples) folder contains the sample images and videos
+
+
+#### 2. Dependencies & my environment
+
+Anaconda is used for managing my [**dependencies**](https://github.com/udacity/CarND-Term1-Starter-Kit).
+
+* OpenCV 3, Python 3.5, TensorFlow, CUDA 8
+* OS: Ubuntu 16.04 (should work on other platforms too)
+
+#### 3. How to run the code
+
+(1) Download the weights for YOLO
+
+You can download the weights from [here](https://drive.google.com/open?id=0B5WIzrIVeL0WS3N2VklTVmstelE) and save them to
+the [weights](weights) folder.
+
+(2) If you want to run the demo, you can simply run:
+```sh
+python main.py
+```
+---
+
+### **Two approaches: Linear SVM vs Neural Network**
+
+### 1. Linear SVM Approach
+`svn_pipeline.py` contains the code for the SVM pipeline.
+
+**Steps:**
+
+* Perform Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a linear SVM classifier.
+* Apply a color transform to the image and append binned color features, as well as histograms of color, to the HOG feature vector.
+* Normalize the features and randomize a selection for training and testing.
+* Implement a sliding-window technique and use the SVM classifier to search for vehicles in images.
+* Run the pipeline on a video stream and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
+* Estimate a bounding box for each detected vehicle.
+
+[//]: # (Image References)
+[image1]: ./examples/car_not_car.png
+[image2]: ./examples/hog_1.png
+[image2-1]: ./examples/hog_2.png
+[image3]: ./examples/search_windows.png
+[image4]: ./examples/heat_map1.png
+[image5]: ./examples/heat_map2.png
+[image6]: ./examples/labels_map.png
+[image7]: ./examples/svn_1.png
+[image8]: ./examples/yolo_1.png
+[image_yolo1]: ./examples/yolo1.png
+[image_yolo2]: ./examples/yolo2.png
+[video1]: ./examples/project_video.mp4
+[demo1_gif]: ./examples/demo1.gif
+[demo2_gif]: ./examples/demo2.gif
+
+#### 1.1 Extract Histogram of Oriented Gradients (HOG) from training images
+The code for this step is contained in the function named `extract_features` and in lines 464 to 552 of `svn_pipeline.py`.
+If a trained SVM classifier already exists, it is loaded directly.
+
+Otherwise, I started by reading in all the `vehicle` and `non-vehicle` images, around 8000 images in each category. These datasets comprise
+images taken from the [GTI vehicle image database](http://www.gti.ssr.upm.es/data/Vehicle_database.html) and the
+[KITTI vision benchmark suite](http://www.cvlibs.net/datasets/kitti/).
+Here is an example of one of each of the `vehicle` and `non-vehicle` classes:
+
+![alt text][image1]
+
+
+I then explored different color spaces and different `skimage.hog()` parameters (`orientations`, `pixels_per_cell`, and `cells_per_block`).  I grabbed random images from each of the two classes and displayed them to get a feel for what the `skimage.hog()` output looks like.
+
+Here is an example using the `RGB` color space and HOG parameters of `orientations=9`, `pixels_per_cell=(8, 8)` and `cells_per_block=(2, 2)`:
+
+![alt text][image2]
+![alt text][image2-1]
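To make the `orientations` parameter concrete, here is a simplified pure-Python sketch of what happens inside a single HOG cell: each pixel's gradient votes, weighted by its magnitude, into one of 9 unsigned-orientation bins spanning 0 to 180 degrees. This is an illustration, not the `skimage.hog()` implementation: it uses plain central differences clamped at the cell border, does no vote interpolation, and omits the block normalization controlled by `cells_per_block`. The function name `cell_orientation_histogram` is hypothetical.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)
    for one cell, given as a list of rows of grey values."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(h):
        for x in range(w):
            # central differences, clamped at the cell border
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # fold opposite directions together: unsigned orientation
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += mag
    return hist


# A vertical edge: dark left half, bright right half of an 8x8 cell.
edge = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(8)]
print(cell_orientation_histogram(edge))  # all weight lands in the 0-degree bin
```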
+
+To optimize the HOG extraction, I **extract the HOG features for the entire image only once**. The entire HOG array
+is then saved for further processing (see lines 319 to 321 in `svn_pipeline.py`).
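The compute-once idea can be sketched as index arithmetic on a HOG block array computed for the whole search region: each window's features become a slice of that array instead of a fresh `skimage.hog()` call. This sketch assumes a 64x64 window, `pix_per_cell=8`, `cell_per_block=2`, and a step of 2 cells (which gives the 75% overlap mentioned in section 1.4); the function `hog_window_positions` is hypothetical and not taken from `svn_pipeline.py`.

```python
def hog_window_positions(region_w, region_h, pix_per_cell=8, cell_per_block=2,
                         window=64, cells_per_step=2):
    """Yield (ypos, xpos, ytop, xleft) for every search window.

    ypos/xpos index into a HOG block array computed once for the region, e.g.
    hog_array[ypos:ypos + n, xpos:xpos + n] with n blocks per window;
    ytop/xleft are the window's top-left corner in region pixels.
    """
    nxblocks = region_w // pix_per_cell - cell_per_block + 1
    nyblocks = region_h // pix_per_cell - cell_per_block + 1
    blocks_per_window = window // pix_per_cell - cell_per_block + 1
    nxsteps = (nxblocks - blocks_per_window) // cells_per_step + 1
    nysteps = (nyblocks - blocks_per_window) // cells_per_step + 1
    for yb in range(nysteps):
        for xb in range(nxsteps):
            ypos, xpos = yb * cells_per_step, xb * cells_per_step
            yield ypos, xpos, ypos * pix_per_cell, xpos * pix_per_cell


# Search region assumed to be 1280 px wide and 256 px high (y from 400 to 656).
positions = list(hog_window_positions(1280, 256))
print(len(positions))  # 1001 windows, all sharing one HOG computation
```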
+
+#### 1.2 Final choices of HOG parameters, spatial features and color histograms
+
+I tried various combinations of parameters and chose the final combination as follows
+(see lines 16-27 in `svn_pipeline.py`):
+* `YCrCb` color space
+* orient = 9  # HOG orientations
+* pix_per_cell = 8 # HOG pixels per cell
+* cell_per_block = 2 # HOG cells per block, which can handle e.g. shadows
+* hog_channel = "ALL" # Can be 0, 1, 2, or "ALL"
+* spatial_size = (32, 32) # Spatial binning dimensions
+* hist_bins = 32    # Number of histogram bins
+* spatial_feat = True # Spatial features on or off
+* hist_feat = True # Histogram features on or off
+* hog_feat = True # HOG features on or off
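With these settings, the length of each feature vector follows directly from the parameters. The sketch below assumes 64x64, 3-channel training images (the size of the smaller search window) and `skimage.hog()`-style blocks that slide one cell at a time; the helper `feature_vector_length` is hypothetical.

```python
def feature_vector_length(window=64, n_channels=3, orient=9, pix_per_cell=8,
                          cell_per_block=2, hog_channel="ALL",
                          spatial_size=(32, 32), hist_bins=32,
                          spatial_feat=True, hist_feat=True, hog_feat=True):
    """Number of features produced for one window with the given settings."""
    length = 0
    if spatial_feat:
        # raw pixels after resizing to spatial_size, for every channel
        length += spatial_size[0] * spatial_size[1] * n_channels
    if hist_feat:
        # one histogram of hist_bins per channel
        length += hist_bins * n_channels
    if hog_feat:
        # blocks overlap and step by one cell: 64 / 8 - 2 + 1 = 7 per axis
        blocks = window // pix_per_cell - cell_per_block + 1
        per_channel = blocks * blocks * cell_per_block * cell_per_block * orient
        length += per_channel * (n_channels if hog_channel == "ALL" else 1)
    return length


print(feature_vector_length())  # 3072 + 96 + 3 * 1764 = 8460
```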
+
+All the features are **normalized** in lines 511 to 513 of `svn_pipeline.py`. This is a critical step: otherwise, the classifier
+may be biased toward the features with larger numeric ranges.
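Per-feature standardization (zero mean and unit variance for each column) is the usual way to do this; scikit-learn's `StandardScaler` implements it. The README does not show the code in lines 511 to 513, so the pure-Python sketch below, with the hypothetical names `fit_scaler` and `apply_scaler`, only illustrates the idea.

```python
import statistics


def fit_scaler(rows):
    """Per-column mean and population standard deviation of a feature matrix."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # a constant column would give std 0; fall back to 1 to avoid dividing by zero
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardize every row with the statistics learned by fit_scaler()."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Column 0 looks like a small HOG value, column 1 like a raw pixel intensity.
train = [[0.1, 50.0], [0.3, 250.0]]
means, stds = fit_scaler(train)
print(apply_scaler(train, means, stds))  # both columns now on the same scale
```

The scaler should be fitted on the training set only, and the same statistics reused for the test set and for every search window at detection time.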
+#### 1.3 How to train a classifier
+I randomly select 20% of the images for testing and use the rest for training; a linear SVM is used as the classifier (see lines
+520 to 531 in `svn_pipeline.py`).
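A shuffled 80/20 split can be sketched in a few lines. scikit-learn's `train_test_split(..., test_size=0.2)` does the same job, and `LinearSVC` is a common linear SVM class, but the README does not say which calls `svn_pipeline.py` uses; `random_split` below is a hypothetical helper.

```python
import random


def random_split(features, labels, test_size=0.2, seed=None):
    """Shuffle (feature, label) pairs together and split them into train/test sets."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return (take(features, train_idx), take(features, test_idx),
            take(labels, train_idx), take(labels, test_idx))


# Hypothetical data: 8 "car" and 8 "not car" feature vectors.
X = [[float(i)] for i in range(16)]
y = [1] * 8 + [0] * 8
X_train, X_test, y_train, y_test = random_split(X, y, seed=0)
print(len(X_train), len(X_test))  # 13 3
```

Shuffling features and labels through one shared index list keeps every label aligned with its feature vector.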
+
+#### 1.4 Sliding Window Search
+For this SVM-based approach, I use two scales of the search window (64x64 and 128x128, see line 41) and search only between
+y = 400 and y = 656 (see line 32 in `svn_pipeline.py`). I choose 75% overlap for the search windows at each scale (see
+line 314 in `svn_pipeline.py`).
+
+For every window, the SVM classifier is used to predict whether it contains a car or not. If so, the window is saved (see
+lines 361 to 366 in `svn_pipeline.py`). In the end, a list of windows containing detected cars is obtained.
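The window grid for both scales can be sketched as follows. It assumes a 1280-pixel-wide frame (the resolution mentioned in the Discussion) and derives the step from the overlap, so 75% overlap means a step of 16 px for 64x64 windows and 32 px for 128x128 windows; `slide_window` is a hypothetical helper, not the function in `svn_pipeline.py`.

```python
def slide_window(img_w, y_start, y_stop, size, overlap=0.75):
    """Return ((x1, y1), (x2, y2)) windows of size x size inside [0, img_w) x [y_start, y_stop)."""
    step = int(size * (1 - overlap))
    windows = []
    for y in range(y_start, y_stop - size + 1, step):
        for x in range(0, img_w - size + 1, step):
            windows.append(((x, y), (x + size, y + size)))
    return windows


small = slide_window(1280, 400, 656, 64)    # 13 rows x 77 columns
large = slide_window(1280, 400, 656, 128)   # 5 rows x 37 columns
print(len(small), len(large))  # 1001 185
```

Under these assumptions each frame needs 1,186 classifier calls, which gives a feel for why the sliding-window search is slow.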
+
+![alt text][image3]
+
+#### 1.5 Create a heat map of detected vehicles
+After obtaining a list of windows that may contain cars, a function named `generate_heatmap` (line 565 of
+`svn_pipeline.py`) is used to generate a heatmap. A threshold is then applied to filter out false positives.
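The heatmap step amounts to counting, for every pixel, how many positive windows cover it, and then zeroing pixels at or below a threshold. The sketch below uses nested lists and the hypothetical helpers `add_heat` and `apply_threshold`; the actual `generate_heatmap` signature and threshold value are not given in the README.

```python
def add_heat(height, width, boxes):
    """Heatmap where each pixel counts the detection boxes ((x1, y1), (x2, y2)) covering it."""
    heat = [[0] * width for _ in range(height)]
    for (x1, y1), (x2, y2) in boxes:
        for y in range(y1, y2):
            for x in range(x1, x2):
                heat[y][x] += 1
    return heat


def apply_threshold(heat, threshold):
    """Zero out pixels covered by `threshold` boxes or fewer (likely false positives)."""
    return [[v if v > threshold else 0 for v in row] for row in heat]


# Two overlapping detections of one car and one isolated (false) detection.
boxes = [((0, 0), (4, 4)), ((2, 2), (6, 6)), ((8, 8), (10, 10))]
heat = apply_threshold(add_heat(10, 10, boxes), threshold=1)
print(sum(v > 0 for row in heat for v in row))  # 4: only the 2x2 overlap survives
```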
+
+![heatmap][image4]
+![heatmap][image5]
+
+#### 1.6 Image vs Video implementation
+**For images**, we can directly use the result from the filtered heatmap to create a bounding box for each detected
+vehicle.
+
+**For video**, we can further use neighbouring frames to filter out false positives, as well as to smooth
+the positions of the bounding boxes.
+* Accumulate the heatmaps of the N previous frames.
+* Apply weights to the N previous frames: smaller weights for older frames (lines 398 to 399 in `svn_pipeline.py`).
+* I then apply a threshold and use `scipy.ndimage.measurements.label()` to identify individual blobs in the heatmap.
+* I then assume each blob corresponds to a vehicle and construct bounding boxes to cover the area of each detected blob.
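The video steps can be sketched end to end in pure Python. Two assumptions are worth flagging: the weights grow linearly from the oldest to the newest frame (the README only says older frames get smaller weights), and `label_blobs` is a small stand-in for `scipy.ndimage.measurements.label()` that uses the same default 4-connectivity. `HeatHistory`, `label_blobs` and `blob_boxes` are hypothetical names.

```python
from collections import deque


class HeatHistory:
    """Weighted average of the heatmaps of the last n_frames frames."""

    def __init__(self, n_frames=5):
        self.frames = deque(maxlen=n_frames)  # the oldest frame drops out automatically

    def update(self, heat):
        self.frames.append(heat)
        weights = list(range(1, len(self.frames) + 1))  # oldest -> 1, newest -> n
        total = sum(weights)
        h, w = len(heat), len(heat[0])
        combined = [[0.0] * w for _ in range(h)]
        for weight, frame in zip(weights, self.frames):
            for y in range(h):
                for x in range(w):
                    combined[y][x] += weight * frame[y][x] / total
        return combined


def label_blobs(mask):
    """4-connected component labelling; returns (labels, count) like scipy's label()."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                count += 1
                labels[sy][sx] = count
                stack = [(sy, sx)]
                while stack:  # flood fill the new blob
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


def blob_boxes(labels, count):
    """One ((x_min, y_min), (x_max, y_max)) box per labelled blob, i.e. per vehicle."""
    boxes = []
    for car in range(1, count + 1):
        pixels = [(x, y) for y, row in enumerate(labels) for x, v in enumerate(row) if v == car]
        xs, ys = [p[0] for p in pixels], [p[1] for p in pixels]
        boxes.append(((min(xs), min(ys)), (max(xs), max(ys))))
    return boxes


history = HeatHistory(n_frames=3)
for heat in ([[0, 2, 0, 0], [0, 2, 0, 3]], [[0, 2, 0, 0], [0, 2, 0, 3]]):
    smoothed = history.update(heat)
mask = [[v > 1 for v in row] for row in smoothed]
labels, count = label_blobs(mask)
print(count, blob_boxes(labels, count))
```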
+
+
+#### Example of test image
+
+![alt text][image7]
+
+---
+
+
+### 2. Neural Network Approach (YOLO)
+`yolo_pipeline.py` contains the code for the YOLO pipeline.
+
+[YOLO](https://arxiv.org/pdf/1506.02640.pdf) is an object detection pipeline based on a neural network. In contrast to prior work on object detection, which repurposes classifiers
+to perform detection, YOLO frames object detection as a regression problem to spatially separated bounding boxes and
+associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from
+full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end
+directly on detection performance.
+
+![alt text][image_yolo2]
+
+Steps to use YOLO for detection:
+* resize the input image to 448x448
+* run a single convolutional network on the image
+* threshold the resulting detections by the model’s confidence
+
+![alt text][image_yolo1]
+
+`yolo_pipeline.py` is adapted from this [tensorflow implementation of YOLO](https://github.com/gliese581gg/YOLO_tensorflow).
+Since "car" is one of the classes YOLO already recognizes, I use the precomputed weights directly and apply them to the entire input frame.
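YOLO v1, as described in the paper, divides the image into an S x S grid (S = 7), and each cell predicts B = 2 boxes with a confidence each plus C = 20 class probabilities for PASCAL VOC, so the network outputs 7 x 7 x (2 x 5 + 20) = 1470 numbers. The sketch below decodes such a vector and keeps only "car" detections. The memory layout (980 class probabilities, then 98 box confidences, then 392 box coordinates, with square-rooted width and height) is an assumption modeled on the linked TensorFlow implementation, and the 0.2 score and 0.5 IoU thresholds are illustrative defaults; all function names are hypothetical.

```python
S, B, C = 7, 2, 20   # grid size, boxes per cell, PASCAL VOC classes
CAR = 6              # "car" in the alphabetical VOC class list (aeroplane=0, ..., car=6)


def decode_yolo_v1(output, img_w, img_h, class_id=CAR, threshold=0.2):
    """Return (score, cx, cy, w, h) boxes in original-image pixels for one class."""
    assert len(output) == S * S * (B * 5 + C)  # 1470 values
    n_probs, n_conf = S * S * C, S * S * B
    detections = []
    for row in range(S):
        for col in range(S):
            cell = row * S + col
            class_prob = output[cell * C + class_id]
            for b in range(B):
                # class-specific score = P(class | object) * box confidence
                score = class_prob * output[n_probs + cell * B + b]
                if score < threshold:
                    continue
                base = n_probs + n_conf + (cell * B + b) * 4
                x, y, sqrt_w, sqrt_h = output[base:base + 4]
                cx = (col + x) / S * img_w   # x, y are offsets within the cell
                cy = (row + y) / S * img_h
                detections.append((score, cx, cy, sqrt_w ** 2 * img_w, sqrt_h ** 2 * img_h))
    return detections


def iou(a, b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ix = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    iy = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, ix) * max(0.0, iy)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedily keep the highest-scoring box among boxes that overlap too much."""
    kept = []
    for det in sorted(detections, reverse=True):
        if all(iou(det[1:], k[1:]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# A fake network output: both boxes of the centre cell see the same car.
out = [0.0] * 1470
cell = 3 * S + 3
out[cell * C + CAR] = 0.9            # P(car | object)
out[S * S * C + cell * B] = 0.8      # box 0 confidence
out[S * S * C + cell * B + 1] = 0.5  # box 1 confidence
for b in range(B):
    base = S * S * C + S * S * B + (cell * B + b) * 4
    out[base:base + 4] = [0.5, 0.5, 0.5, 0.5]
cars = non_max_suppression(decode_yolo_v1(out, 1280, 720))
print(len(cars))  # 1
```

Because the coordinates are predicted relative to the grid and the image size, the boxes map straight back to the 1280x720 frame even though the network only sees a 448x448 resize.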
+
+#### Example of test image
+![alt text][image8]
+
+---
+
+### Discussion
+For the SVM-based approach, the accuracy is good, but the speed (2 fps) is a problem because the sliding-window search
+is time-consuming. We could use image downsampling, multi-threading, or GPU processing to improve the speed, but a lot of
+engineering work would probably be needed to make it run in real time. Also, in this application, I limit the vertical search
+range to control the number of search windows, as well as to avoid some false positives (e.g. cars detected in the trees).
+
+The YOLO-based approach runs in real time, and its accuracy is quite satisfactory. Only in some cases does it fail to
+detect small, distant cars. My intuition is that the original input image has a resolution of 1280x720 and needs to be downscaled
+to 448x448, so a distant car becomes tiny and probably quite distorted in the downscaled image. In order to
+correctly identify distant cars, we might need to either crop the image instead of directly downscaling it, or retrain
+the network.
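The downscaling argument can be checked with a little arithmetic. For a hypothetical distant car that covers 40x40 pixels in the 1280x720 frame, resizing the whole frame to 448x448 shrinks it by a factor of 0.35 horizontally but only about 0.62 vertically, whereas resizing a hypothetical 720x720 crop keeps its aspect ratio. `scaled_size` is a hypothetical helper.

```python
def scaled_size(obj_w, obj_h, src=(1280, 720), dst=(448, 448)):
    """Size (w, h) in pixels of an object after resizing a src-sized image to dst."""
    return obj_w * dst[0] / src[0], obj_h * dst[1] / src[1]


full = scaled_size(40, 40)                  # whole frame squeezed into 448x448
crop = scaled_size(40, 40, src=(720, 720))  # 720x720 crop resized to 448x448
print("full frame: %.1f x %.1f px" % full)  # full frame: 14.0 x 24.9 px
print("crop:       %.1f x %.1f px" % crop)  # crop:       24.9 x 24.9 px
```

With the crop, the car keeps about 1.8 times as many columns and is no longer squeezed horizontally, at the cost of covering only part of the frame per network pass.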
diff --git a/calibration.py b/calibration.py
@@ -0,0 +1,121 @@
+"""calibration.py: Calibrate the camera and save the calibration results."""
+
+__author__ = "Junsheng Fu"
+__email__ = "[email protected]"
+__date__ = "March 2017"
+
+import numpy as np
+import cv2
+import glob
+import pickle
+import matplotlib.pyplot as plt
+from os import path
+
+
+def calibrate_camera(nx, ny, basepath):
+    """
+
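+    :param nx: number of inner chessboard corners along the x axis
+    :param ny: number of inner chessboard corners along the y axis
+    :param basepath: path containing the calibration images
+    :return: camera matrix (mtx) and distortion coefficients (dist); both are also written into basepath as calibration_pickle.p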
+    """
+
+    objp = np.zeros((nx*ny,3), np.float32)
+    objp[:,:2] = np.mgrid[0:nx,0:ny].T.reshape(-1,2)
+
+    # Arrays to store object points and image points from all the images.
+    objpoints = [] # 3d points in real world space
+    imgpoints = [] # 2d points in image plane.
+
+    # Make a list of calibration images
+    images = glob.glob(path.join(basepath, 'calibration*.jpg'))
+
+    # Step through the list and search for chessboard corners
+    for fname in images:
+        img = cv2.imread(fname)
+        gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
+
+        # Find the chessboard corners
+        ret, corners = cv2.findChessboardCorners(gray, (nx,ny),None)
+
+        # If found, add object points, image points
+        if ret:
+            objpoints.append(objp)
+            imgpoints.append(corners)
+
+            # Draw and display the corners
+            img = cv2.drawChessboardCorners(img, (nx,ny), corners, ret)
+            cv2.imshow('input image',img)
+            cv2.waitKey(500)
+
+    cv2.destroyAllWindows()
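+
+    if not objpoints:
+        raise RuntimeError("no chessboard corners found in {}".format(basepath))
+
+    # calibrate the camera; OpenCV expects the image size as (width, height)
+    img_size = (img.shape[1], img.shape[0])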
+    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, img_size, None, None)
+
+    # Save the camera calibration result for later use (we don't use rvecs / tvecs)
+    dist_pickle = {}
+    dist_pickle["mtx"] = mtx
+    dist_pickle["dist"] = dist
+    destination = path.join(basepath, 'calibration_pickle.p')
+    with open(destination, "wb") as f:
+        pickle.dump(dist_pickle, f)
+    print("calibration data is written into: {}".format(destination))
+
+    return mtx, dist
+
+
+def load_calibration(calib_file):
+    """
+
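+    :param calib_file: path to the calibration pickle written by calibrate_camera()
+    :return: mtx (camera matrix) and dist (distortion coefficients)
+    """
+    with open(calib_file, 'rb') as file:
+        # print('load calibration data')
+        data = pickle.load(file)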
+        mtx = data['mtx']       # calibration matrix
+        dist = data['dist']     # distortion coefficients
+
+    return mtx, dist
+
+
+def undistort_image(imagepath, calib_file, visulization_flag):
+    """ undistort the image and visualization
+
+    :param imagepath: image path
+    :param calib_file: pickle file with the calibration matrix and distortion coefficients
+    :param visulization_flag: if True, plot the original and undistorted images side by side
+    :return: the undistorted image in RGB channel order
+    """
+    mtx, dist = load_calibration(calib_file)
+
+    img = cv2.imread(imagepath)
+
+    # undistort the image
+    img_undist = cv2.undistort(img, mtx, dist, None, mtx)
+    img_undistRGB = cv2.cvtColor(img_undist, cv2.COLOR_BGR2RGB)
+
+    if visulization_flag:
+        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        f, (ax1, ax2) = plt.subplots(1, 2)
+        ax1.imshow(imgRGB)
+        ax1.set_title('Original Image', fontsize=30)
+        ax1.axis('off')
+        ax2.imshow(img_undistRGB)
+        ax2.set_title('Undistorted Image', fontsize=30)
+        ax2.axis('off')
+        plt.show()
+
+    return img_undistRGB
+
+
+if __name__ == "__main__":
+
+    nx, ny = 9, 6  # number of inner corners along the x and y axes of the chessboard pattern
+    basepath = 'camera_cal/'  # path containing the calibration images
+
+    # calibrate the camera and save the calibration data
+    calibrate_camera(nx, ny, basepath)
diff --git a/calibration_pickle.p b/calibration_pickle.p
diff --git a/clf_pickle_all_v1.p b/clf_pickle_all_v1.p
diff --git a/examples/.DS_Store b/examples/.DS_Store
diff --git a/examples/000275.png b/examples/000275.png
diff --git a/examples/2.png b/examples/2.png
diff --git a/examples/25.png b/examples/25.png
diff --git a/examples/3.png b/examples/3.png
diff --git a/examples/31.png b/examples/31.png
diff --git a/examples/53.png b/examples/53.png
diff --git a/examples/8.png b/examples/8.png
diff --git a/examples/car_not_car.png b/examples/car_not_car.png
diff --git a/examples/car_sample.png b/examples/car_sample.png
diff --git a/examples/demo1.gif b/examples/demo1.gif
diff --git a/examples/demo2.gif b/examples/demo2.gif
diff --git a/examples/heat_map1.png b/examples/heat_map1.png
diff --git a/examples/heat_map2.png b/examples/heat_map2.png
diff --git a/examples/hog_1.png b/examples/hog_1.png
diff --git a/examples/hog_2.png b/examples/hog_2.png
diff --git a/examples/notcar_sample.png b/examples/notcar_sample.png
diff --git a/examples/output_bboxes.png b/examples/output_bboxes.png
diff --git a/examples/project_video.mp4 b/examples/project_video.mp4
diff --git a/examples/search_windows.png b/examples/search_windows.png
diff --git a/examples/svn_1.png b/examples/svn_1.png
diff --git a/examples/test1.jpg b/examples/test1.jpg
diff --git a/examples/test2.jpg b/examples/test2.jpg
diff --git a/examples/test3.jpg b/examples/test3.jpg
diff --git a/examples/test4.jpg b/examples/test4.jpg
diff --git a/examples/test5.jpg b/examples/test5.jpg
diff --git a/examples/test6.jpg b/examples/test6.jpg
diff --git a/examples/yolo1.png b/examples/yolo1.png
diff --git a/examples/yolo2.png b/examples/yolo2.png
diff --git a/examples/yolo_1.png b/examples/yolo_1.png