- Dataset
- Dense Depth for Autonomous Driving (DDAD)
- KITTI Eigen Split

wget -i splits/kitti_archives_to_download.txt -P kitti_data/
cd kitti_data/
unzip "*.zip"
cd ..
find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

- The above conversion command creates JPEG images with the default chroma subsampling 2x2,1x1,1x1.
- Problem Setting
While specialist hardware can give per-pixel depth, a more attractive approach is to require only a single RGB camera: train a deep network to map from an input image to a depth map.
- Methods
- Geometry Models
The simplest representation of a camera is an image plane at a given position and orientation in space.
The pinhole camera geometry models the camera with two sub-parameterizations: intrinsic and extrinsic parameters. Intrinsic parameters model the optical component (without distortion), and extrinsic parameters model the camera's position and orientation in space. A 3D point is projected into the image with the following formula (homogeneous coordinates):
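In the standard pinhole model (stated here for completeness, not taken from this repo's code), a 3D point (X, Y, Z) maps to pixel (u, v) via the intrinsics K and extrinsics [R | t]:

```latex
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= K \,[R \mid t]\,
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
```

Here s is the projective scale (the point's depth along the optical axis), f_x, f_y are the focal lengths in pixels, and (c_x, c_y) is the principal point.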
- Cross-View Reconstruction
Frames the learning problem as one of novel view synthesis, by training a network to predict the appearance of a target image from the viewpoint of another image using depth (disparity); the problem is formulated as the minimization of a photometric reprojection error at training time.
Here, pe is a photometric reconstruction error, proj() returns the 2D coordinates of the projected depths Dₜ in the source view, and <> is the sampling operator. For simplicity of notation we assume the precomputed intrinsics K of all views are identical, though they can be different. α is set to 0.85.
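Written out (following the Monodepth2 formulation that this description matches), the reprojection loss is:

```latex
L_p = \min_{t'} \; pe\big(I_t,\; I_{t' \to t}\big),
\qquad
I_{t' \to t} = I_{t'}\big\langle \operatorname{proj}(D_t,\, T_{t \to t'},\, K) \big\rangle
```

```latex
pe(I_a, I_b) = \frac{\alpha}{2}\big(1 - \operatorname{SSIM}(I_a, I_b)\big) + (1 - \alpha)\,\lVert I_a - I_b \rVert_1
```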
We consider the scene structure and camera motion at the same time, where camera pose estimation has a positive impact on monocular depth estimation. The two sub-networks (depth and pose) are trained jointly, and the entire model is constrained by an image reconstruction loss similar to stereo matching methods.
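The view-synthesis step described above — back-project the target depth, transform by the relative pose T, and sample the source image — can be sketched in PyTorch. This is an illustrative sketch, not the repo's actual implementation; the function names are made up:

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift each pixel to a 3D camera-frame point: X = D * K^-1 * [u, v, 1]^T."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(3, -1)  # (3, H*W)
    pix = pix.unsqueeze(0).expand(b, -1, -1)                             # (B, 3, H*W)
    return depth.view(b, 1, -1) * (K_inv @ pix)                          # (B, 3, H*W)

def project(points, K, T):
    """Transform target-frame points into the source frame and project with K."""
    b, _, n = points.shape
    homog = torch.cat([points, torch.ones(b, 1, n)], dim=1)  # (B, 4, N)
    cam = (T @ homog)[:, :3]                                 # (B, 3, N)
    pix = K @ cam
    return pix[:, :2] / pix[:, 2:3].clamp(min=1e-7)          # (B, 2, N)

def warp_to_target(src, depth, K, T):
    """Synthesize the target view by sampling the source image at projected coords."""
    b, _, h, w = src.shape
    points = backproject(depth, torch.inverse(K))
    pix = project(points, K, T).view(b, 2, h, w)
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    grid = torch.stack([pix[:, 0] / (w - 1) * 2 - 1,
                        pix[:, 1] / (h - 1) * 2 - 1], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(src, grid, padding_mode="border", align_corners=True)
```

With the identity pose the projection lands back on the original pixel grid, so the warp reproduces the source image exactly — a handy sanity check when debugging this pipeline.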
- Folder
dataset/
2011_09_26/
...
...
model_dataloader/
model_layer/
model_loss/
model_save/
model_test.py
model_train.py
model_parser.py
model_utility.py
- Packages
apt-get update -y
apt-get install moreutils
or
apt-get install -y moreutils
- Training
python model_train.py --pose_type separate --datatype kitti_eigen_zhou
python model_train.py --pose_type separate --datatype kitti_benchmark
- Test
python model_test.py
- Evaluation
kitti_eigen_zhou
abs_rel  sq_rel  rmse   rmse_log  a1     a2     a3
0.125    0.977   4.992  0.202     0.861  0.955  0.980
kitti_eigen_benchmark
abs_rel  sq_rel  rmse   rmse_log  a1     a2     a3
0.104    0.809   4.502  0.182     0.900  0.963  0.981
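The columns above are the usual Eigen-style KITTI depth metrics. A NumPy sketch of how they are typically computed (the function name is illustrative, not from this repo):

```python
import numpy as np

def depth_metrics(gt, pred):
    """Compute the standard KITTI depth-evaluation metrics on flattened,
    masked depth arrays (both strictly positive, in meters)."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()          # fraction within 1.25x of ground truth
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```

Note that predictions are usually median-scaled to the ground truth before these metrics are computed, since monocular training only recovers depth up to scale.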
-
What is a feature map? That's the yellow block in the image.
-
It's a collection of N two-dimensional "maps" that each represent a particular "feature" that the model has spotted within the image.
-
This is why convolutional layers are known as feature extractors.
-
How do we get from input (whether image or feature map) to a feature map?
-
Through kernels, or filters.
-
You configure some number N of kernels per convolutional layer.
-
Each kernel "slides" (convolves) over your input data.
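The points above can be sketched in a few lines of PyTorch (an illustrative example, not code from this repo):

```python
import torch
import torch.nn as nn

# N = 8 kernels in one convolutional layer: it turns a 3-channel
# RGB image into 8 two-dimensional feature maps.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

image = torch.rand(1, 3, 64, 64)   # a batch of one 64x64 RGB image
features = conv(image)             # each kernel "slides" over the input

# One feature map per kernel; padding=1 keeps the spatial size at 64x64.
print(features.shape)              # torch.Size([1, 8, 64, 64])
```

Each of the 8 output channels is produced by a different learned kernel, which is exactly why the layer acts as a bank of N feature extractors.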