All detection/localization implementations that I've read so far start from some sort of sliding-window approach: you search the image with bounding boxes of varying sizes and scales to find the exact location of the object. It's a straightforward approach, but computationally intensive. The following papers all use it as their baseline and show how they improve speed and accuracy over it (some high-level notes about each, but I'd read them for more in-depth info!):
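To make the baseline concrete, here's a minimal sketch of brute-force sliding-window detection (my own illustration, not from any of the papers): scan the image with boxes of several sizes, score each crop with a classifier, and keep the best. `score_fn` is a hypothetical placeholder for a real classifier.

```python
import numpy as np

def sliding_window_detect(image, score_fn, box_sizes=((64, 64), (128, 128)), stride=32):
    """Return the highest-scoring (x, y, w, h, score) over all windows."""
    H, W = image.shape[:2]
    best = None
    for (bh, bw) in box_sizes:
        for y in range(0, H - bh + 1, stride):
            for x in range(0, W - bw + 1, stride):
                crop = image[y:y + bh, x:x + bw]
                s = score_fn(crop)
                if best is None or s > best[4]:
                    best = (x, y, bw, bh, s)
    return best

# Toy usage: the "classifier" is just mean brightness, so the window
# best covering the bright square wins.
img = np.zeros((256, 256), dtype=np.float32)
img[100:160, 140:200] = 1.0  # bright object
print(sliding_window_detect(img, lambda c: float(c.mean())))
```

Even this toy version evaluates the classifier at every (position, size) pair, which is exactly the cost the papers below try to cut.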
DenseNet:
Accelerates classification over a large set of aspect ratios and image-size regions.
Creates ~25 images at multiple resolutions, batches them together, and runs inference on the whole batch at multiple scales at once (similar to building a pyramid of image scales, as in the SIFT algorithm, for example).
Hacks Caffe to do this: flattens all the differently scaled images into the size of the original input to match the expected batch size, using padding to add space between images and to reach the correct size.
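Here's a rough sketch of that packing idea (an assumption on my part, not the actual Caffe patch): resize the image to each scale and place the copies side by side on one fixed-size canvas, separated by padding, so a single forward pass covers all scales. The function name and defaults are hypothetical.

```python
import numpy as np

def build_scale_pyramid(image, scales=(1.0, 0.75, 0.5), canvas_hw=(256, 256), pad=8):
    """Resize `image` to each scale (nearest-neighbor) and place the
    copies left-to-right on a zero canvas, `pad` pixels apart.
    Returns the canvas plus each copy's (x, y, w, h) so detections
    can be mapped back to the original image's coordinates."""
    H, W = canvas_hw
    canvas = np.zeros((H, W), dtype=image.dtype)
    boxes = []
    x = 0
    for s in scales:
        h = max(1, int(image.shape[0] * s))
        w = max(1, int(image.shape[1] * s))
        # nearest-neighbor resize via index sampling (no external deps)
        ys = np.arange(h) * image.shape[0] // h
        xs = np.arange(w) * image.shape[1] // w
        small = image[ys][:, xs]
        canvas[0:h, x:x + w] = small
        boxes.append((x, 0, w, h))
        x += w + pad
    return canvas, boxes

# Usage: pack a 100x100 image at three scales into one 256x256 input.
img = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)
canvas, boxes = build_scale_pyramid(img)
print(boxes)  # → [(0, 0, 100, 100), (108, 0, 75, 75), (191, 0, 50, 50)]
```

The key point is bookkeeping: keeping each copy's offset and scale lets you translate any detection on the canvas back to original-image coordinates.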
Sliding-window based; does not use pretrained weights.
Replaces the last layer with a regression layer that generates a binary object mask, then applies bounding-box regression to find the best bounding box for the object. This is an iterative process.
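The mask-to-box step above can be sketched as follows (my own minimal illustration; the network and the actual regression details are omitted): given a predicted binary object mask, take the tightest box around the foreground pixels.

```python
import numpy as np

def mask_to_bbox(mask, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) of pixels above `threshold`,
    or None if the mask predicts no object."""
    ys, xs = np.nonzero(mask > threshold)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Toy mask: the object occupies rows 20-39 and columns 10-29.
mask = np.zeros((64, 64))
mask[20:40, 10:30] = 0.9
print(mask_to_bbox(mask))  # → (10, 20, 29, 39)
```

In the iterative scheme described above, a box like this would seed the next round: crop around it, re-predict the mask, and refine until the box stabilizes.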