
Use localization implementation to get bounding boxes #7

Open
BradNeuberg opened this issue Aug 18, 2015 · 2 comments

@BradNeuberg
Owner

No description provided.

@BradNeuberg BradNeuberg added this to the iteration1 milestone Aug 18, 2015
@jhauswald jhauswald changed the title Study R-CNN papers and implementations Localization and Detection Aug 18, 2015
@jhauswald
Contributor

For those interested, Fig. 1 here illustrates the difference between detection and localization.

All of the detection/localization implementations I've read so far start from some sort of sliding-window approach: you simply search the image with bounding boxes of varying sizes and scales to find the exact location of the object. It's a straightforward approach, but computationally intensive. The following papers all use it as their baseline and show how they improve on its speed and accuracy (some high-level notes on each, but I'd read them for more in-depth info!):

  • DenseNet:
    • Accelerates classification over a large set of aspect ratios and image-size regions.
    • Creates ~25 images at multiple resolutions, batches them together, and runs inference on the batch at multiple scales (similar to building a pyramid of image scales in the SIFT algorithm, for example).
    • Hacks Caffe to do this: the differently scaled images are flattened into the size of the original input to match the expected batch size, with padding used to separate the images and fill out the correct dimensions.
    • paper, source
  • DNNs for Object Detection (NIPS 2013):
    • Sliding-window based; does not use pretrained weights.
    • Replaces the last layer with a regression layer that generates a binary object mask, then applies bounding-box regression to find the best bounding box for the object. An iterative process.
    • paper
  • Overfeat:
    • Attempts to solve classification, localization, and detection (in increasing order of difficulty).
    • Uses 6 different scales of the input image (Table 5 in the paper).
    • Trains a regression model of two FC layers that produces the bounding-box coordinates.
    • paper
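The baseline all of these papers compare against can be sketched as a sliding window run over an image pyramid. A minimal sketch (the scale factor, window size, and stride are made-up parameters, and the classifier call is omitted; real pipelines also use proper resampling rather than the nearest-neighbor subsampling shown here):

```python
import numpy as np

def image_pyramid(image, scale=0.75, min_size=32):
    """Yield progressively downscaled copies of the image
    (nearest-neighbor subsampling, for simplicity)."""
    while min(image.shape[:2]) >= min_size:
        yield image
        h, w = image.shape[:2]
        new_h, new_w = int(h * scale), int(w * scale)
        rows = (np.arange(new_h) / scale).astype(int)
        cols = (np.arange(new_w) / scale).astype(int)
        image = image[rows][:, cols]

def sliding_windows(image, win=32, stride=16):
    """Yield (x, y, crop) for every window position at this scale."""
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]

# Every (scale, position) crop would be fed to the classifier; the cost is
# (number of scales) x (window positions per scale), which is why the
# naive approach is so computationally intensive.
img = np.zeros((128, 128, 3))
crops = [(s, x, y)
         for s, level in enumerate(image_pyramid(img))
         for x, y, _ in sliding_windows(level)]
```

Even on this tiny 128×128 input, the sketch produces dozens of crops to classify; at realistic image sizes and scale counts the number runs into the tens of thousands, which is the cost the papers above are attacking.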

@BradNeuberg
Owner Author

Sounds like the approach we will use is:

http://arxiv.org/pdf/1311.2524v5.pdf

via this Python open source library:

https://github.com/jhauswald/selective_search_py

Updating this issue to indicate that we found an algorithm to use for iteration 2.
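The R-CNN paper linked above scores each proposed region with the CNN and then collapses overlapping detections with greedy non-maximum suppression. A minimal sketch of that final step, with made-up boxes and scores (this is not the selective_search_py API, just an illustration of the technique):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box that overlaps it by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

# Hypothetical proposals: two near-duplicate boxes plus one distinct box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving boxes
```

The region proposals from selective search replace the exhaustive sliding-window search, so only a few thousand candidate boxes per image need scoring before this suppression step.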

@BradNeuberg BradNeuberg modified the milestones: iteration2, iteration1 Aug 19, 2015
@BradNeuberg BradNeuberg changed the title Localization and Detection Use localization implementation to get bounding boxes Aug 19, 2015