For features like buildings we want to sample OpenStreetMap when extracting geometries in `rs extract`.
The osmium handlers in `robosat.osm` should take a sampler and, in every OpenStreetMap entity callback, ask the sampler whether to handle that entity or not.
For the sampler we have a few options:
1. Let the user pass a number `n` of samples (e.g. 20k); we take the first `n` and after that just drop features. Problem: we don't randomly sample from all geographical areas; not a good idea.
2. Let the user pass a fraction `f` of samples (e.g. `0.1`); in the osm callbacks we draw a random number `r` in [0, 1) and keep the sample if `r < f`. Problem: users want a fixed amount of samples (e.g. 20k), but the number a fraction yields depends on how many features there are in osm. For example, with parking lots a fraction of 0.1 is maybe a few thousand; with buildings it's millions.
3. Do two passes over the data; in the first pass count how many features there are in osm, then come up with a fraction to keep; in the second pass use approach 2. Problem: needs two passes over the data, and two separate handlers for one feature.
4. Use an online algorithm for random sampling: reservoir sampling. It's an algorithm for randomly sampling `k` items out of a stream of unknown size. This is a good read.
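To make the last option concrete, here is a minimal sketch of reservoir sampling in its simplest form (often called Algorithm R). The function name `reservoir_sample` and the `rng` parameter are made up for illustration and are not part of robosat.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Hypothetical helper for illustration only; the stream length does
    not need to be known in advance and only k items are held in memory.
    """
    rng = rng or random.Random()
    reservoir = []

    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: the first k items always go into the reservoir.
            reservoir.append(item)
        else:
            # Replace phase: item number i + 1 is kept with probability
            # k / (i + 1), evicting a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item

    return reservoir


sample = reservoir_sample(range(1_000_000), 20)
short = reservoir_sample(range(5), 20)
```

In a single pass every item ends up in the result with the same probability, which is what options 1 to 3 fail to deliver without either geographic bias or a second pass.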
Tasks:
- Implement a `ReservoirSampler` class; it takes the maximum number `n` of items to randomly sample from a stream of unknown size.
- Let our osmium handlers take a `ReservoirSampler`; in the osm entity callbacks they push features into the reservoir, and in the save function they save the features from the reservoir. The reservoir is responsible for keeping or discarding features as part of the sampling.
- Add an optional argument to the `rs extract` tool for users to set the sample size; pass this argument to the sampler.
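A rough sketch of how these three tasks could fit together, under several assumptions: `FeatureHandler` is a plain-Python stand-in rather than a real osmium handler, the `push` method name and the `--sample-size` flag are hypothetical, and integers stand in for OpenStreetMap features.

```python
import argparse
import random


class ReservoirSampler:
    """Keeps at most `capacity` items, uniformly sampled from all pushed items."""

    def __init__(self, capacity, seed=None):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.reservoir = []
        self.pushed = 0
        self.rng = random.Random(seed)

    def push(self, item):
        self.pushed += 1
        if len(self.reservoir) < self.capacity:
            self.reservoir.append(item)
            return
        # Keep the new item with probability capacity / pushed.
        j = self.rng.randrange(self.pushed)
        if j < self.capacity:
            self.reservoir[j] = item


class FeatureHandler:
    """Stand-in for an osmium handler; NOT the actual robosat.osm interface."""

    def __init__(self, sampler):
        self.sampler = sampler

    def way(self, w):
        # The entity callback only pushes; the sampler decides what survives.
        self.sampler.push(w)

    def save(self):
        # A real handler would serialize features here; we just return them.
        return list(self.sampler.reservoir)


parser = argparse.ArgumentParser(prog="extract-sketch")
parser.add_argument("--sample-size", type=int)  # hypothetical flag name
args = parser.parse_args(["--sample-size", "5"])

handler = FeatureHandler(ReservoirSampler(args.sample_size, seed=0))
for feature_id in range(100):  # integers stand in for OSM features
    handler.way(feature_id)
kept = handler.save()
```

Keeping the keep-or-discard decision inside the sampler means the handlers stay unaware of the sampling strategy, so a different sampler could be swapped in later without touching the callbacks.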
Note: now that we have the `rs dedupe` tool deduplicating detections against OpenStreetMap, we need to think about how to design the interface here. The dedupe tool currently reads in the OpenStreetMap features created in the extract tool. If we randomly sample features in extract, we can no longer use that output for deduplication.