forked from jaybaird/python-bloomfilter
-
Notifications
You must be signed in to change notification settings - Fork 0
Scalable Bloom Filter implemented in Python
License
YipChunKit/python-bloomfilter
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
python-sbf is an implmentation of Scalable Bloom Filters as discussed in P. Almeida, C.Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters, (GLOBECOM 2007), IEEE, 2007. Bloom filters are great if you understand what amount of bits you need to set aside early to store your entire set. Scalable Bloom Filters allow your bloom filter bits to grow as a function of false positive probability and size. A filter is "full" when it's capacity: M * ((ln 2 ^ 2) / abs(ln p)), where M is the number of bits and p is the false positive probability, is reached a new filter is then created exponentially larger than the last with a tighter probability of false positives and a larger k. The defaults are fairly sane for large sets: >>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH) >>> [sbf.add(i) for i in xrange(0, 100000)] # the following is debugging output you can turn on in the source Filter is full, creating new filter, size: 16384, k: 5 Filter is full, creating new filter, size: 32768, k: 6 Filter is full, creating new filter, size: 65536, k: 7 Filter is full, creating new filter, size: 131072, k: 8 Filter is full, creating new filter, size: 262144, k: 9 Filter is full, creating new filter, size: 524288, k: 10 Filter is full, creating new filter, size: 1048576, k: 11 >>> (sum([f.m for f in sbf.filters]) / 8) / 1024 255 # kilobytes >>> sbf.capacity() 133095 >>> len(sbf) 99994 # len(sbf) may not equal the entire input length. 0.006% error is well below the default 0.1% error threshold For sets with an ever increasing amount of k, you'll need to use LARGE_SET_GROWTH so the limit of the hashes isn't reached. Fixing this limit is planned for a future release.
About
Scalable Bloom Filter implemented in Python
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- Python 100.0%