Skip to content

YipChunKit/python-bloomfilter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 

Repository files navigation

python-sbf is an implmentation of Scalable Bloom Filters as discussed in
P. Almeida, C.Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters, (GLOBECOM 2007), IEEE, 2007.

Bloom filters are great if you understand what amount of bits you need to set aside early to store your entire set.
Scalable Bloom Filters allow your bloom filter bits to grow as a function of false positive probability and size.

A filter is "full" when it's capacity: M * ((ln 2 ^ 2) / abs(ln p)), where M is the number of bits and p is the false positive probability, is reached a new filter is then created exponentially larger than the last with a tighter probability of false positives and a larger k.

The defaults are fairly sane for large sets:
>>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
>>> [sbf.add(i) for i in xrange(0, 100000)]
# the following is debugging output you can turn on in the source
Filter is full, creating new filter, size: 16384, k: 5
Filter is full, creating new filter, size: 32768, k: 6
Filter is full, creating new filter, size: 65536, k: 7
Filter is full, creating new filter, size: 131072, k: 8
Filter is full, creating new filter, size: 262144, k: 9
Filter is full, creating new filter, size: 524288, k: 10
Filter is full, creating new filter, size: 1048576, k: 11
>>> (sum([f.m for f in sbf.filters]) / 8) / 1024
255 # kilobytes
>>> sbf.capacity()
133095
>>> len(sbf)
99994 # len(sbf) may not equal the entire input length. 0.006% error is well below the default 0.1% error threshold 

For sets with an ever increasing amount of k, you'll need to use LARGE_SET_GROWTH so the limit of the hashes isn't reached. Fixing this limit is planned for a future release.

About

Scalable Bloom Filter implemented in Python

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%