Based on Space/Time Trade-offs in Hash Coding with Allowable Errors by Burton H. Bloom.
BF.RESERVE {key} {error_rate} {capacity} [EXPANSION {expansion}] [NONSCALING]
Creates an empty Bloom Filter with a single sub-filter for the initial capacity
requested and with an upper bound error_rate
. By default, the filter
auto-scales by creating additional sub-filters when capacity
is reached. The
new sub-filter is created with size of the previous sub-filter multiplied by
expansion
.
Though the filter can scale up by creating sub-filters, it is recommended to
reserve the estimated required capacity
since maintaining and querying
sub-filters requires additional memory (each sub-filter uses an extra bits and
hash function) and consume further CPU time than an equivalent filter that had
the right capacity at creation time.
The number of hash functions is -log(error)/ln(2)^2
.
The number of bits per item is -log(error)/ln(2)
≈ 1.44.
- 1% error rate requires 7 hash functions and 10.08 bits per item.
- 0.1% error rate requires 10 hash functions and 14.4 bits per item.
- 0.01% error rate requires 14 hash functions and 20.16 bits per item.
- key: The key under which the filter is found
- error_rate: The desired probability for false positives. The rate is a decimal value between 0 and 1. For example, for a desired false positive rate of 0.1% (1 in 1000), error_rate should be set to 0.001.
- capacity: The number of entries intended to be added to the filter.
If your filter allows scaling, performance will begin to degrade after
adding more items than this number. The actual degradation depends on
how far the limit has been exceeded. Performance degrades linearly with
the number of
sub-filters
.
Optional parameters:
- NONSCALING: Prevents the filter from creating additional sub-filters if
initial capacity is reached. Non-scaling filters requires slightly less
memory than their scaling counterparts. The filter returns an error
when
capacity
is reached. - EXPANSION: When
capacity
is reached, an additional sub-filter is created. The size of the new sub-filter is the size of the last sub-filter multiplied byexpansion
. If the number of elements to be stored in the filter is unknown, we recommend that you use anexpansion
of 2 or more to reduce the number of sub-filters. Otherwise, we recommend that you use anexpansion
of 1 to reduce memory consumption. The default expansion value is 2.
O(1)
OK on success, error otherwise.
BF.ADD {key} {item}
Adds an item to the Bloom Filter, creating the filter if it does not yet exist.
- key: The name of the filter
- item: The item to add
O(k), where k is the number of hash
functions used by the last sub-filter.
"1" if the item was newly inserted, or "0" if it may have existed previously.
BF.MADD {key} {item ...}
Adds one or more items to the Bloom Filter and creates the filter if it does not exist yet.
This command operates identically to BF.ADD
except that it allows multiple inputs and returns
multiple values.
- key: The name of the filter
- item: One or more items to add
O(k * n), where k is the number of hash
functions used by the last sub-filter
and m the number of items that are added.
An array of booleans (integers). Each element is either true or false depending on whether the corresponding input element was newly added to the filter or may have previously existed.
BF.INSERT {key} [CAPACITY {cap}] [ERROR {error}] [EXPANSION {expansion}] [NOCREATE]
[NONSCALING] ITEMS {item ...}
BF.INSERT is a sugarcoated combination of BF.RESERVE and BF.ADD. It creates a
new filter if the key
does not exist using the relevant arguments (see BF.RESERVE).
Next, all ITEMS
are inserted.
- key: The name of the filter
- item: One or more items to add. The
ITEMS
keyword must precede the list of items to add.
Optional parameters:
- NOCREATE: (Optional) Indicates that the filter should not be created if
it does not already exist. If the filter does not yet exist, an error is
returned rather than creating it automatically. This may be used where a strict
separation between filter creation and filter addition is desired. It is an
error to specify
NOCREATE
together with eitherCAPACITY
orERROR
. - capacity: (Optional) Specifies the desired
capacity
for the filter to be created. This parameter is ignored if the filter already exists. If the filter is automatically created and this parameter is absent, then the module-levelcapacity
is used. SeeBF.RESERVE
for more information about the impact of this value. - error: (Optional) Specifies the
error
ratio of the newly created filter if it does not yet exist. If the filter is automatically created anderror
is not specified then the module-level error rate is used. SeeBF.RESERVE
for more information about the format of this value. - NONSCALING: Prevents the filter from creating additional sub-filters if
initial capacity is reached. Non-scaling filters require slightly less
memory than their scaling counterparts. The filter returns an error
when
capacity
is reached. - expansion: When
capacity
is reached, an additional sub-filter is created. The size of the new sub-filter is the size of the last sub-filter multiplied byexpansion
. If the number of elements to be stored in the filter is unknown, we recommend that you use anexpansion
of 2 or more to reduce the number of sub-filters. Otherwise, we recommend that you use anexpansion
of 1 to reduce memory consumption. The default expansion value is 2.
Add three items to a filter with default parameters if the filter does not already exist:
BF.INSERT filter ITEMS foo bar baz
Add one item to a filter with a capacity of 10000 if the filter does not already exist:
BF.INSERT filter CAPACITY 10000 ITEMS hello
Add 2 items to a filter with an error if the filter does not already exist:
BF.INSERT filter NOCREATE ITEMS foo bar
O(k * n), where k is the number of hash
functions used by the last sub-filter
and m the number of items that are added.
An array of booleans (integers). Each element is either true or false depending on whether the corresponding input element was newly added to the filter or may have previously existed.
BF.EXISTS {key} {item}
Determines whether an item may exist in the Bloom Filter or not.
- key: The name of the filter
- item: The item to check for
O(k * n), where k is the number of hash
functions and n is the number of
sub-filters
.
On average, a sub-filter returns FALSE after 2 bits are tested. Therefore the
average complexity for a FALSE reply is O(2 * n)
. For a TRUE
reply the complexity is O((2 * 1/2 * n) + k)
.
"0" if the item certainly does not exist, "1" if the item may exist.
BF.MEXISTS {key} {item ...}
Determines if one or more items may exist in the filter or not.
- key: The name of the filter
- items: One or more items to check
O(m * k * n), where m is the number of added elements, k is the number of hash
functions and n is the number of sub-filters
.
On average, a sub-filter returns FALSE after 2 bits are tested. Therefore the
average complexity for a FALSE reply is O(m * 2 * n)
. For a TRUE
reply, the complexity is O(m * ((2 * 1/2 * n ) + k))
.
An array of boolean values (actually integers). A true value means the corresponding item may exist in the filter, and a false value means it does not exist in the filter.
BF.SCANDUMP {key} {iter}
Begins an incremental save of the bloom filter. This is useful for large bloom
filters which cannot fit into the normal SAVE
and RESTORE
model.
The first time this command is called, the value of iter
should be 0. This
command returns successive (iter, data)
pairs until (0, NULL)
to
indicate completion.
A demonstration in python-flavored pseudocode:
chunks = []
iter = 0
while True:
iter, data = BF.SCANDUMP(key, iter)
if iter == 0:
break
else:
chunks.append([iter, data])
# Load it back
for chunk in chunks:
iter, data = chunk
BF.LOADCHUNK(key, iter, data)
- key: Name of the filter
- iter: Iterator value; either 0 or the iterator from a previous invocation of this command
O(n), where n is the capacity.
An array of Iterator and Data. The Iterator is passed as input to the next
invocation of SCANDUMP
. If Iterator is 0, then it means iteration has
completed.
The iterator-data pair should also be passed to LOADCHUNK
when restoring
the filter.
BF.LOADCHUNK {key} {iter} {data}
Restores a filter previously saved using SCANDUMP
. See the SCANDUMP
command
for example usage.
This command overwrites any bloom filter stored under key
. Make sure that
the bloom filter is not be changed between invocations.
- key: Name of the key to restore
- iter: Iterator value associated with
data
(returned bySCANDUMP
) - data: Current data chunk (returned by
SCANDUMP
)
O(n), where n is the capacity.
OK
on success, or an error on failure.
BF.INFO {key}
Return information about key
- key: Name of the key to restore
O(1)
127.0.0.1:6379> BF.INFO cf
1) Capacity
2) (integer) 1709
3) Size
4) (integer) 2200
5) Number of filters
6) (integer) 1
7) Number of items inserted
8) (integer) 0
9) Expansion rate
10) (integer) 1