This is another approach to making mnesia more partition-tolerant.
Results are not 100% accurate (see below for known problems), but it is much
better than the default of "do not even try to recover after a split".
(node@host)>application:start(reunion).
This will start reunion in the default configuration, tracking all keys in
set and ordered_set tables with the default strategy of merge_only and
trying to reconnect all nodes that are not running every net_tick_timeout
seconds.
reunion tries to detect two types of collisions:
- INSERT/DELETE collision, when some record is present on one node and not present on another.
- Data Collision, when some record is present on both nodes but the contents differ.
In order to resolve the first type of collision for set tables, reunion tracks
insert and delete operations, storing the Table, Key, Operation and
WhenItHappened in an internal ets table. When a collision occurs, reunion
consults this table to determine the last event for the key and restores the
record accordingly: if the last event found is 'insert', the record is
re-inserted on the node that is missing it; if it is 'delete', the record is
deleted on the node where it is still present.
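
The tracking idea above can be sketched as follows. This is only an illustration under stated assumptions, not reunion's actual code: the module name track_sketch, its functions, and the returned atoms are all hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```erlang
-module(track_sketch).
-export([new/0, track/5, resolve/3]).

%% Hypothetical sketch: create an ets table holding the latest
%% event per {Table, Key}.
new() ->
    ets:new(track_sketch_events, [set, public]).

%% Record an insert or delete, keeping only the most recent event
%% for each key. When is any comparable timestamp.
track(Ets, Table, Key, Op, When) when Op =:= insert; Op =:= delete ->
    case ets:lookup(Ets, {Table, Key}) of
        [{_, _OldOp, OldWhen}] when OldWhen >= When ->
            ok;  % an equally recent or newer event is already stored
        _ ->
            ets:insert(Ets, {{Table, Key}, Op, When}),
            ok
    end.

%% Decide what to do with a record present on one node only,
%% based on the last event seen for its key.
resolve(Ets, Table, Key) ->
    case ets:lookup(Ets, {Table, Key}) of
        [{_, insert, _}] -> reinsert_on_missing_node;
        [{_, delete, _}] -> delete_on_remaining_node;
        []               -> unknown
    end.
```

Keeping only the latest event per key bounds the size of the table by the number of tracked keys, at the cost of losing the history of earlier operations.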
For bag tables operations are not tracked, and the default behaviour is just
to merge the bags, adding missing elements on the node where they are not
present.
With a Data Collision, when elements are present on both nodes with different
content, the conflict resolution function is called. The default behaviour for
set tables is to do nothing except raise an alarm (using the sasl
alarm_handler), and for bag tables both elements are written on both nodes.
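
The default bag behaviour described above amounts to a set union of the two sides. A minimal sketch, assuming each side's bag contents for one key are available as a plain list (the module bag_merge_sketch and its function are hypothetical and not part of reunion):

```erlang
-module(bag_merge_sketch).
-export([missing/2]).

%% Hypothetical sketch: given the elements stored under one key on
%% the local and the remote node, return {MissingLocally, MissingRemotely}.
%% Writing each list to the corresponding node leaves both bags
%% holding the union of the two.
missing(Local, Remote) ->
    MissingLocally  = [R || R <- Remote, not lists:member(R, Local)],
    MissingRemotely = [L || L <- Local, not lists:member(L, Remote)],
    {MissingLocally, MissingRemotely}.
```

For example, missing([{b,k,1},{b,k,3}], [{b,k,2},{b,k,3}]) yields {[{b,k,2}], [{b,k,1}]}: the shared element {b,k,3} is left alone.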
There are two more pre-defined strategies: last_version, which selects the
"best" record by comparing some field in the record,
(node@host)>mnesia:write_table_property(kvs, {reunion_compare,
{reunion_lib, last_version, [field]}}).
and last_modified (a variant of last_version that uses the pre-defined
modified field for comparison):
(node@host)>mnesia:write_table_property(kvs, {reunion_compare,
{reunion_lib, last_modified, []}}).
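
To illustrate what comparing on a field means for record tuples, here is a small sketch. It is not the reunion_lib implementation; the module last_version_sketch and the atoms it returns are hypothetical. It assumes the table's attribute list is known, so that a field name can be mapped to a tuple position (position 1 holds the record name).

```erlang
-module(last_version_sketch).
-export([newer/4]).

%% Hypothetical sketch: compare two records on a named field.
%% Fields is the table's attribute list, e.g. [key, value, version].
%% Returns local | remote | same.
newer(Fields, Field, Local, Remote) ->
    Pos = position(Field, Fields, 2),
    case {element(Pos, Local), element(Pos, Remote)} of
        {L, R} when L > R -> local;
        {L, R} when L < R -> remote;
        _                 -> same
    end.

%% Tuple position of a field; crashes with function_clause if the
%% field is not one of the table's attributes.
position(F, [F | _], N) -> N;
position(F, [_ | T], N) -> position(F, T, N + 1).
```

With Fields = [key, value, version], newer(Fields, version, {kvs, a, 1, 5}, {kvs, a, 2, 3}) returns local, because the local version 5 is greater than the remote version 3.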
These strategies can be used for bag tables too, but the configuration is a
bit different: to select the "best" element in a bag, elements should have
some "secondary key" field, and the "best" element is selected only among
elements with the same "secondary key".
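
The secondary-key selection can be pictured as grouping bag elements by that field and keeping one winner per group. The sketch below is an assumption about the idea only, not about how reunion configures or implements it; the module bag_best_sketch and the use of tuple positions instead of field names are hypothetical.

```erlang
-module(bag_best_sketch).
-export([best_per_secondary/3]).

%% Hypothetical sketch: among bag elements, keep the element with the
%% greatest value at VerPos for each distinct value at SecPos.
%% The result is sorted so that it is deterministic.
best_per_secondary(SecPos, VerPos, Records) ->
    Best = lists:foldl(
             fun(R, Acc) ->
                     K = element(SecPos, R),
                     case maps:find(K, Acc) of
                         {ok, Cur} when element(VerPos, Cur) >= element(VerPos, R) ->
                             Acc;  % current winner is at least as new
                         _ ->
                             maps:put(K, R, Acc)
                     end
             end, #{}, Records),
    lists:sort(maps:values(Best)).
```

For elements shaped {Tab, Key, Secondary, Version}, calling best_per_secondary(3, 4, [{t,k,a,1}, {t,k,a,3}, {t,k,b,2}]) returns [{t,k,a,3}, {t,k,b,2}].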
You can define your own conflict resolution function, which will be called as:
function(init, {Table :: atom(), Type :: 'set' | 'bag', Fields :: list(atom()),
RemoteNode :: atom()) ->
{ok, Modstate :: any()};
function(done, Modstate :: any(), RemoteNode :: atom()) ->
any();
function(LocalRecords, RemoteRecord, ModState :: any()) ->
{ok, Actions :: reunion:action() | list(reunion:action()),
NextState :: any()} |
{inconsistency, Error, NextState :: any()}
where Action can be one of
{write_local, Record} | {write_remote, Record} | {delete_local, Record} |
{delete_remote, Record}
I hope the names are self-descriptive enough.
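
As an illustration of how a resolver might turn a comparison into the actions listed above, here is a standalone sketch. It deliberately does not follow the callback signature (it takes no init/done clauses and returns no NextState); the module actions_sketch, the VerPos argument and the same_version error term are hypothetical.

```erlang
-module(actions_sketch).
-export([actions/3]).

%% Hypothetical sketch: decide which node needs to be overwritten,
%% comparing the value at tuple position VerPos in both records.
actions(VerPos, Local, Remote) ->
    LV = element(VerPos, Local),
    RV = element(VerPos, Remote),
    if
        LV > RV          -> {ok, [{write_remote, Local}]};  % local wins
        LV < RV          -> {ok, [{write_local, Remote}]};  % remote wins
        Local =:= Remote -> {ok, []};                       % nothing to do
        true             -> {inconsistency, {same_version, Local, Remote}}
    end.
```

Returning an inconsistency when the versions are equal but the contents differ avoids silently picking a side when the comparison field cannot decide.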
(node@host)>mnesia:write_table_property(bag, {reunion_compare, ignore}).
Setting the compare strategy to ignore also disables key learning for this table, which is useful for tables with rapidly changing data.
(node@host)>application:set_env(reunion, reconnect, never).
With this setting no new reconnect timers will be scheduled. The other possible setting is a custom timeout value in seconds.
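
The same application environment value could presumably also be supplied through a release's sys.config rather than from the shell. The fragment below is a sketch based on standard OTP configuration; whether reunion reads the value only at startup or on every reconnect is not stated here, so treat that as an assumption to verify.

```erlang
%% sys.config (sketch): disable reconnect attempts for reunion.
%% Per the text above, a custom timeout in seconds can be used
%% in place of the atom never.
[
 {reunion, [
   {reconnect, never}
 ]}
].
```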
As user_properties are not auto-propagated to table fragments, the compare
strategy is always inherited from the base_table. There is no way to
implement a custom strategy for an individual fragment.
Error window: it is possible that some conflicts will not be found, or that
reunion will introduce missing-update problems. This is because no table
locking is used, so objects in mnesia can change after an object is fetched
for comparison but before the resulting object is written, or before the
mnesia nodes are joined again.
Another interesting edge case: assume a netsplit happens right after some
record is created, and the record is then deleted during the netsplit. This
can be detected and resolved correctly only if reunion is running on both
nodes and the clocks are synchronised. If reunion is not running on the
island where the record is deleted, the deletion will be missed.
reunion is an "almost complete rewrite" of unsplit by Ulf Wiger.
The major difference between the two approaches: unsplit is stateless and is
therefore unable to resolve the "object present here and not present there"
case. That case can result from either a local insertion or a remote
deletion, and it is not possible to determine which happened without storing
this information.
Thanks to Ulf Wiger for his unsplit application.