- `admin.c`, `stonith_admin.8`: `stonith_admin` command-line tool and its man page
- `commands.c`, `internal.h`, `main.c`, `remote.c`, `stonithd.7`: `stonithd` and its man page
- `fence_dummy`, `fence_legacy`, `fence_legacy.8`, `fence_pcmk`, `fence_pcmk.8`: Pacemaker-supplied fence agents and their man pages
- `regression.py(.in)`: regression tests for `stonithd`
- `standalone_config.c`, `standalone_config.h`: abandoned project
- `test.c`: `stonith-test` command-line tool
In the broadest terms, stonith works like this:
- The initiator (an external program such as `stonith_admin`, or the cluster itself via the `crmd`) asks the local `stonithd`, "Hey, can you fence this node?"
- The local `stonithd` asks all the `stonithd` instances in the cluster (including itself), "Hey, what fencing devices do you have access to that can fence this node?"
- Each `stonithd` in the cluster replies with a list of available devices that it knows about.
- Once the original `stonithd` gets all the replies, it asks the most appropriate `stonithd` peer to actually carry out the fencing. It may send out more than one such request if the target node must be fenced with multiple devices.
- The chosen `stonithd`(s) call the appropriate fencing resource agent(s) to do the fencing, then reply to the original `stonithd` with the result.
- The original `stonithd` broadcasts the result to all `stonithd` instances.
- Each `stonithd` sends the result to each of its local clients (including, at some point, the initiator).
A fencing request can be initiated by the cluster or externally, using the libfencing API.
- The cluster always initiates fencing via `crmd/te_actions.c:te_fence_node()` (which calls the `fence()` API). This occurs when a graph synapse contains a `CRM_OP_FENCE` XML operation.
- The main external clients are `stonith_admin` and `stonith-test`.
Highlights of the fencing API:
- `stonith_api_new()` creates and returns a new `stonith_t` object, whose `cmds` member has methods for connect, disconnect, fence, etc.
- The `fence()` method creates and sends a `STONITH_OP_FENCE` XML request with the desired action and target node. Callers do not have to choose or even have any knowledge about particular fencing devices.
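
The "object whose `cmds` member has methods" shape can be illustrated with a small self-contained mock. This is only a sketch of the pattern: every `mock_` name below is hypothetical, the signatures are simplified, and the string built by `mock_fence()` merely stands in for the XML request; none of it is the real libfencing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; NOT the real stonith_t. */
struct mock_stonith;

/* Table of operations, playing the role the text gives to the cmds member. */
struct mock_stonith_cmds {
    int (*connect)(struct mock_stonith *st, const char *client_name);
    int (*disconnect)(struct mock_stonith *st);
    int (*fence)(struct mock_stonith *st, const char *target, const char *action);
};

struct mock_stonith {
    int connected;                 /* nonzero once connect() has succeeded */
    char last_request[128];        /* stand-in for the XML request that was sent */
    const struct mock_stonith_cmds *cmds;
};

static int mock_connect(struct mock_stonith *st, const char *client_name)
{
    (void) client_name;            /* a real client would register this name */
    st->connected = 1;
    return 0;
}

static int mock_disconnect(struct mock_stonith *st)
{
    st->connected = 0;
    return 0;
}

/* The caller names only the target and the action, never a fencing device. */
static int mock_fence(struct mock_stonith *st, const char *target, const char *action)
{
    if (!st->connected) {
        return -1;
    }
    snprintf(st->last_request, sizeof(st->last_request),
             "op=fence target=%s action=%s", target, action);
    return 0;
}

static const struct mock_stonith_cmds mock_cmds = {
    mock_connect, mock_disconnect, mock_fence
};

/* Plays the role of a constructor: zero the object and attach its methods. */
void mock_stonith_init(struct mock_stonith *st)
{
    assert(st != NULL);
    memset(st, 0, sizeof(*st));
    st->cmds = &mock_cmds;
}
```

A caller would initialise the object, call `st.cmds->connect(&st, "example")`, then `st.cmds->fence(&st, "node1", "reboot")`, and finally `st.cmds->disconnect(&st)`. Note that device selection never appears on the caller's side.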
The function calls for a stonith request go something like this as of this writing:
The local `stonithd` receives the client's request via an IPC or messaging layer callback, which calls
- `stonith_command()`, which (for requests) calls
  - `handle_request()`, which (for `STONITH_OP_FENCE` from a client) calls
    - `initiate_remote_stonith_op()`, which creates a `STONITH_OP_QUERY` XML request with the target, desired action, timeout, etc., then broadcasts the operation to the cluster group (i.e. all `stonithd` instances) and starts a timer. The query is broadcast because (1) location constraints might prevent the local node from accessing the stonith device directly, and (2) even if the local node does have direct access, another node might be preferred to carry out the fencing.
Each `stonithd` receives the original `stonithd`'s `STONITH_OP_QUERY` broadcast request via IPC or messaging layer callback, which calls:
- `stonith_command()`, which (for requests) calls
  - `handle_request()`, which (for `STONITH_OP_QUERY` from a peer) calls
    - `stonith_query()`, which calls
      - `get_capable_devices()` with `stonith_query_capable_device_db()` to add device information to an XML reply and send it. (A message is considered a reply if it contains `T_STONITH_REPLY`, which is only set by `stonithd` peers, not clients.)
The original `stonithd` receives all peers' `STONITH_OP_QUERY` replies via IPC or messaging layer callback, which calls:
- `stonith_command()`, which (for replies) calls
  - `handle_reply()`, which (for `STONITH_OP_QUERY`) calls
    - `process_remote_stonith_query()`, which allocates a new query result structure, parses device information into it, and adds it to the operation object. It increments the number of replies received for this operation and compares it against the expected number of replies (i.e. the number of active peers). If this is the last expected reply, it calls
      - `call_remote_stonith()`, which calculates the timeout and sends `STONITH_OP_FENCE` request(s) to carry out the fencing. If the target node has a fencing "topology" (which allows specifications such as "this node can be fenced either with device A, or devices B and C in combination"), it will choose the device(s), and send out as many requests as needed. If it chooses a device, it will choose the peer; a peer is preferred if it has "verified" access to the desired device, meaning that it has the device "running" on it and thus has a monitor operation ensuring reachability.
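
The reply counting and the preference for a "verified" peer can be sketched as follows. This is a simplified model under stated assumptions, not the daemon's code: the `mock_` types and functions are hypothetical, replies are plain structs rather than parsed XML, the array size is arbitrary, and topology and timeouts are ignored.

```c
#include <stddef.h>

#define MOCK_MAX_REPLIES 16        /* arbitrary limit chosen for this sketch */

/* Hypothetical, simplified query reply from one peer. */
struct mock_reply {
    const char *peer;              /* name of the replying node */
    int has_device;                /* peer can reach a device able to fence the target */
    int verified;                  /* device is "running" on that peer (monitored) */
};

/* Hypothetical operation object that collects the replies. */
struct mock_op {
    int expected;                  /* number of active peers */
    int received;                  /* replies recorded so far */
    struct mock_reply replies[MOCK_MAX_REPLIES];
};

void mock_op_init(struct mock_op *op, int expected)
{
    op->expected = expected;
    op->received = 0;
}

/* Prefer a capable peer with verified access; otherwise take the first capable peer. */
const char *mock_choose_peer(const struct mock_op *op)
{
    const char *fallback = NULL;

    for (int i = 0; i < op->received; i++) {
        if (!op->replies[i].has_device) {
            continue;
        }
        if (op->replies[i].verified) {
            return op->replies[i].peer;
        }
        if (fallback == NULL) {
            fallback = op->replies[i].peer;
        }
    }
    return fallback;
}

/*
 * Record one reply. Returns 1 only when this was the last expected reply,
 * in which case *chosen is the selected peer (NULL if no peer is capable).
 * Replies beyond the expected count are ignored.
 */
int mock_add_reply(struct mock_op *op, struct mock_reply reply, const char **chosen)
{
    *chosen = NULL;
    if (op->received >= op->expected || op->received >= MOCK_MAX_REPLIES) {
        return 0;
    }
    op->replies[op->received++] = reply;
    if (op->received < op->expected) {
        return 0;
    }
    *chosen = mock_choose_peer(op);
    return 1;
}
```

With three expected replies, the first two calls return 0 and leave `*chosen` as `NULL`; only the third call triggers the selection, mirroring how the fencing request is sent only once the last expected reply has arrived.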
Each `STONITH_OP_FENCE` request goes something like this as of this writing:
The chosen peer `stonithd` receives the `STONITH_OP_FENCE` request via IPC or messaging layer callback, which calls:
- `stonith_command()`, which (for requests) calls
  - `handle_request()`, which (for `STONITH_OP_FENCE` from a peer) calls
    - `stonith_fence()`, which calls
      - `schedule_stonith_command()` (using the supplied device if `F_STONITH_DEVICE` was set, otherwise the highest-priority capable device obtained via `get_capable_devices()` with `stonith_fence_get_devices_cb()`), which adds the operation to the device's pending operations list and triggers processing.
The chosen peer `stonithd`'s mainloop is triggered and calls
- `stonith_device_dispatch()`, which calls
  - `stonith_device_execute()`, which pops off the next item from the device's pending operations list. If acting as the (internally implemented) watchdog agent, it panics the node; otherwise, it calls
    - `stonith_action_create()` and `stonith_action_execute_async()` to call the fencing agent.
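
The split between scheduling (adding to a device's pending operations list) and execution (popping the next item when the mainloop fires) can be modelled with a small queue. This is a sketch only: the `mock_` names are hypothetical, the fixed-size ring buffer is a simplification, and first-in, first-out order is an assumption that the text above does not state.

```c
#include <stddef.h>

#define MOCK_MAX_PENDING 8         /* arbitrary capacity for this sketch */

/* Hypothetical device with its own queue of pending actions. */
struct mock_device {
    const char *id;
    const char *pending[MOCK_MAX_PENDING];
    size_t head;                   /* index of the next action to execute */
    size_t count;                  /* number of queued actions */
};

void mock_device_init(struct mock_device *dev, const char *id)
{
    dev->id = id;
    dev->head = 0;
    dev->count = 0;
}

/* Scheduling step: queue the action; a real daemon would also trigger the mainloop. */
int mock_schedule(struct mock_device *dev, const char *action)
{
    if (dev->count == MOCK_MAX_PENDING) {
        return -1;                 /* queue full */
    }
    dev->pending[(dev->head + dev->count) % MOCK_MAX_PENDING] = action;
    dev->count++;
    return 0;
}

/* Execution step: pop the next pending action, or return NULL if none is queued. */
const char *mock_execute_next(struct mock_device *dev)
{
    const char *action;

    if (dev->count == 0) {
        return NULL;
    }
    action = dev->pending[dev->head];
    dev->head = (dev->head + 1) % MOCK_MAX_PENDING;
    dev->count--;
    return action;
}
```

Keeping one queue per device means a slow agent call on one device does not have to block requests that target a different device, which is one plausible reason for the per-device pending list.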
The chosen peer `stonithd`'s mainloop is triggered again once the fencing agent returns, and calls
- `stonith_action_async_done()`, which adds the results to an action object, then calls its
  - done callback (`st_child_done()`), which calls `schedule_stonith_command()` for a new device if there are further required actions to execute or if the original action failed, then builds and sends an XML reply to the original `stonithd` (via `stonith_send_async_reply()`), then checks whether any pending actions are the same as the one just executed and merges them if so.
The original `stonithd` receives the `STONITH_OP_FENCE` reply via IPC or messaging layer callback, which calls:
- `stonith_command()`, which (for replies) calls
  - `handle_reply()`, which calls
    - `process_remote_stonith_exec()`, which calls either `call_remote_stonith()` (to retry a failed operation, or to try the next device in a topology if appropriate, which issues a new `STONITH_OP_FENCE` request, proceeding as before) or `remote_op_done()` (if the operation has definitively failed or succeeded).
      - `remote_op_done()` broadcasts the result to all peers.
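
The topology rule quoted earlier ("device A, or devices B and C in combination") together with the "try the next device" behaviour can be sketched as a walk over levels. This is a synchronous toy model with hypothetical `mock_` names; the daemon itself works asynchronously through requests and replies, and `mock_device_a_fails()` is just a sample agent that always fails for device "A".

```c
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_DEVICES 4         /* arbitrary per-level limit for this sketch */

/* Hypothetical topology level: every listed device must succeed. */
struct mock_level {
    const char *devices[MOCK_MAX_DEVICES];
    size_t n_devices;
};

/* Stand-in for calling a fencing agent: returns 0 on success. */
typedef int (*mock_fence_fn)(const char *device);

/* Sample agent for demonstration: device "A" always fails, all others succeed. */
int mock_device_a_fails(const char *device)
{
    return strcmp(device, "A") == 0 ? -1 : 0;
}

/*
 * Try each level in order. A level succeeds only if all of its devices
 * succeed; on any failure, move on to the next level. Returns the index
 * of the level that succeeded, or -1 if every level failed.
 */
int mock_fence_with_topology(const struct mock_level *levels, size_t n_levels,
                             mock_fence_fn fence_fn)
{
    for (size_t l = 0; l < n_levels; l++) {
        size_t d;

        for (d = 0; d < levels[l].n_devices; d++) {
            if (fence_fn(levels[l].devices[d]) != 0) {
                break;             /* this level failed; fall through to the next */
            }
        }
        if (levels[l].n_devices > 0 && d == levels[l].n_devices) {
            return (int) l;
        }
    }
    return -1;
}
```

With levels `{A}` and `{B, C}` and the sample agent, the first level fails and the second succeeds, so the function returns 1; with only the `{A}` level it returns -1, corresponding to a definitive failure.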
Finally, all peers receive the broadcast result and call
- `remote_op_done()`, which sends the result to all local clients.
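
The routing described throughout this walkthrough can be condensed into one decision function. This is a summary sketch, not the daemon's code: `struct mock_msg` and `mock_route()` are hypothetical, messages are reduced to three fields instead of XML, and the function only returns the name of the handler that the text says is reached.

```c
#include <string.h>

/* Hypothetical, simplified message; the real daemon exchanges XML. */
struct mock_msg {
    const char *op;                /* "fence" or "query" in this sketch */
    int is_reply;                  /* stands in for the reply marker set by peers */
    int from_peer;                 /* 0 = local client, 1 = another daemon */
};

/* Return the name of the function that ends up handling the message. */
const char *mock_route(const struct mock_msg *msg)
{
    if (msg->is_reply) {
        /* Reply path: the original daemon processes peers' answers. */
        if (strcmp(msg->op, "query") == 0) {
            return "process_remote_stonith_query";
        }
        if (strcmp(msg->op, "fence") == 0) {
            return "process_remote_stonith_exec";
        }
        return "unhandled";
    }

    /* Request path: behaviour depends on the operation and on who sent it. */
    if (strcmp(msg->op, "fence") == 0) {
        return msg->from_peer ? "stonith_fence" : "initiate_remote_stonith_op";
    }
    if (strcmp(msg->op, "query") == 0 && msg->from_peer) {
        return "stonith_query";
    }
    return "unhandled";            /* cases this walkthrough does not cover */
}
```

The same fence request therefore takes two different paths: from a client it starts a cluster-wide query, while from a peer it schedules the action on a device.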