Skip to content

Commit

Permalink
ovn-controller: eliminate stall in ofctrl state machine
Browse files Browse the repository at this point in the history
The "ovn -- 2 HVs, 3 LRs connected via LS, static routes"
test case currently exhibits frequent failures. These failures
occur because, at the time that the test packets are sent to
verify forwarding, no flows have been installed in the vswitch
for one of the hypervisors.

The state machine implemented by ofctrl_run() is intended to
iterate as long as progress is being made, either as long as
the state continues to change or as long as packets are being
received.  Unfortunately, the code had a bug: if receiving a
packet caused the state to change, it didn't call the state's
run function again to try to see if it would change the state.
This caused a real problem in the following case:

   1) The state is S_TLV_TABLE_MOD_SENT.
   2) An OFPTYPE_NXT_TLV_TABLE_REPLY message is received.
   3) No event (other than SB probe timer expiration) is expected
      that would unblock poll_block() in the main ovn-controller
      loop.

In such a case, ofctrl_run() would receive the packet and
advance the state, but not call the run function for the new
state, and then leave the state machine paused until the next
event (e.g. a timer event) occurred.

This commit fixes the problem by continuing to iterate the state
machine until the state remains the same and no packet is
received in the same iteration.  Without this fix, around 40
failures are seen out of 100 attempts, with this fix no failures
have been observed in several hundred attempts (using an earlier
version of this patch).

Signed-off-by: Lance Richardson <[email protected]>
[[email protected] refactored for clarity]
Signed-off-by: Ben Pfaff <[email protected]>
Acked-by: Lance Richardson <[email protected]>
  • Loading branch information
hlrichardson authored and blp committed Jul 23, 2016
1 parent a4d0428 commit c8d7a71
Showing 1 changed file with 32 additions and 24 deletions.
56 changes: 32 additions & 24 deletions ovn/controller/ofctrl.c
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
#include "openvswitch/vlog.h"
#include "ovn-controller.h"
#include "ovn/lib/actions.h"
#include "poll-loop.h"
#include "physical.h"
#include "rconn.h"
#include "socket-util.h"
Expand Down Expand Up @@ -409,45 +410,52 @@ ofctrl_run(const struct ovsrec_bridge *br_int)
state = S_NEW;
}

enum ofctrl_state old_state;
do {
old_state = state;
bool progress = true;
for (int i = 0; progress && i < 50; i++) {
/* Allow the state machine to run. */
enum ofctrl_state old_state = state;
switch (state) {
#define STATE(NAME) case NAME: run_##NAME(); break;
STATES
#undef STATE
default:
OVS_NOT_REACHED();
}
} while (state != old_state);

for (int i = 0; state == old_state && i < 50; i++) {
/* Try to process a received packet. */
struct ofpbuf *msg = rconn_recv(swconn);
if (!msg) {
break;
}

const struct ofp_header *oh = msg->data;
enum ofptype type;
enum ofperr error;
if (msg) {
const struct ofp_header *oh = msg->data;
enum ofptype type;
enum ofperr error;

error = ofptype_decode(&type, oh);
if (!error) {
switch (state) {
error = ofptype_decode(&type, oh);
if (!error) {
switch (state) {
#define STATE(NAME) case NAME: recv_##NAME(oh, type); break;
STATES
STATES
#undef STATE
default:
OVS_NOT_REACHED();
default:
OVS_NOT_REACHED();
}
} else {
char *s = ofp_to_string(oh, ntohs(oh->length), 1);
VLOG_WARN("could not decode OpenFlow message (%s): %s",
ofperr_to_string(error), s);
free(s);
}
} else {
char *s = ofp_to_string(oh, ntohs(oh->length), 1);
VLOG_WARN("could not decode OpenFlow message (%s): %s",
ofperr_to_string(error), s);
free(s);

ofpbuf_delete(msg);
}

ofpbuf_delete(msg);
/* If we did some work, plan to go around again. */
progress = old_state != state || msg;
}
if (progress) {
/* We bailed out to limit the amount of work we do in one go, to allow
* other code a chance to run. We were still making progress at that
* point, so ensure that we come back again without waiting. */
poll_immediate_wake();
}

return (state == S_CLEAR_FLOWS || state == S_UPDATE_FLOWS
Expand Down

0 comments on commit c8d7a71

Please sign in to comment.