Commit
ovn-controller: eliminate stall in ofctrl state machine
The "ovn -- 2 HVs, 3 LRs connected via LS, static routes" test case currently exhibits frequent failures. These failures occur because, at the time that the test packets are sent to verify forwarding, no flows have been installed in the vswitch for one of the hypervisors.

The state machine implemented by ofctrl_run() is intended to iterate as long as progress is being made, that is, as long as the state continues to change or packets are being received. Unfortunately, the code had a bug: if receiving a packet caused the state to change, it didn't call the new state's run function to see whether that would change the state again. This caused a real problem in the following case:

   1) The state is S_TLV_TABLE_MOD_SENT.
   2) An OFPTYPE_NXT_TLV_TABLE_REPLY message is received.
   3) No event (other than SB probe timer expiration) is expected that would unblock poll_block() in the main ovn-controller loop.

In such a case, ofctrl_run() would receive the packet and advance the state, but not call the run function for the new state, and then leave the state machine paused until the next event (e.g. a timer event) occurred.

This commit fixes the problem by continuing to iterate the state machine until, within a single iteration, the state remains the same and no packet is received.

Without this fix, around 40 failures are seen out of 100 attempts. With this fix, no failures have been observed in several hundred attempts (using an earlier version of this patch).

Signed-off-by: Lance Richardson <[email protected]>
[[email protected] refactored for clarity]
Signed-off-by: Ben Pfaff <[email protected]>
Acked-by: Lance Richardson <[email protected]>
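The stall and the fix described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual ofctrl.c code: the three-state machine, `struct fsm`, `run_state()`, `recv_packet()`, `fsm_run_stalling()`, `fsm_run_fixed()`, and the `MAX_ITERATIONS` bound are all hypothetical names invented for illustration. Only the state name S_TLV_TABLE_MOD_SENT comes from the commit message; the transitions out of it are simplified guesses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states. Only S_TLV_TABLE_MOD_SENT is named in
 * the commit message; the other two are stand-ins for "later" states. */
enum state { S_TLV_TABLE_MOD_SENT, S_CLEAR_FLOWS, S_UPDATE_FLOWS };

/* Arbitrary per-call iteration bound, assumed for this sketch. */
#define MAX_ITERATIONS 50

struct fsm {
    enum state state;
    int pending_packets;    /* Replies queued from the (simulated) switch. */
};

/* "Run" function: only S_CLEAR_FLOWS can advance without a packet. */
static void run_state(struct fsm *f)
{
    if (f->state == S_CLEAR_FLOWS) {
        f->state = S_UPDATE_FLOWS;
    }
}

/* "Receive" function: consumes one queued packet, if any.  A packet
 * received in S_TLV_TABLE_MOD_SENT advances the state. */
static bool recv_packet(struct fsm *f)
{
    assert(f->pending_packets >= 0);
    if (f->pending_packets == 0) {
        return false;
    }
    f->pending_packets--;
    if (f->state == S_TLV_TABLE_MOD_SENT) {
        f->state = S_CLEAR_FLOWS;
    }
    return true;
}

/* Stalling variant: once a received packet changes the state, the loop
 * exits without calling the new state's run function. */
static enum state fsm_run_stalling(struct fsm *f)
{
    enum state old;

    do {
        old = f->state;
        run_state(f);
    } while (f->state != old);

    while (f->state == old && recv_packet(f)) {
        continue;
    }
    return f->state;
}

/* Fixed variant: keep iterating until one full iteration neither changes
 * the state nor receives a packet. */
static enum state fsm_run_fixed(struct fsm *f)
{
    bool progress = true;

    for (int i = 0; progress && i < MAX_ITERATIONS; i++) {
        enum state old = f->state;

        run_state(f);
        bool got_packet = recv_packet(f);
        progress = f->state != old || got_packet;
    }
    return f->state;
}
```

In this model, starting from S_TLV_TABLE_MOD_SENT with one queued reply, the stalling variant returns while still in S_CLEAR_FLOWS and would need another wakeup to go further, whereas the fixed variant reaches S_UPDATE_FLOWS within the same call.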