ovn: Implement basic ARP support for L3 logical routers.
This is sufficient support that an L3 logical router can now transmit
packets to VMs (and other destinations) without having to know the
IP-to-MAC binding in advance.  The details are carefully documented in all
of the appropriate places.

There are several important caveats that need to be fixed before this can
be taken seriously in production.  These are documented in ovn/TODO.  The
most important of these are renewal, expiration, and limiting the size of
the ARP table.

Signed-off-by: Ben Pfaff <[email protected]>
Acked-by: Justin Pettit <[email protected]>
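The commit message lists renewal, expiration, and table size limits as open caveats. As a rough illustration of the last two, here is a minimal, self-contained C sketch of an in-memory IP-to-MAC table that caps its size and ages entries out. It is not how this commit stores bindings (those live in the OVN southbound database), and every name in it (`mb_table`, `mb_put`, `mb_get`, `MB_MAX_ENTRIES`, `MB_MAX_AGE_MS`) is hypothetical. Timestamps are assumed to come from a monotonic millisecond clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_MAX_ENTRIES 4      /* Hypothetical cap, kept tiny for illustration. */
#define MB_MAX_AGE_MS 60000   /* Hypothetical lifetime of a binding. */

struct mb_entry {
    bool used;
    uint32_t ip;              /* IPv4 address, host byte order. */
    uint8_t mac[6];
    uint64_t updated_ms;      /* Time of the last add or refresh. */
};

struct mb_table {
    struct mb_entry entries[MB_MAX_ENTRIES];
};

/* Drops every binding that is at least MB_MAX_AGE_MS old. */
static void
mb_expire(struct mb_table *t, uint64_t now_ms)
{
    for (size_t i = 0; i < MB_MAX_ENTRIES; i++) {
        struct mb_entry *e = &t->entries[i];
        if (e->used && now_ms - e->updated_ms >= MB_MAX_AGE_MS) {
            e->used = false;
        }
    }
}

/* Adds or refreshes the binding 'ip' -> 'mac'.  If the table is full, the
 * least recently updated binding is evicted, so the table never grows. */
static void
mb_put(struct mb_table *t, uint32_t ip, const uint8_t mac[6], uint64_t now_ms)
{
    assert(t != NULL && mac != NULL);
    mb_expire(t, now_ms);

    struct mb_entry *slot = NULL;
    for (size_t i = 0; i < MB_MAX_ENTRIES; i++) {
        struct mb_entry *e = &t->entries[i];
        if (e->used && e->ip == ip) {
            slot = e;               /* Refresh the existing binding. */
            break;
        }
        if (!slot || (slot->used
                      && (!e->used || e->updated_ms < slot->updated_ms))) {
            slot = e;               /* Prefer a free slot, else the oldest. */
        }
    }

    slot->used = true;
    slot->ip = ip;
    memcpy(slot->mac, mac, 6);
    slot->updated_ms = now_ms;
}

/* Returns the MAC bound to 'ip', or NULL if there is no unexpired binding. */
static const uint8_t *
mb_get(const struct mb_table *t, uint32_t ip, uint64_t now_ms)
{
    for (size_t i = 0; i < MB_MAX_ENTRIES; i++) {
        const struct mb_entry *e = &t->entries[i];
        if (e->used && e->ip == ip
            && now_ms - e->updated_ms < MB_MAX_AGE_MS) {
            return e->mac;
        }
    }
    return NULL;
}
```

Evicting the oldest entry is only one possible policy. A real implementation would also need renewal (re-ARPing before a binding expires), which this sketch does not attempt.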
blp committed Mar 12, 2016
1 parent bce7cf4 commit 0bac716
Showing 18 changed files with 1,116 additions and 208 deletions.
60 changes: 13 additions & 47 deletions ovn/TODO
@@ -4,18 +4,6 @@

 ** New OVN logical actions

-*** arp
-
-Generates an ARP packet based on the current IPv4 packet and allows it
-to be processed as part of the current pipeline (and then pop back to
-processing the original IPv4 packet).
-
-TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to
-one per second for a given target. We might need to do this too.
-
-We probably need to buffer the packet that generated the ARP. I don't
-know where to do that.
-
 *** icmp4 { action... }

 Generates an ICMPv4 packet based on the current IPv4 packet and
@@ -60,37 +48,16 @@ the "arp" action, and an action for generating

 ** Dynamic IP to MAC bindings

-Some bindings from IP address to MAC will undoubtedly need to be
-discovered dynamically through ARP requests. It's straightforward
-enough for a logical L3 router to generate ARP requests and forward
-them to the appropriate switch.
-
-It's more difficult to figure out where the reply should be processed
-and stored. It might seem at first that a first-cut implementation
-could just keep track of the binding on the hypervisor that needs to
-know, but that can't happen easily because the VM that sends the reply
-might not be on the same HV as the VM that needs the answer (that is,
-the VM that sent the packet that needs the binding to be resolved) and
-there isn't an easy way for it to know which HV needs the answer.
-
-Thus, the HV that processes the ARP reply (which is unknown when the
-ARP is sent) has to tell all the HVs the binding. The most obvious
-place for this in the OVN_Southbound database.
+OVN has basic support for establishing IP to MAC bindings dynamically,
+using ARP.

-Details need to be worked out, including:
+*** Ratelimiting.

-*** OVN_Southbound schema changes.
+From casual observation, Linux appears to generate at most one ARP per
+second per destination.

-Possibly bindings could be added to the Port_Binding table by adding
-or modifying columns. Another possibility is that another table
-should be added.
-
-*** Logical_Flow representation
-
-It would be really nice to maintain the general-purpose nature of
-logical flows, but these bindings might have to include some
-hard-coded special cases, especially when it comes to the relationship
-with populating the bindings into the OVN_Southbound table.
+This might be supported by adding a new OVN logical action for
+rate-limiting.

 *** Tracking queries

@@ -104,16 +71,15 @@ into the database.
 Something needs to make sure that bindings remain valid and expire
 those that become stale.

-** MTU handling (fragmentation on output)
+One way to do this might be to add some support for time to the
+database server itself.

-** Ratelimiting.
+*** Table size limiting.

-*** ARP.
+The table of MAC bindings must not be allowed to grow unreasonably
+large.

-*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ...
-
-As a point of comparison, Linux doesn't ratelimit TCP resets but I
-think it does everything else.
+** MTU handling (fragmentation on output)

 * ovn-controller

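The new TODO text observes that Linux appears to send at most one ARP per second per destination and suggests a rate-limiting action as one way to support this. The following is a minimal sketch of that per-destination policy in plain C, assuming a small direct-mapped table and a monotonic millisecond clock. It is not an OVN logical action, and the names `arp_rl`, `arp_rl_allow`, `ARP_RL_SLOTS`, and `ARP_RL_INTERVAL_MS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARP_RL_SLOTS 64           /* Hypothetical table size. */
#define ARP_RL_INTERVAL_MS 1000   /* One request per second per target. */

struct arp_rl_slot {
    bool used;
    uint32_t ip;                  /* Target IPv4 address, host byte order. */
    uint64_t last_sent_ms;        /* When the last request was allowed. */
};

struct arp_rl {
    struct arp_rl_slot slots[ARP_RL_SLOTS];
};

/* Returns true if an ARP request for 'ip' may be sent at time 'now_ms' and
 * records the send time.  Returns false if a request for the same target was
 * allowed less than ARP_RL_INTERVAL_MS ago.
 *
 * The table is direct-mapped, so two targets that share a slot overwrite
 * each other.  In that case the limiter fails open (it may allow an extra
 * request) rather than blocking a legitimate one. */
static bool
arp_rl_allow(struct arp_rl *rl, uint32_t ip, uint64_t now_ms)
{
    assert(rl != NULL);

    struct arp_rl_slot *s = &rl->slots[ip % ARP_RL_SLOTS];
    if (s->used && s->ip == ip
        && now_ms - s->last_sent_ms < ARP_RL_INTERVAL_MS) {
        return false;
    }

    s->used = true;
    s->ip = ip;
    s->last_sent_ms = now_ms;
    return true;
}
```

A caller would check `arp_rl_allow()` before generating each ARP request and drop or defer the request when it returns false. The fixed-size table also bounds memory use, at the cost of occasional extra requests when targets collide.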
88 changes: 80 additions & 8 deletions ovn/controller/lflow.c
@@ -193,15 +193,13 @@ is_switch(const struct sbrec_datapath_binding *ldp)

 }

-/* Translates logical flows in the Logical_Flow table in the OVN_SB database
- * into OpenFlow flows. See ovn-architecture(7) for more information. */
-void
-lflow_run(struct controller_ctx *ctx, const struct lport_index *lports,
-          const struct mcgroup_index *mcgroups,
-          const struct hmap *local_datapaths,
-          const struct simap *ct_zones, struct hmap *flow_table)
+/* Adds the logical flows from the Logical_Flow table to 'flow_table'. */
+static void
+add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
+                  const struct mcgroup_index *mcgroups,
+                  const struct hmap *local_datapaths,
+                  const struct simap *ct_zones, struct hmap *flow_table)
 {
-    struct hmap flows = HMAP_INITIALIZER(&flows);
     uint32_t conj_id_ofs = 1;

     const struct sbrec_logical_flow *lflow;
@@ -275,6 +273,7 @@ lflow_run(struct controller_ctx *ctx, const struct lport_index *lports,
             .first_ptable = first_ptable,
             .cur_ltable = lflow->table_id,
             .output_ptable = output_ptable,
+            .arp_ptable = OFTABLE_MAC_BINDING,
         };
         error = actions_parse_string(lflow->actions, &ap, &ofpacts, &prereqs);
         if (error) {
@@ -351,6 +350,79 @@ lflow_run(struct controller_ctx *ctx, const struct lport_index *lports,
     }
 }

+static void
+put_load(const uint8_t *data, size_t len,
+         enum mf_field_id dst, int ofs, int n_bits,
+         struct ofpbuf *ofpacts)
+{
+    struct ofpact_set_field *sf = ofpact_put_SET_FIELD(ofpacts);
+    sf->field = mf_from_id(dst);
+    sf->flow_has_vlan = false;
+
+    bitwise_copy(data, len, 0, &sf->value, sf->field->n_bytes, ofs, n_bits);
+    bitwise_one(&sf->mask, sf->field->n_bytes, ofs, n_bits);
+}
+
+/* Adds an OpenFlow flow to 'flow_table' for each MAC binding in the OVN
+ * southbound database, using 'lports' to resolve logical port names to
+ * numbers. */
+static void
+add_neighbor_flows(struct controller_ctx *ctx,
+                   const struct lport_index *lports, struct hmap *flow_table)
+{
+    struct ofpbuf ofpacts;
+    struct match match;
+    match_init_catchall(&match);
+    ofpbuf_init(&ofpacts, 0);
+
+    const struct sbrec_mac_binding *b;
+    SBREC_MAC_BINDING_FOR_EACH (b, ctx->ovnsb_idl) {
+        const struct sbrec_port_binding *pb
+            = lport_lookup_by_name(lports, b->logical_port);
+        if (!pb) {
+            continue;
+        }
+
+        struct eth_addr mac;
+        if (!eth_addr_from_string(b->mac, &mac)) {
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+            VLOG_WARN_RL(&rl, "bad 'mac' %s", b->mac);
+            continue;
+        }
+
+        ovs_be32 ip;
+        if (!ip_parse(b->ip, &ip)) {
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+            VLOG_WARN_RL(&rl, "bad 'ip' %s", b->ip);
+            continue;
+        }
+
+        match_set_metadata(&match, htonll(pb->datapath->tunnel_key));
+        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, pb->tunnel_key);
+        match_set_reg(&match, 0, ntohl(ip));
+
+        ofpbuf_clear(&ofpacts);
+        put_load(mac.ea, sizeof mac.ea, MFF_ETH_DST, 0, 48, &ofpacts);
+
+        ofctrl_add_flow(flow_table, OFTABLE_MAC_BINDING, 100,
+                        &match, &ofpacts);
+    }
+    ofpbuf_uninit(&ofpacts);
+}
+
+/* Translates logical flows in the Logical_Flow table in the OVN_SB database
+ * into OpenFlow flows. See ovn-architecture(7) for more information. */
+void
+lflow_run(struct controller_ctx *ctx, const struct lport_index *lports,
+          const struct mcgroup_index *mcgroups,
+          const struct hmap *local_datapaths,
+          const struct simap *ct_zones, struct hmap *flow_table)
+{
+    add_logical_flows(ctx, lports, mcgroups, local_datapaths,
+                      ct_zones, flow_table);
+    add_neighbor_flows(ctx, lports, flow_table);
+}
+
 void
 lflow_destroy(void)
 {
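In the lflow.c change, `add_neighbor_flows()` installs one flow per MAC binding in `OFTABLE_MAC_BINDING`. Each flow matches the logical datapath key (metadata), the logical output port key, and the IPv4 address in register 0 in host byte order, then sets the Ethernet destination. The sketch below models that lookup in plain C as a simplified illustration, not OVS code. It uses POSIX `inet_pton()` and `sscanf()` in place of OVS's `ip_parse()` and `eth_addr_from_string()`, and the names `binding_key`, `binding_flow`, `binding_to_flow`, and `lookup_eth_dst` are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct binding_key {
    uint64_t datapath;   /* Logical datapath tunnel key (OpenFlow metadata). */
    uint32_t outport;    /* Logical output port tunnel key. */
    uint32_t ip;         /* Next-hop IPv4 address, host byte order (reg0). */
};

struct binding_flow {
    struct binding_key key;
    uint8_t eth_dst[6];  /* Value the flow loads into eth.dst. */
};

/* Parses "xx:xx:xx:xx:xx:xx" into 'mac'.  Lenient: sscanf() also accepts
 * single-digit octets.  'mac' is written only on success. */
static bool
parse_mac(const char *s, uint8_t mac[6])
{
    unsigned int b[6];
    char extra;
    if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6) {
        return false;
    }
    for (size_t i = 0; i < 6; i++) {
        mac[i] = (uint8_t) b[i];
    }
    return true;
}

/* Builds the flow for one binding.  Returns false, leaving 'flow' untouched,
 * if either string is malformed, mirroring the "continue" paths in
 * add_neighbor_flows(). */
static bool
binding_to_flow(uint64_t datapath, uint32_t outport,
                const char *ip_s, const char *mac_s,
                struct binding_flow *flow)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip_s, &addr) != 1
        || !parse_mac(mac_s, flow->eth_dst)) {
        return false;
    }
    flow->key.datapath = datapath;
    flow->key.outport = outport;
    flow->key.ip = ntohl(addr.s_addr);   /* Network to host byte order. */
    return true;
}

/* Returns the Ethernet destination for 'key', or NULL if no flow matches. */
static const uint8_t *
lookup_eth_dst(const struct binding_flow *flows, size_t n,
               const struct binding_key *key)
{
    for (size_t i = 0; i < n; i++) {
        const struct binding_key *k = &flows[i].key;
        if (k->datapath == key->datapath && k->outport == key->outport
            && k->ip == key->ip) {
            return flows[i].eth_dst;
        }
    }
    return NULL;
}
```

The linear scan stands in for the switch's flow table lookup. A miss here corresponds to a packet whose next hop has no binding yet, the case where the router needs to send an ARP request.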
1 change: 1 addition & 0 deletions ovn/controller/lflow.h
@@ -53,6 +53,7 @@ struct uuid;
 #define OFTABLE_DROP_LOOPBACK 34
 #define OFTABLE_LOG_EGRESS_PIPELINE 48 /* First of LOG_PIPELINE_LEN tables. */
 #define OFTABLE_LOG_TO_PHY 64
+#define OFTABLE_MAC_BINDING 65

 /* The number of tables for the ingress and egress pipelines. */
 #define LOG_PIPELINE_LEN 16
9 changes: 4 additions & 5 deletions ovn/controller/ovn-controller.c
@@ -302,7 +302,7 @@ main(int argc, char *argv[])

             enum mf_field_id mff_ovn_geneve = ofctrl_run(br_int);

-            pinctrl_run(br_int);
+            pinctrl_run(&ctx, &lports, br_int);

             struct hmap flow_table = HMAP_INITIALIZER(&flow_table);
             lflow_run(&ctx, &lports, &mcgroups, &local_datapaths,
@@ -332,13 +332,12 @@ main(int argc, char *argv[])
             poll_immediate_wake();
         }

-        ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
-        ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
-
         if (br_int) {
             ofctrl_wait();
-            pinctrl_wait();
+            pinctrl_wait(&ctx);
         }
+        ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
+        ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
         poll_block();
         if (should_service_stop()) {
             exiting = true;