From 06a26dd2b4b80c2c4159eac245c9cc8e5ccb93e6 Mon Sep 17 00:00:00 2001
From: Mickey Spiegel router
and
the connected logical router port specifies a
- redirect-chassis
, the flow is only programmed on the
- redirect-chassis
.
+ redirect-chassis
:
redirect-chassis
.
+ redirect-chassis
.
+ For each dnat_and_snat NAT rule on a distributed
+ router that specifies an external Ethernet address E,
+ a priority-50 flow that matches inport == GW
+ && eth.dst == E, where GW
+ is the logical router gateway port, with action
+ next;.
+
+ This flow is only programmed on the gateway port instance on
+ the chassis where the logical_port specified in
+ the NAT rule resides.
+
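As a rough illustration of the admission flow described above, the sketch below formats the match string for one NAT rule. The helper name `nat_admission_match` and its parameters are hypothetical and not part of ovn-northd; the match layout follows the `ds_put_format()` call for `S_ROUTER_IN_ADMISSION` later in this patch, and the gateway port key is assumed to arrive already quoted, as `json_key` does there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of ovn-northd: formats the match of the
 * priority-50 admission flow into a static buffer.  "gw_json_key" is
 * assumed to be already quoted, like op->json_key in the patch. */
static const char *
nat_admission_match(const char *external_mac, const char *gw_json_key,
                    const char *logical_port)
{
    static char buf[256];
    int n = snprintf(buf, sizeof buf,
                     "eth.dst == %s && inport == %s"
                     " && is_chassis_resident(\"%s\")",
                     external_mac, gw_json_key, logical_port);

    /* Fail loudly rather than return a truncated match. */
    assert(n > 0 && (size_t) n < sizeof buf);
    return buf;
}
```

The `is_chassis_resident()` term is what restricts the flow to the chassis where the NAT rule's `logical_port` resides.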
@@ -928,7 +961,9 @@ output;
ip4.src or ip6.src is any IP
- address owned by the router.
+ address owned by the router, unless the packet was recirculated
+ due to egress loopback as indicated by
+ REGBIT_EGRESS_LOOPBACK.
ip4.src is the broadcast address of any IP network
@@ -1040,6 +1075,50 @@ outport = P;
flags.loopback = 1;
output;
+
+
+ For the gateway port on a distributed logical router with NAT
+ (where one of the logical router ports specifies a
+ redirect-chassis):
+
redirect-chassis. This behavior avoids
+ generation of multiple ARP responses from different chassis,
+ and allows upstream MAC learning to point to the
+ redirect-chassis.
+
+ If the corresponding NAT rule can be handled in a distributed
+ manner, then this flow is only programmed on the gateway port
+ instance where the logical_port specified in the
+ NAT rule resides.
+
+ Some of the actions are different for this case, using the
+ external_mac specified in the NAT rule rather
+ than the gateway port's Ethernet address E:
+
+eth.src = external_mac;
+arp.sha = external_mac;
+
+ This behavior avoids generation of multiple ARP responses
+ from different chassis, and allows upstream MAC learning to
+ point to the correct chassis.
+
+Ingress Table 3: UNSNAT on Gateway Routers
+
@@ -1273,6 +1354,45 @@ icmp4 {
Ingress Table 3: UNSNAT on Distributed Routers
+ +
+ For each configuration in the OVN Northbound database, that asks
+ to change the source IP address of a packet from A to
+ B, a priority-100 flow matches ip &&
+ ip4.dst == B && inport == GW,
+ where GW is the logical router gateway port, with an
+ action ct_snat; next;.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the priority-100 flow above is only programmed on the
+ redirect-chassis.
+
+ For each configuration in the OVN Northbound database, that asks
+ to change the source IP address of a packet from A to
+ B, a priority-50 flow matches ip &&
+ ip4.dst == B with an action
+ REGBIT_NAT_REDIRECT = 1; next;. This flow is for
+ east/west traffic to a NAT destination IPv4 address. By
+ setting the REGBIT_NAT_REDIRECT flag, in the
+ ingress table Gateway Redirect this will trigger a
+ redirect to the instance of the gateway port on the
+ redirect-chassis.
+
+ A priority-0 logical flow with match 1 has actions
+ next;.
+
@@ -1280,6 +1400,9 @@ icmp4 {
be DNATted from a virtual IP address to a real IP address. Packets
in the reverse direction needs to be unDNATed.
+ +Ingress Table 4: DNAT on Gateway Routers
+Ingress Table 4: DNAT on Distributed Routers
+
+ On distributed routers, the DNAT table only handles packets
+ with destination IP address that needs to be DNATted from a
+ virtual IP address to a real IP address. The unDNAT processing
+ in the reverse direction is handled in a separate table in the
+ egress pipeline.
+
+ +
+ For each configuration in the OVN Northbound database, that asks
+ to change the destination IP address of a packet from A to
+ B, a priority-100 flow matches ip &&
+ ip4.dst == B && inport == GW,
+ where GW is the logical router gateway port, with an
+ action ct_dnat(B);.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the priority-100 flow above is only programmed on the
+ redirect-chassis.
+
+ For each configuration in the OVN Northbound database, that asks
+ to change the destination IP address of a packet from A to
+ B, a priority-50 flow matches ip &&
+ ip4.dst == B with an action
+ REGBIT_NAT_REDIRECT = 1; next;. This flow is for
+ east/west traffic to a NAT destination IPv4 address. By
+ setting the REGBIT_NAT_REDIRECT flag, in the
+ ingress table Gateway Redirect this will trigger a
+ redirect to the instance of the gateway port on the
+ redirect-chassis.
+
+ A priority-0 logical flow with match 1 has actions
+ next;.
+
@@ -1367,9 +1537,9 @@ icmp4 {
packet's final destination, unchanged) and advances to the next
table for ARP resolution. It also sets reg1 (or
xxreg1) to the IP address owned by the selected router
- port (Table 7 will generate ARP request, if needed, with
- reg0 as the target protocol address and reg1
- as the source protocol address).
+ port (ingress table ARP Request will generate an ARP
+ request, if needed, with reg0 as the target protocol
+ address and reg1 as the source protocol address).
@@ -1377,6 +1547,16 @@ icmp4 {
+ For distributed logical routers where one of the logical router
+ ports specifies a redirect-chassis, a priority-300
+ logical flow with match REGBIT_NAT_REDIRECT == 1 has
+ actions ip.ttl--; next;. The outport
+ will be set later in the Gateway Redirect table.
+
IPv4 routing table. For each route to IPv4 network N with
@@ -1462,6 +1642,17 @@ next;
+ For distributed logical routers where one of the logical router
+ ports specifies a redirect-chassis, a priority-200
+ logical flow with match REGBIT_NAT_REDIRECT == 1 has
+ actions eth.dst = E; next;, where
+ E is the ethernet address of the router's distributed
+ gateway port.
+
Static MAC bindings. MAC bindings can be known statically based on
@@ -1513,9 +1704,9 @@ next;
Dynamic MAC bindings. These flows resolve MAC-to-IP bindings
that have become known dynamically through ARP or neighbor
- discovery. (The next table will issue an ARP or neighbor
- solicitation request for cases where the binding is not yet
- known.)
+ discovery. (The ingress table ARP Request will
+ issue an ARP or neighbor solicitation request for cases where
+ the binding is not yet known.)
@@ -1540,6 +1731,15 @@ next;
REGBIT_NAT_REDIRECT == 1 has actions
+ outport = CR; next;, where CR
+ is the chassisredirect port representing the instance
+ of the logical router distributed gateway port on the
+ redirect-chassis.
+ outport == GW &&
@@ -1552,6 +1752,15 @@ next;
redirect-chassis
.
ip4.src == B &&
+ outport == GW
, where GW is
+ the logical router distributed gateway port, with actions
+ next;
.
+ outport == GW
has actions
@@ -1595,9 +1804,9 @@ arp {
- (Ingress table 4 initialized reg1 with the IP address
- owned by outport and reg0 with the next-hop
- IP address)
+ (Ingress table IP Routing initialized reg1
+ with the IP address owned by outport and
+ reg0 with the next-hop IP address)
@@ -1611,12 +1820,60 @@ arp {
+ This is for already established connections' reverse traffic.
+ i.e., DNAT has already been done in ingress pipeline and now the
+ packet has entered the egress pipeline as part of a reply. For
+ NAT on a distributed router, it is unDNATted here. For Gateway
+ routers, the unDNAT processing is carried out in the ingress DNAT
+ table.
+
+ +
+ For each configuration in the OVN Northbound database that asks
+ to change the destination IP address of a packet from an IP
+ address of A to B, a priority-100 flow
+ matches ip && ip4.src == B
+ && outport == GW, where GW
+ is the logical router gateway port, with an action
+ ct_dnat;.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the priority-100 flow above is only programmed on the
+ redirect-chassis.
+
+ If the NAT rule can be handled in a distributed manner, then
+ there is an additional action
+ eth.src = EA;, where EA
+ is the ethernet address associated with the IP address
+ A in the NAT rule. This allows upstream MAC
+ learning to point to the correct chassis.
+
1
has actions
+ next;
.
+ Packets that are configured to be SNATed get their source IP address changed based on the configuration in the OVN Northbound database.
+ +Egress Table 1: SNAT on Gateway Routers
+
@@ -1650,7 +1907,122 @@ arp {
Egress Table 1: SNAT on Distributed Routers
+ +
+ For each configuration in the OVN Northbound database, that asks
+ to change the source IP address of a packet from an IP address of
+ A or to change the source IP address of a packet that
+ belongs to network A to B, a flow matches
+ ip && ip4.src == A &&
+ outport == GW, where GW is the
+ logical router gateway port, with an action
+ ct_snat(B);. The priority of the flow
+ is calculated based on the mask of A, with matches
+ having larger masks getting higher priorities.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the flow above is only programmed on the
+ redirect-chassis.
+
+ If the NAT rule can be handled in a distributed manner, then
+ there is an additional action
+ eth.src = EA;, where EA
+ is the ethernet address associated with the IP address
+ A in the NAT rule. This allows upstream MAC
+ learning to point to the correct chassis.
+
1
has actions
+ next;
.
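The mask-based priority mentioned above can be sketched as follows. The helper names `popcount32` and `snat_flow_priority` are hypothetical; the arithmetic mirrors the `count_1bits(ntohl(mask)) + 1` expression in the C part of this patch, and the contiguity check is an added assumption, not something the patch enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for count_1bits(): number of set bits in x. */
static unsigned int
popcount32(uint32_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;     /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Hypothetical helper: priority of the egress SNAT flow for a logical
 * network whose netmask, in host byte order, is "mask".  A longer mask
 * yields a higher priority, and the "+ 1" keeps every NAT flow above
 * the priority-0 fallback. */
static unsigned int
snat_flow_priority(uint32_t mask)
{
    uint32_t inv = ~mask;

    /* Assumption of this sketch: the mask is a contiguous prefix. */
    assert((inv & (inv + 1)) == 0);
    return popcount32(mask) + 1;
}
```

Under this scheme a /32 rule gets priority 33 and a /24 rule gets priority 25, so the more specific rule wins when both match.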
+
+ For distributed logical routers where one of the logical router
+ ports specifies a redirect-chassis.
+
+ Earlier in the ingress pipeline, some east-west traffic was
+ redirected to the chassisredirect port, based on
+ flows in the UNSNAT and DNAT ingress
+ tables setting the REGBIT_NAT_REDIRECT flag, which
+ then triggered a match to a flow in the
+ Gateway Redirect ingress table. The intention was
+ not to actually send traffic out the distributed gateway port
+ instance on the redirect-chassis. This traffic was
+ sent to the distributed gateway port instance in order for DNAT
+ and/or SNAT processing to be applied.
+
+ While UNDNAT and SNAT processing have already occurred by this
+ point, this traffic needs to be forced through egress loopback on
+ this distributed gateway port instance, in order for UNSNAT and
+ DNAT processing to be applied, and also for IP routing and ARP
+ resolution after all of the NAT processing, so that the packet can
+ be forwarded to the destination.
+
+ This table has the following flows:
+
+ +
+ For each NAT rule in the OVN Northbound database on a
+ distributed router, a priority-100 logical flow with match
+ ip4.dst == E &&
+ outport == GW, where E is the
+ external IP address specified in the NAT rule, and GW
+ is the logical router distributed gateway port, with the
+ following actions:
+
+clone {
+    ct_clear;
+    inport = outport;
+    outport = "";
+    flags = 0;
+    flags.loopback = 1;
+    reg0 = 0;
+    reg1 = 0;
+    ...
+    reg9 = 0;
+    REGBIT_EGRESS_LOOPBACK = 1;
+    next(pipeline=ingress, table=0);
+};
+
+ flags.loopback is set since in_port is unchanged
+ and the packet may return back to that port after NAT processing.
+ REGBIT_EGRESS_LOOPBACK is set to indicate that
+ egress loopback has occurred, in order to skip the source IP
+ address check against the router address.
+
1
has actions
+ next;
.
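The egress loopback action list above can be assembled roughly as shown below. This is a sketch only: `egress_loopback_actions` and `N_LOG_REGS` are hypothetical names, the register count of 10 is taken from the `reg0` ... `reg9` listing above, and the `reg9[1]` value follows the `REGBIT_EGRESS_LOOPBACK` definition in the C part of this patch. The patch itself builds the string with `ds_put_format()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define N_LOG_REGS 10                       /* Assumed: reg0 through reg9. */
#define REGBIT_EGRESS_LOOPBACK "reg9[1]"    /* As defined in the patch. */

/* Hypothetical helper: returns the clone action string in a static
 * buffer, zeroing every logical register before flagging the loopback. */
static const char *
egress_loopback_actions(void)
{
    static char buf[512];
    size_t used;
    int n;

    n = snprintf(buf, sizeof buf,
                 "clone { ct_clear; inport = outport; outport = \"\"; "
                 "flags = 0; flags.loopback = 1; ");
    assert(n > 0 && (size_t) n < sizeof buf);
    used = (size_t) n;

    for (int i = 0; i < N_LOG_REGS; i++) {
        n = snprintf(buf + used, sizeof buf - used, "reg%d = 0; ", i);
        assert(n > 0 && (size_t) n < sizeof buf - used);
        used += (size_t) n;
    }

    /* The loopback bit is set after the registers are cleared, so it
     * survives into the ingress pipeline. */
    n = snprintf(buf + used, sizeof buf - used,
                 "%s = 1; next(pipeline=ingress, table=0); };",
                 REGBIT_EGRESS_LOOPBACK);
    assert(n > 0 && (size_t) n < sizeof buf - used);
    return buf;
}
```

Ordering matters here: setting the loopback bit before the register-clearing loop would erase it, since the bit lives in `reg9`.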
Packets that reach this table are ready for delivery. It contains
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 5c03b04b36b..a4f76a9d6af 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -28,6 +28,7 @@
 #include "openvswitch/hmap.h"
 #include "openvswitch/json.h"
 #include "ovn/lex.h"
+#include "ovn/lib/logical-fields.h"
 #include "ovn/lib/ovn-dhcp.h"
 #include "ovn/lib/ovn-nb-idl.h"
 #include "ovn/lib/ovn-sb-idl.h"
@@ -136,8 +137,10 @@ enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 8, "lr_in_arp_request") \
     \
     /* Logical router egress stages. */ \
-    PIPELINE_STAGE(ROUTER, OUT, SNAT, 0, "lr_out_snat") \
-    PIPELINE_STAGE(ROUTER, OUT, DELIVERY, 1, "lr_out_delivery")
+    PIPELINE_STAGE(ROUTER, OUT, UNDNAT, 0, "lr_out_undnat") \
+    PIPELINE_STAGE(ROUTER, OUT, SNAT, 1, "lr_out_snat") \
+    PIPELINE_STAGE(ROUTER, OUT, EGR_LOOP, 2, "lr_out_egr_loop") \
+    PIPELINE_STAGE(ROUTER, OUT, DELIVERY, 3, "lr_out_delivery")

 #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME) \
     S_##DP_TYPE##_##PIPELINE##_##STAGE \
@@ -152,11 +155,20 @@ enum ovn_stage {
  * priority to determine the ACL's logical flow priority. */
 #define OVN_ACL_PRI_OFFSET 1000

+/* Register definitions specific to switches. */
 #define REGBIT_CONNTRACK_DEFRAG "reg0[0]"
 #define REGBIT_CONNTRACK_COMMIT "reg0[1]"
 #define REGBIT_CONNTRACK_NAT "reg0[2]"
 #define REGBIT_DHCP_OPTS_RESULT "reg0[3]"

+/* Register definitions for switches and routers. */
+#define REGBIT_NAT_REDIRECT "reg9[0]"
+/* Indicate that this packet has been recirculated using egress
+ * loopback. This allows certain checks to be bypassed, such as a
+ * logical router dropping packets with source IP address equals
+ * one of the logical router's own IP addresses. */
+#define REGBIT_EGRESS_LOOPBACK "reg9[1]"
+
 /* Returns an "enum ovn_stage" built from the arguments. */
 static enum ovn_stage
 ovn_stage_build(enum ovn_datapath_type dp_type, enum ovn_pipeline pipeline,
@@ -3265,6 +3277,33 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_put_format(&actions, "outport = %s; output;", op->json_key);
                 ovn_lflow_add(lflows, op->od, S_SWITCH_IN_L2_LKUP, 50,
                               ds_cstr(&match), ds_cstr(&actions));
+
+                /* Add ethernet addresses specified in NAT rules on
+                 * distributed logical routers. */
+                if (op->peer->od->l3dgw_port
+                    && op->peer == op->peer->od->l3dgw_port) {
+                    for (int i = 0; i < op->peer->od->nbr->n_nat; i++) {
+                        const struct nbrec_nat *nat
+                            = op->peer->od->nbr->nat[i];
+                        if (!strcmp(nat->type, "dnat_and_snat")
+                            && nat->logical_port && nat->external_mac
+                            && eth_addr_from_string(nat->external_mac, &mac)) {
+
+                            ds_clear(&match);
+                            ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT
+                                          " && is_chassis_resident(\"%s\")",
+                                          ETH_ADDR_ARGS(mac),
+                                          nat->logical_port);
+
+                            ds_clear(&actions);
+                            ds_put_format(&actions, "outport = %s; output;",
+                                          op->json_key);
+                            ovn_lflow_add(lflows, op->od, S_SWITCH_IN_L2_LKUP,
+                                          50, ds_cstr(&match),
+                                          ds_cstr(&actions));
+                        }
+                    }
+                }
             } else {
                 static struct vlog_rate_limit rl
                     = VLOG_RATE_LIMIT_INIT(1, 1);
@@ -3794,6 +3833,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         ds_clear(&match);
         ds_put_cstr(&match, "ip4.src == ");
         op_put_v4_networks(&match, op, true);
+        ds_put_cstr(&match, " && "REGBIT_EGRESS_LOOPBACK" == 0");
         ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 100,
                       ds_cstr(&match), "drop;");

@@ -3966,17 +4006,56 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ds_clear(&actions);
             ds_put_format(&actions,
                 "eth.dst = eth.src; "
-                "eth.src = %s; "
                 "arp.op = 2; /* ARP reply */ "
-                "arp.tha = arp.sha; "
-                "arp.sha = %s; "
+                "arp.tha = arp.sha; ");
+
+            if (op->od->l3dgw_port && op == op->od->l3dgw_port) {
+                struct eth_addr mac;
+                if (nat->external_mac &&
+                    eth_addr_from_string(nat->external_mac, &mac)
+                    && nat->logical_port) {
+                    /* distributed NAT case, use nat->external_mac */
+                    ds_put_format(&actions,
+                        "eth.src = "ETH_ADDR_FMT"; "
+                        "arp.sha = "ETH_ADDR_FMT"; ",
+                        ETH_ADDR_ARGS(mac),
+                        ETH_ADDR_ARGS(mac));
+                    /* Traffic with eth.src = nat->external_mac should only be
+                     * sent from the chassis where nat->logical_port is
+                     * resident, so that upstream MAC learning points to the
+                     * correct chassis. Also need to avoid generation of
+                     * multiple ARP responses from different chassis. */
+                    ds_put_format(&match, " && is_chassis_resident(\"%s\")",
+                                  nat->logical_port);
+                } else {
+                    ds_put_format(&actions,
+                        "eth.src = %s; "
+                        "arp.sha = %s; ",
+                        op->lrp_networks.ea_s,
+                        op->lrp_networks.ea_s);
+                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
+                     * should only be sent from the "redirect-chassis", so that
+                     * upstream MAC learning points to the "redirect-chassis".
+                     * Also need to avoid generation of multiple ARP responses
+                     * from different chassis. */
+                    if (op->od->l3redirect_port) {
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      op->od->l3redirect_port->json_key);
+                    }
+                }
+            } else {
+                ds_put_format(&actions,
+                    "eth.src = %s; "
+                    "arp.sha = %s; ",
+                    op->lrp_networks.ea_s,
+                    op->lrp_networks.ea_s);
+            }
+            ds_put_format(&actions,
                 "arp.tpa = arp.spa; "
                 "arp.spa = "IP_FMT"; "
                 "outport = %s; "
                 "flags.loopback = 1; "
                 "output;",
-                op->lrp_networks.ea_s,
-                op->lrp_networks.ea_s,
                 IP_ARGS(ip),
                 op->json_key);
             ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
@@ -4104,7 +4183,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }

-    /* NAT, Defrag and load balancing in Gateway routers. */
+    /* NAT, Defrag and load balancing. */
     HMAP_FOR_EACH (od, key_node, datapaths) {
         if (!od->nbr) {
             continue;
@@ -4115,10 +4194,13 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1", "next;");

-        /* NAT rules, packet defrag and load balancing are only valid on
-         * Gateway routers. */
-        if (!smap_get(&od->nbr->options, "chassis")) {
+        /* NAT rules are only valid on Gateway routers and routers with
+         * l3dgw_port (router has a port with "redirect-chassis"
+         * specified). */
+        if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
             continue;
         }

@@ -4168,6 +4250,23 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 }
             }

+            /* For distributed router NAT, determine whether this NAT rule
+             * satisfies the conditions for distributed NAT processing. */
+            bool distributed = false;
+            struct eth_addr mac;
+            if (od->l3dgw_port && !strcmp(nat->type, "dnat_and_snat") &&
+                nat->logical_port && nat->external_mac) {
+                if (eth_addr_from_string(nat->external_mac, &mac)) {
+                    distributed = true;
+                } else {
+                    static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(5, 1);
+                    VLOG_WARN_RL(&rl, "bad mac %s for dnat in router "
+                        ""UUID_FMT"", nat->external_mac, UUID_ARGS(&od->key));
+                    continue;
+                }
+            }
+
             /* Ingress UNSNAT table: It is for already established connections'
              * reverse traffic. i.e., SNAT has already been done in egress
              * pipeline and now the packet has entered the ingress pipeline as
@@ -4179,10 +4278,41 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
              * egress pipeline. */
             if (!strcmp(nat->type, "snat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
-                ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.dst == %s", nat->external_ip);
-                ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 90,
-                              ds_cstr(&match), "ct_snat; next;");
+                if (!od->l3dgw_port) {
+                    /* Gateway router. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 90,
+                                  ds_cstr(&match), "ct_snat; next;");
+                } else {
+                    /* Distributed router. */
+
+                    /* Traffic received on l3dgw_port is subject to NAT. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s"
+                                          " && inport == %s",
+                                  nat->external_ip,
+                                  od->l3dgw_port->json_key);
+                    if (!distributed && od->l3redirect_port) {
+                        /* Flows for NAT rules that are centralized are only
+                         * programmed on the "redirect-chassis". */
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      od->l3redirect_port->json_key);
+                    }
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
+                                  ds_cstr(&match), "ct_snat;");
+
+                    /* Traffic received on other router ports must be
+                     * redirected to the central instance of the l3dgw_port
+                     * for NAT processing. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 50,
+                                  ds_cstr(&match),
+                                  REGBIT_NAT_REDIRECT" = 1; next;");
+                }
             }

             /* Ingress DNAT table: Packets enter the pipeline with destination
@@ -4190,21 +4320,87 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
              * to a logical IP address. */
             if (!strcmp(nat->type, "dnat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
-                /* Packet when it goes from the initiator to destination.
-                 * We need to zero the inport because the router can
-                 * send the packet back through the same interface. */
+                if (!od->l3dgw_port) {
+                    /* Gateway router. */
+                    /* Packet when it goes from the initiator to destination.
+                     * We need to set flags.loopback because the router can
+                     * send the packet back through the same interface. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ds_clear(&actions);
+                    if (dnat_force_snat_ip) {
+                        /* Indicate to the future tables that a DNAT has taken
+                         * place and a force SNAT needs to be done in the
+                         * Egress SNAT table. */
+                        ds_put_format(&actions,
+                                      "flags.force_snat_for_dnat = 1; ");
+                    }
+                    ds_put_format(&actions, "flags.loopback = 1; ct_dnat(%s);",
+                                  nat->logical_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                } else {
+                    /* Distributed router. */
+
+                    /* Traffic received on l3dgw_port is subject to NAT. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s"
+                                          " && inport == %s",
+                                  nat->external_ip,
+                                  od->l3dgw_port->json_key);
+                    if (!distributed && od->l3redirect_port) {
+                        /* Flows for NAT rules that are centralized are only
+                         * programmed on the "redirect-chassis". */
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      od->l3redirect_port->json_key);
+                    }
+                    ds_clear(&actions);
+                    ds_put_format(&actions, "ct_dnat(%s);",
+                                  nat->logical_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                                  ds_cstr(&match), ds_cstr(&actions));
+
+                    /* Traffic received on other router ports must be
+                     * redirected to the central instance of the l3dgw_port
+                     * for NAT processing. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
+                                  ds_cstr(&match),
+                                  REGBIT_NAT_REDIRECT" = 1; next;");
+                }
+            }
+
+            /* Egress UNDNAT table: It is for already established connections'
+             * reverse traffic. i.e., DNAT has already been done in ingress
+             * pipeline and now the packet has entered the egress pipeline as
+             * part of a reply. We undo the DNAT here.
+             *
+             * Note that this only applies for NAT on a distributed router.
+             * Undo DNAT on a gateway router is done in the ingress DNAT
+             * pipeline stage. */
+            if (od->l3dgw_port && (!strcmp(nat->type, "dnat")
+                || !strcmp(nat->type, "dnat_and_snat"))) {
                 ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.dst == %s", nat->external_ip);
+                ds_put_format(&match, "ip && ip4.src == %s"
+                                      " && outport == %s",
+                              nat->logical_ip,
+                              od->l3dgw_port->json_key);
+                if (!distributed && od->l3redirect_port) {
+                    /* Flows for NAT rules that are centralized are only
+                     * programmed on the "redirect-chassis". */
+                    ds_put_format(&match, " && is_chassis_resident(%s)",
+                                  od->l3redirect_port->json_key);
+                }
                 ds_clear(&actions);
-                if (dnat_force_snat_ip) {
-                    /* Indicate to the future tables that a DNAT has taken
-                     * place and a force SNAT needs to be done in the Egress
-                     * SNAT table. */
-                    ds_put_format(&actions, "flags.force_snat_for_dnat = 1; ");
+                if (distributed) {
+                    ds_put_format(&actions, "eth.src = "ETH_ADDR_FMT"; ",
+                                  ETH_ADDR_ARGS(mac));
                 }
-                ds_put_format(&actions, "flags.loopback = 1; ct_dnat(%s);",
-                              nat->logical_ip);
-                ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                ds_put_format(&actions, "ct_dnat;");
+                ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 100,
                               ds_cstr(&match), ds_cstr(&actions));
             }

@@ -4213,22 +4409,107 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
              * address. */
             if (!strcmp(nat->type, "snat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
+                if (!od->l3dgw_port) {
+                    /* Gateway router. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.src == %s",
+                                  nat->logical_ip);
+                    ds_clear(&actions);
+                    ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
+
+                    /* The priority here is calculated such that the
+                     * nat->logical_ip with the longest mask gets a higher
+                     * priority. */
+                    ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
+                                  count_1bits(ntohl(mask)) + 1,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                } else {
+                    /* Distributed router. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.src == %s"
+                                          " && outport == %s",
+                                  nat->logical_ip,
+                                  od->l3dgw_port->json_key);
+                    if (!distributed && od->l3redirect_port) {
+                        /* Flows for NAT rules that are centralized are only
+                         * programmed on the "redirect-chassis". */
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      od->l3redirect_port->json_key);
+                    }
+                    ds_clear(&actions);
+                    if (distributed) {
+                        ds_put_format(&actions, "eth.src = "ETH_ADDR_FMT"; ",
+                                      ETH_ADDR_ARGS(mac));
+                    }
+                    ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
+
+                    /* The priority here is calculated such that the
+                     * nat->logical_ip with the longest mask gets a higher
+                     * priority. */
+                    ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
+                                  count_1bits(ntohl(mask)) + 1,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                }
+            }
+
+            /* Logical router ingress table 0:
+             * For NAT on a distributed router, add rules allowing
+             * ingress traffic with eth.dst matching nat->external_mac
+             * on the l3dgw_port instance where nat->logical_port is
+             * resident. */
+            if (distributed) {
                 ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.src == %s", nat->logical_ip);
-                ds_clear(&actions);
-                ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
+                ds_put_format(&match,
+                              "eth.dst == "ETH_ADDR_FMT" && inport == %s"
+                              " && is_chassis_resident(\"%s\")",
+                              ETH_ADDR_ARGS(mac),
+                              od->l3dgw_port->json_key,
+                              nat->logical_port);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_ADMISSION, 50,
+                              ds_cstr(&match), "next;");
+            }
+
+            /* Ingress Gateway Redirect Table: For NAT on a distributed
+             * router, add flows that are specific to a NAT rule. These
+             * flows indicate the presence of an applicable NAT rule that
+             * can be applied in a distributed manner. */
+            if (distributed) {
+                ds_clear(&match);
+                ds_put_format(&match, "ip4.src == %s && outport == %s",
+                              nat->logical_ip,
+                              od->l3dgw_port->json_key);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 100,
+                              ds_cstr(&match), "next;");
+            }

-                /* The priority here is calculated such that the
-                 * nat->logical_ip with the longest mask gets a higher
-                 * priority. */
-                ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
-                              count_1bits(ntohl(mask)) + 1,
+            /* Egress Loopback table: For NAT on a distributed router.
+             * If packets in the egress pipeline on the distributed
+             * gateway port have ip.dst matching a NAT external IP, then
+             * loop a clone of the packet back to the beginning of the
+             * ingress pipeline with inport = outport. */
+            if (od->l3dgw_port) {
+                /* Distributed router. */
+                ds_clear(&match);
+                ds_put_format(&match, "ip4.dst == %s && outport == %s",
+                              nat->external_ip,
+                              od->l3dgw_port->json_key);
+                ds_clear(&actions);
+                ds_put_format(&actions,
+                              "clone { ct_clear; "
+                              "inport = outport; outport = \"\"; "
+                              "flags = 0; flags.loopback = 1; ");
+                for (int i = 0; i < MFF_N_LOG_REGS; i++) {
+                    ds_put_format(&actions, "reg%d = 0; ", i);
+                }
+                ds_put_format(&actions, REGBIT_EGRESS_LOOPBACK" = 1; "
+                              "next(pipeline=ingress, table=0); };");
+                ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 100,
                               ds_cstr(&match), ds_cstr(&actions));
             }
         }

         /* Handle force SNAT options set in the gateway router. */
-        if (dnat_force_snat_ip) {
+        if (dnat_force_snat_ip && !od->l3dgw_port) {
             /* If a packet with destination IP address as that of the
              * gateway router (as set in options:dnat_force_snat_ip) is seen,
              * UNSNAT it. */
@@ -4247,7 +4528,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 100,
                           ds_cstr(&match), ds_cstr(&actions));
         }
-        if (lb_force_snat_ip) {
+        if (lb_force_snat_ip && !od->l3dgw_port) {
             /* If a packet with destination IP address as that of the
              * gateway router (as set in options:lb_force_snat_ip) is seen,
              * UNSNAT it. */
@@ -4266,22 +4547,61 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                           ds_cstr(&match), ds_cstr(&actions));
         }

-        /* Re-circulate every packet through the DNAT zone.
-         * This helps with two things.
-         *
-         * 1. Any packet that needs to be unDNATed in the reverse
-         * direction gets unDNATed. Ideally this could be done in
-         * the egress pipeline. But since the gateway router
-         * does not have any feature that depends on the source
-         * ip address being external IP address for IP routing,
-         * we can do it here, saving a future re-circulation.
-         *
-         * 2. Any packet that was sent through SNAT zone in the
-         * previous table automatically gets re-circulated to get
-         * back the new destination IP address that is needed for
-         * routing in the openflow pipeline. */
-        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
-                      "ip", "flags.loopback = 1; ct_dnat;");
+        if (!od->l3dgw_port) {
+            /* For gateway router, re-circulate every packet through
+             * the DNAT zone. This helps with two things.
+             *
+             * 1. Any packet that needs to be unDNATed in the reverse
+             * direction gets unDNATed. Ideally this could be done in
+             * the egress pipeline. But since the gateway router
+             * does not have any feature that depends on the source
+             * ip address being external IP address for IP routing,
+             * we can do it here, saving a future re-circulation.
+             *
+             * 2. Any packet that was sent through SNAT zone in the
+             * previous table automatically gets re-circulated to get
+             * back the new destination IP address that is needed for
+             * routing in the openflow pipeline. */
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
+                          "ip", "flags.loopback = 1; ct_dnat;");
+        } else {
+            /* For NAT on a distributed router, add flows to Ingress
+             * IP Routing table, Ingress ARP Resolution table, and
+             * Ingress Gateway Redirect Table that are not specific to a
+             * NAT rule. */
+
+            /* The highest priority IN_IP_ROUTING rule matches packets
+             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages),
+             * with action "ip.ttl--; next;". The IN_GW_REDIRECT table
+             * will take care of setting the outport. */
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, 300,
+                          REGBIT_NAT_REDIRECT" == 1", "ip.ttl--; next;");
+
+            /* The highest priority IN_ARP_RESOLVE rule matches packets
+             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages),
+             * then sets eth.dst to the distributed gateway port's
+             * ethernet address. */
+            ds_clear(&actions);
+            ds_put_format(&actions, "eth.dst = %s; next;",
+                          od->l3dgw_port->lrp_networks.ea_s);
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_RESOLVE, 200,
+                          REGBIT_NAT_REDIRECT" == 1", ds_cstr(&actions));
+
+            /* The highest priority IN_GW_REDIRECT rule redirects packets
+             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages) to
+             * the central instance of the l3dgw_port for NAT processing. */
+            ds_clear(&actions);
+            ds_put_format(&actions, "outport = %s; next;",
+                          od->l3redirect_port->json_key);
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 200,
+                          REGBIT_NAT_REDIRECT" == 1", ds_cstr(&actions));
+        }
+
+        /* Load balancing and packet defrag are only valid on
+         * Gateway routers. */
+        if (!smap_get(&od->nbr->options, "chassis")) {
+            continue;
+        }

         /* A set to hold all ips that need defragmentation and tracking. */
         struct sset all_ips = SSET_INITIALIZER(&all_ips);
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 5614b16ece4..d8114f1f9de 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -793,11 +793,10 @@
number 13.
-
+ If the connected logical router port has a
+ redirect-chassis specified and the logical router
+ has rules specified in
+ with , then those
+ addresses are also used to populate the switch's destination
+ lookup.
+
Supported only in OVN 2.7 and later. Earlier versions required
router addresses to be manually synchronized.
@@ -927,8 +936,9 @@
redirect-chassis
specified.