<?xml version="1.0" encoding="utf-8"?>
<database name="ovn-sb" title="OVN Southbound Database">
<p>
This database holds logical and physical configuration and state for the
Open Virtual Network (OVN) system to support virtual network abstraction.
For an introduction to OVN, please see <code>ovn-architecture</code>(7).
</p>
<p>
The OVN Southbound database sits at the center of the OVN
architecture. It is the one component that speaks both southbound
directly to all the hypervisors and gateways, via
<code>ovn-controller</code>/<code>ovn-controller-vtep</code>, and
northbound to the Cloud Management System, via <code>ovn-northd</code>.
</p>
<h2>Database Structure</h2>
<p>
The OVN Southbound database contains classes of data with
different properties, as described in the sections below.
</p>
<h3>Physical Network (PN) data</h3>
<p>
PN tables contain information about the chassis nodes in the system. This
contains all the information necessary to wire the overlay, such as IP
addresses, supported tunnel types, and security keys.
</p>
<p>
The amount of PN data is small (O(n) in the number of chassis) and it
changes infrequently, so it can be replicated to every chassis.
</p>
<p>
The <ref table="Chassis"/> table comprises the PN tables.
</p>
<h3>Logical Network (LN) data</h3>
<p>
LN tables contain the topology of logical switches and routers, ACLs,
firewall rules, and everything needed to describe how packets traverse a
logical network, represented as logical datapath flows (see Logical
Datapath Flows, below).
</p>
<p>
LN data may be large (O(n) in the number of logical ports, ACL rules,
etc.). Thus, to improve scaling, each chassis should receive only data
related to logical networks in which that chassis participates. Past
experience shows that in the presence of large logical networks, even
finer-grained partitioning of data, e.g. designing logical flows so that
only the chassis hosting a logical port needs related flows, pays off
scale-wise. (This is not necessary initially but it is worth bearing in
mind in the design.)
</p>
<p>
The LN is controlled entirely by the cloud management system (CMS) running northbound of OVN.
That CMS determines the entire OVN logical configuration and therefore the
LN's content at any given time is a deterministic function of the CMS's
configuration, although that happens indirectly via the
<ref db="OVN_Northbound"/> database and <code>ovn-northd</code>.
</p>
<p>
LN data is likely to change more quickly than PN data. This is especially
true in a container environment where VMs are created and destroyed (and
therefore added to and deleted from logical switches) quickly.
</p>
<p>
<ref table="Logical_Flow"/> and <ref table="Multicast_Group"/> contain LN
data.
</p>
<h3>Logical-physical bindings</h3>
<p>
These tables link logical and physical components. They show the current
placement of logical components (such as VMs and VIFs) onto chassis, and
map logical entities to the values that represent them in tunnel
encapsulations.
</p>
<p>
These tables change frequently, at least every time a VM powers up or down
or migrates, and especially quickly in a container environment. The
amount of data per VM (or VIF) is small.
</p>
<p>
Each chassis is authoritative about the VMs and VIFs that it hosts at any
given time and can efficiently flood that state to a central location, so
the consistency needs are minimal.
</p>
<p>
The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/> tables
contain binding data.
</p>
<h3>MAC bindings</h3>
<p>
The <ref table="MAC_Binding"/> table tracks the bindings from IP addresses
to Ethernet addresses that are dynamically discovered using ARP (for IPv4)
and neighbor discovery (for IPv6). Usually, IP-to-MAC bindings for virtual
machines are statically populated into the <ref table="Port_Binding"/>
table, so <ref table="MAC_Binding"/> is primarily used to discover bindings
on physical networks.
</p>
<h2>Common Columns</h2>
<p>
Some tables contain a special column named <code>external_ids</code>. This
column has the same form and purpose each place that it appears, so we
describe it here to save space later.
</p>
<dl>
<dt><code>external_ids</code>: map of string-string pairs</dt>
<dd>
Key-value pairs for use by the software that manages the OVN Southbound
database rather than by
<code>ovn-controller</code>/<code>ovn-controller-vtep</code>. In
particular, <code>ovn-northd</code> can use key-value pairs in this
column to relate entities in the southbound database to higher-level
entities (such as entities in the OVN Northbound database). Individual
key-value pairs in this column may be documented in some cases to aid
in understanding and troubleshooting, but the reader should not mistake
such documentation as comprehensive.
</dd>
</dl>
<table name="SB_Global" title="Southbound configuration">
<p>
Southbound configuration for an OVN system. This table must have exactly
one row.
</p>
<group title="Status">
This column allows a client to track the overall configuration state of
the system.
<column name="nb_cfg">
Sequence number for the configuration. When a CMS or
<code>ovn-nbctl</code> updates the northbound database, it increments
the <code>nb_cfg</code> column in the <code>NB_Global</code> table in
the northbound database. In turn, when <code>ovn-northd</code> updates
the southbound database to bring it up to date with these changes, it
updates this column to the same value.
</column>
</group>
<group title="Common Columns">
<column name="external_ids">
See <em>Common Columns</em> at the beginning of this document.
</column>
</group>
<group title="Connection Options">
<column name="connections">
Database clients to which the Open vSwitch database server should
connect or on which it should listen, along with options for how these
connections should be configured. See the <ref table="Connection"/>
table for more information.
</column>
<column name="ssl">
Global SSL configuration.
</column>
</group>
</table>
<table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
<p>
Each row in this table represents a hypervisor or gateway (a chassis) in
the physical network (PN). Each chassis, via
<code>ovn-controller</code>/<code>ovn-controller-vtep</code>, adds
and updates its own row, and keeps a copy of the remaining rows to
determine how to reach other hypervisors.
</p>
<p>
When a chassis shuts down gracefully, it should remove its own row.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether the row is present.) If a chassis
shuts down permanently without removing its row, some kind of manual or
automatic cleanup is eventually needed; we can devise a process for that
as necessary.
</p>
<column name="name">
OVN does not prescribe a particular format for chassis names.
ovn-controller populates this column using <ref key="system-id"
table="Open_vSwitch" column="external_ids" db="Open_vSwitch"/>
in the Open_vSwitch database's <ref table="Open_vSwitch"
db="Open_vSwitch"/> table. ovn-controller-vtep populates this
column with <ref table="Physical_Switch" column="name"
db="hardware_vtep"/> in the hardware_vtep database's
<ref table="Physical_Switch" db="hardware_vtep"/> table.
</column>
<column name="hostname">
The hostname of the chassis, if applicable. ovn-controller will populate
this column with the hostname of the host it is running on.
ovn-controller-vtep will leave this column empty.
</column>
<column name="nb_cfg">
Sequence number for the configuration. When <code>ovn-controller</code>
updates the configuration of a chassis from the contents of the
southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
from the <ref table="SB_Global"/> table into this column.
</column>
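<p>
A client could combine the <code>nb_cfg</code> sequence numbers to check
whether a northbound change has reached every chassis. The following is a
minimal Python sketch, not part of OVN: the helper name and its
plain-integer arguments are assumptions made for illustration.
</p>

```python
def config_propagated(target, sb_nb_cfg, chassis_nb_cfgs):
    """Report whether the southbound database and all chassis caught up.

    Hypothetical helper: target is the NB_Global nb_cfg value of interest,
    sb_nb_cfg is SB_Global.nb_cfg, and chassis_nb_cfgs holds one
    Chassis.nb_cfg value per chassis, all as plain integers.
    """
    if not sb_nb_cfg >= target:
        # ovn-northd has not yet brought the southbound database up to date.
        return False
    # Every chassis must have copied a value at least as new as target.
    return all(cfg >= target for cfg in chassis_nb_cfgs)
```

<p>
With no chassis rows at all, this sketch reports success as soon as the
southbound database itself has caught up.
</p>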
<column name="external_ids" key="ovn-bridge-mappings">
<code>ovn-controller</code> populates this key with the set of bridge
mappings it has been configured to use. Other applications should treat
this key as read-only. See <code>ovn-controller</code>(8) for more
information.
</column>
<column name="external_ids" key="datapath-type">
<code>ovn-controller</code> populates this key with the datapath type
configured in the <ref table="Bridge" column="datapath_type"/> column of
the Open_vSwitch database's <ref table="Bridge" db="Open_vSwitch"/>
table. Other applications should treat this key as read-only. See
<code>ovn-controller</code>(8) for more information.
</column>
<column name="external_ids" key="iface-types">
<code>ovn-controller</code> populates this key with the interface types
configured in the <ref table="Open_vSwitch" column="iface_types"/> column
of the Open_vSwitch database's <ref table="Open_vSwitch"
db="Open_vSwitch"/> table. Other applications should treat this key as
read-only. See <code>ovn-controller</code>(8) for more information.
</column>
<group title="Common Columns">
The overall purpose of these columns is described under <code>Common
Columns</code> at the beginning of this document.
<column name="external_ids"/>
</group>
<group title="Encapsulation Configuration">
<p>
OVN uses encapsulation to transmit logical dataplane packets
between chassis.
</p>
<column name="encaps">
Points to supported encapsulation configurations to transmit
logical dataplane packets to this chassis. Each entry is a <ref
table="Encap"/> record that describes the configuration.
</column>
</group>
<group title="Gateway Configuration">
<p>
A <dfn>gateway</dfn> is a chassis that forwards traffic between the
OVN-managed part of a logical network and a physical VLAN, extending a
tunnel-based logical network into a physical network. Gateways are
typically dedicated nodes that do not host VMs and will be controlled
by <code>ovn-controller-vtep</code>.
</p>
<column name="vtep_logical_switches">
Stores all VTEP logical switch names connected by this gateway
chassis. A <ref table="Port_Binding"/> table entry is associated with
this <ref table="Chassis"/> when its
<ref column="options" table="Port_Binding"/>:<code>vtep-physical-switch</code>
equals this chassis's <ref column="name" table="Chassis"/> and its
<ref column="options" table="Port_Binding"/>:<code>vtep-logical-switch</code>
value appears in
<ref column="vtep_logical_switches" table="Chassis"/>.
</column>
</group>
</table>
<table name="Encap" title="Encapsulation Types">
<p>
The <ref column="encaps" table="Chassis"/> column in the <ref
table="Chassis"/> table refers to rows in this table to identify
how OVN may transmit logical dataplane packets to this chassis.
Each chassis, via <code>ovn-controller</code>(8) or
<code>ovn-controller-vtep</code>(8), adds and updates its own rows
and keeps a copy of the remaining rows to determine how to reach
other chassis.
</p>
<column name="type">
The encapsulation to use to transmit packets to this chassis.
Hypervisors must use either <code>geneve</code> or
<code>stt</code>. Gateways may use <code>vxlan</code>,
<code>geneve</code>, or <code>stt</code>.
</column>
<column name="options">
<p>
Options for configuring the encapsulation. Currently, the only
option that has been defined is <code>csum</code>.
</p>
<p>
<code>csum</code> indicates that encapsulation checksums can be
transmitted and received with reasonable performance. It is a hint
to senders transmitting data to this chassis that they should use
checksums to protect OVN metadata. <code>ovn-controller</code>
populates this key with the value defined in the
<ref table="Open_vSwitch" column="external_ids:ovn-encap-csum"/> column
of the Open_vSwitch database's <ref table="Open_vSwitch"
db="Open_vSwitch"/> table. Other applications should treat this key as
read-only. See <code>ovn-controller</code>(8) for more information.
</p>
<p>
In terms of performance, enabling checksums significantly increases
throughput in most common cases when running on Linux-based hosts
without NICs supporting encapsulation hardware offload (around 60% for
bulk traffic). The reason is that generally all NICs are capable of
offloading transmitted and received TCP/UDP checksums (viewed as
ordinary data packets and not as tunnels). The benefit comes on the
receive side where the validated outer checksum can be used to
additionally validate an inner checksum (such as TCP), which in turn
allows aggregation of packets to be more efficiently handled by the
rest of the stack.
</p>
<p>
Not all devices see such a benefit. The most notable exception is
hardware VTEPs. These devices are designed to not buffer entire
packets in their switching engines and are therefore unable to
efficiently compute or validate full packet checksums. In addition,
certain versions of the Linux kernel are not able to fully take
advantage of encapsulation NIC offloads in the presence of checksums.
(This is actually a pretty narrow corner case though - earlier
versions of Linux don't support encapsulation offloads at all and
later versions support both offloads and checksums well.)
</p>
<p>
<code>csum</code> defaults to <code>false</code> for hardware VTEPs and
<code>true</code> for all other cases.
</p>
</column>
<column name="ip">
The IPv4 address of the encapsulation tunnel endpoint.
</column>
</table>
<table name="Address_Set" title="Address Sets">
<p>
See the documentation for the <ref table="Address_Set"
db="OVN_Northbound"/> table in the <ref db="OVN_Northbound"/> database
for details.
</p>
<column name="name"/>
<column name="addresses"/>
</table>
<table name="Logical_Flow" title="Logical Network Flows">
<p>
Each row in this table represents one logical flow.
<code>ovn-northd</code> populates this table with logical flows
that implement the L2 and L3 topologies specified in the
<ref db="OVN_Northbound"/> database. Each hypervisor, via
<code>ovn-controller</code>, translates the logical flows into
OpenFlow flows specific to its hypervisor and installs them into
Open vSwitch.
</p>
<p>
Logical flows are expressed in an OVN-specific format, described here. A
logical datapath flow is much like an OpenFlow flow, except that the
flows are written in terms of logical ports and logical datapaths instead
of physical ports and physical datapaths. Translation between logical
and physical flows helps to ensure isolation between logical datapaths.
(The logical flow abstraction also allows the OVN centralized
components to do less work, since they do not have to separately
compute and push out physical flows to each chassis.)
</p>
<p>
The default action when no flow matches is to drop packets.
</p>
<p><em>Architectural Logical Life Cycle of a Packet</em></p>
<p>
The following description focuses on the life cycle of a packet through
a logical datapath, ignoring physical details of the implementation.
Please refer to <em>Architectural Physical Life Cycle of a Packet</em> in
<code>ovn-architecture</code>(7) for the physical information.
</p>
<p>
The description here is written as if OVN itself executes these steps,
but in fact OVN (that is, <code>ovn-controller</code>) programs Open
vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
</p>
<p>
At a high level, OVN passes each packet through the logical datapath's
logical ingress pipeline, which may output the packet to one or more
logical ports or logical multicast groups. For each such logical output
port, OVN passes the packet through the datapath's logical egress
pipeline, which may either drop the packet or deliver it to the
destination. Between the two pipelines, outputs to logical multicast
groups are expanded into logical ports, so that the egress pipeline only
processes a single logical output port at a time. Between the two
pipelines is also where, when necessary, OVN encapsulates a packet in a
tunnel (or tunnels) to transmit to remote hypervisors.
</p>
<p>
In more detail, to start, OVN searches the <ref table="Logical_Flow"/>
table for a row with the correct <ref column="logical_datapath"/>, a <ref
column="pipeline"/> of <code>ingress</code>, a <ref column="table_id"/>
of 0, and a <ref column="match"/> that is true for the packet. If none
is found, OVN drops the packet. If OVN finds more than one, it chooses
the match with the highest <ref column="priority"/>. Then OVN executes
each of the actions specified in the row's <ref column="actions"/> column,
in the order specified. Some actions, such as those to modify packet
headers, require no further details. The <code>next</code> and
<code>output</code> actions are special.
</p>
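<p>
The lookup step just described can be sketched in Python as follows. This
is an illustration only, not how <code>ovn-controller</code> is implemented:
the <code>Flow</code> class, the <code>select_flow</code> helper, and the
use of a Python callable in place of a match expression are all
assumptions.
</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Flow:
    # Field names mirror Logical_Flow columns. The match is modeled as a
    # Python callable rather than an OVN match expression (an assumption).
    logical_datapath: str
    pipeline: str
    table_id: int
    priority: int
    match: Callable[[dict], bool]
    actions: str


def select_flow(flows, datapath, pipeline, table_id, packet) -> Optional[Flow]:
    """Return the highest-priority matching flow, or None (packet dropped)."""
    candidates = [
        f for f in flows
        if f.logical_datapath == datapath
        and f.pipeline == pipeline
        and f.table_id == table_id
        and f.match(packet)
    ]
    # Equal-priority ties are undefined in OVN; max() simply picks one.
    return max(candidates, key=lambda f: f.priority, default=None)


# Two illustrative flows in ingress table 0 of a hypothetical datapath "dp1".
flows = [
    Flow("dp1", "ingress", 0, 0, lambda pkt: True, "output;"),
    Flow("dp1", "ingress", 0, 100,
         lambda pkt: pkt.get("eth.type") == 0x800, "next;"),
]
chosen = select_flow(flows, "dp1", "ingress", 0, {"eth.type": 0x800})
```

<p>
Here <code>chosen</code> is the priority-100 flow, because both flows match
the packet and the higher priority wins.
</p>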
<p>
The <code>next</code> action causes the above process to be repeated
recursively, except that OVN searches for <ref column="table_id"/> of 1
instead of 0. Similarly, any <code>next</code> action in a row found in
that table would cause a further search for a <ref column="table_id"/> of
2, and so on. When recursive processing completes, flow control returns
to the action following <code>next</code>.
</p>
<p>
The <code>output</code> action also introduces recursion. Its effect
depends on the current value of the <code>outport</code> field. Suppose
<code>outport</code> designates a logical port. First, OVN compares
<code>inport</code> to <code>outport</code>; if they are equal, it treats
the <code>output</code> as a no-op by default. In the common
case, where they are different, the packet enters the egress
pipeline. This transition to the egress pipeline discards
register data, e.g. <code>reg0</code> ... <code>reg9</code> and
connection tracking state, to achieve uniform behavior regardless
of whether the egress pipeline is on a different hypervisor
(because registers aren't preserved across tunnel encapsulation).
</p>
<p>
To execute the egress pipeline, OVN again searches the <ref
table="Logical_Flow"/> table for a row with the correct <ref
column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
column="match"/> that is true for the packet, but now looking for a <ref
column="pipeline"/> of <code>egress</code>. If no matching row is found,
the output becomes a no-op. Otherwise, OVN executes the actions for the
matching flow (which is chosen from multiple, if necessary, as already
described).
</p>
<p>
In the <code>egress</code> pipeline, the <code>next</code> action acts as
already described, except that it, of course, searches for
<code>egress</code> flows. The <code>output</code> action, however, now
directly outputs the packet to the output port (which is now fixed,
because <code>outport</code> is read-only within the egress pipeline).
</p>
<p>
The description earlier assumed that <code>outport</code> referred to a
logical port. If it instead designates a logical multicast group, then
the description above still applies, with the addition of fan-out from
the logical multicast group to each logical port in the group. For each
member of the group, OVN executes the logical pipeline as described, with
the logical output port replaced by the group member.
</p>
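<p>
The fan-out from a logical multicast group to its member ports can be
pictured with a short Python sketch. The function name, the dictionary
standing in for the <ref table="Multicast_Group"/> table, and the port and
group names are hypothetical.
</p>

```python
def expand_outport(outport, multicast_groups):
    """Return the logical ports for which the egress pipeline runs.

    multicast_groups maps a group name to its member logical ports; it is a
    stand-in for rows of the Multicast_Group table (an assumption).
    """
    if outport in multicast_groups:
        # A group fans out: one egress pipeline run per member port.
        return list(multicast_groups[outport])
    # A plain logical port is processed on its own.
    return [outport]


groups = {"mc1": ["lp1", "lp2", "lp3"]}
```

<p>
Each returned port is then processed by the egress pipeline one at a time,
as described above.
</p>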
<p><em>Pipeline Stages</em></p>
<p>
<code>ovn-northd</code> populates the <ref table="Logical_Flow"/> table
with the logical flows described in detail in <code>ovn-northd</code>(8).
</p>
<column name="logical_datapath">
The logical datapath to which the logical flow belongs.
</column>
<column name="pipeline">
<p>
The primary flows used for deciding on a packet's destination are the
<code>ingress</code> flows. The <code>egress</code> flows implement
ACLs. See <em>Architectural Logical Life Cycle of a Packet</em>, above, for details.
</p>
</column>
<column name="table_id">
The stage in the logical pipeline, analogous to an OpenFlow table number.
</column>
<column name="priority">
The flow's priority. Flows with numerically higher priority take
precedence over those with lower. If two logical datapath flows with the
same priority both match, then the one actually applied to the packet is
undefined.
</column>
<column name="match">
<p>
A matching expression. OVN provides a superset of OpenFlow matching
capabilities, using a syntax similar to Boolean expressions in a
programming language.
</p>
<p>
The most important components of a match expression are
<dfn>comparisons</dfn> between <dfn>symbols</dfn> and
<dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
<code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
0x800</code>. The logical AND operator <code>&amp;&amp;</code> and
logical OR operator <code>||</code> can combine comparisons into a
larger expression.
</p>
<p>
Matching expressions also support parentheses for grouping, the logical
NOT prefix operator <code>!</code>, and literals <code>0</code> and
<code>1</code> to express ``false'' or ``true,'' respectively. The
latter is useful by itself as a catch-all expression that matches every
packet.
</p>
<p>
Match expressions also support a kind of function syntax. The
following functions are supported:
</p>
<dl>
<dt><code>is_chassis_resident(<var>lport</var>)</code></dt>
<dd>
Evaluates to true on a chassis on which logical port <var>lport</var>
(a quoted string) resides, and to false elsewhere. This function was
introduced in OVN 2.7.
</dd>
</dl>
<p><em>Symbols</em></p>
<p>
<em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
type. Integer symbols have a <dfn>width</dfn> in bits.
</p>
<p>
<em>Kinds</em>. There are three kinds of symbols:
</p>
<ul>
<li>
<p>
<dfn>Fields</dfn>. A field symbol represents a packet header or
metadata field. For example, a field
named <code>vlan.tci</code> might represent the VLAN TCI field in a
packet.
</p>
<p>
A field symbol can have integer or string type. Integer fields can
be nominal or ordinal (see <em>Level of Measurement</em>,
below).
</p>
</li>
<li>
<p>
<dfn>Subfields</dfn>. A subfield represents a subset of bits from
a larger field. For example, a field <code>vlan.vid</code> might
be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
are provided for syntactic convenience, because it is always
possible to instead refer to a subset of bits from a field
directly.
</p>
<p>
Only ordinal fields (see <em>Level of Measurement</em>,
below) may have subfields. Subfields are always ordinal.
</p>
</li>
<li>
<p>
<dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
expression. Predicates may be used much like 1-bit fields. For
example, <code>ip4</code> might expand to <code>eth.type ==
0x800</code>. Predicates are provided for syntactic convenience,
because it is always possible to instead specify the underlying
expression directly.
</p>
<p>
A predicate whose expansion refers to any nominal field or
predicate (see <em>Level of Measurement</em>, below) is nominal;
other predicates have Boolean level of measurement.
</p>
</li>
</ul>
<p>
<em>Level of Measurement</em>. See
http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
concept on which this classification is based. There are three
levels:
</p>
<ul>
<li>
<p>
<dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
on a scale. OVN considers a field (or subfield) to be ordinal if
its bits can be examined individually. This is true for the
OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
</p>
<p>
Any use of an ordinal field may specify a single bit or a range of
bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
multicast bit in the Ethernet destination address.
</p>
<p>
OVN supports all the usual arithmetic relations (<code>==</code>,
<code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
<code>&gt;</code>, and <code>&gt;=</code>) on ordinal fields and
their subfields, because OVN can implement these in OpenFlow and
Open vSwitch as collections of bitwise tests.
</p>
</li>
<li>
<p>
<dfn>Nominal</dfn>. In statistics, nominal values cannot be
usefully compared except for equality. OpenFlow
port numbers, Ethernet types, and IP protocols are examples: all of
these are just identifiers assigned arbitrarily with no deeper
meaning. In OpenFlow and Open vSwitch, bits in these fields
generally aren't individually addressable.
</p>
<p>
OVN only supports arithmetic tests for equality on nominal fields,
because OpenFlow and Open vSwitch provide no way for a flow to
efficiently implement other comparisons on them. (A test for
inequality can be sort of built out of two flows with different
priorities, but OVN matching expressions always generate flows with
a single priority.)
</p>
<p>
String fields are always nominal.
</p>
</li>
<li>
<p>
<dfn>Boolean</dfn>. A nominal field that has only two values, 0
and 1, is somewhat exceptional, since it is easy to support both
equality and inequality tests on such a field: either one can be
implemented as a test for 0 or 1.
</p>
<p>
Only predicates (see above) have a Boolean level of measurement.
</p>
<p>
This isn't a standard level of measurement.
</p>
</li>
</ul>
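<p>
The bit-range notation used for ordinal fields, such as
<code>vlan.tci[13..15]</code> or <code>eth.dst[40]</code>, can be made
concrete with a small Python sketch. The <code>bits</code> helper and the
sample values are assumptions chosen for illustration; bit 0 is taken to be
the least significant bit.
</p>

```python
def bits(value, low, high=None):
    """Extract bits low..high (inclusive) from an integer field value."""
    if high is None:
        high = low
    width = high - low + 1
    # Arithmetic equivalent of shifting right by low and masking width bits.
    return (value // 2**low) % 2**width


tci = 0xA064                    # sample VLAN TCI value
pcp = bits(tci, 13, 15)         # vlan.tci[13..15], the PCP field: 5
vid = bits(tci, 0, 11)          # vlan.tci[0..11], the VLAN ID: 100
eth_dst = 0x01005E000001        # sample address 01:00:5e:00:00:01
multicast = bits(eth_dst, 40)   # eth.dst[40], the multicast bit: 1
```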
<p>
<em>Prerequisites</em>. Any symbol can have prerequisites, which are
additional conditions implied by the use of the symbol. For example,
the <code>icmp4.type</code> symbol might have prerequisite
<code>icmp4</code>, which would cause an expression <code>icmp4.type ==
0</code> to be interpreted as <code>icmp4.type == 0 &amp;&amp;
icmp4</code>, which would in turn expand to <code>icmp4.type == 0
&amp;&amp; eth.type == 0x800 &amp;&amp; ip4.proto == 1</code> (assuming
<code>icmp4</code> is a predicate defined as suggested under
<em>Types</em> above).
</p>
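<p>
The prerequisite expansion in this example can be sketched in Python. The
two tables below are hypothetical and contain only the definitions assumed
in the example above; the real symbol definitions live inside OVN. Each
element of the returned list is one operand of the logical AND.
</p>

```python
# Hypothetical tables covering only the example from the text.
PREREQS = {
    "icmp4.type": ["icmp4"],
}
PREDICATES = {
    "icmp4": ["eth.type == 0x800", "ip4.proto == 1"],
}


def expand(comparison, symbol):
    """Return the conjuncts implied by using symbol in comparison."""
    conjuncts = [comparison]
    for prereq in PREREQS.get(symbol, []):
        # A predicate prerequisite is replaced by its own expansion;
        # anything else is kept as written.
        conjuncts.extend(PREDICATES.get(prereq, [prereq]))
    return conjuncts


expanded = expand("icmp4.type == 0", "icmp4.type")
```

<p>
A symbol without prerequisites, such as <code>inport</code> in this sketch,
yields just the original comparison.
</p>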
<p><em>Relational operators</em></p>
<p>
All of the standard relational operators <code>==</code>,
<code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
<code>&gt;</code>, and <code>&gt;=</code> are supported. Nominal
fields support only <code>==</code> and <code>!=</code>, and only in a
positive sense when outer <code>!</code> are taken into account,
e.g. given string field <code>inport</code>, <code>inport ==
"eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
not <code>inport != "eth0"</code>.
</p>
<p>
The implementation of <code>==</code> (or <code>!=</code> when it is
negated) is more efficient than that of the other relational
operators.
</p>
<p><em>Constants</em></p>
<p>
Integer constants may be expressed in decimal, hexadecimal prefixed by
<code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
their standard forms, or Ethernet addresses as colon-separated hex
digits. A constant in any of these forms may be followed by a slash
and a second constant (the mask) in the same form, to form a masked
constant. IPv4 and IPv6 masks may be given as integers, to express
CIDR prefixes.
</p>
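<p>
To see how an integer mask relates to a full mask, the following Python
sketch converts a CIDR prefix into the address/mask form using the standard
<code>ipaddress</code> module. It only illustrates the equivalence between
the two notations; the helper name is an assumption and this is not how OVN
parses constants.
</p>

```python
import ipaddress


def cidr_to_masked(constant):
    """Rewrite an address/prefix-length constant as address/mask."""
    # ip_network() rejects constants with host bits set, so the inputs
    # here are assumed to be network addresses.
    net = ipaddress.ip_network(constant)
    return f"{net.network_address}/{net.netmask}"


v4 = cidr_to_masked("192.168.0.0/24")   # "192.168.0.0/255.255.255.0"
v6 = cidr_to_masked("fe80::/10")        # "fe80::/ffc0::"
```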
<p>
String constants have the same syntax as quoted strings in JSON (thus,
they are Unicode strings).
</p>
<p>
Some operators support sets of constants written inside curly braces
<code>{</code> ... <code>}</code>. Commas between elements of a set,
and after the last element, are optional. With <code>==</code>,
``<code><var>field</var> == { <var>constant1</var>,
<var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
for ``<code><var>field</var> == <var>constant1</var> ||
<var>field</var> == <var>constant2</var> || </code>...''.
Similarly, ``<code><var>field</var> != { <var>constant1</var>,
<var>constant2</var>, </code>...<code> }</code>'' is equivalent to
``<code><var>field</var> != <var>constant1</var> &amp;&amp;
<var>field</var> != <var>constant2</var> &amp;&amp;
</code>...''.
</p>
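<p>
The meaning of the two set forms can be modeled in Python with
<code>any</code> and <code>all</code>. This is a sketch of the semantics
only; the function names are hypothetical and the field value is passed in
as an ordinary Python value.
</p>

```python
def set_eq(value, constants):
    # field == {c1, c2, ...}: true if the field equals any element.
    return any(value == c for c in constants)


def set_ne(value, constants):
    # field != {c1, c2, ...}: true only if the field differs from every
    # element.
    return all(value != c for c in constants)


ports = [80, 443]
```

<p>
For any non-empty set, <code>set_ne</code> is the exact negation of
<code>set_eq</code>, which matches the expansion given above.
</p>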
<p>
You may refer to a set of IPv4, IPv6, or MAC addresses stored in the
<ref table="Address_Set"/> table by its <ref column="name"
table="Address_Set"/>. An <ref table="Address_Set"/> with a name
of <code>set1</code> can be referred to as
<code>$set1</code>.
</p>
<p><em>Miscellaneous</em></p>
<p>
Comparisons may name the symbol or the constant first,
e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both
acceptable.
</p>
<p>
Tests for a range may be expressed using a syntax like <code>1024 &lt;=
tcp.src &lt;= 49151</code>, which is equivalent to <code>1024 &lt;=
tcp.src &amp;&amp; tcp.src &lt;= 49151</code>.
</p>
<p>
For a one-bit field or predicate, a mention of its name is equivalent
to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
is equivalent to <code>vlan.present == 1</code>. The same is true for
one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
technical limitation to implementing the same for ordinal fields of all
widths, but the implementation is expensive enough that the syntax
parser requires writing an explicit comparison against zero to make
mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
against 0 is required.
</p>
<p>
<em>Operator precedence</em> is as shown below, from highest to lowest.
There are two exceptions where parentheses are required even though the
table would suggest that they are not: <code>&&</code> and
<code>||</code> require parentheses when used together, and
<code>!</code> requires parentheses when applied to a relational
expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
&& ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
parentheses are mandatory.
</p>
<ul>
<li><code>()</code></li>
<li><code>== != < <= > >=</code></li>
<li><code>!</code></li>
<li><code>&& ||</code></li>
</ul>
<p>
<em>Comments</em> may be introduced by <code>//</code>, which extends
to the next new-line. Comments within a line may be bracketed by
<code>/*</code> and <code>*/</code>. Multiline comments are not
supported.
</p>
<p><em>Symbols</em></p>
<p>
Most of the symbols below have integer type. Only <code>inport</code>
and <code>outport</code> have string type. <code>inport</code> names a
logical port. Thus, its value is a <ref column="logical_port"/> name
from the <ref table="Port_Binding"/> table. <code>outport</code> may
name a logical port, as <code>inport</code>, or a logical multicast
group defined in the <ref table="Multicast_Group"/> table. For both
symbols, only names within the flow's logical datapath may be used.
</p>
<p>
The <code>reg</code><var>X</var> symbols are 32-bit integers.
The <code>xxreg</code><var>X</var> symbols are 128-bit integers,
which overlay four of the 32-bit registers: <code>xxreg0</code>
overlays <code>reg0</code> through <code>reg3</code>, with
<code>reg0</code> supplying the most-significant bits of
<code>xxreg0</code> and <code>reg3</code> the least-significant.
<code>xxreg1</code> similarly overlays <code>reg4</code> through
<code>reg7</code>.
</p>
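<p>
The overlay can be pictured with the following Python sketch, which packs
four 32-bit values into one 128-bit value with the first register in the
most-significant position. The helper names <code>pack_xxreg</code> and
<code>unpack_xxreg</code> are hypothetical and exist only for this example.
</p>

```python
from typing import List, Sequence

REG_BITS = 32
REG_MASK = (1 << REG_BITS) - 1


def pack_xxreg(regs: Sequence[int]) -> int:
    """Combine four 32-bit values, most-significant first, into 128 bits."""
    if len(regs) != 4:
        raise ValueError("an xxreg overlays exactly four 32-bit registers")
    value = 0
    for reg in regs:
        if not 0 <= reg <= REG_MASK:
            raise ValueError("register value does not fit in 32 bits")
        value = (value << REG_BITS) | reg
    return value


def unpack_xxreg(xxreg: int) -> List[int]:
    """Split a 128-bit value back into four 32-bit values."""
    return [(xxreg >> shift) & REG_MASK for shift in (96, 64, 32, 0)]


# reg0..reg3 = 1, 2, 3, 4 seen through xxreg0.
xxreg0 = pack_xxreg([1, 2, 3, 4])
```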
<ul>
<li><code>reg0</code>...<code>reg9</code></li>
<li><code>xxreg0</code> <code>xxreg1</code></li>
<li><code>inport</code> <code>outport</code></li>
<li><code>flags.loopback</code></li>
<li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
<li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
<li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
<li><code>ip4.src</code> <code>ip4.dst</code></li>
<li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
<li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
<li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
<li><code>udp.src</code> <code>udp.dst</code></li>
<li><code>sctp.src</code> <code>sctp.dst</code></li>
<li><code>icmp4.type</code> <code>icmp4.code</code></li>
<li><code>icmp6.type</code> <code>icmp6.code</code></li>
<li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
<li><code>ct_mark</code> <code>ct_label</code></li>
<li>
<p>
<code>ct_state</code>, which has the following Boolean subfields:
</p>
<ul>
<li><code>ct.new</code>: True for a new flow</li>
<li><code>ct.est</code>: True for an established flow</li>
<li><code>ct.rel</code>: True for a related flow</li>
<li><code>ct.rpl</code>: True for a reply flow</li>
<li><code>ct.inv</code>: True for a connection entry in a bad state</li>
</ul>
<p>
The above subfields of <code>ct_state</code> are initialized by
the <code>ct_next</code> action, described later.
</p>
<ul>
<li>
<code>ct.dnat</code>: True for a packet whose destination IP
address has been changed.
</li>
<li>
<code>ct.snat</code>: True for a packet whose source IP
address has been changed.
</li>
</ul>
<p>
The above subfields of <code>ct_state</code> are initialized by
actions such as <code>ct_dnat</code>, <code>ct_snat</code>, and
<code>ct_lb</code>, described later.
</p>
</li>
</ul>
<p>
The following predicates are supported:
</p>
<ul>
<li><code>eth.bcast</code> expands to <code>eth.dst == ff:ff:ff:ff:ff:ff</code></li>
<li><code>eth.mcast</code> expands to <code>eth.dst[40]</code></li>
<li><code>vlan.present</code> expands to <code>vlan.tci[12]</code></li>
<li><code>ip4</code> expands to <code>eth.type == 0x800</code></li>
<li><code>ip4.mcast</code> expands to <code>ip4.dst[28..31] == 0xe</code></li>
<li><code>ip6</code> expands to <code>eth.type == 0x86dd</code></li>
<li><code>ip</code> expands to <code>ip4 || ip6</code></li>
<li><code>icmp4</code> expands to <code>ip4 && ip.proto == 1</code></li>
<li><code>icmp6</code> expands to <code>ip6 && ip.proto == 58</code></li>
<li><code>icmp</code> expands to <code>icmp4 || icmp6</code></li>
<li><code>ip.is_frag</code> expands to <code>ip.frag[0]</code></li>
<li><code>ip.later_frag</code> expands to <code>ip.frag[1]</code></li>
<li><code>ip.first_frag</code> expands to <code>ip.is_frag && !ip.later_frag</code></li>
<li><code>arp</code> expands to <code>eth.type == 0x806</code></li>
<li><code>nd</code> expands to <code>icmp6.type == {135, 136} && icmp6.code == 0 && ip.ttl == 255</code></li>
<li><code>nd_ns</code> expands to <code>icmp6.type == 135 && icmp6.code == 0 && ip.ttl == 255</code></li>
<li><code>nd_na</code> expands to <code>icmp6.type == 136 && icmp6.code == 0 && ip.ttl == 255</code></li>
<li><code>tcp</code> expands to <code>ip.proto == 6</code></li>
<li><code>udp</code> expands to <code>ip.proto == 17</code></li>
<li><code>sctp</code> expands to <code>ip.proto == 132</code></li>
</ul>
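<p>
Several of these predicates test individual bits or bit ranges of a field.
The Python sketch below illustrates what <code>eth.dst[40]</code> and
<code>ip4.dst[28..31] == 0xe</code> select, assuming that bit 0 is the
least-significant bit of the field. The helpers <code>subfield</code> and
<code>mac_to_int</code> are hypothetical.
</p>

```python
import ipaddress
from typing import Optional


def subfield(value: int, low: int, high: Optional[int] = None) -> int:
    """Extract bits low..high (inclusive) of value; bit 0 is the LSB."""
    if high is None:
        high = low
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def mac_to_int(mac: str) -> int:
    """Convert a colon-separated Ethernet address to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


# eth.mcast: bit 40 is the low-order bit of the first octet.
is_mcast = subfield(mac_to_int("01:00:5e:00:00:01"), 40) == 1

# ip4.mcast: the top four bits of the destination are 0xe (224.0.0.0/4).
is_ip4_mcast = subfield(int(ipaddress.IPv4Address("224.0.0.1")), 28, 31) == 0xE
```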
</column>
<column name="actions">
<p>
Logical datapath actions, to be executed when the logical flow
represented by this row is the highest-priority match.
</p>
<p>
Actions share lexical syntax with the <ref column="match"/> column. An
empty set of actions (or one that contains just white space or
comments), or a set of actions that consists of just
<code>drop;</code>, causes the matched packets to be dropped.
Otherwise, the column should contain a sequence of actions, each
terminated by a semicolon.
</p>
<p>
The following actions are defined:
</p>
<dl>
<dt><code>output;</code></dt>
<dd>
<p>
In the ingress pipeline, this action executes the
<code>egress</code> pipeline as a subroutine. If
<code>outport</code> names a logical port, the egress pipeline
executes once; if it is a multicast group, the egress pipeline runs
once for each logical port in the group.
</p>
<p>
In the egress pipeline, this action performs the actual
output to the <code>outport</code> logical port. (In the egress
pipeline, <code>outport</code> never names a multicast group.)
</p>
<p>
By default, output to the input port is implicitly dropped,
that is, <code>output</code> becomes a no-op if
<code>outport</code> == <code>inport</code>. Occasionally
it may be useful to override this behavior, e.g. to send an
ARP reply to an ARP request; to do so, use
<code>flags.loopback = 1</code> to allow the packet to
"hair-pin" back to the input port.
</p>
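<p>
The rules above can be summarized with a small Python sketch. It is a model
for illustration, not OVN code: <code>egress_ports</code> and the
<code>groups</code> mapping are hypothetical, and applying the input-port
check to each member of a multicast group is an assumption.
</p>

```python
from typing import Dict, List


def egress_ports(inport: str, outport: str,
                 groups: Dict[str, List[str]],
                 loopback: bool = False) -> List[str]:
    """Return the logical ports that would actually receive the packet."""
    # A multicast group expands to its member ports; a plain port to itself.
    candidates = groups.get(outport, [outport])
    # Output back to the input port is dropped unless loopback is set.
    return [port for port in candidates if loopback or port != inport]


groups = {"flood": ["lp1", "lp2", "lp3"]}
unicast = egress_ports("lp1", "lp2", groups)
hairpin_dropped = egress_ports("lp1", "lp1", groups)
hairpin_allowed = egress_ports("lp1", "lp1", groups, loopback=True)
flooded = egress_ports("lp1", "flood", groups)
```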
</dd>
<dt><code>next;</code></dt>
<dt><code>next(<var>table</var>);</code></dt>
<dt><code>next(pipeline=<var>pipeline</var>, table=<var>table</var>);</code></dt>
<dd>
Executes the given logical datapath <var>table</var> in
<var>pipeline</var> as a subroutine. The default <var>table</var> is
just after the current one. If <var>pipeline</var> is specified, it
may be <code>ingress</code> or <code>egress</code>; the default
<var>pipeline</var> is the one currently executing. Actions in the
ingress pipeline may not use <code>next</code> to jump into the
egress pipeline (use <code>output</code> instead), but
transitions in the opposite direction are allowed.
</dd>
<dt><code><var>field</var> = <var>constant</var>;</code></dt>
<dd>
<p>
Sets data or metadata field <var>field</var> to constant value
<var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
logical output port. To set only a subset of bits in a field,
specify a subfield for <var>field</var> or a masked
<var>constant</var>, e.g. one may use <code>vlan.pcp[2] = 1;</code>
or <code>vlan.pcp = 4/4;</code> to set the most significant bit of
the VLAN PCP.
</p>
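<p>
The two forms of partial assignment can be modeled in Python as follows.
This is only a sketch: <code>assign_masked</code> and
<code>assign_subfield</code> are hypothetical names, and bit 0 is assumed to
be the least-significant bit.
</p>

```python
def assign_masked(old: int, value: int, mask: int) -> int:
    """Model 'field = value/mask': replace only the bits set in mask."""
    return (old & ~mask) | (value & mask)


def assign_subfield(old: int, low: int, high: int, value: int) -> int:
    """Model 'field[low..high] = value' as a masked assignment."""
    width = high - low + 1
    mask = ((1 << width) - 1) << low
    return assign_masked(old, value << low, mask)


pcp = 0b001                                   # current 3-bit VLAN PCP
via_mask = assign_masked(pcp, 4, 4)           # vlan.pcp = 4/4;
via_subfield = assign_subfield(pcp, 2, 2, 1)  # vlan.pcp[2] = 1;
```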
<p>
Assigning to a field with prerequisites implicitly adds those
prerequisites to <ref column="match"/>; thus, for example, a flow
that sets <code>tcp.dst</code> applies only to TCP flows,
regardless of whether its <ref column="match"/> mentions any TCP
field.
</p>
<p>
Not all fields are modifiable (e.g. <code>eth.type</code> and
<code>ip.proto</code> are read-only), and not all modifiable fields
may be partially modified (e.g. <code>ip.ttl</code> must be assigned
as a whole). The <code>outport</code> field is modifiable in the
<code>ingress</code> pipeline but not in the <code>egress</code>