forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbonding.txt
2520 lines (1956 loc) · 101 KB
/
bonding.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Linux Ethernet Bonding Driver HOWTO
Latest update: 23 September 2009
Initial release : Thomas Davis <tadavis at lbl.gov>
Corrections, HA extensions : 2000/10/03-15 :
- Willy Tarreau <willy at meta-x.org>
- Constantine Gavrilov <const-g at xpert.com>
- Chad N. Tindel <ctindel at ieee dot org>
- Janice Girouard <girouard at us dot ibm dot com>
- Jay Vosburgh <fubar at us dot ibm dot com>
Reorganized and updated Feb 2005 by Jay Vosburgh
Added Sysfs information: 2006/04/24
- Mitch Williams <mitch.a.williams at intel.com>
Introduction
============
The Linux bonding driver provides a method for aggregating
multiple network interfaces into a single logical "bonded" interface.
The behavior of the bonded interfaces depends upon the mode; generally
speaking, modes provide either hot standby or load balancing services.
Additionally, link integrity monitoring may be performed.
The bonding driver originally came from Donald Becker's
beowulf patches for kernel 2.0. It has changed quite a bit since, and
the original tools from extreme-linux and beowulf sites will not work
with this version of the driver.
For new versions of the driver, updated userspace tools, and
who to ask for help, please follow the links at the end of this file.
Table of Contents
=================
1. Bonding Driver Installation
2. Bonding Driver Options
3. Configuring Bonding Devices
3.1 Configuration with Sysconfig Support
3.1.1 Using DHCP with Sysconfig
3.1.2 Configuring Multiple Bonds with Sysconfig
3.2 Configuration with Initscripts Support
3.2.1 Using DHCP with Initscripts
3.2.2 Configuring Multiple Bonds with Initscripts
3.3 Configuring Bonding Manually with Ifenslave
3.3.1 Configuring Multiple Bonds Manually
3.4 Configuring Bonding Manually via Sysfs
3.5 Overriding Configuration for Special Cases
4. Querying Bonding Configuration
4.1 Bonding Configuration
4.2 Network Configuration
5. Switch Configuration
6. 802.1q VLAN Support
7. Link Monitoring
7.1 ARP Monitor Operation
7.2 Configuring Multiple ARP Targets
7.3 MII Monitor Operation
8. Potential Trouble Sources
8.1 Adventures in Routing
8.2 Ethernet Device Renaming
8.3 Painfully Slow Or No Failed Link Detection By Miimon
9. SNMP agents
10. Promiscuous mode
11. Configuring Bonding for High Availability
11.1 High Availability in a Single Switch Topology
11.2 High Availability in a Multiple Switch Topology
11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
11.2.2 HA Link Monitoring for Multiple Switch Topology
12. Configuring Bonding for Maximum Throughput
12.1 Maximum Throughput in a Single Switch Topology
12.1.1 MT Bonding Mode Selection for Single Switch Topology
12.1.2 MT Link Monitoring for Single Switch Topology
12.2 Maximum Throughput in a Multiple Switch Topology
12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
12.2.2 MT Link Monitoring for Multiple Switch Topology
13. Switch Behavior Issues
13.1 Link Establishment and Failover Delays
13.2 Duplicated Incoming Packets
14. Hardware Specific Considerations
14.1 IBM BladeCenter
15. Frequently Asked Questions
16. Resources and Links
1. Bonding Driver Installation
==============================
Most popular distro kernels ship with the bonding driver
already available as a module and the ifenslave user level control
program installed and ready for use. If your distro does not, or you
have need to compile bonding from source (e.g., configuring and
installing a mainline kernel from kernel.org), you'll need to perform
the following steps:
1.1 Configure and build the kernel with bonding
-----------------------------------------------
The current version of the bonding driver is available in the
drivers/net/bonding subdirectory of the most recent kernel source
(which is available on http://kernel.org). Most users "rolling their
own" will want to use the most recent kernel from kernel.org.
Configure kernel with "make menuconfig" (or "make xconfig" or
"make config"), then select "Bonding driver support" in the "Network
device support" section. It is recommended that you configure the
driver as module since it is currently the only way to pass parameters
to the driver or configure more than one bonding device.
Build and install the new kernel and modules, then continue
below to install ifenslave.
1.2 Install ifenslave Control Utility
-------------------------------------
The ifenslave user level control program is included in the
kernel source tree, in the file Documentation/networking/ifenslave.c.
It is generally recommended that you use the ifenslave that
corresponds to the kernel that you are using (either from the same
source tree or supplied with the distro), however, ifenslave
executables from older kernels should function (but features newer
than the ifenslave release are not supported). Running an ifenslave
that is newer than the kernel is not supported, and may or may not
work.
To install ifenslave, do the following:
# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave
# cp ifenslave /sbin/ifenslave
If your kernel source is not in "/usr/src/linux," then replace
"/usr/src/linux/include" in the above with the location of your kernel
source include directory.
You may wish to back up any existing /sbin/ifenslave, or, for
testing or informal use, tag the ifenslave to the kernel version
(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10).
IMPORTANT NOTE:
If you omit the "-I" or specify an incorrect directory, you
may end up with an ifenslave that is incompatible with the kernel
you're trying to build it for. Some distros (e.g., Red Hat from 7.1
onwards) do not have /usr/include/linux symbolically linked to the
default kernel source include directory.
SECOND IMPORTANT NOTE:
If you plan to configure bonding using sysfs, you do not need
to use ifenslave.
2. Bonding Driver Options
=========================
Options for the bonding driver are supplied as parameters to the
bonding module at load time, or are specified via sysfs.
Module options may be given as command line arguments to the
insmod or modprobe command, but are usually specified in either the
/etc/modules.conf or /etc/modprobe.conf configuration file, or in a
distro-specific configuration file (some of which are detailed in the next
section).
Details on bonding support for sysfs is provided in the
"Configuring Bonding Manually via Sysfs" section, below.
The available bonding driver parameters are listed below. If a
parameter is not specified the default value is used. When initially
configuring a bond, it is recommended "tail -f /var/log/messages" be
run in a separate window to watch for bonding driver error messages.
It is critical that either the miimon or arp_interval and
arp_ip_target parameters be specified, otherwise serious network
degradation will occur during link failures. Very few devices do not
support at least miimon, so there is really no reason not to use it.
Options with textual values will accept either the text name
or, for backwards compatibility, the option value. E.g.,
"mode=802.3ad" and "mode=4" set the same mode.
The parameters are as follows:
ad_select
Specifies the 802.3ad aggregation selection logic to use. The
possible values and their effects are:
stable or 0
The active aggregator is chosen by largest aggregate
bandwidth.
Reselection of the active aggregator occurs only when all
slaves of the active aggregator are down or the active
aggregator has no slaves.
This is the default value.
bandwidth or 1
The active aggregator is chosen by largest aggregate
bandwidth. Reselection occurs if:
- A slave is added to or removed from the bond
- Any slave's link state changes
- Any slave's 802.3ad association state changes
- The bond's administrative state changes to up
count or 2
The active aggregator is chosen by the largest number of
ports (slaves). Reselection occurs as described under the
"bandwidth" setting, above.
The bandwidth and count selection policies permit failover of
802.3ad aggregations when partial failure of the active aggregator
occurs. This keeps the aggregator with the highest availability
(either in bandwidth or in number of ports) active at all times.
This option was added in bonding version 3.4.0.
arp_interval
Specifies the ARP link monitoring frequency in milliseconds.
The ARP monitor works by periodically checking the slave
devices to determine whether they have sent or received
traffic recently (the precise criteria depends upon the
bonding mode, and the state of the slave). Regular traffic is
generated via ARP probes issued for the addresses specified by
the arp_ip_target option.
This behavior can be modified by the arp_validate option,
below.
If ARP monitoring is used in an etherchannel compatible mode
(modes 0 and 2), the switch should be configured in a mode
that evenly distributes packets across all links. If the
switch is configured to distribute the packets in an XOR
fashion, all replies from the ARP targets will be received on
the same link which could cause the other team members to
fail. ARP monitoring should not be used in conjunction with
miimon. A value of 0 disables ARP monitoring. The default
value is 0.
arp_ip_target
Specifies the IP addresses to use as ARP monitoring peers when
arp_interval is > 0. These are the targets of the ARP request
sent to determine the health of the link to the targets.
Specify these values in ddd.ddd.ddd.ddd format. Multiple IP
addresses must be separated by a comma. At least one IP
address must be given for ARP monitoring to function. The
maximum number of targets that can be specified is 16. The
default value is no IP addresses.
arp_validate
Specifies whether or not ARP probes and replies should be
validated in the active-backup mode. This causes the ARP
monitor to examine the incoming ARP requests and replies, and
only consider a slave to be up if it is receiving the
appropriate ARP traffic.
Possible values are:
none or 0
No validation is performed. This is the default.
active or 1
Validation is performed only for the active slave.
backup or 2
Validation is performed only for backup slaves.
all or 3
Validation is performed for all slaves.
For the active slave, the validation checks ARP replies to
confirm that they were generated by an arp_ip_target. Since
backup slaves do not typically receive these replies, the
validation performed for backup slaves is on the ARP request
sent out via the active slave. It is possible that some
switch or network configurations may result in situations
wherein the backup slaves do not receive the ARP requests; in
such a situation, validation of backup slaves must be
disabled.
This option is useful in network configurations in which
multiple bonding hosts are concurrently issuing ARPs to one or
more targets beyond a common switch. Should the link between
the switch and target fail (but not the switch itself), the
probe traffic generated by the multiple bonding instances will
fool the standard ARP monitor into considering the links as
still up. Use of the arp_validate option can resolve this, as
the ARP monitor will only consider ARP requests and replies
associated with its own instance of bonding.
This option was added in bonding version 3.1.0.
downdelay
Specifies the time, in milliseconds, to wait before disabling
a slave after a link failure has been detected. This option
is only valid for the miimon link monitor. The downdelay
value should be a multiple of the miimon value; if not, it
will be rounded down to the nearest multiple. The default
value is 0.
fail_over_mac
Specifies whether active-backup mode should set all slaves to
the same MAC address at enslavement (the traditional
behavior), or, when enabled, perform special handling of the
bond's MAC address in accordance with the selected policy.
Possible values are:
none or 0
This setting disables fail_over_mac, and causes
bonding to set all slaves of an active-backup bond to
the same MAC address at enslavement time. This is the
default.
active or 1
The "active" fail_over_mac policy indicates that the
MAC address of the bond should always be the MAC
address of the currently active slave. The MAC
address of the slaves is not changed; instead, the MAC
address of the bond changes during a failover.
This policy is useful for devices that cannot ever
alter their MAC address, or for devices that refuse
incoming broadcasts with their own source MAC (which
interferes with the ARP monitor).
The down side of this policy is that every device on
the network must be updated via gratuitous ARP,
vs. just updating a switch or set of switches (which
often takes place for any traffic, not just ARP
traffic, if the switch snoops incoming traffic to
update its tables) for the traditional method. If the
gratuitous ARP is lost, communication may be
disrupted.
When this policy is used in conjuction with the mii
monitor, devices which assert link up prior to being
able to actually transmit and receive are particularly
susceptible to loss of the gratuitous ARP, and an
appropriate updelay setting may be required.
follow or 2
The "follow" fail_over_mac policy causes the MAC
address of the bond to be selected normally (normally
the MAC address of the first slave added to the bond).
However, the second and subsequent slaves are not set
to this MAC address while they are in a backup role; a
slave is programmed with the bond's MAC address at
failover time (and the formerly active slave receives
the newly active slave's MAC address).
This policy is useful for multiport devices that
either become confused or incur a performance penalty
when multiple ports are programmed with the same MAC
address.
The default policy is none, unless the first slave cannot
change its MAC address, in which case the active policy is
selected by default.
This option may be modified via sysfs only when no slaves are
present in the bond.
This option was added in bonding version 3.2.0. The "follow"
policy was added in bonding version 3.3.0.
lacp_rate
Option specifying the rate in which we'll ask our link partner
to transmit LACPDU packets in 802.3ad mode. Possible values
are:
slow or 0
Request partner to transmit LACPDUs every 30 seconds
fast or 1
Request partner to transmit LACPDUs every 1 second
The default is slow.
max_bonds
Specifies the number of bonding devices to create for this
instance of the bonding driver. E.g., if max_bonds is 3, and
the bonding driver is not already loaded, then bond0, bond1
and bond2 will be created. The default value is 1. Specifying
a value of 0 will load bonding, but will not create any devices.
miimon
Specifies the MII link monitoring frequency in milliseconds.
This determines how often the link state of each slave is
inspected for link failures. A value of zero disables MII
link monitoring. A value of 100 is a good starting point.
The use_carrier option, below, affects how the link state is
determined. See the High Availability section for additional
information. The default value is 0.
mode
Specifies one of the bonding policies. The default is
balance-rr (round robin). Possible values are:
balance-rr or 0
Round-robin policy: Transmit packets in sequential
order from the first available slave through the
last. This mode provides load balancing and fault
tolerance.
active-backup or 1
Active-backup policy: Only one slave in the bond is
active. A different slave becomes active if, and only
if, the active slave fails. The bond's MAC address is
externally visible on only one port (network adapter)
to avoid confusing the switch.
In bonding version 2.6.2 or later, when a failover
occurs in active-backup mode, bonding will issue one
or more gratuitous ARPs on the newly active slave.
One gratuitous ARP is issued for the bonding master
interface and each VLAN interfaces configured above
it, provided that the interface has at least one IP
address configured. Gratuitous ARPs issued for VLAN
interfaces are tagged with the appropriate VLAN id.
This mode provides fault tolerance. The primary
option, documented below, affects the behavior of this
mode.
balance-xor or 2
XOR policy: Transmit based on the selected transmit
hash policy. The default policy is a simple [(source
MAC address XOR'd with destination MAC address) modulo
slave count]. Alternate transmit policies may be
selected via the xmit_hash_policy option, described
below.
This mode provides load balancing and fault tolerance.
broadcast or 3
Broadcast policy: transmits everything on all slave
interfaces. This mode provides fault tolerance.
802.3ad or 4
IEEE 802.3ad Dynamic link aggregation. Creates
aggregation groups that share the same speed and
duplex settings. Utilizes all slaves in the active
aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according
to the transmit hash policy, which may be changed from
the default simple XOR policy via the xmit_hash_policy
option, documented below. Note that not all transmit
policies may be 802.3ad compliant, particularly in
regards to the packet mis-ordering requirements of
section 43.2.4 of the 802.3ad standard. Differing
peer implementations will have varying tolerances for
noncompliance.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link
aggregation.
Most switches will require some type of configuration
to enable 802.3ad mode.
balance-tlb or 5
Adaptive transmit load balancing: channel bonding that
does not require any special switch support. The
outgoing traffic is distributed according to the
current load (computed relative to the speed) on each
slave. Incoming traffic is received by the current
slave. If the receiving slave fails, another slave
takes over the MAC address of the failed receiving
slave.
Prerequisite:
Ethtool support in the base drivers for retrieving the
speed of each slave.
balance-alb or 6
Adaptive load balancing: includes balance-tlb plus
receive load balancing (rlb) for IPV4 traffic, and
does not require any special switch support. The
receive load balancing is achieved by ARP negotiation.
The bonding driver intercepts the ARP Replies sent by
the local system on their way out and overwrites the
source hardware address with the unique hardware
address of one of the slaves in the bond such that
different peers use different hardware addresses for
the server.
Receive traffic from connections created by the server
is also balanced. When the local system sends an ARP
Request the bonding driver copies and saves the peer's
IP information from the ARP packet. When the ARP
Reply arrives from the peer, its hardware address is
retrieved and the bonding driver initiates an ARP
reply to this peer assigning it to one of the slaves
in the bond. A problematic outcome of using ARP
negotiation for balancing is that each time that an
ARP request is broadcast it uses the hardware address
of the bond. Hence, peers learn the hardware address
of the bond and the balancing of receive traffic
collapses to the current slave. This is handled by
sending updates (ARP Replies) to all the peers with
their individually assigned hardware address such that
the traffic is redistributed. Receive traffic is also
redistributed when a new slave is added to the bond
and when an inactive slave is re-activated. The
receive load is distributed sequentially (round robin)
among the group of highest speed slaves in the bond.
When a link is reconnected or a new slave joins the
bond the receive traffic is redistributed among all
active slaves in the bond by initiating ARP Replies
with the selected MAC address to each of the
clients. The updelay parameter (detailed below) must
be set to a value equal or greater than the switch's
forwarding delay so that the ARP Replies sent to the
peers will not be blocked by the switch.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed of each slave.
2. Base driver support for setting the hardware
address of a device while it is open. This is
required so that there will always be one slave in the
team using the bond hardware address (the
curr_active_slave) while having a unique hardware
address for each slave in the bond. If the
curr_active_slave fails its hardware address is
swapped with the new curr_active_slave that was
chosen.
num_grat_arp
Specifies the number of gratuitous ARPs to be issued after a
failover event. One gratuitous ARP is issued immediately after
the failover, subsequent ARPs are sent at a rate of one per link
monitor interval (arp_interval or miimon, whichever is active).
The valid range is 0 - 255; the default value is 1. This option
affects only the active-backup mode. This option was added for
bonding version 3.3.0.
num_unsol_na
Specifies the number of unsolicited IPv6 Neighbor Advertisements
to be issued after a failover event. One unsolicited NA is issued
immediately after the failover.
The valid range is 0 - 255; the default value is 1. This option
affects only the active-backup mode. This option was added for
bonding version 3.4.0.
primary
A string (eth0, eth2, etc) specifying which slave is the
primary device. The specified device will always be the
active slave while it is available. Only when the primary is
off-line will alternate devices be used. This is useful when
one slave is preferred over another, e.g., when one slave has
higher throughput than another.
The primary option is only valid for active-backup mode.
primary_reselect
Specifies the reselection policy for the primary slave. This
affects how the primary slave is chosen to become the active slave
when failure of the active slave or recovery of the primary slave
occurs. This option is designed to prevent flip-flopping between
the primary slave and other slaves. Possible values are:
always or 0 (default)
The primary slave becomes the active slave whenever it
comes back up.
better or 1
The primary slave becomes the active slave when it comes
back up, if the speed and duplex of the primary slave is
better than the speed and duplex of the current active
slave.
failure or 2
The primary slave becomes the active slave only if the
current active slave fails and the primary slave is up.
The primary_reselect setting is ignored in two cases:
If no slaves are active, the first slave to recover is
made the active slave.
When initially enslaved, the primary slave is always made
the active slave.
Changing the primary_reselect policy via sysfs will cause an
immediate selection of the best active slave according to the new
policy. This may or may not result in a change of the active
slave, depending upon the circumstances.
This option was added for bonding version 3.6.0.
updelay
Specifies the time, in milliseconds, to wait before enabling a
slave after a link recovery has been detected. This option is
only valid for the miimon link monitor. The updelay value
should be a multiple of the miimon value; if not, it will be
rounded down to the nearest multiple. The default value is 0.
use_carrier
Specifies whether or not miimon should use MII or ETHTOOL
ioctls vs. netif_carrier_ok() to determine the link
status. The MII or ETHTOOL ioctls are less efficient and
utilize a deprecated calling sequence within the kernel. The
netif_carrier_ok() relies on the device driver to maintain its
state with netif_carrier_on/off; at this writing, most, but
not all, device drivers support this facility.
If bonding insists that the link is up when it should not be,
it may be that your network device driver does not support
netif_carrier_on/off. The default state for netif_carrier is
"carrier on," so if a driver does not support netif_carrier,
it will appear as if the link is always up. In this case,
setting use_carrier to 0 will cause bonding to revert to the
MII / ETHTOOL ioctl method to determine the link state.
A value of 1 enables the use of netif_carrier_ok(), a value of
0 will use the deprecated MII / ETHTOOL ioctls. The default
value is 1.
xmit_hash_policy
Selects the transmit hash policy to use for slave selection in
balance-xor and 802.3ad modes. Possible values are:
layer2
Uses XOR of hardware MAC addresses to generate the
hash. The formula is
(source MAC XOR destination MAC) modulo slave count
This algorithm will place all traffic to a particular
network peer on the same slave.
This algorithm is 802.3ad compliant.
layer2+3
This policy uses a combination of layer2 and layer3
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
generate the hash. The formula is
(((source IP XOR dest IP) AND 0xffff) XOR
( source MAC XOR destination MAC ))
modulo slave count
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
the formula is the same as for the layer2 transmit
hash policy.
This policy is intended to provide a more balanced
distribution of traffic than layer2 alone, especially
in environments where a layer3 gateway device is
required to reach most destinations.
This algorithm is 802.3ad compliant.
layer3+4
This policy uses upper layer protocol information,
when available, to generate the hash. This allows for
traffic to a particular network peer to span multiple
slaves, although a single connection will not span
multiple slaves.
The formula for unfragmented TCP and UDP packets is
((source port XOR dest port) XOR
((source IP XOR dest IP) AND 0xffff)
modulo slave count
For fragmented TCP or UDP packets and all other IP
protocol traffic, the source and destination port
information is omitted. For non-IP traffic, the
formula is the same as for the layer2 transmit hash
policy.
This policy is intended to mimic the behavior of
certain switches, notably Cisco switches with PFC2 as
well as some Foundry and IBM products.
This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
striped across two interfaces. This may result in out
of order delivery. Most traffic types will not meet
this criteria, as TCP rarely fragments traffic, and
most UDP traffic is not involved in extended
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
layer2+3 value was added for bonding version 3.2.2.
3. Configuring Bonding Devices
==============================
You can configure bonding using either your distro's network
initialization scripts, or manually using either ifenslave or the
sysfs interface. Distros generally use one of two packages for the
network initialization scripts: initscripts or sysconfig. Recent
versions of these packages have support for bonding, while older
versions do not.
We will first describe the options for configuring bonding for
distros using versions of initscripts and sysconfig with full or
partial support for bonding, then provide information on enabling
bonding without support from the network initialization scripts (i.e.,
older versions of initscripts or sysconfig).
If you're unsure whether your distro uses sysconfig or
initscripts, or don't know if it's new enough, have no fear.
Determining this is fairly straightforward.
First, issue the command:
$ rpm -qf /sbin/ifup
It will respond with a line of text starting with either
"initscripts" or "sysconfig," followed by some numbers. This is the
package that provides your network initialization scripts.
Next, to determine if your installation supports bonding,
issue the command:
$ grep ifenslave /sbin/ifup
If this returns any matches, then your initscripts or
sysconfig has support for bonding.
3.1 Configuration with Sysconfig Support
----------------------------------------
This section applies to distros using a version of sysconfig
with bonding support, for example, SuSE Linux Enterprise Server 9.
SuSE SLES 9's networking configuration system does support
bonding, however, at this writing, the YaST system configuration
front end does not provide any means to work with bonding devices.
Bonding devices can be managed by hand, however, as follows.
First, if they have not already been configured, configure the
slave devices. On SLES 9, this is most easily done by running the
yast2 sysconfig configuration utility. The goal is for to create an
ifcfg-id file for each slave device. The simplest way to accomplish
this is to configure the devices for DHCP (this is only to get the
file ifcfg-id file created; see below for some issues with DHCP). The
name of the configuration file for each device will be of the form:
ifcfg-id-xx:xx:xx:xx:xx:xx
Where the "xx" portion will be replaced with the digits from
the device's permanent MAC address.
Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
created, it is necessary to edit the configuration files for the slave
devices (the MAC addresses correspond to those of the slave devices).
Before editing, the file will contain multiple lines, and will look
something like this:
BOOTPROTO='dhcp'
STARTMODE='on'
USERCTL='no'
UNIQUE='XNzu.WeZGOGF+4wE'
_nm_name='bus-pci-0001:61:01.0'
Change the BOOTPROTO and STARTMODE lines to the following:
BOOTPROTO='none'
STARTMODE='off'
Do not alter the UNIQUE or _nm_name lines. Remove any other
lines (USERCTL, etc).
Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
it's time to create the configuration file for the bonding device
itself. This file is named ifcfg-bondX, where X is the number of the
bonding device to create, starting at 0. The first such file is
ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig
network configuration system will correctly start multiple instances
of bonding.
The contents of the ifcfg-bondX file is as follows:
BOOTPROTO="static"
BROADCAST="10.0.2.255"
IPADDR="10.0.2.10"
NETMASK="255.255.0.0"
NETWORK="10.0.2.0"
REMOTE_IPADDR=""
STARTMODE="onboot"
BONDING_MASTER="yes"
BONDING_MODULE_OPTS="mode=active-backup miimon=100"
BONDING_SLAVE0="eth0"
BONDING_SLAVE1="bus-pci-0000:06:08.1"
Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
values with the appropriate values for your network.
The STARTMODE specifies when the device is brought online.
The possible values are:
onboot: The device is started at boot time. If you're not
sure, this is probably what you want.
manual: The device is started only when ifup is called
manually. Bonding devices may be configured this
way if you do not wish them to start automatically
at boot for some reason.
hotplug: The device is started by a hotplug event. This is not
a valid choice for a bonding device.
off or ignore: The device configuration is ignored.
The line BONDING_MASTER='yes' indicates that the device is a
bonding master device. The only useful value is "yes."
The contents of BONDING_MODULE_OPTS are supplied to the
instance of the bonding module for this device. Specify the options
for the bonding mode, link monitoring, and so on here. Do not include
the max_bonds bonding parameter; this will confuse the configuration
system if you have multiple bonding devices.
Finally, supply one BONDING_SLAVEn="slave device" for each
slave. where "n" is an increasing value, one for each slave. The
"slave device" is either an interface name, e.g., "eth0", or a device
specifier for the network device. The interface name is easier to
find, but the ethN names are subject to change at boot time if, e.g.,
a device early in the sequence has failed. The device specifiers
(bus-pci-0000:06:08.1 in the example above) specify the physical
network device, and will not change unless the device's bus location
changes (for example, it is moved from one PCI slot to another). The
example above uses one of each type for demonstration purposes; most
configurations will choose one or the other for all slave devices.
When all configuration files have been modified or created,
networking must be restarted for the configuration changes to take
effect. This can be accomplished via the following:
# /etc/init.d/network restart
Note that the network control script (/sbin/ifdown) will
remove the bonding module as part of the network shutdown processing,
so it is not necessary to remove the module by hand if, e.g., the
module parameters have changed.
Also, at this writing, YaST/YaST2 will not manage bonding
devices (they do not show bonding interfaces on its list of network
devices). It is necessary to edit the configuration file by hand to
change the bonding configuration.
Additional general options and details of the ifcfg file
format can be found in an example ifcfg template file:
/etc/sysconfig/network/ifcfg.template
Note that the template does not document the various BONDING_
settings described above, but does describe many of the other options.
3.1.1 Using DHCP with Sysconfig
-------------------------------
Under sysconfig, configuring a device with BOOTPROTO='dhcp'
will cause it to query DHCP for its IP address information. At this
writing, this does not function for bonding devices; the scripts
attempt to obtain the device address from DHCP prior to adding any of
the slave devices. Without active slaves, the DHCP requests are not
sent to the network.
3.1.2 Configuring Multiple Bonds with Sysconfig
-----------------------------------------------
The sysconfig network initialization system is capable of
handling multiple bonding devices. All that is necessary is for each
bonding instance to have an appropriately configured ifcfg-bondX file
(as described above). Do not specify the "max_bonds" parameter to any
instance of bonding, as this will confuse sysconfig. If you require
multiple bonding devices with identical parameters, create multiple
ifcfg-bondX files.
Because the sysconfig scripts supply the bonding module
options in the ifcfg-bondX file, it is not necessary to add them to
the system /etc/modules.conf or /etc/modprobe.conf configuration file.
3.2 Configuration with Initscripts Support
------------------------------------------
This section applies to distros using a recent version of
initscripts with bonding support, for example, Red Hat Enterprise Linux
version 3 or later, Fedora, etc. On these systems, the network
initialization scripts have knowledge of bonding, and can be configured to
control bonding devices. Note that older versions of the initscripts
package have lower levels of support for bonding; this will be noted where
applicable.
These distros will not automatically load the network adapter
driver unless the ethX device is configured with an IP address.
Because of this constraint, users must manually configure a
network-script file for all physical adapters that will be members of
a bondX link. Network script files are located in the directory:
/etc/sysconfig/network-scripts
The file name must be prefixed with "ifcfg-eth" and suffixed
with the adapter's physical adapter number. For example, the script
for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0.
Place the following text in the file:
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
The DEVICE= line will be different for every ethX device and
must correspond with the name of the file, i.e., ifcfg-eth1 must have
a device line of DEVICE=eth1. The setting of the MASTER= line will
also depend on the final bonding interface name chosen for your bond.
As with other network devices, these typically start at 0, and go up
one for each device, i.e., the first bonding instance is bond0, the
second is bond1, and so on.
Next, create a bond network script. The file name for this
script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is