The Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing, replicated storage solution that mirrors the content of block devices (hard disks, partitions, logical volumes, etc.) between servers.
In single-primary mode, any resource is, at any given time, in the primary role on only one cluster member. Since it is thus guaranteed that only one cluster node manipulates the data at any moment, this mode can be used with any conventional file system (ext3, ext4, XFS etc.).
In dual-primary mode, any resource is, at any given time, in the primary role on both cluster nodes. Since concurrent access to the data is thus possible, this mode requires the use of a shared cluster file system that utilizes a distributed lock manager.
With a standard filesystem (Ext3/4, XFS, ...), the following situation could arise: imagine a DRBD resource with an Ext4 filesystem on it. DRBD is in dual-primary mode, and the Ext4 filesystem is mounted on both cluster nodes. Application A writes to the filesystem residing on the DRBD resource on node A, and the data is written to the physical storage device. At the very same time, application B writes to the filesystem on node B, and its data lands in exactly the same region of node B's storage device.
DRBD replicates the changes from node A to node B and the other way around, changing the contents of the physical storage devices. However, because DRBD sits below the Ext4 filesystems, the filesystem on node A never notices the changes coming from node B (and vice versa). This is a concurrent write. From this point on, the actual content of the storage device differs from what the filesystem on that node thinks it should be.
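If dual-primary mode is genuinely required, that is, together with a cluster-aware file system as described above, it has to be enabled explicitly in the resource configuration. A minimal sketch, assuming DRBD 8.4-style syntax (the resource name r0 matches the examples used later in this guide):

resource r0 {
  net {
    protocol C;
    allow-two-primaries yes;
  }
  # ... on-host sections as shown in the configuration examples below ...
}

Dual-primary operation is normally combined with protocol C (described below), so that a write completes on either node only after both disks have been updated.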
Protocol A.
Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has been placed in the local TCP send buffer.
Protocol C.
Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed.
Protocol D.
Operates like protocol A, with the following differences:
- The system can operate in dual-primary mode with the following properties:
  - Writes propagated to the remote machine are "buffered" in memory (epoch buffers).
  - When a checkpoint is issued by the local machine (currently via an ioctl interface), it is queued up as a P_BARRIER packet (with a checkpoint number).
  - On receiving this special barrier packet, the remote machine acknowledges receipt of all data buffers via P_CHECKPOINT_ACK and then falls through the usual barrier sync code (it creates a new epoch and asynchronously starts flushing the previous epoch buffer to disk).
  - The wait_for_checkpoint_ack ioctl returns success to the caller on receipt of P_CHECKPOINT_ACK. This constitutes a checkpoint commit.
  - If the local machine fails, the remote machine discards its current epoch (the uncommitted checkpoint).
- The system can also operate in the usual primary/secondary mode, in which case the functionality is identical to that of protocol A, i.e. there are no checkpoints, just plain asynchronous replication.
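For the standard protocols, the choice is made per resource in the configuration file. A minimal sketch, assuming DRBD 8.4-style syntax, where the protocol statement lives in the net section (older releases also accept a protocol statement directly in the resource body):

resource r0 {
  net {
    protocol A;   # asynchronous; use "protocol C;" for fully synchronous replication
  }
  # ... remaining resource options as in the examples below ...
}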
After you have installed DRBD, you must set aside a roughly identically sized storage area on both cluster nodes. This will become the lower-level device for your DRBD resource. You may use any type of block device found on your system for this purpose. Typical examples include:
- A hard drive partition (or a full physical hard drive),
- an LVM Logical Volume or any other block device configured by the Linux device-mapper infrastructure.
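For example, if an LVM Logical Volume is to serve as the lower-level device, it could be created along the following lines on both nodes (a sketch only; the volume group name vg0 and the size are assumptions, not part of the setup used below):

# create a 10 GiB logical volume named drbd-backing in volume group vg0
lvcreate --name drbd-backing --size 10G vg0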
For the purpose of this guide, we assume a very simple setup:
- Both hosts have a free (currently unused) partition named /dev/sda7.
- We are using internal meta data.
For the purpose of this guide, we assume a minimal setup in line with the examples given in the previous sections:
resource r0 {
  on alice {
    device    /dev/drbd1;
    disk      /dev/sda7;
    address   10.1.1.31:7789;
    meta-disk internal;
  }
  on bob {
    device    /dev/drbd1;
    disk      /dev/sda7;
    address   10.1.1.32:7789;
    meta-disk internal;
  }
}
This example configures DRBD in the following fashion:
- Our cluster consists of two nodes, alice and bob.
- We have a resource arbitrarily named r0 which uses /dev/sda7 as the lower-level device, and is configured with internal meta data.
- The resource uses TCP port 7789 for its network connections, and binds to the IP addresses 10.1.1.31 and 10.1.1.32, respectively.
In fact, you can use a shorthand notation for the on host sub-sections, too: every option whose values are equal on both hosts may be specified directly in the resource section. Thus, we can further condense the configuration, in comparison with the example cited above:
resource r0 {
  device    /dev/drbd1;
  disk      /dev/sda7;
  meta-disk internal;
  on alice {
    address 10.1.1.31:7789;
  }
  on bob {
    address 10.1.1.32:7789;
  }
}
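Once the configuration file is in place on both nodes, it can be checked and the resource brought up with the drbdadm utility. A rough sketch of the usual steps (command names as in DRBD 8.4; the first three commands are run on both nodes):

drbdadm dump r0              # parse the configuration and print it back, catching syntax errors
drbdadm create-md r0         # initialize the (internal) meta data on /dev/sda7
drbdadm up r0                # attach the backing device and connect to the peer
drbdadm primary --force r0   # on ONE node only: promote it and start the initial synchronization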
In order to use a DRBD resource as the virtual block device, you must add a line like the following to your Xen domU configuration:
disk = [ 'drbd:resource,xvda,w' ]
This example configuration makes the DRBD resource named resource available to the domU as /dev/xvda in read/write mode (w).
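In context, the disk line might sit in a domU configuration file along these lines (a sketch only; the domain name, memory size, boot method and network settings are placeholders, not taken from this guide):

name       = "vm1"
memory     = 512
vcpus      = 1
bootloader = "pygrub"
disk       = [ 'drbd:resource,xvda,w' ]
vif        = [ 'bridge=xenbr0' ]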