docs: workflow for master migration
Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Reviewed-on: http://gerrit.cloudera.org:8080/4300
Reviewed-by: David Ribeiro Alves <[email protected]>
Tested-by: Kudu Jenkins
adembo committed Sep 16, 2016
1 parent 7959cd4 commit 1610b4a
Showing 1 changed file, docs/administration.adoc, with 160 additions and 0 deletions.
WARNING: Although metrics logging automatically rolls and compresses previous log files, it will
not remove old ones. Since metrics logging can use significant amounts of disk space,
consider setting up a system utility to monitor space in the log directory and archive or
delete old segments.

== Common Kudu workflows

=== Migrating to Multiple Kudu Masters

For high availability and to avoid a single point of failure, Kudu clusters should be created with
multiple masters. Many Kudu clusters were created with just a single master, either for simplicity
or because Kudu multi-master support was still experimental at the time. This workflow demonstrates
how to migrate to a multi-master configuration.

WARNING: The workflow is unsafe for adding new masters to an existing multi-master configuration.
Do not use it for that purpose.

WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
using Cloudera Manager (CM), the workflow also presupposes familiarity with it.

WARNING: All of the command line steps below should be executed as the Kudu UNIX user, typically
`kudu`.

==== Prepare for the migration

. Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
will be unavailable.

. Decide how many masters to use. The number of masters should be odd. Configurations of three or
five masters are recommended; they can tolerate one or two failures respectively.
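+
As a rule of thumb (an illustrative aside, not a workflow step): a Raft configuration of N masters
needs a majority of masters alive, so it tolerates `(N - 1) / 2` failures, rounding down. A quick
shell sketch:
+
```shell
# Failures tolerated by an N-master Raft configuration: (N - 1) / 2, integer division.
for n in 1 3 5 7; do
  echo "$n masters tolerate $(( (n - 1) / 2 )) failure(s)"
done
```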

. Perform the following preparatory steps for the existing master:
* Identify and record the directory where the master's data lives. If using Kudu system packages,
the default value is `/var/lib/kudu/master`, but it may be customized via the `fs_wal_dir`
configuration parameter.
* Identify and record the port the master is using for RPCs. The default port value is 7051, but it
may have been customized using the `rpc_bind_addresses` configuration parameter.
* Identify the master's UUID. It can be fetched using the following command:
+
[source,bash]
----
$ kudu fs dump uuid --fs_wal_dir=<master_data_dir> 2>/dev/null
----
master_data_dir:: existing master's previously recorded data directory
+
[source,bash]
.Example
----
$ kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null
4aab798a69e94fab8d77069edff28ce0
$
----
+
* Optional: configure a DNS alias for the master. The alias could be a DNS CNAME (if the machine
already has an A record in DNS), an A record (if the machine is only known by its IP address),
or an alias in `/etc/hosts`. Doing so greatly simplifies recovery from permanent master
failures and is highly recommended. The alias should be an abstract representation of the
master (e.g. `master-1`).

. Perform the following preparatory steps for each new master:
* Choose an unused machine in the cluster. The master generates very little load so it can be
colocated with other data services or load-generating processes, though not with another Kudu
master from the same configuration.
* Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and
`kudu-master` packages should be installed), or via some other means.
* Choose and record the directory where the master's data will live.
* Choose and record the port the master should use for RPCs.
* Optional: configure a DNS alias for the master (e.g. `master-2`, `master-3`, etc).
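If the `/etc/hosts` approach is used for the aliases, the entries might look like the following
sketch. The IP addresses (drawn from the RFC 5737 documentation range) and the alias names are
hypothetical placeholders; substitute the real addresses of the chosen machines:

```shell
# Hypothetical /etc/hosts entries mapping each master machine to a stable alias.
# Append the same entries on every machine in the cluster.
cat >> /etc/hosts <<'EOF'
192.0.2.11  master-1
192.0.2.12  master-2
192.0.2.13  master-3
EOF
```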

==== Perform the migration

. Stop all the Kudu processes in the entire cluster.

. Format the data directory on each new master machine, and record the generated UUID. Use the
following command sequence:
+
[source,bash]
----
$ kudu fs format --fs_wal_dir=<master_data_dir>
$ kudu fs dump uuid --fs_wal_dir=<master_data_dir> 2>/dev/null
----
+
master_data_dir:: new master's previously recorded data directory
+
[source,bash]
.Example
----
$ kudu fs format --fs_wal_dir=/var/lib/kudu/master
$ kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null
f5624e05f40649b79a757629a69d061e
$
----

. If using CM, add the new Kudu master roles now, but do not start them. If using DNS aliases,
override the empty value of the `Master Address` parameter for each role (including the
existing master role) with that master's alias. Add the port number (separated by a colon) if
using a non-default RPC port value.

. Rewrite the master's Raft configuration with the following command, executed on the existing
master machine:
+
[source,bash]
----
$ kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_data_dir> <tablet_id> <all_masters>
----
+
master_data_dir:: existing master's previously recorded data directory
tablet_id:: must be the string `00000000000000000000000000000000`
all_masters:: space-separated list of masters, both new and existing. Each entry in the list must be
a string of the form `<uuid>:<hostname>:<port>`
uuid::: master's previously recorded UUID
hostname::: master's previously recorded hostname or alias
port::: master's previously recorded RPC port number
+
[source,bash]
.Example
----
$ kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/var/lib/kudu/master 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051
----

. Start the existing master.

. Copy the master data to each new master with the following command, executed on each new master
machine:
+
[source,bash]
----
$ kudu local_replica copy_from_remote --fs_wal_dir=<master_data_dir> <tablet_id> <existing_master>
----
+
master_data_dir:: new master's previously recorded data directory
tablet_id:: must be the string `00000000000000000000000000000000`
existing_master:: RPC address of the existing master; must be a string of the form
`<hostname>:<port>`
hostname::: existing master's previously recorded hostname or alias
port::: existing master's previously recorded RPC port number
+
[source,bash]
.Example
----
$ kudu local_replica copy_from_remote --fs_wal_dir=/var/lib/kudu/master 00000000000000000000000000000000 master-1:7051
----

. Start all of the new masters.
+
WARNING: Skip the next step if using CM.

. Modify the value of the `tserver_master_addrs` configuration parameter for each tablet server.
The new value must be a comma-separated list of masters, where each entry is a string of the form
`<hostname>:<port>`
+
hostname:: master's previously recorded hostname or alias
port:: master's previously recorded RPC port number
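+
For illustration, with the hypothetical aliases used in the earlier examples and the default RPC
port, the flag for every tablet server might read:
+
```shell
# Hypothetical flag value; substitute your masters' recorded hostnames/aliases and ports.
--tserver_master_addrs=master-1:7051,master-2:7051,master-3:7051
```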

. Start all of the tablet servers.

Congratulations, the cluster has now been migrated to multiple masters! To verify that all masters
are working properly, consider performing the following sanity checks:

* Using a browser, visit each master's web UI. Look at the `/masters` page. All of the masters
should be listed there with one master in the LEADER role and the others in the FOLLOWER role.
The contents of `/masters` on each master should be the same.

* Run a Kudu system check (ksck) on the cluster using the `kudu` command line tool. Help for ksck
can be viewed via `kudu cluster ksck --help`.
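As a concrete form of the second check, again assuming the hypothetical aliases and default RPC
port from the examples above, the invocation might look like:

```shell
# Verify cluster health against all three masters (addresses are illustrative).
kudu cluster ksck master-1:7051,master-2:7051,master-3:7051
```

ksck exits with a non-zero status when it detects problems, so it is also suitable for scripted
checks.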
