This project is not being maintained anymore.
Consul is an open-source tool for service discovery and configuration. This project deploys Consul in a distributed, highly available manner on DC/OS clusters and provides a package for easy installation and management.
(!) This package is currently in beta. Use in production environments at your own risk.
This package is available in the DC/OS Universe.
- A DC/OS cluster running version 1.11 or later
- For TLS support, DC/OS Enterprise is required
- An installed and configured dcos CLI
dcos package install consul
By default the package will install 3 nodes. Check the DC/OS UI to see if all of the nodes have been started. Once the nodes have been started you can reach the HTTP API from inside the cluster via http://api.consul.l4lb.thisdcos.directory:8500.
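As a quick check you can query that endpoint with any HTTP client from inside the cluster; a minimal sketch using curl (assuming curl is available wherever you run it, e.g. on an agent node):

```sh
# List the raft peers and the catalog nodes via Consul's HTTP API
# (run from inside the DC/OS cluster)
curl http://api.consul.l4lb.thisdcos.directory:8500/v1/status/peers
curl http://api.consul.l4lb.thisdcos.directory:8500/v1/catalog/nodes
```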
You can customize your installation using an options file.
If you want to enable TLS support you need to provide a service account (this only works on DC/OS Enterprise):
dcos security org service-accounts keypair private-key.pem public-key.pem
dcos security org service-accounts create -p public-key.pem -d "Consul service account" consul-principal
dcos security secrets create-sa-secret --strict private-key.pem consul-principal consul/principal
dcos security org groups add_user superusers consul-principal
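You can optionally verify that the service account and secret exist before continuing; a sketch using the DC/OS Enterprise CLI (exact subcommand availability depends on your CLI and security plugin version):

```sh
# Show the service account created above
dcos security org service-accounts show consul-principal
# List secrets below the consul/ path; consul/principal should be listed
dcos security secrets list consul
```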
Then create an options.json file with the following contents:
{
  "service": {
    "service_account_secret": "consul/principal",
    "service_account": "consul-principal"
  },
  "consul": {
    "security": {
      "gossip_encryption_key": "toEtMu3TSeQasOI2Zg/OVg==",
      "transport_encryption_enabled": true
    }
  }
}
For more configuration options see dcos package describe consul --config.
For any non-demo deployments you must generate your own gossip encryption key. To do so, download the consul binary from the Consul homepage and run ./consul keygen. Add the output as the value for gossip_encryption_key.
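For example:

```sh
# Download the consul binary (see the Consul homepage) and generate a key locally
./consul keygen
# Copy the printed base64 key into options.json as the value of
# consul.security.gossip_encryption_key, replacing the demo key shown above
```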
To install with the customized configuration run dcos package install consul --options=options.json.
After the framework has been started you can reach the HTTPS API via https://api-tls.consul.l4lb.thisdcos.directory:8501. The endpoint uses certificates signed by the cluster-internal DC/OS CA, so you need to either provide the CA certificate to your clients (recommended; see the DC/OS documentation on how to retrieve it) or disable certificate checking (only do that for testing).
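A hedged sketch of verifying the TLS endpoint with curl; how you obtain the DC/OS CA certificate (dcos-ca.crt below) depends on your cluster, see the DC/OS documentation:

```sh
# Fetch the DC/OS CA certificate from the cluster (one common way; see the DC/OS docs)
curl -k -o dcos-ca.crt https://<master-or-cluster-url>/ca/dcos-ca.crt

# Query the Consul HTTPS API, validating the server certificate against the DC/OS CA
curl --cacert dcos-ca.crt https://api-tls.consul.l4lb.thisdcos.directory:8501/v1/status/leader
```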
To change the configuration of consul, update your options file and then run dcos package update start --options=options.json. Be aware that during the update all consul nodes will be restarted one by one, and there will be a short downtime when the current leader is restarted.
You can increase the number of nodes (check the consul deployment table for the recommended number of nodes), but you cannot decrease it, as that could cause data loss.
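A sketch of what scaling up might look like; the option key used below ("nodes.count") is an assumption based on typical DC/OS SDK packages, so confirm the real name with dcos package describe consul --config first:

```sh
# NOTE: the "nodes.count" key is hypothetical for this package; verify the
# actual option name with `dcos package describe consul --config`.
cat > options.json <<'EOF'
{
  "nodes": {
    "count": 5
  }
}
EOF
dcos package update start --options=options.json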
Consul stores its data locally on the host system it is running on. The data will survive a restart. In the event of a host failure the consul node running on that host is lost and must be replaced. To do so execute the following steps:
- Find out which node is lost by running dcos consul pod status. Let's assume it is consul-2.
- Force-leave the failed node from consul by running dcos task exec -it consul-0-node ./consul force-leave <node-name> (e.g. dcos task exec -it consul-0-node ./consul force-leave consul-2-node).
- Replace the failed pod: dcos consul pod replace <pod-name> (e.g. dcos consul pod replace consul-2).
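Putting these steps together, a minimal shell sketch (assuming consul-2 is the failed pod and consul-0 is healthy):

```sh
# 1. Identify the lost node
dcos consul pod status

# 2. Tell the cluster to forget the dead member (run via a healthy node)
dcos task exec -it consul-0-node ./consul force-leave consul-2-node

# 3. Replace the failed pod so a fresh node is started
dcos consul pod replace consul-2
```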
If you replaced the pod without first executing the force-leave, the new node will join the cluster nonetheless, but all consul instances will report errors of the form Error while renaming Node ID: "f4ed39ca-3a00-4554-8d5e-e952488d670f": Node name consul-2-node is reserved by node 55921e76-a55b-5cf5-fc25-7936c57ce05d with name consul-2-node.
To get rid of these errors, do the following (again assuming the pod in question is consul-2):
dcos consul debug pod pause consul-2
dcos task exec -it consul-0-node ./consul force-leave consul-2-node
dcos consul debug pod resume consul-2
After this the new node will rejoin the cluster.
You should only replace one node at a time and wait between replacements to give the cluster time to stabilize. Depending on your configured number of nodes, consul will survive the loss of one or more nodes and remain operational (with three nodes one node can be lost, with five nodes two).
In case you lose a majority of nodes, or the cluster gets into a state where the nodes are not able to properly resync and elect a leader, there is a disaster recovery procedure that can help:
- Put all nodes that are not lost into pause mode: dcos consul debug pod pause <pod-name>.
- Select one of the surviving nodes as your new initial leader (if possible use consul-0, as the other nodes use it as the starting point for finding the cluster).
- Enter the node (dcos task exec -it consul-0-node bash).
- Determine the node id using cat consul-data/node-id.
- Create a file consul-data/raft/peers.json with the following content: [{"non_voter": false, "id": "<node-id>", "address": "consul-0-node.consul.autoip.dcos.thisdcos.directory:8500"}].
- Exit the node and resume it: dcos consul debug pod resume consul-0.
- Look at the logs and verify the node starts up and elects itself as leader.
- One by one, resume or replace all the other nodes and make sure they join the cluster.
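As an illustration, a condensed sketch of the procedure above, assuming consul-0 and consul-1 survived, consul-2 is lost, and consul-0 becomes the new initial leader:

```sh
# Pause all surviving nodes
dcos consul debug pod pause consul-0
dcos consul debug pod pause consul-1

# Enter the designated new leader
dcos task exec -it consul-0-node bash

# --- inside the container ---
NODE_ID=$(cat consul-data/node-id)
cat > consul-data/raft/peers.json <<EOF
[{"non_voter": false, "id": "${NODE_ID}", "address": "consul-0-node.consul.autoip.dcos.thisdcos.directory:8500"}]
EOF
exit
# --- back on your workstation ---

# Resume the new leader and check its logs for a successful self-election
dcos consul debug pod resume consul-0

# Then, one by one, resume the remaining nodes (or replace lost ones)
dcos consul debug pod resume consul-1
dcos consul pod replace consul-2
```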
Also see the consul outage recovery documentation for more details.
- Deploys a distributed consul cluster
- Supports a configurable number of nodes (minimum of three)
- Configuration changes and version updates in a rolling-restart fashion
- Automatic TLS encryption
- Due to the nature of the leader failure detection and re-election process, short downtimes during updates cannot be avoided.
- During a pod restart there can be warnings in the logs of the consul nodes about connection problems for a few minutes.
- Replacing a failed node requires manual intervention in consul to clear out the old node.
This framework is based on the DC/OS SDK and was developed using dcosdev. Thanks to Mesosphere for providing these tools.
This project is not associated with HashiCorp in any form.
This software is provided as-is. Use at your own risk.