This repository contains the infrastructure code necessary to run the Million Container Challenge (C1M) using HashiCorp's Nomad on Google Compute Engine (GCE) or Amazon Web Services (AWS).
We leverage Packer to build the machine images and Terraform to provision the infrastructure; the instructions for both are below.
Artifacts must first be created for Terraform to provision. This can be accomplished by running the below commands in the `packer/` directory. You can alternatively build these images locally by running the `packer build` command instead of building them in Atlas with `packer push`; a local-build sketch follows the commands below.

From the root directory of this repository, run the below commands to build your images with Packer.

If you're using GCE, you will need to get an `account.json` credentials file from GCE and place it in the root of this repository.
For GCE:

```sh
cd packer
export ATLAS_USERNAME=YOUR_ATLAS_USERNAME
export GCE_PROJECT_ID=YOUR_GOOGLE_PROJECT_ID
export GCE_DEFAULT_ZONE=us-central1-a
export GCE_SOURCE_IMAGE=ubuntu-1404-trusty-v20160114e
packer push gce_utility.json
packer push gce_consul_server.json
packer push gce_nomad_server.json
packer push gce_nomad_client.json
```
For AWS:

```sh
cd packer
export ATLAS_USERNAME=YOUR_ATLAS_USERNAME
export AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY
export AWS_DEFAULT_REGION=us-east-1
export AWS_SOURCE_AMI=ami-9a562df2
packer push aws_utility.json
packer push aws_consul_server.json
packer push aws_nomad_server.json
packer push aws_nomad_client.json
```
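If you prefer to build locally, here is a sketch of the `packer build` alternative for one of the AWS templates. It assumes the templates read the exported variables from the environment; verify the variable definitions inside each template and pass values explicitly with `-var` if they do not.

```sh
# Local alternative to `packer push`; assumes the variables exported above are still set.
packer validate aws_nomad_client.json   # check the template before building
packer build aws_nomad_client.json      # build the image locally instead of in Atlas
```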
To provision the infrastructure necessary for C1M, run the below Terraform commands. If you want to provision locally rather than in Atlas, use the `terraform apply` command instead of `terraform push` (a sketch of the local flow follows the commands below).

From the root directory of this repository, run the below commands to provision your infrastructure with Terraform. You only need to run the `terraform remote config` command once.

If you're using GCE, you will need to get an `account.json` credentials file from GCE and place it in the directory you're running Terraform commands from (`terraform/_env/gce`).
For GCE:

```sh
cd terraform/_env/gce
export ATLAS_USERNAME=YOUR_ATLAS_USERNAME
export ATLAS_TOKEN=YOUR_ATLAS_TOKEN
export ATLAS_ENVIRONMENT=c1m-gce
terraform remote config -backend-config name=$ATLAS_USERNAME/$ATLAS_ENVIRONMENT # Only need to run this command once
terraform get
terraform push -name $ATLAS_USERNAME/$ATLAS_ENVIRONMENT -var "atlas_token=$ATLAS_TOKEN" -var "atlas_username=$ATLAS_USERNAME"
```
For AWS:

```sh
cd terraform/_env/aws
export ATLAS_USERNAME=YOUR_ATLAS_USERNAME
export ATLAS_TOKEN=YOUR_ATLAS_TOKEN
export ATLAS_ENVIRONMENT=c1m-aws
terraform remote config -backend-config name=$ATLAS_USERNAME/$ATLAS_ENVIRONMENT # Only need to run this command once
terraform get
terraform push -name $ATLAS_USERNAME/$ATLAS_ENVIRONMENT -var "atlas_token=$ATLAS_TOKEN" -var "atlas_username=$ATLAS_USERNAME"
```
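To provision locally instead of pushing the run to Atlas, the flow is roughly the following sketch. It assumes you are in the same `terraform/_env/gce` or `terraform/_env/aws` directory with the same variables exported.

```sh
terraform get                            # fetch module sources
terraform plan \
  -var "atlas_token=$ATLAS_TOKEN" \
  -var "atlas_username=$ATLAS_USERNAME"  # review the proposed changes
terraform apply \
  -var "atlas_token=$ATLAS_TOKEN" \
  -var "atlas_username=$ATLAS_USERNAME"  # provision the infrastructure locally
```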
To tweak the infrastructure size, update the Terraform variable(s) for the region(s) you're provisioning.
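Such overrides can also be passed on the command line at push time. A minimal sketch, assuming a hypothetical client-count variable named `nomad_client_nodes` (check the variables defined under `terraform/_env/` for the actual names):

```sh
terraform push -name $ATLAS_USERNAME/$ATLAS_ENVIRONMENT \
  -var "atlas_token=$ATLAS_TOKEN" \
  -var "atlas_username=$ATLAS_USERNAME" \
  -var "nomad_client_nodes=1000"   # hypothetical variable name; adjust to the real one
```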
Once your infrastructure is provisioned, grab a Nomad server to `ssh` into from the output of `terraform apply` or from the web console. Jot down the public IP of the Nomad server you're going to run these jobs from; you will need it later to gather your results.
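Before scheduling anything, it can be worth confirming from that Nomad server that the cluster has converged. A quick sketch using standard Nomad and Consul CLI commands (the expected counts depend on how many servers and clients you provisioned):

```sh
ssh ubuntu@NOMAD_SERVER_IP
nomad server-members     # all Nomad servers should show as alive, with one leader
nomad node-status        # every Nomad client node should report as ready
consul members | wc -l   # total Consul agents (servers, clients, and the utility box)
```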
Run the below commands to schedule your first job. This will schedule 5 Docker containers on 5 different nodes using the `node_class` constraint. Each job contains a different number of tasks that will be scheduled by Nomad; the job types are defined below.
- Docker Driver (`classlogger_n_docker.nomad`): Schedules n Docker containers
- Docker Driver with Consul (`classlogger_n_consul_docker.nomad`): Schedules n Docker containers, registering each container as a service with Consul
- Raw Fork/Exec Driver (`classlogger_n_raw_exec.nomad`): Schedules n tasks
- Raw Fork/Exec Driver with Consul (`classlogger_n_consul_raw_exec.nomad`): Schedules n tasks, registering each task as a service with Consul
You can change the job being run by modifying the `JOBSPEC` environment variable, and change the number of jobs being run by modifying the `JOBS` environment variable.
```sh
ssh ubuntu@NOMAD_SERVER_IP
cd /opt/nomad/jobs
sudo JOBSPEC=docker-classlogger-1.nomad JOBS=1 bench-runner /usr/local/bin/bench-nomad
```
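A follow-up run might look like the sketch below; the jobspec filenames actually shipped in `/opt/nomad/jobs` may differ from the `classlogger_n_*` naming above, so list the directory first.

```sh
cd /opt/nomad/jobs
ls *.nomad   # see which jobspecs are available on this server
# Hypothetical filename and count; substitute whichever jobspec and job count you want to benchmark
sudo JOBSPEC=docker-classlogger-5.nomad JOBS=5 bench-runner /usr/local/bin/bench-nomad
```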
To gather results, complete the C1M Results steps after running each job. Before adding more nodes to your infrastructure, be sure to pull down the Spawn Results locally so you can see how fast Terraform and GCE spun up each infrastructure size.
To gather the results of the C1M challenge, follow the below instructions.
Run the below commands from the Utility box to gather C1M spawn results.
Replace `UTILITY_IP` with the address of the Utility box.

```sh
consul exec 'scp -C -q -o StrictHostKeyChecking=no -i /home/ubuntu/c1m/site.pem /home/ubuntu/c1m/spawn/spawn.csv ubuntu@UTILITY_IP:/home/ubuntu/c1m/spawn/$(hostname).file'
FILE=/home/ubuntu/c1m/spawn/$(date '+%s').csv && find /home/ubuntu/c1m/spawn/. -type f -name '*.file' -exec cat {} + >> $FILE && sed -i '1s/^/type,name,time\n/' $FILE
```
Make sure the `consul exec` command pulled in all of the spawn files by running `ls -1 | grep '.file' | wc -l` and confirming the count matches your number of nodes. The command is idempotent and can be run multiple times. Below are some examples of how you can get live updates on Consul and Nomad agents, as well as run `consul exec` until it has all the necessary files.
```sh
# Live updates on Nomad agents
EXPECTED=5000 && CURRENT=0 && while [ $CURRENT -lt $EXPECTED ]; do CURRENT=$(nomad node-status | grep 'ready' | wc -l); echo "Nomad nodes ready: $CURRENT/$EXPECTED"; sleep 10; done

# Live updates on Consul agents
EXPECTED=5009 && CURRENT=0 && while [ $CURRENT -lt $EXPECTED ]; do CURRENT=$(consul members | grep 'alive' | wc -l); echo "Consul members alive: $CURRENT/$EXPECTED"; sleep 10; done

# Run consul exec until you have the number of spawn files you need
EXPECTED=5009 && CURRENT=0 && while [ $CURRENT -lt $EXPECTED ]; do consul exec 'scp -C -q -o StrictHostKeyChecking=no -i /home/ubuntu/c1m/site.pem /home/ubuntu/c1m/spawn/spawn.csv ubuntu@UTILITY_IP:/home/ubuntu/c1m/spawn/$(hostname).file'; CURRENT=$(ls -1 /home/ubuntu/c1m/spawn | grep '.file' | wc -l); echo "Spawn files: $CURRENT/$EXPECTED"; sleep 60; done
```
Run `spawn_results.sh` locally after running all jobs for each node count to gather all C1M spawn results.

```sh
sh spawn_results.sh UTILITY_IP NODE_COUNT
```
Run `c1m_results.sh` locally after running each job for each node count to gather all C1M performance results.

```sh
sh c1m_results.sh NOMAD_SERVER_IP UTILITY_IP NODE_COUNT JOB_NAME
```
Use the below `consul exec` commands to run a Nomad join operation on any subset of Nomad agents. This can be used to join servers to each other, join clients to servers, or change which Nomad server cluster the clients point at.

```sh
consul exec -datacenter gce-us-central1 -service nomad-server 'sudo /opt/nomad/nomad_join.sh "nomad-server?dc=gce-us-central1&passing" "server"'
consul exec -datacenter gce-us-central1 -service nomad-client 'sudo /opt/nomad/nomad_join.sh "nomad-server?dc=gce-us-central1&passing"'
```
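The same pattern should work for other datacenters by swapping the `-datacenter` flag and the `dc=` query parameter. A sketch assuming the AWS datacenter is registered in Consul as `aws-us-east-1` (confirm the real name against the DC column of `consul members`):

```sh
# Hypothetical datacenter name; confirm it against the DC column of `consul members`
consul exec -datacenter aws-us-east-1 -service nomad-server 'sudo /opt/nomad/nomad_join.sh "nomad-server?dc=aws-us-east-1&passing" "server"'
consul exec -datacenter aws-us-east-1 -service nomad-client 'sudo /opt/nomad/nomad_join.sh "nomad-server?dc=aws-us-east-1&passing"'
```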