- Download and Install the pre-built PipelineAI Installation Docker Image
-- OR --
- Install AWS CLI
- Install Docker Community Edition
- Install Miniconda with Python 3 Support
- Install Kubernetes CLI
- Install KOPS
Note: ALL COMMANDS MUST BE RUN WITHIN THE DOCKER CONTAINER STARTED IN STEP 1!!
We cannot support one-off environments!
aws configure
Enter ACCESS_KEY_ID, SECRET_ACCESS_KEY, and Default Region (e.g. us-west-2)
aws iam create-group --group-name kops
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess --group-name kops
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonRoute53FullAccess --group-name kops
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --group-name kops
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/IAMFullAccess --group-name kops
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonVPCFullAccess --group-name kops
aws iam create-user --user-name kops
aws iam add-user-to-group --user-name kops --group-name kops
aws iam create-access-key --user-name kops
^^ COPY THESE CREDENTIALS SOMEWHERE. YOU WILL NEED THEM LATER! ^^
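The five attach-group-policy calls above differ only in the policy name, so they can be collapsed into a loop. This is a minimal sketch and not part of the original instructions: attach_kops_policies and the AWS_CMD override are hypothetical names introduced here so the loop can be dry-run with echo before it touches a real account.

```shell
# Attach each AWS managed policy listed above to the given IAM group.
# AWS_CMD is a hypothetical override (defaults to the real aws CLI).
attach_kops_policies() {
  group_name=$1
  for policy in AmazonEC2FullAccess AmazonRoute53FullAccess \
                AmazonS3FullAccess IAMFullAccess AmazonVPCFullAccess; do
    ${AWS_CMD:-aws} iam attach-group-policy \
      --policy-arn "arn:aws:iam::aws:policy/${policy}" \
      --group-name "${group_name}" || return 1
  done
}

# Dry run: print the five commands instead of calling AWS.
AWS_CMD=echo attach_kops_policies kops
```

Run attach_kops_policies kops without the AWS_CMD override to attach the policies for real.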
aws configure
Enter the new ACCESS_KEY_ID, SECRET_ACCESS_KEY, and Default Region (e.g. us-west-2) from above.
AWS Environment Variables
Because aws configure doesn't export these variables for kops to use, we export them below.
export AWS_ACCESS_KEY_ID=`aws configure get aws_access_key_id`
export AWS_SECRET_ACCESS_KEY=`aws configure get aws_secret_access_key`
Cluster Name
CLUSTER_NAME must be a fully-qualified domain name (e.g. a domain hosted in Route 53).
export CLUSTER_NAME=<your-cluster-name-with-fully-qualified-DNS-name>
Cluster Name must be a fully-qualified DNS name (e.g. awscpu.pipeline.ai)
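A quick sanity check before continuing can catch a cluster name that is not a fully-qualified DNS name. The sketch below is only an approximation: is_fqdn is a hypothetical helper, and the pattern checks the general shape of a DNS name (dot-separated labels ending in an alphabetic top-level domain), not whether the domain resolves or is hosted in Route 53.

```shell
# Return 0 if the argument looks like a fully-qualified DNS name.
# Hypothetical helper; the pattern is a rough shape check only.
is_fqdn() {
  printf '%s\n' "$1" \
    | tr 'A-Z' 'a-z' \
    | grep -Eq '^([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$'
}

# Example with the name used in this guide.
if is_fqdn "awscpu.pipeline.ai"; then
  echo "cluster name looks like a fully-qualified DNS name"
fi
```

To check your own value, call is_fqdn "${CLUSTER_NAME}" after the export above.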
S3 Bucket
This bucket must be accessible with the AWS Credentials used in the aws configure command above.
export KOPS_STATE_STORE=<your-globally-unique-s3-bucket-name>
The State Store must be a fully-qualified S3 bucket URL such as s3://awscpu.pipeline.ai
aws s3 mb ${KOPS_STATE_STORE}
### EXPECTED OUTPUT ###
# make_bucket: <your-globally-unique-s3-bucket-name>
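A common mistake is exporting the bare bucket name without the s3:// prefix. The sketch below guards against that. It assumes the state store is a bucket-only URL with no path after the bucket name, and is_s3_bucket_url is a hypothetical helper. The pattern follows the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit).

```shell
# Return 0 if the argument looks like s3://<bucket-name>.
# Hypothetical helper; assumes no path component after the bucket name.
is_s3_bucket_url() {
  printf '%s\n' "$1" \
    | grep -Eq '^s3://[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Example with the bucket used in this guide.
if is_s3_bucket_url "s3://awscpu.pipeline.ai"; then
  echo "state store URL looks valid"
fi
```

To check your own value, call is_s3_bucket_url "${KOPS_STATE_STORE}" after the export above.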
ssh-keygen -t rsa
### EXPECTED OUTPUT ###
# Generating public/private rsa key pair.
# Enter file in which to save the key (/root/.ssh/id_rsa):
# Created directory '/root/.ssh'.
# Enter passphrase (empty for no passphrase):
# Enter same passphrase again:
# Your identification has been saved in /root/.ssh/id_rsa.
# Your public key has been saved in /root/.ssh/id_rsa.pub.
# The key fingerprint is:
# ...
# The key's randomart image is:
# ...
^^ Copy these keys somewhere. You will need them later! ^^
kops create cluster \
  --cloud aws \
  --dns public \
  --ssh-public-key ~/.ssh/id_rsa.pub \
  --master-zones us-west-2b \
  --master-size t2.medium \
  --zones us-west-2b \
  --node-count 1 \
  --node-size r3.2xlarge \
  --node-tenancy default \
  --kubernetes-version 1.8.4 \
  --image kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28 \
  --alsologtostderr \
  --log_dir logs \
  --v 5 \
  --state ${KOPS_STATE_STORE} \
  --name ${CLUSTER_NAME}
Note: You can use --vpc to re-use your existing infrastructure.
Note 2: You can switch to flannel for networking by adding --networking flannel
Note 3: Other images available here.
kops edit cluster --state ${KOPS_STATE_STORE} --name ${CLUSTER_NAME}
Copy the following at the BOTTOM of the spec:
  # FROM HERE
  kubeAPIServer:
    runtimeConfig:
      batch/v2alpha1: "true"
      apps/v1alpha1: "true"
  # TO HERE
Alpha feature configuration is described HERE
kops get ig --state ${KOPS_STATE_STORE} --name ${CLUSTER_NAME}
### EXPECTED OUTPUT ###
#
Using cluster from kubectl context: <cluster-name>
NAME               ROLE    MACHINETYPE  MIN  MAX  ZONES
master-us-west-2b  Master  t2.medium    1    1    us-west-2b
nodes              Node    r3.2xlarge   1    1    us-west-2b
kops get ig --state s3://awsgpu.pipeline.ai --name awsgpu.pipeline.ai
kops edit ig nodes --state ${KOPS_STATE_STORE} --name ${CLUSTER_NAME}
Copy the following at the BOTTOM of the spec:
# FROM HERE
rootVolumeSize: 200
rootVolumeType: gp2
kubernetesVersion: 1.8.4
# TO HERE
kops get ig --state ${KOPS_STATE_STORE} --name ${CLUSTER_NAME}
### EXPECTED OUTPUT ###
#
NAME               ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-us-west-2b  Master  t2.medium    1    1    us-west-2b
nodes              Node    r3.2xlarge   1    1    us-west-2b
kops update cluster --state ${KOPS_STATE_STORE} --name ${CLUSTER_NAME} --yes
** WAIT ABOUT 10-15 MINUTES BEFORE ATTEMPTING THESE NEXT STEPS!! **
Re-run the command below until the cluster validates successfully.
If you have issues, check either the logs or the AWS Console (Auto Scaling Groups, etc.).
kops validate cluster
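Re-running the validation by hand can be replaced with a small retry loop. This is a minimal sketch rather than anything kops provides: retry is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary values chosen to cover roughly the 10-15 minute wait mentioned above.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed: $*" >&2
    attempt=$((attempt + 1))
    # Only sleep if another attempt is still coming.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (not executed here): try once a minute for up to 20 minutes.
# retry 20 60 kops validate cluster --state "${KOPS_STATE_STORE}" --name "${CLUSTER_NAME}"
```

If the loop exhausts its attempts, it returns non-zero, at which point the logs and the AWS Console are the next places to look.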
Disable TLS Auth (For Now)
kubectl config set-cluster ${CLUSTER_NAME} --insecure-skip-tls-verify=true
kubectl get nodes
Step 3: Set Up Kubernetes Add-Ons
Kubernetes Dashboard <-- HIGHLY RECOMMENDED
kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.7.1.yaml
Log in to Kubernetes Dashboard
kubectl cluster-info
### EXPECTED OUTPUT ###
...
Kubernetes-dashboard is running at ... <-- COPY THIS URL, USE BELOW
Navigate your browser to the following:
https://<kubernetes-dashboard-url-from-above>
- Username: admin
- Password: (the output of the command below)
kops get secrets kube --type secret -oplaintext --state ${KOPS_STATE_STORE}
Heapster enables the Autoscaling features in Kubernetes
kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/monitoring-standalone/v1.7.0.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/logging-elasticsearch/v1.6.0.yaml
Follow THESE instructions to train and serve models with PipelineAI.
- Make sure the KOPS_STATE_STORE and CLUSTER_NAME environment variables set up above are still exported in your shell
kops delete --state ${KOPS_STATE_STORE} cluster --name ${CLUSTER_NAME}
- Add --yes to the command above when you're ready to delete the cluster
More kops Commands
- Modify Cluster (instanceType, nodeCount, rootVolumeSize, rootVolumeOptimization)
- Use AWS Spot Instances (maxPrice)
- If you see the following, you need to set the CLUSTER_NAME and KOPS_STATE_STORE environment variables in your shell before you can run any kops commands.
State store "" is not cloud-reachable - please use an S3 bucket
- If you see the following error related to no credentials or no space left, you need to increase the rootVolumeSize of your EC2 instance per the previous step.
Failed to pull image "docker.io/fluxcapacitor/jupyterhub": image pull failed for docker.io/pipelineai/...:<tag>, this may be because there are no credentials on this request. details: (write /mnt/sda1/var/lib/docker/tmp/GetImageBlob871155766: no space left on device)
Error syncing pod, skipping: failed to "StartContainer" for "jupyterhub" with ErrImagePull: "image pull failed for docker.io/pipelineai/...:<tag>, this may be because there are no credentials on this request. details: (write /mnt/sda1/var/lib/docker/tmp/GetImageBlob871155766: no space left on device)"
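The first error above comes from running kops in a shell where the environment variables were never exported (for example, a new terminal session). A small guard at the top of a session can catch this early. This is a sketch under stated assumptions: require_vars is a hypothetical helper, not a kops feature.

```shell
# require_vars NAME [NAME...]
# Returns non-zero and reports each named variable that is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not executed here):
# require_vars CLUSTER_NAME KOPS_STATE_STORE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || echo "export the variables first"
```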