This workshop assumes you have access to an existing Kubernetes cluster, but it also quickly demos a path using Google Kubernetes Engine (GKE) and its web console.
In addition, you will need a Datadog account and access to an API key -- Start a Free Trial Here!
This repo showcases a Kubernetes-based path to deploying a simple Python Flask application service and a simple Java SpringBoot application service, both of which return sample text stored in a separate Postgres container.
The goal of this repo is to demonstrate the steps involved in installing a Datadog agent and showcase the product's Infrastructure Monitoring, Application Performance Monitoring, Live Process/Container Monitoring, and Log Monitoring Capabilities in a Kubernetes x Docker based environment.
This repo makes no accommodations for proxy scenarios and does not fully accommodate situations where machines are unable to pull from the internet to download packages.
The gist of the setup portion is:
- Spin up GKE Instance
- Deploy!
Start a GKE instance via the console. To get started, you can simply use the Standard Cluster template with something like 3 nodes, which is plenty to run this demo.
Find the cluster instance on the page, click the Connect button, then choose the Run in Cloud Shell option to spin up a browser-based shell that interfaces with the cluster.
You essentially SSH into a "gcloud" virtual machine terminal that sits outside the cluster itself; however, Google preloads it with a command that scopes the local kubectl interface to interact directly with the cluster.
You will also need to run
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
This is important for the DCA (Datadog Cluster Agent) and the Kubernetes State metrics (explained later).
Store the Datadog API key in a Kubernetes secret so it's not directly in the deployment code
kubectl create secret generic datadog-api --from-literal=token=___INSERT_API_KEY_HERE___
Do the same for the APP key in a secret called datadog-app
kubectl create secret generic datadog-app --from-literal=token=___INSERT_APP_KEY_HERE___
Create a secret containing a 32-character token for the DCA
kubectl create secret generic datadog-auth-token --from-literal=token=12345678901234567890123456789012
The key is then referenced in the Daemon file here
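For reference, secrets created this way are typically wired into the agent container with secretKeyRef entries; the sketch below assumes the secret and key names from the commands above, though the exact container spec in agent_daemon.yaml may differ:

```yaml
# Sketch: exposing the datadog-api secret to the agent container as DD_API_KEY
containers:
  - name: datadog-agent
    image: datadog/agent:latest
    env:
      - name: DD_API_KEY
        valueFrom:
          secretKeyRef:
            name: datadog-api   # created via kubectl create secret generic datadog-api ...
            key: token          # matches --from-literal=token=...
```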
Should the cluster be unable to reach Docker Hub to pull the public images, it's probably best to build the images locally
docker build -t sample_flask:007 ./flask_app/
docker build -t sample_postgres:007 ./postgres/
docker build -t sample_springboot:007 ./SpringBoot_app/
Be aware that you need to change the yaml files so the referenced images become the images you built above.
For example, change the referenced image in the SpringBoot deployment to sample_springboot:007
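As an illustration, the container spec in springboot_deployment.yaml would end up looking roughly like this (the container name below is hypothetical; keep whatever the file already uses):

```yaml
# Sketch: pointing the deployment at the locally built image instead of Docker Hub
containers:
  - name: springboot                 # hypothetical name -- keep the one already in the yaml
    image: sample_springboot:007     # locally built image from the docker build step above
    imagePullPolicy: IfNotPresent    # don't force a pull from Docker Hub
```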
Deploy the Postgres container (this needs to happen first so the service host/port environment variables properly load into subsequent containers)
kubectl create -f postgres_deployment.yaml
Deploy the Flask application container and turn it into a service. Also create a configMap for the logs product.
kubectl create -f flask_deployment.yaml
Deploy the SpringBoot application container and turn it into a service. Also create a configMap for the logs product.
kubectl create -f springboot_deployment.yaml
Deploy the Datadog agent container
kubectl create -f agent_daemon.yaml
Deploy a nonfunctional pause container to demonstrate Datadog Autodiscovery via a simple HTTP check against www.google.com
kubectl create -f pause.yaml
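For orientation, Autodiscovery HTTP checks are declared through ad.datadoghq.com annotations on the pod; a minimal sketch is below, assuming the container in pause.yaml is named pause (the identifier in the annotation keys must match the container name):

```yaml
# Sketch: Autodiscovery annotations for a simple HTTP check against www.google.com
metadata:
  annotations:
    ad.datadoghq.com/pause.check_names: '["http_check"]'
    ad.datadoghq.com/pause.init_configs: '[{}]'
    ad.datadoghq.com/pause.instances: '[{"name": "google", "url": "https://www.google.com", "timeout": 1}]'
```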
Deploy kubernetes state files to demonstrate kubernetes_state check
kubectl create -f kubernetes
And we are done!
The Flask app offers 3 endpoints that return some text: FLASK_SERVICE_IP:5000/, FLASK_SERVICE_IP:5000/query, and FLASK_SERVICE_IP:5000/bad
Run kubectl get services
to find the FLASK_SERVICE_IP address of the Flask application service
The SpringBoot app offers similar endpoints that return some text: SpringBoot_SERVICE_IP:8080/ and SpringBoot_SERVICE_IP:8080/query
Run kubectl get services
to find the SpringBoot_SERVICE_IP address of the SpringBoot application service
If you used the default GKE template or know the cluster is accessible via the internet, you can use the IP found in the EXTERNAL-IP column as the SERVICE_IP, then hit one of the following:
curl FLASK_SERVICE_IP:5000/
curl FLASK_SERVICE_IP:5000/query
to see the Flask application at work.
Otherwise, you can reference the CLUSTER-IP value as the SERVICE_IP.
In that case you must SSH into one of the nodes in the cluster before you can run one of the curl commands above.
In Google Cloud, you can do this simply by accessing the Compute Engine console page and clicking the SSH button next to a node.
Datadog's Java Agent module after v0.17.0 can accommodate JMX metrics out of the box.
Simply turn on the feature and point the module at the relevant Datadog Agent StatsD endpoint, and you should see jvm.* metrics in your Metrics Summary page.
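One way to wire this up is sketched below under a few assumptions: dd-java-agent is already attached to the JVM (see the APM section), the agent DaemonSet exposes its DogStatsD port 8125 on the host, and DD_AGENT_HOST is injected as described later. This is not necessarily how this repo does it:

```yaml
# Sketch: enabling JMX metric collection in the SpringBoot deployment via dd-java-agent's jmxfetch
env:
  - name: DD_AGENT_HOST                # node IP where the Datadog agent's DogStatsD endpoint lives
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: JAVA_TOOL_OPTIONS            # picked up by the JVM at startup
    value: >-
      -Ddd.jmxfetch.enabled=true
      -Ddd.jmxfetch.statsd.host=$(DD_AGENT_HOST)
      -Ddd.jmxfetch.statsd.port=8125
```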
The Datadog agent container should now be deployed, acting as a collector and middleman between the services and Datadog's backend. Both through actions (curling the endpoints) and by doing nothing, metrics will be generated and directed to the corresponding Datadog account based on your supplied API key.
Below is a quick discussion on some points of interest
This part pertains to the ingestion of timeseries data, status checks, and events.
By deploying the agent referencing the Datadog container image in the agent_daemon.yaml file, the agent automatically comes prepackaged with system-level (CPU, memory, IO, disk), Kubernetes, and Docker checks.
The gist of the setup portion is:
- Deploy the agent daemon with the proper environment variables, volume, and volumeMount arguments
- Deploy the relevant applications with annotations
- Validate that metrics reach the agent and end up in the Datadog application
A volumeMount and volume for the proc directory are required from the host level.
In the Datadog web application you can reference the host map, and filter on the particular hostname to see what is going on.
docker.sock and cgroup volumeMounts and volumes must also be attached in the DaemonSet, as sketched below.
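Those host-level pieces of the DaemonSet generally look like the following sketch (volume names are illustrative; the mount paths under /host are the ones the agent expects):

```yaml
# Sketch: host volumes the agent needs for system, Docker, and cgroup visibility
volumeMounts:
  - name: procdir
    mountPath: /host/proc
    readOnly: true
  - name: cgroups
    mountPath: /host/sys/fs/cgroup
    readOnly: true
  - name: dockersocket
    mountPath: /var/run/docker.sock
volumes:
  - name: procdir
    hostPath:
      path: /proc
  - name: cgroups
    hostPath:
      path: /sys/fs/cgroup
  - name: dockersocket
    hostPath:
      path: /var/run/docker.sock
```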
Application/service pods and containers go up and down. The Datadog agent traditionally requires a modification of the hardcoded host/port values in the corresponding configuration files (example with Postgres) and an agent restart to collect data for installed software.
Rather than having a person sit on standby ready to make those changes, the containerized agent makes this process automatic using Autodiscovery, where the agent monitors the annotations of deployments and automatically establishes checks as pods come and go.
To set up Autodiscovery, you will need to set up volumes and mountPaths to hold the ephemeral configuration files.
Autodiscovery of the Postgres pod in this repo is straightforward: simply annotate the deployment's configuration file by adding the typical check sections required.
Note that the container identifier in the annotation keys must be identical to the container name for the agent to properly connect to the container.
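A sketch of what those annotations generally look like on the Postgres deployment is below; the container identifier ("postgres") must match the container name in postgres_deployment.yaml, and the credentials are placeholders rather than the repo's actual values:

```yaml
# Sketch: Autodiscovery annotations that spin up the postgres check as the pod comes and goes
metadata:
  annotations:
    ad.datadoghq.com/postgres.check_names: '["postgres"]'
    ad.datadoghq.com/postgres.init_configs: '[{}]'
    ad.datadoghq.com/postgres.instances: '[{"host": "%%host%%", "port": 5432, "username": "datadog", "password": "<PASSWORD>"}]'
```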
Many services (like Kubernetes itself) utilize Prometheus to expose custom internal metrics specific to the service.
You can see what the structure of the prometheus metrics look like by running:
curl SpringBoot_SERVICE_IP:8080/actuator/prometheus
Datadog can natively read the text format of Prometheus-produced metrics and turn them into custom metrics. The scope of this repo doesn't touch on it too much, but collecting Prometheus metrics can be done simply via annotations at the deployment level as seen here -- note this example fails on purpose so you can see what an error looks like in agent status.
Read more about it via our documentation
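For orientation, Prometheus scraping via Autodiscovery generally follows the same annotation pattern, pointed at the actuator endpoint shown above; the namespace and metric filter below are illustrative and are not the repo's intentionally broken example:

```yaml
# Sketch: Autodiscovery annotations that scrape the SpringBoot actuator's Prometheus endpoint
metadata:
  annotations:
    ad.datadoghq.com/springboot.check_names: '["prometheus"]'
    ad.datadoghq.com/springboot.init_configs: '[{}]'
    ad.datadoghq.com/springboot.instances: '[{"prometheus_url": "http://%%host%%:8080/actuator/prometheus", "namespace": "springboot", "metrics": ["jvm_*"]}]'
```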
Live Process/Container Monitoring is the capability to get container and process level granularity for all monitored systems. This feature provides not only standard system-level metrics at the process/container level, but also visibility into the initial run commands used to set up the process/container.
Simply add the DD_PROCESS_AGENT_ENABLED environment variable in the DaemonSet to turn on this feature.
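In the DaemonSet that amounts to roughly the following (alongside the docker.sock mount shown earlier):

```yaml
# Sketch: turning on Live Process/Container monitoring in the agent DaemonSet
env:
  - name: DD_PROCESS_AGENT_ENABLED
    value: "true"
```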
Sometimes passwords are revealed in the initial run commands; the agent comes equipped with password scrubbing to remove a standard set of sensitive arguments.
We again need docker.sock to get container information.
Run kubectl get pods to get the pod name of the agent container.
Run kubectl exec -it POD_NAME bash to hop into the container.
Run agent status to see the status summary; look for the integrations section to confirm the agent is collecting metrics.
Run cat /var/log/datadog/agent.log to see logs pertaining to the agent.
The same agent that handles infrastructure metrics can also receive trace data from a designated APM module -- these modules sit on top of your applications and forward payloads to a local Datadog agent, which middlemans them to our backend.
Applications spin up in pods, and we need their payloads fired at the agent pod running on the same node. In this example, we set up a route between the pods with a port going through the host level.
Enable the agent to receive traces on the agent deployment side via environment variables.
Create a port connection to the host via port 8126.
Provide the application deploy file with a link to the host level for port 8126 via environment variables, so that applications can reference the host/port values when firing traces.
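Put together, the two sides generally look like the sketch below: the agent DaemonSet accepts traces and pins port 8126 to the host, and the application deployments learn the host IP through the downward API. The exact env var names your app images read may differ:

```yaml
# Sketch: agent DaemonSet side -- accept traces and expose 8126 on each node
env:
  - name: DD_APM_ENABLED
    value: "true"
  - name: DD_APM_NON_LOCAL_TRAFFIC       # accept traces arriving over the host port
    value: "true"
ports:
  - containerPort: 8126
    hostPort: 8126
    name: traceport
    protocol: TCP
---
# Sketch: application deployment side -- tell the tracer where the agent lives
env:
  - name: DD_AGENT_HOST                  # node IP, read by ddtrace / dd-java-agent
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: DD_TRACE_AGENT_PORT
    value: "8126"
```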
Have the ddtrace module installed in the Flask application image.
In the app.py code, import the ddtrace module and patch both sqlalchemy and the Flask app object.
Note: the trace module is one implementation, as all modules are. If certain spans are not being captured, you can always hardcode them in.
In the Dockerfile setting up SpringBoot, make sure the runtime command mounts dd-java-agent.jar as a javaagent, as seen here.
Networking must also be possible so SpringBoot can fire traces to the agent pod's 8126 port; that means springboot_deployment needs this piece, and agent_daemon needs this piece.
Note: the trace module is one implementation, as all modules are, so you can read which technologies we are compatible with here. If certain clauses in your code are not being captured, you can always hardcode them in.
Datadog also recently integrated their APM platform with the Logs platform. You can think of APM traces as if they were logs. As of 10/11/18, you need to whitelist which APM services and endpoints you want collected via this agent_daemon configuration setting.
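The whitelist is usually expressed through the trace agent's analyzed-spans setting; a sketch of the env-var form is below, where the service and operation names are illustrative and would need to match what actually shows up on your Trace Services page:

```yaml
# Sketch: whitelisting service|operation pairs for APM trace search
env:
  - name: DD_APM_ANALYZED_SPANS
    value: "sample_flask|flask.request=1,sample_springboot|servlet.request=1"
```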
Run kubectl get pods to get the pod name of the agent container.
Run kubectl exec -it POD_NAME bash to hop into the container.
Run agent status to see the status summary; look for the tracing section to confirm the agent has tracing turned on.
Run cat /var/log/datadog/trace-agent.log to see logs pertaining to the trace agent.
Head over to Trace Services Page and look for your service level metrics and traces!
The same agent that periodically polls for infrastructure metrics and live process metrics, and that middlemans trace transactions, can also be set up to tail log instances.
Simply turn on logs via the DD_LOGS_ENABLED environment variable in the agent daemon file.
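In the agent daemon file that is roughly:

```yaml
# Sketch: enabling the log agent in the DaemonSet
env:
  - name: DD_LOGS_ENABLED
    value: "true"
```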
Tailing logs from Flask is pretty easy using Kubernetes config maps.
Simply set up the mountPath directory connected via the host and the volume in the agent daemon.
Set up the corresponding volume and mountPath on the Flask side so we can connect the Flask pod to the relevant agent pod via the host.
In the app, set up the application-level logging configuration so logs are properly pushed to the right log file when routines run (example).
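A sketch of that wiring is below, assuming the Flask app writes to a log file under a host-backed directory and the agent is told to tail it through a conf.d ConfigMap; the paths, service, and source values are illustrative rather than the repo's exact ones:

```yaml
# Sketch: Flask deployment side -- write logs into a hostPath-backed directory
volumeMounts:
  - name: flasklogs
    mountPath: /var/log/flask            # the app-level logging config would write here
volumes:
  - name: flasklogs
    hostPath:
      path: /var/log/flask
---
# Sketch: agent side -- mount the same host directory, plus a conf.d/flask.d/conf.yaml entry telling the agent to tail it
logs:
  - type: file
    path: /var/log/flask/app.log
    service: sample_flask
    source: python
```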
Run kubectl get pods to get the pod name of the agent container.
Run kubectl exec -it POD_NAME bash to hop into the container.
Run agent status to see the status summary; look for the logging section to confirm the agent has logs turned on.
Run cat /var/log/datadog/agent.log to see relevant log instances pertaining to the log agent.
Navigate to the logs explorer page and look for flask logs!