This workshop assumes you have access to an existing Kubernetes cluster, but it also quickly demos a path using Google Kubernetes Engine (GKE) and its web console.
In addition, you will need a Datadog account and access to an API key -- Start a Free Trial Here!
This repo showcases a Kubernetes-based path to deploying a simple Flask app container service that returns sample text stored in a separate Postgres container.
The goal of this repo is to walk through installing the Datadog Agent and demonstrating the product's Infrastructure Monitoring, Application Performance Monitoring, Live Process/Container Monitoring, and Log Monitoring capabilities in a Kubernetes and Docker based environment.
This repo makes no accommodation for proxy scenarios and does not fully cover situations where machines are unable to pull packages from the internet.
The gist of the setup portion is:
- Spin up GKE Instance
- Deploy!
Start a GKE cluster via the console. To get started, you can simply use the Standard Cluster template with something like 3 nodes, which is enough to run this demo.
Find the cluster instance on the page, click the Connect button, and then choose the Run in Cloud Shell option to spin up a browser-based shell that interfaces with the cluster.
You essentially SSH into a gcloud virtual machine terminal that sits outside the cluster; however, Google preloads it with a command that scopes the local kubectl interface to interact directly with the cluster.
Store the Datadog API key in a Kubernetes secret so it's not directly in the deployment code
kubectl create secret generic datadog-api --from-literal=token=___INSERT_API_KEY_HERE___
The key is then referenced in the Daemon file here
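As a rough sketch of how that reference works (the field names mirror the kubectl command above; the authoritative version is in agent_daemon.yaml), the agent container can read the key from the secret with a secretKeyRef:

```yaml
# Sketch of the agent container's env section -- pulls DD_API_KEY from the secret created above
env:
  - name: DD_API_KEY
    valueFrom:
      secretKeyRef:
        name: datadog-api   # secret name from the kubectl create secret command
        key: token          # key used in --from-literal=token=...
```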
Deploy the postgres container
kubectl create -f postgres_deployment.yaml
Deploy the application container and expose it as a service. This step also creates a ConfigMap for the logs product.
kubectl create -f app_deployment.yaml
Deploy the Datadog agent container
kubectl create -f agent_daemon.yaml
Deploy a nonfunctional pause container to demonstrate Datadog Autodiscovery via a simple HTTP check against www.google.com
kubectl create -f pause.yaml
Deploy the Kubernetes state manifests to demonstrate the kubernetes_state check
kubectl create -f kubernetes
And we are done!
The Flask app offers 3 endpoints that return some text: FLASK_SERVICE_IP:5000/, FLASK_SERVICE_IP:5000/query, and FLASK_SERVICE_IP:5000/bad
Run kubectl get services
to find the FLASK_SERVICE_IP address of the flask application service
If you used the default GKE template or know the cluster is accessible from the internet, you can use the IP found in the EXTERNAL-IP column as FLASK_SERVICE_IP, then hit one of the following:
curl FLASK_SERVICE_IP:5000/
curl FLASK_SERVICE_IP:5000/query
curl FLASK_SERVICE_IP:5000/bad
to see the Flask application at work
Otherwise, you can use the CLUSTER-IP column as FLASK_SERVICE_IP. In that case you must SSH into one of the nodes in the cluster, and then you can run one of the following:
curl FLASK_SERVICE_IP:5000/
curl FLASK_SERVICE_IP:5000/query
curl FLASK_SERVICE_IP:5000/bad
to see the Flask application at work
In Google Cloud, you can simply do this from the compute nodes console page by clicking the SSH button next to a node.
The Datadog agent container should now be deployed, acting as a collector and middleman between the services and Datadog's backend. Both through actions -- curling the endpoints -- and by doing nothing, metrics will be generated and directed to the Datadog account corresponding to your supplied API key.
Below is a quick discussion on some points of interest
This part pertains to the ingestion of timeseries data, status checks, and events.
By deploying the agent from the Datadog container image referenced in the agent_daemon.yaml file, the agent automatically comes prepackaged with system-level (CPU, memory, IO, disk), Kubernetes, and Docker checks.
The gist of the setup portion is:
- Deploy the agent DaemonSet with the proper environment variables, volume, and volumeMount arguments
- Deploy relevant applications with annotations
- Validate that metrics reach the agent and end up in the Datadog application
A volumeMount and volume for the proc directory are required from the host level.
In the Datadog web application you can reference the host map, and filter on the particular hostname to see what is going on.
The docker.sock and cgroup volumes and volumeMounts are also required to be attached in the DaemonSet.
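As a rough sketch, assuming the mount names and host paths used by Datadog's standard agent manifest (the authoritative version here is agent_daemon.yaml), those pieces look like this:

```yaml
# Container-level mounts in the agent DaemonSet (volume names are illustrative)
volumeMounts:
  - name: dockersocket
    mountPath: /var/run/docker.sock
  - name: proc
    mountPath: /host/proc
    readOnly: true
  - name: cgroup
    mountPath: /host/sys/fs/cgroup
    readOnly: true
# Pod-level volumes backing them with host paths
volumes:
  - name: dockersocket
    hostPath:
      path: /var/run/docker.sock
  - name: proc
    hostPath:
      path: /proc
  - name: cgroup
    hostPath:
      path: /sys/fs/cgroup
```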
Application/service pods and containers go up and down. The Datadog agent traditionally requires modifying the hardcoded host/port values in the corresponding configuration files (example with postgres) and restarting the agent to collect data for installed software.
Rather than having a Mechanical Turk sit on standby ready to make the changes, the containerized agent makes this process automagic using Autodiscovery, where the agent monitors the annotations of deployments and automatically establishes checks as pods come and go.
To set up Autodiscovery, you will need to set up volumes and mountPaths to hold the ephemeral configuration files.
Autodiscovery of the postgres pod in this repo is straightforward: simply annotate the deployment configuration file with the typical check sections required.
Note that the container identifier in the annotation keys must be identical to the container name for the agent to properly connect to the container.
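For illustration, assuming a container named postgres and placeholder credentials (the real values live in postgres_deployment.yaml), the annotations follow Datadog's Autodiscovery template-variable pattern:

```yaml
# Pod template metadata on the postgres deployment -- the "postgres" prefix must match the container name
metadata:
  annotations:
    ad.datadoghq.com/postgres.check_names: '["postgres"]'
    ad.datadoghq.com/postgres.init_configs: '[{}]'
    ad.datadoghq.com/postgres.instances: '[{"host": "%%host%%", "port": 5432, "username": "datadog", "password": "EXAMPLE_PASSWORD"}]'
```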
Many services (like Kubernetes itself) utilize Prometheus to expose custom internal metrics specific to the service.
You can see what the structure of the Prometheus metrics looks like by running:
minikube ssh
curl localhost:10255/metrics
Datadog has the innate capability to read the text format of Prometheus-produced metrics and turn it into custom metrics. The scope of this repo doesn't touch on it too much, but collecting Prometheus metrics can simply be done via annotations at the deployment level as seen here -- note this example fails on purpose so you can see what an error looks like in agent status
Read more about it via our documentation
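A hedged sketch of what such annotations can look like, using the agent's prometheus check (the container name, endpoint, namespace, and metric names below are placeholders, not the values this repo ships):

```yaml
# Pod template metadata -- "my-container" must match the container name (placeholder here)
metadata:
  annotations:
    ad.datadoghq.com/my-container.check_names: '["prometheus"]'
    ad.datadoghq.com/my-container.init_configs: '[{}]'
    ad.datadoghq.com/my-container.instances: '[{"prometheus_url": "http://%%host%%:10255/metrics", "namespace": "example", "metrics": ["go_goroutines"]}]'
```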
Live Process/Container Monitoring is the capability to get container- and process-level granularity for all monitored systems. This feature provides not only standard system-level metrics at the process/container level, but also visibility into the initial run commands used to set up the process/container.
Simply add the DD_PROCESS_AGENT_ENABLED environment variable in the DaemonSet to turn on this feature.
Sometimes passwords are revealed in those initial run commands; the agent comes equipped with scrubbing to remove a standard set of sensitive arguments.
We again need docker.sock to get container information.
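A minimal sketch of the flag in the agent DaemonSet's container spec:

```yaml
# Agent DaemonSet container env -- turns on the process agent
env:
  - name: DD_PROCESS_AGENT_ENABLED
    value: "true"
```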
run kubectl get pods
to get the pod name of the agent container.
run kubectl exec -it POD_NAME bash
to hop into the container
run agent status
to see the status summary; look for the integrations section to confirm the agent is collecting metrics
run cat /var/log/datadog/agent.log
to see logs pertaining to the agent
The same agent that handles infrastructure metrics can also receive trace data from a designated APM module -- these modules sit on top of your applications and forward payloads to a local Datadog agent, which relays them to the Datadog backend.
Applications spin up in pods, and their payloads need to reach the agent pod. In this example, we set up a route between the pods with a port exposed through the host level.
- Enable the agent to receive traces on the agent deployment side via environment variables
- Create a port connection to the host via port 8126 (see the sketch after these notes)
- Provide the app deploy file with a link to the host level for port 8126 via environment variables, so that the application can reference the host/port values when firing traces
- Install the ddtrace module in the application
In the app.py code, import the ddtrace module and patch both sqlalchemy and the Flask app object.
Note: the trace module is one implementation, as all modules are; if certain spans are not being captured, you can always hardcode them in.
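A rough sketch of the wiring, assuming standard Datadog/ddtrace environment variables and the Kubernetes downward API (the actual manifests are agent_daemon.yaml and app_deployment.yaml; names here are illustrative):

```yaml
# Agent DaemonSet sketch: accept traces and expose 8126 on each node
env:
  - name: DD_APM_ENABLED
    value: "true"
  - name: DD_APM_NON_LOCAL_TRAFFIC   # may be needed so the agent accepts traces from other pods
    value: "true"
ports:
  - containerPort: 8126
    hostPort: 8126
    name: traceport
---
# App Deployment sketch: hand the node IP to the app so ddtrace knows where to send spans
env:
  - name: DATADOG_TRACE_AGENT_HOSTNAME   # newer ddtrace releases read DD_AGENT_HOST instead
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
```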
run kubectl get pods
to get the pod name of the agent container.
run kubectl exec -it POD_NAME bash
to hop into the container
run agent status
to see the status summary; look for the tracing section to confirm the agent has tracing turned on
run cat /var/log/datadog/trace-agent.log
to see logs pertaining to the trace agent
Head over to the Trace Services page and look for your service-level metrics and traces!
The same agent that periodically polls infrastructure metrics and live process metrics, and relays trace transactions, can also be set up to tail logs.
Simply turn on log collection via the DD_LOGS_ENABLED environment variable in the agent daemon file.
Tailing logs from Flask is pretty easy using Kubernetes ConfigMaps.
Simply set up the mountPath directory, connected via the host, and the volume in the agent daemon.
Set up the corresponding volume and mountPath on the Flask deployment so the Flask pod's log output can reach the agent pod via the host.
In the app, set up the app-level logging configuration so logs are properly pushed to the right log file when routines are run (example).
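A minimal sketch of the agent-side pieces, assuming the log file is shared through a hostPath directory and that the ConfigMap carries a file-tailing check configuration (directory, file, and tag values here are illustrative, not the repo's actual values):

```yaml
# Agent DaemonSet sketch: turn on log collection and mount the shared log directory
env:
  - name: DD_LOGS_ENABLED
    value: "true"
volumeMounts:
  - name: flask-logs              # illustrative name; backed by a hostPath the Flask pod also writes to
    mountPath: /var/log/flask
volumes:
  - name: flask-logs
    hostPath:
      path: /var/log/flask
---
# Check configuration sketch (e.g. delivered via the ConfigMap): tell the agent which file to tail
logs:
  - type: file
    path: /var/log/flask/app.log  # illustrative path
    service: flask-app
    source: python
```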
run kubectl get pods
to get the pod name of the agent container.
run kubectl exec -it POD_NAME bash
to hop into the container
run agent status
to see the status summary; look for the logging section to confirm the agent has logs turned on
run cat /var/log/datadog/agent.log
to see relevant log instances pertaining to the log agent
Navigate to the Logs Explorer page and look for the Flask logs!