ziquanmiao/kubernetes_datadog


Intro

This workshop assumes you have access to an existing Kubernetes cluster, but it also quickly walks through spinning one up on Google Kubernetes Engine (GKE) using the web console.

In addition, you will need a Datadog account with access to an API key -- Start a Free Trial Here!

This repo showcases a Kubernetes-based path to deploying a simple Flask app service that returns sample text stored in a separate Postgres container.

The goal of this repo is to demonstrate the steps involved in installing the Datadog Agent and showcasing the product's Infrastructure Monitoring, Application Performance Monitoring, Live Process/Container Monitoring, and Log Monitoring capabilities in a Kubernetes and Docker based environment.

This repo makes no accommodations for proxy scenarios and does not fully cover situations where machines cannot pull packages from the internet.

Steps to Success

The gist of the setup portion is:

  1. Spin up GKE Instance
  2. Deploy!

Set up - Skippable if you already have a cluster

Start a GKE instance via the Google Cloud console. To get started, you can simply use the Standard Cluster template with something like 3 nodes, which is enough to run this demo.

Find the cluster instance in the page and click the Connect button, followed by the Run in Cloud Shell option, to spin up a browser-based shell that interfaces with the cluster.

You will essentially SSH into a gcloud virtual machine terminal that sits outside the cluster itself; however, Google preloads it with a command that scopes the local kubectl interface to interact directly with the cluster.

Store the Datadog API key in a Kubernetes secret so it's not directly in the deployment code:

kubectl create secret generic datadog-api --from-literal=token=___INSERT_API_KEY_HERE___

The key is then referenced in the DaemonSet file here
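
For reference, a minimal sketch of how the DaemonSet's env section typically consumes that secret (the actual agent_daemon.yaml may name things slightly differently):

env:
  - name: DD_API_KEY            # environment variable the Datadog agent reads
    valueFrom:
      secretKeyRef:
        name: datadog-api       # secret created by the command above
        key: token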

Deploy Things

Deploy the postgres container

kubectl create -f postgres_deployment.yaml

Deploy the application container and expose it as a service. This also creates a ConfigMap for the Logs product.

kubectl create -f app_deployment.yaml

Deploy the Datadog agent container

kubectl create -f agent_daemon.yaml

Deploy a non-functional pause container to demonstrate Datadog Autodiscovery via a simple HTTP check against www.google.com

kubectl create -f pause.yaml

Deploy the Kubernetes state manifests to demonstrate the kubernetes_state check

kubectl create -f kubernetes

And we are done!

Use the Flask App

The Flask app offers three endpoints that return some text: FLASK_SERVICE_IP:5000/, FLASK_SERVICE_IP:5000/query, and FLASK_SERVICE_IP:5000/bad

Run kubectl get services to find the FLASK_SERVICE_IP address of the flask application service

Cluster Accessible via Internet

If you used the default GKE template or know the cluster is accessible via the internet, you can use the IP found in the EXTERNAL-IP column as FLASK_SERVICE_IP

Then hit one of the following:

curl FLASK_SERVICE_IP:5000/
curl FLASK_SERVICE_IP:5000/query
curl FLASK_SERVICE_IP:5000/bad

to see the Flask application at work

Cluster Not Accessible Via Internet

Otherwise, you can use the CLUSTER-IP column as FLASK_SERVICE_IP.

You must SSH into one of the nodes in the cluster and then you can run one of the following:

curl FLASK_SERVICE_IP:5000/
curl FLASK_SERVICE_IP:5000/query
curl FLASK_SERVICE_IP:5000/bad

to see the Flask application at work

In Google Cloud, you can do this by going to the Compute Engine console page and clicking the SSH button next to a node.

Some points of interest

The Datadog agent container should now be deployed, acting as a collector and middleman between the services and Datadog's backend. Whether you take actions -- curling the endpoints -- or do nothing at all, metrics will be generated and sent to the Datadog account corresponding to your supplied API key.

Below is a quick discussion on some points of interest

Infrastructure Product

This part pertains to the ingestion of timeseries data, status checks, and events.

By deploying the agent referencing the Datadog container image in the agent_daemon.yaml file, the agent automatically comes prepackaged with system-level (CPU, memory, I/O, disk), Kubernetes, and Docker checks.

The gist of the setup portion is:

  1. Deploy the agent DaemonSet with the proper environment variables, volume, and volumeMount arguments
  2. Deploy the relevant applications with annotations
  3. Validate that metrics reach the agent and end up in the Datadog application

System Metric Requirements

A volumeMount and volume for the host's proc directory are required, as sketched below.
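
A minimal sketch of those DaemonSet entries, assuming the /host/proc mount path commonly used in Datadog agent manifests:

containers:
  - name: datadog-agent
    volumeMounts:
      - name: procdir
        mountPath: /host/proc       # agent reads the host-level /proc for system metrics
        readOnly: true
volumes:
  - name: procdir
    hostPath:
      path: /proc                   # host's proc directory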

In the Datadog web application, you can reference the host map and filter on the particular hostname to see what is going on.

Kubernetes/Docker

docker.sock and cgroup volumeMounts and volumes must also be attached in the DaemonSet, roughly as follows.
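
A sketch using the conventional mount paths (the repo's agent_daemon.yaml may differ):

containers:
  - name: datadog-agent
    volumeMounts:
      - name: dockersocket
        mountPath: /var/run/docker.sock   # container metadata via the Docker socket
      - name: cgroups
        mountPath: /host/sys/fs/cgroup    # per-container resource accounting
        readOnly: true
volumes:
  - name: dockersocket
    hostPath:
      path: /var/run/docker.sock
  - name: cgroups
    hostPath:
      path: /sys/fs/cgroup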

Autodiscovery

Application/service pods and containers go up and down. The Datadog agent traditionally requires modifying hardcoded host/port values in the corresponding configuration files (example with postgres) and an agent restart in order to collect data for installed software.

Rather than having a Mechanical Turk sit on standby ready to make those changes, the containerized agent makes this process automatic using Autodiscovery: the agent monitors the annotations of deployments and automatically establishes checks as pods come and go.

To set up Autodiscovery, you will need to set up volumes and mountPaths to hold the ephemeral configuration files.

Postgres Example

Autodiscovery of the Postgres pod in this repo is straightforward: simply annotate the pod template in the deployment file with the typical check configuration sections.

Note that the identifier used in the annotation keys must be identical to the container name for the agent to properly connect to the container (see the sketch below).
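
A minimal sketch of such annotations on the pod template, assuming the container is named postgres and that the username/password match whatever postgres_deployment.yaml actually provisions (older Agent 5 images used the service-discovery.datadoghq.com annotation prefix instead):

metadata:
  annotations:
    ad.datadoghq.com/postgres.check_names: '["postgres"]'
    ad.datadoghq.com/postgres.init_configs: '[{}]'
    ad.datadoghq.com/postgres.instances: '[{"host": "%%host%%", "port": 5432, "username": "datadog", "password": "REPLACE_ME"}]'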

Prometheus

Many services (like Kubernetes itself) utilize Prometheus to expose custom internal metrics specific to the service.

You can see what the structure of Prometheus metrics looks like by running:

minikube ssh
curl localhost:10255/metrics

Datadog has the innate capability to read Prometheus-formatted metrics and turn them into custom metrics. This repo doesn't touch on it much, but collecting Prometheus metrics can simply be done via annotations at the deployment level, as seen here -- note this example fails on purpose so you can see what an error looks like in agent status.

Read more about it via our documentation
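
For illustration only, a sketch of what such annotations can look like, assuming a container named my-app that exposes Prometheus metrics on port 9090 (the repo's intentionally broken example will differ):

metadata:
  annotations:
    ad.datadoghq.com/my-app.check_names: '["prometheus"]'
    ad.datadoghq.com/my-app.init_configs: '[{}]'
    ad.datadoghq.com/my-app.instances: '[{"prometheus_url": "http://%%host%%:9090/metrics", "namespace": "my_app", "metrics": ["*"]}]'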

Live Processes/Containers

Live Process/Container Monitoring is the capability to get container- and process-level granularity for all monitored systems. This feature provides not only standard system-level metrics at the process/container level, but also visibility into the initial run commands used to start each process/container.

Simply add the DD_PROCESS_AGENT_ENABLED environment variable in the DaemonSet to turn on this feature.
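
In the DaemonSet's env section this is a single entry:

env:
  - name: DD_PROCESS_AGENT_ENABLED
    value: "true"                 # enables the process agent alongside the core agent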

Requirements

Sometimes passwords are revealed in the initial run commands; the agent comes equipped to scrub a standard set of sensitive arguments.

We again need docker.sock to get container information.

Validation

run kubectl get pods to get the pod name of the agent container.

run kubectl exec -it POD_NAME bash to hop into the container

run agent status to see the status summary and look for the integrations section to confirm the agent is collecting metrics

run cat /var/log/datadog/agent.log to see logs pertaining to the agent

APM Tracing

The same agent that handles infrastructure metrics can also receive trace data from a designated APM module -- these modules sit on top of your applications and forward trace payloads to a local Datadog agent, which relays them to the Datadog backend.

Applications spin up in their own pods, and their trace payloads need to reach the agent pod on the same node. In this example, we set up a route between the pods with a port exposed at the host level.

Requirements

From the agent daemon side

Enable the agent to receive traces via environment variables in the DaemonSet

Expose a port connection to the host on port 8126, as sketched below
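
A sketch of both pieces in the agent container spec (DD_APM_ENABLED is the Agent 6 variable name; older images use different settings):

env:
  - name: DD_APM_ENABLED
    value: "true"           # turn on the trace agent
ports:
  - containerPort: 8126     # trace intake port inside the container
    hostPort: 8126          # exposed on each node so app pods can reach it
    name: traceport
    protocol: TCP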

From the application Side

Provide the deploy file with a link to the host on port 8126 via environment variables, so the application knows which host/port values to fire traces at (see the sketch below)
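
One common way is the Kubernetes downward API, which injects the node's IP into the app container. The exact variable the tracer reads depends on the ddtrace version; DD_AGENT_HOST and DD_TRACE_AGENT_PORT shown here are an assumption, and older releases may instead expect the host/port to be passed to the tracer in code:

env:
  - name: DD_AGENT_HOST              # assumed variable name read by newer ddtrace versions
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP     # node IP, where port 8126 is exposed
  - name: DD_TRACE_AGENT_PORT
    value: "8126"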

Flask specific

Install the ddtrace module

In the app.py code, import the ddtrace module and patch both SQLAlchemy and the Flask app object.

Note: the trace module is one instrumentation implementation, as all modules are. If certain spans are not being captured automatically, you can always add them in by hand.

Validation

Agent Side

run kubectl get pods to get the pod name of the agent container.

run kubectl exec -it POD_NAME bash to hop into the container

run agent status to see the status summary and look for the tracing section to confirm the agent has tracing turned on

run cat /var/log/datadog/trace-agent.log to see logs pertaining to the trace agent

Datadog Side

Head over to the Trace Services page and look for your service-level metrics and traces!

Logs

The same agent that periodically polls for infrastructure metrics, collects live process metrics, and relays trace transactions can also be set up to tail logs.

Simply turn on log collection via the DD_LOGS_ENABLED environment variable in the agent DaemonSet file.
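
In the DaemonSet's env section:

env:
  - name: DD_LOGS_ENABLED
    value: "true"                 # turns on the log collection agent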

Tailing Flask logs via Config Maps

Tailing logs from Flask is pretty easy using Kubernetes ConfigMaps.

Agent Side

Simply set up the mountPath directory connected through the host, along with the volume, in the agent DaemonSet; a sketch follows.
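
A sketch, assuming the Flask logs land in a shared host-level directory such as /var/log/flask (the actual path and the ConfigMap-provided log configuration in this repo may differ):

volumeMounts:
  - name: flasklogs
    mountPath: /var/log/flask     # agent tails log files from this shared directory
volumes:
  - name: flasklogs
    hostPath:
      path: /var/log/flask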

Flask side

Set up the corresponding volume and mountPath so the Flask pod is connected to the agent pod through the host
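
A matching sketch on the Flask deployment side, using the same assumed /var/log/flask path as above:

containers:
  - name: flask-app                 # illustrative container name
    volumeMounts:
      - name: flasklogs
        mountPath: /var/log/flask   # the app writes its log file here
volumes:
  - name: flasklogs
    hostPath:
      path: /var/log/flask          # same host directory the agent tails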

In the app, set up the app-level logging configuration so logs are properly pushed to the right log file when routines are run (example)

Validation

Agent Side

run kubectl get pods to get the pod name of the agent container.

run kubectl exec -it POD_NAME bash to hop into the container

run agent status to see the status summary and look for the logging section to confirm the agent has log collection turned on

run cat /var/log/datadog/agent.log to see relevant log instances pertaining to the log agent

Datadog Side

Navigate to the Log Explorer page and look for Flask logs!
