Collecting host and container metrics using CAdvisor and Collectd.
Motivation: I needed a quick Collectd solution to gather host and container metrics without installing or running software directly on the host system (outside of a container).
Collectd is a good solution for gathering metrics from many different sources. It has a wealth of plugins for both active and passive metric collection. There are also plugins which support forwarding metrics to various backends. Not to mention, for me at least, it is already in place as the metrics collection and transport solution.
But, Collectd does not currently provide plugins to:
- collect metrics from the host when Collectd itself is running in a container (not that I found anyway).
- collect metrics from (other) containers running on the host system.
CAdvisor proved a good solution for exposing a base set of metrics from both the host system and the other (Docker) containers.
Since Collectd configurations are dynamic and target specific, a mounted volume is used initially. This requirement will be eliminated as configuration support via etcd and consul is added. For now the configurations are distributed through current orchestration methods (ansible, puppet, chef, salt, etc.).
- Deploy etc-collectd to the target host and configure it.
- Start the containers. Manual and systemd instructions are below the configuration section.
The main Collectd configuration is etc-collectd/collectd.conf.example.
Hostname or FQDNLookup must be set prior to starting Collectd.
cd etc-collectd
cp collectd.conf.example collectd.conf
- Edit the resulting collectd.conf accordingly:
  - Static IP and valid forward and reverse DNS for the hostname to be used.
    - Comment out Hostname
      #Hostname
    - Ensure FQDNLookup is set to true
      FQDNLookup true
  - Dynamic IP and ephemeral hostname to be used.
    - Update Hostname
      Hostname "test.local"
    - Ensure FQDNLookup is set to false
      FQDNLookup false
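To double-check which of the two options ended up active after editing, something like the following can be used. It is only a sketch: the function name active_host_settings is made up for illustration, and it assumes active settings are uncommented lines that start with Hostname or FQDNLookup.

```shell
# Hypothetical helper (not part of this repository): print the
# uncommented Hostname / FQDNLookup lines from a collectd.conf.
# usage: active_host_settings <path-to-collectd.conf>
active_host_settings() {
    # Commented lines start with '#', so they do not match the pattern.
    grep -E '^[[:space:]]*(Hostname|FQDNLookup)([[:space:]]|$)' "$1"
}

# Example (run from the etc-collectd directory):
#   active_host_settings collectd.conf
```

For the static IP case only an FQDNLookup true line should be printed; for the dynamic IP case both a Hostname line and FQDNLookup false should appear.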
At least one Collectd writer plugin, in etc-collectd/conf.d, must be enabled for Collectd to run correctly.
- Copy one or more of the example (network, write_graphite, write_http, csv) configurations to file(s) with a .conf extension.
- Edit the resulting configuration file.
- Note, configurations are not dynamically loaded by Collectd. If the Collectd container is already running, a restart will be required to pick up configuration changes and the addition of new configuration files.
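Since a missing writer configuration is easy to overlook, a small pre-flight check before starting the container may help. The sketch below only tests that conf.d contains at least one file ending in .conf; it does not verify that the file actually enables a writer plugin. The function name has_enabled_conf is hypothetical.

```shell
# Hypothetical pre-flight check: succeed only if the given conf.d
# directory contains at least one file ending in .conf.
# usage: has_enabled_conf <path-to-conf.d>
has_enabled_conf() {
    for f in "$1"/*.conf; do
        # If nothing matches, the glob stays literal and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example:
#   has_enabled_conf etc-collectd/conf.d || echo "no .conf files enabled"
```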
InfluxDB also has a Graphite plugin which can be used by this configuration.
- Determine fqdn or ip of the destination (graphite or influxdb) host.
- Determine destination protocol and port (e.g. tcp 2003).
cd etc-collectd/conf.d && cp write_graphite.conf.example write_graphite.conf
- Edit write_graphite.conf
  - Update applicable settings for the target. Host, Port, and Protocol at a minimum.
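Before pointing Collectd at the destination, it can help to confirm that the host and port accept data. The sketch below formats a single line of Graphite's plaintext protocol (metric path, value, Unix timestamp). The function name, the metric path test.collectd.ping, and the host graphite.example.com are placeholders. Sending the line with nc assumes the destination listens on TCP and that nc is installed.

```shell
# Hypothetical helper: print one Graphite plaintext-protocol line.
# usage: graphite_line <metric.path> <value> [unix-timestamp]
graphite_line() {
    # Default to the current time when no timestamp is given.
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Example (placeholder host, port from the example above):
#   graphite_line test.collectd.ping 1 | nc -w 1 graphite.example.com 2003
```

If the metric shows up on the destination, the Host, Port, and Protocol values are good candidates for write_graphite.conf.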
InfluxDB also has a native Collectd plugin which can be used by this configuration.
- Determine fqdn or ip of the destination (graphite or influxdb) host.
- Determine destination port (e.g. collectd's default is 25826).
cd etc-collectd/conf.d && cp network.conf.example network.conf
- Edit network.conf
  - Update the Server setting (one-line or block form) and any others applicable to the target. Note, the format for Server is Server "IP||FQDN" "port".
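As an illustration of the one-line Server form, the sketch below prints a minimal network plugin configuration. It is an assumption about what a working file could look like, not a copy of network.conf.example, which may contain additional settings. The function name network_conf and the address 192.0.2.10 are placeholders.

```shell
# Hypothetical helper: print a minimal network plugin configuration
# using the one-line Server form.
# usage: network_conf <ip-or-fqdn> [port]
network_conf() {
    # Port falls back to 25826, collectd's default.
    cat <<EOF
LoadPlugin network
<Plugin network>
  Server "$1" "${2:-25826}"
</Plugin>
EOF
}

# Example (review the output before using it):
#   network_conf 192.0.2.10
```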
- Determine destination URL.
cd etc-collectd/conf.d && cp write_http.conf.example write_http.conf
- Edit write_http.conf
  - Update applicable setting(s), URL at a minimum.
cd etc-collectd/conf.d && cp csv.conf.example csv.conf
- Edit csv.conf
  - Update DataDir, default is /opt/collectd/csv. Update to write to a mounted volume if the data is needed outside of the container (export, easier access for testing, etc.).
This configures the script which gathers metrics from CAdvisor and emits them to Collectd. The documentation and descriptions of the settings are contained within the YAML file.
cd etc-collectd && cp cadvisor.yaml.example cadvisor.yaml
- Edit cadvisor.yaml
  - Read the descriptions and update settings accordingly.
From wherever etc-collectd was placed:
sudo docker run --name=cadvisor \
-v /:/rootfs:ro \
-v /var/run:/var/run:rw \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
-d google/cadvisor:latest
sudo docker run --name=collectd \
-v $(pwd)/etc-collectd:/etc/collectd \
-v /var/run/docker.sock:/var/run/docker.sock \
-d maier/collectd:latest
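After starting both containers, a quick way to confirm they are running is to compare the names reported by docker ps with the two names used above. This is a rough sketch; the function name missing_containers is made up, and it only checks names, not container health.

```shell
# Hypothetical helper: read container names (one per line) on stdin
# and print whichever of cadvisor / collectd is not among them.
missing_containers() {
    names=$(cat)
    for want in cadvisor collectd; do
        # -x matches whole lines only, -F treats the name literally.
        printf '%s\n' "$names" | grep -qxF "$want" || echo "$want"
    done
}

# Example:
#   sudo docker ps --format '{{.Names}}' | missing_containers
```

No output means both containers were listed as running.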
The collectd.service and cadvisor.service unit files from this repository can be used as a starting point. Note, modify the collectd unit file to ensure the path for etc-collectd points to where the configuration files are actually located (default is /conf/etc-collectd).
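The path change can be scripted instead of edited by hand. The sketch below rewrites every occurrence of the default /conf/etc-collectd in a unit file and prints the result. The function name retarget_unit and the path /srv/metrics/etc-collectd are placeholders, and it assumes the new path contains no | or & characters, which are special to sed here.

```shell
# Hypothetical helper: print a unit file with the default
# /conf/etc-collectd path replaced by the given directory.
# usage: retarget_unit <unit-file> <absolute-path-to-etc-collectd>
retarget_unit() {
    # '|' is the sed delimiter so slashes in the paths need no escaping.
    sed "s|/conf/etc-collectd|$2|g" "$1"
}

# Example (placeholder path, output written to a new file for review):
#   retarget_unit collectd.service /srv/metrics/etc-collectd > collectd.service.new
```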
- shell access
  - CAdvisor: docker exec -it cadvisor /bin/sh, busybox based, use opkg-install to add additional packages.
  - Collectd: docker exec -it collectd /bin/sh, alpine based, use apk-install to add additional packages.
- verify docker socket in collectd container
docker exec -it collectd /bin/sh
apk-install socat
echo -e "GET /containers/json HTTP/1.1\r\n" | socat unix-connect:/var/run/docker.sock -
- verify cadvisor (from host)
curl -s "$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' cadvisor):8080/api/v2.0/machine" | python -m json.tool
- list cadvisor /system.slice/subcontainers (from host), useful when editing the system_services: list in cadvisor.yaml
curl -s "$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' cadvisor):8080/api/v1.3/containers/system.slice" | python -c 'import json,sys,pprint;obj=json.load(sys.stdin);pprint.pprint(obj["subcontainers"]);'
In no particular priority order...
- add mesos metrics collection plugin for Collectd
- ansible playbook
- option to pull container metrics from Docker or CAdvisor
- configure from consul
- configure from etcd