Skip to content

kevin7674/nvidia_smi_exporter

 
 

Repository files navigation

nvidia_smi_exporter

nvidia-smi metrics exporter for Prometheus 機器需已安裝 nvidia_smi

1.在監控目標主機下載並運行 nvidia_smi_exporter

git clone https://github.com/kevin7674/nvidia_smi_exporter.git
cd nvidia_smi_exporter
./nvidia_smi_exporter 9101 &

2.修改 vi /etc/rc.local

./nvidia_smi_exporter 9101 &

node-exporter

docker run -d -p 9100:9100 prom/node-exporter:v1.1.0

Prometheus

創建volume

docker volume create prometheus-volume

檢查volume

docker volume ls

先編好/root/prometheus.yml 中的監控目標再運行

docker run -d -p 9090:9090 -v /root/prometheus.yml:/etc/prometheus/prometheus.yml --volume prometheus-volume:/prometheus prom/prometheus:v1.8.2

Grafana 8.3.0

docker run -d -p 3000:3000 --name grafana grafana/grafana:8.3.0

============================================================

============================================================

nvidia_smi_exporter Image

> docker build -t="nvidia_smi_exporter:0" .

Docker Run

> nvidia-docker run -d --net="host" nvidia_smi_exporter:0 --restart=always

Default port is 9101

curl localhost:9101/metrics

temperature_gpu{gpu="Tesla V100-SXM2-16GB[0]"} 34
utilization_gpu{gpu="Tesla V100-SXM2-16GB[0]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[0]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[0]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[0]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[0]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[1]"} 37
utilization_gpu{gpu="Tesla V100-SXM2-16GB[1]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[1]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[1]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[1]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[1]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[2]"} 36
utilization_gpu{gpu="Tesla V100-SXM2-16GB[2]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[2]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[2]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[2]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[2]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[3]"} 33
utilization_gpu{gpu="Tesla V100-SXM2-16GB[3]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[3]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[3]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[3]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[3]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[4]"} 36
utilization_gpu{gpu="Tesla V100-SXM2-16GB[4]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[4]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[4]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[4]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[4]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[5]"} 37
utilization_gpu{gpu="Tesla V100-SXM2-16GB[5]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[5]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[5]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[5]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[5]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[6]"} 39
utilization_gpu{gpu="Tesla V100-SXM2-16GB[6]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[6]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[6]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[6]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[6]"} 10
temperature_gpu{gpu="Tesla V100-SXM2-16GB[7]"} 38
utilization_gpu{gpu="Tesla V100-SXM2-16GB[7]"} 0
utilization_memory{gpu="Tesla V100-SXM2-16GB[7]"} 0
memory_total{gpu="Tesla V100-SXM2-16GB[7]"} 16152
memory_free{gpu="Tesla V100-SXM2-16GB[7]"} 16142
memory_used{gpu="Tesla V100-SXM2-16GB[7]"} 10

Exact command

nvidia-smi --query-gpu=name,index,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv,noheader,nounits

Prometheus example config

- job_name: "gpu_exporter"
  static_configs:
  - targets: ['localhost:9101']

About

nvidia-smi exporter for Prometheus

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Go 87.0%
  • Dockerfile 13.0%