
nvidia_gpu_exporter


Nvidia GPU exporter for Prometheus, using the nvidia-smi binary to gather metrics.

Introduction

There are many Nvidia GPU exporters out there; however, they have problems such as not being maintained, not providing pre-built binaries, depending on Linux and/or Docker, targeting enterprise setups (DCGM), and so on.

This is a simple exporter that uses the nvidia-smi(.exe) binary to collect, parse and export metrics. This makes it possible to run it on Windows and get GPU metrics while gaming - no Docker or Linux required.

This project is based on a0s/nvidia-smi-exporter. However, this one is written in Go to produce a single, static binary.

If you are a gamer who's into monitoring, you are in for a treat.

Highlights

  • Works on any system that has the nvidia-smi(.exe) binary - Windows, Linux, macOS... No C bindings required
  • Doesn't even need to run on the monitored machine: can be configured to execute the nvidia-smi command remotely
  • No need for a Docker or Kubernetes environment
  • Auto-discovery of the metric fields nvidia-smi can expose (future-compatible)
  • Comes with its own Grafana dashboard

Visualization

You can use the official Grafana dashboard to see your GPU metrics in a nicely visualized way.

Here's how it looks: Grafana dashboard (screenshot)

Installation

By downloading the binaries (macOS/Linux/Windows)

  1. Go to the releases and download the latest release archive for your platform.
  2. Extract the archive.
  3. Move the binary to somewhere in your PATH.

Sample steps for Linux 64-bit:

$ VERSION=0.1.6
$ wget https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v${VERSION}/nvidia_gpu_exporter_${VERSION}_linux_x86_64.tar.gz
$ tar -xvzf nvidia_gpu_exporter_${VERSION}_linux_x86_64.tar.gz
$ mv nvidia_gpu_exporter /usr/local/bin
$ nvidia_gpu_exporter --help
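
Once the binary is in your PATH, you can start it and do a quick smoke test. The snippet below assumes the exporter runs locally with the default --web.listen-address of :9835 and the default /metrics telemetry path:

$ nvidia_gpu_exporter &
$ curl -s http://localhost:9835/metrics | head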

Installing as a Windows Service

Requirements:

  • Scoop package manager
  • NSSM (the commands below use it to register the exporter as a Windows service)

Installation steps:

  1. Open a privileged PowerShell prompt (right click - Run as administrator)
  2. Run the following commands:
scoop bucket add nvidia_gpu_exporter https://github.com/utkuozdemir/scoop_nvidia_gpu_exporter.git
scoop install nvidia_gpu_exporter/nvidia_gpu_exporter --global
New-NetFirewallRule -DisplayName "Nvidia GPU Exporter" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 9835
nssm install nvidia_gpu_exporter "C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current\nvidia_gpu_exporter.exe"
Start-Service nvidia_gpu_exporter

Installing as a Linux (Systemd) Service

If your Linux distro uses systemd, you can install the exporter as a service using the unit file provided.

Follow these simple steps (a command sketch follows the list):

  1. Download the Linux binary matching your CPU architecture and put it under /usr/local/bin directory.
  2. Drop a copy of the file nvidia_gpu_exporter.service under /etc/systemd/system directory.
  3. Run sudo systemctl daemon-reload
  4. Start and enable the service to run on boot: sudo systemctl enable --now nvidia_gpu_exporter
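
As a rough sketch, the steps above translate to commands like the following. The release version, the x86_64 architecture and the location of the unit file are assumptions - adjust them to your setup and fetch nvidia_gpu_exporter.service from this repository:

$ VERSION=0.1.6
$ wget https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v${VERSION}/nvidia_gpu_exporter_${VERSION}_linux_x86_64.tar.gz
$ tar -xvzf nvidia_gpu_exporter_${VERSION}_linux_x86_64.tar.gz
$ sudo mv nvidia_gpu_exporter /usr/local/bin
$ sudo cp nvidia_gpu_exporter.service /etc/systemd/system/nvidia_gpu_exporter.service
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now nvidia_gpu_exporter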

Running in Docker

You can run the exporter in a Docker container.

For it to work, you will need to ensure the following:

  • The nvidia-smi binary is bind-mounted from the host to the container under its PATH
  • The devices /dev/nvidiaX (depending on the number of GPUs you have) and /dev/nvidiactl are mounted into the container
  • The library files libnvidia-ml.so and libnvidia-ml.so.1 are mounted inside the container. They are typically found under /usr/lib/x86_64-linux-gnu/ or /usr/lib/i386-linux-gnu/. Locate them in your host to ensure you are mounting them from the correct path.

A working example with all of these combined (tested on Ubuntu 20.04):

docker run -d \
--name nvidia_smi_exporter \
--restart unless-stopped \
--device /dev/nvidiactl:/dev/nvidiactl \
--device /dev/nvidia0:/dev/nvidia0 \
-v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so \
-v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
-v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi \
-p 9835:9835 \
utkuozdemir/nvidia_gpu_exporter:0.1.6
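
Once the container is up, you can check its logs and confirm that metrics are exposed on the published port (container name and port as in the example above):

docker logs nvidia_smi_exporter
curl -s http://localhost:9835/metrics | head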

Running in Kubernetes

Using the exporter in Kubernetes is pretty similar to running it in Docker.

You can use the official helm chart to install the exporter.

The chart was tested on the following configuration:

  • Ubuntu Desktop 20.04 with Kernel 5.8.0-55-generic
  • K3s v1.21.1+k3s1
  • Nvidia GeForce RTX 2080 Super
  • Nvidia Driver version 465.27

Note: I didn't have a chance to test it on an enterprise cluster with GPU support. If you have access to one and can give the exporter a try and share the results, I would appreciate it greatly.
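
A minimal install sketch with Helm - REPO_NAME, REPO_URL and CHART_NAME below are placeholders, not real identifiers, so check the official chart's documentation for the actual repository and chart names:

helm repo add REPO_NAME REPO_URL
helm repo update
helm install nvidia-gpu-exporter REPO_NAME/CHART_NAME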

Command Line Reference

The exporter binary accepts the following arguments:

usage: nvidia_gpu_exporter [<flags>]

Flags:
  -h, --help                Show context-sensitive help (also try --help-long and --help-man).
      --web.config.file=""  [EXPERIMENTAL] Path to configuration file that can enable TLS or authentication.
      --web.listen-address=":9835"
                            Address to listen on for web interface and telemetry.
      --web.telemetry-path="/metrics"
                            Path under which to expose metrics.
      --nvidia-smi-command="nvidia-smi"
                            Path or command to be used for the nvidia-smi executable
      --query-field-names="AUTO"
                            Comma-separated list of the query fields. You can find out possible fields by running `nvidia-smi --help-query-gpu`. The value `AUTO` will
                            automatically detect the fields to query.
      --log.level=info      Only log messages with the given severity or above. One of: [debug, info, warn, error]
      --log.format=logfmt   Output format of log messages. One of: [logfmt, json]
      --version             Show application version.
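
For example, to serve metrics on a different port, query a fixed set of fields instead of auto-detection and turn on debug logging (the field list is just an illustration - any fields listed by `nvidia-smi --help-query-gpu` will do):

nvidia_gpu_exporter \
  --web.listen-address=":9836" \
  --query-field-names="uuid,name,temperature.gpu,utilization.gpu,memory.used" \
  --log.level=debug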

Remote scraping configuration

The exporter can be configured to scrape metrics from a remote machine.

An example use case is running the exporter on a Raspberry Pi in your home network while scraping the metrics from your PC over SSH.

The exporter supports arbitrary commands with arguments to produce nvidia-smi-like output. Therefore, configuration is pretty straightforward.

Simply override the --nvidia-smi-command command-line argument (replace SSH_USER and SSH_HOST with SSH credentials):

nvidia_gpu_exporter --nvidia-smi-command "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null SSH_USER@SSH_HOST nvidia-smi"
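
Because the command runs non-interactively, key-based SSH authentication is the practical choice. A variant of the same command that points at a dedicated private key could look like this (the key path is an assumption - use whichever key is authorized on the remote host):

nvidia_gpu_exporter --nvidia-smi-command "ssh -i /home/pi/.ssh/id_ed25519 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null SSH_USER@SSH_HOST nvidia-smi"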

Contributing

See CONTRIBUTING for details.
