This repository contains an example of programmatic monitoring of Nvidia GPUs in C++ using the NVML library.
Refer to the article "Monitoring Nvidia GPUs using API" for details.
The core deliverables of this project live in separate directories:
monitor
Collects metrics from GPU devices.
Console application, written in C++17.
data_extractor
Extracts metrics data from monitor's output.
Console application, written in Python 3.
device_data_filter
Filters metrics of a specific device from the full data set obtained via data_extractor.
Console application, written in Python 3.
data_visualizer
Plots metrics of a specific GPU device.
Console application, written in Python 3.
This section briefly describes the steps needed to get each deliverable ready for execution.
The monitor component is written in C++17, with build instructions defined in cmake.
Hence, one needs cmake and a C++ compiler supporting C++17 (for example via -std=c++17 or /std:c++17) to build the monitor.
With both installed, the monitor build steps are the following:
mkdir build
cd build
cmake ..
cmake --build .
The data_extractor component uses Python 3.7+.
It has no build steps.
The device_data_filter component uses Python 3.7+.
Its dependencies are listed at:
device_data_filter/requirements.txt
Dependencies installation example:
pip install -r device_data_filter/requirements.txt
The data_visualizer component uses Python 3.7+.
Its dependencies are listed at:
data_visualizer/requirements.txt
Dependencies installation example:
pip install -r data_visualizer/requirements.txt
This section briefly describes how to use the components provided by this repository.
The NVML shared library must be present in the system, and its directory must be listed in the system's PATH variable, in order to run the monitor.
Usually, it comes installed with the graphics driver on Windows and has to be installed separately on Linux-based systems.
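Before running the monitor, it can help to check whether the NVML shared library is loadable at all. The following is a minimal sketch, not part of this repository. It assumes the conventional library file names shipped with default driver installs (nvml.dll on Windows, libnvidia-ml.so.1 on Linux); the helper names are hypothetical. The result only tells whether this Python process can load the library with its own loader search rules, which may differ from how the monitor resolves it.

```python
import ctypes
import sys


def nvml_library_name(platform: str = sys.platform) -> str:
    """Return the conventional NVML file name for the given platform string."""
    # Assumption: default driver installs ship NVML under these names.
    return "nvml.dll" if platform.startswith("win") else "libnvidia-ml.so.1"


def nvml_is_loadable(name: str = "") -> bool:
    """Try to load the library through the system loader and report success."""
    try:
        ctypes.CDLL(name or nvml_library_name())
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    print("NVML loadable:", nvml_is_loadable())
```

If the check prints False on a machine with a working driver, the library is probably installed in a directory the loader does not search.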
The monitor does not take any parameters.
Basic usage:
./monitor
Watching the monitor output and persisting logs on Windows (PowerShell):
./monitor | Tee-Object -FilePath "monitor.log"
Watching the monitor output and persisting logs on Linux-based systems:
./monitor | tee "monitor.log"
The executable of the data_extractor component is located at:
./data_extractor/run.py
Its usage doc is listed below:
usage: run.py [-h] [monitor_log] [extracted_data]

Extract data from monitor log

positional arguments:
  monitor_log     path to monitor log file (default: -)
  extracted_data  path to extracted data file (default: -)

optional arguments:
  -h, --help      show this help message and exit
As stated in the doc, the component accepts 2 positional arguments.
Both are optional and default to the standard streams: stdin for monitor_log and stdout for extracted_data.
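For readers curious how such an interface can be built, here is a minimal argparse sketch that produces a similar usage doc. It is an illustration under assumptions, not the actual source of data_extractor/run.py; the function names build_parser and open_stream are hypothetical.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with two optional positionals that default to '-'."""
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Extract data from monitor log",
        # Appends "(default: ...)" to each argument's help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # nargs="?" makes a positional argument optional.
    parser.add_argument("monitor_log", nargs="?", default="-",
                        help="path to monitor log file")
    parser.add_argument("extracted_data", nargs="?", default="-",
                        help="path to extracted data file")
    return parser


def open_stream(path: str, mode: str):
    """Map '-' to stdin or stdout; open any other path as a regular file."""
    if path == "-":
        return sys.stdin if "r" in mode else sys.stdout
    return open(path, mode)


if __name__ == "__main__":
    args = build_parser().parse_args()
    source = open_stream(args.monitor_log, "r")
    sink = open_stream(args.extracted_data, "w")
    for line in source:
        sink.write(line)  # placeholder: a real extractor would parse the line
```

Treating "-" as a standard stream is a common command-line convention and is what makes both explicit and implicit piping work.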
Example of reading from stdin and writing to stdout:
cat /path/to/captured/monitor.log | ./data_extractor/run.py
Same with specifying std streams explicitly:
cat /path/to/captured/monitor.log | ./data_extractor/run.py - -
Example of using files:
./data_extractor/run.py /path/to/captured/monitor.log /path/to/output/data_all.csv
Or same using output redirection:
./data_extractor/run.py /path/to/captured/monitor.log > /path/to/output/data_all.csv
Example of reading from a file and writing to stdout:
./data_extractor/run.py /path/to/captured/monitor.log
The executable of the device_data_filter component is located at:
./device_data_filter/run.py
Its usage doc is listed below:
usage: run.py [-h] [device_index] [input_data] [output_data]

Filter monitoring data for a specific device

positional arguments:
  device_index  index of device to filter data by (default: 0)
  input_data    path to monitor log file (default: -)
  output_data   path to extracted data file (default: -)

optional arguments:
  -h, --help    show this help message and exit
As stated in the doc, the component accepts 3 optional positional arguments.
device_index defaults to 0 and the rest default to the standard streams, just as in the case of data_extractor.
Example of filtering data of the device with index 0 using files:
./device_data_filter/run.py 0 /path/to/output/data_all.csv /path/to/output/data_0.csv
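Conceptually, the filtering step keeps only the rows that belong to one device. The sketch below illustrates the idea with the standard csv module. It is not the component's actual implementation: the CSV layout and the column names device_index and temperature are assumptions made for illustration only.

```python
import csv
import io


def filter_device_rows(source, sink, device_index=0, column="device_index"):
    """Copy the header and the rows whose `column` equals `device_index`.

    `column` is a hypothetical column name; adjust it to the real data.
    Returns the number of rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(sink, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        # CSV values are strings, so compare against the string form.
        if row[column] == str(device_index):
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Made-up sample data, not real monitor output.
    sample = io.StringIO("device_index,temperature\n0,50\n1,60\n0,51\n")
    result = io.StringIO()
    filter_device_rows(sample, result, device_index=0)
    print(result.getvalue(), end="")
```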
The executable of the data_visualizer component is located at:
./data_visualizer/run.py
Its usage doc is listed below:
usage: run.py [-h] [output_file_path_format] [input_data]

Visualize device data

positional arguments:
  output_file_path_format
                        format of output file paths (default: monitor.{suffix}.png)
  input_data            path to device data file (default: -)

optional arguments:
  -h, --help            show this help message and exit
As stated in the doc, the component accepts 2 optional positional arguments:
- path format for output files, where {suffix} will be replaced with a suffix like full, 0, 1 and so on (see docs/monitor.*.png for output examples).
- path to data filtered by the device_data_filter component.
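The {suffix} placeholder follows the syntax of Python's str.format. Assuming the visualizer expands it that way (the source does not confirm the mechanism), the resulting file names can be previewed with a short sketch; output_paths is a hypothetical helper.

```python
def output_paths(path_format, suffixes):
    """Expand the {suffix} placeholder once per suffix."""
    return [path_format.format(suffix=s) for s in suffixes]


if __name__ == "__main__":
    for path in output_paths("monitor.{suffix}.png", ["full", 0, 1]):
        print(path)
    # prints:
    # monitor.full.png
    # monitor.0.png
    # monitor.1.png
```

When passing such a format on the command line, quote it so the shell does not interpret the braces.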
Example of plotting data for device 0:
./data_visualizer/run.py '/path/to/output/monitor.{suffix}.png' /path/to/output/data_0.csv