zesysman
provides a command-line interface to oneAPI Level Zero System Resource Management services. It supports reading and modifying GPU settings. (The ability to modify settings depends on system privileges.) It is closely tied to the oneAPI Level Zero System Resource Management library: consult the current oneAPI Level Zero Specification to understand important concepts.
To list the current options supported, use the --help
option:
% zesysman --help
By default, zesysman
operates on all devices in the system controlled by an L0 driver. --device
limits operations to specific devices selected by device index or UUID and --driver
similarly limits operations to devices supported by a given driver. The available devices and drivers can be listed as follows:
% zesysman --device list
% zesysman --driver list
The following options query status of various system management components:
Option | Feature Group |
---|---|
-t , --show-temp |
Temperature |
-p , --show-power |
Power |
-c , --show-freq , --show-clocks |
Frequency |
-u , --show-util |
Utilization |
-m , --show-mem |
Memory modules |
-x , --show-pci |
PCI |
--show-fabric-ports |
Fabric ports |
-b , --show-standby |
Standby domains |
-e , --show-errors |
Errors |
-a , --show-all |
all of the above |
Not included in --show-all : |
|
--show-device |
Device attributes |
--show-processes |
Process usage |
--show-scheduler |
Scheduler mode |
--show-diag |
Diagnostic Tests |
By default, telemetry data is reported (as applicable) for the specified component. This is variable/dynamic data such as current temperature or actual frequency. To report inventory data instead, specify --show-inventory
. This is fixed/configured data such as vendor information or current frequency limits. To show both, specify both --show-inventory
and --show-telemetry
:
Option | Attribute Type |
---|---|
-i , --show-inventory |
fixed/configured |
-l , --show-telemetry |
variable/dynamic |
For device/power/frequency/PCI inventory and frequency telemetry, --verbose
requests additional data be shown.
Note that not all oneAPI Level Zero System Resource Management features are supported by any given driver, device, or system configuration. If a request to access a oneAPI Level Zero System Resource Management feature returns an UNSUPPORTED indication, the tool skips it.
Telemetry data can be polled by specifying the number of --iterations
to perform (setting --iterations
to 0
or specifying only a --poll
time requests polling until stopped by keyboard interrupt). Some telemetry data (e.g., bandwidth) requires at least two reads. When requesting such data, the initial report is delayed by the current --poll
time (which defaults to 1 second).
Option | Polling Parameter |
---|---|
--poll TIME |
Time between polls in seconds |
--iterations N |
Number of times to poll |
By default, queries are returned in list format. When requested, they can be output in XML, table, or CSV format. If an output filename ending in .xml
or .csv
is specified via --output
, the corresponding format is selected by default.
Option | Output Control |
---|---|
-f list , --format list |
use list output format |
-f xml , --format xml |
use XML output format |
-f table , --format table |
use table output format |
-f csv , --format csv |
use CSV table output format |
--output FILE |
output to FILE |
--tee |
print to standard output also |
Here are examples of all four formats:
% zesysman --show-freq
Device 0
UUID : 00000000-0000-0000-0000-3ea500008086
FrequencyDomains
GPU
RequestedFrequency : 300.0 MHz
ActualFrequency : 300.0 MHz
ThrottleReasons : Not throttled
% zesysman --show-freq --format xml
<?xml version="1.0" ?>
<Devices>
<Device Index="0" UUID="00000000-0000-0000-0000-3ea500008086">
<FrequencyDomains>
<FrequencyDomain Index="0" Name="GPU">
<RequestedFrequency Units="MHz">300.0</RequestedFrequency>
<ActualFrequency Units="MHz">300.0</ActualFrequency>
<ThrottleReasons>Not throttled</ThrottleReasons>
</FrequencyDomain>
</FrequencyDomains>
</Device>
</Devices>
% zesysman --show-freq --format table
+-------+--------------------------+-----------------------+---------------------------+
| Index | Freq[GPU].Requested[MHz] | Freq[GPU].Actual[MHz] | Freq[GPU].ThrottleReasons |
|=======|==========================|=======================|===========================|
| 0 | 300.0 | 300.0 | Not throttled |
+-------+--------------------------+-----------------------+---------------------------+
% zesysman --show-freq --format csv
Index,Freq[GPU].Requested[MHz],Freq[GPU].Actual[MHz],Freq[GPU].ThrottleReasons
0,300.0,300.0,Not throttled
There are additional controls to fine-tune specific output formats:
Option | Output Control |
---|---|
--indent COUNT |
indent by COUNT spaces in list format |
--indent -COUNT |
indent by COUNT tabs in list format |
--style condensed |
do not line up colons in list format |
--style aligned |
line up colons in list format |
--uuid-index |
use UUID as index for table formats |
--ascii |
do not use Unicode degree sign |
The tool allows various System Manager settings to be set as well. The process must have the appropriate privileges to change these:
Option | Sets |
---|---|
--enable-critical-temp IDX |
critical temperature |
--disable-critical-temp IDX |
critical temperature |
--enable-t1-low-to-high IDX |
temperature threshold 1 |
--disable-t1-low-to-high IDX |
temperature threshold 1 |
--enable-t1-ligh-to-low IDX |
temperature threshold 1 |
--disable-t1-ligh-to-low IDX |
temperature threshold 1 |
--set-t1-threshold IDX C |
temperature threshold 1 |
--enable-t2-low-to-high IDX |
temperature threshold 2 |
--disable-t2-low-to-high IDX |
temperature threshold 2 |
--enable-t2-ligh-to-low IDX |
temperature threshold 2 |
--disable-t2-ligh-to-low IDX |
temperature threshold 2 |
--set-t2-threshold IDX C |
temperature threshold 2 |
--set-power POW [TAU] |
sustained power limit |
--enable-power |
sustained power limit |
--disable-power |
sustained power limit |
--set-burst-power POW |
burst power limit |
--enable-burst-power |
burst power limit |
--disable-burst-power |
burst power limit |
--set-peak-power AC [DC] |
peak power limit |
--set-energy-threshold J |
energy threshold |
--set-freq MIN [MAX] |
frequency limits |
--set-error-thresholds IDX N [...] |
error thresholds |
--enable-fabric-ports IDX [...] |
enable fabric ports |
--disable-fabric-ports IDX [...] |
disable fabric ports |
--enable-beaconing IDX [...] |
enable port beaconing |
--disable-beaconing IDX [...] |
disable port beaconing |
--set-standby enabled |
enables standby promotion |
--set-standby disabled |
disables standby promotion |
--set-scheduler MODE |
scheduler timeout mode |
for --set-scheduler
, MODE
may be one of the following:
MODE | Specifies |
---|---|
timeout [TIME] |
timeout mode, value |
timeslice [INT] [YLD] |
timeslice mode, interval, yield interval |
exclusive |
exclusive mode |
Values specified to --set
operations can be specified with units (e.g., ms
, MHz
) and values are scaled to match the units used by the fields.
The tool supports additional actions when supported by the underlying device and driver:
Option | Action |
---|---|
--run-diag SUITE [FIRST_IDX [LAST_IDX]] |
run diagnostic tests |
--clear-errors |
clear error counters |
--reset |
reset specified device |
If --reset
or --run-diag
is used and there is more than one supported device in the system, --device
must be used to limit operation to a single device. Before resetting a device, the tool will ask for confirmation unless you also specify -y
or --yes
on the command line. To forcibly reset even if the device is in use, specify --force
also.
As an added-value feature, the tool supports a --show-topo
option that uses fabric port queries to identify how the devices connect to Xe fabrics. It supports three different output modes. The desired mode must be explicitly specified:
Option | Output Mode |
---|---|
-show-topo matrix |
textual summary matrix of connected devices |
-show-topo graph |
graphviz-formatted representation of topology map |
-show-topo info |
attach points and interconnections in query format |
The matrix
view summarizes which devices connect to each other via Xe fabric connections:
% zesysman --show-topo matrix
0 1 2 3 4 5 6 7 8 9 10 11
0 X x x x x x - - - - - -
1 x X x x x x - - - - - -
2 x x X x x x - - - - - -
3 x x x X x x - - - - - -
4 x x x x X x - - - - - -
5 x x x x x X - - - - - -
6 - - - - - - X x x x x x
7 - - - - - - x X x x x x
8 - - - - - - x x X x x x
9 - - - - - - x x x X x x
10 - - - - - - x x x x X x
11 - - - - - - x x x x x X
The graph
view provides output intended for graphviz tools like dot, neato, twopi, fdp, or sfdp. These generate nicely-formatted graphical renditions in a variety of formats. For example:
% zesysman --show-topo graph | dot -Tpdf > fabric.pdf
The info
view lists the fabric identifier and attach points for each device. For each attach point, it provides a list of ports, and for each port it identifies the remote fabric identifier, attach point, and port to which it is connected. As with component queries, the output defaults to list
format and other formats may be specified by the -f
/--format
option.