doc: added sections on build-time and run-time controls
Co-authored-by: Fitch, Benjamin <[email protected]>
vpirogov and fitchbe committed Jun 10, 2020
1 parent 6d9bee5 commit afc3257
Showing 5 changed files with 126 additions and 62 deletions.
39 changes: 30 additions & 9 deletions doc/advanced/primitive_cache.md
@@ -15,15 +15,36 @@ for every instance of inference or iteration of training process.
The primitive cache is global hence a user doesn't have to maintain any
persistent oneDNN resources to benefit from the primitive cache.

## Memory consumption
Since the primitive cache has limited capacity, it uses
a replacement policy to evict excess primitives. The capacity indicates
the maximum number of primitives it can hold at a time and it can be adjusted
with an API or an environment variable `DNNL_PRIMITIVE_CACHE_CAPACITY`.
If the capacity is set to 0 then the primitive cache is disabled.
The API takes precedence over the environment variable.

## Primitive cache profiling
## Managing Memory Consumption
The primitive cache has an upper limit on the number of primitives it stores. Once
capacity is exceeded, the least recently used primitive is evicted
from the cache. See the Run-time Controls section below for information on
changing the cache capacity.

## Profiling
Information about primitive cache hits and misses can be used for debug
purposes. That information is part of the verbose output for verbose
level 2 (@ref dev_guide_verbose).

## Build-time Controls

At build-time, support for this feature is controlled via cmake option
`DNNL_ENABLE_PRIMITIVE_CACHE`.

| CMake Option | Supported values (defaults in bold) | Description
| :--- | :--- | :---
| DNNL_ENABLE_PRIMITIVE_CACHE | **ON**, OFF | Enables [primitive cache](@ref dev_guide_primitive_cache)

## Run-time Controls
When the feature is enabled at build-time, the `DNNL_PRIMITIVE_CACHE_CAPACITY`
environment variable can be used to change cache capacity or disable the cache.

| Environment variable | Value | Description
| :--- | :--- | :---
| DNNL_PRIMITIVE_CACHE_CAPACITY | \<number\> | Set cache capacity to \<number\> (default **1024**)
| | 0 | Disable primitive cache

This feature can also be managed at run-time with the following functions:
* @ref dnnl_set_primitive_cache_capacity

The function setting takes precedence over the environment variable.
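
As a sketch of how the environment variable might be used, the commands below
reuse the benchdnn invocation from the oneDNN verbose mode documentation and
assume a `benchdnn` binary in the current directory. Verbose level 2 is added so
that cache hits and misses appear in the output; the capacity of 16 is an
arbitrary value chosen for illustration.

```sh
# Disable the primitive cache for one run (capacity 0).
DNNL_PRIMITIVE_CACHE_CAPACITY=0 DNNL_VERBOSE=2 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"

# Limit the cache to 16 primitives (arbitrary value, for illustration only).
DNNL_PRIMITIVE_CACHE_CAPACITY=16 DNNL_VERBOSE=2 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"
```

If the application also calls the capacity function, that setting overrides
whatever the environment variable requested.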
5 changes: 3 additions & 2 deletions doc/build/build_options.md
@@ -3,7 +3,7 @@ Build Options {#dev_guide_build_options}

oneDNN supports the following build-time options.

| Option | Supported values (defaults in bold) | Description
| CMake Option | Supported values (defaults in bold) | Description
| :--- | :--- | :---
| DNNL_LIBRARY_TYPE | **SHARED**, STATIC | Defines the resulting library type
| DNNL_CPU_RUNTIME | **OMP**, TBB, SEQ, THREADPOOL | Defines the threading runtime for CPU engines
@@ -12,9 +12,10 @@ oneDNN supports the following build-time options.
| DNNL_BUILD_TESTS | **ON**, OFF | Controls building the tests
| DNNL_ARCH_OPT_FLAGS | *compiler flags* | Specifies compiler optimization flags (see warning note below)
| DNNL_ENABLE_CONCURRENT_EXEC | ON, **OFF** | Disables sharing a common scratchpad between primitives in #dnnl::scratchpad_mode::library mode
| DNNL_ENABLE_JIT_PROFILING | **ON**, OFF | Enables [integration with Intel(R) VTune(TM) Amplifier](@ref dev_guide_profilers)
| DNNL_ENABLE_JIT_PROFILING | **ON**, OFF | Enables [integration with performance profilers](@ref dev_guide_profilers)
| DNNL_ENABLE_PRIMITIVE_CACHE | **ON**, OFF | Enables [primitive cache](@ref dev_guide_primitive_cache)
| DNNL_ENABLE_MAX_CPU_ISA | **ON**, OFF | Enables [CPU dispatcher controls](@ref dev_guide_cpu_dispatcher_control)
| DNNL_VERBOSE | **ON**, OFF | Enables [verbose mode](@ref dev_guide_verbose)

All other building options that can be found in CMake files are dedicated for
the development/debug purposes and are subject to change without any notice.
55 changes: 38 additions & 17 deletions doc/performance_considerations/dispatcher_control.md
@@ -1,28 +1,49 @@
CPU dispatcher control {#dev_guide_cpu_dispatcher_control}
CPU Dispatcher Control {#dev_guide_cpu_dispatcher_control}
==========================================================

oneDNN uses JIT code generation to implement most of its functionality and will
choose the best code based on detected processor features. Sometimes it is
necessary to control which features oneDNN detects. This is sometimes useful for
debugging purposes or for performance exploration. To enable this, oneDNN
provides two mechanisms: an environment variable `DNNL_MAX_CPU_ISA` and a
function `dnnl::set_max_cpu_isa()`.
debugging purposes or for performance exploration.

The environment variable can be set to an upper-case name of the ISA as
defined by the `dnnl::cpu_isa` enumeration. For example,
`DNNL_MAX_CPU_ISA=AVX2` will instruct oneDNN to dispatch code that will run
on systems with Intel AVX2 instruction set support. The `DNNL_MAX_CPU_ISA=ALL`
setting implies no restrictions.
## Build-time Controls

The `dnnl::set_max_cpu_isa()` function allows changing the ISA at run-time.
At build-time, support for this feature is controlled via cmake option
`DNNL_ENABLE_MAX_CPU_ISA`.

| CMake Option | Supported values (defaults in bold) | Description
| :--- | :--- | :---
| DNNL_ENABLE_MAX_CPU_ISA | **ON**, OFF | Enables [CPU dispatcher controls](@ref dev_guide_cpu_dispatcher_control)

## Run-time Controls

When the feature is enabled at build-time, the `DNNL_MAX_CPU_ISA` environment
variable can be used to limit the processor features oneDNN is able to detect
to a certain Instruction Set Architecture (ISA) and older instruction sets.

| Environment variable | Value | Description
| :--- | :--- | :---
| DNNL_MAX_CPU_ISA | SSE41 | Intel Streaming SIMD Extensions 4.1 (Intel SSE4.1)
| | AVX | Intel Advanced Vector Extensions (Intel AVX)
| | AVX2 | Intel Advanced Vector Extensions 2 (Intel AVX2)
| | AVX512_MIC | Intel Advanced Vector Extensions 512 (Intel AVX-512) with AVX512CD, AVX512ER, and AVX512PF extensions
| | AVX512_MIC_4OPS | Intel AVX-512 with AVX512_4FMAPS and AVX512_4VNNIW extensions
| | AVX512_CORE | Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
| | AVX512_CORE_VNNI | Intel AVX-512 with Intel Deep Learning Boost (Intel DL Boost)
| | AVX512_CORE_BF16 | Intel AVX-512 with Intel DL Boost and bfloat16 support
| | **ALL** | **No restrictions on ISA (default)**

@note The ISAs are partially ordered:
* SSE41 < AVX < AVX2,
* AVX2 < AVX512_MIC < AVX512_MIC_4OPS,
* AVX2 < AVX512_CORE < AVX512_CORE_VNNI < AVX512_CORE_BF16.
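
The partial order in the note can be sketched as a small shell helper. This is
an illustration only, not oneDNN code: the function name `isa_le` and the two
chain variables are hypothetical, and the chains are simply the three relations
from the note joined by transitivity.

```sh
# Illustration only: models the partial order from the note; not part of oneDNN.
# Each chain lists ISAs from oldest to newest.
CHAIN_MIC="SSE41 AVX AVX2 AVX512_MIC AVX512_MIC_4OPS"
CHAIN_CORE="SSE41 AVX AVX2 AVX512_CORE AVX512_CORE_VNNI AVX512_CORE_BF16"

# isa_le A B: succeeds if ISA A is allowed when the limit is B (A <= B).
isa_le() {
    # ALL means no restrictions, so every ISA is allowed.
    if [ "$2" = "ALL" ]; then return 0; fi
    for chain in "$CHAIN_MIC" "$CHAIN_CORE"; do
        seen=0
        for isa in $chain; do
            if [ "$isa" = "$1" ]; then seen=1; fi
            if [ "$isa" = "$2" ]; then
                # B found: A is allowed only if it appeared at or before B.
                if [ "$seen" -eq 1 ]; then return 0; fi
                break
            fi
        done
    done
    return 1
}

isa_le AVX2 AVX512_CORE && echo "AVX2 is allowed under AVX512_CORE"
isa_le AVX512_MIC AVX512_CORE || echo "AVX512_MIC is not allowed under AVX512_CORE"
```

The second call fails because AVX512_MIC and AVX512_CORE sit on different
branches above AVX2, so neither includes the other.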

This feature can also be managed at run-time with the following functions:
* @ref dnnl::set_max_cpu_isa function allows changing the ISA at run-time.
The limitation is that the value can be set only before the first
JIT-ed function is generated. This limitation ensures that the JIT-ed code
observes consistent CPU features both during generation and execution.
* @ref dnnl::get_effective_cpu_isa function returns the currently used CPU ISA,
which is the highest available CPU ISA by default.

The `dnnl::get_effective_cpu_isa()` function returns the currently used CPU ISA
which is the highest available CPU ISA by default. This behavior can be
overridden via the `DNNL_MAX_CPU_ISA` environment variable or by
`dnnl::set_max_cpu_isa()` function.

This feature can be enabled or disabled at build time. See @ref
dev_guide_build_options for more information.
Function settings take precedence over environment variables.
46 changes: 27 additions & 19 deletions doc/performance_considerations/profilers.md
@@ -1,4 +1,4 @@
Profiling oneDNN performance {#dev_guide_profilers}
Profiling oneDNN Performance {#dev_guide_profilers}
=================================================

oneDNN uses JIT (just-in-time) code generation based on primitive parameters and
@@ -7,33 +7,41 @@ performance event information, profilers need to be notified about
address ranges containing JIT-ed code. oneDNN supports two profilers:
VTune Amplifier and Linux perf.

At build-time, support for this feature is controlled via cmake option
`DNNL_ENABLE_JIT_PROFILING`:
## Build-time Controls

| Option | Possible Values (defaults in bold) | Description
| :--- |:--- | :---
|DNNL_ENABLE_JIT_PROFILING | **ON**, OFF | Enables integration with performance profilers
At build-time, support for this feature is controlled via cmake option
`DNNL_ENABLE_JIT_PROFILING`.

At run-time, this feature can be controlled via the following two functions:
| CMake Option | Supported values (defaults in bold) | Description
| :--- | :--- | :---
| DNNL_ENABLE_JIT_PROFILING | **ON**, OFF | Enables performance profilers integration

* @ref dnnl_set_jit_profiling_flags
## Run-time Controls

* @ref dnnl_set_jit_profiling_jitdumpdir
When the feature is enabled at build-time, the `DNNL_JIT_PROFILE` environment
variable can be used to manage integration with performance profilers.

or via the `DNNL_JIT_PROFILE` environment variable which accepts the same
values as the @ref dnnl_set_jit_profiling_flags function. The following
individual flags may be OR-ed:
| Environment variable | Value | Description
| :--- | :--- | :---
| DNNL_JIT_PROFILE | **1** | **Enables VTune integration (default)**
| | 2 | Enables basic Linux perf integration
| | 6 | Enables Linux perf integration with JIT dump output
| | 14 | Enables Linux perf integration with JIT dump output and TSC timestamps

* @ref DNNL_JIT_PROFILE_VTUNE = 1: @copybrief DNNL_JIT_PROFILE_VTUNE
* @ref DNNL_JIT_PROFILE_LINUX_PERFMAP = 2: @copybrief DNNL_JIT_PROFILE_LINUX_PERFMAP
* @ref DNNL_JIT_PROFILE_LINUX_JITDUMP = 4: @copybrief DNNL_JIT_PROFILE_LINUX_JITDUMP
* @ref DNNL_JIT_PROFILE_LINUX_JITDUMP_USE_TSC = 8: @copybrief DNNL_JIT_PROFILE_LINUX_JITDUMP_USE_TSC
Other valid values for `DNNL_JIT_PROFILE` include integer values representing
a combination of flags accepted by @ref dnnl_set_jit_profiling_flags function.
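
The table values 6 and 14 are bitwise ORs of the individual flags (VTune = 1,
perfmap = 2, jitdump = 4, jitdump with TSC = 8). A minimal shell sketch of that
arithmetic follows; the variable names are hypothetical shorthand for this
example, not identifiers that oneDNN reads.

```sh
# Individual flag values accepted by dnnl_set_jit_profiling_flags.
# The shell variable names below are shorthand for this sketch only.
VTUNE=1
LINUX_PERFMAP=2
LINUX_JITDUMP=4
LINUX_JITDUMP_USE_TSC=8

# Combine flags with bitwise OR to get the value for DNNL_JIT_PROFILE.
perf_jitdump=$((LINUX_PERFMAP | LINUX_JITDUMP))
perf_jitdump_tsc=$((LINUX_PERFMAP | LINUX_JITDUMP | LINUX_JITDUMP_USE_TSC))

echo "perfmap + jitdump:       $perf_jitdump"
echo "perfmap + jitdump + TSC: $perf_jitdump_tsc"
```

The computed value can then be exported, for example
`export DNNL_JIT_PROFILE=$perf_jitdump`, before starting the application.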

The default setting of the profiling flags is to enable integration with
VTune Amplifier, therefore it does not require any additional setup and works
VTune Amplifier; therefore it does not require any additional setup and works
out of the box. Code integrating oneDNN may override this behavior.

## Example: profiling with VTune Amplifier
This feature can also be managed at run-time with the following functions:
* @ref dnnl_set_jit_profiling_flags
* @ref dnnl_set_jit_profiling_jitdumpdir

Function settings take precedence over environment variables.

## Example: Profiling with VTune Amplifier

This example assumes that the environment is already set up.

@@ -72,7 +80,7 @@ the `[Dynamic code]` module.

See more examples in the [VTune Amplifier User Guide](https://software.intel.com/en-us/vtune-amplifier-help-tutorials-and-samples)

## Example: profiling with Linux perf
## Example: Profiling with Linux perf

The following command instructs oneDNN to enable both jitdump and perfmap
profiling modes and write jitdump files into `.debug` directory in the current
43 changes: 28 additions & 15 deletions doc/performance_considerations/verbose.md
@@ -7,17 +7,7 @@ the most time. oneDNN verbose mode enables tracing execution of oneDNN
primitives and collection of basic statistics like execution time and
primitive parameters.

The behavior is controlled with `DNNL_VERBOSE` environment variable or
@ref dnnl_set_verbose function.

| Value | Behavior
| :---- | :----
| **0** | no verbose output (default)
| 1 | primitive information at execution
| 2 | primitive information at creation and execution

The function setting takes precedence over the environment variable.

When verbose mode is enabled, oneDNN prints information to `stdout`.
The first lines of verbose information contain the build version and git hash,
if available, as well as CPU and GPU runtimes, and the supported instruction
set architecture.
@@ -35,13 +25,38 @@ containing:
- a problem description in [benchdnn format](@ref dev_guide_benchdnn)
- execution time in milliseconds

## Build-time Controls

At build-time, support for this feature is controlled via cmake option
`DNNL_VERBOSE`.

| CMake Option | Supported values (defaults in bold) | Description
| :--- | :--- | :---
| DNNL_VERBOSE | **ON**, OFF | Enables [verbose mode](@ref dev_guide_verbose)

## Run-time Controls

When the feature is enabled at build-time, the `DNNL_VERBOSE` environment
variable can be used to turn verbose mode on and control the level of verbosity.

| Environment variable | Value | Description
| :--- | :--- | :---
| DNNL_VERBOSE | **0** | **no verbose output (default)**
| | 1 | primitive information at execution
| | 2 | primitive information at creation and execution

This feature can also be managed at run-time with the following functions:
* @ref dnnl_set_verbose

The function setting takes precedence over the environment variable.

## Example

~~~sh
DNNL_VERBOSE=1 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"
~~~

This produces the following output (the line break was added to fit the page width):
This produces the following output (the line breaks were added to fit the page width):

~~~sh
dnnl_verbose,info,DNNL v1.3.0 (commit d0fc158e98590dfad0165e568ca466876a794597)
@@ -52,9 +67,7 @@ dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::bl
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,,,16x16x5x5,0.0251465
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,,,2x16x7x7,0.0180664
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,,,16,0.0229492
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,
src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,
alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.0390625
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.0390625
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,,,2x16x7x7,0.173096
~~~
