doc: added sections on build-time and run-time controls

Co-authored-by: Fitch, Benjamin <[email protected]>
nivren · Jun 10, 2020 · commit afc3257 · 1 parent 6d9bee5

Showing 5 changed files with 126 additions and 62 deletions.
diff --git a/doc/advanced/primitive_cache.md b/doc/advanced/primitive_cache.md
@@ -15,15 +15,36 @@ for every instance of inference or iteration of training process.
 The primitive cache is global hence a user doesn't have to maintain any
 persistent oneDNN resources to benefit from the primitive cache.
 
-## Memory consumption
-Since the primitive cache has limited capacity, it uses
-a replacement policy to evict excess primitives. The capacity indicates
-the maximum number of primitives it can hold at a time and it can be adjusted
-with an API or an environment variable `DNNL_PRIMITIVE_CACHE_CAPACITY`.
-If the capacity is set to 0 then the primitve cache is disabled.
-The API takes precedence over the environment variable.
-
-## Primitive cache profiling
+## Managing Memory Consumption
+The primitive cache has an upper limit for the number of primitives stored. Once
+capacity is exceeded, a primitive that was least recently used will be evicted
+from the cache. See the Run-time Controls section below for information on
+changing the cache capacity.
+
+## Profiling
 Information about primitive cache hits and misses can be used for debug
 purposes. That information is part of the verbose output for verbose
 level 2 (@ref dev_guide_verbose).
+
+## Build-time Controls
+
+At build-time, support for this feature is controlled via cmake option
+`DNNL_ENABLE_PRIMITIVE_CACHE`.
+
+| CMake Option                | Supported values (defaults in bold) | Description
+| :---                        | :---                                | :---
+| DNNL_ENABLE_PRIMITIVE_CACHE | **ON**, OFF                         | Enables [primitive cache](@ref dev_guide_primitive_cache)
+
+## Run-time Controls
+When the feature is enabled at build-time, the `DNNL_PRIMITIVE_CACHE_CAPACITY`
+environment variable can be used to change cache capacity or disable the cache.
+
+| Environment variable          | Value            | Description
+| :---                          | :---             | :---
+| DNNL_PRIMITIVE_CACHE_CAPACITY | \<number\>       | Set cache capacity to \<number\> (default **1024**)
+|                               | 0                | Disable primitive cache
+
+This feature can also be managed at run-time with the following functions:
+* @ref dnnl_set_primitive_cache_capacity
+
+The function setting takes precedence over the environment variable.
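
The eviction behaviour described in the primitive_cache.md hunk above (a fixed capacity, least-recently-used eviction, capacity 0 disabling the cache) can be illustrated with a small toy model. This is only a sketch of the policy as the text states it, not oneDNN's implementation; the class `LruCache` and the method `get_or_create` are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class LruCache:
    """Toy least-recently-used cache (hypothetical; not the oneDNN code)."""

    def __init__(self, capacity=1024):  # 1024 is the default quoted in the doc
        self.capacity = capacity
        self._items = OrderedDict()

    def get_or_create(self, key, create):
        if self.capacity == 0:
            return create(key)  # capacity 0: caching disabled, always rebuild
        if key in self._items:
            self._items.move_to_end(key)  # hit: mark as most recently used
            return self._items[key]
        value = create(key)  # miss: build the object and store it
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value

    def keys(self):
        return list(self._items)


cache = LruCache(capacity=2)
for key in ["conv", "relu", "conv", "pool"]:
    cache.get_or_create(key, lambda k: f"primitive<{k}>")
print(cache.keys())
```

With capacity 2, the second access to `conv` refreshes it, so inserting `pool` evicts `relu` and the cache ends up holding `conv` and `pool`.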
diff --git a/doc/build/build_options.md b/doc/build/build_options.md
@@ -3,7 +3,7 @@ Build Options {#dev_guide_build_options}
 
 oneDNN supports the following build-time options.
 
-| Option                      | Supported values (defaults in bold) | Description
+| CMake Option                | Supported values (defaults in bold) | Description
 | :---                        | :---                                | :---
 | DNNL_LIBRARY_TYPE           | **SHARED**, STATIC                  | Defines the resulting library type
 | DNNL_CPU_RUNTIME            | **OMP**, TBB, SEQ, THREADPOOL       | Defines the threading runtime for CPU engines
@@ -12,9 +12,10 @@ oneDNN supports the following build-time options.
 | DNNL_BUILD_TESTS            | **ON**, OFF                         | Controls building the tests
 | DNNL_ARCH_OPT_FLAGS         | *compiler flags*                    | Specifies compiler optimization flags (see warning note below)
 | DNNL_ENABLE_CONCURRENT_EXEC | ON, **OFF**                         | Disables sharing a common scratchpad between primitives in #dnnl::scratchpad_mode::library mode
-| DNNL_ENABLE_JIT_PROFILING   | **ON**, OFF                         | Enables [integration with Intel(R) VTune(TM) Amplifier](@ref dev_guide_profilers)
+| DNNL_ENABLE_JIT_PROFILING   | **ON**, OFF                         | Enables [integration with performance profilers](@ref dev_guide_profilers)
 | DNNL_ENABLE_PRIMITIVE_CACHE | **ON**, OFF                         | Enables [primitive cache](@ref dev_guide_primitive_cache)
 | DNNL_ENABLE_MAX_CPU_ISA     | **ON**, OFF                         | Enables [CPU dispatcher controls](@ref dev_guide_cpu_dispatcher_control)
+| DNNL_VERBOSE                | **ON**, OFF                         | Enables [verbose mode](@ref dev_guide_verbose)
 
 All other building options that can be found in CMake files are dedicated for
 the development/debug purposes and are subject to change without any notice.

diff --git a/doc/performance_considerations/dispatcher_control.md b/doc/performance_considerations/dispatcher_control.md
@@ -1,28 +1,49 @@
-CPU dispatcher control {#dev_guide_cpu_dispatcher_control}
+CPU Dispatcher Control {#dev_guide_cpu_dispatcher_control}
 ==========================================================
 
 oneDNN uses JIT code generation to implement most of its functionality and will
 choose the best code based on detected processor features. Sometimes it is
 necessary to control which features oneDNN detects. This is sometimes useful for
-debugging purposes or for performance exploration. To enable this, oneDNN
-provides two mechanisms: an environment variable `DNNL_MAX_CPU_ISA` and a
-function `dnnl::set_max_cpu_isa()`.
+debugging purposes or for performance exploration.
 
-The environment variable can be set to an upper-case name of the ISA as
-defined by the `dnnl::cpu_isa` enumeration. For example,
-`DNNL_MAX_CPU_ISA=AVX2` will instruct oneDNN to dispatch code that will run
-on systems with Intel AVX2 instruction set support. The `DNNL_MAX_CPU_ISA=ALL`
-setting implies no restrictions.
+## Build-time Controls
 
-The `dnnl::set_max_cpu_isa()` function allows changing the ISA at run-time.
+At build-time, support for this feature is controlled via cmake option
+`DNNL_ENABLE_MAX_CPU_ISA`.
+
+| CMake Option                | Supported values (defaults in bold) | Description
+| :---                        | :---                                | :---
+| DNNL_ENABLE_MAX_CPU_ISA     | **ON**, OFF                         | Enables [CPU dispatcher controls](@ref dev_guide_cpu_dispatcher_control)
+
+## Run-time Controls
+
+When the feature is enabled at build-time, the `DNNL_MAX_CPU_ISA` environment
+variable can be used to limit processor features oneDNN is able to detect to a
+certain Instruction Set Architecture (ISA) and older instruction sets.
+
+| Environment variable | Value            | Description
+| :---                 | :---             | :---
+| DNNL_MAX_CPU_ISA     | SSE41            | Intel Streaming SIMD Extensions 4.1 (Intel SSE4.1)
+|                      | AVX              | Intel Advanced Vector Extensions (Intel AVX)
+|                      | AVX2             | Intel Advanced Vector Extensions 2 (Intel AVX2)
+|                      | AVX512_MIC       | Intel Advanced Vector Extensions 512 (Intel AVX-512) with AVX512CD, AVX512ER, and AVX512PF extensions
+|                      | AVX512_MIC_4OPS  | Intel AVX-512 with AVX512_4FMAPS and AVX512_4VNNIW extensions
+|                      | AVX512_CORE      | Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
+|                      | AVX512_CORE_VNNI | Intel AVX-512 with Intel Deep Learning Boost (Intel DL Boost)
+|                      | AVX512_CORE_BF16 | Intel AVX-512 with Intel DL Boost and bfloat16 support
+|                      | **ALL**          | **No restrictions on ISA (default)**
+
+@note The ISAs are partially ordered:
+* SSE41 < AVX < AVX2,
+* AVX2 < AVX512_MIC < AVX512_MIC_4OPS,
+* AVX2 < AVX512_CORE < AVX512_CORE_VNNI < AVX512_CORE_BF16.
+
+This feature can also be managed at run-time with the following functions:
+* @ref dnnl::set_max_cpu_isa function allows changing the ISA at run-time.
 The limitation is that, it is possible to set the value only before the first
 JIT-ed function is generated. This limitation ensures that the JIT-ed code
 observe consistent CPU features both during generation and execution.
+* @ref dnnl::get_effective_cpu_isa function returns the currently used CPU ISA
+which is the highest available CPU ISA by default.
 
-The `dnnl::get_effective_cpu_isa()` function returns the currently used CPU ISA
-which is the highest available CPU ISA by default. This behavior can be
-overridden via the `DNNL_MAX_CPU_ISA` environment variable or by
-`dnnl::set_max_cpu_isa()` function.
-
-This feature can be enabled or disabled at build time. See @ref
-dev_guide_build_options for more information.
+Function settings take precedence over environment variables.
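
The partial order in the `@note` of the dispatcher_control.md hunk above can be made concrete with a short sketch. It encodes only the three chains listed in the note; the names `PARENT`, `allowed_isas`, and `is_at_most` are hypothetical, and the code says nothing about how oneDNN itself dispatches.

```python
# Each ISA maps to the next-lower ISA in its chain, as listed in the note.
PARENT = {
    "SSE41": None,
    "AVX": "SSE41",
    "AVX2": "AVX",
    "AVX512_MIC": "AVX2",
    "AVX512_MIC_4OPS": "AVX512_MIC",
    "AVX512_CORE": "AVX2",
    "AVX512_CORE_VNNI": "AVX512_CORE",
    "AVX512_CORE_BF16": "AVX512_CORE_VNNI",
}


def allowed_isas(max_isa):
    """Return the ISAs at or below max_isa; "ALL" means no restriction."""
    if max_isa == "ALL":
        return set(PARENT)
    allowed = set()
    isa = max_isa
    while isa is not None:  # walk down the chain to SSE41
        allowed.add(isa)
        isa = PARENT[isa]
    return allowed


def is_at_most(a, b):
    """True if ISA a is less than or equal to ISA b in the partial order."""
    return a in allowed_isas(b)


print(sorted(allowed_isas("AVX512_CORE")))
print(is_at_most("AVX512_MIC", "AVX512_CORE"))
```

Because the order is partial, `AVX512_MIC` and `AVX512_CORE` are not comparable: neither appears in the other's set, even though both sit above `AVX2`.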
diff --git a/doc/performance_considerations/profilers.md b/doc/performance_considerations/profilers.md
@@ -1,4 +1,4 @@
-Profiling oneDNN performance {#dev_guide_profilers}
+Profiling oneDNN Performance {#dev_guide_profilers}
 =================================================
 
 oneDNN uses JIT (just-in-time) code generation based on primitive parameters and
@@ -7,33 +7,41 @@ performance event information, profilers need to be notified about
 address ranges containing JIT-ed code. oneDNN supports two profilers:
 VTune Amplifier and Linux perf.
 
-At build-time, support for this feature is controlled via cmake option
-`DNNL_ENABLE_JIT_PROFILING`:
+## Build-time Controls
 
-| Option                      | Possible Values (defaults in bold)   | Description
-| :---                        |:---                                  | :---
-|DNNL_ENABLE_JIT_PROFILING    | **ON**, OFF                          | Enables integration with performance profilers
+At build-time, support for this feature is controlled via cmake option
+`DNNL_ENABLE_JIT_PROFILING`.
 
-At run-time, this feature can be controlled via the following two functions:
+| CMake Option                | Supported values (defaults in bold) | Description
+| :---                        | :---                                | :---
+| DNNL_ENABLE_JIT_PROFILING   | **ON**, OFF                         | Enables integration with performance profilers
 
-* @ref dnnl_set_jit_profiling_flags
+## Run-time Controls
 
-* @ref dnnl_set_jit_profiling_jitdumpdir
+When the feature is enabled at build-time, the `DNNL_JIT_PROFILE` environment
+variable can be used to manage integration with performance profilers.
 
-or via the `DNNL_JIT_PROFILE` environment variable which accepts the same
-values as the @ref dnnl_set_jit_profiling_flags function. The following
-individual flags may be OR-ed:
+| Environment variable | Value            | Description
+| :---                 | :---             | :---
+| DNNL_JIT_PROFILE     | **1**            | **Enables VTune integration (default)**
+|                      | 2                | Enables basic Linux perf integration
+|                      | 6                | Enables Linux perf integration with JIT dump output
+|                      | 14               | Enables Linux perf integration with JIT dump output and TSC timestamps
 
-* @ref DNNL_JIT_PROFILE_VTUNE = 1: @copybrief DNNL_JIT_PROFILE_VTUNE
-* @ref DNNL_JIT_PROFILE_LINUX_PERFMAP = 2: @copybrief DNNL_JIT_PROFILE_LINUX_PERFMAP
-* @ref DNNL_JIT_PROFILE_LINUX_JITDUMP = 4: @copybrief DNNL_JIT_PROFILE_LINUX_JITDUMP
-* @ref DNNL_JIT_PROFILE_LINUX_JITDUMP_USE_TSC = 8: @copybrief DNNL_JIT_PROFILE_LINUX_JITDUMP_USE_TSC
+Other valid values for `DNNL_JIT_PROFILE` include integer values representing
+a combination of flags accepted by @ref dnnl_set_jit_profiling_flags function.
 
 The default setting of the profiling flags is to enable integration with
-VTune Amplifier, therefore it does not require any additional setup and works
+VTune Amplifier; therefore it does not require any additional setup and works
 out of the box. Code integrating oneDNN may override this behavior.
 
-## Example: profiling with VTune Amplifier
+This feature can also be managed at run-time with the following functions:
+* @ref dnnl_set_jit_profiling_flags
+* @ref dnnl_set_jit_profiling_jitdumpdir
+
+Function settings take precedence over environment variables.
+
+## Example: Profiling with VTune Amplifier
 
 Assuming that environment is set up already.
 
@@ -72,7 +80,7 @@ the `[Dynamic code]` module.
 
 See more examples in the [VTune Amplifier User Guide](https://software.intel.com/en-us/vtune-amplifier-help-tutorials-and-samples)
 
-## Example: profiling with Linux perf
+## Example: Profiling with Linux perf
 
 The following command instructs oneDNN to enable both jitdump and perfmap
 profiling modes and write jitdump files into `.debug` directory in the current

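The `DNNL_JIT_PROFILE` values 6 and 14 in the profilers.md hunk above are sums of the individual flag values listed in the removed lines (1, 2, 4, 8). The sketch below decodes such a value into flag names. The flag names and numbers come from the diff; the helper names `decode` and `from_env`, and the choice to treat an unset variable as 1 (the documented default), are assumptions for illustration.

```python
# Flag values as listed in the diff for dnnl_set_jit_profiling_flags.
FLAGS = {
    1: "DNNL_JIT_PROFILE_VTUNE",
    2: "DNNL_JIT_PROFILE_LINUX_PERFMAP",
    4: "DNNL_JIT_PROFILE_LINUX_JITDUMP",
    8: "DNNL_JIT_PROFILE_LINUX_JITDUMP_USE_TSC",
}


def decode(value):
    """Return the names of the flags whose bits are set in value."""
    return [name for bit, name in FLAGS.items() if value & bit]


def from_env(env):
    """Decode DNNL_JIT_PROFILE from a mapping such as os.environ."""
    return decode(int(env.get("DNNL_JIT_PROFILE", "1")))


for value in (1, 2, 6, 14):
    print(value, decode(value))
```

For example, 6 decodes to the perfmap and jitdump flags (2 + 4), and 14 adds the TSC timestamp flag (2 + 4 + 8), matching the table rows in the hunk.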
diff --git a/doc/performance_considerations/verbose.md b/doc/performance_considerations/verbose.md
@@ -7,17 +7,7 @@ the most time. oneDNN verbose mode enables tracing execution of oneDNN
 primitives and collection of basic statistics like execution time and
 primitive parameters.
 
-The behavior is controlled with `DNNL_VERBOSE` environment variable or
-@ref dnnl_set_verbose function.
-
-| Value | Behavior
-| :---- | :----
-| **0** | no verbose output (default)
-| 1     | primitive information at execution
-| 2     | primitive information at creation and execution
-
-The function setting takes precedence over the environment variable.
-
+When verbose mode is enabled, oneDNN prints information to `stdout`.
 The first lines of verbose information contain the build version and git hash,
 if available, as well as CPU and GPU runtimes, and the supported instruction
 set architecture.
@@ -35,13 +25,38 @@ containing:
 - a problem description in [benchdnn format](@ref dev_guide_benchdnn)
 - execution time in milliseconds
 
+## Build-time Controls
+
+At build-time, support for this feature is controlled via cmake option
+`DNNL_VERBOSE`.
+
+| CMake Option                | Supported values (defaults in bold) | Description
+| :---                        | :---                                | :---
+| DNNL_VERBOSE                | **ON**, OFF                         | Enables [verbose mode](@ref dev_guide_verbose)
+
+## Run-time Controls
+
+When the feature is enabled at build-time, the `DNNL_VERBOSE` environment
+variable can be used to turn verbose mode on and control the level of verbosity.
+
+| Environment variable | Value            | Description
+| :---                 | :---             | :---
+| DNNL_VERBOSE         | **0**            | **no verbose output (default)**
+|                      | 1                | primitive information at execution
+|                      | 2                | primitive information at creation and execution
+
+This feature can also be managed at run-time with the following functions:
+* @ref dnnl_set_verbose
+
+The function setting takes precedence over the environment variable.
+
 ## Example
 
 ~~~sh
 DNNL_VERBOSE=1 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"
 ~~~
 
-This produces the following output (the line break was added to fit the page width):
+This produces the following output (the line breaks were added to fit the page width):
 
 ~~~sh
 dnnl_verbose,info,DNNL v1.3.0 (commit d0fc158e98590dfad0165e568ca466876a794597)
@@ -52,9 +67,7 @@ dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::bl
 dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,,,16x16x5x5,0.0251465
 dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,,,2x16x7x7,0.0180664
 dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,,,16,0.0229492
-dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,
-    src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,
-    alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.0390625
+dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.0390625
 dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,,,2x16x7x7,0.173096
 ~~~
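
The `exec` lines in the verbose.md example above are comma-separated, so they are easy to post-process. The sketch below splits a few of the sample lines and totals execution time per primitive. Only the last two fields (problem description and execution time in milliseconds) are named by the document; the other field labels in `FIELDS` and the function names are assumptions chosen for readability.

```python
from collections import defaultdict

# Field labels are illustrative; the doc only names the last two explicitly.
FIELDS = [
    "marker", "operation", "engine", "primitive", "implementation",
    "prop_kind", "memory_descriptors", "attributes", "auxiliary",
    "problem", "time_ms",
]

SAMPLE = """\
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,,,16,0.0229492
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.0390625
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,,,2x16x7x7,0.173096
"""


def parse_exec_line(line):
    """Split one verbose line into a dict, or return None if it is not exec."""
    parts = line.strip().split(",")
    if len(parts) != len(FIELDS) or parts[:2] != ["dnnl_verbose", "exec"]:
        return None
    record = dict(zip(FIELDS, parts))
    record["time_ms"] = float(record["time_ms"])
    return record


def time_per_primitive(text):
    """Sum execution time in milliseconds for each primitive kind."""
    totals = defaultdict(float)
    for line in text.splitlines():
        record = parse_exec_line(line)
        if record is not None:
            totals[record["primitive"]] += record["time_ms"]
    return dict(totals)


print(time_per_primitive(SAMPLE))
```

Lines that do not match the expected shape, such as the `info` header lines, are skipped rather than raising an error, which keeps the sketch usable on a full log.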