The Caliper ConfigManager API

Caliper provides several built-in performance measurement and reporting configurations. These can be activated from within a program with the ConfigManager API using a short configuration string. Configuration strings can be hard-coded in the program or provided by the user, e.g. as a command-line parameter or in the program's configuration file.

To access and control the built-in configurations, create a cali::ConfigManager object. Add a configuration string with add(), start the requested configuration channels with start(), and trigger output with flush():

#include <caliper/cali-manager.h>

#include <iostream>

int main(int argc, char* argv[])
{
  cali::ConfigManager mgr;

  // Take the configuration string from the first command-line argument
  if (argc > 1)
    mgr.add(argv[1]);
  // Report malformed configuration strings
  if (mgr.error())
    std::cerr << "Config error: " << mgr.error_msg() << std::endl;
  // ...
  mgr.start(); // start requested performance measurement channels
  // ... (program execution)
  mgr.flush(); // write performance results
}

A complete code example is here.

ConfigManager Configuration String Syntax

A configuration string for the ConfigManager class is a comma-separated list of configs and parameters.

A config is the name of one of Caliper's built-in measurement configurations, e.g. runtime-report or event-trace. Multiple configs can be specified, separated by commas.

Most configs have optional parameters, e.g. output to name an output file. Parameters can be specified as a list of key-value pairs in parentheses after the config name, e.g. runtime-report(output=report.txt,io.bytes=true). For boolean parameters, the key alone is enough to enable the option; for example, io.bytes is equivalent to io.bytes=true.
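The key-value rules above, including the boolean shorthand, can be illustrated with a small standalone sketch. The parse_params helper is hypothetical and not part of Caliper; it assumes the parameter list contains no whitespace and that values contain no commas.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Caliper): parse the text between the
// parentheses of a config, e.g. "output=report.txt,io.bytes", into
// key-value pairs.
std::map<std::string, std::string> parse_params(const std::string& list)
{
    std::map<std::string, std::string> params;
    std::istringstream in(list);
    std::string item;

    while (std::getline(in, item, ',')) {
        if (item.empty())
            continue; // skip empty entries such as a trailing comma

        const auto eq = item.find('=');
        if (eq == std::string::npos)
            params[item] = "true"; // bare key: boolean shorthand
        else
            params[item.substr(0, eq)] = item.substr(eq + 1);
    }

    return params;
}
```

With this sketch, parse_params("io.bytes") and parse_params("io.bytes=true") produce the same map, which mirrors the equivalence described above.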

Parameters can also be listed separately in the config string, outside of parentheses. In that case, the parameter applies to all configs, whereas parameters inside parentheses only apply to the config where they are listed. For example, in runtime-report(io.bytes),spot,mem.highwatermark, the mem.highwatermark option will be active in both the runtime-report and spot configs, whereas io.bytes will only be active for runtime-report. Configs and parameters can be listed in any order.
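To make the scoping rule concrete, here is a minimal sketch that splits a configuration string at the commas outside of parentheses. The split_top_level helper is hypothetical and is not Caliper's own parser. Deciding whether a top-level item is a config or a global parameter additionally requires knowing the config names, which this sketch leaves out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not Caliper's parser): split a config string at the
// commas that are outside of parentheses.
std::vector<std::string> split_top_level(const std::string& config)
{
    std::vector<std::string> items;
    std::string current;
    int depth = 0; // parenthesis nesting depth

    for (char c : config) {
        if (c == '(')
            ++depth;
        else if (c == ')' && depth > 0)
            --depth;

        if (c == ',' && depth == 0) {
            // A top-level comma ends the current item
            if (!current.empty())
                items.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }

    if (!current.empty())
        items.push_back(current);

    return items;
}
```

For runtime-report(io.bytes),spot,mem.highwatermark, the sketch yields three items: runtime-report(io.bytes), spot, and mem.highwatermark. The io.bytes parameter stays attached to runtime-report, while mem.highwatermark stands alone and therefore applies to both configs.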

Here is a more complex example:

runtime-report(output=stdout),profile.cuda,mem.highwatermark,event-trace(output=trace.cali,trace.io)

This will print a runtime profile to stdout, including CUDA API calls and memory high-water marks in the profile, and write an event trace with region begin/end and I/O operations into the trace.cali file.

Built-in Configs

The following sections describe the ConfigManager's built-in configs and their parameters.

runtime-report

The runtime-report config prints a time profile for Caliper-annotated regions:

Path                                         Inclusive time Exclusive time Time %
main                                               0.149681       0.008896  5.830346
  lulesh.cycle                                     0.140785       0.000053  0.034736
    LagrangeLeapFrog                               0.140667       0.000050  0.032769
      CalcTimeConstraintsForElems                  0.004589       0.004589  3.007583
      LagrangeElements                             0.086606       0.000201  0.131733
        ApplyMaterialPropertiesForElems            0.078804       0.000377  0.247082
          EvalEOSForElems                          0.078427       0.014422  9.452029
            CalcEnergyForElems                     0.064005       0.064005 41.948211
        CalcQForElems                              0.004483       0.002415  1.582766
          CalcMonotonicQForElems                   0.002068       0.002068  1.355346
        CalcLagrangeElements                       0.003118       0.000460  0.301479
          CalcKinematicsForElems                   0.002658       0.002658  1.742026
      LagrangeNodal                                0.049422       0.011016  7.219772
        CalcForceForNodes                          0.038406       0.002194  1.437925
          CalcVolumeForceForElems                  0.036212       0.000691  0.452874
            CalcHourglassControlForElems           0.023037       0.010822  7.092626
              CalcFBHourglassForceForElems         0.012215       0.012215  8.005584
            IntegrateStressForElems                0.012484       0.012484  8.181884
    TimeIncrement                                  0.000065       0.000065  0.042600

By default, runtime-report collects a time profile and writes it to stderr on flush(). For MPI programs, it aggregates results across ranks and prints statistics.

Parameter name	Description
aggregate_across_ranks	Enable/disable aggregation across ranks in MPI programs
output	Output file name, or stderr/stdout
mem.highwatermark	Record and print memory high-water mark in annotated regions
io.bytes	Record and print I/O bytes written and read in annotated regions
io.bandwidth	Record and print I/O bandwidth in annotated regions
profile.mpi	Profile MPI functions
profile.cuda	Profile host-side CUDA API functions (e.g., cudaMalloc)

event-trace

The event-trace config records a trace of region enter/exit events and writes it out in Caliper's .cali format for processing with Caliper's cali-query tool. It generates a unique output file name by default. Every process writes a separate file.

Parameter name	Description
event.timestamps	Add time since program start in the event records
output	Output file name, or stderr/stdout
trace.mpi	Trace MPI functions
trace.cuda	Trace host-side CUDA API functions (e.g., cudaMalloc)
trace.io	Trace I/O

nvprof

The nvprof config forwards Caliper annotations to NVIDIA's nvprof profiler. It generates no output on its own.

spot

The spot config records a time profile and writes a .cali file to be processed by the Spot web visualization framework.

Parameter name	Description
aggregate_across_ranks	Enable/disable aggregation across ranks in MPI programs
output	Output file name, or stderr/stdout
mem.highwatermark	Record and print memory high-water mark in annotated regions
io.bytes	Record and print I/O bytes written and read in annotated regions
io.bandwidth	Record and print I/O bandwidth in annotated regions

hatchet-region-profile

The hatchet-region-profile config collects per-process region time profiles meant for processing with the hatchet library. By default, it writes a JSON file that hatchet can read. It can also write Caliper .cali files for processing with Caliper's cali-query tool.

Parameter name	Description
output	Output file name, or stderr/stdout
output.format	Output format. 'json-split', 'json', or 'cali'. Default: 'json-split'.
io.bytes	Record and print I/O bytes written and read in annotated regions
profile.mpi	Profile MPI functions
profile.cuda	Profile host-side CUDA API functions (e.g., cudaMalloc)

hatchet-sample-profile

The hatchet-sample-profile config performs call-path sampling to collect per-thread call-path and region sample profiles. By default, it writes JSON files for processing with the hatchet library. Profiles contain the sample count per region/call path.

Most options for the hatchet-sample-profile config require that Caliper is built with dyninst support.

Parameter name	Description
output	Output file name, or stderr/stdout
output.format	Output format. 'json-split', 'json', or 'cali'. Default: 'json-split'.
sample.frequency	Sampling frequency in Hz. Default: 200.
sample.threads	Sample each thread. Default: true.
sample.callpath	Perform call-stack lookup at each sample. Default: true.
lookup.module	Look up the module names (.so/exe) where samples were taken.
lookup.sourceloc	Look up the source file/line number where samples were taken.

cuda-activity

The cuda-activity config records CUDA activities (kernel execution, memory copies, etc.) and prints a region profile of time spent on the GPU as well as on the host. GPU activity time is assigned to the Caliper-annotated regions on the host where the activity was launched. Requires CUPTI support. Because the config uses tracing to record GPU activities, it is not recommended for long-running executions. Example output:

Path                              Avg Host Time Max Host Time Avg GPU Time Max GPU Time GPU %
main                                   1.614494      1.615764     0.693623     0.836652 42.962240
  qs.mainloop                          1.614295      1.615514     0.693623     0.836652 42.967531
    cycleFinalize                      0.001150      0.001530
    cycleTracking                      1.312167      1.380361     0.693623     0.836652 52.860857
      cycleTracking_Test_Done          0.000161      0.000162
      cycleTracking_MPI                0.285434      0.313542
        cycleTracking_Test_Done        0.019785      0.036499
      cycleTracking_Kernel             0.841515      0.917650     0.693623     0.836652 82.425458
    cycleInit                          0.296091      0.384112

Avg/Max Host and GPU time are average and maximum inclusive runtime for host and GPU activities, respectively, over all MPI ranks. "GPU %" is the ratio of GPU time to Host runtime in percent.
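As a worked check of the "GPU %" column, the ratio can be computed as in the following sketch. The gpu_percent function is a hypothetical helper, not part of Caliper. It assumes the ratio uses the average GPU and host times, which is consistent with the sample rows above but is inferred from them rather than stated.

```cpp
#include <cassert>

// Hypothetical helper: GPU time as a percentage of host time, following
// the description of the "GPU %" column.
double gpu_percent(double gpu_time, double host_time)
{
    assert(host_time > 0.0); // the ratio is undefined for zero host time
    return 100.0 * gpu_time / host_time;
}
```

For the main row, gpu_percent(0.693623, 1.614494) gives roughly 42.96, and for cycleTracking_Kernel, gpu_percent(0.693623, 0.841515) gives roughly 82.43, in line with the values printed in the table.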

Parameter name	Description
aggregate_across_ranks	Enable/disable aggregation across ranks in MPI programs
output	Output file name, or stderr/stdout
profile.mpi	Profile MPI functions
profile.cuda	Profile host-side CUDA API functions (e.g., cudaMalloc)
