Performance counters are not correctly measured in AMD ZEN series #14

joonsung-kim · 2020-11-04T04:55:58Z

Hi.

I have tried to measure the performance counters related to the decoder (i.e., uops dispatched from the legacy x86 decoder <DeDisUopsFromDecoder.DecoderDispatched> or from the micro-op cache <DeDisUopsFromDecoder.OpCacheDispatched>).
I tested with a simple code snippet consisting of 8 multi-byte nops (4 bytes each) without unrolling. I expected this snippet to produce a series of micro-op cache hits; however, the results show that all uops are dispatched from the legacy x86 decoder, not from the micro-op cache.

command

sudo ./kernel-nanoBench.sh -basic_mode -unroll_count 1 -loop_count 100000 -cpu 1 -asm "nop ax; nop ax; nop ax; nop ax; nop ax; nop ax; nop ax; nop ax" -config configs/cfg_Zen_all.txt | grep -i "dedisuops"

results (I slightly modified the source code to dump absolute measured counters)

DeDisUopsFromDecoder.DecoderDispatched: 10.00 (1000019)
DeDisUopsFromDecoder.OpCacheDispatched: 0.00 (0)
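
As a sanity check on these numbers: the per-iteration value appears to be the absolute count divided by the number of executed iterations. A minimal sketch of that arithmetic follows. It assumes normalization by loop_count * unroll_count, and the function name is hypothetical, not part of nanoBench.

```python
# Hypothetical helper: normalize an absolute counter value to a
# per-iteration value, assuming division by loop_count * unroll_count.
def per_iteration(absolute_count, loop_count, unroll_count=1):
    return absolute_count / (loop_count * unroll_count)


# Absolute counts taken from the modified nanoBench output above.
decoder = per_iteration(1000019, 100000)
opcache = per_iteration(0, 100000)

print(f"DecoderDispatched: {decoder:.2f}")  # prints 10.00
print(f"OpCacheDispatched: {opcache:.2f}")  # prints 0.00
```

About 10 uops per iteration would be consistent with the 8 nops plus two uops of loop overhead, though that breakdown is my assumption.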

I cannot understand why every instruction is decoded by the legacy x86 decoder.

I also checked with a simple test program consisting of the same code pattern (see below).
test.s build command: <nasm -f elf64 test.s -o test.o; ld test.o -o test>

global _start

_start:
    mov rdi, 100000
    call test_uop_cache_hit
    mov rax, 60
    mov rdi, 0
    syscall

test_uop_cache_hit:
    nop ax
    nop ax
    nop ax
    nop ax
    nop ax
    nop ax
    nop ax
    nop ax

    dec rdi
    jnz test_uop_cache_hit
    ret

Then, I checked the performance counters with the perf tool.

$ perf stat -e cycles,instructions,r01AA,r02AA,r03AA ./test

 Performance counter stats for './test':

            298349      cycles                                                      
           1037949      instructions              #    3.48  insn per cycle                                            
             86233      r01AA                                                       
            999280      r02AA                                                       
           1085721      r03AA                                                       

       0.000433346 seconds time elapsed

The results show that most uops are dispatched from the micro-op cache (r01AA => dispatched from the legacy x86 decoder // r02AA => dispatched from the micro-op cache // r03AA => all uops).
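
To make the split explicit, here is a minimal sketch that decodes the raw perf event codes and computes each source's share of the dispatched uops. It assumes the usual two-byte `rUUEE` layout (unit mask in the high byte, event select in the low byte) and does not handle AMD's extended event-select bits. The function and variable names are hypothetical.

```python
# Hypothetical helper: split a raw perf event such as "r01AA" into
# (event_select, unit_mask), assuming the two-byte rUUEE layout.
def decode_raw_event(raw):
    if not raw.startswith("r"):
        raise ValueError("expected a raw event of the form rUUEE")
    value = int(raw[1:], 16)
    return value & 0xFF, (value >> 8) & 0xFF


# Counts copied from the perf stat output above.
counts = {"r01AA": 86233, "r02AA": 999280, "r03AA": 1085721}

for name in counts:
    event, umask = decode_raw_event(name)
    print(f"{name}: event=0x{event:02X} umask=0x{umask:02X}")

total = counts["r03AA"]
decoder_share = counts["r01AA"] / total
opcache_share = counts["r02AA"] / total
print(f"legacy decoder: {decoder_share:.1%}")  # prints 7.9%
print(f"micro-op cache: {opcache_share:.1%}")  # prints 92.0%
```

So in user space roughly 92% of the uops come from the micro-op cache, versus 0% in the kernel-mode nanoBench run.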

Why do nanoBench and perf show different results?

Sincerely.
Joonsung Kim.


andreas-abel · 2020-11-04T22:25:50Z

Note that the perf tool runs the benchmark in user space. If you use the user-space version of nanoBench (i.e., use nanoBench.sh instead of kernel-nanoBench.sh), the results are very similar to perf.

I do not know why the uops don't come from the uop cache when running the benchmark in kernel space. However, I don't think that the measurements are incorrect.

joonsung-kim · 2020-11-06T06:10:52Z

@andreas-abel

Thanks. With user-mode nanoBench, it works as I expected :). However, I still can't figure out why kernel-mode nanoBench produces these results. (Personally, I prefer kernel-mode nanoBench because it minimizes extra overhead.)

Is there any plan to fix this issue in kernel-mode nanoBench?

andreas-abel · 2020-11-06T14:47:58Z

I don't think there is anything to be fixed in nanoBench, as I don't think there is anything wrong. If you don't like how the CPU behaves in kernel mode, you would need to contact AMD ;)

joonsung-kim · 2020-11-08T10:25:42Z

Yes, I also think there is nothing wrong with kernel-mode nanoBench. It would be better to contact AMD. Thanks for your reply :)

andreas-abel closed this as completed Nov 8, 2020
