Self measuring instruction performance tests #839
Note: added M and Zfinx test suites as well.
Additional data. This is the performance with C (compressed) turned off:
That's impressive - you really put a lot of work into this! 👍 However, this really surprised me:
All simple ALU operations should complete within 2 clock cycles. What kind of memory system did you use for these evaluations?

Furthermore, I would suggest stopping the cycle counter whenever the actual benchmarking instructions are not being executed. Otherwise, the loop overhead instructions are also taken into account.

    neorv32_cpu_csr_write(CSR_MCOUNTINHIBIT, -1); // halt all counters
    ...
    startTime = neorv32_cpu_csr_read(CSR_MCYCLE);
    for (i = 0; i < instLoop; i++) {
      neorv32_cpu_csr_write(CSR_MCOUNTINHIBIT, 0); // enable all counters
    #if instCalls == 16
      ...
    #endif
      neorv32_cpu_csr_write(CSR_MCOUNTINHIBIT, -1); // halt all counters
    }
    stopTime = neorv32_cpu_csr_read(CSR_MCYCLE);

Somewhere in the code I saw that you were having trouble with NOP instructions when compiling with compressed instructions. This example shows how to prevent the assembler from emitting compressed NOPs:

    asm volatile (".option push  \n"
                  ".option norvc \n"
                  "nop           \n"
                  ".option pop   \n");
The big "problem" with the
Oh, no IDLE status? How do you manage that? Don't you then have a direct combinatorial feedback from ACK/ERR to STB?
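Coming back to the cycle-counter suggestion above: once the counter only runs around the instructions under test, the raw `mcycle` delta still has to be turned into a per-instruction figure. Here is a minimal host-side sketch of that arithmetic, assuming `instLoop` iterations of `instCalls` back-to-back instructions as in the loop shown earlier. The helper name `cycles_per_inst_x100` and the `baseline` parameter (the delta measured for the same loop with an empty body, which would capture whatever the two `mcountinhibit` writes still cost) are assumptions for illustration, not something taken from the test suite.

```c
#include <stdint.h>

/* Hypothetical helper: average cycles per instruction, scaled by 100 so
 * that no floating point is needed on the target.
 *   start, stop : mcycle values read before and after the loop
 *   baseline    : mcycle delta of the same loop with an empty body
 *                 (assumed to be measured in a separate run)
 * Returns 0 if the inputs are inconsistent. */
static uint32_t cycles_per_inst_x100(uint64_t start, uint64_t stop,
                                     uint64_t baseline,
                                     uint32_t inst_loop, uint32_t inst_calls)
{
    uint64_t total = (uint64_t)inst_loop * inst_calls; /* instructions timed */
    if (total == 0 || stop < start) {
        return 0;
    }
    uint64_t delta = stop - start;
    if (delta < baseline) {
        return 0;
    }
    /* Round to nearest instead of truncating. */
    return (uint32_t)(((delta - baseline) * 100u + total / 2u) / total);
}
```

With made-up numbers, `cycles_per_inst_x100(1000, 4400, 200, 100, 16)` yields 200, i.e. 2.00 cycles per instruction: 3400 measured cycles minus 200 cycles of baseline, spread over 1600 instructions.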
I've added the first of a series of self-measuring performance tests utilizing mcycle and some assembly. This will help update the datasheet instruction cycle counts :)
https://github.com/mikaelsky/neorv32/tree/Performance_tests
Right now the MRET test isn't functional. It traps out, probably because of U being enabled.
The main.c file has a boatload of parameters to adjust what is tested and how.
@stnolting one challenge right now is that the cycle counts seem off, and I'm not sure why.
If you look at the README, I pasted in the performance measurements for the basic arithmetic instructions. When I run the test internally on Xcelium I get the expected cycle count of 2 for everything except sub, which is 3 for some reason. But running with GHDL I get 4, so ~2x.
I'll dig some more on my end. The only core config difference I see is that our core config has C enabled, but that shouldn't matter for I instructions. Beyond that, I had to do some NOP tuning for all the branch and jump instructions.
A further detail is, e.g., branch, which comes in at (3, 7, 8) cycles for [no branch, forward, backward] internally but (4, 11, 11) cycles in the default sim setup. I added forward and backward tests because the RISC-V spec notes that a branch with a negative (backward) offset should be assumed taken, so it can be faster than a forward branch.
I will add my M and Zfinx self-measuring tests as well.
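To make the gap between the two setups easier to see, the figures quoted in this post can be tabulated as absolute and relative differences. This is only a host-side sketch over those numbers; the struct, array and function names are made up for illustration, and the "simple ALU op" row leaves out `sub`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Cycle counts quoted in this post; all names are illustrative only. */
struct cycle_sample {
    const char *name;
    uint32_t internal;    /* internal Xcelium run */
    uint32_t default_sim; /* GHDL / default sim setup */
};

static const struct cycle_sample samples[] = {
    { "simple ALU op",   2,  4 },
    { "no branch",       3,  4 },
    { "forward branch",  7, 11 },
    { "backward branch", 8, 11 },
};

/* Extra cycles seen in the default sim setup. */
static int32_t extra_cycles(const struct cycle_sample *s)
{
    return (int32_t)s->default_sim - (int32_t)s->internal;
}

/* Slowdown in percent (200 means 2x), rounded to nearest. */
static uint32_t slowdown_percent(const struct cycle_sample *s)
{
    return (s->default_sim * 100u + s->internal / 2u) / s->internal;
}

/* Print one line per sample: name, both counts, delta and slowdown. */
static void print_samples(void)
{
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        const struct cycle_sample *s = &samples[i];
        printf("%-16s %2u -> %2u (+%d cycles, %u%%)\n", s->name,
               (unsigned)s->internal, (unsigned)s->default_sim,
               (int)extra_cycles(s), (unsigned)slowdown_percent(s));
    }
}
```

Calling `print_samples()` shows that the slowdown is exactly 2x only for the simple ALU case; the branch cases come out at roughly 1.3x to 1.6x, so the difference does not look like one uniform scaling factor.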