\note The benchmarks discussed in this section are only meant for reasoning about relative performance compared to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.
To reason about the performance of the generated code, I've made a set of benchmarks around this message:
<sbe:message name="msg1" id="1">
    <field name="field1" id="1" type="uint32"/>
    <field name="field2" id="2" type="uint32"/>
    <field name="field3" id="3" type="uint32"/>
    <field name="field4" id="4" type="uint32"/>
    <field name="field5" id="5" type="uint32"/>
    <group name="flat_group" id="10">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
    </group>
    <group name="nested_group" id="20">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <data name="data" id="6" type="varDataEncoding"/>
    </group>
    <group name="nested_group2" id="30">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <group name="nested_group" id="20">
            <field name="field1" id="1" type="uint32"/>
            <field name="field2" id="2" type="uint32"/>
            <field name="field3" id="3" type="uint32"/>
            <field name="field4" id="4" type="uint32"/>
            <field name="field5" id="5" type="uint32"/>
            <data name="data" id="6" type="varDataEncoding"/>
        </group>
    </group>
    <data name="data" id="6" type="varDataEncoding"/>
</sbe:message>
They all use the same scenario: read all message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields and all fields in all entries of flat_group, and so on.
There are 4 different reading methods:
- raw_reader, a reader written by hand which uses pointer arithmetic and casts
- sbepp_reader, a reader which uses normal accessors of sbepp generated code
- sbepp_cursor_reader, a reader which uses cursor-based accessors of sbepp generated code
- real_logic_reader, a reader which uses code generated by RealLogic, which provides forward-only access
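
To give a feel for what a hand-written reader in the spirit of raw_reader involves, here is a minimal sketch for the five top-level fields only. It is not the benchmark's actual code: the names read_u32, read_top_level_fields and make_message are hypothetical, and it assumes the standard 8-byte SBE message header (four uint16 fields) in front of the block plus a host whose byte order matches the encoding. It uses memcpy instead of a raw pointer cast to stay clear of alignment and aliasing issues.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reads a uint32 located `offset` bytes into `buf`.
// Assumption: the host byte order matches the message encoding, so no
// byte swapping is performed.
inline std::uint32_t read_u32(const unsigned char* buf, std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, buf + offset, sizeof(value));
    return value;
}

// Assumption: standard SBE message header made of four uint16 fields
// (blockLength, templateId, schemaId, version).
constexpr std::size_t header_size = 8;

// Reads field1..field5 in order and returns their sum, so that the reads
// have an observable result.
inline std::uint64_t read_top_level_fields(const unsigned char* msg)
{
    std::uint64_t sum = 0;
    for(std::size_t i = 0; i != 5; ++i)
    {
        // Fixed-size fields sit at constant offsets right after the header.
        sum += read_u32(msg, header_size + i * sizeof(std::uint32_t));
    }
    return sum;
}

// Hypothetical helper for the demo: a zeroed header followed by 5 fields.
inline std::vector<unsigned char> make_message(const std::uint32_t (&fields)[5])
{
    std::vector<unsigned char> buf(header_size + sizeof(fields), 0);
    std::memcpy(buf.data() + header_size, fields, sizeof(fields));
    return buf;
}
```

Because every top-level field lives at a compile-time constant offset, there is little room for any reading method to be faster or slower here, which matches the near-identical top_level_fields_benchmark timings.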
The idea was to compare the performance of normal and cursor-based accessors against hand-written code on a message of gradually increasing complexity. All measurements were taken on a batch of 1000 messages, using two different strategies:
- Fixed group size and data field size of 10. Since all messages have the same structure, this benchmark is quite stable and was used for the following analysis.

    Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz

    Benchmark  Time  CPU  Iterations
    sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10  1515 ns  1515 ns  439318
    sbepp_reader::flat_group_benchmark/1000/10/10/10/10  23784 ns  23783 ns  29424
    sbepp_reader::nested_group_benchmark/1000/10/10/10/10  60434 ns  60431 ns  11524
    sbepp_reader::nested_group2_benchmark/1000/10/10/10/10  580107 ns  580068 ns  1208
    sbepp_reader::whole_message_benchmark/1000/10/10/10/10  822789 ns  822741 ns  848
    sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10  1516 ns  1516 ns  462815
    sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10  23767 ns  23765 ns  29446
    sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10  59644 ns  59642 ns  11640
    sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10  397326 ns  397305 ns  1732
    sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10  412343 ns  412322 ns  1716
    raw_reader::top_level_fields_benchmark/1000/10/10/10/10  1518 ns  1517 ns  460772
    raw_reader::flat_group_benchmark/1000/10/10/10/10  23761 ns  23759 ns  29490
    raw_reader::nested_group_benchmark/1000/10/10/10/10  62226 ns  62219 ns  11198
    raw_reader::nested_group2_benchmark/1000/10/10/10/10  431421 ns  431394 ns  1617
    raw_reader::whole_message_benchmark/1000/10/10/10/10  423216 ns  423194 ns  1654
    real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10  1524 ns  1524 ns  462506
    real_logic_reader::flat_group_benchmark/1000/10/10/10/10  23044 ns  23042 ns  30361
    real_logic_reader::nested_group_benchmark/1000/10/10/10/10  60635 ns  60632 ns  11447
    real_logic_reader::nested_group2_benchmark/1000/10/10/10/10  422053 ns  422028 ns  1642
    real_logic_reader::whole_message_benchmark/1000/10/10/10/10  431510 ns  431489 ns  1642
- Randomized group size in the range [0; 20] and data size in the range [0; 32]. This one cannot be used to compare the different reading approaches because the message structure varies heavily between runs; it is provided for reference only.

    Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz

    Benchmark  Time  CPU  Iterations
    sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32  1520 ns  1520 ns  460833
    sbepp_reader::flat_group_benchmark/1000/0/20/0/32  21984 ns  21983 ns  29613
    sbepp_reader::nested_group_benchmark/1000/0/20/0/32  139916 ns  139912 ns  4900
    sbepp_reader::nested_group2_benchmark/1000/0/20/0/32  1507963 ns  1507874 ns  481
    sbepp_reader::whole_message_benchmark/1000/0/20/0/32  1818439 ns  1818343 ns  388
    sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32  1511 ns  1511 ns  463569
    sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32  22442 ns  22442 ns  30635
    sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32  137442 ns  137438 ns  5036
    sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32  1251388 ns  1251352 ns  540
    sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32  1304626 ns  1304581 ns  538
    raw_reader::top_level_fields_benchmark/1000/0/20/0/32  1511 ns  1511 ns  463647
    raw_reader::flat_group_benchmark/1000/0/20/0/32  22794 ns  22793 ns  29730
    raw_reader::nested_group_benchmark/1000/0/20/0/32  137293 ns  137289 ns  4893
    raw_reader::nested_group2_benchmark/1000/0/20/0/32  1307361 ns  1307269 ns  533
    raw_reader::whole_message_benchmark/1000/0/20/0/32  1296803 ns  1296747 ns  544
    real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32  1510 ns  1510 ns  463907
    real_logic_reader::flat_group_benchmark/1000/0/20/0/32  23054 ns  23053 ns  30689
    real_logic_reader::nested_group_benchmark/1000/0/20/0/32  141231 ns  141225 ns  5048
    real_logic_reader::nested_group2_benchmark/1000/0/20/0/32  1301144 ns  1301107 ns  524
    real_logic_reader::whole_message_benchmark/1000/0/20/0/32  1371855 ns  1371795 ns  539
We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark there's no significant gain, because a single data member is cheap to skip: computing its length is a single memory read. Only from nested_group2_benchmark onward does the cursor-based API really start to shine, since the message structure becomes complex at that point. In the fixed-size run, sbepp_cursor_reader takes about 397 µs versus 580 µs for sbepp_reader in nested_group2_benchmark, and about 412 µs versus 823 µs in whole_message_benchmark.
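
The intuition behind this gap can be illustrated with a deliberately simplified model. This is not sbepp's implementation, and all names here are hypothetical: each entry is just a 1-byte length prefix followed by its payload. The sketch counts how many length reads are needed to visit every entry when the offset is recomputed from the start on each access, compared to when a running position is carried forward the way a cursor does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model: an entry is a 1-byte length prefix plus that many
// payload bytes. Returns the offset just past the entry at `offset` and
// counts one length read.
inline std::size_t skip_entry(
    const std::vector<std::uint8_t>& buf,
    std::size_t offset,
    std::size_t& length_reads)
{
    ++length_reads;
    return offset + 1 + buf[offset];
}

// Locates every entry by walking from the beginning of the buffer each
// time, i.e. no position is remembered between accesses.
inline std::size_t reads_with_restart(
    const std::vector<std::uint8_t>& buf, std::size_t count)
{
    std::size_t length_reads = 0;
    for(std::size_t index = 0; index != count; ++index)
    {
        std::size_t offset = 0;
        for(std::size_t i = 0; i != index; ++i)
        {
            offset = skip_entry(buf, offset, length_reads);
        }
        (void)offset; // the entry at `offset` would be read here
    }
    return length_reads;
}

// Visits every entry while carrying the current position forward.
inline std::size_t reads_with_cursor(
    const std::vector<std::uint8_t>& buf, std::size_t count)
{
    std::size_t length_reads = 0;
    std::size_t cursor = 0;
    for(std::size_t index = 0; index != count; ++index)
    {
        // the entry at `cursor` would be read here, then skipped
        cursor = skip_entry(buf, cursor, length_reads);
    }
    return length_reads;
}

// Hypothetical helper for the demo: `count` entries of equal payload size.
inline std::vector<std::uint8_t> make_entries(
    std::size_t count, std::uint8_t payload_size)
{
    std::vector<std::uint8_t> buf;
    for(std::size_t i = 0; i != count; ++i)
    {
        buf.push_back(payload_size);
        buf.insert(
            buf.end(), static_cast<std::size_t>(payload_size), std::uint8_t{0});
    }
    return buf;
}
```

In this model, the number of length reads grows quadratically with the entry count when restarting and linearly when using a cursor. With a single variable-length member the difference is negligible, but it compounds once variable-length members are nested inside groups.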