Benchmarks {#benchmarks}

\note Benchmarks discussed in this section are only meant for reasoning about performance relative to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.

To reason about the performance of generated code I've made a set of benchmarks around this message:

<sbe:message name="msg1" id="1">
    <field name="field1" id="1" type="uint32"/>
    <field name="field2" id="2" type="uint32"/>
    <field name="field3" id="3" type="uint32"/>
    <field name="field4" id="4" type="uint32"/>
    <field name="field5" id="5" type="uint32"/>

    <group name="flat_group" id="10">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
    </group>

    <group name="nested_group" id="20">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <data name="data" id="6" type="varDataEncoding"/>
    </group>

    <group name="nested_group2" id="30">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>

        <group name="nested_group" id="20">
            <field name="field1" id="1" type="uint32"/>
            <field name="field2" id="2" type="uint32"/>
            <field name="field3" id="3" type="uint32"/>
            <field name="field4" id="4" type="uint32"/>
            <field name="field5" id="5" type="uint32"/>
            <data name="data" id="6" type="varDataEncoding"/>
        </group>
    </group>

    <data name="data" id="6" type="varDataEncoding"/>
</sbe:message>

All benchmarks use the same scenario: read all message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields plus all fields in all entries of flat_group, and so on.
There are 4 different reading methods:

raw_reader, a reader written by hand which uses pointer arithmetic and casts
sbepp_reader, a reader which uses normal accessors of sbepp generated code
sbepp_cursor_reader, a reader which uses cursor-based accessors of sbepp generated code
real_logic_reader, a reader which uses code generated by RealLogic, which provides forward-only access
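To make "pointer arithmetic and casts" concrete, here is a minimal sketch of what a hand-written reader for the top-level fields and flat_group might look like. It is not the benchmark's actual raw_reader. It assumes the standard 8-byte SBE message header (with blockLength first), a 4-byte group header (uint16 blockLength, uint16 numInGroup), and a host whose byte order matches the encoding. All function names, including the `make_test_message` helper, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed sizes, not taken from the benchmark sources.
constexpr std::size_t message_header_size = 8;
constexpr std::size_t group_header_size = 4;
constexpr std::size_t field_count = 5;

// Load a value from a possibly unaligned address without aliasing issues.
template<typename T>
T load(const unsigned char* ptr)
{
    T value;
    std::memcpy(&value, ptr, sizeof(T));
    return value;
}

// Sum the five uint32 fields of one block; the sum keeps the reads observable.
std::uint64_t sum_fields(const unsigned char* block)
{
    std::uint64_t sum = 0;
    for(std::size_t i = 0; i != field_count; ++i)
    {
        sum += load<std::uint32_t>(block + i * sizeof(std::uint32_t));
    }
    return sum;
}

// Read only the top-level fields, which start right after the message header.
std::uint64_t read_top_level_fields(const unsigned char* msg)
{
    return sum_fields(msg + message_header_size);
}

// Read the top-level fields, then every entry of flat_group.
std::uint64_t read_flat_group(const unsigned char* msg)
{
    std::uint64_t sum = read_top_level_fields(msg);
    // The root block length is the first uint16 of the assumed message header.
    const auto root_block_length = load<std::uint16_t>(msg);
    const unsigned char* ptr = msg + message_header_size + root_block_length;
    const auto entry_length = load<std::uint16_t>(ptr);
    const auto entry_count = load<std::uint16_t>(ptr + 2);
    assert(entry_length >= field_count * sizeof(std::uint32_t));
    ptr += group_header_size;
    for(std::uint16_t i = 0; i != entry_count; ++i)
    {
        sum += sum_fields(ptr);
        ptr += entry_length;
    }
    return sum;
}

// Hypothetical helper: build a message whose every field holds the value 1.
std::vector<unsigned char> make_test_message(std::uint16_t entry_count)
{
    const auto block_length =
        static_cast<std::uint16_t>(field_count * sizeof(std::uint32_t));
    std::vector<unsigned char> buf(
        message_header_size + block_length + group_header_size
        + std::size_t{entry_count} * block_length);
    const std::uint32_t one = 1;
    auto write_block = [&](unsigned char* block) {
        for(std::size_t i = 0; i != field_count; ++i)
        {
            std::memcpy(block + i * sizeof(one), &one, sizeof(one));
        }
    };
    std::memcpy(buf.data(), &block_length, sizeof(block_length));
    unsigned char* ptr = buf.data() + message_header_size;
    write_block(ptr);
    ptr += block_length;
    std::memcpy(ptr, &block_length, sizeof(block_length));
    std::memcpy(ptr + 2, &entry_count, sizeof(entry_count));
    ptr += group_header_size;
    for(std::uint16_t i = 0; i != entry_count; ++i)
    {
        write_block(ptr);
        ptr += block_length;
    }
    return buf;
}
```

For a message built with `make_test_message(3)`, `read_top_level_fields` sums 5 fields and `read_flat_group` sums 20. Returning a sum is only one way to keep the compiler from discarding the reads; a real benchmark would more likely use its framework's own facility for that.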

The idea was to compare the performance of normal and cursor-based accessors against hand-written code on a message of gradually increasing complexity. All measurements were taken on a pack of 1000 messages, using two different strategies:

Fixed group size and data field size of 10. Since all messages have the same structure, this benchmark is quite stable and was used for the following analysis.

Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz

Benchmark                                                                 Time             CPU   Iterations
sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10              1515 ns         1515 ns       439318
sbepp_reader::flat_group_benchmark/1000/10/10/10/10                   23784 ns        23783 ns        29424
sbepp_reader::nested_group_benchmark/1000/10/10/10/10                 60434 ns        60431 ns        11524
sbepp_reader::nested_group2_benchmark/1000/10/10/10/10               580107 ns       580068 ns         1208
sbepp_reader::whole_message_benchmark/1000/10/10/10/10               822789 ns       822741 ns          848

sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10       1516 ns         1516 ns       462815
sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10            23767 ns        23765 ns        29446
sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10          59644 ns        59642 ns        11640
sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10        397326 ns       397305 ns         1732
sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10        412343 ns       412322 ns         1716

raw_reader::top_level_fields_benchmark/1000/10/10/10/10                1518 ns         1517 ns       460772
raw_reader::flat_group_benchmark/1000/10/10/10/10                     23761 ns        23759 ns        29490
raw_reader::nested_group_benchmark/1000/10/10/10/10                   62226 ns        62219 ns        11198
raw_reader::nested_group2_benchmark/1000/10/10/10/10                 431421 ns       431394 ns         1617
raw_reader::whole_message_benchmark/1000/10/10/10/10                 423216 ns       423194 ns         1654

real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10         1524 ns         1524 ns       462506
real_logic_reader::flat_group_benchmark/1000/10/10/10/10              23044 ns        23042 ns        30361
real_logic_reader::nested_group_benchmark/1000/10/10/10/10            60635 ns        60632 ns        11447
real_logic_reader::nested_group2_benchmark/1000/10/10/10/10          422053 ns       422028 ns         1642
real_logic_reader::whole_message_benchmark/1000/10/10/10/10          431510 ns       431489 ns         1642

Randomized group size in the range [0; 20] and data size in the range [0; 32]. This run cannot be used to compare reading approaches, since the message structure varies heavily between messages, and is provided only for reference.

Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz

Benchmark                                                                 Time             CPU   Iterations
sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32                1520 ns         1520 ns       460833
sbepp_reader::flat_group_benchmark/1000/0/20/0/32                     21984 ns        21983 ns        29613
sbepp_reader::nested_group_benchmark/1000/0/20/0/32                  139916 ns       139912 ns         4900
sbepp_reader::nested_group2_benchmark/1000/0/20/0/32                1507963 ns      1507874 ns          481
sbepp_reader::whole_message_benchmark/1000/0/20/0/32                1818439 ns      1818343 ns          388

sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32         1511 ns         1511 ns       463569
sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32              22442 ns        22442 ns        30635
sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32           137442 ns       137438 ns         5036
sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32         1251388 ns      1251352 ns          540
sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32         1304626 ns      1304581 ns          538

raw_reader::top_level_fields_benchmark/1000/0/20/0/32                  1511 ns         1511 ns       463647
raw_reader::flat_group_benchmark/1000/0/20/0/32                       22794 ns        22793 ns        29730
raw_reader::nested_group_benchmark/1000/0/20/0/32                    137293 ns       137289 ns         4893
raw_reader::nested_group2_benchmark/1000/0/20/0/32                  1307361 ns      1307269 ns          533
raw_reader::whole_message_benchmark/1000/0/20/0/32                  1296803 ns      1296747 ns          544

real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32           1510 ns         1510 ns       463907
real_logic_reader::flat_group_benchmark/1000/0/20/0/32                23054 ns        23053 ns        30689
real_logic_reader::nested_group_benchmark/1000/0/20/0/32             141231 ns       141225 ns         5048
real_logic_reader::nested_group2_benchmark/1000/0/20/0/32           1301144 ns      1301107 ns          524
real_logic_reader::whole_message_benchmark/1000/0/20/0/32           1371855 ns      1371795 ns          539

We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark there's no significant gain, because a single data member is cheap to skip: computing its length is a single memory read. Only from nested_group2_benchmark onward does the cursor-based API pay off, since the message structure becomes complex at that point. In the fixed-size run, sbepp_reader takes 580107 ns against 397326 ns for sbepp_cursor_reader in nested_group2_benchmark (about 1.46x), and 822789 ns against 412343 ns in whole_message_benchmark (about 2x).
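One way to see why the gap grows with complexity is to count how many length reads it takes to locate variable-sized items. The sketch below uses a deliberately simplified model, not sbepp's implementation: a buffer of entries, each a 1-byte length followed by that many payload bytes. It compares recomputing every offset from the start with carrying a cursor forward. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: `count` entries, each a 1-byte length prefix followed
// by `payload_size` zero bytes.
std::vector<std::uint8_t> make_entries(
    std::size_t count, std::uint8_t payload_size)
{
    std::vector<std::uint8_t> buf;
    for(std::size_t i = 0; i != count; ++i)
    {
        buf.push_back(payload_size);
        buf.insert(buf.end(), payload_size, std::uint8_t{0});
    }
    return buf;
}

// Locate every entry by skipping all earlier ones from the start each time.
// Returns the number of length reads performed.
std::size_t visit_all_from_start(
    const std::vector<std::uint8_t>& buf, std::size_t count)
{
    std::size_t length_reads = 0;
    for(std::size_t index = 0; index != count; ++index)
    {
        std::size_t offset = 0;
        for(std::size_t i = 0; i != index; ++i)
        {
            assert(offset < buf.size());
            offset += 1 + buf[offset];
            ++length_reads;
        }
    }
    return length_reads;
}

// Locate every entry by advancing a single cursor past each one in turn.
// Returns the number of length reads performed.
std::size_t visit_all_with_cursor(
    const std::vector<std::uint8_t>& buf, std::size_t count)
{
    std::size_t length_reads = 0;
    std::size_t cursor = 0;
    for(std::size_t index = 0; index != count; ++index)
    {
        assert(cursor < buf.size());
        cursor += 1 + buf[cursor];
        ++length_reads;
    }
    return length_reads;
}
```

With 10 entries, the first function performs 45 length reads and the second performs 10. The quadratic growth in the first case is the kind of cost a cursor avoids, and it only becomes visible once many variable-sized items precede the data being read.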
