# Benchmarks {#benchmarks}

\note The benchmarks discussed in this section are only meant for reasoning about relative performance compared to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.

To reason about the performance of the generated code, I've made a set of benchmarks around the following message:

```xml
<sbe:message name="msg1" id="1">
    <field name="field1" id="1" type="uint32"/>
    <field name="field2" id="2" type="uint32"/>
    <field name="field3" id="3" type="uint32"/>
    <field name="field4" id="4" type="uint32"/>
    <field name="field5" id="5" type="uint32"/>

    <group name="flat_group" id="10">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
    </group>

    <group name="nested_group" id="20">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <data name="data" id="6" type="varDataEncoding"/>
    </group>

    <group name="nested_group2" id="30">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>

        <group name="nested_group" id="20">
            <field name="field1" id="1" type="uint32"/>
            <field name="field2" id="2" type="uint32"/>
            <field name="field3" id="3" type="uint32"/>
            <field name="field4" id="4" type="uint32"/>
            <field name="field5" id="5" type="uint32"/>
            <data name="data" id="6" type="varDataEncoding"/>
        </group>
    </group>

    <data name="data" id="6" type="varDataEncoding"/>
</sbe:message>
```

All benchmarks follow the same scenario: read message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields plus all fields of every flat_group entry, and so on.
There are 4 different reading methods:

  • raw_reader, a hand-written reader which uses pointer arithmetic and casts
  • sbepp_reader, a reader which uses the normal accessors of sbepp-generated code
  • sbepp_cursor_reader, a reader which uses the cursor-based accessors of sbepp-generated code
  • real_logic_reader, a reader which uses code generated by Real Logic's SBE code generator, which provides forward-only access
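For illustration, here is a minimal sketch of what a hand-written reader in the spirit of raw_reader might look like for the top-level fields and flat_group. It is not the benchmark's code, and all names are hypothetical. It assumes the standard 8-byte SBE message header, a groupSizeEncoding of `{uint16 blockLength; uint16 numInGroup}` (the composite types are not shown in the schema above) and a buffer whose byte order matches the host. It deliberately swaps the pointer casts for `std::memcpy`, which avoids alignment and strict-aliasing problems; compilers typically lower such a fixed-size copy to a single load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical hand-written reader in the spirit of raw_reader (not its code).
// ASSUMPTIONS:
//   - 8-byte SBE message header: uint16 blockLength, templateId, schemaId, version
//   - flat_group uses groupSizeEncoding {uint16 blockLength; uint16 numInGroup}
//   - buffer byte order matches the host (SBE's default byte order is little-endian)
// std::memcpy replaces pointer casts to avoid alignment/strict-aliasing issues.

template<typename T>
T read_at(const std::uint8_t* p)
{
    T value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

constexpr std::size_t message_header_size = 8;
constexpr std::size_t group_header_size = 4;
constexpr std::size_t fields_per_block = 5; // field1..field5, all uint32

// Sums the 5 top-level fields and the 5 fields of every flat_group entry.
// Only a minimal size check is done; a production reader would bound-check
// every access against `size`.
std::uint64_t read_flat_group(const std::uint8_t* buf, std::size_t size)
{
    assert(size >= message_header_size + group_header_size);
    const std::uint8_t* p = buf;

    // blockLength is the first header field; it tells where the root block ends
    const auto root_block_length = read_at<std::uint16_t>(p);
    p += message_header_size;

    std::uint64_t sum = 0;
    for(std::size_t i = 0; i != fields_per_block; i++)
    {
        sum += read_at<std::uint32_t>(p + i * sizeof(std::uint32_t));
    }
    p += root_block_length;

    // flat_group header: entry blockLength followed by numInGroup
    const auto entry_length = read_at<std::uint16_t>(p);
    const auto num_in_group = read_at<std::uint16_t>(p + sizeof(std::uint16_t));
    p += group_header_size;

    for(std::uint16_t entry = 0; entry != num_in_group; entry++)
    {
        for(std::size_t i = 0; i != fields_per_block; i++)
        {
            sum += read_at<std::uint32_t>(p + i * sizeof(std::uint32_t));
        }
        p += entry_length;
    }
    return sum;
}
```

Skipping by the encoded `blockLength` values rather than hard-coding 20 bytes is what lets an SBE reader cope with blocks that a newer schema version has extended.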

The idea was to compare the performance of normal and cursor-based accessors against hand-written code on a message of gradually increasing complexity. All measurements were taken on a batch of 1000 messages, using two different strategies:

  1. Group sizes and data field sizes are fixed to 10. Since all messages have the same structure, this benchmark is quite stable and is the one used for the analysis below.

    Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
    
    Benchmark                                                                 Time             CPU   Iterations
    sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10              1515 ns         1515 ns       439318
    sbepp_reader::flat_group_benchmark/1000/10/10/10/10                   23784 ns        23783 ns        29424
    sbepp_reader::nested_group_benchmark/1000/10/10/10/10                 60434 ns        60431 ns        11524
    sbepp_reader::nested_group2_benchmark/1000/10/10/10/10               580107 ns       580068 ns         1208
    sbepp_reader::whole_message_benchmark/1000/10/10/10/10               822789 ns       822741 ns          848
    
    sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10       1516 ns         1516 ns       462815
    sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10            23767 ns        23765 ns        29446
    sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10          59644 ns        59642 ns        11640
    sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10        397326 ns       397305 ns         1732
    sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10        412343 ns       412322 ns         1716
    
    raw_reader::top_level_fields_benchmark/1000/10/10/10/10                1518 ns         1517 ns       460772
    raw_reader::flat_group_benchmark/1000/10/10/10/10                     23761 ns        23759 ns        29490
    raw_reader::nested_group_benchmark/1000/10/10/10/10                   62226 ns        62219 ns        11198
    raw_reader::nested_group2_benchmark/1000/10/10/10/10                 431421 ns       431394 ns         1617
    raw_reader::whole_message_benchmark/1000/10/10/10/10                 423216 ns       423194 ns         1654
    
    real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10         1524 ns         1524 ns       462506
    real_logic_reader::flat_group_benchmark/1000/10/10/10/10              23044 ns        23042 ns        30361
    real_logic_reader::nested_group_benchmark/1000/10/10/10/10            60635 ns        60632 ns        11447
    real_logic_reader::nested_group2_benchmark/1000/10/10/10/10          422053 ns       422028 ns         1642
    real_logic_reader::whole_message_benchmark/1000/10/10/10/10          431510 ns       431489 ns         1642
    
  2. Group sizes are randomized in the range [0; 20] and data sizes in the range [0; 32]. Because the message structure varies heavily, this benchmark cannot be used to compare the reading approaches and is provided only for reference.

    Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
    
    Benchmark                                                                 Time             CPU   Iterations
    sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32                1520 ns         1520 ns       460833
    sbepp_reader::flat_group_benchmark/1000/0/20/0/32                     21984 ns        21983 ns        29613
    sbepp_reader::nested_group_benchmark/1000/0/20/0/32                  139916 ns       139912 ns         4900
    sbepp_reader::nested_group2_benchmark/1000/0/20/0/32                1507963 ns      1507874 ns          481
    sbepp_reader::whole_message_benchmark/1000/0/20/0/32                1818439 ns      1818343 ns          388
    
    sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32         1511 ns         1511 ns       463569
    sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32              22442 ns        22442 ns        30635
    sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32           137442 ns       137438 ns         5036
    sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32         1251388 ns      1251352 ns          540
    sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32         1304626 ns      1304581 ns          538
    
    raw_reader::top_level_fields_benchmark/1000/0/20/0/32                  1511 ns         1511 ns       463647
    raw_reader::flat_group_benchmark/1000/0/20/0/32                       22794 ns        22793 ns        29730
    raw_reader::nested_group_benchmark/1000/0/20/0/32                    137293 ns       137289 ns         4893
    raw_reader::nested_group2_benchmark/1000/0/20/0/32                  1307361 ns      1307269 ns          533
    raw_reader::whole_message_benchmark/1000/0/20/0/32                  1296803 ns      1296747 ns          544
    
    real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32           1510 ns         1510 ns       463907
    real_logic_reader::flat_group_benchmark/1000/0/20/0/32                23054 ns        23053 ns        30689
    real_logic_reader::nested_group_benchmark/1000/0/20/0/32             141231 ns       141225 ns         5048
    real_logic_reader::nested_group2_benchmark/1000/0/20/0/32           1301144 ns      1301107 ns          524
    real_logic_reader::whole_message_benchmark/1000/0/20/0/32           1371855 ns      1371795 ns          539
    

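The randomized strategy can be pictured with a small sketch like the one below. It is hypothetical: the names are made up, and it assumes one group size and one data size are drawn per message, whereas the actual benchmark may draw them per group or per entry. `std::uniform_int_distribution` uses closed bounds, which matches the [0; 20] and [0; 32] ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical description of one randomized message. ASSUMPTION: one group
// size and one data size are drawn per message; the real benchmark may draw
// them per group or per entry instead.
struct random_message_sizes
{
    std::size_t group_size; // entries in each group, within [0; 20]
    std::size_t data_size;  // bytes in each data member, within [0; 32]
};

std::vector<random_message_sizes> make_random_sizes(
    std::size_t message_count, std::uint32_t seed)
{
    std::mt19937 gen{seed};
    // uniform_int_distribution bounds are inclusive: [a; b]
    std::uniform_int_distribution<std::size_t> group_dist{0, 20};
    std::uniform_int_distribution<std::size_t> data_dist{0, 32};

    std::vector<random_message_sizes> sizes;
    sizes.reserve(message_count);
    for(std::size_t i = 0; i != message_count; i++)
    {
        // draw in a fixed order so a given seed always yields the same sequence
        const auto group_size = group_dist(gen);
        const auto data_size = data_dist(gen);
        sizes.push_back({group_size, data_size});
    }
    return sizes;
}
```

Passing the same seed yields the same sequence of sizes, which keeps individual runs reproducible.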
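We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark there's no significant gain, because skipping a single data member is cheap: computing its length takes a single memory read. Only from nested_group2_benchmark onward does the cursor-based API really start to shine, since the message structure becomes genuinely complex at that point. With fixed sizes, sbepp_cursor_reader takes about 397 µs versus about 580 µs for sbepp_reader in nested_group2_benchmark, and about 412 µs versus about 823 µs in whole_message_benchmark, roughly on par with raw_reader and real_logic_reader.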
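A simplified model of why cursors help with complex structures is sketched below. It rests on loudly stated assumptions: the layout is made up (a count, a run of length-prefixed entries, then a fixed-size trailer) rather than the benchmark schema, and the "stateless" reader only mimics an accessor that recomputes an offset from the start of its parent; it is not sbepp's implementation, and it is one plausible explanation for the gap rather than a profile. The hypothetical `length_reads` counter shows the effect: locating the trailer statelessly walks every preceding variable-size entry a second time, while a cursor simply reuses the position where the previous read ended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified, hypothetical layout (NOT the benchmark schema or sbepp's code):
//   uint16 count, then `count` entries of {uint16 length; uint8 bytes[length]},
//   then a uint32 trailer field that follows the variable-size entries.

static std::size_t length_reads = 0; // counts how often a length prefix is read

std::uint16_t read_u16(const std::uint8_t* p)
{
    std::uint16_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

std::uint32_t read_u32(const std::uint8_t* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Total size of one entry; finding it requires reading the length prefix.
std::size_t entry_size(const std::uint8_t* entry)
{
    length_reads++;
    return sizeof(std::uint16_t) + read_u16(entry);
}

// Adds up the data bytes of `count` entries starting at `p` and returns the
// position right after the last entry.
const std::uint8_t* sum_entries(
    const std::uint8_t* p, std::uint16_t count, std::uint64_t& sum)
{
    for(std::uint16_t i = 0; i != count; i++)
    {
        const std::size_t size = entry_size(p);
        for(std::size_t b = sizeof(std::uint16_t); b != size; b++)
        {
            sum += p[b];
        }
        p += size;
    }
    return p;
}

// Stateless accessor: finds the trailer from the start of the buffer, so it
// has to walk every preceding entry again.
std::uint32_t trailer_stateless(const std::uint8_t* buf)
{
    const std::uint16_t count = read_u16(buf);
    const std::uint8_t* p = buf + sizeof(std::uint16_t);
    for(std::uint16_t i = 0; i != count; i++)
    {
        p += entry_size(p);
    }
    return read_u32(p);
}

std::uint64_t read_stateless(const std::uint8_t* buf)
{
    std::uint64_t sum = 0;
    // the end position is discarded, as a stateless accessor keeps no state
    sum_entries(buf + sizeof(std::uint16_t), read_u16(buf), sum);
    return sum + trailer_stateless(buf);
}

std::uint64_t read_with_cursor(const std::uint8_t* buf)
{
    std::uint64_t sum = 0;
    // the cursor ends up pointing at the trailer: no second walk is needed
    const std::uint8_t* cursor =
        sum_entries(buf + sizeof(std::uint16_t), read_u16(buf), sum);
    return sum + read_u32(cursor);
}
```

With 3 entries the stateless path performs 6 length reads and the cursor path 3; in this model the absolute difference grows with the amount of variable-size content that precedes the accessed element.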