This repository contains a code-generator that turns a Deutsche Börse ETI (Enhanced Trading Interface) protocol description into Python bindings. It supports EOBI (market data) protocol descriptions, as well.

There is also a code generator for creating Wireshark protocol dissectors from these protocol descriptions.

This is a private research project for investigating how binary serialization/deserialization code can profit from modern Python features, along with some other experiments.

2021, Georg Sauthoff [email protected]

Use Cases

The generated Python code can be used for several purposes, such as:

  • creating binary message templates for traffic generators or high performance ETI clients
  • analysing captured ETI messages for - say - debugging
  • a concise reference to look up message details such as the name, offset, width, type, etc. of fields
  • writing an ETI traffic generator or test-server in Python

Examples

For an example of what the generated code looks like, you can check out the output for the T7 ETI version 9 specification.

This repository also contains a simple ETI-Client (eti_client.py) and a small ETI-Server (eti_server.py) that can be used to ping pong some ETI messages over the network. The server runs forever and replies to each request with one or more context-dependent response messages (as specified in the protocol specification). If alternative response types are possible, a choice is made at random. Since the server dumps each received ETI message to stdout, it can also be used as an ad-hoc protocol dissector when developing/testing an ETI client.

There is also a simple EOBI-Client (eobi_client.py) that dumps multicast market data packets, including the DSCP field in which the EOBI protocol encodes market data related information, as well.

Another example is pcapdump.py, a simple PCAP to ETI/EOBI dumper. It pretty-prints EOBI/ETI packets from a PCAP file to stdout in a human-readable format. Note that, for simplicity, it assumes that each ETI TCP packet contains only complete ETI messages and starts with an ETI message header, which is usually the case in practice. Of course, since TCP is a stream-oriented protocol, it's perfectly fine for a client to split ETI messages across TCP segment boundaries. Adding TCP reassembly to the example is left as an exercise.

The pcapgen.py script shows how to quickly generate/fake some ETI/EOBI PCAP files from scratch for testing purposes.

Protocol Descriptions

Deutsche Börse publishes the ETI protocol descriptions on their web sites. Since they are sometimes kind of hard to find I include some links:

EOBI descriptions:

Related Documentation

The previous section contains links into the Eurex/Xetra system documentation which includes manuals and reference manuals on the various protocols and services.

Besides the protocols there is also the N7 Network Access Guide which lists the various ports and IP addresses in use for these protocols:

The functional reference gives some background on how the exchange system (the order matching etc.) is supposed to work:

Python Notes

The main noteworthy modern Python features the generated code uses are Python enumerations (available since Python 3.4) and dataclasses (available since Python 3.7, for Python 3.6 there is a backport).

Dataclasses provide some syntactic sugar for dealing with mutable named records in Python. Their use of type annotations and default values allows for compact definitions. Two things to keep in mind with dataclasses are that default values must be immutable and that additional (non-annotated) attributes can accidentally be added by typos. Thus, the generated code uses default factory functions for mutable defaults and overrides __setattr__() to check for unknown fields.
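
The following is a minimal sketch of these two safeguards. The Order and Leg classes and their field names are made up for illustration and are not taken from the generated code, which may implement the checks differently.

```python
from dataclasses import dataclass, field


@dataclass
class Leg:
    # Hypothetical sub-record, for illustration only
    Qty: int = 0


@dataclass
class Order:
    # Hypothetical message with one scalar and one array field
    Price: int = 0
    # A mutable default needs a factory so that instances don't share one list
    Legs: list = field(default_factory=list)

    def __setattr__(self, name, value):
        # Reject attributes that aren't declared fields, e.g. a mistyped name
        if name not in self.__dataclass_fields__:
            raise AttributeError(f'unknown field: {name}')
        super().__setattr__(name, value)


order = Order()
order.Price = 1250000000      # fine, Price is a declared field
order.Legs.append(Leg(Qty=3))
try:
    order.Prize = 1250000000  # typo, raises instead of silently adding a field
except AttributeError as e:
    print(e)
```

Without the __setattr__() override, the mistyped assignment would succeed and the intended field would silently keep its default value.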

The generated code also makes heavy use of Python's neat struct module for serializing and deserializing spans of elementary fields. The struct module isn't a recent addition to Python. However, memoryviews, which are often a useful tool for avoiding buffer churning, were added as late as Python 2.7.
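
As a rough sketch of how struct and memoryview combine to avoid buffer churning, the following packs into one preallocated buffer instead of creating new bytes objects for each message. The two layouts, the template ID value and the function names are assumptions for illustration and don't correspond to a real ETI message.

```python
import struct

# Hypothetical layouts, little endian without implicit alignment ('<'):
HEAD = struct.Struct('<IH2x')  # u32 length, u16 template ID, 2 pad bytes
BODY = struct.Struct('<qI4x')  # i64 price, u32 quantity, 4 pad bytes

# Allocate the buffer once and reuse it for every message
buf = bytearray(HEAD.size + BODY.size)
view = memoryview(buf)


def pack_order(price, qty):
    # pack_into() writes in place, so no intermediate bytes objects are created
    HEAD.pack_into(view, 0, len(buf), 10100)
    BODY.pack_into(view, HEAD.size, price, qty)
    return view


def unpack_order(data):
    # unpack_from() reads at an offset without slicing (and thus copying)
    length, template_id = HEAD.unpack_from(data, 0)
    price, qty = BODY.unpack_from(data, HEAD.size)
    return length, template_id, price, qty


print(unpack_order(pack_order(1250000000, 7)))
```

Precompiling the formats with struct.Struct also avoids re-parsing the format string on each call.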

Performance

Of course, Python trades some runtime speed for syntactic sugar and usability, and you wouldn't write performance critical code in Python. Having said that, serializing/deserializing shouldn't be too slow, either.

The file bench_eti.py contains a small benchmark that repeatedly serializes an IOC (immediate-or-cancel) order after changing a few fields, while avoiding buffer churning.

On a Skylake i7-6600U Laptop this results in:

$ pytest bench_eti.py 
------------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------------
Name (time in us)         Min                   Max               Mean             StdDev             Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_pack_ioc          4.4320 (1.0)        158.2600 (1.0)       4.9058 (1.0)       2.1368 (1.0)       4.7510 (1.0)      0.1140 (1.0)       403;688      203.8419 (1.0)       18464           1
test_unpack_ioc       20.6270 (4.65)     1,231.7820 (7.78)     23.7020 (4.83)     10.6706 (4.99)     22.3940 (4.71)     2.5851 (22.67)   1063;1534       42.1906 (0.21)      22066           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

That means on that machine the code serializes ~ 200 k IOC orders per second (with CPython), which is quite ok.

Using PyPy, the numbers are much better (same machine):

$ pypy3 -m pytest bench_eti.py
----------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------
Name (time in ns)            Min                       Max                  Mean                 StdDev                Median                 IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_pack_ioc           258.9996 (1.0)        334,982.7357 (1.0)        317.4350 (1.0)       1,081.2319 (1.0)        284.9449 (1.0)       25.3693 (1.0)      323;20639    3,150.2509 (1.0)      191351          19
test_unpack_ioc       3,491.9940 (13.48)    3,202,659.4854 (9.56)     4,626.1754 (14.57)    14,446.8865 (13.36)    3,695.9827 (12.97)    345.4916 (13.62)   1321;18103      216.1613 (0.07)     142817           2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

Basically, PyPy speeds up the serialization by a factor of roughly 15 (mean of 4.9 µs vs. 0.32 µs) and the deserialization by a factor of about 5 (mean of 23.7 µs vs. 4.6 µs). Note the change in units in the pytest output (from µs to ns).

Protocol Introduction

The ETI and EOBI protocols specify a message stream, where each message is tagged and starts with a length field, although most messages are of fixed size. Most message fields are of fixed size; those which aren't are prefixed with an accompanying length field. Integers are encoded in little endian byte order, each field size is divisible by 8 bits, and the size of each message is divisible by 8 bytes.

One important difference between the ETI and EOBI encoding is that whole EOBI messages are of fixed size whereas ETI messages may vary in size and only their sub-records are of fixed size. That means that arrays in ETI messages are minimally encoded (i.e. only the filled elements are put on the wire) while arrays in EOBI are fully encoded (i.e. trailing empty elements act as additional padding). Some ETI messages also include string fields of variable size and those are zero-padded such that the message size is divisible by 8.
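
The difference between the two array encodings and the zero padding can be sketched as follows. The helper names and the record size are assumptions for illustration; the generated code doesn't necessarily structure this the same way.

```python
def pad8(payload: bytes) -> bytes:
    # Append zero bytes until the length is divisible by 8
    return payload + bytes(-len(payload) % 8)


def encode_array_minimal(records: list) -> bytes:
    # ETI style: only the filled elements are put on the wire
    return b''.join(records)


def encode_array_full(records: list, capacity: int, record_size: int) -> bytes:
    # EOBI style: trailing empty elements are sent as zero bytes
    unused = capacity - len(records)
    return b''.join(records) + bytes(unused * record_size)


legs = [b'\x01' * 16, b'\x02' * 16]             # two filled 16 byte records
print(len(encode_array_minimal(legs)))          # 32
print(len(encode_array_full(legs, 5, 16)))      # 80
print(pad8(b'hello'))                           # b'hello\x00\x00\x00'
```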

ETI runs over TCPv4 while EOBI is specified on top of UDPv4.

See for example the ETI request header:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Body Length                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Template ID          |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                  Network Message ID (unused)                  |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |              pad              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Message Sequence Number                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Sender Sub ID                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
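
Read literally, the diagram above translates into a single struct format string. The following sketch decodes such a header and splits an in-order byte stream into messages. It assumes that Body Length counts the whole message including the header; the class, function and field names as well as the template ID value are chosen for illustration.

```python
import struct
from typing import Iterator, NamedTuple

# u32 body length, u16 template ID, 8 byte network message ID,
# 2 pad bytes, u32 message sequence number, u32 sender sub ID
REQUEST_HEADER = struct.Struct('<IH8s2xII')


class RequestHeader(NamedTuple):
    body_len: int
    template_id: int
    network_msg_id: bytes
    msg_seq_num: int
    sender_sub_id: int


def parse_request_header(buf) -> RequestHeader:
    return RequestHeader._make(REQUEST_HEADER.unpack_from(buf, 0))


def iter_messages(stream: bytes) -> Iterator[memoryview]:
    # Yield each complete message; stop at the first incomplete one
    view = memoryview(stream)
    off = 0
    while off + 4 <= len(view):
        (body_len,) = struct.unpack_from('<I', view, off)
        if body_len < 4 or off + body_len > len(view):
            break  # bogus length or message not fully received yet
        yield view[off:off + body_len]
        off += body_len


msg = REQUEST_HEADER.pack(REQUEST_HEADER.size, 10020, bytes(8), 1, 7)
for m in iter_messages(msg + msg + msg[:5]):
    print(parse_request_header(m))
```

A client that receives data over TCP would keep the unconsumed tail (here the 5 trailing bytes) and prepend it to the next chunk it reads.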

And the EOBI Packet-Header:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Body Length          |          Template ID          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Message Sequence Number (unused)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Application Sequence Number                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Market Segment ID                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Partition ID | CompletionInd.|ApplSeqRestInd.|   DSCP copy   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              pad                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                         Transact Time                         +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
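
In the same spirit, the EOBI packet header shown above can be decoded with one format string. This is a sketch that follows the diagram only: the field names are my own, all integers are assumed to be unsigned, and the sample values are made up.

```python
import struct
from typing import NamedTuple

# 2 x u16, 3 x u32, 4 x u8, 4 pad bytes, u64 (32 bytes in total)
PACKET_HEADER = struct.Struct('<HHIIIBBBB4xQ')


class PacketHeader(NamedTuple):
    body_len: int
    template_id: int
    msg_seq_num: int
    appl_seq_num: int
    market_segment_id: int
    partition_id: int
    completion_indicator: int
    appl_seq_reset_indicator: int
    dscp: int
    transact_time: int


def parse_packet_header(datagram) -> PacketHeader:
    return PacketHeader._make(PACKET_HEADER.unpack_from(datagram, 0))


raw = PACKET_HEADER.pack(32, 13005, 0, 42, 589, 1, 1, 0, 0x28,
                         1_600_000_000_000_000_000)
print(parse_packet_header(raw))
```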

Wireshark Protocol Dissectors

The tool ./eti2wireshark.py generates Wireshark protocol dissectors from the ETI/EOBI protocol descriptions.

Example:

./eti2wireshark.py --proto eobi --desc 'Enhanced Order Book Interface' temp/T7_EOBI_9.1.zip/eobi-mod.xml -o packet-eobi.c
./eti2wireshark.py temp/T7_ETI_9.1.zip/eti_Derivatives.xml -o packet-eti.c

The generated code is implemented around a tight state machine to avoid code bloat.

Protocol fields are pretty-printed in the obvious ways, e.g. timestamps in human readable format, fixed point decimals with the point inserted at the type specific place, enumeration mappings provided etc.
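
The generated dissectors are C code, but the idea behind this kind of pretty-printing can be sketched in a few lines of Python. The sketch assumes prices with 8 decimal places and timestamps encoded as nanoseconds since the Unix epoch; the function names are made up for illustration.

```python
from datetime import datetime, timezone


def format_fixed_point(raw: int, decimals: int = 8) -> str:
    # Insert the decimal point using integer arithmetic only (decimals >= 1),
    # so that the displayed value is exact
    sign = '-' if raw < 0 else ''
    whole, frac = divmod(abs(raw), 10 ** decimals)
    return f'{sign}{whole}.{frac:0{decimals}d}'


def format_ns_timestamp(ns: int) -> str:
    # Split off the nanoseconds since datetime only has microsecond resolution
    secs, frac = divmod(ns, 1_000_000_000)
    dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    return f'{dt:%Y-%m-%d %H:%M:%S}.{frac:09d}'


raw = 123456789012345678
print(format_fixed_point(raw))   # exact: 1234567890.12345678
print(raw / 10 ** 8)             # float: only an approximation
print(format_ns_timestamp(1_600_000_000_123_456_789))
```

Going through a floating-point value loses digits here because a double can't represent all 18 significant digits of the raw value.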

Related work:

  • Open-Markets-Initiative/wireshark-lua - A collection of Lua based model-generated Wireshark dissectors for various trading/market data protocols. The ETI/EOBI protocols are listed there as untested. I haven't tested these dissectors - however, the fact that they use another layer of general indirection (the Lua interpreter) surely doesn't help with dissecting speed.
    The generated ETI 9.1 Lua dissector file contains over 32 thousand lines whereas the eti2wireshark.py generated ETI 9.1 dissector C-code just spans about 13 thousand lines - where most of the lines are lookup tables that are placed into the read-only data segment (i.e. more than 12 thousand lines).
    FWIW, in contrast to the eti2wireshark dissectors, the Lua dissectors pretty-print field names with spaces between the camel-cased elements.
    A real limitation is that timestamp fields such as ExecID are displayed as is, i.e. the value isn't converted into a human readable date-time string.
    A serious issue is how the Lua dissectors display fixed-point decimals: the Lua code uses floating-point arithmetic to convert them and the resulting floating-point value is displayed. Thus, the displayed value is just an approximation of the real value.
    From the repository's description and README it isn't clear where the Lua dissector generators are available and whether they are available under an Open Source license.
  • dharmangbhavsar/eti_dissector (removed) - 'A Eurex ETI Wireshark Dissector for Geneva Trading' was available until mid 2021 or so but that repository was removed later that year. From the archived page it's unclear whether that dissector was released under an open source license. The last commit was from December 2018 and it looks like it supported ETI version 6.1. Since the repository listing includes Deutsche Börse's published C header file (with structs for all the ETI PDUs) and no XML protocol description, it looks like that dissector wasn't code generated.

See also

  • The benchmark test case relies on pytest-benchmark (Fedora package: python3-pytest-benchmark).
  • The pcapdump.py example uses the dpkt package for parsing PCAP files and skipping over Ethernet/IP/UDP/TCP headers (Fedora package: python3-dpkt).
  • Wikipedia's List of Electronic Trading Protocols