Copyright(c) 2014 Milo Yip ([email protected])
This benchmark evaluates the performance of conversion from 32-bit/64-bit integer to ASCII string in decimal. The function prototypes are:
void u32toa(uint32_t value, char* buffer);
void i32toa(int32_t value, char* buffer);
void u64toa(uint64_t value, char* buffer);
void i64toa(int64_t value, char* buffer);
Note that itoa()
is not a standard function in C and C++, but provided by some compilers.
Firstly the program verifies the correctness of implementations.
Then, two cases for benchmark are carried out:
-
Sequential: Converts consecutive values in same number of decimal digits.
For
u32toa()
, the tested values are { 1, 2, ..., 9 }, {10, 11, ..., 99 }, ... { 4000000000, 4000000001, ..., 4294967296}, i.e., groups of 1 to 10 decimal digits.For signed versions, use alternate signs, e.g. { 1, -2, 3, -4, ... 9 }.
For 64-bit integer, there are groups of 1 to 20 decimal digits.
-
Random: Converts the shuffled sequence of the first case.
Each digit group is run for 100000 times. The minimum time duration is measured for 10 trials.
- Obtain premake4.
- Copy premake4 executable to
itoa-benchmark/build
folder (or system path). - Run
premake.bat
orpremake.sh
initoa-benchmark/build
- On Windows, build the solution at
itoa-benchmark/build/vs2008/
or/vs2010/
. - On other platforms, run GNU
make config=release32
(orrelease64
) atitoa-benchmark/build/gmake/
- On success, run the
itoa
executable is generated atitoa-benchmark/
- The results in CSV format will be written to
itoa-benchmark/result
. - Run GNU
make
initoa-benchmark/result
to generate results in HTML.
The following are sequential
results measured on a PC (Core i7 920 @2.67Ghz), where u32toa()
is compiled by Visual C++ 2013 and run on Windows 64-bit. The speedup is based on sprintf()
.
Function | Time (ms) | Speedup |
---|---|---|
ostringstream | 991.234 | 0.19x |
ostrstream | 931.916 | 0.20x |
to_string | 232.894 | 0.81x |
sprintf | 188.733 | 1.00x |
vc | 61.509 | 3.07x |
naive | 26.739 | 7.06x |
count | 20.723 | 9.11x |
lut | 17.770 | 10.62x |
countlut | 9.920 | 19.03x |
branchlut | 8.474 | 22.27x |
sse2 | 7.646 | 24.68x |
null | 2.533 | 74.50x |
Note that the null
implementation does nothing. It measures the overheads of looping and function call.
Since the C++ standard library implementations (ostringstream
, ostrstream
, to_string
) are slow, they are turned off by default. User can re-enable them by defining RUN_CPPITOA
macro.
Some results of various configurations are located at itoa-benchmark/result
. They can be accessed online, with interactivity provided by Google Charts:
- [email protected]_win32_vc2013
- [email protected]_win64_vc2013
- [email protected]_cygwin32_gcc4.8
- [email protected]_cygwin64_gcc4.8
Function | Description |
---|---|
ostringstream | std::ostringstream in C++ standard library. |
ostrstream | std::ostrstream in C++ standard library. |
to_string | std::to_string() in C++11 standard library. |
sprintf | sprintf() in C standard library |
vc | Visual C++'s _itoa() , _i64toa() , _ui64toa() |
naive | Compute division/modulo of 10 for each digit, store digits in temp array and copy to buffer in reverse order. |
count | Count number of decimal digits first, using technique from [1]. |
lut | Uses lookup table (LUT) of digit pairs for division/modulo of 100. Mentioned in [2] |
countlut | Combines count and lut. |
branchlut | Use branching to divide-and-conquer the range of value, make computation more parallel. |
sse2 | Based on branchlut scheme, use SSE2 SIMD instructions to convert 8 digits in parallel. The algorithm is designed by Wojciech Muła [3]. (Experiment shows it is useful for values equal to or more than 9 digits) |
null | Do nothing. |
-
How to add an implementation?
You may clone an existing implementation file (e.g.
naive.cpp
). And then modify it. Re-runpremake
to add it to project or makefile. Note that it will automatically register to the benchmark by macroREGISTER_TEST(name)
.Making pull request of new implementations is welcome.
-
Why not converting integers to
std::string
?It may introduce heap allocation, which is a big overhead. User can easily wrap these low-level functions to return
std::string
, if needed. -
Why fast
itoa()
functions is needed?They are a very common operations in writing data in text format. The standard way of
sprintf()
,std::stringstream
,std::to_string(int)
(C++11) often provides poor performance. The author of this benchmark would optimize the "naive" implementation in RapidJSON, thus he creates this project.
[1] Anderson, Bit Twiddling Hacks, 1997.
[2] Alexandrescu, Three Optimization Tips for C++, 2012.
[3] Muła, SSE: conversion integers to decimal representation, 2011.
- [The String Formatters of Manor Farm] (http://www.gotw.ca/publications/mill19.htm) by Herb Sutter, 2001.
- C++ itoa benchmark by localvoid
- Stackoverflow: C++ performance challenge: integer to std::string conversion