After removing the logger from elixir_grpc_bench and using these settings,
Name | Description | Default value |
---|---|---|
GRPC_BENCHMARK_DURATION | Duration of the benchmark. | 120s |
GRPC_REQUEST_PAYLOAD | File (from payload/) containing the data to be sent in the client request. | 100B |
GRPC_SERVER_RAM | Maximum memory used by the server. | 4096m |
GRPC_CLIENT_CONNECTIONS | Number of connections to use. | 5 |
GRPC_CLIENT_CONCURRENCY | Number of requests to run concurrently. It can't be smaller than the number of connections. | 50 |
GRPC_CLIENT_QPS | Rate limit, in queries per second (QPS). | 0 (unlimited) |
and running in Docker on a laptop with Windows 10 Pro and an Intel(R) Xeon(R) CPU E3-1505M v6 @ 3.00GHz (3001 MHz, 4 cores, 8 logical processors),
I got these results for elixir_grpc_bench:
name | req/s | avg. latency | 90 % in | 95 % in | 99 % in | avg. cpu | avg. memory |
---|---|---|---|---|---|---|---|
GRPC_SERVER_CPUS=8 GRPC_CLIENT_CPUS=8 | 10353 | 4.75 ms | 6.92 ms | 8.17 ms | 11.53 ms | 544.78% | 66.33 MiB |
GRPC_SERVER_CPUS=4 GRPC_CLIENT_CPUS=4 | 8867 | 5.57 ms | 7.01 ms | 23.75 ms | 34.25 ms | 405.27% | 65.38 MiB |
GRPC_SERVER_CPUS=2 GRPC_CLIENT_CPUS=2 | 4195 | 11.84 ms | 64.42 ms | 68.17 ms | 78.62 ms | 199.42% | 66.47 MiB |
One repo to finally have a clear, objective gRPC benchmark with code for everyone to verify and improve.
Contributions are most welcome!
The goal of this benchmark is to compare the performance and resource usage of various gRPC libraries across different programming languages and technologies. To achieve that, a minimal protobuf contract is used so that the results are not polluted by other concerns (e.g. the performance of hash maps) and the implementations stay simple.
That being said, the service implementations should NOT take advantage of that; they should keep the code generic and maintainable. No inline assembly or other language-specific tricks or hacks. What does generic mean? One should be able to easily adapt the existing code to some fundamental use cases (e.g. having a thread-safe hash map on the server side to provide values to the client given some key).
Although the final results are sorted according to the number of requests served, one should go beyond that and look at the resource usage - perhaps one implementation is slightly better in terms of raw speed but uses three times more CPU to achieve that. Maybe it's better to take the first one if you're running on a Raspberry Pi and want to get the most out of it. Maybe it's better to use the latter in a big server with 32 CPUs because it scales. It all depends on your use case. This benchmark is created to help people make an informed decision (and get ecstatic when their favourite technology seems really good, without doubts).
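One way to look beyond raw req/s is to normalise throughput by CPU usage. The sketch below is only an illustration: `reqs_per_core` is a hypothetical helper, not part of the benchmark scripts, and it assumes the avg. cpu figure is a `docker stats`-style percentage in which 100% corresponds to one fully used core.

```shell
# Hypothetical helper, not part of this repository.
# Usage: reqs_per_core <req/s> <avg. cpu in percent, without the % sign>
reqs_per_core() {
    awk -v reqs="$1" -v cpu="$2" 'BEGIN {
        # Refuse to divide by zero or a negative CPU figure.
        if (cpu <= 0) exit 1
        # 100% is assumed to mean one fully used core.
        printf "%.0f\n", reqs / (cpu / 100)
    }'
}

# Rows taken from the elixir_grpc_bench results table above.
reqs_per_core 10353 544.78   # GRPC_SERVER_CPUS=8 GRPC_CLIENT_CPUS=8
reqs_per_core 4195 199.42    # GRPC_SERVER_CPUS=2 GRPC_CLIENT_CPUS=2
```

For those two rows this works out to roughly 1900 and 2100 requests per second per fully used core, so under this assumption the 8-CPU run is faster overall but slightly less efficient per core.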
We try to provide some metrics to make this decision easier:
- req/s - the number of requests per second the service was able to successfully serve
- average latency, and 90/95/99 percentiles - time from sending a request to receiving the response
- average CPU, memory - average resource usage during the benchmark, as reported by `docker stats`
What this benchmark does not take into account:
- Completeness of the gRPC library. We test only basic unary RPC at the moment. This is the most common service method, which may be enough for some business use cases but not for others. When you're happy with the results of some technology, you should check out its documentation (if it exists) and decide for yourself whether it is production-ready.
- Taste. Some may find beauty in Ruby, some may feel like Java is the only real deal. Others treat languages as tools and don't care at all. We don't judge (officially 😉). Unless it's a huge state machine with raw `void` pointers. Oops!
Linux or macOS with Docker. Keep in mind that the results on macOS may not be that reliable, because Docker for Mac runs on a VM.
To build the benchmark images use: `./build.sh [BENCH1] [BENCH2] ...`. You need them to run the benchmarks.
To run the benchmarks use: `./bench.sh [BENCH1] [BENCH2] ...`. They will be run sequentially.
To clean up the benchmark images use: `./clean.sh [BENCH1] [BENCH2] ...`
The benchmark can be configured through the following environment variables:
Name | Description | Default value |
---|---|---|
GRPC_BENCHMARK_DURATION | Duration of the benchmark. | 30s |
GRPC_REQUEST_PAYLOAD | File (from payload/) containing the data to be sent in the client request. | 100B |
GRPC_SERVER_CPUS | Maximum number of CPUs used by the server. | 1 |
GRPC_SERVER_RAM | Maximum memory used by the server. | 512m |
GRPC_CLIENT_CONNECTIONS | Number of connections to use. | 5 |
GRPC_CLIENT_CONCURRENCY | Number of requests to run concurrently. It can't be smaller than the number of connections. | 50 |
GRPC_CLIENT_QPS | Rate limit, in queries per second (QPS). | 0 (unlimited) |
GRPC_CLIENT_CPUS | Maximum number of CPUs used by the client. | 1 |
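As a sketch of how these variables might be set before a run, the snippet below exports the values from the 4-CPU elixir_grpc_bench run reported at the top of this document. It assumes the scripts read the variables from the environment of the calling shell and that `elixir_grpc_bench` is accepted as a benchmark name. Neither is verified here, so the script invocations are left as comments.

```shell
# Values mirror the 4-CPU elixir_grpc_bench run reported above; adjust to your hardware.
export GRPC_BENCHMARK_DURATION=120s
export GRPC_REQUEST_PAYLOAD=100B
export GRPC_SERVER_CPUS=4
export GRPC_SERVER_RAM=4096m
export GRPC_CLIENT_CPUS=4
export GRPC_CLIENT_CONNECTIONS=5
export GRPC_CLIENT_CONCURRENCY=50
export GRPC_CLIENT_QPS=0

# Assumed invocation from the repository root (commented out so the
# snippet has no side effects beyond setting the variables):
# ./build.sh elixir_grpc_bench
# ./bench.sh elixir_grpc_bench
```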
- `GRPC_BENCHMARK_DURATION` should not be too small. Some implementations need a warm-up before achieving their optimal performance, and most real-life gRPC services are expected to be long-running processes. From what we measured, 300s should be enough.
- `GRPC_SERVER_CPUS` + `GRPC_CLIENT_CPUS` should not exceed the total number of cores on the machine. The reason for this is that you don't want the `ghz` client to steal precious CPU cycles from the service under test. Keep in mind that setting `GRPC_CLIENT_CPUS` too low may not saturate the service in some of the more performant implementations. Also keep in mind that limiting `GRPC_SERVER_CPUS` to 1 will severely hamper the performance of some technologies - is running a service on 1 CPU your use case? It may be, but keep in mind that a load balancer, if you use one, also incurs some costs.
Other parameters will depend on your use case. Choose wisely.
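The two hard constraints mentioned in this section (server plus client CPUs within the machine's core count, and concurrency not smaller than the number of connections) can be checked before a run. The function below is a minimal sketch: `check_bench_env` is a hypothetical name, not something the repository provides, and it assumes whole-number CPU counts and the default values listed in the table above.

```shell
# Hypothetical pre-flight check, not part of this repository's scripts.
# Assumes whole-number CPU counts. The optional first argument overrides
# the detected number of online cores.
check_bench_env() {
    server_cpus="${GRPC_SERVER_CPUS:-1}"
    client_cpus="${GRPC_CLIENT_CPUS:-1}"
    connections="${GRPC_CLIENT_CONNECTIONS:-5}"
    concurrency="${GRPC_CLIENT_CONCURRENCY:-50}"
    total_cores="${1:-$(getconf _NPROCESSORS_ONLN)}"
    status=0

    # The client should not compete with the service under test for cores.
    if [ "$((server_cpus + client_cpus))" -gt "$total_cores" ]; then
        echo "GRPC_SERVER_CPUS + GRPC_CLIENT_CPUS exceeds the $total_cores available cores" >&2
        status=1
    fi

    # Concurrency can't be smaller than the number of connections.
    if [ "$concurrency" -lt "$connections" ]; then
        echo "GRPC_CLIENT_CONCURRENCY is smaller than GRPC_CLIENT_CONNECTIONS" >&2
        status=1
    fi

    return "$status"
}
```

Calling `check_bench_env` with no argument uses the core count reported by `getconf _NPROCESSORS_ONLN`; it returns a non-zero status and prints a message to stderr when either constraint is violated.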
You can find our sample results in the Wiki. Be sure to run the benchmarks yourself if you have sufficient hardware, especially for multi-core scenarios.