We are interested to hear from you if TruffleRuby performs worse than other Ruby implementations on code that you care about. The Compatibility guide lists some features that we know are slow and are unlikely to get faster.
TruffleRuby uses extremely sophisticated techniques to optimise your Ruby program. These optimisations take time to apply, so TruffleRuby is often a lot slower than other implementations until it has had time to 'warm up'. Also, TruffleRuby tries to find a 'stable state' of your program and to automatically remove the dynamism of Ruby where it is not needed. However, this means that if something disturbs the stable state, performance drops again until TruffleRuby can adapt to the new stable state. Another complication is that TruffleRuby is very good at removing unnecessary work, such as calculations whose results are never used or loops that contain no work.
All of these issues make it hard to benchmark TruffleRuby. This problem is not unique to us; it applies to many sophisticated virtual machines. However, most Ruby implementations do not yet perform optimisations powerful enough to exhibit these effects, so they may be new to some people in the Ruby community.
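As a minimal, hypothetical illustration of a 'stable state' being disturbed, the Ruby sketch below calls one method (add, a name chosen only for this example) with Integer arguments for several rounds and then with Float arguments. On an optimising implementation such as TruffleRuby, the Integer rounds may get faster as the code warms up, and the first Float round may be slower while the implementation adapts. The exact timings are not guaranteed and will vary between runs, machines and configurations.

```ruby
# Hypothetical illustration: a call site that sees one argument type for a
# long time and then a new one. Not a TruffleRuby API, just plain Ruby.

def add(a, b)
  a + b
end

# Phase 1: only Integer arguments reach `add`.
def integer_phase(n)
  total = 0
  n.times { |i| total = add(total, i) }
  total
end

# Phase 2: Float arguments now reach `add` too. An optimiser that had
# specialised `add` for Integers has to adapt, so timings around this
# point may temporarily get worse.
def float_phase(n)
  total = 0.0
  n.times { |i| total = add(total, i * 0.5) }
  total
end

# Runs the block and returns [block result, elapsed seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

if __FILE__ == $0
  n = 1_000_000
  3.times do |round|
    _, t = timed { integer_phase(n) }
    puts format("integer round %d: %.4f s", round, t)
  end
  _, t = timed { float_phase(n) }
  puts format("first float round: %.4f s", t)
end
```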
To experiment with how fast TruffleRuby can be, we recommend using the Enterprise Edition of GraalVM and rebuilding the Ruby executable images.
For the best peak performance, use the JVM configuration by passing --jvm. The default native configuration starts faster but does not quite reach the same peak performance. However, with --jvm you must use a good benchmarking tool, such as benchmark-ips (described below), to run the benchmark, or the slower warmup will mean that you do not see TruffleRuby's true performance in the benchmark. If you want to write simpler benchmarks that just run a while loop with a simple timer (which we would not recommend anyway), use the default native mode so that startup and warmup time is shorter.
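If you do write a simple timer-based benchmark despite that advice, a minimal sketch might look like the following. The helper time_rounds is a hypothetical name for this example, not part of any library. It times several rounds separately so that warmup shows up as slower early rounds, and it accumulates the block's results into a checksum so that the work is not trivially unused. It relies only on Process.clock_gettime from Ruby's core library.

```ruby
# Hypothetical helper: runs the block `iterations_per_round` times per round
# and records the wall-clock time of each round separately.
def time_rounds(rounds, iterations_per_round)
  timings = []
  checksum = 0
  rounds.times do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    i = 0
    while i < iterations_per_round
      # Use the block's result so the loop body is not obviously dead code.
      checksum += yield(i)
      i += 1
    end
    timings << Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  [timings, checksum]
end

if __FILE__ == $0
  timings, checksum = time_rounds(5, 100_000) { |i| i * 2 + 1 }
  timings.each_with_index do |t, round|
    puts format("round %d: %.4f s", round, t)
  end
  puts "checksum: #{checksum}"
end
```

Comparing the later rounds rather than the first one gives a rough view of post-warmup speed, but this is still far less reliable than benchmark-ips.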
If you are examining the performance of TruffleRuby, we recommend that you always run with the --engine.TraceCompilation flag. If you see compilation failures or repeated compilation of the same methods, this indicates that something is not working as intended, and you may need to examine why, or ask us to help you do so. If you do not run with this flag, Truffle will try to work around errors and you will not see that there is a problem.
The TruffleRuby team recommends that you use benchmark-ips to check the performance of TruffleRuby, and it makes things easier for us if you report any potential performance problems using a report from benchmark-ips.
A benchmark could look like this:
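require 'benchmark/ips'

Benchmark.ips do |x|
  # Run the warmup and measurement cycles twice to check the results are stable
  x.iterations = 2

  # The code being measured
  x.report("adding") do
    14 + 2
  end
end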
We use the x.iterations = extension in benchmark-ips to run the warmup and measurement cycles of benchmark-ips two times, to ensure the results are stable and that enough warmup was provided (the warmup time can be tweaked with x.warmup = 5).
You should see something like this:
Warming up --------------------------------------
adding 20.933k i/100ms
adding 1.764M i/100ms
Calculating -------------------------------------
adding 2.037B (±12.7%) i/s - 9.590B in 4.965741s
adding 2.062B (±11.5%) i/s - 10.123B in 4.989398s
We want to look at the last line, which says that TruffleRuby runs 2.062 billion iterations of this block per second, with a margin of error of ±11.5%.
Compare that to an implementation like Rubinius:
Warming up --------------------------------------
adding 71.697k i/100ms
adding 74.983k i/100ms
Calculating -------------------------------------
adding 2.111M (±12.2%) i/s - 10.302M
adding 2.126M (±10.6%) i/s - 10.452M
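By this measure TruffleRuby is about a thousand times faster than Rubinius (2.062B versus 2.126M iterations per second). That seems like a lot, and in fact what is happening here is that TruffleRuby is optimising away your benchmark: the block only adds two constant integers, so the work can be removed. The effect is smaller with more complex code that cannot be optimised away.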
Some benchmarking tools for other languages have a feature called 'black holes'. These wrap a value and make it appear to be variable at runtime even if it is in fact a constant, so that the optimiser does not remove it and actually performs any computations that use it. However, TruffleRuby uses extensive value profiling (caching values and turning them into constants), so even if you make a value appear to be a variable at its source, it is likely to be value profiled at an intermediate stage. In general, more complex benchmarks that naturally defeat value profiling are preferable to manually adding annotations that turn off important features.
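A minimal sketch of a benchmark in that spirit follows. The names build_inputs, weighted_sum and run_benchmark, as well as the pool size, array length and value range, are assumptions chosen for illustration. The benchmark cycles through a pool of different input arrays so that consecutive calls see different values, and it accumulates every result so that none is discarded. This makes it less likely, though not impossible, that the inputs get profiled down to constants. It uses Ruby's standard Benchmark.realtime instead of benchmark-ips only to stay self-contained.

```ruby
require 'benchmark'

# Build a pool of different input arrays up front, outside the timed
# region. Pool size, array length and value range are arbitrary choices.
def build_inputs(count, length: 100, seed: 42)
  rng = Random.new(seed)
  Array.new(count) { Array.new(length) { rng.rand(1_000) } }
end

# The workload: its result depends on every element of its argument.
def weighted_sum(values)
  total = 0
  values.each_with_index { |v, i| total += v * (i + 1) }
  total
end

# Cycle through the pool so consecutive calls see different arrays, and
# accumulate the results so they are consumed rather than discarded.
def run_benchmark(inputs, calls)
  acc = 0
  calls.times { |n| acc += weighted_sum(inputs[n % inputs.size]) }
  acc
end

if __FILE__ == $0
  inputs = build_inputs(64)
  5.times do |round|
    result = nil
    elapsed = Benchmark.realtime { result = run_benchmark(inputs, 10_000) }
    puts format("round %d: %.4f s (result %d)", round, elapsed, result)
  end
end
```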