The great ruby shootout
I used the benchmark suite from ruby-1.9.3-p125. All tests were run on:
- OS: OSX Lion 10.7.3
- CPU: 2.3GHz i5
- RAM: 8Gb 1333 MHz DDR3
- SSD: OCZ Vertex 3 Max IOPS SATA III 2.5" 120Gb
Implementations:
- ruby 1.8.7p249 (system ruby)
- ruby 1.9.3p125
- ruby 2.0.0dev (2012-02-25 trunk 34796)
- MacRuby 0.12 (ruby 1.9.2) (Nightly build)
- maglev 1.0.0 (ruby 1.8.7)
- rubinius 1.2.4 (1.8.7 release 2011-07-05 JI)
- rubinius 2.0.0dev (1.9.3 e22ed173 yyyy-mm-dd JI)
- jruby 1.7.0.dev (ruby-1.9.3-p28) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04-ea)
- jruby 1.6.7 (ruby-1.8.7-p357) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04-ea)
JRuby was run with the --server -Xinvokedynamic.constants=true flags.
The compiler matters
From time to time I see blog posts about improving Ruby performance by applying patches. But what if we go further and try to improve performance by compiling Ruby with the fastest available compiler? I decided to check this out.
Here is the list of compilers:
- gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00)
- Apple clang version 3.1 (tags/Apple/clang-318.0.45) (based on LLVM 3.1svn)
- gcc version 4.2.1 (Apple Inc. build 5666) (dot 3)
- gcc version 4.7.0 20120218 (experimental) (GCC)
#!/bin/bash
compilers=( gcc gcc-4.2 gcc-4.7 clang )
for i in "${compilers[@]}"; do
  CC=$i ./configure --disable-install-doc --prefix ~/Projects/benches/mri/1.9.3-p125-$i
  time make -j4
  make install
done
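Before benchmarking, it may be worth confirming that each build really used the intended compiler. A minimal sketch, assuming the standard RbConfig module is available in the build being checked; the helper name build_compiler_info is made up for illustration:

```ruby
require 'rbconfig'

# Report which compiler and flags the running interpreter was built with.
# (build_compiler_info is a hypothetical helper, not part of the benchmark suite.)
def build_compiler_info
  {
    :ruby   => RUBY_VERSION,
    :cc     => RbConfig::CONFIG['CC'],
    :cflags => RbConfig::CONFIG['CFLAGS']
  }
end

build_compiler_info.each { |key, value| puts "#{key}: #{value}" }
```

Saved under any file name and run once with each bin/ruby from the prefixes above, it should print a different cc value per build.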
$ ruby driver.rb -v -o ~/Projects/benches/compilers-bench.txt \
--executables='~/Projects/benches/mri/1.9.3-p125-gcc/bin/ruby;
~/Projects/benches/mri/1.9.3-p125-gcc-4.2/bin/ruby;
~/Projects/benches/mri/1.9.3-p125-gcc-4.7/bin/ruby;
~/Projects/benches/mri/1.9.3-p125-clang/bin/ruby'
Results:

Oh, the default llvm-gcc is ~20% slower than the pre-release gcc-4.7 in synthetic tests (I ran the benchmark a couple of times and got similar results each time).

To be sure that nothing broke with gcc-4.7:
PASS all 943 tests
KNOWNBUGS.rb .
PASS all 1 tests
Ok, I want to try
That is easy if you have homebrew installed:
$ brew install https://raw.github.com/etehtsea/formulary/009735e66ccabc5867331f64a406073d1623c683/Formula/gcc.rb --enable-cxx --enable-profiled-build --use-gcc
~ 1 hour later...
$ CC=gcc-4.7 ruby-build 1.9.3-p125 ~/.rbenv/versions/1.9.3-p125
What about %any other implementation%?
I couldn't stop there and, driven by curiosity, ran the benchmark on other popular Ruby implementations and MRI versions. I won't post the complete logs, only some highlights.
Don't use system ruby
It's a trap!
bm_vm_thread_mutex3.rb
# 1000 threads, one mutex
require 'thread'
m = Mutex.new
r = 0
max = 2000
(1..max).map{
  Thread.new{
    i=0
    while i<max
      i+=1
      m.synchronize{
        r += 1
      }
    end
  }
}.each{|e|
  e.join
}
raise r.to_s if r != max * max
$ time ~/.rbenv/versions/1.8.7-p357/bin/ruby bm_vm_thread_mutex3.rb
real 0m3.093s
user 0m3.078s
sys 0m0.013s
$ /usr/bin/ruby -v
ruby 1.8.7 (2011-12-28 patchlevel 357) [i686-darwin11.3.0]
$ time /usr/bin/ruby bm_vm_thread_mutex3.rb
^Cbm_vm_thread_mutex3.rb:18:in `join': Interrupt
from bm_vm_thread_mutex3.rb:18
from bm_vm_thread_mutex3.rb:7:in `each'
from bm_vm_thread_mutex3.rb:7
real 3m54.930s
user 3m54.122s
sys 0m0.918s
Even if you are sure that you don't use threads anywhere, here are the
results without this test:
- ruby 1.8.7 (2010-01-10) - 572.863 sec
- ruby 1.9.3p125 (2012-02-16) - 211.655 sec
Failed on 1.8:
- bm_app_factorial.rb
- bm_so_ackermann.rb
Rubinius 1.2.4 vs 2.0.0-dev
I've read that there is no GIL in the 2.0.0-dev version and so on, but
the upcoming version is slower, and noticeably so.
The biggest slowdown is again in the bm_vm_thread_mutex3.rb test:
- rubinius 1.2.4 (1.8.7 release 2011-07-05 JI) - 3.260 sec
- rubinius 2.0.0dev (1.9.3 e22ed173 yyyy-mm-dd JI) - 207.711 sec
Here are tests with big differences:

Total result without it:
- 1.2.4 - 518.861 sec
- 2.0.0dev - 606.811 sec
Rubinius wasn't really fast to begin with and has become ~17% slower.
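The slowdown can be checked directly from the two totals above, which put it at roughly 17%. A minimal sketch of the arithmetic; the helper name slowdown_percent is made up for illustration:

```ruby
# Totals without bm_vm_thread_mutex3.rb, in seconds, copied from above.
RBX_1_2_4    = 518.861
RBX_2_0_0DEV = 606.811

# Percentage by which `slower` exceeds `faster`.
# (slowdown_percent is a hypothetical helper name.)
def slowdown_percent(faster, slower)
  (slower / faster - 1.0) * 100.0
end

puts format("%.1f%% slower", slowdown_percent(RBX_1_2_4, RBX_2_0_0DEV))
```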
Failed:
- factorial changed to 4k instead of 5k
- bm_loop_generator.rb
- bm_so_ackermann.rb
- bm_vm_thread_pass_flood.rb (took too long)
MacRuby 0.12 (Nightly)
MacRuby is what you need when you want to write a desktop application for OS X or just use its API, but there is no reason to choose it for performance.
First of all - MacRuby's eval (bm_vm2_eval.rb) is pretty slow:
- ruby 1.9.3p125 (2012-02-16) - 29.681 sec
- MacRuby 0.12 (ruby 1.9.2) - 232.257 sec
bm_vm2_eval.rb
i=0
while i<6_000_000 # benchmark loop 2
  i+=1
  eval("1")
end
The same goes for ERB parsing and creating Class instances:
bm_app_erb.rb
#
# Create many HTML strings with ERB.
#
require 'erb'
data = DATA.read
max = 15_000
title = "hello world!"
content = "hello world!\n" * 10
max.times{
  ERB.new(data).result(binding)
}
__END__
<html>
<head> <%= title %> </head>
<body>
<h1> <%= title %> </h1>
<p>
<%= content %>
</p>
</body>
</html>
- 1.9.3p125 - 1.817 sec
- MacRuby - 81.808 sec
bm_vm3_clearmethodcache.rb
i=0
while i<200_000
  i+=1
  Class.new{
    def m; end
  }
end
- 1.9.3p125 - 0.748 sec
- MacRuby - 86.573 sec
And other tests with big differences:

Failed:
- bm_loop_generator.rb
- bm_so_count_words.rb
- bm_so_nsieve_bits.rb (took too long)
- bm_vm_thread_create_join.rb (took too long)
Maglev 1.0
Interestingly, Maglev has similar problems:
- bm_vm2_eval.rb - 754.028 sec
- bm_vm3_clearmethodcache.rb - 33.785 sec

JRuby 1.6 vs 1.7.0-dev
JRuby 1.7.0-dev performs similarly to version 1.6.6, with a
significant improvement in the bm_vm_thread_mutex3.rb bench:
- 1.7.0-dev - 14.381 sec
- 1.6.6 - 202.552 sec
Total result without it:
- 1.7.0-dev - 257.584 sec
- 1.6.6 - 229.502 sec
Failed:
bm_io_select.rb
MRI 2.0.0-dev vs 1.9.3-p125
The situation is the same with the MRI dev branch: just one improvement, in
bm_vm_thread_create_join.rb:
- ruby 2.0.0dev (2012-02-25 trunk 34796) - 2.806 sec
- ruby 1.9.3p125 (2012-02-16) - 9.239 sec
Total shootout

Total chart:

Chart without:
- bm_vm_thread_mutex3.rb
- bm_vm2_eval.rb
- bm_vm3_clearmethodcache.rb

Looks competitive without them, doesn't it?
Update: Negative timings in vm1/vm2 tests? WTF?
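This is an artifact of how the benchmark computes its results. Each test in these
sections runs inside a while loop, so the reported time is the test's time minus the
time of the corresponding empty loop. When a test runs about as fast as the empty
loop, measurement noise can push the difference below zero: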
res_time = vm1/2_test_result - loop_whileloop1/2_result
bm_loop_whileloop.rb
i=0
while i<30_000_000 # benchmark loop 1
  i+=1
end
bm_vm1_const.rb
Const = 1
i = 0
while i<30_000_000 # while loop 1
  i+= 1
  j = Const
  k = Const
end
bm_vm1_ensure.rb
i=0
while i<30_000_000 # benchmark loop 1
  i+=1
  begin
    begin
    ensure
    end
  ensure
  end
end
Results for jruby 1.7.0.dev:
- loop_whileloop - 1.56291389465332 sec
- vm1_const - 1.53185486793518 sec
- vm1_ensure - 1.56137585639954 sec
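Plugging these three numbers into the subtraction shows where the negative values come from. A minimal sketch using only the timings listed above; the names TIMINGS and adjusted are made up for illustration and are not taken from driver.rb:

```ruby
# Minimum timings reported for jruby 1.7.0.dev, in seconds, copied from above.
TIMINGS = {
  'loop_whileloop' => 1.56291389465332,
  'vm1_const'      => 1.53185486793518,
  'vm1_ensure'     => 1.56137585639954
}

# Subtract the empty-loop baseline from a vm1_ test result.
# (adjusted is a hypothetical helper mirroring the subtraction described above.)
def adjusted(name, timings = TIMINGS)
  timings[name] - timings['loop_whileloop']
end

%w[vm1_const vm1_ensure].each do |name|
  puts format("%-12s %.3f", name, adjusted(name))
end
```

Both tests finished slightly faster than the empty loop itself, so both adjusted values come out just below zero.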
Here is the relevant code from driver.rb as proof:
if /bm_loop_whileloop.rb/ =~ file
  @loop_wl1 = r[1].map{|e| e.min}
elsif /bm_loop_whileloop2.rb/ =~ file
  @loop_wl2 = r[1].map{|e| e.min}
end

output "name\t#{@execs.map{|(e, v)| v}.join("\t")}#{difference}"
@results.each{|v, result|
  rets = []
  s = nil
  result.each_with_index{|e, i|
    r = e.min
    case v
    when /^vm1_/
      if @loop_wl1
        r -= @loop_wl1[i]
        s = '*'
      end
    when /^vm2_/
      if @loop_wl2
        r -= @loop_wl2[i]
        s = '*'
      end
    end
    rets << sprintf("%.3f", r)
  }
P.S. Please correct me if I messed up somewhere (especially in English grammar).