The great ruby shootout
I used the benchmark suite from ruby-1.9.3-p125. All tests were run on:
- OS: OSX Lion 10.7.3
- CPU: 2.3GHz i5
- RAM: 8GB 1333 MHz DDR3
- SSD: OCZ Vertex 3 Max IOPS SATA III 2.5" 120GB
Implementations:
- ruby 1.8.7p249 (system ruby)
- ruby 1.9.3p125
- ruby 2.0.0dev (2012-02-25 trunk 34796)
- MacRuby 0.12 (ruby 1.9.2) (Nightly build)
- maglev 1.0.0 (ruby 1.8.7)
- rubinius 1.2.4 (1.8.7 release 2011-07-05 JI)
- rubinius 2.0.0dev (1.9.3 e22ed173 yyyy-mm-dd JI)
- jruby 1.7.0.dev (ruby-1.9.3-p28) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04-ea)
- jruby 1.6.7 (ruby-1.8.7-p357) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04-ea)
JRuby was run with the --server -Xinvokedynamic.constants=true flags.
The compiler matters
From time to time, I see blog posts about improving Ruby performance by applying patches, but what if we went further and tried to improve Ruby performance by compiling it with the fastest available compiler? I decided to check this out.
Here is the list of compilers:
- gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00)
- Apple clang version 3.1 (tags/Apple/clang-318.0.45) (based on LLVM 3.1svn)
- gcc version 4.2.1 (Apple Inc. build 5666) (dot 3)
- gcc version 4.7.0 20120218 (experimental) (GCC)
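#!/bin/bash
# Build ruby-1.9.3-p125 once per compiler, installing each build into its
# own prefix. Run from the top of the ruby-1.9.3-p125 source tree.
compilers=( gcc gcc-4.2 gcc-4.7 clang )
for i in "${compilers[@]}"; do
  CC=$i ./configure --disable-install-doc --prefix ~/Projects/benches/mri/1.9.3-p125-$i
  time make -j4   # report how long each build takes
  make install
done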
$ ruby driver.rb -v -o ~/Projects/benches/compilers-bench.txt \
--executables='~/Projects/benches/mri/1.9.3-p125-gcc/bin/ruby;
~/Projects/benches/mri/1.9.3-p125-gcc-4.2/bin/ruby;
~/Projects/benches/mri/1.9.3-p125-gcc-4.7/bin/ruby;
~/Projects/benches/mri/1.9.3-p125-clang/bin/ruby'
Results:
Oh, the default llvm-gcc is ~20% slower than the pre-release gcc-4.7 in synthetic tests (I ran the benchmark a couple of times and got similar results each time).
To be sure that nothing broke with gcc-4.7:
PASS all 943 tests
KNOWNBUGS.rb .
PASS all 1 tests
Ok, I want to try
That is easy if you have Homebrew installed:
$ brew install https://raw.github.com/etehtsea/formulary/009735e66ccabc5867331f64a406073d1623c683/Formula/gcc.rb --enable-cxx --enable-profiled-build --use-gcc
~ 1 hour later...
$ CC=gcc-4.7 ruby-build 1.9.3-p125 ~/.rbenv/versions/1.9.3-p125
What about %any other implementation%?
I couldn't stop there and, driven by curiosity, ran the benchmark on other popular Ruby implementations and MRI versions. I won't post the complete logs, only some highlights.
Don't use system ruby
It's a trap!
bm_vm_thread_mutex3.rb
# 1000 threads, one mutex
require 'thread'
m = Mutex.new
r = 0
max = 2000
(1..max).map{
Thread.new{
i=0
while i<max
i+=1
m.synchronize{
r += 1
}
end
}
}.each{|e|
e.join
}
raise r.to_s if r != max * max
$ time ~/.rbenv/versions/1.8.7-p357/bin/ruby bm_vm_thread_mutex3.rb
real 0m3.093s
user 0m3.078s
sys 0m0.013s
$ /usr/bin/ruby -v
ruby 1.8.7 (2011-12-28 patchlevel 357) [i686-darwin11.3.0]
$ time /usr/bin/ruby bm_vm_thread_mutex3.rb
^Cbm_vm_thread_mutex3.rb:18:in `join': Interrupt
from bm_vm_thread_mutex3.rb:18
from bm_vm_thread_mutex3.rb:7:in `each'
from bm_vm_thread_mutex3.rb:7
real 3m54.930s
user 3m54.122s
sys 0m0.918s
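A hang like this can stall a whole benchmark session. One way to guard against it is to start each benchmark in a child process and kill it after a wall-clock limit, which would also record the "took too long" failures below automatically. A minimal sketch, assuming Ruby 2.1+ (for `Process.clock_gettime`); `run_with_limit` is a hypothetical helper, not part of driver.rb:

```ruby
# Hypothetical helper (not part of driver.rb): run `file` with the
# interpreter at `ruby_path` and kill it if it runs longer than `limit`
# seconds. Returns [status, elapsed_seconds], where status is
# :ok, :failed (non-zero exit) or :timeout.
def run_with_limit(ruby_path, file, limit)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pid = Process.spawn(ruby_path, file, out: File::NULL, err: File::NULL)
  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    return [status.success? ? :ok : :failed, elapsed] if status
    if elapsed > limit
      Process.kill("KILL", pid)
      Process.wait(pid) # reap the killed child
      return [:timeout, elapsed]
    end
    sleep 0.05 # poll interval, so timings are only accurate to ~50 ms
  end
end
```

For example, `run_with_limit("/usr/bin/ruby", "bm_vm_thread_mutex3.rb", 60)` would give up after a minute instead of running for almost four. The polling loop keeps the sketch portable; it is too coarse for the timings themselves, which should still come from the benchmark driver.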
Even if you're sure you don't use Threads anywhere, here are the results without this test:
- ruby 1.8.7 (2010-01-10) - 572.863 sec
- ruby 1.9.3p125 (2012-02-16) - 211.655 sec
Failed on 1.8:
bm_app_factorial.rb
bm_so_ackermann.rb
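Both of these benchmarks are built on very deep recursion (bm_app_factorial.rb computes a large factorial recursively, and bm_so_ackermann.rb evaluates Ackermann's function recursively). Assuming the failures are stack overflows (`SystemStackError`), which is a guess since the error isn't shown here, the usual workaround is to replace the recursion with a loop. A sketch for the factorial case; the method names are illustrative:

```ruby
# Recursive factorial, shaped like the benchmark: every call adds a stack
# frame, so a large n can exhaust the interpreter's stack.
def fact_recursive(n)
  n > 1 ? n * fact_recursive(n - 1) : 1
end

# Iterative factorial: same result, constant stack depth for any n.
def fact_iterative(n)
  (1..n).inject(1) { |acc, k| acc * k }
end
```

The iterative version no longer exercises deep method calls, of course, so it is a workaround for running the code, not a replacement benchmark.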
Rubinius 1.2.4 vs 2.0.0-dev
I've read that there is no GIL in the 2.0.0-dev version and so on, but the upcoming version is slower, noticeably so.
The biggest slowdown is again in the bm_vm_thread_mutex3.rb test:
- rubinius 1.2.4 (1.8.7 release 2011-07-05 JI) - 3.260 sec
- rubinius 2.0.0dev (1.9.3 e22ed173 yyyy-mm-dd JI) - 207.711 sec
Here are the tests with big differences:
Total results without bm_vm_thread_mutex3.rb:
- 1.2.4 - 518.861 sec
- 2.0.0dev - 606.811 sec
Rubinius 2.0.0dev was ~15% slower.
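A percentage like this depends on which total is taken as the base. A small sketch with the two totals above (the helper names are just for illustration):

```ruby
# Relative difference between two total run times, expressed two ways.
# `base` is the faster total (Rubinius 1.2.4), `other` the slower one (2.0.0dev).
def pct_more_time(base, other)
  (other - base) / base * 100.0   # how much longer `other` takes than `base`
end

def pct_less_time(base, other)
  (other - base) / other * 100.0  # how much less time `base` needs than `other`
end

printf("2.0.0dev takes %.1f%% more time than 1.2.4\n", pct_more_time(518.861, 606.811))
printf("1.2.4 takes %.1f%% less time than 2.0.0dev\n", pct_less_time(518.861, 606.811))
```

Measured against 1.2.4, the slowdown is about 17%; the ~15% figure roughly matches measuring against the 2.0.0dev total (about 14.5%).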
Failed:
bm_app_factorial.rb (changed to 4k instead of 5k)
bm_loop_generator.rb
bm_so_ackermann.rb
bm_vm_thread_pass_flood.rb (took too long)
MacRuby 0.12 (Nightly)
MacRuby is what you need when you want to write a desktop application for OS X or just use its APIs, but there is no reason to use it for performance.
First of all, MacRuby's eval (bm_vm2_eval.rb) is pretty slow:
- ruby 1.9.3p125 (2012-02-16) - 29.681 sec
- MacRuby 0.12 (ruby 1.9.2) - 232.257 sec
bm_vm2_eval.rb
i=0
while i<6_000_000 # benchmark loop 2
i+=1
eval("1")
end
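Every iteration here hands a fresh string to eval, so the interpreter has to parse and compile "1" six million times; the benchmark largely measures that per-call compilation cost. A quick way to see the overhead on any implementation is to time the same loop against a block compiled once. A sketch using the standard Benchmark module; the method name and iteration count are arbitrary:

```ruby
require 'benchmark'

# Time n calls of eval("1") (parsed and compiled on every call) against
# n calls of a block that was compiled once. Returns seconds for each.
def eval_vs_block(n)
  one = proc { 1 }
  {
    eval:  Benchmark.realtime { n.times { eval("1") } },
    block: Benchmark.realtime { n.times { one.call } }
  }
end

p eval_vs_block(100_000)
```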
ERB parsing and creating Class instances are slow as well:
bm_app_erb.rb
#
# Create many HTML strings with ERB.
#
require 'erb'
data = DATA.read
max = 15_000
title = "hello world!"
content = "hello world!\n" * 10
max.times{
ERB.new(data).result(binding)
}
__END__
<html>
<head> <%= title %> </head>
<body>
<h1> <%= title %> </h1>
<p>
<%= content %>
</p>
</body>
</html>
- 1.9.3p125 - 1.817 sec
- MacRuby - 81.808 sec
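As written, bm_app_erb.rb calls ERB.new on every iteration, so it measures template parsing and compilation together with rendering. Hoisting ERB.new out of the loop separates the two. A sketch with a shortened copy of the template (the method names are illustrative):

```ruby
require 'erb'

TEMPLATE = "<h1> <%= title %> </h1>\n<p>\n<%= content %>\n</p>\n"

# Parses and compiles the template on every iteration, like the benchmark.
def render_fresh(n, title, content)
  out = nil
  n.times { out = ERB.new(TEMPLATE).result(binding) }
  out
end

# Parses and compiles once, then only renders inside the loop.
def render_cached(n, title, content)
  erb = ERB.new(TEMPLATE)
  out = nil
  n.times { out = erb.result(binding) }
  out
end
```

Both methods produce the same HTML; timing them separately (for example with Benchmark.realtime) would show how the cost splits between parsing and rendering on a given implementation.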
bm_vm3_clearmethodcache.rb
i=0
while i<200_000
i+=1
Class.new{
def m; end
}
end
- 1.9.3p125 - 0.748 sec
- MacRuby - 86.573 sec
And other tests with big differences:
Failed:
bm_loop_generator.rb
bm_so_count_words.rb
bm_so_nsieve_bits.rb (took too long)
bm_vm_thread_create_join.rb (took too long)
Maglev 1.0
Interestingly, Maglev has similar problems:
- bm_vm2_eval.rb - 754.028 sec
- bm_vm3_clearmethodcache.rb - 33.785 sec
JRuby 1.6 vs 1.7.0-dev
JRuby 1.7.0-dev performs roughly like the 1.6.6 version (its total without the benchmark below is about 12% higher), with one significant improvement in the bm_vm_thread_mutex3.rb benchmark:
- 1.7.0-dev - 14.381 sec
- 1.6.6 - 202.552 sec
Total results without bm_vm_thread_mutex3.rb:
- 1.7.0-dev - 257.584 sec
- 1.6.6 - 229.502 sec
Failed:
bm_io_select.rb
MRI 2.0.0-dev vs 1.9.3-p125
Same situation with the MRI dev branch: just one improvement, in bm_vm_thread_create_join.rb:
- ruby 2.0.0dev (2012-02-25 trunk 34796) - 2.806 sec
- ruby 1.9.3p125 (2012-02-16) - 9.239 sec
Total shootout
Total chart:
Chart without these tests:
bm_vm_thread_mutex3.rb
bm_vm2_eval.rb
bm_vm3_clearmethodcache.rb
Looks competitive without them, doesn't it?
Update: Negative timings in vm1/vm2 tests? WTF?
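This happens because of limited benchmark accuracy. Each test in these sections runs inside a while loop, so the resulting time is calculated as:
res_time = vm1/2_test_result - loop_whileloop1/2_result
When the loop body costs next to nothing (or gets optimized away), the test's time falls within the run-to-run noise of the empty loop, and the difference can come out negative.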
bm_loop_whileloop.rb
i=0
while i<30_000_000 # benchmark loop 1
i+=1
end
bm_vm1_const.rb
Const = 1
i = 0
while i<30_000_000 # while loop 1
i+= 1
j = Const
k = Const
end
bm_vm1_ensure.rb
i=0
while i<30_000_000 # benchmark loop 1
i+=1
begin
begin
ensure
end
ensure
end
end
Results for jruby 1.7.0.dev:
- loop_whileloop - 1.56291389465332 sec
- vm1_const - 1.53185486793518 sec
- vm1_ensure - 1.56137585639954 sec
Here is the relevant code from driver.rb:
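# While measuring: remember the empty-loop baselines
# (best time per executable).
if /bm_loop_whileloop.rb/ =~ file
  @loop_wl1 = r[1].map{|e| e.min}
elsif /bm_loop_whileloop2.rb/ =~ file
  @loop_wl2 = r[1].map{|e| e.min}
end
# When printing the results:
output "name\t#{@execs.map{|(e, v)| v}.join("\t")}#{difference}"
@results.each{|v, result|
  rets = []
  s = nil
  result.each_with_index{|e, i|
    r = e.min                # best of the repeated runs for executable i
    case v
    when /^vm1_/
      if @loop_wl1
        r -= @loop_wl1[i]    # subtract the while-loop overhead
        s = '*'              # mark that a baseline was subtracted
      end
    when /^vm2_/
      if @loop_wl2
        r -= @loop_wl2[i]
        s = '*'
      end
    end
    rets << sprintf("%.3f", r)
  }
  # ... (rest of the loop omitted in this excerpt)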
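The same post-processing, reduced to a standalone sketch. The input layout (benchmark name => list of run times for a single executable) is an assumption for illustration, not driver.rb's actual data structure; it keeps the two ideas from the code above: take the minimum of several runs, then subtract the matching empty-loop time from vm1_/vm2_ results.

```ruby
# results: { "benchmark_name" => [time_run1, time_run2, ...] } (one executable)
def loop_corrected(results)
  best = {}
  results.each { |name, times| best[name] = times.min } # best run per benchmark

  best.each_with_object({}) do |(name, t), out|
    base = case name
           when /\Avm1_/ then best["loop_whileloop"]
           when /\Avm2_/ then best["loop_whileloop2"]
           end
    out[name] = base ? t - base : t # subtract loop overhead when a baseline exists
  end
end

jruby = {
  "loop_whileloop" => [1.56291389465332],
  "vm1_const"      => [1.53185486793518],
  "vm1_ensure"     => [1.56137585639954]
}
loop_corrected(jruby).each { |name, t| puts format("%-15s %.3f", name, t) }
```

Fed the jruby 1.7.0.dev numbers above, it prints -0.031 for vm1_const and -0.002 for vm1_ensure.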
P.S. Please correct me if I messed up somewhere (especially in English grammar).