
Yet another rubyist blog

The great ruby shootout

I used the benchmark suite from ruby-1.9.3-p125. All tests were run on:

Implementations:

JRuby was run with the --server -Xinvokedynamic.constants=true flags.

The compiler matters

From time to time I see blog posts about improving Ruby performance by applying some patches, but what if we go further and try to improve Ruby performance by compiling it with the fastest available compiler? I decided to check this out.

Here is the list of compilers:

#!/bin/bash
compilers=( gcc gcc-4.2 gcc-4.7 clang )

for i in "${compilers[@]}"; do
  CC=$i ./configure --disable-install-doc --prefix ~/Projects/benches/mri/1.9.3-p125-$i
  time make -j4
  make install
done

$ ruby driver.rb -v -o ~/Projects/benches/compilers-bench.txt \
--executables='~/Projects/benches/mri/1.9.3-p125-gcc/bin/ruby;
               ~/Projects/benches/mri/1.9.3-p125-gcc-4.2/bin/ruby;
               ~/Projects/benches/mri/1.9.3-p125-gcc-4.7/bin/ruby;
               ~/Projects/benches/mri/1.9.3-p125-clang/bin/ruby'
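When you build several copies like this, it is easy to lose track of which compiler produced which binary. RbConfig records the CC chosen at configure time, so each build can identify itself (a quick check, not part of the benchmark suite):

```ruby
# Print the C compiler recorded at configure time for the running Ruby.
require 'rbconfig'
puts RbConfig::CONFIG['CC']   # e.g. "clang" or "gcc-4.7"
```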

Results:

mri-compilers

Oh, the default llvm-gcc is ~20% slower than the pre-release version of gcc-4.7 in these synthetic tests (I ran the bench a couple of times and got similar results each time).

compile-time

To be sure that nothing broke with gcc-4.7:

PASS all 943 tests
KNOWNBUGS.rb .
PASS all 1 tests

Ok, I want to try

That's easy if you have Homebrew installed:

$ brew install https://raw.github.com/etehtsea/formulary/009735e66ccabc5867331f64a406073d1623c683/Formula/gcc.rb --enable-cxx --enable-profiled-build --use-gcc

~ 1 hour later...

$ CC=gcc-4.7 ruby-build 1.9.3-p125 ~/.rbenv/versions/1.9.3-p125

What about any other implementation?

I couldn't stop there and, driven by curiosity, ran the benchmark on other popular Ruby implementations and MRI versions. I won't post the complete logs, only some highlights.

Don't use system ruby

It's a trap!

bm_vm_thread_mutex3.rb

# 2000 threads, one mutex

require 'thread'
m = Mutex.new
r = 0
max = 2000
(1..max).map{
  Thread.new{
    i=0
    while i<max
      i+=1
      m.synchronize{
        r += 1
      }
    end
  }
}.each{|e|
  e.join
}
raise r.to_s if r != max * max

$ time ~/.rbenv/versions/1.8.7-p357/bin/ruby bm_vm_thread_mutex3.rb
real    0m3.093s
user    0m3.078s
sys 0m0.013s
$ /usr/bin/ruby -v
ruby 1.8.7 (2011-12-28 patchlevel 357) [i686-darwin11.3.0]
$ time /usr/bin/ruby bm_vm_thread_mutex3.rb
^Cbm_vm_thread_mutex3.rb:18:in `join': Interrupt
    from bm_vm_thread_mutex3.rb:18
    from bm_vm_thread_mutex3.rb:7:in `each'
    from bm_vm_thread_mutex3.rb:7

real    3m54.930s
user    3m54.122s
sys 0m0.918s
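As an aside, if you prefer timing from inside Ruby rather than with the shell's time, the stdlib Benchmark module works on both 1.8 and 1.9. A minimal sketch with a stand-in workload (not the mutex test itself):

```ruby
require 'benchmark'

# Benchmark.realtime returns elapsed wall-clock seconds for the block.
elapsed = Benchmark.realtime do
  100_000.times { |i| i * 2 }   # stand-in for the real workload
end
puts format("%.3fs", elapsed)
```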

Even if you are sure you don't use threads anywhere, here are the results without this test:

Failed on 1.8:

Rubinius 1.2.4 vs 2.0.0-dev

I've read that there is no GIL in the 2.0.0-dev version and so on, but the upcoming version is slower, and noticeably so.

The biggest slowdown is again in the bm_vm_thread_mutex3.rb test:

Here are tests with big differences: result-rubinius

Total result without it:

Rubinius wasn't particularly fast to begin with, and has become ~15% slower.

Failed:

MacRuby 0.12 (Nightly)

MacRuby is what you need when you want to write a desktop application for OS X or just use its APIs, but from a performance standpoint there is no reason to use it.

First of all, MacRuby's eval (bm_vm2_eval.rb) is pretty slow:

bm_vm2_eval.rb

i=0
while i<6_000_000 # benchmark loop 2
  i+=1
  eval("1")
end

So are ERB parsing and Class instance creation:

bm_app_erb.rb

#
# Create many HTML strings with ERB.
#

require 'erb'

data = DATA.read
max = 15_000
title = "hello world!"
content = "hello world!\n" * 10

max.times{
  ERB.new(data).result(binding)
}

__END__

<html>
  <head> <%= title %> </head>
  <body>
    <h1> <%= title %> </h1>
    <p>
      <%= content %>
    </p>
  </body>
</html>
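Note that the benchmark deliberately re-parses the template on every iteration, which is exactly what makes it a good stress test of the parser. In application code you would normally parse once and render many times. A sketch with a hypothetical one-line template:

```ruby
require 'erb'

# Hypothetical template; the benchmark's real template lives after __END__.
template = "<h1><%= title %></h1>"
title = "hello world!"

erb = ERB.new(template)                      # parse the template once
pages = 3.times.map { erb.result(binding) }  # render it many times
puts pages.first   # => "<h1>hello world!</h1>"
```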

bm_vm3_clearmethodcache.rb

i=0
while i<200_000
  i+=1

  Class.new{
    def m; end
  }
end

And other tests with big differences: result-macruby

Failed:

Maglev 1.0

Interestingly, Maglev has similar problems:

result-rubinius

JRuby 1.6 vs 1.7.0-dev

JRuby 1.7.0-dev has performance similar to 1.6.6, with a significant improvement in the bm_vm_thread_mutex3.rb bench:

Total result without it:

Failed:

MRI 2.0.0-dev vs 1.9.3-p125

Same situation with MRI dev branch. Just one improvement in bm_vm_thread_create_join.rb:

Total shootout

result-total-list1 result-total-list2

Total chart:

result-total-chart1

Chart without them:

result-total-chart2

Looks competitive without them, doesn't it?

Update: Negative timings in vm1/vm2 tests? WTF?

This happens because of how the benchmark compensates for loop overhead. Each test in these sections runs inside a while loop, so the resulting time is calculated as res_time = vm1/2_test_result - loop_whileloop1/2_result
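A tiny worked example with hypothetical numbers (not taken from the actual results) shows how that subtraction can dip below zero: on a fast VM, measurement noise can make the vm1 test beat the bare while loop.

```ruby
# Hypothetical best-of-N timings, in seconds.
loop_whileloop1 = 2.45   # bm_loop_whileloop.rb (the empty loop baseline)
vm1_const       = 2.41   # bm_vm1_const.rb (noise made it "faster" than empty)

res_time = vm1_const - loop_whileloop1
puts format("%.3f", res_time)   # => "-0.040"
```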

bm_loop_whileloop.rb

i=0
while i<30_000_000 # benchmark loop 1
  i+=1
end

bm_vm1_const.rb

Const = 1

i = 0
while i<30_000_000 # while loop 1
  i+= 1
  j = Const
  k = Const
end

bm_vm1_ensure.rb

i=0
while i<30_000_000 # benchmark loop 1
  i+=1
  begin
    begin
    ensure
    end
  ensure
  end
end

Results for jruby 1.7.0.dev:

Here is the relevant code from driver.rb as proof:

if /bm_loop_whileloop.rb/ =~ file
  @loop_wl1 = r[1].map{|e| e.min}
elsif /bm_loop_whileloop2.rb/ =~ file
  @loop_wl2 = r[1].map{|e| e.min}
end
output "name\t#{@execs.map{|(e, v)| v}.join("\t")}#{difference}"
@results.each{|v, result|
  rets = []
  s = nil
  result.each_with_index{|e, i|
    r = e.min
    case v
    when /^vm1_/
      if @loop_wl1
        r -= @loop_wl1[i]
        s = '*'
      end
    when /^vm2_/
      if @loop_wl2
        r -= @loop_wl2[i]
        s = '*'
      end
    end
    rets << sprintf("%.3f", r)
  }
  # ... rest of the loop body omitted
}

P.S. Please correct me if I messed up somewhere (especially in English grammar).
