Saturday, February 25, 2017

gawk vs mawk vs Ruby benchmark.

Each line of the input file starts either with a number or with something else. A line that starts with a number contains exactly two space-separated numbers, and the output for it is the sum of the two; lines starting with anything other than a number are ignored. Some sample lines --

720 7
256 1
4 4
5 7
a578dc953fd09cc6
55 3
f2d9d631d497c97e
cb6db932d9c9b6c2

Awk pattern --

'/^[0-9]/ { print $1+$2 }'

Ruby script --

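#! /usr/bin/ruby
# Print the sum of the two numbers on every line that starts with a number;
# all other lines are skipped.
ARGF.each_line do |line|
  if line =~ /^([0-9]+) ([0-9]+)/
    puts $1.to_i + $2.to_i
  end
end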
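As a quick sanity check, the rule described above (sum the two numbers on numeric lines, skip everything else) can be tried on the sample lines. This is only a minimal sketch: the method name `sum_line` and the constant `SAMPLE` are made up for illustration and are not part of the benchmark.

```ruby
# Hypothetical helper: returns the sum for a "N M" line, or nil for any other line.
def sum_line(line)
  m = /^([0-9]+) ([0-9]+)/.match(line)
  m && m[1].to_i + m[2].to_i
end

# The sample lines shown earlier in this post.
SAMPLE = <<~LINES
  720 7
  256 1
  4 4
  5 7
  a578dc953fd09cc6
  55 3
  f2d9d631d497c97e
  cb6db932d9c9b6c2
LINES

# Keep only the lines that produced a sum and print one sum per line.
sums = SAMPLE.each_line.map { |line| sum_line(line) }.compact
puts sums
```

For these sample lines it should print 727, 257, 8, 12 and 58, with the three hex-only lines skipped.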

Results --

time gawk '/^[0-9]/ { print $1+$2 }' /tmp/awk_input.txt > /dev/null

real    0m10.224s
user    0m10.192s
sys     0m0.031s

time mawk '/^[0-9]/ { print $1+$2 }' /tmp/awk_input.txt > /dev/null

real    0m2.804s
user    0m2.769s
sys     0m0.032s

time ./bench.rb /tmp/awk_input.txt > /dev/null

real    0m36.886s
user    0m36.813s
sys     0m0.070s

So overall, going by the "real" times, mawk is about 3.6 times faster than gawk and about 13 times faster than Ruby.
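The ratios can be recomputed from the "real" times listed above. The snippet below is just a sketch of that arithmetic; `REAL_SECONDS` and `speedups` are names invented here for illustration.

```ruby
# Wall-clock ("real") times in seconds, copied from the three runs above.
REAL_SECONDS = { "gawk" => 10.224, "mawk" => 2.804, "ruby" => 36.886 }

# Hypothetical helper: for every other tool, how many times longer it took
# than the baseline, rounded to one decimal place.
def speedups(times, baseline)
  base = times.fetch(baseline)
  times.reject { |name, _| name == baseline }
       .transform_values { |seconds| (seconds / base).round(1) }
end

speedups(REAL_SECONDS, "mawk").each do |name, factor|
  puts "mawk is about #{factor} times faster than #{name}"
end
```

With these timings the factors come out at roughly 3.6 for gawk and 13.2 for Ruby.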

Script used to generate the input file --

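#! /usr/bin/ruby
require 'securerandom'

# Append 9,999,999 lines to the input file; the block form closes the file at the end.
File.open("/tmp/awk_input.txt", "a") do |awkinput|
  9_999_999.times do
    writeme = SecureRandom.hex(8) # 16 random hex characters
    # A string that starts with a digit and contains another digit becomes
    # a "N M" line; anything else is written unchanged.
    if writeme =~ /^([0-9]+).*([0-9]+)/
      datawrite = "#{$1} #{$2}"
    else
      datawrite = writeme
    end
    awkinput.write(datawrite + "\n")
  end
end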
