Saturday, February 25, 2017

mawk vs gawk vs Ruby benchmark.

Each line of the input file starts either with a number or with something else. A line that starts with a number contains exactly two space-separated numbers. The output is the sum of the two numbers; lines that start with anything other than a number are ignored. Some sample lines --

720 7
256 1
4 4
5 7
a578dc953fd09cc6
55 3
f2d9d631d497c97e
cb6db932d9c9b6c2

Awk pattern --

'/^[0-9]/ { print $1+$2 }'

Ruby script --

#! /usr/bin/ruby
ARGF.each {
 |line|
 if line =~ /^([0-9]+) ([0-9]+)/
  puts $1.to_i + $2.to_i
 end
}
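
As a sanity check, the expected output for the sample lines above can be worked out with a short sketch. The helper name `sum_line` is hypothetical and not part of the benchmark script; it only mirrors the "two numbers separated by a space" rule described earlier.

```ruby
# Hypothetical helper: returns the sum for a "number number" line,
# or nil for any line that does not start with two numbers.
def sum_line(line)
  match = line.match(/\A([0-9]+) ([0-9]+)/)
  return nil unless match
  match[1].to_i + match[2].to_i
end

# The sample lines shown earlier in this post.
sample = [
  "720 7",
  "256 1",
  "4 4",
  "5 7",
  "a578dc953fd09cc6",
  "55 3",
  "f2d9d631d497c97e",
  "cb6db932d9c9b6c2",
]

sums = sample.map { |line| sum_line(line) }.compact
puts sums
# prints 727, 257, 8, 12 and 58, one per line
```

The three hex-only lines yield nil and are dropped, which matches the "ignored" behaviour described for the input format.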

Results --

time gawk '/^[0-9]/ { print $1+$2 }' /tmp/awk_input.txt > /dev/null

real    0m10.224s
user    0m10.192s
sys     0m0.031s

time mawk '/^[0-9]/ { print $1+$2 }' /tmp/awk_input.txt > /dev/null

real    0m2.804s
user    0m2.769s
sys     0m0.032s

time ./bench.rb /tmp/awk_input.txt > /dev/null

real    0m36.886s
user    0m36.813s
sys     0m0.070s

So overall, mawk is about 3.6 times faster than gawk and about 13 times faster than Ruby.
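
The ratios can be rechecked from the "real" times reported above. This is only a small arithmetic sketch; the variable names are my own and the numbers are copied from the three runs.

```ruby
# Wall-clock ("real") times in seconds, copied from the runs above.
real_seconds = { "gawk" => 10.224, "mawk" => 2.804, "ruby" => 36.886 }

# Express every run as a multiple of the mawk time.
baseline = real_seconds["mawk"]
ratios = real_seconds.map { |name, t| [name, (t / baseline).round(2)] }.to_h

ratios.each { |name, ratio| puts "#{name}: #{ratio}x the mawk time" }
```

This is a single run per tool, so the ratios are a rough indication rather than a precise measurement.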

Script used to generate the input file --

#! /usr/bin/ruby
require 'securerandom'
awkinput = IO.new(IO.sysopen("/tmp/awk_input.txt", 'a'))
9999999.times {
 writeme = SecureRandom.hex(8)
 if writeme =~ /^([0-9]+).*([0-9]+)/
  datawrite = "#{$1} #{$2}"
 else
  datawrite = writeme
 end
 awkinput.write(datawrite + "\n")
}
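
To see what the generator regex does to a single hex string, here is a small sketch using fixed strings instead of `SecureRandom.hex(8)`. The helper name `to_input_line` is hypothetical; it only wraps the same `if`/`else` branch as the loop above.

```ruby
# Hypothetical helper mirroring the branch inside the generator loop.
def to_input_line(hex)
  if hex =~ /^([0-9]+).*([0-9]+)/
    "#{$1} #{$2}"   # leading digits, then the last digit in the string
  else
    hex             # written unchanged
  end
end

puts to_input_line("720abc1f9d3e4b57")  # "720 7"
puts to_input_line("44ffffffffffffff")  # "4 4"
puts to_input_line("a578dc953fd09cc6")  # unchanged, no leading digit
puts to_input_line("4bcdefabcdefabcd")  # unchanged, only one digit
```

Because `.*` is greedy, the second capture is always a single digit, which matches the sample lines. The last case is worth noting: the line is written unchanged but still starts with a digit, so the awk pattern `/^[0-9]/` would match it while the Ruby regex in the benchmark script would not, and the two outputs may therefore not be identical.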

Sunday, February 5, 2017

Block device tester

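I made this script to test block devices. It writes random data across the whole device and reads each block back to verify it, so any existing data on the device is overwritten. The first argument is the block device to test.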
#! /usr/bin/ruby
# Quits if a corrupt block is found and prints the byte range (from the start of the device) where it was found.
# First arg -- the block device.
require "securerandom"
require 'digest'
# Block size -- no. of Bytes to write at a time. Script will consume this much memory.
Bs = 9*1024*1024
Multiplyer = 6
# Returns bs bytes of random data. Only ceil(bs/multiplyer) bytes are drawn from the random number generator; that unit is repeated multiplyer times and truncated to bs bytes.
def getRandom(multiplyer, bs)
 randomDataUnit = (bs.to_f/multiplyer.to_f).ceil
 randomData = SecureRandom.random_bytes(randomDataUnit)
 randomData *= multiplyer
 if randomData.bytesize > bs
  randomData = randomData.byteslice(0, bs)
 end
 return randomData
end

# Open device
devwio = IO.new(IO.sysopen(ARGV[0], File::WRONLY|File::BINARY|File::SYNC))
devrio = IO.new(IO.sysopen(ARGV[0], File::RDONLY|File::BINARY|File::RSYNC))
devrio.sync = true
devwio.sync = true

# Calculate no. of blocks to write
devsize = `blockdev --getsize64 #{ARGV[0]}`.to_i
writeBlocks = (devsize.to_f/Bs.to_f).floor

# Write those blocks while testing
writeBlocks.times {
 data = getRandom(Multiplyer, Bs)
 devwio.write(data)
# TODO -- Move if to separate function
 if (Digest::SHA1.digest data) != (Digest::SHA1.digest devrio.read(Bs))
  puts "\nData verification failed from #{devrio.pos-Bs} to #{devrio.pos}"
  exit 1
 else
  100.times {
   print "\x8"
  }
  print "Progress -- #{devrio.pos/1024/1024}MB"
 end
}
# Handle the remaining partial block.
data = getRandom(1, devsize-(writeBlocks*Bs))
devwio.write(data)
# TODO -- Move if to separate function
if (Digest::SHA1.digest data) != (Digest::SHA1.digest devrio.read)
 puts "\nData verification failed from #{devrio.pos-data.bytesize} to #{devrio.pos}"
 exit 1
else
 100.times {
  print "\x8"
 }
 print "Last #{devrio.pos/1024/1024}MB"
end
puts
devwio.close
devrio.close
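
The write-then-verify loop can be tried safely on a temporary file before pointing the script at a real device. This is a minimal sketch under my own assumptions: the helper name `first_bad_offset`, the 4096-byte block size and the block count are hypothetical, and a regular file stands in for the block device.

```ruby
require "securerandom"
require "digest"
require "tempfile"

# Hypothetical helper: writes random blocks to path, reads each one back,
# and returns the byte offset of the first block whose SHA1 digest differs,
# or nil if every block matches.
def first_bad_offset(path, block_size, blocks)
  File.open(path, "r+b") do |io|
    blocks.times do |i|
      offset = i * block_size
      data = SecureRandom.random_bytes(block_size)
      io.seek(offset)
      io.write(data)
      io.flush
      io.seek(offset)
      read_back = io.read(block_size)
      return offset if Digest::SHA1.digest(read_back) != Digest::SHA1.digest(data)
    end
  end
  nil
end

result = nil
Tempfile.create("blocktest") do |tmp|
  # A regular temporary file stands in for the block device here.
  result = first_bad_offset(tmp.path, 4096, 8)
end
puts result.nil? ? "all blocks verified" : "mismatch at byte #{result}"
```

On a regular file the reads are most likely served from the page cache, so this only illustrates the structure of the loop, not an actual media test.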