Tuesday, June 10, 2025

mdadm (RAID 5) performance under different parity layouts (-p / --parity / --layout)

While the performance of right-asymmetric, left-asymmetric, right-symmetric, and left-symmetric is roughly the same, parity-last and parity-first seem strikingly fast for reads.
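
To check which layout and chunk size an existing array uses (a quick sanity check; /dev/md0 is a placeholder for your array) --
mdadm --detail /dev/md0 | grep -iE 'layout|chunk'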

Tests were done on a RAID 5 setup over 3 USB hard drives, each with 10TB capacity. Each HDD is capable of 250+ MB/s simultaneously (therefore the USB link is not saturated).
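
The raw per-disk throughput can be verified with a direct sequential read (sketch; /dev/sdX is a placeholder for the actual disk) --
dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct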

The optimal chunk size for right-asymmetric, left-asymmetric, right-symmetric, and left-symmetric starts at 32KB, where sequential read speeds are around 475MB/s. At 256KB and 512KB chunks, read speeds improve slightly to around 483MB/s. Below 32KB chunks, read speeds suffer significantly; I get 120MB/s reads at 4KB chunks. Write speeds are around 480MB/s even at 4KB chunks and remain the same up to 512KB chunks (no tests were done beyond this size).
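
For reference, a minimal create command for such a 3-disk array at a given chunk size and layout would look like this (a sketch; device paths are placeholders) --
mdadm -C /dev/md/bench -l 5 -n 3 -c 32K -p left-symmetric /dev/sd{a,b,c}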

With parity-last/first you can afford a lower chunk size with the same read performance. For example, at 16KB chunks I was getting 488MB/s writes and 478MB/s reads. However, the lowest chunk size with the best performance was 32KB, where I was getting 490MB/s writes and 506MB/s reads. The performance remained the same up to a 512KB chunk size. Therefore, in a 3-disk RAID-5 setup, parity-last/first gives optimal performance at a lower chunk size (compared to the other parity layouts), which sounds like it must be a good deal; however, as the tests below show, neither a lower chunk size nor parity-last/first is a good idea.
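
(As a rule of thumb, the full stripe is chunk size times the number of data disks, so with 3 disks (2 data + 1 parity) a 32KB chunk gives a 64KB full stripe and a 512KB chunk gives a 1MB full stripe.)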

The problem with parity-last/first is that writes do not scale beyond 2 data disks (i.e. 3 disks in total), which was a RAID-4 problem, and parity-last/first IS a RAID-4 layout. Technically, only random writes should fail to scale (every small write hits the dedicated parity disk, making it the bottleneck), and sequential full-stripe writes should be unaffected, but it seems writes do not scale even sequentially. Synthetic tests were done by starting a VM in qemu with 5 block devices, each of which was throttled to 5MB/s. These are the tests done (with 5 disks) --

Create qemu storage --
qemu-img create -f qcow2 -o lazy_refcounts=on RAID5-test-storage1.qcow2 20G
qemu-img create -f qcow2 -o lazy_refcounts=on RAID5-test-storage2.qcow2 20G
qemu-img create -f qcow2 -o lazy_refcounts=on RAID5-test-storage3.qcow2 20G
qemu-img create -f qcow2 -o lazy_refcounts=on RAID5-test-storage4.qcow2 20G
qemu-img create -f qcow2 -o lazy_refcounts=on RAID5-test-storage5.qcow2 20G
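
Equivalently, a bash loop creates the same five images --
for i in {1..5}; do qemu-img create -f qcow2 -o lazy_refcounts=on RAID5-test-storage$i.qcow2 20G; done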

Launch qemu -- 

qemu-system-x86_64 -machine accel=kvm,kernel_irqchip=on,mem-merge=on \
  -drive file=template_trixie.raid5.qcow2,id=centos,if=virtio,media=disk,cache=unsafe,aio=threads,index=0 \
  -drive file=RAID5-test-storage1.qcow2,id=storage1,if=virtio,media=disk,cache=unsafe,aio=threads,index=1,throttling.bps-total=$((5*1024*1024)) \
  -drive file=RAID5-test-storage2.qcow2,id=storage2,if=virtio,media=disk,cache=unsafe,aio=threads,index=2,throttling.bps-total=$((5*1024*1024)) \
  -drive file=RAID5-test-storage3.qcow2,id=storage3,if=virtio,media=disk,cache=unsafe,aio=threads,index=3,throttling.bps-total=$((5*1024*1024)) \
  -drive file=RAID5-test-storage4.qcow2,id=storage4,if=virtio,media=disk,cache=unsafe,aio=threads,index=4,throttling.bps-total=$((5*1024*1024)) \
  -drive file=RAID5-test-storage5.qcow2,id=storage5,if=virtio,media=disk,cache=unsafe,aio=threads,index=5,throttling.bps-total=$((5*1024*1024)) \
  -vnc [::1]:0 \
  -device e1000,id=ethnet,netdev=primary,mac=52:54:00:12:34:56 \
  -netdev tap,ifname=veth0,script=no,downscript=no,id=primary \
  -m 1024 -smp 12 -daemonize -monitor pty -serial pty > /tmp/vm0_pty.txt
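
Since stdout is redirected, the pty paths qemu assigns to the monitor and serial console land in /tmp/vm0_pty.txt; you can attach to them with e.g. screen (/dev/pts/N is a placeholder for whatever qemu printed) --
screen /dev/pts/N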

mdadm parameters for parity-last/first --

mdadm -C /dev/md/bench -l 5 --home-cluster=archive10TB --homehost=any -z 1G -p parity-last -x 0 -n 5 -c 512K --data-offset=8K -N tempRAID -k resync /dev/disk/by-path/virtio-pci-0000:00:0{5..9}.0

mdadm parameters for left-symmetric --

mdadm -C /dev/md/bench -l 5 --home-cluster=archive10TB --homehost=any -z 1G -p left-symmetric -x 0 -n 5 -c 512K --data-offset=8K -N tempRAID -k resync /dev/disk/by-path/virtio-pci-0000:00:0{5..9}.0
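
Before benchmarking, let the initial resync finish so it doesn't skew the numbers (this blocks until the array is idle) --
mdadm --wait /dev/md/bench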

Write test --
cat /dev/urandom | tee /dev/stdout | tee /dev/stdout| tee /dev/stdout| tee /dev/stdout| tee /dev/stdout| tee /dev/stdout | tee /dev/stdout| tee /dev/stdout| tee /dev/stdout | dd of=/dev/md/bench bs=1M count=100 oflag=direct iflag=fullblock
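
Each tee /dev/stdout writes its input both to its own stdout and to /dev/stdout (the same pipe), doubling the data rate, so the nine of them amplify the /dev/urandom stream by 2^9 = 512x and the random source never becomes the bottleneck.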

Read test --
dd if=/dev/md/bench of=/dev/null bs=1M count=100 iflag=direct
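
For reference, with 5 disks throttled to 5MB/s each and one disk's worth of bandwidth going to parity, the theoretical ceiling for both sequential reads and full-stripe sequential writes is (5 - 1) x 5MB/s = 20MB/s.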

For writes, I was getting 10MB/s with parity-last and 13.4MB/s with left-symmetric (34% higher).

For reads, I was getting 21.8MB/s with parity-last and 27.6MB/s with left-symmetric (27% higher).

Therefore it seems left-symmetric scales better in every way.

To ensure nothing was wrong with the test setup, I repeated the same test for parity-last/first with 3 disks instead, and got 10.7MB/s writes and 10.7MB/s reads.
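
For completeness, the 3-disk variant of the create command (a sketch, assuming the first three virtio disks) --
mdadm -C /dev/md/bench -l 5 --home-cluster=archive10TB --homehost=any -z 1G -p parity-last -x 0 -n 3 -c 512K --data-offset=8K -N tempRAID -k resync /dev/disk/by-path/virtio-pci-0000:00:0{5..7}.0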

With this I come to the conclusion that parity-last/first scales writes across at most 2 data disks, even in the best-case scenario. Admittedly, with left-symmetric on 5 disks I was getting a little more read speed than theory allows (the ceiling should be 20MB/s), but why exactly that happened is beyond my understanding.

As for why a smaller chunk size is not a good idea, I'll write about that in another blog post.
