Thursday, March 27, 2025

XFS vs ext4 sequential operation benchmark (on both HDD and NVMe)

Methodology

Sequentially writing a 5GB file with cache --
cat /dev/urandom | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | dd of=random iflag=fullblock bs=1M count=5120
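Why the chain of `tee /dev/stdout` stages: each one writes every block it reads twice into the same pipe (once through the file /dev/stdout, once through its own standard output), so nine stages multiply the /dev/urandom stream 512-fold (2^9). Presumably this keeps random-number generation from becoming the bottleneck. The bytes are repeated rather than freshly random, which should not matter here since neither XFS nor ext4 compresses data or deduplicates it on write. A small sketch that makes the doubling visible by counting lines (the `double_n` helper is hypothetical, written only for this illustration):

```bash
#!/usr/bin/env bash
# double_n N: pass stdin through N chained `tee /dev/stdout` stages.
# Each stage writes its input to /dev/stdout (its own stdout pipe) and to
# stdout again, so every stage doubles the number of bytes.
double_n() {
  if [ "$1" -eq 0 ]; then
    cat
  else
    tee /dev/stdout | double_n "$(($1 - 1))"
  fi
}

printf 'x\n' | double_n 1 | wc -l   # prints 2
printf 'x\n' | double_n 9 | wc -l   # prints 512, the factor in the write command
```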

Next write without cache --
rm random
sync; echo 3 > /proc/sys/vm/drop_caches
cat /dev/urandom | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | tee /dev/stdout | dd of=random iflag=fullblock bs=1M count=5120 oflag=direct

Read without cache --
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=random of=/dev/null iflag=direct bs=1M

Read with cache --
dd if=random of=/dev/null bs=1M
 
Repeated read with cache --
sync; echo 3 > /proc/sys/vm/drop_caches
for i in {1..10}; do dd if=random of=/dev/null bs=1M; sleep 1; done
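Each step above reports its speed on the last line of dd's stderr. A minimal sketch of a wrapper that keeps only that rate, which makes the ten numbers of the repeated-read loop easier to collect. It assumes GNU coreutils dd, whose summary line ends with `copied, <seconds> s, <rate>`; `dd_rate` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# dd_rate ARGS...: run dd with ARGS and print only the throughput from
# its final summary line, which for GNU coreutils dd has the form
#   <bytes> bytes (<size>) copied, <seconds> s, <rate>
dd_rate() {
  # "2>&1 >/dev/null" sends dd's stderr (the summary) into the pipe and
  # discards its stdout; LC_ALL=C keeps the summary text untranslated.
  LC_ALL=C dd "$@" 2>&1 >/dev/null | awk -F', ' '/copied/ { print $NF }'
}

# Example: the "Repeated read with cache" step, one rate per line
# (drop_caches needs root; "random" is the test file written earlier).
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   for i in {1..10}; do dd_rate if=random of=/dev/null bs=1M; sleep 1; done
```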
 

FS format parameters and mount options -- 

Two XFS variants were benchmarked: one with rmapbt=0 and one with rmapbt=1 (reflink disabled in both).
XFS format parameters --
mkfs.xfs -f -m rmapbt=0,reflink=0
mkfs.xfs -f -m rmapbt=1,reflink=0
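To confirm which XFS variant a mounted filesystem actually has, `xfs_info` prints the `rmapbt=` and `reflink=` flags in its meta-data section. A small sketch that extracts one flag from that output (the helper name `xfs_flag` and the mount point /mnt/test are hypothetical):

```bash
#!/usr/bin/env bash
# xfs_flag NAME: print the value of feature flag NAME (e.g. rmapbt,
# reflink) from `xfs_info` output read on stdin.
xfs_flag() {
  # Match "NAME=digits" at line start or after a space/comma, so that a
  # short name cannot match inside a longer one such as "finobt=1".
  grep -oE "(^|[[:space:],])$1=[0-9]+" | head -n 1 | cut -d= -f2
}

# Usage (mount point /mnt/test is hypothetical):
#   xfs_info /mnt/test | xfs_flag rmapbt    # 0 or 1, depending on the variant
#   xfs_info /mnt/test | xfs_flag reflink   # 0 for both variants here
```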

Two ext4 variants were tested: one formatted for large files and one for both large and small files.
ext4 format options optimized for large files --
mkfs.ext4 -m 1 -O none,dir_index,extent,^flex_bg,^bigalloc,has_journal,large_file,sparse_super2,^uninit_bg 
 
ext4 format options optimized for both large and small files --
mkfs.ext4 -g 256 -G 4 -J size=100 -m 1 -C 2097152 -O none,bigalloc,extent,flex_bg,has_journal,large_file,sparse_super2,^uninit_bg,dir_index,dir_nlink,^sparse_super,^sparse_super2
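Similarly, `dumpe2fs -h` lists the enabled ext4 features on its `Filesystem features:` line, which is worth checking after formatting; for example, the second command above both enables and clears sparse_super2 (`sparse_super2,...,^sparse_super2`). A sketch of a hypothetical `ext4_has_feature` helper that tests one feature name against that line (the device /dev/sdX1 is a placeholder):

```bash
#!/usr/bin/env bash
# ext4_has_feature NAME: succeed if NAME is listed on the
# "Filesystem features:" line of `dumpe2fs -h` output read on stdin.
ext4_has_feature() {
  awk -F': *' '/^Filesystem features:/ { print $2 }' |
    tr -s ' ' '\n' |
    grep -qx -- "$1"   # -x: whole-name match, so sparse_super != sparse_super2
}

# Usage (device /dev/sdX1 is a placeholder):
#   dumpe2fs -h /dev/sdX1 2>/dev/null | ext4_has_feature sparse_super2 && echo on || echo off
```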

xfs mount options -- 
mount -o logbufs=8,logbsize=256k,noquota,noatime

ext4 mount options (when formatted for large file optimization) -- 
mount -o noquota,noatime,data=writeback,journal_async_commit,inode_readahead_blks=32768,max_batch_time=10000000
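After mounting, the options the kernel actually applied can be compared against the requested ones by reading /proc/mounts. A sketch, where the function name `missing_mount_opts`, the `MOUNTS_FILE` override and the mount point are all hypothetical. Options that are defaults, or that a filesystem accepts but does not echo back in /proc/mounts, will also be listed, so treat the output as a hint rather than proof:

```bash
#!/usr/bin/env bash
# missing_mount_opts MOUNTPOINT OPT...: print every OPT that does not
# appear in the option list the kernel reports for MOUNTPOINT.
# Reads /proc/mounts unless MOUNTS_FILE is set (hypothetical override).
missing_mount_opts() {
  local mnt=$1 opts opt
  shift
  # Field 2 is the mount point, field 4 the comma-separated options;
  # if the mount point appears more than once, the last entry wins.
  opts=$(awk -v m="$mnt" '$2 == m { o = $4 } END { print o }' "${MOUNTS_FILE:-/proc/mounts}")
  for opt in "$@"; do
    case ",$opts," in
      *",$opt,"*) ;;                # option is active
      *) printf '%s\n' "$opt" ;;    # requested but not reported
    esac
  done
}

# Usage (mount point /mnt/test is hypothetical):
#   missing_mount_opts /mnt/test logbufs=8 logbsize=256k noquota noatime
```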

Benchmark results

XFS rmapbt on vs off on NVMe (5GB file) --

Without rmapbt
    sequentially writing a 5GB file with cache --
    2.3 GB/s
    
    Next write without cache --
    1.7 GB/s
    
    Read without cache --
    2.2 GB/s
    
    Read with cache --
    2.9 GB/s
    
    Repeated read with cache --
    2.8 GB/s
    17.3 GB/s
    16.2 GB/s
    16.2 GB/s
    16.2 GB/s
    16.2 GB/s
    16.3 GB/s
    16.3 GB/s
    16.1 GB/s
    16.3 GB/s

With rmapbt
    sequentially writing a 5GB file with cache --
    2.4 GB/s
    
    Next write without cache --
    1.7 GB/s
    
    Read without cache --
    2.2 GB/s
    
    Read with cache --
    2.8 GB/s
    
    Repeated read with cache --
    2.8 GB/s
    16.4 GB/s
    16.5 GB/s
    16.5 GB/s
    16.5 GB/s
    16.4 GB/s
    16.5 GB/s
    16.4 GB/s
    16.5 GB/s
    16.5 GB/s

Sequential read/write operations with rmapbt on/off in XFS (HDD, 1GB file)

XFS (rmapbt=0) --
sequentially writing a 1GB file with cache --
116 MB/s, 112 MB/s
Next write without cache --
105 MB/s
Read without cache --
104 MB/s
Read with cache --
104 MB/s
Read with cache again --
13.8 GB/s
Repeated read with cache --
This was run right after formatting and sequentially writing a 1GB file with cache.
Run 1: 105 MB/s; runs 2-10 (GB/s): 17.4, 17.3, 17.3, 17.3, 17.3, 17.4, 17.4, 17.4, 17.4
Avg (runs 2-10): 17.36 GB/s
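The Avg figures in these results match the mean of runs 2 to 10 only: run 1 reads from disk right after drop_caches (and is in MB/s), while the remaining nine are page-cache reads in GB/s. A sketch that reproduces them, using a hypothetical `warm_avg` helper:

```bash
#!/usr/bin/env bash
# warm_avg RUN1 RUN2 ... RUNN: mean of runs 2..N, printed with 2 decimals.
# Run 1 (the cold read from disk) is dropped before averaging.
warm_avg() {
  [ "$#" -ge 2 ] || return 1
  shift
  printf '%s\n' "$@" | LC_ALL=C awk '{ s += $1; n++ } END { printf "%.2f\n", s / n }'
}

warm_avg 105 17.4 17.3 17.3 17.3 17.3 17.4 17.4 17.4 17.4   # prints 17.36
```

The same call on the other three repeated-read series gives 13.83, 12.14 and 12.43 GB/s, matching the Avg lines of the other configurations.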

XFS (rmapbt=1) --
sequentially writing a 1GB file with cache --
115 MB/s
Repeated read with cache --
This was run right after formatting and sequentially writing a 1GB file with cache.
Run 1: 106 MB/s; runs 2-10 (GB/s): 13.9, 13.8, 13.4, 13.9, 13.7, 14.1, 13.9, 13.8, 14.0
Avg (runs 2-10): 13.83 GB/s

Sequential read/write operations on ext4 (HDD, 1GB file)

ext4 (optimized for large files) --
sequentially writing a 1GB file with cache --
112 MB/s, 112 MB/s
Next write without cache --
104 MB/s
Read without cache --
105 MB/s
Read with cache --
105 MB/s
Read with cache again --
11.2 GB/s
Repeated read with cache --
This was run right after formatting and sequentially writing a 1GB file with cache.
Run 1: 108 MB/s; runs 2-10 (GB/s): 11.4, 11.3, 11.2, 13.7, 12.5, 12.3, 12.4, 12.2, 12.3
Avg (runs 2-10): 12.14 GB/s

ext4 (optimized for both large and small files) --
sequentially writing a 1GB file with cache --
115 MB/s
Repeated read with cache --
This was run right after formatting and sequentially writing a 1GB file with cache.
Run 1: 103 MB/s; runs 2-10 (GB/s): 11.8, 12.0, 12.0, 11.9, 12.8, 12.8, 12.8, 12.9, 12.9
Avg (runs 2-10): 12.43 GB/s

Conclusion -- 

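For NVMe/SSD, XFS with rmapbt on (rmapbt=1) is the best choice for sequential operations on large files, though the margin is small: 2.4 vs 2.3 GB/s for the cached write, and repeated cached reads settled at 16.4 to 16.5 GB/s vs 16.1 to 16.3 GB/s, with every other result within 0.1 GB/s. It also beats ext4, even for small-file operations (benchmark to be published later).
For HDD storage, XFS without rmapbt (rmapbt=0) performed best: writes and uncached reads were within a few MB/s across all configurations, but its repeated cached reads averaged 17.36 GB/s, vs 13.83 GB/s with rmapbt=1 and 12.14 to 12.43 GB/s for ext4.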
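For a relative view of these margins, a one-line awk calculation gives the percentage by which one throughput exceeds another. A sketch with a hypothetical `pct_faster` helper, fed with the (rounded) averages and single-run numbers reported above:

```bash
#!/usr/bin/env bash
# pct_faster A B: percentage by which throughput A exceeds throughput B.
pct_faster() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# HDD, repeated cached reads (Avg of runs 2-10, GB/s)
pct_faster 17.36 13.83   # XFS rmapbt=0 vs rmapbt=1: prints 25.5
pct_faster 17.36 12.43   # XFS rmapbt=0 vs ext4 (large and small files): prints 39.7
# NVMe, cached sequential write (GB/s)
pct_faster 2.4 2.3       # XFS rmapbt=1 vs rmapbt=0: prints 4.3
```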