Jens Axboe (axboe) wrote,

pdflush epitaph

It seems it has been about two months since I last posted here. That's not due to a lack of kernel activity, though real life has interfered a bit with the addition of one more son to the family.

The patch set has been undergoing some changes since I last posted. One is the ability to have more than one thread per backing device. This is supposed to be useful for extreme cases where a single CPU cannot keep up with a very fast device. I have yet to actually test this part; I'm hoping some of the interested parties will join the fun and add the file system related code that enables placement and flushing of dirty inodes on several writeback threads per bdi. Another change is lazy create/exit of the flusher threads. pdflush has 2-8 threads running, depending on what mood it is in. With the per-bdi flusher threads, a thread will not get created unless it is going to be doing work, and if it has been idle for some time, it will exit again. So this should respond more smoothly to actual system demand; there's not much point in having 100 idle threads for 100 disks if only a fraction of those disks are actually busy with writeback at any given time.
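To illustrate the lazy create/exit part, here's a rough userspace sketch of the idea, with pthreads standing in for kernel threads. This is not the actual patch code; the names (bdi, nr_pending, queue_writeback) and the timeout value are made up for the example.

/*
 * Toy userspace sketch of lazy create/exit: a per-device flusher
 * thread is only spawned when writeback work shows up, and exits
 * again after sitting idle for a while.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define IDLE_TIMEOUT_SECS 30    /* exit after this long with nothing to do */

struct bdi {
        pthread_mutex_t lock;
        pthread_cond_t more_work;
        int nr_pending;         /* queued writeback work items */
        bool thread_running;
};

static struct bdi example_bdi = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .more_work = PTHREAD_COND_INITIALIZER,
};

static void flush_one_work(struct bdi *bdi)
{
        /* stand-in for actually writing back dirty inodes on this device */
        printf("flushing one work item\n");
}

static void *flusher_thread(void *data)
{
        struct bdi *bdi = data;
        struct timespec deadline;

        pthread_mutex_lock(&bdi->lock);
        for (;;) {
                while (bdi->nr_pending > 0) {
                        bdi->nr_pending--;
                        pthread_mutex_unlock(&bdi->lock);
                        flush_one_work(bdi);
                        pthread_mutex_lock(&bdi->lock);
                }
                /* idle: wait for more work, but give up after a timeout */
                clock_gettime(CLOCK_REALTIME, &deadline);
                deadline.tv_sec += IDLE_TIMEOUT_SECS;
                if (pthread_cond_timedwait(&bdi->more_work, &bdi->lock,
                                           &deadline) != 0 &&
                    bdi->nr_pending == 0)
                        break;
        }
        bdi->thread_running = false;
        pthread_mutex_unlock(&bdi->lock);
        return NULL;
}

/* queue writeback for a device, spawning its flusher only if needed */
static void queue_writeback(struct bdi *bdi)
{
        pthread_mutex_lock(&bdi->lock);
        bdi->nr_pending++;
        if (!bdi->thread_running) {
                pthread_t tid;

                bdi->thread_running = true;
                pthread_create(&tid, NULL, flusher_thread, bdi);
                pthread_detach(tid);
        }
        pthread_cond_signal(&bdi->more_work);
        pthread_mutex_unlock(&bdi->lock);
}

int main(void)
{
        queue_writeback(&example_bdi);
        sleep(1);       /* give the flusher a moment before the process exits */
        return 0;
}

The point is simply that thread creation happens from the path that queues writeback work, so a flusher only exists while there is (or recently was) dirty data to push out for that device.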

I've also done a bit of testing this week, and the results look pretty good. Most runs show the new approach reaching similar performance at lower system utilization, or simply higher performance. So that's all good. Yanmin Zhang (Intel) ran into a bug (that may or may not already be fixed, I'll know tomorrow when tests are run with new patches) and posted a fio job file that he reproduced it with. I decided to run the test with and without the writeback patches to compare results. The disk used is a 32G Intel X25-E SSD and the file system is ext4.

Kernel      Throughput   usr CPU   sys CPU   disk util
writeback   175MB/sec    17.55%    43.04%    97.80%
vanilla     147MB/sec    13.44%    47.33%    85.98%

Pretty decent result, I'd say. Apart from the lower system utilization, the interesting bit is how the writeback patches actually enable us to keep the disk busy. ~86% utilization for the vanilla kernel is pretty depressing. The fio job file used was:

[global]
direct=0
ioengine=mmap
iodepth=256
iodepth_batch=32
size=1500M
bs=4k
pre_read=1
overwrite=1
numjobs=1
loops=5
runtime=600
group_reporting
directory=/data

[job_group0_sub0]
exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
startdelay=0
rw=randwrite
filename=f1:f2


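For reference, a job file like this is just run by pointing fio at it, something like "fio yanmin-job.fio" (the file name here is made up; any name will do), with /data mounted on the device under test. The exec_prerun line drops the page cache, dentries, and inodes before the job starts, so the results aren't skewed by whatever happens to be cached already.
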
The next few days will be spent polishing the patch set and posting version 5. That one should hopefully be ready for inclusion in the -next tree, and then head upstream for 2.6.31.

Oh, and that Intel disk kicks some serious ass. For sequential writes, it maintains 210MB/sec easily. I have a few OCZ Vertex disks as well, which also do pretty well for sequential writes, but for random writes the Intel drive is in a different league. For my birthday, I want 4 more Intel disks for testing!

Tags: kernel, linux, pdflush, writeback
