Jens Axboe (axboe) wrote,

NAPI like approach for block devices

In the continuing quest for higher performance on fast IO devices, I did a quick 'n' dirty NAPI-like implementation for block devices. Like the networking equivalent, it punts work to a softirq and polls from there to reap further events from the device. The benefit is that more completion events get reaped per hardware interrupt, while the interrupt path itself also gets faster. Initial results were encouraging: I got a few percent higher IOPS rate, but at a higher sys time cost. So let's look into why that is.

Now, Linux IO completions already use a softirq for handling the bottom half of the processing. In crappy ASCII callgraph layout, it looks something like the below. The example uses AHCI to give you a full path from start to finish; other drivers vary slightly at the start, of course.

IRQ triggers
                    qc->complete_fn() (ata_scsi_qc_complete() for normal IO)
                        command->scsidone() (scsi_done() for normal IO)

blk_complete_request() raises BLOCK_SOFTIRQ, and that concludes the top half of the IO processing. Now the softirq triggers:


And here blk_end_request() does the bio unrolling and completes all parts of the bios in the request, calls the registered bio completion notifier, and finally completes the request and frees it.

The NAPI prototype registered a private softirq and hooked in at the very top of the callgraph, altering it to look something like this:

IRQ triggers
        blk_napi_sched_port() (arms BLOCK_NAPI_SOFTIRQ)


and so on, following the same call trace as the original example. It then hits blk_complete_request(), and another softirq is raised to handle the original bottom half. blk_napi_softirq() keeps calling into ahci_port_intr() until a call does no work, or its budget of work has been used up.
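To make the "until it does no work, or its budget has been used up" rule concrete, here is a minimal userspace sketch of that loop. It is not the prototype's code: struct fake_port, fake_port_intr(), napi_poll_port() and the per_pass parameter are all made up for illustration, with fake_port_intr() standing in for what ahci_port_intr() does in the callgraph above.

```c
#include <assert.h>

/* Hypothetical stand-in for a port: just a count of pending completions. */
struct fake_port {
    int pending;
};

/*
 * Hypothetical stand-in for the driver's port interrupt handler.
 * Reaps at most `max` completions and returns how many it handled.
 */
static int fake_port_intr(struct fake_port *port, int max)
{
    int done = port->pending < max ? port->pending : max;

    port->pending -= done;
    return done;
}

/*
 * Keep polling the port until one pass finds no work, or until the
 * budget is exhausted. Returns the total number of completions reaped.
 * `per_pass` (assumed) caps how much a single handler call may reap.
 */
static int napi_poll_port(struct fake_port *port, int budget, int per_pass)
{
    int total = 0;

    assert(budget > 0 && per_pass > 0);

    while (total < budget) {
        int room = budget - total;
        int done = fake_port_intr(port, room < per_pass ? room : per_pass);

        if (done == 0)
            break;          /* device is idle, stop polling */
        total += done;
    }
    return total;
}
```

With 10 pending completions and a budget of 64, the loop stops early because a pass comes back empty. With 100 pending, it stops at 64 and leaves the rest for the next round.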

This is indeed suboptimal, since we now have to raise two softirqs for completions. It's still a win because we handle a lot more than one request per hard interrupt, but there's still some ground to be gained here. Next I'll work on making the top half really small and using just a single softirq: the top half stays essentially the same as the NAPI one listed above, but the NAPI bottom half no longer raises a second softirq and just completes everything from the BLOCK_NAPI_SOFTIRQ path.
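As a back-of-the-envelope illustration of that trade-off, the toy model below counts hard interrupts and softirq runs for a given number of requests. It is only a sketch: count_runs() and struct softirq_cost are made-up names, and it assumes every hard interrupt reaps the same number of requests and triggers a fixed number of softirq runs (one for the existing path or the planned single-softirq path, two for the current prototype). No measured numbers are implied.

```c
#include <assert.h>

/* Toy accounting of interrupt work (hypothetical, for illustration only). */
struct softirq_cost {
    long hard_irqs;     /* hardware interrupts taken */
    long softirq_runs;  /* softirq invocations that followed */
};

/*
 * requests:         total IO completions to handle
 * per_irq:          completions reaped per hard interrupt (assumed constant)
 * softirqs_per_irq: softirq runs triggered by each hard interrupt
 */
static struct softirq_cost count_runs(long requests, long per_irq,
                                      int softirqs_per_irq)
{
    struct softirq_cost c;

    assert(requests >= 0 && per_irq > 0 && softirqs_per_irq > 0);

    /* Round up: a partially filled batch still costs one interrupt. */
    c.hard_irqs = (requests + per_irq - 1) / per_irq;
    c.softirq_runs = c.hard_irqs * softirqs_per_irq;
    return c;
}
```

For 1000 requests, one completion per interrupt and one softirq gives 1000 interrupts and 1000 softirq runs. Eight completions per interrupt with two softirqs gives 125 interrupts and 250 runs, and dropping to a single softirq would bring that to 125 runs.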
Tags: io, linux