NAME
BUFFERIO, biodone, biowait, getiobuf, putiobuf, nestiobuf_setup,
nestiobuf_done - block I/O buffer transfers
SYNOPSIS
#include <sys/buf.h>

void
biodone(struct buf *bp);

int
biowait(struct buf *bp);

struct buf *
getiobuf(struct vnode *vp, bool waitok);

void
putiobuf(struct buf *bp);

void
nestiobuf_setup(struct buf *mbp, struct buf *bp, int offset, size_t size);

void
nestiobuf_done(struct buf *mbp, int donebytes, int error);
DESCRIPTION
The BUFFERIO subsystem manages block I/O buffer transfers, described by
the struct buf structure. This structure serves several roles: it is
shared by users of BUFFERIO, by users of buffercache(9), and by block
device drivers that execute transfers to physical disks.
BLOCK DEVICE USERS
Users of BUFFERIO wishing to submit a buffer for block I/O transfer must
obtain a struct buf, e.g. via getiobuf(), fill in its parameters, and
submit it to a block device with bdev_strategy(9), usually via
VOP_STRATEGY(9).
The parameters to an I/O transfer described by bp are specified by the
following struct buf fields:

bp->b_flags
    Flags specifying the type of transfer.

    B_READ    Transfer is a read from the device. If not set, the
              transfer is a write to the device.
    B_ASYNC   Asynchronous I/O. The caller must not provide
              bp->b_iodone and must not call biowait(bp).

    For legibility, callers should indicate writes by passing the
    pseudo-flag B_WRITE, which is zero.

bp->b_data
    Pointer to the kernel virtual address of the source or target of the
    transfer.

bp->b_bcount
    Nonnegative number of bytes requested for transfer.

bp->b_blkno
    Block number at which to do the transfer.

bp->b_iodone
    I/O completion callback. If it is set, B_ASYNC must not be set in
    bp->b_flags.
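The field rules above can be checked mechanically. The following is a user-space model only, not kernel code: struct model_buf, model_fill_read(), model_submit_ok(), model_noop_iodone() and the MODEL_B_* constants are hypothetical names introduced for illustration, and the flag values are assumptions that need not match <sys/buf.h>. It encodes just the two constraints stated above: b_bcount is nonnegative, and B_ASYNC excludes a b_iodone callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; the real B_READ and
 * B_ASYNC come from <sys/buf.h> and may differ. */
#define MODEL_B_WRITE	0x0	/* pseudo-flag: zero, as documented */
#define MODEL_B_READ	0x1
#define MODEL_B_ASYNC	0x2

struct model_buf;
typedef void (*model_iodone_t)(struct model_buf *);

/* User-space stand-in for the struct buf fields a submitter fills in. */
struct model_buf {
	int		b_flags;	/* MODEL_B_READ, MODEL_B_ASYNC, ... */
	void		*b_data;	/* kernel VA of source/target in the real API */
	int		b_bcount;	/* bytes requested; must be nonnegative */
	long long	b_blkno;	/* block number of the transfer */
	model_iodone_t	b_iodone;	/* completion callback, or NULL */
};

/* Example callback, used only to exercise the B_ASYNC rule. */
static void
model_noop_iodone(struct model_buf *bp)
{
	(void)bp;
}

/* Check the documented constraints before "submitting" bp. */
static bool
model_submit_ok(const struct model_buf *bp)
{
	if (bp->b_bcount < 0)
		return false;
	/* B_ASYNC and a completion callback are mutually exclusive. */
	if ((bp->b_flags & MODEL_B_ASYNC) && bp->b_iodone != NULL)
		return false;
	return true;
}

/* Fill bp for a synchronous read of nbytes at block blkno into dst. */
static void
model_fill_read(struct model_buf *bp, void *dst, int nbytes, long long blkno)
{
	bp->b_flags = MODEL_B_READ;
	bp->b_data = dst;
	bp->b_bcount = nbytes;
	bp->b_blkno = blkno;
	bp->b_iodone = NULL;	/* the caller will use biowait() instead */
}
```

In kernel code the same fields would be set on a buffer obtained from getiobuf() before passing it to VOP_STRATEGY(9).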
Additionally, if the I/O transfer is a write associated with a vnode(9)
vp, then before submitting it to a block device the user must increment
vp->v_numoutput. The user must not acquire vp's vnode lock between
incrementing vp->v_numoutput and submitting bp to a block device; doing
so is likely to deadlock with the syncer.
Completion of a block I/O transfer may be signalled by the bp->b_iodone
callback, by waking biowait() waiters, or not at all in the B_ASYNC
case.

- If the user sets the bp->b_iodone callback to a non-NULL function
  pointer, it will be called in soft interrupt context when the I/O
  transfer is complete. The user may not call biowait(bp) in this case.
- If B_ASYNC is set, then the I/O transfer is asynchronous and the user
  will not be notified when it is completed. The user may not call
  biowait(bp) in this case.
- Otherwise, if bp->b_iodone is NULL and B_ASYNC is not specified, the
  user may wait for the I/O transfer to complete with biowait(bp).
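The three cases amount to a decision on bp->b_iodone and B_ASYNC. This user-space sketch carries the same caveats as any model here: model_completion_mode(), model_may_biowait(), model_cb() and the MODEL_* names are hypothetical, and MODEL_B_ASYNC is an arbitrary value rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_B_ASYNC	0x2	/* hypothetical value, model only */

enum model_completion {
	MODEL_NOTIFY_CALLBACK,	/* b_iodone runs in soft interrupt context */
	MODEL_NOTIFY_NONE,	/* B_ASYNC: the user is never notified */
	MODEL_NOTIFY_BIOWAIT	/* the user may sleep in biowait(bp) */
};

struct model_buf {
	int	b_flags;
	void	(*b_iodone)(struct model_buf *);
};

/* Example callback, used only to select the callback case. */
static void
model_cb(struct model_buf *bp)
{
	(void)bp;
}

/* Classify bp in the same order as the list above. */
static enum model_completion
model_completion_mode(const struct model_buf *bp)
{
	if (bp->b_iodone != NULL)
		return MODEL_NOTIFY_CALLBACK;
	if (bp->b_flags & MODEL_B_ASYNC)
		return MODEL_NOTIFY_NONE;
	return MODEL_NOTIFY_BIOWAIT;
}

/* biowait(bp) is permitted only in the last case. */
static bool
model_may_biowait(const struct model_buf *bp)
{
	return model_completion_mode(bp) == MODEL_NOTIFY_BIOWAIT;
}
```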
Once an I/O transfer has completed, its struct buf may be reused, but the
user must first clear the BO_DONE flag in bp->b_oflags.
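A minimal single-threaded model of the reuse rule, assuming that waiting for completion amounts to checking BO_DONE; under that assumption a stale BO_DONE would let a wait on the next transfer return before it has actually finished. model_biodone(), model_biowait(), model_prepare_reuse() and MODEL_BO_DONE are hypothetical stand-ins, and unlike the real biowait(), which sleeps, the model simply asserts that the transfer is already complete.

```c
#include <assert.h>
#include <errno.h>	/* EIO, used in the usage below */

#define MODEL_BO_DONE	0x1	/* hypothetical; real BO_DONE is in <sys/buf.h> */

struct model_buf {
	int	b_oflags;
	int	b_error;
};

/* Driver side: mark completion (the real biodone() also wakes waiters). */
static void
model_biodone(struct model_buf *bp)
{
	bp->b_oflags |= MODEL_BO_DONE;
}

/* Stand-in for biowait(): the transfer must already be done here. */
static int
model_biowait(const struct model_buf *bp)
{
	assert(bp->b_oflags & MODEL_BO_DONE);
	return bp->b_error;
}

/* Required step before submitting the same buffer again. */
static void
model_prepare_reuse(struct model_buf *bp)
{
	bp->b_oflags &= ~MODEL_BO_DONE;
}
```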
NESTED I/O TRANSFERS
Sometimes an I/O transfer from a single buffer in memory cannot go to a single
location on a block device: it must be split up into smaller transfers for
each segment of the memory buffer.
After initializing the b_flags, b_data, and b_bcount parameters of an
I/O transfer for the buffer, called the master buffer, the user can
issue smaller transfers for segments of the buffer using
nestiobuf_setup(). As nested I/O transfers complete, in any order, they
debit the amount of work left to be done in the master buffer. If any
segments of the buffer were skipped, the user can report this with
nestiobuf_done() to debit the skipped part of the work.
The master buffer's I/O transfer is complete once all nested buffers'
I/O transfers have completed and, if any segments were skipped,
nestiobuf_done() has been called for them.
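The work-debit accounting for a master buffer can be pictured as a byte counter. This is a hypothetical, single-threaded sketch (struct model_master, model_master_init() and model_nest_done() are illustrative names), not the implementation of nestiobuf_done(); in the kernel, nested completions can race, so the real code presumably serializes the debits. It shows that completions may arrive in any order and that a debit for skipped segments finishes the master just as a nested completion does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a master buffer's outstanding-work counter. */
struct model_master {
	int	b_bcount;	/* total bytes of the master transfer */
	int	b_resid;	/* bytes not yet accounted for */
	int	b_error;	/* error reported by some nested transfer, if any */
	bool	done;		/* stands in for the master having completed */
};

static void
model_master_init(struct model_master *mbp, int bcount)
{
	mbp->b_bcount = mbp->b_resid = bcount;
	mbp->b_error = 0;
	mbp->done = false;
}

/* Debit donebytes, as a nested completion or a skipped segment would. */
static void
model_nest_done(struct model_master *mbp, int donebytes, int error)
{
	assert(!mbp->done);
	assert(donebytes >= 0 && donebytes <= mbp->b_resid);
	mbp->b_resid -= donebytes;
	if (error != 0)
		mbp->b_error = error;
	if (mbp->b_resid == 0)
		mbp->done = true;	/* all work accounted for */
}
```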
For writes associated with a vnode vp, nestiobuf_setup() accounts for
vp->v_numoutput, so the caller must not acquire vp's vnode lock before
submitting the nested I/O transfer to a block device. However, the
caller is responsible for accounting the master buffer in
vp->v_numoutput. This must be done very carefully: after incrementing
vp->v_numoutput, the caller must not acquire vp's vnode lock until it
has either called nestiobuf_done() or submitted the last nested I/O
transfer to a block device.
For example:
    struct buf *mbp, *bp = NULL;
    struct vnode *devvp = NULL;	/* outlives the loop: used below */
    size_t skipped = 0;
    unsigned i;
    int error = 0;

    mbp = getiobuf(vp, true);
    mbp->b_data = data;
    mbp->b_resid = mbp->b_bcount = datalen;
    mbp->b_flags = B_WRITE;

    KASSERT(0 < nsegs);
    KASSERT(datalen == nsegs*segsz);
    for (i = 0; i < nsegs; i++) {
        daddr_t blkno;

        vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
        error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
        VOP_UNLOCK(vp);
        if (error == 0 && blkno == -1)
            error = EIO;
        if (error) {
            /* Give up early, don't try to handle holes. */
            skipped += datalen - i*segsz;
            break;
        }

        /* Master buffer first, then the nested buffer. */
        bp = getiobuf(vp, true);
        nestiobuf_setup(mbp, bp, i*segsz, segsz);
        bp->b_blkno = blkno;
        if (i == nsegs - 1)	/* Last segment: submitted below. */
            break;
        VOP_STRATEGY(devvp, bp);
    }

    /*
     * Account v_numoutput for master write.
     * (Must not vn_lock before last VOP_STRATEGY!)
     */
    mutex_enter(vp->v_interlock);
    vp->v_numoutput++;
    mutex_exit(vp->v_interlock);

    if (skipped)
        nestiobuf_done(mbp, skipped, error);	/* debit the skipped tail */
    else
        VOP_STRATEGY(devvp, bp);		/* submit the last segment */
BLOCK DEVICE DRIVERS
Block device drivers implement a ‘strategy’ method, in the d_strategy
member of struct bdevsw (driver(9)), to queue a buffer for disk I/O. The
inputs to the strategy method are:

bp->b_flags
    Flags specifying the type of transfer.

    B_READ    Transfer is a read from the device. If not set, the
              transfer is a write to the device.

bp->b_data
    Pointer to the kernel virtual address of the source or target of the
    transfer.

bp->b_bcount
    Nonnegative number of bytes requested for transfer.

bp->b_blkno
    Block number at which to do the transfer, relative to the start of
    the partition.
If the strategy method uses bufq(9), it must additionally initialize the
following field before queueing bp with bufq_put(9):

bp->b_rawblkno
    Block number relative to the start of the volume.
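Drivers commonly derive b_rawblkno by adding the partition's starting block to b_blkno. The sketch below assumes both are measured in the same block units (MODEL_BSIZE, taken to be 512 bytes) and adds a bounds check that this page does not itself require; struct model_partition, struct model_buf and model_set_rawblkno() are hypothetical names for a user-space model.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BSIZE	512	/* assumed block size in bytes for this model */

/* Hypothetical partition descriptor: start and length in blocks. */
struct model_partition {
	long long	start;
	long long	nblocks;
};

struct model_buf {
	long long	b_blkno;	/* relative to partition start */
	long long	b_rawblkno;	/* relative to volume start */
	int		b_bcount;
};

/*
 * Translate b_blkno to b_rawblkno before queueing; reject transfers
 * that would run past the end of the partition (an assumed check).
 */
static bool
model_set_rawblkno(struct model_buf *bp, const struct model_partition *pp)
{
	long long nblks = (bp->b_bcount + MODEL_BSIZE - 1) / MODEL_BSIZE;

	if (bp->b_blkno < 0 || bp->b_blkno + nblks > pp->nblocks)
		return false;
	bp->b_rawblkno = pp->start + bp->b_blkno;
	return true;
}
```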
When the I/O transfer is complete, whether it succeeded or failed, the
strategy method must:

- Set bp->b_error to zero on success, or to an errno(2) error code on
  failure.
- Set bp->b_resid to the number of bytes remaining to transfer, whether
  on success or on failure. If no bytes were transferred, this must be
  set to bp->b_bcount.
- Call biodone(bp).
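The three obligations can be illustrated with a synchronous strategy routine over an in-memory disk. Everything here is a hypothetical user-space model (model_strategy(), model_biodone(), model_disk and the MODEL_* constants); a real driver queues bp, performs the transfer asynchronously, and calls the real biodone() from its completion path. Failing out-of-range requests with EINVAL is an assumption of the model, not documented behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_B_READ	0x1	/* hypothetical flag value */
#define MODEL_BSIZE	512	/* assumed block size */
#define MODEL_NBLOCKS	8	/* size of the model disk in blocks */

struct model_buf {
	int		b_flags;
	void		*b_data;
	int		b_bcount;
	long long	b_blkno;
	int		b_error;
	int		b_resid;
	bool		b_done;		/* stands in for BO_DONE */
};

static unsigned char model_disk[MODEL_NBLOCKS * MODEL_BSIZE];

static void
model_biodone(struct model_buf *bp)
{
	bp->b_done = true;
}

/* Set b_error, set b_resid, then call biodone, on every path. */
static void
model_strategy(struct model_buf *bp)
{
	if (bp->b_bcount < 0 || bp->b_blkno < 0 ||
	    bp->b_blkno > MODEL_NBLOCKS ||
	    bp->b_blkno * MODEL_BSIZE + bp->b_bcount >
	    (long long)sizeof(model_disk)) {
		bp->b_error = EINVAL;
		bp->b_resid = bp->b_bcount;	/* nothing transferred */
	} else {
		long long off = bp->b_blkno * MODEL_BSIZE;

		if (bp->b_flags & MODEL_B_READ)
			memcpy(bp->b_data, model_disk + off, (size_t)bp->b_bcount);
		else
			memcpy(model_disk + off, bp->b_data, (size_t)bp->b_bcount);
		bp->b_error = 0;
		bp->b_resid = 0;		/* everything transferred */
	}
	model_biodone(bp);
}
```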
FUNCTIONS
biodone(bp)
    Notify that the I/O transfer described by bp has completed.

    To be called by a block device driver. The caller must first set
    bp->b_error to zero on success or to an error code on failure, and
    bp->b_resid to the number of bytes remaining to transfer.

biowait(bp)
    Wait for the synchronous I/O transfer described by bp to complete.
    Returns the value of bp->b_error.

    To be called by a user requesting the I/O transfer. May not be
    called if bp has a callback or is asynchronous, that is, if
    bp->b_iodone is set or if B_ASYNC is set in bp->b_flags.

getiobuf(vp, waitok)
    Allocate a struct buf for an I/O transfer. If vp is non-NULL, the
    transfer is associated with it. If waitok is false, returns NULL if
    no buffer can be allocated immediately.

    The resulting struct buf pointer must eventually be passed to
    putiobuf() to release it. Do not use brelse(9).

    The buffer may not be used for an asynchronous I/O transfer, because
    there is no way to know when the transfer has completed and the
    buffer can safely be passed to putiobuf(). Asynchronous I/O
    transfers are allowed only for buffers in the buffercache(9).

    May sleep if waitok is true.

putiobuf(bp)
    Free bp, which must have been allocated by getiobuf(). Either bp
    must never have been submitted to a block device, or its I/O
    transfer must have completed.
CODE REFERENCES
The BUFFERIO subsystem is implemented in sys/kern/vfs_bio.c.
SEE ALSO
buffercache(9), bufq(9)
BUGS
The BUFFERIO abstraction provides no way to cancel an I/O transfer once
it has been submitted to a block device.

The BUFFERIO abstraction provides no way to do I/O transfers with
non-kernel pages, e.g. directly to buffers in userland without copying
into the kernel first.

The struct buf type is entangled with buffercache(9): the same structure
serves both BUFFERIO transfers and the buffer cache.

The BUFFERIO abstraction is a totally idiotic API design.

The v_numoutput accounting required of BUFFERIO callers is asinine.