NAME
rumpuser - rump kernel hypercall interface
LIBRARY
rump User Library (librumpuser, -lrumpuser)
SYNOPSIS
#include <rump/rumpuser.h>
DESCRIPTION
The
rumpuser hypercall interfaces allow a rump kernel to
access host resources. A hypervisor implementation must implement the routines
described in this document to allow a rump kernel to run on the host. The
implementation included in
NetBSD is for POSIX-like
hosts (*BSD, Linux, etc.). This document is divided into sections based on the
functionality group of each hypercall.
Since the hypercall interface is a C function interface, both the rump kernel
and the hypervisor must conform to the same ABI. The interface itself attempts
to assume as little as possible from the type systems, and for example
off_t is passed as
int64_t and
enums are passed as ints. It is recommended that the hypervisor converts these
to the native types before starting to process the hypercall, for example by
assigning the ints back to enums.
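The recommended conversion can be sketched as follows. This is only an illustration: the enum and function names are hypothetical and are not part of rumpuser.h, and a real hypervisor would use the enum definitions from that header.

```c
#include <errno.h>

/* Hypothetical hypervisor-side enum; the real constants come from <rump/rumpuser.h>. */
enum demo_clock { DEMO_CLOCK_RELWALL, DEMO_CLOCK_ABSMONO };

/*
 * Convert the int received over the hypercall ABI into the native enum.
 * Returns 0 on success or EINVAL if the value is not a known enumerator.
 */
int
demo_int_to_clock(int raw, enum demo_clock *out)
{
	switch (raw) {
	case DEMO_CLOCK_RELWALL:
	case DEMO_CLOCK_ABSMONO:
		*out = (enum demo_clock)raw;
		return 0;
	default:
		return EINVAL;
	}
}
```

Doing the conversion once, at the top of the hypercall, keeps the rest of the implementation free of raw integers.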
UPCALLS AND RUMP KERNEL CONTEXT
A hypercall is always entered with the calling thread scheduled in the rump
kernel. In case the hypercall intends to block while waiting for an event, the
hypervisor must first release the rump kernel scheduling context. In other
words, the rump kernel context is a resource and holding on to it while
waiting for a rump kernel event/resource may lead to a deadlock. Even when
there is no possibility of deadlock in the strict sense of the term, holding
on to the rump kernel context while performing a slow hypercall such as
reading a device will prevent other threads (including the clock interrupt)
from using that rump kernel context.
Releasing the context is done by calling the
hyp_backend_unschedule() upcall which the hypervisor
received from the rump kernel as a parameter for
rumpuser_init(). Before a hypercall returns back to the rump
kernel, the returning thread must carry a rump kernel context. In case the
hypercall unscheduled itself, it must reschedule itself by calling
hyp_backend_schedule().
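The release and reacquire pattern described above might look like the sketch below. The upcall table here is a simplified, hypothetical stand-in: the real struct rump_hyperup and the exact upcall signatures are defined in rumpuser.h and are not reproduced here. The stub upcalls only track whether the context is held.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified upcall table (an assumption for illustration).
 * The real upcalls are delivered through struct rump_hyperup.
 */
struct demo_upcalls {
	void (*unschedule)(void);
	void (*schedule)(void);
};

/*
 * Run a potentially blocking host operation without holding the rump
 * kernel context: release it, block, then take it back before returning
 * into the rump kernel.
 */
int
demo_blocking_hypercall(const struct demo_upcalls *up,
    int (*blocking_op)(void *), void *arg)
{
	int rv;

	up->unschedule();
	rv = blocking_op(arg);
	up->schedule();
	return rv;
}

/* Stub upcalls for demonstration: track whether the context is held. */
static int demo_ctx_held = 1;
static void demo_unschedule(void) { demo_ctx_held = 0; }
static void demo_schedule(void) { demo_ctx_held = 1; }

/* Example "slow" operation: records whether it ran with the context held. */
static int
demo_slow_op(void *arg)
{
	*(int *)arg = demo_ctx_held;
	return 0;
}
```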
HYPERCALL INTERFACES
Initialization
int rumpuser_init(
int
version,
struct rump_hyperup *hyp)
Initialize the hypervisor.
-
-
- version
- hypercall interface version number that the kernel expects
to be used. In case the hypervisor cannot provide an exact match, this
routine must return a non-zero value.
-
-
- hyp
- pointer to a set of upcalls the hypervisor can make into
the rump kernel
Memory allocation
int
rumpuser_malloc(
size_t len,
int alignment,
void **memp)
-
-
- len
- amount of memory to allocate
-
-
- alignment
- size the returned memory must be aligned to. For example,
if the value passed is 4096, the returned memory must be aligned to a 4k
boundary.
-
-
- memp
- return pointer for allocated memory
void rumpuser_free(
void
*mem,
size_t len)
-
-
- mem
- memory to free
-
-
- len
- length of allocation. This is always equal to the amount
the caller requested from the rumpuser_malloc() which
returned mem.
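On a POSIX-like host, the allocation pair could be sketched on top of posix_memalign(). The demo_ names are hypothetical, and this is one possible approach under that assumption, not a description of any particular implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>

/* Allocate len bytes aligned to alignment; returns 0 or an errno value. */
int
demo_malloc(size_t len, int alignment, void **memp)
{
	size_t align = (size_t)alignment;
	void *mem = NULL;
	int rv;

	/* posix_memalign() requires at least pointer-sized alignment. */
	if (align < sizeof(void *))
		align = sizeof(void *);

	/* Fails with EINVAL for a non-power-of-two alignment, ENOMEM if out of memory. */
	rv = posix_memalign(&mem, align, len);
	if (rv != 0)
		return rv;
	*memp = mem;
	return 0;
}

void
demo_free(void *mem, size_t len)
{
	(void)len;	/* free() does not need the length; other allocators might */
	free(mem);
}
```

The len argument to the free routine is useful for hosts whose allocator, unlike free(), needs the size of the region being released.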
Files and I/O
int rumpuser_open(
const
char *name,
int mode,
int
*fdp)
Open
name for I/O and associate a file descriptor with it.
Notably, there need not be any mapping between
name and
the host's file system namespace. For example, it is possible to associate the
file descriptor with device I/O registers for special values of
name.
-
-
- name
- the identifier of the file to open for I/O
-
-
- mode
- combination of the following:
-
-
RUMPUSER_OPEN_RDONLY
- open only for reading
-
-
RUMPUSER_OPEN_WRONLY
- open only for writing
-
-
RUMPUSER_OPEN_RDWR
- open for reading and writing
-
-
RUMPUSER_OPEN_CREATE
- do not treat missing name as an
error
-
-
RUMPUSER_OPEN_EXCL
- combined with
RUMPUSER_OPEN_CREATE
, flag an error if
name already exists
-
-
RUMPUSER_OPEN_BIO
- the caller will use this file for block I/O, usually
used in conjunction with accessing file system media. The hypervisor
should treat this flag as advisory and possibly enable some
optimizations for *fdp based on it.
Notably, the permissions of the created file are left up to the hypervisor
implementation.
-
-
- fdp
- An integer value denoting the open file is returned
here.
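For a host where name does map to the file system, translating mode to open(2) flags could be sketched as below. The numeric flag values are assumptions made for the example; the authoritative definitions are the RUMPUSER_OPEN_ constants in rumpuser.h.

```c
#include <fcntl.h>

/* Assumed flag values for illustration only; see <rump/rumpuser.h> for the real ones. */
#define DEMO_OPEN_RDONLY  0x0000
#define DEMO_OPEN_WRONLY  0x0001
#define DEMO_OPEN_RDWR    0x0002
#define DEMO_OPEN_ACCMODE 0x0003
#define DEMO_OPEN_CREATE  0x0004
#define DEMO_OPEN_EXCL    0x0008
#define DEMO_OPEN_BIO     0x0010

/* Translate hypercall open flags to host open(2) flags; -1 on a bad access mode. */
int
demo_open_flags(int mode)
{
	int flags;

	switch (mode & DEMO_OPEN_ACCMODE) {
	case DEMO_OPEN_RDONLY:
		flags = O_RDONLY;
		break;
	case DEMO_OPEN_WRONLY:
		flags = O_WRONLY;
		break;
	case DEMO_OPEN_RDWR:
		flags = O_RDWR;
		break;
	default:
		return -1;
	}
	if (mode & DEMO_OPEN_CREATE)
		flags |= O_CREAT;
	if (mode & DEMO_OPEN_EXCL)
		flags |= O_EXCL;
	/* DEMO_OPEN_BIO is advisory; this sketch ignores it. */
	return flags;
}
```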
int rumpuser_close(
int
fd)
Close a previously opened file descriptor.
int
rumpuser_getfileinfo(
const char *name,
uint64_t *size,
int *type)
-
-
- name
- file for which information is returned. The namespace is
equal to that of rumpuser_open().
-
-
- size
- If non-
NULL
, size of the file is
returned here.
-
-
- type
- If non-
NULL
, type of the file is
returned here. The options are RUMPUSER_FT_DIR
,
RUMPUSER_FT_REG
,
RUMPUSER_FT_BLK
,
RUMPUSER_FT_CHR
, or
RUMPUSER_FT_OTHER
for directory, regular file,
block device, character device or unknown, respectively.
void rumpuser_bio(
int
fd,
int op,
void *data,
size_t dlen,
int64_t off,
rump_biodone_fn biodone,
void
*donearg);
Initiate block I/O and return immediately.
-
-
- fd
- perform I/O on this file descriptor. The file descriptor
must have been opened with
RUMPUSER_OPEN_BIO
.
-
-
- op
- Transfer data from the file descriptor with
RUMPUSER_BIO_READ
and transfer data to the file
descriptor with RUMPUSER_BIO_WRITE
. Unless
RUMPUSER_BIO_SYNC
is specified, the hypervisor may
cache a write instead of committing it to permanent storage.
-
-
- data
- memory address to transfer data to/from
-
-
- dlen
- length of I/O. The length is guaranteed to be a multiple of
512.
-
-
- off
- offset into fd where I/O is
performed
-
-
- biodone
- To be called when the I/O is complete. Accessing
data is not legal after the call is made.
-
-
- donearg
- opaque arg that must be passed to
biodone.
int rumpuser_iovread(
int
fd,
struct rumpuser_iovec *ruiov,
size_t iovlen,
int64_t off,
size_t *retv);
int
rumpuser_iovwrite(
int fd,
struct rumpuser_iovec *ruiov,
size_t
iovlen,
int64_t off,
size_t
*retv);
These routines perform scatter-gather I/O which is not block I/O by nature and
therefore cannot be handled by
rumpuser_bio().
-
-
- fd
- file descriptor to perform I/O on
-
-
- ruiov
- an array of I/O descriptors. It is defined as follows:
struct rumpuser_iovec {
void *iov_base;
size_t iov_len;
};
-
-
- iovlen
- number of elements in ruiov
-
-
- off
- offset of fd to perform I/O on. This
can either be a non-negative value or
RUMPUSER_IOV_NOSEEK
. The latter denotes that no
attempt to change the underlying object's offset should be made. Using both
types of offsets on a single instance of fd results
in undefined behavior.
-
-
- retv
- number of bytes successfully transferred is returned
here
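A read-side sketch of the offset handling follows. It loops over the descriptors with read(2) or pread(2) for clarity, whereas a production implementation would more likely use a single vectored call. The demo_ names and the value chosen for the no-seek marker are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_IOV_NOSEEK (-1)	/* assumed value of the "do not seek" marker */

struct demo_iovec {
	void *iov_base;
	size_t iov_len;
};

/* Scatter-read into iov; returns 0 and the byte count in *retv, or an errno value. */
int
demo_iovread(int fd, struct demo_iovec *iov, size_t iovlen, int64_t off,
    size_t *retv)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < iovlen; i++) {
		ssize_t n;

		if (off == DEMO_IOV_NOSEEK)	/* use and advance the fd's own offset */
			n = read(fd, iov[i].iov_base, iov[i].iov_len);
		else				/* explicit offset, fd offset untouched */
			n = pread(fd, iov[i].iov_base, iov[i].iov_len,
			    (off_t)(off + (int64_t)total));
		if (n == -1)
			return errno;
		total += (size_t)n;
		if ((size_t)n < iov[i].iov_len)
			break;	/* short read: no more data for now */
	}
	*retv = total;
	return 0;
}
```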
int rumpuser_syncfd(
int
fd,
int flags,
uint64_t
start,
uint64_t len);
Synchronizes
fd with respect to backing storage. The other
arguments are:
-
-
- flags
- controls how synchronization happens. It must contain one
of the following:
-
-
RUMPUSER_SYNCFD_READ
- Make sure that the next read sees writes from all other
parties. This is useful for example in the case that
fd represents memory into which a DMA read is
being performed.
-
-
RUMPUSER_SYNCFD_WRITE
- Flush cached writes.
The following additional parameters may be passed in
flags:
-
-
RUMPUSER_SYNCFD_BARRIER
- Issue a barrier. Outstanding I/O operations which were
started before the barrier complete before any operations after the
barrier are performed.
-
-
RUMPUSER_SYNCFD_SYNC
- Wait for the synchronization operation to fully
complete before returning. For example, this could mean that the data
to be written to a disk has hit either the disk or non-volatile
memory.
-
-
- start
- offset into the object.
-
-
- len
- the number of bytes to synchronize. The value 0 denotes
until the end of the object.
Clocks
The hypervisor should support two clocks, one for wall time and one for
monotonically increasing time, the latter of which may be based on some
arbitrary time (e.g. system boot time). If this is not possible, the
hypervisor must make a reasonable effort to retain semantics.
int
rumpuser_clock_gettime(
int
enum_rumpclock,
int64_t *sec,
long
*nsec)
-
-
- enum_rumpclock
- specifies the clock type. In case of
RUMPUSER_CLOCK_RELWALL
the wall time should be
returned. In case of RUMPUSER_CLOCK_ABSMONO
the
time of a monotonic clock should be returned.
-
-
- sec
- return value for seconds
-
-
- nsec
- return value for nanoseconds
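On a POSIX-like host the two clocks map naturally onto CLOCK_REALTIME and CLOCK_MONOTONIC. The sketch below assumes such a host; the enum and function names are hypothetical stand-ins for the definitions in rumpuser.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the RUMPUSER_CLOCK_ enumerators. */
enum demo_clk { DEMO_CLK_RELWALL, DEMO_CLK_ABSMONO };

/* Return the current time of the requested clock; 0 or an errno value. */
int
demo_clock_gettime(int enum_clk, int64_t *sec, long *nsec)
{
	struct timespec ts;
	clockid_t id;

	switch (enum_clk) {
	case DEMO_CLK_RELWALL:
		id = CLOCK_REALTIME;	/* wall time */
		break;
	case DEMO_CLK_ABSMONO:
		id = CLOCK_MONOTONIC;	/* monotonic, arbitrary epoch */
		break;
	default:
		return EINVAL;
	}
	if (clock_gettime(id, &ts) == -1)
		return errno;
	*sec = (int64_t)ts.tv_sec;
	*nsec = ts.tv_nsec;
	return 0;
}
```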
int
rumpuser_clock_sleep(
int
enum_rumpclock,
int64_t sec,
long
nsec)
-
-
- enum_rumpclock
- In case of
RUMPUSER_CLOCK_RELWALL
,
the sleep should last at least as long as specified. In case of
RUMPUSER_CLOCK_ABSMONO
, the sleep should last
until the hypervisor monotonic clock hits the specified absolute
time.
-
-
- sec
- sleep duration, seconds. Exact semantics depend on
enum_rumpclock.
-
-
- nsec
- sleep duration, nanoseconds. Exact semantics depend on
enum_rumpclock.
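For the absolute monotonic case, a host that only offers relative sleeps has to convert the target time into a remaining duration. A minimal sketch of that arithmetic, using a hypothetical demo_ts type, is shown below. Hosts that support absolute sleeps natively can skip this step.

```c
#include <stdint.h>

/* Hypothetical seconds/nanoseconds pair used only in this example. */
struct demo_ts {
	int64_t sec;
	long nsec;
};

/*
 * Compute how long to sleep to reach the absolute time "target" when the
 * monotonic clock currently reads "now". Returns zero if the target has
 * already passed.
 */
struct demo_ts
demo_abs_to_rel(struct demo_ts now, struct demo_ts target)
{
	struct demo_ts rel = { 0, 0 };

	if (target.sec < now.sec ||
	    (target.sec == now.sec && target.nsec <= now.nsec))
		return rel;

	rel.sec = target.sec - now.sec;
	rel.nsec = target.nsec - now.nsec;
	if (rel.nsec < 0) {		/* borrow one second */
		rel.sec--;
		rel.nsec += 1000000000L;
	}
	return rel;
}
```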
Parameter retrieval
int
rumpuser_getparam(
const char *name,
void *buf,
size_t buflen)
Retrieve a configuration parameter from the hypervisor. It is up to the
hypervisor to decide how the parameters can be set.
-
-
- name
- name of the parameter. If the name starts with an
underscore, it means a mandatory parameter. The mandatory parameters are
RUMPUSER_PARAM_NCPU
which specifies the number of
virtual CPUs bootstrapped by the rump kernel and
RUMPUSER_PARAM_HOSTNAME
which returns a preferably
unique instance name for the rump kernel.
-
-
- buf
- buffer to return the data in as a string
-
-
- buflen
- length of buffer
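Since the policy for setting parameters is left to the hypervisor, the following is just one possible arrangement: mandatory parameters are answered with fixed values and everything else is looked up in the process environment. The parameter name strings, the fixed values and the error codes chosen here are all assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed names for illustration; the real ones are defined in <rump/rumpuser.h>. */
#define DEMO_PARAM_NCPU     "_RUMPUSER_NCPU"
#define DEMO_PARAM_HOSTNAME "_RUMPUSER_HOSTNAME"

/* Copy val into buf as a NUL-terminated string, or fail if it does not fit. */
static int
demo_copy(const char *val, void *buf, size_t buflen)
{
	size_t need = strlen(val) + 1;

	if (need > buflen)
		return E2BIG;
	memcpy(buf, val, need);
	return 0;
}

int
demo_getparam(const char *name, void *buf, size_t buflen)
{
	const char *val;

	if (strcmp(name, DEMO_PARAM_NCPU) == 0)
		return demo_copy("1", buf, buflen);	/* fixed: one virtual CPU */
	if (strcmp(name, DEMO_PARAM_HOSTNAME) == 0)
		return demo_copy("rump-demo", buf, buflen);
	if (name[0] == '_')
		return EINVAL;	/* unknown mandatory parameter */

	/* Optional parameters come from the environment in this sketch. */
	val = getenv(name);
	if (val == NULL)
		return ENOENT;
	return demo_copy(val, buf, buflen);
}
```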
Termination
void rumpuser_exit(
int
value)
Terminate the rump kernel with exit value
value. If
value is
RUMPUSER_PANIC
the
hypervisor should attempt to provide something akin to a core dump.
Console output
Console output is divided into two routines: a per-character one and a printf-like
one. The former is used e.g. by the rump kernel's internal printf routine. The
latter can be used for direct debug prints e.g. very early on in the rump
kernel's bootstrap or when using the in-kernel routine causes too much skew in
the debug print results (the hypercall runs outside of the rump kernel and
therefore does not cause any locking or scheduling events inside the rump
kernel).
void
rumpuser_putchar(
int ch)
Output
ch on the console.
void
rumpuser_dprintf(
const char *fmt,
...)
Do output based on printf-like parameters.
Signals
A rump kernel should be able to send signals to client programs due to some
standard interfaces including signal delivery in their specifications.
Examples of these interfaces include
setitimer(2) and
write(2). The
rumpuser_kill() function advises the hypercall
implementation to raise a signal for the process containing the rump kernel.
int
rumpuser_kill(
int64_t pid,
int sig)
-
-
- pid
- The pid of the rump kernel process that the signal is
directed to. This value may be used by the hypervisor as a hint on how to
deliver the signal. The value
RUMPUSER_PID_SELF
may also be specified to indicate no hint. This value will be removed in a
future version of the hypercall interface.
-
-
- sig
- Number of signal to raise. The value is in NetBSD signal
number namespace. In case the host has a native representation for
signals, the value should be translated before the signal is raised. In
case there is no mapping between sig and native
signals (if any), the behavior is implementation-defined.
A rump kernel will ignore the return value of this hypercall. The only
implication of not implementing
rumpuser_kill() is that some
application programs may not experience expected behavior for standard
interfaces.
As an aside, the
rump_sp(7)
protocol provides equivalent functionality for remote clients.
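Translating sig from the NetBSD signal number namespace to the host's could be sketched with a small table, as below. Only a handful of signals are covered, and returning -1 for an unmapped signal is a choice made for this example, since the interface leaves that case implementation-defined.

```c
#include <signal.h>

/* NetBSD signal numbers for a small subset of signals. */
#define DEMO_NBSD_SIGINT   2
#define DEMO_NBSD_SIGPIPE  13
#define DEMO_NBSD_SIGALRM  14
#define DEMO_NBSD_SIGTERM  15
#define DEMO_NBSD_SIGUSR1  30
#define DEMO_NBSD_SIGUSR2  31

/* Map a NetBSD signal number to the host's; -1 if this sketch has no mapping. */
int
demo_sig_to_host(int nbsig)
{
	switch (nbsig) {
	case DEMO_NBSD_SIGINT:
		return SIGINT;
	case DEMO_NBSD_SIGPIPE:
		return SIGPIPE;
	case DEMO_NBSD_SIGALRM:
		return SIGALRM;
	case DEMO_NBSD_SIGTERM:
		return SIGTERM;
	case DEMO_NBSD_SIGUSR1:
		return SIGUSR1;
	case DEMO_NBSD_SIGUSR2:
		return SIGUSR2;
	default:
		return -1;
	}
}
```

Using the host's symbolic names on the right-hand side matters because values such as SIGUSR1 differ between NetBSD and, for example, Linux.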
Random pool
int
rumpuser_getrandom(
void *buf,
size_t buflen,
int flags,
size_t *retp)
-
-
- buf
- buffer that the randomness is written to
-
-
- buflen
- number of bytes of randomness requested
-
-
- flags
- The value 0 or a combination of
RUMPUSER_RANDOM_HARD
(return true randomness
instead of something from a PRNG) and
RUMPUSER_RANDOM_NOWAIT
(do not block in case the
requested number of bytes is not available).
-
-
- retp
- The number of random bytes written into
buf.
Threads
int
rumpuser_thread_create(
void *(*fun)(void
*),
void *arg,
const char
*thrname,
int mustjoin,
int
priority,
int cpuidx,
void
**cookie);
Create a schedulable host thread context. The rump kernel will call this
interface when it creates a kernel thread. The scheduling policy for the new
thread is defined by the hypervisor. In case the hypervisor wants to optimize
the scheduling of the threads, it can perform heuristics on the
thrname,
priority and
cpuidx parameters.
-
-
- fun
- function that the new thread must call. This call will
never return.
-
-
- arg
- argument to be passed to fun
-
-
- thrname
- Name of the new thread.
-
-
- mustjoin
- If 1, the thread will be waited for by
rumpuser_thread_join() when the thread exits.
-
-
- priority
- The priority that the kernel requested the thread to be
created at. Higher values mean higher priority. The exact kernel semantics
for each value are not available through this interface.
-
-
- cpuidx
- The index of the virtual CPU that the thread is bound to,
or -1 if the thread is not bound. The mapping between the virtual CPUs and
physical CPUs, if any, is hypervisor implementation specific.
-
-
- cookie
- In case mustjoin is set, the value
returned in cookie will be passed to
rumpuser_thread_join().
void
rumpuser_thread_exit(
void)
Called when a thread created with
rumpuser_thread_create()
exits.
int
rumpuser_thread_join(
void *cookie)
Wait for a joinable thread to exit. The cookie matches the value from
rumpuser_thread_create().
void
rumpuser_curlwpop(
int enum_rumplwpop,
struct lwp *l)
Manipulate the hypervisor's thread context database. The possible operations are
create, destroy, and set as specified by
enum_rumplwpop:
-
-
RUMPUSER_LWP_CREATE
- Inform the hypervisor that l is now a
valid thread context which may be set. A currently valid value of
l may not be specified. This operation is
informational and does not mandate any action from the hypervisor.
-
-
RUMPUSER_LWP_DESTROY
- Inform the hypervisor that l is no
longer a valid thread context. This means that it may no longer be set as
the current context. A currently set context or an invalid one may not be
destroyed. This operation is informational and does not mandate any action
from the hypervisor.
-
-
RUMPUSER_LWP_SET
- Set l as the current host thread's
rump kernel context. A previous context must not exist.
-
-
RUMPUSER_LWP_CLEAR
- Clear the context previously set by
RUMPUSER_LWP_SET
. The value passed in
l is the current thread and is never
NULL
.
struct lwp *
rumpuser_curlwp(
void)
Retrieve the rump kernel thread context associated with the current host thread,
as set by
rumpuser_curlwpop(). This routine may be called
when a context is not set and the routine must return
NULL
in that case. This interface is expected to be
called very often. Any optimizations pertaining to the execution speed of this
routine should be done in
rumpuser_curlwpop().
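One way to keep the lookup cheap is to store the current context in thread-local storage, so that retrieval is a single load. The sketch below uses C11 _Thread_local and hypothetical demo_ names; it treats create and destroy as purely informational, as the interface permits.

```c
#include <stddef.h>

struct lwp;	/* opaque to the hypervisor */

/* Hypothetical stand-ins for the RUMPUSER_LWP_ enumerators. */
enum demo_lwpop {
	DEMO_LWP_CREATE,
	DEMO_LWP_DESTROY,
	DEMO_LWP_SET,
	DEMO_LWP_CLEAR
};

/* Per-host-thread slot holding the current rump kernel context. */
static _Thread_local struct lwp *demo_curlwp_slot;

void
demo_curlwpop(int enum_lwpop, struct lwp *l)
{
	switch (enum_lwpop) {
	case DEMO_LWP_CREATE:
	case DEMO_LWP_DESTROY:
		break;	/* informational only; nothing to do in this sketch */
	case DEMO_LWP_SET:
		demo_curlwp_slot = l;
		break;
	case DEMO_LWP_CLEAR:
		demo_curlwp_slot = NULL;
		break;
	}
}

/* Hot path: a plain thread-local load, NULL when no context is set. */
struct lwp *
demo_curlwp(void)
{
	return demo_curlwp_slot;
}
```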
void
rumpuser_seterrno(
int errno)
Set an errno value in the calling thread's TLS. Note: this is used only if rump
kernel clients make rump system calls.
Mutexes, rwlocks and condition variables
The locking interfaces have standard semantics, so we will not discuss each one
in detail. The data types
struct rumpuser_mtx,
struct rumpuser_rw and
struct
rumpuser_cv used by these interfaces are opaque to the rump kernel, i.e.
the hypervisor has complete freedom over them.
Most of these interfaces will (and must) relinquish the rump kernel CPU context
in case they block (or intend to block). The exceptions are the
"nowrap" variants of the interfaces which may not relinquish rump
kernel context.
void
rumpuser_mutex_init(
struct rumpuser_mtx
**mtxp,
int flags)
void
rumpuser_mutex_enter(
struct rumpuser_mtx
*mtx)
void
rumpuser_mutex_enter_nowrap(
struct
rumpuser_mtx *mtx)
int
rumpuser_mutex_tryenter(
struct rumpuser_mtx
*mtx)
void
rumpuser_mutex_exit(
struct rumpuser_mtx
*mtx)
void
rumpuser_mutex_destroy(
struct rumpuser_mtx
*mtx)
void
rumpuser_mutex_owner(
struct rumpuser_mtx
*mtx,
struct lwp **lp)
Mutexes provide mutually exclusive locking. The flags, of which at least one
must be given, are as follows:
-
-
RUMPUSER_MTX_SPIN
- Create a spin mutex. Locking this type of mutex must not
relinquish rump kernel context even when
rumpuser_mutex_enter() is used.
-
-
RUMPUSER_MTX_KMUTEX
- The mutex must track and be able to return the rump kernel
thread that owns the mutex (if any). If this flag is not specified,
rumpuser_mutex_owner() will never be called for that
particular mutex.
void
rumpuser_rw_init(
struct rumpuser_rw
**rwp)
void
rumpuser_rw_enter(
int enum_rumprwlock,
struct rumpuser_rw *rw)
int
rumpuser_rw_tryenter(
int
enum_rumprwlock,
struct rumpuser_rw *rw)
int
rumpuser_rw_tryupgrade(
struct rumpuser_rw
*rw)
void
rumpuser_rw_downgrade(
struct rumpuser_rw
*rw)
void
rumpuser_rw_exit(
struct rumpuser_rw
*rw)
void
rumpuser_rw_destroy(
struct rumpuser_rw
*rw)
void
rumpuser_rw_held(
int enum_rumprwlock,
struct rumpuser_rw *rw,
int
*heldp);
Read/write locks provide either shared or exclusive locking. The possible values
for
enum_rumprwlock are
RUMPUSER_RW_READER
and
RUMPUSER_RW_WRITER
. Upgrading means trying to
migrate from an already owned shared lock to an exclusive lock and downgrading
means migrating from an already owned exclusive lock to a shared lock.
void
rumpuser_cv_init(
struct rumpuser_cv
**cvp)
void
rumpuser_cv_destroy(
struct rumpuser_cv
*cv)
void
rumpuser_cv_wait(
struct rumpuser_cv
*cv,
struct rumpuser_mtx *mtx)
void
rumpuser_cv_wait_nowrap(
struct rumpuser_cv
*cv,
struct rumpuser_mtx *mtx)
int
rumpuser_cv_timedwait(
struct rumpuser_cv
*cv,
struct rumpuser_mtx *mtx,
int64_t sec,
int64_t nsec);
void
rumpuser_cv_signal(
struct rumpuser_cv
*cv)
void
rumpuser_cv_broadcast(
struct rumpuser_cv
*cv)
void
rumpuser_cv_has_waiters(
struct rumpuser_cv
*cv,
int *waitersp)
Condition variables wait for an event. The
mtx interlock
eliminates a race between checking the predicate and sleeping on the condition
variable; the mutex should be released for the duration of the sleep in the
normal atomic manner. The timedwait variant takes a specifier indicating a
relative sleep duration after which the routine will return with
ETIMEDOUT
. If a timedwait is signaled before the
timeout expires, the routine will return 0.
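Host primitives such as pthread_cond_timedwait() take an absolute deadline, so a hypervisor built on them has to convert the relative duration first. A minimal sketch of that conversion follows. The function name is hypothetical, and nsec is assumed to be non-negative.

```c
#include <stdint.h>
#include <time.h>

/*
 * Add a relative duration (sec, nsec) to "now" and return the absolute
 * deadline with tv_nsec normalized into [0, 1e9).
 * Assumes nsec >= 0 and now.tv_nsec is already normalized.
 */
struct timespec
demo_deadline(struct timespec now, int64_t sec, int64_t nsec)
{
	struct timespec ts;
	int64_t total_nsec = (int64_t)now.tv_nsec + nsec;

	/* Carry whole seconds out of the nanosecond sum. */
	ts.tv_sec = now.tv_sec + (time_t)sec + (time_t)(total_nsec / 1000000000);
	ts.tv_nsec = (long)(total_nsec % 1000000000);
	return ts;
}
```

The "now" value must be read from the same clock that the condition variable uses for its timeouts.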
The order in which the hypervisor reacquires the rump kernel context and
interlock mutex before returning into the rump kernel is as follows. In case
the interlock mutex was initialized with both
RUMPUSER_MTX_SPIN
and
RUMPUSER_MTX_KMUTEX
, the rump kernel context is
scheduled before the mutex is reacquired. In case of a purely
RUMPUSER_MTX_SPIN
mutex, the mutex is acquired first.
In the final case the order is implementation-defined.
RETURN VALUES
All routines which return an integer return an errno value. The hypervisor must
translate the value to the native errno namespace used by the rump kernel.
Routines which do not return an integer may never fail.
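The translation can be sketched as a switch from the host's symbolic errno names to NetBSD numeric values. Only a few codes are shown, and falling back to EINVAL for anything unmapped is a policy chosen for this example rather than something the interface mandates.

```c
#include <errno.h>

/* NetBSD errno values for the few codes covered by this sketch. */
#define DEMO_NBSD_ENOENT     2
#define DEMO_NBSD_EIO        5
#define DEMO_NBSD_ENOMEM     12
#define DEMO_NBSD_EINVAL     22
#define DEMO_NBSD_EAGAIN     35
#define DEMO_NBSD_ETIMEDOUT  60
#define DEMO_NBSD_ENOSYS     78

/* Translate a host errno value into the rump kernel (NetBSD) namespace. */
int
demo_errno_to_rump(int host_errno)
{
	switch (host_errno) {
	case 0:
		return 0;
	case ENOENT:
		return DEMO_NBSD_ENOENT;
	case EIO:
		return DEMO_NBSD_EIO;
	case ENOMEM:
		return DEMO_NBSD_ENOMEM;
	case EINVAL:
		return DEMO_NBSD_EINVAL;
	case EAGAIN:
		return DEMO_NBSD_EAGAIN;
	case ETIMEDOUT:
		return DEMO_NBSD_ETIMEDOUT;
	case ENOSYS:
		return DEMO_NBSD_ENOSYS;
	default:
		return DEMO_NBSD_EINVAL;	/* fallback chosen for this sketch */
	}
}
```

On a NetBSD host the mapping is the identity, but on hosts such as Linux several values (EAGAIN and ETIMEDOUT among them) differ numerically.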
SEE ALSO
rump(3)
Antti Kantee,
Flexible Operating System Internals: The Design and
Implementation of the Anykernel and Rump Kernels, Aalto
University Doctoral Dissertations, 2012,
Section 2.3.2: The Hypercall Interface.
For a list of all known implementations of the
rumpuser
interface, see
http://wiki.rumpkernel.org/Platforms.
HISTORY
The rump kernel hypercall API was first introduced in
NetBSD
5.0. The API described above first appeared in
NetBSD
7.0.