kevent

The Proposed Linux kevent API

The proposed Linux kevent API is a new unified event handling interface, similar in spirit to completion ports and the FreeBSD/OS X kqueue interface. With a single kernel call, a thread can wait for every event type the kernel can generate. Earlier interfaces only allow waiting on specific subsets of events: POSIX sigevent completions are limited to AIO completion, timer expiry, and the arrival of new messages on a message queue, while epoll is just a more efficient way of doing a traditional Unix select or poll.



The project has been closed; for details, see the links on the project homepage.


Kevent API

 int kevent_init(struct kevent_ring *ring, unsigned int ring_size, unsigned int flags);

Return value: kevent control file descriptor or negative error value.

struct kevent_ring
{
  unsigned int ring_kidx, ring_over;
  struct ukevent event[0];
};

Example userspace code (ring_buffer.c) can be found on the project's homepage.

Each kevent syscall can be a so-called cancellation point in glibc: if a thread is cancelled while inside a kevent syscall, it can be removed safely and no events are lost, because each syscall (kevent_wait() or kevent_get_events()) copies events into a special ring buffer that is accessible from other threads or even from other processes (if shared memory is used).

When a kevent is removed (not dequeued because it is ready, but explicitly removed), it is not copied into the ring buffer even if it was ready. Removal means that no one cares about the event any more; otherwise the user would have waited for it to become ready and fetched it in the usual way through kevent_get_events() or kevent_wait().
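
The ring buffer described above can be illustrated with a small userspace sketch. It assumes that ring_kidx is the slot the kernel fills next and that the reader keeps its own index (uidx here); neither detail is spelled out on this page. The struct ukevent below is a hypothetical cut-down stand-in, and ring_alloc() and ring_drain() are illustrative names, not part of the proposed API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cut-down event: the real struct ukevent has more members. */
struct ukevent {
    unsigned int id;        /* simplified identifier, assumed */
    unsigned int ret_flags;
};

struct kevent_ring {
    unsigned int ring_kidx, ring_over;
    struct ukevent event[];  /* C99 spelling of event[0] */
};

/* Allocate a zeroed ring with room for ring_size events. */
static struct kevent_ring *ring_alloc(unsigned int ring_size)
{
    return calloc(1, sizeof(struct kevent_ring) +
                     ring_size * sizeof(struct ukevent));
}

/*
 * Copy events from slot *uidx up to, but not including, ring_kidx into out.
 * Returns the number of events copied (at most max) and advances *uidx.
 * Equal indices are treated as "empty" in this sketch.
 */
static unsigned int ring_drain(const struct kevent_ring *ring,
                               unsigned int ring_size, unsigned int *uidx,
                               struct ukevent *out, unsigned int max)
{
    unsigned int n = 0;

    assert(ring_size > 0);
    while (n < max && *uidx != ring->ring_kidx) {
        out[n++] = ring->event[*uidx];
        *uidx = (*uidx + 1) % ring_size;  /* wrap at the end of the ring */
    }
    return n;
}
```

A reader that has drained the ring this way would then report its new index back with kevent_commit(), described below.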


 int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg);

Return value: number of events processed or negative error value.

When called, kevent_ctl() will carry out the operation specified in the cmd parameter.


 int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, struct timespec timeout, struct ukevent *buf, unsigned flags);

Return value: number of events copied or negative error value.

kevent_get_events() waits up to the interval given by timeout for at least min_nr completed events, copies the completed struct ukevent entries to buf, and deletes any KEVENT_REQ_ONESHOT event requests. In nonblocking mode it returns as many events as are available, but no more than max_nr. In blocking mode it waits until the timeout expires or at least min_nr events are ready.

This function also copies events into the ring buffer if one was initialized. If the ring buffer is full, the KEVENT_RET_COPY_FAILED flag is set in the ret_flags field.
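
Because the timeout is passed as a struct timespec, callers that think in milliseconds need a small conversion. The following is a minimal sketch; the helper name ms_to_timespec() is hypothetical, and only struct timespec itself is standard.

```c
#include <assert.h>
#include <time.h>

/* Convert a non-negative millisecond count into a struct timespec. */
static struct timespec ms_to_timespec(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    ts.tv_sec  = ms / 1000;               /* whole seconds */
    ts.tv_nsec = (ms % 1000) * 1000000L;  /* remainder in nanoseconds */
    return ts;
}
```

The result could then be passed as the timeout argument, for example kevent_get_events(ctl_fd, 1, 16, ms_to_timespec(500), buf, 0), assuming a userspace wrapper with the prototype above is available.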


 int kevent_wait(int ctl_fd, unsigned int num, struct timespec timeout, unsigned int flags);

Return value: number of events copied into ring buffer or negative error value.

This syscall waits until either the timeout expires or at least one event becomes ready. It also copies events into the special ring buffer. If the ring buffer is full, it waits until there are ready events and then returns. A one-shot kevent is removed in this syscall. An edge-triggered kevent (KEVENT_REQ_ET flag set in req_flags) is requeued in this syscall for performance reasons.


int kevent_commit(int ctl_fd, unsigned int new_uidx, unsigned int over);

Return value: number of committed kevents or negative error value.

This function commits slots in the ring buffer, i.e. marks them as empty, so that they can be reused once userspace has finished processing those entries.

The overflow counter guards against the case where two threads are about to free the same events but one of them is scheduled away for so long that the ring indexes wrap. Without the counter, that thread would free the wrong events when it resumes.

The returned number of committed events may be smaller than the number requested; this happens when several threads try to commit the same events.
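
The role of the overflow counter can be sketched with a small userspace model. This is not the kernel implementation: the struct, the function name model_commit(), the choice to bump the counter when the commit index wraps, and the use of -1 for a stale commit are all assumptions made for illustration.

```c
#include <assert.h>

/* Userspace model of the commit bookkeeping only. */
struct ring_model {
    unsigned int size;  /* number of slots in the ring */
    unsigned int uidx;  /* oldest slot not yet committed */
    unsigned int used;  /* slots currently holding uncommitted events */
    unsigned int over;  /* wrap counter (assumed to count index wraps) */
};

/*
 * Free the slots from uidx up to, but not including, new_uidx.
 * Returns the number of slots freed, 0 if that range was already
 * committed by someone else, or -1 if the caller's view is stale.
 */
static int model_commit(struct ring_model *r, unsigned int new_uidx,
                        unsigned int over)
{
    unsigned int n;

    assert(r->size > 0);
    if (new_uidx >= r->size)
        return -1;              /* index outside the ring */
    if (over != r->over)
        return -1;              /* caller saw an earlier lap of the ring */

    n = (new_uidx + r->size - r->uidx) % r->size;
    if (n > r->used)
        return 0;               /* range is no longer occupied */

    if (new_uidx < r->uidx)
        r->over++;              /* the commit index wrapped around */
    r->uidx = new_uidx;
    r->used -= n;
    return (int)n;
}
```

In this model a second thread repeating a commit that has already been applied gets 0 back, and a thread that slept through a full lap of the ring is rejected instead of freeing the wrong slots.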


long aio_sendfile(int kevent_fd, int sock_fd, int in_fd, off_t offset, size_t count);

Asynchronous sendfile implementation. The returned cookie identifies the matching entry returned by kevent_get_events(): it is stored in event.ptr, and event.ret_data contains the number of bytes actually transferred.


long aio_sendfile_path(int kevent_fd, int sock_fd, void *header, size_t header_size, char *filename, off_t offset, size_t count);

Asynchronous sendfile implementation. The returned cookie identifies the matching entry returned by kevent_get_events(): it is stored in event.ptr, and event.ret_data contains the number of bytes actually transferred.
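
Matching a completion to the request that produced it might look like the sketch below. The struct ukevent here is a hypothetical cut-down version holding only the two members mentioned above (their real types are not given on this page), and find_by_cookie() is an illustrative helper, not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down event with only the members used here. */
struct ukevent {
    void *ptr;              /* holds the cookie returned by aio_sendfile() */
    unsigned int ret_data;  /* bytes transferred; real type assumed */
};

/* Return the event in buf[0..nr) whose ptr equals cookie, or NULL. */
static const struct ukevent *find_by_cookie(const struct ukevent *buf,
                                            int nr, long cookie)
{
    assert(buf != NULL || nr == 0);
    for (int i = 0; i < nr; i++) {
        if ((intptr_t)buf[i].ptr == (intptr_t)cookie)
            return &buf[i];
    }
    return NULL;
}
```

After kevent_get_events() fills buf, the caller would look up the cookie it got from aio_sendfile() or aio_sendfile_path() and read ret_data from the matching entry.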


struct ukevent

Most of the interface works through the ukevent struct. It is used to add event requests, modify existing event requests, specify which event requests to remove, and return completed events.

struct ukevent contains the following members:

KEVENT flags

Kevent kernel subsystems

Usage

For KEVENT_CTL_ADD, all fields relevant to the event type must be filled in (id, type, possibly event, req_flags). After kevent_ctl(…, KEVENT_CTL_ADD, …) returns, each struct's ret_flags should be checked to see whether the event is already broken or done.

For KEVENT_CTL_MODIFY, the id, req_flags, user and event fields must be set, and an existing kevent request must have matching id and user fields. If a match is found, req_flags and event are replaced with the newly supplied values and requeueing is started, so the modified kevent can be checked and possibly marked as ready immediately. If no match is found, KEVENT_RET_BROKEN is set in the ret_flags of the passed-in ukevent. KEVENT_RET_DONE is always set.

For KEVENT_CTL_REMOVE, the id and user fields must be set, and an existing kevent request must have matching id and user fields. If a match is found, the kevent request is removed. If no match is found, KEVENT_RET_BROKEN is set in the ret_flags of the passed-in ukevent. KEVENT_RET_DONE is always set.

For kevent_get_events, the entire structure is returned.
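
The ret_flags checks described above can be sketched as a small classifier. The numeric flag values and the cut-down struct ukevent are placeholders, since neither is listed on this page, and classify() is a hypothetical helper. KEVENT_RET_BROKEN is tested first because KEVENT_RET_DONE is always set after a modify or remove, so on its own it does not indicate success.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values: the real flag definitions are not given here. */
#define KEVENT_RET_BROKEN 0x1u
#define KEVENT_RET_DONE   0x2u

/* Hypothetical cut-down event with only the field inspected here. */
struct ukevent {
    unsigned int ret_flags;
};

enum kevent_status { KEVENT_PENDING, KEVENT_READY, KEVENT_FAILED };

/* Interpret ret_flags after kevent_ctl() has returned. */
static enum kevent_status classify(const struct ukevent *ev)
{
    assert(ev != NULL);
    if (ev->ret_flags & KEVENT_RET_BROKEN)
        return KEVENT_FAILED;   /* broken, or no matching request found */
    if (ev->ret_flags & KEVENT_RET_DONE)
        return KEVENT_READY;    /* already completed */
    return KEVENT_PENDING;      /* queued, not ready yet */
}
```

A caller would loop over the array it passed to kevent_ctl() and apply this check to each entry.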


Use cases

struct ukevent should contain the following fields: