A few hints on supporting kdbus as backend in your favorite D-Bus library.

~~~

Before you read this, have a look at the DIFFERENCES and
GVARIANT_SERIALIZATION texts you find in the same directory where you
found this.

We invite you to port your favorite D-Bus protocol implementation
over to kdbus. However, there are a couple of complexities
involved. On kdbus we only speak GVariant marshaling; kdbus clients
ignore traffic in dbus1 marshaling. Thus, you need to add a second,
GVariant-compatible marshaler to your library first.

Once you have done that, here is the basic principle of how kdbus works:

You connect to a bus by opening its bus node in /sys/fs/kdbus/. All
buses have a device node there. Its name starts with the numeric UID
of the owner of the bus, followed by a dash and a string identifying
the bus. The system bus is thus called /sys/fs/kdbus/0-system, and for
user buses the device node is /sys/fs/kdbus/1000-user (if 1000 is your
user id).

(Before we proceed, please always keep a copy of libsystemd next
to you. Ultimately that's where the details are; this document is
simply a rough overview to help you grok things.)

CONNECTING

To connect to a bus, simply open() its device node and issue the
KDBUS_CMD_HELLO ioctl. That's it. Now you are connected. Do not send
a Hello message (as you would on dbus1); no such message exists on
kdbus.

The structure you pass to the ioctl will contain a couple of
parameters that you need to know, to operate on the bus.

There are two flags fields, one indicating features of the kdbus
kernel side ("conn_flags"), the other one ("bus_flags") indicating
features of the bus owner (i.e. systemd). Both flags fields are 64bit
in width.

When calling into the ioctl, you need to place your own supported
feature bits into these fields. This tells the kernel about the
features you support. When the ioctl returns, the fields will contain
the features the kernel supports.

If any of the upper 32 bits are set in either of the two flags fields
and your client does not know what they mean, it must disconnect. The
upper 32 bits are used to indicate "incompatible" feature additions on
the bus system, the lower 32 bits indicate "compatible" feature
additions. A client that does not support a "compatible" feature
addition can go on communicating with the bus, however a client that
does not support an "incompatible" feature must not proceed with the
connection. When a client encounters such an "incompatible" feature it
should immediately try the next bus address configured in the bus
address string.
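
The compatible/incompatible split described above boils down to a mask
test. The following is a minimal sketch in Python, not code taken from
libsystemd; the function name and the flag values used in the usage
lines are made up for illustration.

```python
# Upper 32 bits of a 64bit flags field carry "incompatible" additions.
INCOMPATIBLE_MASK = 0xFFFFFFFF00000000


def must_disconnect(returned_flags: int, understood_flags: int) -> bool:
    """Return True if a set "incompatible" bit is unknown to this client."""
    unknown = returned_flags & ~understood_flags
    return (unknown & INCOMPATIBLE_MASK) != 0


# Hypothetical values: bit 40 is an "incompatible" bit, bit 3 a
# "compatible" one.
print(must_disconnect(1 << 40, understood_flags=0))
print(must_disconnect(1 << 3, understood_flags=0))
```

A client would run this check on both flags fields after the hello
ioctl returns, and move on to the next bus address if either check
reports True.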

The hello structure also contains another flags field "attach_flags"
which indicates metadata that is optionally attached to all incoming
messages. You probably want to set KDBUS_ATTACH_NAMES unconditionally
in it. This has the effect that all well-known names of a sender are
attached to all incoming messages. You need this information to
implement matches that match on a message sender name correctly. Of
course, you should only request the attachment of as few metadata
fields as you need.

The kernel will return your unique id in the "id" field. This is a
simple numeric value. For compatibility with classic dbus1 simply
format it as a string and prefix it with ":1.".
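
As an illustration of that mapping, here is a small sketch. The helper
names are ours, and decimal formatting of the id is an assumption; the
text above only says "format as string".

```python
def unique_name(conn_id: int) -> str:
    """Turn a numeric kdbus connection id into a dbus1 style unique name."""
    return ":1." + str(conn_id)


def parse_unique_name(name: str) -> int:
    """Inverse mapping; rejects names that do not carry the ':1.' prefix."""
    if not name.startswith(":1."):
        raise ValueError("not a kdbus style unique name: " + name)
    return int(name[3:])
```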

The kernel will also return the bloom filter size and bloom filter
hash function number used for the signal broadcast bloom filter (see
below).

The kernel will also return the bus ID of the bus in a 128bit field.

The pool size field specifies the size of the memory mapped buffer.
After calling the hello ioctl, you should memory map the kdbus
fd. In this memory mapped region, the kernel will place all your incoming
messages.

SENDING MESSAGES

Use the MSG_SEND ioctl to send a message to another peer. The ioctl
takes a structure that contains a variety of fields:

The flags field corresponds closely to the old dbus1 message header
flags field, though the NO_REPLY_EXPECTED flag got inverted into
EXPECT_REPLY.

The dst_id/src_id fields contain the unique ids of the destination and
the sender. The sender field is usually overridden by the kernel, hence
you shouldn't fill it in. The destination field can also take the
special value KDBUS_DST_ID_BROADCAST for broadcast messages. For
messages addressed to a well-known name set the field to
KDBUS_DST_ID_NAME, and attach the name in a special "items" entry to
the message (see below).

The "payload type" field indicates the type of the payload. For all
dbus traffic it should carry the value 0x4442757344427573ULL. (Which
encodes 'DBusDBus').
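
The constant is simply the ASCII string 'DBusDBus' read as a 64bit
number with the first character in the most significant byte. A quick
way to convince yourself (the constant name below is ours, chosen for
illustration):

```python
# 'DBusDBus' interpreted as one big-endian 64bit integer.
PAYLOAD_TYPE_DBUS = int.from_bytes(b"DBusDBus", "big")

print(hex(PAYLOAD_TYPE_DBUS))
```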

The cookie field corresponds with the "serial" field of classic
dbus1. We simply renamed it here (and extended it to 64bit) since we
didn't want to imply the monotonicity of the assignment the way the
word "serial" indicates it.

When sending a message that expects a reply, you need to set the
EXPECT_REPLY flag in the message flag field. In this case you should
also fill out the "timeout_ns" value which indicates the timeout in
nsec for this call. If the peer does not respond in this time you will
get a notification of a timeout. Note that this is also used for
security purposes: a single reply message is only allowed through the
bus as long as the timeout has not ended. With this timeout value you
hence "open a time window" in which the peer might respond to your
request and the policy allows the response to go through.

When sending a message that is a reply, you need to fill in the
cookie_reply field, which is similar to the reply_serial field of
dbus1. Note that a message cannot have EXPECT_REPLY and a cookie_reply
at the same time!
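
A library may want to verify these header rules before handing a
message to the kernel. The sketch below models the header as a plain
Python dataclass. The class, its field defaults and the convention
that cookie_reply == 0 means "not a reply" are assumptions of this
sketch, not the layout of the real ioctl structure.

```python
from dataclasses import dataclass


@dataclass
class OutgoingHeader:
    cookie: int
    expect_reply: bool = False   # stands in for the EXPECT_REPLY flag
    timeout_ns: int = 0
    cookie_reply: int = 0        # assumed: 0 means "this is not a reply"


def check_header(h: OutgoingHeader) -> None:
    """Raise ValueError if the header violates the rules described above."""
    if h.expect_reply and h.cookie_reply != 0:
        raise ValueError("a message cannot expect a reply and be a reply")
    if h.expect_reply and h.timeout_ns <= 0:
        raise ValueError("a call that expects a reply needs a timeout")
```

Treating a missing timeout as an error is a choice made for this
sketch; the text above only says that the field should be filled out.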

This pretty much explains the ioctl header. The actual payload of the
data is now referenced in additional items that are attached to this
ioctl header structure at the end. When sending a message, you attach
items of the type PAYLOAD_VEC, PAYLOAD_MEMFD, FDS, BLOOM_FILTER,
DST_NAME to it:

   KDBUS_ITEM_PAYLOAD_VEC: contains a pointer + length pair for
   referencing arbitrary user memory. This is how you reference most
   of your data. It's a lot like the good old iovec structure of glibc.

   KDBUS_ITEM_PAYLOAD_MEMFD: for large data blocks it is preferable
   to send prepared "memfds" (see below) over. This item contains an
   fd for a memfd plus a size.

   KDBUS_ITEM_FDS: for sending over fds attach an item of this type with
   an array of fds.

   KDBUS_ITEM_BLOOM_FILTER: the calculated bloom filter of this message,
   only for undirected (broadcast) messages.

   KDBUS_ITEM_DST_NAME: for messages that are directed to a well-known
   name (instead of a unique name), this item contains the well-known
   name field.

A single message may consist of zero, one or more payload items of type
PAYLOAD_VEC or PAYLOAD_MEMFD. D-Bus protocol implementations should
treat them as a single block that just happens to be split up into
multiple items. Some restrictions apply however:

   The message header in its entirety must be contained in a single
   PAYLOAD_VEC item.

   You may only split your message up right in front of each GVariant
   contained in the payload, immediately before the framing of a
   GVariant, and after any padding bytes, if there are any. The
   padding bytes must be wholly contained in the preceding
   PAYLOAD_VEC/PAYLOAD_MEMFD item. You may not split up basic types
   nor arrays of fixed types. The latter is necessary to allow APIs
   to return direct pointers to linear arrays of numeric
   values. Examples: a value of one of the basic types "u", "s", "t"
   has to be wholly contained in a single payload item. Arrays of
   fixed types such as "ay" or "ai" have to be fully contained in a
   single payload item. For an array "as" or "a(si)", however, the
   only restrictions are to keep each string individually in an
   uninterrupted item, and to keep the framing of each element and of
   the array in a single uninterrupted item; the various strings
   might end up in different items.

Note again, that splitting up messages into separate items is up to the
implementation. Also note that the kdbus kernel side might merge
separate items if it deems this to be useful. However, the order in
which items are contained in the message is left untouched.

PAYLOAD_MEMFD items allow zero-copy data transfer (see below regarding
the memfd concept). Note however that the overhead of mapping these
makes them relatively expensive, and only worth the trouble for memory
blocks > 512K (this value appears to be quite universal across
architectures, as we tested). Thus we recommend sending PAYLOAD_VEC
items over for small messages and resorting to PAYLOAD_MEMFD items for
messages > 512K. Since you might not know yet whether a message will
grow beyond this boundary while you are building it up, a good
approach is to simply build the message unconditionally in a memfd
object. However, when the message is sealed to be sent away, check for
the size limit. If the size of the message is < 512K, then simply send
the data as PAYLOAD_VEC and reuse the memfd. If it is >= 512K, seal
the memfd and send it as PAYLOAD_MEMFD, and allocate a new memfd for
the next message.
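
The decision made at sealing time can be sketched as follows. The
function returns the item type as a plain string; the names are ours,
and a real implementation would of course build the actual items.

```python
MEMFD_THRESHOLD = 512 * 1024  # bytes, the 512K boundary mentioned above


def choose_payload_item(message_size: int) -> str:
    """Pick the item type for a finished message of the given size."""
    if message_size < MEMFD_THRESHOLD:
        # Small message: reference the data directly, keep the memfd
        # around for building the next message.
        return "PAYLOAD_VEC"
    # Large message: seal the memfd, send it, allocate a fresh one.
    return "PAYLOAD_MEMFD"
```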

RECEIVING MESSAGES

Use the MSG_RECV ioctl to read a message from kdbus. This will return
an offset into the pool memory map, relative to its beginning.

The received message structure more or less follows the structure of
the message originally sent. However, certain changes have been
made. In the header the src_id field will be filled in.

The payload items might have gotten merged and PAYLOAD_VEC items are
not used. Instead, you will only find PAYLOAD_OFF and PAYLOAD_MEMFD
items. The former contain an offset and size into your memory mapped
pool where you find the payload.
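
To illustrate how a receiver might stitch the payload back together,
here is a sketch that treats the pool as a bytes-like object and each
PAYLOAD_OFF item as an (offset, size) pair. It follows the description
above in taking offsets as positions in the pool; the function name is
ours, and PAYLOAD_MEMFD items are left out for brevity.

```python
def payload_from_pool(pool, off_items) -> bytes:
    """Concatenate the payload pieces referenced by PAYLOAD_OFF items.

    pool:      bytes-like view of the memory mapped pool
    off_items: iterable of (offset, size) pairs, in message order
    """
    view = memoryview(pool)
    parts = []
    for offset, size in off_items:
        if offset < 0 or size < 0 or offset + size > len(view):
            raise ValueError("item points outside of the pool")
        parts.append(view[offset:offset + size])
    # Joining copies the data once; a real library could hand out the
    # slices directly to avoid even that copy.
    return b"".join(parts)
```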

If during the HELLO ioctl you asked for getting metadata attached to
your message, you will find additional KDBUS_ITEM_CREDS,
KDBUS_ITEM_PID_COMM, KDBUS_ITEM_TID_COMM, KDBUS_ITEM_TIMESTAMP,
KDBUS_ITEM_EXE, KDBUS_ITEM_CMDLINE, KDBUS_ITEM_CGROUP,
KDBUS_ITEM_CAPS, KDBUS_ITEM_SECLABEL, KDBUS_ITEM_AUDIT items that
contain this metadata. This metadata will be gathered from the sender
at the point in time it sends the message. This information is
uncached, and since it is appended by the kernel, trustable. The
KDBUS_ITEM_SECLABEL item usually contains the SELinux security label,
if it is used.

After processing the message you need to call the KDBUS_CMD_FREE
ioctl, which releases the message from the pool, and allows the kernel
to store another message there. Note that the memory used by the pool
is ordinary anonymous, swappable memory that is backed by tmpfs. Hence
there is no need to copy the message out of it quickly, instead you
can just leave it there as long as you need it and release it via the
FREE ioctl only after that's done.

BLOOM FILTERS

The kernel does not understand dbus marshaling and will not look into
the message payload. To allow clients to subscribe to specific subsets
of the broadcast messages we employ bloom filters.

When broadcasting messages, a bloom filter needs to be attached to the
message in a KDBUS_ITEM_BLOOM_FILTER item (and only for broadcast
messages!). If you don't know what bloom filters are, read up now on
Wikipedia. In short: they are a very efficient way to
probabilistically check whether a certain word is contained in a
vocabulary. They know no false negatives, but they do know false
positives.

The parameters for the bloom filters that need to be included in
broadcast messages are communicated to userspace as part of the hello
response structure (see above). By default the parameters are m=512
(bits in the filter) and k=8 (nr of hash functions). Note however, that
this is subject to change in later versions, and userspace
implementations must be capable of handling m values between at least
m=8 and m=2^32, and k values between at least k=1 and k=32. The
underlying hash function is SipHash-2-4. It is used with a number of
constant (yet originally randomly generated) 128bit hash keys, more
specifically:

   b9,66,0b,f0,46,70,47,c1,88,75,c4,9c,54,b9,bd,15,
   aa,a1,54,a2,e0,71,4b,39,bf,e1,dd,2e,9f,c5,4a,3b,
   63,fd,ae,be,cd,82,48,12,a1,6e,41,26,cb,fa,a0,c8,
   23,be,45,29,32,d2,46,2d,82,03,52,28,fe,37,17,f5,
   56,3b,bf,ee,5a,4f,43,39,af,aa,94,08,df,f0,fc,10,
   31,80,c8,73,c7,ea,46,d3,aa,25,75,0f,9e,4c,09,29,
   7d,f7,18,4b,7b,a4,44,d5,85,3c,06,e0,65,53,96,6d,
   f2,77,e9,6f,93,b5,4e,71,9a,0c,34,88,39,25,bf,35

When calculating the first bit index into the bloom filter, the
SipHash-2-4 hash value is calculated for the input data and the first
16 bytes of the array above as hash key. Of the resulting 8 bytes of
output, as many full bytes are taken for the bit index as necessary,
starting from the output's first byte. For the second bit index the
same hash value is used, continuing with the next unused output byte,
and so on. Each time the bytes returned by the hash function are
depleted it is recalculated with the next 16 byte hash key from the
array above and the same input data.
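
The following Python sketch walks through this procedure for one input
string. It contains a straightforward SipHash-2-4 implementation and
the key table from above. Three details are assumptions of this sketch
and should be checked against libsystemd: the 8 output bytes are taken
in little-endian order of the 64bit hash value, the bytes picked for
one index are combined with the first byte as the most significant
one, and the result is reduced modulo m.

```python
MASK64 = (1 << 64) - 1

HASH_KEYS = [bytes.fromhex(h) for h in (
    "b9660bf0467047c18875c49c54b9bd15",
    "aaa154a2e0714b39bfe1dd2e9fc54a3b",
    "63fdaebecd824812a16e4126cbfaa0c8",
    "23be452932d2462d82035228fe3717f5",
    "563bbfee5a4f4339afaa9408dff0fc10",
    "3180c873c7ea46d3aa25750f9e4c0929",
    "7df7184b7ba444d5853c06e06553966d",
    "f277e96f93b54e719a0c34883925bf35",
)]


def _rotl(x: int, b: int) -> int:
    return ((x << b) | (x >> (64 - b))) & MASK64


def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4 of data under a 16 byte key, as a 64bit integer."""
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:16], "little")
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    def rounds(n: int) -> None:
        nonlocal v0, v1, v2, v3
        for _ in range(n):
            v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13) ^ v0; v0 = _rotl(v0, 32)
            v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16) ^ v2
            v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21) ^ v0
            v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17) ^ v2; v2 = _rotl(v2, 32)

    full = len(data) - len(data) % 8
    for off in range(0, full, 8):            # compress full 8 byte words
        m = int.from_bytes(data[off:off + 8], "little")
        v3 ^= m
        rounds(2)
        v0 ^= m
    # last word: leftover bytes, input length (mod 256) in the top byte
    b = ((len(data) & 0xFF) << 56) | int.from_bytes(data[full:], "little")
    v3 ^= b
    rounds(2)
    v0 ^= b
    v2 ^= 0xFF
    rounds(4)
    return v0 ^ v1 ^ v2 ^ v3


def bloom_bit_indexes(data: bytes, m_bits: int = 512, k: int = 8) -> list:
    """Return the k bit indexes that data sets in an m_bits wide filter."""
    width = max(1, ((m_bits - 1).bit_length() + 7) // 8)  # bytes per index
    unused = []   # hash output bytes that were not consumed yet
    key_no = 0
    indexes = []
    for _ in range(k):
        value = 0
        for _byte in range(width):
            if not unused:                   # output depleted: next key
                if key_no >= len(HASH_KEYS):
                    raise ValueError("ran out of hash keys")
                digest = siphash24(HASH_KEYS[key_no], data)
                unused = list(digest.to_bytes(8, "little"))
                key_no += 1
            value = (value << 8) | unused.pop(0)
        indexes.append(value % m_bits)
    return indexes
```

With the default parameters each index needs two hash output bytes, so
the eight indexes of one string consume the output of the first two
keys.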

For each message to send across the bus we populate the bloom filter
with all possible matchable strings. If a client then wants to
subscribe to messages of this type, it simply tells the kernel to test
its own calculated bit mask against the bloom filter of each message.

More specifically, the following strings are added to the bloom filter
of each message that is broadcast:

  The string "interface:" suffixed by the interface name

  The string "member:" suffixed by the member name

  The string "path:" suffixed by the path name

  The string "path-slash-prefix:" suffixed with the path name, and
  also all prefixes of the path name (cut off at "/"), also prefixed
  with "path-slash-prefix:".

  The string "message-type:" suffixed with the strings "signal",
  "method_call", "error" or "method_return" for the respective message
  type of the message.

  If the first argument of the message is a string, "arg0:" suffixed
  with the first argument.

  If the first argument of the message is a string, "arg0-dot-prefix:"
  suffixed with the first argument, and also all prefixes of the
  argument (cut off at "."), also prefixed with "arg0-dot-prefix:".

  If the first argument of the message is a string,
  "arg0-slash-prefix:" suffixed with the first argument, and also all
  prefixes of the argument (cut off at "/"), also prefixed with
  "arg0-slash-prefix:".

  The same applies to all further string arguments up to argument 63,
  both for the arguments themselves and for their "dot" and "slash"
  prefixes. However, addition to the bloom filter stops at the first
  argument that is not a string.

(Note that the bloom filter does not contain sender nor receiver
names!)

When a client wants to subscribe to messages matching a certain
expression, it should calculate the bloom mask following the same
algorithm. The kernel will then simply test the mask against the
attached bloom filters.

Note that bloom filters are probabilistic, which means that clients
might get messages they did not expect. Your bus protocol
implementation must be capable of dealing with these unexpected
messages (which it needs to anyway, given that transfers are
relatively unrestricted on kdbus and people can send you all kinds of
nonsense).

If a client connects to a bus whose bloom filter metrics (i.e. filter
size and number of hash functions) are outside of the range the client
supports, it must immediately disconnect and continue with the next
bus address of the bus address string.

INSTALLING MATCHES

To install matches for broadcast messages, use the KDBUS_CMD_ADD_MATCH
ioctl. It takes a structure that contains an encoded match expression,
and that is followed by one or more items, which are combined in an
AND way. (Meaning: a message is matched exactly when all items
attached to the original ioctl struct match).

To match against other user messages add a KDBUS_ITEM_BLOOM item in
the match (see above). Note that the bloom filter does not include
matches to the sender names. To additionally check against sender
names, use the KDBUS_ITEM_ID (for unique id matches) and
KDBUS_ITEM_NAME (for well-known name matches) item types.

To match against kernel generated messages (see below) you should add
items of the same type as the kernel messages include,
i.e. KDBUS_ITEM_NAME_ADD, KDBUS_ITEM_NAME_REMOVE,
KDBUS_ITEM_NAME_CHANGE, KDBUS_ITEM_ID_ADD, KDBUS_ITEM_ID_REMOVE and
fill them out. Note however, that you have some wildcards in this
case, for example the .id field of KDBUS_ITEM_ID_ADD/KDBUS_ITEM_ID_REMOVE
structures may be set to 0 to match against any id addition/removal.

Note that dbus match strings do not map 1:1 to these ioctl() calls. In
many cases (where the match string is "underspecified") you might need
to issue up to six different ioctl() calls for the same match. For
example, the empty match (which matches against all messages) would
translate into one ioctl with a KDBUS_ITEM_BLOOM item, one with
KDBUS_ITEM_NAME_ADD, one with KDBUS_ITEM_NAME_CHANGE, one with
KDBUS_ITEM_NAME_REMOVE, one with KDBUS_ITEM_ID_ADD and one with
KDBUS_ITEM_ID_REMOVE.

When creating a match, you may attach a "cookie" value to it, which
is used for deleting this match again. The cookie can be selected freely
by the client. When issuing KDBUS_CMD_REMOVE_MATCH, simply pass the
same cookie as before and all matches carrying the same "cookie" value
will be removed. This is particularly handy for the case where multiple
ioctl()s are issued for a single match string.
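
A library can keep track of this with a small table that hands out one
cookie per dbus match string and remembers which kernel matches were
installed under it. The sketch below only does the bookkeeping for the
empty match; the class and method names are ours, and the actual
ioctl() calls are indicated in comments.

```python
# Item types installed for the empty match, one ADD_MATCH ioctl each.
EMPTY_MATCH_ITEMS = (
    "KDBUS_ITEM_BLOOM",
    "KDBUS_ITEM_NAME_ADD",
    "KDBUS_ITEM_NAME_CHANGE",
    "KDBUS_ITEM_NAME_REMOVE",
    "KDBUS_ITEM_ID_ADD",
    "KDBUS_ITEM_ID_REMOVE",
)


class MatchTable:
    def __init__(self):
        self._next_cookie = 1
        self._installed = {}   # cookie -> list of item types

    def add_empty_match(self) -> int:
        """Install the empty match; returns the cookie shared by all parts."""
        cookie = self._next_cookie
        self._next_cookie += 1
        # Here: one KDBUS_CMD_ADD_MATCH ioctl per item type, all of
        # them carrying the same cookie.
        self._installed[cookie] = list(EMPTY_MATCH_ITEMS)
        return cookie

    def remove(self, cookie: int) -> list:
        """Forget a match; a single REMOVE_MATCH ioctl drops all parts."""
        return self._installed.pop(cookie)
```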

MEMFDS

memfds may be sent across kdbus via KDBUS_ITEM_PAYLOAD_MEMFD items
attached to messages. If this is done, the data included in the memfd
is considered part of the payload stream of a message, and is treated
the same way as KDBUS_ITEM_PAYLOAD_VEC by the receiving side. It is
possible to interleave KDBUS_ITEM_PAYLOAD_MEMFD and
KDBUS_ITEM_PAYLOAD_VEC items freely; the reader will consider them
a single stream of bytes in the order these items appear in
the message, that just happens to be split up at various places
(regarding the rules for how they may be split up, see above). The
kernel will refuse taking KDBUS_ITEM_PAYLOAD_MEMFD items that refer to
memfds that are not sealed.

Note that sealed memfds may be unsealed again if they are not mapped
and you have the only fd reference to them.

As an alternative to sending memfds as KDBUS_ITEM_PAYLOAD_MEMFD items
(where they are just a part of the payload stream of a message) you can
also simply attach any memfd to a message using
KDBUS_ITEM_FDS. In this case, the memfd contents are not
considered part of the payload stream of the message; the memfds are
simply fds like any other that happen to be attached to the message.

MESSAGES FROM THE KERNEL

A couple of messages previously generated by the dbus1 bus driver are
now generated by the kernel. Since the kernel does not understand the
payload marshaling, they are generated by the kernel in a different
format. This is indicated with the "payload type" field of the
messages set to 0. Library implementations should take these messages
and synthesize traditional driver messages for them on reception.

More specifically:

   Instead of the NameOwnerChanged, NameLost, NameAcquired signals,
   kernel messages containing KDBUS_ITEM_NAME_ADD,
   KDBUS_ITEM_NAME_REMOVE, KDBUS_ITEM_NAME_CHANGE, KDBUS_ITEM_ID_ADD,
   KDBUS_ITEM_ID_REMOVE items are generated (each message will contain
   exactly one of these items). Note that in libsystemd we have
   obsoleted NameLost/NameAcquired messages, since they are entirely
   redundant to NameOwnerChanged. This library will hence only
   synthesize NameOwnerChanged messages from these kernel messages,
   and never generate NameLost/NameAcquired. If your library needs to
   stay compatible with the old dbus1 userspace, you might need
   to synthesize both a NameOwnerChanged and NameLost/NameAcquired
   message from the same kernel message.

   When a method call times out, a KDBUS_ITEM_REPLY_TIMEOUT message is
   generated. This should be synthesized into a method error reply
   message to the original call.

   When a method call fails because the peer terminated the connection
   before responding, a KDBUS_ITEM_REPLY_DEAD message is
   generated. Similarly, it should be synthesized into a method error
   reply message.

For synthesized messages we recommend setting the cookie field to
(uint32_t) -1 (and not (uint64_t) -1!), so that the cookie is not 0
(which the dbus1 spec does not allow), but clearly recognizable as
synthetic.
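
The reason for picking the 32bit rather than the 64bit variant of -1
is that the value still has to fit into a classic dbus1 serial. A tiny
sketch (names are ours):

```python
SYNTHETIC_COOKIE = (-1) & 0xFFFFFFFF   # (uint32_t) -1


def fits_dbus1_serial(cookie: int) -> bool:
    """dbus1 serials are 32bit wide and must not be 0."""
    return 0 < cookie <= 0xFFFFFFFF


print(hex(SYNTHETIC_COOKIE), fits_dbus1_serial(SYNTHETIC_COOKIE))
print(fits_dbus1_serial((-1) & 0xFFFFFFFFFFFFFFFF))   # (uint64_t) -1
```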

Note that the KDBUS_ITEM_NAME_XYZ messages will actually inform you
about all kinds of names, including activatable ones. Classic dbus1
NameOwnerChanged messages OTOH are only generated when a name is
really acquired on the bus and not just simply activatable. This means
you must explicitly check for the case where an activatable name
becomes acquired or an acquired name is lost and goes back to being
merely activatable.

NAME REGISTRY

To acquire names on the bus, use the KDBUS_CMD_NAME_ACQUIRE ioctl(). It
takes a flags field similar to dbus1's RequestName() bus driver call,
however the NO_QUEUE flag got inverted into a QUEUE flag instead.

To release a previously acquired name use the KDBUS_CMD_NAME_RELEASE
ioctl().

To list acquired names use the KDBUS_CMD_CONN_INFO ioctl. It may be
used to list unique names, well-known names as well as activatable
names and clients currently queuing for ownership of a well-known
name. The ioctl will return an offset into the memory pool. After
reading all the data you need, you need to release this via the
KDBUS_CMD_FREE ioctl(), similar to how you release a received message.

CREDENTIALS

kdbus can optionally attach various kinds of metadata about the sender at
the point of time of sending ("credentials") to messages, on request
of the receiver. This is supported on both directed and undirected
(broadcast) messages. The metadata to attach is selected at time of
the HELLO ioctl of the receiver via a flags field (see above). Note
that clients must be able to handle messages that contain more
metadata than they asked for themselves; this simplifies the
implementation of broadcasting in the kernel. The receiver should not
rely on this extra data being around though, even though it will be
correct if it happens to be attached. In order to avoid programming
errors in applications, we recommend not passing this data on to
clients that did not explicitly ask for it.

Credentials may also be queried for a well-known or unique name. Use
the KDBUS_CMD_CONN_INFO ioctl for this. It will return an offset to the
pool area again, which will contain the same credential items as
messages have attached. Note that when issuing the ioctl, you can
select a different set of credentials to gather than what was
originally requested for being attached to incoming messages.

Credentials are always specific to the sender's domain that was
current at the time of sending, and of the process that opened the
bus connection at the time of opening it. Note that this latter data
is cached!

POLICY

The kernel enforces only very limited policy on names. It will not do
access filtering by userspace payload, and thus not by interface or
method name.

This ultimately means that most fine-grained policy enforcement needs
to be done by the receiving process. We recommend using PolicyKit for
any more complex checks. However, libraries should make simple static
policy decisions regarding privileged/unprivileged method calls
easy. We recommend doing this by enabling KDBUS_ATTACH_CAPS and
KDBUS_ATTACH_CREDS for incoming messages, and then discerning client
access by some capability, or if sender and receiver UIDs match.
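
Such a static check could look like the sketch below. It assumes the
library has already extracted the sender's UID and effective
capabilities from the attached metadata items; the function name and
the way capabilities are represented (a set of names) are choices made
for this sketch.

```python
def sender_is_privileged(sender_uid: int, sender_caps, receiver_uid: int,
                         required_cap: str = "CAP_SYS_ADMIN") -> bool:
    """Allow a call if the sender holds the capability or is the same user."""
    return required_cap in sender_caps or sender_uid == receiver_uid
```

A library might expose this as a per-method annotation
("unprivileged", "same user or capability X"), so that service authors
do not have to parse the metadata items themselves.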

BUS ADDRESSES

When connecting to kdbus use the "kernel:" protocol prefix in DBus
address strings. The device node path is encoded in its "path="
parameter.

Client libraries should use the following connection string when
connecting to the system bus:

   kernel:path=/sys/fs/kdbus/0-system/bus;unix:path=/var/run/dbus/system_bus_socket

This will ensure that kdbus is preferred over the legacy AF_UNIX
socket, but compatibility is kept. For the user bus use:

   kernel:path=/sys/fs/kdbus/$UID-user/bus;unix:path=$XDG_RUNTIME_DIR/bus

With $UID replaced by the caller's numeric user ID, and $XDG_RUNTIME_DIR
following the XDG basedir spec.

Of course the $DBUS_SYSTEM_BUS_ADDRESS and $DBUS_SESSION_BUS_ADDRESS
variables should still take precedence.
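
Put together, address selection and parsing might look like the
following sketch. The function names are ours. The parser splits the
address list at ";" and the parameters at ","; it ignores the escaping
rules of D-Bus addresses, which a real implementation has to honor.

```python
import os

DEFAULT_SYSTEM_ADDRESS = (
    "kernel:path=/sys/fs/kdbus/0-system/bus;"
    "unix:path=/var/run/dbus/system_bus_socket"
)


def system_bus_address(environ=os.environ) -> str:
    """The environment variable wins over the built-in default."""
    return environ.get("DBUS_SYSTEM_BUS_ADDRESS") or DEFAULT_SYSTEM_ADDRESS


def user_bus_address(uid: int, runtime_dir: str, environ=os.environ) -> str:
    default = "kernel:path=/sys/fs/kdbus/%d-user/bus;unix:path=%s/bus" % (
        uid, runtime_dir)
    return environ.get("DBUS_SESSION_BUS_ADDRESS") or default


def parse_address(address: str) -> list:
    """Split an address string into (transport, parameters) entries."""
    entries = []
    for entry in address.split(";"):
        if not entry:
            continue
        transport, _, rest = entry.partition(":")
        params = dict(kv.split("=", 1) for kv in rest.split(",") if kv)
        entries.append((transport, params))
    return entries
```

A client then walks the parsed entries in order and falls back to the
next one whenever connecting fails, or when the bus turns out to use
incompatible features or unsupported bloom filter parameters.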

DBUS SERVICE FILES

Activatable services for kdbus may not use classic dbus1 service
activation files. Instead, programs should drop in native systemd
.service and .busname unit files, so that they are treated uniformly
with other types of units and activation of the system.

Note that this results in a major difference to classic dbus1:
activatable bus names can be established at any time in the boot process.
This is unlike dbus1 where activatable names are unconditionally available
as long as dbus-daemon is running. Being able to control when
activatable names are established is essential to allow usage of kdbus
during early boot and in initrds, without the risk of triggering
services too early.

DISCLAIMER

This all is so far just the status quo. We are putting this together because
we are quite confident that further API changes will be smaller, but
to make this very clear: this is all subject to change, still!

We invite you to port over your favorite dbus library to this new
scheme, but please be prepared to make minor changes if we still
change these interfaces!