Age | Commit message (Collapse) | Author |
|
Instead of figuring out how many RRs to cache right before we do so,
determine this at the time we install the answer RRs, so that we can
still alter this as we manipulate the answer during validation.
The primary purpose of this is to pave the way so that we can drop
unsigned RRsets from the answer and invalidate the number of RRs to
cache at the same time.
|
|
This adds initial support for validating RRSIG/DNSKEY/DS chains when
doing lookups. Proof-of-non-existance, or proof-of-unsigned-zones is not
implemented yet.
With this change DnsTransaction objects will generate additional
DnsTransaction objects when looking for DNSKEY or DS RRs to validate an
RRSIG on a response. DnsTransaction objects are thus created for three
reasons now:
1) Because a user asked for something to be resolved, i.e. requested by
a DnsQuery/DnsQueryCandidate object.
2) As result of LLMNR RR probing, requested by a DnsZoneItem.
3) Because another DnsTransaction requires the requested RRs for
validation of its own response.
DnsTransactions are shared between all these users, and are GC
automatically as soon as all of these users don't need a specific
transaction anymore.
To unify the handling of these three reasons for existance for a
DnsTransaction, a new common naming is introduced: each DnsTransaction
now tracks its "owners" via a Set* object named "notify_xyz", containing
all owners to notify on completion.
A new DnsTransaction state is introduced called "VALIDATING" that is
entered after a response has been receieved which needs to be validated,
as long as we are still waiting for the DNSKEY/DS RRs from other
DnsTransactions.
This patch will request the DNSKEY/DS RRs bottom-up, and then validate
them top-down.
Caching of RRs is now only done after verification, so that the cache is
not poisoned with known invalid data.
The "DnsAnswer" object gained a substantial number of new calls, since
we need to add/remove RRs to it dynamically now.
|
|
For each transaction, record when the earliest point in time when the
query packet may hit the wire. This is the same time stamp for which
the timer is scheduled in retries, except for the initial query packets
which are delayed by a random jitter. In this case, we denote that the
packet may actually be sent at the nominal time, without the jitter.
Transactions that share the same timestamp will also have identical
values in this field. It is used to coalesce pending queries in a later
patch.
|
|
When a jitter callback is issued instead of sending a DNS packet directly,
on_transaction_timeout() is invoked to 'retry' the transaction. However,
this function has side effects. For once, it increases the packet loss
counter on the scope, and it also unrefs/refs the server instances.
Fix this by tracking the jitter with two bool variables. One saying that
the initial jitter has been scheduled in the first place, and one that
tells us the delay packet has been sent.
|
|
The logic is to kick off mDNS packets in a delayed way is mostly identical
to what LLMNR needs, except that the constants are different.
|
|
This adds a new SD_RESOLVED_AUTHENTICATED flag for responses we return
on the bus. When set, then the data has been authenticated. For now this
mostly reflects the DNSSEC AD bit, if DNSSEC=trust is set. As soon as
the client-side validation is complete it will be hooked up to this flag
too.
We also set this bit whenver we generated the data ourselves, for
example, because it originates in our local LLMNR zone, or from the
built-in trust anchor database.
The "systemd-resolve-host" tool has been updated to show the flag state
for the data it shows.
|
|
When doing DNSSEC lookups we need to know one or more DS or DNSKEY RRs
as trust anchors to validate lookups. With this change we add a
compiled-in trust anchor database, serving the root DS key as of today,
retrieved from:
https://data.iana.org/root-anchors/root-anchors.xml
The interface is kept generic, so that additional DS or DNSKEY RRs may
be served via the same interface, for example by provisioning them
locally in external files to support "islands" of security.
The trust anchor database becomes the fourth source of RRs we maintain,
besides, the network, the local cache, and the local zone.
|
|
This is inspired by the logic in BIND [0], follow-up patches
will implement the reset of that scheme.
If we get a server error back, or if after several attempts we don't
get a reply at all, we switch from UDP to TCP for the given
server for the current and all subsequent requests. However, if
we ever successfully received a reply over UDP, we never fall
back to TCP, and once a grace-period has passed, we try to upgrade
again to using UDP. The grace-period starts off at five minutes
after the current feature level was verified and then grows
exponentially to six hours. This is to mitigate problems due
to temporary lack of network connectivity, but at the same time
avoid flooding the network with retries when the feature attempted
feature level genuinely does not work.
Note that UDP is likely much more commonly supported than TCP,
but depending on the path between the client and the server, we
may have more luck with TCP in case something is wrong. We really
do prefer UDP though, as that is much more lightweight, that is
why TCP is only the last resort.
[0]: <https://kb.isc.org/article/AA-01219/0/Refinements-to-EDNS-fallback-behavior-can-cause-different-outcomes-in-Recursive-Servers.html>
|
|
Let's track where the data came from: from the network, the cache or the
local zone. This is not only useful for debugging purposes, but is also
useful when the zone probing wants to ensure it's not reusing
transactions that were answered from the cache or the zone itself.
|
|
DnsTransaction objects
Previously we'd only store the DnsPacket in the DnsTransaction, and the
DnsQuery would then take the DnsPacket's DnsAnswer and return it. With
this change we already pull the DnsAnswer out inside the transaction.
We still store the DnsPacket in the transaction, if we have it, since we
still need to determine from which peer a response originates, to
implement caching properly. However, the DnsQuery logic doesn't care
anymore for the packet, it now only looks at answers and rcodes from the
successfuly candidate.
This also has the benefit of unifying how we propagate incoming packets,
data from the local zone or the local cache.
|
|
This adds support for searching single-label hostnames in a set of
configured search domains.
A new object DnsQueryCandidate is added that links queries to scopes.
It keeps track of the search domain last used for a query on a specific
link. Whenever a host name was unsuccessfuly resolved on a scope all its
transactions are flushed out and replaced by a new set, with the next
search domain appended.
This also adds a new flag SD_RESOLVED_NO_SEARCH to disable search domain
behaviour. The "systemd-resolve-host" tool is updated to make this
configurable via --search=.
Fixes #1697
|
|
This is a continuation of the previous include sort patch, which
only sorted for .c files.
|
|
This hopefully makes this a bit more expressive and clarifies that the
fd is not used for the DNS TCP socket. This also mimics how the LLMNR
UDP fd is named in the manager object.
|
|
Let's simplify things and only maintain a single RR key per transaction
object, instead of a full DnsQuestion. Unicast DNS and LLMNR don't
support multiple questions per packet anway, and Multicast DNS suggests
coalescing questions beyond a single dns query, across the whole system.
|
|
With the exponential backoff, we can perform more requests in the same amount of time,
so bump this a bit.
In case of large RTT this may be necessary in order not to regress, and in case
of large packet-loss it will make us more robust. The latter is particularly
relevant once we start probing for features (and hence may see packet-loss
until we settle on the right feature level).
|
|
Rather than fixing this to 5s for unicast DNS and 1s for LLMNR, start
at a tenth of those values and increase exponentially until the old
values are reached. For LLMNR the recommended timeout for IEEE802
networks (which basically means all of the ones we care about) is 100ms,
so that should be uncontroversial. For unicast DNS I have found no
recommended value. However, it seems vastly more likely that hitting a
500ms timeout is casued by a packet loss, rather than the RTT genuinely
being greater than 500ms, so taking this as a startnig value seems
reasonable to me.
In the common case this greatly reduces the latency due to normal packet
loss. Moreover, once we get support for probing for features, this means
that we can send more packets before degrading the feature level whilst
still allowing us to settle on the correct feature level in a reasonable
timeframe.
The timeouts are tracked per server (or per scope for the multicast
protocols), and once a server (or scope) receives a successfull package
the timeout is reset. We also track the largest RTT for the given
server/scope, and always start our timouts at twice the largest
observed RTT.
|
|
This function emits the UDP packet via the scope, but first it will
determine the current server (and connect to it) and store the
server in the transaction.
This should not change the behavior, but simplifies the code.
|
|
With access to the server when creating the socket, we can connect()
to the server and hence simplify message sending and receiving in
follow-up patches.
|
|
A transaction can only have one socket at a time, so no need to distinguish these.
|
|
We used to have one global socket, use one per transaction instead. This
has the side-effect of giving us a random UDP port per transaction, and
hence increasing the entropy and making cache poisoining significantly
harder to achieve.
We still reuse the same port number for packets belonging to the same
transaction (resent packets).
|
|
We want to discover information about the server and use that in when crafting
packets to be resent.
|
|
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
|
|
|