Age | Commit message (Collapse) | Author |
|
Prior to this change every journal append causes an ftruncate() for the
sake of inotify propagation of the mmap-based writes.
With this change the notification is deferred up to ~250ms, coalescing
any repeated journal writes during the deferred period into a single
ftruncate(). The ftruncate() call isn't free and doing it on every
append adds unnecessary overhead and latency in the journald event loop.
Introduces journal_file_enable_post_change_timer() which manages a
timer on the provided sd-event instance for scheduling coalesced
ftruncates. The ftruncate() behavior is unchanged unless
journal_file_enable_post_change_timer() is called on the JournalFile.
While not a tremendous improvement, profiling systemd-journald event loop
latencies using instrumentation as introduced by 34b8751 it was observed that
coalescing the ftruncates was low-hanging fruit worth pursuing.
Note orders 12 and 13 shifting left into order 11 and order 6 dipping into
order 5:
Unmodified:
log2(us) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-----------------------------------------------------------
[10685.414572] 0 0 0 0 38 602 61 2 290 60 1643 2554 13 1 4 1 0 0 1
[10690.415114] 0 0 0 0 0 646 54 7 309 44 2073 2148 17 1 3 0 0 0 1
[10695.415509] 0 0 0 0 1 650 73 3 324 37 2071 2270 9 0 0 1 0 1 0
[10700.416297] 0 0 0 0 0 659 50 4 318 38 2111 2152 6 0 1 0 0 1 1
[10705.417136] 0 0 0 0 2 660 48 4 320 38 2129 2146 12 1 1 0 0 1 1
[10710.489114] 0 0 0 0 0 673 38 3 321 37 1925 2339 7 0 0 0 0 1 1
[10715.489613] 0 0 0 0 3 656 64 8 317 48 2365 2007 7 0 0 0 0 0 1
Coalesced:
log2(us) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-----------------------------------------------------------
[ 6169.161360] 0 0 0 1 24 786 54 11 389 24 4192 771 6 4 0 0 1 0 1
[ 6174.161705] 0 0 0 1 18 800 35 6 380 27 3977 893 3 1 0 0 1 0 1
[ 6179.162741] 0 0 0 1 28 768 51 4 391 16 3998 831 5 3 0 0 0 0 2
[ 6184.162856] 0 0 0 0 19 770 60 2 376 26 3795 1004 9 5 1 0 1 0 1
[ 6189.163279] 0 0 0 0 28 761 49 7 372 27 3729 1056 3 2 0 0 1 0 1
[ 6194.164255] 0 0 0 0 25 785 49 7 394 19 3996 908 6 3 2 0 0 0 1
[ 6199.164658] 0 0 0 0 29 797 35 5 389 18 3995 898 3 4 1 1 1 0 1
The remaining high-order delays are a result of the synchronous fsyncs in
systemd-journald, beyond the scope of this commit.
|
|
Misc resolved cache fixes
|
|
This is in the fast path, so let's not do all this work unneccessarily.
|
|
When the DNS_RESOURCE_KEY_CACHE_FLUSH flag is not set for an mDNS packet, we should not flush
the cache for RRs with matching keys. However, we were unconditionally flushing the cache
also for these packets.
Now mark all packets as cache_flush by default, except for these mDNS packets, and respect
that flag in the cache handling.
This fixes 90325e8c2e559a21ef0bc2f26b844c140faf8020.
|
|
importd: drop dkr support
|
|
The logic of dns_cache_get() is now:
- look up the precise key;
- look up NXDOMAIN item;
- if an RR type that may be redirected
(i.e., not CNAME, DNAME, RRSIG, NSEC, NSEC3, SIG, KEY, or
NXT) look up a correpsonding CNAME or DNAME record;
- look up a corresponding NSEC record;
Before this change we would give up before potentially finding
negative cache entries for DNAME, CNAME and NSEC records, we
would return NSEC records for aliases where we had DNAME or CNAME
records available and we would incorrectly try to redirect DNSSEC RRs.
|
|
Some DNS servers will hand out negative answers without SOA records,
these can not be cached, so log about that fact.
|
|
An NXDOMAIN entry means there are no RRs of any type for a name,
so only cache by CLASS + NAME, rather than CLASS + NAME + TYPE.
|
|
Apart from dropping redundant information, this fixes an issue
where, due to broken DNS servers, we can only be certain of whether
an apparent NODATA response is in fact an NXDOMAIN response after
explicitly resolving the canonical name. This issue is outlined in
RFC2308. Moreover, by caching NXDOMAIN for an existing name, we
would mistakenly return NXDOMAIN for types which should not be
redirected. I.e., a query for AAAA on test-nx-1.jklm.no correctly
returns NXDOMAIN, but a query for CNAME should return the record
and a query for DNAME should return NODATA.
Note that this means we will not cache an NXDOMAIN response in the
presence of redirection, meaning one redundant roundtrip in case the
name is queried again.
|
|
Use /proc/net/sockstat6 to detect IPv6 support
|
|
The current code is not compatible with current dkr protocols anyway,
and dkr has a different focus ("microservices") than nspawn anyway
("whole machine containers"), hence drop support for it, we cannot
reasonably keep this up to date, and it creates the impression we'd
actually care for the microservices usecase.
|
|
resolved: more mDNS specific bits (3)
|
|
RFC6762, 18.1:
In multicast query messages, the Query Identifier SHOULD be set to
zero on transmission.
|
|
Only .in-addr.arpa and .local are considered local in mDNS, so discard the
packet if anything else is thrown at us.
|
|
Third DNSSEC patch series
|
|
The file /sys/module/ipv6 does not exist in all container
implementations (e.g. Virtuozzo). Using /proc/net/sockstat6
detects IPv6 support reliably in these environments, too.
This file does not exist when the kernel is not compiled with
IPv6 support, or if IPv6 support is disabled, so simply checking
for existence should be a suitable check.
Fixes #2059
|
|
core: do not warn about Wants depencencies on masked units
|
|
Let's simply call it dns_transaction_prepare(), so that we have the nice
cycle for prepare() → go() → emit() → process().
After all it's pretty clear that what we prepare there, and we dont call
the others go_next_attempt(), emit_next_attempt() or
process_next_attempt().
|
|
destructors
|
|
|
|
This adds initial support for validating RRSIG/DNSKEY/DS chains when
doing lookups. Proof-of-non-existance, or proof-of-unsigned-zones is not
implemented yet.
With this change DnsTransaction objects will generate additional
DnsTransaction objects when looking for DNSKEY or DS RRs to validate an
RRSIG on a response. DnsTransaction objects are thus created for three
reasons now:
1) Because a user asked for something to be resolved, i.e. requested by
a DnsQuery/DnsQueryCandidate object.
2) As result of LLMNR RR probing, requested by a DnsZoneItem.
3) Because another DnsTransaction requires the requested RRs for
validation of its own response.
DnsTransactions are shared between all these users, and are GC
automatically as soon as all of these users don't need a specific
transaction anymore.
To unify the handling of these three reasons for existance for a
DnsTransaction, a new common naming is introduced: each DnsTransaction
now tracks its "owners" via a Set* object named "notify_xyz", containing
all owners to notify on completion.
A new DnsTransaction state is introduced called "VALIDATING" that is
entered after a response has been receieved which needs to be validated,
as long as we are still waiting for the DNSKEY/DS RRs from other
DnsTransactions.
This patch will request the DNSKEY/DS RRs bottom-up, and then validate
them top-down.
Caching of RRs is now only done after verification, so that the cache is
not poisoned with known invalid data.
The "DnsAnswer" object gained a substantial number of new calls, since
we need to add/remove RRs to it dynamically now.
|
|
the DNSKEY's algorithm
As long as we support the digest we are good.
|
|
name, not the owner name
When the DNSKEY is in higher zone, then that's OK, and we need to check
the RRSIG's signer name against the DNSKEY hence.
|
|
We actually maintain an array of pointers to RRs, not of RRs themselves,
fix the qsort() invocation accordingly.
|
|
When increasing the DnsAnswer array, don't operate piecemeal, grow the
array exponentially.
This way, the default logic for DnsAnswer allocations matches the
behaviour for GREEDY_REALLOC and suchlike, and we can reduce the number
of necessary allocations.
|
|
|
|
This got borked in 547493c5ad5c82032e247609970f96be76c2d661.
|
|
It's complicated enough, it deserves its own call.
(Also contains some unrelated whitespace, comment and assertion changes)
|
|
Enforce this while parsing RRs.
|
|
As soon as we encounter the OPT RR while parsing, store it in a special
field in the DnsPacket structure. That way, we won't be confused if we
iterate through RRs, and can check that there's really only one of these
RRs around.
|
|
DnsAnswer objects should be considered immutable after having passed to
more than one user, i.e. with a reference counter > 1. Enforce that in
code, so that we can track down misuses easier.
|
|
libgcrypt encodes the error source in the error code, we need to mask
that away before comparing error codes.
|
|
Let's simplify things, by making this a function call of its own.
|
|
|
|
ifindex variable
|
|
Quoting @teg:
"Contrary to what the comment said, we always verify redirect chains in
full, and cache all the CNAME records. There is therefore no need to
do extra negative caching along a CNAME chain."
This simply steals @teg's commit since we'll touch the SOA matching case
in a later patch, and rather want this bit gone, so that we don't have
to "fix" it, only to remove it later on.
|
|
After all, that's how this is done in DNS, and is particularly important
if we look a DS/DNSKEY RRs for the root zone itself, where the owner
name would otherwise be shown as completely empty (i.e. missing).
|
|
When iterating through RR lists we frequently end up comparing RRs and
RR keys with themselves, hence att a minor optimization to check ptr
values first, before doing a deep comparison.
|
|
DNS RR types are uint16_t after all, treat them as such.
|
|
Expose soft limits on the bus
|
|
For mDNS, if we're unable to stuff all known answers into the given packet,
allocate a new one, push the RR into that one and link it to the current
one.
|
|
In dns_scope_emit(), walk the list of additional packets and emit all of
them. Set the TC bit in all but the last of them.
This is specific to mDNS, so an assertion is triggered if used with other
protocols.
|
|
For mDNS, we need to support the TC bit in case the list of known answers
exceed the maximum packet size.
For this, add a 'more' pointer to DnsPacket for an additional packet.
When a packet is unref'ed, the ->more packet is also unrefed, so it
sufficient to only keep track of the 1st packet in a chain.
|
|
We need to support the TC bit in queries in case known answers exceed the
maximum packet size. Factor out the flags compilation to
dns_packet_set_flags() and make it externally available.
|
|
sd_event_add_io() returns the error directly and does not mess with errno.
|
|
DNS names ending with .local are specific to mDNS, so don't use them
on DNS scopes.
|
|
Udev indentation
|
|
basic: add RB-Tree implementation
|
|
This new functions exports cached records of type PTR, SRV and TXT into
an existing DnsPacket. This is used in order to fill in known records
to mDNS queries, for known answer supression.
|
|
Implement dns_transaction_make_packet_mdns(), a special version of
dns_transaction_make_packet() for mDNS which differs in many ways:
a) We coalesce queries of currently active transaction on the scope.
This is possible because mDNS actually allows many questions in a
to be sent in a single packet and it takes some burden from the
network.
b) Both A and AAAA query keys are broadcast on both IPv4 and IPv6
scopes, because other hosts might only respond on one of their
addresses but resolve both types.
c) We discard previously sent packages (t->sent) so we can start over
and coalesce pending transactions again.
|