Fix a race between "rndc reconfig" and waiting for a ./DNSKEY fetch to complete

Calling nextpart() after reconfiguring ns1 is not safe, because the
expected log message may appear in ns5/named.run before nextpart() is
run.  With the TTL for ./DNSKEY set to 20 seconds, ns5 will refresh it
after 10 seconds, by which time wait_for_log() will already have failed.
This results in a false negative.

However, just calling nextpart() before reconfiguring ns1 would
introduce a different problem: if ns5 refreshed ./DNSKEY between these
two steps, the subsequent wait_for_log() call would return immediately
as it would come across the log message about a failure while refreshing
./DNSKEY instead of the expected success.  This in turn would result in
a different false negative as the root key would still be uninitialized
by the time "rndc secroots" is called.

Prevent both kinds of false negatives by:

  - calling nextpart() before reconfiguring ns1, in order to prevent the
    first case described above,

  - looking for a more specific log message, in order to prevent the
    second case described above.

Also look for a more specific log message in the first part of the
relevant check, not to fix any problem, but just to emphasize that a
different fetch result is expected in that case.

With these tweaks in place, if a (failed) ./DNSKEY refresh is scheduled
between nextpart() and reconfiguring ns1, wait_for_log() will just wait
for two more seconds (one "hour"), at which point another refresh
attempt will be made that will succeed.

(cherry picked from commit 012ca0a27d)
This commit is contained in:
Michał Kępień 2018-03-06 11:33:16 +01:00
parent 479ba17e93
commit bdf0287c2d
2 changed files with 3 additions and 4 deletions

View file

@ -17,7 +17,6 @@ test -r $RANDFILE || $GENRANDOM 400 $RANDFILE
copy_setports ns1/named1.conf.in ns1/named.conf
copy_setports ns2/named.conf.in ns2/named.conf
copy_setports ns3/named.conf.in ns3/named.conf
copy_setports ns4/named.conf.in ns4/named.conf
copy_setports ns5/named.conf.in ns5/named.conf
cp ns5/named1.args ns5/named.args

View file

@ -685,7 +685,7 @@ rm -f ns5/managed-keys.bind*
cp ns5/named2.args ns5/named.args
nextpart ns5/named.run > /dev/null
$PERL $SYSTEMTESTTOP/start.pl --noclean --restart --port ${PORT} . ns5
wait_for_log "Returned from key fetch in keyfetch_done()" ns5/named.run
wait_for_log "Returned from key fetch in keyfetch_done() for '.': failure" ns5/named.run
# ns1 should still REFUSE queries from ns5, so resolving should be impossible
$DIG $DIGOPTS +noauth example. @10.53.0.5 txt > dig.out.ns5.a.test$n || ret=1
grep "flags:.*ad.*QUERY" dig.out.ns5.a.test$n > /dev/null && ret=1
@ -694,9 +694,9 @@ grep "status: SERVFAIL" dig.out.ns5.a.test$n > /dev/null || ret=1
# Allow queries from ns5 to ns1
copy_setports ns1/named3.conf.in ns1/named.conf
rm -f ns1/root.db.signed.jnl
mkeys_reconfig_on 1
nextpart ns5/named.run > /dev/null
wait_for_log "Returned from key fetch in keyfetch_done()" ns5/named.run
mkeys_reconfig_on 1
wait_for_log "Returned from key fetch in keyfetch_done() for '.': success" ns5/named.run
# ns1 should not longer REFUSE queries from ns5, so managed keys should be
# correctly refreshed and resolving should succeed
$DIG $DIGOPTS +noauth example. @10.53.0.5 txt > dig.out.ns5.b.test$n || ret=1