arp: Fix case where changing interface properties consistently fails.

If changing interface properties fails after getting a lease, it is
possible under some strange conditions for the failure to be
persistent.  This seems to happen if the carrier cycles off and on
several times during ndhc initialization.

Since this issue is very hard to replicate, the most conservative
thing to do here is to simply have ndhc terminate itself (via
suicide()) so that a process supervisor can respawn it.
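
As a rough illustration of the fail-fast idea, here is a minimal sketch in
C.  The names fatal_message() and die_for_respawn() are hypothetical and
are not ndhc's actual suicide() implementation.  The sketch only assumes
that the process runs under a supervisor that restarts it after a nonzero
exit.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a fatal message into buf (at most len bytes,
 * NUL-terminated when len > 0).  Returns whatever vsnprintf returns. */
int fatal_message(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical fail-fast helper (an assumption, not ndhc code): log the
 * error and exit with a failure status instead of returning an error code
 * to a caller that would retry with the same possibly-bad state. */
void die_for_respawn(const char *ifname, const char *what)
{
    char msg[256];
    fatal_message(msg, sizeof msg, "%s: %s", ifname, what);
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE); /* nonzero status so the supervisor sees a failure */
}
```

A caller would invoke something like die_for_respawn("wan0", "Failed to
set the interface IP address and properties!") where it previously logged
a warning and returned a failure code.  The tradeoff is that all in-memory
state is discarded, which is the point here: the respawned process starts
from a clean slate.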

Logs of the issue in practice:

(carrier is down while the daemon is started here, it seems)

16:57:09.638979845 ndhc-ifch seccomp filter installed.  Please disable seccomp if you
16:57:09.638989136 Discovering DHCP servers...
16:57:09.638991371 (send_dhcp_raw) carrier down; sendto would fail
16:57:09.638993318 Failed to send a discover request packet.
 ...
16:57:13.636519925 Discovering DHCP servers...
16:57:13.651462476 Received IP offer: X from server Y via Z
 ...
16:57:13.912592571  wan0: Gateway router set to: A
16:57:13.912607463  wan0: arp: Searching for dhcp server and gw addresses...
16:57:14.635532676  wan0: Carrier down.
17:04:32.983897760  wan0: arp: Still looking for gateway hardware address...
17:04:32.984158226  wan0: arp: Still looking for DHCP agent hardware address...
17:04:32.984781255  wan0: Interface is back.  Revalidating lease...
17:04:32.985585501  wan0: arp: Gateway hardware address B
17:04:32.985590436  wan0: arp: DHCP agent hardware address C
17:04:38.234857403  wan0: arp: Still waiting for gateway to reply to arp ping...
17:04:38.235109016  wan0: arp: Still waiting for DHCP agent to reply to arp ping...
16:57:24.165620224  wan0: arp: Still waiting for gateway to reply to arp ping...
16:57:29.169621070  wan0: arp: DHCP agent and gateway didn't reply.  Getting new lease.
16:57:29.217710616  wan0: Discovering DHCP servers...
16:57:29.249645130  wan0: Received IP offer: X from server Y via Z
16:57:29.249657203  wan0: Sending a selection request for X...
16:57:29.285632973  wan0: Received ACK: X from server Y via Z
16:57:29.297717159  wan0: arp: Probing for hosts that may conflict with our lease...
16:57:29.360249458  wan0: arp: Probing for hosts that may conflict with our lease...
16:57:29.435114526  wan0: arp: Probing for hosts that may conflict with our lease...
16:57:29.500473345  wan0: Lease of X obtained.  Lease time is D seconds.
16:57:29.500485894  wan0: Failed to set the interface IP address and properties!
 ...
And the final two errors repeat.  Restarting ndhc by hand instantly
fixes the issue.

So there's a lot going on -- bizarre clock skew (the timestamps jump
from 16:57 to 17:04 and back again), and carrier flickering on and off.
Nicholas J. Kain 2015-10-28 20:20:21 -04:00
parent e0b5ff8eaf
commit ae16e26d00


@@ -499,9 +499,8 @@ int arp_collision_timeout(struct client_state_t cs[static 1], long long nowts)
     garp.last_conflict_ts = 0;
     garp.wake_ts[AS_COLLISION_CHECK] = -1;
     if (ifchange_bind(cs, &garp.dhcp_packet) < 0) {
-        log_warning("%s: Failed to set the interface IP address and properties!",
-                    client_config.interface);
-        return ARPR_FAIL;
+        suicide("%s: Failed to set the interface IP address and properties!",
+                client_config.interface);
     }
     cs->routerAddr = get_option_router(&garp.dhcp_packet);
     if (arp_get_gw_hwaddr(cs) < 0) {