If carrier is lost before network fingerprinting is complete, we
have a few problems. First, we don't know whether the network has
changed underneath us. Second, we've not yet configured the
interface properties, and doing so may well fail, as the underlying
network device may have been destroyed and recreated in the meantime
(e.g., if ethtool was run at start-up).
Thus, the safest reaction is to terminate and force a supervisor
respawn. It is best to do this once carrier recovers rather than
when it is lost, since that is more likely to minimize delays.
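A minimal sketch of the "wait for carrier to recover, then exit" policy
described above. The struct, field, and function names are hypothetical,
not ndhc's; the carrier-up handler returns an action instead of calling
exit() directly so the decision can be checked in isolation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client state; ndhc's real structures differ. */
struct client_state {
    bool fingerprint_done;   /* network fingerprint has been recorded */
    bool carrier_lost_early; /* carrier dropped before fingerprinting finished */
};

enum carrier_action { CARRIER_CONTINUE, CARRIER_EXIT_FOR_RESPAWN };

/* Carrier down: only note the condition. Exiting now would just have the
 * supervisor respawn us onto a link that is still down. */
static void on_carrier_down(struct client_state *cs)
{
    if (!cs->fingerprint_done)
        cs->carrier_lost_early = true;
}

/* Carrier up: if carrier was lost before fingerprinting completed, neither
 * the network nor the interface can be trusted to be the same, so ask the
 * caller to terminate and let the supervisor respawn us. */
static enum carrier_action on_carrier_up(const struct client_state *cs)
{
    return cs->carrier_lost_early ? CARRIER_EXIT_FOR_RESPAWN
                                  : CARRIER_CONTINUE;
}
```

A caller would do something like
`if (on_carrier_up(&cs) == CARRIER_EXIT_FOR_RESPAWN) exit(EXIT_FAILURE);`
and leave the restart to the process supervisor.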
The gateway/router MAC fingerprinting could perhaps be done more
robustly in the face of suspend or carrier loss, but the time window
in which things could get confused is very small, and I would rather
just rely on supervisor respawn in that case.
I don't think I've ever even seen this case occur.
Somewhere along the line it quit being set at the start of discovery
and was always 0. This is clearly not desired behavior.
Found by manual examination of packets while fuzzing the options
parser.
This fixes a bug, introduced in the coroutine conversion, where
stale DHCP packets would be reprocessed, causing very bad
behavior.
Some networks have multiple DHCP servers that don't respect the
serverid that is specified in DHCPREQUESTs.
Instead of requiring the reply to come from that server, we simply
check that the yiaddr matches the address we requested.
While we're at it, ignore DHCPNAK in the REQUESTING state; it
doesn't really make sense there. Instead, we should just wait for
the request to time out.
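A minimal sketch of reply handling in the REQUESTING state under the two
rules above: accept a DHCPACK whose yiaddr matches the requested address no
matter which server sent it, and ignore DHCPNAK. The struct and function
names are hypothetical; only the message type values (option 53) come from
RFC 2132.

```c
#include <assert.h>
#include <stdint.h>

/* DHCP message type values from RFC 2132, option 53. */
enum { DHCPACK = 5, DHCPNAK = 6 };

/* Hypothetical, simplified view of an already-parsed reply. */
struct dhcp_reply {
    uint8_t msg_type;
    uint32_t yiaddr;    /* address offered to us */
    uint32_t server_id; /* option 54; deliberately not checked */
};

enum reply_action { REPLY_IGNORE, REPLY_BIND };

/* Bind on an ACK for the address we asked for, whichever server sent it.
 * NAKs (and ACKs for other addresses) are ignored; the request timeout
 * handles the failure case. */
static enum reply_action requesting_handle_reply(const struct dhcp_reply *r,
                                                 uint32_t requested_ip)
{
    if (r->msg_type == DHCPACK && r->yiaddr == requested_ip)
        return REPLY_BIND;
    return REPLY_IGNORE;
}
```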
This change actually has no effect: incoming DHCP packets whose xid
differs from ours are dropped, so the xid is always changed to the
value it already has.
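To make the no-op concrete, here is a hypothetical reduction of the
receive path (names are illustrative, not ndhc's): the xid filter runs
before the assignment, so the assignment can only ever store a value equal
to the one already held.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header carrying only the field that matters here. */
struct dhcp_hdr {
    uint32_t xid;
};

static void process_packet(const struct dhcp_hdr *pkt, uint32_t *our_xid)
{
    if (pkt->xid != *our_xid)
        return;           /* foreign or stale xid: dropped */
    *our_xid = pkt->xid;  /* reached only when equal, so never changes it */
}
```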
This change makes it much easier to reason about ndhc's behavior
and properly handle errors.
It is a very large changeset, but there is no way to make this
sort of change incrementally. Lease acquisition has been tested
and works.
It is highly likely that some bugs were both introduced and
squashed here. Some obvious code cleanups will quickly follow.
If a packet send fails because the carrier went down without a
netlink notification, assume the hardware carrier was lost while
the machine was suspended (e.g., an Ethernet cable was pulled during
suspend). Simulate a netlink carrier-down event and freeze the DHCP
state machine until a netlink carrier-up event is received.
The ARP code does not yet handle this case everywhere, but the
window of opportunity for it to happen there is much shorter.
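A minimal sketch of the suspend-recovery logic, assuming a pair of flags
for the last netlink-reported carrier state and a "frozen" DHCP state
machine. All names are hypothetical; how a send failure is detected (which
socket, which errno values) is left out because it is not specified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical link and state-machine bookkeeping. */
struct link_state {
    bool carrier_up;  /* last carrier state we know of */
    bool dhcp_frozen; /* DHCP state machine paused */
};

static void handle_carrier_down(struct link_state *ls)
{
    ls->carrier_up = false;
    ls->dhcp_frozen = true;
}

/* Invoked for a real netlink carrier-up event. */
static void handle_carrier_up(struct link_state *ls)
{
    ls->carrier_up = true;
    ls->dhcp_frozen = false;
}

/* Called when sending a DHCP packet fails. If netlink never reported the
 * carrier going down, assume it was lost during suspend and simulate the
 * carrier-down event ourselves. */
static void on_send_failure(struct link_state *ls)
{
    if (ls->carrier_up)
        handle_carrier_down(ls);
}

/* Timers and retransmits are skipped while frozen.
 * Returns true if the state machine was allowed to run. */
static bool dhcp_tick(const struct link_state *ls)
{
    return !ls->dhcp_frozen;
}
```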
Mostly reverts the previous commit and instead teaches ndhc to properly
handle the case where it is communicating with a DHCP relay agent on
its local segment rather than directly with a DHCP server.
Before this change, network fingerprinting would never complete if the
DHCP server was on a different segment, since ARP is confined to the
local link and the ARP messages sent by ndhc could never reach the DHCP
server (and vice versa).
Now just give up trying to find the server's hardware address after
two tries and assume that the DHCP server cannot be reached by ARP.
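A sketch of the give-up rule, assuming the fingerprinting code calls a
step function once at the start and again on each ARP timeout. The
two-try limit is from the text above; the structure and names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ARP_SERVER_MAX_TRIES 2 /* "give up ... after two tries" */

/* Hypothetical fingerprinting state for the DHCP server's hardware address. */
struct server_fp {
    int tries;
    bool have_hwaddr; /* set when an ARP reply from the server arrives */
    bool unreachable; /* server assumed not reachable by ARP */
};

enum fp_step { FP_SEND_ARP, FP_DONE };

/* Decide whether to send another ARP request or stop. */
static enum fp_step server_fp_next(struct server_fp *fp)
{
    if (fp->have_hwaddr)
        return FP_DONE;
    if (fp->tries >= ARP_SERVER_MAX_TRIES) {
        fp->unreachable = true; /* e.g., server is behind a relay agent */
        return FP_DONE;
    }
    ++fp->tries;
    return FP_SEND_ARP;
}
```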
An alternative would be to fingerprint the relay agent instead, but
doing so would require a lot more work, as the giaddr field is only
meaningful in the client->server message path, not in the
server->client path. Thus it would require gathering the source IP
for DHCP replies sent by unicast or broadcast and ferrying along
this information to the ARP checking code where it would be used
in place of the DHCP server address.
This is entirely possible to do, but is quite a bit more work.
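For illustration only, the first step of that alternative (learning the
source IP of a DHCP reply) could look like the following, assuming the
reply arrives on an ordinary UDP socket; on a raw socket the address would
come from the IP header instead. The function name is hypothetical, and
passing the address on to the ARP checking code is not shown.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive one datagram and report the IPv4 address it was sent from.
 * Returns the payload length, or -1 on error or a non-IPv4 source. */
static ssize_t recv_with_source(int fd, void *buf, size_t len,
                                struct in_addr *src_out)
{
    struct sockaddr_in src;
    socklen_t srclen = sizeof src;
    ssize_t n = recvfrom(fd, buf, len, 0, (struct sockaddr *)&src, &srclen);

    if (n < 0)
        return -1;
    if (src.sin_family != AF_INET)
        return -1;
    if (src_out)
        *src_out = src.sin_addr; /* candidate ARP target instead of server id */
    return n;
}
```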