ARP packets aren't split across multiple receive events, so
reply_offset is pointless, and we implicitly assume that the
previous ARP packet data is still available after a forced sleep.
We instead check carrier status as needed. This approach is more
robust. For a simple example, imagine link state changes that happen
while the machine is suspended.
There was no way to disable writing pidfiles before.
pidfiles are an unreliable method of tracking processes, anyway; process
supervisors are strongly recommended. If a pidfile is really needed, it
can be explicitly specified.
Before the exit code would be success and no error message would print,
and it required a bit of control flow tracing to determine what would
actually happen.
No direct functional change (unless the supervising process cares about
the return code of ndhc on exit).
Lease times and arp timeouts are all calculated using long long,
but epoll takes its timeout argument as an int. Guard against
timeouts > INT_MAX but < UINT_MAX wrapping and causing spuriously
short timeouts when converted to a signed int.
This problem has been observed in the wild. Thanks to thypon
for a detailed strace that pointed me towards this issue.
This corrects a bug where stale dhcp packets would get reprocessed,
causing very bad behavior; an issue that was introduced in the
coroutine conversion.
If ndhc were a high-performance program that handled lots of events,
this change would harm performance. But it is not, and it implicitly
believes that events come in one at a time. Processing batches
would make it harder to assure correctness while also never allocating
memory at runtime.
The previous structure was fine when everything was handled
immediately by callbacks, but it isn't now.
This change makes it much easier to reason about ndhc's behavior
and properly handle errors.
It is a very large changeset, but there is no way to make this
sort of change incrementally. Lease acquisition is tested to
work.
It is highly likely that some bugs were both introduced and
squashed here. Some obvious code cleanups will quickly follow.
This change paves the way for allowing ifch to notify the core ndhc
about failures. It would be far too difficult to reason about the
state machine if the requests to ifch were asynchronous.
Currently ndhc assumes that ifch requests never fail, but this
is not always true because of eg, rfkill.
In order for this to work, the correct rfkill index must be specified
with the rfkill-idx option.
It might be possible to auto-detect the corresponding rfkill-idx option,
but I'm not sure if there's a guaranteed mapping between rfkill name and
interface name, as it seems that rfkills should represent phy devices
and not wlan devices.
The rfkill indexes can be found by checking
/sys/class/rfkill/rfkill<IDX>.
AF_UNIX SOCK_STREAM sockets between the master processes and each subprocess,
and poll for the HUP event.
At the same time, be specific about the events that are checked in epoll
when dispatching on an event.
The pipes wouldn't do this job anymore because they were unused and thus
never performed writes that would generate SIGPIPEs, so the pipes are
removed, too.