In production we've had several incidents over the years where a process
has a signal handler registered for SIGHUP or one of the SIGUSR signals
which can be used to signal a request to reload configs, rotate log
files, and the like. While this may seem harmless enough, what we've
seen happen repeatedly is something like the following:
1. A process is using SIGHUP/SIGUSR[12] to request some
application-handled state change -- reloading configs, rotating a log
file, etc;
2. This kind of request is deprecated and removed, so the signal handler
is removed. However, a site where the signal might be sent from is
missed (often logrotate or a service manager);
3. Because the default disposition of these signals is terminal, sooner
or later these applications are going to be sent SIGHUP or similar
and end up unexpectedly killed.
I know for a fact that we're not the only organisation experiencing
this: in general, signal use is pretty tricky to reason about and safely
remove because of the fairly aggressive SIG_DFL behaviour for some
common signals, especially for SIGHUP which has a particularly ambiguous
meaning. Especially in a large, highly interconnected codebase,
reasoning about signal interactions between system configuration and
applications can be highly complex, and it's inevitable that on occasion
a callsite will be missed.
In some cases the right call to avoid this will be to migrate services
towards other forms of IPC for this purpose, but inevitably there will
be some services which must continue using signals, so we need a safe
way to support them.
This patch adds support for the -H/--require-handler flag, which matches
on processes with a userspace handler present for the signal being sent.
With this flag we can enforce that all SIGHUP reload cases and SIGUSR
equivalents use --require-handler. This effectively mitigates the case
we've seen time and time again where SIGHUP is used to rotate log files
or reload configs, but the sending site is mistakenly left present after
the removal of signal handler, resulting in unintended termination of
the process.
Signed-off-by: Chris Down <chris@chrisdown.name>
procps 3.3.17 the c option changed the command/args field
to cmd but this got removed as part of newlib
Functionality is back in with a test case.
References:
https://bugs.debian.org/1026326
Signed-off-by: Craig Small <csmall@dropbear.xyz>
When the skill program was ported to the new API the code to filter
on PID, used by the -p option, was missed. It is now restored.
References:
https://bugs.debian.org/1025915
Previously the match for shmid was \d+ but the variable is printed
as a hex number, updated the regex to suit.
Added some changes for pmap test so if the test_shm process fails
we just skip past it.
Signed-off-by: Craig Small <csmall@dropbear.xyz>
Created a test process test_shm that allocates a shared memory
segment and prints the segment ID. pmap testsuite runs pmap to
check that the segment is found.
The value returned by shmget() is the same value that is printed
in the fifth column /proc/<PID>/maps
Signed-off-by: Craig Small <csmall@dropbear.xyz>
When the globbing update was put into sysctl, you could no longer
simply use the keys because one key could potentially be
multiple paths once the glob expansion occured. Using the path
instead gave a unique output.
Except certain programs, such as salt, expected the output to use
the dotted path "kernel.hostname" and not "kernel/hostname".
We can no longer use the original key, so now for each path:
Copy the path
strip off /proc/
convert all / to .
The sysctl testsuite was also updated to check for a few different
types of conversion failures.
References:
commit 6389deca5bhttps://www.freelists.org/post/procps/some-procpsn4400-fixes,4https://repo.saltproject.io/
Signed-off-by: Craig Small <csmall@dropbear.xyz>
Sometimes due to race conditions or the way dejagnu gates the
output, or even there is another interesting process, the ps tests
sometimes fail. These changes make it a little more lenient without
losing the purpose of the test.
Replaces Debian patch ps_tests
References:
https://salsa.debian.org/debian/procps/-/blob/debian/2%253.3.17-7/debian/patches/ps_checks
In some build systems, such as the Debian pbuilders, the
environment is strange. The tty is called "TTY" which causes
some of the ps tests to fail.
This commit checks for that specific result and returns ""
so the tests can be bypassed.
Replaces Debian patch fix_checks.
References:
https://salsa.debian.org/debian/procps/-/blob/debian/2%253.3.17-7/debian/patches/fix_checks
systemd-sysctl handles glob patterns along with overrides and
exceptions. Now the procps sysctl does it too.
The return value for sysctl is consistently either 0 or 1.
Added tests to check sysctl functions.
References:
procps-ng/procps#191
Signed-off-by: Craig Small <csmall@dropbear.xyz>
vmstat -d testsuite will fail if your /proc/diskstats is present
but zero length. While this seems buggy behaviour from lxcfs, its
there and its a simple matter to test for it and skip those tests
if we are run on a zero length /proc/diskstats system.
If you run slabtop with the -d option and then -o option the
delay gets set to zero and it runs forever. slabtop now checks
for this combination and errors.
Adding a DEJAGNU test also found that none of the slabtop
checks were running so they got added to the list and only the
ones that need /proc/slabinfo (if not readable) are skipped.
References:
#160
The referenced commits enavled both pkill and kill to send an integer to
the killed or signalled process. The test_process now will report on the
integer if sent and the testsuite changes take advantage of this
new feature.
Another process make/destroy set had to be made as using spawn
instead of exec changes both the SID and TTY for the underlying
process, making other tests fail.
References:
commit 7d55409b82
commit 2b804a532a
The previous commit had one minor bug in it because the fields need
to be alphabetical and times comes after timeout.
Added NEWS item for this feature
Added another testsuite check for new flags in case they
disappear or go strange one day.
References:
commit 8a94ed6111
The previous two patches updated free, but needed a tweak and the tests
also needed to be updated. I've hand-calculated the results using bc and
both the testsuite and bc results equal what free prints out.
References:
commit 9365be7633procps-ng/procps#45
This was supposed to be just a cherry-pick of the referenced
commit. However there were two problems:
1. kill code was moved out to its own file
2. strtosig() had a latent bug where signal numbers were not
converted to names.
Original note:
kill -lHUP would work correctly, but kill -l HUP would not.
The list option in kill was hit by a quirk of getopt_long where an
option with an optional argument would not attempt to get the argument
beyond the space, even though a mandatory argument would do that.
The fix is a kludge to scan to the next argument and if it looks
like something we can use, use it. Lucky for us, the list option is
one where parsing can stop immediately.
Thanks to Brian Vandenberg for the way forward.
References:
http://stackoverflow.com/questions/1052746/getopt-does-not-parse-optional-arguments-to-parametershttps://bugs.debian.org/854407
commit 537cea324b121f54744369425332c256aa84a181
This avoids situations where longer regex which matches short-named proc
is used. Test for pgrep updated.
This is the newlib update of 5d12be1b7e8cc690a4d8778754aae5db4c07db2b
Signed-off-by: Craig Small <csmall@enc.com.au>
Add a warning if you specify a command over 15 characters and don't
use the -f command.
This is a pick of two patches from master:
24fd260 pgrep: Fix off by one error in line check
4a7f9fc pgrep - adds warning that pattern exceeds 15 chars without
References:
!25
kill -l SIGHUP (or any other signal-name prefixed with "SIG")
would cause free() to be called with a bad pointer instead of
a pointer to what was allocated. Fix this and add test-case.
Previous commit fixed pkill for trailing garbage on pkill
signal when it was an integer. This check now ensures that
pkill complains about it.
References:
commit a3975a9c60
On some setups the signals count can change and be truncated. You
will notice this because the number will have "<" prepended. The
testsuite didn't handle this.
You could either get:
BLOCKED BLOCKED BLOCKED CAUGHT
CAUGHT CATCHED
0000000000000000 0000000000000000 0000000000000000 00000001f3d1fef9 00000001f3d1fef9 00000001f3d1fef9
or
BLOCKED BLOCKED BLOCKED CAUGHT CAUGHT CATCHED
00000000 00000000 00000000 <f3d1fef9 <f3d1fef9 <f3d1fef9
testsuite would fail if /proc/vmstat was unreadable.
Issue #3 brought up by Mike Frysinger.
test script explicitly checks to see if it is readable and
sets these tests to unsupported if not.
For the test suite, procps used to use sleep which would just
create a process or two to test the tools against. Some setups
coreutils creates all programs including sleep into one blob which
means a lot of the tests fail, see issue #2
procps has its own sleep program now.
Before this commit, the test checking `vmstat -m` (slabinfo) output uses
a fairly strict regular expression that only allows alphanumeric
characters and a few exceptions such as "_", "-", "(" and ")".
However, recent kernels use a wider range of characters, such as ">".
For instance, see this Linux commit which creates a "page->ptl" slab:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/memory.c?id=b35f1819acd9243a3ff7ad25b1fa8bd6bfe80fb2#n4283
Other patches for reporting slab usage per memcg include the names of
the cgroup in the slabinfo output, which can include additional
characters and use dots for abbreviation.
The check should not be so string, instead it could simply look for a
chain of non-whitespace characters and that should be enough.
Tested that `make check` is still working, including in some of the
environments where features that enable the additional slabinfo names.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Craig Small <csmall@enc.com.au>
The current regexp checks for a \s+ in the beginning, however that will
only match if there is a \n in the `ps` output before test-schedbatch,
but that will not happen if test-schedbatch is the first process in the
list, which happens if the PID of test-schedbatch is low enough to bring
it up in the sorted list.
Fix it by enabling newline-sensitive matching with (?n) which then
allows using ^ and $ anchors in the regexp (including an optional \r
introduced by expect.) Matching the end of line also improves checking
that the last field matches 18 exactly and not something like 181, etc.
Tested that `make check` does not break and also fixed the flakiness
seen in an environment with few processes running under the test user
which made the issue more frequent.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Craig Small <csmall@enc.com.au>
Right now the test case is both testing it (expect_pass "$test") and
marking it as untested (untested "$test"), it should do either one or
the other, so stop marking it as untested.
Before this change, these lines appear in testsuite/ps.log or the output
of `make check RUNTESTFLAGS="--all"`:
PASS: ps SCHED_BATCH scheduler
UNTESTED: ps SCHED_BATCH scheduler
Note that the second line is confusing, it's implying that the test is
untested, right after having tested it and indicated it passes.
After this change, only the first line will appear.
Tested that both `make check` and `make distcheck` continue working with
this commit.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Craig Small <csmall@enc.com.au>
On most systems the only process with a SID=1 is init
and certainly not a test sleep. On docker systems this
test program IS on SID=1 and so our "impossible SID" becomes
possible.
The ps sched test has been disabled. There are too many
odd build farms this fails in strange ways.
Other odd build farms have no tty and so some tests check
for no tty and skip if not found.
Free always used 1024 based units but used the confusing old style
kilo,mega etc.
This change changes the names to kibi,mebi for 1024 based divisors
and kilo,mega for 1000 based divisors or IEC units.
It also checks if you try to set two units, e.g free -k -m
Petabyte and Pebibyte have been added.
If you used to use the long options such as --mega these will now
actually print megabytes (they previously printed mebibytes).
The short options are being used on the IEC units
References: https://www.gitorious.org/procps/procps/merge_requests/38
Signed-off-by: Craig Small <csmall@enc.com.au>
free got a makeover to suit the newer kernel memory management.
This commit updates the tests to follow the new output for free
Signed-off-by: Craig Small <csmall@enc.com.au>
It seems command -v also includes built-ins so checking for kill
is useless because it finds the built-in and those machines or
environments that have no /bin/kill fail at the check stage.
Oh and then TCL exec doesn't spawn a shell.
After reading way too many TCL websites, I believe this should
fix the problem. TCL quoting is... different to say the least but
it works reliably here. The script now even picked up a typo elsewhere
which was nice.
This change should stop the intermittent FTBFS bugs from the Debian
pbuilders, I hope! You'd think kill $var wouldn't be this difficult.
vmstat -p checks used to fail on systems with odd
partition tables, including some Debian buildd servers.
This change limits what sort of test partitions are used,
otherwise the test is skipped.
There probably are other valid partitions, these can be added
later, if known.
Benno Schulenberg suggested some changes to the help messages
to provide some consistency and clarity for both the users and
translators of procps.
The test needed to be updated as the pmap output changed too.
Signed-off-by: Craig Small <csmall@enc.com.au>
vmstat -d or vmstat -p would crash mysteriously under different
circumstances. The problem was eventually tracked down to /sys not
being mounted which meant is_disk() always returned false.
The partition would then be attempted to be linked to a non-existent
disk causing a segfault.
vmstat will now not link to a disk if none exists.
The change in testing will skip those tests when /sys/block doesn't
exist.
Many thanks to Daniel Schepler for his analysis and suggestions.
Bug-Debian: http://bugs.debian.org/736628