One of our physical machines shows that the "CACHE SIZE" column
of slabtop output is extremely high, three times the product
of the object count and object size. After some analysis, we found
that the order of a slab, which decides "pages per slab", will shrink
when memory pressure is high and the normal order allocation fails.
So we think it might help to add these comments to the man page.
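The two quantities being compared can be sketched as follows. This is only an illustration: it assumes the /proc/slabinfo 2.x column layout, a 4096-byte page, and that the cache size is derived from slab count times pages per slab. The function name and the sample line are hypothetical.

```python
def slab_sizes(line, page_size=4096):
    """Return (cache_bytes, object_bytes) for one /proc/slabinfo data line.

    Assumed layout:
    name active_objs num_objs objsize objperslab pagesperslab
      : tunables ... : slabdata active_slabs num_slabs sharedavail
    """
    head, _, tail = line.partition(": slabdata")
    fields = head.split()
    num_objs = int(fields[2])
    objsize = int(fields[3])
    pages_per_slab = int(fields[5])
    num_slabs = int(tail.split()[1])
    # Size as derived from whole slabs (pages actually reserved).
    cache_bytes = num_slabs * pages_per_slab * page_size
    # Size as derived from the objects themselves.
    object_bytes = num_objs * objsize
    return cache_bytes, object_bytes


# Hypothetical sample line, not taken from a real machine.
sample = "dentry 1000 1000 192 21 1 : tunables 0 0 0 : slabdata 48 48 0"
cache_bytes, object_bytes = slab_sizes(sample)
print(cache_bytes, object_bytes)
```

If the reported pages per slab does not match the order the slabs were actually allocated with, the first figure can drift well away from the second.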
Minor fix: add the "memory." back, which was lost after
"aa461df0: docs: Minor manpage fixes"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Well this is embarrassing. After repeatedly flogging a
horse (represented by issue #274) I was certain it was
dead. But, it turns out that the darn thing yet lived.
In fact, the bug that was patched was not even the one
the poster experienced. Now merge request #173 finally
penetrated my foggy brain and explicated the real bug.
Since forever (linux 2.6), top has ignored those guest
and guest_nice fields in /proc/stat. When many virtual
machines were running that overhead went unrecognized.
So, this commit simply adds those tics to the 'system'
figures so that it can be seen in text or graph modes.
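A rough sketch of the accounting idea, not top's actual code: it assumes the field order documented for the cpu lines of /proc/stat, and folding the guest values into the plain 'system' counter is an assumption made for illustration. The helper names and the sample line are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Map one 'cpu ...' line from /proc/stat to named tic counters."""
    values = [int(v) for v in line.split()[1:]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def system_with_guest(tics):
    """Fold guest and guest_nice into the system figure (assumed policy)."""
    return tics["system"] + tics["guest"] + tics["guest_nice"]


# Hypothetical sample, not real measurements.
sample = "cpu 100 5 30 800 10 2 3 0 40 1"
tics = parse_cpu_line(sample)
print(system_with_guest(tics))
```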
Reference(s):
https://gitlab.com/procps-ng/procps/-/merge_requests/173
https://gitlab.com/procps-ng/procps/-/issues/274
. Mar 2023, avoid keystroke '%Cpu' distortions
commit 7e33fc47c6
Signed-off-by: Jim Warner <james.warner@comcast.net>
When pgrep was used to match on signal, it makes sense to use
the same signal parsing code as pkill. Unfortunately the
"find the signal" part is a little too enthusiastic about what a
signal is, meaning
pgrep -u -42
fails because the signal becomes "42" and then there is no UID.
This is a bit sad for pkill but has been that way for a long
time. For pgrep this is new so now only the long form
pgrep --signal <X>
will work.
In addition, when using --signal if pgrep/pkill couldn't work
out what the signal was it just silently ignored it. It now
complains and aborts.
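The ambiguity can be shown with a toy argument scanner. This is not the pgrep/pkill parser: both functions are hypothetical, only numeric signals are handled, and the point is simply that a greedy "-<number>" match steals the value meant for -u, while the long form does not.

```python
def split_short_signal(argv):
    """Toy greedy scan: the first '-<digits>' argument becomes the signal."""
    for i, arg in enumerate(argv):
        if len(arg) > 1 and arg[0] == "-" and arg[1:].isdigit():
            return int(arg[1:]), argv[:i] + argv[i + 1:]
    return None, list(argv)


def take_long_signal(argv):
    """Toy long form: only '--signal <X>' is accepted as a signal.

    An unparsable value raises instead of being silently ignored.
    """
    if "--signal" not in argv:
        return None, list(argv)
    i = argv.index("--signal")
    if i + 1 >= len(argv) or not argv[i + 1].isdigit():
        raise ValueError("unknown or missing signal")
    return int(argv[i + 1]), argv[:i] + argv[i + 2:]


# The greedy scan eats the UID argument, leaving '-u' with no value.
print(split_short_signal(["-u", "-42"]))
# The long form leaves '-u -42' untouched.
print(take_long_signal(["--signal", "9", "-u", "-42"]))
```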
References:
https://bugs.debian.org/1031765
commit 866abacf88
This patch just follows Craig's lead for the remaining
ps and top program files and associated man documents.
Signed-off-by: Jim Warner <james.warner@comcast.net>
The lstart field has been converted to use the strftime()
function so that it uses the locale. A new option -D
allows the user to define the format they want this
field to show.
This may mean the field will be longer than it should be,
especially for French locales and the user defined field,
but the field length can be specified too.
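As a small illustration of locale-aware formatting with a user-supplied format, here is a sketch using Python's wrapper around strftime(). The function names are hypothetical and the format strings are only examples; this is not how ps implements the field.

```python
import locale
import time


def use_environment_locale():
    """Adopt the user's LC_TIME setting; keep the default if unsupported."""
    try:
        locale.setlocale(locale.LC_TIME, "")
    except locale.Error:
        pass


def format_start(epoch_seconds, fmt="%c"):
    """Format a start time with strftime(); '%c' follows the locale."""
    return time.strftime(fmt, time.localtime(epoch_seconds))


use_environment_locale()
start = time.time()
# Locale default, then a user-defined format.
print(format_start(start))
print(format_start(start, "%Y-%m-%d %H:%M:%S"))
```

The width of the first line depends on the locale, which is why the field length may need to be given explicitly.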
---
This commit started off making all the relevant fields use the
locale correctly so it could solve #226 as well. The issue
is that there is an implied restriction (or not) around
strftime("%b") and probably strftime("%a") for abbreviated month
and day names respectively.
English, and some/most other languages put an additional
restriction that all abbreviations are 3 characters long.
The problem is, not all languages do this.
French is a good example:
janv. févr. mars avril mai juin juil. août sept. oct. nov. déc.
Maybe strip the . at the end?
That helps for some months, not all
Maybe take the first three characters?
Several wide languages will have big issues
Maybe convert wide, get wcslen then use that.
Even after that June "juin" and July "juil" are both "jui".
So, anything that uses the month (bsdstart,start) uses ctime, which
doesn't use locale. That solves the length issue.
stime does, which means it has this issue, but it's been like that
for years. You get stuff like this:
janv.13 482261
00:00 1151918
2022 1458628
06:12 1957584
The only way to fix that would be to
a) Make the field two characters longer
b) Convert it back to ctime(), which means everyone else
loses.
This could have been oh-so easy if everyone made %b and %a three
(wide) characters everywhere.
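The truncation problem described above can be checked with the French abbreviations quoted in this message. The helper name is hypothetical; the sketch strips the trailing dot, cuts to three characters, and reports which months collide.

```python
from collections import defaultdict

# Abbreviated French month names as listed above.
FRENCH_ABBR = ("janv. févr. mars avril mai juin juil. "
               "août sept. oct. nov. déc.").split()


def collisions(names, width=3):
    """Group names by their first `width` characters, dot stripped."""
    groups = defaultdict(list)
    for name in names:
        groups[name.rstrip(".")[:width]].append(name)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


# Lengths differ even after removing the dot.
print(sorted({len(name.rstrip(".")) for name in FRENCH_ABBR}))
# Months that become indistinguishable after truncation.
print(collisions(FRENCH_ABBR))
```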
References:
procps-ng/procps#228
procps-ng/procps#226
Signed-off-by: Craig Small <csmall@dropbear.xyz>
If you have the watched program doing some other thing every time it's
run and you resize the window, you might get unexpected results. The
-r option lets you run only when the interval has expired.
References:
procps-ng/procps!125
procps-ng/procps#190
Updated the definition of total, because it's not *all* of
the installed memory but close to it.
References:
procps-ng/procps#247
Signed-off-by: Craig Small <csmall@dropbear.xyz>
In production we've had several incidents over the years where a process
has a signal handler registered for SIGHUP or one of the SIGUSR signals
which can be used to signal a request to reload configs, rotate log
files, and the like. While this may seem harmless enough, what we've
seen happen repeatedly is something like the following:
1. A process is using SIGHUP/SIGUSR[12] to request some
application-handled state change -- reloading configs, rotating a log
file, etc;
2. This kind of request is deprecated and removed, so the signal handler
is removed. However, a site where the signal might be sent from is
missed (often logrotate or a service manager);
3. Because the default disposition of these signals is terminal, sooner
or later these applications are going to be sent SIGHUP or similar
and end up unexpectedly killed.
I know for a fact that we're not the only organisation experiencing
this: in general, signal use is pretty tricky to reason about and safely
remove because of the fairly aggressive SIG_DFL behaviour for some
common signals, especially for SIGHUP which has a particularly ambiguous
meaning. Especially in a large, highly interconnected codebase,
reasoning about signal interactions between system configuration and
applications can be highly complex, and it's inevitable that on occasion
a callsite will be missed.
In some cases the right call to avoid this will be to migrate services
towards other forms of IPC for this purpose, but inevitably there will
be some services which must continue using signals, so we need a safe
way to support them.
This patch adds support for the -H/--require-handler flag, which matches
on processes with a userspace handler present for the signal being sent.
With this flag we can enforce that all SIGHUP reload cases and SIGUSR
equivalents use --require-handler. This effectively mitigates the case
we've seen time and time again where SIGHUP is used to rotate log files
or reload configs, but the sending site is mistakenly left present after
the removal of the signal handler, resulting in unintended termination of
the process.
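One way to check for a userspace handler from outside the process is the SigCgt mask in /proc/<pid>/status, where bit (signum - 1) is set for each caught signal. The sketch below assumes that approach; whether the flag is implemented exactly this way is not stated here, and both function names are hypothetical.

```python
import os
import signal


def catches_signal(sigcgt_hex, signum):
    """True if the hex SigCgt mask has the bit for `signum` set."""
    mask = int(sigcgt_hex, 16)
    return bool(mask & (1 << (int(signum) - 1)))


def read_sigcgt(pid):
    """Return the SigCgt field of /proc/<pid>/status, or None if absent."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("SigCgt:"):
                return line.split()[1]
    return None


if os.path.exists("/proc/self/status"):
    own_mask = read_sigcgt(os.getpid())
    if own_mask is not None:
        print(catches_signal(own_mask, signal.SIGHUP))
```

A sender could then skip any process for which the check is false, rather than delivering a signal whose default disposition would terminate it.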
Signed-off-by: Chris Down <chris@chrisdown.name>
The man page said it cannot show changes to comm, such as when you
use prctl(). In fact, ps can see this. The args field may not change
because it's due to the path of the executable but comm can.
The comm field no longer shows defunct for zombie processes; use the
state field for this, as it could be obscured if not the last
column anyhow.
Signed-off-by: Craig Small <csmall@dropbear.xyz>
This patch just supplements the previous series with a
few minor tweaks representing some diverse objectives:
. a recent date for man page (which I always overlook)
. improved length calculations to maximize graph width
. a proper response to platforms with less than 8 cpus
. more consistency and readability with one blank line
Signed-off-by: Jim Warner <james.warner@comcast.net>
pgrep and friends naturally filter their own processes from their
matches. The same issue can occur when elevating with tools like sudo or
doas, where the elevating shim layers linger as a parent and are
returned in the results. For example:
% sudo pkill -9 -cf someelevatedcmdline
1
zsh: killed sudo pkill -9 -cf someelevatedcmdline
This is a situation we've actually seen in production, where some poor
soul changes how permission management works (for example with Linux's
hidepid option), needs to elevate a pgrep or pkill call, and now ends up
with more than they bargained for. Even after the issue is noticed,
resolving it requires reinventing some of the pgrep logic, which is
unfortunate.
This commit adds the -A/--ignore-ancestors option which excludes pgrep's
ancestors from the results:
% sudo ./pkill -9 -Acf someelevatedcmdline
0
We look at multiple layers of the process hierarchy because, while
things like sudo only have one layer of shimming, some mechanisms (like
those found in typical container managers such as Docker or
Kubernetes) may have many more.
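The multi-layer walk can be sketched like this. It is an illustration only: the PID-to-parent mapping is passed in as a plain dictionary, and the function names and PIDs are hypothetical rather than taken from the pgrep source.

```python
def ancestors(pid, parent_of):
    """Return every ancestor of `pid`, nearest first, following PPIDs."""
    chain = []
    current = parent_of.get(pid, 0)
    # Stop at PID 0, at an unknown PID, or if the table contains a loop.
    while current > 0 and current not in chain:
        chain.append(current)
        current = parent_of.get(current, 0)
    return chain


def filter_matches(matches, self_pid, parent_of):
    """Drop our own PID and all of our ancestors from the match list."""
    skip = set(ancestors(self_pid, parent_of)) | {self_pid}
    return [pid for pid in matches if pid not in skip]


# Hypothetical tree: init(1) -> shell(100) -> sudo(200) -> pkill(300),
# plus an unrelated matching process (400).
parent_of = {1: 0, 100: 1, 200: 100, 300: 200, 400: 1}
print(ancestors(300, parent_of))
print(filter_matches([200, 300, 400], 300, parent_of))
```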
Signed-off-by: Chris Down <chris@chrisdown.name>