The preceding commit made that 'esc_all' function more
efficient by using direct pointer manipulation instead
of an indexed string reference approach within a loop.
This commit applies the same approach to the 'esc_ctl'
function. Now we'll save 12 more iterated instructions
while decreasing the function's code size by 43 bytes.
Signed-off-by: Jim Warner <james.warner@comcast.net>
While this patch has some cosmetic whitespace changes,
more importantly it makes that 'esc_all' function more
efficient. By abandoning the indexed loop approach for
a direct pointer manipulation, we will save 9 iterated
machine instructions, for a total of 33 bytes of code.
Signed-off-by: Jim Warner <james.warner@comcast.net>
Much of what was represented in the commit message for
the reference shown below was revisited in this patch.
It also means that the assertion in the last paragraph
of that message will only now be true with LANG unset.
[ and forget all the bullshit about not altering any ]
[ kernel supplied data. sometimes we must to avoid a ]
[ corrupt display due to a string we can not decode. ]
And while this commit still avoids the overhead of the
'mbrtowc', 'wcwidth' 'isprint, & 'iswprint' functions,
we achieve all the benefits with simple table lookups.
Plus such benefits are extended to additional strings.
For example, both PIDS_EXE and PIDS_CMD fields are now
also subject to being 'escaped'. If a program name did
contain multibyte characters, potential truncation may
corrupt it when it's squeezed into a 15/63 byte array.
Now, all future users of this new library only need to
deal with the disparities between string and printable
lengths. Such strings themselves are always printable.
[ the ps program now contains some unnecessary costs ]
[ with the duplicated former 'escape' functions. But ]
[ we retain that copied escape.c code for posterity. ]
[ besides, in a one-shot guy it's of little concern. ]
Note: Proper display of some multibyte strings was not
possible at the linux console. It would seem a concept
of zero length chars (like a 'combining acute accent')
is not recognized. Thus the display becomes corrupted.
But if utf8 decoding is disabled (via LANG=), then all
callers will now see '?', restoring correct alignment.
Reference(s):
. Dec 2020, newlib 'escape' logic refactored
commit a221b9084a
Signed-off-by: Jim Warner <james.warner@comcast.net>
This new library provides callers with pure strings or
string vectors. It is up to those callers to deal with
potential utf8 multibyte characters and any difference
between strlen and the corresponding printable widths.
So, it makes no sense for the library to go to all the
trouble of invoking those rather expensive 'mbrtowc' &
'wcwidth' functions to ultimately yield total 'cells'.
Thus, this patch will eliminate all the code and parms
that are involved with such possible multibyte issues.
[ Along the way we'll lose the ability to substitute ]
[ '?' for an invalid/unprintable multibyte sequence. ]
[ We will, however, replace ctrl chars with the '?'. ]
[ This presents no problem for that ps program since ]
[ it now duplicates all of the original escape code. ]
[ And, we'll no longer be executing that code twice! ]
[ As for the top program, it takes the position that ]
[ it is wrong to alter kernel supplied data. So with ]
[ potential invalid/unprintable stuff, he'll rely on ]
[ terminal emulators to properly handle such issues! ]
[ Besides, even using a proper multibyte string, not ]
[ all terminals generate the proper printable width. ]
[ This is especially true when it comes to an emoji. ]
[ And should callers chose not to be portable to all ]
[ locales by calling setlocale(LC_ALL, ""), they can ]
[ expect to see lots of "?", regardless of what this ]
[ library fixes in a faulty multibyte string anyway. ]
Signed-off-by: Jim Warner <james.warner@comcast.net>
In that commit referenced below, a promise was made to
revisit an 'escape_str' function in efforts to make it
private to the library. The problem was it's needed by
both ps plus the library which is why it was exported.
So, in an effort to remove it from libprocps.sym, this
patch duplicates all the required code in ps/output.c.
Now, each version can be made private to their caller.
[ along the way we'll use this opportunity to remove ]
[ the 'restrict' qualifiers from function parameters ]
[ while swatting a compiler warning referenced below ]
Reference(s):
. April 2016, most escape functions made private
commit d916d5db86
proc/escape.c: In function `escape_command':
proc/escape.c:182:23: warning: initialization of `const char **' from incompatible pointer type `char **' [-Wincompatible-pointer-types]
182 | const char **lc = (char**)pp->cmdline;
| ^
Signed-off-by: Jim Warner <james.warner@comcast.net>
There was a time when that procps.h file served a more
traditional role. Prior to the commit referenced below
it held just macros plus manifest constants. But, with
that change, such items were replaced with a series of
includes embracing all the library exported functions.
That approach was known to disguise errors which would
have otherwise yielded a compiler warning. And without
such a warning, there was no way to address the error.
So this patch will trade the all inclusive header file
approach for individual includes only where necessary.
Reference(s):
. April 2016, procps.h header file revamped
commit ccb6ae8de1
. Sept 2018, top abandoned use of procps.h
commit a6dfc2382e
Signed-off-by: Jim Warner <james.warner@comcast.net>
Thanks to Konstantin for discovering 2 problems in the
issue referenced below. That 15+ year old logic went a
little too far overboard wrestling with a utf8 string.
Henceforth, we will not treat 'x9b' as special. And we
also will handle a 'combining acute accent' correctly.
Reference(s):
https://gitlab.com/procps-ng/procps/-/issues/176
Signed-off-by: Jim Warner <james.warner@comcast.net>
This solves several problems:
1/ outbuf[1] was written to, but not outbuf[0], which was left
uninitialized (well, SECURE_ESCAPE_ARGS() already fixes this, but do it
explicitly as well); we know it is safe to write one byte to outbuf,
because SECURE_ESCAPE_ARGS() guarantees it.
2/ If bytes was 1, the write to outbuf[1] was an off-by-one overflow.
3/ Do not call escape_str() with a 0 bufsize if bytes == overhead.
4/ Prevent various buffer overflows if bytes <= overhead.
Simply rearrange the old comparisons. The new comparisons are safe,
because we know from previous checks that:
1/ wlen > 0
2/ my_cells < *maxcells (also: my_cells >= 0 and *maxcells > 0)
3/ len > 1
4/ my_bytes+1 < bufsize (also: my_bytes >= 0 and bufsize > 0)
This should never happen, because wcwidth() is called only if iswprint()
returns nonzero. But belt-and-suspenders, and make it visually clear
(very important for the next patch).
The SECURE_ESCAPE_ARGS() macro solves several potential problems
(although we found no problematic calls to the escape*() functions in
procps's code-base, but had to thoroughly review every call; and this is
library code):
1/ off-by-one overflows if the size of the destination buffer is 0;
2/ buffer overflows if this size (or "maxroom") is negative;
3/ integer overflows (for example, "*maxcells+1");
4/ always null-terminate the destination buffer (unless its size is 0).
---------------------------- adapted for newlib branch
. the escape.c now has just a single exported function
. thus SECURE_ESCAPE_ARGS() is needed in only 2 places
. unlike that original patch, macro is executed 1 time
( not like 'escape_command' calling 'escape_strlist' )
( which might then call 'escape_str' multiple times! )
Signed-off-by: Jim Warner <james.warner@comcast.net>
Profiling revealed a large amount of time spent in the
'escape_str_utf8' function (escape.c) with both of our
NLS branches (newlib and master). That same result was
not seen under an ancient top-3.2.8 program & library.
Well, the 3.2.8 result was ultimately explained by the
absence of a 'setlocale', necessary under NLS support.
Thus, when that ancient library tested for locale, all
it got was 'ANSI_...' & assumed 'UTF-8' wasn't active.
But after a hack to that ancient code to place it on a
par with newlib/master, I still found cost differences
that led me to revisit an old change referenced below.
It turns out that 'iswprint' costs far more than would
a call of 'isprint', even with the extra support code.
So this commit just reverts that five year old change.
Reference(s):
commit 7b0fc19e9d
Signed-off-by: Jim Warner <james.warner@comcast.net>
escaped_copy(): only appears in ps, moved to ps/output.c
escape_strlist() only used in escape.c made static
escape_command() used in library, made internal
procps.h no longer includes escape.h
escape_str() used by library and ps so needs to be exported
definition put into procps.h including the odd define required.
Far from ideal to have it this way, will look at it another time
to have it all in, all out or split nicer so its not in the API;
perhaps a lib/ file?
An earlier commit attempted to cleanse our environment
of all useless trailing whitespace. But the effort did
not catch 'empty' lines with a single space before ^J.
This commit hopefully finishes off the earlier effort.
In the meantime, let's pray that contributors' editors
are configured so that such wasted crap is disallowed!
Reference(s):
commit fe75e26ab6
Signed-off-by: Jim Warner <james.warner@comcast.net>
The entire tree's polluted with inappropriate trailing
whitespace. This commit rids our environment of all of
those useless keystrokes. Unfortunately, it sure ain't
a permanent solution and requires every contributor to
instruct their editor(s) to prevent or eliminate them.
Plus it's strongly recommended we all insert something
like what's shown below to our '.gitconfig' file so as
to provide at least some warnings when we try to apply
any patches (git am) that do contain the #@!%& things!
References(s):
~/.gitconfig excerpt ---------------------------------
[core]
whitespace = trailing-space, space-before-tab, blank-at-eof
[apply]
whitespace = warn
--------------------------------- ~/.gitconfig excerpt
Signed-off-by: Jim Warner <james.warner@comcast.net>
Fix few compiler warnings. Some of these warnings appeared multiple
times, and the listing bellow is more about which sort of errors
where fixed.
devname.c:87:12: warning: comparison of integers of different signs: 'int' and 'unsigned long'
output.c:389:36: warning: passing 'char **const' to parameter of type 'const char *const restrict *' discards qualifiers in nested pointer types
output.c:611:31: warning: comparison of integers of different signs: 'const unsigned long' and 'int'
stacktrace.c:33:37: warning: unused parameter 'signum'
Signed-off-by: Sami Kerola <kerolasa@iki.fi>
Library Changes
. added PROC_EDITCMDLCVT flag
. added an internal (static) fill_cmdline_cvt function:
- reads and "escapes" /proc/#/cmdline
- returns result as a single string in a single vector
- callers are guaranteed a cmdline (no more NULL)
. added vectorize_this_str function, exploited by
fill_cgroup_cvt, fill_cmdline_cvt
. generalized read_cmdline function as read_unvectored, now
exploited by fill_cgroup_cvt, fill_cmdline_cvt, read_cmdline
( cgroup and cmdline no longer need be converted to string )
( vectors before being transformed to final representation )
. fixed bug regarding skipped group numbers (when enabled)
. escape_str made responsible for all single byte translation
with distinction between control chars + other unprintable
. added escaped_copy function for already escaped strings
. reorganized parts of proc_t to restore formatting standards
( displacement changes shouldn't matter with new version # )
. former ZAP_SUSEONLY #define now OOMEM_ENABLE
. added to library.map: escaped_copy; read_cmdline
Top Program Changes
. exploited the new PROC_EDITCMDLCVT provision
. eliminated now obsolete #include "proc/escape.h"
. changed the P_WCH display format if no kernel symbol table
. fixed very old bug in lflgs for out-of-view sort fields
. former ZAP_SUSEONLY #define now OOMEM_ENABLE
Ps Program Changes
. exploited the new PROC_EDITCMDLCVT provision
. exploited the new escaped_copy function
. consolidated pr_args and pr_comm into pr_argcom
Signed-off-by: Jan Görig <jgorig@redhat.com>