This patch is intended to bring the master branch into
closer alignment with the newlib branch functionality.
The commit shown below replaced that 'escape_str' with
'escaped_copy' so that the ps program wouldn't execute
some code twice. However, it created a problem for the
top program when the UTF-8 encoding was not available.
Unlike ps who calls the escape functions directly, top
outputs those library provided strings unaltered. That
means there is no workaround (like LANG=) when such an
encoding was unavailable. This, in turn, can lead to a
corrupted display (especially with the linux console).
Now, when a UTF-8 encoding is not available, multibyte
chars are converted to '?', reducing display problems.
[ there's still a potential problem concerning 'cmd' ]
[ should program names contain multibyte characters. ]
[ unlike that newlib branch, this string is not ever ]
[ altered with the '?' char under the master branch. ]
Reference(s);
. Dec 2020, escaoed_copy repleced most escape_str
commit eea5626bb1
Signed-off-by: Jim Warner <james.warner@comcast.net>
Now that the ps program is using 'escape_str' for most
of the library's returned strings, this patch tries to
lessen the prospects of executing that function twice.
Our newlib branch has achieved such a goal through the
elimination of nearly all escape.c code. However, here
we avoid API change by trading some 'escape_str' calls
(with wide character overhead) for a slightly extended
'escaped_copy' call (which incurs no multibyte costs).
Note: until we migrate to the newlib version, there is
a remaining call to 'escape_str' which we can't avoid.
Such code involves the 'escape_command' function call.
[ As we prepare for this new (final?) release, there ]
[ were already internal library changes that require ]
[ a new 'revision'. This patch won't impact the API! ]
Signed-off-by: Jim Warner <james.warner@comcast.net>
Thanks to Konstantin for discovering 2 problems in the
issue referenced below. That 15+ year old logic went a
little too far overboard wrestling with a utf8 string.
Henceforth, we will not treat 'x9b' as special. And we
also will handle a 'combining acute accent' correctly.
Reference(s):
https://gitlab.com/procps-ng/procps/-/issues/176
Signed-off-by: Jim Warner <james.warner@comcast.net>
May happen if strlen(src) > INT_MAX for example. This patch prevents
escaped_copy() from increasing maxroom and returning -1 (= number of
bytes consumed in dst).
This solves several problems:
1/ outbuf[1] was written to, but not outbuf[0], which was left
uninitialized (well, SECURE_ESCAPE_ARGS() already fixes this, but do it
explicitly as well); we know it is safe to write one byte to outbuf,
because SECURE_ESCAPE_ARGS() guarantees it.
2/ If bytes was 1, the write to outbuf[1] was an off-by-one overflow.
3/ Do not call escape_str() with a 0 bufsize if bytes == overhead.
4/ Prevent various buffer overflows if bytes <= overhead.
Simply rearrange the old comparisons. The new comparisons are safe,
because we know from previous checks that:
1/ wlen > 0
2/ my_cells < *maxcells (also: my_cells >= 0 and *maxcells > 0)
3/ len > 1
4/ my_bytes+1 < bufsize (also: my_bytes >= 0 and bufsize > 0)
This should never happen, because wcwidth() is called only if iswprint()
returns nonzero. But belt-and-suspenders, and make it visually clear
(very important for the next patch).
The SECURE_ESCAPE_ARGS() macro solves several potential problems
(although we found no problematic calls to the escape*() functions in
procps's code-base, but had to thoroughly review every call; and this is
library code):
1/ off-by-one overflows if the size of the destination buffer is 0;
2/ buffer overflows if this size (or "maxroom") is negative;
3/ integer overflows (for example, "*maxcells+1");
4/ always null-terminate the destination buffer (unless its size is 0).
Profiling revealed a large amount of time spent in the
'escape_str_utf8' function (escape.c) with both of our
NLS branches (newlib and master). That same result was
not seen under an ancient top-3.2.8 program & library.
Well, the 3.2.8 result was ultimately explained by the
absence of a 'setlocale', necessary under NLS support.
Thus, when that ancient library tested for locale, all
it got was 'ANSI_...' & assumed 'UTF-8' wasn't active.
But after a hack to that ancient code to place it on a
par with newlib/master, I still found cost differences
that led me to revisit an old change referenced below.
It turns out that 'iswprint' costs far more than would
a call of 'isprint', even with the extra support code.
So this commit just reverts that five year old change.
[ this patch parallels a similar change under newlib ]
Reference(s):
commit 7b0fc19e9d
Signed-off-by: Jim Warner <james.warner@comcast.net>
An earlier commit attempted to cleanse our environment
of all useless trailing whitespace. But the effort did
not catch 'empty' lines with a single space before ^J.
This commit hopefully finishes off the earlier effort.
In the meantime, let's pray that contributors' editors
are configured so that such wasted crap is disallowed!
Reference(s):
commit fe75e26ab6
Signed-off-by: Jim Warner <james.warner@comcast.net>
The entire tree's polluted with inappropriate trailing
whitespace. This commit rids our environment of all of
those useless keystrokes. Unfortunately, it sure ain't
a permanent solution and requires every contributor to
instruct their editor(s) to prevent or eliminate them.
Plus it's strongly recommended we all insert something
like what's shown below to our '.gitconfig' file so as
to provide at least some warnings when we try to apply
any patches (git am) that do contain the #@!%& things!
References(s):
~/.gitconfig excerpt ---------------------------------
[core]
whitespace = trailing-space, space-before-tab, blank-at-eof
[apply]
whitespace = warn
--------------------------------- ~/.gitconfig excerpt
Signed-off-by: Jim Warner <james.warner@comcast.net>
Fix few compiler warnings. Some of these warnings appeared multiple
times, and the listing bellow is more about which sort of errors
where fixed.
devname.c:87:12: warning: comparison of integers of different signs: 'int' and 'unsigned long'
output.c:389:36: warning: passing 'char **const' to parameter of type 'const char *const restrict *' discards qualifiers in nested pointer types
output.c:611:31: warning: comparison of integers of different signs: 'const unsigned long' and 'int'
stacktrace.c:33:37: warning: unused parameter 'signum'
Signed-off-by: Sami Kerola <kerolasa@iki.fi>
Library Changes
. added PROC_EDITCMDLCVT flag
. added an internal (static) fill_cmdline_cvt function:
- reads and "escapes" /proc/#/cmdline
- returns result as a single string in a single vector
- callers are guaranteed a cmdline (no more NULL)
. added vectorize_this_str function, exploited by
fill_cgroup_cvt, fill_cmdline_cvt
. generalized read_cmdline function as read_unvectored, now
exploited by fill_cgroup_cvt, fill_cmdline_cvt, read_cmdline
( cgroup and cmdline no longer need be converted to string )
( vectors before being transformed to final representation )
. fixed bug regarding skipped group numbers (when enabled)
. escape_str made responsible for all single byte translation
with distinction between control chars + other unprintable
. added escaped_copy function for already escaped strings
. reorganized parts of proc_t to restore formatting standards
( displacement changes shouldn't matter with new version # )
. former ZAP_SUSEONLY #define now OOMEM_ENABLE
. added to library.map: escaped_copy; read_cmdline
Top Program Changes
. exploited the new PROC_EDITCMDLCVT provision
. eliminated now obsolete #include "proc/escape.h"
. changed the P_WCH display format if no kernel symbol table
. fixed very old bug in lflgs for out-of-view sort fields
. former ZAP_SUSEONLY #define now OOMEM_ENABLE
Ps Program Changes
. exploited the new PROC_EDITCMDLCVT provision
. exploited the new escaped_copy function
. consolidated pr_args and pr_comm into pr_argcom
Signed-off-by: Jan Görig <jgorig@redhat.com>