Christoph Schulz 03ad7ae081 ash: reset tokpushback before prompting while parsing heredoc
The parser reads from an already freed memory location, thereby causing
unpredictable results, in the following situation:

- ENABLE_ASH_EXPAND_PRMT is enabled
- heredoc is being parsed
- command substitution is used within heredoc

Examples where this bug crops up are (PS2 is set to "> "):

$ cat <<EOF
> `echo abc`
> EOF
-sh: O: not found

$ cat <<EOF
> $(echo abc)
> EOF
-sh: {garbage}: not found

The presumable reason is that setprompt_if() causes a nested expansion when
ENABLE_ASH_EXPAND_PRMT is enabled, therefore leaving "wordtext" in an unusable
state. However, when parseheredoc() is called, "tokpushback" is non-zero, which
causes the next call to xxreadtoken() to return TWORD, causing the caller to
use the invalid "wordtoken" instead of reading the next valid token.

The call chain is:

list()
-> peektoken() [sets tokpushback to 1]
-> parseheredoc()
   -> setprompt_if()
      -> pushstackmark()
      -> expandstr()
         -> readtoken1()
            [sets lasttoken to TWORD, wordtoken points to expanded prompt]
      -> popstackmark() [invalidates wordtoken, leaves lasttoken as is]
   -> readtoken1()
      -> ...parsebackq
         -> list()
            -> andor()
               -> pipeline()
                  -> readtoken()
                     -> xxreadtoken()
                        [tokpushback non-zero, reuse lasttoken and wordtext]

Note that in almost all other contexts, each call to setprompt_if() is preceded
by setting "tokpushback" to zero. One exception is "oldstyle" backquote parsing
in readtoken1(), but there "tokpushback" is reset afterwards. The other
exception is nlprompt(), but this function is only used within readtoken1()
(but in contexts where no nested calls to xxreadtoken() occur) and xxreadtoken()
(where "tokpushback" is guaranteed to be zero).

function                                             old     new   delta
parseheredoc                                         124     131      +7

Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2018-11-20 17:45:52 +01:00
..
2018-03-02 20:48:36 +01:00
2018-07-17 15:04:17 +02:00

http://www.opengroup.org/onlinepubs/9699919799/
Open Group Base Specifications Issue 7


http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html
Shell & Utilities

It says that any of the standard utilities may be implemented
as a regular shell built-in. It gives a list of utilities which
are usually implemented that way (and some of them can only
be implemented as built-ins, like "alias"):

alias
bg
cd
command
false
fc
fg
getopts
jobs
kill
newgrp
pwd
read
true
umask
unalias
wait


http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
Shell Command Language

It says that shell must implement special built-ins. Special built-ins
differ from regular ones by the fact that variable assignments
done on special builtin are *PRESERVED*. That is,

VAR=VAL special_builtin; echo $VAR

should print VAL.

(Another distinction is that an error in special built-in should
abort the shell, but this is not such a critical difference,
and moreover, at least bash's "set" does not follow this rule,
which is even codified in autoconf configure logic now...)

List of special builtins:

. file
: [argument...]
break [n]
continue [n]
eval [argument...]
exec [command [argument...]]
exit [n]
export name[=word]...
export -p
readonly name[=word]...
readonly -p
return [n]
set [-abCefhmnuvx] [-o option] [argument...]
set [+abCefhmnuvx] [+o option] [argument...]
set -- [argument...]
set -o
set +o
shift [n]
times
trap n [condition...]
trap [action condition...]
unset [-fv] name...

In practice, no one uses this obscure feature - none of these builtins
gives any special reasons to play such dirty tricks.

However. This section also says that *function invocation* should act
similar to special built-in. That is, variable assignments
done on function invocation should be preserved after function invocation.

This is significant: it is not unthinkable to want to run a function
with some variables set to special values. But because of the above,
it does not work: variable will "leak" out of the function.