add sed -r support.
I bumped into a couple of things that want to use extended regular expressions
in sed, and it really isn't that hard to add. Can't say I've extensively
tested it, but it's small and isn't going to break anything that doesn't use
it, so...
Rob
the _destination_ file. (Ah hah! That works _much_ better...) I
implemented the behavior, I just forgot to test this corner of it. My fault,
sorry...
No, gnu sed -i doesn't preserve ownership information. I checked.
Permissions, yes, ownership info, no.
Rob
that the _only_ change is that gnu sed has been replaced with busybox sed.
And ncurses' install phase hangs. I trace it down, and it's trying to run
gawk. (Insert obligatory doubletake, but this is FSF code we're talking
about, so...)
It turns out gawk shells out to sed, ala "sed -f /tmp/blah file.h". The
/tmp/blah file is basically empty (it contains one character, a newline). So
basically, gawk is using sed as "cat". With gnu sed, it works like cat,
anyway.
With busybox sed, it tests if its command list is empty after parsing the
command line, and if the list is empty it takes the first file argument as a
sed command string, and if that leaves the file list empty it tries to read
the data to operate on from stdin. (Hence the hang, since nothing's coming
in on stdin...)
It _should_ be testing whether there were any instances of -f or -e, not
whether it actually got any commands. Using sed as cat may be kind of
stupid, but it's valid and gawk relies on this behavior.
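
A minimal sketch of that distinction, in Python rather than busybox's C. The
names plan_invocation, read_script_file and saw_script_option are hypothetical,
and option handling is deliberately simplified (scripts are split on newlines
only). The point is that the fallback to "first argument is the script" is
keyed on whether -e or -f appeared at all, not on how many commands they
produced.

```python
def plan_invocation(args, read_script_file):
    """Return (commands, input_files) for a simplified sed command line.

    args: the arguments after the program name.
    read_script_file: callable mapping a -f path to its text (a hypothetical
    helper, so the sketch never touches the filesystem).
    """
    commands = []
    files = []
    saw_script_option = False  # set by -e/-f even if they add no commands
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-e", "-f") and i + 1 < len(args):
            saw_script_option = True
            if arg == "-e":
                text = args[i + 1]
            else:
                text = read_script_file(args[i + 1])
            # Blank lines contribute no commands.
            commands.extend(line for line in text.split("\n") if line.strip())
            i += 2
        else:
            files.append(arg)
            i += 1
    # Only treat the first plain argument as the script when neither -e nor
    # -f was given. An empty script file then leaves sed behaving like cat.
    if not saw_script_option and files:
        commands.append(files.pop(0))
    return commands, files
```

With the gawk case from above, plan_invocation(["-f", "/tmp/blah", "file.h"], ...)
yields no commands and keeps file.h as the input file, so nothing is read from
stdin.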
Here's a patch to fix it, turning a couple of ints into chars in hopes of
saving a bit of the space this adds. Comments?
Rob
This is a bulk spelling fix patch against busybox-1.00-pre10.
If anyone gets a corrupted copy (and cares), let me know and
I will make alternate arrangements.
Erik - please apply.
Authors - please check that I didn't corrupt any meaning.
Package importers - see if any of these changes should be
passed to the upstream authors.
I glossed over lots of sloppy capitalizations, missing apostrophes,
mixed American/British spellings, and German-style compound words.
What is "pretect redefined for test" in cmdedit.c?
Good luck on the 1.00 release!
- Larry
sed -i "/^boo/a fred" ipsec.conf
Which works in gnu sed. (And is _supposed_ to strip all the whitespace before
"fred".)
It also broke:
sed -i -e "/^boo/a \\" -e " fred" ipsec.conf
I.E. there can legally be spaces between the a and the backslash at the end of
the line.
And strangely enough, gnu sed accepts the following syntax as well:
sed -i "/^boo/a \\ fred" ipsec.conf
Which is a way of having the significant whitespace at the start of the line,
all on one line. (But notice that the whitespace BEFORE the slash is still
stripped, as is the slash itself. And notice that the naive placement of
"\n" there doesn't work, it puts an n at the start of the appended line. The
double slashing is for shell escapes because you could escape the quote, you
see. It's turned into a single backslash. But \n there is _not_ turned into
a newline by the shell. So there.)
This makes all three syntaxes work in my tests. I should probably start
writing better documentation at some point. I posted my current sedtests.py
file to the list, which needs a lot more tests added as well...
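
The three syntaxes can be modeled with a small sketch in Python (not the
busybox C code). parse_append_text is a hypothetical name, and its argument is
assumed to be everything after the 'a' command letter, as sed sees it once the
shell has turned "\\" into a single backslash and the two -e strings have been
joined with a newline.

```python
def parse_append_text(rest):
    """Return the text to append, given what follows the 'a' command letter."""
    # Whitespace between 'a' and the text (or the backslash) is never kept.
    rest = rest.lstrip(" \t")
    if rest.startswith("\\"):
        # The backslash is dropped; whatever follows it is significant,
        # including leading whitespace.
        rest = rest[1:]
        if rest.startswith("\n"):
            # Classic form: backslash, newline, then the text on the next line.
            rest = rest[1:]
    return rest
```

Under these assumptions " fred" gives "fred", while both the backslash-newline
form and the one-line "\ fred" form give " fred" with the leading space intact.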
The sed command in busybox 1.0.0-pre8 loses leading whitespace
in 'a' command ('i' and 'c' commands are also affected). A
patch to fix this is attached at the end of this message.
The following is a transcript that reproduces the problem. The
first run uses busybox 1.0.0-pre3 as "/bin/sed" command, which
gets the expected result. Later in the test, /bin/sed symlink
is changed to point at busybox 1.0.0-pre8 and the test script is
run again, which shows the failure.
=== reproduction recipe ===
* Part 1. Use busybox 1.0.0-pre3 as sed; this works.
root# cd /tmp
root# cat 1.sh
#!/bin/sh
cd /tmp
rm -f ipsec.conf ipsec.conf+
cat >ipsec.conf <<\EOF
version 2.0
config setup
klipsdebug=none
plutodebug=none
plutostderrlog=/dev/null
conn %default
keyingtries=1
...
EOF
sed -e '/^config setup/a\
nat_traversal=yes' ipsec.conf >ipsec.conf+
mv -f ipsec.conf+ ipsec.conf
root# sh -x 1.sh
+ cd /tmp
+ rm -f ipsec.conf ipsec.conf+
+ cat
+ sed -e /^config setup/a\
nat_traversal=yes ipsec.conf
+ mv -f ipsec.conf+ ipsec.conf
root# cat ipsec.conf
version 2.0
config setup
nat_traversal=yes
klipsdebug=none
plutodebug=none
plutostderrlog=/dev/null
conn %default
keyingtries=1
...
root# sed --version
sed: invalid option -- -
BusyBox v1.00-pre3 (2004.02.26-18:47+0000) multi-call binary
Usage: sed [-nef] pattern [files...]
* Part 2. Continuing from the above, use busybox 1.0.0-pre8
as sed; this fails.
root# ln -s busybox-pre8 /bin/sed-8
root# mv /bin/sed-8 /bin/sed
root# sed --version
This is not GNU sed version 4.0
root# sed --
BusyBox v1.00-pre8 (2004.03.30-02:44+0000) multi-call binary
Usage: sed [-nef] pattern [files...]
root# sh -x 1.sh
+ cd /tmp
+ rm -f ipsec.conf ipsec.conf+
+ cat
+ sed -e /^config setup/a\
nat_traversal=yes ipsec.conf
+ mv -f ipsec.conf+ ipsec.conf
root# cat ipsec.conf
version 2.0
config setup
nat_traversal=yes
klipsdebug=none
plutodebug=none
plutostderrlog=/dev/null
conn %default
keyingtries=1
...
root#
=== reproduction recipe ends here ===
This problem was introduced in 1.0.0-pre4. The problem is that
the command argument parsing code strips leading whitespace too
aggressively. When running the above example, the piece of code
in question gets "\n\tnat_traversal=yes" as its argument in the
cmdstr variable (partly shown in the following patch). What it
needs to do at this point is to strip the first newline and
nothing else, but it instead strips all the leading whitespace
at the beginning of the string, thus losing the tab character.
The following patch fixes this.
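
Independently of the actual C patch, the difference between the two behaviors
can be sketched in Python. Both function names are hypothetical; only the
"\n\tnat_traversal=yes" input comes from the description above.

```python
def strip_all_leading_whitespace(cmdstr):
    """Models the over-eager behavior: newline and tab are both lost."""
    return cmdstr.lstrip()


def strip_first_newline_only(cmdstr):
    """Models the intended behavior: drop one leading newline, keep the rest."""
    if cmdstr.startswith("\n"):
        return cmdstr[1:]
    return cmdstr
```

For the string above, the first function returns "nat_traversal=yes" and the
second returns "\tnat_traversal=yes", which is the text the 'a' command should
insert.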
While building glibc with busybox as part of the development environment, I
found that a bug in glibc's regexec can throw sed into an endless loop. This
fixes it. Should I put an #ifdef around it or something? (Note, this patch
also contains the "this is not gnu sed 4.0" hack I posted earlier, which is
also needed to build glibc...)
Moving on to building diffutils, busybox sed needs this patch to get
past the first problem. (Passing it a multi-line command line argument
with -e works, but if you don't use -e it doesn't break up the multiple
lines...)
echo fooba | ./busybox sed -n 's/foo//;s/bar/found/p'
I really need to start adding these tests to the testsuite.
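
One way to read that one-liner: the first s command changes "fooba" to "ba",
the second finds no "bar", so its p flag must not fire and -n leaves the output
empty. A minimal Python sketch of that rule follows; run_substitutions is a
hypothetical name, and Python's re stands in for sed's regex engine.

```python
import re


def run_substitutions(line, commands):
    """Apply s commands in order; commands is a list of
    (pattern, replacement, print_flag). Returns (final_line, printed_lines)."""
    printed = []
    for pattern, replacement, print_flag in commands:
        line, count = re.subn(pattern, replacement, line, count=1)
        # "substituted" is about THIS command only. An earlier command having
        # altered the line must not trigger this command's p flag.
        substituted = count > 0
        if print_flag and substituted:
            printed.append(line)
    return line, printed
```

Here run_substitutions("fooba", [("foo", "", False), ("bar", "found", True)])
prints nothing, whereas the same script applied to "foobar" prints "found".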
keep the substituted and altered flags separate
If a label isn't specified, jump to end of script, not the last command
in the script.
Print an error and exit if you try to jump to a nonexistent label
Works for the following testcase
# cat strings
a
b
c
d
e
f
g
# cat strings | ./busybox sed -n '/d/b;p'
a
b
c
e
f
g
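
The branch behavior in this testcase can be sketched as a toy interpreter in
Python. It models only 'b' and 'p' with an optional regex address. run_script,
the tuple layout, known_labels and the error text are all hypothetical, and
jumping to a label that does exist is not modeled.

```python
import re


def run_script(lines, script, quiet=True, known_labels=()):
    """script: list of (address_regex_or_None, command, label_or_None)."""
    out = []
    for line in lines:
        for address, command, label in script:
            if address is not None and not re.search(address, line):
                continue  # command does not apply to this line
            if command == "b":
                if label is None:
                    # No label: jump to the END of the script, skipping every
                    # remaining command (including the last one).
                    break
                if label not in known_labels:
                    raise ValueError("no such label: %s" % label)
            elif command == "p":
                out.append(line)
        if not quiet:
            out.append(line)  # autoprint happens at end of script
    return out
```

Running the script '/d/b;p' with -n over the seven lines above returns every
line except "d", matching the transcript.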
Fixed a memory leak in add_cmd/add_cmd_str by moving the allocation
of sed_cmd down to where it's actually first needed.
In get_address, if index_of_next_unescaped_regexp_delim ever failed, we
wouldn't notice because the return value was added to idx, which was
already guaranteed to be > 0. (This is buried in the changes made when
I redid get_address to be based on pointer arithmetic, because all the tests
were gratuitously dereferencing with a constant zero, which wasn't obvious.)
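
The shape of that bug, sketched in Python with hypothetical names (the real
code is C and uses index_of_next_unescaped_regexp_delim): a helper signals
failure with -1, but the caller adds the result to an index that is already
positive before checking it, so the sum is never negative.

```python
def find_delim_offset(text, delim):
    """Stand-in helper: offset of delim in text, or -1 if absent."""
    return text.find(delim)


def end_of_regex_unchecked(cmd, idx):
    """Buggy pattern: the -1 is folded into idx before anyone looks at it."""
    idx += find_delim_offset(cmd[idx:], "/")
    return idx if idx >= 0 else None  # never None once idx started > 0


def end_of_regex_checked(cmd, idx):
    """Fixed pattern: test the helper's return value first, then add."""
    offset = find_delim_offset(cmd[idx:], "/")
    if offset < 0:
        return None
    return idx + offset
```

For the unterminated address "/abc" with idx 1, the unchecked version returns
0 (a bogus position) while the checked version reports the failure.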
Comment in parse_regex_delim was wrong: 's' and 'y' both call it.
The reason "sed_cmd->num_backrefs = 0;" isn't needed is that sed_cmd was
allocated with cmalloc, which zeroes memory.
Different handling of space after \ in i...
Different handling of pattern "s/a/b s/c/d"
Cool, recursive reads don't cause a crash. :)
Fixed "sed -f blah filename - < filename" since GNU sed was handling
both - and filenames on the same line. (You can even list - more than
once, although it's immediate EOF...)
the command.
# cat strings
a
b
c
d
e
f
g
# ./busybox sed '1,2d;4,$d' <strings
c
# ./busybox sed '4,$d;1,2d' <strings
# sed '4,$d;1,2d' <strings
c
# sed '1,2d;4,$d' <strings
c
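
The expected result in the transcript (the same output whichever order the two
range deletes are written in) can be modeled with a short Python sketch. It
says nothing about where the busybox bug actually was. delete_ranges is a
hypothetical name, and knowing the line count up front is a simplification of
how a streaming sed detects '$'.

```python
def delete_ranges(lines, ranges):
    """ranges: list of (first, last) 1-based line numbers; last may be '$'."""
    total = len(lines)
    kept = []
    for number, line in enumerate(lines, start=1):
        deleted = False
        # Every range is tested against every line, so the order in which
        # the 'd' commands appear cannot change the outcome.
        for first, last in ranges:
            end = total if last == "$" else last
            if first <= number <= end:
                deleted = True
                break
        if not deleted:
            kept.append(line)
    return kept
```

Both delete_ranges(lines, [(1, 2), (4, "$")]) and the reversed order leave only
"c" for the seven-line input.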
- Fixed bug where you couldn't use two addresses for a 'c' cmd
- Moved the do_sed_cmd function into process_file to simplify some things
- Reduced a buncha lines of code in the process
busybox.h which slowed compiles. I left only what was needed and then fixed up
all the apps to include their own header files. I also fixed naming for pwd.h
and grp.h functions. Tested to compile and run with libc5, glibc, and uClibc.
-Erik
a few other ugly places (do_subst_command got a much-needed overhaul). Also
took out BB_FEATURE_SED_PATTERN_SPACE from Config.h[.Hurd] as the 'p' is now a
standard feature (adds almost no bloat).
the -V (version) flag from busybox sed. It is unnecessary because sed is not a
standalone and should therefore not be independently reporting a version number.
Moreover, it is extra code that we just don't need.
(\1, \2...\9). This touched a lot of places in this file and I added a new
function 'print_subst_w_backrefs' in order to keep 'do_subst_command' a
little more tidy.
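
As an illustration of what expanding \1 through \9 involves, here is a minimal
Python sketch. It is not print_subst_w_backrefs itself: expand_backrefs is a
hypothetical name, Python's re match object stands in for the regmatch data the
C code would use, and any other backslash escape is simply treated as a literal
character.

```python
import re


def expand_backrefs(replacement, match):
    """Return replacement with \\1..\\9 replaced by the matching groups."""
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "\\" and i + 1 < len(replacement):
            nxt = replacement[i + 1]
            if nxt in "123456789":
                # Backreference: copy the text captured by that group
                # (an unmatched optional group contributes nothing).
                out.append(match.group(int(nxt)) or "")
            else:
                # Any other escaped character is emitted literally.
                out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For a match of r"(\w+) (\w+)" against "hello world", the replacement r"\2 \1"
expands to "world hello".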
* I tested this good 'n hard, but will always appreciate more testing from
other, willing folks.
- Noticed that the index_of_next_unescaped_slash was subtly wrong so I
changed both the functionality and behavior (it used to skip over the first
char in the string you passed it, assuming it was a leading '/'--this
assumption is no longer made). This necessitated changing the lines that
call this function just slightly.
- add_cmd_str: segv's were being generated if there was a '# comment' line
(and probably other kinds of lines, too) that was not followed by a
semi-colon or whitespace
- parse_edit_cmd: was returning a wrong number (too low) for the index; it
was not accounting for backslashes eaten, for the fact that we start at the
3rd index in the string, or for the fact that we add an extra newline.
- parse_cmd_str: was returning a wrong number (again, too low) for the index
in the case of single-letter commands (p,d). There was some
over-compensation for this in the 'return' stmt at the end which also
needed some help.
- load_cmd_file: was not eating trailing newlines off the line read from the
command file. This had the deleterious effect of printing an extra newline
after text displayed from edit (i,a,c) commands.
- Obsoleted the trim_str function (#if 0'ed out -- maybe delete later) in
favor of strrspn.
- Obsoleted the strrspn function (#if 0'ed out as well) as soon as I
discovered that it wasn't needed either.
- Fixed a subtle bug in parse_subst_cmd where it would choke with an error if
there was any trailing space after the s/match/replace/ expression.
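
The scanning described in the first item above (find the next unescaped
delimiter without assuming the string starts with one) can be sketched in
Python. This is a model of the idea, not the C function; the name, the start
parameter and the -1 failure value are assumptions.

```python
def index_of_next_unescaped_delim(text, delim="/", start=0):
    """Return the index of the first unescaped delim at or after start,
    or -1 if there is none. No leading delimiter is assumed or skipped."""
    i = start
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 2  # skip the backslash and the character it escapes
            continue
        if text[i] == delim:
            return i
        i += 1
    return -1
```

A caller that knows the string begins with the opening '/' passes start=1
itself, which is the kind of small call-site change that item mentions.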
to accommodate a trailing '\n'ewline that I append to it later on. This is
only necessary for the case of one inserted, appended, or changed line, but
it's still necessary.