2006-01-22 07:14:29 +05:30
|
|
|
|
<!--#include file="header.html" -->
|
|
|
|
|
|
|
|
|
|
<h2>Rob's notes on programming busybox.</h2>
|
|
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
<li><a href="#goals">What are the goals of busybox?</a></li>
|
|
|
|
|
<li><a href="#design">What is the design of busybox?</a></li>
|
|
|
|
|
<li><a href="#source">How is the source code organized?</a></li>
|
|
|
|
|
<ul>
|
|
|
|
|
<li><a href="#source_applets">The applet directories.</a></li>
|
|
|
|
|
<li><a href="#source_libbb">The busybox shared library (libbb)</a></li>
|
|
|
|
|
</ul>
|
|
|
|
|
<li><a href="#adding">Adding an applet to busybox</a></li>
|
|
|
|
|
<li><a href="#standards">What standards does busybox adhere to?</a></li>
|
2006-05-01 10:56:01 +05:30
|
|
|
|
<li><a href="#portability">Portability.</a></li>
|
2006-01-29 11:59:01 +05:30
|
|
|
|
<li><a href="#tips">Tips and tricks.</a></li>
|
|
|
|
|
<ul>
|
|
|
|
|
<li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
|
|
|
|
|
<li><a href="#tips_vfork">Fork and vfork</a></li>
|
2006-02-12 06:15:39 +05:30
|
|
|
|
<li><a href="#tips_short_read">Short reads and writes</a></li>
|
2006-04-10 23:24:23 +05:30
|
|
|
|
<li><a href="#tips_memory">Memory used by relocatable code, PIC, and static linking.</a></li>
|
2006-05-01 10:56:01 +05:30
|
|
|
|
<li><a href="#tips_kernel_headers">Including Linux kernel headers.</a></li>
|
2006-01-29 11:59:01 +05:30
|
|
|
|
</ul>
|
2006-02-16 08:51:44 +05:30
|
|
|
|
<li><a href="#who">Who are the BusyBox developers?</a></li>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
</ul>
|
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<h2><b><a name="goals">What are the goals of busybox?</a></b></h2>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
|
|
|
|
<p>Busybox aims to be the smallest and simplest correct implementation of the
|
|
|
|
|
standard Linux command line tools. First and foremost, this means the
|
|
|
|
|
smallest executable size we can manage. We also want to have the simplest
|
|
|
|
|
and cleanest implementation we can manage, be <a href="#standards">standards
|
|
|
|
|
compliant</a>, minimize run-time memory usage (heap and stack), run fast, and
|
|
|
|
|
take over the world.</p>
|
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<h2><b><a name="design">What is the design of busybox?</a></b></h2>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
|
|
|
|
<p>Busybox is like a swiss army knife: one thing with many functions.
|
|
|
|
|
The busybox executable can act like many different programs depending on
|
|
|
|
|
the name used to invoke it. Normal practice is to create a bunch of symlinks
|
|
|
|
|
pointing to the busybox binary, each of which triggers a different busybox
|
|
|
|
|
function. (See <a href="FAQ.html#getting_started">getting started</a> in the
|
|
|
|
|
FAQ for more information on usage, and <a href="BusyBox.html">the
|
|
|
|
|
busybox documentation</a> for a list of symlink names and what they do.)
|
|
|
|
|
|
|
|
|
|
<p>The "one binary to rule them all" approach is primarily for size reasons: a
|
|
|
|
|
single multi-purpose executable is smaller then many small files could be.
|
|
|
|
|
This way busybox only has one set of ELF headers, it can easily share code
|
|
|
|
|
between different apps even when statically linked, it has better packing
|
|
|
|
|
efficiency by avoding gaps between files or compression dictionary resets,
|
|
|
|
|
and so on.</p>
|
|
|
|
|
|
|
|
|
|
<p>Work is underway on new options such as "make standalone" to build separate
|
|
|
|
|
binaries for each applet, and a "libbb.so" to make the busybox common code
|
|
|
|
|
available as a shared library. Neither is ready yet at the time of this
|
|
|
|
|
writing.</p>
|
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<a name="source"></a>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<h2><a name="source_applets"><b>The applet directories</b></a></h2>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
|
|
|
|
<p>The directory "applets" contains the busybox startup code (applets.c and
|
|
|
|
|
busybox.c), and several subdirectories containing the code for the individual
|
|
|
|
|
applets.</p>
|
|
|
|
|
|
|
|
|
|
<p>Busybox execution starts with the main() function in applets/busybox.c,
|
|
|
|
|
which sets the global variable bb_applet_name to argv[0] and calls
|
|
|
|
|
run_applet_by_name() in applets/applets.c. That uses the applets[] array
|
|
|
|
|
(defined in include/busybox.h and filled out in include/applets.h) to
|
|
|
|
|
transfer control to the appropriate APPLET_main() function (such as
|
|
|
|
|
cat_main() or sed_main()). The individual applet takes it from there.</p>
|
|
|
|
|
|
|
|
|
|
<p>This is why calling busybox under a different name triggers different
|
|
|
|
|
functionality: main() looks up argv[0] in applets[] to get a function pointer
|
|
|
|
|
to APPLET_main().</p>
|
|
|
|
|
|
|
|
|
|
<p>Busybox applets may also be invoked through the multiplexor applet
|
|
|
|
|
"busybox" (see busybox_main() in applets/busybox.c), and through the
|
|
|
|
|
standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c).
|
|
|
|
|
See <a href="FAQ.html#getting_started">getting started</a> in the
|
|
|
|
|
FAQ for more information on these alternate usage mechanisms, which are
|
|
|
|
|
just different ways to reach the relevant APPLET_main() function.</p>
|
|
|
|
|
|
|
|
|
|
<p>The applet subdirectories (archival, console-tools, coreutils,
|
|
|
|
|
debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils,
|
|
|
|
|
modutils, networking, procps, shell, sysklogd, and util-linux) correspond
|
|
|
|
|
to the configuration sub-menus in menuconfig. Each subdirectory contains the
|
|
|
|
|
code to implement the applets in that sub-menu, as well as a Config.in
|
|
|
|
|
file defining that configuration sub-menu (with dependencies and help text
|
|
|
|
|
for each applet), and the makefile segment (Makefile.in) for that
|
|
|
|
|
subdirectory.</p>
|
|
|
|
|
|
|
|
|
|
<p>The run-time --help is stored in usage_messages[], which is initialized at
|
|
|
|
|
the start of applets/applets.c and gets its help text from usage.h. During the
|
|
|
|
|
build this help text is also used to generate the BusyBox documentation (in
|
|
|
|
|
html, txt, and man page formats) in the docs directory. See
|
|
|
|
|
<a href="#adding">adding an applet to busybox</a> for more
|
|
|
|
|
information.</p>
|
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<h2><a name="source_libbb"><b>libbb</b></a></h2>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
|
|
|
|
<p>Most non-setup code shared between busybox applets lives in the libbb
|
|
|
|
|
directory. It's a mess that evolved over the years without much auditing
|
|
|
|
|
or cleanup. For anybody looking for a great project to break into busybox
|
|
|
|
|
development with, documenting libbb would be both incredibly useful and good
|
|
|
|
|
experience.</p>
|
|
|
|
|
|
|
|
|
|
<p>Common themes in libbb include allocation functions that test
|
|
|
|
|
for failure and abort the program with an error message so the caller doesn't
|
|
|
|
|
have to test the return value (xmalloc(), xstrdup(), etc), wrapped versions
|
|
|
|
|
of open(), close(), read(), and write() that test for their own failures
|
|
|
|
|
and/or retry automatically, linked list management functions (llist.c),
|
|
|
|
|
command line argument parsing (getopt_ulflags.c), and a whole lot more.</p>
|
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<h2><a name="adding"><b>Adding an applet to busybox</b></a></h2>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
|
|
|
|
<p>To add a new applet to busybox, first pick a name for the applet and
|
|
|
|
|
a corresponding CONFIG_NAME. Then do this:</p>
|
|
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
<li>Figure out where in the busybox source tree your applet best fits,
|
|
|
|
|
and put your source code there. Be sure to use APPLET_main() instead
|
|
|
|
|
of main(), where APPLET is the name of your applet.</li>
|
|
|
|
|
|
|
|
|
|
<li>Add your applet to the relevant Config.in file (which file you add
|
|
|
|
|
it to determines where it shows up in "make menuconfig"). This uses
|
|
|
|
|
the same general format as the linux kernel's configuration system.</li>
|
|
|
|
|
|
|
|
|
|
<li>Add your applet to the relevant Makefile.in file (in the same
|
|
|
|
|
directory as the Config.in you chose), using the existing entries as a
|
|
|
|
|
template and the same CONFIG symbol as you used for Config.in. (Don't
|
|
|
|
|
forget "needlibm" or "needcrypt" if your applet needs libm or
|
|
|
|
|
libcrypt.)</li>
|
|
|
|
|
|
|
|
|
|
<li>Add your applet to "include/applets.h", using one of the existing
|
|
|
|
|
entries as a template. (Note: this is in alphabetical order. Applets
|
|
|
|
|
are found via binary search, and if you add an applet out of order it
|
|
|
|
|
won't work.)</li>
|
|
|
|
|
|
|
|
|
|
<li>Add your applet's runtime help text to "include/usage.h". You need
|
|
|
|
|
at least appname_trivial_usage (the minimal help text, always included
|
|
|
|
|
in the busybox binary when this applet is enabled) and appname_full_usage
|
|
|
|
|
(extra help text included in the busybox binary with
|
|
|
|
|
CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile.
|
|
|
|
|
The other two help entry types (appname_example_usage and
|
|
|
|
|
appname_notes_usage) are optional. They don't take up space in the binary,
|
|
|
|
|
but instead show up in the generated documentation (BusyBox.html,
|
|
|
|
|
BusyBox.txt, and the man page BusyBox.1).</li>
|
|
|
|
|
|
|
|
|
|
<li>Run menuconfig, switch your applet on, compile, test, and fix the
|
|
|
|
|
bugs. Be sure to try both "allyesconfig" and "allnoconfig" (and
|
|
|
|
|
"allbareconfig" if relevant).</li>
|
|
|
|
|
|
|
|
|
|
</ul>
|
|
|
|
|
|
2006-04-11 00:46:50 +05:30
|
|
|
|
<h2><a name="standards">What standards does busybox adhere to?</a></h2>
|
2006-01-22 07:14:29 +05:30
|
|
|
|
|
|
|
|
|
<p>The standard we're paying attention to is the "Shell and Utilities"
|
2006-04-11 00:46:50 +05:30
|
|
|
|
portion of the <a href="http://www.opengroup.org/onlinepubs/009695399/">Open
|
2006-01-22 07:14:29 +05:30
|
|
|
|
Group Base Standards</a> (also known as the Single Unix Specification version
|
|
|
|
|
3 or SUSv3). Note that paying attention isn't necessarily the same thing as
|
|
|
|
|
following it.</p>
|
|
|
|
|
|
|
|
|
|
<p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor
|
|
|
|
|
commonly used options like echo's '-e' and '-n', or sed's '-i'. Busybox is
|
|
|
|
|
driven by what real users actually need, not the fact the standard believes
|
|
|
|
|
we should implement ed or sccs. For size reasons, we're unlikely to include
|
|
|
|
|
much internationalization support beyond UTF-8, and on top of all that, our
|
|
|
|
|
configuration menu lets developers chop out features to produce smaller but
|
|
|
|
|
very non-standard utilities.</p>
|
|
|
|
|
|
|
|
|
|
<p>Also, Busybox is aimed primarily at Linux. Unix standards are interesting
|
|
|
|
|
because Linux tries to adhere to them, but portability to dozens of platforms
|
|
|
|
|
is only interesting in terms of offering a restricted feature set that works
|
|
|
|
|
everywhere, not growing dozens of platform-specific extensions. Busybox
|
|
|
|
|
should be portable to all hardware platforms Linux supports, and any other
|
|
|
|
|
similar operating systems that are easy to do and won't require much
|
|
|
|
|
maintenance.</p>
|
|
|
|
|
|
|
|
|
|
<p>In practice, standards compliance tends to be a clean-up step once an
|
|
|
|
|
applet is otherwise finished. When polishing and testing a busybox applet,
|
|
|
|
|
we ensure we have at least the option of full standards compliance, or else
|
|
|
|
|
document where we (intentionally) fall short.</p>
|
|
|
|
|
|
2006-05-01 10:56:01 +05:30
|
|
|
|
<h2><a name="portability">Portability.</a></h2>
|
|
|
|
|
|
|
|
|
|
<p>Busybox is a Linux project, but that doesn't mean we don't have to worry
|
|
|
|
|
about portability. First of all, there are different hardware platforms,
|
|
|
|
|
different C library implementations, different versions of the kernel and
|
|
|
|
|
build toolchain... The file "include/platform.h" exists to centralize and
|
|
|
|
|
encapsulate various platform-specific things in one place, so most busybox
|
|
|
|
|
code doesn't have to care where it's running.</p>
|
|
|
|
|
|
|
|
|
|
<p>To start with, Linux runs on dozens of hardware platforms. We try to test
|
|
|
|
|
each release on x86, x86-64, arm, power pc, and mips. (Since qemu can handle
|
|
|
|
|
all of these, this isn't that hard.) This means we have to care about a number
|
|
|
|
|
of portability issues like endianness, word size, and alignment, all of which
|
|
|
|
|
belong in platform.h. That header handles conditional #includes and gives
|
|
|
|
|
us macros we can use in the rest of our code. At some point in the future
|
|
|
|
|
we might grow a platform.c, possibly even a platform subdirectory. As long
|
|
|
|
|
as the applets themselves don't have to care.</p>
|
|
|
|
|
|
|
|
|
|
<p>On a related note, we made the "default signedness of char varies" problem
|
|
|
|
|
go away by feeding the compiler -funsigned-char. This gives us consistent
|
|
|
|
|
behavior on all platforms, and defaults to 8-bit clean text processing (which
|
|
|
|
|
gets us halfway to UTF-8 support). NOMMU support is less easily separated
|
|
|
|
|
(see the tips section later in this document), but we're working on it.</p>
|
|
|
|
|
|
|
|
|
|
<p>Another type of portability is build environments: we unapologetically use
|
|
|
|
|
a number of gcc and glibc extensions (as does the Linux kernel), but these have
|
|
|
|
|
been picked up by packages like uClibc, TCC, and Intel's C Compiler. As for
|
|
|
|
|
gcc, we take advantage of newer compiler optimizations to get the smallest
|
|
|
|
|
possible size, but we also regression test against an older build environment
|
|
|
|
|
using the Red Hat 9 image at "http://busybox.net/downloads/qemu". This has a
|
|
|
|
|
2.4 kernel, gcc 3.2, make 3.79.1, and glibc 2.3, and is the oldest
|
|
|
|
|
build/deployment environment we still put any effort into maintaining. (If
|
|
|
|
|
anyone takes an interest in older kernels you're welcome to submit patches,
|
|
|
|
|
but the effort would probably be better spent
|
|
|
|
|
<a href="http://www.selenic.com/linux-tiny/">trimming
|
|
|
|
|
down the 2.6 kernel</a>.) Older gcc versions than that are uninteresting since
|
|
|
|
|
we now use c99 features, although
|
|
|
|
|
<a href="http://fabrice.bellard.free.fr/tcc/">tcc</a> might be worth a
|
|
|
|
|
look.</p>
|
|
|
|
|
|
|
|
|
|
<p>We also test busybox against the current release of uClibc. Older versions
|
|
|
|
|
of uClibc aren't very interesting (they were buggy, and uClibc wasn't really
|
|
|
|
|
usable as a general-purpose C library before version 0.9.26 anyway).</p>
|
|
|
|
|
|
|
|
|
|
<p>Other unix implementations are mostly uninteresting, since Linux binaries
|
|
|
|
|
have become the new standard for portable Unix programs. Specifically,
|
|
|
|
|
the ubiquity of Linux was cited as the main reason the Intel Binary
|
|
|
|
|
Compatability Standard 2 died, by the standards group organized to name a
|
|
|
|
|
successor to ibcs2: <a href="http://www.telly.org/86open/">the 86open
|
|
|
|
|
project</a>. That project disbanded in 1999 with the endorsement of an
|
|
|
|
|
existing standard: Linux ELF binaries. Since then, the major players at the
|
|
|
|
|
time (such as <a
|
|
|
|
|
href=http://www-03.ibm.com/servers/aix/products/aixos/linux/index.html>AIX</a>, <a
|
|
|
|
|
href=http://www.sun.com/software/solaris/ds/linux_interop.jsp#3>Solaris</a>, and
|
|
|
|
|
<a href=http://www.onlamp.com/pub/a/bsd/2000/03/17/linuxapps.html>FreeBSD</a>)
|
|
|
|
|
have all either grown Linux support or folded.</p>
|
|
|
|
|
|
|
|
|
|
<p>The major exceptions are newcomer MacOS X, some embedded environments
|
|
|
|
|
(such as newlib+libgloss) which provide a posix environment but not a full
|
|
|
|
|
Linux environment, and environments like Cygwin that provide only partial Linux
|
|
|
|
|
emulation. Also, some embedded Linux systems run a Linux kernel but amputate
|
|
|
|
|
things like the /proc directory to save space.</p>
|
|
|
|
|
|
|
|
|
|
<p>Supporting these systems is largely a question of providing a clean subset
|
|
|
|
|
of BusyBox's functionality -- whichever applets can easily be made to
|
|
|
|
|
work in that environment. Annotating the configuration system to
|
|
|
|
|
indicate which applets require which prerequisites (such as procfs) is
|
|
|
|
|
also welcome. Other efforts to support these systems (swapping #include
|
|
|
|
|
files to build in different environments, adding adapter code to platform.h,
|
|
|
|
|
adding more extensive special-case supporting infrastructure such as mount's
|
|
|
|
|
legacy mtab support) are handled on a case-by-case basis. Support that can be
|
|
|
|
|
cleanly hidden in platform.h is reasonably attractive, and failing that
|
|
|
|
|
support that can be cleanly separated into a separate conditionally compiled
|
|
|
|
|
file is at least worth a look. Special-case code in the body of an applet is
|
|
|
|
|
something we're trying to avoid.</p>
|
|
|
|
|
|
2006-01-29 11:59:01 +05:30
|
|
|
|
<h2><a name="tips" />Programming tips and tricks.</a></h2>
|
|
|
|
|
|
|
|
|
|
<p>Various things busybox uses that aren't particularly well documented
|
|
|
|
|
elsewhere.</p>
|
|
|
|
|
|
|
|
|
|
<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
|
|
|
|
|
|
|
|
|
|
<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
|
|
|
|
|
If the first character isn't '$', then it's an old DES style password. If
|
|
|
|
|
the first character is '$' then the password is actually three fields
|
|
|
|
|
separated by '$' characters:</p>
|
|
|
|
|
<pre>
|
|
|
|
|
<b>$type$salt$encrypted_password</b>
|
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
|
|
<p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
|
|
|
|
|
|
|
|
|
|
<p>The "salt" is a bunch of ramdom characters (generally 8) the encryption
|
|
|
|
|
algorithm uses to perturb the password in a known and reproducible way (such
|
|
|
|
|
as by appending the random data to the unencrypted password, or combining
|
|
|
|
|
them with exclusive or). Salt is randomly generated when setting a password,
|
|
|
|
|
and then the same salt value is re-used when checking the password. (Salt is
|
|
|
|
|
thus stored unencrypted.)</p>
|
|
|
|
|
|
|
|
|
|
<p>The advantage of using salt is that the same cleartext password encrypted
|
|
|
|
|
with a different salt value produces a different encrypted value.
|
|
|
|
|
If each encrypted password uses a different salt value, an attacker is forced
|
|
|
|
|
to do the cryptographic math all over again for each password they want to
|
|
|
|
|
check. Without salt, they could simply produce a big dictionary of commonly
|
|
|
|
|
used passwords ahead of time, and look up each password in a stolen password
|
|
|
|
|
file to see if it's a known value. (Even if there are billions of possible
|
|
|
|
|
passwords in the dictionary, checking each one is just a binary search against
|
|
|
|
|
a file only a few gigabytes long.) With salt they can't even tell if two
|
|
|
|
|
different users share the same password without guessing what that password
|
|
|
|
|
is and decrypting it. They also can't precompute the attack dictionary for
|
|
|
|
|
a specific password until they know what the salt value is.</p>
|
|
|
|
|
|
|
|
|
|
<p>The third field is the encrypted password (plus the salt). For md5 this
|
|
|
|
|
is 22 bytes.</p>
|
|
|
|
|
|
|
|
|
|
<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
|
|
|
|
|
"libbb/pw_encrypt.c". The first argument is the clear text password to be
|
|
|
|
|
encrypted, and the second is a string in "$type$salt$password" format, from
|
|
|
|
|
which the "type" and "salt" fields will be extracted to produce an encrypted
|
|
|
|
|
value. (Only the first two fields are needed, the third $ is equivalent to
|
|
|
|
|
the end of the string.) The return value is an encrypted password in
|
|
|
|
|
/etc/passwd format, with all three $ separated fields. It's stored in
|
|
|
|
|
a static buffer, 128 bytes long.</p>
|
|
|
|
|
|
|
|
|
|
<p>So when checking an existing password, if pw_encrypt(text,
|
|
|
|
|
old_encrypted_password) returns a string that compares identical to
|
|
|
|
|
old_encrypted_password, you've got the right password. When setting a new
|
|
|
|
|
password, generate a random 8 character salt string, put it in the right
|
|
|
|
|
format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
|
|
|
|
|
second argument to pw_encrypt(text,buffer).</p>
|
|
|
|
|
|
|
|
|
|
<h2><a name="tips_vfork">Fork and vfork</a></h2>
|
|
|
|
|
|
2006-02-24 01:29:34 +05:30
|
|
|
|
<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
|
|
|
|
|
expensive to implement (and sometimes even impossible), so a less capable
|
|
|
|
|
function called vfork() is used instead. (Using vfork() on a system with an
|
|
|
|
|
MMU is like pounding a nail with a wrench. Not the best tool for the job, but
|
|
|
|
|
it works.)</p>
|
|
|
|
|
|
2006-01-29 12:15:38 +05:30
|
|
|
|
<p>Busybox hides the difference between fork() and vfork() in
|
|
|
|
|
libbb/bb_fork_exec.c. If you ever want to fork and exec, use bb_fork_exec()
|
|
|
|
|
(which returns a pid and takes the same arguments as execve(), although in
|
|
|
|
|
this case envp can be NULL) and don't worry about it. This description is
|
|
|
|
|
here in case you want to know why that does what it does.</p>
|
|
|
|
|
|
2006-02-24 01:29:34 +05:30
|
|
|
|
<p>Implementing fork() depends on having a Memory Management Unit. With an
|
|
|
|
|
MMU then you can simply set up a second set of page tables and share the
|
|
|
|
|
physical memory via copy-on-write. So a fork() followed quickly by exec()
|
|
|
|
|
only copies a few pages of the parent's memory, just the ones it changes
|
|
|
|
|
before freeing them.</p>
|
|
|
|
|
|
|
|
|
|
<p>With a very primitive MMU (using a base pointer plus length instead of page
|
|
|
|
|
tables, which can provide virtual addresses and protect processes from each
|
|
|
|
|
other, but no copy on write) you can still implement fork. But it's
|
2006-04-11 00:10:27 +05:30
|
|
|
|
unreasonably expensive, because you have to copy all the parent process'
|
2006-02-24 01:29:34 +05:30
|
|
|
|
memory into the new process (which could easily be several megabytes per fork).
|
|
|
|
|
And you have to do this even though that memory gets freed again as soon as the
|
|
|
|
|
exec happens. (This is not just slow and a waste of space but causes memory
|
|
|
|
|
usage spikes that can easily cause the system to run out of memory.)</p>
|
|
|
|
|
|
|
|
|
|
<p>Without even a primitive MMU, you have no virtual addresses. Every process
|
2006-04-11 00:10:27 +05:30
|
|
|
|
can reach out and touch any other process' memory, because all pointers are to
|
|
|
|
|
physical addresses with no protection. Even if you copy a process' memory to
|
2006-02-24 01:29:34 +05:30
|
|
|
|
new physical addresses, all of its pointers point to the old objects in the
|
|
|
|
|
old process. (Searching through the new copy's memory for pointers and
|
|
|
|
|
redirect them to the new locations is not an easy problem.)</p>
|
|
|
|
|
|
|
|
|
|
<p>So with a primitive or missing MMU, fork() is just not a good idea.</p>
|
2006-01-29 11:59:01 +05:30
|
|
|
|
|
|
|
|
|
<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
|
|
|
|
|
rather than copying it (so what one process writes the other one sees). In
|
|
|
|
|
practice, vfork() has to suspend the parent process until the child does exec,
|
|
|
|
|
at which point the parent wakes up and resumes by returning from the call to
|
|
|
|
|
vfork(). All modern kernel/libc combinations implement vfork() to put the
|
|
|
|
|
parent to sleep until the child does its exec. There's just no other way to
|
2006-02-24 01:29:34 +05:30
|
|
|
|
make it work: the parent has to know the child has done its exec() or exit()
|
|
|
|
|
before it's safe to return from the function it's in, so it has to block
|
|
|
|
|
until that happens. In fact without suspending the parent there's no way to
|
|
|
|
|
even store separate copies of the return value (the pid) from the vfork() call
|
2006-01-29 11:59:01 +05:30
|
|
|
|
itself: both assignments write into the same memory location.</p>
|
|
|
|
|
|
|
|
|
|
<p>One way to understand (and in fact implement) vfork() is this: imagine
|
|
|
|
|
the parent does a setjmp and then continues on (pretending to be the child)
|
|
|
|
|
until the exec() comes around, then the _exec_ does the actual fork, and the
|
|
|
|
|
parent does a longjmp back to the original vfork call and continues on from
|
|
|
|
|
there. (It thus becomes obvious why the child can't return, or modify
|
|
|
|
|
local variables it doesn't want the parent to see changed when it resumes.)
|
|
|
|
|
|
|
|
|
|
<p>Note a common mistake: the need for vfork doesn't mean you can't have two
|
|
|
|
|
processes running at the same time. It means you can't have two processes
|
|
|
|
|
sharing the same memory without stomping all over each other. As soon as
|
|
|
|
|
the child calls exec(), the parent resumes.</p>
|
|
|
|
|
|
2006-01-29 12:15:38 +05:30
|
|
|
|
<p>If the child's attempt to call exec() fails, the child should call _exit()
|
|
|
|
|
rather than a normal exit(). This avoids any atexit() code that might confuse
|
|
|
|
|
the parent. (The parent should never call _exit(), only a vforked child that
|
|
|
|
|
failed to exec.)</p>
|
|
|
|
|
|
2006-01-29 11:59:01 +05:30
|
|
|
|
<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
|
|
|
|
|
(which presumably is much shorter than the heap), and leave the heap shared.
|
2006-02-24 01:29:34 +05:30
|
|
|
|
Even with no MMU at all
|
2006-01-29 11:59:01 +05:30
|
|
|
|
In practice, you've just wound up in a multi-threaded situation and you can't
|
2006-04-11 00:10:27 +05:30
|
|
|
|
do a malloc() or free() on your heap without freeing the other process' memory
|
2006-01-29 11:59:01 +05:30
|
|
|
|
(and if you don't have the proper locking for being threaded, corrupting the
|
|
|
|
|
heap if both of you try to do it at the same time and wind up stomping on
|
|
|
|
|
each other while traversing the free memory lists). The thing about vfork is
|
|
|
|
|
that it's a big red flag warning "there be dragons here" rather than
|
|
|
|
|
something subtle and thus even more dangerous.)</p>
|
|
|
|
|
|
2006-02-12 06:15:39 +05:30
|
|
|
|
<h2><a name="tips_sort_read">Short reads and writes</a></h2>
|
|
|
|
|
|
|
|
|
|
<p>Busybox has special functions, bb_full_read() and bb_full_write(), to
|
|
|
|
|
check that all the data we asked for got read or written. Is this a real
|
|
|
|
|
world consideration? Try the following:</p>
|
|
|
|
|
|
|
|
|
|
<pre>while true; do echo hello; sleep 1; done | tee out.txt</pre>
|
|
|
|
|
|
|
|
|
|
<p>If tee is implemented with bb_full_read(), tee doesn't display output
|
|
|
|
|
in real time but blocks until its entire input buffer (generally a couple
|
|
|
|
|
kilobytes) is read, then displays it all at once. In that case, we _want_
|
|
|
|
|
the short read, for user interface reasons. (Note that read() should never
|
|
|
|
|
return 0 unless it has hit the end of input, and an attempt to write 0
|
|
|
|
|
bytes should be ignored by the OS.)</p>
|
|
|
|
|
|
|
|
|
|
<p>As for short writes, play around with two processes piping data to each
|
2006-04-11 00:46:50 +05:30
|
|
|
|
other on the command line (cat bigfile | gzip > out.gz) and suspend and
|
2006-02-12 06:15:39 +05:30
|
|
|
|
resume a few times (ctrl-z to suspend, "fg" to resume). The writer can
|
|
|
|
|
experience short writes, which are especially dangerous because if you don't
|
|
|
|
|
notice them you'll discard data. They can also happen when a system is under
|
|
|
|
|
load and a fast process is piping to a slower one. (Such as an xterm waiting
|
|
|
|
|
on x11 when the scheduler decides X is being a CPU hog with all that
|
|
|
|
|
text console scrolling...)</p>
|
|
|
|
|
|
|
|
|
|
<p>So will data always be read from the far end of a pipe at the
|
|
|
|
|
same chunk sizes it was written in? Nope. Don't rely on that. For one
|
2006-04-11 00:46:50 +05:30
|
|
|
|
counterexample, see <a href="http://www.faqs.org/rfcs/rfc896.html">rfc 896
|
2006-02-12 06:15:39 +05:30
|
|
|
|
for Nagle's algorithm</a>, which waits a fraction of a second or so before
|
|
|
|
|
sending out small amounts of data through a TCP/IP connection in case more
|
|
|
|
|
data comes in that can be merged into the same packet. (In case you were
|
|
|
|
|
wondering why action games that use TCP/IP set TCP_NODELAY to lower the latency
|
|
|
|
|
on their their sockets, now you know.)</p>
|
|
|
|
|
|
2006-04-10 23:24:23 +05:30
|
|
|
|
<h2><a name="tips_memory">Memory used by relocatable code, PIC, and static linking.</a></h2>
|
|
|
|
|
|
|
|
|
|
<p>The downside of standard dynamic linking is that it results in self-modifying
|
2006-04-11 00:10:27 +05:30
|
|
|
|
code. Although each executable's pages are mmaped() into a process' address
|
2006-04-10 23:24:23 +05:30
|
|
|
|
space from the executable file and are thus naturally shared between processes
|
|
|
|
|
out of the page cache, the library loader (ld-linux.so.2 or ld-uClibc.so.0)
|
|
|
|
|
writes to these pages to supply addresses for relocatable symbols. This
|
|
|
|
|
dirties the pages, triggering copy-on-write allocation of new memory for each
|
2006-04-11 00:10:27 +05:30
|
|
|
|
processes' dirtied pages.</p>
|
2006-04-10 23:24:23 +05:30
|
|
|
|
|
|
|
|
|
<p>One solution to this is Position Independent Code (PIC), a way of linking
|
|
|
|
|
a file so all the relocations are grouped together. This dirties fewer
|
2006-04-11 00:10:27 +05:30
|
|
|
|
pages (often just a single page) for each process' relocations. The down
|
2006-04-10 23:24:23 +05:30
|
|
|
|
side is this results in larger executables, which take up more space on disk
|
|
|
|
|
(and a correspondingly larger space in memory). But when many copies of the
|
|
|
|
|
same program are running, PIC dynamic linking trades a larger disk footprint
|
|
|
|
|
for a smaller memory footprint, by sharing more pages.</p>
|
|
|
|
|
|
|
|
|
|
<p>A third solution is static linking. A statically linked program has no
|
|
|
|
|
relocations, and thus the entire executable is shared between all running
|
|
|
|
|
instances. This tends to have a significantly larger disk footprint, but
|
|
|
|
|
on a system with only one or two executables, shared libraries aren't much
|
|
|
|
|
of a win anyway.</p>
|
|
|
|
|
|
|
|
|
|
<p>You can tell the glibc linker to display debugging information about its
|
|
|
|
|
relocations with the environment variable "LD_DEBUG". Try
|
2006-04-11 00:10:27 +05:30
|
|
|
|
"LD_DEBUG=help /bin/true" for a list of commands. Learning to interpret
|
2006-04-10 23:24:23 +05:30
|
|
|
|
"LD_DEBUG=statistics cat /proc/self/statm" could be interesting.</p>
|
|
|
|
|
|
2006-04-18 03:17:03 +05:30
|
|
|
|
<p>For more on this topic, here's Rich Felker:</p>
|
|
|
|
|
<blockquote>
|
|
|
|
|
<p>Dynamic linking (without fixed load addresses) fundamentally requires
|
|
|
|
|
at least one dirty page per dso that uses symbols. Making calls (but
|
|
|
|
|
never taking the address explicitly) to functions within the same dso
|
|
|
|
|
does not require a dirty page by itself, but will with ELF unless you
|
|
|
|
|
use -Bsymbolic or hidden symbols when linking.</p>
|
|
|
|
|
|
|
|
|
|
<p>ELF uses significant additional stack space for the kernel to pass all
|
|
|
|
|
the ELF data structures to the newly created process image. These are
|
|
|
|
|
located above the argument list and environment. This normally adds 1
|
|
|
|
|
dirty page to the process size.</p>
|
|
|
|
|
|
|
|
|
|
<p>The ELF dynamic linker has its own data segment, adding one or more
|
|
|
|
|
dirty pages. I believe it also performs relocations on itself.</p>
|
|
|
|
|
|
|
|
|
|
<p>The ELF dynamic linker makes significant dynamic allocations to manage
|
|
|
|
|
the global symbol table and the loaded dso's. This data is never
|
|
|
|
|
freed. It will be needed again if libdl is used, so unconditionally
|
|
|
|
|
freeing it is not possible, but normal programs do not use libdl. Of
|
|
|
|
|
course with glibc all programs use libdl (due to nsswitch) so the
|
|
|
|
|
issue was never addressed.</p>
|
|
|
|
|
|
|
|
|
|
<p>ELF also has the issue that segments are not page-aligned on disk.
|
|
|
|
|
This saves up to 4k on disk, but at the expense of using an additional
|
|
|
|
|
dirty page in most cases, due to a large portion of the first data
|
|
|
|
|
page being filled with a duplicate copy of the last text page.</p>
|
|
|
|
|
|
|
|
|
|
<p>The above is just a partial list of the tiny memory penalties of ELF
|
|
|
|
|
dynamic linking, which eventually add up to quite a bit. The smallest
|
|
|
|
|
I've been able to get a process down to is 8 dirty pages, and the
|
|
|
|
|
above factors seem to mostly account for it (but some were difficult
|
|
|
|
|
to measure).</p>
|
|
|
|
|
</blockquote>
|
|
|
|
|
|
2006-05-01 10:56:01 +05:30
|
|
|
|
<h2><a name="tips_kernel_headers"></a>Including kernel headers</h2>
|
|
|
|
|
|
|
|
|
|
<p>The "linux" or "asm" directories of /usr/include contain Linux kernel
|
|
|
|
|
headers, so that the C library can talk directly to the Linux kernel. In
|
|
|
|
|
a perfect world, applications shouldn't include these headers directly, but
|
|
|
|
|
we don't live in a perfect world.</p>
|
|
|
|
|
|
|
|
|
|
<p>For example, Busybox's losetup code wants linux/loop.c because nothing else
|
|
|
|
|
#defines the structures to call the kernel's loopback device setup ioctls.
|
|
|
|
|
Attempts to cut and paste the information into a local busybox header file
|
|
|
|
|
proved incredibly painful, because portions of the loop_info structure vary by
|
|
|
|
|
architecture, namely the type __kernel_dev_t has different sizes on alpha,
|
|
|
|
|
arm, x86, and so on. Meaning we either #include <linux/posix_types.h> or
|
|
|
|
|
we hardwire #ifdefs to check what platform we're building on and define this
|
|
|
|
|
type appropriately for every single hardware architecture supported by
|
|
|
|
|
Linux, which is simply unworkable.</p>
|
|
|
|
|
|
|
|
|
|
<p>This is aside from the fact that the relevant type defined in
|
|
|
|
|
posix_types.h was renamed to __kernel_old_dev_t during the 2.5 series, so
|
|
|
|
|
to cut and paste the structure into our header we have to #include
|
|
|
|
|
<linux/version.h> to figure out which name to use. (What we actually do is
|
|
|
|
|
check if we're building on 2.6, and if so just use the new 64 bit structure
|
|
|
|
|
instead to avoid the rename entirely.) But we still need the version
|
|
|
|
|
check, since 2.4 didn't have the 64 bit structure.</p>
|
|
|
|
|
|
|
|
|
|
<p>The BusyBox developers spent <u>two years</u> _two years_ trying to figure
|
|
|
|
|
out a clean way to do all this. There isn't one. The losetup in the
|
|
|
|
|
util-linux package from kernel.org isn't doing it cleanly either, they just
|
|
|
|
|
hide the ugliness by nesting #include files. Their mount/loop.h
|
|
|
|
|
#includes "my_dev_t.h", which #includes <linux/posix_types.h> and
|
|
|
|
|
<linux/version.h> just like we do. There simply is no alternative.</p>
|
|
|
|
|
|
|
|
|
|
<p>We should never directly include kernel headers when there's a better
|
|
|
|
|
way to do it, but block copying information out of the kernel headers is not
|
|
|
|
|
a better way.</p>
|
|
|
|
|
|
2006-02-16 08:51:44 +05:30
|
|
|
|
<h2><a name="who">Who are the BusyBox developers?</a></h2>
|
|
|
|
|
|
|
|
|
|
<p>The following login accounts currently exist on busybox.net. (I.E. these
|
|
|
|
|
people can commit <a href="http://busybox.net/downloads/patches">patches</a>
|
2006-02-17 08:08:00 +05:30
|
|
|
|
into subversion for the BusyBox, uClibc, and buildroot projects.)</p>
|
|
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
|
aldot :Bernhard Fischer
|
|
|
|
|
andersen :Erik Andersen <- uClibc and BuildRoot maintainer.
|
|
|
|
|
bug1 :Glenn McGrath
|
|
|
|
|
davidm :David McCullough
|
|
|
|
|
gkajmowi :Garrett Kajmowicz <- uClibc++ maintainer
|
|
|
|
|
jbglaw :Jan-Benedict Glaw
|
|
|
|
|
jocke :Joakim Tjernlund
|
|
|
|
|
landley :Rob Landley <- BusyBox maintainer
|
|
|
|
|
lethal :Paul Mundt
|
|
|
|
|
mjn3 :Manuel Novoa III
|
|
|
|
|
osuadmin :osuadmin
|
|
|
|
|
pgf :Paul Fox
|
|
|
|
|
pkj :Peter Kjellerstedt
|
|
|
|
|
prpplague :David Anders
|
|
|
|
|
psm :Peter S. Mazinger
|
|
|
|
|
russ :Russ Dill
|
|
|
|
|
sandman :Robert Griebl
|
|
|
|
|
sjhill :Steven J. Hill
|
|
|
|
|
solar :Ned Ludd
|
|
|
|
|
timr :Tim Riker
|
|
|
|
|
tobiasa :Tobias Anderberg
|
|
|
|
|
vapier :Mike Frysinger
|
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
|
|
<p>The following accounts used to exist on busybox.net, but don't anymore so
|
|
|
|
|
I can't ask /etc/passwd for their names. (If anybody would like to make
|
|
|
|
|
a stab at it...)</p>
|
2006-02-16 08:51:44 +05:30
|
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
|
aaronl
|
|
|
|
|
beppu
|
|
|
|
|
dwhedon
|
|
|
|
|
erik : Also Erik Andersen?
|
|
|
|
|
gfeldman
|
|
|
|
|
jimg
|
|
|
|
|
kraai
|
|
|
|
|
markw
|
|
|
|
|
miles
|
|
|
|
|
proski
|
|
|
|
|
rjune
|
|
|
|
|
tausq
|
2006-04-10 23:24:23 +05:30
|
|
|
|
vodz :Vladimir N. Oleynik
|
2006-02-16 08:51:44 +05:30
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
|
|
|
2006-01-22 07:14:29 +05:30
|
|
|
|
<br>
|
|
|
|
|
<br>
|
|
|
|
|
<br>
|
|
|
|
|
|
|
|
|
|
<!--#include file="footer.html" -->
|