Documentation update: more detail on vfork.

This commit is contained in:
Rob Landley 2006-02-23 19:59:34 +00:00
parent 73a20f3551
commit b21837714a


@@ -237,29 +237,41 @@ second argument to pw_encrypt(text,buffer).</p>
<h2><a name="tips_vfork">Fork and vfork</a></h2>
<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
expensive to implement (and sometimes even impossible), so a less capable
function called vfork() is used instead. (Using vfork() on a system with an
MMU is like pounding a nail with a wrench. Not the best tool for the job, but
it works.)</p>
<p>Busybox hides the difference between fork() and vfork() in
libbb/bb_fork_exec.c. If you ever want to fork and exec, use bb_fork_exec()
(which returns a pid and takes the same arguments as execve(), although in
this case envp can be NULL) and don't worry about it. This description is
here in case you want to know why that does what it does.</p>
<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
expensive to implement, so a less capable function called vfork() is used
instead.</p>
<p>Implementing fork() depends on having a Memory Management Unit. With an
MMU you can simply set up a second set of page tables and share the
physical memory via copy-on-write. So a fork() followed quickly by exec()
only copies a few pages of the parent's memory, just the ones it changes
before freeing them.</p>
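<p>The copy-on-write behavior can be observed from user space. The following is
a minimal sketch, assuming a POSIX system with an MMU; the function name is
hypothetical and is not part of Busybox. The child overwrites a variable, and
the parent then checks that its own copy is unchanged.</p>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a Busybox function): fork a child that
 * overwrites a local variable, then report the value the parent sees.
 * Returns -1 if fork() or waitpid() fails. */
int value_seen_by_parent_after_fork(void)
{
    volatile int value = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;          /* fork failed */

    if (pid == 0) {
        /* Child: this write lands in the child's private copy of the
         * page; the parent's copy is not affected. */
        value = 2;
        _exit(0);
    }

    /* Parent: wait for the child so its write has definitely happened. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    return value;           /* the parent's copy, never written by the child */
}
```

<p>Calling value_seen_by_parent_after_fork() from a test program should return
1 on such a system, since only the page the child wrote to was duplicated.</p>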
<p>The reason vfork() exists is that if you haven't got an MMU then you can't
simply set up a second set of page tables and share the physical memory via
copy-on-write, which is what fork() normally does. This means that actually
forking has to copy all the parent's memory (which could easily be tens of
megabytes). And you have to do this even though that memory gets freed again
as soon as the exec happens, so it's probably all a big waste of time.</p>
<p>With a very primitive MMU (using a base pointer plus length instead of page
tables, which can provide virtual addresses and protect processes from each
other, but no copy on write) you can still implement fork. But it's
unreasonably expensive, because you have to copy all the parent process's
memory into the new process (which could easily be several megabytes per fork).
And you have to do this even though that memory gets freed again as soon as the
exec happens. (This is not just slow and a waste of space but causes memory
usage spikes that can easily cause the system to run out of memory.)</p>
<p>This is not only slow and a waste of space, it also causes totally
unnecessary memory usage spikes based on how big the _parent_ process is (not
the child), and these spikes are quite likely to trigger an out of memory
condition on small systems (which is where nommu is common anyway). So
although you _can_ emulate a real fork on a nommu system, you really don't
want to.</p>
<p>Without even a primitive MMU, you have no virtual addresses. Every process
can reach out and touch any other process's memory, because all pointers are to
physical addresses with no protection. Even if you copy a process's memory to
new physical addresses, all of its pointers point to the old objects in the
old process. (Searching through the new copy's memory for pointers and
redirecting them to the new locations is not an easy problem.)</p>
<p>So with a primitive or missing MMU, fork() is just not a good idea.</p>
<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
rather than copying it (so what one process writes the other one sees). In
@@ -267,10 +279,10 @@ practice, vfork() has to suspend the parent process until the child does exec,
at which point the parent wakes up and resumes by returning from the call to
vfork(). All modern kernel/libc combinations implement vfork() to put the
parent to sleep until the child does its exec. There's just no other way to
make it work: they're sharing the same stack, so if either one returns from its
function it stomps on the callstack so that when the other process returns,
hilarity ensues. In fact without suspending the parent there's no way to even
store separate copies of the return value (the pid) from the vfork() call
make it work: the parent has to know the child has done its exec() or exit()
before it's safe to return from the function it's in, so it has to block
until that happens. In fact without suspending the parent there's no way to
even store separate copies of the return value (the pid) from the vfork() call
itself: both assignments write into the same memory location.</p>
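<p>The usual safe pattern that follows from this is for the child to do
nothing except exec or _exit(). The following is a minimal sketch, assuming a
POSIX system that provides vfork(); run_with_vfork() is a hypothetical name
used for illustration and is not Busybox's bb_fork_exec().</p>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a Busybox function): run the program at
 * `path` with vfork() + execv() and return its exit status, or -1 on
 * error or abnormal termination. */
int run_with_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;          /* vfork failed */

    if (pid == 0) {
        /* Child: it is borrowing the parent's memory, so it must not
         * return from this function or modify anything. It may only
         * exec or call _exit(). */
        execv(path, argv);
        _exit(127);         /* reached only if execv() failed */
    }

    /* Parent: execution resumes here only once the child has done its
     * exec or has exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<p>The child calls _exit() rather than exit() so that it does not run atexit
handlers or flush stdio buffers that belong to the parent.</p>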
<p>One way to understand (and in fact implement) vfork() is this: imagine
@@ -292,6 +304,7 @@ failed to exec.)</p>
<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
(which presumably is much shorter than the heap), and leave the heap shared.
Even with no MMU at all
In practice, you've just wound up in a multi-threaded situation and you can't
do a malloc() or free() on your heap without freeing the other process's memory
(and if you don't have the proper locking for being threaded, corrupting the