The following notes are taken from a software systems course I taught in which OS concepts featured rather prominently. For us, we want to focus on the basic outline of what a process is, what is private and what is shared, how to create processes, and how to communicate among them.

Unix is built on two fundamental abstractions: files and processes. A file is a stream of bytes. A process is an active entity, while a program is a static collection of bits. A program, more specifically, is a collection of bits that could be executed on the given computer under the given OS. It includes instructions to execute and information about static data. A program (the data) is stored in a file. A process is a program plus all the resources needed for its execution. We have had occasion to explore some of these resources (including the kernel structures for keeping track of them): a virtual CPU including process registers, stack pointer, and program counter; a virtual address space; a memory structure that includes memory areas for program code, static data, stack space, and dynamically allocated data; file descriptors; a process ID; real and effective user and group IDs; and the current working directory. A process starts with just one thread of control, but, as we've seen, it can acquire more.
There is another item associated with each process that it
would be appropriate to mention now: an environment. If
you type printenv into a shell, you'll see the
environment associated with your shell process.
From within C, an environment is an array of pointers to
character strings, just like the argv vector,
except that each string is of the
form VARIABLE=VALUE (variable names are
case-sensitive). A program can get to its environment via the
global variable environ, and environments are
supported by the calls getenv(),
putenv(), setenv(), and
unsetenv().
A particularly important environment variable, and one that
figures prominently in today's material,
is PATH, which provides a list of
directories to search when someone invokes a program.
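To make this concrete, here is a small sketch (the GREETING variable is just made up for illustration) showing the environment calls in action:

#include <stdio.h>
#include <stdlib.h>

extern char **environ;    /* the global environment vector */

int main(void)
{
    /* Look up a single variable, such as PATH. */
    char *path = getenv("PATH");
    if (path != NULL)
        printf("PATH = %s\n", path);

    /* Add or change a variable; GREETING is an illustrative name. */
    setenv("GREETING", "hello", 1);    /* 1 means overwrite if it exists */

    /* Walk the whole environment, much as printenv does. */
    for (char **p = environ; *p != NULL; p++)
        printf("%s\n", *p);

    return 0;
}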
We think of a process as running the instructions from and starting out with the data from a program executable file, and every operating system includes some way to run programs. It would be natural, therefore, to have a system call that takes the name of a file (or a file descriptor), creates a new process, loads the program into the address space, and starts it up. On some operating systems, there is indeed a call to spawn a new process in this way.
The traditional Unix model is a bit different, relying on two
separate types of system calls that, put together, perform this
function. (Actually, modern Unix systems support a
posix_spawn(), but we'll ignore that for now.)
- The fork() call creates a new process, but the program running in the new process is the same as the one that called fork(): when you fork a new process, the kernel logically copies your instruction, data, and stack segments into the new process.
- The exec() family of calls (none are actually called exec(), as we'll see below) replaces the program running in a process with a new program. No new process is created; the process just runs a different program. (You can think of this as being analogous to a tail call optimization in a compiler.)
Pause and think that through. The only way to run a program
is via exec(), which doesn't create a new process;
and the only way to create a process is fork(),
which doesn't run a new program. (In Linux, fork()
is implemented via a system call called clone(),
but this has more to do with Linux threads than the present
material.)
The usual way, therefore, to launch a program in a new
process is to fork() a new process and then have
the new process (the child) immediately
exec() the program. If you think this seems
unnecessarily difficult, you're right. A lot of impressive
engineering has been invested to make this all work efficiently.
exec()
There is no single exec() call. Rather, there
is a family of exec() system calls that
provides some flexibility in how arguments are passed in and how
the program is looked up. All the types of exec()
cause a program from a file to be loaded into the current
process.
How to call exec()
Each exec() call has a one or two character
suffix. The first character is either l, which
means the program's arguments are passed to the
exec() call in the parameter list, or a
v, which means the program's arguments are passed
into the exec() in a single vector (array)
argument. If there is a p in the second position,
it means that the program is to be looked up using the
PATH environment variable. If there is an
e in the second position, that means the caller
wants to pass in an environment explicitly rather than have the
new program inherit the existing environment.
int execv(char *path, char *argv[]);
int execl(char *path, char *arg0, char *arg1,..., (char *)NULL);
int execvp(char *path, char *argv[]);
int execlp(char *path, char *arg0, char *arg1,..., (char *)NULL);
int execve(char *path, char *argv[], char *envv[]);
int execle(char *path, char *arg0, char *arg1,..., (char *)NULL, char *envv[]);
If successful, none of these calls ever return, like calling
a continuation. If they do return, there was an error. In this
case, they return -1 and set errno.
exec() might fail if the file doesn't exist, isn't
a valid executable, or is not executable by the effective user
or group ID of the current process.
The path is the file to execute. In the
p versions, the path is a file name that will be
searched for in each element of the path list defined by the
PATH environment variable. The program file must
be a valid executable or a script. The
convention is that script files begin with the
characters #! (pronounced “she-bang”)
followed by the pathname of an interpreter to execute the
script on the first line. A particularly common script is a
shell script, where the interpreter is a Unix shell,
like bash: #!/usr/bin/bash
The argv arguments are just like the
argv parameters every C program gets from the shell
— every program is executed via an exec()
call! Recall what argv usually looks like:
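The layout is the familiar one; as a sketch, building such a vector by hand for a made-up ls invocation would look like this:

char *args[] = { "ls", "-l", "notes.txt", NULL };   /* args[0] is the program name by convention */
execv("/bin/ls", args);                             /* only returns on error */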
Note: the last element of the argument
vector must be NULL. (Go ahead and check:
argv[argc] actually does exist in your programs,
and its value is always NULL.) The l
variants allow one to specify the parameters without having to
build an array, provided you know how many parameters there are
at compile time. This is slick: on systems where the stacks
grow from higher to lower memory addresses, the library doesn't
have to build the vector for the actual kernel call; it just
uses the vector created when your arguments are pushed on the
stack (in reverse order)! And so you must remember
the NULL here, too.
The envv[] argument is a vector of strings representing variable/value pairs.
Things you should know about using exec()
Since exec() leaves the process intact, much of
the associated information is preserved. The process ID, real
and effective UIDs and GIDs (unless the program file has its set-UID
or set-GID bit set), the parent process, accumulated times, and open file
descriptors are all preserved.
There is a way to mark a file descriptor so that it is closed
automatically on an exec() (the close-on-exec flag, set with
fcntl()). However, it is usually a good idea
to keep track of what files you have open and close them when
you do an exec(), except for standard input,
output, and error, of course.
The text (instructions), data, bss, and stack are
overwritten/reinitialized. Signals are cleared, functions
registered with atexit() are cleared, any shared
memory segments with other processes are unshared, and in
general anything that doesn't make sense in the context of a new
program is not preserved.
An exec() example
Here is an example of an exec() call in action.
Try it out yourself!
The program below (whose source is in
execl-example.c)
prints a message including its name and process ID, then
executes another program (source in
print-pid.c), and
then prints another message:
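A minimal version consistent with that description might look like this (a sketch; the actual execl-example.c may differ, and ./print-pid is an assumed path for the second program):

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    printf("%s: running as process %d\n", argv[0], (int) getpid());
    fflush(stdout);                  /* flush before exec() replaces the image */

    execl("./print-pid", "print-pid", (char *) NULL);

    /* We only get here if the exec() failed. */
    perror("execl");
    printf("%s: this line should never be printed\n", argv[0]);
    return 1;
}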
The print-pid program, whose source is below,
just prints out its name and PID:
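Again, a sketch consistent with the description:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    printf("%s: running as process %d\n", argv[0], (int) getpid());
    return 0;
}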
As you can see if you run the two programs, the message after the
exec() call is never printed, because
exec() never returns. Notice how the program name
is different, but the PID is the same.
Note: any buffered output you want to survive should be written out before the
exec(), since the user-level I/O buffers are part of the image that gets replaced.
See the fflush() and fsync() manual
pages.
fork()
The fork() call creates a new process that is
essentially a clone of the current process at the moment of
the fork() call (in fact, Linux uses
a clone() call as we said before). The kernel
creates a new address space and copies the instruction and data
segments into that address space (in fact, the instruction
segment is shared). File descriptors are copied, but the file
structures (including the file pointer) are shared.
N.B. The
new process is not initialized from a program file.
That's what exec() is for.
Each
process has its own stack (the new stack is a copy of the old
one), process ID, process state (including execution times),
and, from the fork() on, its own data.
How to call fork()
The fork() call itself is very simple:
pid_t fork(void);
Its operation is straightforward, but understanding how to program with it requires some imagination.
When your program is running in a process, and it executes a
fork(), the result is two separate processes
running your program. The new process does not start the
program from the beginning (remember, all that data you've
computed and state you have accumulated is copied, so
reinitializing would mean reloading the program file). Instead,
the new process starts executing at the only reasonable place:
the return from the fork() call.
So exec() never returns, and fork()
returns twice (if successful).
Your program must account for two separate processes
returning from the fork(). (Which returns first is
unpredictable and depends on the particular process schedule in
your operating system — it may not even be consistent
across programs or runs of the same program.) It would be
unusual for both the parent and child to do exactly the same
thing.
When you write your program, you can distinguish the
processes by the return value of fork(): the new
process, which is a child of the first process, gets back a
0 while the parent process gets back the process ID
of the child. A return value of -1 means the
fork() was unsuccessful, usually because of a lack
of resources or a limit on the number of processes.
Things you should know about using fork()
Here is a standard example program (fork-example.c)
that uses
fork() (and not much else):
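A minimal version consistent with the discussion that follows might be (a sketch; the actual fork-example.c may differ):

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("Before the fork\n");          /* executed once... or is it? */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    /* Both the parent and the child execute this line. */
    printf("Hello from process %d (fork returned %d)\n",
           (int) getpid(), (int) pid);
    return 0;
}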
Run this example yourself as shown in the example script and see what you get.
Notice how the line printing the process ID gets executed twice, once by each process. But wait, it appears that the first line executes twice in one run but not in the other! It doesn't actually execute twice, of course. The key to understanding this behavior is the I/O redirection.
When writing to a terminal, the kernel flushes the output
buffers when there is a newline character (something we're all
familiar with by now). But when writing to a file, buffering
takes place in larger units that are a better match to the disk
block size. After the fork(), the child process
gets a copy of all the parent's data, including the unflushed
user-level I/O buffers. When each process exits, it flushes
its I/O buffers, so each process sends a copy of this data to
disk.
The moral is that you need to think about the state of your process before a
fork(). In particular,
you should be mindful of open files and unflushed I/O buffers.
The easiest thing to do is to flush and close open files before
forking.
Keep in mind that the child gets a copy of the address space of the parent. (Read-only memory, like the text segment, may be shared.) This means that any modifications either process makes to its data are not visible to the other. This is a feature that greatly simplifies concurrent programming with processes (and contrasts sharply with the shared memory model of threads). Generally, this is good from the point of view of software engineering and abstraction. However, questions of efficiency aside, it does mean that it can be difficult for cooperating processes to communicate.
One thing that is shared is the file table entry for any open files. That means that parent and child also share a file pointer for any open files. This is good if you want both processes to write to, for example, a common standard output, standard error, or a log file. You should be careful about I/O however, because there are no guarantees about which process will run when or for how long at a time, so I/O operations can be unpredictably interleaved.
Try the following example (fork-IO.c):
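A sketch in the same spirit (the file name shared-output.txt is made up; the actual fork-IO.c may differ) would be:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* Parent and child will share this descriptor's file pointer. */
    int fd = open("shared-output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    pid_t pid = fork();
    const char *tag = (pid == 0) ? "child " : "parent";

    for (int i = 0; i < 5; i++) {
        char line[64];
        snprintf(line, sizeof line, "%s writes line %d\n", tag, i);
        write(fd, line, strlen(line));
    }
    return 0;
}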
Here is what happened to me on a Linux station:
To spawn a process executing a new program (as
in posix_spawn()), we would code something that
looks like this:
...
pid = fork();
if (pid == 0) {
    /* Child: use exec() to load a new program over me; any program would do. */
    execlp("ls", "ls", "-l", (char *) NULL);
    _exit(1);   /* only reached if the exec() failed */
}
/* Parent process finishes up and terminates (or waits for the child) */
...
Isn't this Monstrously Inefficient?
Implemented in the obvious way, yes. And Unix systems did implement these calls in the obvious way at one time. In this model, fork() is a very expensive (heavyweight)
operation: The kernel must build all the new process
related structures, clone the address space of the parent
process by copying everything into a new address space, then
launch the child process which almost immediately overwrites all
the recently copied data from a program file. This situation
led to a nasty hack (vfork() in BSD Unix) and
provided an added impetus to the development of threads,
aka
lightweight processes.
In order to ameliorate the high cost of these primitives in
the common case, Unix systems started to use copy on
write memory pages, which most modern virtual memory
systems support. The idea is that the child process gets a
copy of the parent process's page table, which means that
it actually shares physical memory resources with the parent.
However, the data pages (including the stack) are marked
copy-on-write. As long as a page is only read, there is no
copying. When either process tries to write to a shared page,
it is copied then, and the child process's page table is updated
accordingly. Thus, the fork() call itself only has
to create a new process structure (a task_struct in
Linux) and copy the page table.
In the typical case where the new process calls
exec() almost immediately, few, if any, pages are
written, and therefore, almost no copying takes place.
Why did they do it this way?
I have asked this question a lot. It seems odd to split the common task of spawning a new process into two parts. [Stevens, p. 193] argues that there are two main motivations:
- fork() is quite useful on its own. For example, it is quite common for a network server to have a main process that waits for requests over a network socket. When a request comes in, the server forks. The child processes the request and the parent continues waiting for another request. You will write a similarly structured program in which the parent process reads requests from its input and dispatches child processes to handle the requests.
- The two-part model allows the child process to do some things before calling exec(). For example, the child may perform I/O redirection or change some other process attributes before having them be inherited across an exec().
Where does it all end?
To understand how a process ends, we need to let you in on a secret about how they begin: In Unix systems, the kernel does not actually start executing your program in main(). The linker inserts a little bit of startup
code that takes the environment and arguments and then calls
your main(). (Some of the information for this
section was taken from Stevens, Chapter 7.)
When a process ends, its resources are returned to the kernel, and, usually, its parent is notified. Processes can end normally or abnormally.
Normal Termination
To terminate normally, processes call one of two functions:
void exit(int status);
void _exit(int status);
There is also an _Exit(), but it's just the same
as _exit().
We have been using exit() for some time.
exit() calls all exit handlers (registered with
atexit()), then performs I/O cleanup duties
(flushing all buffers), and then calls _exit().
_exit() returns directly to the kernel.
In both cases, open file descriptors are closed and the parent process is notified (and can query the return code).
So, the startup code the linker puts in your program, essentially does this:
exit(main(argc, argv));
[Stevens, p. 162.]
Why would you ever call _exit()? There are
situations in which you actually want to avoid the cleanup
actions implied by exit(). One common case
involves a fork() that doesn't involve a subsequent
exec(). If the parent program had registered exit
handlers, then usually they should only be done once. The
typical strategy is to let the parent do the cleanup and have
the child process call _exit().
There is a lot of energy devoted to this idea of cleaning up
after a program. We know about the most obvious reason: We
normally want to ensure all user-level buffers are flushed
before the process's memory disappears (thus,
the normal cleanup involves the equivalent of an
fclose() on all open files). But it can be handy
for the programmer to be able to specify actions to be performed
however a program terminates. Imagine a database application.
If an error arises and some part of the program calls
exit(), the application may need to undo the
changes it made to the database or ensure that the index
structures are not in an inconsistent state. The way to specify
that you would like something done when your program terminates
is to wrap up the activities in a function and register
that function with the atexit() call:
int atexit(void (*function)(void));
atexit() uses a function pointer. In this
instance, the parameter named function is a pointer
to a function that takes no arguments (void) and
returns no value. A process can register up to 32 such
functions.
Here is an example program (atexit-demo.c)
that registers two different exit handlers, each one twice:
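A minimal version that matches the description might be (a sketch; the actual atexit-demo.c may differ):

#include <stdio.h>
#include <stdlib.h>

static void handler_one(void) { printf("handler one\n"); }
static void handler_two(void) { printf("handler two\n"); }

int main(void)
{
    /* Register each handler twice. */
    atexit(handler_one);
    atexit(handler_one);
    atexit(handler_two);
    atexit(handler_two);

    printf("main is done\n");
    return 0;   /* equivalent to calling exit(0) */
}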
Notice two things:
- The exit handlers are called once for each time they are registered.
- They are called in reverse order: The last function registered is the first one called.
Write a program that takes an integer command line argument and then sleeps for that number of seconds, prints out a wake up message, and terminates.
Then use atexit() to add one or more handlers.
Then run the program and type C-c at it.
Abnormal Termination
Abnormal termination usually results from an error or an unhandled interruption. (We'll be talking about signaling a process later.) A process can deliberately create an abnormal termination by using abort(), which works by
sending the process a SIGABRT:
void abort(void);
In the case of an abnormal abort (deliberate or otherwise) due
to an unhandled signal, the
process's status is set by the kernel to a value that
reflects the reason the process was terminated.
The Unix Undead
When a process exits, whether abnormally or normally, the kernel reclaims its resources, including its virtual address space, buffers, etc. However, a process descriptor remains in the kernel's process list. The structure contains an indication that the process has terminated (its process state) and the exit status.
A terminated process does not completely disappear until someone asks for the return code. A process in this state (in which almost all of its resources are gone but its process descriptor and return code are still there) is called a zombie. Once the parent gets the status (or the system decides no one really cares), the process is completely reclaimed. Collecting the exit statuses so zombie descriptors can be deallocated is called reaping.
See the Wikipedia page on zombie processes for more information.
Nothing Left but the Waiting
A parent can wait for one of its children to terminate. Indeed, it is not nice to leave zombies around, so one should keep track of all fork()ed processes and collect their return
statuses. There is a family of wait() calls to
support this.
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
wait() blocks until any child terminates. The
terminated child's status is placed in status and
the return value is the process ID of the terminating child.
waitpid() is similar, except that it waits for
the specific process with the given process ID to terminate. If
pid is -1, that means to wait for any
child (like wait()). There are other values that
have to do with restricting the parent's interest to processes
in specific process groups (which we have not yet discussed).
waitpid() also has an option, WNOHANG,
that tells it not to block if a matching child has not
terminated (in which case it returns 0).
We saw above that normal termination involves a final process
status code, and abnormal termination involves a status that
indicates the reason for the termination. (A normal exit code,
specified by the argument to exit(),
_exit(), or by returning from main(),
should be a one byte value.) You can tell from a process's status
whether it terminated normally or abnormally using a collection
of macros that come along with the wait() calls:
- WIFEXITED(status) returns true if the process terminated normally (by returning from main() or calling exit() or _exit()).
- WEXITSTATUS(status) returns the least significant byte of the return code of the process, provided it terminated normally.
- WIFSIGNALED(status) returns true if the process terminated as the result of an uncaught signal.
- WTERMSIG(status) returns the number of the signal that caused the process to terminate.
- WIFSTOPPED(status) returns true if the process is stopped.
- WSTOPSIG(status) returns the number of the signal which caused the child to stop.
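Here is a brief sketch (not a program from these notes) of how these macros are typically used together with wait():

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0)
        exit(42);    /* child: exit with a recognizable status */

    int status;
    pid_t reaped = wait(&status);      /* blocks until the child terminates */
    if (WIFEXITED(status))
        printf("child %d exited normally with status %d\n",
               (int) reaped, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("child %d was killed by signal %d\n",
               (int) reaped, WTERMSIG(status));
    return 0;
}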
Orphaned Processes
A polite program reaps its zombies. But what if a program is impolite or crashes? What happens to its children? A process whose parent has terminated is called an orphan process. In most Unix systems, the kernel finds all children of a process that terminates and changes their parent process IDs to 1, the process ID of the
init process, which is said to inherit (or
adopt) the orphaned processes. init is the
process that is started immediately after the operating system
boots. It brings the system to a particular run level
(e.g., single-user or multi-user) and then goes through all the
runtime configuration files and starts (via fork()
and
exec()) all the appropriate programs.
The init process is polite: it periodically
wait()s on its children to reap zombies.
Programming with Processes
Once a process has been started, it is off and running on its own. How can multiple processes cooperate to get their work done? You can control what is in a new process's memory when it starts, so a child process may look in some variable to find out what it should do. For example, a network-based application may fork a process for each request, and each child simply looks in a programmer-determined variable like current_request to figure out what to do.
This is enough for a surprisingly large number of applications. However, it is often desirable for processes to work together more closely. Perhaps the overhead of starting up a new process is large enough that we don't actually want to fork a process on every request in a network application. (This is also one of the principal motivations for threads, as we shall see later.) In such a case, we would like processes to be able to communicate. Inter-Process Communication (IPC) is a huge topic. In Unix-like systems, there are a range of options for IPC:
- The file system provides the easiest mechanism to use. Given what we saw above about unpredictable ordering of I/O operations, one must be careful with multiple processes using the same file. But disciplined use is fairly straightforward. For example, the parent process can create a file to which it writes incoming requests; a child process can read requests from the shared file. As long as their output operations are to different files or are synchronized, everything will work well.
- Pipes are a file-like mechanism supported by Unix-like operating systems in which one process puts data into a pipe and another reads it out. You've used pipes a lot in the shell. (There are also bi-directional pipes.) Pipes could be implemented in the file system, but they are actually supported by the kernel in Unix-like systems: they don't need to create any disk files; the data simply goes in and out of kernel buffers. The relevant system calls are pipe(), which creates a pipe, plus dup() and dup2() for wiring a pipe end to standard input or output. (A minimal sketch appears after this list.)
- FIFOs or named pipes work like regular pipes, but they overcome the restriction that only related processes may use a regular pipe. Named pipes have a name, like a file, so that unrelated processes can open them if they have the right permissions. Unfortunately, FIFOs don't guarantee that multiple readers can work on a single FIFO, i.e., FIFOs were designed with a simple producer/consumer model in mind.
- Message queues are a way for different processes to send arbitrary data to one another. The primary limitation of message queues is that the processes must be on the same machine, and these days, programmers want to plan for the eventuality that cooperating processes may be on different machines.
- Sockets are a mechanism for processes to send messages to each other, but the processes need not be on the same computer (though they can be). The HTTP server is a program that listens for requests on a socket. (Remote procedure call (RPC) is a generic mechanism on top of sockets that allows a process to call functions implemented by another process.) One downside to programming with sockets is that, if processes really are running on different machines, sending binary data is complicated. Sending text messages is normally fine, but machines can use different representations for other forms of data. A common difference is little endian versus big endian numeric representations. In any case, numeric and structured data must be marshalled and unmarshalled.
- Semaphores and locks allow processes to synchronize and to avoid simultaneous manipulation of a shared resource.
- Signals are a simple way to communicate a
small amount of information to a process
asynchronously. The kernel sends a process a
signal when certain errors arise (like division by zero).
Users can send a process a signal, often to kill it
(typing
C-c at a shell does this). And processes can signal each other or themselves. Setting an alarm that will alert the process when a certain amount of time has elapsed can be quite useful.
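As promised in the pipes item above, here is a minimal sketch (not a program from these notes) of a parent creating a pipe, forking, and reading what the child writes:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];                 /* fds[0] is the read end, fds[1] the write end */
    if (pipe(fds) < 0) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: write a message into the pipe. */
        close(fds[0]);
        const char *msg = "hello from the child\n";
        write(fds[1], msg, strlen(msg));
        close(fds[1]);
        return 0;
    }

    /* Parent: read the child's message. */
    close(fds[1]);
    char buf[64];
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("parent received: %s", buf);
    }
    close(fds[0]);
    wait(NULL);                 /* reap the child */
    return 0;
}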