Architecture

This page documents the software architecture of fakeroot-ng. The documentation is relevant to the (as yet unreleased) multi-threaded debugger branch.

Throughout this page, the acronym "frng" is used to refer to fakeroot-ng.

The Daemon

Note: some people think that daemons are, by necessity, system startup processes. This is not the case. A daemon is any process that disassociates itself from the environment that started it.

Fakeroot-ng acts as a debugger (tracer) for the faked root process. To do this, it creates an extra process and disassociates it from the running environment. This is the fakeroot-ng daemon, and it is an unprivileged process, without any special permissions.

File Lie Database

Fakeroot-ng allows processes to carry out changes to the file system that would normally require root permission. These include, among other things, changing file ownership and creating device files. Since no fakeroot-ng related process actually has privileges, this is done by faking success on the relevant system calls, and then lying about the effect to all future requests. If you create a character device, and then ask for a list of files in the directory, fakeroot-ng will report that the file is a character device.

The lies frng tells the process are indexed in an unordered_map. The index key is the file's device and inode numbers. This allows the file lie database to persist across different processes, and also across invocations of frng. Also, since all of those lies are related to the stat structure, this indexing is extremely convenient.

The database is maintained in file_lie.h and file_lie.cpp.
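
As an illustration of the indexing scheme described above, here is a minimal sketch of a lie table keyed by device and inode numbers. It is not the code from file_lie.h: the names FileKey, FileKeyHash, FileLie, LieDb and apply_lie, and the choice of fields stored in each record, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical key: a file is identified by its device and inode numbers.
struct FileKey {
    dev_t dev;
    ino_t ino;
    bool operator==(const FileKey &rhs) const {
        return dev == rhs.dev && ino == rhs.ino;
    }
};

// Hash that mixes both halves of the key.
struct FileKeyHash {
    std::size_t operator()(const FileKey &k) const {
        std::size_t h1 = std::hash<unsigned long long>{}(k.dev);
        std::size_t h2 = std::hash<unsigned long long>{}(k.ino);
        return h1 ^ (h2 << 1);
    }
};

// Hypothetical record of what the traced process should be told.
struct FileLie {
    uid_t uid;
    gid_t gid;
    mode_t mode; // includes the file type bits, e.g. S_IFCHR
    dev_t rdev;  // device number reported for faked device nodes
};

using LieDb = std::unordered_map<FileKey, FileLie, FileKeyHash>;

// Overlay the recorded lie, if there is one, on a genuine stat result.
// Returns true if the result was altered.
bool apply_lie(const LieDb &db, struct stat &st) {
    auto it = db.find(FileKey{st.st_dev, st.st_ino});
    if (it == db.end())
        return false;
    st.st_uid = it->second.uid;
    st.st_gid = it->second.gid;
    st.st_mode = it->second.mode;
    st.st_rdev = it->second.rdev;
    return true;
}
```

Because a stat result already carries st_dev and st_ino, the lookup needs nothing beyond the structure being patched, which is the convenience the text refers to.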

Startup

Running the Daemon

Fakeroot-ng has two modes of operation. In the non-persistent mode any changes to the file system are discarded once the debugged process and all of its children exit. In the persistent mode (invoking frng with the -p option) the database is loaded from the state file (if it exists), and saved back to it once the daemon exits.

Running the daemon is managed through the daemonCtrl class. The standard mode of running a debugged process is fork+exec. In it, the parent forks, the child runs ptrace(PTRACE_TRACEME) and then performs execve to run the actual command. The parent is then responsible for waiting for the child to finish.
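
For reference, the conventional fork+exec arrangement can be sketched as follows. This is a generic Linux ptrace example, not code from frng; the function name run_traced is made up for the illustration, and the tracer does nothing except let the tracee continue.

```cpp
#include <cassert>
#include <csignal>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Conventional fork+exec tracing. Returns the command's exit status
// (127 if exec failed), or -1 on fork failure or death by signal.
int run_traced(char *const argv[]) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        // Child: ask to be traced by the parent, then become the command.
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
        execvp(argv[0], argv);
        _exit(127); // only reached if execvp failed
    }
    // Parent: every stop of the tracee is reported through waitpid.
    int status = 0;
    while (waitpid(child, &status, 0) == child) {
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WIFSTOPPED(status)) {
            // Swallow the SIGTRAP generated by exec, forward anything else.
            int sig = WSTOPSIG(status);
            if (sig == SIGTRAP)
                sig = 0;
            ptrace(PTRACE_CONT, child, nullptr,
                   reinterpret_cast<void *>(static_cast<long>(sig)));
        }
    }
    return -1;
}
```

Note that in this arrangement the caller of run_traced blocks until the command exits.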

This is not how frng works. Instead, the command to be run is a direct child of the shell that started frng. The daemon, if a new one is needed, detaches itself from the environment in the usual way. The debuggee is, therefore, the grandparent of the debugger. When debugging frng itself, it is sometimes useful for the daemon to stay attached to the standard output and standard input of the calling shell, and not to change directory to / (otherwise a core file will not be generated). The -d option to frng causes it to skip full daemonization.
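
The inverted relationship can be sketched with a double fork in which the original process keeps its place as the shell's child. This is an illustration only, not the daemonCtrl code: launch_daemon, its full_daemonize flag (standing in for the absence of -d) and the daemon_main callback are hypothetical names, and the pipe used to report the daemon's PID is an assumption of the sketch.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns only in the original process, with the daemon's PID or -1.
// The caller remains the shell's direct child and can later exec the command.
pid_t launch_daemon(bool full_daemonize, void (*daemon_main)()) {
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;
    pid_t mid = fork();
    if (mid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (mid == 0) {
        close(pipefd[0]);
        pid_t daemon = fork();
        if (daemon != 0)
            _exit(daemon < 0 ? 1 : 0); // intermediate process exits at once
        // Grandchild: this is the daemon.
        setsid();
        if (full_daemonize) {
            // Skipped when debugging, so stdio and core files stay usable.
            if (chdir("/") != 0)
                _exit(1);
            int devnull = open("/dev/null", O_RDWR);
            if (devnull >= 0) {
                dup2(devnull, 0);
                dup2(devnull, 1);
                dup2(devnull, 2);
                if (devnull > 2)
                    close(devnull);
            }
        }
        pid_t self = getpid();
        if (write(pipefd[1], &self, sizeof self) != sizeof self)
            _exit(1);
        close(pipefd[1]);
        daemon_main();
        _exit(0);
    }
    // Original process: reap the intermediate child, learn the daemon's PID.
    close(pipefd[1]);
    int status = 0;
    waitpid(mid, &status, 0);
    pid_t daemon_pid = -1;
    if (read(pipefd[0], &daemon_pid, sizeof daemon_pid) != sizeof daemon_pid)
        daemon_pid = -1;
    close(pipefd[0]);
    return daemon_pid;
}
```

After launch_daemon returns, the original process would call execvp on the real command, so the shell sees the command itself as its child while the tracer lives in a detached descendant.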

Despite the non-standard relation, throughout this document the debuggee is sometimes referred to as the "child" process. This is because, when working with ptrace, the status of every debuggee is retrieved via the wait system call, just as it is for a child process.

This non-standard mode of operation means that running a command under fakeroot-ng does not change how it behaves in a script. If the child itself daemonizes, running it under fakeroot-ng from a script will let the script continue execution as soon as the child finishes daemonizing. Compare this to running such a process through strace, where strace will only return when the last traced process finishes.

Daemon Control Sockets

Before running the actual command, the child process establishes a socket connection with the debugger process. Through this socket the child sends two commands: "Reserve" and "Attach". Once the second command is done, the child is free to execve the actual command to be executed.

Non-Persistent State Invocation

In the non-persistent case, every invocation of frng must result in a new daemon running. The communication socket with the daemon is opened using socketpair. Once the child performs execve, the only new processes to attach to the debugger are descendants of the original child process.

In this mode of operation, as soon as the last such descendant exits, the daemon also exits.
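
A minimal sketch of the control channel, assuming a SOCK_SEQPACKET socket pair and plain-text command names on the wire. The actual message format is not described on this page, so the literal strings and the helper names make_control_pair, send_command and recv_message are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Create the connected pair: one end for the child, one for the daemon.
bool make_control_pair(int sv[2]) {
    return socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == 0;
}

// Send one command as a single message.
bool send_command(int fd, const std::string &cmd) {
    ssize_t n = send(fd, cmd.data(), cmd.size(), MSG_NOSIGNAL);
    return n == static_cast<ssize_t>(cmd.size());
}

// Receive one message. An empty string means hangup or error.
std::string recv_message(int fd) {
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof buf, 0);
    if (n <= 0)
        return std::string();
    return std::string(buf, static_cast<std::size_t>(n));
}
```

SOCK_SEQPACKET keeps message boundaries, so "Reserve" and "Attach" arrive as two separate reads even when sent back to back, and a closed peer shows up as a zero-length read rather than a signal.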

Persistent State Invocation

If frng is invoked with the -p option, it is committed to providing a consistent view of the file system to all processes that asked to use the same state file. This is true whether these processes are descendants of the original child process or are a result of an unrelated invocation of frng that gave the same state file.

This means that, on startup, frng needs to decide whether to create a new daemon, or whether to try to attach to an existing daemon. At any time a child process exists, exactly one daemon must be running to handle it.

There is a potential race with this strategy. If one instance of the daemon is in the process of exiting while another is just starting up, the new instance might try to connect to the old instance's socket after the old instance is no longer listening on it. This bug was found and diagnosed by Russell Yanofsky.

To prevent this race, the following sequence is employed:

  1. Open the state file (creating it if not already existing), and try to lock it using flock.
  2. If the lock succeeds, no other daemon is running (and none can start, as we now hold the lock). Create the daemon, which will:
    1. Create a SOCK_SEQPACKET Unix domain socket and start listening on it. The socket is bound to a file with the same name as the state file, with the extension .sock.
  3. Whether or not we tried to create a daemon, close the state file and try to connect to the state.sock socket and send it the "Reserve" command.
  4. If the connection was refused, or the daemon hung up without replying to the "Reserve", then the daemon has exited. Retry everything up to this point.
  5. If the Reserve succeeded, then the existing daemon is guaranteed to not exit until we do. Send it an Attach and continue to the fork.
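
The building blocks of the sequence above can be sketched as follows. This is a hedged illustration rather than frng's implementation: the function names try_lock_state, listen_on and reserve are hypothetical, the "Reserve" wire format is assumed to be the plain string, and the retry loop and daemon creation are left to the caller. The sketch also assumes the daemon inherits the locked descriptor across fork, which is how an flock lock would outlive the launcher closing its own copy.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Fill in a Unix socket address; returns false if the path is too long.
static bool make_addr(const std::string &path, sockaddr_un &addr) {
    std::memset(&addr, 0, sizeof addr);
    if (path.size() >= sizeof addr.sun_path)
        return false;
    addr.sun_family = AF_UNIX;
    std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);
    return true;
}

// Step 1: open the state file (creating it if needed) and try to lock it
// without blocking. Returns the locked descriptor, or -1 if another
// holder exists or an error occurred.
int try_lock_state(const std::string &state_path) {
    int fd = open(state_path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Step 2.1 (daemon side): listen on the socket next to the state file.
int listen_on(const std::string &sock_path) {
    sockaddr_un addr;
    if (!make_addr(sock_path, addr))
        return -1;
    int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;
    unlink(sock_path.c_str()); // remove a stale socket file, if any
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Steps 3 and 4 (launcher side): connect, send Reserve, wait for a reply.
// Returns the connected descriptor, or -1 meaning "retry from step 1".
int reserve(const std::string &sock_path) {
    sockaddr_un addr;
    if (!make_addr(sock_path, addr))
        return -1;
    int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;
    char reply[64];
    if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0 ||
        send(fd, "Reserve", 7, MSG_NOSIGNAL) != 7 ||
        recv(fd, reply, sizeof reply, 0) <= 0) {
        close(fd); // refused or hung up: the daemon is gone
        return -1;
    }
    return fd;
}
```

A caller would loop: call try_lock_state, start a daemon if it returned a descriptor, close its copy of the state file, then call reserve on the .sock path and start over whenever reserve returns -1.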

This sequence serves two purposes. The first is to eliminate the race mentioned above. If the current daemon is in the process of shutting down it will not handle our Reserve request. Once it closes its socket, we'll get a hangup and retry the entire process. At this point we will likely get the lock and launch our own daemon. This is done without busy waiting.

The second purpose is that it allows us to know the debugger's PID before it tries to attach to us. This is important on Linux systems with the Yama security module enabled. On those systems, the child must issue a special prctl call to tell the system which PID is allowed to attach to it. This sequence allows frng to work with Yama's ptrace_scope set to 1 without any manual work by the user. Since that is the default on at least some versions of Ubuntu, this is rather important.
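
The prctl call in question is PR_SET_PTRACER. A minimal Linux-only sketch, assuming the daemon's PID has already been learned over the control socket; the wrapper name allow_tracer and its treatment of EINVAL as non-fatal are choices made for this example, not a description of frng's code.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

// Tell Yama that debugger_pid may attach to the calling process.
// The kernel reports EINVAL when Yama is not present (and also when the
// PID does not exist); this sketch treats EINVAL as non-fatal so the
// same code runs on kernels without Yama.
bool allow_tracer(pid_t debugger_pid) {
    if (prctl(PR_SET_PTRACER, static_cast<unsigned long>(debugger_pid),
              0, 0, 0) == 0)
        return true;
    return errno == EINVAL;
}
```

The call has to be made by the process that will be traced, before the daemon attaches, which is why knowing the daemon's PID in advance matters.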