Latest revision as of 16:57, 22 April 2019

This page documents the software architecture of fakeroot-ng. The documentation is relevant to the (as yet unreleased) multi-threaded debugger branch.

Throughout this page, the acronym "frng" is used to refer to fakeroot-ng.

The Daemon

Note: some people think that daemons are, by necessity, system startup processes. This is not the case. A daemon is any process that disassociates itself from the environment that started it.

Fakeroot-ng acts as a debugger (tracer) for the faked root process. To do this, it creates an extra process and disassociates it from the running environment. This is the fakeroot-ng daemon, and it is an unprivileged process, without any special permissions.

File Lie Database

Fakeroot-ng allows processes to carry out changes to the file system that would require root permission. These include, among other things, changing file ownership and creating device files. Since no fakeroot-ng related process actually has privileges, this is done by faking success on the privileged system calls, and then lying about the effect to all future requests. If you create a character device, and then ask for a list of files in the directory, fakeroot-ng will lie to you that the file is a character device.

The lies frng tells the process are indexed in an unordered_map. The index key is the file's device and inode numbers. This allows the file lie database to persist across different processes, and also across invocations of frng. Also, since all of those lies are related to the stat structure, this indexing is extremely convenient.

The database is maintained in file_lie.h and file_lie.cpp.
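
As an illustration of the indexing described above, the following is a minimal sketch of an unordered_map keyed by device and inode numbers. The names FileKey, FileKeyHash, FileLie, LieMap and apply_lie are hypothetical and are not taken from file_lie.h; the set of faked fields is likewise an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical key: the (device, inode) pair that identifies a file.
struct FileKey {
    dev_t dev;
    ino_t ino;
    bool operator==(const FileKey &rhs) const {
        return dev == rhs.dev && ino == rhs.ino;
    }
};

// Combine the hashes of both numbers into one value.
struct FileKeyHash {
    std::size_t operator()(const FileKey &k) const {
        std::size_t h1 = std::hash<unsigned long long>()(k.dev);
        std::size_t h2 = std::hash<unsigned long long>()(k.ino);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Hypothetical record of what the traced process should be told.
struct FileLie {
    uid_t uid;
    gid_t gid;
    mode_t mode;
    dev_t rdev;
};

using LieMap = std::unordered_map<FileKey, FileLie, FileKeyHash>;

// Overwrite fields of a real stat result with the stored lie, if one exists.
// Returns true if a lie was applied.
bool apply_lie(const LieMap &lies, struct stat &st) {
    auto it = lies.find(FileKey{st.st_dev, st.st_ino});
    if (it == lies.end())
        return false;
    st.st_uid = it->second.uid;
    st.st_gid = it->second.gid;
    st.st_mode = it->second.mode;
    st.st_rdev = it->second.rdev;
    return true;
}
```

Because the key consists of values that the kernel already reports in every stat result, the lookup needs no path resolution, and the same entry is found no matter which process or path is used to reach the file.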

Startup

Running the Daemon

Fakeroot-ng has two modes of operation. In the non-persistent mode any changes to the file system are discarded once the debugged process and all of its children exit. In the persistent mode (invoking frng with the -p option) the database is loaded from the state file (if it exists), and saved back to it once the daemon exits.

Running the daemon is managed through the daemonCtrl class. The standard mode of running a debugged process is fork+exec. In it, the parent forks, the child runs ptrace(PTRACE_TRACEME) and then performs execve to run the actual command. The parent is then responsible for waiting for the child to finish.
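
For comparison, the conventional fork+exec model can be sketched roughly as follows. This is not frng code: run_traced is a hypothetical helper, it uses execv rather than execve for brevity, and it simply resumes the child at every stop instead of inspecting system calls.

```cpp
#include <csignal>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run argv[0] with the parent acting as tracer.
// Returns the child's exit status, or -1 on failure or death by signal.
int run_traced(char *const argv[]) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        // Child: ask to be traced by the parent, then run the command.
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
        execv(argv[0], argv);
        _exit(127); // only reached if execv failed
    }
    // Parent: every status change of the debuggee arrives through waitpid.
    for (;;) {
        int status = 0;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WIFSTOPPED(status)) {
            // Swallow the SIGTRAP generated by exec, forward anything else.
            int sig = WSTOPSIG(status);
            if (sig == SIGTRAP)
                sig = 0;
            ptrace(PTRACE_CONT, child, nullptr,
                   reinterpret_cast<void *>(static_cast<long>(sig)));
        }
    }
}
```

In this model the caller cannot return before the traced command finishes, since the tracer is the process the shell is waiting on.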

This is not how frng works. Instead, the command to be run is a direct child of the shell that started frng. The daemon, if a new one is needed, detaches itself from the environment in the usual way. The debuggee is, therefore, the grandparent of the debugger. For debugging purposes, it is sometimes beneficial for frng not to disassociate itself from the standard input and standard output of the calling shell, and not to change directory to / (otherwise a core file will not be generated). The -d option to frng causes it to not perform full daemonization.

Despite the non-standard relation, throughout this document the debuggee is sometimes referred to as the "child" process. This is because, when working with ptrace, the status of every debuggee is retrieved via the wait system call.

This non-standard mode of operation allows fakeroot-ng to not affect script usage of commands. If the child itself daemonizes, running it under fakeroot-ng from a script will let the script continue execution as soon as the child finishes daemonizing. Compare this to running such a process through strace, where strace will only return when the last traced process finishes.

Daemon Control Sockets

Before running the actual command, the child process opens a socket to the debugger process. Through this socket the child sends two commands: "Reserve" and "Attach". Once the second command is done, the child is free to execve the actual command.

Non-Persistent State Invocation

In the non-persistent case, every invocation of frng must result in a new daemon running. The communication socket with the daemon is opened using socketpair. Once the child performs execve, the only new processes to attach to the debugger are descendants of the original child process.

In this mode of operation, as soon as the last such descendant exits, the daemon also exits.
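
The control channel for this case might look roughly like the sketch below. The one-byte command codes, the Command enum and the helper names are hypothetical, since this page does not describe the wire format. Using SOCK_SEQPACKET for the socketpair is also an assumption, borrowed from the persistent case.

```cpp
#include <cstdint>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical command codes; the real encoding is not documented here.
enum class Command : std::uint8_t { Reserve = 1, Attach = 2 };

// Create a connected pair of sockets: one end for the child, one for the daemon.
bool make_control_pair(int fds[2]) {
    return socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) == 0;
}

// Send a single command as one packet.
bool send_command(int fd, Command cmd) {
    std::uint8_t byte = static_cast<std::uint8_t>(cmd);
    return send(fd, &byte, sizeof byte, 0) == static_cast<ssize_t>(sizeof byte);
}

// Receive a single command. Returns false on error or if the peer hung up.
bool recv_command(int fd, Command &cmd) {
    std::uint8_t byte = 0;
    if (recv(fd, &byte, sizeof byte, 0) != static_cast<ssize_t>(sizeof byte))
        return false;
    cmd = static_cast<Command>(byte);
    return true;
}
```

Since the pair is created before the fork and has no name in the file system, no unrelated process can connect to it, which matches the statement that only descendants of the original child attach to this daemon.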

Persistent State Invocation

If frng is invoked with the -p option, it is committed to providing a consistent view of the file system to all processes that asked to use the same state file. This is true whether these processes are descendants of the original child process or are a result of an unrelated invocation of frng that gave the same state file.

This means that, on startup, frng needs to decide whether to create a new daemon, or whether to try to attach to an existing daemon. At any time a child process exists, exactly one daemon must be running to handle it.

There is a potential race with this strategy. If one instance of the daemon is in the process of exiting while another is just starting up, the new instance might try to connect to the old instance's socket after the old instance is no longer listening on it. This bug was found and diagnosed by Russell Yanofsky.

To prevent this race, the following sequence is employed:

  1. Open the state file (creating it if not already existing), and try to lock it using flock.
  2. If the lock succeeds, no other daemon is running (and none can start, as we now hold the lock). Create the daemon, which will:
    1. Create a SOCK_SEQPACKET Unix domain socket and start listening on it. The socket is bound to a file with the same name as the state file, with the extension .sock.
  3. Whether or not we tried to create a daemon, close the state file and try to connect to the state.sock socket and send it the "Reserve" command.
  4. If the connection was refused, or was hung up without a reply to the "Reserve", then the daemon has exited. Retry everything up to this point.
  5. If the Reserve succeeded, then the existing daemon is guaranteed to not exit until we do. Send it an Attach and continue to the fork.
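
The building blocks of the steps above can be sketched as follows. This is an illustrative sketch, not the daemonCtrl implementation: the function names are hypothetical, and removing a stale socket file before binding is an assumption about how a new daemon would cope with leftovers from a previous one.

```cpp
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Step 1: open the state file (creating it if needed) and try to lock it
// without blocking. Returns the locked descriptor, or -1 if the lock is held
// elsewhere or the file could not be opened.
int try_lock_state_file(const std::string &path) {
    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Fill a Unix domain socket address; fails if the path is too long.
static bool fill_addr(const std::string &path, sockaddr_un &addr) {
    if (path.size() >= sizeof addr.sun_path)
        return false;
    std::memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);
    return true;
}

// Step 2.1: daemon side. Bind a SOCK_SEQPACKET socket and listen on it.
int listen_on(const std::string &sock_path) {
    sockaddr_un addr;
    if (!fill_addr(sock_path, addr))
        return -1;
    int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;
    unlink(sock_path.c_str()); // assumption: discard a stale socket file
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Step 3: client side. Returns -1 if no daemon is listening, in which case
// the caller goes back to step 1.
int connect_to(const std::string &sock_path) {
    sockaddr_un addr;
    if (!fill_addr(sock_path, addr))
        return -1;
    int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would loop over try_lock_state_file, the optional daemon launch and connect_to until a "Reserve" sent over the returned descriptor is answered.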

This sequence serves two purposes. The first is to eliminate the race mentioned above. If the current daemon is in the process of shutting down, it will not handle our Reserve request. Once it closes its socket, we'll get a hangup and retry the entire process. At this point we will likely get the lock and launch our own daemon. This is done without busy waiting.

The second purpose this serves is that it allows us to know the debugger's PID before it tries to attach to us. This is important on Linux systems with the Yama security module enabled. On those systems, the child must run a special prctl command to tell the system which PID is allowed to attach to it. This sequence allows frng to work on Yama ptrace_scope of 1 without any manual work by the user. Since that is the default on at least some versions of Ubuntu, this is rather important.
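
The prctl call in question is PR_SET_PTRACER. A minimal sketch of how the child might use it, assuming it has already learned the debugger's PID over the control socket (allow_tracer is a hypothetical name):

```cpp
#include <cerrno>
#include <sys/prctl.h>
#include <sys/types.h>

// Tell Yama that `debugger` may ptrace-attach to the calling process.
// Returns 0 on success, or the errno value on failure. Kernels without Yama
// reject the request with EINVAL, in which case no permission is needed.
int allow_tracer(pid_t debugger) {
    if (prctl(PR_SET_PTRACER, static_cast<unsigned long>(debugger), 0, 0, 0) == 0)
        return 0;
    return errno;
}
```

The call has to happen in the debuggee before the debugger attempts to attach, which is why the PID must be known at that point.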