The ptrace() system call, available on many *nix systems, lets one process examine the data of another running process and control its execution. This includes reading and writing register values and arbitrary memory, and intercepting signals. The tracing process (from here on, the parent) can establish this relationship with the process being examined (the child) either by fork()ing it as a literal child process, or by attaching to an already-running process and temporarily assuming parental responsibilities for it. The most common applications of this system call are debuggers and process tracing tools. I’m going to focus on the GNU/Linux version of ptrace.
There isn’t a ton of online documentation about how to use ptrace, probably because it’s one of the more infamous and disliked system calls on Unix-like systems (it isn’t actually part of POSIX, and its behavior varies between platforms). If you’ve never had to use it before, you’ll find it both an educational and frustrating experience. The manpage isn’t bad, but lacks a lot of context.
I’m going to try and avoid using the word process from here on in, and use task instead. This is because ptrace can be applied to individual threads within a process (at least, under Linux). Due to the complicated ambiguity between threads and processes on Linux, I’ll try to limit confusion and just refer to tasks instead.
Let’s take a look at the function definition from the manpage:
long ptrace (enum __ptrace_request request, pid_t pid,
             void *addr, void *data);
Taking a look at the parameters to the function:
- enum __ptrace_request request: A code provided to ptrace telling it which operation to perform. More on this later.
- pid_t pid: The ID of the task to perform the operation on.
- void *addr: The address in the child’s memory to read from or write to for certain ptrace operations; other operations ignore this parameter.
- void *data: Depending on the request, either a value (such as a word to write or a signal number) or the address of a data structure in the parent to be filled in from, or written to, the task. More on this later as well.
ptrace returns a long integer. For every request except the peeks, that is 0 on success and -1 on error. The peek requests instead return the data read from the child, and -1 on error… sometimes. I’ll cover that too.
As you probably noticed from all the caveats above, this is not a straightforward or simple system call to use. There are a lot of special cases for input and output to consider depending on what you happen to be using it for at any given time. I’ll flesh out some of these further below. But first, let’s talk about the behavior of a child task under ptrace.
A child being traced has two basic states: stopped and running. ptrace operations cannot be performed on a running child, so they must be done either when:
- The child stops on its own
- The parent stops the child manually
Typically a process will stop (I’m talking about a ‘T’ status here) when it receives a SIGSTOP signal. However, when being traced, a child will stop upon receiving any signal, with the exception of SIGKILL. This is true for signals that the child is explicitly ignoring as well. After receiving notification that the child has stopped via wait(), the parent can take the time to perform various ptrace operations, or can tell the child to continue executing through ptrace, either delivering or suppressing the signal which caused the stoppage.
If the parent would like the child to stop (for example, after user input in a debugger), it can simply send it a SIGSTOP through the usual methods, such as kill(). Again, technically any otherwise unused signal besides SIGKILL would do the job, but it’s best to avoid ambiguity. It is important to ensure that the child task is stopped before doing anything to it; otherwise ptrace will fail with ESRCH: “no such process”.
Let me itemize the states involved in stopping, ptrace()-ing, and running a child process in this scenario:
1. Child process is running
2. Child process stops after receiving signal (SIGSTOP/SIGTRAP/other)
3. Parent receives child signal status from wait()
4. Parent performs various ptrace operations
5. Parent signals child to continue executing
Any ptrace operations performed outside of step 4 will fail. Make sure that you have appropriately been notified that the child is stopped before trying to use ptrace. I mentioned above using the wait() call to retrieve process status of a traced child process. This is correct – as with a conventionally fork()ed process, the tracing parent uses wait() to determine task state after receiving a signal. In fact, it might be easier to use waitpid() so that you can specify the exact task you’re waiting for, in case you’re tracing multiple tasks/threads simultaneously.
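To make that cycle concrete, here is a minimal sketch of one pass through the states listed above. The helper name trace_one_stop is made up for illustration, and the exit code 42 is an arbitrary marker; the only assumption is a Linux system where a process may trace its own child.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs one stop/inspect/continue cycle on a forked
 * child. Returns the child's exit code, or -1 on any failure. */
int trace_one_stop(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);   /* the child stops after receiving a signal */
        _exit(42);        /* reached only once the parent resumes us */
    }

    int status;

    /* Parent: block until wait reports the child's stop. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* The child is stopped, so this is the window in which other
     * ptrace requests (peeks, pokes, register access) are allowed. */

    /* Resume the child; a NULL data argument suppresses the SIGSTOP. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        return -1;

    /* The child now runs to completion. */
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling trace_one_stop() from a main() should give back 42 when tracing is permitted. Using waitpid() rather than wait() keeps the sketch correct even if the caller has other children.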
Alright, let’s talk about some of the more interesting ptrace codes. I’ll provide a short code sample of a call for each respective request; any NULL argument is an argument unused by ptrace for that request. Firstly, the codes that deal with initiating and terminating the tracing of the child task.
PTRACE_TRACEME
long ret = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
This is the only ptrace operation which is used by the child. Its purpose is to indicate that the child task is to be traced by its parent and to grant the parent the necessary ptrace permissions. The pid, addr, and data arguments are ignored for this request. As soon as the child makes a call to any of the exec() functions, it receives a SIGTRAP, at which point it is stopped until the tracing parent allows it to continue. It is important for the parent to wait for this event before performing any ptrace operations, including the configuration done with PTRACE_SETOPTIONS.
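As a rough sketch of that handshake, the following helper forks, has the child call PTRACE_TRACEME and exec a program, and returns only once the parent has seen the post-exec SIGTRAP. The name spawn_traced is an illustrative invention, not part of any API.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] under trace. Returns the child's pid
 * once it is stopped at the SIGTRAP that follows exec, or -1 on failure. */
pid_t spawn_traced(char *const argv[])
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);   /* on success the child gets a SIGTRAP */
        _exit(127);              /* only reached if exec itself failed */
    }

    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;

    /* The expected case: stopped by SIGTRAP, ready for ptrace requests. */
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
        return child;

    /* Anything else is unexpected; make sure no stopped child lingers. */
    if (!WIFEXITED(status) && !WIFSIGNALED(status)) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
    }
    return -1;
}
```

A caller would typically follow a successful spawn_traced() with any PTRACE_SETOPTIONS setup and then a PTRACE_CONT to let the new program start running.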
PTRACE_ATTACH
long ret = ptrace (PTRACE_ATTACH, target_pid, NULL, NULL);
This is used by a task when it wishes to trace the execution of another, already-running task. For most purposes, this makes the task identified by target_pid behave as a child of the tracing task: the tracer receives its wait() notifications, for example, although getppid() in the child still returns the PID of its original parent. By and large, the situation created by using PTRACE_ATTACH is equivalent to what would’ve happened if the child had used PTRACE_TRACEME instead.
An important note is that this operation involves sending a SIGSTOP to the targeted process, and as usual, the parent needs to perform a wait() on target_pid after this call before continuing with any other work to ensure the child has properly stopped.
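Here is a small sketch of the attach-then-wait sequence. Both function names are hypothetical; attach_demo exists only to give the helper something to attach to, and it assumes the system’s ptrace restrictions allow a process to attach to its own child.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attaches to target_pid and waits for the stop that
 * the attach triggers. Returns 0 once the target is stopped, -1 on error. */
int attach_and_wait(pid_t target_pid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, target_pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(target_pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    return 0;
}

/* Demonstration: attach to an idle child, then let go of it again. */
int attach_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();             /* idle until killed */
    }

    int result = attach_and_wait(child);

    /* Detach if we got as far as attaching; NULL data delivers no signal. */
    if (result == 0 && ptrace(PTRACE_DETACH, child, NULL, NULL) == -1)
        result = -1;

    /* Clean up the idle child either way. */
    int status;
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return result;
}
```

attach_demo() should return 0 when attaching is permitted. Between attach_and_wait() and the detach is where a real tool would do its inspection.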
PTRACE_CONT
long ret = ptrace (PTRACE_CONT, target_pid, NULL, 0);
This is the request you’ll use to get the child running again each time wait() indicates that it has stopped after receiving a signal. If the data field is anything besides zero or SIGSTOP, ptrace treats it as the number of a signal to deliver to the child as it resumes. This lets the parent pass along the signal that caused the stop in the first place, after it has had a chance to inspect the child. For signals like SIGTRAP that exist for the tracer’s benefit, you probably won’t want to do this. However, if you’d like to see if the child properly handles a SIGUSR1, this would be one way to go about it.
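The following sketch shows one way to decide what to pass in the data field. The policy of swallowing SIGTRAP and SIGSTOP and forwarding everything else is a simplifying assumption, and all names (signal_to_deliver, resume_child, forward_demo) are hypothetical.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Picks the signal to hand back to the child for a given wait status:
 * 0 (nothing) for tracer-oriented stops, otherwise the stop signal. */
int signal_to_deliver(int status)
{
    if (!WIFSTOPPED(status))
        return 0;
    int sig = WSTOPSIG(status);
    return (sig == SIGTRAP || sig == SIGSTOP) ? 0 : sig;
}

/* Resumes a stopped child, forwarding the signal chosen above. */
long resume_child(pid_t pid, int status)
{
    int sig = signal_to_deliver(status);
    return ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
}

static void on_usr1(int sig)
{
    (void)sig;
    _exit(7);   /* proves the handler ran */
}

/* Demonstration: the child raises SIGUSR1, the parent intercepts it and
 * forwards it. Returns the child's exit code, or -1 on failure. */
int forward_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        signal(SIGUSR1, on_usr1);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGUSR1);   /* stops here first; handler runs if forwarded */
        _exit(0);         /* reached only if the signal was suppressed */
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    if (resume_child(child, status) == -1)
        return -1;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this sketch forward_demo() should return 7, since the SIGUSR1 is handed back to the child and its handler exits with that code. Passing NULL as the data argument instead would let the child carry on as if the signal never happened and exit with 0.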
PTRACE_DETACH
long ret = ptrace (PTRACE_DETACH, child_pid, NULL, 0);
Ends the tracing relationship between the parent and child and, if the parent had attached to the child, “re-parents” the child back to its original parent process. The child is then restarted as it would be by PTRACE_CONT, with the data argument again naming a signal to deliver (0 for none).
Now that we’ve covered the basics of how to get a trace running, let’s get to some of the more interesting stuff.
PTRACE_PEEKTEXT | PTRACE_PEEKDATA
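errno = 0;   /* clear errno so a stored -1 can be told apart from an error */
long word = ptrace (PTRACE_PEEKDATA, child_pid, addr, NULL);
if (word == -1 && errno != 0)
    perror ("PTRACE_PEEKDATA");   /* a real failure, not a -1 in memory */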
On GNU/Linux, text and data share a single address space, so these two requests are interchangeable; on some other UNIX platforms they are not. The purpose of this request is to read a word from the child task’s address space so its value can be inspected. I mentioned above that peek operations need a little extra effort to detect errors, which is outlined in the code snippet above. Although ptrace returns -1 for an error on a peek operation, -1 may also be the value stored at the requested address. Thus, errno must be cleared before the call and checked afterwards to tell whether an error actually happened.
The utility of this request is obvious – reading values from memory addresses in another task’s address space. If you consider GDB, printing variables or setting breakpoints would all need to use this request.
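To show the -1 ambiguity in action, here is a sketch that deliberately peeks at a word whose value really is -1. The names peek_word, peek_demo, and marker are made up for the example. It relies on the fact that a fork()ed child has its own copy of a global at the same address as the parent.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static long marker = -1;   /* a perfectly legitimate -1 stored in memory */

/* Hypothetical helper: reads one word at addr in the stopped task pid.
 * Returns 0 and stores the word in *out, or -1 with errno set. */
int peek_word(pid_t pid, void *addr, long *out)
{
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
    if (word == -1 && errno != 0)
        return -1;         /* genuine error */
    *out = word;           /* may well be -1, and that is fine */
    return 0;
}

/* Demonstration: returns 0 if the child's copy of marker was read back
 * correctly as -1, and -1 otherwise. */
int peek_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);    /* stop so the parent can look around */
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    long word = 0;
    int ok = (peek_word(child, &marker, &word) == 0 && word == -1) ? 0 : -1;

    /* Done inspecting; dispose of the stopped child. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return ok;
}
```

A naive check of only the return value would report this read as a failure, even though it succeeded.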
PTRACE_POKETEXT | PTRACE_POKEDATA
long ret = ptrace (PTRACE_POKEDATA, child_pid, addr, new_val);
The poke requests do the opposite of the peeks: they write a word of your choosing into the memory space of the child task. This is useful if you’d like to examine how the child’s behavior changes given different values, or for debugging tasks such as inserting breakpoints. This is turning into a pretty long post, but I can cover how to insert breakpoints into a child task’s address space in a later blog post.
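As a small illustration of changing a child’s behavior with a poke, the sketch below overwrites a global in the stopped child, which the child then uses as its exit code. The names poke_word, poke_demo, and answer are hypothetical, as are the values 1 and 42.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* volatile so the child really re-reads the value from memory */
static volatile long answer = 1;

/* Hypothetical helper: writes one word into the stopped task pid.
 * For poke requests the data argument carries the value itself. */
int poke_word(pid_t pid, void *addr, long value)
{
    if (ptrace(PTRACE_POKEDATA, pid, addr, (void *)value) == -1)
        return -1;
    return 0;
}

/* Demonstration: returns the child's exit code, or -1 on failure. */
int poke_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);        /* parent rewrites answer while we wait */
        _exit((int)answer);    /* exits with whatever is in memory now */
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* The child's copy of answer lives at the same address as ours. */
    if (poke_word(child, (void *)&answer, 42) == -1) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        return -1;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Only the child’s memory is modified here; the parent’s own copy of answer keeps its original value.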
PTRACE_SINGLESTEP
long ret = ptrace (PTRACE_SINGLESTEP, child_pid, NULL, NULL);
The single-step request is effectively several operations batched into one. A PTRACE_SINGLESTEP request resumes the child just long enough to execute a single instruction, then stops it again and notifies the parent with a SIGTRAP. How the kernel arranges for exactly one instruction to run is architecture-specific; x86, for instance, uses the CPU’s trap flag rather than a temporary breakpoint. This can be used to slowly step through the execution of a program, and assist with the usage of the other ptrace operations above. Think “stepi” from GDB.
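Here is a sketch of a stepping loop, assuming an architecture on which Linux supports PTRACE_SINGLESTEP (x86 is one). The names step_n and step_demo are invented for the example, and stopping the loop at the first non-SIGTRAP stop is a simplification; a real debugger would forward other signals.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: single-steps the stopped task pid up to n times.
 * Returns how many steps completed with the expected SIGTRAP stop. */
long step_n(pid_t pid, long n)
{
    long done = 0;

    while (done < n) {
        int status;

        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
            break;
        if (waitpid(pid, &status, 0) == -1)
            break;
        /* Stop counting if the child exited or stopped for another reason. */
        if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
            break;
        done++;
    }
    return done;
}

/* Demonstration: steps a spinning child ten times, then discards it.
 * Returns the number of completed steps, or -1 on setup failure. */
long step_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        volatile unsigned long spin = 0;

        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        for (;;)
            spin++;        /* plain user-space instructions to step over */
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    long steps = step_n(child, 10);

    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return steps;
}
```

Each iteration is the same stop, wait, resume cycle as with PTRACE_CONT, just with a much shorter run in between. Between steps the parent is free to peek at memory or registers to watch the program’s state evolve.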
PTRACE_GETREGS | PTRACE_SETREGS
struct user_regs_struct regs;   /* declared in <sys/user.h> */
long ret = ptrace (PTRACE_GETREGS, child_pid, NULL, &regs);
#if defined __x86_64__
regs.rip = 0xdeadbeef;
#elif defined __i386__
regs.eip = 0xdeadbeef;
#endif
ret = ptrace (PTRACE_SETREGS, child_pid, NULL, &regs);
These ptrace requests involve reading and writing the general-purpose register values for the child process. The above example does three things:
- Reads the values of all general-purpose registers associated with child_pid
- Sets the instruction pointer of the user_regs_struct structure to a not-so-random address
- Writes the edited user_regs_struct back to the child, likely causing a crash once the child resumes due to the new instruction pointer setting
Similar functionality is available for the designated floating-point registers as well through the use of PTRACE_GETFPREGS and PTRACE_SETFPREGS.