Exploring System Internals with lsof and strace

This article describes various approaches to looking at the processes in you UNIX environment. Unlike other operating systems, most of the UNIX systems allow you to get a lot of information about the state of the system, processes, I/O, networking etc.

Why and when do you need it? There may be endless list of situations that may require you to look closely at your system. For example:

You face strange behavior from one of the processes, it consumes too much CPU etc
The program that worked before now fails with "Segmentation fault" or another error message
You have installed a new package and the documentation does not clearly explains the order or loading of the configuration files
There is a process in the system that uses a TCP port for incoming connections but you do not know which one
Anything else that sounds like "what is it doing", "who is using this resource" etc…

### The Basics

This section covers briefly the terminology that you need to familiarize yourself with in order to understand the results of the operations described later in the article. Do not worry, it is not more difficult when answering to Windows Vista UAC question ;)

System Call (a.k.a. "syscall")

System Call is a mechanism for the application to make a call to the operating system. The OS usually has a limited number of the "entry points" that can be used for the basic and fundamental tasks like file I/O, memory allocation, networking etc. For example, most of the UNIX systems have only 1-3 different system calls to open a file (although various libraries offer dozens of ways to deal with a file, they all finally make the same system call). The important thing about the system call that you need to understand is that there are few of them and they are the only way for any program to interact with the OS.

Process ID (PID)

Any process in the UNIX OSes is defined by its unique numeric PID, even if the process runs several threads. From the moment you start a program till the moment it is finished it maintains the same PID. However, some processes can span other processes (usually by using one of the fork() syscalls).

File Handle

Any I/O operations in UNIX is associated with a file handle which is a positive integer number. It applies to any file on the disk, a network socket etc. A typical operation with a file on the disk, for example, is one of the following:

open("filename") -> returns a handle that from this moment will be associated with this open file
read(handle, where, how_much)
write(handle, from_where, how_much)
close(handle) -> removes the association established with open()

Handles are unique only for an individual process. There are well-known handles available by default for any application, such as 0 for the standard input, 1 for standard output and 2 for the standard error stream.

### Turn out your pockets, process!

We will start from a tool that can show us a snapshot of what is happening right now with our system. There is an excellent tool that is available for many UNIX systems called "lsof" ("LiSt Open Files"). You can find more information about this tool here. Most of the Linux distributions and Mac OS X have this tool included.

Here is what this tool can do:

Show the list of files and other resources open by a selected process or by all processes in the system
Show the list of processes using a particular resource (like a TCP socket, for example)

You can do various types of sorting, filtering and formatting for getting exactly the results you want. The program has about 3 dozens of command line options, we will try a couple of typical and most representative examples now. Note: most of these examples require root access to the system (or you will be limited to seeing only the information that is related to your processes).

How to see the list of files opened by a running process

For example, we would like to see which resources are currently used by Apache HTTP server. First, we need to identify the process we are interested in:

[root@localhost]$ ps ax | grep httpd | grep -v grep

 2383 ?        Ss     0:00 /usr/sbin/httpd
 2408 ?        S      0:00 /usr/sbin/httpd
 2409 ?        S      0:00 /usr/sbin/httpd
 2410 ?        S      0:00 /usr/sbin/httpd
 2411 ?        S      0:00 /usr/sbin/httpd
 2412 ?        S      0:00 /usr/sbin/httpd
 2414 ?        S      0:00 /usr/sbin/httpd
 2415 ?        S      0:00 /usr/sbin/httpd
 2416 ?        S      0:00 /usr/sbin/httpd

Hmm…which one I am interested in? Unfortunately, there is no universal recipe to answer this question. However…lets look at the problem from another perspective. At some point I have mentioned that when you start a program, the OS creates a process. And a process can spawn other processes. So, we can assume that there may be a tree of sub-processed related to one parent. Lets see how we can get this information (this command will work on both Linux and OS X):

[root@localhost]$ ps axo pid,ppid,command | egrep '(httpd)|(PID)' |
grep -v grep

 2383     1 /usr/sbin/httpd
 2408  2383 /usr/sbin/httpd
 2409  2383 /usr/sbin/httpd
 2410  2383 /usr/sbin/httpd
 2411  2383 /usr/sbin/httpd
 2412  2383 /usr/sbin/httpd
 2414  2383 /usr/sbin/httpd
 2415  2383 /usr/sbin/httpd
 2416  2383 /usr/sbin/httpd

PPID is simply Parent PID, i.e. Parent Process ID. As you can see, most of these processes have the same PPID (2383) and there is one process with the PID of 2383. This must be the parent process. If you are curious to find out who is its parent, I will let you to do this exercise yourself.

Linux has another fancy tool called "pstree". We could use it to get the same kind of information:

[root@localhost]$ pstree -p | grep httpd

        |-httpd(2383)-+-httpd(2408)
        |             |-httpd(2409)
        |             |-httpd(2410)
        |             |-httpd(2411)
        |             |-httpd(2412)
        |             |-httpd(2414)
        |             |-httpd(2415)
        |             `-httpd(2416)

Looks better and quite self-explaining, isn't it? OK, back to the original question. Now we know that there are certain relationships with these "httpd"s. So, our target will be the process 2383.

In its simples form, "lsof" can be called like this:

[root@localhost]$ lsof -p 2383

COMMAND  PID USER   FD   TYPE DEVICE    SIZE    NODE NAME
httpd   2383 root  cwd    DIR    3,6    1024       2 /
httpd   2383 root  rtd    DIR    3,6    1024       2 /
httpd   2383 root  txt    REG    3,8  328136  261136 /usr/sbin/httpd
httpd   2383 root  mem    REG    3,6  130448    8167 /lib64/ld-2.5.so
httpd   2383 root  DEL    REG    0,8            8072 /dev/zero
httpd   2383 root  mem    REG    3,6  615136   79579 /lib64/libm-2.5.so
httpd   2383 root  mem    REG    3,6  117656    8186 /lib64/libpcre.so.0.0.1
httpd   2383 root  mem    REG    3,6   95480    8213 /lib64/libselinux.so.1
...
httpd   2383 root  mem    REG    3,8   46160 1704830 /usr/lib64/
php/modules/ldap.so
httpd   2383 root  DEL    REG    0,8            8120 /dev/zero
httpd   2383 root  DEL    REG    0,8            8122 /dev/zero
httpd   2383 root    0r   CHR    1,3            1488 /dev/null
httpd   2383 root    1w   CHR    1,3            1488 /dev/null
httpd   2383 root    2w   REG   3,12    1077    4210 /var/log/
httpd/error_log
httpd   2383 root    3r   CHR    1,9            1922 /dev/urandom
httpd   2383 root    4u  IPv6   8052             TCP *:http (LISTEN)
httpd   2383 root    5u  sock    0,5            8053 can't identify protocol
httpd   2383 root    6u  IPv6   8057             TCP *:https (LISTEN)
httpd   2383 root    7u  sock    0,5            8058 can't identify protocol
httpd   2383 root    8r  FIFO    0,6            8069 pipe
httpd   2383 root    9w  FIFO    0,6            8069 pipe
httpd   2383 root   10w   REG   3,12    1077    4210 /var/log/
httpd/error_log
httpd   2383 root   11w   REG   3,12     711    4215 /var/log/
httpd/ssl_error_log
httpd   2383 root   12w   REG   3,12       0    4138 /var/log/
httpd/access_log
httpd   2383 root   13w   REG   3,12       0    4151 /var/log/
httpd/ssl_access_log
httpd   2383 root   14w   REG   3,12       0    4152 /var/log/
httpd/ssl_request_log

Looks quite interesting, doesn't it? Now I am going to briefly explain the meaning of each kind of entry (you can get more information from the manual page for "lsof").

COMMAND

Contains first 9 characters of the UNIX command name associated with the process

PID

The process ID. In our example it is simply the PID we have asked the information for

USER

The UNIX user owning the process

FD

File Descriptor number. Actually, it is more than the descriptor I have mentioned before. There are special descriptors identifying various resources in the UNIX system. For more details I recommend to check the man page. As you can see, everything is there: the current working directory (cwd), the process image itself (txt), memory-mapped libraries etc. The most interesting for us are the open files and sockets: you can see the numbers 0..14 followed by a suffix ("w" for writing only, "r" for reading only, "u" for both reading and writing and others). For example, you can see that Apache httpd redirects all its error output (2w) to /var/log/https/error_log. It also has a bunch of other regular files open for writing (handles 10-14, various log files). You can also see that it is listening on two TCP sockets (the port numbers shown are resolved through /etc/services file, you can always use "-n" option to disable any name resolution - useful if you, for example, know the port number and want to look for it specifically). You can see two more sockets (do not be afraid of the message "can't identify...", it is just application-specific protocol), Apache is using them to talk to its sub-processes.

TYPE

Type of the node. There is several dozens of various node types, like "REG" for the regular files, "IPv4" (or "IPv6 for IP version 6) for the Internet sockets, "CHR" for the character devices, "DIR" for the directories etc.

DEVICE

Shows the UNIX device numbers (major, minor) associated with the resource. Another interesting parameter. For example, on my system the numbers 3,12 correspond to:

[root@localhost]$ ls -l /dev/* | egrep '3,\s*12'

brw-r----- 1 root disk     3,  12 Jul 17 19:17 /dev/hda12

[root@localhost]$ mount | grep hda12

/dev/hda12 on /var type ext3 (rw)

This is true, the files are locates on one of the partitions on the first IDE disk mounted at /var.

SIZE

The size of the file or the current offset (in bytes). This number does not make sense for some of the resource types so you see blank values there.

NODE

Node number of the file, protocol identifier (for Internet sockets) etc.

Name

The name of the file or resource, if applicable. The value is quite self-explaining and probably one of the most useful.

Try using this command for other processes in your system. Probably one of the most interesting ones to look at will be your X11 server.

There is a number of filters you can use to see only the information you are interested in. For example, I want to know does my iTunes talk to someone over the network or not. Here is a command I would use something like:

[root@localhost]$ lsof -p 1559 -a -i

Which produces empty answer (exactly what I was hoping for :) ). But if I try my SSH client instead of iTunes:

[root@localhost]$ lsof -p 1586 -a -i

COMMAND  PID       USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
ssh     1586       nik    3u  IPv4 0x40e7394      0t0  TCP macbook:50648
->oberon:ssh (ESTABLISHED)

Summary: you can get a snapshot of any live process using this command. Sometimes it is simpler to use "lsof" (especially on a system you are not familiar with) to find the location of the log file, config files for a service etc.

### Who is using that file?

From time to time you may become very curious about who (or at least which process :) ) is using certain resource(s) on your system. There may be a number of reasons: a server that does not start complaining about the port that is already being used, a suspicious network connection that you see when using "netstat" command, inability to unmount a network disk or a external storage unit etc.

A traditional "fuser" command is not always good enough for these purposes. For example, if I want to see all the processes that have any open files under /etc directory. Here is the command that give me the answer:

[root@localhost]$ lsof +D /etc

COMMAND    PID  USER   FD   TYPE DEVICE SIZE  NODE NAME
dbus-daem 2206  dbus    5r   DIR    3,6 1024 59500 /etc/dbus-1/system.d
acpid     2305  root    6r   REG    3,6  236 63639 /etc/acpi/
events/video.conf
avahi-dae 2541 avahi  cwd    DIR    3,6 1024 63468 /etc/avahi
avahi-dae 2541 avahi  rtd    DIR    3,6 1024 63468 /etc/avahi
prefdm    2662  root  255r   REG    3,6 1465 59322 /etc/X11/prefdm
000-delay 3831  root  255r   REG    3,6  577 59446 /etc/cron.
daily/000-delay.cron

When you use "+D path" option, lsof checks every single file under the specified path and shows the information about the processes using it. This command may take some time to execute if you have a lot of files there.

Another situation: a network connection. "-i" option of lsof allows you to search for a specific network connection. The format is following: lsof -i [4|6][protocol][@hostname|hostaddr][:service|port]

"4" or "6" defines the IP protocol version. If you do not use any filters and just use "-i" you will get the list of all connections in your system. Lets find all the UDP connections that we have on a OS X box:

[root@localhost]$ lsof -i UDP

COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
launchd      1 root   10u  IPv4 0x393ed40      0t0  UDP *:netbios-ns
mDNSRespo   39 root    7u  IPv4 0x393e790      0t0  UDP *:mdns
mDNSRespo   39 root    8u  IPv6 0x393e6c0      0t0  UDP *:mdns
mDNSRespo   39 root    9u  IPv4 0x41556c0      0t0  UDP *:mdns
mDNSRespo   39 root   11u  IPv4 0x4155ee0      0t0  UDP *:mdns
mDNSRespo   39 root   12u  IPv4 0x393dd00      0t0  UDP macbook:49727
netinfod    40 root    6u  IPv4 0x393ee10      0t0  UDP localhost:
netinfo-local
syslogd     41 root   16u  IPv4 0x393ec70      0t0  UDP *:49175
Directory   52 root   10u  IPv4 0x3fbd1a0      0t0  UDP *:*
Directory   52 root   12u  IPv4 0x4155d40      0t0  UDP *:*
ntpd       141 root    5u  IPv4 0x393e520      0t0  UDP *:ntp
ntpd       141 root    6u  IPv6 0x393e110      0t0  UDP *:ntp
ntpd       141 root    7u  IPv6 0x393ead0      0t0  UDP localhost:ntp
ntpd       141 root    8u  IPv6 0x393e860      0t0  UDP [fe80:1::1]:ntp
ntpd       141 root    9u  IPv4 0x393e5f0      0t0  UDP localhost:ntp
ntpd       141 root   10u  IPv6 0x393e450      0t0  UDP [fe80:5::216:
cbff:febf:a08d]:ntp
ntpd       141 root   11u  IPv4 0x393e380      0t0  UDP macbook:ntp
prl_dhcpd  147 root    5u  IPv4 0x393db60      0t0  UDP *:*
automount  168 root    8u  IPv4 0x3fbd000      0t0  UDP localhost:1023
automount  179 root    8u  IPv4 0x393e930      0t0  UDP localhost:1022
nmbd      1078 root    0u  IPv4 0x393ed40      0t0  UDP *:netbios-ns
nmbd      1078 root    1u  IPv4 0x393ed40      0t0  UDP *:netbios-ns
nmbd      1078 root    8u  IPv4 0x393ea00      0t0  UDP *:netbios-dgm
nmbd      1078 root    9u  IPv4 0x3fbd4e0      0t0  UDP 10.37.129.2:
netbios-ns
nmbd      1078 root   10u  IPv4 0x3fbeba0      0t0  UDP 10.37.129.2:
netbios-dgm
nmbd      1078 root   11u  IPv4 0x4155380      0t0  UDP macbook:
netbios-ns
nmbd      1078 root   12u  IPv4 0x4155450      0t0  UDP macbook:
netbios-dgm
cupsd     1087 root    6u  IPv4 0x3fbd340      0t0  UDP *:ipp

Do not be surprised if you do not see anything on your machine - you will not see the information about the processes that do not belong to you unless you are root.

Summary: using lsof with various filters allows you to search through your system and find out the detailed information about any local or network resource used.

### Watch your steps

We have learned how to get a snapshot or your UNIX system (or any individual process living in it). But sometimes it is not enough to look at the static information. Dynamic runtime information may answer a lot of other questions - why does the program crash, why does it say "cannot open resource" without providing the resource location, is it accepting the network connection or not etc.

In the fisrt section it was mentioned that the way the UNIX program talks to the operating system (i.e. deals with almost any resource on your computer) is called "system calls". Most of the UNIX kernels, including the Linux and OS X provide a way for the user to see the list of the system calls a process makes in real time, including the parameters the process passes to the kernel. There are different tools on different systems that provide this functionality. On Linux it is called "strace", on BSD (including OS X) there are two toolss working together - "ktrace" and "kdump". We will briefly cover both of them starting from the Linux one.

The really great thing about these tools is that they use the kernel tracing facility in order to work. Thus, they work with ANY program that is running in your system, it does not have to be something you have developed and you do not need the source code of the program in order to understand (in general, of course) what it is doing.

It is important to mention again that just like in case of "lsof" you cannot see anything that you are not supposed to see, i.e. you cannot see what process is doing if it is not yours (or unless you are root user).

The simplest way to run the "strace" would be as: strace command Lets see what a simple well-known like "w" (the command that shows the users logged in and their current activities) does:

[root@localhost]$ strace /usr/bin/w 2>&1 | less

execve("/usr/bin/w", ["/usr/bin/w"], [/* 30 vars */]) = 0
brk(0)                                  = 0x603000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaaaab000
uname({sys="Linux", node="oberon.anglab.com", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such
file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=103874, ...}) = 0
mmap(NULL, 103874, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaaac000
close(3)                                = 0
open("/lib64/libproc-3.2.7.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\
	240;`\314"..., 832) = 832
...
open("/proc/uptime", O_RDONLY)          = 3
lseek(3, 0, SEEK_SET)                   = 0
read(3, "7267.86 7230.07\n", 1023)      = 16
access("/var/run/utmpx", F_OK)          = -1 ENOENT (No such file
or directory)
open("/var/run/utmp", O_RDWR)           = -1 EACCES (Permission denied)
open("/var/run/utmp", O_RDONLY)         = 4
fcntl(4, F_GETFD)                       = 0
...
open("/proc", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 35 entries */, 1024)     = 1000
getdents(3, /* 38 entries */, 1024)     = 1016
stat("/proc/1", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1/stat", O_RDONLY)          = 4
read(4, "1 (init) S 0 1 1 0 -1 4194560 64"..., 1023) = 216
close(4)                                = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 4
...
fcntl(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
alarm(0)                                = 1
rt_sigaction(SIGALRM, {SIG_DFL}, NULL, 8) = 0
close(5)                                = 0
write(1, " 21:31:48 up  2:01,  2 users,  l"..., 281 21:31:48 up  2:01,
2 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
nick     pts/0    192.168.8.18     21:12    9.00s  0.03s  0.03s -bash
nick     pts/1    192.168.8.18     21:19    0.00s  0.13s  0.04s strace
/usr/bin
) = 281
exit_group(0)                           = ?
Process 3947 detached

Only a small part of the long output is shown here. As you can see, everything starts with the "execve" syscall. Which is correct - this is the call that executes a program in UNIX. By the way, for each of these calls you can get more information from the manual pages - just type "man execve" to learn more about the call and its parameters (as you can see the parameters are shown in the trace).And, as it is supposed to be, the execution ends with "exit_group" call with the status 0 (success).

You can see how the program being started loads the shared libraries it depends on. You can see that at some point it tries to open "var/run/utmpx" file but it does not exist (ENOENT). Then it tries to open "/var/run/utmp" file but it is not readable for this program (i.e. for this user) so it gets back (EACCES). Finally this program decides that the only way to find out who is logged in is to go through all the active processes and check where their standard input and output are attached to. In order to do it, "w" readsthe list of files and directories under "/proc" (open - getdents - close). By the way, the number returned by the open() call is the file handle. By using it, you can follow the operations with this file since all read() or write() system calls will use the handle. After getting the list of files, "w" goes over all process directories and reads the virtual files from there to get more information about the processes. And fnally it displays the results by writing them to the file descriptor 1 (standard output, stdout). Very simple and informative.

There is a number of parameters that you can use with "strace" to filter the results and get exactly the information you are looking for. For example, if you are interested only in the network operations, lets see what does "ping" command do (run it as root user):

[root@localhost]$ strace -x -e trace=network,desc ping -c 1 127.0.0.1

...
socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=
	inet_addr("127.0.0.1")}, 16) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(1028), sin_addr=
	inet_addr("127.0.0.1")}, [16]) = 0
close(4)                                = 0
setsockopt(3, SOL_RAW, ICMP_FILTER, ~(ICMP_ECHOREPLY|ICMP_DEST_UNREACH|
	ICMP_SOURCE_QUENCH|ICMP_REDIRECT|ICMP_TIME_EXCEEDED|
	ICMP_PARAMETERPROB),
4) = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [27472947287556096], [4]) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO, "\x01\x00\x00\x00\x00\x00
	\x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\x01\x00\x00\x00\x00\x00
	\x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff418a7c90) = -1
	EINVAL (Invalid argument)
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x08\x00
	\xc7\x8b\xd6\x10\x00\x01\x86\x16\xa0\x46\x00\x00"..., 64}],
	msg_controllen=0, msg_flags=0}, 0) = 64
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x45
	\x00\x00\x54\xed\xd5\x00\x00\x40\x01\x8e\xd1\x7f\x00"..., 192}],
	msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET,
	cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 106PING
	127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.321 ms
) = 106
write(1, "\n", 1
)                       = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 144---
	127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.321/0.321/0.321/0.000 ms
) = 144
Process 4310 detached

"-e" option allows to specify what kind(s) of system calls you are interested in.

As you can see, this program opes 2 network sockets (with the descriptors 3 and 4). One of them is used only to do getsockname() call (do "man getsockname" to get more information on this call) and then it gets closed. The one associated with the descriptor 3 is used to send and receive raw IP traffic (this is the way to send the packets using the protocols different from UDP and TCP). It configures the socket using a number of options and then finally sends an IP packet (sendmsg) to the given destination. You can even see part of the message data ("-x" option requests strace to display the hex numbers instead of non-ASCII strings). If you want to see the complete data you may want to use "-s" option to set the maximum string length you want to be displayed. After sending the message it gets back the response (recvmsg). Then it displays the output of the program.

You can use "-o" option with the file name argument to redirect the output to a text file. Usually when analyzing more complicated examples it is easier to get the trace into a file first and then study this file using text editor or "grep" command.

Mac OS X (like all BSD systems) offers "ktrace" command. In order to get similar result when analyzing the "ping" command we would have to do something like this:

[root@localhost]$ ktrace -f ping.trace -t ci ping -c 1 127.0.0.1
[root@localhost]$ kdump -f ping.trace  | less

...
   684 ping     CALL  socket(0x2,0x3,0x1)
   684 ping     RET   socket 3
   684 ping     CALL  getuid
   684 ping     RET   getuid 0
   684 ping     CALL  setuid(0)
   684 ping     RET   setuid 0
   684 ping     CALL  getuid
   684 ping     RET   getuid 0
   684 ping     CALL  getpid
   684 ping     RET   getpid 684/0x2ac
   684 ping     CALL  setsockopt(0x3,0xffff,0x400,0xbffffdc8,0x4)
   684 ping     RET   setsockopt 0
   684 ping     CALL  setsockopt(0x3,0xffff,0x1002,0xbffffdb8,0x4)
   684 ping     RET   setsockopt 0
   684 ping     CALL  setsockopt(0x3,0xffff,0x1001,0xbffffdb8,0x4)
   684 ping     RET   setsockopt 0
   684 ping     CALL  fstat(0x1,0xbffef0c0)
   684 ping     RET   fstat 0
   684 ping     CALL  ioctl(0x1,FIODTYPE,0xbffef08c)
   684 ping     RET   ioctl 0
...
   684 ping     CALL  sendto(0x3,0x5294,0x40,0,0x5260,0x10)
   684 ping     GIO   fd 3 wrote 64 bytes
       "\b\08\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b
        \v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z
        \^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567"
   684 ping     RET   sendto 64/0x40
   684 ping     CALL  select(0x4,0xbffffc3c,0,0,0xbffffd90)
   684 ping     RET   select 1
   684 ping     CALL  recvmsg(0x3,0xbffffd24,0)
   684 ping     GIO   fd 3 wrote 84 bytes
       "E\0@\0|\M^N\0\0@\^A\0\0\^?\0\0\^A\^?\0\0
       \^A\0\0@\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b
        \v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z
        \^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567"
   684 ping     RET   recvmsg 84/0x54
   684 ping     CALL  write(0x1,0x18000,0x39)
   684 ping     GIO   fd 1 wrote 57 bytes
       "64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.069 ms
       "
   684 ping     RET   write 57/0x39
   684 ping     CALL  sigaction(0x2,0xbffef8c0,0xbffef8f8)
   684 ping     RET   sigaction 0
   684 ping     CALL  sigaction(0xe,0xbffef8c0,0xbffef8f8)
   684 ping     RET   sigaction 0
   684 ping     CALL  write(0x1,0x18000,0x1)
   684 ping     GIO   fd 1 wrote 1 byte
       "
       "
   684 ping     RET   write 1
   684 ping     CALL  write(0x1,0x18000,0x22)
   684 ping     GIO   fd 1 wrote 34 bytes
       "--- 127.0.0.1 ping statistics ---
...

"ktrace" writes the tracing information into a file ("ktrace.out" if the "-f" option is not specified) and the "kdump" tool is used to decode the file into more human-readable format. As you can see the behavior of the ping program on OS X is similar to what we saw on Linux. ktrace/kdump also have some basic filtering capabilities.

Important thing to know about strace is that by default it does not follow the child processes. I.e. if you try to trace a program than spawns other processes, you will not see the operations performed by the child processes. You will see only the fork() calls. In order to "follow" the execution of all child processes you need to use "-f" option.

### Tapping the wire

Another nice feature offered by strace (and ktrace on BSD systems) is the ability to attach to any process in the system for a limited time and capture its activities. Sometimes it may help to troubleshoot the network server processes a lot. In order to "attach" to a process you need to use "-p process_ID" option offered by strace. When you are done capturing the information it is enough just to terminate strace by pressing Ctrl+C. Lets try to see what happens when Sendmail accepts an incoming connection (in the given example the number 233 is the process ID of the sendmail process).

[root@localhost]$ strace -f -p 233

And on the second console do:

[root@localhost]$ telnet localhost 25
Trying 127.0.0.1...
Connected to oberon.
Escape character is '^]'.
220 oberon ESMTP Sendmail 8.11.6p2/8.11.6-20030304; Sat, 21 Jul 2007
06:00:02 +0400
quit
221 2.0.0 oberon closing connection
Connection closed by foreign host.

And the press Ctrl+C in the first console. Here is what you will see there (use "-o" option when running strace if you want to write the output to a file):

233   select(5, [4], NULL, NULL, {5, 0}) = 1 (in [4], left {1, 750000})
233   accept(4, {sin_family=AF_INET, sin_port=htons(1226), sin_addr=
	inet_addr("127.0.0.1")}, [16]) = 5
233   time([1184983202])                = 1184983202
233   sigprocmask(SIG_BLOCK, [ALRM], []) = 0
233   pipe([6, 7])                      = 0
233   sigprocmask(SIG_BLOCK, [CHLD], [ALRM]) = 0
233   fork()                            = 20953
20953 sigprocmask(SIG_UNBLOCK, [ALRM],  <unfinished ...>
233   getpid( <unfinished ...>
20953 <... sigprocmask resumed> [ALRM CHLD]) = 0
233   <... getpid resumed> )            = 233
20953 sigprocmask(SIG_UNBLOCK, [CHLD],  <unfinished ...>
233   sigprocmask(SIG_UNBLOCK, [CHLD],  <unfinished ...>
20953 <... sigprocmask resumed> [CHLD]) = 0
233   <... sigprocmask resumed> [ALRM CHLD]) = 0
20953 sigaction(SIGCHLD, {SIG_DFL},  <unfinished ...>
233   close(6 <unfinished ...>
...
20953 open("/etc/hosts", O_RDONLY)      = 6
20953 fstat(6, {st_mode=S_IFREG|0644, st_size=963, ...}) = 0
20953 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0)
	= 0x11a000
20953 read(6, "#\n# hosts\t\tThis file describe"..., 4096) = 963
20953 close(6)                          = 0
...
20953 write(3, "<22>Jul 21 06:00:03 sendmail[209"..., 143) = -1 EPIPE
	(Broken pipe)
20953 --- SIGPIPE (Broken pipe) ---
20953 close(3)                          = 0
20953 close(-1)                         = -1 EBADF (Bad file number)
20953 sigaction(SIGPIPE, {SIG_IGN}, NULL) = 0
20953 sigprocmask(SIG_UNBLOCK, [ALRM], []) = 0
20953 sigprocmask(SIG_BLOCK, NULL, [])  = 0
20953 time([1184983203])                = 1184983203
20953 unlink("./dfl6L202l20953")        = -1 ENOENT (No such
	file or directory)
20953 unlink("./qfl6L202l20953")        = -1 ENOENT (No such
	file or directory)
20953 unlink("./xfl6L202l20953")        = -1 ENOENT (No such
	file or directory)
20953 getpid()                          = 20953
20953 close(7)                          = 0
20953 setuid(0)                         = 0
20953 _exit(71)                         = ?
233   <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
233   --- SIGCHLD (Child exited) ---
233   wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 71], WNOHANG, NULL)
	= 20953
233   wait4(-1, 0xbfffdf60, WNOHANG, NULL) = 0
233   sigprocmask(SIG_BLOCK, [ALRM], []) = 0
233   time([1184983203])                = 1184983203
233   open("/proc/loadavg", O_RDONLY)   = 5
233   fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
233   mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0)
	= 0x118000
233   read(5, "0.16 0.09 0.29 3/53 21316\n", 1024) = 26
233   close(5)                          = 0
233   munmap(0x118000, 4096)            = 0
233   sigprocmask(SIG_UNBLOCK, [ALRM], [ALRM]) = 0
233   getpid()                          = 233
233   select(5, [4], NULL, NULL, {5, 0} <unfinished ...>

As you can see, sendmail is initially sitting there listening for the incoming connections on the file descriptor 4 (you can use "lsof" to see that it is associated with a TCP socket) - see the second parameter of the select() call. Then the incoming connection comes in and sendmail spawns another process to handle this connection (fork() is not followed by an execve() call because the child process is also sendmail). Then the child process (PID=20953) handles the SMTP dialog and finally exits.

As you can see, it is very simple. Just to add a couple of important points: do not forget to use "-f" flag when troubleshooting the server processes because most of them spawn the child processes to handle the connections. You can also specify multiple "-p" options with different PIDs to monitor many processes at the same time.

I hope this tutorial was useful. The process tracing mechanisms available in the UNIX systems give most of the people confidence that they actually control their systems and there is nothing "mystical" running on their computers.

← Previous Archive Next →

blog comments powered by Disqus

Published

20 July 2007

Exploring System Internals with lsof and strace

Quick links

How to see the list of files opened by a running process

Published

Tags