Exploring System Internals with lsof and strace
This article describes various approaches to looking at the processes in you UNIX environment. Unlike other operating systems, most of the UNIX systems allow you to get a lot of information about the state of the system, processes, I/O, networking etc.
Why and when do you need it? There may be endless list of situations that may require you to look closely at your system. For example:
- You face strange behavior from one of the processes, it consumes too much CPU etc
- The program that worked before now fails with "Segmentation fault" or another error message
- You have installed a new package and the documentation does not clearly explains the order or loading of the configuration files
- There is a process in the system that uses a TCP port for incoming connections but you do not know which one
- Anything else that sounds like "what is it doing", "who is using this resource" etc…
Quick links
- Basic terms
- Turn out your pockets, process!
- Who is using that file?
- Watch your steps
- Tapping the wire
This section covers briefly the terminology that you need to familiarize yourself with in order to understand the results of the operations described later in the article. Do not worry, it is not more difficult when answering to Windows Vista UAC question ;)
- System Call (a.k.a. "syscall")
- System Call is a mechanism for the application to make a call to the operating system. The OS usually has a limited number of the "entry points" that can be used for the basic and fundamental tasks like file I/O, memory allocation, networking etc. For example, most of the UNIX systems have only 1-3 different system calls to open a file (although various libraries offer dozens of ways to deal with a file, they all finally make the same system call). The important thing about the system call that you need to understand is that there are few of them and they are the only way for any program to interact with the OS.
- Process ID (PID)
- Any process in the UNIX OSes is defined by its unique numeric PID, even if the process runs several threads. From the moment you start a program till the moment it is finished it maintains the same PID. However, some processes can span other processes (usually by using one of the
fork()
syscalls).
- File Handle
- Any I/O operations in UNIX is associated with a file handle which is a positive integer number. It applies to any file on the disk, a network socket etc. A typical operation with a file on the disk, for example, is one of the following:
- open("filename") -> returns a handle that from this moment will be associated with this open file
- read(handle, where, how_much)
- write(handle, from_where, how_much)
- close(handle) -> removes the association established with open()
### Turn out your pockets, process!
We will start from a tool that can show us a snapshot of what is happening right now with our system. There is an excellent tool that is available for many UNIX systems called "lsof" ("LiSt Open Files"). You can find more information about this tool here. Most of the Linux distributions and Mac OS X have this tool included.
Here is what this tool can do:
- Show the list of files and other resources open by a selected process or by all processes in the system
- Show the list of processes using a particular resource (like a TCP socket, for example)
You can do various types of sorting, filtering and formatting for getting exactly the results you want. The program has about 3 dozens of command line options, we will try a couple of typical and most representative examples now. Note: most of these examples require root access to the system (or you will be limited to seeing only the information that is related to your processes).
How to see the list of files opened by a running process
For example, we would like to see which resources are currently used by Apache HTTP server. First, we need to identify the process we are interested in:
[root@localhost]$ ps ax | grep httpd | grep -v grep
2383 ? Ss 0:00 /usr/sbin/httpd 2408 ? S 0:00 /usr/sbin/httpd 2409 ? S 0:00 /usr/sbin/httpd 2410 ? S 0:00 /usr/sbin/httpd 2411 ? S 0:00 /usr/sbin/httpd 2412 ? S 0:00 /usr/sbin/httpd 2414 ? S 0:00 /usr/sbin/httpd 2415 ? S 0:00 /usr/sbin/httpd 2416 ? S 0:00 /usr/sbin/httpd
Hmm…which one I am interested in? Unfortunately, there is no universal recipe to answer this question. However…lets look at the problem from another perspective. At some point I have mentioned that when you start a program, the OS creates a process. And a process can spawn other processes. So, we can assume that there may be a tree of sub-processed related to one parent. Lets see how we can get this information (this command will work on both Linux and OS X):
[root@localhost]$ ps axo pid,ppid,command | egrep '(httpd)|(PID)' | grep -v grep
2383 1 /usr/sbin/httpd 2408 2383 /usr/sbin/httpd 2409 2383 /usr/sbin/httpd 2410 2383 /usr/sbin/httpd 2411 2383 /usr/sbin/httpd 2412 2383 /usr/sbin/httpd 2414 2383 /usr/sbin/httpd 2415 2383 /usr/sbin/httpd 2416 2383 /usr/sbin/httpd
PPID is simply Parent PID, i.e. Parent Process ID. As you can see, most of these processes have the same PPID (2383) and there is one process with the PID of 2383. This must be the parent process. If you are curious to find out who is its parent, I will let you to do this exercise yourself.
Linux has another fancy tool called "pstree". We could use it to get the same kind of information:
[root@localhost]$ pstree -p | grep httpd
|-httpd(2383)-+-httpd(2408) | |-httpd(2409) | |-httpd(2410) | |-httpd(2411) | |-httpd(2412) | |-httpd(2414) | |-httpd(2415) | `-httpd(2416)
Looks better and quite self-explaining, isn't it? OK, back to the original question. Now we know that there are certain relationships with these "httpd"s. So, our target will be the process 2383.
In its simples form, "lsof" can be called like this:
[root@localhost]$ lsof -p 2383
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME httpd 2383 root cwd DIR 3,6 1024 2 / httpd 2383 root rtd DIR 3,6 1024 2 / httpd 2383 root txt REG 3,8 328136 261136 /usr/sbin/httpd httpd 2383 root mem REG 3,6 130448 8167 /lib64/ld-2.5.so httpd 2383 root DEL REG 0,8 8072 /dev/zero httpd 2383 root mem REG 3,6 615136 79579 /lib64/libm-2.5.so httpd 2383 root mem REG 3,6 117656 8186 /lib64/libpcre.so.0.0.1 httpd 2383 root mem REG 3,6 95480 8213 /lib64/libselinux.so.1 ... httpd 2383 root mem REG 3,8 46160 1704830 /usr/lib64/ php/modules/ldap.so httpd 2383 root DEL REG 0,8 8120 /dev/zero httpd 2383 root DEL REG 0,8 8122 /dev/zero httpd 2383 root 0r CHR 1,3 1488 /dev/null httpd 2383 root 1w CHR 1,3 1488 /dev/null httpd 2383 root 2w REG 3,12 1077 4210 /var/log/ httpd/error_log httpd 2383 root 3r CHR 1,9 1922 /dev/urandom httpd 2383 root 4u IPv6 8052 TCP *:http (LISTEN) httpd 2383 root 5u sock 0,5 8053 can't identify protocol httpd 2383 root 6u IPv6 8057 TCP *:https (LISTEN) httpd 2383 root 7u sock 0,5 8058 can't identify protocol httpd 2383 root 8r FIFO 0,6 8069 pipe httpd 2383 root 9w FIFO 0,6 8069 pipe httpd 2383 root 10w REG 3,12 1077 4210 /var/log/ httpd/error_log httpd 2383 root 11w REG 3,12 711 4215 /var/log/ httpd/ssl_error_log httpd 2383 root 12w REG 3,12 0 4138 /var/log/ httpd/access_log httpd 2383 root 13w REG 3,12 0 4151 /var/log/ httpd/ssl_access_log httpd 2383 root 14w REG 3,12 0 4152 /var/log/ httpd/ssl_request_log
Looks quite interesting, doesn't it? Now I am going to briefly explain the meaning of each kind of entry (you can get more information from the manual page for "lsof").
- COMMAND
- Contains first 9 characters of the UNIX command name associated with the process
- PID
- The process ID. In our example it is simply the PID we have asked the information for
- USER
- The UNIX user owning the process
- FD
- File Descriptor number. Actually, it is more than the descriptor I have mentioned before. There are special descriptors identifying various resources in the UNIX system. For more details I recommend to check the man page. As you can see, everything is there: the current working directory (cwd), the process image itself (txt), memory-mapped libraries etc. The most interesting for us are the open files and sockets: you can see the numbers 0..14 followed by a suffix ("w" for writing only, "r" for reading only, "u" for both reading and writing and others). For example, you can see that Apache httpd redirects all its error output (2w) to /var/log/https/error_log. It also has a bunch of other regular files open for writing (handles 10-14, various log files). You can also see that it is listening on two TCP sockets (the port numbers shown are resolved through /etc/services file, you can always use "-n" option to disable any name resolution - useful if you, for example, know the port number and want to look for it specifically). You can see two more sockets (do not be afraid of the message "can't identify...", it is just application-specific protocol), Apache is using them to talk to its sub-processes.
- TYPE
- Type of the node. There is several dozens of various node types, like "REG" for the regular files, "IPv4" (or "IPv6 for IP version 6) for the Internet sockets, "CHR" for the character devices, "DIR" for the directories etc.
- DEVICE
- Shows the UNIX device numbers (major, minor) associated with the resource. Another interesting parameter. For example, on my system the numbers 3,12 correspond to:
[root@localhost]$ ls -l /dev/* | egrep '3,\s*12'
brw-r----- 1 root disk 3, 12 Jul 17 19:17 /dev/hda12
[root@localhost]$ mount | grep hda12
/dev/hda12 on /var type ext3 (rw)
This is true, the files are locates on one of the partitions on the first IDE disk mounted at /var. - SIZE
- The size of the file or the current offset (in bytes). This number does not make sense for some of the resource types so you see blank values there.
- NODE
- Node number of the file, protocol identifier (for Internet sockets) etc.
- Name
- The name of the file or resource, if applicable. The value is quite self-explaining and probably one of the most useful.
Try using this command for other processes in your system. Probably one of the most interesting ones to look at will be your X11 server.
There is a number of filters you can use to see only the information you are interested in. For example, I want to know does my iTunes talk to someone over the network or not. Here is a command I would use something like:
[root@localhost]$ lsof -p 1559 -a -i
Which produces empty answer (exactly what I was hoping for :) ). But if I try my SSH client instead of iTunes:
[root@localhost]$ lsof -p 1586 -a -i
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ssh 1586 nik 3u IPv4 0x40e7394 0t0 TCP macbook:50648 ->oberon:ssh (ESTABLISHED)
Summary: you can get a snapshot of any live process using this command. Sometimes it is simpler to use "lsof" (especially on a system you are not familiar with) to find the location of the log file, config files for a service etc.
From time to time you may become very curious about who (or at least which process :) ) is using certain resource(s) on your system. There may be a number of reasons: a server that does not start complaining about the port that is already being used, a suspicious network connection that you see when using "netstat" command, inability to unmount a network disk or a external storage unit etc.
A traditional "fuser" command is not always good enough for these purposes. For example, if I want to see all the processes that have any open files under /etc directory. Here is the command that give me the answer:
[root@localhost]$ lsof +D /etc
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME dbus-daem 2206 dbus 5r DIR 3,6 1024 59500 /etc/dbus-1/system.d acpid 2305 root 6r REG 3,6 236 63639 /etc/acpi/ events/video.conf avahi-dae 2541 avahi cwd DIR 3,6 1024 63468 /etc/avahi avahi-dae 2541 avahi rtd DIR 3,6 1024 63468 /etc/avahi prefdm 2662 root 255r REG 3,6 1465 59322 /etc/X11/prefdm 000-delay 3831 root 255r REG 3,6 577 59446 /etc/cron. daily/000-delay.cron
When you use "+D path" option, lsof checks every single file under the specified path and shows the information about the processes using it. This command may take some time to execute if you have a lot of files there.
Another situation: a network connection. "-i" option of lsof allows you to search for a specific network connection. The format is following:
lsof -i [4|6][protocol][@hostname|hostaddr][:service|port]
"4" or "6" defines the IP protocol version. If you do not use any filters and just use "-i" you will get the list of all connections in your system. Lets find all the UDP connections that we have on a OS X box:
[root@localhost]$ lsof -i UDP
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME launchd 1 root 10u IPv4 0x393ed40 0t0 UDP *:netbios-ns mDNSRespo 39 root 7u IPv4 0x393e790 0t0 UDP *:mdns mDNSRespo 39 root 8u IPv6 0x393e6c0 0t0 UDP *:mdns mDNSRespo 39 root 9u IPv4 0x41556c0 0t0 UDP *:mdns mDNSRespo 39 root 11u IPv4 0x4155ee0 0t0 UDP *:mdns mDNSRespo 39 root 12u IPv4 0x393dd00 0t0 UDP macbook:49727 netinfod 40 root 6u IPv4 0x393ee10 0t0 UDP localhost: netinfo-local syslogd 41 root 16u IPv4 0x393ec70 0t0 UDP *:49175 Directory 52 root 10u IPv4 0x3fbd1a0 0t0 UDP *:* Directory 52 root 12u IPv4 0x4155d40 0t0 UDP *:* ntpd 141 root 5u IPv4 0x393e520 0t0 UDP *:ntp ntpd 141 root 6u IPv6 0x393e110 0t0 UDP *:ntp ntpd 141 root 7u IPv6 0x393ead0 0t0 UDP localhost:ntp ntpd 141 root 8u IPv6 0x393e860 0t0 UDP [fe80:1::1]:ntp ntpd 141 root 9u IPv4 0x393e5f0 0t0 UDP localhost:ntp ntpd 141 root 10u IPv6 0x393e450 0t0 UDP [fe80:5::216: cbff:febf:a08d]:ntp ntpd 141 root 11u IPv4 0x393e380 0t0 UDP macbook:ntp prl_dhcpd 147 root 5u IPv4 0x393db60 0t0 UDP *:* automount 168 root 8u IPv4 0x3fbd000 0t0 UDP localhost:1023 automount 179 root 8u IPv4 0x393e930 0t0 UDP localhost:1022 nmbd 1078 root 0u IPv4 0x393ed40 0t0 UDP *:netbios-ns nmbd 1078 root 1u IPv4 0x393ed40 0t0 UDP *:netbios-ns nmbd 1078 root 8u IPv4 0x393ea00 0t0 UDP *:netbios-dgm nmbd 1078 root 9u IPv4 0x3fbd4e0 0t0 UDP 10.37.129.2: netbios-ns nmbd 1078 root 10u IPv4 0x3fbeba0 0t0 UDP 10.37.129.2: netbios-dgm nmbd 1078 root 11u IPv4 0x4155380 0t0 UDP macbook: netbios-ns nmbd 1078 root 12u IPv4 0x4155450 0t0 UDP macbook: netbios-dgm cupsd 1087 root 6u IPv4 0x3fbd340 0t0 UDP *:ipp
Do not be surprised if you do not see anything on your machine - you will not see the information about the processes that do not belong to you unless you are root.
Summary: using lsof with various filters allows you to search through your system and find out the detailed information about any local or network resource used.
We have learned how to get a snapshot or your UNIX system (or any individual process living in it). But sometimes it is not enough to look at the static information. Dynamic runtime information may answer a lot of other questions - why does the program crash, why does it say "cannot open resource" without providing the resource location, is it accepting the network connection or not etc.
In the fisrt section it was mentioned that the way the UNIX program talks to the operating system (i.e. deals with almost any resource on your computer) is called "system calls". Most of the UNIX kernels, including the Linux and OS X provide a way for the user to see the list of the system calls a process makes in real time, including the parameters the process passes to the kernel. There are different tools on different systems that provide this functionality. On Linux it is called "strace", on BSD (including OS X) there are two toolss working together - "ktrace" and "kdump". We will briefly cover both of them starting from the Linux one.
The really great thing about these tools is that they use the kernel tracing facility in order to work. Thus, they work with ANY program that is running in your system, it does not have to be something you have developed and you do not need the source code of the program in order to understand (in general, of course) what it is doing.
It is important to mention again that just like in case of "lsof" you cannot see anything that you are not supposed to see, i.e. you cannot see what process is doing if it is not yours (or unless you are root user).
The simplest way to run the "strace" would be as:
strace command
Lets see what a simple well-known like "w" (the command that shows the users logged in and their current activities) does:
[root@localhost]$ strace /usr/bin/w 2>&1 | less
execve("/usr/bin/w", ["/usr/bin/w"], [/* 30 vars */]) = 0 brk(0) = 0x603000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaaab000 uname({sys="Linux", node="oberon.anglab.com", ...}) = 0 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=103874, ...}) = 0 mmap(NULL, 103874, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaaac000 close(3) = 0 open("/lib64/libproc-3.2.7.so", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\ 240;`\314"..., 832) = 832 ... open("/proc/uptime", O_RDONLY) = 3 lseek(3, 0, SEEK_SET) = 0 read(3, "7267.86 7230.07\n", 1023) = 16 access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or directory) open("/var/run/utmp", O_RDWR) = -1 EACCES (Permission denied) open("/var/run/utmp", O_RDONLY) = 4 fcntl(4, F_GETFD) = 0 ... open("/proc", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3 fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 getdents(3, /* 35 entries */, 1024) = 1000 getdents(3, /* 38 entries */, 1024) = 1016 stat("/proc/1", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 open("/proc/1/stat", O_RDONLY) = 4 read(4, "1 (init) S 0 1 1 0 -1 4194560 64"..., 1023) = 216 close(4) = 0 socket(PF_FILE, SOCK_STREAM, 0) = 4 ... fcntl(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0 alarm(0) = 1 rt_sigaction(SIGALRM, {SIG_DFL}, NULL, 8) = 0 close(5) = 0 write(1, " 21:31:48 up 2:01, 2 users, l"..., 281 21:31:48 up 2:01, 2 users, load average: 0.00, 0.00, 0.00 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT nick pts/0 192.168.8.18 21:12 9.00s 0.03s 0.03s -bash nick pts/1 192.168.8.18 21:19 0.00s 0.13s 0.04s strace /usr/bin ) = 281 exit_group(0) = ? Process 3947 detached
Only a small part of the long output is shown here. As you can see, everything starts with the "execve" syscall. Which is correct - this is the call that executes a program in UNIX. By the way, for each of these calls you can get more information from the manual pages - just type "man execve" to learn more about the call and its parameters (as you can see the parameters are shown in the trace).And, as it is supposed to be, the execution ends with "exit_group" call with the status 0 (success).
You can see how the program being started loads the shared libraries it depends on. You can see that at some point it tries to open "var/run/utmpx" file but it does not exist (ENOENT). Then it tries to open "/var/run/utmp" file but it is not readable for this program (i.e. for this user) so it gets back (EACCES). Finally this program decides that the only way to find out who is logged in is to go through all the active processes and check where their standard input and output are attached to. In order to do it, "w" readsthe list of files and directories under "/proc" (open - getdents - close). By the way, the number returned by the open() call is the file handle. By using it, you can follow the operations with this file since all read() or write() system calls will use the handle. After getting the list of files, "w" goes over all process directories and reads the virtual files from there to get more information about the processes. And fnally it displays the results by writing them to the file descriptor 1 (standard output, stdout). Very simple and informative.
There is a number of parameters that you can use with "strace" to filter the results and get exactly the information you are looking for. For example, if you are interested only in the network operations, lets see what does "ping" command do (run it as root user):
[root@localhost]$ strace -x -e trace=network,desc ping -c 1 127.0.0.1
... socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4 connect(4, {sa_family=AF_INET, sin_port=htons(1025), sin_addr= inet_addr("127.0.0.1")}, 16) = 0 getsockname(4, {sa_family=AF_INET, sin_port=htons(1028), sin_addr= inet_addr("127.0.0.1")}, [16]) = 0 close(4) = 0 setsockopt(3, SOL_RAW, ICMP_FILTER, ~(ICMP_ECHOREPLY|ICMP_DEST_UNREACH| ICMP_SOURCE_QUENCH|ICMP_REDIRECT|ICMP_TIME_EXCEEDED| ICMP_PARAMETERPROB), 4) = 0 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [27472947287556096], [4]) = 0 fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 setsockopt(3, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_SNDTIMEO, "\x01\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\x01\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff418a7c90) = -1 EINVAL (Invalid argument) sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x08\x00 \xc7\x8b\xd6\x10\x00\x01\x86\x16\xa0\x46\x00\x00"..., 64}], msg_controllen=0, msg_flags=0}, 0) = 64 recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x45 \x00\x00\x54\xed\xd5\x00\x00\x40\x01\x8e\xd1\x7f\x00"..., 192}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84 write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 106PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.321 ms ) = 106 write(1, "\n", 1 ) = 1 write(1, "--- 127.0.0.1 ping statistics --"..., 144--- 127.0.0.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.321/0.321/0.321/0.000 ms ) = 144 Process 4310 detached
"-e" option allows to specify what kind(s) of system calls you are interested in.
As you can see, this program opes 2 network sockets (with the descriptors 3 and 4). One of them is used only to do getsockname() call (do "man getsockname" to get more information on this call) and then it gets closed. The one associated with the descriptor 3 is used to send and receive raw IP traffic (this is the way to send the packets using the protocols different from UDP and TCP). It configures the socket using a number of options and then finally sends an IP packet (sendmsg) to the given destination. You can even see part of the message data ("-x" option requests strace to display the hex numbers instead of non-ASCII strings). If you want to see the complete data you may want to use "-s" option to set the maximum string length you want to be displayed. After sending the message it gets back the response (recvmsg). Then it displays the output of the program.
You can use "-o" option with the file name argument to redirect the output to a text file. Usually when analyzing more complicated examples it is easier to get the trace into a file first and then study this file using text editor or "grep" command.
Mac OS X (like all BSD systems) offers "ktrace" command. In order to get similar result when analyzing the "ping" command we would have to do something like this:
[root@localhost]$ ktrace -f ping.trace -t ci ping -c 1 127.0.0.1 [root@localhost]$ kdump -f ping.trace | less
... 684 ping CALL socket(0x2,0x3,0x1) 684 ping RET socket 3 684 ping CALL getuid 684 ping RET getuid 0 684 ping CALL setuid(0) 684 ping RET setuid 0 684 ping CALL getuid 684 ping RET getuid 0 684 ping CALL getpid 684 ping RET getpid 684/0x2ac 684 ping CALL setsockopt(0x3,0xffff,0x400,0xbffffdc8,0x4) 684 ping RET setsockopt 0 684 ping CALL setsockopt(0x3,0xffff,0x1002,0xbffffdb8,0x4) 684 ping RET setsockopt 0 684 ping CALL setsockopt(0x3,0xffff,0x1001,0xbffffdb8,0x4) 684 ping RET setsockopt 0 684 ping CALL fstat(0x1,0xbffef0c0) 684 ping RET fstat 0 684 ping CALL ioctl(0x1,FIODTYPE,0xbffef08c) 684 ping RET ioctl 0 ... 684 ping CALL sendto(0x3,0x5294,0x40,0,0x5260,0x10) 684 ping GIO fd 3 wrote 64 bytes "\b\08\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b \v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z \^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567" 684 ping RET sendto 64/0x40 684 ping CALL select(0x4,0xbffffc3c,0,0,0xbffffd90) 684 ping RET select 1 684 ping CALL recvmsg(0x3,0xbffffd24,0) 684 ping GIO fd 3 wrote 84 bytes "E\0@\0|\M^N\0\0@\^A\0\0\^?\0\0\^A\^?\0\0 \^A\0\0@\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b \v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z \^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567" 684 ping RET recvmsg 84/0x54 684 ping CALL write(0x1,0x18000,0x39) 684 ping GIO fd 1 wrote 57 bytes "64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.069 ms " 684 ping RET write 57/0x39 684 ping CALL sigaction(0x2,0xbffef8c0,0xbffef8f8) 684 ping RET sigaction 0 684 ping CALL sigaction(0xe,0xbffef8c0,0xbffef8f8) 684 ping RET sigaction 0 684 ping CALL write(0x1,0x18000,0x1) 684 ping GIO fd 1 wrote 1 byte " " 684 ping RET write 1 684 ping CALL write(0x1,0x18000,0x22) 684 ping GIO fd 1 wrote 34 bytes "--- 127.0.0.1 ping statistics --- ...
"ktrace" writes the tracing information into a file ("ktrace.out" if the "-f" option is not specified) and the "kdump" tool is used to decode the file into more human-readable format. As you can see the behavior of the ping program on OS X is similar to what we saw on Linux. ktrace/kdump also have some basic filtering capabilities.
Important thing to know about strace is that by default it does not follow the child processes. I.e. if you try to trace a program than spawns other processes, you will not see the operations performed by the child processes. You will see only the fork() calls. In order to "follow" the execution of all child processes you need to use "-f" option.
Another nice feature offered by strace (and ktrace on BSD systems) is the ability to attach to any process in the system for a limited time and capture its activities. Sometimes it may help to troubleshoot the network server processes a lot. In order to "attach" to a process you need to use "-p process_ID" option offered by strace. When you are done capturing the information it is enough just to terminate strace by pressing Ctrl+C. Lets try to see what happens when Sendmail accepts an incoming connection (in the given example the number 233 is the process ID of the sendmail process).
[root@localhost]$ strace -f -p 233
And on the second console do:
[root@localhost]$ telnet localhost 25 Trying 127.0.0.1... Connected to oberon. Escape character is '^]'. 220 oberon ESMTP Sendmail 8.11.6p2/8.11.6-20030304; Sat, 21 Jul 2007 06:00:02 +0400 quit 221 2.0.0 oberon closing connection Connection closed by foreign host.
And the press Ctrl+C in the first console. Here is what you will see there (use "-o" option when running strace if you want to write the output to a file):
233 select(5, [4], NULL, NULL, {5, 0}) = 1 (in [4], left {1, 750000}) 233 accept(4, {sin_family=AF_INET, sin_port=htons(1226), sin_addr= inet_addr("127.0.0.1")}, [16]) = 5 233 time([1184983202]) = 1184983202 233 sigprocmask(SIG_BLOCK, [ALRM], []) = 0 233 pipe([6, 7]) = 0 233 sigprocmask(SIG_BLOCK, [CHLD], [ALRM]) = 0 233 fork() = 20953 20953 sigprocmask(SIG_UNBLOCK, [ALRM], <unfinished ...> 233 getpid( <unfinished ...> 20953 <... sigprocmask resumed> [ALRM CHLD]) = 0 233 <... getpid resumed> ) = 233 20953 sigprocmask(SIG_UNBLOCK, [CHLD], <unfinished ...> 233 sigprocmask(SIG_UNBLOCK, [CHLD], <unfinished ...> 20953 <... sigprocmask resumed> [CHLD]) = 0 233 <... sigprocmask resumed> [ALRM CHLD]) = 0 20953 sigaction(SIGCHLD, {SIG_DFL}, <unfinished ...> 233 close(6 <unfinished ...> ... 20953 open("/etc/hosts", O_RDONLY) = 6 20953 fstat(6, {st_mode=S_IFREG|0644, st_size=963, ...}) = 0 20953 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0) = 0x11a000 20953 read(6, "#\n# hosts\t\tThis file describe"..., 4096) = 963 20953 close(6) = 0 ... 20953 write(3, "<22>Jul 21 06:00:03 sendmail[209"..., 143) = -1 EPIPE (Broken pipe) 20953 --- SIGPIPE (Broken pipe) --- 20953 close(3) = 0 20953 close(-1) = -1 EBADF (Bad file number) 20953 sigaction(SIGPIPE, {SIG_IGN}, NULL) = 0 20953 sigprocmask(SIG_UNBLOCK, [ALRM], []) = 0 20953 sigprocmask(SIG_BLOCK, NULL, []) = 0 20953 time([1184983203]) = 1184983203 20953 unlink("./dfl6L202l20953") = -1 ENOENT (No such file or directory) 20953 unlink("./qfl6L202l20953") = -1 ENOENT (No such file or directory) 20953 unlink("./xfl6L202l20953") = -1 ENOENT (No such file or directory) 20953 getpid() = 20953 20953 close(7) = 0 20953 setuid(0) = 0 20953 _exit(71) = ? 233 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted) 233 --- SIGCHLD (Child exited) --- 233 wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 71], WNOHANG, NULL) = 20953 233 wait4(-1, 0xbfffdf60, WNOHANG, NULL) = 0 233 sigprocmask(SIG_BLOCK, [ALRM], []) = 0 233 time([1184983203]) = 1184983203 233 open("/proc/loadavg", O_RDONLY) = 5 233 fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 233 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0) = 0x118000 233 read(5, "0.16 0.09 0.29 3/53 21316\n", 1024) = 26 233 close(5) = 0 233 munmap(0x118000, 4096) = 0 233 sigprocmask(SIG_UNBLOCK, [ALRM], [ALRM]) = 0 233 getpid() = 233 233 select(5, [4], NULL, NULL, {5, 0} <unfinished ...>
As you can see, sendmail is initially sitting there listening for the incoming connections on the file descriptor 4 (you can use "lsof" to see that it is associated with a TCP socket) - see the second parameter of the select() call. Then the incoming connection comes in and sendmail spawns another process to handle this connection (fork() is not followed by an execve() call because the child process is also sendmail). Then the child process (PID=20953) handles the SMTP dialog and finally exits.
As you can see, it is very simple. Just to add a couple of important points: do not forget to use "-f" flag when troubleshooting the server processes because most of them spawn the child processes to handle the connections. You can also specify multiple "-p" options with different PIDs to monitor many processes at the same time.
I hope this tutorial was useful. The process tracing mechanisms available in the UNIX systems give most of the people confidence that they actually control their systems and there is nothing "mystical" running on their computers.
blog comments powered by Disqus