1.1
Objectives
This chapter introduces DTrace, as well as Solaris commands that let you observe Solaris
CPU, process, memory and I/O activity as it happens.
1.2
Chapter Outline
DTrace
Probes
Scripts
Probe organization
Variables
Predicates
Printing
Aggregate functions
Observability commands
1.3
Introduction
So far, our lectures and laboratory classes have covered operating systems concepts in theory.
This chapter introduces commands that you can use to observe these concepts as they are
carried out on a running Solaris machine. We will discuss some basic Solaris commands as well
as the powerful DTrace toolkit.
1.4
DTrace
DTrace is a powerful dynamic tracing facility included in Solaris 10 that lets you view the
actual behavior of a running Solaris system. DTrace was developed as a tool for debugging.
Before DTrace, seeing how your program executed meant putting output statements at certain
points in your code. This is well and good, but seeing how the program occupies memory, how
often the executable calls a particular kernel function, or even how the program accesses the
hard disk could mean editing the source code of the operating system itself.
To avoid this problem, DTrace uses probes, which are built into the Solaris operating system
itself. Probes keep track of everything from the memory usage of processes, to how often a
program triggers interrupts, to how data is stored on the disk. There are over 30,000 probes
in the Solaris OS. Unused probes have no effect on execution. Enabling certain probes with
DTrace gives you a look under the hood of the operating system as it runs.
1.4.1
We shall start learning DTrace with a very simple command. We will invoke dtrace with the -n
option, which names the probe to enable:
# dtrace -n BEGIN
Operating Systems
J.E.D.I.
1.4.2
Hello world
DTrace accepts a script written in the D language, a language similar to C and C++. The basic
syntax of a D script looks like this
<probe description>
/<predicate>/
{
<code>
}
The code section will be executed when the probe listed in the probe description is triggered
and the predicate (which is a conditional expression) evaluates to true.
With that idea, consider the following Hello World script file for DTrace, which we will save
as hello.d. trace("Hello world!") runs when the BEGIN probe fires, and trace("Goodbye!") runs
when the END probe fires:
BEGIN
{
trace("Hello world!");
}
END
{
trace("Goodbye!");
}
We will run dtrace with the -s option, which gives dtrace the filename of the script we want
it to run.
# dtrace -s hello.d
dtrace: script 'hello.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN   Hello world!
^C
  2      2                             :END   Goodbye!
Note how we have made dtrace print a message when it starts and when it ends. Tracing a
program used to mean modifying it to add print statements. With DTrace, you can trace a
program without modifying it, simply by finding the right probe.
1.4.3
Probe Organization
You can list all the probes by running dtrace -l. There are over 30,000 probes available in
Solaris.
# dtrace -l
   ID    PROVIDER     MODULE         FUNCTION NAME
    1      dtrace                             BEGIN
    2      dtrace                             END
    3      dtrace                             ERROR
    4      vminfo   fasttrap  fasttrap_uwrite softlock
    5      vminfo   fasttrap   fasttrap_uread softlock
    6 nfsmapid229   nfsmapid     check_domain daemon-domain
A probe is identified either by its unique probe ID or by its probe name, which is composed of
the following four fields separated by colons (provider:module:function:name):
Provider
Module
Function
Name
Providers used in this chapter include:
profile - probes that fire at a specified interval, which can be used to get a running
sample of the system
syscall - probes for the entry to and return from every system call in the system
To specify a probe in a D script, place its complete name in the probe description part. If
you leave out parts of the description, the code fires on every probe that matches the parts
you did specify.
For example, the first clause of the following script runs whenever a page is paged in from
disk into main memory, and the second clause runs on any probe provided by syscall.
vminfo:genunix:pageio_setup:pgin
{
trace("Page in occurred");
}
syscall:::
{
trace("Running a syscall probe");
}
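As a small illustration of partial probe descriptions, the sketch below leaves the module field empty and prints the four fields of whichever probe fired. It assumes the built-in string variables probeprov, probemod, probefunc and probename, which are not introduced in the text above; the choice of the read and write system calls is only for illustration.

```d
/*
 * Two probe descriptions share one clause. The module field is left
 * empty, so any module that provides these probes will match.
 */
syscall::read:entry,
syscall::write:entry
{
    /* Print provider:module:function:name of the probe that fired. */
    printf("%s:%s:%s:%s\n", probeprov, probemod, probefunc, probename);
}
```

Saved under a hypothetical name such as fields.d, it would be run with dtrace -s fields.d and stopped with Ctrl-C.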
1.4.4
Variables
Variables in D do not need to be declared. A variable's data type is set by the first value
assigned to it. For example, i = 0 creates an integer variable i with value 0, while msg =
"Hello" creates a string variable with value "Hello".
The dtrace:::BEGIN probe is often used to initialize variables. Once created, a variable
remains accessible for as long as the trace is running.
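Setting the type on first assignment is a convenience rather than a requirement. As a sketch, assuming you prefer explicit types, D also accepts C-style declarations placed outside any clause. The variable names and starting values here are only illustrative.

```d
int ctr;        /* optional explicit declaration of a global integer */
string msg;     /* string is a built-in D type */

dtrace:::BEGIN
{
    ctr = 3;
    msg = "Hello";
    printf("%s: ctr starts at %d\n", msg, ctr);
    exit(0);    /* stop tracing right away; this script only initializes */
}
```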
To show variables in action, the following dtrace script (countdown.d) uses a variable
together with the profile:::tick-1sec probe, which fires every second.
dtrace:::BEGIN
{
ctr = 10;
}
profile:::tick-1sec
{
trace(ctr);
ctr--;
}
dtrace:::END
{
trace("Thank you for using my program");
}
The output of countdown.d is the following:
# dtrace -s countdown.d
dtrace: script 'countdown.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  2  41214                       :tick-1sec        10
  2  41214                       :tick-1sec         9
  2  41214                       :tick-1sec         8
  2  41214                       :tick-1sec         7
  2  41214                       :tick-1sec         6
  2  41214                       :tick-1sec         5
  2  41214                       :tick-1sec         4
  2  41214                       :tick-1sec         3
  2  41214                       :tick-1sec         2
  2  41214                       :tick-1sec         1
  2  41214                       :tick-1sec         0
  2  41214                       :tick-1sec        -1
  2  41214                       :tick-1sec        -2
^C
                                       :END   Thank you for using my program
1.4.5
Predicates
Predicates act as an if statement for a clause: the code is executed only if the probe fires
and the predicate evaluates to true.
Consider a modification to our countdown.d script:
dtrace:::BEGIN
{
ctr = 10;
}
profile:::tick-1sec
/ ctr > 0 /
{
trace(ctr);
ctr--;
}
profile:::tick-1sec
/ ctr == 0/
{
trace(ctr);
exit(0);
}
dtrace:::END
{
trace("Time's up!");
}
The first profile:::tick-1sec clause runs when ctr > 0, while the second one runs when
ctr == 0. The exit() function ends the trace.
# dtrace -s countdown.d
dtrace: script 'countdown.d' matched 4 probes
CPU     ID                    FUNCTION:NAME
  2  41214                       :tick-1sec        10
  2  41214                       :tick-1sec         9
  2  41214                       :tick-1sec         8
  2  41214                       :tick-1sec         7
  2  41214                       :tick-1sec         6
  2  41214                       :tick-1sec         5
  2  41214                       :tick-1sec         4
  2  41214                       :tick-1sec         3
  2  41214                       :tick-1sec         2
  2  41214                       :tick-1sec         1
  2  41214                       :tick-1sec         0
  2      2                             :END   Time's up!
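Predicates are not limited to variables that the script sets itself. The following sketch assumes the built-in variables execname and pid, and uses arg2, which for the write system call holds the requested byte count. It reports write calls only when they come from a process named bash; the process name is an arbitrary choice for illustration.

```d
syscall::write:entry
/execname == "bash"/
{
    /* arg2 is the third argument of write(2): the number of bytes requested */
    printf("%s(%d) is writing %d bytes\n", execname, pid, arg2);
}
```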
1.4.6
printf
The trace() function simply prints the value of a single expression. For better formatting,
we can use the printf() function, which is similar to printf() in C and C++.
The printf() function accepts a format string and a comma-separated list of values:
printf("Time left is %d seconds. All i can say is %s", ctr, msg);
The format string is what is displayed on the screen, with placeholders for the values denoted
by the % symbol.
Time left is 2 seconds
Time left is 1 seconds
Time left is 0 seconds
Time is up! You have 0 seconds left. All i can say is Goodbye
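The script that produced this output is not shown here. The following is one possible sketch of a printf-based countdown.d, not necessarily the original: the msg variable, its value, the clause layout, and the quiet option (which suppresses the default CPU, ID and FUNCTION:NAME columns) are all assumptions.

```d
#pragma D option quiet

dtrace:::BEGIN
{
    ctr = 10;
    msg = "Goodbye";    /* hypothetical message, chosen for illustration */
}

profile:::tick-1sec
/ctr > 0/
{
    printf("Time left is %d seconds\n", ctr);
    ctr--;
}

profile:::tick-1sec
/ctr == 0/
{
    printf("Time is up! You have %d seconds left. All i can say is %s\n",
        ctr, msg);
    exit(0);
}
```

Unlike trace(), printf() does not add a newline on its own, which is why each format string ends in \n.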
1.4.7
1.4.8
Aggregate Functions
Perhaps we do not want to see every function that was called, but rather how often each
function was called. We can use aggregate functions to analyze our data.
To illustrate this, we will modify our syscall.d script to count how many times bash called
each system call (probefunc).
syscall:::entry
/execname == "bash"/
{
printf("%s(%d) called %s\n", execname, pid, probefunc);
@[probefunc] = count();
}
Execution proceeds as before, but when the script terminates we get the following output,
which shows how often each system call was made.
^C
  exece                 1
  fork1                 1
  lwp_self              1
  schedctl              1
  setcontext            1
  stat64                1
  waitsys               1
  getpid                2
  gtime                 3
  read                  3
  write                 4
  setpgrp               6
  ioctl                15
  lwp_sigmask          22
  sigaction            33
You do not need to print an aggregation explicitly: dtrace automatically prints all
aggregations at the end of script execution.
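Aggregations can also be named, keyed by more than one value, and printed explicitly. The sketch below is an illustration and not part of syscall.d: it assumes the aggregation name @calls, keeps only the ten most frequent entries with trunc(), and formats them with printa().

```d
syscall:::entry
{
    /* One counter per (program name, system call) pair. */
    @calls[execname, probefunc] = count();
}

dtrace:::END
{
    trunc(@calls, 10);      /* keep the ten largest counts */
    printa("%-16s %-20s %@d\n", @calls);
}
```

In the printa() format string, the %s conversions consume the aggregation keys in order and %@d prints the aggregated value.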
1.4.9
Sample script 2
Instead of just counting how many times a system call was executed, we can also measure how
long the system call takes to run. To keep things simple, we will consider only the read
system call. Our next script (timestamp.d) uses two probes, syscall::read:entry and
syscall::read:return. We also use the built-in variable timestamp, which returns the current
value of a nanosecond counter. We record the timestamp when syscall::read:entry fires and
subtract it from the new timestamp when syscall::read:return fires to find the duration.
syscall::read:entry
{
t = timestamp;
}
syscall::read:return
{
delay = timestamp - t;
printf("%s(%d) time in method %s: %d nsecs\n", execname, pid,
probefunc, delay);
t = 0;
}
The problem with this code is that many processes (and their threads) run it at the same time,
and each one overwrites the value of t, the time at which the system call was entered.
Consider a log book in which a guard notes the entry and exit times of people. If the log book
had only one slot for everybody, each new person entering the building would overwrite that
single slot, and your time inside the building could no longer be determined.
To solve this problem, DTrace provides the self structure for thread-local variables. Any
variable stored in self is unique to each thread. For example, if you use a variable self->t,
each thread has its own copy of self->t.
The following is now our modification of timestamp.d:
syscall::read:entry
{
self->t = timestamp;
}
syscall::read:return
{
self->delay = timestamp - self->t;
printf("%s(%d) time in method %s: %d nsecs\n", execname, pid,
probefunc, self->delay);
self->t = 0;
}
Our code's output would look like this:
# dtrace -q -s timestamp.d
Xsun(485) time in method read: 33261 nsecs
Xsun(485) time in method read: 23697 nsecs
Xsun(485) time in method read: 25137 nsecs
sshd(1405) time in method read: 46509 nsecs
Xsun(485) time in method read: 27892 nsecs
Xsun(485) time in method read: 21706 nsecs
sshd(1405) time in method read: 26893 nsecs
sshd(1405) time in method read: 13417 nsecs
sshd(1405) time in method read: 21497 nsecs
...
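Printing one line per call becomes hard to read on a busy system. As a sketch, the same per-thread timing can be fed into aggregations instead. The aggregation names @avg_ns and @max_ns are illustrative, and the /self->t/ predicate is an added guard that skips returns whose entry was not seen.

```d
syscall::read:entry
{
    self->t = timestamp;
}

syscall::read:return
/self->t/
{
    /* Summarize read() latency per program instead of printing each call. */
    @avg_ns[execname] = avg(timestamp - self->t);
    @max_ns[execname] = max(timestamp - self->t);
    self->t = 0;
}
```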
To make our data more meaningful, we can use the quantize aggregate function.
syscall::read:entry
{
self->t = timestamp;
}
syscall::read:return
{
self->delay = timestamp - self->t;
@[execname] = quantize(self->delay);
self->t = 0;
}
  Xsun
           value   count
            8192       0
           16384      41
           32768       4
           65536       0
1.4.10
DTrace toolkit
In this chapter we have given only an overview of DTrace and the D scripting language. You can
learn more from the DTrace manual at http://docs.sun.com/app/docs/doc/817-6223. To make
things simpler, you can download and install the DTrace toolkit, which contains many
ready-made D scripts that monitor various probes and produce output.
The DTrace toolkit can be downloaded from
http://www.opensolaris.org/os/community/dtrace/dtracetoolkit/ with the latest version being
DTraceToolkit-0.99.tar.gz. Once you have downloaded the file (or copied it from CD), run the
following commands:
1. gunzip DTraceToolkit-0.99.tar.gz
2. tar xvf DTraceToolkit-0.99.tar
3. cd DTraceToolkit-0.99
4. Execute the command ./install
DTraceToolkit is now installed (by default) in /opt/DTT. The following are some of the
directories installed in /opt/DTT:
Docs/ - documentation
Each script has a man page associated with it, although these are not automatically installed.
To access the man files for the scripts, you can run the man command with additional options:
# man -M <DTT Man directory> <command>
By default, the DTT Man directory is /opt/DTT/Man
For example, to get help on runocc.d, a script that checks run queue occupancy by CPU:
# man -M /opt/DTT/Man runocc.d
Probes for Java programs, as well as probes for JavaScript and web applications, are currently
under development. Once these are implemented, they can be used to trace Java programs
without having to modify source code.
1.5
CPU information
1.5.1
vmstat
The vmstat command lists information about overall CPU and system behavior since the system
was booted. Running vmstat at the command prompt produces the following output:
# vmstat
 kthr      memory               page              disk        faults        cpu
 r b w    swap    free re mf   pi po fr de sr f0 s0  s2 s6   in   sy  cs us sy id
 0 0 0 2678760 1842984  0  1    0  0  0  0  0  0  0   0  0  508   30  46  0  1 99
The fields are summarized as follows:
w - number of swapped out LWPs that are waiting for processing resources to finish
us - user time
sy - system time
id - idle time
You can also run vmstat with an optional interval in seconds, which prints a new line of
statistics after each interval, giving you a view of the system as it runs:
# vmstat 5
 kthr      memory               page              disk        faults        cpu
 r b w    swap    free re mf   pi po fr de sr f0 s0  s2 s6   in   sy  cs us sy id
 0 0 0 2678752 1842968  0  1    0  0  0  0  0  0  0   0  0  508   30  46  0  1 99
 0 0 0 2657392 1820800  0  5    0  0  0  0  0  0  0   0  0  503   50  49  0  1 99
 0 0 0 2657392 1820800  0  0    0  0  0  0  0  0  0   0  0  502   34  44  0  1 99
 0 0 0 2657392 1820800  0  0    0  0  0  0  0  0  0   0  0  504   44  50  0  1 99
 0 0 0 2657392 1820800  0  0    0  0  0  0  0  0  0   0  0  508   70  67  0  1 99
 0 0 0 2657384 1819848  2 24  234  0  0  0  0  0  0  33  0  609  663 153  0  2 98
 0 0 0 2656272 1815528  3 27 2557  0  0  0  0  0  0 328  0 1471 4262 722  1  6 93
 0 0 0 2649720 1791392  0  0 3103  0  0  0  0  0  0 438  0 1896 4418 890  1  7 92
 0 0 0 2644960 1771032  0  0 3289  0  0  0  0  0  0 459  0 1880 4639 960  1  7 92
 0 0 0 2639400 1750232  3  0 2773  0  0  0  0  0  0 395  0 1671 6227 844  1 10 89
You can compute CPU utilization by subtracting idle time (id) from 100. In the last sample
above, for example, id is 89, so the CPU is 100 - 89 = 11% utilized (us 1 + sy 10).
1.5.2
uptime
To find out how long your computer has been running, as well as CPU load averages, simply
run the uptime command:
# uptime
 10:50am  up 3 day(s), 5 min(s),  2 users,
Note the load average columns. These are the 1-, 5- and 15-minute CPU load averages.
These numbers must be read against the number of processors the computer has. For example,
1.00 means 100% CPU utilization on a single-processor computer, but only half of a
two-processor computer. If the load average is higher than your processor count, CPU
saturation is occurring.
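Load averages show how busy the CPUs are, but not which programs are responsible. One way to see that with DTrace is to sample the system with the profile provider. The sketch below assumes a sampling rate of 997 times per second on each CPU (an arbitrary choice) and the aggregation name @samples, and counts which program was running at each sample.

```d
profile:::profile-997
{
    /* execname is the program running on this CPU when the sample fired */
    @samples[execname] = count();
}
```

Press Ctrl-C to stop the script. The aggregation is printed on exit, and programs with more samples were on the CPU more often during the trace.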
1.5.3
DTrace scripts
In the default /opt/DTT/Cpu directory, you can run the following dtrace scripts to get
information about your computer's CPUs:
1.5.4
Shellsnoop
Shellsnoop is an application that uses dtrace to show what is appearing on other terminals.
The following output shows a user changing his password:
# ./shellsnoop
  PID  PPID      CMD DIR TEXT
 1412  1411     bash   W #
 1412  1411     bash   R p
 1412  1411     bash   W p
 1412  1411     bash   R a
 1412  1411     bash   W a
 1412  1411     bash   R s
 1412  1411     bash   W s
 1412  1411     bash   R s
 1412  1411     bash   W s
 1412  1411     bash   R w
 1412  1411     bash   W w
 1412  1411     bash   R d
 1412  1411     bash   W d
 1412  1411     bash   R
 1412  1411     bash   W
 1734  1412   passwd   W passwd
 1734  1412   passwd   W : Changing password for
 1734  1412   passwd   W mario
 1734  1412   passwd   W
 1734  1412   passwd   W New Password:
 1734  1412   passwd   W
 1734  1412   passwd   W Re-enter new Password:
 1734  1412   passwd   W
 1734  1412   passwd   W passwd: password successfully changed for mario
 1734  1412   passwd   W
 1412  1411     bash   W #
Note that the password itself is not shown: shellsnoop displays only what appears on the
terminal screen.
1.6
Processes
1.6.1
ps
 PPID  C    STIME TTY     TIME CMD
    0  0   Dec 10 ?       0:12 sched
    0  0   Dec 10 ?       0:01 /sbin/init
    0  0   Dec 10 ?       0:00 pageout
    0  1   Dec 10 ?       6:29 fsflush
 1411  0 08:14:12 pts/3   0:01 bash
    1  0   Dec 10 ?       0:09 /lib/svc/bin/svc.startd
    1  0   Dec 10 ?       0:24 /lib/svc/bin/svc.configd
    1  0   Dec 10 ?       0:00 /usr/lib/snmp/snmpdx -y -c
    1  0   Dec 10 ?       0:00 /usr/sbin/rpcbind
    1  0   Dec 10 ?       0:00 /usr/sbin/cron
The -e option displays all processes, and the -f option generates a full listing. The columns
shown include the following:
PID - process id
1.6.2
Process scripts are located in /opt/DTT/Proc. The following are some of the scripts which can
be used to monitor processes:
sampleproc - a program that uses DTrace to inspect how much CPU each application is using.
1.7
Memory
1.7.1
pmap -x
The pmap command, with the -x option, shows the memory layout of the process with the given
process ID. The following example shows the memory map of the process with process ID 1412.
To explore the address space of a particular process, first use ps to find its PID.
# pmap -x 1412
1412:   bash
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     648     624       -       - r-x--  bash
000C0000      80      48      16       - rwx--  bash
000D4000     168     168      64       - rwx--  [ heap ]
FF100000     864     856       -       - r-x--  libc.so.1
...
1.7.2
Memory scripts for dtrace can be found in /opt/DTT/Mem. The following are some of the scripts
that can be used to analyze memory
1.8
Disk
1.8.1
DTrace toolkit
filename
The following is an example of iosnoop output, which shows the disk I/O issued by each
command and the file involved:
# ./iosnoop
  UID   PID D    BLOCK   SIZE       COMM PATHNAME
    0     3 W    28720   2560    fsflush <none>
    0  1726 R    71712   8192         vi /export/home/alice/temp.txt
    0  1726 R 12227792   8192         vi /export/home/alice/temp.txt
    0  1726 R 11606832   8192         vi <none>
    0  1726 R 12227936   8192         vi /export/home/alice/temp.txt
    0  1726 W  1081232   8192         vi /var/tmp/ExtBaWxd
    0  1726 W  1372256  90112         vi /var/tmp/ExtBaWxd
    0  1726 W 11627936 958464         vi /var/tmp/ExtBaWxd
    0  1726 R 12228880   8192         vi /export/home/alice/temp.txt
    0  1726 R 12229184   8192         vi /export/home/alice/temp.txt
1.9
Chime