
Chapter 1: Introduction

C is (as K&R admit) a relatively small language, but one which (to its admirers, anyway) wears well. C's small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in the way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you, there's a lot you have to do yourself. (Actually, this is viewed by many as an additional advantage: anything the language doesn't do for you, it doesn't dictate to you, either, so you're free to do that something however you want.)

C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but it's actually a deliberate and significant aspect of the language. If you have programmed in assembly language, you'll probably find C very natural and comfortable (although if you continue to focus too heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level features. In either case, you should understand why C was designed this way: so that seemingly-simple constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine language constructions when compiled. If you write a C program simply and succinctly, it is likely to result in a succinct, efficient machine language executable. If you find that the executable program resulting from a C program is not efficient, it's probably because of something silly you did, not because of something the compiler did behind your back which you have no control over. In any case, there's no point in complaining about C's low-level flavor: C is what it is.

A programming language is a tool, and no tool can perform every task unaided. If you're building a house, and I'm teaching you how to use a hammer, and you ask how to assemble rafters and trusses into gables, that's a legitimate question, but the answer has fallen out of the realm of ``How do I use a hammer?'' and into ``How do I build a house?''. In the same way, we'll see that C does not have built-in features to perform every function that we might ever need to do while programming.

As mentioned above, C imposes relatively few built-in ways of doing things on the programmer. Some common tasks, such as manipulating strings, allocating memory, and doing input/output (I/O), are performed by calling on library functions. Other tasks which you might want to do, such as creating or listing directories, or interacting with a mouse, or displaying windows or other user-interface elements, or doing color graphics, are not defined by the C language at all. You can do these things from a C program, of course, but you will be calling on services which are peculiar to your programming environment (compiler, processor, and operating system) and which are not defined by the C standard. Since this course is about portable C programming, it will also be steering clear of facilities not provided in all C environments.

Another aspect of C that's worth mentioning here is that it is, to put it bluntly, a bit dangerous. C does not, in general, try hard to protect a programmer from mistakes. If you write a piece of code which will (through some oversight of yours) do something wildly different from what you intended it to do, up to and including deleting your data or trashing your disk, and if it is possible for the compiler to compile it, it generally will. You won't get warnings of the form ``Do you really mean to...?'' or ``Are you sure you really want to...?''. C is often compared to a sharp knife: it can do a surgically precise job on some exacting task you have in mind, but it can also do a surgically precise job of cutting off your finger. It's up to you to use it carefully.

This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good teaching language. C aficionados love this aspect of C because it means that C does not try to protect them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it. Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy specifically designed to lead them into booby traps and ``gotcha!''s.

This is another aspect of the language which it's fairly pointless to complain about. If you take care and pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not so obvious) trouble spots.

1.1 A First Example


[This section corresponds to K&R Sec. 1.1]

The best way to learn programming is to dive right in and start writing real programs. This way, concepts which would otherwise seem abstract make sense, and the positive feedback you get from getting even a small program to work gives you a great incentive to improve it or write the next one.

Diving in with ``real'' programs right away has another advantage, if only pragmatic: if you're using a conventional compiler, you can't run a fragment of a program and see what it does; nothing will run until you have a complete (if tiny or trivial) program. You can't learn everything you'd need to write a complete program all at once, so you'll have to take some things ``on faith'' and parrot them in your first programs before you begin to understand them. (You can't learn to program just one expression or statement at a time any more than you can learn to speak a foreign language one word at a time. If all you know is a handful of words, you can't actually say anything: you also need to know something about the language's word order and grammar and sentence structure and declension of articles and verbs.)

Besides the occasional necessity to take things on faith, there is a more serious potential drawback of this ``dive in and program'' approach: it's a small step from learning-by-doing to learning-by-trial-and-error, and when you learn programming by trial-and-error, you can very easily learn many errors. When you're not sure whether something will work, or you're not even sure what you could use that might work, and you try something, and it does work, you do not have any guarantee that what you tried worked for the right reason. You might just have ``learned'' something that works only by accident or only on your compiler, and it may be very hard to un-learn it later, when it stops working. Therefore, whenever you're not sure of something, be very careful before you go off and try it ``just to see if it will work.'' Of course, you can never be absolutely sure that something is going to work before you try it, otherwise we'd never have to try things. But you should have an expectation that something is going to work before you try it, and if you can't predict how to do something or whether something would work and find yourself having to determine it experimentally, make a note in your mind that whatever you've just learned (based on the outcome of the experiment) is suspect.

The first example program in K&R is the first example program in any language: print or display a simple string, and exit. Here is my version of K&R's ``hello, world'' program:
    #include <stdio.h>

    int main()
    {
        printf("Hello, world!\n");
        return 0;
    }

If you have a C compiler, the first thing to do is figure out how to type this program in and compile it and run it and see where its output went. (If you don't have a C compiler yet, the first thing to do is to find one.)

The first line is practically boilerplate; it will appear in almost all programs we write. It asks that some definitions having to do with the ``Standard I/O Library'' be included in our program; these definitions are needed if we are to call the library function printf correctly.

The second line says that we are defining a function named main. Most of the time, we can name our functions anything we want, but the function name main is special: it is the function that will be ``called'' first when our program starts running. The empty pair of parentheses indicates that our main function accepts no arguments, that is, there isn't any information which needs to be passed in when the function is called.

The braces { and } surround a list of statements in C. Here, they surround the list of statements making up the function main.

The line
printf("Hello, world!\n");

is the first statement in the program. It asks that the function printf be called; printf is a library function which prints formatted output. The parentheses surround printf's argument list: the information which is handed to it which it should act on. The semicolon at the end of the line terminates the statement.

(printf's name reflects the fact that C was first developed when Teletypes and other printing terminals were still in widespread use. Today, of course, video displays are far more common. printf ``prints'' to the standard output, that is, to the default location for program output to go. Nowadays, that's almost always a video screen or a window on that screen. If you do have a printer, you'll typically have to do something extra to get a program to print to it.)
printf's first (and, in this case, only) argument is the string which it should print. The string, enclosed in double quotes "", consists of the words ``Hello, world!'' followed by a special sequence: \n. In strings, any two-character sequence beginning with the backslash \ represents a single special character. The sequence \n represents the ``new line'' character, which prints a carriage return or line feed or whatever it takes to end one line of output and move down to the next. (This program only prints one line of output, but it's still important to terminate it.)

The second line in the main function is
    return 0;

In general, a function may return a value to its caller, and main is no exception. When main returns (that is, reaches its end and stops functioning), the program is at its end, and the return value from main tells the operating system (or whatever invoked the program that main is the main function of) whether it succeeded or not. By convention, a return value of 0 indicates success.

This program may look so absolutely trivial that it seems as if it's not even worth typing it in and trying to run it, but doing so may be a big (and is certainly a vital) first hurdle. On an unfamiliar computer, it can be arbitrarily difficult to figure out how to enter a text file containing program source, or how to compile and link it, or how to invoke it, or what happened after (if?) it ran. The most experienced C programmers immediately go back to this one, simple program whenever they're trying out a new system or a new way of entering or building programs or a new way of printing output from within programs. As Kernighan and Ritchie say, everything else is comparatively easy.

How you compile and run this (or any) program is a function of the compiler and operating system you're using. The first step is to type it in, exactly as shown; this may involve using a text editor to create a file containing the program text. You'll have to give the file a name, and all C compilers (that I've ever heard of) require that files containing C source end with the extension .c. So you might place the program text in a file called hello.c.

The second step is to compile the program. (Strictly speaking, compilation consists of two steps, compilation proper followed by linking, but we can overlook this distinction at first, especially because the compiler often takes care of initiating the linking step automatically.) On many Unix systems, the command to compile a C program from a source file hello.c is

    cc -o hello hello.c

You would type this command at the Unix shell prompt, and it requests that the cc (C compiler) program be run, placing its output (i.e. the new executable program it creates) in the file hello, and taking its input (i.e. the source code to be compiled) from the file hello.c.

The third step is to run (execute, invoke) the newly-built hello program. Again on a Unix system, this is done simply by typing the program's name:
hello

Depending on how your system is set up (in particular, on whether the current directory is searched for executables, based on the PATH variable), you may have to type

    ./hello

to indicate that the hello program is in the current directory (as opposed to some ``bin'' directory full of executable programs, elsewhere).

You may also have your choice of C compilers. On many Unix machines, the cc command is an older compiler which does not recognize modern, ANSI Standard C syntax. An old compiler will accept the simple programs we'll be starting with, but it will not accept most of our later programs. If you find yourself getting baffling compilation errors on programs which you've typed in exactly as they're shown, it probably indicates that you're using an older compiler. On many machines, another compiler called acc or gcc is available, and you'll want to use it, instead. (Both acc and gcc are typically invoked the same as cc; that is, the above cc command would instead be typed, say, gcc -o hello hello.c .)

(One final caveat about Unix systems: don't name your test programs test, because there's already a standard command called test, and you and the command interpreter will get badly confused if you try to replace the system's test command with your own, not least because your own almost certainly does something completely different.)

Under MS-DOS, the compilation procedure is quite similar. The name of the command you type will depend on your compiler (e.g. cl for the Microsoft C compiler, tc or bcc for Borland's Turbo C, etc.). You may have to manually perform the second, linking step, perhaps with a command named link or tlink. The executable file which the compiler/linker creates will have a name ending in .exe (or perhaps .com), but you can still invoke it by typing the base name (e.g. hello). See your compiler documentation for complete details; one of the manuals should contain a demonstration of how to enter, compile, and run a small program that prints some simple output, just as we're trying to describe here.

In an integrated or ``visual'' programming environment, such as those on the Macintosh or under various versions of Microsoft Windows, the steps you take to enter, compile, and run a program are somewhat different (and, theoretically, simpler). Typically, there is a way to open a new source window, type source code into it, give it a file name, and add it to the program (or ``project'') you're building. If necessary, there will be a way to specify what other source files (or ``modules'') make up the program. Then, there's a button or menu selection which compiles and runs the program, all from within the programming environment. (There will also be a way to create a standalone executable file which you can run from outside the environment.) In a PC-compatible environment, you may have to choose between creating DOS programs or Windows programs. (If you have troubles pertaining to the printf function, try specifying a target environment of MS-DOS. Supposedly, some compilers which are targeted at Windows environments won't let you call printf, because until you call some fancier functions to request that a window be created, there's no window for printf to print to.) Again, check the introductory or tutorial manual that came with the programming package; it should walk you through the steps necessary to get your first program running.

1.2 Second Example


Our second example is of little more practical use than the first, but it introduces a few more programming language elements:
    #include <stdio.h>

    /* print a few numbers, to illustrate a simple loop */

    int main()
    {
        int i;

        for(i = 0; i < 10; i = i + 1)
            printf("i is %d\n", i);

        return 0;
    }

As before, the line #include <stdio.h> is boilerplate which is necessary since we're calling the printf function, and main() and the pair of braces {} indicate and delineate the function named main we're (again) writing.

The first new line is the line

    /* print a few numbers, to illustrate a simple loop */

which is a comment. Anything between the characters /* and */ is ignored by the compiler, but may be useful to a person trying to read and understand the program. You can add comments anywhere you want to in the program, to document what the program is, what it does, who wrote it, how it works, what the various functions are for and how they work, what the various variables are for, etc.

The second new line, down within the function main, is

    int i;

which declares that our function will use a variable named i. The variable's type is int, which is a plain integer.

Next, we set up a loop:

    for(i = 0; i < 10; i = i + 1)

The keyword for indicates that we are setting up a ``for loop.'' A for loop is controlled by three expressions, enclosed in parentheses and separated by semicolons. These expressions say that, in this case, the loop starts by setting i to 0, that it continues as long as i is less than 10, and that after each iteration of the loop, i should be incremented by 1 (that is, have 1 added to its value).

Finally, we have a call to the printf function, as before, but with several differences. First, the call to printf is within the body of the for loop. This means that control flow does not pass once through the printf call, but instead that the call is performed as many times as are dictated by the for loop. In this case, printf will be called several times: once when i is 0, once when i is 1, once when i is 2, and so on until i is 9, for a total of 10 times.

A second difference in the printf call is that the string to be printed, "i is %d", contains a percent sign. Whenever printf sees a percent sign, it indicates that printf is not supposed to print the exact text of the string, but is instead supposed to read another one of its arguments to decide what to print. The letter after the percent sign tells it what type of argument to expect and how to print it. In this case, the letter d indicates that printf is to expect an int, and to print it in decimal. Finally, we see that printf is in fact being called with another argument, for a total of two, separated by commas. The second argument is the variable i, which is in fact an int, as required by %d. The effect of all of this is that each time it is called, printf will print a line containing the current value of the variable i:

    i is 0
    i is 1
    i is 2
    ...

After several trips through the loop, i will eventually equal 9. After that trip through the loop, the third control expression i = i + 1 will increment its value to 10. The condition i < 10 is no longer true, so no more trips through the loop are taken. Instead, control flow jumps down to the statement following the for loop, which is the return statement. The main function returns, and the program is finished.

1.3 Program Structure


We'll have more to say later about program structure, but for now let's observe a few basics. A program consists of one or more functions; it may also contain global variables. (Our two example programs so far have contained one function apiece, and no global variables.) At the top of a source file are typically a few boilerplate lines such as #include <stdio.h>, followed by the definitions (i.e. code) for the functions. (It's also possible to split up the several functions making up a larger program into several source files, as we'll see in a later chapter.)

Each function is further composed of declarations and statements, in that order. When a sequence of statements should act as one (for example, when they should all serve together as the body of a loop) they can be enclosed in braces (just as for the outer body of the entire function). The simplest kind of statement is an expression statement, which is an expression (presumably performing some useful operation) followed by a semicolon. Expressions are further composed of operators, objects (variables), and constants.

C source code consists of several lexical elements. Some are words, such as for, return, main, and i, which are either keywords of the language (for, return) or identifiers (names) we've chosen for our own functions and variables (main, i). There are constants such as 1 and 10 which introduce new values into the program. There are operators such as =, +, and >, which manipulate variables and values. There are other punctuation characters (often called delimiters), such as parentheses and squiggly braces {}, which indicate how the other elements of the program are grouped. Finally, all of the preceding elements can be separated by whitespace: spaces, tabs, and the ``carriage returns'' between lines.

The source code for a C program is, for the most part, ``free form.'' This means that the compiler does not care how the code is arranged: how it is broken into lines, how the lines are indented, or whether whitespace is used between things like variable names and other punctuation. (Lines like #include <stdio.h> are an exception; they must appear alone on their own lines, generally unbroken. Only lines beginning with # are affected by this rule; we'll see other examples later.)
You can use whitespace, indentation, and appropriate line breaks to make your programs more readable for yourself and other people (even though the compiler doesn't care). You can place explanatory comments anywhere in your program: any text between the characters /* and */ is ignored by the compiler. (In fact, the compiler pretends that all it saw was whitespace.) Though comments are ignored by the compiler, well-chosen comments can make a program much easier to read (for its author, as well as for others).

The usage of whitespace is our first style issue. It's typical to leave a blank line between different parts of the program, to leave a space on either side of operators such as + and =, and to indent the bodies of loops and other control flow constructs. Typically, we arrange the indentation so that the subsidiary statements controlled by a loop statement (the ``loop body,'' such as the printf call in our second example program) are all aligned with each other and placed one tab stop (or some consistent number of spaces) to the right of the controlling statement. This indentation (like all whitespace) is not required by the compiler, but it makes programs much easier to read. (However, it can also be misleading, if used incorrectly or in the face of inadvertent mistakes. The compiler will decide what ``the body of the loop'' is based on its own rules, not the indentation, so if the indentation does not match the compiler's interpretation, confusion is inevitable.)

To drive home the point that the compiler doesn't care about indentation, line breaks, or other whitespace, here are a few (extreme) examples: The fragments

    for(i = 0; i < 10; i = i + 1)
        printf("%d\n", i);

and

    for(i = 0; i < 10; i = i + 1) printf("%d\n", i);

and

    for(i=0;i<10;i=i+1)printf("%d\n",i);

and

    for(i = 0; i < 10; i = i + 1)
    printf("%d\n", i);

and

    for     (       i
    =       0       ;
    i       <       10
    ;       i       =
    i       +       1
    )       printf  (
    "%d\n"  ,       i
    )       ;

and

            for
            (i=0;
        i<10;i=
        i+1)printf
        ("%d\n", i);

are all treated exactly the same way by the compiler.

Some programmers argue forever over the best set of ``rules'' for indentation and other aspects of programming style, calling to mind the old philosophers' debates about the number of angels that could dance on the head of a pin. Style issues (such as how a program is laid out) are important, but they're not something to be too dogmatic about, and there are also other, deeper style issues besides mere layout and typography. Kernighan and Ritchie take a fairly moderate stance:

    Although C compilers do not care about how a program looks, proper indentation and spacing are critical in making programs easy for people to read. We recommend writing only one statement per line, and using blanks around operators to clarify grouping. The position of braces is less important, although people hold passionate beliefs. We have chosen one of several popular styles. Pick a style that suits you, then use it consistently.

There is some value in having a reasonably standard style (or a few standard styles) for code layout. Please don't take the above advice to ``pick a style that suits you'' as an invitation to invent your own brand-new style. If (perhaps after you've been programming in C for a while) you have specific objections to specific facets of existing styles, you're welcome to modify them, but if you don't have any particular leanings, you're probably best off copying an existing style at first. (If you want to place your own stamp of originality on the programs that you write, there are better avenues for your creativity than inventing a bizarre layout; you might instead try to make the logic easier to follow, or th

Chapter 2: Basic Data Types and Operators


The type of a variable determines what kinds of values it may take on. An operator computes new values out of old ones. An expression consists of variables, constants, and operators combined to perform some useful computation. In this chapter, we'll learn about C's basic types, how to write constants and declare variables of these types, and what the basic operators are.

As Kernighan and Ritchie say, ``The type of an object determines the set of values it can have and what operations can be performed on it.'' This is a fairly formal, mathematical definition of what a type is, but it is traditional (and meaningful). There are several implications to remember:

1. The ``set of values'' is finite. C's int type can not represent all of the integers; its float type can not represent all floating-point numbers.
2. When you're using an object (that is, a variable) of some type, you may have to remember what values it can take on and what operations you can perform on it. For example, there are several operators which play with the binary (bit-level) representation of integers, but these operators are not meaningful for and may not be applied to floating-point operands.
3. When declaring a new variable and picking a type for it, you have to keep in mind the values and operations you'll be needing.

In other words, picking a type for a variable is not some abstract academic exercise; it's closely connected to the way(s) you'll be using that variable.

2.1 Types
[This section corresponds to K&R Sec. 2.2]

There are only a few basic data types in C. The first ones we'll be encountering and using are:

    char        a character
    int         an integer, in the range -32,767 to 32,767
    long int    a larger integer (up to +-2,147,483,647)
    float       a floating-point number
    double      a floating-point number, with more precision and perhaps greater range than float

If you can look at this list of basic types and say to yourself, ``Oh, how simple, there are only a few types, I won't have to worry much about choosing among them,'' you'll have an easy time with declarations. (Some masochists wish that the type system were more complicated so that they could specify more things about each variable, but those of us who would rather not have to specify these extra things each time are glad that we don't have to.)

The ranges listed above for types int and long int are the guaranteed minimum ranges. On some systems, either of these types (or, indeed, any C type) may be able to hold larger values, but a program that depends on extended ranges will not be as portable. Some programmers become obsessed with knowing exactly what the sizes of data objects will be in various situations, and go on to write programs which depend on these exact sizes. Determining or controlling the size of an object is occasionally important, but most of the time we can sidestep size issues and let the compiler do most of the worrying.

(From the ranges listed above, we can determine that type int must be at least 16 bits, and that type long int must be at least 32 bits. But neither of these sizes is exact; many systems have 32-bit ints, and some systems have 64-bit long ints.)

You might wonder how the computer stores characters. The answer involves a character set, which is simply a mapping between some set of characters and some set of small numeric codes. Most machines today use the ASCII character set, in which the letter A is represented by the code 65, the ampersand & is represented by the code 38, the digit 1 is represented by the code 49, the space character is represented by the code 32, etc. (Most of the time, of course, you have no need to know or even worry about these particular code values; they're automatically translated into the right shapes on the screen or printer when characters are printed out, and they're automatically generated when you type characters on the keyboard. Eventually, though, we'll appreciate, and even take some control over, exactly when these translations (from characters to their numeric codes) are performed.) Character codes are usually small: the largest code for a printable character in ASCII is 126, which is the ~ (tilde) character. Characters usually fit in a byte, which is usually 8 bits. In C, type char is defined as occupying one byte, so it is usually 8 bits.

Most of the simple variables in most programs are of types int, long int, or double. Typically, we'll use int and double for most purposes, and long int any time we need to hold integer values greater than 32,767. As we'll see, even when we're manipulating individual characters, we'll usually use an int variable, for reasons to be discussed later. Therefore, we'll rarely use individual variables of type char, although we'll use plenty of arrays of char.

2.2 Constants
[This section corresponds to K&R Sec. 2.3]

A constant is just an immediate, absolute value found in an expression. The simplest constants are decimal integers, e.g. 0, 1, 2, 123 . Occasionally it is useful to specify constants in base 8 or base 16 (octal or hexadecimal); this is done by prefixing an extra 0 (zero) for octal, or 0x for hexadecimal: the constants 100, 0144, and 0x64 all represent the same number. (If you're not using these non-decimal constants, just remember not to use any leading zeroes. If you accidentally write 0123 intending to get one hundred and twenty three, you'll get 83 instead, which is 123 base 8.)

We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler's. The compiler doesn't care; it always converts everything into binary internally, anyway. (ANSI C, however, provides no good way to specify constants in source code in binary.)

A constant can be forced to be of type long int by suffixing it with the letter L (in upper or lower case, although upper case is strongly recommended, because a lower case l looks too much like the digit 1).

A constant that contains a decimal point or the letter e (or both) is a floating-point constant: 3.14, 10., .01, 123e4, 123.456e7 . The e indicates multiplication by a power of 10; 123.456e7 is 123.456 times 10 to the 7th, or 1,234,560,000. (Floating-point constants are of type double by default.)

We also have constants for specifying characters and strings. (Make sure you understand the difference between a character and a string: a character is exactly one character; a string is a set of zero or more characters; a string containing one character is distinct from a lone character.) A character constant is simply a single character between single quotes: 'A', '.', '%'. The numeric value of a character constant is, naturally enough, that character's value in the machine's character set. (In ASCII, for example, 'A' has the value 65.)

A string is represented in C as a sequence or array of characters. (We'll have more to say about arrays in general, and strings in particular, later.) A string constant is a sequence of zero or more characters enclosed in double quotes: "apple", "hello, world", "this is a test".
Within character and string constants, the backslash character \ is special, and is used to represent characters not easily typed on the keyboard or for various reasons not easily typed in constants. The most common of these ``character escapes'' are:

    \n    a ``newline'' character
    \b    a backspace
    \r    a carriage return (without a line feed)
    \'    a single quote (e.g. in a character constant)
    \"    a double quote (e.g. in a string constant)
    \\    a single backslash

For example, "he said \"hi\"" is a string constant which contains two double quotes, and '\'' is a character constant consisting of a (single) single quote. Notice once again that the character constant 'A' is very different from the string constant "A".

2.3 Declarations
[This section corresponds to K&R Sec. 2.4]

Informally, a variable (also called an object) is a place you can store a value. So that you can refer to it unambiguously, a variable needs a name. You can think of the variables in your program as a set of boxes or cubbyholes, each with a label giving its name; you might imagine that storing a value ``in'' a variable consists of writing the value on a slip of paper and placing it in the cubbyhole.

A declaration tells the compiler the name and type of a variable you'll be using in your program. In its simplest form, a declaration consists of the type, the name of the variable, and a terminating semicolon:

    char c;
    int i;
    float f;

You can also declare several variables of the same type in one declaration, separating them with commas:

    int i1, i2;

Later we'll see that declarations may also contain initializers, qualifiers and storage classes, and that we can declare arrays, functions, pointers, and other kinds of data structures.

The placement of declarations is significant. You can't place them just anywhere (i.e. they cannot be interspersed with the other statements in your program). They must either be placed at the beginning of a function, or at the beginning of a brace-enclosed block of statements (which we'll learn about in the next chapter), or outside of any function. Furthermore, the placement of a declaration, as well as its storage class, controls several things about its visibility and lifetime, as we'll see later.

You may wonder why variables must be declared before use. There are two reasons:

1. It makes things somewhat easier on the compiler; it knows right away what kind of storage to allocate and what code to emit to store and manipulate each variable; it doesn't have to try to intuit the programmer's intentions.

2. It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly; you must think about them enough to pick appropriate types for them. (The compiler's error messages to you, telling you that you apparently forgot to declare a variable, are as often helpful as they are a nuisance: they're helpful when they tell you that you misspelled a variable, or forgot to think about exactly how you were going to use it.)

Although there are a few places where declarations can be omitted (in which case the compiler will assume an implicit declaration), making use of these removes the advantages of reason 2 above, so I recommend always declaring everything explicitly.

Most of the time, I recommend writing one declaration per line. For the most part, the compiler doesn't care what order declarations are in. You can order the declarations alphabetically, or in the order that they're used, or to put related declarations next to each other. Collecting all variables of the same type together on one line essentially orders declarations by type, which isn't a very useful order (it's only slightly more useful than random order).

A declaration for a variable can also contain an initial value. This initializer consists of an equals sign and an expression, which is usually a single constant:

    int i = 1;
    int i1 = 10, i2 = 20;

2.4 Variable Names


[This section corresponds to K&R Sec. 2.1]

Within limits, you can give your variables and functions any names you want. These names (the formal term is ``identifiers'') consist of letters, numbers, and underscores. For our purposes, names must begin with a letter. Theoretically, names can be as long as you want, but extremely long ones get tedious to type after a while, and the compiler is not required to keep track of extremely long ones perfectly. (What this means is that if you were to name a variable, say, supercalafragalisticespialidocious, the compiler might get lazy and pretend that you'd named it supercalafragalisticespialidocio, such that if you later misspelled it supercalafragalisticespialidociouz, the compiler wouldn't catch your mistake. Nor would the compiler necessarily be able to tell the difference if for some perverse reason you deliberately declared a second variable named supercalafragalisticespialidociouz.)

The capitalization of names in C is significant: the variable names variable, Variable, and VARIABLE (as well as silly combinations like variAble) are all distinct.

A final restriction on names is that you may not use keywords (the words such as int and for which are part of the syntax of the language) as the names of variables or functions (or as identifiers of any kind).

2.5 Arithmetic Operators


[This section corresponds to K&R Sec. 2.5]

The basic operators for performing arithmetic are the same in many computer languages:

    +    addition
    -    subtraction
    *    multiplication
    /    division
    %    modulus (remainder)

The - operator can be used in two ways: to subtract two numbers (as in a - b), or to negate one number (as in -a + b or a + -b).

When applied to integers, the division operator / discards any remainder, so 1 / 2 is 0 and 7 / 4 is 1. But when either operand is a floating-point quantity (type float or double), the division operator yields a floating-point result, with a potentially nonzero fractional part. So 1 / 2.0 is 0.5, and 7.0 / 4.0 is 1.75.

The modulus operator % gives you the remainder when two integers are divided: 1 % 2 is 1; 7 % 4 is 3. (The modulus operator can only be applied to integers.)

An additional arithmetic operation you might be wondering about is exponentiation. Some languages have an exponentiation operator (typically ^ or **), but C doesn't. (To square or cube a number, just multiply it by itself.)

Multiplication, division, and modulus all have higher precedence than addition and subtraction. The term ``precedence'' refers to how ``tightly'' operators bind to their operands (that is, to the things they operate on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7, not 9. In other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.

All of these operators ``group'' from left to right, which means that when two or more of them have the same precedence and participate next to each other in an expression, the evaluation conceptually proceeds from left to right. For example, 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not +2. (``Grouping'' is sometimes called associativity, although the term is used somewhat differently in programming than it is in mathematics. Not all C operators group from left to right; a few group from right to left.)

Whenever the default precedence or associativity doesn't give you the grouping you want, you can always use explicit parentheses. For example, if you wanted to add 1 to 2 and then multiply the result by 3, you could write (1 + 2) * 3.

By the way, the word ``arithmetic'' as used in the title of this section is an adjective, not a noun, and it's pronounced differently than the noun: the accent is on the third syllable.

2.6 Assignment Operators


[This section corresponds to K&R Sec. 2.10]

The assignment operator = assigns a value to a variable. For example,

    x = 1

sets x to 1, and

    a = b

sets a to whatever b's value is. The expression

    i = i + 1

is, as we've mentioned elsewhere, the standard programming idiom for increasing a variable's value by 1: this expression takes i's old value, adds 1 to it, and stores it back into i. (C provides several ``shortcut'' operators for modifying variables in this and similar ways, which we'll meet later.)

We've called the = sign the ``assignment operator'' and referred to ``assignment expressions'' because, in fact, = is an operator just like + or -. C does not have ``assignment statements''; instead, an assignment like a = b is an expression and can be used wherever any expression can appear. Since it's an expression, the assignment a = b has a value, namely, the same value that's assigned to a. This value can then be used in a larger expression; for example, we might write

    c = a = b

which is equivalent to

    c = (a = b)

and assigns b's value to both a and c. (The assignment operator, therefore, groups from right to left.) Later we'll see other circumstances in which it can be useful to use the value of an assignment expression.

It's usually a matter of style whether you initialize a variable with an initializer in its declaration or with an assignment expression near where you first use it. That is, there's no particular difference between

    int a = 10;

and

    int a;
    /* later... */
    a = 10;

2.7 Function Calls


We'll have much more to say about functions in a later chapter, but for now let's just look at how they're called. (To review: what a function is is a piece of code, written by you or by someone else, which performs some useful, compartmentalizable task.) You call a function by mentioning its name followed by a pair of parentheses. If the function takes any arguments, you place the arguments between the parentheses, separated by commas. These are all function calls:

    printf("Hello, world!\n")
    printf("%d\n", i)
    sqrt(144.)
    getchar()

The arguments to a function can be arbitrary expressions. Therefore, you don't have to say things like

    int sum = a + b + c;
    printf("sum = %d\n", sum);

if you don't want to; you can instead collapse it to

    printf("sum = %d\n", a + b + c);

Many functions return values, and when they do, you can embed calls to these functions within larger expressions:

    c = sqrt(a * a + b * b)
    x = r * cos(theta)
    i = f1(f2(j))

The first expression squares a and b, computes the square root of the sum of the squares, and assigns the result to c. (In other words, it computes a * a + b * b, passes that number to the sqrt function, and assigns sqrt's return value to c.) The second expression passes the value of the variable theta to the cos (cosine) function, multiplies the result by r, and assigns the result to x. The third expression passes the value of the variable j to the function f2, passes the return value of f2 immediately to the function f1, and finally assigns f1's return value to the variable i.

Chapter 3: Statements and Control Flow


Statements are the ``steps'' of a program. Most statements compute and assign values or call functions, but we will eventually meet several other kinds of statements as well. By default, statements are executed in sequence, one after another. We can, however, modify that sequence by using control flow constructs which arrange that a statement or group of statements is executed only if some condition is true or false, or executed over and over again to form a loop. (A somewhat different kind of control flow happens when we call a function: execution of the caller is suspended while the called function proceeds. We'll discuss functions in chapter 5.)

My definitions of the terms statement and control flow are somewhat circular. A statement is an element within a program which you can apply control flow to; control flow is how you specify the order in which the statements in your program are executed. (A weaker definition of a statement might be ``a part of your program that does something,'' but this definition could as easily be applied to expressions or functions.)

3.1 Expression Statements


[This section corresponds to K&R Sec. 3.1]

Most of the statements in a C program are expression statements. An expression statement is simply an expression followed by a semicolon. The lines

    i = 0;
    i = i + 1;

and

    printf("Hello, world!\n");

are all expression statements. (In some languages, such as Pascal, the semicolon separates statements, such that the last statement is not followed by a semicolon. In C, however, the semicolon is a statement terminator; all simple statements are followed by semicolons. The semicolon is also used for a few other things in C; we've already seen that it terminates declarations, too.)

Expression statements do all of the real work in a C program. Whenever you need to compute new values for variables, you'll typically use expression statements (and they'll typically contain assignment operators). Whenever you want your program to do something visible, in the real world, you'll typically call a function (as part of an expression statement). We've already seen the most basic example: calling the function printf to print text to the screen. But anything else you might do--read or write a disk file, talk to a modem or printer, draw pictures on the screen--will also involve function calls. (Furthermore, the functions you call to do these things are usually different depending on which operating system you're using. The C language does not define them, so we won't be talking about or using them much.)

Expressions and expression statements can be arbitrarily complicated. They don't have to consist of exactly one simple function call, or of one simple assignment to a variable. For one thing, many functions return values, and the values they return can then be used by other parts of the expression. For example, C provides a sqrt (square root) function, which we might use to compute the hypotenuse of a right triangle like this:

    c = sqrt(a*a + b*b);

To be useful, an expression statement must do something; it must have some lasting effect on the state of the program. (Formally, a useful statement must have at least one side effect.) The first two sample expression statements in this section (above) assign new values to the variable i, and the third one calls printf to print something out, and these are good examples of statements that do something useful.

(To make the distinction clear, we may note that degenerate constructions such as

    0;
    i;

or

    i + 1;

are syntactically valid statements--they consist of an expression followed by a semicolon--but in each case, they compute a value without doing anything with it, so the computed value is discarded, and the statement is useless. But if the ``degenerate'' statements in this paragraph don't make much sense to you, don't worry; it's because they, frankly, don't make much sense.)

It's also possible for a single expression to have multiple side effects, but it's easy for such an expression to be (a) confusing or (b) undefined. For now, we'll only be looking at expressions (and, therefore, statements) which do one well-defined thing at a time.

3.2 if Statements
[This section corresponds to K&R Sec. 3.2]

The simplest way to modify the control flow of a program is with an if statement, which in its simplest form looks like this:

    if(x > max)
        max = x;

Even if you didn't know any C, it would probably be pretty obvious that what happens here is that if x is greater than max, x gets assigned to max. (We'd use code like this to keep track of the maximum value of x we'd seen--for each new x, we'd compare it to the old maximum value max, and if the new value was greater, we'd update max.)

More generally, we can say that the syntax of an if statement is:

    if( expression )
        statement

where expression is any expression and statement is any statement.

What if you have a series of statements, all of which should be executed together or not at all depending on whether some condition is true? The answer is that you enclose them in braces:

    if( expression )
    {
        statement<sub>1</sub>
        statement<sub>2</sub>
        statement<sub>3</sub>
    }

As a general rule, anywhere the syntax of C calls for a statement, you may write a series of statements enclosed by braces. (You do not need to, and should not, put a semicolon after the closing brace, because the series of statements enclosed by braces is not itself a simple expression statement.)

An if statement may also optionally contain a second statement, the ``else clause,'' which is to be executed if the condition is not met. Here is an example:

    if(n > 0)
        average = sum / n;
    else
    {
        printf("can't compute average\n");
        average = 0;
    }

The first statement or block of statements is executed if the condition is true, and the second statement or block of statements (following the keyword else) is executed if the condition is not true. In this example, we can compute a meaningful average only if n is greater than 0; otherwise, we print a message saying that we cannot compute the average.

The general syntax of an if statement is therefore

    if( expression )
        statement<sub>1</sub>
    else
        statement<sub>2</sub>

(where both statement<sub>1</sub> and statement<sub>2</sub> may be lists of statements enclosed in braces).

It's also possible to nest one if statement inside another. (For that matter, it's in general possible to nest any kind of statement or control flow construct within another.) For example, here is a little piece of code which decides roughly which quadrant of the compass you're walking into, based on an x value which is positive if you're walking east, and a y value which is positive if you're walking north:

    if(x > 0)
    {
        if(y > 0)
            printf("Northeast.\n");
        else
            printf("Southeast.\n");
    }
    else
    {
        if(y > 0)
            printf("Northwest.\n");
        else
            printf("Southwest.\n");
    }

When you have one if statement (or loop) nested inside another, it's a very good idea to use explicit braces {}, as shown, to make it clear (both to you and to the compiler) how they're nested and which else goes with which if. It's also a good idea to indent the various levels, also as shown, to make the code more readable to humans. Why do both? You use indentation to make the code visually more readable to yourself and other humans, but the compiler doesn't pay attention to the indentation (since all whitespace is essentially equivalent and is essentially ignored). Therefore, you also have to make sure that the punctuation is right.

Here is an example of another common arrangement of if and else. Suppose we have a variable grade containing a student's numeric grade, and we want to print out the corresponding letter grade. Here is code that would do the job:
    if(grade >= 90)
        printf("A");
    else if(grade >= 80)
        printf("B");
    else if(grade >= 70)
        printf("C");
    else if(grade >= 60)
        printf("D");
    else
        printf("F");

What happens here is that exactly one of the five printf calls is executed, depending on which of the conditions is true. Each condition is tested in turn, and if one is true, the corresponding statement is executed, and the rest are skipped. If none of the conditions is true, we fall through to the last one, printing ``F''.

In the cascaded if/else/if/else/... chain, each else clause is another if statement. This may be more obvious at first if we reformat the example, including every set of braces and indenting each if statement relative to the previous one:

    if(grade >= 90)
    {
        printf("A");
    }
    else
    {
        if(grade >= 80)
        {
            printf("B");
        }
        else
        {
            if(grade >= 70)
            {
                printf("C");
            }
            else
            {
                if(grade >= 60)
                {
                    printf("D");
                }
                else
                {
                    printf("F");
                }
            }
        }
    }

By examining the code this way, it should be obvious that exactly one of the printf calls is executed, and that whenever one of the conditions is found true, the remaining conditions do not need to be checked and none of the later statements within the chain will be executed. But once you've convinced yourself of this and learned to recognize the idiom, it's generally preferable to arrange the statements as in the first example, without trying to indent each successive if statement one tabstop further out. (Obviously, you'd run into the right margin very quickly if the chain had just a few more cases!)

3.3 Boolean Expressions
An if statement like

    if(x > max)
        max = x;

is perhaps deceptively simple. Conceptually, we say that it checks whether the condition x > max is ``true'' or ``false''. The mechanics underlying C's conception of ``true'' and ``false,'' however, deserve some explanation. We need to understand how true and false values are represented, and how they are interpreted by statements like if.

As far as C is concerned, a true/false condition can be represented as an integer. (An integer can represent many values; here we care about only two values: ``true'' and ``false.'' The study of mathematics involving only two values is called Boolean algebra, after George Boole, a mathematician who refined this study.) In C, ``false'' is represented by a value of 0 (zero), and ``true'' is represented by any value that is nonzero. Since there are many nonzero values (at least 65,534, for values of type int), when we have to pick a specific value for ``true,'' we'll pick 1.

The relational operators such as <, <=, >, and >= are in fact operators, just like +, -, *, and /. The relational operators take two values, look at them, and ``return'' a value of 1 or 0 depending on whether the tested relation was true or false. The complete set of relational operators in C is:

    <     less than
    <=    less than or equal
    >     greater than
    >=    greater than or equal
    ==    equal
    !=    not equal

For example, 1 < 2 is 1, 3 > 4 is 0, 5 == 5 is 1, and 6 != 6 is 0.

We've now encountered perhaps the most easy-to-stumble-on ``gotcha!'' in C: the equality-testing operator is ==, not a single =, which is assignment. If you accidentally write

    if(a = 0)

(and you probably will at some point; everybody makes this mistake), it will not test whether a is zero, as you probably intended. Instead, it will assign 0 to a, and then perform the ``true'' branch of the if statement if a is nonzero. But a will have just been assigned the value 0, so the ``true'' branch will never be taken! (This could drive you crazy while debugging--you wanted to do something if a was 0, and after the test, a is 0, whether it was supposed to be or not, but the ``true'' branch is nevertheless not taken.)

The relational operators work with arbitrary numbers and generate true/false values. You can also combine true/false values by using the Boolean operators, which take true/false values as operands and compute new true/false values. The three Boolean operators are:

    &&    and
    ||    or
    !     not (takes one operand; ``unary'')

The && (``and'') operator takes two true/false values and produces a true (1) result if both operands are true (that is, if the left-hand side is true and the right-hand side is true). The || (``or'') operator takes two true/false values and produces a true (1) result if either operand is true. The ! (``not'') operator takes a single true/false value and negates it, turning false to true and true to false (0 to 1 and nonzero to 0).

For example, to test whether the variable i lies between 1 and 10, you might use

    if(1 < i && i < 10)
        ...

Here we're expressing the relation ``i is between 1 and 10'' as ``1 is less than i and i is less than 10.'' It's important to understand why the more obvious expression

    if(1 < i < 10)        /* WRONG */

would not work. The expression 1 < i < 10 is parsed by the compiler analogously to 1 + i + 10. The expression 1 + i + 10 is parsed as (1 + i) + 10 and means ``add 1 to i, and then add the result to 10.'' Similarly, the expression 1 < i < 10 is parsed as (1 < i) < 10 and means ``see if 1 is less than i, and then see if the result is less than 10.'' But in this case, ``the result'' is 1 or 0, depending on whether i is greater than 1. Since both 0 and 1 are less than 10, the expression 1 < i < 10 would always be true in C, regardless of the value of i!

Relational and Boolean expressions are usually used in contexts such as an if statement, where something is to be done or not done depending on some condition. In these cases what's actually checked is whether the expression representing the condition has a zero or nonzero value. As long as the expression is a relational or Boolean expression, the interpretation is just what we want. For example, when we wrote

    if(x > max)

the > operator produced a 1 if x was greater than max, and a 0 otherwise. The if statement interprets 0 as false and 1 (or any nonzero value) as true.

But what if the expression is not a relational or Boolean expression? As far as C is concerned, the controlling expression (of conditional statements like if) can in fact be any expression: it doesn't have to ``look like'' a Boolean expression; it doesn't have to contain relational or logical operators. All C looks at (when it's evaluating an if statement, or anywhere else where it needs a true/false value) is whether the expression evaluates to 0 or nonzero. For example, if you have a variable x, and you want to do something if x is nonzero, it's possible to write

    if(x)
        statement

and the statement will be executed if x is nonzero (since nonzero means ``true'').

This possibility (that the controlling expression of an if statement doesn't have to ``look like'' a Boolean expression) is both useful and potentially confusing. It's useful when you have a variable or a function that is ``conceptually Boolean,'' that is, one that you consider to hold a true or false (actually nonzero or zero) value. For example, if you have a variable verbose which contains a nonzero value when your program should run in verbose mode and zero when it should be quiet, you can write things like

    if(verbose)
        printf("Starting first pass\n");

and this code is both legal and readable, besides which it does what you want. The standard library contains a function isupper() which tests whether a character is an upper-case letter, so if c is a character, you might write

    if(isupper(c))
        ...

Both of these examples (verbose and isupper()) are useful and readable.

However, you will eventually come across code like

    if(n)
        average = sum / n;

where n is just a number. Here, the programmer wants to compute the average only if n is nonzero (otherwise, of course, the code would divide by 0), and the code works, because, in the context of the if statement, the trivial expression n is (as always) interpreted as ``true'' if it is nonzero, and ``false'' if it is zero.

``Coding shortcuts'' like these can seem cryptic, but they're also quite common, so you'll need to be able to recognize them even if you don't choose to write them in your own code. Whenever you see code like

    if(x)

or

    if(f())

where x or f() do not have obvious ``Boolean'' names, you can read them as ``if x is nonzero'' or ``if f() returns nonzero.''

3.4 while Loops


[This section corresponds to half of K&R Sec. 3.5]

Loops generally consist of two parts: one or more control expressions which (not surprisingly) control the execution of the loop, and the body, which is the statement or set of statements which is executed over and over.

The most basic loop in C is the while loop. A while loop has one control expression, and executes as long as that expression is true. This example repeatedly doubles the number 2 (2, 4, 8, 16, ...) and prints the resulting numbers as long as they are less than 1000:

    int x = 2;

    while(x < 1000)
    {
        printf("%d\n", x);
        x = x * 2;
    }

(Once again, we've used braces {} to enclose the group of statements which are to be executed together as the body of the loop.)

The general syntax of a while loop is

    while( expression )
        statement

A while loop starts out like an if statement: if the condition expressed by the expression is true, the statement is executed. However, after executing the statement, the condition is tested again, and if it's still true, the statement is executed again. (Presumably, the condition depends on some value which is changed in the body of the loop.) As long as the condition remains true, the body of the loop is executed over and over again. (If the condition is false right at the start, the body of the loop is not executed at all.)

As another example, if you wanted to print a number of blank lines, with the variable n holding the number of blank lines to be printed, you might use code like this:

    while(n > 0)
    {
        printf("\n");
        n = n - 1;
    }

After the loop finishes (when control ``falls out'' of it, due to the condition being false), n will have the value 0.

You use a while loop when you have a statement or group of statements which may have to be executed a number of times to complete their task. The controlling expression represents the condition ``the loop is not done'' or ``there's more work to do.'' As long as the expression is true, the body of the loop is executed; presumably, it makes at least some progress at its task. When the expression becomes false, the task is done, and the rest of the program (beyond the loop) can proceed. When we think about a loop in this way, we can see an additional important property: if the expression evaluates to ``false'' before the very first trip through the loop, we make zero trips through the loop. In other words, if the task is already done (if there's no work to do) the body of the loop is not executed at all. (It's always a good idea to think about the ``boundary conditions'' in a piece of code, and to make sure that the code will work correctly when there is no work to do, or when there is a trivial task to do, such as sorting an array of one number. Experience has shown that bugs at boundary conditions are quite common.)

3.5 for Loops


[This section corresponds to the other half of K&R Sec. 3.5]

Our second loop, which we've seen at least one example of already, is the for loop. The first one we saw was:

    for (i = 0; i < 10; i = i + 1)
        printf("i is %d\n", i);

More generally, the syntax of a for loop is

    for( expr<sub>1</sub> ; expr<sub>2</sub> ; expr<sub>3</sub> )
        statement

(Here we see that the for loop has three control expressions. As always, the statement can be a brace-enclosed block.)

Many loops are set up to cause some variable to step through a range of values, or, more generally, to set up an initial condition and then modify some value to perform each succeeding loop as long as some condition is true. The three expressions in a for loop encapsulate these conditions: expr<sub>1</sub> sets up the initial condition, expr<sub>2</sub> tests whether another trip through the loop should be taken, and expr<sub>3</sub> increments or updates things after each trip through the loop and prior to the next one. In our first example, we had i = 0 as expr<sub>1</sub>, i < 10 as expr<sub>2</sub>, i = i + 1 as expr<sub>3</sub>, and the call to printf as statement, the body of the loop. So the loop began by setting i to 0, proceeded as long as i was less than 10, printed out i's value during each trip through the loop, and added 1 to i between each trip through the loop.

When the compiler sees a for loop, first, expr<sub>1</sub> is evaluated. Then, expr<sub>2</sub> is evaluated, and if it is true, the body of the loop (statement) is executed. Then, expr<sub>3</sub> is evaluated to go to the next step, and expr<sub>2</sub> is evaluated again, to see if there is a next step. During the execution of a for loop, the sequence is:
    expr<sub>1</sub>
    expr<sub>2</sub>  statement  expr<sub>3</sub>
    expr<sub>2</sub>  statement  expr<sub>3</sub>
    ...
    expr<sub>2</sub>  statement  expr<sub>3</sub>
    expr<sub>2</sub>

1he first thing e!ecuted is expr<sub>1</sub>. expr<sub>3</sub> is evaluated after every tri' through the loo'. 1he last thing e!ecuted is always expr<sub>2</sub>, because when expr<sub>2</sub> evaluates false, the loo' e!its. #ll three e!'ressions of a for loo' are o'tional. "f you leave out expr<sub>1</sub>, there sim'ly is no initiali2ation ste', and the variable(s) used with the loo' had better have been initiali2ed already. "f you leave out expr<sub>2</sub>, there is no test, and the default for the for loo' is that another tri' through the loo' should be ta(en (such

that unless you brea( out of it some other way, the loo' runs forever). "f you leave out expr<sub>3</sub>, there is no increment ste'. 1he semicolons se'arate the three controlling e!'ressions of a for loo'. (1hese semicolons, by the way, have nothing to do with statement terminators.) "f you leave out one or more of the e!'ressions, the semicolons remain. 1herefore, one way of writing a deliberately infinite loo' in C is
for(;;) ...

It's useful to compare C's for loop to the equivalent loops in other computer languages you might know. The C loop
for(i = x; i <= y; i = i + z)

is roughly equivalent to:


for I = X to Y step Z        (BASIC)
do 10 i=x,y,z                (FORTRAN)
for i := x to y              (Pascal)

In C (unlike FORTRAN), if the test condition is false before the first trip through the loop, the loop won't be traversed at all. In C (unlike Pascal), a loop control variable (in this case, i) is guaranteed to retain its final value after the loop completes, and it is also legal to modify the control variable within the loop, if you really want to. (When the loop terminates due to the test condition turning false, the value of the control variable after the loop will be the first value for which the condition failed, not the last value for which it succeeded.)

It's also worth noting that a for loop can be used in more general ways than the simple, iterative examples we've seen so far. The ``control variable'' of a for loop does not have to be an integer, and it does not have to be incremented by an additive increment. It could be ``incremented'' by a multiplicative factor (1, 2, 4, 8, ...) if that was what you needed, or it could be a floating-point variable, or it could be another type of variable which we haven't met yet which would step, not over numeric values, but over the elements of an array or other data structure. Strictly speaking, a for loop doesn't have to have a ``control variable'' at all; the three expressions can be anything, although the loop will make the most sense if they are related and together form the expected initialize, test, increment sequence.

The powers-of-two example of the previous section does fit this pattern, so we could rewrite it like this:
int x;
for(x = 2; x < 1000; x = x * 2)
    printf("%d\n", x);

There is no earth-shaking or fundamental difference between the while and for loops. In fact, given the general for loop
for(expr<sub>1</sub>; expr<sub>2</sub>; expr<sub>3</sub>)
    statement

you could usually rewrite it as a while loop, moving the initialize and increment expressions to statements before and within the loop:
expr<sub>1</sub> ;
while(expr<sub>2</sub>)
{
    statement
    expr<sub>3</sub> ;
}

Similarly, given the general while loop
while(expr)
    statement

you could rewrite it as a for loop:
for(; expr; )
    statement

Another contrast between the for and while loops is that although the test expression (expr<sub>2</sub>) is optional in a for loop, it is required in a while loop. If you leave out the controlling expression of a while loop, the compiler will complain about a syntax error. (To write a deliberately infinite while loop, you have to supply an expression which is always nonzero. The most obvious one would simply be while(1) .)

If it's possible to rewrite a for loop as a while loop and vice versa, why do they both exist? Which one should you choose? In general, when you choose a for loop, its three expressions should all manipulate the same variable or data structure, using the initialize, test, increment pattern. If they don't manipulate the same variable or don't follow that pattern, wedging them into a for loop buys nothing and a while loop would probably be clearer. (The reason that one loop or the other can be clearer is simply that, when you see a for loop, you expect to see an idiomatic initialize/test/increment of a single variable, and if the for loop you're looking at doesn't end up matching that pattern, you've been momentarily misled.)

3.6 break and continue


[This section corresponds to K&R Sec. 3.7]

Sometimes, due to an exceptional condition, you need to jump out of a loop early, that is, before the main controlling expression of the loop causes it to terminate normally. Other times, in an elaborate loop, you may want to jump back to the top of the loop (to test the controlling expression again, and perhaps begin a new trip through the loop) without playing out all the steps of the current loop. The break and continue statements allow you to do these two things. (They are, in fact, essentially restricted forms of goto.)

To put everything we've seen in this chapter together, as well as demonstrate the use of the break statement, here is a program for printing prime numbers between 1 and 100:
#include <stdio.h>
#include <math.h>

int main()
{
    int i, j;

    printf("%d\n", 2);

    for(i = 3; i <= 100; i = i + 1)
    {
        for(j = 2; j < i; j = j + 1)
        {
            if(i % j == 0)
                break;
            if(j > sqrt(i))
            {
                printf("%d\n", i);
                break;
            }
        }
    }

    return 0;
}

The outer loop steps the variable i through the numbers from 3 to 100; the code tests to see if each number has any divisors other than 1 and itself. The trial divisor j loops from 2 up to i. j is a divisor of i if the remainder of i divided by j is 0, so the code uses C's ``remainder'' or ``modulus'' operator % to make this test. (Remember that i % j gives the remainder when i is divided by j.)

If the program finds a divisor, it uses break to break out of the inner loop, without printing anything. But if it notices that j has risen higher than the square root of i, without its having found any divisors, then i must not have any divisors, so i is prime, and its value is printed. (Once we've determined that i is prime by noticing that j > sqrt(i), there's no need to try the other trial divisors, so we use a second break statement to break out of the loop in that case, too.)

The simple algorithm and implementation we used here (like many simple prime number algorithms) does not work for 2, the only even prime number, so the program ``cheats'' and prints out 2 no matter what, before going on to test the numbers from 3 to 100.

Many improvements to this simple program are of course possible; you might experiment with it. (Did you notice that the ``test'' expression of the inner loop for(j = 2; j < i; j = j + 1) is in a sense unnecessary, because the loop always terminates early due to one of the two break statements?)

Chapter 4: More about Declarations (and Initialization)


4.1 Arrays
So far, we've been declaring simple variables: the declaration
int i;

declares a single variable, named i, of type int. It is also possible to declare an array of several elements. The declaration
int a[10];

declares an array, named a, consisting of ten elements, each of type int. Simply speaking, an array is a variable that can hold more than one value. You specify which of the several values you're referring to at any given time by using a numeric subscript. (Arrays in programming are similar to vectors or matrices in mathematics.) We can picture the array a as a row of ten boxes, one for each element.

In C, arrays are zero-based: the ten elements of a 10-element array are numbered from 0 to 9. The subscript which specifies a single element of an array is simply an integer expression in square brackets. The first element of the array is a[0], the second element is a[1], etc. You can use these ``array subscript expressions'' anywhere you can use the name of a simple variable, for example:
a[0] = 10;
a[1] = 20;
a[2] = a[0] + a[1];

Notice that the subscripted array references (i.e. expressions such as a[0] and a[1]) can appear on either side of the assignment operator. The subscript does not have to be a constant like 0 or 1; it can be any integral expression. For example, it's common to loop over all elements of an array:
int i;
for(i = 0; i < 10; i = i + 1)
    a[i] = 0;

This loop sets all ten elements of the array a to 0.

Arrays are a real convenience for many problems, but there is not a lot that C will do with them for you automatically. In particular, you can neither set all elements of an array at once nor assign one array to another; both of the assignments
a = 0;          /* WRONG */

and
int b[10];
b = a;          /* WRONG */

are illegal. To set all of the elements of an array to some value, you must do so one by one, as in the loop example above. To copy the contents of one array to another, you must again do so one by one:
int b[10];
for(i = 0; i < 10; i = i + 1)
    b[i] = a[i];

Remember that for an array declared
int a[10];

there is no element a[10]; the topmost element is a[9]. This is one reason that zero-based loops are also common in C. Note that the for loop
for(i = 0; i < 10; i = i + 1) ...

does just what you want in this case: it starts at 0, the number 10 suggests (correctly) that it goes through 10 iterations, but the less-than comparison means that the last trip through the loop has i set to 9. (The comparison i <= 9 would also work, but it would be less clear and therefore poorer style.)

In the little examples so far, we've always looped over all 10 elements of the sample array a. It's common, however, to use an array that's bigger than necessarily needed, and to use a second variable to keep track of how many elements of the array are currently in use. For example, we might have an integer variable
int na;         /* number of elements of a[] in use */

Then, when we wanted to do something with a (such as print it out), the loop would run from 0 to na, not 10 (or whatever a's size was):
for(i = 0; i < na; i = i + 1)
    printf("%d\n", a[i]);

Naturally, we would have to ensure that na's value was always less than or equal to the number of elements actually declared in a.

Arrays are not limited to type int; you can have arrays of char or double or any other type.

Here is a slightly larger example of the use of arrays. Suppose we want to investigate the behavior of rolling a pair of dice. The total roll can be anywhere from 2 to 12, and we want to count how often each roll comes up. We will use an array to keep track of the counts: a[2] will count how many times we've rolled 2, etc.

We'll simulate the roll of a die by calling C's random number generation function, rand(). Each time you call rand(), it returns a different, pseudo-random integer. The values that rand() returns typically span a large range, so we'll use C's modulus (or ``remainder'') operator % to produce random numbers in the range we want. The expression rand() % 6 produces random numbers in the range 0 to 5, and rand() % 6 + 1 produces random numbers in the range 1 to 6.

Here is the program:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i;
    int d1, d2;
    int a[13];      /* uses [2..12] */

    for(i = 2; i <= 12; i = i + 1)
        a[i] = 0;

    for(i = 0; i < 100; i = i + 1)
    {
        d1 = rand() % 6 + 1;
        d2 = rand() % 6 + 1;
        a[d1 + d2] = a[d1 + d2] + 1;
    }

    for(i = 2; i <= 12; i = i + 1)
        printf("%d: %d\n", i, a[i]);

    return 0;
}

We include the header <stdlib.h> because it contains the necessary declarations for the rand() function. We declare the array of size 13 so that its highest element will be a[12]. (We're wasting a[0] and a[1]; this is no great loss.) The variables d1 and d2 contain the rolls of the two individual dice; we add them together to decide which cell of the array to increment, in the line
a[d1 + d2] = a[d1 + d2] + 1;

After 100 rolls, we print the array out. Typically (as craps players well know), we'll see mostly 7's, and relatively few 2's and 12's.

(By the way, it turns out that using the % operator to reduce the range of the rand function is not always a good idea. We'll say more about this problem in an exercise.)

4.1.1 Array Initialization


Although it is not possible to assign to all elements of an array at once using an assignment expression, it is possible to initialize some or all elements of an array when the array is defined. The syntax looks like this:

int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

The list of values, enclosed in braces {}, separated by commas, provides the initial values for successive elements of the array.

(Under older, pre-ANSI C compilers, you could not always supply initializers for ``local'' arrays inside functions; you could only initialize ``global'' arrays, those outside of any function. Those compilers are now rare, so you shouldn't have to worry about this distinction any more. We'll talk more about local and global variables later in this chapter.)

If there are fewer initializers than elements in the array, the remaining elements are automatically initialized to 0. For example,
int a[10] = {0, 1, 2, 3, 4, 5, 6};

would initialize a[7], a[8], and a[9] to 0. When an array definition includes an initializer, the array dimension may be omitted, and the compiler will infer the dimension from the number of initializers. For example,
int b[] = {10, 11, 12, 13, 14};

would declare, define, and initialize an array b of 5 elements (i.e. just as if you'd typed int b[5]). Only the dimension is omitted; the brackets [] remain to indicate that b is in fact an array.

In the case of arrays of char, the initializer may be a string constant:
char s1[7] = "Hello,";
char s2[10] = "there,";
char s3[] = "world!";

As before, if the dimension is omitted, it is inferred from the size of the string initializer. (We haven't covered strings in detail yet, and we'll do so in chapter 8, but it turns out that all strings in C are terminated by a special character with the value 0. Therefore, the array s3 will be of size 7, and the explicitly-sized s1 does need to be of size at least 7. For s2, the last 4 characters in the array will all end up being this zero-value character.)

4.1.2 Arrays of Arrays (``Multidimensional'' Arrays)


[This section is optional and may be skipped.]

When we said that ``Arrays are not limited to type int; you can have arrays of... any other type,'' we meant that more literally than you might have guessed. If you have an ``array of int,'' it means that you have an array each of whose elements is of type int. But you can have an array each of whose elements is of type x, where x is any type you choose. In particular, you can have an array each of whose elements is another array! We can use these arrays of arrays for the same sorts of tasks as we'd use multidimensional arrays in other computer languages (or matrices in mathematics). Naturally, we are not limited to arrays of arrays, either; we could have an array of arrays of arrays, which would act like a 3-dimensional array, etc.

The declaration of an array of arrays looks like this:

int a2[5][7];

You have to read complicated declarations like these ``inside out.'' What this one says is that a2 is an array of 5 somethings, and that each of the somethings is an array of 7 ints. More briefly, ``a2 is an array of 5 arrays of 7 ints,'' or, ``a2 is an array of array of int.'' In the declaration of a2, the brackets closest to the identifier a2 tell you what a2 first and foremost is. That's how you know it's an array of 5 arrays of size 7, not the other way around. You can think of a2 as having 5 ``rows'' and 7 ``columns,'' although this interpretation is not mandatory. (You could also treat the ``first'' or inner subscript as ``x'' and the second as ``y.'' Unless you're doing something fancy, all you have to worry about is that the subscripts when you access the array match those that you used when you declared it, as in the examples below.)

To illustrate the use of multidimensional arrays, we might fill in the elements of the above array a2 using this piece of code:
int i, j;
for(i = 0; i < 5; i = i + 1)
{
    for(j = 0; j < 7; j = j + 1)
        a2[i][j] = 10 * i + j;
}

This pair of nested loops sets a2[1][2] to 12, a2[4][1] to 41, etc. Since the first dimension of a2 is 5, the first subscripting index variable, i, runs from 0 to 4. Similarly, the second subscript varies from 0 to 6.

We could print a2 out (in a two-dimensional way, suggesting its structure) with a similar pair of nested loops:
for(i = 0; i < 5; i = i + 1)
{
    for(j = 0; j < 7; j = j + 1)
        printf("%d\t", a2[i][j]);
    printf("\n");
}

(The character \t in the printf string is the tab character.)

Just to see more clearly what's going on, we could make the ``row'' and ``column'' subscripts explicit by printing them, too:
for(j = 0; j < 7; j = j + 1)
    printf("\t%d:", j);
printf("\n");
for(i = 0; i < 5; i = i + 1)
{
    printf("%d:", i);
    for(j = 0; j < 7; j = j + 1)
        printf("\t%d", a2[i][j]);
    printf("\n");
}

This last fragment would print

        0:      1:      2:      3:      4:      5:      6:
0:      0       1       2       3       4       5       6
1:      10      11      12      13      14      15      16
2:      20      21      22      23      24      25      26
3:      30      31      32      33      34      35      36
4:      40      41      42      43      44      45      46

Finally, there's no reason we have to loop over the ``rows'' first and the ``columns'' second; depending on what we wanted to do, we could interchange the two loops, like this:
for(j = 0; j < 7; j = j + 1)
{
    for(i = 0; i < 5; i = i + 1)
        printf("%d\t", a2[i][j]);
    printf("\n");
}

Notice that i is still the first subscript and it still runs from 0 to 4, and j is still the second subscript and it still runs from 0 to 6.

4.2 Visibility and Lifetime (Global Variables, etc.)


We haven't said so explicitly, but variables are channels of communication within a program. You set a variable to a value at one point in a program, and at another point (or points) you read the value out again. The two points may be in adjoining statements, or they may be in widely separated parts of the program.

How long does a variable last? How widely separated can the setting and fetching parts of the program be, and how long after a variable is set does it persist? Depending on the variable and how you're using it, you might want different answers to these questions.

The visibility of a variable determines how much of the rest of the program can access that variable. You can arrange that a variable is visible only within one part of one function, or in one function, or in one source file, or anywhere in the program. (We haven't really talked about source files yet; we'll be exploring them soon.)

Why would you want to limit the visibility of a variable? For maximum flexibility, wouldn't it be handy if all variables were potentially visible everywhere? As it happens, that arrangement would be too flexible: everywhere in the program, you would have to keep track of the names of all the variables declared anywhere else in the program, so that you didn't accidentally re-use one. Whenever a variable had the wrong value by mistake, you'd have to search the entire program for the bug, because any statement in the entire program could potentially have modified that variable. You would constantly be stepping all over yourself by using a common variable name like i in two parts of your program, and having one snippet of code accidentally overwrite the values being used by another part of the code. The communication would be sort of like an old party line: you'd always be accidentally interrupting other conversations, or having your conversations interrupted.

To avoid this confusion, we generally give variables the narrowest or smallest visibility they need. A variable declared within the braces {} of a function is visible only within that function; variables declared within functions are called local variables. If another function somewhere else declares a local variable with the same name, it's a different variable entirely, and the two don't clash with each other.

On the other hand, a variable declared outside of any function is a global variable, and it is potentially visible anywhere within the program. You use global variables when you do want the communications path to be able to travel to any part of the program. When you declare a global variable, you will usually give it a longer, more descriptive name (not something generic like i) so that whenever you use it you will remember that it's the same variable everywhere.

Another word for the visibility of variables is scope.

How long do variables last? By default, local variables (those declared within a function) have automatic duration: they spring into existence when the function is called, and they (and their values) disappear when the function returns. Global variables, on the other hand, have static duration: they last, and the values stored in them persist, for as long as the program does. (Of course, the values can in general still be overwritten, so they don't necessarily persist forever.)

Finally, it is possible to split a program up into several source files, for easier maintenance. When several source files are combined into one program (we'll be seeing how in the next chapter) the compiler must have a way of correlating the global variables which might be used to communicate between the several source files. Furthermore, if a global variable is going to be useful for communication, there must be exactly one of it: you wouldn't want one function in one source file to store a value in one global variable named globalvar, and then have another function in another source file read from a different global variable named globalvar. Therefore, a global variable should have exactly one defining instance, in one place in one source file. If the same variable is to be used anywhere else (i.e. in some other source file or files), the variable is declared in those other file(s) with an external declaration, which is not a defining instance. The external declaration says, ``hey, compiler, here's the name and type of a global variable I'm going to use, but don't define it here, don't allocate space for it; it's one that's defined somewhere else, and I'm just referring to it here.'' If you accidentally have two distinct defining instances for a variable of the same name, the compiler (or the linker) will complain that it is ``multiply defined.''

It is also possible to have a variable which is global in the sense that it is declared outside of any function, but private to the one source file it's defined in. Such a variable is visible to the functions in that source file but not to any functions in any other source files, even if they try to issue a matching declaration.

You get any extra control you might need over visibility and lifetime, and you distinguish between defining instances and external declarations, by using storage classes. A storage class is an extra keyword at the beginning of a declaration which modifies the declaration in some way. Generally, the storage class (if any) is the first word in the declaration, preceding the type name. (Strictly speaking, this ordering has not traditionally been necessary, and you may see some code with the storage class, type name, and other parts of a declaration in an unusual order.)

We said that, by default, local variables had automatic duration. To give them static duration (so that, instead of coming and going as the function is called, they persist for as long as the program does), you precede their declaration with the static keyword:
static int i;

By default, a declaration of a global variable (especially if it specifies an initial value) is the defining instance. To make it an external declaration, of a variable which is defined somewhere else, you precede it with the keyword extern:
extern int j;

Finally, to arrange that a global variable is visible only within its containing source file, you precede it with the static keyword:
static int k;

Notice that the static keyword can do two different things: it adjusts the duration of a local variable from automatic to static, or it adjusts the visibility of a global variable from truly global to private-to-the-file.

To summarize, we've talked about two different attributes of a variable: visibility and duration. These are orthogonal, as shown in this table:

                          duration:
visibility:     automatic                  static
local           normal local variables     static local variables
global          N/A                        normal global variables

We can also distinguish between file-scope global variables and truly global variables, based on the presence or absence of the static keyword.

We can also distinguish between external declarations and defining instances of global variables, based on the presence or absence of the extern keyword.

4.3 Default Initialization


The duration of a variable (whether static or automatic) also affects its default initialization.

If you do not explicitly initialize them, automatic-duration variables (that is, local, non-static ones) are not guaranteed to have any particular initial value; they will typically contain garbage. It is therefore a fairly serious error to attempt to use the value of an automatic variable which has never been initialized or assigned to: the program will either work incorrectly, or the garbage value may just happen to be ``correct'' such that the program appears to work correctly! However, the particular value that the garbage takes on can vary depending literally on anything: other parts of the program, which compiler was used, which hardware or operating system the program is running on, the time of day, the phase of the moon. (Okay, maybe the phase of the moon is a bit of an exaggeration.) So you hardly want to say that a program which uses an uninitialized variable ``works''; it may seem to work, but it works for the wrong reason, and it may stop working tomorrow.

Static-duration variables (global and static local), on the other hand, are guaranteed to be initialized to 0 if you do not use an explicit initializer in the definition.

(Once upon a time, there was another distinction between the initialization of automatic vs. static variables: you could initialize aggregate objects, such as arrays, only if they had static duration. If your compiler complains when you try to initialize a local array, it's probably an old, pre-ANSI compiler. Modern, ANSI-compatible compilers remove this limitation, so it's no longer much of a concern.)

4.4 Examples
Here is an example demonstrating almost everything we've seen so far:
int globalvar = 1;
extern int anotherglobalvar;
static int privatevar;

void f()
{
    int localvar;
    int localvar2 = 2;
    static int persistentvar;
}

Here we have six variables, three declared outside and three declared inside of the function f().

globalvar is a global variable. The declaration we see is its defining instance (it happens also to include an initial value). globalvar can be used anywhere in this source file, and it could be used in other source files, too (as long as corresponding external declarations are issued in those other source files).

anotherglobalvar is a second global variable. It is not defined here; the defining instance for it (and its initialization) is somewhere else.

privatevar is a ``private'' global variable. It can be used anywhere within this source file, but functions in other source files cannot access it, even if they try to issue external declarations for it. (If other source files try to declare a global variable called ``privatevar'', they'll get their own; they won't be sharing this one.) Since it has static duration and receives no explicit initialization, privatevar will be initialized to 0.

localvar is a local variable within the function f(). It can be accessed only within the function f(). (If any other part of the program declares a variable named ``localvar'', that variable will be distinct from the one we're looking at here.) localvar is conceptually ``created'' each time f() is called, and disappears when f() returns. Any value which was stored in localvar last time f() was running will be lost and will not be available next time f() is called. Furthermore, since it has no explicit initializer, the value of localvar will in general be garbage each time f() is called.

localvar2 is also local, and everything that we said about localvar applies to it, except that since its declaration includes an explicit initializer, it will be initialized to 2 each time f() is called.

Finally, persistentvar is again local to f(), but it does maintain its value between calls to f(). It has static duration but no explicit initializer, so its initial value will be 0.

The defining instances and external declarations we've been looking at so far have all been of simple variables. There are also defining instances and external declarations of functions, which we'll be looking at in the next chapter.

(Also, don't worry about static variables for now if they don't make sense to you; they're a relatively sophisticated concept, which you won't need to use at first.)

The term declaration is a general one which encompasses defining instances and external declarations; defining instances and external declarations are two different kinds of declarations. Furthermore, either kind of declaration suffices to inform the compiler of the name and type of a particular variable (or function). If you have the defining instance of a global variable in a source file, the rest of that source file can use that variable without having to issue any external declarations. It's only in source files where the defining instance hasn't been seen that you need external declarations.

You will sometimes hear a defining instance referred to simply as a ``definition,'' and you will sometimes hear an external declaration referred to simply as a ``declaration.'' These usages are mildly ambiguous, in that you can't tell out of context whether a ``declaration'' is a generic declaration (that might be a defining instance or an external declaration) or whether it's an external declaration that specifically is not a defining instance. (Similarly, there are other constructions that can be called ``definitions'' in C, namely the definitions of preprocessor macros, structures, and typedefs, none of which we've met.) In these notes, we'll try to make things clear by using the unambiguous terms defining instance and external declaration. Elsewhere, you may have to look at the context to determine how the terms ``definition'' and ``declaration'' are being used.

Chapter 5: Functions and Program Structure


[This chapter corresponds to K&R chapter 4.]

A function is a ``black box'' that we've locked part of our program into. The idea behind a function is that it compartmentalizes part of the program, and in particular, that the code within the function has some useful properties:

1. It performs some well-defined task, which will be useful to other parts of the program.
2. It might be useful to other programs as well; that is, we might be able to reuse it (and without having to rewrite it).
3. The rest of the program doesn't have to know the details of how the function is implemented. This can make the rest of the program easier to think about.
4. The function performs its task well. It may be written to do a little more than is required by the first program that calls it, with the anticipation that the calling program (or some other program) may later need the extra functionality or improved performance. (It's important that a finished function do its job well, otherwise there might be a reluctance to call it, and it therefore might not achieve the goal of reusability.)
5. By placing the code to perform the useful task into a function, and simply calling the function in the other parts of the program where the task must be performed, the rest of the program becomes clearer: rather than having some large, complicated, difficult-to-understand piece of code repeated wherever the task is being performed, we have a single simple function call, and the name of the function reminds us which task is being performed.
6. Since the rest of the program doesn't have to know the details of how the function is implemented, the rest of the program doesn't care if the function is reimplemented later, in some different way (as long as it continues to perform its same task, of course!). This means that one part of the program can be rewritten, to improve performance or add a new feature (or simply to fix a bug), without having to rewrite the rest of the program.

Functions are probably the most important weapon in our battle against software complexity. You'll want to learn when it's appropriate to break processing out into functions (and also when it's not), and how to set up function interfaces to best achieve the qualities mentioned above: reusability, information hiding, clarity, and maintainability.

5.1 Function Basics


So what defines a function? It has a name that you call it by, and a list of zero or more arguments or parameters that you hand to it for it to act on or to direct its work; it has a body containing the actual instructions (statements) for carrying out the task the function is supposed to perform; and it may give you back a return value, of a particular type.

Here is a very simple function, which accepts one argument, multiplies it by 2, and hands that value back:
int multbytwo(int x)
{
    int retval;
    retval = x * 2;
    return retval;
}

On the first line we see the return type of the function (int), the name of the function (multbytwo), and a list of the function's arguments, enclosed in parentheses. Each argument has both a name and a type; multbytwo accepts one argument, of type int, named x. The name x is arbitrary, and is used only within the definition of multbytwo. The caller of this function only needs to know that a single argument of type int is expected; the caller does not need to know what name the function will use internally to refer to that argument. (In particular, the caller does not have to pass the value of a variable named x.)

Next we see, surrounded by the familiar braces, the body of the function itself. This function consists of one declaration (of a local variable retval) and two statements. The first statement is a conventional expression statement, which computes and assigns a value to retval, and the second statement is a return statement, which causes the function to return to its caller, and also specifies the value which the function returns to its caller.

The return statement can return the value of any expression, so we don't really need the local retval variable; the function could be collapsed to

	int multbytwo(int x)
	{
		return x * 2;
	}

How do we call a function? We've been doing so informally since day one, but now we have a chance to call one that we've written, in full detail. Here is a tiny skeletal program to call multbytwo:

	#include <stdio.h>

	extern int multbytwo(int);

	int main()
	{
		int i, j;
		i = 3;
		j = multbytwo(i);
		printf("%d\n", j);
		return 0;
	}

This looks much like our other test programs, with the exception of the new line

	extern int multbytwo(int);

This is an external function prototype declaration. It is an external declaration, in that it declares something which is defined somewhere else. (We've already seen the defining instance of the function multbytwo, but maybe the compiler hasn't seen it yet.) The function prototype declaration contains the three pieces of information about the function that a caller needs to know: the function's name, return type, and argument type(s). Since we don't care what name the multbytwo function will use to refer to its first argument, we don't need to mention it. (On the other hand, if a function takes several arguments, giving them names in the prototype may make it easier to remember which is which, so names may optionally be used in function prototype declarations.) Finally, to remind us that this is an external declaration and not a defining instance, the prototype is preceded by the keyword extern.

The presence of the function prototype declaration lets the compiler know that we intend to call this function, multbytwo. The information in the prototype lets the compiler generate the correct code for calling the function, and also enables the compiler to check up on our code (by making sure, for example, that we pass the correct number of arguments to each function we call).

Down in the body of main, the action of the function call should be obvious: the line

	j = multbytwo(i);

calls multbytwo, passing it the value of i as its argument. When multbytwo returns, the return value is assigned to the variable j. (Notice that the value of main's local variable i will become the value of multbytwo's parameter x; this is absolutely not a problem, and is a normal sort of affair.)

This example is written out in ``longhand,'' to make each step explicit. The variable i isn't really needed, since we could just as well call

	j = multbytwo(3);

And the variable j isn't really needed, either, since we could just as well call

	printf("%d\n", multbytwo(3));

Here, the call to multbytwo is a subexpression which serves as the second argument to printf. The value returned by multbytwo is passed immediately to printf. (Here, as in general, we see the flexibility and generality of expressions in C. An argument passed to a function may be an arbitrarily complex subexpression, and a function call is itself an expression which may be embedded as a subexpression within arbitrarily complicated surrounding expressions.)

We should say a little more about the mechanism by which an argument is passed down from a caller into a function. Formally, C is call by value, which means that a function receives copies of the values of its arguments. We can illustrate this with an example. Suppose, in our implementation of multbytwo, we had gotten rid of the unnecessary retval variable like this:

	int multbytwo(int x)
	{
		x = x * 2;
		return x;
	}

We might wonder, if we wrote it this way, what would happen to the value of the variable i when we called

	j = multbytwo(i);

When our implementation of multbytwo changes the value of x, does that change the value of i up in the caller? The answer is no. x receives a copy of i's value, so when we change x we don't change i.

However, there is an exception to this rule. When the argument you pass to a function is not a single variable, but is rather an array, the function does not receive a copy of the array, and it therefore can modify the array in the caller. The reason is that it might be too expensive to copy the entire array, and furthermore, it can be useful for the function to write into the caller's array, as a way of handing back more data than would fit in the function's single return value. We'll see an example of an array argument (which the function deliberately writes into) in the next chapter.
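The difference between the two cases can be sketched with a pair of small functions. This is only an illustration: the names try_to_change, double_all, and show_call_by_value are hypothetical and do not appear elsewhere in these notes.

```c
#include <stdio.h>

/* Hypothetical helper: x is a copy of the caller's value,
   so this assignment is invisible to the caller. */
void try_to_change(int x)
{
	x = x * 2;
}

/* Hypothetical helper: a[] refers to the caller's array,
   so these assignments modify the caller's data. */
void double_all(int a[], int n)
{
	int i;
	for(i = 0; i < n; i = i + 1)
		a[i] = a[i] * 2;
}

/* Hypothetical demonstration routine: calls both helpers
   and prints what the caller sees afterwards. */
void show_call_by_value(void)
{
	int i = 3;
	int arr[3] = {1, 2, 3};

	try_to_change(i);	/* i is still 3 afterwards */
	double_all(arr, 3);	/* arr now holds 2, 4, 6 */

	printf("i = %d\n", i);
	printf("arr = %d %d %d\n", arr[0], arr[1], arr[2]);
}
```

Calling show_call_by_value() from main should show i unchanged and every element of arr doubled.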

5.2 Function Prototypes

In modern C programming, it is considered good practice to use prototype declarations for all functions that you call. As we mentioned, these prototypes help to ensure that the compiler can generate correct code for calling the functions, as well as allowing the compiler to catch certain mistakes you might make.

Strictly speaking, however, prototypes are optional in the original ANSI C. (C99 and later require at least a declaration to be in scope before a function is called.) If you call a function for which the compiler has not seen a prototype, the compiler will do the best it can, assuming that you're calling the function correctly.

If prototypes are a good idea, and if we're going to get in the habit of writing function prototype declarations for functions we call that we've written (such as multbytwo), what happens for library functions such as printf? Where are their prototypes? The answer is in that boilerplate line
#include <stdio.h>

we've been including at the top of all of our programs. stdio.h is conceptually a file full of external declarations and other information pertaining to the ``Standard I/O'' library functions, including printf. The #include directive (which we'll meet formally in a later chapter) arranges that all of the declarations within stdio.h are considered by the compiler, rather as if we'd typed them all in ourselves. Somewhere within these declarations is an external function prototype declaration for printf, which satisfies the rule that there should be a prototype for each function we call. (For other standard library functions we call, there will be other ``header files'' to include.)

Finally, one more thing about external function prototype declarations. We've said that the distinction between external declarations and defining instances of normal variables hinges on the presence or absence of the keyword extern. The situation is a little bit different for functions. The ``defining instance'' of a function is the function, including its body (that is, the brace-enclosed list of declarations and statements implementing the function). An external declaration of a function, even without the keyword extern, looks nothing like a function definition. Therefore, the keyword extern is optional in function prototype declarations. If you wish, you can write

	int multbytwo(int);

and this is just as good an external function prototype declaration as

	extern int multbytwo(int);

(In the first form, without the extern, as soon as the compiler sees the semicolon, it knows it's not going to see a function body, so the declaration can't be a definition.) You may want to stay in the habit of using extern in all external declarations, including function declarations, since ``extern = external declaration'' is an easier rule to remember.

5.3 Function Philosophy

What makes a good function? The most important aspect of a good ``building block'' is that it have a single, well-defined task to perform. When you find that a program is hard to manage, it's often because it has not been designed and broken up into functions cleanly. Two obvious reasons for moving code down into a function are because:

1. It appeared in the main program several times, such that by making it a function, it can be written just once, and the several places where it used to appear can be replaced with calls to the new function.
2. The main program was getting too big, so it could be made (presumably) smaller and more manageable by lopping part of it off and making it a function.

These two reasons are important, and they represent significant benefits of well-chosen functions, but they are not sufficient to automatically identify a good function. As we've been suggesting, a good function has at least these two additional attributes:

3. It does just one well-defined task, and does it well.
4. Its interface to the rest of the program is clean and narrow.

Attribute 3 is just a restatement of two things we said above. Attribute 4 says that you shouldn't have to keep track of too many things when calling a function. If you know what a function is supposed to do, and if its task is simple and well-defined, there should be just a few pieces of information you have to give it to act upon, and one or just a few pieces of information which it returns to you when it's done. If you find yourself having to pass lots and lots of information to a function, or remember details of its internal implementation to make sure that it will work properly this time, it's often a sign that the function is not sufficiently well-defined. (A poorly-defined function may be an arbitrary chunk of code that was ripped out of a main program that was getting too big, such that it essentially has to have access to all of that main function's local variables.)

The whole point of breaking a program up into functions is so that you don't have to think about the entire program at once; ideally, you can think about just one function at a time. We say that a good function is a ``black box,'' which is supposed to suggest that the ``container'' it's in is opaque: callers can't see inside it (and the function inside can't see out). When you call a function, you only have to know what it does, not how it does it. When you're writing a function, you only have to know what it's supposed to do, and you don't have to know why or under what circumstances its caller will be calling it. (When designing a function, we should perhaps think about the callers just enough to ensure that the function we're designing will be easy to call, and that we aren't accidentally setting things up so that callers will have to think about any internal details.)
Some functions may be hard to write (if they have a hard job to do, or if it's hard to make them do it truly well), but that difficulty should be compartmentalized along with the function itself. Once you've written a ``hard'' function, you should be able to sit back and relax and watch it do that hard work on call from the rest of your program. It should be pleasant to notice (in the ideal case) how much easier the rest of the program is to write, now that the hard work can be deferred to this workhorse function.

(In fact, if a difficult-to-write function's interface is well-defined, you may be able to get away with writing a quick-and-dirty version of the function first, so that you can begin testing the rest of the program, and then go back later and rewrite the function to do the hard parts. As long as the function's original interface anticipated the hard parts, you won't have to rewrite the rest of the program when you fix the function.)

What I've been trying to say in the preceding few paragraphs is that functions are important for far more important reasons than just saving typing. Sometimes, we'll write a function which we only call once, just because breaking it out into a function makes things clearer and easier.

If you find that difficulties pervade a program, that the hard parts can't be buried inside black-box functions and then forgotten about; if you find that there are hard parts which involve complicated interactions among multiple functions, then the program probably needs redesigning.

For the purposes of explanation, we've been seeming to talk so far only about ``main programs'' and the functions they call and the rationale behind moving some piece of code down out of a ``main program'' into a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. Any function we find ourself writing will often be appropriately written in terms of sub-functions, sub-sub-functions, etc. (Furthermore, the ``main program,'' main(), is itself just a function.)

5.4 Separate Compilation--Logistics

When a program consists of many functions, it can be convenient to split them up into several source files. Among other things, this means that when a change is made, only the source file containing the change has to be recompiled, not the whole program.

The job of putting the pieces of a program together and producing the final executable falls to a tool called the linker. (We may or may not need to invoke the linker explicitly; a compiler often invokes it automatically, as needed.) The linker looks through all of the pieces making up the program, sorting out the external declarations and defining instances. The compiler has noted the definitions made by each source file, as well as the declarations of things used by each source file but (presumably) defined elsewhere. For each thing (global variable or function) used but not defined by one piece of the program, the linker looks for another piece which does define that thing.

The logistics of writing a program in several source files, and then compiling and linking all of the source files together, depend on the programming environment you're using. We'll cover two possibilities, depending on whether you're using a traditional command-line compiler or a newer integrated development environment (IDE) or other graphical user interface (GUI) compiler.

When using a command-line compiler, there are usually two main steps involved in building an executable program from one or more source files. First, each source file is compiled, resulting in an object file containing the machine instructions (generated by the compiler) corresponding to just the code in that source file. Second, the various object files are linked together, with each other and with libraries containing code for functions which you did not write (such as printf), to produce a final, executable program.

Under Unix, the cc command can perform one or both steps. So far, we've been using extremely simple invocations of cc such as

	cc -o hello hello.c

This invocation compiles a single source file, hello.c, links it, and places the executable in a file named hello.

Suppose we have a program which we're trying to build from three separate source files, x.c, y.c, and z.c. We could compile all three of them, and link them together, all at once, with the command

	cc -o myprog x.c y.c z.c

Alternatively, we could compile them separately: the -c option to cc tells it to compile only, but not to link. Instead of building an executable, it merely creates an object file, with a name ending in .o, for each source file compiled. So the three commands

	cc -c x.c
	cc -c y.c
	cc -c z.c

would compile x.c, y.c, and z.c and create object files x.o, y.o, and z.o. Then, the three object files could be linked together using

	cc -o myprog x.o y.o z.o

When the cc command is given an .o file, it knows that it does not have to compile it (it's an object file, already compiled); it just sends it through to the link process.

Above we mentioned that the second, linking step also involves pulling in library functions. Normally, the functions from the Standard C library are linked in automatically. Occasionally, you must request a library manually; one common situation under Unix is that the math functions tend to be in a separate math library, which is requested by using -lm on the command line. Since the libraries must typically be searched after your program's own object files are linked (so that the linker knows which library functions your program uses), any -l option must appear after the names of your files on the command line. For example, to link the object file mymath.o (previously compiled with cc -c mymath.c) together with the math library, you might use

	cc -o mymathprog mymath.o -lm

(The l in the -l option is the lower case ell, for library; it is not the digit 1.)

Everything we've said about cc also applies to most other Unix C compilers. (Many of you will be using gcc, the FSF's GNU C Compiler.)

There are command-line compilers for MS-DOS systems which work similarly. For example, the Microsoft C compiler comes with a CL (``compile and link'') command, which works almost the same as Unix cc. You can compile and link in one step:

	cl hello.c

or you can compile only:

	cl /c hello.c

creating an object file named hello.obj which you can link later.

The preceding has all been about command-line compilers. If you're using some kind of integrated development environment, such as Borland's Turbo C or the Microsoft Programmer's Workbench or Visual C or Think C or Codewarrior, most of the mechanical details are taken care of for you. (There's also less I can say here about these environments, because they're all different.) Typically you define a ``project,'' and there's a way to specify the list of files (modules) which make up your project. The modules might be source files which you typed in or obtained elsewhere, or they might be source files which you created within the environment (perhaps by requesting a ``New source file,'' and typing it in). Typically, the programming environment has a single ``build'' button which does whatever's required to build (and perhaps even execute) your program. There may also be configuration windows in which you can specify compiler options (such as whether you'd like it to accept C or C++). ``See your manual for details.''
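Whichever environment does the building, what the linker has to match up is the same: external declarations in one source file against defining instances in another. The following sketch shows the two halves in a single listing so that it compiles as-is; in a real program each half would sit in its own source file and be compiled separately. The file names counter.c and user.c and the names counter, bump, and use_counter are all hypothetical.

```c
/* ---- defining instances: these would live in one source file,
        say a hypothetical counter.c ---- */

int counter = 0;		/* defining instance of a global variable */

int bump(int by)		/* defining instance of a function */
{
	counter = counter + by;
	return counter;
}

/* ---- external declarations: these would live in another source
        file, say a hypothetical user.c, which uses counter and
        bump but does not define them ---- */

extern int counter;
extern int bump(int);

int use_counter(void)
{
	bump(2);
	bump(3);
	return counter;		/* 5, if counter started at 0 */
}
```

If the two halves were split into separate files, the compiler would accept user.c on the strength of the extern declarations alone, and it would be the linker's job to find the definitions of counter and bump in the object file built from counter.c.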

Chapter 6: Basic I/O

So far, we've been using printf to do output, and we haven't had a way of doing any input. In this chapter, we'll learn a bit more about printf, and we'll begin learning about character-based input and output.

6.1 printf

printf's name comes from print formatted. It generates output under the control of a format string (its first argument) which consists of literal characters to be printed and also special character sequences--format specifiers--which request that other arguments be fetched, formatted, and inserted into the string. Our very first program was nothing more than a call to printf, printing a constant string:
printf("Hello, world!\n");

Our second program also featured a call to printf:

	printf("i is %d\n", i);

In that case, whenever printf ``printed'' the string "i is %d", it did not print it verbatim; it replaced the two characters %d with the value of the variable i.

There are quite a number of format specifiers for printf. Here are the basic ones:
	%d	print an int argument in decimal
	%ld	print a long int argument in decimal
	%c	print a character
	%s	print a string
	%f	print a float or double argument
	%e	same as %f, but use exponential notation
	%g	use %e or %f, whichever is better
	%o	print an int argument in octal (base 8)
	%x	print an int argument in hexadecimal (base 16)
	%%	print a single %

It is also possible to specify the width and precision of numbers and strings as they are inserted (somewhat like FORTRAN format statements); we'll present those details in a later chapter. (Very briefly, for those who are curious: a notation like %3d means to print an int in a field at least 3 spaces wide; a notation like %5.2f means to print a float or double in a field at least 5 spaces wide, with two places to the right of the decimal.)

To illustrate with a few more examples: the call

	printf("%c %d %f %e %s %d%%\n", '1', 2, 3.14, 56000000., "eight", 9);

would print

	1 2 3.140000 5.600000e+07 eight 9%

The call

	printf("%d %o %x\n", 100, 100, 100);

would print

	100 144 64

Successive calls to printf just build up the output a piece at a time, so the calls

	printf("Hello, ");
	printf("world!\n");

would also print Hello, world! (on one line of output).
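The width and precision notations mentioned briefly above (%3d and %5.2f) can be tried out with a small sketch. It assumes a C99 library, where snprintf is available to format into a character array instead of printing directly; format_fields is a hypothetical name, and the square brackets in the format string are there only to make the field widths visible.

```c
#include <stdio.h>

/* Hypothetical helper: formats n in a field at least 3 wide, and d in
   a field at least 5 wide with 2 places after the decimal point.
   The text is written into buf (at most size bytes, including the
   terminating '\0'); the return value is the length of the text. */
int format_fields(char buf[], size_t size, int n, double d)
{
	return snprintf(buf, size, "[%3d] [%5.2f]", n, d);
}
```

With n equal to 7 and d equal to 3.14159, buf receives "[  7] [ 3.14]": the 7 is padded on the left with two spaces, and 3.14159 is rounded to two places and padded with one space.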

Earlier we learned that C represents characters internally as small integers corresponding to the characters' values in the machine's character set (typically ASCII). This means that there isn't really much difference between a character and an integer in C; most of the difference is in whether we choose to interpret an integer as an integer or a character. printf is one place where we get to make that choice: %d prints an integer value as a string of digits representing its decimal value, while %c prints the character corresponding to a character set value. So the lines

	char c = 'A';
	int i = 97;
	printf("c = %c, i = %d\n", c, i);

would print c as the character A and i as the number 97. But if, on the other hand, we called

	printf("c = %d, i = %c\n", c, i);

we'd see the decimal value (printed by %d) of the character 'A', followed by the character (whatever it is) which happens to have the decimal value 97.

You have to be careful when calling printf. It has no way of knowing how many arguments you've passed it or what their types are other than by looking for the format specifiers in the format string. If there are more format specifiers (that is, more % signs) than there are arguments, or if the arguments have the wrong types for the format specifiers, printf can misbehave badly, often printing nonsense numbers or (even worse) numbers which mislead you into thinking that some other part of your program is broken.

Because of some automatic conversion rules which we haven't covered yet, you have a small amount of latitude in the types of the expressions you pass as arguments to printf. The argument for %c may be of type char or int, and the argument for %d may be of type char or int. The string argument for %s may be a string constant, an array of characters, or a pointer to some characters (though we haven't really covered strings or pointers yet). Finally, the arguments corresponding to %e, %f, and %g may be of types float or double. But other combinations do not work reliably: %d will not print a long int or a float or a double; %ld will not print an int; %e, %f, and %g will not print an int.
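One way to stay out of trouble with the combinations that do not work is to make each argument's type match its specifier explicitly. The sketch below is an illustration only: format_matched is a hypothetical name, snprintf assumes a C99 library, and the (double) cast is an explicit conversion of a kind these notes have not yet introduced.

```c
#include <stdio.h>

/* Hypothetical helper: formats a long and an int into buf.
   big is a long int, so it is paired with %ld rather than %d.
   count is an int, which %f cannot print, so it is converted
   to double before being passed. */
int format_matched(char buf[], size_t size, long big, int count)
{
	return snprintf(buf, size, "%ld %f", big, (double)count);
}
```

Called with 100000L and 3, this fills buf with "100000 3.000000". Passing count to %f without the conversion would be one of the mismatches described above.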

6.2 Character Input and Output

[This section corresponds to K&R Sec. 1.5]

Unless a program can read some input, it's hard to keep it from doing exactly the same thing every time it's run, and thus being rather boring after a while.

The most basic way of reading input is by calling the function getchar. getchar reads one character from the ``standard input,'' which is usually the user's keyboard, but which can sometimes be redirected by the operating system. getchar returns (rather obviously) the character it reads, or, if there are no more characters available, the special value EOF (``end of file'').

A companion function is putchar, which writes one character to the ``standard output.'' (The standard output is, again not surprisingly, usually the user's screen, although it, too, can be redirected. printf, like putchar, prints to the standard output; in fact, you can imagine that printf calls putchar to actually print each of the characters it formats.)

Using these two functions, we can write a very basic program to copy the input, a character at a time, to the output:

	#include <stdio.h>

	/* copy input to output */

	int main()
	{
		int c;

		c = getchar();

		while(c != EOF)
			{
			putchar(c);
			c = getchar();
			}

		return 0;
	}

This code is straightforward, and I encourage you to type it in and try it out. It reads one character, and if it is not the EOF code, enters a while loop, printing one character and reading another, as long as the character read is not EOF. This is a straightforward loop, although there's one mystery surrounding the declaration of the variable c: if it holds characters, why is it an int?

We said that a char variable could hold integers corresponding to character set values, and that an int could hold integers of more arbitrary values (at least ±32767). Since most character sets contain a few hundred characters (nowhere near 32767), an int variable can in general comfortably hold all char values, and then some. Therefore, there's nothing wrong with declaring c as an int. But in fact, it's important to do so, because getchar can return every character value, plus that special, non-character value EOF, indicating that there are no more characters. Type char is only guaranteed to be able to hold all the character values; it is not guaranteed to be able to hold this ``no more characters'' value without possibly mixing it up with some actual character value. (It's like trying to cram five pounds of books into a four-pound box, or 13 eggs into a carton that holds a dozen.) Therefore, you should always remember to use an int for anything you assign getchar's return value to.

When you run the character copying program, and it begins copying its input (your typing) to its output (your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file (EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you may have to do some research to learn how to send EOF.

(Note, too, that the character you type to generate an end-of-file condition from the keyboard is not the same as the special EOF value returned by getchar. The EOF value returned by getchar is a code indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or a file or a magnetic tape or a network connection or anything else. In a disk file, at least, there is not likely to be any character in the file corresponding to EOF; as far as your program is concerned, EOF indicates the absence of any more characters to read.)

Another excellent thing to know when doing any kind of programming is how to terminate a runaway program. If a program is running forever waiting for input, you can usually stop it by sending it an end-of-file, as above, but if it's running forever not waiting for something, you'll have to take more drastic measures. Under Unix, control-C (or, occasionally, the DELETE key) will terminate the current program, almost no matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the current program, but by default MS-DOS only checks for control-C when it's looking for input, so an infinite loop can be unkillable. There's a DOS command,
	break on

which tells DOS to look for control-C more often, and I recommend using this command if you're doing any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.

Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type a character, and see it on the screen right away, and assume it's your program working, but it's only your computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters is made available to your program. It then zips several times through its loop, reading and printing all the characters in the line in quick succession. In other words, when you run this program, it will probably seem to copy the input a line at a time, rather than a character at a time. You may wonder how a program could instead read a character right away, without waiting for the user to hit RETURN. That's an excellent question, but unfortunately the answer is rather complicated, and beyond the scope of our discussion here. (Among other things, how to read a character right away is one of the things that's not defined by the C language, and it's not defined by any of the standard library functions, either. How to do it depends on which operating system you're using.)

Stylistically, the character-copying program above can be said to have one minor flaw: it contains two calls to getchar, one which reads the first character and one which reads (by virtue of the fact that it's in the body of the loop) all the other characters. This seems inelegant and perhaps unnecessary, and it can also be risky: if there were more things going on within the loop, and if we ever changed the way we read characters, it would be easy to change one of the getchar calls but forget to change the other one. Is there a way to rewrite the loop so that there is only one call to getchar, responsible for reading all the characters? Is there a way to read a character, test it for EOF, and assign it to the variable c, all at the same time? There is. It relies on the fact that the assignment operator, =, is just another operator in C. An assignment is not (necessarily) a standalone statement; it is an expression, and it has a value (the value that's assigned to the variable on the left-hand side), and it can therefore participate in a larger, surrounding expression. Therefore, most C programmers would write the character-copying loop like this:
	while((c = getchar()) != EOF)
		putchar(c);

What does this mean? The function getchar is called, as before, and its return value is assigned to the variable c. Then the value is immediately compared against the value EOF. Finally, the true/false value of the comparison controls the while loop: as long as the value is not EOF, the loop continues executing, but as soon as an EOF is received, no more trips through the loop are taken, and it exits. The net result is that the call to getchar happens inside the test at the top of the while loop, and doesn't have to be repeated before the loop and within the loop (more on this in a bit).

Stated another way, the syntax of a while loop is always

	while( expression ) ...

A comparison (using the != operator) is of course an expression; the syntax is

	expression != expression

And an assignment is an expression; the syntax is

	expression = expression

What we're seeing is just another example of the fact that expressions can be combined with essentially limitless generality and therefore infinite variety. The left-hand side of the != operator (its first expression) is the (sub)expression c = getchar(), and the combined expression is the expression needed by the while loop.

The extra parentheses around

	(c = getchar())

are important, and are there because the precedence of the != operator is higher than that of the = operator. If we (incorrectly) wrote

	while(c = getchar() != EOF)		/* WRONG */

the compiler would interpret it as

	while(c = (getchar() != EOF))

That is, it would assign the result of the != operator to the variable c, which is not what we want.

(``Precedence'' refers to the rules for which operators are applied to their operands in which order, that is, to the rules controlling the default grouping of expressions and subexpressions. For example, the multiplication operator * has higher precedence than the addition operator +, which means that the expression a + b * c is parsed as a + (b * c). We'll have more to say about precedence later.)

The line

	while((c = getchar()) != EOF)

epitomizes the cryptic brevity which C is notorious for. You may find this terseness infuriating (and you're not alone!), and it can certainly be carried too far, but bear with me for a moment while I defend it.

The simple example we've been discussing illustrates the tradeoffs well. We have four things to do:

1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it out again).

8e can't eliminate any of these ste's. 8e have to assign %etchar's value to a variable (we can't 0ust use it directly) because we have to do two different things with it (test, and 'rint). 1herefore, com'ressing the assignment and test into the same line is the only good way of avoiding two distinct calls to %etchar. .ou may not agree that the com'ressed idiom is better for being more com'act or easier to read, but the fact that there is now only one call to %etchar is a real virtue. /on't thin( that you'll have to write com'ressed lines li(e
while((c ) %etchar()) !) >LH)

right away, or in order to be an $$e!'ert C 'rogrammer.'' 7ut, for better or worse, most e!'erienced C 'rogrammers do li(e to use these idioms (whether they're 0ustified or not), so you'll need to be able to at least recogni2e and understand them when you're reading other 'eo'les' code.
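The tradeoff discussed in this section can be made concrete with a small sketch. So that it can be tried on any stream, it uses getc(fp) in place of getchar() (getchar() reads from standard input, so passing stdin gives the behavior described in the text). The function names are made up for this illustration. The first function repeats the call before the loop and within it, the second uses the compressed idiom, and the third spells out the grouping the compiler chooses when the inner parentheses are left off.

```c
#include <stdio.h>

/* Longhand form: the read appears twice, before the loop and within it. */
int count_two_calls(FILE *fp)
{
	int n = 0;
	int c = getc(fp);
	while(c != EOF) {
		n = n + 1;
		c = getc(fp);
	}
	return n;
}

/* Compressed idiom: one read, assigned and tested in the loop condition. */
int count_one_call(FILE *fp)
{
	int n = 0;
	int c;
	while((c = getc(fp)) != EOF)
		n = n + 1;
	return n;
}

/* The grouping used when the inner parentheses are omitted, written out
   explicitly.  c receives the result of the comparison (1 or 0), not the
   character that was read.  Returns the last value c had inside the loop,
   or -1 if the loop body never ran. */
int last_value_missing_parens(FILE *fp)
{
	int c;
	int last = -1;
	while((c = (getc(fp) != EOF)))
		last = c;
	return last;
}
```

Both counting functions return the same count for the same input; the third shows why the misparenthesized loop still terminates at end-of-file but never sees the characters themselves.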

6.3 Reading Lines

It's often convenient for a program to process its input not a character at a time but rather a line at a time, that is, to read an entire line of input and then act on it all at once. The standard C library has a couple of functions for reading lines, but they have a few awkward features, so we're going to learn more about character input (and about writing functions in general) by writing our own function to read one line. Here it is:

	#include <stdio.h>

	/* Read one line from standard input, */
	/* copying it to line array (but no more than max chars). */
	/* Does not place terminating \n in line array. */
	/* Returns line length, or 0 for empty line, or EOF for end-of-file. */

	int getline(char line[], int max)
	{
		int nch = 0;
		int c;
		max = max - 1;			/* leave room for '\0' */

		while((c = getchar()) != EOF) {
			if(c == '\n')
				break;

			if(nch < max) {
				line[nch] = c;
				nch = nch + 1;
			}
		}

		if(c == EOF && nch == 0)
			return EOF;

		line[nch] = '\0';
		return nch;
	}

As the comment indicates, this function will read one line of input from the standard input, placing it into the line array. The size of the line array is given by the max argument; the function will never write more than max characters into line.

The main body of the function is a getchar loop, much as we used in the character-copying program. In the body of this loop, however, we're storing the characters in an array (rather than immediately printing them out). Also, we're only reading one line of characters, then stopping and returning.

There are several new things to notice here.

First of all, the getline function accepts an array as a parameter. As we've said, array parameters are an exception to the rule that functions receive copies of their arguments--in the case of arrays, the function does have access to the actual array passed by the caller, and can modify it. Since the function is accessing the caller's array, not creating a new one to hold a copy, the function does not have to declare the argument array's size; it's set by the caller. (Thus, the brackets in ``char line[]'' are empty.) However, so that we won't overflow the caller's array by reading too long a line into it, we allow the caller to pass along the size of the array, which we promise not to exceed.

Second, we see an example of the break statement. The top of the loop looks like our earlier character-copying loop--it stops when it reaches EOF--but we only want this loop to read one line, so we also stop (that is, break out of the loop) when we see the \n character signifying end-of-line. An equivalent loop, without the break statement, would be

	while((c = getchar()) != EOF && c != '\n') {
		if(nch < max) {
			line[nch] = c;
			nch = nch + 1;
		}
	}

We haven't learned about the internal representation of strings yet, but it turns out that strings in C are simply arrays of characters, which is why we are reading the line into an array of characters. The end of a string is marked by the special character, '\0'. To make sure that there's always room for that character, on our way in we subtract 1 from max, the argument that tells us how many characters we may place in the line array. When we're done reading the line, we store the end-of-string character '\0' at the end of the string we've just built in the line array.

Finally, there's one subtlety in the code which isn't too important for our purposes now but which you may wonder about: it's arranged to handle the possibility that a few characters (i.e. the apparent beginning of a line) are read, followed immediately by an EOF, without the usual \n end-of-line character. (That's why we return EOF only if we received EOF and we hadn't read any characters first.)

In any case, the function returns the length (number of characters) of the line it read, not including the \n. (Therefore, it returns 0 for an empty line.) Like getchar, it returns EOF when there are no more lines to read. (It happens that EOF is a negative number, so it will never match the length of a line that getline has read.)

Here is an example of a test program which calls getline, reading the input a line at a time and then printing each line back out:

	#include <stdio.h>

	extern int getline(char [], int);

	int main()
	{
		char line[256];

		while(getline(line, 256) != EOF)
			printf("you typed \"%s\"\n", line);

		return 0;
	}

The notation char [] in the function prototype for getline says that getline accepts as its first argument an array of char. When the program calls getline, it is careful to pass along the actual size of the array. (You might notice a potential problem: since the number 256 appears in two places, if we ever decide that 256 is too small, and that we want to be able to read longer lines, we could easily change one of the instances of 256, and forget to change the other one. Later we'll learn ways of solving--that is, avoiding--this sort of problem.)

6.4 Reading Numbers

The getline function of the previous section reads one line from the user, as a string. What if we want to read a number? One straightforward way is to read a string as before, and then immediately convert the string to a number. The standard C library contains a number of functions for doing this. The simplest to use are atoi(), which converts a string to an integer, and atof(), which converts a string to a floating-point number. (Both of these functions are declared in the header <stdlib.h>, so you should #include that header at the top of any file using these functions.) You could read an integer from the user like this:

	#include <stdlib.h>

	char line[256];
	int n;
	printf("Type an integer:\n");
	getline(line, 256);
	n = atoi(line);

Now the variable n contains the number typed by the user. (This assumes that the user did type a valid number, and that getline did not return EOF.)

Reading a floating-point number is similar:

	#include <stdlib.h>

	char line[256];
	double x;
	printf("Type a floating-point number:\n");
	getline(line, 256);
	x = atof(line);

(atof is actually declared as returning type double, but you could also use it with a variable of type float, because in general, C automatically converts between float and double as needed.)

Another way of reading in numbers, which you're likely to see in other books on C, involves the scanf function, but it has several problems, so we won't discuss it for now. (Superficially, scanf seems simple enough, which is why it's often used, especially in textbooks. The trouble is that to perform input reliably using scanf is not nearly as easy as it looks, especially when you're not sure what the user is going to type.)
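The read-a-line-then-convert pattern of sections 6.3 and 6.4 can be sketched as a pair of functions. Two things here are assumptions rather than part of the original notes: the line reader is named fgetline and takes a stream argument (many current systems already declare a different, POSIX function called getline in <stdio.h>, so reusing that name may not compile there), and read_int is a hypothetical wrapper that checks for end-of-file before converting.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same logic as the getline function of section 6.3, but reading from the
   stream fp instead of standard input.  Pass stdin to read from the user.
   (The name fgetline is made up for this sketch.) */
int fgetline(FILE *fp, char line[], int max)
{
	int nch = 0;
	int c;
	max = max - 1;			/* leave room for '\0' */

	while((c = getc(fp)) != EOF) {
		if(c == '\n')
			break;
		if(nch < max) {
			line[nch] = c;
			nch = nch + 1;
		}
	}

	if(c == EOF && nch == 0)
		return EOF;

	line[nch] = '\0';
	return nch;
}

/* Hypothetical wrapper: reads one line and converts it with atoi.
   Returns 1 and stores the number in result[0] if a line was read,
   or 0 at end-of-file.  Like atoi itself, it cannot tell the input "0"
   from input that is not a number at all; both produce 0. */
int read_int(FILE *fp, int result[])
{
	char line[256];

	if(fgetline(fp, line, 256) == EOF)
		return 0;

	result[0] = atoi(line);
	return 1;
}
```

A caller would write something like read_int(stdin, n) with n declared as int n[1]. If invalid input has to be detected, the standard function strtol (also declared in <stdlib.h>) reports where the conversion stopped, which atoi does not.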


Chapter 7: More Operators

In this chapter we'll meet some (though still not all) of C's more advanced arithmetic operators. The ones we'll meet here have to do with making common patterns of operations easier.

It's extremely common in programming to have to increment a variable by 1, that is, to add 1 to it. (For example, if you're processing each element of an array, you'll typically write a loop with an index or pointer variable stepping through the elements of the array, and you'll increment the variable each time through the loop.) The classic way to increment a variable is with an assignment like

	i = i + 1

Such an assignment is perfectly common and acceptable, but it has a few slight problems:

1. As we've mentioned, it looks a little odd, especially from an algebraic perspective.

2. If the object being incremented is not a simple variable, the idiom can become cumbersome to type, and correspondingly more error-prone. For example, the expression

	a[i+j+2*k] = a[i+j+2*k] + 1

   is a bit of a mess, and you may have to look closely to see that the similar-looking expression

	a[i+j+2*k] = a[i+j+2+k] + 1

   probably has a mistake in it.

3. Since incrementing things is so common, it might be nice to have an easier way of doing it.

In fact, C provides not one but two other, simpler ways of incrementing variables and performing other similar operations.

7.1 Assignment Operators


[This section corresponds to K&R Sec. 2.10]

The first and more general way is that any time you have the pattern

	v = v op e

where v is any variable (or anything like a[i]), op is any of the binary arithmetic operators we've seen so far, and e is any expression, you can replace it with the simplified

	v op= e

For example, you can replace the expressions

	i = i + 1
	j = j - 10
	k = k * (n + 1)
	a[i] = a[i] / b

with

	i += 1
	j -= 10
	k *= n + 1
	a[i] /= b

In an example in a previous chapter, we used the assignment

	a[d1 + d2] = a[d1 + d2] + 1;

to count the rolls of a pair of dice. Using +=, we could simplify this expression to

	a[d1 + d2] += 1;

As these examples show, you can use the ``op='' form with any of the arithmetic operators (and with several other operators that we haven't seen yet). The expression, e, does not have to be the constant 1; it can be any expression. You don't always need as many explicit parentheses when using the op= operators: the expression

	k *= n + 1

is interpreted as

	k = k * (n + 1)
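As a quick check of the grouping rule just described, here is a minimal sketch. The function names are invented for illustration; the expressions inside them are the ones from the text.

```c
/* Returns the value k has after k *= n + 1.
   The whole right-hand side is evaluated first, so this is
   k = k * (n + 1), not k = k * n + 1. */
int times_n_plus_one(int k, int n)
{
	k *= n + 1;
	return k;
}

/* Records one roll of a pair of dice in the tally array a,
   which is indexed by the total showing on the two dice. */
void count_roll(int a[], int d1, int d2)
{
	a[d1 + d2] += 1;	/* same as a[d1 + d2] = a[d1 + d2] + 1 */
}
```

With k equal to 3 and n equal to 4, times_n_plus_one gives 15 (3 times 5), whereas the differently grouped k * n + 1 would give 13.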

7.2 Increment and Decrement Operators

[This section corresponds to K&R Sec. 2.8]

The assignment operators of the previous section let us replace v = v op e with v op= e, so that we didn't have to mention v twice. In the most common cases, namely when we're adding or subtracting the constant 1 (that is, when op is + or - and e is 1), C provides another set of shortcuts: the autoincrement and autodecrement operators. In their simplest forms, they look like this:

	++i		add 1 to i
	--j		subtract 1 from j

These correspond to the slightly longer i += 1 and j -= 1, respectively, and also to the fully ``longhand'' forms i = i + 1 and j = j - 1.

The ++ and -- operators apply to one operand (they're unary operators). The expression ++i adds 1 to i, and stores the incremented result back in i. This means that these operators don't just compute new values; they also modify the value of some variable. (They share this property--modifying some variable--with the assignment operators; we can say that these operators all have side effects. That is, they have some effect, on the side, other than just computing a new value.)

The incremented (or decremented) result is also made available to the rest of the expression, so an expression like

	k = 2 * ++i

means ``add one to i, store the result back in i, multiply it by 2, and store that result in k.'' (This is a pretty meaningless expression; our actual uses of ++ later will make more sense.)

Both the ++ and -- operators have an unusual property: they can be used in two ways, depending on whether they are written to the left or the right of the variable they're operating on. In either case, they increment or decrement the variable they're operating on; the difference concerns whether it's the old or the new value that's ``returned'' to the surrounding expression. The prefix form ++i increments i and returns the incremented value. The postfix form i++ increments i, but returns the prior, non-incremented value. Rewriting our previous example slightly, the expression

	k = 2 * i++

means ``take i's old value and multiply it by 2, increment i, store the result of the multiplication in k.''

The distinction between the prefix and postfix forms of ++ and -- will probably seem strained at first, but it will make more sense once we begin using these operators in more realistic situations.
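The two forms can be compared side by side with a small sketch. The function names and the two-element result array are assumptions made for this illustration: each function stores k in result[0] and the final value of i in result[1].

```c
/* Prefix form: i is incremented, and the NEW value takes part in the
   multiplication. */
void prefix_demo(int i, int result[])
{
	int k = 2 * ++i;
	result[0] = k;
	result[1] = i;
}

/* Postfix form: the OLD value of i takes part in the multiplication;
   i still ends up one larger. */
void postfix_demo(int i, int result[])
{
	int k = 2 * i++;
	result[0] = k;
	result[1] = i;
}
```

Starting from i equal to 5, the prefix version yields k equal to 12 and the postfix version yields k equal to 10; in both cases i finishes at 6.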

For example, our getline function of the previous chapter used the statements

	line[nch] = c;
	nch = nch + 1;

as the body of its inner loop. Using the ++ operator, we could simplify this to

	line[nch++] = c;

We wanted to increment nch after deciding which element of the line array to store into, so the postfix form nch++ is appropriate.

Notice that it only makes sense to apply the ++ and -- operators to variables (or to other ``containers,'' such as a[i]). It would be meaningless to say something like

	1++

or

	(2+3)++

The ++ operator doesn't just mean ``add one''; it means ``add one to a variable'' or ``make a variable's value one more than it was before.'' But (2+3) is not a variable, it's an expression; so there's no place for ++ to store the incremented result.

Another unfortunate example is

	i = i++;

which some confused programmers sometimes write, presumably because they want to be extra sure that i is incremented by 1. But i++ all by itself is sufficient to increment i by 1; the extra (explicit) assignment to i is unnecessary and in fact counterproductive, meaningless, and incorrect. If you want to increment i (that is, add one to it, and store the result back in i), either use

	i = i + 1;
or
	i += 1;
or
	++i;
or
	i++;

Don't try to use some bizarre combination.

Did it matter whether we used ++i or i++ in this last example? Remember, the difference between the two forms is what value (either the old or the new) is passed on to the surrounding expression. If there is no surrounding expression, if the ++i or i++ appears all by itself, to increment i and do nothing else, you can use either form; it makes no difference. (Two ways that an expression can appear ``all by itself,'' with ``no surrounding expression,'' are when it is an expression statement terminated by a semicolon, as above, or when it is one of the controlling expressions of a for loop.) For example, both the loops

	for(i = 0; i < 10; ++i)
		printf("%d\n", i);

and

	for(i = 0; i < 10; i++)
		printf("%d\n", i);

will behave exactly the same way and produce exactly the same results. (In real code, postfix increment is probably more common, though prefix definitely has its uses, too.)

In the preceding section, we simplified the expression

	a[d1 + d2] = a[d1 + d2] + 1;

from a previous chapter down to

	a[d1 + d2] += 1;

Using ++, we could simplify it still further to

	a[d1 + d2]++;

or

	++a[d1 + d2];

(Again, in this case, both are equivalent.)

We'll see more examples of these operators in the next section and in the next chapter.
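To see these operators in a slightly more realistic setting, here is a minimal sketch. The function names are hypothetical, and the first function copies from a string rather than reading input, so that it can be tried without typing anything. It stores characters with the line[nch++] = c idiom from this section; the other two confirm that ++i and i++ are interchangeable as the third expression of a for loop.

```c
/* Copies characters from src into line, storing at most max-1 of them
   and then a terminating '\0'.  Returns the number of characters stored. */
int copy_line(char line[], const char src[], int max)
{
	int nch = 0;
	int i;

	for(i = 0; src[i] != '\0' && nch < max - 1; i++)
		line[nch++] = src[i];	/* store at the old nch, then bump it */

	line[nch] = '\0';
	return nch;
}

/* Adds up 0, 1, ..., n-1 using the prefix form in the for loop. */
int sum_prefix(int n)
{
	int i, sum = 0;
	for(i = 0; i < n; ++i)
		sum += i;
	return sum;
}

/* The same loop using the postfix form; the result is identical. */
int sum_postfix(int n)
{
	int i, sum = 0;
	for(i = 0; i < n; i++)
		sum += i;
	return sum;
}
```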

7.3 Order of Evaluation

[This section corresponds to K&R Sec. 2.12]

When you start using the ++ and -- operators in larger expressions, you end up with expressions which do several things at once, i.e., they modify several different variables at more or less the same time. When you write such an expression, you must be careful not to have the expression ``pull the rug out from under itself'' by assigning two different values to the same variable, or by assigning a new value to a variable at the same time that another part of the expression is trying to use the value of that variable.

Actually, we had already started writing expressions which did several things at once even before we met the ++ and -- operators. The expression

	(c = getchar()) != EOF

assigns getchar's return value to c, and compares it to EOF. The ++ and -- operators make it much easier to cram a lot into a small expression: the example

	line[nch++] = c;

from the previous section assigned c to line[nch], and incremented nch. We'll eventually meet expressions which do three things at once, such as

	a[i++] = b[j++];

which assigns b[j] to a[i], and increments i, and increments j.

If you're not careful, though, it's easy for this sort of thing to get out of hand. Can you figure out exactly what the expression

	a[i++] = b[i++];		/* WRONG */

should do? I can't, and here's the important part: neither can the compiler. We know that the definition of postfix ++ is that the former value, before the increment, is what goes on to participate in the rest of the expression, but the expression a[i++] = b[i++] contains two ++ operators. Which of them happens first? Does this expression assign the old ith element of b to the new ith element of a, or vice versa? No one knows.

When the order of evaluation matters but is not well-defined (that is, when we can't say for sure which order the compiler will evaluate the various dependent parts in) we say that the meaning of the expression is undefined, and if we're smart we won't write the expression in the first place. (Why would anyone ever write an ``undefined'' expression? Because sometimes, the compiler happens to evaluate it in the order a programmer wanted, and the programmer assumes that since it works, it must be okay.)

For example, suppose we carelessly wrote this loop:

	int i, a[10];
	i = 0;
	while(i < 10)
		a[i] = i++;		/* WRONG */

It looks like we're trying to set a[0] to 0, a[1] to 1, etc. But what if the increment i++ happens before the compiler decides which cell of the array a to store the (unincremented) result in? We might end up setting a[1] to 0, a[2] to 1, etc., instead. Since, in this case, we can't be sure which order things would happen in, we simply shouldn't write code like this. In this case, what we're doing matches the pattern of a for loop, anyway, which would be a better choice:

	for(i = 0; i < 10; i++)
		a[i] = i;

Now that the increment i++ isn't crammed into the same expression that's setting a[i], the code is perfectly well-defined, and is guaranteed to do what we want.

In general, you should be wary of ever trying to second-guess the order an expression will be evaluated in, with two exceptions:

1. You can obviously assume that precedence will dictate the order in which binary operators are applied. This typically says more than just what order things happen in, but also what the expression actually means. (In other words, the precedence of * over + says more than that the multiplication ``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)

2. Although we haven't mentioned it yet, it is guaranteed that the logical operators && and || are evaluated left-to-right, and that the right-hand side is not evaluated at all if the left-hand side determines the outcome.

To look at one more example, it might seem that the code

	int i = 7;
	printf("%d\n", i++ * i++);

would have to print 56, because no matter which order the increments happen in, 7*8 is 8*7 is 56. But ++ just says that the increment happens later, not that it happens immediately, so this code could print 49 (if the compiler chose to perform the multiplication first, and both increments later). And, it turns out that ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers to do anything reasonable with them at all. Theoretically, the above code could end up printing 42, or 8923409342, or 0, or crashing your computer.

Programmers sometimes mistakenly imagine that they can write an expression which tries to do too much at once and then predict exactly how it will behave based on ``order of evaluation.'' For example, we know that multiplication has higher precedence than addition, which means that in the expression
	i + j * k

j will be multiplied by k, and then i will be added to the result. Informally, we often say that the multiplication happens ``before'' the addition. That's true in this case, but it doesn't say as much as we might think about a more complicated expression, such as

	i++ + j++ * k++

In this case, besides the addition and multiplication, i, j, and k are all being incremented. We can not say which of them will be incremented first; it's the compiler's choice. (In particular, it is not necessarily the case that j++ or k++ will happen first; the compiler might choose to save i's value somewhere and increment i first, even though it will have to keep the old value around until after it has done the multiplication.)

In the preceding example, it probably doesn't matter which variable is incremented first. It's not too hard, though, to write an expression where it does matter. In fact, we've seen one already: the ambiguous assignment a[i++] = b[i++]. We still don't know which i++ happens first. (We can not assume, based on the right-to-left behavior of the = operator, that the right-hand i++ will happen first.) But if we had to know what a[i++] = b[i++] really did, we'd have to know which i++ happened first.

Finally, note that parentheses don't dictate overall evaluation order any more than precedence does. Parentheses override precedence and say which operands go with which operators, and they therefore affect the overall meaning of an expression, but they don't say anything about the order of subexpressions or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing by adding parentheses. If we wrote

	i++ + (j++ * k++)

we still wouldn't know which of the increments would happen first. (The parentheses would force the multiplication to happen before the addition, but precedence already would have forced that, anyway.) If we wrote

	(i++) * (i++)

the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined order; this parenthesized version would be just as undefined as i++ * i++ was.

There's a line from Kernighan & Ritchie, which I am fond of quoting when discussing these issues [Sec. 2.12, p. 54]:

	The moral is that writing code that depends on order of evaluation is a bad programming practice in any language. Naturally, it is necessary to know what things to avoid, but if you don't know how they are done on various machines, you won't be tempted to take advantage of a particular implementation.

The first edition of K&R said

	...if you don't know how they are done on various machines, that innocence may help to protect you.

I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find out how your compiler implements some of these ambiguous expressions, but it's just one step from writing a small program to find out, to writing a real program which makes use of what you've just learned. But you don't want to write programs that work only under one particular compiler, that take advantage of the way that one compiler (but perhaps no other) happens to implement the undefined expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them, but please keep very firmly in mind that, for real programs, the very easiest way of dealing with ambiguous, undefined expressions (which one compiler interprets one way and another interprets another way and a third crashes on) is not to write them in the first place.
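For contrast with the undefined expressions in this section, here is a minimal sketch of three constructions that stay on the safe side. The function names are made up for the illustration. The first does several things at once but touches each variable only once; the second is the for loop suggested earlier as a replacement for a[i] = i++; the third relies on the left-to-right guarantee for &&.

```c
/* Well-defined: i and j are different variables, so the two increments
   cannot interfere with each other.  Copies n elements of b into a. */
void copy_ints(int a[], int b[], int n)
{
	int i = 0, j = 0;
	while(j < n)
		a[i++] = b[j++];
}

/* Well-defined replacement for the a[i] = i++ loop: the increment is kept
   out of the expression that uses i as a subscript. */
void fill_indices(int a[], int n)
{
	int i;
	for(i = 0; i < n; i++)
		a[i] = i;
}

/* Returns the index of the first element equal to x, or -1 if there is
   none.  Because && is evaluated left-to-right and stops early, a[i] is
   never read once i has reached n. */
int find(int a[], int n, int x)
{
	int i = 0;
	while(i < n && a[i] != x)
		i++;
	if(i < n)
		return i;
	return -1;
}
```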

Chapter ,: Strin's
&trings in C are re'resented by arrays of characters. 1he end of the string is mar(ed with a s'ecial character, the null character, which is sim'ly the character with the value <. (1he null character has no relation e!ce't in name to the null pointer. "n the #&C"" character set, the null character is named ;=9.) 1he null or string%terminating character is re'resented by another character esca'e se)uence, \ . (8e've seen it once already, in the %etline function of cha'ter E.) 7ecause C has no built%in facilities for mani'ulating entire arrays (co'ying them, com'aring them, etc.), it also has very few built%in facilities for mani'ulating strings. "n fact, C's only truly built%in string%handling is that it allows us to use string constants (also called string literals) in our code. 8henever we write a string, enclosed in double )uotes, C automatically creates an array of characters for us, containing that string, terminated by the \ character. ?or e!am'le, we can declare and define an array of characters, and initiali2e it with a string constant:
char strin%RS ) "Hello, world!";

"n this case, we can leave out the dimension of the array, since the com'iler can com'ute it for us based on the si2e of the initiali2er (5G, including the terminating \ ). 1his is the only case where the com'iler si2es a string array for us, however in other cases, it will be

necessary that we decide how big the arrays and other data structures we use to hold strings are. 1o do anything else with strings, we must ty'ically call functions. 1he C library contains a few basic string mani'ulation functions, and to learn more about strings, we'll be loo(ing at how these functions might be im'lemented. &ince C never lets us assign entire arrays, we use the strcpA function to co'y one string to another:
#include <strin%.h> char strin%*RS ) "Hello, world!"; char strin%-R- S; strcpA(strin%-, strin%*);

The destination string is strcpy's first argument, so that a call to strcpy mimics an assignment expression (with the destination on the left-hand side). Notice that we had to allocate string2 big enough to hold the string that would be copied to it. Also, at the top of any source file where we're using the standard library's string-handling functions (such as strcpy) we must include the line

#include <string.h>

which contains external declarations for these functions.

Since C won't let us compare entire arrays, either, we must call a function to do that, too. The standard library's strcmp function compares two strings, and returns 0 if they are identical, or a negative number if the first string is alphabetically "less than" the second string, or a positive number if the first string is "greater." (Roughly speaking, what it means for one string to be "less than" another is that it would come first in a dictionary or telephone book, although there are a few anomalies.) Here is an example:

char string3[] = "this is";
char string4[] = "a test";

if(strcmp(string3, string4) == 0)
    printf("strings are equal\n");
else
    printf("strings are different\n");

This code fragment will print "strings are different". Notice that strcmp does not return a Boolean, true/false, zero/nonzero answer, so it's not a good idea to write something like

if(strcmp(string3, string4)) ...

because it will behave backwards from what you might reasonably expect. (Nevertheless, if you start reading other people's code, you're likely to come across conditionals like if(strcmp(a, b)) or even if(!strcmp(a, b)). The first does something if the strings are unequal; the second does something if they're equal. You can read these more easily if you pretend for a moment that strcmp's name were strdiff, instead.)

Another standard library function is strcat, which concatenates strings. It does not concatenate two strings together and give you a third, new string; what it really does is append one string onto the end of another. (If it gave you a new string, it would have to allocate memory for it somewhere, and the standard library string functions generally never do that for you automatically.) Here's an example:

char string5[20] = "Hello, ";
char string6[] = "world!";

printf("%s\n", string5);
strcat(string5, string6);
printf("%s\n", string5);

The first call to printf prints "Hello, ", and the second one prints "Hello, world!", indicating that the contents of string6 have been tacked on to the end of string5. Notice that we declared string5 with extra space, to make room for the appended characters.

If you have a string and you want to know its length (perhaps so that you can check whether it will fit in some other array you've allocated for it), you can call strlen, which returns the length of the string (i.e. the number of characters in it), not including the \0:

char string7[] = "abc";
int len = strlen(string7);
printf("%d\n", len);

Finally, you can print strings out with printf using the %s format specifier, as we've been doing in these examples already (e.g. printf("%s\n", string5);).

Since a string is just an array of characters, all of the string-handling functions we've just seen can be written quite simply, using no techniques more complicated than the ones we already know. In fact, it's quite instructive to look at how these functions might be implemented. Here is a version of strcpy:

mystrcpy(char dest[], char src[])
{
    int i = 0;

    while(src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }

    dest[i] = '\0';
}

We've called it mystrcpy instead of strcpy so that it won't clash with the version that's already in the standard library. Its operation is simple: it looks at characters in the src string one at a time, and as long as they're not \0, assigns them, one by one, to the corresponding positions in the dest string. When it's done, it terminates the dest string by appending a \0. (After exiting the while loop, i is guaranteed to have a value one greater than the subscript of the last character in src.) For comparison, here's a way of writing the same code, using a for loop:

for(i = 0; src[i] != '\0'; i++)
    dest[i] = src[i];

dest[i] = '\0';

Yet a third possibility is to move the test for the terminating \0 character out of the for loop header and into the body of the loop, using an explicit if and break statement, so that we can perform the test after the assignment and therefore use the assignment inside the loop to copy the \0 to dest, too:

for(i = 0; ; i++)
{
    dest[i] = src[i];
    if(src[i] == '\0')
        break;
}

(There are in fact many, many ways to write strcpy. Many programmers like to combine the assignment and test, using an expression like (dest[i] = src[i]) != '\0'. This is actually the same sort of combined operation as we used in our getchar loop in chapter 6.)

Here is a version of strcmp:

mystrcmp(char str1[], char str2[])
{
    int i = 0;

    while(1)
    {
        if(str1[i] != str2[i])
            return str1[i] - str2[i];
        if(str1[i] == '\0' || str2[i] == '\0')
            return 0;
        i++;
    }
}

Characters are compared one at a time. If two characters in one position differ, the strings are different, and we are supposed to return a value less than zero if the first string (str1) is alphabetically less than the second string. Since characters in C are represented by their numeric character set values, and since most reasonable character sets assign values to characters in alphabetical order, we can simply subtract the two differing characters from each other: the expression str1[i] - str2[i] will yield a negative result if the i'th character of str1 is less than the corresponding character in str2. (As it turns out, this will behave a bit strangely when comparing upper- and lower-case letters, but it's the traditional approach, which the standard versions of strcmp tend to use.) If the characters are the same, we continue around the loop, unless the characters we just compared were (both) \0, in which case we've reached the end of both strings, and they were both equal.

Notice that we used what may at first appear to be an infinite loop--the controlling expression is the constant 1, which is always true. What actually happens is that the loop runs until one of the two return statements breaks out of it (and the entire function). Note also that when one string is longer than the other, the first test will notice this (because one string will contain a real character at the [i] location, while the other will contain \0, and these are not equal) and the return value will be computed by subtracting the real character's value from 0, or vice versa. (Thus the shorter string will be treated as "less than" the longer.)

Finally, here is a version of strlen:

int mystrlen(char str[])
{
    int i;

    for(i = 0; str[i] != '\0'; i++)
        {}

    return i;
}

In this case, all we have to do is find the \0 that terminates the string, and it turns out that the three control expressions of the for loop do all the work; there's nothing left to do in the body. Therefore, we use an empty pair of braces {} as the loop body. Equivalently, we could use a null statement, which is simply a semicolon:

for(i = 0; str[i] != '\0'; i++)
    ;

Empty loop bodies can be a bit startling at first, but they're not unheard of.

Everything we've looked at so far has come out of C's standard libraries. As one last example, let's write a substr function, for extracting a substring out of a larger string. We might call it like this:

char string8[] = "this is a test";
char string9[10];

substr(string9, string8, 5, 4);
printf("%s\n", string9);

The idea is that we'll extract a substring of length 4, starting at character 5 (0-based) of string8, and copy the substring to string9. Just as with strcpy, it's our responsibility to declare the destination string (string9) big enough. Here is an implementation of substr. Not surprisingly, it's quite similar to strcpy:

substr(char dest[], char src[], int offset, int len)
{
    int i;

    for(i = 0; i < len && src[offset + i] != '\0'; i++)
        dest[i] = src[i + offset];

    dest[i] = '\0';
}

If you compare this code to the code for mystrcpy, you'll see that the only differences are that characters are fetched from src[offset + i] instead of src[i], and that the loop stops when len characters have been copied (or when the src string runs out of characters, whichever comes first).

In this chapter, we've been careless about declaring the return types of the string functions, and (with the exception of mystrlen and mystrcmp) they haven't returned values. (Leaving the return type out relies on the old "implicit int" rule, which C99 removed, so a current compiler will at least warn about such definitions.) The real strcpy and strcat do return values, but they're of type "pointer to character," which we haven't discussed yet.

When working with strings, it's important to keep firmly in mind the differences between characters and strings. We must also occasionally remember the way characters are represented, and the relation between character values and integers.

As we have had several occasions to mention, a character is represented internally as a small integer, with a value depending on the character set in use. For example, we might find that 'A' had the value 65, that 'a' had the value 97, and that '+' had the value 43. (These are, in fact, the values in the ASCII character set, which most computers use. However, you don't need to learn these values, because the vast majority of the time, you use character constants to refer to characters, and the compiler worries about the values for you. Using character constants in preference to raw numeric values also makes your programs more portable.)

As we may also have mentioned, there is a big difference between a character and a string, even a string which contains only one character (other than the \0). For example, 'A' is not the same as "A". To drive home this point, let's illustrate it with a few examples.

If you have a string:

char string[] = "hello, world!";

you can modify its first character by saying

string[0] = 'H';

(Of course, there's nothing magic about the first character; you can modify any character in the string in this way. Be aware, though, that it is not always safe to modify strings in-place like this; we'll say more about the modifiability of strings in a later chapter on pointers.) Since you're replacing a character, you want a character constant, 'H'. It would not be right to write

string[0] = "H";        /* WRONG */

because "H" is a string (an array of characters), not a single character. (The destination of the assignment, string[0], is a char, but the right-hand side is a string; these types don't match.)

On the other hand, when you need a string, you must use a string. To print a single newline, you could call

printf("\n");

It would not be correct to call

printf('\n');           /* WRONG */

printf always wants a string as its first argument. (As one final example, putchar wants a single character, so putchar('\n') would be correct, and putchar("\n") would be incorrect.)

We must also remember the difference between strings and integers. If we treat the character '1' as an integer, perhaps by saying

int i = '1';

we will probably not get the value 1 in i; we'll get the value of the character '1' in the machine's character set. (In ASCII, it's 49.) When we do need to find the numeric value of a digit character (or to go the other way, to get the digit character with a particular value) we can make use of the fact that, in any character set used by C, the values for the digit characters, whatever they are, are contiguous. In other words, no matter what values '0' and '1' have, '1' - '0' will be 1 (and, obviously, '0' - '0' will be 0). So, for a variable c holding some digit character, the expression

c - '0'

gives us its value. (Similarly, for an integer value i, i + '0' gives us the corresponding digit character, as long as 0 <= i <= 9.)

Just as the character '1' is not the integer 1, the string "123" is not the integer 123. When we have a string of digits, we can convert it to the corresponding integer by calling the standard function atoi:

char string[] = "123";
int i = atoi(string);
int j = atoi("456");

Later we'll learn how to go in the other direction, to convert an integer into a string. (One way, as long as what you want to do is print the number out, is to call printf, using %d in the format string.)

Chapter 9: The C Preprocessor

Conceptually, the "preprocessor" is a translation phase that is applied to your source code before the compiler proper gets its hands on it. (Once upon a time, the preprocessor was a separate program, much as the compiler and linker may still be separate programs today.) Generally, the preprocessor performs textual substitutions on your source code, in three sorts of ways:

- File inclusion: inserting the contents of another file into your source file, as if you had typed it all in there.
- Macro substitution: replacing instances of one piece of text with another.
- Conditional compilation: arranging that, depending on various circumstances, certain parts of your source code are seen or not seen by the compiler at all.

The next three sections will introduce these three preprocessing functions.

The syntax of the preprocessor is different from the syntax of the rest of C in several respects. First of all, the preprocessor is "line based." Each of the preprocessor directives we're going to learn about (all of which begin with the # character) must begin at the beginning of a line, and each ends at the end of the line. (The rest of C treats line ends as just another whitespace character, and doesn't care how your program text is arranged into lines.) Secondly, the preprocessor does not know about the structure of C--about functions, statements, or expressions. It is possible to play strange tricks with the preprocessor to turn something which does not look like C into C (or vice versa). It's also possible to run into problems when a preprocessor substitution does not do what you expected it to, because the preprocessor does not respect the structure of C statements and expressions (but you expected it to). For the simple uses of the preprocessor we'll be discussing, you shouldn't have any of these problems, but you'll want to be careful before doing anything tricky or outrageous with the preprocessor. (As it happens, playing tricky and outrageous games with the preprocessor is considered sporting in some circles, but it rapidly gets out of hand, and can lead to bewilderingly impenetrable programs.)

9.1 File Inclusion

[This section corresponds to K&R Sec. 4.11.1]

A line of the form

#include <filename.h>

or

#include "filename.h"

causes the contents of the file filename.h to be read, parsed, and compiled at that point. (After filename.h is processed, compilation continues on the line following the #include line.) For example, suppose you got tired of retyping external function prototypes such as

extern int getline(char [], int);

at the top of each source file. You could instead place the prototype in a header file, perhaps getline.h, and then simply place

#include "getline.h"

at the top of each source file where you called getline. (You might not find it worthwhile to create an entire header file for a single function, but if you had a package of several related functions, it might be very useful to place all of their declarations in one header file.) As we may have mentioned, that's exactly what the Standard header files such as stdio.h are--collections of declarations (including external function prototype declarations) having to do with various sets of Standard library functions. When you use #include to read in a header file, you automatically get the prototypes and other declarations it contains, and you should use header files, precisely so that you will get the prototypes and other declarations they contain.

The difference between the <> and "" forms is where the preprocessor searches for filename.h. As a general rule, it searches for files enclosed in <> in central, standard directories, and it searches for files enclosed in "" in the "current directory," or the directory containing the source file that's doing the including. Therefore, "" is usually used for header files you've written, and <> is usually used for headers which are provided for you (which someone else has written).

The extension ".h", by the way, simply stands for "header," and reflects the fact that #include directives usually sit at the top (head) of your source files, and contain global declarations and definitions which you would otherwise put there. (That extension is not mandatory--you can theoretically name your own header files anything you wish--but .h is traditional, and recommended.)

As we've already begun to see, the reason for putting something in a header file, and then using #include to pull that header file into several different source files, is when the something (whatever it is) must be declared or defined consistently in all of the source files. If, instead of using a header file, you typed the something in to each of the source files directly, and the something ever changed, you'd have to edit all those source files, and if you missed one, your program could fail in subtle (or serious) ways due to the mismatched declarations (i.e. due to the incompatibility between the new declaration in one source file and the old one in a source file you forgot to change). Placing common declarations and definitions into header files means that if they ever change, they only have to be changed in one place, which is a much more workable system.

What should you put in header files?

- External declarations of global variables and functions. We said that a global variable must have exactly one defining instance, but that it can have external declarations in many places. We said that it was a grave error to issue an external declaration in one place saying that a variable or function has one type, when the defining instance in some other place actually defines it with another type. (If the two places are two source files, separately compiled, the compiler will probably not even catch the discrepancy.) If you put the external declarations in a header file, however, and include the header wherever it's needed, the declarations are virtually guaranteed to be consistent. It's a good idea to include the header in the source file where the defining instance appears, too, so that the compiler can check that the declaration and definition match. (That is, if you ever change the type, you do still have to change it in two places: in the source file where the defining instance occurs, and in the header file where the external declaration appears. But at least you don't have to change it in an arbitrary number of places, and, if you've set things up correctly, the compiler can catch any remaining mistakes.)
- Preprocessor macro definitions (which we'll meet in the next section).
- Structure definitions (which we haven't seen yet).
- Typedef declarations (which we haven't seen yet).

However, there are a few things not to put in header files:

- Defining instances of global variables. If you put these in a header file, and include the header file in more than one source file, the variable will end up multiply defined.
- Function bodies (which are also defining instances). You don't want to put these in headers for the same reason--it's likely that you'll end up with multiple copies of the function and hence "multiply defined" errors. People sometimes put commonly-used functions in header files and then use #include to bring them (once) into each program where they use that function, or use #include to bring together the several source files making up a program, but both of these are poor ideas. It's much better to learn how to use your compiler or linker to combine together separately-compiled object files.

Since header files typically contain only external declarations, and should not contain function bodies, you have to understand just what does and doesn't happen when you #include a header file. The header file may provide the declarations for some functions, so that the compiler can generate correct code when you call them (and so that it can make sure that you're calling them correctly), but the header file does not give the compiler the functions themselves. The actual functions will be combined into your program at the end of compilation, by the part of the compiler called the linker. The linker may have to get the functions out of libraries, or you may have to tell the compiler/linker where to find them. In particular, if you are trying to use a third-party library containing some useful functions, the library will often come with a header file describing those functions. Using the library is therefore a two-step process: you must #include the header in the files where you call the library functions, and you must tell the linker to read in the functions from the library itself.

9.2 Macro Definition and Substitution

[This section corresponds to K&R Sec. 4.11.2]

A preprocessor line of the form

#define name text

defines a macro with the given name, having as its value the given replacement text. After that (for the rest of the current source file), wherever the preprocessor sees that name, it will replace it with the replacement text. The name follows the same rules as ordinary identifiers (it can contain only letters, digits, and underscores, and may not begin with a digit). Since macros behave quite differently from normal variables (or functions), it is customary to give them names which are all capital letters (or at least which begin with a capital letter). The replacement text can be absolutely anything--it's not restricted to numbers, or simple strings, or anything.

The most common use for macros is to propagate various constants around and to make them more self-documenting. We've been saying things like

char line[100];
...
getline(line, 100);

but this is neither readable nor reliable; it's not necessarily obvious what all those 100's scattered around the program are, and if we ever decide that 100 is too small for the size of the array to hold lines, we'll have to remember to change the number in two (or more) places. A much better solution is to use a macro:

#define MAXLINE 100

char line[MAXLINE];
...
getline(line, MAXLINE);

Now, if we ever want to change the size, we only have to do it in one place, and it's more obvious what the words MAXLINE sprinkled through the program mean than the magic numbers 100 did.

Since the replacement text of a preprocessor macro can be anything, it can also be an expression, although you have to realize that, as always, the text is substituted (and perhaps evaluated) later. No evaluation is performed when the macro is defined. For example, suppose that you write something like

#define A 2
#define B 3
#define C A + B

(this is a pretty meaningless example, but the situation does come up in practice). Then, later, suppose that you write

int x = C * 2;

If A, B, and C were ordinary variables, you'd expect x to end up with the value 10. But let's see what happens. The preprocessor always substitutes text for macros exactly as you have written it. So it first substitutes the replacement text for the macro C, resulting in

int x = A + B * 2;

Then it substitutes the macros A and B, resulting in

int x = 2 + 3 * 2;

Only when the preprocessor is done doing all this substituting does the compiler get into the act. But when it evaluates that expression (using the normal precedence of multiplication over addition), it ends up initializing x with the value 8!

To guard against this sort of problem, it is always a good idea to include explicit parentheses in the definitions of macros which contain expressions. If we were to define the macro C as

#define C (A + B)

then the declaration of x would ultimately expand to

int x = (2 + 3) * 2;

and x would be initialized to 10, as we probably expected.

Notice that there does not have to be (and in fact there usually is not) a semicolon at the end of a #define line. (This is just one of the ways that the syntax of the preprocessor is different from the rest of C.) If you accidentally type

#define MAXLINE 100;        /* WRONG */

then when you later declare

char line[MAXLINE];

the preprocessor will expand it to

char line[100;];            /* WRONG */

which is a syntax error. This is what we mean when we say that the preprocessor doesn't know much of anything about the syntax of C--in this last example, the value or replacement text for the macro MAXLINE was the 4 characters 100; , and that's exactly what the preprocessor substituted (even though it didn't make any sense).

Simple macros like MAXLINE act sort of like little variables, whose values are constant (or constant expressions). It's also possible to have macros which look like little functions (that is, you invoke them with what looks like function call syntax, and they expand to replacement text which is a function of the actual arguments they are invoked with) but we won't be looking at these yet.

9.3 Conditional Compilation

[This section corresponds to K&R Sec. 4.11.3]

The last preprocessor directive we're going to look at is #ifdef. If you have the sequence

#ifdef name
program text
#else
more program text
#endif

in your program, the code that gets compiled depends on whether a preprocessor macro by that name is defined or not. If it is (that is, if there has been a #define line for a macro called name), then "program text" is compiled and "more program text" is ignored. If the macro is not defined, "more program text" is compiled and "program text" is ignored. This looks a lot like an if statement, but it behaves completely differently: an if statement controls which statements of your program are executed at run time, but #ifdef controls which parts of your program actually get compiled.

Just as for the if statement, the #else in an #ifdef is optional. There is a companion directive #ifndef, which compiles code if the macro is not defined (although the "#else clause" of an #ifndef directive will then be compiled if the macro is defined). There is also an #if directive which compiles code depending on whether a compile-time expression is true or false. (The expressions which are allowed in an #if directive are somewhat restricted, however, so we won't talk much about #if here.)

Conditional compilation is useful in two general classes of situations:

- You are trying to write a portable program, but the way you do something is different depending on what compiler, operating system, or computer you're using. You place different versions of your code, one for each situation, between suitable #ifdef directives, and when you compile the program in a particular environment, you arrange to have the macro names defined which select the variants you need in that environment. (For this reason, compilers usually have ways of letting you define macros from the invocation command line or in a configuration file, and many also predefine certain macro names related to the operating system, processor, or compiler in use. That way, you don't have to change the code to change the #define lines each time you compile it in a different environment.)

  For example, in ANSI C, the function to delete a file is remove. On older Unix systems, however, the function was called unlink. So if filename is a variable containing the name of a file you want to delete, and if you want to be able to compile the program under these older Unix systems, you might write

  #ifdef unix
      unlink(filename);
  #else
      remove(filename);
  #endif

  Then, you could place the line

  #define unix

  at the top of the file when compiling under an old Unix system. (Since all you're using the macro unix for is to control the #ifdef, you don't need to give it any replacement text at all. Any definition for a macro, even if the replacement text is empty, causes an #ifdef to succeed.) (In fact, in this example, you wouldn't even need to define the macro unix at all, because C compilers on old Unix systems tend to predefine it for you, precisely so you can make tests like these.)

- You want to compile several different versions of your program, with different features present in the different versions. You bracket the code for each feature with #ifdef directives, and (as for the previous case) arrange to have the right macros defined or not to build the version you want to build at any given time. This way, you can build the several different versions from the same source code. (One common example is whether you turn debugging statements on or off. You can bracket each debugging printout with #ifdef DEBUG and #endif, and then turn on debugging only when you need it.)

  For example, you might use lines like this:

  #ifdef DEBUG
      printf("x is %d\n", x);
  #endif

  to print out the value of the variable x at some point in your program to see if it's what you expect. To enable debugging printouts, you insert the line

  #define DEBUG

  at the top of the file, and to turn them off, you delete that line, but the debugging printouts quietly remain in your code, temporarily deactivated, but ready to reactivate if you find yourself needing them again later. (Also, instead of inserting and deleting the #define line, you might use a compiler flag such as -DDEBUG to define the macro DEBUG from the compiler invocation line.)

Conditional compilation can be very handy, but it can also get out of hand. When large chunks of the program are completely different depending on, say, what operating system the program is being compiled for, it's often better to place the different versions in separate source files, and then only use one of the files (corresponding to one of the versions) to build the program on any given system. Also, if you are using an ANSI Standard compiler and you are writing ANSI-compatible code, you usually won't need so much conditional compilation, because the Standard specifies exactly how the compiler must do certain things, and exactly which library functions it must provide, so you don't have to work so hard to accommodate the old variations among compilers and libraries.

Chapter 10: Pointers

Pointers are often thought to be the most difficult aspect of C. It's true that many people have various problems with pointers, and that many programs founder on pointer-related bugs. Actually, though, many of the problems are not so much with the pointers per se but rather with the memory they point to, and more specifically, when there isn't any valid memory which they point to. As long as you're careful to ensure that the pointers in your programs always point to valid memory, pointers can be useful, powerful, and relatively trouble-free tools. (We'll talk about memory allocation in the next chapter.)

[This chapter is the only one in this series that contains any graphics. If you are using a text-only browser, there are a few figures you won't be able to see.]

A pointer is a variable that points at, or refers to, another variable. That is, if we have a pointer variable of type "pointer to int," it might point to the int variable i, or to the third cell of the int array a. Given a pointer variable, we can ask questions like, "What's the value of the variable that this pointer points to?"

Why would we want to have a variable that refers to another variable? Why not just use that other variable directly? The answer is that a level of indirection can be very useful. (Indirection is just another word for the situation when one variable refers to another.)

Imagine a club which elects new officers each year. In its clubroom, it might have a set of mailboxes for each member, along with special mailboxes for the president, secretary, and treasurer. The bank doesn't mail statements to the treasurer under the treasurer's name; it mails them to "treasurer," and the statements go to the mailbox marked "treasurer." This way, the bank doesn't have to change the mailing address it uses every year. The mailboxes labeled "president," "treasurer," and "secretary" are a little bit like pointers--they don't refer to people directly.

If we make the analogy that a mailbox holding letters is like a variable holding numbers, then mailboxes for the president, secretary, and treasurer aren't quite like pointers, because they're still mailboxes which in principle could hold letters directly. But suppose that mail is never actually put in those three mailboxes: suppose each of the officers' mailboxes contains a little marker listing the name of the member currently holding that office. When you're sorting mail, and you have a letter for the treasurer, you first go to the treasurer's mailbox, but rather than putting the letter there, you read the name on the marker there, and put the mail in the mailbox for that person. Similarly, if the club is poorly organized, and the treasurer stops doing his job, and you're the president, and one day you get a call from the bank saying that the club's account is in arrears and the treasurer hasn't done anything about it and asking if you, the president, can look into it; and if the club is so poorly organized that you've forgotten who the treasurer is, you can go to the treasurer's mailbox, read the name on the marker there, and go to that mailbox (which is probably overflowing) to find all the treasury-related mail.

We could say that the markers in the mailboxes for the president, secretary, and treasurer were pointers to other mailboxes. In an analogous way, pointer variables in C contain pointers to other variables or memory locations.

10.1 Basic Pointer Operations


[This section corresponds to K&R Sec. 5.1]

The first things to do with pointers are to declare a pointer variable, set it to point somewhere, and finally manipulate the value that it points to. A simple pointer declaration looks like this:

    int *ip;

This declaration looks like our earlier declarations, with one obvious difference: that asterisk. The asterisk means that ip, the variable we're declaring, is not of type int, but rather of type pointer-to-int. (Another way of looking at it is that *ip, which as we'll see is the value pointed to by ip, will be an int.)

We may think of setting a pointer variable to point to another variable as a two-step process: first we generate a pointer to that other variable, then we assign this new pointer to the pointer variable. We can say (but we have to be careful when we're saying it) that a pointer variable has a value, and that its value is "pointer to that other variable". This will make more sense when we see how to generate pointer values.

Pointers (that is, pointer values) are generated with the "address-of" operator &, which we can also think of as the "pointer-to" operator. We demonstrate this by declaring (and initializing) an int variable i, and then setting ip to point to it:

    int i = 5;
    ip = &i;

The assignment expression ip = &i; contains both parts of the "two-step process": &i generates a pointer to i, and the assignment operator assigns the new pointer to (that is, places it "in") the variable ip. Now ip "points to" i, which we can illustrate with this picture:

i is a variable of type int, so the value in its box is a number, 5. ip is a variable of type pointer-to-int, so the "value" in its box is an arrow pointing at another box. Referring once again back to the "two-step process" for setting a pointer variable: the & operator draws us the arrowhead pointing at i's box, and the assignment operator =, with the pointer variable ip on its left, anchors the other end of the arrow in ip's box.

We discover the value pointed to by a pointer using the "contents-of" operator, *. Placed in front of a pointer, the * operator accesses the value pointed to by that pointer. In other words, if ip is a pointer, then the expression *ip gives us whatever it is that's in the variable or location pointed to by ip. For example, we could write something like

    printf("%d\n", *ip);

which would print 5, since ip points to i, and i is (at the moment) 5.

(You may wonder how the asterisk * can be the pointer contents-of operator when it is also the multiplication operator. There is no ambiguity here: it is the multiplication operator when it sits between two variables, and it is the contents-of operator when it sits in front of a single variable. The situation is analogous to the minus sign: between two variables or expressions it's the subtraction operator, but in front of a single variable or expression it's the negation operator. Technical terms you may hear for these distinct roles are unary and binary: a binary operator applies to two operands, usually on either side of it, while a unary operator applies to a single operand.)

The contents-of operator * does not merely fetch values through pointers; it can also set values through pointers. We can write something like

    *ip = 7;

which means "set whatever ip points to to 7." Again, the * tells us to go to the location pointed to by ip, but this time, the location isn't the one to fetch from; we're on the left-hand side of an assignment operator, so *ip tells us the location to store to. (The situation is no different from array subscripting expressions such as a[3] which we've already seen appearing on both sides of assignments.)

The result of the assignment *ip = 7 is that i's value is changed to 7, and the picture changes to:

If we called printf("%d\n", *ip) again, it would now print 7.

At this point, you may be wondering why we're going through this rigamarole: if we wanted to set i to 7, why didn't we do it directly? We'll begin to explore that next, but first let's notice the difference between changing a pointer (that is, changing what variable it points to) and changing the value at the location it points to. When we wrote *ip = 7, we changed the value pointed to by ip, but if we declare another variable j:

    int j = 3;

and write

    ip = &j;

we've changed ip itself. The picture now looks like this:

We have to be careful when we say that a pointer assignment changes "what the pointer points to." Our earlier assignment

    *ip = 7;

changed the value pointed to by ip, but this more recent assignment

    ip = &j;

has changed what variable ip points to. It's true that "what ip points to" has changed, but this time, it has changed for a different reason. Neither i (which is still 7) nor j (which is still 3) has changed. (What has changed is ip's value.) If we again call

    printf("%d\n", *ip);

this time it will print 3.

We can also assign pointer values to other pointer variables. If we declare a second pointer variable:

    int *ip2;

then we can say

    ip2 = ip;

Now ip2 points where ip does; we've essentially made a "copy" of the arrow:

Now, if we set ip to point back to i again:

    ip = &i;

the two arrows point to different places:

We can now see that the two assignments

    ip2 = ip;

and

    *ip2 = *ip;

do two very different things. The first would make ip2 again point to where ip points (in other words, back to i again). The second would store, at the location pointed to by ip2, a copy of the value pointed to by ip; in other words (if ip and ip2 still point to i and j respectively) it would set j to i's value, or 7.

It's important to keep very clear in your mind the distinction between a pointer and what it points to. The two are like apples and oranges (or perhaps oil and water); you can't mix them. You can't "set ip to 5" by writing something like

    ip = 5;    /* WRONG */

5 is an integer, but ip is a pointer. You probably wanted to "set the value pointed to by ip to 5," which you express by writing

    *ip = 5;

Similarly, you can't "see what ip is" by writing

    printf("%d\n", ip);    /* WRONG */

Again, ip is a pointer-to-int, but %d expects an int. To print what ip points to, use

    printf("%d\n", *ip);

Finally, a few more notes about pointer declarations. The * in a pointer declaration is related to, but different from, the contents-of operator *. After we declare a pointer variable

    int *ip;

the expression

    ip = &i

sets what ip points to (that is, which location it points to), while the expression

    *ip = 5

sets the value of the location pointed to by ip. On the other hand, if we declare a pointer variable and include an initializer:

    int *ip3 = &i;

we're setting the initial value for ip3, which is where ip3 will point, so that initial value is a pointer. (In other words, the * in the declaration int *ip3 = &i; is not the contents-of operator, it's the indicator that ip3 is a pointer.)

If you have a pointer declaration containing an initialization, and you ever have occasion to break it up into a simple declaration and a conventional assignment, do it like this:

    int *ip3;
    ip3 = &i;

Don't write

    int *ip3;
    *ip3 = &i;

or you'll be trying to mix oil and water again.

Also, when we write

    int *ip;

although the asterisk affects ip's type, it goes with the identifier name ip, not with the type int on the left. To declare two pointers at once, the declaration looks like

    int *ip1, *ip2;

Some people write pointer declarations like this:

    int* ip;

This works for one pointer, because C essentially ignores whitespace. But if you ever write

    int* ip1, ip2;    /* PROBABLY WRONG */

it will declare one pointer-to-int ip1 and one plain int ip2, which is probably not what you meant.

What is all of this good for? If it was just for changing variables like i from 5 to 7, it would not be good for much. What it's good for, among other things, is when for various reasons we don't know exactly which variable we want to change, just like the bank didn't know exactly which club member it wanted to send the statement to.
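To make that concrete, here is a minimal sketch that first replays the assignments from this section and then stores through a pointer without knowing which variable is on the other end. The names pointer_walkthrough, deposit, and seen are invented for this illustration and do not come from the text above.

```c
/* Replays the steps of this section; seen[] records a value after each one.
   The caller is assumed to supply an array of at least 4 ints. */
void pointer_walkthrough(int seen[])
{
    int i = 5, j = 3;
    int *ip, *ip2;

    ip = &i;            /* ip now points to i */
    seen[0] = *ip;      /* fetch through the pointer: i's value */

    *ip = 7;            /* store through the pointer: changes i */
    seen[1] = i;

    ip = &j;            /* changes ip itself; neither i nor j changes */
    seen[2] = *ip;

    ip2 = ip;           /* copy the pointer: ip2 points to j as well */
    ip = &i;            /* now ip and ip2 point to different variables */
    *ip2 = *ip;         /* copy the value: sets j to i's value */
    seen[3] = j;
}

/* Hypothetical helper: adds amount to whichever int the caller points us at. */
void deposit(int *balance, int amount)
{
    *balance = *balance + amount;
}
```

A caller might write deposit(&checking, 50) on one line and deposit(&savings, 50) on the next (both variable names are hypothetical); deposit itself never needs to know which variable it is changing, much as the bank never needs to know who the treasurer is.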

10.2 Pointers and Arrays; Pointer Arithmetic


[This section corresponds to K&R Sec. 5.3]

Pointers do not have to point to single variables. They can also point at the cells of an array. For example, we can write

    int *ip;
    int a[10];
    ip = &a[3];

and we would end up with ip pointing at the fourth cell of the array a (remember, arrays are 0-based, so a[0] is the first cell). We could illustrate the situation like this:

We'd use this ip just like the one in the previous section: *ip gives us what ip points to, which in this case will be the value in a[3].

Once we have a pointer pointing into an array, we can start doing pointer arithmetic. Given that ip is a pointer to a[3], we can add 1 to ip:

    ip + 1

What does it mean to add one to a pointer? In C, it gives a pointer to the cell one farther on, which in this case is a[4]. To make this clear, let's assign this new pointer to another pointer variable:

    ip2 = ip + 1;

Now the picture looks like this:

If we now do

    *ip2 = 4;

we've set a[4] to 4. But it's not necessary to assign a new pointer value to a pointer variable in order to use it; we could also compute a new pointer value and use it immediately:

    *(ip + 1) = 5;

In this last example, we've changed a[4] again, setting it to 5. The parentheses are needed because the unary "contents of" operator * has higher precedence than (i.e., binds more tightly than) the addition operator. If we wrote *ip + 1, without the parentheses, we'd be fetching the value pointed to by ip, and adding 1 to that value. The expression *(ip + 1), on the other hand, accesses the value one past the one pointed to by ip.

Given that we can add 1 to a pointer, it's not surprising that we can add and subtract other numbers as well. If ip still points to a[3], then

    *(ip + 3) = 7;

sets a[6] to 7, and

    *(ip - 2) = 4;

sets a[1] to 4.

Up above, we added 1 to ip and assigned the new pointer to ip2, but there's no reason we can't add one to a pointer, and change the same pointer:

    ip = ip + 1;

Now ip points one past where it used to (to a[4], if we hadn't changed it in the meantime). The shortcuts we learned in a previous chapter all work for pointers, too: we could also increment a pointer using

    ip += 1;

or

    ip++;

Of course, pointers are not limited to ints. It's quite common to use pointers to other types, especially char. Here are the innards of the mystrcmp function we saw in a previous chapter, rewritten to use pointers. (mystrcmp, you may recall, compares two strings, character by character.)

    char *p1 = &str1[0], *p2 = &str2[0];

    while(1)
        {
        if(*p1 != *p2)
            return *p1 - *p2;
        if(*p1 == '\0' || *p2 == '\0')
            return 0;
        p1++;
        p2++;
        }

The autoincrement operator ++ (like its companion, --) makes it easy to do two things at once. We've seen idioms like a[i++] which accesses a[i] and simultaneously increments i, leaving it referencing the next cell of the array a. We can do the same thing with pointers: an expression like *ip++ lets us access what ip points to, while simultaneously incrementing ip so that it points to the next element. The preincrement form works, too: *++ip increments ip, then accesses what it points to. Similarly, we can use notations like *ip-- and *--ip.

As another example, here is the strcpy (string copy) loop from a previous chapter, rewritten to use pointers:

    char *dp = &dest[0], *sp = &src[0];
    while(*sp != '\0')
        *dp++ = *sp++;
    *dp = '\0';

(One question that comes up is whether the expression *p++ increments p or what it points to. The answer is that it increments p. To increment what p points to, you can use (*p)++.)

When you're doing pointer arithmetic, you have to remember how big the array the pointer points into is, so that you don't ever point outside it. If the array a has 10 elements, you can't access a[50] or a[-1] or even a[10] (remember, the valid subscripts for a 10-element array run from 0 to 9). Similarly, if a has 10 elements and ip points to a[3], you can't compute or access ip + 10 or ip - 5. (There is one special case: you can, in this case, compute, but not access, a pointer to the nonexistent element just beyond the end of the array, which in this case is &a[10]. This becomes useful when you're doing pointer comparisons, which we'll look at next.)
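The individual stores in this section can be collected into one small function, which may make it easier to check each claim about which cell changes. This is only a sketch: the function name arithmetic_demo is made up, and it assumes the caller passes an array of at least 10 ints.

```c
/* Hypothetical demo: a must have (at least) 10 elements. */
void arithmetic_demo(int a[])
{
    int *ip, *ip2;
    int k;

    for(k = 0; k < 10; k++)
        a[k] = 0;

    ip = &a[3];
    ip2 = ip + 1;       /* pointer to a[4] */
    *ip2 = 4;           /* a[4] is now 4 */
    *(ip + 1) = 5;      /* a[4] is now 5 */
    *(ip + 3) = 7;      /* a[6] is now 7 */
    *(ip - 2) = 4;      /* a[1] is now 4 */

    *ip++ = 9;          /* stores 9 in a[3], then advances ip to a[4] */
    (*ip)++;            /* increments a[4] itself, from 5 to 6; ip stays put */
}
```

Every pointer computed here stays between &a[0] and &a[9], which is what keeps the arithmetic legal.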

10.3 Pointer Subtraction and Comparison


As we've seen, you can add an integer to a pointer to get a new pointer, pointing somewhere beyond the original (as long as it's in the same array). For example, you might write

    ip2 = ip1 + 3;

Applying a little algebra, you might wonder whether

    ip2 - ip1 = 3

and the answer is, yes. When you subtract two pointers, as long as they point into the same array, the result is the number of elements separating them. You can also ask (again, as long as they point into the same array) whether one pointer is greater or less than another: one pointer is "greater than" another if it points beyond where the other one points. You can also compare pointers for equality and inequality: two pointers are equal if they point to the same variable or to the same cell in an array, and are (obviously) unequal if they don't. (When testing for equality or inequality, the two pointers do not have to point into the same array.)

One common use of pointer comparisons is when copying arrays using pointers. Here is a code fragment which copies 10 elements from array1 to array2, using pointers. It uses an end pointer, ep, to keep track of when it should stop copying.

    int array1[10], array2[10];
    int *ip1, *ip2 = &array2[0];
    int *ep = &array1[10];
    for(ip1 = &array1[0]; ip1 < ep; ip1++)
        *ip2++ = *ip1;

As we mentioned, there is no element array1[10], but it is legal to compute a pointer to this (nonexistent) element, as long as we only use it in pointer comparisons like this (that is, as long as we never try to fetch or store the value that it points to).
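The copying fragment can be wrapped in a function so that the end pointer and the pointer subtraction are both visible in one place. The following is a sketch under stated assumptions: the names copy_ints and index_of are invented here, n is assumed to be nonnegative, and both arrays are assumed to hold at least n elements.

```c
/* Hypothetical helper: copies n ints from src to dst; returns how many were copied. */
int copy_ints(int *dst, int *src, int n)
{
    int *ip1;
    int *ep = src + n;          /* one past the last element: compared against, never dereferenced */

    for(ip1 = src; ip1 < ep; ip1++)
        *dst++ = *ip1;

    return (int)(ip1 - src);    /* pointer subtraction yields a count of elements */
}

/* Hypothetical helper: index of the cell p points at, within the array starting at base. */
int index_of(int *base, int *p)
{
    return (int)(p - base);
}
```

Both subtractions are meaningful only because the two pointers involved point into the same array.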

10.4 Null Pointers


We said that the value of a pointer variable is a pointer to some other variable. There is one other value a pointer may have: it may be set to a null pointer. A null pointer is a special pointer value that is known not to point anywhere. What this means is that no other valid pointer, to any other variable or array cell or anything else, will ever compare equal to a null pointer.

The most straightforward way to "get" a null pointer in your program is by using the predefined constant NULL, which is defined for you by several standard header files, including <stdio.h>, <stdlib.h>, and <string.h>. To initialize a pointer to a null pointer, you might use code like

    #include <stdio.h>

    int *ip = NULL;

and to test it for a null pointer before inspecting the value pointed to you might use code like

    if(ip != NULL)
        printf("%d\n", *ip);

It is also possible to refer to the null pointer by using a constant 0, and you will see some code that sets null pointers by simply doing

    int *ip = 0;

(In fact, NULL is a preprocessor macro which typically has the value, or replacement text, 0.)

Furthermore, since the definition of "true" in C is a value that is not equal to 0, you will see code that tests for non-null pointers with abbreviated code like

    if(ip)
        printf("%d\n", *ip);

This has the same meaning as our previous example; if(ip) is equivalent to if(ip != 0) and to if(ip != NULL).

All of these uses are legal, and although I recommend that you use the constant NULL for clarity, you will come across the other forms, so you should be able to recognize them.

You can use a null pointer as a placeholder to remind yourself (or, more importantly, to help your program remember) that a pointer variable does not point anywhere at the moment and that you should not use the "contents of" operator on it (that is, you should not try to inspect what it points to, since it doesn't point to anything). A function that returns pointer values can return a null pointer when it is unable to perform its task. (A null pointer used in this way is analogous to the EOF value that functions like getchar return.)

As an example, let us write our own version of the standard library function strstr, which looks for one string within another, returning a pointer to the string if it can, or a null pointer if it cannot. Here is the function, using the obvious brute-force algorithm: at every character of the input string, the code checks for a match there of the pattern string:

    #include <stddef.h>

    char *mystrstr(char input[], char pat[])
    {
        char *start, *p1, *p2;
        for(start = &input[0]; *start != '\0'; start++)
            {           /* for each position in input string... */
            p1 = pat;   /* prepare to check for pattern string there */
            p2 = start;
            while(*p1 != '\0')
                {
                if(*p1 != *p2)  /* characters differ */
                    break;
                p1++;
                p2++;
                }
            if(*p1 == '\0')     /* found match */
                return start;
            }

        return NULL;
    }

The start pointer steps over each character position in the input string. At each character, the inner loop checks for a match there, by using p1 to step over the pattern string (pat), and p2 to step over the input string (starting at start). We compare successive characters until either (a) we reach the end of the pattern string (*p1 == '\0'), or (b) we find two characters which differ. When we're done with the inner loop, if we reached the end of the pattern string (*p1 == '\0'), it means that all preceding characters matched, and we found a complete match for the pattern starting at start, so we return start. Otherwise, we go around the outer loop again, to try another starting position. If we run out of those (if *start == '\0'), without finding a match, we return a null pointer.

Notice that the function is declared as returning (and does in fact return) a pointer-to-char.

We can use mystrstr (or its standard library counterpart strstr) to determine whether one string contains another:

    if(mystrstr("Hello, world!", "lo") == NULL)
        printf("no\n");
    else
        printf("yes\n");
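The same convention, a function that returns a null pointer when it cannot do its job and a caller that tests for it before dereferencing, carries over to pointers of other types. Here is a minimal sketch for int arrays; the names find_int and replace_first are hypothetical, and n is assumed to be the number of valid cells in a.

```c
#include <stddef.h>     /* for NULL */

/* Hypothetical helper: pointer to the first cell of a[0..n-1] equal to value, or NULL. */
int *find_int(int *a, int n, int value)
{
    int *ip;
    for(ip = a; ip < a + n; ip++)
        {
        if(*ip == value)
            return ip;
        }
    return NULL;
}

/* Hypothetical helper: replaces the first occurrence of from with to;
   returns 1 if a replacement was made, 0 if from was not found. */
int replace_first(int *a, int n, int from, int to)
{
    int *ip = find_int(a, n, from);
    if(ip == NULL)      /* test before using the "contents of" operator */
        return 0;
    *ip = to;
    return 1;
}
```

The test against NULL in replace_first is what makes the later *ip = to safe; without it, a failed search would lead to a store through a pointer that points nowhere.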

In general, C does not initialize pointers to null for you, and it never tests pointers to see if they are null before using them. If one of the pointers in your programs points somewhere some of the time but not all of the time, an excellent convention to use is to set it to a null pointer when it doesn't point anywhere valid, and to test to see if it's a null pointer before using it. But you must use explicit code to set it to NULL, and to test it against NULL. (In other words, just setting an unused pointer variable to NULL doesn't guarantee safety; you also have to check for the null value before using the pointer.) On the other hand, if you know that a particular pointer variable is always valid, you don't have to insert a paranoid test against NULL before using it.

10.5 "Equivalence" between Pointers and Arrays


There are a number of similarities between arrays and pointers in C. If you have an array

    int a[10];

you can refer to a[0], a[1], a[2], etc., or to a[i] where i is an int. If you declare a pointer variable ip and set it to point to the beginning of an array:

    int *ip = &a[0];

you can refer to *ip, *(ip+1), *(ip+2), etc., or to *(ip+i) where i is an int.

There are also differences, of course. You cannot assign two arrays; the code

    int a[10], b[10];
    a = b;    /* WRONG */

is illegal. As we've seen, though, you can assign two pointer variables:

    int *ip1, *ip2;
    ip1 = &a[0];
    ip2 = ip1;

Pointer assignment is straightforward; the pointer on the left is simply made to point wherever the pointer on the right does. We haven't copied the data pointed to (there's still just one copy, in the same place); we've just made two pointers point to that one place.

The similarities between arrays and pointers end up being quite useful, and in fact C builds on the similarities, leading to what is called "the equivalence of arrays and pointers in C." When we speak of this "equivalence" we do not mean that arrays and pointers are the same thing (they are in fact quite different), but rather that they can be used in related ways, and that certain operations may be used between them.

The first such operation is that it is possible to (apparently) assign an array to a pointer:

    int a[10];
    int *ip;
    ip = a;

What can this mean? In that last assignment ip = a, aren't we mixing apples and oranges again? It turns out that we are not; C defines the result of this assignment to be that ip receives a pointer to the first element of a. In other words, it is as if you had written

    ip = &a[0];

The second facet of the equivalence is that you can use the "array subscripting" notation [i] on pointers, too. If you write

    ip[3]

it is just as if you had written

    *(ip + 3)

So when you have a pointer that points to a block of memory, such as an array or a part of an array, you can treat that pointer "as if" it were an array, using the convenient [i] notation. In other words, at the beginning of this section when we talked about *ip, *(ip+1), *(ip+2), and *(ip+i), we could have written ip[0], ip[1], ip[2], and ip[i]. As we'll see, this can be quite useful (or at least convenient).

The third facet of the equivalence (which is actually a more general version of the first one we mentioned) is that whenever you mention the name of an array in a context where the "value" of the array would be needed, C automatically generates a pointer to the first element of the array, as if you had written &array[0]. When you write something like

    int a[10];
    int *ip;
    ip = a + 3;

it is as if you had written

    ip = &a[0] + 3;

which (and you might like to convince yourself of this) gives the same result as if you had written

    ip = &a[3];

For example, if the character array

    char string[100];

contains some string, here is another way to find its length:

    int len;
    char *p;

    for(p = string; *p != '\0'; p++)
        ;
    len = p - string;

After the loop, p points to the '\0' terminating the string. The expression p - string is equivalent to p - &string[0], and gives the length of the string. (Of course, we could also call strlen; in fact here we've essentially written another implementation of strlen.)
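One way to convince yourself of the three facets is to compare the notations cell by cell. The sketch below does that and also wraps the length loop in a function; notations_agree and ptr_strlen are made-up names used only for this illustration.

```c
/* Hypothetical check: returns 1 if the array and pointer notations agree for every cell. */
int notations_agree(void)
{
    int a[10];
    int *ip = a;            /* as if we had written ip = &a[0] */
    int i;

    for(i = 0; i < 10; i++)
        a[i] = i * i;

    for(i = 0; i < 10; i++)
        {
        if(ip[i] != *(ip + i))      /* subscripting a pointer */
            return 0;
        if(*(a + i) != a[i])        /* pointer arithmetic on an array name */
            return 0;
        if(&a[i] != a + i)          /* the same cell either way */
            return 0;
        }
    return 1;
}

/* The length loop from the text, wrapped in a function. */
int ptr_strlen(char *string)
{
    char *p;
    for(p = string; *p != '\0'; p++)
        ;
    return (int)(p - string);
}
```

Note that a itself is still an array here: the equivalence lets us write a + i and ip[i], but it would not let us write a = ip.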

10.6 Arrays and Pointers as Function Arguments


[This section corresponds to K&R Sec. 5.2]

Earlier, we learned that functions in C receive copies of their arguments. (This means that C uses call by value; it means that a function can modify one of its arguments without modifying the value in the caller.) We didn't say so at the time, but when a function is called, the copies of the arguments are made as if by assignment. But since arrays can't be assigned, how can a function receive an array as an argument? The answer will explain why arrays are an apparent exception to the rule that functions cannot modify their arguments.

We've been regularly calling a function getline like this:

    char line[100];
    getline(line, 100);

with the intention that getline read the next line of input into the character array line. But in the previous section, we learned that when we mention the name of an array in an expression, the compiler generates a pointer to its first element. So the call above is as if we had written

    char line[100];
    getline(&line[0], 100);

In other words, the getline function does not receive an array of char at all; it actually receives a pointer to char!

As we've seen throughout this chapter, it's straightforward to manipulate the elements of an array using pointers, so there's no particular insurmountable difficulty if getline receives a pointer. One question remains, though: we had been defining getline with its line parameter declared as an array:

    int getline(char line[], int max)
    {
    ...
    }

We mentioned that we didn't have to specify a size for the line parameter, with the explanation that getline really used the array in its caller, where the actual size was specified. But that declaration certainly does look like an array; how can it work when getline actually receives a pointer? The answer is that the C compiler does a little something behind your back. It knows that whenever you mention an array name in an expression, it (the compiler) generates a pointer to the array's first element. Therefore, it knows that a function can never actually receive an array as a parameter. Therefore, whenever it sees you defining a function that seems to accept an array as a parameter, the compiler quietly pretends that you had declared it as accepting a pointer, instead. The definition of getline above is compiled exactly as if it had been written

    int getline(char *line, int max)
    {
    ...
    }

Let's look at how getline might be written if we thought of its first parameter (argument) as a pointer, instead:

    int getline(char *line, int max)
    {
    int nch = 0;
    int c;
    max = max - 1;          /* leave room for '\0' */

    #ifndef FGETLINE
    while((c = getchar()) != EOF)
    #else
    while((c = getc(fp)) != EOF)
    #endif
        {
        if(c == '\n')
            break;

        if(nch < max)
            {
            *(line + nch) = c;
            nch = nch + 1;
            }
        }

    if(c == EOF && nch == 0)
        return EOF;

    *(line + nch) = '\0';
    return nch;
    }

But, as we've learned, we can also use "array subscript" notation with pointers, so we could rewrite the pointer version of getline like this:

    int getline(char *line, int max)
    {
    int nch = 0;
    int c;
    max = max - 1;          /* leave room for '\0' */

    #ifndef FGETLINE
    while((c = getchar()) != EOF)
    #else
    while((c = getc(fp)) != EOF)
    #endif
        {
        if(c == '\n')
            break;

        if(nch < max)
            {
            line[nch] = c;
            nch = nch + 1;
            }
        }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
    }

But this is exactly what we'd written before (see chapter 6, Sec. 6.3), except that the declaration of the line parameter is different. In other words, within the body of the function, it hardly matters whether we thought line was an array or a pointer, since we can use array subscripting notation with both arrays and pointers.

These games that the compiler is playing with arrays and pointers may seem bewildering at first, and it may seem faintly miraculous that everything comes out in the wash when you declare a function like getline that seems to accept an array. The equivalence in C between arrays and pointers can be confusing, but it does work and is one of the central features of C. If the games which the compiler plays (pretending that you declared a parameter as a pointer when you thought you declared it as an array) bother you, you can do two things:

1. Continue to pretend that functions can receive arrays as parameters; declare and use them that way, but remember that unlike other arguments, a function can modify the copy in its caller of an argument that (seems to be) an array.
2. Realize that arrays are always passed to functions as pointers, and always declare your functions as accepting pointers.
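The two choices can be put side by side. In the sketch below, which uses invented function names, the first two functions differ only in how the parameter is declared, and the third shows the ordinary call-by-value behavior for contrast.

```c
/* Declared with array notation; the compiler treats a as int *a. */
void fill_array(int a[], int n, int value)
{
    int i;
    for(i = 0; i < n; i++)
        a[i] = value;       /* modifies the caller's array */
}

/* Declared with pointer notation; behaves identically. */
void fill_pointer(int *a, int n, int value)
{
    int i;
    for(i = 0; i < n; i++)
        *(a + i) = value;   /* same cells, reached by pointer arithmetic */
}

/* An ordinary int argument is a copy; changing it does not affect the caller. */
int try_to_change(int x)
{
    x = 99;
    return x;
}
```

Because what arrives is just a pointer, a caller can also hand either function part of an array, for example fill_pointer(a + 1, 2, 5) to set only a[1] and a[2].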

10.7 Strings

Because of the "equivalence" of arrays and pointers, it is extremely common to refer to and manipulate strings as character pointers, or char *'s. It is so common, in fact, that it is easy to forget that strings are arrays, and to imagine that they're represented by pointers. (Actually, in the case of strings, it may not even matter that much if the distinction gets a little blurred; there's certainly nothing wrong with referring to a character pointer, suitably initialized, as a "string.") Let's look at a few of the implications:

1. Any function that manipulates a string will actually accept it as a char * argument. The caller may pass an array containing a string, but the function will receive a pointer to the array's (string's) first element (character).
2. The %s format in printf expects a character pointer.
3. Although you have to use strcpy to copy a string from one array to another, you can use simple pointer assignment to assign a string to a pointer. The string being assigned might either be in an array or pointed to by another pointer. In other words, given

       char string[] = "Hello, world!";
       char *p1, *p2;

   both

       p1 = string

   and

       p2 = p1

   are legal. (Remember, though, that when you assign a pointer, you're making a copy of the pointer but not of the data it points to. In the first example, p1 ends up pointing to the string in string. In the second example, p2 ends up pointing to the same string as p1. In any case, after a pointer assignment, if you ever change the string (or other data) pointed to, the change is "visible" to both pointers.)
4. Many programs manipulate strings exclusively using character pointers, never explicitly declaring any actual arrays. As long as these programs are careful to allocate appropriate memory for the strings, they're perfectly valid and correct.

When you start working heavily with strings, however, you have to be aware of one subtle fact. When you initialize a character array with a string constant:
    char string[] = "Hello, world!";

you end up with an array containing the string, and you can modify the array's contents to your heart's content:

    string[0] = 'J';

However, it's possible to use string constants (the formal term is string literals) at other places in your code. Since they're arrays, the compiler generates pointers to their first elements when they're used in expressions, as usual. That is, if you say

    char *p1 = "Hello";
    int len = strlen("world");

it's almost as if you'd said

    char internal_string_1[] = "Hello";
    char internal_string_2[] = "world";
    char *p1 = &internal_string_1[0];
    int len = strlen(&internal_string_2[0]);

Here, the arrays named internal_string_1 and internal_string_2 are supposed to suggest the fact that the compiler is actually generating little unnamed arrays every time you use a string constant in your code. However, the subtle fact is that the arrays which are "behind" the string constants are not necessarily modifiable. In particular, the compiler may store them in read-only memory. Therefore, if you write

    char *p3 = "Hello, world!";
    p3[0] = 'J';

your program may crash, because it may try to store a value (in this case, the character 'J') into nonwritable memory.

The moral is that whenever you're building or modifying strings, you have to make sure that the memory you're building or modifying them in is writable. That memory should either be an array you've allocated, or some memory which you've dynamically allocated by the techniques which we'll see in the next chapter. Make sure that no part of your program will ever try to modify a string which is actually one of the unnamed, unwritable arrays which the compiler generated for you in response to one of your string constants. (The only exception is array initialization, because if you write to such an array, you're writing to the array, not to the string literal which you used to initialize the array.)
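One safe pattern that follows from this moral is to copy a literal's characters into memory you own before changing them. Here is a minimal sketch; the function name make_jello is hypothetical, and it assumes the caller supplies at least 14 writable chars (13 characters plus the terminating '\0').

```c
#include <string.h>     /* for strcpy */

/* Hypothetical helper: dest must point to at least 14 writable chars. */
void make_jello(char *dest)
{
    char *p = "Hello, world!";  /* p points into a string literal: treat as read-only */

    strcpy(dest, p);            /* copy the characters into the caller's array */
    dest[0] = 'J';              /* legal: this modifies the copy, not the literal */
}
```

A caller would supply the writable memory itself, for instance by declaring char buf[14]; and then calling make_jello(buf);.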

10.8 Example: Breaking a Line into "Words"

In an earlier assignment, an "extra credit" version of a problem asked you to write a little checkbook balancing program that accepted a series of lines of the form

    deposit 1000
    check 10
    check 12.34
    deposit 50
    check 20

It was a surprising nuisance to do this in an ad hoc way, using only the tools we had at the time. It was easy to read each line, but it was cumbersome to break it up into the word ("deposit" or "check") and the amount.

I find it very convenient to use a more general approach: first, break lines like these into a series of whitespace-separated words, then deal with each word separately. To do this, we will use an array of pointers to char, which we can also think of as an "array of strings," since a string is an array of char, and a pointer-to-char can easily point at a string. Here is the declaration of such an array:

    char *words[10];

1his is the first com'licated C declaration we've seen: it says that words is an array of 5< 'ointers to char. 8e're going to write a function, %etwords, which we can call li(e this:
int nwords;

nwords ) %etwords(line, words, * );

where line is the line we're brea(ing into words, words is the array to be filled in with the ('ointers to the) words, and nwords (the return value from %etwords) is the number of words which the function finds. (#s with %etline, we tell the function the si2e of the array so that if the line should ha''en to contain more words than that, it won't overflow the array). *ere is the definition of the %etwords function. "t finds the beginning of each word, 'laces a 'ointer to it in the array, finds the end of that word (which is signified by at least one whites'ace character) and terminates the word by 'lacing a 4\ 4 character after it. (1he 4\ 4 character will overwrite the first whites'ace character following the word.) ;ote that the original in'ut string is therefore modified by %etwords: if you were to try to 'rint the in'ut line after calling %etwords, it would a''ear to contain only its first word (because of the first inserted 4\ 4).
    #include <stddef.h>
    #include <ctype.h>

    int getwords(char *line, char *words[], int maxwords)
    {
    char *p = line;
    int nwords = 0;

    while(1)
        {
        /* skip whitespace; the cast keeps isspace's argument nonnegative */
        while(isspace((unsigned char)*p))
            p++;

        if(*p == '\0')
            return nwords;

        words[nwords++] = p;        /* beginning of a word */

        while(!isspace((unsigned char)*p) && *p != '\0')
            p++;

        if(*p == '\0')
            return nwords;

        *p++ = '\0';                /* terminate the word */

        if(nwords >= maxwords)
            return nwords;
        }
    }

Each time through the outer while loop, the function tries to find another word. First it skips over whitespace (which might be leading spaces on the line, or the space(s) separating this word from the previous one). The isspace function is new: it's in the standard library, declared in the header file <ctype.h>, and it returns nonzero ("true") if the character you hand it is a space character (a space or a tab, or any other whitespace character there might happen to be).

When the function finds a non-whitespace character, it has found the beginning of another word, so it places the pointer to that character in the next cell of the words array. Then it steps through the word, looking at non-whitespace characters, until it finds another whitespace character, or the \0 at the end of the line. If it finds the \0, it's done with the entire line; otherwise, it changes the whitespace character to a \0, to terminate the word it's just found, and continues. (If it's found as many words as will fit in the words array, it returns prematurely.)

Each time it finds a word, the function increments the number of words (nwords) it has found. Since arrays in C start at [0], the number of words the function has found so far is also the index of the cell in the words array where the next word should be stored. The function actually assigns the next word and increments nwords in one expression:

    words[nwords++] = p;

You should convince yourself that this arrangement works, and that (in this case) the preincrement form

    words[++nwords] = p;        /* WRONG */

would not behave as desired. When the function is done (when it finds the \0 terminating the input line, or when it runs out of cells in the words array) it returns the number of words it has found. Here is a complete example of calling getwords:

    char line[] = "this is a test";
    int i;

    nwords = getwords(line, words, 10);
    for(i = 0; i < nwords; i++)
        printf("%s\n", words[i]);
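
Returning to the checkbook lines that motivated getwords, here is a minimal sketch of how the pieces might fit together. The helper name apply_line and its return convention are assumptions made for this illustration, not part of the original assignment; the getwords shown is a compact variant of the one above, and amounts are converted with atof, which does no error checking.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compact variant of getwords: splits line in place, returns word count. */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while(nwords < maxwords)
    {
        while(isspace((unsigned char)*p))
            p++;
        if(*p == '\0')
            break;
        words[nwords++] = p;
        while(*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if(*p == '\0')
            break;
        *p++ = '\0';
    }
    return nwords;
}

/* Hypothetical helper: apply one "deposit N" or "check N" line to
 * *balance.  Returns 1 if the line was understood, 0 otherwise.
 * Note that line is modified, because getwords modifies it. */
int apply_line(char *line, double *balance)
{
    char *words[2];
    int nwords = getwords(line, words, 2);

    if(nwords != 2)
        return 0;
    if(strcmp(words[0], "deposit") == 0)
        *balance += atof(words[1]);
    else if(strcmp(words[0], "check") == 0)
        *balance -= atof(words[1]);
    else
        return 0;
    return 1;
}
```

A caller would read each line with getline and hand it to apply_line, printing a complaint whenever it returns 0. Because only two cells are passed to getwords, any extra words on a line are silently ignored in this sketch.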

Chapter 11: Memory Allocation

In this chapter, we'll meet malloc, C's dynamic memory allocation function, and we'll cover dynamic memory allocation in some detail. As we begin doing dynamic memory allocation, we'll begin to see (if we haven't seen it already) what pointers can really be good for. Many of the pointer examples in the previous chapter (those which used pointers to access arrays) didn't do all that much for us that we couldn't have done using arrays. However, when we begin doing dynamic memory allocation, pointers are the only way to go, because what malloc returns is a pointer to the memory it gives us. (Due to the equivalence between pointers and arrays, though, we will still be able to think of dynamically allocated regions of storage as if they were arrays, and even to use array-like subscripting notation on them.)

You have to be careful with dynamic memory allocation. malloc operates at a pretty "low level"; you will often find yourself having to do a certain amount of work to manage the memory it gives you. If you don't keep accurate track of the memory which malloc has given you, and the pointers of yours which point to it, it's all too easy to accidentally use a pointer which points "nowhere", with generally unpleasant results. (The basic problem is that if you assign a value to the location pointed to by a pointer:

    *p = 0;

and if the pointer p points "nowhere", well actually it can be construed to point somewhere, just not where you wanted it to, and that "somewhere" is where the 0 gets written. If the "somewhere" is memory which is in use by some other part of your program, or even worse, if the operating system has not protected itself from you and "somewhere" is in fact in use by the operating system, things could get ugly.)

11.1 Allocating Memory with malloc

[This section corresponds to parts of K&R Secs. 5.4, 5.6, 6.5, and 7.8.5]

A problem with many simple programs, including in particular little teaching programs such as we've been writing so far, is that they tend to use fixed-size arrays which may or may not be big enough. We have an array of 100 ints for the numbers which the user enters and wishes to find the average of. What if the user enters 101 numbers? We have an array of 100 chars which we pass to getline to receive the user's input. What if the user types a line of 200 characters? If we're lucky, the relevant parts of the program check how much of an array they've used, and print an error message or otherwise gracefully abort before overflowing the array. If we're not so lucky, a program may sail off the end of an array, overwriting other data and behaving quite badly. In either case, the user doesn't get his job done.

How can we avoid the restrictions of fixed-size arrays? The answers all involve the standard library function malloc. Very simply, malloc returns a pointer to n bytes of memory which we can do anything we want to with. If we didn't want to read a line of input into a fixed-size array, we could use malloc, instead. Here's the first step:
    #include <stdlib.h>

    char *line;
    int linelen = 100;
    line = malloc(linelen);
    /* incomplete -- malloc's return value not checked */
    getline(line, linelen);

malloc is declared in <stdlib.h>, so we #include that header in any program that calls malloc. A "byte" in C is, by definition, an amount of storage suitable for storing one character, so the above invocation of malloc gives us exactly as many chars as we ask for. We could illustrate the resulting pointer like this:

[Figure: line pointing at a row of unnamed boxes, the bytes allocated by malloc]

The 100 bytes of memory (not all of which are shown) pointed to by line are those allocated by malloc. (They are brand-new memory, conceptually a bit different from the memory which the compiler arranges to have allocated automatically for our conventional variables. The 100 boxes in the figure don't have a name next to them, because they're not storage for a variable we've declared.)

As a second example, we might have occasion to allocate a piece of memory, and to copy a string into it with strcpy:

    char *p = malloc(15);
    /* incomplete -- malloc's return value not checked */
    strcpy(p, "Hello, world!");

When copying strings, remember that all strings have a terminating \0 character. If you use strlen to count the characters in a string for you, that count will not include the trailing \0, so you must add one before calling malloc:

    char *somestring, *copy;
    ...
    copy = malloc(strlen(somestring) + 1);      /* +1 for \0 */
    /* incomplete -- malloc's return value not checked */
    strcpy(copy, somestring);
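
The strlen-plus-one pattern is easy to wrap in a small function. The following is a minimal sketch, with the helper name copy_string chosen only for this illustration; unlike the fragments above, it does check malloc's return value and passes any failure back to the caller as a null pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of s,
 * or NULL if malloc fails.  The caller must eventually free it. */
char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);     /* +1 for the trailing \0 */

    if(copy == NULL)
        return NULL;
    strcpy(copy, s);
    return copy;
}
```

Forgetting the + 1 would make strcpy write the terminating \0 one byte past the end of the allocated memory.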

What if we're not allocating characters, but integers? If we want to allocate 100 ints, how many bytes is that? If we know how big ints are on our machine (i.e. depending on whether we're using a 16- or 32-bit machine) we could try to compute it ourselves, but it's much safer and more portable to let C compute it for us. C has a sizeof operator, which computes the size, in bytes, of a variable or type. It's just what we need when calling malloc. To allocate space for 100 ints, we could call

    int *ip = malloc(100 * sizeof(int));

The use of the sizeof operator tends to look like a function call, but it's really an operator, and it does its work at compile time.

Since we can use array indexing syntax on pointers, we can treat a pointer variable after a call to malloc almost exactly as if it were an array. In particular, after the above call to malloc initializes ip to point at storage for 100 ints, we can access ip[0], ip[1], ... up to ip[99]. This way, we can get the effect of an array even if we don't know until run time how big the "array" should be. (In a later section we'll see how we might deal with the case where we're not even sure at the point we begin using it how big an "array" will eventually have to be.)

Our examples so far have all had a significant omission: they have not checked malloc's return value. Obviously, no real computer has an infinite amount of memory available, so there is no guarantee that malloc will be able to give us as much memory as we ask for. If we call malloc(100000000), or if we call malloc(10) 10,000,000 times, we're probably going to run out of memory.

When malloc is unable to allocate the requested memory, it returns a null pointer. A null pointer, remember, points definitively nowhere. It's a "not a pointer" marker; it's not a pointer you can use. (As we said in section 9.4, a null pointer can be used as a failure return from a function that returns pointers, and malloc is a perfect example.) Therefore, whenever you call malloc, it's vital to check the returned pointer before using it! If you call malloc, and it returns a null pointer, and you go off and use that null pointer as if it pointed somewhere, your program probably won't last long. Instead, a program should immediately check for a null pointer, and if it receives one, it should at the very least print an error message and exit, or perhaps figure out some way of proceeding without the memory it asked for. But it cannot go on to use the null pointer it got back from malloc in any way, because that null pointer by definition points nowhere. ("It cannot use a null pointer in any way" means that the program cannot use the * or [] operators on such a pointer value, or pass it to any function that expects a valid pointer.)

A call to malloc, with an error check, typically looks something like this:

    int *ip = malloc(100 * sizeof(int));
    if(ip == NULL)
        {
        printf("out of memory\n");
        /* exit or return */
        }

After printing the error message, this code should return to its caller, or exit from the program entirely; it cannot proceed with the code that would have used ip.

Of course, in our examples so far, we've still limited ourselves to "fixed size" regions of memory, because we've been calling malloc with fixed arguments like 10 or 100. (Our call to getline is still limited to 100-character lines, or whatever number we set the linelen variable to; our ip variable still points at only 100 ints.) However, since the sizes are now values which can in principle be determined at run-time, we've at least moved beyond having to recompile the program (with a bigger array) to accommodate longer lines, and with a little more work, we could arrange that the "arrays" automatically grew to be as large as required. (For example, we could write something like getline which could read the longest input line actually seen.) We'll begin to explore this possibility in a later section.
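
To make the "size determined at run time" point concrete, here is a minimal sketch of an allocation whose element count arrives as a parameter. The helper name make_ints is hypothetical, and the overflow test against SIZE_MAX (from <stdint.h>) is an extra precaution not discussed in the text: it guards against n * sizeof(int) wrapping around before malloc ever sees it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an "array" of n ints, where n is only
 * known at run time, and set every element to fill.
 * Returns NULL if n is 0, if n * sizeof(int) would overflow,
 * or if malloc fails. */
int *make_ints(size_t n, int fill)
{
    int *ip;
    size_t i;

    if(n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;

    ip = malloc(n * sizeof(int));
    if(ip == NULL)
        return NULL;            /* the caller must check for this */

    for(i = 0; i < n; i++)
        ip[i] = fill;
    return ip;
}
```

The caller is still obliged to test the result for a null pointer before subscripting it, exactly as with a direct call to malloc.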

11.2 Freeing Memory


Memory allocated with malloc lasts as long as you want it to. It does not automatically disappear when a function returns, as automatic-duration variables do, but it does not have to remain for the entire duration of your program, either. Just as you can use malloc to control exactly when and how much memory you allocate, you can also control exactly when you deallocate it.

In fact, many programs use memory on a transient basis. They allocate some memory, use it for a while, but then reach a point where they don't need that particular piece any more. Because memory is not inexhaustible, it's a good idea to deallocate (that is, release or free) memory you're no longer using.

Dynamically allocated memory is deallocated with the free function. If p contains a pointer previously returned by malloc, you can call
free(p);

which will "give the memory back" to the stock of memory (sometimes called the "arena" or "pool") from which malloc requests are satisfied. Calling free is sort of the ultimate in recycling: it costs you almost nothing, and the memory you give back is immediately usable by other parts of your program. (Theoretically, it may even be usable by other programs.)

(Freeing unused memory is a good idea, but it's not mandatory. When your program exits, any memory which it has allocated but not freed should be automatically released. If your computer were to somehow "lose" memory just because your program forgot to free it, that would indicate a problem or deficiency in your operating system.)

Naturally, once you've freed some memory you must remember not to use it any more. After calling
free(p);

it is probably the case that p still points at the same memory. However, since we've given it back, it's now "available," and a later call to malloc might give that memory to some other part of your program. If the variable p is a global variable or will otherwise stick around for a while, one good way to record the fact that it's not to be used any more would be to set it to a null pointer:

    free(p);
    p = NULL;

Now we don't even have the pointer to the freed memory any more, and (as long as we check to see that p is non-NULL before using it), we won't misuse any memory via the pointer p.

When thinking about malloc, free, and dynamically-allocated memory in general, remember again the distinction between a pointer and what it points to. If you call malloc to allocate some memory, and store the pointer which malloc gives you in a local pointer variable, what happens when the function containing the local pointer variable returns? If the local pointer variable has automatic duration (which is the default, unless the variable is declared static), it will disappear when the function returns. But for the pointer variable to disappear says nothing about the memory pointed to! That memory still exists and, as far as malloc and free are concerned, is still allocated. The only thing that has disappeared is the pointer variable you had which pointed at the allocated memory. (Furthermore, if it contained the only copy of the pointer you had, once it disappears, you'll have no way of freeing the memory, and no way of using it, either. Using memory and freeing memory both require that you have at least one pointer to the memory!)
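
The distinction between a pointer variable and the memory it points to can be seen in a function that allocates memory and hands the pointer back to its caller. This is only a sketch: the names make_sequence and release are hypothetical, and release takes a pointer to a pointer, a construct these notes have not formally covered, purely so that it can set the caller's variable to NULL.

```c
#include <stdlib.h>

/* Hypothetical helper: build the sequence 1, 2, ..., n in newly
 * allocated memory.  The local variable ip disappears when the
 * function returns, but the memory does not; the returned copy of
 * the pointer is the caller's only handle on it. */
int *make_sequence(int n)
{
    int *ip;
    int i;

    if(n <= 0)
        return NULL;
    ip = malloc(n * sizeof(int));
    if(ip == NULL)
        return NULL;
    for(i = 0; i < n; i++)
        ip[i] = i + 1;
    return ip;
}

/* Hypothetical helper: free the memory and set the caller's pointer
 * to NULL so that a stale pointer is not used by accident.
 * Calling free on a null pointer is harmless. */
void release(int **pp)
{
    free(*pp);
    *pp = NULL;
}
```

If the caller discarded make_sequence's return value, the memory would remain allocated with no pointer left through which to use or free it.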

11.3 Reallocating Memory Blocks


Sometimes you're not sure at first how much memory you'll need. For example, if you need to store a series of items you read from the user, and if the only way to know how many there are is to read them until the user types some "end" signal, you'll have no way of knowing, as you begin reading and storing the first few, how many you'll have seen by the time you do see that "end" marker. You might want to allocate room for, say, 100 items, and if the user enters a 101st item before entering the "end" marker, you might wish for a way to say "uh, malloc, remember those 100 items I asked for? Could I change my mind and have 200 instead?"

In fact, you can do exactly this, with the realloc function. You hand realloc an old pointer (such as you received from an initial call to malloc) and a new size, and realloc does what it can to give you a chunk of memory big enough to hold the new size. For example, if we wanted the ip variable from an earlier example to point at 200 ints instead of 100, we could try calling
    ip = realloc(ip, 200 * sizeof(int));

Since you always want each block of dynamically-allocated memory to be contiguous (so that you can treat it as if it were an array), you and realloc have to worry about the case where realloc can't make the old block of memory bigger "in place," but rather has to relocate it elsewhere in order to find enough contiguous space for the new requested size. realloc does this by returning a new pointer. If realloc was able to make the old block of memory bigger, it returns the same pointer. If realloc has to go elsewhere to get enough contiguous memory, it returns a pointer to the new memory, after copying your old data there. (In this case, after it makes the copy, it frees the old block.) Finally, if realloc can't find enough memory to satisfy the new request at all, it returns a null pointer. Therefore, you usually don't want to overwrite your old pointer with realloc's return value until you've tested it to make sure it's not a null pointer. You might use code like this:

    int *newp;
    newp = realloc(ip, 200 * sizeof(int));
    if(newp != NULL)
        ip = newp;
    else
        {
        printf("out of memory\n");
        /* exit or return */
        /* but ip still points at 100 ints */
        }

If realloc returns something other than a null pointer, it succeeded, and we set ip to what it returned. (We've either set ip to what it used to be or to a new pointer, but in either case, it points to where our data is now.) If realloc returns a null pointer, however, we hang on to our old pointer in ip which still points at our original 100 values.

Putting this all together, here is a piece of code which reads lines of text from the user, treats each line as an integer by calling atoi, and stores each integer in a dynamically-allocated "array":

    #define MAXLINE 100

    char line[MAXLINE];
    int *ip;
    int nalloc, nitems;

    nalloc = 100;
    ip = malloc(nalloc * sizeof(int));      /* initial allocation */
    if(ip == NULL)
        {
        printf("out of memory\n");
        exit(1);
        }

    nitems = 0;

    while(getline(line, MAXLINE) != EOF)
        {
        if(nitems >= nalloc)
            {                               /* increase allocation */
            int *newp;
            nalloc += 100;
            newp = realloc(ip, nalloc * sizeof(int));
            if(newp == NULL)
                {
                printf("out of memory\n");
                exit(1);
                }
            ip = newp;
            }

        ip[nitems++] = atoi(line);
        }

We use two different variables to keep track of the "array" pointed to by ip. nalloc is how many elements we've allocated, and nitems is how many of them are in use. Whenever we're about to store another item in the "array," if nitems >= nalloc, the old "array" is full, and it's time to call realloc to make it bigger.

Finally, we might ask what the return type of malloc and realloc is, if they are able to return pointers to char or pointers to int or (though we haven't seen it yet) pointers to any other type. The answer is that both of these functions are declared (in <stdlib.h>) as returning a type we haven't seen, void * (that is, pointer to void). We haven't really seen type void, either, but what's going on here is that void * is specially defined as a "generic" pointer type, which may be used with (strictly speaking, assigned to or from) any pointer-to-data type.
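
The nalloc/nitems bookkeeping can also be packaged as a function. The sketch below is one possible arrangement, not the method of the code above: the name append_int is hypothetical, it doubles the allocation instead of adding 100 each time, and it relies on the fact that realloc given a null pointer behaves like malloc, so the "array" can start out empty. It ignores the possibility of the doubled count overflowing an int.

```c
#include <stdlib.h>

/* Hypothetical helper: append value to the "array" *ipp, growing it
 * with realloc when it is full.  *nallocp is the number of elements
 * allocated and *nitemsp the number in use.  Returns 1 on success,
 * or 0 if out of memory (the old array is then left untouched). */
int append_int(int **ipp, int *nallocp, int *nitemsp, int value)
{
    if(*nitemsp >= *nallocp)
    {
        int newalloc = (*nallocp > 0) ? *nallocp * 2 : 4;
        int *newp = realloc(*ipp, newalloc * sizeof(int));

        if(newp == NULL)
            return 0;           /* *ipp still points at the old data */
        *ipp = newp;
        *nallocp = newalloc;
    }
    (*ipp)[(*nitemsp)++] = value;
    return 1;
}
```

A caller would begin with int *ip = NULL; int nalloc = 0, nitems = 0; and call append_int(&ip, &nalloc, &nitems, atoi(line)) once per input line, freeing ip when done.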

11.4 Pointer Safety


At the beginning of the previous chapter, we said that the hard thing about pointers is not so much manipulating them as ensuring that the memory they point to is valid. When a pointer doesn't point where you think it does, if you inadvertently access or modify the memory it points to, you can damage other parts of your program, or (in some cases) other programs or the operating system itself!

When we use pointers to simple variables, as in section 10.1, there's not much that can go wrong. When we use pointers into arrays, as in section 10.2, and begin moving the pointers around, we have to be more careful, to ensure that the roving pointers always stay within the bounds of the array(s). When we begin passing pointers to functions, and especially when we begin returning them from functions (as in the strstr function of section 10.4) we have to be more careful still, because the code using the pointer may be far removed from the code which owns or allocated the memory.

One particular problem concerns functions that return pointers. Where is the memory to which the returned pointer points? Is it still around by the time the function returns? The strstr function returns either a null pointer (which points definitively nowhere, and which the caller presumably checks for) or it returns a pointer which points into the input string, which the caller supplied, which is pretty safe. One thing a function must not do, however, is return a pointer to one of its own, local, automatic-duration arrays. Remember that automatic-duration variables (which includes all non-static local variables), including automatic-duration arrays, are deallocated and disappear when the function returns. If a function returns a pointer to a local array, that pointer will be invalid by the time the caller tries to use it.

Finally, when we're doing dynamic memory allocation with malloc, realloc, and free, we have to be most careful of all. Dynamic allocation gives us a lot more flexibility in how our programs use memory, although with that flexibility comes the responsibility that we manage dynamically allocated memory carefully. The possibilities for misdirected pointers and associated mayhem are greatest in programs that make heavy use of dynamic memory allocation. You can reduce these possibilities by designing your program in such a way that it's easy to ensure that pointers are used correctly and that memory is always allocated and deallocated correctly. (If, on the other hand, your program is designed in such a way that meeting these guarantees is a tedious nuisance, sooner or later you'll forget or neglect to, and maintenance will be a nightmare.)
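
The rule about not returning a pointer to a local array is easier to remember next to two alternatives that do work. In this sketch the function names are hypothetical, the buffer size of 32 is an arbitrary choice large enough for the text produced, and the incorrect version appears only inside a comment so that the rest compiles cleanly.

```c
#include <stdio.h>
#include <stdlib.h>

/* WRONG (shown only as a comment): buf is an automatic array, so it
 * is gone by the time the caller looks at the returned pointer.
 *
 *     char *label_bad(int n)
 *     {
 *         char buf[32];
 *         sprintf(buf, "item %d", n);
 *         return buf;
 *     }
 */

/* Option 1: the caller owns the buffer (at least 32 chars) and
 * passes it in, so the returned pointer stays valid. */
char *label_into(char *buf, int n)
{
    sprintf(buf, "item %d", n);
    return buf;
}

/* Option 2: allocate the result; the caller must check for NULL
 * and must eventually free it. */
char *label_alloc(int n)
{
    char *buf = malloc(32);

    if(buf == NULL)
        return NULL;
    sprintf(buf, "item %d", n);
    return buf;
}
```

Option 1 keeps ownership obvious at the call site; option 2 is more convenient but adds one more block of memory that somebody has to remember to free.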

Chapter 12: Input and Output


So far, we've been calling printf to print formatted output to the "standard output" (wherever that is). We've also been calling getchar to read single characters from the "standard input," and putchar to write single characters to the standard output. "Standard input" and "standard output" are two predefined I/O streams which are implicitly available to us. In this chapter we'll learn how to take control of input and output by opening our own streams, perhaps connected to data files, which we can read from and write to.

12.1 File Pointers and fopen


[This section corresponds to K&R Sec. 7.5]

How will we specify that we want to access a particular data file? It would theoretically be possible to mention the name of a file each time it was desired to read from or write to it. But such an approach would have a number of drawbacks. Instead, the usual approach (and the one taken in C's stdio library) is that you mention the name of the file once, at the time you open it. Thereafter, you use some little token (in this case, the file pointer) which keeps track (both for your sake and the library's) of which file you're talking about. Whenever you want to read from or write to one of the files you're working with, you identify that file by using its file pointer (that is, the file pointer you obtained when you opened the file). As we'll see, you store file pointers in variables just as you store any other data you manipulate, so it is possible to have several files open, as long as you use distinct variables to store the file pointers.

You declare a variable to store a file pointer like this:

    FILE *fp;

The type FILE is predefined for you by <stdio.h>. It is a data structure which holds the information the standard I/O library needs to keep track of the file for you. For historical reasons, you declare a variable which is a pointer to this FILE type. The name of the variable can (as for any variable) be anything you choose; it is traditional to use the letters fp in the variable name (since we're talking about a file pointer). If you were reading from two files at once you'd probably use two file pointers:

    FILE *fp1, *fp2;

If you were reading from one file and writing to another you might declare an input file pointer and an output file pointer:

    FILE *ifp, *ofp;

Like any pointer variable, a file pointer isn't any good until it's initialized to point to something. (Actually, no variable of any type is much good until you've initialized it.) To actually open a file, and receive the "token" which you'll store in your file pointer variable, you call fopen. fopen accepts a file name (as a string) and a mode value indicating among other things whether you intend to read or write this file. (The mode variable is also a string.) To open the file input.dat for reading you might call

    ifp = fopen("input.dat", "r");

The mode string "r" indicates reading. Mode "w" indicates writing, so we could open output.dat for output like this:

    ofp = fopen("output.dat", "w");

The other values for the mode string are less frequently used. The third major mode is "a" for append. (If you use "w" to write to a file which already exists, its old contents will be discarded.) You may also add a + character to the mode string to indicate that you want to both read and write, or a b character to indicate that you want to do "binary" (as opposed to text) I/O.

One thing to beware of when opening files is that it's an operation which may fail. The requested file might not exist, or it might be protected against reading or writing. (These possibilities ought to be obvious, but it's easy to forget them.) fopen returns a null pointer if it can't open the requested file, and it's important to check for this case before going off and using fopen's return value as a file pointer. Every call to fopen will typically be followed with a test, like this:

    ifp = fopen("input.dat", "r");
    if(ifp == NULL)
        {
        printf("can't open file\n");
        /* exit or return */
        }

If fopen returns a null pointer, and you store it in your file pointer variable and go off and try to do I/O with it, your program will typically crash.

It's common to collapse the call to fopen and the assignment in with the test:

    if((ifp = fopen("input.dat", "r")) == NULL)
        {
        printf("can't open file\n");
        /* exit or return */
        }

You don't have to write these "collapsed" tests if you're not comfortable with them, but you'll see them in other people's code, so you should be able to read them.

12.2 I/O with File Pointers


For each of the I/O library functions we've been using so far, there's a companion function which accepts an additional file pointer argument telling it where to read from or write to. The companion function to printf is fprintf, and the file pointer argument comes first. To print a string to the output.dat file we opened in the previous section, we might call
fprintf(ofp, "Hello, world!\n");

The companion function to getchar is getc, and the file pointer is its only argument. To read a character from the input.dat file we opened in the previous section, we might call

    int c;
    c = getc(ifp);

The companion function to putchar is putc, and the file pointer argument comes last. To write a character to output.dat, we could call
putc(c, ofp);

Our own getline function calls getchar and so always reads the standard input. We could write a companion fgetline function which reads from an arbitrary file pointer:

    #include <stdio.h>

    /* Read one line from fp, */
    /* copying it to line array (but no more than max chars). */
    /* Does not place terminating \n in line array. */
    /* Returns line length, or 0 for empty line, or EOF for end-of-file. */

    int fgetline(FILE *fp, char line[], int max)
    {
    int nch = 0;
    int c;
    max = max - 1;          /* leave room for '\0' */

    while((c = getc(fp)) != EOF)
        {
        if(c == '\n')
            break;

        if(nch < max)
            {
            line[nch] = c;
            nch = nch + 1;
            }
        }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
    }

Now we could read one line from ifp by calling

    char line[MAXLINE];
    ...
    fgetline(ifp, line, MAXLINE);
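
Since getc and putc mirror getchar and putchar, the familiar character-copying loop carries over directly to file pointers. The following is a minimal sketch; the function name copy_stream is an assumption made for this illustration, and it does not check putc for write errors.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from ifp to ofp, one character
 * at a time.  Returns the number of characters copied. */
long copy_stream(FILE *ifp, FILE *ofp)
{
    int c;              /* int, not char, so EOF can be told apart */
    long nch = 0;

    while((c = getc(ifp)) != EOF)
    {
        putc(c, ofp);
        nch++;
    }
    return nch;
}
```

With ifp and ofp opened as in the previous section, copy_stream(ifp, ofp) would copy input.dat to output.dat.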

12.3 Predefined Streams


Besides the file pointers which we explicitly open by calling fopen, there are also three predefined streams. stdin is a constant file pointer corresponding to standard input, and stdout is a constant file pointer corresponding to standard output. Both of these can be used anywhere a file pointer is called for; for example, getchar() is the same as getc(stdin) and putchar(c) is the same as putc(c, stdout).

The third predefined stream is stderr. Like stdout, stderr is typically connected to the screen by default. The difference is that stderr is not redirected when the standard output is redirected. For example, under Unix or MS-DOS, when you invoke

    program > filename

anything printed to stdout is redirected to the file filename, but anything printed to stderr still goes to the screen. The intent behind stderr is that it is the "standard error output"; error messages printed to it will not disappear into an output file. For example, a more realistic way to print an error message when a file can't be opened would be

    if((ifp = fopen(filename, "r")) == NULL)
        {
        fprintf(stderr, "can't open file %s\n", filename);
        /* exit or return */
        }

where filename is a string variable indicating the file name to be opened. Not only is the error message printed to stderr, but it is also more informative in that it mentions the name of the file that couldn't be opened. (We'll see another example in the next chapter.)

12.4 Closing Files


Although you can open multiple files, there's a limit to how many you can have open at once. If your program will open many files in succession, you'll want to close each one as you're done with it; otherwise the standard I/O library could run out of the resources it uses to keep track of open files. Closing a file simply involves calling fclose with the file pointer as its argument:
fclose(fp);

Calling fclose arranges that (if the file was open for output) any last, buffered output is finally written to the file, and that those resources used by the operating system (and the C library) for this file are released. If you forget to close a file, it will be closed automatically when the program exits.
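
Because fclose is the point at which the last buffered output actually reaches the file, a careful program may want to look at its return value: fclose returns 0 on success and EOF if an error occurred. Here is a minimal sketch of a complete open, write, close sequence; the helper name write_line and its 0/-1 return convention are assumptions made for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write one line of text to the named file.
 * Returns 0 on success, -1 if the file can't be opened or written. */
int write_line(const char *filename, const char *text)
{
    FILE *ofp = fopen(filename, "w");
    int failed;

    if(ofp == NULL)
    {
        fprintf(stderr, "can't open %s\n", filename);
        return -1;
    }

    failed = (fprintf(ofp, "%s\n", text) < 0);

    /* fclose flushes any buffered output; it returns EOF on failure */
    if(fclose(ofp) == EOF)
        failed = 1;

    return failed ? -1 : 0;
}
```

Note that the file is closed on every path after a successful fopen, so the function never leaves a file open behind it.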

12.5 Example: Reading a Data File


Suppose you had a data file consisting of rows and columns of numbers:

    1    2    34
    5    6    78
    9    10   112

Suppose you wanted to read these numbers into an array. (Actually, the array will be an array of arrays, or a "multidimensional" array; see section 4.1.2.) We can write code to do this by putting together several pieces: the fgetline function we just showed, and the getwords function from chapter 10. Assuming that the data file is named input.dat, the code would look like this:
    #define MAXLINE 100
    #define MAXROWS 10
    #define MAXCOLS 10

    int array[MAXROWS][MAXCOLS];
    char *filename = "input.dat";
    FILE *ifp;
    char line[MAXLINE];
    char *words[MAXCOLS];
    int nrows = 0;
    int n;
    int i;

    ifp = fopen(filename, "r");
    if(ifp == NULL)
        {
        fprintf(stderr, "can't open %s\n", filename);
        exit(EXIT_FAILURE);
        }

    while(fgetline(ifp, line, MAXLINE) != EOF)
        {
        if(nrows >= MAXROWS)
            {
            fprintf(stderr, "too many rows\n");
            exit(EXIT_FAILURE);
            }

        n = getwords(line, words, MAXCOLS);

        for(i = 0; i < n; i++)
            array[nrows][i] = atoi(words[i]);
        nrows++;
        }

Each trip through the loop reads one line from the file, using fgetline. Each line is broken up into "words" using getwords; each "word" is actually one number. The numbers are however still represented as strings, so each one is converted to an int by calling atoi before being stored in the array. The code checks for two different error conditions (failure to open the input file, and too many lines in the input file) and if one of these conditions occurs, it prints an error message, and exits.

The exit function is a Standard library function which terminates your program. It is declared in <stdlib.h>, and accepts one argument, which will be the exit status of the program. EXIT_FAILURE is a code, also defined by <stdlib.h>, which indicates that the program failed. Success is indicated by a code of EXIT_SUCCESS, or simply 0. (These values can also be returned from main(); calling exit with a particular status value is essentially equivalent to returning that same status value from main.)
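
One limitation of the code above is that atoi gives no indication when a "word" is not actually a number. As a possible refinement (a swapped-in technique, not what the example uses), the standard function strtol reports where conversion stopped and whether the value was out of range. The helper name parse_int below is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the string s to an int.
 * Returns 1 and stores the result in *out on success; returns 0 if s
 * is empty, contains non-numeric characters, or is out of range. */
int parse_int(const char *s, int *out)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 10);

    if(end == s || *end != '\0')
        return 0;               /* no digits, or trailing junk */
    if(errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;               /* doesn't fit in an int */

    *out = (int)val;
    return 1;
}
```

In the reading loop, array[nrows][i] = atoi(words[i]) could then become a call to parse_int(words[i], &array[nrows][i]), with an error message printed to stderr whenever it returns 0.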

Chapter 13: Reading the Command Line


[This section corresponds to K&R Sec. 5.10]

We've mentioned several times that a program is rarely useful if it does exactly the same thing every time you run it. Another way of giving a program some variable input to work on is by invoking it with command line arguments.

(We should probably admit that command line user interfaces are a bit old-fashioned, and currently somewhat out of favor. If you've used Unix or MS-DOS, you know what a command line is, but if your experience is confined to the Macintosh or Microsoft Windows or some other Graphical User Interface, you may never have seen a command line. In fact, if you're learning C on a Mac or under Windows, it can be tricky to give your program a command line at all. Think C for the Macintosh provides a way; I'm not sure about other compilers. If your compilation environment doesn't provide an easy way of simulating an old-fashioned command line, you may skip this chapter.)

C's model of the command line is that it consists of a sequence of words, typically separated by whitespace. Your main program can receive these words as an array of strings, one word per string. In fact, the C run-time startup code is always willing to pass you this array, and all you have to do to receive it is to declare main as accepting two parameters, like this:

    int main(int argc, char *argv[])
    {
    ...
    }

When main is called, argc will be a count of the number of command-line arguments, and argv will be an array (``vector'') of the arguments themselves. Since each word is a string which is represented as a pointer-to-char, argv is an array-of-pointers-to-char. Since we are not defining the argv array, but merely declaring a parameter which references an array somewhere else (namely, in main's caller, the run-time startup code), we do not have to supply an array dimension for argv. (Actually, since functions never receive arrays as parameters in C, argv can also be thought of as a pointer-to-pointer-to-char, or char **. But multidimensional arrays and pointers to pointers can be confusing, and we haven't covered them, so we'll talk about argv as if it were an array.) (Also, there's nothing magic about the names argc and argv. You can give main's two parameters any names you like, as long as they have the appropriate types. The names argc and argv are traditional.)

The first program to write when playing with argc and argv is one which simply prints its arguments:
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
    int i;

    for(i = 0; i < argc; i++)
        printf("arg %d: %s\n", i, argv[i]);

    return 0;
    }

(This program is essentially the Unix or MS-DOS echo command.)

If you run this program, you'll discover that the set of ``words'' making up the command line includes the command you typed to invoke your program (that is, the name of your program). In other words, argv[0] typically points to the name of your program, and argv[1] is the first argument.

There are no hard-and-fast rules for how a program should interpret its command line. There is one set of conventions for Unix, another for MS-DOS, another for VMS. Typically you'll loop over the arguments, perhaps treating some as option flags and others as actual arguments (input files, etc.), interpreting or acting on each one. Since each argument is a string, you'll have to use strcmp or the like to match arguments against any patterns you might be looking for. Remember that argc contains the number of words on the command line, and that argv[0] is the command name, so if argc is 1, there are no arguments to inspect. (You'll never want to look at argv[i], for i >= argc, because it will be a null or invalid pointer.)

As another example, also illustrating fopen and the file I/O techniques of the previous chapter, here is a program which copies one or more input files to its standard output. Since ``standard output'' is usually the screen by default, this is therefore a useful program for displaying files. (It's analogous to the obscurely-named Unix cat command, and to the MS-DOS type command.) You might also want to compare this program to the character-copying program of section 6.2.
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
    int i;
    FILE *fp;
    int c;

    for(i = 1; i < argc; i++)
        {
        fp = fopen(argv[i], "r");
        if(fp == NULL)
            {
            fprintf(stderr, "cat: can't open %s\n", argv[i]);
            continue;
            }

        while((c = getc(fp)) != EOF)
            putchar(c);

        fclose(fp);
        }

    return 0;
    }

As a historical note, the Unix cat program is so named because it can be used to concatenate two files together, like this:

    cat a b > c

This illustrates why it's a good idea to print error messages to stderr, so that they don't get redirected. The ``can't open file'' message in this example also includes the name of the program as well as the name of the file.

Yet another piece of information which it's usually appropriate to include in error messages is the reason why the operation failed, if known. For operating system problems, such as inability to open a file, a code indicating the error is often stored in the global variable errno. The standard library function strerror will convert an errno value to a human-readable error message string. Therefore, an even more informative error message printout would be

    fp = fopen(argv[i], "r");
    if(fp == NULL)
        fprintf(stderr, "cat: can't open %s: %s\n", argv[i], strerror(errno));

If you use code like this, you can #include <errno.h> to get the declaration for errno, and <string.h> to get the declaration for strerror().
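
The two ideas in this chapter, matching arguments with strcmp and reporting failures with strerror, can be combined in one small sketch. The function name check_files and the "-v" option are assumptions made up for illustration; they do not come from the programs above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan the arguments the way main might.
 * "-v" is an assumed option flag; every other argument is taken to
 * be a file name.  Returns the number of files that could not be
 * opened, and sets *verbose to 1 if the flag was seen. */
int check_files(int argc, char *argv[], int *verbose)
{
    int i;
    int failures = 0;
    FILE *fp;

    *verbose = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-v") == 0) {
            *verbose = 1;           /* an option flag, not a file */
            continue;
        }
        fp = fopen(argv[i], "r");
        if (fp == NULL) {
            /* errno is often, though not always, set by a failed fopen */
            fprintf(stderr, "%s: can't open %s: %s\n",
                    argv[0], argv[i], strerror(errno));
            failures++;
            continue;
        }
        fclose(fp);
    }
    return failures;
}
```

A real main would simply pass its own argc and argv along, for example check_files(argc, argv, &verbose).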

Chapter 14: What's Next?


This last handout contains a brief list of the significant topics in C which we have not covered, and which you'll want to investigate further if you want to know all of C.

Types and Declarations


We have not talked about the void, short int, and long double types. void is a type with no values, used as a placeholder to indicate functions that do not return values or that accept no arguments, and in the ``generic'' pointer type void * that can point to anything. short int is an integer type that might use less space than a plain int; long double is a floating-point type that might have even more range or precision than plain double.

The char type and the various sizes of int also have ``unsigned'' versions, which are declared using the keyword unsigned. Unsigned types cannot hold negative values, but their behavior on overflow is guaranteed: results wrap around (arithmetic is done modulo one more than the largest value the type can hold). (Whether a plain char is signed or unsigned is implementation-defined; you can use the keyword signed to force a character type to contain signed characters.) Unsigned types are also useful when manipulating individual bits and bytes, when ``sign extension'' might otherwise be a problem.
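
As a minimal sketch of the two unsigned properties just mentioned (the function names are invented for illustration):

```c
#include <limits.h>

/* Unsigned arithmetic wraps around: the result is reduced
 * modulo UINT_MAX + 1 rather than overflowing. */
unsigned int add_wrapping(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Convert a plain char to a value in the range 0..UCHAR_MAX,
 * whether or not plain char is signed on this implementation.
 * Without the cast, a negative char would be sign-extended. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```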

Two additional type qualifiers const and volatile allow you to declare variables (or pointers to data) which you promise not to change, or which might change in unexpected ways behind the program's back.

There are user-defined structure and union types. A structure or struct is a ``record'' consisting of one or more values of one or more types concatenated together into one entity which can be manipulated as a whole. A union is a type which, at any one time, can hold a value from one of a specified set of types.

There are user-defined enumeration types (``enum'') which are like integers but which are meant to hold values from some fixed, predefined set, and for which the values are referred to by name instead of by number.

Pointers can point to functions as well as to data.

Types can be arbitrarily complicated, when you start using multiple levels of pointers, arrays, functions, structures, and/or unions. Eventually, it's important to understand the concept of a declarator: in the declaration

    int i, *ip, *fpi();

we have the base type int and three declarators i, *ip, and *fpi(). The declarator gives the name of a variable (or function) and also indicates whether it is a simple variable or a pointer, array, function, or some more elaborate combination (array of pointers, function returning pointer, etc.). In the example, i is declared to be a plain int, ip is declared to be a pointer to int, and fpi is declared to be a function returning pointer to int. (Complicated declarators may also contain parentheses for grouping, since there's a precedence hierarchy in declarators as well as expressions: [] for arrays and () for functions have higher precedence than * for pointers.)

We have not said much about pointers to pointers, or arrays of arrays (i.e. multidimensional arrays), or the ramifications of array/pointer equivalence on multidimensional arrays. (In particular, a reference to an array of arrays does not generate a pointer to a pointer; it generates a pointer to an array. You cannot pass a multidimensional array to a function which accepts pointers to pointers.)

Variables can be declared with a hint that they be placed in high-speed CPU registers, for efficiency. (These hints are rarely needed or used today, because modern compilers do a good job of register allocation by themselves, without hints.)

A mechanism called typedef allows you to define user-defined aliases (i.e. new and perhaps more-convenient names) for other types.
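
Two of the points above, that an array of arrays is passed as a pointer to an array, and that typedef can name a pointer-to-function type, can be sketched as follows. All of the names here (NCOLS, sum_rows, binop, apply) are made up for this example.

```c
#define NCOLS 3

/* A function that takes an "array of arrays" really receives a
 * pointer to an array of NCOLS ints, not a pointer to a pointer.
 * The parentheses in (*a)[NCOLS] are the grouping mentioned above. */
int sum_rows(int (*a)[NCOLS], int nrows)
{
    int i, j, sum = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLS; j++)
            sum += a[i][j];
    return sum;
}

/* typedef gives a convenient name to the type "pointer to function
 * taking two ints and returning int". */
typedef int (*binop)(int, int);

int add(int x, int y) { return x + y; }
int mul(int x, int y) { return x * y; }

/* Call whichever function the pointer refers to. */
int apply(binop op, int x, int y)
{
    return op(x, y);
}
```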

Operators
The bitwise operators &, |, ^, and ~ operate on integers thought of as binary numbers or strings of bits. The & operator is bitwise AND, the | operator is bitwise OR, the ^ operator is bitwise exclusive-OR (XOR), and the ~ operator is a bitwise negation or complement. (&, |, and ^ are ``binary'' in that they take two operands; ~ is unary.) These operators let you work with the individual bits of a variable; one common use is to treat an integer as a set of single-bit flags. You might define the 3rd (2**2) bit as the ``verbose'' flag bit by defining

    #define VERBOSE 4

Then you can ``turn the verbose bit on'' in an integer variable flags by executing

    flags = flags | VERBOSE;

or

    flags |= VERBOSE;

and turn it off with

    flags = flags & ~VERBOSE;

or

    flags &= ~VERBOSE;

and test whether it's set with

    if(flags & VERBOSE)
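
The three flag manipulations above can be wrapped up as small functions, which makes them easy to try out. This is only a sketch; the function names are invented here.

```c
#define VERBOSE 4   /* the 3rd bit, as in the text */

/* Turn the verbose bit on, leaving all other bits alone. */
int set_verbose(int flags)
{
    return flags | VERBOSE;
}

/* Turn the verbose bit off, leaving all other bits alone. */
int clear_verbose(int flags)
{
    return flags & ~VERBOSE;
}

/* Return 1 if the verbose bit is set, 0 otherwise. */
int is_verbose(int flags)
{
    return (flags & VERBOSE) != 0;
}
```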

The left-shift and right-shift operators << and >> let you shift an integer left or right by some number of bit positions; for example, value << 2 shifts value left by two bits.

The ?: or conditional operator (also called the ``ternary operator'') essentially lets you embed an if/then statement in an expression. The assignment

    a = expr ? b : c;

is roughly equivalent to

    if(expr)
        a = b;
    else
        a = c;

Since you can use ?: anywhere in an expression, it can do things that if/then can't, or that would be cumbersome with if/then. For example, the function call

    f(a, b, c ? d : e);

is roughly equivalent to

    if(c)
        f(a, b, d);
    else
        f(a, b, e);

(Exercise: what would the call

    g(a, b, c ? d : e, h ? i : j, k);

be equivalent to?)

The comma operator lets you put two separate expressions where one is required; the expressions are executed one after the other. The most common use for comma operators is when you want multiple variables controlling a for loop, for example:

    for(i = 0, j = 10; i < j; i++, j--)
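
A short sketch of both operators in use: a two-variable for loop driven by the comma operator, and a function built around ?:. The names reverse and larger_of are assumptions chosen for this example.

```c
/* Reverse the n elements of a[] in place.  The comma operator lets
 * the loop initialize and step two variables, i moving up from the
 * front and j moving down from the back. */
void reverse(int a[], int n)
{
    int i, j, tmp;

    for (i = 0, j = n - 1; i < j; i++, j--) {
        tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* The conditional operator chooses between two values inside an
 * expression, here directly in a return statement. */
int larger_of(int x, int y)
{
    return x > y ? x : y;
}
```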

A cast operator allows you to explicitly force conversion of a value from one type to another. A cast consists of a type name in parentheses. For example, you could convert an int to a double by typing

    int i = 10;
    double d;
    d = (double)i;

(In this case, though, the cast is redundant, since this is a conversion that C would have performed for you automatically, i.e. if you'd just said d = i.) You use explicit casts in those circumstances where C does not do a needed conversion automatically. One example is division: if you're dividing two integers and you want a floating-point result, you must explicitly force at least one of the operands to floating-point, otherwise C will perform an integer division and will discard the remainder. The code

    int i = 1, j = 2;
    double d = i / j;

will set d to 0, but

    d = (double)i / j;

will set d to 0.5. You can also ``cast to void'' to explicitly indicate that you're ignoring a function's return value, as in

    (void)fclose(fp);

or

    (void)printf("Hello, world!\n");

(Usually, it's a bad idea to ignore return values, but in some cases it's essentially inevitable, and the (void) cast keeps some compilers from issuing warnings every time you ignore a value.)

There's a precise, mildly elaborate set of rules which C uses for converting values automatically, in the absence of explicit casts.

The . and -> operators let you access the members (components) of structures and unions.
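
The integer-division pitfall described above is easy to demonstrate. In this sketch (function names invented for the example) the only difference between the two functions is the cast.

```c
/* Integer division: the fractional part is discarded first, and only
 * then is the result converted to double for the return value. */
double int_quotient(int i, int j)
{
    return i / j;
}

/* Casting one operand to double forces a floating-point division;
 * the other operand is then converted automatically. */
double real_quotient(int i, int j)
{
    return (double)i / j;
}
```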

Statements
The switch statement allows you to jump to one of a number of numeric case labels depending on the value of an expression; it's more convenient than a long if/else chain. (However, you can use switch only when the expression is integral and all of the case labels are compile-time constants.)

The do/while loop is a loop that tests its controlling expression at the bottom of the loop, so that the body of the loop always executes at least once, even if the condition is initially false. (C's do/while loop is therefore like Pascal's repeat/until loop, while C's while loop is like Pascal's while/do loop.)

Finally, when you really need to write ``spaghetti code,'' C does have the all-purpose goto statement, and labels to go to.
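
As a rough illustration of switch and do/while (both functions are invented for this sketch):

```c
/* switch: map a digit character to an English name.  The case
 * labels are compile-time constants, as switch requires. */
const char *digit_name(int c)
{
    switch (c) {
    case '0': return "zero";
    case '1': return "one";
    case '2': return "two";
    default:  return "other";
    }
}

/* do/while: count the decimal digits of n.  The body has to run at
 * least once so that 0 is counted as having one digit, which is
 * exactly what a test at the bottom of the loop provides. */
int count_digits(unsigned int n)
{
    int count = 0;

    do {
        count++;
        n /= 10;
    } while (n != 0);
    return count;
}
```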

Functions
Functions can't return arrays, and it's tricky to write a function as if it returns an array (perhaps by simulating the array with a pointer) because you have to be careful about allocating the memory that the returned pointer points to.

The functions we've written have all accepted a well-defined, fixed number of arguments. printf accepts a variable number of arguments (depending on how many % signs there are in the format string) but we haven't seen how to declare and write functions that do this.
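
One common way of writing a function ``as if it returns an array'' is to return a pointer to memory obtained from malloc. The sketch below is only one possible approach, and make_squares is a hypothetical name; the key point is that the caller becomes responsible for freeing the memory.

```c
#include <stdlib.h>

/* Hypothetical example: "return an array" of the first n squares by
 * returning a pointer to malloc'ed memory.  The caller owns the
 * memory and must call free on it when done.  Returns NULL if
 * n <= 0 or if malloc fails. */
int *make_squares(int n)
{
    int i;
    int *p;

    if (n <= 0)
        return NULL;
    p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        p[i] = i * i;
    return p;
}
```

Returning a pointer to a local (automatic) array instead would not work, because that array ceases to exist when the function returns.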

C Preprocessor
If you're careful, it's possible (and can be useful) to use #include within a header file, so that you end up with ``nested header files.''

It's possible to use #define to define ``function-like'' macros that accept arguments; the expansion of the macro can therefore depend on the arguments it's ``invoked'' with.

Two special preprocessing operators # and ## let you control the expansion of macro arguments in fancier ways.

The preprocessor directive #if lets you conditionally include (or, with #else, conditionally not include) a section of code depending on some arbitrary compile-time expression. (#if can also do the same macro-definedness tests as #ifdef and #ifndef, because the expression can use a defined() operator.)

Other preprocessing directives are #elif, #error, #line, and #pragma.

There are a few predefined preprocessor macros, some required by the C standard, others perhaps defined by particular compilation environments. These are useful for conditional compilation (#ifdef, #ifndef).

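The preprocessor features listed above can be sketched briefly. Every macro name here (SQUARE, STRINGIZE, PASTE, DEBUG_LEVEL, TRACING) is an assumption made up for illustration; none is defined by the C standard.

```c
/* Function-like macro.  The parentheses guard against precedence
 * surprises when the argument is an expression such as 1 + 2. */
#define SQUARE(x) ((x) * (x))

/* The # operator turns a macro argument into a string literal. */
#define STRINGIZE(x) #x

/* The ## operator pastes two tokens together into one. */
#define PASTE(a, b) a ## b

/* #if with defined(): pick a value at compile time depending on
 * whether the (hypothetical) macro DEBUG_LEVEL is defined and
 * greater than 1. */
#if defined(DEBUG_LEVEL) && DEBUG_LEVEL > 1
#define TRACING 1
#else
#define TRACING 0
#endif

int paste_demo(void)
{
    int xy = 42;

    return PASTE(x, y);     /* expands to the identifier xy */
}
```

Note that a function-like macro evaluates its argument each time it appears in the expansion, so SQUARE(i++) would increment i twice.
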
Standard Library Functions


C's standard library contains many features and functions which we haven't seen.

We've seen many of printf's formatting capabilities, but not all. Besides format specifier characters for a few types we haven't seen, you can also control the width, precision, justification (left or right) and a few other attributes of printf's format conversions. (In their full complexity, printf formats are about as elaborate and powerful as FORTRAN format statements.)

A scanf function lets you do ``formatted input'' analogous to printf's formatted output. scanf reads from the standard input; a variant fscanf reads from a specified file pointer.

The sprintf and sscanf functions let you ``print'' and ``read'' to and from in-memory strings instead of files. We've seen that atoi lets you convert a numeric string into an integer; the inverse operation can be performed with sprintf:

    int i = 10;
    char str[10];
    sprintf(str, "%d", i);

We've used printf and fprintf to write formatted output, and getchar, getc, putchar, and putc to read and write characters. There are also functions gets, fgets, puts, and fputs for reading and writing lines (though we rarely need these, especially if we're using our own getline and maybe fgetline), and also fread and fwrite for reading or writing arbitrary numbers of characters. (Of these, gets has no way to limit how much it reads and was removed from the language in C11; fgets is the one to use.)

It's possible to ``un-read'' a character, that is, to push it back on an input stream, with ungetc. (This is useful if you accidentally read one character too far, and would prefer that some other part of your program read that character instead.)

You can use the ftell, fseek, and rewind functions to jump around in files, performing random access (as opposed to sequential) I/O.

The feof and ferror functions will tell you whether you got EOF due to an actual end-of-file condition or due to a read error of some sort. You can clear errors and end-of-file conditions with clearerr.

You can open files in ``binary'' mode, or for simultaneous reading and writing. (These options involve extra characters appended to fopen's mode string: b for binary, + for read/write.)

There are several more string functions in <string.h>. A second set of string functions strncpy, strncat, and strncmp all accept a third argument telling them to stop after n characters if they haven't found the \0 marking the end of the string. A third set of ``mem'' functions, including memcpy and memcmp, operate on blocks of memory which aren't necessarily strings and where \0 is not treated as a terminator. The strchr and strrchr functions find characters in strings. There is a motley collection of ``span'' and ``scan'' functions, strspn, strcspn, and strpbrk, for searching out or skipping over sequences of characters all drawn from a specified set of characters. The strtok function aids in breaking up a string into words or ``tokens,'' much like our own getwords function.

The header file <ctype.h> contains several functions which let you classify and manipulate characters: check for letters or digits, convert between upper- and lower-case, etc.

A host of mathematical functions are defined in the header file <math.h>. (As we've mentioned, besides including <math.h>, you may on some Unix systems have to ask for a special library containing the math functions while compiling/linking.)
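
One detail of the ``n'' string functions is worth a sketch: when strncpy stops because it has copied n characters, it does not add a terminating \0. The wrapper below, whose name copy_truncated is made up for this example, shows one way of allowing for that.

```c
#include <string.h>

/* Copy src into dst, which has room for size chars (size must be
 * greater than 0), truncating if necessary.  strncpy by itself does
 * not write a terminating \0 when src holds size - 1 or more
 * characters, so one is stored explicitly. */
void copy_truncated(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```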

There's a random-number generator, rand, and a way to ``seed'' it, srand. rand returns integers from 0 up to RAND_MAX (where RAND_MAX is a constant #defined in <stdlib.h>). One way of getting random integers from 1 to n is to call

    (int)(rand() / (RAND_MAX + 1.0) * n) + 1

Another way is

    rand() / (RAND_MAX / n + 1) + 1

It seems like it would be simpler to just say

    rand() % n + 1

but this method is imperfect (or rather, it's imperfect if n is a power of two and your system's implementation of rand() is imperfect, as all too many of them are, because it then depends only on the low-order bits, which in such implementations are the least random).

Several functions let you interact with the operating system under which your program is running. The exit function returns control to the operating system immediately, terminating your program and returning an ``exit status.'' The getenv function allows you to read your operating system's or process's ``environment variables'' (if any). The system function allows you to invoke an operating-system command (i.e. another program) from within your program.

The qsort function allows you to sort an array (of any type); you supply a comparison function (via a function pointer) which knows how to compare two array elements, and qsort does the rest. The bsearch function allows you to search for elements in sorted arrays; it, too, operates in terms of a caller-supplied comparison function.

Several functions (time, ctime, gmtime, localtime, asctime, mktime, difftime, and strftime) allow you to determine the current date and time, print dates and times, and perform other date/time manipulations. For example, to print today's date in a program, you can write

    #include <time.h>

    time_t now;
    now = time((time_t *)NULL);
    printf("It's %.24s", ctime(&now));
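
The qsort and bsearch functions described a few paragraphs above both rely on a caller-supplied comparison function. A minimal sketch for arrays of int might look like this; the wrapper names sort_ints and find_int are invented for the example.

```c
#include <stdlib.h>

/* Comparison function in the form qsort and bsearch expect: it
 * returns a negative, zero, or positive value as *a is less than,
 * equal to, or greater than *b.  (Subtracting x - y would be shorter
 * but could overflow for large values.) */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Sort the n elements of a[] into ascending order. */
void sort_ints(int a[], size_t n)
{
    qsort(a, n, sizeof a[0], compare_ints);
}

/* Search the sorted array a[] for key.  Returns a pointer to the
 * matching element, or NULL if key is not present. */
int *find_int(int key, int a[], size_t n)
{
    return bsearch(&key, a, n, sizeof a[0], compare_ints);
}
```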

The header file <stdarg.h> lets you manipulate variable-length function argument lists (such as the ones printf is called with). Additional members of the printf family of functions let you write your own functions which accept printf-like format specifiers and variable numbers of arguments but call on the standard printf to do most of the work.

There are facilities for dealing with multibyte and ``wide'' characters and strings, for use with multinational character sets.
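
The <stdarg.h> machinery mentioned above can be sketched with two small functions. Both names are hypothetical. The second uses vsnprintf, which is one of the ``additional members of the printf family'' but was standardized only in C99, so it is an assumption that your compiler provides it.

```c
#include <stdarg.h>
#include <stdio.h>

/* Add up count arguments, each of which must be an int.  A variadic
 * function has no way to discover how many arguments it was given,
 * so the caller passes the count explicitly. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* A printf-like wrapper: accepts a format and a variable number of
 * arguments, and hands them to vsnprintf to do the formatting.
 * Returns whatever vsnprintf returns (the length of the full
 * formatted text, which may be more than fit in buf). */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```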
