
Welcome All!

Systems Programming
00. Introduction
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

1995 United Feature Syndicate, Inc. (NYC), scottadams@aol.com

Visiting Card

Alexander Holupirek
  alexander.holupirek@uni-konstanz.de
  http://www.inf.uni-konstanz.de/~holupire
  88 4440
  E 217

- E-mail is the best way to reach me.
- You are welcome in my office whenever you have a question
  (no need to make an appointment first).

Tutorial Groups

Your Tutors

Jochen Oekonomopulos
  jochen.oekonomopulos@uni-konstanz.de
  Enrolled in master studies Information Engineering
  V 504
  Tuesday, 18:00-19:30, Room C 252/Computer Pool

Thomas Zink
  thomas.zink@uni-konstanz.de
  Enrolled in master studies Information Engineering
  V 504
  Friday, 12:00-13:30, Room D 406/Computer Pool

How You Will Benefit

Assignments & Tutorials
- Work on the weekly assignments.
- Hand them in on time.
- Jochen and Thomas will revise them.
- Attend the tutorials and the discussion of solutions.

Subversion Repository for the Tutorial
- We have set up a version control system for the tutorials.
- Please use it to commit your solutions to the assignments.
- Source code from the lecture is available in the /pub directory.
- Once registered for the tutorial you will receive your credentials.

Command line approach to check out the repository

    $ svn --username holu --password XXXX co \
          svn://phobos29.inf.uni-konstanz.de/sys_S08

How You Will Benefit (cont.)

Lecture Material
- Use the material provided on the course website to prepare for
  the lectures.
- Don't hesitate to ask questions.
- Let me know if I can improve the lecture material and/or its
  presentation.

How You Will Benefit (cont.)

Account & Mailing List
- Use the Account Tool to register for the course.
- You will automagically become a member of the mailing list
  sys_S08@inf.uni-konstanz.de.
- Feel free to post and discuss problems, questions, and comments
  on that list.
- Make sure you receive the e-mails. [1]
- Any information about changes etc. will be posted there.

[1] These are sent to <loginname>@inf.uni-konstanz.de

How You Will Benefit (cont.)

Examination and Credits
- Register for the course (via StudIS) within 4 weeks.
- Pass the examination at the end of the semester.
- Examination dates:
  - July 18th, 12:00-13:30, D 406
  - October 17th, 12:00-13:30, D 406
- 6 ECTS, Informatik der Systeme
- Have fun!

Organizational Matters

Website for this course
- Please check this site regularly for the latest information.
  http://www.inf.uni-konstanz.de/dbis/teaching/ss08/sys/

Schedule (OK for everybody?)
- Monday, 18:00-19:30, Room C 252
- Tuesday, 18:00-19:30, Room D 247/Computer Pool

Literature

The IEEE and The Open Group.
  Single UNIX Specification, Version 3, 2004 Edition.
  http://www.unix.org/single_unix_specification/

Brian W. Kernighan, Dennis M. Ritchie.
  The C Programming Language.
  ISBN 0-13-110370-9, 1988, 41st printing.
  Prentice Hall Software Series

W. Richard Stevens, Stephen A. Rago.
  Advanced Programming in the UNIX Environment.
  ISBN 978-0201433074
  Addison-Wesley Professional; 2nd edition (June 27, 2005)

What Is This Course About?

Systems Programming
- By systems we mean operating systems.
- By programming we mean using the interface an operating
  system (OS) provides.
- By OS we mean UNIX-like OSs.

Operating System
- Layer of software on top of bare hardware.
- Shields programmers from the complexity of the hardware.
- Presents an interface (of a virtual machine) that is easier to
  understand and program.

The UNIX System Interface

The UNIX operating system provides its services through a set of
system calls, which are in effect functions within the operating
system that may be called by user programs.
- Syscalls provide a direct interface to the kernel. They are
  - employed for maximum efficiency,
  - used to access some facility that is not in the libraries.
- The service calls available in the interface vary from OS to
  OS; however, the underlying concepts tend to be similar.
- The ISO C library is (in many cases) modeled on UNIX facilities.

Standardization Of The UNIX System Interface

During the 1980s the proliferation of UNIX versions and differences
between them led many large users (such as the U.S. government)
to call for standardization.
- Among others, ANSI [2] C and IEEE [3] POSIX emerged.
- POSIX stands for Portable Operating System Interface.
- POSIX refers to a family of related standards. [4]
- POSIX was originally used as a synonym for IEEE Std 1003.1-1988.
- POSIX.1 emerged as the preferred term.
- The latest version of POSIX.1 was published on April 30th, 2004.
  It is called IEEE Std 1003.1, 2004 Edition (POSIX.1).

[2] American National Standards Institute
[3] Institute of Electrical and Electronics Engineers
[4] IEEE Std 1003.n (where n is a number) and the parts of ISO/IEC 9945

Systems Programming With POSIX.1

Figure: POSIX.1 as interface to UNIX OSs (layers from top to bottom:
application using the API; POSIX.1 system call interface; OS as Black Box)

Systems vs. Kernel Programming

- The black-box model is suitable for systems programming.
- Knowledge about the system's internals, however, is beneficial
  to use the system properly and to not work against it.
- Providing the system services is (mostly) kernel programming.

Figure: Black vs. White Box View of a UNIX System (both views:
application using the API on top of the POSIX.1 system call interface;
below the interface either the OS as Black Box or the OS kernel)

The Joint Standard

The latest version of POSIX.1 has been jointly developed by the IEEE
and The Open Group. [5] As such it is both an IEEE and an Open
Group Technical Standard:
- IEEE Std 1003.1, 2004 Edition
- The Open Group Technical Standard Base Specifications, Issue 6

It is also an international standard: ISO/IEC 9945:2003

[5] http://www.opengroup.org/overview/members/membership_list.htm

The Single UNIX Specification, Version 3

The standard is published free of charge on the web [6] as
The Single UNIX Specification, Version 3, 2004 Edition.

    Conceptually, this standard describes a set of
    fundamental services needed for the efficient construction
    of application programs. Access to these services has
    been provided by defining an interface, using the C
    programming language, a command interpreter, and
    common utility programs that establish standard
    semantics and syntax.
    [IEEE/The Open Group, 2004, Preface]

[6] http://www.unix.org/single_unix_specification/

The Single UNIX Specification (SUSv3)

The document is broken into four parts:
- Part 1: Base Definitions (XBD)
- Part 2: System Interfaces (XSH)
- Part 3: Shell and Utilities (XCU)
- Part 4: Rationale

The System Interfaces volume (XSH) [7] describes a set of system
interfaces offered to application programs by systems conformant
to this part of the Single UNIX Specification. Readers are expected
to be experienced C language programmers.
http://www.opengroup.org/onlinepubs/009695399/functions/contents.html

[7] http://www.unix.org/version3/xsh_contents.html

Part 2: System Interfaces Volume (XSH)

Because POSIX.1 specifies an interface and not an implementation,
no distinction is made between system calls and library functions.

Example
System Interface Table. Lists 1123 interfaces.
http://www.opengroup.org/onlinepubs/009695399/functions/atoi.html
http://www.opengroup.org/onlinepubs/009695399/functions/read.html

UNIX Architecture

Figure: UNIX architecture (labels: applications, shell, library
routines, system calls, kernel)

System Calls - Section 2

The system call interface has traditionally been documented in
Section 2 of the UNIX Programmer's Manual.

man(1) on OpenBSD

    Section  Contents
    1        General commands (tools and utilities).
    2        System calls and error numbers.
    3        Libraries.
    3p       perl(1) programmer's reference guide.
    4        Device drivers.
    5        File formats.
    6        Games.
    7        Miscellaneous.
    8        System maintenance and operation commands.
    9        Kernel internals.
    X11      An alias for X11R6.
    X11R6    X Window System.
    local    Pages located in /usr/local.

man(1) on Linux

    Section  Contents
    0        Header files (usually found in /usr/include)
    1        Executable programs or shell commands
    2        System calls (functions provided by the kernel)
    3        Library calls (functions within program libraries)
    4        Special files (usually found in /dev)
    5        File formats and conventions, e.g. /etc/passwd
    6        Games
    7        Miscellaneous (including macro packages and
             conventions), e.g. man(7), groff(7)
    8        System administration commands (usually only for root)
    9        Kernel routines [Non standard]

System Call Definition & C Library Functions

- The definition of the system call interface is in the C language. [8]
- A standard technique on UNIX systems is for each system call
  to have a function of the same name in the Standard C Library.
- Those functions invoke the appropriate kernel service, using
  whatever technique is required on the system.
- The function may put one or more of the C arguments into
  general registers and then execute some machine instruction
  that generates a software interrupt in the kernel.
- We can consider the system calls as being C functions.

[8] Regardless of the actual implementation technique used to invoke a call.

Library Calls - Section 3

- Section 3 of the UNIX Programmer's Manual defines the
  general-purpose functions available to programmers.
- These functions are not entry points into the kernel.
  - They may use the kernel's system calls, however.
  - printf(3): may invoke write(2) to perform output.
  - atoi(3) (convert ASCII string to integer): does not involve the OS at all.
- Implementor's view (kernel programming): The distinction
  between system call and library function is fundamental.
- User's perspective (systems programming): Not as critical;
  both exist to provide services for application programs, but . . .

System Calls vs. Library Calls

Example to illustrate the difference: current time and date
- Some OSs have one syscall to return the time and another to
  return the date. Special handling (switching to or from daylight
  saving time) is done by the kernel or requires human
  intervention.
- UNIX provides one syscall (gettimeofday(2)) that returns
  the number of seconds since the Epoch. [9]
- Any interpretation (local time zone, converting to
  human-readable time) is left to the user process.
- Syscalls usually provide a minimal interface while library
  functions often provide more elaborate functionality.

[9] midnight, January 1, 1970, Coordinated Universal Time

Essentials

- Good knowledge of C.
- Knowledge about the services an OS provides:
  - system calls.
  - C libraries.
- Some knowledge about kernel internals.
- Some knowledge about operating system concepts.
- Some knowledge about the underlying hardware.

Systems Programming
01. The C Programming Language
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

A Tutorial Introduction
- Variables and Arithmetic Expressions
- Character Input and Output
- Arrays
- Functions
- Call by Value, Call by Reference
- Character Arrays
- Variables, Declarations and Scope

Schedule For Today: A First Glance At C

- Quick introduction
  - Show essential elements of the language
  - No details, rules, and exceptions
  - Provide examples
- Show the basics, such as
  - variables and constants
  - arithmetic
  - control flow
  - functions
  - rudiments of input and output
- Leave out anything else, such as
  - pointers
  - structures
  - standard library

The First Program Is Always The Same

- Print the words: Hello, world
- Not that easy, because you have to:
  - Create the program text
  - Compile it successfully
  - Run it
  - Get the output

    #include <stdio.h>

    int
    main(void)
    {
        printf("Hello, world\n");
        return (0);
    }

Compilation On A UNIX-like OS

    $ cc -Wall hello.c
    $ ls
    hello.c  a.out
    $ ./a.out
    Hello, world
    $

    engine        filename  description
    -             hello.c   source code
    preprocessor  hello.i   source w/ preproc. directives expanded
    compiler      hello.s   assembler code
    assembler     hello.o   object code ready to be linked
    linker        a.out     executable

C Programs

Basic building blocks: functions, statements, variables, arguments
- functions contain statements
- statements specify computing operations to be done
- variables store values used during computation
- arguments (one way to) communicate data between functions

Building Blocks Of Our Example

- A function called main
- Liberty to name functions whatever you like, but . . .
- main is special: a program begins execution at the beginning
  of main
- Every program must have a main somewhere
- main will usually call other functions to help perform its job
  - Functions that you wrote
  - Functions that are provided for you, e.g. printf

Some Explanations About The Program Itself

    1  #include <stdio.h>
    2
    3  int
    4  main(void)
    5  {
    6      printf("Hello, world\n");
    7      return (0);
    8  }

- line 1: tells the compiler to include information about the standard
  input/output library
- line 3/4: define a function named main, which receives no
  argument values. Parentheses after the function name surround the
  argument list (here empty). Returns an int.
- line 5/8: the statements of main are enclosed in braces
- line 6: main calls the library function printf, which prints this
  sequence of characters; \n represents the newline character.

Line 6: Print A String

- A function is called by naming it, followed by a parenthesized
  list of arguments:

      printf("Hello, world\n");

  calls the function printf with the argument "Hello, world\n"
- printf is a library function that prints output
  (in this case the string of characters between the quotes)

Character String/String Constant

- A sequence of characters in double quotes is called a character
  string or string constant
- The sequence \n stands for the newline character, which when
  printed advances the output to the left margin of the next line
- We have to use \n to include a newline character with printf;
  a line break inside the quotes does not compile:

      printf("Hello, world
      ");

      $ cc hello.c
      hello.c:6:16: missing terminating " character
      hello.c:7:9: missing terminating " character
      hello.c: In function main:
      hello.c:8: error: syntax error before "return"

Printing Hello, world

- printf never supplies a newline automatically,
  so several calls can build up an output line in stages
- our first program could just as well have been written like
  below to produce identical output

    #include <stdio.h>

    int
    main(void)
    {
        printf("Hello, ");
        printf("world");
        printf("\n");
        return (0);
    }

Escape Sequences

- Notice that \n represents only a single character
- An escape sequence like \n provides a general and extensible
  mechanism for hard-to-type or invisible characters.

    \a    alert (bell) character     \\    backslash
    \b    backspace                  \?    question mark
    \f    formfeed                   \'    single quote
    \n    newline                    \"    double quote
    \r    carriage return            \ooo  octal number
    \t    horizontal tab             \xhh  hexadecimal number
    \v    vertical tab

Table: The complete set of escape sequences

Fahrenheit-Celsius: C = (5/9)(F - 32)

    #include <stdio.h>
    /* print fahrenheit-celsius table
       for fahrenheit = 0, 20, ..., 300 */
    int
    main(void)
    {
        int fahr, celsius;
        int lower, upper, step;

        lower = 0;      /* lower limit */
        upper = 300;    /* upper limit */
        step = 20;      /* step size */

        fahr = lower;
        while (fahr <= upper) {
            celsius = 5 * (fahr - 32) / 9;
            printf("%d\t%d\n", fahr, celsius);
            fahr = fahr + step;
        }
        return (0);
    }

Output:

    0       -17
    20      -6
    40      4
    60      15
    80      26
    100     37
    120     48
    140     60
    160     71
    180     82
    200     93
    220     104
    240     115
    260     126
    280     137
    300     148

Declarations And Assignment Statements

- A declaration announces the properties of variables.
- It consists of a type name and a list of variables, such as:

      int fahr, celsius;
      int lower, upper, step;

- Assignment statements set the variables to their initial values:

      lower = 0;      /* lower limit */
      upper = 300;    /* upper limit */
      step = 20;      /* step size */

Basic Data Types

    type    description
    char    a single byte, capable of holding one character
            in the local character set
    int     an integer, typically reflecting the natural
            size of integers on the host machine
    float   single-precision floating point
    double  double-precision floating point

- Range/size of data types depends on the machine
- short and long are qualifiers that can be applied to integers:

      short int i;
      long int f;
      unsigned long d;

- signed and unsigned can be applied to char and any integer type.

Data Types And Sizes

- Sizes are machine-dependent
  - Each compiler is free to choose appropriate sizes for its own
    hardware. ISO C defines compile-time limits.
  - short and int are at least 16 bits
  - long is at least 32 bits
  - short is no longer than int, int is no longer than long
- Numerical limits [10] are documented in <limits.h> and
  <float.h>. Additional limits are specified in <stdint.h>. [11]

[10] ISO C99: 7.10/5.2.4.2: Numerical limits
[11] ISO C99: 7.18: Integer Types

The while Loop

Each line in the result table is computed the same way:

    while (fahr <= upper) {
        celsius = 5 * (fahr - 32) / 9;
        printf("%d\t%d\n", fahr, celsius);
        fahr = fahr + step;
    }

Integer Division

Why is C = (5/9)(F - 32) computed as:

    celsius = 5 * (fahr - 32) / 9;

As in many other languages, integer division truncates, i.e., any
fractional part is discarded. Since 5 and 9 are integers, 5/9 would
be truncated to zero and so all the Celsius temperatures would be
reported as zero.

printf(3) Revisited

    #include <stdio.h>
    int
    printf(const char *format, ...);

- printf(3) is a general-purpose output formatting function. [12]
- The 1st argument is the string of characters to be printed.
- Each % indicates where one of the other arguments is to be
  substituted, and in what form it is to be printed.
- Each % in the 1st argument is paired with the 2nd, 3rd argument etc.

      printf("%d\t%d\n", fahr, celsius);

- %d, for instance, specifies an integer argument, so fahr and
  celsius are printed with a tab (\t) between them.

[12] Not part of the C language, but defined in ANSI X3.159-1989 (ANSI C)

Fahrenheit-Celsius Converter Bug List

Fixing problems
- Pretty printing: Right-justified output
- Switch from integer to floating-point arithmetic
- Construct a patch for the changes using diff(1)

    NAME
        diff - compare files line by line

    SYNOPSIS
        diff [OPTION]... FILES

    DESCRIPTION
        Compare files line by line.
        -u -U NUM --unified[=NUM]
            Output NUM (default 3) lines of unified context.
        -p --show-c-function
            Show which C function each change is in.

    $ diff -up fahrenheit_v1.c fahrenheit_v2.c
    --- fahrenheit_v1.c     Sat Apr 19 08:58:48 2008
    +++ fahrenheit_v2.c     Sat Apr 19 08:58:05 2008
    @@ -4,7 +4,7 @@
     int
     main(void)
     {
    -    int fahr, celsius;
    +    float fahr, celsius;
         int lower, upper, step;

         lower = 0;      /* lower limit */
    @@ -13,8 +13,8 @@ main(void)

         fahr = lower;
         while (fahr <= upper) {
    -        celsius = 5 * (fahr - 32) / 9;
    -        printf("%d\t%d\n", fahr, celsius);
    +        celsius = (5.0/9.0) * (fahr - 32.0);
    +        printf("%3.0f\t%6.1f\n", fahr, celsius);
             fahr = fahr + step;
         }
         return (0);

Patching The First Version

Applying the patch using patch(1)

    $ ls
    fahrenheit_v1.c    fahrenheit_v1_v2.diff
    $ patch < fahrenheit_v1_v2.diff
    Hmm...  Looks like a unified diff to me...
    The text leading up to this was:
    --------------------------
    |$ diff -up fahrenheit_v1.c fahrenheit_v2.c
    |--- fahrenheit_v1.c    Sat Apr 19 08:58:48 2008
    |+++ fahrenheit_v2.c    Sat Apr 19 08:58:05 2008
    --------------------------
    Patching file fahrenheit_v1.c using Plan A...
    Hunk #1 succeeded at 4.
    Hunk #2 succeeded at 13.
    done
    $ ls
    fahrenheit_v1.c    fahrenheit_v1.c.orig    fahrenheit_v1_v2.diff

Fahrenheit-Celsius Converter v2

    #include <stdio.h>
    /* print fahrenheit-celsius table
       for fahrenheit = 0, 20, ..., 300 */
    int
    main(void)
    {
        float fahr, celsius;
        int lower, upper, step;

        lower = 0;      /* lower limit */
        upper = 300;    /* upper limit */
        step = 20;      /* step size */

        fahr = lower;
        while (fahr <= upper) {
            celsius = (5.0/9.0) * (fahr - 32.0);
            printf("%3.0f\t%6.1f\n", fahr, celsius);
            fahr = fahr + step;
        }
        return (0);
    }

Output:

      0      -17.8
     20       -6.7
     40        4.4
     60       15.6
     80       26.7
    100       37.8
    120       48.9
    140       60.0
    160       71.1
    180       82.2
    200       93.3
    220      104.4
    240      115.6
    ...

Printing With printf(3)

    specifier  print as . . .
    %d         decimal integer
    %6d        decimal, at least 6 characters wide
    %f         floating point
    %6f        floating point, at least 6 characters wide
    %.2f       floating point, 2 characters after decimal point
    %6.2f      floating point, at least 6 wide and 2 after decimal point

- Further, printf(3) recognizes %o for octal, %x for
  hexadecimal, %c for character, %s for string, %p for address
  (pointer)

ISO C: 7.19.6: Formatted input/output functions

The for Loop, Fahrenheit-Celsius v3

    #include <stdio.h>
    /* print fahrenheit-celsius table
       for fahrenheit = 0, 20, ..., 300 */
    int
    main(void)
    {
        int fahr;

        for (fahr = 0; fahr <= 300; fahr = fahr + 20)
            printf("%3d %6.1f\n", fahr, (5.0/9.0)*(fahr-32));

        return (0);
    }

Output:

      0  -17.8
     20   -6.7
     40    4.4
     60   15.6
     80   26.7
    100   37.8
    120   48.9
    140   60.0
    160   71.1
    180   82.2
    200   93.3
    220  104.4
    240  115.6
    260  126.7
    280  137.8
    300  148.9

Symbolic Constants, Fahrenheit-Celsius Final

- Bad practice to bury magic numbers in a program
  - They convey little information and are hard to change in a
    systematic way
- A #define line defines a symbolic name

    #include <stdio.h>

    #define LOWER 0     /* lower limit of table */
    #define UPPER 300   /* upper limit */
    #define STEP  20    /* step size */

    /* print fahrenheit-celsius table */
    int
    main(void)
    {
        int fahr;

        for (fahr = LOWER; fahr <= UPPER; fahr += STEP)
            printf("%3d %6.1f\n", fahr, (5.0/9.0)*(fahr-32));

        return (0);
    }

Character Input And Output

Processing character data
- Text I/O is dealt with as streams of characters
- A text stream is a sequence of characters divided into lines
- Each line consists of zero or more characters followed by a
  newline character (regardless of where the stream originates or
  where it goes to). The library makes each input or output
  stream conform to this model
- The standard library provides several functions for reading and
  writing one character at a time, of which getchar(3) and
  putchar(3) are the simplest.

getchar(3) and putchar(3)

    #include <stdio.h>

    int
    getchar(void);

    int
    putchar(int c);

- getchar(3) reads the next input character from a text stream
- Why does getchar(3) return an int?
  - getchar(3) returns a distinctive value when there is no more
    input: a value called EOF (end of file) that cannot be
    confused with any real data. EOF is defined in <stdio.h>.
  - The return type must be big enough to hold EOF in addition to
    any possible char.
- putchar(3) prints a character each time it is called

File Copying

- Given getchar(3) and putchar(3) . . .
- . . . we can write a surprising amount of useful code without
  knowing anything more about input and output
- Copying input to output one character at a time:

    read a character
    while (character is not end-of-file indicator)
        output the character just read
        read a character

File Copying, v1

    #include <stdio.h>

    /* copy input to output, v1 */
    int
    main(void)
    {
        int c;

        c = getchar();
        while (c != EOF) {
            putchar(c);
            c = getchar();
        }

        return (0);
    }

File Copying, v2

- An assignment, such as c = getchar(), is an expression and
  has a value (the value of the left-hand side after the assignment)
- An assignment can appear as part of a larger expression

    #include <stdio.h>

    /* copy input to output, v2 */
    int
    main(void)
    {
        int c;

        while ((c = getchar()) != EOF)
            putchar(c);

        return (0);
    }

Character Counting, v1

    #include <stdio.h>

    /* count characters in input, v1 */
    int
    main(void)
    {
        long nc;

        nc = 0;
        while (getchar() != EOF)
            ++nc;
        printf("%ld\n", nc);

        return (0);
    }

Character Counting, v2

    #include <stdio.h>

    /* count characters in input, v2 */
    int
    main(void)
    {
        double nc;

        for (nc = 0; getchar() != EOF; ++nc)
            ;   /* nothing */
        printf("%.0f\n", nc);

        return (0);
    }

Line Counting

- The standard library ensures that an input text stream appears as
  a sequence of lines, each terminated by a newline

    #include <stdio.h>

    /* count lines in input */
    int
    main(void)
    {
        int c, nl;

        nl = 0;
        while ((c = getchar()) != EOF)
            if (c == '\n')
                ++nl;
        printf("%d\n", nl);

        return (0);
    }

Word Counting

    NAME
        wc - word, line, and byte or character count

    SYNOPSIS
        wc [-c | -m] [-hlw] [file ...]

    DESCRIPTION
        The wc utility reads one or more input text files, and,
        by default, writes the number of lines, words, and bytes
        contained in each input file to the standard output

    #include <stdio.h>

    #define IN  1   /* inside a word */
    #define OUT 0   /* outside a word */

    /* count lines, words, and characters in input */
    int
    main(void)
    {
        int c, nl, nw, nc, state;

        state = OUT;
        nl = nw = nc = 0;
        while ((c = getchar()) != EOF) {
            ++nc;
            if (c == '\n')
                ++nl;
            if (c == ' ' || c == '\n' || c == '\t')
                state = OUT;
            else if (state == OUT) {
                state = IN;
                ++nw;
            }
        }
        printf("%d %d %d\n", nl, nw, nc);
        return (0);
    }

    $ wc /etc/services
         285    1398    9732 /etc/services
    $ cc count_words.c
    $ cat /etc/services | ./a.out
    285 1398 9732

Counting Digits, White Spaces, And The Rest

Next is an artificial program, which counts the number of
occurrences of each digit, of white space characters (blank, tab,
newline), and of all other characters.

It will help us to . . .
- introduce arrays
- talk about initialization
- see that chars are, by definition, just small integers
- speak about coding conventions

The output of the program on itself is:

    $ cat count_digits.c | ./a.out
    digits = 10 3 0 0 0 0 0 0 0 1, white space = 122, other = 361
    $ wc -m count_digits.c
    497 count_digits.c

    #include <stdio.h>

    /* count digits, white space, others */
    int
    main(void)
    {
        int c, i, nwhite, nother;
        int ndigit[10];

        nwhite = nother = 0;
        for (i = 0; i < 10; ++i)
            ndigit[i] = 0;

        while ((c = getchar()) != EOF)
            if (c >= '0' && c <= '9')
                ++ndigit[c - '0'];
            else if (c == ' ' || c == '\n' || c == '\t')
                ++nwhite;
            else
                ++nother;

        printf("digits =");
        for (i = 0; i < 10; ++i)
            printf(" %d", ndigit[i]);
        printf(", white space = %d, other = %d\n", nwhite, nother);

        return (0);
    }

Functions, self-defined

So far, we just used printf(3), getchar(3), and putchar(3)
from the standard library. C has no exponentiation operator. We
define power(m,n) to raise an integer m to the power of n. [13]

A function definition has the form:

    return-type
    function-name(parameter declarations, if any)
    {
        declarations
        statements
    }

[13] Only handles positive powers of small integers; in real life take pow(3).

power(m,n)

     1  #include <stdio.h>
     2
     3  int power(int, int);    /* function declaration/prototype */
     4
     5  /* test power function */
     6  int
     7  main(void)
     8  {
     9      int i;
    10
    11      for (i = 0; i < 10; ++i)
    12          printf("%d %d %d\n", i, power(2, i), power(-3, i));
    13      return (0);
    14  }
    15
    16  /* power: raise base to n-th power; n >= 0 */
    17  int
    18  power(int base, int n)
    19  {
    20      int i, p;
    21
    22      p = 1;
    23      for (i = 1; i <= n; ++i)
    24          p = p * base;
    25      return p;
    26  }

Function Terminology

- line 3: function declaration (function prototype), says that power is a
  function that expects two int arguments and returns an int
- line 17: function definition starts with the declaration of the parameter
  types and names, and the type of the result that the function
  returns (has to match the prototype)
- parameter: a variable named in the parenthesized list in a
  function definition
- argument: a value used in a call of the function
- parameter and argument are sometimes referred to as formal
  and actual argument

Arguments: Call by Value/Reference

- In C, all function arguments are passed by value
  - The called function is given the values of its arguments in
    temporary variables rather than the originals
  - The callee can't directly alter a variable in the calling function
- Call by reference is possible
  - The caller must provide the address of the variable to be set
    (technically a pointer to the variable), and the called function
    must declare the parameter to be a pointer and access the
    variable indirectly through it
  - We will discuss pointers in more detail at a later point

Passing An Array As Argument

- When the name of an array is used as an argument,
  the value passed to the function is the location or address of
  the beginning of the array
- there is no copying of array elements
- the function can access and alter any element of the array

Character Arrays

- The most common type of array in C is the array of characters
- longline.c reads a set of text lines and prints the longest
- Program outline:

    while (there is another line)
        if (it's longer than the previous longest)
            save it
            save its length
    print longest line

Splitting The Program

- The program divides naturally into pieces
- Function getline fetches the next line of input
  - It has to return a signal about end-of-file
  - We let it return the length of the line, or zero on EOF
  - Zero is appropriate because it is never a valid line length
- Function copy copies a line to a safe place
- Function main to control getline and copy

The Controlling Function

    #include <stdio.h>
    #define MAXLINE 1000    /* maximum input line size */

    int getline(char line[], int maxline);
    void copy(char to[], char from[]);

    /* print longest input line */
    int
    main(void)
    {
        int len;                /* current line length */
        int max;                /* maximum length seen so far */
        char line[MAXLINE];     /* current input line */
        char longest[MAXLINE];  /* longest line saved here */

        max = 0;
        while ((len = getline(line, MAXLINE)) > 0)
            if (len > max) {
                max = len;
                copy(longest, line);
            }
        if (max > 0)    /* there was a line */
            printf("%s", longest);
        return (0);
    }

Getting A Line

    /* getline: read a line into s, return length */
    int
    getline(char s[], int lim)
    {
        int c, i;

        for (i = 0; i < lim-1 && (c = getchar()) != EOF && c != '\n'; ++i)
            s[i] = c;
        if (c == '\n') {
            s[i] = c;
            ++i;
        }
        s[i] = '\0';
        return i;
    }

- getline puts the character '\0' (the null character; value is zero) at the end of the array to mark the end of the string
- "hello\n" is stored as

    h  e  l  l  o  \n  \0

copy()

    /* copy: copy from into to; assume to is big enough */
    void
    copy(char to[], char from[])
    {
        int i;

        i = 0;
        while ((to[i] = from[i]) != '\0')
            ++i;
    }

- copy does not return a value, void explicitly states it
- copy is used for its side-effect.

    $ cat longline.c | ./a.out
    /* copy: copy from into to; assume to is big enough */

Automatic Variables

Local, internal, automatic variables of a function

- Variables such as line or longest are private or local to the function they are declared in. Their scope is the function.
- A local variable in a function comes into existence only when the function is called; it disappears when the function is exited
- Those variables are called automatic variables
- Automatic variables come and go with function invocation; they do not retain their values from one call to the next [14]
- They will contain garbage if they are not initialized

[14] We will discuss an alternative (keyword static) later

External Variables

Global, external variables of a program

- As an alternative to automatic variables, it is possible to define variables that are external to all functions.
- Those can be accessed by name by any function
- Because external variables are globally accessible, they can be used instead of argument lists to communicate data between functions (but, beware!)
- External variables remain in existence permanently
- They retain their values even after the functions that set them have returned

External Variables (cont.)

Definition and declaration of external variables

- An external variable must be defined, exactly once, outside a function; this sets aside storage for it.
- The variable must also be declared in each function that wants to access it; this states the type of the variable.
- In general: all variables (automatic or extern) must be declared, either explicitly or implicitly from context
- Definition of a variable refers to the place where the variable is created and assigned storage
- Declaration of a variable refers to places where the nature of the variable is stated but no storage is allocated
    #include <stdio.h>

    #define MAXLINE 1000    /* maximum input line size */

    int max;                /* maximum length seen so far */
    char line[MAXLINE];     /* current input line */
    char longest[MAXLINE];  /* longest line saved here */

    int getline(void);
    void copy(void);

    /* print longest line; external objects, weak solution */
    int
    main(void)
    {
        int len;            /* current line length */
        extern int max;
        extern char longest[];

        max = 0;
        while ((len = getline()) > 0)
            if (len > max) {
                max = len;
                copy();
            }
        if (max > 0)        /* there was a line */
            printf("%s", longest);
        return (0);
    }

    int
    getline(void)
    {
        int c, i;
        extern char line[];

        for (i = 0; i < MAXLINE-1
            && (c = getchar()) != EOF && c != '\n'; ++i)
            line[i] = c;
        if (c == '\n') {
            line[i] = c;
            ++i;
        }
        line[i] = '\0';
        return i;
    }

    void
    copy(void)
    {
        int i;
        extern char line[], longest[];

        i = 0;
        while ((longest[i] = line[i]) != '\0')
            ++i;
    }

Terminology: External vs. Internal

- A C program consists of a set of external objects, which are either variables or functions
- Functions are always external, because C does not allow functions to be defined inside other functions
- External is used in contrast to internal, which describes the arguments and variables used inside functions
- By default, external variables and functions have the property that all references to them by the same name, even from functions compiled separately, are references to the same thing (this is called external linkage in the standard) [15]

[15] We will see later how to define external variables and functions that are visible only within a single source file; once again, the keyword is static

Static Internal Variables

- The static declaration can be applied to internal variables
- Internal static variables are local to a particular function (just as automatic variables), but unlike automatics, they remain in existence over different invocations of the function
- This means that internal static variables provide private, permanent storage within a single function

    void
    f(unsigned int m, long n)
    {
        static int i;
        ...
    }
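A small sketch of the difference between an automatic and an internal static variable. The function `next_id` is a hypothetical example, not taken from the lecture.

```c
/* Hypothetical helper: counts how often it has been called. */
int
next_id(void)
{
    static int calls;   /* static: zero-initialized once, survives returns */
    int scratch = 0;    /* automatic: re-created on every call */

    scratch++;          /* always 1 at this point */
    calls += scratch;
    return calls;
}
```

Successive calls return 1, 2, 3, ... because `calls` keeps its value between invocations, while `scratch` starts over each time.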

Static External Variables And Functions

- The static declaration can be applied to external objects
- Applied to an external variable or function, static limits the scope of that object to the rest of the source file
- It provides a way to hide names otherwise globally visible

    static char buf[BUFSIZE];
    static int bufp = 0;

    static void
    f(register unsigned int m, register long n)
    {
        ...
    }

Register Variables

The register declaration

- advises the compiler that the variable in question will be heavily used. The idea is to place it in a machine register
- Compilers are free to ignore the advice
- Can only be used with automatics and formal arguments
- Not possible to take the address of a register variable

    void
    f(register unsigned int m, register long n)
    {
        register int i;
        ...
    }

Initialization

- In the absence of explicit initialization, external and static variables are guaranteed to be initialized to zero.
- Scalar variables may be initialized when they are defined, by following the name with an equals sign and an expression:

    int x = 1;
    char squote = '\'';
    long day = 1000L * 60L * 60L * 24L;  /* milliseconds/day */

- For external and static variables, the initializer must be a constant expression; the initialization is done once, conceptually before the program begins execution.
- For automatic and register variables, it is done each time the function or block is entered (the initializer is not restricted to being a constant)

Block Structure And Scope

- Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function
- They hide any identically named variables in outer blocks
- They remain in existence until the matching right brace
- What is the scope of i?

    if (n > 0) {
        int i;  /* declare a new int */

        for (i = 0; i < n; i++)
            ...
    }
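The initialization and scoping rules can be tried out with a short sketch. The names `counter`, `ms_per_day` and `scope_demo` are invented for this example.

```c
int counter;              /* external, no initializer: guaranteed to be 0 */
long ms_per_day = 1000L * 60L * 60L * 24L;  /* constant expression */

/* Hypothetical demo of block scope: the inner i hides the outer one. */
int
scope_demo(int n)
{
    int i = -1;               /* initialized on every call */

    if (n > 0) {
        int i;                /* new variable, hides the outer i */

        for (i = 0; i < n; i++)
            counter++;
    }                         /* inner i ends here */
    return i;                 /* outer i, still -1 */
}
```

The scope of the inner `i` is the `if` block only; after the closing brace the name refers to the outer variable again.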

Block Structure And Scope

- An automatic variable declared and initialized in a block is initialized each time the block is entered
- A static variable is initialized only the first time the block is entered
- Automatic variables, including formal parameters, also hide external variables and functions of the same name

    int x;
    int y;

    void
    f(double x)
    {
        double y;
        ...
    }

Systems Programming
02. C Programs in Space and Time

Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

C Programs In (Address) Space And (Run-)time

Where is my data and why do I have to know?

- C is closely related to the machine. Before talking about pointers, storage allocation etc., some background knowledge about address space, (virtual) memory and its allocation during program execution comes in handy
- Knowledge about the memory layout of a program is quite helpful when debugging
- Knowledge about what is happening inside the machine on program execution is fundamental to both debugging programs and, in the first place, writing clean code

Outline

- Repetition Computer Architecture
- Storage Classes
- From Source Code To Executable Code
- Construction of an Executable
- Relocation Process

C, Assembler, And Machine Code

C source code:

    int a, b;
    a = b * b;

Intel IA-32 assembler source code (machine instructions, i.e. processor instructions) and the resulting executable binary code (shown in hexadecimal, one byte of content per address):

    Address          Content                 Assembler source
    4012ee-4012f2    a1 30 30 40 00          mov  0x403030,%eax
    4012f3-4012f9    0f af 05 30 30 40 00    imul 0x403030,%eax
    4012fa-4012fe    a3 20 30 40 00          mov  %eax,0x403020

C, Assembler, And Machine Code

    C source code     Memory address   Memory content            Assembler source code
                                       (= machine instruction)
    int a=4, b;
    int main(void) {
      if (a>5)        8048344:         83 3d 94 94 04 08 05      cmpl $0x5,0x8049494
                      804834b:         7e 0c                     jle  8048359
        b=1;          804834d:         c7 05 8c 95 04 08 01      movl $0x1,0x804958c
                      8048354:         00 00 00
                      8048357:         eb 0a                     jmp  8048363
      else
        b=0;          8048359:         c7 05 8c 95 04 08 00      movl $0x0,0x804958c
                      8048360:         00 00 00
    }                 8048363:         c9                        ...

- a is located at address 0x8049494
- b is located at address 0x804958c
- All numeric values in the binary and assembler code are to be read as hexadecimal

Address Space

[Figure: the address space with its memory contents]

    Memory address   Meaning
    0                lowest possible address (start of memory)
    0x10000000       start address of a data block (size of the data block: 16 bytes)
    0x1000000f       last byte address of the data block
    0x10000010       address of the first byte after the data block
    0x50000000       address of a single byte, content 0x56
    0x50000001       address of a single byte, content 0xfc
    max.             highest possible address (end of memory)

Byte Ordering

Data (4 bytes): d3 (MSB), d2, d1, d0 (LSB)

    Address   Big-endian system   Little-endian system
    n         d3 (MSB)            d0 (LSB)
    n+1       d2                  d1
    n+2       d1                  d2
    n+3       d0 (LSB)            d3 (MSB)

The program accesses the 4-byte datum through address n.
MSB = Most Significant Byte
LSB = Least Significant Byte
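The byte order of the machine at hand can be inspected by copying a 32-bit value into a byte array. This is a minimal sketch; the function names are invented, and it assumes the host uses one of the two orders shown above.

```c
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of value into out[] in memory order (address n..n+3). */
void
bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int
is_little_endian(void)
{
    unsigned char b[4];

    bytes_in_memory(0x0a0b0c0dU, b);   /* d3=0a d2=0b d1=0c d0=0d */
    return b[0] == 0x0d;               /* LSB at the lowest address? */
}
```

On an IA-32 or x86-64 machine `is_little_endian()` is expected to return 1, since the least significant byte is stored at address n.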

Alignment Rules

Goal: Optimal Performance

- Determine the address locations for variables and instructions
- Great impact on compiler, assembler, linker tools

[Figure: a misaligned data long word occupying addresses 0x35 to 0x38 in the address space. The data bus transfers one long word per access, with address offsets (byte addresses) +0 to +3: the 1st access covers 0x34 to 0x37, the 2nd access covers 0x38 to 0x3b. Long word boundaries on the bus correspond to long word boundaries in the address space (addresses divisible by 4 without remainder).]

Alignment Rules (cont.)

For derived types [16] (constructed from the basic types) alignment rules apply to each single component:

    struct artikel {char name[5];
                    int anzahl;
                    double preis;};

[Figure: layout of struct artikel with alignment(1) and with alignment(4)]

Alignment rules may be influenced through compiler directives (-malign-int aligns variables on 32-bit boundaries producing code that runs somewhat faster on processors with 32-bit busses at the expense of memory)

[16] arrays, functions, pointers, structures, unions (we will discuss them later)
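The effect of the alignment rules on struct artikel can be made visible with `offsetof` and `sizeof`. The sketch below makes no claim about concrete numbers, since they depend on compiler and ABI; `padding_after_name` and `print_layout` are illustrative names.

```c
#include <stddef.h>
#include <stdio.h>

struct artikel {
    char   name[5];
    int    anzahl;
    double preis;
};

/* Number of padding bytes the compiler inserted between name and anzahl. */
size_t
padding_after_name(void)
{
    return offsetof(struct artikel, anzahl) - sizeof(((struct artikel *)0)->name);
}

/* Print the layout chosen by the current compiler and ABI. */
void
print_layout(void)
{
    printf("name   at offset %zu\n", offsetof(struct artikel, name));
    printf("anzahl at offset %zu\n", offsetof(struct artikel, anzahl));
    printf("preis  at offset %zu\n", offsetof(struct artikel, preis));
    printf("sizeof(struct artikel) = %zu, padding after name = %zu\n",
        sizeof(struct artikel), padding_after_name());
}
```

If the int member must start on a 4-byte boundary, three padding bytes follow the five bytes of `name`; with byte alignment there would be none.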

Storage Classes

Placement of data in memory depends on storage class

- An object, such as a variable, is a location in storage, and its interpretation depends on two main attributes: its storage class and its type
- The storage class determines the lifetime of the storage associated with the identified object
- The type determines the meaning of the values found in the identified object.
- In C we have two storage classes: automatic and static
- Storage class specifiers (auto, extern, register, static) together with the context of an object's declaration, specify its storage class

Automatic Storage Class

Automatic Objects

- They are local to a block [17], discarded on exit from the block
- Declarations within a block create automatic objects if no storage class specification is mentioned or auto is used
- auto and register give the declared objects automatic storage class, and may be used only within functions
- Initialization of automatic objects is performed each time the block is entered at the top (if a jump into the block is executed the initializations are not performed)
- Objects declared register are automatic, and are (if possible) stored in fast registers of the machine
- For register the address operator & is not allowed

[17] aka compound statement, such as the body of a function

Static Storage Class

Static Objects

- May be local to a block or external to all blocks
- In both cases, they retain their values across exit from and reentry to functions and blocks
- Within a block, static objects are declared with static
- Objects declared outside of all blocks (at the same level as function definitions) are always static
- On the outer level, the keyword static makes them local to a particular translation unit (internal linkage)
- They are global to an entire program by omitting an explicit storage class, or by using extern (external linkage)

Storage Class And Sections

Intermediate Summary

- A program executed does not only use storage for its instructions, but additionally needs space for, e.g., variables
- Variables may be temporary, dynamically allocated, or static (i.e., permanent in terms of storage allocation), initialized or uninitialized, declared as constant (const) and thus read-only
- Placement of data in memory depends on its storage class
- During the translation process the compiler uses sections to divide the address space into logical units
- Details vary with operating systems and compiler used

Typical Program Organisation

A typical program divides naturally in sections

- Code: machine instructions, should be unmodifiable, size is known after compilation, does not change (.text)
- Data
    - static data
        - initialized (.data) / uninitialized (.bss)
        - constant address in memory
        - permanent life time
    - dynamic data
        - stack or heap
        - storage space not known
        - volatile life time

Program Sections

[Figure: sections in the address space]

    Section   Placed in
    .text     PROM or RAM, write-protected
    .data     RAM
    .bss      RAM

PROM: Programmable Read Only Memory (memory chip that cannot be written during operation)
RAM: Random Access Memory

Virtual Memory And Segments

Virtual Memory

- Whenever a process is created, the kernel provides a chunk of physical memory which can be located anywhere
- Through the magic of virtual memory (VM), the process believes it has all the memory on the computer
- Typically the VM space is laid out in a similar manner:
    - Text Segment (.text)
    - Initialized Data Segment (.data)
    - Uninitialized Data Segment (.bss)
    - The Stack
    - The Heap

A Program In Memory

[Figure: a program in memory, addresses starting at 0]

    Region               Kind           How it is provided
    Code, constants                     loaded from the executable file; contains the program start address
    Initialized data     static data    loaded from the executable file
    Uninitialized data   static data    provided at process start and initialized with 0 (cleared)
    Heap                 dynamic data   provided at process start, for dynamic memory allocation, grows towards the stack
    Stack                dynamic data   provided at process start, grows towards lower addresses (or towards higher addresses; processor-dependent)

Different Memory Layouts

[Figure: two alternative arrangements of the regions code/constants, initialized data, uninitialized data, heap and stack along the address axis: (A) solution on the PC (IA-32), (B) stack growing in the opposite direction]

Memory Segments

Text Segment. The text segment contains the actual code (including constants) to be executed. It's usually sharable, so multiple instances of a program can share the text segment to lower memory requirements. This segment is usually marked read-only so a program can't modify its own instructions.

Initialized Data Segment. This segment contains global variables which are initialized by the programmer.

Uninitialized Data Segment. Also named .bss (block started by symbol), which was an operator used by an old assembler. This segment contains uninitialized global variables. All variables in this segment are initialized to 0 or NULL pointers before the program begins to execute.

Memory Segments (cont.)

The Stack. The stack is a collection of stack frames which we will discuss later. When a new frame needs to be added (as a result of a newly called function), the stack grows downward.

The Heap. Dynamic memory, where storage can be (de-)allocated via C's free(3)/malloc(3). The C library also gets dynamic memory for its own personal workspace from the heap. As more memory is requested on the fly, the heap grows upward.

Variable Placement And Life Time (Code)

    #include <stdlib.h>

    int a;          /* permanent life time */
    static int b;   /* ditto, but reduced scope */

    void
    func(void)
    {
        char c;         /* only for the life time of func(), but twice */
                        /* (once per call); visible only in func() */
        static int d;   /* unique: exists once at a stable address, */
                        /* visible only in func() */
    }

    int
    main(void)
    {
        int e;  /* life time of main() */

        int *pi = (int *) malloc(sizeof(int));  /* newborn */

        func();
        func();
        free(pi);   /* RIP, pi points to an invalid address */
        return (0);
    }

Variable Placement And Life Time (Diagram)

[Figure: address space from address 0 to max. for the program above]

    Region   Contents                                          Markers
    Code     1st, 2nd, 3rd, 4th instruction, ...               PC(t=0), PC(t=x)
    Data     a, b, d
    Heap     the int that pi points to
    Stack    c, pi, e                                          SP(t=x), SP(t=0)

t=0: program execution is started, i.e., the execution environment is already initialized
t=x: arbitrary point in time during program execution

Variable Placement

Variables (outside a function). Globally declared variables go to the Uninitialized Data Segment if they are not initialized, to the Initialized Data Segment otherwise. This is necessary for the OS to decide if storage has to be loaded with initialization data from the executable binary.

Variables (inside a function). Implicit assumption of auto, go to The Stack. Declared as static, see above.

Constants (const). Text Segment

Function Parameters. Are pushed on The Stack or stored in registers. If pointers are passed, data is elsewhere.
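To see the placement rules at work, one can print the address of one object per region. This is only a sketch: the names are invented, the printed addresses differ between runs and systems, and which section a compiler actually picks is an assumption noted in the comments.

```c
#include <stdio.h>
#include <stdlib.h>

int a;              /* external, uninitialized: typically .bss */
int a_init = 1;     /* external, initialized: typically .data */

/* Returns the address of its static local; the same on every call. */
int *
static_local_address(void)
{
    static int d;
    return &d;
}

/* Print one address per region; the values differ between runs and systems. */
int
print_placement(void)
{
    int e = 0;                          /* automatic: on the stack */
    int *pi = malloc(sizeof *pi);       /* dynamic: on the heap */

    if (pi == NULL)
        return -1;
    printf(".bss   a      %p\n", (void *)&a);
    printf(".data  a_init %p\n", (void *)&a_init);
    printf("static d      %p\n", (void *)static_local_address());
    printf("stack  e      %p\n", (void *)&e);
    printf("heap   *pi    %p\n", (void *)pi);
    free(pi);
    return 0;
}
```

On many systems the static objects cluster at low addresses, the heap object lies above them and the stack object far higher, but that ordering is not guaranteed.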

From Source Code To Executable Code

From high-level language (HLL) source code to executable code, i.e., concrete processor instructions in combination with data.

Translation Steps (multi-phase compilation)

- Compilation: HLL source code to assembler source code
- Assembly: Assembler source code to object code
- Linking: Object code to executable code

Compilers and assemblers create object files containing the generated binary code and data for a source file. Linkers combine multiple object files into one, loaders take object files and load them into memory.

Goal: An executable binary file (a.out)

Translation Steps Using gcc(1)

[Figure: translation pipeline with input and output files]

    Step           Input                                          Output
    Preprocessor   C/C++ source code (*.c/*.cc/*.cpp)             preprocessed C/C++ source code (*.i/*.ii)
    Compiler       preprocessed source code (*.i/*.ii)            assembler source code (*.s)
    Assembler      assembler source code (*.s)                    object file, unlinked (*.o)
    Linker         object files, library files (*.o/*.a)          executable file a.out (= object file, loadable)

File Suffixes And Their Meaning

For any given input file, the file name suffix determines what kind of compilation is done (see gcc(1) for more details and suffixes):

    suffix   compilation step
    .c       C source code which must be preprocessed
    .i       C source code which should not be preprocessed
    .h       Header file to be turned into a precompiled header
    .s       Assembler code
    .o       An object file to be fed straight into linking

Creation Of An Executable File

[Figure: operations, commands, and their inputs and outputs]

    (Filename).c
      -> compile (gcc)  -> (Filename).s
      -> assemble (gas) -> (Filename).o
      -> link (ld), together with further object/library files -> a.out

The C Preprocessor

The C preprocessor performs . . .

- Inclusion of named files
- Macro Substitution
- Conditional Compilation

File Inclusion

A control line of the form

    # include <filename>

causes the replacement of that line by the entire contents of the file filename.

Note
The characters in the name filename must not include > or \n, and the effect is undefined if it contains any of ", ', \, or /*.

Location
The named file is searched for in a sequence of implementation-dependent places (often starting in /usr/include).

Macro Substitution

A control line of the form

    # define identifier token-sequence

causes the preprocessor to replace subsequent instances of the identifier with the given sequence of tokens.

Example

    #define EXIT_FAILURE 1
    #define EXIT_SUCCESS 0
    #define S_IRWXU 0000700    /* RWX mask for owner */
    #define S_IRUSR 0000400    /* R for owner */
    #define S_IWUSR 0000200    /* W for owner */
    #define S_IXUSR 0000100    /* X for owner */
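How such object-like macros are used in practice can be sketched as follows. To avoid clashing with the system headers, the example copies the values from the slide under hypothetical DEMO_ names; the two helper functions are likewise made up.

```c
/* Illustrative copies of the slide's values under hypothetical DEMO_ names. */
#define DEMO_IRWXU 0000700   /* RWX mask for owner */
#define DEMO_IRUSR 0000400   /* R for owner */
#define DEMO_IWUSR 0000200   /* W for owner */
#define DEMO_IXUSR 0000100   /* X for owner */

/* Returns nonzero if the owner may write, given a mode such as 0644. */
int
owner_can_write(unsigned int mode)
{
    /* After preprocessing this reads: (mode & 0000200) != 0 */
    return (mode & DEMO_IWUSR) != 0;
}

/* Keep only the owner's permission bits of a mode. */
unsigned int
owner_bits(unsigned int mode)
{
    return mode & DEMO_IRWXU;
}
```

The compiler proper never sees the macro names; the preprocessor has already replaced them by the octal constants.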

Macro Substitution (cont.)

A control line of the form

    # define identifier(identifier-list) token-sequence

where there is no space between the first identifier and the (, is a macro definition with parameters given by the identifier list.

Example

    #define S_ISDIR(m)  ((m & 0170000) == 0040000)    /* directory */
    #define S_ISCHR(m)  ((m & 0170000) == 0020000)    /* char sp. */
    #define S_ISBLK(m)  ((m & 0170000) == 0060000)    /* block sp. */
    #define S_ISREG(m)  ((m & 0170000) == 0100000)    /* regular */
    #define S_ISFIFO(m) ((m & 0170000) == 0010000)    /* fifo */

Macro Substitution (cont.)

A control line of the form

    # undef identifier

causes the identifier's preprocessor definition to be forgotten. It is not erroneous to apply #undef to an unknown identifier.

Example

    /*
     * Some header files may define an abs macro.
     * If defined, undef it to prevent a syntax error
     * and issue a warning.
     * #warning is an implementation-dependent directive
     * (a common compiler extension).
     */
    #ifdef abs
    #undef abs
    #warning abs macro collides with abs() prototype, undefining
    #endif
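A function-like macro is expanded textually, so its parameter should be parenthesized in the replacement text. The sketch below uses hypothetical DEMO_ names and the file-type values from the slide; `type_char` is an invented helper.

```c
#define DEMO_IFMT  0170000   /* mask for the file-type bits */
#define DEMO_IFDIR 0040000   /* directory */
#define DEMO_IFREG 0100000   /* regular file */

/* The argument is parenthesized so that expressions like a | b expand safely. */
#define DEMO_ISDIR(m) (((m) & DEMO_IFMT) == DEMO_IFDIR)
#define DEMO_ISREG(m) (((m) & DEMO_IFMT) == DEMO_IFREG)

/* Hypothetical helper: classify a mode value as 'd', '-' or '?'. */
char
type_char(unsigned int mode)
{
    if (DEMO_ISDIR(mode))
        return 'd';
    if (DEMO_ISREG(mode))
        return '-';
    return '?';
}
```

Without the inner parentheses, an argument such as `a | b` would expand to `a | b & 0170000`, where `&` binds more tightly than `|`.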

Conditional Inclusion

Parts of a program may be compiled conditionally

Example

    #ifndef NULL
    #ifdef __GNUG__
    #define NULL __null
    #else
    #define NULL 0L
    #endif
    #endif

Predefined Names

Several identifiers are predefined, and expand to produce special information. They, and also the preprocessor expression operator defined, may not be undefined or redefined.

    __LINE__   A decimal constant containing the current source line number
    __FILE__   A string literal containing the name of the file being compiled
    __DATE__   A string literal containing the date of compilation, "Mmm dd yyyy"
    __TIME__   A string literal containing the time of compilation, "hh:mm:ss"
    __STDC__   The constant 1. It is intended that this identifier be defined to be 1 only in standard-conforming implementations
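The predefined names are typically used for diagnostics. The following sketch defines a hypothetical `TRACE` macro and two invented helper functions; only the predefined identifiers themselves come from the standard.

```c
#include <stdio.h>

/* Hypothetical trace macro: expands at the call site, so __FILE__ and
 * __LINE__ describe where TRACE was used, not where it was defined. */
#define TRACE(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

/* Returns the source line number of its own return statement. */
int
current_line(void)
{
    return __LINE__;
}

/* Reports the build date and time baked in by the preprocessor. */
void
print_build_info(void)
{
    TRACE("printing build info");
    printf("built on %s at %s\n", __DATE__, __TIME__);
}
```

Because the values are fixed at compile time, `print_build_info()` reports when the program was built, not when it is run.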

Compilation

[Figure: HLL source code (text) -> compilation by the compiler -> assembler source code (text). Side products: possibly temporary files; a translation listing with error messages (text).]

Assembly

[Figure: assembler source code (text) -> assembly by the assembler -> object format (machine code and additional information). Side products: possibly temporary files; a translation listing with error messages and symbol table (text).]

Linking

[Figure: several files in object format (machine code and additional info) and files in library object format (machine code and additional info, found by library search) -> linking by the linker -> binary code or object format (absolute code, or relocatable code with additional info). Side products: possibly temporary files; a link map (address space usage) and symbol list (text).]

Program Section In Virtual Memory

[Figure: sections after compilation and after linking]

After compilation:

- Section .text (code): addresses 0 to xx
- Section .data (init. data): addresses 0 to yy
- Each section begins at address 0; sections are logical address spaces of the compiler

After linking (address space from 0 to 0xffffffff):

- .text is placed at 0x08048244
- .data is placed at 0x08049370
- All sections are placed absolutely in the address space

Linking An Executable Binary

[Figure: input data are unlinked object files; the result of linking is an executable file (linked, relocated)]

    Input     Sections
    OBJ1      .text1  .data1  .bss1
    OBJ2      .text2  .bss2
    OBJ3      .text3  .data3  .bss3

    Output    Sections
    OBJtotal  .text1 .text2 .text3  .data1 .data3  .bss1 .bss2 .bss3

.text: code
.data: initialized variables
.bss: uninitialized variables

- Each object code (compiled separately) starts at address 0
- Linking them together involves
    - centralization of sections
    - relocation of addresses

Relocation Records

- Once sections are placed subsequently, relocation can start
- Executable code contains embedded addresses
    - Static data, function calls, jump targets
- On relocation those have to be changed inside the code
- Without a relocation table this is not possible
- A relocation record holds the relative address of a symbol (name of a variable, a function etc.)

Source File: compile.c

    int a = 1;  /* Global variable, initialized   -> .data */
    int b;      /* Global variable, uninitialized -> .bss */

    int
    main(void)
    {
        static int c;   /* Local, static variable -> .bss */

        b = 5;
        c = b + a + 16;
        return c;
    }

- Compile a relocatable object file: cc -c compile.c (creates compile.o)
- Linking an executable binary (one-step compilation): cc compile.c -o compile

    RELOCATION RECORDS FOR [.text]:
    OFFSET   TYPE      VALUE
    0000001a R_386_32  b
    00000023 R_386_32  a
    00000029 R_386_32  b

Analysis of Object Files: compile.o

    $ file compile.o
    ELF 32-bit LSB relocatable, Intel 80386, version 1, not stripped

    $ objdump -x compile.o
    compile.o:     file format elf32-i386
    compile.o
    architecture: i386, flags 0x00000011:
    HAS_RELOC, HAS_SYMS
    start address 0x00000000

    Sections:
    Idx Name     Size      VMA       LMA       File off  Algn
      0 .text    0000005a  00000000  00000000  00000034  2**2
                 CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      1 .data    00000004  00000000  00000000  00000090  2**2
                 CONTENTS, ALLOC, LOAD, DATA
      2 .bss     00000004  00000000  00000000  00000094  2**2
                 ALLOC
      3 .rodata  00000005  00000000  00000000  00000094  2**0
                 CONTENTS, ALLOC, LOAD, READONLY, DATA

Object File: compile.o (cont.)

    SYMBOL TABLE:
    00000000 l    df *ABS*    00000000 compile.c
    00000000 l    d  .text    00000000
    00000000 l    d  .data    00000000
    00000000 l    d  .bss     00000000
    00000000 l     O .bss     00000004 c.0
    00000000 l    d  .rodata  00000000
    00000000 g     O .data    00000004 a
    00000000 g     F .text    0000005a main
    00000004       O *COM*    00000004 b

    RELOCATION RECORDS FOR [.text]:
    OFFSET   TYPE      VALUE
    0000001a R_386_32  b
    00000023 R_386_32  a
    00000029 R_386_32  b
    00000031 R_386_32  .bss
    00000036 R_386_32  .bss
    0000004c R_386_32  .rodata

    compile.o:     file format elf32-i386
    Disassembly of section .text:
    00000000 <main>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 18                sub    $0x18,%esp
       6:   83 e4 f0                and    $0xfffffff0,%esp
       9:   b8 00 00 00 00          mov    $0x0,%eax
       e:   29 c4                   sub    %eax,%esp
      10:   a1 00 00 00 00          mov    0x0,%eax
      15:   89 45 e8                mov    %eax,0xffffffe8(%ebp)
      18:   c7 05 00 00 00 00 05    movl   $0x5,0x0
      1f:   00 00 00
      22:   a1 00 00 00 00          mov    0x0,%eax
      27:   03 05 00 00 00 00       add    0x0,%eax
      2d:   83 c0 10                add    $0x10,%eax
      30:   a3 00 00 00 00          mov    %eax,0x0
      35:   a1 00 00 00 00          mov    0x0,%eax
      3a:   8b 55 e8                mov    0xffffffe8(%ebp),%edx
      3d:   3b 15 00 00 00 00       cmp    0x0,%edx
      43:   74 13                   je     58 <main+0x58>
      45:   83 ec 08                sub    $0x8,%esp
      48:   ff 75 e8                pushl  0xffffffe8(%ebp)
      4b:   68 00 00 00 00          push   $0x0
      50:   e8 fc ff ff ff          call   51 <main+0x51>
      55:   83 c4 10                add    $0x10,%esp
      58:   c9                      leave
      59:   c3                      ret

The same disassembly, interleaved with the source code:

    compile.o:     file format elf32-i386
    Disassembly of section .text:
    00000000 <main>:
    int b;      /* Global variable, uninitialized -> .bss */

    int
    main(void)
    {
       0:   55                      push   %ebp
    ... 6 more lines ...
      15:   89 45 e8                mov    %eax,0xffffffe8(%ebp)
        static int c;   /* Local, static variable -> .bss */

        b = 5;
      18:   c7 05 00 00 00 00 05    movl   $0x5,0x0
      1f:   00 00 00
        c = b + a + 16;
      22:   a1 00 00 00 00          mov    0x0,%eax
      27:   03 05 00 00 00 00       add    0x0,%eax
      2d:   83 c0 10                add    $0x10,%eax
      30:   a3 00 00 00 00          mov    %eax,0x0
        return c;
      35:   a1 00 00 00 00          mov    0x0,%eax
    }
    ... 10 more lines ...

Executable Binary File: compile

    compile:     file format elf32-i386
    compile
    architecture: i386, flags 0x00000112:
    EXEC_P, HAS_SYMS, D_PAGED
    start address 0x1c000408

    Sections:
    Idx Name   Size      VMA       LMA       File off  Algn
    ...
      9 .text  00000214  1c000408  1c000408  00000408  2**2
               CONTENTS, ALLOC, LOAD, READONLY, CODE
    ...
     12 .data  00000014  3c001008  3c001008  00001008  2**2
               CONTENTS, ALLOC, LOAD, DATA
    ...
     20 .bss   00000184  3c003100  3c003100  00001100  2**5
               ALLOC

    SYMBOL TABLE:
    3c003140 l     O .bss   00000004 c.0
    3c003280 g     O .bss   00000004 b
    1c0005c0 g     F .text  0000005a main
    3c001018 g     O .data  00000004 a

    1c0005c0 <main>:
    int b;      /* Global variable, uninitialized -> .bss */

    int
    main(void)
    {
    1c0005c0:   55                      push   %ebp
    1c0005c1:   89 e5                   mov    %esp,%ebp
    1c0005c3:   83 ec 18                sub    $0x18,%esp
    1c0005c6:   83 e4 f0                and    $0xfffffff0,%esp
    1c0005c9:   b8 00 00 00 00          mov    $0x0,%eax
    1c0005ce:   29 c4                   sub    %eax,%esp
    1c0005d0:   a1 00 31 00 3c          mov    0x3c003100,%eax
    1c0005d5:   89 45 e8                mov    %eax,0xffffffe8(%ebp)
        static int c;   /* Local, static variable -> .bss */

        b = 5;
    1c0005d8:   c7 05 80 32 00 3c 05    movl   $0x5,0x3c003280
    1c0005df:   00 00 00
        c = b + a + 16;
    1c0005e2:   a1 18 10 00 3c          mov    0x3c001018,%eax
    1c0005e7:   03 05 80 32 00 3c       add    0x3c003280,%eax
    1c0005ed:   83 c0 10                add    $0x10,%eax
    1c0005f0:   a3 40 31 00 3c          mov    %eax,0x3c003140
        return c;
    1c0005f5:   a1 40 31 00 3c          mov    0x3c003140,%eax
    }

Relocation Of An Assembler Instruction

During the linking process relocated addresses are injected in the code, for example the assignment b = 5;

Before relocation (relocatable compile.o):

    18:         c7 05 00 00 00 00 05    movl   $0x5,0x0

After relocation (executable compile):

    1c0005d8:   c7 05 80 32 00 3c 05    movl   $0x5,0x3c003280

The proper address for b can be found in the symbol table.

    SYMBOL TABLE: (compile)
    3c003280 g     O .bss   00000004 b

The symbol table for compile yields 3c003280 for variable b

Relocation Of An Assembler Instruction (cont.)

? How to find the right places in the machine code to perform the substitutions?

- Linker has relocation record (relative address) of b

    RELOCATION RECORDS FOR [.text]: (compile.o)
    0000001a R_386_32  b

- Linker has absolute address of main from symbol table

    SYMBOL TABLE: (compile)
    3c003280 g     O .bss   00000004 b
    1c0005c0 g     F .text  0000005a main

Relocation Of An Assembler Instruction (cont.)

Putting it all together:

    RELOCATION RECORDS FOR [.text]: (compile.o)
    0000001a R_386_32  b       (relative offset)

    SYMBOL TABLE: (compile)
    3c003280 g     O .bss   00000004 b      (abs. address of b)
    1c0005c0 g     F .text  0000005a main   (abs. address of main)

Computing the address where substitution must be performed:

    1c0005c0 + 0000001a = 1c0005da

    18:         c7 05 00 00 00 00 05    movl   $0x5,0x0
    1c0005d8:   c7 05 80 32 00 3c 05    movl   $0x5,0x3c003280
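The arithmetic of this example can be replayed in a few lines of C. The sketch below is a strongly simplified stand-in for what a linker does for one absolute 32-bit relocation; the function names are invented, and it assumes a zero addend and IA-32 byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Address at which the linker must patch: symbol address plus record offset. */
uint32_t
patch_address(uint32_t section_symbol_addr, uint32_t reloc_offset)
{
    return section_symbol_addr + reloc_offset;
}

/* Store a 32-bit absolute address at code[offset..offset+3], LSB first,
 * as IA-32 expects (a simplified stand-in for an R_386_32 relocation). */
void
apply_abs32(unsigned char *code, size_t offset, uint32_t target)
{
    code[offset + 0] = (unsigned char)(target & 0xff);
    code[offset + 1] = (unsigned char)((target >> 8) & 0xff);
    code[offset + 2] = (unsigned char)((target >> 16) & 0xff);
    code[offset + 3] = (unsigned char)((target >> 24) & 0xff);
}
```

Applied to the movl instruction at offset 0x18, the record offset 0x1a points two bytes into the instruction, so the four zero bytes after `c7 05` are replaced by `80 32 00 3c`, the little-endian encoding of 0x3c003280.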

Systems Programming
03. Functions and Program Structure

Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule For Today

Please make sure to register to the course via StudIS. Otherwise you cannot attend the examination.

So far: Static view of the program (before run-time)

- Compilation in different steps
- Program files (e.g., in the ELF) contain sections
- Sections are mapped to VM segments
- Observed correlation between static storage class specifier, sections in ELF file and location in virtual memory

Today: Dynamic view on the program (during run-time)

- A closer look at functions
- Automatic allocation of memory on the stack/the heap

Basics of Functions
Functions Returning Non-integers
External Variables
Scope Rules
Header Files
Static Variables
A Program in Execution - Unix Run-time

Basics Of Functions

Break large computing tasks into smaller ones
Enable people to build on what others have done
No starting over from scratch
Hide details of operation from parts of the program that don't
need to know about them
Structure the program
Ease the pain of making changes

A Simple Version Of The Unix Tool grep(1)

Basic task for simple grep:
Print each line of input that contains a particular pattern

Example:
Input: Text in /etc/services
Pattern: http

$ ./a.out < /etc/services
# See also http://www.iana.org/assignments/port-numbers
www             80/tcp          http    # WorldWideWeb HTTP
https           443/tcp                 # secure http (SSL)

Program Layout Of Simple grep

Simple grep falls neatly into three pieces:

while (there is another line)
    if (the line contains the pattern)
        print it

As said, small pieces are easier to deal with than one big one
Irrelevant details can be buried in the functions
Chance of unwanted interactions is minimized
Pieces may even be useful in other programs

A Function For Each Problem

Simple grep falls neatly into three pieces:

while (there is another line)               getline()
    if (the line contains the pattern)
        print it                            printf(3)

Decide whether the line contains an occurrence of the pattern

We write strindex(s, t) that returns the position or index in
the string s where the string t begins, or -1 if s does not contain t.
If we later want to switch to more sophisticated pattern
matching, we only have to replace strindex; the rest of the
code remains the same. [18]

[18] The standard library provides strstr(3) that is similar to strindex, except
that it returns a pointer instead of an index

Source Code grep:main

#include <stdio.h>
#define MAXLINE 1000    /* maximum input line length */

int getline(char line[], int max);
int strindex(char source[], char searchfor[]);

char pattern[] = "http";    /* pattern to search for */

/* find all lines matching pattern */
int
main(void)
{
    char line[MAXLINE];
    int found = 0;

    while (getline(line, MAXLINE) > 0)
        if (strindex(line, pattern) >= 0) {
            printf("%s", line);
            found++;
        }
    return found;
}

Source Code grep:strindex

/* strindex: return index of t in s, -1 if none */
int
strindex(char *s, char *t)
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        for (j = 0, k = i; t[j] == s[k] && t[j] != '\0'; j++, k++)
            ;
        if (j > 0 && t[j] == '\0')
            return i;
    }
    return (-1);
}

Function Definition

A function definition has the form:

return-type
function-name(parameter declarations, if any)
{
    declarations
    statements
}

Various parts may be absent; a minimal function is

void dummy(void) { }

which does nothing, accepts nothing, and returns nothing [19]

If the return-type is omitted, int is assumed
(C89 rule; C99 and later require an explicit return type)

[19] . . . but may be used as a place holder during program development
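The footnote on strstr(3) can be made concrete with a small sketch. It shows how an index-returning search could be built on top of strstr; the name strindex_via_strstr is made up for this example. One assumption is made to stay close to the lecture's strindex: an empty pattern is reported as "not found", whereas strstr itself would return the start of the string.

```c
#include <stddef.h>
#include <string.h>

/*
 * strindex_via_strstr: index of the first occurrence of t in s,
 * or -1 if there is none (hypothetical helper built on strstr(3)).
 */
int
strindex_via_strstr(const char *s, const char *t)
{
    const char *p;

    if (t[0] == '\0')
        return -1;      /* mirror strindex, which rejects an empty pattern */
    p = strstr(s, t);   /* pointer to the match, or NULL */
    return (p == NULL) ? -1 : (int)(p - s);
}
```

The pointer difference p - s converts the pointer returned by strstr back into the index that strindex reports.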

A C Program Seen As Set Of External Objects

A C program is just a set of definitions of variables and functions

Communication between the functions is
    by arguments
    by values returned by the functions
    through external variables

The functions can occur in any order in the source file

The source program can be split into multiple files, so long as no
function is split.

Returning From Functions

The return Statement . . .
. . . is the mechanism for returning a value from the called function
to its caller.

Any expression can follow return
    the expression will be converted to the return-type of the function

The calling function is free to ignore the returned value

There need be no expression after return
    in that case, no value is returned to the caller (garbage)

Control also returns with no value when execution falls off
the end of the function by reaching the closing right brace

It is not illegal, but probably a sign of trouble, if a function
returns a value from one place and no value from another

If a function fails to return a value, its value is likely garbage

Functions Returning Non-integers

Functions returning non-integer values

So far we have only returned either no value (void) or an int
What if a function must return some other type?
To illustrate how to deal with this, we write and use atof(s).

The function atof(s)

Converts the string s to its double-precision floating-point
equivalent. It handles an optional sign and decimal point, and the
presence or absence of either integer part or fractional part [20].

[20] Use atof(3) declared by <stdlib.h> in real life

Source Code: atof

#include <ctype.h>      /* isspace, isdigit ... */

/* atof: convert string s to double */
double
atof(char s[])
{
    double val, power;
    int i, sign;

    for (i = 0; isspace(s[i]); i++)
        ;       /* skip white space */
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (val = 0.0; isdigit(s[i]); i++)
        val = 10.0 * val + (s[i] - '0');
    if (s[i] == '.')
        i++;
    for (power = 1.0; isdigit(s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;
    }
    return sign * val / power;
}

Declare To Use A Function

Calling function must know atof(s) returns a non-int value

One way to ensure this:
    Declare atof() explicitly in the calling function

This kind of declaration is shown in a primitive calculator:

#include <stdio.h>

#define MAXLINE 100

/* rudimentary calculator */
int
main(void)
{
    double sum, atof(char []);
    char line[MAXLINE];
    int getline(char line[], int max);

    sum = 0;
    while (getline(line, MAXLINE) > 0)
        printf("\t%g\n", sum += atof(line));
    return (0);
}

Sample session (input line, then running sum):

+123.2
        123.2
-0.2
        123
+0.7
        123.7
-123.1
        0.6

Inconsistent Return Types

The declaration

    double sum, atof(char []);

says that sum is a double variable, and that atof is a function that
takes one char[] argument and returns double.

The function atof must be declared and defined consistently

If atof itself and the call to it have inconsistent types in the
same source file, the error will be detected by the compiler

But if (as is more likely) atof were compiled separately, the
mismatch would not be detected, atof would return a
double that main would treat as an int, and meaningless
answers would result

Function Declaration By Context

A mismatch can happen
    if there is no function prototype,
    and a function is implicitly declared by its first appearance in
    an expression, just like in our calculator expression
    sum += atof(line)

If a name that has not been previously declared occurs in an
expression and is followed by a left parenthesis, it is declared
by context to be a function name

The function is assumed to return an int

Nothing is assumed about its arguments

(This is the C89 rule; C99 removed implicit function declarations,
and current compilers warn about them or reject them.)

Missing Function Arguments

If a function declaration does not include arguments, as in

    double atof();

this is taken to mean that nothing is to be assumed about the
arguments of atof; all parameter checking is turned off.

This special meaning of the empty argument list is intended to
permit older C programs to compile with ANSI/ISO compilers

If the function takes arguments, declare them

If the function takes no arguments, use void

Explicit Cast Of The Return Type

Given atof, properly declared, we could write atoi in terms of it:

/* atoi: convert string s to integer using atof */
int
atoi(char s[])
{
    double atof(char s[]);

    return (int) atof(s);
}

The value of the return expression is converted to the type of
the function before the return is taken.

Therefore, the value of atof, a double, is converted
automatically to int when it appears in this return

This operation potentially discards information, so some
compilers warn about it

The cast states explicitly that the operation is intended
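What "discarding information" means here can be shown with a minimal sketch. It uses the library atof(3) from <stdlib.h> instead of the lecture's own version; the name atoi_via_atof is invented for this example so that it does not collide with the library atoi.

```c
#include <stdlib.h>     /* atof */

/*
 * atoi_via_atof: convert s to int by way of atof (hypothetical name).
 * The cast documents that dropping the fractional part is intended;
 * conversion from double to int truncates toward zero.
 */
int
atoi_via_atof(const char *s)
{
    return (int) atof(s);
}
```

With this definition "3.99" becomes 3 and "-3.99" becomes -3: the fraction is cut off, not rounded.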

External Variables

External Objects

As mentioned, a program is just a set of definitions of variables
and functions. These can be considered as external objects.

Functions are always external [21]

External is used in contrast to internal, which describes the
arguments and variables used inside functions

By default, external variables and functions have the
property that all references to them by the same name, even
from functions compiled separately, are references to the same
thing (this is called external linkage in the standard)

[21] C does not allow functions to be defined inside other functions

A Reverse Polish Notation Calculator

We will build a reverse polish notation calculator to discuss
    Function evaluation
    Splitting up a program in several source files
    Scope Rules

Program description

Infix Notation vs. Reverse Polish Notation

    ( 1 - 2 ) * ( 4 + 5 )

    1 2 - 4 5 + *

Parentheses are not needed; the notation is unambiguous as long as
we know how many operands each operator expects.

Calculator Design Using A Stack

input:   1    2    -    4    5    +    *

stack:   1    1   -1   -1   -1   -1   -9
              2         4    4    9
                             5

Each operand arriving is pushed on the stack

Once an operator arrives
    Pop the appropriate number of operands (e.g., two for binary operators)
    Apply operator to them
    Push the result back onto the stack

The value on the top of the stack is popped and printed when
the end of the input line is encountered.
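The stack discipline described above can be sketched as a single function. This is an illustration only and deliberately differs from the lecture's design: it takes the input already split into tokens and keeps the stack in a local array, whereas the lecture's calculator reads characters with getop and shares the stack through external variables. The names rpn_eval and RPN_MAXVAL are assumptions of this sketch.

```c
#include <stdlib.h>     /* atof */
#include <string.h>     /* strlen, strchr */

#define RPN_MAXVAL 100  /* hypothetical maximum stack depth */

/*
 * rpn_eval: evaluate n RPN tokens, store the value in *result.
 * Returns 0 on success and -1 on error (too few operands,
 * zero divisor, stack full, or operands left over).
 */
int
rpn_eval(const char *tok[], int n, double *result)
{
    double val[RPN_MAXVAL];
    double op2;
    int sp = 0;         /* next free stack position */
    int i;

    for (i = 0; i < n; i++) {
        if (strlen(tok[i]) == 1 && strchr("+-*/", tok[i][0]) != NULL) {
            if (sp < 2)
                return -1;
            op2 = val[--sp];    /* right operand is popped first */
            switch (tok[i][0]) {
            case '+': val[sp - 1] += op2; break;
            case '-': val[sp - 1] -= op2; break;
            case '*': val[sp - 1] *= op2; break;
            case '/':
                if (op2 == 0.0)
                    return -1;
                val[sp - 1] /= op2;
                break;
            }
        } else {
            if (sp >= RPN_MAXVAL)
                return -1;
            val[sp++] = atof(tok[i]);   /* operand: push it */
        }
    }
    if (sp != 1)
        return -1;
    *result = val[0];
    return 0;
}
```

Fed with the tokens 1 2 - 4 5 + * the function walks through exactly the stack states shown in the diagram and ends with -9 as the only value left.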

Calculator Program Layout

Basic structure of our calculator (controlling main function):

while (next operator or operand is not EOF)
    if (number)
        push it
    else if (operator)
        pop operands
        do operation
        push result
    else if (newline)
        pop and print top of stack
    else
        error

Program Design Considerations

Pushing and popping a stack are trivial, but with error
handling added they are long enough to put each in a separate function

A function for fetching the next input operator or operand

Where to put the stack? Who should access it directly?

    Keep it in main.
    Pass the stack to the routines that push and pop it
        But main doesn't need to know about the stack
        main only does push and pop operations

    Store the stack and its pointer in external variables
    Accessible to the push and pop functions but not main

Program Layout In One Source File

Let's think of the program as existing in one source file:

    #includes
    #defines

    function declarations for main

    int main(void) { }

    external variables for push and pop

    void push(double f) { }
    double pop(void) { }

    int getop(char s[]) { }

    routines called by getop

Main loop switches on the type of operator or operand

#include <stdio.h>
#include <stdlib.h>

#define MAXOP 100
/* signal number found */
#define NUMBER 0

int getop(char []);
void push(double);
double pop(void);

/* reverse polish calc */
int
main(void)
{
    int type;
    char s[MAXOP];

    while ((type = getop(s)) != EOF) {
        switch (type) {
        case NUMBER:
            break;
        case '+':
            break;
        case '*':
            break;
        case '-':
            break;
        case '/':
            break;
        case '\n':
            break;
        default:
            printf("error: unknown command %s\n", s);
        }
    }
    return (0);
}

Order Of Function Evaluation

What about the following implementation of switch?

    switch (type) {
    case NUMBER:
        push(atof(s));
        break;
    case '+':
        push(pop() + pop());
        break;
    case '*':
        push(pop() * pop());
        break;
    case '-':
        push(pop() - pop());
        break;
    case '/':
        push(pop() / pop());
        break;
    case '\n':
        printf("\t%.8g\n", pop());
        break;
    default:
        printf("error: unknown command %s\n", s);
    }

+ and * are commutative, the order in which the popped
operands are combined is irrelevant
- and / left and right operands must be distinguished
/ error: zero-divisor
The order in which function calls are evaluated is not defined
-> Implementation is erroneous

Steering The Order Of Function Evaluation

To guarantee the right order, it is necessary to pop the first value
into a temporary variable.

    switch (type) {
    case NUMBER:
        push(atof(s));
        break;
    case '+':
        push(pop() + pop());
        break;
    case '*':
        push(pop() * pop());
        break;
    case '-':
        op2 = pop();
        push(pop() - op2);
        break;
    case '/':
        op2 = pop();
        if (op2 != 0.0)
            push(pop() / op2);
        else
            printf("error: zero divisor\n");
        break;
    case '\n':
        printf("\t%.8g\n", pop());
        break;
    default:
        printf("error: unknown command %s\n", s);
    }

Source Code Stack

The stack itself and its fill factor (the stack pointer) are
shared by push and pop

Since they are defined outside any function, they are external

#define MAXVAL 100      /* maximum depth of val stack */

int sp = 0;             /* next free stack position */
double val[MAXVAL];     /* value stack */

/* push: push f onto value stack */
void
push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
    else
        printf("error: stack full, can't push %g\n", f);
}

/* pop: pop and return top value from stack */
double
pop(void)
{
    if (sp > 0)
        return val[--sp];
    else {
        printf("error: stack empty\n");
        return (0.0);
    }
}

Source Code To Get Operands And Operators

/* getop: get next operator or numeric operand */
int
getop(char s[])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c;       /* not a number */
    i = 0;
    if (isdigit(c))     /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.')       /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}

What Are getch And ungetch?

It is often the case that a program cannot determine that it has
enough input until it has read too much.

Example: Collecting the characters that make up a number

Problem: Until the first non-digit is seen, the number is not
complete. But then the program has read one character too far.
Solution: It would be nice if it were possible to un-read the
unwanted character.

The Functions getch And ungetch

getch delivers the next input character to be considered

ungetch remembers the characters put back on the input.
Subsequent calls to getch will return them before
reading new input [22].

Work together via a shared buffer and an index in the buffer.

Because of that and because they must retain their values
between calls they must be external to both functions.

[22] ungetc(3) declared in <stdio.h> un-gets a character from input stream

Source Code: (un-)getch

#define BUFSIZE 100

char buf[BUFSIZE];      /* buffer for ungetch */
int bufp = 0;           /* next free position in buf */

/* getch: get a (possibly pushed back) character */
int
getch(void)
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

/* ungetch: push character back on input */
void
ungetch(int c)
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
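The same "read one character too far, then put it back" idea can be sketched with the library routine ungetc(3) mentioned in the footnote. The function read_uint below is a made-up helper for illustration, not part of the calculator; it relies on the guarantee that ungetc can push back at least one character per stream.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * read_uint: read a run of decimal digits from fp into *out
 * (hypothetical helper). Returns the number of digits consumed.
 * The character that ends the number is un-read with ungetc(3),
 * so the next read on fp sees it again.
 */
int
read_uint(FILE *fp, unsigned int *out)
{
    unsigned int val = 0;
    int c, ndigits = 0;

    while ((c = getc(fp)) != EOF && isdigit(c)) {
        val = 10 * val + (unsigned int)(c - '0');
        ndigits++;
    }
    if (c != EOF)
        ungetc(c, fp);  /* one character too far: put it back */
    *out = val;
    return ndigits;
}
```

On the input 123+4 the first call yields 123 and leaves the + in the stream, so the caller can read the operator with an ordinary getc before asking for the next number.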

Scope Rules

A Program In Several Files

As seen in assignments, the functions and variables that make
up a C program need not all be compiled at the same time.

The source text may be kept in several files, and previously
compiled routines may be loaded from libraries.

There may arise some questions with this:

How are declarations written so that variables are properly
declared during compilation?

How are declarations arranged so that all the pieces will be
properly connected when the program is loaded?

How are declarations organized so there is only one copy?

How are external variables initialized?

Visibility Scope

The scope of a name is the part of the program within which the
name can be used.

For an automatic variable declared at the beginning of a
function, the scope is the function in which the name is
declared

Local variables of the same name in different functions are
unrelated

The same is true of the parameters of the function, which
are in effect local variables

The scope of an external variable or a function lasts from
the point at which it is declared to the end of the file being
compiled

Scope In Natural Order Of Appearance

main, sp, val, push, & pop defined in one file, in the order shown:

int
main(void)
{ ... }

int sp = 0;
double val[MAXVAL];

void
push(double f)
{ ... }

double
pop(void)
{ ... }

Variables sp and val may be used in push and pop simply by
naming them; no further declarations are needed.
But these names are not visible in main, nor are push and pop

Definition & Declaration Of External Variables

If an external variable is to be referred to before it is defined
Or if it is defined in a different source file from the one where
it is being used
Then an extern declaration is mandatory.

It is important to distinguish between the declaration of an
external variable and its definition.

definition causes storage to be set aside (sets storage class)
declaration announces the properties of a variable (its type)

Definition And Declaration Of External Variables

Consider the lines to appear outside of any function:

int sp;
double val[MAXVAL];

They define the external variables sp and val
and cause storage to be set aside
and serve as the declaration for the rest of that source file.

On the other hand, consider the lines:

extern int sp;
extern double val[];

They declare for the rest of the source file that sp is an int
and val is a double[] (whose size is determined elsewhere)

They do not create the variables or reserve storage for them.
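The first case, referring to an external variable before its definition, can be sketched inside a single file. The helper functions stack_depth and stack_put are invented for this illustration and are not part of the calculator; they only exist to use sp and val ahead of the definitions.

```c
extern int sp;          /* declaration: announces the type, reserves nothing */
extern double val[];    /* declaration: array size is left to the definition */

/* stack_depth: uses sp although its definition appears further down */
int
stack_depth(void)
{
    return sp;
}

/* stack_put: store f in the next free slot (no overflow check in this sketch) */
void
stack_put(double f)
{
    val[sp++] = f;
}

#define MAXVAL 100

int sp = 0;             /* definition: storage is set aside, may be initialized */
double val[MAXVAL];     /* definition: the size must be given here */
```

Without the two extern lines the compiler would reject stack_depth and stack_put, because sp and val would not yet be in scope at that point of the file.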

Wrap Up Definition And Declaration

There must be only one definition of an external variable
among all files that make up the program.

Other files may contain extern declarations to access it [23]

Array sizes must be specified with the definition, but are
optional with an extern declaration.

Initialization of an external variable is possible only within
the definition

[23] There may also be extern declarations in the file containing the definition

Definition/Declaration Of Externals

Although it is not a likely organization for this program,
functions push and pop could be defined in one file,
variables val and sp defined and initialized in another.

These definitions and declarations tie them together:

In file 1:

extern int sp;
extern double val[];

void push(double f) { ... }
double pop(void) { ... }

In file 2:

#define MAXVALUE 100

int sp = 0;
double val[MAXVALUE];

Because the extern declarations lie ahead of and outside the
function definitions, they apply to all functions

One set of declarations suffices for all of file 1

The same organization would also be needed if the definitions
of sp and val followed their use in one file

Header Files

Program Organisation In Different Files

Let us now divide the calculator program into several source files
(as a simulation for substantially bigger programs)

main -> main.c
push and pop, and their variables -> stack.c
getop -> getop.c
getch and ungetch -> getch.c [24]

[24] We separate them from the others because they would come from a
separately-compiled library in a realistic program

Header File

What about the definitions & declarations shared among files?

As much as possible, we want to centralize this

As a consequence, there would be only one copy to get right
and keep right as the program evolves

We will place this common material in a header file calc.h

It will be included by the others as necessary

There is a tradeoff between the desire that each file have
access only to the information it needs for its job and the
practical reality that it is harder to maintain more header files

Up to some moderate program size, it is probably best to have
one header file that contains everything that is to be shared
between any two parts of the program

Program Structure

calc.h:

#define NUMBER 0
void push(double);
double pop(void);
int getop(char []);
int getch(void);
void ungetch(int);

main.c:

#include <stdio.h>
#include <stdlib.h>
#include "calc.h"
#define MAXOP 100

int main(void) { ... }

getop.c:

#include <stdio.h>
#include <ctype.h>
#include "calc.h"

int getop(char []) { ... }

stack.c:

#include <stdio.h>
#include "calc.h"
#define MAXVAL 100

int sp = 0;
double val[MAXVAL];

void push(double) { ... }
double pop(void) { ... }

getch.c:

#include <stdio.h>
#define BUFSIZE 100

char buf[BUFSIZE];
int bufp = 0;

int getch(void) { ... }
void ungetch(int) { ... }

Static Variables

The variables
    sp and val in stack.c and buf and bufp in getch.c
    are for private use of functions in their source files
    are not meant to be accessed by anything else

The static declaration
    applied to an external variable or function
    limits the scope of that object to the rest of the source file

External static thus provides a way to hide names like buf and
bufp in the getch-ungetch combination, which must be external so
they can be shared, yet which should not be visible to users of
getch and ungetch.

External Static Example

If the two functions and the two variables are compiled in one file:

static char buf[BUFSIZE];   /* buffer for ungetch */
static int bufp = 0;        /* next free position in buf */

int getch(void) { ... }
void ungetch(int c) { ... }

No other function will be able to access buf and bufp

The names will not conflict with the names in other files of
the same program

The same goes for sp and val in stack.c

Specifier static For Functions

The external static is most often used for variables

But it can be applied to functions as well

Normally, function names are global; if a function is declared static,
however, its name is invisible outside of the file in which it is compiled.

$ readelf -s global.o
Symbol table '.symtab' contains 7 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS global.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    2
     4: 00000000     0 SECTION LOCAL  DEFAULT    3
     5: 00000005     5 FUNC    LOCAL  DEFAULT    1 local_func
                               ^^^^^
     6: 00000000     5 FUNC    GLOBAL DEFAULT    1 global_func
                               ^^^^^^

Internal static For Local Variables

The internal static declaration for local variables

Internal static variables are local to a particular function just
as automatic variables

But unlike automatics (-> stack), they remain in existence
rather than coming and going each time the function is
activated (-> .bss or .data).

This means that internal static variables provide private,
permanent storage within a single function

Storage Allocation For Internal static Variables

int main(void) {
    static int internal_static[4096];
}

$ readelf -a internal_static.o
Section Headers:
  [Nr] Name   Type      Addr     Off    Size   ES Flg Lk Inf Al
  [ 3] .data  PROGBITS  00000000 000074 000000 00  WA  0   0  4
  [ 4] .bss   NOBITS    00000000 000080 004000 00  WA  0   0 32
       ^^^^                             ^^^^^^

int main(void) {
    static int internal_static[4096] = { 1, 2, 3 };
}

$ readelf -a internal_static.o
  [Nr] Name   Type      Addr     Off    Size   ES Flg Lk Inf Al
  [ 3] .data  PROGBITS  00000000 000080 004000 00  WA  0   0 32
       ^^^^^                            ^^^^^^
  [ 4] .bss   NOBITS    00000000 004080 000000 00  WA  0   0  4
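"Private, permanent storage within a single function" can be illustrated with a small counter. The function next_id is a made-up example; the point is only that counter keeps its value between calls although no other function can name it.

```c
/*
 * next_id: return 1, 2, 3, ... on successive calls (hypothetical example).
 * counter is initialized once, before the program starts running,
 * and keeps its value between calls.
 */
int
next_id(void)
{
    static int counter = 0;     /* internal static: not on the stack */

    return ++counter;
}
```

Had counter been an automatic variable, every call would start again from 0 and the function would always return 1.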

Array Initialization

int main(void) {
    static int internal_static[4096] = { 1, 2, 3 };
}

If there are fewer initializers for an array than the number
specified, the missing elements will be zero for external, static,
and automatic variables

It is an error to have too many initializers

There is no way to specify repetition of an initializer, nor to
initialize an element in the middle of an array without
supplying all the preceding values as well
(this holds for C89; C99 added designated initializers)

A Program in Execution - Unix Run-time
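The zero-fill rule also applies to automatic arrays, which is easy to overlook because an automatic array without any initializer holds garbage. A minimal sketch, with the function name count_zero_tail and the array size chosen only for this example:

```c
#define NELEMS 8        /* hypothetical array size for the example */

/*
 * count_zero_tail: count the zero elements of a partially
 * initialized automatic array.
 */
int
count_zero_tail(void)
{
    int a[NELEMS] = { 1, 2, 3 };    /* a[3] .. a[7] are set to zero */
    int i, zeros = 0;

    for (i = 0; i < NELEMS; i++)
        if (a[i] == 0)
            zeros++;
    return zeros;
}
```

Three of the eight elements receive explicit values, so the function reports five zero elements.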

Virtual Memory User/Kernel

System V ABI old style
(without stack randomization etc.)

0x08048000 Text/Data
0x40000000 Shared libs/mmap
0xc0000000 Stack

Randomizations now may occur in
    Location of the stack itself
    Locations of shared libraries
    Start of the program's heap

Memory Map, ELF, And Stack Usage Of Functions

Following slides about the function call stack are taken from:
Prof. Dr. Torsten Grust.
Buffer Overflow Exploits.
Talk at the University of Konstanz
February 2004

http://www3.in.tum.de/cms/members/grust

Systems Programming
04. Pointers and Arrays

Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule For Today

So far: Learned about the memory layout of a C program

Observed correlation between storage class specifiers, sections
in ELF file and location in virtual memory
Learned about function calls and stack frames
While examining the virtual memory with the debugger,
memory addresses played an important role

Today: Learn how to make use of those memory addresses

Dynamic allocation of memory on the heap
A closer look at pointers and arrays

Dynamic Memory Allocation
Pointers and Addresses
Pointers and Function Arguments
Pointers, Arrays and Address Arithmetic
Character Pointers and C Strings
Command-line Arguments

Dynamic Memory Allocation - malloc(3)

malloc(3) and calloc(3) obtain blocks of memory

#include <stdlib.h>

void *
malloc(size_t size);

void *
calloc(size_t nmemb, size_t size);

malloc(3)
    Returns a pointer to size bytes of uninitialized storage
    NULL if the request can not be satisfied

Dynamic Memory Allocation - calloc(3)

malloc(3) and calloc(3) obtain blocks of memory

#include <stdlib.h>

void *
malloc(size_t size);

void *
calloc(size_t nmemb, size_t size);

calloc(3)
    calloc(3) returns a pointer to enough space for an array of
    nmemb objects of the specified size
    NULL if the request can not be satisfied
    The storage is initialized to zero

Extend Or Reduce Allocated Memory

realloc(3) modifies size of (previously) allocated memory

#include <stdlib.h>

void *
realloc(void *ptr, size_t size);

realloc(3)
    realloc(3) changes the size of the object pointed to by ptr
    to size bytes
    It returns a pointer to the (possibly moved) object
    Be sure to not have any more references to the old location
    The content of the old memory area is automatically moved
    to the new location (up to the smaller of the old and new sizes)
    NULL if the request can not be satisfied; the old object is
    then left unchanged
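Because realloc(3) may move the object and returns NULL on failure, its result is usually stored in a temporary pointer first. The following is a minimal sketch of that pattern; grow_ints is a hypothetical helper, and it assumes new_count is greater than zero and that new_count * sizeof(int) does not overflow.

```c
#include <stdlib.h>

/*
 * grow_ints: resize the int array *arr to hold new_count elements
 * (hypothetical helper). Returns 0 on success. On failure it
 * returns -1 and leaves *arr pointing to the old, still valid block.
 */
int
grow_ints(int **arr, size_t new_count)
{
    int *tmp;

    tmp = realloc(*arr, new_count * sizeof(int));
    if (tmp == NULL)
        return -1;      /* old block untouched, caller still owns it */
    *arr = tmp;         /* the old address must not be used any more */
    return 0;
}
```

Writing p = realloc(p, n) directly would lose the only pointer to the old block when the request fails, which is why the sketch goes through tmp.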

Alignment Of Obtained Storage

A pointer to dynamically allocated storage

void * is the proper type for a generic pointer

It is a pointer to any data type, but has to be converted
(coerced) into some other type before usage

The pointer returned by malloc(3) or calloc(3) has the
proper alignment for the object in question

It can be cast explicitly into the appropriate type

int *ip;

/* explicit cast to coerce returned void pointer */
ip = (int *) calloc(n, sizeof(int));

Freeing Allocated Memory

free(p) frees the space pointed to by p

Only storage obtained by malloc, calloc or realloc can be freed

If p is a NULL pointer, no action occurs

It is an error to use something after it has been freed

Incorrect code that frees items from a list:

for (p = head; p != NULL; p = p->next)  /* WRONG */
    free(p);

The right way is to save whatever is needed before freeing:

for (p = head; p != NULL; p = q) {
    q = p->next;
    free(p);
}
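The correct loop can be embedded in a small, complete example. The list type struct node and the helpers list_push and list_free are assumptions made for this sketch; the lecture only shows the two loops.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/*
 * list_push: put a new node in front of head (hypothetical helper).
 * Returns the new head, or NULL if malloc fails (list unchanged).
 */
struct node *
list_push(struct node *head, int value)
{
    struct node *n;

    n = malloc(sizeof(*n));
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* list_free: free every node, return how many nodes were freed */
size_t
list_free(struct node *head)
{
    struct node *p, *q;
    size_t count = 0;

    for (p = head; p != NULL; p = q) {
        q = p->next;    /* save the link before the node is freed */
        free(p);
        count++;
    }
    return count;
}
```

The WRONG variant reads p->next after free(p) has already released the node, which is exactly the use-after-free the slide warns about.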

Pointers And Addresses

A pointer is a variable that contains the address of a variable

A pointer also carries the notion of what type of data it points to

Let's try to illustrate this by observing running programs:

int
main(void)
{
    char c;
    char *p;

    c = '@';    /* '@' = 0x40 */
    p = &c;

    return (0);
}

Simplified Picture Of Memory Organization

Memory as array of consecutively numbered memory cells
    Can be manipulated individually or in contiguous groups

Typical situation: Any byte (often the shortest accessible unit)
can be a char; sizeof(char) is one by definition

Adjacent bytes may form a short, integer, long, string . . .

A pointer is a group of cells that can hold an address

p: 0xcfbe5174:  0x7b  -.   4 cells interpreted as address
           75:  0x51   |-------------------------.
           76:  0xbe   |   (little endian)       |
           77:  0xcf  -'                         |
             :  ____                             |
             :  ____                             |
             :  ____                             |
c: 0xcfbe517b:  0x40  <- 1 memory cell ('@') <---'

The Unary Address Operator: &

Unary operator & gives the address of an object

p = &c; the address of c is assigned to p

p is said to point to c

The & operator only applies to objects in memory:
variables, array elements, and functions

The memory picture above is the result of the assignment p = &c;

Pointing To Integers

Simplified Picture Of Memory Organization II

As said, the pointer is associated with its type

Let us now consider a pointer to an int

(gdb) p /x i
/* examine content of i in hex
$2 = 0xdeadbeaf
(gdb) p pi
/* print content of pi
$3 = (unsigned int *) 0xcfbe2958 /* address uint is stored
(gdb) x /4b 0xcfbe2958 /* examine 4 bytes at this address
0xcfbe2958: 0xaf 0xbe 0xad 0xde
/* little endian
(gdb) p &pi
/* print the address of the pointer to int
$4 = (unsigned int **) 0xcfbe2954
(gdb) x /4b 0xcfbe2954
/* print what is stored there
0xcfbe2954: 0x58 0x29 0xbe 0xcf
/* the address of i

int
main ( void )
{
unsigned int i ;
unsigned int * pi ; /* pointer to int */
i = 0 xdeadbeaf ;
pi = & i ; /* address of i in pointer variable */

pi :

return (0);
}

i:

0 xcfbe2954 :
55:
56:
57:
0 xcfbe2958 :
59:
5a:
5b:

0 x58 -.
0 x29 |
0 xbe | - - - -.
0 xcf -
|
|
0 xaf
<----
0 xbe
0 xad
0 xde

0xcfbe2954:
0xcfbe2958:

0xcfbe2958
0xdeadbeaf

*/
*/

222

Artificial Pointer Operations

(gdb) p /x &pi
$10 = 0xcfbe2954
(gdb) p pi
$3 = (unsigned int *) 0xcfbe2958
(gdb) p /x i
$13 = 0xdeadbeaf

The Unary Dereferencing Operator: *

int x = 1, y = 2, z[10];
int *ip;                /* ip is a pointer to int */

ip = &x;                /* ip now points to x (contains address of x) */
y = *ip;                /* y is now 1 */
*ip = 0;                /* x is now 0 */
ip = &z[0];             /* ip now points to z[0] */

Unary operator * is the indirection or dereferencing operator

Applied to a pointer, it accesses the object the pointer points to

In gdb (Artificial Pointer Operations):

(gdb) p /x *pi          /* print (in hex) the object pointed at */
$9 = 0xdeadbeaf
(gdb) p /x *&i          /* address of an int, dereferenced again */
$12 = 0xdeadbeaf
(gdb) p /x *i           /* i is not a pointer; its value is taken as an address */
Cannot access memory at address 0xdeadbeaf

Declaration Of A Pointer

The declaration of the pointer ip is intended as a mnemonic

int *ip;

It says that the expression *ip is an int

The syntax of the declaration for a variable mimics the syntax of expressions in which the variable might appear

This reasoning applies to function declarations as well:

double *dp, atof(char *);

It says that in an expression *dp and atof(s) have values of type double, and the argument of atof is a pointer to char

Pointers Reference A Specific Data Type

A pointer is constrained to point to a particular kind of object

Exception: Pointer to void is used to hold any type of pointer
  Can not be dereferenced

Assume ip points to the integer x

Then *ip can occur in any context where x could:

1  int *ip;
2  int x = 0;
3
4  ip = &x;
5  *ip = *ip + 10;

Line 5 increments *ip (and therefore x) by 10

Binding Of Unary Operators

Unary operator * and & bind more tightly than arithmetic ones

y = *ip + 1;

  takes whatever ip points at
  adds 1, and assigns the result to y

*ip += 1;
++*ip;
(*ip)++;

  Increment what ip points to
  The parentheses are necessary in this last example.
  Without them, the expression would increment ip instead of what it points to. Unary operators like * and ++ associate right to left.

Precedence And Associativity Of Operators

Operators                             Associativity
( ) [ ] -> .                          left to right
! ~ ++ -- + - * & (type) sizeof       right to left
* / %                                 left to right
+ -                                   left to right
<< >>                                 left to right
< <= > >=                             left to right
== !=                                 left to right
&                                     left to right
^                                     left to right
|                                     left to right
&&                                    left to right
||                                    left to right
?:                                    right to left
= += -= *= /= %= &= ^= |= <<= >>=     right to left
,                                     left to right

Table: Unary +, -, and * have higher precedence than the binary forms
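The effect of the parentheses can be observed directly. The two functions below are an illustrative sketch with hypothetical names: each works on a local array {10, 20}, stores the final value of a[0] through first, and returns how many elements ip has moved.

```c
/* with_parens: (*ip)++ increments the int that ip points to */
long
with_parens(int *first)
{
        int a[2] = {10, 20};
        int *ip = a;

        (*ip)++;                /* a[0] becomes 11, ip is unchanged */
        *first = a[0];
        return (long)(ip - a);
}

/* without_parens: *ip++ is parsed as *(ip++) */
long
without_parens(int *first)
{
        int a[2] = {10, 20};
        int *ip = a;

        (void)*ip++;            /* fetches a[0], then advances ip */
        *first = a[0];
        return (long)(ip - a);
}
```

with_parens leaves ip in place and changes a[0]; without_parens leaves a[0] alone and moves ip by one element.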

Pointers Are Variables

Pointers are variables and can be used without dereferencing

iq = ip;

  Copies the contents of ip into iq
  Makes iq point to whatever ip pointed to

Dynamic Memory Allocation
Pointers and Addresses
Pointers and Function Arguments
Pointers, Arrays and Address Arithmetic
Character Pointers and C Strings
Command-line Arguments

Passing Function Arguments By Value

In C arguments to functions are passed by value

Called function can not alter a variable in the calling function

void swap(int, int);

void
swap(int x, int y)      /* WRONG */
{
        int tmp;

        tmp = x;
        x = y;
        y = tmp;
}

The function only swaps copies of a and b

Passing Pointers To Functions

Pointer arguments enable a function to access and change objects in the function that called it

void swap(int *, int *);

void
swap(int *px, int *py)
{
        int tmp;

        tmp = *px;
        *px = *py;
        *py = tmp;
}

swap:    px: ---.
         py: ---+--.
                |  |
caller:   a: <--'  |
          b: <-----'

Function getint(): Get Integer from Input

Problem Statement:
  Convert stream of characters into integer values

$ ./getint
foo -01 bar +1 baz      <- input
-1 (0xffffffff)         <- output
10 (0x0a)               <- output

getint has to:
  return the values it found
  signal end of file (EOF) when there is no more input

Separate Paths Back To Caller

No matter what value is used for EOF
  It could also be the value of an input integer

These values have to be passed back by separate paths

Selected Approach (25):
  Let getint return EOF as its function value
  Use pointer argument to store back the converted integer

(25) In fact, this is an often used approach in C

Usage Of getint()

Fill an array with integers by calls to getint

int n, array[SIZE], getint(int *);

for (n = 0; n < SIZE && getint(&array[n]) != EOF; n++)
        ;

  Each call sets array[n] to the next integer found in input
  The loop also increments n
  It is essential to pass the address of array[n] to getint
  This communicates the converted integer back to the caller

The following version of getint returns EOF at end of file, zero if the next input is not a number, and a positive value for a valid number

#include <ctype.h>
#include <stdio.h>      /* EOF */

int getch(void);
int ungetch(int);

/* getint: get next integer from input into *pn */
int
getint(int *pn)
{
        int c, sign;

        while (isspace(c = getch()))    /* skip white space */
                ;
        if (!isdigit(c) && c != EOF && c != '+' && c != '-') {
                ungetch(c);             /* it's not a number */
                return (0);
        }
        sign = (c == '-') ? -1 : 1;
        if (c == '+' || c == '-')
                c = getch();
        for (*pn = 0; isdigit(c); c = getch())  /* getch, so pushed-back input is seen */
                *pn = 10 * *pn + (c - '0');
        *pn *= sign;
        if (c != EOF)
                ungetch(c);
        return c;
}
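getint relies on getch and ungetch, which this section only declares. One possible way to provide them is a small push-back buffer in front of getchar. The following is a sketch under assumptions: BUFSIZE and the buffer layout are chosen for illustration, and the return value of ungetch (the character, or EOF when the buffer is full) is one convention among several.

```c
#include <stdio.h>

#define BUFSIZE 100             /* assumed capacity of the push-back buffer */

static char buf[BUFSIZE];       /* characters pushed back by ungetch */
static int bufp = 0;            /* next free position in buf */

/* getch: get a (possibly pushed-back) character */
int
getch(void)
{
        return (bufp > 0) ? buf[--bufp] : getchar();
}

/* ungetch: push character back on input; EOF itself must not be pushed */
int
ungetch(int c)
{
        if (bufp >= BUFSIZE)
                return EOF;     /* buffer full */
        buf[bufp++] = (char)c;
        return c;
}
```

Characters come back in reverse order of pushing, which is sufficient for getint because it pushes back at most one character per call.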

Dynamic Memory Allocation
Pointers and Addresses
Pointers and Function Arguments
Pointers, Arrays and Address Arithmetic
Character Pointers and C Strings
Command-line Arguments

Pointers And Arrays

There is a strong relationship between pointers and arrays

Any operation achieved by array subscripting . . .

. . . can also be done with pointers

int a[10];      /* Define an array a of size 10 */
int *pa;        /* Pointer to an integer */
int x;

a:  a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9]

Pointer Into Array

pa = &a[0];     /* Set pa to point to element zero of a */

    pa
    |
    v
a:  a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9]

Assignment pa = &a[0];
  Sets pa to point to element zero of a
  pa contains the address of a[0]

Pointer Into Array (cont.)

x = *pa;        /* Copy the contents of a[0] into x */

    pa
    |
    v
a:  a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9]

Assignment x = *pa;
  Copies the content of a[0] into x

Adding 1 To A Pointer

If pa points to a particular element of an array

Then, by definition, pa+1 points to the next element

    pa   pa+1
    |    |
    v    v
a:  a[0] a[1] a[2] a[3] a[4]

In general:
  pa+i points i elements after pa
  pa-i points i elements before

Adding i To A Pointer

Thus, if pa points to a[0]

Then *(pa+2) refers to the contents of a[2]

    pa   pa+1 pa+2
    |    |    |
    v    v    v
a:  a[0] a[1] a[2] a[3] a[4]

In general:
  pa+i is the address of a[i]
  *(pa+i) is the contents of a[i]

Adding A Pointer And An Integer

A pointer and an integer may be added (or subtracted)

The construction p + n means the address of the n-th object beyond the one p currently points to

n is scaled according to the size of the object p points to (which is determined by the declaration of p)

If an int is four bytes, for example, n is scaled by four

Array-and-Index And Pointer-and-Offset

Very close correspondence between indexing and pointer arithmetic

Equivalence of array-and-index and pointer-and-offset expr.:
  Holds regardless of the type or size of the variables in the array
  a[i] is converted to *(a+i) immediately (in evaluation)
  Applying the & address operator to both sides of equivalence
  If pa is a pointer, expressions may use it with a subscript

a[i]      *(a+i)
&a[i]     a+i
pa[i]     *(pa+i)
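The scaling of n and the equivalence of a[i] and *(a+i) can be made visible with two small functions. This is an illustrative sketch; byte_distance and element_at are hypothetical names, and the caller has to make sure that p + n stays within (or one past) the array p points into.

```c
#include <stddef.h>

/* byte_distance: number of bytes between p and p + n */
size_t
byte_distance(const int *p, size_t n)
{
        return (size_t)((const char *)(p + n) - (const char *)p);
}

/* element_at: a[i] spelled as pointer-and-offset */
int
element_at(const int *a, size_t i)
{
        return *(a + i);
}
```

For any int array a, byte_distance(a, n) equals n * sizeof(int), whatever sizeof(int) is on the machine at hand.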

Indexing Backwards

With pointers into arrays we can use pointer arithmetic to access nearby cells of the array

If we are sure that an element exists, it is also possible to index backwards in an array: p[-1], p[-2] and so on

This refers to objects before what p points to

Illegal to refer to objects that are not within the array bounds

p[-1]     *(p + -1)     *(p - 1)

Figure: Expression p[-1] converted to the pointer form

Example: Scaling According To Type

int a[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
char a[] = "abcdefghij";

         a[i]  a+i          (a+i+1) - (a+i), addresses   (a+i+1) - (a+i), result
int a:   0     0xcfbf2a44   0xcfbf2a48 - 0xcfbf2a44      1
         1     0xcfbf2a48   0xcfbf2a4c - 0xcfbf2a48      1
         2     0xcfbf2a4c   0xcfbf2a50 - 0xcfbf2a4c      1
         ...   ...          ...                          ...
char a:  a     0xcfbf2a74   0xcfbf2a75 - 0xcfbf2a74      1
         b     0xcfbf2a75   0xcfbf2a76 - 0xcfbf2a75      1
         c     0xcfbf2a76   0xcfbf2a77 - 0xcfbf2a76      1
         ...   ...          ...                          ...

Difference Between Pointer And Array Name

One difference between an array name and a pointer:
  A pointer is a variable
  An array name is not a variable

As such:

int *pa;
int a[3];
pa = a;         /* legal */
pa++;           /* legal */
/* a = pa;      illegal */
/* a++;         illegal */

*a = 1;
*(a + 1) = 2;
*(a + 1) = 3;
*(a + 2) = 4;

printf("%d, %d, %d\n", a[0], a[1], a[2]);

Arrays In C

The value of a variable or expression of type array is the address of element zero of the array

In most languages, the value of an array is the entire array

If an array appears on the right-hand side of an assignment, the entire array is assigned, and the left-hand side had better be an array, too

C does not work this way

C never lets you manipulate entire arrays

Array Name As Address Of First Element

Definition:
The value of a variable (or expression) of type array is the address of element zero of the array
  The value of an array, when it appears in an expression, is a pointer to its first element
  Shorter: The value of the array a simply is &a[0]

/* equivalent assignments / rhs expressions */
pa = &a[0];
pa = a;

One can often read that, when an array appears in an expression, it decays into a pointer to its first element

Passing An Array To A Function

When an array name is passed to a function
  what is passed is the location of the initial element
  what is passed is a pointer
  what is passed is a variable containing an address

As a consequence, within the called function, this argument is a local variable

An Array As Formal Argument

Function definition:
As formal parameters char s[] and char *s are equivalent
  The latter may be preferred, because it says more explicitly that the parameter is a pointer
  When an array name is passed to a function, the function can at its convenience believe that it has been handed either an array or a pointer, and manipulate it accordingly
  It can even use both notations if it seems appropriate and clear

Passing Parts Of An Array To A Function

It is possible to pass part of an array to a function, by passing a pointer to the beginning of the subarray:

f(&a[2]);     f(a+2);

Figure: Pass address of subarray that starts at a[2] to the function f

Accordingly, within f, the parameter declaration can read:

f(int arr[]) { ... }
or
f(int *arr) { ... }

So as far as f is concerned, the fact that the parameter refers to part of a larger array is of no consequence

Computing String Length Using A Pointer

/* strlen: return the length of string s */
int
strlen(char *s)
{
        int n;

        for (n = 0; *s != '\0'; s++)
                n++;

        return n;
}

  Since s is a pointer, incrementing it is perfectly legal
  s++ has no effect on the character string in the caller function
  It merely increments strlen's private copy of the pointer

Legal Calls To strlen?

Given the last two slides, which calls will work?

strlen("hello, world");         /* string constant */
strlen(array);                  /* char array[100]; */
strlen(ptr);                    /* char *ptr; */

Correct Statements?

Left block:

int a[10];
int *pa;
pa = a;                 /* OK */
*pa = 0;                /* OK */
*(pa+1) = 1;            /* OK */
pa[2] = 2;              /* OK */
pa = &a[5];             /* OK */
*pa = 5;                /* OK */
*(pa-1) = 4;            /* OK */
pa[1] = 6;              /* OK */
pa = &a[9];             /* OK */
*pa = 9;                /* OK */
pa[-1] = 8;             /* OK */

Right block:

*(pa+10) = 0;           /* WRONG */
*(pa-1) = 0;            /* WRONG */

pa = &a[5];             /* OK */
*(pa+10) = 0;           /* WRONG */
pa = &a[10];            /* OK */
*pa = 0;                /* WRONG */

int *pa2;
pa = &a[5];             /* OK */
pa2 = pa + 10;          /* WRONG */
pa2 = pa - 10;          /* WRONG */

Explanation

Left block
  Statements set pointer pa to various cells of the array a
  Some of those cells are modified by indirecting pa
  You may verify that each cell of a that receives a value receives the value of its own index (i.e., a[6] is set to 6)

Right block
  The statements marked WRONG in the right column are all invalid.
  The first examples set pa to point into the array a but then use overly-large offsets (+10, -1) which end up trying to store a value outside of the array a
  The statements in the last set of examples set pa2 to point outside of the array a.
  Even though no attempt is made to access the nonexistent cells, these statements are illegal, too.

Two More Statements

int a[10];
int *pa, *pa2;

pa = &a[5];
pa2 = pa + 10;          /* WRONG */
*pa2 = 0;               /* WRONG */

  Computes a pointer to the nonexistent cell a[15]
  Also tries to store something there

The NULL pointer

Pointers and integers are not interchangeable

Zero is the sole exception

The constant zero may be assigned to a pointer, and a pointer may be compared with the constant zero

The symbolic constant NULL is often used in place of zero

It is a mnemonic reminder to indicate more clearly that this is a special value for a pointer

NULL is defined in <stdio.h> (and in several other standard headers); the exact definition is up to the implementation, for example:

#define NULL 0L
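A typical use of NULL is as the "nothing found" result of a function that otherwise returns a pointer. The following sketch illustrates this; find_char is a hypothetical helper written for this example and only loosely resembles the library function strchr.

```c
#include <stddef.h>

/* find_char: return a pointer to the first occurrence of c in s,
 * or NULL if c does not occur before the terminating '\0' */
const char *
find_char(const char *s, char c)
{
        for (; *s != '\0'; s++)
                if (*s == c)
                        return s;
        return NULL;
}
```

The caller compares the result with NULL before dereferencing it, for example: if ((p = find_char(line, ':')) != NULL) ...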

Comparison Of Pointers

Pointers may be compared under certain circumstances

If p and q point to members of the same array

Then relations like ==, !=, <, >=, etc. work properly

p < q, for instance, is true if p points to an earlier member of the array than q does

The behaviour is undefined for arithmetic or comparisons with pointers that do not point to members of the same array

There is one exception: The address of the first element past the end of an array can be used in pointer arithmetic

Pointer Subtraction
Pointer subtraction to determine string length

If p and q point to elements of the same array and p < q

Then q-p+1 is the number of elements from p to q inclusive

This fact can be used to write yet another version of strlen

/* strlen: return length of string s */
int
strlen(char *s)
{
        char *p = s;

        while (*p != '\0')
                p++;
        return p - s;
}
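The exception for the address one past the end of an array, mentioned under Comparison Of Pointers, is what makes the common "walk until end" loop legal. A minimal sketch, with sum_ints as a hypothetical example function:

```c
/* sum_ints: add up the n ints starting at a */
int
sum_ints(const int *a, int n)
{
        const int *end = a + n; /* one past the last element: may be
                                 * computed and compared, never dereferenced */
        const int *p;
        int sum = 0;

        for (p = a; p < end; p++)
                sum += *p;
        return sum;
}
```

The comparison p < end is well defined because both pointers refer to the same array (end to the position just behind it).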

Computing strlen Using A Pointer Subtraction

/* strlen: return length of string s */
int
strlen(char *s)
{
        char *p = s;

        while (*p != '\0')
                p++;
        return p - s;
}

  In its declaration, p is initialized to s
  That is, to point to the first character of the string
  while loop: examine each char until '\0' is seen
  p points to characters, so p++ advances p to the next char
  Finally, p - s gives the number of characters advanced over, i.e., the string length

Valid Pointer Operations

Legal pointer operations summarized
  Assignment of pointers of the same type
  Adding or subtracting a pointer and an integer
  Subtracting or comparing two pointers to members of the same array
  Assigning or comparing to zero

All other pointer arithmetic is illegal

Illegal pointer operations
  Not legal to add, multiply, divide, shift, or mask two pointers
  Add float or double to pointers
  Assign a pointer of one type to a pointer of another type without cast (Exception is void *)

Dynamic Memory Allocation
Pointers and Addresses
Pointers and Function Arguments
Pointers, Arrays and Address Arithmetic
Character Pointers and C Strings
Command-line Arguments

String Constants, String Literals

A string constant or string literal
  written as "I am a string" is an array of characters
  is (automatically) terminated with the null character '\0'
  occupies one more cell in storage than the number of characters between the double quotes

Accessing string constants
  Most often string constants appear as arguments to functions
  For example: printf("Hello World!\n");
  Access to the constant is provided through a character pointer
  printf receives a pointer to the beginning of the char array
  A string constant is accessed by a pointer to its first element

Character Pointers And C Strings

Text strings are represented by arrays of characters

Since arrays are very often manipulated via pointers, character pointers are probably the most common pointers in C

Good to remember:
C does not provide any operators for processing an entire string of characters as a unit

What are we doing here?

char *pmessage;
pmessage = "now is the time";
pmessage = "hello, world";

  Assigning two pointers, not copying two entire strings

Character Pointers & Character Arrays Differ

What is the difference between these two?

char *pmessage = "now is the time";
char amessage[] = "now is the time";

Basic illustration of how pointer and array differ:

pmessage: [ * ]---> now is the time\0

amessage: now is the time\0

Different Ways String Literals Are Used

char amessage[] = "now is the time";

  String literal used as initializer for the array amessage
  amessage is an array of 16 characters
  We may overwrite them at a later point: they are writable

char *pmessage = "now is the time";

  String literal is used to create a little block of characters somewhere in memory
  Pointer pmessage is initialized to point to it
  We may reassign pmessage later to point somewhere else
  As long as it points to the string literal, we can not modify the characters it points to

Exemplify Different Usage Of String Literals

char amessage[] = "now is the time";
char *pmessage = "now is the time";

amessage[0] = 'N';

  Perfectly right: "Now is the time"

pmessage[0] = 'N';

  Equivalent to *pmessage = 'N'
  Would not necessarily work, in fact, it is not allowed
  Why?

Exemplify Different Usage Of String Literals

char *pmessage = "now is the time";
pmessage[0] = 'N';

  Compiler might have placed the little block of characters in read-only memory

Consider the usage of

char *pmessage = "now is the time";
char *qmessage = "now is the time";

  Compiler might have used the same little block of memory to initialize both pointers
  We wouldn't want a change to one to alter the other

String Copy
Direct consequence of not treating a string as a unit

We need functions to copy string t to string s or to compare them

Would be nice to say s = t

However, this will only copy pointers, not the characters

To copy the characters we need a loop

void
strcpy(char s[], char t[])
{
        int i;

        for (i = 0; t[i] != '\0'; i++)
                s[i] = t[i];
        s[i] = '\0';
}
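Taken together, the two points above suggest how to change text that starts out as a string literal: copy the characters into a writable array first and modify the copy. A minimal sketch; capitalized is a hypothetical helper, and it uses strncpy and toupper from the standard library rather than the strcpy defined on the slide.

```c
#include <ctype.h>
#include <string.h>

/* capitalized: copy at most n-1 characters of src into the writable
 * buffer dst (of size n), terminate it, and upper-case the first letter */
char *
capitalized(char *dst, size_t n, const char *src)
{
        if (n == 0)
                return dst;
        strncpy(dst, src, n - 1);
        dst[n - 1] = '\0';
        dst[0] = (char)toupper((unsigned char)dst[0]);
        return dst;
}
```

Called as capitalized(buf, sizeof buf, "now is the time") with char buf[16], the literal itself is never written to; only buf changes.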

String Copy Using Pointers

Pointer Version of strcpy()

void
strcpy(char *s, char *t)
{
        while (*t != '\0')
                *s++ = *t++;
        *s = '\0';
}

  The value *t++ is the character that t pointed to before t was incremented
  The postfix ++ doesn't change t until after this character has been fetched

Incidental Remark On Standard Idioms

Expressions like *p++ and *--p may seem cryptic at first sight

*--p decrements p before fetching the character p points to

Analogous to array subscript exp. like a[i++] and a[--i]

Standard idioms for pushing and popping a stack

*p++ = val;     /* push val onto stack */
val = *--p;     /* pop top of stack into val */
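The two idioms become a complete, if tiny, stack once bounds checks are added. The following is a sketch under assumptions: STACK_MAX, the static array, and the return convention (0 on success, -1 on overflow or underflow) are chosen for this example only.

```c
#define STACK_MAX 16                    /* assumed capacity */

static int stack[STACK_MAX];
static int *sp = stack;                 /* points to the next free slot */

/* push: store val on top of the stack; -1 if the stack is full */
int
push(int val)
{
        if (sp == stack + STACK_MAX)
                return -1;
        *sp++ = val;                    /* store, then advance */
        return 0;
}

/* pop: remove the top element into *val; -1 if the stack is empty */
int
pop(int *val)
{
        if (sp == stack)
                return -1;
        *val = *--sp;                   /* step back, then fetch */
        return 0;
}
```

Note that stack + STACK_MAX is the one-past-the-end address of the array, which may be compared against but is never dereferenced here.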

Comparing Strings

Comparison of strings is analogous to copying

We cannot assign one string to another using =

Same holds for the comparison of two strings using ==
  What we would compare with == is the two pointers
  If the pointers are equal, they point to the same place, so they certainly point to the same string
  But if we have two strings in two different parts of memory, pointers to them will always compare different even if the strings pointed to contain identical sequences of characters

Source Code: String Comparison

strcmp(char *s, char *t)

  It compares the character strings s and t, and returns negative, zero or positive if s is lexicographically less than, equal to, or greater than t
  Greater than and less than are interpreted based on the relative values of the characters in the machine's character set
  This means that 'a' < 'b'
  Also (at least with ASCII character set): 'B' < 'a'
  In other words, capital letters will sort before lower-case letters
  The positive or negative number returned is the difference between the values of the first two characters that differ

Source Code: String Comparison

/* strcmp: return <0 if s<t, 0 if s==t, >0 if s>t */
int
strcmp(char *s, char *t)
{
        int i;

        for (i = 0; s[i] == t[i]; i++)
                if (s[i] == '\0')
                        return (0);
        return s[i] - t[i];
}

/* strcmp as pointer version */
int
strcmp(char *s, char *t)
{
        for (; *s == *t; s++, t++)
                if (*s == '\0')
                        return (0);
        return *s - *t;
}

Intermediate Summary

Character pointer derived from string literal

char *pmessage = "now is the time";

  is usable (i.e., readable)
  but not writable
  that is, the characters pointed to are not writable

Destination string has to be a writable array with enough space
  for the number of characters in the string we are copying
  plus one for the terminating '\0'

What About Those Code Snippets?

char *p1 = "Hello, world!";
char *p2;
strcpy(p2, p1);

Wrong. p2 doesn't point anywhere

char *p = "Hello, world!";
char a[13];
strcpy(a, p);

Wrong. Array a is writable, but there is not enough space for '\0'

char *p3 = "Hello, world!";
char *p4 = "A string to overwrite";
strcpy(p4, p3);

Wrong. p4 points to memory not allowed to be overwritten

Valid Code Snippet & string.h

A correct example would be:

char *p = "Hello, world!";
char a[14];
strcpy(a, p);

Another option is to obtain some memory for the string copy
  Dynamic memory allocation for destination string

String manipulation with the standard library

The header <string.h> contains declarations for the functions mentioned in this section, plus a variety of other string-handling functions from the standard library.
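The option of obtaining memory for the copy could look roughly as follows. This is a sketch, not the lecture's code: copy_string is a hypothetical name, and the caller is assumed to release the result with free.

```c
#include <stdlib.h>
#include <string.h>

/* copy_string: return a freshly allocated copy of src,
 * or NULL if no memory is available; the caller frees the result */
char *
copy_string(const char *src)
{
        char *dst = malloc(strlen(src) + 1);    /* +1 for the '\0' */

        if (dst != NULL)
                strcpy(dst, src);
        return dst;
}
```

Unlike the fixed-size array a[14], the destination here always has exactly the size the source string requires.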

Some More Versions Of strcpy(s, t)

Coding Style
  In the following we revisit the strcpy function
  Three rather compressed versions are shown

/* strcpy: copy t to s; array subscript version */
void
strcpy(char *s, char *t)
{
        int i;

        i = 0;
        while ((s[i] = t[i]) != '\0')
                i++;
}

strcpy(s, t): Pointer Version 1/3

/* strcpy: copy t to s; pointer version 1 */
void
strcpy(char *s, char *t)
{
        while ((*s = *t) != '\0') {
                s++;
                t++;
        }
}

  Because arguments are passed by value, strcpy can use the parameters s and t in any way it pleases
  Here they are conveniently initialized pointers, which are marched along the arrays one character at a time . . .
  . . . until the '\0' that terminates t has been copied to s

strcpy(s, t): Pointer Version 2/3

/* strcpy: copy t to s; pointer version 2 */
void
strcpy(char *s, char *t)
{
        while ((*s++ = *t++) != '\0')
                ;
}

  Increment of s and t moved into the test part of the loop
  *t++ is the char that t pointed to before t was incremented
  postfix ++ doesn't change t until the char has been fetched
  It is stored into the old s position before s is incremented
  It is also the value that is compared against '\0'
  Effect: chars are copied from t to s, up to and including '\0'

strcpy(s, t): Pointer Version 3/3

/* strcpy: copy t to s; pointer version 3 */
void
strcpy(char *s, char *t)
{
        while (*s++ = *t++)
                ;
}

  We can observe that a comparison against '\0' is redundant
  The question is merely whether the expression is zero
  Idioms like this are frequently used in C programs
  What do you think about these abbreviations?

Dynamic Memory Allocation
Pointers and Addresses
Pointers and Function Arguments
Pointers, Arrays and Address Arithmetic
Character Pointers and C Strings
Command-line Arguments

Program Startup

Prototype for the main function
  The function called at program startup is named main
  It shall be defined with a return type of int
  And either with no parameters:

int main(void);

  Or with two parameters:

int main(int argc, char *argv[]);

Constraints For main Function

Terminology (though any names may be used)
  argc stands for argument count
  argv stands for argument vector

If argc and argv are declared
  The value of argc shall be nonnegative
  argv[0] represents the program name or argv[0][0] shall be the null character if the program name is not available
  argv[1] to argv[argc-1] represent program parameters
  argv[argc] shall be a NULL pointer

Command-line Arguments

When a program is executed, the process that does the exec can pass command-line arguments to the new program

Normal operation for UNIX system shells

argv: ---> [0] ---> echo\0
           [1] ---> hello,\0
           [2] ---> world\0
           [3]      NULL

Environment

Each program is also passed an environment list

Like the argument list, it is an array of character pointers

Each pointing to a null-terminated C string

The address of the array is contained in a global variable:

extern char **environ;

environ: ---> [ ] ---> HOME=/home/holu\0
              [ ] ---> SHELL=/bin/ksh\0
              [ ] ---> PS1=\w \$\0
              NULL

Environment (cont.)

Terminology
  environ is called environment pointer
  The array of pointers is the environment list
  The strings they point to are the environment strings
  By convention, name=value strings are used

Historical third argument to main (not ISO C)

int main(int argc, char *argv[], char *envp[]);

  Most UNIX systems have provided a third argument to main
  ISO C specifies main with two arguments
  Posix.1 specifies environ to be used instead of 3rd arg
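Walking the environment list works exactly like walking argv up to its terminating NULL pointer. The sketch below assumes a POSIX system that provides environ; env_count and env_lookup are hypothetical helpers written for illustration (a real program would normally call the library function getenv instead).

```c
#include <string.h>

extern char **environ;          /* POSIX, not part of ISO C */

/* env_count: number of strings in the environment list */
int
env_count(void)
{
        char **ep;
        int n = 0;

        for (ep = environ; ep != NULL && *ep != NULL; ep++)
                n++;
        return n;
}

/* env_lookup: return the value part of "name=value", or NULL */
const char *
env_lookup(const char *name)
{
        size_t len = strlen(name);
        char **ep;

        for (ep = environ; ep != NULL && *ep != NULL; ep++)
                if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
                        return *ep + len + 1;
        return NULL;
}
```

With the environment shown in the figure, env_lookup("SHELL") would return a pointer to the characters /bin/ksh inside the environment string.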

Echo Command-line Arguments

#include <stdio.h>

int
main(int argc, char *argv[])
{
        int i;

        /* for (i = 0; argv[i] != NULL; i++) */
        for (i = 0; i < argc; i++)
                printf("argv[%d]: %s\n", i, argv[i]);

        return (0);
}

$ ./a.out -a 1 -arg2 --arg3
argv[0]: ./a.out
argv[1]: -a
argv[2]: 1
argv[3]: -arg2
argv[4]: --arg3

Systems Programming
05. Structures & Trees
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule For Today

Today: Finish introduction to the C programming language
  Structures, Unions, Enumerations, Typedefs
  Pointers to Function and Function Callbacks
  Putting it all together by using an external library

Basics of Structures
Self-referential Structures
Unions
Enumerations
Typedef
Pointers to Functions
Function Callbacks
The libxml2 library

Structures

A structure is a collection of one or more variables
  possibly of different types
  grouped together under a single name for convenient handling

A structure
  organizes complicated data, particularly in large programs
  permits a group of related variables to be treated as a unit

Structure Declaration (Example)

Declare a structure:

struct point {
        int x;
        int y;
};

Some variables of that type:

struct point here, there;

Combination of the upper two:

struct point {
        int x;
        int y;
} here, there;

Structure Declaration

Keyword struct introduces a structure declaration:

struct structure tag {
        /* list of member declarations */
        type name;
        type name;
} list of variable declarations;

Structure declaration has four parts:
  keyword struct
  structure tag (optional)
  brace-enclosed list of declarations for the members (optional)
  list of variables of the new structure type (optional)

Struct Declaration Defines A Type

A struct declaration defines a type

Terminating right brace may be followed by a list of variables:

struct { ... } x, y, z;

This is syntactically analogous to:

int x, y, z;

Each statement declares x, y, and z
  to be variables of the named type
  and causes space to be set aside for them

A structure declaration not followed by a list of variables
  reserves no storage
  merely describes a template or the shape of a structure

Structure Tag

Tagged structure
  A previously established structure tag can be used subsequently as a shorthand for the part of the declaration in braces:

struct point pt;        /* Structure tag as a shorthand */

Anonymous structure

struct {
        int i;
        int j;
} a;

Operations On And Initialization Of Structures

Legal operations
  copying it
  assigning to it as a unit
  taking its address with &
  accessing its members

Illegal operation
  Structures may not be compared

Initialization
  A list of constant member values initializes a structure

struct point maxpt = { 320, 200 };

  An automatic structure may also be initialized
    by assignment
    by calling a function returning a struct of apt type

Structures And Functions

There is nothing special about structures and functions
  Pass/return components separately
  Pass/return entire structure
  Pass/return a pointer to structure

If a large structure is to be passed to a function a pointer may be the better choice (pass by value copies the whole structure).

Structure pointers are just like pointers to ordinary variables:

struct point *pp;               /* must point to a struct point before use */

*pp = makepoint(1,3);           /* *pp is the structure */
(*pp).x += 2;                   /* (*pp).x is a member */

Structures And Functions Example

/* makepoint: make a point from x and y components */
struct point
makepoint(int x, int y)
{
        struct point tmp;

        tmp.x = x;
        tmp.y = y;
        return tmp;
}

struct point p1;
struct point p2;

p1 = makepoint(0,0);
p2 = makepoint(XMAX, YMAX);

The Structure Member Operators . And ->

Structure operator . connects structure- and member name

A member of a particular structure is referred to in an expression by structure-name.member

printf("%d,%d", pt.x, pt.y);

Structure operator -> as shorthand

If ps is a pointer to a structure with member m, then

(*ps).m       ps->m

are equivalent by definition.

Nested Structures

Structures can be nested

struct rect {
        struct point pt1;
        struct point pt2;
};

struct rect screen;

screen.pt1.x;

  The rect structure contains two point structures
  screen.pt1.x is the x coord. of the pt1 member of screen
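Both member operators and nesting can be combined when a structure is handed to a function by pointer. The following is an illustrative sketch that reuses struct point and struct rect from the slides; rect_width and move_point are hypothetical helpers.

```c
struct point {
        int x;
        int y;
};

struct rect {
        struct point pt1;
        struct point pt2;
};

/* rect_width: horizontal extent, read through a pointer with -> */
int
rect_width(const struct rect *r)
{
        return r->pt2.x - r->pt1.x;
}

/* move_point: shift the point in place; -> and (*p). are interchangeable */
void
move_point(struct point *p, int dx, int dy)
{
        p->x += dx;
        (*p).y += dy;
}
```

A call such as move_point(&screen.pt1, 20, 5) changes the caller's structure, because only the address of the nested point is passed, not a copy.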

Basics of Structures
Self-referential Structures
Unions
Enumerations
Typedef
Pointers to Functions
Function Callbacks
The libxml2 library

Self-referential Structures: Binary Tree

Data structure: Binary Tree (to store words lexicographically)

One node per distinct word

Each node contains:
  a pointer to the text of the word
  a count of the number of occurrences
  a pointer to the left child node
  a pointer to the right child node

This reads in C:

struct tnode {
        char *word;
        int count;
        struct tnode *left;
        struct tnode *right;
};

Lexicographic Order In Binary Tree

Each node has either zero, one or two children

Given a node and its word
  Left subtree: All words are lexicographically less than word
  Right subtree: All words are lexicographically greater than word

Consider the following input sentence:

now is the time for all good men to come to the aid of their party

                  now
                /     \
             is         the
            /  \       /   \
         for    men  of     time
        /   \          \    /   \
     all    good    party their  to
    /   \
  aid   come

Output And Tree View Of bintree.c

now is the time for all good men to come to the aid of their party

   1 aid
   1 all
   1 come
   1 for
   1 good
   1 is
   1 men
   1 now
   1 of
   1 party
   2 the
   1 their
   1 time
   2 to

Source Code Binary Tree

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORD 100

struct tnode {
        char *word;
        int count;
        struct tnode *left;
        struct tnode *right;
};

struct tnode *addtree(struct tnode *, char *);
struct tnode *talloc(void);
char *strdupl(char *);
void treeprint(struct tnode *);
int getword(char *, int);

int
main(void)
{
        struct tnode *root;
        char word[MAXWORD];

        root = NULL;
        while (getword(word, MAXWORD) != EOF)
                if (isalpha(word[0]))
                        root = addtree(root, word);
        treeprint(root);
        return (0);
}

/* add a node with w, at or below p */
struct tnode *
addtree(struct tnode *p, char *w)
{
        int cond;

        if (p == NULL) {                /* new word arrived */
                p = talloc();           /* make a new node */
                p->word = strdupl(w);
                p->count = 1;
                p->left = p->right = NULL;
        } else if ((cond = strcmp(w, p->word)) == 0)
                p->count++;             /* repeated word */
        else if (cond < 0)              /* less than -> left */
                p->left = addtree(p->left, w);
        else
                p->right = addtree(p->right, w);
        return (p);
}

/* treeprint: in-order print of tree p */
void
treeprint(struct tnode *p)
{
        if (p != NULL) {
                treeprint(p->left);
                printf("%4d %s\n", p->count, p->word);
                treeprint(p->right);
        }
}

/* talloc: make a tree node */
struct tnode *
talloc(void)
{
        struct tnode *tn;

        tn = (struct tnode *)malloc(sizeof(struct tnode));
        return tn;
}

/* make a duplicate of s */
char *
strdupl(char *s)
{
        char *p;

        p = (char *)malloc(strlen(s) + 1);
        if (p != NULL)
                strcpy(p, s);
        return p;
}

/* get single word from input */
int
getword(char *word, int max)
{
        int c, i;

        i = 0;
        while ((c = getchar()) != EOF && i < max - 1
            && c != ' ' && c != '\n' && c != '\t')
                word[i++] = c;
        word[i] = '\0';
        /* report EOF only when no characters were read, so that a
         * last word without trailing newline is not lost */
        return (c == EOF && i == 0) ? EOF : i;
}

Unions

A union is a variable that may hold (at different times) objects of different types and sizes.
Unions provide a way to manipulate different kinds of data in a single area of storage.
The syntax is based on structures:

union u_tag {
    int ival;
    float fval;
    char *sval;
} u;

The variable u will be large enough to hold the largest of the three types (the specific size is implementation-dependent).
It is the programmer's responsibility to keep track of which type is currently stored in a union.

Unions (cont.)

Unions may occur within structures and arrays, and vice versa.
Notation for accessing a member of a union in a structure (or vice versa) is identical to that for nested structures.

struct {
    char *name;
    int flags;
    int utype;
    union {
        int ival;
        float fval;
        char *sval;
    } u;
} symtab[NSYM];

symtab[i].u.ival
*symtab[i].u.sval       /* first char of string sval */
symtab[i].u.sval[0]     /* ditto */
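The utype member of the symtab structure above is where the programmer keeps track of the type currently stored in the union. A minimal sketch of that bookkeeping follows; the tag names U_INT, U_FLOAT and U_STRING, the struct name symbol and the helper symbol_format are illustrative assumptions and do not appear in the lecture code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tags for the utype member; the slides do not name them. */
enum utype { U_INT, U_FLOAT, U_STRING };

struct symbol {
    const char *name;
    int utype;              /* records which union member is valid */
    union {
        int ival;
        float fval;
        const char *sval;
    } u;
};

/* Format the value of s into buf, selecting the union member by its tag.
 * Returns the number of characters snprintf would have written. */
int
symbol_format(const struct symbol *s, char *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    switch (s->utype) {
    case U_INT:
        return snprintf(buf, len, "%s = %d", s->name, s->u.ival);
    case U_FLOAT:
        return snprintf(buf, len, "%s = %.2f", s->name, s->u.fval);
    case U_STRING:
        return snprintf(buf, len, "%s = \"%s\"", s->name, s->u.sval);
    default:
        return snprintf(buf, len, "%s = <unknown type>", s->name);
    }
}
```

Reading a member other than the one recorded in utype would reinterpret the stored bytes, which is why every access goes through the switch.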

Enumerations

An enumeration is a list of constant integer values
First value in an enum has value 0, the next 1 . . .

enum boolean { NO, YES };

. . . unless explicit values are specified

enum escapes { BELL = '\a',
    BACKSPACE = '\b', TAB = '\t' };

If not all values are specified, unspecified values continue the progression from the last specified value

/* FEB is 2, MAR is 3 ... */
enum months { JAN = 1, FEB, MAR, APR, MAY, JUN,
    JUL, AUG, SEP, OCT, NOV, DEC };

Enumerations (cont.)

Enumerations provide a convenient way to associate constant values with names
An alternative to #define with the advantage that the values can be generated automatically
A debugger may also be able to print values of enumeration variables in symbolic form
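The numbering rules above can be checked directly, because enumerators are ordinary integer constants. The sketch below repeats the three declarations from the slide; the helper days_in_month is a hypothetical addition that only shows enumerators being used as case labels.

```c
#include <assert.h>

/* Same declarations as on the slide. */
enum boolean { NO, YES };
enum escapes { BELL = '\a', BACKSPACE = '\b', TAB = '\t' };
enum months { JAN = 1, FEB, MAR, APR, MAY, JUN,
              JUL, AUG, SEP, OCT, NOV, DEC };

/* Days in month m of a non-leap year (hypothetical helper). */
int
days_in_month(enum months m)
{
    assert(m >= JAN && m <= DEC);
    switch (m) {
    case FEB:
        return 28;
    case APR: case JUN: case SEP: case NOV:
        return 30;
    default:
        return 31;
    }
}
```

Because JAN is set to 1, the remaining months receive the values 2 to 12 without being spelled out, which is the automatic generation a list of #define lines cannot offer.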

Typedef: New Data Type Names

typedef is a facility for creating new data type names:

typedef int Length;

makes the name Length a synonym for int
Type Length can be used in exactly the same way as type int

Reasons for using typedefs
    Portability issues
        Types like size_t, ptrdiff_t are examples
    (Better) Documentation for a program
        A type Treeptr may be easier to understand than one declared as a pointer to a complicated structure

Further typedefs

In effect, typedef is like #define
Except that it is interpreted by the compiler
Therefore its capabilities are beyond textual substitutions

typedef int (*PFI)(char *, char *);

    Creates the type PFI (pointer to function (of two char * arguments) returning int)

typedef enum { ON, OFF, BROKEN } state;

    Creates the type state

Pointers To Functions

A function itself is not a variable
But it is possible to define pointers to functions
    which can be assigned, placed in arrays
    passed to functions, returned by functions

int (*func)(char *, char *);

    Declares a pointer to a function that has two char * arguments and returns an int

func is a pointer to a function
(*func) is the function, as such the function call reads:

(*func)("abc", "def");

Pointer To Functions Example

#include <stdio.h>

void
print_one(void)
{
    printf("1\n");
}

void
print_two(void)
{
    printf("2\n");
}

int
main(void)
{
    void (*func[])(void) = { print_one, print_two };

    (*func[0])();
    (*func[1])();
    return (0);
}

Dealing With XML Trees Using SAX

Putting it all together
    Establish function callbacks in an event driven application
    Use structures with function pointers to register for callbacks
    Further deal with trees, more precisely XML trees
    Use an external library able to parse trees

We will build a SAX parser application for XML documents

An API To Handle XML Trees

SAX (Simple API for XML)
    SAX is a quasi-standard for parsing XML documents.
    We will use the C library libxml2 [26].
    SAX processors operate in a streaming fashion and with constant space, regardless of the document size.
    The SAX parser reads its input sequentially, and once only.

[26] Originally developed for the GNOME project

Tree View Of An XML Document

<a>
  <b>
    <c>
      <d/>
      <e/>
    </c>
  </b>
  <f>
    <g/>
    <h>
      <i/>
      <j/>
    </h>
  </f>
</a>

An XML document and its inner tree structure.

Parsing Process Triggers Events

SAX Parser reports.
During the parsing process the parser reports interesting events to registered applications, such as:
    the occurrence of a start or end tag,
    a piece of simple text, or
    the beginning/end of the document.

Registered application reacts.
    The application implements code to react in parallel (to the parsing process) to fired events.
    Syntactical errors in the XML document will be detected by the parser and reported to the application.

SAX Events

Selected events defined by the SAX interface [27]:

Event           . . . triggered by          Formal arguments
startDocument   <?xml ...?>
endDocument     EOF
startElement    <t a1=v1 . . . an=vn>       t, (a1, v1), . . . , (an, vn)
endElement      </t>                        t
characters      text content                buffer pointer, length
comment         <!-- c -->                  c
...             ...                         ...

Be careful with the characters event! For performance reasons the parser will
give you a pointer to its own memory space. Never write to this memory, and
never look further than the length given by the parser!

[27] Complete documentation http://www.saxproject.org/

SAX Events Example

<?xml version="1.0" encoding="iso-8859-1"?>
<fs>
  <dir name="etc">
    <file name="services">
# $OpenBSD: services,v 1.67 2007/05/01 11:48:40 steven Exp $
#
# Network services, Internet style
#
# Note that it is presently the policy of IANA to assign a single ...
    </file>
  </dir>
  <dir name="usr"/>
</fs>

Event           Actual arguments
startDocument
startElement    t = "fs"
startElement    t = "dir", a1 = "name", v1 = "etc"
startElement    t = "file", a1 = "name", v1 = "services"
characters      c = "# $OpenBSD: services ...", len = n
endElement      t = "file"
endElement      t = "dir"
...             ...

Function Callbacks

Events are reported to the application via callback functions
The application has to register them before parsing.
    Populate a callback function table.
    Hand over the callback function table to the parser.
    Invoke the parsing process.

Whenever any of the SAX events occurs, the parser calls the function that is registered for this event.
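The three registration steps can be tried without any XML library. The following is a minimal sketch under stated assumptions: struct handler, struct counts, count_start, count_end and toy_parse are hypothetical names, and toy_parse merely fires a fixed event sequence for the document <a><b/></a> in place of a real parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback function table, modelled on the idea only. */
struct handler {
    void (*start_element)(void *ctx, const char *name);
    void (*end_element)(void *ctx, const char *name);
};

/* Private application data that is handed through as ctx. */
struct counts {
    int starts;
    int ends;
};

void
count_start(void *ctx, const char *name)
{
    (void)name;
    ((struct counts *)ctx)->starts++;
}

void
count_end(void *ctx, const char *name)
{
    (void)name;
    ((struct counts *)ctx)->ends++;
}

/* Stand-in for a parser: fires the events for <a><b/></a>.
 * Events without a registered (non-NULL) callback are skipped. */
void
toy_parse(const struct handler *h, void *ctx)
{
    assert(h != NULL);
    if (h->start_element != NULL)
        h->start_element(ctx, "a");
    if (h->start_element != NULL)
        h->start_element(ctx, "b");
    if (h->end_element != NULL)
        h->end_element(ctx, "b");
    if (h->end_element != NULL)
        h->end_element(ctx, "a");
}
```

An application would zero a struct handler, assign count_start and count_end to its members (populate), pass the table together with a struct counts to toy_parse (hand over and invoke), and read the counters afterwards.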

The libxml2 Library

Correct function type is mandatory to interface with libxml2:

void
sax_start_document(void *ctx);

void
sax_end_document(void *ctx);

void
sax_start_element(void *ctx, const xmlChar *t, const xmlChar **atts);

void
sax_end_element(void *ctx, const xmlChar *t);

void
sax_characters(void *ctx, const xmlChar *c, int len);

The *ctx (context) pointer stores private application data.
Its value can be set before parsing.
The same pointer is passed through with every callback.

Callback C Code Example

A simple character callback function definition:

void
sax_characters(void *ctx, const xmlChar *c, int len)
{
    int i;

    for (i = 0; i < len; i++)
        printf("%c", c[i]);
}

The corresponding typedef in libxml/parser.h:

/**
 * charactersSAXFunc:
 * @ctx: the user data (XML parser context)
 * @ch: a xmlChar string
 * @len: the number of xmlChar
 *
 * Receive some chars from the parser.
 */
typedef void (*charactersSAXFunc)
    (void *ctx, const xmlChar *ch, int len);

Callback Function Table

Our callback functions need to be registered . . .
. . . in the callback function table of the parser.
libxml2 declares a structure called xmlSAXHandler:

struct xmlSAXHandler {
    startDocumentSAXFunc startDocument;
    endDocumentSAXFunc endDocument;
    startElementSAXFunc startElement;
    endElementSAXFunc endElement;
    charactersSAXFunc characters;
    ...
};

Populate A Callback Function Table

#include <libxml/SAX.h>
#include <libxml/parserInternals.h>

/* define a callback function table */
xmlSAXHandler sax_handler;

/* function to be called back (characters event) */
static void
sax_characters(void *ctx, const xmlChar *c, int len)
{
    ...
}

int
main(void)
{
    /* register callback function */
    sax_handler.characters = sax_characters;

Pass Callback Table And Parse

    /* context pointer */
    xmlParserCtxtPtr ctx;

    /* create new parser instance */
    ctx = xmlCreateFileParserCtxt("fs.xml");

    /* pass callback table */
    ctx->sax = &sax_handler;
    /* start parsing */
    xmlParseDocument(ctx);
    return (0);
}

Instantiate a new parser.
Hand over the callback function table to the parser.
Invoke the parsing process.

Source Code Parser Application Example

#include <stdio.h>
#include <libxml/SAX.h>
#include <libxml/parserInternals.h>

static xmlSAXHandler sax_handler;

/* characters callback function
 * invoked for each text content */
static void
sax_characters(void *ctx,
    const xmlChar *c, int len)
{
    int i;

    for (i = 0; i < len; i++)
        printf("%c", c[i]);
}

int
main(void)
{
    /* context pointer */
    xmlParserCtxtPtr ctx;

    /* register callback function */
    sax_handler.characters = sax_characters;
    /* create new parser instance */
    ctx = xmlCreateFileParserCtxt("fs.xml");
    if (ctx == NULL) {
        printf("error reading file");
        return (-1);
    }

    /* pass callback table */
    ctx->sax = &sax_handler;
    /* start parsing */
    xmlParseDocument(ctx);

    return (0);
}

Course repository:
pub/src/sax_xmp.c

Compiling Applications With libxml2

Compilation Phase
cc -Wall                            enable a lot of warnings
   -I/usr/local/include/libxml2     the libxml2 header files
   -I/usr/local/include             character set conversion
   -c                               do not link yet, compile only
   sax_xmp.c                        the C file we want to compile

Linking Phase
cc -Wall                            enable a lot of warnings
   -L/usr/local/lib                 location of shared object
   -lxml2                           include the libxml2 code
   sax_xmp.o                        object files to link together

Compilation Session CIP Pool

Two step compilation in CIP Pool (Linux)

mond10:~> cc -Wall -I/usr/include/libxml2 -c sax_xmp.c
mond10:~> cc -Wall -lxml2 sax_xmp.o

At one sweep

mond10:~> cc -Wall -I/usr/include/libxml2 -lxml2 sax_xmp.c

Systems Programming
06. Traversing, Storing and Operating Trees
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule For Today

So far: The C programming language
    Introduction to the basic constructs and features
    First practical experience

Today: Discussion of the practical project
    Intention: Learn about the UNIX system interface
    Next assignments will (mostly) belong together
    Outline the project
    Introduction to the first part: Storing trees in tables

Internal UNIX Kernel Structure

[Figure: layered diagram. User level: user programs, program library. Kernel level: system call interface; file subsystem with buffer cache and character and block device drivers; process control subsystem with interprocess communication (IPC), scheduling and memory management; hardware control. Hardware level: hardware.]

Figure: System V Release 3 Kernel Architecture (Glatz, p. 13)

Practical Project (File Subsystem)

Implementation Steps
    Store XML tree in table(s) (DBMS)
    Store file hierarchy tree in DBMS
    Connect OS kernel with DBMS as Filesystem in USErspace

Expected result
A (read-only) filesystem in userspace that will allow common UNIX commands, such as cd and cat, to operate on a file's inherent structure (email: from, to, subject . . . ; mp3: title, composer, . . . ).

System Architecture

[Figure: diagram labels: cat ~/Mail/a.msg/subject, libc, libfuse, FUSE Implementation, pre/size/level table (user); VFS, FUSE, ext3 (kernel).]

Figure: Extending the file hierarchy into the file.

Encoding Trees (XPath Accelerator)
Tree Reconstruction
Storing XML Trees in (R)DBMSs
Storing File Hierarchies in (R)DBMSs
Implementation Details

Depth-First, Left-to-Right Traversal

<a>
  <b>
    <c>
      <d/>
      <e/>
    </c>
  </b>
  <f>
    <g/>
    <h>
      <i/>
      <j/>
    </h>
  </f>
</a>

A sequential read corresponds to a depth-first, left-to-right tree traversal.

Visiting All Nodes Of The Tree

<a>
  <b>
    <c>
      <d/>
      <e/>
    </c>
  </b>
  <f>
    <g/>
    <h>
      <i/>
      <j/>
    </h>
  </f>
</a>

For each node n, we receive
    startElement event before the events of any children of n, and
    endElement event after the events of any children of n.

Pre-Order/Post-Order Traversal

pre  tag             post
0    <a>
1      <b>
2        <c>
3          <d/>      0
4          <e/>      1
         </c>        2
       </b>          3
5      <f>
6        <g/>        4
7        <h>
8          <i/>      5
9          <j/>      6
         </h>        7
       </f>          8
     </a>            9

Parsing an XML document,
    start element callbacks occur in pre-order,
    end element callbacks occur in post-order.

Storing Trees In Relational Tables

n   pre   post
a   0     9
b   1     3
c   2     2
d   3     0
e   4     1
f   5     8
g   6     4
h   7     7
i   8     5
j   9     6

Torsten Grust, 2002; Grust et al., 2004
XPath Accelerator.
http://www-db.in.tum.de/~grust/files/xpath-accel.pdf (short)
http://www-db.in.tum.de/~grust/files/accelerating-locsteps.pdf

Tree Reconstruction

Sequentially scan table T with encoded document (in pre-order).
Maintain a stack with nodes whose start (but no end) tag was printed.
Before processing a node n, check stack for nodes n' whose end tags have to be emitted first. This is the case for n' with post(n') < post(n).
Afterwards print start tag of n and push it onto the stack.

Given: pre/post table T, stack S, node n

foreach n in T do
    while S not empty and post(S.top()) < post(n) do
        print("</", name(S.top()), ">");
        S.pop();
    print("<", name(n), ">");
    S.push(n);

while S not empty do
    print("</", name(S.top()), ">");
    S.pop();
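The pseudocode translates almost line by line into C. The sketch below makes several assumptions: the table is an array of struct row in pre-order (so the array index plays the role of pre), the output goes into a caller-supplied buffer instead of being printed, and the stack is a fixed array of 64 entries. The names struct row and reconstruct are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct row {
    int post;
    const char *name;
};

/* Rebuild the tag sequence from n rows given in pre-order.
 * Writes into out (capacity cap). Returns the number of characters
 * written, or -1 if the buffer or the stack is too small. */
int
reconstruct(const struct row *t, int n, char *out, size_t cap)
{
    int stack[64];      /* indices of rows whose start tag was emitted */
    int top = 0;        /* number of entries on the stack */
    size_t used = 0;
    int i, w;

    assert(t != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i <= n; i++) {
        /* i == n is the final flush: close everything still open */
        while (top > 0 && (i == n || t[stack[top - 1]].post < t[i].post)) {
            top--;
            w = snprintf(out + used, cap - used, "</%s>", t[stack[top]].name);
            if (w < 0 || (size_t)w >= cap - used)
                return -1;
            used += (size_t)w;
        }
        if (i == n)
            break;
        if (top == 64)
            return -1;
        w = snprintf(out + used, cap - used, "<%s>", t[i].name);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
        stack[top++] = i;
    }
    return (int)used;
}
```

Fed with the ten rows of the a to j table, the function emits every start tag in table order and closes each element as soon as a row with a larger post value, or the end of the table, is reached.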

An Example: Document View

Consider the following XML document:

<?xml version="1.0"?>
<fs>
  <dir name="Mail">
    <file name="01.email">
      <date>Tue, 13 May 2008 17:48:56 +0200 (CEST)</date>
      <from>Christian.Pich@uni-konstanz.de</from>
      <to>inf@inf.uni-konstanz.de</to>
      <subject>EM-Tipprunde</subject>
      <message>
Lieber Fachbereich,
es sind nur noch wenige Tage bis zur Fussball-EM!
Einer alten Tradition folgend wollen wir auch dieses
Mal eine Tipprunde im Fachbereich veranstalten...
      </message>
    </file>
  </dir>
</fs>

An Example: Table View

pre   post   kind   content
0     12     elem   fs
1     11     elem   dir
2     10     elem   file
3     1      elem   date
4     0      text   Tue, 13 May 2008 17:48:56 +0200 (CEST)
5     3      elem   from
6     2      text   Christian.Pich@uni-konstanz.de
7     5      elem   to
8     4      text   inf@inf.uni-konstanz.de
9     7      elem   subject
10    6      text   EM-Tipprunde
11    9      elem   message
12    8      text   Lieber Fachbereich, es sind ...

Table: Relational encoding of the XML document (whitespaces and attributes omitted)

A Tree In Two-Dimensional Space

The pre/post plane and its properties

An Example: Table View (pre/post/size/level)

pre   post   size   level   kind   content
0     12     12     0       elem   fs
1     11     11     1       elem   dir
2     10     10     2       elem   file
3     1      1      3       elem   date
4     0      0      4       text   Tue, 13 May 2008 17:48:56 +0200 (CEST)
5     3      1      3       elem   from
6     2      0      4       text   Christian.Pich@uni-konstanz.de
7     5      1      3       elem   to
8     4      0      4       text   inf@inf.uni-konstanz.de
9     7      1      3       elem   subject
10    6      0      4       text   EM-Tipprunde
11    9      1      3       elem   message
12    8      0      4       text   Lieber Fachbereich, es sind ...

Table: Relational encoding with additional structural metadata to speed up queries: size (#descendants) and level (#ancestors)

pre(n) - post(n) = level(n) - size(n)

Relational Tree Encoding In Real Life

Relational XQuery Processing
The following open source implementations are built on top of the relational tree encoding idea of the XPath Accelerator.
    Pathfinder XQuery Compiler
    http://www.pathfinder-xquery.org/
    MonetDB/XQuery DBMS
    http://monetdb-xquery.org/

Tree-aware storage structures and query algorithms.
    BaseX (uses a similar encoding)
    http://www.basex.org

UNIX Filesystem

The UNIX FS is a hierarchical arrangement of directories and files.
    Everything starts in the directory called root whose name is the single character /.
    A directory is a file that is differentiated from a plain file by a flag in its inode(5) entry.
    Directory entries (dirent(5)) contain information about a file [28] and a pointer to the file itself.
    Directory entries may contain other directories as well as plain files; such nested directories are referred to as subdirectories.

[28] The stat(2) and fstat(2) functions return a structure containing all the attributes of a file.

File Hierarchy Traversal

$ tree ./a

pre  name  post
0    a             9
|-- 1    b         3
|   `-- 2    c     2
|       |-- 3  d   0
|       `-- 4  e   1
`-- 5    f         8
    |-- 6    g     4
    `-- 7    h     7
        |-- 8  i   5
        `-- 9  j   6

Traversing a file hierarchy,
    enter directory callbacks occur in pre-order,
    leave directory callbacks occur in post-order.

The Database As Filesystem

[Figure: diagram labels: DB-aware applications, Mapi, MonetDB/XQuery (FUSE XQuery Module, Pathfinder XQuery Compiler, MonetDB Kernel); xls /mnt/fuse, ls /mnt/fuse, libc, libfuse, FUSE Implementation (FSOps to XQuery/XQUF) (user); VFS, FUSE, ext3 (kernel).]

Figure: A filesystem in userspace implemented by a tree-aware DBMS

Represent A Table In Main-Memory

Table: Array (or dynamic data structure) of tuples
Tuples: A case for a new struct

struct tuple {
    int pre;
    int post;
    int level;
    int size;
    enum kind kind;
    char *name;
};

What members do we actually need?

Tuple Representation

enum kind {
    ELEM = 1, ATTR, TXT, COMMENT, DOC, UDEF
};

struct tuple {
    unsigned int size;
    unsigned int level;
    enum kind kind;
    void *cnt;
};

Your Assignment Part I - III

I. Relational Tree Storage & Recovery
    Extend your SAX Parser to store an XML document in a relational table.
    Implement the reconstruction algorithm to restore the original document.

II. File Hierarchy Traversal & Storage
    15.06. - 29.06.

III. Connect your storage to FUSE kernel module
    29.06. - 13.07.

Systems Programming
07. File I/O
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule For Today

So far:
    First practical part: Storing trees in tables

Today: I/O in the UNIX OS
    Learn about the basic I/O routines
    Learn about essential file properties and related functions

Prepares second practical part:
    Storing a file hierarchy in tables

Input And Output (I/O)

I/O in the UNIX OS
    All I/O is done by reading or writing files. All peripheral devices, even keyboard and screen, are files in the filesystem.
    A single homogeneous interface handles all communication between a program and peripheral devices.

In general, before reading or writing a file, the OS has to be informed, which is done via opening the file.
    In case of a write it may be necessary to create the file first.
    Permissions have to be checked etc.

Unbuffered I/O

Unbuffered I/O because each read or write invokes a system call.

Regarding file I/O it all boils down to five functions:

open, read, write, lseek, and close

We speak of unbuffered I/O in contrast to the standard I/O.
Unbuffered I/O is not part of ISO C, but POSIX.1 and SUSv3.

System Calls vs. Library Routines

UNIX provides its services through a set of system calls
    In effect, these are functions within the OS.
    As such they may be called by user applications.

[Figure: layers labelled applications, shell, library routines, system calls, kernel.]

File Descriptor

When we open an existing file or create a new file, the kernel returns a file descriptor to the process.
To the kernel, all open files are referred to by file descriptors.
A file descriptor is a non-negative integer.
Whenever I/O is to be done on the file, the file descriptor is used instead of the name to identify the file.
Each process has a fixed size descriptor table, which is guaranteed to have at least n slots.
The entries in the descriptor table are numbered with small integers starting at 0.
The call getdtablesize(3) returns the size of this table.

Standard Input, Output, and Error

Special arrangements for I/O with keyboard and screen
When the command interpreter (shell) runs a program, three files are open, with the file descriptors 0, 1, and 2.
0: stdin, 1: stdout, 2: stderr.
If a program reads 0 and writes 1 and 2, it can do input and output without worrying about opening files.
POSIX.1 replaces the magic numbers 0, 1, and 2 with STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO (unistd.h)

I/O Redirection

The user can redirect I/O to and from files with < and >:

$ ./a.out < infile > outfile

In this case, the shell changes the default assignments for file descriptors 0 and 1 to the named files.
File descriptor 2 normally remains attached to the screen to display error messages.
Similar observations hold for I/O associated with a pipe.
In all cases, the file assignments are changed by the shell, not by the program. The program does not know where its input comes from nor where its output goes, so long as it uses file 0 for input and 1 and 2 for output.

Tracing Hello World (on Linux)

strace(1)ing system calls of hello world

$ strace ./hello
execve("./hello", ["./hello"], [/* 72 vars */]) = 0
brk(0)                                    = 0x602000
mmap(NULL, 4096, PROT_READ|PROT, ...)     = 0x2ac75deeb000
uname({sys="Linux", node="titan05", ...}) = 0
access("/etc/ld.so.preload", R_OK)        = -1 ENOENT
open("/etc/ld.so.cache", O_RDONLY)        = 3
fstat(3, {st_mode=S_IFREG|0644, ...})     = 0
mmap(NULL, 197102, PROT_READ, ...)        = 0x2ac75deec000
close(3)                                  = 0
open("/lib64/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\2\1\1\0\0\0\03"..., 832) = 832
fstat(3, {st_mode=S_IFREG|075, ...})      = 0
mmap(NULL, 4096, PROT_READ|...)           = 0x2ac75df1d000
mmap(NULL, 3412216, PROT_READ|....)       = 0x2ac75e0ed000

Tracing Hello World (on Linux) (cont.)

fadvise64(3, 0, 3412216, ...)             = 0
mprotect(0x2ac75e226000, 2093056, ...)    = 0
mmap(0x2ac75e425000, 20480, ...)          = 0x2ac75e425000
mmap(0x2ac75e42a000, 16632, ...)          = 0x2ac75e42a000
close(3)                                  = 0
mmap(NULL, 4096, PROT_READ|...)           = 0x2ac75e42f000
arch_prctl(ARCH_SET_FS, 0x2ac75e42f6f0)   = 0
mprotect(0x2ac75e425000, 12288, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)       = 0
munmap(0x2ac75deec000, 197102)            = 0
fstat(1, {st_mode=S_IFCHR|0620, ...})     = 0
mmap(NULL, 4096, PROT_READ|...)           = 0x2ac75deec000
write(1, "Hello, world\n", 13Hello, world
)                                         = 13
exit_group(0)                             = ?
Process 32089 detached

Low-Level I/O
    File Descriptor
    Open/Create/Close a File
    Reposition File Offset
    Read and Write a File
    Properties of a File
    Primitive System Data Types
    The size-related Fields
    Directory Properties and Functions
    Device Numbers and Time-related Fields

Open A File

A file is opened by calling the open(2) function:

#include <fcntl.h>

int
open(const char *path, int flags, mode_t mode);

path is the name of the file to open or create.
flags specifies a multitude of options (formed by ORing together one or more constants from fcntl.h (next slide)).
mode holds permission information associated with a file.
If successful, open(2) returns a file descriptor.
Otherwise, a value of -1 is returned and errno is set to indicate the error (errno.h).
    e.g. ENAMETOOLONG: A component of a path name exceeded MAXNAMLEN, or an entire path name exceeded MAXPATHLEN-1

Flags For The open(2) Function

flags is an int that specifies how the file is to be opened:

O_RDONLY    Open for reading only.
O_WRONLY    Open for writing only.
O_RDWR      Open for reading and writing.

Table: One and only one of these three constants must be specified. [29]

fd = open(name, O_RDONLY, 0);

In principle, it's an error to try to open a non-existing file.
The system call creat (sic!) is provided to create new files.
But . . .

[29] Most implementations define O_RDONLY as 0, O_WRONLY as 1, and O_RDWR as 2, for compatibility with older programs.

Creating New Files

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int
creat(const char *path, mode_t mode);

However, creat is made obsolete by open:

Historically, in early versions of the UNIX system, the second argument to open could
be only 0, 1, or 2. There was no way to open a file that didn't exist. Therefore, a
separate system call, creat, was needed to create new files. With the O_CREAT and
O_TRUNC options now provided by open, a separate creat function is no longer needed:

open(path, O_CREAT | O_TRUNC | O_WRONLY, mode);

Even More Flags For The open(2) Function

The following constants are optional flags to open(2):

O_APPEND      Append on each write.
O_CREAT       Create file if it does not exist (requires mode argument).
O_EXCL        Error if O_CREAT and file exists.
O_TRUNC       Truncate file length to 0.
O_NONBLOCK    Do not block on open or for data to become available.
O_SYNC        Have each write wait for I/O to complete (incl. file attributes).
O_RSYNC       Let read wait until pending writes to same area are complete.
O_DSYNC       Have each write wait for I/O to complete (excl. file attributes).

Table: Short description of some more POSIX.1 flags to open. Consult your system manual for further information & implementation details.
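The open call that replaces creat can be wrapped into a small routine that creates a file, writes a buffer and closes the descriptor again. This is a minimal sketch; the function name write_new_file and the permission bits S_IRUSR | S_IWUSR are illustrative assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) path and write len bytes of data to it.
 * Returns 0 on success, -1 on error. */
int
write_new_file(const char *path, const void *data, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && (data != NULL || len == 0));
    /* same effect as creat(path, mode), spelled with open flags */
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    n = write(fd, data, len);
    if (n == -1 || (size_t)n != len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling the function twice on the same path leaves only the second buffer in the file, because O_TRUNC discards the old content. Adding O_EXCL to the flags would instead make the second call fail with EEXIST.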

Close A File

close(2) - delete a descriptor

#include <unistd.h>

int
close(int d);

An open file is closed by calling the close function.
Releases any record locks the process may have on the file.
May be used to not run out of active descriptors per process.
When a process exits, all associated file descriptors are freed.
Returns 0 on success; on failure it returns -1 and sets the global int errno.
close will fail if:
    Argument is not an active descriptor (EBADF)
    An interrupt was received (EINTR)

lseek - Reposition Read/Write File Offset

#include <unistd.h>

off_t
lseek(int fildes, off_t offset, int whence);

Every open file has a current file offset
    Normally, a non-negative integer.
    Measures the number of bytes from the beginning of file.
read and write normally start at the current file offset.
They increment the offset by number of bytes read or written.
By default the offset is initialized to 0 when a file is opened.
    Open with O_APPEND is an exception.

Offset Interpretation

Interpretation of offset depends on whence argument:

whence      offset (re-)position
SEEK_SET    offset is set to offset bytes from the beginning of file.
SEEK_CUR    file's offset is set to its current value plus offset. [30]
SEEK_END    file's offset is set to the size of the file plus offset. [30]

A successful call to lseek returns the new file offset.
Otherwise, a value of -1 is returned and errno is set.

[30] The offset can be negative or positive.
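Since SEEK_END positions relative to the size of the file, seeking zero bytes from the end yields the file size. The sketch below uses this and restores the previous offset afterwards; the function names file_size and file_size_of are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Size in bytes of the file behind fd, or -1 if fd cannot seek
 * (for example a pipe). The current offset is restored on success. */
off_t
file_size(int fd)
{
    off_t cur, end;

    assert(fd >= 0);
    cur = lseek(fd, 0, SEEK_CUR);           /* remember where we are */
    if (cur == -1)
        return -1;
    end = lseek(fd, 0, SEEK_END);           /* 0 bytes from the end */
    if (end == -1)
        return -1;
    if (lseek(fd, cur, SEEK_SET) == -1)     /* go back */
        return -1;
    return end;
}

/* Convenience wrapper: size of the file at path, or -1 on error. */
off_t
file_size_of(const char *path)
{
    int fd;
    off_t size;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    size = file_size(fd);
    close(fd);
    return size;
}
```

None of the three lseek calls transfers data; only the offset kept for the open file changes.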

Error Indication And Current Offset

lseek will fail and the file pointer will remain unchanged if:

EBADF     fd is not an open file descriptor.
ESPIPE    fd is associated with a pipe, socket, or FIFO.
EINVAL    whence is not a proper value or the resulting offset would
          be negative on a filesystem or special device that does
          not allow negative offsets to be used.

To determine the current offset, we can seek with zero offset:

off_t pos;
pos = lseek(fd, 0, SEEK_CUR);

Seeking Capability

The same technique determines whether a file is capable of seeking:

#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    if (lseek(STDIN_FILENO, 0, SEEK_SET) == -1)
        err(errno, "can not seek [%d].", errno);
    else
        printf("seek OK.\n");
    return (0);
}

$ ./a.out /etc/motd
seek OK.
$ cat /etc/motd | ./a.out
a.out: can not seek [29].: Illegal seek

$ grep 29 /usr/include/sys/errno.h
#define ESPIPE          29              /* Illegal seek */

Extending a File with lseek()

lseek and file offset
    lseek only records the current file offset within the kernel.
    It does not cause any I/O to take place.
    This offset is then used by the next read or write operation.

Creating holes
    The file's offset can be greater than the file's current size.
    The next write to the file will extend the file.
    A hole will be created in the file (which is allowed).

Creating a File with a Hole

#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";

int
main(void)
{
    int fd;

    if ((fd = creat("file.hole", S_IRUSR | S_IWUSR)) < 0)
        err(1, "creat error");
    if (write(fd, buf1, 10) != 10)
        err(1, "buf1 write error");
    /* offset now = 10 */
    if (lseek(fd, 50*16384, SEEK_SET) == -1)
        err(1, "lseek error");
    /* offset now = 50 * 16384 */
    if (write(fd, buf2, 10) != 10)
        err(1, "buf2 write error");
    /* offset now = 50 * 16384 + 10 */

    return (0);
}

Reading Holes

Bytes in a file that have not been written are read back as 0.

$ ls -l
-rw-------  1 holu  holu  819210 May 15 11:05 file.hole
$ od -c file.hole
0000000   a   b   c   d   e   f   g   h   i   j  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
3100000   A   B   C   D   E   F   G   H   I   J
3100012

Holes and Disk Block Usage

A hole in a file isn't required to have storage backing it on disk (depends on the filesystem implementation).

$ cat file.hole > file.nohole
$ ls -ls file.hole file.nohole
  96 ... holu holu 819210 May 15 11:05 file.hole
1664 ... holu holu 819210 May 15 11:07 file.nohole

Although both files are the same size,
the file without holes consumes 1664 disk blocks,
whereas the file with holes consumes only 96 blocks.

Low-Level I/O

File Descriptor
Open/Create/Close a File
Reposition File Offset
Read and Write a File
Properties of a File
Primitive System Data Types
The size-related Fields
Directory Properties and Functions
Device Numbers and Time-related Fields

Read and Write a File

Input and output use the read and write system calls.

They are accessed from C programs through two identically
named functions: read(2) and write(2).

For both, the first argument is a file descriptor.

The second argument is a pointer to a buffer in the program
where the data is to go to or come from.

The third argument is the number of bytes to be transferred.

#include <sys/types.h>
#include <unistd.h>

ssize_t
read(int d, void *buf, size_t nbytes);

ssize_t
write(int d, const void *buf, size_t nbytes);

Low-Level I/O Examples

A simple program to copy its input to its output

It's an equivalent of the file copy programs written previously.

It will copy anything to anything, since the input and output
can be redirected to any file or device.

An implementation of getchar()

read/write can be used to construct higher-level routines.

First, with unbuffered input, reading stdin one char at a time.

Second, input in big chunks, output one char at a time.

Simplified version of UNIX program cp

Copy one file to another

File Content Copying, v3

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int /* copy anything to anything, v3 */
main(void)
{
    char buf[BUFSIZ];
    ssize_t n;      /* number of bytes returned by read */

    while ((n = read(0, buf, BUFSIZ)) > 0)
        write(1, buf, n);
    return (0);
}

Unbuffered Version of getchar()

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#undef getchar  /* in case <stdio.h> defines getchar as a macro */

/* getchar: unbuffered single character input */
int
getchar(void)
{
    char c;

    return (read(0, &c, 1) == 1) ? (unsigned char)c : EOF;
}

Buffered Version of getchar()

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#undef getchar  /* in case <stdio.h> defines getchar as a macro */

/* getchar: simple buffered version */
int
getchar(void)
{
    static char buf[BUFSIZ];
    static char *bufp = buf;
    static int n = 0;

    if (n == 0) {   /* buffer is empty */
        n = read(0, buf, sizeof buf);
        bufp = buf;
    }
    return (--n >= 0) ? (unsigned char)*bufp++ : EOF;
}

File Copying, v4

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* error() and PERMS are not shown on the slide; they are assumed to be defined elsewhere */

/* cp: copy f1 to f2 (file copy) */
int
main(int argc, char *argv[])
{
    int f1, f2;
    ssize_t n;
    char buf[BUFSIZ];

    if (argc != 3)
        error("Usage: cp from to");
    if ((f1 = open(argv[1], O_RDONLY, 0)) == -1)
        error("can't open %s", argv[1]);
    if ((f2 = creat(argv[2], PERMS)) == -1)
        error("can't create %s, mode %03o", argv[2], PERMS);
    while ((n = read(f1, buf, BUFSIZ)) > 0)
        if (write(f2, buf, n) != n)
            error("write error on file %s", argv[2]);
    return (0);
}

Properties of a File

Drawbacks of our file copy implementation.

Can only copy one file.

Does not permit the second argument to be a directory.

Invents permissions instead of copying them.

Metadata of files, the stat information.

When interacting with the filesystem it is often relevant to
determine information about a file (like the permissions
above) in addition to or instead of its actual content.

An example is the listing of a directory using ls(1).

It prints the names of the files in the directory.

Optionally, it prints other metadata, such as sizes, permissions etc.

Synopsis of the stat Functions

#include <sys/types.h>
#include <sys/stat.h>

int
stat(const char *path, struct stat *sb);

int
fstat(int fd, struct stat *sb);

int
lstat(const char *path, struct stat *sb);

Given a path, the stat(2) function returns a structure of
information about the named file.

fstat(2) works with a file descriptor, instead of a pathname.

lstat(2) returns information about the symbolic link, not
the referenced file.

All three return 0 if OK, and -1 on error with errno set.

The stat Structure

struct stat {
    mode_t          st_mode;       /* inode's mode */
    uid_t           st_uid;        /* user ID of owner */
    gid_t           st_gid;        /* group ID of owner */
    off_t           st_size;       /* file size, in bytes */
    int64_t         st_blocks;     /* blocks allocated for file */
    u_int32_t       st_blksize;    /* optimal blocksize for I/O */
    dev_t           st_dev;        /* device inode resides on */
    ino_t           st_ino;        /* inode's number */
    nlink_t         st_nlink;      /* number of hard links to the file */
    dev_t           st_rdev;       /* device type, for special file inode */
    struct timespec st_atimespec;  /* last access */
    struct timespec st_mtimespec;  /* last data modification */
    struct timespec st_ctimespec;  /* last file status change */
    u_int32_t       st_flags;      /* user defined flags for file */
    u_int32_t       st_gen;        /* file generation number */
};

This is where ls -l gets its information from.

Mostly primitive system data types.

Primitive System Data Types

Type        Description
dev_t       device numbers (major and minor)
fd_set      file descriptor sets
fpos_t      file position
gid_t       numeric group IDs
ino_t       inode numbers
mode_t      file type, file creation mode
nlink_t     link counts for directory entries
off_t       file sizes and offsets (signed)
pid_t       process IDs and process group IDs
ptrdiff_t   result of subtracting two pointers (signed)
size_t      size of objects (such as strings) (unsigned)
ssize_t     functions that return a count of bytes
            (signed) (e.g., read, write)
time_t      counter of seconds of calendar time
uid_t       numeric user IDs
wchar_t     can represent all distinct character codes

Table: Some implementation-dependent data types (sys/types.h)

File Types

struct stat {
    mode_t  st_mode;    /* inode's mode */
    ...
};

The type of a file is encoded in the st_mode member of stat.

Regular file            Text and binary data.
Directory file          Names of files and pointers to information on
                        these files.
Character special file  Certain type of devices.
Block special file      Typically disk devices.
FIFO                    Used for IPC (aka pipes).
Socket                  Network communication.
Symbolic link           Pointer to another file.

File Type Definition

struct stat {
    mode_t  st_mode;    /* inode's mode */
    ...
};

sys/types.h:  typedef __mode_t   mode_t;
sys/_types.h: typedef __uint32_t __mode_t;

/* sys/stat.h (OpenBSD 4.3) */
#define S_IFMT   0170000 /* type of file mask */

#define S_IFIFO  0010000 /* named pipe (fifo) */
#define S_IFCHR  0020000 /* character special */
#define S_IFDIR  0040000 /* directory */
#define S_IFBLK  0060000 /* block special */
#define S_IFREG  0100000 /* regular */
#define S_IFLNK  0120000 /* symbolic link */
#define S_IFSOCK 0140000 /* socket */

File Type Determination

struct stat {
    mode_t  st_mode;    /* inode's mode */
    ...
};

Determine file types with macros (pass st_mode as argument).

Macro        Type of file
S_ISREG(m)   Regular file
S_ISDIR(m)   Directory file
S_ISCHR(m)   Character special file
S_ISBLK(m)   Block special file
S_ISFIFO(m)  FIFO
S_ISLNK(m)   Symbolic link
S_ISSOCK(m)  Socket

Table: File type macros in <sys/stat.h>

File Types and Macros

File types:

/* sys/stat.h (OpenBSD 4.3) */
#define S_IFMT   0170000 /* type of file mask */
#define S_IFIFO  0010000 /* named pipe (fifo) */
#define S_IFCHR  0020000 /* character special */
#define S_IFDIR  0040000 /* directory */
#define S_IFBLK  0060000 /* block special */
#define S_IFREG  0100000 /* regular */
#define S_IFLNK  0120000 /* symbolic link */
#define S_IFSOCK 0140000 /* socket */

File type macros:

#define S_ISFIFO(m) ((m & 0170000) == 0010000) /* fifo */
#define S_ISCHR(m)  ((m & 0170000) == 0020000) /* char spec. */
#define S_ISDIR(m)  ((m & 0170000) == 0040000) /* directory */
#define S_ISBLK(m)  ((m & 0170000) == 0060000) /* block spec. */
#define S_ISREG(m)  ((m & 0170000) == 0100000) /* reg. file */
#define S_ISLNK(m)  ((m & 0170000) == 0120000) /* symb. link */
#define S_ISSOCK(m) ((m & 0170000) == 0140000) /* socket */

Used Bits of st_mode (so far)

What we have so far:

#define S_IFIFO    0   0   1   0   0   0   0   /* named pipe (fifo) */
#define S_IFCHR    0   0   2   0   0   0   0   /* character special */
#define S_IFDIR    0   0   4   0   0   0   0   /* directory */
#define S_IFBLK    0   0   6   0   0   0   0   /* block special */
#define S_IFREG    0   1   0   0   0   0   0   /* regular */
#define S_IFLNK    0   1   2   0   0   0   0   /* symbolic link */
#define S_IFSOCK   0   1   4   0   0   0   0   /* socket */
#define S_IFMT     0   1   7   0   0   0   0   /* type of file mask */
                      --x xxx --- --- --- ---  /* used bits st_mode */
                        m ode

File Permissions

/* sys/stat.h (OpenBSD 4.3) */
#define S_IRWXU  0000700 /* RWX mask for owner */
#define S_IRUSR  0000400 /* R for owner */
#define S_IWUSR  0000200 /* W for owner */
#define S_IXUSR  0000100 /* X for owner */

#define S_IRWXG  0000070 /* RWX mask for group */
#define S_IRGRP  0000040 /* R for group */
#define S_IWGRP  0000020 /* W for group */
#define S_IXGRP  0000010 /* X for group */

#define S_IRWXO  0000007 /* RWX mask for other */
#define S_IROTH  0000004 /* R for other */
#define S_IWOTH  0000002 /* W for other */
#define S_IXOTH  0000001 /* X for other */

#define S_IREAD  S_IRUSR
#define S_IWRITE S_IWUSR
#define S_IEXEC  S_IXUSR

Used Bits of st_mode (so far)

Permission bits 0-8:

#define S_IRWXO    0   0   0   0   0   0   7   /* RWX mask for other */
#define S_IRWXG    0   0   0   0   0   7   0   /* RWX mask for group */
#define S_IRWXU    0   0   0   0   7   0   0   /* RWX mask for owner */
#define S_IFMT     0   1   7   0   0   0   0   /* type of file mask */
                      --| ||| --- xxx xxx xxx  /* used bits st_mode */
                        m ode     usr grp oth

File Ownership

struct stat {
    mode_t  st_mode;    /* inode's mode */
    uid_t   st_uid;     /* user ID of owner */
    gid_t   st_gid;     /* group ID of owner */
    ...
};

Every file has some associated IDs.

Every file has an owner and a group owner.

The owner is stored in member st_uid, the group in st_gid.

Process IDs

Every process has some associated IDs:

real user id
real group id               who we really are

effective user id
effective group id
supplementary group ids     used for file access permission checks

saved set-user-id
saved set-group-id          saved by exec functions

real [uid|gid] taken from password file on login.

eff. [uid|gid] depends on set-[uid|gid] bits in st_mode.

supplementary gid taken from /etc/group on login.

saved set-[uid|gid] copies of eff. [uid|gid].

The set-id Bits

Bits 10 and 11:

#define S_ISGID    0   0   0   2   0   0   0   /* set group id on exec */
#define S_ISUID    0   0   0   4   0   0   0   /* set user id on exec */
                      --| ||| xx- ||| ||| |||  /* used bits st_mode */
                        m ode ss- usr grp oth

When we execute a program file, the effective user id of the
process is usually the real user id and the effective group id is
usually the real group id.

But there are two special flags in st_mode.

Set-user-id (S_ISUID) and set-group-id (S_ISGID).

. . . when this file is executed, set the effective user id of the
process to be the owner of the file (st_uid).

Example for set-user-id

Consider passwd(1) to modify a user's password

$ ls -l /usr/bin/passwd
-r-sr-xr-x 1 root bin 22500 Aug 28 18:12 /usr/bin/passwd

Owner is root and set-user-id bit is set.

When the program file is running as a process, it has
superuser privileges.

This is independent of the real user id of the process that
executes the file.

This is required to write to the password file/db.

File Access Tests Performed by the Kernel

Given: effective user/group ids (process),
       user/group ids and permissions (file)
Output: Grant or deny access to file

if eff. uid equals 0 then
    /* superuser has free rein throughout the filesystem. */
    grant
if (eff. uid equals st_uid) then
    /* process owns the file. */
    if (appropriate user permission) then grant else deny
if (eff. gid equals st_gid) then
    if (appropriate group permission) then grant else deny
if (appropriate other permission) then grant else deny

File Access Permissions Quiz

Do any file types have permissions?

Yes. All file types have permissions (not only regular files).

What does execute permission for a directory grant?

Permission to pass through the directory when it is a component of
a pathname that we are trying to access (i.e., search the directory
looking for a specific filename).

What is the search bit? Where does its name come from?

Whenever we want to open any file by name we must have execute
permission in each directory mentioned in the name (including the
current directory if it is implied). This is why the execute
permission bit for the directory is often called the search bit.

File Access Permissions Quiz

What does read permission for a directory grant?

Obtain a listing of all filenames in the directory.

To create a file what permissions do we need in the directory?

We cannot create a new file in a directory unless we have write
and execute permission in the directory.

To delete a file what permissions do we need in the directory?

We (also) need write and execute permission in the directory
containing the file.

Do we need read and write permissions on the file to delete it?

No.

The History of the Sticky Bit

The missing bit no. 9:

#define S_ISTXT    0   0   0   1   0   0   0   /* sticky bit */
                      --x xxx xxx xxx xxx xxx  /* bits st_mode */
                        m ode sst usr grp oth

The S_ISTXT aka S_ISVTX is known as sticky bit.

Formerly, it had an effect on executable programs.

If set, a copy of the program's text was saved in the swap area
on process termination. S_ISVTX as mnemonic for saved-text.

This caused the program to load faster the next time.

Swap area was handled as a contiguous file, compared to
random data blocks in fs.

Sticky because it stuck in swap until reboot.

Obsoleted by faster filesystems.

The Sticky Bit Today

Today it has an effect on directories.

If set for a directory, a file in the directory can be removed or
renamed only if the user has write permission for the directory,
and either
    owns the file,
    owns the directory, or
    is the superuser

/tmp is a good candidate for the sticky bit.

Any user can typically create files there.

But users should not be able to delete or rename files owned
by others.

drwxrwxrwt 7 root root 4.0K 2007-11-19 19:46 tmp

Functions Related to st_mode, st_uid, st_gid

access(2) - check access permissions of a file or pathname

umask(2) - set file creation mode mask

chmod, fchmod(2) - change mode of file

chown, fchown, lchown(2) - change owner/group of a file

The size-related Fields

struct stat {
    off_t       st_size;     /* file size, in bytes */
    int64_t     st_blocks;   /* blocks allocated for file */
    u_int32_t   st_blksize;  /* optimal blocksize for I/O */
    ...
};

st_size     Regular file. File size of 0 is allowed (EOF on read).
            Symbolic link. Actual number of bytes of the
            target string w/o terminating null byte.

st_blocks   The actual number of blocks allocated for the file in
            512-byte units (non-portable!). As short symbolic links
            are stored in the inode, this number may be zero.

st_blksize  The preferred block size for I/O for the file.

Write a Single Byte File

#include <sys/param.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int fd;
    ssize_t wb;
    struct stat s;
    struct statfs sfs;

    if ((fd = open("/tmp/file", O_CREAT | O_TRUNC | O_RDWR, 0600)) == -1)
        err(errno, "can not create file. [%d]", errno);
    if ((wb = write(fd, "a", 1)) == -1)
        err(errno, "can not write to fd %d", fd);
    if (fstat(fd, &s) != 0)
        err(errno, "fstat failed.");
    if (fstatfs(fd, &sfs) != 0)
        err(errno, "fstatfs error occurred.");
    printf("st_size:\t%lld\n", (long long)s.st_size);
    printf("st_blocks:\t%lld\n", (long long)s.st_blocks);
    printf("st_blksize:\t%u\n", (unsigned int)s.st_blksize);
    printf("f_bsize:\t%u\n", (unsigned int)sfs.f_bsize);
    printf("f_iosize:\t%u\n", (unsigned int)sfs.f_iosize);
    return (0);
}

Write a Single Byte File (cont.)

Get some background information:

st_size:    1       /* the single character */
st_blocks:  4       /* 4 x 512 = 2048 */
st_blksize: 16384   /* preferred file I/O size */
f_bsize:    2048    /* fundamental fs block size */
f_iosize:   16384   /* optimal transfer block size */

Compare it:

$ ls -ls /tmp/file
4 -rw-r--r-- 1 holu wheel 1 May 16 11:06 /tmp/file

Occupies one filesystem data block (2K) to store the single byte.

Get Filesystem Statistics (BSD UNIX)

#include <sys/param.h>
#include <sys/mount.h>

int
fstatfs(int fd, struct statfs *buf);

struct statfs {
    u_int32_t   f_flags;        /* copy of mount flags */
    u_int32_t   f_bsize;        /* file system block size */
    u_int32_t   f_iosize;       /* optimal transfer block size */
                                /* unit is f_bsize */
    u_int64_t   f_blocks;       /* total data blocks in file system */
    u_int64_t   f_bfree;        /* free blocks in fs */
    int64_t     f_bavail;       /* free blocks avail to non-superuser */
    u_int64_t   f_files;        /* total file nodes in file system */
    u_int64_t   f_ffree;        /* free file nodes in fs */
    int64_t     f_favail;       /* free file nodes avail to non-root */
    u_int64_t   f_syncwrites;   /* count of sync writes since mount */
    u_int64_t   f_syncreads;    /* count of sync reads since mount */
    u_int64_t   f_asyncwrites;  /* count of async writes since mount */
    u_int64_t   f_asyncreads;   /* count of async reads since mount */
    fsid_t      f_fsid;         /* file system id */
    u_int32_t   f_namemax;      /* maximum filename length */
    uid_t       f_owner;        /* user that mounted the file system */
    u_int32_t   f_ctime;        /* last mount [-u] time */
    u_int32_t   f_spare[3];     /* spare for later */
    char        f_fstypename[MFSNAMELEN]; /* fs type name */
    char        f_mntonname[MNAMELEN];    /* directory on which mounted */
    char        f_mntfromname[MNAMELEN];  /* mounted file system */
    union mount_info mount_info;          /* per-filesystem mount options */
};

Functions Related to size Fields

#include <unistd.h>

int
truncate(const char *path, off_t length);

int
ftruncate(int fd, off_t length);
/* Both return 0 if OK, -1 on error */

Truncate an existing file to length bytes.

If the previous size was greater than length, the data beyond
length is no longer accessible.

If the previous size was less than length, the effect is system
dependent (a hole may be created).

Inodes and Hard Links

struct stat {
    ino_t       st_ino;     /* inode's number */
    nlink_t     st_nlink;   /* number of hard links to the file */
    ...
};

For a better understanding of these members, it is advisable
to recall the difference between an inode and a directory entry
that refers to an inode.

This will also be important when we talk about
the concept of links to a file.

Therefore we will have a look at the traditional UNIX
filesystem organization.

Disk drive, Partitions, and Filesystem

Figure: A disk drive being divided into one or more partitions. Each
partition can contain a filesystem. Inodes are fixed-length entries that
contain most of the information about a file. [Apue, Fig. 4.13]

A Closer Look at the Filesystem

Figure: Cylinder group's inodes and data blocks in more detail. Two
directory entries point to the same inode. [Apue, Fig. 4.14]

The Link Count

Every inode has a link count.

It contains the number of dir entries pointing to the inode.

Only when the link count goes to zero can the file be deleted
(i.e., can the associated data blocks be released).

This is why unlinking a file does not always mean deleting the
blocks associated with the file.

This is why the function that removes a directory entry is
called unlink(2) and not delete.

This type of link is called a hard link (vs. symbolic link).

In the stat structure the link count is called st_nlink and
has type nlink_t with limit LINK_MAX.

Make a Hard File Link

#include <unistd.h>

int /* 0 if OK, -1 on error */
link(const char *name1, const char *name2);

link creates the specified directory entry (hard link) name2
with the attributes of the object pointed at by name1.

If the link is successful: the link count of the underlying object
is incremented; name1 and name2 share equal access and
rights to the underlying object.

If name1 is removed, the file name2 is not deleted and the link
count of the underlying object is decremented.

name1 must exist for the hard link to succeed and both
name1 and name2 must be in the same file system. As
mandated by POSIX.1, name1 may not be a directory.

Remove a Directory Entry

#include <unistd.h>

int /* 0 if OK, -1 on error */
unlink(const char *path);

unlink removes the link named by path from its directory.

Link count of the file referenced by the link is decremented.

If that decrement reduces the link count of the file to zero,
and no process has the file open, then all resources associated
with the file are reclaimed.

If one or more processes have the file open when the last link
is removed, the link is removed, but the removal of the file is
delayed until all references to it have been closed.

Remove a File or Directory

#include <stdio.h>

int /* 0 if OK, -1 on error */
remove(const char *path);

remove removes the file or directory specified by path.

If path specifies a directory, this is equal to rmdir(path).

Otherwise, it is the equivalent of unlink(path).

Change the Name of a File

#include <stdio.h>

int /* 0 if OK, -1 on error */
rename(const char *from, const char *to);

rename causes the link named from to be renamed as to.

If to exists, it is first removed.

Both from and to must be of the same type (that is, both
directories or both non-directories), and must reside on the
same filesystem.

If the final component of from is a symbolic link, the symbolic
link is renamed, not the file or directory to which it points.

Notes on Renaming a File

When renaming a file without changing filesystems . . .

The actual content of the file need not be moved.

A new directory entry needs to point to the existing inode.

The old directory entry has to be removed.

Example: rename("/usr/lib/foo", "/usr/foo")

Assumption: /usr/lib and /usr are on the same filesystem.

Consequence: the contents of foo need not be moved.

This is how mv(1) usually operates.

File Metadata

The inode contains all metadata associated with a file:

File type

File's access permission bits

Size of the file

Pointers to the data blocks for the file

Most stat entries are obtained from the inode

However, filename and inode number (ino_t) are stored in the
directory entry.

struct dirent {
    ino_t d_ino;                /* inode number */
    char d_name[NAME_MAX + 1];  /* null-terminated filename */
};

On-disk Format of an inode (Example)

struct ufs1_dinode {
    u_int16_t   di_mode;        /*   0: IFMT, permissions; see below. */
    int16_t     di_nlink;       /*   2: File link count. */
    union {
        u_int16_t oldids[2];    /*   4: Ffs: old user and group ids. */
        u_int32_t inumber;      /*   4: Lfs: inode number. */
    } di_u;
    u_int64_t   di_size;        /*   8: File byte count. */
    int32_t     di_atime;       /*  16: Last access time. */
    int32_t     di_atimensec;   /*  20: Last access time. */
    int32_t     di_mtime;       /*  24: Last modified time. */
    int32_t     di_mtimensec;   /*  28: Last modified time. */
    int32_t     di_ctime;       /*  32: Last inode change time. */
    int32_t     di_ctimensec;   /*  36: Last inode change time. */
    int32_t     di_db[NDADDR];  /*  40: Direct disk blocks. */
    int32_t     di_ib[NIADDR];  /*  88: Indirect disk blocks. */
    u_int32_t   di_flags;       /* 100: Status flags (chflags). */
    int32_t     di_blocks;      /* 104: Blocks actually held. */
    int32_t     di_gen;         /* 108: Generation number. */
    u_int32_t   di_uid;         /* 112: File owner. */
    u_int32_t   di_gid;         /* 116: File group. */
    int32_t     di_spare[2];    /* 120: Reserved; currently unused */
};

Structures for Directory Handling

The structures dirent and DIR are implementation dependent

They are defined in the file <dirent.h>.

For simplicity you can think of those as:

#define NAME_MAX 255 /* longest filename component */

typedef struct {
    ino_t d_ino;                /* inode number */
    char d_name[NAME_MAX + 1];  /* null-terminated filename */
} dirent;

typedef struct {    /* minimal DIR: no buffering etc. */
    int fd;         /* file descriptor for directory */
    dirent d;       /* the directory entry */
} DIR;

Reading Directories

Actual format of a directory depends on the UNIX system and
the design of the filesystem.

To simplify access, a set of directory routines was developed
and is now part of SUSv3.

opendir, readdir, telldir, closedir, seekdir, and
rewinddir

The internal DIR is used by the six functions to maintain
information about the directory being read. [31]

The pointer to the internal DIR structure is returned by
opendir and then used with the other five functions.

[31] We will see a similar approach when discussing the standard library I/O
functions and the FILE structure.

Directory Functions

#include <sys/types.h>
#include <dirent.h>

DIR * /* Returns: pointer if OK, NULL on error */
opendir(const char *filename);

struct dirent * /* ptr or NULL at end of dir / error */
readdir(DIR *dirp);

long /* current location in directory stream */
telldir(const DIR *dirp);

int /* 0 if OK, -1 on error */
closedir(DIR *dirp);

void
seekdir(DIR *dirp, long loc);

void
rewinddir(DIR *dirp);

List Directory Contents

#include <dirent.h>
#include <err.h>
#include <errno.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    DIR *dp;
    struct dirent *dent;

    if (argc != 2)
        errx(1, "single arg (directory name) required.");
    if ((dp = opendir(argv[1])) == NULL)
        err(errno, "can not open %s", argv[1]);
    while ((dent = readdir(dp)) != NULL)
        printf("%s\n", dent->d_name);
    closedir(dp);

    return (0);
}

Functions Related to Directories

#include <sys/types.h>
#include <sys/stat.h>

int /* 0 if OK, -1 on error */
mkdir(const char *path, mode_t mode);

#include <unistd.h>

int /* 0 if OK, -1 on error */
rmdir(const char *path);

int /* 0 if OK, -1 on error */
chdir(const char *path);

int /* 0 if OK, -1 on error */
fchdir(int fd);

char * /* buf if ok, NULL on error */
getcwd(char *buf, size_t size);

Device Numbers

struct stat {
    dev_t st_dev;   /* device inode resides on */
    dev_t st_rdev;  /* device type, for special file inode */
    ...
};

A filesystem is known by its major and minor device number.

The major number identifies the device driver.

The minor number identifies the specific subdevice. [32]

Each filesystem on the same disk drive would usually have the
same major number, but a different minor number.

st_dev is the device number of the filesystem that contains
the filename and its corresponding inode.

Only character and block special files have an st_rdev value.
It contains the device number for the actual device.

[32] A disk drive often contains several filesystems

The Three Time Fields

struct stat {
    ...
    struct timespec st_atime;   /* last access */
    struct timespec st_mtime;   /* last data modification */
    struct timespec st_ctime;   /* last-change time of inode */
};

Three time fields are maintained for each file:

Many operations (changing the file access permissions,
changing the user id, changing the number of links . . . ) do
affect the inode without changing the actual content.

Since inode and content are stored separately we need both.

Note that there is no last-access time for an inode.

This is why functions like access(2) and stat(2) do not
change any of the three values.

Systems Programming
08. Standard I/O Library

Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule for Today

Last lecture: File I/O

Low-Level I/O:
Unbuffered I/O and control functions on file descriptors.

Filesystem Interface:
Functions for operating on directories and for manipulating
file attributes such as access modes and ownership.

Today: Standard I/O Library

Standard I/O library (ISO C) aka I/O on streams.
High-level functions that operate on streams, including
formatted input and output.

Discussion of the lecture evaluation.

Discussion of project part I.

The Standard I/O Library

Input and output functionality of the ISO C standard library

Specified by the ISO C standard.

Has been implemented on many OSs other than UNIX.

Additional interfaces defined as extensions by SUSv3.

Handles details such as buffer allocation and performing I/O
in optimal-sized chunks (no need to worry about using the
correct buffer size).

Ease of use.

Initially written by Dennis Ritchie around 1975.

Streams and FILE Objects

Unbuffered I/O: file descriptors

So far I/O centered around file descriptors.

When a file is opened a file descriptor is returned.

It was used for all subsequent I/O operations.

Standard I/O library: streams

Standard I/O centers around streams.

When opening or creating a file we say that we associate a
stream with the file (fopen(3) returns a pointer to FILE).

FILE contains all the information required by the standard
I/O library to manage the stream.

The FILE object

Typical members of the FILE structure

The file descriptor used for actual I/O.

A pointer to a buffer for the stream.

The size of the buffer.

Count of the number of characters currently in the buffer.

An error flag.

Incidental Remark

In general there is no need to examine a FILE object, just pass
the pointer as an argument to each standard I/O function.
A pointer with type FILE * is referred to as a file pointer.

Single- and Multibyte Character Sets

Standard I/O file streams can be used with single-byte and
multibyte (wide) character sets.

ASCII character set:
A single character is represented by a single byte.

International character sets:
A character can be represented by more than one byte.

A stream's orientation determines whether the characters that
are read and written are single-byte or multibyte.

Initially, when a stream is created, it has no orientation.

If a multibyte I/O function (<wchar.h>) is used on a stream
without orientation, the stream's orientation is set to
wide-oriented; a byte I/O function sets it to byte-oriented.

Buffering

Standard Input, Standard Output, and Standard Error


The standard I/O library provides buffering

3 predefined streams are automatically available to a process.

They refer to the same files as the file descriptors:


STDIN FILENO, STDOUT FILENO, and STDERR FILENO.

Goal is to minimize the number of read and write calls.

Buffering is tried to be automatically associated to streams.

They are referenced through the predefined file pointers


stdin, stdout, and stderr, defined in <stdio.h>.

Applications should not worry about it.

Different buffering modes can lead to confusions.

/* < stdio .h > */

Three types of buffering provided by the standard I/O library

__BEGIN_DECLS
extern FILE __sF [];
__END_DECLS
# define stdin
# define stdout
# define stderr

(& __sF [0])


(& __sF [1])
(& __sF [2])

451

Fully buffered

Line buffered

Unbuffered

452

Fully (block) buffered I/O

The fully buffered I/O provided by the standard I/O library:
    Actual I/O takes place when the standard I/O buffer is filled.
    Files residing on disk are normally fully buffered by the library.
    The buffer is obtained by one of the I/O functions.
    Usually by calling malloc(3) the first time I/O takes place.
    The term flush describes the writing of a standard I/O buffer.
    A buffer can be flushed automatically by the standard I/O routines such as when a buffer fills.
    Explicitly, by using the function fflush(3).

Line buffered I/O

Line buffered I/O provided by the standard I/O library:
    Actual I/O takes place when a newline character is encountered on input or output.
    This allows us to output a single character at a time (e.g., with fputc(3)), knowing that actual I/O will take place only when we finish writing each line.
    Line buffering is typically used on a stream when it refers to a terminal (e.g., standard input and standard output).
    However, the size of the buffer is fixed, so I/O might take place if the buffer is filled before a newline is seen.

Unbuffered I/O

Unbuffered I/O:
    The standard I/O library does not buffer the characters.
    When an output stream is unbuffered, information appears on the destination file/terminal as soon as it is written (with write(2)).
    Standard error stream is normally unbuffered.
    Any error messages are displayed as quickly as possible (regardless of whether they contain a newline or not).

ISO C Buffering Requirements

ISO C requires the following buffering characteristics:
    Standard input and output are fully buffered, if and only if they do not refer to an interactive device.
    Standard error is never fully buffered.

Should standard input and output be unbuffered or line buffered, if they refer to an interactive device? Should standard error be line buffered or unbuffered?

System dependent (for instance OpenBSD)
    If stdin and stdout refer to a terminal they are line buffered.
    Standard error is initially unbuffered.

Turn Buffering On and Off

#include <stdio.h>

void
setbuf(FILE *stream, char *buf);

setbuf turns buffering on or off.
It may be implemented similar to:

/* /usr/src/lib/libc/stdio/setbuf.c */
#include <stdio.h>
void
setbuf(FILE *fp, char *buf)
{
	(void) setvbuf(fp, buf, buf ? _IOFBF : _IONBF, BUFSIZ);
}

Alter Buffering Behaviour

#include <stdio.h>

int	/* 0 if OK else EOF (but stream is still functional) */
setvbuf(FILE *stream, char *buf, int mode, size_t size);

setvbuf is used to alter the buffering behavior of a stream.
May only be used after successful open and before first I/O.
mode must be one of the following three macros:

#define _IOFBF 0	/* setvbuf should set fully buffered */
#define _IOLBF 1	/* setvbuf should set line buffered */
#define _IONBF 2	/* setvbuf should set unbuffered */

    For an unbuffered stream, buf and size are ignored.
    For line or fully buffered streams
        buf and size can optionally specify a buffer and its size.
        If buf is NULL the system chooses an apt size. [33]

[33] System-dependent: BUFSIZ (stdio.h), st_blksize (stat.h)
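The three buffering modes can be compared by checking how much data has reached the file right after a write, before any flush. The following is a minimal sketch assuming a POSIX system (fileno and fstat are used to ask the kernel for the file size); the helper name bytes_visible_after_write is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: writes text to a fresh temporary file using the
 * given buffering mode (_IOFBF, _IOLBF or _IONBF) and returns how many
 * bytes have reached the file before any explicit flush.
 * Returns -1 on error.
 */
long
bytes_visible_after_write(int mode, const char *text)
{
	struct stat st;
	FILE *fp;
	long visible = -1;

	assert(text != NULL);
	if ((fp = tmpfile()) == NULL)
		return -1;
	/* setvbuf must come after the open and before the first I/O. */
	if (setvbuf(fp, NULL, mode, BUFSIZ) == 0 &&
	    fputs(text, fp) != EOF &&
	    fstat(fileno(fp), &st) == 0)
		visible = (long)st.st_size;
	fclose(fp);
	return visible;
}
```

With a short text, a fully buffered stream should show nothing in the file yet, a line buffered stream only shows data once a newline was written, and an unbuffered stream shows every byte immediately.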

Buffer Options and Flushing a Stream

The setbuf and setvbuf functions and their options:

Function  mode    buf        Buffer & length              Type of buffering
setbuf            nonnull    user buf of length BUFSIZ    fully buffered or line buffered
                  NULL       (no buffer)                  unbuffered
setvbuf   _IOFBF  nonnull    user buf of length size      fully buffered
                  NULL       system buffer of apt length  fully buffered
          _IOLBF  nonnull    user buf of length size      line buffered
                  NULL       system buffer of apt length  line buffered
          _IONBF  (ignored)  (no buffer)                  unbuffered

At any time, a stream can be flushed:

#include <stdio.h>

int	/* 0 if OK, EOF on failure and errno set */
fflush(FILE *stream);

    Any unwritten data for the stream is passed to the kernel.
    If stream is NULL, all output streams are flushed.

Opening a Stream

#include <stdio.h>

FILE *
fopen(const char *path, const char *mode);

FILE *
freopen(const char *path, const char *mode, FILE *stream);

FILE *	/* all: fpointer if OK, NULL on failure with errno */
fdopen(int fildes, const char *mode);

fopen opens a specified file.
freopen opens a specified file on a specified stream.
    The original stream (if it exists) is always closed.
    Change the file associated with stderr, stdin, stdout.
fdopen is part of POSIX.1, not ISO C, as standard I/O does not deal with file descriptors.
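The effect of fflush(3) on a stream opened with fopen can be made visible by asking the kernel for the file size before and after the flush. The following is a minimal sketch assuming a POSIX system with a writable /tmp; the helper name size_after_write is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes msg to a newly opened (fully buffered)
 * stream and returns the file size as seen by the kernel, either
 * without a flush (do_flush == 0) or after fflush(3).
 * Returns -1 on error.
 */
long
size_after_write(const char *msg, int do_flush)
{
	char path[] = "/tmp/fflushXXXXXX";
	struct stat st;
	FILE *fp;
	long size = -1;
	int fd;

	assert(msg != NULL);
	if ((fd = mkstemp(path)) < 0)
		return -1;
	close(fd);
	if ((fp = fopen(path, "w")) != NULL) {
		if (fputs(msg, fp) != EOF &&
		    (!do_flush || fflush(fp) == 0) &&
		    stat(path, &st) == 0)
			size = (long)st.st_size;
		fclose(fp);
	}
	unlink(path);
	return size;
}
```

For a message shorter than the stdio buffer, the file should still be empty without the flush and contain the whole message after it.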

Modes to Open a Standard I/O Stream

ISO C specifies 15 values for opening a standard I/O stream:

mode               Description
r or rb            open for reading
w or wb            truncate to 0 length or create for writing
a or ab            append; open for writing at end of file, or create for writing
r+ or r+b or rb+   open for reading and writing
w+ or w+b or wb+   truncate to 0 length or create for reading and writing
a+ or a+b or ab+   open or create for reading and writing at end of file

The b allows text and binary files to be differentiated.
UNIX kernels do not differentiate between these types of files, so it has no effect.

Meanings of Open Modes with fdopen

fdopen associates a stream with an existing file descriptor.
With fdopen, the meaning of mode differs slightly.

Opening for write does not truncate the file.
    Example: Descriptor was created by open(2).
    The file already existed.
    The O_TRUNC flag controls whether the file is to be truncated.
    fdopen cannot simply truncate any file it opens for writing.

Opening for append cannot create the file.
    The file has to exist if a descriptor refers to it.
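That truncation is decided by open(2) and not by fdopen can be checked directly. The following is a minimal sketch assuming a POSIX system with a writable /tmp; the helper name size_after_fdopen_w is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates a file holding initial, reopens it with
 * open(2) (optionally with O_TRUNC), wraps the descriptor with
 * fdopen(fd, "w") and returns the resulting file size.
 * Returns -1 on error.
 */
long
size_after_fdopen_w(const char *initial, int use_o_trunc)
{
	char path[] = "/tmp/fdopenXXXXXX";
	struct stat st;
	FILE *fp;
	size_t len;
	long size = -1;
	int fd;

	assert(initial != NULL);
	len = strlen(initial);
	if ((fd = mkstemp(path)) < 0)
		return -1;
	if (write(fd, initial, len) == (ssize_t)len) {
		close(fd);
		/* The descriptor, not fdopen, decides about truncation. */
		fd = open(path, O_WRONLY | (use_o_trunc ? O_TRUNC : 0));
		if (fd >= 0 && (fp = fdopen(fd, "w")) != NULL) {
			fclose(fp);	/* closes fd as well */
			fd = -1;
			if (stat(path, &st) == 0)
				size = (long)st.st_size;
		}
	}
	if (fd >= 0)
		close(fd);
	unlink(path);
	return size;
}
```

Mode "w" alone leaves the existing contents in place; only the O_TRUNC flag on the descriptor empties the file.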

Appending to a Stream

Append mode guarantees atomic operation
    Opening for append: Each write is at the current end of file.
    If multiple processes open the same file using append mode [34], data from each process will be correctly written to the file.

Older versions of UNIX didn't support the append mode, so programs were coded as follows:

if (lseek(fd, 0L, 2) < 0)		/* position to EOF ... */
	err(errno, "lseek error");
if (write(fd, buf, 100) != 100)		/* ... and write */
	err(errno, "write error");

This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file (appending messages to a logfile, for instance).

[34] Same holds for O_APPEND with open(2) function.

Sharing a Single File

Figure: Two processes with the same file open [Apue, Fig. 3.7]

Lost Update on Append

Assume processes A and B append to the same file. [35]
Each process has its own file table entry, but they share a single v-node table entry (see Figure 3.7).

A performs lseek to EOF and sets current file offset to 1500.
Kernel switches and schedules B to run.
B performs lseek to 1500 (EOF).
B performs write and increments current file offset to 1600.
Since the file size has been extended, the kernel also updates the current file size in the v-node to 1600.
Kernel switches and A resumes.
When A calls write, the data is written at current file offset for A, which is 1500.
    Data written by process B is overwritten.
    => Lost update.

[35] Without using append mode.

Atomic Operation with Append Mode

Problem: The logical operation "position to EOF and write" requires two separate function calls.
Solution: Positioning to the current end of file and the write have to be one atomic operation with regard to other processes.
Any operation that requires more than one function call cannot be atomic (kernel can suspend the process in-between).

Append mode provides this atomicity.
Each time a write is performed for a file with this append flag set, the current file offset in the file table entry is first set to the current file size from the i-node table entry.
Each and every write appends to the (updated) current end of file.
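The lost update can be reproduced deterministically in a single process by letting two independently opened descriptors stand in for processes A and B. The following is a minimal sketch assuming a POSIX system with a writable /tmp; the helper name two_writers is hypothetical, and the fixed interleaving only imitates what the scheduler might do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: two descriptors on the same file each write one
 * record of reclen bytes, either with the old "lseek to EOF, then write"
 * technique or with O_APPEND. Returns the final file size, -1 on error.
 */
off_t
two_writers(int use_append, size_t reclen)
{
	char path[] = "/tmp/appendXXXXXX";
	char rec[256];
	struct stat st;
	int flags = O_WRONLY | (use_append ? O_APPEND : 0);
	int fd, a, b, ok;
	off_t size = -1;

	assert(reclen > 0 && reclen <= sizeof(rec));
	memset(rec, 'x', sizeof(rec));
	if ((fd = mkstemp(path)) < 0)
		return -1;
	close(fd);

	a = open(path, flags);	/* stands in for process A */
	b = open(path, flags);	/* stands in for process B */
	ok = a >= 0 && b >= 0;
	/* Old technique: both position to EOF before either one writes. */
	if (ok && !use_append)
		ok = lseek(a, 0, SEEK_END) >= 0 && lseek(b, 0, SEEK_END) >= 0;
	/* B runs between A's positioning and A's write. */
	ok = ok && write(b, rec, reclen) == (ssize_t)reclen;
	ok = ok && write(a, rec, reclen) == (ssize_t)reclen;
	if (ok && stat(path, &st) == 0)
		size = st.st_size;
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	unlink(path);
	return size;
}
```

Without append mode the second record overwrites the first, so only one record survives; with O_APPEND both records end up in the file.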

Final Remarks on Opening a Stream

Six different ways to open a standard I/O stream:

Restriction                           r   w   a   r+  w+  a+
file must already exist               x           x
previous contents of file discarded       x           x
stream can be read                    x           x   x   x
stream can be written                     x   x   x   x   x
stream can be written only at end             x           x

Any created files will have mode S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (0666), as modified by the process's umask.
With a file opened for reading and writing (+ sign in mode) reads and writes cannot be arbitrarily intermixed.
    Output shall not be directly followed by input without an intervening fflush or repositioning. Input shall not be followed by output without repositioning.
When creating a new file with mode w or a, there is no way to specify the file's access permission bits, as there is with open(2).

Closing a stream with fclose(3)

#include <stdio.h>

int	/* 0 if OK, else EOF/errno (no further access) */
fclose(FILE *stream);

An open stream is closed by calling fclose.
    Any buffered output data is flushed before the file is closed.
    Any buffered input data is discarded.
    Any automatically allocated buffers are released.
When a process terminates normally (calling exit or returning from main), all open standard I/O streams are closed.
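The rule about intermixing output and input on a "+" stream can be illustrated with a write followed by a read of the same data. The following is a minimal sketch; the helper name roundtrip_matches is hypothetical, and it assumes text is a non-empty single line without a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes text to an update-mode stream, repositions
 * to the start and reads it back. Returns 1 if the data read equals the
 * data written, 0 otherwise.
 */
int
roundtrip_matches(const char *text)
{
	char back[256];
	FILE *fp;
	int ok = 0;

	assert(text != NULL && strlen(text) < sizeof(back));
	if ((fp = tmpfile()) == NULL)	/* tmpfile opens in "wb+" mode */
		return 0;
	if (fputs(text, fp) != EOF) {
		/*
		 * Output must not be directly followed by input:
		 * rewind repositions the stream (flushing the output).
		 */
		rewind(fp);
		if (fgets(back, sizeof(back), fp) != NULL)
			ok = strcmp(back, text) == 0;
	}
	fclose(fp);	/* flushes and releases the stream */
	return ok;
}
```

Leaving out the rewind would make the read start at the end of the data just written, so nothing would be read back.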

Reading and Writing a Stream

Once a stream is opened, there are three different types of I/O:
    Character-at-a-time I/O. Read and write one character at a time, with the standard I/O functions handling all the buffering (if the stream is buffered).
    Line-at-a-time I/O. To read or write a line at a time, we use fgets(3) and fputs(3). Each line is terminated with a newline character, and we have to specify the maximum line length we can handle.
    Direct I/O [36]. Provided by fread(3) and fwrite(3). For each operation we read or write some number of objects, where each object is of specified size.

These types of I/O are referred to as unformatted I/O. Formatted I/O is done by functions such as printf or scanf.

[36] aka binary I/O, object-at-a-time I/O, record/structure-oriented I/O

Character-at-a-time Input Functions

#include <stdio.h>

int
fgetc(FILE *stream);
int
getc(FILE *stream);
int	/* equivalent to getc() with the argument stdin. */
getchar(void);

Return the next requested object from the stream.
    Next character as an unsigned char converted to int.
The input functions return the same value whether an error occurs or EOF (feof and ferror are used to distinguish).
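A typical read loop stores the result of getc in an int and checks ferror afterwards, because EOF alone does not say whether the end of file or an error was hit. The following is a minimal sketch; count_chars and the driver count_chars_of are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Counts the characters remaining on fp.
 * Returns the count, or -1 if reading stopped because of an error.
 */
long
count_chars(FILE *fp)
{
	long n = 0;
	int c;		/* int, not char: must be able to hold EOF */

	assert(fp != NULL);
	while ((c = getc(fp)) != EOF)
		n++;
	/* EOF is returned in both cases; ferror tells them apart. */
	return ferror(fp) ? -1 : n;
}

/* Hypothetical driver: stores text in a temporary file and counts it. */
long
count_chars_of(const char *text)
{
	FILE *fp;
	long n = -1;

	assert(text != NULL);
	if ((fp = tmpfile()) == NULL)
		return -1;
	if (fputs(text, fp) != EOF) {
		rewind(fp);
		n = count_chars(fp);
	}
	fclose(fp);
	return n;
}
```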

Check Stream Status

#include <stdio.h>

int
feof(FILE *stream);	/* non-zero if it is set */

int
ferror(FILE *stream);	/* non-zero if it is set */

void
clearerr(FILE *stream);

Most implementations have two flags for each stream in FILE.
    An error flag. An end-of-file flag.
Both flags are cleared by clearerr.

Push-Back Characters

#include <stdio.h>

int	/* c if OK, EOF on failure */
ungetc(int c, FILE *stream);

Pushed-back characters are returned by subsequent reads on the stream in reverse order of their pushing (FILO).
One character of push-back is guaranteed, but as long as there is sufficient memory, an effectively infinite amount of pushback is allowed.
If a character is successfully pushed back, the end-of-file indicator for the stream is cleared.
Pushing back EOF will fail and the stream remains unchanged.
Pushed characters don't get written back to file or device.
    They are kept in core.
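A common use of ungetc is to peek at the next character without consuming it. The following is a minimal sketch; peek_char and the driver peek_then_read_agree are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns the next character on fp without consuming it,
 * or EOF at end of file or on error.
 */
int
peek_char(FILE *fp)
{
	int c;

	assert(fp != NULL);
	if ((c = getc(fp)) != EOF && ungetc(c, fp) == EOF)
		return EOF;
	return c;
}

/*
 * Hypothetical driver: returns 1 if a peek and the following read both
 * yield the first character of text, 0 otherwise.
 */
int
peek_then_read_agree(const char *text)
{
	FILE *fp;
	int ok = 0;

	assert(text != NULL && text[0] != '\0');
	if ((fp = tmpfile()) == NULL)
		return 0;
	if (fputs(text, fp) != EOF) {
		rewind(fp);
		ok = peek_char(fp) == (unsigned char)text[0] &&
		    getc(fp) == (unsigned char)text[0];
	}
	fclose(fp);
	return ok;
}
```

Only one character is pushed back at a time here, which stays within the single character of push-back that is guaranteed.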

Character-at-a-time Output Functions

#include <stdio.h>

int
fputc(int c, FILE *stream);

int
putc(int c, FILE *stream);

int	/* All: c if OK, EOF/errno on failure */
putchar(int c);

The functions write the character c (converted to an unsigned char) to the output stream.
EOF is returned if a write error occurs, or if an attempt is made to write a read-only stream.

Line-at-a-time Input Functions

#include <stdio.h>

char *
fgets(char *str, int size, FILE *stream);

char *	/* should NEVER be used -> unknown buffer size */
gets(char *str);

/* Both return str if OK, NULL on EOF or error */

str specifies the address of the buffer to read the line into.
gets reads from stdin and fgets from stream.
With fgets the size of the buffer is specified.
The buffer is always null-terminated, i.e., at most size - 1 characters are read. If the line is longer, a partial line is returned. The next call will read what follows.
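How fgets splits a line that is longer than the buffer can be seen by counting the calls needed to consume it. The following is a minimal sketch; the helper name count_fgets_calls is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: stores text in a temporary file and reads it back
 * with fgets using a buffer of bufsize bytes. Returns the number of
 * fgets calls that delivered data, or -1 on error.
 */
int
count_fgets_calls(const char *text, int bufsize)
{
	char buf[64];
	FILE *fp;
	int calls = 0;

	assert(text != NULL);
	assert(bufsize >= 2 && bufsize <= (int)sizeof(buf));
	if ((fp = tmpfile()) == NULL)
		return -1;
	if (fputs(text, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	rewind(fp);
	/* Each call stores at most bufsize - 1 characters plus the null byte. */
	while (fgets(buf, bufsize, fp) != NULL)
		calls++;
	fclose(fp);
	return calls;
}
```

With a 4-byte buffer the line "abcdefg\n" arrives as "abc", "def" and "g\n"; a sufficiently large buffer returns it in one call.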

Line-at-a-time Output Functions

#include <stdio.h>

int	/* 0 on success and EOF on error */
fputs(const char *str, FILE *stream);
int	/* >=0 on success and EOF on error */
puts(const char *str);

fputs writes the string pointed to by str to the stream pointed to by stream.
puts writes the string str, and a terminating newline character, to the stream stdout.

Binary I/O

Motivation for binary I/O
    Read or write an entire structure at a time.
    With character-at-a-time functions, such as getc or putc, we have to loop through an entire structure.
    Line-at-a-time functions will not work.
        fputs stops writing when it hits a null byte.
        fgets won't work right on input with null or newline bytes.

Binary I/O Functions

#include <stdio.h>

size_t
fread(void *ptr, size_t size,
    size_t nmemb, FILE *stream);

size_t
fwrite(const void *ptr, size_t size,
    size_t nmemb, FILE *stream);
/* Return number of objects read or written */

fread reads nmemb objects, each size bytes long.
Input is taken from stream and stored at the location ptr.
Both return number of objects read or written.
    For read it can be less than nmemb if error occurs or EOF. [37]
    For write an error has occurred if it is not equal to nmemb.

[37] ferror and feof must be called to determine.

Binary I/O: Write an Array

The functions have two common cases:
    Read or write a binary array

float data[10];
if (fwrite(&data[2], sizeof(float), 4, fp) != 4)
	err(1, "fwrite error.");

size as the size of each element of the array.
nmemb as the number of elements.

Binary I/O: Write a Structure

Read or write a structure

struct tuple {
	unsigned int size;
	unsigned int level;
	enum kind kind;
	void *cnt;
} tup;
if (fwrite(&tup, sizeof(tup), 1, fp) != 1)
	err(1, "fwrite error.");

size as the size of structure.
nmemb as one (the number of objects to write).

Fundamental Problems with Binary I/O

Binary formats change between compilers and architectures
    Binary formats used to store multibyte integers and floating-point values differ among machine architectures.
    The offset of a member within a structure can differ between compilers and systems.
    Even on a single system, the binary layout of a structure can differ, depending on compiler options.
To exchange binary data among different systems a higher-level protocol is probably the better choice.
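Within one program on one machine, where the layout problems above do not arise, an array written with fwrite can be read back unchanged with fread. The following is a minimal sketch; the helper name floats_roundtrip and the limit MAX_FLOATS are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FLOATS 16	/* arbitrary limit for this sketch */

/*
 * Hypothetical helper: writes n floats to a temporary file and reads
 * them back. Returns n if all objects were read back identically,
 * 0 otherwise.
 */
size_t
floats_roundtrip(const float *data, size_t n)
{
	float back[MAX_FLOATS];
	FILE *fp;
	size_t nread = 0;

	assert(data != NULL && n <= MAX_FLOATS);
	if ((fp = tmpfile()) == NULL)
		return 0;
	/* For writes, anything other than nmemb signals an error. */
	if (fwrite(data, sizeof(float), n, fp) == n) {
		rewind(fp);
		/* For reads, a short count means EOF or error. */
		nread = fread(back, sizeof(float), n, fp);
		if (nread != n || memcmp(back, data, n * sizeof(float)) != 0)
			nread = 0;
	}
	fclose(fp);
	return nread;
}
```

The same file read by a program built for a different architecture or with different compiler options is not guaranteed to yield the same values.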

Positioning a Stream

There are three ways to position a standard I/O stream:
    ftell and fseek. File position stored as long. (Historic)
    ftello and fseeko. File position stored as off_t. (SUSv3)
    fgetpos and fsetpos. File position stored as fpos_t. (ISO C)

They work similar to lseek(2) and the whence options (SEEK_SET etc.) are the same.

Obtaining a File Descriptor

#include <stdio.h>

int	/* file descriptor assoc. with the stream */
fileno(FILE *stream);

On UNIX, the standard I/O library ends up calling the low-level I/O routines.
Each standard I/O stream has an associated file descriptor.
fileno can obtain the descriptor (SUSv3, not ISO C).
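Positioning and fileno can be combined to determine a file's size in two ways: by seeking on the stream, and by asking the kernel through the stream's descriptor. The following is a minimal sketch assuming a POSIX system; all three function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Size of the file behind fp, found by seeking; the position is restored. */
off_t
size_by_seeking(FILE *fp)
{
	off_t saved, end;

	assert(fp != NULL);
	if ((saved = ftello(fp)) < 0 || fseeko(fp, 0, SEEK_END) != 0)
		return -1;
	end = ftello(fp);
	if (fseeko(fp, saved, SEEK_SET) != 0)
		return -1;
	return end;
}

/* The same information from the kernel, via the stream's descriptor. */
off_t
size_by_descriptor(FILE *fp)
{
	struct stat st;

	assert(fp != NULL);
	/* Flush first so the kernel has seen all buffered output. */
	if (fflush(fp) != 0 || fstat(fileno(fp), &st) != 0)
		return -1;
	return st.st_size;
}

/* Hypothetical driver: returns 1 if both methods report expected. */
int
sizes_agree(const char *text, off_t expected)
{
	FILE *fp;
	int ok;

	assert(text != NULL);
	if ((fp = tmpfile()) == NULL)
		return 0;
	ok = fputs(text, fp) != EOF &&
	    size_by_seeking(fp) == expected &&
	    size_by_descriptor(fp) == expected;
	fclose(fp);
	return ok;
}
```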

Systems Programming
09. Filesystem in USErspace (FUSE)

Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule for Today

Last lectures: File I/O
    Unbuffered I/O and control functions on file descriptors.
    Functions for operating on directories and for manipulating file attributes such as access modes and ownership.
    I/O on streams, i.e., standard I/O library (ISO C).

Today: Filesystem in USErspace (FUSE)
    Short introduction to FUSE with practical examples.
    Stepwise discussion of project part III.
        Use relational storage as filesystem implementation.

FUSE stands for Filesystem in USErspace

FUSE stands for Filesystem in USErspace
    It provides a framework for building userspace filesystem servers, i.e., implement filesystems with userspace code.
    Userlevel FS have been around before (on Linux NFS was implemented that way for quite some time).
    The basic idea is to integrate information behind the filesystem namespace (GmailFS, FUSEPod ...).
    From a programmer's perspective FUSE provides a library and defines a standard and a low-level interface to use it.
    A FUSE-supported system integrates kernel and userlevel components.

Ports of the Userspace Filesystem Interface

While the interface definition originated in LINUX [38], support has since been added to multiple different UNIX-type OSs:
    LINUX
        http://fuse.sourceforge.net/
    NetBSD/puffs/reFUSE
        http://www.netbsd.org/docs/puffs/
    FreeBSD/fuse4bsd
        http://fuse4bsd.creo.hu/
    Mac OS X/MacFUSE
        http://code.google.com/p/macfuse/
    OpenSolaris
        http://opensolaris.org/os/project/fuse/

[38] http://kerneltrap.org/node/4517, Miklos announces FUSE (LKML)

FUSE-based FS Implementations

sshfs - mount a SSH filesystem

$ sshfs username@hostname:/path/to/mount \
> ~/mnt/ssh/ -o uid=1000,gid=1000
$ fusermount -u ~/mnt/ssh/

NTFS-3G (http://www.ntfs-3g.org)
Comprehensive list of FUSE-based FSs.
    http://fuse.sourceforge.net/wiki/index.php/FileSystems
    ArchiveFSs, CompressedFSs, DatabaseFSs, EncryptedFSs, MediaFSs, HardwareFSs, MonitoringFSs, NetworkFSs, NonNativeFSs, UnionFSs, VersioningFSs

The Big Picture

How do userspace FSs operate?
    Attach an in-kernel filesystem (component/module) to the kernel's virtual filesystem layer.
    It prepares incoming requests for delivery to userspace.
    Sends the request to userspace.
    Waits for a response.
    Interprets the answers.
    Feeds the results back to the caller in the kernel.

The Big Picture (Illustrated)

Figure: Path of a filesystem call (e.g., stat) [FUSE project page]

Communication via Special Device

Special file descriptor /dev/fuse
    The FUSE kernel module and the FUSE library communicate via a special file descriptor which is obtained by opening /dev/fuse.
    This file can be opened multiple times, and the obtained file descriptor is passed to the mount syscall, to match up the descriptor with the mounted filesystem.

FUSE-based FS Implementation Overview

How do userspace FSs operate (implementation view)?
    A userlevel file server registers a number of callbacks with the userlevel library.
    It requests the kernel to mount the filesystem.
    Control is either passed to the library or kept with the caller.
    The library provides routines to decode filesystem requests from the kernel.
    The library calls back the appropriate registered functions.
    The library passes back the results to the kernel.

Restrictions and Possibilities

Incidental Remarks
    The kernel filesystem calling conventions dictate how to interface with the virtual filesystem layer.
    Other than that, myFUSE is free to decide how to operate.
    myFUSE is free to provide other interfaces to userspace.
    Applications and the rest of the kernel (outside the VFS module) cannot distinguish a filesystem implemented on top of FUSE from a filesystem implementation in the kernel.

Standard and Low Level Interface

For the filesystem callbacks, FUSE provides two different interfaces against which to write a filesystem:
    The standard interface based on pathnames.
        Operations resemble system calls.
    The low level interface.
        It resembles the kernel virtual filesystem interface closely.
        Requires the filesystem to manually handle all operation traffic between filesystem and kernel.

FUSE Summary

A. Kernel Module [39]
    The kernel module hooks into the VFS code and looks like a filesystem module.
    It implements a special-purpose device which can be opened by a userspace process.
    It spends its time accepting filesystem requests.
    Translates them into its own protocol.
    Sends them out via the device interface.
    Responses to requests come back from userspace via the FUSE device.
    They are translated back into the form expected by the kernel.

[39] Jonathan Corbet summarizes FUSE as a three-part system http://lwn.net/Articles/68104/

FUSE Summary (cont.)

B. FUSE library
    FUSE implements a library which manages communications with the kernel module.
    It accepts filesystem requests from the FUSE device.
    Translates them into a set of function calls which look similar (but not identical) to the kernel's VFS interface.
    These functions have names like open(), read(), write(), rename(), symlink(), etc.

FUSE Summary (cont.)

C. FUSE-based FS implementation
    The user-supplied component which actually implements the filesystem of interest.
    It fills a fuse_operations structure with pointers to its functions to register for callbacks:

static struct fuse_operations myfs_ops = {
	.getattr	= myfs_getattr,
	.readdir	= myfs_readdir,
	.open		= myfs_open,
	.read		= myfs_read,
};

    Those implement the required operations in whatever way makes sense.

Documentation and Further Information

Where to start and where to look?
    The interfaces are well documented (within the header files).
    Some example filesystems are provided with the FUSE code.
    Hello FUSE (standard interface): hello.c
    Hello FUSE LL (low-level interface): hello_ll.c

Hello FUSE standard interface example

/*
  FUSE: Filesystem in Userspace
  Copyright (C) 2001-2007 Miklos Szeredi <miklos@szeredi.hu>
  This program can be distributed under the terms of the GNU GPL.
  See the file COPYING.

  gcc -Wall `pkg-config fuse --cflags --libs` hello.c -o hello
*/

Hello FUSE (hello.c)

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

static const char *hello_str = "Hello World!\n";
static const char *hello_path = "/hello";

static struct fuse_operations hello_oper = {
	.getattr	= hello_getattr,
	.readdir	= hello_readdir,
	.open		= hello_open,
	.read		= hello_read,
};

int
main(int argc, char *argv[])
{
	return fuse_main(argc, argv, &hello_oper, NULL);
}

Hello FUSE (hello.c, cont.)

static int
hello_getattr(const char *path, struct stat *stbuf)
{
	int res = 0;

	memset(stbuf, 0, sizeof(struct stat));
	if (strcmp(path, "/") == 0) {
		stbuf->st_mode = S_IFDIR | 0755;
		stbuf->st_nlink = 2;
	} else if (strcmp(path, hello_path) == 0) {
		stbuf->st_mode = S_IFREG | 0444;
		stbuf->st_nlink = 1;
		stbuf->st_size = strlen(hello_str);
	} else
		res = -ENOENT;

	return res;
}

Hello FUSE (hello.c, cont.)

static int
hello_readdir(const char *path, void *buf,
    fuse_fill_dir_t filler, off_t offset,
    struct fuse_file_info *fi)
{
	(void) offset;
	(void) fi;

	if (strcmp(path, "/") != 0)
		return -ENOENT;
	filler(buf, ".", NULL, 0);
	filler(buf, "..", NULL, 0);
	filler(buf, hello_path + 1, NULL, 0);
	return 0;
}

Hello FUSE (hello.c, cont.)

static int
hello_open(const char *path, struct fuse_file_info *fi)
{
	if (strcmp(path, hello_path) != 0)
		return -ENOENT;
	if ((fi->flags & 3) != O_RDONLY)	/* 3 masks the access mode bits */
		return -EACCES;

	return 0;
}

Hello FUSE (hello.c, cont.)

static int
hello_read(const char *path, char *buf,
    size_t size, off_t offset,
    struct fuse_file_info *fi)
{
	size_t len;
	(void) fi;
	if (strcmp(path, hello_path) != 0)
		return -ENOENT;

	len = strlen(hello_str);
	if (offset < len) {
		if (offset + size > len)
			size = len - offset;
		memcpy(buf, hello_str + offset, size);
	} else
		size = 0;

	return size;
}
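The core of hello_read is the clamping of the requested range to the length of the in-memory content. It can be studied without FUSE as a small pure function. The following is a minimal sketch; the name clamped_size is hypothetical, and the comparison is written as size > len - offset (rather than offset + size > len as in hello.c) to avoid overflow for very large requests.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: number of bytes a read of size bytes at offset
 * may return from content that is len bytes long (0 at or past the end).
 */
size_t
clamped_size(size_t len, size_t size, off_t offset)
{
	assert(offset >= 0);
	if ((size_t)offset >= len)
		return 0;			/* read starts at or past EOF */
	if (size > len - (size_t)offset)
		size = len - (size_t)offset;	/* shorten to what is left */
	return size;
}
```

For the 13-byte string "Hello World!\n", a 4096-byte request at offset 0 yields 13 bytes, and a request starting at offset 13 yields none, which the caller sees as end of file.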

Project Part Three: DBFS

Use relational storage as backend for a FUSE implementation
    We use the low level interface to connect DB and FS via pre and ino, respectively.
    The implementation is read-only.
Mount the shredded file hierarchy (from part two) as FUSE.
Optional Part: Prolong the file hierarchy on the XML files' inherent structure.
    Whenever you encounter an XML file, shred it (project part one) into file hierarchy kind FXML.
    Fake stat information for ELEM etc. where appropriate.
    Content of TEXT nodes may appear as symbolic links.

Demonstration of Expected Result

A Stepwise Solution of Project Part Three

Access to FUSE supported OS.
    Send me an e-mail for access to compute server (titan14).
Take hello_ll.c as template.
Have a look at fuse_lowlevel.h.
Implement functions as demonstrated during lecture.

Low Level Functions

We will need to implement the following functions:

static struct fuse_lowlevel_ops dbfs_ll_oper = {
	.lookup		= dbfs_ll_lookup,
	.getattr	= dbfs_ll_getattr,
	.readdir	= dbfs_ll_readdir,
	.open		= dbfs_ll_open,
	.read		= dbfs_ll_read,
};

    lookup - Lookup a dir entry by name and get its attributes.
    getattr - Get file attributes
    readdir - Read directory
    open - Open a file
    read - Read data

Lookup a Directory Entry by Name

/* fuse_lowlevel.h */
/**
 * Look up a directory entry by name and get its attributes.
 *
 * Valid replies:
 *   fuse_reply_entry
 *   fuse_reply_err
 *
 * @param req request handle
 * @param parent inode number of the parent directory
 * @param name the name to look up
 */
void (*lookup) (fuse_req_t req,
    fuse_ino_t parent,
    const char *name);

Get File Attributes

/* fuse_lowlevel.h */
/**
 * Get file attributes
 *
 * Valid replies:
 *   fuse_reply_attr
 *   fuse_reply_err
 *
 * @param req request handle
 * @param ino the inode number
 * @param fi for future use, currently always NULL
 */
void (*getattr) (fuse_req_t req,
    fuse_ino_t ino,
    struct fuse_file_info *fi);

Read Directory

/* fuse_lowlevel.h */
/**
 * Read directory
 *
 * Send a buffer filled using fuse_add_direntry(), with size
 * not exceeding the requested size. Send an empty buffer on
 * end of stream.
 *
 * Valid replies:
 *   fuse_reply_buf
 *   fuse_reply_err
 *
 * @param req request handle
 * @param ino the inode number
 * @param size maximum number of bytes to send
 * @param off offset to continue reading the directory stream
 * @param fi file information
 */
void (*readdir) (fuse_req_t req, fuse_ino_t ino, size_t size,
    off_t off, struct fuse_file_info *fi);

Filling a Buffer in readdir

/* fuse_lowlevel.h */
/**
 * Add a directory entry to the buffer
 *
 * Buffer needs to be large enough to hold the entry.
 * If it's not, then the entry is not filled in, but
 * the size of the entry is still returned. The caller
 * can check this by comparing the bufsize parameter
 * with the returned entry size. If the entry size is
 * larger than the buffer size, the operation failed.
 */
size_t					/* return the space needed for the entry */
fuse_add_direntry(fuse_req_t req,	/* request handle */
    char *buf,				/* point to add new entry */
    size_t bufsize,			/* remaining size of buf */
    const char *name,			/* name of the entry */
    const struct stat *stbuf,		/* file atts */
    off_t off);				/* offset of the next entry */

Open a File

/* fuse_lowlevel.h */
/**
 * Open a file
 *
 * Open flags (with the exception of O_CREAT, O_EXCL,
 * O_NOCTTY and O_TRUNC) are available in fi->flags.
 *
 * ...
 * Valid replies:
 *   fuse_reply_open
 *   fuse_reply_err
 *
 * @param req request handle
 * @param ino the inode number
 * @param fi file information
 */
void (*open) (fuse_req_t req,
    fuse_ino_t ino,
    struct fuse_file_info *fi);

Read data

/* fuse_lowlevel.h */
/**
 * Read data
 *
 * Read should send exactly the number of bytes requested
 * except on EOF or error, otherwise the rest of the data
 * will be substituted with zeroes.
 * ...
 * Valid replies:
 *   fuse_reply_buf
 *   fuse_reply_err
 */
void (*read) (fuse_req_t req,		/* request handle */
    fuse_ino_t ino,			/* inode number */
    size_t size,			/* number of bytes to read */
    off_t off,				/* offset to read from */
    struct fuse_file_info *fi);		/* file info */

Miscellaneous

Definitions

/** The node ID of the root inode */
#define FUSE_ROOT_ID 1
/** Inode number type */
typedef unsigned long fuse_ino_t;
/** Request pointer type */
typedef struct fuse_req *fuse_req_t;

Caveats
    Beware of different struct stat sizes.
    FUSE compiles with -D_FILE_OFFSET_BITS=64
    fts(3) does not!
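The caveat about struct stat sizes comes from the feature-test macro changing the width of off_t, and with it the layout of struct stat, on systems where the default offset type is 32 bits wide. The following is a minimal sketch of how a translation unit can check what it was compiled with; the function name off_t_bits is hypothetical, and the effect of the macro is assumed to be as on glibc-based systems.

```c
#define _FILE_OFFSET_BITS 64	/* must precede every system header */
#include <assert.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Number of bits in off_t as seen by this translation unit. */
unsigned
off_t_bits(void)
{
	/* st_size is declared as off_t, so both must have the same width. */
	assert(sizeof(off_t) == sizeof(((struct stat *)0)->st_size));
	return (unsigned)(sizeof(off_t) * CHAR_BIT);
}
```

Code compiled without the macro on a 32-bit system may see a smaller off_t and a differently sized struct stat, so objects built both ways should not pass struct stat values to each other.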

Low Level Replies to Functions

The following replies are relevant to our functions:
    fuse_reply_err
    fuse_reply_attr
    fuse_reply_entry
    fuse_reply_buf
    fuse_reply_open

Reply with an Error Code or Success

/* fuse_lowlevel.h */
/**
 * Reply with an error code or success
 *
 * Possible requests:
 *   all except forget
 *
 * unlink, rmdir, rename, flush, release, fsync, fsyncdir,
 * setxattr, removexattr and setlk may send a zero code
 *
 * @param req request handle
 * @param err the positive error value, or zero for success
 * @return zero for success, -errno for failure to send reply
 */
int fuse_reply_err(fuse_req_t req, int err);

Reply with a Directory Entry

/* fuse_lowlevel.h */
/**
 * Reply with a directory entry
 *
 * Possible requests:
 *   lookup, mknod, mkdir, symlink, link
 *
 * @param req request handle
 * @param e the entry parameters
 * @return zero for success, -errno for failure to send reply
 */
int fuse_reply_entry(fuse_req_t req,
    const struct fuse_entry_param *e);

    struct fuse_entry_param is also in fuse_lowlevel.h.

Reply with Data

/* fuse_lowlevel.h */
/**
 * Reply with data
 *
 * Possible requests: read, readdir
 */
int				/* zero for success, -errno for failure */
fuse_reply_buf(fuse_req_t req,	/* request handle */
    const char *buf,		/* contains data */
    size_t size);		/* data size in bytes */

Reply with Attributes

/* fuse_lowlevel.h */
/**
 * Reply with attributes
 *
 * Possible requests:
 *   getattr, setattr
 *
 * @param req request handle
 * @param attr the attributes
 * @param attr_timeout validity timeout
 *        (in seconds) for the attributes
 * @return zero for success, -errno for failure to send reply
 */
int fuse_reply_attr(fuse_req_t req,
    const struct stat *attr,
    double attr_timeout);

Reply with Open Parameters

/* fuse_lowlevel.h */
/**
 * Reply with open parameters
 *
 * currently the following members of fi are used:
 *   fh, direct_io, keep_cache
 *
 * Possible requests:
 *   open, opendir
 *
 * @param req request handle
 * @param fi file information
 * @return zero for success, -errno for failure to send reply
 */
int fuse_reply_open(fuse_req_t req,
    const struct fuse_file_info *fi);

Systems Programming
10. Project Discussion

Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz

Summer Term 2008

Schedule for Today

Last lectures: Filesystem in USErspace (FUSE)
    Unbuffered I/O and control functions on file descriptors.
    Functions for operating on directories and for manipulating file attributes such as access modes and ownership.
    I/O on streams, i.e., standard I/O library (ISO C).
    Introduction to FUSE

Today: Project Discussion
    Present sample solution.
    Address common problems.
    Managing projects.

Project Overview

DBFS

Unifies different visitors and the FUSE interface to provide file system operations on a database.

Uses SAX to translate a given XML file into the database representation (see lecture 5 and assignment 7, project part I).

Uses FHT to translate a given file hierarchy into the database representation (see assignment 8, project part II).

The database representation can be serialized to an XML file.

FUSE is used to provide file system operations on the database representation (see lecture 9 and assignment 9, project part III).

DBFS Commands

Recap. Low Level Functions

The following functions had to be implemented:

static struct fuse_lowlevel_ops dbfs_ll_oper = {
    .lookup  = dbfs_ll_lookup,
    .getattr = dbfs_ll_getattr,
    .readdir = dbfs_ll_readdir,
    .open    = dbfs_ll_open,
    .read    = dbfs_ll_read,
};

lookup - Look up a directory entry by name and get its attributes.

getattr - Get file attributes; a wrapper for dbfs_stat.

readdir - Read a directory.

read - Read data.

open - Open a file; was already implemented.

dbfs_ll_getattr

Calls dbfs_stat to check the attributes.

Replies with an error if the requested ino is neither a file nor a directory.

Replies with the attributes set by dbfs_stat.

static void dbfs_ll_getattr
(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
{
    (void) fi;                      /* unused */
    struct stat stbuf;
    fprintf(stderr, "[getattr] ino: %lu\n", (unsigned long) ino);
    memset(&stbuf, 0, sizeof(stbuf));
    if (dbfs_stat(ino, &stbuf) == -1)
        fuse_reply_err(req, ENOENT);
    else
        fuse_reply_attr(req, &stbuf, 1.0);  /* attributes valid for 1 second */
}


dbfs_stat

static int
dbfs_stat(fuse_ino_t ino, struct stat *stbuf)
{
    enum kind_t kind = db_kind(ino);
    if (kind == FREG || kind == FDIR) {
        stbuf->st_ino    = ino;
        stbuf->st_mode   = db_st_mode(ino);
        stbuf->st_nlink  = db_st_nlink(ino);
        stbuf->st_uid    = db_st_uid(ino);
        stbuf->st_gid    = db_st_gid(ino);
        stbuf->st_rdev   = db_st_rdev(ino);
        stbuf->st_size   = db_st_size(ino);
        stbuf->st_blocks = db_st_blocks(ino);
        stbuf->st_atime  = db_st_atime(ino);
        stbuf->st_mtime  = db_st_mtime(ino);
        stbuf->st_ctime  = db_st_ctime(ino);
        return 0;
    }
    return -1;   /* neither a regular file nor a directory */
}

dbfs_ll_lookup

Looks up a directory entry by name and gets its attributes.

Receives the parent inode and the name of the entry to look up.

Get all the children of the parent.

Compare each child's name with the requested name.

If a name matches, reply with fuse_reply_entry.

Otherwise reply with fuse_reply_err.

static void dbfs_ll_lookup
(fuse_req_t req, fuse_ino_t parent, const char *name)
{
    unsigned int numchildren;
    unsigned int children[512];
    numchildren = db_children(children, 512, parent);
    unsigned int i;
    for (i = 0; i < numchildren; i++) {
        if (strcmp(name, db_cnt(children[i])) == 0) {
            struct stat sbuf;
            struct fuse_entry_param e;
            memset(&sbuf, 0, sizeof(sbuf));  /* dbfs_stat fills only some fields */
            memset(&e, 0, sizeof(e));
            if (dbfs_stat(children[i], &sbuf) == 0) {
                e.ino = children[i];
                e.attr_timeout = 1.0;
                e.entry_timeout = 1.0;
                e.attr = sbuf;
                fuse_reply_entry(req, &e);
                return;
            }
        }
    }
    fuse_reply_err(req, ENOENT);   /* no child with that name */
}

dbfs_ll_readdir

Reads a directory and fills a buffer with its children.

Check whether ino is a directory (FDIR).

Get all the children.

Fill the buffer.

Reply with reply_buf_limited and clean up.

If it is not a directory, reply with fuse_reply_err.

static void dbfs_ll_readdir
(fuse_req_t req, fuse_ino_t ino, size_t size,
 off_t off, struct fuse_file_info *fi)
{
    (void) fi;   /* unused; size and off are used below */
    if (db_kind(ino) == FDIR) {
        unsigned int numchildren;
        unsigned int children[512];
        numchildren = db_children(children, 512, ino);
        struct dirbuf dbuf;
        memset(&dbuf, 0, sizeof(dbuf));
        unsigned int i;
        for (i = 0; i < numchildren; i++) {
            dirbuf_add(
                req, &dbuf, db_cnt(children[i]), children[i]);
        }
        reply_buf_limited(req, dbuf.p, dbuf.size, off, size);
        free(dbuf.p);
    } else
        fuse_reply_err(req, ENOTDIR);
}

dbfs_ll_read

Reads data from a file. Sends at most the number of bytes requested.

Check whether ino is a regular file (FREG).

If it is a file, get its content, which by definition is stored at pre = ino + 1.

Reply with reply_buf_limited.

If it is not a file, reply with fuse_reply_err.

static void dbfs_ll_read
(fuse_req_t req, fuse_ino_t ino, size_t size,
 off_t off, struct fuse_file_info *fi)
{
    (void) fi;   /* unused */
    if (db_kind(ino) == FREG)
        reply_buf_limited(req, db_cnt(ino + 1),
                          db_st_size(ino), off, size);
    else
        fuse_reply_err(req, EIO);
}
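
The helper reply_buf_limited is called above but not shown on the slides. The following is a minimal sketch of the clamping it presumably performs, assuming it behaves like the helper of the same name in the FUSE low-level example hello_ll.c. The names struct slice and limit_reply are hypothetical, and a real handler would pass the result to fuse_reply_buf instead of returning it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result type: the part of a buffer that would be sent. */
struct slice {
    const char *data;   /* NULL when nothing is left to send */
    size_t      len;    /* number of bytes to send */
};

/*
 * Clamp a read request (off, maxsize) to a buffer of bufsize bytes.
 * Assumption: this mirrors what reply_buf_limited does before it
 * calls fuse_reply_buf().
 */
struct slice
limit_reply(const char *buf, size_t bufsize, off_t off, size_t maxsize)
{
    struct slice s = { NULL, 0 };

    assert(off >= 0);
    if ((size_t) off < bufsize) {
        size_t rest = bufsize - (size_t) off;   /* bytes after the offset */
        s.data = buf + off;
        s.len  = rest < maxsize ? rest : maxsize;
    }
    return s;   /* len == 0 signals end of data */
}
```

With an 11-byte buffer, a request at offset 6 for 100 bytes yields the remaining 5 bytes, and a request at offset 11 yields an empty reply, which the kernel interprets as end of file.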

Things to Remember

void pointer

points to an object of unspecified type

can't be dereferenced directly

allocated memory is already returned as (void *), no cast necessary

before dereferencing, it must be cast to the correct object type

#include <stdlib.h>

struct foo {
    int a;
    void *p;
} bar;

int main(void) {
    bar.p = malloc(sizeof(struct foo));   /* no cast needed */
    if (bar.p == NULL)
        return 1;
    ((struct foo *) bar.p)->a = 10;       /* cast before dereferencing */
    free(bar.p);
    return 0;
}

Things to Remember

Expressions and Evaluation

&& and || use short-circuit evaluation: && stops at the first operand that evaluates to FALSE, || at the first that evaluates to TRUE

C has no built-in definition of TRUE and FALSE; any nonzero value is TRUE, zero is FALSE

therefore you should never compare against TRUE or FALSE in expressions; use them only in assignments
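
The two rules above can be sketched as follows. The function names is_nonempty and starts_with_letter and the TRUE/FALSE macros are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define TRUE  1   /* project-defined; C itself only knows zero / nonzero */
#define FALSE 0

/*
 * Short-circuit evaluation: s[0] is read only if s != NULL holds,
 * so a NULL pointer is never dereferenced.
 */
int is_nonempty(const char *s)
{
    return s != NULL && s[0] != '\0';
}

int starts_with_letter(const char *s)
{
    int result = FALSE;            /* using FALSE in an assignment is fine */

    assert(s != NULL);
    /*
     * isalpha() returns some nonzero value for letters, not necessarily 1,
     * so test the value itself instead of writing "== TRUE".
     */
    if (isalpha((unsigned char) s[0]))
        result = TRUE;
    return result;
}
```

Calling is_nonempty(NULL) is safe and returns 0, because the right-hand operand of && is never evaluated.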


Organizing Files

Header

struct definitions

typedefs

function prototypes

global variables, unions, enums

constants

#defined macros

Source

actual implementation

The compiler makes no distinction between source and header files.

Organizing Files

Common Pitfalls

cyclic dependencies between two header files

missing #include in a source file

duplicate definitions

duplicate instances

You should always comment your files carefully. Include author, revision dates, and function descriptions (with parameters and return value).
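
The header and source contents listed above can be sketched as follows. Both parts are shown in one listing so that it compiles on its own; in a project they would be two files. The names counter.h, counter.c, counter_t, counter_new, counter_inc and counter_instances are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in counter.h (hypothetical) ---- */
#ifndef COUNTER_H            /* include guard: a second #include is a no-op */
#define COUNTER_H

#define COUNTER_MAX 3        /* constant */

typedef struct counter {     /* struct definition and typedef */
    int value;
} counter_t;

extern int counter_instances;   /* declaration only, reserves no storage */
counter_t counter_new(void);    /* function prototypes */
int counter_inc(counter_t *c);

#endif /* COUNTER_H */

/* ---- would live in counter.c (hypothetical) ---- */
int counter_instances = 0;      /* the single definition of the global */

/* Creates a counter starting at zero and records the new instance. */
counter_t counter_new(void)
{
    counter_t c = { 0 };
    counter_instances++;
    return c;
}

/* Increments up to COUNTER_MAX and returns the new value. */
int counter_inc(counter_t *c)
{
    assert(c != NULL);
    if (c->value < COUNTER_MAX)
        c->value++;
    return c->value;
}
```

The include guard avoids duplicate definitions when the header is included more than once, and declaring the global with extern in the header while defining it in exactly one source file avoids duplicate instances.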


High Level vs. Low Level

High Level

class structure

optimizer

get() and set() functions

no need to know memory layout

Low Level

to a certain point classes can be emulated

direct access faster

good to know the underlying architecture

optimizable with inline assembler

Development Environments for Unix/Linux

Vim

command line

supports syntax highlighting, code completion, build

plugins for file exploration

http://tldp.org/HOWTO/C-editing-with-VIM-HOWTO/index.html

Emacs, the programmer's editor

command line, GUI available

supports syntax highlighting, code completion, auto indentation, build

Development Environments for Unix/Linux

KDevelop

full-fledged IDE for KDE

Anjuta

full-fledged IDE for GNOME

Eclipse with CDT plugin

full-fledged platform-independent IDE
