
Course Notes for

Structure of
Programming
Languages
Part A

Goals of Course
To survey the various programming languages, their purposes and their histories
Why do we have so many languages?
How did these languages develop?
Are some languages better than others for some things?
To examine methods for describing language syntax and semantics

Goals of Course
Syntax indicates structure of program code
How can language designer specify this?
How can programmer learn language?
How can compiler recognize this?
Lexical analysis
Parsing (syntax analysis)
Brief discussion of parsing techniques
Semantics indicate meaning of the code
What will the code actually do?
Can we effectively do this in a formal way?
Static semantics
Dynamic semantics

Goals of Course
To examine some language features and constructs, and how they are used and implemented in various languages
Variables and constants
Types, binding and type checking
Scope and lifetime
Data Types
Primitive types
Array types
Structured data types

Goals of Course
Pointer (reference) types
Assignment statements and expressions
Operators, precedence and associativity
Type coercions and conversions
Boolean expressions and short-circuit evaluation
Control statements
Selection
Iteration
Unconditional branching (goto)

Goals of Course
Process abstraction: procedures and functions
Parameters and parameter-passing
Generic subprograms
Nonlocal environments and side-effects
Implementing subprograms
Subprograms in static-scoped languages
Subprograms in dynamic-scoped languages

Goals of Course
Data abstraction and abstract data types
Object-oriented programming
Design issues
Implementations in various object-oriented languages
Concurrency
Concurrency issues
Subprogram level concurrency
Implementations in various languages
Statement level concurrency

Goals of Course
Exception handling
Issues and implementations

IF TIME PERMITS
Functional programming languages
Logic programming languages

Language Development Issues
Why do we have high-level programming languages?
Machine code is too difficult for us to read, understand and debug
Machine code is not portable between architectures
Why do we have many high-level programming languages?
Different people and companies developed them

Language Development Issues
Different languages are either designed to or happen to meet different programming needs
Scientific applications: FORTRAN
Business applications: COBOL
AI: LISP, Scheme (& Prolog)
Systems programming: C
Web programming: Perl, PHP, JavaScript
General purpose: C++, Ada, Java

Language Development Issues
Programming language qualities and evaluation criteria
Readability
How much can a non-author understand the logic of code just by reading it?
Is code clear and unambiguous to the reader?
These are often subjective, but sometimes it is fairly obvious
Examples of features that help readability:
Comments
Long identifier names
Named constants

Language Development Issues
Clearly understood control statements
Language orthogonality
Simple features combine in a consistent way
But it can go too far, as explained in the text about ALGOL 68
Writability
Not dissimilar to readability
How easy is it for a programmer to use the language effectively?
Can depend on the domain in which it is being used
Ex: LISP is very writable for AI applications but would not be so good for systems programming
Also somewhat subjective

Language Development Issues
Examples of features that help writability:
Clearly understood control statements
Subprograms
Also orthogonality
Reliability
Two different ideas of reliability
1. Programs are less susceptible to logic errors
Ex: Assignment vs. comparison in C++
See assign.cpp and assign.java (a minimal stand-in sketch follows)
2. Programs have the ability to recover from exceptional situations
Exception handling (we will discuss more later)
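
The referenced assign.cpp is not reproduced in these notes; the following is a minimal stand-in sketch of the assignment-vs-comparison pitfall:

#include <iostream>

int main() {
    int x = 0;
    // Intended: if (x == 5). In C++ the assignment compiles,
    // stores 5 into x, and the condition evaluates to true.
    if (x = 5) {
        std::cout << "taken: x is now " << x << "\n";  // prints: taken: x is now 5
    }
    // In Java the same condition is a compile-time error,
    // since an int is not implicitly converted to boolean.
}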

Language Design Issues
Many factors influence language design
Architecture
Most languages were designed for single-processor von Neumann type computers
CPU to execute instructions
Data and instructions stored in main memory
General language approach
Imperative languages
Fit well with von Neumann computers
Focus is on variables, assignment, selection and iteration
Examples: FORTRAN, Pascal, C, Ada, C++, Java

Language Design Issues
Imperative language evolution
Simple straight-line code
Top-down design and process abstraction
Data abstraction and ADTs
Object-oriented programming
Some consider object-oriented languages not to be imperative, but most modern OO languages have imperative roots (ex. C++, Java)
Functional languages
Focus is on function and procedure calls
Mimics mathematical functions
Less emphasis on variables and assignment
In strictest form has no iteration at all; recursion is used instead
Examples: LISP, Scheme

Language Design Issues
Logic programming languages
Symbolic logic used to express propositions, rules and inferences
Programs are in a sense theorems
User enters a proposition and system uses programmer's rules and propositions in an attempt to prove it
Typical outputs:
Yes: Proposition can be established by program
No: Proposition cannot be established by program
Example: Prolog (see example program)
Cost
What are the overall costs associated with a given language?
How does the design affect that cost?

Language Design Issues
Training programmers
How easy is it to learn?
Writing programs
Is language a good fit for the task?
Compiling programs
How long does it take to compile programs?
This is not as important now as it once was
Executing programs
How long does program take to run?
Often there is a trade-off here
Ex: Java is slower than C++ but it has many run-time features (array bounds checking, security manager) that are lacking in C++

Language Implementation Issues
How is HLL code processed and executed on the computer?
Compilation
Source code is converted by the compiler into binary code that is directly executable by the computer
Compilation process can be broken into 4 separate steps:
1) Lexical Analysis
Breaks up code into lexical units, or tokens
Examples of tokens: reserved words, identifiers, punctuation
Feeds the tokens into the syntax analyzer

Language Implementation Issues
2) Syntax Analysis
Tokens are parsed and examined for correct syntactic structure, based on the rules for the language
Programmer syntax errors are detected in this phase
3) Semantic Analysis/Intermediate Code Generation
Declaration and type errors are checked here
Intermediate code generated is similar to assembly code
Optimizations can be done here as well, for example:
Unnecessary statements eliminated
Statements moved out of loops if possible
Recursion removed if possible

Language Implementation Issues
4) Code Generation
Intermediate code is converted into executable code
Code is also linked with libraries if necessary
Note that steps 1) and 2) are independent of the architecture; they depend only upon the language (the Front End)
Step 3) is somewhat dependent upon the architecture, since, for example, optimizations will depend upon the machine used
Step 4) is clearly dependent upon the architecture (the Back End)

Language Implementation Issues
Interpreting
Program is executed in software, by an interpreter
Source-level instructions are executed by a virtual machine
Allows for robust run-time error checking and debugging
Penalty is speed of execution
Examples: some LISP implementations, Unix shell scripts and Web server scripts

Language Implementation Issues
Hybrid
First 3 phases of compilation are done, and intermediate code is generated
Intermediate code is interpreted
Faster than pure interpretation, since the intermediate codes are simpler and easier to interpret than the source codes
Still much slower than compilation
Examples: Java and Perl
However, now Java uses JIT compilation also
Method code is compiled as it is called, so if it is called again it will be faster

Brief, Incomplete PL History
Early 50s
Early HLLs started to emerge
FORTRAN
Stands for FORmula TRANslating system
Developed by a team led by John Backus at IBM for the IBM 704 machine
Successful in part because of support by IBM
Designed for scientific applications
The root of the imperative language tree

Brief PL History
Lacked many features that we now take for granted in programming languages:
Conditional loops
Statement blocks
Recursive abilities
Many of these features were added in future versions of FORTRAN
FORTRAN II, FORTRAN IV, FORTRAN 77, FORTRAN 90
Had some interesting features that are now obsolete
COMMON, EQUIVALENCE, GOTO
We may discuss what these are later

Brief PL History
Late 50s
COBOL
COmmon Business Oriented Language
Developed by US DoD
Separated data and procedure divisions
But didn't allow functions or parameters
Still widely used, due in part to the large cost of rewriting software from scratch
Big companies would rather maintain COBOL programs than rewrite them in a different language

Brief PL History
LISP
LISt Processing
Developed by John McCarthy of MIT
Functional language
Good for symbolic manipulation, list processing
Had recursion and conditional expressions
Not in original FORTRAN
At one time used extensively for AI
Today most widely used version, COMMON LISP, has included some imperative features

Brief PL History
ALGOL
ALGOL 58 and then ALGOL 60, both designed by international committee
Goals for the language:
Syntax should be similar to mathematical notation and readable
Should be usable for algorithms in publications
Should be compilable into machine code
Included some interesting features
Pass by value and pass by name (wacky!) parameters
Recursion (first in an imperative language)
Dynamic arrays
Block structure and local variables

Brief PL History
Introduced Backus-Naur Form (BNF) as a way to describe the language syntax
Still commonly used today, but not well-accepted at the time
Never widely used, but influenced virtually all imperative languages after it

Brief PL History
Late 60s
Simula 67
Designed for simulation applications
Introduced some interesting features
Classes for data abstraction
Coroutines for re-entrant subprograms
ALGOL 68
Emphasized orthogonality and user-defined data types
Not widely used

Brief PL History
70s
Pascal
Developed by Niklaus Wirth
No major innovations, but due to its simplicity and emphasis on good programming style, became widely used for teaching
C
Developed by Dennis Ritchie to help implement the Unix operating system
Has a great deal of flexibility, esp. with types
Incomplete type checking

Brief PL History
Void pointers
Coerces many types
Many programmers (esp. systems programmers) love it
Language purists hate it
Easy to miss logic errors
Prolog
Logic programming
We discussed it a bit already
Still used somewhat, mostly in AI
May discuss in more detail later

Brief PL History
80s
Ada
Developed over a number of years by DoD
Goal was to have one language for all DoD applications
Especially for embedded systems
Contains some important features
Data encapsulation with packages
Generic packages and subprograms
Exception handling
Tasks for concurrent execution
We will discuss some of these later

Brief PL History
Very large language; difficult to program reliably, even though reliability was one of its goals!
Early compilers were slow and error-prone
Did not have the widespread general use that was hoped
Eventually the government stopped requiring it for DoD applications
Use faded after this
Not used widely anymore
Ada 95 added object-oriented features
Still wasn't used much, especially with the advent of Java and other OO languages

Brief PL History
Smalltalk
Designed and developed by Alan Kay
Concepts developed in 60s, but language did not come to fruition until 1980
Designed to be used on a desktop computer 15 years before desktop computers existed
First true object-oriented language
Language syntax is geared toward objects
Messages passed between objects
Methods are invoked as responses to messages
Always dynamically bound
All classes are subclasses of Object
Also included software devel. environment
Had large impact on future OOLs, esp. Java

Brief PL History
C++
Developed largely by Bjarne Stroustrup as an extension to C
Backward compatible
Added object-oriented features and some additional typing features to improve C
Very powerful and very flexible language
But still has reliability problems
Ex. no array bounds checking
Ex. dynamic memory allocation
Widely used and likely to be used for a while longer

Brief PL History
Perl
Developed by Larry Wall
Takes features from C as well as scripting languages awk, sed and sh
Some features:
Regular expression handling
Associative arrays
Implicit data typing
Originally used for data extraction and report generation
Evolved into the archetypal Web scripting language
Has many proponents and detractors

Brief PL History
90s
Java
Interestingly enough, just like Ada, Java was originally developed to be used in embedded systems
Developed at Sun by a team headed by James Gosling
Syntax borrows heavily from C++
But many features (flaws?) of C++ have been eliminated
No explicit pointers or pointer arithmetic
Array bounds checking
Garbage collection to reclaim dynamic memory

Brief PL History
Object model of Java actually more closely resembles that of Smalltalk than that of C++
All variables are references
Class hierarchy begins with Object
Dynamic binding of method names to operations by default
But not as pure in its OO features as Smalltalk, due to its imperative control structures
Interpreted for portability and security
Also JIT compilation now
Growing in popularity, largely due to its use on Web pages

Brief PL History
00's (aughts? oughts? naughts?)
See http://www.randomhouse.com/wotd/index.pperl?date=19990803
C#
Main roots in C++ and Java with some other influences as well
Used with the MS .NET programming environment
Some improvements and some "deprovements" compared to Java
Likely to succeed given MS support

Program Syntax
Recall job of syntax analyzer:
Groups (parses) tokens (fed in from lexical analyzer) into meaningful phrases
Determines if syntactic structure of token stream is legal based on rules of the language
Let's look at this in more detail
How does compiler know what is legal and what is not?
How does it detect errors?

Program Syntax
To answer these questions we must look at programming language syntax in a more formal way
Language:
Set of strings of lexemes from some alphabet
Lexemes are the lowest-level syntactic elements
Lexemes are made up of characters, as defined by the character set for the language

Program Syntax
Lexemes are categorized into different tokens and processed by the lexical analyzer
Ex:
if (width < height)
{
cout << width << endl;
}
Lexemes: if, (, width, <, height, ), {, cout, <<, width, <<, endl, ;, }
Tokens: iftok, lpar, idtok, lt, idtok, rpar, lbrace, idtok, llt, idtok, llt, idtok, semi, rbrace
Note that some tokens correspond to single lexemes (ex. iftok) whereas some correspond to many (ex. idtok)

Program Syntax
How do we formally define a language?
Assume we have a language, L, defined over an alphabet, Σ
2 related techniques:
1) Recognition
An algorithm or mechanism, R, will process any given string, S, of lexemes and correctly determine if S is within L or not
Not used for enumeration of all strings in L
Used by parser portion of compiler

Program Syntax
2) Generation
Produces valid sentences of L
Not as useful as recognition for compilation, since the valid sentences could be arbitrary
More useful in understanding language syntax, since it shows how the sentences are formed
Recognizer only says if sentence is valid or not; more of a trial and error technique

Program Syntax
So recognizers are what compilers need, but generators are what programmers need to understand language
Luckily there are systematic ways to create recognizers from generators
Thus the programmer reads the generator to understand the language, and a recognizer is created from the generator for the compiler

Language Generators
Grammar
A mechanism (or set of rules) by which a language is generated
Defined by the following:
A set of non-terminal symbols, N
Do not actually appear in strings
A set of terminal symbols, T
Appear in strings
A set of productions, P
Rules used in string generation
A starting symbol, S

Language Generators
Noam Chomsky described four classes of grammars (used to generate four classes of languages): the Chomsky Hierarchy
0) Unrestricted
1) Context-sensitive
2) Context-free
3) Regular
More info on unrestricted and context-sensitive grammars in a theory course
The last two will be useful to us

Language Generators
Regular Grammars
Productions must be of the form:
<non> → <ter><non> | <ter>
where <non> is a nonterminal, <ter> is a terminal, and | represents "or"
Can be modeled by a Finite-State Automaton (FSA)
Also equivalent to Regular Expressions
Provide a model for building lexical analyzers

Language Generators
Have following properties (among others)
Can generate strings of the form a^n, where a is a finite sequence and n is an integer
Pattern recognition
Can count to a finite number
Ex. { a^n | n = 85 }
But we need at least 86 states to do this
Cannot count to arbitrary number
Note that { a^n } for any n (i.e. 0 or more occurrences) is easy; do not have to count
Important to realize that the number of states is finite: cannot recognize patterns with an arbitrary number of possibilities

Language Generators
Example: Regular grammar to recognize Pascal identifiers (assume no caps)
N = {Id, X} T = {a..z, 0..9} S = Id
P =
Id → aX | bX | ... | zX | a | b | ... | z
X → aX | ... | zX | 0X | ... | 9X | a | ... | z | 0 | ... | 9
Consider the equivalent FSA
[FSA diagram: start state Id takes a letter to state X; X loops on letters and digits]

Language Generators
Example: Regular grammar to generate a binary string containing an odd number of 1s
N = {A,B} T = {0,1} S = A P =
A → 0A | 1B | 1
B → 0B | 1A | 0
Example: Regular grammars CANNOT generate strings of the form a^n b^n
Grammar needs some way to count number of a's and b's to make sure they are the same
Any regular grammar (or FSA) has a finite number, say k, of different states
If n > k, not possible
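
A minimal sketch (not from the notes) of the two-state FSA equivalent to the odd-number-of-1s grammar, with A = "even 1s so far" and B = "odd 1s so far":

#include <iostream>
#include <string>

// Accept exactly the binary strings containing an odd number of 1s.
bool oddOnes(const std::string& s) {
    bool odd = false;              // start in state A (even)
    for (char c : s) {
        if (c == '1') odd = !odd;  // a 1 moves between A and B
        // a 0 loops on the current state
    }
    return odd;                    // accept if we end in state B
}

int main() {
    std::cout << oddOnes("0110") << " "   // 0: two 1s (even)
              << oddOnes("100") << "\n";  // 1: one 1 (odd)
}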

Language Generators
If we could add a memory of some sort we could get this to work
Context-free Grammars
Can be modeled by a Push-Down Automaton (PDA)
FSA with added push-down stack
Productions are of the form:
<non> → a, where <non> is a nonterminal and a is any sequence of terminals and nonterminals
Note the RHS is more flexible now

Language Generators
So how to generate a^n b^n? Let a=0, b=1
N = {A} T = {0,1} S = A P =
A → 0A1 | 01
Note that now we can have a terminal after the nonterminal as well as before
Can also have multiple nonterminals in a single production
Example: Grammar to generate sets of balanced parentheses
N = {A} T = {(,)} S = A P =
A → AA | (A) | ()

Language Generators
Context-free grammars are also equivalent to BNF grammars
Developed by Backus and modified by Naur
Used initially to describe ALGOL 60
Given a (BNF) grammar, we can derive any string in the language from the start symbol and the productions
A common way to derive strings is using a leftmost derivation
Always replace leftmost nonterminal first
Complete when no nonterminals remain

Language Generators
Example: Leftmost derivation of nested parens: (()(()))
A ⇒ (A)
  ⇒ (AA)
  ⇒ (()A)
  ⇒ (()(A))
  ⇒ (()(()))
We can view this derivation as a tree, called a parse tree for the string

Language Generators
Parse tree for (()(()))
[parse tree diagram: root A with children ( A ); that A uses A → AA; the first child A derives (), the second derives ( A ) whose inner A derives ()]

Language Generators
If, for a given grammar, a string can be derived by two or more different parse trees, the grammar is ambiguous
Some languages are inherently ambiguous
All grammars that generate that language are ambiguous
Many other languages are not themselves ambiguous, but can be generated by ambiguous grammars
It is generally better for use with compilers if a grammar is unambiguous
Semantics are often based on syntactic form

Language Generators
Ambiguous grammar example: Generate strings of the form 0^n 1^m, where n,m >= 1
N = {A,B,C} T = {0,1} S = A
P =
A → BC | 0A1
B → 0B | 0
C → 1C | 1
Consider the string: 00011
[two parse trees: one starts with A → BC, with B deriving 000 and C deriving 11; the other starts with A → 0A1, with the inner A then using A → BC]

Language Generators
We can easily make this grammar unambiguous:
Remove production: A → 0A1
Note that nonterminal B can generate an arbitrary number of 0s and nonterminal C can generate an arbitrary number of 1s
Now only one parse tree
[parse tree: A → BC, with B deriving 000 and C deriving 11]

Language Generators
Let's look at a few more examples
Grammar to generate: { WW^R | W in {0,1}+ }
N = {A} T = {0,1} S = A P = ?
A → 0A0 | 1A1 | 00 | 11
Grammar to generate: strings in {0,1}* of the form WX such that |W| = |X| but W != X
This one is a little trickier
How to approach this problem?
We need to guarantee two things
Overall string length is even
At least one bit differs in the two halves

Language Generators
See board
Ok, now how do we make a grammar to do this?
Make every string (even length) the result of two odd-length strings appended to each other
Assume odd-length strings are Ol and Or
Make sure that either
Ol has a 1 in the middle and Or has a 0 in the middle, or
Ol has a 0 in the middle and Or has a 1 in the middle
Productions:
In → AB | BA
A → 0A0 | 1A1 | 1A0 | 0A1 | 1
B → 0B0 | 1B1 | 1B0 | 0B1 | 0

Language Generators
Let's look at an example more relevant to programming languages:
Grammar to generate simple assignment statements in a C-like language (diff. from one in text):
<assig stmt> ::= <var> = <arith expr>
<arith expr> ::= <term> | <arith expr> + <term> | <arith expr> - <term>
<term> ::= <primary> | <term> * <primary> | <term> / <primary>
<primary> ::= <var> | <num> | (<arith expr>)
<var> ::= <id> | <id>[<subscript list>]
<subscript list> ::= <arith expr> | <subscript list>, <arith expr>

Language Generators
stmt>* 20
Parse tree for: X<assig
= (A[2]+Y)
<var> = <arith expr>

<id>
<term>
<term>
*
<primary>
<primary>
<num>
<arith expr> )

<arith expr>
<term
>
<primary>
<var>

+ <term>
<primary>
<var>
<id>

<id> [ <subscript list>


]
<arith expr>
<term
>
<primary>
<num>

63

Language Generators
Wow, that seems like a very complicated parse tree to generate such a short statement
Extra non-terminals are often necessary to remove ambiguity
Extra non-terminals are often necessary to create precedence
Precedence in previous grammar has * and / higher than + and -
Higher-precedence operators appear lower in the parse tree
What about associativity?
Left-recursive productions == left associativity
Right-recursive productions == right associativity

Language Generators
But Context-free grammars cannot generate everything
Ex: Strings of the form WW in {0,1}*
Cannot guarantee that arbitrary string is the same on both sides
Compare to WW^R
These we can generate from the middle and build out in each direction
For WW we would need separate productions for each side, and we cannot coordinate the two with a context-free grammar
Need Context-Sensitive in this case

Language Generators
Let's look at one more grammar example
Grammar to generate all postfix expressions involving binary operators * and -. Assume <id> is predefined and corresponds to any variable name
Ex: v w x y - * z * -
How do we approach this problem?
Terminals: easy
Nonterminals/Start: require some thought
Productions: require a lot of thought

Language Generators
T = { <id>, *, - }
N = { A }
S = A
P =
A → AA* | AA- | <id>
Show parse tree for previous example
Is this grammar LL(1)?
We will discuss what this means soon

Parsers
Ok, we can generate languages, but how to recognize them?
We need to convert our generators into recognizers, or parsers
We know that a Context-free grammar corresponds to a Push-Down Automaton (PDA)
However, the PDA may be non-deterministic
As we saw in examples, to create a parse tree we sometimes have to guess at a substitution

Parsers
May have to guess a few times before we get the correct answer
This does not lend itself to programming language parsing
We'd like parser to never have to guess
To eliminate guessing, we must restrict the PDAs to deterministic PDAs, which restricts the grammars that we can use
Must be unambiguous
Some other, less obvious restrictions, depending upon parsing technique used

Parsers
There are two general categories of parsers
Bottom-up parsers
Can parse any language generated by a Deterministic PDA
Build the parse trees from the leaves up back to the root as the tokens are processed
At each step, a substring that matches the right-hand side of a production is substituted with the left side of the production
Reduces input string all the way back to the start symbol for the grammar
Also called shift-reduce parsing

Parsers
Correspond to LR(k) grammars
Left-to-right processing of string
Rightmost derivation of parse tree (in reverse)
k symbols of lookahead required
LR parsers are difficult to write by hand, but can be produced systematically by programs such as YACC (Yet Another Compiler Compiler)
Primary variations of LR grammars/parsers
SLR (Simple LR)
LALR (Look-Ahead LR)
LR most general but also most complicated to implement
We'll leave details to CS 1622

Parsers
Top-down parsers
Build the parse trees from the root down as the tokens are processed
Also called predictive parsers, or LL parsers
Left-to-right processing of string
Leftmost derivation of parse tree
The LL(1) that we saw before means we can parse with only one token of lookahead
More restrictive than LR parsers: there are grammars generated by Deterministic PDAs that are not LL grammars (i.e. cannot be parsed by an LL parser)
Some restrictions on productions allowed
Cannot handle left-recursion; we'll see why shortly

Parsers
Implementing a top-down parser
One technique is Recursive Descent
Can think of each production as a function
As string of tokens is parsed, terminal symbols are consumed/processed and non-terminal symbols generate function calls
Now we can see why left-recursive productions cannot be handled
From Example 3.4
<expr> → <expr> + <term>
Recursion will continue indefinitely without consuming any symbols

Parsers
Luckily, in most cases a grammar with left recursion can be converted into one with only right-recursion
Recursive Descent parsers can be written by hand, or generated
Think of a program that processes the grammar by creating a function shell for each non-terminal
Then details of function are filled in based upon the various right-hand sides the non-terminal generates
See example (a sketch follows)
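
The example referenced above is external to these notes; here is a minimal recursive-descent sketch of my own for a cut-down expression grammar, with the left recursion already converted to loops:

#include <cctype>
#include <iostream>
#include <string>

// Recognizer for:
//   <expr> -> <term> { + <term> }     (left recursion replaced by iteration)
//   <term> -> <id>   { * <id> }
// One function per nonterminal; 'pos' consumes terminals left to right.
static std::string input;
static size_t pos = 0;

bool id() {                        // <id>: a single lowercase letter here
    if (pos < input.size() && std::islower(static_cast<unsigned char>(input[pos]))) {
        ++pos;
        return true;
    }
    return false;
}

bool term() {                      // <term> -> <id> { * <id> }
    if (!id()) return false;
    while (pos < input.size() && input[pos] == '*') { ++pos; if (!id()) return false; }
    return true;
}

bool expr() {                      // <expr> -> <term> { + <term> }
    if (!term()) return false;
    while (pos < input.size() && input[pos] == '+') { ++pos; if (!term()) return false; }
    return true;
}

int main() {
    input = "a+b*c";
    std::cout << (expr() && pos == input.size() ? "accepted" : "rejected") << "\n";
}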

LL(1) Grammars
So how can we tell if a grammar is LL(1)?
Given the current non-terminal (or left side of a production) and the next terminal, we must be able to uniquely determine the right side of the production to follow
Remember that a non-terminal can have multiple productions
As we previously mentioned, the grammar must not be left recursive
However, not having left recursion is necessary but not sufficient for an LL(1) grammar

LL(1) Grammars
Ex:
A → aX | aY
We cannot determine which right side to follow without more information than just "a"
How can we process a grammar to determine if this situation occurs?
Calculate the First set for each RHS of productions
First set of a sequence of symbols, S, is the set of terminals that begin the strings derived from S
Given multiple RHS for nonterminal N
N → a1 | a2
If First(a1) and First(a2) intersect, the grammar is not LL(1)

LL(1) Grammars
So how do we calculate First() sets?
Algorithm is given in Aho (see Slide 2)
Consider symbol X
If X is a terminal, First(X) = {X}
If X → ε is a production, add ε to First(X)
If X is a nonterminal and X → Y1Y2...Yk is a production
Add a to First(X) if, for some i, a is in First(Yi) and ε is in all of First(Y1) ... First(Yi-1)
Add ε to First(X) if ε is in First(Yj) for all j = 1, 2, ..., k
To calculate First(X1X2...Xn) for some sequence X1X2...Xn
Add non-ε symbols of First(X1)
If ε is in First(X1), add non-ε symbols of First(X2)
If ε is in all First(Xi), add ε

LL(1) Grammars
A → aB | b | cBB
B → aB | bA | aBb
A → aB | CD | E | ε
B → b
C → cA | ε
D → dA
E → dB
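
A worked check (my reasoning, not spelled out in the notes): in the first grammar, B → aB and B → aBb both have First = {a}, so B's alternatives conflict and the grammar is not LL(1). In the second grammar, First(CD) = {c, d} (d because C can derive ε, exposing First(D)) and First(E) = {d}; since A offers both CD and E, the First sets intersect on d and this grammar is not LL(1) either.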

Semantics
Semantics indicate the meaning of a program
What do the symbols just parsed actually say to do?
Two different kinds of semantics:
Static Semantics
Almost an extension of program syntax
Deals with structure more than meaning, but at a meta level
Handles structural details that are difficult or impossible to handle with the parser
Ex: Has variable X been declared prior to its use?
Ex: Do variable types match?

Semantics
Dynamic Semantics (often just called semantics)
What does the syntax mean?
Ex: Control statements
Ex: Parameter passing
Programmer needs to know meaning of statements before he/she can use language effectively

Semantics
Static Semantics
One technique for determining/checking static semantics is Attribute Grammars
Start with a context-free grammar, and add to it:
Attributes (for the grammar symbols)
Indicate some properties of the symbols
Attribute computation functions (semantic functions)
Allow attributes to be determined
Predicate functions
Indicate the static semantic rules

Semantics
Attributes made up of synthesized attributes and inherited attributes
Synthesized Attributes
Formed using attributes of grammar symbols lower in the parse tree
Ex: Result type of an expression is synthesized from the types of the subexpressions
Inherited Attributes
Formed using attributes of grammar symbols higher in the parse tree
Ex: Type of RHS of an assignment is expected to match that of the LHS; the type is inherited from the type of the LHS variable

Semantics
Semantic Functions
Indicate how attributes are derived, based on the static semantics of the language
Ex: A = B + C;
Assume A, B and C can be integers or floats
If B and C are both integers, RHS result type is integer, otherwise it is float
Predicate functions
Test attributes of symbols processed to see if they match those defined by language
Ex: A = B + C
If RHS type attribute is not equal to LHS type attribute, error (in some languages)

Semantics
Detailed Example in text
Grammar Rules
1) <assign> → <var> = <expr>
2) <expr> → <var> + <var>
3) <expr> → <var>
4) <var> → A | B | C
Attributes
actual_type: actual type of <var> or <expr> in question (synthesized, but for a <var> we say this is an intrinsic attribute)
expected_type: associated with <expr>, indicating the type that it SHOULD be; inherited from actual_type of <var>

Semantics
Semantic functions
Parallel to syntax rules of the grammar
See Ex. 3.6 in text
1) <assign> → <var> = <expr>
   <expr>.expected_type ← <var>.actual_type
2) <expr> → <var>[2] + <var>[3]
   <expr>.actual_type ← if (<var>[2].actual_type = int) and (<var>[3].actual_type = int) then int else real end if
3) <expr> → <var>
   <expr>.actual_type ← <var>.actual_type
4) <var> → A | B | C
   <var>.actual_type ← look-up(<var>.string)
Predicate functions
Only one needed here: do the types match?
<expr>.actual_type == <expr>.expected_type

Semantics
Ex: A = B + C
[attribute flow diagram: <assign> has children <var> (A), =, and <expr>; the <expr> has children <var>[2] (B), +, <var>[3] (C); actual_type flows up from the <var> nodes, while expected_type flows down into <expr> from the LHS <var>]

Semantics
Attribute grammars are useful, but not typically used in their pure form for full-scale languages; full-scale use makes the grammars more complicated and compilers more difficult to generate

Semantics
Dynamic Semantics (semantics)
Clearly vital to the understanding of the language
In early languages they were simply informal, like manual pages
Efforts have been made in later years to formalize semantics, just as syntax has been formalized
But semantics tend to be more complex and less precisely defined
More difficult to formalize

Semantics
Some techniques have gained support, however
Operational Semantics
Define meaning by result of execution on a primitive machine, examining the state of the machine before and after the execution
Axiomatic Semantics
Preconditions and postconditions define meaning of statements
Used in conjunction with proofs of program correctness
Denotational Semantics
Map syntactic constructs into mathematical objects that model their meaning
Quite rigorous and complex

Identifiers, Reserved Words and Keywords
Identifier
String of characters used to name an entity within a program
Most languages have similar rules for ids, but not always
C++ and Java are case-sensitive, while Ada is not
Can be a good thing: mixing case allows for longer, more readable names, a la Java: NoninvertibleTransformException
Can be a bad thing: should that first i be upper or lower case?

Identifiers, Reserved Words and Keywords
C++, Ada and Java allow underscores, while standard Pascal does not
FORTRAN originally allowed only 6 chars
Reserved Word
Name whose definition is part of the syntax of the language
Cannot be used by programmer in any other way
Most newer languages have reserved words
Make parsing easier, since each reserved word will be a different token

Identifiers, Reserved Words and Keywords
Ex: end if in Ada
Interesting extension topic
If we extend a language and add new reserved words, we may make some old programs syntactically incorrect
Ex: C subprogram using class as an id will not compile with a C++ compiler
Ex: Ada 83 program using abstract as an id will not compile with an Ada 95 compiler
Keywords
To some, keyword == reserved word
Ex: C++, Java

Identifiers, Reserved Words and Keywords
To others, there is a difference
Keywords are only special in certain contexts
Can be redefined in other contexts
Ex: FORTRAN keywords may be redefined
Predefined Identifiers
Identifiers defined by the language implementers, which may be redefined
cin, cout in C++
real, integer in Pascal
predefined classes in Java

Identifiers, Reserved Words and Keywords
Programmer may wish to redefine for a specific application
Ex: Change a Java interface to include an extra method
Problem: predefined version no longer applies, so program segments that depend on it are invalid
Better to extend a class or compose a new class than to redefine a predefined class
Ex: Comparable interface can be implemented as we see fit by a new class

Variables
Simple (naive) definition: a name for a memory location
In fact, it is really much more
Six attributes
Name
Address
Value
Type
Lifetime
Scope

Variables
Name:
Identifier
In most languages the same name may be used for different variables, as long as there is no ambiguity
Some exceptions
Ex: A method variable name may be declared only once within a Java method

Variables
Address
Location in memory
Also called the l-value
Some situations that are possible:
Different variables with the same name have different addresses
Declared in different blocks of the program
Same variable has different addresses at different points in time
Declared in a subprogram and allocated based on run-time stack
We'll discuss this more shortly

Variables
Different variables share same address: aliasing
Occurs with FORTRAN EQUIVALENCE, Pascal and Ada record variants, C and C++ unions, pointer variables, reference parameters
Adds to the flexibility of a language, especially with pointers and reference parameters
Can also save memory in some situations
Many references to a single copy rather than having multiple copies
Can be quite problematic if programmer does not handle them correctly
Ex: copy constructor and = operator for classes with dynamic components in C++
Ex: shallow copy of arrays in Java
We'll discuss most of these things more in later chapters

Variables
Type
Modern data types include both the structure of the data and the operations that go with it
Important in determining legality of some expressions
Value
Contents of memory locations allocated for that variable
Also called the r-value

Variables
Lifetime
Time during which the variable is bound to a specific memory location
We'll discuss this more shortly
Scope
Section of the program in which a variable is visible
Accessible to the programmer/code in that section
We'll discuss this more shortly

Binding
Binding of variables deals with an association of variable attributes with actual values
The time when each of these occurs in a program is the binding time of the attribute
Static binding: occurs before runtime and does not change during program execution
Dynamic binding: occurs during runtime or changes during runtime

Binding
Name:
Occurs when program is written (chosen by programmer) and for most languages will not change: static
Address and Lifetime:
A memory location for a variable must be allocated and deallocated at some point in a program
The lifetime is the period between the allocation and deallocation of the memory location for that variable

Binding
We are ignoring bindings associated with a computer's virtual memory
This in fact could cause a variable to be bound and unbound to different memory locations many times throughout the execution of a program
Each time the data is swapped or paged out, the location is physically unbound, and then rebound when it is swapped or paged back in
We will ignore these issues since they are more related to how the operating system and hardware execute programs than they are to the way the variables are declared and used within the program

Binding
Text puts lifetimes of variables into 4 categories
Static: bound to same memory cell for entire program
Ex: static C++ variables
Stack-dynamic: bound and unbound based on run-time stack
Ex: Pascal, C++, Ada subprogram variables
Explicit heap-dynamic: nameless cells that can only be accessed using other variables (pointers)
Allocated (and poss. deallocated) explicitly by the programmer
Ex: result of new in C++ or Java

Binding
Implicit heap-dynamic variables
Binding of all attributes (except name) changed upon each assignment
Much overhead to maintain all of the dynamic information
Used in ALGOL 68 and APL
Not used in most newer languages
See lifetimes.cpp (a stand-in sketch follows)
Value
Dynamic by the nature of a variable: it can change during run-time
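
lifetimes.cpp is external to these notes; a minimal stand-in sketch showing static, stack-dynamic and explicit heap-dynamic lifetimes in C++:

#include <iostream>

int counter() {
    static int calls = 0;   // static lifetime: one cell for the whole program
    return ++calls;
}

int main() {
    int local = 5;          // stack-dynamic: allocated on the run-time stack
    int* heap = new int(7); // explicit heap-dynamic: nameless cell reached via a pointer
    std::cout << counter() << " " << counter() << " "  // 1 2 -- value persists across calls
              << local << " " << *heap << "\n";
    delete heap;            // explicit deallocation by the programmer
}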

Binding
Type
Dynamic Binding
Type associated with a variable is determined at run-time
A single variable could have many different types at different points in a program
Static Binding
Type associated with a variable is determined at compile-time (based on var. declaration)
Once declared, type of a variable does not change

Binding
Advantage of dynamic binding
More flexibility in programming
Can use same variable for different types
Can make operations generic
Disadv. of dynamic binding
Type-checking is limited and must be done at run-time
To understand this better, we must discuss type-checking at more length
We'll return to our last binding topic (scope) afterward

Type-checking
Type checking
Determination that operands and operators in an expression have types that are compatible with each other
By compatible we mean
the types match, or
they can be coerced (implicitly converted) to match
If they are not compatible, a type-error occurs

Type-checking
Why is type-checking important?
Let's review programming errors
Compilation error: detectable when the program is compiled
Usually a syntax error or static semantic error
Run-time error: detectable as the program is being run
Often an illegal instruction or I/O error
Logic error: error in meaning of the program
Often only detectable through debugging and/or testing program on known data
Program could seem to run perfectly, but produce incorrect results

Type-checking
We'd like an environment that is not conducive to logic errors
Consider dynamic type binding again
Assignments cannot be type checked
Since type may change, any assignment is legal
If object being assigned is an erroneous type, we have a logic error
Type checking that is done must be done at run-time
This requires type information to be stored and accessed at run-time
Must be done in software; i.e. the language must be interpreted
Increases both memory and run-time overhead

Type-checking
Now consider static type binding
Since types are set at compile-time, most (but not usually all) type checking can be done at compile-time
Assignments can be checked to avoid logic errors
Type information does not need to be kept at run-time
Program can run in hardware

Type-checking
STRONGLY TYPED language
2 slightly different definitions
Traditional definition: If ALL type checking can be done at compile-time (i.e. statically), a language is strongly typed
Sebesta definition: If ALL type errors can always be detected (either at compile-time or at run-time), a language is strongly typed
First definition is more reliable but also more restrictive

Type-checking
No commonly used languages are truly strongly typed, but some come close
Let's look at two: C++ and Ada
C++ union construct allows programmer to access same memory as different types with no checking
Ada record variants contain a discriminant to determine which type is being used
Can only access the type indicated by the discriminant
Cannot change the discriminant unless you change the entire record; prevents inconsistency
Checking can be turned off, however

Type-checking
See union.cpp and variant.adb (a union sketch follows)
Pascal also has a discriminant, but does not require entire record to be assigned when discriminant is changed
Suggests type-safety, but does not enforce it
We'll look more at these types of structures in Chapter 6
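
union.cpp and variant.adb are external to these notes; a minimal C++ stand-in for the unchecked-union case:

#include <iostream>

// A union overlays its members on the same memory;
// C++ performs no checking on which member was last written.
union Overlay {
    int   i;
    float f;
};

int main() {
    Overlay u;
    u.f = 1.5f;
    // Reading u.i after writing u.f is not type-checked:
    // on typical IEEE-754 systems this shows the raw bit
    // pattern of the float (1069547520), not the value 1.
    std::cout << u.i << "\n";
}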

Type-checking
So what does compatible mean?
We said type-checking involves determining if operands and operators are compatible
Different languages define it differently
Name compatible (or name equivalent)
The types have the same name
Easy to check and enforce
Simply record the type of each variable in the symbol table
Compare when necessary

Type-checking
Somewhat limiting for programmer
Ex. in Pascal
A1: array[1..10] of integer;
A2: array[1..10] of integer;
A2 := A1; { not allowed }
Assignment is not legal even though they have the same structure
Also cannot pass either variable as a parameter to a subprogram
Variables above actually each have an ANONYMOUS TYPE, not name compatible with any other type
Generally not a good idea to use

Type-checking
Structurally compatible (equivalent)
The types are compatible if they have the same structure
Ex:
A1: array[1..10] of integer;
A2: array[1..10] of integer;
A2 := A1; { this would be allowed }
Since both have same size and base type, it works
Much more flexible than name compatible, but also much more difficult for compiler and not as clear to programmer
Compiler must compare structures of every variable involved in the expression
It is not obvious what is and is not compatible

Type-checking
record
X: float;
A: array[1..10] of int;
end record

Can compiler tell that

components are
reversed?
A1:array[1..10] of float;

Structure is the same

record
A: array[1..10] of int;
X: float;
end record

Could it match the

types with more


complex records?
A2:array[3..12] of float;

Index values are

changed
118

Type Checking
So how about some languages?
Ada: name equivalence, but allows subtypes to be compatible with parent types
Pascal: almost name equivalence, but considers variables in the same declaration to also be of the same type, and allows one type name to be set equal to another
A1, A2: array[1..10] of integer;
(above not compatible in Ada)
type newint = integer;
C++: name equivalence, but a lot of coercion is done; we will look at coercion later
See types.p, types.adb, types.cpp

Binding
Ok, back to binding
Scope (visibility)
Static scope: determined at compile-time
Dynamic scope: determined at run-time
Implications:
Most languages have the notion of local variables (i.e. within a block or a subprogram)
If these are the only variables we use, scope is not important

Binding
Scope becomes significant when dealing with non-local variables
How and where are these accessible?
Most modern languages use static scope
If variable is not locally declared, proceed out to the textual parent (static parent) of the block/subprogram until the declaration is found
Fairly clear to programmer: can look at code to see scope
We'll discuss implementation later

Binding
Examples:
Pascal
Subprograms are only scope blocks, but can be nested to arbitrary depth
All declarations must be at the beginning of a subprogram
Somewhat restrictive, although not the fault of static scope
Ada
Subprograms can be nested
Also allows declare blocks to nest scope within the same subprogram
Useful for variable-length arrays
All declarations must be at the beginning of a subprogram or declare block
See arraytest.adb

Binding
C++
Subprograms CANNOT be nested
New declaration blocks can be made with {}
Declarations can be made anywhere within a block, and last until the end of the block
Interesting note:
What about the scope of loop control variables in for loops?
Ada always implicitly declares LCVs, and scope (and lifetime) is LOCAL to the loop body
C++ and Java for is more general and does not require a locally declared LCV, but if one is included, it is also LOCAL to the loop body
Wasn't always the case in C++

Binding
Dynamic scope
Non-local variables are accessed via calls on the run-time stack (going from top to bottom until declaration is found)
A non-local reference could be declared in different blocks in different runs
Used in APL, SNOBOL4 and through local variables in Perl
Flexible but very tricky
Difficult for programmer to anticipate different definitions of non-local variables

Binding
Type-checking must be dynamic, since types of non-locals are not known until run-time
More time-consuming
See scope.pl

Scope, Lifetime and Referencing Environments
Concepts of Scope and Lifetime are not comparable
Lifetime deals with existence and association of memory (WHEN)
Scope deals with visibility of variable names (WHERE)
However, sometimes they seem to be the same

Scope, Lifetime and Referencing Environments
Ex. stack-dynamic variables in some places
Ex. global variables in some situations
#include <iostream>
using namespace std;
int i = 10;                       // global i
int main()
{
  {
    int i = 11, j = 20;           // this i hides the global
    {
      int i = 12, k = 30;         // this i hides both outer i's
      cout << i << " " << j << " " << k << endl;  // 12 20 30
    }
    cout << i << " " << j << endl;  // 11 20
  }
  cout << i << endl;              // 10 -- the global is visible again
}

Scope, Lifetime and Referencing Environments
More often they are not the same
Lifetime of stack-dynamic variables continues when a subsequent function is called, whereas scope does not include the body of the subsequent function
Lifetime of heap-dynamic variables continues even if they are not accessible at all

Scope, Lifetime and Referencing Environments
Referencing Environment
Given a statement in a program, what variables are visible there?
Clearly this depends on the type of scoping used
Once we know the scope rules, we can always figure this out
From code if static scoping is used
From call chains (at run-time) if dynamic scoping is used
We will look more at non-local variable access when we discuss subprograms

Primitive Data Types
Most languages provide these
Numeric types
Integer
Usually 2's complement
Exact representation of the number
In some languages (ex. C++) size is not specified
Depends on hardware
In others (ex. Java) size is specified
Regardless of hardware

Primitive Data Types
Float
Usually exponent/mantissa form, each coded in binary
Often an approximation of the number
Only a limited number of bits for mantissa
Decimal point "floats"
Many numbers cannot be represented exactly
Irrational numbers (ex. pi, e, 2^(1/2))
Infinite repeating decimals (ex. 1/3)
Some terminating decimals as well (ex. 0.1)
Instead we use digits of precision
In most new computers, this is defined by IEEE Standard 754
See p. 255 in text
See rounding.cpp (a stand-in sketch follows)
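
rounding.cpp is external to these notes; a minimal stand-in showing that 0.1 is only approximated in binary:

#include <iomanip>
#include <iostream>

int main() {
    double sum = 0.0;
    for (int i = 0; i < 10; ++i) sum += 0.1;   // 0.1 has no exact binary form
    std::cout << std::setprecision(17) << sum << "\n";  // ~0.99999999999999989
    std::cout << (sum == 1.0 ? "equal" : "not equal") << "\n";  // not equal
}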

Primitive Data Types
Fixed point
Called Decimal in text
Store rational numbers with a fixed number of decimal places
Useful if we need decimal point to stay put
Ex. working with money
Can be stored as Binary Coded Decimal: each digit encoded in binary
Similar to strings, but we can put 2 digits into one byte, since each digit needs only 4 bits
However, the 10 digit values do not actually require 4 full bits (which can encode 16 values)
So memory is somewhat wasted in this representation

Primitive Data Types
Can also be stored as integers, with an extra attribute in the descriptor
Scale factor indicates how many places over to locate decimal point
Ex: X = 102.53 and Y = 32.65
Stored as 10253 and 3265 (in binary) with scale factor of 2
Z = X + Y
Add the integers and keep the same scale factor (involves normalization if the number of decimal places are not the same; think about this)
= 13518 with scale factor 2 = 135.18
Z = X * Y
Multiply the integers and add the scale factors, then normalize to the correct number of digits
= 33476045 with scale factor 4
= 3347.6045, normalized to 2 digits
= 3347.60
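
A minimal sketch (my own) of the same scaled-integer arithmetic in C++, with scale factor 2 (values stored as value * 100):

#include <cstdint>
#include <iostream>

int64_t add(int64_t x, int64_t y) { return x + y; }          // same scale factor kept
int64_t mul(int64_t x, int64_t y) { return (x * y) / 100; }  // scale factors add (2+2=4); renormalize to 2

int main() {
    int64_t x = 10253, y = 3265;        // 102.53 and 32.65
    std::cout << add(x, y) << "\n";     // 13518  -> 135.18
    std::cout << mul(x, y) << "\n";     // 334760 -> 3347.60 (truncated from 3347.6045)
}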

Primitive Data Types
But clearly limited in size by integer size
In Java the BigDecimal class (not primitive) stores them as (very long) integers with a 32-bit scale factor; see BigD.java
Very Long Numbers
Typically NOT primitive types
Implemented using arrays via a predefined class (ex. BigInteger and BigDecimal in Java) or via an add-on library (NTL library for C++)
Boolean
Used in most languages (except C)
Values are true or false

Primitive Data Types
Though stored as integer values (0 or 1), boolean values are typically not compatible with integer values
Exception is C++, where all non-zero values are true, 0 is false
This adds flexibility, but can cause logic errors
Recurring theme!
Remember assign.cpp

Primitive Data Types
Character
In early computers, character codes were not standardized
Caused trouble for transferring text files
Now most computers use ASCII character set (or extended ASCII set)
But ASCII is not adequate for international character sets
Unicode is used in Java, perhaps in other languages soon
16-bit code

Primitive Data Types
ASCII is also not the same for Unix platforms and Windows platforms
Unix uses only a <LF> while Windows uses a <CR><LF> combination
Can cause problems for programs not equipped to handle the difference
We can easily convert, however
FTP in ASCII mode
Simple script to convert

More Data Types
Strings
Not a primitive type in many languages
C, C++, Pascal, Ada
Primitive type in others
BASIC, FORTRAN, Perl
Issues to consider:
Should a string be considered a single entity, or a collection of characters?
Should the length of a string be fixed or variable?
Which operations can be used?

More Data Types
Single entity vs. collection of characters
More an issue of access than of structure
In languages with no primitive string type, a string is generally perceived as a collection of characters
Ex: In C and C++ a string is an array of char
If we want we can treat it like any other array
If we use operations in string.h (or <cstring>) we can treat it like a string
C++ also has a string class
Ex: In Ada, String is a predefined unconstrained array of characters
String variables must be constrained
Some simple operations can be used (ex. assignment, comparison, concatenation)

More Data Types
In Perl, string is a primitive type
Strings are accessed as single entities
We can use functions to get character index values, but strings are stored as single, primitive variable values
Many operations can be used (later)
In Java, we have two string types
Neither is really primitive, since they are classes built using an underlying array of char
String
Immutable objects (cannot be changed once created)
Allows strings to be stored and accessed more efficiently, since multiple objects with the same value can be replaced with a single object (at compile-time; usually done for literals)

More Data Types
StringBuffer
Objects that can be modified after creation
More efficient if programmer is altering string values, since new objects do not have to be created for each operation
String length
3 variations
1) Static (fixed) length: length of string is set when the object is created and cannot change
2) Limited dynamic length: length of string can vary up to a preset maximum
Typically the dynamic aspect is logical; physically the string size is preset, but some of the space may not be used
3) Dynamic length: length of string can vary without limit

More Data Types
1) Used in Pascal, Ada
String variable is declared to be of a fixed size, with all locations being part of the string
It is up to programmer to either pad string at end or somehow ignore unused locations
However, Ada improves on Pascal strings by making the String type an unconstrained array
String objects must have a fixed length, but they are all of the same type (ex: for params)
See astrings.adb and pstrings.p
2) Used in C, C++ (C strings)
String variable is of a fixed maximum size, but any number of characters up to the maximum length may be used at any time
In C and C++ a special sentinel character (\0) is used to indicate the end of the logical string
Implementation is VERY dangerous (what else is new!)
See cstrings.cpp (a stand-in sketch follows)
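
cstrings.cpp is external to these notes; a minimal stand-in showing the sentinel and the danger:

#include <cstring>
#include <iostream>

int main() {
    char buf[8];                            // fixed maximum size: 7 chars + '\0'
    std::strcpy(buf, "hello");              // fine: logical length 5, sentinel at buf[5]
    std::cout << std::strlen(buf) << "\n";  // 5 -- length found by scanning for '\0'
    // std::strcpy(buf, "far too long a string");  // compiles, but overruns buf:
    // no bounds check is performed -- the classic buffer overflow
}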

More Data Types
3) Used in Perl, Java StringBuffer, C++ string class (among others)
Physical length of the string object is automatically adjusted by the system to hold the current logical string
Memory is reallocated if necessary
Only limit on string size is amount of memory available
Realize that this is not free: each time the string has to be resized the following occurs:
New memory of new size is allocated
Old data is copied to new memory
Variable is rebound to new memory
Old memory is discarded
It is smart for system to double memory each time so resizing is infrequent
We'll discuss allocation later
See Strings.java (a C++ sketch of the growth behavior follows)
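
A minimal sketch (my own; the growth factor is implementation-defined, often around 2x) of watching a C++ string reallocate as it grows:

#include <iostream>
#include <string>

int main() {
    std::string s;
    size_t cap = s.capacity();
    std::cout << "start capacity: " << cap << "\n";
    for (int i = 0; i < 200; ++i) {
        s += 'x';                       // an append may trigger reallocation
        if (s.capacity() != cap) {      // report each growth step
            cap = s.capacity();
            std::cout << "size " << s.size() << " -> capacity " << cap << "\n";
        }
    }
    // Capacities grow geometrically so the copying cost is
    // amortized over many appends rather than paid every time.
}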

More Data Types
Operations
What can we do with our strings?
These are affected by the previous issue (length)
If the length must remain constant, we cannot have any operations that would change the length
Some possibilities:
Slicing (give a substring)
Search for a string
Index
Regular expression matching

More Data Types
Enumeration Types
Used primarily for readability
Ex: January, February, ... vs. 0, 1, ...
Typically map identifiers to integer values
Pascal
Identifiers can be used only once
I/O not predefined (user must write I/O operations for each new enum. type)
Major negative
Can be used in for loops and case statements

More Data Types
Ada
Ids can be overloaded as long as the correct definition can be determined through context
type colors is (red, blue, yellow, purple, green);
type signals is (red, yellow, green);
Now the literal green (or yellow or red) is ambiguous in and of itself
We must qualify it in the code to the desired type
for C in signals'(red) .. signals'(green)
I/O can be done through a generic I/O package; very helpful
Allows values to be output without having to write new functions each time

More Data Types
C++
Enum values are converted to ints by the compiler
Lose the type checking that Ada and Pascal provide
Java added Enum types in 1.5
All are a subclass of Enum, so are objects
Subrange types
Improve readability (new name id) and logic-error checking (restricting ranges)
Provided in Pascal and Ada

More Data Types
Arrays
Traditionally a homogeneous collection of sequential locations
Many issues to consider with arrays; a few are:
How are they indexed?
How/when is memory allocated?
How are multidimensional arrays handled?
What operations can be done on arrays?

More Data Types
Indexing
Mapping that takes an array ID and an index value and produces an element within the array
Y = A[X];
Range of index values depends on two things:
1) How many items are in the array?
This depends on how allocation is allowed in a language
We will discuss more with memory allocation
2) Are any bounds implicit?
C, C++, Java: arrays start at 0
Ada, Pascal: can start anywhere

More Data Types
Whether range of indexes is static or dynamic, the actual subscripts must be evaluated at run-time
Subscript could be a variable or expression whose value is not known until run-time
If value is not within the range of the array, a range-error occurs
VERY COMMON logic error made in programs
Checked in Ada, Pascal and Java
Not checked in FORTRAN, C, C++, much to the chagrin of programmers
Q: Why aren't range-errors checked in C, C++?

More Data Types


Memory allocation
Remember discussion of allocation during Binding
lecture
Basically those bindings hold, with some slight additions
Static: size of array is fixed and allocation is done at compile-time
Fixed stack-dynamic: size of array is fixed but
allocation is done on run-time stack during
execution
Pascal arrays
"Normal" C++ arrays

151

More Data Types


Stack-dynamic: size of array is determined at run-time and array is allocated on run-time stack; however, once allocated, size is fixed
Ada arrays
See prev. Ada array example

Heap-dynamic:
Explicit: programmer allocates and deallocates arrays
C++, Java, FORTRAN 90 dynamic arrays
Implicit: array is resized automatically as needed by the system
Perl, Java ArrayList

See examples on board
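The board examples are not reproduced; a hedged C++ sketch of three of the four categories follows (C++ has no implicit heap-dynamic arrays in the language proper, but the library type std::vector behaves like one):

    #include <vector>

    static int s_arr[100];           // static: size fixed, storage allocated before execution

    void f(int n) {
        int fixed[10];               // fixed stack-dynamic: fixed size, allocated on the stack at call time
        fixed[0] = s_arr[0];

        int* heap = new int[n];      // explicit heap-dynamic: programmer allocates...
        heap[0] = 1;
        delete[] heap;               // ...and deallocates

        std::vector<int> v;          // library type acting like an implicit heap-dynamic array
        v.push_back(fixed[0]);       // resized automatically as needed
    }

    int main() { f(5); return 0; }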

152

More Data Types


Multiple Dimension Arrays
2-D Array: array of vectors (a plane, or matrix)
3-D Array: array of matrices or planes
Higher dimension arrays are not so easily
envisioned, but in most languages are legal
Subscripts for each dimension can be different
ranges, and, in fact even different types
Ex. Pascal:
type matrix = array[1..10, 'A'..'Z'] of real;
Indexing function is a bit more complicated, as we will see next

153

More Data Types


Implementing Arrays (1-D)
Two questions we need to answer:
What information do we need at compile-time?
What information do we need at run-time?
The answers to these questions depend somewhat on other language properties
If dynamic type-binding is used, most information must be kept at run-time


Even with static type-binding, some information
may be needed at run-time

154

More Data Types


Let's assume static type-binding, since most languages use this


What do we need to know at compile-time?
Element type
Index type
Above two needed for type checking

Element size
Index range (lower and upper bounds)
Array base address
Above items are needed to construct the indexing function for the array

Let's see how the indexing function is created
Given an array A, we want a function to return the Lvalue (address) of the ith location of A


155

More Data Types


Assume:
E = Element size
LB = Lower index bound
UB = Upper index bound
S = start (base) address of array

Lvalue(A[i]) = S + (i - LB) * E
             = (S - LB * E) + (i * E)
Note that once S is known (array has been bound to memory), the left part of the equation is constant
So for each array access, only (i * E) must be calculated
If i is a constant, this too can be precalculated (ex. A[5])
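A minimal C++ check of the indexing function (names hypothetical); here LB = 0 and E = sizeof(int), and we compare our hand-computed address against the compiler's own:

    #include <cstdio>

    int main() {
        int A[10];
        char* S = reinterpret_cast<char*>(A);    // base (start) address of the array
        int i = 7;
        char* lv = S + (i - 0) * sizeof(int);    // Lvalue(A[i]) = S + (i - LB) * E
        std::printf("%d\n", lv == reinterpret_cast<char*>(&A[i]));  // prints 1: addresses match
        return 0;
    }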

156

More Data Types


What do we need to know at run-time?
This depends on what kind of checking we are doing
Ex: C/C++ need only the start address (S) of the array and the element size (E)


LB is always 0
UB is not checked

Ex: Ada, Pascal need more:


LB and UB needed for bounds checking

Run-time information for an array is typically stored in a run-time descriptor (sometimes called a dope vector)


This may be stored in the same memory block as the data itself (ex. at the beginning), or elsewhere in memory
157

More Data Types


Implementing Arrays (multi-D)
This is more complicated than 1-D, since computer memory is typically linear


Thus multi-D arrays are still stored physically as
1-D arrays
Language must allow them to be accessed in a multi-D way
Ex. Two-D arrays
In most languages stored in row-major order
Line up rows end-to-end to produce the linear physical array

158

More Data Types


Column-major order also exists; it is used in FORTRAN

Index function for 2-D arrays is a bit more complicated than for 1-D, but is still not that hard to calculate
lvalue(A[i,j]) = S + (i - LB1) * ROWLENGTH + (j - LB2) * E
where ROWLENGTH = (UB2 - LB2 + 1) * E
Example: Given int A[10][5], calculate the address of A[4][3] (worked below)
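Working it through under C conventions (LB1 = LB2 = 0, so UB2 = 4) and assuming 4-byte ints (E = 4; the element size varies by machine):

    ROWLENGTH = (UB2 - LB2 + 1) * E = (4 - 0 + 1) * 4 = 20
    lvalue(A[4][3]) = S + (4 - 0) * 20 + (3 - 0) * 4 = S + 80 + 12 = S + 92

So A[4][3] lives 92 bytes (element number 23) past the base address S.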

159

More Data Types


Can we access arrays in any other way?
In some languages (ex: C, C++) we can use
pointers for array access
In this case rather than calculating the index of the
location as an offset from the base address, we
move the pointer to the address of the location we
want to access
For sequential access of the items in an array, this
is much more efficient than using traditional
indexing
Of course, it is also dangerous!
See ptrarrays.cpp
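ptrarrays.cpp is not reproduced here; a minimal C++ sketch of the idea, advancing a pointer one element at a time instead of computing S + i * E on every access:

    #include <cstdio>

    int main() {
        int a[5] = {1, 2, 3, 4, 5};
        for (int* p = a; p != a + 5; ++p)   // ++p moves forward by one element (E bytes)
            std::printf("%d ", *p);         // dereference: no index arithmetic per access
        // Danger: nothing stops ++p from walking past a + 5
        return 0;
    }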

160

More Data Types


Array Operations
What can be done to array as a whole?
Assignment
Allowed in languages in which array is actually a data type (ex. Ada, Pascal, Java)


Usually bitwise copy between arrays (shallow copy)
C/C++ do not allow assignment since array variables are simply (constant) pointers (see the sketch after this list)
Comparison
Equal and not equal allowed in Ada
With Java references are compared

More complex operations are allowed in some languages (ex. APL)
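A small C++ sketch of the assignment point: raw arrays cannot be assigned, but std::array (a C++11 library struct wrapping a raw array) shows what whole-array assignment looks like when the array really is a first-class value:

    #include <array>

    int main() {
        int a[3] = {1, 2, 3};
        int b[3];
        // b = a;                          // error: raw array assignment is not allowed in C/C++
        b[0] = a[0];

        std::array<int, 3> c = {1, 2, 3};
        std::array<int, 3> d;
        d = c;                             // copies element-by-element (a shallow, bitwise-style copy)
        return 0;
    }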


161

More Data Types
Associative Arrays
Instead of being indexed on integers (or simple enumeration values), index on arbitrary keys (usually strings)
Ex: Perl hash (also exists in Smalltalk and Java) and PHP arrays


Java and other langs also have Hashtable
classes
Typically done following hashing procedures
Hash function used to map key to an index, mod the table size


Keys are dispersed in a random fashion, to
reduce collisions
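C++ gained a comparable library type later (std::unordered_map, a hash table added in C++11); a hedged sketch of indexing on arbitrary string keys:

    #include <unordered_map>
    #include <string>
    #include <cstdio>

    int main() {
        std::unordered_map<std::string, int> age;   // index on arbitrary string keys
        age["alice"] = 30;                          // key is hashed to a bucket index
        age["bob"] = 25;
        std::printf("%d\n", age["alice"]);          // lookup also goes through the hash function
        return 0;
    }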

162

More Data Types


Typically in these language-implemented hashes, the table size is not shown to the programmer
Programmer does not need to know details in order to use it
Usually it grows and shrinks dynamically as the table fills or becomes smaller
Resizing is expensive: all items must be rehashed into the new-sized table


Would like to not have to resize too often
Doubling size with each expansion is a good idea

163

More Data Types


Records
Heterogeneous collections of data, accessed by component names
Very useful in creating structured data types
Forerunners of objects (do not have the
operations, just the data)
Access fairly uniform across languages with dot
notation
Fields accessed by name since they may have varying size
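A minimal C++ record (struct) sketch, with heterogeneous fields of varying size accessed by dot notation (field names illustrative):

    #include <string>
    #include <cstdio>

    struct Student {            // heterogeneous collection of data
        std::string name;
        int         id;
        double      gpa;
    };

    int main() {
        Student s{"Ada", 1815, 4.0};
        std::printf("%s %d %.1f\n", s.name.c_str(), s.id, s.gpa);   // dot-notation access
        return 0;
    }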

164

More Data Types


Pointers
Variables that store addresses of other variables

[Diagram: a pointer variable containing the address 01001101, pointing to a memory cell at that address holding 3.14159]
In most high-level languages, their primary use
is to allow access of dynamic memory
Exception: C, where pointers are required for reference parameters and where they are also used heavily for array access
We'll talk about parameters later

165

More Data Types


To access the memory being pointed to, a pointer is dereferenced
Usually explicit:

C++ Ex: *P = 6.02;


Implicit in some contexts:
Ada Ex: rec_ptr.field
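Putting the explicit C++ case into a complete, hedged snippet:

    #include <cstdio>

    int main() {
        double* P = new double;    // P stores the address of a heap-dynamic double
        *P = 6.02;                 // explicit dereference: follow P to the memory it points to
        std::printf("%f\n", *P);
        delete P;
        return 0;
    }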

166

More Data Types


Pointer Issues:
Should pointers be typed like other variables?
How do the scope and lifetime of the pointer
variable relate (if at all) to the scope and lifetime of
the heap-dynamic variable?
What are the safety/access issues a programmer
using pointers must be concerned with?
How is heap-dynamic memory maintained?
What are reference types and how are they
similar/different to pointer types?

167

More Data Types


Types
Most languages with pointers require them to be typed
This allows type-checking of heap-dynamic memory

Scope/Lifetime/Safety/Access
In most languages the scope and lifetime of heap-dynamic variables are distinct from those of the pointer variables
If pointer variable is a stack-dynamic variable, its scope and lifetime are as we discussed previously
Heap-dynamic variables are also as we discussed previously

168

More Data Types


Thus we have the potential for 2 different problems:
1) Lifetime of pointer variable accessing a heap-dynamic variable ends, but heap-dynamic variable still exists
Now heap-dynamic variable is no longer accessible
This problem can also occur by simply reassigning the pointer variable
Now we have formed GARBAGE: a heap-dynamic variable that can no longer be accessed
Different languages handle garbage in different ways; we will discuss them shortly

169

More Data Types


2) Heap-dynamic variable is in some way deallocated (returned to system), but pointer variable is still storing its address
Now address stored in pointer is invalid
Can cause problems if it is dereferenced, especially if the memory has been reallocated for some other use
This is a dangling reference
Can be catastrophic to a program, as most C++
programmers know
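Both problems in one hedged C++ sketch; the final dereference is left commented out because executing it is undefined behavior, exactly the catastrophe described above:

    #include <cstdio>

    int main() {
        // Problem 1: garbage
        int* p = new int(1);
        p = new int(2);        // the first int is now unreachable: GARBAGE (leaked)

        // Problem 2: dangling reference
        int* q = p;
        delete p;              // heap-dynamic variable returned to the system...
        // *q = 3;             // ...but q still stores its address: DANGLING reference
        std::printf("done\n");
        return 0;
    }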

To deal with these problems, we must discuss how heap-dynamic memory is maintained

170
