Structure of
Programming
Languages
Part A
Goals of Course
To survey various programming languages and paradigms
To examine methods for describing language syntax and semantics
Syntax indicates the structure of program code
How can a language designer specify it?
How can a programmer learn the language?
How can a compiler recognize it?
Lexical analysis
Parsing (syntax analysis)
Brief discussion of parsing techniques
To examine some language features and constructs in detail:
Pointer (reference) types
Process abstraction: procedures and functions
Parameters and parameter-passing
Generic subprograms
Nonlocal environments and side-effects
Implementing subprograms
Subprograms in static-scoped languages
Subprograms in dynamic-scoped languages
Data abstraction and abstract data types
Object-oriented programming
Design issues
Implementations in various object-oriented
languages
Concurrency
Concurrency issues
Subprogram level concurrency
Implementations in various languages
Exception handling
Issues and implementations
IF TIME PERMITS
Functional programming languages
Logic programming languages
Why do we have programming languages?
Machine code is too difficult for us to read, write and maintain
Why so many programming languages?
Different people and companies developed them, often with different goals and domains in mind
Different domains favor different languages:
Business applications
COBOL
AI
LISP
Systems programming
C
Web programming
Perl
General purpose
C++, Ada, Java
Language evaluation criteria
Readability
How much can a non-author understand the logic of the code?
Writability
Not dissimilar to readability
How easy is it for a programmer to use the language effectively?
Can depend on the domain in which it is being used
Ex: LISP is very writable for AI applications, but would be much less so in other domains
Reliability
Two different ideas of reliability:
1. Programs are less susceptible to logic errors
2. Errors that do occur can be detected and dealt with when they occur
Language paradigms
Imperative languages
Based on variables, assignment, and iteration
Examples: FORTRAN, Pascal, C, Ada, C++, Java
Functional languages
Based on function application and recursion instead
Examples: LISP, Scheme
Logic languages
Based on propositions, rules and inferences
Programs are in a sense theorems
User enters a proposition, and the system uses the programmer's rules and propositions in an attempt to prove it
Typical outputs:
Yes: proposition can be established by the program
No: proposition cannot be established by the program
Cost
What are the overall costs associated with a given language?
How does the design affect that cost?
Writing programs
Is the language a good fit for the task?
Compiling programs
How long does it take to compile programs?
This is not as important now as it once was
Executing programs
How long does the program take to run?
Often there is a trade-off here
Ex: Java is slower than C++, but it has many run-time features (array bounds checking, security manager) that are lacking in C++
Language Implementation Issues
Compiling
Program is translated into machine code, in phases:
1) Lexical Analysis
2) Syntax Analysis
3) Semantic Analysis and Intermediate Code Generation
4) Code Generation
Interpreting
Program is executed in software, by an interpreter
Source-level instructions are executed by a virtual machine
Allows for robust run-time error checking and debugging
Penalty is speed of execution
Examples: some LISP implementations, Unix shell scripts and Web server scripts
Hybrid
First 3 phases of compilation are done, and the resulting intermediate code is then interpreted
Example: Java bytecodes interpreted by the Java Virtual Machine
Brief PL History
Mid 50s
FORTRAN
Lacked many features that we now take for granted in programming languages:
Conditional loops
Statement blocks
Recursive abilities
Many were added in later versions of FORTRAN
FORTRAN II, FORTRAN IV, FORTRAN 77, FORTRAN 90
Some original features are now considered obsolete
COMMON, EQUIVALENCE, GOTO
We may discuss what these are later
Late 50s
COBOL
COmmon Business Oriented Language
Developed under sponsorship of the US DoD
Separated data and procedure divisions
But didn't allow functions or parameters
LISP
LISt Processing
Developed by John McCarthy of MIT
Functional language
Good for symbolic manipulation, list processing
Had recursion and conditional expressions
Not in original FORTRAN
ALGOL
ALGOL 58 and then ALGOL 60, both designed by an international committee
Goals for the language:
Syntax should be similar to mathematical notation and readable
Should be usable for algorithms in publications
Should be compilable into machine code
Included some interesting features
Pass by value and pass by name (wacky!) parameters
Recursion (first in an imperative language)
Dynamic arrays
Block structure and local variables
Introduced Backus-Naur Form (BNF) as a way to formally describe syntax, for the first time
Never widely used, but influenced virtually all languages that followed
Late 60s
Simula 67
Designed for simulation applications
Introduced some interesting features
Classes for data abstraction
Coroutines for re-entrant subprograms
ALGOL 68
Emphasized orthogonality and user-defined data types
Not widely used
70s
Pascal
Developed by Niklaus Wirth
No major innovations, but due to its simplicity and consistent design it became widely used, especially for teaching
C
Systems programming language with weak type checking:
Void pointers
Coerces many types
Systems programmers love it
Language purists hate it
Easy to miss logic errors
Prolog
Logic programming
We discussed it a bit already
Still used somewhat, mostly in AI
May discuss in more detail later
80s
Ada
Developed over a number of years by the DoD
Goal was to have one language for all DoD applications
Especially for embedded systems
Very large language, difficult to program reliably and difficult to implement
Smalltalk
Designed and developed by Alan Kay
Concepts developed in the 60s, but the language did not come to fruition until the 80s
C++
Developed largely by Bjarne Stroustrup as an extension to C
Backward compatible with C
Perl
Developed by Larry Wall
Takes features from C as well as the scripting languages awk, sed and sh
Some features:
Regular expression handling
Associative arrays
Implicit data typing
Report generation
Evolved into the archetypal Web scripting language
Has many proponents and detractors
90s
Java
Interestingly enough, just like Ada, Java was designed with embedded systems in mind
Error-prone features of C++ were eliminated:
No explicit pointers or pointer arithmetic
Array bounds checking
Garbage collection to reclaim dynamic memory
Object model of Java actually more resembles that of Smalltalk than that of C++; for example, methods are dynamically bound by default
But not as pure in its OO features as Smalltalk, due in part to its primitive (non-object) types
00's (aughts? oughts? naughts?)
See http://www.randomhouse.com/wotd/index.pperl?date=19990803
C#
Main roots in C++ and Java, with some other influences as well
Used with the MS .NET programming environment
Some improvements and some "deprovements" compared to Java
Likely to succeed given MS support
Program Syntax
Recall the job of the syntax analyzer:
Groups (parses) tokens (fed in from the lexical analyzer) into meaningful units
How does it determine what is a valid construct and what is not?
How does it detect errors?
To answer these questions we must look at how languages are formally defined and recognized
Lexemes are categorized into different tokens
Ex: if (X < Y) { cout << X << endl; }
Lexemes: if, (, X, <, Y, ), {, cout, <<, X, <<, endl, ;, }
Tokens: iftok, lpar, idtok, lt, idtok, rpar, lbrace, idtok, llt, idtok, llt, idtok, semi, rbrace
Note that some tokens (ex: idtok) correspond to many different lexemes, while others correspond to a single lexeme
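The lexeme-to-token mapping above can be sketched as a small lexer. The token names follow the slides; the regex rules and the tokenize helper are my own illustration, not part of the course materials:

```python
import re

# Each token name is paired with a regex for its lexemes. Order matters:
# "iftok" must precede "idtok", and "llt" (<<) must precede "lt" (<).
TOKEN_SPEC = [
    ("iftok",  r"\bif\b"),
    ("llt",    r"<<"),
    ("lt",     r"<"),
    ("lpar",   r"\("),
    ("rpar",   r"\)"),
    ("lbrace", r"\{"),
    ("rbrace", r"\}"),
    ("semi",   r";"),
    ("idtok",  r"[A-Za-z_]\w*"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    # Return the token name for each lexeme, dropping whitespace.
    return [m.lastgroup for m in MASTER.finditer(source)
            if m.lastgroup != "skip"]

print(tokenize("if (X < Y) { cout << X << endl; }"))
# ['iftok', 'lpar', 'idtok', 'lt', 'idtok', 'rpar', 'lbrace',
#  'idtok', 'llt', 'idtok', 'llt', 'idtok', 'semi', 'rbrace']
```

Note how one token class (idtok) covers X, Y, cout and endl, while iftok covers only the single lexeme if.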
A language, L, is a set of strings of symbols from an alphabet, Σ
2 related techniques for defining L:
1) Recognition
An algorithm or mechanism, R, will process any given string, S, of lexemes and correctly determine if S is within L or not
Not used for enumeration of all strings in L
Used by the parser portion of a compiler
2) Generation
A generator produces valid sentences of L
Not as useful as recognition for compilation, but easier for people to understand and use when describing a language
So recognizers are what compilers need, but generators (grammars) are what language designers use to describe languages; fortunately, generators of certain classes can be converted systematically into recognizers
Language Generators
Grammar
A mechanism (or set of rules) by which a language is generated
Defined by the following:
A set of nonterminal symbols, N
A set of terminal symbols, T
A set of productions, P
Rules used in string generation
A starting symbol, S
Noam Chomsky described four classes of grammars, of increasing power: regular, context-free, context-sensitive, and unrestricted
Regular Grammars
Productions must be of the form:
<non> → <ter><non> | <ter>
where <non> is a nonterminal and <ter> is a terminal
Equivalent in power to finite state automata (FSA)
Regular grammars have the following properties (among others):
Can generate strings of the form α^n, where α is a finite sequence and n is an integer
Useful for pattern recognition
Example: a regular grammar to generate Pascal identifiers (a letter, followed by any sequence of letters and digits), with <l> a letter terminal and <d> a digit terminal:
<id> → <l><id2> | <l>
<id2> → <l><id2> | <d><id2> | <l> | <d>
Regular grammars cannot generate strings of the form a^n b^n, since this requires counting
If we could add a memory of some sort, we could; this gives the Push-Down Automaton (PDA)
An FSA with an added push-down stack
PDAs correspond to context-free grammars
So how do we generate a^n b^n? Let a = 0, b = 1
N = {A}  T = {0,1}  S = A
P =
A → 0A1 | 01
Similarly, a context-free grammar for balanced parentheses:
N = {A}  T = {(,)}  S = A
P =
A → AA | (A) | ()
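The PDA idea behind the balanced-parentheses grammar can be sketched with a single counter standing in for the stack (this recognizer is my own illustration, not from the course; it accepts exactly the nonempty balanced strings the grammar A → AA | (A) | () derives):

```python
def balanced(s):
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1          # push
        elif ch == ")":
            depth -= 1          # pop
            if depth < 0:       # pop from an empty stack: reject
                return False
        else:
            return False        # symbol outside the alphabet {(, )}
    # Accept only if the stack is empty; the grammar does not derive
    # the empty string, so reject "".
    return depth == 0 and len(s) > 0

print(balanced("(()(()))"))  # True
print(balanced("(()"))       # False
```

Because one counter suffices here, this language needs only the weakest form of stack memory; a general PDA can push arbitrary symbols.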
Context-free grammars are equivalent to BNF grammars
BNF: Backus-Naur Form
Developed by Backus and modified by Naur
Used initially to describe Algol 60
Example: leftmost derivation of nested parens (()(())):
A ⇒ (A)
  ⇒ (AA)
  ⇒ (()A)
  ⇒ (()(A))
  ⇒ (()(()))
Parse tree for (()(())):
A → ( A )
      A → A A
            A → ( )
            A → ( A )
                  A → ( )
(each indented line expands one nonterminal of the line above)
If, for a given grammar, some string can be derived with two or more distinct parse trees, the grammar is ambiguous
Some languages are inherently ambiguous, but many other languages are not themselves ambiguous even when a particular grammar for them is; often the grammar can be rewritten so that it is unambiguous
This matters because semantics are often based on syntactic form
Ambiguous grammar example: generate strings of one or more 0s followed by one or more 1s
P =
A → BC | 0A1
B → 0B | 0
C → 1C | 1
The string 0011 has two distinct parse trees: one using A → BC at the root (B deriving 00 and C deriving 11), and one using A → 0A1 at the root (the inner A deriving 01 via BC), so the grammar is ambiguous
[two parse-tree diagrams in the original]
We can easily make this grammar unambiguous:
Remove production A → 0A1
Note that nonterminal B can still generate an arbitrary run of 0s (and C a run of 1s), so no strings are lost and each string now has exactly one parse tree
[parse-tree diagram in the original]
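The ambiguity claim can be checked mechanically by counting parse trees. This brute-force counter is my own sketch (not from the course); it tries every production and every split of the string:

```python
# The ambiguous grammar from the slides: A -> BC | 0A1, B -> 0B | 0,
# C -> 1C | 1. Symbols not in GRAMMAR are terminals.
GRAMMAR = {
    "A": [["B", "C"], ["0", "A", "1"]],
    "B": [["0", "B"], ["0"]],
    "C": [["1", "C"], ["1"]],
}

def count_parses(sym, s):
    # Number of distinct parse trees deriving s from sym.
    if sym not in GRAMMAR:                 # terminal: must match exactly
        return 1 if s == sym else 0
    return sum(count_splits(rhs, s) for rhs in GRAMMAR[sym])

def count_splits(rhs, s):
    # Ways to split s among the symbols of one right-hand side.
    if not rhs:
        return 1 if s == "" else 0
    first, rest = rhs[0], rhs[1:]
    total = 0
    for i in range(len(s) + 1):
        left = count_parses(first, s[:i])
        if left:
            total += left * count_splits(rest, s[i:])
    return total

print(count_parses("A", "0011"))  # 2 parse trees -> ambiguous
print(count_parses("A", "01"))    # 1
```

Removing the production A → 0A1 from GRAMMAR drops the count for "0011" back to 1, matching the slide's fix.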
Let's look at a few more examples
Grammar to generate {WW^R | W ∈ {0,1}+} (even-length binary palindromes):
N = {A}  T = {0,1}  S = A
P =
A → 0A0 | 1A1 | 00 | 11
Grammar to generate: strings in {0,1}* of even length that are not of the form WW
See board
Ok, now how do we make a grammar to do this?
Make every such string (even length) the concatenation of two odd-length strings: one with a 1 at its center and one with a 0 at its center
Productions:
In → AB | BA
A → 0A0 | 1A1 | 1A0 | 0A1 | 1
B → 0B0 | 1B1 | 1B0 | 0B1 | 0
Let's look at an example more relevant to programming languages:
A grammar to generate simple assignment statements, using nonterminals such as <assign stmt>, <var>, <arith expr>, <term>, <primary>, <id> and <num>
Parse tree for: X = (A[2] + Y) * 20
[parse-tree diagram in the original: <assign stmt> derives <var> = <arith expr>; the <arith expr> derives <term> * <primary>, where the <primary> on the left expands through a parenthesized <arith expr> to <var> + <term> for A[2] + Y, and the final <primary> derives the <num> 20]
Wow, that seems like a very complicated parse tree for such a small statement
Extra non-terminals are often necessary to avoid ambiguity and to create operator precedence
Precedence in the previous grammar has * and / higher than + and -
Higher-precedence operators appear lower in the parse tree, so they are applied first
But context-free grammars cannot generate everything
Ex: strings of the form WW in {0,1}*
Cannot guarantee that an arbitrary string is the same on both sides
Compare to WW^R
These we can generate from the middle and build out in each direction
For WW we would need separate productions for each side, and we cannot coordinate the two with a context-free grammar
Need context-sensitive grammars in this case
Let's look at one more grammar example
Grammar to generate all postfix expressions over <id>, * and -:
T = {<id>, *, -}
N = {A}
S = A
P =
A → AA* | AA- | <id>
Show parse tree for previous example
Is this grammar LL(1)?
We will discuss what this means soon
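Strings of this postfix language can be processed with a stack, mirroring how the grammar builds them. This evaluator is my own sketch (not from the course), with numeric literals standing in for <id>:

```python
def eval_postfix(tokens):
    # Evaluate a postfix expression over * and -, e.g. "8 2 - 3 *".
    stack = []
    for tok in tokens:
        if tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok == "-":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        else:                       # an <id>: here, an integer literal
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("not a valid postfix expression")
    return stack[0]

print(eval_postfix("8 2 - 3 *".split()))  # (8 - 2) * 3 = 18
```

Each A → AA* (or AA-) production corresponds to popping two operand values and pushing one result, which is why postfix needs no parentheses.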
Parsers
Ok, we can generate languages, but how do we recognize them?
We need to convert our generators into recognizers, or parsers
We know that a context-free grammar corresponds to a Push-Down Automaton (PDA)
However, the PDA may be non-deterministic
As we saw in the examples, to create a parse tree we may have to guess a few times before we get the correct answer
This does not lend itself to programming language parsing
We'd like the parser to never have to guess
There are two general categories of parsers
Bottom-up parsers
Can parse any language generated by a deterministic PDA
Build the parse trees from the leaves up to the root as the tokens are processed
At each step, a substring that matches the right-hand side of some production is reduced to that production's left-hand-side nonterminal
Correspond to LR(k) grammars
Powerful, but complex; typically too tedious to implement by hand
Top-down parsers
Build the parse tree from the root down as the tokens are processed
Correspond to LL(k) grammars
Implementing a top-down parser
One technique is Recursive Descent
Can think of each production as a function
As the string of tokens is parsed, terminal symbols are matched and consumed
Left recursion cannot be handled
From Example 3.4:
<expr> → <expr> + <term>
Recursion will continue indefinitely without consuming any symbols
Luckily, in most cases a grammar with left recursion can be rewritten into an equivalent grammar without it
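The left-recursion rewrite can be seen in code. Here the rule <expr> → <expr> + <term> is replaced by the loop form <expr> → <term> { + <term> }; this recursive-descent sketch is my own illustration (terms are simplified to integer literals, which is not the textbook's full grammar):

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok):
        # Match and consume one terminal symbol.
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # <expr> -> <term> { + <term> }: iteration replaces left recursion,
        # so each step consumes at least one token and cannot loop forever.
        value = self.term()
        while self.peek() == "+":
            self.eat("+")
            value += self.term()
        return value

    def term(self):
        # <term> -> an integer literal, for simplicity.
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError("expected a number")
        self.eat(tok)
        return int(tok)

print(Parser("1 + 2 + 3".split()).expr())  # 6
```

With the original left-recursive rule, expr() would call itself before consuming any token; the loop form makes the guessing-free, single-token-lookahead style possible.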
So how can we tell if a grammar is LL(1)?
Given the current non-terminal (or left side of a production) and the next input token, we must always be able to choose uniquely among the alternative productions
As we previously mentioned, the grammar must also be free of left recursion
LL(1) Grammars
Ex:
A → aX | aY
We cannot determine which right side to follow without more information than just "a"
How can we process a grammar to determine if it is LL(1)? One tool is the First() set of each right side
So how do we calculate First() sets?
Algorithm is given in Aho (see Slide 2)
Consider symbol X:
If X is a terminal, First(X) = {X}
If X → ε is a production, add ε to First(X)
If X is a nonterminal and X → Y1 Y2 … Yk is a production:
Add a to First(X) if, for some i, a is in First(Yi) and ε is in all of First(Y1) … First(Yi-1)
Add ε to First(X) if ε is in First(Yj) for all j = 1, 2, …, k
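This computation can be sketched directly from the rules above. The grammar encoding (alternatives as tuples, with "" standing for ε) is my own, not the textbook's notation:

```python
EPS = ""   # stands for epsilon

def first_sets(grammar):
    # Iterate the three rules until no First() set changes.
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        return first[sym] if sym in grammar else {sym}  # terminal: {itself}

    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                if alt == ():                 # X -> epsilon
                    add = {EPS}
                else:
                    add = set()
                    for y in alt:             # scan Y1 Y2 ... Yk
                        fy = first_of(y)
                        add |= fy - {EPS}
                        if EPS not in fy:     # Yi cannot vanish: stop
                            break
                    else:                     # epsilon in every First(Yj)
                        add.add(EPS)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

g = {"A": [("a", "B"), ("C", "D")],
     "B": [("b",)],
     "C": [("c",), ()],       # C -> c | epsilon
     "D": [("d",)]}
print(first_sets(g)["A"])     # {'a', 'c', 'd'}: C can vanish, exposing D
```

Since the alternatives of A have disjoint First() sets ({a} and {c, d}), one token of lookahead picks the production, which is exactly the LL(1) condition for this nonterminal.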
Ex: are these grammars LL(1)?
A → aB | b | cBB
B → aB | bA | aBb
A → aB | CD | E | ε
B → b
C → cA | ε
D → dA
E → dB
Semantics
Semantics indicate the meaning of a program
What do the symbols just parsed actually say to do?
Two different kinds of semantics:
Static Semantics
Almost an extension of program syntax
Deals with structure more than meaning, but at a meta level
Handles structural details that are difficult or impossible to handle with the parser
Ex: Has variable X been declared prior to its use?
Ex: Do variable types match?
Dynamic Semantics (often just called semantics)
What does the syntax mean?
Ex: Control statements
Ex: Parameter passing
Programmer needs to know the meaning of statements
Static Semantics
One technique for determining/checking static semantics is the attribute grammar: a grammar augmented with attributes, semantic functions and predicate functions
Semantic functions
Allow attribute values to be determined
Predicate functions
Indicate the static semantic rules
Attributes are made up of synthesized attributes, computed from a node's children and passed up the parse tree, and inherited attributes, passed down from parents and siblings
Semantic Functions
Indicate how attributes are derived, based on the static semantics of the language
Ex: A = B + C;
Assume A, B and C can be integers or floats
If B and C are both integers, the RHS result type is integer; otherwise it is float
Predicate functions
Test attributes of the symbols processed to see if they match those defined by the language
Ex: A = B + C
If the RHS type attribute is not equal to the LHS type attribute, a static semantic error is reported
Detailed example in text
Grammar rules: four productions covering assignment, expressions and variables (see text)
Attributes:
actual_type: actual type of <var> or <expr>, synthesized from below
expected_type: type expected for <expr>, inherited from the LHS <var>
Predicate:
<expr>.actual_type == <expr>.expected_type
Ex: A = B + C
[attributed parse tree in the original: <assign> at the root; the actual_type of <var> A becomes the expected_type of the <expr>; the <expr> for B + C synthesizes its actual_type from the actual_types of <var> B and <var> C]
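The attribute flow for A = B + C can be sketched in a few lines. The type names and symbol-table shape here are my own assumptions for illustration, not the textbook's exact formulation:

```python
def rhs_actual_type(t1, t2):
    # Semantic function: synthesize the type of B + C from its operands.
    # If both operands are int the result is int; otherwise float.
    return "int" if t1 == "int" and t2 == "int" else "float"

def check_assign(symbol_table, lhs, op1, op2):
    # expected_type is inherited from the LHS <var>; actual_type is
    # synthesized from the operands; the predicate compares the two.
    expected = symbol_table[lhs]
    actual = rhs_actual_type(symbol_table[op1], symbol_table[op2])
    return actual == expected

syms = {"A": "float", "B": "int", "C": "float"}
print(check_assign(syms, "A", "B", "C"))   # True: int + float -> float
syms2 = {"A": "int", "B": "int", "C": "float"}
print(check_assign(syms2, "A", "B", "C"))  # False: static semantic error
```

Note the direction of each attribute: expected flows down (inherited), actual flows up (synthesized), and the predicate fires where they meet.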
Attribute grammars are useful, but not typically applied in full to real languages; the number of attributes and rules quickly becomes unwieldy
Dynamic Semantics (semantics)
Clearly vital to the understanding of the language
In early languages they were given simply as informal descriptions, like manual pages
Efforts have been made in later years to formalize semantics, just as syntax has been formalized
But semantics tend to be more complex and less precisely defined
More difficult to formalize
Some techniques have gained support, however:
Operational Semantics
Define meaning by the result of execution on a primitive (real or idealized) machine
Axiomatic Semantics
Assertions (preconditions and postconditions) attached to statements
Used in conjunction with proofs of program correctness
Denotational Semantics
Map syntactic constructs into mathematical objects
Names
Used to identify entities within a program
Most languages have similar rules for identifiers, but not always
C++ and Java are case-sensitive, while Ada is not
Can be a good thing: mixing case allows for longer, more readable names
Reserved Words
Name whose definition is part of the syntax of the language
Cannot be used by the programmer in any other way
Keywords
To some, keyword is synonymous with reserved word
Ex: C++, Java
Predefined Identifiers
Identifiers defined by the language
Changing predefined entities can break an existing application
Ex: Change a Java interface to include an extra method; existing classes that implement the old interface no longer compile
Variables
Simple (naive) definition: a name for a memory location
In fact, it is really much more
Six attributes:
Name
Address
Value
Type
Lifetime
Scope
Name
An identifier
In most languages the same name may be used for different variables in different scopes
Address
Location in memory
Also called the l-value
Some situations that are possible:
Different variables with the same name have different addresses
Declared in different blocks of the program
The same variable name bound to different addresses at different points in time
Declared in a subprogram and allocated based on the run-time stack
We'll discuss this more shortly
Different variables can share the same address: aliasing
Occurs with FORTRAN EQUIVALENCE, Pascal and Ada variant records, and with shallow copies of objects that have pointer components in C++
Ex: shallow copy of arrays in Java
More on this in later chapters
Type
Modern data types include both the structure of the data and the operations allowed on it
Value
Contents of the memory location associated with the variable
Also called the r-value
Lifetime
Time during which the variable is bound to a specific memory location
Scope
Range of statements in which the variable is visible
Accessible to the programmer/code in that section
Binding
Binding of variables deals with the association of each attribute with the variable, and when that association occurs
Name:
Occurs when the program is written (chosen by the programmer)
We are ignoring bindings associated with a language's design or implementation
Text puts lifetimes of variables into 4 categories:
Static: bound to the same memory cell for the entire program
Ex: static C++ variables
Stack-dynamic: bound when the declaration is elaborated, on the run-time stack
Ex: Pascal, C++, Ada subprogram variables
Explicit heap-dynamic: allocated and deallocated on demand by the programmer
Ex: result of new in C++ or Java
Implicit heap-dynamic variables
Binding of all attributes (except name) changed upon each assignment
Much overhead to maintain all of the dynamic information
Used in Algol 68 and APL
Not used in most newer languages
See lifetimes.cpp
Value
Binding of a value is dynamic by the nature of a variable: the value can change with each assignment
Type
Dynamic Binding
Type associated with a variable is determined at run-time
A single variable could have many different types at different points in a program
Static Binding
Type associated with a variable is determined at compile-time, and is fixed throughout the program
Advantages of dynamic binding
More flexibility in programming
Can use the same variable for different types
Can make operations generic
The disadvantages concern type-checking, discussed next
Type-checking
Determination that the operands and operators in expressions are of compatible types
Why is type-checking important?
Let's review programming errors:
Compilation error: detectable when the program is compiled
Usually a syntax error or static semantic error
Run-time error: detectable when the program is run
Often an illegal instruction or I/O error
Logic error: the program runs, but its behavior is incorrect
We'd like an environment that is not conducive to logic errors
Consider dynamic type binding again
Assignments cannot be statically type checked
Since the type may change, any assignment is legal
If the object being assigned is of an erroneous type, we have a logic error
Type checking that is done must be done at run-time
This requires type information to be stored and accessed at run-time
Must be done in software, i.e. the language must be interpreted
Increases both memory and run-time overhead
Now consider static type binding
Since types are set at compile-time, most (but not usually all) type checking can be done at compile-time
Assignments can be checked to avoid logic errors
Type information does not need to be kept at run-time
Program can run directly in hardware
STRONGLY TYPED language
2 slightly different definitions:
Traditional definition: if ALL type checking can be done at compile-time (i.e. statically), a language is strongly typed
Sebesta definition: if ALL type errors can always be detected (either at compile-time or at run-time), a language is strongly typed
First definition is more reliable but also more restrictive
No commonly used languages are truly strongly typed; most have loopholes, such as unions and variant records
See union.cpp and variant.adb
Pascal also has a discriminated variant record, but does not enforce checking of the discriminant
More on this in Chapter 6
So what does compatible mean?
We said type-checking involves determining if operands are of compatible types
The type of each variable is kept in a symbol table
Compare the types when necessary
Name compatible (equivalent)
Types are compatible only if they have the same type name
Somewhat limiting for the programmer
Ex: in Pascal, two identically structured but separately named array types are not compatible
Structurally compatible (equivalent)
The types are compatible if they have the same structure
Ex:
A1: array[1..10] of integer;
A2: array[1..10] of integer;
A2 := A1; { this would be allowed }
Since both have the same size and base type, it works
Much more flexible than name compatibility, but also harder to implement, and it can treat types the programmer intends to be distinct as interchangeable
Structural equivalence raises tricky questions, for example: are these two records compatible, given that the components are reversed?
record
X: float;
A: array[1..10] of int;
end record
record
A: array[1..10] of int;
X: float;
end record
And is A1: array[1..10] of float; compatible with an array whose index range is changed, ex. array[2..11] of float?
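One way to make these questions concrete is to write the checker and see where design choices are forced. This sketch is my own; the type representation and both "judgment call" choices are assumptions, not rules from the course:

```python
def struct_equiv(t1, t2):
    # Primitive types are plain strings; equal only if the same primitive.
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2
    kind1, kind2 = t1[0], t2[0]
    if kind1 != kind2:
        return False
    if kind1 == "array":                     # ("array", low, high, elem)
        _, lo1, hi1, e1 = t1
        _, lo2, hi2, e2 = t2
        # Judgment call: compare sizes, not exact index bounds.
        return (hi1 - lo1) == (hi2 - lo2) and struct_equiv(e1, e2)
    if kind1 == "record":                    # ("record", [(name, type), ...])
        f1, f2 = t1[1], t2[1]
        # Judgment call: field order matters, field names do not.
        return len(f1) == len(f2) and all(
            struct_equiv(a[1], b[1]) for a, b in zip(f1, f2))
    return False

a1 = ("array", 1, 10, "float")
a2 = ("array", 2, 11, "float")
print(struct_equiv(a1, a2))   # True under the size-based choice
r1 = ("record", [("X", "float"), ("A", ("array", 1, 10, "int"))])
r2 = ("record", [("A", ("array", 1, 10, "int")), ("X", "float")])
print(struct_equiv(r1, r2))   # False: field order differs here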
Type Checking
So how about some languages?
Ada: name equivalence, but allows subtypes to be compatible with parent types
Pascal: almost name equivalence, but considers variables in the same declaration to also be of the same type, and allows one type name to be set equal to another
A1, A2: array[1..10] of integer; { compatible in Pascal; the same pair is not compatible in Ada }
type newint = integer;
C++: name equivalence, but a lot of coercion is performed, which weakens the checking
Binding
Ok, back to binding
Scope (visibility)
Static scope: determined at compile-time, from the program text
Dynamic scope: determined at run-time, from the sequence of calls
Implications:
Most languages have the notion of local variables, visible in the block where they are declared
Scope becomes significant when dealing with non-local variables
How and where are these accessible?
Examples:
Pascal
Subprograms are the only scope blocks, but they can be nested to arbitrary depth
All declarations must be at the beginning of a subprogram
Somewhat restrictive, although not the fault of static scope
Ada
Subprograms can be nested
Also allows declare blocks to nest scope within the same subprogram
Useful for variable-length arrays, since the bounds can be computed on entry to the subprogram or declare block
See arraytest.adb
C++
Subprograms CANNOT be nested
New declaration blocks can be made with {}
Declarations can be made anywhere within a block, and are visible from there to the end of the block
Interesting note:
What about the scope of loop control variables in for loops?
Ada always implicitly declares LCVs, and their scope (and lifetime) is limited to the loop body
Dynamic scope
Non-local variables are accessed via the sequence of calls in effect at run-time: the most recent active declaration of the name is used
Type-checking must be dynamic, since the types of non-local names are not known until run-time
Scope and lifetime are related, but not directly comparable
Scope deals with visibility, i.e. WHERE a variable can be accessed
Lifetime deals with existence and association of memory, i.e. WHEN
They are often, but not always, the same
Primitive Data Types
Floating point
Stored in binary
Often an approximation of the number
Only a limited number of bits for the mantissa
The (binary) point "floats"
Many numbers cannot be represented exactly:
Irrational numbers (ex. pi, e, 2^(1/2))
Infinitely repeating decimals (ex. 1/3)
Some terminating decimals as well (ex. 0.1)
Instead we use digits of precision
In most new computers, this is defined by IEEE Standard 754
See p. 255 in text
See rounding.cpp
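The 0.1 case is easy to see in any IEEE 754 language; this quick demonstration parallels what rounding.cpp presumably shows:

```python
# 0.1 has no exact binary representation, so repeated addition drifts.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)              # False
print(abs(total - 1.0) < 1e-9)   # True: close, but not exact
```

Ten additions of the nearest-representable value to 0.1 accumulate rounding error, which is why equality comparisons of floats are a classic source of logic errors.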
Decimal
Digits are encoded in binary
Similar to strings, but we can put 2 digits into one byte, with the length and scale factor kept in the descriptor
Scale factor indicates how many places over to locate the decimal point
Ex: X = 102.53 and Y = 32.65
Stored as 10253 and 3265 (in binary) with a scale factor of 2
Z = X + Y
Add the integers and keep the same scale factor (involves normalization if the numbers of decimal places are not the same; think about this)
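The addition and the normalization step can be sketched with plain integers (the helper names and the integer-plus-scale representation are my own illustration, not a real decimal implementation):

```python
def normalize(val, scale, target_scale):
    # Shift a scaled integer to a larger scale by multiplying by 10s.
    return val * 10 ** (target_scale - scale)

def dec_add(v1, s1, v2, s2):
    # Normalize both operands to the larger scale, then add the integers.
    s = max(s1, s2)
    return normalize(v1, s1, s) + normalize(v2, s2, s), s

total, scale = dec_add(10253, 2, 3265, 2)   # 102.53 + 32.65
print(total, scale)                          # 13518 2  -> 135.18
print(dec_add(10253, 2, 5, 0))               # 102.53 + 5 -> (10753, 2)
```

Because the digits are kept exactly, 102.53 + 32.65 is exact here, unlike the binary floating-point case above; the price is the size limit imposed by the underlying integer.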
But decimal is clearly limited in size by the integer size
Boolean
Used in most languages (except C)
Values are true or false
In C, any nonzero value is treated as true, 0 is false
This adds flexibility, but can cause logic errors
Recurring theme!
Remember assign.cpp
Characters
Encoded in binary using a character set
ASCII was not always standardized, which caused trouble for transferring text files
Unicode, a 16-bit code, may dominate soon
Even within ASCII, line endings differ: Unix uses <LF>, while DOS/Windows uses the <CR><LF> combination
Can cause problems for programs not equipped to handle the difference
We can easily convert, however:
FTP in ASCII mode
Simple script to convert
Strings
Issues to consider:
Should a string be considered a single entity, or a collection of characters?
Should the length of a string be fixed or variable?
Which operations can be used?
More Data Types
Single entity vs. collection of characters
More an issue of access than of structure
In languages with no primitive string type, a string is typically an array of characters
String length
3 variations:
1) Static (fixed) length: the length of the string is set when it is created
String objects must have a fixed length, but they are all of the same type (ex: for params)
See astrings.adb and pstrings.p
See cstrings.cpp
2) Dynamic (variable) length: the string can grow and shrink as needed
Used in Perl, Java StringBuffer, C++ string class
3) Limited dynamic length: the length can vary only up to a fixed maximum (among others, C strings work this way, marking the end with a terminator rather than storing a length)
If the length must remain constant, we cannot have operations that change a string's length
Enumeration types
Programmer enumerates the legal values, which improves readability and reliability
Built-in conversion of values to their names is very helpful: allows values to be output without having to write new code
C++
Enum values are converted to ints by the compiler
Lose the type checking that Ada and Pascal provide
Java added Enum types in 1.5
All are a subclass of Enum, so they are objects
Subrange types
Improve readability (a new name identifies the range) and allow range checking
Arrays
Homogeneous collections of elements, stored in sequential locations
Many issues to consider with arrays; a few are:
How are they indexed?
How/when is memory allocated?
How are multidimensional arrays handled?
What operations can be done on arrays?
Four categories, based on how memory is allocated:
Static: size of the array is fixed and allocation is done at compile-time
Fixed stack-dynamic: size of the array is fixed, but allocation is done on the run-time stack during execution
Pascal arrays
"Normal" C++ arrays
Heap-dynamic:
Explicit: programmer allocates and deallocates arrays
C++, Java, FORTRAN 90 dynamic arrays
Implicit: array is resized automatically as needed by the system
Perl, Java ArrayList
Accessing array elements
The array descriptor stores:
Element size, E
Index range (lower bound LB and upper bound)
Array base address, S
The above items are needed to construct the indexing function for the array:
Lvalue(A[i]) = S + (i - LB) * E
             = (S - LB * E) + (i * E)
Note that once S is known (the array has been bound to memory), the left part of the equation is constant
So for each array access, only (i * E) must be calculated
If i is a constant, this too can be precalculated (ex. A[5])
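The indexing function and the constant-folding trick can be checked directly (addresses are plain integers in this sketch of mine, not real memory):

```python
def lvalue(base, lb, elem_size, i):
    # Lvalue(A[i]) = S + (i - LB) * E
    return base + (i - lb) * elem_size

def lvalue_folded(base, lb, elem_size, i):
    # Same address via (S - LB*E) + i*E; the first part is computed
    # once, when the array is bound to memory.
    constant = base - lb * elem_size
    return constant + i * elem_size

print(lvalue(1000, 1, 4, 5))         # 1016: A[5] for 4-byte elements, LB = 1
print(lvalue_folded(1000, 1, 4, 5))  # 1016: identical address
```

The folded form shows why each access costs only one multiply and one add at run-time, and why A[5] with a constant index can be resolved entirely at compile-time.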
Multidimensional arrays must be mapped onto linear memory in some way
Ex. two-D arrays
In most languages, stored in row-major order
Line up the rows end-to-end to produce the linear physical array
More Data Types
Associative Arrays
Instead of being indexed on integers (or another ordinal type), elements are indexed by keys
Usually implemented with a hash table, hidden behind access procedures
The programmer does not need to know the table size to use it
Usually it grows and shrinks dynamically as the table fills or becomes smaller
Resizing is expensive: all items must be rehashed
Records
Heterogeneous collections of data, accessed via component names
Very useful in creating structured data types
Forerunners of objects (they do not have the operations, just the data)
Access is fairly uniform across languages, with dot notation
Fields are accessed by name, since they may have varying sizes
Pointers
The value of a pointer is a memory address
[diagram in the original: a pointer cell holding the address 01001101 of a cell containing 3.14159]
In most high-level languages, their primary use is to allow access to dynamic memory
Exception: C, where pointers are required for much more (ex. arrays and strings)
To access the data pointed to, the pointer must be dereferenced
Usually explicit:
Ex: *P in C++, P.all in Ada
Pointers are typically typed
This allows type-checking of heap-dynamic memory
Scope/Lifetime/Safety/Access
In most languages the scope and lifetime of a heap-dynamic object differ from those of the pointer variables that access it
This mismatch can lead to two classic problems:
1) The lifetime of a pointer variable accessing a heap-dynamic object may end before the object's, leaving garbage: memory that can never be reclaimed
2) The object's lifetime may end before the pointer's, leaving a dangling pointer that refers to deallocated memory