
Symbol Tables

CS331 Symbol Tables

Types and scope


A symbol table is a scratch pad where the compiler keeps information about the objects in a program:
    Variables (storage areas)
    Functions, procedures

It enables the compiler to do type checking and to determine/control the scope of a variable.
There are no type compatibility constraints or scoping rules at run time.

Strongly Typed Language


No type error will occur when the program runs.
A type system is said to be strongly typed if it passes only type-safe programs.
A language is strongly typed if its compiler is strongly typed.
C is not strongly typed.
Pascal and Java are strongly typed.

Scope
Scope rules of a language specify which declaration of a variable is associated with a specific occurrence of the variable.
Scope rules apply to variables, constants, new type definitions and functions.

Static Scope
Also called lexical scope.
Lexical scope rules specify the association of variables with declarations based on examination of the source code alone.
The binding of variables to declarations is done at compile time.
Pascal and C use static scoping rules.

Dynamic Scope
The binding of variables to declarations is done at run time.
Dynamic scoping can be achieved by copying a function verbatim at the place of call.
Pure Lisp and early Lisp dialects, as well as Snobol and APL, use dynamic scoping rules (Common Lisp is lexically scoped by default).

Static vs Dynamic
#include <stdio.h>

int i = 10;

void fun1() {
    printf("Inside fun1 %d\n", i);
}

void fun2() {
    int i = 20;
    fun1();
}

int main() {
    fun1();
    fun2();
    fun1();
}

Static Scope
    Inside fun1 10
    Inside fun1 10
    Inside fun1 10

Dynamic Scope
    Inside fun1 10
    Inside fun1 20
    Inside fun1 10
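C itself is statically scoped, so the dynamic-scope output cannot be produced by compiling the program as written. The following is a minimal sketch of how run-time binding could be simulated, assuming a hand-maintained stack of bindings for the single name i. The helper names (bind_i, unbind_i, lookup_i) are hypothetical and introduced only for illustration; fun1 and fun2 return the value they would print.

```c
#define MAX_BINDINGS 16

/* Run-time stack of bindings for the single name "i" (hypothetical helper). */
static int binding_stack[MAX_BINDINGS];
static int binding_top = -1;   /* index of the most recent binding */

/* Push a new binding; this sketch does no overflow checking. */
static void bind_i(int value) { binding_stack[++binding_top] = value; }

/* Pop the most recent binding when its function activation ends. */
static void unbind_i(void) { --binding_top; }

/* Dynamic lookup: the newest binding alive at the moment of the call. */
static int lookup_i(void) { return binding_stack[binding_top]; }

/* fun1 returns the value of i it would print under dynamic scoping. */
static int fun1(void) { return lookup_i(); }

/* fun2 declares its own i = 20 for the duration of its activation. */
static int fun2(void) {
    bind_i(20);
    int seen = fun1();
    unbind_i();
    return seen;
}
```

After the global binding is established with bind_i(10), calling fun1, fun2, fun1 in that order yields 10, 20, 10, which matches the dynamic-scope column. Under static scoping, fun1 always refers to the global i regardless of who calls it.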

Binding & Bound


Bound occurrence: an occurrence (use) of a variable/function
Binding occurrence (declaration): gives a new identity to the variable
Scope rules determine which binding occurrence is associated with a bound occurrence of a variable, thereby specifying the scope of a binding.

int i = 10;              // Binding Occurrence
{
    int i = 20;          // Binding Occurrence
    i = i + 2;           // Bound Occurrence
    printf("%d\n", i);   // Bound Occurrence
}

Declaration Vs. Definition

A declaration specifies the type of the variable.
A definition allocates storage for the variable.

Blocks
A set of statements enclosed within blocking symbols (BEGIN and END, { and }, etc.) is called a block (compound statement).
Blocks nest inside other blocks.
Blocks are either disjoint or nested.
A block-structured language allows procedures/functions to nest within other procedures/functions.

Most-Closely-Nested Rule
An occurrence of a variable is associated with the innermost enclosing declaration of that variable.
Alternatively, a variable is bound to a binding of that variable (is in its scope) until the enclosing block ends, provided there are no re-declarations of that variable.

int I = 1;

void fun1() {
    printf("%d\n", I);        // prints 1: the global I
    {
        int I = 3;
        printf("%d\n", I);    // prints 3: the innermost I
    }
}

Local Variables
Also called automatic variables.
Formal parameters are essentially local variables of the function.
Actual parameters are the items passed in.

Lifetime
The lifetime of a local variable is the function activation.
The lifetime of a static variable is the whole program execution.

Scope in C
The scope of a declaration in C is either local or global.
A block in C can have declarations (in C89, only at the beginning, though).
All storage for a procedure is allocated up front.
The scope extends from the point of declaration onwards until the end of the block.

Scopes in C++ and Java


Scopes:
    Local
    Function
    File
    Class

A variable can be declared anywhere in a block, not just at the beginning of the block.
The scope extends from that point onwards until the end of the block.

Scope Resolution

The scope resolution operator ( :: )
From a method, one can access a global variable that has been redeclared locally.

Example
int i = 0;   // (or extern int i;)

class TestC {
public:
    int i;
    TestC() { i = 10; }
    void fun1() {
        printf("In TestC fun1: %d\n", i);
        printf("In TestC fun1, Class: %d\n", TestC::i);
        printf("In TestC fun1, Global:%d\n", ::i);
    }
    void fun2();
};

void TestC::fun2() {
    int i = 20;
    printf("Inside TestC fun2, Local i: %d\n", i);
    printf("Inside TestC fun2, TestC i: %d\n", TestC::i);
    printf("Inside TestC fun2, Global i: %d\n", ::i);
}

Example
#include <stdio.h>
#include "TestC.cpp"

int main() {
    int i = 18;
    TestC *test = new TestC();
    test->fun1();
    test->fun2();
}

Output:
In TestC fun1: 10
In TestC fun1, Class: 10
In TestC fun1, Global:0
Inside TestC fun2, Local i: 20
Inside TestC fun2, TestC i: 10
Inside TestC fun2, Global i: 0

Scope for For loop in C++ vs. Java


for (int j = 0; j < 4; j++) {
    printf("Inside Loop1 j=%d\n", j);
}
for (j = 0; j < 4; j++) {
    printf("Inside Loop2 j=%d\n", j);
}

In pre-standard C++ (before ISO C++98): ok, j is still in scope after the first loop
In standard C++ and in Java: error, j is undefined in the second loop

Symbol Tables
Programming languages contain declarations + statements.
Declarations are non-executable.
    In fact, they are just compiler handwaving to make programming easier and enforce constraints
    E.g. type consistency etc.

A symbol table is a database used by the compiler to maintain information about variables, procedures, etc.

Static Checking
The compiler enforces required declarations, type compatibility, etc. at compile time.
    Machine code has no provision for any of this

To perform static checks, information must be recorded somewhere.

The grammar specifies only the syntax; additional (semantic) information, sometimes called attributes, must be recorded in the symbol table for all identifiers.
    Typically, attributes in a symbol table entry include the type and the offset (where in memory the object is located)

Special requirements
A database, but with:
Speed: the symbol table is accessed every time an ID or type is referenced
    Table must be in memory

Ease of maintenance: the symbol table is the most complex data structure in the compiler
Flexibility: a language like C does not limit the complexity of variable declarations
    Must represent variables of arbitrary type
    Must be able to grow as symbols are added

Support for duplicate entries: variables with the same name can exist at different nesting levels
Ability to delete arbitrary elements/groups of elements (e.g. all variables local to a block)

Symbol Table Organization


Each entry is a record with two main parts:
    Key: usually the identifier name
    Information: attributes associated with the key
        Varies by kind of object
        Simple variable: name, type, location (offset)
        Array: name, type, location, lower bound, upper bound
        Procedure: name, location, number of parameters, type of parameters

Functions:
    Lookup: find entry for a given name
    Insert: add an entry
    Delete: delete an entry
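The record layout and the three functions can be sketched in C as follows. This is only an illustration under stated assumptions: the names sym_entry, sym_kind, sym_lookup, sym_insert and sym_delete are hypothetical, the table is an unsorted linked list chosen for brevity, and duplicate names are simply rejected rather than tracked per nesting level.

```c
#include <stdlib.h>
#include <string.h>

enum sym_kind { SYM_VARIABLE, SYM_ARRAY, SYM_PROCEDURE };

struct sym_entry {
    char *name;               /* key: the identifier name (owned copy) */
    enum sym_kind kind;       /* which attributes apply varies by kind */
    const char *type;         /* e.g. "int"; caller keeps the string alive */
    int offset;               /* location (offset) */
    struct sym_entry *next;
};

static struct sym_entry *sym_head = NULL;

/* Lookup: find the entry for a given name, or NULL if absent. */
struct sym_entry *sym_lookup(const char *name) {
    for (struct sym_entry *e = sym_head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Insert: add an entry; NULL if the name exists or memory runs out. */
struct sym_entry *sym_insert(const char *name, enum sym_kind kind,
                             const char *type, int offset) {
    if (sym_lookup(name) != NULL)       /* insert is preceded by lookup */
        return NULL;
    struct sym_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->name, name, len);
    e->kind = kind;
    e->type = type;
    e->offset = offset;
    e->next = sym_head;
    sym_head = e;
    return e;
}

/* Delete: remove the entry for a name; returns 1 if one was removed. */
int sym_delete(const char *name) {
    for (struct sym_entry **p = &sym_head; *p != NULL; p = &(*p)->next) {
        if (strcmp((*p)->name, name) == 0) {
            struct sym_entry *dead = *p;
            *p = dead->next;
            free(dead->name);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

A real compiler would likely use a tagged union for the kind-specific attributes (array bounds, parameter lists); the flat record here keeps only the fields common to the simple-variable case.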

Some possible ST structures: Stack

Enter records with names (key) and associated information in LIFO fashion
    If two variables are declared with the same name, the most recently declared is seen first

Crude but workable if the ST is very small
Easy to delete a block of declarations

Variable declarations are done in waves

int laurel, hardy;
{
    int curly, larry, moe;
    {
        int house_of_representatives[435];
    }
}

Stack contents, top first (static nesting level on the right):
    Stack pointer -> house_of_representatives    LEVEL 3
                     moe                         LEVEL 2
                     larry                       LEVEL 2
                     curly                       LEVEL 2
                     hardy                       LEVEL 1
                     laurel                      LEVEL 1

All variables associated with a block can be deleted at one time by adding a constant to the stack pointer
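
A minimal sketch of such a stack-based table in C is shown here, assuming fixed-size arrays and hypothetical names (st_declare, st_lookup, st_enter_block, st_leave_block). The stack pointer is saved at block entry, so leaving the block restores it in a single assignment, which has the same effect as adjusting the stack pointer by a constant.

```c
#include <string.h>

#define ST_MAX_ENTRIES 256
#define ST_MAX_DEPTH   32

static const char *st_names[ST_MAX_ENTRIES]; /* caller keeps strings alive */
static int st_levels[ST_MAX_ENTRIES];        /* static nesting level per entry */
static int st_top = 0;                       /* stack pointer: next free slot */
static int st_marks[ST_MAX_DEPTH];           /* stack pointer saved at block entry */
static int st_depth = 0;                     /* number of open inner blocks */

/* Declare a name at the current level; 0 on success, -1 if the table is full. */
int st_declare(const char *name) {
    if (st_top == ST_MAX_ENTRIES)
        return -1;
    st_names[st_top] = name;
    st_levels[st_top] = st_depth + 1;        /* outermost level is 1 */
    ++st_top;
    return 0;
}

/* Linear search from the top, so the most recent declaration is seen first.
   Returns the nesting level of the match, or 0 if the name is not declared. */
int st_lookup(const char *name) {
    for (int i = st_top - 1; i >= 0; --i)
        if (strcmp(st_names[i], name) == 0)
            return st_levels[i];
    return 0;
}

/* Entering a block remembers where its declarations start; -1 if too deep. */
int st_enter_block(void) {
    if (st_depth == ST_MAX_DEPTH)
        return -1;
    st_marks[st_depth++] = st_top;
    return 0;
}

/* Leaving a block drops all of its declarations with one assignment. */
void st_leave_block(void) {
    if (st_depth > 0)
        st_top = st_marks[--st_depth];
}
```

Both limits of this structure are visible in the sketch: st_lookup walks the whole stack in the worst case, and ST_MAX_ENTRIES has to be fixed in advance.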

Disadvantages of Stack

Inefficient linear search
    Time would be prohibitive if the ST is large

Maximum size must be known at compile time
    Have to allocate for the worst case

Tree-based Symbol Tables

Binary tree
Solves the search time and limited size problems
Average search time (balanced binary tree) is logarithmic
Tree size can grow dynamically
Ease of insertion

Binary tree
Deletion of arbitrary nodes is difficult
    But not an issue in an ST!
    Nodes for a given level are inserted as a block
    Newer levels are deleted before older ones
    Most recently inserted nodes are leaves: if an inner node is at the most recent level, all its children are at the same level

All variables in a block can be removed by breaking links without rearranging the tree (they are always at the end of a branch)

Example
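laurel (level 1)
    hardy (level 1)
        curly (level 2)
        larry (level 2)
            house_of_representatives (level 3)
    moe (level 2)

Break the links to curly, larry and moe to delete level 2
Break the link to house_of_representatives to delete level 3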

Disadvantages of BT
It is common for programmers to declare variables in alphabetical order
    Variables are added to the ST in order of declaration
    The tree degrades to a linked list: search is linear

Solve at the cost of greater insert and delete time by using a height-balanced (AVL) tree
    BUT: shuffling can destroy the order that made deletions easy

Collisions (the same name declared at several levels) are hard to handle
    Solve at the expense of lookup time
    Each node has 2 fields: one for name, one for ...
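
The degradation caused by alphabetical declarations can be checked with a small unbalanced binary search tree. The sketch below is an assumption-laden illustration: bt_node, bt_insert and bt_height are hypothetical names, names are compared with strcmp, and duplicates are ignored.

```c
#include <stdlib.h>
#include <string.h>

struct bt_node {
    const char *name;                 /* caller keeps the string alive */
    struct bt_node *left, *right;
};

/* Plain (unbalanced) binary search tree insert; returns the subtree root. */
struct bt_node *bt_insert(struct bt_node *root, const char *name) {
    if (root == NULL) {
        struct bt_node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->name = name;
        n->left = n->right = NULL;
        return n;
    }
    int cmp = strcmp(name, root->name);
    if (cmp < 0)
        root->left = bt_insert(root->left, name);
    else if (cmp > 0)
        root->right = bt_insert(root->right, name);
    return root;                      /* equal names are ignored here */
}

/* Number of nodes on the longest root-to-leaf path (0 for an empty tree). */
int bt_height(const struct bt_node *root) {
    if (root == NULL)
        return 0;
    int l = bt_height(root->left);
    int r = bt_height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting curly, hardy, larry, laurel, moe in that (alphabetical) order gives a height of 5, a chain in which every node has only a right child. Inserting the same five names in the order laurel, hardy, curly, larry, moe gives a height of 3.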

Another way to handle collisions


Add another field to the tree node
    Pointer to a linked list of conflicting nodes
    Newly added entries go at the beginning of the list

laurel
    hardy -> hardy -> hardy
        curly
        larry -> larry -> larry
    moe

Final problem with BT

Global variables are discouraged in structured programming


Consequence: well-structured program accesses local variables more frequently

These nodes, added last,



Best ST structure: HASH TABLE


Ideally, we would like to use an array indexed directly by key
    Example: key includes letters, digits, and underscore; the string could be read as a base-63 number
    26 lower case + 26 upper case + 10 digits + underscore = 63 possible chars
    But an array accessed by a 16-character name would need 63^16 - 1, or about 60,000,000,000,000,000,000,000,000,000, elements

Solution:

Hash Table
Several elements of the uncompressed array are found at a single location in the compressed array
Convert the key index into a pseudo-random number
    Use this to index the compressed array

The randomization is called hashing
The number is the hash value

The same key should always hash to the same value
Similar keys should hash to different values

Collisions are resolved by making each array element the head of a linked list of table entries
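
Collision resolution by chaining might look like the following in C. This is a sketch, not a definitive design: HT_SIZE, ht_entry, ht_hash, ht_insert and ht_lookup are hypothetical names, the hash is a simple character sum, and new entries are placed at the head of the chain so that the most recent declaration of a name is found first.

```c
#include <stdlib.h>
#include <string.h>

#define HT_SIZE 211   /* table size; a prime picked arbitrarily for this sketch */

struct ht_entry {
    const char *name;         /* caller keeps the string alive */
    int level;                /* nesting level of the declaration */
    struct ht_entry *next;    /* next entry in the same chain */
};

static struct ht_entry *ht_tab[HT_SIZE];   /* chain heads, initially NULL */

/* Sum of character codes, reduced modulo the table size. */
static unsigned ht_hash(const char *name) {
    unsigned sum = 0;
    while (*name != '\0')
        sum += (unsigned char)*name++;
    return sum % HT_SIZE;
}

/* Insert at the head of the chain; returns NULL if memory runs out. */
struct ht_entry *ht_insert(const char *name, int level) {
    struct ht_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    unsigned h = ht_hash(name);
    e->name = name;
    e->level = level;
    e->next = ht_tab[h];
    ht_tab[h] = e;
    return e;
}

/* Walk one chain; the first match is the most recently inserted one. */
struct ht_entry *ht_lookup(const char *name) {
    for (struct ht_entry *e = ht_tab[ht_hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

With this hash, the names ab and ba land in the same chain because their character sums are equal, yet both remain retrievable since ht_lookup compares the full name.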

Hash Function
Most common, simplest:
    String (key) treated as a number (e.g., add the numeric values for some or all characters in the name)
    Take this number MOD the table size
    The table size is ideally a prime number

Example with table size 3:

NAME    NUMERIC VALUE    VALUE MOD TABLE SIZE
a       97               1
b       98               2
c       99               0
d       100              1

Hash_tab[0]: c
Hash_tab[1]: a, d
Hash_tab[2]: b
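
The additive scheme is short enough to write out. The function name additive_hash is hypothetical, and the sketch assumes an ASCII character set, in which the one-letter names a, b, c, d have the codes 97, 98, 99, 100.

```c
/* Add the character codes of the name, then reduce modulo the table size.
   table_size must be nonzero. */
unsigned additive_hash(const char *name, unsigned table_size) {
    unsigned sum = 0;
    for (; *name != '\0'; ++name)
        sum += (unsigned char)*name;
    return sum % table_size;
}
```

With a table size of 3 this maps a to 1, b to 2, c to 0 and d to 1, so a and d share a chain. One known weakness of a plain sum is that names made of the same letters in a different order always collide.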

Ideal Hash Function


Minimizes collisions
Average search should be comparable to a binary tree (logarithmic)
    Search time is proportional to mean chain length
    Maximum path length in a (balanced) binary tree with n elements is about log2 n (e.g. 7 for 255 elements)

We want to get comparable chain length in the hash table
Difficult to predict chain lengths from a given algorithm because they are determined by the words in the input

Symbol Attributes
Each piece of information associated with a name in a program is called an attribute
    Language-dependent

Vary by symbol class:
    Variable
    Type
    Constant
    Parameter
    Record
    Record field
    Procedure
    Function
    Array
    Label
    File
    Etc.

Example: Arrays
Must associate dimensions, and the upper and lower bound of each dimension
    Fortran: maximum of 3 dimensions, lower bound always 1
    Most languages: many or unlimited dimensions
        Need a pointer to a list of lower bound / upper bound pairs

Pascal: only 1 dimensional arrays

Dynamic allocation (array bound is not a constant)

E.g. PL/1:
    DCL A(N*2) FIXED;

A is dynamically allocated on the run-time stack at execution time
The compiler must generate code to compute N*2 at run time and store it in a temporary variable
Instead of the upper bound, store a pointer to the ST entry for the temporary variable
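
For the static case, the list of lower bound / upper bound pairs attached to an array entry could be represented as below. The names array_dim and array_element_count are hypothetical, and the sketch covers constant bounds only; a dynamic bound such as N*2 would replace the integer upper bound with a reference to the temporary's symbol table entry.

```c
#include <stddef.h>

/* One dimension of an array: a lower/upper bound pair, linked to the next. */
struct array_dim {
    int lower;
    int upper;
    const struct array_dim *next;
};

/* Total number of elements: the product of (upper - lower + 1) over all
   dimensions. An empty list yields 1. */
long array_element_count(const struct array_dim *dim) {
    long count = 1;
    for (; dim != NULL; dim = dim->next)
        count *= (long)(dim->upper - dim->lower + 1);
    return count;
}
```

An array with bounds 1..10 in the first dimension and 0..4 in the second has 10 * 5 = 50 elements, which is what the compiler needs (times the element size) to assign offsets to whatever is declared after it.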

Symbol Table Operations


1. Insert
2. Lookup (retrieve)

Each name is inserted once, but retrieved many times
    Insert is preceded by a lookup to see if the name is already there

Consequence: we need rapid search
