
Course - Principles of Programming Language Report on - Types: Data Representation

Department of Computer Science & Engineering Semester - 5th Session - 2012-13


Submitted by: Kapil Agrawal(10115032) Karan Singh(10115033) K Rushabh Kumar Khandare(10115034) Yashwant K(10115035) Kunal Swami(10115037) Kunami Hansdah(10115038) Manish Kumar(10115041)

National Institute Of Technology, Raipur

Table of Contents

INTRODUCTION
1 THE ROLE OF TYPES
    Values and Their Types
    Type Expressions
    Types in This Unit
    Static Layout Decisions
    A preview of Type Names, Arrays and Records
2 ARRAYS: SEQUENCES OF ELEMENTS
    Array Types
    Array Bounds
    Array Layout
    Efficient Address Computation
    Layout of Array of Arrays
    Array Bounds and Storage Allocation
    Static and Dynamic Array Bounds
    Array Values and Initialization
3 RECORDS: NAMED FIELDS
    A Record Type Specifies Fields
    A Variable Declaration Allocates Storage
    Operations on Records
    A Comparison of Arrays and Records
4 UNIONS AND VARIANT RECORDS
    Layout of Variant Records
    Variant Records Compromise Type Safety
5 SETS
    Set Values
    Set Types
    Implementation of Sets
    Operations on Sets
6 POINTERS: EFFICIENCY AND DYNAMIC ALLOCATION
    Pointer Types
    Operations on Pointers
    Data Structure that Grows and Shrinks
    Dangling Pointers and Memory Leaks
    Design of Pointer Operations in Pascal
    Pointers as Proxies
    Operations on Entries
7 TWO STRING TABLES
    A Representation in Pascal
    A Representation in C
    Arrays and Pointers in C
8 TYPES AND ERROR CHECKING
    Variable Binding: The Types of a Variable
    Type System: The Type of an Expression
    The Basic Rule of Type Checking
    Type Names and Type Equivalence
    Static and Dynamic Checking


Introduction
This report is based on the chapter "Types: Data Representation" of the subject Principles of Programming Language, which deals with data in imperative languages. Imperative languages are treated separately from functional languages because the emphasis in imperative languages is on data structures with assignable components, i.e. components that can hold individual values. In these languages the size and layout of data structures tend to be fixed at compile time, before a program is run. Data structures that grow and shrink are typically implemented using fixed-size cells and pointers, and allocation and de-allocation must be done explicitly. The report covers the basic types, such as integers and reals, and programming style regarding characters and type conversion; arrays, their types, layout, static and dynamic bounds, and storage; records, their fields, and the operations on them; and pointers, sets, strings, and error and type checking. We hope this work helps students understand the topic better and cope with this part of the subject.


1 The Role of Types


Objects have representations. Things or values that are meaningful to an application are called objects, or data objects, and each has a corresponding representation as values in a program. Objects can be data such as days, weeks or months. For example, a day of the year is a data object that can be represented by an integer between 1 and 365 (366 in a leap year); in a non-leap year, March 5 is then represented by 64.

Object (day)     Representation (integer)
January 1        1
January 31       31
February 1       32

Integers can be manipulated using operations such as = and +. If n is the integer representation of a day p, then the representation of tomorrow(p) is n + 1.
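As a minimal sketch in C, assuming a non-leap year, the representation of a date and the tomorrow operation might look as follows. The names day_of_year and tomorrow and the table of month offsets are illustrative, not part of the original text.

```c
/* Days before each month in a non-leap year (illustrative table). */
static const int days_before_month[12] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

/* Representation of a date: its day of the year, 1..365.
   month is 1..12 and day is 1..31; no validation is done here. */
int day_of_year(int month, int day) {
    return days_before_month[month - 1] + day;
}

/* tomorrow(p) on the representation is n + 1; the wrap from
   December 31 (365) back to January 1 (1) is an added assumption. */
int tomorrow(int n) {
    return n == 365 ? 1 : n + 1;
}
```

With this representation, tomorrow(day_of_year(1, 31)) equals day_of_year(2, 1), that is, 32.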

Values and Their Types


Generally, in imperative languages data representations are built up from values that can be manipulated directly by the underlying machine. Values held in machine locations can be classified into basic types and constructed types.

Basic types: integers, characters, reals and booleans. Values of basic types
- can be denoted by a name,
- can be the value of an expression,
- have operations that are built into the language,
- can appear on the right side of an assignment, and so on.

Constructed or structured types:
- are built up from simpler types,
- are laid out using sequences of locations in the machine,
- include arrays, records and pointers.


Type Expressions
A type expression describes how a data representation is built up. The term type expression distinguishes expressions such as array [1..100] of char from arithmetic expressions such as x+y-z. Type expressions are used to lay out values in the underlying machine and to check that operations are applied properly within expressions. Type expression is usually abbreviated to type, as in "the type array [0..99] of char". A sample of type expressions in the syntax of Pascal:

    <type>        ::= <simple>
                    | array [ <simple> ] of <type>
                    | record <field-list> end
                    | set of <simple>
                    | ^ <name>
    <simple>      ::= <name> | <enumeration> | <subrange>
    <enumeration> ::= ( <name-list> )
    <subrange>    ::= <constant> .. <constant>
    <field>       ::= <name-list> : <type>


Static Layout Decisions

The layout of each type is decided statically, before the program runs:
- A value of a basic type occupies a fixed amount of space; a character might fit in a byte and an integer in a machine word.
- Array elements are laid out in consecutive machine locations, and each element occupies the same amount of space, as in A: array [0..2] of T.
- Each field of a record has its own type with its own layout.
- A pointer usually fits in a machine word, independent of the size of the data it points to, as in p: ^T.
- Sets are implemented with a bit for each potential element; the bit for an element is 1 if the element is in the set, as in A: set of 1..5.
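These fixed sizes can be inspected in C, where sizeof gives the space a type occupies and offsetof gives the position of a field inside a record. The record type sample below is a hypothetical example; the exact sizes and any padding between fields depend on the compiler and machine, so the checks rely only on what C guarantees.

```c
#include <stddef.h>

/* A record whose fields have different basic types (hypothetical example). */
struct sample {
    char   c;   /* typically 1 byte */
    int    n;   /* typically one machine word */
    double x;   /* typically 8 bytes */
};

/* An array of three records: consecutive elements of equal size. */
static struct sample A[3];

/* A pointer: usually one machine word, whatever it points to. */
static struct sample *p = &A[0];

/* Distance in bytes from A[0] to A[1]. */
size_t element_stride(void) {
    return (size_t)((char *)&A[1] - (char *)&A[0]);
}

size_t offset_of_n(void)  { return offsetof(struct sample, n); }
size_t offset_of_x(void)  { return offsetof(struct sample, x); }
size_t pointer_size(void) { return sizeof p; }
```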

Names of Basic Types


The basic types have the names boolean, char, integer and real. A type can be given a name by a declaration of the form

    type <name> = <type>;

An array supports random access to its elements:
- Random access means that the time to access A[i] is independent of the value of i.
- Example: var A : array [0..99] of char;

A record consists of a set of components, each with its own type:
- The components of a record are called fields.
- Field names are also referred to as selectors, field identifiers or member names.
- If E denotes a record with a field named f, then the field itself is denoted by E.f.

Example:

    type termrep = record
                     spell  : array [0..99] of char;
                     length : integer
                   end;
         entryrep = record
                      term : termrep;
                      page : integer
                    end;

Fig. 1 Basic types
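For comparison, the Pascal types termrep and entryrep might be written in C as the structures below. This is a hypothetical translation; the helper make_entry is illustrative and only shows that field selection E.f looks the same in both languages, including nested selection such as e.term.length.

```c
#include <string.h>

/* C counterpart of termrep. */
struct termrep {
    char spell[100];   /* array [0..99] of char */
    int  length;       /* number of characters actually used in spell */
};

/* C counterpart of entryrep: a record containing another record. */
struct entryrep {
    struct termrep term;
    int page;
};

/* Build an index entry; the copy is truncated to fit spell. */
struct entryrep make_entry(const char *word, int page) {
    struct entryrep e;
    size_t n = strlen(word);
    if (n > sizeof e.term.spell)
        n = sizeof e.term.spell;   /* stay within the array bounds */
    memcpy(e.term.spell, word, n);
    e.term.length = (int)n;
    e.page = page;
    return e;
}
```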

2 Arrays: Sequences of Elements


An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element. References to individual array elements are specified using subscript expressions. If any of the subscript expressions in a reference includes variables, then the reference requires an additional run-time calculation to determine the address of the memory location being referenced. In many languages, such as C, C++, Java, Ada and C#, all the elements of an array are required to be of the same type. In these languages, pointers and references are restricted to point to or reference a single type, so the objects or data values being pointed to or referenced are also of a single type. In some other languages, such as JavaScript, Python and Ruby, variables are typeless references to objects or data values. In these languages arrays still consist of elements of a single type, but the elements can reference objects or data values of different types. Such arrays are still homogeneous, because the array elements themselves are of the same type. C# and Java 5.0 provide generic arrays, that is, arrays whose elements are references to objects, through their class libraries.

An array thus stores a collection of values of the same type under a single variable name, and its declaration includes the type of the values to be stored in it. The fundamental property of arrays is that A[i], the ith element of array A, can be accessed quickly for any value of i computed at run time. The index i is often an integer, but it does not need to be: a language designer can allow the index to be of any type, so long as A[i] can be accessed efficiently. For example, Pascal allows an array indexed by characters, such as array ['a'..'z'] of integer.

Array Types
An array type specifies the indices of the first and last elements of the array and the type of all the elements. The index of the first element is called the lower bound and the index of the last element the upper bound of the array. An array type has the form

    array [ <simple> ] of <type>

where <type> gives the type of the array elements and <simple> specifies the lower and upper bounds. Pascal allows the index type to be an enumeration, which lists the possible index values, or a subrange, which gives the range of the indices. The following are examples of array types:

    array [1996..2000] of real
    array [ (mon, tue, wed, thu, fri) ] of integer

Array types are distinguished from record types mainly because they allow the element indices to be computed at run time, as in the Pascal assignment A[I, J] := A[N-I, 2*J]. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array variable.
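C arrays always start at index 0, but an enumeration or a character can still serve as the index, because both are small integers in C. The sketch below is illustrative: the names weekday, hours, weekly_hours and count_of and the data are assumptions, with the enumeration playing the role of the Pascal index type (mon, tue, wed, thu, fri).

```c
#include <limits.h>

/* Analogue of  array [ (mon, tue, wed, thu, fri) ] of integer :
   the enumeration constants are MON = 0 ... FRI = 4. */
enum weekday { MON, TUE, WED, THU, FRI, NDAYS };

/* Hours worked per day (illustrative data), indexed by the enumeration. */
static const int hours[NDAYS] = {
    [MON] = 8, [TUE] = 6, [WED] = 8, [THU] = 7, [FRI] = 5
};

int weekly_hours(void) {
    int sum = 0;
    for (enum weekday d = MON; d < NDAYS; d++)   /* index runs over the enumeration */
        sum += hours[d];
    return sum;
}

/* An array indexed by characters: one counter per possible char value.
   The cast to unsigned char keeps negative char values out of the index. */
int count_of(const char *s, char c) {
    int counts[UCHAR_MAX + 1] = {0};
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;
    return counts[(unsigned char)c];
}
```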


Array bounds
Bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is particularly relevant to a variable used as an index into an array, to ensure that its value lies within the bounds of the array. For example, a value of 32768 about to be assigned to a sixteen-bit signed integer variable (whose bounds are -32768 to +32767) is out of range, as is an access to element 25 of an array with index range 0 through 9. The first check is known as range checking and the second as index checking. A failed bounds check usually results in some sort of exception being signalled.

Pascal originally included the bounds in an array type. The problem with this approach is that an array of 10 integers then has a different type from an array of 100 integers. The problem shows up with procedures: a Pascal procedure expects arguments of a specific type, so a program would need different procedures for sorting arrays of 10 elements and arrays of 100 elements. Such problems are solved by using parameterized types, where the array bounds are passed as parameters.
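C does not check bounds itself, so both kinds of check have to be written by hand. In this minimal sketch the function names are hypothetical; the bounds low and high are passed as parameters, in the spirit of the parameterized solution above, and a failed check returns a fallback value rather than signalling an exception.

```c
#include <stdbool.h>

/* Index checking: for bounds low..high, A[i] exists only if low <= i <= high. */
bool index_ok(int i, int low, int high) {
    return low <= i && i <= high;
}

/* Range checking: can v be stored in a sixteen-bit signed integer? */
bool fits_int16(long v) {
    return v >= -32768L && v <= 32767L;
}

/* Checked read of A[i] for an array stored from a[0] with bounds low..high.
   Since the bounds are parameters, one function serves arrays of any length. */
int checked_get(const int *a, int low, int high, int i, int fallback) {
    if (!index_ok(i, low, high))
        return fallback;       /* bounds check failed */
    return a[i - low];
}
```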

Array Layout
The layout of an array determines the machine address of an element A[i] relative to the address of the first element. Layout can be done separately from allocation, which reserves actual machine addresses for the array elements. For the declaration var A: array [low..high] of T, the elements of A appear in consecutive locations in the underlying machine. Let w be the width of each array element, i.e. each element of type T occupies w locations. Then, if A[low] begins at location base, A[low+1] begins at base + w, A[low+2] at base + 2*w, and so on; in general, A[i] begins at base + (i - low)*w.
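The layout rule can be checked in C by simulating a Pascal array with bounds 10..14 (the bounds and the element type double are assumptions of this sketch). Addresses are treated as integers via uintptr_t, which assumes a flat address space as on common desktop machines. layout_address applies base + (i - low)*w directly; fast_address uses the rearranged form i*w + (base - low*w) discussed under Efficient Address Computation, where only i*w depends on i.

```c
#include <stdint.h>

/* Simulated Pascal declaration  var A: array [10..14] of double; */
#define LOW  10
#define HIGH 14
static double storage[HIGH - LOW + 1];   /* C arrays always start at 0 */

/* Layout rule: A[i] begins at base + (i - low) * w. */
uintptr_t layout_address(int i) {
    uintptr_t base = (uintptr_t)&storage[0];
    return base + (uintptr_t)(i - LOW) * sizeof(double);
}

/* Same address as i*w + (base - low*w). The bracketed part does not depend
   on i; it is recomputed here only for simplicity. Unsigned arithmetic lets
   an intermediate "negative" value wrap harmlessly. */
uintptr_t fast_address(int i) {
    uintptr_t origin = (uintptr_t)&storage[0] - (uintptr_t)LOW * sizeof(double);
    return (uintptr_t)i * sizeof(double) + origin;
}

/* The address the compiler itself uses for A[i]. */
uintptr_t actual_address(int i) {
    return (uintptr_t)&storage[i - LOW];
}
```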

Efficient Address Computation


The address of an array element can be computed in two parts:
- a part that can be precomputed as soon as the array is declared, and
- a part that has to be computed at run time, because it depends on the value of the array subscript.

Although the layout of the array elements may be known in advance, the element A[i] being accessed is not, since the value of i can change at run time. The formula for the address of A[i] therefore depends on the value of i, and is best expressed as

    i*w + (base - low*w)

where i*w has to be computed at run time but (base - low*w) can be precomputed. In C, the first element of an array is the zeroth element, so low = 0 and the formula simplifies to i*w + base.

In C, rectangular (two-dimensional) arrays are allocated in memory by row. Consider how memory is allocated for this array:

    char ttt[3][3] = {{'x', 'x', 'o'}, {'o', 'o', 'x'}, {'x', 'o', ' '}};

The memory for this array could be visualized as in Fig. 2, which identifies a few cells by their subscripts.

Fig. 2 Memory allocation of 2D array


Because memory is addressed linearly, a better representation is the one in the diagram below:

Fig. 3 Alternative way

Computing the Address of Any Element


C++ must compute the memory address of each array element that it accesses. C++ does this automatically, but it helps to understand what's going on "under the hood". Assume the following declaration:

    char a[ROWS][COLS];   // assume ROWS and COLS are constants

Because arrays are laid out in memory by row, each row has length COLS (the number of columns is the size of a row). Suppose you want to find the address of a[r][c]. The baseAddress of the array is the address of its first element, the rowSize is COLS in the above example, and the elementSize is the number of bytes required to represent one element (typically 1 for char, 4 for int, 4 for float, and 8 for double). Then

    address of a[r][c] = baseAddress + (r * rowSize + c) * elementSize

The number of rows (ROWS) is not used in this computation, so there is no need to specify it when declaring a formal array parameter for a two-dimensional array; for example, void f(char a[][COLS]) is sufficient.
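The row-major formula can be checked against the addresses the compiler itself uses. In this sketch the tic-tac-toe array ttt from above is stored as a static array, and addresses are compared as uintptr_t values, which assumes a flat address space. get_cell is a hypothetical helper showing a parameter declared without the number of rows.

```c
#include <stdint.h>

#define ROWS 3
#define COLS 3

/* The board from the text, stored row by row. */
static char ttt[ROWS][COLS] = {
    {'x', 'x', 'o'},
    {'o', 'o', 'x'},
    {'x', 'o', ' '}
};

/* baseAddress + (r * rowSize + c) * elementSize, with rowSize = COLS. */
uintptr_t computed_address(int r, int c) {
    uintptr_t base = (uintptr_t)&ttt[0][0];
    uintptr_t row_size = COLS;
    uintptr_t element_size = sizeof ttt[0][0];
    return base + ((uintptr_t)r * row_size + (uintptr_t)c) * element_size;
}

/* The address the compiler uses for ttt[r][c]. */
uintptr_t actual_address(int r, int c) {
    return (uintptr_t)&ttt[r][c];
}

/* ROWS is not needed to locate an element, so the parameter omits it. */
char get_cell(char board[][COLS], int r, int c) {
    return board[r][c];
}
```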

Storage Allocation
There are three ways to allocate storage:
- Static allocation allocates storage at compile time.
- Stack allocation manages run-time storage as a stack.
- Heap allocation allocates and de-allocates storage as needed at run time.

Static Allocation
In static allocation, names are bound to storage locations at compile time. This property allows the values of local names to be retained across activations of a procedure, throughout the run of the program. At compile time, the compiler determines how much storage should be allocated for each object, and also determines:
1. where the activation records go, relative to the target code;
2. where addresses should be filled in within the records;
3. the addresses used for procedure calls.

Limitations:
- The size and position of data objects must be known at compile time.
- Recursive procedures are restricted, because all activations of a procedure would share the same storage for its local names.
- Data objects cannot be created dynamically.

Example: FORTRAN. A FORTRAN compiler might place the activation record for a procedure together with the code for that procedure. In some systems it is possible to use a link editor to link activation records and executable code.

Stack Allocation
Storage is organized as a stack. Activation records are pushed and popped as activations begin and end. Storage for the locals in each call of a procedure is contained in the activation record for that call, so the values of the locals are lost when the activation ends. A register can be used to mark the top of the stack; at run time an activation record is allocated and de-allocated by incrementing and decrementing this register by the size of the record.
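The difference between static and stack allocation shows up in C through the static keyword. In this illustrative sketch (the function names are hypothetical), count_calls keeps its local in statically allocated storage, while fresh_local and factorial use stack-allocated locals in a new activation record on every call; factorial also shows the recursion that purely static allocation restricts.

```c
/* Static allocation: 'calls' is bound to one storage location for the
   whole run, so its value survives from one call to the next. */
int count_calls(void) {
    static int calls = 0;
    return ++calls;
}

/* Stack allocation: 'local' lives in the activation record of each call
   and is created and initialized afresh every time. */
int fresh_local(void) {
    int local = 0;
    return ++local;
}

/* Each recursive call gets its own activation record, hence its own n. */
unsigned long factorial(unsigned n) {
    return n <= 1 ? 1UL : n * factorial(n - 1);
}
```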

Array Initialization
Some languages provide the means to initialize arrays at the time their storage is allocated. In Fortran 95 and later, an array can be initialized by assigning it an array aggregate in its declaration. An array aggregate for a single-dimensioned array is a list of literals delimited by parentheses and slashes, for example

    Integer, Dimension (3) :: List = (/0, 5, 5/)

In the C declaration

    int list [] = {4, 5, 7, 83};

the compiler sets the length of the array from the number of initializers. This is meant to be a convenience but is not without cost: it removes the possibility that the system could detect some kinds of programmer errors, such as mistakenly leaving a value out of the list.


Character strings in C and C++ are implemented as arrays of char. These arrays can be initialized to string constants, as in

    char name [] = "freddie";

The array name has eight elements, because all strings are terminated with a null character (zero), which the compiler adds implicitly to string constants. Arrays of strings in C and C++ can also be initialized with string literals. In this case the array is one of pointers to characters, for example

    char *names [] = {"Bob", "Jake", "Darcie"};
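The declarations above can be collected into a small C file to confirm the lengths the compiler chooses. The helper functions are illustrative, and names is declared with const char * in this sketch because string literals should not be modified.

```c
#include <stddef.h>
#include <string.h>

static int list[] = {4, 5, 7, 83};          /* length 4, set by the compiler */
static char name[] = "freddie";             /* 7 characters + '\0' = 8 elements */
static const char *names[] = {"Bob", "Jake", "Darcie"};   /* array of pointers */

/* sizeof(array) / sizeof(element) recovers the length the compiler chose. */
size_t list_length(void) { return sizeof list / sizeof list[0]; }
size_t name_length(void) { return sizeof name / sizeof name[0]; }

/* Search the array of string pointers. */
int has_name(const char *s) {
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(names[i], s) == 0)
            return 1;
    return 0;
}
```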

3 Records: Named Fields


Records allow the variables relevant to an object to be grouped together and treated as a unit. A record type specifies the fields of the record and their types, and the type is then used in variable declarations. A record type with k fields can have the following form:

    record
        <name1> : <type1>;
        <name2> : <type2>;
        ...
        <namek> : <typek>
    end

For example, a database-style customer record might have the fields

    <customer_name>: <char>; <account_no>: <integer>; <id_number>: <varchar>;

Here <name_i> is the name of the i-th field and <type_i> is its type; each field has a distinct name within the record.

For example, the following declaration introduces a record type complex with two fields, re and im:

    type complex = record re: real; im: real; end;

Declarations of fields with the same type can also be combined, as in the following, where re and im both have type real:

    type complex = record re, im: real; end;

The fields of a record can be listed in any order. Since fields are accessed by name, not by their relative position as array elements are, changing the order of the fields of a record has no effect on the meaning of the program.

A Variable Declaration Allocates Storage

Storage is allocated when the template (the record type) is applied in a variable declaration, not when the template is described. Once a record type has been described, it can be used in a variable declaration, for example

    var a, b, c : complex;

This declaration gives the variables a, b and c storage whose layout is determined by the type complex.

Operations on Records

If an expression E denotes a record with a field named f, then the field itself is denoted by E.f; like a variable, E.f has both a location and a value. For example, in

    c.re := a.re + b.re

the sum of the values of the fields a.re and b.re is placed in the location of c.re.
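In C the same record and field selection look almost identical. In this sketch the structure is named cmplx (a hypothetical name, chosen to avoid the complex macro of <complex.h>) and the function add is illustrative.

```c
/* C counterpart of  type complex = record re, im: real; end; */
struct cmplx {
    double re, im;
};

/* Field selection: c.re := a.re + b.re, and likewise for im. */
struct cmplx add(struct cmplx a, struct cmplx b) {
    struct cmplx c;
    c.re = a.re + b.re;
    c.im = a.im + b.im;
    return c;
}
```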

Comparison between arrays and records with respect to their component types, layout and component selection:

Aspect          | Arrays                                                   | Records
Component types | Homogeneous: all elements have the same type             | Heterogeneous: each field can have a different type
Layout          | Each element occupies the same amount of space           | Each field can occupy a different amount of space
Selection       | Element A[i] is chosen at run time from i (flexible)     | Fields are fixed at compile time (no such flexibility)

Table 1: Comparison between arrays and records

4 Union and Variant Records


Unions and variant records illustrate a possible implementation for objects.

Records: Records are for representing objects with common properties; all records of the same type have the same fields. Variant records are for representing objects that have some, but not all, properties in common.

Unions: A union is a special case of a variant record, with an empty common part. Variant records have a part common to all records of that type, and a variant part specific to some subset of the records. Specifically, suppose that the objects in a set can be classified into n disjoint subsets (n > 1). Such objects can be represented by records with n variant parts, one per subset, and a part common to all subsets.

Fig. 4 Union and variant records

Unions are used to view the same data item in different ways. The following example in C shows how the 16-bit register AX is made up of the two 8-bit registers AH (high byte) and AL (low byte). The declarations assume a 16-bit compiler, on which unsigned int occupies 16 bits.

    struct WORDREGS {
        unsigned int ax, bx, cx, dx;
        unsigned int si, di, cflag;
    };

    struct BYTEREGS {
        unsigned char al, ah;
        unsigned char bl, bh;
        unsigned char cl, ch;
        unsigned char dl, dh;
    };

A union of these two structures provides access to the registers either as words (WORDREGS) or as bytes (BYTEREGS):

    union REGS {
        struct WORDREGS x;
        struct BYTEREGS h;
    };

    union REGS reg;

The AH register can then be accessed as reg.h.ah and the AX register as reg.x.ax.
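On a modern compiler unsigned int is usually 32 bits, so the declarations above no longer overlay AX exactly. A minimal sketch of the same idea uses the fixed-width types from <stdint.h>; the lowercase names are illustrative, and the byte order assumes a little-endian machine such as x86, where the low byte AL is stored first. Reading a union member other than the one last written is permitted in C and reinterprets the stored bytes.

```c
#include <stdint.h>

/* Word view and byte view of one 16-bit register (little-endian assumed). */
struct wordregs { uint16_t ax; };
struct byteregs { uint8_t al, ah; };

union regs {
    struct wordregs x;
    struct byteregs h;
};

/* Store AX as a whole and read its high half AH through the byte view. */
uint8_t ah_of(uint16_t ax) {
    union regs reg;
    reg.x.ax = ax;
    return reg.h.ah;
}

/* Store AH and AL separately and read AX back through the word view. */
uint16_t ax_of(uint8_t ah, uint8_t al) {
    union regs reg;
    reg.h.al = al;
    reg.h.ah = ah;
    return reg.x.ax;
}
```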

Layout of Variant Records


The following syntax is used for the variant part of a record:

    case <tag-name> : <type-name> of
        <constant1> : ( <fields1> );
        <constant2> : ( <fields2> );
        ...
        <constantv> : ( <fieldsv> )

In Pascal, a variant part appears after the fixed part of a record. The constants <constant1>, <constant2>, ..., <constantv> correspond to distinct states of the variant part; each state has its own field layout. The state depends on the constant stored in a special field, called the tag field, with name <tag-name> and type <type-name>. For example:

    type kind = (leaf, unary, binary);
         node = record
                  c1 : T1;
                  c2 : T2;
                  case k : kind of
                    leaf   : ( );
                    unary  : ( child : T3 );
                    binary : ( lchild, rchild : T4 )
                end;
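C has no variant records, but the usual C idiom combines an explicit tag field with a union. The sketch below is a hypothetical rendering of node: the fixed fields c1 and c2 are taken as int, T3 and T4 are assumed to be pointers to node, and count_nodes consults the tag before selecting a variant field.

```c
#include <stddef.h>

enum kind { LEAF, UNARY, BINARY };

struct node {
    int c1, c2;                 /* fixed part (T1, T2 taken as int) */
    enum kind k;                /* tag field: records the current state */
    union {                     /* variant part: the states share storage */
        struct { struct node *child; } unary;
        struct { struct node *lchild, *rchild; } binary;
    } u;
};

/* Count the nodes of a tree, checking the tag before using a variant. */
int count_nodes(const struct node *n) {
    if (n == NULL)
        return 0;
    switch (n->k) {
    case LEAF:
        return 1;
    case UNARY:
        return 1 + count_nodes(n->u.unary.child);
    case BINARY:
        return 1 + count_nodes(n->u.binary.lchild)
                 + count_nodes(n->u.binary.rchild);
    }
    return 0;   /* not reached for a valid tag */
}
```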


Variant Records Compromise Type Safety


Variant records introduce weaknesses into the type system of a language. Compilers usually do not check that the value in the tag field is consistent with the state of the record. Furthermore, tag fields are optional. For example, the tag name can be dropped, as in the following declaration of type t:

    type kind = 1..2;
         t = record
               case kind of
                 1 : ( i : integer );
                 2 : ( r : real )
             end;
    var x : t;

Record type t consists of only a variant part. The type name kind after case provides for the variant part to be in one of two states, given by constants 1 and 2 associated with kind. Since tag name does not appear between case and the type name kind, there is no tag field; the state of the variant part therefore cannot be stored within the record. The only possible fields for variable x are x.i and x.r, only one of which exists at any given time. Since the state is not stored within the record, an implementation cannot check whether x is in state 1 when x.i is selected and whether x is in state 2 when x.r is selected.
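C unions behave like the tagless record t: nothing stops a program from selecting the field that was not last assigned. A minimal sketch of the check an implementation could make if the state were stored in the record is shown below; the type and function names are hypothetical, and a failed check returns a fallback value.

```c
/* Untagged version of t: nothing records which field is valid. */
union t_fields {
    int    i;
    double r;
};

/* Tagged version: state 1 means i is valid, state 2 means r is valid. */
struct tagged_t {
    int state;
    union t_fields v;
};

struct tagged_t make_i(int value) {
    struct tagged_t x;
    x.state = 1;
    x.v.i = value;
    return x;
}

struct tagged_t make_r(double value) {
    struct tagged_t x;
    x.state = 2;
    x.v.r = value;
    return x;
}

/* Checked selection of x.i and x.r: possible only because the state is stored. */
int select_i(struct tagged_t x, int fallback) {
    return x.state == 1 ? x.v.i : fallback;
}

double select_r(struct tagged_t x, double fallback) {
    return x.state == 2 ? x.v.r : fallback;
}
```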

5 SETS
Sets can be implemented efficiently using bits in the underlying machine, so operations on sets turn into bit operations. Pascal allows sets to be used as values, and provides the type constructor set of for building set types from enumerations and subranges.

Set Values
A set is written in Pascal by listing its elements between the set brackets [ and ]. The following are examples of sets:

    [ ]    [ 0..9 ]    [ 'a'..'z', 'A'..'Z' ]    [ Mon..Sun ]

The elements of a set can be written individually or as subranges. The empty set [ ] is the set with no elements. All the elements of a set must be of the same simple type, specifically integer, an enumeration, or a subrange of these types.

Set Types
The type set of S represents the subsets of S. For example, consider the variable A declared by var A: set of 1..3. A can denote any one of the following sets: [ ], [ 1 ], [ 2 ], [ 3 ], [ 1, 2 ], [ 1, 3 ], [ 2, 3 ], [ 1, 2, 3 ]. Since these are all the subsets of [ 1, 2, 3 ], the type set of S in Pascal might more accurately be called subset of S.

Implementation of Sets
These sets can all be represented using three bits: element 1 is represented by the first bit, element 2 by the second bit, and element 3 by the third bit. The set [ 1, 3 ] is then encoded by the bit vector 101. In general, a set type whose base type has n elements is implemented as a bit vector of length n. Because the bit vector should typically fit in a machine word, an implementation is allowed to impose a limit on the maximum number of elements in a set. This restricts the usefulness of sets, but the purpose of the limit is to allow sets to be represented by one or more machine words.

Operations on Sets
The basic operation on sets is a membership test: the operation x in A tests whether an element x belongs to a set A. Bit vectors allow the following operations on sets to be implemented efficiently, using bit-wise operations:

    A + B    set union
    A - B    set difference
    A * B    set intersection
    A / B    symmetric difference
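A minimal C sketch of this bit-vector representation, assuming sets of at most 32 elements so that one unsigned long suffices: element k is stored in bit k-1, and each set operator maps to one bit-wise operator. The type and function names are illustrative.

```c
#include <stdbool.h>

/* A set over 1..32 as a bit vector in one word: element k is bit k-1.
   unsigned long has at least 32 bits, which sets the size limit. */
typedef unsigned long bitset;

bitset singleton(int k)                { return 1UL << (k - 1); }
bool   member(int k, bitset a)         { return (a & singleton(k)) != 0; }  /* k in A */
bitset set_union(bitset a, bitset b)   { return a | b; }    /* A + B */
bitset set_diff(bitset a, bitset b)    { return a & ~b; }   /* A - B */
bitset set_inter(bitset a, bitset b)   { return a & b; }    /* A * B */
bitset set_symdiff(bitset a, bitset b) { return a ^ b; }    /* A / B */
```

For example, with A = [1, 2, 3] (bits 0111, value 7) and B = [2, 3, 4] (bits 1110, value 14), set_union gives [1, 2, 3, 4], set_inter gives [2, 3], set_diff(A, B) gives [1] and set_symdiff gives [1, 4].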


Sets can be compared using the relational operators <=, =, <> and >=, where <= is interpreted as subset and >= as superset. Note, however, that the operators < and > are not allowed on sets. The following program fragment illustrates where sets are useful:

    case ch of
      '+', '-', '*', '/', '(', ')', ';' :
          begin lookahead := tok[ch]; ch := ' ' end;
      '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' :
          begin ... lookahead := number end
    end;

If the labels that select a statement are grouped into a set, then a membership test can be used to select the statement.

A derived binary relation between two sets is the subset relation, also called set inclusion. If all the members of set A are also members of set B, then A is a subset of B, denoted A ⊆ B. For example, {1, 2} is a subset of {1, 2, 3}, but {1, 4} is not. From this definition, every set is a subset of itself; where this is to be avoided, the term proper subset is defined to exclude this possibility. Just as arithmetic features binary operations on numbers, set theory features binary operations on sets:
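Both ideas can be sketched with bit vectors in C. subset tests A <= B by checking that A has no bit outside B, and classify replaces the case labels of the fragment above with membership tests on a 256-bit character set. The charset type and all function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* A <= B holds when A has no element outside B. */
bool subset(unsigned long a, unsigned long b) {
    return (a & ~b) == 0;
}

/* A set of characters: one bit per possible unsigned char value. */
typedef struct {
    unsigned char bits[(UCHAR_MAX + 1) / CHAR_BIT];
} charset;

void cs_add(charset *s, unsigned char c) {
    s->bits[c / CHAR_BIT] |= (unsigned char)(1u << (c % CHAR_BIT));
}

bool cs_in(const charset *s, unsigned char c) {
    return (s->bits[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1u;
}

/* 1 for an operator or punctuation, 2 for a digit, 0 otherwise.
   The sets are rebuilt on each call only to keep the sketch short. */
int classify(char ch) {
    charset ops = {{0}}, digits = {{0}};
    for (const char *p = "+-*/();"; *p != '\0'; p++)
        cs_add(&ops, (unsigned char)*p);
    for (char d = '0'; d <= '9'; d++)
        cs_add(&digits, (unsigned char)d);
    if (cs_in(&ops, (unsigned char)ch))
        return 1;
    if (cs_in(&digits, (unsigned char)ch))
        return 2;
    return 0;
}
```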

The union of the sets A and B, denoted A ∪ B, is the set of all objects that are members of A, or B, or both. The union of {1, 2, 3} and {2, 3, 4} is the set {1, 2, 3, 4}. The intersection of the sets A and B, denoted A ∩ B, is the set of all objects that are members of both A and B. The intersection of {1, 2, 3} and {2, 3, 4} is the set {2, 3}. The set difference of U and A, denoted U \ A, is the set of all members of U that are not members of A. The set difference {1, 2, 3} \ {2, 3, 4} is {1}, while, conversely, the set difference {2, 3, 4} \ {1, 2, 3} is {4}. When A is a subset of U, the set difference U \ A is also called the complement of A in U. In this case, if the choice of U is clear from the context, the notation Aᶜ is sometimes used instead of U \ A, particularly if U is a universal set, as in the study of Venn diagrams.

The symmetric difference of the sets A and B is the set of all objects that are members of exactly one of A and B (elements that are in one of the sets, but not in both). For instance, for the sets {1, 2, 3} and {2, 3, 4}, the symmetric difference is {1, 4}. It is the set difference of the union and the intersection, (A ∪ B) \ (A ∩ B), or equivalently (A \ B) ∪ (B \ A). The Cartesian product of A and B, denoted A × B, is the set whose members are all possible ordered pairs (a, b) where a is a member of A and b is a member of B. The Cartesian product of {1, 2} and {red, white} is {(1, red), (1, white), (2, red), (2, white)}. The power set of a set A is the set whose members are all possible subsets of A. For example, the power set of {1, 2} is { {}, {1}, {2}, {1, 2} }.

Some basic sets of central importance are the empty set (the unique set containing no elements), the set of natural numbers, and the set of real numbers.

6 POINTERS: EFFICIENCY AND DYNAMIC ALLOCATION


A value of a pointer type provides indirect access to elements of a known type. Pointers are motivated by indirect addressing in machine language: a pointer gives indirect access to the memory location where a value of a particular type (int, char, float or any other type) is stored. Pointers have types too; the type of a pointer is determined by the type of the data whose memory location it points to. A pointer variable in C is declared in the form <type-name> *<pointer-variable-name>, for example:

    int *c;   char *str;   float *f;

Advantages of using Pointers


EFFICIENCY: A pointer is small, usually one machine word, regardless of the size of the data it points to, so copying or passing a pointer is cheaper than copying the data itself. The difference is visible with sizeof:

    char str1[] = "hello world";
    printf("%zu", sizeof(str1));   /* prints 12: 11 characters plus the null character */

    char *str2 = "hello world";
    printf("%zu", sizeof(str2));   /* prints the size of the pointer: 2 on 16-bit compilers, typically 8 on 64-bit ones */

Note that the string literal that str2 points to still occupies 12 bytes; only the pointer variable itself is smaller.

Accessing an element does not require stepping through the preceding elements; its address is computed directly from the base address:

    /* case 1 */
    char arr1[] = "hello world";
    printf("%c", arr1[4]);

    /* case 2 */
    char arr2[] = "hello world";
    char *ptr = arr2;
    printf("%c", *(ptr + 4));

Both cases print 'o'. In case 2 the pointer variable holds the base address of arr2; adding 4 gives the address of arr2[4], and the dereferencing operator * fetches the value stored there. Case 1 works the same way: in C, arr1[4] is defined as *(arr1 + 4), so the compiler does not go through arr1[0] to arr1[3] either.

DYNAMIC DATA: Data structures that grow and shrink during run time can be implemented using pointers. This is called dynamic memory management.
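The equivalence of indexing and pointer arithmetic can be confirmed with a short C sketch; the function names are illustrative. The last function uses the legal but unusual form i[arr], which works precisely because arr[i] is defined as *(arr + i).

```c
#include <stddef.h>

static char arr[] = "hello world";   /* 11 characters + '\0' = 12 bytes */
static char *ptr = arr;              /* holds the base address of arr */

size_t array_bytes(void)   { return sizeof arr; }   /* size of the whole array */
size_t pointer_bytes(void) { return sizeof ptr; }   /* size of the pointer only */

char by_index(size_t i)    { return arr[i]; }       /* means *(arr + i) */
char by_pointer(size_t i)  { return *(ptr + i); }   /* same address computation */
char by_swapped(size_t i)  { return i[arr]; }       /* *(i + arr): also legal */
```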

Example: the functions malloc() and calloc() in C provide dynamic memory management. Dynamic memory management is necessary because the size or type of the input is often not known until the program runs. To handle such situations, the necessary memory is allocated at run time.
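A minimal sketch of run-time allocation in C: sum_of_squares allocates an array whose length n is known only when the function is called, and calloc_is_zeroed shows that calloc, unlike malloc, zero-fills its block. Both function names are illustrative.

```c
#include <stdlib.h>

/* Allocate n longs, fill them with 1*1, 2*2, ..., n*n, sum them, free them.
   Returns -1 if the allocation fails. */
long sum_of_squares(size_t n) {
    long *squares = malloc(n * sizeof *squares);   /* contents uninitialized */
    if (squares == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        squares[i] = (long)((i + 1) * (i + 1));
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += squares[i];
    free(squares);                                 /* explicit de-allocation */
    return sum;
}

/* calloc allocates n ints and sets every byte to zero. */
int calloc_is_zeroed(size_t n) {
    int *a = calloc(n, sizeof *a);
    if (a == NULL)
        return -1;
    int all_zero = 1;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            all_zero = 0;
    free(a);
    return all_zero;
}
```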

Operations on Pointers
The basic operation on pointers is dereferencing, written with the * operator in C/C++ and with ^ in Pascal. Pascal provides five basic operations on a pointer p of type ^T:
1. Dynamic allocation on the heap: executing new(p) leaves p pointing to a newly allocated data structure of type T on the heap.
2. Dereferencing: the expression p^ denotes the data structure pointed to by p.
3. Assignment: assignment is permitted between pointers of the same type.
4. Equality testing: the relation = tests whether two pointers of the same type point to the same data structure.
5. De-allocation: a dynamic data structure exists until it is explicitly released by executing dispose(p).
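C offers rough counterparts of all five Pascal operations: new(p) corresponds to malloc, p^ to *p, assignment and = to C's = and ==, and dispose(p) to free. The illustrative function below walks through them in order and returns 1 if everything behaved as described.

```c
#include <stdlib.h>

int pointer_operations(void) {
    int *p, *q;
    p = malloc(sizeof *p);   /* 1. new(p): allocate a cell on the heap */
    if (p == NULL)
        return -1;
    *p = 42;                 /* 2. p^ := 42: dereference and store */
    q = p;                   /* 3. q := p: both point to the same cell */
    int same = (p == q);     /* 4. p = q: do they point to the same cell? */
    int value = *q;          /*    the stored value is visible through q */
    free(p);                 /* 5. dispose(p): release the cell */
    p = q = NULL;            /* otherwise both pointers would dangle */
    return same && value == 42;
}
```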

Data Structures That Grow and Shrink


The ability to grow and shrink data structures is needed in programs such as compilers, which must handle source text ranging from a few lines to thousands of lines. Data structures that grow and shrink during program execution are implemented using records and pointers, because of the following design principle.

Static layout principle: the size and layout of the storage for each type are known statically, before a program runs.

Example: in a linked list implemented in C, the list can grow or shrink at run time as required. A node of the linked list can be declared as

    struct node {
        int data;
        struct node *link;
    };

We can write functions such as append() to add a node at the end of the list, insert() to insert a node into the list, and delete() to remove a particular node. All these functions perform their task dynamically, i.e. at run time: through insert() or append() the list grows, and through delete() it shrinks. The conceptual or logical organization of a linked list is different from its representation in physical memory. The links between cells determine the logical organization; the connected cells need not be physically adjacent to each other, and can be anywhere in memory.
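A minimal sketch of such a list in C. The function names append, remove_value (standing in for delete(), which is a keyword in C++) and length are illustrative; insertion in the middle is omitted for brevity.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *link;
};

/* Grow: add a new cell at the end. Returns the (possibly new) head. */
struct node *append(struct node *head, int value) {
    struct node *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return head;                 /* out of memory: list unchanged */
    cell->data = value;
    cell->link = NULL;
    if (head == NULL)
        return cell;
    struct node *p = head;
    while (p->link != NULL)
        p = p->link;
    p->link = cell;
    return head;
}

/* Shrink: unlink and free the first cell holding value, if any. */
struct node *remove_value(struct node *head, int value) {
    struct node **pp = &head;
    while (*pp != NULL && (*pp)->data != value)
        pp = &(*pp)->link;
    if (*pp != NULL) {
        struct node *dead = *pp;
        *pp = dead->link;
        free(dead);
    }
    return head;
}

size_t length(const struct node *head) {
    size_t n = 0;
    for (; head != NULL; head = head->link)
        n++;
    return n;
}
```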

Dangling Pointer and Memory Leaks


A dangling pointer is a pointer that does not point to a valid object of the appropriate type. Dangling pointers arise when an object is deleted or deallocated without modifying the value of the pointer, so that the pointer still holds the address of the deallocated memory. Example (C++):

    int *a = new int;
    int *b = a;
    delete b;    // the object is gone; a and b now dangle

After delete b, a is a dangling pointer, because the object it points to has been released through its alias b.

A memory leak is unintentional memory consumption in which a program fails to release memory it no longer needs. It occurs, for example, when memory allocated to a pointer is not freed and the pointer goes out of scope: the allocated memory can no longer be reached or deallocated, and it typically stays allocated until the program terminates. Example:

    void f(void)
    {
        void *s;
        s = malloc(50);   /* get memory */
        return;           /* control leaves f without freeing the memory */
    }

Calling free(s) before returning would release the memory in the above example. In long-running programs, leaks gradually exhaust the available memory, and an attacker who can trigger a leak repeatedly may exploit it to crash the program (a denial-of-service attack).
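The leak in f() disappears once every malloc is paired with a free on every path out of the function. The sketch below (the names f_fixed and free_and_clear are hypothetical) shows that fix, together with a common habit for reducing dangling pointers: setting a pointer to NULL right after freeing it.

```c
#include <stdlib.h>
#include <string.h>

/* Leak-free version of f(): the malloc is paired with a free
   before control leaves the function. Returns 1 on success. */
int f_fixed(void)
{
    char *s = malloc(50);       /* get memory */
    if (s == NULL)
        return 0;
    strcpy(s, "scratch");       /* use the memory */
    free(s);                    /* release it before returning */
    return 1;
}

/* Free *pp and set the caller's pointer to NULL so it cannot dangle. */
void free_and_clear(int **pp)
{
    free(*pp);
    *pp = NULL;
}
```

Clearing a pointer protects only that one variable: any other copy of the same address (such as a in the example above, after delete b) still dangles, so the idiom reduces the problem rather than eliminating it.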

7 Two String Tables


From a distance, the types in C and Pascal look much alike: both languages have arrays, records, and pointers. The differences in their treatment of pointers, however, lead to differences in programming style. The Pascal representation below is adapted from the published code of the TeX typesetting system.

A Representation in Pascal
TeX uses two arrays, here called pool and start, to hold character strings such as TeX, troff and word. The individual characters of all the strings are kept one after another in the array pool. Elements of the other array, start, point to the first character of each string: start[s] is the index in pool of the first character of string s. The actual array names in the code of TeX are str_pool and str_start. Since the next string begins where string s ends, string s occupies positions start[s] through start[s+1] - 1 of pool.
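The pool/start scheme can be sketched in C as follows. The array sizes, the counter nstrings and the function names add_string and str_length are assumptions for illustration; TeX's own code differs. Note that start needs one more entry than there are strings, so that start[s+1] marks the end of the last string.

```c
#include <string.h>

#define POOL_SIZE   1000        /* hypothetical capacity of pool  */
#define MAX_STRINGS 100         /* hypothetical capacity of start */

char pool[POOL_SIZE];           /* characters of all strings, back to back */
int start[MAX_STRINGS + 1];     /* start[s] = index in pool of string s    */
int nstrings = 0;               /* start[nstrings] = first free position   */

/* Copy the characters of text into pool as a new string.
   Returns its string number s, or -1 if a table is full. */
int add_string(const char *text)
{
    int n = (int) strlen(text);
    if (nstrings == MAX_STRINGS || start[nstrings] + n > POOL_SIZE)
        return -1;
    memcpy(&pool[start[nstrings]], text, (size_t) n);
    start[nstrings + 1] = start[nstrings] + n;   /* end of s = start of s+1 */
    return nstrings++;
}

/* Length of string s: the distance between consecutive start entries. */
int str_length(int s)
{
    return start[s + 1] - start[s];
}
```

No terminating '\0' is stored: the boundaries between strings are recorded only in start.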

Arrays and Pointers in C


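In C, an array is represented as a block of memory whose elements are stored in contiguous locations. Array elements can be accessed through their index positions; because the elements are contiguous, the element a[i] lies i elements past the start of the array, and C defines a[i] to mean *(a + i).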
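As a small illustration (the function names are hypothetical), the two functions below sum an array, once with subscripts and once by stepping a pointer through the contiguous elements; both visit the same memory locations.

```c
#include <stddef.h>

/* Sum n ints using subscripts: a[i] is *(a + i). */
int sum_by_index(const int a[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Sum n ints by advancing a pointer one element at a time. */
int sum_by_pointer(const int *p, size_t n)
{
    int total = 0;
    for (const int *end = p + n; p < end; p++)
        total += *p;
    return total;
}
```

When an array is passed to either function, its name is converted to a pointer to its first element, which is why the same array can be handed to a parameter declared as int a[] or as int *p.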

8 Types and Error Checking


Type distinctions between values carry over to expressions. Values have fixed types. The constant 3.14 is a real number, the constant true is a boolean, and real number and boolean are different types.

If x denotes 3.14, then it denotes a real number; alternatively, if x denotes true, then it denotes a Boolean value.

The type of an expression x + y can be inferred from the types of x and y. Since 2 and 2 are integers and the sum of two integers is an integer, 2 + 2 must also denote an integer.

Fig. 5 Arrays and pointers in C

Variable Binding
The language design determines whether a variable has a fixed type. In Pascal and C, the type of a variable is fixed: if i is declared to have type integer, then it must denote an integer, although the integer it denotes can change at run time. The assignment i := i+1 changes the value denoted by i, but the new value is again an integer. Lisp and Smalltalk do not restrict the types of the values a variable can denote at run time. The distinction between Pascal and Lisp can be explained in terms of binding times. A variable binding associates a property with a variable; thus, an assignment x := 3.14 is a binding that associates the value 3.14 with x. A binding is static if it occurs before a program runs and dynamic if it occurs at run time. Static bindings are sometimes referred to as early bindings, and dynamic bindings as late bindings.

Thus Pascal has static binding of types and dynamic bindings of values to variables, whereas Lisp has dynamic binding of both values and types.

                 Pascal and C    Lisp/Smalltalk
Value binding    Dynamic         Dynamic
Type binding     Static          Dynamic

Table 2: Difference between value and type binding
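Dynamic binding of types can be imitated in C by storing a type tag next to each value, in the spirit of a Lisp variable. This is a hypothetical sketch (the names dyn_value, make_integer, make_real and make_bool are illustrative), using a tagged union so that each assignment rebinds both the value and its type:

```c
/* Hypothetical tagged value: the tag records the type currently bound. */
enum tag { INTEGER, REAL, BOOLEAN };

struct dyn_value {
    enum tag tag;          /* type, bound at run time  */
    union {                /* value, bound at run time */
        long i;            /* valid when tag == INTEGER */
        double r;          /* valid when tag == REAL    */
        int b;             /* valid when tag == BOOLEAN */
    } u;
};

struct dyn_value make_integer(long i)
{
    struct dyn_value v;
    v.tag = INTEGER;
    v.u.i = i;
    return v;
}

struct dyn_value make_real(double r)
{
    struct dyn_value v;
    v.tag = REAL;
    v.u.r = r;
    return v;
}

struct dyn_value make_bool(int b)
{
    struct dyn_value v;
    v.tag = BOOLEAN;
    v.u.b = (b != 0);
    return v;
}
```

Here a C variable such as struct dyn_value x still has the fixed static type struct dyn_value, as Table 2 requires for C; only the tag stored inside it changes at run time.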

Type Systems
A type system associates a type with each computed value. By examining the flow of these values, a type system attempts to ensure or prove that no type errors can occur. The particular type system determines exactly what constitutes a type error, but in general the aim is to prevent operations that expect a certain kind of value from being applied to values for which the operation does not make sense (logic errors); some type systems also help prevent memory errors. Type systems are often specified as part of programming languages and built into their interpreters and compilers, although they can also be implemented as optional tools. The depth of the type constraints and the manner of their evaluation determine the typing discipline of the language. In the case of type polymorphism, a language may further associate an operation with different concrete algorithms for each type. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design. A program associates each value with at least one particular type, but a single value may also be associated with several subtypes. Other entities, such as objects, modules, communication channels, and dependencies, can also be associated with a type.


Even a type can be associated with a type. An implementation of a type system could in theory use the following classifications:

data type: a type of a value
class: a type of an object
kind (type theory): a type of a type, or metatype

These are the levels of abstraction that typing can go through in a hierarchy within a system.

Example: a type system for Fortran:
o Variable names starting with the letters I through N have type int; all other names have type real.
o A number has type real if it contains a decimal point; otherwise it has type int.
o If expressions E and F have the same type, then E + F, E - F, E * F and E / F are expressions of that same type.
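The three Fortran rules above are simple enough to run. The sketch below encodes them; the enum and function names are hypothetical, and mixed int/real operands are reported as errors, since the rules assign no type to them.

```c
#include <ctype.h>
#include <string.h>

enum type { T_INT, T_REAL, T_ERROR };

/* Rule 1: names starting with I through N are int; all others are real. */
enum type type_of_name(const char *name)
{
    char c = (char) toupper((unsigned char) name[0]);
    return (c >= 'I' && c <= 'N') ? T_INT : T_REAL;
}

/* Rule 2: a number is real if it contains a decimal point, else int. */
enum type type_of_number(const char *literal)
{
    return strchr(literal, '.') != NULL ? T_REAL : T_INT;
}

/* Rule 3: E op F, for op in + - * /, has the common type of E and F.
   Operands of different types get no type here (T_ERROR). */
enum type type_of_binop(enum type e, char op, enum type f)
{
    if (op == '\0' || strchr("+-*/", op) == NULL)   /* not an arithmetic op */
        return T_ERROR;
    if (e == T_ERROR || e != f)                     /* operand types differ */
        return T_ERROR;
    return e;
}
```

For example, type_of_binop(type_of_name("X"), '+', type_of_name("I")) yields T_ERROR, because X is real and I is int.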

Basic Rule of Type Checking


Rules in a type system are based on the following property of functions: When a function from a set A to set B is applied to an element of set A the result is an element of set B.

Arithmetic operators
Arithmetic operators are functions. Associated with each operator op is a rule that specifies the type of an expression E op F in terms of the types of E and F. An example is

If E and F have type int, then E + F also has type int

Overloading: Multiple Meanings


Familiar operator symbols like + and * are overloaded; that is, these symbols have different meanings in different contexts. + is used for both integer and real addition, so it has two possible types. The treatment of + can therefore be restated using the following pair of rules:

If E has type int and F has type int, then E + F also has type int.
If E has type real and F has type real, then E + F also has type real.
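C's + is overloaded in the same way. With C11's _Generic selection, which picks a branch according to the static type of an expression without evaluating it, a short sketch can make the two rules visible; the macro STATIC_TYPE and the enum names are hypothetical, and C's double plays the role of real:

```c
/* Classify the static type of an expression (requires C11 _Generic).
   The expression e is examined for its type only, never evaluated. */
enum static_type { OTHER_TYPE, INT_TYPE, DOUBLE_TYPE };

#define STATIC_TYPE(e) _Generic((e),      \
        int:     INT_TYPE,                \
        double:  DOUBLE_TYPE,             \
        default: OTHER_TYPE)
```

For example, STATIC_TYPE(2 + 2) yields INT_TYPE and STATIC_TYPE(2.0 + 2.0) yields DOUBLE_TYPE: the compiler resolves which meaning of + applies from the operand types alone.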

Coercion: Implicit Type Conversion


The original FORTRAN type system rejected expressions like X + I and 2 * 3.142, since one operand is an integer and the other is a real. This restriction was lifted in later versions of FORTRAN. Most programming languages treat the expression 2 * 3.142 as if it were 2.0 * 3.142, the product of two reals. Coercion is a conversion from one type to another, inserted automatically by a programming language. In 2 * 3.142, the integer 2 is coerced to a real before the multiplication is done.
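C performs the same coercion through its usual arithmetic conversions. In this minimal sketch (the function names are hypothetical), the int argument is converted to double before the multiplication, and a double assigned to an int is converted back by truncating toward zero:

```c
/* n is coerced to double before the multiplication,
   just as 2 * 3.142 is treated as 2.0 * 3.142. */
double times_pi_approx(int n)
{
    return n * 3.142;
}

/* Coercion on assignment: the fractional part is discarded
   (truncation toward zero) when a double is stored in an int. */
int truncate_to_int(double x)
{
    int i = x;          /* implicit double -> int conversion */
    return i;
}
```

The first conversion never loses information for small integers, while the second does, which is why some compilers warn about implicit double-to-int conversions when asked to (for example with GCC's -Wconversion).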

Polymorphism
A polymorphic function has a parameterized type, also called a generic type. Data structures like stacks and queues can be defined to hold values of any type. Thus, we can define stacks of integers, stacks of grammar symbols, and so on. When code for such data structures is put into a library, a library designer cannot possibly anticipate all future uses of the data structure. Polymorphic types allow such a data structure to be defined once and then applied later to any desired type. In imperative languages like Pascal and C, the only polymorphic functions are operations on built-in types. C++ supports parameterized types, using a construct called templates. Functional languages have long supported polymorphic types.
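C has no templates, but a common workaround imitates a parameterized type with a macro that generates one copy of the code per element type, somewhat like template instantiation in C++. The sketch below is an illustration under assumptions (DEFINE_STACK and the generated names are hypothetical); unlike a true polymorphic type, the macro is expanded textually and its code is type-checked only after expansion.

```c
#include <stdlib.h>

/* DEFINE_STACK(T, name): hypothetical macro that generates a stack type
   "name" holding elements of type T, plus name_push and name_pop. */
#define DEFINE_STACK(T, name)                                        \
    typedef struct {                                                 \
        T *items;           /* heap buffer of elements */            \
        size_t size;        /* number of elements in use */          \
        size_t capacity;    /* number of elements allocated */       \
    } name;                                                          \
                                                                     \
    /* Push x; returns 0 on success, -1 if realloc fails. */         \
    static int name##_push(name *s, T x)                             \
    {                                                                \
        if (s->size == s->capacity) {                                \
            size_t cap = s->capacity ? 2 * s->capacity : 4;          \
            T *p = realloc(s->items, cap * sizeof(T));               \
            if (p == NULL)                                           \
                return -1;                                           \
            s->items = p;                                            \
            s->capacity = cap;                                       \
        }                                                            \
        s->items[s->size++] = x;                                     \
        return 0;                                                    \
    }                                                                \
                                                                     \
    /* Pop the top element; the caller must ensure size > 0. */      \
    static T name##_pop(name *s)                                     \
    {                                                                \
        return s->items[--s->size];                                  \
    }

/* Two instantiations, comparable to stack<int> and stack<double>. */
DEFINE_STACK(int, IntStack)
DEFINE_STACK(double, DoubleStack)
```

A stack is created zero-initialized (for example IntStack s = {0};), grows its buffer with realloc as elements are pushed, and should have its items buffer freed when no longer needed.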

Type Names and Type Equivalence


The question of type equivalence arises during type checking. Example: are the following two types equal?

    array [0..9] of integer
    array [0..9] of integer

In a larger context, if variables x, y and z are declared as follows, are their types equal?

    x, y : array [0..9] of integer;
    z    : array [0..9] of integer;

In Pascal, x and y have the same type because they are declared together, but z does not have the same type as x and y. In the corresponding C fragments, x, y and z all have the same type.

Structural Equivalence
Two type expressions are structurally equivalent if and only if they are equivalent under the following three rules:

SE1. A type name is structurally equivalent to itself.
SE2. Two types are structurally equivalent if they are formed by applying the same type constructor to structurally equivalent types.
SE3. After a type declaration, type n = T, the type name n is structurally equivalent to T.

By these rules, the types char and char are structurally equivalent, and so are the type names S and T declared by:

    type S = array [0..99] of char;
    type T = array [0..99] of char;
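Rules SE1 to SE3 translate directly into a recursive check. The sketch below uses a hypothetical representation in which a type is either a name or array [lo..hi] of an element type, and a table of type declarations supplies rule SE3; it assumes there are no recursive or cyclic type declarations, which would make the expansion loop forever.

```c
#include <string.h>

/* Hypothetical type-expression representation. */
enum kind { NAME, ARRAY };

struct type {
    enum kind kind;
    const char *name;           /* used when kind == NAME  */
    int lo, hi;                 /* used when kind == ARRAY */
    const struct type *elem;    /* used when kind == ARRAY */
};

/* A declaration "type n = T"; basic names such as char have no entry. */
struct decl {
    const char *name;
    const struct type *def;
};

/* SE3: replace a declared type name by its definition, repeatedly. */
static const struct type *expand(const struct type *t,
                                 const struct decl *decls, int ndecls)
{
    int i;
    while (t->kind == NAME) {
        for (i = 0; i < ndecls; i++)
            if (strcmp(decls[i].name, t->name) == 0)
                break;
        if (i == ndecls)
            return t;           /* a basic type name such as char */
        t = decls[i].def;
    }
    return t;
}

/* Returns 1 if s and t are structurally equivalent under SE1-SE3. */
int struct_equiv(const struct type *s, const struct type *t,
                 const struct decl *decls, int ndecls)
{
    s = expand(s, decls, ndecls);
    t = expand(t, decls, ndecls);
    if (s->kind != t->kind)
        return 0;
    if (s->kind == NAME)                        /* SE1: same basic name */
        return strcmp(s->name, t->name) == 0;
    return s->lo == t->lo && s->hi == t->hi     /* SE2: same constructor, */
        && struct_equiv(s->elem, t->elem,       /* equivalent components  */
                        decls, ndecls);
}
```

Treating the bounds lo..hi as part of the array constructor is a simplifying choice in this sketch: array [0..99] of char and array [0..9] of char are then not equivalent.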

Forms of Name Equivalence


More limited notions of type equivalence are obtained if we restrict rules SE1-SE3. Some possibilities follow:
o Pure name equivalence. A type name is equivalent to itself, but no constructed type is equal to any other constructed type.
o Transitive name equivalence. A type name is equivalent to itself and can be declared equivalent to other type names. Then the following types S, T and U are equivalent to each other and to integer, because integer is also a type name:

    type S = integer;
         T = S;
         U = integer;

o Type expression equivalence. A type name is equivalent only to itself. Two type expressions are equivalent if they are formed by applying the same constructor to equivalent expressions. In other words, the expressions have to be identical.
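Transitive name equivalence can be checked by following declared aliases until a name with no declaration is reached. In this minimal sketch, the declarations type S = integer; T = S; U = integer; are stored as a hypothetical table of (name, target) pairs; cyclic aliases are assumed not to occur.

```c
#include <string.h>

/* Hypothetical alias table entry for "type name = target". */
struct alias {
    const char *name;
    const char *target;
};

/* Follow alias links until reaching a name that has no declaration. */
static const char *resolve(const char *name, const struct alias *a, int n)
{
    int i, changed = 1;
    while (changed) {
        changed = 0;
        for (i = 0; i < n; i++)
            if (strcmp(a[i].name, name) == 0) {
                name = a[i].target;     /* step one link along the chain */
                changed = 1;
                break;
            }
    }
    return name;
}

/* Two type names are equivalent if their alias chains end at the same name. */
int name_equiv(const char *s, const char *t, const struct alias *a, int n)
{
    return strcmp(resolve(s, a, n), resolve(t, a, n)) == 0;
}
```

With the table {"S", "integer"}, {"T", "S"}, {"U", "integer"}, every one of S, T and U resolves to integer, so all four names are equivalent, while a constructed type such as an array never appears in the table and is never equivalent to any of them.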

Static and Dynamic Checking


The choice between static and dynamic typing requires trade-offs. Static typing can find type errors reliably at compile time, which should increase the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, and thus over what proportion of the bugs that are written would be caught by representing the designed types in code. Advocates of static typing believe programs are more reliable when they have been well type-checked, while advocates of dynamic typing point to distributed code that has proven reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system increases. Advocates of dependently typed languages such as Dependent ML and Epigram have suggested that almost all bugs can be considered type errors, if the types used in a program are properly declared by the programmer or correctly inferred by the compiler.

Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types in use, it can produce optimized machine code, and compilers for statically typed languages can find assembler shortcuts more easily. Some dynamically typed languages, such as Common Lisp, allow optional type declarations for optimization for this very reason; static typing makes such optimization pervasive.

By contrast, dynamic typing may allow compilers to run more quickly and allow interpreters to load new code dynamically, since changes to source code in dynamically typed languages may result in less checking to perform and less code to revisit. This too may shorten the edit-compile-test-debug cycle. Dynamic typing also allows constructs that some static type checking would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible; an eval function is possible with static typing, but requires advanced uses of algebraic data types. Furthermore, dynamic typing better accommodates transitional code and prototyping, such as allowing a placeholder data structure (a mock object) to be used transparently in place of a full-fledged data structure, usually for experimentation and testing.

*****


Exercises
Solution 7
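a. Length of string s = start[s+1] - start[s]

b. To output the string s:

    for (i = start[s]; i < start[s+1]; i++)
        putchar(pool[i]);

c. To compare two strings s and t (same ends up 1 if they are equal):

    n = start[s+1] - start[s];
    same = (n == start[t+1] - start[t]);    /* lengths must match */
    for (i = 0; same && i < n; i++)
        same = (pool[start[s]+i] == pool[start[t]+i]);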

Solution 8
    void delete(struct node **q, int num)   /* delete the first node holding num */
    {
        struct node *old = NULL, *temp = *q;
        while (temp != NULL) {
            if (temp->data == num) {
                if (temp == *q)             /* node to be deleted is the first node */
                    *q = temp->link;
                else
                    old->link = temp->link;
                free(temp);                 /* free the memory occupied by the node */
                return;
            }
            old = temp;
            temp = temp->link;
        }
    }

    void addatbeg(struct node **q, int num) /* add a node at the beginning */
    {
        struct node *temp = malloc(sizeof(struct node));   /* add a new node */
        if (temp == NULL)
            return;
        temp->data = num;
        temp->link = *q;
        *q = temp;
    }

Solution 9
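a. A binary tree can be implemented with the following node type:

    struct tree {
        int data;
        struct tree *left;      /* left child  */
        struct tree *right;     /* right child */
    };

Each node of the tree is a struct tree, where left and right point to the left and right children respectively.

b. member(), assuming the tree is kept as a binary search tree (smaller values in the left subtree):

    int member(struct tree *t, int x)
    {
        while (t != NULL && t->data != x)
            t = (x < t->data) ? t->left : t->right;   /* search left or right child */
        return t != NULL;                              /* found if not NULL */
    }

c. insert(), under the same assumption:

    struct tree *insert(struct tree *t, int x)
    {
        if (t == NULL) {                    /* reached a NULL child: add node here */
            t = malloc(sizeof(struct tree));
            if (t == NULL)
                return NULL;
            t->data = x;
            t->left = t->right = NULL;
        } else if (x < t->data)
            t->left = insert(t->left, x);   /* keep traversing left  */
        else if (x > t->data)
            t->right = insert(t->right, x); /* keep traversing right */
        return t;
    }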

*****


