Table of Contents
INTRODUCTION
1 THE ROLE TYPES
Values and Their Types
Type Expressions
Types in This Unit
Static Layout Decisions
5 SETS
Set Values
Set Types
Implementation of Sets
Operations on Sets
Types: Data Representation 2
Introduction
This report is based on the chapter Types: Data Representation of the subject Principles of Programming Language, which deals with data in imperative languages. Imperative languages are treated separately from functional languages because the emphasis in imperative languages is on data structures with assignable components, i.e. components that can hold individual values. The size and layout of data structures tend to be fixed at compile time in these languages, before a program is run. Data structures that grow and shrink are typically implemented using fixed-size cells and pointers, and allocation and de-allocation of storage must be done explicitly. The sections that follow introduce the basic types (integers and reals), programming styles regarding characters and type conversion, and arrays with their types, layout, bounds (static and dynamic) and storage. They also describe records, their fields, specification and operations, as well as pointers, sets, strings, and error and type checking. We hope this work helps students understand the topic and cope with the subject.
Further, integers can be manipulated using operations such as = and +. For example, let n be the integer representation of a day p; then the function tomorrow(p) can be computed as n+1.
Constructed or structured types:
o built up from simpler types
o laid out using sequences of locations in the machine
o arrays, records, and pointers
Type Expressions
A type expression describes how a data representation is built up. The term is meant to avoid confusion between type expressions such as array [1..100] of char and arithmetic expressions such as x+y-z. Type expressions are used to lay out values in the underlying machine and to check that operations are applied properly within expressions. "Type expression" is often abbreviated to "type", as in: the type array [0..99] of char. The role of a type is determined by how its type expression is used, so it is important to write type expressions correctly.
Sample of types using the syntax of Pascal:
<simple> ::= <name> | <enumeration> | <subrange>
<type> ::= <simple> | array [ <simple> ] of <type> | record <field-list> end | set of <simple> | ^ <name>
A pointer usually fits in a machine word, independent of the size of the data it points to:
p : ^T
Sets are implemented with a bit for each potential element; the bit for an element has value 1 if the element is in the set:
A : set of [1..5]
Array elements are laid out in consecutive machine locations; they all have the same type and occupy the same amount of space. Each field of a record has its own type with its own layout.
o Field names are also referred to as selectors, field identifiers, or member names.
o If E denotes a record with a field named f, then the field itself is denoted by E.f.
type termerp = record
    spell: array [0..99] of char;
    length: integer;
end;
references to objects or data values. In these cases, arrays still consist of elements of a single type, but the elements can reference objects or data values of different types. Such arrays are still homogeneous, because the array elements are of the same type. C# and Java 5.0 provide generic arrays, that is, arrays whose elements are references to objects, through their class libraries.
An array is a collection of variables that are all of the same type. It is a data structure that stores a collection of data of the same type under a single variable name. The declaration of an array includes the element type, that is, the type of the values to be stored in it. The fundamental property of arrays is that A[i], the ith element of array A, can be accessed quickly for any value of i at run time. The index i is often an integer, but it does not need to be. A language designer can allow the index to be of any type, so long as A[i] can be accessed efficiently. This section contains an example of an array indexed by characters.
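As a minimal sketch of an array indexed by characters, the C function below counts how often each character occurs in a string. The name count_chars is hypothetical, and the sketch assumes that a character code, converted to unsigned char, may be used directly as the subscript.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: counts[ch] receives the number of times the
   character ch occurs in s. The array needs UCHAR_MAX + 1 elements,
   one for every possible character code. */
void count_chars(const char *s, size_t counts[UCHAR_MAX + 1])
{
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;   /* the character is the index */
}
```

After a call such as count_chars("hello world", counts), the element counts['l'] can be read in one step, exactly like an element selected by an integer subscript.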
Array Types
An array type specifies the index of the first and last elements of the array and the type of all the elements. The index of the first element is called the lower bound and the index of the last element is called the upper bound of the array. An array type has the form
array [ <simple> ] of <type>
where <type> gives the type of the array elements and [ <simple> ] specifies the lower and upper bounds. Pascal allows the array index type to be an enumeration or a subrange, so the bounds are given either by listing the index values or by specifying their range. The following are some examples of array types:
array [1996..2000] of real
array [ (mon, tue, wed, thu, fri) ] of integer
Array types are distinguished from record types mainly because they allow the element indices to be computed at run time, as in the Pascal assignment A[I, J] := A[N-I, 2*J]. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array variable.
Array bounds
Bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is particularly relevant to a variable used as an index into an array, to ensure that its value lies within the bounds of the array. Examples: a value of 32768 about to be assigned to a sixteen-bit signed integer variable (whose bounds are -32768 to +32767), or an access to element 25 of an array with index range 0 through 9. The first is also known as range checking, the second as index checking. A failed bounds check usually results in the generation of some sort of exception signal.
Pascal initially included the bounds in an array type. The problem with this approach is that an array of 10 integers then has a different type from an array of 100 integers. The problem shows up when procedures are considered. A Pascal procedure expects arguments of a specific type, so a program would need different procedures for sorting arrays of 10 elements and arrays of 100 elements. Such problems are solved by using parameterized types, where the array bounds are passed as parameters.
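C performs neither check automatically, so the two kinds of check described above can be sketched by hand. The function names below are hypothetical. Passing the length n as a parameter also illustrates how one procedure can serve arrays of any size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Range check: true if v fits in a 16-bit signed integer. */
bool fits_int16(long v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Index check: true if i is a valid subscript for an array of n elements. */
bool index_in_bounds(size_t i, size_t n)
{
    return i < n;
}

/* Checked read: stores a[i] in *out and returns true,
   or returns false and leaves *out unchanged when i is out of bounds. */
bool checked_get(const int *a, size_t n, size_t i, int *out)
{
    if (!index_in_bounds(i, n))
        return false;
    *out = a[i];
    return true;
}
```

Here a false return value plays the role of the exception signal; a real program might instead abort or report an error.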
Array Layout
The layout of an array determines the machine address of an element A[i] relative to the address of the first element. Layout can occur separately from allocation, which reserves the actual machine addresses for the array elements. Given the declaration
var A: array[low..high] of T
the elements of array A appear in consecutive locations in the underlying machine. Let w be the width of each array element, i.e. each element of type T occupies w locations. Then, if A[low] begins at location base, A[low+1] begins at base + w, A[low+2] at base + 2*w, and so on.
The formula for computing the address of A[i] therefore depends on the value of i. The address of A[i] is base + (i - low)*w, which is best expressed as i*w + (base - low*w), where i*w has to be computed at run time but (base - low*w) can be precomputed. In C, the first element of an array is the zeroth element, so low = 0 and the formula simplifies to i*w + base.
Rectangular arrays are allocated in memory by row. Let's see how the memory is allocated for this array:
char ttt[3][3] = {{'x', 'x', 'o'}, {'o', 'o', 'x'}, {'x', 'o', ' '}};
The memory for this array could be visualized as a grid of three rows and three columns in which each cell is identified by its subscripts.
Because memory is addressed linearly, a better representation is a single sequence of cells in row order: ttt[0][0], ttt[0][1], ttt[0][2], ttt[1][0], and so on up to ttt[2][2].
char a[ROWS][COLS]; // assume ROWS and COLS are constants
Because arrays are laid out in memory by row, the length of each row is COLS (the number of columns is the size of a row). Suppose you want to find the address of a[r][c]. The baseAddress of the array is the address of its first element. The rowSize is COLS in the above example. The elementSize is the number of bytes required to represent one element (typically 1 for char, 4 for int, 4 for float, and 8 for double). The address is then
baseAddress + elementSize * (r * rowSize + c)
The number of rows (ROWS) is not used in the computation. For this reason there is no need to pass it when declaring a formal array parameter for a two-dimensional array.
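Both address formulas can be written out as a short C sketch. The function names and the sizes ROWS = 3 and COLS = 4 are assumptions made for illustration. The last function compares the hand-computed address with the one the compiler produces for a[r][c].

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };   /* illustrative sizes only */

/* One-dimensional case: address of A[i] for an array A[low..high] whose
   elements are w locations wide and whose first element starts at base. */
long element_address(long base, long low, long w, long i)
{
    long precomputed = base - low * w;   /* does not depend on i */
    return i * w + precomputed;
}

/* Two-dimensional, row-major case: byte offset of a[r][c] from the base. */
size_t offset_2d(size_t r, size_t c, size_t row_size, size_t element_size)
{
    return element_size * (r * row_size + c);
}

/* The formal parameter needs COLS but not ROWS. */
int *cell(int a[][COLS], size_t r, size_t c)
{
    return (int *)((char *)a + offset_2d(r, c, COLS, sizeof(int)));
}
```

For an array declared as int a[ROWS][COLS], the pointer cell(a, r, c) should compare equal to &a[r][c] for every valid r and c.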
Storage Allocation
There are different ways to allocate storage:
o Static allocation: allocates storage at compile time
o Stack allocation: manages run-time storage as a stack
o Heap allocation: allocates and de-allocates storage as needed at run time
Static Allocation
In static allocation, names are bound to storage locations at compile time. Because these bindings do not change at run time, the values of local names are retained across activations of a procedure.
At compile time, the compiler determines how much storage should be allocated for each object. It also determines the following:
1. Where the activation records go, relative to the target code
2. Where the addresses should be filled in the records
3. The addresses for the procedure calls
Limitations:
o Size and position of data objects must be known at compile time.
o Recursive procedures are restricted.
o Data objects cannot be created dynamically.
Example: FORTRAN. A FORTRAN compiler might place the activation record for a procedure together with the code for that procedure. In some systems it is possible to use a link editor to link activation records and executable code.
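C offers a small-scale illustration of static allocation through the static storage class. In the sketch below, the hypothetical function next_id keeps a local variable in statically allocated storage, so its value is retained from one call to the next.

```c
/* counter is allocated once, before the program runs, and is bound to
   the same storage location in every activation of next_id. */
int next_id(void)
{
    static int counter = 0;
    counter++;
    return counter;
}
```

Each call returns a value one greater than the previous call, which would not happen if counter were an ordinary stack-allocated local.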
Stack Allocation
Storage is organized as a stack. Activation records are pushed and popped as activations begin and end. Storage for the locals in each call of a procedure is contained in the activation record for that call. The values of locals are deleted when the activation ends. A register can be used to mark the top of the stack. At run time an activation record can be allocated and de-allocated by incrementing and decrementing that register.
Array Initialization
Some languages provide the means to initialize arrays at the time their storage is allocated. In Fortran 95+, an array can be initialized by assigning it an array aggregate in its declaration. An array aggregate for a single-dimensioned array is a list of literals delimited by parentheses and slashes. For example, we could have integer. In the C declaration
int list [] = {4, 5, 7, 83};
the compiler sets the length of the array. This is meant to be a convenience but is not without cost: it effectively removes the possibility that the system could detect some kinds of programmer errors, such as mistakenly leaving a value out of the list.
Character strings in C and C++ are implemented as arrays of char. These arrays can be initialized to string constants, as in
char name [] = "freddie";
The array name will have eight elements, because all strings are terminated with a null character (zero), which is implicitly supplied by the system for string constants. Arrays of strings in C and C++ can also be initialized with string literals. In this case, the array is one of pointers to characters. For example,
char *names [] = {"Bob", "Jake", "Darcie"};
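The element counts stated above can be checked with sizeof. This is a small sketch: the helper functions name_elements and names_elements are hypothetical, and const has been added to the pointer array because string literals should not be modified.

```c
#include <stddef.h>
#include <string.h>

char name[] = "freddie";                          /* 7 letters + null character */
const char *names[] = {"Bob", "Jake", "Darcie"};  /* 3 pointers to characters */

/* Number of elements the compiler chose for each array. */
size_t name_elements(void)  { return sizeof name / sizeof name[0]; }
size_t names_elements(void) { return sizeof names / sizeof names[0]; }
```

Note that strlen(name) reports 7, because it stops at the null character, while the array itself has 8 elements.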
For example, a database-style record could be written as:
<customer_name>: <char>;
<account_no>: <integer>;
<id_number>: <varchar>;
Here each name is the name of a field and the type after the colon is the type of that field; each field has its own distinct name within a record.
For example, a record type complex with two fields, re and im, can be declared as follows:
type complex = record
    re: real;
    im: real;
end;
Declarations of fields with the same type can also be combined. Here re and im are both of type real:
type complex = record
    re, im: real;
end;
We are free to list the record fields in any order. Fields are accessed by name and not by relative position as in an array, so changing the order of the fields of a record has no effect on the meaning of the program.
A Variable Declaration Allocates Storage
Storage is allocated when the template is applied in a variable declaration, not when the template is described. So once a record type is described, we can use it in a variable declaration. Example:
var a, b, c : complex;
This declaration associates storage with the variables a, b and c; the layout of that storage is determined by the type complex.
Operations on Records
If expression E denotes a record with a field named f, then the field itself is denoted by E.f, which has both a location and a value.
Example: c.re := a.re + b.re
Here the sum of the values of fields a.re and b.re is placed in the location of c.re.
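The same field selection can be sketched in C, where a record is a struct. The function complex_add is a hypothetical helper, and double stands in for Pascal's real.

```c
/* C counterpart of the record type complex with fields re and im. */
struct complex {
    double re;
    double im;
};

/* Field-by-field addition: each E.f names a location inside a record. */
struct complex complex_add(struct complex a, struct complex b)
{
    struct complex c;
    c.re = a.re + b.re;
    c.im = a.im + b.im;
    return c;
}
```

The declaration of struct complex only describes the template; storage is allocated when variables such as a, b and c are declared.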
Comparison between arrays and records with respect to their possible component types, layout and component selection:
Arrays: All elements have the same type and occupy the same amount of space. The element selected by A[i] can change at run time, depending on the value of i.
Records: Each field can have a different type and hence can occupy a different amount of space. Fields are selected by names that are known at compile time.
Records: Records are for representing objects with common properties; all records of the same type have the same fields. Variant records are for representing objects that have some but not all properties in common.
Unions: A union is a special case of a variant record, with an empty common part. Variant records have a part common to all records of that type, and a variant part, specific to some subset of the records. Specifically, suppose that objects in a set can be classified into n disjoint subsets (n > 1). Such objects can be represented by records with n variant parts, one per subset, and a part common to all subsets.
Unions are used to view the same data item in different ways. The following example in C shows how a 16-bit register AX is made up of two 8-bit registers, AH and AL.
struct WORDREGS {
    unsigned int ax;
    unsigned int bx;
    unsigned int cx;
    unsigned int dx;
    unsigned int si;
    unsigned int di;
    unsigned int cflag;
};
struct BYTEREGS {
    unsigned char al, ah;
    unsigned char bl, bh;
    unsigned char cl, ch;
    unsigned char dl, dh;
};
A union of these two structures provides access to either WORDREGS or BYTEREGS. It is defined as:
union REGS {
    struct WORDREGS x;
    struct BYTEREGS h;
};
A variable declared as union REGS reg can access the AH register as reg.h.ah and the AX register as reg.x.ax.
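A portable variant of this idea can be sketched with fixed-width types, since unsigned int is 16 bits wide only on some compilers. The type and function names below are hypothetical. Which of al and ah receives the low-order byte depends on the byte order of the machine; on a little-endian machine it is al.

```c
#include <stdint.h>

struct word_regs { uint16_t ax, bx, cx, dx; };
struct byte_regs { uint8_t al, ah, bl, bh, cl, ch, dl, dh; };

/* Two views of the same eight bytes of storage. */
union regs {
    struct word_regs x;
    struct byte_regs h;
};

/* Store a value through the word view, read it back through the byte view. */
uint8_t read_al(uint16_t ax_value)
{
    union regs reg;
    reg.x.ax = ax_value;
    return reg.h.al;
}

uint8_t read_ah(uint16_t ax_value)
{
    union regs reg;
    reg.x.ax = ax_value;
    return reg.h.ah;
}
```

Because both members share one block of storage, the union is no larger than either structure on its own.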
In Pascal a variant part appears after the fixed part of a record. The constants <constant1>, <constant2>, ..., <constantv> in the syntax correspond to distinct states of the variant part; each state has its own field layout. The state depends on the constant stored in a special field, called a tag field, with name <tag-name> and type given by <type-name>.
type kind = (leaf, unary, binary);
     node = record
         c1: T1;
         c2: T2;
         case k: kind of
             leaf: ( );
             unary: ( child: T3 );
             binary: ( lchild, rchild: T4 );
     end;
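C has no variant records, but a struct holding a tag and a union gives a comparable layout. The sketch below mirrors the node type above under stated assumptions: T1 and T2 are taken to be int, T3 and T4 are taken to be pointers to nodes, and count_nodes is a hypothetical function that uses the tag to decide which variant fields exist.

```c
#include <stddef.h>

enum kind { LEAF, UNARY, BINARY };

struct node {
    int c1;                 /* fixed part (types assumed for illustration) */
    int c2;
    enum kind k;            /* tag field: records the current state */
    union {                 /* variant part; a LEAF has no extra fields */
        struct { struct node *child; } unary;
        struct { struct node *lchild, *rchild; } binary;
    } u;
};

/* Count the nodes of a tree, selecting fields according to the tag. */
int count_nodes(const struct node *n)
{
    if (n == NULL)
        return 0;
    switch (n->k) {
    case LEAF:
        return 1;
    case UNARY:
        return 1 + count_nodes(n->u.unary.child);
    case BINARY:
        return 1 + count_nodes(n->u.binary.lchild)
                 + count_nodes(n->u.binary.rchild);
    }
    return 0;
}
```

As in Pascal, nothing stops a program from reading u.binary while the tag says UNARY; keeping tag and variant consistent is left to the programmer.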
Record type t consists of only a variant part. The type name kind after case allows the variant part to be in one of two states, given by the constants 1 and 2 associated with kind. Since no tag name appears between case and the type name kind, there is no tag field; the state of the variant part therefore cannot be stored within the record. The only possible fields for variable x are x.i and x.r, only one of which exists at any given time. Since the state is not stored within the record, an implementation cannot check whether x is in state 1 when x.i is selected and whether x is in state 2 when x.r is selected.
5 SETS
Sets can be implemented efficiently using bits in the underlying machine. Operations on sets turn into bit operations. Pascal allows sets to be used as values. It also provides a type constructor, set of, for building set types from enumerations and subranges.
Set Values
A set is written in Pascal by writing its elements between the set brackets, [ and ]. The following are examples of sets:
[ ]    [ 0..9 ]    [ 'a'..'z', 'A'..'Z' ]    [ Mon..Sun ]
The elements of a set can be written individually or as subranges. The empty set, [ ], is the set with no elements. All set elements must be of the same simple type, specifically integer, an enumeration, or a subrange of these types.
Set Types
The type set of S represents subsets of S. For example, consider variable A declared by
var A: set of [ 1..3 ]
A can denote one of the following sets:
[ ], [ 1 ], [ 2 ], [ 3 ], [ 1, 2 ], [ 1, 3 ], [ 2, 3 ], [ 1, 2, 3 ]
Since these are all the subsets of [ 1, 2, 3 ], the type set of S in Pascal should perhaps be called subset of S.
Implementation of Sets
These sets can all be represented using three bits. Element 1 is represented by the first bit, element 2 by the second bit, and element 3 by the third bit. The set [ 1, 3 ] can then be encoded by the bit vector 101. A set of n elements is implemented as a bit vector of length n. Since the bit vector must typically fit in one or a few machine words, an implementation is allowed to impose a limit on the maximum number of elements in a set, which restricts the usefulness of sets.
Operations on Sets
The basic operation on sets is a membership test. The operation x in A tests if an element x belongs to a set A. Bit vectors allow the following operations on sets to be implemented efficiently, by using bit-wise operations:
A + B    set union
A - B    set difference
A * B    set intersection
A / B    symmetric difference
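The bit-vector implementation can be sketched in C with one unsigned word per set. The names below are hypothetical, and the sketch assumes that element x is stored in bit x - 1, counted from the least significant end, so that [ 1, 3 ] becomes binary 101.

```c
#include <stdbool.h>

typedef unsigned int set_t;   /* one machine word; bit (x - 1) stands for x */

set_t set_of(int x)                       { return 1u << (x - 1); }
bool  set_member(int x, set_t a)          { return (a & set_of(x)) != 0; }
set_t set_union(set_t a, set_t b)         { return a | b; }
set_t set_difference(set_t a, set_t b)    { return a & ~b; }
set_t set_intersection(set_t a, set_t b)  { return a & b; }
set_t set_symdiff(set_t a, set_t b)       { return a ^ b; }
bool  set_subset(set_t a, set_t b)        { return (a & ~b) == 0; }
```

Every operation is a single bit-wise instruction on the word, which is why the number of elements is limited by the word size in this sketch.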
Sets can be compared using the relational operators <=, =, <>, >=, where <= is interpreted as subset and >= is interpreted as superset. Note, however, that the operations < and > are not allowed.
Example: The following program fragment demonstrates where sets can be used:
case ch of
    '+', '-', '*', '/', '(', ')', ';':
        begin lookahead := tok[ch]; ch := ' ' end;
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
        begin . . . lookahead := number end;
end;
If the labels that select a statement are grouped into a set, then a membership test can be used to select the statement.
A derived binary relation between two sets is the subset relation, also called set inclusion. If all the members of set A are also members of set B, then A is a subset of B, denoted A ⊆ B. For example, {1, 2} is a subset of {1, 2, 3}, but {1, 4} is not. From this definition, it is clear that a set is a subset of itself; in cases where one wishes to avoid this, the term proper subset is defined to exclude this possibility.
Just as arithmetic features binary operations on numbers, set theory features binary operations on sets:
o Union of the sets A and B, denoted A ∪ B, is the set of all objects that are a member of A, or B, or both. The union of {1, 2, 3} and {2, 3, 4} is the set {1, 2, 3, 4}.
o Intersection of the sets A and B, denoted A ∩ B, is the set of all objects that are members of both A and B. The intersection of {1, 2, 3} and {2, 3, 4} is the set {2, 3}.
o Set difference of U and A, denoted U \ A, is the set of all members of U that are not members of A. The set difference {1, 2, 3} \ {2, 3, 4} is {1}, while, conversely, the set difference {2, 3, 4} \ {1, 2, 3} is {4}. When A is a subset of U, the set difference U \ A is also called the complement of A in U. In this case, if the choice of U is clear from the context, the notation A^c is sometimes used instead of U \ A, particularly if U is a universal set as in the study of Venn diagrams.
o Symmetric difference of sets A and B is the set of all objects that are a member of exactly one of A and B (elements which are in one of the sets, but not in both). For instance, for the sets {1, 2, 3} and {2, 3, 4}, the symmetric difference set is {1, 4}. It is the set difference of the union and the intersection, (A ∪ B) \ (A ∩ B), or equivalently (A \ B) ∪ (B \ A).
o Cartesian product of A and B, denoted A × B, is the set whose members are all possible ordered pairs (a, b) where a is a member of A and b is a member of B. The Cartesian product of {1, 2} and {red, white} is {(1, red), (1, white), (2, red), (2, white)}.
o Power set of a set A is the set whose members are all possible subsets of A. For example, the power set of {1, 2} is { {}, {1}, {2}, {1, 2} }.
Some basic sets of central importance are the empty set (the unique set containing no elements), the set of natural numbers, and the set of real numbers.
For an array declared as char str1[] = "hello world"; the statement
printf("%d", (int)sizeof(str1));
gives the OUTPUT 12, the size of the whole array including the terminating null character. Similarly, for
char *str2 = "hello world";
printf("%d", (int)sizeof(str2));
the OUTPUT (in 16 bit compilers) is 2, because sizeof yields the size of the pointer itself, not of the string it points to.
Pointers are also efficient because reaching a memory location does not require traversing the locations before it; the address is computed directly. This saves execution time. Example:
case 1
char arr1[] = "hello world";
printf("%c", arr1[4]);
case 2
char arr2[] = "hello world";
char *ptr = arr2;
printf("%c", *(ptr+4));
The output in both cases is the same, i.e. 'o'. In case 2 the pointer variable stores the base address of array arr2; to reach element 4 the program adds 4 to the base address and applies the dereferencing operator * to get the value at that memory location. Case 1 performs the same computation, since C defines arr1[4] as *(arr1+4); neither form steps through arr1[0] to arr1[3].
Dynamic Data
Data structures that grow and shrink during run time can be implemented via pointers. This is called dynamic memory management.
Example: The functions malloc() and calloc() in C provide dynamic memory management. It is necessary because the size or type of the input is sometimes not known until execution time. To handle such a situation we use dynamic memory management to allocate the necessary resources (memory) at run time.
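A minimal sketch of both calls follows. The function names are hypothetical; the point is only that the number of elements n is not known until run time. malloc() leaves the new storage uninitialized, while calloc() clears it to zero.

```c
#include <stdlib.h>

/* Returns a newly allocated array holding 0, 1, ..., n-1, or NULL if the
   allocation fails. The caller releases it with free(). */
int *make_sequence(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    return p;
}

/* Returns a newly allocated array of n zeros, or NULL on failure. */
int *make_zeroed(size_t n)
{
    return calloc(n, sizeof(int));
}
```

Every successful allocation should eventually be matched by one call to free().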
Operations on Pointers
The basic operation on pointers is dereferencing, which is done using the * operator in C/C++ and the ^ operator in Pascal. In Pascal there are five basic operations on pointers:
1. Dynamic allocation on the heap: Execution of new(p) leaves p pointing to a newly allocated data structure of type T on the heap.
2. Dereferencing: The expression p^ denotes the data structure pointed to by p.
3. Assignment: Assignments are permitted between pointers of the same type.
4. Equality testing: The equality relation = tests if two pointers of the same type point to the same data structure.
5. De-allocation: A dynamic data structure exists until it is explicitly released by execution of the statement dispose(p).
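The five operations have close counterparts in C, sketched below with hypothetical names. The structure cell and its fields are assumptions made for illustration; malloc() plays the role of new and free() the role of dispose.

```c
#include <stdbool.h>
#include <stdlib.h>

struct cell {
    int info;
    struct cell *next;
};

/* 1. Dynamic allocation: counterpart of new(p). Returns NULL on failure. */
struct cell *new_cell(int info)
{
    struct cell *p = malloc(sizeof *p);
    if (p != NULL) {
        p->info = info;
        p->next = NULL;
    }
    return p;
}

/* 2. Dereferencing: (*p).info corresponds to p^.info in Pascal. */
int cell_info(const struct cell *p) { return (*p).info; }

/* 4. Equality testing: true only if both point to the same cell. */
bool same_cell(const struct cell *p, const struct cell *q) { return p == q; }

/* 5. De-allocation: counterpart of dispose(p). */
void dispose_cell(struct cell *p) { free(p); }
```

Operation 3, assignment, is simply q = p; afterwards p and q point to the same cell, so same_cell(p, q) holds even though no data has been copied.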
dynamically, i.e. at run time. Via insert() or append() a linked list data structure can grow dynamically, and via delete() it can shrink if required. The conceptual or logical organization of a linked list data structure is different from its representation in physical memory. The links between cells determine the logical organization. The connected cells need not be physically adjacent to each other; they can be anywhere in memory.
The free() function can be used to de-allocate memory in the above example. Memory that is never freed is leaked, and a program that keeps leaking memory can eventually run out of heap space.
A Representation in Pascal
The TeX typesetting program uses two arrays, say pool and start, to hold character strings like:
TeX troff word
The individual characters of the strings are kept in array pool. Elements of the other array, start, point to the first character of each string. The actual array names in the code of TeX are str_pool and str_start. Element start[s] is the index in pool of the first character of string s, so the start array records where each new string begins, and the array pool contains all the strings.
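The two-array representation can be sketched in C as follows. This is an illustration of the idea only, not the actual TeX code; the sizes and the functions add_string and string_length are assumptions.

```c
#include <stddef.h>
#include <string.h>

enum { POOL_SIZE = 64, MAX_STRINGS = 8 };   /* illustrative limits */

char   pool[POOL_SIZE];          /* all characters, stored back to back */
size_t start[MAX_STRINGS + 1];   /* start[s]: index in pool of string s */
size_t string_count = 0;         /* start[string_count]: first free cell */

/* Append text to the pool; returns its string number, or -1 if full. */
int add_string(const char *text)
{
    size_t len = strlen(text);
    if (string_count == MAX_STRINGS || start[string_count] + len > POOL_SIZE)
        return -1;
    memcpy(pool + start[string_count], text, len);
    start[string_count + 1] = start[string_count] + len;
    return (int)string_count++;
}

/* No terminator is stored; the length comes from two adjacent entries. */
size_t string_length(int s)
{
    return start[s + 1] - start[s];
}
```

After adding "TeX", "troff" and "word" in that order, pool begins with TeXtroffword and start holds 0, 3, 8, 12.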
If x denotes 3.14, then it denotes a real number. Alternatively, if x denotes true, then it denotes a Boolean value.
The type of an expression x + y can be inferred from the types of x and y. Since 2 and 2 are integers and the sum of two integers is an integer, 2 + 2 must also denote an integer.
Variable Binding
The language design determines whether a variable has a fixed type. In Pascal and C, the type of a variable is fixed; if i is declared to have a type integer, then it must denote an integer, although the integer it denotes can change at run time. The assignment i := i+1 changes the value denoted by i, but the new value is again an integer. Lisp and Smalltalk do not restrict the types of the values a variable can denote at run time. The distinction between Pascal and Lisp can be explained in terms of binding times. A variable binding associates a property with a variable. Thus, an assignment x := 3.14 is a binding that associates the value 3.14 with x. A binding is static if it occurs before a program runs; it is dynamic if it occurs at run time. Static bindings are sometimes referred to as early bindings, and dynamic bindings are referred to as late bindings.
Thus Pascal has static binding of types and dynamic bindings of values to variables, whereas Lisp has dynamic binding of both values and types.
Type Systems
A type system associates a type with each computed value. By examining the flow of these values, a type system attempts to ensure or prove that no type errors can occur. The particular type system in question determines exactly what constitutes a type error, but in general the aim is to prevent operations that expect a certain kind of value from being used with values for which the operation does not make sense (logic errors); memory errors are also prevented. Type systems are often specified as part of programming languages and built into their interpreters and compilers, although they can also be implemented as optional tools. The depth of type constraints and the manner of their evaluation affect the typing of the language. In the case of type polymorphism, a programming language may further associate an operation with different concrete algorithms for each type. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design. A program associates each value with at least one particular type, but it can also happen that one value is associated with many subtypes. Other entities, such as objects, modules, communication channels and dependencies, can become associated with a type.
Even a type can become associated with a type. An implementation of some type system could in theory associate some identifications named this way:
o data type: a type of a value
o class: a type of an object
o kind (type theory): a type of a type, or metatype
These are the kinds of abstractions typing can go through on a hierarchy of levels contained in a system.
Example: Type system for Fortran:
o Variable names starting with the letters I through N have type int; all other names have type real.
o A number has type real if it contains a decimal point; otherwise it has type int.
o If expressions E and F have the same type, then E + F, E - F, E * F and E / F are expressions of that same type.
Arithmetic operators
Arithmetic operators are functions. Associated with each operator op is a rule that specifies the type of an expression E op F in terms of the types of E and F. An example is
Familiar operator symbols like + and * are overloaded; that is, these symbols have different meanings in different contexts. + is used for both integer and real addition, so it has two possible types. The treatment of + can therefore be restated using the following pair of rules:
o If E has type int and F has type int, then E + F also has type int.
o If E has type real and F has type real, then E + F also has type real.
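The rules for + and the Fortran naming rule given earlier can be sketched as a tiny type checker in C. All names are hypothetical, and the sketch assumes that mixed int and real operands are simply rejected, since the two rules above say nothing about that case.

```c
#include <ctype.h>

enum type { TYPE_INT, TYPE_REAL, TYPE_ERROR };

/* Fortran-style rule: names starting with I through N have type int,
   all other names have type real. Assumes a character set in which the
   letters I..N are contiguous, as in ASCII. */
enum type type_of_name(const char *name)
{
    int c = toupper((unsigned char)name[0]);
    return (c >= 'I' && c <= 'N') ? TYPE_INT : TYPE_REAL;
}

/* Type of E + F, given the types of E and F. */
enum type type_of_sum(enum type e, enum type f)
{
    if (e == TYPE_INT && f == TYPE_INT)
        return TYPE_INT;
    if (e == TYPE_REAL && f == TYPE_REAL)
        return TYPE_REAL;
    return TYPE_ERROR;   /* assumption: no coercion between int and real */
}
```

For instance, the sum of variables named i and j is given type int, while the sum of x and y is given type real.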
Polymorphism
A polymorphic function has a parameterized type, also called a generic type. Data structures like stacks and queues can be defined to hold values of any type. Thus, we can define stacks of integers, stacks of grammar symbols, and so on. When code for such data structures is put into a library, a library designer cannot possibly anticipate all future uses of the data structure. Polymorphic types allow such a data structure to be defined once and then applied later to any desired type. In imperative languages like Pascal and C, the only polymorphic functions are operations on built-in types. C++ supports parameterized types, using a construct called templates. Functional languages have long supported polymorphic types.
array[ 0..9 ] of integer
array[ 0..9 ] of integer
In a larger context, if variables x, y, and z are declared as follows, are their types equal?
x, y : array[ 0..9 ] of integer
z    : array[ 0..9 ] of integer
In Pascal, x and y have the same type, because they are declared together, but z does not. In the corresponding C fragments, x, y and z all have the same type.
Structural Equivalence
Two type expressions are structurally equivalent if and only if they are equivalent under the following three rules:
SE1. A type name is structurally equivalent to itself.
SE2. Two types are structurally equivalent if they are formed by applying the same type constructor to structurally equivalent types.
SE3. After a type declaration, type n = T, the type name n is structurally equivalent to T.
By these rules, the types char and char are structurally equivalent, and so are the type names S and T:
type S = array [ 0..99 ] of char;
type T = array [ 0..99 ] of char;
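These rules translate into a short recursive function over trees of type expressions. The C sketch below is one possible encoding, not a definitive one: the structure type_expr and its constructors are hypothetical, declared names are assumed to have been replaced by their definitions already (rule SE3), and recursive types are not handled.

```c
#include <stdbool.h>
#include <string.h>

enum ctor { BASIC, ARRAY, POINTER };

struct type_expr {
    enum ctor ctor;
    const char *name;               /* BASIC: a name such as "char" */
    int low, high;                  /* ARRAY: index bounds */
    const struct type_expr *elem;   /* ARRAY, POINTER: component type */
};

bool struct_equiv(const struct type_expr *s, const struct type_expr *t)
{
    if (s->ctor != t->ctor)
        return false;
    switch (s->ctor) {
    case BASIC:      /* SE1: a type name is equivalent to itself */
        return strcmp(s->name, t->name) == 0;
    case ARRAY:      /* SE2: same constructor, equivalent components */
        return s->low == t->low && s->high == t->high
            && struct_equiv(s->elem, t->elem);
    case POINTER:    /* SE2 again, for the pointer constructor */
        return struct_equiv(s->elem, t->elem);
    }
    return false;
}
```

With this encoding, two separately built expressions for array [ 0..99 ] of char compare as equivalent, while changing either the bounds or the element type makes the comparison fail.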
type S = integer;
     T = S;
     U = integer;
o Type expression equivalence. A type name is equivalent only to itself. Two type expressions are equivalent if they are formed by applying the same constructor to equivalent expressions. In other words, the expressions have to be identical.
typed languages may result in less checking to perform and less code to revisit. This too may reduce the edit-compile-test-debug cycle. Dynamic typing allows constructs that some static type checking would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible. An eval function is possible with static typing, but requires advanced uses of algebraic data types. Furthermore, dynamic typing better accommodates transitional code and prototyping, such as allowing a placeholder data structure (mock object) to be transparently used in place of a full-fledged data structure (usually for the purposes of experimentation and testing).
*****
Exercises
Solution 7
a. Length of string =start[s+1]-start[s]
Solution 8
#include <stdlib.h>

struct node { int data; struct node *link; };

/* delete the node holding num if it is the first node in the linked list */
void delete(struct node **q, int num)
{
    struct node *temp = *q;
    if (temp != NULL && temp->data == num) {
        *q = temp->link;
        free(temp);   /* free the memory occupied by the node */
    }
}

/* function to add a node at the beginning */
void addatbeg(struct node **q, int num)
{
    struct node *temp = malloc(sizeof(struct node));   /* add a new node */
    if (temp == NULL)
        return;
    temp->data = num;
    temp->link = *q;
    *q = temp;
}
Solution 9
a. A binary tree can be implemented through the following piece of code:
struct tree {
    int data;
    struct tree *left;
    struct tree *right;
} p;
This creates one node of the tree, p, where left and right point to the left and right child respectively.
b. member() {
    // traverse the binary tree
    // at each node check whether the data field holds the integer sought
    // if it does, return
    // else search in the right or left child
}
c. insert() {
    // traverse the left or right subtree of the root node
    // check whether the child of the current node is NULL
    // if it is NULL, add an integer node as its child
    // else keep traversing until a NULL is encountered
}
*****
Bibliography
1. Principles of Programming: Concepts and Constructs by Ravi Sethi
2. Programming in C by Dennis Ritchie
3. Practical C++ Programming by Steve Oualline
4. Concepts of Programming Languages by Robert W. Sebesta
5. Wikipedia
6. scribd.com
7. authorstream.com
*****