
Master of Computer Application (MCA) - Semester 3

MC0073 - System Programming - 4 Credits


Assignment Set 1 (60 Marks)

1.) Describe the following with respect to Language Specification:

A) Fundamentals of Language Processing

Fundamentals of Language Processing

Language Processing – Definition:

Language Processing = Analysis of Source Program + Synthesis of Target Program.

This definition motivates a generic model of language processing activities. We refer to the
collection of language processor components engaged in analysing a source program as the
analysis phase of the language processor. Components engaged in synthesizing a target program
constitute the synthesis phase.

A specification of the source language forms the basis of source program analysis. The
specification consists of three components:

1. Lexical rules which govern the formation of valid lexical units in the source language.

2. Syntax rules which govern the formation of valid statements in the source language.

3. Semantic rules which associate meaning with valid statements of the language.

Thus, analysis of a source statement consists of lexical, syntax and semantic analysis.

Lexical analysis (Scanning)

Lexical analysis identifies the lexical units in a source statement. It then classifies the units into
different lexical classes, e.g. id’s, constants, reserved id’s, etc. and enters them into different
tables. Lexical analysis builds a descriptor, called a token, for each lexical unit.

Syntax analysis (Parsing)

Syntax analysis processes the string of tokens built by lexical analysis to determine the statement
class, e.g. assignment statement, if statement, etc. It then builds an IC which represents the
structure of the statement. The IC is passed to semantic analysis to determine the meaning of the
statement.

Semantic analysis

Semantic analysis of declaration statements differs from the semantic analysis of imperative
statements. The former results in addition of information to the symbol table, e.g. type, length
and dimensionality of variables. The latter identifies the sequence of actions necessary to
implement the meaning of a source statement. In both cases the structure of a source statement
guides the application of the semantic rules.
Example 1.2: Consider the statement

percent-profit := (profit * 100) / cost-price;

in some programming language. Lexical analysis identifies :=, * and / as operators, 100 as a constant, and the remaining strings as identifiers. Syntax analysis identifies the statement as an
assignment statement with percent-profit as the left hand side and (profit * 100) / cost-price as
the expression on the right hand side. Semantic analysis determines the meaning of the statement to be the assignment of the value of (profit * 100) / cost-price to the variable percent-profit.

The synthesis phase is concerned with the construction of target language statement(s) which
have the same meaning as a source statement. Typically, this consists of two main activities:

· Creation of data structures in the target program

· Generation of target code.

We refer to these activities as memory allocation and code generation, respectively.

Phases and Passes of a language processor

From the preceding discussion it is clear that a language processor consists of two distinct phases: the analysis phase and the synthesis phase. The overall process is too complex to be performed as a single monolithic step, either from a logical point of view or from an implementation point of view. For this reason, it is customary to partition the compilation process into a series of sub-processes called phases.

Phase:

A phase is a logically cohesive operation that takes as input one representation of the source
program and produces as output another representation.

Pass: The portions of one or more phases are combined into a module called a pass. A pass reads
the source program or output of another pass, makes the transformations specified by its phases
and writes the output to an intermediate file, which may then be read by a subsequent pass.

Intermediate representation of programs

The language processor performs certain processing more than once. In pass I, it analyses the
source program to note the type information. In pass II, it once again analyses the source program
to generate target code using the type information noted in pass I. This can be avoided using an
intermediate representation of the source program.

Intermediate Representation (IR)

The first pass performs analysis of the source program, and reflects its results in the intermediate
representation. The second pass reads and analyses the IR, instead of the source program, to
perform synthesis of the target program. This avoids repeated processing of the source program.
The first pass is concerned exclusively with source language issues. Hence it is called the front
end of the language processor. The second pass is concerned with program synthesis for a
specific target language. Hence it is called the back end of the language processor.

Desirable properties of an IR are:

· Ease of use: IR should be easy to construct and analyse.

· Processing efficiency: efficient algorithms must exist for constructing and analysing the IR.

· Memory efficiency: IR must be compact

B) Language Processor development tools

There are two LPDTs (language processor development tools) widely used in practice. These are the lexical analyzer generator LEX and the parser generator YACC. The input to these tools is a specification of the lexical and syntactic constructs of a source language L, together with the semantic actions to be performed on recognizing the constructs.

Compiler or Interpreter for a programming language is often decomposed into two parts:

1. Read the source program and discover its structure.

2. Process this structure, e.g. to generate the target program.

Lex and Yacc can generate program fragments that solve the first task.

The task of discovering the source structure again is decomposed into subtasks:

1. Split the source file into tokens (Lex).

2. Find the hierarchical structure of the program (Yacc).

Lex – A Lexical Analyzer Generator

Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.

Lex source is a table of regular expressions and corresponding program fragments. The table is
translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.
2.) Define the following:
A.) Addressing modes for CISC(Motorola and Intel)

The 68000 (Motorola) addressing modes

· Register to Register,

· Register to Memory,

· Memory to Register, and

· Memory to Memory

The 68000 supports a wide variety of addressing modes.

· Immediate mode – the operand immediately follows the instruction

· Absolute address – the address (in either the "short" 16-bit form or "long" 32-bit form) of
the operand immediately follows the instruction

· Program Counter relative with displacement – A displacement value is added to the program counter to calculate the operand’s address. The displacement can be positive or negative.

· Program Counter relative with index and displacement – The instruction contains both the
identity of an "index register" and a trailing displacement value. The contents of the index
register, the displacement value, and the program counter are added together to get the final
address.

· Register direct – The operand is contained in an address or data register.

· Address register indirect – An address register contains the address of the operand.

· Address register indirect with predecrement or postincrement – An address register contains the address of the operand in memory. With the predecrement option set, a predetermined value is subtracted from the register before the (new) address is used. With the postincrement option set, a predetermined value is added to the register after the operation completes.

· Address register indirect with displacement — A displacement value is added to the register’s contents to calculate the operand’s address. The displacement can be positive or negative.

· Address register relative with index and displacement — The instruction contains both the
identity of an "index register" and a trailing displacement value. The contents of the index
register, the displacement value, and the specified address register are added together to get
the final address.

B)Addressing modes for RISC Machines


The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU design
philosophy that favors a simpler set of instructions that all take about the same amount of
time to execute. The most common RISC microprocessors are AVR, PIC, ARM, DEC
Alpha, PA-RISC, SPARC, MIPS, and IBM’s PowerPC.

· RISC characteristics

- Small number of machine instructions : less than 150

- Small number of addressing modes : less than 4

- Small number of instruction formats : less than 4

- Instructions of the same length : 32 bits (or 64 bits)

- Single cycle execution

- Load / Store architecture

- Large number of GPRs (General Purpose Registers): more than 32

- Hardwired control

- Support for HLL (High Level Language).

3.) Explain the design of single pass and multi pass assemblers.

Single pass translation

LC processing and construction of the symbol table proceed as in two pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially. The address of the forward referenced symbol is put into this field when its definition is encountered.

Look at the following instructions:

START 101
        READ N              101) + 09 0 113
        MOVER BREG, ONE     102) + 04 2 115
        MOVEM BREG, TERM    103) + 05 2 116
AGAIN   MULT BREG, TERM     104) + 03 2 116
        MOVER CREG, TERM    105) + 04 3 116
        ADD CREG, ONE       106) + 01 3 115
        MOVEM CREG, TERM    107) + 05 3 116
        COMP CREG, N        108) + 06 3 113
        BC LE, AGAIN        109) + 07 2 104
        MOVEM BREG, RESULT  110) + 05 2 114
        PRINT RESULT        111) + 10 0 114
        STOP                112) + 00 0 000
N       DS 1                113)
RESULT  DS 1                114)
ONE     DC '1'              115) + 00 0 001
TERM    DS 1                116)
        END

Fig. 1.7

In the above program (fig. 1.7), the instruction corresponding to the statement

MOVER BREG, ONE

can only be partially synthesized, since ONE is a forward reference. Hence the instruction opcode and address of BREG will be assembled to reside in location 101. The need for inserting the second operand’s address at a later stage can be indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (101, ONE) in this case.

By the time the END statement is processed, the symbol table would contain the addresses
of all symbols defined in the source program and TII would contain information describing
all forward references. The assembler can now process each entry in TII to complete the
concerned instruction. For example, the entry (101, ONE) would be processed by obtaining
the address of ONE from symbol table and inserting it in the operand address field of the
instruction with assembled address 101. Alternatively, entries in TII can be processed in an
incremental manner. Thus, when definition of some symbol symb is encountered, all
forward references to symb can be processed.

Design of Multi Pass Assembler

Tasks performed by the passes of a two pass assembler are as follows:

Pass I:

1. Separate the symbol, mnemonic opcode and operand fields.

2. Build the symbol table.

3. Perform LC processing.

4. Construct intermediate representation.

Pass II: Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation, while Pass II processes the intermediate representation to synthesize the target program. The design details of assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.
4.) Explain the following with respect to Macros and Macro Processors:
A) Macro Definition and Expansion

Macro definition and Expansion

Definition: macro

A macro name is an abbreviation, which stands for some related lines of code. Macros are
useful for the following purposes:

· To simplify and reduce the amount of repetitive coding

· To reduce errors caused by repetitive coding

· To make an assembly program more readable.

A macro consists of a name, a set of formal parameters and a body of code. A use of the macro name with a set of actual parameters is replaced by the code generated from its body. This is called macro expansion.

Macros allow a programmer to define pseudo operations, typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions; the macro thus has the effect of automating the writing of the program.

Macros can be defined and used in many programming languages, such as C and C++. Macros are commonly used in C to define small snippets of code. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance.

For instance,

#define max(a, b) a > b ? a : b

Defines the macro max, taking two arguments a and b. This macro may be called like any C
function, using identical syntax. Therefore, after preprocessing

z = max(x, y);

becomes z = x > y ? x : y;


While this use of macros is very important for C, for instance to define type-safe generic data types or debugging tools, macro expansion can also bloat the generated code and may lead to a number of pitfalls.

C macros are capable of mimicking functions, creating new syntax within some limitations,
as well as expanding into arbitrary text (although the C compiler will require that text to be
valid C source code, or else comments), but they have some limitations as a programming
construct. Macros which mimic functions, for instance, can be called like real functions, but
a macro cannot be passed to another function using a function pointer, since the macro
itself has no address.

In programming languages such as C or assembly language, a macro is a name that defines a set of commands which are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent each time the program is prepared for execution; function instructions are copied into a program only once.

Macro Expansion.

A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by a sequence of assembly statements.

Figure 1.1 Macro expansion on a source program.

Example

In the above program a macro call, INITZ, is shown in the middle of the figure. Every macro begins with the MACRO keyword and ends with ENDM (end macro). Whenever a macro is called, its entire body code is substituted into the program at the point of call. The result of the macro expansion is shown on the rightmost side of the figure.

Macro calling in high level programming languages

(C programming)

#define max(a,b) a>b?a:b


main() {

int x, y, z;

x = 4; y = 6;

z = max(x, y); }

The above program was written using C programming statements. The #define directive defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing, the statement z = max(x, y);

becomes z = x > y ? x : y;

After macro expansion, the whole code would appear like this.

#define max(a,b) a>b?a:b

main()

{ int x, y, z;

x = 4; y = 6; z = x > y ? x : y; }

Example 2:

Consider a typical scenario where one needs to perform a number of divisions of the AX register by 10. The following lists the typical evolution of macro development and usage. The final result is the expansion of the macro, which becomes part of the program. In the following example the macro use simply inserts the three instructions of the macro definition.

B) Conditional Macro Expansion

Conditional macro expansion means that some sections of the program may be optional, either included or not in the final program, depending upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program: one that prints debugging information during test executions for the developer, and another version for production operation that displays only results of interest for the average user. A program fragment that assembles the instructions to print the AX register only if Debug is true is given below. Note that true is any non-zero value.
Here is a conditional statement in C programming; the following statement tests the expression 'BUFSIZE == 1020', where 'BUFSIZE' must be a macro.

#if BUFSIZE == 1020

printf ("Large buffers!\n");

#endif /* BUFSIZE is large */

C) Macro Parameters

Macros may have any number of parameters, as long as they fit on one line. Parameter
names are local symbols, which are known within the macro only. Outside the macro they
have no meaning!

Syntax:

<macro name> MACRO <parameter 1>, ..., <parameter n>

<body line 1>

<body line 2>

<body line m>

ENDM

Valid macro arguments are

1. arbitrary sequences of printable characters, not containing blanks, tabs, commas, or semicolons

2. quoted strings (in single or double quotes)

3. single printable characters, preceded by ‘!’ as an escape character

4. character sequences, enclosed in literal brackets < ... >, which may be arbitrary sequences of valid macro arguments, including blanks, commas and semicolons

5. arbitrary sequences of valid macro arguments

6. expressions preceded by a ‘%’ character

During macro expansion, these actual arguments replace the symbols of the corresponding
formal parameters, wherever they are recognized in the macro body. The first argument
replaces the symbol of the first parameter, the second argument replaces the symbol of the
second parameter, and so forth. This is called substitution.

Example 3

MY_SECOND MACRO CONSTANT, REGISTER

MOV A,#CONSTANT

ADD A,REGISTER

ENDM

MY_SECOND 42, R5

After calling the macro MY_SECOND, the body lines

MOV A,#42

ADD A,R5

are inserted into the program and assembled. The parameter names CONSTANT and REGISTER have been replaced by the macro arguments "42" and "R5". The number of arguments passed to a macro can be less (but not greater) than the number of its formal parameters. If an argument is omitted, the corresponding formal parameter is replaced by an empty string. If arguments other than the last ones are to be omitted, their positions are marked by commas.

Macro parameters support code reuse, allowing one macro definition to implement multiple
algorithms. In the following, the .DIV macro has a single parameter N. When the macro is
used in the program, the actual parameter used is substituted for the formal parameter
defined in the macro prototype during the macro expansion. Now the same macro, when
expanded, can produce code to divide by any unsigned integer.

Fig. 3.0

Example 4

The macro OPTIONAL has eight formal parameters:


OPTIONAL MACRO P1,P2,P3,P4,P5,P6,P7,P8

<macro body>

ENDM

If it is called as follows,

OPTIONAL 1,2,,,5,6

the formal parameters P1, P2, P5 and P6 are replaced by the arguments 1, 2, 5 and 6 during
substitution. The parameters P3, P4, P7 and P8 are replaced by a zero length string.

5.) Describe the process of Bootstrapping in the context of linkers.

Bootstrapping

In computing, bootstrapping refers to a process where a simple system activates another more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading

The discussions of loading up to this point have all presumed that there’s already an
operating system or at least a program loader resident in the computer to load the program
of interest. The chain of programs being loaded by other programs has to start somewhere,
so the obvious question is how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset is invariably stored in a ROM known as the bootstrap ROM, as in "pulling one’s self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system’s address space. The bootstrap ROM occupies the top 64K of the address space, and its code then starts up the computer. On IBM-compatible x86 systems, the boot
ROM code reads the first block of the floppy disk into memory, or if that fails the first
block of the first hard disk, into memory location zero and jumps to location zero. The
program in block zero in turn loads a slightly larger operating system boot program from a
known place on the disk into memory, and jumps to that program which in turn loads in the
operating system and starts it. (There can be even more steps, e.g., a boot manager that
decides from which disk partition to read the operating system boot program, but the
sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can’t fit an operating system loader into 512 bytes. The first-level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it’s tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running.
The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long,
into that process. The tiny program executes a system call that runs /etc/init, the user mode
initialization program that in turn runs configuration files and starts the daemons and login
programs that a running system needs.

None of this matters much to the application level programmer, but it becomes more
interesting if you want to write programs that run on the bare hardware of the machine,
since then you need to arrange to intercept the bootstrap sequence somewhere and run your
program rather than the usual operating system. Some systems make this quite easy (just
stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for
example), others make it nearly impossible. It also presents opportunities for customized
systems. For example, a single-application system could be built over a Unix kernel by
naming the application /etc/init.

6.) Describe the procedure for design of a linker.

Design of a linker

Relocation and linking requirements in segmented addressing

The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of a segmented addressing structure reduces the relocation requirements of a program.

Implementation Examples: A Linker for MS-DOS

Example 7.7: Consider a program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing. Hence all memory addressing is performed by using suitable displacements from their contents. The translation time address of A is 0196. In statement 16, a reference to A is assembled as a displacement of 196 from the contents of the CS register. This avoids the use of an absolute address, hence the instruction is not address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS). The effective operand address would be calculated as <CS>+0196, which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution time address of DATA_HERE, the reference to B would be automatically relocated to the correct address.

Though the use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation. Consider statement 14:

MOV AX, DATA_HERE

which loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link time address of DATA_HERE, hence it assembles the MOV instruction in the immediate operand format and puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker would put the appropriate address in the operand field. Inter-segment calls and jumps are handled in a similar way.

Relocation is somewhat more involved in the case of intra-segment jumps assembled in the
FAR format. For example, consider the following program :

FAR_LAB EQU THIS FAR ; FAR_LAB is a FAR label

JMP FAR_LAB ; A FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes which are to hold the segment base address. A statement like

ADDR_A DW OFFSET A

(which is an ‘address constant’) does not need any relocation since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address.

For linking, however, both the segment base address and the offset of the external symbol must be computed by the linker. Hence there is no reduction in the linking requirements.
Relocation Algorithm

Algorithm

1. program_linked_origin := <link origin> from linker command;

2. For each object module

A. t_origin := translated origin of the object module;

OM_size := size of the object module;

B. relocation_factor := program_linked_origin - t_origin;

C. Read the machine language program into work_area.

D. Read the RELOCTAB of the object module.

E. For each entry in RELOCTAB

i) translated_address := address in the RELOCTAB entry;

ii) address_in_work_area := address of work_area + translated_address - t_origin;

iii) Add relocation_factor to the operand address in the word with the address address_in_work_area.

F. program_linked_origin := program_linked_origin + OM_size;

The computations performed in the algorithm are along the lines described earlier; the only new action is the computation of the work area address of the word requiring relocation (step 2(E)(ii)). Step 2(F) increments program_linked_origin so that the next object module would be granted the next available load address.
