a. Data Formats:
A data format is a defined way of encoding information, adhering to some data model, for storage or transfer. In information technology, "data format" can mean:
1) Data type, constraint placed upon the interpretation of data in a type system
2) Recording format, a format for encoding data for storage on a storage medium
3) File format, a format for encoding data for storage in a computer file
4) Content format, a format for converting data to information
5) Audio format, a format for processing audio data
6) Video format, a format for processing video data
A data type is a type of data. Of course, that is a rather circular definition, and also not very helpful. Therefore, a better definition of a
data type is a data storage format that can contain a specific type or range of values. When computer programs store data in
variables, each variable must be assigned a specific data type. Some common data types include integers, floating point numbers,
characters, strings, and arrays. They may also be more specific types, such as dates, timestamps, boolean values, and varchar
(variable character) formats. Some programming languages require the programmer to define the data type of a variable before
assigning it a value. Other languages can automatically assign a variable's data type when the initial data is entered into the
variable. For example, if the variable "var1" is created with the value "1.25," the variable would be created as a floating point data
type. If the variable is set to "Hello world!," the variable would be assigned a string data type. Most programming languages allow
each variable to store a single data type. Therefore, if the variable's data type has already been set to an integer, assigning string
data to the variable may cause the data to be converted to an integer format.
Data types are also used by database applications. The fields within a database often require a specific type of data to be input. For
example, a company's record for an employee may use a string data type for the employee's first and last name. The employee's
date of hire would be stored in a date format, while his or her salary may be stored as an integer. By keeping the data types uniform
across multiple records, database applications can easily search, sort, and compare fields in different records.
c. Addressing Modes:
Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The
various addressing modes that are defined in a given instruction set architecture define how machine language instructions in that
architecture identify the operand (or operands) of each instruction. An addressing mode specifies how to calculate the effective
memory address of an operand by using information held in registers and/or constants contained within a machine instruction or
elsewhere. In computer programming, addressing modes are primarily of interest to compiler writers and to those who write code directly in assembly language. An addressing mode is one of a set of methods for specifying the operand(s) of a machine code instruction. Different processors vary greatly in the number of addressing modes they provide. The more complex modes described below can usually be replaced with a short sequence of instructions using only simpler modes.
The most common modes are "register" - the operand is stored in a specified register; "absolute" - the operand is stored at
a specified memory address; and "immediate" - the operand is contained within the instruction. Most processors also have indirect
addressing modes, e.g. "register indirect", "memory indirect" where the specified register or memory location does not contain the
operand but contains its address, known as the "effective address". For an absolute addressing mode, the effective address is
contained within the instruction.
Indirect addressing modes often have options for pre- or post-increment or decrement, meaning that the register or
memory location containing the effective address is incremented or decremented by some amount (either fixed or also specified in
the instruction), either before or after the instruction is executed. These are very useful for stacks and for accessing blocks of data.
Other variations form the effective address by adding together one or more registers and one or more constants which may
themselves be direct or indirect. Such complex addressing modes are designed to support access to multidimensional arrays and
arrays of data structures.
The addressing mode may be "implicit" - the location of the operand is implied by the instruction itself. This would be the case for an instruction that modifies a particular control register in the CPU, or in a stack-based processor where operands are always on the top of the stack.
Purpose: read records from the input device (device code F1) and copy them to the output device (device code 05); at the end of the file, write an EOF marker on the output device, then RSUB to the operating system.
Data transfer (RD, WD): a buffer is used to store each record; buffering is necessary because the two devices have different I/O rates. The end of each record is marked with a null character (hex 00), and the end of the file is indicated by a zero-length record.
Subroutines (JSUB, RSUB): RDREC and WRREC; save the link register first before any nested jump.
Assembler's functions:
- Convert mnemonic operation codes to their machine language equivalents.
- Convert symbolic operands to their equivalent machine addresses.
- Build the machine instructions in the proper format.
- Convert the data constants to internal machine representations.
- Write the object program and the assembly listing.
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"  ; DOS strings end with '$'
.code
main proc
mov ax,seg message  ; get the segment the data lives in
mov ds,ax           ; DS must point at the data segment
mov ah,09h          ; DOS function 09h: print '$'-terminated string
lea dx,message      ; DS:DX -> the string
int 21h             ; call DOS
mov ax,4c00h        ; DOS function 4Ch: exit, return code 0
int 21h
main endp
end main
.model small: Lines that start with a "." provide the assembler with information. The word(s) after it say what kind of info. In this case it just tells the assembler the program is small and doesn't need a lot of memory. I'll get back on this later.
.stack: Another info line. This one tells the assembler that the "stack" segment starts here. The stack is used to store temporary data. It isn't used in this program, but it must be there, because we make an .EXE file and these files MUST have a stack.
.data: indicates that the data segment starts here and that the stack segment ends there.
.code: indicates that the code segment starts there and the data segment ends there.
There are very few addressing modes on the SPARC, and they may be used only in certain very restricted combinations. The
three main types of SPARC instructions are given below, along with the valid combinations of addressing modes. There are only a
few unusual instructions which do not fall into these categories.
1. Arithmetic/Logical/Shift instructions
opcode reg1,reg2,reg3 !reg1 op reg2 -> reg3
2. Load/Store Instructions
opcode [reg1+reg2],reg3 !effective address = reg1 + reg2
The SPARC code for this subroutine can be written several ways; two possible approaches are given below. (The 'X's in
the center line indicate the differences between the two approaches.)
The final algorithm: Given a DFA M = (Q, Σ, δ, q0, F), we can create a regular expression that describes L(M) as follows:
iv) Create a GNFA from M by:
(1) Adding a new start state, qstart, and a new final state, qfinal, to the state set. (Sipser names these states "s" and "a".)
(2) Adding an ε transition from qstart to q0, the old start state.
(3) Adding ε transitions from all of the old final states of M to qfinal. Make qfinal the only final state of the GNFA.
v) Eliminate the old states of M from the GNFA one at a time, adjusting the labels on the transitions as we've described after each elimination.
When nothing remains in the GNFA except qstart and qfinal, the label of the single transition from qstart to qfinal is the regular expression we are interested in.
a. YACC Compiler-Compiler:
If you have been programming for any length of time in a Unix environment, you will have encountered the mystical
programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by
Vern Paxson and Bison the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are
upwardly compatible, so you can use Flex and Bison when trying our examples.
These programs are massively useful, but as with your C compiler, their manpage does not explain the language they understand, nor how to use them. YACC is really amazing when used in combination with Lex; however, the Bison manpage does not describe how to integrate Lex-generated code with your Bison program. YACC can parse input streams consisting of tokens with certain values. This clearly describes the relation YACC has with Lex: YACC has no idea what 'input streams' are; it needs preprocessed tokens. While you can write your own tokenizer, we will leave that entirely up to Lex.
A note on grammars and parsers. When YACC saw the light of day, the tool was used to parse input files for compilers:
programs. Programs written in a programming language for computers are typically *not* ambiguous - they have just one meaning.
As such, YACC does not cope with ambiguity and will complain about shift/reduce or reduce/reduce conflicts.
Example (the C declarations section of a YACC grammar file):
%{
#include <stdio.h>
#include <string.h>

/* yyparse() is generated by YACC; yywrap() tells Lex there is
   no further input once the current file is exhausted. */
int yyparse(void);

int yywrap(void)
{
        return 1;
}

int main(void)
{
        yyparse();
        return 0;
}
%}
Compiled programs generally run faster than interpreted programs. The advantage of an interpreter, however, is that it
does not need to go through the compilation stage during which machine instructions are generated. This process can be time-
consuming if the program is long. The interpreter, on the other hand, can immediately execute high-level programs. For this reason,
interpreters are sometimes used during the development of a program, when a programmer wants to add small sections at a time
and test them quickly. In addition, interpreters are often used in education because they allow students to program interactively.
Both interpreters and compilers are available for most high-level languages. However, BASIC and LISP are especially
designed to be executed by an interpreter. In addition, page description languages, such as PostScript, use an interpreter. Every
PostScript printer, for example, has a built-in interpreter that executes PostScript instructions.
The name "compiler" is primarily used for programs that translate source code from a high-level programming language to
a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher
level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to
source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a
change of language. A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing,
semantic analysis, code generation, and code optimization.
If you are thinking of creating your own programming language, writing a compiler or interpreter, or a scripting facility for your application, or even creating a documentation parsing facility, the tools on this page are designed to (hopefully) ease your task. These compiler construction kits, parser generators, lexical analyzer (lexer) generators, and code optimizer generators provide a facility where you define your language and let the compiler creation tools generate the source code for your software.