You are on page 1of 44

Programming language

Source code of a simple computer program written in the C programming language, which will output the
"Hello, world!" message when compiled and run

A programming language is a formal computer language designed to communicate instructions to


amachine, particularly a computer. Programming languages can be used to create programs to
control the behavior of a machine or to express algorithms.
The earliest known programmable machine preceded the invention of the digital computer and is the
automatic flute player described in the 9th century by the brothers Musa in Baghdad, "during
the Islamic Golden Age".[1] From the early 1800s, "programs" were used to direct the behavior of
machines such asJacquard looms and player pianos.[2] Thousands of different programming
languages have been created, mainly in the computer field, and many more still are being created
every year. Many programming languages require computation to be specified in animperative form
(i.e., as a sequence of operations to perform), while other languages use other forms of program
specification such as the declarative form (i.e. the desired result is specified, not how to achieve it).
The description of a programming language is usually split into the two components of syntax (form)
and semantics(meaning). Some languages are defined by a specification document (for example,
the C programming language is specified by an ISO Standard), while other languages (such as Perl)
have a dominant implementation that is treated as a reference. Some languages have both, with the
basic language defined by a standard and extensions taken from the dominant implementation being
common.

Definitions[edit]
A programming language is a notation for writing programs, which are specifications of a
computation or algorithm.[3] Some, but not all, authors restrict the term "programming language" to
those languages that can express all possible algorithms.[3][4]Traits often considered important for
what constitutes a programming language include:
Function and target
A computer programming language is a language used to write computer programs, which involve
a computerperforming some kind of computation[5] or algorithm and possibly control external devices
such as printers, disk drives,robots,[6] and so on. For example, PostScript programs are frequently
created by another program to control a computer printer or display. More generally, a programming
language may describe computation on some, possibly abstract, machine. It is generally accepted
that a complete specification for a programming language includes a description, possibly idealized,
of a machine or processor for that language.[7] In most practical contexts, a programming language
involves a computer; consequently, programming languages are usually defined and studied this
way.[8] Programming languages differ from natural languages in that natural languages are only used
for interaction between people, while programming languages also allow humans to communicate
instructions to machines.
Abstractions
Programming languages usually contain abstractions for defining and manipulating data
structures or controlling the flow of execution. The practical necessity that a programming language
support adequate abstractions is expressed by theabstraction principle;[9] this principle is sometimes
formulated as a recommendation to the programmer to make proper use of such abstractions. [10]
Expressive power
The theory of computation classifies languages by the computations they are capable of expressing.
All Turing completelanguages can implement the same set of algorithms. ANSI/ISO SQL92 and Charity are examples of languages that are not Turing complete, yet often called
programming languages.[11][12]
Markup languages like XML, HTML or troff, which define structured data, are not usually
considered programming languages.[13][14][15] Programming languages may, however,
share the syntax with markup languages if a computational semantics is defined. XSLT,
for example, is a Turing complete XML dialect.[16][17][18] Moreover, LaTeX, which is mostly
used for structuring documents, also contains a Turing complete subset. [19][20]
The term computer language is sometimes used interchangeably with programming
language.[21] However, the usage of both terms varies among authors, including the
exact scope of each. One usage describes programming languages as a subset of

computer languages.[22] In this vein, languages used in computing that have a different
goal than expressing computer programs are generically designated computer
languages. For instance, markup languages are sometimes referred to as computer
languages to emphasize that they are not meant to be used for programming. [23]
Another usage regards programming languages as theoretical constructs for
programming abstract machines, and computer languages as the subset thereof that
runs on physical computers, which have finite hardware resources.[24] John C.
Reynolds emphasizes that formal specification languages are just as much
programming languages as are the languages intended for execution. He also argues
that textual and even graphical input formats that affect the behavior of a computer are
programming languages, despite the fact they are commonly not Turing-complete, and
remarks that ignorance of programming language concepts is the reason for many flaws
in input formats.[25]

History[edit]
Main article: History of programming languages

Early developments[edit]
The earliest computers were often programmed without the help of a programming
language, by writing programs in absolute machine language. The programs, in decimal
or binary form, were read in from punched cards or magnetic tape, or toggled in on
switches on the front panel of the computer. Absolute machine languages were later
termed first-generation programming languages (1GL).
The next step was development of so-called second-generation programming
languages (2GL) or assembly languages, which were still closely tied to the instruction
set architecture of the specific computer. These served to make the program much more
human-readable, and relieved the programmer of tedious and error-prone address
calculations.
The first high-level programming languages, or third-generation programming
languages (3GL), were written in the 1950s. An early high-level programming language
to be designed for a computer was Plankalkl, developed for the German Z3 byKonrad
Zuse between 1943 and 1945. However, it was not implemented until 1998 and 2000. [26]
John Mauchly's Short Code, proposed in 1949, was one of the first high-level languages
ever developed for an electronic computer.[27] Unlike machine code, Short Code
statements represented mathematical expressions in understandable form. However,
the program had to be translated into machine code every time it ran, making the
process much slower than running the equivalent machine code.

The Manchester Mark 1 ran programs written inAutocode from 1952.

At the University of Manchester, Alick Glennie developed Autocode in the early 1950s.
A programming language, it used a compiler to automatically convert the language into
machine code. The first code and compiler was developed in 1952 for the Mark 1 computer
at the University of Manchester and is considered to be the first compiledhigh-level
programming language.[28][29]
The second autocode was developed for the Mark 1 by R. A. Brooker in 1954 and was called
the "Mark 1 Autocode". Brooker also developed an autocode for the Ferranti Mercury in the
1950s in conjunction with the University of Manchester. The version for the EDSAC 2 was

devised byD. F. Hartley of University of Cambridge Mathematical Laboratory in 1961. Known


as EDSAC 2 Autocode, it was a straight development from Mercury Autocode adapted for
local circumstances, and was noted for its object code optimisation and source-language
diagnostics which were advanced for the time. A contemporary but separate thread of
development, Atlas Autocode was developed for the University of Manchester Atlas
1 machine.
In 1954, FORTRAN was invented at IBM by John Backus. It was the first widely used high
level general purpose programming language to have a functional implementation, as
opposed to just a design on paper.[30][31] It is still popular language for high-performance
computing[32] and is used for programs that benchmark and rank the world's fastest
supercomputers.[33]
Another early programming language was devised by Grace Hopper in the US,
called FLOW-MATIC. It was developed for the UNIVAC I at Remington Rand during the
period from 1955 until 1959. Hopper found that business data processing customers were
uncomfortable with mathematical notation, and in early 1955, she and her team wrote a
specification for anEnglish programming language and implemented a prototype. [34] The
FLOW-MATIC compiler became publicly available in early 1958 and was substantially
complete in 1959.[35] Flow-Matic was a major influence in the design of COBOL, since only it
and its direct descendant AIMACO were in actual use at the time.[36]

Refinement[edit]
The increased use of high-level languages introduced a requirement for low-level
programming languages or system programming languages. These languages, to varying
degrees, provide facilities between assembly languages and high-level languages, and can
be used to perform tasks which require direct access to hardware facilities but still provide
higher-level control structures and error-checking.
The period from the 1960s to the late 1970s brought the development of the major language
paradigms now in use:

APL introduced array programming and influenced functional programming.[37]

ALGOL refined both structured procedural programming and the discipline


of language specification; the "Revised Report on the Algorithmic Language ALGOL
60" became a model for how later language specifications were written.

Lisp, implemented in 1958, was the first dynamically typed functional


programming language

In the 1960s, Simula was the first language designed to support object-oriented
programming; in the mid-1970s,Smalltalk followed with the first "purely" objectoriented language.

C was developed between 1969 and 1973 as a system programming language for
the Unix operating system, and remains popular.[38]

Prolog, designed in 1972, was the first logic programming language.

In 1978, ML built a polymorphic type system on top of Lisp, pioneering statically


typed functional programminglanguages.

Each of these languages spawned descendants, and most modern programming


languages count at least one of them in their ancestry.
The 1960s and 1970s also saw considerable debate over the merits of structured
programming, and whether programming languages should be designed to support it.
[39]
Edsger Dijkstra, in a famous 1968 letter published in the Communications of the ACM,
argued that GOTO statements should be eliminated from all "higher level" programming
languages.[40]

Consolidation and growth[edit]

A selection of textbooks that teach programming, in languages both popular and obscure. These
are only a few of the thousands of programming languages and dialects that have been designed
in history.

The 1980s were years of relative consolidation. C++ combined object-oriented and
systems programming. The United States government standardized Ada, a systems
programming language derived from Pascal and intended for use by defense
contractors. In Japan and elsewhere, vast sums were spent investigating so-called "fifth
generation" languages that incorporated logic programming constructs.[41] The functional
languages community moved to standardize ML and Lisp. Rather than inventing new
paradigms, all of these movements elaborated upon the ideas invented in the previous
decades.
One important trend in language design for programming large-scale systems during the
1980s was an increased focus on the use of modules, or large-scale organizational
units of code. Modula-2, Ada, and ML all developed notable module systems in the
1980s, which were often wedded to generic programmingconstructs.[42]
The rapid growth of the Internet in the mid-1990s created opportunities for new
languages. Perl, originally a Unix scripting tool first released in 1987, became common
in dynamic websites. Java came to be used for server-side programming, and bytecode
virtual machines became popular again in commercial settings with their promise of
"Write once, run anywhere" (UCSD Pascal had been popular for a time in the early
1980s). These developments were not fundamentally novel, rather they were
refinements of many existing languages and paradigms (although their syntax was often
based on the C family of programming languages).
Programming language evolution continues, in both industry and research. Current
directions include security and reliability verification, new kinds of modularity
(mixins, delegates, aspects), and database integration such as Microsoft's LINQ.
Fourth-generation programming languages (4GL) are a computer programming
languages which aim to provide a higher level of abstraction of the internal computer
hardware details than 3GLs. Fifth generation programming languages (5GL) are
programming languages based on solving problems using constraints given to the
program, rather than using an algorithmwritten by a programmer.

Elements[edit]
All programming languages have some primitive building blocks for the description of
data and the processes or transformations applied to them (like the addition of two
numbers or the selection of an item from a collection). These primitives are defined by
syntactic and semantic rules which describe their structure and meaning respectively.

Syntax[edit]
Main article: Syntax (programming languages)

Parse tree of Python code with inset tokenization

Syntax highlighting is often used to aid programmers in recognizing elements of source code.
The language above is Python.

A programming language's surface form is known as itssyntax. Most programming


languages are purely textual; they use sequences of text including words, numbers, and
punctuation, much like written natural languages. On the other hand, there are some
programming languages which are more graphical in nature, using visual relationships
between symbols to specify a program.
The syntax of a language describes the possible combinations of symbols that form a
syntactically correct program. The meaning given to a combination of symbols is
handled by semantics (either formal or hard-coded in areference implementation). Since
most languages are textual, this article discusses textual syntax.
Programming language syntax is usually defined using a combination of regular
expressions (for lexical structure) andBackusNaur form (for grammatical structure).
Below is a simple grammar, based on Lisp:
expression ::= atom | list
atom

::= number | symbol

number

::= [+-]?['0'-'9']+

symbol

::= ['A'-'Z''a'-'z'].*

list

::= '(' expression* ')'

This grammar specifies the following:

an expression is either an atom or a list;

an atom is either a number or a symbol;

a number is an unbroken sequence of one or more decimal digits, optionally


preceded by a plus or minus sign;

a symbol is a letter followed by zero or more of any characters (excluding


whitespace); and

a list is a matched pair of parentheses, with zero or more expressions inside it.

The following are examples of well-formed token sequences in this


grammar: 12345, () and (a b c232 (1)).
Not all syntactically correct programs are semantically correct. Many syntactically
correct programs are nonetheless ill-formed, per the language's rules; and may
(depending on the language specification and the soundness of the implementation)
result in an error on translation or execution. In some cases, such programs may
exhibit undefined behavior. Even when a program is well-defined within a language, it
may still have a meaning that is not intended by the person who wrote it.
Using natural language as an example, it may not be possible to assign a meaning to a
grammatically correct sentence or the sentence may be false:

"Colorless green ideas sleep furiously." is grammatically well-formed but has no


generally accepted meaning.

"John is a married bachelor." is grammatically well-formed but expresses a meaning


that cannot be true.

The following C language fragment is syntactically correct, but performs operations that
are not semantically defined (the operation *p >> 4 has no meaning for a value having
a complex type and p->im is not defined because the value of pis the null pointer):
complex *p = NULL;
complex abs_p = sqrt(*p >> 4 + p->im);

If the type declaration on the first line were omitted, the program would trigger an error
on compilation, as the variable "p" would not be defined. But the program would still be
syntactically correct, since type declarations provide only semantic information.
The grammar needed to specify a programming language can be classified by its
position in the Chomsky hierarchy. The syntax of most programming languages can be
specified using a Type-2 grammar, i.e., they are context-free grammars.[43]Some
languages, including Perl and Lisp, contain constructs that allow execution during the
parsing phase. Languages that have constructs that allow the programmer to alter the
behavior of the parser make syntax analysis an undecidable problem, and generally blur
the distinction between parsing and execution.[44] In contrast to Lisp's macro system and
Perl's BEGINblocks, which may contain general computations, C macros are merely
string replacements, and do not require code execution.[45]

Semantics[edit]
The term semantics refers to the meaning of languages, as opposed to their form
(syntax).
Static semantics[edit]
The static semantics defines restrictions on the structure of valid texts that are hard or
impossible to express in standard syntactic formalisms.[3] For compiled languages, static
semantics essentially include those semantic rules that can be checked at compile time.
Examples include checking that every identifier is declared before it is used (in
languages that require such declarations) or that the labels on the arms of a case
statement are distinct.[46] Many important restrictions of this type, like checking that

identifiers are used in the appropriate context (e.g. not adding an integer to a function
name), or that subroutine calls have the appropriate number and type of arguments, can
be enforced by defining them as rules in alogic called a type system. Other forms
of static analyses like data flow analysis may also be part of static semantics. Newer
programming languages like Java and C# have definite assignment analysis, a form of
data flow analysis, as part of their static semantics.
Dynamic semantics[edit]
Main article: Semantics of programming languages
Once data has been specified, the machine must be instructed to perform operations on
the data. For example, the semantics may define the strategy by which expressions are
evaluated to values, or the manner in which control structuresconditionally
execute statements. The dynamic semantics (also known as execution semantics) of a
language defines how and when the various constructs of a language should produce a
program behavior. There are many ways of defining execution semantics. Natural
language is often used to specify the execution semantics of languages commonly used
in practice. A significant amount of academic research went into formal semantics of
programming languages, which allow execution semantics to be specified in a formal
manner. Results from this field of research have seen limited application to
programming language design and implementation outside academia.

Type system[edit]
Main articles: Data type, Type system, and Type safety
A type system defines how a programming language classifies values and expressions
into types, how it can manipulate those types and how they interact. The goal of a type
system is to verify and usually enforce a certain level of correctness in programs written
in that language by detecting certain incorrect operations. Any decidable type system
involves a trade-off: while it rejects many incorrect programs, it can also prohibit some
correct, albeit unusual programs. In order to bypass this downside, a number of
languages have type loopholes, usually unchecked casts that may be used by the
programmer to explicitly allow a normally disallowed operation between different types.
In most typed languages, the type system is used only to type check programs, but a
number of languages, usually functional ones, infer types, relieving the programmer
from the need to write type annotations. The formal design and study of type systems is
known as type theory.
Typed versus untyped languages[edit]
A language is typed if the specification of every operation defines types of data to which
the operation is applicable, with the implication that it is not applicable to other types.
[47]
For example, the data represented by "this text between the quotes" is
a string, and in many programming languages dividing a number by a string has no
meaning and will be rejected by the compilers. The invalid operation may be detected
when the program is compiled ("static" type checking) and will be rejected by the
compiler with a compilation error message, or it may be detected when the program is
run ("dynamic" type checking), resulting in a run-time exception. Many languages allow
a function called an exception handler to be written to handle this exception and, for
example, always return "-1" as the result.
A special case of typed languages are the single-type languages. These are often
scripting or markup languages, such asREXX or SGML, and have only one data type
most commonly character strings which are used for both symbolic and numeric data.
In contrast, an untyped language, such as most assembly languages, allows any
operation to be performed on any data, which are generally considered to be sequences
of bits of various lengths.[47] High-level languages which are untyped include BCPL, Tcl,
and some varieties of Forth.
In practice, while few languages are considered typed from the point of view of type
theory (verifying or rejecting alloperations), most modern languages offer a degree of
typing.[47] Many production languages provide means to bypass or subvert the type
system, trading type-safety for finer control over the program's execution (see casting).
Static versus dynamic typing[edit]

In static typing, all expressions have their types determined prior to when the program is
executed, typically at compile-time. For example, 1 and (2+2) are integer expressions;
they cannot be passed to a function that expects a string, or stored in a variable that is
defined to hold dates.[47]
Statically typed languages can be either manifestly typed or type-inferred. In the first
case, the programmer must explicitly write types at certain textual positions (for
example, at variable declarations). In the second case, the compiler infers the types of
expressions and declarations based on context. Most mainstream statically typed
languages, such as C++, C# andJava, are manifestly typed. Complete type inference
has traditionally been associated with less mainstream languages, such
as Haskell and ML. However, many manifestly typed languages support partial type
inference; for example, Java and C#both infer types in certain limited cases.
[48]
Additionally, some programming languages allow for some types to be automatically
converted to other types; for example, an int can be used where the program expects a
float.
Dynamic typing, also called latent typing, determines the type-safety of operations at run
time; in other words, types are associated with run-time values rather than textual
expressions.[47] As with type-inferred languages, dynamically typed languages do not
require the programmer to write explicit type annotations on expressions. Among other
things, this may permit a single variable to refer to values of different types at different
points in the program execution. However, type errorscannot be automatically detected
until a piece of code is actually executed, potentially making debugging more
difficult. Lisp,Smalltalk, Perl, Python, JavaScript, and Ruby are dynamically typed.
Weak and strong typing[edit]
Weak typing allows a value of one type to be treated as another, for example treating
a string as a number.[47] This can occasionally be useful, but it can also allow some kinds
of program faults to go undetected at compile time and even at run time.
Strong typing prevents the above. An attempt to perform an operation on the wrong type
of value raises an error.[47]Strongly typed languages are often termed type-safe or safe.
An alternative definition for "weakly typed" refers to languages, such
as Perl and JavaScript, which permit a large number of implicit type conversions. In
JavaScript, for example, the expression 2 * x implicitly converts x to a number, and
this conversion succeeds even if x is null, undefined, an Array, or a string of letters.
Such implicit conversions are often useful, but they can mask programming
errors. Strong and static are now generally considered orthogonal concepts, but usage
in the literature differs. Some use the term strongly typed to mean strongly, statically
typed, or, even more confusingly, to mean simply statically typed. Thus C has been
called both strongly typed and weakly, statically typed. [49][50]
It may seem odd to some professional programmers that C could be "weakly, statically
typed". However, notice that the use of the generic pointer, the void* pointer, does allow
for casting of pointers to other pointers without needing to do an explicit cast. This is
extremely similar to somehow casting an array of bytes to any kind of datatype in C
without using an explicit cast, such as (int) or (char).

Standard library and run-time system[edit]


Main article: Standard library
Most programming languages have an associated core library (sometimes known as the
'standard library', especially if it is included as part of the published language standard),
which is conventionally made available by all implementations of the language. Core
libraries typically include definitions for commonly used algorithms, data structures, and
mechanisms for input and output.
The line between a language and its core library differs from language to language. In
some cases, the language designers may treat the library as a separate entity from the
language. However, a language's core library is often treated as part of the language by
its users, and some language specifications even require that this library be made
available in all implementations. Indeed, some languages are designed so that the
meanings of certain syntactic constructs cannot even be described without referring to
the core library. For example, in Java, a string literal is defined as an instance of

thejava.lang.String class; similarly, in Smalltalk, an anonymous function expression


(a "block") constructs an instance of the library's BlockContext class.
Conversely, Scheme contains multiple coherent subsets that suffice to construct the rest
of the language as library macros, and so the language designers do not even bother to
say which portions of the language must be implemented as language constructs, and
which must be implemented as parts of a library.

Design and implementation[edit]


Programming languages share properties with natural languages related to their
purpose as vehicles for communication, having a syntactic form separate from its
semantics, and showing language families of related languages branching one from
another.[51][52] But as artificial constructs, they also differ in fundamental ways from
languages that have evolved through usage. A significant difference is that a
programming language can be fully described and studied in its entirety, since it has a
precise and finite definition.[53] By contrast, natural languages have changing meanings
given by their users in different communities. While constructed languages are also
artificial languages designed from the ground up with a specific purpose, they lack the
precise and complete semantic definition that a programming language has.
Many programming languages have been designed from scratch, altered to meet new
needs, and combined with other languages. Many have eventually fallen into disuse.
Although there have been attempts to design one "universal" programming language
that serves all purposes, all of them have failed to be generally accepted as filling this
role.[54] The need for diverse programming languages arises from the diversity of
contexts in which languages are used:

Programs range from tiny scripts written by individual hobbyists to huge systems
written by hundreds of programmers.

Programmers range in expertise from novices who need simplicity above all else, to
experts who may be comfortable with considerable complexity.

Programs must balance speed, size, and simplicity on systems ranging


from microcontrollers to supercomputers.

Programs may be written once and not change for generations, or they may
undergo continual modification.

Programmers may simply differ in their tastes: they may be accustomed to


discussing problems and expressing them in a particular language.

One common trend in the development of programming languages has been to add
more ability to solve problems using a higher level of abstraction. The earliest
programming languages were tied very closely to the underlying hardware of the
computer. As new programming languages have developed, features have been added
that let programmers express ideas that are more remote from simple translation into
underlying hardware instructions. Because programmers are less tied to the complexity
of the computer, their programs can do more computing with less effort from the
programmer. This lets them write more functionality per time unit.[55]
Natural language programming has been proposed as a way to eliminate the need for a
specialized language for programming. However, this goal remains distant and its
benefits are open to debate. Edsger W. Dijkstra took the position that the use of a formal
language is essential to prevent the introduction of meaningless constructs, and
dismissed natural language programming as "foolish".[56] Alan Perlis was similarly
dismissive of the idea.[57] Hybrid approaches have been taken in Structured
English and SQL.
A language's designers and users must construct a number of artifacts that govern and
enable the practice of programming. The most important of these artifacts are the
language specification and implementation.

Specification[edit]
Main article: Programming language specification
The specification of a programming language is an artifact that the language users and
the implementors can use to agree upon whether a piece of source code is a
valid program in that language, and if so what its behavior shall be.
A programming language specification can take several forms, including the following:

An explicit definition of the syntax, static semantics, and execution semantics of the
language. While syntax is commonly specified using a formal grammar, semantic
definitions may be written in natural language (e.g., as in the C language), or
a formal semantics (e.g., as in Standard ML[58] and Scheme[59] specifications).

A description of the behavior of a translator for the language (e.g., the C+


+ and Fortran specifications). The syntax and semantics of the language have to be
inferred from this description, which may be written in natural or a formal language.

A reference or model implementation, sometimes written in the language being


specified (e.g., Prolog or ANSI REXX[60]). The syntax and semantics of the language
are explicit in the behavior of the reference implementation.

Implementation[edit]
Main article: Programming language implementation
An implementation of a programming language provides a way to write programs in that
language and execute them on one or more configurations of hardware and software.
There are, broadly, two approaches to programming language
implementation: compilation and interpretation. It is generally possible to implement a
language using either technique.
The output of a compiler may be executed by hardware or a program called an
interpreter. In some implementations that make use of the interpreter approach there is
no distinct boundary between compiling and interpreting. For instance, some
implementations of BASIC compile and then execute the source a line at a time.
Programs that are executed directly on the hardware usually run several orders of
magnitude faster than those that are interpreted in software.[citation needed]
One technique for improving the performance of interpreted programs is just-in-time
compilation. Here the virtual machine, just before execution, translates the blocks
of bytecode which are going to be used to machine code, for direct execution on the
hardware.

Proprietary languages[edit]
Although most of the most commonly used programming languages have fully open
specifications and implementations, many programming languages exist only as
proprietary programming languages with the implementation available only from a single
vendor, which may claim that such a proprietary language is their intellectual property.
Proprietary programming languages are commonly domain specific languages or
internal scripting languages for a single product; some proprietary languages are used
only internally within a vendor, while others are available to external users.
Some programming languages exist on the border between proprietary and open; for
example, Oracle Corporation asserts proprietary rights to some aspects of the Java
programming language, and Microsoft's C# programming language, which has open
implementations of most parts of the system, also has Common Language
Runtime (CLR) as a closed environment.
Many proprietary languages are widely used, in spite of their proprietary nature;
examples include MATLAB and VBScript. Some languages may make the transition
from closed to open; for example, Erlang was originally an Ericsson's internal
programming language.

Usage[edit]

Thousands of different programming languages have been created, mainly in the


computing field.[61] Software is commonly built with 5 programming languages or more.[62]
Programming languages differ from most other forms of human expression in that they
require a greater degree of precision and completeness. When using a natural language
to communicate with other people, human authors and speakers can be ambiguous and
make small errors, and still expect their intent to be understood. However, figuratively
speaking, computers "do exactly what they are told to do", and cannot "understand"
what code the programmer intended to write. The combination of the language
definition, a program, and the program's inputs must fully specify the external behavior
that occurs when the program is executed, within the domain of control of that program.
On the other hand, ideas about an algorithm can be communicated to humans without
the precision required for execution by using pseudocode, which interleaves natural
language with code written in a programming language.
A programming language provides a structured mechanism for defining pieces of data,
and the operations or transformations that may be carried out automatically on that
data. A programmer uses the abstractions present in the language to represent the
concepts involved in a computation. These concepts are represented as a collection of
the simplest elements available (called primitives).[63] Programming is the process by
which programmers combine these primitives to compose new programs, or adapt
existing ones to new uses or a changing environment.
Programs for a computer might be executed in a batch process without human
interaction, or a user might type commands in an interactive session of an interpreter. In
this case the "commands" are simply programs, whose execution is chained together.
When a language can run its commands through an interpreter (such as a Unix shell or
other command-line interface), without compiling, it is called a scripting language.[64]

Measuring language usage[edit]


Main article: Measuring programming language popularity
It is difficult to determine which programming languages are most widely used, and what
usage means varies by context. One language may occupy the greater number of
programmer hours, a different one have more lines of code, and a third may consume
the most CPU time. Some languages are very popular for particular kinds of
applications. For example,COBOL is still strong in the corporate data center, often on
large mainframes;[65][66] Fortran in scientific and engineering applications; Ada in
aerospace, transportation, military, real-time and embedded applications; and C in
embedded applications and operating systems. Other languages are regularly used to
write many different kinds of applications.
Various methods of measuring language popularity, each subject to a different bias over
what is measured, have been proposed:

counting the number of job advertisements that mention the language[67]

the number of books sold that teach or describe the language[68]

estimates of the number of existing lines of code written in the language which
may underestimate languages not often found in public searches [69]

counts of language references (i.e., to the name of the language) found using a web
search engine.

Combining and averaging information from various internet sites, langpop.com claims
that in 2013 the ten most popular programming languages are (in descending order by
overall popularity): C, Java, PHP, JavaScript, C++, Python, Shell, Ruby,ObjectiveC and C#.[70]

Taxonomies[edit]
For more details on this topic, see Categorical list of programming languages.
There is no overarching classification scheme for programming languages. A given
programming language does not usually have a single ancestor language. Languages

commonly arise by combining the elements of several predecessor languages with new
ideas in circulation at the time. Ideas that originate in one language will diffuse
throughout a family of related languages, and then leap suddenly across familial gaps to
appear in an entirely different family.
The task is further complicated by the fact that languages can be classified along
multiple axes. For example, Java is both an object-oriented language (because it
encourages object-oriented organization) and a concurrent language (because it
contains built-in constructs for running multiple threads in parallel). Python is an objectoriented scripting language.
In broad strokes, programming languages divide into programming paradigms and a
classification by intended domain of use, with general-purpose programming
languages distinguished from domain-specific programming languages. Traditionally,
programming languages have been regarded as describing computation in terms of
imperative sentences, i.e. issuing commands. These are generally called imperative
programming languages. A great deal of research in programming languages has been
aimed at blurring the distinction between a program as a set of instructions and a
program as an assertion about the desired answer, which is the main feature
of declarative programming.[71] More refined paradigms include procedural
programming, object-oriented programming, functional programming, and logic
programming; some languages are hybrids of paradigms or multi-paradigmatic.
An assembly language is not so much a paradigm as a direct model of an underlying
machine architecture. By purpose, programming languages might be considered
general purpose,system programming languages, scripting languages, domain-specific
languages, or concurrent/distributed languages (or a combination of these). [72] Some
general purpose languages were designed largely with educational goals. [73]
A programming language may also be classified by factors unrelated to programming
paradigm. For instance, most programming languages use English language keywords,
while a minority do not. Other languages may be classified as being deliberately
esoteric or not.

Computer programming
Computer programming (often shortened to programming) is a process that leads from an
original formulation of a computing problem to executablecomputer programs. Programming involves
activities such as analysis, developing understanding, generating algorithms, verification of
requirements of algorithms including their correctness and resources consumption, and
implementation (commonly referred to as coding[1][2]) of algorithms in a targetprogramming
language. Source code is written in one or more programming languages. The purpose of
programming is to find a sequence of instructions that will automate performing a specific task or
solving a given problem. The process of programming thus often requires expertise in many different
subjects, including knowledge of the application domain, specialized algorithms, and formal logic.
Related tasks include testing, debugging, and maintaining the source code, implementation of the
build system, and management of derived artifacts such as machine code of computer programs.
These might be considered part of the programming process, but often the term software
development is used for this larger process with the term programming, implementation,
or codingreserved for the actual writing of source code. Software
engineering combinesengineering techniques with software development practices.

Overview[edit]

Within software engineering, programming (the implementation) is regarded as one phase in


a software development process.
There is an ongoing debate on the extent to which the writing of programs is an art form, a craft, or
an engineeringdiscipline.[3] In general, good programming is considered to be the measured
application of all three, with the goal of producing an efficient and evolvable software solution (the
criteria for "efficient" and "evolvable" vary considerably). The discipline differs from many other
technical professions in that programmers, in general, do not need to be licensed or pass any
standardized (or governmentally regulated) certification tests in order to call themselves
"programmers" or even "software engineers." Because the discipline covers many areas, which may
or may not include critical applications, it is debatable whether licensing is required for the profession
as a whole. In most cases, the discipline is self-governed by the entities which require the
programming, and sometimes very strict environments are defined (e.g. United States Air Forceuse
of AdaCore and security clearance). However, representing oneself as a "professional software
engineer" without a license from an accredited institution is illegal in many parts of the world.
Another ongoing debate is the extent to which the programming language used in writing computer
programs affects the form that the final program takes.[citation needed] This debate is analogous to that
surrounding the SapirWhorf hypothesis[4]in linguistics and cognitive science, which postulates that a
particular spoken language's nature influences the habitual thought of its speakers. Different
language patterns yield different patterns of thought. This idea challenges the possibility of
representing the world perfectly with language because it acknowledges that the mechanisms of any
language condition the thoughts of its speaker community.

History[edit]
Ada Lovelace commenting on the work of Luigi Menabrea created the firstalgorithm designed for processing by
an Analytical Engine and is often recognized as history's first computer programmer.

Ancient cultures seemed to have no conception of computing beyond arithmetic,algebra,


and geometry, occasionally devising computational systems with elements of calculus (e.g.
the method of exhaustion). The only mechanical device that existed for numerical computation at the
beginning of human history was the abacus, invented in Sumeria circa 2500 BC. Later,
the Antikythera mechanism invented sometime around 100 BC in ancient Greece, is the first
known mechanical calculatorutilizing gears of various sizes and configuration to perform
calculations,[5] which tracked the metonic cycle still used in lunar-to-solar calendars, and which is
consistent for calculating the dates of the Olympiads.[6]
The medieval scientist Al-Jazari built programmable automata in 1206 AD. One system employed in
these devices was the use of pegs and cams placed into a wooden drum at specific locations, which
would sequentially trigger levers that in turn operated percussion instruments. The output of this
device was a small drummer playing various rhythms and drum patterns.[7] The Jacquard loom,
which Joseph Marie Jacquard developed in 1801, uses a series of pasteboard cards with holes
punched in them. The hole pattern represented the pattern that the loom had to follow in weaving
cloth. The loom could produce entirely different weaves using different sets of cards.
Charles Babbage adopted the use of punched cards around 1830 to control hisAnalytical Engine.
Mathematician Ada Lovelace, a friend of Babbage, between 1842 and 1843 translated an article by
Italian military engineer Luigi Menabrea on the engine,[8] which she supplemented with a set of notes,
simply called Notes. These notes include an algorithm to calculate a sequence of Bernoulli numbers,
[9]
intended to be carried out by a machine. Despite controversy over scope of her contribution, many
consider this algorithm to be the first computer program.[8]
Data and instructions were once stored on external punched cards, which were kept in order and arranged in
program decks.

In the 1880s, Herman Hollerith invented the recording of data on a medium that could then be read
by a machine. Prior uses of machine readable media, above, had been for lists of instructions (not
data) to drive programmed machines such as Jacquard looms and mechanized musical instruments.
"After some initial trials with paper tape, he settled on punched cards..."[10] To process these punched
cards, first known as "Hollerith cards" he invented the keypunch, sorter, and tabulator unit record
machines.[11] These inventions were the foundation of the data processing industry. In 1896 he
founded the Tabulating Machine Company (which later became the core of IBM). The addition of

a control panel (plugboard) to his 1906 Type I Tabulator allowed it to do different jobs without having
to be physically rebuilt. By the late 1940s, there were several unit record calculators, such as
the IBM 602 and IBM 604, whose control panels specified a sequence (list) of operations and thus
were programmable machines.
The invention of the von Neumann architecture allowed computer programs to be stored in computer
memory. Early programs had to be painstakingly crafted using the instructions (elementary
operations) of the particular machine, often in binarynotation. Every model of computer would likely
use different instructions (machine language) to do the same task. Later,assembly languages were
developed that let the programmer specify each instruction in a text format, entering abbreviations
for each operation code instead of a number and specifying addresses in symbolic form (e.g., ADD
X, TOTAL). Entering a program in assembly language is usually more convenient, faster, and less
prone to human error than using machine language, but because an assembly language is little
more than a different notation for a machine language, any two machines with different instruction
sets also have different assembly languages.

Wired control panel for an IBM 402 Accounting Machine

The synthesis of numerical calculation, predetermined operation and output, along with a way to
organize and input instructions in a manner relatively easy for humans to conceive and produce, led
to the modern development of computer programming. In 1954, FORTRAN was invented; it was the
first widely used high level programming language to have a functional implementation, as opposed
to just a design on paper.[12][13] (A high-level language is, in very general terms, any programming
language that allows the programmer to write programs in terms that are moreabstract than
assembly language instructions, i.e. at a level of abstraction "higher" than that of an assembly
language.) It allowed programmers to specify calculations by entering a formula directly (e.g. Y = X*2
+ 5*X + 9). The program text, or source, is converted into machine instructions using a special
program called a compiler, which translates the FORTRAN program into machine language. In fact,
the name FORTRAN stands for "Formula Translation". Many other languages were developed,
including some for commercial programming, such as COBOL. Programs were mostly still entered
using punched cards or paper tape. (See computer programming in the punch card era). By the late
1960s, data storage devices and computer terminals became inexpensive enough that programs
could be created by typing directly into the computers. Text editors were developed that allowed
changes and corrections to be made much more easily than with punched cards. (Usually, an error
in punching a card meant that the card had to be discarded and a new one punched to replace it.)
As time has progressed, computers have made giant leaps in processing power, which have allowed
the development of programming languages that are more abstracted from the underlying hardware.
Popular programming languages of the modern era include ActionScript, C, C+
+, C#, Haskell, Java, JavaScript, Objective-C, Perl, PHP, Python, Ruby, Smalltalk,SQL, Visual Basic,
and dozens more.[14] Although these high-level languages usually incur greater overhead, the
increase in speed of modern computers has made the use of these languages much more practical
than in the past. These increasingly abstracted languages are typically easier to learn and allow the
programmer to develop applications much more efficiently and with less source code. However,
high-level languages are still impractical for a few programs, such as those where low-level
hardware control is necessary or where maximum processing speed is vital. Computer programming
has become a popular career in the developed world, particularly in the United States, Europe, and
Japan. Due to the high labor cost of programmers in these countries, some forms of programming
have been increasingly subject to outsourcing(importing software and services from other countries,
usually at a lower wage), making programming career decisions in developed countries more
complicated, while increasing economic opportunities for programmers in less developed areas,
particularly China and India.

Modern programming
Quality requirements
Whatever the approach to development may be, the final program must satisfy some fundamental
properties. The following properties are among the most important:

Reliability: how often the results of a program are correct. This depends on conceptual
correctness of algorithms, and minimization of programming mistakes, such as mistakes in
resource management (e.g., buffer overflows and race conditions) and logic errors (such as
division by zero or off-by-one errors).

Robustness: how well a program anticipates problems due to errors (not bugs). This includes
situations such as incorrect, inappropriate or corrupt data, unavailability of needed resources
such as memory, operating system services and network connections, user error, and
unexpected power outages.

Usability: the ergonomics of a program: the ease with which a person can use the program
for its intended purpose or in some cases even unanticipated purposes. Such issues can make
or break its success even regardless of other issues. This involves a wide range of textual,
graphical and sometimes hardware elements that improve the clarity, intuitiveness,
cohesiveness and completeness of a program's user interface.

Portability: the range of computer hardware and operating system platforms on which the
source code of a program can be compiled/interpreted and run. This depends on differences in
the programming facilities provided by the different platforms, including hardware and operating
system resources, expected behavior of the hardware and operating system, and availability of
platform specific compilers (and sometimes libraries) for the language of the source code.

Maintainability: the ease with which a program can be modified by its present or future
developers in order to make improvements or customizations, fix bugs and security holes, or
adapt it to new environments. Good practices[15] during initial development make the difference in
this regard. This quality may not be directly apparent to the end user but it can significantly affect
the fate of a program over the long term.

Efficiency/performance: Measure of system resources a program consumes (processor time,


memory space, slow devices such as disks, network bandwidth and to some extent even user
interaction): the less, the better. This also includes careful management of resources, for
example cleaning up temporary files and eliminating memory leaks.

Readability of source code[edit]


In computer programming, readability refers to the ease with which a human reader can
comprehend the purpose, control flow, and operation of source code. It affects the aspects of quality
above, including portability, usability and most importantly maintainability.
Readability is important because programmers spend the majority of their time reading, trying to
understand and modifying existing source code, rather than writing new source code. Unreadable
code often leads to bugs, inefficiencies, andduplicated code. A study[16] found that a few simple
readability transformations made code shorter and drastically reduced the time to understand it.
Following a consistent programming style often helps readability. However, readability is more than
just programming style. Many factors, having little or nothing to do with the ability of the computer to
efficiently compile and execute the code, contribute to readability.[17] Some of these factors include:

Different indent styles (whitespace)

Comments

Decomposition

Naming conventions for objects (such as variables, classes, procedures, etc.)

The presentation aspects of this (such as indents, line breaks, color highlighting, and so on) are
often handled by thesource code editor, but the content aspects reflect the programmer's talent and
skills.
Various visual programming languages have also been developed with the intent to resolve
readability concerns by adopting non-traditional approaches to code structure and display. Integrated
development environments (IDEs) aim to integrate all such help. Techniques like Code
refactoring can enhance readability.

Algorithmic complexity[edit]
The academic field and the engineering practice of computer programming are both largely
concerned with discovering and implementing the most efficient algorithms for a given class of
problem. For this purpose, algorithms are classified intoorders using so-called Big O notation, which
expresses resource use, such as execution time or memory consumption, in terms of the size of an
input. Expert programmers are familiar with a variety of well-established algorithms and their
respective complexities and use this knowledge to choose algorithms that are best suited to the
circumstances.

Methodologies[edit]
The first step in most formal software development processes is requirements analysis, followed by
testing to determine value modeling, implementation, and failure elimination (debugging). There exist
a lot of differing approaches for each of those tasks. One approach popular for requirements
analysis is Use Case analysis. Many programmers use forms of Agile software development where
the various stages of formal software development are more integrated together into short cycles
that take a few weeks rather than years. There are many approaches to the Software development
process.
Popular modeling techniques include Object-Oriented Analysis and Design (OOAD) and ModelDriven Architecture (MDA). The Unified Modeling Language (UML) is a notation used for both the
OOAD and MDA.
A similar technique used for database design is Entity-Relationship Modeling (ER Modeling).
Implementation techniques include imperative languages (object-oriented or procedural), functional
languages, and logic languages.

Measuring language usage[edit]


Main article: Measuring programming language popularity
It is very difficult to determine what are the most popular of modern programming languages.
Methods of measuring programming language popularity include: counting the number of job
advertisements that mention the language,[18] the number of books sold and courses teaching the
language (this overestimates the importance of newer languages), and estimates of the number of
existing lines of code written in the language (this underestimates the number of users of business
languages such as COBOL).
Some languages are very popular for particular kinds of applications, while some languages are
regularly used to write many different kinds of applications. For example, COBOL is still strong in
corporate data centers[19] often on largemainframe computers, Fortran in engineering
applications, scripting languages in Web development, and C in embedded software. Many
applications use a mix of several languages in their construction and use. New languages are
generally designed around the syntax of a prior language with new functionality added, (for
example C++ adds object-orientation to C, and Java adds memory management and bytecode to C+
+, but as a result, loses efficiency and the ability for low-level manipulation).

Debugging[edit]

The bug from 1947 which is at the origin of a popular (but incorrect)
etymology for the common term for a software defect.

Main article: Debugging


Debugging is a very important task in the software development
process since having defects in a program can have significant
consequences for its users. Some languages are more prone to
some kinds of faults because their specification does not require
compilers to perform as much checking as other languages. Use of astatic code analysis tool can
help detect some possible problems.
Debugging is often done with IDEs like Eclipse, Visual
Studio, Kdevelop, NetBeansand Code::Blocks. Standalone debuggers like gdb are also used, and
these often provide less of a visual environment, usually using a command line.

Programming languages[edit]
Main articles: Programming language and List of programming languages
Different programming languages support different styles of programming (called programming
paradigms). The choice of language used is subject to many considerations, such as company
policy, suitability to task, availability of third-party packages, or individual preference. Ideally, the
programming language best suited for the task at hand will be selected. Trade-offs from this ideal
involve finding enough programmers who know the language to build a team, the availability of
compilers for that language, and the efficiency with which programs written in a given language
execute. Languages form an approximate spectrum from "low-level" to "high-level"; "low-level"
languages are typically more machine-oriented and faster to execute, whereas "high-level"
languages are more abstract and easier to use but execute less quickly. It is usually easier to code in
"high-level" languages than in "low-level" ones.
Allen Downey, in his book How To Think Like A Computer Scientist, writes:
The details look different in different languages, but a few basic instructions appear in just
about every language:

Input: Gather data from the keyboard, a file, or some other device.

Output: Display data on the screen or send data to a file or other device.

Arithmetic: Perform basic arithmetical operations like addition and multiplication.

Conditional Execution: Check for certain conditions and execute the appropriate
sequence of statements.

Repetition: Perform some action repeatedly, usually with some variation.

Many computer languages provide a mechanism to call functions provided by shared libraries.
Provided the functions in a library follow the appropriate run-time conventions (e.g., method of
passing arguments), then these functions may be written in any other language.

Programmers[edit]
Main article: Programmer
See also: Software developer and Software engineer
Computer programmers are those who write computer software. Their jobs usually involve:

Coding

Debugging

Documentation

Integration

Maintenance

Requirements analysis

Software architecture

Software testing

Specification

Variables, Data Types and Constants


Computers are useful because of their ability to store and manipulate huge
amounts of information. This information may be numbers, such as a financial
report, or alphabetic characters like names and addresses. Managing this
information requires the usage of programming languages. One of the
important task of a programming language is identifying the type of data it is
manipulating.
Data is stored in a computer's memory. The memory system comprises of
uniquely numbered cells called memory addresses. We need to know the
address where something is stored in order to retrieve it and work on it. A
programming language frees us from keeping track of these memory addresses
by substituting names for them. These names are called variables. Variables are
descriptive names for the memory addresses.
Before we use a variable in C we must declare it. We must identify what kind
of information will be stored in it. This is called defining a variable.Variables
must be declared at the start of any block of code, but most are found at the
start of each function. A variable must be defined to be one of the legal C data
types. When a variable is defined it is not automatically initialized, it is the
responsibility of the programmer to initialize this to a start value.

Variable names

All variable definitions must include two things variable name and its data
type. Some rules to be followed in naming a variable in C are, it must start with
an alphabet and can contain letter, underscores and digits. Both uppercase and
lowercase letters can be used. Spaces should not be used in a variable
name. Certain keywords like int, float, struct, if, while cannot be used as
variable names. The variable names should not be very long and one should
refer to the documentation of the C compiler to know the limitation. Generally
the first eight characters of a variable name are significant.

Some of the valid variable names are :

fAmount
iTotal
iCounter

Note : The variable names should be meaningful and denote what the variable
stores. As far as possible single letter variable names should be avoided.
C provides a number of data types, some of the basic types used are :
int

integer

char

character

float

a floating point number

Table 3.1

There are variants to the above, such as


unsigned int

unsigned integer

short int

short integer

long int

long integer

double

double precision floating point number

Table 3.2

When declaring variables to be of type long int, short int and unsigned int it is
permissible to omit the keyword int. Since these variables must always be
integers. Thus we can declare an unsigned integer counter as

unsigned counter;

Unlike in programming languages like Visual Basic, there is no datatype such


as string in C. However you can declare an array of the datatype char and write
functions to manipulate this.
The main difference among these data types is the amount of memory allocated
for storing these. The maximum and minimum values that can be stored in
these variables depends on the version of C and the computer system being
used. Some of the typical values are :

Data Type

Maximum

Minimum

Bytes

int

32767

-32768

unsigned int

65535

short

32767

-32768

2147483647

-2147483648

char (ASCII codes)

127

-128

unsigned char

255

3.4E+38

3.4E-38

1.7E+308

1.7E-308

long

float
double

Table 3.3

Format for declaring a variable in C is :

var_type variable names;


Eg :
int counter;
float total, amount, interest;
char answer;

There is no Boolean type in C, you should use int, unsigned char or char type.
Type int
Int is an integer datatype, a variable declared of this type can have any value in
the range indicated in Table 3.1 above. This is used to store whole numbers.

Program 3.1

/* Program to declare an integer variable i and then set it to be 20 */


#include <stdio.h>
void main()
{
int i = 20;
printf("This is my integer: %d \n", i);
}

The %d is to tell printf() routine to format the integer i as a


decimal number.

Type float

Floating point variables can store a value containing a decimal point. Floating
point numbers may contain both a whole and fractional part.

Program 3.2

/* Program to declare a float variable f and then set it to be 3.1415 */


#include <stdio.h>
void main()
{
float f = 3.1415;
printf("This is my float: %f \n", f);
}

The %f is to tell printf() routine to format the float variable f as


a decimal floating number. Now if you want to see only the first
2 numbers after the floating point, you would modify your
printf() routine call to be like this:

printf("This is my shorter float: %.2f \n", f);

Type double
The type double is similar to float except that it is used whenever the accuracy
of a floating point variable is insufficient. Variables declared to be of type
double can store approximately twice as many significant digits as a variable of
type float. You can use similar method as in the previous example to print a
double.
Type char
Character variables are used to store character values particularly single
characters.
Note: The keyword void is used to denote nothing. As such one will not declare
a variable of type void in a program, but it is more useful when declaring
functions which do not return a value.

Program 3.3

/* Program to declare a character variable d and then set it to be 'd' */


#include <stdio.h>
void main()
{
char c;
c = 'd';
printf("This is my character: %c \n", c);
}

The %c is to tell printf() routine to format the variable f as a


character.

Constants

In C any literal number, single character or string of characters is known as a


constant. Consider the following expression :

fTotal = 30 + fAmount;

Here fTotal and fAmount are variables and the number 30 is a constant. This
value will not change in a program. A user using a program having constants
cannot change the constants when the program is executing. Constants are
generally hard-coded into a program. Unintended modifications are prevented
from occurring. The compiler will catch attempts to reassign new values to
constants.
There are three ways for defining a constant in C. First method is by using
#define statements. #define statement is a preprocessor directive . It is used to
define constants as follows :

#define TRUE 1
#define FALSE 0
#define PI 3.1415

Wherever the constant appears in a program, the preprocessor replaces it by its


value.
The second method is by using the const keyword in conjunction with a
variable declaration.

const float PI = 3.1415;

Here, the compiler will throw an error whenever the variable PI declared as a
constant is assigned a new value.
The third method is by using enumerated data type. This data type is initiated
by the keyword enum, immediately after the keyword enum is the name of the
variable, followed by a list of values enclosed in curly braces. This defines a set
of constants. Hence eliminates the usage of multiple #define statements.

enum answer {FALSE, TRUE};


Thus
answer = "No";

is an invalid statement.
This example is to show how you would output constant values
using printf() routine.

Program 3.4

/* Program to show using #define to declare constants */


#include <stdio.h>
#define TRUE 1
#define FALSE 0
#define PI 3.1415
void main()
{
printf("This is TRUE as an integer: %d \n", TRUE);
printf("This is FALSE as an integer: %d \n", FALSE);
printf("This is PI as an integer: %d \n", PI);
printf("This is PI as a float: %.4 \n", PI);
}

The %c is to tell printf() routine to format the variable f as a


character.

In C an enumerator is of type int. Unless specified the first value is internally


assigned the value 0. So FALSE is assigned a value 0 and TRUE a 1. For each
additional constant in the enumeration the value increases by 1.

enum COLOR {RED, BLUE, GREEN, YELLOW};

Enumerated values that appear subsequently after YELLOW without a specific


value will be assigned sequential values with 53 plus one.
Consider another enumerated data type :

enum direction {north, south, east = 40; west};


Here north =0, south=1, east=40 and west=41.

Expression and Operators in QBASIC


EXPRESSIONS
An expression is the combination of operators, constants and variables that is
evaluated to get a result. The result of the expression is either string data,
numeric data or logical value (true or false) and can be stored in a variable.
For example, the following are expressions in QBASIC.
(A + B) > C
A>=B+C
u* t + *a*t^2

An arithmetic expression may contain more than one operator. While


evaluating such expressions, a hierarchy is followed. The hierarchy in
arithmetic operations is listed as given below:
a. Exponentiation (^)
b. Negation (-)
c. Multiplication and division
d. Integer division
e. Modular division
f. Addition and Subtraction
The hierarchy in relational operations are =, >, <, <>, < =, and > =
respectively. The hierarchy in logical operations are NOT, AND and OR.
NOTE:
When parenthesis is used, it changes the order of hierarchy. The
operators inside the parenthesis are evaluated first. So, you can say
QBASIC expression follows rule of PEDMAS where P, E, D, M, A and S
stand for parenthesis, Exponentiation, Division, Multiplication,
Addition, and Subtraction respectively.
Algebraic expression cannot be used directly in programming. It must
be converted into QBASIC expression.
Algebraic Expression --------------------------------- BASIC Expression
A = L B ------------------------------------------------- A = L * B
P = 2(L + B) ---------------------------------------------- P = 2*(L + B)
I = (P T R)/100 --------------------------------------- I = (P * T * R)/100
V = 4/3 pi R^3 ------------------------------------------- V = 4/3 * PI * R^3

Control Structures in basic


In a program, a control structure determines the order in which statements
are executed. You have already seen all the control structures any program
will need:
1. Sequential Execution
2. Loops
3. Two-way Decisions
Sequential execution is when statements are executed one after another in
order. You don't need to do anything special for this to happen. Loops are
performed using the DO...LOOP control structure. Two-way decisions

are

performed

using

theIF...THEN...ELSE

...END

IF structure.
These are the only control structures anybody needs, no matter what their
task. Other control structures exist and are sometimes useful, but are never
essential.
The basic control structures are like basic shop tools (hammer, saw, screw
driver, ...). Most shop projects require the basic tools. Specialized tools might
be helpful, but are not essential. You should learn how to use the basic tools
before buying the specialized tools.
WebSphere(R) Development Studio ILE RPG Reference

Using Arrays and Tables

Arrays and tables are both collections of data fields (elements) of the same:
Field length
Data type
o Character
o Numeric
o Data Structure
o Date
o Time
o Timestamp
o Graphic
o Basing Pointer
o Procedure Pointer
o UCS-2
Format
Number of decimal positions (if numeric)
Arrays and tables differ in that:

You can refer to a specific array element by its position


You cannot refer to specific table elements by their position
An array name by itself refers to all elements in the array
A table name always refers to the element found in the
last LOOKUP (Look Up a Table or Array Element) operation
Note:
You can define only run-time arrays in a subprocedure. Tables,
prerun-time arrays, and compile-time arrays are not supported.
If you want to use a pre-run array or compile-time array in a
subprocedure, you must define it in the main source section.
The next section describes how to code an array, how to specify the initial values of
the array elements, how to change the values of an array, and the special
considerations for using an array. The section after next describes the same
information for tables.

Arrays

There are three types of arrays:


The run-time array is loaded by your program while it is
running.
The compile-time array is loaded when your program is
created. The initial data becomes a permanent part of your
program.
The prerun-time array is loaded from an array file when your
program begins running, before any input, calculation, or
output operations are processed.
The essentials of defining and loading an array are described for a run-time array. For
defining and loading compile-time and prerun-time arrays you use these essentials and
some additional specifications.
Array Name and Index

You refer to an entire array using the array name alone. You refer to the individual
elements of an array using (1) the array name, followed by (2) a left parenthesis,
followed by (3) an index, followed by (4) a right parenthesis -- for example:
AR(IND). The index indicates the position of the element within the array (starting
from 1) and is either a number or a field containing a number.

The following rules apply when you specify an array name and index:
The array name must be a unique symbolic name.
The index must be a numeric field or constant greater than
zero and with zero decimal positions
If the array is specified within an expression in the extended
factor 2 field, the index may be an expression returning a
numeric value with zero decimal positions

At run time, if your program refers to an array using an index


with a value that is zero, negative, or greater than the number
of elements in the array, then the error/exception routine takes
control of your program.

The Essential Array Specifications

You define an array on a definition specification. Here are the essential specifications
for all arrays:
Specify the array name in positions 7 through 21
Specify the number of entries in the array using the DIM
keyword
Specify length, data format, and decimal positions as you
would any scalar fields. You may specify explicit From- and Toposition entries (if defining a subfield), or an explicit Lengthentry; or you may define the array attributes using the LIKE
keyword; or the attributes may be specified elsewhere in the
program.
If you need to specify a sort sequence, use the ASCEND or
DESCEND keywords.
Figure 65 shows an example of the essential array specifications.
Coding a Run-Time Array

If you make no further specifications beyond the essential array specifications, you
have defined a run-time array. Note that the keywords ALT, CTDATA, EXTFMT,
FROMFILE, PERRCD, and TOFILE cannot be used for a run-time array.
Figure 65. The Essential Array Specifications to Define a Run-Time Array

*...1....+....2....+....3....+....4....+....5....+....6....+....7...

DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++
DARC
S
3A
DIM(12)

Loading a Run-Time Array

You can assign initial values for a run-time array using the INZ keyword on the
definition specification. You can also assign initial values for a run-time array through
input or calculation specifications. This second method may also be used to put data
into other types of arrays.
For example, you may use the calculation specifications for the MOVE operation to
put 0 in each element of an array (or in selected elements).
Using the input specifications, you may fill an array with the data from a file. The
following sections provide more details on retrieving this data from the records of a
file.
Note:
Date and time runtime data must be in the same format and
use the same separators as the date or time array being
loaded.
Loading a Run-Time Array in One Source Record
If the array information is contained in one record, the information can occupy
consecutive positions in the record or it can be scattered throughout the record.
If the array elements are consecutive on the input record, the array can be loaded with
a single input specification. Figure 66 shows the specifications for loading an array,
INPARR, of six elements (12 characters each) from a single record from the file
ARRFILE.
Figure 66. Using a Run-Time Array with Consecutive Elements

*...1....+....2....+....3....+....4....+....5....+....6....+....7...
DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++
DINPARR
S
12A
DIM(6)
IFilename++SqNORiPos1+NCCPos2+NCCPos3+NCC................................
I........................Fmt+SPFrom+To+++DcField+++++++++L1M1FrPlMnZr....
IARRFILE
AA 01
I
1
72 INPARR

If the array elements are scattered throughout the record, they can be defined and
loaded one at a time, with one element described on a specification line. Figure
67shows the specifications for loading an array, ARRX, of six elements with 12
characters each, from a single record from file ARRFILE; a blank separates each of
the elements from the others.

Figure 67. Defining a Run-Time Array with Scattered Elements

*...1....+....2....+....3....+....4....+....5....+....6....+....7...
DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++
DARRX
S
12A
DIM(6)
IFilename++SqNORiPos1+NCCPos2+NCCPos3+NCC................................
I........................Fmt+SPFrom+To+++DcField+++++++++L1M1FrPlMnZr....
IARRFILE
AA 01
I
1
12 ARRX(1)
I
14
25 ARRX(2)
I
27
38 ARRX(3)
I
40
51 ARRX(4)
I
53
64 ARRX(5)
I
66
77 ARRX(6)

Loading a Run-Time Array Using Multiple Source Records


If the array information is in more than one record, you may use various methods to
load the array. The method to use depends on the size of the array and whether or not
the array elements are consecutive in the input records. The ILE RPG program
processes one record at a time. Therefore the entire array is not processed until all the
records containing the array information are read and the information is moved into
the array fields. It may be necessary to suppress calculation and output operations
until the entire array is read into the program.
Sequencing Run-Time Arrays
Run-time arrays are not sequence checked. If you process a SORTA (sort an array)
operation, the array is sorted into the sequence specified on the definition
specification (the ASCEND or DESCEND keywords) defining the array. If the
sequence is not specified, the array is sorted into ascending sequence. When the high
(positions 71 and 72 of the calculation specifications) or low (positions 73 and 74 of
the calculation specifications) indicators are used in the LOOKUP operation, the array
sequence must be specified.
Coding a Compile-Time Array

A compile-time array is specified using the essential array specifications plus the
keyword CTDATA. In addition, on a definition specification you can specify:
The number of array entries in an input record using the
PERRCD keyword. If the keyword is not specified, the number of
entries defaults to 1.
The external data format using the EXTFMT keyword. The only
allowed values are L (left-sign), R (right-sign), or S (zoneddecimal). The EXTFMT keyword is not allowed for float compiletime arrays.

A file to which the array is to be written when the program


ends with LR on. You specify this using the TOFILE keyword.
See Figure 68 for an example of a compile-time array.
Loading a Compile-Time Array

For a compile-time array, enter array source data into records in the program source
member. If you use the **ALTSEQ, **CTDATA, and **FTRANS keywords, the
array data may be entered in anywhere following the source records. If you do not use
those keywords, the array data must follow the source records, and any alternate
collating sequence or file translation records in the order in which the compile-time
arrays and tables were defined on the definition specifications. This data is loaded into
the array when the program is compiled. Until the program is recompiled with new
data, the array will always initially have the same values each time you call the
program unless the previous call ended with LR off.
Compile-time arrays can be described separately or in alternating format (with the
ALT keyword). Alternating format means that the elements of one array are
intermixed on the input record with elements of another array.
Rules for Array Source Records
The rules for array source records are:
The first array entry for each record must begin in position 1.
All elements must be the same length and follow each other
with no intervening spaces
An entire record need not be filled with entries. If it is not,
blanks or comments can be included after the entries
(see Figure 68).
If the number of elements in the array as specified on the
definition specification is greater than the number of entries
provided, the remaining elements are filled with the default
values for the data type specified.
Figure 68. Array Source Record with Comments

DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords++++++++++++++++++++
DARC
S
3A
DIM(12) PERRCD(5) CTDATA
**CTDATA ARC
48K16343J64044HComments can be placed here
12648A47349K346Comments can be placed here
50B125
Comments can be placed here

Each record, except the last, must contain the number of


entries specified with the PERRCD keyword on the definition
specifications. In the last record, unused entries must be blank
and comments can be included after the unused entries.
Each entry must be contained entirely on one record. An entry
cannot be split between two records; therefore, the length of a
single entry is limited to the maximum length of 100
characters (size of source record). If arrays are used and are
described in alternating format, corresponding elements must
be on the same record; together they cannot exceed 100
characters.
For date and time compile-time arrays the data must be in the
same format and use the same separators as the date or time
array being loaded.
Array data may be specified in one of two ways:
1. **CTDATA arrayname: The data for the array may be
specified anywhere in the compile-time data section.
2. **b: (b=blank) The data for the arrays must be specified
in the same order in which they are specified in the
Definition specifications.
Only one of these techniques may be used in one program.
Arrays can be in ascending(ASCEND keyword), descending
(DESCEND keyword), or no sequence (no keyword specified).

For ascending or descending character arrays when


ALTSEQ(*EXT) is specified on the control specification, the
alternate collating sequence is used for the sequence
checking. If the actual collating sequence is not known at
compile time (for example, if SRTSEQ(*JOBRUN) is specified on
a control specification or as a command parameter) the
alternate collating sequence table will be retrieved at runtime
and the checking will occur during initialization at *INIT.
Otherwise, the checking will be done at compile time.

Graphic and UCS-2 arrays will be sorted by hexadecimal


values, regardless of the alternate collating sequence.
If L or R is specified on the EXTFMT keyword on the definition
specification, each element must include the sign (+ or -). An
array with an element size of 2 with L specified would require 3
positions in the source data as shown in the following example.

*....+....1....+....2....+....3....+....4....+....5....+....6....+....*
DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords++++++++++++++++++++
D UPDATES
2 0 DIM(5) PERRCD(5) EXTFMT(L)
CTDATA
**CTDATA UDPATES
+37-38+52-63-49+51

Float compile-time data are specified in the source records as


float or numeric literals. Arrays defined as 4-byte float require
14 positions for each element; arrays defined as 8-byte float
require 23 positions for each element.

Graphic data must be enclosed in shift-out and shift-in


characters. If several elements of graphic data are included in
a single record (without intervening nongraphic data) only one
set of shift-out and shift-in characters is required for the record.
If a graphic array is defined in alternating format with a
nongraphic array, the shift-in and shift-out characters must
surround the graphic data. If two graphic arrays are defined in
alternating format, only one set of shift-in and shift-out
characters is required for each record.

Coding a Prerun-Time Array

In addition to the essential array specifications, you can also code the following
specifications or keywords for prerun-time arrays.
On the definition specifications, you can specify
The name of the file with the array input data, using the
FROMFILE keyword.
The name of a file to which the array is written at the end of
the program, using the TOFILE keyword.
The number of elements per input record, using the PERRCD
keyword.
The external format of numeric array data using the EXTFMT
keyword.
An alternating format using the ALT keyword.
Note:
The integer or unsigned format cannot be specified for arrays
defined with more than ten digits.
On the file-description specifications, you can specify a T in position 18 for the file
with the array input data.

Example of Coding Arrays

Figure 69 shows the definition specifications required for two prerun-time arrays, a
compile-time array, and a run-time array.
Figure 69. Definition Specifications for Different Types of Arrays

*....+....1....+....2....+....3....+....4....+....5....+....6....+....*
HKeywords+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
H DATFMT(*USA) TIMFMT(*HMS)
D*ame+++++++++++ETDsFrom+++To/L+++IDc.Keywords++++++++++++++++++++
* Run-time array. ARI has 10 elements of type date. They are
* initialized to September 15, 1994. This is in month, day,
* year format using a slash as a separator as defined on the
* control specification.
DARI
S
D
DIM(10) INZ(D'09/15/1994')
*
* Compile-time arrays in alternating format. Both arrays have
* eight elements (three elements per record). ARC is a character
* array of length 15, and ARD is a time array with a predefined
* length of 8.
DARC
S
15
DIM(8) PERRCD(3)
D
CTDATA
DARD
S
T
DIM(8) ALT(ARC)
*
* Prerun-time array. ARE, which is to be read from file DISKIN,
* has 250 character elements (12 elements per record). Each
* element is five positions long. The size of each record
* is 60 (5*12). The elements are arranged in ascending sequence.
DARE
S
5A
DIM(250) PERRCD(12) ASCEND
D
FROMFILE(DISKIN)
*
* Prerun-time array specified as a combined file. ARH is written
* back to the same file from which it is read when the program
* ends normally with LR on. ARH has 250 character elements
* (12 elements per record). Each elements is five positions long.
* The elements are arranged in ascending sequence.
DARH
S
5A
DIM(250) PERRCD(12) ASCEND
D
FROMFILE(DISKOUT)
D
TOFILE(DISKOUT)
**CTDATA ARC
Toronto
12:15:00Winnipeg
13:23:00Calgary
15:44:00
Sydney
17:24:30Edmonton
21:33:00Saskatoon
08:40:00
Regina
12:33:00Vancouver
13:20:00

Loading a Prerun-Time Array

For a prerun-time array, enter array input data into a file. The file must be a sequential
program described file. During initialization, but before any input, calculation, or
output operations are processed the array is loaded with initial values from the file. By
modifying this file, you can alter the array's initial values on the next call to the
program, without recompiling the program. The file is read in arrival sequence. The
rules for prerun-time array data are the same as for compile-time array data, except
there are no restrictions on the length of each record. See Rules for Array Source
Records.
Sequence Checking for Character Arrays

Sequence checking for character arrays that have not been defined with
ALTSEQ(*NONE) has two dependencies:
1. Whether the ALTSEQ control specification keyword has been
specified, and if so, how.
2. Whether the array is compile time or prerun time.
The following table indicates when sequence checking occurs.
Control
Specification
Entry

ALTSEQ Used
for SORTA,
LOOKUP and
Sequence
Checking

When
Sequence
Checked for
Compile Time
Array

When
Sequence
Checked for
Prerun Time
Array

ALTSEQ(*NONE) No

Compile time

Run time

ALTSEQ(*SRC)

No

Compile time

Run time

ALTSEQ(*EXT)
(known at
compile time)

Yes

Compile time

Run time

ALTSEQ(*EXT)
(known only at
run time)

Yes

Run time

Run time

Note:
For compatibility with RPG III, SORTA and LOOKUP do not use
the alternate collating sequence with ALTSEQ(*SRC). If you
want these operations to be performed using the alternate
collating sequence, you can define a table on the system
(object type *TBL), containing your alternate sequence. Then
you can change ALTSEQ(*SRC) to ALTSEQ(*EXT) on your control
specification and specify the name of your table on the SRTSEQ
keyword or parameter of the create command.

Initializing Arrays
Run-Time Arrays

To initialize each element in a run-time array to the same value, specify the INZ
keyword on the definition specification. If the array is defined as a data structure
subfield, the normal rules for data structure initialization overlap apply (the
initialization is done in the order that the fields are declared within the data structure).

Compile-Time and Prerun-Time Arrays

The INZ keyword cannot be specified for a compile-time or prerun-time array,


because their initial values are assigned to them through other means (compile-time
data or data from an input file). If a compile-time or prerun-time array appears in a
globally initialized data structure, it is not included in the global initialization.
Note:
Compile-time arrays are initialized in the order in which the
data is declared after the program, and prerun-time arrays are
initialized in the order of declaration of their initialization files,
regardless of the order in which these arrays are declared in
the data structure. Pre-run time arrays are initialized after
compile-time arrays.
If a subfield initialization overlaps a compile-time or prerun-time array, the
initialization of the array takes precedence; that is, the array is initialized after the
subfield, regardless of the order in which fields are declared within the data structure.

Defining Related Arrays

You can load two compile-time arrays or two prerun-time arrays in alternating
format by using the ALT keyword on the definition of the alternating array. You
specify the name of the primary array as the parameter for the ALT keyword. The
records for storing the data for such arrays have the first element of the first array
followed by the first element of the second array, the second element of the first array
followed by the second element of the second array, the third element of the first array
followed by the third element of the second array, and so on. Corresponding elements
must appear on the same record. The PERRCD keyword on the main array definition
specifies the number of corresponding pairs per record, each pair of elements counting
as a single entry. You can specify EXTFMT on both the main and alternating array.
Figure 70 shows two arrays, ARRA and ARRB, in alternating format.
Figure 70. Arrays in Alternating and Nonalternating Format

The records for ARRA and ARRB look like the records below when described as two
separate array files.
This record contains ARRA entries in positions 1 through 60.
Figure 71. Arrays Records for Two Separate Array Files

This record contains ARRB entries in positions 1 through 50.


Figure 72. Arrays Records for One Array File
The records for ARRA and ARRB look like the records below when described as one
array file in alternating format. The first record contains ARRA and ARRB entries in
alternating format in positions 1 through 55. The second record contains ARRA and
ARRB entries in alternating format in positions 1 through 55.
Figure 73. Arrays Records for One Array File in Alternating Format

DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++
DARRA
S
6A
DIM(6) PERRCD(1) CTDATA
DARRB
S
5 0 DIM(6) ALT(ARRA)
DARRGRAPHIC
S
3G
DIM(2) PERRCD(2) CTDATA
DARRC
S
3A
DIM(2) ALT(ARRGRAPHIC)
DARRGRAPH1
S
3G
DIM(2) PERRCD(2) CTDATA
DARRGRAPH2
S
3G
DIM(2) ALT(ARRGRAPH1)
**CTDATA ARRA
345126 373
38A437 498
39K143 1297
40B125
93
41C023 3998
42D893
87
**CTDATA ARRGRAPHIC
ok1k2k3iabcok4k5k6iabc
**CTDATA ARRGRAPH1
ok1k2k3k4k5k6k1k2k3k4k5k6i

Searching Arrays

The following can be used to search arrays:


The LOOKUP operation code
The %LOOKUP built-in function
The %LOOKUPLT built-in function
The %LOOKUPLE built-in function
The %LOOKUPGT built-in function
The %LOOKUPGE built-in function
For more information about the LOOKUP operation code, see:

Searching an Array with an Index


Searching an Array Without an Index
LOOKUP (Look Up a Table or Array Element)
For more information about the %LOOKUPxx built-in functions, see %LOOKUPxx
(Look Up an Array Element).
Searching an Array Without an Index

When searching an array without an index, use the status (on or off) of the resulting
indicators to determine whether a particular element is present in the array. Searching
an array without an index can be used for validity checking of input data to determine
if a field is in a list of array elements. Generally, an equal LOOKUP is requested.
In factor 1 in the calculation specifications, specify the search argument (data for
which you want to find a match in the array named) and place the array name factor 2.
In factor 2 specify the name of the array to be searched. At least one resulting
indicator must be specified. Entries must not be made in both high and low for the
same LOOKUP operation. The resulting indicators must not be specified in high or
low if the array is not in sequence (ASCEND or DESCEND keywords). Control level
and conditioning indicators (specified in positions 7 through 11) can also be used. The
result field cannot be used.
The search starts at the beginning of the array and ends at the end of the array or when
the conditions of the lookup are satisfied. Whenever an array element is found that
satisfies the type of search being made (equal, high, low), the resulting indicator is set
on.
Figure 74 shows an example of a LOOKUP on an array without an index.
Figure 74. LOOKUP Operation for an Array without an Index

*...1....+....2....+....3....+....4....+....5....+....6....+....7...
FFilename++IPEASFRlen+LKlen+AIDevice+.Keywords++++++++++++++++++++++++++++
FARRFILE
IT
F
5
DISK
F*
DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++
DDPTNOS
S
5S 0 DIM(50) FROMFILE(ARRFILE)
D*
CL0N01Factor1+++++++Opcode(E)+Factor2+++++++Result++++++++Len++D+HiLoEq..
C* The LOOKUP operation is processed and, if an element of DPTNOS equal
C* to the search argument (DPTNUM) is found, indicator 20 is set on.
C
DPTNUM
LOOKUP
DPTNOS
20

ARRFILE, which contains department numbers, is defined in the file description


specifications as an input file (I in position 17) with an array file designation (T in

position 18). The file is program described (F in position 22), and each record is 5
positions in length (5 in position 27).
In the definition specifications, ARRFILE is defined as containing the array
DPTNOS. The array contains 50 entries (DIM(50)). Each entry is 5 positions in length
(positions 33-39) with zero decimal positions (positions 41-42). One department
number can be contained in each record (PERRCD defaults to 1).
Searching an Array with an Index

To find out which element satisfies a LOOKUP search, start the search at a particular
element in the array. To do this type of search, make the entries in the calculation
specifications as you would for an array without an index. However, in factor 2, enter
the name of the array to be searched, followed by a parenthesized numeric field (with
zero decimal positions) containing the number of the element at which the search is to
start. This numeric constant or field is called the index because it points to a certain
element in the array. The index is updated with the element number which satisfied
the search or is set to 0 if the search failed.
You can use a numeric constant as the index to test for the existence of an element that
satisfies the search starting at an element other than 1.
All other rules that apply to an array without an index apply to an array with an index.
Figure 75 shows a LOOKUP on an array with an index.
Figure 75. LOOKUP Operation on an Array with an Index

*...1....+....2....+....3....+....4....+....5....+....6....+....7...
FFilename++IPEASFRlen+LKlen+AIDevice+.Keywords++++++++++++++++++++++++++++
FARRFILE
IT
F
25
DISK
F*
DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++
DDPTNOS
S
5S 0 DIM(50) FROMFILE(ARRFILE)
DDPTDSC
S
20A
DIM(50) ALT(DPTNOS)
D*
CL0N01Factor1+++++++Opcode(E)+Factor2+++++++Result++++++++Len++D+HiLoEq..
C* The Z-ADD operation begins the LOOKUP at the first element in DPTNOS.
C
Z-ADD
1
X
3 0
C* At the end of a successful LOOKUP, when an element has been found
C* that contains an entry equal to the search argument DPTNUM,
C* indicator 20 is set on and the MOVE operation places the department
C* description, corresponding to the department number, into DPTNAM.
C
DPTNUM
LOOKUP
DPTNOS(X)
20
C* If an element is not found that is equal to the search argument,
C* element X of DPTDSC is moved to DPTNAM.
C
IF
NOT *IN20
C
MOVE
DPTDSC(X)
DPTNAM
20
C
ENDIF

This example shows the same array of department numbers, DPTNOS, as Figure 74.
However, an alternating array of department descriptions, DPTDSC, is also defined.

Each element in DPTDSC is 20 positions in length. If there is insufficient data in the


file to initialize the entire array, the remaining elements in DPTNOS are filled with
zeros and the remaining elements in DPTDSC are filled with blanks.

Array and Table


Using Arrays

Arrays can be used in input, output, or calculation specifications.


Specifying an Array in Calculations

An entire array or individual elements in an array can be specified in calculation


specifications. You can process individual elements like fields.
A noncontiguous array defined with the OVERLAY keyword cannot be used with the
MOVEA operation or in the result field of a PARM operation.
To specify an entire array, use only the array name, which can be used as factor 1,
factor 2, or the result field. The following operations can be used with an array name:
ADD, Z-ADD, SUB, Z-SUB, MULT, DIV, SQRT, ADDDUR, SUBDUR, EVAL,
EXTRCT, MOVE, MOVEL, MOVEA, MLLZO, MLHZO, MHLZO, MHHZO,
DEBUG, XFOOT, LOOKUP, SORTA, PARM, DEFINE, CLEAR, RESET, CHECK,
CHECKR, and SCAN.
Several other operations can be used with an array element only but not with the array
name alone. These operations include but are not limited to: BITON, BITOFF, COMP,
CABxx, TESTZ, TESTN, TESTB, MVR, DO, DOUxx, DOWxx, DOU, DOW, IFxx,
WHENxx, WHEN, IF, SUBST, and CAT.
When specified with an array name without an index or with an asterisk as the index
(for example, ARRAY or ARRAY(*)) certain operations are repeated for each element
in the array. These are ADD, Z-ADD, EVAL, SUB, Z-SUB, ADDDUR, SUBDUR,
EXTRCT, MULT, DIV, SQRT, MOVE, MOVEL, MLLZO, MLHZO, MHLZO and
MHHZO. The following rules apply to these operations when an array name without
an index is specified:
When factors 1 and 2 and the result field are arrays with the
same number of elements, the operation uses the first element
from every array, then the second element from every array
until all elements in the arrays are processed. If the arrays do
not have the same number of entries, the operation ends when
the last element of the array with the fewest elements has
been processed. When factor 1 is not specified for the ADD,
SUB, MULT, and DIV operations, factor 1 is assumed to be the
same as the result field.

When one of the factors is a field, a literal, or a figurative


constant and the other factor and the result field are arrays,
the operation is done once for every element in the shorter
array. The same field, literal, or figurative constant is used in all
of the operations.
The result field must always be an array.
If an operation code uses factor 2 only (for example, Z-ADD, ZSUB, SQRT, ADD, SUB, MULT, or DIV may not have factor 1
specified) and the result field is an array, the operation is done
once for every element in the array. The same field or constant
is used in all of the operations if factor 2 is not an array.
Resulting indicators (positions 71 through 76) cannot be used
because of the number of operations being processed.
In an EVAL expression, if any arrays on the right-hand side are
specified without an index, the left-hand side must also contain
an array without an index.
Note:
When used in an EVAL operation %ADDR(arr) and
%ADDR(arr(*)) do not have the same meaning. See %ADDR
(Get Address of Variable) for more detail.

Sorting Arrays

You can sort arrays using the SORTA (Sort an Array) operation code. The array is
sorted into sequence (ascending or descending), depending on the sequence specified
for the array on the definition specification.
Sorting using part of the array as a key

You can use the OVERLAY keyword to overlay one array over another. For example,
you can have a base array which contains names and salaries and two overlay arrays
(one for the names and one for the salaries). You could then sort the base array by
either name or salary by sorting on the appropriate overlay array.
Figure 76. SORTA Operation with OVERLAY

*...1....+....2....+....3....+....4....+....5....+....6....+....7...+....
DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++
D
DS
D Emp_Info
50
DIM(500) ASCEND
D
Emp_Name
45
OVERLAY(Emp_Info:1)

D
Emp_Salary
9P 2 OVERLAY(Emp_Info:46)
D
CL0N01Factor1+++++++Opcode(E)+Factor2+++++++Result++++++++Len++D+HiLoEq....
C
C* The following SORTA sorts Emp_Info by employee name.
C* The sequence of Emp_Name is used to determine the order of the
C* elements of Emp_Info.
C
SORTA
Emp_Name
C* The following SORTA sorts Emp_Info by employee salary
C* The sequence of Emp_Salary is used to determine the order of the
C* elements of Emp_Info.
C
SORTA
Emp_Salary

Array Output

Entire arrays can be written out under ILE RPG control only at end of program when
the LR indicator is on. To indicate that an entire array is to be written out, specify the
name of the output file with the TOFILE keyword on the definition specifications.
This file must be described as a sequentially organized output or combined file in the
file description specifications. If the file is a combined file and is externally described
as a physical file, the information in the array at the end of the program replaces the
information read into the array at the start of the program. Logical files may give
unpredictable results.
If an entire array is to be written to an output record (using output specifications),
describe the array along with any other fields for the record:
Positions 30 through 43 of the output specifications must
contain the array name used in the definition specifications.
Positions 47 through 51 of the output specifications must
contain the record position where the last element of the array
is to end. If an edit code is specified, the end position must
include blank positions and any extensions due to the edit code
(see "Editing Entire Arrays" listed next in this chapter).
Output indicators (positions 21 through 29) can be specified. Zero suppress (position
44), blank-after (position 45), and data format (position 52) entries pertain to every
element in the array.
Editing Entire Arrays

When editing is specified for an entire array, all elements of the array are edited. If
different editing is required for various elements, refer to them individually.
When an edit code is specified for an entire array (position 44), two blanks are
automatically inserted between elements in the array: that is, there are blanks to the
left of every element in the array except the first. When an edit word is specified, the
blanks are not inserted. The edit word must contain all the blanks to be inserted.

Editing of entire arrays is only valid in output specifications, not with the %EDITC or
%EDITW built-in functions.

Tables

The explanation of arrays applies to tables except for the following differences:
Activity
Differences
Defining
A table name must be a unique symbolic name that begins
with the letters TAB.
Loading
Tables can be loaded only at compilation time and prerun-time.
Using and Modifying table elements
Only one element of a table is active at one time. The table
name is used to refer to the active element. An index cannot
be specified for a table.
Searching
The LOOKUP operation is specified differently for tables.
Different built-in functions are used for searching tables.
Note:
You cannot define a table in a subprocedure.
The following can be used to search a table:
The LOOKUP operation code
The %TLOOKUP built-in function
The %TLOOKUPLT built-in function
The %TLOOKUPLE built-in function
The %TLOOKUPGT built-in function
The %TLOOKUPGE built-in function

For more information about the LOOKUP operation code, see:


LOOKUP with One Table
LOOKUP with Two Tables
LOOKUP (Look Up a Table or Array Element)
For more information about the %TLOOKUPxx built-in functions,
see %TLOOKUPxx (Look Up a Table Element).
LOOKUP with One Table

When a single table is searched, factor 1, factor 2, and at least one resulting indicator
must be specified. Conditioning indicators (specified in positions 7 through 11) can
also be used.
Whenever a table element is found that satisfies the type of search being made (equal,
high, low), that table element is made the current element for the table. If the search is
not successful, the previous current element remains the current element.
Before a first successful LOOKUP, the first element is the current element.
Resulting indicators reflect the result of the search. If the indicator is on, reflecting a
successful search, the element satisfying the search is the current element.
LOOKUP with Two Tables

When two tables are used in a search, only one is actually searched. When the search
condition (high, low, equal) is satisfied, the corresponding elements are made
available for use.
Factor 1 must contain the search argument, and factor 2 must contain the name of the
table to be searched. The result field must name the table from which data is also
made available for use. A resulting indicator must also be used. Control level and
conditioning indicators can be specified in positions 7 through 11, if needed.
The two tables used should have the same number of entries. If the table that is
searched contains more elements than the second table, it is possible to satisfy the
search condition. However, there might not be an element in the second table that
corresponds to the element found in the search table. Undesirable results can occur.
Note:
If you specify a table name in an operation other than LOOKUP
before a successful LOOKUP occurs, the table is set to its first
element.

Figure 77. Searching for an Equal Entry

*...1....+....2....+....3....+....4....+....5....+....6....+....7...
CL0N01Factor1+++++++Opcode(E)+Factor2+++++++Result++++++++Len++D+HiLoEq..
C* The LOOKUP operation searches TABEMP for an entry that is equal to
C* the contents of the field named EMPNUM. If an equal entry is
C* found in TABEMP, indicator 09 is set on, and the TABEMP entry and
C* its related entry in TABPAY are made the current elements.
C
EMPNUM
LOOKUP
TABEMP
TABPAY
09
C* If indicator 09 is set on, the contents of the field named
C* HRSWKD are multiplied by the value of the current element of
C* TABPAY.
C
IF
*IN09
C
HRSWKD
MULT(H)
TABPAY
AMT
6 2
C
ENDIF

Specifying the Table Element Found in a LOOKUP Operation

Whenever a table name is used in an operation other than LOOKUP, the table name
actually refers to the data retrieved by the last successful search. Therefore, when the
table name is specified in this fashion, elements from a table can be used in
calculation operations.
If the table is used as factor 1 in a LOOKUP operation, the current element is used as
the search argument. In this way an element from a table can itself become a search
argument.
The table can also be used as the result field in operations other than the LOOKUP
operation. In this case the value of the current element is changed by the calculation
specification. In this way the contents of the table can be modified by calculation
operations (see Figure 78).
Figure 78. Specifying the Table Element Found in LOOKUP Operations

*...1....+....2....+....3....+....4....+....5....+....6....+....7...
CL0N01Factor1+++++++Opcode(E)+Factor2+++++++Result++++++++Len++D+HiLoEq..
C
ARGMNT
LOOKUP
TABLEA
20
C* If element is found multiply by 1.5
C* If the contents of the entire table before the MULT operation
C* were 1323.5, -7.8, and 113.4 and the value of ARGMNT is -7.8,
C* then the second element is the current element.
C* After the MULT operation, the entire table now has the
C* following value: 1323.5, -11.7, and 113.4.
C* Note that only the second element has changed since that was
C* the current element, set by the LOOKUP.
C
IF
*IN20
C
TABLEA
MULT
1.5
TABLEA
C
ENDIF

[ Top of Page | Previous Page | Next Page | Table of


Contents | Index ]

You might also like