
2009

Shahrood University of Technology Morteza Zahedi, PhD

ENGLISH
FOR COMPUTER AND IT ENGINEERS
These are the lecture notes for the winter semester of 2009.


Table of Contents
1. Computer
   1-1. History of computing
   1-2. Stored program architecture
   1-3. How computers work
      1-3-1. Control unit
      1-3-2. Arithmetic/logic unit (ALU)
      1-3-3. Memory
      1-3-4. Input/output (I/O)
   1-4. Programming languages
2. Computer hardware
   2-1. Motherboard
   2-2. Power supply
   2-3. Storage controllers
   2-4. Video display controller
   2-5. Removable media devices
   2-6. Internal storage
   2-7. Sound card
   2-8. Networking
   2-9. Other peripherals
3. Computer software
   3-1. Overview
   3-2. Relationship to computer hardware
   3-3. Types
   3-4. Program and library
   3-5. Three layers
   3-6. Operation
   3-7. Quality and reliability
   3-9. License
   3-10. Patents
   3-11. Ethics and rights for software users
4. Programming language


   4-1. Definitions
   4-2. Usage
   4-3. Elements
      4-3-1. Syntax
      4-3-2. Static semantics
      4-3-3. Type system
      4-3-4. Typed versus untyped languages
      4-3-5. Static versus dynamic typing
      4-3-6. Weak and strong typing
      4-3-7. Execution semantics
      4-3-8. Core library
   4-4. Practice
      4-4-1. Specification
      4-4-2. Implementation
   4-5. Taxonomies
5. Operating system
   5-1. Technology
      5-1-1. Program execution
      5-1-2. Interrupts
      5-1-3. Protected mode and supervisor mode
      5-1-4. Memory management
      5-1-5. Virtual memory
      5-1-6. Multitasking
      5-1-7. Disk access and file systems
      5-1-8. Device drivers
   5-2. Security
      5-2-1. Example: Microsoft Windows
      5-2-2. Example: Linux/Unix
   5-3. File system support in modern operating systems
      5-3-1. Linux and UNIX
      5-3-2. Microsoft Windows
      5-3-3. Mac OS X

      5-3-4. Special purpose file systems
      5-3-5. Journalized file systems
   5-4. Graphical user interfaces
   5-5. History
   5-6. Mainframes
6. Web engineering
   6-1. Web design
      6-2-1. History
      6-2-2. Web Site Design
      6-2-3. Issues
      6-2-4. Accessible Web design
      6-2-5. Website Planning
   6-3. Web page
      6-3-1. Color, typography, illustration and interaction
      6-3-2. Browsers
      6-3-3. Rendering
      6-3-4. Creating a web page
      6-3-5. Saving a web page
7. HTML
   7-1. HTML markup
      7-1-1. Elements
      7-1-2. Attributes
      7-1-3. Character and entity references
      7-1-4. Data types
   7-2. Semantic HTML
   7-3. Delivery of HTML
      7-4-1. HTTP
      7-3-2. HTML e-mail
      7-3-3. Naming conventions
   7-4. Dynamic HTML
   7-5. Cascading Style Sheets
      7-5-1. Syntax


      7-5-2. History
      7-5-3. Browser support
      7-5-4. Limitations
      7-5-5. Advantages
8. Web Scripting languages
   8-1. PHP
      8-1-1. History
      8-1-2. Usage
      8-1-3. Speed optimization
      8-1-4. Security
      8-1-5. Syntax
      8-1-6. Resources
   8-2. Active Server Pages
   8-3. JavaScript
      8-3-1. History and naming
      8-3-2. Features
      8-3-3. Use in web pages
      8-3-4. Uses outside web pages
      8-3-5. Debugging
      8-3-6. Related languages
   8-4. Ajax (programming)
      8-4-1. History
      8-4-2. Technologies
      8-4-3. Critique
9. Web 2.0
   9-1. Definition
   9-2. Characteristics
   9-3. Technology overview
   9-4. Associated innovations
   9-5. Web-based applications and desktops
      9-5-1. Internet applications
      9-5-2. XML and RSS

      9-5-3. Web APIs
   9-6. Economics
   9-7. Criticism
   9-8. Trademark
10. Semantic Web
   10-1. Purpose
   10-2. Relationship to the Hypertext Web
      10-2-1. Markup
      10-2-2. Descriptive and extensible
      10-2-3. Semantic vs. non-Semantic Web
   10-3. Relationship to Object Orientation
   10-4. Skeptical reactions
      10-4-1. Practical feasibility
      10-4-2. An unrealized idea
      10-4-3. Censorship and privacy
      10-4-4. Doubling output formats
      10-4-5. Need
   10-5. Components
11. Electronic learning
   11-1. Market
   11-2. Growth of e-learning
   11-3. Technology
   11-4. Services
   11-5. Goals of e-learning
      11-5-1. Computer-based learning
      11-5-2. Computer-based training
   11-6. Pedagogical elements
   11-7. Pedagogical approaches or perspectives
   11-8. Reusability, standards and learning objects
   11-9. Communication technologies used in e-learning
   11-10. E-Learning 2.0
12. Electronic commerce


   12-1. History
   12-2. Business applications
   12-3. Government regulations
   12-4. Forms
13. e-Government
   13-1. History of E-Government
   13-2. Development and implementation issues
   13-3. E-democracy
      13-3-1. Practical issues with e-democracy
      13-3-2. Internet as political medium
      13-3-3. Benefits and disadvantages
      13-3-4. Electronic direct democracy
   13-4. Electronic voting
14. Computer vision

   14-1. State of the art
   14-2. Related fields
   14-3. Applications for computer vision
   14-4. Typical tasks of computer vision
      14-4-1. Recognition
      14-4-2. Motion
      14-4-3. Scene reconstruction
      14-4-4. Image restoration
   14-5. Computer vision systems
15. Artificial intelligence
   15-1. History of AI research
   15-2. Philosophy of AI
   15-3. AI research
      15-3-1. Problems of AI
      15-3-2. Knowledge representation
      15-3-3. Planning
      15-3-4. Learning
      15-3-5. Natural language processing

      15-3-6. Cybernetics and brain simulation
      15-3-7. Traditional symbolic AI
      15-2-17. Search and optimization
      15-3-8. Logic
      15-3-9. Classifiers and statistical learning methods
      15-3-10. Neural networks
   15-4. Applications of artificial intelligence
16. Human-computer interaction
   16-1. Goals
   16-2. Differences with related fields
   16-3. Design principles
   16-4. Design methodologies
   16-5. Display design
   16-6. Future developments in HCI
   16-7. Some notes on terminology
   16-8. Human-computer interface
17. Machine translation
   17-1. History
   17-2. Translation process
   17-3. Approaches
      17-3-1. Rule-based
      17-3-2. Transfer-based machine translation
      17-3-3. Statistical
      17-3-4. Example-based
   17-4. Major issues
      17-4-1. Disambiguation
   17-5. Applications
   17-6. Evaluation
18. Speech recognition
   18-1. History
   18-2. Applications
      18-2-1. Health care


      18-2-2. Military
      18-2-3. Telephony and other domains
      18-2-4. Further applications
   18-3. Speech recognition systems
      18-3-1. Hidden Markov model based speech recognition
      18-3-2. Dynamic time warping based speech recognition
   18-4. Performance of speech recognition systems

1. Computer
A computer is a machine that manipulates data according to a list of instructions.

Figure 1-1: The NASA Columbia Supercomputer

The first devices that resemble modern computers date to the mid-20th century (around 1940 - 1945), although the computer concept and various machines similar to computers existed earlier. Early electronic computers were the size of a large room, consuming as much power as several hundred modern personal computers. Modern computers are based on tiny integrated circuits and are millions to billions of times more capable while occupying a fraction of the space. Today, simple computers may be made small enough to fit into a wristwatch and be powered from a watch battery.

Personal computers, in various forms, are icons of the Information Age and are what most people think of as "a computer"; however, the most common form of computer in use today is the embedded computer. Embedded computers are small, simple devices that are used to control other devices; for example, they may be found in machines ranging from fighter aircraft to industrial robots, digital cameras, and children's toys.

The ability to store and execute lists of instructions called programs makes computers extremely versatile and distinguishes them from calculators. The Church-Turing thesis is a mathematical statement of this versatility: any computer with a certain minimum capability is, in principle, capable of performing the same tasks that any other computer can perform. Therefore, computers with capability and complexity ranging from that of a personal digital assistant to a supercomputer are all able to perform the same computational tasks given enough time and storage capacity.




1-1. History of computing


It is difficult to identify any one device as the earliest computer, partly because the term "computer" has been subject to varying interpretations over time. Originally, the term "computer" referred to a person who performed numerical calculations (a human computer), often with the aid of a mechanical calculating device.

Figure 1-2: The Jacquard loom was one of the first programmable devices.

The history of the modern computer begins with two separate technologies - that of automated calculation and that of programmability. Examples of early mechanical calculating devices included the abacus, the slide rule and arguably the astrolabe and the Antikythera mechanism (which dates from about 150-100 BC). Hero of Alexandria (c. 10-70 AD) built a mechanical theater which performed a play lasting 10 minutes and was operated by a complex system of ropes and drums that might be considered to be a means of deciding which parts of the mechanism performed which actions and when. This is the essence of programmability.

The "castle clock", an astronomical clock invented by Al-Jazari in 1206, is considered to be the earliest programmable analog computer. [4] It displayed the zodiac, the solar and lunar orbits, a crescent moon-shaped pointer travelling across a gateway causing automatic doors to open every hour, and five robotic musicians who played music when struck by levers operated by a camshaft attached to a water wheel. The length of day and night could be re-programmed every day in order to account for the changing lengths of day and night throughout the year.


The end of the Middle Ages saw a re-invigoration of European mathematics and engineering, and Wilhelm Schickard's 1623 device was the first of a number of mechanical calculators constructed by European engineers. However, none of those devices fit the modern definition of a computer because they could not be programmed.

In 1801, Joseph Marie Jacquard made an improvement to the textile loom that used a series of punched paper cards as a template to allow his loom to weave intricate patterns automatically. The resulting Jacquard loom was an important step in the development of computers because the use of punched cards to define woven patterns can be viewed as an early, albeit limited, form of programmability.

It was the fusion of automatic calculation with programmability that produced the first recognizable computers. In 1837, Charles Babbage was the first to conceptualize and design a fully programmable mechanical computer that he called "The Analytical Engine". Due to limited finances, and an inability to resist tinkering with the design, Babbage never actually built his Analytical Engine.

Large-scale automated data processing of punched cards was performed for the U.S. Census in 1890 by tabulating machines designed by Herman Hollerith and manufactured by the Computing Tabulating Recording Corporation, which later became IBM. By the end of the 19th century a number of technologies that would later prove useful in the realization of practical computers had begun to appear: the punched card, Boolean algebra, the vacuum tube (thermionic valve) and the teleprinter.

During the first half of the 20th century, many scientific computing needs were met by increasingly sophisticated analog computers, which used a direct mechanical or electrical model of the problem as a basis for computation. However, these were not programmable and generally lacked the versatility and accuracy of modern digital computers.
Table: Characteristics of some early digital computers of the 1940s

Name | First operational | Numeral system | Computing mechanism | Programming | Turing complete
Zuse Z3 (Germany) | May 1941 | Binary | Electromechanical | Program-controlled by punched film stock | Yes
Atanasoff-Berry Computer (US) | Summer 1941 | Binary | Electronic | Not programmable - single purpose | No
Colossus (UK) | January 1944 | Binary | Electronic | Program-controlled by patch cables and switches | No
Harvard Mark I - IBM ASCC (US) | 1944 | Decimal | Electromechanical | Program-controlled by 24-channel punched paper tape (but no conditional branch) | No
ENIAC (US) | November 1945 | Decimal | Electronic | Program-controlled by patch cables and switches | Yes
Manchester Small-Scale Experimental Machine (UK) | June 1948 | Binary | Electronic | Stored-program in Williams cathode ray tube memory | Yes
Modified ENIAC (US) | September 1948 | Decimal | Electronic | Program-controlled by patch cables and switches, plus a primitive read-only stored programming mechanism using the Function Tables as program ROM | Yes
EDSAC (UK) | May 1949 | Binary | Electronic | Stored-program in mercury delay line memory | Yes
Manchester Mark I (UK) | October 1949 | Binary | Electronic | Stored-program in Williams cathode ray tube memory and magnetic drum memory | Yes
CSIRAC (Australia) | November 1949 | Binary | Electronic | Stored-program in mercury delay line memory | Yes


Figure 1-3: EDSAC was one of the first computers to implement the von Neumann architecture.

A succession of steadily more powerful and flexible computing devices were constructed in the 1930s and 1940s, gradually adding the key features that are seen in modern computers. Several developers of ENIAC, recognizing its flaws, came up with a far more flexible and elegant design, which came to be known as the "stored program architecture" or von Neumann architecture. This design was first formally described by John von Neumann in the paper First Draft of a Report on the EDVAC, distributed in 1945.

A number of projects to develop computers based on the stored-program architecture commenced around this time, the first of these being completed in Great Britain. The first to be demonstrated working was the Manchester Small-Scale Experimental Machine (SSEM or "Baby"), while the EDSAC, completed a year after SSEM, was the first practical implementation of the stored program design. Shortly thereafter, the machine originally described in von Neumann's paper, the EDVAC, was completed but did not see full-time use for an additional two years.

Nearly all modern computers implement some form of the stored-program architecture, making it the single trait by which the word "computer" is now defined. While the technologies used in computers have changed dramatically since the first electronic, general-purpose computers of the 1940s, most still use the von Neumann architecture.

Figure 1-4: Microprocessors are miniaturized devices that often implement stored program CPUs.

Computers that used vacuum tubes as their electronic elements were in use throughout the 1950s. Vacuum tube electronics were largely replaced in the 1960s by transistor-based electronics, which are smaller, faster, cheaper to produce, require less power, and are more reliable. In the 1970s, integrated circuit technology and the subsequent creation of microprocessors, such as the Intel 4004, further decreased size and cost and further increased speed and reliability of computers.



By the 1980s, computers became sufficiently small and cheap to replace simple mechanical controls in domestic appliances such as washing machines. The 1980s also witnessed home computers and the now ubiquitous personal computer. With the evolution of the Internet, personal computers are becoming as common as the television and the telephone in the household.

1-2. Stored program architecture


The defining feature of modern computers which distinguishes them from all other machines is that they can be programmed. That is to say that a list of instructions (the program) can be given to the computer and it will store them and carry them out at some time in the future.

In most cases, computer instructions are simple: add one number to another, move some data from one location to another, send a message to some external device, etc. These instructions are read from the computer's memory and are generally carried out (executed) in the order they were given. However, there are usually specialized instructions to tell the computer to jump ahead or backwards to some other place in the program and to carry on executing from there. These are called "jump" instructions (or branches). Furthermore, jump instructions may be made to happen conditionally so that different sequences of instructions may be used depending on the result of some previous calculation or some external event. Many computers directly support subroutines by providing a type of jump that "remembers" the location it jumped from and another instruction to return to the instruction following that jump instruction.

Program execution might be likened to reading a book. While a person will normally read each word and line in sequence, they may at times jump back to an earlier place in the text or skip sections that are not of interest. Similarly, a computer may sometimes go back and repeat the instructions in some section of the program over and over again until some internal condition is met. This is called the flow of control within the program and it is what allows the computer to perform tasks repeatedly without human intervention.

Comparatively, a person using a pocket calculator can perform a basic arithmetic operation such as adding two numbers with just a few button presses. But to add together all of the numbers from 1 to 1,000 would take thousands of button presses and a lot of time, with a near certainty of making a mistake.


On the other hand, a computer may be programmed to do this with just a few simple instructions. For example:
      mov #0,sum      ; set sum to 0
      mov #1,num      ; set num to 1
loop: add num,sum     ; add num to sum
      add #1,num      ; add 1 to num
      cmp num,#1000   ; compare num to 1000
      ble loop        ; if num <= 1000, go back to 'loop'
      halt            ; end of program. stop running

Once told to run this program, the computer will perform the repetitive addition task without further human intervention. It will almost never make a mistake and a modern PC can complete the task in about a millionth of a second. However, computers cannot "think" for themselves in the sense that they only solve problems in exactly the way they are programmed to. An intelligent human faced with the above addition task might soon realize that instead of actually adding up all the numbers one can simply use the equation

1 + 2 + 3 + ... + 1,000 = (1,000 x 1,001) / 2

and arrive at the correct answer (500,500) with little work. In other words, a computer programmed to add up the numbers one by one as in the example above would do exactly that without regard to efficiency or alternative solutions.
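For comparison, the same task can be expressed in a high-level language. The short C program below is only an illustrative sketch (it is not part of the original example); it computes the sum both with a loop, mirroring the assembly program above, and with the closed-form formula:

#include <stdio.h>

int main(void) {
    /* Loop approach: mirrors the assembly example above */
    long sum = 0;
    for (long num = 1; num <= 1000; num++) {
        sum = sum + num;                /* add num to sum */
    }

    /* Closed-form approach: n * (n + 1) / 2 */
    long n = 1000;
    long closed_form = n * (n + 1) / 2;

    printf("loop = %ld, formula = %ld\n", sum, closed_form);  /* both print 500500 */
    return 0;
}

Both approaches give 500,500; the loop performs a thousand additions, while the formula needs only one multiplication and one division.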

Figure 1-5: A 1970s punched card containing one line from a FORTRAN program.

In practical terms, a computer program may run from just a few instructions to many millions of instructions, as in a program for a word processor or a web browser. A typical modern computer can execute billions of instructions per second (gigahertz or GHz) and rarely makes a mistake over many years of operation.



Large computer programs comprising several million instructions may take teams of programmers years to write, so it is highly unlikely that the entire program has been written without error. Errors in computer programs are called "bugs". Bugs may be benign and not affect the usefulness of the program, or have only subtle effects. But in some cases they may cause the program to "hang" - become unresponsive to input such as mouse clicks or keystrokes, or to completely fail or "crash". Otherwise benign bugs may sometimes be harnessed for malicious intent by an unscrupulous user writing an "exploit" - code designed to take advantage of a bug and disrupt a program's proper execution. Bugs are usually not the fault of the computer. Since computers merely execute the instructions they are given, bugs are nearly always the result of programmer error or an oversight made in the program's design.

In most computers, individual instructions are stored as machine code with each instruction being given a unique number (its operation code or opcode for short). The command to add two numbers together would have one opcode, the command to multiply them would have a different opcode and so on. The simplest computers are able to perform any of a handful of different instructions; the more complex computers have several hundred to choose from, each with a unique numerical code. Since the computer's memory is able to store numbers, it can also store the instruction codes. This leads to the important fact that entire programs (which are just lists of instructions) can be represented as lists of numbers and can themselves be manipulated inside the computer just as if they were numeric data. The fundamental concept of storing programs in the computer's memory alongside the data they operate on is the crux of the von Neumann, or stored program, architecture. In some cases, a computer might store some or all of its program in memory that is kept separate from the data it operates on. This is called the Harvard architecture after the Harvard Mark I computer. Modern von Neumann computers display some traits of the Harvard architecture in their designs, such as in CPU caches.

While it is possible to write computer programs as long lists of numbers (machine language) and this technique was used with many early computers, it is extremely tedious to do so in practice, especially for complicated programs. Instead, each basic instruction can be given a short name that is indicative of its function and easy to remember (a mnemonic) such as ADD, SUB, MULT or JUMP. These mnemonics are collectively known as a computer's assembly language. Converting programs written in assembly language into something the computer can actually understand (machine language) is usually done by a computer program called an assembler.


Machine languages and the assembly languages that represent them (collectively termed low-level programming languages) tend to be unique to a particular type of computer. For instance, an ARM architecture computer (such as may be found in a PDA or a hand-held videogame) cannot understand the machine language of an Intel Pentium or the AMD Athlon 64 computer that might be in a PC.

Though considerably easier than in machine language, writing long programs in assembly language is often difficult and error prone. Therefore, most complicated programs are written in more abstract high-level programming languages that are able to express the needs of the computer programmer more conveniently (and thereby help reduce programmer error). High level languages are usually "compiled" into machine language (or sometimes into assembly language and then into machine language) using another computer program called a compiler. Since high level languages are more abstract than assembly language, it is possible to use different compilers to translate the same high level language program into the machine language of many different types of computer. This is part of the means by which software like video games may be made available for different computer architectures such as personal computers and various video game consoles.

The task of developing large software systems is an immense intellectual effort. Producing software with an acceptably high reliability on a predictable schedule and budget has proved historically to be a great challenge; the academic and professional discipline of software engineering concentrates specifically on this problem.
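To make the relationship between high-level languages, assembly language and machine code more concrete, here is a small sketch. The C function is our own example, and the assembly shown in the comment is merely the kind of output a compiler might produce (loosely ARM-flavoured mnemonics and register names), not the exact output of any particular compiler:

/* High-level language: what the programmer writes. */
int add_three(int x) {
    return x + 3;
}

/* Assembly language: roughly what a compiler might emit for add_three
 * (illustrative only):
 *
 *   add_three:
 *       ADD  r0, r0, #3    ; add the constant 3 to the argument held in r0
 *       BX   lr            ; return to the caller
 *
 * An assembler would then translate each mnemonic into its numeric opcode,
 * producing the machine code that the CPU actually executes.
 */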

1-3. How computers work

A general purpose computer has four main sections: the arithmetic and logic unit (ALU), the control unit, the memory, and the input and output devices (collectively termed I/O). These parts are interconnected by busses, often made of groups of wires. The control unit, ALU, registers, and basic I/O (and often other hardware closely linked with these) are collectively known as a central processing unit (CPU). Early CPUs were composed of many separate components but since the mid-1970s CPUs have typically been constructed on a single integrated circuit called a microprocessor.

1-3-1. Control unit



The control unit (often called a control system or central controller) directs the various components of a computer. It reads and interprets (decodes) instructions in the program one by one. The control system decodes each instruction and turns it into a series of control signals that operate the other parts of the computer. Control systems in advanced computers may change the order of some instructions so as to improve performance. A key component common to all CPUs is the program counter, a special memory cell (a register) that keeps track of which location in memory the next instruction is to be read from.

Figure 1-7: A MIPS architecture instruction is decoded by the control system.

The control system's function is as follows (note that this is a simplified description, and some of these steps may be performed concurrently or in a different order depending on the type of CPU):

1. Read the code for the next instruction from the cell indicated by the program counter.
2. Decode the numerical code for the instruction into a set of commands or signals for each of the other systems.
3. Increment the program counter so it points to the next instruction.
4. Read whatever data the instruction requires from cells in memory (or perhaps from an input device). The location of this required data is typically stored within the instruction code.
5. Provide the necessary data to an ALU or register.
6. If the instruction requires an ALU or specialized hardware to complete, instruct the hardware to perform the requested operation.
7. Write the result from the ALU back to a memory location or to a register or perhaps an output device.
8. Jump back to step (1).

Since the program counter is (conceptually) just another set of memory cells, it can be changed by calculations done in the ALU.


Adding 100 to the program counter would cause the next instruction to be read from a place 100 locations further down the program. Instructions that modify the program counter are often known as "jumps" and allow for loops (instructions that are repeated by the computer) and often conditional instruction execution (both examples of control flow).

It is noticeable that the sequence of operations that the control unit goes through to process an instruction is in itself like a short computer program - and indeed, in some more complex CPU designs, there is another yet smaller computer called a microsequencer that runs a microcode program that causes all of these events to happen.
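The fetch-decode-execute cycle described above can also be sketched as a small program. Everything in the example below is invented for illustration (the opcodes, the instruction encoding and the memory contents do not correspond to any real CPU); it simply shows a program counter driving the cycle, and how a jump is nothing more than overwriting that counter:

#include <stdio.h>

/* A toy stored-program machine. Each instruction is encoded as
 * opcode * 100 + address (an invented encoding, for illustration only). */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3, JUMP = 4 };

int main(void) {
    int memory[16] = {
        LOAD  * 100 + 10,   /* cell 0: load the value in cell 10 into the accumulator */
        ADD   * 100 + 11,   /* cell 1: add the value in cell 11 to the accumulator    */
        STORE * 100 + 12,   /* cell 2: store the accumulator into cell 12             */
        HALT,               /* cell 3: stop                                           */
        0, 0, 0, 0, 0, 0,   /* cells 4-9: unused                                      */
        40, 2, 0, 0, 0, 0   /* cells 10-15: data (40 and 2); cell 12 gets the result  */
    };
    int pc = 0;             /* the program counter */
    int accumulator = 0;

    for (;;) {
        int instruction = memory[pc];      /* 1. fetch the instruction the PC points at */
        int opcode  = instruction / 100;   /* 2. decode it into an opcode ...           */
        int address = instruction % 100;   /*    ... and an operand address             */
        pc = pc + 1;                       /* 3. increment the program counter          */
        switch (opcode) {                  /* 4-7. read data, execute, write back       */
            case LOAD:  accumulator = memory[address];  break;
            case ADD:   accumulator += memory[address]; break;
            case STORE: memory[address] = accumulator;  break;
            case JUMP:  pc = address;      break;       /* a jump just rewrites the PC  */
            case HALT:  printf("cell 12 holds %d\n", memory[12]);   /* prints 42 */
                        return 0;
        }
    }
}

Replacing the HALT in cell 3 with JUMP * 100 + 0 would send execution back to cell 0, which is exactly the kind of looping behaviour the text describes.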

1-3-2. Arithmetic/logic unit (ALU)


The ALU is capable of performing two classes of operations: arithmetic and logic. The set of arithmetic operations that a particular ALU supports may be limited to adding and subtracting, or might include multiplying or dividing, trigonometry functions (sine, cosine, etc.) and square roots. Some can only operate on whole numbers (integers) whilst others use floating point to represent real numbers, albeit with limited precision. However, any computer that is capable of performing just the simplest operations can be programmed to break down the more complex operations into simple steps that it can perform. Therefore, any computer can be programmed to perform any arithmetic operation, although it will take more time to do so if its ALU does not directly support the operation.

An ALU may also compare numbers and return boolean truth values (true or false) depending on whether one is equal to, greater than or less than the other ("is 64 greater than 65?"). Logic operations involve Boolean logic: AND, OR, XOR and NOT. These can be useful both for creating complicated conditional statements and processing boolean logic.

Superscalar computers contain multiple ALUs so that they can process several instructions at the same time. Graphics processors and computers with SIMD and MIMD features often provide ALUs that can perform arithmetic on vectors and matrices.
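As a deliberately naive sketch of that last point (our own illustration, not taken from the text), multiplication can be reduced to repeated addition when an ALU offers nothing more than an adder:

#include <stdio.h>

/* Multiply two non-negative integers using only addition,
 * the way software can compensate for a very simple ALU. */
unsigned multiply_by_addition(unsigned a, unsigned b) {
    unsigned product = 0;
    for (unsigned i = 0; i < b; i++) {
        product = product + a;    /* add a to the running total, b times */
    }
    return product;
}

int main(void) {
    printf("%u\n", multiply_by_addition(7, 6));   /* prints 42 */
    return 0;
}

The result is correct, but it costs b additions instead of a single multiply instruction, which is exactly the time penalty mentioned above.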

1-3-3. Memory



A computer's memory can be viewed as a list of cells into which numbers can be placed or read. Each cell has a numbered "address" and can store a single number. The computer can be instructed to "put the number 123 into the cell numbered 1357" or to "add the number that is in cell 1357 to the number that is in cell 2468 and put the answer into cell 1595". The information stored in memory may represent practically anything. Letters, numbers, even computer instructions can be placed into memory with equal ease. Since the CPU does not differentiate between different types of information, it is up to the software to give significance to what the memory sees as nothing but a series of numbers.
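The cell model in this paragraph can be sketched directly in code. The cell numbers (1357, 2468 and 1595) and the value 123 come from the examples in the text; the second stored value (877) is our own arbitrary choice:

#include <stdio.h>

int main(void) {
    static int memory[4096];    /* memory viewed as a numbered list of cells */

    memory[1357] = 123;         /* "put the number 123 into the cell numbered 1357" */
    memory[2468] = 877;         /* a second value, chosen arbitrarily for the example */

    /* "add the number that is in cell 1357 to the number that is in cell 2468
       and put the answer into cell 1595" */
    memory[1595] = memory[1357] + memory[2468];

    printf("cell 1595 holds %d\n", memory[1595]);   /* prints 1000 */
    return 0;
}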

Figure 1-8: Magnetic core memory was popular main memory for computers through the 1960s until it was completely replaced by semiconductor memory.

In almost all modern computers, each memory cell is set up to store binary numbers in groups of eight bits (called a byte). Each byte is able to represent 256 different numbers, either from 0 to 255 or from -128 to +127. To store larger numbers, several consecutive bytes may be used (typically, two, four or eight). When negative numbers are required, they are usually stored in two's complement notation. Other arrangements are possible, but are usually not seen outside of specialized applications or historical contexts. A computer can store any kind of information in memory as long as it can be somehow represented in numerical form. Modern computers have billions or even trillions of bytes of memory.

The CPU contains a special set of memory cells called registers that can be read and written to much more rapidly than the main memory area. There are typically between two and one hundred registers depending on the type of CPU. Registers are used for the most frequently needed data items to avoid having to access main memory every time data is needed. Since data is constantly being worked on, reducing the need to access main memory (which is often slow compared to the ALU and control units) greatly increases the computer's speed.


Computer main memory comes in two principal varieties: random access memory or RAM and read-only memory or ROM. RAM can be read and written to anytime the CPU commands it, but ROM is pre-loaded with data and software that never changes, so the CPU can only read from it. ROM is typically used to store the computer's initial start-up instructions. In general, the contents of RAM are erased when the power to the computer is turned off while ROM retains its data indefinitely. In a PC, the ROM contains a specialized program called the BIOS that orchestrates loading the computer's operating system from the hard disk drive into RAM whenever the computer is turned on or reset. In embedded computers, which frequently do not have disk drives, all of the software required to perform the task may be stored in ROM. Software that is stored in ROM is often called firmware because it is notionally more like hardware than software.

Flash memory blurs the distinction between ROM and RAM by retaining data when turned off but being rewritable like RAM. However, flash memory is typically much slower than conventional ROM and RAM so its use is restricted to applications where high speeds are not required.

In more sophisticated computers there may be one or more RAM cache memories which are slower than registers but faster than main memory. Generally computers with this sort of cache are designed to move frequently needed data into the cache automatically, often without the need for any intervention on the programmer's part.
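A short sketch of the byte conventions described earlier in this section (the specific values are our own examples; strictly speaking the signed conversion is implementation-defined in C, but virtually all modern machines use two's complement):

#include <stdio.h>

int main(void) {
    /* One byte holds 256 distinct patterns: 0 to 255 unsigned, or -128 to +127 signed. */
    unsigned char raw = 0xFF;                  /* the bit pattern 1111 1111 */
    signed char as_signed = (signed char)raw;
    printf("0xFF as unsigned: %u, as two's complement: %d\n",
           (unsigned)raw, (int)as_signed);     /* prints 255 and -1 */

    /* Numbers larger than 255 use several consecutive bytes, e.g. two bytes for 1,000. */
    unsigned char low  = 1000 % 256;           /* 232 */
    unsigned char high = 1000 / 256;           /* 3   */
    unsigned value = (unsigned)high * 256 + low;
    printf("bytes %u and %u together represent %u\n",
           (unsigned)high, (unsigned)low, value);   /* prints 3, 232 and 1000 */
    return 0;
}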

1-3-4. Input/output (I/O)


I/O is the means by which a computer receives information from the outside world and sends results back. Devices that provide input or output to the computer are called peripherals. On a typical personal computer, peripherals include input devices like the keyboard and mouse, and output devices such as the display and printer. Hard disk drives, floppy disk drives and optical disc drives serve as both input and output devices. Computer networking is another form of I/O.

Figure 1-9: Hard disks are common I/O devices used with computers.

Often, I/O devices are complex computers in their own right with their own CPU and memory.



A graphics processing unit might contain fifty or more tiny computers that perform the calculations necessary to display 3D graphics. Modern desktop computers contain many smaller computers that assist the main CPU in performing I/O.

1-4. Programming languages


Programming languages provide various ways of specifying programs for computers to run. Unlike natural languages, programming languages are designed to permit no ambiguity and to be concise. They are purely written languages and are often difficult to read aloud. They are generally either translated into machine language by a compiler or an assembler before being run, or translated directly at run time by an interpreter. Sometimes programs are executed by a hybrid method of the two techniques. There are thousands of different programming languages, some intended to be general purpose, others useful only for highly specialized applications.

Table 1-5: Programming Languages

Lists of programming languages: Timeline of programming languages, Categorical list of programming languages, Generational list of programming languages, Alphabetical list of programming languages, Non-English-based programming languages

Commonly used assembly languages: ARM, MIPS, x86

Commonly used high-level languages: BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal

Commonly used scripting languages: Bourne script, JavaScript, Python, Ruby, PHP, Perl



2. Computer hardware
A typical personal computer consists of a case or chassis in a tower shape (desktop) and the following parts:

Figure 2-1: Internals of a typical personal computer.

Figure 2-2: Inside a Custom Computer.

2-1. Motherboard
- Motherboard - the "body" or mainframe of the computer, through which all other components interface.
- Central processing unit (CPU) - performs most of the calculations which enable a computer to function; sometimes referred to as the "brain" of the computer.
- Computer fan - used to lower the temperature of the computer; a fan is almost always attached to the CPU, and the computer case will generally have several fans to maintain a constant airflow. Liquid cooling can also be used to cool a computer, though it focuses more on individual parts rather than the overall temperature inside the chassis.
- Random Access Memory (RAM) - also known as the physical memory of the computer: fast-access memory that is cleared when the computer is powered down. RAM attaches directly to the motherboard and is used to store programs that are currently running.
- Firmware - loaded from read-only memory (ROM) and run by the Basic Input-Output System (BIOS) or, in newer systems, an Extensible Firmware Interface (EFI) compliant system.
- Internal buses - connections to various internal components:
  o PCI
  o PCI-E
  o USB
  o HyperTransport
  o CSI (expected in 2008)
  o AGP (being phased out)
  o VLB (outdated)
- External bus controllers - used to connect to external peripherals, such as printers and input devices. These ports may also be based upon expansion cards, attached to the internal buses.

2-2. Power supply


Includes a case control and (usually) a cooling fan, and supplies power to run the rest of the computer. The most common types of power supplies used to be AT and Baby AT, but the standard for PCs today is ATX and Micro ATX.

2-3. Storage controllers


Controllers for hard disk, CD-ROM and other drives (such as internal Zip and Jaz drives); conventionally for a PC these are IDE/ATA. The controllers sit directly on the motherboard (on-board) or on expansion cards, such as a disk array controller. IDE is usually integrated, unlike SCSI (Small Computer System Interface), which can be found in some servers. The floppy drive interface is a legacy MFM interface which is now slowly disappearing. All these interfaces are gradually being phased out, to be replaced by SATA and SAS.



2-4. Video display controller


Produces the output for the visual display unit. This will either be built into the motherboard or attached in its own separate slot (PCI, PCI-E, PCI-E 2.0, or AGP), in the form of a Graphics Card.

2-5. Removable media devices


- CD (compact disc) - the most common type of removable media, inexpensive but has a short life-span.
  o CD-ROM Drive - a device used for reading data from a CD.
  o CD Writer - a device used for both reading and writing data to and from a CD.
- DVD (digital versatile disc) - a popular type of removable media that is the same dimensions as a CD but stores up to 6 times as much information. It is the most common way of transferring digital video.
  o DVD-ROM Drive - a device used for reading data from a DVD.
  o DVD Writer - a device used for both reading and writing data to and from a DVD.
  o DVD-RAM Drive - a device used for rapid writing and reading of data from a special type of DVD.
- Blu-ray - a high-density optical disc format for the storage of digital information, including high-definition video.
  o BD-ROM Drive - a device used for reading data from a Blu-ray disc.
  o BD Writer - a device used for both reading and writing data to and from a Blu-ray disc.
- HD DVD - a high-density optical disc format and successor to the standard DVD. It was a discontinued competitor to the Blu-ray format.
- Floppy disk - an outdated storage device consisting of a thin disk of a flexible magnetic storage medium.
- Zip drive - an outdated medium-capacity removable disk storage system, first introduced by Iomega in 1994.
- USB flash drive - a flash memory data storage device integrated with a USB interface, typically small, lightweight, removable, and rewritable.
- Tape drive - a device that reads and writes data on a magnetic tape, used for long-term storage.

2-6. Internal storage

Hardware that keeps data inside the computer for later use and remains persistent even when the computer has no power.
- Hard disk - for medium-term storage of data.
- Solid-state drive - a device similar to a hard disk, but containing no moving parts.
- Disk array controller - a device to manage several hard disks, to achieve performance or reliability improvement.

2-7. Sound card


Enables the computer to output sound to audio devices, as well as accept input from a microphone. Most modern computers have sound cards built into the motherboard, though it is common for a user to install a separate sound card as an upgrade.

2-8. Networking
Connects the computer to the Internet and/or other computers.
- Modem - for dial-up connections.
- Network card - for DSL/cable Internet, and/or connecting to other computers.
- Direct Cable Connection - use of a null modem, connecting two computers together using their serial ports, or a LapLink cable, connecting two computers together with their parallel ports.


2-9. Other peripherals


In addition, hardware devices can include external components of a computer system. The following are either standard or very common.

Figure 2-3: Wheel mouse

Includes various input and output devices, usually external to the computer system.

Input

- Text input devices
  o Keyboard - a device to input text and characters by depressing buttons (referred to as keys), similar to a typewriter. The most common English-language key layout is the QWERTY layout.
- Pointing devices
  o Mouse - a pointing device that detects two-dimensional motion relative to its supporting surface.
  o Trackball - a pointing device consisting of an exposed protruding ball housed in a socket that detects rotation about two axes.
  o Xbox 360 Controller - a controller used for the Xbox 360, which can be used as an additional pointing device with the left or right thumbstick with the use of the application Switchblade(tm).
- Gaming devices
  o Joystick - a general control device that consists of a handheld stick that pivots around one end, to detect angles in two or three dimensions.
  o Gamepad - a general handheld game controller that relies on the digits (especially thumbs) to provide input.
  o Game controller - a specific type of controller specialized for certain gaming purposes.
- Image and video input devices
  o Image scanner - a device that provides input by analyzing images, printed text, handwriting, or an object.
  o Webcam - a low-resolution video camera used to provide visual input that can be easily transferred over the internet.
- Audio input devices
  o Microphone - an acoustic sensor that provides input by converting sound into electrical signals.

Output
- Image and video output devices
  o Printer
  o Monitor
- Audio output devices
  o Speakers
  o Headset

3. Computer software
Computer software, or just software, is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system. The term includes application software such as word processors, which perform productive tasks for users; system software such as operating systems, which interface with hardware to provide the necessary services for application software; and middleware, which controls and co-ordinates distributed systems. Software includes websites, programs, video games, etc. that are coded in programming languages like C, C++, etc.

Figure 3-1: A screenshot of the OpenOffice.org Writer software

"Software" is sometimes used in a broader context to mean anything which is not hardware but which is used with hardware, such as film, tapes and records.

3-1. Overview
Computer software is usually regarded as anything but hardware, meaning that the "hard" parts are those that are tangible (able to be held), while the "soft" part comprises the intangible objects inside the computer. Software encompasses an extremely wide array of products and technologies developed using different techniques like programming languages, scripting languages, etc. The types of software include web pages developed with technologies like HTML, PHP, Perl, JSP, ASP.NET and XML, and desktop applications like Microsoft Word and OpenOffice developed with technologies like C, C++, Java, C#, etc. Software usually runs on an operating system (which is itself software) like Microsoft Windows, Linux (including GNOME and KDE) or Sun Solaris, so that it can operate as expected. Software also includes video games like Super Mario, Call of Duty, etc. for personal computers or video game consoles; these games can be created using CGI designed by applications like Maya, 3D Studio Max, etc. Also, each piece of software usually runs on a particular software platform: for instance, a Microsoft Windows application will not be able to run on Mac OS, because how the software is written differs between the systems. Such applications can be made to work on another platform by software porting, interpreters, or re-writing the source code for that platform.

3-2. Relationship to computer hardware


Computer software is so called to distinguish it from computer hardware, which encompasses the physical interconnections and devices required to store and execute (or run) the software. At the lowest level, software consists of a machine language specific to an individual processor. A machine language consists of groups of binary values signifying processor instructions which change the state of the computer from its preceding state. Software is an ordered sequence of instructions for changing the state of the computer hardware in a particular sequence.

Software is usually written in high-level programming languages that are easier and more efficient for humans to use (closer to natural language) than machine language. High-level languages are compiled or interpreted into machine language object code. Software may also be written in an assembly language, essentially a mnemonic representation of a machine language using a natural language alphabet. Assembly language must be assembled into object code via an assembler.

The term "software" was first used in this sense by John W. Tukey in 1958. In computer science and software engineering, computer software is all computer programs. The theory that is the basis for most modern software was first proposed by Alan Turing in his 1935 essay "Computable Numbers with an Application to the Entscheidungsproblem".

3-3. Types
Practical computer systems divide software systems into three major classes: system software, programming software and application software, although the distinction is arbitrary, and often blurred.

- System software helps run the computer hardware and computer system. It includes operating systems, device drivers, diagnostic tools, servers, windowing systems, utilities and more. The purpose of systems software is to insulate the applications programmer as much as possible from the details of the particular computer complex being used, especially memory and other hardware features, and such accessory devices as communications, printers, readers, displays, keyboards, etc.
- Programming software usually provides tools to assist a programmer in writing computer programs and software using different programming languages in a more convenient way. The tools include text editors, compilers, interpreters, linkers, debuggers, and so on. An Integrated Development Environment (IDE) merges those tools into a software bundle, and a programmer may not need to type multiple commands for compiling, interpreting, debugging, tracing, etc., because the IDE usually has an advanced graphical user interface, or GUI.
- Application software allows end users to accomplish one or more specific (non-computer-related) tasks. Typical applications include industrial automation, business software, educational software, medical software, databases, and computer games. Businesses are probably the biggest users of application software, but almost every field of human activity now uses some form of application software.

3-4. Program and library


A program may not be sufficiently complete for execution by a computer. In particular, it may require additional software from a software library in order to be complete. Such a library may include software components used by stand-alone programs, but which cannot work on their own. Thus, programs may include standard routines that are common to many programs, extracted from these libraries. Libraries may also include 'stand-alone' programs which are activated by some computer event and/or perform some function (e.g., of computer 'housekeeping') but do not return data to their calling program. Libraries may be called by one to many other programs; programs may call zero to many other programs.
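To make the idea concrete, here is a minimal sketch in Python (an illustration added here, not taken from the text); the function name hypotenuse is hypothetical. The program is only complete once the sqrt routine is supplied by the math module, a library that ships with the language.

import math

def hypotenuse(a, b):
    # sqrt is a standard routine supplied by a library, not written by this program
    return math.sqrt(a * a + b * b)

print(hypotenuse(3.0, 4.0))   # prints 5.0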

3-5. Three layers


Users often see things differently than programmers. People who use modern general purpose computers (as opposed to embedded systems, analog computers, supercomputers, etc.) usually see three layers of software performing a variety of tasks: platform, application, and user software.

Platform software
Platform includes the firmware, device drivers, an operating system, and typically a graphical user interface which, in total, allow a user to interact with the computer and its peripherals (associated equipment). Platform software often comes bundled with the computer. On a PC you will usually have the ability to change the platform software.

Application software
Application software or Applications are what most people think of when they think of software. Typical examples include office suites and video games. Application software is often purchased separately from computer hardware. Sometimes applications are bundled with the computer, but that does not change the fact that they run as independent applications. Applications are almost always independent programs from the operating system, though they are often tailored for specific platforms. Most users think of compilers, databases, and other "system software" as applications.

User-written software
End-user development tailors systems to meet users' specific needs. User software includes spreadsheet templates, word processor macros, scientific simulations, and scripts for graphics and animations. Even email filters are a kind of user software. Users create this software themselves and often overlook how important it is. Depending on how competently the user-written software has been integrated into purchased application packages, many users may not be aware of the distinction between the purchased packages and what has been added by fellow co-workers.

3-6. Operation
Computer software has to be "loaded" into the computer's storage (such as a hard drive, memory, or RAM). Once the software has loaded, the computer is able to execute the software. This involves passing instructions from the application software, through the system software, to the hardware which ultimately receives the instruction as machine code. Each instruction causes the computer to carry out an operation -- moving data, carrying out a computation, or altering the control flow of instructions.

Data movement is typically from one place in memory to another. Sometimes it involves moving data between memory and registers which enable high-speed data access in the CPU. Moving data, especially large amounts of it, can be costly. So, this is sometimes avoided by using "pointers" to data instead. Computations include simple operations such as incrementing the value of a variable data element. More complex computations may involve many operations and data elements together.

Instructions may be performed sequentially, conditionally, or iteratively. Sequential instructions are those operations that are performed one after another. Conditional instructions are performed such that different sets of instructions execute depending on the value(s) of some data. In some languages this is known as an "if" statement. Iterative instructions are performed repetitively and may depend on some data value. This is sometimes called a "loop." Often, one instruction may "call" another set of instructions that are defined in some other program or module. When more than one computer processor is used, instructions may be executed simultaneously.

A simple example of the way software operates is what happens when a user selects an entry such as "Copy" from a menu. In this case, a conditional instruction is executed to copy text from data in a 'document' area residing in memory, perhaps to an intermediate storage area known as a 'clipboard' data area. If a different menu entry such as "Paste" is chosen, the software may execute the instructions to copy the text from the clipboard data area to a specific location in the same or another document in memory.

Depending on the application, even the example above could become complicated. The field of software engineering endeavors to manage the complexity of how software operates. This is especially true for software that operates in the context of a large or powerful computer system. Currently, almost the only limitation on the use of computer software in applications is the ingenuity of the designer/programmer. Consequently, large areas of activities (such as playing grand master level chess) formerly assumed to be incapable of software simulation are now routinely programmed. The only area that has so far proved reasonably secure from software simulation is the realm of human art, especially pleasing music and literature.

Kinds of software by operation: computer program as executable, source code or script, configuration.
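A rough sketch of the Copy/Paste example above, written in Python (added for illustration): the document and clipboard variables and the handle_menu function are hypothetical stand-ins for whatever a real application would use, and the sketch only illustrates the conditional structure described.

# Hypothetical stand-ins for a real application's data areas
document = {"text": "Dear reader, ", "selection": "Dear reader, "}
clipboard = ""

def handle_menu(menu_choice):
    global clipboard
    if menu_choice == "Copy":
        # conditional instruction: copy the selected text into the clipboard area
        clipboard = document["selection"]
    elif menu_choice == "Paste":
        # copy the clipboard contents to a location in the document
        document["text"] = document["text"] + clipboard

handle_menu("Copy")
handle_menu("Paste")
print(document["text"])   # the selected text now appears twice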

3-7. Quality and reliability

Software reliability considers the errors, faults, and failures related to the design, implementation and operation of software.

See Software auditing, Software quality, Software testing, and Software reliability.

3-9. License
The software's license gives the user the right to use the software in the licensed environment. Some software comes with the license when purchased off the shelf, or an OEM license when bundled with hardware. Other software comes with a free software license, granting the recipient the rights to modify and redistribute the software. Software can also be in the form of freeware or shareware. See also License Management.

3-10. Patents
The issue of software patents is controversial. Some believe that they hinder software development, while others argue that software patents provide an important incentive to spur software innovation. See software patent debate.

3-11. Ethics and rights for software users


Because software is a relatively new part of society, the idea of what rights software users should have is not very developed. Some, such as the free software community, believe that software users should be free to modify and redistribute the software they use. They argue that these rights are necessary so that each individual can control their computer, and so that everyone can cooperate, if they choose, to work together as a community and control the direction in which software progresses. Others believe that software authors should have the power to say what rights the user will get.

4. Programming language
A programming language is an artificial language that can be used to write programs which control the behavior of a machine, particularly a computer. Programming languages are defined by syntactic and semantic rules which describe their structure and meaning respectively. Many programming languages have some form of written specification of their syntax and semantics; some are defined by an official implementation (for example, an ISO Standard), while others have a dominant implementation (such as Perl). Programming languages are also used to facilitate communication about the task of organizing and manipulating information, and to express algorithms precisely. Some authors restrict the term "programming language" to those languages that can express all possible algorithms; sometimes the term "computer language" is used for more limited artificial languages. Thousands of different programming languages have been created so far, and new languages are created every year.

4-1. Definitions
Traits often considered important for constituting a programming language:
- Function: A programming language is a language used to write computer programs, which involve a computer performing some kind of computation[4] or algorithm and possibly controlling external devices such as printers, robots, and so on.
- Target: Programming languages differ from natural languages in that natural languages are only used for interaction between people, while programming languages also allow humans to communicate instructions to machines. Some programming languages are used by one device to control another. For example, PostScript programs are frequently created by another program to control a computer printer or display.
- Constructs: Programming languages may contain constructs for defining and manipulating data structures or controlling the flow of execution.

- Expressive power: The theory of computation classifies languages by the computations they are capable of expressing. All Turing complete languages can implement the same set of algorithms. ANSI/ISO SQL and Charity are examples of languages that are not Turing complete yet often called programming languages.

Non-computational languages, such as markup languages like HTML or formal grammars like BNF, are usually not considered programming languages. A programming language (which may or may not be Turing complete) may be embedded in these non-computational (host) languages.

4-2. Usage
Programming languages differ from most other forms of human expression in that they require a greater degree of precision and completeness. When using a natural language to communicate with other people, human authors and speakers can be ambiguous and make small errors, and still expect their intent to be understood. However, figuratively speaking, computers "do exactly what they are told to do", and cannot "understand" what code the programmer intended to write. The combination of the language definition, a program, and the program's inputs must fully specify the external behavior that occurs when the program is executed, within the domain of control of that program.

Programs for a computer might be executed in a batch process without human interaction, or a user might type commands in an interactive session of an interpreter. In this case the "commands" are simply programs, whose execution is chained together. When a language is used to give commands to a software application (such as a shell) it's called a scripting language.

Many languages have been designed from scratch, altered to meet new needs, combined with other languages, and eventually fallen into disuse. Although there have been attempts to design one "universal" computer language that serves all purposes, all of them have failed to be generally accepted as filling this role. The need for diverse computer languages arises from the diversity of contexts in which languages are used:
- Programs range from tiny scripts written by individual hobbyists to huge systems written by hundreds of programmers.
- Programmers range in expertise from novices who need simplicity above all else, to experts who may be comfortable with considerable complexity.

- Programs must balance speed, size, and simplicity on systems ranging from microcontrollers to supercomputers.
- Programs may be written once and not change for generations, or they may undergo nearly constant modification.
- Finally, programmers may simply differ in their tastes: they may be accustomed to discussing problems and expressing them in a particular language.

One common trend in the development of programming languages has been to add more ability to solve problems using a higher level of abstraction. The earliest programming languages were tied very closely to the underlying hardware of the computer. As new programming languages have developed, features have been added that let programmers express ideas that are more remote from simple translation into underlying hardware instructions. Because programmers are less tied to the complexity of the computer, their programs can do more computing with less effort from the programmer. This lets them write more functionality per time unit.

Natural language processors have been proposed as a way to eliminate the need for a specialized language for programming. However, this goal remains distant and its benefits are open to debate. Edsger Dijkstra took the position that the use of a formal language is essential to prevent the introduction of meaningless constructs, and dismissed natural language programming as "foolish." Alan Perlis was similarly dismissive of the idea.

4-3. Elements
4-3-1. Syntax
A programming language's surface form is known as its syntax. Most programming languages are purely textual; they use sequences of text including words, numbers, and punctuation, much like written natural languages. On the other hand, there are some programming languages which are more graphical in nature, using spatial relationships between symbols to specify a program.

Figure 4-1: Parse tree of Python code with inset tokenization

The syntax of a language describes the possible combinations of symbols that form a syntactically correct program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Since most languages are textual, this article discusses textual syntax. Programming language syntax is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure). Below is a simple grammar, based on Lisp:
expression ::= atom | list
atom       ::= number | symbol
number     ::= [+-]?['0'-'9']+
symbol     ::= ['A'-'Z''a'-'z'].*
list       ::= '(' expression* ')'

This grammar specifies the following:


- an expression is either an atom or a list;
- an atom is either a number or a symbol;
- a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;

- a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
- a list is a matched pair of parentheses, with zero or more expressions inside it.
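As an illustration (added here, not part of the original text), the following Python sketch recognizes strings against the toy grammar just given. The token pattern mirrors the regular expressions for numbers and symbols, except that parentheses are excluded from symbols so that a closing ')' still ends a list; all names are hypothetical.

import re

# '(' or ')' | number | symbol (a letter, then non-space, non-parenthesis characters)
TOKEN = re.compile(r"\s*([()]|[+-]?[0-9]+|[A-Za-z][^\s()]*)")

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError("bad character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expression(tokens):
    token = tokens.pop(0)
    if token == "(":                          # list ::= '(' expression* ')'
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse_expression(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)                         # consume ')'
        return items
    if re.fullmatch(r"[+-]?[0-9]+", token):   # atom ::= number
        return int(token)
    return token                              # atom ::= symbol

for source in ["12345", "()", "(a b c232 (1))"]:
    print(source, "->", parse_expression(tokenize(source)))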

The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'.

Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules; and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.

Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence, or the sentence may be false:
- "Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning.
- "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.

The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because p is a null pointer, the operations p->real and p->im have no meaning):
complex *p = NULL;
complex abs_p = sqrt(p->real * p->real + p->im * p->im);

The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.

4-3-2. Static semantics


The static semantics defines restrictions on the structure of valid texts that are hard or impossible to express in standard syntactic formalisms.[13] The most important of these restrictions are covered by type systems.

4-3-3. Type system


A type system defines how a programming language classifies values and expressions into types, how it can manipulate those types and how they interact. This generally includes a description of the data structures that can be constructed in the language. The design and study of type systems using formal mathematics is known as type theory. Internally, all data in modern digital computers are stored simply as zeros or ones (binary).

4-3-4. Typed versus untyped languages


A language is typed if the specification of every operation defines types of data to which the operation is applicable, with the implication that it is not applicable to other types. For example, "this text between the quotes" is a string. In most programming languages, dividing a number by a string has no meaning. Most modern programming languages will therefore reject any program attempting to perform such an operation. In some languages, the meaningless operation will be detected when the program is compiled ("static" type checking), and rejected by the compiler, while in others, it will be detected when the program is run ("dynamic" type checking), resulting in a runtime exception.

A special case of typed languages are the single-type languages. These are often scripting or markup languages, such as Rexx or SGML, and have only one data type, most commonly character strings, which are used for both symbolic and numeric data. In contrast, an untyped language, such as most assembly languages, allows any operation to be performed on any data, which are generally considered to be sequences of bits of various lengths. High-level languages which are untyped include BCPL and some varieties of Forth.

In practice, while few languages are considered typed from the point of view of type theory (verifying or rejecting all operations), most modern languages offer a degree of typing. Many production languages provide means to bypass or subvert the type system.
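A small Python illustration of this point (an addition, not from the original text): Python is a typed language with dynamic checking, so the meaningless operation below is rejected only when the line actually runs; a statically checked language would reject a comparable program before it starts.

try:
    result = "this text between the quotes" / 2   # dividing a string by a number
except TypeError as error:
    print("rejected at run time:", error)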

4-3-5. Static versus dynamic typing


In static typing all expressions have their types determined prior to the program being run (typically at compile-time). For example, 1 and (2+2) are integer expressions; they cannot be passed to a function that expects a string, or stored in a variable that is defined to hold dates.

Statically-typed languages can be manifestly typed or type-inferred. In the first case, the programmer must explicitly write types at certain textual positions (for example, at variable declarations). In the second case, the compiler infers the types of expressions and declarations based on context. Most mainstream statically-typed languages, such as C++, C# and Java, are manifestly typed. Complete type inference has traditionally been associated with less mainstream languages, such as Haskell and ML. However, many manifestly typed languages support partial type inference; for example, Java and C# both infer types in certain limited cases.

Dynamic typing, also called latent typing, determines the type-safety of operations at runtime; in other words, types are associated with runtime values rather than textual expressions. As with type-inferred languages, dynamically typed languages do not require the programmer to write explicit type annotations on expressions. Among other things, this may permit a single variable to refer to values of different types at different points in the program execution. However, type errors cannot be automatically detected until a piece of code is actually executed, making debugging more difficult. Ruby, Lisp, JavaScript, and Python are dynamically typed.
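For example, the following Python sketch (hypothetical names, added for illustration) shows dynamic typing in action: the same parameter holds an integer on one call and a string on the next, and the type error surfaces only when the offending call is executed.

def increment(value):
    # meaningful only when value is a number; no type is declared anywhere
    return value + 1

print(increment(41))                      # works, prints 42
try:
    print(increment("forty-one"))         # the error appears only at this call
except TypeError as error:
    print("detected at run time:", error)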

4-3-6. Weak and strong typing


Weak typing allows a value of one type to be treated as another, for example treating a string as a number. This can occasionally be useful, but it can also allow some kinds of program faults to go undetected at compile time and even at run time.

Strong typing prevents the above. An attempt to perform an operation on the wrong type of value raises an error. Strongly-typed languages are often termed type-safe or safe. An alternative definition for "weakly typed" refers to languages, such as Perl, JavaScript, and C++, which permit a large number of implicit type conversions. In JavaScript, for example, the expression 2 * x implicitly converts x to a number, and this conversion succeeds even if x is null, undefined, an Array, or a string of letters. Such implicit conversions are often useful, but they can mask programming errors.
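By contrast, a comparatively strongly typed language such as Python performs no such silent conversion between strings and numbers; a sketch (added for illustration, variable names hypothetical):

x = "3"                      # a string, as in the JavaScript example above
try:
    print(2 * x + 1)         # 2 * "3" gives the string "33"; "33" + 1 then fails
except TypeError as error:
    print("no implicit conversion:", error)

print(2 * int(x) + 1)        # 7 -- the conversion has to be written explicitly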

Strong and static are now generally considered orthogonal concepts, but usage in the literature differs. Some use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed. Thus C has been called both strongly typed and weakly, statically typed.

4-3-7. Execution semantics


Once data has been specified, the machine must be instructed to perform operations on the data. The execution semantics of a language defines how and when the various constructs of a language should produce a program behavior. For example, the semantics may define the strategy by which expressions are evaluated to values, or the manner in which control structures conditionally execute statements.

4-3-8. Core library


Most programming languages have an associated core library (sometimes known as the 'standard library', especially if it is included as part of the published language standard), which is conventionally made available by all implementations of the language. Core libraries typically include definitions for commonly used algorithms, data structures, and mechanisms for input and output.

A language's core library is often treated as part of the language by its users, although the designers may have treated it as a separate entity. Many language specifications define a core that must be made available in all implementations, and in the case of standardized languages this core library may be required. The line between a language and its core library therefore differs from language to language. Indeed, some languages are designed so that the meanings of certain syntactic constructs cannot even be described without referring to the core library. For example, in Java, a string literal is defined as an instance of the java.lang.String class; similarly, in Smalltalk, an anonymous function expression (a "block") constructs an instance of the library's BlockContext class. Conversely, Scheme contains multiple coherent subsets that suffice to construct the rest of the language as library macros, and so the language designers do not even bother to say which portions of the language must be implemented as language constructs, and which must be implemented as parts of a library.
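A brief Python illustration (added here): the routines below come from the language's standard library rather than from its syntax, yet most users treat them as simply part of the language.

import json                       # data interchange routines from the standard library
from collections import Counter   # a commonly used data structure

settings = json.loads('{"theme": "dark", "font_size": 12}')
words = Counter("the quick brown fox jumps over the lazy dog".split())

print(settings["theme"])          # dark
print(words.most_common(1))       # [('the', 2)]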

4-4. Practice

A language's designers and users must construct a number of artifacts that govern and enable the practice of programming. The most important of these artifacts are the language specification and implementation.

4-4-1. Specification
The specification of a programming language is intended to provide a definition that the language users and the implementors can use to determine whether the behavior of a program is correct, given its source code. A programming language specification can take several forms, including the following:
- An explicit definition of the syntax, static semantics, and execution semantics of the language. While syntax is commonly specified using a formal grammar, semantic definitions may be written in natural language (e.g., the C language), or a formal semantics (e.g., the Standard ML[18] and Scheme[19] specifications).
- A description of the behavior of a translator for the language (e.g., the C++ and Fortran specifications). The syntax and semantics of the language have to be inferred from this description, which may be written in natural or a formal language.
- A reference or model implementation, sometimes written in the language being specified (e.g., Prolog or ANSI REXX[20]). The syntax and semantics of the language are explicit in the behavior of the reference implementation.

4-4-2. Implementation
For more details on this topic, see Programming language implementation.

An implementation of a programming language provides a way to execute programs written in that language on one or more configurations of hardware and software. There are, broadly, two approaches to programming language implementation: compilation and interpretation. It is generally possible to implement a language using either technique.

The output of a compiler may be executed by hardware or by a program called an interpreter. In some implementations that make use of the interpreter approach there is no distinct boundary between compiling and interpreting. For instance, some implementations of the BASIC programming language compile and then execute the source a line at a time. Programs that are executed directly on the hardware usually run several orders of magnitude faster than those that are interpreted in software.

One technique for improving the performance of interpreted programs is just-in-time compilation. Here the virtual machine, just before execution, translates the blocks of bytecode which are going to be used to machine code, for direct execution on the hardware.
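CPython, the reference implementation of Python, is one example of this mixed approach: source code is first compiled to bytecode, which a virtual machine then interprets. A small sketch (added for illustration) makes the two stages visible using the standard compile built-in and the dis module.

import dis

source = "total = sum(i * i for i in range(5))\nprint(total)"

code = compile(source, "<example>", "exec")   # stage 1: translate the source to bytecode
dis.dis(code)                                 # inspect the compiled instructions
exec(code)                                    # stage 2: the virtual machine interprets the bytecode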

4-5. Taxonomies
There is no overarching classification scheme for programming languages. A given programming language does not usually have a single ancestor language. Languages commonly arise by combining the elements of several predecessor languages with new ideas in circulation at the time. Ideas that originate in one language will diffuse throughout a family of related languages, and then leap suddenly across familial gaps to appear in an entirely different family.

The task is further complicated by the fact that languages can be classified along multiple axes. For example, Java is both an object-oriented language (because it encourages object-oriented organization) and a concurrent language (because it contains built-in constructs for running multiple threads in parallel). Python is an object-oriented scripting language.

In broad strokes, programming languages divide into programming paradigms and a classification by intended domain of use. Paradigms include procedural programming, object-oriented programming, functional programming, and logic programming; some languages are hybrids of paradigms or multi-paradigmatic. An assembly language is not so much a paradigm as a direct model of an underlying machine architecture. By purpose, programming languages might be considered general purpose, system programming languages, scripting languages, domain specific languages, or concurrent/distributed languages (or a combination of these). Some general purpose languages were designed largely with educational goals.

A programming language may also be classified by factors unrelated to programming paradigm. For instance, most programming languages use English language keywords, while a minority do not. Other languages may be classified as being esoteric or not.

5. Operating system
An operating system (commonly abbreviated OS or O/S) is the software component of a computer system that is responsible for the management and coordination of activities and the sharing of the resources of the computer. The operating system acts as a host for applications that are run on the machine. As a host, one of the purposes of an operating system is to handle the details of the operation of the hardware. This relieves application programs from having to manage these details and makes it easier to write applications. Almost all computers, including handheld computers, desktop computers, supercomputers, and even video game consoles, use an operating system of some type. Some of the oldest models may however use an embedded operating system that may be contained on a compact disk or other data storage device.

Figure 5-1: A layer structure showing where the Operating System is located on generally used software systems on desktops

Operating systems offer a number of services to application programs and users. Applications access these services through application programming interfaces (APIs) or system calls. By invoking these interfaces, the application can request a service from the operating system, pass parameters, and receive the results of the operation. Users may also interact with the operating system through some kind of software user interface (UI), such as typing commands using a command line interface (CLI) or using a graphical user interface (GUI, commonly pronounced "gooey").

For hand-held and desktop computers, the user interface is generally considered part of the operating system. On large multi-user systems like Unix and Unix-like systems, the user interface is generally implemented as an application program that runs outside the operating system. (Whether the user interface should be included as part of the operating system is a point of contention.)

Common contemporary operating systems include Microsoft Windows, Mac OS, Linux and Solaris. Microsoft Windows has a significant majority of market share in the desktop and notebook computer markets, while servers generally run on Linux or other Unix-like systems. Embedded device markets are split amongst several operating systems.
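As a small illustration of a program requesting services through an API (added here, not from the text), the Python calls below are thin wrappers around operating system services; on Unix-like systems each corresponds closely to a system call. The file name example.txt is hypothetical.

import os

print("process id:", os.getpid())          # ask the kernel which process this is
print("working directory:", os.getcwd())

# request that the OS create and open a file, write to it, then close it
fd = os.open("example.txt", os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"written through operating system calls\n")
os.close(fd)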

5-1. Technology
An operating system is a collection of technologies which are designed to allow the computer to perform certain functions. These technologies may or may not be present in every operating system, and there are often differences in how they are implemented. However, as stated above, most modern operating systems are derived from common design ancestors, and are therefore basically similar.

5-1-1. Program execution


Executing a program involves the creation of a process by the operating system. The kernel creates a process by setting aside or allocating some memory, loading program code from a disk or another part of memory into the newly allocated space, and starting it running.
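A minimal sketch of asking the operating system to create a process, using Python's subprocess module (an illustration added here; the child program is hypothetical):

import subprocess
import sys

# The kernel allocates memory, loads the program code, and starts it running.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
)
print(result.stdout, end="")
print("child exit status:", result.returncode)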

5-1-2. Interrupts
Interrupts are central to operating systems as they allow the operating system to deal with the unexpected activities of running programs and the world outside the computer. Interrupt-based programming is one of the most basic forms of time-sharing, being directly supported by most CPUs. Interrupts provide a computer with a way of automatically running specific code in response to events. Even very basic computers support hardware interrupts, and allow the programmer to specify code which may be run when that event takes place.

When an interrupt is received, the computer's hardware automatically suspends whatever program is currently running by pushing the current state on a stack, and its registers and program counter are also saved. This is analogous to placing a bookmark in a book when someone is interrupted by a phone call. This task requires no operating system as such, but only that the interrupt be configured at an earlier time. In modern operating systems, interrupts are handled by the operating system's kernel. Interrupts may come from either the computer's hardware or from the running program.

When a hardware device triggers an interrupt, the operating system's kernel decides how to deal with this event, generally by running some processing code, or ignoring it. The processing of hardware interrupts is a task that is usually delegated to software called device drivers, which may be either part of the operating system's kernel, part of another program, or both. Device drivers may then relay information to a running program by various means.

A program may also trigger an interrupt to the operating system, which is very similar in function. If a program wishes to access hardware, for example, it may interrupt the operating system's kernel, which causes control to be passed back to the kernel. The kernel may then process the request, which may contain instructions to be passed onto hardware, or to a device driver. When a program wishes to allocate more memory, launch or communicate with another program, or signal that it no longer needs the CPU, it does so through interrupts.
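A loose user-level analogy (added for illustration) is a signal handler: the sketch below registers Python code that runs when the process receives SIGINT, the interrupt normally delivered when Ctrl+C is pressed. Real hardware interrupts are handled inside the kernel, not by application code like this.

import signal
import time

def on_interrupt(signum, frame):
    # runs automatically when the interrupt (signal) is delivered to this process
    print("received signal", signum, "- handler ran, program continues")

signal.signal(signal.SIGINT, on_interrupt)

print("press Ctrl+C within five seconds to trigger the handler...")
time.sleep(5)
print("done")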

5-1-3. Protected mode and supervisor mode


Modern CPUs support something called dual mode operation. CPUs with this capability use two modes: protected mode and supervisor mode, which allow certain CPU functions to be controlled and affected only by the operating system kernel. Here, protected mode does not refer specifically to the 80286 (Intel's x86 16-bit microprocessor) CPU feature, although its protected mode is very similar to it. CPUs might have other modes similar to 80286 protected mode as well, such as the virtual 8086 mode of the 80386 (Intel's x86 32-bit microprocessor or i386).

However, the term is used here more generally in operating system theory to refer to all modes which limit the capabilities of programs running in that mode, providing things like virtual memory addressing and limiting access to hardware in a manner determined by a program running in supervisor mode. Similar modes have existed in supercomputers, minicomputers, and mainframes as they are essential to fully supporting UNIX-like multi-user operating systems.

When a computer first starts up, it is automatically running in supervisor mode. The first few programs to run on the computer, being the BIOS, bootloader and the operating system, have unlimited access to hardware. However, when the operating system passes control to another program, it can place the CPU into protected mode.

In protected mode, programs may have access to a more limited set of the CPU's instructions. A user program may leave protected mode only by triggering an interrupt, causing control to be passed back to the kernel. In this way the operating system can maintain exclusive control over things like access to hardware and memory.

The term "protected mode resource" generally refers to one or more CPU registers, which contain information that the running program isn't allowed to alter. Attempts to alter these resources generally cause a switch to supervisor mode.

5-1-4. Memory management

Among other things, a multiprogramming operating system kernel must be responsible for managing all system memory which is currently in use by programs. This ensures that a program does not interfere with memory already used by another program. Since programs time share, each program must have independent access to memory.

Cooperative memory management, used by many early operating systems, assumes that all programs make voluntary use of the kernel's memory manager, and do not exceed their allocated memory. This system of memory management is almost never seen anymore, since programs often contain bugs which can cause them to exceed their allocated memory. If a program fails, it may cause memory used by one or more other programs to be affected or overwritten. Malicious programs or viruses may purposefully alter another program's memory or may affect the operation of the operating system itself. With cooperative memory management it takes only one misbehaved program to crash the system.

Memory protection enables the kernel to limit a process' access to the computer's memory. Various methods of memory protection exist, including memory segmentation and paging. All methods require some level of hardware support (such as the 80286 MMU) which doesn't exist in all computers.

In both segmentation and paging, certain protected mode registers specify to the CPU what memory address it should allow a running program to access. Attempts to access other addresses will trigger an interrupt which will cause the CPU to re-enter supervisor mode, placing the kernel in charge. This is called a segmentation violation, or Seg-V for short, and since it is usually a sign of a misbehaving program, the kernel will generally kill the offending program and report the error.

Windows 3.1-Me had some level of memory protection, but programs could easily circumvent the need to use it. Under Windows 9x all MS-DOS applications ran in supervisor mode, giving them almost unlimited control over the computer. A general protection fault would be produced, indicating a segmentation violation had occurred; however, the system would often crash anyway.

5-1-5. Virtual memory


The use of virtual memory addressing (such as paging or segmentation) means that the kernel can choose which memory each program may use at any given time, allowing the operating system to use the same memory locations for multiple tasks.

If a program tries to access memory that isn't in its current range of accessible memory, but nonetheless has been allocated to it, the kernel will be interrupted in the same way as it would if the program were to exceed its allocated memory. (See the section on memory management.) Under UNIX this kind of interrupt is referred to as a page fault.

When the kernel detects a page fault it will generally adjust the virtual memory range of the program which triggered it, granting it access to the memory requested. This gives the kernel discretionary power over where a particular application's memory is stored, or even whether or not it has actually been allocated yet.

In modern operating systems, application memory which is accessed less frequently can be temporarily stored on disk or other media to make that space available for use by other programs. This is called swapping, as an area of memory can be used by multiple programs, and what that memory area contains can be swapped or exchanged on demand.

5-1-6. Multitasking

Multitasking refers to the running of multiple independent computer programs on the same computer, giving the appearance that it is performing the tasks at the same time. Since most computers can do at most one or two things at one time, this is generally done via time sharing, which means that each program uses a share of the computer's time to execute.

An operating system kernel contains a piece of software called a scheduler which determines how much time each program will spend executing, and in which order execution control should be passed to programs. Control is passed to a process by the kernel, which allows the program access to the CPU and memory. At a later time control is returned to the kernel through some mechanism, so that another program may be allowed to use the CPU. This so-called passing of control between the kernel and applications is called a context switch.

An early model which governed the allocation of time to programs was called cooperative multitasking. In this model, when control is passed to a program by the kernel, it may execute for as long as it wants before explicitly returning control to the kernel. This means that a malfunctioning program may prevent any other programs from using the CPU.

The philosophy governing preemptive multitasking is that of ensuring that all programs are given regular time on the CPU. This implies that all programs must be limited in how much time they are allowed to spend on the CPU without being interrupted. To accomplish this, modern operating system kernels make use of a timed interrupt. A protected mode timer is set by the kernel which triggers a return to supervisor mode after the specified time has elapsed. (See the above sections on Interrupts and Dual Mode Operation.)

On many single user operating systems cooperative multitasking is perfectly adequate, as home computers generally run a small number of well tested programs. Windows NT was the first version of Microsoft Windows which enforced preemptive multitasking, but it didn't reach the home user market until Windows XP (since Windows NT was targeted at professionals).
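As a rough user-level illustration of time sharing (added here; the real scheduling described above happens inside the kernel), the Python sketch below runs two threads whose output interleaves because each periodically gives up the processor:

import threading
import time

def worker(name, steps):
    for step in range(steps):
        print(name, "step", step)
        time.sleep(0.01)      # give up the CPU so the other thread gets a turn

a = threading.Thread(target=worker, args=("task-A", 3))
b = threading.Thread(target=worker, args=("task-B", 3))
a.start()
b.start()
a.join()
b.join()
print("both tasks finished")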

5-1-7. Disk access and file systems

Access to files stored on disks is a central feature of all operating systems. Computers store data on disks using files, which are structured in specific ways in order to allow for faster access, higher reliability, and to make better use out of the drive's available space. The specific way files are stored on a disk is called a file system, and enables files to have names and attributes. It also allows them to be stored in a hierarchy of directories or folders arranged in a directory tree.

Early operating systems generally supported a single type of disk drive and only one kind of file system. Early file systems were limited in their capacity, speed, and in the kinds of file names and directory structures they could use. These limitations often reflected limitations in the operating systems they were designed for, making it very difficult for an operating system to support more than one file system.

While many simpler operating systems support a limited range of options for accessing storage systems, more modern operating systems like UNIX and Linux support a technology known as a virtual file system or VFS. A modern operating system like UNIX supports a wide array of storage devices, regardless of their design or file systems, allowing them to be accessed through a common application programming interface (API). This makes it unnecessary for programs to have any knowledge about the device they are accessing. A VFS allows the operating system to provide programs with access to an unlimited number of devices, with an infinite variety of file systems installed on them, through the use of specific device drivers and file system drivers.

A connected storage device such as a hard drive is accessed through a device driver. The device driver understands the specific language of the drive and is able to translate that language into a standard language used by the operating system to access all disk drives. On UNIX this is the language of block devices.

When the kernel has an appropriate device driver in place, it can then access the contents of the disk drive in raw format, which may contain one or more file systems. A file system driver is used to translate the commands used to access each specific file system into a standard set of commands that the operating system can use to talk to all file systems. Programs can then deal with these file systems on the basis of filenames and directories/folders, contained within a hierarchical structure. They can create, delete, open, and close files, as well as gather various information about them, including access permissions, size, free space, and creation and modification dates.

Various differences between file systems make supporting all file systems difficult. Allowed characters in file names, case sensitivity, and the presence of various kinds of file attributes make the implementation of a single interface for every file system a daunting task. Operating systems tend to recommend the use of (and so support natively) file systems specifically designed for them; for example, NTFS in Windows, and ext3 and ReiserFS in Linux. However, in practice, third-party drivers are usually available to give support for the most widely used file systems in most general-purpose operating systems (for example, NTFS is available in Linux through NTFS-3g, and ext2/3 and ReiserFS are available in Windows through FS-driver and rfstool).
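A short Python sketch (added here, file names hypothetical) of what this abstraction buys a program: the same calls work whether the underlying volume is NTFS, ext3, or any other supported file system, because the VFS and file system drivers hide the differences.

from pathlib import Path

target = Path("notes") / "todo.txt"
target.parent.mkdir(parents=True, exist_ok=True)   # create a directory in the hierarchy
target.write_text("buy more RAM\n")                # create and write a file

info = target.stat()                               # gather size and modification time
print("size in bytes:", info.st_size)
print("last modified:", info.st_mtime)
print("contents:", target.read_text(), end="")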

5-1-8. Device drivers

A device driver is a specific type of computer software developed to allow interaction with hardware devices. Typically this constitutes an interface for communicating with the device, through the specific computer bus or communications subsystem that the hardware is connected to, providing commands to and/or receiving data from the device, and, on the other end, the requisite interfaces to the operating system and software applications. It is a specialized, hardware-dependent computer program, which is also operating system specific, that enables another program -- typically an operating system, an applications software package, or a computer program running under the operating system kernel -- to interact transparently with a hardware device, and it usually provides the requisite interrupt handling necessary for any asynchronous time-dependent hardware interfacing needs.

The key design goal of device drivers is abstraction. Every model of hardware (even within the same class of device) is different. Newer models are also released by manufacturers that provide more reliable or better performance, and these newer models are often controlled differently. Computers and their operating systems cannot be expected to know how to control every device, both now and in the future. To solve this problem, OSes essentially dictate how every type of device should be controlled. The function of the device driver is then to translate these OS-mandated function calls into device-specific calls. In theory a new device, which is controlled in a new manner, should function correctly if a suitable driver is available. This new driver will ensure that the device appears to operate as usual from the operating system's point of view.

5-2. Security
A computer being secure depends on a number of technologies working properly. A modern operating system provides access to a number of resources, which are available to software running on the system, and to external devices like networks via the kernel. The operating system must be capable of distinguishing between requests which should be allowed to be processed, and others which sho uld not be processed. While some systems may simply distinguish between "privileged" and "non-

55

56

privileged", systems commonly have a form of requester identity, such as a user name. To establish identity there may be a process of authentication. Often a username must be quoted, and each username may have a password. Other methods of authentication, such as magnetic cards or biometric data, might be used instead. In some cases, especially connections from the network, resources may be accessed with no authentication at all (such as reading files over a network share). In addition to the allow/disallow model of security, a system with a high level of security will also offer auditing options. These would allow tracking of requests for access to resources (such as, "who has been reading this file?"). Internal security, or security from an already running program is only possible if all possibly harmful requests must be carried out through interrupts to the operating system kernel. If programs can directly access hardware and resources, they cannot be secured. External security involves a request from outside the computer, such as a login at a connected console or some kind of network connection. External requests are often passed through device drivers to the operating system's kernel, where they can be passed onto applications, or carried out directly. Security of operating systems has long been a concern because of highly sensitive data held on computers, both of a commercial and military nature. The United States Government Department of Defense (DoD) created the Trusted Computer System Evaluation Criteria (TCSEC) which is a standard that sets basic requirements for assessing the effectiveness of security. This became of vital importance to operating system makers, because the TCSEC was used to evaluate, classify and select computer systems being considered for the processing, storage and retrieval of sensitive or classified information. Network services include offerings such as file sharing, print services, email, web sites, and file transfer protocols (FTP), most of which can have compromised security. At the front line of security are hardware devices known as firewalls or intrusion detection/prevention systems. At the operating system level, there are a number of software firewalls available, as well as intrusion detection/prevention systems. Most modern operating systems include a software firewall, which is enabled by default. A software firewall can be configured to allow or deny network traffic to or from a service or application running on the operating system. Therefore, one can install and be running an insecure service, such as Telnet or FTP, and not have to be threatened by a security breach because the firewall would deny all traffic trying to connect to the service on that port. An alternative strategy, and the only sandbox strategy available in systems that do not meet the Popek and Goldberg virtualization requirements, is the operating

57

English for Computer and IT Engineers

system not running user programs as native code, but instead emulating a processor or providing a host for a p-code based system such as Java. Internal security is especially relevant for multi-user systems; it allows each user of the system to have private files that the other users cannot tamper with or read. Internal security is also vital if auditing is to be of any use, since a program can potentially bypass the operating system, and with it the auditing.

5-2-1. Example: Microsoft Windows

While the Windows 9x series offered the option of having profiles for multiple users, they had no concept of access privileges and did not allow concurrent access, and so were not true multi-user operating systems. In addition, they implemented only partial memory protection. They were accordingly widely criticised for lack of security.

The Windows NT series of operating systems, by contrast, are true multi-user systems and implement absolute memory protection. However, many of the advantages of being a true multi-user operating system were nullified by the fact that, prior to Windows Vista, the first user account created during the setup process was an administrator account, which was also the default for new accounts. Though Windows XP did have limited accounts, the majority of home users did not change to an account type with fewer rights -- partially due to the number of programs which unnecessarily required administrator rights -- and so most home users ran as administrator all the time.

Windows Vista changes this by introducing a privilege elevation system called User Account Control (UAC). When logging in as a standard user, a logon session is created and a token containing only the most basic privileges is assigned. In this way, the new logon session is incapable of making changes that would affect the entire system. When logging in as a user in the Administrators group, two separate tokens are assigned. The first token contains all privileges typically awarded to an administrator, and the second is a restricted token similar to what a standard user would receive. User applications, including the Windows Shell, are then started with the restricted token, resulting in a reduced-privilege environment even under an Administrator account. When an application requests higher privileges or "Run as administrator" is clicked, UAC will prompt for confirmation and, if consent is given (including administrator credentials if the account requesting the elevation is not a member of the administrators group), start the process using the unrestricted token.[4]


5-2-2. Example: Linux/Unix


Linux and UNIX both have two-tier security, which limits any system-wide changes to the root user, a special user account on all UNIX-like systems. While the root user has virtually unlimited permission to make system changes, programs running as a regular user are limited in where they can save files, what hardware they can access, and so on. In many systems, a user's memory usage, their selection of available programs, their total disk usage or quota, the available range of programs' priority settings, and other functions can also be locked down. This provides the user with plenty of freedom to do what needs to be done, without being able to put any part of the system in jeopardy (barring accidental triggering of system-level bugs) or make sweeping, system-wide changes. The user's settings are stored in an area of the computer's file system called the user's home directory, which is also provided as a location where the user may store their work, similar to My Documents on a Windows system.

Should a user have to install software or make system-wide changes, they must become the root user temporarily, usually with the su command, which is answered with the computer's root password when prompted. Some systems (such as Ubuntu and its derivatives) are configured by default to allow select users to run programs as the root user via the sudo command, using the user's own password for authentication instead of the system's root password. One is sometimes said to "go root" when elevating oneself to root access. For more information on the differences between the Linux su/sudo approach and Vista's User Account Control, see Comparison of privilege authorization features.

5-3. File system support in modern operating systems


Support for file systems is highly varied among modern operating systems, although there are several common file systems for which almost all operating systems include support and drivers.

5-3-1. Linux and UNIX


Many Linux distributions support some or all of ext2, ext3, ReiserFS, Reiser4, JFS, XFS, GFS, GFS2, OCFS, OCFS2, and NILFS. The ext file systems, namely ext2 and ext3, are based on the original Linux file system. Others have been developed by companies to meet their specific needs, by hobbyists, or have been adapted from UNIX,
Microsoft Windows, and other operating systems. Linux has full support for XFS and JFS, along with FAT (the MS-DOS file system) and HFS, which is the primary file system for the Macintosh. In recent years support for Microsoft Windows NT's NTFS file system has appeared in Linux, and is now comparable to the support available for other native UNIX file systems. ISO 9660 and UDF, the standard file systems used on CDs, DVDs, and Blu-ray discs, are also supported. It is possible to install Linux onto the majority of these file systems. Unlike other operating systems, Linux and UNIX allow any file system to be used regardless of the media it is stored on, whether it is a hard drive, a CD or DVD, or even a file contained within another file system.

5-3-2. Microsoft Windows

Microsoft Windows presently supports the NTFS and FAT file systems, along with network file systems shared from other computers, and the ISO 9660 and UDF file systems used for CDs, DVDs, and other optical discs such as Blu-ray. Under Windows, each file system is usually limited in application to certain media; for example, CDs must use ISO 9660 or UDF, and, as of Windows Vista, NTFS is the only file system which the operating system can be installed on. Details of its design are not known. Windows Embedded CE 6.0 introduced ExFAT, a file system more suitable for flash drives.

5-3-3. Mac OS X

Mac OS X supports HFS+ with journaling as its primary file system, derived from the Hierarchical File System of the earlier Mac OS. Mac OS X has facilities to read and write FAT, NTFS, UDF, and other file systems, but cannot be installed onto them. Due to its UNIX heritage, Mac OS X now supports virtually all the file systems supported by the UNIX VFS. Recently Apple Inc. started work on porting Sun Microsystems' ZFS file system to Mac OS X, and preliminary support is already available in Mac OS X 10.5.


5-3-4. Special purpose file systems


FAT file systems are commonly found on floppy discs, flash memory cards, digital cameras, and many other portable devices because of their relative simplicity. Performance of FAT compares poorly to most other file systems as it uses overly simplistic data structures, making file operations time-consuming, and makes poor use of disk space in situations where many small files are present. ISO 9660 and Universal Disk Format are two common formats that target Compact Discs and DVDs. Mount Rainier is a newer extension to UDF supported by Linux 2.6 kernels and Windows Vista that facilitates rewriting to DVDs in the same fashion as has been possible with floppy disks.

5-3-5. Journaling file systems


File systems may provide journaling, which provides safe recovery in the event of a system crash. A journaled file system writes some information twice: first to the journal, which is a log of file system operations, then to its proper place in the ordinary file system. Journaling is handled by the file system driver, and keeps track of each operation taking place that changes the contents of the disk. In the event of a crash, the system can recover to a consistent state by replaying a portion of the journal. Many UNIX file systems provide journaling, including ReiserFS, JFS, and Ext3. In contrast, non-journaled file systems typically need to be examined in their entirety by a utility such as fsck or chkdsk for any inconsistencies after an unclean shutdown. Soft updates is an alternative to journaling that avoids the redundant writes by carefully ordering the update operations. Log-structured file systems and ZFS also differ from traditional journaled file systems in that they avoid inconsistencies by always writing new copies of the data, eschewing in-place updates.

5-4. Graphical user interfaces


Most modern computer systems support graphical user interfaces (GUIs), and often include them. In some computer systems, such as the original implementations of Microsoft Windows and the Mac OS, the GUI is integrated into the kernel. While technically a graphical user interface is not an operating system service, incorporating support for one into the operating system kernel can allow the GUI to be more responsive by reducing the number of context switches required for the
GUI to perform its output functions. Other operating systems are modular, separating the graphics subsystem from the kernel. In the 1980s, UNIX, VMS and many other operating systems were built this way, and Linux and Mac OS X are also built this way. Modern releases of Microsoft Windows, such as Windows Vista, implement a graphics subsystem that is mostly in user space, whereas in the versions between Windows NT 4.0 and Windows Server 2003 the graphics drawing routines exist mostly in kernel space. Windows 9x had very little distinction between the interface and the kernel.

Many computer operating systems allow the user to install or create any user interface they desire. The X Window System in conjunction with GNOME or KDE is a commonly found setup on most Unix and Unix-like (BSD, Linux, Minix) systems. A number of Windows shell replacements have been released for Microsoft Windows, which offer alternatives to the included Windows shell, but the shell itself cannot be separated from Windows.

Numerous Unix-based GUIs have existed over time, most derived from X11. Competition among the various vendors of Unix (HP, IBM, Sun) led to much fragmentation, and an effort in the 1990s to standardize on COSE and CDE failed for the most part, eventually eclipsed by the widespread adoption of GNOME and KDE. Prior to open-source toolkits and desktop environments, Motif was the prevalent toolkit/desktop combination (and was the basis upon which CDE was developed).

Graphical user interfaces evolve over time. For example, Windows has modified its user interface almost every time a new major version of Windows is released, and the Mac OS GUI changed dramatically with the introduction of Mac OS X in 2001.

5-5. History

The first computers did not have operating systems. By the early 1960s, commercial computer vendors were supplying quite extensive tools for streamlining the development, scheduling, and execution of jobs on batch processing systems. Examples were produced by UNIVAC and Control Data Corporation, amongst others. The operating systems originally deployed on mainframes, and, much later, the original microcomputer operating systems, only supported one program at a time, requiring only a very basic scheduler. Each program was in complete control of the
machine while it was running. Multitasking (timesharing) first came to mainframes in the 1960s. In 1969-70, UNIX first appeared on the PDP-7 and later the PDP-11. It soon became capable of providing cross-platform time sharing using preemptive multitasking, advanced memory management, memory protection, and a host of other advanced features. UNIX soon gained popularity as an operating system for mainframes and minicomputers alike.

MS-DOS provided many operating-system-like features, such as disk access. However, many DOS programs bypassed it entirely and ran directly on hardware. IBM's version, PC-DOS, ran on IBM microcomputers, including the IBM PC and the IBM PC XT, and MS-DOS came into widespread use on clones of these machines. IBM PC compatibles could also run Microsoft Xenix, a UNIX-like operating system from the early 1980s. Xenix was heavily marketed by Microsoft as a multi-user alternative to its single-user MS-DOS operating system. The CPUs of these personal computers could not facilitate kernel memory protection or provide dual-mode operation, so Microsoft Xenix relied on cooperative multitasking and had no protected memory. The 80286-based IBM PC AT was the first computer technically capable of using dual-mode operation and providing memory protection.

Classic Mac OS and Microsoft Windows 1.0-3.11 supported only cooperative multitasking (Windows 95, 98 and Me supported preemptive multitasking only when running 32-bit applications, but ran legacy 16-bit applications using cooperative multitasking), and were very limited in their ability to take advantage of protected memory. Application programs running on these operating systems must yield CPU time to the scheduler when they are not using it, either by default, or by calling a function.

Windows NT's underlying operating system kernel was designed by essentially the same team as Digital Equipment Corporation's VMS; it provided protected-mode operation for all user programs, kernel memory protection, preemptive multi-tasking, virtual file system support, and a host of other features.


Classic AmigaOS and Windows 1.0-Me did not properly track resources allocated by processes at runtime. If a process had to be terminated, the resources might not be freed up for new programs until the machine was restarted. The AmigaOS did have preemptive multitasking.

5-6. Mainframes

Through the 1960s, many major features were pioneered in the field of operating systems. The development of the IBM System/360 produced a family of mainframe computers available in widely differing capacities and price points, for which a single operating system, OS/360, was planned (rather than developing ad-hoc programs for every individual model). This concept of a single OS spanning an entire product line was crucial for the success of System/360, and, in fact, IBM's current mainframe operating systems are distant descendants of this original system; applications written for OS/360 can still be run on modern machines. In the mid-1970s, MVS, the descendant of OS/360, offered the first implementation of using RAM as a transparent cache for disk-resident data.

OS/360 also pioneered a number of concepts that, in some cases, are still not seen outside of the mainframe arena. For instance, in OS/360, when a program is started, the operating system keeps track of all of the system resources that are used, including storage, locks, data files, and so on. When the process is terminated for any reason, all of these resources are re-claimed by the operating system. An alternative system, CP-67, started a whole line of operating systems focused on the concept of virtual machines.

Control Data Corporation developed the SCOPE operating system in the 1960s, for batch processing. In cooperation with the University of Minnesota, the KRONOS and later the NOS operating systems were developed during the 1970s, which supported simultaneous batch and timesharing use. Like many commercial timesharing systems, its interface was an extension of the Dartmouth BASIC operating systems, one of the pioneering efforts in timesharing and programming languages. In the late 1970s, Control Data and the University of Illinois developed the PLATO operating system, which used plasma panel displays and long-distance time sharing networks. PLATO was remarkably innovative for its time, featuring real-time chat and multi-user graphical games.

Burroughs Corporation introduced the B5000 in 1961 with the MCP (Master Control Program) operating system. The B5000 was a stack machine designed to
exclusively support high-level languages, with no machine language or assembler; indeed, the MCP was the first OS to be written exclusively in a high-level language, ESPOL, a dialect of ALGOL. MCP also introduced many other groundbreaking innovations, such as being the first commercial implementation of virtual memory. MCP is still in use today in the Unisys ClearPath/MCP line of computers.

UNIVAC, the first commercial computer manufacturer, produced a series of EXEC operating systems. Like all early mainframe systems, this was a batch-oriented system that managed magnetic drums, disks, card readers and line printers. In the 1970s, UNIVAC produced the Real-Time Basic (RTB) system to support large-scale time sharing, also patterned after the Dartmouth BASIC system.

General Electric and MIT developed the General Electric Comprehensive Operating Supervisor (GECOS), which introduced the concept of ringed security privilege levels. After acquisition by Honeywell it was renamed the General Comprehensive Operating System (GCOS).

Digital Equipment Corporation developed many operating systems for its various computer lines, including the TOPS-10 and TOPS-20 time sharing systems for the 36-bit PDP-10 class systems. Prior to the widespread use of UNIX, TOPS-10 was a particularly popular system in universities and in the early ARPANET community.

In the late 1960s through the late 1970s, several hardware capabilities evolved that allowed similar or ported software to run on more than one system. Early systems had utilized microprogramming to implement features on their systems in order to permit different underlying architectures to appear to be the same as others in a series. In fact, most 360s after the 360/40 (except the 360/165 and 360/168) were microprogrammed implementations. But soon other means of achieving application compatibility were proven to be more significant.

The enormous investment in software for these systems, made since the 1960s, caused most of the original computer manufacturers to continue to develop compatible operating systems along with the hardware. The notable supported mainframe operating systems include:
• Burroughs MCP -- B5000, 1961 to Unisys ClearPath/MCP, present.
• IBM OS/360 -- IBM System/360, 1966 to IBM z/OS, present.
• IBM CP-67 -- IBM System/360, 1967 to IBM z/VM, present.
• UNIVAC EXEC 8 -- UNIVAC 1108, 1964, to Unisys ClearPath IX, present.


6. Web engineering
The World Wide Web has become a major delivery platform for a variety of complex and sophisticated enterprise applications in several domains. In addition to their inherent multifaceted functionality, these Web applications exhibit complex behavior and place some unique demands on their usability, performance, security and ability to grow and evolve. However, a vast majority of these applications continue to be developed in an ad-hoc way, contributing to problems of usability, maintainability, quality and reliability. While Web development can benefit from established practices from other related disciplines, it has certain distinguishing characteristics that demand special consideration. In recent years, there have been some developments towards addressing these problems and requirements.

As an emerging discipline, Web engineering actively promotes systematic, disciplined and quantifiable approaches towards the successful development of high-quality, ubiquitously usable Web-based systems and applications. In particular, Web engineering focuses on the methodologies, techniques and tools that are the foundation of Web application development and which support their design, development, evolution, and evaluation. Web application development has certain characteristics that make it different from traditional software, information system, or computer application development.

Web engineering is multidisciplinary and encompasses contributions from diverse areas: systems analysis and design, software engineering, hypermedia/hypertext engineering, requirements engineering, human-computer interaction, user interface design, information engineering, information indexing and retrieval, testing, modelling and simulation, project management, and graphic design and presentation. Web engineering is neither a clone nor a subset of software engineering, although both involve programming and software development. While Web engineering uses software engineering principles, it encompasses new approaches, methodologies, tools, techniques, and guidelines to meet the unique requirements of Web-based applications.


6-1. Web design


Web page design is a process of conceptualization, planning, modeling, and execution of electronic media content delivery via the Internet in the form of technologies (such as markup languages) suitable for interpretation and display by a web browser or other web-based graphical user interfaces (GUIs).

Figure 6-1: An example of a web page that uses CSS layouts

The intent of web design is to create a web site (a collection of electronic files residing on one or more web servers) that presents content (including interactive features or interfaces) to the end user in the form of web pages once requested. Such elements as text, forms, and bit-mapped images (GIFs, JPEGs, PNGs) can be placed on the page using HTML, XHTML, or XML tags. Displaying more complex media (vector graphics, animations, videos, sounds) usually requires plug-ins such as Flash, QuickTime, the Java run-time environment, etc. Plug-ins are also embedded into web pages by using HTML or XHTML tags. Improvements in the various browsers' compliance with W3C standards prompted a widespread acceptance of XHTML and XML in conjunction with Cascading Style Sheets (CSS) to position and manipulate web page elements.
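As a brief illustration (the file names and form target below are invented for this example), a fragment of a page might place text, a bit-mapped image and a simple form using plain (X)HTML tags:

<div id="welcome">
  <h1>Welcome</h1>
  <p>This paragraph is ordinary text placed with an HTML tag.</p>
  <!-- photo.jpg stands for any bit-mapped image stored on the server -->
  <img src="photo.jpg" alt="A sample photograph" />
  <!-- a minimal form; /search is a hypothetical server-side script -->
  <form action="/search" method="get">
    <input type="text" name="q" />
    <input type="submit" value="Search" />
  </form>
</div>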


The latest standards and proposals aim at giving the various browsers the ability to deliver a wide variety of media and accessibility options to the client, possibly without employing plug-ins. Typically, web pages are classified as static or dynamic:
• Static pages don't change content and layout with every request unless a human (web master or programmer) manually updates the page.
• Dynamic pages adapt their content and/or appearance depending on the end-user's input or interaction, or on changes in the computing environment (user, time, database modifications, etc.). Content can be changed on the client side (the end-user's computer) by using client-side scripting languages (JavaScript, JScript, ActionScript, media players and PDF reader plug-ins, etc.) to alter DOM elements (DHTML), as sketched below. Dynamic content is often compiled on the server using server-side scripting languages (PHP, ASP, Perl, ColdFusion, JSP, Python, etc.). Both approaches are usually used in complex applications.
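A minimal sketch of the client-side (DHTML) approach mentioned above; the element id is invented for the example, and the script simply rewrites the paragraph's text in the browser, without contacting the server:

<p id="greeting">Hello, visitor!</p>
<script type="text/javascript">
  // alter a DOM element on the client side (DHTML)
  document.getElementById("greeting").innerHTML =
      "You loaded this page at " + new Date();
</script>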

With growing specialization within communication design and information technology fields, there is a strong tendency to draw a clear line between web design specifically for web pages and web development for the overall logistics of all web-based services.

6-2-1. History

Tim Berners-Lee published what is considered to be the first website in August 1991. Berners-Lee was the first to combine Internet communication (which had been carrying email and the Usenet for decades) with hypertext (which had also been around for decades, but limited to browsing information stored on a single computer, such as interactive CD-ROM design). Websites are written in a markup language called HTML, and early versions of HTML were very basic, only giving websites basic structure (headings and paragraphs) and the ability to link using hypertext. This was new and different from existing forms of communication: users could easily navigate to other pages by following hyperlinks from page to page.

As the Web and web design progressed, the markup language changed to become more complex and flexible, giving the ability to add objects like images and tables to a page. Features like tables, which were originally intended to be used to display tabular information, were soon subverted for use as invisible layout devices. With
the advent of Cascading Style Sheets (CSS), table-based layout is increasingly regarded as outdated. Database integration technologies such as server-side scripting, and design standards such as those of the W3C, further changed and enhanced the way the Web is made. As times change, websites are changing the code on the inside and the visual design on the outside with ever-evolving programs and utilities.

With the progression of the Web, tens of thousands of web design companies have been established around the world to serve the growing demand for such work. As with much of the information technology industry, many web design companies have been established in technology parks in the developing world, and many Western design companies have set up offices in countries such as India, Romania, and Russia to take advantage of the relatively lower labor rates found in such countries.

6-2-2. Web Site Design


A Web site is a collection of information about a particular topic or subject. Designing a web site is defined as the arrangement and creation of the web pages that in turn make up the web site. Each web page holds the information for which the web site is developed; a web site might be compared to a book, where each page of the book is a web page. There are many aspects (design concerns) in this process, and due to the rapid development of the Internet, new aspects may emerge. For non-commercial web sites, the goals may vary depending on the desired exposure and response. For typical commercial web sites, the basic aspects of design are:
• The content: the substance and information on the site should be relevant to the site and should target the area of the public that the website is concerned with.
• The usability: the site should be user-friendly, with the interface and navigation simple and reliable.
• The appearance: the graphics and text should include a single style that flows throughout, to show consistency. The style should be professional, appealing and relevant.
• The visibility: the site must also be easy to find via most, if not all, major search engines and advertisement media.

A web site typically consists of text and images. The first page of a web site is known as the Home page or Index. Some web sites use what is commonly called a Splash Page. Splash pages might include a welcome message, language or region
Web engineering 70 selection, or disclaimer. Each web page within a web site is an HTML file which has its own URL. After each web page is created, they are typically linked together using a navigation menu composed of hyperlinks. Faster browsing speeds have led to shorter attention spans and more demanding online visitors and this has resulted in less use of Splash Pages, particularly where commercial web sites are concerned. Once a web site is completed, it must be published or uploaded in order to be viewable to the public over the internet. This may be done using an FTP client. Once published, the web master may use a variety of techniques to increase the traffic, or hits, that the web site receives. This may include submitting the web site to a search engine such as Google or Yahoo, exchanging links with other web sites, creating affiliations with similar web sites, etc.

6-2-3. Issues
As in any collaborative design effort, there are conflicts between the differing goals and methods of web site design. These are a few of the ongoing issues.

Lack of collaboration in design


In the early stages of the web, there wasn't as much collaboration between web design and larger advertising campaigns, customer transactions, social networking, intranets and extranets as there is now. Web pages were mainly static online brochures disconnected from the larger projects, and many web pages are still disconnected from larger projects. Special design considerations are necessary for use within these larger projects. These design considerations are often overlooked, especially where there is a lack of leadership, a lack of understanding of why and of the technical knowledge of how to integrate, or a lack of concern for the larger project, all of which hinder collaboration. This often results in unhealthy competition or compromise between departments, and less than optimal use of web pages.

Liquid versus fixed layouts


On the web the designer has no control over several factors, including the size of the browser window, the web browser used, the input devices used (mouse, touch screen, voice command, text, cell phone number pad, etc.) and the size and characteristics of available fonts.


Some designers choose to control the appearance of the elements on the screen by using specific width designations. This control may be achieved through the use of an HTML table-based design or a more semantic div-based design through the use of CSS. Whenever the text, images, and layout of a design do not change as the browser changes, this is referred to as a fixed-width design. Proponents of fixed-width design prefer precise control over the layout of a site and the precise placement of objects on the page.

Other designers choose a liquid design. A liquid design is one where the design flows content into the whole screen, or a portion of the screen, no matter what the size of the browser window. Proponents of liquid design prefer greater compatibility and using the screen space available. Liquid design can be achieved through the use of CSS, by avoiding styling the page altogether, or by using HTML tables (or more semantic divs) set to a percentage of the page. Both liquid and fixed design developers must make decisions about how the design should degrade on higher and lower screen resolutions. Sometimes the pragmatic choice is made to flow the design between a minimum and a maximum width. This allows the designer to avoid coding for the browser choices making up The Long Tail, while still using all available screen space. Depending on the purpose of the content, a web designer may decide to use either fixed or liquid layouts on a case-by-case basis.

Similar to liquid layout is the optional "fit to window" feature with Adobe Flash content. This is a fixed layout that optimally scales the content of the page without changing the arrangement or text wrapping when the browser is resized.
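A minimal sketch of the two approaches using inline CSS (the widths below are arbitrary; a real site would normally keep such rules in a stylesheet):

<!-- fixed width: the column keeps the same pixel width in every browser window -->
<div style="width: 760px; margin: 0 auto;">Fixed-width content</div>

<!-- liquid width: the column is sized as a percentage of the browser window -->
<div style="width: 80%; margin: 0 auto;">Liquid content</div>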

Flash
Adobe Flash (formerly Macromedia Flash) is a proprietary, robust graphics animation or application development program used to create and deliver dynamic content, media (such as sound and video), and interactive applications over the web via the browser. Flash is not a standard produced by a vendor-neutral standards organization like most of the core protocols and formats on the Internet. Flash is much more restrictive than the open HTML format, though, requiring a proprietary plugin to be seen, and it does not integrate with most web browser UI features like the "Back" button. According to a study, 98% of US Web users have the Flash Player installed. Numbers vary depending on the detection scheme and research demographics.


Many graphic artists use Flash because it gives them exact control over every part of the design, and anything can be animated and generally "jazzed up". Some application designers enjoy Flash because it lets them create applications that do not have to be refreshed or go to a new web page every time an action occurs. Flash can use embedded fonts instead of the standard fonts installed on most computers. There are many sites which forgo HTML entirely for Flash. Other sites may use Flash content combined with HTML as conservatively as GIFs or JPEGs would be used, but with smaller vector file sizes and the option of faster-loading animations. Flash may also be used to protect content from unauthorized duplication or searching. Alternatively, small, dynamic Flash objects may be used to replace standard HTML elements (such as headers or menu links) with advanced typography not possible via regular HTML or CSS (see Scalable Inman Flash Replacement).

Flash detractors claim that Flash websites tend to be poorly designed and often use confusing and non-standard user interfaces, such as the inability to scale according to the size of the web browser, or its incompatibility with common browser features such as the back button. Until recently, search engines have been unable to index Flash objects, which has prevented sites from having their contents easily found; this is because many search engine crawlers rely on text to index websites. It is possible to specify alternate content to be displayed for browsers that do not support Flash. Using alternate content also helps search engines to understand the page, and can result in much better visibility for the page. However, the vast majority of Flash websites are not disability accessible (for screen readers, for example) or Section 508 compliant. An additional issue is that sites which show search engines alternate content that differs from what their human visitors see are usually judged to be spamming search engines and are automatically banned.

The most recent incarnation of Flash's scripting language (called "ActionScript", which is an ECMA language similar to JavaScript) incorporates long-awaited usability features, such as respecting the browser's font size and allowing blind users to use screen readers. ActionScript 2.0 is an object-oriented language, allowing the use of CSS, XML, and the design of class-based web applications.

CSS versus tables for layout

When Netscape Navigator 4 dominated the browser market, the popular solution available for designers to lay out a Web page was to use tables. Often even simple designs for a page would require dozens of tables nested in each other. Many web templates in Dreamweaver and other WYSIWYG editors still use this
technique today. Navigator 4 didn't support CSS to a useful degree, so it simply wasn't used. After the browser wars subsided, and the dominant browsers such as Internet Explorer became more W3C compliant, designers started turning toward CSS as an alternative means of laying out their pages. CSS proponents say that tables should be used only for tabular data, not for layout. Using CSS instead of tables also returns HTML to a semantic markup, which helps bots and search engines understand what's going on in a web page. All modern Web browsers support CSS with different degrees of limitations.

However, one of the main points against CSS is that by relying on it exclusively, control is essentially relinquished, as each browser has its own quirks which result in a slightly different page display. This is especially a problem as not every browser supports the same subset of CSS rules. For designers who are used to table-based layouts, developing Web sites in CSS often becomes a matter of trying to replicate what can be done with tables, leading some to find CSS design rather cumbersome due to lack of familiarity. For example, at one time it was rather difficult to produce certain design elements, such as vertical positioning and full-length footers, in a design using absolute positioning. With the abundance of CSS resources available online today, though, designing with reasonable adherence to standards involves little more than applying CSS 2.1 or CSS 3 to properly structured markup.

These days most modern browsers have solved most of these quirks in CSS rendering, and this has made many different CSS layouts possible. However, some people continue to use old browsers, and designers need to keep this in mind and allow for graceful degradation of pages in older browsers. Most notable among these old browsers are Internet Explorer 5 and 5.5, which, according to some web designers, are becoming the new Netscape Navigator 4: a block that holds the World Wide Web back from converting to CSS design. However, the W3 Consortium has made CSS in combination with XHTML the standard for web design.
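A minimal sketch of the same two-column arrangement done both ways (the widths and id values are invented for the example):

<!-- table-based layout -->
<table>
  <tr>
    <td>Navigation</td>
    <td>Main content</td>
  </tr>
</table>

<!-- the equivalent arrangement using semantic divs positioned with CSS -->
<div id="nav" style="float: left; width: 20%;">Navigation</div>
<div id="content" style="float: left; width: 80%;">Main content</div>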

Form versus Function


Some web developers have a graphic arts background and may pay more attention to how a page looks than considering other issues such as how visitors are going to find the page via a search engine. Some might rely more on advertising than search engines to attract visitors to the site. On the other side of the issue, search engine optimization consultants (SEOs) are concerned with how well a web site works
technically and textually: how much traffic it generates via search engines, and how many sales it makes, assuming looks don't contribute to the sales. As a result, the designers and SEOs often end up in disputes where the designer wants more 'pretty' graphics and the SEO wants lots of 'ugly' keyword-rich text, bullet lists, and text links. One could argue that this is a false dichotomy, due to the possibility that a web design may integrate the two disciplines for a collaborative and synergistic solution. Because some graphics serve communication purposes in addition to aesthetics, how well a site works may depend on the graphic designer's visual communication ideas as well as on the SEO considerations.

Another problem when using lots of graphics on a page is that download times can be greatly lengthened, often irritating the user. This has become less of a problem as the internet has evolved with high-speed connections and the use of vector graphics. This is an engineering challenge to increase bandwidth in addition to an artistic challenge to minimize graphics and graphic file sizes. This is an ongoing challenge, as increased bandwidth invites increased amounts of content.

6-2-4. Accessible Web design


To be accessible, web pages and sites must conform to certain accessibility principles. These can be grouped into the following main areas:

• use semantic markup that provides a meaningful structure to the document (i.e. web page); semantic markup also refers to semantically organizing the web page structure and publishing web services descriptions accordingly so that they can be recognized by other web services on different web pages (standards for the semantic web are set by IEEE)
• use a valid markup language that conforms to a published DTD or Schema
• provide text equivalents for any non-text components (e.g. images, multimedia)
• use hyperlinks that make sense when read out of context (e.g. avoid "Click Here")
• don't use frames
• use CSS rather than HTML tables for layout
• author the page so that when the source code is read line-by-line by user agents (such as screen readers) it remains intelligible (using tables for design will often result in information that is not)

However, W3C permits an exception where tables for layout either make sense when linearized or an alternate version (perhaps linearized) is made available.
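A brief sketch of a few of these principles in markup (the file names and wording are invented for the example): a semantic heading, a text equivalent for an image, and a hyperlink that makes sense out of context:

<h1>Quarterly sales figures</h1>
<img src="sales-chart.png" alt="Bar chart of sales by quarter for 2008" />
<p>The figures are also available as a
   <a href="sales-2008.html">full 2008 sales report</a>.</p>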


Website accessibility is also changing as it is impacted by content management systems, which allow changes to be made to web pages without the need for knowledge of a programming language.

6-2-5. Website Planning


Before creating and uploading a website, it is important to take the time to plan exactly what is needed in the website. Thoroughly considering the audience or target market, as well as defining the purpose and deciding what content will be developed are extremely important.

Purpose
It is essential to define the purpose of the website as one of the first steps in the planning process. A purpose statement should show focus based on what the website will accomplish and what the users will get from it. A clearly defined purpose will help the rest of the planning process as the audience is identified and the content of the site is developed. Setting short and long term goals for the website will help make the purpose clear and plan for the future when expansion, modification, and improvement will take place. Also, goal-setting practices and measurable objectives should be identified to track the progress of the site and determine success.

Audience
Defining the audience is a key step in the website planning process. The audience is the group of people who are expected to visit your website: the market being targeted. These people will be viewing the website for a specific reason, and it is important to know exactly what they are looking for when they visit the site. A clearly defined purpose or goal of the site, as well as an understanding of what visitors want to do or feel when they come to your site, will help to identify the target audience. Upon considering who is most likely to need or use the content, a list of characteristics common to the users can be drawn up, such as:
• Audience Characteristics
• Information Preferences
• Computer Specifications
• Web Experience

Taking into account the characteristics of the audience will allow an effective website to be created that will deliver the desired content to the target audience.


Content
Content evaluation and organization requires that the purpose of the website be clearly defined. Collecting a list of the necessary content then organizing it according to the audience's needs is a key step in website planning. In the process of gathering the content being offered, any items that do not support the defined purpose or accomplish target audience objectives should be removed. It is a good idea to test the content and purpose on a focus group and compare the offerings to the audience needs. The next step is to organize the basic information structure by categorizing the content and organizing it according to user needs. Each category should be named with a concise and descriptive title that will become a link on the website. Planning for the site's content ensures that the wants or needs of the target audience and the purpose of the site will be fulfilled.

Compatibility and restrictions


Because of the market share of modern browsers (depending on your target market), the compatibility of your website with viewers' browsers is restricted. For instance, a website that is designed for the majority of websurfers will be limited to the use of valid XHTML 1.0 Strict or older, Cascading Style Sheets Level 1, and 1024x768 display resolution. This is because Internet Explorer is not fully W3C standards compliant with the modularity of XHTML 1.1 and the majority of CSS beyond Level 1. A target market of more alternative-browser users (e.g. Firefox and Opera) allows for more W3C compliance and thus a greater range of options for a web designer.

Another restriction on web page design is the use of different image file formats. The majority of users can support GIF, JPEG, and PNG (with restrictions). Again, Internet Explorer is the major restriction here, not fully supporting PNG's advanced transparency features, resulting in the GIF format still being the most widely used graphic file format for transparent images. Many website incompatibilities go unnoticed by the designer and unreported by the users. The only way to be certain a website will work on a particular platform is to test it on that platform.
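For instance, a page aimed at that broad audience would typically declare the XHTML 1.0 Strict document type; a minimal skeleton might look like this (the title and content are placeholders):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Widely compatible page</title>
  </head>
  <body>
    <p>Content aimed at the widest range of browsers.</p>
  </body>
</html>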

Planning documentation
Documentation is used to visually plan the site while taking into account the purpose, audience and content, to design the site structure, content and interactions that are most suitable for the website. Documentation may be considered a
prototype for the website: a model which allows the website layout to be reviewed, resulting in suggested changes, improvements and/or enhancements. This review process increases the likelihood of success of the website.

First, the content is categorized and the information structure is formulated. The information structure is used to develop a document or visual diagram called a site map. This creates a visual representation of how the web pages will be interconnected, which helps in deciding what content will be placed on which pages. There are three main ways of diagramming the website structure:
• Linear Website Diagrams allow the users to move in a predetermined sequence;
• Hierarchical structures (or Tree Design Website Diagrams) provide more than one path for users to take to their destination;
• Branch Design Website Diagrams allow for many interconnections between web pages, such as hyperlinks within sentences.

In addition to planning the structure, the layout and interface of individual pages may be planned using a storyboard. In the process of storyboarding, a record is made of the description, purpose and title of each page in the site, and they are linked together according to the most effective and logical diagram type. Depending on the number of pages required for the website, documentation methods may include using pieces of paper and drawing lines to connect them, or creating the storyboard using computer software. Some or all of the individual pages may be designed in greater detail as a website wireframe, a mock up model or comprehensive layout of what the page will actually look like. This is often done in a graphic program, or layout design program. The wireframe has no working functionality, only planning.

6-3. Web page


A web page or webpage is a resource of information that is suitable for the World Wide Web and can be accessed through a web browser. This information is usually in HTML or XHTML format, and may provide navigation to other web pages via hypertext links.

77

Web engineering

78

Figure 6-2: A screenshot of a web page.

Web pages may be retrieved from a local computer or from a remote web server. The web server may restrict access only to a private network, e.g. a corporate intranet, or it may publish pages on the World Wide Web. Web pages are requested and served from web servers using Hypertext Transfer Protocol (HTTP). Web pages may consist of files of static text stored within the web server's file system (static web pages), or the web server may construct the (X)HTML for each web page when it is requested by a browser (dynamic web pages). Client-side scripting can make web pages more responsive to user input once in the client browser.

6-3-1. Color, typography, illustration and interaction


Web pages usually include instructions as to the colors of text and backgrounds, and very often also contain links to images and sometimes other media to be included in the final view. Layout, typographic and color-scheme information is provided by Cascading Style Sheet (CSS) instructions, which can either be embedded in the HTML or be provided by a separate file which is referenced from within the HTML. The latter case is especially relevant where one lengthy stylesheet is relevant to a whole website: due to the way HTTP works, the browser will only download it once from the web server and use the cached copy for the whole site.

Images are stored on the web server as separate files, but again HTTP allows for the fact that once a web page is downloaded to a browser, it is quite likely that related files such as images and stylesheets will be requested as it is processed. An
HTTP 1.1 web server will maintain a connection with the browser until all related resources have been requested and provided. Browsers usually render images along with the text and other material on the displayed web page.

Client-side computer code such as JavaScript or code implementing Ajax techniques can be provided either embedded in the HTML of a web page or, like CSS stylesheets, as separate, linked downloads specified in the HTML (using, for example, the .js file extension for JavaScript files). These scripts may run on the client computer, if the user allows them to, and can provide a degree of interactivity between the web page and the user after the page has downloaded.
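A minimal sketch of a page head that references a separate stylesheet and a separate script (the file names are invented for the example):

<head>
  <title>Example page</title>
  <!-- layout, typography and color rules kept in a separate, cacheable file -->
  <link rel="stylesheet" type="text/css" href="site.css" />
  <!-- client-side behavior downloaded as a separate script file -->
  <script type="text/javascript" src="site.js"></script>
</head>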

6-3-2. Browsers
A web browser can have a Graphical User Interface, like Internet Explorer, Mozilla Firefox, or Opera, or can be text-based, like Lynx. Web users with visual impairments may use a screen reader to read out the displayed text, or they may use a more specialized voice browser in the first place. Such users will want to enjoy the benefit of the web page without images and other visual media. Users of fully graphical browsers may still disable the download and viewing of images and other media, to save time, network bandwidth or merely to simplify their browsing experience. Users may also prefer not to use the fonts, font sizes, styles and color schemes selected by the web page designer and may apply their own CSS styling to their viewed version of the page. The World Wide Web Consortium (W3C) and Web Accessibility Initiative (WAI) recommend that all web pages should be designed with all of these options in mind.

6-3-3. Rendering

Web pages will often require more screen space than is available for a particular display resolution. Most modern browsers will place scrollbars (the bar at the side of the screen that allows you to move down) in the window to allow the user to see all content. Scrolling horizontally is less prevalent than vertical scrolling, not only because those pages do not print properly, but because it inconveniences the user more so than vertical scrolling would (because lines are horizontal; scrolling back and forth for every line is much more inconvenient than scrolling after reading a whole screen; also most computer keyboards have page up and down keys, and many computer mice have vertical scroll wheels, but the horizontal scrolling
equivalents are rare). However, web pages may utilize page widening for various purposes.

A web page can either be a single HTML file, or be made up of several HTML files represented using frames. Frames have been known to cause problems with navigation, printing, and search engine rankings, although these problems occur mostly in older-generation browsers. Their primary usage is to allow certain content which is usually meant to be static, such as page navigation or page headers, to remain in one place while the main content can be scrolled as necessary. Another merit of using a framed web page is that only the content in the "main" frame is reloaded. Frames are rendered very differently depending on the host browser, and for this reason the usage of frames is typically frowned upon in professional web page development communities. With design technologies such as CSS becoming more widespread, the effect frames provide can be achieved using a smaller amount of code and by using only one web page to display the same amount of content.

When web pages are stored in a common directory of a web server, they become a website. A website will typically contain a group of web pages that are linked together, or have some other coherent method of navigation. The most important web page to have on a website is the index page. Depending on the web server settings, this index page can have many different names, but the most common are index.htm and index.html. When a browser visits the homepage for a website, or any URL pointing to a directory rather than a specific file, the web server will serve the index page to the requesting browser. If no index page is defined in the configuration, or no such file exists on the server, either an error or a directory listing will be served to the browser.

When creating a web page, it is important to ensure it conforms to the World Wide Web Consortium (W3C) standards for HTML, CSS, XML and other standards. The W3C standards are in place to ensure all browsers which conform to them can display identical content without any special consideration for proprietary rendering techniques. A properly coded web page is going to be accessible to many different browsers, old and new alike, and to many display resolutions, as well as to users with audio or visual impairments.

6-3-4. Creating a web page

To create a web page, a text editor or a specialized HTML editor is needed. In order to upload the created web page to a web server, traditionally an FTP client is needed.


The design of a web page is highly personal. A design can be made according to one's own preference, or a pre-made web template can be used. Web templates let web page designers edit the content of a web page without having to worry about the overall aesthetics. Many people publish their own web pages using products like GeoCities from Yahoo, Tripod, or Angelfire. These web publishing tools offer free page creation and hosting up to a certain size limit. Another way of making a web page is to download specialized software, like a wiki, CMS, or forum package. These options allow for quick and easy creation of a web page which is typically dynamic. Wikipedia, WordPress, and Invision Power Board are examples of the above three web page options.

6-3-5. Saving a web page


While one is viewing a web page, a copy of it is saved locally; this is what is being viewed. Depending on the browser settings, this copy may be deleted at any time, or stored indefinitely, sometimes without the user realizing it. Most GUI browsers will contain all the options for saving a web page more permanently. These include, but are not limited to:
• Saving the rendered text without formatting or images - hyperlinks are not identified, but displayed as plain text
• Saving the HTML file as it was served - the overall structure will be preserved, although some links may be broken
• Saving the HTML file and changing relative links to absolute ones - hyperlinks will be preserved
• Saving the entire web page - all images will be saved, as well as links being changed to absolute
• Saving the HTML file, including all images, stylesheets and scripts, into a single MHTML file. This is supported by Internet Explorer, Mozilla, Mozilla Firefox and Opera. Mozilla and Mozilla Firefox only support this if the MAF plugin has been installed. An MHTML file is based upon the MHTML standard.

Common web browsers, like Mozilla Firefox, Internet Explorer and Opera, give the option to not only print the currently viewed web page to a printer, but optionally to "print" to a file which can be viewed or printed later. Some web pages are designed, for example by use of CSS, so that hyperlinks, menus and other navigation items, which will be useless on paper, are rendered into print with this
in mind. Space-wasting menus and navigational blocks may be absent from the printed version; other hyperlinks may be shown with the link destinations made explicit, either within the body of the page or listed at the end.


7. HTML

HTML, an initialism of HyperText Markup Language, is the predominant markup language for Web pages. It provides a means to describe the structure of text-based information in a document by denoting certain text as links, headings, paragraphs, lists, and so on, and to supplement that text with interactive forms, embedded images, and other objects. HTML is written in the form of tags, surrounded by angle brackets. HTML can also describe, to some degree, the appearance and semantics of a document, and can include embedded scripting language code (such as JavaScript) which can affect the behavior of Web browsers and other HTML processors.
Files and URLs containing HTML often have a .html filename extension.

7-1. HTML markup

HTML markup consists of several key components, including elements (and their attributes), character-based data types, and character references and entity references. Another important component is the document type declaration. The Hello world program, a common computer program employed for comparing programming languages, scripting languages, and markup languages, takes only a few lines of code in HTML, although line breaks are optional:
<html>
  <head>
    <title>Hello HTML</title>
  </head>
  <body>
    <span>Hello World!</span>
  </body>
</html>

7-1-1. Elements
Elements are the basic structure for HTML markup. Elements have two basic properties: attributes and content. Each attribute and each element's content has certain restrictions that must be followed for an HTML document to be considered valid. An element usually has a start tag (e.g. <element-name>) and an end tag (e.g.
</element-name>). The element's attributes are contained in the start tag and content is located between the tags (e.g. <element-name attribute="value">Content</elementname>). Some elements, such as <br>, do not have any content and must not have a

closing tag. Listed below are several types of markup elements used in HTML.
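As a small sketch of this anatomy (the class name "note" is invented for the example, not taken from the text above), a paragraph element with an attribute, some content, and an empty <br> element could be written as:

<p class="note">This element has a start tag, an attribute, content<br>
spread over two lines, and an end tag.</p>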

Structural markup describes the purpose of text. For example, <h2>Golf</h2> establishes "Golf" as a second-level heading, which would be rendered in a browser in a manner similar to the "HTML markup" title at the start of this section. Structural markup does not denote any specific rendering, but most Web browsers have standardized on how elements should be formatted. Text may be further styled with Cascading Style Sheets (CSS).

Presentational markup describes the appearance of the text, regardless of its function. For example, <b>boldface</b> indicates that visual output devices should render "boldface" in bold text, but gives no indication of what devices which are unable to do this (such as aural devices that read the text aloud) should do. In the case of both <b>bold</b> and <i>italic</i>, there are elements which usually have an equivalent visual rendering but are more semantic in nature, namely <strong>strong emphasis</strong> and <em>emphasis</em> respectively. It is easier to see how an aural user agent should interpret the latter two elements. However, they are not equivalent to their presentational counterparts: it would be undesirable for a screen reader to emphasize the name of a book, for instance, but on a screen such a name would be italicized. Most presentational markup elements have become deprecated under the HTML 4.0 specification, in favor of CSS-based style design.

Hypertext markup links parts of the document to other documents. HTML up through version XHTML 1.1 requires the use of an anchor element to create a hyperlink in the flow of text: <a>Wikipedia</a>. However, the href attribute must also be set to a valid URL, so for example the HTML code <a href="http://en.wikipedia.org/">Wikipedia</a> will render the word "Wikipedia" as a hyperlink. To use an image as a link, the anchor element takes the following form: <a href="url"><img src="image.gif" /></a>

7-1-2. Attributes
Most of the attributes of an element are name-value pairs, separated by "=", and written within the start tag of an element, after the element's name. The value may be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML (but not XHTML). Leaving attribute values unquoted is considered unsafe. In contrast with name-value pair attributes, there are some attributes that affect the element simply by their presence in the start tag of the element (like the ismap attribute for the img element). Most elements can take any of several common attributes:
- The id attribute provides a document-wide unique identifier for an element. This can be used by stylesheets to provide presentational properties, by browsers to focus attention on the specific element, or by scripts to alter the contents or presentation of an element.
- The class attribute provides a way of classifying similar elements for presentation purposes. For example, an HTML document might use the designation class="notation" to indicate that all elements with this class value are subordinate to the main text of the document. Such elements might be gathered together and presented as footnotes on a page instead of appearing in the place where they occur in the HTML source.
- The style attribute allows an author to apply presentational properties directly to a particular element. It is considered better practice to use an element's id or class attributes to select the element from a stylesheet, though sometimes this can be too cumbersome for a simple ad hoc application of styled properties.
- The title attribute is used to attach a subtextual explanation to an element. In most browsers this attribute is displayed as what is often referred to as a tooltip.

The generic inline element span can be used to demonstrate these various attributes:
<span id="anId" class="aClass" style="color:blue;" title="Hypertext Markup Language">HTML</span>

This example displays as HTML; in most browsers, pointing the cursor at the abbreviation should display the title text "Hypertext Markup Language." Most elements also take the language-related attributes lang and dir.

7-1-3. Character and entity references

As of version 4.0, HTML defines a set of 252 character entity references and a set of 1,114,050 numeric character references, both of which allow individual characters to be written via simple markup, rather than literally. A literal character and its markup counterpart are considered equivalent and are rendered identically.

The ability to "escape" characters in this way allows for the characters < and & (when written as &lt; and &amp;, respectively) to be interpreted as character data, rather than markup. For example, a literal < normally indicates the start of a tag, and & normally indicates the start of a character entity reference or numeric character reference; writing it as &amp; or &#x26; or &#38; allows & to be included in the content of elements or the values of attributes. The double-quote character ("), when used to quote an attribute value, must also be escaped as &quot; or &#x22; or &#34; when it appears within the attribute value itself. The single-quote character ('), when used to quote an attribute value, must also be escaped as &#x27; or &#39; (it should NOT be escaped as &apos; except in XHTML documents) when it appears within the attribute value itself. However, since document authors often overlook the need to escape these characters, browsers tend to be very forgiving, treating them as markup only when subsequent text appears to confirm that intent.

Escaping also allows for characters that are not easily typed, or that are not even available in the document's character encoding, to be represented within element and attribute content. For example, the acute-accented e (é), a character typically found only on Western European keyboards, can be written in any HTML document as the entity reference &eacute; or as the numeric references &#233; or &#xE9;. The characters comprising those references (that is, the &, the ;, the letters in eacute, and so on) are available on all keyboards and are supported in all character encodings, whereas the literal é is not.

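For instance, a minimal sketch of such escaping in markup (the sentence itself is only illustrative):

<!-- The paragraph below renders as: 5 < 6 & fiancée -->
<p>5 &lt; 6 &amp; fianc&eacute;e</p>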
7-1-4. Data types


HTML defines several data types for element content, such as script data and stylesheet data, and a plethora of types for attribute values, including IDs, names, URIs, numbers, units of length, languages, media descriptors, colors, character encodings, dates and times, and so on. All of these data types are specializations of character data.

7-2. Semantic HTML
There is no official specification called "Semantic HTML", though the strict flavors of HTML discussed below are a push in that direction. Rather, semantic HTML refers to an objective and a practice to create documents with HTML that contain only the author's intended meaning, without any reference to how this meaning is presented or conveyed.

A classic example is the distinction between the emphasis element (<em>) and the italics element (<i>). Often the emphasis element is displayed in italics, so the presentation is typically the same. However, emphasizing something is different from listing the title of a book, for example, which may also be displayed in italics. In purely semantic HTML, a book title would use a different element than emphasized text uses (for example a <span>), because they are meaningfully different things.

The goal of semantic HTML requires two things of authors:
1. To avoid the use of presentational markup (elements, attributes, and other entities).
2. To use available markup to differentiate the meanings of phrases and structure in the document.

So for example, the book title from above would need to have its own element and class specified, such as <cite class="booktitle">The Grapes of Wrath</cite>. Here, the <cite> element is used because it most closely matches the meaning of this phrase in the text. However, the <cite> element is not specific enough to this task, since we mean to cite specifically a book title as opposed to a newspaper article or an academic journal.

Semantic HTML also requires complementary specifications and software compliance with these specifications. Primarily, the development and proliferation of CSS has led to increasing support for semantic HTML, because CSS provides designers with a rich language to alter the presentation of semantic-only documents. With the development of CSS, the need to include presentational properties in a document has virtually disappeared. With the advent and refinement of CSS and the increasing support for it in Web browsers, subsequent editions of HTML increasingly stress only using markup that suggests the semantic structure and phrasing of the document, like headings, paragraphs, quotes, and lists, instead of using markup which is written for visual purposes only, like <font>, <b> (bold), and <i> (italics). Some of these elements are not permitted in certain varieties of HTML, like HTML 4.01 Strict. CSS provides a way to separate document semantics from the content's presentation, by keeping everything relevant to presentation defined in a CSS file. See separation of style and content.

Semantic HTML offers many advantages. First, it ensures consistency in style across elements that have the same meaning. Every heading, every quotation, every similar element receives the same presentation properties.

Second, semantic HTML frees authors from the need to concern themselves with presentation details. When writing the number two, for example, should it be written out in words ("two"), or should it be written as a numeral (2)? A semantic markup might enter something like <number>2</number> and leave presentation details to the stylesheet designers. Similarly, an author might wonder where to break out quotations into separate indented blocks of text: with purely semantic HTML, such details would be left up to stylesheet designers. Authors would simply indicate quotations when they occur in the text, and not concern themselves with presentation.

A third advantage is device independence and repurposing of documents. A semantic HTML document can be paired with any number of stylesheets to provide output to computer screens (through Web browsers), high-resolution printers, handheld devices, aural browsers or braille devices for those with visual impairments, and so on. To accomplish this, nothing needs to be changed in a well-coded semantic HTML document. Readily available stylesheets make this a simple matter of pairing a semantic HTML document with the appropriate stylesheets. (Of course, the stylesheet's selectors need to match the appropriate properties in the HTML document.)

Some aspects of authoring documents make separating semantics from style (in other words, meaning from presentation) difficult. Some elements are hybrids, using presentation in their very meaning. For example, a table displays content in a tabular form. Often such content conveys its meaning only when presented in this way. Repurposing a table for an aural device typically involves somehow presenting the table, an inherently visual element, in an audible form. On the other hand, we frequently take lyrical songs, something inherently meant for audible presentation, and instead present them in textual form on a Web page. For these types of elements, the meaning is not so easily separated from their presentation. However, for a great many of the elements used and meanings conveyed in HTML, the translation is relatively smooth.

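As a brief sketch of this pairing (the booktitle class follows the example above; the CSS rule is illustrative and could live in any stylesheet):

<!-- the markup carries only the meaning -->
<cite class="booktitle">The Grapes of Wrath</cite>

/* a separate stylesheet decides how that meaning is presented */
cite.booktitle { font-style: italic; }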
7-3. Delivery of HTML
HTML documents can be delivered by the same means as any other computer file; however, they are most often delivered in one of two forms: over HTTP servers and through e-mail.

7-3-1. HTTP

The World Wide Web is composed primarily of HTML documents transmitted from a Web server to a Web browser using the Hypertext Transfer Protocol (HTTP). However, HTTP can be used to serve images, sound, and other content in addition to HTML. To allow the Web browser to know how to handle the document it received, an indication of the file format of the document must be transmitted along with the document. This vital metadata includes the MIME type (text/html for HTML 4.01 and earlier, application/xhtml+xml for XHTML 1.0 and later) and the character encoding (see Character encodings in HTML).

In modern browsers, the MIME type that is sent with the HTML document affects how the document is interpreted. A document sent with an XHTML MIME type, or served as application/xhtml+xml, is expected to be well-formed XML, and a syntax error causes the browser to fail to render the document. The same document sent with an HTML MIME type, or served as text/html, might be displayed successfully, since Web browsers are more lenient with HTML. However, XHTML parsed in this way is not considered either proper XHTML or HTML, but so-called tag soup.

If the MIME type is not recognized as HTML, the Web browser should not attempt to render the document as HTML, even if the document is prefaced with a correct Document Type Declaration. Nevertheless, some Web browsers do examine the contents or URL of the document and attempt to infer the file type, despite this being forbidden by the HTTP 1.1 specification.

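A sketch of how this metadata travels with the document: the first lines of an HTTP response for an HTML page might look like the following (the status line and values are illustrative):

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8

<!-- the HTML document itself follows the blank line -->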
7-3-2. HTML e-mail
Most graphical e-mail clients allow the use of a subset of HTML (often ill-defined) to provide formatting and semantic markup capabilities not available with plain text, like emphasized text, block quotations for replies, and diagrams or mathematical formulas that could not easily be described otherwise. Many of these clients include both a GUI editor for composing HTML e-mail messages and a rendering engine for displaying received HTML messages. Use of HTML in e-mail is controversial because of compatibility issues, because it can be used in phishing/privacy attacks, because it can confuse spam filters, and because the message size is larger than plain text.

7-3-3. Naming conventions

The most common filename extension for files containing HTML is .html. A common abbreviation of this is .htm; it originates from older operating systems and file systems, such as the DOS versions from the 80s and early 90s and FAT, which limit file extensions to three letters.

7-4. Dynamic HTML
Dynamic HTML, or DHTML, is a collection of technologies used together to create interactive and animated web sites by using a combination of a static markup language (such as HTML), a client-side scripting language (such as JavaScript), a presentation definition language (such as CSS), and the Document Object Model.
DHTML allows scripting languages to change variables in a web page's definition language, which in turn affects the look and function of otherwise "static" HTML page content, after the page has been fully loaded and during the viewing process. Thus the dynamic characteristic of DHTML is the way it functions while a page is viewed, not in its ability to generate a unique page with each page load. By contrast, a dynamic web page is a broader concept: any web page generated differently for each user, load occurrence, or specific variable values. This includes pages created by client-side scripting, and ones created by server-side scripting (such as PHP or Perl) where the web server generates content before sending it to the client.

DHTML is often used to make rollover buttons or drop-down menus on a web page. A less common use is to create browser-based action games. During the late 1990s and early 2000s, a number of games were created using DHTML, but differences between browsers made this difficult: many techniques had to be implemented in code to enable the games to work on multiple platforms. Recently browsers have been converging towards the web standards, which has made the design of DHTML games more viable. Those games can be played on all major browsers and they can also be ported to Widgets for Mac OS X and Gadgets for Windows Vista, which are based on DHTML code.

The term has fallen out of use in recent years, as DHTML scripts often tended to not work well between various web browsers. Newer techniques, such as unobtrusive JavaScript coding (DOM Scripting), allow similar effects, but in an accessible, standards-compliant way through Progressive Enhancement.

Some disadvantages of DHTML are that it is difficult to develop and debug due to varying degrees of support among web browsers of the technologies involved, and that the variety of screen sizes means the end look can only be fine-tuned on a limited number of browser and screen-size combinations. Development for relatively recent browsers, such as Internet Explorer 5.0+, Mozilla Firefox 2.0+, and Opera 7.0+, is aided by a shared Document Object Model. Basic DHTML support was introduced with Internet Explorer 4.0, although there was a basic dynamic system with Netscape Navigator 4.0.

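A minimal sketch of the idea, combining static HTML, inline CSS and a client-side script that changes the page after it has loaded (the element id and function name are invented for the example):

<p id="greeting" style="color: gray;">Waiting...</p>
<script type="text/javascript">
  // runs in the browser once the page has loaded
  function update() {
    var p = document.getElementById("greeting");
    p.style.color = "green";          // presentation changed via CSS properties
    p.innerHTML = "Hello, DHTML!";    // content changed via the DOM
  }
  window.onload = update;
</script>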
7-5. Cascading Style Sheets

Cascading Style Sheets (CSS) is a stylesheet language used to describe the presentation of a document written in a markup language. Its most common application is to style web pages written in HTML and XHTML, but the language can be applied to any kind of XML document, including SVG and XUL.
CSS can be used locally by the readers of web pages to define colors, fonts, layout, and other aspects of document presentation. It is designed primarily to enable the separation of document content (written in HTML or a similar markup language) from document presentation (written in CSS). This separation can improve content accessibility, provide more flexibility and control in the specification of presentation characteristics, and reduce complexity and repetition in the structural content (such as by allowing for tableless web design). CSS can also allow the same markup page to be presented in different styles for different rendering methods, such as on-screen, in print, by voice (when read out by a speech-based browser or screen reader) and on Braille-based, tactile devices. CSS specifies a priority scheme to determine which style rules apply if more than one rule matches against a particular element. In this so-called cascade, priorities or weights are calculated and assigned to rules, so that the results are predictable. The CSS specifications are maintained by the World Wide Web Consortium (W3C). Internet media type (MIME type) text/css is registered for use with CSS by RFC 2318 (March 1998).

7-5-1. Syntax
CSS has a simple syntax, and uses a number of English keywords to specify the names of various style properties. A style sheet consists of a list of rules. Each rule or rule-set consists of one or more selectors and a declaration block. A declaration-block consists of a list of semicolon-separated declarations in braces. Each declaration itself consists of a property, a colon (:), a value, then a semi-colon (;).
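A minimal sketch of this anatomy, a selector followed by a declaration block of property-value pairs (the selector and values are illustrative):

/* rule-set:  selector { property: value; property: value; } */
h1 {
  color: navy;
  margin-left: 2em;
}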

In CSS, selectors are used to declare which elements a style applies to, a kind of match expression. Selectors may apply to all elements of a specific type, or only to those elements which match a certain attribute; elements may be matched depending on how they are placed relative to each other in the markup code, or on how they are nested within the document object model.

In addition to these, a set of pseudo-classes can be used to define further behavior. Probably the best-known of these is :hover, which applies a style only when the user 'points to' the visible element, usually by holding the mouse cursor over it. It is appended to a selector as in a:hover or #elementid:hover. Other pseudo-classes and pseudo-elements are, for example, :first-line, :visited or :before. A special pseudo-class is :lang(c), which matches elements on the basis of their language "c". A pseudo-class selects entire elements, such as :link or :visited, whereas a pseudo-element makes a selection that may consist of partial elements, such as :first-line or :first-letter. Selectors may be combined in other ways too, especially in CSS 2.1, to achieve greater specificity and flexibility.
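A short sketch of some of these selector forms (the class and id names are invented for the example):

p            { line-height: 1.4; }             /* every p element */
p.note       { color: gray; }                  /* p elements with class="note" */
#menu        { width: 12em; }                  /* the element with id="menu" */
a:hover      { text-decoration: underline; }   /* pseudo-class */
p:first-line { font-variant: small-caps; }     /* pseudo-element */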

Use of CSS

Prior to CSS, nearly all of the presentational attributes of HTML documents were contained within the HTML markup; all font colors, background styles, element alignments, borders and sizes had to be explicitly described, often repeatedly, within the HTML. CSS allows authors to move much of that information to a separate stylesheet, resulting in considerably simpler HTML markup.

Headings (h1 elements), sub-headings (h2), sub-sub-headings (h3), etc., are defined structurally using HTML. In print and on the screen, choice of font, size, color and emphasis for these elements is presentational. Prior to CSS, document authors who wanted to assign such typographic characteristics to, say, all h2 headings had to use the HTML font and other presentational elements for each occurrence of that heading type. The additional presentational markup in the HTML made documents more complex, and generally more difficult to maintain. In CSS, presentation is separated from structure. In print, CSS can define color, font, text alignment, size, borders, spacing, layout and many other typographic characteristics. It can do so independently for on-screen and printed views. CSS also defines non-visual styles such as the speed and emphasis with which text is read out by aural text readers. The W3C now considers the advantages of CSS for defining all aspects of the presentation of HTML pages to be superior to other methods. It has therefore deprecated the use of all the original presentational HTML markup.

Sources
CSS information can be provided by various sources. CSS style information can be either attached as a separate document or embedded in the HTML document. Multiple style sheets can be imported, and alternative style sheets can be specified so that the user can choose between them. Different styles can be applied depending on the output device being used; for example, the screen version can be quite different from the printed version, so that authors can tailor the presentation appropriately for each medium.
- Author styles (style information provided by the web page author), in the form of:
  o external stylesheets, i.e. a separate CSS file referenced from the document
  o embedded style, blocks of CSS information inside the HTML document itself
  o inline styles, inside the HTML document, style information on a single element, specified using the "style" attribute
- User style:
  o a local CSS file specified by the user using options in the web browser, acting as an override to be applied to all documents
- User agent style:
  o the default style sheet applied by the user agent, e.g. the browser's default presentation of elements

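The three author-style forms might appear in one document roughly as follows (the file name site.css and the rules themselves are illustrative):

<head>
  <!-- external stylesheet -->
  <link rel="stylesheet" type="text/css" href="site.css">
  <!-- embedded style block -->
  <style type="text/css">
    h1 { color: maroon; }
  </style>
</head>
<body>
  <!-- inline style on a single element -->
  <p style="color: gray;">This paragraph is styled directly.</p>
</body>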
One of the goals of CSS is also to allow users a greater degree of control over presentation; those who find the red italic headings difficult to read may apply other style sheets to the document. Depending on their browser and the web site, a user may choose from various stylesheets provided by the designers, may remove all added style and view the site using their browser's default styling or may perhaps override just the red italic heading style without altering other attributes. File highlightheaders.css containing:
h1 { color: white; background: orange !important; }
h2 { color: white; background: green !important; }


Such a file is stored locally and takes effect if the user has selected it in the browser options. "!important" means that it prevails over the author's specifications.

7-5-2. History
Style sheets have existed in one form or another since the beginnings of SGML in the 1970s. Cascading Style Sheets were developed as a means for creating a consistent approach to providing style information for web documents.

As HTML grew, it came to encompass a wider variety of stylistic capabilities to meet the demands of web developers. This evolution gave the designer more control over site appearance but at the cost of HTML becoming more complex to write and maintain. Variations in web browser implementations made consistent site appearance difficult, and users had less control over how web content was displayed.

To improve the capabilities of web presentation, nine different style sheet languages were proposed to the W3C's www-style mailing list. Of the nine proposals, two were chosen as the foundation for what became CSS: Cascading HTML Style Sheets (CHSS) and Stream-based Style Sheet Proposal (SSP). First, Håkon Wium Lie (now the CTO of Opera Software) proposed Cascading HTML Style Sheets (CHSS) in October 1994, a language which has some resemblance to today's CSS. Bert Bos was working on a browser called Argo which used its own style sheet language, Stream-based Style Sheet Proposal (SSP). Lie and Bos worked together to develop the CSS standard (the 'H' was removed from the name because these style sheets could be applied to other markup languages besides HTML). Unlike existing style languages like DSSSL and FOSI, CSS allowed a document's style to be influenced by multiple style sheets. One style sheet could inherit or "cascade" from another, permitting a mixture of stylistic preferences controlled equally by the site designer and user.

Håkon's proposal was presented at the "Mosaic and the Web" conference in Chicago, Illinois in 1994, and again with Bert Bos in 1995. Around this time, the World Wide Web Consortium was being established; the W3C took an interest in the development of CSS, and it organized a workshop toward that end chaired by Steven Pemberton. This resulted in W3C adding work on CSS to the deliverables of the HTML editorial review board (ERB). Håkon and Bert were the primary technical staff on this aspect of the project, with additional members, including Thomas Reardon of Microsoft, participating as well. By the end of 1996, CSS was ready to become official, and the CSS level 1 Recommendation was published in December.

Development of HTML, CSS, and the DOM had all been taking place in one group, the HTML Editorial Review Board (ERB). Early in 1997, the ERB was split into three working groups: HTML Working group, chaired by Dan Connolly of W3C; DOM Working group, chaired by Lauren Wood of SoftQuad; and CSS Working group, chaired by Chris Lilley of W3C. The CSS Working Group began tackling issues that had not been addressed with CSS level 1, resulting in the creation of CSS level 2 on November 4, 1997. It was published as a W3C Recommendation on May 12, 1998. CSS level 3, which was started in 1998, is still under development as of 2008.

In 2005 the CSS Working Groups decided to enforce the requirements for standards more strictly. This meant that already published standards like CSS 2.1, CSS 3 Selectors and CSS 3 Text were pulled back from Candidate Recommendation to Working Draft level.

Difficulty with adoption

Although the CSS1 specification was completed in 1996 and Microsoft's Internet Explorer 3 was released in that year featuring some limited support for CSS, it would be more than three years before any web browser achieved near-full implementation of the specification. Internet Explorer 5.0 for the Macintosh, shipped in March 2000, was the first browser to have full (better than 99 percent) CSS1 support, surpassing Opera, which had been the leader since its introduction of CSS support 15 months earlier. Other browsers followed soon afterwards, and many of them additionally implemented parts of CSS2. As of July 2008, no (finished) browser has fully implemented CSS2, with implementation levels varying (see Comparison of layout engines (CSS)).

Even though early browsers such as Internet Explorer 3 and 4, and Netscape 4.x had support for CSS, it was typically incomplete and afflicted with serious bugs. This was a serious obstacle for the adoption of CSS. When later 'version 5' browsers began to offer a fairly full implementation of CSS, they were still incorrect in certain areas and were fraught with inconsistencies, bugs and other quirks. The proliferation of such CSS-related inconsistencies and even the variation in feature support has made it difficult for designers to achieve a consistent appearance across platforms. Some authors commonly resort to using workarounds such as CSS hacks and CSS filters in order to obtain consistent results across web browsers and platforms.

Problems with browsers' patchy adoption of CSS along with errata in the original specification led the W3C to revise the CSS2 standard into CSS2.1, which may be regarded as something nearer to a working snapshot of current CSS support in HTML browsers. Some CSS2 properties which no browser had successfully implemented were dropped, and in a few cases, defined behaviours were changed to bring the standard into line with the predominant existing implementations. CSS2.1 became a Candidate Recommendation on February 25, 2004, but CSS 2.1 was pulled back to Working Draft status on June 13, 2005,[3] and only returned to Candidate Recommendation status on July 19, 2007.

In the past, some web servers were configured to serve all documents with the filename extension .css as mime type application/x-pointplus rather than text/css. At the time, the Net-Scene company was selling PointPlus Maker to convert PowerPoint files into Compact Slide Show files (using a .css extension).

Variations
CSS has various levels and profiles. Each level of CSS builds upon the last, typically adding new features and typically denoted as CSS1, CSS2, and CSS3. Profiles are typically a subset of one or more levels of CSS built for a particular device or user interface. Currently there are profiles for mobile devices, printers, and television sets. Profiles should not be confused with media types which were added in CSS2.

CSS 1
The first CSS specification to become an official W3C Recommendation is CSS level 1, published in December 1996. Among its capabilities are support for:
- Font properties such as typeface and emphasis
- Color of text, backgrounds, and other elements
- Text attributes such as spacing between words, letters, and lines of text
- Alignment of text, images, tables and other elements
- Margin, border, padding, and positioning for most elements
- Unique identification and generic classification of groups of attributes

The W3C maintains the CSS1 Recommendation.


CSS 2
CSS level 2 was developed by the W3C and published as a Recommendation in May 1998. A superset of CSS1, CSS2 includes a number of new capabilities like absolute, relative, and fixed positioning of elements, the concept of media types, support for aural style sheets and bidirectional text, and new font properties such as shadows. The W3C maintains the CSS2 Recommendation. CSS level 2 revision 1 or CSS 2.1 fixes errors in CSS2, removes poorly-supported features and adds already-implemented browser extensions to the specification. While it was a Candidate Recommendation for several months, on June 15, 2005 it was reverted to a working draft for further review. It was returned to Candidate Recommendation status on 19 July 2007.

CSS 3
CSS level 3 is currently under development. The W3C maintains a CSS3 progress report. CSS3 is modularized and will consist of several separate Recommendations. The W3C CSS3 Roadmap provides a summary and introduction.

7-5-3. Browser support

A CSS filter is a coding technique that aims to effectively hide or show parts of the CSS to different browsers, either by exploiting CSS-handling quirks or bugs in the browser, or by taking advantage of lack of support for parts of the CSS specifications. Using CSS filters, some designers have gone as far as delivering entirely different CSS to certain browsers in order to ensure that designs are rendered as expected. Because very early web browsers were either completely incapable of handling CSS or rendered CSS very poorly, designers today often routinely use CSS filters that completely prevent these browsers from accessing any of the CSS. Internet Explorer support for CSS began with IE 3.0 and increased progressively with each version. By 2008, the first Beta of Internet Explorer 8 offered support for CSS 2.1 in its best web standards mode.

An example of a well-known CSS browser bug is the Internet Explorer box model bug, where box widths are interpreted incorrectly in several versions of the browser, resulting in blocks which are too narrow when viewed in Internet Explorer, but correct in standards-compliant browsers. The bug can be avoided in Internet Explorer 6 by using the correct doctype in (X)HTML documents. CSS hacks and CSS filters are used to compensate for bugs such as this, just one of hundreds of CSS bugs that have been documented in various versions of Netscape, Mozilla Firefox, Opera, and Internet Explorer (including Internet Explorer 7).

Even when the availability of CSS-capable browsers made CSS a viable technology, the adoption of CSS was still held back by designers' struggles with browsers' incorrect CSS implementation and patchy CSS support. Even today, these problems continue to make the business of CSS design more complex and costly than it should be, and cross-browser testing remains a necessity. Other reasons for continuing non-adoption of CSS are: its perceived complexity, authors' lack of familiarity with CSS syntax and required techniques, poor support from authoring tools, the risks posed by inconsistency between browsers and the increased costs of testing.

Currently there is strong competition between Mozilla's Gecko layout engine, the WebKit layout engine used in Apple's Safari, the similar KHTML engine used in KDE's Konqueror browser, and Opera's Presto layout engine; each of them is leading in different aspects of CSS. As of 2007, Internet Explorer's Trident engine remains the worst at rendering CSS as judged by World Wide Web Consortium standards. In April 2008, the Internet Explorer 8 beta fixed many of these shortcomings and renders CSS 2.1. The IEBlog claims that it passes some versions of the ACID2 test.

7-5-4. Limitations
Some noted disadvantages of using "pure" CSS include:

Inconsistent browser support
Different browsers will render CSS layout differently as a result of browser bugs or lack of support for CSS features. For example, Microsoft Internet Explorer, whose older versions, such as IE 6.0, implemented many CSS 2.0 properties in their own, incompatible way, misinterpreted a significant number of important properties, such as width, height, and float. Numerous so-called CSS "hacks" must be implemented to achieve consistent layout among the most popular or commonly used browsers. Pixel-precise layouts can sometimes be impossible to achieve across browsers.

Selectors are unable to ascend
CSS offers no way to select a parent or ancestor of an element that satisfies certain criteria. A more advanced selector scheme (such as XPath) would enable more sophisticated stylesheets. However, the major reasons for the CSS Working Group rejecting proposals for parent selectors are related to browser performance and incremental rendering issues.

One block declaration cannot explicitly inherit from another
Inheritance of styles is performed by the browser based on the containment hierarchy of DOM elements and the specificity of the rule selectors, as suggested by section 6.4.1 of the CSS2 specification. Only the user of the blocks can refer to them, by including class names in the class attribute of a DOM element.

Vertical control limitations
While horizontal placement of elements is generally easy to control, vertical placement is frequently unintuitive, convoluted, or impossible. Simple tasks, such as centering an element vertically or placing a footer no higher than the bottom of the viewport, either require complicated and unintuitive style rules, or simple but widely unsupported rules.

Absence of expressions
There is currently no ability to specify property values as simple expressions (such as margin-left: 10% - 3em + 4px;). This would be useful in a variety of cases, such as calculating the size of columns subject to a constraint on the sum of all columns. However, a working draft with a calc() value to address this limitation has been published by the CSS WG, and Internet Explorer 5 and all later versions support a proprietary expression() statement with similar functionality.

Lack of orthogonality
Multiple properties often end up doing the same job. For instance, position, display and float specify the placement model, and most of the time they cannot be combined meaningfully. A display: table-cell element cannot be floated or given position: relative, and an element with float: left should not react to changes of display. In addition, some properties are not defined in a flexible way that avoids the creation of new properties. For example, you should use the "border-spacing" property on the table element instead of the "margin-*" property on table cell elements, because according to the CSS specification, internal table elements do not have margins.

Margin collapsing
Margin collapsing, while well-documented and useful, is also complicated and is frequently not expected by authors, and no simple side-effect-free way is available to control it.

Float containment
CSS does not explicitly offer any property that would force an element to contain floats. Multiple properties offer this functionality as a side effect, but none of them are completely appropriate in all situations. An overflow can occur when floated elements inside a container are taller than the container itself; generally, either "position: relative" or "overflow: hidden" on the container solves this. Float behavior also varies with the web browser's window size and resolution, whereas positions do not.

Lack of multiple backgrounds per element
Highly graphical designs require several background images for every element, and CSS can support only one. Therefore, developers have to choose between adding redundant wrappers around document elements, or dropping the visual effect. This is partially addressed in the working draft of the CSS3 backgrounds module, which is already supported in Safari and Konqueror.

Control of Element Shapes
CSS currently only offers rectangular shapes. Rounded corners or other shapes may require non-semantic markup. However, this is addressed in the working draft of the CSS3 backgrounds module.

Lack of Variables
CSS contains no variables. This makes it necessary to do a "replace-all" when one desires to change a fundamental constant, such as the color scheme or various heights and widths. This may not even be possible to do in a reasonable way (consider the case where one wants to replace certain heights which are 50px, but not others which are also 50px; this would require very complicated regular expressions). In turn, many developers are now using PHP to control and output the CSS file, either by CSS @import/PHP require, or by declaring a different header in the PHP/CSS document for the correct parsing mode. The main disadvantage of this is the lack of CSS caching, but it can be very useful in many situations.

Lack of column declaration
While possible in current CSS, layouts with multiple columns can be complex to implement. With the current CSS, the process is often done using floating elements, which are often rendered differently by different browsers, different computer screen shapes, and different screen ratios set on standard monitors.

Cannot explicitly declare new scope independently of position
Scoping rules for properties such as z-index look for the closest parent element with a position:absolute or position:relative attribute. This odd coupling has two undesired effects: 1) it is impossible to avoid declaring a new scope when one is forced to adjust an element's position, preventing one from using the desired scope of a parent element, and 2) users are often not aware that they must declare position:relative or position:absolute on any element they want to act as "the new scope". Additionally, a bug in the Firefox browser prevents one from declaring table elements as a new CSS scope using position:relative (one can technically do so, but numerous graphical glitches result).

7-5-5. Advantages
By combining CSS with the functionality of a Content Management System, a considerable amount of flexibility can be programmed into content submission forms. This allows a contributor, who may not be familiar with or able to understand or edit CSS or HTML code, to select the layout of an article or other page they are submitting on-the-fly, in the same form. For instance, a contributor, editor or author of an article or page might be able to select the number of columns and whether or not the page or article will carry an image. This information is then passed to the Content Management System, and the program logic will evaluate the information and determine, based on a certain number of combinations, how to apply classes and IDs to the HTML elements, therefore styling and positioning them according to the pre-defined CSS for that particular layout type. When working with large-scale, complex sites, with many contributors such as news and informational sites, this advantage weighs heavily on the feasibility and maintenance of the project.

When CSS is used effectively, in terms of inheritance and "cascading," a global stylesheet can be used to affect and style elements site-wide. If the situation arises that the styling of the elements should need to be changed or adjusted, these changes can be made easily, simply by editing a few rules in the global stylesheet. Before CSS, this sort of maintenance was more difficult, expensive and time-consuming.


8. Web Scripting languages.


8-1. PHP
PHP is a computer scripting language. Originally designed for producing dynamic web pages, it has evolved to include a command-line interface capability and can be used in standalone graphical applications. While PHP was originally created by Rasmus Lerdorf in 1995, the main implementation of PHP is now produced by The PHP Group and serves as the de facto standard for PHP, as there is no formal specification. PHP is released under the PHP License, and the Free Software Foundation considers it to be free software.

PHP is a widely-used general-purpose scripting language that is especially suited for web development and can be embedded into HTML. It generally runs on a web server, taking PHP code as its input and creating web pages as output. It can be deployed on most web servers and on almost every operating system and platform free of charge. PHP is installed on more than 20 million websites and 1 million web servers. The most recent major release of PHP was version 5.2.6 on May 1, 2008.

8-1-1. History
PHP originally stood for Personal Home Page. It began in 1994 as a set of Common Gateway Interface binaries written in the C programming language by the Danish/Greenlandic programmer Rasmus Lerdorf. Lerdorf initially created these Personal Home Page Tools to replace a small set of Perl scripts he had been using to maintain his personal homepage. The tools were used to perform tasks such as displaying his résumé and recording how much traffic his page was receiving. He combined these binaries with his Form Interpreter to create PHP/FI, which had more functionality. PHP/FI included a larger implementation for the C programming language and could communicate with databases, enabling the building of simple, dynamic web applications.

Lerdorf released PHP publicly on June 8, 1995 to accelerate bug location and improve the code. This release was named PHP version 2 and already had the basic functionality that PHP has today. This included Perl-like variables, form handling, and the ability to embed HTML. The syntax was similar to Perl but was more limited, simpler, and less consistent.

Figure 8-1: Rasmus Lerdorf, who wrote the original Common Gateway Interface binaries, and Andi Gutmans and Zeev Suraski, who rewrote the parser that formed PHP 3

Zeev Suraski and Andi Gutmans, two Israeli developers at the Technion IIT, rewrote the parser in 1997 and formed the base of PHP 3, changing the language's name to the recursive initialism PHP: Hypertext Preprocessor. The development team officially released PHP/FI 2 in November 1997 after months of beta testing. Afterwards, public testing of PHP 3 began, and the official launch came in June 1998. Suraski and Gutmans then started a new rewrite of PHP's core, producing the Zend Engine in 1999. They also founded Zend Technologies in Ramat Gan, Israel.

On May 22, 2000, PHP 4, powered by the Zend Engine 1.0, was released. On July 13, 2004, PHP 5 was released, powered by the new Zend Engine II. PHP 5 included new features such as improved support for object-oriented programming, the PHP Data Objects extension (which defines a lightweight and consistent interface for accessing databases), and numerous performance enhancements. The most recent update released by The PHP Group is for the older PHP version 4 code branch. As of August 2008, this branch is up to version 4.4.9. PHP 4 is no longer under development, nor will any security updates be released.

In 2008, PHP 5 became the only stable version under development. Late static binding has been missing from PHP and will be added in version 5.3. PHP 6 is under development alongside PHP 5. Major changes include the removal of register_globals, magic quotes, and safe mode. PHP does not have complete native support for Unicode or multibyte strings; Unicode support will be included in PHP 6. Many high-profile open source projects ceased to support PHP 4 in new code as of February 5, 2008, due to the GoPHP5 initiative, provided by a consortium of PHP developers promoting the transition from PHP 4 to PHP 5.

PHP runs in both 32-bit and 64-bit environments, but on Windows the only official distribution is 32-bit, requiring Windows 32-bit compatibility mode to be enabled while using IIS in a 64-bit Windows environment. There is a third-party distribution available for 64-bit Windows.

8-1-2. Usage
PHP is a general-purpose scripting language that is especially suited for web development. PHP generally runs on a web server, taking PHP code as its input and creating web pages as output. It can also be used for command-line scripting and client-side GUI applications. PHP can be deployed on most web servers, many operating systems and platforms, and can be used with many relational database management systems. It is available free of charge, and the PHP Group provides the complete source code for users to build, customize and extend for their own use.

PHP primarily acts as a filter, taking input from a file or stream containing text and/or PHP instructions and outputting another stream of data; most commonly the output will be HTML. It can automatically detect the language of the user. From PHP 4, the PHP parser compiles input to produce bytecode for processing by the Zend Engine, giving improved performance over its interpreter predecessor.

Originally designed to create dynamic web pages, PHP's principal focus is server-side scripting, and it is similar to other server-side scripting languages that provide dynamic content from a web server to a client, such as Microsoft's ASP.NET system, Sun Microsystems' JavaServer Pages, and mod_perl. PHP has also attracted the development of many frameworks that provide building blocks and a design structure to promote rapid application development (RAD). Some of these include CakePHP, PRADO, Symfony and Zend Framework, offering features similar to other web application frameworks.

The LAMP architecture has become popular in the web industry as a way of deploying web applications. PHP is commonly used as the P in this bundle alongside Linux, Apache and MySQL, although the P may also refer to Python or Perl. As of April 2007, over 20 million Internet domains were hosted on servers with PHP installed, and PHP was recorded as the most popular Apache module. Significant websites written in PHP include the user-facing portion of Facebook, Wikipedia (MediaWiki), Yahoo!, MyYearbook and Tagged.


8-1-3. Speed optimization

As with many scripting languages, PHP scripts are normally kept as human-readable source code, even on production web servers. Therefore, these PHP scripts will be compiled at runtime by the PHP engine. Compiling at runtime increases the execution time of the script because it adds an extra step in runtime. PHP scripts can be compiled before runtime using PHP compilers, just like other programming languages such as C (the programming language PHP is programmed in and used to program PHP extensions). Code optimizers improve the quality of the compiled code by reducing its size and making changes that can reduce the execution time and improve performance. The nature of the PHP compiler is such that there are often opportunities for code optimization, and an example of a code optimizer is the Zend Optimizer PHP extension. PHP accelerators can offer significant performance gains by caching the compiled form of a PHP script in shared memory to avoid the overhead of parsing and compiling the code every time the script runs.

8-1-4. Security
The proportion of insecure software written in PHP, out of the total of all common software vulnerabilities, amounted to: 12% in 2003, 20% in 2004, 28% in 2005, 43% in 2006, 36% in 2007, and 33.8% for the first quarter of 2008. More than a third of these PHP software vulnerabilities have been listed recently. Most of these software vulnerabilities can be exploited remotely, that is, without being logged on to the computer hosting the vulnerable application. The most common vulnerabilities are caused by not following best practice programming rules, and by vulnerabilities in software written in old PHP versions. One very common security concern is register_globals, which has been disabled by default since PHP 4.2 (2002) and was removed in PHP 6. There are advanced protection patches such as Suhosin and Hardening-Patch, especially designed for web hosting environments. Installing PHP as a CGI binary rather than as an Apache module is the preferred method for added security.


8-1-5. Syntax
PHP only parses code within its delimiters. Anything outside its delimiters is sent directly to the output and is not parsed by PHP. The most common delimiters are <?php and ?>, which are open and close delimiters respectively. <script language="php"> and </script> delimiters are also available. Short tags (<? or <?= and ?>) are also commonly used, but like ASP-style tags (<% or <%= and %>), they are less portable as they can be disabled in the PHP configuration. For this reason, the use of short tags and ASP-style tags is discouraged. The purpose of these delimiters is to separate PHP code from non-PHP code, including HTML. Everything outside the delimiters is ignored by the parser and is passed through as output.
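A minimal sketch of PHP embedded in HTML using the standard delimiters (the variable name and text are invented for the example):

<html>
 <body>
  <?php
    // everything inside the delimiters is executed by PHP;
    // everything outside them is passed through to the output unchanged
    $visitor = "world";
    echo "Hello, " . $visitor . "!";
  ?>
 </body>
</html>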

Figure 8-2: Syntax-highlighted PHP code embedded within HTML

Variables are prefixed with a dollar symbol and a type does not need to be specified in advance. Unlike function and class names, variable names are case sensitive. Both double-quoted ("") and heredoc strings allow the ability to embed a variable's value into the string. PHP treats newlines as whitespace in the manner of a free-form language (except when inside string quotes), and statements are terminated by a semicolon. PHP has three types of comment syntax: /* */ serves as block comments, and // as well as # are used for inline comments. The echo statement is one of several facilities PHP provides to output text (e.g. to a web browser). In terms of keywords and language syntax, PHP is similar to most high-level languages that follow the C style syntax. If conditions, for and while loops, and function returns are similar in syntax to languages such as C, C++, Java and Perl.

Data types

PHP stores whole numbers in a platform-dependent range. This range is typically that of 32-bit signed integers. Unsigned integers are converted to signed values in certain situations; this behavior is different from other programming languages. Integer variables can be assigned using decimal (positive and negative), octal, and hexadecimal notations. Real numbers are also stored in a platform-specific range. They can be specified using floating point notation, or two forms of scientific notation. PHP has a native Boolean type that is similar to the native Boolean types in Java and C++. Using the Boolean type conversion rules, non-zero values are interpreted as true and zero as false, as in Perl and C++. The null data type represents a variable that has no value. The only value in the null data type is NULL. Variables of the "resource" type represent references to resources from external sources. These are typically created by functions from a particular extension, and can only be processed by functions from the same extension; examples include file, image, and database resources. Arrays can contain elements of any type that PHP can handle, including resources, objects, and even other arrays. Order is preserved in lists of values and in hashes with both keys and values, and the two can be intermingled. PHP also supports strings, which can be used with single quotes, double quotes, or heredoc syntax. The Standard PHP Library (SPL) attempts to solve standard problems and implements efficient data access interfaces and classes.

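A short sketch of these data types in use (all names and values are invented for the example):

<?php
$count   = 42;                  // integer
$price   = 19.95;               // floating point number
$visible = true;                // Boolean
$nothing = null;                // the null type has only the value NULL
$name    = 'PHP';               // single-quoted string (no interpolation)
$label   = "Count: $count";     // double-quoted string with an embedded variable
$mixed   = array(1, "two", 3.0, array("nested" => true));  // array holding mixed types
?>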
Functions
PHP has hundreds of base functions and thousands more from extensions.

5.2 and earlier

Functions are not first-class functions and can only be referenced by their name. User-defined functions can be created at any time without being prototyped. Functions can be defined inside code blocks, permitting a run-time decision as to whether or not a function should be defined. Function calls must use parentheses, with the exception of zero-argument class constructor functions called with the PHP new operator, where parentheses are optional. PHP supports quasi-anonymous functions through the create_function() function; these are not true anonymous functions, because anonymous functions are nameless, whereas in PHP functions can only be referenced by name, either directly or indirectly through a variable, e.g. $function_name();.

5.3 and newer


PHP gained support for first-class functions and closures. True anonymous functions are supported using the following syntax:

function getAdder($x) {
    return function ($y) use ($x) {  // or: lexical $x;
        return $x + $y;
    };
}

Here, the getAdder() function creates a closure over its parameter $x (the use keyword imports the variable from the enclosing scope); the returned anonymous function takes an additional argument $y and returns the sum $x + $y to its caller. Such a function can be stored in a variable, passed as a parameter to other functions, and so on. For more details see the Lambda functions and closures RFC.
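A brief usage sketch of the function above (the values are illustrative):

$add5 = getAdder(5);   // $add5 now holds a closure that remembers $x = 5
echo $add5(3);         // prints 8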

Objects
Basic object-oriented programming functionality was added in PHP 3. Object handling was completely rewritten for PHP 5, expanding the feature set and enhancing performance. In previous versions of PHP, objects were handled like primitive types. The drawback of this method was that the whole object was copied when a variable was assigned or passed as a parameter to a method. In the new approach, objects are referenced by handle, and not by value.

PHP 5 introduced private and protected member variables and methods, along with abstract classes and final classes as well as abstract methods and final methods. It also introduced a standard way of declaring constructors and destructors, similar to that of other object-oriented languages such as C++, and a standard exception handling model. Furthermore, PHP 5 added interfaces and allowed for multiple interfaces to be implemented. There are special interfaces that allow objects to interact with the runtime system. Objects implementing ArrayAccess can be used with array syntax and objects implementing Iterator or IteratorAggregate can be used with the foreach language construct. There is no virtual table feature in the engine, so static variables are bound with a name instead of a reference at compile time.

If the developer creates a copy of an object using the reserved word clone, the Zend engine will check whether a __clone() method has been defined. If not, it will call a default __clone() which will copy the object's properties. If a __clone() method is defined, then it will be responsible for setting the necessary properties in the created object. For convenience, the engine will supply a function that imports the properties of the source object, so that the programmer can start with a by-value replica of the source object and only override properties that need to be changed.
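As a small sketch of the PHP 5 object syntax described above (the class, property and method names are invented for the example):

<?php
class Counter {
    private $value;                              // private member variable (PHP 5)

    public function __construct($start = 0) {    // standard constructor syntax
        $this->value = $start;
    }

    public function increment() {
        $this->value++;
    }

    public function getValue() {
        return $this->value;
    }
}

$a = new Counter(10);
$b = $a;              // $b references the same object (handle), not a copy
$c = clone $a;        // clone produces an independent copy
$b->increment();
echo $a->getValue();  // prints 11, because $a and $b share one object
echo $c->getValue();  // prints 10
?>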

8-1-6. Resources



PHP includes free and open source libraries with the core build. PHP is a fundamentally Internet-aware system with modules built in for accessing FTP servers, many database servers, embedded SQL libraries such as embedded MySQL and SQLite, LDAP servers, and others. Many functions familiar to C programmers, such as those in the stdio family, are available in the standard PHP build.

PHP has traditionally used features such as "magic_quotes_gpc" and "magic_quotes_runtime", which attempt to escape apostrophes (') and quotes (") in strings on the assumption that they will be used in databases, to prevent SQL injection attacks. This leads to confusion over which data is escaped and which is not, and to problems when data is not in fact used as input to a database and when the escaping used is not completely correct. To make code portable between servers which do and do not use magic quotes, developers can preface their code with a script to reverse the effect of magic quotes when it is applied.

PHP allows developers to write extensions in C to add functionality to the PHP language. These can then be compiled into PHP or loaded dynamically at runtime. Extensions have been written to add support for the Windows API, process management on Unix-like operating systems, multibyte strings (Unicode), cURL, and several popular compression formats. Some more unusual features include integration with Internet Relay Chat, dynamic generation of images and Adobe Flash content, and even speech synthesis. The PHP Extension Community Library (PECL) project is a repository for extensions to the PHP language. Zend provides a certification program for programmers to become certified PHP developers.

8-2. Active Server Pages


Active Server Pages (ASP) is Microsoft's first server-side script engine for dynamically-generated web pages. It was initially marketed as an add-on to Internet Information Services (IIS) via the Windows NT 4.0 Option Pack, but has been included as a free component of Windows Server since the initial release of Windows 2000 Server.
Programming ASP websites is made easier by various built-in objects. Each object corresponds to a group of frequently-used functions useful for creating dynamic web pages. In ASP 2.0 there are six such built-in objects: Application, ASPError, Request, Response, Server, and Session. Session, for example, is a cookie-based session object that maintains variables from page to page. Web pages with the ".asp" file extension use ASP, although some Web sites disguise their choice of


scripting language for security purposes. The ".aspx" extension is not an ASP page but an ASP.NET page; ASP.NET is another server-side scripting language from Microsoft, based on a mixture of traditional ASP and Microsoft's .NET technology. Most ASP pages are written in VBScript, but any other Active Scripting engine can be selected instead by using the @Language directive or the <script language="language" runat="server"> syntax. JScript (Microsoft's implementation of ECMAScript) is the other language that is usually available. PerlScript (a derivative of Perl) and others are available as third-party installable Active Scripting engines.

8-3. JavaScript
JavaScript is a scripting language most often used for client-side web development. It was the originating dialect of the ECMAScript standard. It is a dynamic, weakly typed, prototype-based language with first-class functions. JavaScript was influenced by many languages and was designed to look like Java, but be easier for non-programmers to work with.
Although best known for its use in websites (as client-side JavaScript), JavaScript is also used to enable scripting access to objects embedded in other applications (see below). Despite the name, JavaScript is essentially unrelated to the Java programming language, although both share the common C syntax, and JavaScript copies many Java names and naming conventions. The language was originally named "LiveScript" but was renamed in a co-marketing deal between Netscape and Sun, in exchange for Netscape bundling Sun's Java runtime with their then-dominant browser. The key design principles within JavaScript are inherited from the Self and Scheme programming languages. "JavaScript" is a trademark of Sun Microsystems. It was used under license for technology invented and implemented by Netscape Communications and current entities such as the Mozilla Foundation.

8-3-1. History and naming


JavaScript was originally developed by Brendan Eich of Netscape under the name Mocha, which was later renamed to LiveScript, and finally to JavaScript. The change of name from LiveScript to JavaScript roughly coincided with Netscape



adding support for Java technology in its Netscape Navigator web browser. JavaScript was first introduced and deployed in the Netscape browser version 2.0B3 in December 1995. The naming has caused confusion, giving the impression that the language is a spin-off of Java, and it has been characterized by many as a marketing ploy by Netscape to give JavaScript the cachet of what was then the hot new web-programming language. Microsoft named its dialect of the language JScript to avoid trademark issues. JScript was first supported in Internet Explorer version 3.0, released in August 1996, and it included Y2K-compliant date functions, unlike those based on java.util.Date in JavaScript at the time. The dialects are perceived to be so similar that the terms "JavaScript" and "JScript" are often used interchangeably (including in this article). Microsoft, however, notes dozens of ways in which JScript is not ECMA compliant. Netscape submitted JavaScript to Ecma International for standardization resulting in the standardized version named ECMAScript.

8-3-2. Features
Structured programming
JavaScript supports all the structured programming syntax in C (e.g., if statements, while loops, switch statements, etc.). One partial exception is scoping: C-style block-level scoping is not supported. JavaScript 1.7, however, supports block-level scoping with the let keyword. Like C, JavaScript makes a distinction between expressions and statements.
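As a brief illustration (a minimal sketch with invented names; at the time, the let keyword was available only in Mozilla's JavaScript 1.7, enabled with <script type="application/javascript;version=1.7">):

function scopeDemo() {
    if (true) {
        var a = 1;   // var is function-scoped: visible throughout scopeDemo
        let b = 2;   // let is block-scoped: visible only inside this if-block
    }
    return a;        // works; referring to b here would raise an error
}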

Dynamic programming

dynamic typing
As in most scripting languages, types are associated with values, not variables. For example, a variable x could be bound to a number, then later rebound to a string. JavaScript supports various ways to test the type of an object, including duck typing.

objects as associative arrays
JavaScript is heavily object-based. Objects are associative arrays, augmented with prototypes (see below). Object property names are associative array keys: obj.x = 10 and obj["x"] = 10 are equivalent, the dot notation being merely syntactic sugar. Properties and their values can be added, changed, or deleted at run-time. The properties of an object can also be enumerated via a for...in loop.
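A minimal sketch of both points (the variable and property names are invented for illustration):

var x = 42;              // x holds a number
x = "forty-two";         // the same variable now holds a string

var obj = {};            // an empty object
obj.x = 10;              // dot notation ...
obj["x"] = 10;           // ... and bracket notation refer to the same property
obj.label = "demo";      // properties can be added at run-time
delete obj.x;            // ... or deleted again

for (var key in obj) {   // enumerates the remaining property names
    // key takes the value "label" on the single iteration
}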


run-time evaluation
JavaScript includes an eval function that can execute statements provided as strings at run-time.
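For instance (a trivial sketch):

var program = "2 + 3 * 4";
var result = eval(program);            // evaluates the string: result is 14
eval("var definedAtRunTime = true;");  // eval can even introduce new variables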

Function-level programming

first-class functions
Functions are first-class; they are objects themselves. As such, they have properties and can be passed around and interacted with like any other object.

inner functions and closures
Inner functions (functions defined within other functions) are created each time the outer function is invoked, and variables of the outer function for that invocation continue to exist as long as the inner functions still exist, even after that invocation is finished (for example, if the inner function was returned, it still has access to the outer function's variables). This is the mechanism behind closures within JavaScript.

Prototype-based

prototypes
JavaScript uses prototypes instead of classes for defining object properties, including methods, and inheritance. It is possible to simulate many class-based features with prototypes in JavaScript.

functions as object constructors
Functions double as object constructors along with their typical role. Prefixing a function call with new creates a new object and calls that function with its local this keyword bound to that object for that invocation. The function's prototype property determines the new object's prototype.

functions as methods
Unlike many object-oriented languages, there is no distinction between a function definition and a method definition. Rather, the distinction occurs during function calling; a function can be called as a method. When a function is invoked as a method of an object, the function's local this keyword is bound to that object for that invocation.
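The closure and constructor mechanisms described above can be sketched as follows (a minimal example with invented names):

// Closure: makeCounter's local variable 'count' survives after the call returns
function makeCounter() {
    var count = 0;
    return function () {        // the inner function keeps access to 'count'
        count = count + 1;
        return count;
    };
}
var next = makeCounter();
next(); // 1
next(); // 2

// Prototype-based construction: calling a function with 'new' creates a new object
function Point(x, y) {
    this.x = x;                 // 'this' is bound to the newly created object
    this.y = y;
}
Point.prototype.norm = function () {   // shared through the prototype, not copied per object
    return Math.sqrt(this.x * this.x + this.y * this.y);
};
var p = new Point(3, 4);
p.norm(); // 5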



Others

run-time environment
JavaScript typically relies on a run-time environment (e.g. in a web browser) to provide objects and methods by which scripts can interact with "the outside world". (This is not a language feature per se, but it is common in most JavaScript implementations.)

variadic functions
An indefinite number of parameters can be passed to a function. The function can access them both through its formal parameters and through the local arguments object.

array and object literals
Like many scripting languages, arrays and objects (associative arrays in other languages) can be created with a succinct shortcut syntax. The object literal in particular is the basis of the JSON data format.

regular expressions
JavaScript also supports regular expressions in a manner similar to Perl, which provide a concise and powerful syntax for text manipulation that is more sophisticated than the built-in string functions.
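A short sketch of these three features (the values are invented for illustration):

// Variadic function: extra arguments are available through the arguments object
function sum() {
    var total = 0;
    for (var i = 0; i < arguments.length; i++) {
        total += arguments[i];
    }
    return total;
}
sum(1, 2, 3); // 6

// Array and object literals
var langs = ["PHP", "ASP", "JavaScript"];
var item = { name: "Acme Gizmo", price: 199 };   // object literals underlie JSON

// Regular expression: replace every digit in a string
"Room 101".replace(/\d/g, "#"); // "Room ###"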

8-3-3. Use in web pages


The primary use of JavaScript is to write functions that are embedded in or included from HTML pages and interact with the Document Object Model (DOM) of the page. Some simple examples of this usage are:
- Opening or popping up a new window with programmatic control over the size, position, and attributes of the new window (i.e. whether the menus, toolbars, etc. are visible).
- Validation of web form input values to make sure that they will be accepted before they are submitted to the server (a brief sketch of this follows the list).
- Changing images as the mouse cursor moves over them: this effect is often used to draw the user's attention to important links displayed as graphical elements.
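As an illustration of the second item above, a form can be checked before it is submitted (a hedged sketch; the form and field names are invented):

// Assumed markup: <form name="order" onsubmit="return checkForm(this);">
// containing a text field named "qty"
function checkForm(form) {
    var value = form.qty.value;
    if (value === "" || isNaN(value)) {
        alert("Please enter a numeric quantity.");
        return false;   // cancels the submission, so nothing is sent to the server
    }
    return true;        // the browser submits the form as usual
}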

Because JavaScript code can run locally in a user's browser (rather than on a remote server), it can respond to user actions quickly, making an application feel more responsive. Furthermore, JavaScript code can detect user actions which HTML alone cannot, such as individual keystrokes. Applications such as Gmail take advantage of this: much of the user-interface logic is written in JavaScript, and JavaScript dispatches requests for information (such as the content of an e-mail message) to the server. The wider trend of Ajax programming similarly exploits this strength. A JavaScript engine (also known as a JavaScript interpreter or JavaScript implementation) is an interpreter that interprets JavaScript source code and executes the script accordingly. The first ever JavaScript engine was created by Brendan Eich at Netscape Communications Corporation, for the Netscape Navigator web browser. The engine, code-named SpiderMonkey, is implemented


in C. It has since been updated (in JavaScript 1.5) to conform to ECMA-262 Edition 3. The Rhino engine, created primarily by Norris Boyd (also at Netscape), is a JavaScript implementation in Java. Rhino, like SpiderMonkey, is ECMA-262 Edition 3 compliant. The most common host environment for JavaScript is by far a web browser. Web browsers typically use the public API to create "host objects" responsible for reflecting the DOM into JavaScript. The web server is another common application of the engine. A JavaScript webserver would expose host objects representing an HTTP request and response objects, which a JavaScript program could then manipulate to dynamically generate web pages. A minimal example of a web page containing JavaScript (using HTML 4.01 syntax) would be:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
 <head><title>simple page</title></head>
 <body>
  <script type="text/javascript">
   document.write('Hello World!');
  </script>
  <noscript>
   <p>Your browser either does not support JavaScript, or you have JavaScript turned off.</p>
  </noscript>
 </body>
</html>

Compatibility considerations
The DOM interfaces for manipulating web pages are not part of the ECMAScript standard, or of JavaScript itself. Officially, they are defined by a separate standardization effort by the W3C; in practice, browser implementations differ from the standards and from each other, and not all browsers execute JavaScript. To deal with these differences, JavaScript authors can attempt to write standards-compliant code which will also be executed correctly by most browsers; failing that, they can write code that checks for the presence of certain browser features and behaves differently if they are not available. In some cases, two browsers may both implement a feature but with different behavior, and authors may find it



practical to detect what browser is running and change their script's behavior to match. Programmers may also use libraries or toolkits which take browser differences into account. Furthermore, scripts will not work for all users. For example, a user may:
- use an old or rare browser with incomplete or unusual DOM support,
- use a PDA or mobile phone browser which cannot execute JavaScript,
- have JavaScript execution disabled as a security precaution, or
- be visually or otherwise disabled and use a speech browser.

To support these users, web authors can try to create pages which degrade gracefully on user agents (browsers) which do not support the page's JavaScript.

Security
JavaScript and the DOM provide the potential for malicious authors to deliver scripts to run on a client computer via the web. Browser authors contain this risk using two restrictions. First, scripts run in a sandbox in which they can only perform web-related actions, not general-purpose programming tasks like creating files. Second, scripts are constrained by the same origin policy: scripts from one web site do not have access to information such as usernames, passwords, or cookies sent to another site. Most JavaScript-related security bugs are breaches of either the same origin policy or the sandbox.

Cross-site vulnerabilities
A common JavaScript-related security problem is cross-site scripting, or XSS, a violation of the same-origin policy. XSS vulnerabilities occur when an attacker is able to cause a trusted web site, such as an online banking website, to include a malicious script in the webpage presented to a victim. The script in this example can then access the banking application with the privileges of the victim, potentially disclosing secret information or transferring money without the victim's authorization. XSS vulnerabilities can also occur because of implementation mistakes by browser authors. XSS is related to cross-site request forgery or XSRF. In XSRF one website causes a victim's browser to generate fraudulent requests to another site with the victim's legitimate HTTP cookies attached to the request.



Misunderstanding the client-server boundary


Client-server applications, whether they involve JavaScript or not, must assume that untrusted clients may be under the control of attackers. Thus any secret embedded in JavaScript could be extracted by a determined adversary, and the output of JavaScript operations should not be trusted by the server. Some implications:
- Web site authors cannot perfectly conceal how their JavaScript operates, because the code is sent to the client, and obfuscated code can be reverse engineered.
- JavaScript form validation only provides convenience for users, not security. If a site verifies that the user agreed to its terms of service, or filters invalid characters out of fields that should only contain numbers, it must do so on the server, not only the client.
- It would be extremely bad practice to embed a password in JavaScript (where it can be extracted by an attacker), then have JavaScript verify a user's password and pass "password_ok=1" back to the server (since the "password_ok=1" response is easy to forge).

It also does not make sense to rely on JavaScript to prevent user interface operations (such as "view source" or "save image"). This is because a client could simply ignore such scripting.

Browser and plugin coding errors


JavaScript provides an interface to a wide range of browser capabilities, some of which may have flaws such as buffer overflows. These flaws can allow attackers to write scripts which would run any code they wish on the user's system. These flaws have affected major browsers including Firefox, Internet Explorer, and Safari. Plugins, such as video players, Macromedia Flash, and the wide range of ActiveX controls enabled by default in Microsoft Internet Explorer, may also have flaws exploitable via JavaScript, and such flaws have been exploited in the past. In Windows Vista, Microsoft has attempted to contain the risks of bugs such as buffer overflows by running the Internet Explorer process with limited privileges.



Sandbox implementation errors


Web browsers are capable of running JavaScript outside of the sandbox, with the privileges necessary to, for example, create or delete files. Of course, such privileges aren't meant to be granted to code from the web. Incorrectly granting privileges to JavaScript from the web has played a role in vulnerabilities in both Internet Explorer and Firefox. In Windows XP Service Pack 2, Microsoft demoted JScript's privileges in Internet Explorer. Some versions of Microsoft Windows allow JavaScript stored on a computer's hard drive to run as a general-purpose, non-sandboxed program. This makes JavaScript (like VBScript) a theoretically viable vector for a Trojan horse, although JavaScript Trojan horses are uncommon in practice. (See Windows Script Host.)

8-3-4. Uses outside web pages


Outside the web, JavaScript interpreters are embedded in a number of tools. Each of these applications provides its own object model which provides access to the host environment, with the core JavaScript language remaining mostly the same in each application.
- ActionScript, the programming language used in Adobe Flash, is another implementation of the ECMAScript standard.
- Apple's Dashboard Widgets, Microsoft's Gadgets, Yahoo! Widgets, and Google Desktop Gadgets are implemented using JavaScript.
- The Mozilla platform, which underlies Firefox and some other web browsers, uses JavaScript to implement the graphical user interface (GUI) of its various products.
- Adobe's Acrobat and Adobe Reader (formerly Acrobat Reader) support JavaScript in PDF files.
- Tools in the Adobe Creative Suite, including Photoshop, Illustrator, Dreamweaver and InDesign, allow scripting through JavaScript.
- Microsoft's Active Scripting technology supports the JavaScript-compatible JScript as an operating system scripting language.
- The Java programming language, in version SE 6 (JDK 1.6), introduced the javax.script package, including a JavaScript implementation based on Mozilla Rhino. Thus, Java applications can host scripts that access the application's variables and objects, much like web browsers host scripts that access the browser's Document Object Model (DOM) for a webpage.
- Applications on the social network platform OpenSocial are implemented in JavaScript.
- Newer versions of the Qt C++ toolkit include a QtScript module to interpret JavaScript, analogous to javax.script.
- The interactive music signal processing software Max/MSP, released by Cycling '74, offers a JavaScript model of its environment for use by developers. It allows much more precise control than the default GUI-centric programming model.
- Late Night Software's JavaScript OSA (aka JavaScript for OSA, or JSOSA) is a freeware alternative to AppleScript for Mac OS X. It is based on the Mozilla 1.5 JavaScript implementation, with the addition of a MacOS object for interaction with the operating system and third-party applications.
- ECMAScript was included in the VRML97 standard for scripting nodes of VRML scene description files.
- Some high-end Philips universal remote panels, including the TSU9600 and TSU9400, can be scripted using JavaScript.
- Sphere is an open source and cross-platform computer program, designed primarily to make role-playing games, that uses JavaScript as its scripting language.
- Adobe Integrated Runtime is a JavaScript runtime that allows developers to create desktop applications.
- GeoJavaScript enables access to the geospatial extensions in PDF files using TerraGo Technologies GeoPDF Toolbar and Adobe Acrobat and Reader.

8-3-5. Debugging

Within JavaScript, access to a debugger becomes invaluable when developing large, non-trivial programs. Because there can be implementation differences between the various browsers (particularly within the Document Object Model) it is useful to have access to a debugger for each of the browsers a web application is being targeted at. Currently, Internet Explorer, Firefox, Safari, and Opera all have third-party script debuggers available for them. Internet Explorer has three debuggers available for it: Microsoft Visual Studio is the richest of the three, closely followed by Microsoft Script Editor (a component of Microsoft Office), and finally the free Microsoft Script Debugger which is far more basic than the other two. The free Microsoft Visual Web Developer Express



provides a limited version of the JavaScript debugging functionality in Microsoft Visual Studio. Web applications within Firefox can be debugged using the Firebug plug-in, or the older Venkman debugger, which also works with the Mozilla browser. Firefox also has a simpler built-in Error Console, which logs JavaScript and CSS errors and warnings. Drosera is a debugger for the WebKit engine on Macintosh and Windows powering Apple's Safari. There are also some free tools such as JSLint, a code quality tool that will scan JavaScript code looking for problems, as well as a non-free tool called SplineTech JavaScript HTML Debugger. Since JavaScript is interpreted, loosely-typed, and may be hosted in varying environments, each with their own compatibility differences, a programmer has to take extra care to make sure the code executes as expected in as wide a range of circumstances as possible, and that functionality degrades gracefully when it does not. The next major version of JavaScript, 2.0, will conform to ECMA-262 4th edition.

8-3-6. Related languages

There is not a particularly close genealogical relationship between Java and JavaScript; their similarities are mostly in basic syntax because both are ultimately derived from C. Their semantics are quite different and their object models are unrelated and largely incompatible. In Java, as in C and C++, all data is statically typed, whereas JavaScript variables, properties, and array elements may hold values of any type. The standardization effort for JavaScript also needed to avoid trademark issues, so the ECMA 262 standard calls the language ECMAScript, three editions of which have been published since the work started in November 1996. Microsoft's VBScript, like JavaScript, can be run client-side in web pages. VBScript has syntax derived from Visual Basic and is only supported by Microsoft's Internet Explorer.


JSON, or JavaScript Object Notation, is a general-purpose data interchange format that is defined as a subset of JavaScript. JavaScript is also considered a functional programming language, like Scheme and OCaml, because it has closures and supports higher-order functions. Mozilla browsers currently support LiveConnect, a feature that allows JavaScript and Java to intercommunicate on the web. However, support for LiveConnect is scheduled to be phased out in the future.



8-4. Ajax (programming)


Ajax (asynchronous JavaScript and XML), or AJAX, is a group of interrelated web development techniques used for creating interactive web applications or rich Internet applications. With Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. Data is retrieved using the XMLHttpRequest object, or through the use of Remote Scripting in browsers that do not support it. Despite the name, the use of JavaScript and XML is not actually required, nor do the requests need to be asynchronous.
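The basic pattern can be sketched as follows (a minimal example; the URL and element id are invented, and older versions of Internet Explorer create the request object through ActiveX instead of the native XMLHttpRequest constructor):

var request = new XMLHttpRequest();
request.onreadystatechange = function () {
    if (request.readyState === 4 && request.status === 200) {
        // the response arrives later, without reloading the page
        document.getElementById("result").innerHTML = request.responseText;
    }
};
request.open("GET", "/latest-messages", true);   // true = asynchronous
request.send(null);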

8-4-1. History
While the term Ajax was coined in 2005, techniques for the asynchronous loading of content date back to 1996, when Internet Explorer introduced the IFrame element. Microsoft's Remote Scripting, introduced in 1998, acted as a more elegant replacement for these techniques, with data being pulled in by a Java applet with which the client side could communicate using JavaScript. In 1999, Microsoft created the XMLHttpRequest object as an ActiveX control in Internet Explorer 5, and developers of Mozilla and Safari followed soon after with native versions of the object. On April 5, 2006 the World Wide Web Consortium (W3C) released the first draft specification for the object in an attempt to create an official web standard.

8-4-2. Technologies
The term Ajax has come to represent a broad group of web technologies that can be used to implement a web application that communicates with a server in the background, without interfering with the current state of the page. In the article that coined the term Ajax, Jesse James Garrett explained that it refers specifically to these technologies:
- XHTML and CSS for presentation
- the Document Object Model for dynamic display of and interaction with data
- XML and XSLT for the interchange and manipulation of data, respectively
- the XMLHttpRequest object for asynchronous communication
- JavaScript to bring these technologies together


Since then, however, there have been a number of developments in the technologies used in an Ajax application, and in the definition of the term Ajax. In particular, it has been noted that:
- JavaScript is not the only client-side scripting language that can be used for implementing an Ajax application; other languages such as VBScript are also capable of the required functionality.
- The XMLHttpRequest object is not necessary for asynchronous communication; it has been noted that IFrames are capable of the same effect.
- XML is not required for data interchange, and therefore XSLT is not required for the manipulation of data. JavaScript Object Notation (JSON) is often used as an alternative format for data interchange, although other formats such as preformatted HTML or plain text can also be used (see the sketch after this list).
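For example, a server can answer an Ajax request with JSON text rather than XML (a sketch assuming JSON.parse is available, either natively in newer browsers or through a library such as json2.js; the data is invented):

// The same data as an XML document would need DOM traversal;
// as JSON it maps directly onto a JavaScript object.
var responseText = '{"id": 42, "subject": "Meeting", "unread": true}'; // as sent by the server
var message = JSON.parse(responseText);
var subject = message.subject;   // "Meeting"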

8-4-3. Critique
Advantages
- In many cases, the pages on a website consist of much content that is common between them. Using traditional methods, that content would have to be reloaded on every request. However, using Ajax, a web application can request only the content that needs to be updated, thus drastically reducing bandwidth usage.
- The use of asynchronous requests allows the client's Web browser UI to be more interactive and to respond quickly to inputs, and sections of pages can also be reloaded individually. Users may perceive the application to be faster or more responsive, even if the application has not changed on the server side.
- The use of Ajax can reduce connections to the server, since scripts and style sheets only have to be requested once.

Disadvantages
- Dynamically created pages do not register themselves with the browser's history engine, so clicking the browser's "back" button would not return the user to an earlier state of the Ajax-enabled page, but would instead return them to the last page visited before it. Workarounds include the use of invisible IFrames to trigger changes in the browser's history and changing the anchor portion of the URL (following a #) when AJAX is run and monitoring it for changes.
- Dynamic web page updates also make it difficult for a user to bookmark a particular state of the application. Solutions to this problem exist, many of which use the URL fragment identifier (the portion of a URL after the '#') to keep track of, and allow users to return to, the application in a given state.
- Because most web crawlers do not execute JavaScript code, web applications should provide an alternative means of accessing the content that would normally be retrieved with Ajax, to allow search engines to index it.
- Any user whose browser does not support Ajax or JavaScript, or simply has JavaScript disabled, will not be able to use its functionality. Similarly, devices such as mobile phones, PDAs, and screen readers may not have support for JavaScript or the XMLHttpRequest object. Also, screen readers that are able to use Ajax may still not be able to properly read the dynamically generated content.
- The same origin policy prevents Ajax from being used across domains, although the W3C has a draft that would enable this functionality.




9. Web 2.0
Web 2.0 is a living term describing changing trends in the use of World Wide Web technology and web design that aims to enhance creativity, information sharing, collaboration and functionality of the web.

Figure 9-1: A tag cloud (constructed by Markus Angermeier) presenting some of the themes of Web 2.0.

Web 2.0 concepts have led to the development and evolution of web-based communities and hosted services, such as social-networking sites, video sharing sites, wikis, blogs, and folksonomies. The term became notable after the first O'Reilly Media Web 2.0 conference in 2004.[2][3] Although the term suggests a new version of the World Wide Web, it does not refer to an update to any technical specifications, but to changes in the ways software developers and end-users utilize the Web. According to Tim O'Reilly: Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform.

Some technology experts, notably Tim Berners-Lee, have questioned whether one can use the term in any meaningful way, since many of the technology components of "Web 2.0" have existed since the early days of the Web.

9-1. Definition



Web 2.0 has numerous definitions. Basically, the term encapsulates the idea of the proliferation of interconnectivity and interactivity of web-delivered content. Tim O'Reilly regards Web 2.0 as business embracing the web as a platform and using its strengths, for example global audiences. O'Reilly considers that Eric Schmidt's abridged slogan, don't fight the Internet, encompasses the essence of Web 2.0: building applications and services around the unique features of the Internet, as opposed to expecting the Internet to suit them as a platform (effectively "fighting the Internet").

In the opening talk of the first Web 2.0 conference, O'Reilly and John Battelle summarized what they saw as the themes of Web 2.0. They argued that the web had become a platform, with software above the level of a single device, leveraging the power of the "Long Tail", and with data as a driving force. According to O'Reilly and Battelle, an architecture of participation where users can contribute website content creates network effects. Web 2.0 technologies tend to foster innovation in the assembly of systems and sites composed by pulling together features from distributed, independent developers. (This could be seen as a kind of "open source" or possibly "Agile" development process, consistent with an end to the traditional software adoption cycle, typified by the so-called "perpetual beta".) Web 2.0 technology encourages lightweight business models enabled by syndication of content and of service and by ease of picking-up by early adopters.

O'Reilly provided examples of companies or products that embody these principles in his description of his four levels in the hierarchy of Web 2.0 sites:
- Level-3 applications, the most "Web 2.0"-oriented, exist only on the Internet, deriving their effectiveness from the inter-human connections and from the network effects that Web 2.0 makes possible, and growing in effectiveness in proportion as people make more use of them. O'Reilly gave eBay, Craigslist, Wikipedia, del.icio.us, Skype, dodgeball, and AdSense as examples.
- Level-2 applications can operate offline but gain advantages from going online. O'Reilly cited Flickr, which benefits from its shared photo-database and from its community-generated tag database.
- Level-1 applications operate offline but gain features online. O'Reilly pointed to Writely (now Google Docs & Spreadsheets) and iTunes (because of its music-store portion).
- Level-0 applications work as well offline as online. O'Reilly gave the examples of MapQuest, Yahoo! Local, and Google Maps (mapping applications using contributions from users to advantage could rank as "level 2").

Non-web applications like email, instant-messaging clients, and the telephone fall outside the above hierarchy.

In alluding to the version-numbers that commonly designate software upgrades, the phrase "Web 2.0" hints at an improved form of the World Wide Web. Technologies such as weblogs (blogs), wikis, podcasts, RSS feeds (and other forms of many-to-many publishing), social software, and web application programming interfaces (APIs) provide enhancements over read-only websites. Stephen Fry, who writes a column about technology in the British Guardian newspaper, describes Web 2.0 as: an idea in people's heads rather than a reality. It's actually an idea that the reciprocity between the user and the provider is what's emphasised. In other words, genuine interactivity, if you like, simply because people can upload as well as download.

The idea of "Web 2.0" can also relate to a transition of some websites from isolated information silos to interlinked computing platforms that function like locallyavailable software in the perception of the user. Web 2.0 also includes a social element where users generate and distribute content, often with freedom to share and re-use. This can result in a rise in the economic value of the web to businesses, as users can perform more activities online.

Here are additional definitions of Web 2.0:

- the philosophy of mutually maximizing collective intelligence and added value for each participant by formalized and dynamic information sharing and creation;
- all those Internet utilities and services sustained in a database which can be modified by users, whether in their content (adding, changing or deleting information, or associating metadata with the existing information), in the way they are displayed, or in form and content simultaneously.

The term Web 1.0 came into use only after the term Web 2.0 had appeared.

9-2. Characteristics
Web 2.0 websites allow users to do more than just retrieve information. They can build on the interactive facilities of "Web 1.0" to provide "Network as platform" computing, allowing users to run software-applications entirely through a browser. Users can own the data on a Web 2.0 site and exercise control over that data. These sites may have an "Architecture of participation" that encourages users to add value to the application as they use it. This stands in contrast to very old traditional websites, the sort which limited visitors to viewing and whose content only the site's owner could modify. Web 2.0 sites often feature a rich, user-friendly interface based on Ajax, OpenLaszlo, Flex or similar rich media.

Figure 9-2: Flickr, a Web 2.0 web site that allows users to upload and share photos

The concept of Web-as-participation-platform captures many of these characteristics. Bart Decrem, a founder and former CEO of Flock, calls Web 2.0 the "participatory Web" and regards the Web-as-information-source as Web 1.0. The impossibility of excluding group members who don't contribute to the provision of goods from sharing profits gives rise to the possibility that rational members will prefer to withhold their contribution of effort and free-ride on the contribution of others.


According to Best, the characteristics of Web 2.0 are: rich user experience, user participation, dynamic content, metadata, web standards and scalability. Further characteristics, such as openness, freedom and collective intelligence by way of user participation, can also be viewed as essential attributes of Web 2.0.

9-3. Technology overview


The sometimes complex and continually evolving technology infrastructure of Web 2.0 includes server-software, content-syndication, messaging-protocols, standards-oriented browsers with plugins and extensions, and various client-applications. The differing, yet complementary approaches of such elements provide Web 2.0 sites with information-storage, creation, and dissemination challenges and capabilities that go beyond what the public formerly expected in the environment of the so-called "Web 1.0". Web 2.0 websites typically include some of the following features/techniques:
- Cascading Style Sheets to aid in the separation of presentation and content
- Folksonomies (collaborative tagging, social classification, social indexing, and social tagging)
- Microformats extending pages with additional semantics
- REST and/or XML- and/or JSON-based APIs
- Rich Internet application techniques, often Ajax and/or Flex/Flash-based
- Semantically valid XHTML and HTML markup
- Syndication, aggregation and notification of data in RSS or Atom feeds
- Mashups, merging content from different sources, client- and server-side
- Weblog-publishing tools
- Wiki or forum software, etc., to support user-generated content
- Internet privacy, the extended power of users to manage their own privacy in cloaking or deleting their own user content or profiles

9-4. Associated innovations


It is a common misconception that "Web 2.0" refers to various visual design elements such as rounded corners or drop shadows. While such design elements have commonly been found on popular Web 2.0 sites, the association is more one



of fashion, a designer preference which became popular around the same time that "Web 2.0" became a buzz word. Another common misassociation with Web 2.0 is AJAX. This error probably comes about because many Web 2.0 sites rely heavily on AJAX or associated DHTML effects. So while AJAX is often required for Web 2.0 sites to function well, it is (usually) not required for them to function. The Freemium business model is also characteristic of many Web 2.0 sites, with the idea that core basic services are given away for free, in order to build a large user base by word-of-mouth marketing. Premium service would then be offered for a price.

9-5. Web-based applications and desktops


Ajax has prompted the development of websites that mimic desktop applications, such as word processing, the spreadsheet, and slide-show presentation. WYSIWYG wiki sites replicate many features of PC authoring applications. Still other sites perform collaboration and project management functions. In 2006 Google, Inc. acquired one of the best-known sites of this broad class, Writely. Several browser-based "operating systems" have emerged, including EyeOS and YouOS. Although coined as such, many of these services function less like a traditional operating system and more as an application platform. They mimic the user experience of desktop operating systems, offering features and applications similar to a PC environment, as well as the added ability of being able to run within any modern browser. Numerous web-based application services appeared during the dot-com bubble of 1997-2001 and then vanished, having failed to gain a critical mass of customers. In 2005, WebEx acquired one of the better-known of these, Intranets.com, for USD 45 million.

9-5-1. Internet applications


Rich Internet application techniques such as AJAX, Adobe Flash, Flex, Java, Silverlight and Curl have evolved that have the potential to improve the user experience in browser-based applications. The technologies allow a web page to request an update for some part of its content, and to alter that part in the browser, without needing to refresh the whole page at the same time.



Server-side software
Functionally, Web 2.0 applications build on the existing Web server architecture, but rely much more heavily on back-end software. Syndication differs only nominally from the methods of publishing using dynamic content management, but web services typically require much more robust database and workflow support, and become very similar to the traditional intranet functionality of an application server.

Client-side software
The extra functionality provided by Web 2.0 depends on the ability of users to work with the data stored on servers. This can come about through forms in an HTML page, through a scripting-language such as Javascript / Ajax, or through Flash, Curl Applets or Java Applets. These methods all make use of the client computer to reduce server workloads and to increase the responsiveness of the application.

9-5-2. XML and RSS

Advocates of "Web 2.0" may regard syndication of s content as a Web 2.0 ite feature, involving as it does standardized protocols, which permit end -users to make use of a site's data in another context (such as another website, a browser plugin, or a separate desktop application). Protocols which permit syn dication include RSS (Really Simple Syndication also known as "web syndication"), RDF (as in RSS 1.1), and Atom, all of them XML-based formats. Observers have started to refer to these technologies as "Web feed" as the usability of Web 2.0 evolves and the more user-friendly Feeds icon supplants the RSS icon. Specialized protocols Specialized protocols such as FOAF and XFN (both for social networking) extend the functionality of sites or permit end-users to interact without centralized websites.

9-5-3. Web APIs


Machine-based interaction, a common feature of Web 2.0 sites, uses two main approaches to Web APIs, which allow web-based access to data and functions: REST and SOAP.



1. REST (Representational State Transfer), in which clients use plain HTTP requests alone to interact with web resources, with XML (eXtensible Markup Language) or JSON payloads;
2. SOAP, which involves POSTing more elaborate XML messages and requests to a server that may contain quite complex, but pre-defined, instructions for the server to follow.

Often servers use proprietary APIs, but standard APIs (for example, for posting to a blog or notifying a blog update) have also come into wide use. Most communications through APIs involve XML or JSON payloads. See also Web Services Description Language (WSDL) (the standard way of publishing a SOAP API) and this list of Web Service specifications.
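The two styles can be contrasted with a short client-side sketch (hedged: the endpoint URLs and the SOAP operation are invented, and real SOAP envelopes carry service-specific namespaces):

// REST style: a plain HTTP GET on a resource URL, expecting an XML or JSON payload
var rest = new XMLHttpRequest();
rest.open("GET", "/api/posts/42", true);
rest.send(null);

// SOAP style: an XML envelope POSTed to a single service endpoint
var soap = new XMLHttpRequest();
soap.open("POST", "/services/BlogService", true);
soap.setRequestHeader("Content-Type", "text/xml; charset=utf-8");
soap.send('<?xml version="1.0"?>' +
          '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">' +
          '<soap:Body><getPost><id>42</id></getPost></soap:Body>' +
          '</soap:Envelope>');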

9-6. Economics
The analysis of the economic implications of "Web 2.0" applications and loosely-associated technologies such as wikis, blogs, social-networking, open-source, open-content, file-sharing, peer-production, etc. has also gained scientific attention. This area of research investigates the implications Web 2.0 has for an economy and the principles underlying the economy of Web 2.0. Cass Sunstein's book "Infotopia" discussed the Hayekian nature of collaborative production, characterized by decentralized decision-making, directed by (often non-monetary) prices rather than central planners in business or government.

Don Tapscott and Anthony D. Williams argue in their book Wikinomics: How Mass Collaboration Changes Everything (2006) that the economy of "the new web" depends on mass collaboration. Tapscott and Williams regard it as important for new media companies to find ways to make a profit with the help of Web 2.0. The prospective Internet-based economy that they term "Wikinomics" would depend on the principles of openness, peering, sharing, and acting globally. They identify seven Web 2.0 business models (peer pioneers, ideagoras, prosumers, new Alexandrians, platforms for participation, global plant floor, wiki workplace). Organizations could make use of these principles and models in order to prosper with the help of Web 2.0-like applications: "Companies can design and assemble products with their customers, and in some cases customers can do the majority of the value creation". "In each instance the traditionally passive buyers of editorial and advertising take active, participatory roles in value creation." Tapscott and Williams suggest business strategies as "models where masses of consumers, employees, suppliers, business partners, and even competitors co-create value in the absence of direct managerial control". Tapscott and Williams see the outcome as an economic democracy.

Some other views in the scientific debate agree with Tapscott and Williams that value creation increasingly depends on harnessing open source/content, networking, sharing, and peering, but disagree that this will result in an economic democracy, predicting instead a subtle form and deepening of exploitation, in which Internet-based global outsourcing reduces labour costs by transferring jobs from workers in wealthy nations to workers in poor nations. In such a view, the economic implications of a new web might include on the one hand the emergence of new business models based on global outsourcing, whereas on the other hand non-commercial online platforms could undermine profit-making and anticipate a co-operative economy. For example, Tiziana Terranova speaks of "free labor" (performed without payment) in the case where prosumers produce surplus value in the circulation sphere of the cultural industries.

Some examples of Web 2.0 business models that attempt to generate revenues in online shopping and online marketplaces are referred to as social commerce and social shopping. Social commerce involves user-generated marketplaces where individuals can set up online shops and link their shops in a networked marketplace, drawing on concepts of electronic commerce and social networking. Social shopping involves customers interacting with each other while shopping, typically online, and often in a social network environment. Academic research on the economic value implications of social commerce and of having sellers in online marketplaces link to each others' shops has been conducted by researchers in the business school at Columbia University.

9-7. Criticism
The argument exists that "Web 2.0" does not represent a new version of the World Wide Web at all, but merely continues to use so-called "Web 1.0" technologies and concepts. Techniques such as AJAX do not replace underlying protocols like HTTP, but add an additional layer of abstraction on top of them. Many of the ideas of Web 2.0 had already been featured in implementations on networked systems well before the term "Web 2.0" emerged. Amazon.com, for instance, has allowed users to write reviews and consumer guides since its launch in 1995, in a form of self-publishing. Amazon also opened its API to outside developers in 2002. Previous developments also came from research in computer-supported



collaborative learning and computer-supported cooperative work and from established products like Lotus Notes and Lotus Domino. In a podcast interview Tim Berners-Lee described the term "Web 2.0" as a "piece of jargon." "Nobody really knows what it means," he said, and went on to say that "if Web 2.0 for you is blogs and wikis, then that is people to people. But that was what the Web was supposed to be all along."

Other criticism has labelled the term a second bubble (referring to the dot-com bubble of circa 1995-2001), suggesting that too many Web 2.0 companies attempt to develop the same product with a lack of business models. The Economist has written of "Bubble 2.0." Venture capitalist Josh Kopelman noted that Web 2.0 had excited only 530,651 people (the number of subscribers at that time to TechCrunch, a weblog covering Web 2.0 matters), too few users to make them an economically viable target for consumer applications. Although Bruce Sterling reports he's a fan of Web 2.0, he thinks it is now dead as a rallying concept.

A few critics cite the language used to describe the hype cycle of Web 2.0 as an example of techno-utopianist rhetoric. According to these critics, Web 2.0 is not the first example of communication creating a false, hyper-inflated sense of the value of technology and its impact on culture. The dot-com boom and subsequent bust in 2000 was a culmination of rhetoric of the technological sublime (in the sense discussed in Communication as Culture: Essays on Media and Society, 1989), in terms that would later make their way into Web 2.0 jargon, inflating the technology's worth as represented in the stock market. Indeed, several years before the dot-com stock market crash, the then-Federal Reserve chairman Alan Greenspan described the run-up of stock values as irrational exuberance. Shortly before the crash of 2000, Robert J. Shiller's book Irrational Exuberance (Princeton, NJ: Princeton University Press, 2000) was released, detailing the overly optimistic euphoria of the dot-com industry. The book Wikinomics: How Mass Collaboration Changes Everything (2006) even goes as far as to quote critics of the value of Web 2.0 in an attempt to acknowledge that hyper-inflated expectations exist but that Web 2.0 is really different.

9-8. Trademark


In November 2004, CMP Media applied to the USPTO for a service mark on the use of the term "WEB 2.0" for live events. On the basis of this application, CMP Media sent a cease-and-desist demand to the Irish non-profit organization IT@Cork on May 24, 2006, but retracted it two days later. The "WEB 2.0" service mark registration passed final PTO Examining Attorney review on May 10, 2006, but as of June 12, 2006 the PTO had not published the mark for opposition. The European Union application (application number 004972212, which would confer unambiguous status in Ireland) remains currently pending after its filing on March 23, 2006.



10. Semantic Web


The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. It derives from World Wide Web Consortium director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.

Figure 10-1: W3C's Semantic Web logo

At its core, the semantic web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that are yet to be implemented or realized. Other elements of the semantic web are expressed in formal specifications. Some of these include Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.

10-1. Purpose
Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "cat", reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction, because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web. Tim Berners-Lee originally expressed the vision of the semantic web as follows:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers. A Semantic Web, which should make this possible, has yet to emerge,




but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The intelligent agents people have touted for ages will finally materialize. (Tim Berners-Lee, 1999)

Semantic publishing will benefit greatly from the semantic web. In particular, the semantic web is expected to revolutionize scientific publishing, such as real-time publishing and sharing of experimental data on the Internet. This simple but radical idea is now being explored by the W3C HCLS group's Scientific Publishing Task Force. Tim Berners-Lee has further stated:

People keep asking what Web 3.0 is. I think maybe when you've got an overlay of scalable vector graphics - everything rippling and folding and looking misty - on Web 2.0 and access to a semantic Web integrated across a huge space of data, you'll have access to an unbelievable data resource. (Tim Berners-Lee, A 'more revolutionary' Web)

10-2. Relationship to the Hypertext Web


10-2-1. Markup
Many files on a typical computer can be loosely divided into documents and data. Documents like mail messages, reports, and brochures are read by humans. Data, like calendars, address books, playlists, and spreadsheets, are presented using an application program which lets them be viewed, searched and combined in many ways. Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. Metadata tags, for example
<meta name="keywords" content="computing, computer studies, computer"> <meta name="description" content="Cheap widgets for sale"> <meta name="author" content="Billy Bob McThreeteeth">



provide a method by which computers can categorise the content of web pages. The Semantic Web takes the concept further: it involves publishing the data in a language designed specifically for data, the Resource Description Framework (RDF), so that it can be categorized in a form closer to human perception and be "understood" by computers. In this way, all data is not only stored, but filed and well handled.

HTML describes documents and the links between them. RDF, by contrast, describes arbitrary things such as people, meetings, or airplane parts. For example, with HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as "this document's title is 'Widget Superstore'". But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be positioned near "Acme Gizmo" and "€199", etc. There is no way to say "this is a catalog" or even to establish that "Acme Gizmo" is a kind of title or that "€199" is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.

10-2-2. Descriptive and extensible

The semantic web addresses this shortcoming, using the descriptive technologies Resource Description Framework (RDF) and Web Ontology Language (OWL), and the data-centric, customizable Extensible Markup Language (XML). These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and facilitating automated information gathering and research by computers.

10-2-3. Semantic vs. non-Semantic Web


An example of a tag as it would be used in a non-semantic web page (Web 1.0 and Web 2.0):
<item>cat</item>

The corresponding tag as it would be used in a Semantic Web 'page' (part of Web 3.0):


<item rdf:about="http://dbpedia.org/resource/Cat">Cat</item>

10-3. Relationship to Object Orientation


A number of authors have highlighted the similarities that the Semantic Web shares with object orientation. The connection becomes clearer when one recalls that, when hypertext and the Web were first being created in the late 1980s and early 1990s, the work was done using object-oriented programming languages and technologies such as Objective-C, Smalltalk and CORBA. In the mid-1990s this line of development was furthered with the announcement of the Enterprise Objects Framework, Portable Distributed Objects and WebObjects, all by NeXT, in addition to the Component Object Model released by Microsoft. XML was then released in 1998, and RDF followed a year later in 1999. The similarity to object orientation also came from two other routes: the first was the development of the very knowledge-centric "hyperdocument" systems by Douglas Engelbart, and the second was the usage and development of the Hypertext Transfer Protocol. The object orientation in the Semantic Web is clear: both the Semantic Web and object-oriented programming have classes, attributes (relationships) and instances. Furthermore, with Linked Data there are dereferenceable Uniform Resource Identifiers, which provide data-by-reference, a concept found in programming languages in the form of pointers (known as "object identifiers" in object-oriented programming languages and object databases). The Unified Modeling Language can therefore be a useful tool for Semantic Web development and for integrating the Semantic Web with object-oriented software development.

10-4. Skeptical reactions


10-4-1. Practical feasibility


Critics question the basic feasibility of a complete or even partial fulfillment of the semantic web. Some develop their critique from the perspective of human behavior and personal preferences, which ostensibly diminish the likelihood of its fulfillment (see, e.g., metacrap). Other commentators object that there are limitations that stem from the current state of software engineering itself (see, e.g., leaky abstraction). Where semantic web technologies have found a greater degree of practical adoption, it has tended to be among core specialized communities and organizations working on intra-company projects. The practical constraints on adoption have appeared less challenging where the domain and scope are more limited than those of the general public and the World Wide Web.

10-4-2. An unrealized idea


The original 2001 Scientific American article by Berners-Lee described an expected evolution of the existing Web to a Semantic Web. Such an evolution has yet to occur. Indeed, a more recent article from Berners-Lee and colleagues stated that: "This simple idea, however, remains largely unrealized."

10-4-3. Censorship and privacy


Enthusiasm about the semantic web could be tempered by concerns regarding censorship and privacy. For instance, text-analyzing techniques can now be easily bypassed by using other words, metaphors for instance, or by using images in place of words. An advanced implementation of the semantic web would make it much easier for governments to control the viewing and creation of online information, as this information would be much easier for an automated content-blocking machine to understand. In addition, the issue has also been raised that, with the use of FOAF files and geolocation metadata, there would be very little anonymity associated with the authorship of articles on things such as a personal blog.

10-4-4. Doubling output formats


Another criticism of the semantic web is that it would be much more time-consuming to create and publish content, because there would need to be two formats for one piece of data: one for human viewing and one for machines. However, many web applications in development are addressing this issue by creating a machine-readable format upon the publishing of data or upon the request of a machine for such data. The development of microformats has been one reaction to this kind of criticism. Specifications such as eRDF and RDFa allow arbitrary RDF data to be embedded in HTML pages. The GRDDL (Gleaning Resource Descriptions from Dialects of Languages) mechanism allows existing material (including microformats) to be automatically interpreted as RDF, so publishers only need to use a single format, such as HTML.

10-4-5. Need
The idea that a 'semantic web' necessarily requires some marking code other than simple HTML is built on the assumption that it is not possible for a machine to appropriately interpret code based on nothing but the order relationships of letters and words. If this assumption is false, then a 'semantic web' might be built on HTML alone, making a specially built 'semantic web' coding system unnecessary. There are latent dynamic network models that can, under certain conditions, be 'trained' to appropriately 'learn' meaning based on order data, in the process 'learning' relationships involving order (a kind of rudimentary working grammar). See, for example, latent semantic analysis.
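As a rough, hedged illustration of the technique cited above, the following sketch applies latent semantic analysis with the scikit-learn library: documents are converted into a TF-IDF term-document matrix and then projected into a low-dimensional "latent semantic" space with a truncated singular value decomposition. The toy documents and the chosen parameters are invented for illustration and are not from the original text:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a kitten rested on the rug",
    "stock markets fell sharply today",
    "share prices dropped on the exchange",
]

# Term-document matrix weighted by TF-IDF
tfidf = TfidfVectorizer().fit_transform(docs)

# LSA: project the documents into a 2-dimensional latent space
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Cosine similarity of the first document to the others in the latent space;
# documents are compared by latent topic rather than by exact word overlap
print(cosine_similarity(lsa[:1], lsa[1:]))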

10-5. Components
The semantic web comprises the standards and tools of XML, XML Schema, RDF, RDF Schema and OWL that are organized in the Semantic Web Stack. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:
• XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
• XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
• RDF is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax.
• RDF Schema is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes.
• OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL is a protocol and query language for semantic web data sources (a small query sketch follows this list).
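The following is a minimal sketch of how SPARQL can be used to query RDF data, again using Python's rdflib and the invented ex: namespace from the earlier example; the names and data are illustrative only, not a definitive implementation:

from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/catalog#")

g = Graph()
item = URIRef("http://example.org/catalog#X586172")
g.add((item, EX.name, Literal("Acme Gizmo")))
g.add((item, EX.retailPrice, Literal("199")))

# SPARQL query: find the name and price of every item that has both
query = """
PREFIX ex: <http://example.org/catalog#>
SELECT ?item ?name ?price
WHERE {
    ?item ex:name ?name ;
          ex:retailPrice ?price .
}
"""

for row in g.query(query):
    print(row.item, row.name, row.price)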

Current ongoing standardizations include:


• Rule Interchange Format (RIF), as the Rule Layer of the Semantic Web Stack

The intent is to enhance the usability and usefulness of the Web and its interconnected resources through:
• servers which expose existing data systems using the RDF and SPARQL standards. Many converters to RDF exist for different applications. Relational databases are an important source. The semantic web server attaches to the existing system without affecting its operation.
• documents "marked up" with semantic information (an extension of the HTML <meta> tags used in today's Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc., of the document), or it could be purely metadata representing a set of facts (such as resources and services elsewhere on the site). (Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc.) Semantic markup is often generated automatically, rather than manually.
• common metadata vocabularies (ontologies) and maps between vocabularies that allow document creators to know how to mark up their documents so that agents can use the information in the supplied metadata (so that Author in the sense of 'the Author of the page' won't be confused with Author in the sense of a book that is the subject of a book review).
• automated agents to perform tasks for users of the semantic web using this data.
• web-based services (often with agents of their own) to supply information specifically to agents (for example, a Trust service that an agent could ask whether some online store has a history of poor service or spamming).


11. Electronic learning

Electronic learning (or e-Learning or eLearning) is a type of education where the medium of instruction is computer technology. In some instances, no in-person interaction takes place at all. The term e-learning is used in a wide variety of contexts. In companies, it refers to strategies that use the company network to deliver training courses to employees. In the USA, it is defined as a planned teaching/learning experience that uses a wide spectrum of technologies, mainly Internet- or computer-based, to reach learners at a distance. Lately, in most universities, e-learning is used to describe a specific mode of attending a course or programme of study in which the students rarely, if ever, attend face-to-face for on-campus access to educational facilities, because they study online.

11-1. Market
The worldwide e-learning industry is estimated to be worth over 38 billion euros according to conservative estimates, although in the European Union only about 20% of e-learning products are produced within the common market. Developments in internet and multimedia technologies are the basic enabler of e-learning, with content, technologies and services being identified as the three key sectors of the e-learning industry.

11-2. Growth of e-learning


By 2006, nearly 3.5 million students were participating in on-line learning at institutions of higher education in the United States. Many for-profit higher-education institutions now offer on-line classes. By contrast, only about half of private, non-profit schools offer them. The Sloan report, based on a poll of academic leaders, says that students generally appear to be at least as satisfied with their on-line classes as they are with traditional ones. Private institutions may become more involved with on-line presentations as the cost of instituting such a system decreases. Properly trained staff must also be hired to work with students on-line. These staff members must not only understand the content area, but also be highly trained in the use of the computer and the Internet. Online education is rapidly increasing, and online doctoral programs have even developed at leading research universities.


11-3. Technology
As early as 1993, W. D. Graziadei described an online computer-delivered lecture, tutorial and assessment project using electronic mail, two VAX Notes conferences and Gopher/Lynx, together with several software programs that allowed students and instructor to create a Virtual Instructional Classroom Environment in Science (VICES) in Research, Education, Service & Teaching (REST). In 1997 Graziadei, W.D., et al., published an article entitled "Building Asynchronous and Synchronous Teaching-Learning Environments: Exploring a Course/Classroom Management System Solution". They described a process at the State University of New York (SUNY) of evaluating products and developing an overall strategy for technology-based course development and management in teaching-learning. The product(s) had to be easy to use and maintain, portable, replicable, scalable, and immediately affordable, and they had to have a high probability of success with long-term cost-effectiveness. Today many technologies can be, and are, used in e-Learning, from blogs to collaborative software, ePortfolios, and virtual classrooms. Most eLearning situations use combinations of these techniques. Along with the terms learning technology, instructional technology, and educational technology, the term is generally used to refer to the use of technology in learning in a much broader sense than the computer-based training or Computer Aided Instruction of the 1980s. It is also broader than the terms online learning or online education, which generally refer to purely web-based learning. In cases where mobile technologies are used, the term m-learning has become more common. E-learning, however, also has implications beyond just the technology and refers to the actual learning that takes place using these systems.

E-learning is naturally suited to distance learning and flexible learning, but can also be used in conjunction with face-to-face teaching, in which case the term Blended learning is commonly used. E-Learning pioneer Bernard Luskin argues that the "E" must be understood to have broad meaning if e-Learning is to be effective. Luskin says that the "e" should be interpreted to mean exciting, energetic, enthusiastic, emotional, extended, excellent, and educational in addition to "electronic", which is the traditional interpretation. This broader interpretation allows for 21st-century applications and brings learning and media psychology into the equation.
In higher education especially, the increasing tendency is to create a Virtual Learning Environment (VLE) (which is sometimes combined with a Management Information System (MIS) to create a Managed Learning Environment) in which all aspects of a course are handled through a consistent user interface that is standard throughout the institution.

A growing number of physical universities, as well as newer online-only colleges, have begun to offer a select set of academic degree and certificate programs via the Internet at a wide range of levels and in a wide range of disciplines. While some programs require students to attend some campus classes or orientations, many are delivered completely online. In addition, several universities offer online student support services, such as online advising and registration, e-counseling, online textbook purchase, student governments and student newspapers.

e-Learning can also refer to educational web sites such as those offering learning scenarios, worksheets and interactive exercises for children. The term is also used extensively in the business sector, where it generally refers to cost-effective online training.

11-4. Services
E-learning services have evolved since computers were first used in education. There is a trend to move toward blended learning services, where computer-based activities are integrated with practical or classroom-based situations.

11-5. Goals of e-learning


E-Learning lessons are generally designed to guide students through information or to help students perform specific tasks. Information-based e-Learning content communicates information to the student. Examples include content that presents the history or facts related to a service, company, or product. In information-based content, there is no specific skill to be learned. In performance-based content, the lessons build on a procedural skill in which the student is expected to increase proficiency.

11-5-1. Computer-based learning


Computer Based Learning, sometimes abbreviated to CBL, refers to the use of computers as a key component of the educational environment. While this can refer to the use of computers in a classroom, the term more broadly refers to a structured environment in which computers are used for teaching purposes. The concept is generally seen as being distinct from the use of computers in ways where learning is at least a peripheral element of the experience (e.g. computer games and web browsing).

11-5-2. Computer-based training


Computer-based training (CBT) services are those in which a student learns by executing special training programs on a computer relating to their occupation. CBT is especially effective for training people to use computer applications, because the CBT program can be integrated with the applications so that students can practice using the application as they learn. Historically, CBT's growth has been hampered by the enormous resources required: human resources to create a CBT program, and hardware resources needed to run it. However, the increase in PC computing power, and especially the growing prevalence of computers equipped with CD-ROMs, is making CBT a more viable option for corporations and individuals alike. Many PC applications now come with some modest form of CBT, often called a tutorial. Web-based training (WBT) is a type of training that is similar to CBT; however, it is delivered over the Internet using a web browser. Web-based training frequently includes interactive methods, such as bulletin boards, chat rooms, instant messaging, videoconferencing, and discussion threads. Web-based training is usually a self-paced learning medium, though some systems allow for online testing and evaluation at specific times.

11-6. Pedagogical elements


Pedagogical elements are an attempt to define structures or units of educational material. For example, this could be a lesson, an assignment, a multiple-choice question, a quiz, a discussion group or a case study. These units should be format-independent, so although they may be implemented in any of the following ways, pedagogical structures would not include a textbook, a web page, a video conference or an iPod video. When beginning to create e-Learning content, the pedagogical approaches need to be evaluated. Simple pedagogical approaches make it easy to create content, but lack flexibility, richness and downstream functionality. On the other hand, complex pedagogical approaches can be difficult to set up and slow to develop, though they have the potential to provide more engaging learning experiences for students. Somewhere between these extremes is an ideal pedagogy that allows a particular educator to effectively create educational materials while simultaneously providing the most engaging educational experiences for students.


11-7. Pedagogical approaches or perspectives


It is possible to use various pedagogical approaches for eLearning, which include:

• Instructional design - the traditional pedagogy of instruction, which is curriculum focused and is developed by a centralized educating group or a single teacher.
• Social-constructivist - this pedagogy is particularly well afforded by the use of discussion forums, blogs, wikis and on-line collaborative activities. It is a collaborative approach that opens educational content creation to a wider group, including the students themselves. Laurillard's Conversational Model is also particularly relevant to eLearning, and Gilly Salmon's Five-Stage Model is a pedagogical approach to the use of discussion boards.
• Cognitive perspective - focuses on the cognitive processes involved in learning as well as on how the brain works.
• Emotional perspective - focuses on the emotional aspects of learning, such as motivation, engagement and fun.
• Behavioural perspective - focuses on the skills and behavioural outcomes of the learning process, for example role-playing and application to on-the-job settings.
• Contextual perspective - focuses on the environmental and social aspects which can stimulate learning, such as interaction with other people, collaborative discovery and the importance of peer support as well as pressure.

11-8. Reusability, standards and learning objects


Much effort has been put into the technical reuse of electronically-based teaching materials and, in particular, into creating or re-using Learning Objects. These are self-contained units that are properly tagged with keywords, or other metadata, and often stored in an XML file format. Creating a course requires putting together a sequence of learning objects. There are both proprietary and open, non-commercial and commercial, peer-reviewed repositories of learning objects, such as the Merlot repository. A common standard format for e-learning content is SCORM, whilst other specifications allow for the transporting of "learning objects" (Schools Interoperability Framework) or the categorizing of metadata (LOM). These standards are themselves early in the maturity process, with the oldest being about 8 years old. They are also relatively vertical-specific: SIF is primarily pK-12, LOM is primarily corporate, military and higher education, and SCORM is primarily military and corporate with some higher-education use. PESC (the Post-Secondary Education Standards Council) is also making headway in developing standards and learning objects for the higher-education space, while SIF is beginning to seriously turn towards instructional and curriculum learning objects. In the US pK-12 space there are a host of content standards that are critical as well; the NCES data standards are a prime example. Each state government's content standards and achievement benchmarks are critical metadata for linking e-learning objects in that space.

11-9. Communication technologies used in e-learning


Communication technologies are generally categorized as asynchronous or synchronous. Asynchronous activities use technologies such as blogs, wikis, and discussion boards. The idea here is that participants may engage in the exchange of ideas or information without depending on other participants' involvement at the same time. Electronic mail (e-mail) is also asynchronous, in that mail can be sent or received without both participants being involved at the same time. Synchronous activities involve the exchange of ideas and information with one or more participants during the same period of time. A face-to-face discussion is an example of synchronous communication. Synchronous activities occur with all participants joining in at once, as with an online chat session or a virtual classroom or meeting. Virtual classrooms and meetings can often use a mix of communication technologies.

In many models, the writing community and the communication channels relate with the e-learning and the m-learning communities. Both communities provide a general overview of the basic learning models and the activities required for the participants to join the learning sessions across the virtual classroom, or even across standard classrooms enabled by technology. Many activities, essential for the learners in these environments, require frequent chat sessions in the form of virtual classrooms and/or blog meetings.

11-10. E-Learning 2.0


The term e-Learning 2.0 is used to refer to new ways of thinking about e-learning inspired by the emergence of Web 2.0. From an e-Learning 2.0 perspective, conventional e-learning systems were based on instructional packets that were delivered to students using Internet technologies. The role of the student consisted in learning from the readings and preparing assignments. Assignments were evaluated by the teacher. In contrast, the new e-learning places increased emphasis on social learning and the use of social software such as blogs, wikis, podcasts and virtual worlds such as Second Life. This phenomenon has also been referred to as Long Tail Learning. The first 10 years of e-learning (e-learning 1.0) were focused on using the internet to replicate the instructor-led experience. Content was designed to lead a learner through the material, providing a wide and ever-increasing set of interactions, experiences, assessments, and simulations. E-learning 2.0, by contrast (patterned after Web 2.0), is built around collaboration. E-learning 2.0 assumes that knowledge (as meaning and understanding) is socially constructed. Learning takes place through conversations about content and grounded interaction about problems and actions. Advocates of social learning claim that one of the best ways to learn something is to teach it to others.


12. Electronic commerce

Electronic commerce, commonly known as e-commerce or eCommerce, consists of the buying and selling of products or services over electronic systems such as the Internet and other computer networks. The amount of trade conducted electronically has grown extraordinarily since the spread of the Internet. A wide variety of commerce is conducted in this way, spurring and drawing on innovations in electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems. Modern electronic commerce typically uses the World Wide Web at least at some point in the transaction's lifecycle, although it can encompass a wider range of technologies such as e-mail as well.
A large percentage of electronic commerce is conducted entirely electronically for virtual items such as access to premium content on a website, but most electronic commerce involves the transportation of physical items in some way. Online retailers are sometimes known as e-tailers, and online retail is sometimes known as e-tail. Almost all big retailers have an electronic commerce presence on the World Wide Web. Electronic commerce that is conducted between businesses is referred to as business-to-business or B2B. B2B can be open to all interested parties (e.g. a commodity exchange) or limited to specific, pre-qualified participants (a private electronic market). Electronic commerce is generally considered to be the sales aspect of e-business. It also consists of the exchange of data to facilitate the financing and payment aspects of business transactions.

12-1. History
Timeline:

• 1990: Tim Berners-Lee writes the first web browser, WorldWideWeb, using a NeXT computer.
• 1992: J.H. Snider and Terra Ziporyn publish Future Shop: How New Technologies Will Change the Way We Shop and What We Buy. St. Martin's Press. ISBN 0312063598.
• 1994: Netscape releases the Navigator browser in October under the code name Mozilla. Pizza Hut offers pizza ordering on its Web page. The first online bank opens. Attempts are made to offer flower delivery and magazine subscriptions online. Adult materials also become commercially available, as do cars and bikes. Netscape 1.0 is introduced in late 1994 with SSL encryption that makes transactions secure.
• 1995: Jeff Bezos launches Amazon.com, and the first commercial-free 24-hour, Internet-only radio stations, Radio HK and NetRadio, start broadcasting. Dell and Cisco begin to aggressively use the Internet for commercial transactions. eBay is founded by computer programmer Pierre Omidyar as AuctionWeb.
• 1998: Electronic postal stamps can be purchased and downloaded for printing from the Web.
• 1999: Business.com is sold for US $7.5 million to eCompanies; it had been purchased in 1997 for US $150,000. The peer-to-peer file-sharing software Napster launches.
• 2000: The dot-com bust.
• 2002: eBay acquires PayPal for $1.5 billion. Niche retail companies CSN Stores and NetShops are founded with the concept of selling products through several targeted domains, rather than a central portal.
• 2003: Amazon.com posts its first yearly profit.
• 2007: Business.com is acquired by R.H. Donnelley for $345 million.
• 2008: US e-commerce and online retail sales are projected to reach $204 billion, an increase of 17 percent over 2007.

12-2. Business applications


Some common applications related to electronic commerce are the following:
• E-mail and messaging
• Content Management Systems
• Documents, spreadsheets, databases
• Accounting and finance systems
• Orders and shipment information
• Enterprise and client information reporting
• Domestic and international payment systems
• Newsgroups
• On-line shopping
• Messaging
• Conferencing


12-3. Government regulations


In the United States, some electronic commerce activities are regulated by the Federal Trade Commission (FTC). These activities include the use of commercial e-mails, online advertising and consumer privacy. The CAN-SPAM Act of 2003 establishes national standards for direct marketing over e-mail. The Federal Trade Commission Act regulates all forms of advertising, including online advertising, and states that advertising must be truthful and non-deceptive. Using its authority under Section 5 of the FTC Act, which prohibits unfair or deceptive practices, the FTC has brought a number of cases to enforce the promises in corporate privacy statements, including promises about the security of consumers' personal information. As a result, any corporate privacy policy related to e-commerce activity may be subject to enforcement by the FTC.

12-4. Forms
Contemporary electronic commerce involves everything from ordering "digital" content for immediate online consumption, to ordering conventional goods and services, to "meta" services that facilitate other types of electronic commerce. On the consumer level, electronic commerce is mostly conducted on the World Wide Web. An individual can go online to purchase anything from books and groceries to expensive items like real estate. Another example is online banking, that is, online bill payments, buying stocks, transferring funds from one account to another, and initiating wire payments to another country. All these activities can be done with a few keystrokes on the keyboard. On the institutional level, big corporations and financial institutions use the internet to exchange financial data to facilitate domestic and international business. Data integrity and security are pressing issues for electronic commerce these days.


13. e-Government

e-Government (from electronic government, also known as e-gov, digital government, online government or, in a certain context, transformational government) refers to the use of internet technology as a platform for exchanging information, providing services and transacting with citizens, businesses, and other arms of government. e-Government may be applied by the legislature, judiciary, or administration, in order to improve internal efficiency, the delivery of public services, or processes of democratic governance. The primary delivery models are Government-to-Citizen or Government-to-Customer (G2C), Government-to-Business (G2B), Government-to-Government (G2G) and Government-to-Employees (G2E). Within each of these interaction domains, four kinds of activities take place:

• pushing information over the Internet, e.g. regulatory services, general holidays, public hearing schedules, issue briefs, notifications, etc.;
• two-way communications between the agency and the citizen, a business, or another government agency, in which users can engage in dialogue with agencies and post problems, comments, or requests to the agency;
• conducting transactions, e.g. lodging tax returns, applying for services and grants;
• governance, e.g. online polling, voting, and campaigning.

The most important anticipated benefits of e-government include improved efficiency, convenience, and better accessibility of public services. While e-government is often thought of as "online government" or "Internet-based government," many non-Internet "electronic government" technologies can be used in this context. Some non-Internet forms include telephone, fax, PDA, SMS text messaging, MMS, wireless networks and services, Bluetooth, CCTV, tracking systems, RFID, biometric identification, road traffic management and regulatory enforcement, identity cards, smart cards and other NFC applications; polling station technology (where non-online e-voting is being considered), TV- and radio-based delivery of government services, email, online community facilities, newsgroups and electronic mailing lists, online chat, and instant messaging technologies. There are also some technology-specific sub-categories of e-government, such as m-government (mobile government), u-government (ubiquitous government), and g-government (GIS/GPS applications for e-government).


There are many considerations and potential implications of implementing and designing e-government, including disintermediation of the government and its citizens, impacts on economic, social, and political factors, and disturbances to the status quo in these areas. In countries such as the United Kingdom, there is interest in using electronic government to re-engage citizens with the political process. In particular, this has taken the form of experiments with electronic voting, aiming to increase voter turnout by making voting easy. The UK Electoral Commission has undertaken several pilots, though concern has been expressed about the potential for fraud with some electronic voting methods.

13-1. History of E-Government


E-government is the use of information technology to provide citizens and organizations with more convenient access to government information and services, and to provide delivery of public services to citizens, business partners, and those working in the public sector. The initial part of implementing e-governance is the "computerization" of public offices, enabling them by building their capacity for better service delivery and bringing in good governance, using technology as a catalyst; the second part is the provision of citizen-centric services through digital media, such as developing interactive government portals. Countries with remarkable e-governance initiatives include New Zealand, Canada and Singapore. E-government in the United States was especially driven by the 1998 Government Paperwork Elimination Act and by President Clinton's December 17, 1999, Memorandum on E-Government, which ordered the top 500 forms used by citizens to be placed online by December 2000. The memorandum also directed agencies to construct a secure e-government infrastructure.

13-2. Development and implementation issues


The development and implementation of e-government involves consideration of its effects on the organisation of the public sector (Cordella, 2007) and on the nature of the services provided by the state, including environmental, social, cultural, educational, and consumer issues, among others. Governments may need to consider the impact by gender, age, language skills, and cultural diversity, as well as the effect on literacy, numeracy, education standards and IT literacy. Economic concerns include the "digital divide," or the effect of non-use, non-availability or inaccessibility of e-government, or of other digital resources, upon the structure of society, and the potential impact on income and economics. Economic and revenue-related concerns include e-government's effect on taxation, debt, Gross Domestic Product (GDP), commerce and trade, corporate governance, and its effect on non-e-government business practices, industry and trade, especially Internet Service Providers and Internet infrastructure. From a technological standpoint, the implementation of e-government has effects on e-enablement, interoperability (e.g., e-GIF) and semantic web issues, "legacy technology" (making "pre-eGovernment IT" work together with or be replaced by e-government systems), and implications for software choices (between open source and proprietary software, and between programming languages), as well as political blogging, especially by legislators. There are also management issues related to service integration, local e-government, and Internet governance, including ICANN, IETF and W3C, and financial considerations such as the cost of implementation and its effect on existing budgets, the effect on government procurement, and funding.

The phrase "e-government" has been a rallying cry for public sector modernization since the 1990s, but for many it is now losing its appeal as a slogan or concept. This trend has various drivers. Firstly, there is a wish to mainstream e-government so that best use of technology is integrated into all public sector activity rather than seen as a special interest or add-on. Secondly, many administrations recognise the importance of linking e-government to wider public sector change programmes. Thirdly, the phrase e-government is itself not particularly useful in motivating a change programme. These sorts of considerations have led countries such as the UK to talk of transformational government rather than e-government. Finally, there is the issue of the implications for the public sector of Web 2.0. All these considerations suggest that e-government is entering a new phase, one in which the term "e-government" is itself becoming less popular.


13-3. E-democracy
E-democracy, a combination of the words "electronic" and "democracy," comprises the use of electronic communications technologies such as the Internet in enhancing democratic processes within a democratic republic or representative democracy. It is a political development still in its infancy, as well as the subject of much debate and activity within government, civic-oriented groups and societies around the world.
The term is both descriptive and prescriptive. Typically, the kinds of enhancements sought by proponents of e-democracy are framed in terms of making processes more accessible; making citizen participation in public policy decision-making more expansive and direct so as to enable broader influence on policy outcomes, since involving more individuals could yield smarter policies; increasing transparency and accountability; and keeping the government closer to the consent of the governed, thereby increasing its political legitimacy. E-democracy includes within its scope electronic voting, but has a much wider span than this single aspect of the democratic process. E-democracy is also sometimes referred to as cyberdemocracy or digital democracy.

13-3-1. Practical issues with e-democracy


One major obstacle to the success of e-democracy is that of citizen identification. For secure elections and other secure citizen-to-government transactions, citizens must have some form of identification that preserves privacy, and perhaps also one that can be used in internet forums. The need to allow anonymous posting while at the same time giving certain contributors extra status can be addressed using certain cryptographic methods. Another obstacle is that there are many vested interests that would be harmed by a more direct democracy. Amongst these are politicians, media moguls and some interests in big business and trade unions. These organizations may be expected to oppose meaningful application of e-democracy concepts. Robert's Rules of Order notes that a deliberative assembly requires an environment of simultaneous aural communication; otherwise "situations unprecedented in parliamentary law may arise." Even in a teleconference or videoconference, adjustments must be made in reference to how recognition is to be sought and the floor obtained. The common parliamentary law has not yet developed standardized procedures for conducting business electronically.

13-3-2. Internet as political medium


The Internet is viewed as a platform and delivery medium for tools that help to eliminate some of the distance constraints in direct democracy. Technical media for e-democracy can be expected to extend to mobile technologies such as cellphones. There are important differences between previous communication media and the Internet that are relevant to the Internet as a political medium. Most importantly, the Internet is a many-to-many communication medium, whereas radio and television broadcast few-to-many and telephones connect few-to-few. Also, the Internet has a much greater computational capacity, allowing strong encryption and database management, which is important in community information access and sharing, deliberative democracy and electoral fraud prevention. Further, people use the Internet to collaborate or meet in an asynchronous manner; that is, they do not have to be physically gathered at the same moment to get things accomplished. The lower cost of information exchange on the Internet, as well as the high level of reach that the content potentially has, makes the Internet an attractive medium for political information, particularly amongst social interest groups and parties with lower budgets. For example, environmental or social issue groups may find the Internet an easier mechanism for increasing awareness of their issues than traditional media outlets, such as television or newspapers, which require heavy financial investment. Due to all these factors, the Internet has the potential to take over certain traditional media of political communication, such as the telephone, the television, newspapers and the radio. Another example is OpenForum.com.au, an Australian not-for-profit eDemocracy project which invites politicians, senior public servants, academics, business people and other key stakeholders to engage in high-level policy debate.

13-3-3. Benefits and disadvantages


Contemporary technologies such as electronic mailing lists, peer-to-peer networks, collaborative software, wikis, Internet forums and blogs are clues to and early potential solutions for some aspects of e-democracy. Equally, these technologies are bellwethers of some of the issues associated with the territory, such as the inability to sustain new initiatives or to protect against identity theft, information overload and vandalism.

Some traditional objections to direct democracy are argued to apply to e-democracy, such as the potential for governance to tend towards populism and demagoguery. More practical objections exist, not least in terms of the digital divide between those with access to the media of e-democracy (mobile phones and Internet connections) and those without, as well as the opportunity cost of expenditure on e-democracy innovations.

Electronic democracy can also carry the benefit of reaching out to youth, as a mechanism to increase youth voter turnout in elections and to raise awareness amongst youth. With the consistent decline of voter turnout, e-democracy and electronic voting mechanisms can help reverse that trend. Youth, in particular, have seen a significant drop in turnout in most industrialized nations, including Canada, the United States and the United Kingdom. The use of electronic political participation mechanisms may appear more familiar to youth and, as a result, garner more participation by youths who would otherwise find it inconvenient to vote using the more traditional methods. Electronic democracy can help improve democratic participation, reduce civic illiteracy and voter apathy, and become a useful asset for political discussion, education, debate and participation.


13-3-4. Electronic direct democracy


Electronic direct democracy is a form of direct democracy in which the Internet and other electronic communications technologies are used to reduce the bureaucracy involved in referendums. Many advocates think that technological enhancements to the deliberative process are also important to this notion. Electronic direct democracy is sometimes referred to as EDD (many other names are used for what is essentially the same concept). EDD requires electronic voting or some way to register votes on issues electronically. As in any direct democracy, in an EDD citizens would have the right to vote on legislation, author new legislation, and recall representatives (if any representatives are preserved). EDD as a system is not fully implemented anywhere in the world, although several initiatives are currently forming. Ross Perot was for a time a prominent advocate of EDD when he advocated "electronic town halls" during his 1992 and 1996 Presidential campaigns in the United States. Switzerland, already partially governed by direct democracy, is making progress towards such a system. Several attempts at open source governance are in nascent stages, most notably the Metagovernment project. Senator On-Line, an Australian political party running for the Senate in the 2007 federal elections, proposed to institute an EDD system so that Australians would decide which way the senators vote on each and every bill. Liquid democracy, or direct democracy with delegable proxy, would allow citizens to choose a proxy to vote on their behalf while retaining the right to cast their own vote on legislation. The voting and the appointment of proxies could be done electronically. The proxies could even form proxy chains, in which, if A appoints B and B appoints C, and neither A nor B votes on a proposed bill but C does, C's vote will count for all three of them. Citizens could also rank their proxies in order of preference, so that if their first-choice proxy fails to vote, their vote can be cast by their second-choice proxy. The topology of this system would mirror the structure of the Internet itself, in which routers may have a primary and an alternate server from which to request information.
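As an illustrative sketch only (not part of any deployed system described in the text), the delegable-proxy rule above can be expressed in a few lines of Python: each citizen either votes directly or names a proxy, and a vote is resolved by following the proxy chain until someone who actually voted is found.

def resolve_vote(citizen, proxies, direct_votes, seen=None):
    """Follow the proxy chain from `citizen` until a cast vote is found.

    proxies:      dict mapping a citizen to the proxy they appointed (if any)
    direct_votes: dict mapping a citizen to the vote they cast themselves (if any)
    Returns the resolved vote, or None if the chain ends without a vote
    or loops back on itself.
    """
    if seen is None:
        seen = set()
    if citizen in direct_votes:          # the citizen voted directly
        return direct_votes[citizen]
    if citizen in seen or citizen not in proxies:
        return None                      # delegation loop, or no proxy appointed
    seen.add(citizen)
    return resolve_vote(proxies[citizen], proxies, direct_votes, seen)

# The example from the text: A appoints B, B appoints C; only C votes.
proxies = {"A": "B", "B": "C"}
direct_votes = {"C": "yes"}

for person in ["A", "B", "C"]:
    print(person, resolve_vote(person, proxies, direct_votes))
# C's "yes" is counted for A, B and C alike.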

13-4. Electronic voting


Electronic voting (also known as e-voting) is a term encompassing several different types of voting, embracing both electronic means of casting a vote and electronic means of counting votes.


Electronic voting technology can include punch cards, optical scan voting systems and specialized voting kiosks (including self-contained direct-recording electronic (DRE) voting systems). It can also involve transmission of ballots and votes via telephones, private computer networks, or the Internet. Electronic voting technology can speed the counting of ballots and can provide improved accessibility for disabled voters. However, there has been controversy, especially in the United States, over whether electronic voting, and DRE voting in particular, can facilitate electoral fraud. Electronic voting systems for electorates have been in use since the 1960s, when punch card systems debuted. The newer optical scan voting systems allow a computer to count a voter's mark on a ballot. DRE voting machines, which collect and tabulate votes in a single machine, are used by all voters in all elections in Brazil, and also on a large scale in India, the Netherlands, Venezuela, and the United States. Internet voting systems have gained popularity and have been used for government elections and referendums in the United Kingdom, Estonia and Switzerland, as well as municipal elections in Canada and party primary elections in the United States and France. There are also hybrid systems that include an electronic ballot marking device (usually a touch screen system similar to a DRE) or other assistive technology to print a voter-verifiable paper ballot, then use a separate machine for electronic tabulation.

Paper-based electronic voting system


Sometimes called a "document ballot voting system," paper-based voting systems originated as systems where votes are cast and counted by hand, using paper ballots. With the advent of electronic tabulation came systems where paper cards or sheets could be marked by hand but counted electronically. These systems included punch card voting, marksense and, later, digital pen voting systems. Most recently, these systems can include an Electronic Ballot Marker (EBM) that allows voters to make their selections using an electronic input device, usually a touch screen system similar to a DRE. Systems including a ballot marking device can incorporate different forms of assistive technology.


Direct-recording electronic (DRE) voting system

A direct-recording electronic (DRE) voting machine records votes by means of a ballot display provided with mechanical or electro-optical components that can be activated by the voter (typically buttons or a touchscreen); that processes data with computer software; and that records voting data and ballot images in memory components. After the election it produces a tabulation of the voting data stored in a removable memory component and as a printed copy. The system may also provide a means for transmitting individual ballots or vote totals to a central location for consolidating and reporting results from precincts at the central location. These systems use a precinct count method that tabulates ballots at the polling place. They typically tabulate ballots as they are cast and print the results after the close of polling.

Figure 13-1: Electronic voting machine by Premier Election Solutions (formerly Diebold Election Systems), used in all Brazilian elections and plebiscites. Photo by Agência Brasil.

In 2002, in the United States, the Help America Vote Act mandated that one handicapped-accessible voting system be provided per polling place, which most jurisdictions have chosen to satisfy with the use of DRE voting machines, some switching entirely over to DRE. In 2004, 28.9% of the registered voters in the United States used some type of direct-recording electronic voting system, up from 7.7% in 1996.

Public network DRE voting system

A public network DRE voting system is an election system that uses electronic ballots and transmits vote data from the polling place to another location over a public network. Vote data may be transmitted as individual ballots as they are cast, periodically as batches of ballots throughout the election day, or as one batch at the close of voting. This includes Internet voting as well as telephone voting. A public network DRE voting system can utilize either the precinct count or the central count method. The central count method tabulates ballots from multiple precincts at a central location.


Internet voting can use remote locations (voting from any Internet-capable computer) or can use traditional polling locations with voting booths consisting of Internet-connected voting systems. Corporations and organizations routinely use Internet voting to elect officers and board members and for other proxy elections. Internet voting systems have been used privately in many modern nations and publicly in the United States, the UK, Ireland, Switzerland and Estonia. In Switzerland, where it is already an established part of local referendums, voters get their passwords to access the ballot through the postal service. Most voters in Estonia can cast their vote in local and parliamentary elections, if they want to, via the Internet, as most of those on the electoral roll have access to an e-voting system, the largest run by any European Union country. This has been made possible because most Estonians carry a national identity card equipped with a computer-readable microchip, and it is these cards which they use to get access to the online ballot. All a voter needs is a computer, an electronic card reader, their ID card and its PIN, and they can vote from anywhere in the world. Estonian e-votes can only be cast during the days of advance voting. On election day itself, people have to go to polling stations and fill in a paper ballot.


14. Computer vision

Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory for building artificial systems that obtain information from images. The image data can take many forms, such as a video sequence, views from multiple cameras, or multidimensional data from a medical scanner.
As a technological discipline, computer vision seeks to apply the theories and models of computer vision to the construction of computer vision systems. Examples of applications of computer vision systems include systems for:
• Controlling processes (e.g. an industrial robot or an autonomous vehicle).
• Detecting events (e.g. for visual surveillance or people counting).
• Organizing information (e.g. for indexing databases of images and image sequences).
• Modeling objects or environments (e.g. industrial inspection, medical image analysis or topographical modeling).
• Interaction (e.g. as the input to a device for computer-human interaction).

Computer vision can also be described as a complement (but not necessarily the opposite) of biological vision. In biological vision, the visual perception of humans and various animals is studied, resulting in models of how these systems operate in terms of physiological processes. Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware. Interdisciplinary exchange between biological and computer vision has proven increasingly fruitful for both fields. Sub-domains of computer vision include scene reconstruction, event detection, tracking, object recognition, learning, indexing, ego-motion and image restoration.

14-1. State of the art


The field of computer vision can be characterized as immature and diverse. Even though earlier work exists, it was not until the late 1970s that a more focused study of the field started, when computers could manage the processing of large data sets such as images. However, these studies usually originated from various other fields, and consequently there is no standard formulation of "the computer vision problem." Also, and to an even larger extent, there is no standard formulation of how computer vision problems should be solved. Instead, there exists an abundance of methods for solving various well-defined computer vision tasks, where the methods are often very task-specific and can seldom be generalized over a wide range of applications. Many of the methods and applications are still in the state of basic research, but more and more methods have found their way into commercial products, where they often constitute a part of a larger system which can solve complex tasks (e.g., in the area of medical images, or quality control and measurements in industrial processes). In most practical computer vision applications, the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common, and this has had a large impact on industrial applications.

14-2. Related fields


A significant part of artificial intelligence deals with autonomous planning or deliberation for systems which can perform mechanical actions such as moving a robot through some environment. This type of processing typically needs input data provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot. Other parts which are sometimes described as belonging to artificial intelligence and which are used in relation to computer vision are pattern recognition and learning techniques. As a consequence, computer vision is sometimes seen as a part of the artificial intelligence field or of the computer science field in general.


Figure 14-1: Relation between computer vision and various other fields

Physics is another field that is strongly related to computer vision. A significant part of computer vision deals with methods which require a thorough understanding of the process in which electromagnetic radiation, typically in the visible or the infra-red range, is reflected by the surfaces of objects and finally is measured by the image sensor to produce the image data. This process is based on optics and solid-state physics. More sophisticated image sensors even require quantum mechanics to provide a complete comprehension of the image formation process. Also, various measurement problems in physics can be addressed using computer vision, for example motion in fluids. Consequently, computer vision can also be seen as an extension of physics. A third field which plays an important role is neurobiology, specifically the study of the biological vision system. Over the last century, there has been an extensive study of eyes, neurons, and the brain structures devoted to processing o visual f stimuli in both humans and various animals. This has led to a coarse, yet complicated, description of how "real" vision systems operate in order to solve certain vision related tasks. These results have led to a subfield within computer vision where artificial systems are designed to mimic the processing and behaviour

165

Computer vision 166 of biological systems, at different levels of complexity. Also, some of the learningbased methods developed within computer vision have their background in biology. Yet another field related to computer vision is signal processing. Many methods for processing of one-variable signals, typically temporal signals, can be extended in a natural way to processing of two-variable signals or multi-variable signals in computer vision. However, because of the specific nature of images there are many methods developed within computer vision which have no counterpart in the processing of one-variable signals. A distinct character of these methods is the fact that they are non-linear which, together with the multi-dimensionality of the signal, defines a subfield in signal processing as a part of computer vision. Beside the above mentioned views on computer vision, many of the related research topics can also be studied from a purely mathematical point of view. For example, many methods in computer vision are based on statistics, optimization or geometry. Finally, a significant part of the field is devoted to the implementation aspect of computer vision; how existing methods can be realized in various combinations of software and hardware, or how these methods can be modified in order to gain processing speed without losing too much performance. The fields, most closely related to computer vision, are image processing, image analysis, robot vision and machine vision. There is a significant overlap in terms of what techniques and applications they cover. This implies that the basic techniques that are used and developed in these fields are more or less identical, something which can be interpreted as there is only one field with different names. On the other hand, it appears to be necessary for research groups, scientific journals, conferences and companies to present or market themselves as belonging specifically to one of these fields and, hence, various characterizations which distinguish each of the fields from the others have been presented. The following characterizations appear relevant but should not be taken as universally accepted. Image processing and image analysis tend to focus on 2D images, how to transform one image to another, e.g., by pixel-wise operations such as contrast enhancement, local operations such as edge extraction or noise removal, or geometrical transformations such as rotating the image. This characterization implies that image processing/analysis neither require assumptions nor produce interpretations about the image content. Computer vision tends to focus on the 3D scene projected onto one or several images, e.g., how to reconstruct structure or other information about the 3D scene from one or several images. Computer vision often relies on more or less complex assumptions about the scene depicted in an


image. Machine vision tends to focus on applications, mainly in industry, e.g., vision-based autonomous robots and systems for vision-based inspection or measurement. This implies that image sensor technologies and control theory often are integrated with the processing of image data to control a robot, and that real-time processing is emphasized by means of efficient implementations in hardware and software. There is also a field called imaging which primarily focuses on the process of producing images, but sometimes also deals with processing and analysis of images. For example, medical imaging includes substantial work on the analysis of image data in medical applications. Finally, pattern recognition is a field which uses various methods to extract information from signals in general, mainly based on statistical approaches. A significant part of this field is devoted to applying these methods to image data. A consequence of this state of affairs is that you can be working in a lab related to one of these fields, apply methods from a second field to solve a problem in a third field and present the result at a conference related to a fourth field!

14-3. Applications for computer vision


One of the most prominent application fields is medical computer vision or medical image processing. This area is characterized by the extraction of information from image data for the purpose of making a medical diagnosis of a patient. Generally, image data is in the form of microscopy images, X-ray images, angiography images, ultrasonic images, and tomography images. An example of information which can be extracted from such image data is detection of tumours, arteriosclerosis or other malignant changes. The extracted information can also be measurements of organ dimensions, blood flow, etc. This application area also supports medical research by providing new information, e.g., about the structure of the brain, or about the quality of medical treatments. A second application area of computer vision is in industry. Here, information is extracted for the purpose of supporting a manufacturing process. One example is quality control, where details or final products are automatically inspected in order to find defects. Another example is measurement of the position and orientation of details to be picked up by a robot arm. Military applications are probably one of the largest areas for computer vision. The obvious examples are detection of enemy soldiers or vehicles and missile guidance. More advanced systems for missile guidance send the missile to an area rather than


a specific target, and target selection is made when the missile reaches the area based on locally acquired image data. Modern military concepts, such as "battlefield awareness", imply that various sensors, including image sensors, provide a rich set of information about a combat scene which can be used to support strategic decisions. In this case, automatic processing of the data is used to reduce complexity and to fuse information from multiple sensors to increase reliability.

Figure 14-2: Artist's concept of a rover on Mars, an example of an unmanned land-based vehicle. Notice the stereo cameras mounted on top of the rover. (credit: Maas Digital LLC)

One of the newer application areas is autonomous vehicles, which include submersibles, land-based vehicles (small robots with wheels, cars or trucks), aerial vehicles, and unmanned aerial vehicles (UAV). The level of autonomy ranges from fully autonomous (unmanned) vehicles to vehicles where computer vision based systems support a driver or a pilot in various situations. Fully autonomous vehicles typically use computer vision for navigation, i.e. for knowing where they are, or for producing a map of the environment (SLAM) and for detecting obstacles. Computer vision can also be used for detecting certain task-specific events, e.g., a UAV looking for forest fires. Examples of supporting systems are obstacle warning systems in cars, and systems for autonomous landing of aircraft. Several car manufacturers have demonstrated systems for autonomous driving of cars, but this technology has still not reached a level where it can be put on the market. There are ample examples of military autonomous vehicles ranging from advanced missiles to UAVs for reconnaissance missions or missile guidance. Space exploration already makes use of autonomous vehicles using computer vision, e.g., NASA's Mars Exploration Rover. Other application areas include:

• Support of visual effects creation for cinema and broadcast, e.g., camera tracking (matchmoving).
• Surveillance.

14-4. Typical tasks of computer vision


Each of the application areas described above employs a range of computer vision tasks; more or less well-defined measurement problems or processing problems, which can be solved using a variety of methods. Some examples of typical computer vision tasks are presented below.

14-4-1. Recognition

The classical problem in computer vision, image processing and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. This task can normally be solved robustly and without effort by a human, but is still not satisfactorily solved in computer vision for the general case: arbitrary objects in arbitrary situations. The existing methods for dealing with this problem can at best solve it only for specific objects, such as simple geometric objects (e.g., polyhedrons), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and pose of the object relative to the camera. Different varieties of the recognition problem are described in the literature:
• Recognition: one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene.
• Identification: an individual instance of an object is recognized. Examples: identification of a specific person's face or fingerprint, or identification of a specific vehicle.
• Detection: the image data is scanned for a specific condition. Examples: detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.

Several specialized tasks based on recognition exist, such as:



• Content-based image retrieval: finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative to a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, are taken during winter, and have no cars in them).
• Pose estimation: estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation.
• Optical character recognition (or OCR): identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).

14-4-2. Motion
Several tasks relate to motion estimation, in which an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene. Examples of such tasks are:
• Egomotion: determining the 3D rigid motion of the camera.
• Tracking: following the movements of objects (e.g. vehicles or humans).
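As a rough, purely illustrative sketch of how such a motion estimate can be obtained, the following Python/NumPy code (the function name and all parameters are invented for this example) performs exhaustive block matching between two gray-scale frames: for a block in the first frame it searches a small neighbourhood in the second frame for the displacement that minimises the sum of squared differences. Real systems would typically use gradient-based optical flow or pyramidal search instead.

import numpy as np

def block_motion(frame1, frame2, y, x, block=8, search=4):
    # Estimate the displacement of a block between two frames by exhaustive
    # search over a small window, using the sum of squared differences (SSD).
    ref = frame1[y:y + block, x:x + block].astype(float)
    best_cost, best_disp = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > frame2.shape[0] or xx + block > frame2.shape[1]:
                continue
            cand = frame2[yy:yy + block, xx:xx + block].astype(float)
            cost = np.sum((ref - cand) ** 2)
            if cost < best_cost:
                best_cost, best_disp = cost, (dy, dx)
    return best_disp

# Toy usage: a bright square shifted two pixels to the right between frames.
f1 = np.zeros((32, 32)); f1[10:18, 10:18] = 255
f2 = np.zeros((32, 32)); f2[10:18, 12:20] = 255
print(block_motion(f1, f2, 10, 10))   # expected displacement: (0, 2)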

14-4-3. Scene reconstruction


Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model.

14-4-4. Image restoration


The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest possible approach to noise removal is to apply various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of what the local image structures look like, a model which distinguishes them from the noise. By first analysing the image data in terms of the local image structures, such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained compared to the simpler approaches.
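To make the simplest of these approaches concrete, the following small Python/NumPy sketch (illustrative only; the function name is invented) applies a 3x3 median filter, which removes isolated "salt-and-pepper" noise while preserving edges better than plain averaging would.

import numpy as np

def median_filter_3x3(img):
    # Replace each interior pixel by the median of its 3x3 neighbourhood;
    # border pixels are left unchanged for simplicity.
    out = img.astype(float).copy()
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out

# Toy usage: a flat gray image with a single noisy pixel.
img = np.full((5, 5), 100.0)
img[2, 2] = 255.0                      # impulse ("salt") noise
print(median_filter_3x3(img)[2, 2])    # prints 100.0 -- the outlier is removed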


14-5. Computer vision systems


The organization of a computer vision system is highly application dependent. Some systems are stand-alone applications which solve a specific measurement or detection problem, while others constitute a sub-system of a larger design which, for example, also contains sub-systems for control of mechanical actuators, planning, information databases, man-machine interfaces, etc. The specific implementation of a computer vision system also depends on whether its functionality is pre-specified or whether some part of it can be learned or modified during operation. There are, however, typical functions which are found in many computer vision systems.
• Image acquisition: A digital image is produced by one or several image sensors which, besides various types of light-sensitive cameras, include range sensors, tomography devices, radar, ultra-sonic cameras, etc. Depending on the type of sensor, the resulting image data is an ordinary 2D image, a 3D volume, or an image sequence. The pixel values typically correspond to light intensity in one or several spectral bands (gray images or colour images), but can also be related to various physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.
• Pre-processing: Before a computer vision method can be applied to image data in order to extract some specific piece of information, it is usually necessary to process the data in order to assure that it satisfies certain assumptions implied by the method. Examples are:
  o Re-sampling in order to assure that the image coordinate system is correct.
  o Noise reduction in order to assure that sensor noise does not introduce false information.
  o Contrast enhancement to assure that relevant information can be detected.
  o Scale-space representation to enhance image structures at locally appropriate scales.
• Feature extraction: Image features at various levels of complexity are extracted from the image data. Typical examples of such features are:
  o Lines, edges and ridges.
  o Localized interest points such as corners, blobs or points.
  More complex features may be related to texture, shape or motion.



• Detection/Segmentation: At some point in the processing a decision is made about which image points or regions of the image are relevant for further processing. Examples are:
  o Selection of a specific set of interest points.
  o Segmentation of one or multiple image regions which contain a specific object of interest.
• High-level processing: At this step the input is typically a small set of data, for example a set of points or an image region which is assumed to contain a specific object. The remaining processing deals with, for example:
  o Verification that the data satisfies model-based and application-specific assumptions.
  o Estimation of application-specific parameters, such as object pose or object size.
  o Classification of a detected object into different categories.
A minimal sketch of such a pipeline, in code, is given below.
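The following Python/NumPy sketch strings together a toy version of these stages; the synthetic image, thresholds and variable names are all invented for illustration and stand in for a real sensor and real detection logic.

import numpy as np

# Image acquisition (simulated): a bright square on a noisy dark background.
rng = np.random.default_rng(0)
image = rng.normal(20, 5, size=(64, 64))      # simulated sensor noise
image[20:30, 35:45] += 200                    # the "object"

# Pre-processing: a 3x3 mean filter to suppress sensor noise.
padded = np.pad(image, 1, mode="edge")
smoothed = sum(padded[dy:dy + 64, dx:dx + 64]
               for dy in range(3) for dx in range(3)) / 9.0

# Detection / segmentation: threshold into object versus background.
mask = smoothed > 100

# Feature extraction and high-level processing: object size and centroid,
# which could serve as simple application-specific parameters.
ys, xs = np.nonzero(mask)
print("object pixels:", len(ys))
print("centroid (y, x):", ys.mean(), xs.mean())   # roughly (24.5, 39.5)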


15. Artificial intelligence


Artificial intelligence (AI) is the intelligence of machines and the branch of computer science which aims to create it.

Figure 15-1: Garry Kasparov playing against Deep Blue, the first machine to win a chess match against a world champion.

Major AI textbooks define artificial intelligence as "the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions which maximize its chances of success. John McCarthy, who coined the term in 1956, defines it as "the science and engineering of making intelligent machines." Among the traits that researchers hope machines will exhibit are reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects. General intelligence (or "strong AI") has not yet been achieved and is a long-term goal of some AI research. AI research uses tools and insights from many fields, including computer science, psychology, philosophy, neuroscience, cognitive science, linguistics, ontology, operations research, economics, control theory, probability, optimization and logic. AI research also overlaps with tasks such as robotics, control systems, scheduling, data mining, logistics, speech recognition, facial recognition and many others. Other names for the field have been proposed, such as computational intelligence, synthetic intelligence, intelligent systems, or computational rationality. These alternative names are sometimes used to set oneself apart from the part of AI dealing with symbols (considered outdated by many, see GOFAI) which is often associated with the term AI itself.


15-1. History of AI research


In the middle of the 20th century, a handful of scientists began a new approach to building intelligent machines, based on recent discoveries in neurology, a new mathematical theory of information, an understanding of control and stability called cybernetics, and, above all, the invention of the digital computer, a machine based on the abstract essence of mathematical reasoning. The field of modern AI research was founded at a conference on the campus of Dartmouth College in the summer of 1956. Those who attended would become the leaders of AI research for many decades, especially John McCarthy, Marvin Minsky, Allen Newell and Herbert Simon, who founded AI laboratories at MIT, CMU and Stanford. They and their students wrote programs that were, to most people, simply astonishing: computers were solving word problems in algebra, proving logical theorems and speaking English. By the mid-1960s their research was heavily funded by the U.S. Department of Defense and they were optimistic about the future of the new field:
• 1965, H. A. Simon: "Machines will be capable, within twenty years, of doing any work a man can do."
• 1967, Marvin Minsky: "Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved."

These predictions, and many like them, would not come true. They had failed to recognize the difficulty of some of the problems they faced. In 1974, in response to the criticism of England's Sir James Lighthill and ongoing pressure from Congress to fund more productive projects, the U.S. and British governments cut off funding for undirected, exploratory research in AI. This was the first AI Winter. In the early 1980s, AI research was revived by the commercial success of expert systems (a form of AI program that simulated the knowledge and analytical skills of one or more human experts). By 1985 the market for AI had reached more than a billion dollars and governments around the world poured money back into the field. However, just a few years later, beginning with the collapse of the Lisp Machine market in 1987, AI once again fell into disrepute, and a second, more lasting AI Winter began. In the 1990s and early 21st century AI achieved its greatest successes, albeit somewhat behind the scenes. Artificial intelligence was adopted throughout the technology industry, providing the heavy lifting for logistics, data mining, medical


diagnosis and many other areas. The success was due to several factors: the incredible power of computers today (see Moore's law), a greater emphasis on solving specific subproblems, the creation of new ties between AI and other fields working on similar problems, and above all a new commitment by researchers to solid mathematical methods and rigorous scientific standards.

15-2. Philosophy of AI
Artificial intelligence, by claiming to be able to recreate the capabilities of the human mind, is both a challenge and an inspiration for philosophy. Are there limits to how intelligent machines can be? Is there an essential difference between human intelligence and artificial intelligence? Can a machine have a mind and consciousness? A few of the most influential answers to these questions are given below.
• Turing's "polite convention": If a machine acts as intelligently as a human being, then it is as intelligent as a human being. Alan Turing theorized that, ultimately, we can only judge the intelligence of a machine based on its behavior. This theory forms the basis of the Turing test.
• The Dartmouth proposal: "Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it." This assertion was printed in the proposal for the Dartmouth Conference of 1956, and represents the position of most working AI researchers.
• Newell and Simon's physical symbol system hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action." This statement claims that the essence of intelligence is symbol manipulation. Hubert Dreyfus argued that, on the contrary, human expertise depends on unconscious instinct rather than conscious symbol manipulation and on having a "feel" for the situation rather than explicit symbolic knowledge.
• Gödel's incompleteness theorem: A formal system (such as a computer program) cannot prove all true statements. Roger Penrose is among those who claim that Gödel's theorem limits what machines can do.
• Searle's strong AI hypothesis: "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds." Searle counters this assertion with his Chinese room argument, which asks us to look inside the computer and try to find where the "mind" might be.
• The artificial brain argument: The brain can be simulated. Hans Moravec, Ray Kurzweil and others have argued that it is technologically feasible to copy the brain directly into hardware and software, and that such a simulation will be essentially identical to the original. This argument combines the idea that a suitably powerful machine can simulate any process, with the materialist idea that the mind is the result of physical processes in the brain.

15-3. AI research
15-3-1. Problems of AI
While there is no universally accepted definition of intelligence, AI researchers have studied several traits that are considered essential.

Deduction, reasoning, problem solving


Early AI researchers developed algorithms that imitated the process of conscious, step-by-step reasoning that human beings use when they solve puzzles, play board games, or make logical deductions. By the late 1980s and 1990s, AI research had also developed highly successful methods for dealing with uncertain or incomplete information, employing concepts from probability and economics. For difficult problems, most of these algorithms can require enormous computational resources; most experience a "combinatorial explosion": the amount of memory or computer time required becomes astronomical when the problem goes beyond a certain size. The search for more efficient problem solving algorithms is a high priority for AI research. It is not clear, however, that conscious human reasoning is any more efficient when faced with a difficult abstract problem. Cognitive scientists have demonstrated that human beings solve most of their problems using unconscious reasoning, rather than the conscious, step-by-step deduction that early AI research was able to model. Embodied cognitive science argues that unconscious sensorimotor skills are essential to our problem solving abilities. It is hoped that sub-symbolic methods, like computational intelligence and situated AI, will be able to model these


instinctive skills. The problem of unconscious problem solving, which forms part of our commonsense reasoning, is largely unsolved.

15-3-2. Knowledge representation

Knowledge representation and knowledge engineering are central to AI research. Many of the problems machines are expected to solve will require extensive knowledge about the world. Among the things that AI needs to represent are: objects, properties, categories and relations between objects; situations, events, states and time; causes and effects; knowledge about knowledge (what we know about what other people know); and many other, less well researched domains. A complete representation of "what exists" is an ontology (borrowing a word from traditional philosophy), of which the most general are called upper ontologies. Among the most difficult problems in knowledge representation are:
• Default reasoning and the qualification problem: Many of the things people know take the form of "working assumptions." For example, if a bird comes up in conversation, people typically picture an animal that is fist-sized, sings, and flies. None of these things are true about birds in general. John McCarthy identified this problem in 1969 as the qualification problem: for any commonsense rule that AI researchers care to represent, there tend to be a huge number of exceptions. Almost nothing is simply true or false in the way that abstract logic requires. AI research has explored a number of solutions to this problem.
• Unconscious knowledge: Much of what people know isn't represented as "facts" or "statements" that they could actually say out loud. They take the form of intuitions or tendencies and are represented in the brain unconsciously and sub-symbolically. This unconscious knowledge informs, supports and provides a context for our conscious knowledge. As with the related problem of unconscious reasoning, it is hoped that situated AI or computational intelligence will provide ways to represent this kind of knowledge.
• The breadth of commonsense knowledge: The number of atomic facts that the average person knows is astronomical. Research projects that attempt to build a complete knowledge base of commonsense knowledge, such as Cyc, require enormous amounts of tedious step-by-step ontological engineering; they must be built, by hand, one complicated concept at a time.


15-3-3. Planning
Intelligent agents must be able to set goals and achieve them. They need a way to visualize the future (they must have a representation of the state of the world and be able to make predictions about how their actions will change it) and be able to make choices that maximize the utility (or "value") of the available choices. In some planning problems, the agent can assume that it is the only thing acting on the world and it can be certain what the consequences of its actions may be. However, if this is not true, it must periodically check if the world matches its predictions and it must change its plan as this becomes necessary, requiring the agent to reason under uncertainty. Multi-agent planning uses the cooperation and competition of many agents to achieve a given goal. Emergent behavior such as this is used by evolutionary algorithms and swarm intelligence.

15-3-4. Learning
Important machine learning problems are:
• Unsupervised learning: find a model that matches a stream of input "experiences", and be able to predict what new "experiences" to expect.
• Supervised learning, such as classification (being able to determine what category something belongs in, after seeing a number of examples of things from each category) or regression (given a set of numerical input/output examples, discover a continuous function that would generate the outputs from the inputs); a minimal regression example is sketched below.
• Reinforcement learning: the agent is rewarded for good responses and punished for bad ones. (These can be analyzed in terms of decision theory, using concepts like utility.)
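As a minimal illustration of the supervised (regression) case, the sketch below fits a straight line to a handful of made-up input/output examples with ordinary least squares and then predicts the output for a new input; the data and variable names are invented for this example.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 1.0 + np.array([0.1, -0.2, 0.05, 0.0, -0.1])   # noisy targets

A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)                          # close to 3 and 1

# Prediction for a new, unseen input:
print(slope * 10.0 + intercept)                  # roughly 31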

The mathematical analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory.

15-3-5. Natural language processing


Natural language processing gives machines the ability to read and understand the languages human beings speak. Many researchers hope that a sufficiently powerful natural language processing system would be able to acquire knowledge on its


own, by reading the existing text available over the Internet. Some straightforward applications of natural language processing include information retrieval (or text mining) and machine translation.

Motion and manipulation

The field of robotics is closely related to AI. Intelligence is required for robots to be able to handle such tasks as object manipulation and navigation, with subproblems of localization (knowing where you are), mapping (learning what is around you) and motion planning (figuring out how to get there).

Figure 15-2: ASIMO uses sensors and intelligent algorithms to avoid obstacles and navigate stairs.

15-3-6. Cybernetics and brain simulation


In the 40s and 50s, a number of researchers explored the connection between neurology, information theory, and cybernetics. Some of them built machines that used electronic networks to exhibit rudimentary intelligence, such as W. Grey Walter's turtles and the Johns Hopkins Beast. Many of these researchers gathered for meetings of the Teleological Society at Princeton and the Ratio Club in England.


Figure 15-4: The human brain provides inspiration for artificial intelligence researchers; however, there is no consensus on how closely it should be simulated.

15-3-7. Traditional symbolic AI


When access to digital computers became possible in the mid-1950s, AI research began to explore the possibility that human intelligence could be reduced to symbol manipulation. The research was centered in three institutions: CMU, Stanford and MIT, and each one developed its own style of research. John Haugeland named these approaches to AI "good old fashioned AI" or "GOFAI".

Cognitive simulation

Economist Herbert Simon and Allen Newell studied human problem solving skills and attempted to formalize them, and their work laid the foundations of the field of artificial intelligence, as well as cognitive science, operations research and management science. Their research team performed psychological experiments to demonstrate the similarities between human problem solving and the programs (such as their "General Problem Solver") they were developing. This tradition, centered at Carnegie Mellon University, would eventually culminate in the development of the Soar architecture in the mid-1980s.

Logical AI

Unlike Newell and Simon, John McCarthy felt that machines did not need to simulate human thought, but should instead try to find the essence of abstract reasoning and problem solving, regardless of whether people used the same algorithms. His laboratory at Stanford (SAIL) focused on using formal logic to solve a wide variety of problems, including knowledge


representation, planning and learning. Logic was also the focus of the work at the University of Edinburgh and elsewhere in Europe, which led to the development of the programming language Prolog and the science of logic programming.

"Scruffy" symbolic AI

Researchers at MIT (such as Marvin Minsky and Seymour Papert) found that solving difficult problems in vision and natural language processing required ad-hoc solutions; they argued that there was no simple and general principle (like logic) that would capture all the aspects of intelligent behavior. Roger Schank described their "anti-logic" approaches as "scruffy" (as opposed to the "neat" paradigms at CMU and Stanford), and this still forms the basis of research into commonsense knowledge bases (such as Doug Lenat's Cyc) which must be built one complicated concept at a time.

Knowledge-based AI

When computers with large memories became available around 1970, researchers from all three traditions began to build knowledge into AI applications. This "knowledge revolution" led to the development and deployment of expert systems (introduced by Edward Feigenbaum), the first truly successful form of AI software. The knowledge revolution was also driven by the realization that truly enormous amounts of knowledge would be required by many simple AI applications.

15-3-8. Search and optimization


Many problems in AI can be solved in theory by intelligently searching through many possible solutions: reasoning can be reduced to performing a search. For example, logical proof can be viewed as searching for a path that leads from premises to conclusions, where each step is the application of an inference rule. Planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis. Robotics algorithms for moving limbs and grasping objects use local searches in configuration space. Many learning algorithms use search algorithms based on optimization. Simple exhaustive searches are rarely sufficient for most real-world problems: the search space (the number of places to search) quickly grows to astronomical numbers. The result is a search that is too slow or never completes. The solution, for many problems, is to use "heuristics" or "rules of thumb" that eliminate choices that are unlikely to lead to the goal (called "pruning the search tree"). Heuristics supply the program with a "best guess" for what path the solution lies on.
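The idea of guiding a search with a heuristic "best guess" can be sketched in a few lines of Python; the grid world, the neighbour function and the Manhattan-distance heuristic below are invented for illustration, and a production system would more likely use A* (which also tracks the cost accumulated so far).

import heapq

def best_first_search(start, goal, neighbours, heuristic):
    # Greedy best-first search: always expand the state whose heuristic
    # value is smallest; states already seen are pruned.
    frontier = [(heuristic(start), start)]
    came_from = {start: None}
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for nxt in neighbours(state):
            if nxt not in came_from:
                came_from[nxt] = state
                heapq.heappush(frontier, (heuristic(nxt), nxt))
    return None

# Toy usage: find a path on a 5x5 grid from (0, 0) to (4, 4).
def neighbours(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

manhattan = lambda p: abs(p[0] - 4) + abs(p[1] - 4)
print(best_first_search((0, 0), (4, 4), neighbours, manhattan))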


A very different kind of search came to prominence in the 1990s, based on the mathematical theory of optimization. For many problems, it is possible to begin the search with some form of a guess and then refine the guess incrementally until no more refinements can be made. These algorithms can be visualized as blind hill climbing: we begin the search at a random point on the landscape, and then, by jumps or steps, we keep moving our guess uphill, until we reach the top. Other optimization algorithms are simulated annealing, beam search and random optimization. Evolutionary computation uses a form of optimization search. For example, evolutionary algorithms may begin with a population of organisms (the guesses) and then allow them to mutate and recombine, selecting only the fittest to survive each generation (refining the guesses). Forms of evolutionary computation include swarm intelligence algorithms (such as ant colony or particle swarm optimization) and evolutionary algorithms (such as genetic algorithms and genetic programming).
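A minimal sketch of the hill-climbing idea, in Python with an invented one-dimensional objective function, is given below; real applications add random restarts or simulated annealing to escape local maxima.

import random

def hill_climb(f, x, step=0.1, iterations=1000):
    # Stochastic hill climbing: propose a small random move and keep it
    # only if it improves the objective f.
    best = f(x)
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)
        value = f(candidate)
        if value > best:              # move "uphill" only
            x, best = candidate, value
    return x, best

# Toy objective with its maximum at x = 2 (value 5).
f = lambda x: 5 - (x - 2) ** 2
print(hill_climb(f, x=random.uniform(-10, 10)))   # close to (2, 5)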

15-3-9. Logic
Logic was introduced into AI research by John McCarthy in his 1958 Advice Taker proposal. The most important technical development was J. Alan Robinson's discovery of the resolution and unification algorithm for logical deduction in 1963. This procedure is simple, complete and entirely algorithmic, and can easily be performed by digital computers. However, a naive implementation of the algorithm quickly leads to a combinatorial explosion or an infinite loop. In 1974, Robert Kowalski suggested representing logical expressions as Horn clauses (statements in the form of rules: "if p then q"), which reduced logical deduction to backward chaining or forward chaining. This greatly alleviated (but did not eliminate) the problem; a toy backward-chaining example is sketched after the list of logics below. Logic is used for knowledge representation and problem solving, but it can be applied to other problems as well. For example, the satplan algorithm uses logic for planning, and inductive logic programming is a method for learning. There are several different forms of logic used in AI research.
• Propositional or sentential logic is the logic of statements which can be true or false.
• First-order logic also allows the use of quantifiers and predicates, and can express facts about objects, their properties, and their relations with each other.
• Fuzzy logic, a version of first-order logic which allows the truth of a statement to be represented as a value between 0 and 1, rather than simply True (1) or False (0). Fuzzy systems can be used for uncertain reasoning and have been widely used in modern industrial and consumer product control systems.
• Default logics, non-monotonic logics and circumscription are forms of logic designed to help with default reasoning and the qualification problem.
• Several extensions of logic have been designed to handle specific domains of knowledge, such as: description logics; situation calculus, event calculus and fluent calculus (for representing events and time); causal calculus; belief calculus; and modal logics.
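The toy Python sketch below illustrates the backward chaining over Horn clauses mentioned above; the rules and facts are invented, and a real logic-programming system such as Prolog adds unification, variables and loop detection on top of this basic idea.

# Each rule is (premises, conclusion): "if all premises hold, the conclusion holds".
rules = [
    (["mammal", "has_stripes"], "tiger"),
    (["gives_milk"], "mammal"),
]
facts = {"gives_milk", "has_stripes"}

def prove(goal):
    # Backward chaining: a goal is proved if it is a known fact, or if some
    # rule concludes it and all of that rule's premises can be proved.
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(prove(p) for p in premises):
            return True
    return False

print(prove("tiger"))   # True: gives_milk -> mammal, plus has_stripes -> tiger
print(prove("bird"))    # False: cannot be derived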

15-3-10. Classifiers and statistical learning methods


The simplest AI applications can be divided into two types: classifiers ("if shiny then diamond") and controllers ("if shiny then pick up"). Controllers do, however, also classify conditions before inferring actions, and therefore classification forms a central part of many AI systems. Classifiers are functions that use pattern matching to determine a closest match. They can be tuned according to examples, making them very attractive for use in AI. These examples are known as observations or patterns. In supervised learning, each pattern belongs to a certain predefined class. A class can be seen as a decision that has to be made. All the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience. A classifier can be trained in various ways; there are many statistical and machine learning approaches. A wide range of classifiers are available, each with its strengths and weaknesses. Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems; this is also referred to as the "no free lunch" theorem. Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance. Determining a suitable classifier for a given problem is, however, still more an art than a science. The most widely used classifiers are the neural network, kernel methods such as the support vector machine, the k-nearest neighbor algorithm, the Gaussian mixture model, the naive Bayes classifier, and the decision tree. The performance of


these classifiers has been compared over a wide range of classification tasks in order to find data characteristics that determine classifier performance.
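One of the classifiers named above, the k-nearest neighbour algorithm, is simple enough to sketch in a few lines of Python/NumPy; the two-dimensional observations and class labels below are made up for illustration.

import numpy as np
from collections import Counter

def knn_classify(query, observations, labels, k=3):
    # Classify the query by majority vote among the k training observations
    # closest to it (Euclidean distance).
    distances = np.linalg.norm(observations - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

observations = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],    # class "a"
                         [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]])   # class "b"
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(np.array([0.1, 0.1]), observations, labels))  # prints "a"
print(knn_classify(np.array([1.0, 1.0]), observations, labels))  # prints "b"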

15-3-11. Neural networks


The study of artificial neural networks began in the decade before the field of AI research was founded. In the late 1950s Frank Rosenblatt developed an important early version, the perceptron. Paul Werbos developed the backpropagation algorithm for multilayer perceptrons in 1974, which led to a renaissance in neural network research and connectionism in general in the mid-1980s. The Hopfield net, a form of attractor network, was first described by John Hopfield in 1982.

Figure 15-5: A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain.

Common network architectures which have been developed include the feedforward neural network, the radial basis network, the Kohonen self-organizing map and various recurrent neural networks. Neural networks are applied to the problem of learning, using such techniques as Hebbian learning, competitive learning and the relatively new field of Hierarchical Temporal Memory which simulates the architecture of the neocortex.
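A single Rosenblatt-style perceptron, the earliest of the models mentioned above, can be sketched and trained in a few lines of Python/NumPy; the task (learning the logical AND function), learning rate and number of passes are chosen arbitrarily for illustration, and multilayer networks trained with backpropagation elaborate the same "adjust the weights from the errors" idea.

import numpy as np

inputs  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)          # logical AND

weights, bias, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                                     # a few passes over the data
    for x, t in zip(inputs, targets):
        y = 1.0 if x @ weights + bias > 0 else 0.0      # threshold activation
        weights += lr * (t - y) * x                     # perceptron learning rule
        bias    += lr * (t - y)

for x in inputs:
    print(x, 1.0 if x @ weights + bias > 0 else 0.0)    # reproduces AND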

15-4. Applications of artificial intelligence


Artificial intelligence has successfully been used in a wide range of fields including medical diagnosis, stock trading, robot control, law, scientific discovery and toys. Frequently, when a technique reaches mainstream use it is no longer considered artificial intelligence, sometimes described as the AI effect. It may also become integrated into artificial life.


16. Human-computer interaction



Human-computer interaction, or HCI, is the study of interaction between people (users) and computers. It is often regarded as the intersection of computer science, behavioral sciences, design and several other fields of study. Interaction between users and computers occurs at the user interface (or simply interface), which includes both software and hardware, for example, general-purpose computer peripherals and large-scale mechanical systems, such as aircraft and power plants. The following definition is given by the Association for Computing Machinery:
"Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them." Because human-computer interaction studies a human and a machine in conjunction, it draws from supporting knowledge on both the machine and the human side. On the machine side, techniques in computer graphics, operating systems, programming languages, and development environments are relevant. On the human side, communication theory, graphic and industrial design disciplines, linguistics, social sciences, cognitive psychology, and human performance are relevant. Engineering and design methods are also relevant. Due to the multidisciplinary nature of HCI, people with different backgrounds contribute to its success. However, due to the different value systems of its diverse members, the collaboration can be challenging. HCI is also sometimes referred to as man-machine interaction (MMI) or computer-human interaction (CHI).

16-1. Goals
A basic goal of HCI is to improve the interactions between users and computers by making computers more usable and receptive to the user's needs. Specifically, HCI is concerned with:



• methodologies and processes for designing interfaces (i.e., given a task and a class of users, design the best possible interface within given constraints, optimizing for a desired property such as learnability or efficiency of use)
• methods for implementing interfaces (e.g. software toolkits and libraries; efficient algorithms)
• techniques for evaluating and comparing interfaces
• developing new interfaces and interaction techniques
• developing descriptive and predictive models and theories of interaction

A long term goal of HCI is to design systems that minimize the barrier between the human's cognitive model of what they want to accomplish and the computer's understanding of the user's task. Professional practitioners in HCI are usually designers concerned with the practical application of design methodologies to real-world problems. Their work often revolves around designing graphical user interfaces and web interfaces. Researchers in HCI are interested in developing new design methodologies, experimenting with new hardware devices, prototyping new software systems, exploring new paradigms for interaction, and developing models and theories of interaction.

16-2. Differences with related fields


HCI differs from human factors in that there is more of a focus on users working with computers rather than other kinds of machines or designed artifacts, and an additional focus on how to implement the (software and hardware) mechanisms behind computers to support human-computer interaction. HCI also differs from ergonomics in that there is less of a focus on repetitive work-oriented tasks and procedures, and much less emphasis on physical stress and the physical form or industrial design of physical aspects of the user interface, such as the physical form of keyboards and mice.

16-3. Design principles


When evaluating a current user interface, or designing a new user interface, it is important to keep in mind the following experimental design principles:
• Early focus on user(s) and task(s): Establish how many users are needed to perform the task(s) and determine who the appropriate users should be; someone who has never used the interface, and will not use the interface in the future, is most likely not a valid user. In addition, define the task(s) the users will be performing and how often the task(s) need to be performed.
• Empirical measurement: Test the interface early on with real users who come in contact with the interface on an everyday basis. Keep in mind that results may be altered if the performance level of the user is not an accurate depiction of the real human-computer interaction. Establish quantitative usability specifics such as: the number of users performing the task(s), the time to complete the task(s), and the number of errors made during the task(s).
• Iterative design: After determining the users, tasks, and empirical measurements to include, perform the following iterative design steps:
  1. Design the user interface
  2. Test
  3. Analyze results
  4. Repeat
Repeat the iterative design process until a sensible, user-friendly interface is created.

16-4. Design methodologies


A number of diverse methodologies outlining techniques for human computer interaction design have emerged since the rise of the field in the 1980s. Most design methodologies stem from a model for how users, designers, and technical systems interact. Early methodologies, for example, treated users' cognitive processes as predictable and quantifiable and encouraged design practitioners to look to cognitive science results in areas such as memory and attention when designing user interfaces. Modern models tend to focus on a constant feedback and conversation between users, designers, and engineers and push for technical systems to be wrapped around the types of experiences users want to have, rather than wrapping user experience around a completed system.
• User-centered design: user-centered design (UCD) is a modern, widely practiced design philosophy rooted in the idea that users must take center stage in the design of any computer system. Users, designers and technical practitioners work together to articulate the wants, needs and limitations of the user and create a system that addresses these elements. Often, user-centered design projects are informed by ethnographic studies of the environments in which users will be interacting with the system. This practice is similar, but not identical, to Participatory Design, which emphasizes the possibility for end-users to contribute actively through shared design sessions and workshops.
• Principles of User Interface Design: these are seven principles that may be considered at any time during the design of a user interface in any order, namely Tolerance, Simplicity, Visibility, Affordance, Consistency, Structure and Feedback.

16-5. Display design


Displays are human-made artifacts designed to support the perception of relevant system variables and to facilitate further processing of that information. Before a display is designed, the task that the display is intended to support must be defined (e.g. navigating, controlling, decision making, learning, entertaining, etc.). A user or operator must be able to process whatever information that a system generates and displays; therefore, the information must be displayed according to principles in a manner that will support perception, situation awareness, and understanding.

THIRTEEN PRINCIPLES OF DISPLAY DESIGN


These principles of human perception and information processing can be utilized to create an effective display design. A reduction in errors, a reduction in required training time, an increase in efficiency, and an increase in user satisfaction are a few of the many potential benefits that can be achieved through utilization of these principles. Certain principles may not be applicable to different displays or situations. Some principles may seem to be conflicting, and there is no simple solution to say that one principle is more important than another. The principles may be tailored to a specific design or situation. Striking a functional balance among the principles is critical for an effective design.

Perceptual Principles
1. Make displays legible (or audible)


A display's legibility is critical and necessary for designing a usable display. If the characters or objects being displayed are not discernible, then the operator cannot effectively make use of them.

2. Avoid absolute judgment limits

Do not ask the user to determine the level of a variable on the basis of a single sensory variable (e.g. color, size, loudness). These sensory variables can contain many possible levels.

3. Top-down processing

Signals are likely perceived and interpreted in accordance with what is expected based on a user's past experience. If a signal is presented contrary to the user's expectation, more physical evidence of that signal may need to be presented to assure that it is understood correctly.

4. Redundancy gain

If a signal is presented more than once, it is more likely that it will be understood correctly. This can be done by presenting the signal in alternative physical forms (e.g. color and shape, voice and print, etc.), as redundancy does not imply repetition. A traffic light is a good example of redundancy, as color and position are redundant.

5. Similarity causes confusion: use discriminable elements

Signals that appear to be similar will likely be confused. It is the ratio of similar features to different features that causes signals to be similar. For example, A423B9 is more similar to A423B8 than 92 is to 93. Unnecessarily similar features should be removed and dissimilar features should be highlighted.

Mental Model Principles


6. Principle of pictorial realism

A display should look like the variable that it represents (e.g. high temperature on a thermometer shown as a higher vertical level). If there are multiple elements, they can be configured in a manner that mirrors how they would appear in the represented environment.


7. Principle of the moving part

Moving elements should move in a pattern and direction compatible with the user's mental model of how they actually move in the system. For example, the moving element on an altimeter should move upward with increasing altitude.

Principles Based on Attention


8. Minimizing information access cost

When the user's attention is averted from one location to another to access necessary information, there is an associated cost in time or effort. A display design should minimize this cost by allowing frequently accessed sources to be located at the nearest possible position. However, adequate legibility should not be sacrificed to reduce this cost.

9. Proximity compatibility principle

Divided attention between two information sources may be necessary for the completion of one task. These sources must be mentally integrated and are defined to have close mental proximity. Information access costs should be low, which can be achieved in many ways (e.g. close proximity, linkage by common colors, patterns, shapes, etc.). However, close display proximity can be harmful by causing too much clutter.

10. Principle of multiple resources

A user can more easily process information across different resources. For example, visual and auditory information can be presented simultaneously rather than presenting all visual or all auditory information.

Memory Principles
11. Replace memory with visual information: knowledge in the world

A user should not need to retain important information solely in working memory or to retrieve it from long-term memory. A menu, checklist, or another display can aid the user by easing the use of their memory. However, relying on memory may sometimes benefit the user more than requiring reference to some type of knowledge in the world (e.g. an expert computer operator would rather use direct commands from memory than refer to a manual). The use of


knowledge in a user's head and knowledge in the world must be balanced for an effective design.

12. Principle of predictive aiding

Proactive actions are usually more effective than reactive actions. A display should attempt to eliminate resource-demanding cognitive tasks and replace them with simpler perceptual tasks to reduce the use of the user's mental resources. This will allow the user to not only focus on current conditions, but also think about possible future conditions. An example of a predictive aid is a road sign displaying the distance from a certain destination.

13. Principle of consistency

Old habits from other displays will easily transfer to support processing of new displays if they are designed in a consistent manner. A user's long-term memory will trigger actions that are expected to be appropriate. A design must accept this fact and utilize consistency among different displays.

16-6. Future developments in HCI


The means by which humans interact with computers continues to evolve rapidly. Human-computer interaction is affected by the forces shaping the nature of future computing. These forces include:
• Decreasing hardware costs leading to larger memories and faster systems
• Miniaturization of hardware leading to portability
• Reduction in power requirements leading to portability
• New display technologies leading to the packaging of computational devices in new forms
• Specialized hardware leading to new functions
• Increased development of network communication and distributed computing
• Increasingly widespread use of computers, especially by people who are outside of the computing profession



• Increasing innovation in input techniques (i.e., voice, gesture, pen), combined with lowering cost, leading to rapid computerization by people previously left out of the "computer revolution"
• Wider social concerns leading to improved access to computers by currently disadvantaged groups

The future for HCI is expected to include the following characteristics:

• Ubiquitous communication: Computers will communicate through high-speed local networks, nationally over wide-area networks, and portably via infrared, ultrasonic, cellular, and other technologies. Data and computational services will be portably accessible from many if not most locations to which a user travels.
• High functionality systems: Systems will have large numbers of functions associated with them. There will be so many systems that most users, technical or non-technical, will not have time to learn them in the traditional way (e.g., through thick manuals).
• Mass availability of computer graphics: Computer graphics capabilities such as image processing, graphics transformations, rendering, and interactive animation will become widespread as inexpensive chips become available for inclusion in general workstations.
• Mixed media: Systems will handle images, voice, sounds, video, text, and formatted data. These will be exchangeable over communication links among users. The separate worlds of consumer electronics (e.g., stereo sets, VCRs, televisions) and computers will partially merge. Computer and print worlds will continue to cross-assimilate each other.
• High-bandwidth interaction: The rate at which humans and machines interact will increase substantially due to the changes in speed, computer graphics, new media, and new input/output devices. This will lead to some qualitatively different interfaces, such as virtual reality or computational video.
• Large and thin displays: New display technologies will finally mature, enabling very large displays and also displays that are thin, lightweight, and have low power consumption. This will have large effects on portability and will enable the development of paper-like, pen-based computer interaction systems very different in feel from desktop workstations of the present.


• Embedded computation: Computation will pass beyond desktop computers into every object for which uses can be found. The environment will be alive with little computations, from computerized cooking appliances to lighting and plumbing fixtures to window blinds to automobile braking systems to greeting cards. To some extent, this development is already taking place. The difference in the future is the addition of networked communications that will allow many of these embedded computations to coordinate with each other and with the user. Human interfaces to these embedded devices will in many cases be very different from those appropriate to workstations.
• Augmented reality: A common staple of science fiction, augmented reality refers to the notion of layering relevant information into our vision of the world. Existing projects show real-time statistics to users performing difficult tasks, such as manufacturing. Future work might include augmenting our social interactions by providing additional information about those we converse with.
• Group interfaces: Interfaces that allow groups of people to coordinate will be common (e.g., for meetings, for engineering projects, for authoring joint documents). These will have major impacts on the nature of organizations and on the division of labor. Models of the group design process will be embedded in systems and will cause increased rationalization of design.
• User Tailorability: Ordinary users will routinely tailor applications to their own use and will use this power to invent new applications based on their understanding of their own domains. Users, with their deeper knowledge of their own knowledge domains, will increasingly be important sources of new applications at the expense of generic systems programmers (with systems expertise but low domain expertise).
• Information Utilities: Public information utilities (such as home banking and shopping) and specialized industry services (e.g., weather for pilots) will continue to proliferate. The rate of proliferation will accelerate with the introduction of high-bandwidth interaction and the improvement in quality of interfaces.

16-7. Some notes on terminology


• HCI vs MMI. MMI has been used to refer to any man-machine interaction, including, but not exclusively, interaction with computers. The term was used early on in control room design for anything operated on or observed by an operator, e.g. dials, switches, knobs and gauges.



• HCI vs CHI. The acronym CHI (pronounced "kai"), for computer-human interaction, has been used to refer to this field, perhaps more frequently in the past than now. However, researchers and practitioners now refer to their field of study as HCI (pronounced as an initialism), which perhaps rose in popularity partly because of the notion that the human, and the human's needs and time, should be considered first, and are more important than the machine's. This notion became increasingly relevant towards the end of the 20th century as computers became increasingly inexpensive (as did CPU time), small, and powerful. Since the turn of the millennium, the field of human-centered computing has emerged with an even more pronounced focus on understanding human beings as actors within socio-technical systems.
• Usability vs Usefulness. Design methodologies in HCI aim to create user interfaces that are usable, i.e. that can be operated with ease and efficiency. However, an even more basic requirement is that the user interface be useful, i.e. that it allows the user to complete relevant tasks.
• Intuitive and Natural. Software products are often touted by marketers as being "intuitive" and "natural" to use, often simply because they have a graphical user interface. Many researchers in HCI view such claims as unfounded (e.g. a poorly designed GUI may be very unusable), and some object to the use of the words intuitive and natural as vague and/or misleading, since these are very context-dependent terms.

16-8. Human-computer interface


The human-computer interface can be described as the point of communication between the human user and the computer. The flow of information between the human and computer is defined as the loop of interaction. The loop of interaction has several aspects to it, including:
• Task Environment: The conditions and goals set upon the user.
• Machine Environment: The environment that the computer is connected to, e.g. a laptop in a college student's dorm room.
• Areas of the Interface: Non-overlapping areas involve processes of the human and computer not pertaining to their interaction, while the overlapping areas concern only the processes pertaining to their interaction.


• Input Flow: Begins in the task environment as the user has some task that requires using their computer.
• Output: The flow of information that originates in the machine environment.
• Feedback: Loops through the interface that evaluate, moderate, and confirm processes as they pass from the human through the interface to the computer and back.
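To make the loop of interaction more concrete, the following minimal sketch models one pass around the loop: input flows from the task environment to the machine, output flows back, and a feedback step checks the result against the user's goal. All names and behaviour here are hypothetical, for illustration only; they are not taken from the text above.

```python
# A minimal, illustrative model of the loop of interaction.
# All names and behaviour are hypothetical.

def machine(user_input, machine_state):
    """Machine environment: process the input and produce output."""
    machine_state.append(user_input)
    return f"acknowledged: {user_input}", machine_state

def interaction_loop(task_goal):
    machine_state = []
    satisfied = False
    while not satisfied:
        # Input flow: the task environment produces input for the machine.
        user_input = f"do {task_goal}"
        # Output flow: information originating in the machine environment.
        output, machine_state = machine(user_input, machine_state)
        print(output)
        # Feedback: the user evaluates the output against the task goal.
        satisfied = task_goal in output
    return machine_state

interaction_loop("open the report")
```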


17. Machine translation

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that machine translation of government and legal documents more readily produces usable output than does translation of conversation or less standardised text. Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.

17-1. History
The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. In the 1950s, after World War II, the Georgetown experiment (1954) involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem. Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. The Georgetown experiment was by no means the first such application; a demonstration was made in 1954 on the APEXC machine at Birkbeck College (London University) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Recently, the Internet has emerged as a global information infrastructure, revolutionizing access to information as well as fast information transfer and exchange. Using Internet and e-mail technology, people need to communicate rapidly over long distances, across continents. Not all of these Internet users, however, can use their own language for global communication with people who speak other languages. Therefore, using machine translation software, people may in the near future be able to communicate and contact one another around the world in their own mother tongue.

17-2. Translation process


The translation process may be stated as:
1. decoding the meaning of the source text; and
2. re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc. of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language. Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it had been written by a person.

This problem may be approached in a number of ways.

17-3. Approaches
Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable (orally speaking) words of the target language will replace the ones in the source language.

Figure 17-1: Pyramid showing comparative depths of intermediary representation, with interlingual machine translation at the peak, followed by transfer-based, then direct translation.

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first. Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by a native speaker of the other language. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar-based methods need a skilled linguist to carefully design the grammar that they use. To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.

17-3-1. Rule-based
The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.

17-3-2. Transfer-based machine translation


Interlingual
Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual, i.e. source-/target-language-independent representation. The target language is then generated out of the interlingua.
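As a rough illustration of this two-step structure, the sketch below maps source words to language-independent "interlingua" concepts and then generates target words from those concepts. The toy vocabulary and concept labels are invented for the example and are not from any real system.

```python
# Toy illustration of interlingual machine translation:
# analysis maps source words to language-independent concepts;
# generation maps concepts to target-language words.

SPANISH_TO_CONCEPT = {"perro": "DOG", "come": "EAT", "pan": "BREAD"}
CONCEPT_TO_ENGLISH = {"DOG": "dog", "EAT": "eats", "BREAD": "bread"}

def analyse(source_sentence):
    """Source text -> interlingual representation (a list of concepts)."""
    return [SPANISH_TO_CONCEPT[w] for w in source_sentence.split()]

def generate(concepts):
    """Interlingual representation -> target text."""
    return " ".join(CONCEPT_TO_ENGLISH[c] for c in concepts)

print(generate(analyse("perro come pan")))  # -> "dog eats bread"
```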

Dictionary-based
Machine translation can use a method based on dictionary entries, which means that the words will be translated as they are by a dictionary.
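A minimal sketch of this idea, assuming a toy bilingual dictionary (the word pairs are illustrative, not drawn from any real system), is shown below; the result also exposes the obvious weakness of word-for-word substitution.

```python
# Toy dictionary-based translation: each word is replaced by its
# dictionary entry, with no reordering or disambiguation.

FRENCH_TO_ENGLISH = {
    "le": "the",
    "chat": "cat",
    "noir": "black",
    "dort": "sleeps",
}

def translate_word_for_word(sentence, dictionary):
    # Unknown words are passed through unchanged, a common fallback.
    return " ".join(dictionary.get(word, word) for word in sentence.lower().split())

print(translate_word_for_word("Le chat noir dort", FRENCH_TO_ENGLISH))
# -> "the cat black sleeps" (note the word-order problem left unfixed)
```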

17-3-3. Statistical
Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Recently, Google improved its translation capabilities by inputting approximately 200 billion words from United Nations materials to train its system, and the accuracy of the translation has improved.
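To give a flavour of how such bilingual corpora are used, the sketch below runs a few iterations of an IBM Model 1 style expectation-maximisation on a tiny invented parallel corpus to estimate word translation probabilities. It is a drastic simplification for illustration only, not the CANDIDE or Google system.

```python
# Toy IBM Model 1-style EM: estimate P(foreign word | English word)
# from a tiny sentence-aligned corpus (illustrative data only).
from collections import defaultdict

corpus = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
]

english_vocab = {e for en, _ in corpus for e in en.split()}
foreign_vocab = {f for _, fr in corpus for f in fr.split()}

# Initialise translation probabilities uniformly.
t = {(f, e): 1.0 / len(foreign_vocab) for f in foreign_vocab for e in english_vocab}

for _ in range(10):                      # a few EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # expected counts per English word
    for en, fr in corpus:
        for f in fr.split():
            norm = sum(t[(f, e)] for e in en.split())
            for e in en.split():
                delta = t[(f, e)] / norm
                count[(f, e)] += delta
                total[e] += delta
    # Maximisation step: renormalise the expected counts.
    t = {(f, e): count[(f, e)] / total[e] for (f, e) in count}

# The estimated probability of "maison" given "house" grows across iterations.
print(round(t[("maison", "house")], 3))
```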

17-3-4. Example-based

The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual corpus as its main knowledge base at run time. It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
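A very small sketch of the analogy idea is given below, assuming a toy example base and using word overlap as the similarity measure; a real EBMT system would use much richer matching and would recombine fragments of the retrieved examples.

```python
# Toy example-based MT: at run time, find the most similar stored
# source sentence and reuse its translation (translation by analogy).

example_base = [
    ("how much is that red umbrella", "ano akai kasa wa ikura desu ka"),
    ("how much is that small camera", "ano chiisai kamera wa ikura desu ka"),
]

def similarity(a, b):
    """Crude similarity: fraction of shared words (Jaccard overlap)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def translate_by_analogy(sentence):
    best_source, best_target = max(example_base,
                                   key=lambda pair: similarity(sentence, pair[0]))
    # A full EBMT system would now adapt best_target to account for the
    # words that differ; here we simply return the closest stored example.
    return best_target

print(translate_by_analogy("how much is that red camera"))
```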

17-4. Major issues


17-4-1. Disambiguation
Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches. Shallow approaches assume no knowledge of the text; they simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.

The late Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

"Why does a translator need a whole workday to translate five pages, and not an hour or two? [...] About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are the ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia."

The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own, but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.
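As a concrete illustration of a "shallow" approach, the sketch below chooses between two candidate senses of an ambiguous word by counting which sense's typical context words appear nearby. The sense labels, cue-word lists, and example sentence are invented for illustration and are not taken from the text.

```python
# Toy shallow word-sense disambiguation: pick the sense whose typical
# context words overlap most with the words around the ambiguous word.

SENSES = {
    "bank_financial": {"money", "loan", "account", "deposit"},
    "bank_river": {"river", "water", "fishing", "shore"},
}

def disambiguate(sentence, senses):
    context = set(sentence.lower().split())
    # Score each sense by how many of its cue words occur in the context.
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    return max(scores, key=scores.get)

print(disambiguate("He sat on the bank of the river fishing", SENSES))
# -> "bank_river"
```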

17-5. Applications
There are now many software programs for translating natural language, several of them online, such as:
• SYSTRAN, which powers both Google Translate and AltaVista's Babel Fish
• Promt, which powers online translation services at Voila.fr and Orange.fr

Although no system provides the holy grail of "fully automatic high quality machine translation" (FAHQMT), many systems produce reasonable output. Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. Toggletext uses a transfer-based system (known as Kataku) to translate between English and Indonesian. Google has claimed that promising results were obtained using a proprietary statistical machine translation engine. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of the runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute of Standards and Technology. Uwe Muegge has implemented a demo website that uses a controlled language in combination with the Google tool to produce fully automatic, high-quality machine translations of his English, German, and French web sites.

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel (a venture capital fund, largely funded by the US Intelligence Community to stimulate new technologies through private sector entrepreneurs) brought up companies like Language Weaver. Currently the military community is interested in translation and processing of languages like Arabic, Pashto, and Dari. The Information Processing Technology Office at DARPA hosts programs like TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.

17-6. Evaluation
There are various means for evaluating the performance of machine-translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems, such as rule-based and statistical systems. Automated means of evaluation include BLEU, NIST and METEOR.

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded, and that it takes a human to adequately comprehend the context of the original text. Even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human. It has, however, been asserted that in certain applications, e.g. product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention.
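To show the kind of computation behind automated metrics such as BLEU mentioned above, here is a much-simplified sketch that scores a candidate translation by modified unigram and bigram precision against a single reference. Real BLEU uses up to 4-grams, multiple references, and a brevity penalty; the sentences below are invented for the example.

```python
# Simplified BLEU-style score: geometric mean of clipped n-gram
# precisions (n = 1, 2) against one reference translation.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(simple_bleu("the cat is on the mat", "the cat sat on the mat"), 3))
```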


18. Speech recognition

Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to machine-readable input (for example, to keypresses, using the binary code for a string of character codes). The term "voice recognition" may also be used to refer to speech recognition, but can more precisely refer to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said.

18-1. History
One of the most notable domains for the commercial application of speech recognition in the United States has been health care, and in particular the work of the medical transcriptionist (MT). According to industry experts, at its inception speech recognition (SR) was sold as a way to completely eliminate transcription rather than to make the transcription process more efficient, and hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many, if not all, were reluctant to do. The biggest limitation to speech recognition automating transcription, however, is seen as the software: the nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software.

A distinction in ASR is often made between "artificial syntax systems", which are usually domain-specific, and "natural language processing", which is usually language-specific. Each of these types of application presents its own particular goals and challenges.

18-2. Applications
Here we list the most important applications of speech recognition systems.

18-2-1. Health care


In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced.

Speech recognition can be implemented in the front end or the back end of the medical documentation process. Front-end SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document; it never goes through an MT/editor. Back-end SR, or deferred SR, is where the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine, and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is widely used in the industry at present.

Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in conjunction with a speech-recognition engine. Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.

18-2-2. Military
High-performance fighter aircraft
Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the program in France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system. Some important conclusions from the work were as follows:


1. Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.
3. More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained.

Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft. Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found that recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases, and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary, and above all a proper syntax, could thus be expected to improve recognition accuracy substantially.

The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot workload, and even allows the pilot to assign targets to himself with two simple voice commands, or to any of his wingmen with only five commands.

Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade on applications of speech recognition systems in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system.

As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done, both in speech recognition and in overall speech recognition technology, in order to consistently achieve performance improvements in operational settings.

Battle management
Battle management command centres generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Users were very optimistic about the potential of the system, although capabilities were limited.

Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of a natural speech interface. Speech recognition efforts have focused on a database of continuous speech recognition (CSR), large-vocabulary speech, which is designed to be representative of the naval resource management task. Significant advances in the state of the art in CSR have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system.

Training air traffic controllers


Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.

The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications. The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using specific vocabulary designed for the ATC task. Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in the application of task-domain grammar constraints.

The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc. and Adacel Systems Inc (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo-pilots. Another approach to ATC simulation with speech recognition has been created by Supremis. The Supremis system is not constrained by rigid grammars imposed by the underlying limitations of other recognition strategies.

18-2-3. Telephony and other domains


ASR in the field of telephony is now commonplace and in the field of computer gaming and simulation is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.


Improvements in mobile processor speeds have made it possible to create speech-enabled Symbian and Windows Mobile smartphones. Current speech-to-text programs, however, are too large and require too much CPU power to be practical for the Pocket PC. Speech is used mostly as a part of the user interface, for creating pre-defined or custom speech commands. Leading software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), and Speereo Software (Speereo Voice Translator).

People with disabilities are another part of the population that benefits from using speech recognition programs. It is especially useful for people who have difficulty with or are unable to use their hands, from mild repetitive stress injuries to involved disabilities that require alternative input for support with accessing the computer. In fact, people who used the keyboard a lot and developed RSI became an urgent early market for speech recognition. Speech recognition is also used in deaf telephony, such as SpinVox voice-to-text voicemail, relay services, and captioned telephone.

18-2-4. Further applications


• Automatic translation
• Automotive speech recognition (e.g., Ford Sync)
• Telematics (e.g. vehicle navigation systems)
• Court reporting (real-time voice writing)
• Hands-free computing: voice command recognition computer user interface
• Home automation
• Interactive voice response
• Mobile telephony, including mobile email
• Multimodal interaction
• Pronunciation evaluation in computer-aided language learning applications
• Robotics
• Transcription (digital speech-to-text)
• Speech-to-text (transcription of speech into mobile text messages)
• Air traffic control speech recognition

18-3. Speech recognition systems


18-3-1. Hidden Markov model based speech recognition


Modern general-purpose speech recognition systems are generally based on hidden Markov models (HMMs). These are statistical models which output a sequence of symbols or quantities. One possible reason why HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal: over a short time in the range of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be thought of as a Markov model for many stochastic processes. Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use.

In speech recognition, the hidden Markov model outputs a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), one every 10 milliseconds. The vectors consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech, decorrelating the spectrum using a cosine transform, and then taking the first (most significant) coefficients. The hidden Markov model tends to have in each state a statistical distribution that is a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, has a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.

Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over this basic approach. A typical large-vocabulary system would need context dependency for the phonemes (so that phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics, and in addition the system might use heteroscedastic linear discriminant analysis (HLDA); or it might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection, followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques, which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).

Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model, which includes both the acoustic and language model information, or combining it statically beforehand (the finite state transducer, or FST, approach).
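The Viterbi step mentioned above can be illustrated on a tiny hand-made HMM. The sketch below finds the most likely hidden state sequence for a short observation sequence; the states, probabilities, and observation symbols are invented for the example, and a real recognizer scores continuous acoustic feature vectors with Gaussian mixtures rather than looking up a discrete emission table.

```python
# Toy Viterbi decoding over a discrete HMM (illustrative numbers only).

states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3},
           "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.5, "b": 0.5},
          "s2": {"a": 0.1, "b": 0.9}}

def viterbi(observations):
    # V[t][s] = probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                             for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state to recover the state sequence.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["a", "b", "b"]))  # -> ['s1', 's2', 's2']
```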

18-3-2. Dynamic time warping based speech recognition


Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
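The core dynamic programming recurrence is small enough to show directly. The sketch below computes a DTW distance between two numeric sequences; the data are arbitrary illustrative values, with the second sequence a slowed-down version of the first.

```python
# Minimal dynamic time warping: cost of the best non-linear alignment
# between two sequences, allowing stretching and compression in time.

def dtw_distance(a, b):
    inf = float("inf")
    # dtw[i][j] = cost of aligning a[:i] with b[:j].
    dtw = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    dtw[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # insertion
                                   dtw[i][j - 1],      # deletion
                                   dtw[i - 1][j - 1])  # match
    return dtw[len(a)][len(b)]

# The warped alignment absorbs the difference in speed, so the cost is 0.
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4]))
```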

18-4. Performance of speech recognition systems

The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real-time factor. Other measures of accuracy include single word error rate (SWER) and command success rate (CSR).

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation". Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called "enrollment") and may successfully capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. "Optimal conditions" usually assume that users:
• have speech characteristics which match the training data,
• can achieve proper speaker adaptation, and
• work in a clean noise environment (e.g. a quiet office or laboratory space).

This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies. Limited-vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers; such systems are popular for routing incoming phone calls to their destinations in large organizations. Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling has many other applications, such as smart keyboards and document classification.
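Returning to the word error rate mentioned at the start of this section: WER is the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. A small sketch with invented example sentences:

```python
# Word error rate = (substitutions + deletions + insertions) / reference length,
# computed with a word-level Levenshtein edit distance.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + substitution)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("switch the radio to channel two",
                      "switch radio to channel too"))   # 2 errors / 6 words
```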

