
Advanced Computer Architecture

What is Parallel Processing?


Parallel processing is another method used to improve performance in a computer system. When a system processes two different instructions simultaneously, it is performing parallel processing.
Nowadays, commercial applications are the most common workloads run on parallel computers. A computer that runs such an application has to be able to process large amounts of data in sophisticated ways. We can say with little doubt that commercial applications will define the architecture of future parallel computers, while scientific applications will remain important users of parallel computing technology. Trends in commercial and scientific applications are merging, as commercial applications perform more sophisticated computations and scientific applications become more data intensive. Today, many parallel programming languages and compilers, based on dependencies detected in source code, are able to automatically split a program into multiple processes and/or threads to be executed concurrently on the available processors of a parallel system.
Parallel computing is an efficient form of information processing which emphasizes the exploitation of concurrent events in the computing process. Concurrency implies parallelism, simultaneity and pipelining. Parallel events may occur in multiple resources during the same time interval; simultaneous events may occur at the same time instant; and pipelined events may occur in overlapped time spans. Parallel processing demands concurrent execution of many programs in the computer. It is a cost-effective means to improve system performance through concurrent activities in the computer. The highest level of parallel processing is conducted among multiple jobs or programs through multiprogramming, time-sharing, and multiprocessing. This presentation covers the basics of parallel computing. Beginning with a brief overview and some concepts and terminology associated with parallel computing, the topics of parallel memory architectures, parallel computer architectures and parallel programming models are then explored.

Introduction:-

Parallel computing is an efficient form of information processing which emphasizes the exploitation of concurrent events in the computing process. Concurrency implies parallelism, simultaneity and pipelining. Parallel events may occur in multiple resources during the same time interval; simultaneous events may occur at the same time instant; and pipelined events may occur in overlapped time spans. Parallel processing demands concurrent execution of many programs in the computer. The highest level of parallel processing is conducted among multiple jobs or programs through multiprogramming, time-sharing, and multiprocessing.

What is Parallel Computing?


Traditionally, software has been written for serial computation, to be executed by a single computer having a single Central Processing Unit (CPU). Problems are solved by a series of instructions, executed one after the other by the CPU; only one instruction may be executed at any moment in time.
Parallel computing, in contrast, is the simultaneous use of multiple compute resources to solve a computational problem. The compute resources can include a single computer with multiple processors, an arbitrary number of computers connected by a network, or a combination of both.
The computational problem usually demonstrates characteristics such as the ability to be:
1) Broken apart into discrete pieces of work that can be solved simultaneously (as sketched in the example below).
2) Executed as multiple program instructions at any moment in time.
3) Solved in less time with multiple compute resources than with a single compute resource.
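The short sketch below is an invented illustration of these characteristics, not part of the original text; it assumes a C compiler with OpenMP support (for example gcc -fopenmp). A vector sum is broken into discrete pieces that several threads solve simultaneously, so the whole problem is solved in less time than a single compute resource would need for a large enough array.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)      /* prepare the data serially */
            a[i] = 1.0;

        /* The loop is broken into discrete pieces of work; each thread adds
           up its own chunk simultaneously and the partial sums are combined. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }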

Why Use Parallel Computing?


There are two primary reasons for using parallel computing:
a) Save time - wall clock time
b) Solve larger problems

Other reasons might include:


A) Taking advantage of non-local resources - using available compute resources on a
wide area network, or even the Internet when local compute resources are scarce.
B) Cost savings - using multiple "cheap" computing resources instead of paying for
time on a supercomputer.
C) Overcoming memory constraints - single computers have very finite memory
resources. For large problems, using the memories of multiple computers may
overcome this obstacle.
D) Transmission speeds - the speed of a serial computer is directly dependent upon
how fast data can move through hardware. Absolute limits are the speed of light (30
cm/nanosecond) and the transmission limit of copper wire (9 cm/nanosecond).
Increasing speeds necessitate increasing proximity of processing elements.

1) Concepts of Parallel Computing

Parallelism in Uniprocessor Systems:

Parallelism techniques can be introduced even in uniprocessor systems, which have a single processor. These techniques are:

A) Multiplicity of Functional Units:

The functions of the ALU can be distributed to multiple, specialized functional units which can operate in parallel. For example, the CDC 6600 uniprocessor has 10 functional units built into its CPU. These 10 units are independent of each other and may operate simultaneously.

B) Parallelism and Pipelining within the CPU:

Parallel adders using such techniques as carry-lookahead and carry-save are now built into almost all ALUs. High-speed multiplier recoding and convergence division are techniques for exploiting parallelism.

The various phases of instruction execution are now pipelined, including instruction fetch, decode, operand fetch, arithmetic/logic execution and result store. To facilitate overlapped instruction execution through the pipe, instruction prefetch and data buffering techniques have been developed.
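As a rough software sketch of the carry-lookahead idea mentioned above (the 4-bit width and all names here are illustrative assumptions, not a description of any particular ALU), the generate and propagate signals of every bit position are formed first, and every carry is then expressed directly in terms of them instead of rippling through the lower stages:

    #include <stdio.h>

    /* 4-bit carry-lookahead addition: generate (g) and propagate (p) signals
       are computed for each bit, and every carry is derived from g, p and the
       incoming carry c0 without waiting for a ripple through earlier stages. */
    unsigned cla_add4(unsigned a, unsigned b, unsigned c0)
    {
        unsigned g[4], p[4], c[5], sum = 0;

        for (int i = 0; i < 4; i++) {
            g[i] = (a >> i) & (b >> i) & 1u;    /* generate:  gi = ai AND bi */
            p[i] = ((a >> i) ^ (b >> i)) & 1u;  /* propagate: pi = ai XOR bi */
        }

        c[0] = c0 & 1u;
        c[1] = g[0] | (p[0] & c[0]);
        c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
        c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                    | (p[2] & p[1] & p[0] & c[0]);
        c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                    | (p[3] & p[2] & p[1] & g[0])
                    | (p[3] & p[2] & p[1] & p[0] & c[0]);

        for (int i = 0; i < 4; i++)
            sum |= (p[i] ^ c[i]) << i;          /* sum bit: si = pi XOR ci */
        return sum | (c[4] << 4);               /* bit 4 is the carry-out  */
    }

    int main(void)
    {
        printf("9 + 7 = %u\n", cla_add4(9u, 7u, 0u));   /* prints 16 */
        return 0;
    }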

C) Overlapped CPU and I/O Operations:


I/O operations can be performed simultaneously with CPU computations by using
separate I/O controllers, channels and I/O processors. The DMA channel can be used
to provide direct information transfer between I/O devices and main memory.

2. Architectural Classification Schemes

4.2.1 Flynn's Classification
The most popular taxonomy of computer architecture was defined by Flynn in 1966.
Flynn's classification scheme is based on the notion of a stream of information. Two types of
information flow into a processor: instructions and data. The instruction stream is defined as the
sequence of instructions performed by the processing unit. The data stream is defined as the data
traffic exchanged between the memory and the processing unit.
According to Flynn's classification, either of the instruction or data streams can be single or
multiple.
Computer architecture can be classified into the following four distinct categories:
single-instruction single-data streams (SISD);
single-instruction multiple-data streams (SIMD);
multiple-instruction single-data streams (MISD); and
multiple-instruction multiple-data streams (MIMD).
Conventional single-processor von Neumann computers are classified as SISD systems. Parallel
computers are either SIMD or MIMD. When there is only one control unit and all processors
execute the same instruction in a synchronized fashion, the parallel machine is classified as
SIMD. In a MIMD machine, each processor has its own control unit and can execute different
instructions on different data. In the MISD category, the same stream of data flows through a
linear array of processors executing different instruction streams. In practice, there is no viable
MISD machine; however, some authors have considered pipelined machines (and perhaps
systolic-array computers) as examples for MISD. An extension of Flynn's taxonomy was
introduced by D. J. Kuck in 1978. In his classification, Kuck extended the instruction stream
further to single (scalar and array) and multiple (scalar and array) streams. The data stream in
Kuck's classification is called the execution stream and is also extended to include single (scalar
and array) and multiple (scalar and array) streams. The combination of these streams results in a
total of 16 categories of architectures.
4.2.1.1 SISD Architecture
A serial (non-parallel) computer
Single instruction: only one instruction stream is being acted on by the CPU during any
one clock cycle
Single data: only one data stream is being used as input during any one clock cycle
Deterministic execution

This is the oldest and, until recently, the most prevalent form of computer
Examples: most PCs, single CPU workstations and mainframes
Figure 4.1 SISD COMPUTER
4.2.1.2 SIMD Architecture
A type of parallel computer
Single instruction: All processing units execute the same instruction at any given clock
cycle
Multiple data: Each processing unit can operate on a different data element
This type of machine typically has an instruction dispatcher, a very high-bandwidth
internal network, and a very large array of very small-capacity instruction units.
Best suited for specialized problems characterized by a high degree of regularity, such as
image processing.
Synchronous (lockstep) and deterministic execution
Two varieties: Processor Arrays and Vector Pipelines
Examples:
o Processor Arrays: Connection Machine CM-2, Maspar MP-1, MP-2
o Vector Pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820
Figure 4.2 SIMD COMPUTER
CU-control unit
PU-processor unit
MM-memory module
SM-Shared memory
IS-instruction stream
DS-data stream
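A small software sketch of the SIMD idea follows; it assumes an x86 machine with SSE support and a compiler such as GCC or Clang that provides <xmmintrin.h>, and the array contents are arbitrary. A single packed-add instruction operates on four float data elements at once, which is the lockstep behaviour described above.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics: __m128 holds four packed floats */

    int main(void)
    {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);     /* load four data elements at once   */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);  /* one instruction, four additions   */
        _mm_storeu_ps(c, vc);

        printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }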
4.2.1.3 MISD Architecture
There are n processor units, each receiving distinct instructions but operating over the same data stream and its derivatives. The output of one processor becomes the input of the next in the macro-pipe. No real embodiment of this class exists.
A single data stream is fed into multiple processing units.
Each processing unit operates on the data independently via independent instruction
streams.
Few actual examples of this class of parallel computer have ever existed. One is the
experimental Carnegie-Mellon C.mmp computer (1971).
Some conceivable uses might be:
o multiple frequency filters operating on a single signal stream
o multiple cryptography algorithms attempting to crack a single coded message.
Figure 4.3 MISD COMPUTER
4.2.1.4 MIMD Architecture
Multiple-instruction multiple-data streams (MIMD) parallel architectures are made of multiple
processors and multiple memory modules connected together via some interconnection network.
They fall into two broad categories: shared memory or message passing. Processors exchange
information through their central shared memory in shared memory systems, and exchange
information through their interconnection network in message passing systems.

Currently, the most common type of parallel computer. Most modern computers fall into
this category.
Multiple Instruction: every processor may be executing a different instruction stream
Multiple Data: every processor may be working with a different data stream
Execution can be synchronous or asynchronous, deterministic or non-deterministic
Examples: most current supercomputers, networked parallel computer "grids" and multiprocessor SMP computers, including some types of PCs.
A shared memory system typically accomplishes interprocessor coordination through a global
memory shared by all processors. These are typically server systems that communicate through a
bus and cache memory controller.
A message passing system (also referred to as distributed memory) typically combines the local
memory and processor at each node of the interconnection network. There is no global memory,
so it is necessary to move data from one local memory to another by means of message passing.
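As a hedged illustration of the message-passing style (assuming an MPI installation, compilation with mpicc and a run with two processes; the message tag and the value sent are arbitrary), one process sends data from its own local memory to another process, since there is no global memory to share:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

        if (rank == 0) {
            value = 42;                         /* data in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d by message passing\n", value);
        }

        MPI_Finalize();
        return 0;
    }

A launcher such as mpirun -np 2 ./a.out (the exact command depends on the MPI installation) starts the two cooperating processes.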
Figure 4.4 MIMD COMPUTER
Computer Class                               Computer System Models
1. SISD                                      IBM 701, IBM 1620, IBM 7090, PDP VAX 11/780
2. SISD (with multiple functional units)     IBM 360/91 (3); IBM 370/168 UP
3. SIMD (word-slice processing)              Illiac IV; PEPE
4. SIMD (bit-slice processing)               STARAN; MPP; DAP
5. MIMD (loosely coupled)                    IBM 370/168 MP; Univac 1100/80
6. MIMD (tightly coupled)                    Burroughs D-825

Table 4.1 Flynn's Computer System Classification
4.2.2 Feng's Classification
Tse-yun Feng suggested the use of the degree of parallelism to classify various computer architectures.
Serial Versus Parallel Processing
The maximum number of binary digits that can be processed within a unit time by a computer system is called the maximum parallelism degree P.
A bit slice is a string of bits, one taken from each word at the same vertical position.
There are four types of processing methods under this classification:
Word Serial and Bit Serial (WSBS)
Word Parallel and Bit Serial (WPBS)
Word Serial and Bit Parallel(WSBP)

Word Parallel and Bit Parallel (WPBP)


WSBS has been called bit-serial processing because one bit is processed at a time.
WPBS has been called bit-slice processing because an m-bit slice is processed at a time.
WSBP is found in most existing computers and has been called word-slice processing because one word of n bits is processed at a time.
WPBP is known as fully parallel processing, in which an array of n x m bits is processed at one time.
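From these definitions the maximum parallelism degree works out to P = n x m, the word length multiplied by the number of bit slices processed in parallel. The small C sketch below, written purely for illustration, evaluates P for a few of the machines listed in Table 4.2; the program itself is of course not part of the lesson.

    #include <stdio.h>

    /* Feng's maximum degree of parallelism: P = n * m, where n is the word
       length and m is the number of bit slices processed in parallel.      */
    int main(void)
    {
        struct { const char *name; int n, m; } c[] = {
            { "STARAN (WPBS)",     1, 256 },
            { "CDC 6600 (WSBP)",  60,   1 },
            { "Illiac IV (WPBP)", 64,  64 },
        };

        for (int i = 0; i < 3; i++)
            printf("%-18s P = %d x %d = %d bits per unit time\n",
                   c[i].name, c[i].n, c[i].m, c[i].n * c[i].m);
        return 0;
    }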
Mode                           Computer Model     Degree of parallelism (n, m)
WSBS (n = 1, m = 1)            The MINIMA         (1, 1)
WPBS (n = 1, m > 1)            STARAN             (1, 256)
                               MPP                (1, 16384)
                               DAP                (1, 4096)
WSBP (n > 1, m = 1)            IBM 370/168 UP     (64, 1)
  (word-slice processing)      CDC 6600           (60, 1)
                               Burroughs 7700     (48, 1)
                               VAX 11/780         (16/32, 1)
WPBP (n > 1, m > 1)            Illiac IV          (64, 64)
  (fully parallel processing)

Table 4.2 Feng's Computer Classification
4.2.3 Handler's Classification
Wolfgang Handler has proposed a classification scheme for identifying the parallelism degree and pipelining degree built into the hardware structure of a computer system. He considers three subsystem levels:
Processor Control Unit (PCU)
Arithmetic Logic Unit (ALU)
Bit Level Circuit (BLC)
Each PCU corresponds to one processor or one CPU. The ALU is equivalent to a processing element (PE). The BLC corresponds to the combinational logic circuitry needed to perform 1-bit operations in the ALU.
A computer system C can be characterized by a triple containing six independent entities:
T(C) = <K x K', D x D', W x W'>
where K = the number of processors (PCUs) within the computer
D = the number of ALUs under the control of one CPU
W = the word length of an ALU or of a PE
W' = the number of pipeline stages in all ALUs or in a PE
D' = the number of ALUs that can be pipelined
K' = the number of PCUs that can be pipelined
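To make the notation concrete, the short sketch below prints the triple for a purely hypothetical machine; all of the numbers (2 processors, 4 ALUs per processor, 16-bit words, 4 pipeline stages, no PCU- or ALU-level pipelining) are invented for illustration and are not taken from the lesson.

    #include <stdio.h>

    /* Handler's triple T(C) = <K x K', D x D', W x W'> evaluated for a
       hypothetical machine; the values below are invented for illustration. */
    int main(void)
    {
        int K = 2,  Kp = 1;   /* processors (PCUs), pipelined PCUs     */
        int D = 4,  Dp = 1;   /* ALUs per PCU, pipelined ALUs          */
        int W = 16, Wp = 4;   /* word length, pipeline stages per ALU  */

        printf("T(C) = <%d x %d, %d x %d, %d x %d>\n", K, Kp, D, Dp, W, Wp);
        return 0;
    }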
4.3 Let us Sum Up
The architectural classification schemes have been presented in this lesson under three different classifications: Flynn's, Feng's and Handler's. The instruction format representation has also been given for Flynn's scheme, and examples of all classifications have been discussed.
4.4 Lesson-end Activities
1. With examples, explain Flynn's computer system classification.
2. Discuss how parallelism can be achieved using Feng's and Handler's classifications.
4.5 Points for Discussions
Single Instruction, Single Data stream (SISD)
A sequential computer which exploits no parallelism in either the instruction or data streams.
Examples of SISD architecture are the traditional uniprocessor machines like a PC or old
mainframes.
Single Instruction, Multiple Data streams (SIMD)
A computer which exploits multiple data streams against a single instruction stream to perform
operations which may be naturally parallelized. For example, an array processor or GPU.
Multiple Instruction, Single Data stream (MISD)
Unusual due to the fact that multiple instruction streams generally require multiple data streams to be effective.
Multiple Instruction, Multiple Data streams (MIMD)
Multiple autonomous processors simultaneously executing different instructions on different
data. Distributed systems are generally recognized to be MIMD architectures; either exploiting a
single shared memory space or a distributed memory space.

3. Evolution of Parallel Processors

HISTORY OF PARALLEL COMPUTERS
Experiments with and implementations of parallelism started as far back as the 1950s at IBM. The IBM STRETCH computer, also known as the IBM 7030, was built in 1959. In the design of this computer, a number of new concepts such as overlapping I/O with processing and instruction lookahead were introduced. A serious approach towards designing parallel computers started with the development of ILLIAC IV in 1964 at the University of Illinois. It had a single control unit but multiple processing elements; on this machine, at one time, a single operation is executed on different data items by different processing elements. The concept of pipelining was introduced in the CDC 7600 in 1969, which used a pipelined arithmetic unit. In the years 1970 to 1985, research in this area was focused on the development of vector supercomputers. In 1976, the Cray-1 was developed by Seymour Cray. The Cray-1 was a pioneering effort in the development of vector registers. It accessed main memory only for load and store operations, did not use virtual memory, and had an optimized pipelined arithmetic unit. The Cray-1 had a clock period of 12.5 ns and attained a speed of 12.5 Mflops on 100 x 100 linear equation solutions. The next generation of Cray, called the Cray X-MP, was developed in the years 1982-84. It was coupled with 8-vector supercomputers and used a shared memory.
Apart from Cray, the giant company manufacturing parallel computers, Control Data Corporation (CDC) of the USA produced supercomputers such as the CDC 7600. Its vector supercomputer, the Cyber 205, had a memory-to-memory architecture; that is, input vector operands were streamed from the main memory to the vector arithmetic unit and the results were stored back in the main memory. The advantage of this architecture was that it did not limit the size of vector operands. The disadvantage was that it required a very high-speed memory so that there would be no speed mismatch between the vector arithmetic units and main memory, and manufacturing such high-speed memory is very costly. The clock period of the Cyber 205 was 20 ns.
In the 1980s Japan also started manufacturing high-performance vector supercomputers. Companies like NEC, Fujitsu and Hitachi were the main manufacturers. Hitachi developed the S-810/210 and S-810/10 vector supercomputers in 1982, NEC developed the SX-1 and Fujitsu developed the VP-200. All these machines used semiconductor technologies to achieve speeds on par with Cray and Cyber, but their operating systems and vectorizers were poorer than those of the American companies.
Parallel computing is the computer science discipline that deals with the system architecture and software issues related to the concurrent execution of applications. It has been an area of active research interest and application for decades, mainly the focus of high performance computing, but is now emerging as the prevalent computing paradigm due to the semiconductor industry's shift to multi-core processors.
The interest in parallel computing dates back to the late 1950s, with advancements surfacing in the form of supercomputers throughout the 60s and 70s. These were shared memory multiprocessors, with multiple processors working side-by-side on shared data. In the mid 1980s, a new kind of parallel computing was launched when the Caltech Concurrent Computation project built a supercomputer for scientific applications from 64 Intel 8086/8087 processors. This system showed that extreme performance could be achieved with mass market, off-the-shelf microprocessors. These massively parallel processors (MPPs) came to dominate the top end of computing, with the ASCI Red supercomputer in 1997 breaking the barrier of one trillion floating point operations per second. Since then, MPPs have continued to grow in size and power.
Starting in the late 80s, clusters came to compete with and eventually displace MPPs for many applications. A cluster is a type of parallel computer built from large numbers of off-the-shelf computers connected by an off-the-shelf network. Today, clusters are the workhorse of scientific computing and are the dominant architecture in the data centers that power the modern information age.
Today, parallel computing is becoming mainstream based on multi-core processors. Most desktop and laptop systems now ship with dual-core microprocessors, with quad-core processors readily available. Chip manufacturers have begun to increase overall processing performance by adding additional CPU cores. The reason is that increasing performance through parallel processing can be far more energy-efficient than increasing microprocessor clock frequencies. In a world which is increasingly mobile and energy conscious, this has become essential. Fortunately, the continued transistor scaling predicted by Moore's Law will allow for a transition from a few cores to many.
Parallel Software
The software world has been a very active part of the evolution of parallel computing. Parallel programs have always been harder to write than sequential ones: a program that is divided into multiple concurrent tasks is more difficult to write, due to the necessary synchronization and communication that needs to take place between those tasks. Some standards have emerged. For MPPs and clusters, a number of application programming interfaces converged to a single standard called MPI by the mid 1990s. For shared memory multiprocessor computing, a similar process unfolded with convergence around two standards by the mid to late 1990s: pthreads and OpenMP. In addition to these, a multitude of competing parallel programming models and languages have emerged over the years. Some of these models and languages may provide a better solution to the parallel programming problem than the above standards, all of which are modifications to conventional, non-parallel languages like C.
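As a brief, hedged sketch of the shared-memory thread style standardized by pthreads (the worker function, the thread count and the printed message are invented for this example; build with a flag such as -pthread), several threads of one program run concurrently and are then joined:

    #include <stdio.h>
    #include <pthread.h>

    #define NTHREADS 4

    /* Each thread runs this function concurrently on its own argument. */
    static void *worker(void *arg)
    {
        int id = *(int *)arg;
        printf("thread %d doing its share of the work\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        int id[NTHREADS];

        for (int i = 0; i < NTHREADS; i++) {
            id[i] = i;
            pthread_create(&t[i], NULL, worker, &id[i]);   /* spawn the threads */
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);                      /* wait for them all */

        return 0;
    }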
As multi-core processors bring parallel computing to mainstream customers, the key challenge in computing today is
to transition the software industry to parallel programming. The long history of parallel software has not revealed any
silver bullets, and indicates that there will not likely be any single technology that will make parallel software
ubiquitous. Doing so will require broad collaborations across industry and academia to create families of technologies that work together to bring the power of parallel
computing to future mainstream applications. The changes needed will affect the entire industry, from consumers to
hardware manufacturers and from the entire software development infrastructure to application developers who rely
upon it.
Future capabilities such as photorealistic graphics, computational perception, and machine learning rely heavily on highly parallel algorithms. Enabling these capabilities will advance a new generation of experiences that expand the scope and efficiency of what users can accomplish in their digital lifestyles and workplace. These experiences include more natural, immersive, and increasingly multisensory interactions that offer multi-dimensional richness and context awareness. The future for parallel computing is bright, but with new opportunities come new challenges.

1.2.3 Trends towards Parallel Processing


From an application point of view, the mainstream usage of computers is experiencing a trend of four ascending levels of sophistication:
Data processing
Information processing
Knowledge processing
Intelligence processing
Computer usage started with data processing, which is still a major task of today's computers. With more and more data structures developed, many users are gradually shifting their computer usage from pure data processing to information processing. A high degree of parallelism has been found at these levels. As accumulated knowledge bases have expanded rapidly in recent years, there has grown a strong demand to use computers for knowledge processing. Intelligence is very difficult to create; its processing is even more so.
Today's computers are very fast and obedient and have many reliable memory cells to be qualified for data-information-knowledge processing. However, computers are far from being satisfactory in performing theorem proving, logical inference and creative thinking.
From an operating point of view, computer systems have improved chronologically in four
phases:
batch processing
multiprogramming
time sharing
multiprocessing
Figure 1.1 The spaces of data, information, knowledge and intelligence from the viewpoint of computer processing (data processing at the base, then information, knowledge and intelligence processing above it; volumes of raw material to be processed increase toward the bottom, while complexity and sophistication in processing increase toward the top)

In these four operating modes, the degree of parallelism increases sharply from phase to phase. We define parallel processing as follows:
Parallel processing is an efficient form of information processing which emphasizes the exploitation of concurrent events in the computing process. Concurrency implies parallelism, simultaneity, and pipelining. Parallel processing demands concurrent execution of many programs in the computer. The highest level of parallel processing is conducted among multiple jobs or programs through multiprogramming, time sharing, and multiprocessing.
Parallel processing can be challenged at four programmatic levels:
Job or program level
Task or procedure level
Inter-instruction level
Intra-instruction level

The highest job level is often conducted algorithmically. The lowest intra-instruction level is
often implemented directly by hardware means. Hardware roles increase from high to low levels.
On the other hand, software implementations increase from low to high levels.
The trend is also supported by the increasing demand for a faster real-time, resource-sharing and fault-tolerant computing environment.

Figure 1.2 The system architecture of the supermini VAX 11/780 uniprocessor system [the figure shows a CPU (registers R0..., PC, ALU and local memory), a floating point accelerator, a console, diagnostic memory and a floppy disk, together with a main memory of 2^32 words of 32 bits each and an input/output subsystem (Unibus adapter, Massbus adapter and SBI I/O devices), all connected through the synchronous backplane interconnect (SBI)]
Achieving parallel processing requires a broad knowledge of and experience with all aspects of algorithms, languages, software, hardware, performance evaluation and computing alternatives. It also requires the development of more capable and cost-effective computer systems.

With respect to parallel processing, the general architecture trend has shifted from conventional uniprocessor systems to multiprocessor systems, or to an array of processing elements controlled by one uniprocessor. From the operating system point of view, computer systems have improved through batch processing, multiprogramming, time sharing and multiprocessing. Computers of the 1990s and beyond were expected to be the next generation, using very large scale integrated chips with high-density modular design; more than 1000 million floating point operations per second were expected of these future supercomputers. The evolution of computer systems helps in understanding the generations of computer systems.

4. Principles of Pipelining

The two major parametric considerations in designing a parallel computer architecture are executing multiple instructions in parallel and increasing the efficiency of processors. There are various methods by which instructions can be executed in parallel:
- Pipelining is one of the classical and effective methods to increase parallelism, where different stages perform repeated functions on different operands (see the sketch after this list).
- Vector processing is arithmetic or logical computation applied on vectors, whereas in scalar processing only one data item or a pair of data items is processed at a time.
- Superscalar processing improves the processor's speed by executing multiple instructions per cycle.
- Multithreading is used to increase processor utilization and is also employed in parallel computer architectures.
Parallel processing is the execution of concurrent events in the computing process to achieve faster computational speed.
Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
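The sketch below uses the standard linear-pipeline timing estimate, which is not stated explicitly in the outline above and should be read as a supplementary assumption: with k stages and n instructions, a pipelined unit needs roughly k + (n - 1) cycles instead of n * k, so the speedup approaches k for long instruction streams.

    #include <stdio.h>

    int main(void)
    {
        int k = 5;                                  /* pipeline stages      */
        long n[] = {1, 10, 100, 1000};              /* instructions issued  */

        for (int i = 0; i < 4; i++) {
            long serial    = n[i] * k;              /* non-pipelined cycles */
            long pipelined = k + (n[i] - 1);        /* pipelined cycles     */
            printf("n = %4ld: speedup = %.2f\n",
                   n[i], (double)serial / pipelined);
        }
        return 0;
    }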

5. Array Processing

Signal processing is a wide area of research that extends from the simplest form of 1-D signal processing to the complex forms of M-D and array signal processing. This section presents a short survey of the concepts, principles and applications of array processing. An array structure can be defined as a set of sensors that are spatially separated, e.g. antennas. The basic problems that we attempt to solve by using array processing techniques are to:
Determine the number and locations of energy-radiating sources (emitters).
Enhance the signal-to-noise ratio (SNR) or signal-to-interference-plus-noise ratio (SINR).
Track multiple moving sources.
The ultimate goal of sensor array signal processing is to
estimate the values of parameters by using available
temporal and spatial information, collected through
sampling a wavefield with a set of antennas that have a
precise geometry description. The processing of the
captured data and information is done under the
assumption that the wavefield is generated by a finite
number of signal sources (emitters), and contains
information about signal parameters characterizing and
describing the sources. There are many applications
related to the above problem formulation, where the
number of sources, their directions and locations should
be specified. To motivate the reader, some of the most
important applications related to array processing will be
discussed.
Radar and Sonar Systems:
The array processing concept was closely linked to radar and sonar systems, which represent the classical applications of array processing. The antenna array is used in these systems to determine the location(s) of source(s), cancel interference and suppress ground clutter. Radar systems are used basically to detect objects by using radio waves; the range, altitude, speed and direction of objects can be determined. Radar systems started as military equipment and then entered the civilian world. In radar applications different modes can be used; one of these modes is the active mode. In this mode the antenna-array-based system radiates pulses and listens for the returns. By using the returns, the estimation of parameters such as velocity, range and DOAs (directions of arrival) of targets of interest becomes possible. Using passive far-field listening arrays, only the DOAs can be estimated.
Sonar systems (Sound Navigation and Ranging) use sound waves that propagate under the water to detect objects on or under the water surface. Two types of sonar systems can be defined: the active one and the passive one. In active sonar, the system emits pulses of sound and listens to the returns, which are used to estimate parameters. In passive sonar, the system essentially listens for the sounds made by the target objects. It is important to note the difference between radar systems, which use radio waves, and sonar systems, which use sound waves; the reason the sonar uses sound waves is that sound waves travel farther in the water than do radio and light waves.
In passive sonar, the receiving array has the capability of detecting distant objects and their locations. Deformable arrays are usually used in sonar systems, where the antenna is typically drawn under the water. In active sonar, the sonar system emits sound waves (acoustic energy), then listens for and monitors any existing echo (the reflected waves). The reflected sound waves can be used to estimate parameters such as velocity, position and direction. Difficulties and limitations in sonar systems compared to radar systems arise from the fact that the propagation speed of sound waves under the water is slower than that of radio waves. Another source of limitation is the high propagation losses and scattering. Despite all these limitations and difficulties, sonar remains a reliable technique for range, distance, position and other parameter estimation for underwater applications.[3][5]
NORSAR is an independent geo-scientific research facility
that was founded in Norway in 1968. NORSAR has been
working with array processing ever since to measure
seismic activity around the globe.[6] They are currently
working on an International Monitoring System which will
comprise 50 primary and 120 auxiliary seismic stations
around the world. NORSAR has ongoing work to improve
array processing to improve monitoring of seismic activity
not only in Norway but around the globe.[7]
Communications (wireless)
Communication can be defined as the process of exchanging information between two or more parties. The last two decades have witnessed a rapid growth of wireless communication systems. This success is a result of advances in communication theory and low-power dissipation design processes. In general, communication (telecommunication) can be done by technological means through either electrical signals (wired communication) or electromagnetic waves (wireless communication).
Antenna arrays have emerged as a support technology to increase spectral usage efficiency and enhance the accuracy of wireless communication systems by utilizing the spatial dimension in addition to the classical time and frequency dimensions. Array processing and estimation techniques have been used in wireless communication, and during the last decade these techniques were re-explored as ideal candidates for solving numerous problems in wireless communication. In wireless communication, problems that affect the quality and performance of the system may come from different sources. The multiuser (medium multiple access) and multipath (signal propagation over multiple scattering paths in wireless channels) communication model is one of the most widespread communication models in wireless (mobile) communication.
In a multiuser communication environment, the existence of multiple users increases the possibility of inter-user interference, which can adversely affect the quality and performance of the system. In mobile communication systems the multipath problem is one of the basic problems that base stations have to deal with. Base stations have been using spatial diversity for combating fading due to severe multipath. Base stations use an antenna array of several elements to achieve higher selectivity: the receiving array can be directed toward one user at a time, while avoiding the interference from other users.
Medical applications
Array processing techniques have received much attention from medical and industrial applications. In medical applications, medical image processing was one of the basic fields that use array processing. Other medical applications that use array processing include disease treatment, tracking waveforms that carry information about the condition of internal organs (e.g. the heart), and localizing and analyzing brain activity by using bio-magnetic sensor arrays.[8]
Array Processing for Speech Enhancement
Speech enhancement and processing represents another field that has been affected by the new era of array processing. Most acoustic front-end systems have become fully automatic systems (e.g. telephones). However, the operational environment of these systems contains a mix of other acoustic sources; external noises as well as acoustic couplings of loudspeaker signals overwhelm and attenuate the desired speech signal. In addition to these external sources, the strength of the desired signal is reduced because of the relative distance between speaker and microphones. Array processing techniques have opened new opportunities in speech processing to attenuate noise and echo without degrading the quality of the speech signal or affecting it adversely. In general, array processing techniques can be used in speech processing to reduce the computing power (number of computations) and enhance the quality (performance) of the system. Representing the signal as a sum of sub-bands and adapting cancellation filters for the sub-band signals can reduce the demanded computation power and lead to a higher performance system. Relying on multiple input channels allows designing systems of higher quality compared to systems that use a single channel, and solves problems such as source localization, tracking and separation, which cannot be achieved with a single channel.
Array Processing in Astronomy Applications
The astronomical environment contains a mix of external signals and noises that affect the quality of the desired signals. Most of the array processing applications in astronomy are related to image processing; the array is used to achieve a quality that is not achievable by using a single channel. The high image quality facilitates quantitative analysis and comparison with images at other wavelengths. In general, astronomy arrays can be divided into two classes: the beamforming class and the correlation class. Beamforming is a signal processing technique that produces summed array beams from a direction of interest; it is used basically in directional signal transmission or reception, and the basic idea is to combine the elements in a phased array such that some signals experience destructive interference and others experience constructive interference. Correlation arrays provide images over the entire single-element primary beam pattern, computed off-line from records of all the possible pairwise correlations between the antennas.
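A minimal sketch of the delay-and-sum idea behind beamforming is given below; the sensor count, the integer sample delays and the synthetic test signal are all assumptions made for this illustration. Each channel is shifted by the delay assumed for the direction of interest and the shifted channels are averaged, so signals arriving from that direction add constructively while others tend to cancel.

    #include <stdio.h>

    #define M 4     /* number of sensors in the array       */
    #define N 8     /* number of samples per sensor channel */

    /* Delay-and-sum beamformer: average the channels after compensating the
       per-sensor arrival delay assumed for the look direction.             */
    void delay_and_sum(double x[M][N], const int delay[M], double y[N])
    {
        for (int n = 0; n < N; n++) {
            double acc = 0.0;
            for (int m = 0; m < M; m++) {
                int k = n + delay[m];      /* undo channel m's arrival delay */
                if (k < N)
                    acc += x[m][k];
            }
            y[n] = acc / M;
        }
    }

    int main(void)
    {
        /* Synthetic data: the same pulse reaches each sensor one sample later. */
        double x[M][N] = {{0}};
        int delay[M] = {0, 1, 2, 3};
        for (int m = 0; m < M; m++)
            x[m][2 + delay[m]] = 1.0;

        double y[N];
        delay_and_sum(x, delay, y);
        for (int n = 0; n < N; n++)
            printf("y[%d] = %.2f\n", n, y[n]);   /* the peak appears at n = 2 */
        return 0;
    }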
Other applications
In addition to these applications, many applications have been developed based on array processing techniques: acoustic beamforming for hearing aid applications, under-determined blind source separation using acoustic arrays, digital 3D/4D ultrasound imaging arrays, smart antennas, synthetic aperture radar, underwater acoustic imaging, chemical sensor arrays, etc.

Scalar Pipelines
Scalar to superscalar
The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is a mixture of the two: each instruction processes one data item, but there are multiple functional units within each CPU, so multiple instructions can be processing separate data items concurrently.
Superscalar CPU design emphasizes improving the instruction dispatcher accuracy,
and allowing it to keep the multiple functional units in use at all times. This has
become increasingly important as the number of units has increased. While early
superscalar CPUs would have two ALUs and a single FPU, a modern design such as
the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher
is ineffective at keeping all of these units fed with instructions, the performance of
the system will be no better than that of a simpler, cheaper design.
A superscalar processor usually sustains an execution rate in excess of one
instruction per machine cycle. But merely processing multiple instructions
concurrently does not make an architecture superscalar, since pipelined,
multiprocessor or multi-core architectures also achieve that, but with different
methods.
In a superscalar CPU the dispatcher reads instructions from memory and decides
which ones can be run in parallel, dispatching each to one of the several functional
units contained inside a single CPU. Therefore, a superscalar processor can be
envisioned having multiple parallel pipelines, each of which is processing
instructions simultaneously from a single instruction thread.
Limitations
Available performance improvement from superscalar techniques is limited by three
key areas:
The degree of intrinsic parallelism in the instruction stream (instructions requiring
the same computational resources from the CPU).
The complexity and time cost of dependency checking logic and register renaming
circuitry
The branch instruction processing.
Existing binary executable programs have varying degrees of intrinsic parallelism. In
some cases instructions are not dependent on each other and can be executed
simultaneously. In other cases they are inter-dependent: one instruction impacts
either resources or results of the other. The instructions a = b + c; d = e + f can be

run in parallel because none of the results depend on other calculations. However,
the instructions a = b + c; b = e + f might not be runnable in parallel, depending on
the order in which the instructions complete while they move through the units.
When the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. This cost includes the additional logic gates required to implement the checks, and time delays through those gates. Research shows that both the gate cost and the delay cost of this checking grow steeply with the number of simultaneously dispatched instructions and with the number of instructions in the processor's instruction set.
Even though the instruction stream may contain no inter-instruction dependencies,
a superscalar CPU must nonetheless check for that possibility, since there is no
assurance otherwise and failure to detect a dependency would produce incorrect
results.
No matter how advanced the semiconductor process or how fast the switching
speed, this places a practical limit on how many instructions can be simultaneously
dispatched. While process advances will allow ever greater numbers of functional
units (e.g., ALUs), the burden of checking instruction dependencies grows rapidly, as
does the complexity of register renaming circuitry to mitigate some dependencies.
Collectively the power consumption, complexity and gate delay costs limit the
achievable superscalar speedup to roughly eight simultaneously dispatched
instructions.
However even given infinitely fast dependency checking logic on an otherwise
conventional superscalar CPU, if the instruction stream itself has many
dependencies, this would also limit the possible speedup. Thus the degree of
intrinsic parallelism in the code stream forms a second limitation.
