
Distributed Systems,

cs5223
Lecture 01 (2004-01-06)
Seif Haridi
Department of Computer Science,
NUS

haridi@comp.nus.edu.sg

2002-08-15 S. Haridi, CS2104, Lecture 01 1


Overview
 Organization

 Course overview
 Getting started (introduction to distributed
systems and distributed algorithms)



Organization/Objectives



Objectives
 Understand some of the fundamental aspects of
distributed systems

 Overview of systems aspects (half of the course)

 Focus is on algorithmic aspects (half of the course)

 Learn how to read/present research papers



Non-objectives

 Learning in detail about all middleware for constructing distributed applications
 Learn how to program distributed applications
 Web services
 Java and distributed computing
 Mozart and distributed computing
 Look at
 M.L. Liu, Distributed Computing
 P. Van Roy and S. Haridi, Concepts, Techniques
and Models of Computer Programming
Distributed Systems
CS5223
 cs5223
 written final exam 60%
 Midterm exam 20%
 Assignments 20%
 Module homepage
http://www.comp.nus.edu.sg/~cs5223
IVLE
 Teaching
 Lectures
 Consultation using IVLE
 Come any time



Teacher
 Course responsible [Lectures]
Seif Haridi haridi@comp.nus.edu.sg
cs5223@comp.nus.edu.sg



Lectures

 Held by me



Lecture Structure
 Reminder of last lecture

 Overview

 Content

 Summary

 Reading suggestions



Material
 Lectures are based mainly on two books
 (DS) Andrew S. Tanenbaum, Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice-Hall, 2002.
 (DA) Randy Chow and Theodore Johnson, Distributed Operating Systems & Algorithms, Addison Wesley, 1997, ISBN 0-201-49838-3.
 Copies should be available (now or soon) at the CO-OP
 The handouts are in most cases self-explanatory
 Available from the webpage
 Some scientific papers

Other recommended material
 Coulouris, Dollimore, Kindberg, “Distributed Systems: Concepts and Design”, Addison-Wesley (3rd Edition)
 M.L. Liu, Distributed Computing: Principles and Applications, Addison Wesley
 Nancy Lynch, Distributed Algorithms



Reading Suggestions
 Will be available on webpage (Lectures)
 Initially
 Chapter 1 of Tanenbaum (DS)



Assignments
 There will be one assignment
 You will have to study one or two research papers
 Or do a programming assignment
 One discussion group per assignment
 Solutions are to be submitted through IVLE
 There is a deadline for each assignment



General information

 Reading of papers
 In groups of two or three
 Each group will read one or two research papers
 For each paper studied
 Identify the problem
 Explain the solution(s) presented in the paper
 Identify positive and negative aspects of the paper
 Propose your own solution, if any
 Provide a report
 Give a presentation to the class



Assignment Groups
 Assignment done via IVLE
 Is everybody subscribed to IVLE?



Use IVLE
 Use e-mail only in exceptional cases:
cs5223@comp.nus.edu.sg
 Questions
 There is a discussion group for each book chapter/lecture
 There is a discussion group for general matters
 Submit your assignments using the corresponding Workbin (IVLE)

Feedback in General
 Approach me directly (any time) or arrange an appointment
 Do not be afraid!



Questions and Using Brakes!
 Please do ask questions during the lectures
 to have an explanation repeated
 to get a better explanation
 to ask for an example
 Please say when things go too fast!
 Please say when things go too slow!



Background Knowledge
I assume some knowledge of the following
 Programming languages: C/Java
 Operating systems: basic concepts
 Networking: basic concepts
 Algorithms and data structures
I will try to be as elementary as possible
 Ask me if you lack some knowledge



Course Overview



What is a distributed system



Distributed system
A simplified view

[Figure: nodes (processor/process, each possibly running threads) connected by communication channels over a communication medium]

Distributed system
 Set of computing nodes that cooperate in order to achieve a well-defined goal
 Nodes cooperate through communication
 Communication is by message passing at the fundamental level



Distributed System
A distributed system is one or more applications running on a collection of independent computers that appears to its users as a single coherent system



What is a Distributed System?
 Distributed hardware
 n processing elements (PE), each a processor + memory
 Interconnected by some network
 No shared memory
 Distributed software
 No centralized OS, each PE has its own copy of the OS
 No physically centralized file system
 Means for inter-process communication
 Distributed applications

Why distributed systems?
 Information exchange (collaborative work)
 Resource sharing (e.g. printer, backup
storage, disk units, etc.)
 Resource sharing (applications, information,
media, services)
 Cost reduction
 Increase of availability (partial-failure)
 Increase of performance through
parallelism,...
Main characteristics
 No shared memory between nodes
 Each node has its own memory
 Communication by message passing
 No global clock
 Each node has its own clock
 Impossible for a node to obtain an instantaneous global state of the system
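
A minimal sketch of the first characteristic, assuming two threads stand in for two nodes and `queue.Queue` objects stand in for communication channels (both are illustrative choices, not part of the course material). The node keeps its running total in a private local variable; the only way the other side learns it is by receiving a message.

```python
import queue
import threading


def node(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A node with private state; it interacts only by receiving and sending messages."""
    total = 0                      # private memory, invisible to other nodes
    while True:
        message = inbox.get()      # blocking receive
        if message is None:        # None is our (assumed) end-of-stream marker
            outbox.put(total)      # reply with the final local state
            return
        total += message


def run_demo() -> int:
    """Send three values to the node and return the total it reports back."""
    to_node: queue.Queue = queue.Queue()
    from_node: queue.Queue = queue.Queue()
    worker = threading.Thread(target=node, args=(to_node, from_node))
    worker.start()
    for value in (1, 2, 3):
        to_node.put(value)
    to_node.put(None)
    result = from_node.get()
    worker.join()
    return result
```

Calling `run_demo()` returns the sum computed inside the node. The sender never reads the node's `total` directly, which mirrors the "no shared memory" rule.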



Examples of Distributed
Systems

 Airline reservation system
 Bank automated teller machine network
 CSCW (Computer Supported Cooperative Work)
 Intranet

 Internet

 Mobile computing



A typical portion of the
Internet

[Figure: intranets connected through ISPs to backbones, including a satellite link. Legend: desktop computer, server, network link]



A typical intranet

[Figure: local area networks connecting desktop computers, email servers, a Web server, a file server, print and other servers; a router/firewall links the intranet to the rest of the Internet]



How are Distributed Systems
built?
 A number of computers connected by a network
 Distribution middleware services layer that gives a uniform view of the nodes, and hides some of the network and distribution aspects
 Application on top of the middleware service layer (using a programming system)



Middleware view



Middleware view
 A distributed system is often organized as a layer on top of the local operating systems



Goals of a Distributed System
 Transparency
 Hide the fact that processes and resources are physically distributed
 Scalability
 Distributed systems should be easy to expand
 Availability
 Distributed systems should be continuously available
 Openness
 New users/components can be integrated into the system
 Incremental and independent augmentation by independent developer teams

Transparency
 Ideally a distributed application (system) should look like a conventional centralized system, with no distinction between local and remote resources
 This is the user view
 The developer view is different
 Network aware, knows the cost of distribution of programming entities (e.g. objects)
 Has means to control the distribution behavior



Transparency
 Access Transparency
 Hides differences in data representation and how a resource is accessed
 Hides heterogeneity of underlying nodes
 Location Transparency
 Hides where a resource/service is located
 Migration Transparency
 Hides that a resource/service may move to another location (machine/node)

Transparency
 Relocation Transparency
 Hides that a resource may be moved to another location while in use
 Failure Transparency
 Hides the failure and recovery of a resource
 Concurrency Transparency
 Hides that a resource may be shared by a number of competing users/processes

Transparency

Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk



Scalability
 Size
 Add more users and resources/components
 Distance
 Cope with geographically dispersed resources/users
 Management
 Spanning independent administrative organizations
 Local management



Scalability Problems (Size)

Concept                  Example
Centralized services     A single server for all users
Centralized data         A single database for location information
Centralized algorithms   All requests go through one process

Examples of scalability limitations.



Scaling Techniques I

[Figure 1.4: Offloading the server by sending form-processing procedures to the client]

Scaling Techniques II
• Distributed Algorithms
• No process has complete information about the system
• Process decisions are based on local information
• Failure of one process does not ruin the whole system
• No implicit assumption of exactly synchronized clocks (a global clock)

Scaling Techniques II

[Figure 1.5: An example of dividing the DNS name space into zones]

Scalability Problems
(Distance)
 Long communication delays
 Programming techniques for Local Area Networks (LANs) do not really work for Wide Area Networks (WANs)
 Synchronous communication such as Remote Procedure Calls (RPC) is not suitable
 Asynchronous message passing is more appropriate




Scalability Problems
(Distance)
 WANs have unreliable communication media
 Cannot exploit broadcast communication
 Only point-to-point communication
 Locating a service on a WAN is more difficult than on a LAN
 On a LAN, just broadcast a service identifier and wait for a response



Scalability Problems (Different
Administrative Organizations)
 Different and conflicting policies for
 Resource usage
 Management of the system
 Security policies
 WHO has access to WHAT resources?
 Can I trust a non-local system administrator?



Scalability Problems (Different
Administrative Organizations)

[Figure: a distributed system (DS) spanning Admin Domain 1 and Admin Domain 2]

 Protect the DS from domains 1 & 2
 Protect domains 1 & 2 from the DS
 Grid computing (GGF, the Global Grid Forum)



Focus of the Distributed
systems part (Basics)
 Components of Distributed Systems
 Inter-process communication
 Processes, threads, client/servers, code
migration, software agents
 Naming services



Focus of the Distributed
systems part (Middleware)
 Examples of middleware for building DS
 Distributed Object-based Systems
 CORBA
 Distributed COM
 GLOBE
 Distributed Coordination-based systems
 Security



Focus of the Distributed
systems part (Infrastructures)

 Distributed file systems


 Distributed document-based systems



Focus of the Distributed
Algorithms part
 Model of Computations
 Techniques for coordination of processes
 Techniques for high availability
 Fault tolerance
 Reliable group communication
 Distributed agreement
 Techniques for scalability
 Consistency models
 Replication techniques

Distributed Algorithms



Distributed Algorithms
 How to design distributed algorithms
 Study of some fundamental problems
 Analysis of distributed algorithms
 How to achieve fault-tolerance in a distributed system
 Fault-tolerance: the ability of a system to provide useful service despite the failure of some of its components
 Very important for high availability



Why study distributed
algorithms?
 Distributed algorithms are the backbone of distributed computing systems
 They are essential for the implementation of distributed systems
 Distributed operating systems
 Distributed databases, communication systems
 Real-time process-control systems
 Transportation, etc.



Classes of distributed
algorithms
 Fully decentralized
 Fault-tolerant
 More difficult in general
 With a centralized coordinator
 Conceptually simpler
 Single point of failure, bottleneck
 Require efficient mechanisms for selecting a new coordinator if the current one fails

References
 Textbook:
 Distributed Operating Systems & Algorithms
 Randy Chow and Theodore Johnson, Addison Wesley, 1997
 Others
 Distributed Algorithms
 Nancy A. Lynch, 1996
 Research papers



Distributed Algorithms
 Models of Distributed Computation
 Causality
 Ordering of events, Logical Clocks (timestamps)
 Causal communication
 Distributed snapshots
 Detecting stable properties, Diffusing computation
 Modeling a distributed computation
 Expressing correctness properties of a distributed algorithm
 Failures in a distributed system



Distributed Algorithms: outline
 Synchronization
 Distributed mutual exclusion: needed to regulate accesses to a common resource that can be used only by one process at a time
 Election
 Used, for instance, to designate a new coordinator when the current coordinator fails



Distributed Algorithms: outline
 Distributed agreement
 How to get a set of nodes to agree on a value
 Distributed agreement is used, for instance,
 To determine which nodes are alive in the system
 To confine malicious behavior of some components
 (Fault-tolerance again!)



Distributed Algorithms: outline
 Replicated data management
 A key for high availability is to replicate components (data/files, servers, etc.)
 We shall be concerned with
 Techniques for maintaining replicated data in a distributed system (database techniques)
 Atomic broadcast/multicast
 Membership



Distributed Algorithms: outline
 Check-pointing and recovery
 Error recovery is essential for fault-tolerance
 When a processor fails and then is repaired, it will
need to recover its state of the computation
 To enable recovery, check-pointing (recording of
the state into a stable storage) is needed
 We will be concerned with techniques used for
this, in the context of distributed systems



Background



Distributed system, distributed
computing
 Early computing was performed on a single
processor. Uni-processor computing can be
called centralized computing.
 A distributed system is a collection of
independent computers, interconnected via a
network, capable of collaborating on a task.
 Distributed computing is computing
performed in a distributed system.



Distributed Systems

[Figure: workstations on a local network, connected to the Internet; each computer is a network host]



Examples of Distributed
systems
 Network of workstations (NOW): a group of networked personal workstations connected to one or more server machines.
 The Internet
 An intranet: a network of computers and workstations within an organization, segregated from the Internet via a protective device (a firewall).



Computers in a Distributed
System

 Workstations: computers used by end-users to perform computing
 Server machines: computers which provide resources and services
 Personal Assistance Devices: handheld computers connected to the system via a wireless communication link.



Centralized vs. Distributed
Computing

[Figure: centralized computing (terminals attached to a mainframe computer) versus distributed computing (workstations and network hosts joined by network links)]



Evolution of paradigms
 Client-server: Socket API, remote method invocation
 Distributed objects
 Object broker: CORBA
 Network service: Jini
 Object space: JavaSpaces
 Mobile agents
 Message oriented middleware (MOM): Java Message Service
 Collaborative applications



Cooperative distributed
computing projects
 Cooperative distributed computing projects (also called distributed computing in some literature) parcel out large-scale computing to workstations, often making use of surplus CPU cycles.
 Example: SETI@home, a project to scan data retrieved by a radio telescope to search for radio signals from another world.



Why distributed computing?
 Economics: distributed systems allow the pooling of resources, including CPU cycles, data storage, input/output devices, and services.
 Reliability: a distributed system allows replication of resources and/or services, thus reducing service outage due to failures.
 The Internet has become a universal platform for distributed computing.

The Weaknesses and Strengths of
Distributed Computing
In any form of computing, there is always a tradeoff between advantages and disadvantages.
Some of the reasons for the popularity of distributed computing:
 The affordability of computers and availability of network access
 Resource sharing
 Scalability
 Fault Tolerance



The Weaknesses and Strengths of
Distributed Computing
The disadvantages of distributed computing:
 Multiple points of failure: the failure of one or more participating computers, or one or more network links, can spell trouble.
 Security concerns: in a distributed system, there are more opportunities for unauthorized attack.
 Applications are difficult to develop



Introductory Basics
M. L. Liu



Basics in three areas
Some of the notations and concepts from
these areas will be employed from time to
time in the presentations for this course:
 Programming Languages
 Operating systems

 Networks.



Procedural versus Object-oriented
Programming
In building network applications, there are two main classes of programming languages: procedural languages and object-oriented languages.
 Procedural languages, with the C language being the primary example, use procedures (functions) to break down the complexity of the tasks that an application entails.
 Object-oriented languages, exemplified by Java, use objects to encapsulate the details. Each object carries state data as well as behaviors. State data are represented as instance data. Behaviors are represented as methods.

Operating Systems
Basics



Operating systems basics
A process consists of an executing program,
its current values, state information, and the
resources used by the operating system to
manage its execution.
 A program is an artifact constructed by a
software developer; a process is a dynamic
entity which exists only when a program is
run.



Process State Transition
Diagram

[Figure: Simplified finite state diagram for a process's lifetime. States: start, ready, running, blocked, terminated. Transition labels: queued, dispatch, exit, waiting for event, event completion]
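
The diagram can be read as a small finite state machine. The sketch below is one plausible reading: the state names and transition labels are taken from the figure, but the exact arrows (which label connects which pair of states) are an assumption, since only the labels survive in these notes.

```python
# Assumed edges: (current state, event) -> next state.
TRANSITIONS = {
    ("start", "queued"): "ready",
    ("ready", "dispatch"): "running",
    ("running", "waiting for event"): "blocked",
    ("blocked", "event completion"): "ready",
    ("running", "exit"): "terminated",
}


def step(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition {event!r} in state {state!r}") from None


def run(events, state: str = "start") -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, a process that blocks once on I/O goes through `queued`, `dispatch`, `waiting for event`, `event completion`, `dispatch`, `exit` and ends in `terminated`.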


Example: Java processes
 There are three types of Java program: applications, applets, and servlets. All are written as classes.
 A Java application program is run as an independent (standalone) process.
 An applet is run using a browser or the applet viewer.
 A servlet is run in the context of a web server.
 A Java program is compiled into byte code, a universal object code. When run, the byte code is interpreted by the Java Virtual Machine (JVM).



Three Types of Java programs
 Applications
A program whose byte code can be run on any system which has a Java Virtual Machine. An application may be standalone (monolithic) or distributed (if it interacts with another process).
 Applets
A program whose byte code is downloaded from a remote machine and is run in the browser’s Java Virtual Machine.
 Servlets
A program whose byte code resides on a remote machine and is run at the request of an HTTP client (a browser).

Three Types of Java programs
A standalone Java application is run on a local machine.
[Figure: a computer running a Java object on a Java Virtual Machine]

An applet is an object downloaded (transferred) from a remote machine, then run on a local machine.
[Figure: an applet (a Java object) running on a Java Virtual Machine]

A servlet is an object that runs on a remote machine and interacts with a local program using a request-response protocol.
[Figure: a process sends a request to a servlet and receives a response]



Concurrent Processing
On modern-day operating systems, multiple processes appear to be executing concurrently on a machine by timesharing resources.

[Figure: Timesharing of a resource. Processes P1 to P4 take turns using the resource along the time axis]

Concurrent processing within
a process
It is often useful for a process to have parallel threads of execution, each of which timeshares the system resources in much the same way as concurrent processes.

[Figure: Concurrent processing within a process. A parent process may spawn child processes. A process may spawn child threads (main thread, child thread 1, child thread 2)]

Thread-safe Programming
 When two threads independently access and update the same data object, such as a counter, as part of their code, the updating needs to be synchronized. (See next slide.)
 Because the threads are executed concurrently, it is possible for one of the updates to be overwritten by the other due to the sequencing of the two sets of machine instructions executed on behalf of the two threads.
 To protect against this possibility, a synchronized method can be used to provide mutual exclusion.
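
A minimal sketch of mutual exclusion on a shared counter. The slide refers to Java's synchronized methods; this example swaps in Python's `threading.Lock` to play the same role. The `Counter` class and `run_threads` helper are illustrative names, not part of the course material.

```python
import threading


class Counter:
    """A counter whose update is protected by a lock (mutual exclusion)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may execute the fetch/increment/store sequence.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_threads(n_threads: int = 4, per_thread: int = 1000) -> int:
    """Start n_threads threads that each increment the counter per_thread times."""
    counter = Counter()

    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

With the lock in place, the final value is always `n_threads * per_thread`, whatever order the threads are scheduled in.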
Race Condition
Two concurrent processes or threads each increment a shared counter that starts at 0. Time runs downward.

Execution 1 (the updates do not overlap):
1. Thread 1: fetch value in counter and load into a register
2. Thread 1: increment value in register
3. Thread 1: store value in register to counter
4. Thread 2: fetch value in counter and load into a register
5. Thread 2: increment value in register
6. Thread 2: store value in register to counter
This execution results in the value 2 in the counter.

Execution 2 (the updates interleave):
1. Thread 1: fetch value in counter and load into a register
2. Thread 2: fetch value in counter and load into a register
3. Thread 1: increment value in register
4. Thread 2: increment value in register
5. Thread 1: store value in register to counter
6. Thread 2: store value in register to counter
This execution results in the value 1 in the counter.
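
The two executions can be replayed deterministically. The sketch below is an illustration under stated assumptions: the counter starts at 0, each thread has one private register, and the interleaved schedule is one ordering consistent with the figure's result. It runs no real threads; it only replays the three machine-level steps in a given order.

```python
def run_schedule(schedule) -> int:
    """Replay (thread_id, operation) steps against a shared counter and return its final value."""
    counter = 0
    registers = {}                      # one private register per thread
    for thread, operation in schedule:
        if operation == "fetch":
            registers[thread] = counter
        elif operation == "increment":
            registers[thread] += 1
        elif operation == "store":
            counter = registers[thread]
        else:
            raise ValueError(f"unknown operation: {operation}")
    return counter


# Thread 1 finishes its update before thread 2 starts.
SERIAL = [(1, "fetch"), (1, "increment"), (1, "store"),
          (2, "fetch"), (2, "increment"), (2, "store")]

# Both threads fetch the old value before either stores: one update is lost.
INTERLEAVED = [(1, "fetch"), (2, "fetch"),
               (1, "increment"), (2, "increment"),
               (1, "store"), (2, "store")]
```

`run_schedule(SERIAL)` yields 2 and `run_schedule(INTERLEAVED)` yields 1, matching the two outcomes in the figure.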



Network Basics



Network standards and
protocols
 On public networks such as the Internet, it is
necessary for a common set of rules to be
specified for the exchange of data.
 Such rules, called protocols, specify such
matters as the formatting and semantics of
data, flow control, error correction.
 Software can share data over the network
using network software which supports a
common set of protocols.
Protocols
 A protocol is a set of rules that must be observed by the participants.
 Protocols must be formally defined and precisely implemented. For each protocol, there must be rules that specify the following:
 How is the data exchanged encoded?
 How are events (sending, receiving) synchronized so that the participants can send and receive in a coordinated order?
 The specification of a protocol does not dictate how the rules are to be implemented.



The network architecture
 Network hardware transfers electronic signals, which represent a bit stream, between two devices.
 Modern-day network applications require an application programming interface (API) which masks the underlying complexities of data transmission.
 A layered network architecture allows the functionalities needed to mask the complexities to be provided incrementally, layer by layer.
 Actual implementation of the functionalities may not be clearly divided by layer.



The OSI seven-layer network
architecture

[Figure: two communicating hosts, each with the same stack of seven layers, top to bottom]
application layer
presentation layer
session layer
transport layer
network layer
data link layer
physical layer



Network Architecture
The division of the layers is conceptual: the implementation of the functionalities need not be clearly divided as such in the hardware and software that implements the architecture.
The conceptual division serves at least two useful purposes:
1. Systematic specification of protocols: it allows protocols to be specified systematically.
2. Conceptual data flow: it allows programs to be written in terms of logical data flow.

The TCP/IP Protocol Suite
 The Transmission Control Protocol/Internet Protocol suite is a set of network protocols which supports a four-layer network architecture.
 It is currently the protocol suite employed on the Internet.

[Figure: The Internet network architecture. Two hosts, each with the same four layers, top to bottom]
Application layer
Transport layer
Internet layer
Physical layer

The TCP/IP Protocol Suite -2
 The Internet layer implements the Internet
Protocol, which provides the functionalities for
allowing data to be transmitted between any
two hosts on the Internet.
 The Transport layer delivers the transmitted
data to a specific process running on an
Internet host.
 The Application layer supports the programming
interface used for building a program.



Network Resources
 Network resources are resources available to the participants of a distributed computing community.
 Network resources include hardware such as computers and equipment, and software such as processes, email mailboxes, files, web documents.
 An important class of network resources is network services such as the World Wide Web and file transfer (FTP), which are provided by specific processes running on computers.

Identification of Network Resources
 One of the key challenges in distributed computing is the unique identification of resources available on the network, such as e-mail mailboxes and web documents.
 Addressing an Internet Host
 Addressing a process running on a host
 Email Addresses
 Addressing web contents: URL
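
As a small illustration of how one identifier combines several of these addressing levels, the sketch below splits a URL and an e-mail address into their parts using Python's standard `urllib.parse.urlsplit`. The helper names `describe_url` and `split_mailbox` are illustrative, and the port number 8080 in the usage note is a made-up value.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts that address the protocol, host, process (port) and document."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # which protocol to speak
        "host": parts.hostname,    # which Internet host
        "port": parts.port,        # which process on that host (None if not given)
        "path": parts.path,        # which document on that server
    }


def split_mailbox(address: str) -> tuple:
    """Split an e-mail address into (mailbox name, mail domain)."""
    name, _, domain = address.rpartition("@")
    return name, domain
```

For the module homepage, `describe_url("http://www.comp.nus.edu.sg/~cs5223")` reports the host `www.comp.nus.edu.sg`, the path `/~cs5223`, and no explicit port. Adding a hypothetical `:8080` after the host name would set `port` to 8080.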



Addressing Internet
Hosts



The Internet Topology

[Figure: The Internet Topology Model. Internet hosts attached to subnets, which are interconnected by the Internet backbone]



The Internet Topology
 The Internet consists of a hierarchy of networks, interconnected via a network backbone.
 Each network has a unique network address.
 Computers, or hosts, are connected to a network. Each host has a unique ID within its network.
 Each process running on a host is associated with zero or more ports. A port is a logical entity for data transmission.

The Internet addressing
scheme
 In IP version 4, each address is 32 bits long.
 The address space accommodates 2^32 (about 4.3 billion) addresses in total.
 Addresses are divided into 5 classes (A through E), identified by the leading bits of byte 0:

Class          Leading bits   Rest of the 32 bits (bytes 0 to 3)
A              0              network address (7 bits), host portion (24 bits)
B              10             network address (14 bits), host portion (16 bits)
C              110            network address (21 bits), host portion (8 bits)
D (multicast)  1110           multicast group (28 bits)
E (reserved)   11110          reserved



The Internet addressing scheme - 2
Subdividing the host portion of an Internet address:

[Figure: a class B address across bytes 0 to 3: leading bits 10, network address, host portion. The host portion is split into a subnet address and a local host address]

A class A/C address space can also be similarly subdivided.
Which portion of the host address is used for the subnet identification is determined by a subnet mask.
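
A minimal sketch of how a subnet mask is applied, using plain bitwise arithmetic. The mask values used here (255.255.255.0 and 255.255.240.0) are assumptions chosen for illustration; the slides do not specify a mask. The helper names are illustrative as well.

```python
def to_int(dotted: str) -> int:
    """Convert dotted-decimal notation to a 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value


def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def apply_mask(address: str, mask: str) -> tuple:
    """Return (subnet address in dotted form, local host number) for the given mask."""
    addr, m = to_int(address), to_int(mask)
    subnet = addr & m                    # bits selected by the mask: network + subnet
    local_host = addr & ~m & 0xFFFFFFFF  # remaining bits: host within the subnet
    return to_dotted(subnet), local_host
```

With the assumed mask 255.255.255.0, the class B address 129.65.24.50 falls in subnet 129.65.24.0 as local host 50. A mask need not end on a byte boundary: with 255.255.240.0 the same address falls in subnet 129.65.16.0.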



Example
Suppose the dotted-decimal notation for a particular Internet
address is 129.65.24.50. The 32-bit binary expansion of the notation
is as follows:

129.65.24.50
10000001 01000001 00011000 00110010

Since the leading bit sequence is 10, the address is a Class B
address. Within the class, the network portion is identified by the
remaining bits in the first two bytes, that is, 00000101000001, and the
host portion is the values in the last two bytes, or 0001100000110010.
For convenience, the binary prefix for class identification is often
included as part of the network portion of the address, so that we
would say that this particular address is at network 129.65 and
then at host address 24.50 on that network.
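The expansion and classification in this example can be reproduced with a short sketch. The function names `to_binary` and `address_class` are hypothetical helpers introduced here, and the classifier simply treats every address that does not start with 0, 10, 110, or 1110 as class E.

```python
def to_binary(dotted: str) -> str:
    """Return the 32-bit binary expansion of a dotted-decimal IPv4 address."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    return "".join(format(o, "08b") for o in octets)


def address_class(bits: str) -> str:
    """Classify an address by its leading bit sequence."""
    for prefix, name in (("0", "A"), ("10", "B"), ("110", "C"), ("1110", "D")):
        if bits.startswith(prefix):
            return name
    return "E"  # simplification: everything else is treated as reserved


bits = to_binary("129.65.24.50")
print(bits)                 # 10000001010000010001100000110010
print(address_class(bits))  # B
print(bits[2:16])           # network portion: 00000101000001
print(bits[16:])            # host portion:    0001100000110010
```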


Another Example
Given the address 224.0.0.1, one can expand it as
follows:

224.0.0.1
11100000 00000000 00000000 00000001

The binary prefix of 1110 signifies that this is a class D, or
multicast, address. Data packets sent to this
address should therefore be delivered to the
multicast group
0000000000000000000000000001.



The Internet Address Scheme - 3
 For human readability, Internet addresses
are written in a dotted decimal notation:
nnn.nnn.nnn.nnn, where each nnn group is a decimal value in
the range of 0 through 255
# Internet host table (found in /etc/hosts file)
127.0.0.1 localhost
129.65.242.5 falcon.csc.calpoly.edu falcon loghost
129.65.241.9 falcon-srv.csc.calpoly.edu falcon-srv
129.65.242.4 hornet.csc.calpoly.edu hornet
129.65.241.8 hornet-srv.csc.calpoly.edu hornet-srv
129.65.54.9 onion.csc.calpoly.edu onion
129.65.241.3 hercules.csc.calpoly.edu hercules
IP version 6 Addressing Scheme

 Each address is 128 bits long.


 There are three types of addresses:
 Unicast: An identifier for a single interface.
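 Anycast: An identifier for a set of interfaces
(typically belonging to different nodes). A packet
sent to an anycast address is delivered to one of
the interfaces identified by that address.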
 Multicast: An identifier for a set of interfaces
(typically belonging to different nodes). A packet
sent to a multicast address is delivered to all
interfaces identified by that address.
 See Request for Comments: 2373
http://www.faqs.org/rfcs/ (link is in book’s
reference)
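A small sketch using Python's standard `ipaddress` module shows the 128-bit length and the multicast test. The two addresses are well-known examples picked for illustration (`ff02::1` is the all-nodes multicast address, `::1` is the loopback address); anycast addresses are not checked because they are taken from the unicast address space and cannot be told apart by their syntax.

```python
import ipaddress

multicast = ipaddress.IPv6Address("ff02::1")  # all-nodes multicast address
loopback = ipaddress.IPv6Address("::1")       # a unicast (loopback) address

print(multicast.max_prefixlen)  # 128, the address length in bits
print(multicast.exploded)       # ff02:0000:0000:0000:0000:0000:0000:0001
print(multicast.is_multicast)   # True
print(loopback.is_multicast)    # False
```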
The Domain Name System
(DNS)
Each Internet address is mapped to a symbolic name, using
the DNS, in the format of:
<computer-name>.<subdomain hierarchy>.<organization>.<sector name>{.<country code>}
e.g., www.csc.calpoly.edu.us

[Figure: the DNS name tree. Levels from the top: root; top-level domain (com, edu, gov, net, org, mil in the U.S., or a country code); organization; subdomain; host name.]
A top-level domain name has to be applied for.
The subdomain hierarchy and names are assigned
by the organization.



The Domain Name System
 For network applications, a domain name must be
mapped to its corresponding Internet address.
 Processes known as domain name system servers
provide the mapping service, based on a
distributed database of the mapping scheme.
 The mapping service is offered by thousands of
DNS servers on the Internet, each responsible for a
portion of the name space, called a zone. The
servers that have access to the DNS information
(zone file) for a zone are said to have authority for
that zone.



Domain Name Hierarchy
[Figure: the domain name hierarchy. The root domain (.) branches into the country-code domains (.au ... .ca ... .us ... .zw) and the domains .com, .gov, .edu, .mil, .net, .org. Under .edu are ucsb.edu ... calpoly.edu ..., with subdomains (cs ... ece ...; csc ... ee, english ... wireless) beneath them.]



Name lookup and resolution
 If a domain name is used to address a host, its
corresponding IP address must be obtained for the
lower-layer network software.
 The mapping, or name resolution, must be
maintained in some registry.
 For runtime name resolution, a network service is
needed; a protocol must be defined for the naming
scheme and for the service. Example: The DNS
service supports the DNS; the Java RMI registry
supports RMI object lookup; JNDI is a network
service lookup protocol.
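A minimal sketch of runtime name resolution, assuming the operating system's resolver (which may consult the hosts table, the DNS, or both) is reachable through Python's `socket.getaddrinfo`. The helper name `resolve` is hypothetical, and `localhost` is used only so that the example does not depend on network access.

```python
import socket


def resolve(host_name: str) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for host_name."""
    infos = socket.getaddrinfo(host_name, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of the socket address is the IP address
        if ip not in addresses:
            addresses.append(ip)
    return addresses


print(resolve("localhost"))
```

The addresses returned are what the lower-layer network software needs; the application itself can keep working with the symbolic name.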
Addressing a process
running on a host



Logical Ports
[Figure: host A and host B connected through the Internet; each host runs processes, and each process is attached to one or more ports.]
Each host has 65536 ports.



Well Known Ports
 Each Internet host has 2^16 (65,536) logical
ports. Each port is identified by a 16-bit number;
numbers between 1 and 65535 can be allocated
to a particular process (port 0 is reserved).
 Port numbers between 1 and 1023 are
reserved for processes which provide well-
known services such as finger, FTP, HTTP,
and email.



Well-known ports
Assignment of some well-known ports

Protocol             Port   Service
echo                 7      IPC testing
daytime              13     provides the current date and time
ftp                  21     file transfer protocol
telnet               23     remote, command-line terminal session
smtp                 25     simple mail transfer protocol
time                 37     provides a standard time
finger               79     provides information about a user
http                 80     web server
RMI Registry         1099   registry for Remote Method Invocation
special web server   8080   web server which supports servlets, JSP, or ASP
Choosing a port to run your
program
 For programming: when a port is needed,
choose a random number above the well-known
ports, in the range 1,024 to 65,535.
 To provide a network service for the
community, arrange to have a port
assigned to and reserved for your service.
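The first rule can be sketched as follows, assuming a TCP server socket on the local host. The helper `bind_to_random_port` and the retry count are hypothetical; the loop retries because a randomly chosen port may already be in use.

```python
import random
import socket


def bind_to_random_port(attempts: int = 20) -> socket.socket:
    """Bind a TCP socket on the local host to a random port above the well-known range."""
    for _ in range(attempts):
        port = random.randint(1024, 65535)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            return sock
        except OSError:
            sock.close()  # port unavailable; try another random number
    raise RuntimeError("no free port found")


server = bind_to_random_port()
print(server.getsockname()[1])  # the port number that was actually bound
server.close()
```

An alternative that avoids the retry loop is to bind to port 0, which asks the operating system to pick a free port itself.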



Addressing a Web Document



The Uniform Resource
Identifier (URI)
 Resources to be shared on a network need
to be uniquely identifiable.
 On the Internet, a URI is a character string
which allows a resource to be located.
 There are two types of URIs:
 URL (Uniform Resource Locator) points to a
specific resource at a specific location
 URN (Uniform Resource Name) points to a
specific resource at a nonspecific location.
URL
A URL has the format of:
protocol://host address[:port]/directory path/file name#section
A sample URL:
http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html#hw1

protocol of server: http
host name: www.csc.calpoly.edu
port number of server process: 8080
directory path: /~mliu/CSC369/
file name: hw.html
section name: hw1

Other protocols that can appear in a URL are:
file, ftp, gopher, news, telnet, WAIS
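As an illustration, the sample URL above can be taken apart with Python's standard `urllib.parse` module. Splitting the path into a directory path and a file name at the last slash is a convention assumed here to match the slide's labels; the library itself only reports the whole path.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html#hw1")

# Split the path at its last "/" into directory path and file name.
directory, _, file_name = parts.path.rpartition("/")

print(parts.scheme)     # http
print(parts.hostname)   # www.csc.calpoly.edu
print(parts.port)       # 8080
print(directory + "/")  # /~mliu/CSC369/
print(file_name)        # hw.html
print(parts.fragment)   # hw1
```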



More on URL
 The path in a URL is relative to the
document root of the server.
 A URL may appear in a document in a
relative form:
<a href="another.html">
and the actual URL referred to will be
another.html preceded by the protocol,
hostname, and directory path of the document.
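Resolving a relative URL against the URL of the containing document can be sketched with `urllib.parse.urljoin`. The base URL below is an assumed location for the document that contains the link.

```python
from urllib.parse import urljoin

# Assumed URL of the document in which the relative link appears.
base = "http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html"

resolved = urljoin(base, "another.html")
print(resolved)
# http://www.csc.calpoly.edu:8080/~mliu/CSC369/another.html
```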



Summary - 1
We discussed the following topics:
 What is meant by distributed computing

 Distributed system

 Basic concepts in operating systems:
processes and threads



Summary - 2
 Basic concepts in data communication:
 Network architectures: the OSI model and the
Internet model
 Connection-oriented communication vs.
connectionless communication
 Naming schemes for network resources
 The Domain Name System (DNS)
 Protocol port numbers

 Uniform Resource Identifier (URI)

 Email addresses