SECTION A
Ans-
1. Safety Property: At any instant, only one process can execute the critical section.
2. Liveness Property: This property states the absence of deadlock and starvation. Two
or more sites should not endlessly wait for messages which will never arrive.
3. Fairness: Each process gets a fair chance to execute the CS. Fairness property
generally means the CS execution requests are executed in the order of their arrival
(time is determined by a logical clock) in the system.
Ans- The term distributed system is used to describe a system with the following
characteristics: it consists of several computers that do not share a memory or a clock; the
computers communicate with each other by exchanging messages over a communication
network; and each computer has its own memory and runs its own operating system.
Ans- In distributed deadlock detection, the delay in propagating local information might
cause the deadlock detection algorithms to identify deadlocks that do not really exist. Such
situations are called phantom deadlocks and they lead to unnecessary aborts.
Ans- This mechanism provides the binding together of different filename spaces to form a
single hierarchically structured name space. It is UNIX specific, and most existing
distributed file systems (DFSs) are based on UNIX. A filename space can be bound to or
mounted at an internal node or a leaf node of a namespace tree. A node onto which a name
space is mounted is called a mount point. The kernel maintains a mount table, which maps
mount points to appropriate storage devices.
Ans- Fault tolerance is the property that enables a system to continue operating properly in
the event of the failure of one or more of its components. If its operating quality decreases
at all, the decrease is proportional to the severity of the failure, whereas in a naively
designed system even a small failure can cause total breakdown. Fault tolerance is
particularly sought after in high-availability or life-critical systems.
Ans- A fault is a condition that causes the software to fail to perform its required function,
whereas a failure is the inability of a system or component to perform its required function
according to its specification.
Ans- In a flat transaction, a client makes requests to more than one server. A flat client
transaction completes each of its requests before going on to the next one; therefore, each
transaction accesses servers' objects sequentially. When servers use locking, a transaction
can only be waiting for one object at a time. In a nested transaction, the top-level transaction
can open subtransactions, and each subtransaction can open further subtransactions down to
any depth of nesting.
Ans- A lock or mutex (from mutual exclusion) is a synchronization mechanism for enforcing
limits on access to a resource in an environment where there are many threads of execution.
A lock is designed to enforce a mutual exclusion concurrency control policy.
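As a concrete illustration, here is a minimal sketch in Python of a lock enforcing mutual exclusion on a shared variable (the counter, thread count, and iteration count are arbitrary choices for the example):

```python
import threading

counter = 0
lock = threading.Lock()  # the mutex guarding the shared counter

def worker():
    global counter
    for _ in range(10000):
        with lock:        # only one thread may hold the lock at a time
            counter += 1  # critical section: a safe read-modify-write

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: without the lock, updates could be lost
```

Without the lock, the unsynchronized `counter += 1` could interleave across threads and lose updates, violating the safety property described above.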
SECTION B
Ans- The term distributed system is used to describe a system with the following
characteristics: it consists of several computers that do not share a memory or a clock; the
computers communicate with each other by exchanging messages over a communication
network; and each computer has its own memory and runs its own operating system. The
resources owned and controlled by a computer are said to be local to it, while the resources
owned and controlled by other computers and those that can only be accessed through the
network are said to be remote. Typically, accessing remote resources is more expensive than
accessing local resources because of the communication delays that occur in the network and
the CPU overhead incurred to process communication protocols.
In the workstation model, the distributed system consists of a number of workstations (up to
several thousand). Each user has a workstation at his disposal, where in general, all of the
user's work is performed. With the help of a distributed file system, a user can access data
regardless of the location of the data or of the user's workstation. The ratio of the number of
processors to the number of users is normally one. The workstations are typically equipped
with a powerful processor, memory, a bit-mapped display, and in some cases a math
co-processor and local disk storage. Athena and Andrew are examples of this workstation model.
In the processor pool model, the ratio of the number of processors to the number of users is
normally greater than one. This model attempts to allocate one or more processors according to
a user's needs. Once the processors assigned to a user complete their tasks, they return to the
pool and await a new assignment. Amoeba is an experimental system that is a combination of
the workstation and the processor pool models. In Amoeba, each user has a workstation
where the user performs tasks that require a quick interactive response (such as editing). In
addition to the workstation, users have access to a pool of processors for running applications
that require greater speed (such as parallel algorithms performing significant numerical
computations)
Ans- In Lamport's algorithm, requests for the CS are executed in increasing order of
timestamps, and time is determined by logical clocks. Every site Si keeps a queue, request
queuei, which contains mutual exclusion requests ordered by their timestamps. This
algorithm requires communication channels to deliver messages in FIFO order.
The Algorithm
When a site Si wants to enter the CS, it broadcasts a REQUEST(tsi , i ) message to all
other sites and places the request on request queuei . ((tsi , i ) denotes the timestamp of
the request.)
NCS 701 Solution_2
When a site Sj receives the REQUEST(tsi , i ) message from site Si , it places site Si 's
request on request queuej and returns a timestamped REPLY message to Si .
Site Si enters the CS when the following two conditions hold:
L1: Si has received a message with timestamp larger than (tsi , i ) from all other sites.
L2: Si 's own request is at the top of request queuei .
Site Si , upon exiting the CS, removes its request from the top of its request queue and
broadcasts a timestamped RELEASE message to all other sites.
When a site Sj receives a RELEASE message from site Si, it removes Si 's request from
its request queue.
When a site removes a request from its request queue, its own request may come to the
top of the queue, enabling it to enter the CS.
An optimization
In Lamport's algorithm, REPLY messages can be omitted in certain situations. For example,
if site Sj receives a REQUEST message from site Si after it has sent its own REQUEST
message with timestamp higher than the timestamp of site Si 's request, then site Sj need not
send a REPLY message to site Si. This is because when site Si receives site Sj 's request with
timestamp higher than its own, it can conclude that site Sj does not have any smaller-timestamp
request which is still pending. With this optimization, Lamport's algorithm
requires between 2(N − 1) and 3(N − 1) messages per CS execution.
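The per-site state and the L1/L2 entry check can be sketched as follows. This is a simplified, single-address-space simulation: message transport is abstracted away, and the class and method names are illustrative assumptions rather than a full implementation.

```python
import heapq

class LamportSite:
    """One site in a sketch of Lamport's mutual exclusion algorithm."""
    def __init__(self, site_id, all_sites):
        self.id = site_id
        self.clock = 0
        self.others = [s for s in all_sites if s != site_id]
        self.queue = []       # request queue ordered by (timestamp, site id)
        self.replies = set()  # sites from which a larger-timestamped message arrived

    def request_cs(self):
        # Broadcast REQUEST(ts, i) and put the request on the local queue
        self.clock += 1
        self.my_ts = (self.clock, self.id)
        heapq.heappush(self.queue, self.my_ts)
        return [("REQUEST", self.my_ts, j) for j in self.others]

    def on_request(self, ts):
        # Queue the incoming request and answer with a timestamped REPLY
        self.clock = max(self.clock, ts[0]) + 1
        heapq.heappush(self.queue, ts)
        return ("REPLY", (self.clock, self.id), ts[1])

    def on_message_from(self, sender, ts):
        # L1 bookkeeping: a message with a timestamp larger than ours arrived
        if ts > self.my_ts:
            self.replies.add(sender)

    def can_enter_cs(self):
        # L1: larger timestamps seen from all sites; L2: our request heads the queue
        return len(self.replies) == len(self.others) and self.queue[0] == self.my_ts

# Two-site walk-through: site 0 requests the CS, site 1 replies
sites = [0, 1]
s0, s1 = LamportSite(0, sites), LamportSite(1, sites)
s0.request_cs()
_, reply_ts, _ = s1.on_request(s0.my_ts)
s0.on_message_from(1, reply_ts)
print(s0.can_enter_cs())  # True
```

The tuple ordering (timestamp, site id) breaks timestamp ties deterministically, which is what makes the queue a total order.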
Q-2(c). What are the various modes of processor failure? What are agreement and
validity objectives of The Interactive Consistency Problem?
PROCESS DEATHS- When a process dies, it is important that the resources allocated to
that process are recouped, otherwise they may be permanently lost. Many distributed systems
are structured along the client-server model in which a client requests a service by sending a
message to a server. If the server process fails, it is necessary that the client machine be
informed so that the client process, waiting for a reply can be unblocked to take suitable
action. Likewise, if a client process dies after sending a request to a server, it is imperative
that the server be informed that the client process no longer exists. This will facilitate the
server in reclaiming any resources it has allocated to the client process.
MACHINE FAILURE- In the case of a machine failure, all the processes running at the
machine will die. As far as the behavior of a client process or a server process is concerned,
there is not much difference between a machine failure and a process death. The only
difference lies in how the failure is detected. In the case of a process death, other processes,
including the kernel, remain active; hence, a message stating that the process has died can be
sent to an inquiring process. In the case of a machine failure, on the other hand, no message
of any kind arrives, so an inquiring process cannot tell whether only the process died or the
entire machine failed.
NETWORK FAILURE- A communication link failure can partition a network into subnets,
making it impossible for a machine to communicate with another machine in a different
subnet. A process cannot really tell the difference between a machine failure and a
communication link failure, unless the underlying communication network (such as a slotted
ring network) can recognize a machine failure. If the communication network cannot
recognize machine failures and thus cannot return a suitable error code (as is the case with
Ethernet), a fault-tolerant design will have to assume that the machine may be operating and
that processes on that machine are active.
Recovery is complicated by the fact that a failed replica manager is a member of a collection
and that the other members continue to provide a service during the time that it is
unavailable. When a replica manager recovers from a failure, it uses information obtained
from the other replica mangers to restore its objects to their current values, taking into
account all the changes that have occurred during the time it was unavailable.
SECTION C
Ans- Resource management in distributed operating systems is concerned with making both
local and remote resources available to users in an effective manner. Users of a distributed
operating system should be able to access remote resources as easily as they can access local
resources. In other words, the specific location of resources should be hidden from the users.
The resources of a distributed system are made available to users in the following ways:
data migration
computation migration
distributed scheduling
Ans- The vector clocks algorithm was independently developed by Colin Fidge and
Friedemann Mattern in 1988.
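A minimal sketch of the standard vector clock update rules (tick the local component on every event, take the componentwise maximum on receive); the class and function names are illustrative:

```python
class VectorClock:
    """Sketch of a Fidge/Mattern vector clock for n processes."""
    def __init__(self, n, pid):
        self.v = [0] * n
        self.pid = pid

    def local_event(self):
        self.v[self.pid] += 1  # rule 1: increment own component

    def send(self):
        self.local_event()
        return list(self.v)    # timestamp piggybacked on the outgoing message

    def receive(self, ts):
        # rule 2: componentwise maximum, then tick own component
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.v[self.pid] += 1

def happened_before(u, v):
    # u -> v iff u <= v componentwise and u != v
    return all(a <= b for a, b in zip(u, v)) and u != v

# p0 sends a message to p1; the send causally precedes the receive
p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
ts = p0.send()   # p0's clock becomes [1, 0]
p1.receive(ts)   # p1's clock becomes [1, 1]
print(happened_before(ts, p1.v))  # True
```

Unlike Lamport clocks, comparing two vector timestamps this way fully characterizes the happened-before relation.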
System Model
Q-4(b) What are the various performance metrics used for distributed mutual exclusion
algorithms?
Synchronization delay: The time that elapses after a site leaves the CS and before the next
site enters the CS.
System throughput: The rate at which the system executes requests for the CS.
System throughput = 1 / (SD + E)
Where SD is the synchronization delay and E is the average critical section execution time.
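For illustration, with assumed values of SD = 2 ms and E = 8 ms:

```python
# Illustrative (assumed) values: synchronization delay and CS execution time
SD = 0.002  # seconds
E = 0.008   # seconds

throughput = 1 / (SD + E)  # CS requests executed per second
print(throughput)          # about 100 requests per second
```

Reducing the synchronization delay (for instance, via the REPLY-omitting optimization of Lamport's algorithm) directly raises the achievable throughput.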
Low and High Load Performance: We often study the performance of mutual exclusion
algorithms under two special loading conditions, viz., “low load” and “high load”. The load is
determined by the arrival rate of CS execution requests. Under low load conditions, there is
seldom more than one request for the critical section present in the system simultaneously.
Under heavy load conditions, there is always a pending request for critical section at a site.
Q-4 (c) What are the correctness criteria for distributed deadlock detection algorithm?
Describe edge chasing algorithm.
Ans- Correctness Criteria: A deadlock detection algorithm must satisfy the following two
conditions:
1. Progress (No undetected deadlocks):- The algorithm must detect all existing
deadlocks in finite time. In other words, after all wait-for dependencies for a deadlock
have formed, the algorithm should not wait for any more events to occur to detect the
deadlock.
2. Safety (No false deadlocks):- The algorithm should not report deadlocks which do
not exist (called phantom or false deadlocks).
Edge-Chasing Algorithm- In an edge-chasing algorithm, the presence of a cycle in a
distributed graph structure is verified by propagating special messages called probes
along the edges of the graph. These probe messages are different from the request and reply
messages. The formation of a cycle can be detected by a site if it receives the matching probe
it sent previously. Whenever an executing process receives a probe message, it discards the
message and continues. Only blocked processes propagate probe messages along their
outgoing edges. The main advantage of edge-chasing algorithms is that probes are fixed-size
messages which are normally very short.
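The probe-propagation idea can be sketched as follows. This is a simplified, centralized simulation of a Chandy-Misra-Haas-style probe scheme; the data structures and function name are illustrative assumptions, and real probes would travel as messages between sites.

```python
def edge_chasing_detect(initiator, wait_for, executing):
    """Return True if the initiator is involved in a deadlock cycle.
    wait_for: dict mapping a blocked process to the processes it waits on.
    executing: set of processes that are running (they discard probes)."""
    # The initiator sends a probe along each of its outgoing wait-for edges
    probes = [(initiator, initiator, k) for k in wait_for.get(initiator, [])]
    forwarded = set()
    while probes:
        init, sender, recv = probes.pop()
        if recv in executing:
            continue                 # an executing process discards the probe
        if recv == init:
            return True              # the initiator got its own probe back: cycle
        if (sender, recv) in forwarded:
            continue                 # avoid re-forwarding along the same edge
        forwarded.add((sender, recv))
        # A blocked process propagates the probe along its outgoing edges
        probes.extend((init, recv, nxt) for nxt in wait_for.get(recv, []))
    return False

# P1 waits for P2, P2 waits for P3, P3 waits for P1: a deadlock cycle
wf = {1: [2], 2: [3], 3: [1]}
print(edge_chasing_detect(1, wf, executing=set()))  # True
```

If P3 were executing rather than blocked, it would discard the probe and no deadlock would be reported.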
Q-5(a) What do you mean by Distributed Shared Memory? What are its advantages?
With DSM, programs access data in the shared address space just as they access data in
traditional virtual memory. In systems that support DSM, data moves between secondary
memory and main memory as well as between main memories of different nodes. Each node
can own data stored in the shared address space, and the ownership can change when data
moves from one node to another. When a process accesses data in the shared address space, a
mapping manager maps the shared memory address to the physical memory (which can be
local or remote).
1. In the message passing model, programs make shared data available through explicit
message passing. In other words, programmers need to be conscious of the data movement
between processes. Programmers have to explicitly use communication primitives (such as
SEND and RECEIVE), a task that places a significant burden on them. In contrast, DSM
systems hide this explicit data movement and provide a simpler abstraction for sharing data
that programmers are already well versed with. Hence, it is easier to design and write parallel
algorithms using DSM rather than through explicit message passing.
3. By moving the entire block or page containing the data referenced to the site of reference
instead of moving only the specific piece of data referenced, DSM takes advantage of the
locality of reference exhibited by programs and thereby cuts down on the overhead of
communicating over the network.
4. DSM systems are cheaper to build than tightly coupled multiprocessor systems. This is
because DSM systems can be built using off-the-shelf hardware and do not require complex
interfaces to connect the shared memory to the processors.
5. The physical memory available at all the nodes of a DSM system combined together is
enormous. This large memory can be used to efficiently run programs that require large
memory without incurring disk latency due to swapping in traditional distributed systems.
This fact is also favored by anticipated increases in processor speed relative to memory speed
and the advent of very fast networks.
Q-5(b) Explain the architecture of distributed file system. Discuss name resolution and
writing policy in distributed file system.
Ans- In a distributed file system, files can be stored at any machine (or computer) and the
computation can be performed at any machine (i.e., the machines are peers). When a
machine needs to access a file stored on a remote machine, the remote machine performs the
necessary file access operations and returns data if a read operation is performed. However,
for higher performance, several machines, referred to as file servers, are dedicated to storing
files and performing storage and retrieval operations. The rest of the machines in the system
can be used solely for computational purposes. These machines are referred to as clients and
they access the files stored on servers. Some client machines may also be equipped with a
local disk storage that can be used for caching remote files, as a swap area, or as a storage
area.
Naming and Name resolution- A name in file systems is associated with an object (such as
a file or a directory). Name resolution refers to the process of mapping a name to an object or,
in the case of replication, to multiple objects. A name space is a collection of names which
may or may not share an identical resolution mechanism.
Writing Policy- The writing policy decides when a modified cache block at a client should
be transferred to the server. The simplest policy is write-through. In write-through, all writes
requested by the applications at clients are also carried out at the servers immediately. The
main advantage of write-through is reliability. In the event of a client crash, little information
is lost. A write-through policy, however, does not take advantage of the cache.
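The write-through policy can be sketched as follows; this is a toy model in which a dict stands in for the file server, and the class and variable names are illustrative assumptions:

```python
class WriteThroughCache:
    """Sketch of a client cache using the write-through policy:
    every write is applied at the server immediately."""
    def __init__(self, server_store):
        self.server = server_store  # dict standing in for the file server
        self.cache = {}

    def read(self, key):
        if key not in self.cache:            # cache miss: fetch from server
            self.cache[key] = self.server[key]
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value
        self.server[key] = value             # propagate immediately (write-through)

server = {"f": "v0"}
client = WriteThroughCache(server)
client.write("f", "v1")
print(server["f"])  # "v1": the server saw the write immediately
```

Because the server copy is always current, a client crash after `write` returns loses nothing, which is exactly the reliability advantage (and the lost caching benefit) described above.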
If process X is to be rolled back, it can be rolled back to the recovery point x3 without
affecting any other process. Suppose that Y fails after sending message m and is rolled back
to y2. In this case, the receipt of m is recorded in x3, but the sending of m is not recorded in
y2. Now we have a situation where X has received message m from Y, but Y has no record of
sending it, which corresponds to an inconsistent state.
Under such circumstances, m is referred to as an orphan message and process X must also
roll back. X must roll back because Y interacted with X after establishing its recovery point
y2. When Y is rolled back to y2, the event that is responsible for the interaction is undone.
Therefore, all the effects at X caused by the interaction must also be undone. This can be
achieved by rolling back X to recovery point x2. Likewise, it can be seen that, if Z is rolled
back, all three processes must roll back to their very first recovery points, namely, x1, y1,
and z1. This effect, where rolling back one process causes one or more other processes to roll
back, is known as the domino effect, and orphan messages are its cause.
Ans-
Static voting
Dynamic voting
System Model- The replicas of files are stored at different sites. Every file access operation
requires that an appropriate lock is obtained. The lock granting rules allow either 'one writer
and no readers' or 'multiple readers and no writers' simultaneously. It is assumed that at
every site there is a lock manager that handles the lock-related operations, and every file is
associated with a version number, which gives the number of times the file has been updated.
The version numbers are stored on stable storage, and every successful write operation on a
replica updates its version number.
VOTE ASSIGNMENT- Let v be the total number of votes assigned to all the copies. The
values for r (the read quorum) and w (the write quorum) are selected such that
r + w > v and 2w > v
The values selected for r and w, combined with the fact that write operations update only the
current copies, guarantee that every read quorum intersects every write quorum (so a read
always sees at least one current copy) and that any two write quorums intersect (so writes
cannot proceed concurrently on disjoint sets of copies).
Ans- Distributed Deadlock- With deadlock detection schemes, a transaction is aborted only
when it is involved in a deadlock. Most deadlock detection schemes operate by finding cycles
in the transaction wait-for graph. In a distributed system involving multiple servers being
accessed by multiple transactions, a global wait-for graph can in theory be constructed from
the local ones. There can be a cycle in the global wait-for graph that is not in any single local
one – that is, there can be a distributed deadlock. Recall that the wait-for graph is a directed
graph in which nodes represent transactions and objects, and edges represent either an object
held by a transaction or a transaction waiting for an object. There is a deadlock if and only if
there is a cycle in the wait-for graph.
Above figure shows the interleavings of the transactions U, V and W involving the objects A
and B managed by servers X and Y and objects C and D managed by server Z.
Detection of a distributed deadlock requires a cycle to be found in the global transaction wait-
for graph that is distributed among the servers that were involved in the transactions. Local
wait-for graphs can be built by the lock manager at each server. In the above example, the
local wait-for graphs of the servers are:
As the global wait-for graph is held in part by each of the several servers involved,
communication between these servers is required to find cycles in the graph.
The complete wait-for graph in figure (a) shows that a deadlock cycle consists of alternate
edges, which represent a transaction waiting for an object and an object held by a transaction.
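Once the wait-for graph fragments have been gathered (or probed, as the servers' communication above requires), cycle detection itself is a standard depth-first search. A minimal centralized sketch, with illustrative names:

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as an adjacency dict."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / finished
    color = {n: WHITE for n in wait_for}

    def dfs(n):
        color[n] = GREY
        for m in wait_for.get(n, []):
            if color.get(m, WHITE) == GREY:
                return True         # back edge to the current path: cycle found
            if color.get(m, WHITE) == WHITE and m in wait_for and dfs(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in wait_for)

# U waits for V, V waits for W, W waits for U: a distributed deadlock
print(has_cycle({"U": ["V"], "V": ["W"], "W": ["U"]}))  # True
print(has_cycle({"U": ["V"], "V": ["W"]}))              # False
```

In the transaction/object wait-for graph described above, the nodes would alternate between transactions and objects, but the cycle test is the same.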
Ans- Replication is a technique for enhancing services. The motivations for replication
include:
Performance enhancement: The caching of data at clients and servers is by now familiar as
a means of performance enhancement. Browsers and proxy servers cache copies of web
resources to avoid the latency of fetching resources from the originating server. Furthermore,
data are sometimes replicated transparently between several originating servers in the same
domain.
Increased availability: Users require services to be highly available. That is, the proportion
of time for which a service is accessible with reasonable response times should be close to
100%. Apart from delays due to pessimistic concurrency control conflicts (due to data
locking), the factors that are relevant to high availability are:
• Server failures;
• Network partitions and disconnected operation (communication disconnections that
are often unplanned and are a side effect of user mobility).
Fault tolerance: Highly available data is not necessarily strictly correct data. It may be out of
date, for example; or two users on opposite sides of a network partition may make updates
that conflict and need to be resolved. A fault-tolerant service, by contrast, always guarantees
strictly correct behaviour despite a certain number and type of faults. The correctness
concerns the freshness of data supplied to the client and the effects of the client's operations
upon the data. Correctness sometimes also concerns the timeliness of the service's responses
– such as, for example, in the case of a system for air traffic control, where correct data are
needed on short timescales.
System model-
Group Communication- Systems that can adapt as processes join, leave and crash – fault-
tolerant systems, in particular – require the more advanced features of failure detection and
notification of membership changes. A full group membership service maintains group views,
which are lists of the current group members, identified by their unique process identifiers.
The list is ordered, for example, according to the sequence in which the members joined the
group. A new group view is generated each time that a process is added or excluded. It is
important to understand that a group membership service may exclude a process from a group
because it is suspected, even though it may not have crashed. A communication failure may
have made the process unreachable, while it continues to execute normally. A membership
service is always free to exclude such a process. The effect of exclusion is that no messages
will be delivered to that process henceforth. Moreover, in the case of a closed group, if that
process becomes connected again, any messages it attempts to send will not be delivered to
the group members. That process will have to rejoin the group (as a 'reincarnation' of itself,
with a new identifier), or abort its operations.
A false suspicion of a process and the consequent exclusion of the process from the group
may reduce the group's effectiveness. The group has to manage without the extra reliability
or performance that the withdrawn process could potentially have provided. The design
challenge, apart from designing failure detectors to be as accurate as possible, is to ensure
that a system based on group communication does not behave incorrectly if a process is
falsely suspected. An important consideration is how a group management service treats
network partitions. Disconnection or the failure of components such as a router in a network
may split a group of processes into two or more subgroups, with communication between the
subgroups impossible. Group management services differ in whether they are primary
partition or partitionable. In the first case, the management service allows at most one
subgroup (a majority) to survive a partition; the remaining processes are informed that they
should suspend operations. This arrangement is appropriate for cases where the processes
manage important data and the costs of inconsistencies between two or more subgroups
outweigh any advantage of disconnected working.