Outline
Introduction
Concepts
Collective parallel I/O algorithms
Collective buffering experiments
Conclusion
Question
Introduction
Existing parallel I/O systems evolved
directly from I/O systems for serial
machines
Serial I/O systems are heavily tuned for:
Sequential, large accesses, limited file
sharing between processes
High degree of both spatial and temporal
locality
Introduction (cont.)
This paper presents a set of algorithms
known as Collective Buffering algorithms
These algorithms seek to improve I/O
performance on distributed memory
machines by utilizing global knowledge of
the I/O operations
Concepts
Global data structure
Global data structure is the logical view of the
data from the application's point of view
Scientific applications generally use global
data structures consisting of arrays distributed
in one, two, or three dimensions
Concepts (cont.)
Data distribution
The global data structure is distributed among
node memories by cutting it into data chunks.
The HPF BLOCK distribution partitions the
global data structure into P equally sized
pieces
The HPF CYCLIC divides the global data
structure into small pieces (by distribution size
or block size) and deals these pieces out to
the P nodes in a round-robin fashion
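The two distributions above can be sketched as index-to-node mappings. This is a minimal illustration for a one-dimensional array, not code from the paper: the function names block_owner and cyclic_owner and the parameters n (global size), p (node count), and b (block size) are hypothetical, and BLOCK is assumed to use a ceiling-sized chunk when n is not divisible by p.

```python
def block_owner(i, n, p):
    """Node that owns global index i under a BLOCK distribution.

    Assumption: each node gets one contiguous piece of ceil(n / p) elements.
    """
    chunk = -(-n // p)  # ceiling division
    return i // chunk


def cyclic_owner(i, p, b=1):
    """Node that owns global index i under a CYCLIC distribution.

    Pieces of b elements are dealt to the p nodes in round-robin order.
    """
    return (i // b) % p


if __name__ == "__main__":
    n, p = 8, 4
    print([block_owner(i, n, p) for i in range(n)])        # BLOCK
    print([cyclic_owner(i, p) for i in range(n)])          # CYCLIC, block size 1
    print([cyclic_owner(i, p, b=2) for i in range(n)])     # CYCLIC, block size 2
```

With 8 elements on 4 nodes, BLOCK gives each node two neighboring elements, while CYCLIC with block size 1 spreads neighboring elements across all four nodes.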
Concepts (cont.)
File layout
File layout is another form of data distribution
The file represents a linearization of the global
data structures, such as the row-major
ordering of a three-dimensional array
This linearization is called the canonical file
The file is distributed among the I/O nodes
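The row-major linearization can be sketched as a byte-offset calculation. This is an illustrative sketch only: the names row_major_offset and io_node_for_offset are hypothetical, the 8-byte element size is an assumption, and the round-robin striping of the file over I/O nodes is assumed for the example rather than taken from the paper.

```python
def row_major_offset(index, shape, elem_size=8):
    """Byte offset of element (i, j, k) in a row-major file of a 3-D array.

    Assumption: fixed-size elements of elem_size bytes, no file header.
    """
    i, j, k = index
    _, nj, nk = shape
    return ((i * nj + j) * nk + k) * elem_size


def io_node_for_offset(offset, stripe_size, num_io_nodes):
    """I/O node holding a byte offset, assuming round-robin striping."""
    return (offset // stripe_size) % num_io_nodes


if __name__ == "__main__":
    shape = (2, 3, 4)
    off = row_major_offset((1, 2, 3), shape)
    print(off, io_node_for_offset(off, stripe_size=64, num_io_nodes=4))
```

The last index varies fastest, so elements that are adjacent along the first dimension end up far apart in the file.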
Conclusion
Collective buffering significantly improves
on naive parallel I/O performance, by two
orders of magnitude for small data block
sizes
Peak performance can be obtained with
minimal buffer space (approximately 1
megabyte per I/O node)
Performance depends on the intermediate
distribution (by up to a factor of 2)
Conclusion (cont.)
There is no single intermediate distribution
which provides the best performance for
all cases, but a few come close
Collective buffering with scatter/gather can
potentially deliver peak performance for all
data block sizes.
Question
What are the advantages and
disadvantages of the naive algorithm?
What is collective buffering, and how may
this technique improve parallel I/O
performance?