
GPU Cluster Computing with NVIDIA Jetson TK1 Boards

Ryan Sudhakaran1
Mentor: Weile Wang2
1San Jose State University, 2NASA Ames Research Center

National Aeronautics and Space Administration

San Jose State University CAARE Program


Abstract
The use of GPUs (graphics processing units) for HPC (high-performance computing)
applications is becoming increasingly popular among researchers in the scientific
community. GPUs are designed for efficient graphics rendering, but they are also
optimized for parallel workloads and can be incredibly useful in fields such as
computational fluid dynamics and machine learning. The NVIDIA Jetson TK1
development kit is a miniature computing device built on the NVIDIA Kepler
architecture, which is very efficient at parallel processing. The goal of this project
is to cluster these boards into a small-scale platform for testing future HPC
applications for educational and proof-of-concept purposes.

Background
How does a parallel computation differ from a traditional computation? Imagine a
program with four independent tasks to compute. A serial process performs tasks 1
through 4 sequentially, while a parallel process can compute all four tasks
simultaneously (given four distinct processors). It is fairly intuitive that a
parallel processor allows much faster computation when the number of tasks is
large. Unfortunately, parallelization is not a perfect solution to every
computational problem; it is best suited to problems that can be broken down into
many simple, independent operations (such as large linear-algebra solvers or
sorting algorithms).

Figure 1: A serial processing scheme [1]

Figure 2: A parallel processing scheme [1]
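
To make the serial-vs-parallel contrast in Figures 1 and 2 concrete, here is a
minimal C/OpenMP sketch (illustrative only, not part of the poster's code; the
task body is a hypothetical placeholder) that times four independent tasks run
one after another and then concurrently:

```c
/* Serial vs. parallel execution of four independent tasks.
   Compile with: gcc -O2 -fopenmp tasks.c -o tasks */
#include <stdio.h>
#include <omp.h>

/* Stand-in for an independent unit of work (hypothetical). */
static double task(int id)
{
    double sum = 0.0;
    for (int i = 1; i <= 10000000; i++)
        sum += (double)id / (double)i;
    return sum;
}

int main(void)
{
    double r[5] = {0};

    double t0 = omp_get_wtime();
    for (int id = 1; id <= 4; id++)   /* serial: tasks 1-4 in sequence */
        r[id] = task(id);
    double t_serial = omp_get_wtime() - t0;

    t0 = omp_get_wtime();
    #pragma omp parallel for          /* parallel: one task per thread */
    for (int id = 1; id <= 4; id++)
        r[id] = task(id);
    double t_parallel = omp_get_wtime() - t0;

    printf("check %.3f  serial %.3fs  parallel %.3fs\n",
           r[1] + r[2] + r[3] + r[4], t_serial, t_parallel);
    return 0;
}
```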


Tests and Results

To test whether the cluster performed more efficiently than a single node, I used
code provided by John Burkardt [2] that parallelizes the counting of prime numbers
from 1 to N (with N ranging from 1 to 262,144). With parallelization via MPI
(Message Passing Interface), the range can be broken up and distributed among the
nodes, allowing for simultaneous counting. Presented are the results of the
parallel prime program run on a single Jetson vs. the cluster with a varying
number of processes.
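
A minimal sketch of this decomposition (the strided work split and reduction
mirror the approach of Burkardt's prime_mpi, but this is not his exact code, and
the trial-division primality test is a simplification):

```c
/* MPI-parallel prime counting: each rank tests a strided subset of
   candidates, and partial counts are combined with MPI_Reduce. */
#include <mpi.h>
#include <stdio.h>

/* Trial-division primality test; adequate for small N. */
static int is_prime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

int main(int argc, char *argv[])
{
    int rank, size, n = 262144;   /* largest N used in the tests */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Cyclic distribution: rank r tests 2+r, 2+r+size, 2+r+2*size, ... */
    int local = 0;
    for (int i = 2 + rank; i <= n; i += size)
        local += is_prime(i);

    int total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("primes up to %d: %d\n", n, total);

    MPI_Finalize();
    return 0;
}
```

On the cluster this would be compiled with mpicc and launched with something like
mpirun -np 8 --hostfile hosts ./prime_mpi, where hosts (a hypothetical filename)
lists the Jetson nodes.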



Note that both the multi-node and single-node implementations become more
efficient at the jump from 4 to 8 processes, but the single node becomes less
efficient as the process count grows further. The jump from 16 to 32 processes
has little effect on the runtime.
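
One plausible reading of this plateau (a standard scaling argument, not a
measurement from this work) is Amdahl's law: if a fraction f of the program
parallelizes perfectly and the remainder is serial or communication overhead,
the speedup on p processes is

\[
S(p) = \frac{1}{(1 - f) + f/p},
\]

which approaches 1/(1 - f) as p grows. Once the per-process work becomes small
relative to MPI communication costs, adding processes (e.g., going from 16 to
32) yields little additional speedup.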

Future Improvements
- Develop a simple fluid dynamics simulation or atmospheric model utilizing both GPU and cluster parallelization (CUDA-Aware MPI)
- Test various cluster-based algorithms for machine learning
- Collect data on the most efficient load balancing and optimize workload distribution among nodes and cores

References
[1] https://computing.llnl.gov/tutorials/parallel_comp/
[2] https://people.sc.fsu.edu/~jburkardt/c_src/prime_mpi/prime_mpi.html

Email: ryan.sudhakaran@sjsu.edu
Support provided by the NASA Office of Education's Minority University Research and Education Project,
Contract #NNX15AQ02A
