Professional Documents
Culture Documents
3033-012
Graphics Processing Units (GPUs):
Architecture and Programming
Lecture 1: Introduction
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com
Who Am I?
Mohamed Zahran (aka Z)
Computer architecture/OS/Compilers
Interaction
http://www.mzahran.com
Office hours: Wed 5:00-7:00 pm
Room: WWH 328
Course web page:
http://cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/index.html
Why GPUs
GPU Architecture
GPU-CPU Interaction
GPU programming model
Solving real-life problems using GPUs
The Textbook
Programming Massively Parallel
Processors: A Hands-on Approach
By
David B. Kirk & Wen-mei W. Hwu
Grading
Homework assignments
Project
Programming assignments
Final
: 10%
: 20%
: 30%
: 40%
Computer History
Eckert and Mauchly
Computer History
Maurice Wilkes
EDSAC 1 (1949)
http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99/
2,250 transistors
12 mm2
108 KHz
29,000 transistors
33 mm2
5 MHz
Introduced in 1979
Basic architecture
of the IA32 PC
Pentium III
9,500,000
transistors
125 mm2
450 MHz
Introduced in 1999
http://www.intel.com/intel/museum/25anniv/hof/hof_main.htm
Pentium 4
55,000,000
transistors
146 mm2
3 GHz
Introduced in 2000
http://www.chip-architect.com
IBM Power 7
Montecito
Cell Processor
Niagara
(SUN UltraSparc T2)
Hardware Improvement
Positive Cycle
of Computer
Industry
Better Software
The Status-Quo
We moved from single core to multicore
for technological reasons
Many-core GPU
Multi-core CPU
ALU
ALU
ALU
ALU
Control
CPU
GPU
Cache
DRAM
DRAM
ALU
ALU
ALU
ALU
Control
CPU
GPU
Cache
DRAM
DRAM
ALU
ALU
ALU
ALU
Control
CPU
GPU
Cache
DRAM
DRAM
Host
Input Assembler
Thread Execution Manager
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Load/store
Load/store
Load/store
Load/store
Global Memory
Load/store
Load/store
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Load/store
Load/store
Load/store
Load/store
Global Memory
Load/store
Load/store
Host
Input Assembler
Thread Execution Manager
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Load/store
Load/store
Load/store
Load/store
Global Memory
Load/store
Load/store
Host
Input Assembler
Thread Execution Manager
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Parallel Data
Cache
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Texture
Load/store
Load/store
Load/store
Load/store
Global Memory
Load/store
Load/store
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )
Example: