
CSCI 315: Artificial Intelligence through Deep Learning

W&L Winter Term 2017


Prof. Levy

Introduction to Deep Learning with TensorFlow
Why TensorFlow (vs. just NumPy)?
• Recall two main essentials: the dot product and the activation-function derivative.
• Dot product:

  $net_j = \sum_{i=0}^{n} a_i w_{ij}, \quad a_0 \equiv 1$

  hnet = np.dot(np.append(Ij, 1), self.wih)
• “Embarrassingly parallel”: since each unit j has its own incoming weights, net_j can be computed independently from / simultaneously with all other units in its layer.
• On an ordinary computer, we (NumPy dot) must compute one net_j after another, sequentially.
Ordinary dot product computation for a layer

[Figure: units computing their net inputs one at a time (“First me! Then me!”)]

Exploiting Parallelism

[Figure: all units computing their net inputs at once (“All together now!”)]


GPU to the Rescue!

• Graphics Processing Unit: designed for videogames, to exploit the parallelism in pixel-level updates.
• NVIDIA offers the CUDA API for programmers, but it's wicked hard: you need to keep track of the locations of values in memory.
• TensorFlow exploits GPU / CUDA if they're available.
GPU: A Multi-threaded Architecture

• A traditional architecture has one processor, one memory, and one process at a time:

[Figure: a single CPU connected to Memory through the “Von Neumann Bottleneck”]
http://web.eecs.utk.edu/~plank/plank/classes/cs360/360/notes/Memory/lecture.html
• A distributed architecture (e.g., a Beowulf cluster) has several processors, each with its own memory.
• Communication among processors uses message-passing (e.g., MPI).

[Figure: several CPUs, each with its own Memory, linked by a Connecting Network]

• A shared memory architecture allows several processes to access the same memory, either from a single CPU or several CPUs.
• Typically, a single process launches several “lightweight processes” called threads, which all share the same heap and global memory, with each having its own stack.
• Ideally, each thread runs on its own processor (“core”).

[Figure: Core 1, Core 2, …, Core n sharing Memory (Heap / Globals). Examples: NVIDIA Jetson TK1, 192 cores; NVIDIA GeForce GTX 1080Ti, 3584 cores]
Python vs. NumPy vs. TensorFlow
• Dot product in “naive” Python:
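For instance, a naive loop might look like this (a minimal sketch; the names a, b, and c match the next bullet):

    def naive_dot(a, b):
        # Multiply corresponding elements and accumulate, one pair at a time
        c = 0
        for k in range(len(a)):
            c += a[k] * b[k]
        return c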

• This will be slow, because the interpreter is executing the loop code c += a[k] * b[k] over and over.
• Some speedup is likely once the interpreter has compiled your code into a .pyc (bytecode) file.
Python vs. NumPy vs. TensorFlow
• Dot in NumPy: c = np.dot(a, b)
• “Under the hood”: your arrays a and b are passed to a pre-compiled C program that computes the dot product, typically much faster than you would get with your own code.
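A rough timing comparison (an illustrative sketch, not the slide's original code; the array size is arbitrary):

    import time
    import numpy as np

    n = 1000000
    a = np.random.rand(n)
    b = np.random.rand(n)

    t0 = time.perf_counter()
    c = 0.0
    for k in range(n):               # the naive interpreted loop
        c += a[k] * b[k]
    t1 = time.perf_counter()

    c2 = np.dot(a, b)                # the pre-compiled C version
    t2 = time.perf_counter()

    print(t1 - t0, t2 - t1)          # np.dot is typically orders of magnitude faster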

• Hence, TensorFlow will require us to specify info about types and memory in order to exploit the GPU.
Why TensorFlow (vs. just NumPy)?
• Recall two main essentials: the dot product and the activation-function derivative.
• Activation function derivative:

  Logistic sigmoid:  $f(x) = \frac{1}{1+e^{-x}} \qquad f'(x) = \frac{df(x)}{dx} = \frac{e^x}{(1+e^x)^2} = f(x)\,(1-f(x))$

  Hyperbolic tangent:  $f(x) = \tanh(x) \qquad f'(x) = \mathrm{sech}^2(x)$

  Softmax:  $y_i = f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} \qquad \frac{\partial y_i}{\partial x_j} = y_i(1-y_i) \text{ if } i = j, \; -y_i y_j \text{ if } i \neq j$

• This is called symbolic differentiation, and it requires us to use our calculus, or a special computation tool, case by case. TensorFlow will automate this for us!
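For instance, the sigmoid identity above is easy to check numerically (a minimal sketch in NumPy; a central difference stands in for the true derivative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-2.0, 0.0, 2.0])
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
    analytic = sigmoid(x) * (1 - sigmoid(x))                # f(x)(1 - f(x))
    print(np.allclose(numeric, analytic))                   # True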
Tensor + Flow = TensorFlow
• Scalar: a single number (rank = zero)
• Vector: a sequence of numbers (rank = one)
• Matrix: a rectangular array of numbers (rank = two)
• Tensor: any rank

https://www.mathworks.com/help/matlab/math/ch_data_struct5.gif
Rank as Bracket Count
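A sketch of the idea in NumPy (counting levels of brackets gives the rank):

    import numpy as np

    print(np.array(5.0).ndim)               # no brackets  -> rank 0 (scalar)
    print(np.array([1.0, 2.0]).ndim)        # one level    -> rank 1 (vector)
    print(np.array([[1.0], [2.0]]).ndim)    # two levels   -> rank 2 (matrix)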
TensorFlow: First Program
Line-by-line analysis

• Our usual import-and-abbreviate (cf. import numpy as np):
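That is:

    import tensorflow as tf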
• These are the parameters (weights and biases) of our familiar neural-net layer.
• 28x28 = 784 pixels for the input image; 10 possible digits at the output.
• Like NumPy, TensorFlow provides some useful generator functions (random_uniform, zeros) that we can call directly.
• What’s really new is the Variable object: this is the component from which we will build our networks.
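A sketch of what the parameter declarations plausibly look like (TensorFlow 1.x API; the names W and b and the uniform range [-1, 1) are assumptions based on the bullets above):

    W = tf.Variable(tf.random_uniform([784, 10], -1, 1), name="W")   # 784 inputs x 10 outputs
    b = tf.Variable(tf.zeros([10]), name="b")                        # one bias per output digit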
• For input data, TensorFlow requires a special kind of object called a placeholder.
• Note the mandatory data type (32-bit float): essential for the GPU and related high-performance tricks!
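A sketch of the placeholder declaration (the name x is an assumption; a shape could also be supplied):

    x = tf.placeholder(tf.float32, name="x")   # input data arrives here at run time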
• So matmul is pretty clearly the TensorFlow equivalent of NumPy dot.
• Unlike dot, however, matmul does not return an immediate result; instead, it gives us the ability to compute a result, in a Session (up next).
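A sketch of the layer itself (calling the output y to match the dataflow graph below; adding the bias b is an assumption):

    y = tf.matmul(x, W) + b   # builds a graph node; nothing is computed yet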
• We do, however, have enough to visualize our model, using a traditional dataflow graph (hence the “Flow” part of TensorFlow)...

[Figure: dataflow graph of the model, with output y. Adapted from Buduma (2017) Fig. 3.2]


TensorFlow Sessions: Getting the Job Done

https://stackoverflow.com/questions/44433438/understanding-tf-global-variables-initializer
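A sketch of the Session boilerplate under discussion (TensorFlow 1.x):

    init = tf.global_variables_initializer()   # an op that gives every Variable its starting value

    with tf.Session() as sess:
        sess.run(init)                          # W and b now actually contain numbers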
Finishing Up
• An ordinary Python list, containing 784 ones.
• Note the name agreement.
• So what output do we expect?
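Putting the pieces together, a sketch of what the slide's code plausibly does (the name x_in is an assumption):

    x_in = [1.0] * 784                          # an ordinary Python list containing 784 ones

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Name agreement: the feed_dict key x is the placeholder defined above
        print(sess.run(y, feed_dict={x: [x_in]}))

Since W was initialized uniformly at random and b is all zeros, we should expect a 1x10 array of random-looking values, different on every run.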

