Program Syllabus
Core Curriculum
This section consists of all the lessons and projects you need to complete in order to receive your
certificate.
3 Parts
14 Projects
Part 1
Path Planning, Concentrations, and Systems
In this term, you'll learn how to plan where the vehicle should go, how the vehicle systems work
together to get it there, and you'll perform a deep-dive into a concentration of your choice.
Extracurricular
This section consists of extra lessons and projects you can choose to complete in order to increase your
chances of changing careers.
2 Parts
7 Projects
Part 1
Part 2
Career: Networking
Networking is a very important component of a successful job search. In the following lesson,
you will learn how to tell your unique story to recruiters in a succinct and professional, but
relatable, way.
https://www.youtube.com/watch?v=jHA__A61nqc
https://www.youtube.com/watch?v=QiflJFVOt18
3. Overview of ND Program
https://www.youtube.com/watch?v=RZ5iolr4RGs
https://www.youtube.com/watch?v=JGpXenoW0dk
5. Career Support
https://www.youtube.com/watch?time_continue=9&v=4MGOyNXh4EQ
6. Nanodegree Support
Getting Support
There are several ways in which you will receive support during the program from Udacity's network
of Mentors and Reviewers, as well as your fellow students.
Mentorship
You can think of your Mentor as your Advisor in the Nanodegree.
Your in-classroom Mentor will be your guide through the program and will do the following:
Check in with you weekly to make sure that you are on track.
Help you set learning goals.
Guide you to supplementary resources when you get stuck.
Respond to any questions you have about the program.
If you have questions or comments about the Mentorship experience, or if you're having trouble
reaching your mentor, please email mentorship-support@udacity.com.
Forum Q&A
Udacity Discourse will be your home for the forums and the wiki.
Aside from your Mentor, the forums are a great place to ask in-depth and technical questions.
Questions in the forums will be answered by both paid mentors and other students. Make sure to like
answers as you read them, and feel free to post answers yourself!
We will be using Discourse for the forums, and you should be able to access these forums anytime by
following the forum link on the left hand side of the classroom. Once you are there, check out the
different categories and subcategories, and post a question if you have one!
Slack Community
Your private Slack team will be the best place to chat live with students and staff.
Slack is the best place for live discussion and interaction with your community of students. If you
haven't joined already, you can sign up here. (Note that this Slack instance is for enrolled students and
is different from the ND013 Slack Team.)
Reviews
Our global team of Reviewers will code review each of your project submissions usually within 24
hours.
For each project you submit, you will receive detailed feedback from a project Reviewer.
Sometimes, a reviewer might ask you to resubmit a project to meet specifications. In that case, an
indication of needed changes will also be provided. Note that you can submit a project as many times
as needed to pass.
Feedback
Please help us improve the program by submitting bugs and issues to our Waffle board.
In order to keep our content up-to-date and address issues quickly, we've set up a Waffle board to track
error reports and suggestions.
If you find an error, check there to see if it has already been filed. If it hasn't, you can file an issue by
clicking on the "Add issue" button, adding a title, and entering a description in the details (you will
need a GitHub account for this).
Links and screenshots, if available, are always appreciated!
Quiz Question
Have you signed up for Slack? Have you visited the forums?
7. Deadline Policy
When we use the term deadline with regards to Nanodegree program projects, we use it in one of two
ways:
A final deadline for passing all projects
Ongoing suggested deadline for individual projects
It is very important to understand the distinction between the two, as your progress in the program is
measured against the deadlines we've established. Please see below for an explanation of what each
usage means.
Stanley - The car that Sebastian Thrun and his team at Stanford built to win the DARPA Grand
Challenge.
The recent advancements in self-driving cars are built on decades of work by people around the world.
In the next video, you'll get a chance to step back and learn about some of this work and how your own
contributions may one day fit into this narrative.
In particular, you'll get a chance to relive the DARPA Grand Challenge, one of the great milestones in
self-driving car technology, and meet some of the people who took on this seemingly impossible task.
This video is not required, but we highly encourage you to watch it when you get the chance.
We hope you enjoy it as much as we did!
https://www.youtube.com/watch?v=saVZ_X9GfIM
10. Self-Driving Car Quiz
Question 2 of 2
Can you guess which of the following companies are CURRENTLY developing self-driving cars?
Quiz Question
Which of the following features could be useful in the identification of lane lines on the road?
Color
Shape
Orientation
2. Color Selection
https://www.youtube.com/watch?time_continue=1&v=bNOWJ9wdmhk
Quiz Question
What color is pure white in our combined red + green + blue [R, G, B] image?
Check out the code below. First, I import pyplot and image from matplotlib. I also import
numpy for operating on the image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
I then read in an image and print out some stats. I'll grab the x and y sizes and make a copy of the
image to work with. NOTE: Always make a copy of arrays or other variables in Python. If instead you
say "a = b", then all changes you make to "a" will be reflected in "b" as well!
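The difference is easy to demonstrate. Here's a minimal sketch (the tiny array stands in for an image):

```python
import numpy as np

image = np.zeros((2, 2, 3), dtype=np.uint8)  # tiny stand-in for an image

alias = image          # "a = b" -- both names refer to the SAME array
copy = np.copy(image)  # an independent copy

alias[0, 0] = [255, 0, 0]  # modifying the alias...
# ...changes the original: image[0, 0] is now [255, 0, 0]
# ...but the copy is untouched: copy[0, 0] is still [0, 0, 0]
```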
# Read in the image and print out some stats
image = mpimg.imread('test.jpg')
print('This image is:', type(image), 'with dimensions:', image.shape)
Next, I'll select any pixels below the threshold and set them to zero.
After that, all pixels that meet my color criterion (those above the threshold) will be retained, and those
that do not (below the threshold) will be blacked out.
# Identify pixels below the threshold
thresholds = (image[:,:,0] < rgb_threshold[0]) \
           | (image[:,:,1] < rgb_threshold[1]) \
           | (image[:,:,2] < rgb_threshold[2])
color_select[thresholds] = [0,0,0]
The result, color_select, is an image in which pixels that were above the threshold have been
retained, and pixels below the threshold have been blacked out.
In the code snippet above, red_threshold, green_threshold and blue_threshold are all
set to 0, which implies all pixels will be included in the selection.
In the next quiz, you will modify the values of red_threshold, green_threshold and
blue_threshold until you retain as much of the lane lines as possible while dropping everything
else. Your output image should look like the one below.
Image after color selection
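Putting the pieces together, a complete color-selection sketch might look like the following. Note the image here is a synthetic stand-in (in the lesson you would read 'test.jpg' with mpimg.imread), and the threshold values are illustrative starting points to tune, not the quiz answer:

```python
import numpy as np

# Synthetic stand-in for the test image: dark road with two bright "lane" stripes
image = np.full((10, 10, 3), 80, dtype=np.uint8)
image[:, 2] = [255, 255, 255]  # white lane line
image[:, 7] = [230, 230, 180]  # yellowish lane line
color_select = np.copy(image)

# Illustrative thresholds to tune
red_threshold, green_threshold, blue_threshold = 200, 150, 150
rgb_threshold = [red_threshold, green_threshold, blue_threshold]

# Black out any pixel below the threshold in any channel
thresholds = (image[:, :, 0] < rgb_threshold[0]) \
           | (image[:, :, 1] < rgb_threshold[1]) \
           | (image[:, :, 2] < rgb_threshold[2])
color_select[thresholds] = [0, 0, 0]
# Only the two bright stripes survive in color_select
```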
In this case, I'll assume that the front facing camera that took the image is mounted in a fixed position
on the car, such that the lane lines will always appear in the same general region of the image. Next, I'll
take advantage of this by adding a criterion to only consider pixels for color selection in the region
where we expect to find the lane lines.
Check out the code below. The variables left_bottom, right_bottom, and apex represent the
vertices of a triangular region that I would like to retain for my color selection, while masking
everything else out. Here I'm using a triangular mask to illustrate the simplest case, but later you'll use
a quadrilateral, and in principle, you could use any polygon.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

# Grab the x and y sizes and make two copies of the image
# With one copy we'll extract only the pixels that meet our selection,
# then we'll paint those pixels red in the original image to see our selection
# overlaid on the original.
ysize = image.shape[0]
xsize = image.shape[1]
color_select = np.copy(image)
line_image = np.copy(image)
In the next quiz, you can vary your color selection and the shape of your region mask (vertices of a
triangle left_bottom, right_bottom, and apex), such that you pick out the lane lines and
nothing else.
In this next quiz, I've given you the values of red_threshold, green_threshold, and
blue_threshold but now you need to modify left_bottom, right_bottom, and apex to
represent the vertices of a triangle identifying the region of interest in the image. When you run the
code in the quiz, your output result will be several images. Tweak the vertices until your output looks
like the examples shown below.
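One common way to build such a triangular mask is to fit a line (y = m*x + b) along each edge of the triangle with np.polyfit and then test every pixel against all three lines with np.meshgrid. The sketch below uses a synthetic image, and the vertex values are placeholders to tweak:

```python
import numpy as np

# Synthetic stand-in for the test image
image = np.full((100, 100, 3), 120, dtype=np.uint8)
ysize = image.shape[0]
xsize = image.shape[1]
region_select = np.copy(image)

# Triangle vertices -- placeholder values; tweak for your own image
left_bottom = [0, ysize - 1]
right_bottom = [xsize - 1, ysize - 1]
apex = [xsize // 2, ysize // 2]

# Fit lines (y = m*x + b) along the three edges of the triangle
fit_left = np.polyfit((left_bottom[0], apex[0]), (left_bottom[1], apex[1]), 1)
fit_right = np.polyfit((right_bottom[0], apex[0]), (right_bottom[1], apex[1]), 1)
fit_bottom = np.polyfit((left_bottom[0], right_bottom[0]),
                        (left_bottom[1], right_bottom[1]), 1)

# Find pixels inside the triangle (remember y increases downward in images!)
XX, YY = np.meshgrid(np.arange(0, xsize), np.arange(0, ysize))
region_thresholds = (YY > (XX * fit_left[0] + fit_left[1])) & \
                    (YY > (XX * fit_right[0] + fit_right[1])) & \
                    (YY < (XX * fit_bottom[0] + fit_bottom[1]))

# Paint the selected region red
region_select[region_thresholds] = [255, 0, 0]
```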
Start Quiz
So you found the lane lines... simple, right? Now you're ready to upload the algorithm to the car and
drive autonomously, right?? Well, not quite yet ;)
As it happens, lane lines are not always the same color, and even lines of the same color under different
lighting conditions (day, night, etc.) may fail to be detected by our simple color selection.
What we need is to take our algorithm to the next level to detect lines of any color using sophisticated
computer vision methods.
So, what is computer vision?
9. What is Computer Vision?
https://www.youtube.com/watch?v=wxQhfSdxjKU
In the rest of this lesson, we'll introduce some computer vision techniques with enough detail for you to
get an intuitive feel for how they work.
You'll learn much more about these topics during the Computer Vision module later in the program.
We also recommend the free Udacity course, Introduction to Computer Vision.
Throughout this Nanodegree Program, we will be using Python with OpenCV for computer vision
work. OpenCV stands for Open-Source Computer Vision. For now, you don't need to download or
install anything, but later in the program we'll help you get these tools installed on your own computer.
OpenCV contains extensive libraries of functions that you can use. The OpenCV libraries are well
documented, so if you're ever feeling confused about what the parameters in a particular function are
doing, or anything else, you can find a wealth of information at opencv.org.
10. Canny Edge Detection
https://www.youtube.com/watch?v=Av2GsgQWX8I
https://www.youtube.com/watch?time_continue=6&v=LQM--KPJjD0
Note! The standard location of the origin (x=0, y=0) for images is in the top left corner with y
values increasing downward and x increasing to the right. This might seem weird at first, but if
you think about an image as a matrix, it makes sense that the "00" element is in the upper left.
Now let's try a quiz. Below, I'm plotting a cross section through this image. Where are the areas in the
image that are most likely to be identified as edges?
Quiz Question
The red line in the plot above shows where I took a cross section through the image. The wiggles in the
blue line indicate changes in intensity along that cross section through the image. Check all the boxes
of the letters along this cross section, where you expect to find strong edges.
A
E
11. Canny to Detect Lane Lines
Let's try our Canny edge detector on this image. This is where OpenCV gets useful. First, we'll have a
look at the parameters for the OpenCV Canny function. You will call it like this:
edges = cv2.Canny(gray, low_threshold, high_threshold)
In this case, you are applying Canny to the image gray and your output will be another image called
edges. low_threshold and high_threshold are your thresholds for edge detection.
The algorithm will first detect strong edge (strong gradient) pixels above the high_threshold, and
reject pixels below the low_threshold. Next, pixels with values between the low_threshold
and high_threshold will be included as long as they are connected to strong edges. The output
edges is a binary image with white pixels tracing out the detected edges and black everywhere else.
See the OpenCV Canny Docs for more details.
What would make sense as a reasonable range for these parameters? In our case, converting to
grayscale has left us with an 8-bit image, so each pixel can take 2^8 = 256 possible values. Hence, the
pixel values range from 0 to 255.
This range implies that derivatives (essentially, the value differences from pixel to pixel) will be on the
scale of tens or hundreds. So, a reasonable range for your threshold parameters would also be in
the tens to hundreds.
As far as a ratio of low_threshold to high_threshold, John Canny himself recommended a
low to high ratio of 1:2 or 1:3.
We'll also include Gaussian smoothing, before running Canny, which is essentially a way of
suppressing noise and spurious gradients by averaging (check out the OpenCV docs for GaussianBlur).
cv2.Canny() actually applies Gaussian smoothing internally, but we include it here because you can
get a different result by applying further smoothing (and it's not a changeable parameter within
cv2.Canny()!).
You can choose the kernel_size for Gaussian smoothing to be any odd number. A larger
kernel_size implies averaging, or smoothing, over a larger area. The example in the previous
lesson was kernel_size = 3.
Note: If this is all sounding complicated and new to you, don't worry! We're moving pretty fast through
the material here, because for now we just want you to be able to use these tools. If you would like to
dive into the math underpinning these functions, please check out the free Udacity course, Intro to
Computer Vision, where the third lesson covers Gaussian filters and the sixth and seventh lessons cover
edge detection.
# Do all the relevant imports
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2
Here I've called the OpenCV function Canny on a Gaussian-smoothed, grayscaled image called
blur_gray, detecting edges using the gradient thresholds high_threshold and
low_threshold.
In the next quiz you'll get to try this on your own and mess around with the parameters for the Gaussian
smoothing and Canny Edge Detection to optimize for detecting the lane lines and not a lot of other
stuff.
In image space, a line is plotted as x vs. y, but in 1962, Paul Hough devised a method for representing
lines in parameter space, which we will call Hough space in his honor.
In Hough space, I can represent my "x vs. y" line as a point in "m vs. b" instead. The Hough Transform
is just the conversion from image space to Hough space. So, the characterization of a line in image
space will be a single point at the position (m, b) in Hough space.
So now I'd like to check your intuition: if a line in image space corresponds to a point in Hough
space, what would two parallel lines in image space correspond to in Hough space?
Question 1 of 5
What will be the representation in Hough space of two parallel lines in image space?
Alright, so a line in image space corresponds to a point in Hough space. What does a point in
image space correspond to in Hough space?
A single point in image space has many possible lines that pass through it, but not just any lines, only
those with particular combinations of the m and b parameters. Rearranging the equation of a line, we
find that a single point (x,y) corresponds to the line b = y - xm.
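You can sanity-check this algebra numerically: for a fixed point (x, y), every (m, b) pair on the line b = y - x*m describes a line through that point. A tiny sketch with made-up values:

```python
import numpy as np

x, y = 3.0, 5.0              # a single point in image space
m = np.linspace(-2, 2, 5)    # a few candidate slopes
b = y - x * m                # the corresponding line in (m, b) Hough space

# Each (m, b) pair reproduces the original point: m*x + b == y for every slope
assert np.allclose(m * x + b, y)
```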
So what is the representation of a point in image space in Hough space?
Question 2 of 5
What does a point in image space correspond to in Hough space?
A
What if you have 2 points in image space. What would that look like in Hough space?
Question 3 of 5
What is the representation in Hough space of two points in image space?
Alright, now we have two intersecting lines in Hough Space. How would you represent their
intersection at the point (m0, b0) in image space?
Question 4 of 5
What does the intersection point of the two lines in Hough space correspond to in image space?
A) A line in image space that passes through both (x1, y1) and (x2, y2)
https://www.youtube.com/watch?v=XQf7FOhwOVk
So, what happens if we run a Hough Transform on an image of a square? What will the corresponding
plot in Hough space look like?
https://www.youtube.com/watch?v=upKjISd3aBk
Question 5 of 5
What happens if we run a Hough Transform on an image of a square? What will the corresponding plot
in Hough space look like?
Let's look at the input parameters for the OpenCV function HoughLinesP that we will use to find
lines in the image. You will call it like this:
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
min_line_length, max_line_gap)
In this case, we are operating on the image edges (the output from Canny) and the output from
HoughLinesP will be lines, which will simply be an array containing the endpoints (x1, y1, x2, y2)
of all line segments detected by the transform operation. The other parameters define just what kind of
line segments we're looking for.
First off, rho and theta are the distance and angular resolution of our grid in Hough space.
Remember that, in Hough space, we have a grid laid out along the (ρ, θ) axes. You need to specify rho
in units of pixels and theta in units of radians.
So, what are reasonable values? Well, rho takes a minimum value of 1, and a reasonable starting place
for theta is 1 degree (pi/180 in radians). Scale these values up to be more flexible in your definition of
what constitutes a line.
The threshold parameter specifies the minimum number of votes (intersections in a given grid cell)
a candidate line needs to have to make it into the output. The empty np.array([]) is just a
placeholder, no need to change it. min_line_length is the minimum length of a line (in pixels)
that you will accept in the output, and max_line_gap is the maximum distance (again, in pixels)
between segments that you will allow to be connected into a single line. You can then iterate through
your output lines and draw them onto the image to see what you got!
# Iterate over the output "lines" and draw lines on the blank
for line in lines:
    for x1,y1,x2,y2 in line:
        cv2.line(line_image,(x1,y1),(x2,y2),(255,0,0),10)
2. Launch the Jupyter notebook with Anaconda or Docker. This notebook is simply to make sure
the installed packages are working properly. The instructions for the first project are on the
next page.
# Anaconda
source activate carnd-term1  # If currently deactivated, i.e. start of a new terminal session
jupyter notebook test.ipynb

# Docker
docker run -it --rm -p 8888:8888 -v ${pwd}:/src udacity/carnd-term1-starter-kit test.ipynb
# OR
docker run -it --rm -p 8888:8888 -v `pwd`:/src udacity/carnd-term1-starter-kit test.ipynb
3. Go to http://localhost:8888/notebooks/test.ipynb in your browser and run
all the cells. Everything should execute without error.
Troubleshooting
ffmpeg
NOTE: If you don't have ffmpeg installed on your computer you'll have to install it for moviepy to
work. If this is the case you'll be prompted by an error in the notebook. You can easily install ffmpeg
by running the following in a code cell in the notebook.
import imageio
imageio.plugins.ffmpeg.download()
Docker
To get the latest version of the docker image, you may need to run:
docker pull udacity/carnd-term1-starter-kit
Project Expectations
For each project in Term 1, keep in mind a few key elements:
rubric
code
writeup
submission
Rubric
Each project comes with a rubric detailing the requirements for passing the project. Project reviewers
will check your project against the rubric to make sure that it meets specifications.
Before submitting your project, compare your submission against the rubric to make sure you've
covered each rubric point.
Here is an example of a project rubric:
Example of a project rubric
Code
Every project in the term includes code that you will write. For some projects we provide code
templates, often in a Jupyter notebook. For other projects, there are no code templates.
In either case, you'll need to submit your code files as part of the project. Each project has specific
instructions about what files are required. Make sure that your code is commented and easy for the
project reviewers to follow.
For the Jupyter notebooks, sometimes you must run all of the code cells and then export the notebook
as an HTML file. The notebook will contain instructions for how to do this.
Because running the code can take anywhere from several minutes to a few hours, the HTML file
allows project reviewers to see your notebook's output without having to run the code.
Even if the project requires submission of the HTML output of your Jupyter notebook, please submit
the original Jupyter notebook itself, as well.
Writeup
All of the projects in Term 1 require a writeup. The writeup is your chance to explain how you
approached the project.
It is also an opportunity to show your understanding of key concepts in the program.
We have provided writeup templates for every project so that it is clear what information needs to be in
each writeup. These templates can be found in each project repository, with the title
writeup_template.md.
Your writeup report should explain how you satisfied each requirement in the project rubric.
The writeups can be turned in either as Markdown files (.md) or PDF files.
Submission
When submitting a project, you can either submit it as a link to a GitHub repository
(https://github.com/) or as a ZIP file. When submitting a GitHub repository, we advise creating a new
repository, specific to the project you are submitting.
GitHub repositories are a convenient way to organize your projects and display them to the world. A
GitHub repository also has a README.md file that opens automatically when somebody visits your
GitHub repository link.
As a suggestion, the README.md file for each repository can include the following information:
a list of files contained in the repository with a brief description of each file
any instructions someone might need for running your code
an overview of the project
Project Submission
Navigate to the project repository on GitHub (https://github.com/udacity/CarND-LaneLines-P1) and
have a look at the README file for detailed instructions on how to get set up with Python and OpenCV,
and how to access the Jupyter Notebook containing the project code. You will need to download, or
git clone, this repository in order to complete the project.
In this project, you will be writing code to identify lane lines on the road, first in an image, and later in
a video stream (really just a series of images). To complete this project you will use the tools you
learned about in the lesson, and build upon them.
Your first goal is to write code including a series of steps (pipeline) that identify and draw the lane lines
on a few test images. Once you can successfully identify the lines in an image, you can cut and paste
your code into the block provided to run on a video stream.
You will then refine your pipeline with parameter tuning and by averaging and extrapolating the lines.
Finally, you'll make a brief writeup report. The GitHub repository has a writeup_template.md that
can be used as a guide.
Have a look at the video clip called "P1_example.mp4" in the repository to see an example of what
your final output should look like. Two videos are provided for you to run your code on. These are
called "solidWhiteRight.mp4" and "solidYellowLeft.mp4".
Evaluation
Once you have completed your project, use the Project Rubric
(https://review.udacity.com/#!/rubrics/322/view) to review the project. If you have covered
all of the points in the rubric, then you are ready to submit! If you see room for improvement in any
category in which you do not meet specifications, keep working!
Your project will be evaluated by a Udacity reviewer according to the same Project Rubric
(https://review.udacity.com/#!/rubrics/322/view). Your project must "meet specifications" in
each category in order for your submission to pass.
Submission
What to include in your submission
You may submit your project as a zip file or with a link to a GitHub repo. The submission must include
two files:
Jupyter Notebook with your project code
writeup report (md or pdf file)
https://www.youtube.com/watch?time_continue=2&v=oR1IxPTTz0U
2. Mercedes-Benz
https://www.youtube.com/watch?v=Z_hi4djW5aw
3. NVIDIA
https://www.youtube.com/watch?v=C6Rt9lxMqHs
4. Uber ATG
https://www.youtube.com/watch?v=V23NZzX0efY
6. Get Started
When you're ready to get started on your job search, head back to your syllabus and click on
"Extracurricular."
There you'll find two optional modules built by our Careers team: Job Search Strategies and
Networking.
The Udacity Careers team has put together this custom curriculum to help you in your job search. From
writing your resume and cover letter, to creating profiles on LinkedIn and GitHub, the team is here to
help you secure your dream job!
These modules and their associated projects are completely optional, but we highly recommend you
complete them to succeed in the job market. Udacity Hiring Partners
(https://career-resource-center.udacity.com/hiring-partners-jobs) are excited to hire students and
alumni. We want to help you optimize your application materials and target them to specific jobs!
https://www.youtube.com/watch?v=UIycORUrPww
Quiz Question
What's the best estimate for the price of a house?
Classification problems are important for self-driving cars. Self-driving cars might need to classify
whether an object crossing the road is a car, a pedestrian, or a bicycle. Or they might need to identify
which type of traffic sign is coming up, or what a stop light is indicating.
In the next video, Luis will demonstrate a classification algorithm called "logistic regression". He'll use
logistic regression to predict whether a student will be accepted to a university.
Logistic regression will lead us to neural networks, a much more advanced classification tool.
Quiz Question
Does the student get Accepted?
8. Neural Networks
https://www.youtube.com/watch?time_continue=1&v=Mqogpnp1lrU
9. Perceptron
Perceptron
Now you've seen how a simple neural network makes decisions: by taking in input data, processing that
information, and finally, producing an output in the form of a decision! Let's take a deeper dive into the
university admission example to learn more about processing the input data.
Data, like test scores and grades, are fed into a network of interconnected nodes. These individual
nodes are called perceptrons, or artificial neurons, and they are the basic unit of a neural network. Each
one looks at input data and decides how to categorize that data. In the example above, the input either
passes a threshold for grades and test scores or doesn't, and so the two categories are: yes (passed the
threshold) and no (didn't pass the threshold). These categories then combine to form a decision -- for
example, if both nodes produce a "yes" output, then this student gains admission into the university.
Let's zoom in even further and look at how a single perceptron processes input data.
The perceptron above is one of the two perceptrons from the video that help determine whether or not a
student is accepted to a university. It decides whether a student's grades are high enough to be accepted
to the university. You might be wondering: "How does it know whether grades or test scores are more
important in making this acceptance decision?" Well, when we initialize a neural network, we don't
know what information will be most important in making a decision. It's up to the neural network to
learn for itself which data is most important and adjust how it considers that data.
It does this with something called weights.
Weights
When input comes into a perceptron, it gets multiplied by a weight value that is assigned to this
particular input. For example, the perceptron above has two inputs, tests (for test scores) and
grades, so it has two associated weights that can be adjusted individually. These weights start out as
random values, and as the neural network learns more about what kind of input data leads to a
student being accepted into a university, the network adjusts the weights based on any errors in
categorization that result from the previous weights. This is called training the neural network.
A higher weight means the neural network considers that input more important than other inputs, and a
lower weight means that the data is considered less important. An extreme example would be if test
scores had no effect at all on university acceptance; then the weight of the test score input would be
zero and it would have no effect on the output of the perceptron.
When writing equations related to neural networks, the weights will always be represented by some
form of the letter w. It will usually look like a W when it represents a matrix of weights, or a w when it
represents an individual weight, and it may include some additional information in the form of a
subscript to specify which weights (you'll see more on that next). But remember, when you see the
letter w, think weights.
In this example, we'll use w_grades for the weight of grades and w_test for the weight of test. For the
image above, let's say that the weights are: w_grades = 1, w_test = 0.2. You don't have to be concerned
with the actual values, but their relative values are important. w_grades is 5 times larger than w_test,
which means the neural network considers the grades input 5 times more important than test in
determining whether a student will be accepted into a university.
The perceptron applies these weights to the inputs and sums them in a process known as linear
combination. In our case, this looks like w_grades · x_grades + w_test · x_test = 1 · x_grades + 0.2 · x_test.
Now, to make our equation less wordy, let's replace the explicit names with numbers. Let's use 1 for
grades and 2 for tests. So now our equation becomes
w_1 · x_1 + w_2 · x_2
In this example, we just have 2 simple inputs: grades and tests. Let's imagine we instead had m different
inputs and we labeled them x_1, x_2, ..., x_m. Let's also say that the weight corresponding to x_1 is w_1, and so
on. In that case, we would express the linear combination succinctly as:
Σ_{i=1}^{m} w_i · x_i
Here, the Greek letter Sigma (Σ) is used to represent summation. It simply means to evaluate the
equation to the right multiple times and add up the results. In this case, the equation it will sum is w_i · x_i.
But where do we get w_i and x_i?
Σ_{i=1}^{m} means to iterate over all i values, from 1 to m.
So to put it all together, Σ_{i=1}^{m} w_i · x_i means the following:
Start at i = 1
Evaluate w_1 · x_1 and remember the result
Move to i = 2
Evaluate w_2 · x_2 and add the result to w_1 · x_1
Continue repeating that process until i = m, where m is the number of inputs.
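That step-by-step recipe translates directly into code. A small sketch with hypothetical inputs and weights (the values are made up for illustration):

```python
import numpy as np

# Hypothetical inputs and weights for m = 3 inputs
x = [0.8, 0.6, 0.4]
w = [1.0, 0.2, 0.5]

# The summation written out as the loop described above
total = 0.0
for i in range(len(x)):     # i runs over 1..m (0..m-1 in Python indexing)
    total += w[i] * x[i]

# The loop is equivalent to a dot product
assert np.isclose(total, np.dot(w, x))
```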
One last thing: you'll see equations written many different ways, both here and when reading on your
own. For example, you will often just see Σ_i instead of Σ_{i=1}^{m}. The first is simply a shorter way of
writing the second. That is, if you see a summation without a starting number or a defined end value, it
just means perform the sum for all of them. And sometimes, if the value to iterate over can be
inferred, you'll see it as just Σ. Just remember they're all the same thing: Σ_{i=1}^{m} w_i · x_i = Σ_i w_i · x_i = Σ w_i · x_i.
Perceptron Formula
This formula returns 1 if the input (x_1, x_2, ..., x_m) belongs to the accepted-to-university category, or
returns 0 if it doesn't. The input is made up of one or more real numbers, each one represented by x_i,
where m is the number of inputs.
Then the neural network starts to learn! Initially, the weights (w_i) and bias (b) are assigned a random
value, and then they are updated using a learning algorithm like gradient descent. The weights and
biases change so that the next training example is more accurately categorized, and patterns in data are
"learned" by the neural network.
Now that you have a good understanding of perceptrons, let's put that knowledge to use. In the next
section, you'll create the AND perceptron from the Neural Networks video by setting the values for
the weights and bias.
First, the linear combination will be the sum of the weighted inputs: linear_combination =
weight1*input1 + weight2*input2. Then we can put this value into the biased Heaviside step
function, which will give us our output (0 or 1):
Perceptron Formula
import pandas as pd

# Set weight1, weight2, and bias so that AND(1, 1) = 1 and every other input gives 0 (one valid choice)
weight1 = 1.0
weight2 = 1.0
bias = -2.0

# Inputs and expected outputs for the AND operation
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []

# Generate output for each input pair
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
So, how can you choose the values for weights and bias so that if both inputs = 1, the output = 1?
NOT Perceptron
Unlike the other perceptrons we looked at, the NOT operation only cares about one input. The
operation returns a 0 if the input is 1, and a 1 if it's a 0. Any other inputs to the perceptron are ignored.
In this quiz, you'll set the weights (weight1, weight2) and the bias (bias) to values that calculate
the NOT operation on the second input while ignoring the first input.
import pandas as pd

# Set weight1, weight2, and bias to compute NOT on the second input (one valid choice)
weight1 = 0.0
weight2 = -1.0
bias = 0.5

test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [True, False, True, False]
outputs = []

for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
We have a perceptron that can do AND, OR, or NOT operations. Let's do one more, XOR. In the next
section, you'll learn how a neural network solves more complicated problems like XOR.
The above neural network contains 4 perceptrons: A, B, C, and D. The input to the neural network
comes in through the first node, and the output comes out of the last node. The weights are indicated by
the line thickness between the perceptrons. You can ignore any link between perceptrons with a low
weight, like the one from A to C; for perceptron C, you can ignore all input to and from it. For
simplicity, we won't be showing the bias, but it's still in the neural network.
Quiz
The neural network above calculates XOR. Each perceptron is a logic operation of OR, AND,
Passthrough, or NOT. The Passthrough operation just passes its input to the output. However, the
perceptrons A, B, and C don't indicate their operation. In the following quiz, set the correct operations
for the three perceptrons to calculate XOR.
Note: Any line with a low weight can be ignored.
Quiz Question
Which operation does each perceptron in the XOR neural network perform?
Perceptron
Operations
A
NOT
B
AND
C
OR
You've seen that a perceptron can solve linearly separable problems. To solve more complex problems,
you use more perceptrons. You saw this by calculating the AND, OR, NOT, and XOR operations using
perceptrons. These operations can be used to create any computer program. With enough data and time,
a neural network can solve any problem that a computer can calculate. However, you wouldn't build
Twitter using a neural network; a neural network is like any tool, and you have to know when to use it.
The power of a neural network isn't in building it by hand, like we were doing. It's the ability to learn
from examples. In the next few sections, you'll learn how a neural network sets its own weights and
biases.
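Since XOR can be built from the AND, OR, and NOT operations, a network like the one above can be sketched in code by wiring hand-set perceptrons together. The weight and bias values below are one valid choice (not the only one), and the wiring shown is the standard XOR decomposition rather than the exact diagram above:

```python
def perceptron(inputs, weights, bias):
    """Heaviside step of the weighted sum: 1 if w.x + b >= 0, else 0."""
    linear_combination = sum(w * x for w, x in zip(weights, inputs)) + bias
    return int(linear_combination >= 0)

# Hand-set weights and biases (one valid choice; others work too)
def AND(a, b):
    return perceptron([a, b], [1, 1], -2)

def OR(a, b):
    return perceptron([a, b], [1, 1], -1)

def NOT(a):
    return perceptron([a], [-1], 0.5)

def XOR(a, b):
    # XOR(a, b) = AND(OR(a, b), NOT(AND(a, b)))
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```

Running this prints the full XOR truth table: 0 for equal inputs, 1 otherwise.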
13. The Simplest Neural Network
Diagram of a simple neural network. Circles are units, boxes are operations.
The cool part about this architecture, and what makes neural networks possible, is that the activation
function, f(h) can be any function, not just the step function shown earlier.
For example, if you let f(h) = h, the output will be the same as the input. Now the output of the network
is

y = ∑_i w_i x_i + b

This equation should be familiar to you: it's the same as the linear regression model!
Other activation functions you'll see are the logistic (often called the sigmoid), tanh, and softmax
functions. We'll mostly be using the sigmoid function for the rest of this lesson:
sigmoid(x) = 1/(1 + e^(−x))
import numpy as np

def sigmoid(x):
    # TODO: Implement sigmoid function
    return 1 / (1 + np.exp(-x))

# TODO: Calculate the output of a single-node network, sigmoid(w·x + b)

print('Output:')
print(output)
solution.py
import numpy as np

def sigmoid(x):
    # Implement sigmoid function
    return 1 / (1 + np.exp(-x))

inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1

# Calculate the output
output = sigmoid(np.dot(weights, inputs) + bias)

print('Output:')
print(output)
Learning weights
You've seen how you can use perceptrons for AND and XOR operations, but there we set the weights
by hand. What if you want to perform an operation, such as predicting college admission, but don't
know the correct weights? You'll need to learn the weights from example data, then use those weights
to make the predictions.
To figure out how we're going to find these weights, start by thinking about the goal. We want the
network to make predictions as close as possible to the real values. To measure this, we need a metric
of how wrong the predictions are, the error. A common metric is the sum of the squared errors (SSE):
E = (1/2) ∑_μ ∑_j [y_j^μ − ŷ_j^μ]²

where ŷ is the prediction and y is the true value, and you take the sum over all output units j and
over all data points μ. This might seem like a really complicated equation at first, but it's
fairly simple once you understand the symbols and can say what's going on in words.
First, the inside sum over j. This variable j represents the output units of the network. So this inside
sum is saying: for each output unit, find the difference between the true value y and the predicted value
from the network ŷ, then square the difference, then sum up all those squares.
Then the outer sum over μ is a sum over all the data points. So, for each data point you calculate the
inner sum of the squared differences for each output unit. Then you sum up those squared differences
for each data point. That gives you the overall error for all the output predictions for all the data points.
The SSE is a good choice for a few reasons. The square ensures the error is always positive and larger
errors are penalized more than smaller errors. Also, it makes the math nice, always a plus.
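For a network with a single output unit, the SSE can be computed in one line of NumPy (the targets and predictions below are made up):

```python
import numpy as np

# Made-up true values y and predictions y_hat for 4 data points, one output unit
y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.8, 0.2, 0.6, 0.1])

# E = 1/2 * sum over data points of (y - y_hat)^2
sse = 0.5 * np.sum((y - y_hat) ** 2)
print(sse)  # 0.5 * (0.04 + 0.04 + 0.16 + 0.01) = 0.125
```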
Remember that the output of a neural network, the prediction, depends on the weights

ŷ_j^μ = f(∑_i w_ij x_i^μ)

and accordingly the error depends on the weights

E = (1/2) ∑_μ ∑_j [y_j^μ − f(∑_i w_ij x_i^μ)]²
We want the network's prediction error to be as small as possible and the weights are the knobs we can
use to make that happen. Our goal is to find weights wij that minimize the squared error E. To do this
with a neural network, typically you'd use gradient descent.
https://www.youtube.com/watch?v=29PmNG7fuuM
As Luis said, with gradient descent, we take multiple small steps towards our goal. In this case, we
want to change the weights in steps that reduce the error. Continuing the analogy, the error is our
mountain and we want to get to the bottom. Since the fastest way down a mountain is in the steepest
direction, the steps taken should be in the direction that minimizes the error the most. We can find this
direction by calculating the gradient of the squared error.
Gradient is another term for rate of change or slope. If you need to brush up on this concept, check out
Khan Academy's great lectures on the topic.
To calculate a rate of change, we turn to calculus, specifically derivatives. The derivative of a function
f(x) gives you another function f′(x) that returns the slope of f(x) at point x. For example, consider
f(x) = x². The derivative of x² is f′(x) = 2x. So, at x = 2, the slope is f′(2) = 4. Plotting this out, it looks like:
Example of a gradient
The gradient is just a derivative generalized to functions with more than one variable. We can use
calculus to find the gradient at any point in our error function, which depends on the input weights.
You'll see how the gradient descent step is derived on the next page.
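To see the derivative acting as a step direction, you can run a few gradient descent steps on this same f(x) = x² (the step size here is an arbitrary illustrative choice):

```python
# f(x) = x^2 has derivative f'(x) = 2x
f_prime = lambda x: 2 * x

x = 2.0          # start where the slope is f'(2) = 4
learnrate = 0.1  # arbitrary step size for illustration

for step in range(50):
    x -= learnrate * f_prime(x)  # step against the slope

print(x)  # approaches the minimum at x = 0
```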
Below I've plotted an example of the error of a neural network with two inputs, and accordingly, two
weights. You can read this like a topographical map where points on a contour line have the same error
and darker contour lines correspond to larger errors.
At each step, you calculate the error and the gradient, then use those to determine how much to change
each weight. Repeating this process will eventually find weights that are close to the minimum of the
error function, the black dot in the middle.
Gradient descent steps to the lowest error
Caveats
Since the weights will just go wherever the gradient takes them, they can end up where the error is
low, but not the lowest. These spots are called local minima. If the weights are initialized with the
wrong values, gradient descent could lead the weights into a local minimum, as illustrated below.
Gradient descent leading into a local minimum
There are methods to avoid this, such as using momentum.
# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])
gradient.py
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

# TODO: Calculate one gradient descent step for each weight
del_w = None
solution.py
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

# Calculate one gradient descent step for each weight
h = np.dot(x, w)                                    # input to the output unit
nn_output = sigmoid(h)                              # network output f(h)
error = y - nn_output                               # output error (y - ŷ)
error_term = error * nn_output * (1 - nn_output)    # δ = error * f'(h)
del_w = learnrate * error_term * x                  # Δw = η δ x
Data cleanup
You might think there will be three input units, but we actually need to transform the data first. The
rank feature is categorical; the numbers don't encode any sort of relative values. Rank 2 is not twice
as much as rank 1, and rank 3 is not 1.5 times rank 2. Instead, we need to use dummy variables to
encode rank, splitting the data into four new columns encoded with ones or zeros. Rows with rank 1
have one in the rank 1 dummy column, and zeros in all other columns. Rows with rank 2 have one in
the rank 2 dummy column, and zeros in all other columns. And so on.
We'll also need to standardize the GRE and GPA data, which means to scale the values such that they
have zero mean and a standard deviation of 1. This is necessary because the sigmoid function squashes
really small and really large inputs: the gradient at really small and really large inputs is nearly zero,
which means the gradient descent step will go to zero too. Since the GRE and GPA values are fairly
large, we'd have to be really careful about how we initialize the weights or the gradient descent steps
would die off and the network wouldn't train. Instead, if we standardize the data, we can initialize the
weights easily and everyone is happy.
This is just a brief run-through; you'll learn more about preparing data later. If you're interested in how
I did this, check out the data_prep.py file in the programming exercise below.
Ten rows of the data after transformations.
Now that the data is ready, we see that there are six input features: gre, gpa, and the four rank
dummy variables.
Here's the general algorithm for updating the weights with gradient descent:
Set the weight step to zero: Δw_i = 0
For each record in the training data:
Make a forward pass through the network, calculating the output ŷ = f(∑_i w_i x_i)
Calculate the error gradient in the output unit, δ = (y − ŷ) f′(∑_i w_i x_i)
Update the weight step Δw_i = Δw_i + δ x_i
Update the weights w_i = w_i + η Δw_i / m, where η is the learning rate and m is the number of records.
Here we're averaging the weight steps to help reduce any large variations in the training data.
Repeat for e epochs.
You can also update the weights on each record instead of averaging the weight steps after going
through all the records.
Remember that we're using the sigmoid for the activation function, f(h) = 1/(1 + e^(−h))
And the gradient of the sigmoid is f′(h) = f(h)(1 − f(h))
where h is the input to the output unit,
h = ∑_i w_i x_i
And finally, we can update Δw_i and w_i by incrementing them with weights += ..., which is
shorthand for weights = weights + ....
Efficiency tip!
You can save some calculations since we're using a sigmoid here. For the sigmoid function,
f′(h) = f(h)(1 − f(h)). That means that once you calculate f(h), the activation of the output unit, you can
use it to calculate f′(h) for the error gradient.
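In code, the tip amounts to computing the activation once and reusing it, instead of calling sigmoid again for the gradient (a minimal sketch):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

h = 0.5                                      # some input to the output unit
nn_output = sigmoid(h)                       # f(h), already computed in the forward pass
sigmoid_prime = nn_output * (1 - nn_output)  # f'(h) = f(h)(1 - f(h)), no extra sigmoid call
```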
Programming exercise
Below, you'll implement gradient descent and train the network on the admissions data. Your goal here
is to train the network until you reach a minimum in the mean square error (MSE) on the training set.
You need to implement:
The network output: output.
The error gradient: error.
Update the weight step: del_w +=.
Update the weights: weights +=.
After you've written these parts, run the training by pressing "Test Run". The MSE will print out, as
well as the accuracy on a test set, the fraction of correctly predicted admissions.
Feel free to play with the hyperparameters and see how it changes the MSE.
Gradient.py
import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

n_records, n_features = features.shape

# Hyperparameters
epochs = 1000
learnrate = 0.5

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target
        # TODO: Calculate the output, the error, the error gradient,
        # and the weight step del_w
        pass
    # TODO: Update the weights
data_prep.py
import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std
binary.csv
solution.py
import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

n_records, n_features = features.shape

# Hyperparameters
epochs = 1000
learnrate = 0.5

# Initialize weights
weights = np.random.normal(scale=1/n_features**.5, size=n_features)

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target
        output = sigmoid(np.dot(x, weights))             # forward pass
        error = y - output                               # prediction error
        error_term = error * output * (1 - output)       # error gradient δ
        del_w += error_term * x                          # accumulate weight step
    weights += learnrate * del_w / n_records             # average and update
Derivation
Before, we were dealing with only one output node, which made the code straightforward. However,
now that we have multiple input units and multiple hidden units, the weights between them will require
two indices: w_ij, where i denotes input units and j denotes hidden units.
For example, the following image shows our network, with its input units labeled x_1, x_2, and x_3, and its
hidden nodes labeled h_1 and h_2:
The lines indicating the weights leading to h_1 have been colored differently from those leading to h_2
just to make it easier to read.
Now to index the weights, we take the input unit number for the i and the hidden unit number for the j.
That gives us
w_11
for the weight leading from x_1 to h_1, and
w_12
for the weight leading from x_1 to h_2.
The following image includes all of the weights between the input layer and the hidden layer, labeled
with their appropriate w_ij indices:
Before, we were able to write the weights as an array, indexed as w_i.
But now, the weights need to be stored in a matrix, indexed as w_ij. Each row in the matrix will
correspond to the weights leading out of a single input unit, and each column will correspond to the
weights leading in to a single hidden unit. For our three input units and two hidden units, the weights
matrix looks like this:
Be sure to compare the matrix above with the diagram shown before it so you can see where the
different weights in the network end up in the matrix.
To initialize these weights in NumPy, we have to provide the shape of the matrix. If features is a 2D
array containing the input data:
# Number of records and input units
n_records, n_inputs = features.shape
# Number of hidden units
n_hidden = 2
weights_input_to_hidden = np.random.normal(0, n_inputs**-0.5, size=(n_inputs,
n_hidden))
Calculating the input to the first hidden unit with the first column of the weights matrix.
And for the input to the second hidden unit, you calculate the dot product of the inputs with the second
column. And so on and so forth.
In NumPy, you can do this for all the inputs and all the outputs at once using np.dot
hidden_inputs = np.dot(inputs, weights_input_to_hidden)
You could also define your weights matrix such that it has dimensions n_hidden by n_inputs, and
then multiply like so, where the inputs form a column vector:
Note: The weight indices have changed in the above image and no longer match up with the labels
used in the earlier diagrams. That's because, in matrix notation, the row index always precedes the
column index, so it would be misleading to label them the way we did in the neural net diagram. Just
keep in mind that this is the same weight matrix as before, but rotated so the first column is now the
first row, and the second column is now the second row. If we were to use the labels from the earlier
diagram, the weights would fit into the matrix in the following locations:
The dot product can't be computed for a 3x2 matrix and 3-element array. That's because the 2 columns
in the matrix don't match the number of elements in the array. Some of the dimensions that could work
would be the following:
The rule is that if you're multiplying an array from the left, the array must have the same number of
elements as there are rows in the matrix. And if you're multiplying the matrix from the left, the number
of columns in the matrix must equal the number of elements in the array on the right.
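These shape rules are easy to check in NumPy (the arrays below are random, purely for illustration):

```python
import numpy as np

weights = np.random.randn(3, 2)  # a 3x2 weights matrix
features = np.random.randn(3)    # a 3-element input array

# Array on the left: its length must match the number of rows in the matrix
print(np.dot(features, weights).shape)  # (2,)

# Matrix on the left: its number of columns must match the array's length
print(np.dot(weights.T, features).shape)  # (2,)

# Mismatched shapes raise a ValueError
try:
    np.dot(weights, features)
except ValueError:
    print('shapes are not aligned')
```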
print(features)
> array([ 0.49671415, -0.1382643 , 0.64768854])
Note that taking the transpose of a 1D array leaves it unchanged, so that won't give you a column
vector:
print(features.T)
> array([ 0.49671415, -0.1382643 , 0.64768854])
Instead, add a new axis to get a column vector:
print(features[:, None])
> array([[ 0.49671415],
[-0.1382643 ],
[ 0.64768854]])
Alternatively, you can create arrays with two dimensions. Then, you can use arr.T to get the column
vector.
np.array(features, ndmin=2)
> array([[ 0.49671415, -0.1382643 , 0.64768854]])
np.array(features, ndmin=2).T
> array([[ 0.49671415],
[-0.1382643 ],
[ 0.64768854]])
I personally prefer keeping all vectors as 1D arrays; it just works better in my head.
Programming quiz
Below, you'll implement a forward pass through a 4x3x2 network, with sigmoid activation functions
for both layers.
Things to do:
Calculate the input to the hidden layer.
Calculate the hidden layer output.
Calculate the input to the output layer.
Calculate the output of the network.
Multiplier.py
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, 0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, 0.1, size=(N_hidden, N_output))

# TODO: Calculate the input to and output of the hidden layer
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

# TODO: Calculate the input to and output of the output layer
output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)
solution.py
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, 0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, 0.1, size=(N_hidden, N_output))

# Make a forward pass through the network
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)
19. Backpropagation
https://www.youtube.com/watch?v=MZL97-2joxQ
Backpropagation
Now we've come to the problem of how to make a multilayer neural network learn. Before, we saw
how to update weights with gradient descent. The backpropagation algorithm is just an extension of
that, using the chain rule to find the error with respect to the weights connecting the input layer to
the hidden layer (for a two-layer network).
To update the weights to hidden layers using gradient descent, you need to know how much error each
of the hidden units contributed to the final output. Since the output of a layer is determined by the
weights between layers, the error resulting from units is scaled by the weights going forward through
the network. Since we know the error at the output, we can use the weights to work backwards to
hidden layers.
For example, in the output layer, you have errors δ^o_k attributed to each output unit k. Then, the error
attributed to hidden unit j is the output errors, scaled by the weights between the output and hidden
layers (and the gradient):

δ^h_j = ∑_k W_jk δ^o_k f′(h_j)
Then, the gradient descent step is the same as before, just with the new errors:

Δw_ij = η δ^h_j x_i

where w_ij are the weights between the inputs and hidden layer and x_i are input unit values. This form
holds for however many layers there are. The weight steps are equal to the step size times the output
error of the layer times the values of the inputs to that layer:

Δw = η δ_output V_in

Here, you get the output error, δ_output, by propagating the errors backwards from higher layers. And
the input values, V_in, are the inputs to the layer (the hidden layer activations for the weights to the
output unit, for example).
Implementing in NumPy
For the most part you have everything you need to implement backpropagation with NumPy.
However, previously we were only dealing with error terms from one unit. Now, in the weight update,
we have to consider the error term δ_j for each unit in the hidden layer:

Δw_ij = η δ_j x_i
Firstly, there will likely be a different number of input and hidden units, so trying to multiply the errors
and the inputs as row vectors will throw an error:
hidden_error*inputs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-22-3b59121cb809> in <module>()
----> 1 hidden_error*inputs
ValueError: operands could not be broadcast together with shapes (3,) (6,)
Also, wij is a matrix now, so the right side of the assignment must have the same shape as the left side.
Luckily, NumPy takes care of this for us. If you multiply a row vector array with a column vector array,
it will multiply the first element in the column by each element in the row vector and set that as the first
row in a new 2D array. This continues for each element in the column vector, so you get a 2D array that
has shape (len(column_vector), len(row_vector)).
hidden_error*inputs[:,None]
array([[ -8.24195994e-04, -2.71771975e-04, 1.29713395e-03],
[ -2.87777394e-04, -9.48922722e-05, 4.52909055e-04],
[ 6.44605731e-04, 2.12553536e-04, -1.01449168e-03],
[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00]])
It turns out this is exactly how we want to calculate the weight update step. As before, if you have your
inputs as a 2D array with one row, you can also do hidden_error*inputs.T, but that won't work
if inputs is a 1D array.
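Putting it together, the whole weight-step matrix comes out of one broadcasted product (the shapes match the 6-input, 3-hidden-unit example above; the values are random placeholders):

```python
import numpy as np

np.random.seed(0)
inputs = np.random.randn(6)        # 6 input values (placeholder data)
hidden_error = np.random.randn(3)  # 3 hidden-unit error terms (placeholder data)
learnrate = 0.005

# Column vector (6,1) times row vector (3,) broadcasts to a (6,3) matrix:
# one weight step per input-to-hidden connection
del_w = learnrate * hidden_error * inputs[:, None]
print(del_w.shape)  # (6, 3)
```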
Backpropagation exercise
Below, you'll implement the code to calculate one backpropagation update step for two sets of weights.
I wrote the forward pass, your goal is to code the backward pass.
Things to do
Calculate the network error.
Calculate the output layer error gradient.
Use backpropagation to calculate the hidden layer error.
Calculate the weight update steps.
Backprop.py
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

## Backwards pass
## TODO: Calculate error
error = None
Implementing backpropagation
Now we've seen that the error term in the output layer is

δ^o_k = (y_k − ŷ_k) f′(a_k)

and the error term in the hidden layer is

δ^h_j = ∑_k W_jk δ^o_k f′(h_j)
For now, we'll only consider a simple network with one hidden layer and one output unit. Here's the
general algorithm for updating the weights with backpropagation:
Set the weight steps for each layer to zero
The input to hidden weights Δw_ij = 0
The hidden to output weights ΔW_j = 0
For each record in the training data:
Make a forward pass through the network, calculating the output ŷ
Calculate the error gradient in the output unit, δ^o = (y − ŷ) f′(z), where z = ∑_j W_j a_j, the input
to the output unit.
Propagate the errors to the hidden layer, δ^h_j = δ^o W_j f′(h_j)
Update the weight steps:
ΔW_j = ΔW_j + δ^o a_j
Δw_ij = Δw_ij + δ^h_j a_i
Update the weights, where η is the learning rate and m is the number of records:
W_j = W_j + η ΔW_j / m
w_ij = w_ij + η Δw_ij / m
Repeat for e epochs.
Backpropagation exercise
Now you're going to implement the backprop algorithm for a network trained on the graduate school
admission data. You should have everything you need from the previous exercises to complete this one.
Your goals here:
Implement the forward pass.
Implement the backpropagation algorithm.
Update the weights.
Backprop.py
import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # TODO: Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # TODO: Calculate the error
        error = y - output

        # TODO: Calculate the output error term, propagate errors to the
        # hidden layer, and update the weight steps
data_prep.py
import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)
binary.csv
solution.py
import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # Calculate the error
        error = y - output

        # Calculate the error term for the output unit
        output_error_term = error * output * (1 - output)

        # Propagate errors to the hidden layer
        hidden_error = np.dot(output_error_term, weights_hidden_output)
        hidden_error_term = hidden_error * hidden_output * (1 - hidden_output)

        # Update the weight steps
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:, None]

    # Update the weights, averaging the weight steps over the records
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records
Further reading
Backpropagation is fundamental to deep learning. TensorFlow and other libraries will perform the
backprop for you, but you should really really understand the algorithm. We'll be going over backprop
again, but here are some extra resources for you:
From Andrej Karpathy: Yes, you should understand backprop
In this lesson, you learned the power of perceptrons: how powerful a single perceptron is, and the
power of a neural network that uses multiple perceptrons. Then you learned how each perceptron can
learn from past samples to come up with a solution.
Now that you understand the basics of a neural network, the next step is to build a basic neural
network. In the next lesson, you'll build your own neural network.
23. Summary
https://www.youtube.com/watch?v=m8xslYUBXYo