Professional Documents
Culture Documents
CVEN2002/CVEN2702
2016
1. Introduction
1. Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
2 / 35
1. Introduction
What is statistics?
Definition
Statistics is the science of the (i) collection, (ii) processing, (iii)
analysis, and (iv) interpretation of data.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
3 / 35
1. Introduction
What is statistics?
In engineering, this includes many diversified tasks such as:
calculating the average lengths of downtimes of a computers.
checking whether the level of lead in the water supply is within
safety standards.
determining the strength of supports for generators at a power
plant.
collecting and presenting data on the number of persons attending
seminars on solar energy.
...
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
4 / 35
1. Introduction
What is statistics?
Statistics is a discipline in its own right, that makes use of
mathematics (and computer science, and other discipline
application expertise).
Statistics considers the presence of randomness, uncertainty and
variation, which are everywhere in real life.
- If each computer had exactly the same length of downtime,
then a single observation would reveal all desired information, and
therefore we would not need statistics :-(
Statistics
Statistics allows us to take this uncertainty into account when
making judgements and decisions.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
5 / 35
1. Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
6 / 35
1. Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
7 / 35
1. Introduction
Dr Jakub Stoklosa
Semester 2, 2016
8 / 35
1. Introduction
uns.
500
1000
1500
2000
2500
precipitation amount
Dr Jakub Stoklosa
Semester 2, 2016
9 / 35
1. Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
10 / 35
1. Introduction
We should analyse and interpret the data bearing in mind that the
observed features may be consequences of chance only.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
11 / 35
1. Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
12 / 35
1. Introduction
Fact
Every step in this process requires understanding statistical principles
and concepts as well as knowledge and skills in statistical methods.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
13 / 35
1. Introduction
A group of 19 subjects was divided into light blond, dark blond, light
brunette and dark brunette groups and a pain threshold score was
measured for each subject.
2
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
14 / 35
1. Introduction
L Blond
D Blond
L Brunette
D Brunette
20
40
60
80
100
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
15 / 35
1. Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
16 / 35
1. Introduction
Population
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
17 / 35
1. Introduction
Population
Lets go back to our two examples.
Example 1
In Example 1 (clouds seeding), the population consists of all the
clouds of the sky. An individual is a cloud and the variable of interest is
the amount of rainfall.
Example 2
In Example 2 (pain tolerance), the population consists of all blonds and
brunettes of the world. An individual is one of those people and the
variable of interest is the pain threshold score.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
18 / 35
1. Introduction
Sample
It is often physically impossible or infeasible to obtain data on the
whole population. Imagine collecting a bit of information from
every human on this planet!
This is also of very expensive, or very time-consuming, or could
consist of destructive experiments.
In most situations, we can only observe a subset of the
population, that is, we must work with only partial information.
The subset of the population which is observed is called the sample.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
19 / 35
1. Introduction
Sample
Example 1
In Example 1, the sample consists of the 52 clouds whose rainfall
amounts have been recorded.
(We might also consider that we have two samples: 26 seeded clouds
and 26 unseeded clouds).
Example 2
In Example 2, the sample consists of the 19 persons whose pain
threshold scores have been measured.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
20 / 35
1. Introduction
Sampling
The process of selecting the sample is called sampling.
If the sample is to be informative about the total population, it must
be representative of that population.
Suppose you are interested in the average height of UNSW
students, would you select the sample from the UNSW basketball
team?
Or, suppose you are interested in the average age of UNSW
students, would you select a sample made up of postgraduate
students only?
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
21 / 35
1. Introduction
Sampling
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
22 / 35
1. Introduction
Random sampling
In practice, the only sampling scheme that guarantees the sample to
be representative of the population is random sampling.
The individuals of the sample are selected in a totally random
fashion, without any other prior consideration.
Any non-random selection of a sample often results in one which is
inherently biased toward some values as opposed to others.
We must not attempt to deliberately choose the sample according to
some criteria.
Instead, we should just leave it up to chance to obtain a sample
which correctly covers the underlying population.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
23 / 35
1. Introduction
Fact
The statistical procedures presented in this course may not be valid
when applied to non-random samples.
Never unquestioningly accept samples without knowing how the
data have been generated/collected/observed!
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
24 / 35
2. Descriptive Statistics
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
25 / 35
2. Descriptive Statistics
2.1 Introduction
Introduction
Statistical data, obtained from surveys, experiments, or any series of
measurements, are often so numerous that they are virtually useless
unless they are condensed.
Data should be presented in ways that facilitate their
interpretation and subsequent analysis.
The aspect of statistics which deals with organising, describing and
summarising data is called descriptive statistics.
Essentially, descriptive statistics tools consist of two key methods:
1
numerical methods.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
26 / 35
2. Descriptive Statistics
2.1 Introduction
Types of variables
But first we must discuss variables of which there are two types:
1
Dr Jakub Stoklosa
Semester 2, 2016
27 / 35
2. Descriptive Statistics
2.1 Introduction
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
28 / 35
2. Descriptive Statistics
2.1 Introduction
Numerical variables:
I
discrete: the variable can only take a finite (or countable) number of
distinct values.
Eg. the number of courses you are enrolled in, number of persons
in a household, etc.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
29 / 35
2. Descriptive Statistics
Graphical representations
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
30 / 35
2. Descriptive Statistics
Graphical representations
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
31 / 35
2. Descriptive Statistics
Graphical representations
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
32 / 35
2. Descriptive Statistics
Dotplot
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
33 / 35
2. Descriptive Statistics
Dotplot: example
Example
In 1987, for the first time, physicists observed neutrinos from a supernova
that occurred outside of our solar system. At a site in Kamiokande, Japan,
the following times (in seconds) between neutrinos were recorded:
0.107; 0.196; 0.021; 0.283; 0.179; 0.854; 0.58; 0.19; 7.3; 1.18; 2
Draw a dotplot:
time (s)
Note that the largest observation is extremely different to the others. Such an
observation is called an outlier.
Could be: recording error? missed observations? real observation?
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
34 / 35
2. Descriptive Statistics
Objectives
Objectives
Now you should be able to:
Identify the role that statistics can play in the engineering
problem-solving process.
Discuss how variability affects the data collected and used for
making engineering decisions.
Discuss how probability theory is used in engineering and
science.
Discuss the importance of random sampling.
Understand the importance of graphical representations, construct
and interpret a dotplot.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
35 / 35