
Applied Math Unit1 Summary and Useful Formulas

Frequency: number of times an item occurs

Class width: upper boundary minus lower boundary

Frequency density: frequency ÷ class width


Relative Frequency or Proportion: $\dfrac{\text{frequency of an item}}{\text{total frequency}}$ (also used as probability)

Mean
1. For data in a list, the mean is $\bar{x} = \dfrac{\sum x}{n}$
2. For data in a frequency table, $\bar{x} = \dfrac{\sum fx}{\sum f}$
3. For grouped data, $\bar{x} = \dfrac{\sum fx}{\sum f}$, where the x values are the midpoints of the groups

midpoint = (upper limit + lower limit) ÷ 2

4. To find the x% trimmed mean, find x% of the number of items, then leave off that many values from
both the top and the bottom of the ordered list and average what is left
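The four ways of finding a mean described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and rounding the number of trimmed values down to a whole number is an assumption, since the summary does not say how to round.

```python
def list_mean(values):
    """Mean of a plain list: sum of x divided by n."""
    return sum(values) / len(values)


def table_mean(xs, freqs):
    """Mean from a frequency table: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(xs, freqs)) / sum(freqs)


def grouped_mean(classes, freqs):
    """Mean of grouped data, using each class midpoint as its x value."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return table_mean(midpoints, freqs)


def trimmed_mean(values, percent):
    """Drop percent% of the items from each end, then average the rest.

    Assumption: the number of items to drop is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)
```

For example, a 10% trimmed mean of ten values drops one value from each end, which removes the influence of a single extreme item.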

Median: the middle item when a list is arranged in ascending or descending order.

1. For items in a list, the position of median is (n + 1)÷2


2. On a graph, the median is the x-value that corresponds to 50% of the cumulative frequency
3. For grouped data, $\text{median} = L + \dfrac{0.5N - F}{f} \times I$, where
L = lower boundary of the median class; I = class width; N = total frequency;
F = cumulative frequency before the median class; f = frequency of the median class
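As a rough illustration of the grouped-data median formula, the sketch below first locates the median class (the first class whose cumulative frequency reaches 0.5N) and then applies the formula. The function name and the sample classes are made up for the example.

```python
def grouped_median(classes, freqs):
    """Estimate the median of grouped data.

    classes: list of (lower boundary, upper boundary) pairs in ascending order
    freqs:   frequency of each class
    """
    total = sum(freqs)                     # N
    if total == 0:
        raise ValueError("total frequency must be positive")
    half = 0.5 * total
    cumulative = 0                         # F, cumulative frequency so far
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:         # this is the median class
            width = upper - lower          # I
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical data: classes 0-10, 10-20, 20-30 with frequencies 5, 10, 5
print(grouped_median([(0, 10), (10, 20), (20, 30)], [5, 10, 5]))
```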

Mode: the most frequently occurring item; the item with the highest frequency
[Figure: Estimating mode from a histogram]

1. For data in a list, just look for the one with the highest frequency
2. For grouped data, $\text{mode} = L + \dfrac{I \times d_1}{d_1 + d_2}$, where
L = lower boundary of modal class;
d1 = modal frequency – frequency of the previous class;
d2 = modal frequency – frequency of the next class;
I = width of modal class
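The grouped-data mode formula can be tried out with a short sketch like the one below. Treating a missing previous or next class as having frequency 0 is an assumption made here so that the first or last class can be the modal class; the function name and data are hypothetical.

```python
def grouped_mode(classes, freqs):
    """Estimate the mode of grouped data using L + I*d1 / (d1 + d2)."""
    m = freqs.index(max(freqs))            # position of the modal class
    lower, upper = classes[m]
    # Assumption: a class that does not exist counts as frequency 0.
    previous_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - previous_f
    d2 = freqs[m] - next_f
    if d1 + d2 == 0:
        raise ValueError("all frequencies are equal, so there is no single mode")
    return lower + (upper - lower) * d1 / (d1 + d2)


# Hypothetical data: classes 0-10, 10-20, 20-30 with frequencies 5, 10, 7
print(grouped_mode([(0, 10), (10, 20), (20, 30)], [5, 10, 7]))
```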

Range and Quartiles

1. Range = highest value – lowest value


2. Lower quartile, Q1 = the value below which 25% of the group lies
3. Upper quartile, Q3 = the value below which 75% of the group lies
4. Interquartile range, IQR = Q3 – Q1
5. Semi-interquartile range = half of the IQR

Percentile: the value below which a certain percentage of the group (or distribution) lies.
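The measures of spread above can be computed with Python's standard `statistics` module, as in this sketch. Note that there are several conventions for locating quartiles in a small list; the code uses the module's default (`exclusive`) method, which may differ slightly from the method taught in class.

```python
from statistics import quantiles


def spread_summary(values):
    """Return the range, quartiles, IQR and semi-IQR of a list of values."""
    ordered = sorted(values)
    # n=4 cuts the data into quarters and returns [Q1, Q2, Q3].
    q1, q2, q3 = quantiles(ordered, n=4)
    iqr = q3 - q1
    return {
        "range": ordered[-1] - ordered[0],
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "IQR": iqr,
        "semi_IQR": iqr / 2,
    }


print(spread_summary([1, 2, 3, 4, 5, 6, 7]))
```

Percentiles work the same way: `quantiles(values, n=100)` returns the 99 cut points that split the data into hundredths.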

Variance

1. Variance is the average of the squared differences between each value and the mean
2. For data in a list, you can use either formula (a) or (b):
a. $\mathrm{Var}(X) = \dfrac{\sum (x - \bar{x})^2}{n}$ (the mean of squared deviations from the mean)
b. $\mathrm{Var}(X) = \dfrac{\sum x^2}{n} - (\bar{x})^2$ (the mean of the squares minus the square of the mean)

3. For data in a frequency table, or in groups, you can use either formula (a) or (b):
a. $\mathrm{Var}(X) = \dfrac{\sum f(x - \bar{x})^2}{\sum f}$
b. $\mathrm{Var}(X) = \dfrac{\sum fx^2}{\sum f} - (\bar{x})^2$
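A quick way to convince yourself that formulas (a) and (b) agree is to compute both on the same data. The sketch below does this for a plain list and for a frequency table; the function names and numbers are illustrative only.

```python
def variance_a(values):
    """Formula (a): mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_b(values):
    """Formula (b): mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x ** 2 for x in values) / n - mean ** 2


def table_variance(xs, freqs):
    """Formula (b) for a frequency table (or grouped data with midpoints)."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / total
    return sum(f * x ** 2 for x, f in zip(xs, freqs)) / total - mean ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_a(data), variance_b(data))
```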

Estimation

A parameter is a numerical value that describes some feature of a population; it is calculated using the values
from the whole population. A statistic, however, is a numerical value that describes some characteristic of a
sample. Usually we cannot use the whole population in our calculations, so we use sample statistics to
estimate population parameters. Whenever we don't know the true value of a population parameter, we
estimate it.

Parameter                                   Estimator
a. Population mean, µ                       $\bar{x}$, the sample mean
b. Population variance, $\sigma^2$          $\dfrac{n}{n-1} \times$ sample variance
c. Population standard deviation, σ         Square root of estimated variance
d. Population proportion, P                 Sample proportion, $p = \dfrac{\text{frequency of an item}}{\text{total frequency}}$
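The estimator for the population variance can be checked numerically. In this sketch, "sample variance" means the variance of the sample computed with a divisor of n; multiplying it by n/(n − 1) should match what Python's `statistics.variance` returns, since that function divides by n − 1. The function names are hypothetical.

```python
from math import sqrt
from statistics import pvariance, variance


def estimate_population_variance(sample):
    """n/(n-1) times the variance of the sample (divisor n)."""
    n = len(sample)
    return n / (n - 1) * pvariance(sample)


def estimate_population_sd(sample):
    """Square root of the estimated population variance."""
    return sqrt(estimate_population_variance(sample))


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(estimate_population_variance(sample), variance(sample))
```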

Probability

Sample space is the set of all possible outcomes of an experiment, and the sum of the probabilities of all
outcomes in the sample space always equals 1, or 100%.
1. $P(A) = \dfrac{\text{no. of ways for } A \text{ to occur}}{\text{total number of possibilities}}$ or $\dfrac{\text{number of favourable outcomes of } A}{\text{total number of possible outcomes}}$
2. General Formula: 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
3. Complementary Events 𝑃(𝐴′ ) = 1 − 𝑃(𝐴)
4. Independent Events 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
5. Mutually Exclusive events cannot occur together, therefore…
a. 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) 𝑏𝑒𝑐𝑎𝑢𝑠𝑒
b. 𝑃(𝐴 ∩ 𝐵) = 0
6. Conditional Probability
a. $P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
b. but if A and B are independent, then $P(A|B) = P(A)$
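The probability rules above can be verified on a small sample space. The sketch below uses one roll of a fair die as a made-up example, with events represented as sets of outcomes and probabilities kept as exact fractions.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair die.
SAMPLE_SPACE = frozenset(range(1, 7))


def prob(event):
    """P(event) = favourable outcomes / total possible outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)


A = frozenset({2, 4, 6})   # "the roll is even"
B = frozenset({4, 5, 6})   # "the roll is greater than 3"

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
union_by_formula = prob(A) + prob(B) - prob(A & B)
print(prob(A | B), union_by_formula, conditional(A, B))
```

Here P(A|B) is 2/3 while P(A) is 1/2, so these two events are not independent.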

Random Variables: variables whose values depend on the outcome of a random event

If X is a random variable, then

1. The sum of probabilities is 1.


2. The Total Area under the graph of its density function is 1.
3. Mean µ, or Expected Value 𝐸(𝑋) is calculated as ∑ 𝑥𝑃(𝑋 = 𝑥)
4. Variance $\sigma^2$, or $\mathrm{Var}(X)$, is calculated as $\sum x^2 P(X = x) - \mu^2$
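For a discrete random variable, these properties are easy to check by hand or in code. The sketch below stores a distribution as a dictionary mapping each value x to P(X = x); the fair die is just an illustrative choice.

```python
from fractions import Fraction


def expected_value(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of x^2 * P(X = x), minus mu^2."""
    mu = expected_value(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2


# Hypothetical example: score on one roll of a fair die.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

print(sum(fair_die.values()))    # the probabilities sum to 1
print(expected_value(fair_die))  # 7/2
print(variance(fair_die))        # 35/12
```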

Probability Distributions
1. Normal Distribution Standardize your X variables using $Z = \dfrac{X - \mu}{\sigma}$
2. Binomial Distribution $P(X = x) = {}^{n}C_{x}\, p^x (1 - p)^{n - x}$

3. Normal Approximation to the Binomial Distribution (when the number of trials n is very large)

Apply a continuity correction: the lower boundary of a number is that number – 0.5, and the upper boundary is that number + 0.5.

For P(X < a number), use P(X < the lower boundary) and transform to the standard normal distribution Z
For P(X > a number), use P(X > the upper boundary) and transform to the standard normal distribution Z
For P(X ≤ a number), use P(X < the upper boundary) and transform to the standard normal distribution Z
For P(X ≥ a number), use P(X > the lower boundary) and transform to the standard normal distribution Z
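To see how well the normal approximation works, the sketch below compares an exact binomial probability with the approximate one for P(X ≤ k). It relies on the standard binomial results µ = np and σ = √(np(1 − p)), which are not stated in this summary, and on a boundary 0.5 above k. The function names and the values n = 100, p = 0.5 are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(k, n, p):
    """Exact P(X <= k), found by adding the individual probabilities."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using P(X < upper boundary of k)."""
    mu = n * p                       # assumed: mean of a binomial
    sigma = sqrt(n * p * (1 - p))    # assumed: sd of a binomial
    z = (k + 0.5 - mu) / sigma       # standardize the upper boundary
    return NormalDist().cdf(z)


print(binomial_cdf(55, 100, 0.5), normal_approx_cdf(55, 100, 0.5))
```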

Distribution of the sample mean $\bar{X}$

By the Central Limit Theorem, regardless of the population that a random variable X comes from, the sample
mean $\bar{X}$ approximately follows a normal distribution when the sample size is large. As sample size increases,
the distribution gets closer and closer to normal.

1. The expected value (or mean) of 𝑋̅ is 𝜇 , same as the original population.


2. The variance of the sample mean $\bar{X}$ is $\dfrac{\sigma^2}{n}$.
3. The standard deviation of $\bar{X}$ is $\dfrac{\sigma}{\sqrt{n}}$. Of course, this is just the square root of its variance.

As usual, if you don’t know the true value of 𝜎 or 𝜎 2 , then calculate the estimator as shown in the table above.
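A small simulation can make the results for the sample mean more concrete. The sketch below repeatedly draws samples from a uniform population on the interval 0 to 1 (mean 0.5, variance 1/12), which is an arbitrary choice made for this example, and compares the mean and variance of the sample means with µ and σ²/n. The seed, sample size and number of samples are also arbitrary.

```python
import random
from statistics import mean, pvariance


def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw many samples from a uniform(0, 1) population; return their means."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


n = 25
means = simulate_sample_means(n, 4000)
print("mean of sample means:", mean(means), "expected about", 0.5)
print("variance of sample means:", pvariance(means), "expected about", (1 / 12) / n)
```

The simulated values will not match exactly, but they should be close to µ and σ²/n.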

Confidence Intervals

A confidence interval is a range of values (an interval) that has a known probability of containing the true value
of a population parameter.

Use Z-tables (standard normal distribution) if the sample size is large (n is 30 or more).

Use t-tables with (n – 1) degrees of freedom if the sample size is small (n less than 30) or if standard deviation
is unknown and you have to estimate it.
Confidence interval for the population mean µ: $\bar{X} \pm Z\dfrac{\sigma}{\sqrt{n}}$ or $\bar{X} \pm t\dfrac{\sigma}{\sqrt{n}}$

Confidence interval for the population proportion P: $p \pm Z\sqrt{\dfrac{p(1-p)}{n}}$ or $p \pm t\sqrt{\dfrac{p(1-p)}{n}}$
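The Z-based intervals can be sketched as follows. Instead of reading Z from a table, the code asks Python's `NormalDist` for the value that leaves the right amount of probability in each tail (about 1.96 for 95% confidence). The function names and numbers are hypothetical; the t-based versions would need a t-table or an external library and are not shown.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Z such that the middle `confidence` of the standard normal lies within +/- Z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Sample mean +/- Z * sigma / sqrt(n)."""
    margin = z_value(confidence) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def proportion_confidence_interval(p, n, confidence=0.95):
    """Sample proportion +/- Z * sqrt(p(1 - p) / n)."""
    margin = z_value(confidence) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


print(mean_confidence_interval(50, 10, 100))
print(proportion_confidence_interval(0.4, 100))
```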

Correlation and Regression

Regression equation of y on x: $y = a + bx$

Gradient of regression line: $b = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$

Pearson's Correlation Coefficient: $r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$

If you are asked for the regression line of x on y, just interchange x and y in these formulas.
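The regression and correlation formulas translate directly into code. In the sketch below, the intercept is found from a = ȳ − b·x̄, which is the usual least-squares result but is not stated in this summary, so treat it as an assumption. The data values are made up.

```python
from math import sqrt


def regression_y_on_x(xs, ys):
    """Return (a, b) for the line y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x ** 2 for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n    # assumed: a = mean of y - b * mean of x
    return a, b


def pearson_r(xs, ys):
    """Pearson's correlation coefficient."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x ** 2 for x in xs)
    sum_y2 = sum(y ** 2 for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(regression_y_on_x(xs, ys), pearson_r(xs, ys))
```

Calling `regression_y_on_x(ys, xs)` with the arguments swapped gives the regression line of x on y.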

Hypothesis tests

If the value of your test statistic falls inside the rejection region (critical region), then your conclusion is to
reject the null hypothesis in favour of the alternative hypothesis.

If the value of your test statistic falls outside the rejection region (critical region), then your conclusion is to
not reject the null hypothesis: there is insufficient evidence to support the alternative hypothesis.

Hypothesis test for the population mean, µ

1. Use z tables or t tables as appropriate to find the critical value(s) that mark the rejection region

2. The null hypothesis is H0 : µ = some stated value, k

The alternative hypothesis is Ha : µ ≠ k (two tailed test) or
Ha : µ < k (one tailed test) or
Ha : µ > k (one tailed test)

3. The test statistic is $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, or of course $T = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
As usual, if you can find the true value for σ, then use it. If you can't find the true value, use the
estimator.
4. The test statistic for the population proportion is $\dfrac{p - P}{\sqrt{\dfrac{p(1-p)}{n}}}$
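A two-tailed Z test for the population mean might be sketched as below. The critical value is taken from Python's `NormalDist` rather than a printed table, and the function name, the 5% significance level and the numbers are all illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, k, sigma, n, alpha=0.05):
    """Two-tailed test of H0: mu = k against Ha: mu != k.

    Returns the test statistic and whether H0 is rejected.
    """
    z = (sample_mean - k) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    reject_h0 = abs(z) > critical                    # inside the rejection region?
    return z, reject_h0


# Hypothetical numbers: sample mean 52 from n = 100, sigma = 10, H0: mu = 50
print(z_test_mean(52, 50, 10, 100))
```

For a one-tailed test, the whole of alpha goes into a single tail, so the critical value and the comparison change accordingly.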

Chi squared test for independence

1. The null hypothesis is H0: The two variables are independent

The alternative hypothesis is H1: The two variables are not independent
2. Expected frequency of a value in the contingency table is $\dfrac{\text{Row Total} \times \text{Column Total}}{\text{Overall Total}}$ or $\dfrac{R \times C}{T}$
3. Degrees of freedom is (#rows – 1) × (#columns – 1)
4. The test statistic is $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
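The steps of the chi-squared calculation can be followed in a short sketch like this one. The contingency table is invented for the example, and the resulting statistic would still need to be compared with a critical value from chi-squared tables at the given degrees of freedom.

```python
def chi_squared(observed):
    """Return (test statistic, degrees of freedom) for a contingency table.

    observed: list of rows, each row a list of observed frequencies O.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    overall_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency E = row total * column total / overall total
            e = row_totals[i] * column_totals[j] / overall_total
            statistic += (o - e) ** 2 / e

    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 2 table of observed frequencies
print(chi_squared([[10, 20], [20, 10]]))
```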
