Mean
1. For data in a list, the mean is $\bar{x} = \frac{\sum x}{n}$
2. For data in a frequency table, $\bar{x} = \frac{\sum fx}{\sum f}$
3. For grouped data, $\bar{x} = \frac{\sum fx}{\sum f}$, where the x values are the midpoints of the groups
4. To find the x% trimmed mean, find x% of the number of items, then leave off that many values from
both the top and the bottom of the sorted list and average what is left
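The four rules above can be sketched in Python as follows. This is only an illustration: the function names and the data are made up, and rounding the trim count down to a whole number is an assumption, since the sheet does not say how to handle a fractional count.

```python
def mean(values):
    """Mean of a plain list: sum of x divided by n."""
    return sum(values) / len(values)


def frequency_mean(xs, fs):
    """Mean from a frequency table: sum of f*x divided by sum of f.

    For grouped data, pass the class midpoints as xs.
    """
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def trimmed_mean(values, percent):
    """Drop percent% of the items from each end of the sorted list, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # assumption: round down to a whole count
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]    # made-up data
print(mean(data))                            # 10.8
print(trimmed_mean(data, 10))                # 8.25 (the 2 and the 40 are dropped)
print(frequency_mean([1, 2, 3], [2, 5, 3]))  # 2.1
```

Note how the 10% trimmed mean is much less affected by the single large value 40 than the ordinary mean is.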
Median: the middle item when a list is arranged in ascending or descending order.
Mode: the most frequently occurring item; the item with the highest frequency
1. For data in a list, just look for the one with the highest frequency
2. For grouped data, $\text{Mode} = L + \frac{I \times d_1}{d_1 + d_2}$, where
L = lower boundary of modal class;
$d_1$ = modal frequency – frequency of the previous class;
$d_2$ = modal frequency – frequency of the next class;
I = width of modal class
[Figure: Estimating mode from a histogram]
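As a quick check of the grouped-data mode formula, here is a minimal Python sketch. The function name, the class boundaries and the frequencies are all hypothetical.

```python
def grouped_mode(lower_boundary, width, f_prev, f_modal, f_next):
    """Estimate the mode of grouped data: L + I * d1 / (d1 + d2)."""
    d1 = f_modal - f_prev   # modal frequency minus frequency of the previous class
    d2 = f_modal - f_next   # modal frequency minus frequency of the next class
    return lower_boundary + width * d1 / (d1 + d2)


# Hypothetical classes 10-20, 20-30, 30-40 with frequencies 5, 12, 8.
# The modal class is 20-30, so L = 20 and I = 10.
print(grouped_mode(20, 10, 5, 12, 8))   # about 26.36
```

Because d1 = 7 is larger than d2 = 4, the estimate is pulled past the middle of the modal class, towards the neighbouring class with the higher frequency.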
Percentile: the value below which a certain percentage of the group (or distribution) lies.
Applied Math Unit1 Summary and Useful Formulas
Variance
1. Variance is the average of the squared differences between each value and the mean
2. For data in a list, you can use either formula (a) or (b):
a. $Var(X) = \frac{\sum (x - \bar{x})^2}{n}$ (the mean of squared deviations from the mean)
b. $Var(X) = \frac{\sum x^2}{n} - (\bar{x})^2$ (the mean of the squares minus the square of the mean)
3. For data in a frequency table, or in groups, you can use either formula (a) or (b):
a. $Var(X) = \frac{\sum f(x - \bar{x})^2}{\sum f}$
b. $Var(X) = \frac{\sum fx^2}{\sum f} - (\bar{x})^2$
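The sketch below (Python, with made-up data and illustrative function names) shows that formulas (a) and (b) give the same answer, and that the frequency-table version agrees with the list version when the table describes the same data.

```python
def variance_a(values):
    """Formula (a): the mean of squared deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def variance_b(values):
    """Formula (b): the mean of the squares minus the square of the mean."""
    n = len(values)
    m = sum(values) / n
    return sum(x * x for x in values) / n - m ** 2


def frequency_variance(xs, fs):
    """Formula (b) for a frequency table (use midpoints as xs for grouped data)."""
    total = sum(fs)
    m = sum(f * x for x, f in zip(xs, fs)) / total
    return sum(f * x * x for x, f in zip(xs, fs)) / total - m ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]                            # made-up data, mean 5
print(variance_a(data))                                    # 4.0
print(variance_b(data))                                    # 4.0
print(frequency_variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # 4.0
```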
Estimation
A parameter is a numerical value that describes some feature of a population; it is calculated using the values
from the whole population. A statistic, however, is a numerical value that describes some characteristic of a
sample. Usually we cannot use the whole population to do our calculations, so we use sample statistics to get
our estimates for population parameters. Whenever we don't know the true value of a population parameter,
we estimate it.
Parameter                              Estimator
a. Population mean, µ                  $\bar{x}$, the sample mean
b. Population variance, $\sigma^2$     $\frac{n}{n-1} \times$ sample variance
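To illustrate the variance estimator in the table, the Python sketch below multiplies the sample variance (divisor n) by n/(n − 1) and compares the result with `statistics.variance` from the standard library, which divides by n − 1 directly. The data and the function names are made up for illustration.

```python
import statistics


def sample_variance(values):
    """Variance of the sample itself (divisor n)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def estimate_population_variance(values):
    """Estimator from the table: n / (n - 1) times the sample variance."""
    n = len(values)
    return n / (n - 1) * sample_variance(values)


sample = [2, 4, 4, 4, 5, 5, 7, 9]             # made-up sample, n = 8
print(sample_variance(sample))                # 4.0
print(estimate_population_variance(sample))   # about 4.57 (that is, 32/7)
print(statistics.variance(sample))            # same value, computed with divisor n - 1
```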
Probability
Sample space is the set of all possible outcomes of an experiment, and the sum of probabilities in the sample
space always equals 1, or 100%.
1. $P(A) = \frac{\text{no. of ways for A to occur}}{\text{total number of possibilities}}$ or $\frac{\text{number of favourable outcomes of A}}{\text{total number of possible outcomes}}$
2. General Formula: 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
3. Complementary Events 𝑃(𝐴′) = 1 − 𝑃(𝐴)
4. Independent Events 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
5. Mutually Exclusive events cannot occur together, therefore
a. 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵), because
b. 𝑃(𝐴 ∩ 𝐵) = 0
6. Conditional Probability
a. $P(A|B) = \frac{P(A \cap B)}{P(B)}$
b. but if A and B are independent, then $P(A|B) = P(A)$
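The probability rules above can be checked by listing a small sample space. The Python sketch below uses two fair dice; the events A, B and C are chosen purely for illustration and are not from the sheet.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = number of favourable outcomes / total number of outcomes."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))


def a(o):
    return o[0] == 6            # A: the first die shows 6


def b(o):
    return o[0] + o[1] >= 10    # B: the total is at least 10


def c(o):
    return o[1] % 2 == 0        # C: the second die is even


p_a, p_b, p_c = prob(a), prob(b), prob(c)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_or_b = prob(lambda o: a(o) or b(o))
p_a_and_c = prob(lambda o: a(o) and c(o))

print(p_a_or_b == p_a + p_b - p_a_and_b)   # True: general addition formula
print(1 - p_a)                             # 5/6: complement of A
print(p_a_and_b / p_b)                     # 1/2: P(A|B)
print(p_a_and_c == p_a * p_c)              # True: A and C are independent
print(p_a_and_c / p_c == p_a)              # True: so P(A|C) = P(A)
```

Using `Fraction` keeps every probability exact, so the equalities can be tested without rounding error.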
Random Variables: variables whose values depend on the outcome of a random event
Probability Distributions
1. Normal Distribution: standardize your X variables using $Z = \frac{X - \mu}{\sigma}$
2. Binomial Distribution: $P(X = x) = {}^{n}C_{x} \, p^x (1 - p)^{n - x}$
3. Normal Approximation to the Binomial Distribution (when the number of trials n is very large)
Here the lower boundary is the number − 0.5 and the upper boundary is the number + 0.5.
For P(X < a number), use P(X < the lower boundary) and transform to the standard normal variable Z
For P(X > a number), use P(X > the upper boundary) and transform to the standard normal variable Z
For P(X ≤ a number), use P(X < the upper boundary) and transform to the standard normal variable Z
For P(X ≥ a number), use P(X > the lower boundary) and transform to the standard normal variable Z
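The Python sketch below compares an exact binomial probability with its normal approximation for the P(X ≤ a number) case. It assumes the standard results that a binomial variable has mean np and variance np(1 − p), which the sheet does not state; the function names and the values of n, p and a are illustrative only.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf_exact(a, n, p):
    """Exact P(X <= a), found by adding up the binomial probabilities."""
    return sum(binomial_pmf(x, n, p) for x in range(a + 1))


def normal_approx_le(a, n, p):
    """Approximate P(X <= a) using the upper boundary a + 0.5."""
    mu = n * p                       # assumption: binomial mean is np
    sigma = sqrt(n * p * (1 - p))    # assumption: binomial variance is np(1 - p)
    z = (a + 0.5 - mu) / sigma       # standardize with Z = (X - mu) / sigma
    return NormalDist().cdf(z)


print(binomial_pmf(2, 4, 0.5))            # 0.375
exact = binomial_cdf_exact(45, 100, 0.5)
approx = normal_approx_le(45, 100, 0.5)   # z = (45.5 - 50) / 5 = -0.9
print(round(exact, 4), round(approx, 4))
```

With n = 100 trials the two answers agree to about three decimal places, which is why the approximation is recommended only when n is large.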
Distribution of the sample mean $\bar{X}$
By the Central Limit Theorem, regardless of the population that a random variable X comes from, the sample
mean $\bar{X}$ approximately follows a normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$ when the
sample size n is large. As sample size increases, the distribution gets closer and closer to normal.
As usual, if you don't know the true value of $\sigma$ or $\sigma^2$, then calculate the estimator as shown in the table above.
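A small simulation can make the Central Limit Theorem more concrete. The Python sketch below is an illustration only: it draws many samples from a uniform population on [0, 1] (chosen because it is clearly not bell-shaped) and compares the spread of the sample means with $\sigma/\sqrt{n}$. The sample size, the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)   # fixed seed so the run is repeatable

n = 30           # size of each sample
trials = 2000    # number of samples drawn

# Population: uniform on [0, 1], with mean 0.5 and standard deviation sqrt(1/12).
sample_means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

centre = mean(sample_means)               # should be close to the population mean 0.5
spread = pstdev(sample_means)             # observed spread of the sample means
predicted_spread = sqrt(1 / 12) / sqrt(n) # sigma / sqrt(n)

print(round(centre, 3), round(spread, 4), round(predicted_spread, 4))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is flat.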
Confidence Intervals
A confidence interval is a range of values (an interval) that has a known probability of containing the true value
of a population parameter.
Use Z-tables (standard normal distribution) if the sample size is large (n is 30 or more).
Use t-tables with (n – 1) degrees of freedom if the sample size is small (n less than 30) or if standard deviation
is unknown and you have to estimate it.
Confidence interval for the population mean µ: $\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$ or $\bar{X} \pm t \frac{\sigma}{\sqrt{n}}$
Confidence interval for the population proportion P: $p \pm Z \sqrt{\frac{p(1-p)}{n}}$ or $p \pm t \sqrt{\frac{p(1-p)}{n}}$
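The Z versions of both intervals can be sketched in Python as below. Only the Z case is shown because the Python standard library has no t-table function; for small samples the value of t would still come from t-tables. The function names and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, sigma, n, confidence=0.95):
    """Interval xbar +/- Z * sigma / sqrt(n) for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def z_interval_proportion(p, n, confidence=0.95):
    """Interval p +/- Z * sqrt(p(1 - p) / n) for the population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical sample: mean 50, sigma 10, n = 100
low, high = z_interval_mean(50, 10, 100)
print(round(low, 2), round(high, 2))       # 48.04 51.96

# Hypothetical sample proportion 0.4 from n = 150
p_low, p_high = z_interval_proportion(0.4, 150)
print(round(p_low, 4), round(p_high, 4))   # 0.3216 0.4784
```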
Regression equation of y on x: y = a + bx
Gradient of regression line: $b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
Pearson's Correlation Coefficient: $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
If you are asked for the regression line of x on y, just interchange x and y in these formulas.
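The regression and correlation formulas can be applied directly, as in the Python sketch below. The sheet gives no formula for the intercept a; the sketch assumes the standard result $a = \bar{y} - b\bar{x}$. The data and the function names are made up.

```python
from math import sqrt


def regression_y_on_x(xs, ys):
    """Return (a, b) for the line y = a + bx."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n   # assumption: a = mean of y minus b times mean of x
    return a, b


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the summation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
print(round(a, 4), round(b, 4))       # 2.2 0.6
print(round(pearson_r(xs, ys), 4))    # 0.7746

# Regression line of x on y: interchange x and y.
a_xy, b_xy = regression_y_on_x(ys, xs)
```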
Hypothesis tests
If the value of your test statistic falls inside the rejection region (critical region), then your conclusion is to
reject the null hypothesis in favour of the alternative hypothesis.
If the value of your test statistic falls outside the rejection region (critical region), then your conclusion is
that there is not enough evidence to reject the null hypothesis, so you do not reject it.
1. Use z tables or t tables as appropriate to find the critical value that marks the edge of the rejection region
3. The test statistic is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, or of course $T = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
As usual, if you can find the true value for 𝜎, then use it. If you can’t find the true value, use the
estimator.
4. The test statistic for the population proportion is $\frac{p - P}{\sqrt{\frac{p(1-p)}{n}}}$
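The decision rule and the Z test statistic for a mean can be put together as in the Python sketch below. It assumes a two-tailed test at the 5% level, which the sheet does not specify; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of the null hypothesis that the population mean is mu0.

    Returns (test statistic, critical value, whether to reject the null).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))           # Z = (Xbar - mu) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2) # about 1.96 when alpha = 0.05
    reject = abs(z) > critical                     # inside the rejection region?
    return z, critical, reject


# Hypothetical data: sample mean 52 from n = 100, sigma = 10, testing mu = 50
z, critical, reject = z_test_mean(52, 50, 10, 100)
print(z, round(critical, 2), reject)   # 2.0 1.96 True
```

Here the test statistic 2.0 lies beyond the critical value 1.96, so it falls inside the rejection region and the null hypothesis is rejected.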