
Chapter 1 - Introduction to Inferential Statistics

Recall:
Descriptive Statistics - an area in Applied Statistics that includes methods used to organize, summarize, and present data.
Inferential Statistics - an area in Applied Statistics that includes methods used to make generalizations about some characteristics of the population based on information contained in the sample.
Estimation - inference about a parameter is made by finding a single value or a range of values computed from the sample data that may be used to make a statement about the unknown value of the parameter.
Point Estimator - a single statistic whose realized value is used to estimate the true but unknown value of the parameter.
Interval Estimator - a rule that tells how to calculate the limits, based on sample data, that will form an interval within which the parameter is expected to lie with a specified degree of confidence.
Hypothesis Testing - inference about the population (or the parameter, in particular) is made by assessing whether or not the sample data support an assertion made about the true value of a parameter.
Population - a collection of items of interest in a statistical study.
Sample - a subset of items that have been selected from the population.
Parameter - a summary measure describing some characteristic of the population.
Statistic - a summary measure describing some characteristic of the sample.
Sampling Distribution - the probability (or relative frequency) distribution of a statistic in repeated sampling from the same population.

Theorems on Sampling Distributions:
1. If all possible random samples of size n are drawn with replacement from a finite population of size N with mean μ and standard deviation σ, then the sampling distribution of the sample mean has
   mean μx̄ = μ and standard deviation σx̄ = σ/√n
2. If all possible random samples of size n are drawn without replacement from a finite population of size N with mean μ and standard deviation σ, then
   mean μx̄ = μ and standard deviation σx̄ = (σ/√n)·√((N − n)/(N − 1))
   The factor √((N − n)/(N − 1)) is called the finite population correction factor.

Remarks on Sampling Distributions:
1. A statistic is a variable whose value depends only on the observed sample and may vary from sample to sample.
2. The sampling distribution of a statistic will depend on the size of the population, the size of the sample, and the method of choosing the sample.
3. The standard deviation of the sampling distribution is called the standard error; it tells us the extent to which we expect the values of the statistic to vary across different possible samples.
4. If the mean of the sampling distribution of some statistic is equal to the parameter of interest, then we say that the statistic is an unbiased estimator of that parameter.
5. A parameter may have many unbiased estimators; however, an ideal estimator of a parameter is one that has the smallest standard error among all unbiased estimators.
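The two theorems can be checked by brute-force simulation. The sketch below (Python; the five-element population and sample size are invented for illustration) draws many samples of size n with and without replacement and compares the observed spread of the sample means against σ/√n and its finite-population-corrected version:

```python
import random
import statistics
from math import sqrt

random.seed(0)

# A small finite population (N = 5), made up for the example.
population = [2, 4, 6, 8, 10]
N = len(population)
sigma = statistics.pstdev(population)  # population standard deviation

n = 2  # sample size
# Theoretical standard errors from Theorems 1 and 2:
se_with = sigma / sqrt(n)
se_without = (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))  # finite population correction

# Approximate each sampling distribution by repeated sampling.
means_with = [statistics.fmean(random.choices(population, k=n)) for _ in range(50_000)]
means_without = [statistics.fmean(random.sample(population, n)) for _ in range(50_000)]

print(round(se_with, 3), round(statistics.pstdev(means_with), 3))
print(round(se_without, 3), round(statistics.pstdev(means_without), 3))
```

The simulated standard deviations of the sample means should land close to the theoretical standard errors, with the without-replacement value visibly smaller because of the correction factor.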

Stat 115 Page 1

Chapter 1 - The Standard Normal Distribution


The Standard Normal Curve
A random variable X that follows a normal distribution with parameters μ (mean) and σ² (variance) is written as X ~ N(μ, σ²). The standard normal curve is the special case Z ~ N(0, 1), denoted by its probability density function
  f(z) = (1/√(2π)) e^(−z²/2)

Normal curves are important in Statistics because:
- The distribution of many characteristics of interest in research closely approximates that of the normal curve.
- Many classical inferential methods require that the population is (at least approximately) normally distributed.
- The sampling distributions of many sample statistics can be described quite well by the normal curve.

Characteristics of the Normal Curve:
- It is bell-shaped; however, the converse is not true (not every bell-shaped curve is normal).
- It is symmetric with respect to the y-axis (in general, with respect to the vertical line through the mean).
- It is asymptotic to the x-axis.
- The area under the whole curve is 1.
- The mean determines the position of the curve, while the standard deviation determines its spread (and hence the height of its peak).
- All of the measures of central tendency (mean, median, mode) lie together at the absolute maximum.

Standard Scores and Probability
Standard scores, or Z-scores, express the observations in standard deviation units:
  Z = (X − μ)/σ or z = (x − x̄)/s
The areas above and below a z-score can be expressed as P(Z > z) and P(Z < z), respectively:
  for above, P(Z > z) = 1 − Φ(z); for below, P(Z < z) = Φ(z),
where Φ is the cumulative distribution function of the standard normal. Standard scores can be expressed as kth percentiles, such that P(Z < z) = k/100.
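For concreteness, here is how these conversions look in code. The mean, standard deviation, and observation are invented for the example, and Python's `statistics.NormalDist` plays the role of the standard normal table:

```python
from statistics import NormalDist

# Illustrative values (assumed): scores with mean 70, standard deviation 8.
mu, sigma = 70, 8
x = 82
z = (x - mu) / sigma  # standard score: 1.5 standard deviations above the mean

std = NormalDist()        # the standard normal, N(0, 1)
below = std.cdf(z)        # area below z: P(Z < z)
above = 1 - std.cdf(z)    # area above z: P(Z > z)

# kth percentile: the z-value with k% of the area below it.
z_90 = std.inv_cdf(0.90)  # 90th percentile of Z

print(round(z, 2), round(below, 4), round(above, 4), round(z_90, 3))
```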

Central Limit Theorem (CLT)
States that when the sample size is sufficiently large (in practice, usually at least 30), the normal distribution can be used to approximate the sampling distribution of the sample mean. It gives way for the use of the Theorems on Sampling Distributions when the sample size is sufficiently large. It also gives a guarantee that the standard error is reduced as the sample size increases, and that the sampling distribution approaches the normal distribution. It holds for populations that are symmetrically or asymmetrically distributed.

De Moivre–Laplace Theorem
Discovered by the mathematicians Abraham de Moivre and Pierre-Simon Laplace, it states that a random variable that follows a Binomial distribution with parameters n and p will approximate the normal distribution for a sufficiently large sample size. It gives way for the use of the Central Limit Theorem (CLT) for different sampling distributions, including the Bernoulli distribution. The random variable can be standardized, such that
  Z = (X − np)/√(np(1 − p))
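A small simulation illustrates the guarantee about the standard error: even for a strongly skewed (exponential) population, the sampling distribution of the mean tightens like σ/√n as n grows. The population and replication counts below are arbitrary choices for the sketch:

```python
import random
import statistics
from math import sqrt

random.seed(1)

# A strongly skewed population: exponential with mean 1 and sd 1 (chosen for illustration).
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

for n in (5, 30, 100):
    means = [sample_mean(n) for _ in range(20_000)]
    # CLT: the standard error should be close to sigma/sqrt(n) = 1/sqrt(n).
    print(n, round(statistics.pstdev(means), 3), round(1 / sqrt(n), 3))
```

The observed spread of the sample means tracks 1/√n closely at every n, and a histogram of the n = 30 or n = 100 means would already look approximately bell-shaped despite the skewed population.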


Chapter 2 - Inferences for μ


Interval Estimation
A (1 − α)100% confidence interval is generally constructed, where (1 − α) is the confidence coefficient. The confidence coefficient is not the probability that the true value of the parameter falls in the interval estimate, but rather, the probability that the interval estimator encloses the true value of the parameter. A narrow confidence interval brings us closer to locating the parameter, while a greater confidence coefficient gives a better guarantee that the confidence interval encloses the true value of the parameter. A good confidence interval is one that is as narrow as possible and has a large confidence coefficient.

Interval Estimators for μ (Assumptions → Confidence Interval):
1. σ is known: x̄ ± z_{α/2} · σ/√n
2. σ is unknown (population approximately normal): x̄ ± t_{α/2, n−1} · s/√n
3. n > 30: x̄ ± z_{α/2} · s/√n
Note: In Case 2 the critical value comes from the t-distribution with n − 1 degrees of freedom.

Margin of Error and the Sample Size for Estimating μ
The margin of error is the maximum distance from the sample mean to the population mean at the stated confidence level. The analytical margin of error is the absolute value that is added to and subtracted from the sample mean in the confidence interval. If the sample mean will be used to estimate the population mean, there is (1 − α)100% confidence that the error will not exceed a specified amount, e, when
  n = (z_{α/2} · σ / e)²
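Case 1 (σ known) and the sample-size formula can be sketched as follows; the data, σ, and target error e are all invented for illustration:

```python
from statistics import NormalDist, fmean
from math import sqrt, ceil

# Illustrative sample (invented numbers).
data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
n, xbar = len(data), fmean(data)

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96

# Case 1: sigma known (sigma = 0.3 is an assumption for the example).
sigma = 0.3
moe = z * sigma / sqrt(n)                 # analytical margin of error
ci = (xbar - moe, xbar + moe)

# Sample size so the error does not exceed e = 0.1: n = (z*sigma/e)^2, rounded up.
e = 0.1
n_needed = ceil((z * sigma / e) ** 2)

print(tuple(round(v, 3) for v in ci), n_needed)
```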

Hypothesis Testing
A statistical hypothesis is an assertion/conjecture concerning one or more populations.
Null Hypothesis (Ho) - the hypothesis that is being tested.
Alternative Hypothesis (Ha) - the contradiction of the null hypothesis.
One-Tailed Test - a test where the alternative hypothesis specifies a one-directional difference for the parameter of interest.
Two-Tailed Test - a test where the alternative hypothesis does not specify a directional difference for the parameter of interest.
A test statistic is a statistic computed from sample measurements that is especially sensitive to the differences between the hypotheses. The critical region or region of rejection is the set of values of the test statistic for which the null hypothesis will be rejected.
Critical Value - the value that separates the critical region from the region of acceptance.
Type I and Type II Errors - the errors made by rejecting the null hypothesis when it is true, and accepting the null hypothesis when it is false, respectively.
Level of Significance - denoted by α, it is the maximum probability of Type I Error the researcher is willing to commit.

The P-value
The smallest value of the level of significance for which the null hypothesis will be rejected. If the level of significance is at least the P-value, then the null hypothesis is rejected. Since it is hard to compute the exact P-value when the t-Test is used, a P-value interval is computed instead, that is, the pair of tabled values that encloses the test statistic value.

Tests for μ (Ho: μ = μ0):
Case 1. σ is known — Test statistic: Z = (x̄ − μ0)/(σ/√n)
  Ha: μ < μ0 → reject Ho if Z < −z_α
  Ha: μ > μ0 → reject Ho if Z > z_α
  Ha: μ ≠ μ0 → reject Ho if |Z| > z_{α/2}
Case 2. σ is unknown — Test statistic: t = (x̄ − μ0)/(s/√n), with n − 1 degrees of freedom; the critical regions are analogous, using t_α and t_{α/2, n−1}.
Case 3. n > 30 — Test statistic: Z = (x̄ − μ0)/(s/√n), with the same critical regions as Case 1.
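A worked Case 1 example (all numbers invented) shows that the critical-value rule and the P-value rule agree:

```python
from statistics import NormalDist
from math import sqrt

# Two-tailed Z-test of Ho: mu = 50 vs Ha: mu != 50 (illustrative numbers).
mu0, sigma, n, xbar, alpha = 50, 10, 36, 53.2, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))            # test statistic
crit = NormalDist().inv_cdf(1 - alpha / 2)      # critical value z_{alpha/2}
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed P-value

reject = abs(z) > crit   # equivalently: reject iff alpha >= p_value
print(round(z, 3), round(crit, 3), round(p_value, 4), reject)
```

Here |z| = 1.92 falls just short of the critical value, and correspondingly the P-value slightly exceeds α, so both rules fail to reject Ho.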


Chapter 3 - Inferences for Population Proportions


Population Proportion - denoted by p or P, it is the proportion of the elements of a population that have the particular characteristic of interest.

Interval Estimation of the Population Proportion
The interval estimator is used under the assumptions that:
- The random sample size ultimately determined is reasonably large.
- The population is either infinite or, if finite, is large relative to the resulting sample size, that is, the sampling fraction n/N is at most 5%.
The approximate (1 − α)100% confidence interval for the population proportion is:
  p ± z_{α/2} √(p(1 − p)/n), where p is the sample proportion
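A short sketch of the interval, with invented survey counts (x successes out of n):

```python
from statistics import NormalDist
from math import sqrt

# Illustrative survey data (assumed): 120 successes in a sample of 400.
x, n, alpha = 120, 400, 0.05
p = x / n                                # sample proportion
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}

se = sqrt(p * (1 - p) / n)               # estimated standard error of p
ci = (p - z * se, p + z * se)
print(round(p, 3), tuple(round(v, 4) for v in ci))
```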

Sample Size for Estimating P
If p will be used to estimate P, then there is (1 − α)100% confidence that the error will not exceed a specified amount, e, when
  n = z²_{α/2} P(1 − P)/e²
When the value of P is unknown or cannot be approximated, then using P = 0.5 produces a maximum value of P(1 − P); hence, a conservative formula for the sample size is
  n = z²_{α/2}/(4e²)
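Both formulas in code, with an invented prior guess P = 0.3 and target error e = 0.03, to show how much larger the conservative answer is:

```python
from statistics import NormalDist
from math import ceil

alpha, e = 0.05, 0.03
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96

# With a prior guess P = 0.3 (an assumption for the example):
n_guess = ceil(z**2 * 0.3 * 0.7 / e**2)
# Conservative (P unknown, use P = 0.5, which maximizes P(1 - P)):
n_conservative = ceil(z**2 * 0.25 / e**2)
print(n_guess, n_conservative)
```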

Hypothesis Testing for the Population Proportion (Ho: P = P0)
Test Statistic:
  Z = (p − P0)/√(P0(1 − P0)/n)
Region of Rejection:
  Ha: P < P0 → reject Ho if Z < −z_α
  Ha: P > P0 → reject Ho if Z > z_α
  Ha: P ≠ P0 → reject Ho if |Z| > z_{α/2}
Remarks: This test is approximately an α-level test when the sample size is large enough. A rule of thumb for determining whether the sample size is large enough is as follows: the test is valid if the products of the sample size with the population proportion and with its complement, nP0 and n(1 − P0), are each at least 5.
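A one-tailed example with invented counts, including the rule-of-thumb validity check:

```python
from statistics import NormalDist
from math import sqrt

# Ho: P = 0.5 vs Ha: P > 0.5 (illustrative data: 58 successes in 100 trials).
x, n, p0, alpha = 58, 100, 0.5, 0.05
p = x / n

# Rule-of-thumb validity check: n*P0 and n*(1 - P0) are both at least 5.
assert n * p0 >= 5 and n * (1 - p0) >= 5

z = (p - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical value z_alpha
print(round(z, 2), round(crit, 3), z > crit)
```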


Chapter 4 - Inferences about μ1 − μ2 and P1 − P2: Interval Estimators


Independent Samples Design - a study design in which statistically independent samples are drawn from the populations, and comparative information about the populations is derived from a comparison of the independent samples.
Matched Samples Design - a study design in which the elements of the samples drawn from the two populations are carefully matched in pairs so that the 2 elements in each pair are as similar as possible with respect to the characteristics that might be related to the variable of interest in the study.

Interval Estimators for μ1 − μ2 for Independent Samples (Assumptions → Confidence Interval):
1. The variances σ1² and σ2² are known (or both sample sizes are greater than 30, but replace σ² with s²):
  (x̄1 − x̄2) ± z_{α/2} √(σ1²/n1 + σ2²/n2)
2. The variances are equal but unknown (sample sizes at most 30):
  (x̄1 − x̄2) ± t_{α/2, n1+n2−2} · s_p √(1/n1 + 1/n2), where s_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) is the pooled variance
3. The variances are unequal and unknown:
  (x̄1 − x̄2) ± t_{α/2, ν} √(s1²/n1 + s2²/n2), with degrees of freedom ν given by the Satterthwaite approximation
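Case 2 (pooled variance) can be sketched as below. The two samples are invented, and the critical value t_{0.025, 10} = 2.228 is taken from a t-table rather than computed, since the standard library has no t-distribution:

```python
from statistics import fmean, stdev
from math import sqrt

# Illustrative independent samples (invented numbers), assumed to have equal variances.
x1 = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
x2 = [9.5, 9.7, 9.3, 9.9, 9.6, 9.4]
n1, n2 = len(x1), len(x2)

# Pooled variance and standard error of the difference in means.
sp2 = ((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = 2.228  # t_{0.025, 10} from a t-table (df = n1 + n2 - 2 = 10)
diff = fmean(x1) - fmean(x2)
ci = (diff - t_crit * se, diff + t_crit * se)
print(round(diff, 3), tuple(round(v, 3) for v in ci))
```

With these numbers both endpoints come out positive, so by the interpretation rules later in this chapter we would conclude μ1 > μ2.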

Interval Estimator for μD = μX − μY for Matched Samples:
  d̄ ± t_{α/2, n−1} · s_d/√n, where d̄ and s_d are the mean and standard deviation of the n paired differences

Interval Estimator for P1 − P2 for Independent Random Samples with Replacement:
  (p1 − p2) ± z_{α/2} √(p1(1 − p1)/n1 + p2(1 − p2)/n2)

Interpretations for the Confidence Interval (A, B):
- If both A and B are positive, then μ1 − μ2 > 0, or μ1 > μ2.
- If both A and B are negative, then μ1 − μ2 < 0, or μ1 < μ2.
- If A is negative, B is positive, and |A| > |B|, then μ1 − μ2 ≤ 0, or μ1 ≤ μ2.
- If A is negative, B is positive, and |A| < |B|, then μ1 − μ2 ≥ 0, or μ1 ≥ μ2.
The interpretations also apply for the interval estimator for the difference of population proportions, but replace μ with P.


Chapter 4 - Inferences about μ1 − μ2 and P1 − P2: Hypothesis Testing


Tests for μ1 − μ2 for Independent Samples (Ho: μ1 − μ2 = d0):
1. The variances are known (or both sample sizes are greater than 30, but replace σ² with s²):
  Z = [(x̄1 − x̄2) − d0]/√(σ1²/n1 + σ2²/n2)
2. The variances are equal but unknown (sample sizes at most 30):
  t = [(x̄1 − x̄2) − d0]/[s_p √(1/n1 + 1/n2)], with n1 + n2 − 2 degrees of freedom
3. The variances are unequal and unknown:
  t' = [(x̄1 − x̄2) − d0]/√(s1²/n1 + s2²/n2), with degrees of freedom from the Satterthwaite approximation
Critical regions:
  Ha: μ1 − μ2 < d0 → reject Ho if the test statistic falls below the negative critical value
  Ha: μ1 − μ2 > d0 → reject Ho if the test statistic exceeds the critical value
  Ha: μ1 − μ2 ≠ d0 → reject Ho if its absolute value exceeds the two-tailed critical value

Test for μD = μX − μY for Matched Samples (Ho: μD = d0)
Test Statistic:
  t = (d̄ − d0)/(s_d/√n), with n − 1 degrees of freedom
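A matched-samples sketch with invented before/after measurements on the same subjects; as in the earlier example, the critical value t_{0.025, 7} = 2.365 comes from a t-table:

```python
from statistics import fmean, stdev
from math import sqrt

# Matched pairs: before/after measurements on the same subjects (invented numbers).
before = [72, 75, 68, 80, 74, 71, 69, 77]
after  = [70, 72, 67, 76, 73, 70, 68, 74]
d = [b - a for b, a in zip(before, after)]  # paired differences
n = len(d)

t = fmean(d) / (stdev(d) / sqrt(n))  # test of Ho: mu_D = 0
t_crit = 2.365                       # t_{0.025, 7} from a t-table (df = n - 1 = 7)
print(round(t, 3), abs(t) > t_crit)
```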

Test for P1 − P2 for Independent Samples (Ho: P1 = P2)
Test Statistic:
  Z = (p1 − p2)/√(p̄(1 − p̄)(1/n1 + 1/n2)), where p̄ = (x1 + x2)/(n1 + n2) is the pooled sample proportion
Region of Rejection:
  Ha: P1 < P2 → reject Ho if Z < −z_α
  Ha: P1 > P2 → reject Ho if Z > z_α
  Ha: P1 ≠ P2 → reject Ho if |Z| > z_{α/2}
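The two-proportion test in code, with invented counts for the two groups:

```python
from statistics import NormalDist
from math import sqrt

# Ho: P1 = P2 vs Ha: P1 != P2 (illustrative counts).
x1, n1 = 50, 200
x2, n2 = 30, 200
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled sample proportion under Ho

z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value z_{alpha/2}
print(round(z, 3), round(crit, 3), abs(z) > crit)
```

Here z = 2.5 exceeds the critical value 1.96, so Ho is rejected at the 5% level.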

