
Midterm review

CHAPTER 1: DATA AND STATISTICS


Statistics Definition:
- Narrow: numerical facts
- Broader: the science/art of collecting, analyzing, presenting, and interpreting data (in business and economics) → better understanding of the business and economic environment → better decision making

Data: facts and figures collected, analyzed and summarized for presentation and interpretation
Element: entity on which data are collected (usually first column)
Variable: characteristic of interest
Observation: set of measurements obtained for a particular element (# of obs = # of elements)
Total number of data items = # of observations x # of variables
Scales of measurement: determine amount of info in data and most appropriate summarization
and statistical analyses

- Nominal: names or labels to identify an attribute
- Ordinal: properties of nominal, and the order or rank matters
- Interval: properties of ordinal, and the interval (difference) matters
- Ratio: properties of interval, and the ratio matters

Categorical data uses the nominal or ordinal scale (statistical analysis is limited; summarize by counting)
Quantitative data uses the interval or ratio scale (arithmetic operations available); data can be discrete or continuous
Cross-sectional data: collected at the same or approximately the same point in time
Time series data: collected over several time periods (these graphs appear frequently in business and economics publications to show trends)
Data sources:
- Existing sources: company databases (employees and customers); with the Internet, this source is even more powerful and readily available
- Statistical studies:
  o Experimental: the variable of interest is first identified, then one or more other variables are identified and controlled to learn how they influence it
  o Nonexperimental/observational: no attempt to control variables. Ex: survey

Existing sources may be preferred because of the cost and time required for statistical studies. The cost of data acquisition and analysis should not exceed the savings generated by using the information.
Descriptive statistics: summaries of data (tabular, graphical, numerical)
Population: set of all elements of interest
Sample: subset of population
Census: survey of the entire population
Survey: collects data from a sample
Statistical inference: estimates and test hypotheses about characteristics of population
Data warehousing: capture, store and maintain data
Data mining: methods for developing useful decision-making info from large databases;
technology that relies heavily on statistical methodology; stress on automated and predictive;
limited ability to uncover and identify causal relationships
CHAPTER 2: TABULAR AND GRAPHICAL PRESENTATIONS
Frequency distribution: tabular summary of data shows number (frequency) in each
nonoverlapping class
Relative frequency distribution: shows proportion of items = Frequency of the class / n
Percent frequency distribution: same as relative x 100%
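The three distributions above differ only by a scale factor; a minimal Python sketch (with made-up soft-drink data) makes that concrete:

```python
from collections import Counter

def frequency_tables(data):
    """Frequency, relative frequency, and percent frequency distributions."""
    n = len(data)
    freq = Counter(data)                        # frequency of each class
    rel = {k: v / n for k, v in freq.items()}   # relative frequency = frequency / n
    pct = {k: r * 100 for k, r in rel.items()}  # percent frequency = relative x 100
    return freq, rel, pct

# Hypothetical categorical data
drinks = ["Coke", "Pepsi", "Coke", "Sprite", "Coke"]
freq, rel, pct = frequency_tables(drinks)
```

Note that the relative frequencies sum to 1 and the percent frequencies to 100, regardless of the data.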
Bar chart: graphical presentation for data in frequency, relative freq or percent freq distribution
Pie chart: graphical presentation of relative frequencies for categorical data
Dot plot: simplest graphical summary of data
Histogram: graphical presentation of quantitative data
Often number of classes = number of categories found in data; sum of frequencies = number of
observations; Sum of relative frequencies = 1; sum of percent frequencies =100
Determine number of nonoverlapping classes: general 5 to 20 classes.
Width of classes: same for each class; Approx class width= (largest-smallest)/number of classes; in
practice determined by trial and error
Class limits: so that each data item belongs to one and only one class
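The class-width rule above can be sketched in Python (hypothetical data; the width is rounded up so each item falls in exactly one class):

```python
import math

def build_classes(data, num_classes):
    """Classes of equal width; approx width = (largest - smallest) / number of classes."""
    lo, hi = min(data), max(data)
    width = math.ceil((hi - lo) / num_classes)  # round up to a convenient width
    classes = [(lo + i * width, lo + (i + 1) * width) for i in range(num_classes)]
    # Each item belongs to one and only one class (last class closed on the right)
    freq = [sum(1 for v in data
                if a <= v < b or (i == num_classes - 1 and v == b))
            for i, (a, b) in enumerate(classes)]
    return classes, freq

classes, freq = build_classes(list(range(1, 21)), 5)
```

In practice, as the notes say, the number of classes and the width are tuned by trial and error.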

Cumulative frequency distribution: tabular summary of quantitative data; shows number of data
less than or equal to the upper class limit of each class; last entry equals total number of
observations
Exploratory data analysis: simple arithmetic and easy-to-draw graphs (stem-and-leaf)
Stem-and-leaf display: easier to construct; gives more info because it shows the actual data values
Cross tabulation: tabular summary for two variables; provides insight about relationship

Simpson's paradox: a conclusion based upon aggregate data can be reversed if we look at the unaggregated data
Scatter diagram: graphical presentation of relationship between two quantitative variables
Trendline: approximation of relationship between two quantitative variables

CHAPTER 3: NUMERICAL MEASURES


Measures of location:
Mean: most important measure of location
Trimmed mean: deleting percentage of smallest and largest values from set
Median: measure of central location; value in the middle when arranged in ascending order;
better to use than mean when data set has extreme values
Mode: value that occurs with greatest frequency; more than one mode can exist (multimodal)
Percentile: how data are spread over the interval from smallest to largest.
pth percentile: at least p percent of the observations are less than or equal to this value
Calculating the pth percentile:
1. Arrange the data in ascending order
2. Compute the index i = (p/100) × n
3. If i is not an integer, round up to get the position of the pth percentile; if i is an integer, the pth percentile is the average of the values in positions i and i + 1
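The three steps translate directly into a minimal Python sketch of the rule as stated:

```python
import math

def pth_percentile(data, p):
    x = sorted(data)              # step 1: arrange in ascending order
    i = (p / 100) * len(x)        # step 2: compute the index
    if i != int(i):
        return x[math.ceil(i) - 1]     # step 3a: round up (1-based position)
    k = int(i)
    return (x[k - 1] + x[k]) / 2       # step 3b: average positions i and i + 1
```

For example, with data [1, 2, 3, 4] the 50th percentile is 2.5 (integer index) while the 30th percentile is 2 (index 1.2 rounds up to position 2).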
Quartiles: divide the data into four parts
- Q1 = first quartile, 25th percentile
- Q2 = second quartile, 50th percentile, median
- Q3 = third quartile, 75th percentile

Measure of variability:
Range: largest value − smallest value
Interquartile range: difference between third quartile and first quartile; range for the middle 50%
of the data
Variance: utilizes all the data; based on the difference between the value of each observation and the mean. Sample variance: s² = Σ(xi − x̄)² / (n − 1)
For any data set, the sum of the deviations about the mean always equals zero.
Standard deviation: positive square root of variance; better measurement, measured in same units
as original data.
Coefficient of variation: expressed as percentage; how large standard deviation is relative to mean
Coefficient of variation= (standard deviation/mean) x 100%
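A sketch combining the variability measures above, using the sample formulas via Python's statistics module (hypothetical data):

```python
import statistics

def variability(data):
    mean = statistics.mean(data)
    s = statistics.stdev(data)                  # sample std dev (divides by n - 1)
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance s**2
        "std_dev": s,
        "cv_percent": s / mean * 100,           # coefficient of variation
    }

m = variability([2, 4, 4, 4, 5, 5, 7, 9])
```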
Measures of Shape:

Skewed to the left: skewness is negative; when the distribution is symmetric, skewness is zero and the mean and median are equal

z-score: standardized value; the number of standard deviations xi is from the mean: zi = (xi − x̄) / s


Chebyshev's theorem: at least (1 − 1/z²) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.

- At least 75% of the values must be within z = 2 standard deviations
- At least 89% of the values must be within z = 3 standard deviations
- At least 94% of the values must be within z = 4 standard deviations
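The listed percentages come straight from the 1 − 1/z² bound:

```python
def chebyshev_bound(z):
    """Minimum fraction of values within z standard deviations of the mean (z > 1)."""
    return 1 - 1 / z**2

# z = 2 gives 0.75, z = 3 gives about 0.889, z = 4 gives 0.9375
```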

Empirical rule: for bell-shaped distributions
- Approximately 68% of the values within 1 standard deviation
- Approximately 95% within 2 standard deviations
- Almost all within 3 standard deviations

Exploratory data analysis:
Five-number summary:
1. Smallest value
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value
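The five-number summary can be computed with the percentile rule from earlier in the notes (a minimal sketch with hypothetical data):

```python
import math

def pth_percentile(data, p):
    # i = (p/100) * n; round up if not an integer, else average positions i and i + 1
    x = sorted(data)
    i = (p / 100) * len(x)
    if i != int(i):
        return x[math.ceil(i) - 1]
    k = int(i)
    return (x[k - 1] + x[k]) / 2

def five_number_summary(data):
    x = sorted(data)
    return (x[0], pth_percentile(x, 25), pth_percentile(x, 50),
            pth_percentile(x, 75), x[-1])

summary = five_number_summary(range(1, 9))   # hypothetical data 1..8
```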

Box plot: graphical summary based on the five-number summary
1. Ends of the box at the first and third quartiles
2. Vertical line in the box at the median
3. Limits located 1.5(IQR) below Q1 and 1.5(IQR) above Q3; data outside the limits are outliers
4. Dashed lines (whiskers) from the ends of the box to the smallest and largest values inside the limits
5. Location of each outlier marked with *
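The outlier limits in step 3 follow the common 1.5 × IQR rule, which is easy to sketch (hypothetical quartiles):

```python
def outlier_limits(q1, q3):
    """Lower and upper limits at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, high = outlier_limits(10, 20)   # any value below low or above high is an outlier
```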

Covariance: descriptive measure of linear association between two variables
- A positive value of sxy shows a positive linear association between x and y; as x increases, y increases
- If sxy is close to zero, there is no linear association between x and y
- Problem with covariance: its value depends on the units of measurement for x and y → solution: correlation coefficient

- Sample correlation coefficient of +1 = perfect positive linear relationship
- Sample correlation coefficient of -1 = perfect negative linear relationship
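A minimal sketch of both measures using the sample formulas (hypothetical, perfectly linear data, so the correlation comes out +1):

```python
import statistics

def cov_corr(x, y):
    """Sample covariance s_xy and sample correlation r_xy."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    rxy = sxy / (statistics.stdev(x) * statistics.stdev(y))  # unit-free
    return sxy, rxy

sxy, rxy = cov_corr([1, 2, 3], [2, 4, 6])
```

Dividing by the two standard deviations is exactly what removes the dependence on units that makes raw covariance hard to interpret.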
CHAPTER 4: INTRODUCTION TO PROBABILITY
Probability: numerical measure of likelihood that an event will occur
Experiment: process that generates well-defined outcomes
Sample space: set of all experimental outcomes
Counting rule for multiple-step experiments: if an experiment is a sequence of k steps with n1 possible outcomes on the first step, n2 possible outcomes on the second step, and so on, the total number of experimental outcomes is (n1)(n2)···(nk)
Tree diagram: graphical representation to visualize multiple-step experiments
Combinations: selecting n objects from a (usually larger) set of N objects, where the order of selection does not matter
Counting rule for combinations: C(N, n) = N! / (n!(N − n)!)
Permutations: when n objects are to be selected from a set of N objects and the order of selection is important; the same n objects selected in a different order is a different experimental outcome
Counting rule for permutations: P(N, n) = N! / (N − n)!
An experiment results in more permutations than combinations for the same number of objects.
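Python's standard library implements both counting rules directly, which makes the comparison easy to check:

```python
import math

# C(N, n) = N! / (n! (N - n)!)  -- order irrelevant
combos = math.comb(5, 2)
# P(N, n) = N! / (N - n)!       -- order matters
perms = math.perm(5, 2)
```

Choosing 2 of 5 objects gives 10 combinations but 20 permutations: each unordered pair can be arranged in 2 orders.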
Assigning probabilities: each probability is between 0 and 1, and the probabilities of all experimental outcomes sum to 1
Classical method: when all outcomes are equally likely; if n experimental outcomes are possible, each is assigned a probability of 1/n
Relative frequency method: when data are available to estimate the proportion of the time the experimental outcome will occur if the experiment is repeated a large number of times → (number of times the outcome is seen) / (total number of repetitions)
Subjective method: when one cannot realistically assume the outcomes are equally likely and little relevant data are available; may use experience and intuition to specify a degree of belief

Event: collection of sample points

Mutually exclusive events: have no sample points in common


Joint probabilities: show the intersection of two events.
Marginal probabilities: prob of each event separately

P (A|B): Probability of A given B


Independent events: the probability of event A is not changed by the occurrence of event B

If two events are mutually exclusive, they cannot be independent


Revising probabilities when new info is obtained is an important phase of probability analysis
Bayes' Theorem
- Start the analysis with initial or prior probabilities → calculate revised or posterior probabilities
- Prior probabilities → New info → application of Bayes' Theorem → Posterior probabilities

Collectively exhaustive: union of events is entire sample space
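The prior → posterior sequence can be sketched for mutually exclusive and collectively exhaustive events (the numbers here are hypothetical):

```python
def bayes_posteriors(priors, likelihoods):
    """Posterior P(A_i | B) from priors P(A_i) and likelihoods P(B | A_i)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i and B)
    total = sum(joints)                                     # P(B), total probability
    return [j / total for j in joints]

# Hypothetical priors for two suppliers and P(defect | supplier)
post = bayes_posteriors([0.65, 0.35], [0.02, 0.05])
```

Because the events are collectively exhaustive, the posteriors always sum to 1.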


CHAPTER 5: DISCRETE PROBABILITY DISTRIBUTIONS
Random variable: numerical description of the outcome of an experiment
Discrete random variable: may assume a finite number of values or an infinite sequence of values; its probability function satisfies f(x) ≥ 0 and Σ f(x) = 1
Continuous random variable: may assume any value within an interval
Probability distribution: how probabilities are distributed over the values of the random variable
Discrete uniform probability function: f(x) = 1/n
Expected value of a discrete random variable: E(x) = μ = Σ x f(x)
Variance of a discrete random variable: Var(x) = σ² = Σ (x − μ)² f(x)
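Both formulas are weighted sums over the values of the random variable, which is a few lines of Python (hypothetical distribution):

```python
def expected_value(values, probs):
    """E(x) = sum of x * f(x)."""
    return sum(x * f for x, f in zip(values, probs))

def rv_variance(values, probs):
    """Var(x) = sum of (x - mu)^2 * f(x)."""
    mu = expected_value(values, probs)
    return sum((x - mu) ** 2 * f for x, f in zip(values, probs))

mu = expected_value([0, 1, 2], [0.2, 0.5, 0.3])
var = rv_variance([0, 1, 2], [0.2, 0.5, 0.3])
```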
Binomial probability distribution: discrete probability distribution
Properties of a binomial experiment:
1. Sequence of n identical trials
2. Two outcomes possible on each trial: success and failure
3. Probability of success p does not change from trial to trial
4. Trials are independent

Expected value and variance for the binomial distribution:
- E(x) = μ = np
- Var(x) = σ² = np(1 − p)
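The binomial probability function combines the combinations counting rule with the success probability; a minimal sketch:

```python
import math

def binomial_pmf(n, p, x):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5                       # hypothetical: 10 fair coin flips
mean, var = n * p, n * p * (1 - p)   # E(x) = np, Var(x) = np(1 - p)
```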

Poisson: probability of a number of occurrences over a specified interval of time or space


Properties of Poisson experiment:
1. Probability is the same for any two intervals of equal length
2. Occurrence or nonoccurrence in any interval is independent from any other interval

Hypergeometric: closely related to binomial but trials are not independent and probability of
success changes from trial to trial
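The standard probability functions for both distributions can be sketched as follows (the parameter names are the usual textbook ones, not from these notes):

```python
import math

def poisson_pmf(mu, x):
    """f(x) = mu^x * e^(-mu) / x!, where mu = mean occurrences per interval."""
    return mu**x * math.exp(-mu) / math.factorial(x)

def hypergeometric_pmf(N, r, n, x):
    """P(x successes in n draws, without replacement, from N items with r successes)."""
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)
```

The hypergeometric numerator counts ways to pick x successes and n − x failures; dividing by C(N, n) reflects the trial-to-trial dependence that distinguishes it from the binomial.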
