
Definitions of Statistics

As a science: a branch of science dealing with the collection, organization, presentation, analysis and interpretation of any kind of data.
As a measure: any descriptive form of measurement, such as the mean, quartile, standard deviation, etc., computed from a sample.

Two Divisions of Statistics


1. Descriptive Statistics
- focuses on the task of collecting, processing and presenting data
- can answer the questions who, what or how

2. Inferential or Analytical Statistics


- focuses on the analysis and interpretation of data
- makes conclusions (predict, estimate, compare)

Descriptive vs Inferential
(D = Descriptive, I = Inferential)

D - A bowler wants to find his bowling average for the past 12 games.
I - A bowler wants to estimate his chance of winning a game based on his current season average and the averages of his opponents.

D - A housewife wants to determine the average weekly amount she spent in the past three months.
I - A housewife would like to predict, based on last year's grocery bills, the average weekly amount she will spend on groceries this year.

Classify the following as Descriptive or Inferential Statistics


1. The economist would like to determine the highest monthly electric and water consumption of the company.
2. A statistician would like to know the average volume (in cubic feet) per month of chemical wastes produced by a particular chemical plant.
3. The manager would like to find out how the number of years of experience improves the job performance of workers.
4. The PRC found out that the proportion of the Bar Exam passers is significantly higher than 50% in the last three years.
5. The chemist wants to determine whether there is a direct relationship between the tensile strength of plastic sheets and the respective quantity of a particular ingredient.
6. The storeowner is 95% confident that the adjustment in the vending machine increased the mean fill level of soft drink plastic cups by between 1.2 and 1.8 ounces.
7. The president of a particular network would like to determine which TV show has the highest rating on weekends.
8. The director of the hospital wants to test at the 1% level of significance whether the four different kinds of serum for curing heart disease are the same.
9. The management director for a large industrial firm wants to determine if three different training programs have different effects on employees' productivity levels.
10. Statistics from the DOTC show that in the last 5 years, the previous year has the greatest variability in sales of Ford cars.

Basic Concepts in Statistics


Population - a complete collection of all elements to be studied
Census - a collection of data from every element in a population
Sample - a subcollection of elements drawn from a population
       - taken to reduce work-time and cost in the study
Variable - any particular characteristic inherent in the members of the sample or population which may take a varied range of values
Data set - a collection of observations

Types of Samples
1. Non-probability Samples (always the choice of the researcher)
   Convenience, Judgment, Purposive, Quota, Accidental

2. Probability/Random Samples (everybody is given a chance to be chosen)
   Simple, Systematic, Stratified, Clustered, Single-stage, Multi-stage

PROBABILITY or RANDOM SAMPLING

Simple Random Sample - obtained by assigning a number to each member of the population and randomly picking some of these numbers, as in a lottery (the RAN key of a calculator can be used)
Example:

From a population of 95, a random sample of 30 is desired.

Using the RAN key of the calculator, disregard the decimal point and select the first two or the last two digits of each number.

Random Number    Interpretation
0.678            67
0.904            90
0.129            12
0.015            01 or 1
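
The calculator procedure above can be imitated in code. The following is a minimal sketch, not the only way to draw such a sample: the function names are hypothetical, Python's `random` module stands in for the RAN key, and skipping repeated or out-of-range numbers (00 and 96 to 99 for a population of 95) is an assumption the slides do not state.

```python
import random

def first_two_digits(r):
    """Disregard the decimal point of a 3-decimal number and keep the first two digits."""
    return int(f"{r:.3f}"[2:4])

def simple_random_sample(population_size, sample_size, rng=None):
    """Pick distinct member numbers from 1..population_size, calculator style.

    Assumption: two-digit readings only, so the population must be 99 or fewer.
    """
    if population_size > 99 or sample_size > population_size:
        raise ValueError("needs sample_size <= population_size <= 99")
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < sample_size:
        r = rng.randint(0, 999) / 1000        # stand-in for one press of the RAN key
        number = first_two_digits(r)
        # Assumption: discard readings that are out of range or already picked.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen

# The four readings from the table above.
print([first_two_digits(r) for r in (0.678, 0.904, 0.129, 0.015)])  # [67, 90, 12, 1]

# A sample of 30 out of 95; the members picked differ from run to run.
print(sorted(simple_random_sample(95, 30)))
```

In everyday work, `random.sample(range(1, 96), 30)` gives the same kind of sample in one call; the longer version is shown only because it mirrors the slide's steps.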

Systematic Random Sample - obtained using an ordered list of the population, then selecting members systematically from the list (formula: k = N/n)
Example:

From a population of 55 students, a sample of 11 will be taken.

Using the formula k = N/n = 55/11 = 5, find the interval for each sample. Start with any number, which may also be chosen using simple random sampling.

(Assume that, based on simple random sampling, 28 is the first sample.)

 1  2  3  4  5  6  7  8  9 10 11
12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53 54 55
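
The selection can be sketched in a few lines. This is an illustration under stated assumptions: the function name is hypothetical, N is taken to be an exact multiple of n, and the count wraps around to the start of the list once it passes the last member, which the slides do not spell out.

```python
def systematic_sample(population_size, sample_size, start):
    """Take every k-th member of a numbered list, beginning at `start`."""
    k = population_size // sample_size   # sampling interval, k = N/n
    # Assumption: after the end of the list, counting continues from member 1.
    return [((start - 1 + j * k) % population_size) + 1 for j in range(sample_size)]

print(systematic_sample(55, 11, 28))
# [28, 33, 38, 43, 48, 53, 3, 8, 13, 18, 23]
```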

Stratified Random Sample - composed of several smaller samples that are taken separately (by simple or systematic random sampling) from every stratum in the population.
Example:

20% of the employees from each department of Company X will be taken to comprise the stratified random sample.
COMPANY X

Strata        Population    Sample
Accounting    110           22
Treasury       80           16
Personnel      70           14
Legal          50           10
Marketing     120           24
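
The sample column is simply 20% of each department's population. A small sketch of that calculation follows; the function name and the dictionary layout are illustrative choices, not part of the original example.

```python
def stratified_sizes(strata, percent):
    """Sample size per stratum when the same percentage is taken from each."""
    return {name: round(size * percent / 100) for name, size in strata.items()}

company_x = {
    "Accounting": 110,
    "Treasury": 80,
    "Personnel": 70,
    "Legal": 50,
    "Marketing": 120,
}

sizes = stratified_sizes(company_x, 20)
print(sizes)                 # {'Accounting': 22, 'Treasury': 16, 'Personnel': 14, 'Legal': 10, 'Marketing': 24}
print(sum(sizes.values()))   # 86
```

The members within each department would still be picked by simple or systematic random sampling, as the definition states.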

Clustered Random Sample - made up of separate samples taken from a randomly chosen subset of geographically distinct clusters (c = x/n, where x is the average cluster size and n is the desired sample size)
Example:

A researcher wants to take a sample of students from NCR. The average number of students in the colleges/universities of the NCR is 3000, and the desired sample is 500.

c = x/n = 3000/500 = 6

500/6 = 83.33, so 83 or 84

Thus, only 6 schools from NCR will be chosen (using any method) and 83 or 84 students from each chosen school (cluster) will be included in the sample.
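
The cluster arithmetic above can be checked with a short sketch. The function name is hypothetical, and the way the remainder is spread (some schools contribute 84 students, the rest 83) is one assumed reading of "83 or 84".

```python
def cluster_plan(average_cluster_size, sample_size):
    """Number of clusters (c = x/n) and how many members to take from each."""
    clusters = average_cluster_size // sample_size
    base, extra = divmod(sample_size, clusters)
    # Assumption: the first `extra` clusters take one more member than the rest.
    per_cluster = [base + 1] * extra + [base] * (clusters - extra)
    return clusters, per_cluster

clusters, per_cluster = cluster_plan(3000, 500)
print(clusters)          # 6
print(per_cluster)       # [84, 84, 83, 83, 83, 83]
print(sum(per_cluster))  # 500
```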

Classification of Variables
1. Qualitative
   - not expressed numerically
   - differs in kind rather than in degree
   Dichotomous - can be placed in only two categories (male or female, employed or unemployed, etc.)
   Multinomial - can be placed in more than two categories (job title, color, language, religion, etc.)

2. Quantitative
   - normally expressed numerically
   - differs in degree rather than in kind
   Discrete - can assume only specific values (whole numbers) (number of family members, students in a classroom)
   Continuous - can assume values at all points (with decimals) (height, temperature, volume)

Types of Data based on the Level of Measurement


1. Nominal Data (qualitative) - numbers are actually placeholders for names
   ex: blood type, color, religion, marital status, employee/student number
2. Ordinal Data (qualitative) - numbers produce a distinct order, ranking or arrangement of data
   ex: socio-economic class, pain level, product rating, year level
3. Interval Data (quantitative) - have precise differences between measures, but the zero value is arbitrary and does not imply an absence of the characteristic being measured
   ex: IQ level, temperature scale, test scores
4. Ratio Data (quantitative) - based on a standard scale which has a fixed zero point, where the zero value denotes the complete absence of the characteristic being measured
   ex: volume, height, weight, salaries, age

Methods of Generating Data


1. Questionnaire method - easiest way, used in the business world
2. Interview method - very tedious but more reliable, used also in business
3. Observation method - not reliable, used by psychologists/psychiatrists
4. Registration method - used in offices
5. Experiment method - most accurate, used by scientific researchers

Methods of Presenting Data


1. Textual form - requires good choice of descriptive words
2. Tabular form - presented in rows or columns, arranged in ascending or descending range
3. Graphical form - a pictorial/visual representation used to show the relationships between two or more different variables

Kinds of Graphs
1. Line graph - shows successive points that are connected by lines
2. Bar graph - can be vertical or horizontal, simple or multiple, with bars of equal width that do not overlap
3. Pie graph - a segmented circle
4. Pictograph - uses symbols or pictures

Determining the sample size


- Slovin's Formula can be used as a guide for obtaining the sample size

$$ n = \frac{N}{1 + N e^2} $$

where:
n = sample size
N = population size
e = margin of error
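
As a rough sketch, Slovin's formula and its two algebraic rearrangements (solving the same equation for e and for N) can be written as below. The function names are hypothetical, and the results are left unrounded because the slides do not state a rounding rule.

```python
import math

def slovin_sample_size(N, e):
    """n = N / (1 + N e^2)"""
    return N / (1 + N * e ** 2)

def slovin_margin_of_error(N, n):
    """Rearranged for e: e = sqrt((N - n) / (n N))"""
    return math.sqrt((N - n) / (n * N))

def slovin_population(n, e):
    """Rearranged for N: N = n / (1 - n e^2); only meaningful when n e^2 < 1."""
    return n / (1 - n * e ** 2)

print(round(slovin_sample_size(10_000, 0.05), 2))     # 384.62
print(round(slovin_sample_size(10_000, 0.01), 2))     # 5000.0
print(round(slovin_margin_of_error(1350, 177), 4))    # 0.0701
print(round(slovin_population(350, 0.05), 2))         # 2800.0
```

A sample size is a head count, so in practice the first result would be reported as a whole number of respondents.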

Margin of Error (e)

a value which quantifies possible sampling errors or uncertainties about the survey results

Examples:
1. Find the sample size if the population is 10,000 and the margin of error is:
   a. 5%
   b. 1%
2. A company has 1,350 employees. A representative group of 177 was selected and asked questions. What is the margin of error in determining the sample?
3. With a 5% margin of error, a sample of 350 respondents was interviewed. From what population size was it taken?

4. A survey to find out if families living in a certain municipality are in favor of the Charter change will be conducted. To ensure that all income groups are represented, the respondents will be divided into classes, as shown below:
Strata           Class      No. of Families
High-Income      Class A    1,000
Middle-Income    Class B    2,500
Low-Income       Class C    1,500
                            N = 5,000

a. Using a 5% margin of error, how many families should be included in the sample?
b. Using proportional allocation, how many from each group should be taken as samples?

Strata           Class    No. of Families    Percent    No. of sample
High-Income      A        1,000              20%        74
Middle-Income    B        2,500              50%        185
Low-Income       C        1,500              30%        111
                          N = 5,000                     n = 370
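
The figures in this solution can be reproduced with a short sketch. The function name is hypothetical, and rounding to the nearest whole family is an assumption chosen because it matches the n = 370 reported above.

```python
def proportional_allocation(strata, total_sample):
    """Give each stratum a share of the sample equal to its share of the population."""
    population = sum(strata.values())
    return {name: round(total_sample * size / population) for name, size in strata.items()}

families = {"A": 1000, "B": 2500, "C": 1500}
N = sum(families.values())                  # 5000

# Part a: Slovin's formula with e = 5%, rounded to a whole number of families.
n = round(N / (1 + N * 0.05 ** 2))
print(n)                                    # 370

# Part b: split n across the classes in proportion to their sizes.
print(proportional_allocation(families, n)) # {'A': 74, 'B': 185, 'C': 111}
```

With other inputs the rounded shares may not add up exactly to n, in which case one stratum has to be adjusted by hand.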

Summation Notation
- denotes the sum of some numerical values
- used to determine the sum of the quantities over a given range
- uses the Greek capital letter sigma, Σ

Illustration:

$$ \sum_{i=1}^{n} X_i $$

Read as "summation of X_i, i taken from 1 to n"
where:
1 = represents the lower limit (start)
n = represents the upper limit (end)

Examples: Consider the values of X and Y below

i    1    2    3    4
X    4    2    0   -2
Y   -1    2    3    1

Find:

1. $\sum_{i=1}^{3} X_i$
2. $\sum_{i=2}^{4} X_i Y_i$
3. $\sum_{i=1}^{4} Y_i^2$
4. $\sum_{i=1}^{4} X_i^3$
5. $\sum_{i=2}^{3} (X_i \quad i)$
6. $\sum_{i=3}^{4} X_i \left( \sum_{i=1}^{2} Y_i^2 \right)$
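
For checking answers, a summation with explicit limits maps directly onto a loop. The sketch below is illustrative only: the helper names are hypothetical, and it works just exercises 1, 2 and 4, reading the fourth as the sum of cubes.

```python
X = [4, 2, 0, -2]
Y = [-1, 2, 3, 1]

def summation(term, lower, upper):
    """Add term(i) for i = lower, lower + 1, ..., upper (both limits included)."""
    return sum(term(i) for i in range(lower, upper + 1))

def x(i):
    return X[i - 1]   # X_1 is stored at list position 0

def y(i):
    return Y[i - 1]

ex1 = summation(lambda i: x(i), 1, 3)          # 4 + 2 + 0
ex2 = summation(lambda i: x(i) * y(i), 2, 4)   # (2)(2) + (0)(3) + (-2)(1)
ex4 = summation(lambda i: x(i) ** 3, 1, 4)     # 64 + 8 + 0 - 8

print(ex1, ex2, ex4)  # 6 2 64
```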

Summation Notation Theorems


1. Summation of a constant c taken from 1 to n

$$ \sum_{i=1}^{n} c = nc $$

Example:

$$ \sum_{i=1}^{4} 5 = 4(5) = 20 $$

2. Summation of the product of a constant c and a variable

$$ \sum_{i=1}^{n} cX_i = c \sum_{i=1}^{n} X_i $$

Example: If $X_1 = 3$, $X_2 = -2$, $X_3 = 1$, then

$$ \sum_{i=1}^{3} 4X_i = 4 \sum_{i=1}^{3} X_i = 4(3 - 2 + 1) = 4(2) = 8 $$

3. Summation of the sum of two or more variables

$$ \sum_{i=1}^{n} (X_i + Y_i + Z_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i $$

Example:

i    1    2    3
X    2   -1    4
Y   -3    0    1

$$ \sum_{i=1}^{3} (X_i + Y_i) = (2 - 1 + 4) + (-3 + 0 + 1) = 5 - 2 = 3 $$
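
The three theorems can be spot-checked numerically with the values from their examples. This is only a sanity check on specific numbers, not a proof, and the helper name is a hypothetical choice.

```python
def summation(term, lower, upper):
    """Add term(i) for i = lower, ..., upper (both limits included)."""
    return sum(term(i) for i in range(lower, upper + 1))

# Theorem 1: the sum of a constant c taken n times equals n * c.
constant_sum = summation(lambda i: 5, 1, 4)
print(constant_sum, 4 * 5)                  # 20 20

# Theorem 2: a constant factor can be moved outside the summation.
X = [3, -2, 1]
left = summation(lambda i: 4 * X[i - 1], 1, 3)
right = 4 * summation(lambda i: X[i - 1], 1, 3)
print(left, right)                          # 8 8

# Theorem 3: the summation of a sum is the sum of the summations.
A = [2, -1, 4]
B = [-3, 0, 1]
combined = summation(lambda i: A[i - 1] + B[i - 1], 1, 3)
separate = summation(lambda i: A[i - 1], 1, 3) + summation(lambda i: B[i - 1], 1, 3)
print(combined, separate)                   # 3 3
```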

Frequency Distribution or Frequency Table - the organization of data in a tabular form

Frequency - number of observed values


Range (R) - the difference between the highest value (HV) and the lowest value (LV)
R = HV - LV

Class Width or Interval (i) - distance between the class lower boundary
