
IBA JU Statistics for Managers
Course Instructor: Dr. Swapan Kumar Dhar

Definition of Statistics
Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data for the purpose of making intelligent statements and drawing appropriate conclusions. According to this definition, there are four stages: (1) collection of data, (2) presentation of data, (3) analysis of data and (4) interpretation of data.

Example of Statistics: Examples include the average starting salary of college graduates, the number of deaths due to road accidents last year, and the statement that 20% of BBA students are female. In these examples a statistic is a value or a percentage. Other examples: 95% of BBA students come to class on time; 25% of IBA students come to the campus by car. All of the above are examples of statistics.

Data: Data are the facts and figures that are collected, analyzed and summarized for presentation and interpretation. The data collected in a particular study are referred to as the data set for the study. For example, the heights (in cm) of 14 randomly selected persons from a group of 100 persons are: 152, 160, 158, 155, 154, 155, 162, 164, 160, 153, 161, 158, 167, 151. These heights constitute a data set.

A set of five students is selected from a class of the course Business Statistics and their measurements are entered into a spreadsheet as shown in the following figure.

Figure: Measurements on five undergraduate students

The following table is a data set for 15 stocks in the U.S.A.

Table 1: A Data Set for 15 Stocks
Stock | Exchange | Annual Sales ($ million) | Earnings Per Share ($)
A | OTC | 7.4 | .52
B | NYSE | 54.7 | .32
C | AMEX | 20.7 | .10
D | OTC | 86.6 | .25
E | OTC | 44.3 | .32
F | AMEX | 197.3 | 1.54
G | OTC | 30.6 | .38
H | AMEX | 26.4 | .34
I | OTC | 10.4 | .48
J | AMEX | 32.0 | .40
K | NYSE | 81.9 | .21
L | OTC | 36.1 | .89
M | OTC | 22.7 | .21
N | AMEX | 21.2 | .79
O | OTC | 43.5 | .38

Let us consider another example of data. The following information was collected from married couples.

Sl. No. | Duration of married life | Age of Husband | Age of Wife | Education of Husband | Education of Wife | Occupation of Husband* | Occupation of Wife** | No. of Children | Have daughter (Yes = 1, No = 0)
1 | 2 | 26 | 20 | 10 | 5 | 1 | 0 | 0 | 0
2 | 12 | 40 | 35 | 0 | 0 | 1 | 0 | 2 | 0
3 | 8 | 32 | 28 | 5 | 3 | 2 | 0 | 1 | 1
4 | 14 | 35 | 25 | 7 | 0 | 1 | 0 | 2 | 1
5 | 6 | 34 | 24 | 10 | 10 | 2 | 0 | 1 | 1
6 | 20 | 45 | 38 | 0 | 0 | 1 | 0 | 3 | 0

* 1 for agriculture, 2 for service. ** 0 for housewife, 1 for service.

There are mainly two types of data: (1) qualitative data and (2) quantitative data.

Quantitative (numeric) data: If the observations can be expressed numerically, the data are quantitative. For example, the data values for annual sales in Table 1 are quantitative, so annual sales is referred to as a quantitative variable. Earnings per share and the price-earnings ratio are also quantitative.

Qualitative or categorical (non-numeric) data: Qualitative data are recorded non-numerically. For example, for the stocks in Table 1, the data values for the exchange variable (OTC, NYSE, AMEX) are labels used to identify the exchange on which each stock is traded. Hence these data are qualitative and exchange is referred to as a qualitative variable. Marital status, the sex of students and hair color are other examples of qualitative data.

Quantitative data are of two types: (1) discrete and (2) continuous.

Discrete data: Discrete data are restricted to certain values, usually whole numbers. They are often the result of enumeration or counting. The number of students in your class and the number of cars sold by General Motors are examples; in neither case will you observe fractional values.

Continuous data: Continuous data can take on any value within a given range; no matter how close two observations may be, another value can lie between them. Continuous data generally result from measurement, for example, the height of a person.

Elements: The elements are the entities on which data are collected. For the data set in Table 1, each individual stock is an element.

Table: Quantitative Observations
Elementary Unit | Characteristic of Interest | Quantitative Observation
A basketball player | Points scored in a game | Points
A patient | Blood pressure | Blood pressure count
A customer | Money spent | Taka
A bolt | Diameter | Fraction of an inch
A home | Price | Taka
A retail store | Size | Square feet of space
A country | Per capita income | Taka
A diamond | Weight | Number of carats

Table: Qualitative Observations
Elementary Unit | Characteristic of Interest | Qualitative Observation
A basketball player | Position | Guard, forward or center
A patient | Health | Healthy or unhealthy
A customer | Satisfaction | Satisfied or not
A bolt | Condition of thread | Defective or nondefective
A home | Type of heating | Oil, gas or electrical power
A retail store | Merchandise | Hard goods, soft goods or both
A country | Location | Continent: Asia, Europe or Africa
A diamond | Shape of stone | Square or round

Population: A population is the entire collection of all observations of interest to the researcher. That is, a population is the entire set of individuals or objects, or of the measurements obtained from all individuals or objects of interest. A population may consist of individuals, for example: (1) all the students enrolled at IBA, (2) all the students in Business Statistics and Decision Making, (3) all teachers of JU. A population may also consist of objects or measurements, for example: (1) all the tires produced by a company, (2) the accounts receivable at the end of October for some firm, (3) the scores on the first class test of all the students in Business Statistics. Thus a population in the statistical sense does not always refer to people.

Exercise: For each of the following proposed studies, indicate (i) an elementary unit, (ii) a characteristic of interest and (iii) whether the observation is qualitative or quantitative.
(a) A soft drink manufacturer is planning a study to determine customers' preference for drinks in cans or drinks in bottles. The manufacturer plans to use the results in planning its packaging.
(b) A builder undertakes a study to determine the number of families owning more than one car in a geographical region. The information will be used to determine the type of garage to construct for new homes planned for the region.
(c) A manufacturer of components suddenly experiences a sharp increase in the number of defectives returned by customers. She plans to check her manufacturing operations as a basis for deciding whether or not to make any adjustments in the manufacturing process. What are the population, the elementary unit and the characteristic? Is the study quantitative or qualitative?

Variable: A variable is a characteristic of interest for the elements, that is, the characteristic of the sample or the population being observed. The data set in Table 1 has the following four variables: Stock, Exchange, Annual Sales, Earnings Per Share.

Types of variable:

There are two basic types of variables: (1) quantitative and (2) qualitative.

Qualitative variable: When the characteristic being studied is nonnumeric, it is called a qualitative variable or an attribute. Examples: gender, religious affiliation, type of automobile owned, place of birth, etc.

Quantitative variable: When the variable studied can be reported numerically, it is called a quantitative variable. Examples: the balance in your checking account, the ages of company presidents, etc. A quantitative variable can be (a) discrete or (b) continuous.
(a) Discrete variable: A discrete variable is restricted to certain values, usually whole numbers. Such values are often the result of enumeration or counting. The number of students in your class and the number of cars sold by General Motors are examples; in neither case will you observe fractional values.
(b) Continuous variable: A continuous variable can take on any value within a given range; no matter how close two observations may be, another value can lie between them. A continuous variable generally results from measurement, for example, the height or weight of a person.

Parameter: A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity. A parameter is a descriptive measure of the entire population of all observations of interest to the researcher. Examples are the average income of all wage earners in the US or the total output of all the manufacturing plants.

Data
  (1) Qualitative or Attribute, e.g., type of car owned, color of pens, gender
  (2) Quantitative or Numerical
      (a) Discrete, e.g., number of children, number of employees
      (b) Continuous, e.g., weight of a shipment, distance between Dhaka and Savar

Example 1: Suppose, as an educator, you want to introduce a new elementary teaching method in your school. Before that, you want to study the IQ levels of the children (i.e., what percentage of the children are below normal, with IQ less than 90; what percentage are normal, with IQ between 90 and 120; and what percentage are above normal, with IQ above 120) to get some idea of whether the new method would be fruitful.
Example 2: You want to start a new business (perhaps an expensive car dealership), so you want to know the average household income per year in your primary business region, which is essential for sustaining your business. A high average household income usually indicates a bright prospect for your new business.
Example 3: You are part of an anti-smoking campaign in your school. You are concerned about the general health of your fellow students and want to know what percentage of the students are regular smokers.
Example 4: As a quality control expert, you want to know the proportion of good computer chips produced by a manufacturing unit.
Example 5: Suppose you are working as a consultant for a telephone company and want to find out the proportion of households in a particular district having mobile phones.
Note that each of the above five problems involves a collection of individuals or a group of items under study. This is called a population, and it is the most basic concept of any statistical study.

In Example 1, the group of individuals under study, i.e., the population, is the collection of all children in your school who go to elementary school (because you want to know their IQ levels and, depending on that, you will introduce your teaching method). In Example 2, you are interested in the average income of households in your particular primary business region. So, the collection of households in your primary business region is the population.

In Example 3, the population under study is the collection of enrolled students (in a semester), since you want to conduct your study on them. In Example 4, the population is the collection of all computer chips produced by the manufacturing unit. You then want to see what percentage of this population is good.

In Example 5, the population consists of all the households in the particular district. You want to determine the proportion of this population having mobile phone(s).

A population is the collection of all distinct individuals, objects or items under study. N denotes the number of entities in a population and is called the population size.

Variable and Parameter: Once a population is fixed, we want to study a characteristic of the individuals (or objects or items) in the population, e.g., height, weight, cholesterol level, income, ethnic background, dietary habit, smoking habit or other information. This individual characteristic is called a variable (its value can vary from one individual or object to another within the population). Let us now identify the variables in the above examples. The variable in:
(a) Example 1 is the IQ of an elementary school child;
(b) Example 2 is the income of a household;
(c) Example 3 is the smoking habit of a student;
(d) Example 4 is the quality of a computer chip;
(e) Example 5 is the ownership of mobile phone(s) by a household.

Most variables can be classified into two broad categories: (a) a categorical (or qualitative) variable, if it assigns a categorical (non-numerical) value to each individual or object, and (b) a quantitative variable, if it assigns a numerical value to each individual or object. A categorical variable is also called an attribute, whereas a quantitative variable is often referred to simply as a variable. In the above five examples, the variables in the first two are quantitative and those in the last three are categorical.
(a) In Example 1, the IQ of a child is a quantitative variable.
(b) In Example 2, the income of a household is a quantitative variable.
(c) In Example 3, smoking habit is a categorical variable (since a person is either a smoker or a nonsmoker).
(d) In Example 4, the quality of a computer chip is a categorical variable (since a computer chip is classified as either 'good' or 'bad').
(e) In Example 5, ownership of mobile phone(s) is a categorical variable (since a household either owns or does not own one).
More examples of categorical variables: ethnic background of a person (which can assume values like Caucasian, African, Asian, etc.), gender (male or female), color of a flower (white, red, yellow, etc.) and party affiliation of a voter (Party A, Party B, Independent, etc.). More examples of quantitative variables: age of a person (which can assume values like 21, 22, 79, etc.), height of a person (5.5', 4.9', 6.1', etc.), monthly income of a person (Taka 10,000, Taka 9,751, etc.), IQ of a student (80, 92, 105, etc.) and cholesterol level of a person (180, 200, 250, etc.).

Eventually, our goal is to know a summary value of the variable for the population under consideration. A summary value of the variable for the population is called a parameter. In a statistical study we are interested in a parameter because it often gives a fairly good idea about the population. A parameter is usually an unknown characteristic of the population. In Example 1, our parameter could be the average IQ level of all individuals in the population.

In Example 2, the average annual household income is our parameter of interest. In Example 3, the proportion of regular smokers in the population is a parameter of interest. In Example 4, the percentage of good computer chips in the population is a parameter of interest. In Example 5, the percentage of households (within the district) having mobile phone(s) is a parameter of interest.

Types of Statistics
The study of statistics is usually divided into two categories: descriptive statistics and inferential statistics.

Descriptive Statistics
The definition of statistics given earlier referred to organizing, presenting, analyzing . . . data. This facet of statistics is usually referred to as descriptive statistics.
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
For instance, the United States government reports the population of the United States was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980; 248,709,000 in 1990; and 265,000,000 in 2000. This information is descriptive statistics. Calculating the percentage growth from one decade to the next is also descriptive statistics. However, it would not be descriptive statistics if we used these figures to estimate the population of the United States in the year 2011 or the percentage growth from 2000 to 2010. Why? Because these statistics are not being used to summarize past populations but to estimate future populations. The following are some other examples of descriptive statistics. There are a total of 46,837 miles of interstate highways in the United States. The interstate system represents only 1 percent of the nation's total roads but carries more than 20 percent of the traffic. The longest is I-90, which stretches from Boston to Seattle, a distance of 3,099 miles. The shortest is I-878 in New York City, which is 0.70 of a mile in length.
Alaska does not have any interstate highways, Texas has the most interstate miles at 3,232, and New York has the most interstate routes with 28.

Inferential Statistics
The second type of statistics is inferential statistics, also called statistical inference. Our main concern regarding inferential statistics is finding something about a population from a sample taken from that population. For example, a recent survey showed only 46 percent of high school seniors can solve problems involving fractions, decimals, and percentages, and only 77 percent of high school seniors correctly totaled the cost of a salad, a burger, fries, and a cola on a restaurant menu. Since these are inferences about a population (all high school seniors) based on sample data, we refer to them as inferential statistics. You might think of inferential statistics as a best guess of a population value based on sample information.
Inferential Statistics: The methods used to estimate a property of a population on the basis of a sample.

Sources of Data
The choice of a data collection method from a particular source depends on the facilities available, the extent of accuracy required in analyses, the expertise of the investigator, the time span of the study, and the amount of money and other resources required for data collection. When the data to be collected are very voluminous and require huge amounts of money, manpower, and time, reasonably accurate conclusions can be drawn by observing even a small part of the population, provided the concept of sampling is used objectively. Data sources are classified as (i) primary sources and (ii) secondary sources.
Primary Data Sources: Individuals, focus groups, and/or panels of respondents specifically decided upon and set up by the investigator for data collection are examples of primary data sources.
Secondary Data Sources: Secondary data refer to data that have been collected earlier for some purpose other than the analysis currently being undertaken.
Besides newspapers and business magazines, other sources of such data are as follows: government publications, such as those of the Bureau of Statistics and the Bangladesh Bank Bulletin; non-government publications, such as those of various industrial and trade associations; and international organizations that publish data, such as the IMF, ILO, WHO and UNICEF.

Methods of Organizing Data
Raw Data: Numerical data collected and presented in their original form are raw data. There are many ways to sort and organize raw data.

Classification
Raw data are so scattered that they cannot be used directly for the necessary analysis, or it is not advantageous to use them in raw form. For analytical purposes, data are condensed. One condensation technique is to present the data in tabular form so that the main features of the data are expressed and can be understood even by a layman. A table contains rows and columns: the rows accommodate the different levels of one character, and the columns do the same for another character. The observations of a character are placed in a row or in a column according to their homogeneity. Thus, classification is a technique by which data are divided into several classes so that the data in each class are homogeneous in nature. As an example, the following table classifies MBA students according to their choice of taking the course Business Statistics & Decision Analysis.

Table: Distribution of MBA students according to their choice of taking the course Business Statistics & Decision Analysis
Taking Business Statistics | Number of Students
Yes | 156
No | 70
Total | 226

The above students can also be classified by their occupation and choice of taking the course.

Table: Distribution of MBA students according to occupation and course option
Occupation | Taking Business Statistics: Yes | No | Total
Fresh | 30 | 25 | 55
Service | 126 | 45 | 171
Total | 156 | 70 | 226
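The row and column totals of a two-way classification like the one above are simple sums. The following Python sketch is only an illustration. It stores the four cell counts from the occupation-by-course-option table in a nested dictionary and recomputes the margins. The data structure and variable names are illustrative choices, not something the notes prescribe.

```python
# Cell counts from the occupation-by-course-option table:
# rows are occupations, columns are the "Yes"/"No" course choices.
table = {
    "Fresh": {"Yes": 30, "No": 25},
    "Service": {"Yes": 126, "No": 45},
}

# Row totals: add across the cells of each row.
row_totals = {occ: sum(cells.values()) for occ, cells in table.items()}

# Column totals: add down each column.
col_totals = {
    choice: sum(cells[choice] for cells in table.values())
    for choice in ("Yes", "No")
}

# Grand total: the sum of the row totals (equal to the sum of the column totals).
grand_total = sum(row_totals.values())

print(row_totals)   # {'Fresh': 55, 'Service': 171}
print(col_totals)   # {'Yes': 156, 'No': 70}
print(grand_total)  # 226
```

The computed margins reproduce the Total row and Total column of the table. This is a quick consistency check on any hand-built two-way table.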

Aims of Classification
Whatever the type of classification, it is an important step in the analysis of numerical data. The aims of classification are:
(i) Classification helps in expressing the scattered raw data in a simple way according to the levels of a character, so that the observations in a level are homogeneous in nature.
(ii) It helps in comparing the observations of different levels.
(iii) It helps in presenting the important information contained in a data set.
(iv) It helps to show the characteristics of the data in a simple way.
(v) It helps in the analysis, explanation and interpretation of a data set.

Frequency: The number of times a variate value is repeated is called the frequency of that variate value. Example: If seven girl students have secured 54 marks, 7 is the frequency of 54 marks. If 12 people have a monthly income of Taka 5000-7000, 12 is the frequency of the income group 5000-7000.

Summarizing qualitative data: Frequency Distribution
A frequency distribution is a tabular summary of a set of data showing the frequency of items in each of several non-overlapping classes. The objective in developing a frequency distribution is to provide insights about the data that cannot be quickly obtained by looking only at the original data. To see how frequency distributions can be used with qualitative data, consider the data set in Table 1.

Table 1: Data from a Sample of 20 Computer Purchases
IBM, Compaq, Apple, Apple, Apple, Apple, Apple, Packard Bell, Packard Bell, Gateway, IBM, Apple, Packard Bell, Gateway, IBM, IBM, Compaq, IBM, Compaq, Apple

Here the data are qualitative.

The following table presents the frequency distribution of computer purchases.

Company | Frequency
Apple | 7
Compaq | 3
Gateway 2000 | 2
IBM | 5
Packard Bell | 3
Total | 20

The advantage of the frequency distribution is that it provides a data summary that is easier to understand than the original data. It shows how the sample of 20 computer purchases is distributed across the five companies.

Relative Frequency and Percent Frequency Distributions
The relative frequency of a class is the proportion of the total number of data items belonging to the class:

$$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n},$$

where n is the total number of observations.
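Counting class frequencies and dividing by n is mechanical. Here is a minimal Python sketch that does it for the 20 purchases of Table 1, using `collections.Counter` from the standard library. The list below copies the purchases in the order given above, with "Gateway" as written in the data. The printed columns are frequency, relative frequency (frequency / n) and percent frequency (relative frequency × 100).

```python
from collections import Counter

# The 20 computer purchases listed in Table 1.
purchases = [
    "IBM", "Compaq", "Apple", "Apple", "Apple", "Apple", "Apple",
    "Packard Bell", "Packard Bell", "Gateway", "IBM", "Apple",
    "Packard Bell", "Gateway", "IBM", "IBM", "Compaq", "IBM",
    "Compaq", "Apple",
]

n = len(purchases)          # total number of observations
freq = Counter(purchases)   # frequency of each class (company)

print(f"{'Company':<14}{'Freq':>5}{'Rel. freq':>11}{'Percent':>9}")
for company in sorted(freq):
    f = freq[company]
    rel = f / n             # relative frequency = frequency / n
    print(f"{company:<14}{f:>5}{rel:>11.2f}{rel * 100:>9.0f}")
print(f"{'Total':<14}{n:>5}{1:>11.2f}{100:>9}")
```

The counts agree with the frequency table above: Apple 7, Compaq 3, Gateway 2, IBM 5 and Packard Bell 3, out of 20.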

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency (relative frequency × 100) for each class. We can now develop relative and percent frequency distributions for the personal computer data of Table 1.

Table 2: Relative and Percent Frequency Distributions of Computer Purchases
Company | Frequency | Relative frequency | Percent frequency
Apple | 7 | 7/20 = .35 | .35 × 100 = 35
Compaq | 3 | 3/20 = .15 | .15 × 100 = 15
Gateway 2000 | 2 | 2/20 = .10 | .10 × 100 = 10
IBM | 5 | 5/20 = .25 | .25 × 100 = 25
Packard Bell | 3 | 3/20 = .15 | .15 × 100 = 15
Total | 20 | 1.00 | 100

From this distribution, we see that, on the basis of the sample data, 35% of the purchases were Apple, 15% each were Compaq and Packard Bell, and so on.

Bar Graph or Bar Chart
A bar graph is a graphical device for depicting qualitative data. On the horizontal axis of the graph, we specify the labels that are used for each of the classes. A frequency, relative frequency or percent frequency scale can be used for the vertical axis. Then, using a bar of fixed width drawn above each class label, we extend the height of the bar until we reach the frequency, relative frequency or percent frequency of the class as indicated by the vertical axis.

Fig. 1: Bar Graph of Computer Purchases (horizontal axis: Company; vertical axis: Frequency)

Example: The following data set gives the favorite colors of 21 students. Make a frequency distribution and draw a bar diagram.
Brown Green Brown Blue Red Red Brown Blue Blue Brown Orange Blue Green Brown Yellow Orange Green Blue Brown Orange Yellow

Solution:
Category | Tally | Frequency | Relative Frequency | Percent
Brown | ///// / | 6 | 6/21 | 28.6
Green | /// | 3 | 3/21 | 14.3
Orange | /// | 3 | 3/21 | 14.3
Yellow | // | 2 | 2/21 | 9.5
Red | // | 2 | 2/21 | 9.5
Blue | ///// | 5 | 5/21 | 23.8
Total | | 21 | 21/21 = 1 | 100%

Fig. 2: Bar Diagram of Favorite Color

Example: In the manufacturing of printed circuit boards, finished boards are subjected to a final inspection before they are shipped to customers. Here are data on the type of defect for each board rejected at final inspection during a particular time period. Draw a bar diagram.

Solution:
Type of Defect | Frequency | Relative Frequency
Low copper plating | 112 | 0.615
Poor electroless coverage | 35 | 0.192
Lamination problems | 10 | 0.055
Plating separation | 8 | 0.044
Etching problems | 5 | 0.027
Miscellaneous | 12 | 0.066
Total | 182 | 1.000

Fig. 3: Bar Diagram of Type of Defect


Example: The following students are enrolled in the foreign language department, and their major fields are: Spanish, Spanish, French, Italian, French, Spanish, German, German, Russian, Russian, French, German, German, German, Spanish, Russian, German, Italian, German and Spanish. Make a frequency distribution table.

Solution: The given data are qualitative. The frequency distribution table is constructed by writing down each major field and, next to it, the number of students (frequency).

Major field | Frequency (number of students)
German | 7
Russian | 3
Spanish | 5
French | 3
Italian | 2
Total | 20

Fig: Bar Diagram

The Line Diagram: The line diagram is used to represent time series data. In this diagram, time is represented along the X-axis and the variable is plotted along the Y-axis. Thus we get a point for each time period, and successive points, when connected by straight lines, give the desired diagram. This diagram is called a line graph or time series graph.

Example: The following table gives the number of PCs at JU for the last 10 years. These are hypothetical data.
Year | Number of PCs
2001 | 30
2002 | 40
2003 | 70
2004 | 150
2005 | 200
2006 | 280
2007 | 325
2008 | 450
2009 | 750
2010 | 1025

Construct a line diagram or line chart.

Solution:


Fig. 4: Line Chart

Pie Chart: The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. To draw a pie chart, we first draw a circle and then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency of each class.

Example: Prepare a pie chart for the computer purchases problem.
Solution: Since there are 360 degrees in a circle and Apple has a relative frequency of 0.35, the sector of the pie chart labeled Apple should consist of 0.35 × 360 = 126 degrees. Similar calculations for the other classes yield the pie chart in Figure 5.
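The sector angles follow directly from "relative frequency × 360". This short Python sketch applies that rule to the relative frequencies of Table 2. The dictionary and the rounding to one decimal are illustrative choices.

```python
# Relative frequencies of computer purchases (Table 2).
rel_freq = {
    "Apple": 0.35,
    "Compaq": 0.15,
    "Gateway 2000": 0.10,
    "IBM": 0.25,
    "Packard Bell": 0.15,
}

# Each sector's angle is its share of the 360 degrees in a circle.
angles = {company: round(p * 360, 1) for company, p in rel_freq.items()}

for company, deg in angles.items():
    print(f"{company:<14}{deg:>7.1f} degrees")
print("Total", sum(angles.values()))
```

Apple gets 126 degrees, IBM 90, Compaq and Packard Bell 54 each, and Gateway 36, for a total of 360.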

Fig. 5: Pie Diagram (Apple 35%, IBM 25%, Compaq 15%, Packard Bell 15%, Gateway 10%)

Example: Draw a pie chart for the number of defects problem.
Solution:


Fig. 6: Pie Chart for Number of Defects

Summarizing Quantitative Data: We have already seen that a frequency distribution is a tabular summary of a set of data showing the frequency of items in each of several non-overlapping classes. This definition holds for quantitative as well as qualitative data.

Example: The Director of IBA wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week:
15.0 20.3 23.2 23.7 13.7 12.9 19.7 21.4 27.1 15.4 18.3 16.6 18.3 29.8 23.0 17.1 14.2 18.9 20.8 10.3 13.5 26.1 20.7 15.7 17.4 14.0 18.6 17.8 12.9 33.8

Organize the data into a frequency distribution.

Solution:
Step 1: Decide the number of classes using the formula

$$2^k > n,$$

where k = number of classes and n = number of observations. This is called the 2 to the k rule: select the smallest number k of classes such that 2^k is greater than the number of observations n. In this example there are 30 observations, so n = 30. Since 2^4 = 16 < 30 and 2^5 = 32 > 30, we should have at least 5 classes, i.e., k = 5.

Step 2: Determine the class interval or width using the formula

$$i \ge \frac{H - L}{k},$$

where i is the class interval, H is the highest observed value, L is the lowest observed value and k is the number of classes. The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class, and class intervals should be uniform. For our case,

$$i \ge \frac{H - L}{k} = \frac{33.8 - 10.3}{5} = 4.7 \approx 5.$$

Set the lower limit of the first class at 9.5 hours, giving a total of 5 classes.
Step 3: Set the individual class limits.
Step 4: Tally the observations into individual classes.


Hours Studying | Tally Mark
9.5 up to 14.5 | ///// //
14.5 up to 19.5 | ///// ///// /
19.5 up to 24.5 | ///// ///
24.5 up to 29.5 | //
29.5 up to 34.5 | //

Step 5: Count the number of items in each class.

Hours Studying | Frequency (f)
9.5 up to 14.5 | 7
14.5 up to 19.5 | 11
19.5 up to 24.5 | 8
24.5 up to 29.5 | 2
29.5 up to 34.5 | 2

This is the frequency distribution. Frequency: the number of observations in each class.

Relative Frequency Distribution: A relative frequency distribution shows the proportion (or, multiplied by 100, the percent) of observations in each class.

Hours Studying | Frequency (f) | Relative frequency
9.5 up to 14.5 | 7 | 7/30 = 0.2333
14.5 up to 19.5 | 11 | 11/30 = 0.3667
19.5 up to 24.5 | 8 | 8/30 = 0.2667
24.5 up to 29.5 | 2 | 2/30 = 0.0667
29.5 up to 34.5 | 2 | 2/30 = 0.0667
Total | 30 | 30/30 = 1.0000
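Steps 1 to 5 and the relative frequencies can be reproduced in a few lines of Python. This is a minimal sketch. It assumes, as in the solution, that the first class starts at 9.5 hours and that each class includes its lower limit but not its upper limit ("9.5 up to 14.5"). Rounding the width up with `math.ceil` is one reasonable reading of "4.7, rounded to 5" and is not a rule stated in the notes.

```python
import math

# Hours studied last week by 30 randomly selected students.
hours = [
    15.0, 20.3, 23.2, 23.7, 13.7, 12.9, 19.7, 21.4, 27.1, 15.4,
    18.3, 16.6, 18.3, 29.8, 23.0, 17.1, 14.2, 18.9, 20.8, 10.3,
    13.5, 26.1, 20.7, 15.7, 17.4, 14.0, 18.6, 17.8, 12.9, 33.8,
]
n = len(hours)

# Step 1: the 2 to the k rule -- smallest k such that 2**k > n.
k = 1
while 2 ** k <= n:
    k += 1

# Step 2: class width i >= (H - L) / k, rounded up to a whole number.
raw_width = (max(hours) - min(hours)) / k
width = math.ceil(raw_width)

# Step 3: class limits starting at 9.5 (lower limit included, upper excluded).
lower = 9.5
classes = [(lower + j * width, lower + (j + 1) * width) for j in range(k)]

# Steps 4-5: count observations per class, then relative frequencies.
freq = [sum(lo <= x < hi for x in hours) for lo, hi in classes]
rel = [f / n for f in freq]

for (lo, hi), f, r in zip(classes, freq, rel):
    print(f"{lo:4.1f} up to {hi:4.1f}  f = {f:2d}  rel = {r:.4f}")
```

With these data the sketch finds k = 5 and a width of 5, and it counts 7, 11, 8, 2 and 2 observations in the five classes, as in the tables above.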

Class Intervals and Mid Points
Two concepts are frequently used: the mid point and the class interval. The mid point, also called the class mark, is found by going halfway between the lower class limit and the upper class limit:

$$\text{Mid point} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$$
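As a small illustration of the mid point formula, this Python sketch computes the class marks of the hours-studied classes (9.5 up to 14.5, and so on). The function name `midpoint` is a hypothetical helper introduced here.

```python
# Lower and upper class limits of the hours-studied classes.
classes = [(9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5)]

def midpoint(lower, upper):
    """Class mark: halfway between the lower and upper class limits."""
    return (lower + upper) / 2

mids = [midpoint(lo, hi) for lo, hi in classes]
print(mids)  # [12.0, 17.0, 22.0, 27.0, 32.0]
```

Consecutive mid points differ by 5, the common width of these classes.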

The class interval for a frequency distribution having classes of the same size can be determined by subtracting the lower limit of a class from the lower limit of the next higher class. For example, for the classes 30,000 up to 40,000 and 40,000 up to 50,000, the class interval is 40,000 - 30,000 = 10,000.

Example: Consider the following data:
37 54 60 67 73 79 42 44 47 56 55 53 61 62 63 65 66 68 75 74 72 80 78 82 46 58 67 69 71 83 50 59 64 66 76 85 48 60 64 70 81 86 52 62 68 72 80 88 90 92
(i) Make a frequency distribution table.
(ii) Make a cumulative frequency distribution.
(iii) Make a relative frequency distribution.


Solution: (i), (ii) and (iii). Here the lowest value is 37 and the highest value is 92. The upper limit of each class is exclusive.

Class interval | Tally marks | Frequency | Cumulative frequency | Relative frequency = Frequency/50
35-40 | / | 1 | 1 | .02
40-45 | // | 2 | 3 | .04
45-50 | /// | 3 | 6 | .06
50-55 | //// | 4 | 10 | .08
55-60 | //// | 4 | 14 | .08
60-65 | ///// /// | 8 | 22 | .16
65-70 | ///// /// | 8 | 30 | .16
70-75 | ///// / | 6 | 36 | .12
75-80 | //// | 4 | 40 | .08
80-85 | ///// | 5 | 45 | .10
85-90 | /// | 3 | 48 | .06
90-95 | // | 2 | 50 | .04
Total | | 50 | | 1.00
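The cumulative frequency column is a running total of the class frequencies. Here is a minimal Python sketch for the 50 observations above. It assumes classes 35-40, 40-45, ..., 90-95 with exclusive upper limits, as stated in the solution, and uses `itertools.accumulate` from the standard library for the running total.

```python
from itertools import accumulate

# The 50 observations from the example.
data = [
    37, 54, 60, 67, 73, 79, 42, 44, 47, 56, 55, 53, 61, 62, 63, 65, 66,
    68, 75, 74, 72, 80, 78, 82, 46, 58, 67, 69, 71, 83, 50, 59, 64, 66,
    76, 85, 48, 60, 64, 70, 81, 86, 52, 62, 68, 72, 80, 88, 90, 92,
]
n = len(data)

# Class lower limits 35, 40, ..., 90; each class is [lower, lower + 5).
lowers = list(range(35, 95, 5))
freq = [sum(lo <= x < lo + 5 for x in data) for lo in lowers]

# Cumulative frequency: running total of the class frequencies.
cum = list(accumulate(freq))

# Relative frequency: class frequency divided by n.
rel = [f / n for f in freq]

for lo, f, c, r in zip(lowers, freq, cum, rel):
    print(f"{lo}-{lo + 5}  f = {f}  cf = {c}  rel = {r:.2f}")
```

The last cumulative frequency always equals n (here 50). This is a quick check that every observation was placed in exactly one class.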

Example: Form a frequency distribution from the following data by the inclusive method, taking 4 as the magnitude of the class intervals:
10 17 15 36 18 15 22 11 16 19 24 29 18 25 26 32 14 17 20 23 21 28 33 38 34 13 10 16 20 22 29 19 23 31 27 30 12 15 18 24

Solution:
Class interval | Tally marks | Frequency
10-13 | ///// | 5
14-17 | ///// /// | 8
18-21 | ///// /// | 8
22-25 | ///// // | 7
26-29 | ///// | 5
30-33 | //// | 4
34-37 | // | 2
38-41 | / | 1
Total | | 40

Example: Marks obtained by 50 boys of a class are as under:
34 30 04 54 60 17 10 59 45 21 15 25 51 07 43 52 18 40 12 43 48 36 48 51 55 32 22 41 39 22 26 30 34 35 19 53 49 40 25 14 10 17 47 18 38 13 19 40 30 43

Construct a frequency table with class intervals 0-9, 10-19, 20-29 and so on.

Solution: Under the inclusive method, the upper limit of each class is not repeated as the lower limit of the next class. That is, overlapping of class intervals is avoided, and both the lower and upper class limits are included in the class interval. Thus, a value of 49 falls in the 40-49 class.

Marks | Tally Bars | Frequency
0-9 | // | 2
10-19 | ///// ///// // | 12
20-29 | ///// / | 6
30-39 | ///// ///// | 10
40-49 | ///// ///// // | 12
50-59 | ///// // | 7
60-69 | / | 1
Total | | N = 50

Example: Following are the marks, out of 100, obtained by 50 students in Statistics:

70 45 33 64 50 25 65 75 30 20 55 65 60 58 52 56 45 42 35 40 47 51 39 61 33 59 49 41 15 55 42 63 82 65 45 63 54 52 48 46 57 53 55 42 45 39 64 55 26 18

Make a frequency distribution taking a class interval of 10 marks; take the first class interval as 0-10.

Solution: Here the upper limit of each class is exclusive, so, for example, a mark of 50 falls in the 50-60 class.

Frequency Distribution of Marks
Marks | Tally Bars | Frequency
0-10 | | 0
10-20 | // | 2
20-30 | /// | 3
30-40 | ///// / | 6
40-50 | ///// ///// /// | 13
50-60 | ///// ///// //// | 14
60-70 | ///// //// | 9
70-80 | // | 2
80-90 | / | 1
Total | | 50

Example: The following are the numbers of replacement parts used in a mill in 50 consecutive weeks for a certain group of similar machines:

49 41 45 52 47 46 42 43 46 48 45 36 56 44 61 68 54 58 51 47 47 49 42 48 53 48 41 65 45 52 58 50 55 45 43 72 63 45 38 43 42 47 43 49 46 57 49 44 47 48

Construct a frequency distribution with a uniform class interval.

Solution: Taking 5 as the width of the class interval, the distribution is as follows:

Class interval   Tally bars                   Frequency
35-39            //                           2
40-44            ///// ///// /                11
45-49            ///// ///// ///// ///// /    21
50-54            ///// /                      6
55-59            /////                        5
60-64            //                           2
65-69            //                           2
70-74            /                            1
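The inclusive method used in the replacement-parts example (classes 35-39, 40-44, ...) can be sketched as follows. This is an illustrative sketch only; `inclusive_frequency_table` is a hypothetical helper, and it assumes whole-number data so that a class of width 5 starting at 35 ends at 39 with both limits included.

```python
def inclusive_frequency_table(data, first_lower, width, num_classes):
    """Inclusive method for whole-number data: class k runs from
    first_lower + k*width to first_lower + k*width + width - 1,
    and both limits belong to the class (e.g. 35-39, 40-44, ...)."""
    table = []
    for k in range(num_classes):
        lower = first_lower + k * width
        upper = lower + width - 1
        # Both the lower and the upper limit are counted in this class
        f = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, f))
    return table


# Replacement parts used in 50 consecutive weeks
parts = [49, 41, 45, 52, 47, 46, 42, 43, 46, 48, 45, 36, 56, 44, 61, 68, 54,
         58, 51, 47, 47, 49, 42, 48, 53, 48, 41, 65, 45, 52, 58, 50, 55, 45,
         43, 72, 63, 45, 38, 43, 42, 47, 43, 49, 46, 57, 49, 44, 47, 48]

parts_table = inclusive_frequency_table(parts, first_lower=35, width=5, num_classes=8)
for lower, upper, f in parts_table:
    print(f"{lower}-{upper}: {f}")
```

Because both limits are included, no value can fall into two classes, and the gap of 1 between 39 and 40 is what the class boundaries 39.5, 44.5, ... later close up.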

Example: Form a frequency distribution by taking a suitable class interval for the following data giving the ages of 52 employees in a government agency:

67 34 36 48 49 31 61 34 43 45 38 32 28 61 29 47 36 50 46 30 46 32 30 33 45 49 48 41 53 36 37 47 47 30 46 50 28 35 35 38 46 43 34 36 62 69 50 28 44 43 60 39

Solution: The lowest value is 28 and the highest value is 69, so the difference between the two extreme values is 69 - 28 = 41. Taking a class interval of 5 gives 41/5 = 8.2 classes, so nine classes are needed to accommodate the fractional part.

Frequency Distribution of Ages
Class interval   Tally bars         Frequency
25-30            ////               4
30-35            ///// /////        10
35-40            ///// /////        10
40-45            /////              5
45-50            ///// ///// ///    13
50-55            ////               4
55-60                               0
60-65            ////               4
65-70            //                 2
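The arithmetic for choosing the number of classes in the ages example can be checked with a short sketch. The variable names are illustrative assumptions, and starting the first class at a multiple of the width at or below the minimum (here 25) is one reasonable choice, matching the table rather than prescribing it.

```python
import math

ages = [67, 34, 36, 48, 49, 31, 61, 34, 43, 45, 38, 32, 28, 61, 29, 47, 36,
        50, 46, 30, 46, 32, 30, 33, 45, 49, 48, 41, 53, 36, 37, 47, 47, 30,
        46, 50, 28, 35, 35, 38, 46, 43, 34, 36, 62, 69, 50, 28, 44, 43, 60, 39]

width = 5
data_range = max(ages) - min(ages)              # 69 - 28 = 41
classes_needed = math.ceil(data_range / width)  # 41 / 5 = 8.2, rounded up to 9

# First class starts at a multiple of the width at or below the minimum (25).
start = (min(ages) // width) * width
# Count classes from that start so the maximum is always covered.
num_classes = (max(ages) - start) // width + 1

frequencies = [0] * num_classes
for age in ages:
    frequencies[(age - start) // width] += 1   # upper limit exclusive

for k, f in enumerate(frequencies):
    lower = start + k * width
    print(f"{lower}-{lower + width}: {f}")
```

Here both counts give nine classes. The second formula is used for the table because, when the first class starts below the minimum, rounding up the range divided by the width can occasionally fall one class short of the maximum value.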

Suggestions on constructing a frequency distribution

(1) Equal-size class intervals should be used: Whenever possible, the class intervals used in a frequency distribution should be equal. Unequal class intervals present problems when the distribution is portrayed graphically. Unequal class intervals, however, may be necessary in certain situations in order to avoid a large number of empty or almost empty classes.
(2) Avoid overlapping class limits.
(3) Avoid open-ended classes.

Graphical Presentation of a Frequency Distribution

We will concentrate on three commonly used graphic forms: a histogram, a frequency polygon and a cumulative frequency polygon (often called an ogive).

Histogram: A histogram is one of the most widely used charts and one of the easiest to understand. It is a common graphical presentation of quantitative data. It describes a frequency distribution by a series of adjacent bars, each bar representing the frequency of a particular class. The histogram is particularly appropriate when the variable is continuous.

Class boundary: Because there is a gap between the upper class limit of one class and the lower class limit of the next, true end points (say 34.5, 39.5, 44.5, ...) that divide the classes are adopted. These end points are called class boundaries; each class has a lower and an upper class boundary. The limits of discontinuous classes should therefore be converted into class boundaries:

Upper class boundary = Upper class limit + d/2
Lower class boundary = Lower class limit - d/2

where d = the gap between the upper limit of any class and the lower limit of the succeeding class.

A histogram can be constructed in the following way:
Step 1: On a sheet of graph paper, use the class boundaries to mark the class intervals on the horizontal axis.
Step 2: Fill each class interval with a vertical bar so that the height of the bar equals the corresponding class frequency.

Example: For the following data, construct a histogram. Here the upper class limits are exclusive.
Class    Frequency
35-40    1
40-45    2
45-50    3
50-55    4
55-60    4
60-65    8
65-70    8
70-75    6
75-80    4
80-85    5
85-90    3
90-95    2

Solution:

Class interval   Class boundaries   Frequency
35-40            34.5-39.5          1
40-45            39.5-44.5          2
45-50            44.5-49.5          3
50-55            49.5-54.5          4
55-60            54.5-59.5          4
60-65            59.5-64.5          8
65-70            64.5-69.5          8
70-75            69.5-74.5          6
75-80            74.5-79.5          4
80-85            79.5-84.5          5
85-90            84.5-89.5          3
90-95            89.5-94.5          2

The histogram follows:

Figure: Histogram (horizontal axis: class boundary, 34.5 to 94.5; vertical axis: frequency)
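The class-boundary rule (move each limit outward by d/2) can be sketched in Python, here paired with a text-mode histogram instead of graph paper. The helper name `class_boundaries` is hypothetical, and the sketch assumes whole-number data, so the class written 35-40 with an exclusive upper limit holds the values 35 to 39 and the gap d is 1, which reproduces the boundaries 34.5-39.5, 39.5-44.5, ... in the table.

```python
def class_boundaries(limits):
    """Convert class limits [(lower, upper), ...] into class boundaries.

    d is the gap between the upper limit of one class and the lower limit
    of the next; each limit is moved outward by d/2. Assumes a constant gap
    and at least two classes."""
    d = limits[1][0] - limits[0][1]
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]


# Whole-number data: the class written "35-40" (upper limit exclusive) holds 35..39.
limits = [(35 + 5 * k, 39 + 5 * k) for k in range(12)]
frequencies = [1, 2, 3, 4, 4, 8, 8, 6, 4, 5, 3, 2]

boundaries = class_boundaries(limits)
for (lb, ub), f in zip(boundaries, frequencies):
    # Text-mode histogram: one '#' per observation in the class
    print(f"{lb:>5}-{ub:<5} {'#' * f}")
```

Because adjacent boundaries coincide (39.5 ends one class and starts the next), the bars drawn on them touch, which is what distinguishes a histogram from a bar diagram.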

Example: For the following data, construct a histogram.

Class          Frequency
35 up to 40    1
40 up to 45    2
45 up to 50    3
50 up to 55    4
55 up to 60    4
60 up to 65    8
65 up to 70    8
70 up to 75    6
75 up to 80    4
80 up to 85    5
85 up to 90    3
90 up to 95    2

Solution:

Figure: Histogram (horizontal axis: class interval, 35 up to 40 through 90 up to 95; vertical axis: frequency)

The Frequency Polygon

A frequency polygon is similar in shape to the histogram. It consists of line segments connecting the points formed by the intersection of the class midpoint and the class frequency.

Example: For the data given in the previous example, construct a frequency polygon.

Solution:

Class boundaries   Mid-value   Frequency
34.5-39.5          37          1
39.5-44.5          42          2
44.5-49.5          47          3
49.5-54.5          52          4
54.5-59.5          57          4
59.5-64.5          62          8
64.5-69.5          67          8
69.5-74.5          72          6
74.5-79.5          77          4
79.5-84.5          82          5
84.5-89.5          87          3
89.5-94.5          92          2

Figure: Frequency Polygon (horizontal axis: mid value, 37 to 92; vertical axis: frequency)
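The points plotted in a frequency polygon can be computed as in this sketch. The function `polygon_points` and its `close` flag are hypothetical; the mid-value is taken as the average of the lower and upper class boundaries, which gives 37, 42, ..., 92 as in the table. The optional closing points follow a common textbook convention that the figure in these notes does not use.

```python
def polygon_points(boundaries, frequencies, close=False):
    """Points (class mid-value, frequency) of a frequency polygon.

    The mid-value is the average of the lower and upper class boundaries.
    With close=True, a zero-frequency point is added one class width beyond
    each end so that the polygon meets the horizontal axis."""
    points = [((lb + ub) / 2, f) for (lb, ub), f in zip(boundaries, frequencies)]
    if close:
        width = boundaries[0][1] - boundaries[0][0]
        points = ([(points[0][0] - width, 0)] + points
                  + [(points[-1][0] + width, 0)])
    return points


boundaries = [(34.5 + 5 * k, 39.5 + 5 * k) for k in range(12)]
frequencies = [1, 2, 3, 4, 4, 8, 8, 6, 4, 5, 3, 2]

points = polygon_points(boundaries, frequencies)
closed = polygon_points(boundaries, frequencies, close=True)
for x, f in points:
    print(x, f)
```

Joining consecutive points with straight lines gives the polygon; since it only needs one point per class, two distributions can be drawn on the same axes for comparison.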

Comparison between Histogram and Bar Diagram

At first glance a histogram and a bar diagram look alike, but they are quite different and serve distinct purposes. First, a histogram is used only for representing a frequency distribution, whereas a bar diagram is used mainly for qualitative data. Second, in a histogram the area of a rectangle is proportional to the relevant frequency, whereas in a bar diagram it is the height of the bar that counts. Third, the rectangles in a histogram are all adjacent, but the spacing of the bars in a bar diagram is quite arbitrary.

Comparison between Histogram and Frequency Polygon

If the variable under consideration is continuous, the histogram is superior to the frequency polygon. If the variable is essentially discrete, the frequency polygon is to be preferred. A histogram can be used for unequal class intervals, but a frequency polygon for grouped data is admissible only when the intervals are equal. The frequency polygon has one advantage over the histogram: it allows two or more frequency distributions to be compared directly on the same graph.

Cumulative Frequency Polygon

Suppose we are interested in how many private schools pay their teachers Tk. 55,000 a year or less, and how many private schools have average salaries of Tk. 72,000 or more. The answers to these queries can be approximated by developing a cumulative frequency distribution and portraying it graphically in a cumulative frequency polygon, often called an ogive. There are two types of cumulative frequency polygon: (1) a less than cumulative frequency polygon and (2) a more than cumulative frequency polygon.

A less than cumulative frequency polygon

Example: The frequency distribution of the annual salaries at 160 colleges is given below.

Income ($ thousands)   Frequency (number of colleges)
20 up to 30            4
30 up to 40            20
40 up to 50            41
50 up to 60            44
60 up to 70            29
70 up to 80            16
80 up to 90            2
90 up to 100           4

(a) Construct a less than cumulative frequency polygon.
(b) 50% of the colleges have average annual salaries equal to or less than what amount?
(c) Seventy-five percent of the salaries are equal to or less than what amount?

Solution:

Table: Less than Cumulative Frequency Distribution for the Average Annual Salaries of Professors
Average annual salary ($ thousands)   Frequency   Cumulative frequency
20 up to 30                           4           4
30 up to 40                           20          24
40 up to 50                           41          65
50 up to 60                           44          109
60 up to 70                           29          138
70 up to 80                           16          154
80 up to 90                           2           156
90 up to 100                          4           160

(a) To plot a less than cumulative frequency polygon, the upper class limits (boundaries) are scaled on the X-axis and the cumulative frequencies on the Y-axis.

Fig. Less Than Cumulative Frequency Polygon (horizontal axis: upper class limits; vertical axis: cumulative frequency)

(b) 160 × 0.50 = 80. Reading across from a cumulative frequency of 80 to the polygon gives about $54,000.
(c) 160 × 0.75 = 120. The corresponding salary is about $64,000.

More than cumulative frequency polygon

A more than cumulative frequency distribution is constructed by starting with the highest class and working backward, i.e., by adding the frequencies from the highest class down to the lowest class. To draw a more than cumulative frequency polygon, we use the lower limit of each class and the corresponding cumulative frequency. The details are presented here:

Average annual salary ($ 000)   Frequency   Cumulative frequency
20 up to 30                     4           160
30 up to 40                     20          156
40 up to 50                     41          136
50 up to 60                     44          95
60 up to 70                     29          51
70 up to 80                     16          22
80 up to 90                     2           6
90 up to 100                    4           4

Fig. More Than Cumulative Frequency Polygon (horizontal axis: lower class limit; vertical axis: cumulative frequency)
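Instead of reading the ogives by eye, the salary questions can be approximated by linear interpolation inside a class. This is a sketch under a stated assumption (the observations are spread evenly within each class, which is also what joining the ogive points with straight lines implies); the helper names `less_than_value` and `more_than_count` are hypothetical.

```python
def less_than_value(lower_limits, width, freqs, target):
    """Value at which the less-than cumulative frequency reaches `target`,
    assuming the observations are spread evenly within each class."""
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            # Linear interpolation inside the class that contains the target
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("target exceeds the total frequency")


def more_than_count(lower_limits, width, freqs, x):
    """Approximate number of observations >= x, with the same even-spread assumption."""
    count = 0.0
    for lower, f in zip(lower_limits, freqs):
        upper = lower + width
        if x <= lower:
            count += f                        # the whole class lies at or above x
        elif x < upper:
            count += f * (upper - x) / width  # only part of the class lies above x
    return count


lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]  # $ thousands
freqs = [4, 20, 41, 44, 29, 16, 2, 4]
n = sum(freqs)                                    # 160 colleges

median_salary = less_than_value(lower_limits, 10, freqs, 0.50 * n)  # part (b)
q3_salary = less_than_value(lower_limits, 10, freqs, 0.75 * n)      # part (c)
at_least_55 = more_than_count(lower_limits, 10, freqs, 55)
print(f"(b) about ${median_salary:.1f} thousand")
print(f"(c) about ${q3_salary:.1f} thousand")
print(f"salaries >= $55,000: about {at_least_55:.0f} colleges")
```

The interpolated answers come out at about $53,400, $63,800 and 73 colleges, close to the graph readings of about $54,000, $64,000 and 75; the small differences come from reading values off a drawn polygon.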


To determine how many of the average annual salaries at the 160 colleges are equal to or greater than $55,000, we read the more than polygon at $55,000. The answer is about 75.

STEM-AND-LEAF PLOT

This diagram is used to represent the values of a quantitative variable. Each value of the variable is divided into two parts: the first part is called the stem and the second part the leaf. Each part may contain one or more digits. For a two-digit number, the tens digit is used as the stem and the units (rightmost) digit as the leaf. For a three-digit number, the rightmost digit is used as the leaf and the digits to its left are used as the stem. Thus, there may be many leaves against a stem. For example, if the values of a variable are 15, 18, 22, 26, 24, then the stems are 1 and 2; stem 1 has the leaves 5 and 8, and stem 2 has the leaves 2, 4 and 6. The leaves are shown against a stem in ascending order of magnitude, and the stems are listed one below another in a column, also in ascending order. If a leaf would contain more than one digit, only one digit is retained and the others are dropped. The resulting diagram, showing the values of a variable split into stems and leaves as described above, is known as a stem-and-leaf plot.

This diagram is helpful in presenting small data sets and in giving an idea of the range of the data. Repeated observations in the data set can also be noted. The diagram looks like a histogram with its bars drawn parallel to the X-axis. However, a stem-and-leaf plot shows every value in the data, which is not possible with a histogram.

Example: The following data represent the marks of some students in a Statistics course (marks out of 100):

23, 45, 72, 75, 80, 86, 60, 42, 52, 48, 66, 72, 71, 80, 82, 84, 32, 34, 37, 46, 48, 49, 60, 61, 62, 65, 86, 75, 74, 62, 45, 36, 30, 64, 66, 70, 71, 55, 54, 50, 52, 53, 55, 58, 64, 63, 64

Represent the data by a stem-and-leaf plot.

Solution:

Stem | Leaf
2    | 3
3    | 0 2 4 6 7
4    | 2 5 5 6 8 8 9
5    | 0 2 2 3 4 5 5 8
6    | 0 0 1 2 2 3 4 4 4 5 6 6
7    | 0 1 1 2 2 4 5 5
8    | 0 0 0 2 4 6 6
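The splitting rule described above (all digits but the last form the stem, the last digit is the leaf, leaves in ascending order) can be sketched for non-negative whole numbers. The function names are hypothetical and the sketch is illustrative only; it omits stems that have no leaves.

```python
def stem_and_leaf(values):
    """Split non-negative integers into stem (all digits but the last) and
    leaf (last digit). Returns {stem: leaves}, stems and leaves ascending."""
    plot = {}
    for v in sorted(values):            # sorting first keeps the leaves ascending
        stem, leaf = divmod(v, 10)      # e.g. 26 -> stem 2, leaf 6; 123 -> 12, 3
        plot.setdefault(stem, []).append(leaf)
    return plot


def print_plot(plot):
    for stem, leaves in plot.items():
        print(f"{stem:>3} | {' '.join(map(str, leaves))}")


small = stem_and_leaf([15, 18, 22, 26, 24])
print_plot(small)
```

For the values 15, 18, 22, 26 and 24 this gives stem 1 with leaves 5 and 8 and stem 2 with leaves 2, 4 and 6, as in the text.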

Example: We produce a stem-and-leaf display for the per capita personal income data given in the following table.

Per Capita Income for 1983 by State (in Thousands of Dollars)

State:   AL    AK    AZ    AR    CA    CO    CT    DE    FL
Income:  9.2   17.2  10.7  9.0   13.3  12.8  14.9  12.7  11.6

State:   GA    HI    ID    IL    IN    IA    KS    KY    LA
Income:  10.4  12.1  9.6   12.4  10.5  10.7  12.2  9.4   10.3

State:   ME    MD    MA    MI    MN    MS    MO    MT    NE
Income:  9.8   13.0  13.3  11.5  11.9  8.1   11.0  9.9   11.2

State:   NV    NH    NJ    NM    NY    NC    ND    OH    OK    OR    PA
Income:  12.5  12.2  14.1  9.6   13.0  9.8   11.7  11.2  11.0  10.7  11.5

State:   RI    SC    SD    TN    TX    UT    VT    VA    WA    WV    WI    WY
Income:  11.7  9.2   9.8   9.5   11.7  9.0   10.0  12.1  12.2  9.2   11.4  11.9

For these data the smallest value is 8.1 and the largest is 17.2. The stems (the whole-number parts) are recorded in a vertical column. For each number in the data set, the leaf, or fractional part of the number, is recorded to the right of its stem. So the stem-and-leaf display is:

Stem | Leaf
7    |
8    | .1
9    | .2 .0 .6 .4 .8 .9 .6 .8 .2 .8 .5 .0 .2
10   | .7 .4 .5 .7 .3 .7 .0
11   | .6 .5 .9 .0 .2 .7 .2 .0 .5 .7 .7 .4 .9
12   | .8 .7 .1 .4 .2 .5 .2 .1 .2
13   | .3 .0 .3 .0
14   | .9 .1
15   |
16   |
17   | .2

We immediately see that the per capita incomes (in $1000 units) range from 8.1 to 17.2, that all but 4 states had per capita personal incomes between 9.0 and 14.0, that exactly 13 states had per capita incomes from 9.0 to 9.9, and that exactly 13 states had incomes from 11.0 to 11.9.

Example: The data below give the monthly output of coal from July 2001 to June 2003 for a particular country.

Monthly output (million tons):
22.3 27.0 26.1 26.2 22.3 26.0 29.4 25.6 25.6 23.7 28.8 28.2 24.6 23.8 27.2 28.1 22.2 26.3 27.1 28.9 25.7 27.3 27.4 29.6 25.7 25.9 23.0 28.5 27.1 26.0 26.0 29.7 27.0 28.8 28.2 26.7

The data can be presented by a stem-and-leaf display as follows:

Stem | Leaf
22   | .3 .3 .2
23   | .7 .8 .0
24   | .6
25   | .6 .7 .7 .9 .6
26   | .0 .3 .0 .1 .0 .2 .7
27   | .1 .0 .0 .3 .2 .1 .4
28   | .8 .8 .2 .2 .1 .9 .5
29   | .4 .6 .7
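For data recorded to one decimal place, as in the income and coal examples, the whole-number part can serve as the stem and the tenths digit as the leaf. The sketch below is illustrative only (the helper name `decimal_stem_and_leaf` is hypothetical); unlike the displays above, which list the leaves unsorted, it sorts them, following the ascending-order rule stated for stem-and-leaf plots, and it keeps empty stems so that gaps (such as 15 and 16 in the income data) stay visible.

```python
def decimal_stem_and_leaf(values):
    """Stem = whole-number part, leaf = tenths digit, for non-negative values
    with one decimal place. Leaves are sorted; empty stems are kept."""
    plot = {}
    for v in sorted(values):
        # round() guards against binary floating-point error (22.3 * 10 is not exactly 223)
        stem, leaf = divmod(round(v * 10), 10)
        plot.setdefault(stem, []).append(leaf)
    # Include every stem between the smallest and largest, even with no leaves
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


coal = [22.3, 27.0, 26.1, 26.2, 22.3, 26.0, 29.4, 25.6, 25.6, 23.7, 28.8, 28.2,
        24.6, 23.8, 27.2, 28.1, 22.2, 26.3, 27.1, 28.9, 25.7, 27.3, 27.4, 29.6,
        25.7, 25.9, 23.0, 28.5, 27.1, 26.0, 26.0, 29.7, 27.0, 28.8, 28.2, 26.7]

plot = decimal_stem_and_leaf(coal)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {' '.join(f'.{leaf}' for leaf in leaves)}")
```

The number of leaves on each stem matches the display above; only the order of the leaves within a stem differs.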

Figure: Lecture Blue Print, Describing Data Sets (frequency distributions: relative, percent and cumulative frequency distributions; graphic presentations: histogram, bar chart, pie chart, frequency polygon, cumulative frequency polygon; stem and leaf)
