Curs Eng REI La Statistica

1 Introduction
1.1 History
1.2 Object
1.3 Data collection
2 Arrange the data
2.1 Raw data
2.2 Data array
2.3 Frequency Distribution
3 Graphical presentations
3.1 Cartograms
3.2 Frequency Polygons
3.3 Column charts
3.4 Pies
3.5 Histograms
3.6 Ogive (cumulative frequency distribution)
3.7 Radar charts
3.8 Pareto charts
4 Measuring the Central Tendency
4.1 Geometrical measures
4.1.1 The Mode
4.1.2 The Median
4.2 Algebraic measures
4.2.1 The Arithmetic Mean
4.2.2 The Geometric Mean
4.2.3 Harmonic mean
4.2.4 Quadratic mean
4.3 Select appropriate measure
5 Dispersion
5.1 The range
5.2 Interfractile Range
1 Introduction
1.1 History
The word statistik comes from the Italian word statista (meaning "statesman"). It was first used
by Gottfried Achenwall (1719-1772), a professor at Marburg and Göttingen. Dr. E. A. W.
Zimmerman introduced the word statistics into England. Its use was popularized by Sir John
Sinclair in his work, Statistical Account of Scotland 1791-1799. Long before the eighteenth
century, however, people had been recording and using data.
Official government statistics are as old as recorded history. The Old Testament contains several
accounts of census taking. Governments of ancient Babylonia, Egypt, and Rome gathered
detailed records of populations and resources. In the Middle Ages, governments began to register
the ownership of land. In A.D. 762, Charlemagne asked for detailed descriptions of church-owned
properties. Early in the ninth century, he completed a statistical enumeration of the serfs attached
to the land. About 1086, William the Conqueror ordered the writing of the Domesday Book, a
record of the ownership, extent, and value of the lands of England. This work was England's first
statistical abstract.
1.2 Object
Managers apply some statistical technique to every branch of public and private enterprise.
These techniques are so diverse that statisticians commonly separate them into two broad
categories: descriptive statistics and inferential statistics.
Suppose a professor computes an average grade for one history class. Since these statistics describe the
performance of that one class but do not make a generalization about several classes, we can say
that the professor is using descriptive statistics. Graphs, tables, and charts that display data so
that they are easier to understand are all examples of descriptive statistics.
Now suppose that the history professor decides to use the average grade achieved by one history
class to estimate the average grade achieved in all ten sections of the same history course. The
process of estimating this average grade would be a problem in inferential statistics. Statisticians
also refer to this category as statistical inference. Obviously, any conclusion the professor makes
about the ten sections of the course will be based on a generalization that goes far beyond the
data for the original history class; and the generalization may not be completely valid, so the
professor must state how likely it is to be true. Similarly, statistical inference involves
generalizations and statements about the probability of their validity.
The methods and techniques of statistical inference can also be used in a branch of statistics
called decision theory. Knowledge of decision theory is very helpful for managers, because it is
used to make decisions under conditions of uncertainty: when, for example, a manufacturer of
stereo sets cannot specify precisely the demand for its products or when the chairperson of the
English department must schedule faculty teaching assignments without knowing precisely the
student enrolment.
1.3 Data collection
Usually statisticians gather data from a sample. They use this information to make inferences
about the population that the sample represents. Thus, sample and population are relative terms.
A population is a whole, and a sample is a fraction or segment of that whole.
A hospital may study a small, representative group of X-ray records rather than examining each
record for the last 50 years. The Gallup Poll may interview a sample of only 2,500 adult
Americans in order to predict the opinion of all adults living in the United States.
Studying samples is easier than studying the whole population; it costs less and takes less time.
Often, testing an airplane part for strength destroys the part; thus, testing fewer parts is desirable.
Sometimes testing involves human risk; thus, use of sampling reduces that risk to an acceptable
level. Finally, it has been shown that examining an entire population still allows defective items
to be accepted; thus, sampling, in some instances, can raise the quality level. If you're wondering
how that can be so, think of how tired and inattentive you might get if you had to look
continuously at thousands and thousands of items passing before you.
A population is a collection of all the elements we are studying and about which we are trying to
draw conclusions. We must define this population so that it is clear whether or not an element is
a member of the population. The population for a shoe market study may be all women from
urban areas who have annual family incomes between 5,000 and 20,000 and are at least 30
years old.
A sample is a collection of some, but not all, of the elements of the population. The population of
our marketing survey is all women who meet the qualifications listed above. Any group of
women who meet these qualifications can be a sample, as long as the group is only a fraction of
the whole population.
Statisticians select their observations so that all relevant groups are represented in the data. To
determine the potential market for a new product, for example, analysts might study 100
consumers in a certain geographical area. Analysts must be certain that this group contains
people representing variables such as income level, race, education, and neighbourhood.
A representative sample contains the relevant characteristics of the population in the same
proportions as they are included in that population. If one-third of our population is represented
by retired women, then a sample of the population that is representative in terms of age will also
contain one-third retired women.
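One way to obtain a sample with the same proportions as the population is to draw from each group separately, in proportion to its size. The sketch below is only an illustration: the population records, the field name status, and the function proportional_sample are all hypothetical, and rounding the quotas can make the total differ slightly from the requested sample size.

```python
import random
from collections import Counter, defaultdict

def proportional_sample(population, key, sample_size, seed=0):
    """Draw a sample whose group shares mirror those of the population."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for element in population:
        groups[key(element)].append(element)
    sample = []
    for members in groups.values():
        # Each group contributes in proportion to its share of the population.
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 300 women, one-third of them retired.
population = [{"id": i, "status": "retired" if i % 3 == 0 else "working"}
              for i in range(300)]
sample = proportional_sample(population, lambda p: p["status"], sample_size=30)
shares = Counter(p["status"] for p in sample)
print(shares["retired"], "of", len(sample), "sampled women are retired")
```

A simple random sample of the whole population would only approximate these proportions; drawing group by group enforces them.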
2 Arrange the data
2.1 Raw data
Information before it is arranged and analyzed is called raw data. It is "raw" because it is
unprocessed by statistical methods.
2.2 Data array
When data are arranged in compact, usable form, decision makers can take reliable information
from the environment and use it to make intelligent decisions. The purpose of organizing data is
to enable us to see quickly some of the characteristics of the data we have collected. We look for
things such as the range (the largest and smallest values), apparent patterns, what values the data
may tend to group around, what values appear most often, and so on.
The more information of this kind that we can learn from our sample, the better we can
understand the population from which it came, and the better we can make decisions. An
important part of planning management information systems is to summarize and present data to
convey critical information quickly and simply.
The data array is one of the simplest ways to present data. It arranges values in ascending or
descending order. (example)
Data arrays offer several advantages over raw data:
We can quickly notice the lowest and highest values in the data. In our carpet example ...
We can easily divide the data into sections. The first 10, etc.
We can see whether any values appear more than once in the array. Equal values appear
together.
We can observe the distance between succeeding values in the data.
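The four advantages listed above can be illustrated with a few lines of Python. The measurements below are invented for the illustration and are not the data of the carpet example.

```python
from collections import Counter

# Hypothetical raw measurements, in the order they were recorded.
raw = [16.2, 15.8, 15.8, 16.8, 16.0, 15.4, 16.9, 15.7, 15.8, 16.3]

array = sorted(raw)                      # the data array (ascending order)
lowest, highest = array[0], array[-1]    # extremes sit at the two ends
half = len(array) // 2
first_half, second_half = array[:half], array[half:]  # easy to cut into sections
# Equal values are adjacent in the array, so repetitions are easy to spot.
repeated = {v: c for v, c in Counter(array).items() if c > 1}
# Distance between each value and the next one.
gaps = [round(b - a, 1) for a, b in zip(array, array[1:])]

print(array)
print(lowest, highest, repeated, max(gaps))
```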
2.3 Frequency Distribution
In spite of these advantages, sometimes a data array isn't helpful. Because it lists every
observation, it is a cumbersome form for displaying large quantities of data. We need to
compress the information and still be able to use it for interpretation and decision making.
One way we can compress data is to use a frequency table or a frequency distribution (example:
construction of a frequency table). A frequency distribution is a table that organizes data into
classes, that is, into groups of values describing one characteristic of the data. A frequency
distribution shows the number of observations from the data set that fall into each of the classes.
Notice that we lose some information in constructing the frequency distribution. (example)
A relative frequency distribution presents frequencies in terms of fractions or percentages. Notice
that the sum of all the relative frequencies equals 1.00, or 100 percent.
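A minimal sketch of how a frequency table and its relative frequencies might be built. It assumes equal-width classes that include their lower limit and exclude their upper limit; the data values and the function name frequency_distribution are hypothetical.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count the observations falling in n_classes equal-width classes [a, b)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

data = [12, 15, 17, 21, 22, 22, 25, 28, 31, 34, 35, 39]  # hypothetical observations
counts = frequency_distribution(data, lower=10, width=10, n_classes=3)
relative = [c / len(data) for c in counts]

for k, (c, r) in enumerate(zip(counts, relative)):
    low = 10 + 10 * k
    print(f"{low}-{low + 9}: frequency {c}, relative frequency {r:.2f}")
print("sum of relative frequencies:", round(sum(relative), 2))
```

The table no longer tells us the individual values inside each class, which is the loss of information mentioned above.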
Classification schemes can be either quantitative or qualitative and either discrete or continuous.
Discrete classes are separate entities that do not progress from one class to the next without a
break. Continuous data do progress from one class to the next without a break. They involve
numerical measurement such as the weights of cans of tomatoes or pressure.
It's helpful to remember that discrete variables are things that can be counted, but continuous
variables are things that appear at some point on a scale.
3 Graphical presentations
Graphs give data in a two-dimensional picture. On the horizontal axis, we can show the values of
the variable (the characteristic we are measuring). On the vertical axis, we mark the frequencies
of the classes shown on the horizontal axis. Graphs are useful because they emphasize and clarify
patterns that are not so readily discernible in tables. They attract a reader's attention to patterns in
the data. Graphs can also help us do problems concerning frequency distributions.
3.1 Cartograms
3.2 Frequency Polygons
3.3 Column charts
3.4 Pies
3.5 Histograms
A histogram is a series of rectangles, each proportional in width to the range of values within a
class and proportional in height to the number of items falling in the class.
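As a rough illustration, a histogram can be sketched in plain text. The version below is an assumption-laden simplification: the classes are taken to be of equal width, the chart is rotated so that each bar runs horizontally, and the class labels, frequencies, and function name text_histogram are hypothetical.

```python
def text_histogram(labels, frequencies, symbol="#"):
    """Return one text bar per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return "\n".join(
        f"{label:>{width}} | {symbol * n} ({n})"
        for label, n in zip(labels, frequencies)
    )

print(text_histogram(["10-19", "20-29", "30-39"], [3, 5, 4]))
```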
3.6 Ogive (cumulative frequency distribution)
3.7 Radar charts
3.8 Pareto charts
4 Measuring the Central Tendency
4.1 Geometrical measures
4.1.1 The Mode
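The mode is the value that is repeated most often in the data set.
For grouped data, first find the modal class. The mode is then
\[ M_o = x_0 + h \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2}, \]
where
$x_0$ = lower limit of the modal class
$h$ = class width
$\Delta_1 = n_i - n_{i-1}$
$\Delta_2 = n_i - n_{i+1}$
Here $n_i$ is the frequency of the modal class, and $n_{i-1}$ and $n_{i+1}$ are the frequencies of the
classes just before and just after it.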
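The grouped-data formula, read as mode = x0 + h * D1 / (D1 + D2) with D1 and D2 the differences between the modal frequency and the frequencies of the neighbouring classes, can be sketched as follows. The class limits and frequencies are hypothetical, and a missing neighbour (modal class at either end) is assumed to have frequency zero.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of grouped data: x0 + h * d1 / (d1 + d2)."""
    # Index of the modal class (the first one, if several share the top frequency).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    n_prev = frequencies[i - 1] if i > 0 else 0
    n_next = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    d1 = frequencies[i] - n_prev    # difference to the class before
    d2 = frequencies[i] - n_next    # difference to the class after
    return lower_limits[i] + width * d1 / (d1 + d2)

# Hypothetical classes 10-20, 20-30, 30-40 with frequencies 3, 5, 4.
print(grouped_mode([10, 20, 30], 10, [3, 5, 4]))
```

With these figures the modal class is 20-30, d1 = 2 and d2 = 1, so the mode lies two-thirds of the way into the class.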
4.1.2 The Median
The median is a single value from the data set that measures the central item in the data. Half of
the items lie above this point, and the other half lie below it.
To find the median of a data set, first array the data in ascending or descending order. If the data
set contains an odd number of items, the middle item of the array is the median. If there is an
even number of items, the median is the average of the two middle items.
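The procedure just described translates almost directly into code. This is a minimal sketch with made-up numbers; the standard library function statistics.median does the same job.

```python
def median(values):
    """Median of raw data: middle item, or average of the two middle items."""
    array = sorted(values)          # step 1: build the data array
    n = len(array)
    middle = n // 2
    if n % 2 == 1:                  # odd number of items
        return array[middle]
    return (array[middle - 1] + array[middle]) / 2   # even number of items

print(median([7, 1, 5]))       # odd count
print(median([4, 1, 3, 2]))    # even count
```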
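Often, we have access to data only after it has been grouped in a frequency distribution. In this
case we must locate the median class and then compute
\[ M_e = x_0 + h \cdot \frac{\frac{\sum n_i + 1}{2} - (F + 1)}{n_e}, \]
where $x_0$ is the lower limit of the median class, $h$ is the class width, $n_e$ is the frequency of
the median class, $\sum n_i$ is the total number of observations, and
$F$ = sum of all the class frequencies up to, but not including, the median class.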
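A sketch of the grouped-data calculation, assuming the formula reads median = x0 + h * ((n + 1) / 2 - (F + 1)) / n_e, where n is the total number of observations, n_e the frequency of the median class, and F the cumulative frequency before it. The classes and frequencies are hypothetical.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of grouped data, interpolated inside the median class."""
    n = sum(frequencies)
    position = (n + 1) / 2          # rank of the central item
    cumulative = 0                  # F: frequencies before the current class
    for x0, n_e in zip(lower_limits, frequencies):
        if cumulative + n_e >= position:
            # This is the median class: interpolate within it.
            return x0 + width * (position - (cumulative + 1)) / n_e
        cumulative += n_e
    raise ValueError("frequencies must contain at least one observation")

# Hypothetical classes 10-20, 20-30, 30-40 with frequencies 3, 5, 4.
print(grouped_median([10, 20, 30], 10, [3, 5, 4]))
```

Here n = 12, the central rank is 6.5, the median class is 20-30 with F = 3, and the result is 20 + 10 * (6.5 - 4) / 5 = 25.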
4.2 Algebraic measures
4.2.1 The Arithmetic Mean
Most of the time when we refer to the average of something we are talking about the arithmetic
mean of the $n$ observed values $x_i$:
\[ \bar{x} = \frac{\sum x_i}{n} \]
To find the arithmetic mean of grouped data, we first calculate the midpoint of each class. The
weighted mean enables us to calculate an average that takes into account the importance of each
value to the overall total. With $x_i$ the class midpoints and $n_i$ the class frequencies:
\[ \bar{x} = \frac{\sum n_i x_i}{\sum n_i} \]
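Both versions of the arithmetic mean can be sketched in a few lines. The data are hypothetical; for grouped data the class midpoints stand in for the individual values and the class frequencies act as weights.

```python
def arithmetic_mean(values):
    """Simple mean: sum of the values divided by their number."""
    return sum(values) / len(values)

def grouped_mean(midpoints, frequencies):
    """Weighted mean: each class midpoint counts as often as its frequency."""
    total = sum(n * x for x, n in zip(midpoints, frequencies))
    return total / sum(frequencies)

print(arithmetic_mean([2, 4, 9]))
# Hypothetical classes 10-20, 20-30, 30-40 (midpoints 15, 25, 35).
print(grouped_mean([15, 25, 35], [3, 5, 4]))
```

The grouped mean is only an approximation of the mean of the raw data, because the midpoint replaces every value in its class.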
Properties (with demonstrations):
1. If the same number is added to every value in a data set, the average increases by this
number.
2. If every value in a data set is multiplied by the same number, the average is multiplied
by this number.
3. The average of the deviations of the values from their average is zero.
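One possible write-up of the three demonstrations is given below. It uses the simple (unweighted) mean of n values, and the symbols a and k for the added and the multiplying constant are introduced here only for illustration.

```latex
\begin{gather*}
\frac{1}{n}\sum_{i=1}^{n}(x_i + a)
  = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{n a}{n}
  = \bar{x} + a \\
\frac{1}{n}\sum_{i=1}^{n} k\,x_i
  = k \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
  = k\,\bar{x} \\
\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})
  = \frac{1}{n}\sum_{i=1}^{n} x_i - \frac{n\,\bar{x}}{n}
  = \bar{x} - \bar{x}
  = 0
\end{gather*}
```

The same steps go through for the weighted mean if every sum is weighted by the frequencies and n is replaced by the sum of the frequencies.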
4.2.2 The Geometric Mean
Sometimes when we are dealing with quantities that change over a period of time, we need to
know an average rate of change, such as an average growth rate over a period of several years. In
such cases, the simple arithmetic mean is inappropriate, because it gives the wrong answers.
What we need to find is the geometric mean:
\[ \bar{x}_g = \sqrt[\sum n_i]{\prod x_i^{n_i}} \]
4.2.3 Harmonic mean
\[ \bar{x}_h = \frac{\sum n_i}{\sum \frac{n_i}{x_i}} \]
4.2.4 Quadratic mean
\[ \bar{x}_q = \sqrt{\frac{\sum n_i x_i^2}{\sum n_i}} \]
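The geometric, harmonic, and quadratic means of sections 4.2.2 to 4.2.4 can be sketched together. The code assumes the weighted forms, with frequencies n_i as weights (all equal to 1 when no weights are given); the function names and the example figures are hypothetical.

```python
import math

def geometric_mean(values, weights=None):
    """Root of order sum(n_i) of the product of x_i ** n_i (computed via logs)."""
    weights = weights or [1] * len(values)
    log_sum = sum(n * math.log(x) for x, n in zip(values, weights))
    return math.exp(log_sum / sum(weights))

def harmonic_mean(values, weights=None):
    """sum(n_i) divided by the sum of n_i / x_i."""
    weights = weights or [1] * len(values)
    return sum(weights) / sum(n / x for x, n in zip(values, weights))

def quadratic_mean(values, weights=None):
    """Square root of the weighted average of the squared values."""
    weights = weights or [1] * len(values)
    return math.sqrt(sum(n * x * x for x, n in zip(values, weights)) / sum(weights))

# Hypothetical yearly growth factors (1.07 means growth of 7 percent).
growth_factors = [1.07, 1.08, 1.10, 1.12, 1.18]
average_factor = geometric_mean(growth_factors)
# Compounding the average factor reproduces the total growth over the period.
print(average_factor ** len(growth_factors), math.prod(growth_factors))

# Average speed over two equal distances driven at 40 and 60 km/h.
print(harmonic_mean([40, 60]))
print(quadratic_mean([1, 7]))
```

For the growth factors, the arithmetic mean would overstate the average growth, because compounding it over the five years gives more than the actual total growth.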
4.3 Select appropriate measure
When we work statistical problems, we must decide whether to use the mean, the median, or the
mode as the measure of central tendency.
When the population is skewed negatively or positively, the median is often the best measure of
location, because it usually lies between the mean and the mode. The median is not as highly
influenced by the frequency of occurrence of a single value as is the mode, nor is it pulled by
extreme values as is the mean.
Otherwise, there are no universal guidelines for applying the mean, median, or mode as the
measure of central tendency for different populations. Each case must be judged independently.
The choice of mean, median, or mode sometimes depends on the common practice in a
particular industry. The average (arithmetic mean) factory wage is often announced and may be
useful for many business planning decisions. But the median price of a new house is a more
useful statistic to people moving to a new neighborhood (it avoids the problems caused by one or
two "palaces" distorting the mean). For automobile designers it makes more sense to think of the
modal family with two children when planning new cars.
5 Dispersion
Dispersion measures the spread (the variability) of a data set.
5.1 The range
The range is the difference between the highest and the lowest observed values. The range is
easy to understand and to find, but its usefulness as a measure of dispersion is limited.
5.2 Interfractile Range
