Introduction
Any science needs precision for its development. For precision, facts, observations or measurements have to be expressed in figures. It has been said: "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind." (Lord Kelvin)
Similarly, in medicine, be it diagnosis, treatment or research, everything depends on measurement. E.g. you have to count the number of missing teeth, or measure the vertical dimension and express it as a number, so that it makes sense.
A statistic or datum means a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby. Statistics or data is the plural of the same. Statistics is the science of figures. Biostatistics is the term used when the tools of statistics are applied to data derived from the biological sciences, such as medicine.
In pharmacology
- To find the action of drugs
- To compare the action of two drugs, or of two successive dosages of the same drug
- To find the relative potency of a new drug with respect to a standard drug
In medicine
- To compare the efficacy of a particular drug, operation or line of treatment
- To find the association between two attributes, such as cancer and smoking
- To identify the signs and symptoms of a disease
In research
- It helps in the compilation of data, drawing conclusions and making recommendations.
For students
- By learning the methods of biostatistics, a student learns to evaluate articles published in medical and dental journals, or papers read at medical and dental conferences.
- He also understands the basic methods of observation in his clinical practice and research.
Variable
- A characteristic which takes different values for different persons, places or things, such as height, weight or blood pressure.
Population
- A population includes all the persons, events and objects under study. It may be finite or infinite.
Sample
- Defined as a part of a population, generally selected so as to be representative of the population whose variables are under study.
Parameter
- It is a constant that describes a population, e.g. in a college 40% of the students are girls. This describes the population, hence it is a parameter.
Statistic
- A statistic is a constant that describes a sample, e.g. in a sample of 200 students from the same college, 45% are girls. This 45% is a statistic, as it describes the sample.
Attribute
- A characteristic on the basis of which the population can be divided into categories or classes, e.g. gender, caste, religion.
Sources of data
The main sources for the collection of data are:
- Experiments
- Surveys
- Records
Experiments
- Experiments are performed by one or more workers to collect data for investigations and research.
Surveys
- Surveys are carried out in the field by trained teams for epidemiological studies, to find the incidence or prevalence of health or disease in a community.
Records
- Records are maintained as a routine in registers and books over a long period of time and provide ready-made data.
Types of data
Data is of two types.
Data presentation
Statistical data, once collected, should be systematically arranged and presented:
- To arouse the interest of readers
- For data reduction
- To bring out important points clearly and strikingly
- For easy grasp and meaningful conclusions
- To facilitate further analysis
- To facilitate communication
The two main types of data presentation are:
- Tabulation
- Graphic representation with charts and diagrams
Tabulation
It is the most common method. Data are presented in the form of columns and rows. Tables can be of the following types:
- Simple tables
- Frequency distribution tables
Simple table
Number of patients at KIDS, Bgm
Month       Jan 06    Feb 06    March 06
Patients    2,800     1,900     1,750
The charts and diagrams used for graphic representation are:
- Bar chart
- Histogram
- Frequency polygon
- Frequency curve
- Line diagram
- Cumulative frequency diagram or ogive
- Scatter diagram
- Pie chart
- Pictogram
- Spot map or map diagram
Bar chart
The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable. A suitable scale is chosen and the bars are usually equally spaced. Bar charts are of three types:
- Simple bar chart
- Multiple bar chart: two or more variables are grouped together
- Component bar chart: each bar is divided into parts, each part representing a certain item and proportional to the magnitude of that item
[Figure: simple bar chart of the number of CD patients, 1st to 4th Qtr]
[Figure: bar chart by quarter, 1st to 4th Qtr]
[Figure: component bar chart by quarter (1st to 4th Qtr) showing patients to prostho and patients to other departments]
Histogram
A pictorial presentation of a frequency distribution. It consists of a series of rectangles: the class intervals are given on the horizontal axis and the frequencies on the vertical axis, and the area of each rectangle is proportional to the frequency.
[Figure: histogram of the number of various lesions, class intervals 0 to 3 through 24 to 27]
Frequency polygon
It is obtained by joining the midpoints of the tops of the histogram blocks (at the height of the frequency) by straight lines, usually forming a polygon.
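The construction can be sketched in Python, assuming hypothetical class intervals and frequencies (not data from this document): each vertex of the polygon sits at the midpoint of a class interval, at the height of that class's frequency.

```python
def polygon_points(classes, freqs):
    """Return (midpoint, frequency) vertices; joining them by straight lines gives the frequency polygon."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]

# Hypothetical class intervals (lower, upper) and frequencies, for illustration only
classes = [(0, 3), (3, 6), (6, 9), (9, 12)]
freqs = [4, 10, 7, 3]

points = polygon_points(classes, freqs)
print(points)  # [(1.5, 4), (4.5, 10), (7.5, 7), (10.5, 3)]
```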
Frequency curve
When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as the frequency curve.
Line diagram
Line diagrams are used to show the trends of events with the passage of time.
[Figure: line diagram of patients with periodontitis over time]
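Cumulative frequency diagram or ogive
A graphical representation of cumulative frequency. The cumulative frequency of a class is obtained by adding its frequency to the frequencies of all the previous classes.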
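A short Python sketch of the running total behind an ogive, again with hypothetical classes and frequencies: each cumulative frequency is plotted against the upper limit of its class.

```python
from itertools import accumulate

# Hypothetical upper class limits and class frequencies, for illustration only
upper_limits = [3, 6, 9, 12]
freqs = [4, 10, 7, 3]

# Cumulative frequency = frequency of the class + frequencies of all previous classes
cum_freqs = list(accumulate(freqs))

# Ogive points: (upper class limit, cumulative frequency)
ogive = list(zip(upper_limits, cum_freqs))
print(ogive)  # [(3, 4), (6, 14), (9, 21), (12, 24)]
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.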
Pie chart
In this, the frequencies of the groups are shown as segments of a circle. The degree of the angle denotes the frequency. The angle is calculated by:
$$\text{Angle} = \frac{\text{class frequency}}{\text{total observations}} \times 360^\circ$$
[Figure: pie chart of patients by department: Prostho, Conso, Perio, Ortho, Pedo]
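The pie-chart angle formula can be checked with a few lines of Python. The group names and counts below are hypothetical; the multiplication by 360 is done before the division so the angles come out exact for these counts.

```python
def pie_angles(freqs):
    """Angle of each segment = class frequency / total observations x 360 degrees."""
    total = sum(freqs.values())
    return {label: f * 360 / total for label, f in freqs.items()}

counts = {"Group A": 30, "Group B": 60, "Group C": 90}  # hypothetical counts
angles = pie_angles(counts)
print(angles)  # {'Group A': 60.0, 'Group B': 120.0, 'Group C': 180.0}
```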
Pictogram
A popular method of presenting data to the common man.
Measures of central tendency
The average value in a distribution is the one central value around which all the other observations are concentrated. The average value helps:
- to find the most characteristic value of a set of measurements
- to find which group is better off, by comparing the average of one group with that of the other
The most commonly used averages are:
- mean
- median
- mode
Mean
Refers to the arithmetic mean. It is the sum of all the observations divided by the total number of observations (n). It is denoted by $\bar{x}$ for a sample and $\mu$ for a population.
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Advantage: it is easy to calculate.
Disadvantage: it is influenced by extreme values.
Median
When all the observations are arranged in either ascending or descending order, the middle observation is known as the median. In the case of an even number of observations, the average of the two middle values is taken. The median is a better indicator of the central value than the mean when extreme values are present, as it is not affected by them.
Mode
The most frequently occurring observation in a set of data is called the mode. It is not often used in medical statistics.
Example: number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mean = 35/10 = 3.5
Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3)/2 = 2.5
Mode = 2 (occurs 3 times)
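The three averages for the decayed-teeth example can be reproduced with Python's standard `statistics` module; this is only a cross-check of the hand calculation above.

```python
from statistics import mean, median, mode

# Number of decayed teeth in 10 children (example above)
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

print(mean(decayed_teeth))    # 3.5
print(median(decayed_teeth))  # 2.5  (average of the two middle values, 2 and 3)
print(mode(decayed_teeth))    # 2    (occurs 3 times)
```

Note how the single extreme value of 10 pulls the mean (3.5) above the median (2.5), which is why the median is preferred when such values are present.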
Types of variability
There are three types of variability:
- Biological variability
- Real variability
- Experimental variability
Experimental variability is of three subtypes:
- Observer error
- Instrumental error
- Sampling error
Biological variability
It is the natural difference which occurs between individuals due to age, gender and other inherent attributes. This difference is small, occurs by chance and is within certain accepted biological limits.
Real variability
Such variability is more than the normal biological limits. The cause of the difference is not inherent or natural but is due to some external factor, e.g. the difference in the incidence of cancer between smokers and non-smokers may be due to excessive smoking and not due to chance only.
Experimental variability
It occurs due to the experimental study. It is of three types:
- Observer error: the investigator may alter some information or not record the measurement correctly
- Instrumental error: this is due to defects in the measuring instrument. Both observer and instrumental errors are called non-sampling errors
- Sampling error or error of bias: this is the error which occurs when the samples are not chosen at random from the population, so that the sample does not truly represent the population
Measures of variation
Even careful measurements vary, e.g. the BP of an individual can show variation even if taken by a standardized method and measured by the same person. Thus one should know what the normal variation is and how to measure it. The various measures of variation or dispersion are:
- Range
- Mean or average deviation
- Standard deviation
- Coefficient of variation
Range
It is the simplest measure of variation, defined as the difference between the highest and the lowest figures in a sample. It defines the normal limits of a biological characteristic, e.g. freeway space ranges between 2 and 4 mm. It is not satisfactory, as it is based on two extreme values only.
Mean deviation
It is the average of the differences (deviations) of the observations from the mean in a distribution, ignoring the + or − sign. It is denoted by MD.
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
where x = observation, $\bar{x}$ = mean and n = number of observations.
Standard deviation
Also called the root mean square deviation, it is an improvement over the mean deviation and is the measure most commonly used in statistical analysis. It is denoted by SD or s for a sample and σ for a population.
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
(n − 1 is used in place of n when the SD is calculated from a sample.)
The greater the standard deviation, the greater the dispersion from the mean. A small standard deviation means a high degree of uniformity of the observations. Usually, measurements beyond the range of ±2 SD are considered rare or unusual in any distribution.
The standard deviation summarizes the deviation of a large distribution from its mean. It helps in finding a suitable sample size, e.g. a greater deviation indicates the need for a larger sample to draw meaningful conclusions. It also helps in the calculation of the standard error, which helps us determine whether the difference between two samples is by chance or real.
Coefficient of variation
It is used to compare attributes having different units of measurement, e.g. height and weight. It is denoted by CV and is expressed as a percentage.
$$CV = \frac{SD}{\text{Mean}} \times 100$$
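The four measures of dispersion can be computed together in a short Python sketch. It reuses the decayed-teeth data from the mode example purely for illustration, and assumes the sample SD (divisor n − 1), which is what `statistics.stdev` returns.

```python
from statistics import mean, stdev

def dispersion(data):
    """Return range, mean deviation, SD (divisor n - 1) and coefficient of variation (%)."""
    m = mean(data)
    value_range = max(data) - min(data)
    md = sum(abs(x - m) for x in data) / len(data)  # signs ignored
    sd = stdev(data)                                # sample SD
    cv = sd / m * 100                               # expressed as a percentage
    return value_range, md, sd, cv

decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
value_range, md, sd, cv = dispersion(decayed_teeth)
print(value_range, md, round(sd, 2), round(cv, 1))  # 10 2.3 3.14 89.6
```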
These limits on either side of the measurement are called confidence limits. The look of the frequency distribution curve may vary depending on the mean and SD, so it becomes necessary to standardize it, e.g. one study has an SD of 3 and another an SD of 2, which makes them difficult to compare. Thus the normal curve is standardized by using the standard deviation as the unit to place any measurement with reference to the mean. The curve that emerges through this procedure is called the standard normal curve.
Sampling
It is not possible to include each and every member of a population, as it would be time-consuming, costly and laborious; therefore sampling is done. Sampling is a process by which some units of a population or universe are selected for study and, by subjecting them to statistical computation, conclusions are drawn about the population from which these units were drawn. The sample will be representative of the entire population only if:
- it is sufficiently large
- it is unbiased
Such a sample will have its statistics almost equal to the parameters of the entire population. The two main characteristics of a representative sample are:
- Precision
- Unbiased character
Precision
Precision depends on the sample size. Ordinarily, the sample size should not be less than 30.
$$\text{Precision} = \frac{\sqrt{n}}{s}$$
where n = sample size and s = standard deviation. Precision is directly proportional to the square root of the sample size: the greater the sample size, the greater the precision. Also, the greater the SD, the lower the precision; in such cases the sample size needs to be increased to obtain the required precision.
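A tiny Python illustration of the square-root relationship, assuming a hypothetical SD of 10 units: quadrupling the sample size only doubles the precision, and doubling the SD halves it.

```python
import math

def precision(n, s):
    """Precision = sqrt(n) / s, with n = sample size and s = standard deviation."""
    return math.sqrt(n) / s

print(precision(100, 10))  # 1.0
print(precision(400, 10))  # 2.0  (4 x the sample size, 2 x the precision)
print(precision(100, 20))  # 0.5  (2 x the SD, half the precision)
```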
Unbiased character
The sample should be unbiased, i.e. every individual should have an equal chance of being selected in the sample. Thus a standard random sampling method should be used. Non-sampling errors can be taken care of by:
- using standardized instruments and criteria
- single, double or triple blind trials
- use of a control group
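Simple random sampling, where every individual has an equal chance of selection, can be sketched with Python's `random` module. The population of 500 numbered individuals is hypothetical, and the fixed seed is only there so the same draw can be reproduced.

```python
import random

population = list(range(1, 501))      # hypothetical ID numbers of 500 individuals
rng = random.Random(42)               # fixed seed so the draw is reproducible
sample = rng.sample(population, 50)   # simple random sample, without replacement

print(sorted(sample)[:5])
```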
Sample size
The investigator needs to decide how large an error due to sampling defect is allowable, i.e. the allowable error L. The investigator should either start with an assumed SD or do a pilot study to estimate the SD.
$$n = \frac{4\,SD^2}{L^2}$$
E.g. the mean pulse rate of a population is 70 beats per min with a standard deviation of 8 beats. What will be the sample size if the allowable error is ±1?
$$n = \frac{4 \times 8 \times 8}{1 \times 1} = 256$$
The smaller L is, the larger n will be, i.e. the larger the sample size, the smaller the error.
E.g. the incidence rate in the last influenza epidemic was found to be 5% of the population exposed. What should be the size of the sample to find the incidence rate in the current epidemic if the allowable error is 10%?
p = 5%, q = 95%, L = 10% of p = 0.5%
$$n = \frac{4pq}{L^2} = \frac{4 \times 5 \times 95}{0.5 \times 0.5} = 7600$$
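Both sample-size formulas can be wrapped in small Python functions and checked against the two worked examples. Rounding up with `math.ceil` is an added assumption (a fraction of a person cannot be sampled); it does not change either example.

```python
import math

def n_for_mean(sd, allowable_error):
    """n = 4 SD^2 / L^2, rounded up to a whole individual."""
    return math.ceil(4 * sd**2 / allowable_error**2)

def n_for_proportion(p, q, allowable_error):
    """n = 4 p q / L^2 (p, q and L all in the same units, here %), rounded up."""
    return math.ceil(4 * p * q / allowable_error**2)

print(n_for_mean(8, 1))              # 256  (pulse-rate example)
print(n_for_proportion(5, 95, 0.5))  # 7600 (influenza example)
```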
Probability or p value
The concept of probability is very important in statistics. Probability is the chance of occurrence of any event or permutation combination. It is denoted by p for a sample and P for a population. In various tests of significance we are often interested to know whether the observed difference between 2 samples is by chance, i.e. due to sampling variation. Here the probability or p value is used. P ranges from 0 to 1:
- 0 = there is no chance that the observed difference could be due to sampling variation
- 1 = it is absolutely certain that the observed difference between 2 samples is due to sampling variation
However, such extreme values are rare. P = 0.4 means that the chance that the difference is due to sampling variation is 4 in 10; obviously the chance that it is not due to sampling variation will be 6 in 10.
The essence of any test of significance is to find the p value and draw an inference. If the p value is 0.05 or more, it is customary to accept that the difference is due to chance (sampling variation); the observed difference is said to be statistically not significant. If the p value is less than 0.05, the observed difference is taken to be not due to chance but due to the role of some external factor; the observed difference is said to be statistically significant.
The p value can be obtained:
- From the shape of the normal curve: we know that about 95% of observations lie within mean ± 2SD. Thus the probability of a value more or less than this range is 5%.
- From probability tables: the p value is determined from probability tables in the case of Student's t test or the chi-square test.
- By the area under the normal curve: here z, the standard normal deviate, is calculated. Corresponding to the z value, the area under the curve between the mean and z is determined (A). The probability is given by $p = 2(0.5 - A)$.
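The "area under the normal curve" route can be sketched in Python without tables, using `math.erf`: the area A between the mean and z of the standard normal curve is 0.5 × erf(|z|/√2), and p = 2(0.5 − A) is the two-tailed probability. The observation of 86 beats per min in the usage line is hypothetical.

```python
import math

def z_score(x, mean, sd):
    """Standard normal deviate: distance from the mean in units of SD."""
    return (x - mean) / sd

def area_mean_to_z(z):
    """Area A under the standard normal curve between the mean (0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def two_tailed_p(z):
    """p = 2(0.5 - A)."""
    return 2 * (0.5 - area_mean_to_z(z))

print(round(two_tailed_p(1.96), 3))  # 0.05
z = z_score(86, 70, 8)               # hypothetical pulse of 86, mean 70, SD 8
print(z)                             # 2.0
```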
Tests of significance
Whatever the sampling procedure or the care taken while selecting the sample, the sample statistics will differ from the population parameters. Variations between 2 samples drawn from the same population may also occur, i.e. differences may be observed between the results of two research workers for the same investigation. Thus it becomes important to find out the significance of this observed variation, i.e. whether it is due to chance or biological variation (statistically not significant) or due to the influence of some external factor (statistically significant). To test whether the observed variation is significant, various tests of significance are done. Tests of significance can be broadly classified as:
1. Parametric tests
2. Non-parametric tests
Parametric tests
Parametric tests are those in which certain assumptions are made about the population:
- The population from which the sample is drawn has a normal distribution
- The variances of the samples do not differ significantly
- The observations are truly numerical, so arithmetic procedures such as addition, division and multiplication can be used
Since these tests make assumptions about the population parameters, they are called parametric tests. They are usually used to test a difference. They are:
- Student's t test (paired or unpaired)
- ANOVA
- Test of significance between two means
Tests of significance can also be divided into one-tailed and two-tailed tests.
Null hypothesis
It is a hypothesis of no difference between the statistic of a sample and the parameter of the population, or between the statistics of two samples. It nullifies the claim that the experimental result is different from or better than the one observed already.
If p > 0.05, the difference is taken to be due to chance and is not statistically significant; but if p < 0.05, the difference is attributed to some external factor and is statistically significant.
Types of error
While drawing conclusions in a study, we are likely to commit two types of error:
- Type I error
- Type II error
Type I error
This type of error occurs when we conclude that the difference is significant when in fact there is no real difference in the population, i.e. we reject the null hypothesis when it is true. It is denoted by α.
Type II error
This type of error occurs when we say that the difference is not significant when in fact there is a real difference between the populations, i.e. the null hypothesis is not rejected when it is actually false. It is denoted by β.
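A small simulation makes the Type I error rate concrete. It is a sketch under assumed conditions: two samples of 30 are drawn repeatedly from the same hypothetical normal population (mean 0, known SD 1), so the null hypothesis is true every time, and a two-tailed Z test at the 5% level (|Z| > 1.96) is applied. Every rejection is therefore a Type I error, and the rejection rate should come out close to α = 0.05.

```python
import math
import random

def simulate_type_i_error(trials=2000, n=30, z_crit=1.96, seed=1):
    """Fraction of trials in which a true null hypothesis is wrongly rejected."""
    rng = random.Random(seed)
    se = math.sqrt(1 / n + 1 / n)  # SE of difference between 2 means, known SD = 1
    rejections = 0
    for _ in range(trials):
        # Both samples come from the SAME population, so H0 is true
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(a) / n - sum(b) / n) / se
        if abs(z) > z_crit:
            rejections += 1  # a Type I error
    return rejections / trials

rate = simulate_type_i_error()
print(rate)
```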
Z test
These tests are used for sample sizes greater than 30. The test used is the Z test. Z is the standard normal deviate and has been discussed under normal distribution:
$$Z = \frac{\text{observation} - \text{mean}}{SD}$$
However, in the Z test the standard deviation is replaced by the standard error:
$$Z = \frac{\text{observed difference}}{\text{standard error}}$$
We know that the standard deviation measures the variation within a sample. The standard error is the measure of the difference in values occurring:
- between a sample and the population
- between two samples of the same population
The standard error used in the Z test can be the:
- Standard error of mean
- Standard error of proportion
- Standard error of difference between 2 means
- Standard error of difference between 2 proportions
If Z > 2, i.e. if the observed difference between the 2 means or proportions is greater than 2 times the standard error of the difference, then p < 0.05, according to the following table:
Z      p
1.6    0.1
2.0    0.05
2.3    0.02
2.6    0.01
Thus the difference is not due to chance and may be due to the influence of some external factor, i.e. the difference is statistically significant.
Standard error of difference between two means
It helps to know the significance of the difference obtained by 2 research workers for the same investigation.
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$$
E.g. find the significance of the difference in the mean heights of 50 girls and 50 boys with the following values:
          Girls    Boys
Mean      147.4    151.6
SD        6.6      6.3
$$SE = \sqrt{\frac{6.6^2}{50} + \frac{6.3^2}{50}} = 1.29$$
$$Z = \frac{\text{observed difference}}{SE} = \frac{151.6 - 147.4}{1.29} = 3.26$$
Since the Z value is more than 2, p will be less than 0.05. Thus the difference is statistically significant and it can be concluded that boys are taller than girls.
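The heights example can be re-run in Python as a check. The two-tailed p value via `math.erfc` is an addition for illustration; the source reads significance from the "Z > 2" rule instead.

```python
import math

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z test for the difference between 2 means (large samples)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # SE of difference between 2 means
    z = (mean2 - mean1) / se
    p = math.erfc(abs(z) / math.sqrt(2))       # two-tailed p from the normal curve
    return se, z, p

# Girls: mean 147.4, SD 6.6, n 50; boys: mean 151.6, SD 6.3, n 50
se, z, p = z_two_means(147.4, 6.6, 50, 151.6, 6.3, 50)
print(round(se, 2), round(p, 4))
```

Carried without rounding, z is about 3.25; the hand calculation gets 3.26 because it divides by the rounded SE of 1.29. Either way Z > 2 and p < 0.05.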
Standard error of proportion
The standard error of proportion is the unit which measures the variation in the proportion of a character from sample to population.
$$SE_p = \sqrt{\frac{pq}{n}}$$
where p = proportion of positive character, q = proportion of negative character and n = sample size. Also, proportion of population = proportion of sample ± 2 SE. Thus one can determine whether the proportion in the sample is within the limits of the population proportion.
E.g. the proportion of blood group B among Indians is 30%. If in a sample of 100 individuals it is 25%, what is your conclusion?
$$SE_p = \sqrt{\frac{25 \times 75}{100}} = 4.33$$
$$Z = \frac{\text{observed difference}}{SE_p} = \frac{30 - 25}{4.33} = 1.15$$
Since Z < 2, p will be more than 0.05; thus the difference is not significant.
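A Python sketch of the blood group B example. Following the worked example, it plugs the sample proportion (25%) into √(pq/n); that choice is taken from the source, not a general rule.

```python
import math

def z_proportion(population_pct, sample_pct, n):
    """Z for a sample proportion vs a population proportion, in % units."""
    q = 100 - sample_pct
    se = math.sqrt(sample_pct * q / n)  # SE of proportion, sample p as in the example
    z = abs(population_pct - sample_pct) / se
    return se, z

se, z = z_proportion(30, 25, 100)
print(round(se, 2), round(z, 2))  # 4.33 1.15
```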
Unpaired t test
It is applied to unpaired data of observations made on individuals of 2 separate groups, to find the significance of the difference between 2 means when the sample size is less than 30, e.g. the difference in the accuracy of impressions made with two different impression materials.
The steps in the unpaired t test are:
1. Calculate the mean of the two samples.
2. Calculate the combined standard deviation.
3. Calculate the standard error of the difference between means: $SE = SD\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
4. Calculate the observed difference between the means, $\bar{x}_1 - \bar{x}_2$.
5. Calculate t = observed difference / standard error.
6. Determine the degrees of freedom, which is one less than the number of observations in a sample (n − 1). Here the combined degrees of freedom = (n1 − 1) + (n2 − 1).
7. Refer to the t table and find the probability of the t value corresponding to the degrees of freedom. p < 0.05 states that the difference is significant; p > 0.05 states that it is not significant.
E.g. in a nutritional study, 13 children in group A are given the usual diet along with vitamins A and D, while 12 children in group B take the usual diet. The gain in weight in pounds for both groups after 12 months is shown in the table. Are vitamins A and D responsible for the gain in weight?
Gain in weight (pounds) after 12 months
Group A (n = 13): 5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3
Group B (n = 12): 1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3
Mean of group A = 4; mean of group B = 2.5
Combined SD = 1.37
SE = 0.548
$$t = \frac{\text{observed difference}}{SE} = \frac{4 - 2.5}{0.548} = 2.74$$
Combined degrees of freedom = n1 + n2 − 2 = 13 + 12 − 2 = 23
The p value corresponding to t = 2.74 at 23 d.f. is read from the t table; it is < 0.02. Thus the difference is statistically significant and can be attributed to the role of vitamins A and D.
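The seven steps above can be sketched in Python for the vitamin example. The combined SD is computed here as the pooled SD, √(Σ squared deviations of both groups / (n1 + n2 − 2)), which reproduces the 1.37 used above; the p value is still read from a t table.

```python
import math
from statistics import mean

def unpaired_t(a, b):
    """Unpaired t test with a pooled (combined) SD; returns the intermediate steps."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)                       # step 1
    df = n1 + n2 - 2                                # step 6
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    sd = math.sqrt(ss / df)                         # step 2: combined SD
    se = sd * math.sqrt(1 / n1 + 1 / n2)            # step 3
    t = (m1 - m2) / se                              # steps 4 and 5
    return m1, m2, sd, se, t, df

group_a = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]   # vitamins A and D
group_b = [1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3]      # usual diet
m1, m2, sd, se, t, df = unpaired_t(group_a, group_b)
print(round(sd, 2), round(t, 2), df)  # 1.37 2.74 23
```

Unrounded, SE is about 0.547; the hand calculation's 0.548 comes from multiplying the rounded SD of 1.37.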
Paired t test
It is applied to paired data of observations from one sample only, when the sample size is less than 30. Each individual gives a pair of observations, e.g. observations before and after taking a drug.
The steps involved are:
1. Calculate the difference in each paired observation, i.e. before and after: y = x1 − x2.
2. Calculate the mean of these differences, $\bar{y}$.
3. Calculate the SD of the differences.
4. Calculate SE = SD/√n.
5. Determine t = $\bar{y}$/SE.
6. Determine the degrees of freedom. Since there is one sample, d.f. = n − 1.
7. Refer to the t table and find the probability of the t value corresponding to the degrees of freedom. p < 0.05 states that the difference is significant; p > 0.05 states that it is not significant.
E.g. the systolic BP of normal individuals before and after injection of a hypotensive drug is given in the table. Does the drug lower the BP?
BP before giving drug (x1):   122, 121, 120, 115, 126, 130, 120, 125, 128
BP after giving drug (x2):    120, 118, 115, 110, 122, 130, 116, 124, 125
Difference (y = x1 − x2):     2, 3, 5, 5, 4, 0, 4, 1, 3
Mean of difference: $\bar{y} = \frac{\sum y}{n} = \frac{27}{9} = 3$
$SD = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = 1.73$
$SE = \frac{SD}{\sqrt{n}} = \frac{1.73}{\sqrt{9}} = 0.58$
$t = \frac{\bar{y}}{SE} = \frac{3}{0.58} = 5.17$
Degrees of freedom = n − 1 = 9 − 1 = 8
The p value corresponding to t = 5.17 at 8 d.f. is < 0.001; thus the result is highly significant, and the decrease in BP can be attributed to the drug.
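A Python sketch of the paired t test steps, applied to the BP data above (SD of differences with divisor n − 1, as in the worked example).

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t test: works on the differences y = x1 - x2 of each pair."""
    diffs = [x1 - x2 for x1, x2 in zip(before, after)]  # step 1
    n = len(diffs)
    y_bar = mean(diffs)                                  # step 2
    sd = stdev(diffs)                                    # step 3 (divisor n - 1)
    se = sd / math.sqrt(n)                               # step 4
    return y_bar, sd, se, y_bar / se, n - 1              # steps 5 and 6

before = [122, 121, 120, 115, 126, 130, 120, 125, 128]
after = [120, 118, 115, 110, 122, 130, 116, 124, 125]
y_bar, sd, se, t, df = paired_t(before, after)
print(round(t, 2), df)  # 5.2 8
```

Without intermediate rounding t is about 5.20; the hand calculation gets 5.17 because it divides by SE rounded to 0.58. The conclusion (p < 0.001 at 8 d.f.) is the same.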
Chi-square test
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where χ² denotes chi-square, O = observed value and E = expected value.
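E.g. of 90 persons given vaccine A, 22 were attacked and 68 were not; of 86 persons given vaccine B, 14 were attacked and 72 were not. Vaccine B seems to be superior to vaccine A. We perform the chi-square test to verify whether vaccine B is really superior to vaccine A or whether the difference is merely due to chance.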
Find the total attack and non-attack rates:
Total attack rate = 36/176 = 0.204
Total non-attack rate = 140/176 = 0.795
Vaccine A (n = 90)
- Attacked: O = 22, E = 0.204 × 90 = 18.36, O − E = +3.64
- Not attacked: O = 68, E = 0.795 × 90 = 71.55, O − E = −3.55
Vaccine B (n = 86)
- Attacked: O = 14, E = 0.204 × 86 = 17.54, O − E = −3.54
- Not attacked: O = 72, E = 0.795 × 86 = 68.37, O − E = +3.63
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{3.64^2}{18.36} + \frac{3.55^2}{71.55} + \frac{3.54^2}{17.54} + \frac{3.63^2}{68.37} = 0.72 + 0.18 + 0.71 + 0.19 = 1.80$$
Find the degrees of freedom = (c − 1)(r − 1), where c = number of columns and r = number of rows: d.f. = (2 − 1)(2 − 1) = 1.
Find the p value: referring to the chi-square table with one degree of freedom, the p value is more than 0.05.
Hence the difference is not statistically significant, and the null hypothesis of no difference between the vaccines is accepted.
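The vaccine example can be cross-checked in Python. This sketch computes each expected value as row total × column total / grand total (equivalent to the overall rate × group size used above) and applies no continuity correction, matching the hand calculation. The p value formula `erfc(sqrt(chi2 / 2))` is valid for 1 degree of freedom only.

```python
import math

def chi_square(table):
    """Chi-square statistic and d.f. for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

# Rows: vaccine A, vaccine B; columns: attacked, not attacked
table = [[22, 68], [14, 72]]
chi2, df = chi_square(table)
p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p, valid for d.f. = 1 only
print(round(chi2, 2), df, round(p, 2))  # 1.8 1 0.18
```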
ANOVA
Analysis of variance. Investigations may not always be confined to the comparison of 2 samples only, e.g. we might like to compare the vertical dimension obtained using 3 or more methods, such as phonetics, swallowing and Niswonger's method. In such cases, where more than 2 samples are compared, ANOVA can be used. ANOVA can also be used when measurements are influenced by several factors, e.g. the factors affecting the retention of a denture; it helps to decide which factors are more important.
Requirements:
- Data for each group are assumed to be independent and normally distributed
- Sampling should be at random
Types:
- One-way ANOVA: only one factor affects the result in the groups compared
- Two-way ANOVA: 2 factors affect the result or outcome
- Multi-way ANOVA
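A minimal one-way ANOVA in Python, as a sketch under assumptions: the three groups are hypothetical numbers (not measurements from this document), and F is computed as the variance ratio described under the F test, i.e. the between-samples mean square divided by the within-samples mean square, each mean square being a sum of squares divided by its degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return F = MS between / MS within and its (between, within) degrees of freedom."""
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    k, n = len(groups), len(all_obs)
    # Spread of the group means around the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical readings from 3 methods, for illustration only
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_value, dfs = one_way_anova_f(groups)
print(f_value, dfs)  # 27.0 (2, 6)
```

The observed F would then be compared with the tabulated F at (2, 6) degrees of freedom.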
F test
$$F = \frac{\text{Mean square between samples}}{\text{Mean square within samples}}$$
F is also called the variance ratio. The mean squares are obtained from the analysis of variance table once the sums of squares and degrees of freedom have been calculated (mean square = sum of squares / degrees of freedom).
- Mean square between samples: it reflects the differences between the means of the groups involved in the study (A, B, C etc.) and the overall mean.
- Mean square within samples: it reflects the variation of the observations within each sample around that sample's own mean.
The larger the mean square between samples relative to the mean square within samples, the greater the difference between the samples.
The F value observed from the study is compared with the theoretical F value obtained from the tables at the 1% and 5% levels of significance, and the results are then interpreted:
- If the observed value is more than the theoretical value at 1%, the relation is highly significant.
- If the observed value is less than the theoretical value at 5%, it is not significant.
- If the observed value lies between the theoretical values at 1% and 5%, it is statistically significant.
Presented by Dr Shilpi Gilra