Biostatistics (DR Shilpi Gilra)

Bio Statistics
Introduction
Any science needs precision for its development. For precision, facts, observations or measurements have to be expressed in figures.
"When you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot express it in numbers your knowledge is of a meagre and unsatisfactory kind." - Lord Kelvin
Similarly in medicine, be it diagnosis, treatment or research, everything depends on measurement. E.g. you have to count the number of missing teeth or measure the vertical dimension and express it as a number so that it makes sense.
Statistic or datum means a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby. Statistics or data is the plural of the same. Statistics is the science of figures. Biostatistics is the term used when the tools of statistics are applied to data derived from biological sciences such as medicine.
Applications and uses of bio statistics as a science

In physiology and anatomy
- To define the limits of normality for variables such as height, weight or blood pressure in a population.
- Variation beyond the natural limits may be pathological, i.e. abnormal due to the play of certain external factors.
- To find the correlation between two variables such as height and weight.
In pharmacology
- To find the action of drugs
- To compare the action of two drugs or two successive dosages of the same drug
- To find the relative potency of a new drug with respect to a standard drug
In medicine
- To compare the efficacy of a particular drug, operation or line of treatment
- To find association between two attributes such as cancer and smoking
- To identify signs and symptoms of disease
In community medicine and public health

- To test the usefulness of sera or vaccines in the field
- In epidemiologic studies the role of causative factors is statistically tested
In research
- It helps in compilation of data, drawing conclusions and making recommendations.
For students
- By learning the methods of biostatistics a student learns to evaluate articles published in medical and dental journals or papers read at medical and dental conferences.
- He also understands the basic methods of observation in his clinical practice and research.
Common Statistical Terms

Constant
- Quantities that do not vary, e.g. in biostatistics the mean and standard deviation are considered constant for a population
Variable
- A characteristic which takes different values for different persons, places or things, such as height, weight, blood pressure
Population
- Population includes all persons, events and objects under study. It may be finite or infinite.
Sample
- Defined as a part of a population, generally selected so as to be representative of the population whose variables are under study
Parameter
- It is a constant that describes a population, e.g. in a college 40% of the students are girls. This describes the population, hence it is a parameter.
Statistic
- A statistic is a constant that describes the sample, e.g. out of 200 students of the same college 45% are girls. This 45% is a statistic as it describes the sample
Attribute
- A characteristic based on which the population can be divided into categories or classes, e.g. gender, caste, religion.
Source of data
The main sources for collection of data are
- Experiments
- Surveys
- Records
Experiments
- Experiments are performed to collect data for investigations and research by one or more workers.
Surveys
- Carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.
Records
- Records are maintained as a routine in registers and books over a long period of time and provide ready-made data.
Types of data
Data is of two types
Qualitative or discrete data

In such data there is no notion of magnitude or size of an attribute, as the attribute itself cannot be measured. The number of persons having the same attribute varies and is counted, e.g. out of 100 people 75 have class I occlusion, 15 have class II occlusion and 10 have class III occlusion. Class I, II and III are attributes which cannot be measured in figures; only the number of people having them can be determined.
Quantitative or continuous data

In this the attribute has a magnitude. Both the attribute and the number of persons having the attribute vary, e.g. freeway space. It varies for every patient. It is a quantity with a different value for each individual and is measurable. It is continuous as it can take any value between 2 and 4, e.g. 2.10, 2.55 or 3.07.
Data presentation
Statistical data once collected should be systematically arranged and presented
- To arouse the interest of readers
- For data reduction
- To bring out important points clearly and strikingly
- For easy grasp and meaningful conclusions
- To facilitate further analysis
- To facilitate communication
Two main types of data presentation are
- Tabulation
- Graphic representation with charts and diagrams
Tabulation
It is the most common method. Data is presented in the form of columns and rows. It can be of the following types
- Simple tables
- Frequency distribution tables
Simple Table
Number of patients at KIDS, Bgm
Jan 06      Feb 06      March 06
2,800       1,900       1,750
Frequency distribution table

In a frequency distribution table, the data is first split into convenient groups (class intervals) and the number of items (frequency) occurring in each group is shown in the adjacent column.
Number of Cavities      Number of Patients
0 to 3                  78
3 to 6                  67
6 to 9                  32
9 and above             16
Charts and diagrams

A useful method of presenting statistical data, with a powerful impact on the imagination of people. They are
- Bar chart
- Histogram
- Frequency polygon
- Frequency curve
- Line diagram
- Cumulative frequency diagram or ogive
- Scatter diagram
- Pie chart
- Pictogram
- Spot map or map diagram
Bar chart
The length of the bars, drawn vertical or horizontal, is proportional to the frequency of the variable. A suitable scale is chosen and the bars are usually equally spaced. They are of three types
- Simple bar chart
- Multiple bar chart: two or more variables are grouped together
- Component bar chart: bars are divided into parts, each part representing a certain item and proportional to the magnitude of that item
[Figure: Simple Bar Chart. Number of CD patients per quarter, 1st Qtr to 4th Qtr]
[Figure: Multiple Bar Chart. CD, RPD and FPD patients per quarter, 1st Qtr to 4th Qtr]
[Figure: Component Bar Chart. Patients to prostho and patients to other departments per quarter, 1st Qtr to 4th Qtr]
Histogram
A pictorial presentation of a frequency distribution. It consists of a series of rectangles, with the class intervals given on the horizontal axis and the frequencies on the vertical axis. The area of each rectangle is proportional to the frequency.
[Figure: Histogram. Number of carious lesions by class interval, 0 to 3 through 24 to 27]
Frequency polygon
Obtained by joining the midpoints of the histogram blocks, at the height of the frequency, by straight lines, usually forming a polygon
Frequency curve
When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as the frequency curve
Line diagram
Line diagrams are used to show the trends of events with the passage of time
[Figure: Line diagram. Patients with periodontitis over time]
Cumulative Frequency Diagram

Graphical representation of cumulative frequency. The cumulative frequency of a class is obtained by adding its frequency to the frequencies of all the previous classes
[Figure: Cumulative frequency diagram. Prevalence of dental caries (in percent) by age group, 0 to 10 yrs through 60 to 70 yrs]
Scatter or Dot diagram

Shows the relationship between two variables. If the dots are clustered along a straight line, the relationship is of a linear nature
[Figure: Scatter diagram. Carious lesions against sugar exposure]
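Pie chart
In this the frequencies of the groups are shown as segments of a circle. The degree of angle denotes the frequency. The angle is calculated by
$\text{angle} = \frac{\text{class frequency} \times 360}{\text{total observations}}$
[Figure: Pie chart by department (Prostho, Conso, Perio, Ortho, Pedo) with segments of 70 (11%), 30 (5%), 200 (31%), 150 (24%) and 180 (29%)]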
Pictogram
Popular method of presenting data to the common man
Spot map or map diagram

These maps are prepared to show the geographic distribution of frequencies of characteristics
Measures of statistical averages or central tendency

The average value in a distribution is the one central value around which all the other observations are concentrated. The average value helps
- to find the most characteristic value of a set of measurements
- to find which group is better off by comparing the average of one group with that of the other
The most commonly used averages are
- mean
- median
- mode
Mean
Refers to the arithmetic mean. It is the summation of all the observations divided by the total number of observations (n). It is denoted by $\bar{x}$ for a sample and $\mu$ for a population.
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
Advantages
- it is easy to calculate
Disadvantages
- influenced by extreme values
Median
When all the observations are arranged in either ascending or descending order, the middle observation is known as the median. In case of an even number of observations, the average of the two middle values is taken. The median is a better indicator of central value as it is not affected by extreme values
Mode
The most frequently occurring observation in a set of data is called the mode. It is not often used in medical statistics.
Example: Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mean = 35 / 10 = 3.5
Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3) / 2 = 2.5
Mode = 2 (3 times).
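The three averages for the decayed-teeth example can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the notes. The ten values add up to 35, so the mean works out to 3.5.

```python
from statistics import mean, median, mode

# Number of decayed teeth in 10 children (values from the worked example)
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

avg = mean(decayed_teeth)          # sum of observations / number of observations
mid = median(decayed_teeth)        # n is even, so the two middle values are averaged
most_common = mode(decayed_teeth)  # most frequently occurring observation

print(f"Mean   = {sum(decayed_teeth)} / {len(decayed_teeth)} = {avg}")
print(f"Median = {mid}")
print(f"Mode   = {most_common} ({decayed_teeth.count(most_common)} times)")
```

The single child with 10 decayed teeth pulls the mean above the median, which illustrates why the median is less affected by extreme values.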
Types of variability
There are three types of variability
- Biological variability
- Real variability
- Experimental variability
Experimental variability is of three subtypes
- Observer error
- Instrumental error
- Sampling error
Biological variability
It is the natural difference which occurs between individuals due to age, gender and other inherent attributes. This difference is small, occurs by chance and is within certain accepted biological limits, e.g. vertical dimension may vary from patient to patient
Real Variability
Such variability is more than the normal biological limits. The cause of the difference is not inherent or natural but is due to some external factors, e.g. the difference in incidence of cancer among smokers and non-smokers may be due to excessive smoking and not due to chance only
Experimental Variability
It occurs due to the experimental study. It is of three types
- Observer error: the investigator may alter some information or not record the measurement correctly
- Instrumental error: this is due to defects in the measuring instrument. Both observer and instrumental errors are called non-sampling errors
- Sampling error or error of bias: this is the error which occurs when the samples are not chosen at random from the population. Thus the sample does not truly represent the population
Measures of variation or dispersion

Biological data collected by measurement shows variation, e.g. the BP of an individual can show variation even if taken by a standardized method and measured by the same person. Thus one should know what the normal variation is and how to measure it. The various measures of variation or dispersion are
- Range
- Mean or average deviation
- Standard deviation
- Coefficient of variation
Range
It is the simplest measure. It is defined as the difference between the highest and the lowest figures in a sample. It defines the normal limits of a biological characteristic, e.g. freeway space ranges between 2 and 4 mm. It is not satisfactory as it is based on the two extreme values only
Mean deviation
It is the summation of the differences or deviations from the mean in any distribution, ignoring the + or - sign, divided by the number of observations. It is denoted by MD.
$MD = \frac{\sum |x - \bar{x}|}{n}$
x = observation, $\bar{x}$ = mean, n = number of observations
Standard deviation
Also called root mean square deviation. It is an improvement over the mean deviation and is used most commonly in statistical analysis. It is denoted by SD or s for a sample and $\sigma$ for a population, and is given by the formula
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, with n or n - 1 in the denominator
The greater the standard deviation, the greater the magnitude of dispersion from the mean. A small standard deviation means a high degree of uniformity of the observations. Usually measurements beyond the range of ± 2 SD are considered rare or unusual in any distribution
Uses of Standard Deviation

- It summarizes the deviation of a large distribution from its mean.
- It helps in finding the suitable size of sample, e.g. greater deviation indicates the need for a larger sample to draw meaningful conclusions
- It helps in calculation of the standard error, which helps us to determine whether the difference between two samples is by chance or real
Coefficient of variation
It is used to compare the variability of characteristics having two different units of measurement, e.g. height and weight. It is denoted by CV and is expressed as a percentage.
$CV = \frac{SD}{\text{Mean}} \times 100$
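As a minimal sketch, the four measures of dispersion can be computed side by side. The data reuses the ten decayed-teeth counts from the averages example; the choice of n - 1 in the SD denominator is an assumption made here because the sample is small, and the variable names are illustrative.

```python
from math import sqrt

# Decayed-teeth counts reused from the averages example (10 children)
x = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
n = len(x)
x_bar = sum(x) / n

value_range = max(x) - min(x)                          # highest minus lowest figure
mean_dev = sum(abs(v - x_bar) for v in x) / n          # MD: deviations with sign ignored
sd = sqrt(sum((v - x_bar) ** 2 for v in x) / (n - 1))  # SD, assuming the n - 1 form
cv = sd / x_bar * 100                                  # CV, expressed as a percentage

print(f"Range = {value_range}")
print(f"Mean deviation = {mean_dev:.2f}")
print(f"SD = {sd:.2f}")
print(f"CV = {cv:.1f}%")
```

Because CV is a unit-free percentage, the same calculation on heights (cm) and weights (kg) gives two figures that can be compared directly.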
Normal distribution or normal curve

A great deal of physiologic variation occurs in any observation. It is necessary to
- Define normal limits
- Determine the chances of an observation being normal
- Determine the proportion of observations that lie within a given range
The normal distribution or normal curve, used most commonly in statistics, helps us to find these. A large number of observations with a narrow class interval gives a frequency curve called the normal curve. It has the following characteristics
- Bell shaped
- Bilaterally symmetrical
- Frequency increases from one side, reaches its highest and decreases exactly the way it had increased
- The highest point denotes the mean, median and mode, which coincide
- Mean ± 1 SD includes 68.27% of all observations. Such observations are fairly common
- Mean ± 2 SD includes 95.45% of all observations, i.e. by convention values beyond this range are uncommon or rare. Their chance of being normal is 100 - 95.45%, i.e. only 4.55%.
- Mean ± 3 SD includes 99.73% of all observations. Values beyond this range are very rare. Their chance of being normal is 0.27% only
These limits on either side of the mean are called confidence limits. The look of the frequency distribution curve may vary depending on the mean and SD, thus it becomes necessary to standardize it. E.g. one study has an SD of 3 and another has an SD of 2, so it becomes difficult to compare them. Thus the normal curve is standardized by using the unit of standard deviation to place any measurement with reference to the mean. The curve that emerges through this procedure is called the standard normal curve
Properties of standard normal curve

- smooth
- bell shaped
- perfectly symmetrical
- based on an infinite number of observations, thus the curve does not touch the X axis
- mean is zero
- SD is always 1
- total area under the curve is 1
- mean, median and mode coincide
The unit of SD here is the relative or standard normal deviate and is denoted by Z.
$Z = \frac{\text{Observation} - \text{Mean}}{SD}$
With the help of the Z value we can find the area under the curve from a table. This area helps to give the P value
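The coverage figures for mean ± 1, 2 and 3 SD, and the Z formula, can be reproduced with Python's `statistics.NormalDist` in place of a printed table. This is only a sketch: the helper names `z_score` and `area_within`, and the reading of 130 against a mean of 120 and SD of 5, are hypothetical illustrations rather than values from the notes.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal curve: mean 0, SD 1

def z_score(observation, mean, sd):
    """Standard normal deviate: Z = (observation - mean) / SD."""
    return (observation - mean) / sd

def area_within(z):
    """Proportion of observations lying within mean +/- z SD."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for k in (1, 2, 3):
    print(f"Mean +/- {k} SD includes {area_within(k) * 100:.2f}% of observations")

# Hypothetical reading: 130 in a population with mean 120 and SD 5
z = z_score(130, 120, 5)
print(f"Z = {z}, proportion beyond +/- Z = {1 - area_within(z):.4f}")
```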
Sampling
It is not possible to include each and every member of a population as it would be time consuming, costly and laborious. Therefore sampling is done. Sampling is a process by which some units of a population or universe are selected for study and, by subjecting them to statistical computation, conclusions are drawn about the population from which these units are drawn. The sample will be representative of the entire population only if
- It is sufficiently large
- It is unbiased
Such a sample will have its statistics almost equal to the parameters of the entire population. The two main characteristics of a representative sample are
- Precision
- Unbiased character
Precision
Precision depends on the sample size. Ordinarily the sample size should not be less than 30.
$\text{Precision} = \frac{\sqrt{n}}{s}$
n = sample size, s = standard deviation
Precision is directly proportional to the square root of the sample size: the greater the sample size, the greater the precision. Also, the greater the SD, the less the precision. Thus in such cases the sample size needs to be increased to obtain precision
Unbiased character
The sample should be unbiased, i.e. every individual should have an equal chance of being selected in the sample. Thus a standard random sampling method should be used. Non-sampling errors can be taken care of by
- Using standardized instruments and criteria
- Single, double or triple blind trials
- Use of a control group
Determination of sample size

For Quantitative Data
The investigator needs to decide how large an error due to sampling defect is allowable, i.e. the allowable error L. The investigator should either start with an assumed SD or do a pilot study to estimate the SD.
$n = \frac{4\,SD^2}{L^2}$
Example: The mean pulse rate of a population is 70 beats per min with a standard deviation of 8 beats. What will be the sample size if the allowable error is ± 1?
n = 4 × 8 × 8 / (1 × 1) = 256
If L is smaller, n will be larger, i.e. the larger the sample size, the smaller the error.
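A small sketch of the quantitative sample-size formula, applied to the pulse-rate example. The function name is illustrative, and the second call with L = 0.5 is an added hypothetical case to show how halving the allowable error affects n.

```python
def sample_size_quantitative(sd, allowable_error):
    """n = 4 * SD^2 / L^2 for quantitative data."""
    return 4 * sd ** 2 / allowable_error ** 2

# Pulse-rate example: SD = 8 beats, allowable error L = 1 beat
n = sample_size_quantitative(sd=8, allowable_error=1)
print(f"Required sample size: {n:.0f}")

# Hypothetical tighter error: halving L quadruples the required sample
n_tight = sample_size_quantitative(sd=8, allowable_error=0.5)
print(f"With L = 0.5: {n_tight:.0f}")
```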
For qualitative data

In such data we deal with proportions.
$n = \frac{4pq}{L^2}$
p = proportion of positive character
q = proportion of negative character, q = 1 - p (or 100 - p if expressed in percent)
L = allowable error, usually 10% of p
E.g. the incidence rate in the last influenza epidemic was found to be 5% of the population exposed. What should be the size of the sample to find the incidence rate in the current epidemic if the allowable error is 10%?
p = 5%, q = 95%, L = 10% of p = 0.5%
n = 4 × 5 × 95 / (0.5 × 0.5) = 7600
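The qualitative formula can be sketched the same way. The function name and the `error_fraction` parameter are illustrative; p is taken in percent, so q = 100 - p, matching the influenza example.

```python
def sample_size_qualitative(p_percent, error_fraction=0.10):
    """n = 4pq / L^2, with p and q in percent and L a fraction of p."""
    q_percent = 100 - p_percent
    allowable_error = error_fraction * p_percent  # L = 10% of p by default
    return 4 * p_percent * q_percent / allowable_error ** 2

# Influenza example: incidence p = 5%, allowable error 10% of p = 0.5%
n = sample_size_qualitative(5)
print(f"Required sample size: {n:.0f}")
```

Rare characters (small p) with a proportionally small allowable error call for large samples, which is why n here is so much larger than in the pulse-rate example.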
Probability or p value
The concept of probability is very important in statistics. Probability is the chance of occurrence of any event or permutation combination. It is denoted by p for a sample and P for a population. In various tests of significance we are often interested to know whether the observed difference between 2 samples is real or due to chance (sampling variation). Here the probability or p value is used.
P ranges from 0 to 1
0 = there is no chance that the observed difference is due to sampling variation
1 = it is absolutely certain that the observed difference between the 2 samples is due to sampling variation
However such extreme values are rare. P = 0.4 means the chances that the difference is due to sampling variation are 4 in 10. Obviously the chances that it is not due to sampling variation will be 6 in 10
The essence of any test of significance is to find the p value and draw an inference. If the p value is 0.05 or more it is customary to accept that the difference is due to chance (sampling variation). The observed difference is said to be statistically not significant. If the p value is less than 0.05, the observed difference is not due to chance but due to the role of some external factors. The observed difference here is said to be statistically significant.
The p value can be determined in three ways
- From the shape of the normal curve: we know that 95% of observations lie within mean ± 2 SD. Thus the probability of a value more or less than this range is 5%
- From probability tables: the p value is determined from probability tables in the case of Student's t test or the chi square test
- By area under the normal curve: here Z, the standard normal deviate, is calculated. Corresponding to the Z value, the area under the curve (A) is determined. Probability is given by 2(0.5 - A)
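The "area under the normal curve" route to the p value can be sketched as follows. It assumes A is the area between the mean and Z (the figure a Z table usually lists) and uses `statistics.NormalDist` instead of a printed table; the Z values in the loop are arbitrary illustrations.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """P = 2 * (0.5 - A), where A is the area between the mean and Z."""
    area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5  # A, as read from a Z table
    return 2 * (0.5 - area_mean_to_z)

for z in (1.0, 1.96, 2.0, 3.0):
    p = two_tailed_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"Z = {z}: p = {p:.4f} ({verdict})")
```

Doubling the tail area reflects that a difference in either direction counts, which is the two-sided convention behind the 0.05 cut-off.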
Tests of significance
Whatever the sampling procedure or the care taken while selecting the sample, the sample statistics will differ from the population parameters. Variations between 2 samples drawn from the same population may also occur, i.e. differences in the results of two research workers for the same investigation may be observed. Thus it becomes important to find out the significance of this observed variation, i.e. whether it is due to chance or biological variation (statistically not significant) or due to the influence of some external factors (statistically significant). To test whether the variation observed is of significance, the various tests of significance are done. The tests of significance can be broadly classified as
1. Parametric tests
2. Non-parametric tests
Parametric tests
Parametric tests are those tests in which certain assumptions are made about the population
- The population from which the sample is drawn has a normal distribution
- The variances of the samples do not differ significantly
- The observations found are truly numerical, thus arithmetic procedures such as addition, division and multiplication can be used
Since these tests make assumptions about the population parameters they are called parametric tests. These are usually used to test the difference. They are:
- Student's t test (paired or unpaired)
- ANOVA
- Test of significance between two means
Non-parametric tests

In many biological investigations, the research worker may not know the nature of the distribution or other required values of the population. Also some biological measurements may not be true numerical values, hence arithmetic procedures are not possible. In such cases distribution-free or non-parametric tests are used, in which no assumptions are made about the population parameters, e.g.
- Mann Whitney test
- Chi square test
- Phi coefficient test
- Fisher's exact test
- Sign test
- Friedman's test
Tests of significance can also be divided into one-tailed and two-tailed tests
Two tailed test

This test determines if there is a difference between the two groups without specifying whether the difference is higher or lower. It includes both ends or tails of the normal distribution. Such a test is called a two-tailed test, e.g. when one wants to know if the mean IQ in malnourished children is different from that of well-nourished children but does not specify if it is more or less
One tailed test

In a test of significance when one wants to know specifically if the difference between the two groups is higher or lower, i.e. the direction (plus or minus side) is specified, then one end or tail of the distribution is excluded. E.g. if one wants to know if malnourished children have a lower mean IQ than well-nourished children, then the higher side of the distribution will be excluded. Such a test of significance is called a one-tailed test
Stages in performing test of significance

- State the null hypothesis
- State the alternative hypothesis
- Accept or reject the null hypothesis
- Finally determine the p value
State the null hypothesis

The null hypothesis is a hypothesis of no difference between the statistic of a sample and the parameter of the population, or between the statistics of two samples. It nullifies the claim that the experimental result is different from or better than the one observed already
State the alternative hypothesis

It is a hypothesis stating that the sample result is different, i.e. larger or smaller than the value of the population, or that the statistic of one sample is different from the other
Accept or reject the null hypothesis

The null hypothesis is accepted or rejected depending on whether the result falls in the zone of acceptance or the zone of rejection. If the result of a sample falls in the area of mean ± 2 SE, the null hypothesis is accepted. This area of the normal curve is called the zone of acceptance for the null hypothesis. If the result of the sample falls beyond the area of mean ± 2 SE, the null hypothesis of no difference is rejected and the alternative hypothesis accepted. This area of the normal curve is called the zone of rejection for the null hypothesis
Finally determine the p value

The P value is determined using any of the previously mentioned methods. If p > 0.05 the difference is due to chance and not statistically significant, but if p < 0.05 the difference is due to some external factor and is statistically significant
Types of error
While drawing conclusions in a study we are likely to commit two types of error.
- Type I error
- Type II error
Type I error
This type of error occurs when we conclude that the difference is significant when in fact there is no real difference in the population, i.e. we reject the null hypothesis when it is true. It is denoted by α
Type II error
This type of error occurs when we say that the difference is not significant when in fact there is a real difference between the populations, i.e. the null hypothesis is not rejected when it is actually false. It is denoted by β
Tests of significance for large samples

These tests are used for sample sizes greater than 30. The test used is the Z test. Z is the standard normal deviate and has been discussed under normal distribution
$Z = \frac{\text{observation} - \text{mean}}{SD}$
However in the Z test the standard deviation is replaced by the standard error
$Z = \frac{\text{observed difference}}{\text{standard error}}$
We know that the standard deviation measures the variation within a sample. The standard error is the measure of the difference in values occurring
- between a sample and the population
- between two samples of the same population
The standard error used in the Z test can be
- Standard error of mean
- Standard error of proportion
- Standard error of difference between 2 means
- Standard error of difference between 2 proportions
If in the Z test Z > 2, i.e. if the observed difference between the 2 means or proportions is greater than 2 times the standard error of difference, then p < 0.05 according to the given table
Z       P
1.6     0.1
2.0     0.05
2.3     0.02
2.6     0.01
Thus the difference is not due to chance and may be due to the influence of some external factor, i.e. the difference is statistically significant
Standard error of mean

Used for quantitative data. The standard error of mean is a measure of the difference between the sample mean and the population mean, given by
$SE_{\bar{x}} = \frac{SD \text{ of sample}}{\sqrt{n}}$
Also the population mean will lie within sample mean ± 2 standard error of mean. This enables us to know whether a sample mean is within the limits of the population mean. Here
$Z = \frac{\text{sample mean} - \text{population mean}}{SE \text{ of mean}}$
Example: In a random sample of 100 the mean blood sugar is 80 mg% with SD 6 mg%. Within what limits will the population mean lie? What can be said about another sample whose mean is 82 mg%?
SE = 6 / √100 = 6 / 10 = 0.6
Thus the population mean will be 80 ± 2 × 0.6 = 78.8 to 81.2
A sample with a mean of 82 mg% is not within the limits of the population mean, thus it does not seem to be drawn from the same population
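The blood sugar example can be worked through in a few lines. This is a sketch with illustrative function names; the limits use the ± 2 SE convention from the notes.

```python
from math import sqrt

def se_of_mean(sd, n):
    """Standard error of mean: SD of sample / sqrt(n)."""
    return sd / sqrt(n)

def population_mean_limits(sample_mean, sd, n):
    """Limits of the population mean: sample mean +/- 2 SE."""
    se = se_of_mean(sd, n)
    return sample_mean - 2 * se, sample_mean + 2 * se

# Blood sugar example: n = 100, mean 80 mg%, SD 6 mg%
low, high = population_mean_limits(sample_mean=80, sd=6, n=100)
print(f"Population mean lies between {low:.1f} and {high:.1f} mg%")

other_sample_mean = 82
within = low <= other_sample_mean <= high
print(f"Sample mean {other_sample_mean} within limits: {within}")
```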
Standard error of difference between 2 means

Used for quantitative data. It is a measure of the difference between the means of two samples drawn from the same population. It helps to know the significance of the difference obtained by 2 research workers for the same investigation
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$
E.g. find the significance of the difference in mean heights of 50 girls and 50 boys with the following values
          Girls     Boys
Mean      147.4     151.6
SD        6.6       6.3
SE = √((6.6)² / 50 + (6.3)² / 50) = 1.29
Z = observed difference / SE = (151.6 - 147.4) / 1.29 = 3.26
Since the Z value is more than 2, p will be less than 0.05. Thus the difference is statistically significant and it can be concluded that boys are taller than girls
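A minimal sketch of the same calculation for the heights example; the function name is illustrative. Carrying the unrounded SE through gives a Z of about 3.25 rather than the 3.26 obtained with SE rounded to 1.29, and the conclusion is the same either way.

```python
from math import sqrt

def se_diff_means(sd1, n1, sd2, n2):
    """SE of difference between two means: sqrt(SD1^2/n1 + SD2^2/n2)."""
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Heights example: 50 girls (mean 147.4, SD 6.6) and 50 boys (mean 151.6, SD 6.3)
se = se_diff_means(6.6, 50, 6.3, 50)
z = (151.6 - 147.4) / se

print(f"SE = {se:.2f}, Z = {z:.2f}")
print("significant (Z > 2)" if abs(z) > 2 else "not significant (Z <= 2)")
```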
Standard error of proportion

In the case of qualitative data, where the character remains the same but its frequency varies, we express it as a proportion instead of a mean. p is the proportion of individuals having the special character and q is the proportion of individuals not having the character. p + q = 1, or 100 if expressed as a percentage
The standard error of proportion is the unit which measures the variation in proportion of a character from sample to population
$SE \text{ of proportion} = \sqrt{\frac{pq}{n}}$
p = proportion of positive character
q = proportion of negative character
n = sample size
Also proportion of population = proportion of sample ± 2 SE. Thus one can determine whether the proportion of the sample is within the limits of the population proportion
Example: The proportion of blood group B among Indians is 30%. If in a sample of 100 individuals it is 25%, what is your conclusion about the group?
SEP = √(pq / n) = √(25 × 75 / 100) = 4.33
Z = observed difference / SE = (30 - 25) / 4.33 = 1.15
Since Z is < 2, p will be more than 0.05, thus the difference is not significant.
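The blood group example, as a short sketch with p and q in percent. The function name is illustrative.

```python
from math import sqrt

def se_of_proportion(p_percent, n):
    """SE of proportion = sqrt(p * q / n), with p and q in percent."""
    q_percent = 100 - p_percent
    return sqrt(p_percent * q_percent / n)

# Blood group B example: sample proportion 25% in n = 100, population 30%
se = se_of_proportion(25, 100)
z = (30 - 25) / se

print(f"SE = {se:.2f}, Z = {z:.2f}")
print("significant (Z > 2)" if z > 2 else "not significant (Z <= 2)")
```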
Standard error of difference between 2 proportions

Measures the difference in proportion of a character from sample to sample
$SE(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
If typhoid mortality in one sample of 100 is 20% and in another sample of 100 is 30%, is this difference in mortality rate significant?
p1 = 20, q1 = 80, n1 = 100
p2 = 30, q2 = 70, n2 = 100
SE(p1 - p2) = √(20 × 80 / 100 + 30 × 70 / 100) = 6.08
Z = (30 - 20) / 6.08 = 1.64
Z < 2, p > 0.05, thus the difference observed is not significant
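The typhoid mortality comparison can be checked with the sketch below; the function name is illustrative and the proportions are in percent.

```python
from math import sqrt

def se_diff_proportions(p1, n1, p2, n2):
    """SE(p1 - p2) = sqrt(p1*q1/n1 + p2*q2/n2), proportions in percent."""
    q1, q2 = 100 - p1, 100 - p2
    return sqrt(p1 * q1 / n1 + p2 * q2 / n2)

# Typhoid mortality example: 20% of 100 versus 30% of 100
se = se_diff_proportions(20, 100, 30, 100)
z = (30 - 20) / se

print(f"SE = {se:.2f}, Z = {z:.2f}")
print("significant (Z > 2)" if z > 2 else "not significant (Z <= 2)")
```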
Test of significance for small samples

In the case of samples smaller than 30 the Z value will not follow the normal distribution. Hence the Z test will not give the correct level of significance. In such cases Student's t test is used. It was given by W. S. Gosset, whose pen name was Student. There are two types of Student's t test
1. Unpaired t test
2. Paired t test
Unpaired t test
Applied to unpaired data of observations made on individuals of 2 separate groups, to find the significance of the difference between 2 means. The sample size is less than 30, e.g. the difference in accuracy of an impression using two different impression materials. The steps in the unpaired t test are
- Calculate the means of the two samples
- Calculate the combined standard deviation
- Calculate the standard error of mean, which is given by $SEM = SD \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
- Calculate the observed difference between the means, $\bar{x}_1 - \bar{x}_2$
- Calculate the t value = observed difference / standard error of mean
- Determine the degree of freedom, which is one less than the number of observations in a sample (n - 1). Here the combined degree of freedom will be (n1 - 1) + (n2 - 1)
- Refer to the table and find the probability of the t value corresponding to the degree of freedom
- P < 0.05 states the difference is significant; P > 0.05 states the difference is not significant
Example: In a nutritional study 13 children in group A are given the usual diet along with vitamin A and vitamin D, while 12 children in group B take the usual diet. The gain in weight in pounds for both groups after 12 months is shown in the table. Are vitamins A and D responsible for the gain in weight?
Group A   5  3  4  3  2  6  3  2  3  6  7  5  3
Group B   1  3  2  4  2  1  3  4  3  2  2  3  -
Mean of group A = 4
Mean of group B = 2.5
Combined SD = 1.37
SE = 0.548
t = observed difference / SE = (4 - 2.5) / 0.548 = 2.74
Combined degree of freedom = n1 + n2 - 2 = 13 + 12 - 2 = 23
The p value is checked corresponding to the t value at 23 d.f. from the t table. It is < 0.02. Thus the difference is statistically significant and is attributed to the role of vitamins A and D
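The steps of the unpaired t test can be traced on the weight-gain data with the sketch below. The function and variable names are illustrative, and the combined SD is computed as the pooled sum of squared deviations over the combined degrees of freedom, which is an assumption about how the 1.37 in the example was obtained (it does reproduce that figure). The final step, reading the p value at 23 d.f., still needs a t table.

```python
from math import sqrt
from statistics import mean

# Gain in weight (pounds) after 12 months, from the nutritional study example
group_a = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]   # usual diet + vitamins A and D
group_b = [1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3]      # usual diet only

def unpaired_t(a, b):
    """Return (t, degrees of freedom, combined SD, SE) for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss1 = sum((v - m1) ** 2 for v in a)       # sum of squared deviations, group 1
    ss2 = sum((v - m2) ** 2 for v in b)       # sum of squared deviations, group 2
    df = (n1 - 1) + (n2 - 1)                  # combined degrees of freedom
    combined_sd = sqrt((ss1 + ss2) / df)      # pooled SD (assumed method)
    se = combined_sd * sqrt(1 / n1 + 1 / n2)  # SEM = SD * sqrt(1/n1 + 1/n2)
    t = (m1 - m2) / se                        # observed difference / SE
    return t, df, combined_sd, se

t, df, combined_sd, se = unpaired_t(group_a, group_b)
print(f"Combined SD = {combined_sd:.2f}, SE = {se:.3f}, t = {t:.2f}, d.f. = {df}")
```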
Paired t test
It is applied to paired data of observations from one sample only. It is used in samples of less than 30. Each individual gives a pair of observations, i.e. an observation before and after taking a drug. The steps involved are
- Calculate the difference in each pair of observations, i.e. before and after: $x_1 - x_2 = y$
- Calculate the mean of this difference, $\bar{y}$
- Calculate the SD of the differences
- Calculate SE = SD / √n
- Determine $t = \bar{y} / SE$
- Determine the degree of freedom. Since there is one sample, df = n - 1
- Refer to the table and find the probability of the t value corresponding to the degree of freedom
- P < 0.05 states the difference is significant; P > 0.05 states the difference is not significant
E.g. the systolic BP of normal individuals before and after injection of a hypotensive drug is given in the table. Does the drug lower the BP?
BP before giving drug (x1)     122  121  120  115  126  130  120  125  128
BP after giving drug (x2)      120  118  115  110  122  130  116  124  125
Difference x1 - x2 = y           2    3    5    5    4    0    4    1    3
Mean of difference $\bar{y}$ = Σy / n = 27 / 9 = 3
SD = √(Σ(y - $\bar{y}$)² / (n - 1)) = 1.73
SE = SD / √n = 1.73 / √9 = 0.58
t = $\bar{y}$ / SE = 3 / 0.58 = 5.17
Degree of freedom = n - 1 = 9 - 1 = 8
The p value corresponding to t = 5.17 and d.f. 8 is < 0.001, thus highly significant. Thus the decrease in BP is due to the drug
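The paired t test steps on the blood pressure data can be sketched as below, using `statistics.stdev` (which divides by n - 1) for the SD of the differences. Variable names are illustrative. With the unrounded SE the t value comes to about 5.20 rather than 5.17; either way it lies beyond the tabulated value for 8 d.f. referred to in the example.

```python
from math import sqrt
from statistics import mean, stdev

# Systolic BP before and after the hypotensive drug (values from the table)
before = [122, 121, 120, 115, 126, 130, 120, 125, 128]
after = [120, 118, 115, 110, 122, 130, 116, 124, 125]

y = [b - a for b, a in zip(before, after)]  # paired differences x1 - x2
n = len(y)
y_bar = mean(y)        # mean of the differences
sd = stdev(y)          # SD of the differences, n - 1 in the denominator
se = sd / sqrt(n)      # SE = SD / sqrt(n)
t = y_bar / se         # t = mean difference / SE
df = n - 1             # one sample, so d.f. = n - 1

print(f"Differences: {y}")
print(f"Mean = {y_bar}, SD = {sd:.2f}, SE = {se:.2f}, t = {t:.2f}, d.f. = {df}")
```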
2hi s=uare test

2hi s=uare test unli e : and t test is a non parametric test *he test involves calculation of a =uantity called chi s=uare . 2hi s=uare is denoted by X8 It was developed by $arl .earson *he most important application of chi s=uare test in medical statistics are *est of proportion *est of association *est of goodness of fit *est of proportion Hsed as an alternate test to find the significance of difference in 8 or more than 8 proportions *est of association *o measure the probability of association between 8 discreet attributes e.g smo ing and cancer *est of goodness of fit *ests whether the observed values of a character differ from the expected value by chance or due to play of some external factor
χ² = Σ (O - E)² / E
where χ² denotes chi square, O = observed value and E = expected value
Steps in Chi Square Test

1. State the null hypothesis
2. Determine the chi square value
3. Find the degrees of freedom
4. Refer to the chi square table to find the probability value corresponding to the degrees of freedom

Let us consider the following example. We are making a field trial of 2 vaccines. The results of the field trial are:

Vaccine    Attacked    Not attacked    Total    Attack rate
A          22          68              90       24.4%
B          14          72              86       16.3%
Total      36          140             176

Vaccine B seems to be superior to Vaccine A. We perform the chi square test to verify whether vaccine B is really superior to vaccine A or whether the difference is merely due to chance.
State the null hypothesis

It states that the vaccines have equal efficacy.
Determining the Chi Square Value

Find the total attack and non-attack rates
Total attack rate = 36 / 176 = 0.204
Total non-attack rate = 140 / 176 = 0.795

Vaccine A (n = 90)
Attacked:      O = 22    E = 0.204 × 90 = 18.36    O - E = +3.64
Not attacked:  O = 68    E = 0.795 × 90 = 71.55    O - E = -3.55

Vaccine B (n = 86)
Attacked:      O = 14    E = 0.204 × 86 = 17.54    O - E = -3.54
Not attacked:  O = 72    E = 0.795 × 86 = 68.37    O - E = +3.63

χ² = Σ (O - E)² / E
   = (3.64)² / 18.36 + (3.55)² / 71.55 + (3.54)² / 17.54 + (3.63)² / 68.37
   = 0.72 + 0.17 + 0.71 + 0.19
   = 1.79

Find the degrees of freedom
d.f. = (c - 1)(r - 1), where c = number of columns and r = number of rows
d.f. = (2 - 1)(2 - 1) = 1

Find the p value
On referring to the chi square table with one degree of freedom, the p value was more than 0.05.
Hence the difference is not statistically significant and the null hypothesis of no difference between the vaccines is accepted.
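The vaccine example can be checked with a short Python sketch. This is an illustration under stated assumptions, not the presenter's own code: the function name `chi_square` is hypothetical, and each expected value is computed as row total × column total / grand total, which is equivalent to multiplying the overall attack or non-attack rate by the group size. Because the rates are not rounded to three decimals, the result is about 1.80 rather than the 1.79 obtained by hand. The critical value 3.841 is the usual chi square table entry for 1 d.f. at p = 0.05.

```python
# Observed counts from the field trial: rows are vaccines A and B,
# columns are "attacked" and "not attacked".
observed = [
    [22, 68],  # vaccine A
    [14, 72],  # vaccine B
]


def chi_square(table):
    """Return (chi square, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count if the null hypothesis (equal efficacy) is true.
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(table[0]) - 1) * (len(table) - 1)  # (c - 1)(r - 1)
    return chi2, df


chi2_value, df = chi_square(observed)

# Critical chi square value for 1 d.f. at p = 0.05, from a standard table.
CHI2_CRITICAL_0_05_DF1 = 3.841

print(f"chi square = {chi2_value:.2f} with {df} d.f.")
if chi2_value > CHI2_CRITICAL_0_05_DF1:
    print("p < 0.05: difference significant")
else:
    print("p > 0.05: difference not significant")
```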
ANOVA
Analysis of variance
Investigations may not always be confined to the comparison of 2 samples only, e.g. we might like to compare the vertical dimension obtained using 3 or more methods such as phonetics, swallowing and Niswonger's method. In such cases, where more than 2 samples are used, ANOVA can be used.
ANOVA can also be used when measurements are influenced by several factors playing their role, e.g. factors affecting the retention of a denture. ANOVA helps to decide which factors are more important.
Requirements
- Data for each group are assumed to be independent and normally distributed
- Sampling should be at random
One way ANOVA
Where only one factor affects the result between the groups
Two way ANOVA
Where 2 factors affect the result or outcome
Multi way ANOVA
Where three or more factors affect the result or outcome between the groups
F test
F = Mean Square between Samples / Mean Square within Samples
F = variance ratio
The values of the mean squares are read from the analysis of variance table, once the sums of squares and degrees of freedom have been calculated.
Mean Square between Samples: it denotes the variation of the sample means of all groups involved in the study (A, B, C etc.) about the mean of all the observations taken together.
Mean Square within Samples: it denotes the variation of the individual observations within each sample about their own sample mean.
The larger the mean square between samples is relative to the mean square within samples, i.e. the larger the value of F, the greater the difference between the samples.
The F value observed from the study is compared with the theoretical F values obtained from the F tables at the 1% and 5% levels of significance. The results are then interpreted:
- If the observed value is more than the theoretical value at 1%, the relation is highly significant.
- If the observed value is less than the theoretical value at 5%, it is not significant.
- If the observed value lies between the theoretical values at 5% and 1%, it is statistically significant.
Presented by Dr Shilpi Gilra
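The variance ratio for a one way ANOVA can be sketched as follows. The presentation gives no ANOVA data, so the three sets of readings below are hypothetical values invented purely for illustration, and the names `one_way_f` and `interpret` are likewise assumptions. The theoretical F values for 2 and 6 degrees of freedom (5.14 at 5%, 10.92 at 1%) are taken from a standard F table.

```python
from statistics import mean

# HYPOTHETICAL vertical dimension readings (mm) for three methods,
# invented for illustration only; they do not come from the presentation.
samples = [
    [61.0, 62.0, 63.0],  # method A
    [62.0, 63.0, 64.0],  # method B
    [65.0, 66.0, 67.0],  # method C
]


def one_way_f(groups):
    """Return (F, d.f. between samples, d.f. within samples)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of samples
    n_total = len(all_values)  # total number of observations

    # Sum of squares between samples: sample means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within samples: observations about their own sample mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


def interpret(f_observed, f_5_percent, f_1_percent):
    """Apply the three interpretation rules given in the text."""
    if f_observed > f_1_percent:
        return "highly significant"
    if f_observed < f_5_percent:
        return "not significant"
    return "significant"


f_value, df_between, df_within = one_way_f(samples)

# Theoretical F values for (2, 6) d.f. from a standard F table.
F_5_PERCENT, F_1_PERCENT = 5.14, 10.92
verdict = interpret(f_value, F_5_PERCENT, F_1_PERCENT)

print(f"F = {f_value:.2f} with ({df_between}, {df_within}) d.f.: {verdict}")
```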