
Biostatistics in Health

Why statistics
What are Variables?
Correlational vs. Experimental Research
Dependent vs. Independent Variables
Measurement Scales
Relations between Variables
Why Relations between Variables are Important
Two Basic Features of Every Relation between Variables
What is "Statistical Significance" (p-value)
How to Determine that a Result is "Really" Significant
Statistical Significance and the Number of Analyses Performed
Strength vs. Reliability of a Relation between Variables
Why Stronger Relations between Variables are More Significant
Why Significance of a Relation between Variables Depends on the Size of the Sample
Why Small Relations can be Proven Significant Only in Large Samples
Can "No Relation" be a Significant Result?
How to Measure the Magnitude (Strength) of Relations between Variables
Common "General Format" of Most Statistical Tests
How the "Level of Statistical Significance" is Calculated
Why the "Normal distribution" is Important
Illustration of How the Normal Distribution is Used in Statistical Reasoning (Induction)
Are All Test Statistics Normally Distributed?
How Do We Know the Consequences of Violating the Normality Assumption?

Introduction
In all walks of life, especially health care, we are under
constant threat of being overwhelmed by data. But
unless we are going to practice on the basis of hunches,
we need these data to make good decisions and so
optimize our treatment or policy plans. So, our first
objective must be to understand a little about data.
Data do not arise by magic; data collection is
determined by a human, with all his or her prejudices
and proneness to error. We rely on instruments, such as
questionnaires, mercury barometers and
spectrophotometers, to measure and record variables of
interest, such as people's ages or blood pressures, or
amounts of chloro-fluorocarbons in the atmosphere.
But all instruments have limitations on their accuracy.

Introduction (contd..)
Humans tend to accept a body of raw data at
face value; a computer print-out of numbers
can appear to be an unalterable truth. It
may, in fact, be junk. Subsequent statistical
processing will not recycle junk into truth
(the "garbage in, garbage out" principle).
The next problem is: how do we extract
meaningful information from masses of raw
data? (How do we see the forest, when the
trees keep getting in the way?)

Definition
Statistics
A science that deals with principles & methods
for the collection, presentation, analysis &
interpretation of data
A science that deals with the study of
aggregates, totals or populations

Bio-Statistics
Statistical methods used in the field of Health,
Biology, Medicine, Public Health

Statistical thinking would one day be as necessary
a qualification for good citizenship as the ability
to read and write
H.G. WELLS, 1903

Some Basics

Data and variables


A variable is a characteristic of a population which
can take different values. The population might be a
human population; variables of interest in this
population might be age, income, number of children
and education level. One could envisage a population
of teaching hospitals and perhaps be interested in such
variables as annual budget, number of medical
students and number of X-rays taken per year.
The Health Commission might be interested in a
population of small chemical processing plants and
wish to measure variables related to the health of
employees, for example fresh air recirculation times
or the number of emergency dousing showers per factory.

Data and variables (contd..)


Data are measurements collected on a variable as a
result of taking observations.
Often, data will have associated units of measurement.
Most often, due to time and resource constraints, we
are dealing with data collected on a subset or sample
of a population.
Data may be classified as being discrete if the
variable can take only a finite or countable number of
distinct values, for example number of children, or
continuous if the variable can (at least within a
certain range) take any value along the number line,
for example height, plasma cholesterol level, blood
pressure.

Measurement Scales
Depending on the nature of the variable, we have
different measurement scales.
Firstly, not all data are numbers, though they are often
represented as numbers. For example, we may choose to
code males as 1 and females as 2, especially when
entering the data into a computer. In this example, the
numbers 1 and 2 have none of the usual properties of
numbers.
Two observations are equal if both are female (or both
male) and non-equal otherwise. Such a scale is called
nominal or categorical, and because of this lack of
mathematical properties it is called the weakest
scale.

Measurement Scales (contd..)
A slightly stronger scale, one that uses the
mathematical notion of ordering is the ordinal scale.
An example of a variable measured on an ordinal scale
might be the position of an examination candidate in
the results order of merit. The number of the ranked
position tells us nothing about how much difference
exists between successive positions on the scale.
Interval scale: It possesses strong mathematical
properties due to the fact that in this scale equal
differences between points represent equal differences
in the measured quantity. For example, the difference
between 12 metres and 11 metres is the same as the
difference between 4 metres and 3 metres.

Measurement Scales (contd..)
The ratio scale is considered a refinement of the
interval scale. In this scale, the order and size of
interval are important, but the ratio between two
measures also has meaning. This occurs when
there is a true zero point associated with the
scale.
For example:
temperature in degrees Celsius (interval scale)
temperature in kelvins (ratio scale)
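
A small sketch of why the true zero matters, assuming two illustrative temperatures of 10 and 20 degrees Celsius (values chosen only for this example). Differences agree on both scales, but the ratio is only meaningful on the Kelvin scale.

```python
def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvins."""
    return temp_c + 273.15

# Illustrative values only (not from the slides).
a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Equal differences on both scales: both are interval scales.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios disagree: 20 C is not "twice as hot" as 10 C.
ratio_c = b_c / a_c   # 2.0, an artefact of the arbitrary zero
ratio_k = b_k / a_k   # close to 1.035, the physically meaningful ratio

print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 3))
```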

Simplification of data
The stronger measures can always be collapsed down to
form a weaker measure, but not vice versa. E.g. heights of
students can be recoded as tall or short.
When data are simplified or summarised, information is always
lost, but understanding may be gained.
Here, if we measure heights on enough people, we may be
overwhelmed by the number of individual measurements, but
being able to say that x% are tall and (100 - x)% are short
may give a useful insight into this aspect of the population.
Of course, using this nominal scale of measuring height we
are no longer able to say what an individual's actual height is.
The reason we stress scales of measure is that the information
content of data depends on the scale, and different
descriptive techniques and different statistical tests are
appropriate to different scales.
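
The collapsing of heights into tall/short can be sketched as below. The cutoff of 170 cm, the sample heights and the function names are all assumptions made for illustration; the slides do not define "tall".

```python
def collapse_heights(heights_cm, tall_cutoff_cm=170.0):
    """Collapse measured heights into nominal labels 'tall' / 'short'.

    The cutoff is a hypothetical choice, not a value from the slides.
    """
    return ["tall" if h >= tall_cutoff_cm else "short" for h in heights_cm]

def percent_tall(labels):
    """Percentage of observations labelled 'tall'."""
    return 100.0 * labels.count("tall") / len(labels)

# Hypothetical heights of five students, in centimetres.
heights = [152.0, 168.5, 171.0, 180.2, 165.0]
labels = collapse_heights(heights)

print(labels)
print(percent_tall(labels), "% tall")
```

Once only the labels are kept, the individual heights cannot be recovered, which is the loss of information described above.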

Descriptive statistics
Descriptive statistics includes methods for
presenting and summarising data.
These allow us to digest and understand
large quantities of data, and to effectively
communicate to others important aspects
of our research.

Descriptive statistics
Frequency Distributions and Data
Presentation

If you arrange your raw data so that the
scores on a variable of interest are in
order of magnitude, that is, you rank the
data, and then indicate by means of a
table or graph how often a score occurs,
then you will have constructed a
frequency distribution, a tally of the
scores.

Table
grams/day   frequency   relative frequency
0-9         125         125/1000 = 0.125
10-19       250         250/1000 = 0.250
20-29       400         400/1000 = 0.400
30-39       150         150/1000 = 0.150
40-59        50          50/1000 = 0.050
60-99        25          25/1000 = 0.025
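
The relative frequencies in the table are each class frequency divided by the total (1000). A minimal sketch of that calculation, using the class intervals and frequencies from the table; the variable and function names are illustrative.

```python
# Class intervals (grams/day) and frequencies as given in the table.
class_intervals = ["0-9", "10-19", "20-29", "30-39", "40-59", "60-99"]
frequencies = [125, 250, 400, 150, 50, 25]

def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]

rel = relative_frequencies(frequencies)

# Print a small frequency table.
for interval, freq, rel_freq in zip(class_intervals, frequencies, rel):
    print(f"{interval:>6}  {freq:>5}  {rel_freq:.3f}")
```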

Bar diagram

Inferential statistics
The inferential approach helps to decide
whether the outcome of the study is a result of
factors planned within design of the study or
determined by chance.
The two approaches are often used
sequentially: first the data are described
with descriptive statistics, and then additional
statistical manipulations are done, through
inferential statistics, to make inferences
about the likelihood that the outcome was
due to chance.

Need for learning Biostatistical principles for
health/research workers
1. A knowledge of Statistics is required to understand
the rationale on which diagnostic, prognostic and
therapeutic decisions are or should be based
2. To appreciate that Medicine is highly dependent on
concepts of probability
3. Within their competence, health workers need to
interpret laboratory tests and bedside observations
& measurements in the light of a knowledge of

Need for learning Biostatistics contd.
4. Health workers must know and understand the
Statistical & Epidemiological facts about the
etiology and prognosis of the diseases that they
treat in order to give the best advice to their
patients about how to avoid or limit the effects of
these diseases
5. Health workers are the primary generators of the
data on which Health Statistics are based.
Therefore they need to know how data can be and
should be used, both for the benefit of their own

Need for learning Biostatistics contd.
6. Health managers need to know how to interpret
and draw inferences from the indicators that
describe health levels & trends etc.
7. The study of Statistics helps to foster in students
the critical and deductive faculties that they need
throughout their studies and in their professional
work throughout their careers
8. Knowledge of Statistics helps in understanding &
evaluating medical literature/ reports so that they
keep abreast of developments in their profession

Statistical applications in
Health
Normal values of a characteristic
How to classify an individual as healthy
or sick?
Needs treatment or not?
Based on the normal values of certain
clinical, laboratory & other measurements
"Normal" here is a statistical concept and
depends on the distribution of the
characteristic in the Population

Statistical applications in
Health
Most often in Health research, we need to
know whether the level of a parameter is the
same between two or more groups
E.g.
Is the birth weight the same in males & females?
Is it the same in babies born to educated &
uneducated mothers?
We can answer this only by statistical
means (Testing of hypotheses)

Statistical applications in
Health
Sometimes, the interest could be the
relation between 2 variables, e.g. weeks
of gestation & birth weight of the
newborn
We wish to know whether a change in
one brings a change in the other
Statistical methods that help in this are:
Exploratory: scatter plot
Quantification: correlation coefficient
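
A minimal sketch of the quantification step, assuming the Pearson product-moment coefficient is the correlation coefficient meant here. The gestation and birth weight values are hypothetical, made up only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: weeks of gestation and birth weight in grams.
weeks = [36, 37, 38, 39, 40]
birth_weight_g = [2600, 2800, 3000, 3150, 3300]

print(round(pearson_r(weeks, birth_weight_g), 3))
```

A value near +1 or -1 indicates a strong linear relation; a value near 0 indicates little or no linear relation.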

Statistical applications in
Health
Quantifying the influence of one factor on another
How much birth weight can be gained if the
delivery can be postponed by a week?
How likely is the new therapy to improve clinical
outcomes?
What is the risk of breast cancer in a woman if
her mother had a history of the same?
(Regression Analysis)
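
The first question above can be sketched as a simple least-squares line, assuming a straight-line relation between weeks of gestation and birth weight. The data and the name fit_line are hypothetical; the slope is read as the estimated gain in birth weight per additional week.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: weeks of gestation and birth weight in grams.
weeks = [36, 37, 38, 39, 40]
birth_weight_g = [2600, 2800, 3000, 3150, 3300]

slope, intercept = fit_line(weeks, birth_weight_g)
print("estimated gain per extra week (g):", slope)
```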

Statistical applications in Health


If our interest is in seeing the agreement between two measurements of
the same characteristic (note: agreement is not the same as correlation)
BP measurements by a physician & a trained nurse
Interpretation of slides by 2 pathologists
Smoking status by interview & presence of cotinine in urine
(Agreement analysis methods)
Exploratory plots
Quantification: Kappa statistic
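
A minimal sketch of the Kappa statistic, assuming the unweighted Cohen's kappa for two raters. The ratings of the two pathologists below are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Undefined (division by zero) if chance agreement equals 1,
    i.e. both raters use a single identical category throughout.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal totals.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical slide readings by two pathologists.
pathologist_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
pathologist_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]

print(round(cohen_kappa(pathologist_1, pathologist_2), 3))
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance.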

Statistical applications in
Health
Agreement between 2 diagnostic tests
E.g. diagnosis of TB by sputum culture vs PCR
Sensitivity
Specificity
Positive Predictive value (PV+)
Negative predictive value (PV-)
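
The four measures above can be computed from a 2x2 table of test result against reference result. This sketch assumes sputum culture is taken as the reference and PCR as the test being evaluated; the counts are hypothetical.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PV+ and PV- from 2x2 table counts.

    tp: test positive, reference positive
    fp: test positive, reference negative
    fn: test negative, reference positive
    tn: test negative, reference negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased correctly detected
        "specificity": tn / (tn + fp),  # non-diseased correctly cleared
        "pv_positive": tp / (tp + fp),  # positives that are truly diseased
        "pv_negative": tn / (tn + fn),  # negatives that are truly disease-free
    }

# Hypothetical counts: PCR result against sputum culture (assumed reference).
measures = diagnostic_measures(tp=80, fp=10, fn=20, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```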

Statistical applications in
Health
Development of new drugs and treatment modalities
What is the tolerable dose of a new drug?
What are the pharmacokinetics of the drug?
What is the effect of the new drug/treatment in treating a
condition?
How comparable is the new drug/treatment with the
existing drug(s)/treatment(s)?
Issues: inclusion and exclusion criteria, confounding, bias,
blinding, randomization etc.
(Clinical Trials: Therapeutic & Prophylactic)

Statistical applications in
Health
Ensuring maximum benefit of
diagnosis and treatment with
minimum cost, based on available
resources
(Health Economics & Operational
Research)

Statistical applications in
Health

Indices on population characteristics


Birth rate
Death rate
Fertility rate
Reproduction rate
Infant mortality, maternal mortality etc.

(Vital statistics)

Statistical applications in
Health

Population dynamics
Growth
Rural-urban components
Migration
Age composition
Expectation of life at birth
(Demography)

Statistical applications in
Health
Estimating the magnitude of various diseases
Distribution with respect to age, place, time
Identifying the possible causative factors
Principles of different study designs for different
objectives
Control of Confounding, Bias
Role of any interactions

(Epidemiological methods)

Statistical applications in
Health
Estimating the potency and relative potency
of drugs
What is the dose at which 50% of subjects
respond? (LD50, ED50)
Eg. If the relative potency of a drug (A)
compared to another drug (B) is 1.5, then 1
unit of drug A is equivalent to 1.5 units of
drug B

(Biological Assays)

Statistical applications in
Health
Maintaining the quality of:
Drugs
Laboratory Instruments
Surgical Instruments

(Quality Control analysis)

Statistical applications in
Health
Synthesis of data from separate but
similar, comparable studies to have a
quantitative summary of pooled results
Aim is to integrate the findings, pool the
data and find overall trend of results

Meta-analysis (Systematic Review)

Statistical applications in
Health
Use of current best evidence derived from
published clinical & epidemiological research in
management of patients, with due attention to:
Balance of risks & benefits of diagnostic tests
Alternative treatment regimens
Each patient's unique circumstances, including
baseline risk, co-morbidities & personal
preferences

(Evidence Based Medicine)

Statistical applications in Health


Application of best available evidence
in setting public health policies and
practices:
Evidence may be from epidemiologic,
demographic, sociologic, economic
etc. sources
Implementation of public health
policies, programs & practices requires
good evidence on feasibility, efficacy,
efficiency, cost, acceptability to the

Whatever the branch of Statistical applications:
What sample size is needed to arrive at a
valid conclusion?
What is the role of chance in the observed
findings?
Is the observed result due to some other
factor?
Is there any inter-play of different factors?
Can the observed result be generalized?

Role of statistics in Health Sciences & Health Care
Delivery

Statistical methods are applied consciously
or subconsciously in health care delivery at
the community and individual patient levels
At the Community level:
Monitor & assess the health situation & trends
Predict the likely outcome of an intervention

At individual patient level:
To arrive at the most likely diagnosis
To predict the prognostic course
To evaluate the relative efficacy of different
modes of treatment

Practical convenience dictates that we study
a set of items or individuals from a larger
aggregate or population about which we wish
to know
E.g. What proportion of primary school
children in Delhi have their first molar tooth
erupted?
Study all primary school going children and
count how many had the first molar erupted
- Needs enormous time, resources
- May not be necessary also

Alternatively, a subset or a portion of
the total primary school children can be
studied and the results observed in the
small set can be projected to the total
children
The Total or Aggregate we are talking
about is called the Population and the
part or subset is referred to as the
Sample
Thus the essence of all Statistics is:
Study only a portion and project the
results to a target total
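
The idea of studying a portion and projecting to the total can be sketched with a simulated population. Everything here is an assumption for illustration: a population of 10,000 children of whom 30% have an erupted first molar, a simple random sample of 500, and the function name sample_proportion.

```python
import random

def sample_proportion(population, sample_size, rng):
    """Proportion of 'yes' (coded 1) in a simple random sample
    drawn without replacement from the population."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population: 1 = molar erupted, 0 = not erupted.
population = [1] * 3000 + [0] * 7000
true_proportion = sum(population) / len(population)

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = sample_proportion(population, 500, rng)

print("population proportion:", true_proportion)
print("sample estimate:      ", estimate)
```

The sample estimate will usually be close to, but not exactly equal to, the population proportion; the gap tends to shrink as the sample gets larger.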

In the process of studying only a part
and guessing about the total:
Part or subset is called the Sample
The target of our investigation or the
total/aggregate is called the Population
In the molar eruption example:
All primary school going children (in Delhi) form
our Population
Selected subset of primary school going
children forms our Sample

E.g. 2. If we wish to know the proportion of
anemic pregnant women in a village:
All currently pregnant women in the
village form the Population
Selected small set of pregnant women
(from the same village) will form the
Sample
E.g. 3. If we wish to know the birth weight of
newborns in a clinic:
All babies born in the clinic form the
Population
Selected newborn babies form the
Sample

Population vs Sample
Note that results of sample are of interest only
if they tell us about the population from which
the sample is drawn
Intuitively, the bigger the Sample, the more
confident we are about the applicability of
results to the population
The more the Sample is representative of the
Population (Total), the more confident we are
So for a given investigation, clear definition of
Population & Sample and careful selection of
the Sample are very important

In the investigation on molar eruption in
primary school children, the characteristic
of interest is whether a child has an erupted
molar (Yes/No)
Similarly, in the birth weights example, the
characteristic of interest is the birth weight
of the newborns (800 to 4500 gms)
Such characteristics which are likely to vary
from person to person are called Variables
Because the characteristic is likely to vary
from person to person we have to study it
based on a selected group of persons

If the characteristic is the same for all individuals
in the Population, there is no need to study it.
Just knowing the status of one will tell all. In
other words, such a characteristic is not a
Variable
Molar eruption status can take 2 values:
Yes/No
Anemic status of mothers can take 2
values: Yes/No
Birth weight of newborns can take any value
within some range (say 800 to 4500 gms)
The variable Blood group can take the possible
values A, B, AB or O

Variables that can take only a few values, and that
are not measurable but are only attributes or
qualities, are called Qualitative Variables
In our examples, molar eruption status,
anemic status and blood group are Qualitative
Variables
Birth weight of newborns is quantifiable
and can take a wide range of possible
values, say between 800 and 4500 gms. This
characteristic can be measured and has
units of measurement. Variables of this
type are called Quantitative
Variables

Qualitative Variables
Since Qualitative variables can take only a few values or
categories, they are also called Categorical variables
For example, any individual can have either A or B or
AB or O blood group, but no in-between values
Examples: Race, smoking status, gender etc.

Nominal, Ordinal Variables

Quantitative Variables
Quantitative variables, as the name indicates,
can take any value on a continuous
spectrum.
For convenience we measure them in
some unit and round off, however fine
the unit (height in metres, cm, mm etc.)
But the actual values can lie on a continuous
spectrum. So they are also called Continuous
variables
Examples: Blood sugar, serum creatinine,
age, income, height, weight etc.

Variables
  Qualitative
    Nominal (e.g. blood group)
    Ordinal (e.g. stage of disease)
  Quantitative
    Continuous (e.g. temperature)
    Discrete (e.g. viral load)

So for any given investigation we should be
very clear about:
Target population
Sample
Variable(s) under study

Thank you
