
ESSENTIALS OF DATA ANALYSIS

After data are obtained through questionnaires, interviews, observation, or
secondary sources, they need to be edited. Blank responses, if any, have to
be handled in some way, the data coded, and a categorizing scheme set up.
The data then have to be keyed in and analyzed with a software program.

Editing

Data have to be edited, especially when they relate to responses to
open-ended questions of interviews and questionnaires, or unstructured
observations. In other words, information that may have been noted down by
the interviewer, observer, or researcher in a hurry must be clearly deciphered
so that it may be coded systematically in its entirety. Lack of clarity at this
stage will result in confusion later. The edited data should be identifiable
through the use of a different color of pencil or ink so that the original
information is still available in case of further doubts.

Incoming mailed questionnaire data have to be checked for incompleteness
and inconsistencies, if any, by designated members of the research staff.
Inconsistencies that can be logically corrected should be rectified and edited
at this stage.

Much of the editing is automatically taken care of in the case of
computer-assisted telephone interviews and electronically administered
questionnaires, even as the respondent is answering the question.

Handling Blank Responses

Not all respondents answer every item in the questionnaire. Answers may
have been left blank because the respondent did not understand the question,
did not know the answer, was not willing to answer, or was simply
indifferent to the need to respond to the entire questionnaire. If a substantial
number of questions – say 25% of the items in the questionnaire – have been
left unanswered, it may be a good idea to throw out the questionnaire and
not include it in the data set for analysis. In this event, it is important to
mention the number of returned but unused responses due to excessive
missing data in the final report submitted to the sponsor of the study. If,
however, only two or three items are left blank in a questionnaire with, say,
30 or more items, we need to decide how these blank responses are to be
handled.

One way to handle a blank response to an interval-scaled item with a
midpoint is to assign the midpoint of the scale as the response to that
item. An alternative is to let the computer ignore the blank responses when
the analyses are done. There are several other ways of handling blank
responses; the common approach, however, is either to assign the midpoint
of the scale as the value or to ignore the particular item during the
analysis.
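
The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-to-5 scale, the use of None for a blank, and all function names are hypothetical, and the 25% cutoff is simply the example figure mentioned in the text.

```python
# Hypothetical settings: a 1-5 interval scale and the text's "say 25%" cutoff.
SCALE_MIN, SCALE_MAX = 1, 5
MAX_BLANK_SHARE = 0.25


def blank_share(responses):
    """Fraction of items left blank (None) in one questionnaire."""
    return sum(r is None for r in responses) / len(responses)


def screen(questionnaires):
    """Keep questionnaires below the cutoff; count the ones thrown out."""
    kept = [q for q in questionnaires if blank_share(q) < MAX_BLANK_SHARE]
    return kept, len(questionnaires) - len(kept)


def fill_midpoint(responses):
    """Option 1: replace each blank with the scale midpoint."""
    midpoint = (SCALE_MIN + SCALE_MAX) / 2
    return [midpoint if r is None else r for r in responses]


def mean_ignoring_blanks(responses):
    """Option 2: compute a statistic over the answered items only."""
    answered = [r for r in responses if r is not None]
    return sum(answered) / len(answered) if answered else None


returned = [
    [4, None, 5, 3],               # 1 of 4 items blank (25%): thrown out
    [4, 2, None, 3, 5, 1, 2, 4],   # 1 of 8 items blank (12.5%): kept
]
kept, unused = screen(returned)
print(unused)                      # count to report as returned but unused
print(fill_midpoint(kept[0]))
print(mean_ignoring_blanks(kept[0]))
```

The two options give different results when blanks are frequent, so whichever is chosen should be applied consistently across the data set.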

Coding

The next step is to code the responses. Scanner sheets can be used for
collecting questionnaire data; such sheets facilitate the entry of the responses
directly into the computer without manual keying in of the data. However, if
for whatever reason this cannot be done, then it is perhaps better to use a
coding sheet first to transcribe the data from the questionnaire and then key
in the data. This method, in contrast to flipping through each questionnaire
for each item, avoids confusion, especially when there are many questions
and a large number of questionnaires as well.

It is possible to key in the data directly from the questionnaires, but that
would need flipping through several questionnaires, page by page, resulting
in possible errors and omissions of items. Transfer of the data first onto a
code sheet would thus help.

Human errors can occur while coding. At least 10% of the coded
questionnaires should therefore be checked for coding accuracy. Their
selection may follow a systematic sampling procedure. That is, every nth
form coded could be verified for accuracy. If many errors are found in the
sample, all items may have to be checked.
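
Selecting every nth coded form can be sketched as follows. The function name, the form IDs, and the choice of n = 10 (to reach about 10% of the forms) are assumptions made for illustration only.

```python
import random


def every_nth(form_ids, n=10, seed=None):
    """Systematic sample: a random start within the first n forms, then every nth."""
    if n < 1:
        raise ValueError("n must be at least 1")
    start = random.Random(seed).randrange(n)
    return form_ids[start::n]


forms = list(range(1, 201))            # hypothetical IDs for 200 coded forms
to_check = every_nth(forms, n=10, seed=1)
print(len(to_check))                   # 20 forms, i.e. 10% of 200
```

If the number of forms is not a multiple of n, the sample can fall slightly below 1/n of the forms, so a smaller n may be needed to guarantee the 10% minimum.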

Categorizing

At this point it is useful to set up a scheme for categorizing the variables
such that the several items measuring a concept are all grouped together.
Responses to some of the negatively worded questions also have to be
reversed so that all answers are in the same direction.

If the questions measuring a concept are not contiguous but scattered over
various parts of the questionnaire, care has to be taken to include all the
items without any omission or wrong inclusion.
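
A minimal sketch of such a scheme is shown here, assuming a 5-point scale. The concept name, the item labels (q3, q11, q27), and the set of negatively worded items are all hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 5          # assumed 5-point scale

# Hypothetical scheme: concept -> the items that measure it, wherever they
# appear in the questionnaire.
SCHEME = {"job_satisfaction": ["q3", "q11", "q27"]}
NEGATIVELY_WORDED = {"q11"}


def reverse(score):
    """Flip a score so that 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - score


def concept_score(response, concept):
    """Mean of a concept's items after reversing the negatively worded ones."""
    scores = []
    for item in SCHEME[concept]:
        value = response[item]       # a KeyError here flags an omitted item
        scores.append(reverse(value) if item in NEGATIVELY_WORDED else value)
    return sum(scores) / len(scores)


respondent = {"q3": 4, "q5": 1, "q11": 2, "q27": 5}
print(concept_score(respondent, "job_satisfaction"))   # mean of 4, 4, 5
```

Listing the items of each concept explicitly in one place makes omissions and wrong inclusions easier to spot than picking items by position in the questionnaire.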

Entering Data

If questionnaire data are not collected on scanner answer sheets, which can
be directly entered into the computer as a data file, the raw data will have to
be manually keyed into the computer. Raw data can be entered through any
software program. For instance, in the SPSS Data Editor, which looks like a
spreadsheet, each row represents a case and each column represents a
variable. All missing values will appear with a period (dot) in the cell. It is
possible to add, change, or delete values after the data have been entered.

It is also easy to compute the new variables that have been categorized
earlier, using the Compute dialog box, which opens when the Transform
icon is chosen. Once the missing values, the recodes, and the computing of
new variables are taken care of, the data are ready for analysis.

Feel for the Data

We can acquire a feel for the data by checking the central tendency and the
dispersion. The mean, the range, the standard deviation, and the variance in
the data will give the researcher a good idea of how the respondents have
reacted to the items in the questionnaire and how good the items and measures
are. If the response to each individual item in a scale does not have a good
spread (range) and shows very little variability, then the researcher would
suspect that the particular question was probably not properly worded and
respondents did not quite understand the intent of the question. Biases, if
any, could also be detected if the respondents have tended to respond
similarly to all the items – that is, stuck to only certain points on the scale.
The maximum and minimum scores, mean, standard deviation, variance, and
other statistics can be easily obtained, and these will indicate whether the
responses range satisfactorily over the scale.

A frequency distribution of the nominal variables of interest should be
obtained. Visual displays thereof through histograms, bar charts, and so on
can also be provided through programs that generate charts. In addition to
the frequency distributions and the means and standard deviations, it is good
to know how the dependent and independent variables in the study are
related to each other. For this purpose, an intercorrelation matrix of these
variables should also be obtained.

It is always prudent to obtain (1) the frequency distributions for the
demographic variables, (2) the mean, standard deviation, range, and variance
on the other dependent and independent variables, and (3) an intercorrelation
matrix of the variables, irrespective of whether or not the hypotheses are
directly related to these analyses. These statistics give a feel for the data.
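
The three sets of statistics listed above can be sketched with the Python standard library alone. The variable names and data are made up for illustration, and the helper functions are hypothetical; a statistics package such as SPSS produces the same figures from its menus.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev, variance


def describe(values):
    """Central tendency and dispersion for one interval-scaled variable."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean(values),
        "sd": stdev(values),            # sample standard deviation
        "variance": variance(values),   # sample variance
    }


def pearson(x, y):
    """Pearson correlation; fails with division by zero if a variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def correlation_matrix(variables):
    """Intercorrelation matrix as a nested dict keyed by variable name."""
    return {a: {b: pearson(x, y) for b, y in variables.items()}
            for a, x in variables.items()}


# Hypothetical data for five respondents.
gender = ["F", "M", "F", "F", "M"]
data = {"satisfaction": [1, 2, 3, 4, 5], "commitment": [2, 4, 6, 8, 10]}

print(Counter(gender))                  # (1) frequency distribution
print(describe(data["satisfaction"]))   # (2) mean, sd, range, variance
print(correlation_matrix(data))         # (3) intercorrelation matrix
```

An item whose range is narrow or whose variance is close to zero is the kind of item the text suggests re-examining for poor wording.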

Establishing the goodness of data lends credibility to all subsequent analyses
and findings. Hence, getting a feel for the data becomes the necessary first
step in all data analysis. Based on this initial feel, further detailed analyses
may be done to test the goodness of the data.

Testing Goodness of Data

The reliability and validity of the measures can now be tested.

Reliability

The reliability of a measure is established by testing for both consistency
and stability. Consistency indicates how well the items measuring a concept
hang together as a set. Cronbach’s alpha is a reliability coefficient that
indicates how well the items in a set are positively correlated to one another.
Cronbach’s alpha is computed in terms of the average intercorrelations
among the items measuring the concept. The closer Cronbach’s alpha is to 1,
the higher the internal consistency reliability.

Another measure of consistency reliability used in specific situations is the
split-half reliability coefficient. Since this reflects the correlations between
two halves of a set of items, the coefficients obtained will vary depending on
how the scale is split. Sometimes split-half reliability is obtained to test for
consistency when more than one scale, dimension, or factor is assessed. The
items across each of the dimensions or factors are split, based on some
predetermined logic. In almost every case, Cronbach’s alpha is an adequate
test of internal consistency reliability.
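
As a rough illustration, Cronbach’s alpha can be computed by hand as below. Note the assumption: this sketch uses the widely used variance form of the coefficient, k/(k-1) × (1 − sum of item variances / variance of total scores), which is not necessarily identical to a version based on the average intercorrelation of standardized items. The function name and the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from rows of item scores (one row per respondent).

    Variance form: k/(k-1) * (1 - sum of item variances / variance of totals).
    """
    items = list(zip(*rows))                 # one tuple of scores per item
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical 3-item scale answered by four respondents.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
]
print(round(cronbach_alpha(scores), 3))
```

Negatively worded items should be reversed before the coefficient is computed; otherwise they lower the value even when the scale is consistent.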

The stability of a measure can be assessed through parallel-form reliability
and test-retest reliability. When a high correlation between two similar
forms of a measure is obtained, parallel-form reliability is established.
Test-retest reliability can be established by computing the correlation
between the same tests administered at two different time periods.

Validity

Factorial validity can be established by submitting the data for factor
analysis. The results of factor analysis (a multivariate technique) will
confirm whether or not the theorized dimensions emerge. Measures are
developed by first delineating the dimensions so as to operationalize the
concept. Factor analysis would reveal whether the dimensions are indeed
tapped by the items in the measure, as theorized. Criterion-related validity
can be established by testing for the power of the measure to differentiate
individuals who are known to be different. Convergent validity can be
established when there is a high degree of correlation between two different
sources responding to the same measure. Discriminant validity can be
established when two distinctly different concepts are not correlated to each
other.

Hypothesis Testing

Once the data are ready for analysis (i.e., out-of-range/missing responses,
etc., are cleaned up, and the goodness of the measures is established), the
researcher is ready to test the hypotheses already developed for the study.
Different statistical tests are available, and the test is selected according
to the hypothesis and the nature of the data.

Interpretation of Data Analyzed

After the data have been completely analyzed, the results have to be properly
interpreted. It is the interpretation of the results that is most meaningful
to the organization.
