
Research Methodology

Lecture No : 21
Data Preparation and Data Entry
Recap Lecture

In the last few lectures, we discussed:

• Research Design
• The purpose, investigation type, researcher
interference, study setting, unit of analysis, time
horizon, and measurement of variables
• Sources of Data
• Sampling
• Experimental Design
Lecture Objectives

Getting the data ready for analysis


• Data preparation
• Coding, codebook, pre-coding, coding rules
• Data entry
• Editing data
• Data transformation
Data Preparation and Description

• Data preparation includes editing, coding, and data entry.
• It is the activity that ensures the accuracy of the
data and their conversion from raw form to
reduced and classified forms that are more
appropriate for analysis.
• Preparing a descriptive statistic summary is
another preliminary step that allows data entry
errors to be identified and corrected.
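A descriptive summary can expose keying errors at a glance. The sketch below uses hypothetical coded responses (the variable names and values are illustrative assumptions). A maximum of 55 on an item scored 1 to 5 can only be a data entry error.

```python
# Hypothetical coded responses: "satisfaction" is a 1-5 item; 55 is a keying error.
responses = {
    "satisfaction": [4, 5, 3, 55, 2, 4],
    "gender": [1, 2, 2, 1, 1, 2],
}

def summarize(values):
    """Return count, minimum, maximum and mean of a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

# Print one summary line per variable; out-of-range min/max values stand out.
for name, values in responses.items():
    print(name, summarize(values))
```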
Getting the Data Ready for Analysis

• After data are obtained through a questionnaire, they need to be coded, keyed in, and edited.
• Outliers, inconsistencies and blank responses, if
any, have to be handled in some way.
Coding

• Data coding involves assigning a number to each participant’s response so that it can be entered into a database.
• In coding, categories are the partitions of a data set
of a given variable. For instance, if the variable is
gender, the categories are male and female.
• Categorization is the process of using rules to
partition a body of data.

• Both closed and open questions must be coded.


Coding Cont.

• Numeric coding simplifies the researcher’s task in converting a nominal variable such as gender to a code of 1 or 2.
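As a minimal sketch, a numeric coding scheme for a nominal variable can be stored as a lookup table. The mapping male = 1, female = 2 is an assumed scheme for illustration; the numbers are labels only and carry no quantity.

```python
# Hypothetical coding scheme for the nominal variable "gender"
GENDER_CODES = {"male": 1, "female": 2}

def code_gender(answer: str) -> int:
    """Convert a text answer into its numeric code (case and spaces ignored)."""
    return GENDER_CODES[answer.strip().lower()]

raw = ["Male", "female", "FEMALE ", "male"]
coded = [code_gender(a) for a in raw]
print(coded)  # [1, 2, 2, 1]
```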
Code Construction

There are two basic rules for code construction.


• First, the coding categories should be
exhaustive, meaning that a coding category
should exist for all possible responses.

• For example, household size might be coded 1, 2, 3, 4, and 5 or more.

• The “5 or more” category ensures that every subject has a place in a category.
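The household size example can be sketched as a small coding function (the function name is hypothetical). Capping the code at 5 is what makes the scheme exhaustive: any household of five or more people still receives a code.

```python
def code_household_size(persons: int) -> int:
    """Code household size as 1, 2, 3, 4, or 5 ("5 or more")."""
    if persons < 1:
        raise ValueError("household size must be at least 1")
    # The open-ended top category makes the scheme exhaustive:
    # every household of 5 or more people falls into code 5.
    return min(persons, 5)

print([code_household_size(n) for n in [1, 3, 5, 8, 12]])  # [1, 3, 5, 5, 5]
```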
Code Construction Cont.

• Second, the coding categories should be mutually exclusive and independent.

• This means that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.
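Mutual exclusivity can be checked mechanically when categories are numeric ranges. The age brackets below are hypothetical; in the flawed version an age of 25 fits both code 1 and code 2, so the check reports that pair.

```python
from itertools import combinations

# Hypothetical age brackets: code -> (lowest age, highest age), inclusive
GOOD = {1: (18, 24), 2: (25, 34), 3: (35, 44), 4: (45, 120)}
BAD = {1: (18, 25), 2: (25, 34), 3: (35, 44), 4: (45, 120)}  # 25 fits two codes

def overlapping_codes(brackets):
    """Return every pair of codes whose age ranges share at least one age."""
    return [
        (a, b)
        for (a, (lo_a, hi_a)), (b, (lo_b, hi_b)) in combinations(brackets.items(), 2)
        if lo_a <= hi_b and lo_b <= hi_a
    ]

print(overlapping_codes(GOOD))  # []
print(overlapping_codes(BAD))   # [(1, 2)]
```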
Code Construction Cont.

• Missing data should also be represented with a code.

• In the “good old days” of computer cards, a numeric value such as 9 or 99 was used to represent missing data.

• Today, most software recognizes either a period or a blank response as missing data.
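The sketch below shows why a numeric missing-value code must be converted before analysis. It assumes a hypothetical 1 to 5 rating item where 9 was used as the missing code. Left in place, the 9s would pull the mean to 5.8, above the top of the scale.

```python
MISSING_CODE = 9  # hypothetical legacy missing-value code for a 1-5 item

def to_missing(values, missing_code=MISSING_CODE):
    """Replace the numeric missing-value code with None."""
    return [None if v == missing_code else v for v in values]

def mean_ignoring_missing(values):
    """Average only the answers that are present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

ratings = [4, 9, 2, 5, 9]
clean = to_missing(ratings)
print(clean)                         # [4, None, 2, 5, None]
print(mean_ignoring_missing(clean))  # mean of 4, 2 and 5 only
```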
Codebook

• A codebook contains each variable in the study and specifies the application of coding rules to the variable.

• It is used by the researcher or research staff to promote more accurate and more efficient data entry.

• It is the definitive source for locating the positions of variables in the data file during analysis.
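A codebook can also be kept in machine-readable form and used to check entered records. In this sketch the variable names, column positions, codes, and the "." missing symbol are all hypothetical; the "column" field mirrors the codebook's role of locating each variable in the data file.

```python
# A hypothetical codebook: one entry per variable
CODEBOOK = {
    "Q1_gender": {"column": 1, "codes": {1: "Male", 2: "Female"}, "missing": "."},
    "Q2_household": {
        "column": 2,
        "codes": {1: "1", 2: "2", 3: "3", 4: "4", 5: "5 or more"},
        "missing": ".",
    },
}

def check_record(record):
    """Return variables whose value is neither a valid code nor the missing symbol."""
    problems = []
    for variable, entry in CODEBOOK.items():
        value = record[entry["column"] - 1]  # columns are numbered from 1
        if value != entry["missing"] and value not in entry["codes"]:
            problems.append(variable)
    return problems

print(check_record([2, 5]))    # []
print(check_record([3, "."]))  # ['Q1_gender']
```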
Sample Codebook
Pre-coding

• Pre-coding means assigning codebook codes to variables in a study and recording them on the questionnaire.
• Alternatively, the questionnaire can be designed so that, in addition to the respondent’s choice, it shows the appropriate code next to each option.
• With a pre-coded instrument, the codes for variable categories are accessible directly from the questionnaire.
Sample Pre-coded Instrument
Coding Open-Ended Questions

• One of the primary reasons for using open-ended questions is that insufficient information or the lack of a hypothesis may prevent response categories from being prepared in advance. Researchers are then forced to categorize responses after the data are collected.
Coding Open-Ended Questions Cont.

• In the figure on the next slide, question 6 illustrates the use of an open-ended question. After a preliminary evaluation, response categories were created for that item; they can be seen in the codebook.
Coding Open-Ended Questions Cont.
Coding Rules

Categories should be:
• Appropriate to the research problem
• Exhaustive
• Mutually exclusive
• Derived from one classification principle
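The coding rules can be illustrated with a keyword-based sketch for an open-ended item. Everything here is a hypothetical assumption (the question, categories, and keywords); in practice a trained coder reads each answer. All categories are types of reason (one classification principle), taking only the first matching category keeps codes mutually exclusive, and the catch-all “Other” keeps the scheme exhaustive.

```python
# Hypothetical categories for an open-ended "reason for choosing this store" item.
# Every category is a type of reason (one classification principle).
CATEGORIES = [
    (1, "Price", ("cheap", "price", "discount")),
    (2, "Location", ("near", "close", "location")),
    (3, "Service", ("staff", "service", "friendly")),
]
OTHER_CODE = 4  # catch-all category keeps the scheme exhaustive

def code_open_answer(text: str) -> int:
    """Assign exactly one code: the first category whose keyword appears."""
    lowered = text.lower()
    for code, _label, keywords in CATEGORIES:
        if any(k in lowered for k in keywords):
            return code
    return OTHER_CODE

answers = ["Prices are cheap", "It's close to my home", "Friendly staff", "My friend works there"]
print([code_open_answer(a) for a in answers])  # [1, 2, 3, 4]
```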
Data Entry

• After responses have been coded, they can be entered into a database.
• Raw data can be entered with any suitable software program, for example, the SPSS Data Editor.
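As one possible entry format, coded data can be stored as a flat file with one row per respondent and one column per variable. This sketch writes hypothetical records as comma-separated values, a layout statistical packages can typically import; a blank cell marks a missing response.

```python
import csv
import io

# Hypothetical coded records: one row per respondent, one column per variable
HEADER = ["id", "Q1_gender", "Q2_household"]
records = [
    [1, 2, 3],
    [2, 1, 5],
    [3, 2, ""],  # blank cell = missing response
]

def write_coded_data(rows, stream):
    """Write the header and coded rows as comma-separated values."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    writer.writerows(rows)

buffer = io.StringIO()  # stands in for an open file such as data.csv
write_coded_data(records, buffer)
print(buffer.getvalue())
```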
Data Entry Cont.

• Keyboarding
• Database programs
• Optical recognition
• Digital/barcodes
• Voice recognition
Editing Data

• After the data are entered, blank responses, if any, have to be handled in some way, and inconsistent data have to be checked and followed up.
• Data editing deals with detecting and correcting illogical, inconsistent, or illegal data and omissions in the information returned by the study participants.
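The three kinds of problems named above can be sketched as simple editing rules for one record. The field names and rules are hypothetical: a blank required answer (omission), a code that does not exist (illegal value), and more years of work experience than years of age (inconsistency).

```python
# Hypothetical editing rules for one coded record
def edit_record(rec):
    """Return a list of problems found in a single respondent's record."""
    problems = []
    # Omission: a required answer is blank
    for field in ("gender", "age", "experience"):
        if rec.get(field) is None:
            problems.append(f"omission: {field}")
    # Illegal value: a code that does not exist in the codebook
    if rec.get("gender") not in (1, 2, None):
        problems.append("illegal code: gender")
    # Inconsistency: more years of work experience than years of age
    age, exp = rec.get("age"), rec.get("experience")
    if age is not None and exp is not None and exp > age:
        problems.append("inconsistent: experience exceeds age")
    return problems

print(edit_record({"gender": 3, "age": 25, "experience": 30}))
# ['illegal code: gender', 'inconsistent: experience exceeds age']
print(edit_record({"gender": 1, "age": None, "experience": 5}))
# ['omission: age']
```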
Editing Data Cont.

Criteria for edited data:
• Accurate
• Consistent
• Uniformly entered
• Complete
• Arranged for simplification
Field Editing

• Field editing review

• Entry gaps → callback

• Validation → re-interviewing
Field Editing Review

• In large projects, field editing review is the responsibility of the field supervisor.

• It should be done soon after the data have been collected.

• During the stress of data collection, data collectors often use ad hoc abbreviations and special symbols.
• If the forms are not completed soon, the field interviewer may not recall what the respondent said.
• Therefore, reporting forms should be reviewed regularly.
Field Editing Cont.

• Entry gaps → callback

• When entry gaps are present, a callback should be made rather than guessing what the respondent probably said.
Field Editing Cont.

• Validation → re-interviewing

• The field supervisor also validates field results by re-interviewing a percentage of the respondents on some questions to verify that they actually participated.

• Ten percent is the typical proportion used in data validation.
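Drawing the validation sample can be sketched as a simple random selection of about 10 percent of respondent IDs. The function name, the 200 IDs, and the optional seed (used only to make a draw reproducible) are illustrative assumptions.

```python
import random

def validation_sample(respondent_ids, share=0.10, seed=None):
    """Randomly pick about `share` of respondents for re-interviewing."""
    k = max(1, round(len(respondent_ids) * share))  # at least one respondent
    rng = random.Random(seed)
    return sorted(rng.sample(respondent_ids, k))

ids = list(range(1, 201))  # 200 completed interviews
picked = validation_sample(ids, seed=42)
print(len(picked), picked[:5])
```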
Central Editing

• Scale of study → number of editors

• At this point, the data should be thoroughly edited.

• For a small study, a single editor will produce maximum consistency.

• For large studies, editing tasks should be allocated by sections.
Central Editing Cont.

• Wrong entry → replacements

• Sometimes it is obvious that an entry is incorrect, and the editor may be able to determine the proper answer by reviewing other information in the data set.
• This should only be done when the correct answer is obvious.
• If an answer given is inappropriate, the editor can replace it with “no answer” or “unknown”.
Central Editing Cont.

• Fakery → open-ended questions

• The editor can also detect instances of armchair interviewing (fake interviews) during this phase.

• This is easiest to spot with open-ended questions.
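One hypothetical heuristic for why open-ended answers expose fakery: genuine respondents rarely give word-for-word identical free-text answers, so the same answer repeated across many of one interviewer’s forms deserves a closer look. The sketch below is an illustrative assumption, not a rule from this lecture; flagged cases still need human review.

```python
from collections import defaultdict

# Hypothetical records: (interviewer, respondent id, open-ended answer)
records = [
    ("A", 101, "Good prices and friendly staff"),
    ("A", 102, "Good prices and friendly staff"),
    ("A", 103, "good prices and friendly staff."),
    ("B", 201, "Parking is easy, but queues are long"),
    ("B", 202, "I like the bakery section"),
]

def normalize(text):
    """Lower-case and drop punctuation so trivial differences are ignored."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def suspicious_duplicates(rows, min_repeats=3):
    """Return (interviewer, answer) pairs repeated across many respondents."""
    groups = defaultdict(list)
    for interviewer, respondent, answer in rows:
        groups[(interviewer, normalize(answer))].append(respondent)
    return {key: ids for key, ids in groups.items() if len(ids) >= min_repeats}

print(suspicious_duplicates(records))
```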
Central Editing Cont.
Guidelines for Editors
• Be familiar with instructions given to interviewers and coders
• Do not destroy the original entry
• Make all editing entries identifiable and in standardized form
• Initial all answers changed or supplied
• Place initials and date of editing on each instrument completed
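When editing is done electronically, the guidelines above can be followed with an edit log. In this sketch (all names are hypothetical) an inappropriate answer is replaced with a standardized “unknown” entry, while the original value, the editor’s initials, the date, and the reason are kept, so nothing is destroyed.

```python
from datetime import date

UNKNOWN = "unknown"  # standardized replacement for an inappropriate answer

def apply_edit(record, log, field, new_value, editor_initials, reason):
    """Change one answer while keeping the original entry in an edit log."""
    log.append({
        "respondent": record["id"],
        "field": field,
        "original": record[field],   # original entry is preserved, not destroyed
        "edited": new_value,
        "editor": editor_initials,   # every changed answer is initialed
        "date": date.today().isoformat(),
        "reason": reason,
    })
    record[field] = new_value

record = {"id": 17, "gender": 7}
log = []
apply_edit(record, log, "gender", UNKNOWN, "AB", "illegal code")
print(record)                                      # {'id': 17, 'gender': 'unknown'}
print(log[0]["original"], "->", log[0]["edited"])  # 7 -> unknown
```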


Handling “Don’t Know” Responses

• When the number of “don’t know” (DK) responses is low, it is not a problem. However, if there are many, it may mean that the question was poorly designed, too sensitive, or too challenging for the respondent.

• The best way to deal with undesired DK answers is to design better questions at the beginning.

• If a DK response is legitimate, it should be kept as a separate reply category.
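Keeping DK as its own code makes it easy to see which questions attract many DK answers. In this sketch, the DK code 8, the answers, and the 20 percent review threshold are all hypothetical choices.

```python
DK = 8  # hypothetical code for a legitimate "don't know" reply, kept as its own category

answers = {
    "Q5": [1, 2, DK, 3, 2, 1, 4, 2, 3, 1],
    "Q6": [DK, DK, 2, DK, DK, 1, DK, 3, DK, 2],
}

def dk_share(values, dk_code=DK):
    """Proportion of responses that are "don't know"."""
    return sum(v == dk_code for v in values) / len(values)

THRESHOLD = 0.20  # hypothetical cut-off for reviewing a question's design
for question, values in answers.items():
    share = dk_share(values)
    flag = "review question" if share > THRESHOLD else "ok"
    print(question, f"{share:.0%}", flag)
```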
Data Transformation

• Data transformation, a variation of data coding, is the process of changing the original numerical representation of a quantitative value to another value.

• For example, consumption data may be recorded per year but needed per month.

• Data are typically changed to avoid problems in the next stage of the data analysis process.
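The per-year to per-month example amounts to dividing each value by 12. The household IDs and figures below are hypothetical, and the result is an average month: it assumes nothing about how consumption actually varies across the year.

```python
# Hypothetical annual consumption figures (e.g., litres per household per year)
annual = {"hh01": 1200.0, "hh02": 960.0, "hh03": 1500.0}

# Transformation: express each value as an average per month instead of per year
monthly = {hh: value / 12 for hh, value in annual.items()}
print(monthly)  # {'hh01': 100.0, 'hh02': 80.0, 'hh03': 125.0}
```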
Data Transformation Cont.

• For example, economists often use a logarithmic transformation so that the data are more evenly distributed.
• Data transformation is also necessary when several questions have been used to measure a single concept.
• For example, intention to leave may be measured through 10 questions, whose answers need to be transformed into a single value for each respondent.
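Both transformations above can be sketched briefly. The incomes and item scores are hypothetical; the base-10 logarithm is one common choice, and averaging the 10 items is one way (summing is another) to produce a single intention-to-leave score per respondent.

```python
import math
import statistics

# Hypothetical right-skewed incomes; log10 pulls in the long upper tail
incomes = [20_000, 25_000, 30_000, 40_000, 1_000_000]
log_incomes = [math.log10(x) for x in incomes]

# Hypothetical answers of one respondent to 10 "intention to leave" items (1-5 scale)
items = [4, 5, 3, 4, 4, 5, 2, 4, 3, 4]
intention_to_leave = statistics.mean(items)  # one composite score for this respondent

print([round(v, 2) for v in log_incomes])
print(intention_to_leave)
```

On the raw scale the largest income is 50 times the smallest; on the log scale the values span only about 4.3 to 6.0.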
Recap

• Questionnaire checking involves eliminating unacceptable questionnaires.
• These questionnaires may be incomplete, have instructions that were not followed, have missing pages, arrive past the cutoff date, or come from a respondent who is not qualified.
• Editing looks to correct illegible, incomplete,
inconsistent and ambiguous answers.
• Coding typically assigns alpha or numeric codes
to answers that do not already have them so that
statistical techniques can be applied.
Recap Cont.

• Cleaning reviews the data for consistency. Inconsistencies may arise from faulty logic, out-of-range values, or extreme values.

• Statistical adjustment applies to data that require weighting and scale transformations.
