
EPIDATA

Dataset

• A set of data is stored in the computer as a file
• A file consists of a collection of cases
• The characteristics we are interested in and collect in a study are stored as fields (or variables)
• For each field/variable, each subject has their own response value
What does the data set look like?

• A data file/table has a rectangular shape where:
• Each row represents a case
• Each column represents a field/variable
• A common way data are viewed and analysed

Idno   gender   cancer   dob          therapy
1001   1        2        22/06/2009   2
1002   2        1        02/08/2013   1
1003   1        1        11/12/2010   1
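
As a minimal sketch of the rectangular layout, the example table can be read so that each row becomes one case and each column one field. The comma-separated form and the helper name `load_cases` are assumptions made for illustration; they are not part of Epidata.

```python
import csv
import io

# The example table from the slide, rewritten as comma-separated text
# (the CSV form is an assumption made for this sketch).
RAW = """idno,gender,cancer,dob,therapy
1001,1,2,22/06/2009,2
1002,2,1,02/08/2013,1
1003,1,1,11/12/2010,1
"""


def load_cases(text):
    """Return one dict per case (row), keyed by field (column) name."""
    return list(csv.DictReader(io.StringIO(text)))


cases = load_cases(RAW)
fields = list(cases[0].keys())

for case in cases:
    print(case["idno"], case["dob"])
```

Every case carries the same set of fields, which is what makes the file rectangular.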
Types of data

• Continuous, e.g. height, weight
• Always numeric
• Indicate number of decimal places and unit
• Categorical, e.g. gender, type of cancer, type of treatment
• May be numeric or string
• String: male or female; numeric: 1. male; 2. female
• Date
• String/text, e.g. comments, “please describe”, “phone number”
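
The data types above could be sketched as follows, assuming a raw record arrives as text. The gender coding (1 male, 2 female) comes from the slide; the field `height_cm`, the `comments` field and the helper `parse_record` are hypothetical names used only for this example.

```python
from datetime import date, datetime

# Value labels for a numeric categorical field (coding taken from the slide).
GENDER_LABELS = {1: "male", 2: "female"}


def parse_record(raw):
    """Convert the text values of one record into typed values."""
    return {
        "idno": raw["idno"],                       # identifier, kept as text
        "gender": int(raw["gender"]),              # categorical, numeric code
        "dob": datetime.strptime(raw["dob"], "%d/%m/%Y").date(),  # date
        "height_cm": float(raw["height_cm"]),      # continuous, unit in the name
        "comments": raw["comments"],               # free text
    }


record = parse_record({
    "idno": "1001",
    "gender": "1",
    "dob": "22/06/2009",
    "height_cm": "132.5",
    "comments": "please describe",
})
print(GENDER_LABELS[record["gender"]], record["dob"].isoformat())
```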
General principles of good data management

• Have one value per cell
• For continuous variables, all values must be on the same scale
• Do not include random text or blank lines in your data file
• Be consistent with date formats and use international standard times
• Capture data in its rawest form, e.g. height and weight rather than BMI
• Enter raw rather than summary data, e.g. for QoL, enter each scale
• Enter data promptly; all data should be entered
• Don’t confuse data entry and calculation
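
To illustrate keeping data entry separate from calculation, the sketch below stores only raw height and weight and derives BMI at analysis time. The field names `weight_kg` and `height_cm` and the function `bmi` are assumptions for this example.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by the square of height in metres."""
    height_m = height_cm / 100
    return weight_kg / (height_m ** 2)


# Only the raw measurements are entered; BMI is never typed in by hand.
entered = {"idno": "1001", "weight_kg": 70.0, "height_cm": 175.0}
derived_bmi = bmi(entered["weight_kg"], entered["height_cm"])
print(round(derived_bmi, 1))
```

If a height is later corrected, the derived value can simply be recomputed, which would not be possible had only BMI been entered.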
Unique identifier

• Each record in a data file/table must have a unique identifier
• Unique identifiers are used to refer to specific records in your database without using identifying information such as names
• The unique identifier can be a single field or a combination of fields
• Hospital number shouldn’t be used as the unique identifier
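
A uniqueness check could look like the sketch below, which handles both a single-field identifier and a combination of fields. The `visit` field and the function `duplicate_ids` are hypothetical, introduced only to show a combined key.

```python
from collections import Counter


def duplicate_ids(records, key_fields=("idno",)):
    """Return the identifier values that occur in more than one record."""
    keys = [tuple(record[field] for field in key_fields) for record in records]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


records = [
    {"idno": "1001", "visit": 1},
    {"idno": "1001", "visit": 2},
    {"idno": "1002", "visit": 1},
]

# idno alone is not unique here, but idno combined with visit is.
print(duplicate_ids(records))
print(duplicate_ids(records, ("idno", "visit")))
```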
Variable names

• Each variable must have a unique name
• Choose an informative name
• Be consistent when naming variables
• Don’t use spaces or punctuation marks
• Begin with a letter; otherwise your variable names may not be carried across packages
• Variable labels and value labels provide descriptive information that makes it easier to understand your data and results
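
The naming rules above can be checked mechanically. This sketch assumes that letters, digits and underscores are acceptable after the first letter; individual packages may be stricter, and `name_problems` is a hypothetical helper.

```python
import re

# Starts with a letter, then only letters, digits or underscores
# (allowing the underscore is an assumption of this sketch).
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def name_problems(names):
    """Return a list of (name, problem) pairs for names that break the rules."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.match(name):
            problems.append((name, "invalid characters or first character"))
        if name.lower() in seen:
            problems.append((name, "duplicate name"))
        seen.add(name.lower())
    return problems


print(name_problems(["idno", "dob", "date of birth", "1st_visit", "DOB"]))
```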
Missing data & Queries

• Missing data
• Include a code for missing data and use the same missing code for all numeric fields
• E.g. 9 / 99 / -1; for dates, 01/01/1111
• Queries
• Include a code for queries
• E.g. -2; for dates, 02/02/2222
• All queries should be dealt with before beginning analysis
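
As a sketch of how such codes might be handled before analysis, the example below uses -1 and 01/01/1111 for missing values and -2 and 02/02/2222 for queries, as in the examples above. Choosing -1 from the listed options, and the helper names `status` and `open_queries`, are assumptions.

```python
MISSING_NUM, MISSING_DATE = -1, "01/01/1111"
QUERY_NUM, QUERY_DATE = -2, "02/02/2222"


def status(value):
    """Classify a stored value as 'missing', 'query' or 'ok'."""
    if value in (MISSING_NUM, MISSING_DATE):
        return "missing"
    if value in (QUERY_NUM, QUERY_DATE):
        return "query"
    return "ok"


def open_queries(records):
    """List (idno, field) pairs that still hold a query code."""
    return [
        (record["idno"], field)
        for record in records
        for field, value in record.items()
        if status(value) == "query"
    ]


records = [
    {"idno": "1001", "gender": 1, "dob": "22/06/2009"},
    {"idno": "1002", "gender": -2, "dob": "01/01/1111"},
    {"idno": "1003", "gender": -1, "dob": "02/02/2222"},
]
print(open_queries(records))
```

An empty result from `open_queries` would indicate that no query codes remain and analysis can begin.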
What is a database?

• Definition: an organised collection of data
• Purpose for research: to obtain useful results from analysis of data
• To analyse effectively, data must be:
• Relevant
• Reliable: secure, with an audit trail
• Organised: documented, codebook
• Consistent: relationships, data types enforced
• Clean: data validated on entry
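
Validation on entry can be pictured as a set of per-field rules applied before a record is accepted. This is a generic sketch, not Epidata's own mechanism; the `height_cm` field and its 30 to 250 range are hypothetical, while the gender codes follow the earlier slide.

```python
# One rule per field; each rule returns True when the value is acceptable.
RULES = {
    "gender": lambda v: v in (1, 2),           # legal category codes
    "height_cm": lambda v: 30 <= v <= 250,     # plausible range (assumed)
}


def invalid_fields(record):
    """Return the names of fields whose values fail their rule."""
    return [
        field
        for field, is_ok in RULES.items()
        if field in record and not is_ok(record[field])
    ]


print(invalid_fields({"idno": "1001", "gender": 3, "height_cm": 170}))
```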
Database: Epidata

• Can’t I use Microsoft Excel/Microsoft Access/a stats package (SPSS/STATA/SAS)?
• Use only if you don’t care about
• Integrity
• Consistency
• Security
• Usability
 Can’t export directly as stats package binary file
Epidata Manager

• Define a new data structure
• Add labels and definitions
• Modify an existing data structure (without loss of data)
• Document data
• Export data for analysis
Epidata Entry Client

• Is used to enter or edit data already defined and contained in a project created with the Epidata Manager
• Entering data
• Editing data
• Saving data
• Navigating records
