
Website: http://research.hkuspace.hku.hk

Workshop 3: How to Conduct Data Analysis


Topic: Statistical Package for the Social Sciences (SPSS) for Windows
Date: 22 January 2011
Time: 9:30 to 11:30 a.m.
Venue: Room 310, HKU SPACE Admiralty Learning Centre
Facilitators: Ms. Dorothy Cheung and Mr. Wong Yat Nam, CELL Research Centre, HKU School of Professional and Continuing Education, The University of Hong Kong

The workshop is intended for postgraduate students and its emphasis is on data analysis using the computer software package Statistical Package for the Social Sciences (SPSS) for Windows. It will cover basic statistical concepts and techniques applicable to exploratory and confirmatory data analysis.

Recommended reading:
Norusis, M. J. (2010). PASW Statistics 18 Guide to Data Analysis. New Jersey: Prentice Hall.
Norusis, M. J. (2010). PASW Statistics 18 Statistical Procedures Companion. New Jersey: Prentice Hall.

3rd Workshop for EdD Students (22 Jan 2010)

Chapter 1 Data Editor


Starting SPSS for Windows:
Start → Program → SPSS for Windows → PASW Statistics 18 → Type in data → Get the SPSS Data Editor Window

In Data Editor:

Cases are presented in rows; Variables are presented in columns; The intersection of the row and the column is called a cell.

Define a variable: Click Variable View at the bottom left of the Data Editor

Name: The name of the variable, e.g. Gender, Subject001 (maximum 64 characters, no spaces, and the first character must be a letter)
Type: Numeric or string
Width: The maximum number of characters for data entry for a variable, e.g. 8
Decimal: Number of decimal places, e.g. 0, 1, 2
Label: A string of text describing the variable, shown as the title of the output table, e.g. Gender, Age (maximum 255 characters)
Values: Set the labels for the values entered for a variable (maximum 120 characters). For example, to set 1 = Male and 2 = Female:
- Enter 1 in the Value box and Male in the Value Label box, then click Add or press Enter
- Enter 2 in the Value box and Female in the Value Label box, then click Add or press Enter
- Click OK


Missing: Identifies the codes that stand for a missing value, so that a missing answer is not left as an unexplained blank cell. For example, you can set 88 or 888 for Not Applicable, and 99 or 999 for Unknown data.

Columns: Column width of a variable, e.g. 8
Align: The alignment of data shown in Data View can be Left, Center or Right
Measure: The level of measurement: Scale, Ordinal or Nominal

Nominal: This gives categorization without order, for example:
- Gender (1 = male; 2 = female)
- Age range (1 = 17 or below; 2 = 18-29; 3 = 30-39; 4 = 40-49; 5 = 50 and above)
- Nationality (1 = Chinese; 2 = British; 3 = American; 4 = Others)

Ordinal: This gives categorization with implied order, for example:

5-point Likert scales
- Very good / good / no opinion / poor / very poor
- Very satisfactory / satisfactory / undecided / unsatisfactory / very unsatisfactory
- Most important / important / neutral / not important / least important
- Strongly agree / agree / undecided / disagree / strongly disagree
- Highly favorable / favorable / no opinion / unfavorable / highly unfavorable
- Highly appropriate / appropriate / neutral / inappropriate / highly inappropriate
- Very supportive / supportive / neutral / unsupportive / very unsupportive
- Definitely yes / probably yes / uncertain / probably no / definitely no

3- or 4-point scales
- Agree / undecided / disagree
- Very satisfied / moderately satisfied / a little dissatisfied / very dissatisfied


Scale: This contains a true zero point that indicates a total absence of whatever is being measured, e.g. height, working hours per week.
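The role of missing-value codes such as 88 and 99 can be illustrated outside SPSS. The following is a minimal Python sketch, not part of the workshop materials; the response list and the names `MISSING_CODES` and `courses` are hypothetical. It shows why the codes have to be declared as missing before any statistics are computed.

```python
from statistics import mean

# Hypothetical user-missing codes, following the handout's convention
# (88 = not applicable, 99 = no answer / unknown).
MISSING_CODES = {88, 99}

# Hypothetical answers to "number of online courses"; 88 and 99 are codes,
# not real counts.
courses = [1, 5, 2, 0, 99, 3, 88, 1]

# Keep only genuine responses, as SPSS does once the codes are declared
# in the Missing column of Variable View.
valid = [value for value in courses if value not in MISSING_CODES]

naive_mean = mean(courses)  # wrongly treats 88 and 99 as real answers
valid_mean = mean(valid)    # based on the six genuine responses only

print("Mean with codes included:", naive_mean)
print("Mean with codes excluded:", valid_mean)
```

If the codes were left undeclared, the two "answers" of 88 and 99 courses would dominate the average.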

Enter Data: Click the Data View tab at the bottom left of the Data Editor
- To move to the next cell in a row, press Tab or the → key
- To move to the previous cell in a row, press the ← key
- To move to the next cell in a column, press Enter or the ↓ key
- To move to the previous cell in a column, press the ↑ key

Insert variables or cases:
- Highlight the variable column where you want to insert a new variable; at the tool bar, select Edit → Insert Variable.
- Highlight the row where you want to insert a new row (case); at the tool bar, select Edit → Insert Cases.

Delete variables or cases:
- Highlight the variable; at the tool bar, select Edit → Clear
- Highlight the row; at the tool bar, select Edit → Clear

Go to a particular case: At the tool bar, select Edit → Go to Case

Select cases: At the tool bar, select Data → Select Cases, then choose one of:
- All cases
- If condition is satisfied: click If, select a variable and enter the condition, e.g. Gender = 1, then press OK
- Based on time or case range: click Range and enter the first and last observation, e.g. 26 to 79
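The two selection modes above can be sketched in Python. This is an illustrative sketch only; the records and the field names `code` and `gender` are hypothetical. One assumption is stated in the comments: the case range is taken to refer to row positions in the data file rather than to the values of the Code variable.

```python
# Hypothetical cases; gender is coded 1 = Male, 2 = Female.
cases = [
    {"code": 1, "gender": 1},
    {"code": 2, "gender": 2},
    {"code": 13, "gender": 1},
    {"code": 24, "gender": 2},
    {"code": 25, "gender": 2},
    {"code": 26, "gender": 1},
]

# "If condition is satisfied": keep cases where Gender = 1.
males = [case for case in cases if case["gender"] == 1]

# "Based on time or case range": keep rows 2 to 4.
# Assumption: the range counts row positions (1-based), not Code values.
first, last = 2, 4
in_range = cases[first - 1:last]

print([case["code"] for case in males])
print([case["code"] for case in in_range])
```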


Sample Questionnaire for Exercise 1


Gender: Male / Female

Age Group: 17-19 / 20-24 / 25-30 / 31-40 / 41-50 / 50+

Your height: ________ m

Number of online courses you are currently teaching: ________

Job Title: Professor / Associate Professor / Assistant Professor / Lecturer/Tutor / Teaching Assistant / Research Associate / Research Assistant / Dean / Director / President / Other (please specify): ___________________

How often do you use the learning center for your teaching? Always / Often / Sometimes / Rarely / Never

Self-esteem Scale

                                                           Strongly                       Strongly
                                                           Disagree   Disagree   Agree    Agree
1.   On the whole, I am satisfied with myself.                0          1         2        3
2.*  At times, I think I am no good at all.                   0          1         2        3
3.   I feel that I have a number of good qualities.           0          1         2        3
4.   I am able to do things as well as most other people.     0          1         2        3
5.*  I feel I do not have much to be proud of.                0          1         2        3


Exercise 1 (open SPSS file data_editing.sav)


1. Define the variables as shown in this first table, and enter the data from the second table (below this one):

Variable name    Description                                  Coding
Code             Case number
Gender           Gender of respondents                        1=Male; 2=Female; 99=no answer
Age              Age group of respondents                     1=17-19; 2=20-24; 3=25-30; 4=31-40; 5=41-50; 6=50+; 99=no answer
Height           Height of respondents                        e.g. 1.69; 1.78; 99=no answer
Course           Number of online courses you are teaching    88=not applicable; 99=no answer
Jobtitle         Job title of respondents                     1=Professor; 2=Associate Professor; 3=Assistant Professor; 4=Lecturer/Tutor; 5=Teaching Assistant; 6=Research Associate; 7=Research Assistant; 8=Dean; 9=Director; 10=President; 11=Other; 88=not applicable; 99=no answer
Freq             The educator's use of the learning center    5=Always; 4=Often; 3=Sometimes; 2=Rarely; 1=Never; 88=not applicable; 99=no answer
Q1, Q2, Q3,      Self-esteem Scale                            0=Strongly Disagree; 1=Disagree; 2=Agree; 3=Strongly Agree
Q4, Q5

Code  Gender  Age  Height  Course  Jobtitle  Freq  Q1  Q2  Q3  Q4  Q5
1     1       2    1.69    1       5         4     3   1   3   2   0
2     2       2    1.57    5       3         2     2   3   2   3   2
13    1       4    1.80    2       8         5     2   0   2   1   1
24    2       3    1.50    0       6         4     3   3   3   3   0
25    2       5    1.65    1       9         0     3   2   3   1   3
26    1       99   1.80    0       5         3     3   0   3   3   0
27    99      1    1.75    0       5         1     2   1   3   3   3
38    2       2    1.55    3       4         5     0   3   1   1   1
39    1       3    1.90    4       6         0     1   1   2   3   2
40    2       5    1.77    1       1         99    0   0   2   0   1


2. Change the name of the variable from Jobtitle to Job

3. Change gender in code 27 from 99 to 1

4. Insert this case before case (Code) 13:

Code  Gender  Age  Height  Course  Job  Freq  Q1  Q2  Q3  Q4  Q5
11    1       4    2.0     2       2    5     1   2   2   3   0

5. Insert a variable Programme (below) between Age and Height

Variable name: Programme
Description: Title of your teaching programme

Code  Title
1     Managing in Organizations
2     B343C
11    B343
13    Marketing Research
24    B370
25    B370C
26    Electronic Financial Services
27    Language and Literacy in Social Context
38    ES850C
39    Introduction to the Internet
40    Emerging Technologies

6. Select cases
   (1) Gender = 1
   (2) Range: cases 4 to 9

7. Recode variables (the questions marked with an asterisk are reverse-scored): Strongly Disagree=3, Disagree=2, Agree=1, Strongly Agree=0
   - Click Transform → Recode into Different Variables
   - Select Q2 and Q5
   - Click Old and New Values
   - Enter 0=3, 1=2, 2=1, 3=0


   - Click All other values → System-missing
   - Click Continue
   - Change the name Q2 to Q2_r and Q5 to Q5_r
   - Click OK

8. Compute scale (adding the 5 items to calculate the self-esteem score)
   - Click Transform → Compute Variable
   - In Target Variable, enter the name of the score, i.e. se_scale
   - Enter the formula Q1 + Q3 + Q4 + Q2_r + Q5_r
   - Click OK
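The recode and compute steps can be checked by hand on the first three cases of the Exercise 1 data. The following is a minimal Python sketch rather than SPSS syntax; the names `responses`, `reverse` and `se_scale` are chosen here for illustration. On a 0-3 scale, the mapping 0=3, 1=2, 2=1, 3=0 is the same as subtracting the answer from 3.

```python
MAX_SCORE = 3  # highest value on the 0-3 response scale

def reverse(score):
    """Reverse-score an item: 0 -> 3, 1 -> 2, 2 -> 1, 3 -> 0."""
    return MAX_SCORE - score

# (Q1, Q2, Q3, Q4, Q5) for cases 1, 2 and 13 of the Exercise 1 data
responses = {
    1: (3, 1, 3, 2, 0),
    2: (2, 3, 2, 3, 2),
    13: (2, 0, 2, 1, 1),
}

se_scale = {}
for code, (q1, q2, q3, q4, q5) in responses.items():
    q2_r = reverse(q2)  # Q2 is marked with an asterisk
    q5_r = reverse(q5)  # Q5 is marked with an asterisk
    se_scale[code] = q1 + q3 + q4 + q2_r + q5_r

for code, score in se_scale.items():
    print(f"Case {code}: se_scale = {score}")
```

A higher se_scale then consistently means higher self-esteem, because the two negatively worded items have been flipped before summing.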

Chapter 2 Run Frequencies


1. Open Data File
   Click File → Open → Data → gss.sav (Data from General Social Survey)

2. Run Frequencies
   At the tool bar, select Analyze → Descriptive Statistics → Frequencies
   Double-click each selected variable to move it into the Variable(s) box

   Select the following Variable(s) for generating frequency tables:
   - Respondent's Sex [sex]
   - Job Satisfaction [satjob]
   - Spouse's Highest Degree [spdeg]
   - Marital Status [marital]


   Press OK to view the frequency tables in the Output Window

3. Plot Charts
   - At the tool bar, select Graphs → Legacy Dialogs → Bar Charts
   - Choose Simple
   - Tick Summaries for groups of cases
   - Click Define
   - Highlight Respondent's Sex [sex] and click the arrow button to move it into Category Axis
   - Click OK to view the bar chart in the Output Window
   - Do it again for a Pie Chart with the variable RS Highest Degree [degree] and a Histogram with the variable Age of Respondent [age]
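To make clear what a frequency table contains, here is a minimal Python sketch that counts each category and converts the counts to percentages. The answers in `sex` are hypothetical and are not taken from gss.sav.

```python
from collections import Counter

# Hypothetical answers for a categorical variable
sex = ["Male", "Female", "Female", "Male", "Female"]

counts = Counter(sex)
total = sum(counts.values())

# One row per category: frequency and percentage of all cases
frequency_table = {
    category: (count, 100 * count / total)
    for category, count in counts.items()
}

for category, (count, percent) in sorted(frequency_table.items()):
    print(f"{category:<8} {count:>3} {percent:6.1f}%")
```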


Chapter 3 Run Mean and Standard Deviation (SD)


Some Basic Concepts

Median: The value above and below which half of the cases fall, i.e. the 50th percentile. If there is an even number of cases, the median is the average of the two middle cases when they are sorted in ascending or descending order.
Examples:
(i) 3, 5, 8, 10, 11, so the median is 8
(ii) 4, 5, 7, 9, 11, 12, so the median is (7+9)/2 = 8

Mode: The most frequently occurring value. If more than one value has the same greatest frequency of occurrence, each of them is a mode.
Example: 2, 3, 4, 4, 4, 4, 5, 7, 7, 9, so the mode is 4

Mean: A measure of central tendency. It is the arithmetic average: the sum of the values of the cases divided by the number of cases.
Example: 16, 10, 5, 6, 8, 15, 20, 14, 16, 10
(16+10+5+6+8+15+20+14+16+10)/10 = 12, so the mean is 12

Standard Deviation (SD): A measure of dispersion around the mean. In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within two standard deviations of the mean.
Example: Language aptitude scores of classes

Class   Mean   SD
1       80.5   13.2
2       66.4    4.8
3       70.1    5.3
4       56.2   18.7
5       52.5   23.3
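The median, mode and mean examples above can be reproduced with Python's standard `statistics` module. This is a minimal sketch for checking the arithmetic; the variable names are chosen here. The standard deviation is computed with the sample formula (dividing by n - 1), on the assumption that this matches what a statistics package reports for sample data.

```python
from statistics import mean, median, multimode, stdev

odd_cases = [3, 5, 8, 10, 11]
even_cases = [4, 5, 7, 9, 11, 12]
mode_cases = [2, 3, 4, 4, 4, 4, 5, 7, 7, 9]
mean_cases = [16, 10, 5, 6, 8, 15, 20, 14, 16, 10]

median_odd = median(odd_cases)    # middle value of an odd-sized list
median_even = median(even_cases)  # average of the two middle values
modes = multimode(mode_cases)     # list of all most frequent values
average = mean(mean_cases)
sample_sd = stdev(mean_cases)     # sample SD, divides by n - 1

print("Medians:", median_odd, median_even)
print("Mode(s):", modes)
print("Mean:", average)
print("Sample SD:", round(sample_sd, 2))
```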


Another real-life example: The average height for adult men in Hong Kong is about 178 cm, with a standard deviation of around 6 cm. This means that most of the men (one SD, about 68%, assuming a normal distribution) have a height within 6 cm of the mean (172-184 cm) and almost all the men (two SD, about 95%) have a height within 12 cm of the mean (166-190 cm).

To Run Mean and Standard Deviation
1. At the tool bar, select Analyze → Descriptive Statistics → Descriptives
2. Double-click the following Variable(s) to move them into the Variable(s) box:
   - Age of Respondent [age]
   - Highest Year of School Completed [educ]
   - Hours Worked Last Week [hrs1]
3. Click the Options icon to select Mean, Std. deviation, Minimum, Maximum
4. Click Continue, then press OK to view the results in the Output Window
5. Do it again for Hours Per Day Watching TV [tvhours] with Mean, Std. deviation, Minimum, and Maximum.

Exercise 3: Please find:
1. The average income of the respondents' families [incomdol] as well as the respondents' income [rincomdol], together with the Std. Deviation, Minimum, and Maximum values.
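The 68% and 95% figures in the height example can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with the mean and SD quoted in the example; it assumes, as the example does, that heights are normally distributed.

```python
from statistics import NormalDist

# Mean and SD quoted in the height example (in cm)
heights = NormalDist(mu=178, sigma=6)

# Share of men within one SD (172-184 cm) and two SDs (166-190 cm)
within_one_sd = heights.cdf(184) - heights.cdf(172)
within_two_sd = heights.cdf(190) - heights.cdf(166)

print(f"Within one SD:  {within_one_sd:.1%}")
print(f"Within two SDs: {within_two_sd:.1%}")
```

The same proportions hold for any normal distribution, whatever its mean and SD, which is why the rule can be applied to the class scores as well.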


Chapter 4 Run Crosstabs


Chi-square test: It helps answer questions like

Are the two variables independent? OR Is there any relationship between the two variables?

For example, you may want to know whether there is a significant difference between male and female students in their preference for learning methods. Each student may choose only one preferred learning method.

               Male   Female   Row Total
Method 1        47      53       100
Method 2        31      22        53
Method 3        49      24        73
Method 4        15      19        34
Column Total   142     118       260

Assumption of Chi-square test: Most of the expected counts must be greater than 5, and none less than 1.

Expected count = (Row Total × Column Total) / Grand Total

For example, the expected count of male students choosing Method 1 is (100 × 142) / 260 ≈ 54.6

Example: A study was conducted to test a possible relationship between first language background and desire for a student-centered classroom in an adult ESL class.

               Chinese   Spanish   French   Row Total
For              11        30        25        66
Against          45        12         7        64
Undecided        16         8        10        34
Column Total     72        50        42       164
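The expected-count formula can be applied to every cell of the language-background table. The following Python sketch is an illustration, not SPSS output: it computes the row and column totals from the observed counts, derives the expected counts, and then sums (observed - expected)² / expected over all cells to obtain the Pearson chi-square statistic. The variable names are chosen here.

```python
row_labels = ["For", "Against", "Undecided"]
column_labels = ["Chinese", "Spanish", "French"]

# Observed counts from the first-language example (rows x columns)
observed = [
    [11, 30, 25],
    [45, 12, 7],
    [16, 8, 10],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total x column total / grand total
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

degrees_of_freedom = (len(row_labels) - 1) * (len(column_labels) - 1)

print("Expected counts:")
for label, expected_row in zip(row_labels, expected):
    print(f"  {label:<10}", [round(e, 1) for e in expected_row])
print("Chi-square:", round(chi_square, 2), "with df =", degrees_of_freedom)
```

The p-value is then read from the chi-square distribution with the printed degrees of freedom, which is what SPSS reports in the Asymp. Sig. column.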


To run Pearson Chi-square Crosstabs:
1. Open data file: click File → Open → Data → gss.sav
2. At the tool bar, select Analyze → Descriptive Statistics → Crosstabs
3. Select one or more row and column variables, for example: Row: Respondent's Sex [sex]; Column: Spouse's Highest Degree [spdeg]
4. Click Statistics → tick Chi-square → Continue
5. Click Cells (for percentages) → tick Row, Column → Continue → OK

Example: Is life exciting or dull? Let's consider whether education is related to a person's perception of life.

            Less than     High      Junior college   Bachelor   Graduate
            high school   school    or more
Exciting
Routine
Dull

1. Open data file: click File → Open → Data → gss.sav
2. Select Is life exciting or dull? [life] as Row (1=exciting; 2=routine; 3=dull)
3. Select Degree [degree] as Column (0=less than high school; 1=high school; 2=junior college or more; 3=bachelor; 4=graduate)
4. Click Statistics → tick Chi-square → Continue → OK


The column Asymp. Sig. (2-sided) gives the p-value.
- If p-value ≤ 0.01, there is a highly significant relationship (99% confidence);
- If p-value ≤ 0.05 but > 0.01, there is a significant relationship (95% confidence);
- If p-value > 0.05, there is no significant relationship.

If either more than 20% of the cells have an expected count less than 5, or the minimum expected count is less than 1, the Chi-square test should not be used.

The result indicated that there is a significant relationship between education and a person's perception of life (p = .00).
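The decision rules above can be written as two small functions. This is a hedged Python sketch of the rules as stated in this handout; the function names `chi_square_usable` and `interpret_p_value` are invented here and do not correspond to anything in SPSS.

```python
def chi_square_usable(expected_counts):
    """Check the expected-count rule: at most 20% of cells below 5, none below 1."""
    cells = [count for row in expected_counts for count in row]
    share_below_5 = sum(count < 5 for count in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1

def interpret_p_value(p_value):
    """Translate a p-value into the three verdicts used in the handout."""
    if p_value <= 0.01:
        return "highly significant"
    if p_value <= 0.05:
        return "significant"
    return "not significant"

# Hypothetical expected counts for a 2 x 2 table, and a hypothetical p-value
example_expected = [[10.0, 12.0], [8.0, 9.0]]
print(chi_square_usable(example_expected))
print(interpret_p_value(0.03))
```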

Exercise 4:
1. Marital status [marital] with If rich, continue or stop working [richwork];
2. Respondent's Sex [sex] with Is life exciting or dull [life];
3. Degree [degree] with Job Satisfaction [satjob]


Chapter 5 Run Paired-Samples T-Test

Comparing the means of two variables for a single group (before and after): The study design for this test involves measuring each subject twice, before and after some kind of treatment or intervention. For example, in a study on high blood pressure, all patients are measured at the beginning of the study, given treatment, and measured again. Thus, all patients have two measures, often called before and after measures.

Comparing the means of two variables of matched pairs:
Example 1: Experimental group and control group
- Experimental group and new teaching method
- Control group and traditional teaching method
Example 2: Father's education and mother's education

Example:
1. Open data file: click File → Open → Data → endorph.sav (Beta endorphin levels before and after a half-marathon run for 11 men)
2. At the tool bar, select Analyze → Compare Means → Paired-Samples T-Test
3. Highlight the before and after variables, then click the arrow button to move them into the Paired Variables box
4. Click the Options icon; a Paired-Samples T-Test Options window pops up
5. Specify a value (95 or 99) in the Confidence Interval box, then click Continue
6. Click OK to view the results in the Output Window
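The calculation behind a paired-samples t-test works on the difference within each pair. The sketch below is a minimal Python illustration with hypothetical before and after scores (these are not the endorph.sav values): it computes the mean difference, its standard error, and the t statistic with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements for five subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]

# One difference per subject (after minus before)
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_difference = mean(differences)
sd_difference = stdev(differences)      # sample SD of the differences
standard_error = sd_difference / sqrt(n)

t_statistic = mean_difference / standard_error
degrees_of_freedom = n - 1

print("Mean difference:", mean_difference)
print("t =", round(t_statistic, 3), "with df =", degrees_of_freedom)
```

Because each subject serves as their own control, only the spread of the differences matters, not the spread of the raw scores.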


Exercise 5:
1. Use the file gss.sav. Now consider differences between the parents' working hours per week (variables hrs worked last week by husband [husbhr] and hrs worked last week by wife [wifehr]). Is there a statistically significant average difference between fathers' and mothers' working hours?
2. Use the file COUNTRY.sav. Is there a statistically significant difference in average life expectancy between males and females (variables Male life expectancy 1992 [lifeexpm] and Female life expectancy 1992 [lifeexpf])?
3. Repeat Chapter 4 using other variables.


Chapter 6 Run One-Way ANOVA (Analysis of Variance)


It is useful for comparing more than two population means. For example, if you are studying four methods for teaching English, you may want to compare average test scores for all four groups.

Dependent variable and independent variable:
Dependent variable: A variable being affected or assumed to be affected by the independent variable.
Independent variable: A variable that affects (or is assumed to affect) the dependent variable under study and is included in the research design so that its effect can be determined.

Example 1: The effect of four teaching methods on students' reading scores
Dependent variable: reading scores
Independent variable: teaching methods

Example 2: People's average number of working hours is affected by their educational levels
Dependent variable: the average number of hours worked in a week
Independent variable: educational levels (less than high school; high school; junior college; bachelor; and graduate).

To obtain a one-way analysis of variance (ANOVA):

- Select the variables whose means you want to compare, and move them into the Dependent List
- Select the variable that defines the groups and move it into the Factor box
- Click OK

Example:
1. Open the gssft.sav file
2. At the tool bar, click Analyze → Compare Means → One-way ANOVA
3. Highlight the variable Number of Hours Worked Last Week [hrs1], then click the arrow button to move it into the Dependent List
4. Highlight the variable RS Highest Degree [degree], then click the arrow button to move it into the Factor box
5. Click OK to view the results in the Output Window
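The F statistic in a one-way ANOVA table compares the variation between group means with the variation within groups. The following Python sketch shows that calculation on hypothetical scores for three groups (not the gssft.sav data); the group labels and variable names are invented for illustration.

```python
from statistics import mean

# Hypothetical scores for three groups defined by the factor
groups = {
    "method A": [1, 2, 3],
    "method B": [2, 3, 4],
    "method C": [5, 6, 7],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Between-groups sum of squares: group size x squared distance of the
# group mean from the grand mean
ss_between = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# Within-groups sum of squares: squared distance of each score from
# its own group mean
ss_within = sum(
    (score - mean(scores)) ** 2
    for scores in groups.values()
    for score in scores
)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_statistic = (ss_between / df_between) / (ss_within / df_within)

print("F =", round(f_statistic, 2), "with df =", (df_between, df_within))
```

A large F means the group means differ by more than the scatter inside the groups would explain; it does not say which groups differ, which is the job of a post hoc test.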


Bonferroni multiple comparison test

Many multiple comparison procedures are available. One of the simplest is the Bonferroni procedure:
1. Open the gssft.sav file
2. Click Analyze → Compare Means → One-way ANOVA
3. Select the variable Number of Hours Worked Last Week [hrs1], then click the arrow button to move it into the Dependent List
4. Select the variable RS Highest Degree [degree], then click the arrow button to move it into the Factor box
5. Click the Post Hoc icon
6. Tick Bonferroni and set the Significance Level at 0.05 or 0.01, click Continue, then OK

The difference in hours worked between the two groups is shown in the column labeled Mean Difference. Pairs of means that are significantly different from each other are marked with an asterisk. Results:

- People with a graduate degree work significantly longer hours than people with less than high school, high school, or junior college education;
- People with a graduate degree and people with a bachelor degree did not differ significantly in working hours.
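The idea behind the Bonferroni procedure is that the chosen significance level is shared among all pairwise comparisons. The Python sketch below counts the pairs for the five education groups and divides the significance level by that number. It illustrates the principle only; SPSS may present the same correction as adjusted p-values rather than an adjusted threshold.

```python
from itertools import combinations

education_levels = [
    "less than high school",
    "high school",
    "junior college",
    "bachelor",
    "graduate",
]

# Every pair of groups is compared once
pairs = list(combinations(education_levels, 2))

alpha = 0.05                         # overall significance level
adjusted_alpha = alpha / len(pairs)  # threshold for each single comparison

print("Number of pairwise comparisons:", len(pairs))
print("Per-comparison significance level:", adjusted_alpha)
```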

Exercise 6:
1. Repeat the example above.
2. Use the gss.sav data file: Is there a relationship between the highest degree earned and the number of hours of television viewed a day (variables RS Highest Degree [degree] and Hours Per Day Watching TV [tvhours])?
   Dependent variable: the average number of hours of TV viewed a day
   Independent variable: educational levels (less than high school; high school; junior college; bachelor; and graduate).
