The workshop is intended for postgraduate students. Its emphasis is on data analysis using the Statistical Package for the Social Sciences (SPSS) for Windows. It covers basic statistical concepts and techniques applicable to exploratory and confirmatory data analysis.

Recommended reading:
Norusis, M. J. (2010). PASW Statistics 18 Guide to Data Analysis. New Jersey: Prentice Hall.
Norusis, M. J. (2010). PASW Statistics 18 Statistical Procedures Companion. New Jersey: Prentice Hall.
Cases are presented in rows and variables in columns. The intersection of a row and a column is called a cell.
Define a variable: Click Variable View at the bottom left of the Data Editor.
Name: The name of the variable, e.g. Gender, Subject001 (maximum 64 characters, no spaces, and the first character must be a letter)
Type: Numeric or String
Width: The maximum number of characters for data entry for a variable, e.g. 8
Decimals: Number of decimal places, e.g. 0, 1, 2
Label: A string of text describing the variable, shown as the title of output tables, e.g. Gender, Age (maximum 255 characters)
Values: Labels for the values entered for a variable (maximum 120 characters). For example, to set 1 = Male and 2 = Female:
  - Enter 1 in the Value box and Male in the Value Label box, then click Add or press Enter
  - Enter 2 in the Value box and Female in the Value Label box, then click Add or press Enter
  - Click OK
Missing: Codes that identify the meaning of a blank cell (i.e. a missing value). For example, you can set 88 or 888 for Not Applicable, and 99 or 999 for Unknown data.
Columns: Column width of a variable, e.g. 8
Align: The alignment of data shown in Data View; it can be Left, Center or Right
Measure: The level of measurement: Scale, Ordinal or Nominal
Nominal: This gives categorization without order, for example:
  - Gender (1 = male; 2 = female)
  - Age range (1 = 17 or below; 2 = 18-29; 3 = 30-39; 4 = 40-49; 5 = 50 and above)
  - Nationality (1 = Chinese; 2 = British; 3 = American; 4 = Others)

Ordinal: This gives categorization with implied order, for example:
5-point Likert scales
  - Very good / good / no opinion / poor / very poor
  - Very satisfactory / satisfactory / undecided / unsatisfactory / very unsatisfactory
  - Most important / important / neutral / not important / least important
  - Strongly agree / agree / undecided / disagree / strongly disagree
  - Highly favorable / favorable / no opinion / unfavorable / highly unfavorable
  - Highly appropriate / appropriate / neutral / inappropriate / highly inappropriate
  - Very supportive / supportive / neutral / unsupportive / very unsupportive
  - Definitely yes / probably yes / uncertain / probably no / definitely no
3- or 4-point scales
  - Agree / undecided / disagree
  - Very satisfied / moderately satisfied / a little dissatisfied / very dissatisfied
Scale: This contains a true zero point that indicates a total absence of whatever is being measured, e.g. height, working hours per week.
Enter Data: Click the Data View tab at the bottom left of the Data Editor.
  - To move to the next cell in a row, press Tab or the right-arrow key
  - To move to the previous cell in a row, press the left-arrow key
  - To move to the next cell in a column, press Enter or the down-arrow key
  - To move to the previous cell in a column, press the up-arrow key
Insert variables or cases:
  - Highlight the variable column where you want to insert a new variable; at the tool bar, select Edit > Insert Variable.
  - Highlight the row where you want to insert a new row (case); at the tool bar, select Edit > Insert Cases.

Delete variables or cases:
  - Highlight the variable; at the tool bar, select Edit > Clear
  - Highlight the row; at the tool bar, select Edit > Clear
Select cases: At the tool bar, select Data > Select Cases, then choose one of:
  - All cases
  - If condition is satisfied: click If and enter a condition on the selection variable, e.g. Gender = 1, then press OK
  - Based on time or case range: click Range and enter the first and last case under Observation, e.g. 26 to 79
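To make the two selection rules concrete outside SPSS, here is a minimal Python sketch. The five cases and the helper names select_if and select_range are hypothetical, invented for illustration; they are not SPSS functions.

```python
# Hypothetical cases; each dict stands for one row of the Data Editor.
cases = [
    {"Code": 1, "Gender": 1},
    {"Code": 2, "Gender": 2},
    {"Code": 3, "Gender": 1},
    {"Code": 4, "Gender": 2},
    {"Code": 5, "Gender": 1},
]


def select_if(rows, variable, value):
    """Keep rows where `variable` equals `value` (like 'If condition is satisfied')."""
    return [row for row in rows if row[variable] == value]


def select_range(rows, first, last):
    """Keep rows by 1-based position, first to last inclusive (like a case range)."""
    return rows[first - 1:last]


males = select_if(cases, "Gender", 1)
middle = select_range(cases, 2, 4)
print([row["Code"] for row in males])   # codes of the cases with Gender = 1
print([row["Code"] for row in middle])  # codes of the 2nd to 4th cases
```

Both helpers return a new list and leave the original cases untouched, so the unselected cases are filtered out rather than deleted.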
Number of online courses you are currently teaching:

Job Title:
  - Professor
  - Lecturer/Tutor
  - Research Assistant
  - President

How often do you use the learning center for your teaching?
  - Always
  - Often
  - Sometimes
  - Rarely
  - Never
Self-esteem Scale

Item                                                    Strongly Disagree   Disagree   Agree   Strongly Agree
1.  On the whole, I am satisfied with myself.                   0               1         2           3
2.* At times, I think I am no good at all.                      0               1         2           3
3.  I feel that I have a number of good qualities.              0               1         2           3
4.  I am able to do things as well as most other people.        0               1         2           3
5.* I feel I do not have much to be proud of.                   0               1         2           3
2. Change the name of the variable from Jobtitle to Job
3. Change gender in code 27 from 99 to 1
4. Insert this case before case (Code) 13:

Code   Gender   Age   Height   Course   Job   Freq   Q1   Q2   Q3   Q4   Q5
11     1        4     2.0      2        2     5      1    2    2    3    0
5. Insert a variable Programme (below) between Age and Height

Variable name: Programme
Description: Title of your teaching programme

Code   Title
1      Managing in Organizations
2      B343C
11     B343
13     Marketing Research
24     B370
25     B370C
26     Electronic Financial Services
27     Language and Literacy in Social Context
38     ES850C
39     Introduction to the Internet
40     Emerging Technologies
6.
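7. Recode variables (the questions marked with an asterisk are reverse-scored): Strongly Disagree = 3, Disagree = 2, Agree = 1, Strongly Agree = 0
  - Click Transform > Recode into Different Variables
  - Select Q2 and Q5
  - Give each output variable a new name (Q2_r and Q5_r, the names used when the scale is computed) and click Change
  - Click Old and New Values
  - Enter the old and new values: 0 becomes 3, 1 becomes 2, 2 becomes 1, 3 becomes 0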
8. Compute scale (add the 5 items to calculate the self-esteem score)
  - Click Transform > Compute Variable
  - In Target Variable, enter the name of the score, i.e. se_scale
  - Enter the formula Q1 + Q3 + Q4 + Q2_r + Q5_r
  - Click OK
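The recode and compute steps can be checked by hand with a short Python sketch. The function names and the example respondent are hypothetical; only the 0 to 3 coding and the formula Q1 + Q3 + Q4 + Q2_r + Q5_r come from the exercise.

```python
def reverse_score(value, minimum=0, maximum=3):
    """Reverse a score on a minimum..maximum scale: 0 -> 3, 1 -> 2, 2 -> 1, 3 -> 0."""
    return maximum + minimum - value


def self_esteem_score(answers):
    """Sum the five items after reverse-scoring Q2 and Q5 (the asterisked items)."""
    q2_r = reverse_score(answers["Q2"])
    q5_r = reverse_score(answers["Q5"])
    return answers["Q1"] + answers["Q3"] + answers["Q4"] + q2_r + q5_r


# Hypothetical respondent, coded 0 = Strongly Disagree ... 3 = Strongly Agree.
respondent = {"Q1": 1, "Q2": 2, "Q3": 2, "Q4": 3, "Q5": 0}
se_scale = self_esteem_score(respondent)
print(se_scale)
```

With five items scored 0 to 3, se_scale can range from 0 to 15, and a higher value means higher self-esteem once the two negatively worded items are reversed.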
Select the following Variable(s) for generating frequency tables:
  - Respondent's Sex[sex]
  - Job Satisfaction[satjob]
  - Spouse's Highest Degree[spdeg]
  - Marital Status[marital]
3. Plot Charts
  - At the tool bar, select Graphs > Legacy Dialogs > Bar
  - Choose Simple
  - Tick Summaries for groups of cases
  - Click Define
Do it again for a Pie Chart with Variable RS Highest Degree[degree] and a Histogram with Variable Age of Respondent[age]
(ii) 4, 5, 7, 9, 11, 12, so the median is (7+9)/2 = 8

Mode: The most frequently occurring value. If more than one value shares the greatest frequency of occurrence, all of them are modes.
Example: 2, 3, 4, 4, 4, 4, 5, 7, 7, 9, so the mode is 4
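The two worked examples above can be reproduced with Python's standard statistics module. This is only a cross-check of the arithmetic, not part of the SPSS procedure; the third sample is a made-up illustration of a tie.

```python
from statistics import median, multimode

even_sample = [4, 5, 7, 9, 11, 12]            # even number of values
mode_sample = [2, 3, 4, 4, 4, 4, 5, 7, 7, 9]

# With an even number of values, the median is the average of the two middle ones.
print(median(even_sample))          # 8.0

# multimode returns every value that shares the highest frequency.
print(multimode(mode_sample))       # [4]
print(multimode([1, 1, 2, 2, 3]))   # [1, 2]  (hypothetical tie: both are modes)
```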
Mean: A measure of central tendency; the arithmetic average, which is the sum of the values divided by the number of cases.
Example: 16, 10, 5, 6, 8, 15, 20, 14, 16, 10
(16+10+5+6+8+15+20+14+16+10)/10 = 120/10 = 12, so the mean is 12

Standard Deviation (SD): A measure of dispersion around the mean. In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within two standard deviations of the mean.
Example: Language aptitude scores of classes
Class 1 2 3 4 5
Another real-life example: The average height for adult men in Hong Kong is about 178 cm, with a standard deviation of around 6 cm. This means that most of the men (one SD, about 68%, assuming a normal distribution) have a height within 6 cm of the mean (172 to 184 cm), and almost all the men (two SD, about 95%) have a height within 12 cm of the mean (166 to 190 cm).

To Run Mean and Standard Deviation
1. At the tool bar, select Analyze > Descriptive Statistics > Descriptives
2. Double click the following Variable(s) to move them into the Variable(s) box:
  - Age of Respondent[age]
  - Highest Year of School Completed[educ]
  - Hours Worked Last Week[hrs1]
3. Click Options to select Mean, Std. deviation, Minimum, Maximum
4. Click Continue, then press OK to view the results in the Output Window
5. Do it again for Hours Per Day Watching TV[tvhours] with Mean, Std. deviation, Minimum, and Maximum.

Exercise 3: Please find:
1. The average income of the respondents' families[incomdol] and of the respondents themselves[rincomdol], together with the Std. Deviation, Minimum, and Maximum values;
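As a cross-check on the Descriptives output, the mean example and the height ranges above can be recomputed in a few lines of Python. The helper sd_range is a hypothetical name. statistics.stdev uses the sample formula (dividing by n - 1), which is the same convention SPSS applies in Descriptives.

```python
from statistics import mean, stdev

scores = [16, 10, 5, 6, 8, 15, 20, 14, 16, 10]   # the mean example from the text

average = mean(scores)    # sum of the values divided by the number of cases
spread = stdev(scores)    # sample standard deviation (divides by n - 1)
print(average, round(spread, 2), min(scores), max(scores))


def sd_range(center, sd, k):
    """Return the interval from k standard deviations below to k above the mean."""
    return (center - k * sd, center + k * sd)


# Height example: mean 178 cm, SD 6 cm.
print(sd_range(178, 6, 1))   # (172, 184), about 68% of a normal distribution
print(sd_range(178, 6, 2))   # (166, 190), about 95% of a normal distribution
```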
Are the two variables independent? OR Is there any relationship between the two variables?
For example, you may want to know whether there is a significant difference between male and female students in their preference for learning methods. Each student may choose only one preferred learning method.

               Male   Female   Row Total
Method 1         47       53         100
Method 2         31       22          53
Method 3         49       24          73
Method 4         15       19          34
Column Total    142      118         260

Assumption of Chi-square test: Most of the expected counts must be greater than 5, and none less than 1.
Expected count = (row total x column total) / grand total
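The expected counts and the Pearson Chi-square statistic for the learning-method table can be computed by hand. The sketch below uses the cell counts from the example, the standard expected-count formula (row total times column total, divided by the grand total), and hypothetical function names; it does not compute the p-value, which SPSS reports in the output.

```python
# Observed counts from the example: one row per method, columns = (male, female).
observed = [
    [47, 53],  # Method 1
    [31, 22],  # Method 2
    [49, 24],  # Method 3
    [15, 19],  # Method 4
]


def expected_counts(table):
    """Expected count of each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson Chi-square: sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
statistic = chi_square(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1) * (columns - 1)

for row in expected:
    print([round(e, 1) for e in row])
print(round(statistic, 2), dof)
```

Every expected count here is well above 5, so the assumption stated above is met for this table.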
Example: A study was conducted to test a possible relationship between first language background and desire for a student-centered classroom in an adult ESL class.

               Chinese   Spanish   French   Row Total
For                 11        30       25          66
Against             45        12        7          64
Undecided           16         8       10          34
Column Total        72        50       42         164
To run Pearson Chi-square Crosstabs:
1. Open data file: click File > Open > Data > gss.sav
2. At the tool bar, select Analyze > Descriptive Statistics > Crosstabs
3. Select the row and column variables, for example: Row: Respondent's Sex[sex]; Column: Spouse's Highest Degree[spdeg]
4. Click Statistics > tick Chi-square > Continue
5. Click Cells (for percentages) > tick Row, Column > Continue > OK
Example: Is life exciting or dull? Let's consider whether education is related to a person's perception of life.

            Less than high school   High school   Junior college or more   Bachelor   Graduate
Exciting
Routine
Dull

1. Open data file: click File > Open > Data > gss.sav
2. Select Is life exciting or dull?[life] as Row (1 = exciting, 2 = routine, 3 = dull)
3. Select Degree[degree] as Column (0 = less than high school, 1 = high school, 2 = junior college or more, 3 = bachelor, 4 = graduate)
4. Click Statistics > tick Chi-square > Continue > OK
The column Asymp. Sig. (2-sided) gives the p-value.
  - If p-value ≤ 0.01, there is a highly significant relationship (99% confidence);
  - If p-value ≤ 0.05 but > 0.01, there is a significant relationship (95% confidence);
  - If p-value > 0.05, there is no significant relationship.
If either more than 20% of the cells have an expected count less than 5, or the minimum expected count is less than 1, the Chi-square test cannot be used.
The result indicates that there is a significant relationship between education and a person's perception of life (p < .01).
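The decision rules above are simple enough to write down as code. The following Python sketch encodes the thresholds quoted in the text (0.01 and 0.05 for the p-value; 20% of cells below 5 and a minimum of 1 for the expected counts). The function names and the example expected counts are hypothetical.

```python
def chi_square_usable(expected):
    """Check the rule of thumb: at most 20% of cells with expected count < 5,
    and no cell with expected count < 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1


def interpret(p_value):
    """Translate the Asymp. Sig. (2-sided) value into the wording used in the text."""
    if p_value <= 0.01:
        return "highly significant"
    if p_value <= 0.05:
        return "significant"
    return "not significant"


# Hypothetical expected counts for a 2 x 3 table: one cell in six is below 5.
print(chi_square_usable([[4.0, 10.0, 10.0], [10.0, 10.0, 10.0]]))
print(interpret(0.003), interpret(0.03), interpret(0.2))
```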
Exercise 4:
1. Marital status[marital] with If rich, continue or stop working[richwork];
2. Respondent's Sex[sex] with Is life exciting or dull[life];
3. Degree[degree] with Job Satisfaction[satjob]
Comparing the means of two variables for a single group (before and after). The study design for this test involves measuring each subject twice: before and after some kind of treatment or intervention. For example, in a study on high blood pressure, all patients are measured at the beginning of the study, given treatment, and measured again. Thus, all patients have two measures, often called "before" and "after" measures.
Comparing the means of two variables of matched pairs.
Example 1: Experimental group and control group
  - Experimental group: new teaching method
  - Control group: traditional teaching method
Example 2: Father's education and mother's education
Example:
1. Open data file: click File > Open > Data > endorph.sav (beta-endorphin levels before and after a half-marathon run for 11 men)
2. At the tool bar, select Analyze > Compare Means > Paired-Samples T Test
3. Highlight the before and after variables, then move them into the Paired Variables box
4. Click Options; the Paired-Samples T Test: Options window pops up
5. Specify a value (95 or 99) in the Confidence Interval box, then click Continue
6. Click OK to view the results in the output window
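Behind the dialog, the paired-samples test works on the difference between the two measures for each subject. The Python sketch below shows that calculation under stated assumptions: the five before and after values are made up (they are not the endorph.sav data) and paired_t is a hypothetical helper name. It returns the t statistic and degrees of freedom only; SPSS additionally reports the p-value and the confidence interval.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic: mean difference / (SD of differences / sqrt(n))."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1   # t statistic and degrees of freedom


# Hypothetical measurements for five subjects (not taken from endorph.sav).
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

Because each subject serves as their own control, only the spread of the differences enters the denominator, which is what distinguishes this test from comparing two independent groups.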
Exercise 5:
1. Use the file gss.sav. Consider differences between the parents' working hours per week (variables hrs worked last week by husband[husbhr] and hrs worked last week by wife[wifehr]). Is there a statistically significant average difference between fathers' and mothers' working hours?
2. Use the file COUNTRY.sav. Is there a statistically significant difference in average life expectancy between males and females (variables Male life expectancy 1992[lifeexpm] and Female life expectancy 1992[lifeexpf])?
3. Repeat Chapter 4 using other variables.
Example hypothesis: People's average number of working hours is affected by their educational level.
Dependent variable: the average number of hours worked in a week
Independent variable: educational level (less than high school; high school; junior college; bachelor; and graduate).

  - Select the variables whose means you want to compare and move them into the Dependent List
  - Select the variable that defines the groups and move it into the Factor box
  - Click OK
Example:
1. Open the gssft.sav file
2. At the tool bar, click Analyze > Compare Means > One-Way ANOVA
3. Highlight the variable Number of Hours Worked Last Week[hrs1], then click the arrow button to move it into the Dependent List
4. Highlight the variable RS Highest Degree[degree], then click the arrow button to move it into the Factor box
5. Click OK to view the results in the output window
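The F ratio that One-Way ANOVA reports compares the variation between group means with the variation within groups. The Python sketch below computes it from scratch. The three small groups are invented numbers (not values from gssft.sav) and one_way_anova is a hypothetical helper name; the p-value is left to SPSS.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical weekly working hours for three education groups.
hours_by_group = [
    [38, 40, 42],
    [40, 42, 44],
    [46, 48, 50],
]

f_ratio, df_between, df_within = one_way_anova(hours_by_group)
print(round(f_ratio, 2), df_between, df_within)
```

A large F ratio means the group means differ by more than the within-group scatter would suggest; it does not say which groups differ, which is why a multiple comparison test is run afterwards.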
Bonferroni multiple comparison test
Many multiple comparison procedures are available. One of the simplest is the Bonferroni procedure:
1. Open the gssft.sav file
2. Click Analyze > Compare Means > One-Way ANOVA
3. Select the variable Number of Hours Worked Last Week[hrs1], then click the arrow button to move it into the Dependent List
4. Select the variable RS Highest Degree[degree], then click the arrow button to move it into the Factor box
5. Click Post Hoc
6. Tick Bonferroni and set the Significance Level at 0.05 or 0.01, click Continue, then OK
The difference in hours worked between each pair of groups is shown in the column labeled Mean Difference. Pairs of means that are significantly different from each other are marked with an asterisk.
Results:
  - People with a graduate degree work significantly longer hours than people with less than high school, high school, or junior college education;
  - There is no significant difference in working hours between people with a graduate degree and people with a bachelor's degree.
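The idea behind the Bonferroni procedure is to make each pairwise test stricter in proportion to the number of pairs compared. The Python sketch below illustrates this for the five education groups. The variable and function names are hypothetical, and the 0.004 p-value is an invented example rather than output from gssft.sav.

```python
from itertools import combinations

groups = ["less than high school", "high school", "junior college", "bachelor", "graduate"]

# Every pair of groups is compared once: k * (k - 1) / 2 pairs for k groups.
pairs = list(combinations(groups, 2))
alpha = 0.05
per_comparison_alpha = alpha / len(pairs)   # threshold each raw p-value must meet


def bonferroni_adjust(p_value, n_comparisons):
    """Equivalent view: multiply the raw p-value by the number of comparisons (cap at 1)."""
    return min(1.0, p_value * n_comparisons)


print(len(pairs), per_comparison_alpha)
print(bonferroni_adjust(0.004, len(pairs)))   # hypothetical raw p-value for one pair
```

Comparing a raw p-value with alpha divided by the number of pairs gives the same decision as comparing the adjusted p-value with alpha itself.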
Exercise 6:
1. Repeat the example above.
2. Use the gss.sav data file: Is there a relationship between the highest degree earned and the number of hours of television viewed a day (variables RS Highest Degree[degree] and Hours Per Day Watching TV[tvhours])?
   Dependent variable: the average number of hours of TV viewed a day
   Independent variable: educational level (less than high school; high school; junior college; bachelor; and graduate).