The purpose of this guidance document is to outline the steps to take to conduct an analysis of parenting
styles in Ontario using the modified 21-item version of the Parenting Styles and Dimensions
Questionnaire (PSDQ).
Step 1: Import data into the software. This can be done in several ways; here is one approach.
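One common approach is proc import. The file path and file type below are placeholders — substitute your own data file; the output dataset is named mydata to match the data step in Step 2:

```sas
/* hypothetical path and file type; adjust to your own data file */
proc import datafile="C:\data\parenting.csv"
            out=mydata
            dbms=csv
            replace;
  getnames=yes;   /* read variable names from the first row of the file */
run;
```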
Step 2: Perform data cleaning. Examine the distribution of variables in the database to ensure they are
within the expected range. Conduct cross-validations, when possible.
data parent1;
  set mydata;
  if ps2 le 2;                  /* exclusion: only biological parents (ps2=1, 2) */
  if ps1 ge 3 and ps1 le 11;    /* exclusion: only children aged 3 to 11 */
run;
Note: You may find it useful to assign labels to your derived variables.
proc format;
  value ageg 1='25 or younger'
             2='25.1 to 30'
             3='30.1 to 35'
             4='35.1 to 40'
             5='40.1 or older';
run;
To assign these labels to the actual variables, place this code at the end of the above data step (Step 3).
Run the proc format statement first.
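The exact assignment code is not reproduced here; assuming the derived age-group variable is named agegp (as in the frequency tables below), it would look like:

```sas
data parent1;
  set parent1;
  format agegp ageg.;   /* attaches the ageg value labels to agegp */
run;
```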
proc freq data=parent1;
  tables ps3-ps23 ethnic agegp edu child incomegp marry ps2;  /* distribution of variables */
  tables ethnic*ethnic1;      /* verifying derived variables are correctly coded */
  tables edu*educ;
  tables agebirth*agegp;
  tables marital*marry;
  tables incomegp*income;
  tables dc2*child;
run;
Step 5: Examine the means of your 21 items on the modified PSDQ. This will produce Table 3 (first
column) of the accompanying document.
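A minimal way to obtain these means, assuming the 21 items are ps3-ps23 as elsewhere in this document:

```sas
proc means data=parent1 n mean std;
  var ps3-ps23;   /* the 21 modified PSDQ items */
run;
```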
At this point, you may want to conduct a factor analysis to explore the number of latent constructs in the
PSDQ. A factor analysis will not tell you anything about the parenting styles used in the population, but it
will tell you about the properties of the PSDQ itself. Specifically, it will tell you how many unmeasured
constructs the questionnaire is tapping into. Factor analysis is useful in situations such as the following:
1) You have added new items to an existing scale. Factor analysis will provide evidence about what
dimensions (if any) these items are measuring.
2) You are using a modified version of a validated scale. Factor analysis will tell you if your
modifications have changed the dimensions of the questionnaire.
3) You are using a validated scale in a new population and you have reason to suspect that the
scale's dimensions will not transfer exactly across populations, or that the items will not load
consistently across populations. Factor analysis will tell you whether the dimensions and item loadings
are the same or similar.
4) Previous factor analyses have provided inconsistent results. Replication will indicate which
factor structure is likely correct.
Bottom line: You should only be performing a factor analysis if you are interested in learning more about
the psychometric properties of the PSDQ.
There are two main types of factor analysis: exploratory and confirmatory. The overall objectives of your
research will dictate what approach you want to use. An exploratory analysis is useful if you have no
preconceived ideas about the data. A confirmatory factor analysis is useful if you have a hypothesis about
the underlying factor structure that you would like to confirm.
For the original parenting analysis, two exploratory factor analyses were conducted. A common
approach with scale validation studies is to randomly split your sample into two groups and see if your
results are replicated across samples. You may want to perform two exploratory factor analyses, or an
exploratory factor analysis on half your data and confirmatory factor analysis on the other half of your
data.
There are various approaches to splitting your sample randomly into two. Here is one example. SAS will
select 338 observations, half of the original dataset; ranuni(220377) generates a uniform random number,
where 220377 is the seed:

%let num=338;

data sample;
  set parent1;
  random=ranuni(220377);
run;

proc sort data=sample;
  by random;
run;

data sample3;
  set sample;
  if _n_ le &num;   /* the first &num observations form the first random sample */
  samp=1;
run;

proc sort data=sample3;
  by idnum;
run;

data together;
  merge sample3 parent1;      /* together contains both random samples */
  by idnum;
  if samp ne 1 then samp=2;   /* the first sample (338 observations) has samp=1 */
run;
You should verify that the distributions of variables are approximately equal in the two subsamples. Do
not be concerned if the distribution of one or two variables differs across samples: statistically
speaking, a test at the 0.05 level will produce a significant result by chance 1 out of 20 times. This code will
complete Tables 1 and 3.
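One way to compare the distributions across the two subsamples, sketched here using the demographic variables examined earlier:

```sas
proc freq data=together;
  /* cross-tabulate the subsample indicator against each demographic variable */
  tables samp*(ethnic agegp edu child incomegp marry ps2) / chisq;
run;
```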
There are many considerations when conducting a factor analysis. Some of the most important are the
extraction method, the rotation method and the final number of factors that best fit your data.
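The proc factor call that the following paragraphs refer to is not reproduced here; a minimal sketch consistent with the options discussed below (method=, rotate=, nfact=, scree), assuming the analysis is run on the first subsample:

```sas
proc factor data=sample3 method=ml nfact=1 scree;
  var ps3-ps23;   /* the 21 modified PSDQ items */
run;
```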
The method statement specifies the extraction method. Two of the most common extractions are
maximum likelihood (method=ml) and principal components (method=prin). Theoretically, the ML and
PRIN will produce similar results if your data are multivariate normally distributed. ML will provide
goodness of fit indices, whereas PRIN will perform better if the data are not multivariate normally
distributed (Note: these data produced similar solutions using PRIN and ML).
The rotate statement specifies the rotation method. There are two types of rotations: orthogonal and
oblique. Orthogonal rotations should be used if your factors are uncorrelated (Note: items within factors
will be correlated, but the factors themselves will not be). Oblique rotations should be used if the factors are
correlated. Rotations can only be performed when at least two factors are extracted. The above code
extracts one factor (nfact=1), so a rotation is not necessary. The most common orthogonal rotation is the
varimax rotation (rotate=varimax). The most common oblique rotations are the oblimin and promax
rotations (rotate=promax; rotate=oblimin).
The nfact statement specifies the number of factors you want to extract. Explore the data in a one factor
solution first. This will help you decide how many overall factors you want to extract. If there is more
than one factor (dimension), re-run the analysis changing the nfact to the number of factors you want to
retain.
There are many ways to determine the best number of factors. The most commonly used are:
1) Scree plots
2) Eigenvalues (weak criteria)
3) Proportion of variance accounted for
4) RMSEA
5) Parallel analysis
Scree plots: The scree option in the proc factor statement requests scree plots. Scree plots plot the
Eigenvalues against the total number of factors. Below is a reproduction of a SAS scree plot based on
data from Sample 1.
[Scree plot: eigenvalues (y-axis) plotted against the number of factors (x-axis)]
The point at which the slope changes (known as the elbow) indicates how many factors to extract.
Based on this plot, either 2 or 3 factors would be appropriate for these data. Note: the x-axis of a scree
plot (i.e. the number of factors extracted) will always equal the number of items in the factor
analysis. In this case, there are 21 items, so 21 factors were extracted. It does not matter how many
factors you specify to extract (e.g. nfact=4 vs. nfact=20); the scree plot will always look the same in the
same dataset.
Eigenvalues: Eigenvalues show the amount of variance in the data that is explained by a given factor.
Retained factors should have eigenvalues greater than 1; the first factor with an eigenvalue below 1 is
probably not important, but some judgment is needed. This criterion has been widely used but heavily
criticized: it is better to use one of the other methods described here to determine the number of factors to
extract. Based on the eigenvalue output, a two-factor solution seems to fit these data best, but a three-factor
solution is still plausible.
RMSEA: SAS will not provide this output directly for an exploratory factor analysis: it must be
calculated by hand. Only a maximum likelihood extraction (method=ML) will provide the appropriate
information to calculate the RMSEA: you cannot calculate the RMSEA from a principal components
extraction (method=PRIN).
There are a few formulas to calculate an RMSEA. This is probably the most basic; values will differ
slightly between methods:

RMSEA = square root of [ (χ²/df − 1) / (n − 1) ]

The RMSEA for this model is: sqrt of [(258.73/150 − 1)/(338 − 1)] = 0.046
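The hand calculation can be checked with a short data step, using the values above:

```sas
data _null_;
  chisq=258.73; df=150; n=338;          /* chi-square, degrees of freedom, sample size from the model above */
  rmsea=sqrt((chisq/df - 1)/(n - 1));
  put rmsea=;                           /* approximately 0.046 */
run;
```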
You will need to run a number of different factor models to determine the RMSEA, since the χ² and df
values are specific to each model (e.g. nfact=1; nfact=2; nfact=3). Some guidelines to interpret the
RMSEA are provided below:
Parallel analysis: A SAS macro for parallel analysis is provided in:
Kabacoff R. Determining the dimensionality of data: a SAS macro for parallel analysis. SAS
Users Group International (SUGI) 28. http://www2.sas.com/proceedings/sugi28/090-28.pdf
After entering the macro code exactly as written, place this code at the end to run the macro:
%parallel(data=sample3,     /* if you have split the data, run the parallel analysis
                               on the appropriate dataset; here it is sample3 */
          var=ps3-ps23,
          niter=100,
          statistic=Median);
Look at where the actual and simulated data intersect. In the paper that was linked to above, a two
factor solution seems best.
Step 7: Rerun your exploratory factor analysis, according to the appropriate number of factors (as
determined in Step 6).
Review the factor loadings on your rotated solution. Some important considerations are:
Are items loading as expected? (i.e. are those variables intended to measure authoritarian
parenting actually loading together on a common factor);
Are these loadings significant? This is a subjective criterion. The most commonly used cutoffs are in the
range of 0.30 to 0.40 (specify using the flag= option).
Are items loading on multiple factors? Generally you want strong loadings on the main factor
and weak loadings on the other factors. Weak loadings are defined as falling within the range of
-0.10 to +0.10.
A 3-factor solution for parenting styles is presented below. Table 4 is a reproduction of these data.
Consider removing items that do not load as hypothesized (e.g. ps18). Strong evidence for removing
an item includes:
Step 8: If modifications are made to the scale (e.g. items deleted), remove the items and re-run the factor
analysis. Review the output.
Step 9: At this point, you may want to run another exploratory factor analysis or a confirmatory factor
analysis on the other half of your sample. Repeat steps 6-8 for an exploratory factor analysis. If you are
interested in a confirmatory factor analysis, a good guidebook is (also good for exploratory factor
analysis):
Hatcher L. A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation
Modeling. Cary, NC: SAS Institute Inc.; 1994.
Fabrigar LR, Wegener DT. Exploratory Factor Analysis. Toronto, ON: Oxford University
Press; 2012.
Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor
analysis in psychological research. Psychological Methods. 1999;4:272-299.
Costello AB, Osborne JW. Best practices in exploratory factor analysis: four recommendations for
getting the most from your analysis. Practical Assessment, Research & Evaluation. 2005;10(7):1-9.
Step 10: Determine the internal reliability (i.e. Cronbach's alpha) for each scale dimension, i.e. each parenting style.
Base your decisions about which variables to include on the factor analysis (if applicable). Otherwise, select the
variables that were hypothesized a priori to measure each parenting style. A good scale's alpha should
be at least 0.70. This code will produce the data in Table 6.
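A sketch using proc corr with the alpha option; the item list here is the authoritative set used in Step 11 and is only an example — substitute the items for each dimension in turn:

```sas
proc corr data=together alpha nomiss;
  var ps3 ps5 ps7 ps10 ps12 ps14 ps16 ps20;   /* e.g. the authoritative items */
run;
```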
Step 11: Calculate the average score on each dimension of parenting (i.e. parenting style). The variables
you include in your calculation will depend on how well you think the particular item is measuring the
latent construct of interest. This decision can be based on consultations with experts, results from factor
analysis, the literature, or all of the above. This code calculates the average score on each parenting style
based on what the factor analysis suggested and how well this fit the hypothesized data structure.
data parent2;
  set together;
  authoritative = mean(of ps3 ps5 ps7 ps10 ps12 ps14 ps16 ps20);
  authoritarian = mean(of ps4 ps6 ps8 ps11 ps17 ps19);
  permissive    = mean(of ps13 ps21 ps23);
run;
Step 12: Evaluate how parenting styles vary by demographic and household characteristics of
respondents. If the outcome variables (authoritative, authoritarian, permissive) are moderately correlated
(r = +/- 0.6) or are strongly negatively correlated, MANOVAs are a better analytic strategy. When data
do not meet these criteria, there is little benefit to performing a MANOVA instead of multiple ANOVAs
or regressions. If you do proceed with an ANOVA or regression, adjust your p-values for the number of
comparisons by dividing your desired overall p-value by the number of dependent variables. In this case,
there are three outcomes. If p=0.05 is your desired overall alpha level, divide 0.05/3 = 0.017.

The number of children at home is the independent variable in these three models. Each has a separate
outcome variable: authoritative, authoritarian, and permissive parenting. Because there are three outcomes,
a Bonferroni adjustment is requested: the alpha level for the mean comparisons (lsmeans) is set at 0.017.
Using surveyreg allows you to control for the stratified sampling strategy (strata h_unit). If you want to
include weights, add a (weight nameofweightvar) statement.

proc surveyreg data=parent2;
  class child;
  model authoritative=child/solution clparm;
  strata h_unit;
  lsmeans child/alpha=0.017;
  title 'children authoritative';
run;

proc surveyreg data=parent2;
  class child;
  model authoritarian=child/solution clparm;
  strata h_unit;
  lsmeans child/alpha=0.017;
  title 'children authoritarian';
run;

proc surveyreg data=parent2;
  class child;
  model permissive=child/solution clparm;
  strata h_unit;
  lsmeans child/alpha=0.017;
  title 'children permissive';
run;
Rerun the above code for each independent variable of interest (e.g. ps2, agegp, marry) to produce Tables
7 and 8. Note: Tables 7 and 8 provide similar information, but the items included when calculating the mean
scores of authoritative, authoritarian, and permissive parenting were different. Please see the footnote at the
bottom of each table to see which variables were included.
Step 13: Perform a cluster analysis. There are many different statistical approaches to a cluster analysis.
Proc cluster implements a hierarchical clustering analysis; proc fastclus implements a k-means cluster
analysis; proc lca implements a latent class analysis (proc lca is not in base SAS software and must be
downloaded from the Penn State Methodology Center: http://methodology.psu.edu/). This analysis used a
k-means strategy, the most commonly used approach.
Before you conduct a cluster analysis, it is useful to standardize your clustering variables to a mean of 0
and standard deviation of 1.
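Proc standard can do this; the output dataset is named stand, the name the sort code below expects, and the clustering variables are assumed to be the three parenting-style scores from Step 11:

```sas
proc standard data=parent2 mean=0 std=1 out=stand;
  var authoritative authoritarian permissive;   /* the clustering variables */
run;
```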
If you are interested in performing a cluster analysis separately for mothers (ps2=1) and fathers (ps2=2),
you will need to sort your data first by the relevant variable (ps2). Then specify where variable=level in
your cluster code, i.e. where ps2=1.

proc sort data=stand;
  by ps2;
run;

In the proc fastclus statement, maxc specifies the number of clusters you want to extract.
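The fastclus call itself is not reproduced here; a minimal sketch, assuming the standardized dataset stand and the three parenting-style scores as clustering variables:

```sas
proc fastclus data=stand maxc=3 maxiter=100 out=clus;
  where ps2=1;    /* mothers only; rerun with where ps2=2 for fathers */
  var authoritative authoritarian permissive;
run;
```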
There are many methods that you can use to determine the best number of clusters; two are described
below:
1) Pseudo F-test
2) Cubic Clustering Criteria
Pseudo F-test: The cluster analysis that produces the highest value of the pseudo F-statistic is the model
that fits your data best. The actual value of the pseudo-F will depend on your data. Re-run the fastclus
code with a range of different clusters (maxc=2; maxc=3; maxc=4, etc) to find the best pseudo F-statistic.
Cubic Clustering Criterion (CCC): Again, you want to maximize values of the CCC. This statistic may not be
valid when the data are correlated. Interpret the CCC as follows:

3 or greater            Good fit
0 to 2.9                Potential clusters
<0 (negative values)    Outliers
Step 14: Determine the mean scores of parenting styles in each cluster. The alpha=0.017 produces
confidence limits that are adjusted for the number of dependent variables. This code produces the data
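The code for this step is not reproduced here; one sketch, assuming the fastclus output dataset clus (its cluster assignment variable is named cluster by default), run once per parenting style:

```sas
proc surveyreg data=clus;
  class cluster;
  model authoritative=cluster / solution clparm;
  strata h_unit;
  lsmeans cluster / alpha=0.017;   /* confidence limits adjusted for three outcomes */
run;
```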
Step 15: Test whether the clusters are significantly different from each other. Conventional wisdom suggests they
will be (the intent of cluster analysis is to partition the population into groups whose members are most similar
to one another). This code produces the data in Tables 10 and 11.