
PARENTING STYLES ANALYSIS GUIDANCE DOCUMENT

(NOT OFFICIAL RRFSS DATA DICTIONARY)

The purpose of this guidance document is to outline the steps to take to conduct an analysis of parenting
styles in Ontario using the modified 21-item version of the Parenting Styles and Dimensions
Questionnaire (PSDQ).

The data needed to conduct these analyses are as follows:

- Item scores for each of the 21 items on the PSDQ
- Any demographic characteristics of respondents that are of interest

For these analyses, we use data from RRFSS (provincial sample).


Item scores for each of the 21 items on the PSDQ are contained in variables ps3-ps23.
Demographic characteristics of study respondents that were of interest include:
- the number of children under the age of 11 in the respondent's household (dc2);
- the parent's relationship to the child (ps2);
- marital status (marital);
- ethnicity of respondent (ethnic1);
- highest level of education (educ);
- household income (income);
- the parent's age (derived from: yrbirth, mbirth)

The main analyses outlined are:

1) Factor analysis (Steps 6-9)
2) Determining how demographic characteristics of respondents vary by their average scores on authoritative, authoritarian and permissive parenting (Step 12)
3) Cluster analysis (Step 13)
4) Determining mean scores of parenting styles by cluster (Step 14)

Steps to conduct these analyses

Step 1: Import data into the software. This can be done in several ways; one approach is:

Proc import datafile="c:\sasdb\parenting_sept24.225.sav" out=mydata dbms=sav replace;
Run;

Step 2: Perform data cleaning. Examine the distribution of variables in the database to ensure they are
within the expected range. Conduct cross-validations, when possible.

Proc freq data=mydata;
Tables ps3-ps23 dc2 ps2 marital ethnic1 educ income;
run;

proc means data=mydata;
Var ps3-ps23;
run;
Step 3: Create a working data set. This will involve recoding your missing and unknown responses; collapsing categories together as required; and removing a priori exclusions from the database.

Data parent1;
Set mydata;

/* Recoding missing (value=9) and unknown (value=8) responses on the 21-item questionnaire, then deleting all observations with missing items */
Array m (21) ps3-ps23;
Do i=1 to 21;
If m(i)=9 then m(i)=. ;
End;

Array p (21) ps3-ps23;
Do i=1 to 21;
If p(i)=8 then p(i)=.;
End;

Array s (21) ps3-ps23;
Do i=1 to 21;
If s(i)=. then delete;
End;

Exclusions:
If ps2 le 2; /* Only including responses from biological parents (ps2=1, 2) */
If ps1 ge 3 and ps1 le 11; /* Only including responses about children aged 3 to 11 */

/* Creating a variable to capture the age of the biological parent when the index child was born. Because date of birth was not collected in this dataset, everyone was assigned a birthday at mid-month (dbirth=15). Several manipulations were done to calculate the age of the parent when the index child was born */
Dbirth=15;
if mbirth gt 12 then mbirth=.;
Bday=MDY(mbirth, dbirth, yrbirth);
Inttime=MDY(intmonth, intday, intyear);
Childd=ps1*365.25;
Childbirth=inttime-childd;
Agebirth=(childbirth-bday)/365.25;


/* Creating categories for income variables. You may want to change the groupings based on your data */
If income le 3 then incomegp=1;
Else if income gt 3 and income le 5 then incomegp=2;
Else if income gt 5 and income le 8 then incomegp=3;
Else if income gt 8 and income le 12 then incomegp=4;
Else incomegp=5;

/* Creating categories for number of children. There were no families with more than 10 children under age 11; if there are, you may want to change the upper limits. Note: values greater than 10 are for missing (99) and unknown (98) */
If dc2=1 then child=1;
Else if dc2=2 then child=2;
Else if dc2 gt 2 and dc2 le 10 then child=3;
Else child=.;

/* Creating categories for marital status; you may want to change based on your data */
If marital le 2 then marry=1;
Else if marital gt 2 and marital le 6 then marry=2;
Else marry=.;

/* Creating categories for education. You may want to change based on your data */
If educ le 3 then edu=1;
Else if educ=4 then edu=2;
Else edu=.;

/* Creating categories for ethnicity. You may want to change based on your data */
If ethnic1=1 then ethnic=1;
Else if ethnic1=7 or ethnic1=13 or ethnic1=24 or ethnic1=43 or ethnic1=56 then ethnic=2;
Else ethnic=3;

/* Creating categories for age groups. Because some parents did not provide their age but did provide an age range, both the agebirth and agegrp variables were used when constructing the categories. This may not be necessary if all parents provide their exact age */
if agegroup=3 then agegrp=39.5-ps1;
if agegroup=4 then agegrp=47-ps1;
If agebirth le 25 or agegrp le 25 then agegp=1;
If agebirth gt 25 and agebirth le 30 then agegp=2;
If agebirth gt 30 and agebirth le 35 then agegp=3;
If agegrp gt 30 and agegrp le 35 then agegp=3;
If agebirth gt 35 and agebirth le 40 then agegp=4;
If agegrp gt 35 and agegrp le 40 then agegp=4;
If agebirth gt 40 or agegrp gt 40 then agegp=5;

Run;
Note: You may find it useful to assign labels to your derived variables.

Proc format;
Value ageg 1='25 or younger'
2='25.1 to 30'
3='30.1 to 35'
4='35.1 to 40'
5='40.1 or older';

Value inc 1='$40 or less'
2='40 to 60'
3='60 to 90'
4='90 or more'
5='missing';

Value eth 1='Canadian'
2='British'
3='Other';

Value ed 1='high school or less'
2='college or university';

Value mar 1='married or common law'
2='other';

Value chi 1='one'
2='two'
3='three or more';
Run;

To assign these labels to the actual variables, place this code at the end of the above data step (Step 3).
Run the proc format statement first.

Format agegp ageg.
incomegp inc.
ethnic eth.
edu ed.
marry mar.
child chi.
;
Step 4: Examine the distributions of your variables. This will produce Table 1 (column 2) and Table 2 in
the accompanying document. Perform cross-validations to ensure your derived variables were coded
correctly.

Proc surveyfreq data=parent1;
Tables ps3-ps23 ethnic agegp edu child incomegp marry ps2; /* Distribution of variables */

/* Verifying derived variables are correctly coded */
Tables ethnic*ethnic1;
Tables edu*educ;
Tables agebirth*agegp;
Tables marital*marry;
Tables incomegp*income;
Tables dc2*child;

Strata h_unit; /* Take sampling design into account */
Run;

Step 5: Examine the means of your 21-items on the modified PSDQ. This will produce Table 3 (first
column) of the accompanying document.

/* Using surveymeans allows you to incorporate the sampling design into the analyses. Failing to account for stratified sampling will result in SE estimates that are inappropriately large. Point estimates will not be biased */
Proc surveymeans data=parent1;
Var ps3-ps23;
Strata h_unit;
Run;

THE FOLLOWING MAY OR MAY NOT BE RELEVANT DEPENDING ON THE GOALS OF YOUR ANALYSIS. IF YOU ARE NOT INTERESTED IN PERFORMING A FACTOR ANALYSIS, SKIP TO STEP 10.

At this point, you may want to conduct a factor analysis to explore the number of latent constructs in the PSDQ. A factor analysis will not tell you anything about the parenting styles used in the population, but it will tell you about the properties of the PSDQ. Specifically, it will tell you how many unmeasured constructs the questionnaire is tapping into.

Some reasons why you would want to conduct a factor analysis:

1) You have added new items to an existing scale. Factor analysis will provide evidence about what
dimensions (if any) these items are measuring.
2) You are using a modified version of a validated scale. Factor analysis will tell you if your
modifications have changed the dimensions of the questionnaire.
3) You are using a validated scale in a new population and you have reason to suspect that the scale's dimensions will not transfer exactly across populations or that the items will not load consistently across populations. Factor analysis will tell you if the dimensions and item loadings are the same/similar.
4) If previous factor analyses have provided inconsistent results. Replication will tell you what
factor structure is likely correct.

Bottom line: You should only be performing a factor analysis if you are interested in learning more about
the psychometric properties of the PSDQ.

There are two main types of factor analysis: exploratory and confirmatory. The overall objectives of your
research will dictate what approach you want to use. An exploratory analysis is useful if you have no
preconceived ideas about the data. A confirmatory factor analysis is useful if you have a hypothesis about
the underlying factor structure that you would like to confirm.

For the original parenting analysis, two exploratory factor analyses were conducted. A common
approach with scale validation studies is to randomly split your sample into two groups and see if your
results are replicated across samples. You may want to perform two exploratory factor analyses, or an
exploratory factor analysis on half your data and confirmatory factor analysis on the other half of your
data.

There are various approaches to split your sample randomly into two. Here is one example:

%let num=338; /* SAS will select 338 observations, which was half of the original dataset */

data sample;
set parent1;
random=ranuni(220377); /* ranuni(220377) generates a uniform random number; 220377 is the seed */
run;

proc sort data=sample;
by random;
run;

data sample3;
set sample;
if _N_ le &num;
samp=1; /* The samp=1 statement provides an easy identification variable for all of those in the first random sample. This is important if you later merge the subsamples back together */
run;

proc sort data=sample3;
by idnum;
run;

proc sort data=parent1;
by idnum;
run;

data together;
merge sample3 parent1;
by idnum;
if samp ne 1 then samp=2;
run;

/* The dataset together contains both random samples. The first sample (with 338 observations) is samp=1; everyone else is samp=2 */
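The same half-split logic can be sketched outside SAS. Below is a minimal Python illustration (the helper name `split_half` is hypothetical, not part of the original workflow; the split is seeded so it is reproducible):

```python
import random

def split_half(ids, seed=220377, n_first=338):
    """Shuffle record ids reproducibly and flag the first n_first as
    sample 1 and the rest as sample 2, mirroring the SAS
    ranuni / sort-by-random / _N_ approach above."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return {rec: (1 if i < n_first else 2) for i, rec in enumerate(shuffled)}
```

With 676 ids and n_first=338, exactly half of the records receive samp=1.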
You should verify that the distributions of variables are approximately equal in the two subsamples. Do not be concerned if the distribution of one or two variables differs across samples: statistically speaking, a test at the 0.05 level will produce a significant result by chance 1 time out of 20. This code will complete Tables 1 and 3.

Proc freq data=together;
Tables (ethnic agegp marry edu incomegp ps2 child)*samp/chisq;
Run;

Proc ttest data=together;
Class samp;
Var ps3-ps23;
Run;

Step 6: Conduct an exploratory factor analysis.

Proc factor data=together
method=ml
priors=smc
nfact=1
scree
rotate=varimax
flag=0.30
;
var ps3-ps23;
where samp=1; /* The where statement restricts the factor analysis to those in Sample 1. If you do not split your data, this statement is not necessary */
run;

There are many considerations when conducting a factor analysis. Some of the most important are the
extraction method, the rotation method and the final number of factors that best fit your data.

The method statement specifies the extraction method. Two of the most common extractions are
maximum likelihood (method=ml) and principal components (method=prin). Theoretically, the ML and
PRIN will produce similar results if your data are multivariate normally distributed. ML will provide
goodness of fit indices, whereas PRIN will perform better if the data are not multivariate normally
distributed (Note: these data produced similar solutions using PRIN and ML).

The rotate statement specifies the rotation method. There are two types of rotations: orthogonal and
oblique. Orthogonal rotations should be used if your factors are uncorrelated (Note: items within factors
will be correlated, but the factors themselves will not be). Oblique should be used if the factors are
correlated. Rotations can only be performed when at least two factors are extracted. The above code
extracts one factor (nfact=1), so a rotation is not necessary. The most common orthogonal rotation is the
varimax rotation (rotate=varimax). The most common oblique rotations are the oblimin and promax
rotations (rotate=promax; rotate=oblimin).
The nfact statement specifies the number of factors you want to extract. Explore the data in a one factor
solution first. This will help you decide how many overall factors you want to extract. If there is more
than one factor (dimension), re-run the analysis changing the nfact to the number of factors you want to
retain.

There are many ways to determine the best number of factors. The most commonly used are

1) Scree plots
2) Eigenvalues (a weak criterion)
3) Proportion of variance accounted for
4) RMSEA
5) Parallel analysis

Scree plots: The scree option in the proc factor statement requests scree plots. Scree plots plot the
Eigenvalues against the total number of factors. Below is a reproduction of a SAS scree plot based on
data from Sample 1.

[Scree plot: eigenvalue (y-axis) plotted against the number of factors (x-axis, 0 to 21)]

The point where the slope changes (known as the elbow) indicates how many factors to extract. Based on this plot, either 2 or 3 factors would be appropriate for these data. Note: the x-axis of a scree plot (i.e. the number of factors extracted) will always run to the number of items in the factor analysis. In this case, there are 21 items, so 21 factors were extracted. It does not matter how many factors you specify to extract (e.g. nfact=4 vs. nfact=20); the scree plot will always look the same in the same dataset.
Eigenvalues: Eigenvalues show the amount of variance in the data that is explained by a given factor. Eigenvalues should be greater than 1. The first factor with an eigenvalue below 1 is probably not important, but some judgment is needed. This criterion is widely used but heavily criticized: it is better to use one of the other methods described here to determine the number of factors to extract. Based on the output below, a two factor solution seems to fit the data best, but a three factor solution could still be plausible.

Preliminary Eigenvalues: Total = 7.81169526  Average = 0.37198549

      Eigenvalue     Difference    Proportion    Cumulative
1 4.72267026 2.31276510 0.6046 0.6046
2 2.40990516 1.45271793 0.3085 0.9131
3 0.95718723 0.32631501 0.1225 1.0356
4 0.63087223 0.20047022 0.0808 1.1164
5 0.43040200 0.03585966 0.0551 1.1715
6 0.39454235 0.12300235 0.0505 1.2220
7 0.27153999 0.05580430 0.0348 1.2567
8 0.21573570 0.04653355 0.0276 1.2843
9 0.16920215 0.10538425 0.0217 1.3060
10 0.06381789 0.05872322 0.0082 1.3142
11 0.00509467 0.02791481 0.0007 1.3148
12 -.02282014 0.06135301 -0.0029 1.3119
13 -.08417315 0.06852704 -0.0108 1.3011
14 -.15270018 0.09407157 -0.0195 1.2816
15 -.24677175 0.00285332 -0.0316 1.2500
16 -.24962507 0.01809539 -0.0320 1.2180
17 -.26772046 0.02231431 -0.0343 1.1838
18 -.29003477 0.06647203 -0.0371 1.1466
19 -.35650680 0.01274681 -0.0456 1.1010
20 -.36925361 0.05041483 -0.0473 1.0537
21 -.41966844 -0.0537 1.0000
Proportion of variance explained: This will be up to the researcher to decide. Generally, factors that explain at least 10% of the overall variance should be retained. Using the 10% criterion, a three factor solution seems to fit the data best (see the output above).
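As a quick illustration (not part of the SAS workflow), the eigenvalue-greater-than-1 rule and the 10%-of-variance rule can be applied directly to the preliminary eigenvalues printed above; `retained_factors` is a hypothetical helper:

```python
def retained_factors(eigenvalues, total=None, min_eigen=1.0, min_prop=0.10):
    """Count factors retained under (a) the eigenvalue > 1 rule and
    (b) the proportion-of-variance rule. `total` defaults to the sum
    of the supplied eigenvalues; pass the printed Total when only the
    leading eigenvalues are supplied."""
    if total is None:
        total = sum(eigenvalues)
    by_eigen = sum(1 for e in eigenvalues if e > min_eigen)
    by_prop = sum(1 for e in eigenvalues if e / total >= min_prop)
    return by_eigen, by_prop
```

With the leading eigenvalues from the output (4.7227, 2.4099, 0.9572, 0.6309) and Total = 7.8117, this returns 2 factors by the eigenvalue rule and 3 by the 10% rule, matching the text.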

RMSEA: SAS will not provide this output directly for an exploratory factor analysis: it must be
calculated by hand. Only a maximum likelihood extraction (method=ML) will provide the appropriate
information to calculate the RMSEA: you cannot calculate the RMSEA from a principal components
extraction (method=PRIN)

There are a few formulas to calculate an RMSEA. This is probably the most basic; values will differ slightly between methods:

RMSEA = square root of [ (chi-square/df - 1) / (n - 1) ]

Significance Tests Based on 338 Observations

Test                               DF    Chi-Square    Pr > ChiSq
H0: No common factors             210     1307.2173        <.0001
HA: At least one common factor
H0: 3 Factors are sufficient      150      258.7269        <.0001
HA: More factors are needed

The RMSEA for this model is: sqrt of [(258.73/150 - 1)/(338-1)] = 0.046

You will need to run a number of different factor models to determine the RMSEA, since the chi-square and df values are specific to each model (e.g. nfact=1; nfact=2; nfact=3). Some guidelines to interpret the RMSEA are provided below:

<0.05            Close fit
0.05 to 0.08     Acceptable fit
0.081 to 0.100   Marginal fit
>0.100           Poor fit
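The hand calculation above can be scripted. A small Python sketch of the same formula (the function name `rmsea` is an assumption, not SAS output):

```python
import math

def rmsea(chi_sq, df, n):
    """RMSEA = sqrt((chi-square/df - 1) / (n - 1)), floored at 0
    when chi-square < df (a common convention, since the ratio
    can otherwise go negative)."""
    return math.sqrt(max(chi_sq / df - 1.0, 0.0) / (n - 1))
```

Using the 3-factor test above, rmsea(258.7269, 150, 338) gives roughly 0.046, the close-fit value reported in the text.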
Parallel analysis: This approach uses data simulation to determine the best number of factors to retain. It is generally considered one of the strongest criteria. The macro for parallel analysis is provided in this paper:

Kabacoff R. Determining the dimensionality of data: a SAS macro for parallel analysis. SAS Users Group International (SUGI) 28. http://www2.sas.com/proceedings/sugi28/090-28.pdf

After entering the macro code exactly as written, place this code at the end to run the macro:

/* The data= option calls in the appropriate dataset. If you have split the data, make sure to run the parallel analysis on the right dataset; here it is named sample3 */
%parallel(data=sample3,
var=ps3-ps23,
niter=100,
statistic=Median);

Look at where the actual and simulated data intersect. In the paper that was linked to above, a two
factor solution seems best.

Step 7: Rerun your exploratory factor analysis, according to the appropriate number of factors (as
determined in Step 6).

Proc factor data=together
method=ml
priors=smc
nfact=3
scree
rotate=varimax
flag=0.30
;
var ps3-ps23;
where samp=1;
run;

/* Note: These data appeared to fit a 3-factor solution, hence nfact=3. This statement should change depending on the overall number of factors your data suggest. If the PSDQ scale behaves in your population as in others, you should extract 3 factors */

Review the factor loadings on your rotated solution. Some important considerations are:

Are items loading as expected? (i.e. are those variables intended to measure authoritarian parenting actually loading together on a common factor?)

Are these loadings significant? This is a subjective criterion. The most commonly used cut-offs are in the range of 0.30-0.40 (specify using the flag= option; the loadings in the output below are shown multiplied by 100).

Are items loading on multiple factors? Generally you want strong loadings on the main factor and weak loadings on the other factors, where weak loadings fall in the range of -0.10 to +0.10.
A 3 factor solution for parenting styles is presented below. Table 4 is a reproduction of these data.

Rotated Factor Pattern

                                                                      Factor1  Factor2  Factor3
ps3   I am responsive to my child's feelings and needs                  41 *     -14      -17
ps4   I discipline by taking away privileges from my child with
      little or no explanation                                          -5       45 *    -13
ps5   I give my child reasons why rules should be followed              50 *      -2       4
ps6   I use physical consequences as a way of disciplining my child     -1       43 *     12
ps7   I take into account my child's desires before asking him/her
      to do something                                                   36 *       4     -11
ps8   I scold and criticize to make my child improve                     0       44 *      3
ps9   I give in to my child when he/she causes a commotion about
      something                                                        -11       41 *     14
ps10  I give comfort and understanding when my child is upset           40 *     -19     -18
ps11  I use threats as consequences with little or no justification    -13       54 *     -1
ps12  I help my child to understand the impact of his/her behaviour     58 *     -10       0
ps13  I am confident about my parenting abilities                      -27        -2      44 *
ps14  I encourage my child to freely express himself/herself even
      when we disagree                                                  66 *      -6     -19
ps15  I threaten my child with consequences more often than
      actually giving it                                                 1       47 *     29
ps16  I give praise when my child is good                               37 *      -4      -6
ps17  I yell or shout when my child misbehaves                           4       47 *     31 *
ps18  I explain the consequences of bad behaviour to my child
      before he/she misbehaves                                          23       -11     -14
ps19  I discipline by putting my child off somewhere alone with
      little or no explanation                                         -18       61 *     -4
ps20  I encourage my child to express his/her opinions                  77 *      -2      -7
ps21  I find it difficult to discipline my child                       -15        16      53 *
ps22  I state consequences to my child and do not follow through        -6       42 *     23
ps23  I am unsure on how to solve my child's misbehaviour              -14        10      51 *

Consider removing items that do not load as hypothesized (e.g. ps18). Strong evidence for removing an item includes:

1) Low communalities. Low is generally anything below 0.10.

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 8.878798  Unweighted = 5.749348

Variable   Communality   Weight
ps3        0.21757351    1.27809402
ps4        0.22228781    1.28589530
ps5        0.25440987    1.34135466
ps6        0.19641921    1.24443866
ps7        0.14446442    1.16886047
ps8        0.19133242    1.23662203
ps9        0.19553997    1.24307898
ps10       0.22713695    1.29388988
ps11       0.30551412    1.43991578
ps12       0.34469610    1.52613339
ps13       0.26792128    1.36590957
ps14       0.47416929    1.90161001
ps15       0.30592755    1.44079836
ps16       0.13899039    1.16146264
ps17       0.31520180    1.46034432
ps18       0.08278463    1.09024850
ps19       0.40486968    1.68022966
ps20       0.60040360    2.50238997
ps21       0.33370240    1.50074750
ps22       0.23772341    1.31184893
ps23       0.28827914    1.40492534

2) Low correlations with other items on the same factor.

Proc corr data=together;
Var ps3 ps5 ps7 ps10 ps12 ps14 ps16 ps18 ps20; /* Only the 9 variables that were hypothesized to load, and did load, on factor 1 are included */
Run;

Step 8: If modifications are made to the scale (e.g. items deleted), remove the items and re-run the factor
analysis. Review the output.

Step 9: At this point, you may want to run another exploratory factor analysis or a confirmatory factor
analysis on the other half of your sample. Repeat steps 6-8 for an exploratory factor analysis. If you are
interested in a confirmatory factor analysis, a good guidebook is (also good for exploratory factor
analysis):

Hatcher L. 1994. A step by step approach to using SAS for factor analysis and structural equation
modeling. Cary, NC: SAS Institute Inc.

Other resources for factor analysis are:

Fabrigar LR & Wegener DT. 2012. Exploratory Factor analysis. Toronto, ON: Oxford University
Press.

Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor
analysis in psychological research. Psychological Methods. 1999;4:272-299.
Costello AB, Osborne JW. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation. 2005;10(7):1-9.

Step 10: Determine the internal reliability (i.e. alphas) for each scale dimension, i.e. each parenting style. Base your decisions about which variables to include on the factor analysis (if applicable). Otherwise, select the variables that were hypothesized a priori to measure each parenting style. A good scale's alpha should be at least 0.70. This code will produce the data in Table 6.

Proc corr data=together alpha; /* The ALPHA option provides internal reliability statistics */
Var ps3 ps5 ps7 ps10 ps12 ps14 ps16 ps18 ps20; /* The 9 variables that were hypothesized to load on the authoritative dimension. Any dropped items (as suggested by factor analysis) should not be included */
Where samp=1; /* Calculates alphas only for the first sample */
Run;
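The alpha reported by PROC CORR is Cronbach's alpha: k/(k-1) multiplied by (1 minus the sum of the item variances over the variance of the total score). A minimal Python sketch of that formula (hypothetical helpers, shown only to make the arithmetic concrete; sample variances use denominator n-1):

```python
def sample_var(xs):
    """Sample variance with denominator n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def cronbach_alpha(rows):
    """Cronbach's alpha for `rows` of complete item scores
    (one list of k item scores per respondent)."""
    k = len(rows[0])
    item_vars = [sample_var([r[i] for r in rows]) for i in range(k)]
    total_var = sample_var([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Two perfectly consistent items give alpha = 1.0; weaker inter-item agreement pulls alpha down toward (and below) 0.70.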

Step 11: Calculate the average score on each dimension of parenting (i.e. parenting style). The variables
you include in your calculation will depend on how well you think the particular item is measuring the
latent construct of interest. This decision can be based on consultations with experts, results from factor
analysis, the literature, or all of the above. This code calculates the average score on each parenting style
based on what the factor analysis suggested and how well this fit the hypothesized data structure.

Data parent2;
Set together;
Authoritative = mean (of ps3 ps5 ps7 ps10 ps12 ps14 ps16 ps20);
Authoritarian = mean (of ps4 ps6 ps8 ps11 ps17 ps19);
Permiss = mean (of ps13 ps21 ps23);
Run;
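Note that the SAS mean(of ...) function averages over the non-missing arguments only, so a respondent missing one item would still receive a score based on the remaining items (in this workflow, observations with missing items were already deleted in Step 3). A small Python sketch of that behaviour, with the hypothetical helper `dim_mean` and None standing in for SAS missing:

```python
def dim_mean(scores):
    """Average of the non-missing item scores, mirroring SAS mean(of ...),
    which skips missing values rather than propagating them."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present) if present else None
```

For example, dim_mean([4, None, 2]) is 3.0, and an all-missing respondent gets a missing score.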
Step 12: Evaluate how parenting styles vary by demographic and household characteristics of respondents. If the outcome variables (authoritative, authoritarian, permissive) are moderately correlated (r = +/- 0.6) or are strongly negatively correlated, MANOVAs are a better analytic strategy. When data do not meet these criteria, there is little benefit to performing a MANOVA instead of multiple ANOVAs or regressions. If you do proceed with an ANOVA or regression, adjust your p-values for the number of comparisons by dividing your overall desired p-value by the number of dependent variables. In this case, there are three outcomes, so if p=0.05 is your desired overall alpha level, divide 0.05/3 = 0.017.

/* The number of children at home is the independent variable in these three models. Each has a separate outcome variable: authoritative, authoritarian, and permissive parenting. Because there are three outcomes, a Bonferroni adjustment is requested: the alpha level for the mean comparisons (lsmeans) is set at 0.017. Using surveyreg allows you to control for the stratified sampling strategy (strata h_unit). If you want to include weights, add a (weight nameofweightvar) statement */

proc surveyreg data=parent2;
class child;
model authoritative=child/solution clparm;
strata h_unit;
lsmeans child/alpha=0.017;
title 'children authoritative';
run;

proc surveyreg data=parent2;
class child;
model authoritarian=child/solution clparm;
strata h_unit;
lsmeans child/alpha=0.017;
title 'children authoritarian';
run;

proc surveyreg data=parent2;
class child;
model permiss=child/solution clparm;
strata h_unit;
lsmeans child/alpha=0.017;
title 'children permissive';
run;
Rerun the above code for each independent variable of interest (e.g. ps2, agegp, marry) to produce Tables 7 and 8. Note: Tables 7 and 8 provide similar information, but the items included when calculating the mean scores of authoritative, authoritarian and permissive parenting were different. Please see the footnote at the bottom of each table to see which variables were included.

Step 13: Perform a cluster analysis. There are many different statistical approaches to cluster analysis. Proc cluster implements a hierarchical clustering analysis; proc fastclus implements a k-means cluster analysis; proc lca implements a latent class analysis (proc lca is not in the base SAS software and must be downloaded from the Penn State Methodology Centre: http://methodology.psu.edu/). This analysis used a k-means strategy, the most commonly used approach.

Before you conduct a cluster analysis, it is useful to standardize your clustering variables to a mean of 0
and standard deviation of 1.

proc standard out=stand data=parent2 mean=0 std=1;
var ps3-ps23;
run;

If you are interested in performing a cluster analysis separately for mothers (ps2=1) and fathers (ps2=2), you will need to sort your data first by the relevant variable (ps2), then specify where variable=level in your cluster code, i.e. where ps2=1.

proc sort data=stand;
by ps2;
run;

/* Maxc specifies the number of clusters you want to extract. Out=clus2f outputs a dataset named clus2f containing information on cluster membership: for a two-cluster solution, individuals will be assigned to cluster 1 or cluster 2. The where statement restricts the cluster analysis to mothers (ps2=1) */
proc fastclus data=stand maxc=2 out=clus2f maxiter=100;
title "K-Means Two-Cluster Solution";
var ps3-ps23;
where ps2=1;
run;
With a k-means cluster analysis, you will need to run the model multiple times specifying various
numbers of clusters to determine the best number of clusters for your data. Start with 2 (maxc=2) and
work your way up to as many as you think are conceptually plausible and also interpretable (e.g.
maxc=6).

There are many methods that you can use to determine the best number of clusters. This article provides a nice synthesis:

Milligan G, Cooper M. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 1985;50(2):159-179.

Two simple methods are:

1) The pseudo F-test
2) The Cubic Clustering Criterion

Both of these statistics are provided automatically in the fastclus output:

Pseudo F Statistic = 30.51
Approximate Expected Over-All R-Squared = 0.13609
Cubic Clustering Criterion = 32.782

Pseudo F-test: The cluster analysis that produces the highest value of the pseudo F-statistic is the model that fits your data best. The actual value of the pseudo-F will depend on your data. Re-run the fastclus code with a range of different clusters (maxc=2; maxc=3; maxc=4, etc.) to find the best pseudo F-statistic.

Cubic Clustering Criterion: Again, you want to maximize values of the CCC. This statistic may not be valid when data are correlated. Interpret the CCC as follows:

3 or greater             Good fit
0-2.9                    Potential clusters
<0 (negative values)     Outliers
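The pseudo F is a ratio of between-cluster to within-cluster variance (the Calinski-Harabasz index): [SSB/(k-1)] divided by [SSW/(n-k)]. A small Python sketch of that ratio for a single clustering variable (the helper `pseudo_f` is hypothetical and for illustration only; fastclus computes the statistic across all clustering variables):

```python
def pseudo_f(points, labels):
    """Pseudo F (Calinski-Harabasz) for a one-dimensional clustering:
    between-cluster sum of squares over k-1, divided by
    within-cluster sum of squares over n-k."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    grand_mean = sum(points) / n
    ssb = ssw = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        m = sum(members) / len(members)
        ssb += len(members) * (m - grand_mean) ** 2
        ssw += sum((p - m) ** 2 for p in members)
    return (ssb / (k - 1)) / (ssw / (n - k))
```

A clustering that matches well-separated groups scores far higher than an arbitrary split of the same points, which is why the best-fitting number of clusters maximizes the pseudo F.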

Step 14: Determine the mean scores of parenting styles in each cluster. The alpha=0.017 produces confidence limits that are adjusted for the number of dependent variables. This code produces the data

proc sort data=clus2f;
by cluster;
run;

/* Calculating the means of each parenting style, by cluster, in the dataset clus2f (created in Step 13). The strata h_unit statement indicates a stratified sampling strategy (where h_unit forms the strata) */
proc surveymeans data=clus2f alpha=0.017;
by cluster;
var authoritative authoritarian permiss;
strata h_unit;
run;

Step 15: Test whether the clusters are significantly different from each other. Conventional wisdom suggests they would be (the intent of cluster analysis is to partition the population into the groups to which members are most similar). This code produces the data in Tables 10 and 11.

/* Make sure your models account for the survey design (use surveyreg) and that you adjust for the 3 dependent variables (alpha=0.0166). Run separate models for each outcome and for males and females */

proc surveyreg data=clus2f;
class cluster;
model authoritative=cluster/solution clparm;
strata h_unit;
title 'authoritative cluster f';
lsmeans cluster/alpha=0.0166;
run;

proc surveyreg data=clus2f;
class cluster;
model authoritarian=cluster/solution clparm;
strata h_unit;
title 'authoritarian cluster f';
lsmeans cluster/alpha=0.0166;
run;

proc surveyreg data=clus2f;
class cluster;
model permiss=cluster/solution clparm;
strata h_unit;
title 'permissive cluster f';
lsmeans cluster/alpha=0.0166;
run;
