
MODULE 2 - BASIC CONCEPTS AND DESCRIPTIVE STATISTICS

Different kinds of variables

There are two major types of variables (Rosenthal, 2001):

Quantitative variables (numeric variables) take on numeric values.
Example: Height and number of children in a family

Qualitative variables are nonnumeric. Their values represent (nonnumeric) categories.


Example: Sex and political party affiliation

Qualitative variables are also called categorical variables. A categorical variable with exactly two categories is a dichotomous variable. For example, sex may take on only two values, female and male.

Variables may also be classified as discrete or continuous. A discrete variable can take on only a limited number of values. All categorical variables are discrete. A continuous variable can take on an unlimited number of values between any two selected values.

In applying statistical techniques, it is important to consider the type of measurement. You may prefer to substitute the terms interval or quantitative for continuous and nominal, categorical, or qualitative for dichotomous and discrete (Tabachnick & Fidell, 2001).

Levels of measurement

When some characteristic is measured, researchers are able to assign to it a series of numbers according to a set of rules (Levin & Fox, 2003). Numbers have at least three important functions for social researchers, depending on the particular level of measurement that they employ. Specifically, series of numbers can be used to
1. classify or categorize at the nominal level of measurement
2. rank or order at the ordinal level of measurement
3. assign a score at the interval level of measurement

Nominal level

The nominal level of measurement involves naming or labeling, that is, placing cases into categories and counting their frequency of occurrence.
To illustrate, we might use a nominal level measure to indicate whether each respondent is prejudiced or tolerant toward Latinos. As shown in Table 2, we might question the 10 students in a given class and determine that 5 can be regarded as (1) prejudiced and 5 can be considered (2) tolerant (Levin & Fox, 2003, p. 9).

Table 2 Attitudes of 10 college students toward Latinos: Nominal data


Attitude toward Latinos    Frequency
1 = prejudiced                 5
2 = tolerant                   5
Total                         10
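Counting cases per category, as in Table 2, can be sketched in Python. This is only an illustration: the individual responses below are hypothetical codes chosen so that the totals match Table 2, and the variable names are not from the module.

```python
from collections import Counter

# Hypothetical coded responses for 10 students (1 = prejudiced, 2 = tolerant).
# Only the totals (5 and 5) come from Table 2; the order is made up.
responses = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]
labels = {1: "prejudiced", 2: "tolerant"}

# At the nominal level the codes are only labels, so counting is the
# meaningful operation; averaging the codes would not be.
frequency = Counter(labels[code] for code in responses)

for category, count in frequency.items():
    print(f"{category}: {count}")
print(f"Total: {sum(frequency.values())}")
```

The numeric codes could be swapped (1 = tolerant, 2 = prejudiced) without changing any of the counts, which is what makes the measure nominal.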

PSYSTA2

Module 2 -Descriptive statistics

Dr. Felicidad T. Villavicencio

Other nominal-level measures in social research are sex (male versus female), marital status (single, married, widowed, separated), political party (Liberal, Nacionalista), religion (Christian, Non-Christian), and time orientation (present, past, and future), to mention only a few. Nominal data are not graded, ranked, or scaled for qualities, such as better or worse, higher or lower, more or less. Clearly, then, a nominal measure of sex does not signify whether males are superior or inferior to females. Nominal data are merely labeled, sometimes by name (male versus female), other times by number (1 versus 2), but always for the purpose of grouping the cases into separate categories to indicate sameness or differentness with respect to a given quality or characteristic.

Ordinal level

When the researcher goes beyond this level of measurement and seeks to order his or her cases in terms of the degree to which they have any given characteristic, he or she is working at the ordinal level of measurement. The nature of the relationship among ordinal categories depends on the characteristic the researcher seeks to measure.
To illustrate, one might classify individuals with respect to socioeconomic status as lower class, middle class, or upper class. Or, rather than categorize the students in a given classroom as either prejudiced or tolerant, the researcher might rank them according to their degree of prejudice against Latinos, as indicated in Table 3.

Table 3 Attitudes of 10 college students toward Latinos: Ordinal data


Student    Rank
Joyce       1 = most prejudiced
Paul        2 = second
Cathy       3 = third
Mike        4 = fourth
Judy        5 = fifth
Joe         6 = sixth
Kelly       7 = seventh
Ernie       8 = eighth
Linda       9 = ninth
Ben        10 = least prejudiced

The ordinal level of measurement yields information about the ordering of categories, but does not indicate the magnitude of differences between numbers. For instance, the researcher who employs an ordinal-level measure to study prejudice toward Latinos does not know how much more prejudiced one respondent is than another. In the example given (Table 3), it is not possible to determine how much more prejudiced Joyce is than Paul or how much less prejudiced Ben is than Linda or Ernie. This is because the intervals between the points or ranks on an ordinal scale are not known or meaningful. Therefore, it is not possible to assign scores to cases located at points along the scale (Levin & Fox, 2003, p. 10).

Interval level

By contrast, the interval level of measurement not only tells us about the ordering of categories but also indicates the exact distance between them. Interval measures use constant units of measurement (for example, dollars or cents, Fahrenheit or Celsius, yards or feet, minutes or seconds) that yield equal intervals between points on the scale. To illustrate, an interval measure of prejudice against Latinos, such as a set of responses to a series of questions about Latinos that is scored from 0 to 100 (100 is extreme prejudice), might yield the data shown in Table 4 about the 10 students in a given classroom.

Table 4 Attitudes of 10 college students toward Latinos: Interval data


Student    Score
Joyce       98
Paul        96
Cathy       95
Mike        94
Judy        22
Joe         21
Kelly       20
Ernie       15
Linda       11
Ben          6

Higher scores indicate greater prejudice against Latinos
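The difference between ordinal and interval information can be sketched with the scores in Table 4. The sketch below derives the ranks of Table 3 from the scores and then compares two gaps between adjacent ranks; the variable names are illustrative only.

```python
# Scores from Table 4 (higher = more prejudiced).
scores = {
    "Joyce": 98, "Paul": 96, "Cathy": 95, "Mike": 94, "Judy": 22,
    "Joe": 21, "Kelly": 20, "Ernie": 15, "Linda": 11, "Ben": 6,
}

# Ordinal information: rank 1 = most prejudiced, 10 = least prejudiced.
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {student: position for position, student in enumerate(ordered, start=1)}

# Interval information: distances between scores are meaningful.
gap_mike_judy = scores["Mike"] - scores["Judy"]    # adjacent ranks 4 and 5
gap_linda_ben = scores["Linda"] - scores["Ben"]    # adjacent ranks 9 and 10

print(ranks)
print(gap_mike_judy, gap_linda_ben)
```

Both pairs are one rank apart, yet the score gaps differ greatly (72 points versus 5 points). The ranks alone cannot show this.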

As presented in Table 4, we are able to order the students in terms of their prejudices and, in addition, indicate the distances separating one from another. For instance, it is possible to say that Ben is the least prejudiced member of the class, because he received the lowest score. We can also say that Ben is only slightly less prejudiced than Linda or Ernie but much less prejudiced than Joyce, Paul, Cathy, or Mike, all of whom received extremely high scores. Depending on the purpose for which the study is designed, such information might be important to determine, but it is not available at the ordinal level of measurement.

Univariate statistics summarize data for a single variable. Percentages are univariate statistics.
Example: 45% of the participants are male.

Bivariate statistics summarize the degree of relationship between two variables. Two variables have a relationship, that is, they are related, when the values observed for one variable vary, differ, or change according to those of the other.
Example: Gender and height are related because height varies by sex (men tend to be taller than women).

Size of association (strength of association) concerns the degree to which the values of one variable vary, differ, or change according to changes in the values of the other.

Multivariate statistics summarize relationships involving three or more variables. Multivariate approaches are sometimes used to address whether a relationship between two variables is a causal relationship. In a causal relationship, one variable affects another. Other relationships may not be causal but instead reflect the influence of confounding variables. Confounding variables are variables that affect the pattern of association between two other variables.

Researchers often classify variables as independent or dependent.
Independent variables are presumed to affect or cause dependent variables.
Dependent variables are presumed to be affected by (caused by) independent variables.

Key concepts in univariate descriptive statistics

Descriptive statistics is a frequently used statistical procedure. Descriptive statistics are designed to give you information about the distribution of your variables.

Measures of central tendency (Mean, Median, Mode)
Measures of variability around the mean (Std deviation and Variance)
Measures of deviation from normality (Skewness and Kurtosis)
Information concerning the spread of the distribution (Maximum, Minimum, and Range)
Information about the stability or sampling error of certain measures, including standard error (S.E.) of the mean (S.E. mean), S.E. of the kurtosis, and S.E. of the skewness

Statistical significance

Significance is typically designated with words such as significance, statistical significance, or probability. The latter word is the source of the letter that represents significance, the letter p. The p value identifies the likelihood that a particular outcome may have occurred by chance. For instance, group A may score an average of 32 on a scale of depression while group B scores 43 on the same scale. If a t test determines that group A differs from group B at a p = .01 level of significance, a difference at least this large would be expected only about 1 time in 100 if chance alone were operating, so the discrepancy in scores is treated as a reliable finding. Regardless of the type of analysis, the p value identifies the likelihood that a particular outcome occurred by chance.
A chi-square analysis identifies whether observed values differ significantly from expected values.
A t test or ANOVA identifies whether the mean of one group differs significantly from the mean of another group or groups.
Correlations and regressions identify whether two or more variables are significantly related to each other.

In all instances, a significance value will be calculated identifying the likelihood that a particular outcome is or is not reliable. Within the context of research in the social sciences, nothing is ever proved. It is demonstrated or supported at a certain level of likelihood or significance. The smaller the p value, the stronger the evidence that the finding is not due to chance.
p < .05: the result is considered statistically significant
p between .05 and .10: the result is considered marginally significant
p = .001 to .0001: the smaller the value, the greater the confidence the researcher has that the findings are valid
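What "occurred by chance" means can be made concrete with a small sketch. It does not run a t test; it uses an exact permutation test instead, which asks how often a random reshuffling of the scores into two groups produces a gap at least as large as the observed one. The scores are hypothetical, chosen only so that the group means are 32 and 43 as in the depression example.

```python
from itertools import combinations

# Hypothetical depression scores, chosen so the group means are 32 and 43
# as in the example above. These are not real data.
group_a = [30, 31, 32, 33, 34]
group_b = [41, 42, 43, 44, 45]

pooled = group_a + group_b
n_a = len(group_a)
total = sum(pooled)

# Both groups have the same size, so comparing sums is equivalent to
# comparing means and avoids floating-point rounding.
observed_gap = abs(sum(group_b) - sum(group_a))

# Enumerate every way of splitting the 10 scores into two groups of 5 and
# count the splits whose gap is at least as large as the observed one.
n_splits = 0
n_extreme = 0
for indices in combinations(range(len(pooled)), n_a):
    sum_a = sum(pooled[i] for i in indices)
    gap = abs((total - sum_a) - sum_a)
    n_splits += 1
    if gap >= observed_gap:
        n_extreme += 1

p_value = n_extreme / n_splits
print(f"{n_extreme} of {n_splits} splits, p = {p_value:.4f}")
```

With these made-up scores only 2 of the 252 possible splits are as extreme as the observed one, so p is below .01 and the difference would be called statistically significant.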

The normal distribution

A normal distribution is symmetric about the mean or average value. In a normal distribution, about 68% of the values will lie within plus-or-minus (±) 1 standard deviation of the mean, about 95% of the values will lie within ±2 standard deviations of the mean, and more than 99% (about 99.7%) of the values will lie within ±3 standard deviations of the mean. A normal distribution is illustrated below.

[Figure: normal curve with the horizontal axis marked at M-3SD, M-2SD, M-1SD, M+1SD, M+2SD, and M+3SD]
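The proportions of a normal distribution that fall within 1, 2, and 3 standard deviations of the mean can be checked with a short sketch. It uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in coverage.items():
    print(f"within {k} SD: {proportion:.4f}")
```

The three proportions come out at roughly 0.68, 0.95, and 0.997, and they are the same for any normal distribution, whatever its mean and standard deviation.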

Example: The average (or mean) height of a Filipino male adult is 64 inches (5 ft 4 in.) with a standard deviation of 4 inches. Thus, about 68% of Filipino men are between 5 ft and 5 ft 8 in. (64 ± 4), about 95% are between 4 ft 8 in. and 6 ft (64 ± 8), and more than 99% are between 4 ft 4 in. and 6 ft 4 in. (64 ± 12).

Measures of central tendency
Mean: the average value of the distribution, or the sum of all values divided by the number of values.
Median: the middle value of the distribution.
Mode: the most frequently occurring value.
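The three measures can be computed with Python's standard `statistics` module. The data below are hypothetical (number of children in seven families) and serve only to show that the three measures can differ.

```python
import statistics

# Hypothetical data: number of children in seven families.
children = [1, 2, 2, 3, 4, 6, 10]

mean = statistics.mean(children)      # sum of values / number of values
median = statistics.median(children)  # middle value of the sorted data
mode = statistics.mode(children)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean is 4, the median is 3, and the mode is 2. The one large family pulls the mean above the median.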

Measures of variability around the mean

Variance: a measure of the spread of the scores in a distribution of scores, that is, a measure of dispersion. The larger the variance, the further the individual cases are from the mean. The smaller the variance, the closer the individual scores are to the mean.

Standard deviation (SD): shows the spread, variability, or dispersion of scores in a distribution of scores. It is a measure of the average amount the scores in a distribution deviate from the mean. The more widely the scores are spread out, the larger the standard deviation. The SD is an important statistic in its own right and also because it is the basis of other statistics such as correlations and standard errors, as well as for all other standard scores, such as the stanine and the z score.
For example, in Table 1, three distributions of scores, A, B, and C, are shown with their means and standard deviations. Like other measures of dispersion, the SD tells you how good the measure of central tendency (in this case the mean) is as an estimate of a value in the distribution. In distribution A, the mean is a perfect estimate, and the SD is zero. In distribution C, by contrast, the SD is high, and the mean of 35 is a poor estimate of any particular score in the distribution.

Table 1

        Distribution
         A      B      C
        35     28      1
        35     29      2
        35     30      4
        35     32      5
        35     34     24
        35     36     46
        35     38     65
        35     40     66
        35     41     68
        35     42     69
Mean    35     35     35
SD      0.0    5.2   30.6
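The SD values for distributions A and C in Table 1 can be reproduced with a short sketch. It assumes the sample formula, which divides the sum of squared deviations by n - 1; the module does not state which formula was used, but this one matches the SD of 30.6 reported for C.

```python
import statistics

# Distributions A and C from Table 1.
dist_a = [35] * 10
dist_c = [1, 2, 4, 5, 24, 46, 65, 66, 68, 69]

for name, scores in (("A", dist_a), ("C", dist_c)):
    mean = statistics.mean(scores)
    variance = statistics.variance(scores)  # sample variance (n - 1)
    sd = statistics.stdev(scores)           # square root of the variance
    print(f"{name}: mean={mean}, variance={variance:.1f}, SD={sd:.1f}")
```

Both distributions have a mean of 35, but only in A does the mean describe every score exactly.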

Measures of deviation from normality

Kurtosis: a measure of the peakedness or the flatness of a distribution.
A kurtosis value near zero (0) indicates a shape close to normal (mesokurtic).
A positive value for the kurtosis indicates a distribution more peaked than normal (leptokurtic).
A negative kurtosis indicates a shape flatter than normal (platykurtic).
An extreme negative kurtosis (e.g., < -5.0) indicates a distribution where more of the values are in the tails of the distribution than around the mean.

A kurtosis value between ±1.0 is considered excellent for most psychometric purposes, but a value between ±2.0 is in many cases also acceptable, depending on the particular application.

Skewness: measures to what extent a distribution of values deviates from symmetry around the mean.
A value of zero (0) represents a symmetric or evenly balanced distribution.
A positive skewness indicates a greater number of smaller values.
A negative skewness indicates a greater number of larger values.

As with kurtosis, a skewness value between ±1.0 is considered excellent for most psychometric purposes, but a value between ±2.0 is in many cases also acceptable, depending on your particular application.
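Skewness and kurtosis can be computed by hand with a short sketch. It uses the bias-adjusted sample formulas that spreadsheet and statistics packages commonly report; that the software used in this module applies exactly these formulas is an assumption. The function names and the two samples are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis (near 0 for a normal-shaped sample)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

# Hypothetical samples, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 12]

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```

The evenly spread sample has zero skewness and a negative (flatter than normal) kurtosis, while the sample with many small values and one large value has positive skewness.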

Measures for size of the distribution

Maximum value, minimum value, range, and sum are measures of the size of the distribution.

Measures of stability: Standard error

Standard error (SE) is often short for standard error of the mean or standard error of estimate. The smaller the standard error, the better the sample statistic is as an estimate of the population parameter, at least under most conditions. The standard error is a measure of sampling error; it refers to error in estimates resulting from random fluctuations in samples. SE goes down as N goes up. Thus, standard error is designed to be a measure of stability or of sampling error. A small value of standard error for skewness or kurtosis indicates greater stability or smaller sampling error.
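The statement that SE goes down as N goes up can be sketched for the standard error of the mean, which equals the standard deviation divided by the square root of N. The standard deviation of 10 below is hypothetical, and the function name is illustrative.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

# Hypothetical standard deviation; only the sample size changes.
sd = 10.0
for n in (25, 100, 400):
    print(f"N = {n:3d}: SE of the mean = {standard_error_of_mean(sd, n):.2f}")
```

Quadrupling the sample size halves the standard error, so gains in stability become more expensive as N grows.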

Displaying Results

Tables and figures enable authors to present a large amount of information efficiently and to make their data more comprehensible. Tables usually show numerical values or textual information (e.g., lists of stimulus words) arranged in an orderly display of columns and rows. A figure may be a chart, a graph, a photograph, a drawing, or any other illustration or nontextual depiction (APA, 2010, p. 125).

Publication Manual of the American Psychological Association (2010, p. 117)

If you present descriptive statistics in a table or figure, you do not need to repeat them in text, although you should (a) mention the table in which the statistics can be found and (b) emphasize particular data in the narrative when they help in interpretation of the findings.

Standards for Figures (APA, 2010, p. 152)

The standards for good figures are simplicity, clarity, continuity, and (of course) information value. A good figure augments rather than duplicates the text; conveys only essential facts; omits visually distracting detail; is easy to read (its elements, such as type, lines, labels, and symbols, are large enough to be read with ease); is easy to understand (its purpose is readily apparent); is consistent with and in the same style as similar figures in the same article; and is carefully planned and prepared.

Types of Figures (APA, 2010, p. 151)

Graphs typically display the relationship between two quantitative indices or between a continuous quantitative variable (usually displayed on the y-axis) and groups of subjects (displayed along the x-axis).

Charts generally display nonquantitative information such as the flow of subjects through a process (for example, flow charts).
Maps generally display spatial information.
Drawings show information pictorially.
Photographs contain direct visual representations of information.

Statistica output: Descriptive statistics

Statistics/Analyze/Basic statistics/Descriptive statistics/ok/variables/advanced/check variables/summary descriptive statistics
Descriptive Statistics (Dataset descriptive stat)

Variable   Valid N   Mean       Minimum    Maximum    Variance   Std.Dev.   Skewness    Std.Err.   Kurtosis    Std.Err.
                                                                                        Skewness               Kurtosis
TEST       220       21.82727   8.000000   41.00000   42.31706   6.505156    0.560268   0.164033    0.102014   0.326632
GPA        220        1.54886   1.000000    3.00000    0.25359   0.503582    0.693664   0.164033   -0.275578   0.326632
Mean JOY   220        3.35227   1.400000    5.00000    0.35812   0.598434   -0.009794   0.164033    0.215204   0.326632
Mean ANG   220        2.19899   1.000000    4.77778    0.38640   0.621612    0.526644   0.164033    0.634267   0.326632
Mean AXT   220        2.65848   1.000000    4.93333    0.45105   0.671606   -0.005019   0.164033    0.202180   0.326632
Mean HOP   220        2.45948   1.000000    5.00000    0.64385   0.802403    0.397848   0.164033   -0.137514   0.326632
Mean BOR   220        1.96818   1.000000    4.50000    0.46195   0.679666    0.880982   0.164033    0.861060   0.326632

Table 1 Descriptive Statistics of Variables under Study


Variable       Valid N   Mean    Minimum   Maximum   Variance   Std.Dev.   Skewness   Std.Err. of   Kurtosis   Std.Err. of
                                                                                      Skewness                 Kurtosis
TEST           220       21.83   8.00      41.00     42.32      6.51        0.56      0.16           0.10      0.33
GPA            220        1.55   1.00       3.00      0.25      0.50        0.69      0.16          -0.28      0.33
Enjoyment      220        3.35   1.40       5.00      0.36      0.60       -0.01      0.16           0.22      0.33
Anger          220        2.20   1.00       4.78      0.39      0.62        0.53      0.16           0.63      0.33
Anxiety        220        2.66   1.00       4.93      0.45      0.67       -0.01      0.16           0.20      0.33
Hopelessness   220        2.46   1.00       5.00      0.64      0.80        0.40      0.16          -0.14      0.33
Boredom        220        1.97   1.00       4.50      0.46      0.68        0.88      0.16           0.86      0.33
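The standard errors of skewness and kurtosis are identical for every variable in the table because they depend only on the sample size. The sketch below uses the usual textbook formulas for these standard errors; that Statistica computes them this way is an assumption, although the results agree closely with the 0.164033 and 0.326632 in the output. The function names are illustrative.

```python
import math

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of (excess) kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 220  # Valid N in the table above
print(f"SE skewness: {se_skewness(n):.6f}")
print(f"SE kurtosis: {se_kurtosis(n):.6f}")

# Std.Dev. is the square root of Variance, e.g. for TEST:
print(f"SD of TEST: {math.sqrt(42.31706):.4f}")
```

Because both standard errors shrink as N grows, the same skewness or kurtosis value is a more stable estimate in a larger sample.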
