
Z-test vs. T-test

Sometimes, measuring every single item in a population is just not practical.

That is why we developed and use statistical methods to solve problems. The most practical approach is to measure just a sample of the population. Some methods test hypotheses by comparison. Two of the better-known statistical hypothesis tests are the T-test and the Z-test. Let us try to break down the two.

A T-test is a statistical hypothesis test in which the test statistic follows a Student's t-distribution if the null hypothesis is true. The t-statistic was introduced by W. S. Gosset under the pen name "Student," which is why the T-test is also referred to as the Student's T-test. The T-test is very likely the most commonly used statistical procedure for hypothesis testing, since it is straightforward and easy to use. Additionally, it is flexible and adaptable to a broad range of circumstances.

There are various T-tests; the two most commonly applied are the one-sample and two-sample T-tests. One-sample T-tests are used to compare a sample mean with the known population mean. Two-sample T-tests, on the other hand, are used to compare either independent samples or dependent (paired) samples.

A T-test is best applied, at least in theory, if you have a limited sample size (n < 30), as long as the variables are approximately normally distributed and the variation of scores in the two groups is not reliably different. It is also appropriate if you do not know the population's standard deviation. If the standard deviation is known, it would be best to use another type of statistical test, the Z-test.

The Z-test is also applied to compare sample and population means to determine whether there is a significant difference between them. Z-tests always use a normal distribution and are ideally applied when the standard deviation is known. Z-tests are applied when certain conditions are met; otherwise, other statistical tests like T-tests are applied instead.

Z-tests are often applied with large samples (n > 30). When a T-test is used with a large sample, it becomes very similar to the Z-test; however, fluctuations can occur in a T-test's sample variance that do not exist in a Z-test, so there can be small differences between the two tests' results.

Summary:
1. Z-test is a statistical hypothesis test that follows a normal distribution, while T-test follows a Student's t-distribution.
2. A T-test is appropriate when you are handling small samples (n < 30), while a Z-test is appropriate when you are handling moderate to large samples (n > 30).
3. T-test is more adaptable than Z-test, since Z-test will often require certain conditions to be reliable. Additionally, T-test has many methods that will suit any need.
4. T-tests are more commonly used than Z-tests.

5. Z-tests are preferred over T-tests when standard deviations are known.

Read more: Difference Between Z-test and T-test | Difference Between | Z-test vs T-test http://www.differencebetween.net/miscellaneous/differencebetween-z-test-and-t-test/#ixzz1SHI026NN
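To make the contrast concrete, here is a minimal sketch in Python of a one-sample T-test next to a one-sample Z-test on the same data. It assumes NumPy and SciPy are installed; the sample values, hypothesized mean, and "known" sigma are invented for illustration.

import numpy as np
from scipy import stats

sample = np.array([4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 4.7, 5.2])  # small sample, n = 8
mu0 = 5.0  # hypothesized population mean

# T-test: population standard deviation unknown, estimated from the sample.
t_stat, t_pval = stats.ttest_1samp(sample, popmean=mu0)
print(f"t = {t_stat:.3f}, p = {t_pval:.3f}")

# Z-test: population standard deviation assumed known (sigma = 0.3 here).
sigma = 0.3
z_stat = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
z_pval = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value from the normal distribution
print(f"z = {z_stat:.3f}, p = {z_pval:.3f}")

As the sample size grows, the t-distribution approaches the normal distribution and the two tests give nearly identical results, which is the convergence described in the summary above.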

Standardized test
From Wikipedia, the free encyclopedia


Young adults in Poland sit for their Matura exams. The Matura is standardized so that universities can easily compare results from students across the entire country.

A standardized test is a test that is administered and scored in a consistent, or "standard", manner. It is constructed by specialists and experts based on standardized norms and principles. Standardized tests are designed in such a way that the questions, conditions for administering, scoring procedures, and interpretations are consistent[1] and are administered and scored in a predetermined, standard manner.[2] Any test in which the same test is given in the same manner to all test takers is a standardized test. Standardized tests need not be high-stakes tests, time-limited tests, or multiple-choice tests. The opposite of a standardized test is a non-standardized test. Non-standardized testing gives significantly different tests to different test takers, or gives the same test under significantly different conditions (e.g., one group is permitted far less time to complete the test than the next group), or evaluates them differently (e.g., the same answer is counted right for one student, but wrong for another student). Standardized tests are perceived as being more fair than non-standardized tests. The consistency also permits more reliable comparison of outcomes across all test takers.

History

The earliest evidence of standardized testing was in China,[3] where the imperial examinations covered the Six Arts which included music, archery and horsemanship, arithmetic, writing, and knowledge of the rituals and ceremonies of both public and private parts. Later, the studies (military strategies, civil law, revenue and taxation, agriculture and geography) were added to the testing. In this form, the examinations were institutionalized during the 6th century CE, under the Sui Dynasty.

Standardized testing was not traditionally a part of Western pedagogy; based on the sceptical and open-ended tradition of debate inherited from Ancient Greece, Western academia favored non-standardized assessments using essays written by students. However, given the large number of school students during and after the Industrial Revolution, when compulsory education laws increased student populations, open-ended assessment of all students decreased. Moreover, the lack of a standardized process introduces a substantial source of measurement error, as graders might show favoritism or might disagree with each other about the relative merits of different answers. More recently, it has been shaped in part by the ease and low cost of grading of multiple-choice tests by computer. Grading essays by computer is more difficult, but is also done. In other instances, essays and other open-ended responses are graded according to a pre-determined assessment rubric by trained graders.
United States

Further information: List of standardized tests in the United States The use of standardized testing in the United States is a 20th-century phenomenon with its origins in World War I and the Army Alpha and Beta tests developed by Robert Yerkes and colleagues.[4] In the United States, the need for the federal government to make meaningful comparisons across a highly de-centralized (locally controlled) public education system has also contributed to the debate about standardized testing, including the Elementary and Secondary Education Act of 1965 that required standardized testing in public schools. US Public Law 107-110, known as the No Child Left Behind Act of 2001 further ties public school funding to standardized testing.
Design and scoring

Some standardized testing uses multiple-choice tests, which are relatively inexpensive to score, but any form of assessment can be used.

Standardized testing can be composed of multiple-choice, true-false, essay questions, authentic assessments, or nearly any other form of assessment. Multiple-choice and true-false items are often chosen because they can be given and scored inexpensively and quickly by scoring special answer sheets by computer or via computer-adaptive testing. Some standardized tests have short-answer or essay writing components that are assigned a score by independent evaluators who use rubrics (rules or guidelines) and benchmark papers (examples of papers for each possible score) to determine the grade to be given to a response. Most assessments, however, are not scored by people; people are used to score items that are not able to be scored easily by computer (i.e., essays). For example, the Graduate Record Exam is a computer-adaptive assessment that requires no scoring by people (except for the writing portion).[5]
Scoring issues

Human scoring is often variable, which is why computer scoring is preferred when feasible. For example, some believe that poorly paid employees will score tests badly.[6] Agreement between scorers can vary from 60 to 85 percent, depending on the test and the scoring session. Sometimes states pay to have two or more scorers read each paper; if their scores do not agree, then the paper is passed to additional scorers.[6] Open-ended components of a test are often only a small proportion of it. Most commonly, a major test includes both human-scored and computer-scored sections.
Score

There are two types of standardized test score interpretations: a norm-referenced score interpretation or a criterion-referenced score interpretation. Norm-referenced score interpretations compare test-takers to a sample of peers. Criterion-referenced score interpretations compare test-takers to a criterion (a formal definition of content), regardless of the scores of other examinees. These may also be described as standards-based assessments, as they are aligned with the standards-based education reform movement.[7] Norm-referenced test score interpretations are associated with traditional education, which measures success by rank-ordering students using a variety of metrics, including grades and test scores, while standards-based assessments are based on the belief that all students can succeed if they are assessed against standards which are required of all students regardless of ability or economic background.[citation needed]
Standards

The considerations of validity and reliability typically are viewed as essential elements for determining the quality of any standardized test. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing standards and making overall judgments about the quality of any standardized test as a whole within a given context.
Evaluation standards

In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation[8] has published three sets of standards for evaluations. The Personnel Evaluation Standards[9] was published in 1988, The Program Evaluation Standards (2nd edition)[10] was published in 1994, and The Student Evaluation Standards[11] was published in 2003. Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
Testing standards

In the field of psychometrics, the Standards for Educational and Psychological Testing[12] set standards for validity and reliability, along with errors of measurement and issues related to the accommodation of individuals with disabilities. The third and final major topic covers standards related to testing applications, credentialing, plus testing in program evaluation and public policy.
Advantages

One of the main advantages of standardized testing is that the results can be empirically documented; therefore, the test scores can be shown to have a relative degree of validity and reliability, as well as results which are generalizable and replicable.[13] This is often contrasted with grades on a school transcript, which are assigned by individual teachers. It may be difficult to account for differences in educational culture across schools, difficulty of a given teacher's curriculum, differences in teaching style, and techniques and biases that affect grading. This makes standardized tests useful for admissions purposes in higher education, where a school is trying to compare students from across the nation or across the world.

Another advantage is aggregation. A well-designed standardized test provides an assessment of an individual's mastery of a domain of knowledge or skill which at some level of aggregation will provide useful information. That is, while individual assessments may not be accurate enough for practical purposes, the mean scores of classes, schools, branches of a company, or other groups may well provide useful information because of the reduction of error accomplished by increasing the sample size.

Standardized tests, which by definition give all test-takers the same test under the same (or reasonably equal) conditions, are also perceived as being more fair than assessments that use different questions or different conditions for students according to their race, socioeconomic status, or other considerations.
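The error-reduction point can be made concrete. Here is a minimal sketch in Python of how the standard error of a group mean shrinks with group size; the individual-score standard deviation below is an arbitrary illustration, not real test data.

import math

individual_sd = 15.0  # illustrative spread of individual test scores
for n in (1, 25, 100, 400):
    sem = individual_sd / math.sqrt(n)  # standard error of the mean for a group of size n
    print(f"group size {n:>3}: standard error of the mean = {sem:6.2f}")

A single score is uncertain by about 15 points in this illustration, but the mean of a 400-student group is pinned down to under a point, which is why aggregated scores can be informative even when individual scores are noisy.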
Disadvantages and criticism

"Standardized tests can't measure initiative, creativity, imagination, conceptual thinking, curiosity, effort, irony, judgment, commitment, nuance, good will, ethical reflection, or a host of other valuable dispositions and attributes. What they can measure and count are isolated skills, specific facts and function, content knowledge, the least interesting and least significant aspects of learning." Bill Ayers[14] Standardized tests are useful tools for assessing student achievement, and can be used to focus instruction on desired outcomes, such as reading and math skills.
[15]

However, critics feel that overuse and misuse of these tests harms teaching and

learning by narrowing the curriculum. According to the group FairTest, when standardized tests are the primary factor in accountability, schools use the tests to define curriculum and focus instruction. Critics say that "teaching to the test" disfavors higher-order learning. While it is possible to use a standardized test without letting its contents determine curriculum and instruction, frequently, what is not

tested is not taught, and how the subject is tested often becomes a model for how to teach the subject. Uncritical use of standardized test scores to evaluate teacher and school performance is inappropriate, because the students' scores are influenced by three things: what students learn in school, what students learn outside of school, and the students' innate intelligence.[16] The school only has control over one of these three factors.

Value-added modeling has been proposed to cope with this criticism by statistically controlling for innate ability and out-of-school contextual factors.[17] In a value-added system of interpreting test scores, analysts estimate an expected score for each student, based on factors such as the student's own previous test scores, primary language, or socioeconomic status. The difference between the student's expected score and actual score is presumed to be due primarily to the teacher's efforts.

Supporters of standardized testing respond that these are not reasons to abandon standardized testing in favor of either non-standardized testing or of no assessment at all, but rather criticisms of poorly designed testing regimes. They argue that testing does and should focus educational resources on the most important aspects of education (imparting a pre-defined set of knowledge and skills) and that other aspects are either less important, or should be added to the testing scheme.
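A toy sketch of the value-added idea in Python: predict each student's expected score from a prior score with a least-squares fit, then treat the residual (actual minus expected) as the estimated value added. The numbers are invented, and operational value-added models control for many more factors than this single predictor.

import numpy as np

prior = np.array([55.0, 62.0, 70.0, 48.0, 81.0])   # last year's scores
actual = np.array([60.0, 61.0, 76.0, 55.0, 83.0])  # this year's scores

slope, intercept = np.polyfit(prior, actual, deg=1)  # simple linear prediction
expected = slope * prior + intercept
residual = actual - expected  # positive = scored above expectation

for p, a, r in zip(prior, actual, residual):
    print(f"prior={p:.0f}  actual={a:.0f}  value added={r:+.1f}")

In a real system the residuals would be aggregated over a teacher's or school's students before any inference is drawn, and the prediction would include the other factors the paragraph mentions, such as primary language and socioeconomic status.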
Scoring information loss

When tests are scored right-wrong, an important assumption has been made about learning. The number of right answers or the sum of item scores (where partial credit is given) is assumed to be the appropriate and sufficient measure of current performance status. In addition, a secondary assumption is made that there is no meaningful information in the wrong answers. In the first place, a correct answer can be achieved using memorization without any profound understanding of the underlying content or conceptual structure of the problem posed. Second, when more than one step for solution is required, there are often a variety of approaches to answering that will lead to a correct result. The fact that the answer is correct does not indicate which of the several possible procedures was used. When the student supplies the answer (or shows the work), this information is readily available from the original documents.

Second, if the wrong answers were blind guesses, there would be no information to be found among these answers. On the other hand, if wrong answers reflect departures in interpretation from the expected one, these answers should show an ordered relationship to whatever the overall test is measuring. This departure should depend upon the level of psycholinguistic maturity of the student choosing or giving the answer in the vernacular in which the test is written. In this second case, it should be possible to extract this order from the responses to the test items.[18] Such extraction processes, the Rasch model for instance, are standard practice for item development among professionals. However, because the wrong answers are discarded during the scoring process, attempts to interpret these answers for the information they might contain are seldom undertaken.

Third, although topic-based subtest scores are sometimes provided, the more common practice is to report the total score or a rescaled version of it. This rescaling is intended to compare these scores to a standard of some sort. This further collapse of the test results systematically removes all the information about which particular items were missed. Thus, scoring a test right-wrong loses 1) how students achieved their correct answers, 2) what led them astray towards unacceptable answers and 3) where within the body of the test this departure from expectation occurred.

This commentary suggests that the current scoring procedure conceals the dynamics of the test-taking process and obscures the capabilities of the students being assessed. Current scoring practice oversimplifies these data in the initial scoring step. The result of this procedural error is to obscure the diagnostic information that could help teachers serve their students better. It further prevents those who are diligently preparing these tests from being able to observe the information that would otherwise have alerted them to the presence of this error.

A solution to this problem, known as Response Spectrum Evaluation (RSE),[19] is currently being developed that appears to be capable of recovering all three of these forms of information loss, while still providing a numerical scale to establish current performance status and to track performance change. This RSE approach provides an interpretation of the thinking processes behind every answer (both the right and the wrong ones) that tells teachers how students were thinking in every answer they provide.[20] Among other findings, this chapter reports that the recoverable information explains between two and three times more of the test variability than considering only the right answers. This massive loss of information can be explained by the fact that the "wrong" answers are removed from the test information being collected during the scoring process and are no longer available to reveal the procedural error inherent in right-wrong scoring. The procedure bypasses the limitations produced by the linear dependencies inherent in test data.

Testing bias occurs when a test systematically favors one group over another, even though both groups are equal on the trait the test measures. Critics allege that test makers and facilitators tend to represent a middle-class, white background. Critics claim that standardized tests match the values, habits, and language of the test makers[citation needed]. However, although most tests come from a white, middle-class background, it is worth noting that the highest-scoring groups are not people of that background, but rather tend to come from Asian populations.[21]

Not all tests are well written; some, for example, contain multiple-choice questions with ambiguous answers or give poor coverage of the desired curriculum. Some standardized tests include essay questions, and some have criticized the effectiveness of the grading methods. Recently, partial computerized grading of essays has been introduced for some tests, which is even more controversial.[22]
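For reference, the Rasch model mentioned above ties the probability of a correct response to the gap between a person's ability and an item's difficulty. A minimal sketch of its item response function in Python, with illustrative parameter values rather than fitted ones:

import math

def rasch_probability(theta: float, b: float) -> float:
    """P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An item of average difficulty (b = 0) answered at three ability levels.
for theta in (-1.0, 0.0, 1.0):
    print(f"ability {theta:+.1f}: P(correct) = {rasch_probability(theta, b=0.0):.2f}")

Fitting ability and difficulty parameters to whole response patterns is the kind of extraction process the passage refers to; the point above is that the information needed for it is discarded when only the count of right answers is kept.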
Educational decisions

Test scores are in some cases used as a sole, mandatory, or primary criterion for admissions or certification. For example, some U.S. states require high school graduation examinations. Adequate scores on these exit exams are required for high school graduation. The General Educational Development test is often used as an alternative to a high school diploma. Other applications include tracking (deciding whether a student should be enrolled in the "fast" or "slow" version of a course) and awarding scholarships. In the United States, many colleges and universities automatically translate scores on Advanced Placement tests into college credit, satisfaction of graduation requirements, or placement in more advanced courses. Generalized tests such as the SAT or GRE are more often used as one measure among several, when making admissions decisions. Some public institutions have cutoff scores for the SAT, GPA, or class rank, for creating classes of applicants to automatically accept or reject.

Heavy reliance on standardized tests for decision-making is often controversial, for the reasons noted above. Critics often propose emphasizing cumulative or even non-numerical measures, such as classroom grades or brief individual assessments (written in prose) from teachers. Supporters argue that test scores provide a clear-cut, objective standard that minimizes the potential for political influence or favoritism. The National Academy of Sciences recommends that major educational decisions not be based solely on a test score.[23] The use of minimum cut-scores for entrance or graduation does not imply a single standard, since test scores are nearly always combined with other minimal criteria such as number of credits, prerequisite courses, attendance, etc. Test scores are often perceived as the "sole criteria" simply because they are the most difficult, or the fulfillment of other criteria is automatically assumed. One exception to this rule is the GED, which has allowed many people to have their skills recognized even though they did not meet traditional criteria.

Psychological testing
From Wikipedia, the free encyclopedia


Psychological testing is a field characterized by the use of samples of behavior in order to assess psychological constructs, such as cognitive and emotional functioning, in a given individual. The technical term for the science behind psychological testing is psychometrics. By samples of behavior, one means observations of an individual performing tasks that have usually been prescribed beforehand, which often means scores on a test. These responses are often compiled into statistical tables that allow the evaluator to compare the behavior of the individual being tested to the responses of a norm group.


Psychological tests


A psychological test is an instrument designed to measure unobserved constructs, also known as latent variables. Psychological tests are typically, but not necessarily, a series of tasks or problems that the respondent has to solve. Psychological tests can strongly resemble questionnaires, which are also designed to measure unobserved constructs, but differ in that psychological tests ask for a respondent's maximum performance whereas a questionnaire asks for the respondent's typical performance.[1] A useful psychological test must be both valid (i.e., there is evidence to support the specified interpretation of the test results[2]) and reliable (i.e., internally consistent or give consistent results over time, across raters, etc.). It is important that people who are equal on the measured construct also have an equal probability of answering the test items correctly.[3] For example, an item on a mathematics test could be "In a soccer match two players get a red card; how many players are left in the end?"; however, this item also requires knowledge of soccer to be answered correctly, not just mathematical ability. Group membership can also influence the chance of correctly answering items (differential item functioning). Often tests are constructed for a specific population, and this should be taken into account when administering tests. If a test is invariant to some group difference (e.g. gender) in one population (e.g. England) it does not automatically mean that it is also invariant in another population (e.g. Japan).

Psychological assessment

Psychological assessment is similar to psychological testing but usually involves a more comprehensive assessment of the individual. Psychological assessment is a process that involves the integration of information from multiple sources, such as tests of normal and abnormal personality, tests of ability or intelligence, tests of interests or attitudes, as well as information from personal interviews. Collateral information is also collected about personal, occupational, or medical history, such as from records or from interviews with parents, spouses, teachers, or previous therapists or physicians. A psychological test is one of the sources of data used within the process of assessment; usually more than one test is used. Many psychologists do some level of assessment when providing services to clients or patients, and may use, for example, simple checklists to assess some traits or symptoms, but psychological assessment is a more complex, detailed, in-depth process. Typical types of focus for psychological assessment are to provide a diagnosis for treatment settings; to assess a particular area of functioning or disability, often for school settings; to help select the type of treatment or to assess treatment outcomes; to help courts decide issues such as child custody or competency to stand trial; or to help assess job applicants or employees and provide career development counseling or training.[4]

Interpreting scores


Psychological tests, like many measurements of human characteristics, can be interpreted in a norm-referenced or criterion-referenced manner. Norms are statistical representations of a population. A norm-referenced score interpretation compares an individual's results on the test with the statistical representation of the population. In practice, rather than testing a population, a representative sample or group is tested. This provides a group norm or set of norms. One representation of norms is the Bell curve (also called "normal curve"). Norms are available for standardized psychological tests, allowing for an understanding of how an individual's scores compare with the group norms. Norm-referenced scores are typically reported on the standard score (z) scale or a rescaling of it. A criterion-referenced interpretation of a test score compares an individual's performance to some criterion other than performance of other individuals. For example, the generic school test typically provides a score in reference to a subject domain; a student might score 80% on a geography test. Criterion-referenced score interpretations are generally more applicable to achievement tests rather than psychological tests. Often, test scores can be interpreted in both ways; a score of 80% on a geography test could place a student at the 84th percentile, or a standard score of 1.0 or even 2.0.
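A minimal sketch of the norm-referenced arithmetic in the example above, assuming made-up norms with a mean of 70 and a standard deviation of 10 so that a raw score of 80 lands at z = 1.0 (SciPy is used for the normal curve):

from scipy import stats

raw_score = 80.0  # student's raw score on the test
norm_mean = 70.0  # hypothetical norm-group mean
norm_sd = 10.0    # hypothetical norm-group standard deviation

z = (raw_score - norm_mean) / norm_sd  # standard score: 1.0
percentile = stats.norm.cdf(z) * 100   # about the 84th percentile
print(f"z = {z:.1f}, percentile rank = {percentile:.0f}")

The criterion-referenced reading of the same performance would simply be "80% of the geography items correct," with no reference to how anyone else scored.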

Types of psychological tests


There are several broad categories of psychological tests:

IQ/achievement tests

IQ tests purport to be measures of intelligence, while achievement tests are measures of the use and level of development of ability. IQ (or cognitive) tests and achievement tests are common norm-referenced tests. In these types of tests, a series of tasks is presented to the person being evaluated, and the person's responses are graded according to carefully prescribed guidelines. After the test is completed, the results can be compiled and compared to the responses of a norm group, usually composed of people at the same age or grade level as the person being evaluated. IQ tests which contain a series of tasks typically divide the tasks into verbal (relying on the use of language) and performance, or non-verbal (relying on eye-hand types of tasks, or use of symbols or objects). Examples of verbal IQ test tasks are vocabulary and information (answering general knowledge questions). Non-verbal examples are timed completion of puzzles (object assembly) and identifying images which fit a pattern (matrix reasoning). IQ tests (e.g., WAIS-IV, WISC-IV, Cattell Culture Fair III, Woodcock-Johnson Tests of Cognitive Abilities-III, Stanford-Binet Intelligence Scales V) and academic achievement tests (e.g., WIAT, WRAT, Woodcock-Johnson Tests of Achievement-III) are designed to be administered to either an individual (by a trained evaluator) or to a group of people (paper and pencil tests). The individually administered tests tend to be more comprehensive, more reliable, more valid and generally to have better psychometric characteristics than group-administered tests. However, individually administered tests are more expensive to administer because of the need for a trained administrator (psychologist, school psychologist, or psychometrician).

Public safety employment tests


Vocations within the public safety field (i.e., fire service, law enforcement, corrections, emergency medical services) often require industrial-organizational psychology tests for initial employment and advancement throughout the ranks. The National Firefighter Selection Inventory (NFSI), the National Criminal Justice Officer Selection Inventory (NCJOSI), and the Integrity Inventory are prominent examples of these tests.

Attitude tests


Attitude tests assess an individual's feelings about an event, person, or object. Attitude scales are used in marketing to determine individual (and group) preferences for brands or items. Typically, attitude tests use either a Thurstone scale or a Likert scale to measure specific items.

Neuropsychological tests


Main article: Neuropsychological test

These tests consist of specifically designed tasks used to measure a psychological function known to be linked to a particular brain structure or pathway. They are typically used to assess impairment after an injury or illness known to affect neurocognitive functioning, or, when used in research, to contrast neuropsychological abilities across experimental groups.

Personality tests


Main article: Personality test

Psychological measures of personality are often described as either objective tests or projective tests. The terms "objective test" and "projective test" have recently come under criticism in the Journal of Personality Assessment. The more descriptive "rating scale or self-report measures" and "free response measures" are suggested, rather than the terms "objective tests" and "projective tests," respectively.

Objective tests (Rating scale or self-report measure)


Objective tests have a restricted response format, such as allowing for true or false answers or rating using an ordinal scale. Prominent examples of objective personality tests include the Minnesota Multiphasic Personality Inventory, Millon Clinical Multiaxial Inventory-III,[5] Child Behavior Checklist,[6] Symptom Checklist 90[7] and the Beck Depression Inventory.[8] Objective personality tests can be designed for use in business for potential employees, such as the NEO-PI, the 16PF, and the OPQ (Occupational Personality Questionnaire), all of which are based on the Big Five taxonomy. The Big Five, or Five Factor Model of normal personality, has gained acceptance since the early 1990s, when some influential meta-analyses (e.g., Barrick & Mount 1991) found consistent relationships between the Big Five personality factors and important criterion variables. Another personality test based upon the Five Factor Model is the Five Factor Personality Inventory Children (FFPI-C).[9]

Projective tests (Free response measures)


Projective tests allow for a freer type of response. An example of this would be the Rorschach test, in which a person states what each of ten ink blots might be. Projective testing became a growth industry in the first half of the 1900s, with doubts about the theoretical assumptions behind projective testing arising in the second half of the 1900s.[10] Some projective tests are used less often today because they are more time-consuming to administer and because the reliability and validity are controversial.

As improved sampling and statistical methods developed, much controversy regarding the utility and validity of projective testing has occurred. The use of clinical judgement rather than norms and statistics to evaluate people's characteristics has convinced many that projectives are deficient and unreliable (results are too dissimilar each time a test is given to the same person). However, many practitioners continue to rely on projective testing, and some testing experts (e.g., Cohen, Anastasi) suggest that these measures can be useful in developing therapeutic rapport. They may also be useful in creating inferences to follow up with other methods. The most widely used scoring system for the Rorschach is the Exner system of scoring.[11] Another common projective test is the Thematic Apperception Test (TAT),[12] which is often scored with Westen's Social Cognition and Object Relations Scales[13] and Phebe Cramer's Defense Mechanisms Manual.[14] Both "rating scale" and "free response" measures are used in contemporary clinical practice, with a trend toward the former. Other projective tests include the House-Tree-Person Test, Robert's Apperception Test, and the Attachment Projective.

Sexological tests


Main article: Sexological testing

The number of tests specifically meant for the field of sexology is quite limited. The field of sexology provides different psychological evaluation devices in order to examine the various aspects of the discomfort, problem or dysfunction, regardless of whether they are individual or relational ones.

Direct observation tests


Although most psychological tests are "rating scale" or "free response" measures, psychological assessment may also involve the observation of people as they complete activities. This type of assessment is usually conducted with families in a laboratory, home or with children in a classroom. The purpose may be clinical, such as to establish a pre-intervention baseline of a child's hyperactive or aggressive classroom behaviors or to observe the nature of a parent-child interaction in order to understand a relational disorder. Direct observation procedures are also used in research, for example to study the relationship between intrapsychic variables and specific target behaviors, or to explore sequences of behavioral interaction. The Parent-Child Interaction Assessment-II (PCIA)[15] is an example of a direct observation procedure that is used with school-age children and parents. The parents and children are video recorded playing at a make-believe zoo. The Parent-Child Early Relational Assessment (Clark, 1999)[16] is used to study parents and young children and involves a feeding and a puzzle task. The MacArthur Story Stem Battery (MSSB)[17] is used to elicit narratives from children. The Dyadic Parent-Child Interaction Coding System-II (Eyberg, 1981) tracks the extent to which children follow the commands of parents and vice versa and is well suited to the study of children with Oppositional Defiant Disorders and their parents.

Test security


Many psychological tests are generally not available to the public; rather, there are restrictions both from publishers of the tests and from psychology licensing boards that prevent the disclosure of the tests themselves and information about the interpretation of the results.[18][19] Test publishers consider both copyright and matters of professional ethics to be involved in protecting the secrecy of their tests, and they sell tests only to people who have proved their educational and professional qualifications to the test maker's satisfaction. Purchasers are legally barred from giving test answers or the tests themselves out to the public unless permitted under the test maker's standard conditions for administration of the tests.[20]

References
1. Mellenbergh, G.J. (2008). Chapter 10: Surveys. In H.J. Adèr & G.J. Mellenbergh (Eds.) (with contributions by D.J. Hand), Advising on Research Methods: A consultant's companion (pp. 183-209). Huizen, The Netherlands: Johannes van Kessel Publishing.
2. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
3. Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13(2), 127-143.
4. Standards for Education and Training in Psychological Assessment: Position of the Society for Personality Assessment. An Official Statement of the Board of Trustees of the Society for Personality Assessment. Journal of Personality Assessment, 87, 355-357.
5. Millon, T. (1994). Millon Clinical Multiaxial Inventory-III. Minneapolis, MN: National Computer Systems.
6. Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms and Profiles. Burlington: University of Vermont, Research Center for Children, Youth, and Families. ISBN 0-938565-73-7.
7. Derogatis, L. R. (1983). SCL-90: Administration, Scoring and Procedures Manual for the Revised Version. Baltimore: Clinical Psychometric Research.
8. Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory (2nd ed.). San Antonio, TX: The Psychological Corporation.
9. McGhee, R. L., Ehrler, D., & Buckhalt, J. (2008). Manual for the Five Factor Personality Inventory Children. Austin, TX: Pro Ed, Inc.
10. Wasserman, John D. (2003). "Nonverbal Assessment of Personality and Psychopathology". In McCallum, R. Steve, Handbook of Nonverbal Assessment. New York: Kluwer Academic / Plenum Publishers. ISBN 0-306-47715-7. http://books.google.com/books?id=_Z_rgY-Qb-sC&pg=PA283&dq=nonverbal+%22personality+and+psychopathology%22&hl=en&ei=33PnTIH_GZDBcZLxteUK&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCcQ6AEwAA#v=onepage&q=nonverbal%20%22personality%20and%20psychopathology%22&f=false. Retrieved 20 November 2010.
11. Exner, J. E., & Erdberg, P. (2005). The Rorschach: A comprehensive system: Advanced interpretation (3rd ed., Vol. 2). Hoboken, NJ: Wiley and Sons.
12. Murray, H. A. (1943). Thematic Apperception Test manual. Cambridge, MA: Harvard University Press.
13. Westen, D. (1991). Social cognition and object relations. Psychological Bulletin, 109(3), 429-455.
14. Cramer, P. (2002). Defense Mechanism Manual, revised June 2002. Unpublished manuscript, Williams College. (Available from Dr. Phebe Cramer.)
15. Holigrocki, R. J., Kaminski, P. L., & Frieswyk, S. H. (1999). Introduction to the Parent-Child Interaction Assessment. Bulletin of the Menninger Clinic, 63(3), 413-428.
16. Clark, R. (1999). The Parent-Child Early Relational Assessment: A factorial validity study. Educational and Psychological Measurement, 59(5), 821-846.
17. Bretherton, I., Oppenheim, D., Buchsbaum, H., Emde, R. N., & the MacArthur Narrative Group. (1990). MacArthur Story-Stem Battery. Unpublished manual.
18. The Committee on Psychological Tests and Assessment (CPTA), American Psychological Association (1994). "Statement on the Use of Secure Psychological Tests in the Education of Graduate and Undergraduate Psychology Students". American Psychological Association. http://www.apa.org/science/securetests.html. "It should be recognized that certain tests used by psychologists and related professionals may suffer irreparable harm to their validity if their items, scoring keys or protocols, and other materials are publicly disclosed."
19. Morel, Kenneth R. (2009). "Test Security in Medicolegal Cases: Proposed Guidelines for Attorneys Utilizing Neuropsychology Practice". Archives of Clinical Neuropsychology (Oxford University Press), 24(7), 635-646. doi:10.1093/arclin/acp062. PMID 19778915. http://acn.oxfordjournals.org/cgi/reprint/24/7/635. Retrieved 2009-11-08.
20. Pearson Assessments (2009). "Legal Policies". Psychological Corporation. https://psychcorp.pearsonassessments.com/hai/Templates/Generic/NoBoxTemplate.aspx?NRMODE=Published&NRNODEGUID=%7b94E55B33-4F2E-4ABC-A2E09842926A478F%7d&NRORIGINALURL=%2fpai%2fca%2flegal%2flegal%2ehtm&NRCACHEHINT=NoModifyGuest#release. Retrieved 2009-11-15.

External links


American Psychological Association webpage on testing and assessment
Society for Personality Assessment: Standards for Education and Training in Psychological Assessment
"What is Psychological Testing?"


Psychological tests can be grouped into several broad categories. Personality tests measure personal qualities, sometimes referred to as traits. Achievement tests measure what a person has learned. Aptitude tests are designed to predict future behaviour, such as success in school or job performance. Intelligence tests measure verbal and/or nonverbal skills related to academic success. Interest inventories are used to help individuals make effective career choices.

Psychological tests are usually administered and interpreted by a psychologist, because studies in psychopathology, along with academic courses and supervision in psychological testing, are an integral part of the doctoral degree in clinical psychology. A counsellor who has had the appropriate academic courses and supervision may administer occupational tests or achievement and aptitude tests, but most counsellors have not received the training to administer personality tests. Academic courses and supervision in psychological testing are usually not a part of a psychiatrist's medical training, so most psychiatrists can ethically administer only some specific clinical tests that are straightforward checklists of symptoms. Of course, ethics is one thing, and the desire to make money is another; you will therefore often find individuals offering to do all kinds of psychological testing, often on the Internet, even when they lack the training to administer and interpret such tests.

Psychological tests fall into several categories:

1. Achievement and aptitude tests are usually seen in educational or employment settings, and they attempt to measure either how much you know about a certain topic (i.e., your achieved knowledge), such as mathematics or spelling, or how much of a capacity you have (i.e., your aptitude) to master material in a particular area, such as mechanical relationships.

2. Intelligence tests attempt to measure your intelligence, or your basic ability to understand the world around you, assimilate its functioning, and apply this knowledge to enhance the quality of your life. Or, as Alfred North Whitehead said about intelligence, it enables the individual to profit by error without being slaughtered by it. Intelligence, therefore, is a measure of a potential, not a measure of what you've learned (as in an achievement test), and so it is supposed to be independent of culture. The trick is to design a test that can actually be culture-free; most intelligence tests fail in this area to some extent for one reason or another.

3. Neuropsychological tests attempt to measure deficits in cognitive functioning (i.e., your ability to think, speak, reason, etc.) that may result from some sort of brain damage, such as a stroke or a brain injury.

4. Occupational tests attempt to match your interests with the interests of persons in known careers. The logic here is that if the things that interest you in life match up with, say, the things that interest most school teachers, then you might make a good school teacher yourself.

5. Personality tests attempt to measure your basic personality style and are most used in research or forensic settings to help with clinical diagnoses. Two of the most well-known personality tests are the Minnesota Multiphasic Personality Inventory (MMPI), or the revised MMPI-2, composed of several hundred yes-or-no questions, and the Rorschach (the inkblot test), composed of several cards of inkblots; you simply give a description of the images and feelings you experience in looking at the blots. Personality tests are either objective or projective.

Objective Tests

Objective tests present specific questions or statements that are answered by selecting one of a set of alternatives (e.g., true or false). Objective tests traditionally use a "paper-and-pencil" format, which is simple to score reliably. Although many objective tests ask general questions about preferences and behaviours, situational tests solicit responses to specific scenarios.

The MMPI: The Minnesota Multiphasic Personality Inventory is the leading objective personality test. Its hundreds of true-false items cover a broad range of behaviours. A major advantage of the MMPI is the incorporation of validity scales designed to detect possible response bias, such as trying to present oneself in a socially desirable way.

Projective Techniques

Projective personality tests use ambiguous stimuli into which the test taker presumably projects meaning. This indirect type of assessment is believed by many to more effectively identify a person's real or underlying personality.

a. Scoring Projective Techniques: Because the test taker is free to respond in any way, rather than being required to select an answer from a set of alternatives, projective tests can be difficult to score. To ensure reliability, projective tests must be accompanied by a specific set of scoring criteria. Projective tests are more reliable and valid when scoring focuses on the way the questions are answered (structure of responses) rather than the content of the answers. Two leading projective tests are the Rorschach and the Thematic Apperception Test (TAT).

b. The Rorschach Test: In the Rorschach, individuals are asked to describe in detail their impressions of a series of inkblots. Scoring involves analysis of both the structure and content of responses.

c. The Thematic Apperception Test (TAT): In the TAT, individuals construct stories to describe a series of pictures. TAT analysis traditionally focuses on the role played by the main character in each story.

6. Specific clinical tests attempt to measure specific clinical matters, such as your current level of anxiety or depression.

WASHINGTON EDUCATIONAL RESEARCH ASSOCIATION White Paper

Ethical Standards in Testing: Test Preparation and Administration


WERA Professional Publications, Volume 1, 1999 (Revised 2001)

Ethical Standards in Testing: Test Preparation and Administration
WERA Professional Publications, Volume 1 (Revised 2001)

M. A. Power, Editor
1999 Washington Educational Research Association
P.O. Box 64489, University Place, WA 98464
www.wera-web.org

Ethical Standards in Testing: Test Preparation and Administration


Introduction

Tests should give an accurate picture of students' knowledge and skills in the subject area or domain being tested. Accurate achievement data are very important for planning curriculum and instruction and for program evaluation. Test scores that overestimate or underestimate students' actual knowledge and skills cannot serve these important purposes.

The purpose of the Washington State Educational Assessment Program (WSEAP) is to promote learning by assessing essential skills that all students should possess. WSEAP assessments use representative samples of test items from a content area to estimate student achievement. To get valid and reliable results, it is essential that the scores from selected test items accurately reflect the larger domain of knowledge.

Some efforts to help students do well on assessments can cause artificially high test scores. In other situations, when students have not been adequately prepared to take the assessments - or to take them seriously - artificially low test scores can result. Test preparation activities which promote quality, long-term learning are appropriate, even essential. Good test-taking skills and appropriate content learning can reduce the likelihood that extraneous factors will influence students' test scores.

Unethical and inappropriate activities are those aimed only at increasing short-term learning and test scores. Any effort to influence performance on specific items or item types is inappropriate without instruction in and attention to the broader area which those items represent. Attempting to target specific items undermines the purpose of the assessment and calls affected student scores into question.

With the active participation of representatives from other educational associations and agencies in the state, the Washington Educational Research Association (WERA) has developed this position paper on ethical standards in test preparation and administration. Guidelines also are included to help in creating a situation that will assist students in doing their best on tests. The best way to prevent inappropriate testing practice is to help teachers and administrators become aware of what is good practice, and what is not. WERA invites its members and those of other associations and organizations to help shape and subscribe to these standards. Everyone concerned with the accuracy of data on student achievement needs to help spread the word about what constitutes appropriate and ethical test preparation and administration.

ACTION SEMINAR PARTICIPANTS - ETHICAL TESTING STANDARDS

The following individuals participated in a series of seminars in 1998-99 during which these standards were developed.

Jim Nelson, Seminar Facilitator and Writer, WERA Member Emeritus, Gig Harbor, WA
Linda Elman, Director of Research & Evaluation, Central Kitsap School District
Jerry Litzenberger, Director, Graduate Follow-up Study, Snohomish, WA
Gordon Ensign Jr., Director of Assessment (Retired), Commission on Student Learning
Duncan MacQuarrie, Director of Curriculum and Assessment, Office Supt. of Public Instruction
Jill Hearne, Educational Consultant, Seattle
Steve Siera, Director, Research & Assessment, Kent School District
Bev Henderson, Curriculum Coordinator, Kennewick School District
Bob Silverman, Senior WASL Analyst, Office Supt. of Public Instruction
Audrian Huff, Principal, Fairwood Elem. School, Kent School District
Donna Smith, Principal, Terminal Park Elem. School, Auburn School District
Wally Hunt, Supervisor, Title I/Learning Assistance Program, Office Supt. of Public Instruction
Ric Williams, Director, Evaluation and Research, Everett Public Schools

WASHINGTON EDUCATIONAL RESEARCH ASSOCIATION
ETHICAL STANDARDS: TEST PREPARATION AND ADMINISTRATION

IT IS APPROPRIATE AND ETHICAL TO:
1. Communicate to students, parents and the public what any test does and does not do, when and how it will be administered, and how the results may be appropriately used.
2. Teach to the Essential Learning Requirements (WA state curriculum standards) at each grade level so that students will learn the skills and knowledge they need to accurately show what they know and can do.
3. Incorporate all subject area objectives into the local curriculum throughout the year including, but not limited to, the objectives of the tests to be administered.
4. Review skills, strategies, and concepts previously taught.
5. Teach and review test-taking and familiarization skills that include an understanding of test characteristics independent of the subject matter being tested.
6. Use any test preparation documents and materials prepared by the test maker, the Office of the Superintendent of Public Instruction, or the Commission on Student Learning.
7. Read and discuss the test administration manual with colleagues.
8. Schedule and provide the appropriate amount of time needed for the assessment.
9. Take appropriate security precautions before, during and after administration of the test.
10. Include all eligible students in the assessment.
11. Actively proctor students during tests, keeping them focused and on task.
12. Seek clarification on issues and questions from the administrative team responsible for ethical and appropriate practices.
13. Avoid any actions that would permit or encourage individuals or groups of students to receive scores that misrepresent their actual level of knowledge and skill.

BEFORE THE TEST - IT IS INAPPROPRIATE AND UNETHICAL TO:
1. Use any test preparation material that promises to raise scores on a particular test by targeting skills or knowledge from specific test items, and does not increase students' general knowledge and skills. Materials which target the general skills tested may be appropriate if they reflect school or district

priorities and best practices. 2. Limit curriculum and instruction only to those skills, strategies, and concepts included on the test. 3. Limit review to only those areas on which student performance was low on previous tests. 4. "Cram" test material just before the tests are given. 5. Train students for testing using locally developed versions of national normreferenced tests. *6. Reveal all or any part of secure copyrighted tests to students, in any manner, oral or written, prior to test administration. *7. Copy or otherwise reproduce all or any part of secure or copyrighted tests. *8. Review or provide test question answers to students. *9. Possess unauthorized copies of state tests. DURING THE TEST - IT IS INAPPROPRIATE AND UNETHICAL TO: 1. Read any parts of the test to students except where indicated in the directions. 2. Define or pronounce words used in the test. 3. Make comments of any kind during the test, including remarks about quality or quantity of student work, unless specifically called for in the administration manual. 4. Give "special help" of any kind to students taking the test. 5. Suggest or "coach" students to mark or change their answers in any way. 6. Exclude eligible students from taking the test. *7. Reproduce test documents for any purpose.
* It is illegal under state statute to conduct or assist in carrying out any of the items marked with *. (Penalties may range from fines to dismissal, or even withdrawal of certification. [RCW 28A.230.190, Acts of Unprofessional Conduct; WAC 180-87-050])

AFTER THE TEST - IT IS INAPPROPRIATE AND UNETHICAL TO:
1. Make inaccurate reports, unsubstantiated claims, inappropriate interpretations, or otherwise false and misleading statements about assessment results.
*2. Erase or change student answers.

Many of the issues regarding ethical assessment practice are in the hands of the classroom teacher, but a significant number of these issues must be addressed through administrative practice.

GUIDELINES FOR TEST PREPARATION AND ADMINISTRATION
The Teacher's Role

Students will do their best on tests if they find an encouraging and supportive atmosphere, if they know that they are well prepared, and if they know that with hard work they will perform well. To create a situation that will encourage students to do their best, teachers should:
1. Attend workshops on test administration.
2. Develop an assessment calendar and schedule and share it with students and parents.
3. Prepare students well in advance for assessment by teaching test-wiseness skills independent of the subject matter being tested. Teach and review test familiarity that includes an understanding of how to use the test booklets and answer sheets, item response strategies, time management, listening, and following directions.
4. Develop a list of which and how many students will be tested and when. Determine which students will require special accommodations.
5. Develop a list of students who will be exempted from testing and the reason for the exemption. This list must be reviewed and approved by the principal or test administration committee. Parents must be notified and alternative assessments must be identified.
6. Develop plans for the administration of makeup tests for students absent during the scheduled testing period.
7. Prepare and motivate students just before the test.
8. Prepare to administer the test, with sufficient materials available for all students to be tested.
9. Prepare classrooms for the test. Arrange for comfortable seating where students will not be able to see each other's test materials but will be able to hear test directions. Eliminate posters or other materials that may be distracting or contain information that could be used to help students answer test items.
10. Alert neighboring teachers to the testing schedule and ask their help in achieving optimal testing conditions and in keeping noise levels to a minimum.
11. Arrange for a separate supervised area for those students who finish early and may cause a distraction for other students.

12. Read the test administration manual carefully, in advance. Administer the test according to directions.
13. Meet with proctors and discuss their duties and responsibilities. Carefully and actively proctor the test.
14. Arrange for appropriate breaks and student stress relievers.
15. Follow the rules for test security and return all test materials to the test administrator.

GUIDELINES FOR TEST PREPARATION AND ADMINISTRATION
The Principal's Role

There are a number of things the principal can do to enhance the testing atmosphere in the school.
1. Inform both students and parents about what each test does and does not do, when and how it will be administered, and how the results will be reported and used. Indicate the importance of tests for students, staff, and the school. Stress the importance of school attendance on the scheduled testing dates.
2. Encourage the implementation of appropriate test-wiseness teaching and review. Teaching test familiarity skills should be independent of the subject matter being tested. Discourage subject matter drill and practice solely for the test.
3. Let parents know about upcoming tests and what they can do to encourage their children's performance.
4. Work with teachers to develop a building testing schedule. Attempt to maximize the efficiency of the building's physical layout and resources.
5. Pay careful attention to school schedules during the testing period. Avoid planning assemblies, fire drills, maintenance, etc., during the testing period.
6. Develop a plan to keep tests and answer sheets secure before and after administration, and ensure that all are returned properly.
7. Arrange, where possible, for teachers to have proctoring help in administering tests. Ensure that tests are carried out according to ethical and legal practice.
8. Provide a handbook or policy statement such as this one, spelling out proper and improper testing procedures, to all involved with test administration.

9. Create a process to check out any suspicions or allegations of cheating. Document all steps taken.
10. Require detailed written explanations about why a student was not tested or the reason a score was not figured into a school's average.
11. Encourage teachers' participation in workshops and inservice sessions on assessment.
12. Ensure that all students are tested. Review all test accommodations made for students with special needs, treating exclusion as a last resort. Ensure that accommodations and exclusions are consistent with specific testing program guidelines, and that appropriate accommodations are available as needed.
13. Ensure that there are no interruptions in classrooms during the testing period, including custodial tasks, intercom calls, delivery of messages, etc.
14. Work with the test coordinator and classroom teachers to schedule and staff makeup days for students who miss all or parts of the test. This might include bringing in a substitute or finding other ways to creatively use building staff to administer makeups in an appropriate setting.
15. Share test results with all staff. Staff members need to work together to ensure that the testing process is a smooth one. School improvement is a team effort.

REFERENCES

American Association for Counseling and Development (now American Counseling Association) and Association for Measurement and Evaluation in Counseling and Development (now Association for Assessment in Counseling). (1989). Responsibilities of users of standardized tests: RUST statement revised. Alexandria, VA: Author.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: APA.

American Educational Research Association. (1992). Ethical standards of the American Educational Research Association. Washington, DC: Author.

American Federation of Teachers, National Council on Measurement in Education, & National Education Association. (1990). Standards for teacher competence in educational assessment of students. Washington, DC: Author.

American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. Washington, DC: Author.

Assessment handbook: A collection of proper test administration guidelines. Prepared collaboratively by staff in the Departments of Planning, Research and Evaluation; Curriculum, Instruction and Staff Development; and Regions 1 and 2 of the South Kitsap School District.

Centralia School District. (1998).

CTB/McGraw-Hill. (1997). Position regarding use of test-related instructional materials. Monterey, CA: Author.

Schmeiser, C. B. (1995). Ethics in assessment. ERIC Digest. ERIC Document Reproduction Service No. ED301111.

Joint Committee on Standards for Educational Evaluation. (1988). The personnel evaluation standards: How to assess systems for evaluating educators. Newbury Park, CA: Sage.

Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards: How to assess evaluations of educational programs. Thousand Oaks, CA: Sage.

Mehrens, W. A. (1984). "National tests and local curriculum: Match or mismatch?" Educational Measurement: Issues and Practice, 3(3), 9-15.

Mehrens, W. A., & Kaminski, J. (1989). "Methods for improving standardized test scores: Fruitful, fruitless or fraudulent?" Educational Measurement: Issues and Practice, 8(1), 14-22.

Mehrens, W. A., Popham, J. W., & Ryan, J. M. (1998). "How to prepare students for performance assessments." Educational Measurement: Issues and Practice, 3(3), 18-22.

Michigan State Board of Education. (1987). Michigan Educational Assessment Program: Local and intermediate district coordinators manual. Lansing, MI: Author.

National Council on Measurement in Education, Ad Hoc Committee on the Development of a Code of Ethics. (1995). Code of professional responsibilities in educational measurement. Washington, DC: Author.

Popham, J. W. (1991). "Appropriateness of teachers' test-preparation practices." Educational Measurement: Issues and Practice, 1(1), 12-15.

Mehrens, W. A. (1989). Preparing students to take standardized achievement tests. ERIC Digest. ERIC Document Reproduction Service No. ED314427.

Washington State Testing Program statutory references. (1998). RCW 28A.230.190, Acts of Unprofessional Conduct; WAC 180-87-050.

Percentiles (and More on Quartiles)



Percentiles are like quartiles, except that percentiles divide the set of data into 100 equal parts
while quartiles divide the set of data into 4 equal parts. Percentiles measure position from the bottom. Percentiles are most often used for determining the relative standing of an individual in a population or the rank position of the individual. Some of the most popular uses for percentiles are connected with test scores and graduation standings. Percentile ranks are an easy way to convey an individual's standing at graduation relative to other graduates. Unfortunately, there is no universally accepted definition of "percentile". Consider the following two slightly different definitions:

Definition 1: A percentile is a measure that tells us what percent of the total frequency scored at or below that measure. A percentile rank is the percentage of scores that fall at or below a given score. Formula: to find the percentile rank of a score, x, out of a set of n scores, where x is included:

    percentile rank = ((B + 0.5E) / n) × 100

Definition 2: A percentile is a measure that tells us what percent of the total frequency scored below that measure. A percentile rank is the percentage of scores that fall below a given score. Formula: to find the percentile rank of a score, x, out of a set of n scores, where x is not included:

    percentile rank = (B / n) × 100

where B = number of scores below x, E = number of scores equal to x, and n = number of scores.


See these formulas in more detail in the Examples section.
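For readers who like to check such formulas by computer, here is a minimal Python sketch of both definitions. The function names and the made-up class-rank data are our own illustration, not part of the original lesson:

    def percentile_rank_def1(scores, x):
        """Definition 1: percent of scores at or below x (ties count half)."""
        b = sum(1 for s in scores if s < x)    # B = number of scores below x
        e = sum(1 for s in scores if s == x)   # E = number of scores equal to x
        return (b + 0.5 * e) / len(scores) * 100

    def percentile_rank_def2(scores, x):
        """Definition 2: percent of scores strictly below x."""
        b = sum(1 for s in scores if s < x)
        return b / len(scores) * 100

    # Jason graduated 25th out of 150, so 125 classmates ranked below him.
    # Model that as 125 scores below his own, his score, and 24 scores above.
    class_scores = [1] * 125 + [2] + [3] * 24
    print(round(percentile_rank_def1(class_scores, 2)))  # 84 (Definition 1)
    print(round(percentile_rank_def2(class_scores, 2)))  # 83 (Definition 2)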

Example (Definition 1): If Jason graduated 25th out of a class of 150 students, then 125 students were ranked below Jason. Jason's percentile rank would be:

    ((125 + 0.5 × 1) / 150) × 100 = 83.67, which rounds to the 84th percentile

Jason's standing in the class at the 84th percentile means he scored as high as or higher than 84% of the graduates. Good job, Jason!

Example (Definition 2): If Jason graduated 25th out of a class of 150 students, then 125 students were ranked below Jason. Jason's percentile rank would be:

    (125 / 150) × 100 = 83.33, which rounds to the 83rd percentile

Jason's standing in the class at the 83rd percentile means he scored higher than 83% of the graduates. Good job, Jason!

The slight difference in these two definitions can lead to significantly different answers when dealing with small amounts of data. Note: We will be using Definition 1 for the rest of this page.
(other interpretations are also possible - check with your teacher)

About Percentile Ranks:


- A percentile rank is a number between 0 and 100 indicating the percent of cases falling at or below that score.
- Percentile ranks are usually written to the nearest whole percent: 74.5% = 75% = 75th percentile.
- Scores are divided into 100 equally sized groups.
- Scores are arranged in rank order from lowest to highest.
- There is no 0th percentile rank - the lowest score is at the first percentile.
- There is no 100th percentile - the highest score is at the 99th percentile.
- You cannot perform the same mathematical operations on percentiles that you can on raw scores. You cannot, for example, compute the mean of percentile scores, as the results may be misleading.
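The last point deserves a small demonstration. In the Python sketch below (the scores are invented for illustration), the average of two students' percentile ranks is not the percentile rank of their average score in a skewed distribution, which is why averaging percentiles can mislead:

    # Invented scores for a deliberately skewed distribution.
    scores = [10, 12, 13, 15, 18, 20, 25, 40, 70, 95]

    def pct_rank(x):
        # Definition 1: percent of scores at or below x (ties count half).
        below = sum(s < x for s in scores)
        equal = sum(s == x for s in scores)
        return (below + 0.5 * equal) / len(scores) * 100

    a, b = 10, 95                              # two students' raw scores
    print((pct_rank(a) + pct_rank(b)) / 2)     # 50.0 = mean of their ranks
    print(pct_rank((a + b) / 2))               # 80.0 = rank of their mean score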

Consider:
1. Karl takes the big Earth Science test and his teacher tells him that he scored at the 92nd percentile. Is Karl pleased with his performance on the test? He should be. He scored as high as or higher than 92% of the people taking the test.

2. Sue takes the Chapter 4 math test. If Sue's score is the same as the mean score for the math test (and the scores are distributed so that the mean matches the median), she scored at the 50th percentile and she did "as well as or better than" 50% of the students taking the test.

3. If Ty scores at the 75th percentile on the Social Studies test, he did "as well as or better than" 75% of the students taking the test.

Examples: Finding Percentiles

1. The math test scores were: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99. Find the percentile rank for a score of 84 on this test.

Be sure the scores are ordered from smallest to largest. Locate the 84.

Solution using the formula: B = 8 scores fall below 84, E = 2 scores equal 84, and n = 20, so

    percentile rank = ((8 + 0.5 × 2) / 20) × 100 = 45

Solution using visualization: since there are 2 values equal to 84, assign one to the group "above 84" and the other to the group "below 84".

    50, 65, 70, 72, 72, 78, 80, 82, 84, | 84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99

Nine of the twenty scores fall at or below the dividing line, and (9 / 20) × 100 = 45.

The score of 84 is at the 45th percentile for this test.

2. The math test scores were: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99. Find the percentile rank for a score of 86 on this test.

Be sure the scores are ordered from smallest to largest. Locate the 86.

Solution using the formula: B = 11 scores fall below 86, E = 1 score equals 86, and n = 20, so

    percentile rank = ((11 + 0.5 × 1) / 20) × 100 = 57.5, which rounds to 58

Solution using visualization: since there is only one value equal to 86, it is counted as "half" of a data value for the group "above 86" as well as for the group "below 86".

    50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 8|6, 88, 88, 90, 94, 96, 98, 98, 99

That puts 11.5 of the twenty scores at or below the dividing line, and (11.5 / 20) × 100 = 57.5.

The score of 86 is at the 58th percentile for this test.
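As a cross-check, SciPy's percentileofscore function reproduces both worked examples; its kind='mean' option computes exactly (B + 0.5E) / n × 100 from Definition 1. Using SciPy here is our addition, not part of the original lesson:

    from scipy.stats import percentileofscore

    scores = [50, 65, 70, 72, 72, 78, 80, 82, 84, 84,
              85, 86, 88, 88, 90, 94, 96, 98, 98, 99]

    # kind='mean' averages the strict (<) and weak (<=) percentages,
    # which works out to (B + 0.5E) / n * 100, i.e., Definition 1.
    print(percentileofscore(scores, 84, kind='mean'))  # 45.0
    print(percentileofscore(scores, 86, kind='mean'))  # 57.5 -> 58th percentile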

3. Quartiles can be thought of as percentile measures. Remember that quartiles break the data set into 4 equal parts. If 100% is broken into four equal parts, we have subdivisions at 25%, 50%, and 75%, creating the:

First quartile (lower quartile), at the 25th percentile.
Median (or second quartile), at the 50th percentile.
Third quartile (upper quartile), at the 75th percentile.

For the table below, find the intervals in which the first, second and third quartiles lie.

    Test Scores    Frequency    Cumulative Frequency
    76-80              3                 3
    81-85              7                10
    86-90              6                16
    91-95              4                20

If there are a total of 20 scores, the first quartile will be located (25% of 20 = 5) five values up from the bottom. This puts the first quartile in the interval 81-85. In a similar fashion, the second quartile will be located (50% of 20 = 10) ten values up from the bottom, also in the interval 81-85. The third quartile will be located (75% of 20 = 15) fifteen values up from the bottom, in the interval 86-90.
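The same bookkeeping is easy to automate. The following Python sketch (our own illustration, using the table above) walks the cumulative frequencies to find the interval containing each quartile:

    # Grouped frequency table from the example above.
    intervals = ["76-80", "81-85", "86-90", "91-95"]
    freqs = [3, 7, 6, 4]

    def quartile_interval(q):
        """Return the interval containing quartile q (q = 1, 2, or 3)."""
        n = sum(freqs)            # total number of scores (20 here)
        position = q * n / 4      # the 5th, 10th, or 15th value from the bottom
        cumulative = 0
        for interval, f in zip(intervals, freqs):
            cumulative += f       # running cumulative frequency
            if cumulative >= position:
                return interval

    for q in (1, 2, 3):
        print(f"Q{q} lies in {quartile_interval(q)}")
    # Q1 lies in 81-85, Q2 lies in 81-85, Q3 lies in 86-90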

Measurement Scales
Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. There is a relationship between the level of measurement and the appropriateness of various statistical procedures. For example, it would be silly to compute the mean of nominal measurements. However, the appropriateness of statistical analyses involving means for ordinal-level data has been controversial. One position is that data must be measured on an interval or a ratio scale for the computation of means and other statistics to be valid. Therefore, if data are measured on an ordinal scale, the median but not the mean can serve as a measure of central tendency. The arguments on both sides of this issue will be examined in the context of a hypothetical experiment designed to determine whether people prefer to work with color or with black-and-white computer displays. Twenty subjects viewed black-and-white displays and 20 subjects viewed color displays.

Nominal Scale

Nominal measurement consists of assigning items to groups or categories. No quantitative information is conveyed and no ordering of the items is implied. Nominal scales are therefore qualitative rather than quantitative. Religious preference, race, and sex are all examples of nominal scales. Frequency distributions are usually used to analyze data measured on a nominal scale. The main statistic computed is the mode. Variables measured on a nominal scale are often referred to as categorical or qualitative variables.
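As a quick illustration, a frequency distribution and mode for nominal data amount to simple counting. The Python sketch below uses invented category data for the example:

    from collections import Counter

    # Invented nominal data: religious preference for a small sample.
    preferences = ["Catholic", "Protestant", "None", "Protestant",
                   "Jewish", "Protestant", "None", "Catholic"]

    frequency = Counter(preferences)            # the frequency distribution
    mode, count = frequency.most_common(1)[0]   # the most frequent category
    print(frequency)
    print(f"mode: {mode} ({count} cases)")      # mode: Protestant (3 cases)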

Ordinal Scale
Measurements with ordinal scales are ordered in the sense that higher numbers represent higher values. However, the intervals between the numbers are not necessarily equal. For example, on a five-point rating scale measuring attitudes toward gun control, the difference between a rating of 2 and a rating of 3 may not represent the same difference as the difference between a rating of 4 and a rating of 5. There is no "true" zero point for ordinal scales since the zero point is chosen arbitrarily. The lowest point on the rating scale in the example was arbitrarily chosen to be 1. It could just as well have been 0 or -5.

Interval Scale
On interval measurement scales, one unit on the scale represents the same magnitude on the trait or characteristic being measured across the whole range of the scale. For example, if anxiety were measured on an interval scale, then a difference between a score of 10 and a score of 11 would represent the same difference in anxiety as would a difference between a score of 50 and a score of 51. Interval scales do not have a "true" zero point, however, and therefore it is not possible to make statements about how many times higher one score is than another. For the anxiety scale, it would not be valid to say that a person with a score of 30 was twice as anxious as a person with a score of 15. True interval measurement is somewhere between rare and nonexistent in the behavioral sciences. No interval-level scale of anxiety such as the one described in the example actually exists. A good example of an interval scale is the Fahrenheit scale for temperature. Equal differences on this scale represent equal differences in temperature, but a temperature of 30 degrees is not twice as warm as one of 15 degrees.

Ratio Scale

Ratio scales are like interval scales except they have true zero points. A good example is the Kelvin scale of temperature. This scale has an absolute zero. Thus, a temperature of 300 Kelvin is twice as high as a temperature of 150 Kelvin.
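A short numeric check (our own illustration) makes the interval-versus-ratio distinction concrete: doubling a Fahrenheit reading does not come close to doubling the underlying temperature, while doubling a Kelvin reading does.

    def f_to_kelvin(f):
        """Convert Fahrenheit to Kelvin, the scale with a true zero point."""
        return (f - 32) * 5 / 9 + 273.15

    # The example above: 30 degrees F is not twice as warm as 15 degrees F.
    k15, k30 = f_to_kelvin(15), f_to_kelvin(30)
    print(round(k15, 2), round(k30, 2))   # 263.71 272.04
    print(round(k30 / k15, 3))            # 1.032 -- nowhere near 2.0

    # On the Kelvin scale, which has a true zero, 300 K really is twice 150 K.
    print(300 / 150)                      # 2.0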

