
Questionnaire Validity

The validity of a questionnaire relies first and foremost on reliability. If the questionnaire cannot be shown to be reliable, there can be no discussion of validity. But there is good news: demonstrating validity is relatively straightforward compared to reliability. If you have reached this point and have a reliable instrument for measuring the issues or phenomena you are after, demonstrating its validity will not be difficult. Validity refers to whether the questionnaire or survey measures what it intends to measure. While there are detailed and technical ways of proving validity that are beyond the level of this discussion, there are some concepts that are useful to keep in mind. The overriding principle is that validity depends on how a questionnaire or assessment process is used: reliability is a characteristic of the instrument itself, but validity comes from the way the instrument is employed. The following ideas support this principle:

As nearly as possible, the data gathering should match the decisions you need to make. This means if you need to make a priority-focused decision, such as allocating resources or eliminating programs, your assessment process should be a comparative one that ranks the programs or alternatives you will be considering.

Gather data from all the people who can contribute information, even if they are hard to contact. For example, if you are conducting a survey of customer service, try to get a sample of all the customers, not just those who are easy to reach, such as those who have complained or have made suggestions.

A perfect example of a questionnaire that may have high reliability but poor validity is a standardized questionnaire that has been used in hundreds of companies. These instruments are marketed aggressively using promises of "industry norms" to compare your results with. Weigh carefully the value of such comparisons against the almost certain lack of fit with your culture, philosophy and way of managing. A good diagnosis of your organization is not likely to come from a generic instrument with lots of normative comparisons.

If you're going after sensitive information, protect your sources. It has been said that in the Prussian army at the turn of the century, decisions were made twice: once when officers were sober, again when they were drunk. This concept acknowledges the power of the "socially acceptable response" to questions or requests. Don't assume that a simple statement printed on the questionnaire that "all individual responses will be kept confidential" will make everybody relax and provide candid answers. Give respondents the freedom to decide which information about themselves they wish to withhold, and employ other administrative procedures, such as handing out login IDs and passwords separately from the e-mail inviting people to participate in the survey.

Keep in mind that even a reliable instrument may not be valid if it is employed in situations it was not designed for. Validity is not a characteristic of a particular instrument, attached to it in a way that ensures it will always produce accurate information no matter where or when it is used. If you want validity, you have to be able to demonstrate it in your situation; it is not built into the instrument.

How Do We Measure Validity?


While there are detailed and technical ways of establishing validity that are beyond the level of this discussion, the following are brief descriptions of the three basic approaches. All proofs of validity employ one or more of these methods:

Content Validity: If the content of a test or instrument matches an actual job or situation that is being studied, then the test has content validity. For example, a Training Needs Assessment for middle managers should have content (such as skills, activities and abilities) relevant to the jobs of middle managers. Skills that pertain to landscaping workers would not be appropriate in a needs assessment instrument for managers.

Predictive Validity: This form of validity comes from an instrument's ability to predict an outcome or event in the future. If a questionnaire or instrument is developed to assess the promotion potential of a group of newly hired workers, the results of the test should be able to predict which of the group will actually be promoted. The predictive validity of the instrument is shown in the correlation between the scores from the test and the persons promoted (a brief illustration follows these descriptions).

Construct Validity: This form of validity derives from the correlation between the test or questionnaire and another instrument or process that measures the same construct. The Myers-Briggs Type Indicator (MBTI) is a well-established test of personality types. A new instrument developed to assess the same characteristics would have construct validity if the scores from the new instrument correlated highly with the scores from the MBTI.
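To make these two checks concrete, here is a minimal Python sketch (the data and the use of SciPy are illustrative assumptions, not part of the original discussion): it correlates hypothetical test scores with a later promoted/not-promoted outcome for predictive validity, and with an established instrument's scores on the same construct for construct validity.

```python
import numpy as np
from scipy.stats import pearsonr, pointbiserialr

rng = np.random.default_rng(0)

# Hypothetical data for 150 newly hired workers (random, purely for illustration).
test_scores = rng.normal(50, 10, size=150)                 # new questionnaire scores
promoted = (test_scores + rng.normal(0, 12, 150)) > 55     # later promotion outcome (binary)
established = test_scores * 0.8 + rng.normal(0, 6, 150)    # scores from an established instrument

# Predictive validity: point-biserial correlation between scores and the binary outcome.
r_pred, p_pred = pointbiserialr(promoted, test_scores)

# Construct validity: correlation with an established measure of the same construct.
r_con, p_con = pearsonr(test_scores, established)

print(f"predictive validity r = {r_pred:.2f} (p = {p_pred:.3f})")
print(f"construct validity  r = {r_con:.2f} (p = {p_con:.3f})")
```

With real data, the strength of these correlations, not just their existence, is what carries the argument for validity.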

These are the methods for proving validity. Providers of questionnaires and surveys who are unable or unwilling to talk about validity in those terms (content, predictive, or construct) should be avoided. You will sometimes hear discussions of face validity. Despite the use of the term, face validity is not a form of validity assessment. It is simply a subjective appraisal of how an instrument appears to a person who examines it. There are numerous examples of valid instruments without face validity and completely bogus instruments with loads of face validity.

Reliability and Validity in Questionnaire Design


In today's world, organisations need strategic goals and targets, and clear measurements are needed to assess progress towards these goals. Some of these targets are easy to define and the measurements are clear cut, particularly certain financial goals, production and quality control targets. However, some of the most vital aspects of a well-functioning organisation are more complex to measure. For example, the climate and culture of an organisation is known to be central to optimising employee wellbeing, productivity and innovation. Similarly, it is important to select executives or employees with certain character traits and dynamics for them to function effectively in their roles. Unlike annual income or production, which can be directly measured, many of the psychological aspects of an organisation are intangible constructs and can only be measured indirectly. The classic example of an intangible construct is Intelligence Quotient (IQ). Most of us agree that there is such a thing as intelligence and that some people have more of it than others! But unlike height or weight, it can't be measured with a tape-measure or a set of bathroom scales.

Figuring Out What You Want To Measure

Often the first step in measuring an intangible construct is coming up with an Operational Definition. This means defining what the construct is, what it's comprised of and what measures it. This stage tends to include a review of previous research on the topic to identify what is known about the subject and how people have tried to measure it in the past. In this type of work, our clients usually have a model of what makes up their construct, or we can help them develop one. As a fictitious example, they might want to measure Organisational Effectiveness, and they hypothesise that it is made up of four organisational traits: Morale, Innovation, Management and Teamwork. In this case, each of the four traits needs to be measured. Questionnaires are generally used to collect this type of information. For example, a good design might be a questionnaire with six questions about each of the traits. The responses to the six questions about each trait will later be aggregated to give a measurement of Morale, Innovation, Management and Teamwork.

After defining the construct and its components (traits), and producing questions to measure each of these, a testing stage is strongly recommended. The aim of testing is to ensure that the questions are measuring what they are intended to: that is, that they produce a reliable and valid measurement.

Reliability

Reliability means the consistency or repeatability of the measure. This is especially important if the measure is to be used on an ongoing basis to detect change. There are several forms of reliability, including:

Test-retest reliability - whether repeating the test/questionnaire under the same conditions produces the same results; and

Reliability within a scale - that all the questions designed to measure a particular trait are indeed measuring the same trait.
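As a concrete illustration of reliability within a scale, the sketch below computes Cronbach's alpha, a standard coefficient of internal consistency. The article does not prescribe a particular statistic, so the choice of alpha, the Python layout and the data here are assumptions for illustration only.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x questions) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of questions in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each question
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 200 respondents answering the six Morale questions on a
# 1-5 scale. Independent random answers like these will score near zero;
# questions that genuinely measure the same trait score much higher.
rng = np.random.default_rng(0)
morale_items = rng.integers(1, 6, size=(200, 6))
print(f"Cronbach's alpha for the Morale scale: {cronbach_alpha(morale_items):.2f}")
```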

Validity

Validity means that we are measuring what we want to measure. There are a number of types of validity, including:

Face Validity - whether, at face value, the questions appear to be measuring the construct. This is largely a common-sense assessment, but also relies on knowledge of the way people respond to survey questions and common pitfalls in questionnaire design;

Content Validity - whether all important aspects of the construct are covered. Clear definitions of the construct and its components come in useful here;

Criterion Validity/Predictive Validity - whether scores on the questionnaire successfully predict a specific criterion. For example, does the questionnaire used in selecting executives predict the success of those executives once they have been appointed; and

Concurrent Validity - whether results of a new questionnaire are consistent with results of established measures.

Validating a Model

Going back to our hypothetical example, the client has a model of Organisational Effectiveness that is made up of four organisational traits: Morale, Innovation, Management and Teamwork. They also have a questionnaire with questions that are intended to measure each of these traits. However, as they are using the questionnaire to infer levels of Morale, Innovation, Management and Teamwork, it is important to assess whether the results are consistent with this model being accurate. There are a number of statistical methods available to test whether the data collected using the questionnaire supports the model, or whether either the questionnaire or the model needs revision or development. Principal components analysis and exploratory or confirmatory factor analysis are among the statistical techniques often used to assess a model. These techniques can often provide a deeper understanding of the issues being surveyed, and can reveal that questions are measuring more or less than they were intended to. For example, many years ago Data Analysis Australia staff were assisting a client with survey data relating to occupational health and safety (OHS) issues. One of the questions might be paraphrased as "My supervisor puts my health and safety above productivity", and was created to measure OHS issues. However, analysis revealed that responses to this question related mainly to the first words, "my supervisor", and showed more about industrial relations than OHS.

Another benefit of using techniques such as factor analysis to assess a questionnaire is improved efficiency. We are often able to advise clients on ways in which they can reduce the length of their questionnaires while maintaining or increasing the information that can be obtained. Reducing the number of questions in an overly lengthy questionnaire makes it easier for respondents to complete, and increases response rates.
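To picture what such an analysis looks like in practice, below is a minimal sketch of an exploratory factor analysis on simulated responses. The 24-question layout, the scikit-learn API choice and the data are illustrative assumptions, not Data Analysis Australia's actual procedure; the check is whether the six-question groups load together on four factors, as the Morale/Innovation/Management/Teamwork model would predict.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Simulate 500 respondents and 24 questions: six questions per trait, each
# driven by one of four latent traits plus noise (illustrative only).
latent = rng.normal(size=(500, 4))        # Morale, Innovation, Management, Teamwork
loadings = np.zeros((4, 24))
for trait in range(4):
    loadings[trait, trait * 6:(trait + 1) * 6] = 0.8
responses = latent @ loadings + rng.normal(scale=0.5, size=(500, 24))

# Fit a four-factor model and see which questions load on which factor.
# (The order of fitted factors is arbitrary; what matters is that questions
# designed for the same trait cluster on the same factor.)
fa = FactorAnalysis(n_components=4, rotation="varimax", random_state=0).fit(responses)
for q, col in enumerate(fa.components_.T):
    print(f"question {q + 1:2d} loads mainly on factor {np.argmax(np.abs(col)) + 1}")
```

A question that loads on the "wrong" factor, like the OHS supervisor question above, is a candidate for rewording or removal.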
Generalisability and Confounding Issues

In testing the questionnaire, the test sample is also important. For example, IQ tests were used incorrectly in the US many years ago on migrants with limited English: they received poor scores, but the test was inadvertently measuring their ability to read and respond to a test written in English rather than their actual IQ. There are two important lessons that can be taken from this example. The first is that other issues that alter our results can pop up in research if we don't give sufficient thought to what we are really measuring. As in the OHS example earlier, even a question that appears fine on the surface can be confounded by other issues in some cases. The second lesson is to be cautious in generalising results to other groups. If a questionnaire is designed for a specific group, it is important to test it on a representative group. A questionnaire that will be used for assessing Board members should be tested on current or prospective Board members if these are the people that the information is required for. If the questionnaire is to be used on many different groups of people, it's important to test it on the different groups it will be used for to ensure it is valid in all its intended usages.

Which of These Issues Do I Need to Consider For My Questionnaire?

The type of reliability and validity issues that need to be considered varies from one situation to the next, depending on what the questionnaire is measuring and its intended use. There are a range of statistical procedures designed to test reliability and validity. In addition, specific survey designs may be necessary to ensure that the required information is available to establish some of the more complex types of validity or reliability. A number of Data Analysis Australia's clients work in specialist areas in which a small number of rigorously tested survey products form their core business. For these questionnaires in particular, attending to issues of reliability and validity is important to ensure their products are of a high quality. Ongoing research and development of the survey products allows clients to maintain an edge in the marketplace. For simpler surveys, where a questionnaire is gathering information that only needs to be used in a practical rather than inferential way, the reliability and validity requirements are more basic. However, even in these situations, it is important to make sure consideration is given to whether the survey is measuring what it should be.

Personality Questionnaire Validity and Reliability Synopsis


December 1998

Overall: We estimate that the Personality Questionnaire will indicate an English-speaking adult's personality type accurately 85% of the time in a non-controlled (i.e. over the internet) environment. Resulting types are repeatable 75% of the time in a non-controlled environment, and 95% of the time in a controlled environment. Unless otherwise noted, these statistics were generated from a set of 100,000 subjects.

1st Validation Technique: Best Approach Method. This method incorporates content and criterion-related validity assessment. Best Approach was used during the development of the Personality Questionnaire to ensure that we were starting with a good basic indicator. We were assured of content validity by ensuring that the creator of the Personality Questionnaire was an expert on psychological type, and on determining which behaviors are attributed to which personality functions and attitudes.

2nd Validation Technique: Comparison Method. This method was used during first-phase and second-phase testing and validation of the Personality Questionnaire, primarily to validate its end results. We compared Personality Questionnaire results against the results of other well-known instruments, namely the MBTI and Keirsey's Temperament Sorter. Our goal was to produce the same type as these comparable indicators at least 75% of the time in an uncontrolled environment. We released the questionnaire once we achieved this goal. Another revision after release brought us up to 85% matching.

3rd Validation Technique: Averages Method. We used this method to validate the individual questions that make up the Personality Questionnaire, periodically throughout the implementation and revision of the questionnaire. For this method, we checked that answers to specific questions fell within ten percentage points of expected norms. Expected norms for the general population are as follows: 60% Extraverted, 40% Introverted; 75% Sensing, 25% Intuitive; 50% Thinking, 50% Feeling; 50% Judging, 50% Perceiving. Using 5,000 questionnaire results, we checked that each question rendered results within ten percentage points of these expected norms. If a question did not meet this standard, it was revised and re-tested until it did.
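As a rough sketch of what the Averages Method check might look like in code (the data layout and batch here are illustrative assumptions; the developers' actual tooling is not described), this compares the observed split on each dichotomy against the stated norms and flags any drift beyond ten percentage points.

```python
import numpy as np

# Expected norms from the synopsis: share of the population expected to answer
# on the first pole of each dichotomy (the complements follow automatically).
expected = {"Extraverted": 0.60, "Sensing": 0.75, "Thinking": 0.50, "Judging": 0.50}

# Hypothetical batch of 5,000 results: True = first pole, False = second pole
# (randomly generated here purely for illustration).
rng = np.random.default_rng(0)
answers = {name: rng.random(5000) < p + 0.02 for name, p in expected.items()}

for name, norm in expected.items():
    observed = np.mean(answers[name])
    status = "OK" if abs(observed - norm) <= 0.10 else "REVISE"
    print(f"{name:11s} expected {norm:.0%}, observed {observed:.0%} -> {status}")
```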

Reliability Technique: All reliability data was determined via Repetition.
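The repetition check amounts to a simple agreement rate: administer the questionnaire twice to the same subjects and count how often the resulting type is identical. A minimal sketch, with made-up records for illustration:

```python
# Hypothetical first and second administrations for the same five subjects.
first_run  = ["INTJ", "ESFP", "ENTP", "ISFJ", "INTP"]
second_run = ["INTJ", "ESFJ", "ENTP", "ISFJ", "INTP"]

matches = sum(a == b for a, b in zip(first_run, second_run))
print(f"Types repeated identically for {matches / len(first_run):.0%} of subjects")
```

A real validation would of course use the full subject pool rather than five illustrative records.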
