
MELAB

TECHNICAL
MANUAL
English Language Institute
Testing and Certification Division
3020 North University Building
University of Michigan
Ann Arbor MI 48109-1057
Please address all correspondence to:
The English Language Institute
Testing and Certification Division
3020 North University Building
The University of Michigan
Ann Arbor MI 48109-1057
telephone: (313) 764-2416
fax: (313) 763-0369
email: slbriggs@umich.edu
Second Printing, August 1996.
© 1994 by the English Language Institute, The University of Michigan. This document may be
reproduced or reprinted, in whole or in part, without permission as long as the source is clearly
acknowledged. This document and any reproductions of it may not be sold.
The Regents of the University of Michigan: Deane Baker, Ann Arbor; Paul W. Brown, Petoskey; Laurence B. Deitch,
Bloomfield Hills; Shirley M. McFee, Battle Creek; Rebecca McGowan, Ann Arbor; Philip H. Power, Ann Arbor; Nellie M.
Varner, Detroit; James L. Waters, Muskegon; James J. Duderstadt (ex officio)
Preface
This manual is intended to provide comprehensive information to those who use, or are
considering using, the Michigan English Language Assessment Battery (MELAB) to make
decisions about the English language proficiency of individuals. It is written specifically for
college admissions personnel and for personnel in professional organizations who need such
information to assess the appropriateness and acceptability of the MELAB for particular
purposes.
Included in the manual is general information about the MELAB (Section 1), MELAB statistics
(Section 2), and reliability and validity information (Section 3). We hope that providing this
information will help test users become more knowledgeable about the MELAB and enable
them to judge its technical adequacy.
The English Language Institute at the University of Michigan (ELI-UM) has a long history of
involvement in English as a second language testing, and the development of this manual also
has a long history. We appreciate the advice and guidance of Liz Hamp-Lyons, Stan Jones, and
Peter Skehan, external consultants who helped us map out what we needed to include in the
manual. We were fortunate a few years ago to have as a research assistant Sheila Prochnow-Mathias,
who conducted several systematic studies of the writing component of the MELAB. Most
recently, we benefited greatly from the expertise of Michael Persinger in handling various aspects
of the statistical analysis and in putting into words the results of the factor analyses that appear in
Section 3.
This manual is also the result of a team effort by many staff members in our testing and
certification division. Mary Spaan, in particular, deserves praise for consistently producing tests
of high technical quality, maintaining the excellent records that served as the research base of
this manual, offering valuable advice at all stages of its development, and editing it closely at the
final stage. We are grateful to Karyn Pidgeon and Bob Sage, who handled their routine duties of
arranging and scoring MELAB tests so efficiently that they had time to enter data essential for
test analysis. Jennifer Engar also provided useful assistance with data retrieval and entry.
Theresa Rohlck used her excellent organizational and administrative skills to compile a stratified
random sample of MELAB papers for the reliability studies and to oversee several large-scale
data entry projects, and she used her capable word processing skills to transform the text and
tables into a publishable format. Finally, we appreciate the strong support of our ELI-UM
director, John Swales, who encouraged our completion of this manual.
As we have developed this manual, we have tried to follow the guidelines set forth in the
Standards for Educational and Psychological Testing (1985)¹ and hope that the manual offers the
information you need. If any information is missing or unclear, please contact us so that we
might address such gaps in the next edition or a supplement.
Sarah Briggs
Associate Director for Testing and Certification
English Language Institute

Barbara Dobson
Research Assistant
English Language Institute

¹ The Standards for Educational and Psychological Testing were developed through the joint efforts of the American
Educational Research Association, the American Psychological Association, and the National Council on Measurement
in Education and were published by the American Psychological Association in Washington, DC.
TABLE OF CONTENTS
Preface ...................................................................................................................................... i
SECTION 1 GENERAL INFORMATION ABOUT THE MELAB................................... 1
1.1 AN OVERVIEW OF THE MELAB ...................................................................................... 1
1.2 MELAB ADMINISTRATION............................................................................................... 2
1.3 DESCRIPTION OF THE MELAB ....................................................................................... 3
1.3.1 MELAB Parts and their Components
1.3.2 How the MELAB relates to the former Michigan Battery
1.4 SCORING OF THE MELAB............................................................................................... 7
1.4.1 Part 1 Composition Score
1.4.2 Part 2 Listening Score
1.4.3 Part 3 Grammar, Cloze, Vocabulary, and Reading (GCVR) Score
1.4.4 Final MELAB Score
1.4.5 Speaking Test (optional)
1.5 USING THE MELAB........................................................................................................ 12
1.5.1 Interpreting Scores
1.5.2 An Example of MELAB Use for a University Context
1.5.2.1 Undergraduates
1.5.2.2 Graduates
1.5.3 An Example of MELAB Use for a Community College Context
1.5.4 An Example of MELAB Use for Professional Contexts
1.6 PREPARING FOR THE MELAB...................................................................................... 16
1.7 TEST SECURITY/INVALIDATIONS................................................................................. 16
SECTION 2 MELAB STATISTICS ................................................................................... 17
2.1 GENERAL DESCRIPTIVE STATISTICS.......................................................................... 17
2.2 FREQUENCY DISTRIBUTION OF MELAB SCORES FOR ALL EXAMINEES ................ 18
2.3 PERFORMANCE OF REFERENCE GROUPS ON THE MELAB..................................... 22
2.4 INTERCORRELATIONS AMONG MELAB SCORES AND MELAB ORAL RATING........ 24
SECTION 3 MELAB RELIABILITY AND VALIDITY.......................................................25
3.1 RELIABILITY...................................................................................................................25
3.1.1 Reliability of MELAB Part 1 (Composition)
3.1.1.1 MELAB Composition Raters: Who they are; how they are trained
3.1.1.2 Interrater Reliability
3.1.1.3 Intrarater Reliability
3.1.1.4 Alternate Form Reliability
3.1.2 Test/Retest Reliability
3.1.3 Alternate Forms Reliability (for MELAB Part 2 and MELAB Part 3)
3.1.3.1 Developing Alternate Forms
3.1.3.2 Distribution of Scores on Alternate Forms of MELAB Part 2 (Listening)
and MELAB Part 3 (GCVR)
3.1.4 Internal Consistency Reliability (KR21 and Cronbach's Alpha)
3.2 VALIDITY..........................................................................................................34
3.2.1 Content-related Evidence
3.2.1.1 Content-related Evidence for Part 1: Composition
3.2.1.2 Content-related Evidence for Part 2: Listening
3.2.1.3 Content-Related Evidence for Part 3: Grammar, Cloze,
Vocabulary, Reading (GCVR)
3.2.1.4 Content-related Evidence for Speaking Test
3.2.1.5 Content-related Evidence for Final MELAB Score
3.2.2 Construct-related Evidence
3.2.2.1 Language Proficiency Theory and the MELAB
3.2.2.2 Factor Analysis of the MELAB
3.2.2.3 Native speaker performance on the MELAB
3.2.3 Criterion-Related Evidence
3.2.3.1 MELAB and Tests of "Productive" Language Skills
3.2.3.2 MELAB and Another Proficiency Battery, the TOEFL
3.2.3.3 MELAB and Teacher Assessments
APPENDICES..........................................................................................................................63
Appendix A MELAB Centers (By Country)
Appendix B Historical Background Leading to the MELAB
Appendix C Sample MELAB Score Report Form
Appendix D MELAB Speaking Test - Spoken English Descriptors and Salient Features
Appendix E Descriptive Statistics (1987-1990)
Appendix F Reliability (1987-1990)
LIST OF ILLUSTRATIONS
FIGURES
Figure 3.1 Box Plots of Scaled Scores of Alternate Forms of MELAB Part 2 (Listening)......... 31
Figure 3.2 Box Plots of Scaled Scores of Alternate Forms of MELAB Part 3 (GCVR)............. 32
Figure 3.3 Final MELAB Scores for Seven Levels of Written and Spoken English.................. 59
TABLES
Table 1.1 Examples of MELAB Part 2 (Listening) Test Items.................................................. 4
Table 1.2 Examples of MELAB Part 3 (GCVR) Test Items...................................................... 5
Table 1.3 MELAB Part 2 (Listening) Score Converted from Raw Score ................................ 10
Table 1.4 MELAB Part 3 (GCVR) Score Converted from Raw Score .................................... 10
Table 1.5 Final MELAB Scores and Proficiency Levels in Speaking and Writing................... 12
Table 1.6 Examples of MELAB Scores................................................................................. 13
Table 2.1 Score Descriptives for 4,811 First-Time MELABs Administered 1991-1993........... 17
Table 2.2 MELAB Scaled Scores Corresponding to Specified Percentiles ............................ 18
Table 2.3 Frequency Distribution of Final MELAB Scores..................................................... 19
Table 2.4 Frequency Distribution of MELAB Part 1 (Composition) Scores............................. 19
Table 2.5 Frequency Distribution of MELAB Part 2 (Listening) Scaled Scores ...................... 20
Table 2.6 Frequency Distribution of MELAB Part 3 (GCVR) Scaled Scores .......................... 21
Table 2.7 MELAB Scaled Score Mean and Standard Deviation by Examinee
Reason for Testing............................................................................................... 22
Table 2.8 MELAB Scaled Score Mean and Standard Deviation by Sex ................................ 22
Table 2.9 MELAB Scaled Score Mean and Standard Deviation by Age ................................ 22
Table 2.10 Mean and Standard Deviation by Native Language for Parts 1, 2, 3, and
Final MELAB Scores ............................................................................................ 23
Table 2.11 Intercorrelations of Scaled MELAB Part Scores and Final Scores and
MELAB Oral Rating, 1991-1993............................................................................ 24
Table 3.1 Summary of MELAB Reliability Estimates............................................................. 25
Table 3.2 MELAB Part 1 Interrater Reliability ....................................................................... 26
Table 3.3 Rater Score Differences for MELAB Part 1 ........................................................... 27
Table 3.4 MELAB Test/Retest Results ................................................................................. 28
Table 3.5 Correlations Among Alternate Forms of MELAB ................................................... 30
Table 3.6 Percentile Rankings of Scaled Scores of Alternate Forms of MELAB
Part 2 (Listening).................................................................................................. 30
Table 3.7 Percentile Rankings of Scaled Scores of Alternate Forms of MELAB
Part 3 (GCVR)...................................................................................................... 31
Table 3.8 Kuder-Richardson Formula 21 Internal Consistency Reliability Estimates for
MELAB Part 2 (Listening) ..................................................................................... 33
Table 3.9 Kuder-Richardson Formula 21 Internal Consistency Reliability Estimates for
MELAB Part 3 (GCVR) ......................................................................................... 33
Table 3.10 Cronbach's Alpha Reliability Estimates for MELAB Part 2 (Listening).................... 34
Table 3.11 Cronbach's Alpha Reliability Estimates for MELAB Part 3 (GCVR)....................... 34
Table 3.12 MELAB Part 2 (Listening) Items: Type and Number by Form ................................ 39
Table 3.13 MELAB Part 3 (GCVR) Grammar Items: Type and Number by Form.................... 42
Table 3.14 MELAB Part 3 Reading Passage Readability Statistics......................................... 43
Table 3.15 MELAB Part 3 Reading Passages (type and length) and Item Difficulty................. 44
Table 3.16 MELAB Part 3 (GCVR) Item Difficulty by Sub-section ........................................... 45
Table 3.17 MELAB Components and Bachman/Palmer Model of Language Knowledge......... 47
Table 3.18 Part 2 (Listening) Form BB Component Means, Standard Deviations, and
Correlation Matrix .................................................................................................49
Table 3.19 Part 2 (Listening) Form BB Factor Loadings Single Factor Solution.......................49
Table 3.20 Part 2 (Listening) Form BB Reproduced Correlation Matrix Single Factor
Solution ................................................................................................................49
Table 3.21 Part 2 (Listening) Form CC Component Means, Standard Deviations, and
Correlation Matrix .................................................................................................50
Table 3.22 Part 2 (Listening) Form CC Factor Loadings Single Factor Solution.......................50
Table 3.23 Part 2 (Listening) Form CC Reproduced Correlation Matrix Single Factor
Solution ................................................................................................................50
Table 3.24 Part 3 (GCVR) Form AA Component Means, Standard Deviations, and
Correlation Matrix .................................................................................................51
Table 3.25 Part 3 (GCVR) Form AA Factor Loadings Single Factor Solution...........................51
Table 3.26 Part 3 (GCVR) Form AA Reproduced Correlation Matrix Single Factor Solution ....51
Table 3.27 Part 3 (GCVR) Form BB Component Means, Standard Deviations, and
Correlation Matrix .................................................................................................51
Table 3.28 Part 3 (GCVR) Form BB Factor Loadings Single Factor Solution...........................52
Table 3.29 Part 3 (GCVR) Form BB Reproduced Correlation Matrix Single Factor Solution ....52
Table 3.30 Part 3 (GCVR) Form CC Component Means, Standard Deviations, and
Correlation Matrix .................................................................................................52
Table 3.31 Part 3 (GCVR) Form CC Factor Loadings Single Factor Solution ..........................52
Table 3.32 Part 3 (GCVR) Form CC Reproduced Correlation Matrix Single Factor Solution....53
Table 3.33 Listening Form BB, GCVR (Forms AA, BB, CC), and Composition Component
Means, Standard Deviations, and Correlation Matrix .............................................53
Table 3.34A Listening Form BB, GCVR (Forms AA, BB, CC), and Composition
Factor Pattern Loadings Two Factor Solution........................................................54
Table 3.34B Listening Form BB, GCVR (Forms AA, BB, CC), and Composition
Factor Structure Loadings Two Factor Solution.....................................................54
Table 3.35 Listening Form BB, GCVR (Forms AA, BB, CC), and Composition Reproduced
Correlation Matrix Two Factor Solution..................................................................54
Table 3.36 Listening Form CC, GCVR (Forms AA, BB, CC), and Composition Component
Means, Standard Deviations, and Correlation Matrix .............................................55
Table 3.37A Listening Form CC, GCVR (Forms AA, BB, CC), and Composition
Factor Pattern Loadings Two Factor Solution........................................................55
Table 3.37B Listening Form CC, GCVR (Forms AA, BB, CC), and Composition
Factor Structure Loadings Two Factor Solution.....................................................56
Table 3.38 Listening Form CC, GCVR (Forms AA, BB, CC), and Composition Reproduced
Correlation Matrix Two Factor Solution..................................................................56
Table 3.39 MELAB Scores for Those Claiming English as Their Native Language and
MELAB Total Group Scores..................................................................................57
Table 3.40 Brief Proficiency Descriptions for MELAB Writing and Speaking Ratings ...............58
Table 3.41 MELAB/TOEFL Descriptive Statistics....................................................................60
Table 3.42 Descriptive Statistics for Teacher Assessment Validity Study ................................61
Table 3.43 Relationship Between MELAB Scores and Teacher Ranking of Students..............62
SECTION 1: GENERAL INFORMATION ABOUT THE MELAB
1.1 AN OVERVIEW OF THE MELAB
The Michigan English Language Assessment Battery (MELAB) is an examination designed to
evaluate the advanced-level English language competence of adult non-native speakers of
English. The MELAB assesses both spoken and written English:
Part 1 is an impromptu composition, written on an assigned topic;
Part 2 is a listening test, delivered via tape recording;
Part 3 is a written test containing grammar, cloze reading¹, vocabulary, and
reading comprehension problems.
A speaking test is optional. The local examiner provides an oral rating based
on an oral interview.
The MELAB was developed to assess the English language proficiency of students who are
applying to U.S. and Canadian universities, colleges, or community colleges where the language
of instruction is English. The MELAB is also used to assess the general English language
proficiency of professionals such as medical personnel, engineers, managers, and government
officials who will need to use English in their work or in on-site training in the U.S. Other
individuals who take the MELAB are non-native speakers interested in obtaining a general
estimate of their English language proficiency to help them make decisions about applying for
educational or employment opportunities.
Many educational institutions in the U.S. and Canada accept the MELAB as an alternative to the
TOEFL² as evidence of English language proficiency. International organizations such as the
World Health Organization and the International Monetary Fund use the MELAB when they need
evidence of English language proficiency of fellowship and scholarship candidates. State
professional boards such as Boards of Nursing use MELAB scores as an indicator of English
proficiency when non-native speakers of English apply for certification exams.
The MELAB is a secure test battery. The test forms included in the battery are not commercially
available.³ The MELAB is administered only by the English Language Institute--University of
Michigan (ELI-UM) and official examiners in the U.S. and around the world who are authorized by
the ELI-UM. A permanent team of testing professionals at the ELI-UM develops the MELAB. A
permanent staff in Ann Arbor oversees all registration for the battery, scores all test papers, and
issues all official score reports.
MELAB score reports include scaled scores for the different parts of the test battery as well as a
Final MELAB score, which is the mean (average) of the scores on Parts 1, 2, and 3. Scores on
the optional speaking test are not averaged with the other part scores. Brief biographical
information, the test date, and the test location also appear on score reports.
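The averaging just described can be shown in a short sketch. The function name is illustrative, and the round-to-nearest convention is an assumption; this section does not state a rounding rule:

```python
def final_melab_score(part1: float, part2: float, part3: float) -> int:
    """Final MELAB score: the mean of the scaled scores on Part 1
    (composition), Part 2 (listening), and Part 3 (GCVR).  The
    optional speaking-test rating is reported separately and is
    never averaged into the final score."""
    # Rounding to the nearest whole number is an assumption here;
    # the manual does not specify the rounding convention.
    return round((part1 + part2 + part3) / 3)

# For example, part scores of 85, 80, and 81 yield a Final MELAB score of 82.
```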
Examinees receive one copy of their MELAB scores and may request that the ELI-UM send
official score reports directly to particular colleges, universities or professional organizations. All
official score reports are embossed and sent out by the ELI-UM. Admissions officers are
cautioned not to accept copies of score reports directly from students. Scores are considered
current and valid for two years. No score reports are issued for tests taken more than two years
in the past.

¹ Cloze reading refers to a test method requiring the examinee to identify the words that have been deleted from a text.
² Test of English as a Foreign Language (TOEFL). Princeton, NJ: Educational Testing Service.
³ Other English language tests developed at the ELI-UM are available to educators. Retired forms of MELAB Parts 2 and 3
are made available as the MELICET (Michigan English Language Institute College English Test) through ELI Test
Publications.
1.2 MELAB ADMINISTRATION
The ELI-UM oversees the administration of all MELABs. MELABs are administered at the
ELI-UM in Ann Arbor, Michigan, and by some 300 approved MELAB examiners around the world
following uniform test administration procedures.⁴ MELABs are administered biweekly to groups
of examinees at the ELI-UM in Ann Arbor and regularly at certain approved group test centers;
elsewhere in the U.S. and around the world, however, MELABs are generally arranged and
administered on an individual basis, with scheduling that accommodates the needs and
constraints of the examiner and the examinee.
In Ann Arbor, Michigan, MELABs are administered by trained staff of the ELI testing division.
Official MELAB examiners who administer the test elsewhere are generally educators who have
applied to serve as MELAB examiners and have met the following selection criteria:
permanent affiliation with an educational institution;
native or near native proficiency in English;
some knowledge of standardized testing and testing procedures;
professional background in evaluation, educational measurement, guidance and counseling,
admissions, or ESL/EFL;
personal qualities necessary in a person responsible for the administration of a secure
examination to individuals and to large groups of examinees.
Test administration instructions are provided to all MELAB examiners, and examiners regularly
receive updates on policy or administration changes. Examiners are monitored for compliance
with established MELAB administration procedures. Many MELAB examiners have extensive
experience in the administration of English language proficiency tests.
An individual who wants to take the MELAB completes the registration form printed in the MELAB
Information Bulletin⁵ and sends it, along with appropriate test fees, to the ELI-UM. Upon receipt
of the form and fees, the ELI-UM staff checks test records to determine whether the individual
meets MELAB eligibility requirements. To be eligible for the MELAB, a person must not have
taken the test more than three times in a 12-month period and must wait at least six weeks between
MELABs. Eligible persons who have taken the MELAB before are assigned different forms of the
test than the forms they took earlier.
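The eligibility rules above lend themselves to a simple check. The sketch below is illustrative only; the function name, date representation, and 365-day window are assumptions, not an ELI-UM procedure:

```python
from datetime import date, timedelta

def is_eligible(previous_tests: list[date], proposed: date) -> bool:
    """Sketch of the MELAB retake rules: no more than three tests in
    any 12-month period, and at least six weeks between MELABs."""
    # Tests falling within the 12 months preceding the proposed date
    recent = [d for d in previous_tests
              if timedelta(0) <= proposed - d < timedelta(days=365)]
    if len(recent) >= 3:        # the proposed test would be a fourth
        return False
    # At least six weeks must separate consecutive MELABs
    if any(proposed - d < timedelta(weeks=6) for d in previous_tests):
        return False
    return True
```

A retake attempted 17 days after a previous test fails the six-week rule; a fourth test within a year fails the frequency rule.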
Once the staff verifies an individual's eligibility, the ELI-UM sends that person the name and
address of a local MELAB examiner and an official identification form to present to the examiner.
Simultaneously, the examiner is sent the prospective examinee's name and appropriate testing
materials. It is the examinee's responsibility to contact the examiner to arrange a test date within
six months of issuance of the identification form.
Some individuals take the MELAB at the request of a sponsoring agency (such as the World
Health Organization). These people are registered by their sponsoring agency, which also pays
their test fees. After verifying their eligibility, the ELI-UM sends sponsored candidates the official
identification form and the name and address of the person to contact to take the examination.
Group Testing Centers register groups of examinees and send rosters of these potential MELAB
examinees to the ELI-UM, which verifies that these examinees are eligible for testing. Each
Group Testing Center sets its own test date(s).
Whether taking the MELAB individually or at a group center, every examinee must present proper
identification to the examiner on the day of the test. This identification is:

a properly-completed MELAB identification form with two recent photographs of the
examinee attached; and
two forms of photo identification (such as a signed passport, an alien registration card,
or a national identity card).

⁴ Appendix A provides a listing of MELAB examination centers.
⁵ The MELAB Information Bulletin and registration forms are available free of charge from the English Language
Institute, Testing and Certification Division.
The identification forms and photographs are sent to the ELI-UM with the examinee's test papers,
where they are kept on file for two years.
All completed test papers are sent to the ELI-UM for scoring. Score reports are sent by U.S. Mail
to examinees and schools seven to fourteen days after test papers are received in Ann Arbor.
For examinees who need score reports faster, there is a rush service available. For an additional
fee, test results are sent by courier directly to an admissions office within 48 hours of the time the
ELI-UM receives the test papers.
Unless examinees order rush service, the total "turnaround" time, from the time test registration
forms are sent to the ELI-UM until the time score reports are received, is six to eight weeks. The
time may be shorter for people tested in the U.S. or registered by a sponsor.
1.3 DESCRIPTION OF THE MELAB
1.3.1 MELAB Parts and their Components⁶
Part 1: Composition. The first part of the battery is a 30-minute writing task. Examinees are
instructed to write an essay on an assigned topic, or prompt. Two prompts are given, and
examinees select the one on which they prefer to write. Examinees are expected to write
between 200 and 300 words on the topic they choose. Examinees are instructed to ask the
MELAB examiner to explain or translate the topics if they do not understand them and are
advised to make a short outline if they wish. They are told that extremely short essays receive a
lower rating and that their handwriting should be readable. They may change or correct parts of
their essay as they wish, but they should not copy the whole composition over. They are informed
that their composition is judged on clarity and overall effectiveness; topic development;
organization; and the range, accuracy, and appropriateness of grammar and vocabulary.
Compositions written on topics other than the assigned topic are not assigned a rating. The
topics typically require the examinee to take a
position on an issue and defend it, to describe something from personal experience, or to explain
a problem and offer possible solutions. Some sample composition topics are:
1. What are the characteristics of a good teacher? Explain and give examples.
2. How should students be evaluated: according to their achievements or their effort? Discuss.
3. What do you think is your country's greatest problem? Explain in detail and tell what you
think can be done about it.
4. Would you prefer to live in the city or in the country? Explain the reasons for your choice.
Part 2: Listening. In this part of the battery, examinees hear 50 test items delivered via an audio
tape recording lasting about 25 minutes. Examinees are informed that the purpose of Part 2 is to
assess how well they understand spoken English. Test instructions are presented via audio tape
and in writing, and examinees are given the opportunity to ask questions about test procedure
prior to beginning the listening test.
The spoken discourse in the listening test includes short questions and statements and longer
discourse segments. Questions, statements, short conversations, a mini-lecture on a topic of
general interest followed by questions, and a longer conversation followed by questions are

⁶ For more information about the content of Part 1, see Section 3.2.1.1; of Part 2, see Section 3.2.1.2; of Part 3, see
Section 3.2.1.3; of the Speaking Test, see Section 3.2.1.4.
Table 1.1 Examples of MELAB Part 2 (Listening) Test Items

Short question
   Instructions: Select the best answer to the question.
   Aural cue: Have you been to see the new movie yet?
   Answer choices:
      a. Yes, I'm going tomorrow.
      b. No, it wasn't very good.
      c. Yes, I went yesterday.

Short statement
   Instructions: Select the answer that means about the same as what you hear.
   Aural cue: Frank never would've gone to the lecture if he'd known how boring it was
   going to be.
   Answer choices:
      a. He didn't want to go.
      b. He didn't like it.
      c. He never went.

Short conversational exchange
   Instructions: Select the answer that means about the same as what you hear.
   Aural cue: M: Let's go to the football game.  F: Good idea. I don't want to stay home.
   Answer choices:
      a. They will go to a game.
      b. They will stay home.
      c. They don't like football.

Statement with emphasized segment
   Instructions: Select what the speaker will say next.
   Aural cue: Tom said he was going to drive to Chicago next week . . .
   Answer choices:
      a. not last week.
      b. not next month.
      c. not fly.

Question with emphasized segment
   Instructions: Select the best response to the question.
   Aural cue: Do you have John's keys?
   Answer choices:
      a. No, but Jane does.
      b. No, I have Jim's.
      c. No, only his bags.

Questions about content of a mini-lecture
   Instructions: Listen to a short lecture. Take notes about what you hear. Look at a graph
   or chart in the test booklet. Answer questions about the lecture referring to notes, chart,
   and graph.
   Aural cue: Mini-lecture followed by several questions.
   Answer choices: 3 printed answer choices for each question.

Questions about content of a conversation
   Instructions: Listen to a conversation, look at a map or diagram associated with the
   conversation, take notes while listening. Answer questions referring to the map or
   diagram and notes.
   Aural cue: Conversational discourse lasting 4-5 minutes followed by several questions.
   Answer choices: 3 printed answer choices for each question.
Table 1.2 Examples of MELAB Part 3 (GCVR) Test Items

Grammar
   Instructions: Choose the word or phrase that best completes the conversation.
   Problem stem: "The boys say they were treated unfairly."  "They got the same
   treatment _____ everyone else."
   Answer choices:
      a. than
      b. that
      c. so as
      d. as

Cloze reading
   Instructions: Select answers that are appropriate in both grammar and meaning.
   Problem stem: A single reading passage of about 250 words with 20 words missing--
   approximately every 7th word appearing as a blank in the text.
   Answer choices: Four answer choices are presented for each missing word.

Vocabulary synonym
   Instructions: Choose the word or phrase that means about the same thing as the
   underlined word or phrase.
   Problem stem: Bill Collins launched his restaurant last June.
   Answer choices:
      a. moved
      b. started
      c. sold
      d. bought

Vocabulary completion
   Instructions: Choose the word or phrase that best fits the context.
   Problem stem: I disagree with a few of his opinions, but _____ we agree.
   Answer choices:
      a. deliberately
      b. conclusively
      c. essentially
      d. immensely

Reading
   Instructions: Read the passage and answer the questions following it according to
   information given in the passage.
   Problem stem: A reading selection of 150-300 words followed by, generally, five
   questions.
   Answer choices: 4 answer choices for each reading comprehension question.
delivered only once via audio recording. The recording is in standard American English, with
male and female speakers speaking at a normal rate. The mini-lectures are delivered at an
average rate of about 150 wpm. A pause of 12-15 seconds follows each aurally presented test
item. Examinees are instructed to select answers from three response options presented in
multiple-choice format in a test booklet and to mark their answers on a separate answer sheet.
Examinees are advised to take notes as they listen to the lecture and longer conversation. The
notes may be used when answering questions.
Part 3: Grammar, Cloze, Vocabulary, Reading (GCVR). The third part of the battery is a 100-
item grammar, cloze, vocabulary, and reading comprehension test. Examinees are allotted one
hour and 15 minutes (75 minutes) for this part of the MELAB. There are 30 grammar items, 20
cloze items, 30 vocabulary items, and 20 reading comprehension questions. Examinees select
responses from four multiple-choice options. The grammar and vocabulary items are discrete-
type items; the cloze items are within the context of a single passage of written text; and the
reading items are based on four different passages of text on different topics.
Speaking Test. The speaking test is an optional part of the MELAB. Interviews lasting 10 to 15
minutes are conducted individually by the examiner administering the MELAB. ELI-UM provides
interviewing guidelines to the examiner that suggest a 3-part framework for the interview: an
opening warm-up phase, a main part to elicit extended discourse from the examinee, and a
closing phase. The interview might include questions about the examinee's background, future
plans, and opinions on certain issues. It might elicit discourse about the examinee's field of
specialization.
The examiner rates the examinee's general command of spoken English. The examiner
considers fluency, intelligibility, grammar, vocabulary, comprehension, and functional language
use.
1.3.2 How the MELAB relates to the former Michigan Battery
The ELI-UM has had a language proficiency testing program for many years.[7] Through the years
new components of the proficiency battery have been developed. In 1985, the old "Michigan
Battery" was replaced with the present Michigan English Language Assessment Battery
(MELAB).
MELAB Part 1, the written composition, is essentially the same type of writing task as was in the
Michigan Battery, but the essay topics are continually updated. The level rating descriptions
have undergone complete revision, and a coding system has been established to provide
feedback to examinees about salient features of their written composition.
MELAB Part 2, the listening test, replaced the Listening Comprehension Test (LCT).[8] Compared
to the LCT, MELAB Part 2 was designed to place less emphasis on "aural grammar" and greater
emphasis on the comprehension of naturally spoken English. Part 2 now includes
items that require understanding the prosodics of spoken English, as well as two segments of
longer discourse, in the form of a lecture and a conversation. In the piloting of the first forms of
MELAB Part 2, examinees were given both an LCT and a MELAB Part 2. Forms BB and CC of
MELAB Part 2 correlated .65 and .67 respectively with the LCT.[9] Correlation coefficients of the new
item types (emphasis comprehension items and items based on the comprehension of extended
discourse) with the LCT items ("aural grammar" items) range from .42 to .52. These coefficients
indicate that there is only a moderate relationship between the skills tested by the new and the old
item types. Typically scores on MELAB Part 2 are significantly lower than on the LCT (by an
average of 10 points). Therefore, it is crucial that users of MELAB scores do not consider
MELAB scores to be equivalent to scores on the old Michigan Battery. Any proficiency test
cut scores established prior to 1985 (that is, established for the old Michigan Battery rather than
the MELAB) must be re-examined.
MELAB Part 3 replaced the Michigan Test of English Language Proficiency (MTELP).[10] The
number of grammar and vocabulary items was reduced, and a cloze reading passage (a longer
discourse segment with items focusing on both grammatical appropriateness and text
comprehension) was added. Correlation of various forms of MELAB Part 3 (Forms AA, BB, CC)
with the MTELP (Forms P, Q, R) yields coefficients ranging from .88 to .91, which suggests that
Part 3 of the MELAB may be more similar to its predecessor component, the MTELP, than Part 2
of the MELAB is to its predecessor component, the LCT.
The speaking test, an oral interview, remains an optional component of the battery. The MELAB
Spoken English Reference Sheet, though, reflects an extensive revision of the level descriptors
used in oral interviews given with the old Michigan Battery.

[7] Information about tests in former Michigan proficiency batteries is available in Appendix B: Historical Background
Leading to the MELAB.
[8] Listening Comprehension Test (LCT). (1972). Ann Arbor, MI: English Language Institute, The University of Michigan.
[9] MELAB piloting is conducted on non-native speakers of English who contact the English Language Institute because
they need an assessment of their English language proficiency. Pilot MELAB listening tests, Forms AA, BB, and CC, were
administered under normal test conditions with subjects representative of the range of language backgrounds and
proficiency levels typical of MELAB candidates (Form AA N=106, Form BB N=215, Form CC N=178). Pilot testing
occurred during 1983-85. In 1986, Form AA, a 40-item listening test, was dropped when it was decided to use a 50-item
operational format.
[10] Michigan Test of English Language Proficiency (MTELP). (1968, 1971, 1979). Ann Arbor, MI: English Language
Institute, The University of Michigan.
1.4 SCORING OF THE MELAB
MELAB scores are reported on an official score report form. A score for each of the parts is
reported, as well as the Final MELAB Score, which is the average of the scores for Part 1, Part 2,
and Part 3. Examinees who have had the speaking test have an oral rating reported. The oral
rating is not averaged into the Final MELAB Score. Other information about test performance
may be reported in the verbal comments section of the score report. A sample score report form
is shown in Appendix C.
Scores on the various MELAB parts are reported on a numerical scale:

Section                Range     Note
Part 1 (Composition)   53-97     May be supplemented by codes
Part 2 (Listening)     30-100
Part 3 (GCVR)          15-100
Final MELAB Score      33-99     Average of Parts 1, 2, and 3
Speaking               1-4+      Optional; may be supplemented by interviewer comments
1.4.1 Part 1 Composition Score
MELAB compositions are rated on how clearly and effectively ideas are communicated in written
English. Each MELAB composition score is a numerical score (ranging from 53 to 97) that may
be followed by one or more letter codes. The numerical score represents the general level of
writing proficiency evident in the composition. Letter codes represent features of the writing that
the raters found especially strong or weak in relation to the overall level of the writing.
Compositions are rated or scored by a small group of trained, experienced raters at the ELI-UM
who assess compositions daily. Each composition is scored by at least two raters who read
independently and independent of knowledge about how the examinee performed on other parts
of the MELAB. Compositions are typically read in small batches, and their order is altered by the
second rater to minimize any effect that the position of a composition within a set of compositions
might have on the score it is given.
A rater assigns one of ten numerical scores to a composition (53, 57, 63, 67, 73, 77, 83, 87, 93,
or 97). If the scores the first two raters assign to a single composition are identical or only one
scale point apart, the composition is assigned the average of these two scores. Essays for which
the two initial ratings differ by more than one scale point are scored by a third rater. When the
initial ratings are two scale points apart and the third rating falls between them, the middle score
is used. In all other cases, the composition is given the average of the two scores that are
closest to (or equal to) each other. A third rater is also used in cases where there is a large
discrepancy between an examinee's composition score and scores on other parts of the MELAB.
Approximately 8% of MELAB compositions are read by a third rater.
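The score-resolution rule described above can be sketched in code. This is an illustrative reconstruction of the prose rule, not an ELI-UM program; the scale list, function names, and the tie-break for equidistant rating pairs are our assumptions.

```python
# Illustrative sketch of the composition score-resolution rule described
# above. SCALE and the tie-breaking for equidistant pairs are assumptions;
# the manual states the rule in prose only.

SCALE = [53, 57, 63, 67, 73, 77, 83, 87, 93, 97]

def scale_points_apart(a, b):
    """Distance between two ratings counted in scale points, not raw points."""
    return abs(SCALE.index(a) - SCALE.index(b))

def resolve_score(r1, r2, third_rater=None):
    """Combine two independent ratings; consult a third rater only if needed."""
    if scale_points_apart(r1, r2) <= 1:
        return (r1 + r2) / 2                 # identical or adjacent: average
    r3 = third_rater()                       # more than one scale point apart
    if scale_points_apart(r1, r2) == 2 and min(r1, r2) < r3 < max(r1, r2):
        return r3                            # third rating falls between: use it
    # otherwise, average the two ratings closest to (or equal to) each other
    pairs = [(r1, r2), (r1, r3), (r2, r3)]
    a, b = min(pairs, key=lambda p: scale_points_apart(*p))
    return (a + b) / 2
```

For example, ratings of 73 and 77 average to 75, while ratings of 73 and 83 trigger a third reading; if that third rating is 77, the middle score, 77, is reported.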
Description of MELAB scores by level:
97 Topic is richly and fully developed. Flexible use of a wide range of syntactic (sentence level)
structures, accurate morphological (word forms) control. Organization is appropriate and
effective, and there is excellent control of connection. There is a wide range of appropriately
used vocabulary. Spelling and punctuation appear error free.
93 Topic is fully and complexly developed. Flexible use of a wide range of syntactic structures.
Morphological control is nearly always accurate. Organization is well controlled and appropriate
to the material, and the writing is well connected. Vocabulary is broad and appropriately used.
Spelling and punctuation errors are not distracting.
87 Topic is well developed, with acknowledgment of its complexity. Varied syntactic structures
are used with some flexibility, and there is good morphological control. Organization is controlled
and generally appropriate to the material, and there are few problems with connection.
Vocabulary is broad and usually used appropriately. Spelling and punctuation errors are not
distracting.
83 Topic is generally clearly and completely developed, with at least some acknowledgment of
its complexity. Both simple and complex syntactic structures are generally adequately used;
there is adequate morphological control. Organization is controlled and shows some appropriacy
to the material, and connection is usually adequate. Vocabulary use shows some flexibility, and
is usually appropriate. Spelling and punctuation errors are sometimes distracting.
77 Topic is developed clearly but not completely and without acknowledging its complexity.
Both simple and complex syntactic structures are present; in some "77" essays these are
cautiously and accurately used while in others there is more fluency and less accuracy.
Morphological control is inconsistent. Organization is generally controlled, while connection is
sometimes absent or unsuccessful. Vocabulary is adequate, but may sometimes be
inappropriately used. Spelling and punctuation errors are sometimes distracting.
73 Topic development is present, although limited by incompleteness, lack of clarity, or lack of
focus. The topic may be treated as though it has only one dimension, or only one point of view is
possible. In some "73" essays both simple and complex syntactic structures are present, but
with many errors; others have accurate syntax but are very restricted in the range of language
attempted. Morphological control is inconsistent. Organization is partially controlled, while
connection is often absent or unsuccessful. Vocabulary is sometimes inadequate, and
sometimes inappropriately used. Spelling and punctuation errors are sometimes distracting.
67 Topic development is present but restricted, and often incomplete or unclear. Simple
syntactic structures dominate, with many errors; complex syntactic structures, if present, are not
controlled. Lacks morphological control. Organization, when apparent, is poorly controlled, and
little or no connection is apparent. Narrow and simple vocabulary usually approximates meaning
but is often inappropriately used. Spelling and punctuation errors are often distracting.
63 Contains little sign of topic development. Simple syntactic structures are present, but with
many errors; lacks morphological control. There is little or no organization, and no connection
apparent. Narrow and simple vocabulary inhibits communication, and spelling and punctuation
errors often cause serious interference.
57 Often extremely short; contains only fragmentary communication about the topic. There is
little syntactic or morphological control, and no organization or connection are apparent.
Vocabulary is highly restricted and inaccurately used. Spelling is often indecipherable and
punctuation is missing or appears random.
53 Extremely short, usually about 40 words or less; communicates nothing, and is often copied
directly from the prompt. There is little sign of syntactic or morphological control, and no apparent
organization or connection. Vocabulary is extremely restricted and repetitively used. Spelling is
often indecipherable and punctuation is missing or appears random.
N.O.T. N.O.T. (Not On Topic) indicates a composition written on a topic completely different
from any of those assigned; it does not indicate that a writer has merely digressed from or
misinterpreted a topic. N.O.T. compositions often appear prepared and memorized. They are
not assigned scores or codes.
Since September, 1989, in addition to giving a numerical score to a composition, raters have had
the option of assigning letter codes. Any code assigned to the same composition by two or more
raters is reported along with the numerical score to provide additional interpretive information to
both examinees and institutional score users. A letter code means that one feature of the writing
was especially strong or weak for the particular score level, though not strong or weak enough to
raise or lower the overall score. For example, the vocabulary used could be especially broad for
a score level of 77, but not strong enough to raise the overall score to 83. Codes, like number
scores, are assigned independently by 2 to 3 trained raters.
Code letters do not raise or lower number scores. The codes do not replace the number score;
they add detail to it. There are 20 codes. Each describes one feature of writing. None of these
features works alone, but each one can affect writing quality. A list of the code letters and their
meanings is given below. The alphabetical order of the codes has no positive or negative
meaning. For example, 'a' does not mean excellent, and 'f' does not mean poor or failure.
Letters are used simply as a convenient means for reporting score information.
Key to composition score codes:
NOTE: the codes are meant to indicate that a certain feature is ESPECIALLY GOOD OR BAD IN
COMPARISON TO THE OVERALL LEVEL OF THE WRITING
a topic especially poorly or incompletely developed
b topic especially well developed
d organization especially uncontrolled
e organization especially well controlled
f connection especially poor
g connection especially smooth
h syntactic (sentence level) structures especially simple
i syntactic structures especially complex
j syntactic structures especially uncontrolled
l especially poor morphological (word forms) control
m especially good morphological control
n vocabulary especially narrow
o vocabulary especially broad
p vocabulary use especially inappropriate
r spelling especially inaccurate
s punctuation especially inaccurate
t paragraph divisions missing or apparently random
v question misinterpreted or not addressed
w reduced one score level for unusual shortness
x other (write-in: see score report)
1.4.2 Part 2 Listening Score
The listening score reported is a converted or scaled score. The raw score, the total number of
test items answered correctly, is converted to a scaled score. There are 50 test items on the
listening test. The scale range in the conversion scale for Part 2 of the MELAB is 30-100. A
chance, or guessing, score is about 45 (scaled score). The average score is about 75.
The specific conversion scale varies from form to form. Conversion scales
for various MELAB forms are based on normative information (primarily percentile rank) on
alternate forms of the tests.[11] Converted scores are neither percentage scores nor the exact
number of problems answered correctly. The following table is extrapolated from the conversion
tables of four forms of MELAB Part 2 and provides a rough indication of the relationship between
the raw and scaled scores of MELAB Part 2.
Table 1.3 MELAB Part 2 (Listening) Score Converted from Raw Score

Raw Score          Raw Score     Part 2: MELAB
(number correct)   (% correct)   Scaled Score
47 - 48            94 - 96       95
42 - 43            84 - 86       90
37 - 40            74 - 80       85
30 - 35            60 - 70       80
25 - 29            50 - 58       75
23 - 25            46 - 50       70
20 - 21            40 - 42       65
18                 33            45
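A band lookup over Table 1.3 can be sketched as follows. The table values are rough extrapolations, and the operational conversion scale for each form differs, so both the band list and the function here are approximations for illustration only.

```python
# Rough band lookup based on Table 1.3. Actual operational conversion
# scales vary by test form; these values and this function are
# illustrative approximations only.

PART2_BANDS = [            # (lowest raw score in band, scaled score)
    (47, 95), (42, 90), (37, 85), (30, 80),
    (25, 75), (23, 70), (20, 65), (18, 45),
]

def approx_part2_scaled(raw_correct):
    """Return the approximate Part 2 scaled score for a raw score (0-50)."""
    for band_floor, scaled in PART2_BANDS:
        if raw_correct >= band_floor:
            return scaled
    return None  # below the lowest band shown in Table 1.3
```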
1.4.3 Part 3 Grammar, Cloze, Vocabulary, and Reading (GCVR) Score
The MELAB Part 3 (GCVR) score, like the Part 2 (Listening) score, is a scaled score. There are
100 test items on Part 3. The total number answered correctly, the raw score, is converted to a
scaled score. The score range in the conversion scale for MELAB Part 3 is 15-100. A chance, or
guessing, score is about 40 (scaled score). The mean score is about 75. The specific conversion
scale varies from form to form. The following table is a rough indication of
the relationship between raw and scaled scores of MELAB Part 3.
Table 1.4 MELAB Part 3 (GCVR) Score Converted from Raw Score

Raw Score          Raw Score     Part 3: MELAB
(number correct)   (% correct)   Scaled Score
89 - 93            89 - 93       95
78 - 83            78 - 83       90
66 - 75            66 - 75       85
59 - 66            59 - 66       80
52 - 57            52 - 57       75
44 - 52            44 - 52       70
38 - 46            38 - 46       65
25                 25            40
Note that the scaled score reported for MELAB Part 2 and Part 3 is neither a percentage
score nor the exact number of problems answered correctly. The information in Tables 1.3 and
1.4 is provided only to increase understanding of the relationship between raw scores and
MELAB reported scaled scores. Raw scores are not reported on the score report form.

[11] The initial normative group used to establish conversion scales for a new test form is composed of approximately 100
MELAB examinees who each take two alternate forms. Various factors such as linguistic background and proficiency
level are considered in selection of subjects for a test norm group. Conversion scales are re-analyzed later with a larger
subject pool, and slight adjustments may be made in the scale.
At present, both Part 2 and Part 3 of the MELAB are generally hand-scored with a scoring stencil.
If an examinee questions the accuracy of the scoring, the answer sheets are re-scored. If any
discrepancies are found, all MELAB reports are immediately corrected, and revised reports are
distributed. There is no fee for this service. Continued monitoring of MELAB scoring procedures
has revealed a consistently high level of accuracy in the scoring and reporting of MELAB scores.
1.4.4 Final MELAB Score
The Final MELAB score is the average of the scores of Parts 1, 2, and 3 (e.g. 73 + 87 + 76 = 236;
236 divided by 3 is approximately 78.7, rounded to 79).
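The averaging step can be sketched as a one-line function. The rounding behavior (to the nearest whole number) is inferred from the worked example, where 236 divided by 3 is reported as 79; the function name is ours.

```python
def final_melab_score(part1, part2, part3):
    """Final MELAB Score: the mean of the three part scores, rounded to the
    nearest whole number (rounding inferred from the manual's worked example)."""
    return round((part1 + part2 + part3) / 3)
```

Applied to the four examinees in Table 1.6, this reproduces the reported finals: A (77, 84, 82) gives 81; B (77, 81, 79) gives 79; C (65, 86, 73) and D (77, 62, 85) each give 75.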
Occasionally, instead of a numerical score, the letters "NFS" appear on a score report form.
"NFS" stands for "No Final Score." "NFS" appears if the English Language Institute Testing
Division cannot report a final score because (1) the examinee left the room and did not complete
the examination; or (2) the examinee received no Part 1 score, e.g. wrote off topic so that "NOT"
(see Section 1.4.1) was assigned in place of a Part 1 score; or (3) some sort of cheating or test
compromise occurred prior to or during the MELAB test administration.
If the examinee takes the speaking test, the oral rating is never averaged in with the Final MELAB
score. It is always reported separately on the score report form.
1.4.5 Speaking Test (optional)
The speaking test is available to provide a rating of the examinee's spoken English. The score
on this part is not averaged in with the three other parts of the MELAB. The MELAB oral score
provides information about face-to-face communicative use of English. The oral rating is arrived
at independently of the other MELAB scores and thus provides information that may complement
or confirm interpretation of results on other parts of the MELAB.
MELAB oral scores range from 1 to 4, with 4 being the highest. An examiner who thinks the
examinee's spoken English is between levels may add a plus (+) or minus (-) to the score, for
example, 3+ or 2-. The average oral rating is 2+. In addition to an overall rating, the examiner
may comment on features of the examinee's spoken English:
- fluency/intelligibility
- grammar/vocabulary
- functional language use/sociolinguistic proficiency
- listening comprehension
The examiner's observations of salient features of the examinee's spoken English are reported on
the score report form. A reference sheet provides examiners with descriptions of each score
level and of salient features of spoken English. A copy of this reference sheet appears in
Appendix D.
1.5 USING THE MELAB
1.5.1 Interpreting Scores
The MELAB was developed to evaluate the English language proficiency of non-native speakers
(NNSs) of English interested in pursuing academic studies in English at the college and university
level. The component items and parts are trialed on NNSs of English, primarily in their late teens
and twenties, both within and outside the U.S.
MELAB scores provide information on the general language proficiency of the examinee at the
time the individual took the test battery. Examinees with higher Final MELAB scores tend to be
judged more proficient in English (as measured by "productive" tests--a written composition and a
speaking test) than those with lower Final MELAB scores. Table 1.5 shows speaking and writing
proficiency level information of examinees who scored in various Final MELAB score ranges.
This information suggests that examinees with higher Final MELAB scores have higher speaking
and writing proficiency than those with lower Final MELAB scores. However, there is some
overlap in the level of the productive skills of examinees in different Final MELAB score ranges.
For example, some individuals with Final MELAB scores in the high 70's have speaking and
writing profiles similar to some individuals with Final MELAB scores in the low 80's. This
suggests that some variability in speaking and writing proficiency is to be expected with reference
to specific Final MELAB scores and that it is inappropriate to make rigid interpretations of what
scores mean with regard to language proficiency.
Table 1.5 Final MELAB Scores and Proficiency Levels in Speaking and Writing [1]

Final MELAB    Typical Rating   Proficiency            Typical Rating   Proficiency
Score Level    Speaking [2]     Speaking [3]           Writing [2]      Writing [4]
Below 60       1+ - 3           limited to capable     60 - 67          limited
60-69          2 - 3            modest to capable      65 - 73          limited to basic
70-79          2+ - 3           modest to capable      70 - 77          basic
80-89          3 - 3+           capable                77 - 83          basic to good
90+            3+ - 4           capable to very good   87 - 93          good to very good

[1] The subjects were 1705 MELAB examinees of varied linguistic backgrounds who took a MELAB exam with a
speaking test between 1987 and 1990; about 30% of the exams were conducted in the U.S. or Canada and 70%
elsewhere around the world. The speaking test they took was the MELAB Speaking Test; the writing test was MELAB
Part 1. It should be noted that the Final MELAB scores are, therefore, not completely independent of the writing scores
used in this study.
[2] Typical rating: scores of those in the 25th percentile to 75th percentile (the interquartile range)
[3] Proficiency Speaking: very good/good = speaking at 4 or 4+ level; capable = speaking at 3 or 3+ level;
marginal/modest = speaking at 2 or 2+ level; limited = speaking at 1 or 1+ level
[4] Proficiency Writing: very good = writing at 93 or 97 level; good = writing at 83 or 87 level; basic = writing at 73 or
77 level; limited = writing at 57, 63 or 67 level
It is important to remember that MELAB scores are only estimates of examinees' true proficiency.
MELAB scores, like all test scores, are affected by measurement error. A MELAB score may be
influenced by factors unrelated to an examinee's language proficiency. Such factors might
include temporary characteristics unique to the individual (e.g. fatigue, anxiety, illness). Individual
personal characteristics with regard to background and personality as well as aspects of test
method can also affect test performance. Consequently, it is always appropriate to be cautious
about interpreting scores on language tests.
A useful statistic to consider when interpreting test scores is the standard error of measurement,
or SEM. The SEM estimates the extent to which "observed" test scores (here, MELAB scores)
and "true" test scores (hypothetical scores free of all measurement error) differ. The SEM of the
MELAB is estimated to be approximately 3 points. As an example of how to consider the SEM
when interpreting MELAB scores, consider the case of two examinees, Examinee A, who scores
81 on the MELAB, and Examinee B, who scores 79 (see Table 1.6). It is not possible to say
whether these "observed" scores are the same as the examinees' "true" scores. What can be
inferred, though, considering the SEM, is that 68 out of 100 times, Examinee A's "true" score will
fall between 78 and 84; and Examinee B's will fall between 76 and 82. Because these score
ranges overlap, we cannot be sure that Examinee A is more proficient than Examinee B. To say
that there is a statistically significant (at the .05 level) difference in the proficiency of these two
individuals, there would have to be at least a 6 point difference in their scores.
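The plus-or-minus one SEM reasoning above can be sketched as follows, using the manual's estimated SEM of about 3 points. The function names are ours, and treating non-overlapping one-SEM bands as the criterion for a confident ranking is a simplification of the statistical argument in the text.

```python
# Sketch of the one-SEM reasoning above: an observed score's 68% "true
# score" band is the observed score plus or minus one SEM (about 3 points).

SEM = 3

def true_score_band(observed):
    """Approximate 68% band for the examinee's 'true' score."""
    return (observed - SEM, observed + SEM)

def can_rank(score_a, score_b):
    """True only when the two one-SEM bands do not overlap, i.e. when we
    can be reasonably confident one examinee is more proficient."""
    a_lo, a_hi = true_score_band(score_a)
    b_lo, b_hi = true_score_band(score_b)
    return a_hi < b_lo or b_hi < a_lo
```

For Examinees A and B, true_score_band(81) gives (78, 84) and true_score_band(79) gives (76, 82); the bands overlap, so the two cannot be confidently ranked.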
Table 1.6 Examples of MELAB Scores

Examinee   Part 1        Part 2      Part 3   Final
           Composition   Listening   GCVR     MELAB Score
A          77            84          82       81
B          77            81          79       79
C          65            86          73       75
D          77            62          85       75
When interpreting MELAB scores, both part scores and the final score should be considered. For
example, consider two individuals (C and D) who have the same final scores but quite different
part scores (see Table 1.6).
Even though C and D have the same final MELAB score, a 75, their scores on the three parts
suggest differences in their language proficiency. Such differences may affect their ability to use
English effectively in different contexts.
When interpreting language proficiency test scores, it is also important to consider that factors
other than language also affect how well someone can communicate. Lyle Bachman, in
Fundamental Considerations in Language Testing (1990),[12] theorizes that communicative language ability
consists of both knowledge of language and knowledge of the world. In the general context of
language used when pursuing academic studies in English, it follows that the ability to function in
this setting involves not only knowledge of English, but also other knowledge and skills such as
intellectual knowledge and study skills. Language is just one of many factors that affect success,
or lack of success, in an academic setting. Consequently, MELAB scores should not be used to
predict academic success or failure.
When MELAB scores are used by an institution to provide evidence of English language
proficiency, the institution must determine what level of English language proficiency is desirable
for that institutional context. Relevant factors to consider include the nature of the ESL services
available to non-native speakers of English and the linguistic demands of the instructional
context. Examples of how different types of institutions use MELAB scores appear in sections
1.5.2 to 1.5.4 below.
A MELAB score, like a snapshot, captures characteristics of a person at a particular time; and,
like a snapshot, it can become outdated. Because language proficiency may change over time,
MELAB score users should always consider how recently the test was administered when
interpreting a MELAB score. The ELI-UM will not issue score reports for tests taken more than
two years in the past.
We know that various factors (such as amount and quality of instruction, experience using the
language, motivation, language background, and individual differences in language ability) affect
the development of language proficiency, but these factors affect various people differently. It is
not possible to predict the length of time or type of instruction necessary for an individual to

[12] Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
demonstrate a particular performance level on the MELAB or to achieve a score gain of a
particular amount on the MELAB.
1.5.2 An Example of MELAB Use for a University Context
The University of Michigan is a large public university offering undergraduate and graduate
programs in a variety of disciplines. Generally, international students are admitted for full-time
academic study rather than part-time academic study. Upon matriculation, an international
student is expected to take a full academic load. Only non-native speakers with good or
advanced proficiency in English are considered for admission, but it is accepted that some
students may need some ESL work to enhance their written and oral communication skills.
During the academic year, a range of courses is available to help non-native speakers become
effective and fully participating members of the academic community. These ESL courses are
primarily known as English for Academic Purposes (EAP) courses and focus on improving written
and oral communication. Most of these courses are one credit hour courses that meet 2 or 3
hours a week for 14 weeks.
1.5.2.1 Undergraduates
At the University of Michigan, the various admissions officers inform applicants who are non-
native speakers of English that English proficiency requirements can be met with MELAB or
TOEFL scores.
Applicants are considered for admission to undergraduate study with Final MELAB scores above
80 and all part scores at 80 or above, or with TOEFL scores above 560 and all section scores at
56 or above. Rigid cut scores are not applied. All relevant information about English language
proficiency is used by admissions staff so the policy is applied in a flexible manner. Applicants
whose Final MELAB or part scores are 85 or lower, or whose TOEFL scores are 600 or lower or
whose section scores are 60 or lower (and also those without a Test of Written English[13] score of
at least 5.0) are generally required to have their English language proficiency re-evaluated upon
arrival. As a result of this on-campus testing, students may be required to take an EAP (English
for Academic Purposes) course. Typically, at UM about half of the entering undergraduates are
exempted from English language work. The typical requirement for the other half is one EAP
mini-course, usually a writing course that meets two hours a week. The EAP writing course must
be taken before the student enrolls in a regular university composition course. EAP courses are
taken concurrently with other academic course work.
1.5.2.2 Graduates
At the University of Michigan, applicants to graduate programs who already have received a
degree from an accredited U.S. institution are not required to provide evidence of language
proficiency with a MELAB or TOEFL score. However, they are evaluated for English proficiency
on campus prior to beginning their first term of admission. Applicants whose previous degree is
not from a U.S. institution or whose degree is from a U.S. institution where the majority of
instruction is in a language other than English are required to show evidence of English
proficiency through MELAB or TOEFL scores. Generally, applicants must have Final MELAB
scores of at least 80 or TOEFL scores of at least 560. Applicants to Biological and Physical
Science or Engineering generally need a Final MELAB score of at least 80 to be considered for
admission. Those applying to programs in the social sciences and humanities generally need a
Final MELAB score of at least 85 to be considered for admission. As is the case for
undergraduate students, there is flexibility when using language proficiency test scores for
admissions decisions, and graduate students who have a Final MELAB score of 85 or below, or
part scores of 85 or below (or TOEFL scores at or below 600/60 or TWE below 5.0) are required
to be re-evaluated upon arrival on campus. Typically, 65 percent of the graduate students

[13] Test of Written English. Princeton, NJ: Educational Testing Service.
reassessed on campus are required to take supplementary EAP instruction of 2 to 4 hours a
week concurrently with their regular academic load.
We have found at the University of Michigan that MELAB scores need to be interpreted with
some flexibility and that factors such as educational background, first language of the learner,
and aural acuity can differentially influence the rate of development of proficiency. Although
generally a Final MELAB score of at least 80 has been the recommended minimum for
admission, there have been special cases in which a graduate student with a score in the low
70's has been recommended for admission. An individual may obtain a below average score on
Part 3 of the MELAB but obtain significantly above average scores on Part 1 (Composition) and
Part 2 (Listening), and the examinee might demonstrate effective communication strategies in the
speaking test. Such an individual might be viewed as nearly adequate and recommended for a
reduced course load the first term. Prior practical experience in the individual's area of graduate
study can also compensate for gaps in range and accuracy of English, and English language
improvement courses can be taken concurrently with graduate academic courses.
1.5.3 An Example of MELAB Use for a Community College Context
Washtenaw Community College (WCC) is a two-year community college in Michigan that
requires international applicants to provide evidence of English proficiency before it issues the
authorization form for a student visa. The minimum MELAB score for admission is 75, and the
minimum TOEFL score is 500. Upon arrival, these international students, along with all other
students, take an exam of basic skills in reading, writing, and mathematics (the ASSET test). [14]
International students with low ASSET scores may be required to take one or more of WCC's
English as a second language classes which are offered for new international students and for
resident students for whom English is a second language.
1.5.4 An Example of MELAB Use for Professional Contexts
The MELAB was developed to give evidence of English proficiency for academic purposes.
However, the test may also be appropriate for assessing the proficiency of individuals who need
English for academic examinations certifying their professional competence. Because at least
modest proficiency in spoken and written English is necessary to succeed on a professional
examination in English, those with low MELAB scores may not be allowed to take certain
professional exams. One agency, the Michigan State Board of Nursing, Department of Licensing
and Regulation, has specified that the following MELAB scores must be obtained before sitting for
the licensing examination:
Final MELAB: not less than 75
Part scores: none less than 70
Oral rating: at least 3
Various professional agencies may wish to establish certain minimal scores to meet the specific
purposes for which examinees are having their proficiency evaluated. The English Language
Institute Testing Staff will work with professional agencies in designing studies to establish
appropriate MELAB scores for various academic and professional programs.

[14] ACT ASSET Program. Iowa City, IA: The American College Testing Program (ACT).
1.6 PREPARING FOR THE MELAB
The MELAB is a general language proficiency test that is not linked to any particular book,
language study program, or course of study. The best way to develop proficiency in a language
is through active use of the language for communication, combined with study of materials that
widen exposure to the language. A variety of English language learning materials is available at
bookstores and libraries. Some materials are also available that give students practice with
multiple-choice tests. The ELI does not sell any particular test preparation materials; it does,
though, recommend A Student's Guide to the MELAB [15] by Mary Spaan to examinees who want
extra practice with test questions in the MELAB format. Examinees may also become familiar
with the format of the test by working through the sample problems in the free MELAB Information
Bulletin. They may prepare for the impromptu composition portion by writing on a topic for 30
minutes. They may prepare for the listening test by giving themselves frequent opportunities to
listen to spoken English.
1.7 TEST SECURITY/INVALIDATIONS
Test security is taken extremely seriously in an effort to ensure that MELAB test scores actually
reflect the proficiency of the examinee. The MELAB is never sold to students, educators,
libraries, or the general public. Examiners agree to keep all MELAB materials in a secure, locked
place when not in use. Examiners are knowledgeable about the testing procedures. A detailed
outline of their duties is given in the Administration Manual for MELAB Examiners. [16] Examiners
are instructed as to what procedures to follow if test compromise is suspected. If test
compromise occurs, the examinee's test will be invalidated and any schools that received scores
will be notified. The exam center will also be notified, and the examinee will not be allowed to test
again.
While it is rare, it does occasionally happen that examinees try to cheat on their MELAB. They
might try to have someone else take the test for them, or they might try to copy from other
examinees' test papers. Examiners thoroughly check identification before the test. Two
photographs, which must match the appearance of the person taking the test, are collected and sent with the
test papers to the ELI-UM. Test forms are alternated in administrations where more than one
examinee is being tested, which makes copying useless. Test papers and photos are kept on file
at the ELI-UM. Any institute or admissions office suspicious of an applicant's scores may request
a photograph and/or handwriting sample (from the written composition). Official score reports are
sent directly from the ELI-UM to an admissions office and bear an embossed seal over the Final
Score. Any report that appears tampered with or any examinee copy or photocopy of a MELAB
score report should not be accepted as official, and the ELI-UM should be notified.
On occasion, when an examinee takes the MELAB a second time, the scores on the two tests are
extremely different. We know from our years of experience testing thousands of examinees that
scores do not normally show extreme variation in a short time. For example, an examinee
scoring a 65 on the GCVR section might after six weeks take the test again and score in the 70's,
but it would be very unlikely for the second score to be in the high 80's or 90's. It is that sort of
variation that makes us take a closer look at tests, to see if there has been any test compromise.
Because a composition is an integral part of every MELAB, we have handwriting samples from
both test administrations. The examiners who administered the MELAB may be contacted when
investigating any irregularities. Examiners also inform ELI-UM if anything unusual happens while
they are administering the MELAB, and this is investigated by the ELI-UM.
We strongly encourage officials at institutions to contact the English Language Institute MELAB
testing office at the University of Michigan in Ann Arbor if there is any doubt about the authenticity
or veracity of a MELAB score report. This is a FREE service and all requests are handled
promptly.

[15] Spaan, M. (1992). A student's guide to the MELAB. Ann Arbor, MI: University of Michigan Press.
[16] The Administration Manual for MELAB Examiners is available from the ELI-UM Testing Division.
SECTION 2: MELAB STATISTICS
2.1 GENERAL DESCRIPTIVE STATISTICS
Table 2.1 presents general descriptive test statistics (across all test forms) for MELAB part scores
and the final MELAB score. The statistics are based on the test papers of 4,811 examinees
taking the MELAB for the first time between 1991 and 1993. [1] Of these examinees, 67.8 percent
were tested in the United States or Canada; 32.2 percent were tested overseas. They spoke 78
different native languages. The most common native language was Chinese (20.2 percent).
The other languages among the ten most common were, in descending order of frequency, Arabic, Farsi,
Spanish, Japanese, Korean, Vietnamese, Russian, German, and Portuguese. Sixty-seven
percent of the sample spoke one of these ten languages. For mean MELAB scores of this group
classified by reason for testing, by sex, and by age, see Tables 2.7, 2.8, and 2.9, respectively.
Table 2.1 Score Descriptives for 4,811 First-Time MELABs Administered 1991-1993
                       Part 1          Part 2        Part 3    Final
                       (Composition)   (Listening)   (GCVR)    MELAB
Minimum Scaled Score   53              33            25        38
Maximum Scaled Score   97              100           100       99
Median Scaled Score    75              79            77        77
Mean Scaled Score      75.42           77.40         74.69     75.84
Standard Deviation     7.90            12.13         14.87     10.40
Reliability [1]        .90             .89           .94       .91
SEM [2]                2.50            4.02          3.64      3.12

[1] Reliability figures were calculated using the mean interrater correlation for Part 1 (see Section 3.1; note that the set
of compositions used to calculate this coefficient is not identical to the set summarized in Table 2.1) and from KR-21
applied to raw scores for Part 2 and Part 3. The reliability estimate for the Final MELAB Score is the mean of these
estimates for Part 1, Part 2, and Part 3.
[2] Standard error of measurement.
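The SEM figures in Table 2.1 follow from the classical test theory relation SEM = SD × √(1 − reliability). A minimal sketch, using the Part 1 (Composition) values from the table:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Part 1 (Composition) in Table 2.1: SD = 7.90, reliability = .90
print(round(standard_error_of_measurement(7.90, 0.90), 2))  # 2.5, i.e. the tabled SEM of 2.50
```

The same formula reproduces the Part 2 and Part 3 entries (4.02 and 3.64) from their tabled SDs and reliabilities.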
Table 2.2 shows MELAB part scores and final scores that correspond to specified percentiles.
Using this table, it is possible to see what MELAB score (final or scaled part score) indicates that
a particular examinee scored higher than a certain percentage of all examinees tested.
For example, to find out what final MELAB score indicates that an examinee scored higher than
50 percent of all examinees tested, find 50 in the column labeled Percentile, and look at the
column on the far right. A final MELAB score of 77 corresponds to the 50th percentile (for final
MELAB scores). The table can also be used to see what MELAB part scores correspond to a
given percentile. For example, to find out what Listening Test score indicates that an examinee
got a higher Listening Test score than 50 percent of all the examinees, look at the column labeled
Part 2 (Listening). An examinee must score 79 on the Listening Test to be at the 50th percentile
(in terms of Listening Test scores).
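The lookup just described can also be expressed in code. The sketch below hardcodes a few final-score entries transcribed from Table 2.2; the function name and the choice to report the highest percentile met are illustrative, not part of the MELAB scoring system:

```python
# Final MELAB scores at selected percentiles, transcribed from Table 2.2
FINAL_SCORE_AT_PERCENTILE = {99: 96, 95: 91, 90: 89, 75: 83, 50: 77, 25: 69, 10: 62, 5: 57}

def percentile_of_final_score(score: int) -> int:
    """Highest tabled percentile whose corresponding score the examinee meets or exceeds."""
    met = [p for p, s in FINAL_SCORE_AT_PERCENTILE.items() if score >= s]
    return max(met, default=0)

print(percentile_of_final_score(77))  # 50: a final score of 77 sits at the 50th percentile
```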

[1] The statistics are very similar to those calculated for tests taken prior to 1991 (see Appendix E for data on 13,588
first-time MELABs administered between 1987 and 1990).
Table 2.2 MELAB Scaled Scores Corresponding to Specified Percentiles
(Based on 4,811 first-time MELABs administered 1991-1993)
Percentile   Part 1        Part 2      Part 3   Final
             Composition   Listening   GCVR     MELAB
99 95 98 99 96
95 90 93 95 91
90 87 91 93 89
85 85 89 91 87
80 83 88 88 85
75 80 86 86 83
70 80 85 85 82
65 77 84 83 81
60 77 82 81 80
55 75 81 79 78
50 75 79 77 77
45 75 78 75 76
40 73 76 72 74
35 73 75 70 72
30 70 73 67 71
25 70 70 65 69
20 67 67 62 67
15 67 64 58 64
10 65 60 53 62
5 63 54 46 57
0 53 33 25 38
2.2 FREQUENCY DISTRIBUTION OF MELAB SCORES FOR ALL EXAMINEES
Tables 2.3 through 2.6 give information about score distribution on the MELAB. First, they show
the number and the percent of examinees who obtained a particular score on MELAB (Part 1,
Part 2, Part 3, or on the Final MELAB). In the far right column of each table is the cumulative
percent that corresponds to each score. The cumulative percent for a given score point is the
percentage of examinees with that score or a lower score. Thus, if an examinee's score
corresponds to the cumulative percent of 70, that examinee scored as well as or better than 70
percent of all the examinees. These tables are based on MELABs (all forms) administered
between 1991 and 1993 (see Appendix E for similar information on MELABs administered
between 1987 and 1990).
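The cumulative percent column of each frequency table can be reproduced from the raw counts. A sketch with invented frequencies (not the Table 2.3 data):

```python
def cumulative_percents(freq_by_score):
    """Cumulative percent for a score: percent of examinees at or below that score."""
    total = sum(freq_by_score.values())
    running, out = 0, {}
    for score in sorted(freq_by_score):  # ascend so 'running' counts scores at or below
        running += freq_by_score[score]
        out[score] = round(100.0 * running / total, 1)
    return out

print(cumulative_percents({60: 5, 70: 10, 80: 5}))  # {60: 25.0, 70: 75.0, 80: 100.0}
```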
Table 2.3 Frequency Distribution of Final MELAB Scores
(Based on 4,811 first-time MELABs administered 1991-1993)
Final MELAB Score   N   Percent with the Score   Cumulative Percent
99 3 0.1 100.0
98 7 0.1 99.9
97 14 0.3 99.8
96 22 0.5 99.5
95 22 0.5 99.0
94 39 0.8 98.6
93 48 1.0 97.8
92 76 1.6 96.8
91 80 1.7 95.2
90 95 2.0 93.5
89 99 2.1 91.6
88 128 2.7 89.5
87 137 2.8 86.8
86 154 3.2 84.0
85 118 2.5 80.8
84 157 3.3 78.3
83 196 4.1 75.1
82 165 3.4 71.0
81 181 3.8 67.6
80 207 4.3 63.8
79 173 3.6 59.5
78 191 4.0 55.9
77 176 3.7 51.9
76 168 3.5 48.3
75 142 3.0 44.8
74 166 3.5 41.8
73 138 2.9 38.4
72 150 3.1 35.5
Final MELAB Score   N   Percent with the Score   Cumulative Percent
71 139 2.9 32.4
70 123 2.6 29.5
69 126 2.6 27.0
68 126 2.6 24.3
67 103 2.1 21.7
66 85 1.8 19.6
65 103 2.1 17.8
64 96 2.0 15.7
63 102 2.1 13.7
62 83 1.7 11.6
61 64 1.3 9.8
60 46 1.0 8.5
59 39 0.8 7.5
58 37 0.8 6.7
57 62 1.3 6.0
56 50 1.0 4.7
55 42 0.9 3.6
54 21 0.4 2.8
53 23 0.5 2.3
52 31 0.6 1.8
51 16 0.3 1.2
50 11 0.2 0.9
49 10 0.2 0.6
48 7 0.1 0.4
47 5 0.1 0.3
46 3 0.1 0.2
44 5 0.1 0.1
38 1 0.0 0.0
Table 2.4 Frequency Distribution of MELAB Part 1 (Composition) Scores
(Based on 4,811 first-time MELABs administered 1991-1993)
MELAB Part 1 Score   N   Percent with the Score   Cumulative Percent
97 18 0.4 100.0
95 46 1.0 99.6
93 101 2.1 98.7
90 135 2.8 96.6
87 191 4.0 93.8
85 254 5.3 89.8
83 334 6.9 84.5
80 393 8.2 77.6
77 586 12.2 69.4
75 600 12.5 57.2
73 692 14.4 44.8
70 482 10.0 30.4
67 468 9.7 20.3
65 207 4.3 10.6
63 199 4.1 6.3
60 59 1.2 2.2
57 24 0.5 1.0
55 14 0.3 0.5
53 8 0.2 0.2
Table 2.5 Frequency Distribution of MELAB Part 2 (Listening) Scaled Scores
(Based on 4,811 first-time MELABs administered 1991-1993)
MELAB Part 2 Score   N   Percent with the Score   Cumulative Percent
100 18 0.4 100.0
98 71 1.5 99.6
96 54 1.1 98.2
95 22 0.5 97.0
94 67 1.4 96.6
93 43 0.9 95.2
92 132 2.7 94.3
91 112 2.3 91.5
90 159 3.3 89.2
89 198 4.1 85.9
88 103 2.1 81.8
87 203 4.2 79.7
86 209 4.3 75.4
85 180 3.7 71.1
84 200 4.2 67.3
83 146 3.0 63.2
82 224 4.7 60.2
81 124 2.6 55.5
80 131 2.7 52.9
79 150 3.1 50.2
78 121 2.5 47.1
77 167 3.5 44.6
76 195 4.1 41.1
75 165 3.4 37.0
74 122 2.5 33.6
73 81 1.7 31.1
72 134 2.8 29.4
71 43 0.9 26.6
70 124 2.6 25.7
MELAB Part 2 Score   N   Percent with the Score   Cumulative Percent
69 78 1.6 23.1
68 61 1.3 21.5
67 100 2.1 20.2
66 64 1.3 18.2
65 61 1.3 16.8
64 51 1.1 15.6
63 52 1.1 14.5
62 51 1.1 13.4
61 60 1.2 12.4
60 53 1.1 11.1
59 63 1.3 10.0
58 51 1.1 8.7
57 22 0.5 7.6
56 70 1.5 7.2
55 22 0.5 5.7
54 48 1.0 5.3
53 28 0.6 4.3
52 37 0.8 3.7
50 29 0.6 2.9
49 15 0.3 2.3
47 19 0.4 2.0
46 20 0.4 1.6
45 4 0.1 1.2
43 9 0.2 1.1
42 4 0.1 0.9
40 19 0.4 0.9
37 10 0.2 0.5
35 8 0.2 0.2
33 4 0.1 0.1
Table 2.6 Frequency Distribution of MELAB Part 3 (GCVR) Scaled Scores
(Based on 4,811 first-time MELABs administered 1991-1993)
MELAB Part 3 Score   N   Percent with the Score   Cumulative Percent
100 18 0.4 100.0
99 36 0.7 99.6
98 42 0.9 98.9
97 59 1.2 98.0
96 69 1.4 96.8
95 83 1.7 95.3
94 91 1.9 93.6
93 92 1.9 91.7
92 106 2.2 89.8
91 129 2.7 87.6
90 66 1.4 84.9
89 106 2.2 83.6
88 115 2.4 81.4
87 169 3.5 79.0
86 124 2.6 75.5
85 141 2.9 72.9
84 177 3.7 69.9
83 109 2.3 66.3
82 119 2.5 64.0
81 122 2.5 61.5
80 134 2.8 59.0
79 137 2.8 56.2
78 129 2.7 53.4
77 102 2.1 50.7
76 106 2.2 48.6
75 115 2.4 46.4
74 81 1.7 44.0
73 88 1.8 42.3
72 88 1.8 40.4
71 107 2.2 38.6
70 105 2.2 36.4
69 82 1.7 34.2
68 89 1.8 32.5
67 139 2.9 30.7
66 105 2.2 27.8
MELAB Part 3 Score   N   Percent with the Score   Cumulative Percent
65 114 2.4 25.6
64 55 1.1 23.2
63 85 1.8 22.1
62 74 1.5 20.3
61 64 1.3 18.8
60 37 0.8 17.4
59 62 1.3 16.7
58 18 0.4 15.4
57 53 1.1 15.0
56 48 1.0 13.9
55 73 1.5 12.9
54 42 0.9 11.4
53 48 1.0 10.5
52 16 0.3 9.5
51 43 0.9 9.2
50 33 0.7 8.3
49 31 0.6 7.6
48 51 1.1 7.0
47 25 0.5 5.9
46 29 0.6 5.4
45 34 0.7 4.8
44 42 0.9 4.1
43 59 1.2 3.2
40 25 0.5 2.0
39 17 0.4 1.5
38 12 0.2 1.1
37 5 0.1 0.9
36 18 0.4 0.7
35 2 0.0 0.4
34 6 0.1 0.3
33 2 0.0 0.2
32 3 0.1 0.2
31 3 0.1 0.1
30 1 0.0 0.0
25 1 0.0 0.0
2.3 PERFORMANCE OF REFERENCE GROUPS ON THE MELAB
Tables 2.7 through 2.10 present descriptive information on the performance of various groups of
MELAB examinees. It is important to keep in mind that the statistics describe performance of
examinees who themselves elected to take a MELAB and that all group classification is based on
information supplied by the examinees. The group statistics cannot be assumed to be
representative of the general population, and the data in the tables should not be used to make
generalizations about differences in the English language proficiency of such groups in the
general population.
Table 2.7 MELAB Scaled Score Mean and Standard Deviation by Examinee
Reason for Testing
(based on information provided by first-time MELAB examinees from 1991 to 1993)
Reason for Testing               Number   Part 1   Part 1   Part 2   Part 2   Part 3   Part 3   Final   Final
                                          Mean     SD       Mean     SD       Mean     SD       Mean    SD
To enter a 2-year college        337      72.16    7.09     75.19    12.07    68.53    14.81    71.96   10.09
To enter a 4-year college        1981     74.36    7.59     77.90    11.75    72.28    14.73    74.85   10.26
To enter a university for
  graduate work                  1260     76.55    8.00     78.41    11.59    78.30    13.00    77.76   9.60
For professional certification   177      77.12    7.85     77.80    11.57    81.54    14.14    78.82   9.94
Other                            721      77.25    8.25     76.84    13.86    77.73    15.78    77.29   11.48
Table 2.8 MELAB Scaled Score Mean and Standard Deviation by Sex
(based on information provided by first-time MELAB examinees from 1991 to 1993)
Sex      Number   Part 1   Part 1   Part 2   Part 2   Part 3   Part 3   Final   Final
                  Mean     SD       Mean     SD       Mean     SD       Mean    SD
Male 2719 75.12 7.97 76.72 12.24 74.22 15.09 75.35 10.50
Female 2089 75.82 7.79 78.28 11.94 75.31 14.57 76.47 10.24
Table 2.9 MELAB Scaled Score Mean and Standard Deviation by Age
(based on information provided by first-time MELAB examinees from 1991 to 1993)
Age Range   Number   Part 1   Part 1   Part 2   Part 2   Part 3   Part 3   Final   Final
                     Mean     SD       Mean     SD       Mean     SD       Mean    SD
<21 1397 74.76 7.47 79.21 11.14 72.37 14.78 75.45 10.10
21-25 1288 74.47 7.97 76.93 12.70 72.84 15.22 74.75 10.86
26-30 1035 76.35 7.61 78.35 11.64 78.35 13.88 77.69 9.79
31-35 570 76.88 8.11 77.14 11.52 77.86 13.72 77.29 9.77
>35 521 76.10 8.71 72.09 13.23 74.77 15.38 74.33 11.18
Table 2.10 Mean and Standard Deviation by Native Language [1] for Parts 1, 2, 3
and Final MELAB Scores
(based on 13,588 first-time MELABs administered 1987-1990)

                           PART 1          PART 2          PART 3          FINAL
LANGUAGE [2]     Cases     Mean    SD      Mean    SD      Mean    SD      Mean    SD
Ibo 47 80.85 7.98 72.77 8.33 84.51 8.62 79.38 7.24
Yoruba 33 83.33 7.17 78.00 8.60 86.64 10.53 82.76 7.46
Amharic 106 72.64 8.08 71.40 9.02 67.92 15.00 70.60 9.44
Arabic 1914 69.96 6.80 72.31 10.97 63.10 14.97 68.48 9.58
Hebrew 122 75.82 6.52 86.19 9.22 78.60 12.02 80.17 7.88
Somali 67 72.97 8.67 70.08 11.64 68.10 17.97 70.39 11.48
Tigre 36 74.28 8.20 73.67 11.00 70.56 15.93 72.81 10.35
Malayalam 40 80.85 7.77 78.70 9.05 82.88 11.65 80.85 8.36
Tamil 99 80.35 8.77 81.76 9.16 83.78 12.26 81.93 9.19
Telegu 68 79.29 7.07 77.15 9.66 80.93 11.79 79.13 8.61
Chinese 2710 73.32 5.87 75.37 9.97 76.08 12.13 74.91 8.03
Cambodian 46 71.22 5.01 71.48 8.58 69.46 8.30 70.78 5.81
Indonesian 277 72.12 5.44 73.00 10.57 69.00 13.06 71.43 8.16
Malay 76 77.20 6.30 80.12 7.65 79.90 11.48 79.07 7.26
Tagalog 205 82.21 6.77 82.18 7.52 87.00 9.32 83.83 6.87
Vietnamese 482 70.83 5.84 70.03 11.25 66.70 15.76 69.19 9.67
Burmese 26 77.42 6.98 76.92 9.39 81.00 13.33 78.42 8.71
Hmong 87 73.13 6.53 73.59 8.86 70.77 12.26 72.53 8.14
Lao 72 70.75 5.86 76.11 7.26 67.32 15.43 71.40 8.54
Thai 192 70.33 5.21 70.14 10.04 68.08 11.30 69.50 7.14
Japanese 1208 68.09 6.72 68.10 12.18 61.46 14.28 65.83 9.76
Korean 483 70.83 5.79 75.52 9.28 73.99 13.13 73.45 7.89
Turkish 238 70.77 9.36 71.15 14.43 64.12 16.97 68.65 12.81
Finnish 48 82.48 6.82 86.27 7.31 85.40 9.29 84.73 6.87
Hungarian 96 77.50 5.85 78.85 10.68 77.43 12.52 77.90 8.11
Bengali 94 78.14 8.00 72.88 10.92 76.67 13.94 75.92 9.00
Farsi 719 72.59 6.50 76.62 11.36 70.95 13.97 73.38 9.37
Gujarati 96 78.17 9.22 78.44 9.82 78.16 15.45 78.20 10.79
Hindi 114 82.22 8.27 80.44 11.99 85.29 12.20 82.61 9.51
Punjabi 93 79.10 8.07 75.57 12.67 79.60 12.80 78.10 10.20
Urdu 164 78.38 8.05 77.62 10.50 76.78 14.54 77.60 10.09
Greek 161 77.38 6.87 80.73 9.81 76.44 13.61 78.17 9.03
Polish 232 79.17 6.24 81.00 10.53 78.52 12.25 79.55 8.48
Russian 109 76.10 7.09 76.51 12.47 72.31 16.08 75.02 10.73
SerboCroatian 76 76.30 6.25 82.09 9.18 76.71 14.06 78.36 9.00
French 318 77.94 7.29 77.84 12.64 77.44 15.07 77.74 10.53
Italian 59 79.83 6.03 81.73 9.04 81.78 11.97 81.07 8.16
Portuguese 297 76.67 6.20 78.02 11.79 76.23 12.86 76.92 9.07
Romanian 83 80.95 7.50 81.84 10.13 82.81 10.86 81.91 8.48
Spanish 1028 74.44 7.15 76.29 11.06 73.52 14.26 74.75 9.72
Danish 25 82.56 5.23 89.40 6.06 87.64 6.77 86.52 5.36
Dutch 67 82.63 7.11 90.08 5.54 90.05 6.51 87.60 5.28
German 290 82.84 6.66 87.38 7.03 84.34 11.36 84.87 7.36
Norwegian 35 79.89 7.32 86.74 8.91 84.29 10.11 83.69 7.81
Swedish 70 80.06 6.54 87.46 7.49 83.46 11.04 83.63 7.56
[1] Only those languages that were represented at least 25 times are included in the table.
[2] Languages are grouped by family and sub-family.
2.4 INTERCORRELATIONS AMONG MELAB SCORES AND MELAB ORAL RATING
Table 2.11 shows the intercorrelations of scaled MELAB part scores, Final MELAB scores, and
scores on the optional MELAB Speaking Test. The correlation coefficients are measures of the
extent of the relationships among the various subtests and tests. The coefficients in Table 2.11
are based on first time MELABs administered between 1991 and 1993. For correlations involving
the oral rating, the number of cases is 1,076, a subset of the 4,811 used to calculate the other
correlation coefficients. The correlation coefficients based on 13,588 MELABs administered
between 1987 and 1990 are virtually identical to those shown here (see Appendix E).
In general, the correlation coefficients are moderate to moderately high. This implies, first of all,
that there is overlap in what the various tests assess (which would be expected given the
commonly accepted premise that people who are highly skilled in one area of language
proficiency also tend to be skilled in other areas of language proficiency). Secondly, it suggests
that although there is this overlap, the various tests do provide some unique information about
examinees' skills.
It should be noted that correlations of MELAB Part 1, Part 2, and Part 3 with Final MELAB are
spuriously inflated because the Final MELAB Score is an average of the scores on those three
subtests. To correct for this, the correlation between each part score and the sum of the other
two part scores was calculated and appears in the table below the artificially high coefficients. No
correction was necessary in the case of the Speaking Test as the oral rating is not averaged into
the Final MELAB score.
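The correction described above can be sketched numerically. The part scores below are invented for illustration; the point is only that correlating a part with the composite that contains it yields a higher coefficient than correlating it with the sum of the other two parts:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented part scores for five examinees (illustrative only, not MELAB data)
part1 = [75, 80, 70, 85, 65]
part2 = [78, 82, 68, 88, 60]
part3 = [74, 79, 72, 90, 58]
final = [(a + b + c) / 3 for a, b, c in zip(part1, part2, part3)]

inflated = pearson_r(part1, final)                                   # Part 1 is a component of Final
corrected = pearson_r(part1, [b + c for b, c in zip(part2, part3)])  # Part 1 vs. sum of the others
print(inflated > corrected)  # True
```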
Table 2.11 Intercorrelations of Scaled MELAB Part Scores and Final Scores (n=4,811) and
MELAB Oral Rating (n=1,076), 1991-1993

                              MELAB Part 1   MELAB Part 2   MELAB Part 3   Final MELAB
MELAB Part 1 (Composition)
MELAB Part 2 (Listening)      .60
MELAB Part 3 (GCVR)           .74            .70
FINAL MELAB                   .84            .88            .94
  (corrected: part with sum
  of other two part scores)   .73            .71            .80
MELAB ORAL RATING             .54            .54            .54            .60
SECTION 3: MELAB RELIABILITY AND VALIDITY
3.1 RELIABILITY
Reliability coefficients are estimates of the consistency of scores generated by a test. There are
a number of types of reliability and various techniques for measuring each. In
this section (and in Appendix F) are reports of a number of reliability studies of the MELAB. For
reference purposes, Table 3.1 below summarizes the findings of these studies. The footnotes
indicate where to find reports of the studies that generated the figures in the table.
Table 3.1 Summary of MELAB Reliability Estimates

Type of Reliability    Part 1           Part 2           Part 3           Final
Coefficient            (Composition)    (Listening)      (GCVR)           MELAB
Internal Consistency   not applicable   .87 to .90 [1]   .93 to .95 [1]   .91 [2]
Alternate Forms        .89 [3]          .83 to .87 [4]   .94 [4]          not applicable
Test/Retest            not applicable   .82 [5]          .92 [5]          .91 [5]
Interrater             .90 [6]          not applicable   not applicable   not applicable
Intrarater             .87 to .92 [7]   not applicable   not applicable   not applicable

[1] See 3.1.4
[2] See 2.1, 3.1.2
[3] See 3.1.1.4
[4] See 3.1.3.1
[5] See 3.1.2
[6] See 3.1.1.2
[7] See 3.1.1.3
3.1.1 Reliability of MELAB Part 1 (Composition)
Studies have been done to assess several types of reliability of the MELAB Composition test:
interrater reliability, intrarater reliability, and alternate form reliability. However, before reporting
the results of those studies, it is useful to consider first who rates MELAB compositions and how
they are trained, as these are important factors in the reliability of MELAB Part 1.
3.1.1.1 MELAB Composition Raters: Who they are; how they are trained
MELAB compositions are scored by a small group of trained, experienced raters at the ELI-UM
who rate compositions daily. The raters are specialists in ESL and have regular appointments in
the Testing Division. New raters begin their training by studying a manual containing descriptors
of the ten MELAB composition scores and several exemplars at each of the ten levels and by
discussing these exemplars and general MELAB scoring criteria with experienced raters. They
then rate numerous sets of compositions, comparing their scores with the scores that
experienced raters gave those compositions and discussing any score discrepancies with
experienced raters. Scores assigned by raters-in-training are never used as official MELAB
scores. Only after a new rater demonstrates an acceptable interrater reliability with experienced
raters and an acceptable intrarater reliability (a measure of how consistent a rater is in assigning
scores) are scores assigned by a new rater used in determining official MELAB scores.
3.1.1.2 Interrater Reliability
Interrater reliability is a measure of the degree to which different raters agree on the scores they
assign to individual compositions. Interrater reliability for the MELAB composition test is
monitored on a regular basis. Interrater reliability figures (Pearson product-moment correlations
adjusted using the Spearman-Brown Prophecy Formula) are typically around .90.
Table 3.2 below reports interrater reliability coefficients for MELAB compositions administered in
two recent year-long periods. For each of the two time periods shown in Table 3.2, where any
two raters both read the same composition, their scores were correlated (Pearson product-
moment correlation coefficient). There were nine different raters between 1990 and 1993 (two of
whom read only a small number of compositions in the 92/93 year). Not all possible pairings of
raters occurred; the number of pairs of raters whose interrater r's were used to calculate the mean r
and mean adjusted r for each set of compositions is shown in the table. The means were
calculated by transforming the r's of the pairs of raters to Fisher z's, finding the mean of the
Fisher z's, and then transforming that mean value back to an r value.
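The averaging procedure just described can be written compactly: arctanh is the Fisher z transformation and tanh is its inverse. The r values here are invented for illustration:

```python
import math

def mean_correlation(rs):
    """Average correlations via Fisher z: z = arctanh(r); mean the z's; back-transform."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

print(round(mean_correlation([0.78, 0.82, 0.86]), 2))  # 0.82
```

Averaging in the z metric rather than averaging the r's directly avoids the slight downward bias that the bounded r scale introduces.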
Table 3.2 MELAB Part 1 Interrater Reliability

Dates           N of    Mean of   SD of     N of       Range            Mean   Median
                Comps   Comp.     Comp.     Pairs of   r [1]            r      r
                        Scores    Scores    Raters
12/90 - 4/92    4020    75.56     6.99      31         .52 [4] to .92   .82    .83
12/92 - 11/93   2304    74.19     7.90      23         .68 to .90       .82    .83

Dates           N of    Range            Mean         Median       SEM [3]
                Comps   Adjusted r [2]   Adjusted r   Adjusted r
12/90 - 4/92    4020    .68 [4] to .96   .90          .90          2.21
12/92 - 11/93   2304    .81 to .94       .90          .90          2.50

[1] r is the Pearson product-moment correlation coefficient.
[2] Adjusted r is an estimate of the reliability of the final composition scores (MELAB Part 1 scores) based on two
raters per composition. It is obtained using the Spearman-Brown prophecy formula.
[3] SEM is the standard error of measurement of the composition scores.
[4] The minimum value in the range is very atypical. The next lowest r was .69, the second lowest adjusted r .82.
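As noted in Table 3.2, the adjusted r is obtained with the Spearman-Brown prophecy formula; for two raters it is 2r / (1 + r). Applied to the mean single-pair r of .82, it reproduces the tabled adjusted value of about .90:

```python
def spearman_brown(r, k=2):
    """Spearman-Brown prophecy: reliability of a measure lengthened k-fold (k=2: two raters)."""
    return k * r / (1 + (k - 1) * r)

print(round(spearman_brown(0.82), 2))  # 0.9 -- the mean r of .82 adjusts to the tabled .90
```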
The interrater reliability figures shown in Table 3.2 are consistent with those found in earlier,
smaller-scale studies on MELAB compositions. Spaan (1993) [1] found interrater correlations (r's)
ranging from .85 to .88 (adjusted r's from .92 to .94). Haugen (1986) [2] found interrater
correlations (r's) from .83 to .89 (adjusted r's from .91 to .94). They are also consistent with
interrater correlation coefficients calculated from scores on 3,512 compositions written for the
Michigan Test of English Language Proficiency Battery, the precursor to the MELAB. When the
MELAB replaced the MTELP Battery, no significant changes were made to either the composition
or to the scoring system. Homburg (1984) [3] reports r's ranging from .72 to .93 with a
median of .88 (adjusted r's from .84 to .96 with a median of .94).
The r values in Table 3.2 indicate how much consistency there is in the relative order in which two
raters rank a group of compositions. It is also important to determine how well pairs of raters
agree on the actual score assigned to a composition because it is possible that two raters could
agree on the ordering of a set of papers but that one could, for example, consistently rate

compositions ten points higher than the other. As a measure of this kind of agreement, for each
pair of raters shown in Table 3.2, the mean difference in scores they assigned to compositions
they read in common was calculated. For example, the score that Rater A assigned to
Composition X was subtracted from the score that Rater B assigned to Composition X. This was
done for all compositions read by both Rater A and Rater B, and subsequently a "mean
difference" for the pair of raters was calculated for the set of compositions rated by both of them.
This process was repeated for all rater pairs included in Table 3.2. Information about these mean
differences appears in Table 3.3 below. The smaller the mean difference, the less the likelihood
of systematic rater bias.

[1] Spaan, M. (1993). The effect of prompt in essay examinations. In D. Douglas & C. Chapelle (Eds.), A new decade
of language testing research: Selected papers from the 1990 Language Testing Research Colloquium (pp. 98-122).
Alexandria, VA: TESOL.
[2] Haugen, J. (1986). ELI-UM Testing Division internal document.
[3] Homburg, T. J. (1984). Holistic evaluation of ESL compositions: Can it be validated objectively? TESOL Quarterly,
18(1), 87-107.
Table 3.3 Rater Score Differences for MELAB Part 1

Test Dates      N of Pairs   Mean of the    Median of the   Range of the
                of Raters    Mean Score     Mean Score      Mean Score
                             Differences    Differences     Differences
12/90 - 4/92    31           1.00           .79             .00 to 2.87
12/92 - 11/93   23           1.29           1.05            .00 to 4.58
As can be seen from Table 3.3, the mean of all the mean score differences in both data sets was
approximately 1 score point, a difference that is not of practical significance.
The largest mean score difference observed was 4.58 points (on the 45-point scale running from 53 to 97). For
MELAB compositions, whenever two raters assign scores that differ by 4 or 6 points, their scores
are averaged. The difference in an examinee's Part 1 score is then 2 or 3 points from what it
would have been if both raters had assigned the identical score. This translates to a difference in
Final MELAB score of a maximum of only 1 point. If two raters' scores differ by more than 6
points, a third rater is used. Spaan (1993) found that 7% of 176 compositions read as part of a
study on the effect of prompt on composition scores were adjudicated by a third rater. In the
most recent year for which statistics are available, 1993, approximately 8% of 2,311 MELAB
compositions were read by three raters.
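The two-rater rule described in this paragraph can be sketched as follows. This is an illustrative reading of the rule, not the official scoring procedure; the function name and the handling of differences smaller than 4 points (averaging, which leaves equal scores unchanged) are assumptions:

```python
def combine_part1_ratings(rater_a, rater_b):
    """Average two raters' composition scores when they differ by 6 points or fewer;
    return None to signal that a third rater is needed (difference greater than 6)."""
    if abs(rater_a - rater_b) > 6:
        return None                    # adjudication: send the composition to a third rater
    return (rater_a + rater_b) / 2

print(combine_part1_ratings(77, 83))   # 80.0 -- a 6-point difference is averaged
print(combine_part1_ratings(67, 77))   # None -- more than 6 apart, third rater needed
```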
3.1.1.3 Intrarater Reliability
Intrarater reliability is a measure of the consistency of a single rater's scoring. For a reliability
study conducted in 1986, three raters each scored the same 50 MELAB compositions (a stratified
random sample from 1984/1985 MELABs) and then, one month later, rescored them
(independently and with the compositions in scrambled order). The three raters had intrarater
correlations of .87, .89, and .92 (Haugen, 1986). Homburg (1984) reports that intrarater reliability
coefficients computed from scores on 3,512 compositions written in 1979/80 for the Michigan Test
of English Language Proficiency Battery ranged from a low of .874 to a high of .936 (note: Part 1
of the MELAB and its scoring system are essentially identical to the composition and scoring
system of the MTELP Battery).
3.1.1.4 Alternate Form Reliability
An important test reliability question is how likely it is that an examinee would receive the same
score on two alternate forms of the test (alternate or parallel form reliability).
Spaan's study (1993) on the effect of prompt topic on composition score provides information
related to alternate form reliability. In Part 1 of the MELAB, examinees must choose to write
about one of the two composition topics offered them. Typically, in Part 1, one topic choice is a
"narrative/personal" (NP) prompt and the other is an "argumentative/impersonal" (AI) prompt. For
Spaan's study, 88 subjects each wrote two MELAB compositions during the same test
administration, one in response to an NP prompt and the other in response to an AI prompt. The
compositions were scored by MELAB composition raters using the standard composition scoring
method. The correlation between the subjects' scores on their two compositions (a measure of
alternate form reliability) was .89. The subjects' mean scores on their two different compositions
differed by only .886 (with a 95% confidence interval of [.0696, 1.703]), suggesting that prompt
topic (and by analogy, test form) does not significantly affect an examinee's Part 1 score.
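The mean-difference figure above comes from a paired analysis of each subject's two composition scores. Below is a minimal sketch of that kind of analysis, using invented scores (Spaan's data are not reproduced here) and a normal approximation to the exact t critical value, which is close for large n.

```python
# Paired-difference mean and approximate 95% confidence interval,
# computed with the Python standard library only.
from statistics import NormalDist, mean, stdev

def paired_mean_ci(scores_a, scores_b, level=0.95):
    """Mean of paired differences and an approximate confidence interval
    (normal quantile used in place of the exact t value)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    m = mean(diffs)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(diffs) / len(diffs) ** 0.5
    return m, (m - half, m + half)

# Hypothetical Part 1 scores for eight subjects on the two prompt types:
np_scores = [77, 73, 83, 67, 77, 73, 87, 63]
ai_scores = [73, 73, 77, 67, 77, 67, 83, 63]
m, (lo, hi) = paired_mean_ci(np_scores, ai_scores)
print(round(m, 2))  # -> 2.5
```

An interval that excludes zero, as in Spaan's reported [.0696, 1.703], indicates a statistically detectable but small prompt effect.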
3.1.2 Test/Retest Reliability
To investigate the test/retest reliability of the MELAB, a small-scale test/retest study (n=63) was
conducted by the ELI Testing Division in August, 1991. The subjects were examinees with 19
different native languages who had registered for regularly scheduled Ann Arbor MELAB
administrations. After they had taken the test once under normal conditions, they were invited to
return one to two weeks later to retake the entire exam free of charge. They were told that the
better of their two sets of scores would be retained as their official score. Their scores from their
first test were not revealed to them until after they had been retested.
Each subject was given the same forms of MELAB Part 2 (Listening) and Part 3 (GCVR) at Time
1 and Time 2; however, they were given different forms of Part 1 (Composition) as it was felt that
the practice effect would play too strong a role in their Time 2 performance if they wrote again on
exactly the same topic.
The means for these administrations, shown in Table 3.4 below, are similar to the MELAB
population means (see Table 2.1). However, it should be noted that the standard deviations of
the scores in the test/retest sample are consistently smaller than those of the MELAB population
at large. The restriction in range, which perhaps can be attributed to the fact that very strong and
very weak examinees saw no benefit in being retested, may lower intercorrelations of Time
1/Time 2 data.
The Final MELAB score test/retest reliability coefficient, the correlation of scores at Time 1 with
those at Time 2, is .91. A coefficient of .91 indicates a high degree of consistency in the way
examinees were ranked by the MELAB each time they took it. The subjects' final scores
increased, on average, by 2.25 points. This increase is not unexpected since the forms of Parts 2
and 3 were identical at test Time 1 and test Time 2.
Table 3.4 MELAB Test/Retest Results

                          Part 1          Part 2       Part 3     Final
                          (Composition)   (Listening)  (GCVR)     MELAB
                                          Scaled       Scaled
Time 1 Mean               75.14           74.63        70.67      73.41
Time 1 SD                  5.22           11.42        12.73       8.42
Time 2 Mean               73.92           79.54        73.35      75.67
Time 2 SD                  5.00           10.39        11.61       7.81
Mean Difference from
  Time 1 to Time 2        -1.22 [1]       +4.90*       +2.68*     +2.25*
Correlation of
  Time 1 with Time 2        .54* [2]        .82*         .92*       .91*

*significant, p < .001
[1] Composition topics were different at Time 1 and Time 2
[2] Correlation coefficient is likely depressed due to a restriction in range of the scores at both Time 1 and Time 2
The correlation coefficients of the two sets of Part 2 scores and of the two sets of Part 3 scores
are also high. The correlation between the two sets of composition scores, on the other hand, is
only moderate. It is likely that this correlation coefficient is depressed because of the restricted
range of composition scores in both Time 1 scores and Time 2 scores. The MELAB Part 1 scale
ranges from 53 to 97. In this sample, however, the Part 1 scores at Time 1 ranged only from 65
to 87 and at Time 2 only from 63 to 87 (only about 60% of the total scale). The standard
deviations of the two sets of compositions (5.22 and 5.00) are smaller than the population
standard deviation (7.90, see section 2.1 General Descriptive Statistics).
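The attenuating effect of a restricted range can be illustrated with the standard (Thorndike Case 2) correction formula. The manual does not report a corrected coefficient; the sketch below simply shows how a sample standard deviation of 5.22, against a population value of 7.90, would depress a correlation near .54.

```python
# Thorndike Case 2 correction for restriction of range: estimate the
# correlation that would be observed in the full population from the
# correlation r seen in a range-restricted sample.

def correct_for_restriction(r, sd_restricted, sd_population):
    """Corrected correlation given the restricted and population SDs."""
    k = sd_population / sd_restricted
    return r * k / (1 - r**2 + (r * k) ** 2) ** 0.5

r_corrected = correct_for_restriction(0.54, 5.22, 7.90)
print(round(r_corrected, 2))  # -> 0.7
```

Under these assumptions the observed .54 would correspond to roughly .70 in an unrestricted sample, consistent with the manual's caution that the composition coefficient is depressed.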
In order to get a better understanding of how the composition scores at Time 1 are related to the
composition scores at Time 2, it is useful to look at the frequency and pattern of score changes.
The scores of 81% of the examinees in this study remained quite stable. That is, their scores at
Time 2 were the same or only one scale point (4 or 6 points) higher or lower than their scores at
Time 1 (a difference of 1 to 2 points on the final MELAB score scale). The small, non-significant
mean difference between Time 1 and Time 2 indicates that there was not a strong directionality in
the score changes; scores increased for some examinees and decreased for others.
There is no significant difference in the Time 1 mean score and the Time 2 mean score of Part 1
(Composition). There are significant differences between the Time 1 and Time 2 mean scores of
Part 2 (Listening) and of Part 3 (GCVR). These changes may reflect, to some degree, a limitation
in the test/retest method--that the role of memory in test performance at Time 2 cannot be
eliminated because examinees took the same tests twice.
The way in which subsection scores within Part 3 changed from Time 1 to Time 2 does suggest
that memory had an effect on the score changes. Within Part 3, there were no significant
differences between Time 1 and Time 2 means on the 30 grammar items, the 20 cloze items, or
the 30 vocabulary items. There was, however, a small but significant difference (+1.25 raw score
points) in the means of the scores on the 20 reading items. Since the significant change in Part 3
scores resulted mainly from changes in performance on reading items, items based upon
coherent, extended discourse, one might infer that memory played a role in Time 2 test
performance.
3.1.3 Alternate Forms Reliability (for MELAB Part 2 and MELAB Part 3)
Different forms of MELAB Part 2 (Listening) and MELAB Part 3 (GCVR) are designed to be
"equated test forms." That is, although different forms of MELAB Part 2 might yield different raw
scores for the same person, the scaled scores on the different forms are related by equivalency
tables so that it is possible to say that a score of X on one form is equivalent to a score of Y on
another. The same holds true for the various forms of MELAB Part 3.
3.1.3.1 Developing Alternate Forms
There are several steps in the development of equated, or alternate, forms of the MELAB. The
first step is the writing and editing of items according to established content and format
specifications. Items that appear strong after pre-testing on 75-90 subjects are then
used in ELI-UM's annual overseas proficiency certificate test, which is taken by about 5,000
people. On the basis of an item analysis of 600 of these test papers, items with the best
discrimination indexes (item-test correlations of .30 or higher; proportion of correct responses
increasing progressively in five groups, from the lowest-scoring group to the highest-scoring
group) and acceptable item difficulty (.30-.85) are identified and, along with the best items from
other years' certificate tests, used as an item bank to create new MELAB forms.
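The item statistics described above, proportion correct (difficulty) and an item-total correlation, can be computed from a 0/1 response matrix as in this sketch. The selection thresholds (.30-.85 difficulty, correlations of .30 or higher) are those given in the text; the response data in the usage example are invented.

```python
# Item difficulty and item-total point-biserial correlation for a
# dichotomously scored item, using the Python standard library only.
from statistics import mean, pstdev

def item_difficulty(item_responses):
    """Proportion of examinees answering the item correctly."""
    return mean(item_responses)

def point_biserial(item_responses, total_scores):
    """Pearson correlation between a 0/1 item and the total score."""
    mi, mt = mean(item_responses), mean(total_scores)
    cov = mean((i - mi) * (t - mt)
               for i, t in zip(item_responses, total_scores))
    return cov / (pstdev(item_responses) * pstdev(total_scores))

# Invented responses for one item across six examinees, with totals:
resp = [1, 1, 1, 0, 0, 0]
totals = [45, 40, 35, 25, 20, 15]
print(item_difficulty(resp), round(point_biserial(resp, totals), 2))  # -> 0.5 0.93
```

An item like this one (difficulty .50, item-total correlation well above .30) would pass the screening criteria named above.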
The items that compose a new form of MELAB Part 2 or MELAB Part 3 are selected to be similar
to items composing earlier forms with respect to item statistics, item type, and item content. The
facility indexes for the parallel tests (and for the groups of item-types within them) must have
similar descriptive statistics: range, mean, standard deviation. The quantity of each item type
(for example, short discourse listening items or lecture comprehension items) must be the same
in all forms. Additionally, efforts are made to make item content consistent across the forms (for
example, a consistent number of grammar items testing embedding or of vocabulary items testing
idioms). Tables 3.13 - 3.16 in Section 3.2.1.3 provide content and statistical information about
four forms of MELAB Part 3 and give more detail about how the forms are constructed to
resemble each other.
After a new MELAB Part 2 or Part 3 form is assembled, the new form, along with an established
form, is administered in the U.S. and Canada to 90 - 200 examinees of varied linguistic
backgrounds and proficiency levels. Examinees' performance on the two tests is used to develop
score equivalency tables that convert raw scores to scaled scores. The test equating techniques
used are a combination of 1) matching means, standard deviations, and percentiles; and 2)
estimating equivalencies through a regression formula that considers the tests' means, standard
deviations, and correlation. Table 3.5 below shows the correlations found between the raw
scores on new forms and the established forms during these equating administrations.
Table 3.5 Correlations Among Alternate Forms of MELAB
New Form(s) Established Forms Correlation
MELAB Part 2 (Listening): DD, EE MELAB Part 2 (Listening): BB, CC .83 to .87
MELAB Part 3 (GCVR): DD MELAB Part 3 (GCVR): AA, BB, CC .94
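One ingredient of the equating techniques described above, matching means and standard deviations, amounts to linear (mean-sigma) equating. The sketch below shows the general method only; it is not the exact MELAB procedure, which also incorporates percentile matching and regression estimates, and the numbers in the usage example are invented.

```python
# Linear (mean-sigma) equating: a raw score on the new form is mapped
# to the score on the established form that occupies the same
# standardized position (same z-score).

def linear_equate(x, mean_new, sd_new, mean_est, sd_est):
    """Map raw score x on the new form onto the established form's scale."""
    z = (x - mean_new) / sd_new
    return mean_est + z * sd_est

# A new-form score one SD above its mean maps to one SD above the
# established form's mean:
print(linear_equate(40, mean_new=30, sd_new=10, mean_est=32, sd_est=9))  # -> 41.0
```

Applied over the whole raw-score range, this mapping yields the kind of score equivalency table described above.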
3.1.3.2 Distribution of Scores on Alternate Forms of MELAB Part 2 (Listening) and
MELAB Part 3 (GCVR)
There are currently four alternate forms of MELAB Part 2 (Listening) and four alternate forms of
MELAB Part 3 (GCVR) that have been used on large numbers of examinees. Tables 3.6 and 3.7
below summarize the distribution of scores, by form, for 4811 "first-time" MELABs taken between
1991 and 1993. For each form, the tables show the number of candidates, the mean scaled
score, the standard deviation of the scores, and the scaled score on each form that corresponds
to a given percentile. The box plots in Figures 3.1 and 3.2 present similar information graphically.
It should be noted that it is not known whether the groups that took each form have equal
language proficiency.
Table 3.6 Percentile Rankings of Scaled Scores of Alternate Forms
of MELAB Part 2 (Listening) [1]

              Form BB       Form CC       Form DD       Form EE
              N=1093        N=1386        N=1151        N=1181
              Mean=77.88    Mean=76.83    Mean=76.84    Mean=78.17
              SD=10.74      SD=11.77      SD=12.64      SD=13.17
Percentile
100 (max.)    100           100            98           100
 99            98            98            98            98
 95            92            94            92            94
 90            90            92            90            91
 75            85            85            86            89
 50            80            77            81            82
 25            71            70            69            72
 10            61            60            59            58
  5            57            54            53            52
  1            49            42            43            40
  0 (min.)     35            35            33            33

[1] Based on 4811 "first time" MELABs administered from 1991 to 1993
Table 3.7 Percentile Rankings of Scaled Scores of Alternate Forms of
MELAB Part 3 (GCVR) [1]

              Form AA       Form BB       Form CC       Form DD
              N=1222        N=1337        N=1165        N=1087
              Mean=74.77    Mean=75.18    Mean=75.02    Mean=73.66
              SD=13.72      SD=14.53      SD=15.10      SD=16.20
Percentile
100 (max.)    100           100           100           100
 99            99            99            99            97
 95            96            95            96            95
 90            93            92            93            93
 75            85            87            87            87
 50            76            78            78            77
 25            66            65            67            62
 10            55            55            50            49
  5            48            46            44            45
  1            40            39            36            39
  0 (min.)     31            25            30            31

[1] Based on 4811 "first time" MELABs administered from 1991 to 1993
Figure 3.1 Box Plots of Scaled Scores of Alternate Forms of MELAB Part 2 (Listening) [1]

[Box plots of Part 2 scaled scores (variable PT2EQ) for Forms BB, CC, DD, and EE, plotted on a scale of 30 to 100; the plots are not reproducible in this text version.]

[1] The data used to construct the box plots is identical to the data used in Table 3.6 above (that is, on 4811 "first time" MELABs administered from 1991 to 1993)
Figure 3.2 Box Plots of Scaled Scores of Alternate Forms of MELAB Part 3 (GCVR) [1]

[Box plots of Part 3 scaled scores (variable PT3EQ) for Forms AA, BB, CC, and DD, plotted on a scale of 20 to 100; the plots are not reproducible in this text version.]

[1] The data used to construct the box plots is identical to the data used in Table 3.7 above (that is, on 4811 "first time" MELABs administered from 1991 to 1993)
3.1.4 Internal Consistency Reliability (KR21 and Cronbach's Alpha)
Tables 3.8 and 3.9 report the Kuder Richardson 21 reliability value and the standard error of
measurement of different forms of MELAB Part 2 and Part 3. The figures are based on raw
scores of "first-time" MELABs administered between 1991 and 1993. The KR21 reliabilities range
from .87 to .90 for Part 2 (Listening) and from .94 to .95 for Part 3 (GCVR). Statistics on
MELABs administered prior to 1991 are similar (see Appendix F for statistics on MELABs
administered between 1987 and 1990).
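KR21 depends only on the number of items, the mean, and the standard deviation, so the reliability estimates below can be reproduced directly from the tabled figures, as in this sketch. (The SEM figures in the tables appear to be computed from the rounded reliability, so the last digit can differ slightly.)

```python
# Kuder-Richardson formula 21 and the standard error of measurement,
# computed from summary statistics alone.

def kr21(k, mean, sd):
    """KR21 internal-consistency estimate for a k-item test."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))

def sem(sd, reliability):
    """Standard error of measurement, in the same units as sd."""
    return sd * (1 - reliability) ** 0.5

# Form BB of Part 2 (50 items, mean 30.33, SD 8.97):
r = kr21(50, 30.33, 8.97)
print(round(r, 2))  # -> 0.87
print(round(sem(8.97, r), 2))
```

The same two functions reproduce the Part 3 values (e.g., 100 items, mean 59.78, SD 18.81 gives KR21 of .94 for Form AA).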
Table 3.8 Kuder-Richardson Formula 21 Internal Consistency Reliability Estimates
for MELAB Part 2 (Listening) [1]

Form    N      Mean    SD     KR 21   SEM [2]
BB      1093   30.33   8.97   .87     3.23
CC      1386   32.12   9.40   .89     3.12
DD      1151   29.81   9.73   .89     3.23
EE      1181   31.21   9.94   .90     3.14

[1] Based on the raw scores of "first time" MELABs administered between 1991 and 1993. It should be noted that there is no Part 2 Form AA of MELAB.
[2] Standard error of measurement in raw score points; SEM in scaled score points would be slightly larger.
Table 3.9 Kuder-Richardson Formula 21 Internal Consistency Reliability Estimates
for MELAB Part 3 (GCVR) [1]

Form    N      Mean    SD      KR 21   SEM [2]
AA      1222   59.78   18.81   .94     4.61
BB      1337   57.04   18.27   .94     4.48
CC      1165   56.69   20.32   .95     4.54
DD      1087   59.55   20.33   .95     4.54

[1] Based on the raw scores of "first time" MELABs administered between 1991 and 1993.
[2] Standard error of measurement in raw score points; SEM in scaled score points would be slightly smaller.
In order to calculate a more exact measure of internal consistency (Cronbach's alpha) for MELAB
Part 2 and MELAB Part 3, item-level data was obtained from a set of 610 MELAB examinations.
This data set is a stratified random sample (considering native language, test site, and mean final
score and standard deviation per language group) of MELABs taken between 1990 and 1991.
All forms of Part 3 are represented in this data set, but for Part 2, only forms BB and CC are
represented (there is no Form AA, and Forms DD and EE were not in use in 1990).
As can be seen in Tables 3.10 and 3.11 below, the reliability coefficients for Part 2 (50 items)
range from .89 to .90 and for Part 3 (100 items) from .93 to .95. Alphas are also shown for the
subsets of different item types within each form; however, it should be noted that no component
score is ever reported. For Part 2 and for Part 3, MELAB score reports show only an examinee's
total score. Because the value of alpha depends on the number of items in the scale (fewer items
yield a lower alpha, more items a higher alpha), the number of items of each type is also
shown.
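Cronbach's alpha itself is computed from the item-level variances and the variance of the total scores. A minimal sketch, using an invented, tiny response matrix rather than the actual MELAB item data:

```python
# Cronbach's alpha from a matrix of dichotomous (0/1) item responses:
# one inner list per item, one column per examinee.
from statistics import pvariance

def cronbach_alpha(item_matrix):
    """alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance)."""
    k = len(item_matrix)
    item_vars = sum(pvariance(item) for item in item_matrix)
    totals = [sum(col) for col in zip(*item_matrix)]
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Three invented items answered by four examinees:
matrix = [[1, 1, 0, 0],
          [1, 1, 1, 0],
          [1, 0, 0, 0]]
print(cronbach_alpha(matrix))  # -> 0.75
```

As the text notes, alpha rises with the number of items, which is why the short subsections in Tables 3.10 and 3.11 show lower values than the full tests.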
Table 3.10 Cronbach's Alpha Reliability Estimates for MELAB Part 2 (Listening)

Form  N    Quests.    Stmts.     Emph.       Lecture     Convers.    Total       Part 2     Part 2    Part 2
                                                                     (50 items)  Mean [1]   SD [1]    SEM [2]
BB    299  .68        .52        .81         .71         .66         .90         30.20      9.62      3.04
           (8 items)  (7 items)  (10 items)  (13 items)  (12 items)
CC    311  .68        .63        .68         .63         .73         .89         32.02      9.24      3.06
           (8 items)  (7 items)  (10 items)  (11 items)  (14 items)

[1] Based on raw scores
[2] Standard error of measurement in raw score points; SEM in scaled score points would be slightly larger (Form BB = 3.72; Form CC = 3.78)
Table 3.11 Cronbach's Alpha Reliability Estimates for MELAB Part 3 (GCVR)

Form  N    Grammar     Cloze       Vocabulary  Reading     Total        Part 3     Part 3    Part 3
           (30 items)  (20 items)  (30 items)  (20 items)  (100 items)  Mean [1]   SD [1]    SEM [2]
AA    148  .84         .73         .86         .83         .95          59.40      18.57     4.15
BB    196  .84         .75         .81         .80         .93          58.08      17.05     4.51
CC    165  .87         .79         .86         .85         .95          55.94      19.43     4.34
DD    101  .85         .70         .88         .75         .94          59.81      18.19     4.46

[1] Based on raw scores
[2] Standard error of measurement in raw score points; SEM in scaled score points would be slightly smaller (Form AA = 2.98; Form BB = 3.44; Form CC = 3.30; Form DD = 3.60)
3.2 Validity
The theory and practice of test validation are complicated and evolving. The following
section presents information related to the validity of the MELAB as a test aimed at measuring the
English language proficiency of individuals interested in college-level academic work. Test
validation involves gathering and analyzing evidence about whether the content of the MELAB is
appropriate and whether inferences about English proficiency can be made from MELAB test
scores. Evidence is presented about what is tested on the MELAB and the underlying construct
is examined. Also, the relationship of MELAB scores to other measures of English language
proficiency is described. However, because test validation is considered a continuing process,
MELAB test users also have the responsibility of accumulating evidence about inferences and
interpretations of MELAB test scores.
3.2.1 Content-related Evidence
For a test to have content validity, its content should be a representative sample of the behavior
domain we want to test. In an attempt to ensure that the content validity of the MELAB is high, a
systematic procedure is followed in specifying what is to be tested and in constructing and
selecting test items. Tests are constructed by test development teams composed of professional
staff of the ELI-UM Testing and Certification Division. All professional staff have an academic
background in Teaching English as a Second Language (TESL), English, or linguistics, and
experience teaching English as a Second Language. Information about aim, development, and
content of each part of the MELAB follows.
3.2.1.1 Content-related Evidence for Part 1: Composition
Test Aim: The composition task attempts to assess an examinee's ability to communicate in
written English, particularly as might be relevant to the capacity to carry out academic work. It
aims to determine how clearly and effectively someone can develop a topic in written discourse.
Prompt topics are intended to be ones about which examinees of various ages, cultural
backgrounds, English proficiency levels, and educational experiences have content knowledge. Part
1, as a direct writing task, serves to complement Part 3 of the MELAB, in which knowledge of
written English is tested in another way, through a multiple-choice format. The writing prompt
topics are not intended to reflect the range of written assignments at the university level, but the
task is relevant to performance constraints of academic essay exams.
Test Development: A direct writing task has been a component of Michigan proficiency batteries
since the 1950's. Through the years specifications for composition topics have been established
and serve as guidelines for the development of new topics. Topics are written by research staff
members of the ELI Testing and Certification Division who also have duties as composition raters
and may be ESL instructors at the University of Michigan.
Topic writers are advised to consider various constraints of the test situation, expectations about
the writing outcome, and various characteristics of the examinees when formulating possible
MELAB writing topics. Aspects of the test situation considered important are that 30 minutes are
allotted for the writing task; that access to the specific topics is given just prior to the time of
writing, so the responses are impromptu; and that examinees do not have access to
dictionaries or other writing aids, although the administrator of the exam may translate or briefly
explain topics to examinees. Because it is expected that the text produced by the examinee will
be at least 150 words, prompt writers are advised to develop topics that are broad enough that
someone can write at length on them rather than exhaust the topic in only 10 or 15 minutes.
Certain topics are avoided, specifically those that might elicit formulaic or previously prepared
responses, e.g. topics asking for the history of the examinee's country or autobiographical
accounts of the examinee's life. Characteristics of the examinees that are considered when
devising topics are that the examinees come from a range of linguistic and cultural backgrounds,
have various educational backgrounds, and are different ages. Topics are developed to be
accessible and attractive to a range of young adult and adult examinees. Topics are avoided that
could be considered politically or culturally objectionable or limited or that require the examinee to
draw on specialized knowledge of a culture, a field, or a discipline. Topics may call upon the
personal experience, attitudes, or general knowledge of the examinees.
Groups of item writers discuss and revise suggested topics, and topics are piloted on MELAB
examinees in Ann Arbor and internationally as part of the Examination for the Certificate of
Proficiency in English program. Topics are selected and pre-tested in an attempt to avoid bias
against any particular group of examinees and are expected to elicit a range of responses with
scores falling in a normal distribution. New topic sets are introduced annually. Some of the
studies of MELAB prompt difficulty have been published: Hamp-Lyons and Prochnow, 1991 [4];
Hamp-Lyons and Prochnow-Mathias, 1994 [5]; Spaan, 1993 [6].
Test development of MELAB Part 1 includes continued monitoring of the scoring system. The
most recent significant analysis and revision of the scoring scale descriptors was in 1989. The
descriptions were rewritten (then piloted, and rewritten several times) for greater clarity and
consistency and include references to discourse level features of texts. Codes were specified so
that raters could indicate features of particular examinees' texts that were especially good or bad
in relation to the overall level of writing.

[4] Hamp-Lyons, L. & Prochnow, S. (1991). The difficulties of difficulty: Prompts in writing assessment. In S. Anivan (Ed.), Current developments in language testing (pp. 58-76). Singapore: SEAMEO Regional Language Centre.
[5] Hamp-Lyons, L. & Prochnow-Mathias, S. (1994). Examining expert judgments of task difficulty on essay tests. Journal of Second Language Writing 3 (1): 49-68.
[6] Spaan, M. (1993). The effect of prompt in essay examinations. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research: Selected papers from the 1990 Language Testing Research Colloquium (pp. 98-122). Alexandria, VA: TESOL.
Test Content: Examinees are given 30 minutes to write on a single topic they choose from a set
of two topics. At any given time approximately 30 sets of topics (or 60 individual topics) are being
used as MELAB prompts. Prompts are generally brief, two or three sentences in length, and
presented in sets. In general, topics ask examinees either to describe something from personal
experience, to state a position on an issue and defend it, to explain a problem and offer solutions,
or to compare and contrast and take a position. Spaan (1993) has referred to some prompts as
stimulating a personal narrative and others as ones that may stimulate impersonal, argumentative
text. Hamp-Lyons and Prochnow-Mathias (1994) have categorized MELAB prompts into five
types: expository/private, expository/public, argumentative/private, argumentative/public, and a
combination (argumentative/expository/public). Offering examinees a choice of the type of
composition they write is intended to give them an opportunity to display their writing ability in the
mode they choose. The difficulty of expository writing versus narrative writing for speakers of
English as a second language may not be the same as for native speakers of the language.
The desired outcome of the task is a writing sample of sufficient length to show the language
proficiency of the examinee. The topics may present a challenge to examinees in terms of their
capacity to integrate ideas and to use language to express them, rather than simply to produce a
string of sentences that parade grammatical items. Consequently the compositions are
marked to reflect factors such as organization, coherence of discourse, and expression of
content, as well as the more formal elements of written language.
In the written directions examinees are informed that the essays are judged on clarity and overall
effectiveness, as well as on topic development, organization, range, accuracy, and
appropriateness of grammar and vocabulary. The compositions are evaluated holistically, in one
location, by trained raters using a 10-point criterion-referenced rating scale. Features of the
scoring system represent aspects of writing articulated by professional raters as salient in
impromptu ESL composition writing. The values of the scores, 53, 57, 63, 67, 73, 77, 83, 87, 93,
97, were chosen to calibrate with the scaled scores on Parts 2 and 3 of the MELAB. (See also
Section 1.3.1)
3.2.1.2 Content-related Evidence for Part 2: Listening
Test Aim: The listening test of the MELAB is intended to assess the ability to comprehend
spoken English. It attempts to determine the examinee's ability to understand the meaning of
short utterances and of more extended discourse as spoken by university-educated native
speakers of standard American English. It requires examinees to activate their own schemata to
interpret the meaning of what they hear and to use various components of their linguistic system
to achieve meaning from the spoken discourse. It also presumes the activation of various
comprehension abilities such as prediction, exploitation of redundancy in the material, and the
capacity to make inferences and draw conclusions while listening. The test does not
attempt to specifically incorporate a variety of English dialects or registers but focuses on general
spoken English, conversational as well as more planned speech, e.g. lectures based on written
notes.
Test Development: A test of aural comprehension has been a component of a Michigan
proficiency battery since the late 1960's when the commercial availability of electronic audio
delivery systems made it feasible to include a pre-recorded listening test. The MELAB listening
content guidelines, markedly different from guidelines for earlier aural tests, were established in
the mid-1980's in light of current theoretical models of language proficiency. Content guidelines
prescribe the inclusion of different types of listening items including some with minimal context
and some based on extended context.
Minimal context items permit the sampling of a range of conversational utterances. On the item
level, this is a way of assessing particular knowledge of conversational routines. On the test level
it is a way to assess the ability to comprehend the unexpected. A natural element of real
language ability is to predict what speakers will say in particular contexts; new contexts are
continually being created in real communicative encounters. The extended discourse segments
are well beyond memory-span and the listener must therefore engage in specific lecture-
processing strategies. Note-taking is permitted, and examinees may use their notes when
answering the questions. This, too, is meant to resemble a lecture-processing situation, and to
engage lecture-listening skills which would operate in more realistic settings. The task is
intended to reflect current theories that postulate that listening comprehension involves
vocabulary, predictive abilities, background knowledge, and awareness of stress and intonation
as well as grammar in a communicative context.
Items are written and edited by the ELI testing research staff who follow general specifications for
content and form of the aurally-delivered item cue and the answer choice options that appear in
printed form in the test booklet. Test developers are guided to formulate items that represent
natural spoken English and to base extended spoken text on naturally occurring conversations.
They develop short lectures on topics of general interest about which specific background
knowledge is not required. A source for minimal context items is naturally occurring interactions
between native and non-native speakers in an American university setting, both within and
outside the classroom. Items are developed that incorporate grammatical features and lexical
components that appear in real language use situations where there has been some
misinterpretation or misunderstanding on the part of the non-native speaker. Answer choice
distractors for the minimal context items represent predictable misunderstandings stemming from
particular grammatical or lexical features of the utterances. Distractors for the emphasis type
items reflect expected misunderstanding as to which lexical item in the utterance is being
articulated with emphatic stress. In questions about the longer segments of discourse, the wrong
response options are based on possible misunderstandings of main points and/or significant
details conveyed in the extended discourse.
Test items are trialed on native and non-native speakers of various linguistic backgrounds and
academic levels similar to typical MELAB examinees. (Test items can be answered easily by
adult native speakers of standard American English who have normal hearing and average
literacy skills.) Items are selected for inclusion in MELAB listening tests based upon
consideration of the content of the particular item, the item difficulty for trial populations, and the
correlation of response to the item with responses on similar items and with total score on a
listening test. Item difficulty figures range from .30 to .80 with mean difficulty about .65.
Additionally, the trial population is divided into five groups based on their total score on the
listening test, and items are analyzed in terms of how well the different groups performed on the
item. Items are selected for inclusion if a greater percentage of those in the higher scoring
groups answer them correctly and if the items have a point biserial correlation of at least .30 with
similar test items. Parallel forms (coded as BB, CC, etc.) of the listening test are constructed
following an established format with regard to item content and statistical characteristics.
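The five-group discrimination check described above can be sketched as follows: the trial population is split into five ascending score bands, and an item passes if the proportion answering it correctly does not decrease from the lowest-scoring band to the highest. Illustrative code with invented data, not the ELI-UM analysis program.

```python
# Five-group item discrimination check: rank examinees by total score,
# split into five (nearly) equal groups, and require proportion-correct
# to rise (non-strictly) across the groups.
from statistics import mean

def five_group_check(item_responses, total_scores):
    """True if proportion correct is non-decreasing across five
    ascending total-score groups."""
    ranked = [r for _, r in sorted(zip(total_scores, item_responses))]
    n = len(ranked)
    bounds = [round(i * n / 5) for i in range(6)]
    props = [mean(ranked[bounds[i]:bounds[i + 1]]) for i in range(5)]
    return all(a <= b for a, b in zip(props, props[1:]))

# Invented data: ten examinees, totals 0..9 (two per group)
totals = list(range(10))
good = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]   # stronger examinees do better
bad = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1]    # weakest examinees do best
print(five_group_check(good, totals))  # -> True
print(five_group_check(bad, totals))   # -> False
```

In practice this check is applied alongside the difficulty and correlation thresholds named in the text.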
Test Content: The MELAB listening test is delivered via audio recording lasting approximately
25 minutes. Conversations and mini-lectures are semi-scripted and recorded by male and female
native speakers, using colloquial phrasings as they naturally emerge in the discourse. All test
item questions are delivered in Standard American English at a normal delivery rate (about 150
wpm) and reduced speech is included when it would naturally occur in spoken English.
Examinees select responses to 50 test questions from multiple choice options printed in a test
booklet. A brief description of the different types of test items and the number of those items that
appear on four forms of MELAB Part 2 can be found in Table 3.12.
Some items require the examinee to assume the role of participant and others the role of
eavesdropper. None requires particular background characteristics of the examinees with regard
to content knowledge, but most items require some familiarity with conversational routines. Test
items typically require more than literal interpretation of the cue; most involve using the linguistic
system (phonological, grammatical, lexical) and extracting the illocutionary meaning, i.e. the
function of the utterance such as request, invitation, etc. Some test items focus on the
understanding of lexical expressions common to spoken English and the understanding of
utterances with the natural embeddings and complexities that occur in spoken English.
Some items (the emphasis items) focus specifically on the meaning conveyed by
suprasegmentals or prosodic aspects of a speaker's utterance. They require the listener to work
out the meaning-intention of the speaker to enable a decision to be made on how the speaker
would continue. The utterance stem is deliberately ambiguous, and this ambiguity can only be
resolved by integrating the stress information contained in the pronunciation of the utterance. As
such, the item combines a particular and important performance skill in listening with the capacity
to extract underlying meaning from language.
Also included in the MELAB are items based on more extended conversational discourse and
extended discourse simulating a short lecture. Visual support material in the form of a chart or
graph is intended to assist the listener in interpreting the extended discourse. As mentioned
previously, examinees are encouraged to take notes and make further notations on the graphic
material which is printed on the answer sheet as they listen to the lecture or conversation. After
the lecture or conversation, they may refer to these notes when they hear a question and are
selecting answer responses. The mini-lecture represents the situation in much post-secondary
academic training where lectures are given in combination with blackboard or other audio-visual
presentations, and it is the task of the student to relate the two sources of information. Graphic
material also serves to contextualize the other segment of extended discourse, the conversation,
and requires the mobilization of many components of the examinee's language system in
processing the discourse and then retrieving explicit information as well as drawing inferences
when responding to questions about the conversation. This is a way of assessing the ability to
make use of redundancy of information, the ability to identify significant meaning elements of the
discourse, and the ability to maintain understanding throughout extended segments of spoken
English.
In an attempt to reduce the role of reading in the listening tasks, the printed answer choices are
generally brief (2 - 7 words in length). There is generally a 12-second pause between items, during which the examinee can read and select the answer printed in the test booklet and then mark it on a separate answer sheet.
Aurally presented test cues vary in length but generally run between 9 and 14 words. Each of the single-sentence and two-utterance conversational exchanges is independent of the others, and test questions on the content of the mini-lecture and the content of the extended conversational discourse are independent as test items.
Different forms of the MELAB listening test have different test items. The number of each type of
item varies somewhat by form, but the items in each form are equated with regard to general
range of content, coverage, and difficulty. Mean item difficulty of the various item types in
MELAB Part 2 varies slightly across form. There is an attempt to adjust for differences when test
forms are equated. An analysis of item responses of a stratified random sample of 610 MELABs
taken in 1990-91 revealed a mean item difficulty of .604 on Form BB (N=299) and .641 on Form
CC (N=311). No item type appears significantly more difficult across all forms.
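The item difficulty figures reported above are classical p-values: the proportion of examinees who answered an item correctly. A minimal sketch of that computation follows; the scored responses here are invented toy data, not actual MELAB responses.

```python
import numpy as np

def item_difficulty(responses: np.ndarray) -> np.ndarray:
    """Classical item difficulty (p-value): the proportion of
    examinees who answered each item correctly.

    responses: a 0/1 matrix, one row per examinee, one column per item.
    """
    return responses.mean(axis=0)

# Toy scored responses: 4 examinees x 3 items (invented data).
scores = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
])
p_values = item_difficulty(scores)   # per-item difficulty
mean_difficulty = p_values.mean()    # the statistic reported per form
```

Higher p-values indicate easier items; the mean over all items on a form is the "mean item difficulty" compared across forms above.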
Table 3.12 MELAB Part 2 (Listening) Items: Type and Number by Form

                                                          Number of Items by Form
Type of Item                                               BB    CC    DD    EE
Question                                                    8     8    10    10
  Choose the appropriate answer to a short question.
Statement                                                   7     7    17    16
  Identify the paraphrase of single utterances or short
  conversational exchanges between two speakers.
Emphasis                                                   10    10     8     8
  Choose the appropriate response to short expressions
  articulated with emphasis on particular lexical items,
  or identify how a speaker might continue after
  emphasizing a certain lexical term.
Lecture                                                    13    11     5     7
  Select appropriate answers to short questions based on
  a 3 - 4 minute mini-lecture presented on audio tape. A
  visual graph related to the mini-lecture is on the
  answer sheet, and examinees are advised to take notes
  on what they hear to aid them in recalling information
  when responding to questions following the lecture.
Conversation                                               12    14    10     9
  Select the appropriate answer to short questions based
  on an approximately 4 - 5 minute conversation. A
  simplified map related to the conversation is on the
  answer sheet, and examinees are advised to take notes
  to aid them in recalling information when responding
  to questions following the conversation.
3.2.1.3 Content-Related Evidence for Part 3: Grammar, Cloze, Vocabulary, Reading
(GCVR)
Test Aim: The GCVR test aims to sample aspects of the learner's usage and use of English.
Both syntax and morphology are assessed in the grammar section; the ability to recognize elements of coherence and cohesion, as well as elements of grammar and semantics, is measured in the cloze section; the vocabulary section focuses on the understanding and use of lexis; and the reading section focuses on comprehension of written text. Taken together,
these different components are intended to provide a measure of general language competence
in an academic setting.
Test Development: A general test of multiple-choice items of grammar, vocabulary, and reading
has been a component of a Michigan proficiency battery since the 1960's. The grammar and
vocabulary components have allowed the sampling of a variety of grammatical and lexical
elements and the reading comprehension component contains items that reflect the close
attention to detail that much of academic reading requires, as well as the capacity to go slightly
beyond the information given and draw appropriate inferences on the basis of the text. A cloze
component was added in the 1980's in an attempt to include a specifically integrative language
proficiency measure. Item specifications and content guidelines guide item writers in the
development of grammar, cloze, vocabulary, and reading test items.
Grammar: In the grammar items, the head represents a short (2-line) conversational exchange.
A slot appears in the second speaker's utterance. Only one of the multiple options correctly
completes the second speaker's utterance. Item specifications require that the head use
relatively high frequency vocabulary that is appropriate for a spoken register but is not too
idiomatic and not too complicated or lengthy. The distractors may be grammatical deviations or
interlanguage errors common to non-native speakers. The options are generally parallel in form
or structure, e.g. all variations in verb tense, all adverbials. Distractors are not wrong simply
because of orthography or punctuation, e.g. "It's" vs. "Its," and test items do not test prescriptive
usage distinctions in English or usage distinctions variable in native speakers of English, e.g.
"have got" vs. "have gotten."
Cloze: Selections are made from texts of general interest to examinees. Reading passages
selected for cloze tests are about 250 words long and are simpler to understand when complete
than the passages used in the reading comprehension section of MELAB Part 3. The cloze test
includes items that attempt to tap the learner's understanding of organizational features of texts,
i.e. cohesive and coherent features as well as grammatical knowledge. It also taps the learner's
pragmatic knowledge, particularly knowledge about expected vocabulary in certain written
contexts. It uses a combination of random and rational deletion; about every 7th word is deleted,
but words are chosen for deletion that appear important for a continuous processing of the text.
Words are deleted that might require the reader to refer to previous elements of the text or might
require momentarily leaving the slot blank and continuing, then going back and filling in the slot
later.
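A purely mechanical version of the deletion procedure described above can be sketched as follows. The actual MELAB procedure additionally adjusts each deletion rationally (deleting every 6th to 9th word, targeting words that matter for continuous processing), so treat this as an illustration only; the sample passage is invented.

```python
def make_cloze(text, interval=7, blank="____"):
    """Delete roughly every `interval`-th word from a passage,
    returning the gapped text and the deleted words (answer key)."""
    words = text.split()
    gapped, key = [], []
    for i, word in enumerate(words, start=1):
        if i % interval == 0:
            key.append(word)     # record the deleted word
            gapped.append(blank)
        else:
            gapped.append(word)
    return " ".join(gapped), key

sample = ("The quick brown fox jumps over the lazy dog while the "
          "patient cat watches quietly from a sunlit windowsill nearby")
gapped_text, answer_key = make_cloze(sample)
# Deletes the 7th and 14th words of the sample: "the" and "watches".
```

In a supply-item trial the answer key is the scoring standard; in the multiple-choice version, distractors are drawn from actual non-native-speaker responses as described above.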
Once a text is selected and deletions made, the cloze passage is initially trialed on ELI testing
and instructional staff. The text may be revised and the cloze blanks altered based on comments
and performance of native speakers. The cloze selections are trialed on non-native speakers in
Michigan as supply item cloze tests. The multiple-choice distractors are then chosen from the
responses of non-native speakers. Only clearly wrong responses are chosen for incorrect
options. The cloze selection in a multiple-choice format is then piloted on non-native speakers.
Passages that correlate positively with examinees' performance on other cloze proficiency tests
are chosen for inclusion in a form of MELAB Part 3. Evidence gathered during the pre-testing of
cloze tests has shown generally similar correlations of cloze tests (in the .60's) with the grammar,
vocabulary, and reading comprehension components.
Vocabulary: Words that are included in the vocabulary test are drawn primarily from frequency-count word lists. Selection of words is tempered by the expert judgment of the item writers, who recognize the many limitations of word lists (e.g. when they were constructed, what types of texts were analyzed). Primary sources[7] of word lists have been Computational Analysis of Present-Day American English by Henry Kucera and Nelson W. Francis[8] and The Teacher's Word Book of 30,000 Words by Edward Thorndike and Irving Lorge.[9] Words selected for testing occur 5 to 12 times per million and in more than one genre (Kucera-Francis) or 5 to 20 times per million (Thorndike-Lorge summary count). The MELAB targets this frequency because such words frequently appear in academic-related discourse and anyone interacting in such a discourse community is expected to know such words.

[7] The American Heritage Word Frequency Book by J. B. Carroll, P. Davies and B. Richman (1991) also serves as a reference text on word frequency.
[8] Francis, W. N. & Kucera, H. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
[9] Lorge, I. & Thorndike, E. L. (1944). The teacher's word book of 30,000 words. New York: Teachers College, Columbia University.
Item writers rely heavily on their prior education and experience as English language teachers
and raters of written and spoken English as they select words for testing and choose words for
distractors. Item writers choose words for testing that appear useful for general and academic
English. First, there is a systematic attempt, given the academic nature of the target population, to
include between 40% and 50% sub-technical words, i.e. those non-discipline-specific, but
nonetheless highly relevant words that figure prominently in academic discourse, such as trivial
and concede. Second, in an attempt to probe the area of more colloquial language as is
encountered in lectures and oral presentations and some social situations, a smaller proportion of
the total, i.e. 10% to 15%, is idiomatic phrases and expressions. Otherwise, the sampling is
mainly based on the sorts of frequency criteria outlined above.
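The frequency band criterion outlined above can be illustrated with a toy filter. The per-million figures below are invented placeholders, not values from the actual Kucera-Francis or Thorndike-Lorge counts.

```python
# Hypothetical frequencies per million words (illustrative only).
freq_per_million = {
    "concede": 6.2,
    "trivial": 9.8,
    "the": 69971.0,
    "peruse": 1.3,
    "notion": 11.4,
}

def in_target_band(word, low=5.0, high=12.0):
    """True if the word falls inside the tested frequency band
    (5-12 occurrences per million, the Kucera-Francis criterion)."""
    freq = freq_per_million.get(word, 0.0)
    return low <= freq <= high

# Words passing the band filter become candidates for item writing.
candidates = sorted(w for w in freq_per_million if in_target_band(w))
# "the" is far too frequent and "peruse" too rare to be selected.
```

Item writers then apply the judgment criteria above (genre spread, sub-technical relevance) to this candidate pool.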
Reading: It might be argued that much of the MELAB is a reading test because of its format: the test booklets for both Parts 2 and 3 of the MELAB require the examinee to process written text to take the exam. However, the MELAB Part 3 reading section specifically aims to assess examinees' comprehension of college-level reading texts.
Item specifications for this part of the MELAB stipulate reading passages that are excerpts from
primarily expository texts from various publications of general interest to educated adults (e.g.
Ann Arbor Observer, National Geographic, Natural History, New York Times, New Yorker,
Newsweek, Scientific American, Smithsonian, Travel Holiday, UM Research News). Test
passages are of various types: humanities (e.g. literature, folktales); social science (e.g.
anthropology, history, government); physical science (astronomy, physics, mechanics); biological
science (biology, zoology, medicine). The content of each passage is accessible to a non-
specialist in that particular subject area; passage subject matter is intended to have general
interest and appeal. Different types of passages are included in each form in an attempt to
present a range of readings that will not advantage or disadvantage examinees of any particular
educational background. Passages typically do not contain information that is simply common knowledge. Passages are edited in an attempt to enhance the clarity, cohesion, and coherence of an excerpt that is no longer in its original context. Edited passages are pre-tested with questions. More questions per passage are pretested than are used in the MELAB, to permit selection of those questions with the best item-to-total discrimination and item difficulty values.
Test Content: MELAB Part 3 contains 100 test items and all test material, including instructions,
is presented in printed test booklets. Each test item has four options for a response. The
examinee selects one response and marks the response on a separate answer sheet. The
format includes:
Grammar: 30 items
Cloze: 20 items based on 1 passage
Vocabulary: 30 items
Reading: 20 items based on 4 or 5 passages
Grammar: Grammar items in MELAB Part 3 focus on control of English syntax (sentence structure) and morphology (word structure). Table 3.13 summarizes the types of grammar items in each of four forms of MELAB Part 3.
Table 3.13 MELAB Part 3 (GCVR) Grammar Items: Type and Number by Form

                                                        Number of Items by Form
Type of Item                                             AA    BB    CC    DD
Syntax
  Grammar type I (Sentence structure, excl. embedding)    8     6     5     6
  Grammar type II (Embedding)                             4     5     4     5
Morphology
  Grammar type III (Verbals: aux., passive, infinitive)   6     7     7     6
  Grammar type IV (Nominals)                              7     5     5     6
  Grammar type V (Misc.: adverbs, prepositions)           5     7     9     7
Cloze: One cloze reading selection of approximately 250 words is included in each MELAB Part
3 form. The first and last sentences of the text are complete, that is, no words are omitted. In the rest of the text, approximately every 7th word is omitted, with 20 words in total missing from the selection; there is some variation in how often words are deleted, ranging from every 6th to every 9th word. The selection with the words missing is printed on one half of the test paper; on the other
half of the page, multiple-choice answers are printed from which the examinee must choose a
response. Distractors may vary in syntactic categories from the correct response, and distractors
may be grammatically correct but wrong with reference to the explicit meaning conveyed
elsewhere in the selection.
Items deleted in cloze passages of different forms of MELAB Part 3 are from different form
classes (e.g. nominals, adjectivals, verbals) and are words that serve various discourse functions
in the texts. The cloze test samples closed to open form classes in the ratio of approximately
60:40. Closed classes of words are typically words that can be classified as prepositions,
pronouns, determiners, conjunctions, and auxiliary verbs such as modals. Open classes of words
are nouns, adjectives, main verbs, and adverbs.
Vocabulary: The vocabulary sub-test consists of 30 multiple-choice items, 15 each of two types: synonym and word-meaning in context (completion).
The synonym type of item appears in a single, relatively short, sentence with the word to be
tested underlined. Context is minimal; the four multiple-choice options appear below the
sentence. The answer choices are all high frequency words (at least 50 per million on the word
lists). The word underlined in the stem is a less-frequently occurring word. Examinees must
select the answer choice that is a synonym for the underlined word.
In the completion type of item, a sentence with a blank slot is followed by four multiple-choice
options, only one of which fits appropriately into the sentence. More context is provided than in
the synonym type vocabulary items. All of the answer choices in the completion type of item are
at the targeted frequency level. Also, all are the same word form, e.g. all adjectives, and fit the
sentence grammatically.
The selection criteria result in 75% of the words included in the MELAB Part 3 vocabulary test being part of a 15,000-word general core vocabulary such as that in Longman's Lexicon of Contemporary English by Tom McArthur (1981).[10]

[10] McArthur, T. (1981). Longman's lexicon of contemporary English. Harlow, Essex: Longman.
Reading: The reading sub-test consists of four or five passages with twenty questions in total.
Comprehension of the passages is assessed by the questions, generally five, that follow each
passage. Four response options are printed following each question, and the examinee is
instructed to select the one response that correctly answers the question.
The test passages have similar readability levels as measured by standard readability formulas.
An examination of the reading ease of the reading passages suggests that they are considered
"difficult" and at a college level according to a standard readability formula based on sentence
length and syllables per 100 words as is shown in Table 3.14 below. Because readability
formulas have well-known limitations (e.g. the ideational complexity is not considered), readability
statistics are not used in passage selection; however, the statistics do provide some measure of
the structural complexity of the passages.
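The standard formula referred to above combines average sentence length with average syllables per word. A sketch of both Flesch measures follows; the counts fed in are invented for illustration, not taken from an actual MELAB passage.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text;
    30-50 is rated 'difficult', 0-30 'very difficult'."""
    asl = words / sentences     # average sentence length
    asw = syllables / words     # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: a U.S. school-grade estimate
    built from the same two length measures."""
    asl = words / sentences
    asw = syllables / words
    return 0.39 * asl + 11.8 * asw - 15.59

# Illustrative counts: 250 words, 10 sentences, 450 syllables.
ease = flesch_reading_ease(250, 10, 450)     # about 29.2
grade = flesch_kincaid_grade(250, 10, 450)   # about 15.4
```

A score near 29 falls in the "very difficult" band of the scale reproduced with Table 3.14, consistent with a college-level grade estimate.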
Table 3.14 MELAB Part 3 Reading Passage Readability Statistics[1]

MELAB Form          Flesch[11]       Flesch
and Passage         Reading Ease     Grade Level
AA  Passage 1          30.7            15.9
    Passage 2          32.9            15.6
    Passage 3          78.6             7.1
    Passage 4          43.3            14.0
    Passage 5          48.2            13.3
BB  Passage 1          30.9            15.9
    Passage 2          35.4            15.2
    Passage 3          46.4            13.5
    Passage 4          43.3            14.0
CC  Passage 1          54.1            11.8
    Passage 2          31.2            15.8
    Passage 3          43.9            13.9
    Passage 4          46.3            13.6
    Passage 5          46.5            13.5
DD  Passage 1          26.5            17.0
    Passage 2          57.8            10.7
    Passage 3          51.7            12.5
    Passage 4          53.5            11.9

[1] Flesch Reading Ease scores correspond to grade levels and difficulty ratings as follows:
    Reading Ease    Grade Level    Rating
    90-100              4          very easy
    80-90               5          easy
    70-80               6          fairly easy
    60-70              7-8         standard
    50-60              9-10        fairly difficult
    30-50             11-14        difficult
    0-30              15-16        very difficult
The group of questions following each passage is intended to check examinees' comprehension of the entire passage rather than of just a portion of it, and of important ideas rather than of insignificant details. The questions are independent of each other. Questions focus on examinees' comprehension of:
the main idea of the passage;
details explicitly stated in the passage;
inferences that can be drawn from the passage;
the author's viewpoint or purpose; and
the logical relationships between portions of the passage.

[11] Microsoft Corporation. (1991). Microsoft Word for Windows 2.0 [Computer software]. United States: Microsoft Corp. Flesch reading ease and grade level generated by Microsoft Word for Windows 2.0 computer software.
Questions are generally short and written with the intention of making them easy for examinees to
understand. The vocabulary and syntax are simpler in the questions than in the passage.
Typically, distractors are possible misreadings of the passage.
When a MELAB Part 3 reading section is assembled, reading passage type, length, and mean item difficulty are considered. As can be seen in Table 3.15, different types of reading passages are included in each form, and the total length of all of the passages included in each form is similar (approximately 900 - 1000 words). Projected mean item difficulty is based on mean item difficulty for the trial population. The projected mean item difficulty for each form is about .63. Table 3.15 also shows the actual mean item difficulty for MELAB Part 3 from when it was operationalized. Differences from the projected item difficulty are accounted for when score equivalency tables are established. (See Section 3.1.3.1 for information on the development of alternate forms.)

Table 3.15 MELAB Part 3 Reading Passages (Type & Length) and Item Difficulty

MELAB   Item       Type of              Words per   Projected Mean       Actual Mean
Form    Numbers    Passage              Passage     Item Difficulty[1]   Item Difficulty[2]
AA      81 - 85    Physical Science       190         .759                 .743
        86 - 88    Social Science         140         .616                 .538
        89 - 92    Narrative              170         .620                 .559
        93 - 95    Social Science         190         .624                 .619
        96 - 100   Biological Science     275         .521                 .482
        Total                             965        (.630)                .588
BB      81 - 85    Social Science         265         .743                 .650
        86 - 90    Biological Science     260         .584                 .544
        91 - 95    Social Science         200         .601                 .587
        96 - 100   Biological Science     280         .567                 .502
        Total                            1005        (.624)                .571
CC      81 - 85    Social Science         225         .703                 .548
        86 - 89    Social Science         141         .583                 .609
        90 - 92    Humanities             136         .702                 .499
        93 - 95    Physical Science       170         .528                 .491
        96 - 100   Physical Science       250         .606                 .542
        Total                             922        (.628)                .538
DD      81 - 85    Physical Science       175         .571                 .669
        86 - 90    Social Science         260         .566                 .689
        91 - 95    Biological Science     260         .682                 .537
        96 - 100   Social Science         280         .688                 .600
        Total                             975        (.627)                .624

[1] Difficulty of items for a trial population composed of approximately 600 examinees tested with the Examination for the Certificate of Proficiency in English outside the U.S.
[2] Difficulty of items for operationalized MELAB; MELAB population for which difficulty was calculated is a stratified random sample of 610 MELABs taken in 1990-91 (Part 3 Form AA N=148, Form BB N=196, Form CC N=165, Form DD N=101).
MELAB Part 3 Summary Information: Information about the difficulty of the grammar, cloze,
vocabulary, and reading items of four forms of MELAB Part 3 is shown in Table 3.16. Slight
variations in mean item difficulty across test forms are reduced by test equating, when raw scores
are converted to scaled scores.
Table 3.16 MELAB Part 3 (GCVR) Item Difficulty by Sub-Section[12]

                                AA      BB      CC      DD
Grammar (30 items)
  Mean Item Difficulty         .587    .616    .601    .619
  Standard Deviation           .161    .140    .118    .161
Cloze (20 items)
  Mean Item Difficulty         .607    .596    .558    .597
  Standard Deviation           .160    .188    .192    .185
Vocabulary (30 items)
  Mean Item Difficulty         .591    .546    .529    .565
  Standard Deviation           .133    .128    .113    .125
Reading (20 items)
  Mean Item Difficulty         .591    .571    .543    .624
  Standard Deviation           .129    .122    .088    .135
All GCVR Items (100 items)
  Mean Item Difficulty         .593    .582    .559    .599
  Standard Deviation           .145    .145    .132    .151
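One common way to carry out the equating step mentioned above is linear equating, which maps a raw score on a new form onto the reference form's scale by matching means and standard deviations. This is a generic sketch of the technique with invented score distributions, not the procedure or conversion tables actually used by the ELI.

```python
import statistics

def linear_equate(raw, new_mean, new_sd, ref_mean, ref_sd):
    """Map a raw score on a new form onto the reference form's scale
    by matching first and second moments (linear equating)."""
    z = (raw - new_mean) / new_sd   # standardize on the new form
    return ref_mean + z * ref_sd    # re-express on the reference scale

# Hypothetical raw-score samples from two forms of the same test.
ref_scores = [55, 60, 65, 70, 75]   # reference form
new_scores = [50, 55, 60, 65, 70]   # slightly harder new form

equated = linear_equate(
    60,
    statistics.mean(new_scores), statistics.stdev(new_scores),
    statistics.mean(ref_scores), statistics.stdev(ref_scores),
)
# A raw 60 on the harder form maps to 65 on the reference scale.
```

The effect is that slight form-to-form differences in mean item difficulty do not translate into score advantages for examinees who happen to receive an easier form.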
3.2.1.4 Content-related Evidence for Speaking Test
Test Aim: The Speaking Test is a direct measure used to determine the examinee's capability to
communicate effectively in spoken English. The examinee generates a speech sample in a live,
one-on-one conversational setting. The task is intended to provide an opportunity for the
examinee to discuss concepts in his or her academic, professional or technical subject area. The
rating on speaking ability is intended to supplement the final MELAB score and provides
information complementary to that provided by the three part scores of the MELAB.
Test Development: The usefulness of having a direct measure of an examinee's speaking
ability has been recognized by various individuals who need to know a person's English
language proficiency. Because the MELAB is typically administered by individuals who have
professional training in teaching English as a second or foreign language, the test administrators
are able to conduct oral interviews and rate speaking ability. Guidelines for the administration
and rating of oral interviews were developed in the mid-1980's and reviewed and revised in the
early 1990's based upon observation of MELAB interviews conducted in Ann Arbor and reviews
of audio-taped oral interviews conducted by various examiners in various world-wide locations.

[12] Item difficulty from operationalized MELAB; MELAB population for which difficulty was calculated is a stratified random sample of 610 MELABs taken in 1990-91 (Part 3 Form AA N=148, Form BB N=196, Form CC N=165, Form DD N=101).
Procedural directions for conducting oral interviews are intended only as guidelines, and
examiners are advised to be flexible and sensitive to the individual nature of each interview.
Examiners are advised to be aware that speaking performance may vary throughout an interview
and to assume that examinees may be nervous. Examiners are advised to speak in a natural
way with normal rate of delivery and to adjust their English (by slowing their rate or simplifying
lexis or syntax) only as needed to promote communication in the interview. They are advised to
establish a comfortable environment in which the examinee can speak with elaborated replies
and to avoid making the interview an interrogation. Ratings of performance are expected to
reflect the level of performance the examinee sustained during the interview.
Test Content: Interviews are conducted individually and last generally 10 to 15 minutes.
Interviews have a three-part structure. In the opening phase, the examinee is asked general
background questions phrased as yes/no questions, tag questions and simple factual questions.
In the main part of the interview, a variety of question formats is used to elicit extended discourse.
Information about the examinee acquired during the interview is used to provide the content of
questions calling for spoken descriptions, comparisons, or speculation by the examinee.
Examinees are encouraged, when possible, to talk about academic, professional, or technical
topics in which they have some expertise. The interview is ended when the examiner believes he
or she has a sufficient sample to be able to rate the examinee accurately and appropriately.
Examiners, referring to the MELAB "Overall Spoken English Descriptors," match the examinee's performance with one of four levels of spoken English. Comments about positive and negative features of the examinee's spoken English may be noted by the oral interviewer and are reported on the MELAB score report along with the rating. The oral rating is not averaged into the Final MELAB Score.
3.2.1.5 Content-related Evidence for Final MELAB Score
The MELAB attempts to provide both a fairly global estimate of general readiness for academic study and also information at a more specific, even diagnostic, level. The three
parts of the MELAB, along with the MELAB Speaking Test, provide estimates of examinees'
competence in handling both written and spoken English. In this way, it reflects a theoretical view
which assumes that there are general and specific components of language proficiency. The test
consequently provides information, in the shape of the total score, that can be the basis for a
general judgment of the examinee's capacity to undertake academic study or training in an
English medium setting. It also provides specific information, in the shape of the part scores,
which can provide a more rounded picture, and which can be used to diagnose and to make more
qualitative judgments that may be linked to the specific settings in which particular students may
have to operate.
3.2.2 Construct-related Evidence
The construct-related validity of a test is the extent to which a test measures a theoretical
construct or trait. The general construct that a MELAB is believed to be measuring is English
language proficiency, or more specifically, proficiency in English as a second language for
academic study.
In this section, three types of construct-related evidence about the MELAB are reviewed. The
first explores the MELAB in relation to a theoretical model about communicative language ability,
the second presents the results of a factor-analysis of the MELAB, and the third focuses on native
speaker performance on the MELAB.
3.2.2.1 Language Proficiency Theory and the MELAB
An established model of the construct English language proficiency for academic study does not
exist, but over the years increasing attention has been given to model building that can assist us
in our efforts to develop valid language tests. One of the most well-known models is that proposed by Canale and Swain (1980)[13] and Canale (1983),[14] in which the construct "language proficiency" is actually replaced by the term "communicative competence." The Canale (1983) model proposes four aspects of communicative competence: grammatical competence, sociolinguistic competence, discourse competence, and strategic competence. This model has been revised and expanded by researchers in applied linguistics and language testing.
One current model particularly devised to inform language test developers is one proposed by Bachman (1990)[15] and later revised by Bachman and Palmer (in press).[16] According to this model, "communicative language ability" consists of "both knowledge, or competence, and the capacity for implementing, or executing that competence in appropriate, contextualized communicative language use" (Bachman, 1990, p. 84). Communicative language ability involves
the mobilization of knowledge of the world and knowledge of language. Language knowledge
consists of organizational knowledge (grammatical and textual knowledge) and pragmatic
knowledge (lexical, functional, and sociolinguistic knowledge).
While the MELAB was not developed from the model, it may be useful to analyze the components
of the MELAB in relationship to such a model, to examine which aspects of language knowledge
are called on in each part of the MELAB. As can be seen in Table 3.17, it appears that the
MELAB targets both organizational and pragmatic knowledge. It provides broad-based measures
of language knowledge, thus approaching the evaluation of English for academic study with
attention to both the form and function of language use.
Table 3.17 MELAB Components and Bachman/Palmer Model of Language Knowledge

                             Organizational Knowledge    Pragmatic Knowledge
MELAB Component              Grammatical   Textual    Lexical   Functional   Sociolinguistic
Part 1 Composition                X           X          X          X
Part 2 Listening Q & S            X                      X          X
Part 2 Listening E                X                      X          X
Part 2 Listening L & C            X           X          X          X              X
Part 3 Grammar                    X
Part 3 Cloze                      X           X          X          X
Part 3 Vocabulary                                        X
Part 3 Reading                    X           X          X          X
Speaking (optional)               X           X          X          X              X

[13] Canale, M. & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1-47.
[14] Canale, M. (1983). On some dimensions of language proficiency. In Oller, J. W. Jr. (Ed.), Issues in language testing research (pp. 333-342). Rowley, MA: Newbury House.
[15] Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
[16] Bachman, L. F. & Palmer, A. S. (in press). Language testing in practice. Oxford: Oxford University Press.
3.2.2.2 Factor Analysis of the MELAB
Factor analysis provides insight into the MELAB's construct validity. Construct validity has been
examined on two levels.[17]
First, the relative homogeneity of component scores within test part
was explored, that is, all of the Part 2 component scores were analyzed separately from all of the
Part 3 component scores. This provided an opportunity to assess the component score structure
independent of potential test method differences, since Part 2 requires test takers to respond to
aural cues and Part 3 requires them to respond to written cues. Second, the relative
homogeneity of component scores across test part was explored, that is all component scores
from Part 2, Part 3, and Part 1, the composition score, were analyzed together. This provided an
opportunity to compare the relative impact of "trait" vs. "method" in contributing to the overall test
structure.
The following component scores were analyzed:
Part 2 Listening (50 items)
Question (8 items)
Statement (7 items)
Emphasis (10 items)
Lecture (13 items)
Conversation (12 items)[18]
Part 3 Grammar, Cloze, Vocabulary, Reading (100 items)
Grammar (30 items)
Cloze (20 items)
Vocabulary Synonym (15 items)
Vocabulary Completion (15 items)
Reading comprehension (20 items)
The first set of analyses was of two test forms for Part 2 (BB, CC) and three test forms for Part 3
(AA, BB, CC). A separate factor analysis was conducted for each test form--two for Part 2 and
three for Part 3. The goal of this was to provide some internal replication of the analyses.
Consistency of results across test forms would provide greater confidence in the interpretability of
the factor solution, and provide evidence that the test's construct validity is not idiosyncratic to
test form.
For the second set of analyses, two factor analyses were conducted--one for Part 2 BB collapsed
across Part 3, and the other for Part 2 CC collapsed across Part 3. This provides an internal
replication as with the first set of analyses. This strategy was chosen because Part 2 BB and CC
component scales have a slightly different number of lecture and conversation items and thus
should not be collapsed together; Part 3 forms do not have that inconsistency.[19]
The analyses were conducted using the Maximum Likelihood extraction method.[20]
In the case of
multiple factor solutions, the oblimin rotation method was used. Oblimin was chosen because it is
an oblique (non-orthogonal) rotation method which allows for correlated factors. Since the

17
A database of 610 MELAB scores was used for the factor analysis. This data set is a stratified random sample
(considering native language, test site, and mean final score and standard deviation per language group) of MELABs
taken in 1990-91.
18
Items presented here are for Part 2 (Listening) Form BB. Form CC, also submitted to factor analysis, consists of the
same subscales for question, statement, and emphasis, but the lecture subscale consists of 11 items and the
conversation 14 items.
19
It was not feasible to collapse the analysis file into six groups based on the two parts' test forms because that would
have resulted in sample sizes that were too small.
20
Earlier analyses using the Principal Axis Factor extraction method yielded similar results, and they will not be discussed
further.
MELAB components are assessing language ability as the underlying construct, it was expected
that multiple empirically derived factors would be correlated.
Three basic criteria were used in interpreting the factor solutions. First, the number of factors
chosen was based primarily on the percentage of total variance explained by each factor and on
examination of scree plots of the factors' eigenvalues. The goal was to select a number of
factors that account for a "sizable" percentage of variance and to exclude those that account
for little incremental variance; factors of the former kind are apt to describe common variance
shared by a number of component scores. Second, the loadings on each factor should "make sense"
conceptually, and they should replicate across test forms. Third, the factor solution should do a
good job of modeling the common variance of the component scores. This common variance is
described through the correlation matrix of the component scores. A successful factor model can
reproduce this correlation matrix using the model's derived factor loadings. For interpretation, a
residual correlation may be calculated as the difference between the actual and reproduced
correlations. Small residuals reflect a better fit of the factor solution. The quality of reproduction
provides an indication of the quality of the model.
21
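The first criterion can be made concrete: when component scores are standardized, each contributes one unit of variance, so a factor's percentage of total variance is simply its eigenvalue divided by the number of components analyzed. A minimal Python sketch (the eigenvalues are those reported for Part 3 later in this section; agreement is to within rounding):

```python
# Percentage of total variance explained by a factor: its eigenvalue
# divided by the number of (standardized) component scores analyzed.
def pct_variance(eigenvalue, n_components):
    return round(100 * eigenvalue / n_components, 1)

# Part 3 has five component scores, so an eigenvalue of 3.54 (Form AA)
# corresponds to 70.8% of total variance; 3.30 (Form CC) to 66.0%.
print(pct_variance(3.54, 5))  # 70.8
print(pct_variance(3.30, 5))  # 66.0
```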
Part 2: Listening. Forms BB and CC showed similar solutions. A one factor solution appears to
adequately explain the data. For Form BB, the intercorrelation matrix, component score means,
and standard deviations are presented in Table 3.18. A single factor explains 52.1% of the total
variance (eigenvalue = 2.61). Each of the component scores shows a similar loading on this factor
(loadings range from .68 to .75), as shown in Table 3.19. The single factor solution is
successful at reproducing the original correlation matrix (see Table 3.20). Only two residual
correlations are greater than .05.
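Under a one factor model, the reproduced correlation between two components is simply the product of their loadings, and the residual is the observed correlation minus this product. A minimal Python sketch using the Form BB figures from Tables 3.18 and 3.19 (agreement with Table 3.20 is to within rounding):

```python
# Under a single factor model, the reproduced correlation between two
# component scores is the product of their factor loadings.
loadings = {  # Table 3.19, Part 2 (Listening) Form BB
    "Question": 0.73, "Statement": 0.73, "Emphasis": 0.68,
    "Lecture": 0.75, "Conversation": 0.71,
}

def reproduced(a, b):
    return loadings[a] * loadings[b]

# The observed Question-Statement correlation in Table 3.18 is .55;
# the residual is the observed minus the reproduced correlation.
r_obs = 0.55
print(round(reproduced("Question", "Statement"), 2))          # .53, as in Table 3.20
print(round(r_obs - reproduced("Question", "Statement"), 2))  # residual .02
```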
Table 3.18 Part 2 (Listening) Form BB (N=299)
Component Means, Standard Deviations, and Correlation Matrix
N Item Mean SD Q S E L C
Question 8 5.43 2.03 1.00
Statement 7 4.08 1.67 .55 1.00
Emphasis 10 5.60 2.97 .51 .50 1.00
Lecture 13 7.86 2.89 .49 .54 .56 1.00
Conversation 12 6.82 2.45 .56 .51 .42 .57 1.00
Table 3.19 Part 2 (Listening) Form BB
Factor Loadings Single Factor Solution
Component Factor 1
Lecture .75
Question .73
Statement .73
Conversation .71
Emphasis .68
Table 3.20 Part 2 (Listening) Form BB
Reproduced Correlation Matrix Single Factor Solution
Question Statement Emphasis Lecture Conversation
Question .54* .02 .01 -.05 .03
Statement .53 .53* -.00 -.00 -.01
Emphasis .50 .50 .47* .05 -.06
Lecture .55 .55 .51 .56* .02
Conversation .52 .52 .49 .53 .50*

21
For a good reference on factor analysis, see Dillon, W. R. and Goldstein, M. (1984). Multivariate analysis: Methods
and applications. New York: John Wiley and Sons.
For Form CC, the intercorrelation matrix and component score means and standard deviations
are presented in Table 3.21. A single factor explains 52% of the total variance (eigenvalue =
2.64). Each of the component scores shows similar loadings on this factor (loadings range from
.61 to .76). This is presented in Table 3.22. The single factor solution is successful at
reproducing the original correlation matrix (see Table 3.23). Only one residual correlation is
greater than .05.
Table 3.21 Part 2 (Listening) Form CC (N=311)
Component Means, Standard Deviations, and Correlation Matrix
N Item Mean SD Q S E L C
Question 8 4.40 2.17 1.00
Statement 7 4.38 1.86 .62 1.00
Emphasis 10 6.39 2.27 .46 .42 1.00
Lecture 11 7.04 2.37 .54 .52 .53 1.00
Conversation 14 9.84 2.97 .55 .59 .43 .57 1.00
Table 3.22 Part 2 (Listening) Form CC
Factor Loadings Single Factor Solution
Component Factor 1
Question .76
Statement .76
Conversation .75
Lecture .74
Emphasis .61
Table 3.23 Part 2 (Listening) Form CC
Reproduced Correlation Matrix Single Factor Solution
Question Statement Emphasis Lecture Conversation
Question .58* .04 -.01 -.02 -.02
Statement .58 .57* -.04 -.04 -.02
Emphasis .47 .46 .37* .08 -.03
Lecture .57 .56 .45 .55* .02
Conversation .57 .57 .46 .55 .56*
A two factor model was examined as a potential improvement to the one factor model. For Form
BB, the second factor consisted of the emphasis and lecture components. These factors
correlated .63 and .77 for the two forms. However, this second factor was relatively unstable
across test forms. For Form BB, factors 1 and 2 accounted for 40.1% and 23.4% respectively of
the total variance (eigenvalues = 2.0 and 1.17). For Form CC, factors 1 and 2 accounted for
54.0% and 5.3% respectively of the total variance (eigenvalues = 2.70 and .26). This instability
across test forms, combined with the small percentage of variance accounted for by factor 2 on
Form CC, suggests caution in interpreting a two factor solution. In addition, the initial extraction
procedure for both analyses showed relatively small eigenvalues for all factors other than the first
(i.e., all eigenvalues were much lower than 1.0). Because of the lack of confidence in the two
factor model, and because the one factor solution explains the test adequately, Part 2 is
interpreted as a single factor construct.
Part 3: GCVR. The three test forms showed similar solutions. A one factor solution appears to
adequately explain the data.
For form AA, the intercorrelation matrix and component score means and standard deviations are
presented in Table 3.24. A single factor explains 70.8% of the total variance (eigenvalue = 3.54).
Component score loadings range from .68 to .88 (see Table 3.25). The single factor solution is
successful at reproducing the original correlation matrix (see Table 3.26). Only one residual
correlation is greater than .05.
Table 3.24 Part 3 (GCVR) Form AA (N=148)
Component Means, Standard Deviations, and Correlation Matrix
N Item Mean SD G C VS VC R
Grammar 30 17.61 5.94 1.00
Cloze 20 12.14 3.77 .73 1.00
Vocab Synonym 15 8.66 3.52 .69 .64 1.00
Vocab Completion 15 9.07 3.33 .75 .69 .76 1.00
Reading 20 11.84 4.64 .74 .72 .65 .69 1.00
Table 3.25 Part 3 (GCVR) Form AA
Factor Loadings Single Factor Solution
Component Factor 1
Grammar .88
Vocab Completion .87
Reading .83
Cloze .82
Vocab Synonym .68
Table 3.26 Part 3 (GCVR) Form AA
Reproduced Correlation Matrix Single Factor Solution
G C VS VC R
Grammar .77* .01 -.02 -.01 .02
Cloze .72 .68* -.03 -.02 .04
Vocab Synonym .71 .67 .66* .06 -.03
Vocab Completion .75 .71 .70 .75* .03
Reading .73 .68 .67 .72 .69*
For Form BB, the intercorrelation matrix and component score means and standard deviations
are presented in Table 3.27. A single factor explains 61.1% of the total variance (eigenvalue =
3.06). Component score loadings range from .72 to .81 (see Table 3.28). The single factor
solution is successful at reproducing the original correlation matrix (see Table 3.29). There are
no residual correlations greater than .05.
Table 3.27 Part 3 (GCVR) Form BB (N=196)
Component Means, Standard Deviations, and Correlation Matrix
N Item Mean SD G C VS VC R
Grammar 30 18.47 5.97 1.00
Cloze 20 11.91 3.78 .61 1.00
Vocab Synonym 15 7.82 3.17 .55 .52 1.00
Vocab Completion 15 8.56 3.20 .66 .61 .63 1.00
Reading 20 11.41 4.36 .63 .67 .59 .62 1.00
Table 3.28 Part 3 (GCVR) Form BB
Factor Loadings Single Factor Solution
Component Factor 1
Vocab Completion .81
Reading .81
Grammar .79
Cloze .77
Vocab Synonym .72
Table 3.29 Part 3 (GCVR) Form BB
Reproduced Correlation Matrix Single Factor Solution
G C VS VC R
Grammar .63* -.00 -.02 .02 -.01
Cloze .62 .60* -.03 -.02 .04
Vocab Synonym .57 .56 .52* .04 .00
Vocab Completion .64 .63 .58 .65* -.03
Reading .64 .63 .58 .65 .65*
For Form CC, the intercorrelation matrix and component score means and standard deviations
are presented in Table 3.30. A single factor explains 66.0% of the total variance (eigenvalue =
3.30). Component score loadings range from .74 to .87 (see Table 3.31). The single factor
solution is successful at reproducing the original correlation matrix (see Table 3.32). Only three
residual correlations are greater than .05.
Table 3.30 Part 3 (GCVR) Form CC (N=165)
Component Means, Standard Deviations, and Correlation Matrix
N Item Mean SD G C VS VC R
Grammar 30 18.04 6.47 1.00
Cloze 20 11.16 4.16 .62 1.00
Vocab Synonym 15 8.14 3.43 .65 .51 1.00
Vocab Completion 15 7.73 3.57 .77 .62 .72 1.00
Reading 20 10.85 5.05 .68 .72 .62 .65 1.00
Table 3.31 Part 3 (GCVR) Form CC
Factor Loadings Single Factor Solution
Component Factor 1
Vocab Completion .87
Grammar .86
Reading .80
Vocab Synonym .78
Cloze .74
Table 3.32 Part 3 (GCVR) Form CC
Reproduced Correlation Matrix Single Factor Solution
G C VS VC R
Grammar .74* -.02 -.01 .02 -.01
Cloze .64 .55* -.07 -.02 .12
Vocab Synonym .67 .58 .60* .05 -.00
Vocab Completion .75 .65 .68 .76* -.05
Reading .69 .60 .62 .70 .65*
In summary, given the quality and strong consistency of the single factor model across test forms,
consideration of a two factor solution was not necessary.
Combined MELAB Parts. The second set of analyses examined the factor structure of the
component scores from Part 2 and Part 3 together with the Part 1 composition score. Separate
analyses were conducted for Part 2 Forms BB and CC. One, two, and three factor models were tested.
The results for both test forms suggest that a two factor solution can adequately explain the MELAB
component scores.
For the analysis using Listening Form BB, the intercorrelation matrix and component score
means and standard deviations are presented in Table 3.33. Two factors explain 61.5% of the
total variance (factors 1 and 2 explain 55.6% and 5.9% of the variance, respectively; eigenvalues
= 6.12 and .65). The factor pattern coefficients are presented in Table 3.34A; the factor structure
coefficients are presented in Table 3.34B.
22
The first factor consists of all the components of
MELAB Part 3--grammar, cloze, vocabulary completion, vocabulary synonym, and reading--plus
MELAB Part 1--the composition score. The second factor consists of all the components of Part
2--question, statement, emphasis, lecture, and conversation. Thus, structurally the MELAB may
be interpreted as having two underlying constructs--written and aural language ability. The
objective items of Part 3 and the composed writing of Part 1 are statistically similar enough to be
grouped under a single written dimension.
Table 3.33 Listening Form BB, GCVR (Forms AA, BB, CC), and Composition (N=299)
Component Means, Standard Deviations, and Correlation Matrix
Mean SD Q S E L C G Cl VS VC R Comp
Question 5.43 2.03 1.00
Statement 4.08 1.67 .55 1.00
Emphasis 5.60 2.97 .51 .50 1.00
Lecture 7.86 2.90 .50 .54 .56 1.00
Conversation 6.81 2.45 .56 .51 .42 .56 1.00
Grammar 18.41 6.59 .51 .60 .51 .60 .47 1.00
Cloze 12.04 4.10 .40 .50 .52 .56 .40 .72 1.00
VocabSyn 8.27 3.43 .39 .48 .43 .53 .37 .67 .61 1.00
VocabCom. 8.61 3.64 .44 .54 .40 .57 .44 .75 .67 .72 1.00
Reading 11.75 4.86 .47 .54 .56 .64 .49 .74 .73 .69 .72 1.00
Composition 76.61 6.86 .50 .47 .44 .54 .44 .67 .61 .57 .63 .62 1.00

22
For technical descriptions of pattern and structure loadings, see Dillon, W. R. and Goldstein, M. (1984). Multivariate
analysis: Methods and applications. New York: John Wiley and Sons.
Table 3.34A Listening Form BB, GCVR (Forms AA, BB, CC), and Composition
Factor Pattern Loadings Two Factor Solution
Component Factor 1 Factor 2
Vocab Completion .89 -.05
Vocab Synonym .87 -.09
Cloze .81 .01
Grammar .80 .12
Reading .78 .12
Composition .58 .20
Question -.08 .82
Conversation -.04 .75
Statement .22 .56
Emphasis .16 .55
Lecture .32 .50
Table 3.34B Listening Form BB, GCVR (Forms AA, BB, CC), and Composition
Factor Structure Loadings Two Factor Solution
Component Factor 1 Factor 2
Grammar .87 .69
Reading .86 .68
Vocab Completion .86 .61
Cloze .81 .60
Vocab Synonym .80 .54
Composition .73 .62
Question .52 .76
Lecture .68 .73
Conversation .51 .72
Statement .62 .72
Emphasis .56 .67
This two factor solution is successful at reproducing the original correlation matrix (see Table
3.35). Only three residual correlations are greater than .05. The two factors are highly correlated
(r = .73), suggesting that although statistically distinct, the written and aural dimensions are both
closely tied to a superordinate language ability construct.
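For an oblique solution, the reproduced correlation between two components is not a simple product of loadings: with pattern matrix P and factor correlation matrix Phi, the reproduced correlation matrix is P Phi P'. A Python sketch using the Question and Statement rows of Table 3.34A (agreement with Table 3.35 is to within rounding):

```python
# Oblique two factor model: the reproduced correlation r_ij is
# sum over factors k, l of p_ik * phi_kl * p_jl, where p are pattern
# loadings and phi is the factor correlation matrix.
phi = [[1.0, 0.73], [0.73, 1.0]]  # factor correlation r = .73 (Form BB)

pattern = {  # rows of Table 3.34A: (Factor 1, Factor 2)
    "Question": (-0.08, 0.82),
    "Statement": (0.22, 0.56),
}

def reproduced(a, b):
    pa, pb = pattern[a], pattern[b]
    return sum(pa[k] * phi[k][l] * pb[l]
               for k in range(2) for l in range(2))

print(round(reproduced("Question", "Statement"), 2))  # .54, as in Table 3.35
```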
Table 3.35 Listening Form BB, GCVR (Forms AA, BB, CC), and Composition
Reproduced Correlation Matrix Two Factor Solution
Q S E L C G Cl VS VC R Comp
Question .58* .02 .01 -.05 .01 .01 -.03 .01 .01 -.02 .04
Statement .54 .53* .00 -.01 -.00 .03 -.01 .01 .02 -.03 -.04
Emphasis .50 .50 .46* .05 -.06 -.01 .06 -.00 -.07 .04 -.02
Lecture .54 .55 .51 .58* .04 -.02 .00 .00 -.01 .02 -.00
Conversation .55 .51 .48 .52 .52* -.01 -.02 -.01 .02 .01 -.01
Grammar .50 .57 .52 .62 .49 .76* .01 -.02 .01 -.01 .02
Cloze .43 .51 .46 .55 .42 .71 .66* -.04 -.02 .03 .01
VocabSyn .38 .48 .43 .52 .38 .69 .65 .64* .03 .01 -.01
VocabCom. .43 .52 .47 .57 .42 .74 .69 .69 .73* -.02 .01
Reading .49 .57 .52 .61 .48 .75 .70 .68 .73 .74* -.02
Composition .46 .51 .46 .54 .44 .64 .59 .57 .62 .64 .55*
For Form CC, the intercorrelation matrix and component score means and standard deviations
are presented in Table 3.36. Two factors explain 56.9% of the total variance (factors 1 and 2
explain 48.8% and 8.1% of the variance, respectively; eigenvalues = 5.36 and .89). The pattern
coefficients for the analysis are presented in Table 3.37A; the factor structure coefficients are
presented in Table 3.37B. The first factor consists of all the components of MELAB Part 3--
grammar, cloze, vocabulary completion, vocabulary synonym, and reading--plus MELAB Part 1--
the composition score. The second factor consists of all the components of Part 2--question,
statement, emphasis, lecture, and conversation. Thus, the factor structure of the MELAB appears to
replicate across test forms. The two factor solution successfully reproduces the original
correlation matrix (see Table 3.38). Only three residual correlations are greater than .05. The
two factors are highly correlated (r = .64), again indicating that, although statistically distinct, the
two proficiency dimensions are closely tied to a superordinate construct.
Table 3.36 Listening Form CC, GCVR (Forms AA, BB, CC), and Composition (N=311)
Component Means, Standard Deviations, and Correlation Matrix
Mean SD Q S E L C G Cl VS VC R Comp
Question 4.40 2.17 1.00
Statement 4.38 1.86 .62 1.00
Emphasis 6.39 2.27 .46 .42 1.00
Lecture 7.04 2.37 .54 .52 .53 1.00
Conversation 9.84 2.97 .55 .59 .43 .57 1.00
Grammar 17.92 5.60 .56 .47 .45 .48 .45 1.00
Cloze 11.50 3.60 .43 .38 .42 .45 .40 .59 1.00
VocabSyn 8.41 3.37 .36 .29 .34 .38 .30 .61 .53 1.00
VocabCom. 8.05 3.29 .38 .34 .30 .38 .33 .63 .60 .68 1.00
Reading 11.34 4.30 .48 .40 .46 .54 .47 .59 .67 .55 .55 1.00
Composition 75.93 6.55 .48 .43 .39 .46 .35 .57 .57 .49 .53 .49 1.00
Table 3.37A Listening Form CC, GCVR (Forms AA, BB, CC), and Composition
Factor Pattern Loadings Two Factor Solution
Component Factor 1 Factor 2
Vocab Completion .89 -.11
Vocab Synonym .87 -.12
Cloze .66 .15
Grammar .62 .25
Reading .55 .28
Composition .52 .24
Statement -.06 .79
Conversation -.05 .77
Question .05 .74
Lecture .10 .67
Emphasis .12 .54
Table 3.37B Listening Form CC, GCVR (Forms AA, BB, CC), and Composition
Factor Structure Loadings Two Factor Solution
Component Factor 1 Factor 2
Vocab Completion .83 .46
Vocab Synonym .79 .43
Grammar .78 .65
Cloze .75 .57
Reading .73 .63
Composition .67 .57
Question .52 .77
Statement .44 .75
Conversation .44 .74
Lecture .53 .74
Emphasis .46 .61
Table 3.38 Listening Form CC, GCVR (Forms AA, BB, CC), and Composition
Reproduced Correlation Matrix Two Factor Solution
Q S E L C G Cl VS VC R Comp
Question .60* .04 -.02 -.03 -.02 .04 -.03 .00 -.00 -.02 .03
Statement .58 .56* -.04 -.03 .03 .01 -.02 -.00 .03 -.05 .02
Emphasis .48 .46 .39* .07 -.03 .00 .02 .01 -.05 .03 -.00
Lecture .57 .55 .46 .55* .03 -.03 -.01 .01 -.01 .04 .01
Conversation .57 .56 .46 .55 .55* -.01 -.00 .00 .02 .02 -.06
Grammar .52 .46 .44 .51 .46 .64* -.02 .01 .01 -.02 .01
Cloze .46 .40 .40 .46 .40 .61 .58* -.05 -.01 .10 .04
VocabSyn .36 .29 .33 .37 .30 .59 .58 .63* .03 -.00 -.02
VocabCom. .38 .31 .35 .39 .32 .62 .61 .65 .68* -.03 -.00
Reading .51 .45 .43 .50 .45 .61 .57 .55 .58 .58* -.04
Composition .46 .41 .39 .45 .41 .56 .53 .51 .54 .53 .49*
A three factor solution did not appear to improve the model's ability to explain the test data. For
Form BB, the three factors accounted for 55.7%, 6.2%, and 2.2% of the total variance. This third
factor is trivial compared to the other two and consists of the emphasis component score. With
the three factor solution, only one residual correlation is greater than .05. Likewise, for Form CC,
the three factors accounted for 49.0%, 8.4%, and 2.5% of the total variance. This third factor
consists of the reading component score. Three residual correlations are greater than .05.
Because of the trivial size of the third factor and its marginal improvement in reproducing the
correlation matrix, a three factor solution is rejected.
A single factor model was also not adequate to explain the intercorrelations of component scores.
For Form BB, a one factor model resulted in 18 residual correlations greater than .05. For Form
CC, a one factor model resulted in 32 residual correlations greater than .05.
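The count of residual correlations greater than .05, used throughout this section as a rough index of model fit, can be sketched in Python. The matrices here are toy values, not figures from the tables above:

```python
# Count the off-diagonal residual correlations (observed minus
# reproduced) whose absolute value exceeds a cutoff, here .05.
def count_large_residuals(observed, reproduced, cutoff=0.05):
    n = len(observed)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only
        if abs(observed[i][j] - reproduced[i][j]) > cutoff
    )

# Toy 3x3 matrices for illustration.
obs = [[1.00, 0.55, 0.51],
       [0.55, 1.00, 0.50],
       [0.51, 0.50, 1.00]]
rep = [[1.00, 0.53, 0.50],
       [0.53, 1.00, 0.44],
       [0.50, 0.44, 1.00]]
print(count_large_residuals(obs, rep))  # 1 (the .50 vs .44 pair)
```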
In summary, the entire MELAB is interpreted as a two factor test. The first factor consists of the
Part 3 components and the Part 1 composition score, and the second factor consists of the Part 2
listening components. This interpretation is consistent across forms. The factors are strongly
correlated (r = .73 and .64 for Forms BB and CC). The two factor model appears best at explaining
the MELAB component scores. A one factor model is inadequate, and a three factor model is
unnecessary.
3.2.2.3 Native Speaker Performance on the MELAB
Because construct-related validity focuses on the extent to which a test measures a particular
trait or construct, it may be useful to examine the MELAB scores of individuals who consider
English their native language. Table 3.39 compares descriptive score information for 73 such
MELAB examinees tested between 1987 and 1990 with the scores of all examinees who took the
MELAB between 1987 and 1990.
Table 3.39 MELAB Scores for Those Claiming English as Their Native Language
and MELAB Total Group Scores
23
English Total Group
MELAB Range Mean SD Range Mean SD
Part 1 67 -- 97 83.89 7.36 53 -- 97 73.89 7.88
Part 2 60 -- 98 85.90 7.85 30 - 100 75.37 11.44
Part 3 71 -- 100 90.74 7.33 21 - 100 72.63 15.34
Final 70 -- 97 86.85 6.55 30 - 98 73.96 10.37
The average final score is higher for those claiming English as their native language. Over half,
54%, have final scores above 90, and 82% have final scores of 80 or above. This pattern
suggests that the MELAB is a test of English language proficiency. The lower scores may be
attributable to examinees' unfamiliarity with spoken American English, undeveloped literacy skills,
inattentiveness, or a method effect.
3.2.3 Criterion-Related Evidence
One way to address the question of whether a test is measuring what it intends to measure is to
investigate how well examinees' performance on that test corresponds to their performance on
another measure assumed to give a trustworthy assessment of the abilities of interest. This
second measure thus serves as a criterion measure to use in drawing inferences about the
validity of the first test. Following are reports of studies conducted in order to collect criterion-
related information on the validity of the MELAB. The criterion measures used are: (1) tests of
"productive" language skills (a written composition and a speaking test); (2) another English
proficiency battery, the TOEFL; and (3) teacher assessments of their students' English
proficiency.
3.2.3.1 MELAB and Tests of "Productive" Language Skills
One criterion might be performance on "productive" measures of language use. "Productive"
measures can be defined as those measures that require examinees to produce a language
sample. Productive measures may reveal directly how well someone can use English. In order
to ascertain whether level of productive language skills is related to MELAB score, an analysis of
the relationship of examinees' performance on tests of spoken and written English to MELAB
scores was conducted. The scores of 2,781 MELAB examinees of various linguistic
backgrounds who took the MELAB and the MELAB Speaking Test between January, 1987 and
December, 1993 were analyzed. These examinees all had scores representing their productive
skills in writing English and in speaking English--a writing assessment score from MELAB Part 1
and a spoken English assessment score from the MELAB Speaking Test. Each of these scores
is determined by raters who rate independently of each other; that is, the MELAB Part 1 score is
arrived at by two raters who do not know what the examinee's spoken English is like; and the
MELAB administrator who conducts the oral interview and rates the examinee's spoken English
does not have knowledge of the examinee's performance on MELAB Part 1.

23
Data about native language are based on information provided by the examinee; the veracity of the information
could not be verified.
The examinees were classified into seven groups on the basis of their scores on these two tests
of productive language skills, MELAB Part 1 and the MELAB Speaking Test.
Group 7: writing 93-97, speaking 4 or 4+
Group 6: writing 83-87, speaking 4 or 4+ or writing 93-97, speaking 3 or 3+
Group 5: writing 83-87, speaking 3 or 3+
Group 4: writing 73-77, speaking 3 or 3+ or writing 83-87, speaking 2 or 2+
Group 3: writing 73-77, speaking 2 or 2+
Group 2: writing 57-67, speaking 2 or 2+ or writing 73-77, speaking 1 or 1+
Group 1: writing 57-67, speaking 1 or 1+
The MELAB Part 1 scores and MELAB Speaking Test scores are referenced to descriptions of
productive language ability. (See Section 1.4.1 for full descriptions of the MELAB composition
score levels and Appendix D for full descriptions of the MELAB spoken English ratings.) A brief
summary of what the scores mean in terms of writing and speaking proficiency is shown in Table
3.40.
Table 3.40 Brief Proficiency Descriptions for MELAB Writing and Speaking Ratings
Part 1 Composition
Score
Writing
Proficiency
Speaking
Score
Speaking
Proficiency
93 - 97 very good 4 - 4+ very good/good
83 - 87 good 3 - 3+ capable
73 - 77 basic 2 - 2+ marginal/modest
57 - 67 limited 1 - 1+ limited
The final MELAB scores obtained by examinees in the seven groups were examined and
analyzed. A one-way ANOVA revealed significant differences in the final MELAB score means
of the seven groups (F (6,1698) = 514.54, p < .001), and a Scheffé test, a post-hoc comparison test,
showed that each group mean differed significantly from all other group means (p < .001).
24
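For readers unfamiliar with the procedure, a one-way ANOVA compares between-group variability to within-group variability. A minimal Python sketch of the F statistic, using small illustrative groups rather than the study data:

```python
# One-way ANOVA F statistic for k independent groups: the ratio of the
# between-group mean square to the within-group mean square.
def f_oneway(*groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

# Three invented score groups, purely for illustration.
F, df1, df2 = f_oneway([59, 60, 62], [70, 72, 75], [88, 90, 93])
print(round(F, 1), df1, df2)
```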
The shaded areas in Figure 3.3 show the range of Final MELAB scores obtained by the middle 50
percent of examinees in each of the seven language proficiency groups. For example, the middle
50 percent of examinees with "very good" writing and speaking skills (Group 7) had Final MELAB
scores between 92 and 96; the middle 50 percent of those with "basic" writing and
"marginal/modest" speaking skills (Group 3) had Final MELAB scores between 70 and 79. The
Final MELAB scores at the 25th, median (50th), and 75th percentiles of each of the seven groups
are shown at the bottom of Figure 3.3.
A clear pattern is evident in Figure 3.3. Examinees demonstrating better productive skills in
writing and speaking tended to have higher MELAB scores than those who scored lower on the
productive language tests. There is some overlap in the score ranges representing the seven
proficiency levels, but an examination of the four groups (darkest shading) who were rated at the
same skill level in both writing and speaking reveals that the Final MELAB scores of these groups
do not overlap.

24
Because the Final MELAB score is not independent of one of the productive measures, the writing score from Part 1, a
MELAB score variable was created that was the sum of MELAB Part 2 and MELAB Part 3 scores. A comparison of the
groups was conducted using that variable instead of MELAB final score. The results were essentially the same; the
groups differed significantly from each other. However, it was decided to present the data in terms of Final MELAB score
because it is that score that test users are familiar with, not the artificially-created MELAB Part 2 plus MELAB Part 3
variable.
Figure 3.3 Final MELAB Scores for Seven Levels of Written and Spoken English
[Figure: shaded bars showing the range of Final MELAB scores obtained by the middle 50 percent of examinees in each group.]
                Group 1  Group 2  Group 3  Group 4  Group 5  Group 6  Group 7
75th percentile    65       73       79       82       89       91       96
Median             59       67       76       79       86       89       94
25th percentile    55       59       70       73       82       86       92
3.2.3.2 MELAB and Another Proficiency Battery, the TOEFL
A MELAB criterion-related validity study conducted in 1986 used the TOEFL, the Test of English
as a Foreign Language, as the criterion measure. The subjects in this study were 72 incoming
University of Michigan students admitted between Fall, 1983 and Fall, 1985. Although they all
had taken TOEFL, they were required to take the MELAB upon arrival on campus because
further information was needed about their English proficiency. Table 3.41 shows descriptive
statistics on the MELAB and TOEFL scores of this group of students.
Table 3.41 MELAB/TOEFL Descriptive Statistics
Test Sample Size Mean SD Range
MELAB 72 80.68 5.30 71 - 97
TOEFL 72 566.17 35.17 470 - 650
On average, these students had taken the TOEFL 11 months before they took the MELAB,
the majority between 5 and 18 months earlier. It is possible, therefore, that the MELAB scores
are slightly higher than they would have been if the two tests had been administered at the same
time. Not surprisingly, since students with low TOEFL scores generally are not admitted to the
University of Michigan, the mean TOEFL score for this group is higher than the mean of all who
take the TOEFL, and the range of TOEFL scores is restricted. A similar restriction in range of
MELAB scores is evident. The restricted range of the TOEFL and MELAB scores likely
depresses the correlation between them.
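The effect of restriction in range can be illustrated with a small Python sketch: the same linear relationship yields a lower Pearson correlation when only the upper end of one variable is retained, as happens when low-scoring applicants are not admitted. The data here are invented for illustration:

```python
# Pearson correlation for a list of (x, y) pairs.
def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = (sum((x - mx) ** 2 for x, _ in pairs)
           * sum((y - my) ** 2 for _, y in pairs)) ** 0.5
    return num / den

# Invented paired scores (not the study data).
data = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]
full = pearson(data)
restricted = pearson([(x, y) for x, y in data if x >= 5])  # "admitted" only
print(round(full, 2), round(restricted, 2))  # restricted r is noticeably lower
```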
The correlation between examinees' Final MELAB scores and TOEFL total scores (n = 72) was
.704 in spite of the restricted range of scores. A correlation of this magnitude suggests that there
is considerable overlap in what these tests are measuring.
It is important to remember that while the MELAB and the TOEFL are similar in purpose, the two
test batteries have obvious differences in form and content. Although both have three subtests,
the contents of the subtests are not analogous (for example, grammar, vocabulary, and reading
are tested in one subtest of the MELAB, while in TOEFL, vocabulary and reading are tested in
one subtest and grammar in another). Perhaps the most notable difference is that the MELAB
contains a productive writing component (Part 1), whereas TOEFL, unless TWE is given with
TOEFL, assesses writing in a multiple-choice format (Section 2: Structure & Written Expression).
The two subtests that do focus on the same skill, listening (MELAB Part 2 and TOEFL Part 1),
correlate at .640.
3.2.3.3 MELAB and Teacher Assessments
In August, 1991, a MELAB validity study was conducted using teacher assessments of students
as criterion measures. The MELAB was administered to two groups of students enrolled in ESL
classes. The students' scores were then compared with teacher assessments of the students'
English language proficiency in order to examine to what degree the test and the teachers
agreed.
One group of subjects, Group A, was a class of 28 "advanced" adult students (from 9 different
language backgrounds) at a private language school. They had been placed in the class, called
"TOEFL Preparation," either by getting high scores on the placement test used at the language
school
25
or by having been at the language school long enough to have completed the school's
lower-level courses.
The second group of subjects, Group B, was made up of 20 students (from 7 different language
backgrounds) already admitted to full-time graduate or undergraduate programs in the United
States. The students for whom TOEFL scores were available (n = 17) had a mean TOEFL
overall score of 575. They were taking a 7-week intensive course in English for academic
purposes that began two months prior to the start of their regular university classes.
Table 3.42 presents descriptive information about the MELAB scores of both groups of students
and of the 1991-1993 MELAB population in general.
Table 3.42 Descriptive Statistics for Teacher Assessment Validity Study
Group A Group B General Population
1
Number 28 20 4,811
MELAB Mean 75.32 83.55 75.84
MELAB SD 6.73 4.71 10.40
1
Based on "first-time" MELABs taken between 1991 and 1993.
The figures in Table 3.42 show that Group A is more typical of the entire population of people
who take the MELAB than is Group B. The mean MELAB score of Group A is very similar to the
general population mean; however, the standard deviation is smaller. The mean MELAB score of
students in Group B is significantly higher and the standard deviation significantly smaller than
the corresponding figures for the population of all MELAB examinees. Since Group B is a group
of students already admitted to highly-competitive English-medium universities, it is not surprising
that their mean MELAB score is above 80 and that there is relatively little variation in their scores.
However, this restriction in the range of scores of Group B (and to a lesser extent, of Group A)
does depress correlation coefficients based on those scores.
During the last week of their classes, a few days prior to the MELAB administration, the teachers
of both groups completed questionnaires that asked them to evaluate their students' "proficiency"
in English (as distinct from their participation, effort, or improvement in class).
26
Students in
Group A were evaluated by only one teacher. Students in Group B had different teachers for
writing, reading/vocabulary, listening, and speaking/pronunciation. For students in Group B, the
teacher ranking and evaluation figures used in this study are the mean values of scores assigned
by all their teachers.

25
English Placement Test (EPT). (1978). Ann Arbor, MI: English Language Institute, The University of Michigan.
26
As a check on the reliability of the questionnaire used in this study, one of the teachers was asked to fill out the
questionnaire twice over a one-week interval. The correlation (Pearson r) between the way she ranked her students each
of the two times was .92 (p < .001). On the questionnaire item asking whether or not students were ready for full-time
university work in English, the teacher marked seventeen of the twenty students "ready" on the first questionnaire and
twenty of twenty "ready" on the second questionnaire.
Teachers were asked to rank their students on the basis of their English proficiency (for Group B,
on the basis of their proficiency in the skill area corresponding to the course each teacher taught).
The teacher-assigned ranks (in the case of Group B, the ranked mean ranks) were correlated
with ranked MELAB scores. As Table 3.43 shows, a moderately strong relationship was found for
each group of students, suggesting similarities in the ways that the MELAB and teachers rank
students.
Table 3.43 Relationship Between MELAB Scores and Teacher Ranking of Students

Group    N    Spearman Rank Correlation Coefficient    Significance (1-tailed)
A        28   .54                                      .01
B        20   .67                                      .001
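The Spearman coefficient reported here is simply a Pearson correlation computed on ranks, with tied observations assigned the mean of the ranks they occupy. The following is a minimal stdlib-only Python sketch for readers who wish to replicate the computation; it is an illustration, not the analysis code used in the study:

```python
def rank(values):
    """Assign ranks (1 = lowest), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson r computed on the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Applied to two score lists that rank students in exactly the same order, spearman returns 1.0; a perfectly reversed ordering yields -1.0.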
The teachers of both groups were also asked to indicate whether they thought each of their
students was ready to be a full-time student in an English-medium university or not. For Group A,
the mean MELAB score of students the teacher judged as ready for full-time work (77.59, n = 17)
was significantly higher (t = -2.40, df = 26, one-tailed p < .02) than the mean MELAB score of
students she judged as not ready (71.82, n = 11).
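A t value with df = 26 for groups of 17 and 11 is consistent with the pooled-variance independent-samples t test (df = n1 + n2 - 2). The statistic can be sketched as follows; the numbers in any example run would be illustrative, not the actual student scores:

```python
def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic with pooled variance.
    Returns (t, df), where df = n1 + n2 - 2."""
    n1, n2 = len(sample_a), len(sample_b)
    m1 = sum(sample_a) / n1
    m2 = sum(sample_b) / n2
    ss1 = sum((x - m1) ** 2 for x in sample_a)  # sums of squared deviations
    ss2 = sum((x - m2) ** 2 for x in sample_b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5  # SE of the mean difference
    return (m1 - m2) / se, n1 + n2 - 2
```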
The teachers of students in Group B knew that their students had already been accepted for full-
time university work, so perhaps it is not surprising that only one student was judged by more
than one teacher as unready for full-time work. This student's MELAB score was 68, the lowest
in the class. All other students were judged by at least three of their four teachers to be ready for
full-time university study. These students all scored at least 80 on the MELAB.
It is important to stress that due to the nature and size of the sample populations used in this
study, the results of the various parts of the study must be interpreted cautiously. The results,
however, do suggest that there is a moderate degree of agreement between the way ESL
teachers and the MELAB judge the English language proficiency of non-native speakers.
APPENDICES
APPENDIX A: MELAB CENTERS BY COUNTRY
The MELAB is given throughout the U.S. and Canada, and in over 100 other countries. In the U.S. and
Canada, most states/provinces have several examiners. Examination Centers are subject to change without
notice.
ALBANIA
ARGENTINA
ARMENIA
AUSTRALIA
AUSTRIA
BAHAMAS
BAHRAIN
BANGLADESH
BELARUS
BELGIUM
BOLIVIA
BRAZIL
BRUNEI DARUSSALAM
BULGARIA
CAMEROON
CANADA
CENTRAL AFRICAN REPUBLIC
CHILE
COLOMBIA
COSTA RICA
CYPRUS
CZECH REPUBLIC
DENMARK
DOMINICAN REPUBLIC
ECUADOR
EGYPT
EL SALVADOR
ENGLAND
ETHIOPIA
FEDERATED STATES OF MICRONESIA
FIJI
FINLAND
FRANCE
GEORGIA
GERMANY
GHANA
GREECE
GUATEMALA
GUYANA
HAITI
HONDURAS
HONG KONG
HUNGARY
ICELAND
INDIA
INDONESIA
IRAN
ISRAEL
ITALY
JAPAN
JORDAN
KENYA
KOREA
KUWAIT
LATVIA
LEBANON
LESOTHO
LIBYA
LITHUANIA
MADAGASCAR
MALAWI
MALAYSIA
MALI
MAURITIUS
MEXICO
MONGOLIA
MOROCCO
MOZAMBIQUE
MYANMAR
NEPAL
NETHERLANDS
NETHERLANDS ANTILLES
NEW CALEDONIA
NEW ZEALAND
NICARAGUA
NIGERIA
NORWAY
OMAN
PAKISTAN
PANAMA
PARAGUAY
P.R. CHINA (by special arrangement)
PERU
PHILIPPINES
POLAND
PORTUGAL
PUERTO RICO
ROMANIA
RUSSIA
SAUDI ARABIA
SENEGAL
SIERRA LEONE
SINGAPORE
SOUTH AFRICA
SPAIN
SRI LANKA
SUDAN
SURINAME
SWEDEN
SWITZERLAND
SYRIA
TAIWAN R.O.C.
TANZANIA
THAILAND
TONGA
TRINIDAD
TURKEY
UGANDA
UKRAINE
UNITED ARAB EMIRATES
UNITED STATES
URUGUAY
VENEZUELA
WALES
ZAIRE
ZAMBIA
ZIMBABWE
APPENDIX B: HISTORICAL BACKGROUND LEADING TO THE MELAB
The English Language Institute was founded in 1941 to research and administer English
language training programs. Research at the ELI has also included the development of language
tests. Prior to World War II, language teaching in the U.S. involved mainly instruction by the
"grammar-translation" method. English language tests, strongly influenced by a British tradition,
were tests of dictation, reading aloud, written essays, translations and commentaries on specific
English literature reading assignments.
In the 1940's and 1950's, Professor Robert Lado, then Director of Testing for the English
Language Institute of the University of Michigan, wrote a series of tests that would "objectify"
English language testing. Lado hoped that by using multiple choice objectively scored tests, the
reliability problems inherent in subjective scoring would be eliminated and that the scoring
process would also be faster, easier, and not require highly trained scorers.
The content of the tests reflected innovations in language teaching, namely an approach that
emphasized listening and speaking. The aural/oral approach developed at a time when linguists
were designing courses intended to promote quick acquisition of practical speaking ability. Basic
principles of course development rested on the following theoretical assumptions:
language is speech (not writing);
language is a set of acquired habits;
language teachers should teach the language, not about the language;
the language is what native speakers say, not what they think they say or think they ought to say; and
all languages are different.
A fundamental feature of the new approach was that a scientific descriptive analysis of the
language was the basis from which to develop teaching materials and language tests.
The Lado Test of Aural Comprehension (Lado TAC) was written in 1946. It was a multiple-choice
test with aurally presented prompts and answer choices which were either pictorially or
graphically presented. Many of the items were based on phonemic variation; for example, in one
item the aural stimulus is "She is thinking." The answer choices depict a) a woman singing; b) a
woman sinking in water; c) a woman thinking (cartoon-like elements make this clear). The Lado
TAC was used in the ELI-UM's Intensive English program.
Many of the students who had progressed through all the levels in the Intensive English program
wished to continue academic studies at the University of Michigan. Additional reading/writing
components were needed to test these students for readiness for academic work. In 1951, Lado
published the English Language Test (Lado ELT). This was another "objective" (i.e., multiple
choice) test, and it contained multiple-choice problems of structure, pronunciation, and
vocabulary.
Beginning in 1956, an English language test battery, which then included a written composition,
the Lado TAC, and the Lado ELT, was administered to entering foreign students at the University
of Michigan. The tests in the battery had been developed for advanced level students in the ELI
intensive language study program and were thought appropriate to evaluate the language ability
of non-native speakers entering various academic programs. Some entering students had had
some kind of English test in their home country as part of the admission process, but many were
not tested until they arrived on campus. A 1959 predictive validity study of the test involved
comparing the English test battery scores and first semester GPA of 599 students, both
undergraduate and graduate students in six programs of study at the UM. Basically, the results
indicated a moderate (.51) but statistically significant correlation between scores on the test
battery and GPA. An analysis of the failure rate (i.e. below the minimally acceptable GPA)
revealed the greatest failure rate (42.5%) for students who had lower test scores and did not
follow a recommendation for reduced course load and supplementary English. However, it
should be noted that 58% of those with lower language test scores did, in fact, have passing
GPAs. Thus, it was recognized early on that language proficiency is only one factor in predicting
academic success, and other factors, such as type of course work attempted, prior academic
background, and motivation, are also significant.
In the late 1950's there also was increased national interest in the English proficiency of foreign
students as more students from around the world came to the United States to attend educational
and technical assistance programs. It was argued that a working knowledge of English was
essential for full participation in such programs. In 1961, at the conference on Testing the English
Proficiency of Foreign Students held in Washington, D.C., the need for a large-scale overseas
testing program was affirmed and guidelines were agreed upon concerning the nature of the test
with regard to content and administration. Subsequently, the Test of English as a Foreign
Language (TOEFL) was developed and started in 1963. The TOEFL program is administered
through the Educational Testing Service.
The English Language Institute of the University of Michigan continued, however, to research and
develop language tests and continued to make its Testing Service available to other institutions of
higher learning and to organizations that needed to assess the English proficiency of individuals.
During the late 1950's, the UM was the only U.S. institution with a testing service that conducted
ESL tests overseas and had an ESL test reporting service. However, a testing program of the
American University Language Center (AULC) in Washington D.C. developed tests for use by the
State Department, and in the early 1960's one form of an AULC test was made available to
American universities for overseas testing.
In the 1960's, the ELI Testing Division changed part of the battery to include an integrated skills
test of reading comprehension. The Michigan Test of English Language Proficiency (MTELP)
replaced the Lado ELT in 1961. The Lado ELT was a discrete item test, that is, each item on the
test was independent of the others, and it tested structure, pronunciation, and vocabulary as
separate systems. The new test, the MTELP, continued to include the testing of structure and
vocabulary, but the pronunciation sections were deleted and a reading section was added. The
MTELP included grammar in a conversational format; vocabulary, both in synonym selection and
choice of a word in a given context; and reading selections followed by comprehension questions.
Thus, while the grammar and vocabulary items were still discrete items, the reading questions
were based on longer discourse units (up to approximately 200 words). A 1967 study of the
Michigan Battery that included the MTELP as one of its parts looked at the correlation of Michigan
Battery scores and the academic success of graduate students as indicated by two criteria, GPA
and course grades. A significant correlation between course grades and Michigan Battery scores
was found for graduate students in the humanities and social sciences, with a higher correlation
between Michigan Battery scores and course grades in recitation classes in courses like history
than in lab courses, such as survey sampling. Various studies of the correlation of the MTELP
and the TOEFL, conducted in the early 1960's, showed a high correlation (.88) between the two
tests, and the correlation of TOEFL scores with Michigan Battery scores (conducted in the later
1960's), was moderately high to high (.69 to .88) as well.1
In the late 1960's and early 1970's, the ELI Testing Division produced the Michigan Test of Aural
Comprehension (MTAC/1969) and the Listening Comprehension Test (LCT/1973). These aural
comprehension tests were designed to replace the Lado TAC. They were less phonemically
based and placed more reliance on structural knowledge of the English language.
Through the years, the various components of the parts of the battery were altered, but generally
a Michigan battery has consisted of three parts. Overseas testing included a written composition,
an aural-oral rating by the examiner, and a multiple-choice test of language, such as the MTELP.
Limited international availability of electronic equipment precluded the administration of a
recorded listening test as part of the overseas Michigan battery until the 1970's. In 1985, the "old
Michigan Battery" was replaced with the present Michigan English Language Assessment Battery
(MELAB).

1 Information about the MTELP is available in the MTELP Manual. (1977). Ann Arbor, MI: English Language Institute, The University of Michigan.
APPENDIX C
The image quality of the official score report has been intentionally degraded for online
publication in .PDF format. Official score reports have perfect laser-quality resolution.
APPENDIX D
SPOKEN ENGLISH DESCRIPTORS AND SALIENT FEATURES
REFERENCE SHEET
OVERALL SPOKEN ENGLISH DESCRIPTORS
Rating
4
GOOD/VERY GOOD SPEAKER Fluent speech with confident ease of expression. Speech is clear and intelligible but may
be accented (variant articulation and prosodic features). Fully functional sustained communication but may show occasional minor
variation in vocabulary and morphology. Initiates and develops topics throughout interview. Active, collaborative language use.
Aware of what are shared cultural referents and provides additional contextual information when necessary. Uses target-like
lexical phrasings and fillers. Vocabulary is adequate to handle topics with accuracy and facility. Comprehends interviewer's
unadjusted speech easily and grasps the functional intent of interviewer's discourse.
3
CAPABLE SPEAKER Reasonably fluent speech with only occasional rough spots. Speech may be variant in articulation or
prosodic features but is generally intelligible. Occasional clarification may be needed but is handled successfully. Attentive to
monitoring the interaction and realizes when clarification is needed to promote successful communication. Generally elaborated
responses though may search for exact words when responding. Minor unevenness in grammatical accuracy or some inappropriate
word choice does not inhibit conveying ideas. Takes conversational lead at times and contributes to topic extension and
development. Generally can grasp English spoken at a normal rate of delivery and can follow extended discourse though
clarification may be requested and necessary when cultural referents are not shared or when lexical meaning of terminology is
unfamiliar.
2
MARGINAL/MODEST SPEAKER Speech may appear fluent but is deviant in form and substance of utterances or speech
may be somewhat disfluent (slow/halting/measured/unfilled pauses) with reasonably well-formed utterances. Speech may be variant
in stress, intonation, and articulation so that communication is impeded and interviewer may need to attend closely to understand.
Discourse is limited as the speaker does not generally elaborate sufficiently and is often imprecise at clarifying or elaborating when
asked to do so. May not adapt to topic shifts without repetition or rephrasing of interviewer. Word choice is sometimes off target
and syntax is often awkward. Interviewer may need to rephrase linguistically complex questions or comments. May grasp gist of
questions and follow general direction of interview but may misinterpret actual intent of some questions.
1
LIMITED SPEAKER Speech rate is slow and utterances are filled with frequent hesitations. Unable to sustain
communication in interactions; usually passive and only able to respond to simple questions; unable to grasp complex questions or
topic shifts or unexpected questions; unable to handle extended discourse of interviewer; minimal initiative taken in interaction.
Utterances are generally short and morphological and grammatical variation are frequent. Interviewer must adjust rate and linguistic
complexity and restrict topic selection to communicate.
SALIENT FEATURES
FLUENCY/INTELLIGIBILITY (Isolated vs. Segments)
RATE OF SPEECH (too fast/too slow)
PAUSING/HESITATION (too long, unfilled pauses)
ARTICULATION (consonants, vowels, word endings, article deletions,
lack of voice projection, mumbling)
PROSODICS (rhythm, intonation)
GRAMMAR/VOCABULARY
UTTERANCE LENGTH
UTTERANCE COMPLEXITY (adequacy for suppositions, conditional use,
hypothetical use)
LEXICAL RANGE (target-like phrasings, idiomatic word choice,
specific terminology, rich vs. sparse)
MORPHOLOGICAL CONTROL (tense markers, adverbials, word forms)
GRAMMATICAL ACCURACY (occasional deviations vs. distracting
control vs. control causing miscommunications)
FUNCTIONAL LANGUAGE USE/SOCIOLINGUISTIC
PROFICIENCY
INITIATIVE (vs. passive contributor)
ELABORATION (extended responses, appropriate length,
sufficient information, but not too redundant)
SUSTAINED TOPIC DEVELOPMENT (use of transitional links,
prominence given to key points, logical development)
INTERACTIONAL FACILITY (monitors interactions, seeks
clarification when appropriate, takes turn at appropriate time,
properly engaged, appropriate eye contact/posture)
SENSITIVITY TO CULTURAL REFERENTS (establishes common frame
of reference, initiates clarification, rephrasing, concrete
relevant examples)
LISTENING COMPREHENSION
INTERVIEWER MUST:
adjust rate of delivery
adjust complexity of utterances (lexicon, syntax, length)
restrict topic exploration
frequently rephrase and/or repeat
APPENDIX E: DESCRIPTIVE STATISTICS (1987-1990)
Table 1E Score Descriptives for 13,588 First-Time MELABs Administered 1987-1990

                         MELAB Part 1    MELAB Part 2   MELAB Part 3   MELAB
                         (Composition)   (Listening)    (GCVR)         Final
Minimum Equated Score    53              30             21             35
Maximum Equated Score    97              100            100            98
Mean Equated Score       73.89           75.37          72.63          73.96
Standard Deviation       7.88            11.54          15.34          10.37
Reliability1             .90             .87            .94            .90
SEM2                     2.49            4.16           3.76           3.28

1 Reliability figures are derived from the mean interrater correlation for Part 1 (see Section 3.1; note that the set of compositions used to calculate this coefficient is not identical to the set summarized in Table 1E) and from KR 21 applied to raw scores for Parts 2 and 3. The reliability estimate for the MELAB Final Score is the mean of these estimates for Part 1, Part 2, and Part 3.
2 Standard Error of Measurement.
Frequency Distribution of MELAB scores (Part 1, Part 2, Part 3, and Final): 1987-1990
Tables 2E through 5E give information about score distribution on the MELAB. First, they show
the number and the percent of examinees who obtained a particular score on MELAB (Part 1,
Part 2, Part 3, or on the Final MELAB). In the far right column of each table is the cumulative
percent that corresponds to each score. The cumulative percent for a given score point is the
percentage of examinees with that score or a lower score. Therefore, if an examinee's score
corresponds to a cumulative percent of 70, that examinee scored as well as or better than 70% of
all the examinees. These tables are based on MELABs (all forms) administered between 1987
and 1990; for similar information on MELABs administered between 1991 and 1993, see Section
2.2.
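The cumulative percents in these tables can be generated mechanically from the score counts. The following Python sketch illustrates the computation; the dictionary in the example is a toy input, not MELAB data:

```python
def cumulative_percents(score_counts):
    """score_counts maps score -> number of examinees with that score.
    Returns (score, percent, cumulative percent) rows, highest score first,
    where the cumulative percent at a score is the percentage of examinees
    with that score or a lower one."""
    total = sum(score_counts.values())
    running = 0
    rows = []
    for score in sorted(score_counts):  # accumulate from lowest to highest
        running += score_counts[score]
        rows.append((score,
                     round(100 * score_counts[score] / total, 1),
                     round(100 * running / total, 1)))
    return rows[::-1]  # highest score first, as in the tables

# For example:
# cumulative_percents({90: 1, 80: 2, 70: 1})
# returns [(90, 25.0, 100.0), (80, 50.0, 75.0), (70, 25.0, 25.0)]
```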
Table 2E Frequency Distribution of Final MELAB Scores
(based on 13,588 first-time MELABs administered 1987 - 1990)

MELAB Final Score    N    Percent with the Score    Cumulative Percent
98 8 0.1 100.0
97 36 0.3 99.9
96 45 0.3 99.7
95 76 0.6 99.3
94 71 0.5 98.8
93 80 0.6 98.3
92 158 1.2 97.7
91 157 1.2 96.5
90 173 1.3 95.4
89 240 1.8 94.1
88 213 1.6 92.3
87 218 1.6 90.7
86 396 2.9 89.1
85 274 2.0 86.2
84 255 1.9 84.2
83 481 3.5 82.3
82 445 3.3 78.8
81 567 4.2 75.5
80 399 2.9 71.3
79 451 3.3 68.4
78 585 4.3 65.1
77 535 3.9 60.8
76 597 4.4 56.9
75 468 3.4 52.5
74 386 2.8 49.0
73 533 3.9 46.2
72 491 3.6 42.3
71 528 3.9 38.6
70 357 2.6 34.8
69 354 2.6 32.1
68 441 3.2 29.5
67 368 2.7 26.3
66 333 2.5 23.6
65 353 2.6 21.1
64 270 2.0 18.5
63 270 2.0 16.5
62 257 1.9 14.5
61 204 1.5 12.7
60 197 1.4 11.1
59 171 1.3 9.7
58 170 1.3 8.4
57 169 1.2 7.2
56 123 0.9 5.9
55 138 1.0 5.0
54 114 0.8 4.0
53 88 0.6 3.2
52 74 0.5 2.5
51 61 0.4 2.0
50 73 0.5 1.5
49 43 0.3 1.0
48 26 0.2 0.7
47 18 0.1 0.5
46 10 0.1 0.4
45 7 0.1 0.3
44 8 0.1 0.2
43 6 0.0 0.2
42 4 0.0 0.1
41 5 0.0 0.1
40 4 0.0 0.1
39 1 0.0 0.0
38 2 0.0 0.0
37 1 0.0 0.0
35 1 0.0 0.0
30 1 0.0 0.0
Table 3E Frequency Distribution of MELAB Part 1 (Composition) Scores
(based on 13,588 first-time MELABs administered 1987 - 1990)
MELAB Part 1 Score    N    Percent with the Score    Cumulative Percent
97 52 0.4 100
95 61 0.4 99.6
93 282 2.1 99.2
90 202 1.5 97.1
87 450 3.4 95.6
85 398 2.9 92.2
83 979 7.2 89.3
80 541 4.0 82.1
77 1999 14.7 78.1
75 1277 9.4 63.4
73 2426 17.9 54.0
70 1159 8.5 36.1
67 1866 13.7 27.6
65 716 5.3 13.9
63 676 5.0 8.6
60 188 1.4 3.6
57 172 1.3 2.3
55 65 0.5 1.0
53 70 0.5 0.5
Table 4E Frequency Distribution of MELAB Part 2 (Listening) Scaled Scores
(based on 13,585 first-time MELABs administered 1987 - 1990)
MELAB Part 2 Score    N    Percent with the Score    Cumulative Percent
100 28 0.2 100.0
98 70 0.5 99.8
96 129 0.9 99.3
95 61 0.4 98.3
94 109 0.8 97.9
93 206 1.5 97.1
92 326 2.4 95.6
91 149 1.1 93.2
90 287 2.1 92.1
89 103 0.8 90.0
88 139 1.0 89.2
87 459 3.4 88.2
86 345 2.5 84.8
85 605 4.5 82.3
84 373 2.7 77.8
83 385 2.8 75.1
82 716 5.3 72.2
81 498 3.7 66.9
80 475 3.5 63.3
79 472 3.5 59.8
78 474 3.5 56.3
77 758 5.6 52.8
76 555 4.1 47.2
75 486 3.6 43.2
74 260 1.9 39.6
73 513 3.8 37.7
72 263 1.9 33.9
71 291 2.1 32.0
70 217 1.6 29.8
69 244 1.8 28.2
68 240 1.8 26.4
67 491 3.6 24.7
66 216 1.6 21.0
65 212 1.6 19.4
64 216 1.6 17.9
63 222 1.6 16.3
62 199 1.5 14.7
61 216 1.6 13.2
60 162 1.2 11.6
59 171 1.3 10.4
58 165 1.2 9.2
57 147 1.1 7.9
56 121 0.9 6.9
55 119 0.9 6.0
54 100 0.7 5.1
52 188 1.4 4.4
50 76 0.6 3.0
49 81 0.6 2.4
47 49 0.4 1.8
46 53 0.4 1.5
43 27 0.2 1.1
42 30 0.2 0.9
40 36 0.3 0.6
35 47 0.3 0.4
30 5 0.0 0.0
Table 5E Frequency Distribution of MELAB Part 3 (GCVR) Scaled Scores
(based on 13,588 first-time MELABs administered 1987 - 1990)

MELAB Part 3 Score    N    Percent with the Score    Cumulative Percent
100 37 0.3 100.0
99 87 0.6 99.7
98 116 0.9 99.1
97 157 1.2 98.2
96 181 1.3 97.1
95 165 1.2 95.7
94 147 1.1 94.5
93 162 1.2 93.5
92 195 1.4 92.3
91 307 2.3 90.8
90 165 1.2 88.6
89 261 1.9 87.3
88 358 2.6 85.4
87 441 3.2 82.8
86 334 2.5 79.5
85 425 3.1 77.1
84 444 3.3 74.0
83 223 1.6 70.7
82 292 2.1 69.1
81 302 2.2 66.9
80 368 2.7 64.7
79 330 2.4 62.0
78 387 2.8 59.5
77 352 2.6 56.7
76 335 2.5 54.1
75 274 2.0 51.6
74 243 1.8 49.6
73 209 1.5 47.8
72 298 2.2 46.3
71 287 2.1 44.1
70 315 2.3 42.0
69 314 2.3 39.7
68 242 1.8 37.4
67 429 3.2 35.6
66 407 3.0 32.4
65 399 2.9 29.4
64 172 1.3 26.5
63 223 1.6 25.2
62 173 1.3 23.6
61 180 1.3 22.3
60 81 0.6 21.0
59 219 1.6 20.4
58 42 0.3 18.8
57 217 1.6 18.5
56 173 1.3 16.9
55 204 1.5 15.6
54 137 1.0 14.1
53 160 1.2 13.1
52 55 0.4 11.9
51 152 1.1 11.5
50 85 0.6 10.4
49 84 0.6 9.8
48 188 1.4 9.1
47 82 0.6 7.8
46 143 1.1 7.2
45 110 0.8 6.1
44 149 1.1 5.3
43 110 0.8 4.2
40 72 0.5 3.4
39 55 0.4 2.9
38 88 0.6 2.5
37 53 0.4 1.8
36 57 0.4 1.4
35 36 0.3 1.0
34 24 0.2 0.7
33 23 0.2 0.6
32 12 0.1 0.4
31 22 0.2 0.3
30 6 0.0 0.1
29 2 0.0 0.1
28 4 0.0 0.1
27 2 0.0 0.1
26 1 0.0 0.0
25 2 0.0 0.0
23 1 0.0 0.0
21 1 0.0 0.0
Table 6E Intercorrelations of MELAB Part Scores and Final Scores (n=13,588) and
MELAB Speaking Test Scores (n=2,326), 1987-19901

                             MELAB Part 1   MELAB Part 2   MELAB Part 3   Final MELAB
MELAB Part 1 (Composition)   --
MELAB Part 2 (Listening)     .60            --
MELAB Part 3 (GCVR)          .74            .69            --
FINAL MELAB                  .84            .87            .94            --
MELAB ORAL RATING            .55            .49            .52            .59

Correlations of each part with the sum of the other two parts: Part 1 with Part 2 + Part 3 = .74; Part 2 with Part 1 + Part 3 = .70; Part 3 with Part 1 + Part 2 = .79.

1 Correlations of MELAB Part 1, Part 2, and Part 3 with Final MELAB are spuriously high because the Final MELAB score is an average of the scores on those three subtests. This is not the case for the correlation between the MELAB Speaking Test and the Final MELAB; the speaking test score is not a part of the final score.
APPENDIX F: Reliability of MELAB Part 2 and Part 3 (by Form): KR21

Table 1F Kuder-Richardson Formula 21 Internal Consistency Reliability Estimates for
MELAB Part 2 (Listening), 1987-19901

Form    N      Mean    SD     KR 21   SEM2
BB      6350   28.19   8.91   .86     3.33
CC      7321   31.07   9.18   .88     3.18

1 Based on the raw scores of "first time" MELABs administered between 1987 and 1990. It should be noted that there is no form AA of MELAB Part 2 and that forms DD and EE were first used in 1991.
2 Standard error of measurement.

Table 2F Kuder-Richardson Formula 21 Internal Consistency Reliability Estimates for
MELAB Part 3 (GCVR), 1987-19901

Form    N      Mean    SD      KR 21   SEM2
AA      4582   58.12   19.43   .94     4.76
BB      4340   55.43   19.17   .94     4.70
CC      2966   52.26   19.38   .94     4.75
DD      1700   54.40   18.65   .94     4.57

1 Based on the raw scores of "first time" MELABs administered between 1987 and 1990.
2 Standard error of measurement.
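KR 21 and the standard error of measurement can both be computed from the number of items, the raw-score mean, and the raw-score standard deviation alone. A sketch in Python follows; the item count k = 50 used in the example is an assumption made here for illustration (it is not stated in these tables), though it reproduces the Form BB figures above to rounding:

```python
def kr21(k, mean, sd):
    """Kuder-Richardson Formula 21: internal consistency reliability
    estimated from the number of items k, the raw-score mean,
    and the raw-score standard deviation."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5

# Form BB of Part 2 (Listening): raw mean 28.19, raw SD 8.91.
# k = 50 is assumed for this illustration.
r = kr21(50, 28.19, 8.91)   # about .86
e = sem(8.91, r)            # about 3.33
```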
