Manu Konchady
Mustru Publishing,
Oakton, Virginia.
Learn English Vocabulary and Writing:
Use Software to Prepare for the SAT or GRE Exams
by Manu Konchady
Mustru Publishing,
3112 Bradford Wood Court,
Oakton, VA 22124.
Preface
1. Introduction
1.1. Computer Assisted Language Learning
1.2. Quizzes
1.2.1. Should you Guess an Answer?
1.3. Software
1.3.1. WordNet
1.3.2. Text Sources
1.3.3. Audio
1.3.4. Emustru
2. Learning Vocabulary
2.1. Why Learn Words?
2.2. Which Words are Important?
2.2.1. How many Words should you Learn?
2.2.2. Do you know a word?
2.2.3. Can you guess the meaning of a word?
2.2.4. Five Ways to Grow your Vocabulary
2.3. How to Learn with Online Quizzes
2.3.1. Visual Thesaurus
2.3.2. Free Rice
2.3.3. Quizlet
2.3.4. Emustru
2.4. Why should you learn Spelling?
2.4.1. Spelling Error Analysis
2.4.2. Emustru Spelling Quiz
2.5. Words, Meanings, and Relationships
2.6. Word Games
2.6.1. Emustru
2.7. Web Sites to Learn Vocabulary and Spelling
4.2.2. Essay Prompt
4.2.3. Essay Length
4.3. How do you write an essay for E-rater?
4.3.1. Grammar
4.3.2. Usage
4.3.3. Mechanics
4.3.4. Style
4.3.5. Organization and Development
4.3.6. Lexical Complexity
4.3.7. Prompt-Specific Vocabulary Usage
4.3.8. E-rater Writing Tips
4.4. Emustru Essay Evaluator
4.5. Web sites to learn Essay Writing
Index
Preface
Most books for exams like the SAT describe sample questions,
methods to answer questions, and a few practice exams. These
types of books are very helpful to learn about an exam, the for-
mat, the schedule, and the level of difficulty. However, practice
exams have little value after the first or second attempt. The
questions and answers are familiar and you can identify the
answer from memory.
This book also emphasizes practice exams; however, the
questions are customized to your skill level. The software included
with this book tracks your performance on previous exams
before creating a new custom quiz. Questions are dynamically
generated when you are ready to take your exam. Because the
quizzes are dynamic, you cannot rely on memory to
answer questions. A question is repeated only if
you missed it or if the software requires you to answer
the same question correctly more than once.
On the downside, automatically generated questions are not
as precise as manually compiled questions. A compiled
question is carefully produced; the description of the question and
the set of answers are chosen based on some pattern and
verified. The software that automatically generates questions
attempts to mimic the same process.
An essay writing section is part of the current SAT and GRE
exams. The Educational Testing Service (ETS), the developers
of the SAT and GRE exams, uses machine and human graders
to evaluate essays. An automated essay evaluator is included
with the accompanying software. You can also learn how E-
rater ®, the essay evaluator from ETS, will score your essay.
Audience
This book and the accompanying software are for anyone
planning on taking a standardized test or simply interested in using
software to learn English. If you plan on using the software,
you will need some basic knowledge of a PC (either on a Win-
dows or Linux platform). The author will provide technical
support to install and run the software.
Organization
The first chapter begins with a description of some of the soft-
ware that you can use to learn a language. Most of the software
explained in the book is open source (with the exception of E-
rater) and can be downloaded from the Web. The second chap-
ter includes a collection of tools to learn spelling, vocabulary,
and word relationships. Methods to improve your vocabulary
and guess the meaning of unknown words are also mentioned.
The third chapter mentions a few tips to build sentences and
explains how an automatic grammar checker works. Three dif-
ferent types of sentence quizzes are described. In the first quiz,
you need to find the missing word(s) from a given set of words.
In the second quiz, an error may or may not be present in a
sentence; you have to spot the error or leave the sentence as-
is. The third quiz substitutes an underlined sentence fragment
with a possible correction; here you have to identify the sen-
tence fragment that is the most appropriate and grammatically
correct.
The fourth chapter explains how automated essay evaluation
works. The accompanying software includes an essay evaluator
that you can use to evaluate your essays. Many tips to write
an essay for the E-rater essay evaluator are mentioned. You
can write and organize your essay such that E-rater will be
more likely to assign a high score. The final chapter includes
some topics (listening, speaking, and comprehension) that are
not covered in detail in this book, but are part of standardized
tests. Finally, the appendices include an installation guide for
the accompanying software, a brief guide to punctuation, and
a collection of links to lists of SAT words, misspelled words,
and words ordered by a frequency index.
Conventions
The following typographical conventions are used in the book.
Support
Visit http://emustru.sf.net to download the code used in
this book. The sample code is written in PHP and Java. Please
report bugs, errors, and questions to mkonchady@yahoo.com.
Bugs in the code will be corrected and posted in a new version
of the sample code. Your feedback is valuable and will be
incorporated into subsequent versions of the book. Please contact
the author if you would like more information on topics
that have not been covered or explained in sufficient detail. I
have attempted to make the contents of the book comprehensible
and correct. Any errors or omissions in the book are mine
alone.
Acknowledgements
First, I would like to thank the developers of the open source
tools, including Lucene (a search engine API), LingPipe (a
collection of linguistic tools), WordNet (a thesaurus /
dictionary), MySQL (a database), FreeTTS (a speech synthesizer),
and several other tools. These open source tools have made it
possible to develop the accompanying open source “Emustru”
software to learn English and practice for standardized tests.
The development of Emustru was partially funded by Sarai.net,
India and Cetril, France. Emustru received the third place
award (Education Category) in the free software competition
held by the Trophées du Libre in June 2009.
The list of roots, prefixes, and suffixes for words is included
with the permission of Jessica DeForest. The list of common
misspelled words includes the list from Wikipedia.
1. Introduction
Current language exams evaluate not only vocabulary, gram-
mar, and writing skills, but also listening and speaking abil-
ities. Exams like the SAT reasoning test and the Graduate
Record Exam (GRE) do appear challenging at first. They re-
quire a fairly large vocabulary, knowledge of some grammar
rules, and decent writing skills. Memorizing word lists and a
list of grammar rules is tedious. Can a computer help you
prepare for these types of exams? Yes. There are many programs
on the Web to learn word lists, check grammar, evaluate writing,
and convert text to speech.
English is a moderately difficult language to learn for
several reasons. First, the number of words in the
English language is large and continues to grow. The
Oxford English Dictionary contains about 170,000 words while
the computer-based WordNet dictionary / thesaurus contains
roughly 150,000 words. The total number of English words
exceeds one million, if all the forms of a word are included.
The same meaning can be expressed in many ways making it
harder for a student to understand the language. However, few
exams test for more than 10,000 words.
Secondly, spelling and pronunciation can vary depending on
the region. For instance, the American spelling of the standard
measurement of length is meter while the British spelling of
the same word is metre. Similarly, the British spelling of a legal
1.2. Quizzes
Computer-based quizzes are common on the Web. A search
for “vocabulary quizzes” on the Google search engine returns
over 50,000 hits. Similarly, a search for “grammar
quizzes” returns about the same number of hits. Some of the
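[Figure: flow of a quiz question. A question (“What is ...”) is shown with answers A, B, and C; a correct answer (Yes) leads to “Correct”, and an incorrect answer (No) leads to feedback and an explanation.]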
[Figure: expected score (-50 to 40) versus penalty (0 to 1.1), with curves labeled 2, 3, and 4.]
0.25, your score will be roughly 6 out of 100. So, for a quarter
point penalty, there is no harm in guessing from four possible
choices. However, as the penalty rises to 1.0, the expected score
rapidly falls and it is not worthwhile guessing if the penalty is
higher than a third of a point, since the expected score becomes
negative.
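The arithmetic behind these numbers can be checked with a short calculation. The sketch below is only an illustration and is not part of Emustru. It assumes that you guess uniformly at random among the available choices, that a correct answer earns one point, and that the quiz has 100 questions; the function names are made up for this example.

```python
def expected_score(choices: int, penalty: float, questions: int = 100) -> float:
    """Expected total score when every answer is a uniform random guess.

    Assumes +1 point for a correct answer and -penalty for a wrong one.
    """
    p_correct = 1.0 / choices
    per_question = p_correct - (1.0 - p_correct) * penalty
    return questions * per_question


def break_even_penalty(choices: int) -> float:
    """Penalty at which random guessing has an expected score of zero."""
    return 1.0 / (choices - 1)


if __name__ == "__main__":
    print(expected_score(4, 0.25))   # four choices, quarter-point penalty
    print(break_even_penalty(4))     # penalty above which guessing loses points
```

For four choices and a quarter-point penalty this gives 6.25 out of 100, and the break-even penalty is one third of a point. Passing a smaller number of choices shows the effect of eliminating answers before you guess.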
Of course, the more answers that you can eliminate, the
higher your expected score. When you have just two choices
with a quarter point penalty, you can expect a score of roughly
37 out of 100. So, it does pay to guess and your reward grows
1.3. Software
All the software described in this book works on a PC running
the Windows or Linux operating system. This book describes
open source software that can be downloaded, evaluated, and
customized without subscription fees or license requirements.
You can learn vocabulary, check grammar, evaluate writing,
and correct spelling with the collection of software packages
(see Appendix A) included with this book. If you are inter-
ested, you can tinker with the software, improve it, make sug-
gestions, add documentation, and test the code for bugs.
There are many sites [2] on the Web to learn English
vocabulary, grammar, writing, and reading. A list of relevant
Web sites that include practice tests is given at the end of each
chapter. The main skills that a student of any new language
needs to prepare for an exam include -
1.3.1. WordNet
WordNet [3] is a popular open source dictionary / thesaurus for
English from the Cognitive Science Laboratory of Princeton
University. A typical dictionary lists words in alphabetical
order. In WordNet, words are assigned to synonym sets
(synsets) and relationships are defined between synsets. For
example, the word package is assigned to a synset that
contains the words bundle, packet, and parcel. These words have
the same meaning as the word package and belong to a
common synset.
A word can also belong to more than one synset. The word
package is also used as a verb in the synset that contains the
word box. Synsets are related to each other in a hierarchy-like
structure. The synset with the words collection and
aggregation has a more general meaning than package, while the
words sheaf and bale are more specific. Relationships
are also defined between individual words. For example, the
word wild is the opposite of the word tame. Chapter 2 includes
a more detailed description of WordNet and its use in Emustru.
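The structure described above can be pictured with a small in-memory model. The sketch below is a toy illustration, not the WordNet database or its programming interface. The class and function names are made up, and placing sheaf and bale in separate synsets is an assumption; the words themselves come from the examples in this section.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Synset:
    """A set of words that share one meaning, plus links to other synsets."""
    words: frozenset
    pos: str                                        # part of speech
    hypernyms: list = field(default_factory=list)   # more general synsets
    hyponyms: list = field(default_factory=list)    # more specific synsets


def link(general: Synset, specific: Synset) -> None:
    """Record that `general` has a broader meaning than `specific`."""
    general.hyponyms.append(specific)
    specific.hypernyms.append(general)


def synsets_for(word: str, synsets: list) -> list:
    """Return every synset that contains `word`."""
    return [s for s in synsets if word in s.words]


# Entries taken from the examples in the text.
package_noun = Synset(frozenset({"package", "bundle", "packet", "parcel"}), "noun")
package_verb = Synset(frozenset({"package", "box"}), "verb")
collection = Synset(frozenset({"collection", "aggregation"}), "noun")
sheaf = Synset(frozenset({"sheaf"}), "noun")   # assumed to be its own synset
bale = Synset(frozenset({"bale"}), "noun")     # assumed to be its own synset

link(collection, package_noun)
link(package_noun, sheaf)
link(package_noun, bale)

# Antonyms relate individual words rather than whole synsets.
antonyms = {"wild": "tame", "tame": "wild"}

all_synsets = [package_noun, package_verb, collection, sheaf, bale]
```

Calling synsets_for("package", all_synsets) returns both the noun and the verb synset, which is how one word can carry more than one meaning in this kind of model.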
1.3.3. Audio
PG also includes a collection of audio books. Some of the
books are read by a professional reader and other books
are converted to speech by a text-to-speech converter. The
audio books spoken by a human will sound more realistic than
a similar automatically generated book. Which book is better
is a personal choice.
The main advantage of audio books is that you can hear and
read the same text simultaneously. This means that you can
recognize words, accents, and pronunciation that may appear
in a listening passage in an exam. In some text-to-speech
software, you can control the speed of the audio output, the
pitch, the type of voice (male / female), and other parameters.
Speech to text software converts spoken text into written
text. Often, this software must be trained to recognize indi-
vidual pronunciation and the accuracy of the output depends
on many factors including the sensitivity of the microphone
1.3.4. Emustru
The public domain Emustru software was written to accom-
pany this book to help you prepare for your exam. It can
be downloaded from http://emustru.sf.net. The software
runs on the Windows and Linux platforms and the installa-
tion details are included in Appendix A. A demo version of
the software is available at the same site.
Spelling
Emustru includes features to learn some of the skills mentioned
earlier. The spelling quiz selects words from a given word list,
which can optionally be ordered by rank, and generates an audio
file to “say” each word. The open source speech synthesizer
FreeTTS (http://freetts.sf.net) is used to generate the audio
files.
Vocabulary
A word is selected from a pre-defined or user-provided word
list; the meaning is extracted from the WordNet [3] dictionary.
Some words have more than one meaning, and only the two
most popular meanings are selected for a quiz. Several words
that are unrelated to the given word are added to the list of
answers. A student selects the meaning of a given word from
a list of five options.
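The steps above can be sketched in a few lines. This is a simplified illustration and not the Emustru code: the function name, its parameters, and the dictionary it returns are assumptions, and the answers are passed in directly instead of being looked up in WordNet.

```python
import random


def make_question(word, correct_answer, unrelated_answers, options=5, rng=None):
    """Build one multiple-choice question for `word`.

    One correct answer is mixed with `options - 1` answers drawn at random
    from `unrelated_answers`, and the combined list is shuffled.
    """
    rng = rng or random.Random()
    # Remove duplicates and anything identical to the correct answer.
    pool = [a for a in dict.fromkeys(unrelated_answers) if a != correct_answer]
    if len(pool) < options - 1:
        raise ValueError("not enough unrelated answers to fill the question")
    choices = rng.sample(pool, options - 1) + [correct_answer]
    rng.shuffle(choices)
    return {
        "word": word,
        "choices": choices,
        "answer": choices.index(correct_answer),  # position of the correct option
    }
```

A seeded random.Random can be passed as rng when you need the same question order on every run, for example while testing.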
Sentence Analysis
The Cloze test (http://en.wikipedia.org/wiki/Cloze_test)
is a test in which some words of a sentence are removed and
the student must identify the missing words from a set of given
words. This test evaluates vocabulary and knowledge of words
in context. For example, the following sentence has two missing
words and a set of five choices.
• neighborhoods, crudeness
• viral, nearly
• dilemmas, container
• tongued, unfolding
• deceleration, maneuvered
The words that are missing in the sentence are selected from a
pre-defined or custom word list. A student learns the context
in which words chosen from the word list are used in sentences.
This test complements the earlier vocabulary test where a stu-
dent learnt the meaning of words.
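A Cloze question of this kind can be produced by blanking out the words of a sentence that appear in the word list. The sketch below is an illustration under simple assumptions and not the Emustru implementation: the function name is made up, matching is on whole words and ignores case, and the sentence in the usage note is invented.

```python
import re


def make_cloze(sentence, word_list, blank="_____"):
    """Blank out every word of `sentence` that appears in `word_list`.

    Returns the sentence with blanks and the removed words in order.
    """
    targets = {w.lower() for w in word_list}
    removed = []

    def replace(match):
        token = match.group(0)
        if token.lower() in targets:
            removed.append(token)
            return blank
        return token

    cloze = re.sub(r"[A-Za-z']+", replace, sentence)
    return cloze, removed
```

For the invented sentence "The pilot maneuvered the plane during its deceleration." and the word list deceleration, maneuvered, the function blanks both words and returns them as the correct choice; the other four choices would be pairs of words that do not fit the context.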
Grammar
Most word processors include a grammar checker along with
a spell checker to help the writer create a document that has
correct syntax and spelling. In general, a grammar checker is
designed to limit the number of false positives, i.e. the number
of flagged errors that are not valid. A writer is more likely to be
annoyed by a grammar checker that identifies errors in correct
sentences and may be willing to tolerate erroneous sentences
that are not detected.
Emustru uses a statistical rule-based grammar checker to
find errors. A large number of rules are constructed after ob-
serving part of speech (POS) tag and word patterns in a corpus
that is known to contain sentences with valid syntax. These
patterns are encoded in rules and stored in database tables.
The grammar checker compares patterns extracted from a test
sentence with patterns saved in tables. Any pattern that is
rare or unusual is flagged as a potential error. The grammar
checker in Emustru is included in the essay writing evaluation
function (see Chapter 4).
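The idea of learning patterns from correct text and flagging unusual ones can be sketched as follows. This is a deliberately small illustration and not the Emustru grammar checker: it uses only POS tag triples, keeps the counts in memory instead of database tables, and expects sentences that are already tagged. The function names and the min_count threshold are assumptions.

```python
from collections import Counter


def tag_trigrams(tags):
    """Yield consecutive POS-tag triples, padded with sentence boundaries."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    for i in range(len(padded) - 2):
        yield tuple(padded[i:i + 3])


def train(tagged_corpus):
    """Count tag triples in sentences that are assumed to be grammatical."""
    counts = Counter()
    for tags in tagged_corpus:
        counts.update(tag_trigrams(tags))
    return counts


def flag_unusual(tags, counts, min_count=1):
    """Return the triples of a test sentence seen fewer than `min_count` times."""
    return [t for t in tag_trigrams(tags) if counts[t] < min_count]


if __name__ == "__main__":
    corpus = [
        ["DT", "NN", "VBZ", "DT", "NN"],        # e.g. "The dog chases the cat"
        ["DT", "JJ", "NN", "VBZ", "DT", "NN"],
        ["PRP", "VBZ", "DT", "NN"],
    ]
    counts = train(corpus)
    # "The dog the chases cat" has tag patterns never seen in the corpus.
    print(flag_unusual(["DT", "NN", "DT", "VBZ", "NN"], counts))
```

Raising min_count makes the checker stricter, which finds more errors but also produces more false positives on correct sentences.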
Essay Writing
The essay evaluation function in Emustru assigns a score based
on a number of extracted features from a short essay of about
300-400 words. Many of the current competitive exams such as
Emustru Quizzes
Emustru uses some of the philosophies behind the CALL ap-
proach to learn a language. A student learns vocabulary through
[Figure: composition of a new quiz. Missed questions (incorrect responses): 25%; correctly answered questions (correct responses): 25%; unseen (new) questions: 50%.]
One of the five options for the test word ensues is correct.
Emustru picks the test word by rank or at random from a
given word list. The button will evaluate the current
question and return the next question. The evaluation will
indicate if the given answer was correct or not. The
button shows the previous question and answers. A question
2. Learning Vocabulary
Learning words from a long list is a dull and boring task.
Many ways have been suggested to make this task more in-
teresting and one of the most popular ways is through multi-
ple short quizzes of 10-20 questions each. The popular book
“Word Power Made Easy” by Norman Lewis contains many
such quizzes. Some of the quizzes contain the familiar multi-
ple choice questions where a student must select the correct
meaning of a word. Other quizzes provide the meaning of the
word and the starting letters of the related word. The student
fills in the remaining letters of the word that represents the
same meaning. A true or false quiz asks a question and the
student must verify if the highlighted word in the question is
appropriate or not. Finally, another type of quiz presents a set
of words and a jumbled set of meanings; the student matches
each word with the correct meaning.
[Figure: number of unique words (axis from 10K to 40K), with a curve labeled “Base Words”.]
use the word pedestrian with the second meaning of the word.
Imagine that the word was missing from the sentences. You can
extract the meaning of the word from the surrounding words
without a lot of difficulty. Even if you have no clue what a
word means, you can use the remaining words in the sentence
to limit the number of answer choices and make a calculated
guess.
Roots
Analyzing the letter sequences of an unknown word is another
way to guess the word meaning. Consider the word primeval
that is made up of a prefix (prim), root (ev), and a suffix
(al). The prefix prim is associated with the word first, the
root ev with an age or era, and the suffix al with a reference
or pertaining to the meaning of the prefix and root. We can
conclude that the word primeval means something that existed
in the earliest stages of life, by combining the meanings of the
prefix, root, and suffix (see Figure 2.2).
[Figure 2.2: words split into their parts. prim + ev + al = primeval; nav + al = naval; e + mot + ion + al = emotional.]
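The same analysis can be tried out in code. The sketch below is a toy illustration, not the method used in Emustru: the three tables hold only the entries for primeval described above, and the function names are made up. A real list of roots, prefixes, and suffixes would be much larger.

```python
# Small illustrative tables; the meanings follow the description in the text.
PREFIXES = {"prim": "first"}
ROOTS = {"ev": "age or era"}
SUFFIXES = {"al": "pertaining to"}


def split_word(word):
    """Try to split `word` into a known prefix, root, and suffix.

    Returns a (prefix, root, suffix) tuple, or None if no split is found.
    """
    for i in range(1, len(word) - 1):
        for j in range(i + 1, len(word)):
            prefix, root, suffix = word[:i], word[i:j], word[j:]
            if prefix in PREFIXES and root in ROOTS and suffix in SUFFIXES:
                return prefix, root, suffix
    return None


def gloss(word):
    """Describe a word by combining the meanings of its parts."""
    parts = split_word(word)
    if parts is None:
        return None
    prefix, root, suffix = parts
    return f"{SUFFIXES[suffix]} the {PREFIXES[prefix]} {ROOTS[root]}"
```

Here gloss("primeval") returns "pertaining to the first age or era", which is close to the meaning derived by hand.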
[Figure: score (200 to 800) versus questions answered (0 to 20).]
2.3.3. Quizlet
Quizlet™ is a more general Web site than Free Rice and Visual
Thesaurus. You can use it to learn lists of words in any language,
or terms from any topic, and their associated meanings. You can
create your own list of terms and meanings and upload the file to
Quizlet. The data collection is optionally saved in the public
domain and you can share your collection with others. The flashcard
model is used in this type of quiz. You can imagine a sample
flashcard set of five words -
Quizlet uses five different modes to help the student learn this
list of five words. In the first mode, familiarize, the words and
meanings are shown in a flashcard-like user interface. After
you are reasonably familiar with the list of words, you can test
yourself in the learn mode. In this mode, the meaning of one
of the five words, selected at random, is shown and you
have to guess the associated word. Quizlet keeps track of the
number of correct responses and periodically re-tests you with
the questions that you missed.
The third mode, test, generates questions and answers in a
dynamic quiz. Three types of questions are created - multiple
choice, true or false, and free text questions. A multiple choice
question has 4-5 answers, of which one is correct.
The true or false question shows a meaning and a word and you
must verify if the meaning is appropriate for the given word.
Finally, in a free text question, you need to enter the associated
word, given a particular meaning.
The fourth mode, scatter, shows the list of all words and
meanings scattered in a window. The aim of this game is to
make the entire window blank. A word and its meaning
disappear from the window as a pair when either the word or the
meaning is dragged and dropped over its partner. In the final
mode, race, you answer questions as they appear on the screen.
2.3.4. Emustru
The user interfaces of the three products (Visual Thesaurus,
Free Rice, and Quizlet) are clean and easy to use. One
problem with Visual Thesaurus and Free Rice is that you have little
control over the contents of the quizzes. You can choose a
subject; however, the questions for the subject are pre-determined
Up to the first 25% of the questions may use words that were
missed earlier. Similarly, a maximum of another 25% of the
questions may include words that appeared in questions that
were correctly answered. Finally, the remaining questions will
use new words. The number of times that a word from a correctly
answered question should appear again is set by a parameter in
the config.php file. If the parameter is set
to 0, then a word whose meaning was correctly identified will
not appear again in a quiz.
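The mix of questions described above can be sketched as follows. This is an illustration of the 25% / 25% / remainder split and not the Emustru code, which is written in PHP. The function names are made up, and max_repeats stands in for the config.php parameter, whose actual name is not given here.

```python
import random


def compose_quiz(missed, correct, unseen, size, rng=None):
    """Pick words for a quiz of `size` questions.

    Up to 25% come from previously missed words, up to another 25% from
    words answered correctly before, and the rest from unseen words.
    """
    rng = rng or random.Random()
    quota = size // 4
    from_missed = rng.sample(missed, min(quota, len(missed)))
    from_correct = rng.sample(correct, min(quota, len(correct)))
    remaining = size - len(from_missed) - len(from_correct)
    from_unseen = rng.sample(unseen, min(remaining, len(unseen)))
    return from_missed + from_correct + from_unseen


def eligible_for_repeat(times_repeated, max_repeats):
    """A correctly answered word may reappear only while below `max_repeats`.

    With max_repeats set to 0, such a word never appears again.
    """
    return times_repeated < max_repeats
```

When there are few missed or correctly answered words, for example in your first quiz, the unused share is filled with new words, so the quiz still has the requested number of questions as long as enough unseen words remain.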
uses the wrong words - knight and too, instead of night and to.
These types of errors are difficult to spot with a spell checker.
A forgiving reader may ignore a few such errors, but others may
form a negative opinion that is difficult to alter. Even if the
written matter is interesting, the initial opinion formed based
on spelling errors may dominate. This is especially important
if you have to write an essay for an exam.
[Figure: a cluster of similarly spelled words: area, back, weak, bake, trek, beaks, creak, bleak, balk, remark, breaks, wreak, brake, break, breach, read, freak, beak, bread, real, bureau, breakup, peak, leak, breath, daybreak.]
Error analysis is helpful to the extent that you are careful when
spelling words that use these letters and sequences.
[Figure: spelling error analysis. Labels and values as extracted: 2-Letters 40; Others 19; Extra Letters 6; Missing Letters 2; Transposition 92; letters e 13, r 12, i 11, s 11.]

Correct   Wrong   %
e         a       6.8
a         e       6.2
i         a       4.8
e         i       4.8
i         e       4.0
ie        ei      4.0
o         e       2.9
ei        ie      2.6
Since two questions with the same word but different
meanings in a single quiz may be confusing, Emustru repeats the
same word in another quiz. The period between the
appearances of the same word in quizzes can be configured in the
config.php file.
Each question contains five possible answers: only one of
the answers is correct. The remaining four incorrect answers
are selected carefully such that there is no overlap with other
words that have the same meaning. For example, the incor-
rect answers for a question with the test word - prosaic, must
exclude words in both meanings of prosaic. Sometimes, a hy-
2.6.1. Emustru
Hangman is a fairly well known game in which you must find a
word within n chances. You pick letters from an on-screen
keyboard; if the letters appear in the unknown word, they are
shown in their letter positions (see Figure 2.12). In general,
vowels and a few consonants such as r, s, t, and n are the most
frequent letters in words. You are allowed to make up to six
incorrect letter guesses.
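The rules of the game fit in a short function. The sketch below is an illustration and not the Emustru game; the function name and return values are made up, and it assumes that the game is lost as soon as the sixth incorrect letter is guessed.

```python
def play_hangman(secret, guesses, max_wrong=6):
    """Replay a sequence of letter guesses against `secret`.

    Returns (pattern, wrong_count, status), where pattern shows the guessed
    letters in their positions and status is "won", "lost", or "playing".
    """
    secret = secret.lower()
    found, wrong = set(), set()
    for letter in guesses:
        letter = letter.lower()
        if letter in secret:
            found.add(letter)
        else:
            wrong.add(letter)
        if set(secret) <= found:
            break                      # every letter has been revealed
        if len(wrong) >= max_wrong:
            break                      # assumption: the sixth miss ends the game
    pattern = "".join(c if c in found else "_" for c in secret)
    if "_" not in pattern:
        status = "won"
    elif len(wrong) >= max_wrong:
        status = "lost"
    else:
        status = "playing"
    return pattern, len(wrong), status
```

For example, guessing the frequent letters r, s, t, n, and e for the word parcel reveals two letters at the cost of three incorrect guesses.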
In some Hangman games, you may be given more chances
to guess the letters, and even the meaning of the word may be
shown in a hint. The partial word game is a similar game in
which a few letters of the word are shown (see Figure 2.13).
The letters that are shown are 2 or more consecutive letters
from the beginning, middle, or end of the word. You have to
The right hand side of Figure 2.13 has the equivalent un-
scramble question for the same word. The letters of the word
are jumbled and you need to enter the letters of the word in the
correct order. A hint is included at the bottom of the screen
(not shown in the Figure).
The second word, bank, of the two-word phrase “central bank”
is shown with a list of possible preceding words. Roughly half
of the questions in the quiz will show a preceding word and the
remainder the following word of a phrase. The purpose of this
quiz is to become familiar with common phrases from a large
corpus of text and to use these phrases in your own writing in
the proper context.
3. Learning Sentence Construction
Writing a short paragraph seems more difficult than having
a conversation. There are several reasons why we perceive
writing as harder than speaking. A written sentence is
generally more formal than a spoken sentence and takes more time
to compose, edit, and review. The art of building great sen-
tences is complex and cannot be explained in a single chapter.
However, this chapter will use quizzes to identify grammati-
cal errors and find missing words in sentences extracted from
newswire articles and classic literature. Each question uses a
sentence from a large collection of 35,000 sentences from the
Brown corpus and other sources. The sentences cover a range
of genres from religion to press articles.
As you take more quizzes, you will come across a large num-
ber of examples of sentence usage and styles. The style of your
sentences will depend on the reader. If your reader is a close
friend, then your sentence may be informal. On the other hand,
an essay for an exam or a class should be well organized, clear,
and precise. This chapter contains some tips to build better
sentences for essays.
Example Sentences:
An introduction to an essay on the impact of world
events on the U.S. economy: As the war with Iraq
winds down, worries about the dark threat of ter-
rorism on American soil, and the interminable war
in Afghanistan all make for exceptionally nervous
markets.
The conclusion of an essay on the Cassini-Huygens
mission to Saturn: Saturn’s numerous moons and
magnificent rings have still so much to tell and to
share along with Titan whose mystery was revealed
by the Cassini-Huygens mission.
A combined sentence: However, insecurity in the
country continued as numerous rebel groups emerged
to challenge nepotism and tribalism; the govern-
ment responded ruthlessly arresting, torturing and
forcing many into exile.
A contrast sentence: Despite the financial rewards,
many college students shunned jobs in trading se-
curities.
3.1.2. Punctuation
Although punctuation does not directly add to the content of
your essay or even add to the word count, it is extremely
important in a machine-graded essay. A missed period at the end
of a sentence means that the sentence extractor of a machine
grader will collapse two sentences into a single sentence.
This may not appear to be harmful, but will most likely
lead to a grammatically incorrect run-on sentence.
Secondly, the machine may incorrectly classify the combined
sentence as an introduction instead of an introduction and a
main point; a missing main point in a paragraph will be noted.
Finally, a machine may not detect the use of discourse words
such as despite or firstly that usually appear in the beginning
of a sentence. The position of words in a sentence is an indica-
tor of their use; the machine will not correctly tag such words
when they are found in other locations of a sentence. Another
avoidable error is starting a sentence with a lower case let-
ter. This error is easily noticed by both human and machine
graders.
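The effect of a missed period is easy to see with a simple sentence extractor. The sketch below is an illustration only and does not reproduce the extractor of any real essay grader: it splits text after a period, question mark, or exclamation mark that is followed by white space, and the function names and example sentences are made up.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by white space.

    A deliberately naive splitter: abbreviations such as "U.S." would be
    split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def starts_lowercase(sentence):
    """True if the first alphabetic character of the sentence is lower case."""
    for ch in sentence:
        if ch.isalpha():
            return ch.islower()
    return False


if __name__ == "__main__":
    good = "Trading pays well. Despite the rewards, many students shunned it."
    bad = "Trading pays well Despite the rewards, many students shunned it."
    print(split_sentences(good))   # two sentences; Despite begins the second
    print(split_sentences(bad))    # one run-on sentence; Despite is mid-sentence
```

In the second case the discourse word Despite no longer begins a sentence, so a grader that looks for such words at the start of a sentence would miss it.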
Omitting other punctuation marks like the apostrophe can
sometimes be humorous. For example, the missing apostrophe
at the end of the word Residents in –
implies that residents are not cooperating and will not enter
bins. While the period has a single purpose (to end a
sentence), the apostrophe is used to show possession (Jim’s),
create a contraction (it’s for it is), omit numbers or letters (’69),
and create plurals of words or letters (do’s). A missed comma
may not cost you much, but can change the meaning of some
Example Sentences:
A long sentence from “Alice in Wonderland”: "Lastly,
she pictured to herself how this same little sister of
hers would, in the after-time, be herself a grown
woman; and how she would keep, through all her
riper years, the simple and loving heart of her child-
hood: and how she would gather about her other
little children, and make their eyes bright and ea-
ger with many a strange tale, perhaps even with
Punctuation
Grammar check and spell check are functions that most of us
expect in a word processor. Unfortunately, a punctuation check,
an important part of writing, is absent. Although punctuation
may not be perceived to be as important as parts of speech like
nouns and verbs, the use of punctuation to make text clear
and unambiguous does make the reader’s job easier. Appendix
B contains a brief description of punctuation characters.
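[Figure: grammar checking with a parse tree. Steps: Generate Tree, Apply Rules, No Errors. The tree has the root S with the children NP, VP, and “.”; the VP contains “fixing” and the NP “the computer”.]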
Manual Rules
LanguageTool [19] is a manual rule-based grammar checker for
several languages including English, Polish, and German. The
set of rules is manually created and stored in a large XML
file. Consider a rule to detect a typo in the sentence -
Notice that a spell checker would not flag this error, since exits is
a legitimate word. The rule to spot this specific error would
be -
Id: “THERE_EXITS”
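The way such a rule is applied can be sketched with a simple token matcher. The sketch below is an illustration and is not LanguageTool's XML rule format or its matching engine. Only the rule id comes from the text; the token pattern, the message, and the function name are assumptions.

```python
import re

# Hypothetical rule format for illustration; LanguageTool stores its rules in XML.
RULES = [
    {
        "id": "THERE_EXITS",
        "pattern": ["there", "exits"],      # assumed token sequence
        "message": "Possible typo: did you mean 'there exists'?",
    },
]


def check(sentence, rules=RULES):
    """Return (rule id, token position, message) for every rule match."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", sentence)]
    hits = []
    for rule in rules:
        pattern = rule["pattern"]
        for i in range(len(tokens) - len(pattern) + 1):
            if tokens[i:i + len(pattern)] == pattern:
                hits.append((rule["id"], i, rule["message"]))
    return hits
```

An invented sentence such as "There exits a simple solution." triggers the rule, while "He exits the room." does not, because the word exits is only suspicious when it follows there.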
Automatic Rules
Grammar checkers based on automatically generated rule sets
have been shown to be accurate enough [20] to be used in
applications such as Essay Evaluation. The automated
grammatical error detection system called ALEK is part of a suite
of tools being developed by ETS to provide students learning
Handing the (A) money (B) over, Russ wiped his hands
on his pant-legs as if riding (C) himself of something (D)
unclean. No Error (E)
-A
-B
-C
-D
-E
4. Automatic Essay Scoring
The SAT and GRE exams include an essay question: an essay
prompt is given and you are asked to argue for/against a
proposition or describe an event/procedure. In the interest of
saving time and money, the Educational Testing Service (ETS),
the organization responsible for these exams, has replaced one
of the two human graders per essay with an Automated Essay
Scoring (AES) grader [24]. This chapter is not a tutorial
on writing; there are many excellent books on essay writing
and building sentences ([34, 35]). Here, the discussion is about
automated methods of evaluating essays and how you should
write an essay, if you know that the essay will be graded by a
machine.
4.1. How does it Work?
4.2. Applying AES
Why does AES work: When the possible scores for an essay
are in the range 1-6, even a very primitive AES will agree with a
human grader about half the time. Consider a human grader’s score of x
for an essay: a random number between 1 and 6 will be within
±1 of x about 45% of the time. If the extreme scores of 1
and 6 are ignored, then the random score will be within ±1 of x in
half of all cases. This means that an AES has to make
an intelligent guess of the score in a fairly narrow range to be
counted as correct.
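The 45% and 50% figures can be checked by enumeration. The Python sketch below assumes (the text does not say so) that the human score x and the random score are each equally likely to be any integer from 1 to 6, and it counts a guess as correct when it is within ±1 of x. The function name agreement_rate is illustrative only.

```python
from fractions import Fraction


def agreement_rate(human_scores, low=1, high=6, tolerance=1):
    """Fraction of (human score, random guess) pairs within `tolerance`.

    Assumption: every human score in `human_scores` and every guess
    between `low` and `high` is equally likely.
    """
    human_scores = list(human_scores)
    guesses = range(low, high + 1)
    hits = sum(
        1
        for x in human_scores
        for guess in guesses
        if abs(guess - x) <= tolerance
    )
    return Fraction(hits, len(human_scores) * len(guesses))


# All six score points: 16 of the 36 pairs agree (about 44%).
all_scores = agreement_rate(range(1, 7))

# Ignoring the extreme scores 1 and 6: 12 of the 24 pairs agree (50%).
middle_scores = agreement_rate(range(2, 6))

print(float(all_scores), float(middle_scores))
```

Under these assumptions the exact values are 4/9 and 1/2, which match the rounded figures quoted in the paragraph.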
• You have passed a driving test. Your friend who does not
have a driver’s license would like to know the procedure.
Explain how you passed the driving test.
E-rater 2.0 [11] can use a model that is not based on
any particular prompt to grade essays. This is very convenient,
since it is not necessary to create a separate model for each
topic. A single model can capture the information needed to
score any essay. One argument against the use of a single model
for all prompts is that content-specific words are not given
any additional importance in the model. The use of content-specific
words in an essay is an indicator that the student has
understood the prompt and that the essay is relevant.
4.3. How do you write an essay for E-rater?
4.3.1. Grammar
The types of grammar errors detected include run-on sentences,
garbled sentences, subject-verb agreement, ill-formed verbs,
The singular and plural forms of the verb presume follow the
noun company. A statistical grammar checker may flag the
first sentence, since a singular collective noun is followed by
a plural verb. E-rater uses filters to allow such sequences,
even though the automatically generated rules indicate that
the sequence is rare. The grammar rules applied to evaluate a
sentence depend on the frequency of observed bigrams in the
corpus. E-rater’s grammar checker was trained on a corpus
of about 30 million words from newswire text. Not all
grammatical errors will be detected, but you need to make
sure that your sentences do not contain any of the grammar
errors that E-rater can detect (see Section 3.2.2).
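To make the idea of bigram frequencies concrete, here is a minimal Python sketch. It is not E-rater's method: the three-sentence corpus, the function names, and the simple count threshold are all assumptions made for illustration. A real checker is trained on millions of words and, as noted above, adds filters for sequences that are rare but acceptable.

```python
from collections import Counter


def bigrams(tokens):
    """Return the adjacent word pairs in a list of tokens."""
    return list(zip(tokens, tokens[1:]))


def train(sentences):
    """Count every bigram seen in the training sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts


def rare_bigrams(sentence, counts, min_count=1):
    """List bigrams seen fewer than `min_count` times in training."""
    return [
        pair
        for pair in bigrams(sentence.lower().split())
        if counts[pair] < min_count
    ]


# Hypothetical toy corpus; a real corpus would be far larger.
TOY_CORPUS = [
    "the company presumes that sales will rise",
    "the company presumes nothing",
    "the managers presume that sales will rise",
]

counts = train(TOY_CORPUS)
flagged = rare_bigrams("the company presume that sales will rise", counts)
print(flagged)
```

With this toy corpus only the pair (company, presume) is flagged, because it never occurs in the training sentences. A filter of the kind described above would be an extra list of pairs that are allowed even though they are rare.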
4.3.2. Usage
Usage errors are common mistakes such as a wrong or
missing article, confused words, the wrong form of a word, a
faulty comparison, a preposition error, or a non-standard verb
form.
4.3.3. Mechanics
Mechanics errors are mostly word form errors: a misspelled
word, a missing punctuation mark, or a missing capital letter in a
word. Although these types of errors may seem petty, a missing
punctuation mark can alter the meaning of a sentence mak-
Letter Errors
Punctuation Errors
• Every sentence should end with a punctuation mark (a
sentence separator character: ?, ., or !).
• Although the apostrophe is a tiny punctuation character
that is easily overlooked, a missing apostrophe alters
the meaning of a sentence. The sentence “The audience
last night did not respond with either applause or
boos to mention of Hughes remark.” is missing an apostrophe
after the word Hughes. Without the apostrophe, the sentence
implies that the “Hughes remark” is a type of remark.
• Notice that if the last comma in the following sentence is
dropped, the sentence has a strange meaning: “The
Mayor apparently received the Bronx leader’s assent to
dropping Controller Lawrence E. Gerosa, who lives in the
Bronx, from this year’s ticket”.
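The first rule in the list above is simple enough to check mechanically. The Python sketch below is only an illustration and is not how E-rater works; it assumes the essay has already been split into one sentence per string, and that closing quotes or brackets may follow the separator character. All names are hypothetical.

```python
# The three sentence separator characters named in the text.
SENTENCE_ENDERS = ("?", ".", "!")

# Assumption: closing quotes and brackets may follow the separator.
CLOSERS = "\"'”’)"


def ends_with_separator(sentence):
    """True if the sentence ends with ?, ., or ! (ignoring closers)."""
    stripped = sentence.strip().rstrip(CLOSERS)
    return stripped.endswith(SENTENCE_ENDERS)


def missing_end_punctuation(sentences):
    """Return the non-empty sentences that lack a final separator."""
    return [
        s for s in sentences
        if s.strip() and not ends_with_separator(s)
    ]


essay = [
    "Is it grammatically correct?",
    "She said \"Stop!\"",
    "The audience did not respond",
]
print(missing_end_punctuation(essay))
```

Only the last sentence is reported. Missing apostrophes and commas, the other two items in the list, change meaning rather than form and cannot be caught by a check this simple.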
4.3.4. Style
Writing style is subjective: whether a style is good or bad depends on
the reader’s likes and dislikes. Since E-rater will grade
your essay, you need to write an essay that
satisfies E-rater’s view of good style. E-rater collects statistics
and searches for patterns to evaluate style.
[Figure: sentences S1, S2, ..., Sn paired with labels L1, L2, ..., Ln by an “Extract features and assign labels” step.]
short words. The use of inflected words and words from the
SAT list (see Appendix C) in your essay can make the average
word length closer to the average word length of a high-scoring
essay.
The second feature of lexical complexity is based on the standard
frequency index (SFI) [35]. Every unique word is assigned
an SFI value; words that appear frequently in text have a higher
SFI than words that are seen rarely (see Table 4.5). Unfortunately,
the SFI value of a word does not distinguish between
different meanings of the same word. For example, there is
no distinction between the noun meaning and the verb meaning of
sound. If you happen to use the less common meaning of a
word, you will not gain any additional benefit, since the SFI
value covers all meanings.
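The text does not give a formula for SFI. The Python sketch below assumes the commonly cited definition, SFI = 10 × (log10(U) + 4), where U is the number of occurrences per million words, and it uses the raw frequency for U rather than a dispersion-adjusted estimate. The counts for the two meanings of sound are made up for illustration.

```python
import math


def sfi(count, corpus_size):
    """Standard frequency index under the assumed definition.

    Assumption: SFI = 10 * (log10(U) + 4), with U the raw number of
    occurrences per million words (no dispersion adjustment).
    """
    per_million = count * 1_000_000 / corpus_size
    return 10 * (math.log10(per_million) + 4)


# Hypothetical counts for two meanings of "sound" in a corpus of
# one million words; these are not taken from the Brown corpus.
sense_counts = {"sound (noun)": 150, "sound (verb)": 50}

# A word-level SFI pools all meanings, so the rarer verb meaning
# earns no separate (lower) value.
pooled_sfi = sfi(sum(sense_counts.values()), 1_000_000)

print(round(pooled_sfi, 1))
print(round(sfi(sense_counts["sound (verb)"], 1_000_000), 1))
```

Under this definition a word seen once per million words has an SFI of 40, and each tenfold increase in frequency adds 10. Because only the pooled value is available for a word, using its rarer meaning does not lower the SFI attributed to the word.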
Table 4.5.: Twenty Words and SFI Values from Brown Corpus
Word x Frequency x SFI x Word y Frequency y SFI y
Function words like the and of have the highest SFI. Content
words like underestimate and acidulous are seen less often and
Figure 4.4.: Top 1000 Words from the Brown Corpus sorted by
SFI vs. Word Frequency
[Plot omitted. Horizontal axis: Words (1 to 1000, logarithmic scale). Vertical axes: SFI (0 to 140) and Frequency (up to 70K).]
essays are manually pre-scored for each of the six score points.
The two features are –
4.4. Emustru Essay Evaluator
• Number of paragraphs
• Number of sentences
The spelling tab shows the list of sentences in the essay along
with any spelling errors in each sentence. For every
spelling error, a potential suggestion is also shown. The vocabulary
tab shows a few word statistics, such as the number
of words, the average word length, the number of unique words,
and the standard deviation of the word length. The organization
tab shows the coherence between individual sentences,
between sentences and their parent paragraph, and between sentences and the
essay text as a whole. Other statistics include counts of the
use of passive voice and discourse markers. Note that a higher
count of passive voice markers may lead to a lower score, while
a higher count of discourse markers is usually associated with
a high-scoring essay. The final tab contains a list of all the
features in the essay compared to an ideal high-scoring essay
(Figure 4.7).
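The word statistics listed for the vocabulary tab are easy to compute yourself. The Python sketch below is not Emustru's code; the tokenizer (runs of letters with an optional internal apostrophe), the case-insensitive count of unique words, and the use of the population standard deviation are all assumptions.

```python
import re
import statistics


def word_statistics(text):
    """Compute simple word statistics for an essay.

    Assumptions: a word is a run of letters, optionally with one
    internal apostrophe; unique words are counted ignoring case;
    the standard deviation is the population standard deviation.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    lengths = [len(word) for word in words]
    return {
        "words": len(words),
        "unique_words": len({word.lower() for word in words}),
        "average_length": statistics.mean(lengths) if lengths else 0.0,
        "length_std_dev": statistics.pstdev(lengths) if lengths else 0.0,
    }


stats = word_statistics("The cat sat. The cat ran.")
print(stats)
```

For the sample text every word has three letters, so the average length is 3 and the standard deviation is 0. Running the same function on a draft essay gives a rough idea of how your word lengths vary.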
The score assigned to the essay (5 in this case) is shown
compared to an ideal essay. The number of grammatical errors
and the category are also shown in Figure 4.7. The remaining
19 features are not shown in the figure. Any value that is not
4.5. Web sites to learn Essay Writing
2. http://www.knowledge-technologies.com: Pearson
Knowledge Technologies’ Intelligent Essay Assessor™.
5. Other Topics
This chapter covers other topics that are part of standardized
tests such as listening, comprehension, and speaking. A large
number of commercial software products convert text to speech
and vice versa. The quality of these products varies and this
chapter does not evaluate commercial software.
5.1. Listening
A simple way to learn from audio is to read a transcript while
listening to the audio version of the same text. A large
number of books from Project Gutenberg [6] are available
in both MP3 and text formats.
Espeak is an open source speech synthesizer for English
that runs on the Linux and Windows platforms. Although
Espeak’s audio output does not sound as natural as a human
voice, the quality is good enough to follow. Several voices are
included: a default English voice, a U.S. voice, and a Scottish
voice. On the Windows platform, Microsoft Sam (Speech
Articulation Module) is a default voice.
There are two ways to run Espeak: either from a GUI (see
Figure 5.1) or the command line. The GUI can read text files
and provides options to change the reading speed, the voice,
and other controls. It is easy to use and on the Windows
5.2. Speaking
Speech recognition software lets you control your computer and dictate
text by voice. The first attempts
to build automatic speech recognition (ASR) software
were not entirely successful. The problems of recognizing various
accents and converting speech to text in real time were
harder than expected. The latter problem was solved with the
rapid increase in the computing power of PCs and improved
software. However, most speech recognition software still uses
two components: one for training and another for recognition.
You will need to spend some time training your speech recognition
software to become familiar with your accent; the training
may require you to read long chunks of text. If your ASR
software has been sufficiently trained, then the recognition software
will have reasonably high precision.
ASR software is complex and you can find out a lot more
about it on the Web. The Sphinx project at Carnegie Mellon
University is a popular open source tool for ASR. It has been
used for several years, but needs some technical knowledge to
train and test speech recognition.
5.3. Comprehension
Passage comprehension is considered one of the trickier sections
of a language exam. There are three reasons: the topic of a
passage may be unfamiliar and consequently harder to comprehend,
you are required to read and understand a passage within
a time limit, and you may not know some of the passage
vocabulary.
Although a passage is not the same as the five-paragraph
essay discussed in Chapter 4, you can use the same analysis
techniques to study a passage. The initial description of the
passage will explain the context – the passage will usually be
an extract from a novel, a scientific article, or an essay. The
first paragraph will establish the characters or the topic that
will feature in the remainder of the passage.
As you browse the passage, you will find sentences where the
author uses the discourse words mentioned in Section 4.3.5. Often,
passage questions will test whether you understood the meaning
of sentences that contain words like despite, while, or however.
Other words that may be worth highlighting include the
names of people, places, and things. These words describe the
entities mentioned in the passage. The adjectives used in the
passage are also likely to indicate the tone of the passage. A
question on the author’s views or attitude is fairly common
for a long passage.
Since exams like the SAT or GRE test aptitude, the subject
matter of the passage may be taken from a broad range of topics.
The subject of the passage may include a scientific discussion
(from physics, chemistry, botany, mathematics, zoology),
a social commentary (philosophy, culture, history, geography),
or a critique of the arts (drama, music, literature, sculpture,
5.3.1. Requirements
Before you begin reading long passages, you should first build
your vocabulary. If you do not know the meaning of 5 or more
words in a passage of 150-200 words, you will find it difficult
to answer some questions. There is always the possibility that
you will not know the meaning of a few words in a passage.
However, using the context and your knowledge of roots, pre-
fixes, and suffixes (see Appendix C), you can make a reason-
able guess that should be close enough to help you answer a
question.
Many passage questions test for the less frequent meaning
of a word. For example, the word mold could be used as a
verb (to shape or form) or a noun (a decaying surface or a
pattern). The meaning of the word will depend on the context
and you will be able to answer these types of questions, if you
are familiar with most of the meanings of a word.
5.3.2. Tips
Before a passage begins, a blurb will describe the source. For
example, the start of a passage about the traits of a conductor
may state: In this excerpt from the “Joy of Music”, the
conductor and composer Leonard Bernstein distinguishes the
great from the average conductor. Although the blurb is not
part of the passage, it is important to read it carefully and
recognize the background of the passage.
5.4. Web sites to practice Reading Comprehension
3. http://www.ehow.com/topic_916_taking-the-sat.html:
Tips for taking the SAT
A. Installing Emustru
The Emustru software used in this book is available from
http://emustru.sf.net. Emustru has been tested on the
Windows and Linux platforms. The application is Web-based
and runs on the Linux-Apache-MySQL-PHP (LAMP) or the
Windows-Apache-MySQL-PHP (WAMP) stacks.
Windows
This document assumes you have an existing stack on either
the Windows or Linux platform. The WAMP project
(http://www.wampserver.com/en) distributes the three components
of the stack: Apache, MySQL, and PHP. The WAMP
distribution makes it simple to install the stack without downloading
and customizing each of the individual components
(see Figure A.1).
Apache and MySQL run as services and must be started before
installing Emustru. The Administrative Tools of the Control
Panel include options to enable these services at startup
time. The default directory for WAMP is c:\wamp and the
www sub-directory under this directory is the location for Web
projects. The Emustru distribution can be unzipped in the
c:\wamp\www directory. A default index.php file is created in
the www directory and can be viewed from the browser at the
URL http://localhost/index.php.
useful tool to troubleshoot problems with database tables and
is fairly easy to use. The root MySQL password must be set
in the config.inc.php file under the phpmyadmin directory.
Linux
Many of the current Linux distributions include options to install
a Web server (Apache), a database server (MySQL), and
PHP. If you have not installed these components, then you can
install the separate XAMPP package, use the distribution
to add these components, or download each of the components
separately.
The XAMPP (http://www.apachefriends.org/en/xampp.html)
project is a multi-platform tool to build the AMP
stack on Linux, Windows, MacOS, and Solaris platforms. It
includes the same components as WAMP and a few others as
well. On Linux, the XAMPP distribution is a gzipped tar file
that can be unpacked in the /opt directory. You will need to
become the root user to complete the rest of the installation.
After unpacking the distribution, you can start Apache and
MySQL with the "lampp start" command from the top level
Configuration
If the Emustru distribution has been unzipped under the htdocs
directory, opening the URL http://localhost/emustru/index.php
in a browser should display the screen shown in Figure A.2.
This screen is common to Linux and
Windows installations.
The installation screen in Figure A.2 is based on a Windows
installation. A Linux installation is similar with the exception
Figure A.2.: Emustru Installation Screen
Customization
The default distribution comes with a word list of about 8,000
words and 6,500 sentences. Two additional sources of sentences
and words can be downloaded from SourceForge.net:
brown.zip and sat.zip. The brown.zip file contains 25,000
words and 35,000 sentences from the Brown corpus [5]. The
sat.zip file contains 8,500 words and 120,000 sentences extracted
from e-books downloaded from Project Gutenberg
[6].
Unzip both of these files in the install/table_data directory
of the installation directory. Then log in as admin (initial
password admin) and press the “Load Word Table” button
shown in Figure A.3.
You can also add a list of words to one of the word list
types. Emustru will accept a file with one word per line in
several formats. An optional number accompanying the word
is interpreted as a rank, and words with higher ranks will be
shown earlier than other words in a generated quiz.
If no ranks are provided, all words are assigned the same rank
and a rank order quiz for such a word list will fetch words in
alphabetical order. The words for questions in any quiz can be
selected at random or by rank order. An option to select the
order type is provided before a quiz is generated.
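The ordering rules above can be sketched in a few lines of Python. This is not Emustru's code: the file layout (a word, optionally followed by whitespace and an integer rank), the reading of "higher rank" as "larger number", and all function names are assumptions made for illustration.

```python
import random


def parse_word_list(lines, default_rank=0):
    """Parse lines of the assumed form 'word' or 'word rank'."""
    entries = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        rank = int(parts[1]) if len(parts) > 1 else default_rank
        entries.append((word, rank))
    return entries


def rank_order(entries):
    """Higher ranks first; ties are broken alphabetically."""
    ordered = sorted(entries, key=lambda e: (-e[1], e[0].lower()))
    return [word for word, _ in ordered]


def random_order(entries, seed=None):
    """Shuffle the words; `seed` makes the order repeatable."""
    words = [word for word, _ in entries]
    random.Random(seed).shuffle(words)
    return words


ranked = parse_word_list(["zeal 2", "abate 5", "candor 2", ""])
unranked = parse_word_list(["mold", "abate", "sound"])

print(rank_order(ranked))
print(rank_order(unranked))
```

The first list is ordered abate, candor, zeal, since abate has the highest rank and the two words that tie are sorted alphabetically. The unranked list comes out in plain alphabetical order because every word shares the default rank.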
Troubleshooting
The Java code uses a JDBC connector to access the MySQL
database and will not function if network access is disabled in
MySQL. Network access is set through the skip-networking
option in the my.ini file. During a fresh installation, you may
need to clear out any existing log and configuration files from
the temporary directory.
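A quick way to see whether network access is switched off is to scan the option file for an active skip-networking line. The Python sketch below is a hypothetical helper and not part of Emustru. It assumes that comment lines begin with # or ; and treats any uncommented skip-networking entry as disabling network access, whatever value follows it.

```python
def networking_disabled(ini_text):
    """True if the option file text has an active skip-networking line.

    Assumptions: lines starting with '#' or ';' are comments, and any
    uncommented skip-networking entry disables network access.
    """
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        # Accept both the dashed and the underscored spelling.
        name = line.split("=", 1)[0].strip().replace("_", "-")
        if name == "skip-networking":
            return True
    return False


blocked = "[mysqld]\nskip-networking\nport=3306\n"
allowed = "[mysqld]\n# skip-networking\nport=3306\n"

print(networking_disabled(blocked), networking_disabled(allowed))
```

To check a real installation, read the contents of your my.ini file into a string and pass it to the function. If it reports that networking is disabled, comment out the line and restart MySQL.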
• There are several log files that contain messages indicating
problems with the installation or running of Emustru.
– The emustru.log file in the temporary directory contains
messages about problems found in the PHP or
Java code.
– Entries in the Apache error log file, the MySQL log
file, and a PHP error log file may contain useful information
to debug a problem.
• The essay evaluate function starts a shell script from PHP
to run the Java code; the script can be found in the temporary
directory.
• Similarly, the other Java functions are run from PHP using
the shell_exec command, which may not work if
PHP is operating in safe mode.
B. Parts of Speech
Identifying the part of speech of a word will make it easier
to understand the meaning of the word as well as its context
in a sentence. The nine common parts of speech are noun,
pronoun, verb, adjective, adverb, conjunction, determiner,
interjection, and preposition (see Figures B.1 and B.2). It is
important to know parts of speech, not just to build grammatically
correct sentences, but also to learn vocabulary, comprehend
a passage, and score well in a Cloze test. You can practise
finding the part of speech of words in a sentence
using the Link Parser [18].
[Figure: examples of parts of speech. Determiner: at, the, a. Person: Tom. Place: Vienna. Thing: Piano. Pronoun: he, they, us, him.]
Figure B.2.: Conjunctions and Prepositions
[Figure: a sentence and its part of speech tags: verb, determiner, noun, conjunction, pronoun, verb, adjective.]
[Figure: action verbs. Describe, Analyze, Identify, Generate, Refer, Create, Perform, Speculate.]
is replaced with “It’s”. Other popular contractions include -
haven’t, couldn’t, and you’re.
[Figure: punctuation characters: ? ! . : ; ,]
In general, the semi-colon is used more often than the colon
to indicate a pause. It may be difficult to define the precise
length of a pause in a sentence and then look up the appropriate
punctuation character for that duration.
C. Word Lists
The PDF files below can be downloaded from
http://emustru.sf.net.
• http://emustru.sf.net/list_confused_words.pdf –
List of Sentences for Confused Words: Words such
as accept and except are sometimes used incorrectly
in sentences. This list includes a set of sample sentences
for every pair of confused words.
• http://emustru.sf.net/list_misspelled_words.pdf
– List of Misspelled Words: A collection of 2700
words that have been frequently misspelled.
• http://emustru.sf.net/list_preposition_errors.pdf
– List of Preposition Errors: A short list of common
preposition errors
• http://emustru.sf.net/list_sat_words.pdf – List
of SAT Words: A list of 8600 words that appear often
• http://emustru.sf.net/list_words_sfi.pdf – List
of 10K Words from Brown Corpus: A list of ten
thousand words and their associated standard frequency
index values from the Brown Corpus.
Bibliography
[1] http://en.wikipedia.org/wiki/Computer-assisted_language_learning,
Computer Assisted Language Learning (CALL).
[2] http://www.camsoftpartners.co.uk/freestuff.htm,
Free resources and articles on Computer Assisted
Language Learning.
[4] http://ftp.ets.org/pub/res/erater_iaai03_burstein.pdf,
Criterion: Online essay evaluation: An application
for automated evaluation of student essays.
[9] E. B. Page and N. S. Petersen: The computer moves into essay
grading: Updating the ancient test. Phi Delta Kappan,
76(7), 561-565.
[12] http://www.vantagelearning.com/school/products/intellimetric/,
Intellimetric, Vantage Learning.
[13] http://www.knowledge-technologies.com/prodIEA.shtml,
Intelligent Essay Assessor, Prentice Hall.
[18] http://www.link.cs.cmu.edu/link/submit-sentence-4.html,
The Link Parser from Carnegie Mellon University.
[26] http://www.vantagelearning.com/school/products/intellimetric,
Intellimetric, Vantage Learning.
[31] http://www.ets.org/Media/Research/pdf/r3.pdf :
The Ups and Downs of Preposition Error Detection in
ESL Writing, ETS, Princeton, NJ. 2008.
[34] http://en.wikipedia.org/wiki/Cosine_similarity:
Cosine similarity measure.