Abstract
This research investigates innovations for the computer transcription of handwritten Pitman's Shorthand as a rapid means of text entry (up to 100 words per minute) into today's pen-based handheld devices.
Two mathematical models are developed in this work. The first deals with high level phonetic-based translation, while the second is specifically concerned with low level primitive-based translation. Both models are closely related to the lexicon organization and contextual processing for online handwritten Pitman's Shorthand recognition.
A number of research issues that arise from interpreting handwritten Pitman's Shorthand strokes of digital ink as text are addressed, including: (a) a feasibility study into improving a conventional phonetic-based transliteration approach to advance word recognition; (b) an investigation into new Bayesian Network modelling of strokes and their relationships in order to solve the problem of geometric variations and vowel ambiguities of handwritten Pitman's Shorthand; (c) generation of a new machine-readable Pitman's Shorthand lexicon to facilitate the direct transcription of geometric features of Pitman's Shorthand into English text; (d) analysis of the impact of statistical language modelling on handwriting phrase recognition; and (e) a discussion of the graphical user interface issues relating to the development of a commercial prototype from the frame of reference of this research.
The research has been carried out in close cooperation with Nanyang Technological University (NTU) in Singapore. The system is currently undergoing a final evaluation in terms of its recognition accuracy, as well as its potential to be introduced as a commercially viable fast text input system.
Acknowledgements
I would like to take this opportunity to express my sincere gratitude to my supervisor, Dr. Colin Higgins, for his valuable guidance and constant support from the day I stepped into the School of Computer Science of the University of Nottingham to the successful completion of this research.
My sincere gratitude also goes to Professor Graham Leedham for his dedicated guidance and genuine assistance in maintaining the close collaboration between the two participating teams of this research. My deepest thanks also go to Ma Yang for her heartfelt contribution and her immediate responses during the critical times of this collaborative research.
My sincere thanks also go to Ms. Joyce Cox for her kind and professional help in proofreading the English of this thesis. Also, from the bottom of my heart, I am very grateful to all the participants who helped in the experiments of this research. Many thanks also go to my colleagues in the LTR Research group for their warm friendship, which made me feel at home in our LTR lab.
Also, my endless thanks to my uncle, Dr. Kyin Win, for supporting me financially and emotionally to make my dream of participating in doctoral research come true. My heartfelt thanks also go to the International Office and the School of Computer Science of the University of Nottingham for their enormous financial support for the development of this research.
Also, my sincere love and thanks to my parents, fiancé, and all my friends in Nottingham for supporting me financially, emotionally and spiritually during the difficult days of my long residence in Nottingham.
Last but not least, my sincere thanks to all the members of the School of Computer Science
of the University of Nottingham for all their help and advice, given to me when I needed it
most.
Thank you all, Swe Myo Htwe.
Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables

Chapter 1: Introduction
1.1 Background
1.1.1 Collaboration
1.1.2 Motivation
1.1.3 Scope
1.2 Brief Overview
1.2.1 General Objectives and Contributions
1.3 Synopsis of the Dissertation

Chapter 2: Background to the Automatic Recognition of Handwritten Pitman's Shorthand
2.4.3.1 Conditional independence
2.4.3.2 Inference
2.4.3.3 Learning

Chapter 3: Introduction
3.7.3 Performance evaluation of the word level transcription
3.8 Discussion

Chapter 4: Introduction
4.8.6 Analysis of word transcription accuracy for the special-rule data set

Chapter 5
5.1 Overview
5.1.1 Rule-based creation of the electronic Pitman's Shorthand lexicon
5.2 Structure of the Electronic Pitman's Shorthand Lexicon
5.2.1 Feature set
5.2.2 Key
5.2.3 Lexicon layout
5.3 Conversion Procedure
5.3.1 The importance of algorithms of the presented rules
5.3.2 Description of Rules
5.4 Experimental Results
5.4.1 Data set
5.4.2 Analysis of the accuracy of a machine-readable Pitman's Shorthand lexicon
5.4.3 Analysis of the distribution of homophones in machine-readable Pitman's Shorthand lexicons
5.5 Discussion

Chapter 6: Phrase Level Transcription of Online Handwritten Pitman's Shorthand Outlines

Chapter 7: Graphical User Interfaces of the Handwritten Pitman's Shorthand Recognition System
7.1 Overview
7.2 Ink data collection in this research
7.3 General training data collection tool
7.4 Developer graphical user interface
7.5 Shorthand data entry graphical user interfaces
7.6 Experimental Results
7.6.1 Analysis of the general distribution of user fondness for the presented prototypes
7.6.2 Analysis of the distribution of user fondness for the presented prototypes in the case of speed writing
7.6.3 Analysis of the distribution of user fondness for the presented prototypes in the case of a small amount of text entry into handheld devices
7.6.4 The comparison of the most favourite GUI of experienced shorthand writers and that of novice shorthand writers
7.7 Discussion

Chapter 8

References
Appendix
List of Figures
Figure 3.6: Sample of phoneme translation of a double length stroke
Figure 3.7: (a) Sample input of the phoneme ordering process (b) Sample output of the phoneme ordering process
Figure 3.8: Sample element of a phonetic lexicon in a hash table
Figure 3.9: Sample collected outlines
Figure 3.10: The distribution of homophones in different sized phonetic lexicons
Figure 3.11: Illustration of the incidence of phoneme variation due to confusion between a circle and a hook
Figure 3.12: Illustration of the incidence of phoneme variation due to length confusion
Figure 4.1: An abstract view of the whole system
Figure 4.12: Samples of the stroke combination data set
Figure 4.13: Two different shorthand outlines for the word "after"; (a) the word "after" written according to the direct conversion of phonemes into primitives (b) the word "after" written according to the double-length rule of Pitman's Shorthand
Figure 4.14: Screen shot of outlines written by writer A
Figure 4.15: Evaluation of the vocalised outline identification of the recognition engine
Figure 4.16: Evaluation of the segmentation accuracy of the recognition engine
Figure 4.17: Evaluation of the classification accuracy of the recognition engine
Figure 4.18: Illustration of a relationship between recognition accuracy and transcription accuracy of the single consonant data set
Figure 4.19: Comparison of the handwriting of two writers
Figure 4.20: Illustration of the word transcription accuracy of the single consonant data set
Figure 4.21: Illustration of the correction accuracy in comparison with the classification or vowel errors of the single consonant data set
Figure 4.22: Illustration of an average distribution of factors influencing the accuracy of a result list (single consonant data set)
Figure 4.23: Illustration of the relationship between recognition accuracy and transcription accuracy of the stroke-combination data set
Figure 4.24: Illustration of the word transcription accuracy of the stroke-combination data set
Figure 4.25: Illustration of the correction accuracy in comparison with the classification/vowel errors of the stroke-combination data set
Figure 4.26: Illustration of an average distribution of factors influencing the accuracy of a result list (stroke-combination data set)
Figure 4.27: Relationship between recognition accuracy and transcription accuracy of the special-rule data set
Figure 4.28: Evaluation of the word transcription accuracy of the special-rule data set
Figure 4.29: Illustration of the correction accuracy in comparison with classification or vowel errors of the special-rule data set
Figure 4.30: Illustration of an average distribution of factors influencing the accuracy of a result list (special-rule data set)
Figure 5.1: (a) Sample entries of a conventional Pitman's Shorthand dictionary available in book format (b) Sample entries of an electronic Pitman's Shorthand lexicon
Figure 5.2: Sample keys of the electronic Pitman's Shorthand lexicon; vowels are underlined
Figure 5.3: Sample entries of the electronic Pitman's Shorthand lexicon
Figure 5.4: Illustration of the conversion procedure
Figure 5.5: Illustration of the use of a dot primitive for the sound "com" at the beginning of a word
Figure 5.6: Illustration of the use of the negative prefix "ir-" in a vocalised outline
Figure 5.7: Illustration of the use of the PL hook in a vocalised outline
Figure 5.8: Illustration of a one syllable half-length outline
Figure 5.9: Illustration of the omission of the syllable "ter" in a vocalised outline
Figure 5.10: Illustration of incompatible primitive pairs for doubling
Figure 5.11: Sample entries of a machine-readable Pitman's Shorthand lexicon
Figure 5.12: Average accuracies of different sizes of machine-readable Pitman's Shorthand lexicons
Figure 5.13: Two different outlines for the word "weather"; (a) the word "weather" written according to the double-length rule of Pitman's Shorthand; (b) the word "weather" not written according to the double-length rule of Pitman's Shorthand
Figure 5.14: (a) Shorthand outline for the word "factor"; (b) shorthand outline for the word "further"
Figure 5.15: Two different shorthand outlines for the word "union"
Figure 5.16: Two different outlines for the word "landlord"
Figure 5.17: Two different outlines for the word "environment"
Figure 5.18: The distribution of different categories of errors in electronic Pitman's Shorthand lexicons of different sizes
Figure 5.19: The distribution of uniqueness of the electronic Pitman's Shorthand lexicons
Figure 6.1: Samples of Pitman's Shorthand outlines written in three different positions; (a) outlines written including vowel notations, (b) outlines written without vowel notations
Figure 6.2: Illustration of the handwritten Pitman's Shorthand phrase level transcription process
Figure 6.3: An abstract view of the object model Microsoft.Ink
Figure 6.4: Screen shots of the recognition results produced by the RecognizerContext API
Figure 6.5: Performance of the contextual rejection strategy
Figure 7.1: Front-end and back-end architecture of the system
Figure 7.2: Illustration of interactions between user interfaces and back-end engines of the system
Figure 7.3: Illustration of the Tablet PC platform APIs presented at [ref]
Figure 7.4: Illustration of the high level relationship of object models of the Tablet PC platform APIs
Figure 7.5: Home page of the Training Data Collector
Figure 7.6: Sample data entry page of the Training Data Collector GUI
Figure 7.7: Screen shot of the developer graphical interface
Figure 7.8: The first version of the collaborator's tablet PC interface for the handwritten Pitman's Shorthand recognition system
Figure 7.9: The latest version of the collaborator's tablet PC interface for the Pitman's Shorthand recognition system
Figure 7.10: Screenshot of a note-pad layout of the end-user interface of this research
Figure 7.11: Screenshot of an alternative layout of the end-user interface of this research
Figure 7.12: Thumbnails of the four GUIs evaluated in the experiment
Figure 7.13: The general distribution of user fondness for the presented prototypes
Figure 7.14: The distribution of user fondness for the presented prototypes in the case of speed writing
Figure 7.15: The distribution of user fondness for the presented prototypes in the case of a small amount of text entry into handheld devices
Figure 7.16: The comparison of the most favourite GUI of experienced shorthand writers and that of novice shorthand writers
Chapter 1 Introduction
Recently, there has been dramatic growth in the use of handheld devices as powerful appliances for collecting and distributing information efficiently. Companies and organizations worldwide are implementing mobile business solutions to accelerate business cycles, increase productivity and reduce operating costs through the use of mobile phones, tablet PCs, pocket PCs and Personal Digital Assistants (PDAs). Current handheld computers are applicable to daily business procedures; however, the ultimate usefulness of these handheld devices depends on a solution to a serious bottleneck: textual information needs to be entered as quickly and accurately as possible, similar to using a full size keyboard. Computers continue to get smaller and thinner; the thinnest tablet PC, recently launched by NEC, was merely 1 cm thick and weighed less than 1 kg at the time of writing. Shrinking the standard QWERTY keyboard onto these compact devices has not been effective; miniature keyboards make text entry very slow, at less than 10 words per minute (wpm) [Mt98].
This bottleneck has been a major concern for manufacturers of handheld devices, and decades of research and development have been invested in devising feasible means of text entry into mobile devices, resulting in commercial systems with four main types of text input method: (a) on-screen keyboards, (b) handwriting recognition systems, (c) gesture based text entry systems, and (d) speech recognition systems. The existing systems meet the fundamental requirement of inputting text into handheld devices, but a practical solution for rapid text input into handheld devices remains to be found.
This dissertation presents work on the research, design, implementation and evaluation of techniques that facilitate rapid text entry into a pen based computer at approximately the same rate as speech (i.e., more than 100 words per minute). It is based on Pitman's Shorthand, a speed-writing system widely practiced in the real time reporting community.
This chapter gives an overview of the linguistic post processing system of a handwritten Pitman's Shorthand recognizer. It mainly highlights the motivation and scope of the work. It also outlines the general objectives of the thesis and draws attention to the author's contribution to each objective. A synopsis of the thesis, explaining the structure of the dissertation along with a brief summary of each chapter, is given at the end of the chapter.
1.1 Background
1.1.1 Collaboration
The research in this thesis has been carried out in close cooperation with Nanyang Technological University (NTU) in Singapore, to the extent that a team from NTU contributed to the research and development of the low level classification of handwritten ink data, and a team from the University of Nottingham contributed to the transliteration of classified primitives into English words. The collaboration has been a great success, with several workshops held at NTU annually as well as a series of co-authored publications [HHL+04a], [HHL+04b], [HHL+04c], [YLH+04a], [YLH+04b], [HHL+05a], [HHL+05b], [HHL+05c], [YLH+05a], [YLH+05b], [YLH+05c]. In addition, concurrent development of the two engines (i.e., the recognition and transcription engines) has not been difficult, mainly due to the accessibility of the classified data of the recognition engine since the start of the project. This is because the collaborator had already carried out extensive research on the low level segmentation and classification of handwritten Pitman's Shorthand outlines for over two decades, and the collaborator's contribution to this research is, in fact, improving an existing recognition engine rather than developing a completely new one. Previous work by our collaborator can be found in [Lg84], [LD84], [LDB84], [LDB85], [LD86], [LD87], [QL89], [LQ89], [Lg89], [Lg90], [LQ90], [QL91], [NL92], [LQ92], [QL93]. The transcription engine and the work described in this thesis are, however, new.
1.1.2 Motivation
The major motive behind this research has been to investigate the linguistic post processing of handwritten Pitman's Shorthand as a rapid means of text entry on handheld devices, and to evaluate the overall performance via a tablet PC based demonstration system. This involves data pre-processing, lexicon preparation, word level interpretation, phrase level interpretation and the development of a Graphical User Interface (GUI). No earlier work fully presents a handwritten Pitman's Shorthand recognizer for handheld devices with a complete GUI.
One of the factors that makes the automatic recognition of handwritten Pitman's Shorthand promising is that the language itself is simple and fast to write. Pitman's Shorthand records speech phonetically and comprises simple notations for 24 consonants, 12 vowels and 4 diphthongs. It defines 90 of the most frequently used words as shortforms (i.e., single simple pen strokes invented for speed improvement purposes), and these 90 shortforms account for over 37% of the most commonly used English words [Lg90]. Personal Digital Assistants (PDAs), by contrast, are not productive enough to record speech in real time.
In addition, having a cooperative research network provides a firm foundation on which this research can be based. The linguistic post processing of handwritten Pitman's Shorthand can be taken as a further step, expanding what is already possible with a Pitman's Shorthand classifier as reported in the literature. The classifier supports noise reduction, outline segmentation and the classification of pattern primitives into related categories. It is a low level processing tool and its output is fed directly to the transcription engine.
Finally, hardware and technical viability played an important role in the successful development of the whole research. In recent years, handheld devices have become more easily accessible, with more powerful processors at cheaper prices. A number of mobile PC and tablet PC development tool kits have become available, and these factors have strengthened the feasibility of the research.
1.1.3 Scope
From a handwriting recognition perspective, this research relates to online recognition¹. It includes a minimal study of the low level processing of handwritten scripts, with deep research into the transliteration of shorthand primitives into orthographic English words. This incorporates theories and techniques of pattern recognition, natural language processing and mobile PC applications. Figure 1.1 illustrates a high level view of the scope of the thesis.
¹ In online recognition, the input is captured as pen coordinates in time order; whereas in off-line recognition, the input is in the form of a digital image of a handwritten word.
Figure 1.1: A high level view of the scope of the thesis: the research draws on pattern recognition (online handwriting recognition, syntactic knowledge), natural language processing (lexical and semantic knowledge, statistical language models), and pen based PC, tablet PC and handheld device applications.
Three areas have been investigated in the field of pattern recognition. The first is concerned with setting protocols to interrelate a linguistic post processor with a low level classification engine; without the successful integration of these two engines, the work in this thesis would not have been feasible. The second consists of defining a network model that not only best represents the natural ambiguity of handwritten Pitman's Shorthand, but also produces promising output for a written word. The third area focuses on investigating relevant word rejection strategies in which the interpretation cost is taken into account, mainly in terms of search time and storage requirements.
In the field of natural language processing, a substantial amount of work has been done on the construction of a shorthand lexicon to support word level transcription. This mainly includes the application of rule based algorithms to simulate the instinctive knowledge gained from learning Pitman's Shorthand, and the creation of a shorthand dictionary based on this knowledge. In addition, a survey of the impact of statistical language modelling on handwriting recognition has been carried out in relation to phrase level transcription.
In the field of mobile PC applications, three types of end user interface have been developed in this research: (1) a Training Data Collector, (2) an Advanced User Controller and (3) a Final User Interface. Using the Training Data Collector, a vast amount of training data can be collected effectively; using the Advanced User Controller, a developer can gain deep insight into the structure of the system and make changes to the low level parameter settings; and using the Final User Interface, a user has a front-end view of the system and can practice real time shorthand input on handheld devices. The development of the interfaces includes the application of pen based APIs, analysis of parameters of the transcription engine, collection of training and testing data, and evaluation of the overall system performance.
How effectively has the Pitman's Shorthand linguistic post processor been integrated with the collaborator's low level recognition engine, given that the two engines were developed in different countries?
The solution includes extensive collaboration between the two teams: the author's annual visits to the partner's institution, protocols for the data flow and the modification of components between the two systems, concurrent evaluation of the whole system at both sites, and the co-authored publication of progress reports.
What are the tasks of the recognition engine and the transcription engine in general?
A high level view of the tasks of the recognition and transcription engines is shown in Figure 1.2. The white boxes at the top of Figure 1.2 represent processes of the recognition engine, and the shaded boxes represent tasks taken by the transcription engine. The sample input outlines in Figure 1.2 illustrate the functions of the recognition and transcription engines.
To what extent is the linguistic post processor based on previous work?
Figure 1.2: A high level view of the scope of the recognition engine and the transcription engine. The collaborator's recognition engine collects the pen coordinates of an input outline, segments them, and classifies each segment into a small set of possible primitive types (e.g., 3 possible types of Segment 1); the author's transcription engine then performs word level transcription (e.g., producing the candidates worn, warm, storm for one sample outline and sudden, welcome, seldom for another) and phrase level transcription to select the result word(s).
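To make the data flow in Figure 1.2 concrete, the following is a minimal Python sketch of the hand-off between the two engines. The type names, fields and example values are illustrative assumptions for this sketch, not the actual interfaces used in this research.

    from dataclasses import dataclass
    from typing import List, Tuple

    # Hypothetical containers for the data exchanged between the engines.
    @dataclass
    class Segment:
        points: List[Tuple[float, float]]   # pen coordinates (x, y) in time order

    @dataclass
    class ClassifiedSegment:
        candidates: List[str]               # e.g. 3 possible primitive types per segment
        scores: List[float]                 # classifier confidence for each candidate

    def recognition_engine(ink: List[Tuple[float, float]]) -> List[ClassifiedSegment]:
        """Collaborator's side: segment the raw ink and classify each segment.
        (Stub: a real engine applies noise reduction, segmentation, classification.)"""
        segments = [Segment(points=ink)]    # trivial single-segment split for the sketch
        return [ClassifiedSegment(candidates=["R", "N", "M"], scores=[0.5, 0.3, 0.2])
                for _ in segments]

    def transcription_engine(classified: List[ClassifiedSegment]) -> List[str]:
        """Author's side: map classified primitives to candidate English words.
        (Stub: a real engine consults the shorthand lexicon and context.)"""
        return ["worn", "warm", "storm"]    # candidate words, best first

    ink = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.5)]
    words = transcription_engine(recognition_engine(ink))
    print(words[0])                         # -> worn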
What new approaches are there in this linguistic post processing research?
The significant new approaches of this thesis are:
The complete word and phrase level transcription of handwritten Pitman's Shorthand is reported in this thesis, whereas most of the work in the literature emphasized only an initial segmentation and classification of shorthand primitives.
(d) The first tablet PC based demo system has been produced. This allows a future researcher to gain deep insight into the performance of the recognition and transcription engines via functional interfaces. It also enables an end user to input shorthand into a handheld device.
The system is intended to be applicable to any pen based mobile device for which the use of a traditional QWERTY keyboard is impractical. The experiments and evaluations in this thesis are based on tablet PCs running Microsoft Windows XP Tablet PC Edition 2005.
What kind of people are involved in the training and testing of the overall system?
In order to evaluate the realistic performance of the whole system, the training and testing involve writers of different levels of skill in Pitman's Shorthand, different genders and ages, and different levels of tidiness. The evaluation process also involves a list of practical concerns such as usability, learning curve, and the popularity/commercial viability of the system.
This chapter (Chapter 1) presents the motivation, scope and background of the research. It introduces the three main problem areas relating to the themes of the thesis, and the major objectives and contributions.
Chapter 2 reviews key concepts in the areas of Pitman's Shorthand recognition, pattern recognition and natural language processing. The focus in Pitman's Shorthand recognition is on the evaluation of existing text entry methods for handheld devices, the study of Pitman's Shorthand, and the review of existing approaches applied to the automatic recognition of handwritten Pitman's Shorthand. The focus in pattern recognition is on the analysis of the capabilities of commonly used graphical models to resolve the natural ambiguities of handwriting. Finally, the focus in natural language processing is on the review of the Viterbi algorithm and the statistical language modelling techniques used to enhance the solution to the phrase level transcription problem.
Chapter 3 reports on a prototype that implements the architecture designs described in the literature. It evaluates the phonetic based transcription of handwritten Pitman's Shorthand outlines and presents the problems that need resolving.
Chapter 4 presents the main architecture and design of a novel primitive based transcription approach. Ambiguities of handwritten Pitman's Shorthand, in particular stroke variations and vowel omissions, are resolved by introducing Bayesian Network based shorthand outline models. The word interpretation includes outline model creation, belief propagation, Bayesian Network based learning and model selection. The conceptual solution is shown to improve the solution to the word level transcription problem.
Chapter 6 proposes a Viterbi algorithm based framework to resolve the Pitman's Shorthand specific phrase level transcription problem. The framework incorporates Pitman's Shorthand related contextual knowledge. Experimental results demonstrate the practical benefits of the proposed framework.
Chapter 7 documents the roles of the graphical user interfaces of this research, which are designed for the developer's authoring environment, the experimental user's authoring environment, and the end-user's authoring environment. Experimental results substantiate the feasibility of the proposed interfaces.
This thesis supports the argument that the development of an automatic handwritten Pitman's Shorthand interpreter is feasible and useful. Chapter 8 underlines this argument by reviewing the dissertation's key points, linking the results to the general objectives, highlighting the contributions and presenting prospective future work.
Background to the Automatic Recognition of Handwritten Pitman's Shorthand
Chapter 2 Introduction
This chapter provides background information on the computer aided recognition and interpretation of handwritten Pitman's Shorthand. It comprises seven sections, the last of which is a summary.
2.1.1 On-Screen Keyboards vs. Handwritten Pitman's Shorthand Recognizer
On the whole, Pitman's Shorthand has a number of strengths that facilitate very rapid writing, but it also has a drawback: a long learning curve, which includes memorizing new phonetic symbols and pronouncing words according to a number of rules. Having said that, there are millions of Pitman's Shorthand writers who have received training in its use [Lg90], and most of them remark that it is worth learning despite some frustration at the time of learning. Therefore, the automatic recognition of handwritten Pitman's Shorthand is intended to benefit a particular group of stenographers, plus interested users who are dedicated to achieving fast data entry on handheld devices.
On the whole, gesture based text entry systems facilitate faster data input than normal cursive handwriting recognizers; however, memorizing gestures for a substantial number of words results in a very steep learning curve.
In general, Pitman's Shorthand recognition is similar to gesture recognition, since both interpret a series of lines as words and provide fast data input. However, there is no
Figure 2.1: Illustration of text entry using the SHARK system; (a) the word "quick" is written using the ATOMIK keyboard layout (b) the word "quick" is written without using a template keyboard.
2.1.4 Speech Recognition Systems vs. Handwritten Pitman's Shorthand Recognizer
In terms of efficiency and operational cost, speech recognition systems appear the most attractive of the data input methods, because users can speak naturally as well as rapidly (around 100-120 words per minute). An example is the real time subtitling of TV programs, where speech is automatically transcribed into text and the cost of manual retranscription is reduced. A primary negative aspect of speech recognition systems is that data must be spoken. It is not always feasible to input data via voice; for instance, the automatic transcription of a noisy debate using a speech recognition system is considerably difficult unless speakers can be persuaded to use microphones. This motivates the development of a system that provides an alternative means of recording speech without using speech input.
Words are written as they are pronounced, and the main feature of Pitman's Shorthand is the simplicity of its notations. There are 24 consonants, 12 vowels and 4 diphthongs in Pitman's Shorthand. The skeleton of a shorthand outline is formed by a combination of consonant strokes, and the writing of vowels is optional. This means it is essential to write the consonant strokes of a word, but vowel notations can be omitted when the writing needs to be fast. There is no standard rule defined for the omission of vowels; it varies widely depending on a writer's experience or an individual's inclination.
Due to the phonetic formation of words, Pitman's Shorthand is easily adaptable to multiple languages (15 languages to date). It is practiced as a speech-recording medium in the real time reporting community at a practical rate of about 120-180 words per minute [Lg90]. It is widely used in offices in the UK and is also taught in 74 other countries [Lg90].
Figure 2.2 illustrates 21 of the 24 basic Pitman's consonants in three easily remembered diagrams. To understand the notation, consider the leftmost stroke in Figure 2.2 (a): the notations for the phonemes /P/ and /B/ are the same down-stroke drawn with different line thicknesses. Similarly, according to Figure 2.2 (b), the notations for the phonemes /F/ and /V/ likewise differ only in line thickness.
20
P, B
P, B
S, Z
th, TH
K, G
F, V
SH, ZH
N, NG
(a)
(b)
(c)
In addition to the 21 consonants in Figure 2.2, there are three further consonants in Pitman's Shorthand: /W/, /Y/ and /H/. These consonants are formed using hooks and upstrokes, as shown in Figure 2.3. Vowels and diphthongs are simple pen strokes and are illustrated in Figure 2.4.
Figure 2.4: Vowel, diphthong and diphone notations of Pitman's Shorthand.
Words are constructed from consonant and vowel notations in Pitman's Shorthand, and a script containing both consonants and vowels is called a vocalized outline. Samples of vocalized outlines, including notations of vowels, diphones and diphthongs, are illustrated in Figure 2.5.
Figure 2.5: Sample vocalized outlines for the words "bait", "go", "radio" and "time", with vowel, diphone and diphthong notations.
Using the basic notations illustrated in Figure 2.2, Figure 2.3 and Figure 2.4, a person can write a shorthand outline that is phonetically correct, but not in complete accordance with the special rules of Pitman's Shorthand. The special rules comprise 20 definitions, invented for speed enhancement purposes, which need to be memorized thoroughly by anyone who wants to become a professional Pitman's Shorthand writer. Details of the special rules of Pitman's Shorthand can be found in [Oj95]; one of them is given as an example here. In the example (Figure 2.6), the word "play", comprising three phonemes (/P/, /L/ and a vowel), can be written phonetically using the basic notations of Pitman's Shorthand as shown in Figure 2.6 (b). However, one of the special rules of Pitman's Shorthand reads: if the phoneme /P/ is followed by the phoneme /L/, the notation for /L/ is transformed into a small hook attached to the beginning of the /P/ stroke. Therefore, the word "play" should be written in the form of Figure 2.6 (c) rather than Figure 2.6 (b), although the form in (b) is phonetically correct.
Figure 2.6: (a) Basic notations of Pitman's Shorthand (b) The word "play" written phonetically using basic notations (c) The word "play" written using a special rule of Pitman's Shorthand
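To illustrate how such a special rule can be captured in software, here is a minimal sketch of a rule that rewrites a phoneme sequence into primitives. The primitive labels (P_STROKE, P_STROKE_WITH_L_HOOK, etc.) are invented for this illustration and are not the encoding used in this thesis.

    # A minimal sketch of applying the PL-hook special rule to a phoneme sequence.
    # Primitive names below are hypothetical labels, not the thesis's actual encoding.
    def phonemes_to_primitives(phonemes):
        primitives = []
        i = 0
        while i < len(phonemes):
            # Special rule: /P/ followed by /L/ becomes a P stroke with an initial L hook.
            if phonemes[i] == "P" and i + 1 < len(phonemes) and phonemes[i + 1] == "L":
                primitives.append("P_STROKE_WITH_L_HOOK")
                i += 2
            else:
                primitives.append(phonemes[i] + "_STROKE")
                i += 1
        return primitives

    print(phonemes_to_primitives(["P", "L", "AY"]))
    # -> ['P_STROKE_WITH_L_HOOK', 'AY_STROKE']  (cf. Figure 2.6 (c))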
Figure 2.7: (a) Samples of short forms ("a/an", "all", "and", "as/has", "do", "eye/I", "have") (b) Samples of phrases ("your company", "I am not")
Concurrent work [Lg84] investigated this idea in more detail, and further work [LD86] evaluated the enormous potential of the online recognition of handwritten Pitman's Shorthand for the real time recording of speech (e.g., verbatim reporting of meetings and court proceedings). In this approach, four main studies were carried out: (a) detection of consonant boundaries in a whole outline, (b) classification of segmented consonant strokes, (c) evaluation of the confusion of normal-length strokes with half-length and double-length strokes, and (d) evaluation of various inclinations of horizontal and vertical strokes. In addition, different classification algorithms were used to classify vocalized outlines and short-forms in this approach. The best classification rate reported at the time was 14.5%.
In the early 1990s, extensive research was carried out to improve the recognition of vocalised outlines and short-forms. The use of prior knowledge was found to be the most feasible means of improving the recognition of short-forms, for which recognition was based on a template-matching algorithm. Transliteration was carried out by first sorting classified pattern primitives into correct linguistic order, then converting primitives into phonemes using a set of production rules, and finally converting phonemes into orthographic English words. The concept of a "machinography", that is, how to modify the original Pitman's notations to be ideally suited to machine recognition, was also addressed in this work.
In later work [LQ90], the basic notations of Pitman's Shorthand were categorized into 89 basic features and incorporated into a neural network. Concurrently, Leedham and Qiao [LQ90] carried out another experiment to evaluate classification performance using a fuzzy classifier. In this approach, classification (90% correct) was achieved through interaction between the segmentation and classification processes. Initial classification errors were also corrected using knowledge of legal primitive pairs.
In 1993, Qiao and Leedham [QL93] took another innovative approach to classifying segmented primitives. Their method allowed communication between bottom up processes (i.e., segmentation based classification) and top down processes (i.e., holistic classification) via an interactive heuristic (IH) search schema. They reported that locating a boundary between features without first recognizing a whole outline was difficult. The performance of their work was 84% correct segmentation and 58% correct classification.
In the early 2000s, another research group [NB02] [KSN+03] [SKN+04] [KSN04] started investigating the off-line automatic recognition of handwritten Pitman's Shorthand. This group concentrated more on the linguistic post-processing of classified primitives into orthographic English words. Similar to Leedham's approach, phonetic based transcription using the same concept of vowel ordering was implemented. The incidence of homophones (outlines that are written identically but represent different words) was addressed in their work, and the filtering of homophones using domain based and context based rejection strategies was investigated. They noted that an ordinary phonetic dictionary was not adequate for generating text, and that a modified dictionary designed specifically for the recognition of Pitman's Shorthand was necessary. On the whole, a major limitation of their work was an impractical assumption about homophones, i.e., only two homophones per word were considered.
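As a concrete illustration of why homophones matter for dictionary design, the sketch below models a lexicon keyed by primitive sequences whose entries are full homophone lists, together with a simple rejection strategy. The keys, word lists and vocabulary are invented examples, not entries from the lexicon developed later in this thesis.

    # Hypothetical machine-readable shorthand lexicon: each key is a canonical
    # primitive/phoneme sequence, each value the full homophone list for that key.
    lexicon = {
        "W-R-M": ["worm", "warm"],       # vowel omitted: several words share a key
        "W-R-N": ["worn", "warn"],
        "S-L-D-M": ["seldom"],
    }

    def candidate_words(key, reject=None):
        """Look up all homophones for a key, optionally filtering with a
        rejection strategy (e.g., a domain or context based predicate)."""
        words = lexicon.get(key, [])
        if reject is not None:
            words = [w for w in words if not reject(w)]
        return words

    # Domain based rejection example: discard words absent from a task vocabulary.
    task_vocabulary = {"warm", "worn", "seldom"}
    print(candidate_words("W-R-M", reject=lambda w: w not in task_vocabulary))
    # -> ['warm']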
In summary, the work carried out by previous researchers mainly emphasized the low level segmentation and classification of shorthand primitives, with little work reported on the back-end transliteration. This thesis proposes that further extensive research is required to improve word level transcription as well as phrase level transcription. To achieve this goal, it is first necessary to make a thorough evaluation of recent popular handwriting recognition algorithms and natural language processing algorithms.
Low level processing alone can only achieve a reasonable separation of strokes and provide the likely identity of each stroke. It is necessary to take the context of strokes into account to achieve a promising interpretation; however, dealing with spatial context can easily become computationally intensive [BSH04]. For optimum text interpretation, it is practical to strike a balance between context and the low level ink information of strokes.
In the field of handwriting recognition, a common approach to handling variables (e.g., context and observed ink information) is to embed them in a probabilistic model and discriminate between them based on the resulting probabilities. Graphical models are considered here. Graphical models are a marriage between probability theory and graph theory [Jm99]. They come in two kinds: undirected and directed models. Undirected models have simple definitions of independence, whereas directed models have a more complicated notion of independence [Mk98]. There is huge uncertainty and complexity in the word recognition of handwritten shorthand, and directed models are more suitable for representing the features of shorthand as well as the interdependencies between them. Popular directed graphical models are Hidden Markov Models (HMMs), Neural Networks and Bayesian Networks. In general, these models belong to the same family; for example, an HMM is a kind of dynamic Bayesian Network, and a Neural Network can be related to an input/output HMM. The primary difference between them is the way variables are structured (i.e., the topology) and the way interdependencies between variables are handled.
Figure 2.8: A sample HMM for a single outline of Pitman's Shorthand, with states S1, S2, S3, ..., Si and observed nodes T1, T2, T3, ..., Ti. At each state i, the probability of a particular stroke Si being of type Ti is observed.
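To make the state/observation structure of Figure 2.8 concrete, the following sketch evaluates an observation sequence with the standard forward algorithm. The two stroke types and all probabilities are made-up toy values, not parameters from this research.

    # Toy HMM over stroke types, evaluated with the standard forward algorithm.
    # States and probabilities are illustrative only.
    states = ["straight", "curved"]
    start = {"straight": 0.6, "curved": 0.4}
    trans = {"straight": {"straight": 0.7, "curved": 0.3},
             "curved":   {"straight": 0.4, "curved": 0.6}}
    emit  = {"straight": {"short": 0.8, "long": 0.2},   # observed stroke lengths
             "curved":   {"short": 0.3, "long": 0.7}}

    def forward(observations):
        """Return p(observations) by summing over all state paths."""
        alpha = {s: start[s] * emit[s][observations[0]] for s in states}
        for obs in observations[1:]:
            alpha = {s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
                     for s in states}
        return sum(alpha.values())

    print(round(forward(["short", "long", "long"]), 4))   # -> 0.1025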
There are several kinds of HMMs, depending on network topology: HMMs with a mixture of Gaussian outputs, input-output HMMs and factorial HMMs. Details of these algorithms can be found in the literature [Mk01], [Rl89].
In the field of pattern recognition, many systems have applied HMMs; examples include the representation of utterances as HMMs for speech recognition [Sa04], [MS04]; the representation of facial images (combinations of hair, forehead, eyes, nose and mouth) as HMMs for face recognition [HSS02], [KKL03]; the representation of words as HMMs for handwriting recognition [GB04], [HLB00]; the representation of human motion as HMMs for gesture recognition [CFH03], [KP01]; and the representation of pen-gestures (e.g., writing pressure and smoothness of a line) as HMMs for signature recognition [JBS05], [YWP95].
Generally, HMMs work extremely well for certain types of application; however, the Markov assumption itself, i.e., that the probability of being in a given state at time t depends only on the state at time t-1, is not always appropriate for problems where dependencies extend over other states [Rl89].
Neural Networks consist of nodes connected via weighted links, where a weight specifies the strength of a particular connection between one node and another. The use of Neural Networks has been demonstrated in several pattern recognition applications [Ri93]. Like HMMs, Neural Networks have been devised in different types, including single-layer linear networks, threshold networks, multilayer networks and multilayer networks with learning.
Figure 2.9: An individual cell of the Neural Network modelled for the classification of handwritten Pitman's Shorthand in [LQ90], taking input from the classifier, a bias, and 89 links to the following layer.
In that network there are 20 layers, and each layer (i.e., each segment) consists of 89 nodes representing the 89 basic Pitman's primitives. Only one node from each layer is capable of activating the next layer, and the activation is based on competition among the nodes. A major drawback of this model is the unnecessary consideration of a wide range of primitives in each layer. In fact, by using the context of an outline and a shorthand dictionary, the number of nodes required for each layer can be reduced.
In Pitman's Shorthand, stroke relationships refer to the occurrence of vowel notations and their positions in a vocalized outline, and to the starting position of the first consonant stroke, i.e., whether it is written above, on or below the base line. An example of vowel dependency and an example of positional dependency of the first consonant stroke are illustrated in Figure 2.10 (a) and (b) respectively. As shown in Figure 2.10 (a), a dot vowel written at two different locations (i.e., the beginning and the end of a stroke) represents two different words in Pitman's Shorthand. Similarly, two identical outlines written at two different starting positions (i.e., above and below the base line) represent two different words in Figure 2.10 (b).
Figure 2.10: (a) Vowel position dependency: the words "aid" and "eat" are distinguished by where the dot vowel is written; (b) positional dependency of the first consonant stroke relative to the base line: the words "bath" and "bathe" (the first consonant B written on the base line).
Three aspects of Bayesian Networks are discussed in the following subsections:
- Conditional independence
- Inference
- Learning
Figure 2.12: A sample Bayesian Network of the "wet grass" example, with the nodes Cloudy, Sprinkler (S), Rain (R) and Wet grass (W), and the CPT of the W node:

    S    R    P(W=T)    P(W=F)
    T    T    0.98      0.02
    T    F    0.95      0.05
    F    F    0.0       1.0
    F    T    0.94      0.06
Therefore, whether a node is observed or hidden in a Bayesian Network has a huge influence on the conditional dependency between variables. By using the Bayes ball algorithm [Sr98], conditional independence between variables can easily be determined from the information on which nodes are hidden and which are observed. The Bayes ball algorithm is illustrated in Figure 2.13.
Figure 2.13: Illustration of the Bayes ball algorithm [Sr98], with hidden and observed nodes marked. If there is no flow of a ball from A to B in a graph, A and B are conditionally independent given a set of observed or hidden variables X, and vice versa.
In addition, every node in a Bayesian Network needs to be specified with a Conditional Probability Distribution (CPD); a table holding these distribution values is called a Conditional Probability Table (CPT). A sample CPT of the W node is shown in Figure 2.12. The table indicates the likelihood of the grass being wet given whether the sprinkler was on and/or whether it has rained.
2.4.3.2 Inference
One of the reasons why Bayesian Networks are useful is that they permit an efficient inference procedure [Ja99]. Inference can be categorized into two types: exact and approximate. Exact inference procedures are useful when the network structure is not too complex; approximate inference procedures work better in practice when a model becomes computationally complicated, such as models with repetitive structure or large clusters. Examples of exact inference algorithms include local message passing [Pj88], [PS91] and the junction tree algorithm [HD96], [CDL+99]. Popular approximate inference methods include Monte Carlo sampling [MD98], variational techniques [SJJ96], [JGJ+98], [JJ98], and loopy belief propagation [WF99], [Wy00], [FW00].
2.4.3.3 Learning
In ML learning, the goal is to find the parameter setting that maximizes the likelihood of the training data, whose cases are assumed to be independent. Assuming that D = (D1, ..., DM) is a training data set containing M cases, the maximum (optimal) likelihood estimate of the parameters θ of each node can be denoted as

    θ_ML = arg max_θ P(D | θ)    (2.1)

In MAP learning, Maximum a Posteriori (MAP) estimation assumes the existence of a prior p(θ) over the parameters [Ja99]. It prevents a parameter configuration that is never seen in the training samples from receiving zero probability, through the use of a Dirichlet prior; the risk of zero probabilities arises because the algorithm is based on counting. For the wet grass example in Figure 2.12, the MAP estimate of the wet grass node, including the Dirichlet prior, can be denoted as:

    P_MAP(W = w | S = s, R = r) = (N(W = w, S = s, R = r) + α) / (N(S = s, R = r) + β)    (2.2)

where N(·) is the number of times the corresponding parameter configuration is found to be true or false in the training data, and α and β are uniform Dirichlet priors, used when a particular configuration is not seen in the training set. In general, MAP is used when there is a small number of training cases compared to the number of parameters [Mk01]; however, it is still important that the counts are based on sufficient statistics to achieve an optimal estimation.
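The counting in equation (2.2) is easy to make concrete. The following sketch estimates the smoothed CPT entry for W from a handful of invented training cases; the data tuples and the pseudo-counts α = 1, β = 2 are toy values for illustration only.

    # MAP (Dirichlet-smoothed) estimate of P(W=T | S, R) from counted training cases.
    # The data tuples (s, r, w) are invented toy cases, not data from this research.
    data = [(True, True, True), (True, True, True), (True, False, True),
            (False, True, True), (False, True, False), (False, False, False)]

    def p_map(s, r, alpha=1.0, beta=2.0):
        """Equation (2.2): add pseudo-counts so unseen configurations get
        a non-zero probability (alpha for the outcome, beta for the parent total)."""
        n_joint = sum(1 for (si, ri, wi) in data if (si, ri, wi) == (s, r, True))
        n_parent = sum(1 for (si, ri, _) in data if (si, ri) == (s, r))
        return (n_joint + alpha) / (n_parent + beta)

    print(round(p_map(True, True), 3))    # -> 0.75: both (T,T) cases were wet
    print(round(p_map(False, False), 3))  # -> 0.333: smoothing keeps it non-zero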
Expectation Maximization (EM) is mainly used when variables are partially observable, i.e., when the network contains some hidden nodes. It computes the expected values of all hidden nodes using an inference algorithm (the E step), and then treats these expected values as though they were observed when re-estimating the parameters (the M step) [Mk01]. When using EM, it is important to know the structure of the model in advance, as this is the key to identifying any hidden nodes. In the case of the wet grass example in Figure 2.12, the EM estimate of the W node can be denoted as

    P_EM(W = w | S = s, R = r) = E(W = w, S = s, R = r) / E(S = s, R = r)    (2.3)

where E(·) is the number of times the corresponding parameter configuration is expected to occur. According to Murphy [Mk01], E(·) is computed as

    E(e) = Σ_m P(e | Dm)    (2.4)

i.e., the indicator count I(e | Dm) used when the data are fully observed is replaced by the probability P(e | Dm) inferred for each training case Dm.
2.5.1 Statistical Language Modelling
[MS99] states that the major purpose of statistical language modelling is to capture a language's regularities via statistical inference on its corpus. According to [QAC05], the concept of applying statistical language models to automatic text transcription originated in speech recognition research. [Ms01], [QAC05] and [MB01], [ZB04], [VBB04] applied statistical language modelling techniques to resolve the problems of online and offline handwritten sentence recognition, respectively, and the work in [QAC05] achieved up to 90.4% word recognition accuracy.
In general, the most commonly used statistical language models in the field of handwriting recognition are n-gram models, which are denoted as follows by [QAC05]:
35
p (W ) p ( wi | wii1n 1 )
(2.5)
i 1
where p(W) is the probability of a word sequence given by a statistical language model, and
= arg max p ( S | W ) p (W )
(2.6)
where is the most likely word sequence for a written sentence (out of the candidate
sequences W), S is a given handwritten sentence to recognise, P(S|W) is the posterior
probability of the written sentence S given a sequence W, and p(W) is the statistical language
models probability for the sequence W. This work identifies the most likely word sequence
for a written sentence by finding the best path in a word graph (i.e., a graphical model of a
sentences candidate words) using a Viterbi search algorithm [QAC05].
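As an illustration of equation 2.5, the following sketch estimates bigram probabilities (n = 2) from a toy corpus by maximum likelihood counting and scores a word sequence; the corpus and the absence of smoothing are simplifying assumptions, not features of the cited systems.

    from collections import Counter

    def train_bigram(corpus):
        # corpus: list of sentences, each a list of words
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus:
            words = ["<s>"] + sentence
            unigrams.update(words)
            bigrams.update(zip(words, words[1:]))
        return unigrams, bigrams

    def sequence_probability(words, unigrams, bigrams):
        # p(W) = product over i of p(w_i | w_{i-1})  (equation 2.5 with n = 2)
        p, prev = 1.0, "<s>"
        for w in words:
            p *= bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
            prev = w
        return p

    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    uni, bi = train_bigram(corpus)
    print(sequence_probability(["the", "cat", "sat"], uni, bi))  # 1.0 * 0.5 * 1.0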
2.5.2 Viterbi Algorithm
The Viterbi algorithm provides an efficient way of finding the most likely state sequence, in the Maximum a Posteriori (MAP) probability sense, of a process that is assumed to be a finite-state discrete-time Markov process [Ml00]. Here, finite state means that the number of states in the model is limited, discrete-time means that it takes the same unit of time to get from any state to its adjacent state in the model, and the Markov property means that (assuming a first order Markov process) the probability of being in state c_k at time k, given all states up to k-1, depends only on the previous state c_{k-1} at time k-1. [Ml00] formulates the first order Markov process as follows:

p(c_k \mid c_0, c_1, ..., c_{k-1}) = p(c_k \mid c_{k-1})    (2.7)

Overall, the Markov process can be of any order, and the nth order Markov process is defined as:

p(c_k \mid c_0, c_1, ..., c_{k-1}) = p(c_k \mid c_{k-n}, ..., c_{k-1})    (2.8)
In order to clarify the Viterbi algorithm's role in handwriting recognition, consider the Viterbi algorithm (formula 2.9) proposed by [Ml00] for handwritten word recognition, in which the process is assumed to be a first order Markov process:

g_C(Z) = \max_{C} \prod_{i=1}^{n} p(c_i \mid c_{i-1}) \, p(z_i \mid c_i)    (2.9)

where g_C(Z) is the maximum posterior probability of the sequence of characters conditioned on the candidate character sequence C = c_1, c_2, ..., c_n, and z_i is a feature vector for the ith character.
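A compact dynamic programming form of the Viterbi search described above is sketched below in Python; the states and the transition and emission tables are hypothetical stand-ins for the character classes and probabilities of formula 2.9.

    def viterbi(observations, states, start_p, trans_p, emit_p):
        # delta[k][c]: probability of the best state sequence ending in c at step k
        delta = [{c: start_p[c] * emit_p[c][observations[0]] for c in states}]
        back = [{}]
        for k in range(1, len(observations)):
            delta.append({})
            back.append({})
            for c in states:
                # first order Markov assumption: only the previous state matters
                prev, p = max(((q, delta[k - 1][q] * trans_p[q][c]) for q in states),
                              key=lambda t: t[1])
                delta[k][c] = p * emit_p[c][observations[k]]
                back[k][c] = prev
        # backtrack from the most probable final state
        last = max(delta[-1], key=delta[-1].get)
        path = [last]
        for k in range(len(observations) - 1, 0, -1):
            path.insert(0, back[k][path[0]])
        return path

    # hypothetical two-state example
    states = ["A", "B"]
    start_p = {"A": 0.6, "B": 0.4}
    trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
    emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
    print(viterbi(["x", "y", "y"], states, start_p, trans_p, emit_p))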
2.7 Summary
This chapter presented a literature review of systems and techniques relating to the computer aided recognition and transcription of handwritten Pitman's Shorthand. The commercial viability of a handwritten Pitman's Shorthand recogniser was evaluated against the functionality of the existing text entry systems of handheld devices. The chapter presented basic information on Pitman's Shorthand, which is vital to enable the reader to easily follow the discussions in this thesis, and it also provided brief reviews of decades of previous work on the automatic recognition of handwritten Pitman's Shorthand. A number of graphical models applied to the pattern recognition field were discussed, with a thorough algorithmic review of the Bayesian Network architecture, mainly from the aspect of the algorithm's efficiency in handling handwritten Pitman's Shorthand word recognition problems. The role of statistical language models in the recognition of handwritten sentences was also addressed, together with a review of the Viterbi algorithm. The chapter also highlighted tablet PC related application program interfaces (APIs) that are essential for the development of a commercially viable prototype handwritten Pitman's Shorthand recogniser.
Chapter 3 Introduction
The previous chapter reviewed the performance of existing work on the automatic recognition of handwritten Pitman's Shorthand and presented an overview of popular pattern recognition algorithms that can be used to improve the performance of word level and phrase level recognition. Before taking the next step to advanced word and phrase recognition, this chapter first presents a preliminary experiment, carried out to verify whether the existing transliteration methods proposed in the literature are efficient enough for the purposes of this project. Taking into consideration the transcription accuracy achieved by existing systems, this research is not based on the assumption that phonetic based transcription is the only absolute solution for transliterating handwritten Pitman's Shorthand.
The direct translation of primitives into words was not feasible at the time of the previous work because there was no electronic Pitman's Shorthand lexicon that enabled primitives to be directly mapped to related words. Once an electronic Pitman's Shorthand lexicon exists, the direct translation of primitives into text becomes feasible. It is therefore proposed in this research to create an electronic Pitman's Shorthand lexicon and analyse a primitive-to-text translation approach. However, a careful appraisal of conventional methods is performed before implementing a new algorithm. This chapter examines the advantages and disadvantages of phonetic based translation via experimental results.
In general, the appraisal of existing methods can be carried out easily if the existing systems serve the purpose of the assessment directly. However, this is not the case in the current assessment (i.e., the assessment of conventional phonetic based transcription methods). There are two reasons for this: firstly, previous work by [LQ90], [QL93], [LD86] mainly emphasises low level pattern classification and presents only the logical procedures of a linguistic post processor, with no detailed implementation of phonetic based word translation; secondly, the remaining work concentrates on other aspects of recognition, and the systems there do not fit the objectives of the current experiment. As a result, this chapter presents a prototype of a linguistic post processor that includes the conventional idea of phonetic based word translation, plus novel pattern tuning algorithms, which are effective in dealing with the shape variations of handwritten Pitman's Shorthand.
[Figure 3.1: Overview of the recognition and transcription framework. Input ink is pre-processed, segmented (dominant point detection), classified (Neural Network) and template matched, producing a ranked list of primitives; the transcription engine's vocalised outline interpreter and short-form interpreter then produce ranked lists of words, which phrase level transcription resolves into output text.]
The role of the transcription engine is to find the best candidate word for a given vocalised outline or short-form. It includes two major stages: word level transcription and phrase level transcription. In word level transcription, short-forms are not taken into account, since they have already been interpreted into the most likely words by the recognition engine. Vocalised outlines are transliterated into sets of English characters by two processes: pre-processing and word recognition. These two processes are the primary components of the system presented in this chapter. The pre-processor sets up the essential lexical knowledge relating to handwritten Pitman's Shorthand. The word recogniser then takes a ranked list of classified primitives, forwarded from the recognition engine as input, and produces a ranked list of candidate words as output.

After word recognition, the candidate words of either a vocalised outline or a short-form are put through a phrase level processor, and the word with the highest contextual probability is chosen as the correct representation of the input outline. Phrase level transcription is not studied in this chapter, since the primary purpose of the preliminary experiment is to analyse word recognition performance.
Lexicon preparation: converts a phonetic lexicon into a hash table such that similar sounding words are indexed under the same key, in order to cope with the phonetic rules of Pitman's Shorthand.
[Figure 3.2: Data flow of the vocalised outline interpreter. A ranked list of classified primitives from the vocalised outline recogniser enters the transcription engine, where lexicon preparation (pre-processing) builds a hash table from a phonetic lexicon and lexicon lookup produces a ranked list of words for sentence level transcription.]
A major benefit of keeping similar sounding words under the same key is that it reduces the search complexity to O(1). In addition, it enables the retrieval of a list of ambiguous words for an input outline with a single lookup, because the creation of the hash table for a lexicon is based on the hypothesis that words with similar pronunciations resemble one another in Pitman's Shorthand. One may question why similar sounding words should resemble one another in Pitman's Shorthand, since this assumption is not true of normal English. In normal alphabetical handwriting, two similar sounding words need not look alike. An example is given with the words 'tail' and 'tale' (Figure 3.3); the two words sound alike, but their scripts are dissimilar enough not to be confused.
Figure 3.3: Illustration of sample words in normal English and Pitman's Shorthand
In contrast to normal English, similar sounding words do look alike, or are identical, in Pitman's Shorthand. This is due to a special rule of Pitman's Shorthand invented for speed improvement purposes, i.e., a pair of voiced and unvoiced consonants is written with the same stroke shape but different line thicknesses. An example is given with the words 'tail' and 'tale' again (Figure 3.3): the two words sound alike and their scripts look identical in Pitman's Shorthand. The organisation of the lexicon directly affects search performance, and an algorithm for the lexicon organisation is presented below:
N: number of words contained in a phonetic lexicon
Xi: ith phonetic index of the phonetic lexicon
Yi: word data relating to Xi
table: a hash table used to store data of the phonetic lexicon
key: a phonetic key
value: word data to which a specified key is mapped in table
Initialisation
    table = {}   # a hash table used to store the data of the phonetic lexicon

Lexicon organisation
    for i in range(N):
        key = X[i]
        Y_i = getWordData(X[i])
        # convert unvoiced consonants into voiced consonants so that
        # similar sounding words share one key
        key = tuneToVoicedConsonants(key)
        if key in table:
            # the phonetic key already exists: append the word data
            table[key] = table[key] + Y_i
        else:
            table[key] = Y_i
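As a usage illustration of the procedure above, and assuming a hypothetical phonetic encoding in which the unvoiced /T/ is tuned to its voiced counterpart /D/, the similar sounding words 'tail' and 'tale' collapse onto a single key:

    table = {}
    for word, key in [("tail", "T-AY-L"), ("tale", "T-AY-L"), ("dale", "D-AY-L")]:
        key = key.replace("T-", "D-")      # stand-in for tuneToVoicedConsonants
        table.setdefault(key, []).append(word)
    print(table["D-AY-L"])                 # ['tail', 'tale', 'dale'] in one lookup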
The lexicon preparation takes place when the transcription engine is run for the first time and is not repeated when input outlines are transcribed in real time. If any modification of the lexicon is required, such as a change of word list or a change of the user's domain, the existing hash table can be updated by repeating the lexicon preparation procedure. Once the lexicon data is ready, the next process, denoted the Nearest Neighbourhood Query, is invoked.
[Figure 3.4: Samples of the predefined neighbourhoods, e.g. the stroke neighbourhoods {P, B}, {T, D}, {F, V} and {TH, th}, the circle neighbourhood {S, Z} covering closed circles, unclosed circles and hooks, and the vowel neighbourhoods.]
The Nearest Neighbourhood Query (NNQ) is, in fact, a heuristic approach in which misclassified pen strokes are adjusted according to their degree of similarity to other strokes. Primitives with similar geometric features are predefined in the same neighbourhood, and the system comprises seven neighbourhoods, of which four relate to vertical and horizontal strokes, one to circular primitives and the remaining two to dot and dash vowel-primitives. Here, similarity means having a similar angular structure for stroke primitives, a similar shape for circular primitives, or a similar location and shape for vowel primitives. Samples of the predefined neighbourhoods are illustrated in Figure 3.4 and the Nearest Neighbourhood Query algorithm is presented as follows:
{N1, N2, ..., N7}: a collection of seven neighbourhoods
O: an input handwritten outline
I: number of segments of an input outline, O
Si: ith segment of an input outline, O
Pattern: a pattern category of Si
Xi: a resultant vector, containing a set of primitives that are similar to Si
R: an output vector, containing a set of Xi where (i = 1, 2, ..., I)
M: a matrix, containing a number of outlines that are similar to O
Initialization
    N = [N1, N2, N3, N4, N5, N6, N7]   # the seven predefined neighbourhoods
    R = []                             # output vector of similar-primitive sets
Stroke adjustment
    for i in range(I):
        # assign the ith segment of the input outline as the pattern category
        pattern = S[i]
        for j in range(7):
            # if the jth neighbourhood contains the pattern
            if pattern in N[j]:
                # take all the elements of N[j] excluding the pattern itself
                X_i = [p for p in N[j] if p != pattern]
                R.append(X_i)
    M = createMatrix(R)
    return M
The output of NNQ is a matrix of primitives, in which each row represents a particular shorthand outline that is similar to the input pattern and each column represents a certain segment of the shorthand outline. A pictorial presentation of NNQ is given in Figure 3.5, in which sample input and output of the algorithm can be clearly seen. Once the NNQ process is completed, the next process, Feature to Phoneme Conversion, is invoked.
To clarify the first two rules, consider the two examples described below; to clarify the last three rules, refer to the examples in Table 3-2. In addition, the basic notations of Pitman's Shorthand relating to each rule can be found in Table 3-1.
Table 3-1: Relationship between the production rules and basic Pitman phonemes

Rule   Pitman phonemes
FD     SES, ZES circles, ST, STER loop, N, F, V, SHUN hook, suffix SHIP hook, suffix ING/INGS dot
LD     MD, ND, suffix MENT, half length strokes, double length strokes
PC     W, Y, H
PCRO   PL, BL, etc., PR, BR, etc., FR, VR, etc., and FL, VL, etc.
DT     single strokes translated directly into phonemes, e.g. K, G
[Figure 3.6: Feature to Phoneme Conversion for the word 'after'. The recognition output contains a double length /F/ or /V/ curve as the 1st primitive and an /A/ vowel as the 2nd primitive; applying the double length rule of /TER/, /DER/, /THER/ and /TURE/ produces four outputs: /TER/+/A/, /DER/+/A/, /THER/+/A/ and /TURE/+/A/. A normal /F/ consonant is shown for reference.]
As shown in the reference section of Figure 3.6, a normal downward curve represents the phoneme /F/ in Pitman's Shorthand; however, when the curve is doubled in length, it represents the sound /F/ plus an additional sound of /TER/, /DER/, /THER/ or /TURE/. Therefore, the candidate list for the word 'after' contains four different pronunciations at the end of phoneme conversion (Figure 3.6).
An example of phoneme ordering is given in Figure 3.7, in which the sample inputs are taken directly from the outputs of the Feature to Phoneme Conversion process demonstrated in Figure 3.6. As shown in Figure 3.7(a), the vowel /A/ is detected last although it is the first phoneme in the word 'after'. The system uses dominant point information and sequence information of the ink data to place vowels at their correct positions. After the phonemes have been sorted into the correct order, the resultant phonemes are matched against a phonetic lexicon in the next process, called lexicon lookup. A list of orthographic English words that best represent the input shorthand outline is then produced at the end of the search.
[Figure 3.7: (a) Sample input of the phoneme ordering process: /TER/+/A/, /DER/+/A/, /THER/+/A/ and /TURE/+/A/; (b) sample output of the phoneme ordering process: /A/+/TER/, /A/+/DER/, /A/+/THER/ and /A/+/TURE/.]
Table 3-2: Examples of the production rules (Pitman outline, English word, primitives classified by a recognition engine, and phonemes of the outline)

(a) 'Word': the classified primitives yield a /W/ consonant (Rule: a small anti-clockwise hook plus an upward diagonal stroke = /W/) and an /AW/ vowel. Translation is based on the rule of primitive combination (PC). The rule applied to this example is: IF an upward diagonal stroke is preceded by a small anti-clockwise hook, THEN the combination of these two primitives denotes the phoneme /W/.

(b) 'Printed': the classified primitives yield /PR/ or /BR/ (Rule: a small hook plus a straight downward stroke = /PR/ or /BR/), an /N/ curve and two vowels. Translation is based on the rule of primitive combination and reverse ordering (PCRO). The rule applied here is: IF a small hook is followed by a straight downward stroke, THEN the small hook is converted into the phoneme /R/ and swapped with the succeeding phoneme.

(c) 'Go': translation is based on the rule of direct translation (DT). The rule applied to this example is: IF a horizontal stroke is written from left to right, THEN the stroke directly denotes the phoneme /G/ or /K/.
For example, when line thickness is ignored, the phonetic key /B T/ maps to the ambiguous words 'bat', 'pat', 'bad' and 'pad'.
For an analysis of word transcription performance, 432 Pitman outlines, written with different levels of tidiness, were collected on a WACOM ART II tablet from three writers. Each writer wrote a sample sentence, consisting of 28 vocalised outlines and 20 short-forms, three times. The sample sentence covers the whole range of shorthand primitives, and the selected words are contained in the 5000 most frequently used English words of the general domain. Samples of the collected data are illustrated in Figure 3.9.
[Figure 3.10: Percentage of unique outlines (y-axis) against lexicon size (x-axis, 40 to 5000 words) under three conditions: uniqueness of outlines for perfect recognition, uniqueness given line thickness ambiguity, and uniqueness given vowel ambiguity.]
Figure 3.10 illustrates experimental results obtained from phonetic lexicons of different sizes, up to 5000 words. The x-axis of the graph represents the different lexicon sizes, and the words extracted for these lexicons are sorted according to their frequency of usage. This means that a lexicon of size 100 represents the first hundred most commonly used words in English, a lexicon of size 300 the first 300 most commonly used words, and so on. The first test simulates how an input Pitman's outline can be uniquely identified by a lexicon in the presence of perfect segmentation and recognition. According to the test, 97% of the 5000 most frequently used English words have a unique representation. The maximum ambiguity is 3 potential words per index and the average ambiguity is 1.02 potential words per index. Therefore, a transcription accuracy of at least 97% can be expected if there are no errors in the low level segmentation and classification of shorthand outlines.
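The uniqueness statistics reported above can, in principle, be reproduced with the short sketch below over any key-to-words table of the kind built during lexicon preparation; the table contents here are hypothetical.

    def uniqueness_stats(table):
        # table: phonetic key -> list of words sharing that key
        sizes = [len(words) for words in table.values()]
        unique_pct = 100.0 * sum(1 for s in sizes if s == 1) / len(sizes)
        return unique_pct, max(sizes), sum(sizes) / len(sizes)

    # hypothetical lexicon fragment: one ambiguous key, two unique keys
    table = {"B-T": ["bat", "pat", "bad"], "G-O": ["go"], "OAK": ["oak"]}
    print(uniqueness_stats(table))  # ~66.7% unique, max 3, average 1.67 words/key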
The second test (Figure 3.10) estimates the transcription performance in the presence of unclear pen-stroke thickness. This is an important consideration in the recognition of Pitman's Shorthand, as most digitizers are unable to detect the thickness of a pen-stroke, even though Pitman defines similar sounding consonants by the same strokes and differentiates between voiced and unvoiced sounds by thick and thin lines. It should also be noted that, regardless of the input technology, writers do not make a clear distinction between thick and thin strokes. According to this test, the ambiguity of a lexicon of 5000 words increases by about 9% if there is no distinction between voiced and unvoiced consonants. The transcription accuracy here is expected to be at least 87%.
The third test in Figure 3.10 predicts the transcription performance in the presence of ambiguous vowel notations. This is an important consideration in the recognition of Pitman's Shorthand, since vowels are occasionally omitted in writing Pitman's Shorthand and the omitted positions vary with the writer's experience or individual inclination. If the unpredicted omission of vowels in an outline is handled by excluding vowels from the lexicon and matching without vowel components, the new version of the lexicon has about 56% unique indices.
The experiment also considered transcription in the presence of shape variation and position confusion due to speed writing or different users' writing. The results are summarised below.

Table 3-3: Transcription accuracy of the vocalised outline interpreter

Condition                                        Accuracy
Overall                                          84%
In the presence of vowel omission or confusion   0%
In the presence of inconsistent writing          0%
In the presence of classification errors         48%
As shown in Table 3-3, the best rate achieved by the vocalised outline interpreter is 84%. A 12% error rate is due to inconsistent writing, i.e., outlines which are comprehensible to human readers but are not consistent with the writing rules of Pitman's Shorthand. An interesting phenomenon observed in this experiment is that 48% of perfect transcription occurs in the presence of recognition errors. This shows that the approximate pattern matching technique applied in NNQ is capable of dealing with classification errors. A primary limitation of this system, which accounts for 40% of the error rate, is its inability to correctly transcribe outlines with hidden or omitted vowels.
Both the accuracy and error rates reported throughout this experiment are based on numbers of outlines and can be denoted as follows:

a = \frac{c}{t} \times 100    (3.1)

where a is the word transcription accuracy, c is the total number of correctly interpreted outlines, and t is the total number of handwritten outlines.

e = \frac{t - c}{t} \times 100    (3.2)

where e is the error rate, t is the total number of handwritten outlines, and c is the total number of correctly interpreted outlines.
3.8 Discussion
On the whole, a primary advantage of the phonetic based transcription of vocalised outlines is its ability to adapt existing language models (i.e., phonetic models), which define large vocabularies with probability distributions over sequences of phonemes. Another distinct advantage is that the machine performs the same logical procedures as a human interpreter to transcribe Pitman's Shorthand outlines, and this makes the machine transcription concept easy to follow.
On the other hand, the writing rules invented for speed improvement purposes in Pitman's Shorthand allow primitives with minor differences of size, length, thickness or inclination to express different sounds. In general, it is practical to express the accurate size, length or inclination of a stroke in a printed script; however, it is less practical in handwriting, especially if the script is written at speed. The following examples illustrate the variation of pronunciations arising from minor differences between geometric features. Where there is a non-stressed vowel between /T/ and /L/, Pitman uses a combination of a small hook and a vertical stroke. On the other hand, an outline with a small circle followed by a vertical stroke stands for the sound /ST/, and it can easily be confused with an outline of /TL/ or /T+silent_Vowel+L/ if the circle at the beginning is not clearly written. According to experimental results, approximately 45% of small hooks are recognised as circles. Therefore, a direct conversion of primitives that are prone to minor recognition errors into phonemes can lead to completely different interpretations.
[Figure: Examples of easily confused handwritten outlines, e.g. /ST/ versus /TL/ or /T+silent_vowel+L/, and /SH T ER/ versus /SH/.]
Chapter 4 Introduction
The previous chapter reviewed the advantages and disadvantages of a phonetic based transliteration of handwritten Pitman's Shorthand and concluded that the idea of departing from conventional phonetic approaches is rather appealing. This chapter discusses the novel approach implemented specifically for this research to improve word transcription accuracy by using a primitive-to-text transliteration approach. In this new approach, Bayesian Network representation is applied to model the ambiguities and stroke dependencies of handwritten Pitman's Shorthand outlines.
First of all, an overview of the whole system is given, enabling the reader to gain a clear understanding of the role of the word transcription processes. Following the overview, a detailed description of the Bayesian Network word recogniser is given under the following topics.

Life cycle: explanation of the life cycle of the Bayesian Network models that represent handwritten Pitman's Shorthand outlines.
Model selection: selection of the N-best outline models for a given input outline, centred on knowledge based rejection strategies.
[Figure 4.1: Overview of the whole system; the architecture matches Figure 3.1 (pre-processing, segmentation by dominant point detection, Neural Network classifier, template matching, vocalised outline and short-form recognisers and interpreters, phrase level transcription), with the vocalised outline interpreter highlighted as the component changed in this chapter.]
An overview of the whole system is given in Figure 4.1, in which the diagram is nearly identical to the one illustrated in the previous chapter. A major difference between the two frameworks is the change to the vocalised outline interpreter (shaded box) in the new framework, where text is interpreted directly from primitive attributes instead of phonetic attributes as in the old framework. A summary of the processes included in the new vocalised outline interpreter is presented as follows.
[Figure 4.2: The Bayesian Network based vocalised outline interpreter. Pre-processing constructs a shorthand lexicon and trains Bayesian Network based outline models; word interpretation then matches an input outline against the models to produce a ranked list of words for phrase level transcription.]
The shaded box in Figure 4.2 highlights the role of Bayesian Network based vocalised outline transcription, which comprises two major processes: pre-processing and word interpretation. The pre-processing takes place when the transcription engine is first set up, and it is skipped during the real time transcription of shorthand outlines unless a modification of lexical data is required. A major function of the pre-processing is to automatically convert a phonetic lexicon into a Pitman's Shorthand lexicon such that different combinations of a series of geometric patterns represent different keys, with each key mapping to one, or more than one, word. This approach (the creation of the Pitman's Shorthand lexicon) is distinct from previous work, and a full description of the lexicon creation is given in a separate chapter, Chapter 5. Another important function of the pre-processing is to create Bayesian Network based outline models, in which user independent handwritten data and lexicon information are embedded in hierarchical probabilistic structures.
The next process, which takes place immediately after the pre-processing, is word interpretation. The primary function of word interpretation is to produce a ranked list of N-best words based on the confidence scores of the low level recognition plus the beliefs of the nodes of an outline model. After word interpretation, the N-best words are forwarded to the next process, the phrase level interpreter, to produce the final word(s) for a given input outline.
Grouping similar outlines, such as those for 'pays' and 'bays', under a single outline model enables the system to easily find potential candidate words for a given outline and improves the search performance. Here, 'similar outlines' stands for words with the same series of geometric features (of a consonant kernel) regardless of different line thicknesses and different vowel positions. Samples of similar outlines are illustrated in Figure 4.3.
[Figure 4.3: Samples of similar outlines, e.g. 'pays' and 'bays', 'oak' and 'go', 'airs' and 'erase'.]
In terms of the life cycle, outline models are first created with the use of a shorthand lexicon and then updated with a set of training data. The models are saved as a knowledge source for word interpretation until changes are required. Examples of changes include expanding the word list of an existing dictionary or altering a user domain. In response to a change of user domain, outline models are created, edited or removed according to the user's preference, defined in a domain set up process. Note that the vocabulary (i.e., the word list of a dictionary) has a huge impact on word transcription performance, and outline models should be associated with a dictionary of the corresponding domain. Figure 4.4 illustrates the life cycle of outline models.

In real time word interpretation, the series of classified primitives of an input outline is matched against the outline models, and the model with the highest posterior probability is taken as the correct representation of the written outline.
[Figure 4.4: Life cycle of outline models: creation of a new outline model from the shorthand lexicon, update of existing outline models with training data, and removal of outline models on a new domain set up.]
Vowels are always written last, no matter how words are pronounced, in Pitman's Shorthand, and this makes the automatic transliteration of handwritten Pitman's Shorthand distinct from the transcription of handwritten English. According to the study in Chapter 3, reordering vowels to their corresponding positions was found to be inefficient when vowel variables are missing from an outline. To improve upon existing systems, it was argued, one should seek a more parsimonious solution that also leads to better text interpretation performance. Thus, this research proposes a novel network model, denoted as an 'outline model', which represents the inherently complex features of handwritten Pitman's Shorthand.
Figure 4.5: Illustration of the chronological writing order of normal English and Pitman's Shorthand
1. Root node: A root node corresponds to an outline O and represents one, or more than one, word. It contains N child nodes {P1, P2, ..., PN}, where Pi corresponds to a collection of primitives representing the ith segment of the outline O.

2. Unique node: A unique node corresponds to a consonant primitive defined in the shorthand lexicon; it is linked directly to the root node and has no descendants.
[Figure 4.6: Example outline models: (a) a model created from the shorthand lexicon with root node O and primitive nodes P1 to P4; (b) the model after training, with a virtual node V1 over competing primitives and a hidden node H1 over the vowel primitives P4 and P5.]
3. Virtual node: A virtual node corresponds to a certain segment of a shorthand outline and represents a conditional variable that allows the embedding of multiple possibilities for a consonant segment in an outline model O. It appears when two or more primitives compete to represent a particular node of O during the training process, but it never appears while O is being created from the shorthand lexicon at the beginning. The definition of a virtual node reads: if a particular primitive (e.g., P1 in Figure 4.6(b)) is dependent on another primitive (e.g., P2 in Figure 4.6(b)) and there is an optional relationship between them (i.e., at most one of them can be true at the same time), we can assume that there is a mechanism that controls the values of P1 and P2, resulting in a virtual node V1 as shown in Figure 4.6(b).
4. Hidden node: A hidden node corresponds to a certain portion of a shorthand outline and represents a conditional variable that allows the embedding of hidden vowel primitives in an outline model. An interesting aspect of the creation of a hidden node is that it appears from the time when outline models are created from the shorthand lexicon, even though the lexicon provides accurate vowel information at that time. This is due to the major purpose behind hidden nodes, i.e., to identify missing vowel components randomly omitted by writers according to their experience or preference. The definition of a hidden node reads: if a particular primitive (e.g., P4 in Figure 4.6(b)) appears or disappears from time to time and the variation does not adhere to any rule, we can assume that there is a hidden mechanism that controls the value of P4 or P5, resulting in a hidden node H1.

In order to demonstrate how an outline model is created with the use of the four types of node, the step by step creation of an outline model for the word 'bake' is given in Figure 4.7.
1. First, the root node of an outline model is generated for the word 'bake'.
2. The root node then creates N child nodes using the shorthand lexicon, such that each consonant primitive of the word in the lexicon turns into a unique node and each vowel primitive turns into a hidden node, where N is the number of primitives of the word.
3. The outline model is then updated with a number of training samples, resulting in additional leaf nodes and virtual nodes, as sketched below.
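A minimal sketch of these construction steps is given below, using simple Python classes invented here for illustration; the thesis does not prescribe a particular data structure, and the pattern numbers follow the 'bake' example of Figures 4.7 and 4.8.

    class Node:
        def __init__(self, kind, pattern=None, children=None):
            self.kind = kind            # 'root', 'unique', 'hidden', 'virtual' or 'leaf'
            self.pattern = pattern      # primitive pattern category, if any
            self.children = children or []

    def create_outline_model(word, lexicon):
        # Step 1: root node for the word; Step 2: one child per lexicon primitive
        root = Node("root", pattern=word)
        for p in lexicon[word]["consonants"]:
            root.children.append(Node("unique", pattern=p))
        for v in lexicon[word]["vowels"]:
            root.children.append(Node("hidden", pattern=v))
        return root

    def observe_alternative(root, index, new_pattern):
        # Step 3: a competing pattern turns a unique node into a virtual node
        old = root.children[index]
        root.children[index] = Node("virtual", children=[
            Node("leaf", pattern=old.pattern), Node("leaf", pattern=new_pattern)])

    # hypothetical lexicon entry for 'bake': consonant patterns 4 and 7, vowel 91
    lexicon = {"bake": {"consonants": [4, 7], "vowels": [91]}}
    model = create_outline_model("bake", lexicon)
    observe_alternative(model, 0, 1)   # training observed pattern 1 for segment 1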
[Figure 4.7: Step by step creation of the outline model for the word 'bake': Step 1 generates the root node R; Step 2 adds unique nodes for consonant patterns 4 and 7 and a hidden node for vowel pattern 91; Steps 3 and 4 update the model with training data, introducing a virtual node with two leaf children (two possibilities, patterns 4 and 1, for the 1st segment) and a second vowel pattern 92 under the hidden node.]
WStart
S1, 0, 64, 4, 0.56
S1, 0, 64, 1, 0.44
S2, 64, 137, 7, 0.88
S2, 64, 137, 6, 0.12
V1, 0, 64, 2, 1, 92
WEnd

Figure 4.8: Sample training data for the word 'bake' processed by the recognition engine; in the original figure, italic text on the right explains what each line of data represents
A detailed explanation of the training data used in the creation of an outline model is given in Figure 4.8, which shows the training data for the word 'bake' depicted in Figure 4.7. The second and third lines of data in Figure 4.8 indicate that there are two possible pattern categories associated with the first segment of the word 'bake': type 4 and type 1. Here, type 4 matches an existing pattern of the shorthand lexicon and type 1 is a new pattern observed by the recognition engine. In order to update an existing outline model with this new observation, the existing unique node (Figure 4.7, Step 3) is first transformed into a virtual node and then given two leaf children, resulting in a virtual node with two children. Similarly, according to the sixth line of data in Figure 4.8, a vowel primitive (type 92) classified by the recognition engine differs from the one defined in the lexicon (type 91), resulting in a hidden node with two children.
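Based on the line format visible in Figure 4.8 (segment label, start and end indices, pattern type and confidence score), a hypothetical parser for such training records could be sketched as follows; vowel lines (V1, ...) carry an extra lexicon pattern field and are skipped here for brevity.

    def parse_training_record(lines):
        # lines: the records between WStart and WEnd, e.g. "S1, 0, 64, 4, 0.56"
        segments = {}
        for line in lines:
            fields = [f.strip() for f in line.split(",")]
            if not fields[0].startswith("S"):
                continue                    # vowel (V) lines have a different format
            label, pattern, score = fields[0], int(fields[3]), float(fields[4])
            # several lines may compete for one segment (e.g. S1 as type 4 or 1)
            segments.setdefault(label, []).append((pattern, score))
        return segments

    record = ["S1, 0, 64, 4, 0.56", "S1, 0, 64, 1, 0.44", "V1, 0, 64, 2, 1, 92"]
    print(parse_training_record(record))    # {'S1': [(4, 0.56), (1, 0.44)]}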
If nodes are entirely independent of each other, they become d-separated given an evidence node, making a network model unable to cope with abnormal circumstances. For example, variables A and B, which are usually dependent on each other, may be disconnected by the occurrence of rare evidence E. On the other hand, if nodes are densely connected to each other, with conditional probability distributions for all possible cases, it becomes computationally infeasible to obtain a reliable estimation.

Taking into account the drawbacks of these two extreme situations, the outline models for this research are designed with the following practical hypotheses:

1. each node Pi is independent of its non-descendants Pj;
2. each node Pi is independent of its descendants Di given a parent of Di;
3. leaf nodes {L1, L2, ..., Ln} are independent of each other unless they share the same parent Xj.

Alternatively, the conditional dependency between the variables of an outline model can be presented using the Bayes ball algorithm [Sr98], as illustrated in Figure 4.9.
[Figure 4.9: Conditional dependency between the variables of an outline model, illustrated using the Bayes ball algorithm; the legend distinguishes hidden nodes from observed nodes.]
4.5 Inference
The inference process of a Bayesian Network involves updating the probabilities of nodes given some evidence and prior probabilities [XL02]. This is called finding the belief of a node x, denoted as BEL(x). In our case, the evidence of nodes is given by a lexicon, training data or user input. A primary use of BEL(x) is to find the likelihood of outline models, from which the N-best models for a given shorthand outline are selected.
Among the variety of belief updating algorithms that support Bayesian Networks, this work directly applies the message passing algorithm developed by Pearl [Pj88]: the belief of every node in the network is taken as the product of \pi and \lambda messages, where \pi is a message received from each of its parents (if any) and \lambda is a message received from each of its children (if any). Alternatively, the \pi and \lambda of each node of an outline model are denoted as \pi_X(U), a message that node X receives from its parent U, and \lambda_{Y_j}(X), a message that node X receives from its child Y_j. Note that an outline model is a tree structure in which every node has one and only one parent (except the root node, which has no parent) and N children (Y_1, Y_2, ..., Y_N).
In this work, message initialisation varies depending on the type of node. The initialisation of the \pi and \lambda messages for the different types of node of an outline model is presented as follows.

Root node: A root node is the topmost node in an outline model and does not have any parent; therefore its \pi message is set to 0.5, assuming that there is an equal chance of the node taking a TRUE or FALSE value. Its \lambda message is set to 1, assuming that there is a TRUE relationship from its child nodes.

Unique node: A unique node does not have any descendants and is linked directly to the root node. Its \pi message is set to 1, assuming that there is a TRUE relationship from its parent (the root node), and its \lambda message is set to 1, stating that the primitive associated with this node appears in both the lexicon and the training data.
Virtual node: A virtual node is a judgemental node holding a true relationship from its parent (i.e., \pi = 1) and an optional relationship to its children (i.e., \lambda = P(Child_Nodes | observation)).

Hidden node: Similar to a virtual node, a hidden node holds a true relationship from its parent (i.e., \pi = 1) and an optional relationship to its children (i.e., \lambda = P(Child_Nodes | observation)).

Leaf node: A leaf node (not including a unique node) holds an optional relationship from a virtual node or a hidden node, and its \pi message is set to P(Child_Nodes | observation). It does not have any children, and its \lambda message is set to a confidence score of the node obtained from training data.
On the whole, our message initialisation strategy is similar to the one implemented by Xiao and Leedham [XL02] for signature verification. Nonetheless, the estimated values differ in this work, in accordance with the characteristics of handwritten Pitman's Shorthand.
BEL(x) = \alpha \, \lambda(x) \, \pi(x)    (4.1)

where \alpha is a normalization factor, \lambda(x) is the combined message received from all the children of node X and \pi(x) is the combined message received from all the parents of node X. Depending on the type of node, \lambda(x) is calculated differently. If it is a root node, \lambda(x) can be defined by the formula presented by Pearl [Pj88]:

\lambda(x) = \prod_{j} \lambda_{Y_j}(x)    (4.2)

where \lambda_{Y_j}(x) is a message that node X receives from its child node Y_j. Otherwise, \lambda(x) is set according to the messages received from the children:

\lambda(x) = 1, if every child message \lambda_{Y_j}(x) supports a TRUE relationship    (4.3)

\lambda(x) = 0.001, if no child message \lambda_{Y_j}(x) supports a TRUE relationship    (4.4)

\lambda(x) = 0.1, otherwise    (4.5)
In equations 4.4 and 4.5, the values 0.001 and 0.1 are predefined probabilities, used when none of the child nodes of X is likely to be true. The selection of these confidence scores is based on several experimental results, obtained by testing different thresholds between 0 and 1.
If a node is a leaf node, but not a unique node, \lambda(x) is defined as:

\lambda(x) = (1.0, 1.0) if the node has no observation, and (a, b) otherwise    (4.6)
where a and b are the normalised recognition and training probabilities for the corresponding node. The next section, 'Learning of Outline Models', explains how a and b are calculated using training data.
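Equation 4.1 can be illustrated with the short sketch below, which combines the lambda and pi messages of a two state (TRUE/FALSE) node as used in the outline models; the message values are hypothetical.

    def belief(lam, pi):
        # BEL(x) = alpha * lambda(x) * pi(x), normalised over both states (eq. 4.1)
        unnorm = [l * p for l, p in zip(lam, pi)]
        alpha = 1.0 / sum(unnorm)
        return [alpha * u for u in unnorm]

    # root node: pi = (0.5, 0.5); lambda combined from children, e.g. (0.9, 0.2)
    print(belief([0.9, 0.2], [0.5, 0.5]))   # -> [0.818..., 0.181...]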
There are various learning algorithms [Hd99], [Mk01] that support Bayesian Networks, and the selection of an appropriate one is based on two factors: the structure of the network (whether it is known or unknown) and the evidence of nodes (whether they are fully or partially observed). With full details of these two factors, an appropriate learning algorithm for a particular Bayesian Network can be identified using Murphy's decision table (Table 4-1), shown below.

Table 4-1: Murphy's decision table [Mk01]

Observability   Structure: Known   Structure: Unknown
Full            Closed form        Local search
Partial         EM                 Structural EM
The table indicates which algorithm is likely to be the most effective under which circumstances. For example, the Expectation Maximization (EM) algorithm is likely to be the most suitable for a Bayesian Network whose structure is known in advance and whose parameters are partially observed. With reference to this table, the parameter learning of an outline model is discussed in two parts: learning of consonant primitives and learning of vowel primitives.
The basic idea behind the MLE method is to maximise the likelihood of the training data D, which contains M cases believed to be independent [Hd99]. Assuming that \theta_{rec} and \theta_{train} denote the recognition and training probabilities of a primitive P in the ith training case D_i, a and b are taken as their averages over the M cases:

a = \frac{1}{M} \sum_{i=1}^{M} \theta_{rec}(P \mid D_i)    (4.7)

b = \frac{1}{M} \sum_{i=1}^{M} \theta_{train}(P \mid D_i)    (4.8)
In addition, the value b is saved in a history file, to be used to create new outline models that do not have training data; the history file creation is presented below.
Initialization
D: a collection of training data
Di: ith sample of the training data, D
L: a primitive lexicon
Li: an element of L which holds the same word value as Di
N: number of consonant primitives contained in Di
j: an index identifier
Ni,j: a pattern representing the jth consonant primitive of Di
Li,j: a pattern representing the jth consonant primitive of Li
b: probability of Ni,j having a relationship with Li,j
History updating
    # if the pattern N(i,j) observed in training differs from the lexicon
    # pattern L(i,j) and there is evidence (b > 0) linking the two, save b
    if N_ij != L_ij and b > 0:
        save(b)
The above pseudo code indicates that if a pattern Ni,j observed in the training data is not the same as the one defined in the lexicon, and if there is evidence confirming a relationship between Ni,j and Li,j (i.e., the value of b is greater than zero), the system creates a history file and stores the value b as the probability of Li,j being recognised as Ni,j. Later in the training process, the history file is retrieved to construct new outline models for words which do not have any training samples.

In brief, the use of the history file is of significant benefit to the training of shorthand outline models, particularly for words which do not have sufficient training data. This is mainly because Pitman's Shorthand is no longer a widely practised skill, and the collection of thousands or millions of training samples is infeasible in terms of access to experienced writers.
The parameters of hidden vowels in an outline model can be estimated using EM: the expected values of the hidden vowel nodes are computed with an inference algorithm in the E step, and these estimates are then treated as though they were observed when the parameters are re-estimated in the M step. On the whole, the EM learning of a vowel (hidden) node is denoted as:
P_{EM}(V = TRUE \mid O = TRUE) = \frac{E(V = TRUE)}{E(O = TRUE)}    (4.9)

where V is a vowel node, O is an outline model and E(\cdot) is the number of times the corresponding parameter is expected to occur. According to Murphy [Mk01], E(\cdot) is computed as follows:

E(e) = \sum_{m} P(e \mid D_m)    (4.10)

The posterior probability of an outline model O_i can then be expressed over the input primitives:

P(O_i \mid P_1, ..., P_n) \propto P(O_i) \prod_{j=1}^{n} P(P_j \mid O_i)    (4.11)

where P_1, ..., P_n are the input primitives which belong to the given outline model. Alternatively, equation 4.11 can be denoted in terms of the belief of a node as follows:

P(O_i \mid P_1, ..., P_n) = BEL(x) \prod_{j=1}^{n} BEL(N_j)    (4.12)

where j = (1, ..., n), O_i is the ith outline model, x is the root node of O_i, BEL(\cdot) is the belief of a node, P_j is an input primitive and N_j is a child node of the root node x.
To find the N-best outline models for a given input, the models with the top N posterior probabilities are chosen. However, using the posterior probabilities alone to find the best models is not computationally efficient. The problem is that the number of outline models increases with the number of words contained in the lexicon, and calculating the posterior probabilities of thousands of outline models during real time word transcription is infeasible, mainly in terms of operational time. Therefore, three unigram-based rejection strategies are applied in our system in order to reduce the model selection time.
In the first rejection strategy, the number of consonant primitives (NCP) of an input outline is used as a first level filter to reject outline models that are not relevant to a given input. The approach is denoted the 'NCP filter' and is formulated as:

O_{NCP(k)} = O_{NCP(i)} \setminus O_{NCP(i \neq k)}    (4.13)

where O_{NCP(i)} is the set of outline models grouped by their NCP values and k is the number of consonant primitives of the input outline. Example 1 below clarifies the concept behind the NCP filter.

Example 1
Assuming that k = 2, O = {O1, O2, O3, O4, O5, O6} is the set of outline models contained in the system and the NCP values of O1, O2, O3, O4, O5, O6 are 2, 2, 6, 3, 5 and 2 respectively, O_{NCP(2)} is calculated using formula 4.13 as follows:

O_{NCP(k)} = O_{NCP(i)} \setminus O_{NCP(i \neq k)}
O_{NCP(2)} = {O1, O2, O3, O4, O5, O6} \ {O3, O4, O5}
           = {O1, O2, O6}
In the second rejection strategy, outline models are discriminated in favour of the pair of primitives appearing at the first and last (consonant) segments of an outline. This approach is denoted the 'F&L filter' and is formulated as:

O_{F(k),L(j)} = O_{F(i),L(i)} \setminus O_{F(i \neq k),L(i \neq j)}    (4.14)

where O_{F(i),L(i)} is a set of outline models whose first and last segments relate to any type of primitive, and k and j are the types of the first and last segments of the input outline respectively. Example 2 below demonstrates the concept behind the F&L filter.

Example 2
Assuming that k = 5, j = 6, O = {O1, O2, O3, O4, O5, O6} is the set of outline models contained in the system and (F(i), L(i)) of O1, O2, O3, O4, O5, O6 are (3,2), (5,5), (5,6), (1,2), (5,6) and (5,2) respectively, O_{F(5),L(6)} is calculated as:

O_{F(k),L(j)} = O_{F(i),L(i)} \setminus O_{F(i \neq k),L(i \neq j)}
O_{F(5),L(6)} = {O1, O2, O3, O4, O5, O6} \ {O1, O2, O4, O6}
             = {O3, O5}
The idea behind formula 4.14 is based on an interesting phenomenon: wrongly spelled English words are sometimes comprehensible to a reader as long as the first and last letters of the words are clearly indicated. For example, you may understand the following sentence even though it contains a number of spelling errors: 'Wornlgy seplled Egnlish words are sitll leiglbe to a reader as lnog as the frist and lsat ltteers of the words are crroect.' In other words, the first and last letters of a word provide heuristics for word identification in English. Similarly to this phenomenon, the outline model selection in our work can be based on evidence of the first and last primitives of an outline, given that the first and last segments of an outline are always written in Pitman's Shorthand. According to our study
In the third rejection strategy, outline models are selected depending on the existence of circular primitives in the input outline. The approach is referred to as the 'C filter' and is formulated as:

O_{C(k)} = O_{C(i)} \setminus O_{C(i \neq k)}    (4.15)

where O_{C(i)} is a set of outline models and k is a conditional variable which is TRUE if the input outline contains circular primitives and FALSE otherwise. Example 3 below demonstrates the concept behind the C filter.

Example 3
Assuming that k = TRUE, O = {O1, O2, O3, O4, O5, O6} is the set of outline models contained in the system and C(i) of O1, O2, O3, O4, O5, O6 are TRUE, FALSE, TRUE, FALSE, FALSE, TRUE respectively, O_{C(TRUE)} is calculated as:

O_{C(k)} = O_{C(i)} \setminus O_{C(i \neq k)}
O_{C(TRUE)} = {O1, O2, O3, O4, O5, O6} \ {O2, O4, O5}
            = {O1, O3, O6}
Formula 4.15 checks for the existence of circular primitives in the outline models and splits them into two main groups: those containing circular primitives and those not containing them. In general, this rejection strategy performs well, given the reliable accuracy of the collaborator's recognition engine at detecting the circular primitives of an outline (if there are any).
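The three rejection strategies can be combined into a single filtering pass, as in the sketch below; the model attributes (ncp, first, last, has_circle) are hypothetical field names standing in for the unigram statistics described above.

    def select_candidates(models, ncp, first, last, has_circle):
        # NCP filter (eq. 4.13): keep models with the input's consonant count
        kept = [m for m in models if m["ncp"] == ncp]
        # F&L filter (eq. 4.14): keep models whose first and last segments match
        kept = [m for m in kept if m["first"] == first and m["last"] == last]
        # C filter (eq. 4.15): keep models that agree on the presence of circles
        kept = [m for m in kept if m["has_circle"] == has_circle]
        return kept   # posterior probabilities are computed only for these

    models = [
        {"word": "bake", "ncp": 2, "first": 5, "last": 6, "has_circle": False},
        {"word": "bar",  "ncp": 2, "first": 5, "last": 2, "has_circle": False},
    ]
    print(select_candidates(models, ncp=2, first=5, last=6, has_circle=False))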
Overall, the model selection strategies carried out in this work are illustrated in left-to-right rejection order in Figure 4.10. After the C filter, the posterior probabilities of the remaining outline models are calculated using formula 4.12, with which the N-best candidate outline models for the given input are chosen.
[Figure 4.10: Model selection pipeline: the NCP filter, F&L filter and C filter successively shrink the collection of outline models (the bar length represents the number of remaining models) before the posterior probability filter selects the N-best candidates.]
The evaluation considers transcription performance under the following conditions:

in the presence of shape variation and position confusion of pen strokes due to natural handwriting;

in the presence of missing vowel primitives that are randomly omitted among outlines by experienced Pitman's Shorthand writers.
Single-consonant data set: This data set contains outlines whose skeletons have one and only one consonant stroke, for instance, the shorthand outlines for the words 'bay', 'pea' and 'pat'. Homophones (i.e., outlines that look similar but have different representations) are contained in this data set, as the outlines differ only in minor aspects of line thickness, vowel position and inclination.
Stroke-combination data set: This data set contains outlines whose skeletons have two or more consonant strokes, written according to the normal rules of Pitman's Shorthand, i.e., the phonemes of the words are directly converted into Pitman's primitives without applying any of the special rules of Pitman's Shorthand invented for speed enhancement purposes. The data set covers the whole range of possible stroke combinations, and sample outlines of the data set are illustrated in Figure 4.11.
[Figure 4.11: Sample outlines of the stroke-combination data set, e.g. 'bar', 'making', 'rare', 'escape' and 'machine'.]
Special-rule data set: This data set contains words written according to the special rules of Pitman's Shorthand. For instance, instead of writing the word 'after' by composing primitives for the phonemes /F/, /T/, /R/ and vowels, as in Figure 4.12(a), Pitman uses a double length /F/ curve to express the word 'after', as in Figure 4.12(b). In addition, this data set contains inconsistent outlines, written without following the corresponding special rules of Pitman's Shorthand by (inexperienced) shorthand writers who have not mastered the complete rules of Pitman's Shorthand.
Figure 4.12: Two different shorthand outlines for the word 'after': (a) the word written by direct conversion of the phonemes (vowels and the consonants /F/, /T/, /R/) into primitives; (b) the word written according to the double-length rule of Pitman's Shorthand
Table 4-2: Details of the data collection for the three data sets

Data set             Number of words   Writers
Single-consonant     135               Writer A, Writer B
Stroke-combination   192               Writer A, Writer B, Writer C
Special-rule         87                Writer A, Writer D, Writer E
In total, 1416 outlines were collected for the three data sets; Table 4-2 provides details of the collected data. The data was collected using a tablet PC with an electromagnetic digitizer of resolution 1000 ppi, and five writers were involved in the data collection. The three data sets cover the whole range of shorthand primitives, and the word list is contained in the 5000 most frequently used English words of the general domain. 45% of the data is included in the training data set, and samples of the collected data are illustrated in Figure 4.13.
[Figure 4.13: Samples of the collected data, e.g. a Pitman's Shorthand outline for the word 'bay'.]

The experiments were carried out using the whole data sets, and the experimental results are discussed as follows.
First, the accuracy of vocalised outline identification is discussed. Vocalised outline identification is the process of deciding whether a written outline is a short-form or a phonetically written (vocalised) outline. As shown in Figure 4.14, the accuracy of vocalised outline identification varies from writer to writer, and even from time to time for the same writer. For instance, consider the accuracies of vocalised outline identification for writer A on the single-consonant data set, where there is a difference of approximately 62% between the accuracy of the first and second writings. The study finds that a major reason for such a difference is that writer A omitted most of the vowels when writing the single-consonant data set for the first time, whereas the writer indicated at least one vowel for most of the words the second time. Therefore, it is concluded that the indication of at least one vowel per outline is critical for obtaining high vocalised outline identification accuracy.
In most failure cases, outlines that should be recognised as vocalised outlines are marked as short-forms by the recognition engine. For example, 73% of the data written by writer A for the single-consonant data set was marked as short-forms by the recognition engine, although the outlines are, in fact, vocalised outlines. On the whole, the average vocalised outline identification accuracy over the whole data sets is 69%.
[Figure 4.14: Vocalised outline identification accuracy per writer.]
To compare the segmentation accuracy of different writers on the same data set, consider the results of the special-rule data set, where the segmentation accuracy of outlines written by writer E is higher than that of writer A. Statistics show that writer A has no previous experience of using a pen based text entry system, whereas writer E has previous experience of the pen based text entry systems of handheld devices. In addition, statistics show that writer A prefers writing small scripts on the tablet, in a similar manner to writing on conventional paper, whereas writer E produces larger scripts with flexible pen movements on the digitizer. Therefore, it is observed that a writer's previous experience of using pen based text entry systems influences the segmentation performance of the recognition engine. The average segmentation accuracy over all the data sets is 36%. The segmentation accuracy presented in Figure 4.15 is based on the number of correctly detected vocalised outlines and is formulated as follows:

s = \frac{t - y}{t} \times 100    (4.16)

where s is the segmentation accuracy, t is the total number of written words and y is the total number of outlines that are recognised as short-forms instead of vocalised outlines.
[Figure 4.15: Segmentation accuracy per writer for the single-consonant, stroke-combination and special-rule data sets.]

\frac{t - x}{t} \times 100    (4.17)

[Figure 4.16: Per-writer accuracy for the single-consonant, stroke-combination and special-rule data sets.]
Each group comprises four graphs discussing the experimental results from different aspects, outlined as follows:

Recognition accuracy vs. transcription accuracy: this graph illustrates the influence of the performance of the recognition engine on the transcription engine. It applies two types of data in order to discuss the theme: firstly, data with any kind of recognition engine error and, secondly, (filtered) data with no vocalised outline identification or segmentation errors from the recognition engine.

Accuracy of the end result: this graph illustrates the accuracy of the result list for an input outline according to three measures: firstly, the accuracy of the correct word appearing in the result list; secondly, the accuracy of the correct word appearing in the top five of the result list; and thirdly, the accuracy of the correct word appearing at the topmost position of the result list. Note that the accuracies illustrated in this graph are based on data with no vocalised outline identification or segmentation errors, as the correction of these errors is not included in the scope of this research.

Factors influencing the accuracy of a result list: this graph illustrates the average distribution of the factors that prevent a correct word from appearing at the topmost position of the result list. Similarly, the results reported in this graph are based on data with no vocalised outline identification or segmentation errors.
Data containing recognition errors achieves only low transcription accuracy (less than 20%). It has been discussed that the inadequacy of the recognition engine's vocalised outline identification is mainly caused by the omission of vowels among outlines; therefore, the indication of at least one vowel per vocalised outline is also encouraged in this research in order to achieve high transcription accuracy.
[Figure 4.17: Transcription accuracy per writer for the single-consonant data set, in the presence of any kind of recognition errors and in the presence of no vocalised outline identification and segmentation errors.]
An interesting phenomenon here is that although writer A has an intermediate level of skill in Pitman's Shorthand and writer B is inexperienced in Pitman's Shorthand, outlines written by writer B are transcribed more accurately than those of writer A. The study finds that this is because the handwriting of writer B is more legible, with more informative pen strokes, than that of writer A, as compared in Figure 4.18. In relation to this finding, it is noted that the writing of legible scripts is encouraged in this research in order to obtain high recognition and transcription accuracy.
[Figure 4.18: Pitman's Shorthand outlines for the words 'night', 'nod', 'note' and 'nut', written by writers A and B.]
Figure 4.19: Illustration of the word transcription accuracy of the single-consonant data set
c = \frac{e}{t} \times 100    (4.18)

where c is the classification error rate, e is the number of words having a classification error and t is the total number of input words.
v = \frac{f}{t} \times 100    (4.19)

where v is the vowel error rate, f is the number of words having omitted vowels and t is the total number of input words.
a = \frac{b}{t} \times 100    (4.20)

where a is the correction accuracy, b is the total number of words interpreted correctly by the transcription engine in the presence of classification or vowel errors, and t is the total number of words having classification errors or vowel errors respectively.
On average, the correction rate for classification errors is 76% and the correction rate for vowel errors is 55%. This indicates that the Bayesian Network based outline models implemented in the transcription engine are capable of coping with classification and vowel errors.
98
80%
70%
60%
40%
Successful transcription
in the presence of
classification errors
30%
Vowel errors
50%
20%
10%
0%
A
Successful transcription
in the presence of vowel
errors
Wrtier
[Figure 4.21: Average distribution of the factors preventing a correct word from appearing at the topmost position of the result list for the single-consonant data set: 49% due to similarity to other outlines, 31% due to classification errors, with the remainder due to vowel errors or a combination of similarity to other outlines, classification errors and vowel errors.]
4.8.5 Analysis of Word Transcription Accuracy Using the Stroke-Combination Data Set
2 Filtered data does not contain any vocalised outline identification or segmentation errors.
3 Unfiltered data contains any kind of recognition errors.
Transcription accuracy for this data set is strongly affected by segmentation errors, and in relation to this finding it is concluded that reliable outline segmentation is important for the overall transcription accuracy when words contain two or more consonant strokes.
[Figure 4.22: Transcription accuracy per writer for the stroke-combination data set, in the presence of any kind of recognition errors and in the presence of no vocalised outline identification and segmentation errors.]
An interesting phenomenon here is that although writer C's data is not included in the training data set, 96% of the writer's outlines are transcribed with the correct word appearing in the top five of the result list. This indicates that the history based learning algorithm implemented in the Bayesian Network models can effectively cope with unseen patterns that are not included in a training data set.
[Figure 4.23: Accuracy of the end result per writer for the stroke-combination data set.]
Note that writers rarely omitted vowels in this data set, compared to the single-consonant data set. The writers of this data set were encouraged to indicate at least one vowel per outline, mainly in order to avoid the rejection of substantial data at the recognition stage by the vocalised outline detector.
[Figure 4.24: Successful transcription rates per writer for the stroke-combination data set, in the presence of classification errors and in the presence of vowel errors.]
[Figure: distribution of transcription errors for this data set — 89% versus 11% due to a combination of similarity to other outlines, classification errors and vowel errors]
The study finds that this is mainly due to writer Ds preference for writing outlines without vowel components, as well as the writing of incorrect outlines that do not fully follow the special rules of Pitmans Shorthand. In relation to this finding, it is remarked that the writing of consistent outlines in accordance with the special rules of Pitmans Shorthand is encouraged in this research in order to obtain high transcription accuracy.
[Figure: word transcription accuracy of the special-rule data set per writer, in the presence of any kind of recognition errors versus in the presence of no vocalised outline identification and segmentation errors]
Figure 4.27: Evaluation of the word transcription accuracy of the special-rule data set
[Figure: successful transcription of the special-rule data set in the presence of classification errors and in the presence of vowel errors]
[Figure: proportion of errors due to a combination of similarity to other outlines, classification errors and vowel errors]
The word interpretation comprises the network modelling, the belief propagation, the Bayesian learning and the model selection. Experimental results of the new framework are presented, following the full description of the Bayesian Network based transcription algorithms.
Overall, a primary issue discussed in relation to the performance of the recognition engine is the indication of at least one vowel of an outline, in order to avoid incidences where outlines are mistaken for short-forms instead of vocalised outlines. In terms of the feasibility
From the aspect of the performance of the Bayesian Network based word interpreter, the average transcription accuracies for the three (filtered 6) data sets are 91% for a correct word appearing in a result list, 85% for a correct word appearing in the top five of a result list and 58% for a correct word appearing at the topmost position of a result list. Overall, the accuracy of 91% is satisfactory; however, the accuracy of a correct word appearing at the topmost position of a result list (58%) indicates that the homophones of the result list need to be resolved with the application of contextual information. A resolution to this problem is discussed in detail in Chapter 6, which addresses the phrase level transcription.
From the aspect of the relationship between the features of a writer and the transcription accuracy, the study finds that the gender and age of writers do not have a significant influence on the performance of the recognition and transcription systems. However, the study finds that a writers skill in Pitmans Shorthand and a writers previous experience in using pen based text entry systems are related to the overall transcription accuracy. Another consideration is that the writers of the current experiments are right-handed, and a further analysis of the transcription performance with left-handed writers is recommended. In addition, the writers of the current experiment use British English, and a further analysis of the transcription performance with writers who use American English remains a challenge: since Pitmans Shorthand is written phonetically, outlines written according to British English and American English are different, especially in their vowel notations.
6 Filtered data does not contain any vocalized outline identification error or segmentation error.
Table 4-3: Transcription accuracies of the phonetic based transcription and the primitive based transcription approaches

Average Transcription Accuracy                    | Primitive based transcription | Phonetic based transcription
Overall                                           | 93%                           | 84%
In the presence of vowel omission or confusion    | 100%                          | 0%
In the presence of inconsistent writing           | 0%                            | 0%
In the presence of classification error           | 100%                          | 100%
As shown (Table 4-3), the average transcription accuracy of the primitive based transcription approach is 9% higher than that of the phonetic based transcription approach. The study finds that this is mainly due to the increased correction accuracy of vowel errors in the novel framework. Overall, the performance of the proposed framework is promising, but it must be improved upon in the areas discussed for it to become a commercially viable system.
Chapter 5 Introduction
The previous chapter presented a novel solution for interpreting handwritten Pitmans Shorthand outlines using Bayesian Network algorithms, in which geometrical features of the outlines are directly translated into English word(s). On the whole, the solution was found to be efficient, mainly due to the use of a machine-readable Pitmans Shorthand dictionary that contains a set of shorthand outlines mapped to corresponding English word(s). Based on a thorough literature review carried out in this research, no other machine-readable (electronic) Pitmans Shorthand lexicon has ever been designed; the one developed specifically for this research is the first of its kind. This may be a major reason why none of the previous work (of the same framework) attempted to analyse the performance of the direct transcription of geometric primitives into English words.
Specifically, this chapter presents full details of the creation of the electronic Pitmans
Shorthand lexicon, developed for this research, under the following four sections:
1. Overview: overview of the rule based creation of the electronic Pitmans Shorthand
lexicon and discussion on general advantages and disadvantages of rule based
approaches.
2. Structure: description of the lexicon structure in terms of feature set, key and lexicon
as a whole.
3. Rules: description of rules, applied in our system to conform to the writing rules of
Pitmans Shorthand.
4. Experimental results: evaluation of the electronic Pitmans Shorthand lexicon,
mainly in terms of accuracy and homophone distribution.
5.1 Overview
Firstly, in order to clarify precisely what is meant by a Pitmans Shorthand lexicon, sample
entries of a conventional Pitmans Shorthand dictionary (available in book format) and
sample entries of an electronic Pitmans Shorthand dictionary are illustrated (Figure 5.1).
[Figure 5.1: (a) sample entries of a conventional Pitmans Shorthand dictionary, where each shorthand outline is keyed to a word; (b) sample entries of an electronic Pitmans Shorthand dictionary, where keys such as airs, erase; bays, pays; and oak, go map to their homophone words]
instance, a consonant W
and an upward stroke
3. Define conversion rules that conform to the writing rules of Pitmans Shorthand.
4. Generate a series of geometric features for a given word using the phonetic lexicon
and conversion rules.
The Pitmans Shorthand lexicon is created using a set of conversion rules. When rule-based algorithms are introduced in the field of handwriting recognition, one may argue that rules are static and incapable of coping with natural ambiguities [Sy94], [Lr89]. Here, it is important to realise that the rules reported in this chapter are applied only to static lexical data, not to handwritten data; because of this, the accuracy of the Pitmans Shorthand lexicon is reliable. As with other rule-based approaches [FF93], [Mm03], training is not required for the creation of the shorthand lexicon, and the rules can easily be refined as needed if the lexicon is to be altered.
[Table: the feature set of numbered primitive patterns, comprising consonant strokes for /T/ or /D/, /F/ or /V/, /th/ or /TH/, /P/ or /B/, /M/, /N/ or /NG/, /K/ or /G/, /SH/ or /ZH/, /CH/ or /J/, /S/ or /Z/, /R/ (downward and upward) and /L/ (upward), together with small and large hooks, vowels, diphthongs and diphones]
5.2.2 Key
A key corresponds to a vocalised shorthand outline and relates to one or more words. Within a key, the consonant primitives of a vocalised outline are firstly allocated in chronological order, with a series of vowel primitives following at the end. A major reason for keeping vowel primitives at the end of a key is to cope with the special writing order of Pitmans Shorthand, i.e., consonants are always written first, with vowels placed around the consonant kernel later. Sample keys are given in Figure 5.2, where each key comprises two major components: one containing consonant primitives and the other containing vowel primitives. Both components are arranged in chronological order, such that a primitive at the end of the first component corresponds to the last consonant of a vocalised outline, and a primitive at the beginning of the second component corresponds to the first vowel of the vocalised outline.
Word    | Pronunciation | Key of the Pitmans Shorthand lexicon
Famous  | /F M S/       | 2+5+16+91+92
Yellow  | /Y L W/       | 23+13+11+91+92
Figure 5.2: Sample keys of the electronic Pitmans Shorthand lexicon; vowels are
underlined
[Figure 5.3: two sample entries of the electronic lexicon, with the keys 2+91 and 4+91 — one entry maps to the words fee, father, further and after, and the other maps to pays and bays]
Sample one in Figure 5.3 presents a lexicon entry for the words fee, father, further and
after. The example indicates that words with similar geometric features of different
lengths belong to the same key. In order to recognise length variation of the words, consider
the sample Pitmans Shorthand outlines given in Figure 5.3.
Similarly, sample two in Figure 5.3 presents a lexicon entry for the words pays and
bays. The example indicates that words with similar geometric features of different
thicknesses belong to the same key. Consider the sample outlines illustrated in Figure 5.3 to
identify line thickness difference between the two words.
P: a phonetic lexicon
N: number of words contained in P
Wi: ith word of the phonetic lexicon P
Vi: phonetic representation of Wi
Si: a series of geometric features of Vi
table: a hash table, holding a Pitmans Shorthand lexicon
key: a key, representing a vocalised shorthand outline
value: word value to which a specified key is mapped in table

Initialisation
table = a new, empty hash table

Lexicon organisation
for i = 1 to N
    // convert phonemes of a word into geometric features
    Si = convertPhonemeToShorthand(Vi)
    key = Si
    if (table.containsKey(key))
        // the key already exists: append Wi to the existing word value
        value = table.get(key) + Wi
    else
        // first word for this key
        value = Wi
    end
    table.put(key, value)
end
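As a minimal, runnable illustration of the procedure above, the following Java sketch builds the same table with java.util.HashMap, whose containsKey/get/put operations match the pseudocode (computeIfAbsent condenses the if/else branch); the conversion routine is stubbed here, since the real 46-rule procedure is described in section 5.3:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class LexiconBuilder {

    // Stub for the 46-rule conversion procedure of section 5.3; the real
    // routine returns a primitive key such as "2+5+16+91+92".
    static String convertPhonemeToShorthand(String phonemes) {
        return phonemes; // placeholder only
    }

    // Build the shorthand lexicon: words sharing a key are homophones.
    static Map<String, List<String>> buildLexicon(List<String> words, List<String> phonemes) {
        Map<String, List<String>> table = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            String key = convertPhonemeToShorthand(phonemes.get(i));
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(words.get(i));
        }
        return table;
    }
}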
y = convertPhonemeToShorthand(x)    (5.1)

For instance, if x is a set of phonemes /T D / (for the word today), then y is produced by invoking the conversion procedure as follows:

y = convertPhonemeToShorthand(/T D /)
y = 1 + 1 + 91 + 92
In total, the conversion procedure comprises 46 rules, conforming to the writing rules of Pitmans Shorthand 2000, defined in [Oj95]. In order to produce a primitive representation for a given set of phonemes, the 46 rules are applied in ascending priority order as follows:

Priority 1: 1st rule to 17th rule
Priority 2: 18th rule to 32nd rule
Priority 3: 33rd rule to 36th rule
Priority 4: 37th rule to 39th rule
Priority 5: 40th rule to 41st rule
Priority 6: 42nd rule to 43rd rule
Priority 7: 44th rule
Priority 8: 45th rule to 46th rule
For instance, application of the 2nd rule must follow the completion of the 1st rule and similarly, application of the 18th rule must follow the completion of the 17th rule. With the aid of a diagram (Figure 5.4), the data flow in the conversion procedure can be followed.
[Figure 5.4: data flow of the conversion procedure — the input phonemes of the word famous (/F M S/) pass sequentially through Rule 1, Rule 2, ..., Rule 17, then Rules 18 to 32, and finally the vowel-conversion rules]
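A sketch of how this priority ordering could be realised in Java is given below: the 46 rules are held in ascending priority order and applied sequentially, so that rule i+1 only ever sees the output of rule i. The functional representation of a rule is our assumption, not the thesis implementation:

import java.util.List;
import java.util.function.UnaryOperator;

public final class RulePipeline {

    // Apply the conversion rules in ascending priority order: each rule
    // rewrites the working representation produced by the previous rule.
    static String applyRules(String representation, List<UnaryOperator<String>> orderedRules) {
        for (UnaryOperator<String> rule : orderedRules) {
            representation = rule.apply(representation);
        }
        return representation;
    }
}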
ability to produce shorthand outlines, and it is important to clarify the concept behind each
rule to enable the reader to understand how the complex writing rules of Pitmans Shorthand
are embedded in the system of this research.
5.3.2 Description of Rules
Table 5-2: Summary of the 46 rules applied in the creation of the Pitmans Shorthand lexicon

Rule  | Description
-     | Diphthong U
-     | WH
6     | PL, BL, TL, DL, CHL, JL, KL, GL, used consonantly at the beginning, in the middle or at the end of a word
-     | FL, VL, ThL, ML, NL, SHL, used consonantly at the beginning of a word
12    | Past tense ED
15    | ING
16    | INGS
17    | Suffix SHIP
18    | S or Z stroke
19    | Suffix MENT
20    | Suffix MENTAL
21    | Suffix MENTALLY
23    | MD, ND
24    | FR, VR, Thr, THR, SHR, ZHR, MR, NR, used consonantly at the beginning of a word
25+26 | PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR, SHR, ZHR, MR, NR, used syllabically at the beginning, in the middle or at the end of a word
27    | SKR, SGR
28    | KW, GW
29    | PL, BL, TL, DL, CHL, JL, KL, GL, used syllabically in the middle or at the end of a word
30    | FL, VL, THL, ML, NL, SHL, used syllabically in the middle or at the end of a word
31    | S followed by H
32    | S+vowel+hookR, ST+vowel+hookR
33    | Downward L
36    | SHUN
37    | N hook
38    | Upward L
40    | Suffix LY
42    | Dash H
46    | Vowel conversion
Table 5-2 presents a summary of the forty-six rules with a list of phonemes relating to each
of them. In order to avoid information overload, algorithms of just five rules are presented
in this section, and the remaining rules can be referenced in detail in Appendix A.
In general, the rules are discussed here from three aspects: complexity, objective and strategy. The complexity of a rule corresponds to either direct conversion or indirect conversion. Direct conversion maps phonemes straight onto geometric features, whereas indirect conversion converts phonemes into geometric features according to the special writing rules of Pitmans Shorthand, which were invented to improve writing speed. In addition, the objective of a rule describes its major role, and the strategy of a rule describes its programming procedure.
Description of the 3rd Rule (CON and COM at the beginning of a word)
Complexity: indirect conversion
Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 5.5.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
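A minimal Java sketch of this strategy is given below, assuming (as an illustration only) that a word is represented as a space-separated string of sounds and that the dot primitive is written as the token DOT:

public final class Rule3 {

    // Convert a leading CON/COM sound into a dot primitive, unless the word
    // ends with ING, S, Z, T or D (the condition stated in the strategy).
    static String applyRule3(String word) {
        boolean startsWithConOrCom = word.startsWith("CON ") || word.startsWith("COM ");
        boolean endsWithBlockedSound = word.endsWith(" ING") || word.endsWith(" S")
                || word.endsWith(" Z") || word.endsWith(" T") || word.endsWith(" D");
        if (startsWithConOrCom && !endsWithBlockedSound) {
            return "DOT " + word.substring(4); // replace CON/COM with a dot primitive
        }
        return word;
    }
}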
Figure 5.5: Illustration of the use of a dot primitive for the sound COM at the
beginning of a word
Description of the 5th Rule (Negative prefix IL-, IM-, IN-, IR-, UN-)
Complexity: indirect conversion
Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitmans Shorthand outline containing
the prefix IR- is illustrated in Figure 5.6.
Strategy:
1. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
2. Check if a word representation of an input matches any element of the list;
3. if it does and if the prefix is IL-, convert the sound IL- into an upward stroke L, followed by a dot primitive and another extra upward stroke L;
4. if it does and if the prefix is IM-, convert the sound IM- into a curve M, followed by a dot primitive and another extra curve M;
5. if it does and if the prefix is IR-, convert the sound IR- into a downward curve R, followed by a dot primitive and another extra downward curve R;
6. if it does and if the prefix is IN-, convert the sound IN- into a curve N, followed by a dot primitive and another extra curve N;
7. if it does and if the prefix is UN-, convert the sound UN- into a curve N, followed by a dash primitive and another extra curve N.
In addition, the 5th rule states that a consonant /D/ following the prefix IN- and UN- is not
allowed to be omitted. This is to avoid a conflict with the ND writing rule of Pitmans
Shorthand, in which the consonant /D/ following /N/ is omitted. Detail about the ND rule
can be referenced in Appendix B.
Figure 5.6: Illustration of the use of negative prefix IR- in a vocalised outline
Description of the 6th Rule (PL, BL, ..., GL, used consonantly at the beginning, in the middle or at the end of a word)
Complexity: indirect conversion
Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it is not immediately following /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitmans Shorthand outline containing the sound /P L/ at the
beginning of a word is illustrated in Figure 5.7.
Strategy:
1. If /N/ comes before /T L/ or /D L/, hook L is not used.
2. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used;
3. otherwise, replace the phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and /G L/ of an input with a, b, c, d, e, f, g and h respectively, where
a = hook + P stroke
b = hook + B stroke
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
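The replacement step of this strategy maps each consonant pair onto a fixed hook-plus-stroke sequence, which can be sketched in Java as a lookup table; the textual primitive names below are assumptions for illustration:

import java.util.LinkedHashMap;
import java.util.Map;

public final class Rule6 {

    // Consonant pairs rewritten as a small hook L plus the matching stroke
    static final Map<String, String> HOOK_L = new LinkedHashMap<>();
    static {
        HOOK_L.put("P L", "hookL+P");
        HOOK_L.put("B L", "hookL+B");
        HOOK_L.put("T L", "hookL+T");
        HOOK_L.put("D L", "hookL+D");
        HOOK_L.put("CH L", "hookL+CH");
        HOOK_L.put("J L", "hookL+J");
        HOOK_L.put("K L", "hookL+K");
        HOOK_L.put("G L", "hookL+G");
    }

    static String applyRule6(String phonemes) {
        // Guard conditions 1 and 2 of the strategy (an /N/ before /T L/ or
        // /D L/, and the syllable test for /T/ and /D/) would be checked
        // here before any substitution is made.
        for (Map.Entry<String, String> entry : HOOK_L.entrySet()) {
            phonemes = phonemes.replace(entry.getKey(), entry.getValue());
        }
        return phonemes;
    }
}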
[Figure 5.7: Pitmans Shorthand outline for the word play, showing the sounds /P/ and /L/ combined into the /PL/ hook form]
Description of the 14th Rule (Half length stroke for one syllable words)
Complexity: indirect conversion
Objective: to omit /T/ or /D/ at the end of one-syllable words. This relates to the half-length writing rule of Pitmans Shorthand, and a sample (one-syllable) half-length outline is illustrated in Figure 5.8.
Strategy: if a word is a one-syllable word and if it contains consonants other than just /R/ and /T/, or /T/ alone, then /T/ or /D/ at the end of the word is omitted, provided that /T/ is not following a voiced consonant and /D/ is not following an unvoiced consonant.
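The voicing constraint of this strategy can be sketched in Java as follows, assuming a simplified representation of a word as an array of sound tokens and a fixed set of voiced consonants; the syllable test is taken as a precomputed flag:

import java.util.Arrays;
import java.util.Set;

public final class Rule14 {

    // A simplified set of voiced consonants (an assumption for illustration)
    static final Set<String> VOICED =
            Set.of("B", "D", "G", "V", "TH", "Z", "ZH", "J", "M", "N", "NG", "L", "R", "W", "Y");

    // Drop a final /T/ or /D/ from a one-syllable word, provided /T/ does not
    // follow a voiced consonant and /D/ does not follow an unvoiced one.
    static String[] applyRule14(String[] sounds, boolean oneSyllable) {
        int n = sounds.length;
        if (!oneSyllable || n < 2) return sounds;
        String last = sounds[n - 1];
        String previous = sounds[n - 2];
        boolean tAfterUnvoiced = last.equals("T") && !VOICED.contains(previous);
        boolean dAfterVoiced = last.equals("D") && VOICED.contains(previous);
        if (tAfterUnvoiced || dAfterVoiced) {
            return Arrays.copyOf(sounds, n - 1); // omit the final stroke
        }
        return sounds;
    }
}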
[Figure 5.8: sample half-length outline, in which /K/ followed by /T/ is written as a single half-length stroke]
Figure 5.9: Illustration of the omission of the syllable TER in a vocalised outline
[Figure: Pitmans Shorthand notations for the consonant pairs /F K/, /V K/, /F G/ and /V G/]
[Table: sample keys and the word elements they map to, e.g. May, Maid, Made and Bat, Pat, Bad, Pad]
a = ((t - e) / t) * 100    (5.2)
where a is the accuracy of an electronic Pitmans Shorthand lexicon, t is the total number of
words included in a testing data set and e is the total number of incorrectly generated words
whose primitive representations do not match with patterns defined in an original Pitmans
Shorthand dictionary (available in book format).
[Figure: accuracy in percentage (75 to 100) of the electronic lexicon plotted against the number of words included in the lexicon]
In order to clarify the four types of errors, consider the following four examples in which
each example provides a sample erroneous shorthand outline with a corresponding
explanation for each type of error.
In order to clarify errors due to an ambiguity of the writing rules of Pitmans Shorthand, consider one of the rules of Pitmans Shorthand, which reads: Straight strokes are doubled in length to represent the sounds of -TER, -DER, -THER and -TURE when they follow another stroke [Oj95]. The transcription engine generates a shorthand representation for the word weather as in Figure 5.13(a), in which the sound THER is added via a double-length stroke. However, the typical Pitmans Shorthand dictionary (available in book format) defines the word weather as in Figure 5.13(b), in which the sound THER is not written according to the double-length rule. The study finds that this is because the typical Pitmans Shorthand lexicon applies another rule of Pitmans Shorthand, which reads: A straight stroke is not doubled if the doubling would produce two strokes of unequal length without an angle [Oj95].
To determine whether the word weather relates to the first rule or the second rule, consider two other outlines (Figure 5.14), defined in the typical Pitmans Shorthand dictionary. Between the two words, the typical dictionary defines that doubling is not allowed for the word factor, as the curve before the straight stroke would produce two strokes of unequal length if the straight stroke were doubled (case a); however, it defines that doubling is allowed for the word further, since the word complies with the double-length rule of Pitmans Shorthand (case b). On the whole, the transcription engine assumes that the word weather belongs to case (b) rather than case (a), since doubling does not produce two strokes of unequal length when the straight stroke is doubled to add the sound THER. As a result, the shorthand outline for the word weather generated by the transcription engine is different from the one defined in the typical Pitmans Shorthand dictionary, and hence the error. Overall, a primary cause of error in this case is the ambiguity of the rules of Pitmans Shorthand.
Figure 5.13: Two different outlines for the word weather; (a) the word weather is
written according to the double-length rule of Pitmans Shorthand; (b) the word
weather is not written according to the double-length rule of Pitmans Shorthand
Figure 5.14: (a) Shorthand outline for the word factor; (b) shorthand outline for the
word further
Figure 5.15: Two different shorthand outlines for the word union
[Figure: two different shorthand outlines for the word environment — one using a normal N stroke and one using an N hook]
In order to reduce ambiguity for the computer aided transcription, the transcription engine restricts the writing of the suffix MENT to only one form, and the inability to generate an alternative form is taken as an error in the current research.
On the whole, the graph (Figure 5.18) illustrates the distribution of the four types of errors discovered in the current experiment, where the largest category of error (40%) is due to the limitation of machine compatible scripts.
[Figure 5.19: unique outlines in % (Y-axis) against lexicon sizes from 100 to 5000 words (X-axis); the legend includes uniqueness in the presence of clear line thickness and complete vowel information]
Figure 5.19 illustrates experimental results carried out on different sizes of electronic Pitmans Shorthand lexicon of up to 5000 words. The X-axis of the graph represents different sizes of lexicons, where words are sorted according to their frequency of usage in each lexicon. The Y-axis of the graph represents the uniqueness of an electronic Pitmans Shorthand lexicon, where the uniqueness is defined as:

u = (a / t) * 100    (5.3)

where u is the uniqueness of a lexicon, t is the total number of keys contained in the lexicon, and a is the total number of keys having one and only one relationship with a corresponding English word in the lexicon.
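Given the lexicon hash table built earlier in this chapter (keys mapping to lists of homophone words), the uniqueness measure and the average ambiguity can be computed directly, as in this Java sketch:

import java.util.List;
import java.util.Map;

public final class LexiconUniqueness {

    // Equation 5.3: u = (a / t) * 100, where a is the number of keys with
    // exactly one word and t is the total number of keys in the lexicon
    static double uniqueness(Map<String, List<String>> lexicon) {
        long unique = lexicon.values().stream().filter(words -> words.size() == 1).count();
        return 100.0 * unique / lexicon.size();
    }

    // Average ambiguity, e.g. 1.02 candidate words per key for the
    // 5000-word lexicon reported below
    static double averageAmbiguity(Map<String, List<String>> lexicon) {
        return lexicon.values().stream().mapToInt(List::size).average().orElse(0);
    }
}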
The first test (Figure 5.19) illustrates uniqueness of lexicons in the presence of clear
distinction between line thicknesses as well as in the presence of complete vowel
information. The study finds that uniqueness of the lexicon of 5000 most frequently used
English words is 96%. The maximum ambiguity is 4 candidate words per key and an
average ambiguity is 1.02 potential words per key.
The second test (Figure 5.19) illustrates uniqueness of lexicons in the presence of line
thickness ambiguity. According to experimental results, uniqueness of the lexicon of 5000
words drops by about 5% if there is no distinction between thick and thin strokes. The
maximum ambiguity is 5 candidate words per key and an average ambiguity is 1.05 potential
words per key.
Finally, the third test (Figure 5.19) illustrates uniqueness of lexicons in the presence of
vowel ambiguity. The study finds that uniqueness of the lexicon of 5000 words is
approximately 71% when vowel primitives are not included in the keys of a lexicon. The
maximum ambiguity is 15 candidate words per key and an average ambiguity is 1.22
potential words per key.
5.5 Discussion
On the whole, this chapter presents the creation of a novel machine-readable Pitmans
Shorthand lexicon in order to facilitate the direct translation of geometrical features of
shorthand outlines into English words. Experimental results present accuracies of different
sizes of electronic Pitmans Shorthand lexicon as well as the distribution of homophones in
the novel lexicon.
Chapter 6 Introduction
This chapter focuses on the solutions to the phrase level recognition of online handwritten
Pitmans Shorthand outlines. The primary aims of this chapter are first to investigate a
contextual method that can effectively reduce homophone ambiguities appeared in a
resulting list of a corresponding handwritten Pitmans Shorthand outline; and second, to
propose a phrase level recognition framework to produce the most likely word sequence for
a written phrase using the Vertibi algorithm.
Unlike the research carried out in the previous chapters of the thesis, each of which pursued the single goal of finding a novel solution to a given problem, the research in this chapter is carried out with multiple goals, i.e., to investigate conceptual algorithms for implementing a handwritten Pitmans Shorthand phrase recogniser, and also to consider the possibility of applying existing Application Program Interfaces (APIs) [Mic04] to resolve the problem of handwritten Pitmans Shorthand phrase recognition. A major bottleneck of the integration is access to the APIs hidden functions to enable the Pitmans Shorthand recognisers candidate English words to be input into the APIs.
This chapter presents detailed studies carried out to meet the main objectives mentioned
above, and it is categorised into the following four sections:
-
Phrase level recognition algorithm: propose a conceptual solution to find the most
likely word sequences for a handwritten Pitmans Shorthand phrase with the use of
the Viterbi algorithm and statistical language modelling techniques.
Several word rejection strategies [GKM+97], [PP02], [MAG+02] have been applied in the field of handwritten word recognition. Their reliability is related to their capability not to
accept false word hypotheses and not to reject true word hypotheses [Ka04]. Common
rejection thresholds reported in the literature are the class-threshold (e.g., [QAC05], which
rejects words according to their grammatical nature), the domain-threshold (e.g., [NB02],
which rejects words according to a user domain), the lexicon-threshold (e.g., [ESS+98],
which rejects words according to a lexicons confidence score) and the recogniser-threshold
(e.g., [PP02], which rejects words according to the confidence scores produced by a Hidden
Markov Model-based on-line handwriting recogniser).
In addition to the rejection strategies mentioned above, a critical contextual knowledge that
needs to be put into practice for rejecting homophones of handwritten Pitmans Shorthand
outlines is the shorthand outlines position. In Pitmans Shorthand, the outlines correct
positioning is highly critical [Oj95], as it provides a primary clue for retrieving vowel
information even though vowels are omitted in an outline. In general, an outlines position is
determined by the first pen-stroke in Pitmans Shorthand, such that if an outlines first stroke
is written above the writing line, it is considered to be written in the first position; if the first
stroke is written on the writing line, it is considered to be written in the second position; and
if the first stroke is written through the writing line, it is considered to be written in the third
position. Samples of three Pitmans Shorthand outlines written in three different positions are illustrated in Figure 6.1(a). As illustrated in Figure 6.1(b), although the three outlines comprise exactly the same consonant stroke, their corresponding English words can be easily identified by the differences between the outlines positions.
[Figure 6.1: Pitmans Shorthand outlines for the words at, aid and eat; (a) outlines written in the first, second and third positions; (b) the same consonant stroke transcribed as at, aid or eat according to its position]
Overall, stenographers apply the outlines position as a primary clue to identify the most
relevant words for a particular shorthand outline. However, this knowledge has never been
applied to solve the problem of homophone ambiguity in machine-based transcriptions.
Based on this observation, the contextual rejection strategy proposed in this chapter is
defined as:
WP(k) = WP(i) \ WP(i ≠ k)    (6.1)

where WP(i) is a list of candidate words for an input shorthand outline (written in different positions) and k is an input outlines written position, which is 1 for the first position, 2 for the second position and 3 for the third position. In order to clarify the algorithm, consider Example 1 given below.
Example 1
Assuming that k = 1, W = {W1, W2, W3, W4, W5, W6} is a set of candidate words for a given shorthand outline and P(i) of W1, W2, W3, W4, W5, W6 are 2, 1, 1, 3, 2 and 1 respectively, then WP(k) is calculated as:

WP(1) = WP(i) \ WP(i ≠ 1) = {W2, W3, W6}
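A Java sketch of the rejection step follows, assuming each candidate word carries the position defined for its outline; candidates whose position differs from the written position k are removed, as in equation 6.1:

import java.util.ArrayList;
import java.util.List;

public final class PositionFilter {

    // A candidate word with the written position defined for its outline
    record Candidate(String word, int position) {}

    // Keep only candidates whose defined position matches the written
    // position k of the input outline (equation 6.1)
    static List<Candidate> filterByPosition(List<Candidate> candidates, int k) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.position() == k) kept.add(c);
        }
        return kept;
    }
}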
[Figure 6.2: structure of the phrase level transcription process — a short-form interpreter and a Bayesian Network based vocalised outline interpreter each produce an ordered word list per outline; the lists are filtered by the contextual rejection strategy, and the filtered word lists are combined with a language model into a word graph for phrase level transcription]
The structure of the phrase level transcription process (for handwritten Pitmans Shorthand)
is illustrated in Figure 6.2, where the framework is based on the architecture of the online
Based on the algorithm defined in the online handwritten English sentence recognition
[QAC05], the most likely sequence of words for a handwritten Pitmans Shorthand
phrase is defined as:
W* = arg max p(P|W) p(W)    (6.2)

where W ranges over candidate word sequences and W* is the most likely sequence, P is the handwritten Pitmans Shorthand phrase, p(P|W) is the probability of the handwritten phrase P conditioned on the given sequence of words W, and p(W) is the prior probability of sequence W.
In detail, p(P|W) is evaluated by the confidence score of the Bayesian Network based word
interpreter and p(W) is given by a statistical language model. In other words, the efficiency
of finding the best sequence of words for a given input phrase depends on the confidence
score of the handwritten word recogniser plus the confidence score of the applied statistical
language model.
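As an illustration of how equation 6.2 could be maximised with the Viterbi algorithm, the following Java sketch combines per-outline recogniser scores with transition scores from a bigram language model; all scores are log-probabilities, and the table layout is our assumption rather than the thesis implementation:

public final class PhraseViterbi {

    // emit[t][j]     = log p(outline t | candidate j at position t)
    // trans[t][i][j] = log p(candidate j at t | candidate i at t-1); trans[0] is unused
    static int[] bestSequence(double[][] emit, double[][][] trans) {
        int T = emit.length;
        double[][] score = new double[T][];
        int[][] back = new int[T][];
        score[0] = emit[0].clone();
        back[0] = new int[emit[0].length];
        for (int t = 1; t < T; t++) {
            int n = emit[t].length;
            score[t] = new double[n];
            back[t] = new int[n];
            for (int j = 0; j < n; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int i = 0; i < emit[t - 1].length; i++) {
                    double s = score[t - 1][i] + trans[t][i][j] + emit[t][j];
                    if (s > best) { best = s; arg = i; }
                }
                score[t][j] = best;
                back[t][j] = arg;
            }
        }
        // trace back from the best final candidate
        int[] path = new int[T];
        int last = 0;
        for (int j = 1; j < emit[T - 1].length; j++) {
            if (score[T - 1][j] > score[T - 1][last]) last = j;
        }
        path[T - 1] = last;
        for (int t = T - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }
}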
This chapter focuses on the statistical language models impact on phrase level recognition, because a language models quality can directly affect the overall word recognition accuracy. For instance, [MB01] showed that a bi-gram language model outperforms a unigram language model in offline handwritten sentence recognition. Similarly, work by [ZB04] showed that a tri-gram model increases word recognition accuracy by 6.8% compared to a bi-gram model. Again, work by [QAC05] showed that using bi-gram and tri-gram models for online handwritten sentence recognition results in only a 0.1% difference in word recognition accuracy. These findings show that it is critical to apply an appropriate statistical language model in order to obtain an overall promising result.
Considering a statistical language models quality, this chapter proposes the use of the statistical language model embedded in the Microsoft handwriting recognition APIs [Mic04], which has been thoroughly trained on millions of words across various languages, dictionaries and grammars for the development of a commercially viable system.
6.3 Integration with Microsoft Handwriting Recognition APIs
This section presents a feasibility study of the integration of a Pitmans Shorthand phrase
recogniser with Microsoft handwriting recognition APIs, in order to take advantage of
existing statistical language models embedded in the APIs. In order to discuss the specific
API relating to this study, consider the object model illustrated in Figure 6.3.
[Figure 6.3: object model of the Microsoft.Ink class — ink collection and display objects; ink data objects (Strokes, Stroke); and recognition objects (Recognizer, RecognizerContext, WordList, RecognizerGuide, RecognitionResult, RecognitionAlternates, RecognitionAlternate, Gesture)]
Figure 6.3 illustrates an object model for the class Microsoft.Ink that includes child objects facilitating automatic handwriting recognition. A specific object relating to the current study is the RecognizerContext object, which enables ink recognition and the retrieval of the recognition result and alternative results.
Figure 6.4(a) illustrates sample recognition results produced by the RecognizerContext API, Figure 6.4(b) illustrates the change in recognition results upon a new words arrival, Figure 6.4(c) illustrates the change in the recognition results when the APIs context is limited to a full file paths name, and Figure 6.4(d) illustrates the change in recognition results when the APIs context is limited to an e-mail address username.
In total, approximately 40 kinds of input scopes can be defined in relation to the APIs context. A major bottleneck in integrating the handwritten Pitmans Shorthand recogniser into this powerful API is the APIs lack of a function that takes a written phrases candidate words as inputs and produces a ranked list of candidate phrases as an output. Overall, the investigation of a solution to facilitate this function would be rewarding and is left open to future work of this research.
6.4 Experimental Results
A small experiment is carried out to evaluate the performance of the contextual rejection strategy reported in this chapter. The data set includes 700 phrases, which were automatically generated using the word lists gained from the experiments of the Bayesian Network based word transcription (Chapter 4). A primary goal of this experiment is to analyse the accuracy of removing irrelevant words from the result lists based on the shorthand outlines position information, where the rejection accuracy a is defined as:
a = 100, if correct words appear in the result lists after applying the rejection strategy;
a = 0, otherwise.    (6.3)
[Figure 6.5: rejection accuracy (%) for each of the 700 test phrases]
The rejection accuracy of 700 phrases is illustrated in Figure 6.5. The study finds that the
contextual rejection strategy correctly filtered 98% of the phrases, and that inaccurate
position writing, practised primarily by inexperienced writers, caused the 2% error rate. The
findings show that the contextual rejection strategy proposed in this chapter is highly reliable
in conjunction with accurate position writing.
6.5 Discussion
This chapter presents Pitmans Shorthand specific contextual knowledge to reduce
handwritten Pitmans Shorthands homophone ambiguities. Theoretical algorithms to
resolve the problem of handwritten Pitmans Shorthand phrase recognition are proposed
with the use of the Viterbi algorithm and language models. In relation to the use of sufficient
statistical language models in order to enhance the phrase level recognition performance, a
feasibility study of the phrase level recognisers integration with the existing handwriting
recognition APIs is carried out. The study highlights the APIs efficiency and proposes the
potential benefits of successfully integrating the two components.
Overall, this chapter has addressed solutions to the online handwritten Pitmans Shorthand phrase recognition problem; however, the framework is not fully implemented in this research, mainly because research into this problem is no longer new and established frameworks are available in the market. Compared to phrase level transcription, the investigation of novel solutions to handwritten Pitmans Shorthands word level transcription problems has received more emphasis in this research, as the state of the art of word level transcription needs more extensive research in order to produce a commercially viable handwritten Pitmans Shorthand recogniser.
Chapter 7 Introduction
Previous chapters presented full details of the back-end interpretation of handwritten
Pitmans Shorthand outlines, whereas this chapter presents the research and development of
front-end user interfaces, via which users of the handwritten Pitmans Shorthand recogniser
interact with the back-end programs. The primary objective of the chapter is to demonstrate
the commercial viability of the end result of this research with a series of well-designed
prototypes. Figure 7.1 depicts the front-end and back-end layers of the system.
[Figure 7.1: the layers of the system — a client layer (the developer, volunteers for the training data collection and end users), a front-end layer, a tool layer, a back-end layer and a database layer]
7.1 Overview
[Figure 7.2 content: the ink collector GUI writes raw ink coordinates to data file 1; the recognition engine transforms them into ranked primitive classifications in data file 2; the transcription engine produces a ranked word list, e.g. play 0.322, bay 0.112, clay 0.001, in data file 3 for the result GUI; the components are implemented in Visual C#, Visual C++ and Java]
Figure 7.2: Illustration of interactions between user interfaces and back-end engines
of the system
A brief overview of the interactions between front-end interfaces and back-end engines is illustrated (Figure 7.2). As shown, handwritten ink data, collected by an ink collector GUI, is
firstly saved in a data file (data file 1 in Figure 7.2). The data file is then processed by the
collaborators recognition engine where a series of ink coordinates are transformed into a
ranked list of words or primitives (data file 2 in Figure 7.2). Then, on arrival of the
classified data, the transcription engine invokes and produces a ranked list of n-best words
for the corresponding classified data (data file 3 in Figure 7.2). Once the transcription result
is ready, the front-end GUI retrieves the result file and displays it as the best text
representation for a written outline.
Through these data files, the execution of the programs becomes independent, without one program necessarily waiting for the completion of another; this enables the concurrent development of several components of the system in two countries to be productive. In addition, the current system includes more than one programming environment, and the data files are, in fact, the primary media of communication between programs of the different environments. The graphical user interfaces presented in this chapter are implemented using tablet PCs with Microsoft Windows XP Tablet PC Edition. The detailed programming environments included in the current system development are:
-
Microsoft visual C#, used in the development of front end user interfaces.
The ink collection interfaces of this research facilitate not only the collection of online handwritten Pitmans Shorthand data, but also the collection of any kind of ink data needed for various purposes.
In general, the Tablet PC platform APIs relating to the ink collection can be divided into
three distinct groups: ink collection APIs, ink data management APIs and ink recognition
APIs. A pictorial presentation of how these APIs work together, at a high level, is provided
at the MSDN library [Abo04] and the illustration is replicated here (Figure 7.3) as a
reference for discussion.
According to the pictorial presentation of the Tablet PC platform APIs (Figure 7.3), the ink collection procedure of this research relates to the utilization of the Pen APIs (i.e., the first stage of Figure 7.3). Here, it should be clarified why the APIs of the other stages (i.e., stages 2, 3 and 4 of Figure 7.3) are not applicable to the current research, regardless of their efficient ink manipulation and recognition capabilities: the Tablet PC platform APIs support the processing of only fifteen handwritten languages at the time of writing, and Pitmans Shorthand is not one of them. In brief, only the ink collection APIs are applicable to the current research, and the remaining functions of ink manipulation and ink recognition
are covered by the recognition engine and the transcription engine of this research
respectively.
Figure 7.4 depicts the high level relationship of object models of the tablet PC APIs where a
hierarchical relationship of an ink collection object, namely InkCollector is highlighted.
The primary function of this object is capturing a series of ink coordinates and timestamps of
a pen-stroke. In general, any handwriting data written on handheld devices with a Microsoft
tablet PC platform can be collected with the use of an InkCollector object.
Figure 7.4: Illustration of the high level relationship of object models of the Tablet PC
platform APIs
The interfaces (Figure 7.5 & Figure 7.6) of this research are particularly designed for the collection of a large amount of handwriting data for system training purposes. They have been applied as the primary data collection tool in this research, and they can also be applied as a general data collection tool for any other kind of handwriting recognition system.
A primary purpose of the interfaces in this research is to collect and organise training data
effectively as well as to enable volunteers (shorthand writers) to have a user friendly
experience of entering Pitmans Shorthand outlines into a tablet PC. Mention should be
made that Pitmans Shorthand was once widely practised as a speech recording mechanism,
but more recently it has ceased being a popular writing system. Therefore, volunteers of the
training data collection process can be of various ages as well as domains. In addition, tablet
PCs are fairly new devices for the general population at the time of writing, and no more
than 20% of the volunteers of this research have previous experience of using pen-based
computers. Taking these factors into account, an important criterion is set in relation to the layout of the training data collector, i.e., the functions of the interfaces should be kept as simple as possible, and the appearance of the interfaces must be suitable for volunteers of various ages and domains.
Figure 7.6: Sample data entry page of the training data collector GUI
In general, the training data collector collects two types of data: writer data and ink data.
The writer data is intended for the evaluation of the overall system performance and the ink
data is intended for the training of the transcription engine. As illustrated (Figure 7.5), the
home page of the training data collector collects the following writer information:
Name: intended for automatic naming of training data folders and files.
Gender: intended for evaluating whether the transcription accuracy varies between
female writers and male writers.
Way of writing: intended for evaluating whether the transcription accuracy varies
depending on whether the writer is left-handed or right-handed.
The developer GUI (Figure 7.7) provides an advanced setting of the system where its
functions are particularly intended for system developers. Since it is a gateway to control
parameters of the recognition and transcription engines, it has presented huge benefits to the
current research and development. Moreover, it is also intended to be beneficial to future
system developers whose work is going to be based on this research. Functions included in
this interface are:
Definition of training data set: a text area is provided to enter a list of words that are
to be collected for training purposes. Since Pitmans Shorthand is no longer a popular writing system, databases of handwritten Pitmans Shorthand outlines, designed for training a handwriting recogniser, are not available at the time
Figure 7.8: The first version of the collaborators tablet PC interface for the
handwritten Pitmans Shorthand recognition system
Figure 7.9: The latest version of the collaborators tablet PC interface for Pitmans
Shorthand recognition system
Unlike the interfaces developed by the collaborator, end-user interfaces of this thesis put
emphasis on the usability issues including user friendliness, commercial viability and
completeness of the system. From the aspect of user friendliness, the research interfaces are
designed to look similar to a conventional shorthand note-pad. In this way, primary users
(stenographers) of the system are expected to get used to the interfaces quickly, thereby
enabling a short learning curve.
While taking into account the creation of a note pad like interface, the pen-input area (writing area) becomes a critical concern, i.e., whether a square writing box should be designed for the writing of a single word or multiple words. In general, facilitating the writing of multiple words in a single writing area suffers from word boundary ambiguities; on the other hand, enabling the writing of N words in N writing areas wastes screen space. In this research, the collaborators recognition engine encourages the writing of one and only one word in each writing area in order to reduce word boundary ambiguities. As a result, end-user interfaces of this research also encourage the writing of N words in N writing areas. Regardless of the use of several writing boxes in the interface, the original goal (i.e., to create a note pad like interface) is achieved by connecting the boxes with faded borders as illustrated (Figure 7.10).
In addition, the dimension of a writing area is discussed in relation to the creation of a note pad like interface. The size of Pitmans Shorthand note-pads, commonly used by stenographers, is roughly A5 (210mm x 149mm) with approximately 8mm line intervals [Lg90]. By taking the ratio of the size of a note pad to the size of a 15-inch digitiser of a tablet PC (e.g., 1024 pixels x 768 pixels), the 8mm line interval is considered to be equal to approximately 30 pixels. Based on these measurements, the dimension of an individual writing box of the interface is set at 100 pixels x 60 pixels with a line interval of 30 pixels. The solution appears to be practical but requires further assessment for user acceptability.
From the aspect of commercial viability, the overall presentation of the interface is designed
to look good in addition to its reliable functioning. On the whole, users are provided with a
choice of two layouts to interact with the final interfaces of the system. The first one (Figure
7.10) is designed to be practical for its rapid note taking purpose and it resembles a
conventional shorthand note-pad. The second one (Figure 7.11) is designed to be practical
for its general text entry purpose and it resembles a handwriting recogniser of Microsoft
Windows XP Tablet PC edition.
From the aspect of completeness, the final interfaces of this research perform as a gateway to access any component of the system, including a data entry GUI with a list of toolboxes for text editing and parameter setting, the developer GUI (Figure 7.7), the training data collection GUI (Figure 7.5), and a back-end view of the system (similar to the one proposed by the collaborator (Figure 7.9)). Despite the number of integrated components, the simplicity of the look of the interfaces is achieved by hiding the advanced level control components behind show/hide functions that open and close them respectively, as illustrated (Figure 7.10 & Figure 7.11).
Figure 7.10: Screenshot of a note-pad layout of the end-user interface of this research
[Figure 7.11: screenshot of the general text entry layout of the end-user interface, with textual output (make a cake), writing areas, page feed and an advanced setting toolbox]
On the whole, four types of GUIs are evaluated in the experiment. Two of them were developed by the collaborating research team and the other two were developed in the research and development of this thesis. In order to assist the reader in easily recognising the four GUIs, thumbnail views of the GUIs are presented in Figure 7.12. The experiment was conducted with 20 participants with different levels of skill in Pitmans Shorthand (ranging from those with no background knowledge of Pitmans Shorthand to those with a professional level of skill in Pitmans Shorthand).
Figure 7.12: Thumbnails of the four GUIs evaluated in the experiment
In general, experiments carried out in this chapter are categorised into four groups as
follows:
The distribution of user fondness for the presented prototypes in the case of speed
writing.
The distribution of user fondness for the presented prototypes in the case of general
text entry into handheld devices.
The comparison of the most favourite GUI of experienced shorthand writers and that
of novice shorthand writers.
Figure 7.13: The general distribution of user fondness for the presented prototypes
Figure 7.13 illustrates the level of user fondness for the four prototypes, where the X-axis represents the level of user preference for a specific prototype over the others, and the Y-axis represents the percentage of users. Experimental results show that prototype 4 is the most favourite GUI for 60% of users, and prototype 1 is the least favourite GUI for 95% of users.
Figure 7.14: The distribution of user fondness for the presented prototypes in the
case of speed writing
Figure 7.14 illustrates the level of user fondness for the four prototypes, especially in relation to the need for rapid writing, for instance, in the case of the real time recording of spoken speech. An interesting phenomenon here is that prototype 3 becomes the most preferred GUI over the others, in particular over prototype 4. Note that prototype 4 is the most favourite GUI in the experiment for general cases (Figure 7.13). This finding shows that the majority of users regard a note pad like interface as more appropriate for rapid note-taking purposes.
Figure 7.15: The distribution of user fondness for the presented prototypes in the
case of a small amount of text entry into handheld devices
Figure 7.15 illustrates the level of user fondness for the four prototypes, particularly in
relation to the need for entering a small amount of textual information into handheld devices,
for instance, entering a persons name into the name field of an address book. In contrast to
the case in Figure 7.14, the study finds that prototype 4 becomes the most favourite GUI
mainly due to the small amount of screen space taken by the shorthand recogniser while
other applications need to be run at the same time.
Figure 7.16: The comparison of the most favourite GUI of experienced shorthand
writers and that of novice shorthand writers
Finally, a comparison of the most favourite GUI of experienced shorthand writers and that of
novice shorthand writers (for the general purpose of use) is given in Figure 7.16. The study
finds that 100% of experienced shorthand writers prefer prototype 3 over the others, whereas
the majority of novice writers (80%) prefer prototype 4 over the others.
7.7 Discussion
This chapter presents the research and development findings of prototypes of the automatic
handwritten Pitmans Shorthand recogniser. It takes a step towards a commercialization of
the product by showing what can be done with the prototypes of the handwritten Pitmans
Shorthand recogniser.
The final interface of the system is designed with the integration of prototype 3 and prototype 4, so that users are provided with a choice of two different layouts (i.e., prototype 3 or prototype 4) in order to interact with the automatic handwritten Pitmans Shorthand recogniser on tablet PCs.
Chapter 8 Introduction
This chapter presents the summary and conclusion of the research carried out in this thesis and it is divided into the following four sections:
Contribution: draws attention to a list of major contributions that have been made to
the research and development in order to meet the overall objectives of the thesis,
outlined in chapter 1.
Future work: presents further research directions that may be taken in order to
improve upon the presented approaches for a commercially viable system.
Chapter 1 introduced the research of the thesis by highlighting a motivating need for the development of new lexical post-processing methods to enhance the quality of text interpretation of online handwritten Pitmans Shorthand outlines. It also highlighted the necessity for the development of a functional, user friendly graphical user interface which facilitates rapid text entry into pen based computing handheld devices using handwritten Pitmans Shorthand.
A thorough literature review was carried out in Chapter 2, which overviews currently available text entry systems for handheld devices and describes commonly used pattern recognition and natural language processing algorithms that are applied to resolve problems of handwriting recognition.
The investigation into the efficiency of a conventional phonetic based word transcription approach (where primitives of a shorthand script are firstly converted into a phonetic representation, which is then interpreted into corresponding English words with the use of a phonetic dictionary) was discussed in Chapter 3. It was shown that the approach is not robust against the ambiguities of Pitmans Shorthand, in particular the ambiguities caused by the random omission of vowels among outlines.
This motivated the development of a novel Bayesian Network based word transcription algorithm which aims to enhance the solution using a primitive based transcription approach (Chapter 4). In the new approach, primitives of a shorthand script are directly converted into orthographic English word(s), without being transformed into phonemes, with the use of a Pitmans Shorthand lexicon. It was shown that the new primitive based approach outperforms the conventional phonetic based method.
In relation to the primitive to text transcription approach, Chapter 5 presented the automatic
generation of a novel machine-readable Pitmans Shorthand lexicon which is an essential
component facilitating the primitive based transcription of the Bayesian Network based
word recogniser. The lexicon was shown to be a very effective mechanism for automatically
generating Pitmans Shorthand representations for any given word.
Following an extensive research into the word level transcription of handwritten Pitmans
Shorthand outlines, Chapter 6 proposed new contextual methods to enhance the solution
quality of the phrase level transcription problem. It was shown that the application of a well
Finally, prototypes of end user graphical user interfaces (GUIs), designed to demonstrate the
real time recognition of handwritten Pitmans Shorthand on a tablet PC are presented in
Chapter 7. This involves an evaluation of the user friendliness of the prototypes as well as
the selection of a final GUI for the whole system based on experimental results.
8.2 Contribution
A number of original contributions have been drawn from the thesis and they are identified as follows:
For the first time, an investigation into the integration of the low-level online handwritten Pitmans Shorthand recogniser with a high-level linguistic post-processor is presented. It is shown that the integration has resulted in higher quality research than the work reported in the literature of the same framework.
For the first time, the Bayesian Network representation is applied to the modelling
of handwritten Pitmans Shorthand outlines. A series of experiments are carried out
to analyse the transcription performance of the Bayesian Network based word
interpreter. The findings show that the Bayesian Network representation is robust
against stroke variation and highly effective for handling major ambiguities of
handwritten Pitmans Shorthand (i.e., classification errors and vowel errors).
For the first time, a machine readable Pitmans Shorthand lexicon is generated. The
findings show that the capability of the lexicon (i.e., ability to produce an accurate
Pitmans Shorthand representation for a corresponding word) plays an important
role in producing high quality solutions.
A complete yet concise testing data set (which covers the whole range of rules of Pitmans Shorthand) is proposed. The dataset is suitable to serve as a benchmark dataset in the literature.
For the first time, the development of end user graphical user interfaces for enabling
Pitmans Shorthand data entry into tablet PCs is carried out. It is shown that the
final interface of the system is ready to be introduced as a commercially viable
prototype.
commercially viable product. A major reward gained from the future cooperative research
will be the removal of the limitations of the current system, in particular recovering
segmentation errors of the recognition engine by allowing an interactive processing between
the recognition engine and the transcription engine. In the current Bayesian Network model,
the modelling of segmentation ambiguities is infeasible mainly due to the lack of real time
interaction with the low level segmentation process of the recognition engine. With an
interactive supply of low level segmentation data, Bayesian Network based stroke models
for Pitmans Shorthand notations can be added in connection with existing shorthand outline
models. In this way, segmentation ambiguities can be embedded in probabilistic models and
recovered in the lexical post-processing stage. Overall, it may be worthwhile exploring a
solution to segmentation errors, which is a critical issue in the recognition of natural
handwriting.
8.4 Dissemination
The research carried out in this thesis has been disseminated in international journals and
conference proceedings in the field of pattern recognition. The following is a list of papers
published or submitted during the research.
1. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Knowledge based
transcription of Pitmans handwritten shorthand using word frequency and context,
Proceedings of the 7th IEEE International Conference on Development and Application
Systems, pp. 508-512, Suceava, Romania, 27-29 May 2004.
2. Ma Yang, Graham Leedham, Colin Higgins, & Swe Myo Htwe, Segmentation and
recognition of vocalized outlines in Pitman shorthand, Proceedings of the 17th International
Conference on Pattern Recognition, Vol. I, ISBN 0-7695-2128-2, pp. 441-444, Cambridge,
UK, 23-26 August 2004.
3. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Post Processing of
Handwriting Pitmans Shorthand using Unigram and Heuristic Approaches, published in
Lecture Notes in Computer Science: Document Analysis Systems VI, 3163, Springer-Verlag,
pp. 332-336, Proceedings of the IAPR workshop on document analysis systems,
University of Florence, Italy, 8-10 September 2004.
4. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, Segmentation and
recognition of phonetic features in handwritten Pitman shorthand, Pattern Recognition,
August 2004, Accepted and in press.
5. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Evaluation of Feature
Sets in the Post Processing of Handwritten Pitmans Shorthand, Proceedings of the 9th
International Workshop on Frontiers in Handwriting Recognition, ISBN 0-7695-2187-8, pp.
359-364, Kokubunji, Tokyo, Japan, 26-29 October 2004.
6. Swe Myo Htwe, Colin Higgins & Graham Leedham, Post-processing of handwritten
Phonetic Pitmans Shorthand using a Bayesian Network built on geometric attributes, In
Pattern Recognition and Image Analysis, Lecture Notes in Computer Science 3687,
Springer, Sameer Singh, Maneesha Singh, Chid Apte, Petra Perner (Eds.), pp. 569-579,
2005.
7. Swe Myo Htwe, Colin Higgins & Graham Leedham, Transliteration of online
handwritten phonetic Pitmans Shorthand with the use of a Bayesian Network, Proceedings
of the 8th International Conference on Document Analysis and Recognition, Vol. 2, pp.
1090-1094, Seoul, Korea, 29 August - 1 September 2005.
8. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, On-line recognition of
Pitmans Shorthand for fast mobile text entry, Proceedings of the 3rd IEEE International
Conference on Information Technology and Applications, pp. 686-691, Sydney, Australia,
4-7 July 2005.
References
Abo04
About Pen Input, Ink and Recognition, available from www.msdn.microsoft.com, 2004.
Bc85
Brooks C.P., Computer Transcription of Written Shorthand for the Deaf, PhD
Thesis, Faculty of Engineering, University of Southampton, 1985.
BN81
Brooks C.P. & Newell A.F., Simultaneous Transcription of Pitman's New Era
Shorthand, Int. Conf. on Microprocessors in Automation and Communications,
pp. 171-179, London, 27-29 Jan, 1981.
BN85
BSH04
Bishop C.M., Svensen M., Hinton G.E., Distinguishing Text from Graphics in
On-line Handwritten Ink, in Proceedings of the Ninth International Workshop
on Frontiers in Handwriting Recognition (IWFHR'09), pp. 142-147, Tokyo,
Japan, 26-29 October, 2004.
CDL+99
CFH03
Chen F-S., Fu C-M. & Huang C-L., Hand Gesture Recognition Using a Real-time
Tracking Method and Hidden Markov Models, Image and Vision Computing,
Vol. 21(8): pp. 745-758, 2003.
CK04
Cho S-J. & Kim J.H., Bayesian Network Modeling of Strokes and their
Relationships for On-line Handwriting Recognition, Pattern Recognition, Vol.
37(2): pp. 253-264, 2004.
ESS+98
El-Yacoubi A., Sabourin R., Suen C.Y. & Gilloux M., Improved Model
Architecture and Training Phase in an Off-line HMM-based Word Recognition
System, in Proc. of the 14th International Conference on Pattern Recognition,
pp. 17-20, Brisbane, Australia, 1998.
FF93
Feddag A. & Foxley E., A Lexical Analyser for Arabic, International Journal of
Man-Machine Studies, Vol. 38(2): pp. 313-330, February 1993.
FW00
Freeman W. & Weiss Y., On the Fixed Points of the Max-Product Algorithm,
IEEE Transactions on Information Theory, 2000.
GB04
Günter S. & Bunke H., HMM-Based Handwritten Word Recognition: on the
Optimization of the Number of States, Training Iterations and Gaussian
Components, Pattern Recognition, Vol. 37, pp. 2069-2079, 2004.
GKM+97
Gloger J., Kaltenmeier A., Mandler E. & Andrews L., Rejection Management in
a Handwriting Recognition System, in Proc. 4th International Conference
Document Analysis and Recognition, pp.556-559, Ulm, Germany, 1997.
HD96
Hd99
HHL+04a
Htwe S. M., Higgins C., Leedham C.G. & Yang M., Knowledge Based
Transcription of Pitman's Handwritten Shorthand Using Word Frequency and
Context, in the Proceedings of the 7th IEEE International Conference on
Development and Application Systems, pp. 508-512, Suceava, Romania, 27-29
May 2004.
HHL+04b Htwe S.M., Higgins C., Leedham C.G. & Yang M., Post Processing of
Handwriting Pitmans Shorthand using Unigram and Heuristic Approaches, In
Document Analysis Systems VI, 3163, Lecture Notes in Computer Science,
Springer-Verlag, pp. 332-336, 2004.
HHL+04c
Htwe S.M., Higgins C., Leedham C.G. & Yang M., Evaluation of Feature Sets
in the Post Processing of Handwritten Pitmans Shorthand, Proceedings of the
9th International Workshop on Frontiers in Handwriting Recognition, ISBN
0-7695-2187-8, pp. 359-364, Kokubunji, Tokyo, Japan, 26-29 October 2004.
HHL+05a
Htwe S.M., Higgins C., Leedham C.G. & Yang M., Transliteration of Online
Handwritten Phonetic Pitmans Shorthand with the use of a Bayesian Network,
Proceedings of the 8th International Conference on Document Analysis and
Recognition, Vol. 2, pp. 1090-1094, Seoul, Korea, 29 August - 1 September
2005.
HHL+05b Htwe S.M., Higgins C., Leedham C.G. & Yang M., Post-processing of
Handwritten Phonetic Pitmans Shorthand using a Bayesian Network Built on
Geometric Attributes, Proceedings of the 3rd International Conference on
Advances in Pattern Recognition, pp. 569-579, Bath, UK, 22-25 August 2005.
HHL+05c
Htwe S.M., Higgins C., Leedham C.G. & Yang M., Post-processing of
Handwritten Phonetic Pitmans Shorthand using a Bayesian Network Built on
Geometric Attributes, In Pattern Recognition and Image Analysis, Lecture
Notes in Computer Science 3687, Springer, Sameer Singh, Maneesha Singh,
Chid Apte, Petra Perner (Eds.), pp. 569-579, 2005.
HLB00
Hu J., Lim S.G. & Brown M.K., Writer Independent On-line Handwriting
Recognition using an HMM Approach, Pattern Recognition, Vol. 33(1): pp.
133-147, 2000.
Hn97
HSS02
Hu T., De Silva L.C. & Sengupta K., A Hybrid Approach of NN and HMM for
Facial Emotion Classification, Pattern Recognition Letters, Vol. 23(11): pp.
1303-1310, 2002.
HV93
Hoffman J.S. & Vidal J.J., Cluster Network for Recognition of Handwritten,
Cursive Script Characters, Neural Networks, Vol.6(1): pp.69-78, 1993.
Ja99
Bilmes J.A., Natural Statistical Models for Automatic Speech Recognition, PhD
Dissertation, University of California at Berkeley, May 1999.
JBS05
Justino E. J. R., Bortolozzi F. & Sabourin R., A comparison of SVM and HMM
Classifier in the Off-line Signature Verification, Pattern Recognition Letters,
JGJ+98
JJ98
Jaakkola T.S. & Jordan M.I., Improving the Mean Field Approximation via the
Use of Mixture Distributions, in Learning in Graphical Models, Kluwer
Academic Publishers, 1998.
Jm99
Ka04
KKL03
Kim M., Kim D. & Lee S-Y., Face Recognition Using the Embedded HMM
with Second-order Block-specific Observations, Pattern Recognition, Vol.
36(11): pp. 2723-2735, 2003.
KP01
Kapoor A. & Picard R., A Real-time Head Nod and Shake Detector, in
Proceedings of the Workshop on Perceptive User Interfaces, November
2001.
KSN+03
Kumar G.H., Shankar M.R., Nagabhushan P. & Anami B.S., Generation of Pitman
Shorthand Language Symbol for Diphthongs, Grammalogues and Punctuation
from Spoken English Language Text: An Approach based on Discrete Wavelet
KSN04
LD84
LD86
LD87
LDB84
Leedham C.G., Downton A.C., Brooks C.P. & Newell A.F., On-line
Acquisition of Pitman's Handwritten Shorthand as a Means of Rapid Data
Entry, Proc. of the 1st Int. Conf. on Human-Computer Interaction, pp.
2.86-2.91, London, UK, 4-7 September 1984.
LDB85
Leedham C.G., Downton A.C., Brooks C.P. & Newell A.F., On-line
Acquisition of Pitman's Handwritten Shorthand as a Means of Rapid Data
Entry, Human-Computer Interaction - INTERACT '84, B. Shackel (Ed.), pp.
151-156, North Holland, 1985.
Lg84
Lg89
Lg90
LQ89
Leedham C.G. & Qiao Y., On-line Recognition of Vocalised Pitman shorthand
Outlines, Proc. of the IEE Colloquium on Character Recognition and
Applications, Digest No. 1989/109, pp. 10/1-10/5, Savoy Place, London, 2
October 1989.
LQ90
LQ92
Leedham C.G. & Qiao Y., High Speed Text Input to Computer Using
Handwriting, Instructional Science, Vol. 21, pp. 209-221, September 1992.
Lr89
LY97
MAG+02
Marukatat S., Artières T., Gallinari P. & Dorizzi B., Rejection Measures for
Handwriting Sentence Recognition, in Proc. 8th International Workshop on
Frontiers in Handwriting Recognition, pp. 24-29, Niagara-on-the-Lake, Canada,
2002.
MB01
Marti U-V. & Bunke H., Using a Statistical Language Model to Improve the
Performance of an HMM-Based Cursive Handwriting Recognition System,
IJPRAI, Vol. 15(1): pp. 65-90, 2001.
Md98
Mic04
Mk01
Mk98
Ml00
Mm03
Miozzo M., On the Processing of Regular and Irregular Forms of Verbs and
Nouns: Evidence from Neuropsychology, Cognition, Vol. 87(2), pp. 101-127,
March 2003.
Ms01
MS04
MS99
Mt98
Masui T., POBox: An Efficient Text Input Method for Handheld and
Ubiquitous Computers, Proc. of the ACM Conference on Human Factors in
Computing Systems (CHI 98), Los Angeles, USA, pp. 328-335, April 1998.
NB02
NL92
Oj95
Osborne J., Pitman 2000: Shorthand First Course (Pitman 2000 Shorthand),
1995.
Pj88
PP02
Pitrelli J.F. & Perrone M.P., Confidence Modeling for Verification
Post-Processing for Handwriting Recognition, in Proc. 8th International
Workshop on Frontiers in Handwriting Recognition, pp. 30-35,
Niagara-on-the-Lake, Canada, 2002.
PS91
Peot M. & Shachter R., Fusion and Propagation with Multiple Observations in
Belief Networks, Artificial Intelligence, Vol.48: pp. 299-318, 1991.
PVM+03
Perraud F., Viard-Gaudin C., Morin E. & Lallican P-M., N-Gram and N-Class
Models for On-line Handwriting Recognition, in Proc. 7th International
Conference on Document Analysis and Recognition, pp. 1053-1059, 2003.
QAC05
Quiniou S., Anquetil E. & Carbonnel S., Statistical Language Models for
On-line Handwritten Sentence Recognition, Proceedings of the Eighth International
Conference on Document Analysis and Recognition (ICDAR 2005), pp. 516-
QL89
QL91
QL93
Ri93
Rl89
Sa04
Seward A., A Fast HMM Match Algorithm for Very Large Vocabulary Speech
Recognition, Speech Communication, Vol. 42, pp. 191-206, 2004.
SJJ96
Saul L.K., Jaakkola T. & Jordan M.I., Mean Field Theory for Sigmoid Belief
Networks, Journal of Artificial Intelligence Research, Vol. 4: pp. 61-76, 1996.
SKN+04
Shankar M.R., Kumar G.H., Nagabhushan P. & Anami B.S., Linear Predictive
Coefficient Based Approach for Generation of Pitman Shorthand Language
Characters from Spoken English Language, Proceedings of the International
Conference on Computational Science (ICCS 2004), Krakow, Poland, 6-9 June
2004.
Sr98
Sy94
Tab04
VBB04
WF99
WH93
Win05
Windows XP Tablet PC Edition 2005 Features, available from
XL02
YLH+04a
Yang M., Leedham C.G., Higgins C. & Htwe S.M., Segmentation and
Recognition of Vocalized Outlines in Pitman Shorthand, Proceedings of the
17th International Conference on Pattern Recognition, Vol. I, ISBN
0-7695-2128-2, pp. 441-444, Cambridge, UK, 23-26 August 2004.
YLH+04b Yang M., Leedham C.G., Higgins C. & Htwe S.M., Segmentation and
Recognition of Phonetic Features in Handwritten Pitman Shorthand, Pattern
Recognition, August 2004, Accepted and in press.
YLH+05a
Yang M., Leedham C.G., Higgins C. & Htwe S.M., On-line Recognition of
Pitmans Shorthand for Fast Mobile Text Entry, Proceedings of the 3rd IEEE
International Conference on Information Technology and Applications, pp.
686-691, Sydney, Australia, 4-7 July 2005.
YLH+05b Yang M., Leedham C.G., Higgins C. & Htwe S.M., An On-line Automatic
Recognition System for Pitmans Shorthand, Proceedings of the IEEE Region
10 Technical Conference (TENCON05), Melbourne, Australia, 21-24
November 2005.
YLH+05c
Yang M., Leedham C.G., Higgins C. & Htwe S.M., Critical Technological
Issues of Commercializing a Pitman Shorthand Recognition System,
Proceedings of the 5th International Conference on Information,
YWP95
Yang L., Widjaja B.K. & Prasad R., Application of Hidden Markov Models for
Signature Verification, Pattern Recognition, Vol. 28(2): pp. 161-170, 1995.
ZB04
ZK03
Chapter 9 Appendix A
Objective: to convert certain consonant and vowel combinations into the diphthong symbol
(Figure 9.1).
Strategy: convert a combination of /Y/, /ZH/, /JH/ or /CH/ and /U/ or /AH/ into a diphthong
feature.
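To make the strategy concrete, the following minimal Python sketch applies the rule to a phoneme sequence. The phoneme symbols follow the notation of this appendix; the function name and the 'diphthong' label are illustrative assumptions, not identifiers from the thesis implementation.

def apply_diphthong_rule(phonemes):
    """Merge /Y/, /ZH/, /JH/ or /CH/ followed by /U/ or /AH/ into one
    diphthong feature, as described in the strategy above."""
    out, i = [], 0
    while i < len(phonemes):
        if (i + 1 < len(phonemes)
                and phonemes[i] in ('Y', 'ZH', 'JH', 'CH')
                and phonemes[i + 1] in ('U', 'AH')):
            out.append('diphthong')  # assumed label for the diphthong sign
            i += 2                   # consume both phonemes of the pair
        else:
            out.append(phonemes[i])
            i += 1
    return out

print(apply_diphthong_rule(['R', 'F', 'Y', 'U', 'Z']))  # the word refuse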
Figure 9.1: Illustration of the use of the diphthong feature in a vocalised outline (the Pitmans
Shorthand outline for the word refuse).
Figure 9.2: Illustration of the use of a dot primitive for the sound COM at the beginning of a
word.
The Pitmans Shorthand outline for the word where is written with /WH/ and /R/.
In addition, the 5th rule states that a consonant /D/ following the prefix IN- and UN- is not
allowed to be omitted. This is to avoid a conflict with the ND writing rule of Pitmans
Shorthand, in which the consonant /D/ following /N/ is omitted. Details of the ND rule
can be found in Appendix B.
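The following is a minimal Python sketch (not the thesis implementation) of the ND writing rule together with the 5th rule above. The list-based phoneme encoding, the prefix argument, and the assumption that a detected prefix occupies the first two phonemes are all illustrative.

def apply_nd_rule(phonemes, prefix=None):
    """Omit /D/ after /N/, except when the /N/ belongs to prefix IN- or UN-."""
    result = []
    i = 0
    while i < len(phonemes):
        result.append(phonemes[i])
        if (phonemes[i] == 'N' and i + 1 < len(phonemes)
                and phonemes[i + 1] == 'D'
                # never omit when the /N/ closes the prefix IN- or UN-
                and not (prefix in ('IN', 'UN') and i <= 1)):
            i += 1  # skip the /D/ omitted by the ND rule
        i += 1
    return result

print(apply_nd_rule(['S', 'EH', 'N', 'D']))           # /D/ omitted after /N/
print(apply_nd_rule(['IH', 'N', 'D', 'IY'], 'IN'))    # /D/ kept after prefix IN-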
Figure 9.4: Illustration of the use of the negative prefix IR- in a vocalised outline (the
Pitmans Shorthand outline for the word irregular).
a = hook + P stroke
b = hook + B stroke
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
The Pitmans Shorthand outline for the word play combines /P/ and /L/ into the single hooked
primitive /PL/.
Figure 9.6: Illustration of the use of FL hook at the beginning of a vocalised outline (/F/ and
/L/ combined as /FL/).
Figure 9.7: Illustration of the use of SPR stroke in a vocalised outline (/S/, /P/ and /R/
combined as /SPR/).
A vocalised outline in which /S/, /T/ and /R/ are combined into the /STER/ loop.
Figure 9.9: Illustration of the omission of the sound CON in the middle of a vocalised
outline.
Objective: to convert the sound /ED/ that marks the past tense of a verb into a disjoined
stroke T or stroke D. A sample Pitmans Shorthand outline containing a disjoined /ED/ at the
end is illustrated in Figure 9.11.
Strategy: if a word ends with /T/ or /D/ and a vowel comes before the /T/ or /D/, then
replace the final /vowel+T/ or /vowel+D/ with a disjoined T or D stroke respectively.
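A minimal Python sketch of this strategy follows, assuming a list-based phoneme encoding; the vowel set and the 'disjoined-T'/'disjoined-D' labels are illustrative, not the symbol inventory of the thesis lexicon.

VOWELS = {'A', 'E', 'I', 'O', 'U', 'AH', 'AW', 'EH', 'IH'}  # assumed set

def apply_ed_rule(phonemes):
    """Rewrite a final vowel + /T/ or /D/ as a disjoined T or D stroke."""
    if (len(phonemes) >= 2 and phonemes[-1] in ('T', 'D')
            and phonemes[-2] in VOWELS):
        return phonemes[:-2] + ['disjoined-' + phonemes[-1]]
    return phonemes

print(apply_ed_rule(['W', 'AW', 'N', 'T', 'IH', 'D']))  # ends in disjoined-D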
Figure 9.11: A vocalised outline containing a disjoined /ED/ at the end.
Strategy: if an input ends with the sound ING, convert the sound ING into a dot primitive.
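The ING rule can be expressed in the same style as the earlier sketches; 'dot-ING' is an assumed label for the dot primitive.

def apply_ing_rule(phonemes):
    """Rewrite a word-final /ING/ as a dot primitive."""
    if phonemes and phonemes[-1] == 'ING':
        return phonemes[:-1] + ['dot-ING']
    return phonemes

print(apply_ing_rule(['T', 'AW', 'K', 'ING']))  # final /ING/ becomes a dot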
Figure 9.15: Illustration of the use of the suffix INGS in a vocalised outline.
Figure 9.16: Illustration of the use of the suffix SHIP in a vocalised outline (the Pitmans
Shorthand outline for the word scholarship).
Figure 9.18: Illustration of the use of the suffix MENT in a vocalised outline.
Figure 9.19: Illustration of the use of the suffix MENTAL in a vocalised outline.
Figure 9.20: Illustration of the use of the suffix MENTALLY in a vocalised outline.
Figure 9.21: Illustration of the omission of the syllable TER in a vocalised outline.
Primitive pairs that cannot be represented by doubling: /F K/, /V K/, /F G/ and /V G/.
A vocalised outline in which the stroke /M/ is doubled to represent /MD/.
Figure 9.24: Illustration of the use of FR hook at the beginning of a vocalised outline (/F/
and /R/ combined as /FR/).
Figure 9.26: Illustration of the occurrence of the sound SKR in a vocalised outline (/S/, /K/
and /R/ combined as /SKR/).
Figure 9.33: Illustration of the use of hook V in the middle of a vocalised outline.
Objective: to convert the sound SHUN in the middle or at the end of a word into a small or
large hook. A sample Pitmans Shorthand outline containing a large SHUN hook at the end
is illustrated in Figure 9.34.
Strategy:
1. if the sound SHUN appears at the beginning of a word, it is not converted into a
hook;
2. if the sound SHUN is preceded by a circle S or Z, it is converted into a small hook;
3. otherwise, the sound SHUN in the middle or at the end of an input is converted into
a large hook.
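A minimal Python sketch of the three-way SHUN strategy above follows; the primitive labels ('circle-S', 'circle-Z', 'small-hook-SHUN', 'large-hook-SHUN') are assumptions for illustration, and the input is assumed to be a partially converted sequence in which circles have already been identified.

def convert_shun(symbols):
    """Apply rules 1-3 above to every occurrence of /SHUN/."""
    out = []
    for i, s in enumerate(symbols):
        if s != 'SHUN':
            out.append(s)
        elif i == 0:
            out.append('SHUN')             # rule 1: word-initial, not converted
        elif out and out[-1] in ('circle-S', 'circle-Z'):
            out.append('small-hook-SHUN')  # rule 2: preceded by circle S or Z
        else:
            out.append('large-hook-SHUN')  # rule 3: all remaining cases
    return out

print(convert_shun(['T', 'circle-S', 'SHUN']))  # small hook
print(convert_shun(['T', 'N', 'SHUN']))         # large hook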
Figure 9.34: Illustration of the use of a large SHUN hook in a vocalised outline.
5. If /N Z/ appears at the end of an input and is preceded by a curve stroke, the sound
/N Z/ is converted into a series of a small hook followed by a small circle.
6. If /N Z/ or /N S/ appears at the end of an input and is preceded by a straight stroke,
the sound /N Z/ or /N S/ is converted into a small circle.
7. If the sound /N SES/ or /N ZES/ appears at the end of an input and is preceded by a
straight stroke, the sound /N SES/ or /N ZES/ is converted into a large circle.
8. If the sound /N STER/ or /N ST/ appears at the end of an input and is preceded by a
straight stroke, the sound /N STER/ or /N ST/ is converted into a large loop or small
loop respectively.
9. If the sound /N T S/ or /N D Z/ appears at the end of an input and is preceded by a
straight stroke, the sound /N T S/ or /N D Z/ is converted into a small circle.
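Rules 5-9 can be read as an ordered suffix-rewrite table keyed on the shape of the preceding stroke. The Python sketch below is illustrative only; the phoneme spellings and primitive labels are assumptions, not the symbol inventory of the thesis lexicon.

CURVE, STRAIGHT = 'curve', 'straight'  # assumed stroke-shape classes

# (required final phonemes, required preceding stroke shape, replacement)
N_RULES = [
    (('N', 'STER'), STRAIGHT, ['large-loop']),            # rule 8
    (('N', 'ST'), STRAIGHT, ['small-loop']),              # rule 8
    (('N', 'SES'), STRAIGHT, ['large-circle']),           # rule 7
    (('N', 'ZES'), STRAIGHT, ['large-circle']),           # rule 7
    (('N', 'T', 'S'), STRAIGHT, ['small-circle']),        # rule 9
    (('N', 'D', 'Z'), STRAIGHT, ['small-circle']),        # rule 9
    (('N', 'Z'), CURVE, ['small-hook', 'small-circle']),  # rule 5
    (('N', 'Z'), STRAIGHT, ['small-circle']),             # rule 6
    (('N', 'S'), STRAIGHT, ['small-circle']),             # rule 6
]

def rewrite_n_cluster(phonemes, preceding_shape):
    """Apply the first matching word-final /N .../ rewrite rule."""
    for suffix, shape, replacement in N_RULES:
        n = len(suffix)
        if tuple(phonemes[-n:]) == suffix and preceding_shape == shape:
            return phonemes[:-n] + replacement
    return phonemes

print(rewrite_n_cluster(['P', 'L', 'N', 'Z'], STRAIGHT))  # -> small circle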
Figure 9.35: Illustration of the use of hook N at the end of a vocalised outline.
A series of primitives that cannot be represented by halving: R + T/D; S + T; L + K/G + T/D.
Objective: to convert consonants that have not been converted into geometric features into
their corresponding primitives.
Strategy:
1. replace the consonant /P/ with stroke P
2. replace the consonant /B/ with stroke B
3. replace the consonant /T/ with stroke T
4. replace the consonant /D/ with stroke D
5. replace the consonant /K/ with stroke K
6. replace the consonant /G/ with stroke G
7. replace the consonant /M/ with stroke M
8. replace the consonant /N/ with stroke N
9. replace the consonant /NG/ with stroke NG
10. replace the consonant /F/ with stroke F
11. replace the consonant /V/ with stroke V
12. replace the consonant /Th/ with stroke Th
13. replace the consonant /TH/ with stroke TH
14. replace the consonant /W/ with a small hook followed by an upward R
15. replace the consonant /Y/ with a small hook followed by an upward R
16. replace the consonant /CH/ with stroke CH
17. replace the consonant /JH/ with stroke JH
18. replace the consonant /SH/ with stroke SH
19. replace the consonant /S/ with a small circle
20. replace the consonant /Z/ with a small circle
21. replace the consonant /ZH/ with stroke ZH
22. replace the consonant /H/ with a small circle followed by stroke R
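The 22 replacements above amount to a lookup table from consonants to primitive sequences. A minimal Python sketch follows; the primitive labels are illustrative assumptions rather than identifiers from the thesis.

CONSONANT_PRIMITIVES = {
    'P': ['stroke-P'], 'B': ['stroke-B'], 'T': ['stroke-T'],
    'D': ['stroke-D'], 'K': ['stroke-K'], 'G': ['stroke-G'],
    'M': ['stroke-M'], 'N': ['stroke-N'], 'NG': ['stroke-NG'],
    'F': ['stroke-F'], 'V': ['stroke-V'],
    'Th': ['stroke-Th'], 'TH': ['stroke-TH'],
    'W': ['small-hook', 'upward-R'],
    'Y': ['small-hook', 'upward-R'],
    'CH': ['stroke-CH'], 'JH': ['stroke-JH'], 'SH': ['stroke-SH'],
    'S': ['small-circle'], 'Z': ['small-circle'], 'ZH': ['stroke-ZH'],
    'H': ['small-circle', 'stroke-R'],
}

def consonants_to_primitives(consonants):
    """Map each remaining consonant to its primitive sequence (rules 1-22)."""
    primitives = []
    for c in consonants:
        primitives.extend(CONSONANT_PRIMITIVES[c])
    return primitives

print(consonants_to_primitives(['S', 'P', 'CH']))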
The 45th rule: extract vowels of a word and append them to the end of consonant primitives.
The 46th rule: convert vowels into their corresponding geometric primitives.
Appendix B
Rule names:
MD, ND
Half-length strokes
Suffix -LY
The initially hooked FR, VR, Thr, THR are always reversed
THR