SweMyoHtweThesis PDF

Lexicon Organisation and
Contextual Methods for Online

Handwritten Pitmans
Shorthand Recognition
by Swe Myo Htwe, BSc
Thesis submitted to the University of Nottingham for the degree of

Doctor of Philosophy
School of Computer Science and Information Technology
December 2006
To my parents and fianc
ii
Abstract
This research investigates innovations for the computer transcription of handwritten Pitmans
Shorthand as a rapid means of text entry (up to 100 words per minute) into todays pen-based
handheld devices.
Two mathematical models are developed in this work. The first model deals with high level
phonetic-based translation, while the second model is specifically concerned with low level
primitive-based translation. Both models are closely related to the lexicon organization and
contextual processing for online handwritten Pitmans Shorthand recognition.
A number of research issues that arise from interpreting handwritten Pitmans Shorthand strokes
of digital ink as text are addressed including: (a) a feasibility study into improving a conventional
phonetic-based transliteration approach to advance word recognition; (b) an investigation into
new Bayesian Network modelling of strokes and their relationships in order to solve the problem
of geometric variations and vowel ambiguities of handwritten Pitmans Shorthand; (c) generation
of a new machine-readable Pitmans Shorthand lexicon to facilitate the direct transcription of
geometric features of Pitmans Shorthand into English text; (d) analysis of the impact of
statistical language modelling in handwriting phrase recognition; (e) and a discussion of the
graphical user interface issues in relation to the development of a commercial prototype from the
frame of reference of this research.
The research has been carried out in close cooperation with Nanyang Technology University
(NTU) in Singapore. The system is currently undergoing a final evaluation in terms of its
recognition accuracy, as well as its potential to be introduced as a commercially viable fast text
input system.
iii
Acknowledgements
I would like to take this opportunity to express my sincere gratitude to my supervisor Dr.
Colin Higgins for his valuable guidance and constant support since the day I had stepped
into the School of Computer Science of the University of Nottingham till the day of the
successful completion of this research.
My sincere gratitude also goes to Professor Graham Leedham for his dedicated guidance and
genuine assistance for keeping the close collaboration between the two participating teams
of this research. My deepest thanks also go to Ma Yang for her heartfelt contribution and
her immediate responses during the critical time of this collaborative research.
My sincere thanks also go to Ms. Joyce Cox for her kind and professional help in proof
reading the quality of English of the thesis. Also, from the bottom of my heart, I am very
grateful to all participants who have helped me in the experiments of this research. Many
thanks also go to my colleagues in the LTR Research group for their warm friendship that
made me feel at home in our LTR lab.
Also, my endless thanks to my uncle, Dr. Kyin Win for supporting me financially and
emotionally to make my dream of participating in the doctorate research come true. My
heartfelt thanks also go to the International Office and the School of Computer Science of
the University of Nottingham for their enormous financial support for the development of
this research.
Also, my sincere love and thanks to my parents, fianc, and all my friends in Nottingham for
supporting me financially, emotionally and spiritually during the difficult days of my long
residence in Nottingham.
Last but not least, my sincere thanks to all the members of the School of Computer Science
of the University of Nottingham for all their help and advice, given to me when I needed it
most.
Thank you all, Swe Myo Htwe.
iv
Table of contents
ABSTRACT ..II
ACKNOWLEDGEMENTSIII
TABLE OF CONTENTS ...IV
LIST OF FIGURES
LIST OF TABLES .
1
LINGUISTIC POST PROCESSING OF HANDWRITTEN PITMANS SHORTHAND .. 1
CHAPTER 1
INTRODUCTION ................................................................................................... 2
1.1
BACKGROUND...................................................................................................................... 3
1.1.1
Collaboration ................................................................................................................. 3
1.1.2
Motivation ...................................................................................................................... 4
1.1.3
Scope .............................................................................................................................. 5
1.2
BRIEF OVERVIEW ................................................................................................................. 7
1.2.1
General Objectives and Contributions ........................................................................... 7
1.3
SYNOPSIS OF THE DISSERTATION ....................................................................................... 12
2
BACKGROUND TO THE AUTOMATIC RECOGNITION OF HANDWRITTEN
PITMANS SHORTHAND ............................................................................................................... 15
CHAPTER 2
INTRODUCTION ................................................................................................. 16
2.1
2.1.1
2.1.2
2.1.3
2.1.4
2.2
2.3
2.4
EVALUATION OF EXISTING TEXT INPUT SYSTEMS FOR HANDHELD DEVICES ....................... 16

On-screen keyboards vs. a handwritten Pitmans Shorthand recognizer..................... 17
A cursive handwriting recognizer vs. a handwritten Pitmans Shorthand recognizer . 17
Gesture based text entry systems vs. a handwritten Pitmans Shorthand recognizer... 18
Speech recognition systems vs. a handwritten Pitmans Shorthand recognizer........... 19
PITMANS SHORTHAND: A BRIEF OVERVIEW ...................................................................... 20
AUTOMATIC RECOGNITION OF HANDWRITTEN PITMANS SHORTHAND: AN OVERVIEW ...... 23
HANDWRITING RECOGNITION ALGORITHMS TO IMPROVE A WORD LEVEL TRANSLITERATION
26
2.4.1
Hidden Markov Models (HMMs) ................................................................................. 27
2.4.2
Neural Networks........................................................................................................... 28
2.4.3
Bayesian Networks ....................................................................................................... 29
2.4.3.1
2.4.3.2
2.4.3.3
Conditional independence.................................................................................................30
Inference ........................................................................................................................... 32
Learning............................................................................................................................33
2.5
NATURAL LANGUAGE PROCESSING ALGORITHMS FOR HANDWRITTEN PHRASE RECOGNITION

35
2.5.1
Statistical language modeling ...................................................................................... 35
2.5.2
Viterbi algorithm .......................................................................................................... 36
2.6
PEN APPLICATION PROGRAM INTERFACES (APIS) .............................................................. 37
2.7
SUMMARY.......................................................................................................................... 38
3
EVALUATION OF PHONETIC BASED TRANSCRIPTION OF VOCALISED
HANDWRITTEN PITMANS OUTLINES..................................................................................... 39
CHAPTER 3
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.7.1
3.7.2
INTRODUCTION ................................................................................................. 40
SYSTEM OVERVIEW ............................................................................................................ 41

TRANSCRIPTION OF VOCALIZED OUTLINES BASED ON A PHONETIC APPROACH ................... 43
LEXICON PREPARATION ..................................................................................................... 44
NEAREST NEIGHBOURHOOD QUERY (NNQ) ...................................................................... 47
FEATURE TO PHONEME CONVERSION ................................................................................ 49
PHONEME ORDERING ......................................................................................................... 51
EXPERIMENTAL RESULTS ................................................................................................... 54
Data Set ........................................................................................................................ 54
Analysis of a phonetic lexicon ...................................................................................... 55
v
3.7.3
Performance evaluation of the word level transcription .............................................. 57
3.8
DISCUSSION ....................................................................................................................... 58
4
BAYESIAN NETWORK BASED WORD TRANSCRIPTION ........................................... 61
CHAPTER 4
4.1
4.2
4.3
4.4
4.4.1
4.4.2
4.5
4.5.1
4.5.2
4.6
4.6.1
4.6.2
4.7
4.8
4.8.1
4.8.2
4.8.3
4.8.4
INTRODUCTION ................................................................................................. 62
SYSTEM OVERVIEW ........................................................................................................... 63

SUMMARY OF BAYESIAN NETWORK BASED WORD TRANSCRIPTION................................... 64
LIFE CYCLE OF OUTLINE MODELS ..................................................................................... 65
OUTLINE MODEL ARCHITECTURE ...................................................................................... 67
Nodes of an outline model ............................................................................................ 68
Relationships between nodes........................................................................................ 73
INFERENCE ......................................................................................................................... 74
Message Initialization .................................................................................................. 75
Belief Updating ............................................................................................................ 76
LEARNING OF OUTLINE MODELS ....................................................................................... 78
Learning of consonant primitives ................................................................................. 79
Learning of vowel primitives ........................................................................................ 81
MODEL SELECTION ............................................................................................................ 82
EXPERIMENTAL RESULT ..................................................................................................... 86
Data set ........................................................................................................................ 87
Evaluation of the recognition engine............................................................................ 89
Evaluation of the word transcription accuracy ............................................................ 93
Analysis of word transcription accuracy using the single consonant data set ............. 94
4.8.4.1
4.8.4.2
4.8.4.3
4.8.4.4
4.8.5
Analysis of word transcription accuracy using stroke-combination data set ............... 99
4.8.5.1
4.8.5.2
4.8.5.3
4.8.5.4
4.8.6
Analysis of the recognition accuracy vs. the transcription accuracy.................................99

Analysis of the accuracy of a result list .......................................................................... 100
Analysis of the correction accuracy vs. the classification/vowel errors ..........................101
Analysis of factors influencing the accuracy of a result list............................................102
Analysis of word transcription accuracy for the special-rule data set ....................... 103
4.8.6.1
4.8.6.2
4.8.6.3
4.8.6.4
4.9
Analysis of the recognition accuracy vs. the transcription accuracy.................................94

Analysis of the accuracy of a result list ............................................................................95
Analysis of the correction accuracy vs. the classification/vowel errors ............................ 97
Analysis of factors influencing the accuracy of a result list.............................................. 98
Analysis of the recognition accuracy vs. the transcription accuracy............................... 103

Analysis of the accuracy of the result list .......................................................................104
Analysis of the correction accuracy vs. the classification/vowel errors ..........................105
Analysis of factors influencing the accuracy of a result list............................................106
DISCUSSION ..................................................................................................................... 107
GENERATION OF A MACHINE-READABLE PITMANS SHORTHAND LEXICON

110
CHAPTER 5
INTRODUCTION ............................................................................................... 111
5.1
OVERVIEW ....................................................................................................................... 112
5.1.1
Rule-based creation of the electronic Pitmans Shorthand lexicon............................ 113
5.2
STRUCTURE OF THE ELECTRONIC PITMANS SHORTHAND LEXICON ................................. 114
5.2.1
Feature set.................................................................................................................. 114
5.2.2
Key.............................................................................................................................. 115
5.2.3
Lexicon layout ............................................................................................................ 116
5.3
CONVERSION PROCEDURE ................................................................................................ 118
5.3.1
The importance of algorithms of the presented rules ................................................. 119
5.3.2
Description of Rules ................................................................................................... 120
5.4
EXPERIMENTAL RESULTS ................................................................................................. 127
5.4.1
Data set ...................................................................................................................... 127
5.4.2
Analysis of the accuracy of a machine readable Pitmans Shorthand lexicon ........... 128
5.4.3
Analysis of the distribution of homophones in machine-readable Pitmans Shorthand
lexicons 134
5.5
DISCUSSION ..................................................................................................................... 136
6
PHRASE LEVEL TRANSCRIPTION OF ONLINE HANDWRITTEN PITMANS
SHORTHAND OUTLINES ............................................................................................................ 137
vi
CHAPTER 6
6.1
6.2
6.3
6.4
6.5
INTRODUCTION ............................................................................................... 138
CONTEXTUAL REJECTION STRATEGY ............................................................................... 139

HANDWRITTEN PITMANS SHORTHAND PHRASE RECOGNITION ........................................ 141
THE INTEGRATION OF A PITMANS SHORTHAND PHRASE RECOGNISER WITH APIS........... 143
DISCUSSION ..................................................................................................................... 146
7
GRAPHICAL USER INTERFACES OF THE HANDWRITTEN PITMANS
SHORTHAND RECOGNITION SYSTEM................................................................................... 148
CHAPTER 7
INTRODUCTION ............................................................................................... 149
7.1
OVERVIEW ....................................................................................................................... 150
7.2
INK DATA COLLECTION IN THIS RESEARCH ....................................................................... 151
7.3
GENERAL TRAINING DATA COLLECTION TOOL ................................................................. 155
7.4
DEVELOPER GRAPHICAL USER INTERFACE ...................................................................... 158
7.5
SHORTHAND DATA ENTRY GRAPHICAL USER INTERFACES................................................ 159
7.6
7.6.1
Analysis of the general distribution of user fondness for the presented prototypes ... 166
7.6.2
Analysis of the distribution of user fondness for the presented prototypes in the case of
speed writing............................................................................................................................. 167
7.6.3
Analysis of the distribution of user fondness for the presented prototypes in the case of
a small amount of text entry into handheld devices .................................................................. 167
7.6.4
The comparison of the most favourite GUI of experienced shorthand writers and that
of novice shorthand writers ...................................................................................................... 168
7.7
DISCUSSION ..................................................................................................................... 169
8
CONCLUSION ....................................................................................................................... 171
CHAPTER 8
8.1
8.2
8.3
8.3.1
8.3.2
8.4
INTRODUCTION ............................................................................................... 172
RESEARCH WORK SUMMARY ............................................................................................ 172

CONTRIBUTION ................................................................................................................ 174
FUTURE WORK ................................................................................................................. 175
Improvement upon the overall system ........................................................................ 175
Application of the presented system to the real life problems .................................... 177
DISSEMINATION ............................................................................................................... 177
REFERENCES................................................................................................................................. 180
APPENDIX .......................................................................... ERROR! BOOKMARK NOT DEFINED.
vii
FIGURE 1.1: SCOPE OF THE THESIS........................................................................................... 6

FIGURE 1.2: A HIGH LEVEL VIEW OF THE SCOPE OF THE RECOGNITION ENGINE
AND THE TRANSCRIPTION ENGINE................................................................................... 9
FIGURE 2.1: ILLUSTRATION OF TEXT ENTRY USING SHARK SYSTEM (A) THE WORD
QUICK IS WRITTEN USING ATOMIK KEYBOARD LAYOUT (B) THE WORD
QUICK IS WRITTEN WITHOUT USING A TEMPLATE KEYBOARD ........................... 19
FIGURE 2.2: BASIC CONSONANTS OF PITMANS SHORTHAND AS ILLUSTRATED IN
[OJ95] ......................................................................................................................................... 21
FIGURE 2.3: /W/, /Y/, H/ CONSONANTS OF PITMANS SHORTHAND ..................................... 21
FIGURE 2.4: VOWELS, DIPHTHONGS AND DIPHONES OF PITMANS SHORTHAND.......... 21
FIGURE 2.5: ILLUSTRATION OF VOCALIZED OUTLINES ........................................................ 22
FIGURE 2.6: (A) BASIC NOTATIONS OF PITMANS SHORTHAND (B) THE WORD PLAY
IS WRITTEN PHONETICALLY USING BASIC NOTATIONS (C) THE WORD PLAY IS
WRITTEN USING A SPECIAL RULE OF PITMANS SHORTHAND ................................. 22
FIGURE 2.7: (A) SAMPLES OF SHORT FORMS (B) SAMPLES OF PHRASES........................... 23
FIGURE 2.8: A SAMPLE HMM MODEL FOR A SINGLE OUTLINE OF PITMANS
SHORTHAND. AT EACH STATE I, I PROBABILITY OF A PARTICULAR STROKE SI
TO BE TYPE TI IS OBSERVED. .............................................................................................. 27
FIGURE 2.9: AN INDIVIDUAL CELL A OF NEURAL NETWORK, MODELLED FOR THE
CLASSIFICATION OF HANDWRITTEN PITMANS SHORTHAND IN [LQ90] ................ 28
FIGURE 2.10: ILLUSTRATION OF STROKE DEPENDENCIES IN PITMANS SHORTHAND
(A) VOWEL DEPENDENCY (B) POSITIONAL DEPENDENCY OF THE FIRST
CONSONANT PRIMITIVE ...................................................................................................... 30
FIGURE 2.11: C IS CONDITIONALLY INDEPENDENT OF W GIVEN R .................................... 31
FIGURE 2.12: S IS CONDITIONALLY DEPENDENT ON R GIVEN AN OBSERVED

DATA, W ................................................................................................................................... 31
FIGURE 2.13: ILLUSTRATION OF THE BAYES BALL ALGORITHM [SR98]. IF THERE IS NO
FLOW OF A BALL FROM A TO B IN A GRAPH, A AND B ARE CONDITIONALLY
INDEPENDENT GIVEN A SET OF OBSERVED OR HIDDEN VARIABLES X AND VICE
VERSA. ...................................................................................................................................... 32
FIGURE 3.1: AN ABSTRACT VIEW OF THE WHOLE SYSTEM .............................................. 42
FIGURE 3.2: DETAILED VIEW OF A VOCALIZED OUTLINE INTERPRETER...................... 44
FIGURE 3.3: ILLUSTRATION OF SAMPLE WORDS IN NORMAL ENGLISH AND
PITMANS SHORTHAND ....................................................................................................... 45
Consonant neighbourhoods
F, V
P,B
T, D
TH, th
S, Z
vowel neighbourhood
at the beginning of an outline
in the middle of an outline
at the end of an outline
circle neighbourhood
close circles
unclose circles
hooks
FIGURE 3.4: SAMPLE NEIGHBOURHOODS PREDEFINED IN THE NEAREST

NEIGHBOURHOOD QUERY APPROACH ......................................................................... 47
FIGURE 3.5: SAMPLE OUTPUT PRODUCED BY THE NEAREST NEIGHBOURHOOD
QUERY...................................................................................................................................... 48
viii
FIGURE 3.6: SAMPLE OF PHONEME TRANSLATION OF A DOUBLE LENGTH STROKE
.................................................................................................................................................... 51
FIGURE 3.7: (A) SAMPLE INPUT OF PHONEME ORDERING PROCESS (B) SAMPLE
OUTPUT OF PHONEME ORDERING PROCESS ............................................................. 52
FIGURE 3.8: SAMPLE ELEMENT OF A PHONETIC LEXICON IN A HASH TABLE............. 54
FIGURE 3.9: SAMPLE COLLECTED OUTLINES........................................................................ 55
FIGURE 3.10: THE DISTRIBUTION OF HOMOPHONES IN DIFFERENT SIZED
PHONETIC LEXICONS .......................................................................................................... 55
FIGURE 3.11: ILLUSTRATION OF THE INCIDENCE OF PHONEME VARIATION DUE TO
CONFUSION BETWEEN A CIRCLE AND A HOOK .......................................................... 59
FIGURE 3.12: ILLUSTRATION OF THE INCIDENCE OF PHONEME VARIATION DUE TO
LENGTH CONFUSION ........................................................................................................... 60
FIGURE 4.1: AN ABSTRACT VIEW OF THE WHOLE SYSTEM .............................................. 63
FIGURE 4.2: ILLUSTRATION OF BAYESIAN NETWORK BASED WORD

TRANSCRIPTION ................................................................................................................... 64
FIGURE 4.4 ILLUSTRATION OF THREE PAIRS OF SIMILAR OUTLINES GROUPED IN AN
OUTLINE MODEL.................................................................................................................... 66
FIGURE 4.5 LIFE CYCLE OF OUTLINE MODELS ........................................................................ 67
FIGURE 4.6 ILLUSTRATION OF DIFFERENT CHRONOLOGICAL WRITING ORDER OF
NORMAL ENGLISH AND PITMANS SHORTHAND .......................................................... 68
FIGURE 4.7 ILLUSTRATION OF UNIQUE NODES OF AN OUTLINE MODEL ......................... 69
FIGURE 4.8 ILLUSTRATION OF STEP BY STEP CREATION OF OUTLINE MODELS ............ 71
FIGURE 4.9 SAMPLE TRAINING DATA FOR THE WORD BAKE PROCESSED BY THE
RECOGNITION ENGINE; THE ITALIC TEXT ON THE RIGHT EXPLAINS WHAT EACH
LINE OF DATA REPRESENTS ............................................................................................... 72
FIGURE 4.10 ILLUSTRATION OF CONDITIONAL DEPENDENCY OF VARIABLES IN AN
OUTLINE MODEL USING THE BAYES BALL ALGORITHM [SR98]. IF THERE IS NO
FLOW OF A BALL FROM A TO B IN A GRAPH, A AND B ARE CONDITIONALLY
INDEPENDENT GIVEN A SET OF OBSERVED OR HIDDEN VARIABLES X AND VICE
VERSA. ...................................................................................................................................... 74
FIGURE 4.11 ILLUSTRATION OF OUTLINE MODEL SELECTION STRATEGIES................... 86
ix
FIGURE 4.12: SAMPLES OF THE STROKE COMBINATION DATA SET .............................. 87
FIGURE 4.13: TWO DIFFERENT SHORTHAND OUTLINES FOR THE WORD AFTER; (A)
THE WORD AFTER IS WRITTEN ACCORDING TO THE DIRECT CONVERSION
OF PHONEMES INTO PRIMITIVES (B) THE WORD AFTER IS WRITTEN
ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMANS SHORTHAND ......... 88
FIGURE 4.14: SCREEN SHOT OF OUTLINES WRITTEN BY WRITER A ............................. 89
FIGURE 4.15: EVALUATION OF THE VOCALISED OUTLINE IDENTIFICATION OF THE
RECOGNITION ENGINE........................................................................................................ 90
FIGURE 4.16: EVALUATION OF THE SEGMENTATION ACCURACY OF THE
FIGURE 4.17: EVALUATION OF THE CLASSIFICATION ACCURACY OF THE
FIGURE 4.18: ILLUSTRATION OF A RELATIONSHIP BETWEEN RECOGNITION
ACCURACY AND TRANSCRIPTION ACCURACY OF THE SINGLE CONSONANT
DATA SET................................................................................................................................. 95
FIGURE 4.19: COMPARISON OF THE HANDWRITING OF TWO WRITERS....................... 96
FIGURE 4.20: ILLUSTRATION OF THE WORD TRANSCRIPTION ACCURACY OF THE
SINGLE CONSONANT DATA SET ...................................................................................... 96
FIGURE 4.21: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON
WITH THE CLASSIFICATION OR VOWEL ERRORS OF THE SINGLE CONSONANT
DATA SET................................................................................................................................. 98
FIGURE 4.22: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS
INFLUENCING THE ACCURACY OF A RESULT LIST (SINGLE CONSONANT DATA
SET) ........................................................................................................................................... 99
FIGURE 4.23: ILLUSTRATION OF THE RELATIONSHIP BETWEEN RECOGNITION
ACCURACY AND TRANSCRIPTION ACCURACY OF THE STROKE-COMBINATION
DATA SET............................................................................................................................... 100
FIGURE 4.24: ILLUSTRATION OF THE WORD TRANSCRIPTION ACCURACY OF THE
STROKE-COMBINATION DATA SET................................................................................ 101
WITH THE CLASSIFICATION/VOWEL ERRORS OF THE STROKE COMBINATION
DATA SET............................................................................................................................... 102
INFLUENCING THE ACCURACY OF A RESULT LIST (STROKE-COMBINATION
DATA SET) ............................................................................................................................. 103
FIGURE 4.27: RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND
TRANSCRIPTION ACCURACY OF THE SPECIAL-RULE DATA SET ........................ 104
FIGURE 4.28: EVALUATION OF THE WORD TRANSCRIPTION ACCURACY OF THE
SPECIAL-RULE DATA SET................................................................................................. 105
WITH CLASSIFICATION OR VOWEL ERRORS OF THE SPECIAL-RULE DATA SET
.................................................................................................................................................. 106
INFLUENCING THE ACCURACY OF A RESULT LIST (SPECIAL-RULE DATA SET)
.................................................................................................................................................. 107
FIGURE 5.1: (A) SAMPLE ENTRIES OF A CONVENTIONAL PITMANS SHORTHAND
DICTIONARY AVAILABLE IN BOOK FORMAT (B) SAMPLE ENTRIES OF AN
ELECTRONIC PITMANS SHORTHAND LEXICON ........................................................ 112
FIGURE 5.2: SAMPLE KEYS OF THE ELECTRONIC PITMANS SHORTHAND LEXICON;
VOWELS ARE UNDERLINED ............................................................................................. 115
FIGURE 5.3: SAMPLE ENTRIES OF THE ELECTRONIC PITMANS SHORTHAND
LEXICON................................................................................................................................. 116
FIGURE 5.4: ILLUSTRATION OF THE CONVERSION PROCEDURE ................................. 119
FIGURE 5.5: ILLUSTRATION OF THE USE OF A DOT PRIMITIVE FOR THE SOUND COM
AT THE BEGINNING OF A WORD .................................................................................... 123
FIGURE 5.6: ILLUSTRATION OF THE USE OF NEGATIVE PREFIX IR- IN A VOCALISED
OUTLINE................................................................................................................................. 124
FIGURE 5.7: ILLUSTRATION OF THE USE OF PL HOOK IN A VOCALISED OUTLINE .. 125
FIGURE 5.8: ILLUSTRATION OF A ONE SYLLABLE HALF-LENGTH OUTLINE ............... 126
x
FIGURE 5.9: ILLUSTRATION OF THE OMISSION OF THE SYLLABLE TER IN A
VOCALISED OUTLINE ......................................................................................................... 126
FIGURE 5.10: ILLUSTRATION OF INCOMPATIBLE PRIMITIVE PAIRS FOR DOUBLING
.................................................................................................................................................. 127
FIGURE 5.11: SAMPLE ENTRIES OF A MACHINE-READABLE PITMANS SHORTHAND
LEXICON................................................................................................................................. 128
FIGURE 5.12: AVERAGE ACCURACIES OF DIFFERENT SIZES OF MACHINEREADABLE PITMANS SHORTHAND LEXICONS.......................................................... 129
FIGURE 5.13: TWO DIFFERENT OUTLINES FOR THE WORD WEATHER; (A) THE
WORD WEATHER IS WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF
PITMANS SHORTHAND; (B) THE WORD WEATHER IS NOT WRITTEN
ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMANS SHORTHAND ....... 131
FIGURE 5.14: (A) SHORTHAND OUTLINE FOR THE WORD FACTOR; (B)
SHORTHAND OUTLINE FOR THE WORD FURTHER................................................ 131
FIGURE 5.15: TWO DIFFERENT SHORTHAND OUTLINES FOR THE WORD UNION . 132
FIGURE 5.16: TWO DIFFERENT OUTLINES FOR THE WORD LANDLORD .................. 132
FIGURE 5.17: TWO DIFFERENT OUTLINES FOR THE WORD ENVIRONMENT........... 133
FIGURE 5.18: THE DISTRIBUTION OF DIFFERENT CATEGORIES OF ERRORS IN
ELECTRONIC PITMANS SHORTHAND LEXICONS OF DIFFERENT SIZES ........... 133
FIGURE 5.19: THE DISTRIBUTION OF UNIQUENESS OF THE ELECTRONIC PITMANS
SHORTHAND LEXICONS ..................................................................................................... 134
FIGURE 6.1: SAMPLES OF PITMANS SHORTHAND OUTLINES WRITTEN IN THREE
DIFFERENT POSITIONS; (A) OUTLINES WRITTEN INCLUDING VOWEL
NOTATIONS, (B) OUTLINES WRITTEN WITHOUT VOWEL NOTATIONS ................ 140
FIGURE 6.2: ILLUSTRATION OF THE HANDWRITTEN PITMANS SHORTHAND PHRASE
LEVEL TRANSCRIPTION PROCESS................................................................................ 141
FIGURE 6.3: AN ABSTRACT VIEW OF THE OBJECT MODEL MICROSOFT.INK.......... 144
FIGURE 6.4: SCREEN SHOTS OF THE RECOGNITION RESULTS PRODUCED BY THE
RECOGNISERCONTEXT API .......................................................................................... 145
FIGURE 6.5: PERFORMANCE OF THE CONTEXTUAL REJECTION STRATEGY ........... 146
FIGURE 7.1: FRONT-END AND BACK-END ARCHITECTURE OF THE SYSTEM ............ 149
FIGURE 7.2: ILLUSTRATION OF INTERACTIONS BETWEEN USER INTERFACES AND
BACK-END ENGINES OF THE SYSTEM ......................................................................... 150
FIGURE 7.3 ILLUSTRATION OF THE TABLET PC PLATFORM APIS PRESENTED AT
[REF]........................................................................................................................................ 152
FIGURE 7.4: ILLUSTRATION OF THE HIGH LEVEL RELATIONSHIP OF OBJECT
MODELS OF THE TABLET PC PLATFORM APIS ........................................................... 154
FIGURE 7.5: HOME PAGE OF THE TRAINING DATA COLLECTOR ................................... 155
FIGURE 7.6: SAMPLE DATA ENTRY PAGE OF THE TRAINING DATA COLLECTOR GUI
.................................................................................................................................................. 156
FIGURE 7.7: SCREEN SHOT OF THE DEVELOPER GRAPHICAL INTERFACE .............. 158
FIGURE 7.8: THE FIRST VERSION OF THE COLLABORATORS TABLET PC INTERFACE
FOR THE HANDWRITTEN PITMANS SHORTHAND RECOGNITION SYSTEM...... 160
FIGURE 7.9: THE LATEST VERSION OF THE COLLABORATORS TABLET PC
INTERFACE FOR PITMANS SHORTHAND RECOGNITION SYSTEM...................... 161
FIGURE 7.10: SCREENSHOT OF A NOTE-PAD LAYOUT OF THE END-USER
INTERFACE OF THIS RESEARCH.................................................................................... 163
FIGURE 7.11: SCREENSHOT OF AN ALTERNATIVE LAYOUT OF THE END-USER
INTERFACE OF THIS RESEARCH.................................................................................... 164
FIGURE 7.12: THUMBNAILS OF THE FOUR GUIS EVALUATED IN THE EXPERIMENT 165
xi
FIGURE 7.13: THE GENERAL DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED
PROTOTYPES......................................................................................................................... 166
FIGURE 7.14: THE DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED
PROTOTYPES IN THE CASE OF SPEED WRITING ..................................................... 167
FIGURE 7.15: THE DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED
PROTOTYPES IN THE CASE OF A SMALL AMOUNT OF TEXT ENTRY INTO
HANDHELD DEVICES.......................................................................................................... 168
FIGURE 7.16: THE COMPARISON OF THE MOST FAVOURITE GUI OF EXPERIENCED
SHORTHAND WRITERS AND THAT OF NOVICE SHORTHAND WRITERS ............ 169
1. Linguistic Post Processing of Handwritten Pitmans Shorthand
Linguistic Post Processing of Handwritten Pitmans

Shorthand
Chapter 1 Introduction
Recently, there has been a dramatic growth in the use of handheld devices as powerful
appliances to collect and distribute information efficiently.
Examples are provided by
companies and organizations worldwide who are implementing mobile business solutions to
accelerate business cycles, increase productivity and reduce operating costs by the use of
mobile phones, tablet PCs, pocket PCs and Personal Digital Assistants (PDAs). Current
handheld computers are applicable to daily business procedures, however the ultimate
usefulness of these handheld devices depends on a solution to a serious bottleneck: textual
information needs to be entered as quickly and accurately as possible, similar to using a full
size keyboard. Computers continue to get smaller and thinner with the thinnest tablet PC
recently launched by NEC merely 1 cm thick and weighing less than 1Kg at the time of
writing. The transformation of a standard QWERTY keyboard into these compact devices
has not been so effective; miniature keyboards make text entry very slow at less than 10
words per minute (wpm) [Mt98].
This bottleneck has been a major concern for manufacturers of handheld devices and
decades of research and development have been invested in inventing a feasible means of
text entry into mobile devices, resulting in commercial systems with four main types of text
input methods: (a) on-screen keyboards, (b) handwriting recognition systems, (c) gesture
based text entry systems, and (d) speech recognition systems. The existing systems meet the
fundamental requirement of inputting text into handheld devices, but a solution to practical
rapid text input into handheld devices still remains to be done.
This dissertation presents work on the research, design, implementation and evaluation of
techniques that facilitate rapid text entry into a pen based computer, approximately at the
same rate as speech (i.e., more than 100 words per minute). It is based on Pitmans
Shorthand, which is a speed-writing mechanism widely practiced in the real time reporting
community.
This chapter gives an overview of the linguistic post processing system of a handwritten
Pitmans Shorthand recognizer. It mainly highlights the motivation and scope of the work.
It also outlines the general objectives of the thesis and draws attention to the authors
contribution to achieve each objective. A synopsis of the thesis that explains the structure of
the dissertation along with a brief summary for each chapter is given at the end of the
chapter.
1.1 Background
1.1.1 Collaboration
Research in this thesis has been carried out in close cooperation with Nanyang
Technological University (NTU) in Singapore to the extent that a team from NTU
contributed to the research and development of low level classification of handwritten ink
data, and a team from the University of Nottingham contributed to transliteration of
classified primitives into English words. The collaboration has been a great success with
several workshops carried out at NTU annually as well as with a series of co-authored
publications [HHL+04a], [HHL+04b], [HHL+04c], [YLH+04a], [YLH+04b], [HHL+05a],
[HHL+05b], [HHL+05c], [YLH+05a], [YLH+05b], [YLH+05c]. In addition, concurrent
development of the two engines (i.e., recognition and transcription engines) has not been
difficult, mainly due to the accessibility of the classified data of the recognition engine since
the start of the project. This is because the collaborator has already carried out extensive
research on the low level segmentation and classification of handwritten Pitmans Shorthand
outlines for over two decades and the collaborators contribution to this research is, in fact,
improving an existing recognition engine rather than developing a completely new one.
Previous work done by our collaborator can be referenced in [Lg84], [LD84], [LDB84],
[LDB85], [LD86], [LD87], [QL89], [LQ89], [Lg89], [Lg90], [LQ90], [QL91], [NL92],
[LQ92], [QL93]. The transcription engine and work described in this thesis is, however,
new.
1.1.2 Motivation
The major motive behind this research has been to investigate the linguistic post processing
of handwritten Pitmans Shorthand as a rapid means of text entry on handheld devices and
an evaluation of the overall performance via a tablet PC based demo system. This involves
data pre-processing, lexicon preparation, word level interpretation, phrase level
interpretation and the development of a Graphical User Interface (GUI). No earlier work
fully presents a handwritten Pitmans Shorthand recognizer for handheld devices with a
complete GUI.
One of the factors that holds the potential for the automatic recognition of handwritten
Pitmans Shorthand is due to the language itself being simple and fast to write. Pitmans
Shorthand records speech phonetically and comprises simple notations of 24 consonants, 12
vowels, and 4 diphthongs. It defines 90 of the most frequently used words as shortforms
(i.e., single simple pen strokes invented for speed improvement purposes) and these 90
shortforms account for over 37% of the most commonly used English words [Lg90].
Taking into consideration a simultaneous verbatim written transcription of speech, the

automatic recognition of handwritten Pitmans Shorthand should not be seen as an option,
but as a necessity for mobile rapid note takers. Regardless of the portability and efficiency
of handheld devices, todays mobile rapid note-takers (e.g, stenographers) still retain a
traditional way of writing shorthand with a paper notepad and a pencil as their tablet PCs or
Personal Digital Assistants (PDAs) are not productive enough to record speech in a real
time.
In addition, having a cooperative research network provides a firm foundation on which this
research can be based. The linguistic post processing of handwritten Pitmans Shorthand
can be taken as a further step of expanding what is already possible with a Pitmans
Shorthand classifier, as reported in the literature. The classifier supports the noise reduction,
and outlines segmentation and classification of pattern primitives into related categories. It is
a low level processing tool and its output is fed directly to the transcription engine.
Finally, hardware and technical viability played an important role in the successful
development of the whole research.
Compared to the time of the previous research,
handheld devices have become more easily accessible with a more powerful engine but at a
cheaper price. A number of mobile PC and tablet PC development tool kits have become
available and these factors have strengthened the feasibility of the research.
1.1.3 Scope
From a handwriting recognition perspective, this research relates to an online recognition1.
It includes a minimal study of the low level processing of handwritten scripts, with a deep
research into the transliteration of shorthand primitives into orthographic English words.
This incorporates theories and techniques of pattern recognition, natural language processing
and mobile PC applications. Figure 1.1 illustrates a high level view of the scope of the
thesis.
In an on-line recognition, an input is in the form of successive points of strokes collected in
time order; whereas in the off-line recognition, an input is in the form of a digital image of
handwritten word.
Syntactic
knowledge
Handwriting Online
Pattern
recognition handwriting
recognition
recognition
Lexical
semantic
knowledge
Statistical
language
model
Natural
language
processing
Tablet PC
applications
Pen based
PC
applications
Legend
Handheld
device
applications
Scope of the thesis
Figure 1.1: Scope of the thesis
Three areas have been investigated in the field of pattern recognition. The first one is
concerned with setting protocols to interrelate a linguistic post processor with a low level
classification engine. Without the successful integration of these two engines, work in this
thesis would not have been feasible. The second one consists of defining a network model
that not only best represents the natural ambiguity of handwritten Pitmans Shorthand, but
also produces promising output for a written word. The third area is focused on investigating
relevant word rejection strategies in which the interpretation cost has been taken into
account, mainly in terms of its search time and storage requirements.
In the field of natural language processing, a substantial amount of work has been done in
the construction of a shorthand lexicon that is used to support word level transcription. This
mainly includes the application of rule based algorithms to simulate instinctive knowledge
gained from learning Pitmans Shorthand and the creation of a shorthand dictionary based on
this knowledge. In addition, a survey on the impact of statistical language modelling in
handwriting recognition has been carried out in relation to phrase level transcription.
In the field of mobile PC applications, three types of end user interfaces have been
developed in this research; (1) a Training Data Collector, (2) an Advanced User Controller
and (3) a Final User Interface. By using the Training Data Collector, a vast amount of
training data can be collected effectively, and by using the Advanced User Controller, a
developer can have deep insight into the structure of the system, thereby enabling him/her to
make changes to the low level parameter settings. Similarly, by using the Final User
Interface, a user can have a front-end view of the system and can practice real time
shorthand input into handheld devices. The development of the interfaces includes the
application of pen based APIs, analysis of parameters of the transcription engine, collection
of training and testing data, and evaluation of the overall system performance.
1.2 Brief Overview

1.2.1 General Objectives and Contributions
The aim of this research is to propose and evaluate a set of techniques to significantly
improve the transcription accuracy of a handwritten Pitmans Shorthand recognizer and
deliver a commercially viable and functional prototype. In order to enable the reader to gain
a brief overview of this research, the following questions and answers are provided in which
the questions represent general objectives of the research and the answers highlight the
authors contribution to achieve the objectives.
A set of questions relating to the system integration and configuration:
How effectively has the Pitmans Shorthand linguistic post processor integrated with
the collaborators low level recognition engine under the condition of developing the
two engines in different countries?
The solution includes an extensive collaboration between the two teams including
the authors annual visits to the partners institution, setting protocols on the data
flow and modification of components between the two systems, concurrent
evaluation of the whole system on both sites, and co-authoring the publication of
progress reports.
What are the tasks of the recognition engine and the transcription engine in general?
A high level view of the tasks of the recognition and transcription engines are shown
in Figure 1.2. The white boxes at the top of Figure 1.2 represent processes of the
recognition engine and the shaded boxes represent tasks taken by the transcription
engine. The sample input outlines in Figure 1.2 illustrate the functions of the
recognition and transcription engines.
A set of questions relating to the linguistic post processing:
To what extent is the linguistic post processor based on the previous work?
The linguistic postprocessor is based on the recognition engine developed by our

research collaborator and the recognition engine has inherited a majority of the
pervious work reported in the literature. Apart from the recognition components, the
remaining transcription components are based on completely different approaches
rather than the ones reported in the literature. Reasons for using the new approaches
are discussed in detail in Chapter 3.
Tablet PC based graphical user interface
Data
collection
Segmentation
Classification
Word level
transcription
Phrase level
transcription
1st sample input outline

collected data
segmented data
classified data
transcribed word(s)
result word(s)
worn
warm
storm
x1, y1
x2, y2
x3, y3
x4, y4
.
.
worn
Segment 1
3 possible types of
Segment 1.
Coordinates of a
pen
2nd sample input outline
Collected data
segmented data
classified data
transcribed word(s)
Sudden
Welcome
Seldom
x1, y1
x2, y2
x3, y3
x4, y4
.
.
result word(s)
Warm
Segment 1
Coordinate of a
pen
3 possible types of
Segment 1.
Legend
Process included in our collaborators
recognition engine
Process included in the authors
transcription engine
Data flow
Figure 1.2: A high level view of the scope of the recognition engine and the
transcription engine
What new approaches are there in this linguistic post processing research?
The Significant new approaches of this thesis are:
Welcome
(a) An electronic version of a shorthand dictionary has been successfully created by

using rule based algorithms. A similar kind of Pitmans Shorthand dictionary in eformat was never present in the past.
(b) Bayesian Network based outline models have been proposed with the aid of an
electronic shorthand dictionary and training data. These outline models well
represent the distribution of the natural parameters of handwritten Pitmans
Shorthand and increase word transcription accuracy.
(c)
A complete framework for the online recognition of handwritten Pitmans
Shorthand is reported in this thesis. Whereas, most of the work in the literature
emphasized only an initial segmentation and classification of shorthand primitives.
(d) The first tablet PC based demo system has been produced. This allows a future
researcher to have deep insight into the performance of the recognition and
transcription engines via functional interfaces. It also enables an end user to input
shorthand into a handheld device.
A set of questions relating to the development of a mobile PC application:
For what types of handheld devices is the system intended to be applicable?
The system is intended to be applicable for any pen based mobile device in which
the use of a traditional QWERTY keyboard is impractical. Experiments and
evaluations of this thesis are based on tablet PCs with Microsoft Windows XP
Tablet PC Edition 2005.
For what kind of domain/scenario is the application aimed to be applicable?
The application is aimed to be applicable:
10
(a) as a rapid text input system on handheld devices

e.g., typing a text message on a mobile phone, inputting rich text information into
PDA.
(b) as a useful tool for stenographers in a real time verbatim written transcription of
the spoken word
e.g., taking a memo in a meeting, taking verbatim legal records of speeches in a
court, providing real time subtitling services for the deaf and hard of hearing
community.
(c) as a real time language translation tool.
With additional configuration, the system can be applicable as a real time language
transliteration tool for an international traveller. For example, using shorthand a
person can write his/her question with English phonetics and immediately the
question is translated into, say, Japanese. This can resolve language barriers for
international travellers enabling them access to essential information. The language
translation feature needs an additional installation and configuration of third party
software and it is not included in the scope of this thesis.
A set of questions relating to training and testing of the overall system:
What kind of people are involved in the training and testing of the overall system?
In order to evaluate a realistic performance of the whole system, the training and
testing involve writers with different levels of skills in Pitmans Shorthand, different
genders and ages.
How is the whole system performance evaluated?
The overall system performance is evaluated under different criteria such as

unconstrained writing, independent users, different speeds of writing and different
11
levels of tidiness. The evaluation process also involves a list of practical concerns
such as usability, learning curve, popularity/commercial viability of the system.
1.3 Synopsis of the Dissertation

The research in this thesis combines theories and techniques from the fields of pattern
recognition, natural language processing and mobile PC application. It aims for a
commercially viable and functional prototype with a set of techniques that significantly
improve the transcription accuracy of a handwritten Pitmans Shorthand recognizer.
This chapter (Chapter 1) presents the motivation, scope, and background of the research. It
introduces the three main problem areas relating to the themes of the thesis, major objectives
and contributions.
Chapter 2 reviews key concepts in the areas of Pitmans Shorthand recognition, pattern
recognition and natural language processing. The focus in Pitmans Shorthand recognition is
on the evaluation of existing text entry methods into handheld devices, the study of Pitmans
Shorthand, and the review of existing approaches applied to the automatic recognition of
handwritten Pitmans Shorthand problems. The focus in pattern recognition is on the
analysis of the capabilities of commonly used graphical models to resolve natural
ambiguities of handwriting. Finally, the focus in natural language processing is on the
review of the Viterbi algorithm and statistical language modelling techniques used to
enhance the solution to the phrase level transcription problem.
Chapter 3 reports on a prototype that implements the architecture designs described in the
literature.
In particular, it expounds the phonetic based transcription of handwritten
Pitmans Shorthand outlines and presents the problems that need resolving. This chapter
12
primarily discusses whether the conventional phonetic based transliteration method is

efficient for the purpose of the thesis.
Chapter 4 presents the main architecture and design of a novel primitive based transcription
approach. Ambiguities of handwritten Pitmans Shorthand, in particular, stroke variations
and vowel omissions are resolved by introducing Bayesian Network based shorthand outline
models. The word interpretation includes outline models creation, belief propagation,
Bayesian Network based learning and model selection. The conceptual solution is shown to
improve the solution to the word level transcription problem.
Chapter 5 focuses on the generation of a novel machine-readable Pitmans Shorthand

lexicon, which is repeatedly applied to the primitive based transcription of handwritten
Pitmans Shorthand. The rule based conversion of a phonetic representation of a word into a
Pitmans Shorthand representation is presented. This involves the creation of an electronic
Pitmans Shorthand lexicon with the application of the writing rules of Pitmans Shorthand,
plus the evaluation of the proposed methods with different sizes of lexicons.
Chapter 6 proposes a Viterbi algorithm based framework to resolve the Pitmans Shorthand
specific phrase level transcription problem. The framework comprises Pitmans Shorthand
related contextual knowledge. Experimental results demonstrate the practical benefits of the
proposed framework.
Chapter 7 documents the roles of the graphical user interfaces of this research that are
designed for the developers authoring environment, the experimental users authoring
environment, and end-users authoring environment. Experimental results substantiate the
affirmative feasibility of the proposed interfaces.
13
This thesis supports the argument that the development of an automatic handwritten
Pitmans Shorthand interpreter is feasible and useful. Chapter 8 highlights the argument by
reviewing the dissertations key points, linking the results to the general objectives,
highlighting the contributions and presenting the perspective future work.
14
2. Background to the Automatic Recognition of Handwritten Pitmans Shorthand
Background
to
the
Automatic
Handwritten Pitmans Shorthand
15
Recognition
of
This chapter provides background information on the computer aided recognition and
interpretation of handwritten Pitmans Shorthand. It comprises of seven sections, outlined
as:
-
Evaluation of existing text input systems for handheld devices.
A brief overview of Pitmans Shorthand.
An overview of the automatic recognition of handwritten Pitmans Shorthand.
Handwriting recognition algorithms to improve word level transliteration.
Natural language processing algorithms to improve phrase level transliteration.
Pen interface application interfaces.
Summary.
2.1 Evaluation of Existing Text Input Systems for Handheld

Devices
After a brief review of text entry into current handheld devices, capabilities of available text
entry methods are evaluated, particularly, in comparison with those of a handwritten
Pitmans Shorthand recognizer in this section. Methods evaluated include:
-
On screen keyboard.
Cursive handwriting recognition system.
Gesture recognition system.
Speech recognition system.
16
2.1.1 On-screen Keyboards vs. a Handwritten Pitmans Shorthand

Recognizer
An on-screen keyboard is a virtual keyboard displayed on the flat display panel of a device
where text is entered by tapping a stylus on it serially, for instance, IBMs Touchboard for
Windows. The method provides an adequate means of interaction with computers; however,
it requires constant visual attention since keys are not physically sensitive to fingers. From
the aspect of human computer interaction, the use of handwritten data entry has been shown
to be more natural for entering text into handheld devices [Win05]. However, practical use
for one system over another still relies on the purpose of use and/or individual user
preference. If a user prefers a handwritten recognizer to an on-screen keyboard in general, a
commercially viable handwritten Pitmans Shorthand recognizer may be of great interest to
the user.
2.1.2 A Cursive Handwriting Recognizer vs. a Handwritten Pitmans

Shorthand Recognizer
A cursive handwriting recognition engine built in Microsoft Tablet PC Edition 2005
[Win05] is a well known handwriting recognizer at the time of writing. It is capable of
interpreting cursive script; however, efficiency of the system is restricted by the limited
speed of normal cursive writing (i.e., less than 40 words per minute). Using handwritten
data not in the form of normal longhand is likely to be a solution to overcome the slow input
problem. According to [Lg90], a Pitmans Shorthand writing system is an alternative to a
longhand writing system that can be practiced nearly at the same rate as speech (i.e., about
120- 180 words per minute (wmp)).
On the whole, Pitmans Shorthand presents a number of strengths that facilitate very rapid
writing, but it also presents a drawback in that it has a long learning curve, which includes
17
memorizing new phonetic symbols and pronouncing words based on a number of rules.
Having said that, there are millions of Pitmans Shorthand writers who have received
training in its use [Lg90] and most of them remark that it is worth learning although there is
a little frustration at the time of learning. Therefore, the automatic recognition of
handwritten Pitmans Shorthand is intended to be beneficial to a particular group of
stenographers, plus some interested users who are dedicated to achieve fast data entry using
handheld devices.
2.1.3 Gesture Based Text Entry Systems vs. a Handwritten Pitmans

In general, gesture based text entry systems provide a virtual keyboard that enables users to
make input gestures. A well known gesture based text entry system at the time of writing is
SHARK [ZK03], developed by IBM. It allows a user to input gestures with the aid of a
virtual, template keyboard and gradually trains the user to be capable of inputting gestures
without using the keyboard. In this way, SHARK eliminates the constant virtual attention
required for a virtual keyboard and produces fast data input. In order to enable the reader to
get a clear view of a gesture based text entry system, word entries into the SHARK system is
shown in Figure 2.1. Figure 2.1 (a) illustrates input for the word quick using a virtual
keyboard, and Figure 2.1 (b) illustrates input for the same word quick without using the
virtual keyboard.
On the whole, gesture based text entry systems facilitate a faster data input compared to
normal cursive handwriting recognizers; however, memorizing gestures of a substantial
number of words results in a very steep learning curve.
In general, Pitmans Shorthand recognition is similar to gesture recognition since both
interpret a series of lines into words and produce a fast data input. However, there is no
18
need to memorize every gesture of a word using a handwritten Pitmans Shorthand

recognizer since the construction of Pitmans outlines is based on a set of phonetic rules.
Starting
point
Starting
point
(a)
(b)
Figure 2.1: Illustration of text entry using SHARK system (a) The word quick is
written using ATOMIK keyboard layout (b) The word quick is written without using a
template keyboard
2.1.4 Speech
Recognition
Systems
vs.
Handwritten
Pitmans
In terms of efficiency and operational cost, speech recognition systems seem the most
outstanding compared to other data input methods because users can speak naturally as well
as rapidly (around 100 -120 words per minute) using speech recognition systems. An
example is given in the real time subtitling of TV programs, where speech is automatically
dictated into text and manual retranslation cost is reduced. A primary negative aspect of
speech recognition systems is that data must be spoken. On some occasions, it is not always
feasible to input data via voice, for instance, an automatic transcription of a noisy debate
using a speech recognition system is considerably difficult unless it is feasible to encourage
speakers to use microphones.
Therefore, this research proposes that it is reasonable to
develop a system that facilitates an alternative means to record speech without using speech
input.
19
2.2 Pitmans Shorthand: a Brief Overview

Pitmans Shorthand was first presented by Sir Issac Pitman in 1837 and it has two forms New Era and Pitman 2000. Research in this thesis is based on the latter one as Pitman 2000
is a modified version of New Era and offers more accurate transcription as well as a faster
learning curve.
Words are written as they are pronounced with the main feature of Pitmans Shorthand being
the simplicity of its notations. There are 24 consonants, 12 vowels and 4 dipthongs in
Pitmans Shorthand. A skeleton of a shorthand outline is formed by a combination of
consonant strokes, and the writing of vowels is optional. This means it is essential to write
the consonant strokes of a word, but vowel notations can be omitted when the writing needs
to be fast. There is no standard rule defined for the omission of vowels it varies widely
depending on a writers experience or an individuals inclination.
Due to the phonetic based formation of words, Pitmans Shorthand is easily adaptable to
multiple languages (15 languages to date). It is practiced as a speech-recording medium in
the real time reporting community at a practical rate of about 120-180 words per
minute[Lg90 ]. It is widely used in offices in the UK and is also taught in 74 other
countries. [Lg90]
Figure 2.2 illustrates 21 out of 24 basic Pitmans consonants with three diagrams, which can
be easily remembered. In order to understand notations of the diagrams, consider the leftmost stroke in Figure 2.2 (a). This stroke indicates that notations for phonemes /P/ and /B/
are
and
respectively. Note that the two notations are the same down-stroke with
different line thicknesses. Similarly, according to Figure 2.2 (b), notations of phonemes F
and V are
and
respectively.
20

M
T, D
P, B
P, B
S, Z
th, TH
K, G
F, V
SH, ZH
N, NG
(a)
(b)
(c)
Figure 2.2: Basic Consonants of Pitmans Shorthand as illustrated in [Oj95]
In addition to the 21 consonants in Figure 2.2, there are three additional consonants of
Pitmans Shorthand, which are /W/, /Y/ and /H/. These consonants are formed using hooks
and upstrokes as shown in Figure 2.3. Vowels and diphthongs are simple pen strokes and
are also illustrated in Figure 2.4.
Figure 2.3: /W/, /Y/, H/ consonants of Pitmans Shorthand
Vowel notations
Diphthong notations
Diphones notation
Figure 2.4: Vowels, diphthongs and diphones of Pitmans Shorthand
Words are constructed with consonant and vowel notations in Pitmans Shorthand and a
script containing both consonants and vowels is called a vocalized outline.
Samples of
vocalized outlines, including notations of vowels, diphones and diphthongs are illustrated in
Figure 2.5 .
21

vowel
diphones
diphthong
vowel
bait
go
radio
time
Figure 2.5: Illustration of vocalized outlines
By using basic notations, illustrated in Figure 2.2, Figure 2.3, Figure 2.4, a person can start
writing a shorthand outline that is phonetically correct, but not in complete accordance with
the special rules of Pitmans Shorthand. The special rules include 20 definitions, invented
for speed enhancement purposes and need to be remembered thoroughly if a person wants to
be a professional Pitmans Shorthand writer. Details about the special rules of Pitmans
Shorthand can be referenced in [Oj95] and one of the special rules is given as an example
here. In the example (Figure 2.6), the word play, comprising of three phonemes (/P/, /L/
and //) can be written phonetically using the basic notations of Pitmans Shorthand as
shown in Figure 2.6 (b). However, one of the special rules of Pitman shorthand is read: - if
a phoneme /P/ is followed by a phoneme /L/, the notation for /L/ is transformed into a small
hook and the small hook is attached to the beginning of /P/ stroke. Therefore, the word
play should be written in the form of Figure 2.6 (c) rather than in Figure 2.6 (b) although
the form in (b) is phonetically correct.
Phonemes
P
Basic notations
Outline for the word

play
Outline for the word

play
(a)
(b)
(c)
Figure 2.6: (a) Basic notations of Pitmans Shorthand (b) The word play is written
phonetically using basic notations (c) The word play is written using a special rule
of Pitmans Shorthand
22
In addition to vocalized outlines, other important components of Pitmans Shorthand are

short forms and phrases. In general, short forms and phrases account for over 40% of the
most commonly used English words [Lg90] and are key attributes for facilitating the
outstanding speed of Pitmans Shorthand. Examples of short forms and phrases are depicted
in Figure 2.7.
a/an
all
and
as/has
do
eye/I
have
(a)
you should not be
your company
I am not
(b)
Figure 2.7: (a) Samples of short forms (b) Samples of phrases
2.3 Automatic Recognition of Handwritten Pitmans Shorthand: an

Overview
The first investigation into the feasibility of using handwritten Pitmans Shorthand as a
means of verbatim transcription as an aid for the deaf was reported by Brooks and Newell
[BN81], [Bc85], [BN85].
Concurrent work [Lg84] investigated this idea in more detail and further work [LD86]
evaluated the enormous potential of the online recognition of handwritten Pitmans
Shorthand for the real time recording of speech (e.g., verbatim reporting of meetings and
court proceedings). In this approach, four main studies were carried out: (a) detection of a
consonant boundary in a whole outline, (b) classification of segmented consonant strokes,
(c) evaluation of a normal-length stroke confusing with half-length/ double-length strokes,
and (d) evaluation of various inclinations of horizontal and vertical strokes. In addition,
23
different classification algorithms were used to classify vocalized outlines and short-forms in
this approach. The best classification rate reported at the time was 14.5%.
In the early 1990s, extensive research was carried out to improve the recognition of
vocalised outlines and short-forms.
Leedham [Lg90] reported that using contextual
knowledge was the most feasible means to improve the recognition of short-forms, in which
the recognition was based on a template-matching algorithm.
In this approach, the
transliteration was carried out by firstly sorting classified pattern primitives into correct
linguistic order; secondly converting primitives into phonemes using a set of production
rules and finally converting phonemes into orthographic English words. The concept of a
machinography, that is, how to modify the original Pitmans notations to be ideally suited
for machine recognition was also addressed in this work.
In later work [LQ90], basic notations of Pitmans Shorthand were categorized into 89 basic
features and incorporated into a neural network.
Their approach could correct initial
classification errors and achieved a classification rate of 94.5%.
Concurrently, Leedham and Qiao [LQ90] carried out another experiment to evaluate the
classification performance using a fuzzy classifier. In this approach, the classification (90%
correctness) was achieved by interacting between segmentation and classification processes.
Initial classification errors were also corrected using knowledge of legal primitive-pairs in
this approach.
In 1993, Qiao and Leedham [QL93] took another innovative approach to classify segmented
primitives. Their new method allowed communication between bottom up processes (i.e.,
segmentation based classification) and top down processes (i.e., holistic classification) via an
interactive heuristic (IH) search schema. They reported that locating a boundary between
24
features without first recognizing a whole outline was difficult. The performance of their
work was 84% correct segmentation and 58% correct classification.
In the early 2000s, another research group [NB02] [KSN+03] [SKN+04] [KSN04] started
investigating off-line automatic recognition of handwritten Pitmans Shorthand. This group
concentrated more on the linguistic post-processing of classified primitives into orthographic
English words. Similar to Leedhams approach, phonetic based transcription using the same
concept of vowel ordering was implemented. Incidence of homophones (outlines which are
written the same but have different representations) was addressed in their work and the
filtering of homophones using domain based rejection strategies and context based rejection
strategies was investigated. They mentioned that an ordinary phonetic dictionary was not
adequate for generating text, and a modified dictionary, particularly designed for the
recognition of Pitmans Shorthand was necessary. On the whole, a major limitation of their
work was an impractical assumption about homophones i.e., an occurrence of only two
homophones for each word was considered.
In summary, work carried out by pervious research mainly emphasized the low level
segmentation and classification of shorthand primitives with little work reported on the
backend transliteration. This thesis proposes that further extensive research is required to
improve word level transcription as well as phrase level transcription. In order to achieve
this goal, first it is necessary to make a thorough evaluation of recent popular handwriting
recognition algorithms and natural language processing algorithms.
25
2.4 Handwriting Recognition Algorithms to Improve a Word Level

Transliteration
Work in this thesis considers the fundamental problem of interpreting shorthand strokes of
digital ink as text.
Here, features extracted from a shorthand outline already give a
reasonable separation of strokes and provide the related identity of the strokes.
It is
necessary to take into account the context of strokes to achieve a promising interpretation;
however, dealing with spatial context can easily become computationally intensive [BSH04].
For optimum text interpretation, it is practical to take a balanced consideration between
context and the low level ink information of strokes.
In the field of handwriting recognition, a common approach to handle variables (e.g., context
and observed ink information) is by embedding them into a probabilistic model and
discriminating between them based on resultant probabilities. Graphical models are
considered here. Graphical models are a marriage between probability theory and graph
theory [Jm99]. They consist of two kinds: undirected and directed models. Undirected
models have simple definitions of independence, whereas directed models have a more
complicated notion of independence [Mk98]. There is a huge uncertainty and complexity in
the word recognition of handwritten shorthand and directed models are more relevant to
represent features of shorthand as well as interdependencies between them. Popular directed
graphical models are: Hidden Markov Models (HMMs), Neural Networks and Bayesian
Networks. In general, these models belong to the same family examples are provided that
HMM is a kind of dynamic Bayesian Network; a Neural Network is a kind of input/output
HMM. The primary difference between them is the way variables are structured (i.e.,
topology) and the way interdependencies between variables are handled.
26
2.4.1 Hidden Markov Models (HMMs)

Hidden Markov Models represent hidden and observed states in terms of state variables,
which can have complex interdependencies [Mk98]. One of the tutorials on HMM [Rl89]
presents that The Hidden Markov Model is a finite set of states, each of which is associated
with a (generally multidimensional) probability distribution. Transitions among the states are
governed by a set of probabilities called transition probabilities.
A sample HMM model, representing an outline of Pitmans Shorthand is illustrated in

Figure 2.8. In the figure, an outline is divided into several slices - each slice represents a
segmented classified primitive, containing one discrete hidden node and one discrete
observed node.
Hidden node
S1
S2
S3
Si
Observed node
T1
T2
T3
Ti
Figure 2.8: A sample HMM model for a single outline of Pitmans Shorthand. At each
state i, i probability of a particular stroke Si to be type Ti is observed.
There are several kinds of HMMs depending on network topology: HMMs with a mixture of
Gaussian output, input-output HMMs and factorial HMMs. Details of these algorithms can
be found in the literature [Mk01], [Rl89].
In the field of pattern recognition many systems have applied HMMs examples include the
representation of utterances as HMMs for speech recognition [Sa04], [MS04]; the
representation of facial images (combinations of hair, forehead, eyes, nose and mouth) as
HMMs for face recognition [HSS02], [KKL03]; the representation of words as HMM for
handwriting recognition [GB04], [HLB00]; the representation of human motion as HMMs
for gesture recognition [CFH03], [KP01]; the representation of pen-gestures (e.g., writing
pressure and smoothness of a line) as HMMs for signature recognition [JBS05], [YWP95].
27
Generally, HMMs work extremely well for certain types of applications; however, the
Markov assumption itself, i.e., that the probability of being in a given state at time t only
depends on the state at time t-1 is not always appropriate for certain kinds of problems where
dependencies extend from other states [Rl89].
2.4.2 Neural Networks

Neural Networks are based on the structure of the brain and are designed to mimic a
biological counterpart of neurons [Ri93].
They consist of neurons (i.e., variables),
connected via weighted links where the weight specifies the strength of a particular
connection between one node to another. The use of Neural Networks has been
demonstrated in several pattern recognition applications [Ri93]. Similar to HMMs, Neural
Networks have been devised in different types including single-layer linear Networks,
threshold Networks, multilayer Networks and multilayer Networks with learning.
Input from the classifier
89 links form the

previous layer
89 links to the
following layer
bias
Figure 2.9: An individual cell a of Neural Network, modelled for the classification of
handwritten Pitmans Shorthand in [LQ90]
A Multilayer Neural Network designed for the transcription of handwritten Pitmans

Shorthand presented in previous research [LQ90] is illustrated in Figure 2.9.
In that
network, there are 20 layers and each layer (i.e., each segment) consists of 89 nodes,
representing the 89 basic Pitmans primitives. Only one node from each layer is capable of
activating the next layer and the activation is based on the competition among the nodes. A
major drawback of this model is an unnecessary consideration of a wide range of primitives
28
in each layer. In fact, by using the context of an outline and a shorthand dictionary, the
number of nodes required for each layer can be normalised.
2.4.3 Bayesian Networks

Word level transcription in this thesis mainly applies Bayesian Network architecture. A
Bayesian Network [Pj88] is a directed acyclic graph in which each node represents a
mutually exclusive and collectively exhaustive set of random variables, and links between
the nodes signify probabilistic dependencies between them. It has been a remarkable tool in
the domain of handwriting recognition for its outstanding ability to model natural ambiguity
(e.g., to model complex stroke relationships). In on-line handwriting recognition, stroke
relationships are usually robust against geometric variation and important for discriminating
characters of similar shapes [CK04].
In Pitmans Shorthand, stroke relationships mean occurrences of vowel notations and their
positions in a vocalized outline, and starting positions of the first consonant stroke i.e.,
whether it is written above, on or below a base line. An example of vowel dependency and
an example of positional dependency of the first consonant stroke are illustrated in Figure
2.10 (a) and (b) respectively. As shown in Figure 2.10 (a), a dot vowel written at two
different locations (i.e., beginning and end of a stroke) represent two different words in
Pitmans Shorthand. Similarly, two exact outlines written at two different starting positions
(i.e., above and below a base line) represent two different words in Figure 2.10 (b).
29
dot vowel at the end of a stroke

dot vowel in the middle of a stroke
aid
eat
(a)
The first consonant B (i.e.,
written on the base line
The first consonant B (i.e.,

) is
written above the base line
is
base line
bath
bathe
(b)
Figure 2.10: Illustration of stroke dependencies in Pitmans Shorthand (a) vowel

dependency (b) positional dependency of the first consonant primitive
A summary of Bayesian Networks is described in this chapter and implementation of the

network in association with problems of the word level transcription of handwritten
Pitmans Shorthand is discussed in detail in Chapter 4. In general, the basic concepts of
Bayesian Networks are discussed under the following sections:
Conditional independence.
Inference.
Learning.
2.4.3.1 Conditional Independence

In a Bayesian Network, an edge between nodes defines a dependency between variables. For
example, consider the event grass is wet and possible causes cloudy and rain. If
cloudy (C) becomes independent of wet grass (W) having an observed data, rain (R),
conditional independence between cloudy (C) and wet grass (W) can be indicated using a
series of arrows as shown in Figure 2.11.
30
Cloudy
Rain
Wet grass
Figure 2.11: C is conditionally independent of W given R
Another type of dependency in Bayesian Network is explaining away [WH93], in which

each variable is competing to explain observed data. For example, consider the event
grass is wet, and possible causes rain and water sprinkler. Figure 2.12, in contrast to
the case in Figure 2.11, illustrates that two independent nodes, Sprinkler (S) and rain (R),
become conditionally dependent when having the observed data, wet grass (W). The
converging arrows towards wet grass (W) in Figure 2.12 indicate that if the grass is wet
when it is raining, the probability of a sprinkler being on becomes automatically less
possible and vice versa.
Sprinkler
Rain
CPT of W node
W
Wet grass
S R
P(W= T)
P(W = F)
T
T
F
F
0.98
0.95
0.0
0.94
0.02
0.05
1.0
0.06
T
F
F
T
Figure 2.12: S is conditionally dependent on R given an observed data, W
Therefore, the state of a node being observed or hidden in a Bayesian Network has a huge
influence on the conditional dependency between variables. By using the Bayes ball
algorithm [Sr98], conditional independence between variables can be easily determined with
the information on a node being hidden or observed. The Bayes ball algorithm is illustrated
in Figure 2.13.
31
Legend
Hidden node
Direction of the flow of a ball
Observed node
Indication of parent-child relationship
Figure 2.13: Illustration of the Bayes Ball algorithm [Sr98]. If there is no flow of a ball
from A to B in a graph, A and B are conditionally independent given a set of observed
or hidden variables X and vice versa.
In addition, every node in the Bayesian Network needs to be specified with a Conditional
Probability Distribution (CPD) and a table holding these distribution values is called a
Conditional Probability Table (CPT). A sample CPT of a W node is shown in Figure
2.12. The table indicates the likelihood of grass getting wet with regard to whether a
sprinkler is on and/or whether it has rained.
2.4.3.2
Inference
Inference of Bayesian Network involves computing the probability distribution of a node

given the values of some other nodes. In other words, it is the process of finding the
likelihood of an explanation given evidence and priori probabilities. One of the reasons
32
Bayesian Networks are useful is because they permit a more efficient inference procedure
[Ja99]. Inference can be categorized into two types: exact and approximate.
Exact inference procedures are useful when a network structure is not too complex;
however, approximate inference procedures work better in practice when a model becomes
computationally complicated such as models with a repetitive structure or large clusters.
Examples of exact inference algorithms include a local message passing algorithm [Pj88],
[PS91] and a junction tree algorithm [HD96], [CDL+99]. Popular approximate inference
methods include Monte Carlo sampling methods [MD98], variational techniques [SJJ96],
[JGJ+98], [JJ98], and loopy belief propagation [WF99], [Wy00], [FW00].
2.4.3.3
Learning
Learning in Bayesian Networks often refers to learning parameters of a network as well as

the structure of the network.
In brief, parameter learning is an estimation of a conditional probability table (CPT) of each

node in a network based on a number of training samples. Here, the learning methods vary
widely depending on attributes of training samples i.e., whether they are (fully or partially)
observed, or whether they are (fully or partially) hidden. In general, there are three common
types of parameter learning methods Maximum Likelihood (ML), Maximum a Posterior
(MAP) and Expectation Maximization (EM).
In ML learning, the goal is to find the maximum likelihood of training data given N cases,
which are assumed to be independent. Assume that D = (D1, , DM) is a training data set
which contains M cases, the maximum (optimal) likelihood of each node can be denoted
as
33
arg max P( D | )
(2.1)
In MAP learning, Maximum a Posterior (MAP) estimation assumes the existence of a prior
p() over the parameters [Ja99]. It prevents the case of zero probability if a particular
parameter is never seen in the training samples by the use of a Dirichlet prior. The chance of
having zero probability in MAP is because the algorithm is based on counting. According
to the wet grass example in Figure 2.12, the MAP of the wet grass node including Dirichlet
prior can be denoted as:
PMAP (W w | S s, R r )
N (W w, S s, R r )
N ( S s, R r )
(2.2)
where N() is the number of times the corresponding parameters are found to be true or
false and , are uniform Dirichlet priors, used when a particular parameter is not seen in a
training set. In general, MAP is used if there are a small number of training cases compared
to the number of parameters [Mk01], however it is still important that the counts are based
on sufficient statistics to achieve an optimal estimation.
Expectation Maximization (EM) is mainly used when variables are partially observable i.e.,
the network contains some hidden nodes. It computes expected values of all the nodes after
(M step) training by using an inference algorithm, and then treats these expected values as
though they were observed (in E step) [Mk01]. Using EM, it is important to know the
structure of the model in advanced as this is the key to identifying any hidden nodes. In the
case of the wet grass example in Figure 2.12, the EM of a W node can be denoted as
PEM (W w | S s, R r )
E (W w, S s, R r )
E ( S s, R r )
(2.3)
where E() is the number of times corresponding parameters are expected to occur.
According to Murphy [Mk01], E() is computed as follows
34
E ( e) I ( e | Dm ) P ( e | Dm )
m
(2.4)
where I(e|Dm) is an indicator function which is 1 if a parameter e occurs in training case m,

and 0 otherwise.
2.5 Natural Language Processing Algorithms for Handwritten

Phrase Recognition
This section presents natural language processing algorithms relating to the field of
handwriting recognition. In particular, it focuses on the role of statistical language modelling
algorithms in handwritten sentence recognition systems.
2.5.1
Statistical Language Modelling
[MS99] stated that the major purpose behind statistical language modelling is to capture a
languages regularities via statistical inference on its corpus.
According to the literature
[QAC05], the concept of applying statistical language models to automatic text transcription
was initiated by speech recognition research.
The concept was then later adapted to
handwriting recognition problems, resulting in several handwriting recognition engines built

with statistical language modelling techniques.
For instance, recent work [PVM+03],
[Ms01], [QAC05] and [MB01], [ZB04], [VBB04] applied statistical language modelling
techniques to resolve the problems of online and offline handwritten sentence recognition,
respectively, and the work [QAC05] achieved up to 90.4% word recognition accuracy.
In general, the most commonly used statistical language models in the field of handwriting
recognition are n-gram models, which are denoted as follows by [QAC05]:
35

n
p (W ) p ( wi | wii1n 1 )
(2.5)
i 1
where p(W) is the probability of a word sequence given by a statistical language model, and
p ( wi | wii1n 1 ) is the frequency of the sequence wii1n1 occurring in a corpus.
By applying a statistical language model, [QAC05] proposed a solution to online

handwritten sentence recognition as follows:
= arg max p ( S | W ) p (W )
(2.6)
where is the most likely word sequence for a written sentence (out of the candidate
sequences W), S is a given handwritten sentence to recognise, P(S|W) is the posterior
probability of the written sentence S given a sequence W, and p(W) is the statistical language
models probability for the sequence W. This work identifies the most likely word sequence
for a written sentence by finding the best path in a word graph (i.e., a graphical model of a
sentences candidate words) using a Viterbi search algorithm [QAC05].
2.5.2
Viterbi Algorithm
The Viterbi algorithm provides an efficient way of finding the most likely state sequence in
the Maximum a Posterior (MAP) probability sense of a process, which is assumed to be a
finite-state discrete-time Markov process [Ml00]. Here, a finite state means that the number
of states in the model is limited, discrete-time means that it takes the same unit of time to get
from any state to its adjacent state in the model, and the Markov process means that
(assuming that it is a first order Markov process) the probability of being in state ck at time k
(given all states up to k-1) depends only on the previous states ck-1 at time k-1. [Ml00]
formulates the first order Markov process as follow:
36
p (c k | co , c1 ,..., c k 1 ) p (c k | c k 1 )
(2.7)
Overall, the Markov process can be of any order and the nth order Markov process is defined
as:
p (c k | c0 , c1 ,..., c k 1 ) p (c k | c k n ,..., c k )
(2.8)
In order to clarify the Viterbi algorithms role in handwriting recognition, consider the
Viterbi algorithm (formula 2.9) proposed by [Ml00] for handwritten word recognition. It is
assumed that the process is the first order Markov process in the algorithm.
g c ( Z ) log p ( z i | ci ) log[ p (c1 | c0 ) P(c 2 | c1 ) ... P(c n | c n 1 )]
(2.9)
i 1
where gc(Z) is the maximum posterior probability of the sequence of characters conditioned
on candidate characters sequence C = c1,c2,.., cn, and zi is a feature vector for the ith
character.
2.6 Pen Application Program Interfaces (APIs)

With the rapid popularity of pen based computing in recent years, a number of pen based
application program interfaces (APIs) have become widely available. One of the most
popular APIs for collecting, manipulating and recognizing digital ink is Microsoft Tablet PC
platform SDK APIs [Tab04], which mainly supports the Microsoft Tablet PC platform. The
APIs include functions to manipulate low level ink data as well as higher segment-level,
stroke-level, word-level and phrase-level recognition. Some of the stroke-level APIs are not
directly applicable to the current research as Pitmans Shorthand is not included in the
supported languages. Nevertheless, other APIs are highly useful for the development of ink
37
input and text output user interfaces.
Implementation of these APIs for the overall
recognition and transcription of handwritten Pitmans Shorthand is discussed in detail in

Chapter 7.
2.7 Summary
This chapter presented a literature review of systems and techniques relating to computer
aided recognition and transcription of handwritten Pitmans Shorthand. The commercial
viability of the handwritten Pitmans Shorthand recogniser is evaluated in comparison to the
functionalities of handheld devices existing text entry systems. The chapter presents basic
information on Pitmans Shorthand, which is vital to enable the reader to easily follow the
thesis discussions, and it also provides brief reviews of decades of previous work on the
automatic recognition of handwritten Pitmans Shorthand. A number of graphical models
applied to the pattern recognition field were discussed, with a thorough algorithm review of
the Bayesian Networks architecture, mainly from the aspect of the algorithms efficiency in
handling handwritten Pitmans Shorthand word recognition problems. The role of statistical
language models in the recognition of handwritten sentences has also been addressed,
together with a review of the Viterbi algorithm. The chapter also highlighted tablet PC
related application program interfaces (APIs) that are essential for the development of a
commercially viable prototype handwritten Pitmans Shorthand recogniser.
38
3. Evaluation of phonetic based transcription of vocalised handwritten Pitmans outlines
Evaluation of phonetic based transcription of vocalised

handwritten Pitmans outlines
39
The previous chapter reviewed the performance of existing work carried out on the
automatic recognition of handwritten Pitmans Shorthand and presented an overview of
popular pattern recognition algorithms that can be used to improve the performance of word
level and phrase level recognition. Before taking the next step to advanced word and phrase
recognition, this chapter first presents a preliminary experiment, carried out to verify
whether existing transliteration methods, proposed in the literature, are efficient enough for
the purpose of this project.
In particular, the primary goal of this preliminary assessment is to ensure whether it is

practical to convert segmented portions of shorthand outlines into phonetic values prior to a
text translation. Perhaps direct translation of segmented primitives of shorthand outlines
into English words is more efficient, however, such an attempt has never been reported
throughout two decades of previous work. It has been shown that a primitive to text
translation approach is robust against stroke variation [CK04] and the approach is applied in
several commercial handwriting recognisers [Hn97], [LY97], [HV93].
Taking into
consideration the transcription accuracy achieved by existing systems, this research is not
based on the assumption that phonetic based transcription is the only one absolute solution
to transliterate handwritten Pitmans Shorthand.
In addition, the direct translation of
primitives into words was not feasible at the time of previous work because there was no
electronic Pitmans Shorthand lexicon that enabled primitives to be directly mapped to
related words. If an electronic Pitmans Shorthand lexicon is in existence, direct translation
of primitives into text will become feasible. It is proposed in this research that it is
reasonable to create an electronic Pitmans Shorthand lexicon and analyse a primitive-to-text
translation approach. However, a careful appraisal of conventional methods is performed
before implementing a new algorithm.
Therefore, this chapter firstly analyses the
advantages and disadvantages of phonetic based translation via experimental results, and
40
then presents a discussion on why a primitive based transcription approach is preferable to

phonetic based transcription approach.
In general, appraisal of existing methods can be carried out easily if the existing systems
serve the purpose of the assessment directly. However, this is not the case in the current
assessment (i.e., assessment of conventional phonetic based transcription methods). There
are two reasons for this: firstly, previous work by [LQ90], [QL93], [LD86] mainly
emphasises low level pattern classification and the work presents just logical procedures of a
linguistic post processor with no detailed implementation for phonetic based word
translation.
Secondly, work by [NB02], [KSN+03], [SKN+04] emphasise offline
recognition and the systems there do not fit the objectives of the current experiment. As a
result, this chapter presents a prototype of a linguistic post processor that includes the
conventional idea of phonetic based word translation, plus novel pattern tuning algorithms,
which are effective in dealing with the shape variations of handwritten Pitmans Shorthand
.
3.1 System Overview

In order to assist the reader to get a clear understanding of the whole framework, an
overview of the transcription engine in combination with our collaborators recognition
engine is given (Figure 3.1). Ink data is collected by the recognition engine whose role is to
firstly differentiate between vocalized outlines and short forms. It then segments a vocalized
outline into the most relevant fragments by detecting dominant points along the outline. The
segmented primitives are then processed through a neural network classifier, and a ranked
list of pattern primitives, along with each of their related categories, is produced at the end of
the classification process. Short-forms are recognized separately from vocalized outlines
using a Template Matching Algorithm. Unlike the vocalized outline recognizer, the shortform recognizer immediately produces a ranked list of candidate words for a given short-
41
form. Detailed descriptions of the collaborators recognition engine can be referenced in

recent publications, [YLH+04a], [YLH+04b], [YLH+05a].
Input
Collaborators recognition engine
Transcription engine
Vocalised outline interpreter
Vocalised outline
recogniser
Short-form
recogniser
Short-form
interpreter
Pre-processing
Segmentation engine
(dominant point
detection)
Classifier
(Neural Network)
Template
matching
engine
Word level transcription
Output:
A ranked list of
words
Output:
A ranked list of
primitives
Output:
A list of
words
Phrase level transcription
Output
Text
Internet
Figure 3.1: An abstract view of the whole system
The role of the transcription engine is to find the best candidate word for a given vocalized
outline or short-form. It includes two major stages: word level transcription and phrase level
transcription. At the word level transcription, short-forms are not taken into account since
they have been interpreted into the most likely words by the recognition engine. Vocalized
outlines are transliterated into sets of English characters by two processes: pre-processing
and word recognition. These two processes are the primary components of the system
presented in this chapter. The pre-processor performs the setting up of essential lexical
knowledge relating to handwritten Pitmans Shorthand. The word recognizer then takes a
42
ranked list of classified primitives, which are forwarded from the recognition engine as
input, and produces a ranked list of candidate words as output.
After word recognition, candidate words of either a vocalized outline or a short-form are put
through a phrase level processor and the word with the highest contextual probability is
chosen as a correct representation for an input outline. The phrase level transcription is not
studied in this chapter since the primary purpose of the preliminary experiment is to analyse
word recognition performance.
3.2 Transcription of Vocalized Outlines Based on a Phonetic

Approach
A detailed view of a phonetic based vocalized outline interpreter is illustrated in Figure 3.2
and it consists of the following modules:
-
Lexicon preparation: converts a phonetic lexicon into a hash table such that similar
sounding words are indexed under the same key in order to cope with phonetic rules
of Pitmans Shorthand.
Nearest Neighbourhood Query: slightly adjusts segmented features of an input

shorthand outline in order to cope with shape-variation in handwriting.
Feature to phoneme conversion: converts geometrical features of shorthand outlines

into phonetic representation in order to match with a phonetic lexicon.
Phoneme ordering: reorders resultant phonemes, produced by a Feature to

phoneme conversion process into a linguistic sequence in order to match with a
phonetic lexicon.
Lexicon lookup: matches a series of phonemes with a phonetic lexicon to find

related English words.
43
Input outline
Vocalized
outline
recogniser
A ranked list of
classified primitives
Vocalized outline interpreter
Lexicon preparation
(Pre-processing)
Phonetic
lexicon
Nearest Neighbourhood Query

(NNQ)
Feature to phoneme conversion
Contextual
knowledge
Phoneme ordering
A ranked list
of words
Lexicon lookup
Legend
Sentence level
transcription
Output word(s)
Data
Data flow
Process
Read/Write access
Storage
Figure 3.2: Detailed view of a vocalized outline interpreter
3.3 Lexicon Preparation

The primary purpose of the lexicon preparation is to convert a phonetic dictionary into a
hash table data structure and categorise words with similar pronunciations under the same
key. Here, words with similar pronunciations mean words with either identical phonemes or
similar phonemes. For instance, the words bet and pet have similar pronunciations
because they contain similar phonemes with the only difference of a voiced consonant /B/
and an unvoiced consonant / P/.
A major benefit of keeping similar sounding words under the same key is to reduce the
search complexity at O(1). In addition, it enables the retrieval of a list of ambiguous words
for an input outline by a single lookup because the creation of a hash table for a lexicon is
44
based on the hypothesis: words with similar pronunciations resemble one another in
Pitmans Shorthand. One may question why similar sounding words resemble one another
in Pitmans Shorthand since this assumption is not true in normal English. In normal
alphabetical handwriting, two similar sounding words do not exactly need to look alike. An
example is given with the words tail and tale in (Figure 3.3); the two words sound alike,
but their scripts are dissimilar enough not to be confused.
Typed words in English
Handwritten words in English
Handwritten words in Pitmans Shorthand
tale
tail
Figure 3.3: Illustration of sample words in normal English and Pitmans Shorthand
In contrast to normal English, similar sounding words do look alike or are identical in
Pitmans Shorthand. This is due to the special rule of Pitmans Shorthand invented for
speed improvement purposes i.e., a pair of voiced and unvoiced consonants are written in the
same stroke with different line thicknesses. An example is given with the words tail and
tale again (Figure 3.3): the two words sound alike and their scripts look identical in
Pitmans Shorthand.
Therefore, keeping similar sounding words under the same root
directly affects search performance and an algorithm for the lexicon organisation is
presented below:
N: numbers of words contained in a phonetic lexicon
Xi: ith phonetic index of the phonetic lexicon
Yi: word data relating to Xi
table: a hash table used to store data of the phonetic lexicon
key: a phonetic key
value: word data to which a specified key is mapped in table
45
Initialisation
table = a hash table
Lexicon organisation
For i = 0 to N
key = Xi
Yi = getWordData(Xi)
//convert unvoiced consonants into voiced consonants
key = tuneToVoicedConsonants(key)
//if a phonetic key already exists
if (table.containsKey(key))
value = table.get(key)
value += Yi
end
else if (!table.containsKey(key))
value = Yi
end
table.put(key,value)
end
The lexicon preparation takes place when the transcription engine is run for the first time
and does not repeat when input outlines are transcribed in real time. If any modification of a
lexicon is required, such as a change of word-list or a change of users domain, the existing
hash table can be updated by repeating the lexicon preparation procedure.
Once the lexicon data is ready the next process, denoted as Nearest Neighbourhood Query
is invoked.
46
3.4 Nearest Neighbourhood Query (NNQ)

Consonant neighbourhoods
F, V
P,B
vowel neighbourhood
T, D
TH, th
S, Z
circle neighbourhood
at the beginning of an outline

in the middle of an outline
at the end of an outline
close circles
unclose circles
hooks
Figure 3.4: Sample neighbourhoods predefined in the Nearest Neighbourhood Query

Approach
The Nearest Neighbourhood Query (NNQ) is, in fact, a heuristic approach in which
misclassified pen strokes are adjusted according to the degree of similarity to other strokes.
Primitives with similar geometric features are predefined in the same neighbourhood and the
system comprises of seven neighbourhoods, where four relate to vertical and horizontal
strokes, one to circular primitives and the remaining two to dot and dash vowel-primitives.
Here, similarity means having similar angular structure for stroke primitives, having similar
shape for circular primitives or having similar location and shape for vowel primitives.
Samples of the predefined neighbourhoods are illustrated in Figure 3.4 and the Nearest
Neighbourhood Query algorithm is presented as follows:
{N1, N2, .., N7}: a collection of seven neighbourhoods
O: an input handwritten outline
I : number of segments of an input outline, O
Si: ith segment of an input outline, O
Pattern: a pattern category of Si
Xi: a resultant vector, containing a set of primitives that are similar to Si
R: an output vector, containing a set of Xi where (i = 1, 2,.., I)
M: a matrix, containing a number of outlines that are similar to O
Initialization
Initialize N1, N2, N3, N4, N5, N6, N7
X = a new vector
R = a new vector
47
Stroke adjustment
for i = 0 to I
//assign the ith segment of an input outline as a pattern category
Pattern = Si
for j = 0 to 7
//if the jth neighbourhood contains the value of Pattern
if (Pattern Nj)
//get all the elements of Nj excluding Pattern
Xi= Nj \ Pattern
end
end
R += Xi
end
Matrix = createMatrix(R)
return Matrix
Handwritten
outline
//output of NNQ algorithm
Sample output of the Nearest Neighbourhood Query

algorithm
Figure 3.5: Sample output produced by the Nearest Neighbourhood Query
The output of NNQ is a matrix of primitives, whereby each row represents a particular
shorthand outline that is similar to an input pattern and each column represents a certain
segment of the shorthand outline. A pictorial presentation of NNQ is given in Figure 3.5 in
48
which sample input and output of the algorithm can be clearly seen. Once the NNQ process
is completed the next process, Feature to Phoneme Conversion is invoked.
3.5 Feature to Phoneme Conversion

This process converts segmented portions of a shorthand outline (e.g., loops, hooks or
strokes) into a phonetic representation using a set of production rules. According to our
study, approximately 20% of segmented portions of shorthand outlines, either forwarded
from the recognition engine or produced by the NNQ, can be directly converted into basic
Pitmans phonemes. The remaining 80% need knowledge of adjacent primitives to be
translated into phonetic values based on a number of production rules. Similar to the work
by Leedham [Lg90], the production rules are applied with respect to a relationship between
an individual primitive and its adjacent primitives. Unlike Leedhams approach, rules are
applied in the order of priority in this novel system. Basically, there are five production rules
introduced in this new system and they can be stated in a descending priority order as
follows:
1. Feature Detection (FD)
2. Length Detection (LD)
3. Primitive Combination (PC)
4. Primitive Combination and Reverse Ordering (PCRO)
5. Direct Translation (DT)
To clarify the first two rules, consider the two examples described below and to
clarify the last three rules, refer to examples in
49
50
Table 3-2. In addition, basic notations of Pitmans Shorthand relating to each rule
can be looked at in Table 3-1.
Table 3-1: Relationship between the production rules and basic Pitman phonemes
Rule
Pitman phonemes
FD
SES, ZES circles, ST, STER loop, N, F, V, SHUN hook, suffix SHIP hook,
suffix ING/INGS dot
LD
MD, ND, suffix MENT, half length strokes, double length strokes
PC
W, Y, H
PCRO
PL, BR, etc., PR, BR, etc., FR, VR, etc., and FL, VL etc.
DT
All consonants except Y, W and H
Example 1: Application of Feature Detection (FD) Rule

STER large loop: Pitman uses a large loop to indicate the sound of /STER/ in the middle or
at the end of an outline. For this case, one of the FD rules reads: IF a stroke or curve
primitive is followed by a large circular loop primitive in the middle or at the end of an
outline, THEN the loop appends phonemes of /STER/ to the preceding phoneme.
Example 2: Application of Length Detection (LD) Rule

Double length curves: Normal Curve-primitives are doubled in length to represent the
addition of the syllables -TER, -DER, -THER and -TURE in Pitmans Shorthand. For this
case, one of the LD rules reads: IF a curve primitive is doubled in length, THEN the
double-length curve inserts phonemes of /TER/, /DER/, /THER/ and /TURE/ after the
phoneme of the curve. To understand this principle clearly, consider the example in Figure
3.6.
51
Double-length input outline

for the word after
Recognition output
1st primitive- double length /F/ or /V/ curve
2nd primitive- /A/ vowel
Apply double length rule of /TER/, /DER/,
/THER/, /TURE/
Output of Feature to Phoneme Conversion
Reference
Normal /F/
consonant
Output 1
/TER/+/A/ vowel
Output 2
/DER/+/A/ vowel
Output 3
/THER/+/A/ vowel
Output 4
/TURE/+/A/ vowel
Figure 3.6: Sample of phoneme translation of a double length stroke
As shown in the reference section of Figure 3.6, a normal downward curve represents a
phoneme /F/ in Pitmans Shorthand, however, when the curve is doubled in length, it
represents the sound /F/ plus additional sounds of /TER/, /DER/, /THER/ or /TURE/.
Therefore, a candidate list for the word after contains four different pronunciations at the
end of phoneme conversion (Figure 3.6).
3.6 Phoneme Ordering

The primary task of Phoneme ordering is reordering resultant phonemes, produced by the
Feature to Phoneme Conversion process. The reordering is required due to a special
writing order of Pitmans Shorthand i.e., consonants of a word are always written first and
vowel notations are written only after the completion of a whole consonant kernel. In online
handwriting recognition, pen data is collected in a series of time order and so vowelprimitives are always tagged behind consonant-primitives regardless of the linguistic order
in our system. In order to obtain correctly ordered phonemes, vowels need to be inserted
among consonants. Leedham [Lg90] proposed the same strategy to sort phonemes according
to the linguistic order and this process is denoted as Phoneme ordering.
52
An example for phoneme ordering is given in Figure 3.7 in which its sample inputs are
directly taken from outputs of the Feature to Phoneme conversion process, demonstrated
in Figure 3.6. As shown Figure 3.7(a) the vowel /A/ is detected last although it is the first
phoneme in the word after. The system uses dominant point information and sequence
information of ink data to place vowels at their correct positions. After phonemes have been
sorted into correct order, the resultant phonemes are matched up with a phonetic lexicon in
the next process, called lexicon lookup. Then a list of autographic English words that best
represent the input shorthand outline is produced at the end the search.
Input of the phoneme ordering process
Output of the phoneme ordering process
Input 1
/TER/+/A/ vowel
Input 2
/DER/+/A/ vowel
Output 1
/A/ vowel+/TER/
Input 3
/THER/+/A/
vowel
Input 4
/TURE/+/A/
vowel
Output 3
/A/
vowel+/THER/
(a)
Output 2
/A/ vowel+/DER/
Output 4
/A/
vowel+/TURE/
(b)
Figure 3.7: (a) Sample input of phoneme ordering process (b) sample output of
phoneme ordering process
53
Table 3-2: Phonemes translation using PC, PCRO or DT rules
Number
(a)
Pitman
outline
English
word
Word
Primitives
classified by a
recognition
engine
1.
Phonemes of an outline
- /W/ consonant
(Rule:
2.
= /W/)
3.
- /D/ or /T/ consonant
4.
- /AW/ vowel
Translation is based on the rule of primitive combination (PC). The rule applied to this
example is IF an upward diagonal stroke is preceded by a small anti-clockwise hook,
THEN the combination of these two primitives denotes the phoneme /W/
(b)
Printed
1.
- /PR/ or /BR/
2.
(Rule:
3.
- /N/ curve
= /PR/ or /BR/)
- /T/ or /D/ consonant
4.
5.
- // vowel
6.
- // vowel
Translation is based on the rule of primitive combination and reverse ordering (PCRO).
The rule applied here is IF a small hook is followed by a straight downward stroke, the
small hook is converted into phoneme /R/ and swapped with a succeeding phoneme.
(c)
Go
1.
2.
- /G/ or /K/ consonant

- // vowel
Translation is based on the rule of direct translation (DT).
The rule applied to this
example is IF a horizontal stroke is written from left to right, THEN the stroke directly
denotes the phoneme /G/ or /K/.
54
3.7 Experimental Results

The preliminary experiment described in this chapter is categorized into two main studies:
firstly, statistical analysis of homophones (words which look similar but have different
representations) in a phonetic lexicon and secondly, performance evaluation of the word
level transcription of the system prototype.
3.7.1 Data Set

For a statistical analysis of a phonetic lexicon, a list of the 5000 most frequently used
English words, extracted from the Brown Corpus is used. Based on this word list, a hash
table is created with a series of phonemes as a key for each group of words. Here, the
phonetic keys are extracted from the CMU phonetic dictionary. (Figure 3.8 gives a pictorial
representation of the hash table).
Index
Word
/B T/
Bat
Pat
Bad
Pad
Figure 3.8: Sample element of a phonetic lexicon in a hash table
For an analysis of word transcription performance, 432 Pitman outlines were collected,
written with different levels of tidiness on a WACOM ART II Tablet by three writers. Each
writer wrote a sample sentence, consisting of 28 vocalized outlines and 20 short-forms, three
times. Here, the sample sentence covers the whole range of shorthand primitives and the
selected words contained in the 5000 most frequently used English words of the general
domain. Samples of the collected data are illustrated in Figure 3.9.
55
Figure 3.9: Sample collected outlines
3.7.2 Analysis of a Phonetic Lexicon

The goal of this experiment is to estimate an approximate number of candidate words
(homophones) for each input outline by using a phonetic lexicon, and to evaluate which
vocabulary level has the highest ambiguity distribution and which level has the least. Here,
vocabulary level means words known and used by a user and it is equivalent to a number of
words contained in a lexicon. Statistics obtained from this study are intended to estimate the
preliminary accuracy of the post processing of handwritten Pitmans Shorthand with respect
to different levels of writers vocabulary.
Unique outlines in %
100
Uniqueness of outlines
for perfect recognition
90
80
given line thickness
ambiguity
70
60
given vowel ambiguity
50
5000
4500
4000
3500
3000
2500
2000
1000
500
300
100
40
Lexicon size in no.of words
Figure 3.10: The distribution of homophones in different sized phonetic lexicons
Figure 3.10 illustrates experimental results obtained from different sizes of phonetic lexicons
up to 5000 words. The X-axis of the graph represents different sizes of lexicon, and words
56
extracted for these lexicons are sorted according to their frequency of usage. This means, a
lexicon of size 100 represents the first hundred most commonly used words in English; a
lexicon of size 300 represents the first 300 most commonly used words and so on. The first
test simulates how an input of Pitmans outline can be uniquely identified by a lexicon in the
presence of perfect segmentation and recognition. According to the test, 97% of the 5000
most frequently used English words have a unique representation. The maximum ambiguity
is 3 potential words per index and an average ambiguity is 1.02 potential words per index.
Therefore, a transcription accuracy of at least 97% can be estimated if there are no errors in
the low level segmentation and classification of shorthand outlines.
The second test (Figure 3.10) estimates the transcription performance in the presence of
unclear thickness of a pen-stroke.
This is the most common case experienced in the
recognition of Pitman shorthand as most digitizers are unable to detect the thickness of a
pen-stroke even though Pitman defines similar sounding consonants by the same stokes and
differentiates between voiced and unvoiced sounds by thick and thin lines. It should also be
noted that regardless of the input technology, writers do not make a clear distinction between
thick and thin strokes. According to this test, ambiguity of a lexicon of 5000 words increases
by about 9% if there is no distinction between voiced and unvoiced consonants. The
transcription accuracy here is expected to be at least 87%.
The third test in Figure 3.10 predicts the transcription performance in the presence of
ambiguous vowel notations. This is an important consideration in the recognition of Pitman
shorthand since vowels are occasionally omitted in writing Pitmans Shorthand and omitted
positions vary from writers experience or individual inclination. If a solution to handle the
unpredicted omission of vowels in an outline is by excluding vowels from the lexicon and
matching without vowel components, the new version of the lexicon has about 56% unique
indices.
57
3.7.3 Performance Evaluation of the Word Level Transcription

The goal of this experiment is to evaluate the word transcription performance of our
proposed framework under the following criteria:
in the presence of shape variation and position confusion due to speed writing or
different users writing;
in the presence of segmentation and classification errors due to misclassification or

hardware constraints, and
in the presence of abnormal outlines due to inconsistent writing

Table 3-3: Experimental results of the phonetic based word translation
Description
Transcription accuracy
(Vocalised outline)
Overall
84%
In the presence of vowel
0%
omission or confusion
In the presence of inconsistent
0%
writing
In the presence of classification
100%
error
As shown (Table 3-3), the best rate achieved by the vocalized outline interpreter is 84%.
12% of error rate is due to inconsistent writing i.e. outlines which are comprehensible to
human readers, but are not consistent with the writing rules of Pitman shorthand. An
interesting phenomenon observed in this experiment is that 48% of perfect transcription
occurs in the presence of recognition errors. This shows the approximate pattern matching
technique applied in NNQ is capable of dealing with classification errors. A primary
limitation of this system, which accounts for 40% of error rate, is being unable to correctly
transcribe outlines with hidden or omitted vowels.
58
Both accuracy and error rates reported throughout this experiment are based on a number of
outlines and they can be denoted as follow:
c
100
t
(3.1)
where a is the word transcription accuracy, c is the total number of correctly interpreted
outlines, and t is the total number of handwritten outlines.
tc
100
t
(3.2)
where e is the error rate, t is the total number of handwritten outlines, and c is the total
number of correctly interpreted outlines.
3.8 Discussion
On the whole, a primary advantage of phonetic based transcription of vocalised outlines is to
be able to adapt to existing language models (i.e., phonetic models), which define large
vocabularies with probability distributions between sequences of phonemes.
Another
distinct advantage is that a machine performs the same logical procedures as a human
interpreter to transcribe Pitmans Shorthand outlines and this makes the machine
transcription concept easy to follow.
In terms of disadvantage, performance of phonetic based transcription falls dramatically in

the presence of omitted vowels in vocalised outlines. According to our statistical analysis of
a phonetic dictionary (Figure 3.10), transcription of vocalised outlines without vowel
components is estimated at merely 56% correctness. In addition, the special writing rules of
Pitmans Shorthand raises ambiguity in phonetic based transcription a number of special
59
writing rules invented for speed improvement purposes in Pitmans Shorthand allow
multiple ways of pronouncing different sounds for primitives with minor differences of size,
length, thickness or inclination. In general, it is practical to express accurate size, length, or
inclination of a stroke for a printed script; however it is less practical for handwriting,
especially if the script is written at speed. The following examples illustrate the variation of
pronunciations with the minor differences between geometric features.
Example 1: Appearance variation

As shown in Figure 3.11, standard notations for consonants /T/ and /L/ are a vertical stroke
and a curve
respectively, however if /T/ is immediately followed by consonant /L/ or if
there is a non-stressed vowel between /T/ and /L/, Pitman uses a combination of a small
hook and a vertical stroke (i.e.,
) to indicate the sound of /TL/ or /T+ silent_Vowel+L/.
On the other hand, an outline with a small circle followed by a vertical stroke (i.e.,
stands for the sound /ST/ and it can be easily confused with an outline of /TL/ or
/T+silent_Vowel+L/ if the circle at the beginning is not clearly written. According to
experimental results, approximately 45% of small hooks are recognised as circles.
Therefore, a direct conversion of primitives, which are prone to minor recognition errors,
into phonemes can lead to completely different interpretations.
Handwritten outline
Basic Pitmans notations

/T/
/L/
/S/
/ST/
/TL/ or
/T+silent_vowel+L/
Figure 3.11: Illustration of the incidence of phoneme variation due to confusion

between a circle and a hook
60
Example 2: Length variation

In Pitmans Shorthand, curves of different lengths represent different phonemes; however,
length is not clearly shown in some outlines while writing at speed. As shown in Figure
3.12 a sample outline of the word shatter can be interpreted wrongly as /SH / instead of
/SH T ER/ if the curve /SH/ is not recognised as a long curve.
Phonetic representations of
double-length and normal
length outlines
Handwritten outline for the

word shatter
/SH T ER/
/SH /
Figure 3.12: Illustration of the incidence of phoneme variation due to length

confusion
Examples 1 and 2 demonstrate that converting inaccurate handwritten primitives into

phonemes allows unnecessary candidates to appear at an early stage and subsequently affects
the transcription performance. Cho & Kim [CK04] proposed that stroke relationships are
usually robust against geometric variations and important for discriminating characters of
similar shapes in on-line handwriting recognition. It is, therefore, more appropriate to retain
stroke information of an outline rather than changing it into phonemes.
After a thorough evaluation on the advantages and disadvantages of a phonetic based
transcription approach, it has finally been concluded that the remaining work of the thesis is
going to be based on a novel transcription method that retains low level stroke information,
denoted as a primitive based transcription approach.
61
5. Generation of a machine-readable Pitmans Shorthand lexicon
Bayesian Network Based Word Transcription
62
The previous chapter reviewed the advantages and disadvantages of a phonetic based
transliteration of handwritten Pitmans Shorthand and concluded that the idea of not
following conventional phonetic approaches is rather appealing. This chapter discusses the
novel approach, implemented specifically for this research to improve word transcription
accuracy by using a primitive to text transliteration approach. In this new approach,
Bayesian Network representation is applied to configure ambiguities and stroke
dependencies of handwritten Pitmans Shorthand outlines.
First of all, an overview of the whole system is given, thereby enabling the reader to get a
clear understanding of the role of the word transcription processes. Following the overview,
a detailed description of a Bayesian Network word recogniser is given under the following
topics.
Summary: a brief description of each process included in a Bayesian Network based

word recogniser.
Life cycle: explanation of a life cycle of Bayesian Network models that represent
handwritten Pitmans Shorthand outlines.
Network architecture: description of an outline models architecture, including

attributes (nodes) and relationship between nodes (topology).
Inference: propagation of the likelihood of an attribute of an outline model based on

other attributes of the model, in which the propagated values are used for N-best
word selection.
Training (Learning): training outline models with a collection of training data in

order to enable the system to cope with the natural ambiguity of handwriting in
Pitmans Shorthand
63
Model selection: selection of N-best outline models for a given input outline centred
on knowledge based rejection strategies.
Experiment: performance evaluation of the Bayesian Network based word

recogniser
4.1 System Overview

Input
Collaborators recognition engine
Vocalised outline interpreter
Vocalised outline
recogniser
Short-form
recogniser
Short-form
interpreter
Pre-processing
Segmentation engine
(dominant point
detection)
Classifier \
(Neural Network)
Primitive based word

transcription
Template
matching
engine
Output:
A ranked list of
words
Output:
A ranked list of
primitives
Output:
A list of
words
Phrase level transcription
Output
Text
Internet
Figure 4.1: An abstract view of the whole system
An overview of the whole system is given (Figure 4.1), in which the diagram is nearly
identical to the one illustrated in the pervious chapter. A major difference between the two
frameworks is the change to a vocalized outline interpreter (shaded box) in the new
framework, where text is interpreted directly from primitive attributes instead of phonetic
64
attributes as in the old framework. A summary of processes included in the new vocalized
outline interpreter is presented as follow.
4.2 Summary of Bayesian Network Based Word Transcription

A ranked
list of
primitives
Vocalised
outline
recogniser
Input
outline
Bayesian Network based vocalised outline interpreter
Pre-processing
Shorthand
lexicon
Lexicon construction
A ranked
list of
words
Training
Bayesian
Network based
outline models
Word interpretation
Phrase level
transcription
Output word(s)
Legends
Data
Data flow
Process
Read/Write access
Storage
Figure 4.2: Illustration of Bayesian Network based word transcription
The shaded box in Figure 4.2 highlights the role of Bayesian Network based vocalized
outline transcription.
It comprises of two major processes: preprocessing and word
interpretation. The preprocessing takes place at the first time of setting up a transcription
engine and it is skipped in real time transcription of shorthand outlines unless any
modification of lexical data is required.
A primary function of preprocessing is to
automatically convert a phonetic lexicon into a Pitmans Shorthand lexicon such that
different combinations of a series of geometric patterns represent different keys, with each
key mapping to one, or more than one, word. This approach (the creation of the Pitmans
65
Shorthand lexicon) is distinct from previous work and a full description of the lexicon
creation is given in a separate chapter, Chapter 5. Another important function of
preprocessing is to create Bayesian Network based outline models where user independent
handwritten data and lexicon information are embedded in hierarchical probabilistic
structures.
The next process, which takes place immediately after the preprocessing, is word
interpretation. A primary function of the word interpretation is to produce a ranked list of Nbest words based on a confidence score of the low level recognition plus a belief of nodes of
an outline model. After the word interpretation, the N-best words are then forwarded to the
next process, called a phrase level interpreter to produce the final word(s) for a given input
outline.
4.3 Life Cycle of Outline Models

Outline models are the primary components of a vocalised outline interpreter and this
section describes the life cycle of outline models throughout the word transcription process.
Firstly, a precise description of outline models is given a collection of outline models

represents a dictionary, and the number of outline models, generated in the word interpreter
is not the same as the number of words of a dictionary. This is because each outline model
is designed to represent one, or more than one, word in order to cope with hardware
limitations or ambiguities of handwritten Pitmans Shorthand. An example of a hardware
limitation is the passive digitisers of Personal Digital Assistants (PDAs) that are incapable of
detecting accurate line thicknesses. This limitation makes a shorthand recogniser fail to
distinguish between two similar outlines with different line thicknesses e.g., outlines for the
words pays
and bays
. As a result, grouping similar outlines under the same
66
model enables the system to easily find potential candidate words for a given outline and
improves the search performance. Here, similar outlines stands for words with the same
series of geometric features (of a consonant kernel) regardless of different line thicknesses
and different vowel positions. Samples of similar outlines are illustrated in Figure 4.3.
pays
bays
oak
go
airs
erase
Figure 4.3: Illustration of three pairs of similar outlines
In terms of the life cycle, outline models are firstly created with the use of a shorthand
lexicon and secondly updated with a set of training data. The models are then saved as a
knowledge source for word interpretation until any changes are required. Examples of
changes include expanding the word list of an existing dictionary or altering a user domain.
In response to the change of a user domain, outline models are created, edited or removed
according to the users preference, defined at a domain set up process. Note that vocabulary
(i.e., the word list of a dictionary) has a huge impact on word transcription performance and
outline models should be associated with a dictionary of a corresponding domain. Figure
4.4 illustrates the life cycle of outline models.
In real time word interpretation, a series of classified primitives of an input outline are
matched with outline models, and the model with the highest posterior probability is taken as
a correct representation for a written outline.
67
A collection of outline models

Shorthand
lexicon
Training
data
Creation of a
new outline
model
A closer view of an outline model

for a particular word
Update of existing
outline models
with training data
Removal of
outline model with
new domain set up
Legends
Storage
Process
Node of an
outline model
Outline model
Read / write access
Figure 4.4: Life cycle of outline models
4.4 Outline Model Architecture

An outline model is formed by concatenating the basic geometric features of a shorthand
outline, produced by the low level recognition engine, in chronological order. Note that
chronological writing order in Pitmans Shorthand is not synonymous with the one in normal
English. The difference between them is illustrated in Figure 4.5, a chronological writing
order of the word beat in normal English is b, e, a, t whereas the writing order changes to
b, t, e, a in Pitmans Shorthand.
Vowels are always written last no matter how words are pronounced in Pitmans Shorthand
and this makes the automatic transliteration of handwritten Pitmans Shorthand distinct from
the transcription of handwritten English. According to the study in Chapter 3, reordering
vowels to their corresponding positions was found inefficient in the case of having missing
vowel variables in an outline. To improve upon existing systems, it was argued, one should
68
somehow seek a more parsimonious solution that also leads to a better text interpretation
performance. Thus, this research proposes a novel network model, denoted as an outline
model which represents the inherently complex features of handwritten Pitmans Shorthand.
The word beat
In normal English
In Pitmans Shorthand
Chronological writing order of the word beat
1st letter
2nd letter
1st consonant (b)
3rd letter
2nd consonant (t)
4th letter
3rd vowel (ea)
Figure 4.5: Illustration of chronological writing order of normal English and Pitmans
Shorthand
4.4.1 Nodes of an Outline Model

The structure of an outline model is based on a Bayesian Network representation [Pj88] in
which the model is retained in a hierarchical structure with each node corresponding to a
primitive variable or a conditional variable, and each link signifying probabilistic
dependency between nodes. Similar to a network architecture designed by Xiao in the
domain of signature verification [XL02], our outline model creates the following four types
of nodes, depending on the relationship between one node and another.
1. Root node: A root node corresponds to an outline O and it represents one, or more than
one, word. It contains N child nodes {P1, P2,.. PN} where Pi corresponds to a collection of
primitives which represents the ith segment of the outline O.
69
2. Unique node: A unique node corresponds to a particular segment of a shorthand outline

and it represents one and only one pattern. It appears while an outline model O is created
with a shorthand lexicon at the beginning, and it remains or disappears while O is updated
with training data. The definition of a unique node is: if a particular segment (node) of an
outline model relates to one and only one type of geometric feature after it has been updated
with a shorthand lexicon as well as training data, the node is considered to be independent of
other nodes and linked directly to a root node. Figure 4.6 (a) and (b) illustrate occurrence
of unique nodes in two cases.
O
O
P3
V1
P1
P3
P1
P4
P2
H1
P4
P5
(b)
(a)
Figure 4.6: Illustration of unique nodes of an outline model

(a) Occurrence of unique nodes after O has been created with a Pitmans Shorthand
lexicon since features of O in the lexicon are genuinely accurate, every segment
(node) of O is related to one and only one pattern, resulting in a unique node for every
segment. (b) Occurrence of unique nodes after O has been updated with lexicon and
training data since there is more than one possibility in the first and third segment of
O, two corresponding unique nodes disappear, resulting in one remaining unique
node, P3.
3. Virtual node: A virtual node corresponds to a certain segment of a shorthand outline and
represents a conditional variable that allows the embedding of multiple possibilities of a
consonant-segment in an outline model O. It appears when two or more primitives compete
to represent a particular node of O during the training process, but it never appears while O
is created with a shorthand lexicon at the beginning. The definition of virtual node reads: if
a particular primitive (e.g., P1 in Figure 4.6 (b)) is dependent on another primitive (e.g.,P2 in
Figure 4.6 (b)) and there is an optional relationship between them (i.e., either at most one or
none of them can be true at the same time), we can assume that there is a mechanism that
controls the values of P1 and P2, resulting in a virtual node V1 as shown (Figure 4.6 (b)).
70
4. Hidden node: A hidden node corresponds to a certain portion of a shorthand outline and
represents a conditional variable that allows the embedding of hidden vowel primitives in an
outline model. An interesting assumption in relation to the creation of a hidden node is that
it appears from the time when outline models are created with a shorthand lexicon, although
the lexicon provides accurate vowel information at the time. This is due to a major purpose
behind hidden nodes i.e., to identify missing vowel components, randomly omitted by
writers according to the writers experience or preference. The definition of hidden nodes
reads: if a particular primitive (e.g., P4 in Figure 4.6 (b)) appears or disappears from time to
time and the variation does not adhere to any rule, we can assume that there is a hidden
mechanism that controls the value of P4 or P5, resulting in a hidden node H1.
In order to demonstrate how an outline model is created with the use of four types of node,
the step by step creation of an outline model for the word bake is given (Figure 4.7).
1. Firstly, a root node of an outline model is generated with the word bake.
2. The root node then creates N number of child nodes using a shorthand lexicon such
that each consonant primitive of a word in the lexicon turns into a unique node and
each vowel primitive turns into a hidden node, where N is the number of primitives
of the word.
3. The outline model is then updated with a number of training samples, resulting in
additional leaf nodes and virtual nodes.
71
Shorthand outline for the word bake =

Lexicon entry for the word bake
in terms of strokes
Lexicon entry for the word bake

in terms of type number
4 7 91
Training data 1 for the word bake

4 7
1 6
Training data 2 for the word bake

=
Steps
2 possibilities of
the 1st segment
92
4 5
Description
Outline model
R
Step 1:
Creation of a root node
Step 2:
Creation of unique node and hidden node
Bake
R
7
L
91
Step 3:
Update with training data 1

V
91
92
Step 4:
Update with training data 2
legend
R
Root node
Virtual node
Hidden node
Leaf node
Figure 4.7: Illustration of step by step creation of outline models
72
91
92
(Input: a Pitmans outline for the word bake)
WStart
S1, 0, 64, 4, 0.56
S1, 0,64, 1, 0.44
S2, 64, 137, 7, 0.88
S2, 64, 137, 6, 0.12
(1st line: word start)

(2nd line: segment number, start coordinate, end coordinate, primitive
type, probability)
(3rd line)
(4th line)
(5th line)
V1, 0, 64, 2, 1, 92
(6th line: vowel number, start coordinate, end coordinate, sequence,

position, type)
WEnd
(7th line: word end)
Figure 4.8: Sample training data for the word bake processed by the recognition
engine; the italic text on the right explains what each line of data represents
In addition, detailed explanation on training data, applied in the creation of an outline model
is given in Figure 4.8 which illustrates the training data 1 of the word bake, depicted in
Figure 4.7. The second and third line of data (Figure 4.8) indicates that there are two
possible pattern categories associating with the first segment of the word bake: type 4 and
type 1. Here, type 4 is equal to an existing pattern of a shorthand lexicon and type 1 is a new
pattern observed by the recognition engine. In order to update an existing outline model
with this new observation, an existing unique node (Figure 4.7, Step 3) is firstly transformed
into a virtual node and secondly attached with two leaf nodes, resulting in a virtual node
with two children. Similarly, according to the sixth line of data (Figure 4.8), a vowel
primitive (type 92) classified by the recognition engine is different from the one defined in a
lexicon (type 91), resulting in a hidden node with two children.
73
4.4.2 Relationships between Nodes

Relationships between nodes of a Bayesian Network are indicated by drawing arcs from
cause variables to their immediate effects [Hd99]. The arc signifies a cause-effect
relationship and encodes conditional probability distribution (CPD) indicating to which
extent one variable is likely to affect another. In addition, the level of dependency between
nodes has a significant affect on computational expense: the stronger the relationship
between nodes, the bigger the conditional probability table size grows and vice versa.
Before determining the dependency of nodes of an outline model, this research takes into
account the following two extreme situations:
If nodes are extremely independent of each other, they become d-separated given an
evidence node, thereby making a network model unable to cope with abnormal
circumstances. For example, variables A and B, which are usually dependent on
each other, may be disconnected due to an occurrence of rare evidence, E.
On the other hand, if nodes are precisely connected to each other with conditional
probability distributions for all possible cases, it becomes computationally inflexible
to obtain a reliable estimation.
Taking into account the drawbacks of the above two extreme situations, the outline models
for this research are designed with the following practical hypothesis:
1. each node Pi is independent of its non-descendants Pj
2. each node Pi is independent of its descendants Di given a parent of Di
3. leaf nodes {L1, L2,, Ln} are independent of each other unless they share the same
parent Xj
Alternatively, conditional dependency between variables of an outline model can be
presented using a Bay ball algorithm [Sr98] as illustrated (Figure 4.9).
74
Legend
Hidden node
Direction of the flow of a ball
Observed node
Indication of parent-child relationship
Figure 4.9 Illustration of conditional dependency of variables in an outline model

using the Bayes Ball algorithm [Sr98].
If there is no flow of a ball from A to B in a graph, A and B are conditionally
independent given a set of observed or hidden variables X and vice versa.
4.5 Inference
The inference process of a Bayesian Network involves updating the probability of nodes
given some evidence and priori probabilities [XL02]. It is called finding belief of a node x,
denoted as BEL(x). In our case, evidence of nodes is given by a lexicon, training data or
user input. A primary use of BEL(x) is to find the likelihood of outline models, with which
the N-best models for a given shorthand outline are selected.
Among a variety of beliefs updating algorithms that support the Bayesian Network, this
work directly applies the message passing algorithm developed by Pearl [Pj88] the belief
of every node in the network is taken as the product of and messages, where is a
message received from each of its parents (if any) and is a message received from each of
its children (if any). Alternatively, and of each node of an outline model is denoted as a
product of X(U), a message that node X receives from its parent U and Y j ( X ) , a message
that node X receives from its child Yj. Note that an outline model is a tree structure where
every node has one and only one parent (except the root node, which has no parent) and has
N number of children (Y1, Y2,.., YN).
75
4.5.1 Message Initialization

According to Pearls algorithm [Pj88], nodes of a Bayesian Network need to be assigned
with initial belief before propagating messages through the network. In general, assignment
of initial belief (prior probability) of a node varies widely from one application to another,
depending on statistical information on variables as well as previous experience of a
developer working on a similar problem.
In this work, message initialisation varies depending on the type of node. The initialisation
of and messages for different types of node of an outline model is presented as follows.
Root node: A root node is the topmost one in an outline model and does not have any parent;
therefore its message is set to 0.5 assuming that there is an equal chance of taking a TRUE
or FALSE value for this node. Its message is set to 1 assuming that there is a TRUE
relationship from its child nodes.
Unique node: A unique node does not have any descendants and is linked directly to a root
node. Its message is set to 1 assuming that there is a TRUE relationship from its parent
(root node) and its message is set to 1, stating that a primitive associating with this node
appears in both lexicon and training data.
Virtual node: A virtual node is a judgemental node holding a true relationship from its
parent
(i.e.,
1)
and
optional
=P(Child_Nodes|observation).
76
relationship
to
its
children
(i.e.,
Hidden node: Similar to a virtual node, a hidden node holds a true relationship from its
parent
(i.e.
1)
and
optional
relationship
to
its
children
(i.e.,
=P(Child_Nodes|observation).
Leaf node: A leaf node (not including a unique node) holds an optional relationship from
a virtual node or a hidden node and its message is set to P(Child_Nodes|observation). It
does not have any children and its message is set to a confidence score of the node
obtained from training data.
On the whole, our message initialisation strategy is similar to the one implemented by Xiao
and Leedham [XL02] for signature verification. Nonetheless, the estimated values are
different in this work in accordance with the characteristics of handwritten Pitmans
Shorthand.
4.5.2 Belief Updating

Belief of a node in an outline model is calculated by the formula presented by Xiao [XL02]
denoted as
BEL( x) ( x) ( x)
(4.1)
where is a normalization factor, (x) is a combined message received from all the children
of node X and (x) is a combined message received from all the parents of node X.
Depending on the type of node, (x) is calculated differently. If it is a Root Node, (x) can
be defined by the formula presented by Pearl [Pj88]:
( x) (Y ( x))
j
77
(4.2)
Where Y j (x) is a message that a node X receives from its child node Yj.
If a node is a unique node, (x) is defined as:
( x) 1
(4.3)
If it is a virtual node, (x) is defined as:
Y j ( x)
( x)
0.001
If a child Yj of the virtual node X is true

otherwise
(4.4)
If a node is a hidden node, (x) is defined as:
Y j ( x)
( x)
0.1
If a child Yj of the hidden node X is true
otherwise
(4.5)
In equations 4.4 and 4.5, the values 0.001 and 0.1 are predefined probabilities, used when
none of the child node of X is likely to be true. Selection of these confidence scores is based
on several experimental results, testing with different thresholds between 0 and 1.
If a node is a leaf node, but not a unique node, (x) is defined as:
(1.0,1.0)
( x)
(a, b)
If a value of node X is assigned by a lexicon

otherwise
(4.6)
78
where a and b are normalised recognition and training probabilities for a corresponding
node. The next section, Learning of Outline Models explains how a and b are calculated
using training data.
4.6 Learning of Outline Models

In general, learning in a Bayesian Network often refers to the learning of structure of a
model and its parameters, or learning either one of them [Mk01]. In this work, learning
refers to the learning of parameters of an outline model. Structure learning is not of concern
here since there is no direct interaction between the low level segmentation engine (of the
collaborator) and the network modelling engine (of this research) to enable a dynamic
change of a basic model layout.
Parameter learning of an outline model includes finding an optimal maximum likelihood of a

node based on a set of training parameters and assigning the likelihood value as belief of the
node.
There are various learning algorithms [Hd99], [Mk01] that support Bayesian
Networks and the selection of an appropriate one is based on two factors: structure of the
network (whether it is known or unknown) and evidence of nodes (whether they are fully or
partially observed). With full details of these two factors, an appropriate learning algorithm
for a particular Bayesian Network can be identified using Murphys decision table (Table 41), shown below.
Table 4-1 Murphys decision table [Mk01]
Structure
Known
Unknown
Full
Close form
Local search
Observability
Partial
EM
Structural EM
The table indicates which algorithm is likely to be the most effective under which
circumstances. For example, the Expectation Maximization (EM) algorithm is likely to be
79
the most suitable one for a Bayesian Network whose structure is known in advanced and
whose parameters are partially observed. With reference to this table, parameter learning of
an outline model is discussed in two parts: learning of consonant primitives and learning
of vowel primitives.
4.6.1 Learning of Consonant Primitives

The maximum likelihood estimates (MLE) learning method is used to find estimates of
consonant primitives of an outline model since the structure of the model is known and its
consonant parameters are fully observed in training data. The structure (states of an outline
model) is known in advanced because an outline model is initially constructed with a lexicon
entry with clear information on the number of segments (states) of each shorthand outline.
In addition, the consonant primitives are always observed in training data because
stenographers never omit consonant primitives of a vocalised outline. Murphy denotes MLE
as a closed form in his table (Table 4-1).
The basic idea behind the MLE method is to maximise the likelihood of training data D,
which contains M cases (believed to be independent) [Hd99].
Assuming that
Di={N(i,1),N(i,2),..N(i,j)} is the ith sample of training data D, j is the number of consonant

primitives of a word and N(i,1)={N(i,1,1),N(i,1,2),..,N(i,1,k)} represents a set of possible consonant
primitives of the node N(i,1), a pair of MLE values (a,b) for each consonant node N(i,j,k) can be
calculated using formula 4.7 and formula 4.8 when a training sample Di is fed into an outline
model. In general, a represents the likelihood of a consonant node N(i,j) to be recognised as a
primitive type N(i,j,k) and b is the likelihood of consonant primitive N(i,j,k) to be associated
with node N(i,j). The calculation of a is denoted as:
80
a=
1
M
( | P, D )
i 1
(4.7)
where is the recognition/classification accuracy of a consonant node given training data

Di and a parent node P.
The calculation of b is denoted as:
b=
1
M
( | P, Di)
(4.8)
i 1
where is a conditional variable which is 1 if a consonant primitive of training data Di has

a true relationship with its corresponding parent node P and 0 otherwise.
In addition, the value b is saved in a history file to be used to create new outline models that
do not have training data; the history file creation is presented below.
Initialization
D: a collection of training data
Di: ith sample of the training data, D
L: a primitive lexicon
Li: an element of L which holds an equal word value as Di
N: number of consonant primitives contained in Di
j: an index identifier
Ni,j: a pattern representing the jth consonant primitive of Di
Li,j: a pattern representing the jth consonant primitive of Li
b : probability of Ni,j to have a relationship with Li,j
81
History updating
If (Ni,j != Li,j & b >0)
Save b
The above two lines of pseudo code indicate that if a pattern Ni,j ,observed in training data is
not the same as the one defined in a lexicon and if there is evidence confirming a
relationship between Ni,j and Li,j (i.e., the value of b is greater than zero), the system creates a
history file and stores the value b as a probability of Li,j to be recognised as Ni,j . Then, later
in the training process, the history file is retrieved to construct new outline models for words
which do not have any training samples.
In brief, the use of the history file has a significant benefit to the training of shorthand
outline models, particularly for words which do not have sufficient training data. This is
mainly because Pitmans Shorthand is no longer a demanding skill nowadays and the
collection of thousands or millions of samples of training data is infeasible in terms of
accessibility to experienced writers.
4.6.2 Learning of Vowel Primitives

Finally, the hardest part of the learning process will be addressed, where there are
hidden/missing vowel variables. The problem is that vowel primitives are rarely written by
Pitmans Shorthand writers and omitted positions vary widely depending on individual
preference and context. When having such non-linear distribution of hidden variables, the
Expectation Maximization (EM) algorithm is shown to be effective to find a (locally)
optimal maximum likelihood of a node [Mk01]. Thus, the EM algorithm is applied for
learning of vowel primitives here. The basic idea behind the EM algorithm in our learning
process is that if we know the vowel values from a lexicon, the probability distribution of
82
hidden vowels in an outline model can be estimated after learning in the M step. Then in the
E step, in which E>M, these estimated values can be treated as though they are observed.
On the whole, the EM learning of a vowel (hidden) node is denoted as:
PEM (V=TRUE |O=TRUE) =
E (V TRUE )
E (O TRUE )
(4.9)
where V is a vowel node, O is an outline model and E() is the number of times a
corresponding parameter is expected to occur. According to Murphy [Mk01], E(...) is
computed as follows
E (e) I (e | Dm) P(e | Dm)

m
(4.10)
where D is a set of training data, I(e|Dm) is an indicator function which is 1 if an event e

occurs in training case m, and 0 otherwise.
4.7 Model Selection

Model selection in a Bayesian Network is concerned with measuring the degree to which a
network structure (equivalence class) fits the prior knowledge and data. [Hd99] This work
determines the fitness of a particular outline model to a given input outline via a relative
posterior probability. Assume Oi is the ith outline model and P1, P2,.., Pn are input primitives
classified by our collaborators recognition engine, a posterior probability (fitness) of the
outline model Oi given a set of input primitives can be defined as:
P(Oi| P1, P2,.., Pn) = P(Oi)P(P1, P2,.., Pn |Oi)
(4.11)
According to equation 4.11, posterior probability of an outline model is calculated based

upon a prior probability of an outline model in combination with the likelihood of input
83
primitives which belong to the given outline model. Alternatively, equation 4.11 can be
denoted in terms of belief of a node as follow:
P(Oi | P1, P2 ,..., Pn ) BEL( x | Oi ) BEL( N j | Pj )
(4.12)
where j = (1,..,n), Oi is the ith outline model, x is a root node of Oi, BEL() is the belief of a
node, Pj is an input primitive and Nj is a child node of the root node x.
To find the N-best outline models for a given input, models with top N posterior
probabilities are chosen. However using the posterior probabilities alone to find the best
models is not computationally efficient. The problem is that the number of outline models
increases along with the number of words contained in a lexicon and calculating the
posterior probability of thousands of outline models in real time word transcription is
infeasible, mainly in terms of operational time. Therefore, three unigram-based rejection
strategies are applied in our system in order to reduce model selection time.
The first rejection strategy number of consonant primitives (NCP) of an input outline is
used as a first level filter to reject outline models that are not relevant to a given input. The
approach is denoted as NCP filter and the algorithm is denoted as:
O NCP ( k ) O NCP ( i ) \ O NCP ( i k )

where O
NCP(i)
(4.13)
is a collection of outline models relating to any NCP, and k is the NCP of an
input outline. Example 1 (below) clarifies the concept behind NCP filter.
Example 1
Assuming that k= 2, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained in the
system and NCP of O1, O2, O3, O4, O5, O6 are 2, 2, 6, 3, 5 and 2 respectively, O NCP(2) is
calculated using the formula 4.13 as follow:
O NCP(k) = O NCP(i) \ O NCP(i k)
84
O NCP(2) = { O1, O2, O3, O4, O5, O6} \ { O3, O4, O5}
= { O1, O2, O6}
The second rejection strategy outline models are discriminated in favour of a pair of
primitives, appearing at the first and last (consonant) segments of an outline. This approach
is denoted as F&L filter and the algorithm is denoted as:
O F(k),L(j) = OF(i),L(i) \ OF(i k),L(i j)
(4.14)
where OFi,Li is a set of outline models whose first and last segments relate to any type of
primitive and k and j are types of the first and last segments of an input outline respectively.
Example 2 below demonstrates the concept behind F&L filter.
Example 2
Assuming that k= 5, j = 6, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained
in the system and (F(i),L(i)) of O1, O2, O3, O4, O5, O6 are (3,2), (5,5), (5,6), (1,2), (5,6) and
(5,2) respectively, O F(5),L(6) is calculated as:
O F(k),L(j) = OF(i),L(i) \ OF(i k),L(i j)
O F(5),L(6) = { O1, O2, O3, O4, O5, O6} \ { O1, O2, O4, O6}
= { O3, O5}
The idea behind formula 4.14 is based on an interesting phenomenon i.e., wrongly spelled
English words are sometimes comprehensible to a reader as long as the first and the last
letters of the words are clearly indicated. For example, you may understand the following
sentence even though it contains a number of spelling errors: Wornlgy seplled Egnlish
words are sitll leiglbe to a reader as lnog as the frist and lsat ltteers of the words are crroect.
In other words, first and last letters of a word provide heuristics for word identification in
English. Similar to this phenomenon, the outline model selection in our work can be based
on evidence of the first and last primitives of an outline provided that the first and last
segments of an outline are always written in Pitmans Shorthand. According to our study
85
done on 10 samples of shorthand notes, handwritten by professional shorthand writers, it is

observed that the first and last primitives of a vocalized outline are always written in
Pitmans Shorthand. Therefore, the second rejection strategy (formula 4.14) is based on the
first and last primitives of a shorthand outline.
The third rejection strategy outline models are selected depending on the existence of
circular primitives in an input outline. The approach is referred to C filter and the
algorithm is denoted as:
O C(k) = OC(i) \ OC(i k)
(4.15)
where OC(i) is a set of outline models, k is a conditional variable which is TRUE if an input
outline contains circular primitives and FALSE otherwise. Example 3 below demonstrates
the concept behind C filter.
Example 3
Assuming that k= TRUE, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained
in the system and C(i) of O1, O2, O3, O4, O5, O6 are TRUE, FALSE, TRUE, FALSE, FALSE,
TURE respectively, O C(TRUE) is calculated as:
O C(k)
= OC(i) \ OC(i k)
O C(TRUE) = { O1, O2, O3, O4, O5, O6} \ { O2, O4, O5}
= {O1, O3, O6}
Formula 4.15 checks the existence of circular primitives in outline models and splits the
whole outline models into two main groups: those containing circular primitives and those
not containing circular primitives. In general, this rejection strategy performs well, with
reliable accuracy of the collaborators recognition engine at detecting circular primitives of
an outline (if there are any).
Overall, model selection strategies carried out in this work are illustrated from left to right
rejection order (Figure 4.10). After the C filter, posterior probabilities of the remaining
86
outline models are calculated using formula 4.12, with which the N-best candidate outline
models for a given input are chosen.
NCP
filter
F&L
filter
C filter
Posterior
probability
filter
Legend
A collection of outline model (its
length represents number of
outline models)
Rejection strategy
Figure 4.10 Illustration of outline model selection strategies
4.8 Experimental Result

A primary goal of the experiments carried out in this chapter is to evaluate the transcription
accuracy of the Bayesian Network based word interpreter under the following criteria:
In the presence of shape variation and position confusion of pen strokes due to
natural handwriting.
In the presence of segmentation and classification errors due to hardware constraints

and limitations of the recognition engine.
In the presence of missing vowel primitives that are randomly omitted among
outlines by experienced Pitmans Shorthand writers.
In the presence of incorrect shorthand outlines, written by inexperienced shorthand

writers
87
4.8.1 Data Set

Three types of data sets are evaluated in experiments of this chapter and they can be outlined
as follow:
Single-consonant data set: This data set contains outlines with skeletons having
one and only one consonant stroke, for instance, shorthand outlines for the
words bay (
), pea (
) and pat(
). In general, various groups of
homophones (i.e., outlines that look similar but have different representations)
contain in this data set as outlines are similar with minor difference of line
thicknesses, vowel positions and inclinations.
Stroke-combination data set: This data set contains outlines with skeletons
having two or more consonant strokes, written according to the normal rules of
Pitmans Shorthand i.e., phonemes of the words are directly converted into
Pitmans primitives without applying any special rules of Pitmans Shorthand,
invented for speed enhancement purposes. The data set covers the whole range
of possible stroke combinations, and sample outlines of the data set are
illustrated (Figure 4.11).
Bar
making
rare
escape
machine
Figure 4.11: Samples of the stroke combination data set
Special-rule data set: This data set contains words written according to the
special rules of Pitmans Shorthand. For instance, instead of writing the word
after by comprising primitives of the phonemes /F/, /T/, /R/ and vowels as in
Figure 4.12(a), Pitman uses a doubled length /F/ curve to express the word
88
after as in (Figure 4.12 (b)). In general, this data set contains inconsistent
outlines, written without following corresponding special rules of Pitmans
Shorthand by (inexperienced) shorthand writers who do not digest the complete
rules of Pitmans Shorthand.
Two different outlines for the word after
Vowel
Vowel
Consonant /F/
Consonant /T/
Doubled length curve /F/
Consonant /R/
(a) Incorrect outline
(b) Correct outline
Figure 4.12: Two different shorthand outlines for the word after; (a) the word after
is written according to the direct conversion of phonemes into primitives (b) the word
after is written according to the double-length rule of Pitmans Shorthand
Table 4-2: Details of the data collection for the three data sets
Data set name
Numbers
of words
Writer ID
Number of
times
Single-consonant data set
135
Writer A
135
Writer B
Stroke-combination data set
192
Writer A
192
Writer B
192
Writer C
Special-rule data set
87
Writer A
87
Writer D
87
Writer E
In total, 1416 outlines were collected for the three data sets where
89
Table 4-2 provides details of the collected data. The data is collected using a tablet PC
with an electromagnetic digitizer of resolution 1000 ppi and five writers were involved in the
data collection. The three data sets cover the whole range of shorthand primitives and the
word list contains the 5000 most frequently used English words of the general domain. 45%
of the data is included in a training data set and samples of the collected data are illustrated
(Figure 4.13).
Pitmans
Shorthand outline
for the word bay
Figure 4.13: Screen shot of outlines written by Writer A
4.8.2 Evaluation of the Recognition Engine

Before the evaluation of the word transcription performance of the Bayesian Network based
interpreter, this section firstly evaluates the accuracy of the recognition engine in order to
relate it to the overall word transcription performance. The study is categorized into three
groups namely: (1) analysis of the vocalized outline identification, (2) analysis of the outline
segmentation, and (3) analysis of the primitive classification.
The studies are carried out
using the whole data sets and experimental results are discussed as follow.
Firstly, the accuracy of the vocalised outline identification is discussed. To clarify what is
meant by the vocalised outline identification it is the process of defining whether a written
outline is a short-form or a phonetically written outline. As shown in Figure 4.14, accuracies
of the vocalised outline identification vary from writer to writer, or even from time to time
90
for the same writer. For instance, consider the accuracies of the vocalised outline
identification for writer A for the single-consonant data set where there is a difference of
approximately 62% between the accuracy of the first time and the second time writings. The
study finds that a major reason for having such a difference of accuracy is that writer A
omitted most of the vowels while writing the single-consonant data set the first time whereas
the writer indicated at least one vowel for most of the words at the second time of writing.
Therefore, it is summarised that the indication of at least one vowel for an outline is critical
for obtaining high vocalised outline identification accuracy.
Any words that are not
recognized as vocalised outlines are remarked as short-forms by the recognition engine. For
example, 73% of the data written by writer A for the single consonant data set are remarked
as short-forms by the recognition engine although the outlines are, in fact, vocalized
outlines. On the whole, average vocalised outline identification accuracy for the whole data
sets is 69%.
Performance of the vocalised outline

identification by the recognition engine
Legend
120%
Accuracy
100%
80%
60%
40%
20%
0%
A
writer
Figure 4.14: Evaluation of the vocalised outline identification of the recognition

engine
91
Secondly, segmentation accuracy of the recognition engine is discussed. In general, the

segmentation accuracy varies depending on different data sets. As shown in Figure 4.15, the
single-consonant data set has about 72% of correctly segmented outlines, whereas the
stroke-combination data set has only about 21% of correctly segmented outlines on average.
The results are reasonable since the single consonant data set contains outlines with only one
consonant stroke and hence the higher segmentation accuracy.
For the analysis of
segmentation accuracy of different writers of same data set, consider the results of the
special-rule data set where the segmentation accuracy of outlines written by writer E is
higher than that of writer A. Statistics show that writer A does not have previous experience
of using a pen based text entry system whereas writer E has previous experience of applying
pen based text entry systems of handheld devices. In addition, statistics show that writer A
prefers writing small scripts on a tablet in a similar manner to writing on a conventional
paper whereas writer E produces larger scripts with flexible pen movements on the digitizer.
Therefore, it is observed that writers previous experience of using pen based text entry
systems has an influence over the segmentation performance of the recognition engine. The
average segmentation accuracy of the overall data sets is 36%. The segmentation accuracy
presented in Figure 4.15 is based on the number of correctly detected vocalized outlines and
is formulated as follow:
(t y )
100
t
(4.16)
where s is segmentation accuracy, t is the total number of written words and y is the total
number of outlines that are recognised as short-forms instead of vocalised outlines.
92
Segmentation accuracy of the

recognition engine
80%
70%
60%
50%
40%
30%
20%
10%
0%
A
writer
Legend
Figure 4.15: Evaluation of the segmentation accuracy of the recognition engine
Thirdly, classification accuracy of the recognition engine is discussed. As shown in Figure

4.16, the average classification accuracy of the stroke-combination data set is lower than that
of the single consonant data set or special rule data set. Statistics show that the classification
accuracy is influenced by several factors including tidiness of the handwriting, limitations of
hardware, or limitations of applied algorithms of the recognition engine. On average, the
classification accuracy of the whole data sets is 77% where the classification accuracy is
based on the total number of outlines that are recognised as vocalised outlines as well as
being correctly segmented. The formula is defined as:
tx
100
t
(4.17)
93
where c is classification accuracy, t is total number of written words, x is total number of

outlines that are recognised as vocalized outlines as well as being correctly segmented.
Classification accuracy of the

recognition engine
120%
100%
80%
60%
40%
20%
0%
A
writer
Legend
Figure 4.16: Evaluation of the classification accuracy of the recognition engine
4.8.3 Evaluation of the Word Transcription Accuracy

Experiments carried out in this section are categorized into three groups namely:
Analysis of word transcription accuracy using single-consonant data set.
Analysis of word transcription accuracy using stroke-combination data set.
Analysis of word transcription accuracy using special-rule data set.
Each group comprises of four graphs discussing experimental results from different aspects,
outlined as follow:
Recognition accuracy vs. transcription accuracy: the graph illustrates the influence
of the performance of the recognition engine over the transcription engine. It applies
94
two types of data in order to discuss the theme: firstly, data with any kind of errors
of the recognition engine and secondly, (filtered) data with no vocalised outline
identification and segmentation errors of the recognition engine.
Accuracy of the end result: the graph illustrates the accuracy of a list of results of an
input outline according to three measures: firstly, the accuracy of a correct word
appearing in the result list, secondly, the accuracy of the correct word appearing in
the top five group of the result list and thirdly, the accuracy of the correct word
appearing at the topmost position of the result list. Note that accuracies illustrated in
this graph are based on data with no vocalized outline and segmentation errors as the
correction of these errors is not included in the scope of this research.
Correction accuracy vs. classification/vowel errors: the graph illustrates the

correction accuracy of the Bayesian Network based word interpreter in relation to
the classification and vowel omission errors. Similarly, results reported in this graph
are based on data with no vocalised outline identification and segmentation errors.
Factors influencing the accuracy of a result list: the graph illustrates the average
distribution of factors influencing a correct word not to appear at the topmost
position of the result list. Similarly, results reported in this graph are based on data
with no vocalised outline identification and segmentation errors.
4.8.4 Analysis of Word Transcription Accuracy Using the Single

Consonant Data Set
4.8.4.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As shown (Figure 4.17), accuracy of the recognition engine, specifically, accuracy of the
vocalized outline identification and segmentation has a huge impact on accuracy of the
transcription engine. The study finds that approximately 73% of outlines written by writer A
the first time are not recognised as vocalised outlines and this directly affects the
95
transcription accuracy (less than 20%). It has been discussed that inadequacy of vocalised
outline identification of the recognition engine is mainly affected by the omission of vowels
among outlines and therefore, the indication of at least one vowel of a vocalised outline is
also encouraged in this research in order to achieve high transcription accuracy.
Words appeared in the reslut

list
Relationship between recognition accuracy and

transcription accuracy of the single-consonant
data set
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
In the presence of
any kind of
recognition errors
In the presence of no
vocalised outline
identification and
segmentation errors
Writer
Figure 4.17: Illustration of a relationship between recognition accuracy and

transcription accuracy of the single consonant data set
4.8.4.2 Analysis of the Accuracy of a Result List

As shown (Figure 4.19), approximately 93% of input data are interpreted with a result list
containing a correct word; 72% of the data are interpreted with a correct word appearing in
the top five group of a result list; 40% of the data are interpreted with a correct word
appearing at the topmost position of a result list on average.
An interesting phenomenon here is that although writer A has an immediate level of skill in
Pitmans Shorthand and writer B has an inexperienced level of skill in Pitmans Shorthand,
outlines written by writer B are transcribed more accurately than that of writer A. The study
finds that this is because the handwriting of writer B is more legible with more informative
96
pen strokes than that of writer A as compared (Figure 4.18). In relation to this finding, it is
remarked that the writing of legible scripts is encouraged in this current research to be able
to obtain high recognition and transcription accuracy.
Pitmans Shorthand outlines written by writer B
night
nod
note
nut
Pitmans Shorthand outlines written by writer A
night
nod
note
nut
Figure 4.18: Comparison of the handwriting of two writers
Evaluation of the word transcription accuracy of

the single consonant data set
100%
A correct word appearing
in a result list
80%
60%

in the top five group of a
result list
40%
20%
0%
A

at the topmost position
of a result list
Writer
Figure 4.19: Illustration of the word transcription accuracy of the single consonant
data set
97
4.8.4.3 Analysis of the Correction Accuracy vs. the Classification/Vowel

Errors
The graph (Figure 4.20) illustrates classification and vowel errors in comparison with the
correction accuracy where the correction accuracy indicates how much of the classification
and vowel errors are covered by the transcription engine respectively. Here, the
classification error, the vowel error and the correction accuracy are formulated as follow:
e
c 100
t
(4.18)
where c is the classification error, e is the number of words having a classification error and t
is the total number of input words.
f
100
t
(4.19)
where v is the vowel error, f is the number of words having omitted vowels and t is the total
number of input words.
b
100
t
(4.20)
where a is the correction accuracy, b is the total number of words interpreted correctly by
the transcription engine in the presence of the classification /vowel errors and t is the total
number of words having classification errors or a vowel errors respectively.
On average, the correction rate for the classification error is 76% and the correction rate for
the vowel error is 55%. This indicates that the Bayesian Network based outline models,
implemented in the transcription engine are capable of coping with the classification and
vowel errors.
98
Correction accuracy in comparison with the

classification or vowel errors of the Single
consonant data set
90%
Classification errors
80%
70%
60%
40%
Successful transcription
in the presence of
classification errors
30%
Vowel errors
50%
20%
10%
0%
A
in the presence of vowel
errors
Wrtier
Figure 4.20: Illustration of the correction accuracy in comparison with the

classification or vowel errors of the single consonant data set
4.8.4.4 Analysis of Factors Influencing the Accuracy of a Result List

As illustrated (Figure 4.21), the major factor (49%) that influences a correct word of an input
outline not to appear at the topmost position of a result list is because of the similarity of an
input outline to other outlines. This case is generally true for single-consonant data set as
outlines included in this data set are very similar, with minor differences of line thicknesses
and vowel positions. Other factors influencing a correct word not to appear at the topmost
position of a result list are classification errors (31%), vowel error (3%) and a combination
of similarity of an input outline to other outlines, classification error and vowel error (17%).
99
Average distribtuion of factors influencing the

accuracy of a result list (single-consonant data
set)
17%
3%
49%
31%
due to similarity to other outlines
due to classification errors
due to vowel errors
due to combination of similarity to other outlines, classification errors and
vowel errors
Figure 4.21: Illustration of an average distribution of factors influencing the accuracy

of a result list (single consonant data set)
4.8.5 Analysis
of
Word
Transcription
Accuracy
Using
Stroke-
combination Data Set

This section analyses word transcription accuracy for outlines containing two, or more than
two, strokes. The primary purpose of the study is to evaluate the transcription accuracy in
the presence of stroke combinations. Similar to the single-consonant data set, four types of
graphs are discussed as follow.

As shown (Figure 4.22), average transcription accuracy of filtered 2 stroke-combination data
set is 97% where the value is similar to an accuracy achieved by the single consonant data
set (Figure 4.17). However, in terms of unfiltered data 3 the overall transcription accuracy
of the stroke-combination data set decreases by 27%, compared to that of the singleconsonant data set. The study finds that this is mainly due to the increase of segmentation
2
3
Filtered data does not contain any vocalized outline identification or segmentation errors.
Unfiltered data contains any kind of recognition errors.
100
errors and in relation to this finding, it is evaluated that reliable outline segmentation is
important for the overall transcription accuracy in the case of having words with two or
more consonant strokes.

transcription accuracy of the stroke-combination
data set
Words appeared in the reslut

list
120%
100%
In the presence of
any kind of
recognition errors
80%
60%
vocalised outline
identification and
segmentation errors
40%
20%
0%
C
Writer
Figure 4.22: Illustration of the relationship between recognition accuracy and

transcription accuracy of the stroke-combination data set
4.8.5.2 Analysis of the Accuracy of a Result List

According to experimental results (Figure 4.23), approximately 97% of input data are
transcribed with a result list containing a correct word; 97% of input data are transcribed
with a correct word appearing in the top five of a result list; 55% of input data are
transcribed with a correct word appearing at the topmost position of a result list. On the
whole, the transcription accuracy of the stroke-combination data set increases by 25%,
compared to the single consonant data set.
101
An interesting phenomenon here is that although writer Cs data is not included in the
training data set, 96% of the writers outlines are transcribed with a correct word appearing
at the top-five of a result list. This indicates that the history based learning algorithm
implemented in Bayesian Network models can effectively cope with unseen patterns that are
not included in a training data set.

the stroke-combination data set
120%
in a result list
Accuracy
100%
80%
60%

in the top five of a result
list
40%
20%
0%
C

at the top most position
of a result list
Writer
Figure 4.23: Illustration of the word transcription accuracy of the stroke-combination

data set

Errors
correction accuracy for stroke-combination data set. The classification error, the vowel error
and the correction accuracy are calculated by applying formula 4.18, 4.19 and 4.20
respectively. As shown (Figure 4.24), none of the 3% of the classification errors of writer C
are corrected because the errors include patterns deviating completely from original patterns.
For instance, if the orientation of an input pattern is completely different from its original
form, the transcription engine does not cover this kind of error.
102
Note that writers rarely omit vowels in this data set, compared to the single-consonant data
set. Writers of this data set are encouraged to indicate at least one vowel of an outline,
mainly in order to avoid the rejection of substantial data at the recognition stage by a
vocalised outline detector.
Correction accuracy in comparison with the

classification or vowel errrors of the strokecombination data set
120%
100%
80%
60%
in the presence of
40%
Vowel errors
20%
0%
C
errors
Wrtier
Figure 4.24: Illustration of the correction accuracy in comparison with the

classification/vowel errors of the stroke combination data set

The graph (Figure 4.25) illustrates factors influencing a correct word of an input outline not
to appear at the topmost position of a result list, where 11% is due to the similarity of an
input outline to other outlines and the rest 89% is due to the combination of classification
errors, vowel errors and the similarity of input outlines to other outlines. An interesting
phenomenon here is that ambiguity due to the similarity of an input outline to other outlines
decreases by 38% compared to the single consonant data set. This indicates that outlines
become less ambiguous when they contain two or more consonant strokes.
103

accuracy of a result list (stroke-combination data
set)
due to similarity to other

outlines
11%
due to combination of
similarity to other
outlines, classification
errors and vowel errors
89%

of a result list (stroke-combination data set)
4.8.6 Analysis of Word Transcription Accuracy for the Special-rule Data

Set
This section analyses word transcription accuracy for outlines, written according to the
special rules of Pitmans Shorthand. The primary purpose of the study is to evaluate the
transcription performance in the presence of the application of the special rules of Pitmans
Shorthand, where consistency between patterns written by one writer to another becomes a
critical concern. Similar to the single-consonant and stroke-combination data sets, four
types of graphs are discussed as follow.

As illustrated (Figure 4.26), the transcription accuracy of filtered data 4achieves up to 100%,
however, the transcription accuracy of unfiltered data5 gets to as low as 1% for writer D.
Filtered data does not contain any vocalized outline identification error or segmentation error.
Unfiltered data contains any kind of recognition errors.
104
The study finds that this is mainly due to the preference of writer D writing outlines without
vowel components as well as due to the writing of incorrect outlines without fully following
the special rules of Pitmans Shorthand. In relation to this finding, it is remarked that the
writing of consistent outlines in accordance with the special rules of Pitmans Shorthand is
encouraged in this research in order to obtain high transcription accuracy.
Words appeared in the reslut list

transcription accuracy of the special-rule data set
120%
100%
In the presence of
any kind of
recognition errors
80%
60%
vocalised outline
identifcation and
segmentation errors
40%
20%
0%
A
Writer
Figure 4.26: Relationship between recognition accuracy and transcription accuracy of

the special-rule data set
4.8.6.2 Analysis of the Accuracy of the Result List

According to experimental results (Figure 4.27), approximately 85% of input data are
transcribed with a result list containing a correct word; 85% of input data are transcribed
with a correct word appearing at the top-five of a result list; 80% of input data are
transcribed with a correct word appearing at the topmost position of a result list. An
interesting phenomenon here is that the special-rule data set achieves the highest average
transcription accuracy (80%) in terms of a correct word appearing at the topmost position of
a result list.
105

the special-rule data set
120%
100%

in a result list
80%
60%

in the top five of a result
list
40%
20%
0%
A

at the topmost position
of a result list
Writer
Figure 4.27: Evaluation of the word transcription accuracy of the special-rule data set

Errors
correction accuracy for the special-rule data set. The classification errors, the vowel errors
and the correction accuracy are calculated by applying formula 4.18, 4.19 and 4.20
respectively. Experimental results show that there is no classification error or vowel error
for outlines written by writer A and writer D (i.e., data of the first time writing) and this
directly affects the overall transcription accuracy which achieves up to 100% for some cases
as illustrated(graph above). On average, 100% of the vowel errors and 67% of classification
errors are covered by the transcription engine.
106
Correction accuracy in comparison with

classification or vowel errors of the special-rule
data set
120%
100%
80%
60%
in the presence of
40%
Vowel errors
20%
0%
A
errors
Wrtier
Figure 4.28: Illustration of the correction accuracy in comparison with classification

or vowel errors of the special-rule data set

The graph (Figure 4.29) illustrates factors influencing a correct word of an input outline not
to appear at the topmost position of a result list, where 23% is due to the similarity of an
input outline to other outlines, 8% is due to classification errors and the rest 69% is due to
the combination of the classification errors, vowel errors and the similarity of input outlines
to other outlines.
107

accuracy of a result list (special-rule data set)
due to similarity to other
outlines
23%
due to classification
errors
8%
69%
due to combination of
similarity to other
outlines, classification
errors and vowel errors

of a result list (special-rule data set)
4.9 Summary and Discussion

In summary, this chapter proposes a novel primitive based text translation approach to
interpret handwritten Pitmans outlines into English words, using Bayesian Network based
outline models. Ambiguities of pen strokes and interactions between strokes of written
shorthand outlines are modelled in Bayesian Network outline models.
The word
interpretation comprises the network modelling, the belief propagation, the Bayesian
learning and the model selection. Experimental results of the new framework are presented,
following the full description of Bayesian Network based transcription algorithms.
An evaluation of the recognition engine is presented in combination with the experimental

results of the Bayesian Network based word interpreter, based on three data sets, namely:
the single-consonant data set, the stroke-combination data set and the special rule data set.
Overall, a primary issue discussed in relation to the performance of the recognition engine is
the indication of at least one vowel of an outline in order to avoid an incidence where
outlines are mistaken as short-forms instead of vocalised outlines. In terms of the feasibility
108
of enforcing the restriction to shorthand writers, approximately 80% of inexperienced

Pitmans Shorthand writers find the restriction is easily adaptable; however, approximately
60% of professional Pitmans Shorthand writers find the restriction is impractical, especially
in the case of speed writing. Therefore, a further improvement on algorithms of the vocalised
outline identifier is recommended where the indication of vowels of an outline will no longer
be mandatory.
From the aspect of the performance of the Bayesian Network based word interpreter, the
average transcription accuracies for the three (filtered6) data sets are 91% for a correct word
appearing in a result list, 85% for a correct word appearing in the top five of a result list and
58% for a correct word appearing at the topmost position of a result list. Overall, the
accuracy of 91% is satisfactory, however the accuracy of a correct word appearing at the
topmost position of a result list (58%) indicates that the homophones of the result list need to
be resolved with the application of contextual information. A resolution to this problem is
discussed in detail in chapter 6, which is about the phrase level transcription.
From the aspect of a relationship between the features of a writer and the transcription
accuracy, the study finds that gender and age of writers do not have significant influence on
the performance of the recognition and transcription systems. However, the study finds that
a writers skill in Pitmans Shorthand and a writers previous experience in using pen based
text entry systems are related to the overall transcription accuracy. Another consideration is
that writers of the current experiments are right-handed and a further analysis of the
transcription performance with left-handed writers is recommended. In addition, writers of
the current experiment use British English and further analysis of the transcription
performance with the writers who use American English is challenging. Since Pitmans
Shorthand is written phonetically, outlines written according to British English and
American English are different, especially in vowel notations.
6
Filtered data does not contain any vocalized outline identification error or segmentation error.
109
As a final discussion, a comparison of performance of the conventional phonetic based

transcription and performance of the novel primitive based transcription is given. Table
(Table 4-3) presents accuracies of the two approaches where results are based on the data set
used in experiments of the phonetic based transcription approach, presented in Chapter 3.
Table 4-3: Transcription accuracies of the phonetic based transcription and the
primitive based transcription approaches
Average Transcription
Primitive based
Phonetic based
Accuracy
transcription
transcription
Overall
93%
84%
100%
0%
0%
0%
100%
100%
omission or confusion
In the presence of inconsistent
writing
In the presence of
classification error
As shown (Table 4-3), an average transcription accuracy of the primitive based transcription
approach increases by 9% compared to that of the phonetic based transcription approach.
The study finds that this is mainly due to the increased correction accuracy of vowel errors
in the novel framework. Overall, performance of the novel proposed framework is promising
but must be improved upon in the areas discussed for it to become a commercially viable
system.
110
Generation of a machine-readable Pitmans Shorthand

lexicon
111
The previous chapter presented a novel solution as a means of interpreting handwritten
Pitmans Shorthand outlines using Bayesian Network algorithms, in which geometrical
features of the outlines are directly translated into English word(s). On the whole, the
solution was found to be efficient, mainly with the use of a machine-readable Pitmans
Shorthand dictionary that contains a set of shorthand outlines mapping to corresponding
English word(s). Based on a thorough literature review carried out in this research, no other
machine-readable (electronic) Pitmans Shorthand lexicon has ever been designed, making
this the first of its kind which has been developed specifically for this research. This may be
a major reason why none of the previous work (of the same framework) attempted to analyse
performance of the direct transcription of geometric primitives into English words.
Specifically, this chapter presents full details of the creation of the electronic Pitmans
Shorthand lexicon, developed for this research, under the following four sections:
1. Overview: overview of the rule based creation of the electronic Pitmans Shorthand
lexicon and discussion on general advantages and disadvantages of rule based
approaches.
2. Structure: description of the lexicon structure in terms of feature set, key and lexicon
as a whole.
3. Rules: description of rules, applied in our system to conform to the writing rules of
Pitmans Shorthand.
4. Experimental results: evaluation of the electronic Pitmans Shorthand lexicon,
mainly in terms of accuracy and homophone distribution.
112
5.1 Overview
Firstly, in order to clarify precisely what is meant by a Pitmans Shorthand lexicon, sample
entries of a conventional Pitmans Shorthand dictionary (available in book format) and
sample entries of an electronic Pitmans Shorthand dictionary are illustrated (Figure 5.1).
Key
Shorthand
Key
Word
airs
airs , erase
bake
bake
ball
ball
bays
bays, pays
airs, erase
erase
oak, go
oak
oak, go
go
pays, bays
pays
(a)
(b)
Figure 5.1: (a) sample entries of a conventional Pitmans Shorthand dictionary

available in book format (b) sample entries of an electronic Pitmans Shorthand
lexicon
A primary objective of the research presented in this chapter is to create an electronic

Pitmans Shorthand lexicon Figure 5.1(b), based on the concept of Figure 5.1 (a). A major
difference between them (Figure 5.1 (a) and (b)) is the relationship between keys and
elements each key (each word) is related to one and only one shorthand outline in the
conventional lexicon whereas each key (each shorthand outline) is related to one or more
than one word in the electronic lexicon. The latter layout is preferred in this work because
113
it directly relates to ambiguities of handwritten Pitmans Shorthand e.g., line thickness

ambiguities.
5.1.1 Rule-based Creation of the Electronic Pitmans Shorthand Lexicon

The creation of the electronic Pitmans Shorthand lexicon is based on the following four
basic steps, which are taken by stenographers while learning Pitmans Shorthand:
1. Gain prior knowledge of English pronunciation.
2. Memorise notations of Pitmans Shorthand.
3. Memorise writing rules of Pitmans Shorthand.
4. Write Pitmans Shorthand outlines, using the above learned knowledge.
In order to instruct a machine to generate the electronic Pitmans Shorthand lexicon

automatically, the above four steps are restructured as below:
1. Set up a phonetic lexicon with a series of phonemes as a key for word identification.
2. Define notations of Pitmans Shorthand in terms of low level geometric features, for
is defined as a combination of an anti-clockwise hook
instance, a consonant W
and an upward stroke
3. Define conversion rules that conform to the writing rules of Pitmans Shorthand.
4. Generate a series of geometric features for a given word using the phonetic lexicon
and conversion rules.
The Pitmans Shorthand lexicon is created using a set of conversion rules. When rule-based
algorithms are introduced in the field of handwriting recognition, one may argue that rules
are static and incapable of coping with natural ambiguities [Sy94], [Lr89]. Here, it is
important to realise that the rules reported in this chapter are applied only to static lexical
data, not to handwritten data; due to the use of this static lexical data, accuracy of the
Pitmans Shorthand lexicon becomes reliable. Like other rule-based approaches [FF93],
114
[Mm03], training is not required for the creation of the shorthand lexicon and rules can be
refined easily as needed if the lexicon is to be altered.
5.2 Structure of the Electronic Pitmans Shorthand Lexicon

5.2.1 Feature Set
The electronic Pitmans Shorthand lexicon includes 31 features representing phonemes of
a word. The features are represented in numerical form and are shown in Table 5-1.
Table 5-1: features of the electronic Pitmans Shorthand lexicon
Type
Pattern
Description
Type
/T/ or /D/
17
Large unclosed circle
/F/ or /V/
18
Large closed circle
/th/ or /TH/
19
Small unclosed loop
/P/ or /B/
20
Small closed loop
/M/
21
Large unclosed loop
/N/ or /NG/
22
Large closed loop
/K/ or /G/
23
Small hook
/SH/ or /ZH/
24
Large hook
/CH/ or /J/
25
Vowel
10
/R/ (downward)
26
Vowel
11
/L/ (upward)
27
Diphthong
12
/S/ or /Z/
28
Diphthong
13
/R/ (upward)
29
Diphthong
115
Pattern
Description
14
/L/ (upward)
30
Diphthong
15
Small unclosed circle
31
Diphones
16
Small closed circle
5.2.2 Key
A key corresponds to a vocalised shorthand outline and relates to one or more than one
word.
It is composed of consonants and vowels such that consonant primitives of a
vocalised outline are firstly allocated in chronological order with a series of vowel primitives
following at the end. A major reason for keeping vowel primitives at the end of a key is to
cope with the special writing order of Pitmans Shorthand i.e., consonants are always written
first with vowels placed around the consonant kernel later. Sample keys are given in Figure
5.2 where each key comprises two major components one containing consonant primitives
and the other containing vowel primitives. Both components are arranged in chronological
order such that a primitive at the end of the first component corresponds to the last
consonant of a vocalised outline and a primitive at the beginning of the second component
corresponds to the first vowel of the vocalised outline.
Pitmans Shorthand outlines for the word famous and yellow
famous
yellow
Word
Pronunciation
Phonemes in chronological writing

order of Pitmans Shorthand
Keys of a Pitmans
Shorthand lexicon
Famous
/F M S/
/F M S /
2+5+16+91+92
Yellow
/Y L W/
/Y L /
23+13+11+91+92
Figure 5.2: Sample keys of the electronic Pitmans Shorthand lexicon; vowels are
underlined
116
5.2.3 Lexicon Layout

A Pitmans Shorthand lexicon is a hash-table with each key mapping to one or more than
one word where words with the same key contain the same series of similar consonant
primitives. Here, similar consonant primitives stands for primitives of the same type with
different line thicknesses or lengths. Sample entries of the Pitmans Shorthand lexicon are
illustrated in Figure 5.3.
Lexicon
Sample
Key
Word
2+91
fee, father, further, after
4+91
pays, bays
Sample Pitmans Shorthand outlines
fee
father
after
pays
bays
Figure 5.3: Sample entries of the electronic Pitmans Shorthand lexicon
Sample one in Figure 5.3 presents a lexicon entry for the words fee, father, further and
after. The example indicates that words with similar geometric features of different
lengths belong to the same key. In order to recognise length variation of the words, consider
the sample Pitmans Shorthand outlines given in Figure 5.3.
Similarly, sample two in Figure 5.3 presents a lexicon entry for the words pays and
bays. The example indicates that words with similar geometric features of different
thicknesses belong to the same key. Consider the sample outlines illustrated in Figure 5.3 to
identify line thickness difference between the two words.
On the whole, the Pitmans Shorthand lexicon is created as follows:
117
P: a phonetic lexicon
N: numbers of words contained in P
Wi: ith word of the phonetic lexicon, P
Vi: phonetic representation of Wi
Si: a series of geometric features of Vi
table: a hash table, holding a Pitmans Shorthand lexicon
key: a key, representing a vocalised shorthand outline
value: word value to which a specified key is mapped in table
Initialisation
table = a hash table
Lexicon organisation
For i = 0 to N
//convert phonemes of a word into geometric features
Si = convertPhonemetoShorthand(Vi)
key = Si
//if a key already exists
if (table.containsKey(key))
value = table.get(key)
value += Wi
end
else if (!table.containsKey(key))
value = Wi
end
table.put(key,value)
end
118
5.3 Conversion Procedure

This section presents full details of a conversion procedure that is used to transform
phonemes of a word into a series of geometric features. Assuming that x is a set of
phonemes for a particular word and y is a shorthand representation for the word, a
conversion procedure can be defined as
y = convertPhonemeToShorthand(x)
(5.1)
For instance, if x is a set of phonemes /T D / (for the word today), then y is produced
by invoking the conversion procedure as follows:
y = convertPhonemeToShorthand(/T D /)
y = 1+ 1+ 91+ 92
In total, the conversion procedure comprises 46 rules, conforming to the writing rules of
Pitmans Shorthand 2000, defined in [Oj95]. In order to produce a primitive representation
for a given set of phonemes, the 46 rules are applied in an ascending priority order as follow:
Priority 1: 1st rule 17th rule
Priority 2: 18th rule 32nd rule
Priority 3: 33rd rule 36th rule
Priority 4: 37th rule 39th rule
Priority 5: 40th rule 41st rule
Priority 6: 42nd rule 43rd rule
Priority 7: 44th rule
Priority 8: 45th rule 46th rule
For instance, application of the 2nd rule must follow the completion of the 1st rule and
similarly, application of the 18th rule must follow the completion of the 17th rule. With the
119
aid of a diagram (Figure 5.4), data flow in the conversion procedure can be followed.
Input (phonemes of the word famous)
/F M S/
Rule 1
Rule 2
Rule 17
Rule 18 to 32
Rule 45 to 48
Output (primitives of the word famous)

2+5+16+91+92
Figure 5.4: Illustration of the conversion procedure
5.3.1 The Importance of Algorithms of the Presented Rules

The automatic generation of an electronic Pitmans Shorthand lexicon is, in fact, the
replication of a humans power of recalling a set of writing rules and producing shorthand
outlines for corresponding English words. This process seems to be trivial; however the
efficiency of replicating the exact ability of the human writer relies totally on the efficiency
of the rules implemented in the system. In reality, the implementation of these rules is
deeply complex, as it involves the consideration of several indefinite factors, such as the use
of different notations for the same phoneme, depending on the conformability of pen
movements; the use of various notations for the same pronunciation, depending on whether
the phonemes appear at the beginning, in the middle or at the end of a word; and so on. On
the whole, each rule replicates corresponding criteria on which stenographers base their
120
ability to produce shorthand outlines, and it is important to clarify the concept behind each
rule to enable the reader to understand how the complex writing rules of Pitmans Shorthand
are embedded in the system of this research.
5.3.2
Description of Rules
Table 5-2: Summary of the 46 rules applied in the creation of the Pitmans Shorthand
lexicon
Rule
Description
Verification of a vocalised outline
Diphthong U
CON or COM at the beginning of a word
WH
Negative prefix IL-, IM-, IN-, IR-, UN-
PL, BL, TL, DL, CHL, JL, KL, GL, used consonantly at the beginning, in the
middle or at the end of a word
FL, VL, ThL, ML, NL, SHL, used consonantly at the beginning of a word
SPR, STR, SKR at the beginning of a word
STER in the middle or at the end of a word
10
CON, COM, CUM, COG in the middle of a word
11
SES, SEZ, ZES, ZEZ at the end of a word
12
Past tense ED
13
ST at the beginning, in the middle or at the end of a word
14
Half length stroke for one syllable words
15
ING
16
INGS
17
Suffix SHIP
18
S or Z stroke
121
19
Suffix MENT
20
Suffix MENTAL
21
Suffix MENTALLY
22
Double length stroke
23
MD, ND
24
FR, VR, Thr, THR, SHR, ZHR, MR, NR, used consonantly at the beginning of
a word
25+26
PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR, SHR, ZHR, MR,
NR, used syllabically at the beginning, in the middle or at the end of a word
27
SKR, SGR
28
KW, GW
29
PL, BL, TL, DL, CHL, JL, KL, GL, used syllabically in the middle or at the
end of a word
30
FL, VL, THL, ML, NL, SHL, used syllabically in the middle or at the end of a
word
31
S followed by H
32
S+vowel+hookR, ST+vowel+hookR
33
Downward L
34
F or V hook at the end of a word
35
F or V hook in the middle of a word
36
SHUN
37
N hook
38
Upward L
39
Half length stroke in a word of two or more syllables
40
Suffix LY
41
Upward R and downward R
42
Dash H
122
43
Reversed FR, VR, Thr, THR
44
P, B, T, D, K, G, M, N, NG, F, V, Th, TH, R, CH, JH, SH, S, Z, ZH, H
45
Vowel extraction and appending
46
Vowel conversion
Table 5-2 presents a summary of the forty-six rules with a list of phonemes relating to each
of them. In order to avoid information overload, algorithms of just five rules are presented
in this section, and the remaining rules can be referenced in detail in Appendix A.
In general, the rules are discussed here in three aspects: complexity, objective and strategy.
The complexity of a rule corresponds to either direct conversion or indirect conversion.
Direct conversion directly converts phonemes into geometric features, whereas indirect
conversion performs the unusual conversion of phonemes into geometric features with
respect to the special writing rules of Pitmans Shorthand, invented for speed improvement
purposes. In addition, the objective of a rule corresponds to the major role of the rule, and
the strategy of a rule corresponds to a programming procedure of the rule.
Description of the 3rd Rule (CON and COM at the beginning of a word)
Complexity: indirect conversion
Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 5.5.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
123
Reference
Pitmans outline for the

word commence
Pitmans Shorthand notations
/COM/
/M/
/NS/
//
Pronunciation for the word commence

/ K M N S/
Figure 5.5: Illustration of the use of a dot primitive for the sound COM at the
beginning of a word
Description of the 5th Rule (Negative prefix IL-, IM-, IN-, IR-, UN-)
Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitmans Shorthand outline containing
the prefix IR- is illustrated in Figure 5.6.
Strategy:
1. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
2. Check if a word representation of an input matches with any element of the list;
3. if it does and if the prefix is IL-, convert the sound IL into an upward stroke L,
followed by a dot primitive and another extra upward stroke L;
4. if it does and if the prefix is IM-, convert the sound IM- into a curve M, followed a
dot primitive and another extra curve M;
5. if it does and if the prefix is IR-, convert the sound IR- into a downward curve R,
followed by a dot primitive and another extra downward curve R;
6. if it does and if the prefix is IN-, convert the sound IN- into a curve N, followed by a
dot primitive and another extra curve N.
7. if it does and if the prefix is UN-, convert the sound UN- into curve N, followed by a
dash primitive and another extra curve N.
124
In addition, the 5th rule states that a consonant /D/ following the prefix IN- and UN- is not
allowed to be omitted. This is to avoid a conflict with the ND writing rule of Pitmans
Shorthand, in which the consonant /D/ following /N/ is omitted. Detail about the ND rule
can be referenced in Appendix B.
Reference
Pitmans Shorthand outline for
the word irregular
/R/
(start)
//
/R/
(middle)
//
/G/
/L/
/R/
(end)
//
/U/
Pronunciation for the word irregular

/ R G Y U L /
Figure 5.6: Illustration of the use of negative prefix IR- in a vocalised outline
Description of the 6th Rule ( PL, BL,,GL, used consonantly at the beginning, in the middle
or at the end of a word
Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it is not immediately following /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitmans Shorthand outline containing the sound /P L/ at the
beginning of a word is illustrated in Figure 5.7.
Strategy:
1. If /N/ comes before /T L/ or /D L/, hook L is not used.
2. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used;
125
3. otherwise, replace phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and /GL/
of an input with a, b, c, d, e, f, g and h respectively, where
a = hook + P stroke
b= hook + B strokes
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
Reference
Pitmans Shorthand outline
for the word play
/P/
/L/
/PL/
//
Pronunciation for the word play

/P L /
Figure 5.7: Illustration of the use of PL hook in a vocalised outline
Description of the 14th rule (Half length stroke for one syllable words)
Objective: to omit /T/ or /D/ at the end of one syllable words. This relates to the half-length
writing rule of Pitmans Shorthand and a sample (one-syllable) half-length outline is
Strategy: if a word is a one-syllable word and if there are consonants in a word other than
just /R/ and /T/ or /T/, then /T/ or /D/ at the end the word is omitted, provided that /T/ is not
following a voiced consonant and /D/ is not following an unvoiced consonant.
126
Reference

for the word coat
/K/
/T/
//
Pronunciation for the word coat

/K T/
Figure 5.8: Illustration of a one syllable half-length outline
Description of the 22nd rule (Double length stroke)

Objective: to omit the syllables TER, -DER, -THER and -TURE of a word according to the
double-length rule of Pitmans Shorthand (description of the rule can be referenced in
appendix B). A sample Pitmans Shorthand outline containing the syllable -TER is
Strategy: if an input contains the syllable /TER/, /DER/, /THER/ or /TURE/ in the middle or
at the end, and if the syllable is not surrounded by incompatible neighbouring primitives, the
syllable is removed from the input phonemes.
Samples of incompatible neighbouring
primitives are given in Figure 5.10.

the word after
Reference
/F+TER/
/F/
//
Pronunciation for the word after

/ F T E R/
Figure 5.9: Illustration of the omission of the syllable TER in a vocalised outline
127
Primitive pairs that cannot be represented by doubling
/F K/
/V K/
/F G/
/V G/
Figure 5.10: illustration of incompatible primitive pairs for doubling

Experiments carried out in this chapter are categorized into two main studies: firstly, an
analysis of the accuracy of a novel machine-readable Pitmans Shorthand lexicon and
secondly, an analysis of the distribution of homophones (outlines which look similar but
have different representations) in the novel lexicon.
5.4.1 Data Set

In order to analyse an accuracy of a machine-readable Pitmans Shorthand lexicon, 1253
generally used English words are chosen where the words cover the whole range of writing
rules of Pitmans Shorthand, except the rules of currency notations and the rule of
punctuation.
In order to analyse the distribution of homophones in a machine-readable Pitmans

Shorthand lexicon, 5000 most frequently used English words, extracted from the Brown
Corpus are used. Based on the word list, a hash table is created with a series of primitives as
a key for each group of words where a primitive key is automatically generated by the
transcription engine from a phonetic representation of a word. A pictorial representation of
the electronic Pitmans Shorthand lexicon is presented (Figure 5.11).
128
key
Element
May
Maid
Made
Bat
Pat
Bad
Pad
Figure 5.11: Sample entries of a machine-readable Pitmans Shorthand lexicon
5.4.2 Analysis of the Accuracy of a Machine Readable Pitmans

Shorthand Lexicon
A primary goal of an experiment carried out in this section is to evaluate the accuracy of a
machine-readable Pitmans Shorthand lexicon, where the formulation of the lexicon
accuracy is defined as:
(t e)
* 100
t
(5.2)
where a is the accuracy of an electronic Pitmans Shorthand lexicon, t is the total number of
words included in a testing data set and e is the total number of incorrectly generated words
whose primitive representations do not match with patterns defined in an original Pitmans
Shorthand dictionary (available in book format).
129
Number of words
included in a lexicon
Accuracies of different sizes of machine-readable

Pitman's shorthand lexicon
1253
1100
900
700
500
300
100
70
75
80
85
90
95
100
Accuray in Percentage
Figure 5.12: Average accuracies of different sizes of machine-readable Pitmans

Shorthand lexicons
The graph (Figure 5.12) illustrates accuracies of machine-readable Pitmans lexicons of

different sizes where the word lists of each lexicon is chosen from a data set of 1253 words
randomly without including any duplicated words. The study finds that an accuracy of the
lexicon of 1253 words is 98.8% and an average accuracy of different sizes of lexicons is
99%. An average error rate is approximately 1% and errors are categorised into the
following four groups.
Errors due to an ambiguity of the writing rules of Pitmans Shorthand.
Errors due to different phonetic representations of an applied phonetic dictionary.
Errors due to derivative or compound words.
Errors due to limitations of machine compatible scripts
In order to clarify the four types of errors, consider the following four examples in which
each example provides a sample erroneous shorthand outline with a corresponding
explanation for each type of error.
Example 1: Errors due to ambiguity of the writing rules of Pitmans Shorthand
130
In order to clarify errors due to an ambiguity of the writing rules of Pitmans Shorthand,
consider one of the rules of Pitmans Shorthand which reads: Straight strokes are doubled
in length to represent the sounds of TER, -DER, -THER, and TURE when they follow
another stroke [Oj95].
According to this rule, the transcription engine generates a
shorthand representation for the word weather as Figure 5.13(a) in which the sound
THER is added via a doubled length stroke. However, the typical Pitmans Shorthand
dictionary (available in book format) defines the word weather as in Figure 5.13 (b) in
which the sound THER is not written according to the double-length rule. The study finds
that this is because the typical Pitmans Shorthand lexicon applies another rule of Pitmans
Shorthand which reads: A straight stroke is not doubled if the doubling would produce two
strokes of unequal length without an angle [Oj95].
To determine whether the word
weather relates to the first rule or the second rule, consider other two outlines
(
Figure 5.14), defined in the typical Pitmans Shorthand dictionary. Between the two words,
the typical dictionary defines that doubling is not allowed for the word factor as the curve
before the straight stroke will produce two strokes of unequal length if the straight stroke is
doubled (case a); however, it defines that doubling is allowed for the word further since
the word complies with the double length rule of Pitmans Shorthand (case b). On the
whole, the transcription engine assumes that the word weather belongs to the case (b)
rather than case (a) since doubling does not produce two strokes of unequal length if the
straight stroke is doubled to add the sound THER. As a result, a shorthand outline for the
word weather generated by the transcription engine is different from the one defined in the
131
typical Pitmans Shorthand dictionary and hence the error. Overall, a primary cause of error
in this case is due to ambiguity of the rules of Pitmans Shorthand.
Two different shorthand outlines for the word weather
(a) Correct outline
(b) Incorrect outline
Figure 5.13: Two different outlines for the word weather; (a) the word weather is
written according to the double-length rule of Pitmans Shorthand; (b) the word
weather is not written according to the double-length rule of Pitmans Shorthand
Shorthand outline for the word
factor
Shorthand outline for the word

further
(a)
(b)
Figure 5.14: (a) Shorthand outline for the word factor; (b) shorthand outline for the
word further
Example 2: Errors due to different phonetic representations of an applied phonetic

dictionary
In order to clarify errors due to different phonetic representations of a phonetic dictionary,
consider a phonetic representation of the word union. According to the applied phonetic
dictionary of this research (i.e., CMU phonetic dictionary), the word union is composed as
/Y N Y N/ whereas, the word is composed as /Y N N/ according to the typical
Pitmans Shorthand dictionary (available in book format). Note that there is an extra
phoneme /Y/ in the first composition and due to this difference, a shorthand representation
of the word union defined by the transcription engine is different from the one defined by
the typical Pitmans Shorthand dictionary as illustrated (Figure 5.15). In general, Pitmans
132
Shorthand outlines are generated according to phonetic representations of an applied

dictionary and therefore accuracy of the phonetic dictionary is critical in this research.
Two different shorthand outlines for the word union
(a) Correct outline
Figure 5.15: two different shorthand outlines for the word union
Example 3: Errors due to derivative or compound words

In order to clarify errors due to derivative or compound words, consider the compound word
landlord. According to the typical Pitmans Shorthand dictionary (available in book
format), the word landlord is remarked as a composition of two words land and lord;
however according to the transcription engine, the word landlord is remarked as a
composition of one word. As a result, shorthand outline representations for the word
landlord, generated by the transcription engine and the typical Pitmans Shorthand
dictionary become different as illustrated (Figure 5.16), mainly due to one of the rules of
Pitmans Shorthand that reads: a small hook written inside the end of a curved stroke adds
the final sound N [Oj95]. Since the transcription engine does not regard that the phoneme
/N/ at the end of the word land is a final phoneme, the N hook is not applied for the word
landlord and hence the error. In general, efficiency of the transcription engine in
identifying any derivatives and compound words relies on the information available in an
applied phonetic dictionary.
133
Two different shorthand outlines for the word landlord
Normal N stroke
N hook
(a) Correct outline
Figure 5.16: Two different outlines for the word landlord
Example 4: Errors due to the limitation of machine compatible scripts

In order to clarify errors due to the limitations of machine compatible scripts, consider a
shorthand representation for the word environment, which is defined either as Figure 5.17
(a) or Figure 5.17 (b) by the typical Pitmans Shorthand dictionary. Note that both of the
scripts are valid in this case according to the rule of Pitmans Shorthand that reads: the
suffix MENT is represented by
or
, whichever is convenient. [Oj95] In order to
reduce ambiguity for the computer aided transcription, the transcription engine restricts the
writing of the suffix MENT to be only one form and the inability of generating an
alternative form is taken as an error in the current research.
Two different shorthand outlines for the word environment
(a) Valid outline
(b) Valid outline
Figure 5.17: Two different outlines for the word environment
134
Distribution of different categories of errors in

machine-readable Pitman's shorthand lexicons of
different sizes
13%
40%
27%
20%
Errors due to ambiguity of the writing rules of Pitman's shorthand
Errors due to different phonetic representations of an applied phonetic dictionary
Errors due to derivative or compound words
Errors due to limitations of machine compatible scripts
Figure 5.18: The distribution of different categories of errors in different sizes of

electronic Pitmans Shorthand lexicon
On the whole, the graph (Figure 5.18) illustrates the distribution of four types of errors,
discovered in the current experiment where the major error (40%) is due to the limitation of
machine compatible scripts.
5.4.3 Analysis of the Distribution of Homophones in Machine-readable

Pitmans Shorthand Lexicons
A primary goal of an experiment carried out in this section is to estimate an average
distribution of candidate words (homophones) mapping to each key of a machine-readable
Pitmans Shorthand lexicon and to evaluate the distribution of homophones in different sizes
of lexicon.
135
Unique outlines in %
120
In the presence of clear
line thickness and
complete vowel
information
100
80
60
In the presence of line

thickness ambiguity
40
20

ambiguity
50
00
40
00
30
00
20
00
50
0
10
0
Lexicon size in no. of words
Figure 5.19: The distribution of uniqueness of the electronic Pitmans Shorthand

lexicons
Figure 5.19 illustrates experimental results carried out on different sizes of electronic
Pitmans Shorthand lexicon of up to 5000 words. The X-axis of the graph represents
different sizes of lexicons and words are sorted according to the frequency of usage in each
lexicon. The Y-axis of the graph represents the uniqueness of an electronic Pitmans
Shorthand lexicon where the formulation of the uniqueness can be defined as follow:
ta
* 100
t
(5.3)
where u is the uniqueness a lexicon, t is the total number of keys containing in the lexicon,
and a is the total number of keys having one and only one relationship with a corresponding
English word in the lexicon.
The first test (Figure 5.19) illustrates uniqueness of lexicons in the presence of clear
distinction between line thicknesses as well as in the presence of complete vowel
information. The study finds that uniqueness of the lexicon of 5000 most frequently used
English words is 96%. The maximum ambiguity is 4 candidate words per key and an
average ambiguity is 1.02 potential words per key.
136
The second test (Figure 5.19) illustrates uniqueness of lexicons in the presence of line
thickness ambiguity. According to experimental results, uniqueness of the lexicon of 5000
words drops by about 5% if there is no distinction between thick and thin strokes. The
maximum ambiguity is 5 candidate words per key and an average ambiguity is 1.05 potential
words per key.
Finally, the third test (Figure 5.19) illustrates uniqueness of lexicons in the presence of
vowel ambiguity. The study finds that uniqueness of the lexicon of 5000 words is
approximately 71% when vowel primitives are not included in the keys of a lexicon. The
maximum ambiguity is 15 candidate words per key and an average ambiguity is 1.22
potential words per key.
5.5 Discussion
On the whole, this chapter presents the creation of a novel machine-readable Pitmans
Shorthand lexicon in order to facilitate the direct translation of geometrical features of
shorthand outlines into English words. Experimental results present accuracies of different
sizes of electronic Pitmans Shorthand lexicon as well as the distribution of homophones in
the novel lexicon.
In relation to accuracies of the electronic Pitmans Shorthand lexicons, an average accuracy

of 99% is satisfactory, however further improvements on the correction of errors caused by
an applied lexicon can be carried out with the use of an appropriate dictionary. As for the
correction of errors relating to the rules of Pitmans Shorthand, the use of machine
compatible scripts is recommended; however it requires further assessment for user
acceptability.
137
In relation to the analysis of uniqueness of an electronic Pitmans Shorthand lexicon,

experimental results can be used to justify which type of electronic Pitmans Shorthand
lexicon is appropriate for the computer aided transcription of handwritten Pitmans
Shorthand. According to experimental results, a lexicon with low uniqueness is more robust
against the line thickness ambiguity or the vowel ambiguity, and the lexicon with the highest
uniqueness is the least robust against the natural ambiguity. Taking into consideration the
impracticability of having restrictions on natural ambiguity in handwriting recognition, it is
recommended that the use of either a lexicon with line thickness ambiguity or a lexicon with
vowel ambiguity or a combination of both is feasible for the real time transcription of
handwritten Pitmans Shorthand.
138
6. Phrase level transcription of online handwritten Pitmans Shorthand outlines
Phrase level transcription of online handwritten

Pitmans Shorthand outlines
139
This chapter focuses on the solutions to the phrase level recognition of online handwritten
Pitmans Shorthand outlines. The primary aims of this chapter are first to investigate a
contextual method that can effectively reduce homophone ambiguities appeared in a
resulting list of a corresponding handwritten Pitmans Shorthand outline; and second, to
propose a phrase level recognition framework to produce the most likely word sequence for
a written phrase using the Vertibi algorithm.
Unlike research carried out in the previous chapters of the thesis, which determined the
investigation into finding a novel solution for a given problem to be the one and only one
goal, this chapters research is carried out with multiple goals, i.e., to investigate conceptual
algorithms for implementing a handwritten Pitmans Shorthand phrase recogniser, and also
to consider the possibility of applying existing Application Program Interfaces (APIs)
[Mic04] to resolve the problem of handwritten Pitmans Shorthand phrase recognition. A
major bottleneck of the integration is access to the APIs hidden functions to enable the
Pitmans Shorthand recognisers candidate English words to be input into the APIs.
This chapter presents detailed studies carried out to meet the main objectives mentioned
above, and it is categorised into the following four sections:
-
Contextual rejection strategy: presents an effective novel word rejection strategy,

implemented using the critical contextual knowledge that people use when reading
handwritten shorthand notes.
Phrase level recognition algorithm: propose a conceptual solution to find the most
likely word sequences for a handwritten Pitmans Shorthand phrase with the use of
the Viterbi algorithm and statistical language modelling techniques.
Integration with APIs: discusses major difficulties in the integration of a Pitmans

Shorthand phrase recogniser with APIs of the Microsoft Tablet PC Platform
140
Software Development Kit [Tab04], and highlights the potential benefits of

successfully integrating the two components.
-
Experimental result: evaluates the performance of the new contextual rejection

strategy proposed in this chapter.
6.1 Contextual Rejection Strategy

Chapter 4 mentioned that the major factor preventing a correct word from appearing in the
topmost position of a result list for a given Pitmans Shorthand outline is the similarity of
input outlines to other outlines. This research denotes this problem as the homophone
ambiguity, and further resolution to this problem is discussed in relation to the word
rejection strategies here.
Several word rejection strategies [GKM+97], [PP02], [ MAG+02], have been applied in the
field of handwritten word recognition. Their reliability is related to their capability not to
accept false word hypotheses and not to reject true word hypotheses [Ka04]. Common
rejection thresholds reported in the literature are the class-threshold (e.g., [QAC05], which
rejects words according to their grammatical nature), the domain-threshold (e.g., [NB02],
which rejects words according to a user domain), the lexicon-threshold (e.g., [ESS+98],
which rejects words according to a lexicons confidence score) and the recogniser-threshold
(e.g., [PP02], which rejects words according to the confidence scores produced by a Hidden
Markov Model-based on-line handwriting recogniser).
In addition to the rejection strategies mentioned above, a critical contextual knowledge that
needs to be put into practice for rejecting homophones of handwritten Pitmans Shorthand
outlines is the shorthand outlines position. In Pitmans Shorthand, the outlines correct
positioning is highly critical [Oj95], as it provides a primary clue for retrieving vowel
information even though vowels are omitted in an outline. In general, an outlines position is
141
determined by the first pen-stroke in Pitmans Shorthand, such that if an outlines first stroke
is written above the writing line, it is considered to be written in the first position; if the first
stroke is written on the writing line, it is considered to be written in the second position; and
if the first stroke is written through the writing line, it is considered to be written in the third
position. Samples of three Pitmans Shorthand outlines written in three different positions
are illustrated in (Figure 6.1 a). As illustrated (Figure 6.1 (b)), although the three words
comprise exactly the same consonant stroke, their corresponding English words can be
easily identified by the differences between the outlines positions.
Pitmans Shorthand outlines for the words
at, aid and eat
Outline written in
the first position
at
Pitmans Shorthand outlines for the words at, aid

and eat (written without vowels)
Outline written in Outline written in

the second position the third position
aid
Outline written in
the first position
eat
at
(a)
Outline written in Outline written in

the second position the third position
aid
eat
(b)
Figure 6.1: Samples of Pitmans Shorthand outlines written in three different

positions; (a) outlines written including vowel notations, (b) outlines written without
vowel notations
Overall, stenographers apply the outlines position as a primary clue to identify the most
relevant words for a particular shorthand outline. However, this knowledge has never been
applied to solve the problem of homophone ambiguity in machine-based transcriptions.
Based on this observation, the contextual rejection strategy proposed in this chapter is
defined as:
W P(k) = WP(i) \ WP(i k)
(6.1)
where WP(i) is a list of candidate words for an input shorthand outline (written in different
positions) and k is an input outlines written position, which is 1 for the first position, 2 for
142
the second position and 3 for the third position. In order to clarify the algorithm, consider
Example 1 given below.
Example 1
Assuming that k = 1, W = {W1, W2, W3, W4, W5, W6} is a set of candidate words for a given
shorthand outline and P(i) of W1, W2, W3, W4, W5, W6 are 2, 1, 1, 3, 2 and 1 respectively,
then W P(k) is calculated as:
W P(k)
= WP(i) \ WP(i k)
W P(k)
= { W1, W2, W3, W4, W5, W6} \ { W1, W4, W5}

= {W2, W3, W6}
6.2 Handwritten Pitmans Shorthand Phrase Recognition
Short-form
interpreter
Ordered
word list
Bayesian Network
based vocalised outline
interpreter
Short-form
interpreter
Ordered
word list
Ordered
word list
Bayesian Network
based vocalised
outline interpreter
Ordered
word list
Pitmans Shorthand specific contextual word rejection
Filtered
word list
Filtered
word list
Pitmans Shorthand
phrase level
transcription
Language
model
Filtered
word list
Filtered
word list
Word
graph
The cat sat on the mat.
Figure 6.2: illustration of the handwritten Pitmans Shorthand phrase level

transcription process
The structure of the phrase level transcription process (for handwritten Pitmans Shorthand)
is illustrated in Figure 6.2, where the framework is based on the architecture of the online
143
handwritten English sentence recognition [QAC05]. In brief, every handwritten Pitmans

Shorthand outline is given as an input to the short-form interpreter or the Bayesian Network
based vocalised outline interpreter, and a ranked list of candidate words for each outline is
produced at the end of the word recognition process. The candidate words are then validated
by the contextual rejection strategy, which removes irrelevant words from the candidate lists
before forwarding them to the phrase level recogniser. The phrase level recogniser then
creates a word graph with the incoming word lists such that each node represents a candidate
words likelihood, and each edge represents the transitional probability (i.e., language model
probability) between node n1 and node n2. The phrase level recogniser then finds the most
likely sequence of words for a given input phrase by applying the Viterbi algorithm to the
word graph.
Based on the algorithm defined in the online handwritten English sentence recognition
[QAC05], the most likely sequence of words for a handwritten Pitmans Shorthand
phrase is defined as:
= arg max p ( P | W ) p (W )
(6.2)
where W is the candidate words sequences, P is the handwritten Pitmans Shorthand phrase,
p(P|W) is the posterior probability of a handwritten phrase P conditioned on the given
sequence of words W, and p(W) is the prior probability of sequence W.
In detail, p(P|W) is evaluated by the confidence score of the Bayesian Network based word
interpreter and p(W) is given by a statistical language model. In other words, the efficiency
of finding the best sequence of words for a given input phrase depends on the confidence
score of the handwritten word recogniser plus the confidence score of the applied statistical
language model.
144
This chapter focuses on the statistical language models impact on phrase level recognition,
because a language models quality can directly affect the overall word recognition
accuracy. For instance, [MB01] showed that a bi-gram language model outperforms a
unigram language model in offline handwritten sentence recognition. Similarly, work by
[ZB04] showed that a tri-gram model increases word recognition accuracy by 6.8%
compared to a bi-gram model. Again, work by [QAC05] showed that using bi-gram and trigram models for online handwritten sentence recognition results in only a 0.1% difference in
word recognition accuracy. These findings show that it is critical to apply an appropriate
statistical language model in order to obtain an overall promising result.
Considering a statistical language models quality, this chapter proposes the use of the
statistical language model embedded in Microsoft handwriting recognition APIs [Mic04], in
which the language model has been thoroughly trained with millions of words of various
languages, dictionaries and grammar for the development of a commercially viable system.
6.3
The Integration of a Pitmans Shorthand Phrase Recogniser

with APIs
This section presents a feasibility study of the integration of a Pitmans Shorthand phrase
recogniser with Microsoft handwriting recognition APIs, in order to take advantage of
existing statistical language models embedded in the APIs. In order to discuss the specific
API relating to this study, consider the object model illustrated in Figure 6.3.
145
Ink Collection
and Display
Ink Data
Recognition
Recognizer
RecognizerContext
WordList
RecognizerGuide
RecognitionResult
RecognitionAlternates
RecognitionAlternate
Strokes
Stroke
Gesture
Figure 6.3: An abstract view of the object model Microsoft.Ink
Figure 6.3 illustrates an object model for the class Microsoft.Ink that includes child
objects facilitating automatic handwriting recognition. A specific object relating to the
current study is the RecogniserContext object, which enables ink recognition, retrieving
the recognitions result and alternative results.
In order to clarify the RecogniserContexts efficiency, consider the examples illustrated in

Figure 6.4. Figure 6.4 (a) illustrates the recognition results for a written phrase produced by
146
the RecogniserContext API, Figure 6.4 (b) illustrates the change in recognition results
upon a new words arrival, Figure 6.4 (c) illustrates the change in the recognition results
when the APIs context is limited to a full file paths name, and Figure 6.4 (d) illustrates the
change in recognition results when the APIs context is limited to an e-mail address
username.
(a)
(b)
(c)
(d)
Figure 6.4: Screen shots of the recognition results produced by the

RecogniserContext API
In total, approximately 40 kinds of input scopes can be defined in relation to the APIs
context. A major bottleneck in integrating the handwritten Pitmans Shorthand recogniser
into this powerful API is the APIs lack of function that can facilitate taking a written
phrases candidate words as inputs, and producing a ranked list of candidate phrases as an
output. Overall, the investigation of a solution to facilitate this function is rewarding and is
open to future work of this research.
147
6.4
Experimental Results
A small experiment is carried out in this chapter to evaluate the performance of the
contextual rejection strategy reported in this chapter. The data set includes 700 phrases,
which were automatically generated using the word lists gained from the experiments of the
Bayesian Network based word transcription (Chapter 4). A primary goal of this experiment
is to analyse the accuracy of irrelevant words removal from the result lists based on the
shorthand outlines position information, which defines rejection accuracy as:
100
a
0
If correct words appear in result lists after applying the rejection strategy
Otherwise
(6.3)
120
accuracy
100
80
60
40
20
0
1
30 59 88 117 146 175 204 233 262 291 320 349 378 407 436 465 494 523 552 581 610 639 668 697
phrase number
Figure 6.5: Performance of the contextual rejection strategy
The rejection accuracy of 700 phrases is illustrated in Figure 6.5. The study finds that the
contextual rejection strategy correctly filtered 98% of the phrases, and that inaccurate
position writing, practised primarily by inexperienced writers, caused the 2% error rate. The
findings show that the contextual rejection strategy proposed in this chapter is highly reliable
in conjunction with accurate position writing.
6.5 Discussion
This chapter presents Pitmans Shorthand specific contextual knowledge to reduce
handwritten Pitmans Shorthands homophone ambiguities. Theoretical algorithms to
148
resolve the problem of handwritten Pitmans Shorthand phrase recognition are proposed
with the use of the Viterbi algorithm and language models. In relation to the use of sufficient
statistical language models in order to enhance the phrase level recognition performance, a
feasibility study of the phrase level recognisers integration with the existing handwriting
recognition APIs is carried out. The study highlights the APIs efficiency and proposes the
potential benefits of successfully integrating the two components.
Overall, this chapter has addressed solutions to the online handwritten Pitmans Shorthand
phrase recognition problem; however the framework is not fully implemented in this
research mainly because the research into this problem is no longer new, and an established
framework is available in the market. Compared to phrase level transcription, investigation
into novel solutions to handwritten Pitmans Shorthands word level transcription problems
has been more emphasised in this research, as the state of the art of the latter needs more
extensive research in order to produce a commercially viable handwritten Pitmans
Shorthand recogniser.
149
7. Graphical User Interfaces of the Handwritten Pitmans Shorthand Recognition System
Graphical User Interfaces of the Handwritten Pitmans

Shorthand Recognition System
150
Previous chapters presented full details of the back-end interpretation of handwritten
Pitmans Shorthand outlines, whereas this chapter presents the research and development of
front-end user interfaces, via which users of the handwritten Pitmans Shorthand recogniser
interact with the back-end programs. The primary objective of the chapter is to demonstrate
the commercial viability of the end result of this research with a series of well-designed
prototypes. Figure 7.1 depicts the front-end and back-end layers of the system.
Developer
Volunteers for
the training
data collection
Client layer
End users
Front-end layer
Tool layer
Back-end layer
Database layer
Figure 7.1: Front-end and back-end architecture of the system
The chapter includes six main topics, outlined as follow:

1. Overview: a description of interactions between user interfaces and back-end
engines, including clarification of an applied programming environment for each
front-end and back-end program.
151
2. Pen based Application Program Interfaces (APIs): a brief description of Microsoft

tablet PC APIs, in particular ink APIs which are primarily used in the development
of Graphical User Interfaces (GUIs) of the system.
3. Training data collection GUIs: description of data collection GUIs with which a
large collection of handwritten Pitmans Shorthand outlines is collected for training
purposes.
4. Developer GUI: description of a low level parameter setting interface, mainly
implemented for system developers.
5. End-user GUIs: description and comparison of end-user interfaces of this research
which enable handwritten Pitmans Shorthand data entry into tablet PCs.
6. Experiment: evaluation of user feedbacks on the presented prototypes, particularly
from the perspective of potential users acceptability.
7.1 Overview
Visual C#
Visual C++
Java
Visual C#
Ink Collector
GUI
Recognition
engine
Transcription
engine
Result GUI
0 1050 1169
31 1051 1167
46 1053 1171
.
.
WStart
S1,32,1,25,1
S2,32,92,3,0.997942
S2,32,92,12,0.24453
.
.
Data file 1
play 0.322
bay 0.112
clay 0.001
.
.
Data file 3
Data file 2
Legends
File storage
Process flow
Process
Read/write access
Figure 7.2: Illustration of interactions between user interfaces and back-end engines
of the system
152
A brief overview of the interactions between front-end interfaces and back-end engines are
illustrated (Figure 7.2). As shown, handwritten ink data, collected by an ink collector GUI is
firstly saved in a data file (data file 1 in Figure 7.2). The data file is then processed by the
collaborators recognition engine where a series of ink coordinates are transformed into a
ranked list of words or primitives (data file 2 in Figure 7.2). Then, on arrival of the
classified data, the transcription engine invokes and produces a ranked list of n-best words
for the corresponding classified data (data file 3 in Figure 7.2). Once the transcription result
is ready, the front-end GUI retrieves the result file and displays it as the best text
representation for a written outline.
As discussed, a primary means of data transmission between components of the current

system is via file accesses.
In this way, development of the front-end and back-end
programs becomes independent without necessarily waiting for the completion of each
other; thereby enabling the concurrent development of several components of the system in
two countries to become productive. In addition, the current system includes more than one
programming environment and data files are, in fact, primary media communicating
programs of the different environments. The graphical user interfaces presented in this
chapter are implemented using tablet PCs with Microsoft Windows XP Tablet PC Edition.
The detailed programming environments included in the current system development are:
-
Microsoft visual C#, used in the development of front end user interfaces.
Microsoft C++, used in the development of the collaborators recognition engine.
Sun Java (J2SDK), used in the development of the transcription algorithms.
7.2 Ink Data Collection in this Research

This section presents an essential description of the ink collection procedure carried out in
this research. The description is linked to the Microsoft tablet PC platform APIs [Mic04]
in order to enable the (interested) reader to test a simple ink collection program practically.
In other words, the presented ink collection procedure here is applicable to not only the
153
collection of online handwritten Pitmans Shorthand data, but also the collection of any kind
of ink data needed for various purposes.
In general, the Tablet PC platform APIs relating to the ink collection can be divided into
three distinct groups: ink collection APIs, ink data management APIs and ink recognition
APIs. A pictorial presentation of how these APIs work together, at a high level, is provided
at the MSDN library [Abo04] and the illustration is replicated here (Figure 7.3) as a
reference for discussion.
Figure 7.3 Illustration of the Tablet PC platform APIs presented at [Abo04]
According to the pictorial presentation of the Tablet PC platform APIs (Figure 7.3), the ink
collection procedure of this research relates to the utilization of Pen APIs (i.e., the first stage
of Figure 7.3). Here, clarification is made why APIs of the other stages (i.e., No. 2, 3 and 4
stages of Figure 7.3) are not applicable to the current research regardless of their efficient
ink manipulating and recognition capabilities. This is because the Tablet PC platform APIs
support the processing of only fifteen handwritten languages at the time of writing, and
Pitmans Shorthand is not one of them. In brief, only the ink collection APIs are applicable
to the current research, and the remaining functions of ink manipulation and ink recognition
154
are covered by the recognition engine and the transcription engine of this research
respectively.
Figure 7.4 depicts the high level relationship of object models of the tablet PC APIs where a
hierarchical relationship of an ink collection object, namely InkCollector is highlighted.
The primary function of this object is capturing a series of ink coordinates and timestamps of
a pen-stroke. In general, any handwriting data written on handheld devices with a Microsoft
tablet PC platform can be collected with the use of an InkCollector object.
155
Ink Collection
InkTablets
and Display
IInkTablet
InkCollector
InkDisp
IInkCursors
IInkCursor
InkDrawingAttributes
IInkCursorButtons
IInkCursorButton
INKEDLib
InkEdit
PenInputPanel
InkOverlay
InkPicture
InkRenderer
InkRectangle
InkTransform
Ink Data
Recognition
Figure 7.4: Illustration of the high level relationship of object models of the Tablet PC
platform APIs
156
7.3 General Training Data Collection Tool
Figure 7.5: Home page of the training data collector
The interfaces (Figure 7.5 & Figure 7.6) of this research are particularly designed for the
collection of a large sum of handwriting data for system training purposes. It has been
applied as a primary data collection tool in this research, and it can also be applied as a
general data collection tool for any other kinds of handwriting recognition systems.
A primary purpose of the interfaces in this research is to collect and organise training data
effectively as well as to enable volunteers (shorthand writers) to have a user friendly
experience of entering Pitmans Shorthand outlines into a tablet PC. Mention should be
made that Pitmans Shorthand was once widely practised as a speech recording mechanism,
but more recently it has ceased being a popular writing system. Therefore, volunteers of the
157
training data collection process can be of various ages as well as domains. In addition, tablet
PCs are fairly new devices for the general population at the time of writing, and no more
than 20% of the volunteers of this research have previous experience of using pen-based
computers. Taking into account these factors, an important criterion is set in relation to the
layout of the training data collector i.e., functions of the interfaces should be kept as simple
as possible, and the appearance of the interfaces must be suitable for volunteers of various
ages and domains.
Figure 7.6: Sample data entry page of the training data collector GUI
In general, the training data collector collects two types of data: writer data and ink data.
The writer data is intended for the evaluation of the overall system performance and the ink
158
data is intended for the training of the transcription engine. As illustrated (Figure 7.5), the
home page of the training data collector collects the following writer information:
Name: intended for automatic naming of training data folders and files.
Gender: intended for evaluating whether the transcription accuracy varies between
female writers and male writers.
Skill in Pitmans Shorthand: intended for evaluating whether the transcription

accuracy varies depending on the skill of a writer in Pitmans Shorthand, where the
skill is categorised into three levels: professional, intermediate and inexperienced.
First language of a writer: intended for evaluating whether the transcription

accuracy is influenced by a writers skill in English pronunciation. Since Pitmans
Shorthand is based on phonetics, technically, non-native English speakers may find
pronunciation more difficult than native speakers and they may produce more
inaccurate shorthand outlines.
Previous experience in pen-based data entry system: intended for evaluating

whether the transcription accuracy is influenced by a writers previous experience in
using a pen-based text entry system.
Tidiness of handwriting: intended for evaluating whether the transcription accuracy

is influenced by the tidiness of a users handwriting, where the tidiness is
categorised into three levels here: very neat and tidy, legible enough to others but
not very tidy, and legible to me but not to others.
Domain of a writer: intended for evaluating whether the transcription accuracy

varies depending on the change of domains.
Way of writing: intended for evaluating whether the transcription accuracy varies
depending on whether the writer is left-handed or right-handed.
159
7.4 Developer Graphical User Interface
Figure 7.7: Screen shot of the developer Graphical Interface
The developer GUI (Figure 7.7) provides an advanced setting of the system where its
functions are particularly intended for system developers. Since it is a gateway to control
parameters of the recognition and transcription engines, it has presented huge benefits to the
current research and development. Moreover, it is also intended to be beneficial to future
system developers whose work is going to be based on this research. Functions included in
this interface are:
A change of lexicon: a file dialogue is provided to specify the location of a new

lexicon. Domain specific knowledge is critical for the transcription of Pitmans
Shorthand [NB02] and therefore this function enables a change of an appropriate
lexicon for a particular domain.
Definition of training data set: a text area is provided to enter a list of words that are
to be collected for training purposes. Since Pitmans Shorthand is no longer a
popular writing system nowadays, databases of handwritten Pitmans Shorthand
outlines, designed for training a handwriting recogniser, are not available at the time
160
of writing. As a result, the current research includes collection of several training

data sets.
Execution of the recognition engine (classification): a control dialogue is provided

to execute the recognition engine independently. A primary purpose of this function
is to convert a series of ink data files (e.g., data file 1 in Figure 7.2) into a series of
classified data files (e.g., data file 2 in Figure 7.2) in a batch by running the
recognition engine separately. It has been mentioned that the ink collector and the
recognition engine (Figure 7.2) are capable of running independently with data files
created in between. At the time of the training data collection in this research, the
ink collector is run separately, mainly to reduce frustration to volunteers who input
hundreds of shorthand outlines into the system in a limited amount of time.
Parameter setting: a control dialogue is provided to adjust parameters, used in the

training of Bayesian Network based shorthand outline models. The use of this
interface is essential to train the transcription engine, because it is theoretically
inflexible, to have multiple training samples for every word of a dictionary
regardless of a large collection of training data in this research. With the use of the
parameter setting interface, Bayesian Network based outline models can be trained
with training data and history data or either one of them. In addition, the interface
also enables the specification of a preferred training data set, to train the
transcription engine.
7.5 Shorthand Data Entry Graphical User Interfaces

The shorthand data entry GUIs presented in this section are the first graphical user
interfaces, ever implemented to facilitate handwritten Pitmans Shorthand entry into tablet
PCs. Discussion on the interfaces are presented below in comparison with the collaborators
interfaces that were concurrently developed.
161
Firstly, a screen shot of an end-user interface, proposed by the collaborator is illustrated

(Figure 7.8) where box 3 facilitates a handwritten ink entry into the system; box 1 provides
segmentation and classification results of a written script; box 2 provides a list of n-best
words for the written script. Similarly, another (most recent) version of the collaborators
interface is presented (Figure 7.9) where the components are basically the same as the
previous version (Figure 7.8) with additional writing areas and new features for adjusting the
parameters of the recognition engine. On the whole, the end-user interfaces suggested by the
collaborator are mainly beneficial to system developers since they emphasise the back-end
view of the recognition and transcription engines.
Box 1
Box 2
Box 3
Figure 7.8: The first version of the collaborators tablet PC interface for the
handwritten Pitmans Shorthand recognition system
162
Parameter
control
functions
make a cake
Textual output
make
the
Transcription
results
Recognition
results
Input area
Figure 7.9: The latest version of the collaborators tablet PC interface for Pitmans
Shorthand recognition system
Unlike the interfaces developed by the collaborator, end-user interfaces of this thesis put
emphasis on the usability issues including user friendliness, commercial viability and
completeness of the system. From the aspect of user friendliness, the research interfaces are
designed to look similar to a conventional shorthand note-pad. In this way, primary users
(stenographers) of the system are expected to get used to the interfaces quickly, thereby
enabling a short learning curve.
While taking into account the creation of a note pad like interface, a pen-input area (writing
area) becomes a critical concern i.e., whether a square writing box should be designed for
163
the writing of a single word or multiple words. In general, facilitating the writing of
multiple words in a single writing area suffers word boundary ambiguities. However, on the
other hand, enabling the writing of N words in N writing areas suffers wasted screen spaces.
In this research, the collaborators recognition engine encourages the writing of one and only
one word in each writing area in order to reduce word boundary ambiguities. As a result,
end-user interfaces of this research also encourage the writing of N words in N writing areas.
Regardless of the use of several writing boxes in the interface, the original goal (i.e., to
create a note pad like interface) is achieved by connecting the boxes with faded borders as
In addition, the dimension of a writing area is discussed in relation to the creation of a notepad like interface. The size of Pitmans Shorthand note-pads, commonly used by
stenographers, is roughly A5 size (210mm x 149 mm) with approximately 8mm line
intervals [Lg90]. By taking a ratio of the size of a note pad to the size of 15 digitiser of a
tablet PC (e.g., 1024 pixel x 768 pixel), the 8 mm line interval is considered to be equal to
approximately 30 pixels. Based on these measurements, the dimension of individual writing
box of the interface is set at 100 pixels x 60 pixels with a line interval of 30 pixels. The
solution appears to be practical but requires further assessment for user acceptability.
From the aspect of commercial viability, the overall presentation of the interface is designed
to look good in addition to its reliable functioning. On the whole, users are provided with a
choice of two layouts to interact with the final interfaces of the system. The first one (Figure
7.10) is designed to be practical for its rapid note taking purpose and it resembles a
conventional shorthand note-pad. The second one (Figure 7.11) is designed to be practical
for its general text entry purpose and it resembles a handwriting recogniser of Microsoft
Windows XP Tablet PC edition.
164
From the aspect of completeness, the final interfaces of this research perform as a gateway to
access to any components of the system, including a data entry GUI with a list of toolboxes
for text edition and parameter setting, the developer GUI (Figure 7.7), the training data
collection GUI (Figure 7.5), and a back-end view of the system (similar to the one proposed
by the collaborator (Figure 7.9)).
Regardless of the involvement of these multiple
components, the simplicity of the look of the interfaces is achieved by hiding any advanced
level control components with show/hide functions to open/close them respectively, as
illustrated (Figure 7.10 & Figure 7.11).
make a cake
Textual output
make
Text editing
toolbox
cake
Writing area
Page feed
Advanced setting
toolbox
Figure 7.10: Screenshot of a note-pad layout of the end-user interface of this research
165
Advanced setting
toolbox
Textual output
Text editing toolbox
make a cake
make
cake
Writing area
Page feed
Figure 7.11: Screenshot of an alternative layout of the end-user interface of this

research

A small experiment is carried out in this chapter to analyse user feedbacks on the graphical
user interfaces (GUIs) presented in this chapter. An ultimate goal of the experiment is to
determine the most feasible GUI for the automatic handwritten Pitmans Shorthand
recogniser that can be presented as a commercially viable prototype.
On the whole, four types of GUIs are evaluated in the experiment. Two of them were
developed by the collaborated research team and the other two were developed in the
research and development of this thesis. In order to assist the reader to easily recognise the
four GUIs, thumbnail views of the GUIs are presented in Figure 7.12. The experiment was
conducted by 20 persons with different levels of skill in Pitmans Shorthand (including those
with no background knowledge in Pitmans Shorthand up to those with professional level of
skill in Pitmans Shorthand).
166
Prototype 1
Prototype 2
Prototype 3
Prototype 4
Figure 7.12: Thumbnails of the four GUIs evaluated in the experiment
In general, experiments carried out in this chapter are categorised into four groups as
follows:
The general distribution of user fondness for the presented prototypes.
The distribution of user fondness for the presented prototypes in the case of speed
writing.
The distribution of user fondness for the presented prototypes in the case of general
text entry into handheld devices.
The comparison of the most favourite GUI of experienced shorthand writers and that
of novice shorthand writers.
167
7.6.1 Analysis of the General Distribution of User Fondness for the

Presented Prototypes
percentage of user
The general distribution of user fondness for the

four prototypes
100%
80%
60%
40%
20%
0%
1
level of preference (1 represents the most favourite and 4

represents the least favourite)
prototpye 1
prototype 2
prototype 3
prototype 4
Figure 7.13: The general distribution of user fondness for the presented prototypes
(Figure 7.13) illustrates the level of user fondness for the four prototypes where the X-axis
represents the level of user preference for a specific prototype over the others, and the Y-axis
represents the percentage of users. Experimental results show that prototype 4 is the most
favourite GUI for 60% of users, and the prototype 1 is the least favourite GUI for 95% of
users.
168
7.6.2 Analysis of the Distribution of User Fondness for the Presented

Prototypes in the Case of Speed Writing
percentage of user
The distribution of user fondness for

the four prototypes in the case of
speed writing
80%
70%
60%
50%
40%
30%
20%
10%
0%
prototype 1 prototype 2 prototype 3 prototype 4
Figure 7.14: The distribution of user fondness for the presented prototypes in the
case of speed writing
(Figure 7.14) illustrates the level of user fondness for the four prototypes, especially in
relation to the need for rapid writing, for instance, in the case of the real time recording of
spoken speech. An interesting phenomenon here is that prototype 3 becomes the most
preferred GUI over the others, in particular prototype 4. Note that, prototype 4 is the most
favourite GUI in the experiment for general cases (Figure 7.13). This finding shows that the
majority of users regard that a notepad like interface is more appropriate for rapid notetaking purposes.
7.6.3 Analysis of the Distribution of User Fondness for the Presented

Prototypes in the case of a Small Amount of Text Entry into
Handheld Devices
169
percentage of user
The distribution of user fondness for the

four prototypes in the case of small text
entry
70%
60%
50%
40%
30%
20%
10%
0%
prototype 1 prototype 2
prototype 3 prototype 4
Figure 7.15: The distribution of user fondness for the presented prototypes in the
case of a small amount of text entry into handheld devices
Figure 7.15 illustrates the level of user fondness for the four prototypes, particularly in
relation to the need for entering a small amount of textual information into handheld devices,
for instance, entering a persons name into the name field of an address book. In contrast to
the case in Figure 7.14, the study finds that prototype 4 becomes the most favourite GUI
mainly due to the small amount of screen space taken by the shorthand recogniser while
other applications need to be run at the same time.
7.6.4 The Comparison of the Most Favourite GUI of Experienced

Shorthand Writers and that of Novice Shorthand Writers
170
precentage of user
The comparison of the most favourite

GUI of experienced shorhtand writers
and that of novice shorthand writers
120%
100%
80%
60%
40%
20%
0%
prototype 1 prototype 2 prototype 3 prototype 4
experienced shorthand writer
Novice shorthand writer
Figure 7.16: The comparison of the most favourite GUI of experienced shorthand
writers and that of novice shorthand writers
Finally, a comparison of the most favourite GUI of experienced shorthand writers and that of
novice shorthand writers (for the general purpose of use) is given in Figure 7.16. The study
finds that 100% of experienced shorthand writers prefer prototype 3 over the others, whereas
the majority of novice writers (80%) prefer prototype 4 over the others.
7.7 Discussion
This chapter presents the research and development findings of prototypes of the automatic
handwritten Pitmans Shorthand recogniser. It takes a step towards a commercialization of
the product by showing what can be done with the prototypes of the handwritten Pitmans
Shorthand recogniser.
According to the experimental results, it is found that prototype 3 and prototype 4,

developed by the research in this thesis, are preferred to the other two prototypes which were
developed by the collaborated research team. In addition, the study finds that the user
preference between prototype 3 and prototype 4 varies from time to time depending on the
purpose of use. Taking into consideration these findings, the end-user interface of the system
171
is finally designed with the integration of prototype 3 and prototype 4 so that users are
provided with a choice of two different layouts (i.e., prototype 3 or prototype 4) in order to
interact with the automatic handwritten Pitmans Shorthand recogniser on tablet PCs.
172
8. Conclusion
Conclusion
173
8. Conclusion
This chapter presents the summary and conclusion of researches carried out in this thesis and
it is divided into the following four sections:
Research work summary: presents a summary of the whole thesis by highlighting

the key objectives of each chapter in combination with an overall evaluation of the
work carried out in each chapter.
Contribution: draws attention to a list of major contributions that have been made to
the research and development in order to meet the overall objectives of the thesis,
outlined in chapter 1.
Future work: presents further research directions that may be taken in order to
improve upon the presented approaches for a commercially viable system.
Dissemination: presents a list of papers (progress reports of the findings of this

research) that have been presented and published in pattern recognition specific
journals and conference proceedings.
8.1 Research Work Summary

The overall aim of the research presented in this thesis was to investigate the novel lexicon
organization and contextual methods that could improve the state of art of the online
handwritten Pitmans Shorthand recognition problem.
Chapter 1 introduced the research of the thesis by highlighting a motivating need for the
development of new lexical post-processing methods to enhance the quality of text
interpretation of online handwritten Pitman shorthand outlines.
The chapter also
highlighted the necessity for the development of a functional user friendly graphical user
interface which facilitates a rapid text entry into pen based computing handheld devices
using handwritten Pitmans Shorthand.
174
8. Conclusion
A thorough literature review was carried out in Chapter 2 which overviews currently
available text entry systems into handheld devices and describes commonly used pattern
recognition and natural language processing algorithms that are applied to resolve problems
of the handwriting recognition.
The investigation into the efficiency of a conventional phonetic based word transcription
approach (where primitives of a shorthand script are firstly converted into a phonetic
representation in order to interpret it once more into corresponding English words with the
use of a phonetic dictionary) was discussed in Chapter 3. It was shown that the approach is
not robust against ambiguities of Pitmans Shorthand, in particular, ambiguities of the
random omission of vowels among outlines.
This leads to an exploration of the
development of a novel Bayesian Network based word transcription algorithm which aims to
enhance the solution using a primitive based transcription approach (Chapter 4). In the new
approach, primitives of a shorthand script are directly converted into autographic English
word(s) without getting transformed into phonemes with the use of a Pitmans Shorthand
lexicon. It was shown that the new primitive based approach outperforms the conventional
phonetic based method.
In relation to the primitive to text transcription approach, Chapter 5 presented the automatic
generation of a novel machine-readable Pitmans Shorthand lexicon which is an essential
component facilitating the primitive based transcription of the Bayesian Network based
word recogniser. The lexicon was shown to be a very effective mechanism for automatically
generating Pitmans Shorthand representations for any given words.
Following an extensive research into the word level transcription of handwritten Pitmans
Shorthand outlines, Chapter 6 proposed new contextual methods to enhance the solution
quality of the phrase level transcription problem. It was shown that the application of a well
175
8. Conclusion
known Viterbi algorithm in combination with Pitmans Shorthand specific contextual

knowledge is more effective, comparing to other contextual methods of the same framework.
Finally, prototypes of end user graphical user interfaces (GUIs), designed to demonstrate the
real time recognition of handwritten Pitmans Shorthand on a tablet PC are presented in
Chapter 7. This involves an evaluation of the user friendliness of the prototypes as well as
the selection of a final GUI for the whole system based on experimental results.
8.2 Contribution
A number of original contributions have been drawn from the thesis and they are identified
as follow:
For the first time, an investigation into the integration of the low-level online
handwritten Pitmans Shorthand recogniser with the high-level linguistic postprocessor is presented. It is shown that the integration has resulted in a more
productive quality research than the work reported in the literature of the same
framework.
The concept of phonetic based interpretation of segmented portions of handwritten

Pitmans Shorthand outlines into English words, reported in the literature is applied
to the linguistic post-processing of handwritten Pitmans Shorthand problem. The
appraisal delivers a valuable finding that highlights the need for the investigation of
a novel means of interpreting handwritten Pitmans Shorthand with a different
approach from the direction of existing concepts.
For the first time, the Bayesian Network representation is applied to the modelling
of handwritten Pitmans Shorthand outlines. A series of experiments are carried out
to analyse the transcription performance of the Bayesian Network based word
interpreter. The findings show that the Bayesian Network representation is robust
176
8. Conclusion
against stroke variation and highly effective for handling major ambiguities of
handwritten Pitmans Shorthand (i.e., classification errors and vowel errors).
For the first time, a machine readable Pitmans Shorthand lexicon is generated. The
findings show that the capability of the lexicon (i.e., ability to produce an accurate
Pitmans Shorthand representation for a corresponding word) plays an important
role in producing high quality solutions.
The application of Pitmans Shorthand specific contextual methods in combination

with a Viterbi algorithm is proposed for the phrase level transcription problem. The
algorithm shows promise towards achieving the best quality solution to the phrase
level transcription problems of handwritten Pitmans Shorthand.
A complete but resolute testing data set (which covers the whole range of rules of
Pitmans Shorthand) is proposed. The dataset is sufficient to be applied as a quality
benchmark dataset in the literature.
For the first time, the development of end user graphical user interfaces for enabling
Pitmans Shorthand data entry into tablet PCs is carried out. It is shown that the
final interface of the system is ready to be introduced as a commercially viable
prototype.
8.3 Future Work

Whilst this thesis presents several new methodologies to improve the state of the art of the
machine transcription of online handwritten Pitmans Shorthand, there are several research
questions that have been generated. Some of these are identified below.
8.3.1 Improvement upon the Overall System

A further research direction, working in close collaboration with the current collaborated
research team is encouraged for the future improvements upon the overall system for a
177
8. Conclusion
commercially viable product. A major reward gained from the future cooperative research
will be the removal of the limitations of the current system, in particular recovering
segmentation errors of the recognition engine by allowing an interactive processing between
the recognition engine and the transcription engine. In the current Bayesian Network model,
the modeling of segmentation ambiguities is infeasible mainly due to the lack of real time
interaction with the low level segmentation process of the recognition engine. With an
interactive supply of low level segmentation data, Bayesian Network based stroke models
for Pitmans Shorthand notations can be added in connection with existing shorthand outline
models. In this way, segmentation ambiguities can be embedded in probabilistic models and
recovered in the lexical post-processing stage. Overall, it may be worthwhile exploring a
solution to segmentation errors, which is a critical issue in the recognition of natural
handwriting.
In addition, further investigation of the transcription of punctuations and currency notations

of Pitmans Shorthand is worthwhile in order to complete the functionality of the Pitmans
Shorthand recognition system. Similar to short-forms, the number of symbols of the
punctuations and currency notations is limited and therefore the use of a Template Matching
algorithm to interpret these symbols is promising. In fact, the Template Matching algorithm
has already been established in the current recognition engine to recognise short-forms.
When the transcription engine receives punctuation data, a rigorous analysis of the
performance of the phrase level interpreter should be carried out in order to identify the
affect of punctuation over the phrase level transcription performance.
Finally, the full implementation of the integration of the contextual methods with the Viterbi
algorithm (proposed in Chapter 6) should be carried out by applying an appropriate a
professionally developed statistical language model available in the market. With the strong
foundation of the statistical language model in combination with the effective Pitmans
Shorthand specific contextual methods, the overall performance of the system will be
considerably better than other benchmark instances found in the current literature.
178
8. Conclusion
8.3.2 Application of the Presented System to the Real Life Problems

Although this thesis focuses on the automatic recognition of handwritten Pitmans Shorthand
as a rapid means of text entry into a tablet PC, it is worthwhile investigating the applicability
of the system to a variety of real-life problems where speed writing is critical. For instance,
it is interesting to analyse whether the system is beneficial to international travellers as a
language translation tool, running on their personal digital assistances (PDAs) or mobile
phones. Details of this idea have also been proposed in Chapter 1. In addition, previous
ideas stated in the literature such as the application of the Pitmans Shorthand recognition
system to real time subtitling of TV programmes and the real time transcription of lectures
and meetings need to be thoroughly studied in terms of feasibility. Finally, in order to make
the original aim of the thesis come true (i.e., to apply the system as a popular rapid text entry
into portable handheld devices), it is worthwhile formulating innovative as well as attractive
training methods via which general users can become more attracted to and interested in
Pitmans Shorthand. This may include the implementation of Pitmans Shorthand related
educational games, the invention of a shortcut method to learn Pitmans Shorthand and so
on.
8.4 Dissemination
The research carried out in this thesis has been disseminated in pattern recognition specific
international journals and conference proceedings. The following provides a list of papers
that have been published or submitted throughout the research.
1. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Knowledge based
transcription of Pitmans handwritten shorthand using word frequency and context,
179
8. Conclusion
Proceedings of the 7th IEEE International Conference on Development and Application

Systems, pp. 508-512, Suceava, Romania, 27-29 May 2004.
2. Ma Yang, Graham Leedham, Colin Higgins, & Swe Myo Htwe, Segmentation and
recognition of vocalized outlines in Pitman shorthand, Proceedings of the 17th International
Conference on Pattern Recognition, Vol. I, ISBN 0-7695-2128-2, pp. 441-444, Cambridge,
UK, 23-26 August 2004.
3. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Post Processing of
Handwriting Pitmans Shorthand using Unigram and Heuristic Approaches, Published in
Lecture Notes in Computer Science: Document Analysis Systems VI, 3163, SpringerVerlag, pp. 332-336, Proceedings of the IAPR workshop on document analysis systems,
University of Florence, Italy, 8-10 September 2004.
4. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, Segmentation and
recognition of phonetic features in handwritten Pitman shorthand, Pattern Recognition,
August 2004, Accepted and in press.
5. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Evaluation of Feature
Sets in the Post Processing of Handwritten Pitmans Shorthand, Proceedings of the 9th
International Workshop on Frontiers in Handwriting Recognition, ISBN 0-7695-2187-8, pp.
359-364, Kokubunji, Tokyo, Japan, 26-29 October 2004.
6. Swe Myo Htwe, Colin Higgins & Graham Leedham, Post-processing of handwritten
Phonetic Pitmans Shorthand using a Bayesian Network built on geometric attributes, In
Pattern Recognition and Image Analysis, Lecture Notes in Computer Science 3687,
180
8. Conclusion
Springer, Sameer Singh, Maneesha Singh, Chid Apte, Petra Perner (Eds.), pp. 569-579,
2005.
7. Swe Myo Htwe, Colin Higgins & Graham Leedham, Transliteration of online
handwritten phonetic Pitmans Shorthand with the use of a Bayesian Network, Proceedings
of the 8th International Conference on Document Analysis and Recognition, Vol. 2, pp.
1090-1094, Seoul, Korea, 29 August - 1 September 2005.
8. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, On-line recognition of
Pitmans Shorthand for fast mobile text entry, Proceedings of the 3rd IEEE International
Conference on Information Technology and Applications, pp. 686-691, Sydney, Australia,
4-7 July 2005.
181
References
References
Abo04
About
Pen
Input,
Ink
and
Recognition,
available
from
www.msdn.microsoft.com, 2004.
Bc85
Brooks C.P., Computer Transcription of Written Shorthand for the Deaf, PhD
Thesis, Faculty of Engineering, University of Southampton, 1985.
BN81
Brooks C.P. & Newell A.F., Simultaneous Transcription of Pitman's New Era
Shorthand, Int. Conf. on Microprocessors in Automation and Communications,
pp. 171-179, London, 27-29 Jan, 1981.
BN85
Brooks C.P. & Newell A.F., Computer Transcription of Handwritten Shorthand

as an Aid for the Deaf - A feasibility Study, International Journal of ManMachine Studies, Vol.23, No.1, pp.45-60, 1985.
BSH04
Bishop C.M., Svensen M., Hinton G.E., Distinguishing Text from Graphics in
On-line Handwritten Ink, in Proceedings of the Ninth International Workshop
on Frontiers in Handwriting Recognition (IWFHR'09), pp. 142-147, Tokyo,
Japan, 26-29 October, 2004.
CDL+99
Cowell R. G., Dawid A. P., Lauritzen S. L. & Spiegelhalter D. J., Probabilistic

Networks and Expert Systems, Springer, 1999.
CFH03
Chen F-S., Fu C-M. & Huang C-L., Hand Gesture Recognition Using a Realtime Tracking Method and Hidden Markov Models, Image and Vision
Computing, Vol.21(8): pp. 745-758, 2003.
182
References
CK04
Cho S-J. & Kim J.H., Bayesian Network Modeling of Strokes and their
Relationships for On-line Handwriting Recognition, Pattern Recognition, Vol.
37(22): pp. 253-264, 2004.
ESS+98
El-Yacoubi A., Sabourin R., Suen C.Y. & Gilloux M., Improved Model
Architecture and Training Phase in a Off-line HMM-based Word Recognition
System, in Proc. of the 14th International Conference on Pattern Recognition,
pp.17-20, Brisbaine, Australia, 1998.
FF93
Feddag A. & Foxley E., A Lexical Analyser for Arabic, International Journal of
Man-Machine Studies, Vol.38(2): pp.313-330, Feburary, 1993.
FW00
Freeman W. & Weiss Y., On the Fixed Points of the Max-Product Algorithm,
IEEE Transactions on Information Theory, 2000.
GB04
Gnter S., & Bunke H., HMM-Based Handwritten Word Recognition: on the
Optimization of the Number Of States, Training Iterations And Gaussian
Components, Pattern Recognition , Vol. 37, pp. 2069 - 2079, 2004.
GKM+97
Gloger J., Kaltenmeier A., Mandler E. & Andrews L., Rejection Management in
a Handwriting Recognition System, in Proc. 4th International Conference
Document Analysis and Recognition, pp.556-559, Ulm, Germany, 1997.
HD96
Huang C. & Darwiche A., Inference in Belief Networks: A Procedural Guide,

Intl. J. Approximate Reasoning, Vol.15(3): pp. 225-263, 1996.
Hd99
Heckerman D., A Tutorial on Learning with Bayesian Networks. In Learning in

Graphical Models, M. Jordan (ed.) MIT Press, Cambridge, MA, 1999.
183
References
HHL+04a
Htwe S. M., Higgins C., Leedham C.G. & Yang M., Knowledge Based
Transcription of Pitman's Handwritten Shorthand Using Word Frequency and
Context, in the Proceedings of the 7th IEEE International Conference on
Development and Application Systems, pp. 508-512, Suceava, Romania, 27-29
May 2004.
HHL+04b Htwe S.M., Higgins C., Leedham C.G. & Yang M., Post Processing of
Handwriting Pitmans Shorthand using Unigram and Heuristic Approaches, In
Document Analysis Systems VI, 3163, Lecture Notes in Computer Science,
Springer-Verlag, pp. 332-336, 2004.
HHL+04c
Htwe S.M, Higgins C., Leedham C.G. & Yang M., Evaluation of Feature Sets
in the Post Processing of Handwritten Pitmans Shorthand, Proceedings of the
9th International Workshop on Frontiers in Handwriting Recognition, ISBN 07695-2187-8, pp. 359-364, Kokubunji, Tokyo, Japan, 26-29 October 2004.
HHL+05a
Htwe S.M, Higgins C & Leedham C.G & Yang M., Transliteration of Online
Handwritten Phonetic Pitmans Shorthand with the use of a Bayesian Network,
Proceedings of the 8th International Conference on Document Analysis and
Recognition, Vol. 2, pp. 1090-1094, Seoul, Korea, 29 August - 1 September
2005.
HHL+05b Htwe S.M., Higgins C. & Leedham C.G & Yang M., Post-processing of
Handwritten Phonetic Pitmans Shorthand using a Bayesian Network Built on
Geometric Attributes, Proceedings of the 3rd International Conference on
Advances in Pattern Recognition, pp. 569- 579, Bath, UK, 22-25 August 2005.
184
References
HHL+05c
Htwe S.M., Higgins C., Leedham C.G. & Yang M., Post-processing of
Handwritten Phonetic Pitmans Shorthand using a Bayesian Network Built on
Geometric Attributes, In Pattern Recognition and Image Analysis, Lecture
Notes in Computer Science 3687, Springer, Sameer Singh, Maneesha Singh,
Chid Apte, Petra Perner (Eds.), pp. 569-579, 2005.
HLB00
Hu J., Lim S.G. & Brown M.K., Writer Independent On-line Handwriting
Recognition using an HMM Approach, Pattern recognition, Vol. 33(1): pp. 133147, 2000.
Hn97
Nishida H., Analysis and Synthesis of Deformed Patterns Based on Structural

Models, Computer Vision and Image Understanding, Vol.68(1): pp.59-71,
October 1997.
HSS02
Hu T., Silva L.C. D. & Sengupta K., A Hybrid Approach of NN and HMM for
Facial Emotion Classification, Pattern Recognition, Vol. 23(11): pp. 1303-1310,
2002.
HV93
Hoffman J.S. & Vidal J.J., Cluster Network for Recognition of Handwritten,
Cursive Script Characters, Neural Networks, Vol.6(1): pp.69-78, 1993.
Ja99
Bilmes J.A., Natural Statistical Models for Automatic Speech Recognition, PhD
Dissertation, University of California at Berkeley, May 1999.
JBS05
Justino E. J. R., Bortolozzi F. & Sabourin R., A comparison of SVM and HMM
Classifier in the Off-line Signature Verification, Pattern Recognition Letters,
185
References
Vol. 26(9): pp. 1377-1385, 2005.
JGJ+98
Jordan M. I., Ghahramani Z., Jaakkola T. S. & Saul L. K., An Introduction to

Variational Methods for Graphical Models. In M. Jordan (ed.) Learning in
Graphical Models, MIT Press, 1998.
JJ98
Jaakkola T.S. & Jordan M.I., Learning in Graphical Models, Chapter Improving
the Mean Field Approximations via the Use of Mixture Distributions. Kluwer
Academic Publishers, 1998.
Jm99
Jordan M. I.(ed), "Learning in Graphical Models", MIT Press, 1999.
Ka04
Koerich A.L., Rejection Strategies for Handwritten Word Recognition, in Proc.

9th Int'l Workshop on Frontiers in Handwriting Recognition, Tokyo, Japan, 2629 October, 2004.
KKL03
Kim M., Kim D. & Lee S-Y., Face Recognition Using the Embedded HMM
with Second-order Block-specific Observations, Pattern Recognition, Vol.
36(11): pp. 2723-2735, 2003.
KP01
Kapoor A. & Picard R., A real-time Head Nod and Shake Detector, in
Proceedings from the Workshop on Perspective User Interfaces, November
2001.
KSN+03
Kumar G.H., Shankar M.R., Nagabushan P., Anami B.S., Generation of Pitman
Shorthand Language Symbol for Diphthongs Grammalogues and Punctuation
from Spoken English Language Text: An Approach based on Discrete Wavelet
186
References
Transform and Dynamic Time Wrapping Technique, Proceedings: National

Workshop on IT Services and Applications (WITSA2003), Feb 27-28, 2003.
KSN04
Kumar H.G., Shivakumara P. & Nagaraju S., A New Invariant Feature

Extraction Technique for Classification of Pitman Shorthand Symbols,
Proceedings of the International Conference on Computational Science
(ICCS2004), Krakow, Poland, June 6-9, 2004.
LD84
Leedham C.G. & Downton A.C., On-line recognition of Shortforms in Pitman's

Handwritten Shorthand, Proc. of the 7th Int. Conf. on Pattern Recognition, (also
in 1985 Research Handbook, Department of Electronics, Southampton
University, pp. 39-41), pp. 1058- 1060, Montreal, Canada, 30 July-2 August
1984.
LD86
Leedham C.G. & Downton A.C., On-line Recognition of Pitman's Handwritten

Shorthand - an Evaluation of Potential, Int. J. Man-Machine Studies, Vol. 24,
pp. 375-393, 1986.
LD87
Leedham C.G. & Downton A.C., Automatic Recognition and Transcription of

Pitman's Handwritten Shorthand - an Approach to Shortforms, Pattern
Recognition, Vol. 20, No. 3, pp. 341-348, 1987.
LDB84
Leedham C.G., Downton A.C., Brooks C.P. & Newell A.F., On-line
Acquisition of Pitman's Handwritten Shorthand as a Means of Rapid Data
Entry, Proc. of the 1st Int. Conf. on Human-Computer Interaction, pp. 2.862.91, London, UK, 4-7 September 1984.
187
References
LDB85
Leedham C.G., Downton A.C., Brooks C.P. & Newell A.F., On-line
Acquisition of Pitman's Handwritten Shorthand as a Means of Rapid Data
Entry, Human-Computer Interaction - INTERACT '84, B. Shackel (Ed.), pp.
151-156, North Holland, 1985.
Lg84
Leedham C.G., Computer Acquisition and Recognition of Pitman's Handwritten

Shorthand, PhD Thesis, Department of Electronics, University of Southampton,
1984.
Lg89
Leedham C.G., Pitman's Handwritten Shorthand: Machine Recognition and

Transcription, Proc. of the 4th International Graphonomics Society Conference
on the Development of Graphical Skills, Trondheim, Norway, 24-26 July 1989.
Lg90
Leedham C.G., Automatic Recognition and Transcription of Pitmans

Handwritten Shorthand, Computer Processing of Handwriting, Plamondon R.
and Leedham C. G. (Eds.), World Scientific, ISBN 981-02-0408-6, pp. 235269, Singapore, November 1990.
LQ89
Leedham C.G. & Qiao Y., On-line Recognition of Vocalised Pitman shorthand
Outlines, Proc. of the IEE Colloquium on Character Recognition and
Applications, Digest No. 1989/109, pp. 10/1-10/5, Savoy Place, London, 2
October 1989.
LQ90
Leedham C.G. & Qiao Y., Correcting Recognition Errors in Handwritten,

Vocalised Pitmans Shorthand Outlines Using a Boltzmann Machine, Proc. of
the IEE Colloquium on Neural Nets in Human-Computer Interaction, Digest
No. 1990/179, pp. 2/1-2/7, Savoy Place, London, 14 December 1990.
188
References
LQ92
Leedham C.G. & Qiao Y., High Speed Text Input to Computer Using
Handwriting, Instructional Science, Vol. 21, pp. 209-221, September 1992.
Lr89
Lea R.N., Applications of fuzzy Sets to Rule-based Expert System

Development, Telematics and Informatics, Vol.6(3-4), pp. 403-406, 1989.
LY97
Li X. & Yeung D-Y., On-line Handwritten Alphanumeric Character

Recognition Using Dominant Points in Strokes, Pattern Recognition, Vol.31(1):
pp. 31-44, January, 1997.
MAG+02
Marukata S., Artires T., Gallinari P. & Dorizzi B., Rejection Measures for
Handwriting Sentence Recognition, in Proc. 8th International Workshop on
Frontiers in Handwriting Recognition, pp. 24-29, Niagara-on-the-Lake, Canada,
2002.
MB01
Marti U-V. & Bunke H., Using a Statistical Language Model to Improve the
Performance of an HMM-Based Cursive Handwriting Recognition System,
IJPRAI, Vol. 15(1): pp. 65-90, 2001.
Md98
Mackay D. J. C., Introduction to Monte Carlo Methods. In M. I. Jordan (ed.)

Learning in Graphical Models, pp. 175-204. Cambridge, MA: MIT Press.
Mic04
Microsoft Tablet PC - Ink Controls, available from www.msdn.microsoft.com,

2004
Mk01
Murphy M., Introduction to Graphical Models, Technical Report, May 2001.
Mk98
Murphy M., A Brief Introduction to Graphical Models and Bayesian Networks,
189
References
Technical Report, 1998.
Ml00
Mendez L.A.T., Viterbi Algorithm in Text Recognition, Pattern Recognition

Course, McHill University,2000.
Mm03
Miozzo M., On the Processing of Regular and Irregular Forms of Verbs and
Nouns: Evidence from Neuropsychology, Congnition, Vol.87(2), pp. 101-127,
March 2003.
Ms01
Marukatat S., Sentence Recognition through Hybird Neuro-Markovian

Modeling, in 6th ICDAR, pages 731-737, 2001.
MS04
Meyer C. & Schramm H, Boosting HMM Acoustic Models in Large

Vocabulary Speech Recognition, Proceedings of Signal and Image Processing
(SIP 2004), Honolulu, Hawaii, USA, 23-25 August 2004.
MS99
Manning C. & Schutze H., Foundations of Statistical Natural Language

Processing, MIT Press, 1999.
Mt98
Toshiyuki M., POBox: An efficient Text Input Method for Handheld and
Ubiquitous Computers, Proc. of the ACM Conference on Human Factors in
Computing System (CHI98), Los Angeles, USA, pp. 328-335, April 1998.
NB02
Nagabhushan P. & Anami B.S., Dictionary Supported Generation of English

Text from Pitman Shorthand Scripted
Phonetic Text, Proceedings of the
Language Engineering Conference (LEC'02), pp. 33, Hyderabad, India,

December 13 - 15, 2002.
190
References
NL92
Nair A. & Leedham C.G., Evaluation of Dynamic Programming Algorithms for

the Recognition of Shortforms in Pitmans Shorthand, Pattern Recognition
Letters, Vol. 13, pp. 605-612, August 1992.
Oj95
Osborne J., Pitman 2000: Shorthand First Course (Pitman 2000 Shorthand),
1995.
Pj88
Pearl J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible

Inference, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.
Pj88
Pearl J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible

Inference, Morgan Kaufmann, 1988.
PP02
Pitrelli J.F. & Perrone M.P., Confidence Modeling for Verification PostProcessing for Handwriting Recognition, in Proc. 8th International Workshop
on Frontiers in Handwriting Recognition, pp.30-35, Niagara-on-the-Lake,
Canada, 2002.
PS91
Peot M. & Shachter R., Fusion and Propagation with Multiple Observations in
Belief Networks, Artificial Intelligence, Vol.48: pp. 299-318, 1991.
PVM+03
Perraud F., Viard-Gaudin C., Morin E. & Lallican P-M., N-Gram and N-Class
Models for on Line Handwriting Recognition, In 7th ICDAR, pages 1053-1059,
2003.
QAC05
Quiniou S., Anquetil E. & Carbonnel S., Statistical language Models for Online Handwritten Sentence Recognition, Proceedings of the Eighth International
Conference on Document Analysis and Recognition (ICDAR 2005), pp. 516-
191
References
520, Seoul, Korea, 29 August - 1 September, 2005.
QL89
Qiao Y. & Leedham C.G., Segmentation and Classification of Consonant

Features in Vocalised Pitman Shorthand Outlines, Proc. of the Int. Conf. on
Image Processing (ICIP'89), pp. 294-298, Singapore, 5-8 September 1989.
QL91
Qiao Y. & Leedham C.G., Segmentation of Handwritten Pitman Shorthand

Outlines Using an Interactive Heuristic Search, Proc. of the 5th International
Graphonomics Society Conference, ISBN 0-9630246-0-4, pp. 157-162, Tempe,
Arizona, USA, 27-30 October 1991.
QL93
Qiao Y. & Leedham C.G., Segmentation and Recognition of Handwritten

Pitman Shorthand Outlines Using an Interactive Heuristic Search, Pattern
Recognition, Vol. 26, No. 3, pp. 433-441, March 1993.
Ri93
Russell I., Neural Networks: Theory and Applications, The Journal of

Undergraduate Mathematics and its Applications (UMAP), Vol 14(1), January
1993.
Rl89
Rabiner L., A Tutorial on Hidden Markov Models and Selected Applications in

Speech Recognition, Proc. IEEE 77(2): pp. 257--286, 1989.
Sa04
Seward A., A fast HMM Match Algorithm for very Large Vocabulary Speech
Recognition, Speech Comm, Vol. 42, pp. 191-206, 2004.
SJJ96
Saul L.K., Jaakkola T. & Jordan M.I., Mean Field Theory for Sigmoid Belief
Networks, Journal of Artificial Intelligence Research, Vol. 14: pp. 61-76, 1996.
192
References
SKN+04
Shankar M.R., Kumar G.H., Nagabhushan P., Anami B.S., Linear Predictive
Coefficient Based Approach for Generation of Pitman Shorthand Language
Characters from Spoken English Language, Proceedings of the International
Conference on Computational Science (ICCS2004), Krakow, Poland, June 6-9,
2004.
Sr98
Shachter R.D.,Bayes-ball: The Rational Pastime for Determining Irrelevance

and Requisite Information in Belief Networks and Influence Diagrams. In
Uncertainty in Artificial Intelligence, 1998.
Sy94
Swales Y.Y.G., Integrating Artificial Neural Networks with Rule-based Expert

Systems, Decision Support Systems, Vol.11(5), pp.497-507, June 1994.
Tab04
Tablet PC Platform Software Development Kit (SDK) v1.5, 2004, Available

from www.microsoft.com
VBB04
Vinciarelli V., Bengio S. & Bunke H., Offline Recognition of Unconstrained

Handwritten Texts using HMMs and Statistical Language Models, IEEE
Transactions on PAMI, Vol. 26(6): pp. 709-720, 2004.
WF99
Weiss Y. & Freeman W. T., Correctness of Belief Propagation in Gaussian

Graphical Models of Arbitrary Topology. In NIPS-12, 1999.
WH93
Wellman M.P. & Henrion M., Explaining Explaining Away, IEEE

Transactions on Pattern Analysis and Machine Intelligence archive, Vol 15(3):
pp. 287-292, 1993.
193
References
Win05
Windows
XP
Tablet
PC
Edition
2005
Features,
available
from
www.microsoft.com/windowsxp/tabletpc, 25 August, 2004.

Wy00
Weiss Y., Correctness of Local Probability Propagation in Graphical Models

with Loops, Neural Computation, Vol. 12: pp. 1-41, 2000.
XL02
Xiao X. & Leedham C.G, Signature Verification Using a Modified Bayesian

Network, Pattern Recognition, Vol.35, pp. 983-995, 2002.
YLH+04a
Yang M., Leedham C.G., Higgins C., & Htwe S.M, Segmentation and
Recognition of Vocalized Outlines in Pitman shorthand, Proceedings of the
17th International Conference on Pattern Recognition, Vol. I, ISBN 0-76952128-2, pp. 441-444, Cambridge, UK, 23-26 August 2004.
YLH+04b Yang M., Leedham C.G., Higgins C. & Htwe S.M., Segmentation and
Recognition of Phonetic Features in Handwritten Pitman Shorthand, Pattern
Recognition, August 2004, Accepted and in press.
YLH+05a
Yang M., Leedham C.G., Higgins C. & Htwe S.M, On-line Recognition of
Pitmans Shorthand for Fast Mobile Text Entry, Proceedings of the 3rd IEEE
International Conference on Information Technology and Applications, pp. 686691, Sydney, Australia, 4-7 July 2005.
YLH+05b Yang M., Leedham C.G., Higgins C. & Htwe S.M., An On-line Automatic
Recognition System for Pitmans Shorthand, Proceedings of the IEEE Region
10 Technical Conference (TENCON05), Melbourne, Australia, 21-24
November 2005.
194
References
YLH+05c
Yang M., Graham Leedham., Higgins C. & Htwe S.M., Critical Technological
Issues of Commercializing a Pitman Shorthand Recognition System,
Proceedings
of
the
5th
International
Conference
on
Information,
Communications and Signal Processing (ICICS 2005), Bangkok, Thailand, 6-9

December 2005.
YWP95
Yang L., Widjaja B. K. & Prasad R., Application of Hidden Markov Models for
Signature Verification, Pattern Recognition, Vol.28(22): pp. 161--170, 1995.
ZB04
Zimmermann M. & Bunke H., N-Gram Language Models for Offline

Handwritten Text Recognition, in 9th IWFHR, pages 203-208, 2004.
ZK03
Zhai S., & Kristensson
P-O., Shorthand Writing on Stylus Keyboard,
Proceedings of the ACM Conference on Human Factors in Computing Systems

(CHI 2003), CHI Letters 5, 1. ACM Press, pp 97-104, 2003.
195
Appendix
Chapter 9 Appendix A
Description of the 46 Rules Applied to the Automatic Generation of

a Machine-readable Pitmans Shorthand Lexicon
The 1st Rule
Complexity: direct conversion
Objective: to verify if an input (i.e., a set of phonemes) corresponds to a vocalised outline,
containing both consonants and vowels
Strategy: check if an input contains any consonants; if it does, an input is passed to the 2nd
rule; otherwise it instructs the program to omit the current input and get another.
The 2nd Rule

Complexity: indirect conversion.
Objective: to convert a combination of /Y/, /CH/, /JH/ or /ZH/ and a vowel into a diphthong
). To clarify the objective, consider an example for the word refuse (/R F Y
symbol (
U Z/) instead of the consonant /Y/ written as

/U/ and written as a diphthong
, it is combined with an adjacent vowel
(Figure 9.1).
Strategy: convert a combination of /Y/, /ZH/, /JH/ or /CH/ and /U/ or /AH/ into a diphthong
feature
.
Reference
for the word refuse
Pitmans Shorthand notation
/R/
/F/
/S/
/Y/
Pronunciation for the word refuse

/R F Y U Z/
196
(diphthong)
Appendix
Figure 9.1: Illustration of the use of diphthong feature
in a vocalised outline
The 3rd Rule

Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 9.2.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
Reference

word commence
/COM/
/M/
/NS/
//
Pronunciation for the word commence

/ K M N S/
Figure 9.2: Illustration of the use of a dot primitive for the sound COM at the
beginning of a word
The 4th Rule

Objective: to convert the sound WH of a word into a large hook. A sample Pitmans
Shorthand outline containing the sound WH is illustrated in Figure 9.3.
Strategy:
1. First, save words containing the sound WH in a list. In the current system, the list
contains 321 words, which are extracted from a lexicon of 99,281 words.
2. Check if a word representation of an input matches with any element of the list.
3. If it does, then the sound WH of an input is converted into a large hook, otherwise
do nothing.
197
Appendix
Reference
word where
/WH/
/R/
//
Pronunciation for the word where

/W H R/
Figure 9.3: Illustration of the use of WH hook in a vocalised outline
The 5th Rule

Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitmans Shorthand outline containing
the prefix IR- is illustrated in Figure 9.4.
Strategy:
8. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
9. Check if a word representation of an input matches with any element of the list;
10. if it does and if the prefix is IL-, convert the sound IL into an upward stroke L,
followed by a dot primitive and another extra upward stroke L;
11. if it does and if the prefix is IM-, convert the sound IM- into a curve M, followed a
dot primitive and another extra curve M;
12. if it does and if the prefix is IR-, convert the sound IR- into a downward curve R,
followed by a dot primitive and another extra downward curve R;
13. if it does and if the prefix is IN-, convert the sound IN- into a curve N, followed by a
dot primitive and another extra curve N.
14. if it does and if the prefix is UN-, convert the sound UN- into curve N, followed by a
dash primitive and another extra curve N.
198
Appendix
In addition, the 5th rule states that a consonant /D/ following the prefix IN- and UN- is not
allowed to be omitted. This is to avoid a conflict with the ND writing rule of Pitmans
Shorthand, in which the consonant /D/ following /N/ is omitted. Detail about the ND rule
can be referenced in Appendix B.
Reference
the word irregular
/R/
(start)
//
/R/
(middle)
//
/G/
/L/
/R/
(end)
//
/U/
Pronunciation for the word irregular

/ R G Y U L /
Figure 9.4: Illustration of the use of negative prefix IR- in a vocalised outline
The 6th Rule

Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it is not immediately following /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitmans Shorthand outline containing the sound /P L/ at the
Strategy:
4. If /N/ comes before /T L/ or /D L/, hook L is not used.
5. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used;
6. otherwise, replace phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and /GL/
of an input with a, b, c, d, e, f, g and h respectively, where
199
Appendix
a = hook + P stroke
b= hook + B strokes
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
Reference
for the word play
/P/
/L/
/PL/
//
Pronunciation for the word play

/P L /
Figure 9.5: Illustration of the use of PL hook in a vocalised outline
The 7th rule

Objective: to convert a pair of consonants FL, VL, ThL, ML, NL, SHL at the beginning of a
word into a series of a large hook L, followed by a corresponding consonant primitive. A
sample Pitmans Shorthand outline containing the sound /FL/ at the beginning of a word is
Strategy: replace phonemes /FL/, /VL/, /ThL/, /ML/, /NL/ and /SHL/ with a, b, c, d, e and f
respectively, where
a = large hook L + F stroke
b = large hook L + V stroke
c = large hook L + Th stroke
d = large hook L + M stroke
200
Appendix
e = large hook L + N stroke

f = large hook L+ SH stroke
Reference

for the word flow
/F/
/L/
/FL/
//
Pronunciation for the word flow

/F L /
Figure 9.6: Illustration of the use of FL hook at the beginning of a vocalised outline
The 8th Rule

Objective: to convert a series of consonants SPR, STR, SKR at the beginning of a word into
a series of circle S followed by P, T or K stroke respectively. Note that a consonant R is
omitted in this case. A sample Pitmans Shorthand outline containing the sound /SPR/ at the
Strategy: replace phonemes /SPR/, /STR/ and /SKR/ at the beginning of input phonemes
with a, b and c respectively, where
a = circle S + stroke P
b = circle S + stroke T
c = circle S + stroke K
Reference
for the word spray
/S/
/P/
/R/
/PR/
Pronunciation for the word spray

/S P R /
201
/SPR/
//
Appendix
Figure 9.7: Illustration of the use of SPR stroke in a vocalised outline
The 9th rule

Objective: to convert the sound /STER/ in the middle or at the end of a word into a large
loop. A sample Pitmans Shorthand outline containing the sound /STER/ at the end of a
word is illustrated in Figure 9.8.
Strategy:
1. If the sound /STER/ appears at the beginning of a word, it is not converted into a
large loop;
2. If the sound /STER/ is followed by a consonant /N/ at the end of a word, it is not
converted into a large loop;
3. otherwise, replace the sound /STER/ of a word with a large loop.
Reference

for the word master
/M/
/S/
/T/
/R/
/STER/
/AH/
Pronunciation for the word master

/M AH S T /
Figure 9.8: Illustration of the use of STER loop in a vocalised outline
The 10th rule

Objective: to omit primitives of the sound CON, COM, CUM or COG in the middle of a
word. A sample Pitmans Shorthand outline containing the sound /CON/ in the middle of a
word is illustrated in Figure 9.9.
Strategy:
1. If CON, COM, CUM or COG is the only sound of a word, it is not omitted;
202
Appendix
2. otherwise, omit primitives of the sound CON, COM, CUM or COG.

Reference
for the word reconsider
/R/
/K/
/N/
/S/
/D/
//
//
Pronunciation for the word reconsider

/R K N S D /
Figure 9.9: Illustration of omission of the sound CON in the middle of a vocalised
outline
The 11th rule

Objective: to convert the sound SES, SEZ, ZES or ZEZ at the end of a word into a large
circle. A sample Pitmans Shorthand outline containing the sound SEZ at the end of a word
is illustrated in Figure 9.10.
Strategy: if the word is ended with the sound SES, SEZ, ZES or ZEZ, replace the sound with
a large circle.
Reference
for the word bases
/B/
/SEZ/
//
Pronunciation for the word bases

/B S Z/
Figure 9.10: Illustration of the use of SEZ circle in a vocalised outline
The 12th rule

203
Appendix
Objective: to convert the sound /ED/ that makes a verb a past tense into a disjoined stroke T
or stroke D. A sample Pitmans Shorthand outline containing a disjointed /ED/ at the end is
Strategy: if a word is ended with /T/ or /D/ and if a vowel comes before /T/ or /D/, then
replace /vowel+T/ or /vowel+D/ at the end of the word with T or D stroke respectively.
Reference

for the word dated
/D/
/T/
//
/ED/
Pronunciation for the word dated

/D T D /
Figure 9.11: Illustration of the use of a disjointed /ED/ in a vocalised outline
The 13th rule

Objective: to convert the sound /ST/ at the beginning, in the middle or at the end of a word
into a shallow loop. A sample Pitmans Shorthand outline containing the sound /ST/ at the
Strategy:
1. if a word begins or ends with a vowel, /ST/ at the beginning or at the end of the
word is not converted into a shallow loop;
2. if the sound /ST/ is immediately followed by the sound /SHUN/, it is not converted
into a shallow loop;
3. if /S/ and /T/ belong to two different syllables, /ST/ is not converted into a shallow
loop;
4. if /ST/ comes before /NTS/ or /NDS/ at the end of a word, it is not converted into a
shallow loop;
5. if /ST/ is followed by /R/, it is not converted into a shallow loop;
204
Appendix
6. otherwise, replace /ST/ of an input with a shallow loop.

Reference
for the word stock
/S/
/T/
/ST/
/K/
//
Pronunciation for the word stock

/S T K /
Figure 9.12: Illustration of the use of ST loop in a vocalised outline
The 14th rule

Objective: to omit /T/ or /D/ at the end of one syllable words. This relates to the half-length
writing rule of Pitmans Shorthand and a sample (one-syllable) half-length outline is
Strategy: if a word is a one-syllable word and if there are consonants in a word other than
just /R/ and /T/ or /T/, then /T/ or /D/ at the end the word is omitted, provided that /T/ is not
following a voiced consonant and /D/ is not following an unvoiced consonant.
Reference

for the word coat
/K/
/T/
//
Pronunciation for the word coat

/K T/
Figure 9.13: Illustration of a one syllable half-length outline
The 15th rule

Objective: to convert the suffix ING into a dot primitive. A sample Pitmans Shorthand
outline containing the suffix ING is illustrated in Figure 9.14.
205
Appendix
Strategy: if an input ends with the sound ING, convert the sound ING into a dot primitive.
Reference

for the word coping
/K/
/P/
//
/ING/
Pronunciation for the word coping

/K P NG/
Figure 9.14: Illustration of the use of suffix ING in a vocalised outline
The 16th rule

Objective: to convert the suffix INGS into a dash primitive. A sample Pitmans Shorthand
outline containing the suffix INGS is illustrated in Figure 9.15.
Strategy: if an input ends with the sound /INGS/, convert the sound /INGS/ into a dash.
Reference

for the word takings
/T/
/K/
//
/INGZ/
Pronunciation for the word takings

/T K NG Z/
Figure 9.15: Illustration of the use of the suffix INGS in a vocalised outline
The 17th rule

Objective: to convert the suffix -SHIP into a SH stroke. A sample Pitmans Shorthand
outline containing the suffix SHIP is illustrated in Figure 9.16.
Strategy: if a word ends with the sound /SHIP/, then convert the sound /SHIP/ into a stroke
SH.
206
Appendix
Reference
for the word scholarship
/S/
/K/
/L/
/R/
/AW/
// /SHIP/
Pronunciation for the word scholarship

/S K AW L R SH P/
Figure 9.16: Illustration of the use of the suffix SHIP in a vocalised outline
The 18th rule

Objective: to convert the consonant /S/ or /Z/ into a downward stroke. A sample Pitmans
Shorthand outline containing the stroke Z is illustrated in Figure 9.17.
Strategy:
1. if a vowel comes before /S/ at the beginning of input phonemes, convert /S/ at the
beginning into a stroke S.
2. If input phonemes contain /S+vowel/, /S+vowel+S/, /S+vowel+ past tense D/ or
/S+vowel+ING/ at the end, convert /S/ at the end into a stroke S.
3. If input phonemes contain /S+vowel+S/ or /S+vowel+Z/ at the beginning, convert
/S/ at the beginning into a stroke S.
4. If a word starts with /Z/, convert /Z/ into a stroke Z.
Reference

for the word busy
/B/
/Z/
//
//
Pronunciation for the word busy

/B Z /
Figure 9.17: Illustration of the use of stroke Z in a vocalised outline
207
Appendix
The 19th rule

Objective: to convert the suffix MENT into a stroke N. A sample Pitmans Shorthand
outline containing the suffix MENT is illustrated in Figure 9.18.
Strategy: if the sound /MENT/ appears at the end of a word and is preceded by a straight
upstroke, /N/, /ST/ or /S/, it is converted into stroke N.
Reference

for the word apartment
/P/
/RT/
/MENT/
//
/AH/
Pronunciation for the word apartment

/ P AH R T M N T/
Figure 9.18: Illustration of the use of the suffix MENT in a vocalised outline
The 20th rule

Objective: to convert the suffix MENTAL into a series of a stroke N followed by a
downward L. A sample Pitmans Shorthand outline containing the suffix MENTAL is
Strategy: if an input contains the sound /MENTAL/ at the end, convert the sound
/MENTAL/ into a combination of a stroke N and a downward L.

the word experimental
Reference
/P/
/S/
/P/
/R/
/MENTAL/ //
Pronunciation for the word experimental

/ P R M N T L/
208
/ /
Appendix
Figure 9.19: illustration of the use of the suffix MENTAL in a vocalised outline
The 21st rule

Objective: to convert the suffix MENTALLY into a series of stroke N followed by
downward L and a vowel . In fact, primitive representations of the suffix MENTAL and
the suffix MENTALLY are very similar. Figure 9.20 illustrates a sample Pitmans
Shorthand outline containing the suffix MENTALLY.
Strategy: if input phonemes contain the sound /MENTALLY/ at the end, convert the sound
/MENTALLY/ into a series of a stroke N, followed by a downward L and vowel .

the word experimentally
Reference
/P/
/S/
/P/
/R/
/MENTAL/ //
/ /
Pronunciation for the word experimentally

/ P R M N T L/
Figure 9.20: illustration of the use of the suffix MENTALLY in a vocalised outline
The 22nd rule

Objective: to omit the syllables TER, -DER, -THER and -TURE of a word according to the
double-length rule of Pitmans Shorthand (description of the rule can be referenced in
appendix B). A sample Pitmans Shorthand outline containing the syllable -TER is
Strategy: if an input contains the syllable /TER/, /DER/, /THER/ or /TURE/ in the middle or
at the end, and if the syllable is not surrounded by incompatible neighbouring primitives, the
syllable is removed from the input phonemes.
primitives are given (Figure 9.22).
209
Samples of incompatible neighbouring
Appendix

the word after
Reference
/FTER/
/F/
//
Pronunciation for the word after

/ AH F T R/
Figure 9.21: Illustration of the omission of the syllable TER in a vocalised outline
Primitive pairs that cannot be represented by doubling
/F K/
/V K/
/F G/
/V G/
Figure 9.22: illustration of incompatible primitive pairs for doubling
The 23rd rule

Objective: to omit the consonant /D/, following the consonant /M/ or /N/ of a word
according to the MD and ND writing rules of Pitmans Shorthand (description of the rules
can be referenced in appendix B). A sample Pitmans Shorthand outline containing the
sound /MD/ is given in Figure 9.23.
Strategy:
1. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by the sound
/SHUN/, the consonant /D/ is not omitted;
2. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by the sound /N/,
/NS/ or /NT/ at the end of a word, the consonant /D/ is not omitted;
3. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by a vowel at the end
of a word, the consonant /D/ is not omitted;
4. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by vowel+/S/ or
vowel+/Z/ at the end of a word, the consonant /D/ is not omitted;
210
Appendix
5. otherwise, omit a consonant /D/, following /M/ or /N/.

the word madam
Reference
/MD/
/M/
//
//
Pronunciation for the word madam

/ M D M/
Figure 9.23: Illustration of occurrence of the sound MD in of vocalised outline.
The 24th rule

Objective: to convert a pair of consonants FR, VR, Thr, THR, SHR, ZHR, MR or NR at the
beginning of a word into a series of a small hook followed by a corresponding consonant
primitive. A sample Pitmans Shorthand outline containing the sound /FR/ at the beginning
of a word is illustrated in Figure 9.24.
Strategy: if an input contains the sound /FR/, /VR/, /Thr/, /THR/, /SHR/, /ZHR/, /MR/ or
/NR/ at the beginning, then replace phonemes / FR/, /VR/, /Thr/, /THR/, /SHR/, /ZHR/, /MR/
or /NR/ with a, b, c, d, e, f, g and h respectively, where
a = small hook + stroke F
b = small hook + stroke V
c = small hook + stroke Th
d = small hook + stroke TH
e = small hook + stroke SH
f = small hook + stroke ZH
g =small hook + stroke M
h= small hook + stroke N
211
Appendix

the word free
Reference
/F/
/R/
/FR/
//
Pronunciation for the word free

/ F R /
Figure 9.24: Illustration of the use of FR hook at the beginning of a vocalised outline
The 25th and 26th rules

Objective: to convert the syllable PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR,
SHR, ZHR, MR, NR at the beginning, in the middle or at the end of a word into a series of a
small hook followed by a corresponding consonant primitive. A sample Pitmans Shorthand
outline containing the syllable PR is illustrated in Figure 9.25.
Strategy: replace the syllable /PR/, /BR/, /TR/, /DR/, /CHR/, /JR/, /KR/, /GR/, /FR/, /VR/,
/Thr/, /THR/, /SHR/, /ZHR/, /MR/ and /NR/ with a, b, c, d, e, f, g, h, i, j, k, l, m, n, o and p
respectively, where
a = small hook + stroke P
b = small hook + stroke B
c = small hook + stroke T
d = small hook + stroke D
e = small hook + stroke CH
f = small hook + stroke J
g = small hook + stroke K
h = small hook + stroke G
i = small hook + stroke F
j = small hook + stroke V
k = small hook + stroke Th
212
Appendix
l = small hook + stroke TH

m = small hook + stroke SH
n = small hook + stroke ZH
o = small hook + stroke M
p = small hook + stroke N

the word paper
Reference
/P/
/R/
/PER/
//
Pronunciation for the word paper

/PPR/
Figure 9.25: Illustration of occurrence of the syllable PR in a vocalised outline
The 27th rule

Objective: to omit the consonant /R/ in the sound /SKR/ or /SGR/. A sample Pitmans
outline containing the sound /SKR/ is illustrated in Figure 9.26.
Strategy: if an input contains the sound /SKR/ or /SGR/, and if the sound /SGR/ is not at the
beginning of the input, then replace /SKR/ or /SGR/ with a or b respectively, where
a = circle S+ stroke K
b = circle S + stroke G

the word describe
Reference
/D/
/S/
/K/
/R/
/B/
/SKR/
Pronunciation for the word describe

/ D S K R I B/
213
//
/I/
Appendix
Figure 9.26: Illustration of occurrence of the sound SKR in a vocalised outline
The 28th rule

Objective: to convert the sound /KW/ or /GW/ of a word into a series of a large hook
followed by a corresponding consonant primitive. A sample Pitmans Shorthand outline
containing the sound /KW/ is illustrated in Figure 9.27.
Strategy: replace the sound /KW/ and /GW/ with a and b respectively, where
a = large hook + stroke K
b = large hook + stroke G

the word quick
Reference
/K /
/W/
/KW/
//
Pronunciation for the word quick

/ K W K/
Figure 9.27: illustration of occurrence of the sound KW in a vocalised outline
The 29th rule

Objective: to convert the syllable PL, BL, TL, DL, CHL, JL, KL or GL in the middle and at
the end a word into a series of a small hook followed by a corresponding consonant
primitive.
Strategy: replace the syllable PL, BL, TL, DL, CHL, JL, KL, and GL with a, b, c, d, e, f, g
and h respectively, where
a = small hook + P stroke
b= small hook + B stroke
c = small hook + T stroke
214
Appendix
d = small hook + D stroke

e = small hook + CH stroke
f = small hook + J stroke
g= small hook + K stroke
h= small hook + G stroke
The 30th rule

Objective: to convert the syllable FL, VL, THL, ML, NL or SHL in the middle or at the end
of a word into a series of a large hook followed by a corresponding consonant primitive. A
sample outline containing the syllable FL is illustrated in Figure 9.28.
Strategy: replace the syllable FL, VL, THL, ML, NL and SHL with a, b, c, d, e and f
respectively, where
a = large hook + stroke F
b = large hook + stroke V
c = large hook + stroke TH
d = large hook + stroke M
e = large hook + stroke N
f = large hook + stroke SH

the word flow
Reference
/F/
/L/
/F U L /
//
Pronunciation for the word flow

/ F U L F L/
Figure 9.28: illustration of the use of FL hook in a vocalised outline
215
Appendix
The 31st rule

Objective: to convert a pair of consonants /S/ and /H/ into a circle S. A sample Pitmans
Shorthand outline containing a series of /S/ and /H/ is illustrated in Figure 9.29.
Strategy: replace the sound /S H/ of input phonemes with a circle S.

the word racehorse
Reference
/R/
/H/
/S /
/R/
//
/AW/
Pronunciation for the word racehorse

/ R S H S AW R S/
Figure 9.29: illustration of occurrence of /S/ followed by /H/ in a vocalised outline
The 32nd rule

Objective: to omit a small hook R when there is a series of S+vowel+hookR or
ST+vowel+hookR.
A sample Pitmans Shorthand outline containing a series of
S+vowel+hookR is illustrated in Figure 9.30.

Strategy: if an input contains /S+vowel+hookR/ or /ST+vowel+hookR/, omit the hook R.

the word supper
Reference
/S /
/P/
/R/
/PR/
/S PER/
//
Pronunciation for the word supper

/ S P R/
Figure 9.30: illustration of the occurrence of a series of S+vowel+hookR in a vocalised

outline
216
Appendix
The 33rd rule

Objective: to convert the consonant /L/ into a downward stroke. A sample Pitmans
Shorthand outline containing the consonant /L/ is illustrated in Figure 9.31.
Strategy:
1. if /L/ is following a hook N, it is not converted into downward stroke.
2. If /L/ is not at the beginning of a word and if it is following the stroke /N/ or /NG/, it
is converted into a downward stroke.

the word only
Reference
/N/
/L/(down)
/ /
//
Pronunciation for the word only

/ N L /
Figure 9.31: illustration of the use of a downward stroke L in a vocalised outline
The 34th rule

Objective: to convert the consonant /F/ or /V/ at the end of a word into a small hook. A
sample Pitmans Shorthand outline containing a hook F is illustrated in Figure 9.32.
Strategy:
1. if /F/ or /V/ is the only consonant of an input, it is not converted into a small hook.
2. if /F/ or /V/ is followed by the sound -ING, -INGS, T, D, S or Z and if it is not
preceded by the consonant /L/, it is converted into a small hook.
217
Appendix

the word rough
Reference
/R/
/F/
/R F /
//
Pronunciation for the word rough

/ R F/
Figure 9.32: Illustration of the use of hook F in a vocalised outline
The 35th rule

Objective: to convert a consonant /F/ or /V/ in the middle of a word into a small hook. A
sample Pitmans Shorthand outline containing the hook F in the middle is illustrated in
Figure 9.33.
Strategy:
1. if /F/ or /V/ is at the beginning of an input, it is not converted into a small hook.
2. If /F/ or /V/ is in the middle of an input, and if neighbouring consonants of /F/ or /V/
are two straight downward strokes, or a combination of a straight downward stroke
and a curve (Th, S or Z), the consonant /F/ or /V/ is converted into a small hook.

the word divide
Reference
/D/
/V/
/D V/
/D/
//
/I/
Pronunciation for the word divide

/ D V I D/
Figure 9.33: Illustration of the use of hook V in the middle of a vocalised outline
The 36th rule

218
Appendix
Objective: to convert the sound SHUN in the middle or at the end of a word into a small or
large hook. A sample Pitmans Shorthand outline containing a large SHUN hook at the end
is illustrated in Figure 9.34.
Strategy:
1. if the sound SHUN appears at the beginning of a word, it is not converted into a
hook;
2. if the sound SHUN is preceded by a circle S or Z, it is converted into a small hook;
3. otherwise, the sound SHUN in the middle or at the end of an input is converted into
a large hook.

the word attention
Reference
/T/
/N/
/SHUN /
//
Pronunciation for the word attention

/ T N SH N/
Figure 9.34: Illustration of the use of a large SHUN hook in a vocalised outline
The 37th rule

Objective: to convert the consonant /N/ of a word either into a small hook or a circle. A
sample Pitmans outline containing a hook N at the end is illustrated in Figure 9.35.
Strategy:
1. if /N/ is at the beginning of an input, it is not converted into a small hook.
2. If /N S/ appears at the end of an input and is preceded by a curve stroke, the
consonant /N/ is not converted into a small hook.
3. If /N/ is immediately following /S/ or /Z/, it is not converted into a small hook.
4. If /N/ is at the end of an input, it is converted into a small hook.
219
Appendix
5. If /N Z/ appears at the end of an input and is preceded by a curve stroke, the sound
/N Z/ is converted into a series of a small hook followed by a small circle.
6. If /N Z/ or /N S/ appears at the end of an input and is preceded by a straight stroke,
the sound /N Z/ or /N S/ is converted into a small circle.
7. If the sound /N SES/ or /N ZES/ appears at the end of an input and is preceded by a
straight stroke, the sound /N SES/ or /N ZES/ is converted into a large circle.
8. If the sound /N STER/ or /N ST/ appears at the end of an input and is preceded by a
straight stroke, the sound /N STER/ or /N ST/ is converted into a large loop or small
loop respectively.
9. If the sound /N T S/ or /N D S/ appears at the end of an input and is preceded by a
straight stroke, the sound /N T S/ or /N D Z/ is converted into a small circle.

the word alone
Reference
/L/
/N/
/N /
//
//
Pronunciation for the word alone

/ L N/
Figure 9.35: Illustration of the use of hook N at the end of a vocalised outline
The 38th rule

Objective: to convert the consonant /L/ of a word into an upward stroke. A sample Pitmans
Shorthand outline containing the consonant /L/ is illustrated in Figure 9.36.
Strategy: replace consonant /L/ with an upward stroke L.
220
Appendix

the word mail
Reference
/M/
/L/
/ /
Pronunciation for the word mail

/ M L/
Figure 9.36: Illustration of the use of an upward stroke L in a vocalised outline
The 39th rule

Objective: to omit the consonant /D/ or /T/ in a word of two or more syllables according to
the half-length rule of Pitmans Shorthand (description of the rule can be referenced in
appendix B). A sample Pitmans Shorthand outline containing the omission of /D/ and /T/ is
Strategy:
1. if neighbouring consonants of /T/ or /D/ are incompatible, the consonant /T/ or /D/ is
not omitted. A list of incompatible neighbouring consonants in relation to the
omission of /T/ or /D/ is given (Figure 9.38).
2. otherwise, /T/ or /D/ is omitted.
the word deduct
Reference
/D/
/D D/
/ K/
/K T/
//
//
Pronunciation for the word deduct

/ D D K T/
Figure 9.37: Illustration of omission of T or D in a vocalised outline
221
Appendix
A series of primitives that cannot be represented by halving
F/V + K/G + T/D
R+T/D
K/G +K/G +T/D
S+T
L+K/G+T/D
Figure 9.38: Illustration of incompatible combination of primitives for halving
The 40th rule

Objective: to convert the suffix LY into a vowel primitive according to the LY rule of
Pitmans Shorthand (description of the rule can be referenced in appendix B). A sample
Pitmans Shorthand outline containing the suffix LY is illustrated in Figure 9.39.
Strategy: if the input contains the suffix LY, replace the suffix LY with a vowel .

the word solely
Reference
/S/
/L/
/ /
/ /
Pronunciation for the word solely

/SLL/
Figure 9.39: Illustration of the use of suffix LY in a vocalised outline
The 41st rule

Objective: to convert the consonant /R/ into an upward or downward stroke. Sample
Pitmans Shorthand outlines containing an upward or downward R are illustrated in Figure
9.40.
Strategy:
222
Appendix
1. if /R/ appears at the beginning of an input, it is converted into an upward stroke.

2. If /R/ is followed by a sounded vowel at the end of an input, it is converted into an
upward stroke.
3. If /R/ appears at the end of an input, it is converted into a downward stroke.
4. If /R/ is preceded by a vowel at the beginning of an input, it is converted into a
downward stroke.
5. If /R/ is followed by a circle S or SES at the end of an input, it is converted into a
downward R.
6. If /R/ is followed by /M/, it is converted into a downward stroke.
the words rail and erase
Reference
/R/
/L/
/ R/
/S/
//
//
Pronunciations for the words rail and erase

/ R L/, / R S/
Figure 9.40: Illustration of the use of an upward or downward R in vocalised outlines
The 42nd rule

Objective: to convert the consonant /H/ at the beginning of a word into a dash primitive. A
sample Pitmans Shorthand outline containing a dash H at the beginning is illustrated in
Figure 9.41.
Strategy: if /H/ appears at the beginning of an input and is followed by /M/, /L/ or downward
R, it is converted into a dash primitive.
223
Appendix

the word home
Reference
/H/
/H/
/M/
//
Pronunciation for the word home

/ H M/
Figure 9.41: Illustration of the use of a dash H in a vocalised outline
The 43rd rule

Objective: to reverse an orientation of initially hooked FR, VR, Thr or THR according to the
reverse rule of Pitmans Shorthand (description of the rule can be referenced in appendix
B). A sample Pitmans Shorthand outline containing a reversed VR hook is illustrated in
Figure 9.42.
Strategy: if a series of /small_hook+F/, /small_hook+V/, /small_hook+Thr/ or /small_
hook+THR/ is followed by an upstroke or a horizontal stroke, it is converted into a series of
/small hook +R/, /small hook+R/, /small hook+stroke S/ or /small hook +stroke Z/
respectively.

the word cover
Reference
/K/
/V R/
/V R/ (reversed)
//
Pronunciation for the word cover

/ K V R/
Figure 9.42: Illustration of the use of reverse VR hook in a vocalised outline
The 44th rule

224
Appendix
Objective: to convert consonants that have not been converted into geometric features into
their corresponding primitives.
Strategy:
1. replace the consonant /P/ with stroke P
2. replace the consonant /B/ with stroke B
3. replace the consonant /T/ with stroke T
4. replace the consonant /D/ with stroke D
5. replace the consonant /K/ with stroke K
6. replace the consonant /G/ with stroke G
7. replace the consonant /M/ with stroke M
8. replace the consonant /N/ with stroke N
9. replace the consonant /NG/ with stroke NG
10. replace the consonant /F/ with stroke F
11. replace the consonant /V/ with stroke V
12. replace the consonant /Th/ with stroke Th
13. replace the consonant /TH/ with stroke TH
14. replace the consonant /W/ with a series of small hook followed by upward R
15. replace the consonant /Y/ with a series of small hook followed by upward R
16. replace the consoant /CH/ with stroke CH
17. replace the consonant /JH/ with stroke JH
18. replace the consonant /SH/ with stroke SH
19. replace the consonant /S/ with a small circle
20. replace the consonant /Z/ with a small circle
21. replace the consonant /ZH/ with a stroke ZH
22. replace the consonant /H/ with a series of small circle, followed by stroke R
The 45th rule: extract vowels of a word and append them to the end of consonant primitives.
225
Appendix
The 46th rule: convert vowels into their corresponding geometric primitives.
226
Appendix
Appendix B
Certain Rules of Pitmans Shorthand Mentioned in the Discussion

on the Automatic Generation of a Machine-readable Pitmans
Shorthand Lexicon
Rules name
Description of the rule stated in Pitmans Shorthand[Oj95]
MD, ND
Strokes M and N are halved and thickened to add the

following sound of D.
Double length strokes
All curved strokes are doubled in length to represent the

addition of the syllables TER, -DER, -THER, -TURE.
Half-length strokes
In words of two or more syllables a stroke is generally halved

to indicate the following sound T or D.
Suffix -LY
The suffix LY is represented by upward L and the third place

vowel.
Reversed FR, VR, Thr,
The initially hooked FR, VR, Thr, THR are always reversed
THR
when immediately following upstrokes and horizontals.
227

SweMyoHtweThesis PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SweMyoHtweThesis PDF

Uploaded by

Copyright:

Available Formats

Lexicon Organisation and

Contextual Methods for Online

by Swe Myo Htwe, BSc

Thesis submitted to the University of Nottingham for the degree of

To my parents and fianc

LINGUISTIC POST PROCESSING OF HANDWRITTEN PITMANS SHORTHAND .. 1

EVALUATION OF EXISTING TEXT INPUT SYSTEMS FOR HANDHELD DEVICES ....................... 16

NATURAL LANGUAGE PROCESSING ALGORITHMS FOR HANDWRITTEN PHRASE RECOGNITION

SYSTEM OVERVIEW ............................................................................................................ 41

BAYESIAN NETWORK BASED WORD TRANSCRIPTION ........................................... 61

SYSTEM OVERVIEW ........................................................................................................... 63

Analysis of word transcription accuracy using stroke-combination data set ............... 99

Analysis of the recognition accuracy vs. the transcription accuracy.................................99

Analysis of the recognition accuracy vs. the transcription accuracy.................................94

Analysis of the recognition accuracy vs. the transcription accuracy............................... 103

DISCUSSION ..................................................................................................................... 107

GENERATION OF A MACHINE-READABLE PITMANS SHORTHAND LEXICON

INTRODUCTION ............................................................................................... 111

INTRODUCTION ............................................................................................... 138

CONTEXTUAL REJECTION STRATEGY ............................................................................... 139

INTRODUCTION ............................................................................................... 149

CONCLUSION ....................................................................................................................... 171

INTRODUCTION ............................................................................................... 172

RESEARCH WORK SUMMARY ............................................................................................ 172

FIGURE 1.1: SCOPE OF THE THESIS........................................................................................... 6

FIGURE 2.12: S IS CONDITIONALLY DEPENDENT ON R GIVEN AN OBSERVED

FIGURE 3.4: SAMPLE NEIGHBOURHOODS PREDEFINED IN THE NEAREST

FIGURE 4.2: ILLUSTRATION OF BAYESIAN NETWORK BASED WORD

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

Linguistic Post Processing of Handwritten Pitmans

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

Examples are provided by

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

Taking into consideration a simultaneous verbatim written transcription of speech, the

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

Compared to the time of the previous research,

In an on-line recognition, an input is in the form of successive points of strokes collected in

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

Scope of the thesis

Figure 1.1: Scope of the thesis

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

1.2 Brief Overview

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

A set of questions relating to the linguistic post processing:

The linguistic postprocessor is based on the recognition engine developed by our

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

Tablet PC based graphical user interface

1st sample input outline

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

(a) An electronic version of a shorthand dictionary has been successfully created by

A complete framework for the online recognition of handwritten Pitmans

A set of questions relating to the development of a mobile PC application:

For what types of handheld devices is the system intended to be applicable?

For what kind of domain/scenario is the application aimed to be applicable?

The application is aimed to be applicable:

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

(a) as a rapid text input system on handheld devices

A set of questions relating to training and testing of the overall system:

How is the whole system performance evaluated?

The overall system performance is evaluated under different criteria such as

1. Linguistic Post Processing of Handwritten Pitmans Shorthand

1.3 Synopsis of the Dissertation