

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 18, NO. 3, MAY/JUNE 1988
Correspondence
A Model for Variability Effects in Handprinting with
Implications for the Design of Handwriting
Character Recognition Systems
JEAN R. WARD, ASSOCIATE MEMBER, IEEE, AND
THEODORE KUKLINSKI, MEMBER, IEEE
Abstract: A new implementation model for handwriting variability that
is combinatorial and syntactic is presented. The work is a synthesis of
theoretical aspects and observations of handprinting styles in a large
collection gathered during the development of a commercial on-line char-
acter recognition system. Several physical factors affecting variability in
handprint are reviewed. Next, a combinatorial, predictive model for these
factors and its use in a particular design of a device for on-line character
recognition is presented, in which the variability effects are treated sep-
arately from the recognition process. Beyond the use in this system, the
model implies several substantial testing and performance problems for
reliable handwritten character recognition.
I. INTRODUCTION
Many prototypical systems for handwriting character recogni-
tion have been developed in the past 30 years [1]. More interest-
ingly, the last five years have finally seen several commercial
devices for on-line character recognition introduced in the world
market [2], [3], [7], [8], [10], [11]. There are also commercial
devices primarily for Japanese character recognition, but with
application to English-language writing [15]-[17]. We believe this
reflects both new developments in what has been a hard problem (reliable machine recognition of freely written text) and increased attention to different aspects of writing and "graphical gesture" input to solve human factors issues in computer/user interfaces [18]-[23].
Brown [24] finds one drawback in the current high level of commercial interest in developing on-line character recognition: many significant new developments are not being published. Serious functional issues in static recognition of handprinted characters were reported only after actual commercial use [25]. Similar questions have come up after commercial use of on-line character recognition. Nilssen [26] discusses problems, unrelated to the underlying character recognition technology, that arise in conforming hand-printed input to systems intended for keyboard input. Gould [27] makes similar points on the application of speech-recognition technology. These questions have drawn increased interest at recent technical conferences [28], [29].
We present a predictive model for many variability effects in handprinting. Our model concerns "generative" variability (how characters are written) rather than "perceptual" variability, in which the same shape is recognized differently by different subjects, or by the same subject at different times. Parts of the model are a synthesis of factors previously considered only in isolation.

Kuklinski [30] describes the combinatorial explosion of
enumerating handwriting styles, but does not propose a
solution.
Manuscript received October 22, 1986; revised July 20, 1987.
J. R. Ward is with Teledyne/TAC, 10 Forbes Rd., Woburn, MA 01801.
T. Kuklinski is with Recognitive Sciences, 24 Henshaw Terr., West Newton,
MA 02165.
IEEE Log Number 8717378.
Murase [31] mentions the variability resulting from a writer's
tendency to connect different strokes of sketch symbols,
but deals with only a few samples. Shridhar [32] also
mentions cases of connected strokes within a character, and
between characters: the same effect has been reported for
Chinese characters [33]. Yoshida [34] deals with this ques-
tion indirectly, by artificially connecting all strokes of each
character before attempting recognition. Tappert [35] men-
tions the existence of intermediate partially connected writ-
ing forms between pure script and pure blockprinting.
Some efforts to model the mechanical motion of the
hand while writing [38], [39] are more oriented to simulat-
ing handwriting output than predicting the variety of forms
of handwriting input.
Brown [24] and Ward [40] report on some artifacts of input
that must be dealt with in any real-world handwriting or
handprinting input system. Phillips and Ward [44], [45] give
a detailed analysis of the limiting performance of digitizing tablets and the effects on the image "seen" by a computer. Tappert [46] refers briefly to the problem of hooking and continuation marks.
Cox et al. [47] deal with the embellishments to machine-printed characters separately from the base form of the
character. This work parallels for "typed" input the work
here on handwritten input: the difference is that the embel-
lishments for machine-printed characters are purely artifi-
cial and different in form from the variability in handwrit-
ing caused by the physical act of writing.
Suen [48] and Yacyk [49] mention the tendency for even
trained writers to revert to their usual writing styles. In
each case, the solution seems to be to classify these samples
as "improper," and therefore not to be considered funda-
mental errors in a recognition system not designed for
them. Shillman et al. [50] point out that "classical shapes do not occur frequently" in handwritten input, and the same point is made by Brooks and Newall [53]. To us, this implies that a system premised on recognizing ideal writing forms is fundamentally limited; the point is discussed in more detail by Blesser et al. [54].
Most collections of written characters are stored as static information for optical character recognition (OCR) development. These data do not reflect some features (such as retraces) or the time-sequence information used in on-line recognition, and the resolution typical of OCR character images [1], usually 12 × 12 or 24 × 24, is low enough to cover up many variability effects. We have a collection of over 85 000 characters for on-line character recognition and are aware of one other [55]; other commercial organizations working in this area may have similar collections.
II. TOPOGRAPHICAL LABELING
In this section we present the particular feature extraction and
labeling scheme used in an implementation of our model. The
complete system incorporates other feature extraction methods,
described generally by Blesser et al. [56], which are not used in
the predictive model for variability presented in this paper.
We use a simplified form of chain codes based on points of
local extrema in X or Y. The use of local extrema is independent
of stroke segmentation. Chain code schemes are frequently used
in handwriting character recognition, and much work has gone
1988 IEEE
into their refinement [58], [59]. We make no claim that our
particular labeling scheme is fundamentally more sound than
others, but use it only to show the implementation of this
particular variability model in a functioning character recogni-
tion system.
The chain code scheme is similar to the methods presented by Convis [60] and Crane [61]; later work by Crane et al. has also produced commercial products for on-line character recognition [11]. At least one author reported an effective recognition scheme based solely on chain codes [62]; another reference, using dynamic matching, is [63]. The use of chain codes has also been extended to image-processing applications that do not involve pattern recognition [64].
A. Chain Codes in Modeling Human Character Recognition
There are reasons to believe that the use of feature points
similar to these chain codes and the segmentation of characters at
points based on these feature points may relate to some funda-
mental mechanism in human recognition and writing of char-
acters.
Our particular scheme is not sensitive to the relative positions
of the chain code points; the points are purely local features.
Several researchers studying theoretical models of human vision and
perception have reported that local features appear to play a
more dominant role than relative position in human visual per-
ception [65], [66], or that local features are extracted at a more
fundamental, precognitive level than the global, positional infor-
mation [67], [121]. Early developers of handwritten character
recognition systems cited the lack of sensitivity to position as
contributing to the "robustness" of the recognition [68], [140].
Chain code schemes are fundamental to many character recog-
nition systems described in the literature: this is an informal
indication that something like chain codes at least represents the
individual researcher's mental process in recognizing characters.
Convis [60] uses essentially the same scheme as ours to represent
handwriting with a minimum number of points and maintain
legible fidelity to the original image, as defined by a human
reader. Naus and others in the same research group [69], [70]
describe human perception of characters based on "functional
attributes" whose feature points map directly to various types of
chain code points. Before the advent of machine character recog-
nition for handprinting, Wright [71] described historical writing
styles in terms of segmentation within a character and the chain
of curve points within each segment, using similar forms to ours.
B. Chain Codes in Modeling Writing Variability
Our model is based on neuromotor and mechanical effects in
the handwriting process, instead of perceptual models of recogni-
tion. Preliminary studies we have made indicate similar predictive
results when this model is applied to alphabets substantially
different from the Roman alphabet used in North America and
Europe [73]. This is interesting in light of the comment by Mori et al. [74] that Hiragana, an alphabet functionally similar to the Roman alphabet, is about as difficult to recognize as Kanji, generally considered the most difficult because of its sheer size; we suspect the difficulty with Hiragana is due to the variability effects described here.
C. Chain Code Classification as Used in Figs. 1-4
Our chain code scheme is based on local extrema with respect
to the X- and Y-axes. It is affected by changes in rotation:
orientation is itself a significant factor in recognizing characters.
In Figs. 1 and 2 we label local extrema as uppermost, lowermost, leftmost, or rightmost points.
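As a minimal sketch of this kind of labeling (not the authors' implementation), the code below assumes a stroke is a list of (x, y) tablet samples with Y increasing upwards, and marks a sample as a local extremum wherever the direction of motion along X or Y reverses. Endpoints are not labeled, the minimum-movement threshold of Section II-D is omitted, and the name `label_extrema` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _sign(v: float) -> int:
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def label_extrema(stroke: List[Point]) -> List[Tuple[int, str]]:
    """Label local extrema of one pen-down stroke.

    Returns (index, label) pairs sorted by sample index, where label is
    "top", "bottom", "left", or "right". Y is assumed to increase upwards.
    """
    events = []
    for axis, max_label, min_label in ((0, "right", "left"), (1, "top", "bottom")):
        direction = 0  # +1 increasing, -1 decreasing, 0 not yet known
        for i in range(1, len(stroke)):
            step = _sign(stroke[i][axis] - stroke[i - 1][axis])
            if step == 0:
                continue  # no movement along this axis; keep the current direction
            if direction != 0 and step != direction:
                # Motion reversed, so the previous sample is a local extremum.
                events.append((i - 1, max_label if direction > 0 else min_label))
            direction = step
    events.sort(key=lambda event: event[0])
    return events


# A one-stroke "W" drawn left to right: three turning points, no X extrema.
w_stroke = [(0, 4), (1, 0), (2, 3), (3, 0), (4, 4)]
print(label_extrema(w_stroke))  # [(1, 'bottom'), (2, 'top'), (3, 'bottom')]
```

Because only reversals are recorded, the labels depend on local motion alone and not on where the points lie relative to one another, which matches the purely local character of the features described above.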
The purpose is to quantify and uniquely label the number of variants predicted by the model. Many other features that may be of interest are handled separately in our recognition system; this chain code scheme is not sensitive to them. One example is a point that might be considered a "near-miss" for a chain code point, as in Fig. 3; an algorithm for adding this special case is given by Freeman and Davis [59].

Fig. 1. Chain-code examples: one-stroke. (a) "W." (b) "3." (c) "E." (d) "M."
Fig. 2. Chain-code examples: multistroke. (a) "W." (b) "5." (c) "E." (d) "M."
Fig. 3. (a) One-stroke "B," showing a near miss for a chain-code point. (b) Enlarged view.
Fig. 4. Quadrants for nominally straight strokes: Upwards-Left, Upwards-Right, Downwards-Left, Downwards-Right.
It is useful to introduce some special cases to extend this classification scheme: for example, certain kinds of "dot" strokes that may contain regions of "rat's nest" writing motion, or straight-line strokes without detectable curvature, which have only two chain-code points. We classify nominally straight strokes by which of four quadrants they are drawn in (Fig. 4), similar to the method of Hidai et al. [75].
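The quadrant classification of a nominally straight stroke can be sketched from its first and last samples alone. This sketch assumes Y increases upwards; how an exactly vertical or horizontal stroke is assigned is not stated in the text, so ties are broken arbitrarily toward "Right" and "Downwards". The function name is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def straight_stroke_quadrant(start: Point, end: Point) -> str:
    """Classify a nominally straight stroke by the quadrant of its direction.

    Y is assumed to increase upwards. Ties (dx == 0 or dy == 0) are broken
    toward "Right" and "Downwards"; this is an arbitrary convention.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if dx == 0 and dy == 0:
        raise ValueError("degenerate stroke: start and end points coincide")
    vertical = "Upwards" if dy > 0 else "Downwards"
    horizontal = "Right" if dx >= 0 else "Left"
    return f"{vertical}-{horizontal}"


# The left leg of an "A" written from the apex down to the lower left.
print(straight_stroke_quadrant((2, 4), (0, 0)))  # Downwards-Left
```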
Geometric chain code schemes obey a number of syntactic
rules. These are similar to the syntactic constraints described by
Maurer et al. [76] for descriptions of graphical images; Berthod
and Maroy [77] describe a similar case specific to handwriting.
We use these syntactic constraints to check for logical incon-
sistency in the implementation.
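One syntactic rule that any extremum-based chain code must obey illustrates such a consistency check: along a single stroke, maxima and minima on the same axis must alternate, because the pen cannot reach two tops without passing through a bottom between them. The sketch below checks only that one rule. It is an illustrative assumption, not the rule set of [76] or [77], and it uses the hypothetical label names "top," "bottom," "left," and "right."

```python
from typing import List

# Each label belongs to one axis; its partner is the opposite extremum on that axis.
AXIS = {"top": "y", "bottom": "y", "left": "x", "right": "x"}
OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}


def check_alternation(labels: List[str]) -> List[str]:
    """Return messages describing violations; an empty list means consistent."""
    last_on_axis = {}
    errors = []
    for pos, label in enumerate(labels):
        if label not in AXIS:
            errors.append(f"position {pos}: unknown label {label!r}")
            continue
        axis = AXIS[label]
        if last_on_axis.get(axis) == label:
            errors.append(
                f"position {pos}: repeated {label!r} with no {OPPOSITE[label]!r} between"
            )
        last_on_axis[axis] = label
    return errors


print(check_alternation(["bottom", "top", "bottom"]))  # []
print(check_alternation(["top", "right", "top"]))  # one violation reported
```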
D. Sensitivity to Variation
The enumeration of variants is dependent on the features
counted in the enumeration. We believe the implementation
presented here is in practice less sensitive to variations (does not
count as many variations as "distinct") than some other systems.
Our scheme, for example, counts chain codes taken along only
four directions: others use as many as 16 [78].
We believe our method is generally effective in separating
generative variability effects from functional attribute features.
Shillman [79] points out several features that might be measured
in a recognition scheme that have no direct effect on the human
perception of what the character is.
Our implementation of this variability model is not sensitive to stroke size, position, or intersections, and is relatively insensitive to degree of curvature except as this affects the labeling of chain-code points. This is an intentional choice in our implementation. These latter positional features are treated separately as "functional attributes" [69] in a later stage of analysis that does not then need to deal with the variability effects mentioned here. Including additional factors in the model (such as relative position) would contribute to an even greater combinatorial explosion of variants.

Fig. 5. Nominally straight horizontal line, showing threshold for labeling of chain-code points.
We require a minimum discrete movement in X or Y before a
chain code is labeled (the value is intentionally small), and allow
up to four chain code points in a nominally straight line parallel
to a coordinate axis (see Fig. 5). The exact mechanism has been
described separately for image processing [43].
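A minimum-movement requirement of this kind behaves like hysteresis: a reversal is accepted only after the pen has moved back by more than a threshold from the running extreme, so tremor on a nominally straight line does not produce spurious chain-code points. The sketch below works on one coordinate at a time. The threshold values and the choice to apply the test independently to X and Y are assumptions, not the mechanism of [43], and the function name is hypothetical.

```python
from typing import List, Tuple


def thresholded_extrema(values: List[float], threshold: float) -> List[Tuple[int, str]]:
    """Return (index, "max" | "min") for reversals larger than `threshold`.

    The start of the sequence is never reported as an extremum.
    """
    extrema: List[Tuple[int, str]] = []
    if not values:
        return extrema
    direction = 0        # +1 rising, -1 falling, 0 not yet established
    hi_idx = lo_idx = 0  # indices of the running maximum and minimum
    for i, v in enumerate(values):
        if v > values[hi_idx]:
            hi_idx = i
        if v < values[lo_idx]:
            lo_idx = i
        if direction >= 0 and values[hi_idx] - v > threshold:
            # Fell back far enough from the running maximum: confirm it.
            if direction > 0:
                extrema.append((hi_idx, "max"))
            direction, lo_idx = -1, i
        elif direction <= 0 and v - values[lo_idx] > threshold:
            # Rose far enough above the running minimum: confirm it.
            if direction < 0:
                extrema.append((lo_idx, "min"))
            direction, hi_idx = 1, i
    return extrema


# Y coordinates along a nominally straight horizontal stroke with tremor.
jittery = [0.0, 0.3, -0.2, 0.1, 0.0]
print(thresholded_extrema(jittery, threshold=1.0))  # []
print(thresholded_extrema(jittery, threshold=0.1))  # [(1, 'max'), (2, 'min')]
```

With a small threshold the tremor yields extra points, which is why a nominally straight line can carry several chain-code points; a larger threshold suppresses them.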
III. BASE FORMS
Many character recognition schemes are based on comparison of input shapes to ideal forms: Pavlidis [80] defines pattern recognition in exactly these terms. Our model deals with the variations on the ideal, or base, form in writing styles due to handwriting effects. Different forms of writing the same nominal "alphabet" can result in forms that even human readers must be specifically trained to recognize. For example, the text in Fig. 6 is written in the script style (Fraktur) [81] taught in German-speaking countries up until 1945, and is illegible to most modern readers in English-speaking countries.
Suen [82] refers to a study of the writing styles taught in North
America, as distinct from a statistical study of what writing styles
actually occur. As a practical matter, certain styles of "printing"
predominate, varying with the population studied. Our system
uses slightly different collections of base forms for different
national markets. Within the confines of this model, we treat
fundamentally different base forms as separate paradigms for
writing, and make no attempt to treat radically different writing
styles for a character as merely variants of the same character.
A. Choice of Base Forms for Implementation
We include as base forms those reported in two comprehensive
statistical studies by Suen [83], [85], along with others encountered
in our own studies. All of these studies treat differences in stroke
order as different base forms.
The most consistent source of different base forms is North
American versus European background of the subjects. There are
three obvious differences shown in Fig. 7: crossed and uncrossed "7"; crossed and uncrossed "Z"; and single-mark and inverted-V "1." We have also found less well-known differences in other
characters, such as "4" and "H" as seen in Fig. 8.
Certain other base forms result (Fig. 9) from the subject's
professional background [71]. Usually only subjects with a tech-
nical education use the crossed-zero, and only subjects with
drafting training use the inverted-T two-stroke one.
B. Script-Like Forms
We place no particular constraints on the subject's writing style other than that the characters be written separately, not connected to one another. In this "unconstrained" data (Fig. 10), we find several disconnected
forms of upper-case script letters. However, the majority of these
"pseudoscript" forms are identical with the variants of printed
letters predicted by connecting the individual strokes, so we do
not need to treat them as different base forms.
Fig. 6. "Franz Josef Strauss" written in 19th Century German script. (Note: Mark over the final "u" is written to distinguish it from lowercase "n." It is not the form of the umlaut.)
Fig. 7. (a) American "7." (b) European "7." (c) American "Z." (d) European "Z." (e) American "1." (f) European "1."
Fig. 8. (a) American "4." (b) European "4." (c) American "H." (d) European "H."
Fig. 9. (a) Crossed "0." (b) Uncrossed "0." (c) Drafting "1." (d) Typical "1."
Fig. 10. (a) Printed "b." (b) Script "b." (c) Printed "D." (d) Script "D." (e) Printed "H." (f) Script "H." (g) Printed "M." (h) Script "M."
Fig. 11. (a) Printed "C." (b) Script "C." (c) Printed "L." (d) Script "L."
Fig. 12. (a) Printed "A." (b) Script "A." (c) Printed "G." (d) Script "G."
The script-like forms in Fig. 11 are variants of the handprinted form, not true cursive forms. The similarity of the connected-stroke forms to manuscript writing style is noteworthy. Some additional script forms are identical to the pseudoscript forms that result from hooks and loops caused by the pen motion from the previous character. We find a few true cursive forms such as those in Fig. 12; these must be treated as distinct forms.
The factors we outline here are known to have played a role in
the evolution of various script forms of alphabets, both for
connected writing and separate characters [86].
Fig. 13. Segmentation points for one-stroke letter "B."
Fig. 14. Segmentation points for the letter "Z."
C. Segmentation Within a Character
Based on the work of Wright [71], we describe the variable
portions of characters as segments delimited by the points corre-
sponding to the ends of the original strokes in the base paradigm,
and the ends of the connecting segments. In general, we chose as
segment boundaries the points corresponding to the endpoints of
the strokes in the base form. These generally become sharp
changes in direction while writing the more connected derived
forms. We have also found it useful to pick boundary points
which correspond to sharp changes in writing motion from
leftwards to rightwards, or upwards to downwards, similar to the
chain codes described by Crane [61]. Generally, the two criteria
select the same points. We give two examples here (Figs. 13 and
14).
For the one- and two-stroke variants of "B" we use the
endpoints for the strokes of the three-stroke variant, which gives
us a total of four segments for the one-stroke variant. The
segment boundaries become sharp changes in direction in the
connected variants.
For the one-stroke variant on "Z," the segmentation points are
the sharp corners at the upper-right and lower-left. These chain
code points are substantially invariant in written samples of the
character.
Please note that in this implementation these segmentation
points are for chain code expressions, not individual samples, and
are chosen by the designer for convenience when defining the
base forms for a character. The points chosen can be compared
to those found by the automated procedure used in elastic
matching [87].
D. Stroke Direction
Our model does not specifically predict stroke order or direction for forms of a character. In our implementation, differences in stroke order are generally treated as different base forms: these effects have been studied elsewhere [85] and are well known among researchers in handwriting character recognition. Our model does appear to predict stroke order in some cases, such as when the initial "real" stroke is omitted from a retraced stroke connected to the next stroke in a character.
We have verified in our collection samples the assumptions
that vertical strokes in base forms are made from the top down,
horizontal strokes go from left to right for right-handed writers,
and vertical strokes usually precede horizontal strokes within a
character. The physiological basis for these tendencies is de-
scribed by Hollerbach [88].
IV. VARIABILITY FACTORS
We now list the individual factors incorporated in this model of handprint variability. For each, we list possible physical causes. Most of these factors can be present in different instances of writing by the same individual, leading to intrasubject variability. The occasional tendency to connect strokes, even by writers who do not "normally" do so, is an obvious case.

Fig. 15. Base form and variants for "N."
Fig. 16. One-stroke "T"s.
Fig. 17. (a) Straight. (b) Left bow. (c) Right bow. (Figure annotation: "35% of length.")
A. Connection of Strokes
The first effect we incorporate in this model is the tendency for
sequential strokes to be connected within a character. The cause
is a failure to lift the pen off the paper between strokes, leading
to a more script-like form of the character. Several examples were
given earlier in the discussion of true script and pseudoscript
variants.
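Because each gap between sequential strokes can either be bridged by a connecting segment or left as a pen lift, a base form with n strokes in a fixed order yields 2^(n-1) connection patterns; for a three-stroke base form such as the "A" of Section V, these are the three-stroke form, two 2-stroke forms, and one one-stroke form. The sketch below enumerates them, assuming strokes are lists of (x, y) points and modeling each connecting segment simply as the straight pen path from one stroke's end to the next stroke's start. It does not generate the variants of the connecting segment itself (retraces, loops, reversed second strokes) that the model also predicts, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def connect_strokes(strokes: Sequence[Stroke], joins: Sequence[bool]) -> List[Stroke]:
    """Merge stroke k and stroke k+1 wherever joins[k] is True (pen not lifted).

    Concatenating the point lists makes the pen-up travel part of the drawn
    trace, i.e. an implicit straight connecting segment.
    """
    if len(joins) != len(strokes) - 1:
        raise ValueError("need exactly one join flag per gap between strokes")
    merged = [list(strokes[0])]
    for joined, stroke in zip(joins, strokes[1:]):
        if joined:
            merged[-1].extend(stroke)
        else:
            merged.append(list(stroke))
    return merged


def connection_variants(strokes: Sequence[Stroke]) -> List[List[Stroke]]:
    """All 2 ** (n - 1) patterns of connected and lifted gaps, stroke order fixed."""
    return [
        connect_strokes(strokes, joins)
        for joins in product((False, True), repeat=len(strokes) - 1)
    ]


# Hypothetical three-stroke "A": left leg, right leg, then the crossbar.
a_strokes = [[(2, 4), (0, 0)], [(2, 4), (4, 0)], [(1, 2), (3, 2)]]
print([len(v) for v in connection_variants(a_strokes)])  # [3, 2, 2, 1]
```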
The strokes interpreted in on-line character recognition can
also be connected by poor mechanical design of the writing
stylus, even though the written image in ink shows a clear break
between the strokes. This purely mechanical effect is rare with
the tablet and stylus we use, and we do not consider this cause in
this model.
When the connecting segment retraces the segment just writ-
ten, the connecting segment can be on either side of the original
stroke segment. When the two strokes being connected are close
together, the tendency is to reverse the second stroke to make a
smoother connection to the first. We give examples for the
one-stroke variant of "N" in Fig. 15.
Connecting segments are more common on sequential vertical
or horizontal strokes, and on strokes whose ends are close, than
on strokes where one is horizontal and the other vertical. The
one-stroke "T"s in Fig. 16 are quite infrequent in the data we
have collected in North America, compared to the two-stroke and
one-stroke variants of the "N," although they are more frequent
in Europe [89]. Their inclusion in other systems [11] may reflect
either a difference in the populations used to collect data, or
mechanical problems in the tablet stylus. We also note national
variations for writers of Chinese characters [33].
B. Curves
Nominally straight strokes and segments are only approxi-
mately straight in handwriting. Brooks and Newall [53] state that
vertical curved strokes are very difficult to write reliably. Fig. 17
shows how we allow any nominally straight segment to be curved
to produce a total bow from straight of 35 percent of its full
length.
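A bow tolerance like this can be checked by measuring the largest perpendicular distance of a segment's samples from the chord joining its endpoints, relative to the chord length. The sketch assumes that this ratio is what "35 percent of its full length" means, that Y increases upwards, and that a small straightness tolerance (5 percent here, a hypothetical value) separates "straight" from "bowed." It reports the side of the bulge relative to the direction of travel, so the mapping to the left and right bows of Fig. 17 depends on the direction in which the stroke was drawn. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def bow_ratio(points: List[Point]) -> float:
    """Signed maximum deviation from the start-end chord, divided by chord length.

    Positive means the bulge lies to the left of the direction of travel
    (with Y increasing upwards); negative means it lies to the right.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment endpoints coincide")
    deviation = 0.0
    for x, y in points[1:-1]:
        # Cross product of the chord with the vector to the sample, scaled to a distance.
        d = (dx * (y - y0) - dy * (x - x0)) / length
        if abs(d) > abs(deviation):
            deviation = d
    return deviation / length


def classify_bow(points: List[Point], straight_tol: float = 0.05, max_bow: float = 0.35) -> str:
    """Label a nominally straight segment as straight, bowed, or outside the model."""
    ratio = bow_ratio(points)
    if abs(ratio) <= straight_tol:
        return "straight"
    if abs(ratio) > max_bow:
        return "outside model"
    return "bow left of travel" if ratio > 0 else "bow right of travel"


print(classify_bow([(0, 0), (0.5, 0.2), (1, 0)]))  # bow left of travel
```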
Fig. 18. (a) Straight. (b) "S"-recurve. (c) "Z"-recurve.
Fig. 19. (a) Open loop. (b) Partially collapsed. (c) Fully collapsed cusp.
Fig. 20. (a) Cusp "V." (b) Small loop. (c) Enlarged view of (b).
Nominally straight vertical and horizontal strokes present a
particular problem. We allow up to two chain-code points for a
mild double curvature, as in Fig. 18.
C. Looping/Cusping
Blesser [56] and Shillman [79] mention several forms of functional looping; Mori et al. [74] cite this work in discussing Kanji handprinting. In contrast, this model addresses not the perceptual questions of looping but the generative variability in the writing process.
There are two general effects we include in this model: the
tendency for intentional loops to collapse into a cusp, and the
tendency for nominal cusps to vary into small loops. In Fig. 19
the loop at the bottom of a one-stroke "8" can collapse into a
physical cusp while still remaining a legible character.
There is a counteracting tendency for sharp cusps to be written
as small loops, particularly in the presence of motion at right
angles to the predominant writing motion. We show a cusp-"V"
written with a slight tilt, and the possible resulting small loop in
Fig. 20.
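At the level of the physical trace, a small loop and a cusp can be told apart geometrically: a loop makes the trajectory cross itself, whereas a cusp, including an exact retrace, reverses direction without a proper crossing. The sketch below applies that test to a short polyline around a turning point. It is an illustrative assumption with hypothetical function names, and it labels only the physical feature, not whether the writer intended a loop or a cusp.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): +1 left turn, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _properly_cross(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    """True if segments p1-p2 and p3-p4 cross at a point interior to both."""
    return (_orient(p3, p4, p1) * _orient(p3, p4, p2) < 0
            and _orient(p1, p2, p3) * _orient(p1, p2, p4) < 0)


def loop_or_cusp(points: List[Point]) -> str:
    """'loop' if the polyline crosses itself, otherwise 'cusp'.

    Adjacent segments share an endpoint and are skipped; collinear overlap,
    as in an exact retrace, is not counted as a crossing.
    """
    n = len(points) - 1  # number of segments
    for i in range(n):
        for j in range(i + 2, n):
            if _properly_cross(points[i], points[i + 1], points[j], points[j + 1]):
                return "loop"
    return "cusp"


print(loop_or_cusp([(0, 0), (2, 0), (2, 1), (1, 1), (1, -1)]))  # loop
print(loop_or_cusp([(0, 2), (0, 0), (0, 2), (1, 2)]))  # cusp
```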
Cusps occur frequently in retraced segments, such as those that
occur because of a connecting segment between strokes. We
believe the cause of this effect in Fig. 20 is the tendency to avoid
clockwise movement in rapid writing, as reported by Hollerbach
[88]. Badie and Shimura [90] report that "corners" are also
caused in script writing by different ways of writing an intended
curve or endpoint. Note the interchange of cusps and small loops
in the retraced sections of the following "M" characters in
Fig. 21.
Functional distinction of loops and cusps is a difficult problem in theory [46]. Our studies show it is especially difficult in practice, because a physically identical feature can come from either an intended functional loop or an intended cusp; the physical feature spaces overlap. Arkadev and Braverman [91] describe, in several papers, one character "invading the feature space" of another.
Fig. 21. Interchange of physical loop and cusps in one-stroke "M."
Fig. 22. Intentional retrace during overwriting of four-stroke "E."
Fig. 23. (a) Two-stroke "B." (b) Retraced "B." (c) Omitted retrace.
Fig. 24. (a) Four-stroke "W." (b) One-stroke "W."
D. Disappearance of Retrace/Initial-Trace
We distinguish between two types of retrace: intentional atypi-
cal retrace, and connecting segment retrace.
The first is an intentional retrace we have seen subjects resort
to when they are unable to get a character recognition device to
accept a particular shape. Fig. 22 shows how a subject retraced
the strokes of the character repeatedly, much as he or she would
to write heavily over a character already on the page. Our on-line
recognition system is like others in registering writing motion, not
visual image: we have incorporated a limited pre-processing
capability to remove some degree of these atypical retraces, but
they remain an unbounded problem. Since these characters are
not part of any subject's normal writing, and occur only because
of feedback of improper recognition, we do not incorporate this
sort of retrace in our model.
Connection retrace, on the other hand, leads to an additional
factor we do incorporate in this model: segment omission. If an
initial downstroke is exactly retraced by the connecting segment
to the next stroke, we frequently find cases where the initial
down-stroke segment is completely omitted, leaving only the
nominal "retrace" segment. Fig. 23 shows the progression of the
effect for a written "B." We find this same effect in Fig. 24 as the
source of the one-stroke "W" from its four-stroke base form.
Litvin [92] cites studies of this effect for Russian writing.
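Segment omission can be expressed as a rewrite on the trajectory: if the connecting segment runs back over the whole initial stroke, the variant without that initial trace is also predicted. The sketch below uses a simple geometric test (every sample of the connecting segment lies within a tolerance of the initial stroke, and the segment ends near that stroke's start). The tolerance value and all function names are hypothetical, and, as Section IV-G notes, no such fixed measure reliably separates a partially omitted retrace from a large hook.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def _dist_to_polyline(p: Point, line: List[Point]) -> float:
    """Smallest distance from p to any segment of the polyline."""
    return min(_dist_to_segment(p, line[k], line[k + 1]) for k in range(len(line) - 1))


def retraces(initial: List[Point], connecting: List[Point], tol: float) -> bool:
    """True if `connecting` stays within tol of `initial` and ends near its start."""
    if math.dist(connecting[-1], initial[0]) > tol:
        return False
    return all(_dist_to_polyline(p, initial) <= tol for p in connecting)


def omit_initial_trace(initial: List[Point], connecting: List[Point],
                       rest: List[Point], tol: float = 0.5) -> Optional[List[Point]]:
    """Return the variant with the initial trace dropped, or None if it is not retraced."""
    if not retraces(initial, connecting, tol):
        return None
    return list(connecting) + list(rest)


# Hypothetical two-stroke "B": downstroke, retrace up, then the two bowls.
down = [(0, 4), (0, 0)]
up = [(0, 0), (0, 4)]
bowls = [(2, 3), (0, 2), (2, 1), (0, 0)]
print(omit_initial_trace(down, up, bowls))
# [(0, 0), (0, 4), (2, 3), (0, 2), (2, 1), (0, 0)]
```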
E. Stroke Order
Our model deals mainly with intrasubject variability, but the same effects also apply to intersubject variability. One aspect of intersubject variability that we neither explicitly include nor exclude is the case where different subjects make the same strokes for the same character in a different order. We treat these cases as different base forms. Within our collection, we see a general trend to make vertical strokes before horizontal strokes, and to make strokes on the left before strokes on the right. This is consistent with the left-to-right order for writing groups of characters in text. We have found few cases where the order of strokes varies within one subject's writing.

Fig. 25. One-stroke "1."
It is well known that many left-handed writers differ from most right-handed writers in stroke order and direction. We have further found a distinction
between left-handed writers who hold the pen in a mirror-image
of the right-handed position and push it ahead of the hand, and
those who curl the wrist over the pen and drag the pen behind
the hand.
F. Hooking/Continuation
We distinguish between hooks at the start of a downward
stroke, and continuation marks at the end of a stroke as shown in
Fig. 25, where both effects are on the same character.
Hooks at the start of a stroke tend to come up from the left,
since the previous motion was generally from the bottom of a
previous character or stroke. Continuation marks are most visible
at the ends of strokes, continuing up to the start of the next
character or stroke. For left-handed writers with idiosyncratic
writing styles, the direction of the hooks and continuation marks
is more difficult to predict.
Mechanical aspects of the writing stylus increase the incidence
of hooks [40], [44]. However, these features occur naturally as
well: they can be considered a form of "allograph selection"
according to the previous context [93].
Tappert [46] mentions the problem of hooking and continuation at the starts and ends of strokes as a defect of the mechanical switch in the tablet stylus. We have observed this effect even in the inked or marked image left by writing on paper. Later work [35] mentions an attempt to remove hooks artificially by an image preprocessing algorithm [36], with only partial success because "true" hooks were also removed; the problem also exists with Pitman shorthand [53]. We have derived a different method for removing similar effects caused by slow writing motion [43].
Physically, the hooking can be caused in the "true" writing
image as part of the tendency to connect strokes, where the pen
was not lifted from the page immediately at the nominal end of
the stroke. We base this model on the actual written ink image
left on the page, not on the digitized image. Since both stylus
mechanics and writing motion produce the same effect, we do
not have a requirement to preprocess the image for hook and
continuation removal, and thus are not affected by accidental
removal of a "true" hook.
G. Intermediate Forms of Retracing/Hooking
Initial segments of retraces can vary considerably in length for
the same writer: there is no reliable measure to distinguish a
partially-omitted retrace from the larger sizes of hooks.
This effect is caused by the dynamics of writing motion,
similar to the problem of controlling the touch-down point when
landing a light aircraft on a long runway. The horizontal motion
is much greater than the vertical motion to lift or put the pen in
contact with the paper; the angle of attack is close to horizontal.
Slight variations in either vertical motion or angle of attack will
cause a disproportionate change in the point at which the pen
actually makes contact with enough pressure to start the flow of
ink.
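The landing analogy can be made concrete with a deliberately simplified geometry, which is an assumption and not a measured model of pen motion: let the pen tip descend along a straight path at a shallow angle θ to the paper, starting at height h. The horizontal distance d travelled before contact, and its sensitivity to errors in h and θ, are

```latex
d = \frac{h}{\tan\theta}, \qquad
\frac{\partial d}{\partial h} = \frac{1}{\tan\theta}, \qquad
\frac{\partial d}{\partial \theta} = -\frac{h}{\sin^{2}\theta}.
```

Both sensitivities grow without bound as θ approaches zero. For example, at θ = 10°, 1/tan θ ≈ 5.67, so an extra 0.1 mm of vertical travel moves the contact point about 0.57 mm along the stroke. These figures only illustrate the geometry and are not measurements from our data.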
H. Broken Strokes
A well-known problem in most attempts at handwriting char-
acter recognition is broken strokes caused by false pen-lifts when
the pen-tip-down sensing mechanism opens during a stroke. This
is also described by Tappert [46] for broken ligatures in con-
nected script writing. The mechanical causes of this problem are
analyzed by Phillips [44].
Broken strokes are very infrequent in our collection of over 85 000 characters. We use a writing stylus and tablet specifically developed for writing input, with a maximum pressure to register pen-down of 50 g and a maximum stylus travel to close of 0.8 mm, which closely mimics the pen pressure needed to make visible ink flow in an off-the-shelf ball-point pen refill. Typical figures for styli not developed for writing input are 200 to 500 g and 3 to 6 mm. The tablet and stylus may be of interest to other researchers [94].
Since this phenomenon has been shown to be purely mechanical, and because of our stylus design does not occur in our device, we do not include this factor in this model.
V. ENUMERATION OF VARIANTS
We will now illustrate the use of this model by giving a
description of the variants on the upper-case written "A." In this
implementation, the model is applied separately to the base
forms for each character. We chose the letter "A" as an example
for two reasons.
1) It is a frequent letter in English language text: we have
more samples than for other letters, such as "Q."
2) It illustrates all of the effects described in this paper.
The base form for the upper-case "A" has three strokes, as shown
in Fig. 26. This is variant number 2 from the samples described
in [85]. The arrows show the direction of the strokes, and the
numbers show the sequence of strokes. Chain code points are
indicated. Other stroke-order variants of the three-stroke variant
of "A" are possible, but are rare in our data and those of our
sources.
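The text does not specify how chain codes are computed from the digitized pen trace. As a minimal sketch, assuming a four-direction code (U, D, L, R) chosen by the dominant axis of each pen step, with y increasing upward and consecutive repeats collapsed (consistent with codes such as ULDRLURD used later in this paper), a stroke could be reduced to its chain code as follows. The function name `chain_code` and the tie-breaking rule are our assumptions.

```python
from itertools import groupby


def chain_code(points):
    """Reduce a pen trace to a run-collapsed four-direction chain code.

    Assumptions (not from the paper): each step is labeled by its dominant
    axis, ties go to the horizontal direction, y increases upward, and
    zero-length steps are ignored.
    """
    steps = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # the pen did not move between samples
        if abs(dx) >= abs(dy):
            steps.append("R" if dx > 0 else "L")
        else:
            steps.append("U" if dy > 0 else "D")
    # Collapse runs such as "UUDD" into "UD".
    return "".join(direction for direction, _ in groupby(steps))


# A hypothetical one-stroke inverted "V" (the two tall strokes of an "A").
print(chain_code([(0, 0), (1, 3), (2, 6), (3, 3), (4, 0)]))  # UD
```

Collapsing runs makes the code insensitive to stroke length, which suits a model concerned with the order of directions rather than their extent.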
B. Variations on Three-Stroke Form
For each of the two tall strokes, in Fig. 27, we predict three
possibilities.
1) An initial hook caused by the leading-in pen motion from
the previous character.
2) The stroke may tilt slightly to the left or to the right.
3) The stroke may bow convex or concave, or show a slight
double curvature.
For the horizontal stroke in Fig. 28, we predict similar variants.
We therefore predict 2 × 2 × 6 × 6 × 6, or 864, variants for the three-stroke upper-case "A" in Fig. 29.
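The count 864 is a product of independent choices, one per segment slot. The sketch below enumerates the slots with `itertools.product`. Our reading of the factors (a hook/no-hook choice on each tall stroke and six shape alternatives for each of the three strokes) is an assumption, and the shape labels are placeholders for the segments drawn in Figs. 27 and 28.

```python
from itertools import product
from math import prod

# Assumed reading of 2 x 2 x 6 x 6 x 6; labels are placeholders only.
HOOK = ("hook", "no-hook")
SHAPES = tuple(f"shape-{i}" for i in range(6))

slots = [
    HOOK, SHAPES,  # left tall stroke: initial hook, then tilt/bow shape
    HOOK, SHAPES,  # right tall stroke
    SHAPES,        # horizontal cross-stroke
]

variants = list(product(*slots))
print(len(variants), prod(len(s) for s in slots))  # 864 864
```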
C. Variations on Two-Stroke Forms
For right-handed subjects, we have collected samples of differ-
ent two-stroke forms of the letter "A" reported by Shinghal and
Suen [85]. Both of these are predicted in Fig. 30 by connecting
two strokes from the three-stroke form.
We predict a minimum of four different connecting segments
between base form strokes 1 and 2, and at least eight different
connecting segments between strokes 2 and 3 as shown in Figs.
31 and 32. The total number of variants predicted for the two 2-stroke "A"s (Fig. 33) is 2 × 6 × 4 × 6 × 6 + 2 × 6 × 6 × 6 × 6, or 4320.
D. Variations on One-Stroke Form
Using both connecting segments, we produce the one-stroke
variants. In Fig. 34 the total predicted number of variants is 2 × 6 × 4 × 6 × 6 × 6, or 10 368.
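The totals can be checked from the factors exactly as printed in the formulas above. Attributing the two two-stroke terms to the two connection types follows the order in the text (the factor 4 matches the four connecting segments between strokes 1 and 2). Their sum reproduces the 15 552 variants reported for the upper-case "A."

```python
from math import prod

# Factors copied from the formulas in the text.
forms = {
    "three-stroke": [[2, 2, 6, 6, 6]],
    "two-stroke": [[2, 6, 4, 6, 6],    # strokes 1 and 2 connected
                   [2, 6, 6, 6, 6]],   # strokes 2 and 3 connected
    "one-stroke": [[2, 6, 4, 6, 6, 6]],
}

counts = {name: sum(prod(f) for f in factor_lists)
          for name, factor_lists in forms.items()}
total = sum(counts.values())
print(counts)  # {'three-stroke': 864, 'two-stroke': 4320, 'one-stroke': 10368}
print(total)   # 15552
```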
Fig. 26. Written samples for three-stroke "A."
Fig. 27. Variant segments for tall strokes on three-stroke "A."
Fig. 28. Variant segments for middle stroke on three-stroke "A."
Fig. 29. Variants on three-stroke "A."
Fig. 30. Written samples for two-stroke "A."
Fig. 31. Connecting segments for strokes 1 and 2 on three-stroke "A."
Fig. 32. Connecting segments for strokes 2 and 3 on three-stroke "A."
Fig. 33. Variants on two-stroke "A"s.
Fig. 34. Variants on one-stroke "A."
Fig. 35. Variants of the letter "A." (a) Initial hook. (b) Left side. (c) Connecting segment. (d) Right side. (e) Connecting segment. (f) Cross-stroke segment.
E. Summary
We predict a total of 15 552 variants of the upper-case "A"
(Fig. 34). The appearance of each is "reasonable": researchers
who cull "poor-quality" samples before making performance
tests would find them all at least marginally acceptable.
We make no assertion about the statistical frequency of each
variant. Since we are concerned with finding a realistic solution
for on-line character recognition, we are biased slightly towards
permitting more variants, rather than risk omitting variants.
VI. PREDICTIVE SYSTEMS TO COMBAT
COMBINATORIAL EXPLOSION
At first glance, the enumeration of variants may seem to be a
"brute force" approach to the problem of variability. However,
exactly because the variations are generated by a combinatorial
process, we avoid the problems caused by combinatorial explo-
sion by treating the patterns for each variant as regular expres-
sions of chain-codes. Waltz described a similar effect [95] for the
analysis of line-drawings of solid images using "permutation-free
search."
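To see why regular expressions sidestep the explosion, note that a pattern built by concatenating one alternation per segment grows with the sum of the segment alternatives, while it accepts every one of their product. The sketch below uses Python's `re` module as a stand-in for the pattern-matching tools cited below; the segment strings are hypothetical run-collapsed chain codes chosen for illustration, not the actual segment tables of Fig. 35.

```python
import re
from itertools import product
from math import prod

# Hypothetical segment alternatives for a one-stroke "A", as run-collapsed
# chain codes. Illustrative only; not the paper's segment tables.
SEGMENTS = [
    ["", "D"],             # optional initial hook (empty = no hook)
    ["U", "RU", "LU"],     # left side, drawn upward
    ["D", "RD", "LD"],     # right side, drawn downward
    ["L", "UL"],           # connecting segment back to the cross-stroke
    ["R", "UR", "DR"],     # cross-stroke
]


def segments_to_regex(segments):
    """Concatenate one non-capturing alternation per segment."""
    return re.compile("".join(
        "(?:" + "|".join(re.escape(alt) for alt in alts) + ")"
        for alts in segments
    ))


pattern = segments_to_regex(SEGMENTS)
combinations = prod(len(alts) for alts in SEGMENTS)   # variants covered
alternatives = sum(len(alts) for alts in SEGMENTS)    # size of the description
print(combinations, alternatives)                     # 108 13
print(bool(pattern.fullmatch("DRULDULDR")))           # True
print(bool(pattern.fullmatch("UDR")))                 # False (no connector)

# Sanity check: every enumerated variant is accepted by the single pattern.
assert all(pattern.fullmatch("".join(p)) for p in product(*SEGMENTS))
```

The pattern is written once per base form; the matcher's backtracking explores the combinations only as far as a given input requires.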
Fig. 35 gives an enumeration of the variants of the letter "A"
derived from this model of variability. We have derived similar
definitions of written forms for each of the 95 "writable" ASCII
characters, and for special characters specific to particular appli-
cations and national variants on the ASCII character set.
The segments can each be described individually by listing the
chain code types along each instance of a segment. The descrip-
tion, recognition, and classification of combinatorial productions
of symbols is a well-developed technology in the realm of pro-
gramming languages.
Tools for exact syntactic pattern matching are widely available
[96]-[98], and are described in basic texts in programming lan-
guage implementation [99], [100]. Burr [101] describes one use of
these standard tools in a context-sensitive recognition system.
Describing a character as a combination of smaller, primitive
segments has been widely used in recent character recognition
work for handwriting, usually by means of a small number of
standardized "primitive" substrokes [33], [102]-[104]. We use a generalized syntactic description for each segment of each base form, so that segments can be defined uniquely for each base form rather than drawn from a fixed set of primitives; we believe this method to be more powerful.
Our matching method typically gives between 15 and 25 matches for any given input character; a similar matching method [105] typically gives about three matches. The difference
is a result of our design goal of tolerating a wide range of
variability, with the bulk of the recognition carried out using
functional attribute analysis, rather than syntactic methods only.
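The split between a permissive syntactic stage and a later discriminating stage can be sketched as follows. The base-form patterns, the `attribute_score` stand-in for functional attribute analysis, and its scores are all hypothetical; the point is only that the syntactic stage returns every acceptable candidate and leaves the final choice to another stage.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical base-form patterns over run-collapsed chain codes.
BASE_FORMS: dict[str, re.Pattern[str]] = {
    "U": re.compile(r"DR?U"),
    "V": re.compile(r"DU"),
    "J": re.compile(r"DL?U?"),
}


def syntactic_candidates(code: str, base_forms: dict[str, re.Pattern[str]]) -> list[str]:
    """Return every character whose base-form pattern accepts the whole code."""
    return [char for char, pat in base_forms.items() if pat.fullmatch(code)]


def recognize(code: str, base_forms: dict[str, re.Pattern[str]],
              attribute_score: Callable[[str], float]) -> str | None:
    """Pick the best syntactic candidate using a separate scoring stage."""
    candidates = syntactic_candidates(code, base_forms)
    if not candidates:
        return None
    return max(candidates, key=attribute_score)


# Stand-in scores; a real second stage would inspect the stroke geometry.
scores = {"U": 0.4, "V": 0.7, "J": 0.1}
print(syntactic_candidates("DU", BASE_FORMS))       # ['U', 'V', 'J']
print(recognize("DU", BASE_FORMS, scores.get))      # V
```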
VII. IMPLICATIONS
There are several implications of this model for the design and
testing of handwriting character recognition systems. Our par-
ticular system uses pre-defined recognition, based partly on this
predictive model of variability, and on the functional attribute
theory of recognition [56], [84], [106], [107]. However, the principles are independent of the type of recognition scheme used, and apply in particular to adaptive recognition schemes.
A. Verification of the Model
We have presented an implementation model for predicting
variability, not a fundamental theory for the underlying causes.
Verifying the model reduces to demonstrating its degree of success in predicting the existence of variants.
Statistical proof of correctness is very difficult in any variability model [108]: we show 15 552 predicted variants for the right-handed forms of the letter "A" alone. Tappert [109], referring to the variation in writing for even one subject, mentions that substantially larger collections from many more subjects than were available would be necessary to quantify the variation.
However, the model does meet criteria for plausibility:
1) The variants predicted are "legible"¹ and reasonable facsimiles of written characters.
2) The individual variability effects are found in actual writ-
ing samples.
3) The number of variants predicted is large, but countable.
4) Each individual variation is easy to produce by actual
writing: subjects not trained to our system can reproduce
our predicted variants as easily or more easily than they
can reproduce variants taken from other subjects with
opposite handedness.
One statistically testable item is to count how many written
variants are not predicted by the model, but can be found in real
writing. In our collection for the right-handed base form of the
letter "A" there are 31 cases that are not predicted by the model
out of some 1900 characters from unconstrained writing. The 31
cases include 25 frivolous characters written by subjects in an
effort to "fool" the system by writing highly ornamental or
artificial forms, leaving only six "true" errors, or roughly 0.3 percent of the total. Similar figures apply to all other characters
in our collection. We have done almost no pre-selection on the
collection. For example, we do not separate samples from left-
and right-handed writers.
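The error figures reduce to simple arithmetic. Because the sample size is given only as "some 1900," the rate computed here is approximate.

```python
sample_size = 1900        # "some 1900" characters (approximate)
not_predicted = 31        # cases the model did not predict
frivolous = 25            # deliberately ornamental or artificial forms

true_errors = not_predicted - frivolous
rate = true_errors / sample_size
print(true_errors, f"{rate:.2%}")  # 6 0.32%
```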
This is not to assert that only 31 cases were misrecognized: some distorted shapes were misrecognized due to truly ambiguous functional attributes or simple inadequacies in the part of our total system which does functional attribute analysis. The classification done by character recognition systems is a case of "boundary theory" [110], not category theory: in our experience, recognition based on functional attribute models is limited by the problems of formal description, not by the particular form of implementation.
¹There is no accepted formal definition of "legible"; each researcher uses subjective criteria.
Fig. 36. (a) Loop on top. (b) Wavy cross-stroke. (c) Wavy initial downstroke. (d) "Blob" for cross-stroke.
The four frivolous forms shown in Fig. 36 came from one
subject each, and account for 2, 5, 10, and 2 of our total
omissions, respectively. One of the remaining omissions was
caused by a broken stroke due to very light writing pressure. In
general, we will not label a character as frivolous unless the subject who wrote it agrees that it is.
B. Performance Testing of Character Recognition Systems
Repeatable performance tests of character recognition systems
are done by running the system against a collected set of written
characters. Tests done by having subjects write "first hand"
directly into the system are statistically valid, but are by their
nature not exactly repeatable. Our practice during these "first-
hand" tests is to collect as much of the subjects' writing as
possible in electronic form for later repetition of the test.
We believe our collection of some 85 000 characters is large by
most developers' standards. However, this model shows we can-
not depend on a random collection, no matter how large, to
reliably show all variants that would be encountered in large-scale
actual use of a character recognition device. Such a collection can be used to hunt for variants that are present in the samples but were not included in the design of the system, but it cannot be used to show that all variants that must be supported for large-scale use are included.
Henckels [111] claims that testing practical recognition systems
is not much considered in development, and suggests using
simulated data to test for completeness in a recognition system,
using a statistical argument for the size of a sample collection
needed to reduce feature-selection errors to an acceptable level.
From this perspective, the use of a statistical collection alone is
inadequate for development testing of a character recognition
system.
The problem of demonstrating "completeness" is well known
in the literature on software testing [112]: all cases must be
anticipated in the design, and each code path must be exercised
to certify complete testing. There is no assurance that unantic-
ipated variations are likely to be recognized correctly, and a
general observation in software engineering is that systems fre-
quently fail when presented with atypical or unanticipated input
[113]. Without some fundamental assumption of what variations can occur, or at least are likely to occur, there is no clear way to show that the designer has included all "anticipated" cases.
A common assertion is that the problem of character recognition is not deterministic: there are no formal criteria for "correct" recognition [114]. This logical statement is distinct from the
intellectual statement of "completeness": a given algorithm is
always deterministic, since it will produce the same output for the
same identical sequence of inputs started at the same internal
state. Completeness is the problem of proving that the system
functions as intended for all possible inputs, not that the intent is
subjectively correct.
C. Use of Constraints to Reduce Variability
Another approach to variability is to reverse the problem by
putting constraints on the writer to reduce the variability. This
has resulted in various standards for machine-readable hand-
printing styles [115], [116]. Several authors have devised means of
"enforcing" the constraints [117], [118]. Some of the larger collec-
tions of writing samples, in fact, were intended to test the ability
of writers to conform to certain of these standards [119], [120], not to collect data on normal handwriting variability. Mori et al.
[74] report complaints from users that a character recognition
system (for Kanji) vulnerable to distortions of character shapes is
of no use.
We have reservations about the viability of constraint-based
designs. Yacyk [49], for example, reports on the difficulty in
practice of getting even trained writers to consistently use forms
dissimilar to their normal writing style. More subtly, our own
experience is that the system designer is easily tempted to im-
prove "measured" performance by refining the constraints to
exclude difficult cases, instead of improving the actual design
performance. Knoll [122], for example, remarks that the choice of
which samples are used for training and which samples are used
for performance testing has a direct effect on the measured
performance of the system. This can lead to a sort of nonformal
a posteriori "constraint" selection by choosing the test and train-
ing data to minimize the reported error; this weakens the as-
sumed validity of performance measurements.
This is not to say constrained recognition has no utility: Farag
[123] describes an application where the symbols were so con-
strained for the application (ten specially selected script words)
that the recognition problem was made very tractable. Our res-
ervation is about the notion of constraints for general text input
by the general population.
A more subtle point comes from the observed decrease in
writing speed with any form of constraint [72], [124]. We are not aware of any formal studies as to why the writing speed cannot be maintained, although Morse code operators, for example, are taught specific handprinting styles so as to maintain legibility while writing quickly. Wing [126] and Suen [127] mention that writing speed
for printing increases with practice. One reasonable assumption
is that the loss of speed is an indication of the level of concentra-
tion needed to maintain an as-yet unaccustomed writing style,
and that fatigue over extended periods of time will result in a loss
of concentration level and a reversion to the subject's usual
writing style.
D. Completeness of Coverage
Many factors affect the total set of variants that can be
produced in writing, including truly random "noise" in the
mechanical and electronic processes of digitizing written input. It
is important to understand the effects of a model that is inadequately predictive (it does not predict some variants that are
actually written), as compared to a model that is excessively
predictive (it predicts some variants that will never be observed
in practice).
We can tolerate a system that would recognize some theoretical "variants" that will not occur in use, but we cannot so easily tolerate a system that misrecognizes variants that do occur. The inclusion of excess variants may add some burden to our development effort, but has no negative effect in use. Omitting real-world variants might well simplify the development effort, but if many were omitted, it would have a profound negative impact on the usability of a character recognition system.
E. Adaptive Recognition Technologies
Adaptive recognition has been used in many research systems
for on-line character recognition [8], [35], [61], [128]. Tappert et al. [124] and Doster and Oed [129] have both commented on the strong need for a good user interface specifically for the user to add and to correct the recognition of new variants during actual use. This human-factors problem is one of the many that must be resolved in a functional application system for any type of recognition scheme [40].
Fig. 37. Left-handed variant of "A."
Fig. 38. Possible predictable left-handed variants of "A."
To our knowledge, adaptive recognition has been used in few
commercial systems to date. One established commercial product
[130] uses it for limited command input for interactive graphics.
This system has been praised as a highly productive user-inter-
face [131], but appears to have had little influence on other
applications development.
Adaptive recognition systems can benefit by incorporating aspects of this model of variability: a successful predictive model would reduce the sensitivity of an adaptive system to "new" variations. In Fig. 37, for example, we observed an unusual variant "A" for one left-handed writer. Based on this model, it would be possible to add further predicted variants to the list of paradigm shapes for matching; these additional variants were in fact also found in the same subject's printing (Fig. 38).
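One way an adaptive system might apply the model is to expand a single new sample into the family of variants the model predicts around it. In this sketch, the segment labels and the table of interchangeable alternatives are hypothetical; the unusual segment actually observed (here a looped connector) is kept fixed, while segments the model treats as variable are expanded.

```python
from itertools import product


def expand_observed(observed, alternatives):
    """Expand one observed sample into the variants predicted around it.

    observed: tuple of segment labels decoded from a writer's sample.
    alternatives: maps a segment label to the labels the model treats as
    interchangeable with it; labels not in the table are kept as observed.
    """
    choices = [alternatives.get(seg, (seg,)) for seg in observed]
    return set(product(*choices))


# Hypothetical labels and rule table for illustration only.
ALTERNATIVES = {
    "hook": ("hook", "no-hook"),
    "left-straight": ("left-straight", "left-bowed"),
    "cross-straight": ("cross-straight", "cross-tilted"),
}
observed = ("hook", "left-straight", "connector-loop", "right-straight", "cross-straight")
paradigms = expand_observed(observed, ALTERNATIVES)
print(len(paradigms))  # 8
```

A single enrolled sample thus yields several paradigm shapes, which is the reduced sensitivity to "new" variations described above.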
The case is similar to the use of phoneme detection in speech
recognition. If the linguistic rules for connecting phonemes in the
language are known, they can be used to construct many of the
words of the language from a much smaller set of spoken samples
[132].
Adaptive recognition systems that do not model these variabil-
ity effects may be based on assumptions that fail in practice:
1) The writing samples provided during "training" of the sys-
tem are typical of the characters that will be encountered
during use. A classic problem of any experiment is devising
a procedure that does not affect what the experimenter
wants to measure. In our work we needed character sam-
ples of "unconstrained" writing. Unfortunately, the sub-
jects are aware that they are writing samples for a com-
puter collection, which makes them suddenly conscious of
what is normally an unconscious act of writing. When we
compare the writing samples with the same subject's "normal" writing (taken from interoffice memos or personal notes not part of the collection procedure), we see that the subject has tended to shift more to the writing style he or
she remembers from elementary school penmanship les-
sons, or to a blockier style he or she perceives as "neat" or
"proper" writing.
The bulk of our own collection of character samples is
collected during continuous writing: the subject writes
several pages of clear text at normal speed on a digitizing
tablet with no computer-generated feedback. In our com-
prehensive collection procedure we include transcription
of machine-printed text from a separate sheet, transcrip-
tion from text above each writing line, dictation, and
free-form writing of familiar text (subject's name and
address, etc.).
2) The subject's writing does not vary significantly. Several of
the variability effects we have described may or may not
occur in different instances of the same character from the
same subject. For example, many reports of collected
writing samples refer to the number of strokes in a char-
acter as different variants. There are several effects that
can cause a subject to connect one or more strokes in a
character occasionally.
Some of the experimental procedures for collecting char-
acters have the subject write one character at a time. In
continuous writing, we have found different pen motion
depending on whether the previous character meant the
subject had to start with an upward (from the bottom of
the previous character) or downward (from the top of the
previous character) motion, with similar effects depending
on the following character.
3) The input data are an accurate representation of what was
written. There is little published information on the performance characteristics of digitizing tablets. A nominal accuracy of 0.050 in, for example, could mean that a perfectly stationary stylus will report some random shape up to 0.1 in high due to jitter (0.050 in of error on either side of the true position). This is the approximate full size of smaller characters in normal writing.
Tablet characteristics are known to vary dramatically
for handwriting input [40], [124], [133]. Phillips [44] has a
more detailed discussion.
4) There are no special fatigue effects. Most studies of recogni-
tion performance involve short timed tests (less than
30 min of continuous use). However, many applications
for on-line character recognition, such as data entry, involve continuous use for several hours a day, every day. Systems requiring constrained writing styles, or at least consistent writing styles, would give very different recognition success rates if measured in this kind of environment. Oed [134] referred to this when collecting data from subjects in an adaptive recognition system, where the style, stroke count, and embellishment of characters changed markedly after the first few of 20 lines of written text.
5) The research subjects are typical of subjects in actual use.
One likely use of on-line character recognition devices is to
replace keyboards for users who are not trained to type:
loading dock workers, telephone sales staff, nonspecialized
clericals and secretaries. At least some reported excellent results on recognition performance were obtained primarily with members of the development team and support staff [136].
Henckels [111] states that the main determining factor
in many tests of recognition performance is the extent to
which the subjects took appropriate care in producing
acceptable writing.
There are several factors that may be significantly dif-
ferent for these two classes of subjects: educational back-
ground, special training in handwriting styles (the crossed
zero is only taught in technical schools, not in normal
schools), familiarity and patience with computer
equipment, willingness to adapt to writing constraints, and
many others. Litvin [92] discusses these points in detail.
An additional problem concerns relative weighting of features
in computing the degree of match to the base samples: the
position of the endpoints is very important for some cases, such
as distinguishing "U," "J," and "0," but for characters that may
involve partially or completely omitted initial segments from
retraces, such as the one-stroke forms for "B," "D," "M," and
"P," the position of the starting point can be highly variable.
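The weighting problem can be made concrete with a per-character weight on the starting-point term. The weights, the default, and the function below are hypothetical; they only illustrate giving the start point full weight for characters such as "U" and "J" and little weight for retrace-prone forms such as "B," "D," "M," and "P."

```python
import math

# Hypothetical weights for the starting-point term of an endpoint match.
START_WEIGHT = {"U": 1.0, "J": 1.0, "B": 0.1, "D": 0.1, "M": 0.1, "P": 0.1}
DEFAULT_START_WEIGHT = 0.5


def endpoint_distance(char, start, end, ref_start, ref_end):
    """Weighted endpoint distance between a sample and a base sample for `char`."""
    w_start = START_WEIGHT.get(char, DEFAULT_START_WEIGHT)
    return w_start * math.dist(start, ref_start) + math.dist(end, ref_end)


# Same displacement of the starting point (3-4-5 triangle), different characters.
print(endpoint_distance("U", (0, 0), (1, 1), (3, 4), (1, 1)))  # 5.0
print(endpoint_distance("B", (0, 0), (1, 1), (3, 4), (1, 1)))  # 0.5
```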
F. Selection of Problems for Design and Development
We use similar chain code sequences to predict problem cases in character recognition. The correlation is carried out by a computerized search of the chain code patterns: the final output is a set of artificially generated characters that are used as test cases in a "confusion matrix" [54] to determine the functional attribute features for correct recognition. This is a more systematic approach, and deals much more directly with functional attribute questions, than that of Henckels [111], who proposed using simulated data to fine-tune the boundary conditions in a character recognition system.
For example, the confusable characters in Fig. 39 all have the same chain code sequence of ULDRLURD.
Fig. 39. (a) D. (b) Unknown. (c) G. (d) Unknown. (e) 9. (f) Unknown. (g) Y. (h) Unknown. (i) 7.
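The search for characters that share a chain code can be sketched as grouping labeled samples by code. The sample list is hypothetical except for the shared sequence ULDRLURD, which the text gives for the characters of Fig. 39.

```python
from collections import defaultdict


def confusable_groups(samples):
    """Map each chain code shared by two or more labels to those labels."""
    labels_by_code = defaultdict(set)
    for label, code in samples:
        labels_by_code[code].add(label)
    return {code: labels for code, labels in labels_by_code.items()
            if len(labels) > 1}


samples = [
    ("D", "ULDRLURD"), ("G", "ULDRLURD"), ("9", "ULDRLURD"),
    ("Y", "ULDRLURD"), ("7", "ULDRLURD"),
    ("L", "DR"), ("V", "DU"),   # hypothetical unambiguous samples
]
groups = confusable_groups(samples)
print(sorted(groups["ULDRLURD"]))  # ['7', '9', 'D', 'G', 'Y']
```

Each group found this way marks a set of characters that the syntactic stage cannot separate, and therefore a case to be resolved by functional attribute analysis.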
We note that the examples given for graphical context-sensitivity described by Kuklinski [137], the cases of "similar characters" discussed by Ikeda et al. [138] for Kanji recognition, "confoundable Katakana" listed by Watanabe et al. [139], Wing's "confuseable forms" [126], and Tappert's "ambiguous characters" [35] for the Roman alphabet involve characters whose chain code sequences would be the same. Suen and Shillman [84] report superior
recognition results for a specific difficult case using the type of
functional attribute analysis similar to that used in the rest of our
system.
G. Formal Bias of Existing Character Collections
There are a number of points where we believe some existing
collections of character samples may be subject to formal bias, if
the collections are to be taken as typical of unconstrained writ-
ing.
a) Atypical multistroke forms: Unlike Kanji characters, the Ro-
man alphabet tends to be written with few strokes. We have
described several factors that cause subjects to write with fewer
strokes than the nominal base form.
Tappert [109] states that there are "at most four" strokes in
any written upper-case alphabetic character in the English-lan-
guage alphabet. However, data provided to us by Suen [83] show
a five-stroke letter "R" (Fig. 40(a)), and we have seen an equiv-
alent six-stroke "B" (Fig. 40(b)) in our own samples. One system
based on work by Crane [11] supports a four-stroke "F" (Fig.
40(c)) that the work by Cox [47] would require to be extended to
an equivalent five-stroke version (Fig. 40(d)).
Rather than an absolute statement about the number of strokes
that can be used in writing, we would make the following points.
1) Each of the cases in Fig. 40 is atypical of "normal"
writing: the first two are taught in some drafting courses,
and the second two are ornamental.
2) The general tendency in "normal" writing is to connect
strokes, or to avoid using too many strokes, since this
slows down writing speed.
b) Formal bias of the collection procedure: The mere fact of
making the subject aware that he or she is writing for sample
collection affects the subject's intent while writing. As we have
noted, other researchers have commented on changes in writing
style from the first samples to later samples.
This intentional "neatness" bias by the subject has a direct
effect on the occurrence of multistroke characters. The three-
stroke variant of "B" in Fig. 41(a), for example, is rare in our
collected samples, and we believe it to be extremely rare in actual
writing. The two curved side strokes are actually more difficult to write as two separate strokes, as in Fig. 41(a), than as one connected stroke, as in Fig. 41(b).
Our procedures are explicitly intended to get writing as close to
the subject's normal writing style as possible. We collect data across several pages of text to discourage persistent use of "neat" writing. We use several methods of dictation and transcription, and ask subjects to write familiar text (name and address) as well as to make up their own text.
Fig. 40. (a) Five-stroke "R." (b) Six-stroke "B." (c) Four-stroke "F." (d) Five-stroke "F."
Fig. 41. (a) Three-stroke "B." (b) Two-stroke "B."
We collect occasional samples by tracing over, or asking the
subjects to trace over, writing samples from their daily work.
c) Before-the-fact bias of collected writing samples: We see
limited utility from collections with an explicit intent to collect
constrained writing. These samples are perfectly legitimate tests
of how subjects conform to the constraints being tested, but the
data should not mistakenly be taken as useful for studying
variability of normal writing.
Some examples from the literature are studies of the ability of
writers to conform to the ANSI standard for machine readable
hand-printing and similar models [119], and an earlier study on
writing limited to the characters used in Fortran coding sheets
and to certain restrictive writing styles [141].
On the other hand, the data in one collection were collected
with "intentionally" more variability than is normal [142]: while
this might be a legitimate step for system testing, the variability
in this case is obviously artificial and all from three writers, so
the data are not valid for studying "true" variability effects, and
the data would only include those effects known to the writers.
d) Single character writing versus continuous writing: Samples
collected to test many research systems are taken one character at
a time, which suppresses the intercharacter effects found in
normal writing. This is commonly the case in the "enrollment"
procedure for adaptive recognition systems.
The parts of this model that predict intercharacter effects may
be useful in reducing the sensitivity to these effects that are
introduced by this single-character collection process. At the least, an adaptive system would be better designed to collect samples as continuously written lines of text, identified afterwards, so that some cases with intercharacter effects are captured.
In practice, we have not found the restriction to writing in
"character boxes" to be a significant constraint, if the size and
spacing of the boxes is a reasonable match to the subject's
normal writing. Our particular system relaxes the constraints of
writing in boxes to a noticeable degree by applying heuristics for
character segmentation that use the reference boxes only as a
rough indication of probable "centers" of the characters. Many
"boxless" character segmentation algorithms use heuristics for
permissible stroke spacing and separation that force equivalent
constraints on the subject.
Please note that we distinguish between continuous writing of
discrete characters, and connected writing of script.
e) Noninclusion of fatigue effects in the collection process: One
dramatic difference between the testing procedures in the litera-
ture and the probable circumstances of actual use is the length of
time the subject/user must spend writing. Typical data-entry applications involve using the device for several hours at a time (a complete working day) for several days on end (part of the user's general job tasks). Litvin [92] raises these points in a general discussion of what factors other than error rate must be considered to evaluate on-line character recognition systems.
Fig. 42.
There are very few studies of fatigue effects on handwriting. A well-known factor in adaptive recognition systems is the appearance of "new" variants after writing even a short time, typically less than one-half hour. Tappert [109] mentions that more prototypes are encountered as more samples are collected from a subject, and notes the resulting need for a good user interface for adding new samples to the nominal base forms during actual use.
We have observed that subjects tend to create their own
writing constraints when first using on-line writing input. The
subsequent failure to conform to any sort of constraints over
time can be seen as a reversion to "normal sloppiness" from
initial "neatness" with fatigue. Our model is based on physical
aspects of writing that are common in "sloppy" handwriting,
such as stroke connection, retracing, and curving of nominally
straight strokes. The likely effect of fatigue for this model would
be to increase the occurrence of these predicted effects as the
subject stops making the extra effort to avoid them.
H. After-the-fact Bias of Existing Character Collections
Reports on recognition accuracy usually include qualifications
about the type of testing done. Many studies have referred to
tests made with a limited number of subjects who have trained
themselves to be familiar with the system's limitations. The intent
is to avoid inputs that in the researcher's opinion are not "valid"
writing styles. These researchers would probably agree with us
that there is good reason not to require a system to "correctly"
recognize demonstrably bizarre and artificial writing styles that are very unlikely to occur in normal handwriting. To illustrate, the instances of the letter "A" in Fig. 42 are recognizable, but would never appear in "normal" handwriting.
The same rationale also has been used to remove more nor-
mally written characters from test data when the intent was to
use input from a broad range of subjects. Suen [1] comments about one other researcher's reported performance: "However, only data of good quality were used." Greenberg [125] chose not to use several samples from an IEEE collection due to "poor quality." Yhap and Greanius [33] mention in an experiment on Chinese character recognition that some 5 percent of the data were excluded for various reasons of poor writing or low data quality, but give no definition of what constitutes "poorly written strokes" and no examples of the data that were excluded.
In each case, performance test data were intentionally biased
before the fact according to the experimenter's criteria for
"reasonable" input. We find nothing wrong with this decision,
provided that the criteria for acceptable input are specified
formally enough that they can themselves be subjected to analysis.
For real-world use, no class of error can be excluded: a failure
to recognize correctly is the same to the user regardless of cause.
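The call for formal criteria can be illustrated with a short Python sketch (our illustration under stated assumptions, not a procedure taken from any of the studies cited). The exclusion rule is written as an explicit, named predicate, and the report gives accuracy on the kept samples, on the excluded samples, and on all samples, so the effect of the rule is visible rather than hidden. The `Sample` record, its `stroke_count` field, and the `at_most_strokes` rule are hypothetical; a real criterion would inspect whatever attributes a given collection actually records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Sample:
    """One test character (hypothetical record layout)."""
    intended: str      # the character the writer meant
    recognized: str    # the recognizer's output
    stroke_count: int  # hypothetical attribute an exclusion rule might inspect


def accuracy(samples: Sequence[Sample]) -> float:
    """Fraction recognized as the intended character; NaN for an empty set."""
    if not samples:
        return float("nan")
    return sum(s.intended == s.recognized for s in samples) / len(samples)


def exclusion_report(samples: Sequence[Sample],
                     acceptable: Callable[[Sample], bool]) -> Dict[str, float]:
    """Apply an explicit exclusion rule and report its effect, not just
    the accuracy on the data that survived the rule."""
    if not samples:
        raise ValueError("no samples to evaluate")
    kept = [s for s in samples if acceptable(s)]
    excluded = [s for s in samples if not acceptable(s)]
    return {
        "excluded_fraction": len(excluded) / len(samples),
        "accuracy_kept": accuracy(kept),          # what a filtered test reports
        "accuracy_excluded": accuracy(excluded),  # what the filter hides
        "accuracy_all": accuracy(samples),        # closer to what a user sees
    }


def at_most_strokes(limit: int) -> Callable[[Sample], bool]:
    """Hypothetical, explicitly stated criterion: accept only characters
    written with no more than `limit` strokes."""
    return lambda s: s.stroke_count <= limit


# Toy data: 18 correct, 1 error written normally, 1 error with extra strokes.
samples = [Sample("A", "A", 3)] * 18 + [Sample("A", "H", 3), Sample("A", "H", 5)]
report = exclusion_report(samples, at_most_strokes(4))
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Because the criterion is an ordinary function, it can be stated alongside the results and analyzed in its own right, and the gap between `accuracy_kept` and `accuracy_all` shows how much the filter raises the measured rate. In the toy data, excluding one sample in twenty raises measured accuracy from 90 percent to about 94.7 percent.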
I. Lack of Logical Foundation for the Definition of "Errors"
Is the correct result what the character "looks like," or what
the writer "meant" no matter how poorly written?
The definition of what is an "error" is difficult to formalize:
Mori et al. [74] point out that the total set of humanly recognizable
variations is infinite, and that some limits must be set both for
practical and academic reasons; however, the report also states
that it is very important to know how well a system works on
"low-quality" data for commercial use. Brown and Ganapathy
[142] refer to an explicit effort to collect more-than-average
variability in order to test a system completely. Watanabe et al. [139]
point out that the human writers and readers of test characters
may give equally consistent, but differing, "correct" labels for the
characters.
The danger lies in making the selection on an informal basis
instead of with a specific model that can be tested to decide whether
specific instances of characters are intended to be recognized.
Unless the criteria for "acceptable" or "realistic" variability are
formalized, there is a real danger of making tests with data that
will not realistically reflect the performance of the system in
actual use: removing from the input the characters that would have
caused recognition errors yields a better "measured" success rate
than will occur in practice.
No overt bias is necessary to give in to this temptation. Our
goal is to predict "reasonable" variability that occurs in real
handprint, without incorporating highly unlikely variations that
make the total design problem of accurate character recognition
more difficult.
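The inflation described in this subsection can be quantified with a simple decomposition (an illustrative model of ours, not one given in the cited studies). Let $f$ be the fraction of collected samples excluded as unacceptable, $A_k$ the recognition accuracy on the kept samples, and $A_x$ the accuracy on the excluded ones. Assuming that the excluded kinds of writing occur in actual use at the same rate as in the collection, the accuracy a user experiences, $A_{\mathrm{use}}$, and the overstatement of a test run only on the kept samples, whose measured accuracy is $A_{\mathrm{meas}} = A_k$, are:

$$
\begin{aligned}
A_{\mathrm{use}} &= (1-f)\,A_k + f\,A_x,\\
A_{\mathrm{meas}} - A_{\mathrm{use}} &= A_k - A_{\mathrm{use}} = f\,(A_k - A_x).
\end{aligned}
$$

With hypothetical values $f = 0.05$ (the order of the exclusion reported in [33]), $A_k = 0.98$, and $A_x = 0.60$:

$$
\begin{aligned}
A_{\mathrm{use}} &= 0.95 \times 0.98 + 0.05 \times 0.60 = 0.931 + 0.030 = 0.961,\\
A_{\mathrm{meas}} - A_{\mathrm{use}} &= 0.05 \times (0.98 - 0.60) = 0.019.
\end{aligned}
$$

The filtered test would report an error rate of 2 percent, while users would see 3.9 percent, nearly twice as many errors.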
VIII. CONCLUSION
Variability in handwriting, both inter- and intrasubject, is what
makes machine recognition of hand-written or printed text dif-
ficult. We have presented a model for certain types of variability
in handwritten characters based on observable physical effects in
the writing process. We have shown how this model was used in
the implementation of a commercial device for on-line character
recognition. The model has several implications for handwriting
character recognition, including reducing the sensitivity of adap-
tive recognition schemes to "new" variants in the course of a
subject's writing. The model also points out substantial limita-
tions in testing procedures to determine the performance of a
character-recognition system with real data.
This work is a recent development in a commercial effort to
develop handwriting character recognition technology, and has
not been previously reported. Many factors in character recognition
cannot be detected at the scale of most laboratory experiments;
we encourage other commercial developers to publish
more of the results of their work on handwritten
character recognition.
ACKNOWLEDGMENT
Special thanks are due J. O'Brien, B. Blesser and M. Phillips
for their help in preparing this manuscript.
REFERENCES
[1] C. Y. Suen, M. Berthold, and S. Mori, "Automatic recognition of
hand-printed characters-The state of the art," Proc. IEEE, vol. 68,
no. 4, Apr. 1980, pp. 469-487.
[2] L. J. Lukis and G. P. Duhig, U.S. Patent 4,493,104, "Character recogni-
tion device," Jan. 8, 1985, assigned to Moore Business Forms, Inc.,
Grand Island, NY.
[3] "Penpad 320 Technical Data," Pencept, Inc., Waltham, MA, 1984.
[4] "Penpack Multiplan User's Guide," Pencept, Inc., Waltham, MA,
1984.
[5] "Penpack IBM Personal Editor User's Guide," Pencept, Inc., Waltham,
MA, 1984.
[6] "Penware Penform User's Guide," Pencept, Inc., Waltham, MA, 1984.
[7] "Datapad portable hand-print character recognition data entry system,"
AnnoGraphics, Inc., Fairfax, VA, 1986.
[8] L. N. Cooper, C. Elbaum, and D. L. Reilly, U.S. Patent 4,326,259,
"Self organizing general pattern class separator and identifier," Apr.
20, 1982, assigned to Nestor Graphics, Providence, RI.
[9] L. N. Cooper and C. Elbaum, U.S. Patent 4,319,331, "Curve follower,"
Mar. 9, 1982, assigned to Nestor Graphics, Providence, RI.
[10] "ScriptWriter technical information," Data Entry Systems, Huntsville,
AL, 1985.
[11] "Handwriter GrafText system model GT-5000," Communication Intelligence
Corp., Menlo Park, CA, June 1985.
[12] "Handwriter focus: ABC Accounting Package," Communication Intel-
ligence Corp., Menlo Park, CA, 1985.
[13] "Handwriter Lotus 1-2-3 interface kit," Communication Intelligence
Corp., Menlo Park, CA, 1985.
[14] "Handwriter Wordstar interface kit," Communication Intelligence
Corp., Menlo Park, CA, 1985.
[15] "Handwriter product literature," CIC Japan Inc., Tokyo, Japan.
[16] C. Cohen, "News Update," Electronics, vol. 56, no. 12, p. 32, June 16,
1983.
[17] International newsletter, "System reads Kanji characters into word
processors," Electronics, vol. 54, no. 12, p. 64, June 16, 1981.
[18] W. Buxton, "Chunking and phrasing and the design of human-com-
puter dialogues," Proc. IFIP World Comput. Congress, Dublin, Ireland,
Sept. 1-5, 1986.
[19] J. Pavlidis and C. J. VanWyk, "An automatic beautifier for drawings
and illustrations," ACM Comput. Graphics, vol. 19, no. 3, pp. 225-234,
July 1985.
[20] J. J. Thomas and G. Hamlin, "Workshop summary: Graphical input
interaction technique," Comput. Graphics, vol. 5, pp. 279-304, Jan.
1983.
[21] R. V. Rubin, E. J. Golin, and S. P. Reiss, "ThinkPad: A graphical
system for programming by demonstration," IEEE Software, vol. 2, no.
2, pp. 73-79, Mar. 1985.
[22] C. G. Wolf, "Can people use gesture commands?" SIGCHI Bull., vol.
18, no. 2, pp. 73-74, Oct. 1986.
[23] G. A. Flurry, "Electronic handwriting facility," IBM Technical Dis-
closure Bull., vol. 27, no. 9, pp. 5364-5366, Feb. 1985.
[24] M. K. Brown and S. Ganapathy, "Preprocessing techniques for cursive
script word recognition," Pattern Recog., vol. 16, no. 5, pp. 447-458,
1983.
[25] A. J. Tersoff, "Man-machine considerations in automatic handprint
recognition," IEEE Trans. Syst. Man Cybern., vol. SMC-8, no. 4, p.
279, Apr. 1978.
[26] A. Nilssen and J. R. Ward, U.S. Patent 4,562,304, "Apparatus and
method for emulating computer keyboard input with a hand-print
terminal," Dec. 31, 1985, assigned to Pencept, Inc., Waltham, MA.
[27] J. Gould, J. Conti, and T. Tovanyecz, "Composing letters with a
simulated listening typewriter," Commun. ACM, vol. 26, no. 4, pp.
295-308, Apr. 1983.
[28] J. R. Ward (organizer), "Issues limiting the acceptance of user interfaces
using gesture input and handwriting character recognition," panel
discussion, Proc. CHI+GI Conf. on Human Factors in Computing Sys-
tems and Graphics Interface, Toronto, ON, Apr. 5-9, 1987, pp. 155-158.
[29] C. Y. Suen (chair), "Future challenges in handwriting and computer
applications," panel discussion, scheduled for 3rd Int. Symp. on
Handwriting and Computer Appl., Montreal, PQ, May 29, 1987.
[30] T. Kuklinski, "Components of hand-print style variability," in Proc.
IEEE 7th Int. Conf. Pattern Recog., 1984, pp. 924-926.
[31] H. Murase and T. Wakahara, "Online hand-sketched figure recogni-
tion," Pattern Recog., vol. 19, no. 2, pp. 147-160, 1986.
[32] A. Shridhar and A. Badreldin, "Recognition of isolated and simply
connected handwritten numerals," Pattern Recog., vol. 19, no. 1, pp.
1-12, 1986.
[33] E. F. Yhap and E. C. Greanias, "An on-line Chinese character recogni-
tion system," IBM J. Res. Develop., vol. 25, no. 3, pp. 187-195, May
1981.
[34] K. Yoshida and H. Sakoe, "Online handwritten character recognition
for a personal computer system," IEEE Trans. Consumer Electron., vol.
CE-28, no. 3, pp. 202-209, Aug. 1982.
[35] C. C. Tappert, "Adaptive on-line handwriting recognition," Proc. 7th
Int. Conf. Pattern Recog., 1984, Montreal, PQ, pp. 1004-1007.
[36] __ , "Dehooking procedure for handwriting on a tablet," IBM Tech.
Disclosure Bull., vol. 27, no. 5, pp. 2995-2998, Oct. 1984.
[37] __ , "Delayed stroke processor for handwriting recognition," IBM
Tech. Disclosure Bull., vol. 26, no. 12, pp. 6616-6619, May 1984.
[38] P. Mermelstein and M. Eden, "Experiments on computer recognition
of connected handwritten words," Inform. and Contr., vol. 7, no. 3, pp.
255-270, June 1964.
[39] M. Yasuhara, "Experimental studies of handwriting process," report of
the Research Lab. of Communication Science, University of
Electro-Communications, Japan, vol. 25-2 (Science and Technology section),
pp. 233-254, Mar. 1975.
[40] J. R. Ward and B. Blesser, "Interactive recognition of handprinted
characters for computer input," IEEE Computer Graphics and Appl.,
vol. 5, no. 9, pp. 24-37, Sept. 1985.
[41] __ , "Implications of using interactive hand-print character recognition
for computer input," Proc. 1985 Trends and Appl. in Comput.
Graphics Conf., May 1985, IEEE cat. no. 85CH2148-5.
[42] J. R. Ward, "UNIX as a development tool for a non-UNIX microprocessor,"
CommUNIXations, vol. V, no. 5, pp. 26-30, Aug./Sept. 1985.
[43] __ , United States Patent 4,534,060, "Method and apparatus for
removing noise at the ends of a stroke," Aug. 6, 1985, assigned to
Pencept, Inc., Waltham, MA.
[44] M. Phillips, "Several simple tests can help you choose the correct
digitizer," Comput. Tech. Rev., vol. VII, no. 1, Jan. 1987.
[45] J. R. Ward and M. Phillips, "Digitizer technology: performance char-
acteristics and the effects on the user interface," IEEE Comput. Graphics
and Appl., vol. 7, no. 4, pp. 31-44, Apr. 1987.
[46] C. C. Tappert, "Cursive script recognition by elastic matching," IBM
J. Res. and Develop., vol. 26, no. 6, pp. 765-771, Nov. 1982.
[47] C. H. Cox III, P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A
link between theoretical and physical letter descriptions," Pattern Re-
cog., vol. 15, no. 1, pp. 11-22, 1982.
[48] C. Y. Suen, "A study on man-machine interaction problems in char-
acter recognition," IEEE Trans. Syst. Man Cybern., vol. SMC-9, no.
11, pp. 732-737, Nov. 1979.
[49] J. Yacyk, "Alphabetic hand-print reading," IEEE Trans. Syst. Man
Cybern., vol. SMC-8, no. 4, pp. 279-282, Apr. 1978.
[50] R. Shillman, T. Kuklinski, and B. Blesser, "Psychophysical techniques
for investigating the distinctive features of letters," Int. J. Man-Mach.
Studies, vol. 8, pp. 195-205, 1976.
[51] R. Shillman, "Automatic recognition of thick stroke characters," MIT
Research Laboratory of Electronics, Quarterly Progress Report no. 118,
July 1976.
[52] R. Shillman and G. Naus, "The distinctive features of the letters O and
D," MIT Research Laboratory of Electronics, Quarterly Progress Re-
port no. 118, July 1976.
[53] C. P. Brooks and A. F. Newell, "Computer transcription of handwrit-
ten shorthand as an aid for the deaf-A feasibility study," Int. J.
Man-Mach. Studies, vol. 23, pp. 45-60, 1985.
[54] B. Blesser, T. Kuklinski, and R. Shillman, "Empirical tests for feature
selection based on a psychological theory of character recognition,"
Pattern Recog., vol. 8, pp. 77-85, 1976.
[55] W. Doster, private letter of May 1986, AEG Aktiengesellschaft, For-
schungsinstitut Ulm, Sedanstrasse 10, D-7900 Ulm, West Germany.
[56] B. Blesser, R. Shillman, T. Kuklinski, C. Cox, M. Eden and J. Ventura,
"A theoretical approach to character recognition based on phenomeno-
logical attributes," in Proc. 1st Int. Joint Conf. Pattern Recog., 1973,
pp. 33-40.
[57] B. Blesser, R. Shillman, C. Cox, T. Kuklinski, J. Ventura, and M. Eden,
"Character recognition based on phenomenological attributes," Visible
Language, vol. 7, no. 3, pp. 209-223, 1973.
[58] T. Pavlidis and F. Ali, "Computer recognition of handwritten
numerals by polygonal approximations," IEEE Trans. Syst. Man
Cybern., vol. SMC-6, no. 5, pp. 610-614, Nov. 1975.
[59] H. Freeman and L. S. Davis, "A corner-finding algorithm for chain-
coded curves," IEEE Trans. Comput., vol. 26, pp. 297-303, Mar. 1977.
[60] D. B. Convis, P. J. Grim, and M. A. Reed, U.S. Patent 4,550,438,
"Retro-stroke compression and image generation of script and graphic
data employing an information processing system," Oct. 29, 1985,
assigned to IBM Corp., Armonk, NY.
[61] H. D. Crane and R. E. Savoie, "An on-line data entry system for
hand-printed characters," IEEE Comput., vol. 10, no. 3, pp. 43-50,
Mar. 1977.
[62] V. M. Powers, "Pen direction sequences in character recognition,"
Pattern Recog., vol. 5, pp. 291-302, Mar. 1973.
[63] H. Sakoe, U.S. Patent 3,979,722, "Automatic character recognition
device employing dynamic programming," Sept. 7, 1976, assigned to
Nippon Electric Co. Ltd., Tokyo, Japan.
[64] B. Blesser, U.S. Patent 4,375,081, "Multistage digital filtering utilizing
several criteria," Feb. 22, 1983, assigned to Pencept, Inc., Waltham,
MA.
[65] D. H. Foster and R. J. Mason, "Irrelevance of local position informa-
tion in visual adaptation to random arrays of small geometric elements,"
Perception, vol. 9, pp. 217-221, 1980.
[66] S. Coffin, "Spatial frequency analysis of block letters does not predict
experimental confusions," Percept. and Psychophys., vol. 23, no. 1, pp.
69-74, 1978.
[67] B. Julesz, "Experiments in the visual perception of texture," Sci. Amer.,
vol. 232, no. 4, pp. 34-43, Apr. 1975.
[68] L. S. Frishkopf, U.S. Patent 3,133,266, "Automatic Recognition of
Handwriting," assigned to Bell Telephone Laboratories, New York,
NY.
[69] R. Shillman, T. Kuklinski, and B. Blesser, "Experimental methodolo-
gies for character recognition based on phenomenological attributes,"
Proc. 2nd Int. Joint Conf. Pattern Recog., Copenhagen, Denmark, Aug.
13-15, 1974, pp. 195-201.
[70] M. J. Naus and R. Shillman, "Why a Y Is not a V: A new look at the
distinctive features of letters," J. Experimental Psych.: Human Percept.
Perf., vol. 2, no. 3, pp. 394-400, 1986.
[71] G. G. N. Wright, "The writing of Arabic numerals," Scottish Council
for Research in Education Series No. 33, London: University of London
Press, 1952.
[72] R. Apsey, "Human factors of constrained hand-print for OCR," IEEE
Trans. Syst. Man Cybern., vol. SMC-8, no. 4, pp. 292-296, Apr. 1978.
[73] L. Salter, "Variability of Japanese characters," internal report, Pencept,
Inc., Waltham, MA 02154, Sept. 1983.
[74] S. Mori, K. Yamamoto and M. Yasuda, "Research on machine recog-
nition of hand-printed characters," IEEE Trans. Pattern Anal. Machine
Intell., vol. PAMI-6, no. 4, pp. 386-405, July 1984.
[75] Y. Hidai, K. Ooi and Y. Nakamura, "Stroke re-ordering algorithm for
on-line handwritten character recognition," Proc. 8th Int. Conf. Pattern
Recog., Paris, Oct. 1986, pp. 934-936.
[76] H. A. Maurer, G. Rozenberg, and E. Welzl, "Using string languages to
describe picture languages," Inform. and Contr., vol. 54, no. 3, pp.
155-185, 1982.
[77] M. Berthod and J. P. Maroy, "Learning in syntactic recognition of
symbols drawn on a graphic tablet," Computer Graphics and Image
Processing, vol. 9, pp. 166-182, 1979.
[78] K. Yamamoto and S. Mori, "Recognition of handprinted characters by
an outermost point method," Pattern Recog., vol. 12, no. 4, pp.
229-236, Mar. 1980.
[79] R. Shillman, "Character recognition based on phenomenological attri-
butes: Theory and methods," Ph.D. Thesis, MIT, Dept. of Elec. En-
gineering, 1974.
[80] T. Pavlidis, "Structural pattern recognition," New York: Springer-
Verlag, 1977.
[81] E. M. Herrick, "A taxonomy of alphabets and scripts," Visible Lan-
guage, vol. VIII, no. 1, pp. 5-32, Winter 1974.
[82] C. Y. Suen, "Handwriting education-A bibliography of contemporary
publications," Visible Language, vol. IX, no. 2, pp. 145-158, Spring
1975.
[83] __ , "Alphanumeric hand-prints with stroke directions and se-
quences," internal report for Pencept, Inc., Waltham, MA, 1977.
[84] C. Y. Suen and R. Shillman, "Low error rate optical character recogni-
tion of unconstrained handprinted letters based on a model of human
perception," IEEE Trans. Syst. Man Cybern., vol. 7, no. 6, pp. 491-495,
June 1977.
[85] R. Shinghal and C. Y. Suen, "A method for selecting constrained
handprinted character shapes for machine recognition," IEEE Trans.
Pattern Anal. Machine Intell., vol. PAMI-4, no. 1, pp. 74-78, Jan. 1982.
[86] D. Deringer, The Alphabet. New York: Funk and Wagnalls, 1968.
[87] Anonymous, "Multi-segment system for recognizing cursive writing,"
IBM Tech. Disclosure Bull., vol. 17, no. 11, pp. 6735-6739, Apr. 1985.
[88] J. M. Hollerbach, "A study of motor control through analysis and
synthesis of handwriting," Ph.D. Thesis, MIT Dept. of Elec. Engineer-
ing and Comput. Sci., Aug. 1978.
[89] E. Mandler, conversation of Sept. 1986. AEG Aktiengesellschaft, For-
schungsinstitut Ulm, Sedanstrasse 10, D-7900 Ulm, West Germany.
[90] K. Badie and M. Shimura, "Machine recognition of Roman cursive
scripts," Proc. 6th Int. Conf. Pattern Recog., pp. 28-30, 1982.
[91] A. G. Arkadev and E. M. Braverman, Computers and Pattern Recogni-
tion, translated from Russian by W. Turski and J. D. Cowan. Washing-
ton, D.C.: Thompson Book, 1967.
[92] Y. Litvin, "Principles of evaluation for hand-printed and cursive text
recognition methods," GTE Technical Note 401.1, Apr. 1982.
[93] A. M. Wing, M. I. Nimmo-Smith, and M. A. Eldridge, "The con-
sistency of cursive letter formation as a function of position in the
word," Acta Psycholog., vol. 54, pp. 197-204, 1983.
[94] "Penpad Penpad 300 digitizing tablet product information," Pencept,
Inc., Waltham, MA, 1986.
[95] D. L. Waltz, "Generating semantic descriptions from drawings of
scenes with shadows," Ph.D. Thesis, MIT, Dept. of Elec. Engineering,
1972.
[96] K. Thompson and D. M. Ritchie, UNIX Programmer's Manual, Sixth
Edition. Murray Hill, NJ: Bell Laboratories, 1975.
[97] S. C. Johnson, "YACC: Yet another compiler compiler," Computing
Science Tech. Rep. No. 32, 1975, Bell Laboratories, Murray Hill, NJ.
[98] M. E. Lesk, "Lex-A lexical analyzer generator," Computing Science
Tech. Rep. No. 39, 1975, Bell Laboratories, Murray Hill, NJ.
[99] D. Gries, "Compiler construction for digital computers," New York:
Wiley & Sons, 1971.
[100] A. V. Aho and J. D. Ullman, "Principles of compiler design," Reading,
MA: Addison-Wesley, 1977.
[101] D. J. Burr, "Designing a handwriting reader," IEEE Trans. Pattern
Anal. Machine Intell., vol. PAMI-5, no. 5, pp. 554-559, Sept. 1983.
[102] M. Nakagawa, T. Aizawa, C. Komoda, Y. Ideda and N. Takahashi,
"Syntactic pattern recognition with stochastic dissimilarity in Japanese
on-line input systems (JOLIS)-1/1.5," in Proc. 8th Int. Conf. Pattern
Recog., Paris, Oct. 1986, pp. 1059-1061.
[103] G. Chunbiao and X. Guorong, "Automatic recognition of printed
Chinese characters by four corner codes," in Proc. 8th Int. Conf.
Pattern Recog., Paris, Oct. 1986, pp. 1013-1015.
[104] M. Hosaka and F. Kimura, "An interactive geometrical design system
with handwriting," in Proc. 6th IFIPS Cong., Toronto, ON, Aug. 1977,
pp. 167-171.
[105] Y. Kurosawa and H. Asada, "Attributed string matching with statisti-
cal constraints for character recognition," in Proc. 8th Int. Conf.
Pattern Recog., Paris, Oct. 1986, pp. 1063-1067.
[106] R. T. Babcock, "Simulation method of feature selection for uncon-
strained hand-printed characters," M.S. Thesis, Dept. of Elec. En-
gineering and Comput. Sci., MIT, Cambridge, MA, June 1977.
[107] Y. Watanabe, J. Gyoba, and K. Maruyama, "Reaction time and eye
movements in the recognition task of hand-written Katakana-letters,"
Jap. J. Psychol., vol. 54, no. 1, pp. 58-61, 1983, (in Japanese).
[108] E. H. Dooijes, "Analysis of handwriting movements," Acta Psychol.,
vol. 54, pp. 99-114, 1983.