A SUBMISSION TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 1
Abstract: Online handwritten Chinese text recognition (OHCTR) is a challenging problem, as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MC-FCRN) is proposed to exploit the multiple spatial contexts of the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on the semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a certain language into the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.50% and 96.58%, respectively, which are significantly better than the best results reported thus far in the literature.

Index Terms: Handwritten Chinese text recognition, path signature, residual recurrent network, multiple spatial contexts, implicit language model
1 INTRODUCTION
0162-8828 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPAMI.2017.2732978, IEEE
Transactions on Pattern Analysis and Machine Intelligence
[Fig. 1 graphic: pen-tip trajectories with receptive fields of multiple scales, path signature feature maps, residual FCN, and predicting recurrent network forming the MC-FCRN; the predicting sequence then passes through an embedding layer, semantic context learning with multilayered BLSTM, and transcription, forming the implicit language modeling stage.]
Fig. 1. Overview of the proposed method. Variable-length pen-tip trajectories are first translated into offline signature feature maps that preserve the essential online information. Then, a multi-spatial-context fully convolutional recurrent network (MC-FCRN) takes the signature feature maps as input, with receptive fields of different scales in a sliding-window manner, and generates a predicting sequence. Finally, an implicit LM is proposed to derive the final label sequence by exploiting the semantic context of embedding vectors that are transformed from the predicting sequence.
been successfully applied to handwritten text recognition in Western languages, where the character set is relatively small (e.g., for English, there are only 52 classes; therefore, it is easy to train the network). However, to the best of our knowledge, very few studies have attempted to address large-scale handwritten text recognition problems such as OHCTR, where, e.g., the text lines may be represented by more than 7,000 basic classes of characters and sum up to more than 1 million character samples.

Architectures that integrate CNN and LSTM exhibit excellent performance in terms of visual recognition and description [20], [21], scene text recognition [12], [13], [14], and handwritten text recognition [11]. In text recognition problems, deep CNNs generate highly abstract feature sequences from input sequential data. LSTM is fed with such feature sequences and generates the corresponding character strings. Jointly training LSTM with CNN is straightforward and can improve the overall performance significantly. However, in the above-mentioned methods, the CNNs, specifically fully convolutional networks (FCNs), process the input string with only a fixed-size receptive field in a sliding-window manner, which we claim is inflexible for unconstrained written characters in OHCTR. Moreover, a deep integrated CNN-LSTM network is usually accompanied by the degradation problem [22], which slows down the convergence procedure and affects the system optimization.

In this paper, we propose a novel solution (see Fig. 1) that integrates path signature, a multi-spatial-context fully convolutional recurrent network (MC-FCRN), and an implicit language model (implicit LM) to address the problem of unconstrained online handwritten text recognition. Path signature, a recent development in the field of rough path theory [23], [24], [25], is a promising approach for translating variable-length pen-tip trajectories into offline signature feature maps in our system, because it effectively preserves the online information that characterizes the analytic and geometric properties of the path. Encouraged by recent advances in deep CNNs and LSTMs, we propose the MC-FCRN for robust recognition of signature feature maps. MC-FCRN leverages the multiple spatial contexts that correspond to multiple receptive fields in each time step to achieve strong robustness and high accuracy. Furthermore, we propose an implicit LM, which incorporates semantic context within the entire predicting feature sequence from both forward and reverse directions, to enhance the prediction for each time step. The contributions of this paper can be summarized as follows:

• We develop a novel segmentation-free MC-FCRN to effectively capture the variable spatial contextual dynamics as well as the character information for high-performance recognition. With a series of receptive fields of different scales, MC-FCRN is able to model the complicated spatial context with strong robustness and high accuracy.

• The residual recurrent network, a basic component of MC-FCRN, not only accelerates the convergence process but also improves the optimization result, while adding neither extra parameters nor computational burden to the system, as compared to an ordinary stacked recurrent network.

• We propose an implicit LM that learns to model the output distribution given the entire predicting feature sequence. Unlike the statistical language model that predicts the next word given only a few previous words, our implicit LM exploits the semantic context not only from the forward and reverse directions of the text but also with arbitrary text length.

• Path signature, a novel mathematical feature set, brought from rough path theory [23], [24], [25]
[Fig. 2 graphic: (a) equal-distance resampling of the pen-tip trajectory and the window-based signature computed around a midpoint; (b) a worked example of computing path signature features via Eq. (5) and Chen's identity; (c) signature feature maps of a handwritten text sample; (d) path signatures of trajectories before and after randomly adding connections between adjacent strokes.]
Fig. 2. (a) Illustration of feature extraction of path signature. (b) A simple example of the calculation of path signature features. (c) Path signature of one typical online handwritten text example. (d) Left: path signature of the original pen-tip trajectories; right: path signature of the pen-tip trajectories with randomly added connections between adjacent strokes. Note that, except for the additional connections, the original part of the sequential data has the same path signature (same color).
his colleagues as a fundamental component of rough path theory [23], [24], [25]. Path signature was first introduced into handwritten Chinese character recognition by Benjamin Graham [52], followed by Yang et al. [32], [33], but only at the character level. We go further by applying path signature to extremely long sequential data that usually consist of hundreds of thousands of points, and we prove its effectiveness on the OHCTR problem. In the following, we first briefly introduce path signature theoretically, and then technically, for the sake of implementation and application.

Consider the pen strokes of the online handwritten text collected from a writing plane $H \subset \mathbb{R}^2$. Then, a pen stroke can be expressed as a continuous mapping denoted by $D\colon [a,b] \to H$ with $D_t = (D^1_t, D^2_t)$ and $t \in [a,b]$. For $k \ge 1$ and a collection of indexes $i_1, \dots, i_k \in \{1, 2\}$, the $k$-th fold iterated integral of $D$ along the indexes $i_1, \dots, i_k$ can be defined by

$$P(D)^{i_1,\dots,i_k}_{a,b} = \int_{a<t_1<\dots<t_k<b} \mathrm{d}D^{i_1}_{t_1} \cdots \mathrm{d}D^{i_k}_{t_k}. \quad (1)$$

The signature of the path is the collection of all the iterated integrals of $D$:

$$P(D)_{a,b} = \big(1,\ P(D)^{1}_{a,b},\ P(D)^{2}_{a,b},\ P(D)^{1,1}_{a,b},\ P(D)^{1,2}_{a,b},\ P(D)^{2,1}_{a,b},\ P(D)^{2,2}_{a,b},\ \dots\big), \quad (2)$$

where the superscripts of the terms $P(D)^{i_1,\dots,i_k}_{a,b}$ run over the set of all multi-indexes

$$G = \{(i_1, \dots, i_k) \mid i_1, \dots, i_k \in \{1, 2\},\ k \ge 1\}. \quad (3)$$

Then, the $k$-th iterated integral of the signature, $P(D)^{(k)}_{a,b}$, is the finite collection of terms $P(D)^{i_1,\dots,i_k}_{a,b}$ with multi-indexes of length $k$. More specifically, $P(D)^{(k)}_{a,b}$ is the $2^k$-dimensional vector defined by

$$P(D)^{(k)}_{a,b} = \big(P(D)^{i_1,\dots,i_k}_{a,b} \mid i_1, \dots, i_k \in \{1, 2\}\big). \quad (4)$$

In [25], it is proved that the whole signature of a path determines the path up to time re-parameterization; i.e., path signature can not only characterize the path displacement and its further derivatives, as the classical directional features do, but also provide more detailed analytic and geometric properties of the path. In practice, we have to use the truncated signature feature, which can still capture the global information of the path. Increasing the degree of the truncated signature results in exponential growth of the dimension but may not always lead to significant gain.

Next, we describe the practical calculation of path signature in OHCTR from the implementation and application point of view. As illustrated in Fig. 2a, the pen-tip trajectories of the online handwritten text samples are represented by a sequence of uniform-time sampling points. First, the uniform-time sampling trajectory is translated into an equal-distance sampling style. Then, to calculate the signature feature for a specific point, e.g., the red point in Fig. 2a, we estimate the window-based signature $P(D)_{t_1,t_3}$ that takes this point as the midpoint. In order to calculate $P(D)_{t_1,t_3}$, we first compute the point-wise signatures $P(D)_{t_1,t_2}$ and $P(D)_{t_2,t_3}$, and then combine them according to Chen's identity [51].

As adjacent sampling points of text samples are connected by a straight line $D_t = (D^1_t, D^2_t)$ with $t \in [a,b]$, the iterated integrals $P(D)^{(k)}_{a,b}$ can be calculated iteratively as follows:

$$P(D)^{(k)}_{a,b} = \begin{cases} 1, & k = 0,\\ \big(P(D)^{(k-1)}_{a,b} \otimes \triangle_{a,b}\big)/k, & k \ge 1, \end{cases} \quad (5)$$

where $\triangle_{a,b} := D_b - D_a$ denotes the path displacement and $\otimes$ represents the tensor product. In Fig. 2b, we provide a simple example to explain the calculation of path signature according to Eq. (5) and Eq. (2). Suppose we have two adjacent straight-line segments of $D$, one with $t \in [t_1, t_2]$ and the other with $t \in [t_2, t_3]$, as shown in Fig. 2a. Then, following Chen's identity [51], we can calculate the path signature of the concatenation of these two paths as

$$P(D)^{(k)}_{t_1,t_3} = \sum_{i=0}^{k} P(D)^{(i)}_{t_1,t_2} \otimes P(D)^{(k-i)}_{t_2,t_3}. \quad (6)$$

Given the pen-tip trajectories of online handwritten text, for each sequential stroke point, we first compute the path signature within a sliding window according to Eq. (2) and Eq. (6). The path signature features of a certain level $k$ along all the stroke points then form the corresponding feature maps. Specifically, the path signature feature vector of each stroke point spreads over $2^{(k+1)}-1$ two-dimensional matrices, according to the coordinates of the stroke point in the handwritten text data. The 0th-, 1st-, and 2nd-level iterated-integral signature feature maps, i.e., the above-mentioned two-dimensional matrices, of one typical online handwritten text example are visualized in Fig. 2c.
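To make the recursion concrete, the truncated signature of a piecewise-linear path can be sketched as below. This is an illustrative sketch rather than the authors' implementation: `line_signature` applies Eq. (5) to a single straight segment, and `chen_concat` combines two segment signatures via Chen's identity, Eq. (6); the function names and the NumPy representation of the tensor levels are our own choices.

```python
import numpy as np

def line_signature(delta, depth):
    """Signature of a straight segment with displacement `delta` (Eq. 5):
    P^(0) = 1 and P^(k) = (P^(k-1) (x) delta) / k, i.e. delta^{(x)k} / k!.
    Level k is stored as a k-dimensional array of shape (2,)*k."""
    sig = [np.array(1.0)]
    for k in range(1, depth + 1):
        sig.append(np.multiply.outer(sig[-1], delta) / k)
    return sig

def chen_concat(sig_a, sig_b, depth):
    """Chen's identity (Eq. 6): level k of the concatenated path is
    sum_i P^(i)(first) (x) P^(k-i)(second)."""
    out = []
    for k in range(depth + 1):
        out.append(sum(np.multiply.outer(sig_a[i], sig_b[k - i])
                       for i in range(k + 1)))
    return out
```

As a sanity check consistent with the invariance discussed above, concatenating two collinear segments reproduces the signature of the single longer segment: level 1 is the total displacement, and level 2 equals the outer product of the displacement with itself divided by 2.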
In Fig. 2a, $P(D)_{t_1,t_3}$ is computed with window size 3. In practice, we set the window size to 9 to keep strong local invariance and robustness. Fig. 2d shows that, although connections are randomly added between adjacent strokes within a character or between characters, their impact on the path signature of the original input string is not significant, which demonstrates that path signature based on a sliding window has excellent local invariance and robustness.

4 MULTI-SPATIAL-CONTEXT FCRN

Unlike character recognition, where it is easy to normalize characters to a fixed size, text recognition is complicated because it involves input sequences of variable length, such as feature maps and online pen-tip trajectories. We propose a new fully convolutional recurrent network (FCRN) for spatial context learning to overcome this problem by leveraging a fully convolutional network, a residual recurrent network, and connectionist temporal classification, all of which naturally take inputs of arbitrary size or length. Furthermore, we extend our FCRN to the multi-spatial-context FCRN (MC-FCRN), as shown in Fig. 1, to learn multi-spatial-context knowledge from complicated signature feature maps. In the following subsections, we briefly introduce the basic components of FCRN and explain their roles in the architecture. Then, we demonstrate how MC-FCRN performs multi-spatial-context learning for the OHCTR problem.

[21], and text recognition [12], [13]. Bidirectional LSTM (BLSTM) facilitates the learning of complex context dynamics in both forward and reverse directions, thereby significantly outperforming unidirectional networks. Stacked LSTM is also popular in sequence learning for accessing higher-level abstract information in the temporal dimension. Integrated CNN-LSTM systems have demonstrated outstanding capability in visual recognition and description [20], [21] and scene text recognition [12], [13], [14]. However, the degradation problem [39] usually accompanies deep integrated CNN-LSTM networks and slows down the convergence process. Driven by the significance of deep residual learning [22], [39] for the optimization of very deep networks, we present the residual recurrent network to accelerate the convergence of our FCRN and obtain a better optimization result. Theoretically, we explicitly reformulate the LSTM/BLSTM layer (denoted by $h$ with parameters $\theta_h$) as learning the spatial contextual information with reference to its input. Denoting the $l$-th LSTM/BLSTM layer output as $q_l(x)$, we have

$$q_l(x) = h(q_{l-1}(x)) + q_{l-1}(x). \quad (7)$$

Iteratively applying $q_l(x) = h(q_{l-1}(x)) + q_{l-1}(x) = h(q_{l-1}(x)) + h(q_{l-2}(x)) + q_{l-2}(x)$ up to $q_L(x)$, we get

$$q_L(x) = q_0(x) + \sum_{l=0}^{L-1} h(q_l(x)), \quad (8)$$
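The telescoping in Eq. (8) is easy to check numerically. The toy sketch below is our own illustration: plain scalar functions stand in for the LSTM/BLSTM transforms $h$ (allowing a distinct transform per layer), and unrolling the residual recurrence of Eq. (7) gives exactly $q_0$ plus the sum of the per-layer terms.

```python
def residual_stack(q0, layers):
    """Apply Eq. (7): q_l = h_l(q_{l-1}) + q_{l-1}; return [q_0, ..., q_L]."""
    qs = [q0]
    for h in layers:
        qs.append(h(qs[-1]) + qs[-1])
    return qs

def telescoped(q0, layers):
    """Eq. (8): q_L = q_0 + sum over layers of h_{l+1}(q_l)."""
    qs = residual_stack(q0, layers)
    # zip pairs h_1 with q_0, h_2 with q_1, ..., h_L with q_{L-1}
    return q0 + sum(h(q) for h, q in zip(layers, qs))
```

With two toy layers, e.g. doubling and incrementing, the recursively computed top output and the telescoped form of Eq. (8) coincide, which is the identity the residual formulation relies on.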
each time step and concatenating the labels to form a label sequence. The probability of alignments is given by

$$p(\pi|u) = \prod_{t=1}^{T} p(\pi_t, t|u). \quad (9)$$

Alignments can be mapped onto a transcription $l$ by applying a sequence-to-sequence operation $B$, which first removes the repeated labels and then removes the blanks. For example, "tree" can be obtained by $B$ from "tt-r-ee-e" or "-t-rr-e-eee" (with "-" denoting the blank). The total probability of a transcription can be calculated by summing the probabilities of all alignments that correspond to it:

$$p(l|u) = \sum_{\pi : B(\pi) = l} p(\pi|u). \quad (10)$$

As suggested by Graves and Jaitly [56], since the exact position of the labels within a particular transcription cannot be determined, we consider all the locations where they could occur, thereby allowing a network to be trained via CTC without pre-segmented data. A detailed forward-backward algorithm to efficiently calculate the probability in Eq. (10) was proposed by Graves [19].

4.2 Learning Multiple Spatial Contexts with Multi-Spatial-Context FCRN (MC-FCRN)

In recent years, the concept of introducing contextual information within sequential data by using recurrent neural networks has gained popularity [12], [13], [21], [36]. However, very few studies have focused on incorporating multi-scale context information into sequential recognition problems. Given that multi-scale strategies, such as SIFT [41], the pyramid match kernel [40], SPP [42], and GoogLeNet [43], bring significant improvements in different areas, we consider it of great advantage to introduce the multi-scale strategy into sequential problems such as OHCTR. Therefore, we propose a novel architecture, which we refer to as the multi-spatial-context FCRN, to learn to model the multi-spatial context within the sequential online handwritten text data. In the following, we introduce MC-FCRN progressively, from a basic model and spatial context learning to multi-spatial context learning.

4.2.1 Basic Model

First, we introduce a simple model having three components: FCN (denoted by $f$ with parameters $\theta_f$), fully connected layers (denoted by $g$ with parameters $\theta_g$), and CTC. Given signature feature maps $x = (x_1, x_2, \dots, x_T)$, where $x_t$ represents the receptive field of the $t$-th time step, the objective of this model is to learn to make a prediction for each receptive field:

$$o(z_t, x_t) = g(z_t, f(x_t)) \quad (11)$$
$$\text{s.t.}\ \sum_{i=1}^{|C'|} g(z_t = C'_i, f(x_t)) = 1, \quad g(z_t, f(x_t)) > 0,$$

where $z_t$ represents the prediction of the $t$-th time step and $o(z_t, x_t)$ models the probability distribution over all the words in $C'$ given the receptive field $x_t$ in the $t$-th time step. Since each frame in the FCN feature sequence represents a distribution over the character set without considering any other feature vector, this simple model can hardly incorporate any spatial context for recognition.

4.2.2 Learning Spatial Context

LSTM inherently possesses the advantage of processing sequential data; thus, it is a good choice for capturing spatial contextual information within an FCN feature sequence. Specifically, we use the proposed residual recurrent network for spatial context learning and develop the proposed FCRN by stacking the residual recurrent network right after the FCN. More importantly, the input of the residual recurrent network is now the output of the FCN, i.e., $q_0(x) = (f(x_1), f(x_2), \dots, f(x_T))$. Based on Eq. (8), the objective of the FCRN is to learn to make a prediction for each time step given the entire input signature feature maps:

$$o(z_t, x) = g(z_t, q^t_L(x)) \quad (12)$$
$$\text{s.t.}\ \sum_{i=1}^{|C'|} g(z_t = C'_i, q^t_L(x)) = 1, \quad g(z_t, q^t_L(x)) > 0,$$

where the overall system has parameters $\theta = (\theta_f, \theta_h, \theta_g)$, $q^t_L(x)$ is the $t$-th time step of the output feature sequence of the residual recurrent network, and $o(z_t, x)$ models the probability distribution over all the words in $C'$ in the $t$-th time step given the entire input signature feature maps $x$. When we remove the term $\sum_{l=0}^{L-1} h(q_l(x))$ from Eq. (8), Eq. (12) simply reduces to Eq. (11). Therefore, the residual recurrent network plays a key role in learning spatial context information in the FCRN.

4.2.3 Learning Multiple Spatial Contexts

Before moving toward learning multiple spatial contexts, we should first introduce the concept of the receptive field and describe its role in learning multi-spatial context. A receptive field is a rectangular local region of the input image that can be properly represented by a highly abstract feature vector in the output feature sequence of the FCN. Let $r_l$ represent the local region size (width/height) of the $l$-th layer, and let the $(x_l, y_l)$-coordinate denote the center position of this local region. Then, the relationship of $r_l$ and the $(x_l, y_l)$-coordinate between adjacent layers can be formulated as follows:

$$r_l = (r_{l+1} - 1)\, m_l + k_l,$$
$$x_l = m_l\, x_{l+1} + \Big(\frac{k_l - 1}{2} - p_l\Big), \quad (13)$$
$$y_l = m_l\, y_{l+1} + \Big(\frac{k_l - 1}{2} - p_l\Big),$$

where $k$ is the kernel size, $m$ is the stride size, and $p$ is the padding size of a particular layer. Recursively applying Eq. (13) to adjacent layers in the FCN, from the last response maps down to the original image, yields the region size and the center coordinate of the receptive field that corresponds to the related feature vector of the FCN feature sequence.

Technically, the receptive fields of an ordinary fully convolutional network are of the same scale, except for some specially designed networks, such as GoogLeNet. However,
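The sequence-to-sequence operation $B$ used above (remove repeated labels, then remove blanks) can be sketched in a few lines; this is a minimal illustration of the standard CTC collapse, with "-" standing for the blank symbol:

```python
def ctc_collapse(path, blank="-"):
    """Operation B: first drop consecutive repeated labels, then drop blanks."""
    deduped = []
    prev = None
    for c in path:
        if c != prev:  # keep only the first of each run of repeats
            deduped.append(c)
        prev = c
    return "".join(c for c in deduped if c != blank)
```

Both alignments "tt-r-ee-e" and "-t-rr-e-eee" collapse to "tree", so both would contribute their probabilities to $p(\text{"tree"}|u)$ in Eq. (10).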
unconstrained handwritten Chinese text suffers from a severe recognition problem owing to its large character set, variable writing styles, and the ambiguous segmentation problem. Therefore, in the OHCTR problem, it is significantly important to observe a specific local region of the original input image with a series of receptive fields of different scales.

In the following, we explain how to generate receptive fields with different scales for each time step. We observe that the size of the receptive field is sensitive to the kernel size. Assume that the kernel size is increased from $k_l$ to
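The size recursion in Eq. (13) can be traced with a short helper. This is our own sketch: the three-layer stack below (kernel, stride, padding values) is hypothetical and chosen only for illustration; padding shifts the center coordinates in Eq. (13) but does not affect the region size.

```python
def receptive_field_size(layers):
    """Recursively apply r_l = (r_{l+1} - 1) * m_l + k_l from Eq. (13),
    starting from a 1x1 region at the last response map.
    `layers` lists (kernel, stride, padding) tuples from input to output."""
    r = 1
    for kernel, stride, _padding in reversed(layers):
        r = (r - 1) * stride + kernel
    return r
```

For example, a 3x3 convolution (stride 1), a 2x2 pooling (stride 2), and another 3x3 convolution give each top-level feature vector an 8-pixel-wide receptive field in the input.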
process, our network was trained with shorter texts segmented from text lines in the training data, which could be normalized to the same height of 126 pixels while retaining a width of fewer than 576 pixels. In the test phase, we maintained the same height but increased the width to 2400 pixels in order to include the text lines from the test set.

We conducted our experiments within the CAFFE [64] deep learning framework, in which LSTM is implemented following the approach of Venugopalan et al. [21], and we used AdaDelta as the optimization algorithm with $\rho = 0.9$. The experiments were conducted using GeForce Titan-X GPUs. For performance evaluation, we used the correct rate (CR) and accuracy rate (AR) as performance indicators, as specified in the ICDAR 2013 Chinese handwriting recognition competition [65].

[Fig. 5 graphic: layer configurations (kernel k, stride s, padding p, channels c, nodes n) of FCRN, 2C-FCRN, and 3C-FCRN built on the MC-FCRN backbone, and of the implicit language model with its embedding layer and a transcription layer of 7356+1 output classes.]
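For reference, CR and AR are commonly computed from the edit-distance alignment of the recognized text against the ground truth. The sketch below assumes the usual competition-style formulas over substitution, deletion, and insertion counts ($S_e$, $D_e$, $I_e$) and $N_t$ reference characters; the exact definitions should be checked against [65].

```python
def correct_and_accuracy_rate(n_ref, subs, dels, ins):
    """Assumed ICDAR-2013-style metrics:
    CR = (N_t - S_e - D_e) / N_t    (penalizes substitutions and deletions)
    AR = (N_t - S_e - D_e - I_e) / N_t  (additionally penalizes insertions)."""
    cr = (n_ref - subs - dels) / n_ref
    ar = (n_ref - subs - dels - ins) / n_ref
    return cr, ar
```

Because AR also subtracts insertion errors, it is never higher than CR, which matches the ordering of the CR/AR pairs reported in Table 5.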
TABLE 3
Effect of Spatial Context (Percent)

TABLE 4
Effect of Semantic Context (Percent)

[Fig. 6 graphic: correct rate vs. training epoch on the validation set for LSTM, Residual LSTM, BLSTM, and Residual BLSTM.]

Fig. 6. Curves of correct rate of the (residual) recurrent networks on the validation set for the first training stage (refer to Section 6.2).

more importantly, with less oscillation.

The effects of multiple spatial contexts are summarized in Table 3, and the network architectures of FCRN, 2C-FCRN, and 3C-FCRN are shown in Fig. 5. From Fig. 5, we can see that FCRN, 2C-FCRN, and 3C-FCRN have one, two, and three receptive fields of different scales for each time step, respectively. The experiments showed that the system performance improved monotonically on both Dataset-ICDAR and Dataset-CASIA in the order of FCRN, 2C-FCRN, and 3C-FCRN, suggesting that we successfully leveraged the multiple spatial contexts by using multiple receptive fields and improved the system performance. Furthermore, we designed FCRN-2 and FCRN-3 such that their architectures and sizes were similar to those of 2C-FCRN and 3C-FCRN, except that their receptive fields for each time step were of the same scale. As listed in Table 3, although FCRN-3 does benefit from an increased number of parameters, its performance is even lower than that of 2C-FCRN, which further verifies the significance of the additional spatial context.

The inception mechanism of GoogLeNet, a similar but different way to leverage multiple spatial contexts, has demonstrated outstanding ability in image classification and object detection tasks. Therefore, we replaced the fully convolutional network with GoogLeNet using 6 inception layers and 9 inception layers, respectively, with proper customization to maintain a reasonable length of the output feature sequence. However, as listed in Table 3, incorporating the inception mechanism does not yield sufficiently good results. There are two reasons for this phenomenon. First, in MC-FCRN, we carefully maintain the different receptive fields of the same time step at the same center position (as illustrated in Fig. 3), while the inception mechanism of GoogLeNet encourages convolution kernels of different scales to fuse together and stack upon one another. The stacked inception layers perform well in image-based classification problems, but cause confusion between adjacent time steps, thus failing in sequence labeling problems such as OHCTR. Second, the gradient explosion problem frequently occurs during the training of the integrated GoogLeNet-LSTM network. Note that in this paper we have already used multiple ways to avoid the loss oscillation problem, such as the Batch Normalization strategy and the residual recurrent network, but oscillation still occurs with GoogLeNet-LSTM during optimization.

6.3.3 Effect of Semantic Context

To evaluate the effectiveness of the implicit LM, we conducted experiments based on the FCRN text line recognizer and three different corpora, i.e., the PFR, PH, and CLDC corpora. As listed in Table 4, both the implicit LM and the statistical language model substantially improved the system performance on both Dataset-ICDAR and Dataset-CASIA. In the table, we also observe the superiority of the implicit LM over the statistical language model, especially on Dataset-ICDAR. We attribute this superiority to the potential ability of the implicit LM to learn semantic context from the entire sequence as well as the confidence information of the predicted words. Furthermore, the evaluation time for the Dataset-ICDAR and Dataset-CASIA testing data is provided as the runtime term in the table. With the maximum number of prefix paths for beam search on the statistical language model set to 100, the proposed implicit LM is approximately 70 times faster than the statistical language model, thus exhibiting a significant advantage for practical applications. Furthermore, unlike the statistical language model, whose RAM size increases as the corpus grows, the implicit LM has
TABLE 5
Comparison with Previous Methods Based on Correct Rate (CR) and Accuracy Rate (AR) (Percent) for Dataset-ICDAR and Dataset-CASIA

                                                                Dataset-ICDAR               Dataset-CASIA
Method                                                        w.o. LM     with LM        w.o. LM     with LM
                                                              CR    AR    CR     AR      CR    AR    CR     AR
Shi et al., 2016 [12]                                        85.14 83.60    -      -    89.58 87.67    -      -
Wang et al., 2012 [1]                                           -     -     -      -       -     -   92.76  91.97
Zhou et al., 2013 [3]                                           -     -   94.62  94.06  87.93 85.92  94.34  93.75
Zhou et al., 2014 [4]                                           -     -   94.76  94.22     -     -   95.32  94.69
VO-3 [58]                                                       -     -   95.03  94.49     -     -      -      -
2C-FCRN (residual LSTM) + CLDC (implicit LM)                 90.17 88.88  96.01  95.46  94.47 93.31  97.07  96.72
2C-FCRN (residual LSTM) + CLDC (statistical LM)              90.17 88.88  94.51  93.45  94.47 93.31  97.01  96.49
2C-FCRN (residual LSTM) + CLDC (implicit LM & statistical LM) 90.17 88.88 96.58  96.09  94.47 93.31  97.50  97.23
Sun et al., 2016 [66] (2765 classes)                         90.18 89.12  94.43  93.40  95.82 95.30  97.55  97.05
3C-FCRN (2651 classes)                                       93.53 92.86  97.15  96.50  95.50 94.73  97.75  97.31

Upper: 7356 classes; Lower: less than 3000 classes.
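The relative error reductions quoted in the comparison below can be reproduced directly from the accuracy rates in Table 5; a minimal sketch (the helper name is ours, not from the paper):

```python
def relative_error_reduction(prev_ar, new_ar):
    """Relative reduction (%) of the error rate, where error = 100 - AR."""
    prev_err = 100.0 - prev_ar
    new_err = 100.0 - new_ar
    return (prev_err - new_err) / prev_err * 100.0

# Dataset-ICDAR: previous best AR with LM is 94.49 (VO-3 [58]);
# ours (implicit LM & statistical LM) is 96.09.
print(round(relative_error_reduction(94.49, 96.09), 2))  # 29.04

# Dataset-CASIA: previous best AR with LM is 94.69 (Zhou et al., 2014 [4]);
# ours is 97.23.
print(round(relative_error_reduction(94.69, 97.23), 2))  # 47.83
```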
[Fig. 7 panels: left column compares FCRN with Sig0 vs. Sig3 features, LSTM vs. residual LSTM, and FCRN vs. 3C-FCRN against the ground truth; right column compares +Statistical LM vs. +Implicit LM outputs against the ground truth.]
Fig. 7. Unconstrained handwritten Chinese text lines with their corresponding labels and the results predicted by the systems.
a fixed size that is determined by its network architecture. Therefore, the implicit LM is an ideal alternative to the statistical language model in the case of a large corpus, e.g., CLDC.
It should be noted that the implicit LM requires a significant amount of training time in the experiments. With the small corpora PH and PFR, it takes about 10 days to achieve good convergence with the simple FCRN recognizer. For the CLDC corpus, which is at least 12 times larger than PH and PFR, we spent approximately 9 weeks optimizing the implicit LM to outperform the statistical LM. Although its training process is time-consuming, in return the implicit LM provides more accurate and much faster predictions in the test phase, exhibiting great potential for practical applications. As listed in Table 4, the system performance can be further improved by jointly applying both the implicit LM and the statistical language model, which further verifies the complementarity between them.

6.3.4 Comparison with Published State-of-the-art Methods
The methods of Wang et al. [1], Zhou et al. [3], [4], and VO-3 [58] were all based on the segmentation strategy, which differs from our segmentation-free MC-FCRN, which incorporates the recently developed FCN, LSTM, and CTC. As listed in Table 5, our proposed method, 2C-FCRN with the implicit LM trained on the CLDC corpus, demonstrates performance superior to state-of-the-art results on both Dataset-ICDAR and Dataset-CASIA. Furthermore, we integrated the implicit LM with the statistical language model to leverage their complementarity; the results are also listed in Table 5. It can be observed that our method significantly outperforms the best previous results on Dataset-ICDAR and Dataset-CASIA, with relative error reductions of 29.04% and 47.83% in accuracy rate, respectively. Unlike the above-mentioned systems that have 7356 classes, Sun et al. [66] applied a deep stacked LSTM with only 2675 classes to tackle the OHCTR problem. For a relatively fair comparison, we reduced our network categories to 2650 classes, similar to the training set of CASIA2.0-CASIA2.2. The experiments show that our network achieves much more balanced and better results on both Dataset-ICDAR and Dataset-CASIA compared to Sun et al. [66].

6.4 Error Analysis
In this section, we offer typical examples, shown in Fig. 7, to demonstrate the effectiveness of our method. The left-top examples in Fig. 7 demonstrate that Sig3 takes advantage of online information to recognize ambiguous characters. The left-middle examples show that the residual recurrent network improves the optimization of the network. The left-bottom examples show that MC-FCRN has a strong capability of capturing spatial context information for recognition. Finally, as shown in the right panel of the figure, the implicit LM exhibits a better capability than the statistical language model of leveraging semantic context information, as well as the confidence information of predicted words, for recognition.

7 CONCLUSION
In this paper, we addressed the challenging problem of unconstrained online handwritten Chinese text recognition by proposing a novel system that incorporates the path signature, a multi-spatial-context fully convolutional recurrent network (MC-FCRN), and an implicit language model. We exploited the spatial structure and online information of pen-tip trajectories with the powerful path signature. Experiments showed
that the path signature truncated at level two achieves a reasonable trade-off between efficiency and complexity for the OHCTR problem. For spatial context learning, we presented the residual recurrent network to accelerate the convergence process and improve the optimization results without introducing extra parameters or computational burden to the system. For multi-spatial-context learning, we demonstrated that our MC-FCRN successfully exploits multiple spatial contexts from receptive fields with multiple scales to robustly recognize the input signature feature maps. For semantic context learning, an implicit LM was developed to learn to make predictions conditioned on the entire predicting feature sequence, significantly improving the system performance. In the experiments, our best result significantly outperformed all other existing methods on the two standard benchmarks Dataset-ICDAR and Dataset-CASIA.
One limitation of the proposed implicit LM is that it involves too much training time in the case of a large corpus. How to accelerate the training procedure of the RNN-based implicit LM is still a challenging problem and remains for further study.

REFERENCES
[1] D.-H. Wang, C.-L. Liu, and X.-D. Zhou, "An approach for real-time recognition of online chinese handwritten sentences," Pattern Recognition, vol. 45, no. 10, pp. 3661–3675, 2012.
[2] Q.-F. Wang, F. Yin, and C.-L. Liu, "Handwritten chinese text recognition by integrating multiple contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1469–1481, 2012.
[3] X.-D. Zhou, D.-H. Wang, F. Tian, C.-L. Liu, and M. Nakagawa, "Handwritten chinese/japanese text recognition using semi-markov conditional random fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 10, pp. 2413–2426, 2013.
[4] X.-D. Zhou, Y.-M. Zhang, F. Tian, H.-A. Wang, and C.-L. Liu, "Minimum-risk training for semi-markov conditional random fields with application to handwritten chinese/japanese text recognition," Pattern Recognition, vol. 47, no. 5, pp. 1904–1916, 2014.
[5] Y.-C. Wu, F. Yin, and C.-L. Liu, "Improving handwritten chinese text recognition using neural network language models and convolutional neural network shape models," Pattern Recognition, vol. 65, pp. 251–264, 2017.
[6] Y. Bengio, "Markovian models for sequential data," Neural Computing Surveys, vol. 2, no. 1049, pp. 129–162, 1999.
[7] T.-H. Su, T.-W. Zhang, D.-J. Guan, and H.-J. Huang, "Off-line recognition of realistic chinese handwriting using segmentation-free strategy," Pattern Recognition, vol. 42, no. 1, pp. 167–182, 2009.
[8] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, 2009.
[9] M. Liwicki, A. Graves, H. Bunke, and J. Schmidhuber, "A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks," Proc. 9th Int. Conf. on Document Analysis and Recognition, vol. 1, pp. 367–371, 2007.
[10] R. Messina and J. Louradour, "Segmentation-free handwritten chinese text recognition with LSTM-RNN," 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 171–175, 2015.
[11] Z. Xie, Z. Sun, L. Jin, Z. Feng, and S. Zhang, "Fully convolutional recurrent network for handwritten chinese text recognition," CoRR, vol. abs/1604.04953, 2016.
[12] B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[13] P. He, W. Huang, Y. Qiao, C. C. Loy, and X. Tang, "Reading scene text in deep convolutional sequences," 2016.
[14] D. K. Sahu and M. Sukhwani, "Sequence to sequence learning for optical character recognition," CoRR, vol. abs/1511.04176, 2015.
[15] G. Seni and E. Cohen, "External word segmentation of off-line handwritten text lines," Pattern Recognition, vol. 27, no. 1, pp. 41–52, 1994.
[16] M. Cheriet, N. Kharma, C.-L. Liu, and C. Suen, Character Recognition Systems: A Guide for Students and Practitioners. John Wiley & Sons, 2007.
[17] J. Schenk and G. Rigoll, "Novel hybrid NN/HMM modelling techniques for on-line handwriting recognition," in Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft, 2006.
[18] S. Marukatat, T. Artieres, P. Gallinari, and B. Dorizzi, "Sentence recognition through hybrid neuro-markovian modeling," Sixth International Conference on Document Analysis and Recognition, pp. 731–735, 2001.
[19] A. Graves, Supervised Sequence Labelling. Springer, 2012.
[20] Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui, "Jointly modeling embedding and translation to bridge video and language," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4594–4602.
[21] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko, "Sequence to sequence - video to text," pp. 4534–4542, 2015.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," pp. 630–645, 2016.
[23] T. Lyons and Z. Qian, System Control and Rough Paths, 2002.
[24] T. Lyons, "Rough paths, signatures and the modelling of functions on streams," CoRR, vol. abs/1405.4537, 2014.
[25] B. Hambly and T. Lyons, "Uniqueness for the signature of a path of bounded variation and the reduced path group," Annals of Mathematics, pp. 109–167, 2010.
[26] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified quadratic discriminant functions and the application to chinese character recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 1, pp. 149–153, 1987.
[27] R. Plamondon and S. N. Srihari, "Online and off-line handwriting recognition: a comprehensive survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63–84, 2000.
[28] B. Verma, J. Lu, M. Ghosh, and R. Ghosh, "A feature extraction technique for online handwriting recognition," vol. 2, pp. 1337–1341, 2004.
[29] Z.-L. Bai and Q. Huo, "A study on the use of 8-directional features for online handwritten chinese character recognition," Eighth International Conference on Document Analysis and Recognition, pp. 262–266, 2005.
[30] F. Biadsy, J. El-Sana, and N. Y. Habash, "Online arabic handwriting recognition using hidden markov models," in Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft, 2006.
[31] C.-L. Liu and X.-D. Zhou, "Online japanese character recognition using trajectory-based normalization and direction feature extraction," in Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft, 2006.
[32] W. Yang, L. Jin, Z. Xie, and Z. Feng, "Improved deep convolutional neural network for online handwritten chinese character recognition using domain-specific knowledge," 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 551–555, 2015.
[33] W. Yang, L. Jin, D. Tao, Z. Xie, and Z. Feng, "DropSample: a new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten chinese character recognition," Pattern Recognition, vol. 58, pp. 190–203, 2016.
[34] T. Bluche and R. Messina, "Scan, attend and read: End-to-end handwritten paragraph recognition with MDLSTM attention," CoRR, vol. abs/1604.03286, 2016.
[35] T. Bluche, "Joint line segmentation and transcription for end-to-end handwritten paragraph recognition," pp. 838–846, 2016.
[36] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634, 2015.
[37] R. K. Srivastava, K. Greff, and J. Schmidhuber, "Training very deep networks," in Advances in Neural Information Processing Systems, 2015, pp. 2377–2385.
[38] R. K. Srivastava, K. Greff, and J. Schmidhuber, "Highway networks," CoRR, vol. abs/1505.00387, 2015.
[39] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[40] K. Grauman and T. Darrell, "The pyramid match kernel: Discriminative classification with sets of image features," Tenth IEEE International Conference on Computer Vision, vol. 2, pp. 1458–1465, 2005.
[41] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[42] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
[43] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[44] Q.-F. Wang, F. Yin, and C.-L. Liu, "Integrating language model in handwritten chinese text recognition," 10th International Conference on Document Analysis and Recognition, pp. 1036–1040, 2009.
[45] Y. Bengio, H. Schwenk, J.-S. Senecal, F. Morin, and J.-L. Gauvain, "Neural probabilistic language models," Innovations in Machine Learning, pp. 137–186, 2006.
[46] L. Vilnis and A. McCallum, "Word representations via gaussian embedding," International Conference on Learning Representations (ICLR), 2014.
[47] T. Mukherjee and T. Hospedales, "Gaussian visual-linguistic embedding for zero-shot recognition," EMNLP, 2016.
[48] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," Advances in Neural Information Processing Systems, pp. 3104–3112, 2014.
[49] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," INTERSPEECH, vol. 2, p. 3, 2010.
[50] M. Sundermeyer, R. Schluter, and H. Ney, "LSTM neural networks for language modeling," INTERSPEECH, pp. 194–197, 2012.
[51] K.-T. Chen, "Integration of paths - a faithful representation of paths by noncommutative formal power series," Transactions of the American Mathematical Society, vol. 89, no. 2, pp. 395–407, 1958.
[52] B. Graham, "Sparse arrays of signatures for online character recognition," CoRR, vol. abs/1308.0371, 2013.
[53] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[54] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[55] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," CoRR, 2014.
[56] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772, 2014.
[57] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang, "CASIA online and offline chinese handwriting databases," 2011 International Conference on Document Analysis and Recognition (ICDAR), pp. 37–41, 2011.
[58] F. Yin, Q.-F. Wang, X.-Y. Zhang, and C.-L. Liu, "ICDAR 2013 chinese handwriting recognition competition," 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 1464–1470, 2013.
[59] "The People's Daily corpus," http://icl.pku.edu.cn/icl groups/corpus/dwldform1.asp, [Online] the People's Daily News and Information Center, the Peking University Institute of Computational Linguistics, and Fujitsu Research and Development Center Limited. Accessed March 25, 2016.
[60] G. Jin, "The PH corpus," ftp://ftp.cogsci.ed.ac.uk/pub/chinese, [Online] Accessed March 25, 2016.
[61] "Chinese Linguistic Data Consortium," http://www.chineseldc.org, [Online] the Contemporary Corpus developed by the State Language Commission P.R. China, Institute of Applied Linguistics, 2009. Accessed October 22, 2016.
[62] A. Stolcke et al., "SRILM - an extensible language modeling toolkit," INTERSPEECH, vol. 2002, p. 2002, 2002.
[63] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," CoRR, vol. abs/1502.03167, 2015.
[64] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," Proceedings of the ACM International Conference on Multimedia, pp. 675–678, 2014.
[65] C.-L. Liu, F. Yin, Q.-F. Wang, and D.-H. Wang, "ICDAR 2011 chinese handwriting recognition competition," 2011.
[66] L. Sun, T. Su, C. Liu, and R. Wang, "Deep LSTM networks for online chinese handwriting recognition," in Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference on. IEEE, 2016, pp. 271–276.

Zecheng Xie is a PhD student in information and communication engineering at the South China University of Technology. He received a BS in electronics and information engineering from the South China University of Technology in 2014. His research interests include machine learning, document analysis and recognition, computer vision, and human-computer interaction.

Zenghui Sun is a master's student in communication and information systems at the South China University of Technology. He received a BS in electronics and information engineering from the South China University of Technology. His research interests include machine learning and computer vision.

Lianwen Jin (M'98) received the B.S. degree from the University of Science and Technology of China, Anhui, China, and the Ph.D. degree from the South China University of Technology, Guangzhou, China, in 1991 and 1996, respectively. He is a professor in the College of Electronic and Information Engineering at the South China University of Technology. His research interests include handwriting analysis and recognition, image processing, machine learning, and intelligent systems. He has authored over 100 scientific papers. He has received the New Century Excellent Talent Program of MOE Award and the Guangdong Pearl River Distinguished Professor Award, and is a member of the IEEE Computational Intelligence Society, IEEE Signal Processing Society, and IEEE Computer Society.

Hao Ni has been a senior lecturer in financial mathematics at UCL since September 2016. Prior to this, she was a visiting postdoctoral researcher at ICERM and the Department of Applied Mathematics at Brown University from September 2012 to May 2013, and continued her postdoctoral research at the Oxford-Man Institute of Quantitative Finance until 2016. She finished her D.Phil. in mathematics in 2012 under the supervision of Professor Terry Lyons at the University of Oxford.

Terry Lyons is the Wallis Professor of Mathematics at Oxford University. He was a founding member (2007) of, and then Director (2011-2015) of, the Oxford-Man Institute of Quantitative Finance. He was the Director of the Wales Institute of Mathematical and Computational Sciences (WIMCS; 2008-2011). Lyons came to Oxford in 2000, having previously been Professor of Mathematics at Imperial College London (1993-2000); before that, he held the Colin Maclaurin Chair at Edinburgh (1985-93). His research interests are focused on rough paths, stochastic analysis, and applications. He is also interested in developing mathematical tools that can be used to effectively model and describe high-dimensional systems that exhibit randomness. He was President of the UK learned society for mathematics, the London Mathematical Society (2013-2015).