
Languages and translation
#6, 02/2013

Machine translation
CONTENTS

ABOUT Machine translation
No rage against the machine
MT@Work Conference: by practitioners for practitioners
Evaluating Machine Translation: preliminary findings from the 1st DGT-wide translators’ survey
Technical Challenges for Machine Translation in the European Institutions
Working with Translators
The insatiable appetite for data

VOICES FROM OUTSIDE
Predicting Translation
Choose your own translation future
TAPTA4UN: machine translation collaboration between the United Nations and the World Intellectual Property Organization

INTERVIEW
Machine translation: a tool to embrace and master

USEFUL INFORMATION
John Hutchins: the recognised historian of Machine Translation
Research in the field of Language Technologies
HIGHLIGHTS:
• From the websites of some EU-funded MT research projects
• From websites of some other EU-funded research projects, initiatives and networks

Contributors
EDITORIAL by Anabela Pereira, editor-in-chief

In September 2010, the Directorate-General for Translation (DGT) launched the thematic magazine ‘Languages and translation’, which includes contributions from authors outside the microcosm of the European civil service. This issue will be the last in the series, and its theme is machine translation (MT). The time has come to take stock of all the experience that DGT has accumulated in this field.

The use of MT systems at DGT dates back to the 1990s. Successive waves of enlargement brought a sharp increase in the number of official languages and therefore in the number of possible language combinations. This linguistic richness, which seemed to have sounded the death knell of the rule-based system used previously, now puts DGT in a privileged position when it comes to developing the new statistical systems.

A considerable quantity of data is produced every day at DGT by a body of highly qualified translators, in the 23 official languages of the European Union. For 10 years this mass of data has been feeding the EURAMIS database in a format that MT engines can readily exploit, which creates ideal conditions for developing and refining these statistical systems.

Our Director-General, Mr Rytis Martikonis, has fully supported MT ever since his arrival at DGT, aware of its strategic value as a tool for making the most of Europe’s linguistic diversity. The EU has invested substantial resources in developing the complex applications that underpin these systems. In 2010, DGT launched the MT@EC project, which aims to develop a statistical MT system based on the open-source software platform Moses. At the end of December 2012, DGT organised the first in a series of conferences devoted to MT, in order to bring together all the parties concerned and draw useful conclusions for the future.

The project is driven by a competent and motivated team, supported by a great many enthusiastic translators whose contribution is decisive. We have gathered the accounts of a number of these colleagues, which will give you a fairly precise idea of how far the project has progressed at DGT and of the conclusions of the conference. This issue also includes contributions from people outside the service, who give a more global picture of developments in MT.

Translators, however, have nothing to worry about: MT systems are only one tool among others, offered to make their task easier and not to replace them! No machine will ever be able to show the same discernment as a human being in choosing the best translation solutions!

Languages and Translation
No 6, February 2013

Editor: Anabela Pereira
Graphic, layout and typesetter: Philippe Marchetto

Published by the Directorate-General for Translation (Pinuccia Contino, DGT.02 - Communication and relations with stakeholders Unit)

Printed by Office for Infrastructure and Logistics, Brussels

Download this magazine from EUROPA (the EU’s website): ec.europa.eu/dgs/translation/publications/magazines

Address editorial correspondence to:
Communication and relations with stakeholders Unit
Directorate-General for Translation
European Commission
rue de Genève, 6
Office G-6 6/16
B-1140 Brussels
Belgium

Neither the European Commission nor any person acting on its behalf is responsible for any use which might be made of the information contained in Languages and translation. This is not an official publication and neither the Commission nor any of its services are bound in any way by its contents.
01/2013 #6 3
ABOUT Machine translation

No rage against the machine


by Josep Bonet, Head of the Informatics Unit in DGT

Machine translation, or MT for
short, has a history in DGT. Even
in the early 90s an MT system
based on rules was in use by some
translators. At the time, a limited
number of languages were covered,
with varying degrees of quality. As
the story goes, the French, English
or German translators abhorred it,
while the Spanish or the Portuguese
rather liked it. Indeed, around 1995
I was working in a unit where all
incoming documents, provided
they were in French or English
— i.e. almost all of them — were
systematically pre-translated by the
clerical assistants using the ECMT
system in place at the time, and
almost everybody used the output
on all occasions.

However, MT received a bad blow around the time of the big bang that shook the EU. The rule-based system had been enlarged from covering two languages to covering ten. And this at a moment when the EU had eleven official languages! Quite significant financial resources were devoted to MT, at a time when cost control was coming to the fore. To get a language pair up to a sufficient level of quality required years of effort and substantial injections of money. But, worst of all, with nine new languages being added in one go, and three more to be added less than three years after, improving the system was mission impossible. Resources would never be made available to turn MT into a technology for all.

The MT winter came: not quite like the nuclear one, but cold nevertheless. MT lost steam. It was considered a non-democratic technology, compared to translation memories. Indeed, not only did translation memory technology require less investment, it also yielded the same results for all languages, be they Romance, Slavic or Finno-Ugric. So DGT stopped investing in MT, except where Language Departments were ready to use their own translators to improve or correct dictionaries.

The prospect was quite bleak. A system based on rules needs constant fine-tuning of those rules, but especially updating of glossaries. It was clear that after a few years, the system would reach a point of degradation which would render it useless even for the ‘good’ pairs. Eventually it was an external factor, a judgment by the General Court of the EU, which forced the system to be discontinued. That was the end of it. RIP.

Translation memories, on the other hand, while being a wonderful breakthrough of the 90s which had made it possible to reuse past translations, thus guaranteeing higher productivity and much higher terminological and phraseological consistency, were not the ultimate solution either. With very repetitive text, with files produced using strong input from past documents, they could yield astonishing results. But in the European Commission, which has the right of initiative, i.e. deals with many new subjects, match rates of 30% are reckoned to be rather good for standard documents. Translators knew that much more wisdom was buried in the huge memories that DGT held in the European Commission’s Data Centre. A simple function like concordance yielded good results, and in context, answered almost 100% of queries. Many phrases were buried in thousands of sentences, but were not being retrieved with memory technology because the remainder of the sentence was completely different. It was clear to many that billions of bilingual parallel sentences, such as DGT kept in its digital vaults, could and should be squeezed to extract all the knowledge contained in them.

The answer was at the other side of town, a small town called Luxembourg. A succession of EU Research Framework Programmes had been funding research projects on MT for many years. One of them produced Moses, the world’s most heavily used open-source system, to create MT pairs with statistical methods by crunching millions of sentences, preferably parallel text. MT was at the fingertips of anyone with a PC … and lots of data. And, curiously, no linguistic knowledge was needed, at least to start with. Believe it or not, all the machine does is analyse parallel text to identify the frequency at which, when a word A is found in the source language, a word B is found in the target one. The higher the frequency, the more probable it is that we’ve found the good translation. Over and above this, the machine computes the probabilities that groups of 2 to 7 or 8 words appear in a certain order. The rest is simple. With the first probabilities, it translates all words. With the second ones, it reassembles the words and creates sentences. The result sometimes is not very good. But many other times it is a perfectly idiomatic expression!

Around the same time, a quiet revolution was in the making out there. A well-known company built around a popular search engine had been offering MT for a while, based on the same rule-based technology that DGT had used in the previous century. And it then decided to offer all internet surfers free statistical MT produced with all the corpora they could obtain by crawling the web.

The uptake was quite rapid in the beginning and extremely fast soon after. Suddenly, MT was not a laughable technology, known for the funny mistranslations (for instance, ‘the spirit is willing, but the flesh is weak’ translated as ‘the vodka is good, but the meat is rotten’). MT was useful, because in the meantime the world had gone more global than ever and people were happy to accept an inferior-quality translation rather than no translation at all.

It is in this context that DGT decided in 2010 to develop an entirely new data-driven MT system which would cover all languages in pairs with at least one of the procedural languages. The aim was to better serve our customers in two ways: by providing them with a self-service system offering raw translations, and by improving the efficiency of translators who, by using MT output as one of the language tools at their disposal, could produce more text with even better consistency. It was further felt that, given the restricted nature of many texts translated in the European Commission, be it by translators or by any other user, it was important to use a system running on the Commission’s premises.

The rest is pure engineering. Very soon 42 language pairs had been created and tested. Ten language departments were satisfied with the resultant quality levels and were included in a scheme automatically producing MT for all incoming requests to be translated into their respective languages. That number has now risen to 17 languages. Version 2 of the language pairs, which now stood at 52, was produced and declared good. And a system to cater for the anticipated big demand was developed. The system will become fully operational on 1 July 2013. This day will mark the end of winter.
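The two kinds of probability described in this article (how often a target word B accompanies a source word A, and how likely words are to follow one another) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the four-sentence English-French corpus is invented, the function names are hypothetical, word order is scored with groups of two words only rather than up to seven or eight, and none of this reflects how Moses or MT@EC is actually implemented.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Invented toy parallel corpus: (English source, French target).
PARALLEL = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

def word_translation_probs(pairs):
    """Relative frequency with which target word B co-occurs with source word A."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for a in src.split():
            counts[a].update(tgt.split())
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

def bigram_probs(sentences):
    """P(word | previous word), estimated from target-language sentences."""
    counts = defaultdict(Counter)
    for s in sentences:
        words = ["<s>"] + s.split()  # "<s>" marks the start of a sentence
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return {p: {w: n / sum(c.values()) for w, n in c.items()}
            for p, c in counts.items()}

def order_score(words, bigrams):
    """Probability of a word sequence under the bigram model (0.0 if unseen)."""
    score, prev = 1.0, "<s>"
    for w in words:
        score *= bigrams.get(prev, {}).get(w, 0.0)
        prev = w
    return score

def translate(sentence, word_probs, bigrams):
    """Step 1: pick the most frequent counterpart of each word.
    Step 2: keep the ordering that the bigram model finds most probable."""
    chosen = [max(word_probs[a], key=word_probs[a].get) if a in word_probs else a
              for a in sentence.split()]
    # Trying every ordering is only feasible for very short sentences.
    best = max(permutations(chosen), key=lambda p: order_score(p, bigrams))
    return " ".join(best)

WORD_PROBS = word_translation_probs(PARALLEL)
BIGRAMS = bigram_probs(t for _, t in PARALLEL)
print(translate("a house", WORD_PROBS, BIGRAMS))
```

With so little data the counts are crude, which is the article's point about needing "lots of data": the more parallel sentences are counted, the more reliable both sets of probabilities become.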


MT@Work Conference: by practitioners for practitioners


by Piet Verleysen, Machine Translation System Owner, with input from Daniel Kluvanec, Machine Translation Business
Manager, Paula Simpson, Policy Officer, and Hilário Leal Fontes, senior translator

DGT has been using and developing machine translation (MT) for many
years now. The time has now come to capitalise on all this experience. On
30 November 2012, DGT organised a conference in Luxembourg on ‘MT@work’,
the first of its kind. It took the form of a trialogue between partners
from three different worlds of professional translation, all of them with
their own particular technology solutions for machine translation as a part
of their workflow: the Directorate General for Translation of the European
Commission, the Documentation Division of the United Nations, and TAUS
(the Translation Automation User Society). The aim of the conference was to
share knowledge between professionals on the impact of MT for professional
translation in the EU institutions, and to take a further step in the direction
of better tools and better integration.

Participants, predominantly from translation services of all the EU institutions, produced a day which crackled with ideas and which helped to make the overall picture clear. Where do we stand on using state-of-the-art technologies in legislative drafting and institutional translation? What are our aims as we take this profession forward?

A real CHALLENGE: with technology evolving ever faster, how do we keep track of it and how does it impact our working processes? We are continually being asked to create more and more added value with fewer and fewer resources. So the important thing is not to work harder, but to work smarter. Where does translation stand in terms of industrial revolution history?

MT is ‘only’ a TOOL, just another tool: how powerful it is depends upon the way we use it. MT quality varies enormously from one language to another. There is no “one-size-fits-all” approach possible, at least not for the moment.

There should be NO FEAR, at least for the coming generations of professional translators, of anyone being “replaced” by a computer.

It’s all about BIG DATA, so we have every interest in sharing our data with trusted partners and in boosting interinstitutional (EU) and international (UN) collaboration. This should be an easy thing to organise, and we will all benefit from it immediately.

Big data thrusts the QUALITY of our central memories into the limelight. Do we need to get a tighter grip on our “linguistic resources management”?

What do we need to MEASURE (better) in order to master our profession? Measuring quality is problematic. Benchmarking is key. Human vs. automatic evaluation needs to be considered, but more work needs to be done here (see TAUS Dynamic Quality Evaluation Framework).

Is it not all about CHANGE management? Don’t push: if the tool is OK it will sell itself. Experiment, don’t be afraid to fail, celebrate success! Take account of the staff learning curve.

The conference featured three workshops: Machine Translation Strengths and Flaws, chaired by Daniel Kluvanec (Machine Translation Business Manager), Management Considerations, chaired by Viorel Florean (Head of Romanian Translation unit) and Translators’ Considerations, chaired by Cristina De Preter (Head of the Portuguese Language Department). The main conclusions of each of these workshops are presented below.

Machine Translation Strengths and Flaws

The discussion was led and steered by a linguistically well balanced panel composed of representatives of all EU institutions involved in MT development or testing. During the course of the year, in addition to MT use by DGT, test machine translation samples had also been prepared for most of the other EU institutions. The purpose of the workshop was to assess the results, and thus gain a comprehensive picture of the performance and quality of the pre-release system MT@EC. It also sought to identify specific areas for cooperation, as well as possible synergies at interinstitutional level.

There was a general consensus, not only from DGT but also from the other institutions, that the morphological richness of languages (i.e. on a scale ranging from analytical, moderately inflected, highly inflected to agglutinated) and the differences in syntax of languages have a direct impact on the quality of MT output. Certain language pairs work better than others, something which is then reflected in the different levels of satisfaction amongst users: some (predominantly analytical target languages) are very enthusiastic, whereas others (predominantly heavily inflected target languages) are much less keen or not at all (predominantly agglutinative target languages or languages extensively using composita).

The same pattern emerges from the recent DGT customer satisfaction survey, which also points to a growing number of users. The number of ‘faithful’ users is rising too. The fact that this year’s user conference was brought forward from its intended 2013 date is a further indicator of that growth in enthusiasm. Voluntary uptake of MT in a translator’s toolbox clearly indicates its added value for the professional translator, at least for some languages and some texts in some contexts.

Subject matter of the texts to be translated often impacts on the quality of MT. A reasonable rule of thumb is simply a yes/no as to whether similar ideas (at the sub-sentence level) have already been processed in the past, which means they were present in the human translation corpus. A further factor is their frequency in corpora.

Director General Rytis Martikonis chairing the MT@work conference, the first of its kind, 30th November 2012.

In many cases, once translators have overcome their initial reticence they are pleasantly surprised by the system, and start to rely on it as an aid, i.e. for typographical support, as a source of lexical inspiration and for gisting purposes. However, it is important, before any prejudice sets in, to try using MT systematically for at least two weeks in order to get used to its strengths and flaws. Nevertheless, it is not very clear yet whether the skills translators need to work efficiently with MT are or are not transferable.

On the other hand, there still seems to be a sizeable editing distance between machine and human translation for a text of publication quality, even to the point of editing being so costly that it may be better to translate from scratch. Speedier translation, though, might require more thorough revision. Computational linguistics apparently still has a long way to go, at least for highly inflected or agglutinated languages.

The MT@work conference brought people from the translation services of the EU institutions together to discuss the challenges related to machine translation.

The general impression remains that there are many questions that technology has either not yet answered or not even addressed. This is partly due to an apparent lack of syntactical or morphological processing, with chronic insufficiencies appearing over and over again: word order taken over mechanically from the English original; inflectional morphemes mismatched; open compounds or composita not properly recognised; terminology used inconsistently, negation applied unreliably, etc. In addition, though, it would be good if computational linguists could tackle more subtle semantic problems which they could take over from other subfields of linguistics (like semantics or pragmatics).

Management Considerations

In the course of a lively debate, the participants grappled with various key aspects of the impact of using machine translation, focusing primarily on the manager’s viewpoint. Although MT holds undisputed potential value, the full picture in terms of benefits and drawbacks has yet to be clearly defined.

As far as workload and work allocation are concerned, managers should ideally be able to measure the quality of the MT output (as is the case with Euramis, where the ‘match’ is gauged as a percentage). In other words, it would be useful to have access to a metric that is as closely aligned as possible to human appreciation of text quality.

In terms of staff motivation, MT can be a double-edged sword. On the one hand, editing work linked to the use of MT can sometimes be cumbersome, and this may not be every translator’s dream, leading to a loss of motivation. On the other, the use of MT could have a liberating effect: translators otherwise wrapped up in the humdrum of translating repetitive, often tedious documents, could devote more time to translating more interesting and intellectually challenging texts.

The attitude of the manager is a key factor here. Leading by example, paired with gentle persuasion (as opposed to coercion), can go a long way. If translators give MT a try and find it useful they will naturally want to use it.

Another determinant factor is the quality of central memories. ‘Corrupting or contaminating’ Euramis with poor quality data would have a seriously negative impact, not only compromising quality output, but also having a deterrent effect on users, i.e. by undermining translators’ trust. Consequently it is crucial, before sending post-edited MT translation to the central memory, to assess whether the data are fit to be reused. This problem could be circumvented by having a separate database for texts of ‘less-than-human’ translation quality.

The potential impact of MT on the freelance market was also mentioned. There was felt to be a need to review and address questions like pricing, type of documents to be outsourced, appropriate level of post-editing, etc.

Extending the use of MT (within the Commission and to Member States or citizens) and the consequences of this extended use likewise called for careful investigation, in terms of financial gains and organisational benefits.

Translators’ Considerations

This workshop generated a lively debate about translators’ feelings towards, and acceptance/rejection of, MT, its uneven performance levels, and its advantages and disadvantages in our working environment. Translators from the DGT, the European Parliament and one panellist from the Court of Justice voiced their opinions.

The general perception was that current MT in DGT is very good for some languages, some texts and some translators. In other words, MT quality varies greatly with language combinations and types of text, and its use varies with the perceived quality and the translators’ working methods.

Feedback from translators in the workshop regarding the perceived quality of MT varied from excellent (e.g. for EN-SV in some SANCO texts and good to very good results in general for romance languages) to useless for ET or HU (EN-HU MT was humorously evoked as a kind of “drunken” Hungarian). This is much in line with the findings of DGT surveys and evaluations and confirms that MT from English into agglutinative languages is the most challenging issue for our current SMT technology. The Baltic languages seem to fare somewhat better, though the quality level is still low, with many grammar and word order issues. This is also in line with the findings. The fact is that, whatever the language family and perceived level of quality for translation, grammar in general and morphology in particular seem to be a shared problem.

Quality also varies with text types and/or the good/bad quality of the original. At the one extreme, well written and formulaic texts seem to be where MT performs best across all language combinations. At the other extreme, sloppy originals featuring new phrases are the ones where MT provides the least, if any, help to the translator.

MT can help translators with the whole sentence (with some major or minor changes), with scattered phrases or with terminology and vocabulary, or may, in some cases, be useful just for inspiration. But all this is closely tied in with the translator’s expectations, subject-matter and source language knowledge and/or working methods. Low expectations and a positive attitude towards MT seem to have played an important role in machine translation uptake in the Maltese and Polish language departments. The point was made that, even when perceived as good or very good, MT does not always save translators’ time. The time saved in typing is invested in more thorough checking and researching, resulting in a sounder text than would otherwise be the case, but not in a speedier translation.

Daniel Kluvanec, the organizer of MT@work conference.

The use of MT in professional translation at DGT is also generating fears among translators. Fear of being deprived of the choice of tools for each job and of becoming just a part of an assembly line. Fear of having to make up for the MT performance deficit in poorer-quality combinations. Fear of no longer being allowed to use one’s creativity. Finally, fear that MT will change the translation profession so much that it turns into something else entirely.

There is also the fear that the development of more difficult language combinations might be left behind or even abandoned because of an uncompetitive resources-to-language-size ratio, leaving the translators who work out of and into those languages, and the very languages themselves and their speakers, at a disadvantage. In this respect, it was pointed out that cooperation with entities active in the MT field in the countries where these languages are spoken could be mutually beneficial.
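The workshops speak of a ‘sizeable editing distance’ between machine and human translation and of the wish for a percentage-style metric comparable to the Euramis match rate. As a rough illustration of what such a number could look like, the sketch below counts the word-level insertions, deletions and substitutions needed to turn raw MT output into its post-edited version and divides by the length of the post-edited text. This is a generic Levenshtein calculation chosen for illustration; it is not the Euramis match formula nor the TAUS framework, and the function names and the sample sentences are invented.

```python
def word_edit_distance(mt_words, ref_words):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn mt_words into ref_words (Levenshtein distance)."""
    prev = list(range(len(ref_words) + 1))
    for i, m in enumerate(mt_words, 1):
        cur = [i]
        for j, r in enumerate(ref_words, 1):
            cost = 0 if m == r else 1
            cur.append(min(prev[j] + 1,          # delete a word from the MT output
                           cur[j - 1] + 1,       # insert a word from the reference
                           prev[j - 1] + cost))  # keep or substitute
        prev = cur
    return prev[-1]

def edit_rate(mt_output, post_edited):
    """Edits per word of the post-edited text; 0.0 means no editing was needed."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    if not pe_words:
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, pe_words) / len(pe_words)

# Invented example: one substitution and one insertion over five words.
RAW = "the commission adopt proposal"
EDITED = "the commission adopted the proposal"
print(edit_rate(RAW, EDITED))
```

A figure of this kind measures editing effort only. As the workshop notes, time saved in typing may be reinvested in checking and research, so a low edit rate does not by itself indicate a faster translation.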


Evaluating Machine Translation: preliminary findings from the first DGT-wide translators’ survey

by Hilário Leal Fontes¹, Chair of the Evaluation Methodology Task Force of DGT’s Machine Translation User Group (MTUG)

In setting out to measure, evaluate and manage Machine Translation
(MT), DGT’s strategy is based on four components: usage statistics,
human evaluation, automatic evaluation and regular surveys. The first
three of these have already been used for specific tasks and/or aspects and
are not discussed here.

The first DGT-wide survey of the MT service offered to its translators was organised by the MTUG and ran during November 2012, about one year and a half after DGT started to offer MT from and into English and a few other historical language combinations featuring French (52 language pairs in all).

Some Language Departments (e.g. Portuguese and Spanish) had been using rule-based MT fairly extensively up to the end of 2010, and the Portuguese Department had been using its own home-brewed Moses engines between 2010 and the first half of 2011. However, for most Language Departments, using MT was a novel experience.

A first snapshot

The introduction to the survey pointed out that translators’ views were important: to gauge their perceptions of the usefulness of current MT for their translation work and, more specifically, to draw as complete an overview as possible about:
• how much use translators make of each MT language pair;
• how highly each MT language pair is rated for translation work;
• why people currently use MT or an MT language pair;
• why people do not currently use MT or an MT language pair;
• important shortcomings that prevent people from using an MT language pair in its current state.

The survey

The survey was published online; all DGT staff were encouraged to reply, and responses were anonymous and voluntary. In preparing the survey, the Task Force built on the experience from previous partial surveys, and the fine details were discussed with colleagues from DGT’s Statistics Unit.

Technically, the survey was designed for quick completion, with many multiple-choice questions (up to three options for each question), but with plenty of free-text boxes so that respondents could make their opinions known.

The response rate was very encouraging: 763 translators, about half of DGT’s translation population, replied. It was also encouraging to see that 535 respondents had used MT in their translation work over the previous six months. Of the others, 87 had tried but not used MT, and 141 had never tried to use MT in their translation work.

How much?

Translators were asked how often they had used MT for a given language pair over the past six months. Each respondent could rate up to five language pairs that they had used (535 responses yielded 643 ratings).

The largest groups comprised trans- ings) and for 25‑50 % (87 ratings) of Why not use MT?
lators who said they had used MT for their translation jobs.
more than 75  % (200 ratings) and For people who said they had never
for up to 25 % (273 ratings) of their The following table shows the dif- used MT, the five most quoted rea-
translation jobs. Well behind, in third ferent translation engines grouped sons, in descending order, were:
and fourth places, came those who in five clusters according to the pre- • need to know more about MT
had used MT for 50‑75 % (83 rat- dominant uptake pattern: before using (58 times);
• MT quality is too poor (52 times);
• MT could induce insidious mistakes (37 times);
• I can type and translate quickly (20 times);
• MT dumbs down the language and can give rise to suboptimal work (20 times).

For those who had tried MT but had not used it in their daily work over the past six months, the five most quoted reasons, in descending order, were:
• too many changes needed (40 times);
• MT quality was too poor (38 times);
• MT disrupted working methods (21 times);
• MT was not a time-saver as terminology still needed rechecking (16 times);
• MT induces mistakes they would not otherwise have made (12 times).

Engines by predominant uptake pattern (EN->XX):
Only users with less than 25% usage: EN-DE
Only users with max. 50% usage: EN-LT
Majority of users with usage rate of 50% or LESS: EN-CS, EN-DA, EN-ET, EN-FI, EN-LV, EN-NL, EN-PL, EN-SK
Majority of users with usage rate around 50%: EN-EL, EN-FR, EN-IT, EN-MT, EN-SL
Majority of users with usage rate of 50% or MORE: EN-BG, EN-ES, EN-PT, EN-RO, EN-SV
FR->XX engines covered: FR-IT; FR-ES, FR-PT
XX->EN engines covered: BG-EN, CS-EN, DE-EN, EL-EN, ES-EN, FR-EN, IT-EN, PL-EN, PT-EN
Note: Only language combinations with five or more ratings were considered.

How good?

Translators were also asked to rate the different engines they had tried or used on a 0-4 scale; the results were (some translators rated more than one language pair; 622 responses yielded 726 ratings):
4: Most segments reusable, similar to good translation memory matches (92 ratings)
3: Most segments reusable, similar to poor translation memory matches (93 ratings)
2: Many words or partial phrases reusable with acceptable editing (368 ratings)
1: Some words or partial phrases making me aware of alternatives (118 ratings)
0: Translating from scratch outperforms any other benefit (55 ratings, mostly from translators who had tried MT but decided not to use it)

The following table sets out the different engines in three broad clusters (EN->XX):
Majority of 0, 1 and 2: EN-BG, EN-CS, EN-DE, EN-ET, EN-FI, EN-LT
Around 2: EN-FR, EN-LV, EN-NL, EN-PL, EN-SK, EN-SL
Majority of 2, 3 and 4: EN-DA, EN-EL, EN-ES, EN-IT, EN-MT, EN-PT, EN-RO, EN-SV
FR->XX engines covered: FR-ES, FR-IT, FR-PT
XX->EN engines covered: BG-EN, CS-EN, DE-EN, EL-EN, ES-EN, FR-EN, IT-EN, PL-EN, PT-EN
Note: Only language combinations with five or more ratings were considered.

Why use MT?

For people who said they had used MT over the past six months, the five most quoted reasons, in descending order, were:
• MT is a typing aid (283 times);
• MT is a source of inspiration for alternative translations available in translation memories (274 times);
• start with a quick draft and improve it (190 times);
• helps cope with heavy workloads (169 times);
• gains time for more thorough research (145 times).

Shortcomings

In terms of important shortcomings that prevent the use of current MT, problems with the grammar/morphological rules of the target language seem to be the most annoying factor (mentioned 204 times), followed by distortion of meaning (173 times), source-language word order reproduced in the target language (140 times) and terminological inconsistency (70 times).
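As a reading aid for the 0-4 rating distribution reported under "How good?", the short sketch below recomputes the total and derives two summary figures. The counts are taken from the survey; the mean score and the "2 or better" share are derived here for illustration and are not figures quoted in the article.

```python
# Rating counts reported in the survey (score -> number of ratings).
ratings = {4: 92, 3: 93, 2: 368, 1: 118, 0: 55}

# The article states that 622 responses yielded 726 ratings.
total = sum(ratings.values())

# Derived figures (not quoted in the article): average score and
# the share of ratings at level 2 ("reusable with acceptable editing") or above.
mean_score = sum(score * n for score, n in ratings.items()) / total
share_2_or_better = sum(n for score, n in ratings.items() if score >= 2) / total

print(f"total ratings: {total}")
print(f"mean score: {mean_score:.2f}")
print(f"share rated 2 or better: {share_2_or_better:.1%}")
```

The counts add up to the 726 ratings mentioned in the text, which is a useful consistency check on the list.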

1 Hilário Fontes Leal is a senior translator who has been involved in the development and dissemination of MT within DGT’s Portuguese
Department since 1999. He is also a member of DGT’s Machine Translation User Group.


Technical Challenges for Machine Translation in the European Institutions
by Andreas Eisele1, MT Project manager

After a long and interesting history, where phases of enthusiasm and exaggerated hopes alternated with deep disappointment and bitter frustration, the old dream of machine translation (MT) has finally become a reality. Today, anyone browsing the Web can have a page translated from an unfamiliar foreign language into something that should, at least in principle, be basically understandable.

Although the idea can be traced back to the 17th century and patents and prototypes of mechanical translation devices were invented as far back as the early 1930s, the big leap forward came about thanks to powerful computers and recent statistical techniques, which are able to extract the relevant knowledge from existing translations without any need for human experts to build and maintain complex systems of rules.

However, typical MT results quickly reveal that the problem of machine translation is far from being solved, and there is a huge space for further improvement in translation quality. When targeting morphologically rich and complex languages, the translations can be peppered with wrong endings and ill-formed constructions, and making sense of the MT result can be a challenge in itself.

Hence, for many important applications, sufficient quality will only be achieved when the simple statistical methods underlying the current mainstream technology are combined with linguistically augmented models of morphology and grammar. In the long run, only such «hybrid» approaches to MT will be able to do justice to the complexity and richness of our European languages.

The MT@EC project aims at making state-of-the-art technology from ongoing research available to translators and administrators in European institutions. We will mainly serve three types of users:

• Administrators who need to skim a document in a foreign language to understand what it is about: As distribution of such documents is often restricted, using a free MT service is not appropriate and our colleagues are strongly advised against it.

• Translators who support the dissemination of documents in all official EU languages: They need an MT system that is aware of the correct terminology for any given context, and this system needs to be integrated into the suite of tools used by the translation departments.

• Users of on-line services that connect administrations in
all EU countries: Such services need to support all EU languages and have to allow users in one country to enter information in their own language and make this information accessible to users in another country.

These three types of applications lead to quite a number of interesting requirements, which our project is trying to address at the same time.

In order to be useful in many circumstances, the translation engines need to cover a broad set of languages, including very complex ones, as well as many document types and subject domains. A big challenge here is to find and manage suitable training data from which the relevant lexical material can be extracted.

It should furthermore be possible to build MT engines for specific application domains where certain terminology needs to be used. The adaptation to such specific domains needs to be based on suitable samples of translated documents, without expensive manual intervention.

Obviously, the linguistic quality of our system needs to be further enhanced, especially for target languages with flexible word order and rich morphology. A number of useful techniques have been developed in recent or ongoing research in computational linguistics, and our aim is to integrate these methods into the algorithms used by MT@EC, as soon as their value in our context can be established.

Possible candidates for such enhancements include syntactic analysis of the source text (quite a challenge in itself), reordering of the constituents to match up with the preferred word order in the target language, as well as morphologic analysis and generating inflected forms with the correct linguistic features, such as number, gender, and case. Incorporating such improvements into our MT engines requires us to do a certain amount of research and in-house development to find out whether and how the methods that were successful in research experiments can be used without making our system less robust or slower.

There are also some interesting engineering challenges we need to address. Current statistical translation methods need to explore a large space (think millions!) of configurations for a single sentence before deciding which possible result would score best according to the statistical models. Even on our modern and very fast computers, this computation may take several seconds per sentence, and hence the translation of a long document may need to be distributed over multiple servers to reduce waiting times for users. We also need to develop techniques to reconcile the need to deliver translations of small snippets of text as quickly as possible while simultaneously processing batches of long documents.

Last, but not least, one of the most important applications of MT@EC is to give translators in the European institutions another useful tool that will complement the translation memory technology provided by Euramis. This requires a seamless integration of MT into the computer-aided translation tools. As Prof. Philipp Koehn describes in his contribution to this volume, several ongoing research projects are trying to find out what shape this might take in the future, and collaboration between our project and these researchers should be beneficial for both sides. Ideally, a well-integrated solution would allow the translator to improve the result of MT and these improvements would impact not only the current document, but would immediately flow back into the MT system so that it can learn from its users and improve automatically. Once such an automatic feed-back cycle is implemented, the fact that the system will be used by thousands of highly skilled linguistic experts for all the official EU languages on a daily basis would open up quite unique possibilities for our project. Perhaps then the old dream of overcoming the language barriers with the help of machines will finally come true.

1 Andreas Eisele is a computational linguist who has been working on natural language processing and MT since his studies in computer
science in 1983. Before joining the DGT, he was senior researcher at the DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz)
and Saarland University, where he was in charge of the EU projects Euromatrix and EuromatrixPlus, which provided funding for the
development of Moses and statistical and hybrid machine translation between all European languages. Since October 2010 he has been leading
the team that builds the MT engines within the MT@EC project at DGT.


Working with Translators


by Markus Foti1, DGT’s MT Quality Manager

If you have the right ingredients available - computing power, machine translation software, and, especially, loads and loads of data - setting up a bare-bones machine translation system is not overwhelmingly difficult. Given enough data to feed into the open-source (so freely available and free-of-cost) MOSES machine translation engine, you can have a basic system up and running quite quickly.

That’s one thing we are not short of at DGT: data. Euramis has been growing for close to 10 years now, and even though it wasn’t originally intended for building MT engines, it stores our translations in an ideal format for that.

The devil is in the details. Data alone doesn’t mean the output is going to be very good. We need to work with people to point out the problems and refine the systems. And highly skilled people are another thing DGT has in abundance!

When the MT@EC project was launched, volunteers from each language department were asked to come forward to set up a Machine Translation User Group (MTUG) which meets to discuss the direction of the project and pass information to and from translators.

After the first “baseline engines” were created, through the MTUG, translators from the language departments working out of English were asked to assess them. 10 departments felt the output in their languages was already good enough to be used, but that left 12 where a lot of work needed to be done. And we are not naïve enough to claim that “good enough” means there is no room for improvement!

It’s no secret that the state of the technology is such that you cannot expect perfect translations in terms of grammar, terminology or even readability. Work on improving these is on-going, and for specifics we look to those who speak the languages. But to start, there are also simpler problems that can be tackled more directly. For example, translators were expressing their frustration with having to regularly correct things such as excess spaces, wrong quotation marks (“ ” should be « » in French and „ “ in German), corrupted characters, even, in the cases of Portuguese and Maltese, old spellings mixed with new official forms.

Such problems may not be translation, strictly speaking, but they have a serious impact on how good the translations are judged to be. So we asked the language departments to identify such problems and have incorporated 35 changes in the pre or post-processing to deliver cleaner machine translations. These fixes should be included in the 3rd generation of engines, which are currently being built. We hope that the improvement will be immediately apparent!
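To give a feel for the kind of post-processing fix described above, here is a minimal sketch that collapses excess spaces and converts English-style curly quotes to the French and German forms mentioned in the text. It is an illustration only: the function name and the rule set are assumptions, not the actual 35 changes built into MT@EC, and French spacing conventions inside « » are deliberately left out.

```python
import re

# Target-language quotation marks as given in the article:
# « » for French, „ “ for German.
QUOTES = {"fr": ("«", "»"), "de": ("„", "“")}


def clean_output(text: str, lang: str) -> str:
    """Hypothetical post-processing step for raw MT output."""
    # Collapse runs of spaces or tabs and trim the ends.
    text = re.sub(r"[ \t]{2,}", " ", text).strip()
    if lang in QUOTES:
        opening, closing = QUOTES[lang]
        # Replace each “...” pair in one pass, so that the German closing
        # mark (the same character as the English opening mark) is not
        # converted a second time.
        text = re.sub(
            r"“([^”]*)”",
            lambda m: f"{opening}{m.group(1)}{closing}",
            text,
        )
    return text


if __name__ == "__main__":
    print(clean_output("Il a dit  “bonjour”.", "fr"))
    print(clean_output("Er sagte “Hallo”.", "de"))
```

Rules of this kind are cheap to apply after translation and do not require retraining an engine, which is presumably why they were tackled first.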

The rise of machine translation as a translation aid also means that working methods will evolve, and we want to hear about how translators are changing their day-to-day working methods. Post-editing is what comes to mind, but that overlooks Euramis and translation memories, which have in no way been superseded. To try to find out the different ways in which translators are using MT, best practice workshops were held in autumn 2012. Approaches varied from the standard of using MT within Translator’s Workbench, to keeping a print-out of the full machine translated version on the side to help with difficult bits while keeping easier passages free of MT influence.

The biggest change, though, was not in techniques so much as in mindset - the suggestion that MT be used as a source of ideas, rather than as a text to be corrected.

Work on improving the engines doesn’t stop there. Many of our languages have complex grammatical structures which need additional efforts.

Most recently we have been working on Finnish, one of the most challenging languages for statistical machine translation to deal with because of its heavy agglutination. Two separate approaches were taken to improve translation from English into Finnish: some grammatical analysis, and stemming vocabulary and adding endings through post-processing. In the opposite direction, some part-of-speech knowledge was applied.

Automatic metrics indicated a slight improvement in all three cases, but these experiments needed to be assessed by people. The MT team doesn’t speak all of the EU languages, so Finnish and English translators were asked to assess the results (two improvements, and one regression, if you’re curious), using an online tool that had previously been used to test the 2nd generation of engines.

As we move towards the official target date for release of MT@EC, we will be looking to translators to gauge how the translations provided by the system are progressing in more detailed ways – a post-editing tool (called PET) is being tested as a way of measuring how many changes translators make in terms of keystrokes, time and “editing distance”. The data it records is very helpful, but using the tool in our environment is not as user-friendly as we would like.

Indeed, the final goal is simply for translators to continue working as usual, while the MT team compares the final translations to what the MT system generated and aims to get this difference to be as small as possible. That is never likely to be zero, but if MT@EC becomes a standard tool to help DGT handle its steadily increasing workload and a reliable source for the EU and national institutions, we will have achieved our goal. Without the assistance and expertise offered by DGT’s translators, this could never succeed!

Figure: On-line MT evaluation tool asking which Finnish rendering is better.
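The “editing distance” between what the MT system generated and what the translator finally delivered can be illustrated with the classic Levenshtein distance, computed here over words. This is a generic textbook sketch, not a description of how PET measures changes; the example sentences are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Each insertion, deletion or substitution costs 1. Works on strings
    (character level) or on lists of words (word level).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Invented example: raw MT output versus the translator's final version.
mt_output = "the commission have adopted proposal".split()
final = "the commission has adopted the proposal".split()

print(edit_distance(mt_output, final))  # one substitution plus one insertion
```

The smaller this number is across many segments, the closer the raw output is to what translators actually deliver, which is the difference the MT team aims to shrink.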

1 After 13 years in translation, Markus Foti is currently the MT Quality Officer for DGT. He is responsible for working with translators to get
feedback on MT and working with other EU institutions to test and share progress in MT.


The insatiable appetite for data


by Szymon Klocek1, Machine Translation data administrator

As the old joke goes, the prerequisite for working in IT is to be lazy. This
should not be misconstrued as actual sloth: what this saying means is
that informatics should be about minimizing human effort and making
the machines do the heavy lifting instead. Employing computers means
trying to automate things as much as possible and this holds true for
Machine Translation.

There are basically two types of Machine Translation systems: rule-based and statistical. The fundamental difference between the two approaches becomes apparent when we analyse how to get better results. In the case of rule-based MT systems, translations get better when you employ more people to write even better rules and dictionaries. In the case of statistical MT systems what you have to do is get more data, or – to phrase it more precisely – more aligned (parallel) previously translated texts that can be chopped up into little pieces in order to be reused as source material for building new sentences.

The way the computers crunch the numbers in order to come up with the statistically most likely translation stays more or less the same. It is the quality and sheer amount of the linguistic data at their core that determines whether the resulting translation will be useful. In the light of the above, it is not surprising that statistical MT systems – including the one we are building at the IT Unit of the DGT - are often called «data-driven».

Why a data-centric MT model

The central argument in favour of the data-centric MT model, however, is that it is easier to double the amount of data fed into the machines than to write twice as many linguistic rules. Building an MT system on the basis of linguistic rules is less efficient and more expensive because it takes people, more specifically, highly capable and highly paid linguists, computational linguists and other language professionals.

But the whole issue cannot be reduced to price and effectiveness alone. Contrary to what many translators and even many researchers might think, statistical MT systems provide better quality than their rule-based counterparts. One of many reasons is that, as the number of linguistic rules builds up, they start to interfere with one another in often unpredictable ways.

The unique position of DGT

The statistical approach to Machine Translation has an obvious bottleneck: you have to have a source from which to draw the data. Moreover, the data has to come from high quality human translations that have been carefully aligned, i.e. put into pairs of corresponding original and translated segments.

All over the world, limited availability of this precious resource is the main factor slowing down progress in MT. At the DGT we are in the unique position of being the biggest translation service in the world with the world’s biggest archive of high quality human translations (the archive is known as Euramis, the Translation Memory database). The wealth of linguistic resources at our disposal and its quality is the cornerstone of our Machine Translation system.

Even more to our advantage is that, every day, thousands of DGT translators are producing even more texts we can use. All we have to do is to acquire the fruits of their labour, store them and convert them for our purposes.

But how does it work exactly?

The easiest way to imagine preparation of data for the purposes of MT is to think of a massive and sophisticated search-and-replace operation performed on text files. The files are huge. They contain tens of gigabytes of data and it often takes hours, if not days, to process them. Over the last year the Machine Translation data team has optimized acquiring data from Euramis, reducing the time needed to extract and convert them from almost a week to less than 24 hours. One of the biggest challenges is weeding out bad data, which in the context of Euramis are usually mistagged segments, i.e. segments labelled as belonging to one language, whereas in reality they are written in another. The abundance of available data allows us to have a relatively restrictive approach and aggressively filter our linguistic resources in order to improve their quality, and thus the quality of the resulting translation.

When it comes to storage, what better solution can there be than the one that has already proven so effective? For the purposes of keeping data for MT we have built a replica of Euramis. It can be synchronized overnight with its older brother, it is customized to our needs, and it is much faster, as it is not slowed down by day-to-day requests for Translation Memory contents. This way our own operations do not affect Euramis and thus do not impact the everyday work of DGT translators.

Most importantly, however, our Euramis clone offers flexibility and has room to grow. It was designed to contain additional categories of information and data from new sources outside the DGT, the Commission, perhaps even outside the European institutions.

In the near future

We believe that being able to store external data collections will prove crucial in the near future. The reason is simple: we want to make our Machine Translation system even better, and we need more linguistic data.

We are in the process of asking other European institutions for permission to use their translation memory archives. We are working on ways to draw data from the outside world. If there is one thing you should know about the Machine Translation data team, it is that we are always hungry for more parallel translations.

Figure: ‘A glance into the precious resource upon which statistical MT systems are built: the multilingual data’.
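The weeding out of mistagged segments described above can be pictured with a toy filter. Everything in this sketch is an assumption made for illustration: the tiny stop-word lists, the function names and the example segments are invented, and a real pipeline would rely on a proper language identifier rather than a handful of words. The one idea it borrows from the text is the restrictive policy: when data is abundant, a segment whose language cannot be confirmed is simply dropped.

```python
# Invented, deliberately tiny stop-word lists (illustration only).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is"},
    "fr": {"le", "la", "les", "et", "de", "des", "est"},
    "de": {"der", "die", "das", "und", "ist", "nicht"},
}


def guess_language(text):
    """Return the language whose stop words occur most often, or None."""
    words = set(text.lower().split())
    scores = {lang: len(words & stop) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def filter_segments(segments):
    """Keep only (tag, text) pairs whose tag agrees with the guess.

    Restrictive by design: segments with no recognisable language are
    discarded as well.
    """
    return [(tag, text) for tag, text in segments if guess_language(text) == tag]


segments = [
    ("en", "The decision of the Council is final"),
    ("fr", "The report is in the annex"),        # mistagged: English text
    ("fr", "Le rapport est dans les annexes"),
    ("de", "Das ist nicht der Fall"),
]
clean = filter_segments(segments)
print(len(clean), "of", len(segments), "segments kept")
```

Throwing away a few good segments along with the bad ones is acceptable only because the archive is so large, which is exactly the trade-off the text describes.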

1 Szymon Klocek, formerly with the Polish Translation Department, currently works at the IT Unit of the DGT as Machine Translation data
administrator. He is responsible for acquiring, storing and processing data for the Commission’s Machine Translation service; his
responsibilities also include managing the MT data team.

VOICES FROM OUTSIDE Machine translation

Predicting Translation
by Professor Philipp Koehn1, University of Edinburgh

Translation is unpredictable. If you give a text in a foreign language to a group of translators, you can be sure that each will come up with a different translation. That is true even if the text consists only of a single sentence with a few words. So how can computers, which are known for solving exact problems with exact solutions, expect to come up with correct translations?

The first step is to recognize the variability and uncertainty that underlies the process of mapping text in one language into another. Translating requires balancing the dual needs of preserving the text’s meaning (with most of its nuances) and also producing fluent and idiomatic output that is pleasant to read.

Being explicit about uncertainty leads to probabilistic models, thus conceiving a world where everything is possible, but some things are just more likely than others. Considering only a single word, it is easy to see that it has some probable translations, maybe even a clearly dominant one, but also a large tail of other possible translations, say, when the word is used as part of a metaphor or in a very specific context.

Statistical machine translation models try to estimate the probabilities of different translations of an input text, and predict the most probable one. How do we get all these choices and probabilities? We get a pretty good idea by analyzing millions of words of translated texts, generated by professional translators over the last decades.

Bigger, Better, Faster, More

Statistical models have become much more sophisticated in recent years. Should we rather reuse large fragments of prior translations or the reliable frequent short chunks? How much should we favor knowledge extracted from texts that are similar in domain, topic, and style? What is the proper tradeoff between adequacy (matching the source text’s meaning) and fluency (producing readable output)? How do we set all the probability values anyway?

Machine translation has become a playground for computer scientists who use it to explore novel methods in machine learning. Big data, structured prediction, deep belief networks, Bayesian inference are the buzz words around the cafeteria table of the computer science researchers who dared to venture into something as fluffy as language.

Researchers in the current EU BRIDGE project develop algorithms that are able to exploit the growing treasure chest of past translations more efficiently. They aim not only at the translation of text, but also of spoken input. They scale up their methods to utilize billions of words of translated texts, more than any human could read in a lifetime.

More Linguistics

A sentence is not just a string of words. There is a lot of structure. Developing machine translation models that are properly aware of nouns and adjectives and subject-verb agreement and maybe even semantic preferences is an open challenge to the research community. Unfortunately, adding such structure to statistical models (a form of hybrid machine translation) makes them much more complex. The search for the best translation becomes more error-prone and slower.

The basic current models, which just map sequences of words without any regard to their syntactic nature, work surprisingly well for language pairs such as French-English or Spanish-English. But translating between syntactically divergent languages such as German-English or Czech-English seems to require deeper linguistic analysis. Our experience shows significantly lower quality for these harder language pairs, even more so when translating from English. The good news is that we are already seeing some success from syntax-based statistical translation models and there are a lot of promising ideas.

Aiding Professional Translators

The most popular application of machine translation is the translation of web pages and email messages to gain some understanding of what a foreign text means, thanks to the free service of Google Translate. This use of machine translation has been integrated by many companies due to the availability of the open source Moses machine translation system. The Moses system has been developed in Europe with the aid from EU funding such as the ongoing MosesCore project.

But the costly task of producing translations that are of publishable quality still requires professional human translators for the foreseeable future. A key question for machine translation researchers is: How can vast statistical models be used to aid these translators in their work? The current mode of simply presenting a professional translator with the raw output of some machine translation engine and asking them to fix it up, is rather crude and unsurprisingly encounters resistance from professional translators, even if it often speeds up their work.

Two ongoing EU-funded research projects, CASMACAT and MATECAT, have set out to develop better computer-aided machine translation tools. One main idea is to develop machine translation systems that quickly adapt to a translator’s task, learn from their corrections, and thus never make the same mistake twice. Another idea is to change the way machine translation output is presented, by highlighting the more unreliable parts, showing alternative translations, making it more transparent how the machine produced its output, or providing a more interactive authoring and editing process.

These days, we have the onward march of machine translation research and increasing uptake of the technology in industry and government. While it would be foolish to predict perfection, we will see better and more useful translations. Most probably.
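The basic idea described earlier in this article, estimating how probable each translation of a word is from past translations and then predicting the most probable one, can be shown in miniature. The sketch below uses plain relative frequencies over a handful of invented observations; real statistical MT systems work with phrases, much larger corpora and additional models, so this is an illustration of the principle rather than of any actual system.

```python
from collections import Counter, defaultdict

# Invented toy data: (source word, translation chosen by a translator).
observations = [
    ("Haus", "house"), ("Haus", "house"), ("Haus", "house"),
    ("Haus", "home"), ("Haus", "building"),
]

# Count how often each translation was chosen for each source word.
counts = defaultdict(Counter)
for source, target in observations:
    counts[source][target] += 1


def translation_probs(source):
    """Relative-frequency estimate of p(target | source)."""
    total = sum(counts[source].values())
    return {target: n / total for target, n in counts[source].items()}


def most_probable(source):
    """Predict the translation with the highest estimated probability."""
    return counts[source].most_common(1)[0][0]


print(translation_probs("Haus"))
print(most_probable("Haus"))
```

Here one translation clearly dominates while the others form the "tail" of less likely choices, which mirrors the single-word example given in the text.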

1 Philipp Koehn is a professor at the School of Informatics of the University of Edinburgh. He wrote the textbook on statistical machine
translation, published over 80 academic papers, and co-ordinated several internationally funded research projects. He leads the development
of the open source machine translation system Moses.


Choose your own translation future


by Jaap van der Meer, Director of TAUS1 (Translation Automation User Society)

Technology arrived late in the translation services sector. But it is here, and
it is bound to change everything. In the short-term future, everyone in the
world will be able to speak his or her own language and will be understood.
We are entering the Convergence era: translation will be a utility embedded
in every app, device and screen. Businesses will prosper by finding new
customers in new markets. Governments and citizens will connect and
communicate easily. Consumers will become world-wise, communicating
as if language barriers never existed. It will open doors and break down
barriers and it will give a boost to the translation industry, improve the
technology and fill the gaps in global communications.

Is this picture too rosy? At TAUS, we believe in the power of translation data. Translation data is the fuel of machine translation technology. Data powers the engines. The engines may never emulate the human language competence, but they will be good enough to help us converse in languages we never spoke before or will ever speak.

This vision frightens many insiders in the translation industry. Machine translation was experimented with and tested for a long time, but it never passed the test of usefulness. Automation of translation was believed to be a utopia, at least until the vox populi revolution spoke and millions of people started clicking on the automatic translate button in their search pages. The quality is often bad and laughable, but people like the fact that it is under their control and in real-time.

Entering the Convergence era

As the diagram below shows, the translation industry has undergone a paradigm shift every decade since 1980, but none is as big as the Convergence era.

The volume of information content is exploding for users. While we make this journey from the 20th century export mentality to the 21st century’s open global society, the mix of expanding language pairs will mean that a human-driven translation process alone will not suffice in this new era.

As we move into the Integration era, enterprises and institutions are busy releasing the translation function from its isolated position. The focus is on integrating translation in enterprise applications such as content management systems. The pressure will keep building to translate more content faster or even translate in real-time. This opens up tremendous opportunities for innovators to seize the convergence instrument and offer new solutions. (See the Agents of Change: Insiders and Invaders videos.)

Two types of convergence

There are two forms of convergence: pure
technology convergence and a business
model convergence. Technology conver-
gence means combining two or more tech-
nologies to create a new compelling prod-
uct or service. The best example of this is
the mobile phone. It has now become a
life-saving and indispensable kind of ex-
tension to our body. Business model con-

vergence means merging market offerings Crowd, Cloud and Big Data Translation data fuels the engines and Big
to create a unique new service offer. In the Data techniques will cause breakthroughs
real world, the emergence of supermarkets Other trends that play along in the Conver- which bring us to a point where we can
was a form of convergence. In the modern gence era are Crowd, Cloud and Big Data. say that computers can sometimes—not
world, the combined offering of coffee and The Crowd is part and parcel of business always and not in all circumstances—beat
music by Starbucks is a good example of model convergence. The Cloud is the natu- humans in language processing and trans-
convergence. In the digital world business ral infrastructure environment to connect lation. So, where does that leave the entre-
model, convergence often has a give-and- with the Crowd and to reach the required preneurs in the translation industry?
take dimension: the user becomes part of scalability and efficiency. But behind the
the supply chain. Innovative examples of Crowd and the Cloud is the secret power
business model convergence are location- of Big Data—the biggest trend of all. The ‘ Machine translation
based apps. The user—often without know- computer can decipher ambiguity; under-
ing it—transmits his or her exact location stand jokes and metaphors, as long as it is
technology will mature quickly
and receives perfectly matching offers fed enough data. and take over as the primary
from a shop or restaurant. choice of tools to be used by the
The importance of Big Data should not be
underestimated. Big Data will push the translation service sector.’
Convergence in the translation performance of automated translation
industry forward and address challenges in many
different areas. The computer will be able Planning for an uncertain
Convergence in the translation industry to run automatic semantic clustering and future
has already started across technologies genre identification processes. This is vital
and business models. We have seen the for the continuous improvement and cus- In 2010 TAUS organized a series of brain-
first demonstrations of the integration of tomization of machine translation technol- storming sessions following the scenario-
speech and machine translation technol- ogy. The computer will also be able to do based planning methodology with trans-
ogy. Using tiny keys on your mobile will no terminology mining much better if it gets lation buyer and provider executives in
longer be necessary: speech input in one more data. Copenhagen and Portland (OR) with the
language, and speech output in another aim of planning for an uncertain future.
language. The best example of business The participants were uncertain about the
model convergence in the translation in- Translation support matching answers to three questions:
dustry is the combination of automatic the new content mix 1. Will machine translation take a big role
translation with search. The owners of
in the translation industry or not?
search engines decided to extend the In the Convergence era, the mix of content
service to professional translators. The to be translated is shifting further away 2. Do we have to fear that translation will
business model convergence went a step from documents and software releases become a free-for-all service?
further: for sharing the translation data to bits and pieces of text, voice and video 3. Will the closed (competitive) or the open
(translation memories) the industry pro- published on multiple screens. The end- (collaborative) business models prevail?
fessionals received customized (improved user and citizen is becoming more in con-
quality) machine translations. trol. They will drive a continuous stream Two of the three questions have been
of translation of official, social, shared, answered in the last couple of years. MT
In the next ten years we will see numer- earned and also private information. will play a major role in the translation in-
ous new examples of converging business Translation memory software fits very well dustry. Translation will not be free. Users
models and technologies. Sometimes this with the updates of static documentation always pay for translation, somewhere.
convergence will address just one language pushed by publishers but it will not be very The third question is still haunting us. We
pair, domain or market niche. Sometimes it helpful when translating dynamic content have not seen clear indicators yet whether
will be applicable on a much wider scale. pulled by users. Machine translation tech- closed models or open models will prevail
Together, this convergence is changing the nology will mature quickly and take over
translation industry completely. Transla- as the primary choice of tools to be used References, further reading and the full ver-
tion will quickly become a utility embed- by the translation service sector. The self- sion of this article (see: http://www.transla-
ded in everything we do, as ubiquitous as service real-time training of MT engines tionautomation.com/articles/choose-your-
electricity and the Internet. may be applied to every single job. own-translation-future).

1 http://www.translationautomation.com/


TAPTA4UN: machine translation collaboration between the United Nations and the World Intellectual Property Organization
by Cecilia Elizalde (Senior Reviser, Spanish Translation Service, currently Project Manager of gText, and Coordinator of the Technology Advisory Group of the Documentation Division), José García-Verdugo (Reviser and IT Focal Point, Spanish Translation Service), Ana Larrea (Senior Reviser and Training Officer, Spanish Translation Service) and José María Perazzo (Senior Reviser, Spanish Translation Service) of the Documentation Division, Department for General Assembly and Conference Management, of the United Nations Secretariat

TAPTA4UN is the name of a successful machine translation (MT) collaboration project between the World Intellectual Property Organization (WIPO) in Geneva and the United Nations Headquarters (UNHQ) in New York. In less than six months, the seemingly unfeasible idea of a customized MT system was made reality by a small team and is now an everyday tool for dozens of translators.

The origin of TAPTA4UN goes back to the Association for Information Management (ASLIB) 2011 Conference in London, where Cecilia Elizalde, representing the Documentation Division of UNHQ, attended a presentation by Christophe Mazenc and Bruno Pouliquen, of WIPO, on an MT system for patents, namely the Translation Assistant for Patent Titles and Abstracts (TAPTA). The presentation described a statistical machine translation (SMT) system using open-source Moses technology and an ad hoc, Java-based web user interface. WIPO clients were offered this method in order to ‘accelerate’ the compulsory translation of their patents into English and French.

Following this contact, an agreement was reached by the two parties. The UNHQ Spanish Translation Service would provide a bilingual (English-Spanish) corpus comprising 11 years of United Nations documentation, and the development team at WIPO would use those materials to train the existing TAPTA system in order to set up a free-text web-based machine translation system for the English-Spanish language pair. The bilingual corpus was prepared and pre-processed by the Spanish Translation Service using available HTML bitexts. The corpus had to undergo further processing before being fed into the Moses system.

A number of UNHQ translators had been requesting the purchase of a machine translation system for some time. For this reason, one of the goals of the experiment was to see how an ad hoc system would compare to the readily available MT alternatives in the market (namely Google Translate and Bing Translator) in terms of cost and quality. The system was successfully set up in less than two months with no financial expense to either organization. The performance of the resulting translator, called TAPTA4UN, was deemed to be highly satisfactory (using both automatic and human evaluation). High BLEU scores were consistently achieved using different samples. A blind human evaluation involving 1,000 segments and three reviewers was performed, during which the revisers decided to maintain most of the automated content in the translations (on a scale of 1 to 5, the adequacy score was slightly above 4, and the fluency score was 3.94).

Most users of the English-Spanish pair, particularly senior revisers, immediately showed great enthusiasm for TAPTA4UN. Senior staff were

impressed by the ability of the system to match existing terminology. In specific contexts, such as General Assembly resolutions and UN budget documentation, TAPTA4UN results were found to be particularly accurate. That, in turn, was considered a big advantage because it involved great savings in typing and reviewing. Finally, translations from the new system were generally deemed to be superior to those provided by Google Translate or Bing Translator. Since those advantages were pointed out by a significant number of senior staff, the project quickly gained momentum within the UNHQ translation services.

Building upon that success, the Department for General Assembly and Conference Management (DGACM) of UNHQ decided to continue with the experiment. In October 2012, WIPO and UNHQ collaborated to implement the system in production in cloud servers for the remaining language pairs (English to Arabic, Chinese, French and Russian). They also transferred the source code and necessary knowledge to a team of IT specialists and linguists in New York during a dedicated one-week workshop. As the testing equipment in Geneva was no longer available, the Department authorized the purchase of cloud space for the training and hosting of these new language models.

Preparations are currently under way to evaluate the resulting cloud implementations for Arabic, Chinese, French and Russian, which are already up and running. Preliminary testing by translators and revisers shows mixed results, from good quality for Chinese to unsatisfactory performance in Russian. Those results accurately reflect known linguistic issues related to those target languages.

IT staff at UNHQ have been trained by the WIPO development team in setup, training, maintenance and troubleshooting issues. There is an ongoing and close collaboration between them in order to fine-tune the existing systems according to the feedback provided by users. Improvements in the quality of the output are usually slow, since they relate directly to the active participation of users, and are highly language-dependent.

Since Moses-based SMT language models are essentially a static, immutable “picture” of the bilingual corpus, there is a need to periodically update the whole system to incorporate newly translated materials. It is expected that language models will be updated twice a year. The IT team at UNHQ is working on a streamlined and efficient method for those updates.

Besides the traditional gist translation web-based interface, TAPTA4UN has been integrated in the CAT workflow by means of a plug-in for SDL Trados Studio 2011, the standard translation memory software used at UNHQ. That plug-in has been built using SDL’s application programming interface (API) by Michal Ziemski, a developer at UNHQ who is currently in charge of the general setup and maintenance of TAPTA4UN. Once installed, the plug-in allows users to add TAPTA4UN as a “language provider” to their current translation project, along with their translation memories and terminology databases. The plug-in automatically chooses the language combination and shows MT output when the translation memory contains no matches with scores above 80%. The TAPTA4UN plug-in for Trados Studio is currently in active development and users are being requested to submit their experiences and comments on the performance of this software. There are also plans to create a full-document translation interface for TAPTA4UN.

Since both the gist translation interface and the plug-in are fully web-based, translators are able to use the system from any place in the world, as long as they have an Internet connection. UN offices in Geneva, Nairobi, Santiago and Vienna have been using TAPTA4UN seamlessly for several months.

The Department for General Assembly and Conference Management (DGACM) of the UN has recently launched a global project called gText to implement a set of integrated language tools in four duty stations. Those tools are being developed at the United Nations Office in Vienna. The project is now in its initial stage and TAPTA4UN is already being integrated with Mercury, the centrepiece of gText. Mercury will be the end-user part of gText and will integrate automated referencing, bitexts, full-text search, the UN terminology portal, CAT and MT, in a web-based interface that is user friendly and accessible to all UN translators, including contractors.
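
The rule the article gives for the Trados Studio plug-in (MT output is shown when the translation memory contains no matches with scores above 80%) can be illustrated with a minimal Python sketch. This is not the plug-in's code and does not use SDL's API: the TMMatch structure, the choose_suggestion function and the threshold parameter are hypothetical names introduced only for illustration, and showing the best translation-memory match whenever one exceeds the threshold is an assumption about what happens in the other case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TMMatch:
    """A translation-memory hit (hypothetical structure, for illustration only)."""
    target: str    # stored translation of a similar source segment
    score: float   # fuzzy-match score in percent, from 0 to 100


def choose_suggestion(tm_matches: List[TMMatch], mt_output: str,
                      threshold: float = 80.0) -> Tuple[str, str]:
    """Return (origin, text) for the suggestion to display for one segment.

    MT output is used only when no TM match scores strictly above the
    threshold, mirroring the 80% rule described in the article.
    """
    above = [m for m in tm_matches if m.score > threshold]
    if above:
        # Assumption: the highest-scoring TM match is the one shown.
        best = max(above, key=lambda m: m.score)
        return ("TM", best.target)
    return ("MT", mt_output)


if __name__ == "__main__":
    matches = [TMMatch("La Asamblea General,", 92.0), TMMatch("La Asamblea", 71.0)]
    print(choose_suggestion(matches, "La Asamblea General (MT)"))
    print(choose_suggestion([], "Texto propuesto por el sistema de MT"))
```

Under these assumptions, a match at exactly 80% does not count as "above 80%", so the MT output would still be shown in that case.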

INTERVIEW Machine translation

Interview with Rytis Martikonis, Director-General of DGT

Machine translation: a tool to embrace and master

DGT recently organised a conference on machine translation, MT@Work, the first of its kind. What was the aim of the conference and why organise it now?

Machine translation, our most advanced working tool, has become an essential part of our everyday life. This is why we had been thinking of organising an event dedicated to the topic, and there was no better time to do it than now, as in 2012 we started testing the machine translation system we are developing in DGT. In 2013 it will be extended to the rest of the Commission. It was a great pleasure to see the idea of an event dedicated to this project take shape. The conference brought people from the translation services of the EU institutions together to discuss the challenges related to machine translation and the possibilities and opportunities it offers. The aim was not to preach to people about machine translation; rather, it was a meeting organised by practitioners for practitioners.

This was the first of three conferences we intend to host each year from now on. In 2013, we will explore the needs of other users (non-translators) in the Commission and other EU institutions. In 2014, we will focus on the needs of the national authorities of EU Member States, to show the importance DGT gives to machine translation and cooperation.

We can indeed say that DGT and the European Commission attach a lot of importance to machine translation. Could you give any particular reason for this?

As the EU’s largest translation service, DGT is a hub of experience, data and technology, a giant laboratory for language technologies. Euramis, the translation memory database which stores the data used to build our statistical machine translation engines, was designed in the 1990s and now contains about half a billion sentences.

The European Commission has invested a lot in this technology, notably in its programmes on research, technological development and innovation. In 2010, DGT decided to develop a data-based system instead of a rule-based one in a bid to keep pace with technological progress and cater for more languages.

Our MT@EC project is supported as a policy objective of the European Commission and financed by public funds. We are working on it in cooperation with other Commission Directorates-General, namely DIGIT and CNECT. It is one of our flagship initiatives, linked to the objectives of the European Digital Agenda, which inspires initiatives such as e-health, e-justice and the Internal Market Information system. Such new initiatives could create a substantial demand for translation that machine translation might help meet. That is why we are investing in the system in cooperation with other DGs.

Since your appointment as Director-General of DGT you have actively supported machine translation. What do you see as its added value for DGT internally?

Firstly, developing our own machine translation system is a way of exploiting our linguistic expertise and of building on our corpora. Secondly, it ensures that we remain independent

and retain ownership of intellectual property, thus helping us to avoid or mitigate the problems we had in the past.

Thirdly, we can see now that machine translation is a fact of everyday life in the translation world, including the institutional translation world. In my view, machine translation is a tool for translators to embrace, to use and to master. As experts, we in DGT are well placed to do that: to show the possibilities it offers, but also to show its limitations. We need to show that the translation services of the European institutions are able to keep up with progress in this area.

However, let me stress that I believe translation is and will remain a human activity. Machines will not be translating for machines. They will be there to help us.

Last but not least, in the present context of staff cuts and reductions in the EU’s administrative expenditure, along with the expected increase in translation demand, we are obliged to do better with less. Machine translation can help us cope with resource constraints and meet new types of demand, such as the kinds of translation required by the new services linked to the Digital Agenda I mentioned earlier.

The MT@Work Conference was an interinstitutional event. What is the role of interinstitutional cooperation in this area?

Machine translation has already been defined as one of the priorities for cooperation by the Interinstitutional Committee on Translation and Interpretation (ICTI), where nine EU institutions come together to tackle issues of common interest. One purpose of the MT@Work Conference was therefore to provide a forum where translators from the European institutions could meet and discuss how machine translation can offer added value for their work.

And what is DGT’s role in this area at the international level?

One reason for our international presence in this area is the dialogue with other international organisations that have similar needs and constraints. This was reflected at the conference in a very good presentation given by Spanish translators from the United Nations’ translation service in New York. We are also present at other fora, such as JIAMCATT (International Annual Meeting on Computer-Assisted Translation and Terminology). Josep Bonet, Head of our Informatics Unit, is Chair of its Liaison Committee this year.

Coming back to DGT, what use is actually made of machine translation in-house?

We now have 52 language pairs. A recent survey showed that about one third of DGT translators use machine translation in some or all of their translation jobs. For me, this is also something to work on in the future: to see how we use it, how we manage it and how each translator builds machine translation into his or her working method. We have an MT User Group, chaired by the recently appointed MT Adviser, Daniel Kluvanec, which maps best practices and evaluates the process.

The new CAT tool will bring about a big change this year. From what I understand, machine translation will be more user-friendly as a result, because it will be built into the new tool.

A recent translators’ satisfaction survey on the use of machine translation in DGT showed a wide range of perceptions: some find it useful, others don’t. What could explain this diversity of opinions?

It depends on the complexity of the language. For languages with relatively similar structures, such as English, French, Spanish, Portuguese, Italian and even Bulgarian, machine translation produces good results. However, there is still a lot to do for Estonian, Hungarian and German. Sometimes it is a generational issue: older translators might prefer different working methods.

What steps do you plan on taking in the near future?

We want the new MT@EC system to cover all the official EU languages at least. We want it to be a system the EU institutions will fully own, and one that will get the most out of internal resources and of course ensure confidentiality.

Since July 2011, our translators and IT staff, staff in other Directorates-General (MOVE, EAC, RTD, ELARG, HR, SG, COMP, CNECT, MARKT, OP, TAXUD and JRC) and in some other EU institutions have been testing the new system. We plan to make it available to all the Commission DGs by mid-2013. Although DGT’s needs come first, we are working on developing customised MT engines for other Commission DGs and EU institutions. We are also thinking of making it available to the public bodies of EU Member States.

To sum up, DGT is more than willing to cooperate with other organisations or institutions and is keen to do so at all levels.

USEFUL INFORMATION Machine translation

John Hutchins: the recognised historian of Machine Translation
John Hutchins has been widely recognised as the historian of Machine Translation (MT) for decades and his website is a repository of both historical and up-to-date information on the evolution of MT. It also offers a Compendium of Translation Software, a comprehensive list of current software, commercial products and vendors.

Research in the field of Language Technologies


The European Commission has been supporting research in the field of Language Technologies since the 1980s, in particular through its Framework Programmes on Research and Technological Development and its Innovation Programme¹, as well as through initiatives of the Joint Research Centre².

Also, some Member States support applications in this field, for example, Apertium, an open-source rule-based system which is funded by the Spanish Government.

In addition, there are open-source applications, such as the Moses for Mere Mortals scripts, which make MT easier to install and use and therefore more accessible to a larger number of users.

1 http://cordis.europa.eu/fp7/ict/language-technologies/projects_en.html
2 http://langtech.jrc.it/JRC_Resources.html

HIGHLIGHTS
From the websites of some EU-funded MT research projects:

Machine translation is a complex field and presents substantial barriers for entry to potential researchers, and to potential users of the technologies. The principal aim of the MosesCore project is to reduce these barriers, making it easier to join and participate in the MT research community, and easier to become an MT user. Through the coordination of freely available shared software and data, and by organising appropriate networking events, the research and adoption of MT in Europe will be stimulated. The result of the project will be the broader use of Machine Translation systems, better quality of the Moses MT engine and the organisation of events and promotion of open-source software for machine translation, notably the Moses statistical MT toolkit.

Based on insights gained in the cognitive studies, the project will develop novel types of assistance to human translators (interactive
translation prediction, interactive editing, adaptive translation models) and integrate them into a new workbench, consisting of an editor,
a server, and analysis and visualization tools. The workbench will be designed in a modular fashion and can be combined with existing
computer aided translation tools. The outcome of the CASMACAT project will be made available as open-source software to industry,
academia, and to individual end-users.

Recent achievements by the so-called statistical MT approach have raised new expectations in the translation industry. So far, statistical MT has focused on providing ready-to-use translations, rather than outputs that minimize the effort of a human translator. The MateCat project aims at pushing what can be considered the new frontier of CAT technology: how to effectively integrate statistical MT within the translation workflow. All prototypes will be released as open-source.


From websites of some other EU-funded research projects, initiatives and networks:

Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation. It addresses the widely recognized bottleneck of insufficient parallel corpora for data-driven MT systems.

Development of automatic transcription and translation technology that will permit the development of innovative multimedia captioning and translation services of audiovisual documents between European and non-European languages.

Aims at developing a common vision of the area and fostering a European strategy for consolidating the sector, thus enhancing competitiveness at EU level and worldwide. Through the exploitation of new collaborative modalities as well as workshops and meetings, FLaReNet will sustain international cooperation and (re)create a wide Language community.

Our mission is to increase confidence in machine translation. Therefore we make the online translation service accessible on the internet free of charge, 24 hours a day.

With LetsMT you can easily build and run your own custom machine translation systems. Simply upload your own corpora and/or choose to use any of the publicly available corpora. Train your systems and use them for all your translation needs.

The Language Interoperability Portfolio Project is a collaborative project developing an open, vendor-independent format that can be used by many different translation tools to package translation materials.

A Network of Excellence consisting of 57 research centres from 33 countries, dedicated to building the technological foundations of a multilingual European information society.

This initiative is concerned with standards and best practices that support the creation, localization and use of multilingual web-based information. Through W3C workshops open to the public and various communication channels, it spreads information about what standards and best practices currently exist, and what gaps need to be filled.

Uses JRC-developed technology to automatically generate daily news summaries, allowing users to see the major news stories (news clusters) in various languages for any specific day, to compare how the same events have been reported in the media written in different languages, and to see the list of most mentioned names and find further automatically derived information.

Open Data portals are one aspect in facilitating access to and re-use of public sector information. Citizens and businesses sometimes find it difficult to identify what type of information exists and by which public authority it is held. A number of countries, regions and municipalities have therefore created portal websites on public data.

It is about the creation of a factory of Language Resources (LRs) in the form of a production line that automates all of the steps involved in the acquisition, production, updating and maintenance of the LRs required by Machine Translation and by other Language Technologies.

The Patent Language Translation Online Project addresses the problem of the increased need for translation given the current Intellectual Property (IP) landscape. Despite attempts to unify the patent system in Europe, there is still significant translation burden when seeking patent validation. Additionally, increased patenting activity across the world, particularly in Asia, means that more and more prior art exists in a “foreign” language, creating a further need for translation by patent searchers and examiners.
By Maria José Machado, PT translator

QUOTATION

‘The future author is one who discovers that language, the exploration and manipulation of the resources of language, will serve him in winning through to his way.’

Thornton Wilder (1897–1975), U.S. novelist, dramatist. Interview in Writers at Work, First Series, ed. Malcolm Cowley (1958).

COORDINATION AND DRAFTING: Anabela Pereira, (+32) 229 65728, DGT.02

GRAPHIC, LAYOUT AND TYPESETTER: Philippe Marchetto, (+352) 4301 36337
