You are on page 1of 2

TOPIC: YOUR IDEA ABOUT TRANSLATION FROM YOUR NATIVE

LANGUAGE
Author: Carlos Erlan Olival Lima Student ID: M175114

Subject: Japanese Life Today

If we want to automatically analyze existent


human sentence translations, we need to
use statistical machine translation for that.

There are several models that can be used


to translate a sentence from one language
to another one. Then, how do we know if
one model “works better” than another?
One way to pit language models against
each other is to gather up a bunch of
previously unseen English test data, then
ask: What is the model with a better
probability?

One of the ways of get a better translation is


to place a binary tree diagram on top of the
sentence showing the syntactic relationship
between heads and modifiers, for example,
subject/verb, adjective/noun, prepositional-
phrase/verb-phrase, etc. The figure 2 shows
an example of a binary tree diagram. Figure 2 – Binary tree diagram

In my case, I am Brazilian and my native


language is Portuguese, and one of the best
methods to make the translation of the
sentences in Portuguese into English it
would be the EM algorithm. EM algorithm is
an iterative method to find maximum
likelihood or maximum estimates of
parameters in statistical models.

EM algorithm stands for “estimation-


Figure 1 – Symbol of a binary tree diagram maximization.” In fact, some people claim it
stands for “expectation-maximization.” This
model involves latent variables, unknown
parameters and know data observations.

We give a probability value for each


sentence, this way we can compute
alignment probabilities for each pair of
sentences. That means we can collect
fractional counts. When we normalize the
fractional counts, we have a revised set of
parameter values. Hopefully these will be
better, as they take into account correlation
data in the bilingual corpus. Armed with
these better parameter values, we can
again compute (new) alignment probabilities.
And from these, we can get a set of even-
more-revised parameter values. If we do
this over and over, we may converge on
some good values.

Let's see how this works in practice. We'll


illustrate things with a very simplified
scenario. Our corpus will contain two
sentence pairs. The first sentence pair is
bc/xy (a two-word English sentence “b c”
paired with a two-word Portuguese
sentence “x y”). The second sentence pair
is b/y (each sentence contains only one
word).

In that case, the sentence pair bc/xy has


only two alignments:

You might also like