TOPIC: YOUR IDEA ABOUT TRANSLATION FROM YOUR NATIVE
LANGUAGE Author: Carlos Erlan Olival Lima Student ID: M175114
Subject: Japanese Life Today
If we want to automatically analyze existent
human sentence translations, we need to use statistical machine translation for that.
There are several models that can be used
to translate a sentence from one language to another one. Then, how do we know if one model “works better” than another? One way to pit language models against each other is to gather up a bunch of previously unseen English test data, then ask: What is the model with a better probability?
One of the ways of get a better translation is
to place a binary tree diagram on top of the sentence showing the syntactic relationship between heads and modifiers, for example, subject/verb, adjective/noun, prepositional- phrase/verb-phrase, etc. The figure 2 shows an example of a binary tree diagram. Figure 2 – Binary tree diagram
In my case, I am Brazilian and my native
language is Portuguese, and one of the best methods to make the translation of the sentences in Portuguese into English it would be the EM algorithm. EM algorithm is an iterative method to find maximum likelihood or maximum estimates of parameters in statistical models.
EM algorithm stands for “estimation-
Figure 1 – Symbol of a binary tree diagram maximization.” In fact, some people claim it stands for “expectation-maximization.” This model involves latent variables, unknown parameters and know data observations.
We give a probability value for each
sentence, this way we can compute alignment probabilities for each pair of sentences. That means we can collect fractional counts. When we normalize the fractional counts, we have a revised set of parameter values. Hopefully these will be better, as they take into account correlation data in the bilingual corpus. Armed with these better parameter values, we can again compute (new) alignment probabilities. And from these, we can get a set of even- more-revised parameter values. If we do this over and over, we may converge on some good values.
Let's see how this works in practice. We'll
illustrate things with a very simplified scenario. Our corpus will contain two sentence pairs. The first sentence pair is bc/xy (a two-word English sentence “b c” paired with a two-word Portuguese sentence “x y”). The second sentence pair is b/y (each sentence contains only one word).