
Spotting Translationese: An Empirical Approach

Pau Giménez Flores
Supervisors: Carme Colominas and Toni Badia
Universitat Pompeu Fabra

Content
1. Translationese
2. Goals
3. Translation Universals
4. Empirical Methods in Translation Studies
5. Theoretical Framework
6. Hypotheses
7. Methodology
8. Working Plan
9. Commented Bibliography

Translationese
A product of the incompetence of the translator (translation errors):
unusual distribution of features is clearly a result of the translator's inexperience or lack of competence in the target language (Baker, 1998: 248)

Translation-specific language or dialect, without any negative connotations (translation universals):


Third code which arises out of the bilateral consideration of the matrix and target codes: it is, in a sense, a sub-code of each of the codes involved (Frawley, 1984: 168).

Translationese: set of linguistic features of translated texts which are different both from the source language and the target language (Gellerstam, 1986).

Goals
Main goal: validating the hypothesis of translationese empirically.
Capturing the linguistic properties of translationese in observable and refutable facts.
Automatically detecting and classifying translated vs. non-translated texts based on their syntactic and lexical properties.

Translation Universals (1)


Features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems (Baker, 1993: 243)

Translation Universals (2)


Explicitation or explicitness: translations tend to be more explicit than source texts.
Repetition of redundant grammatical items (e.g. prepositions).
The optional that-connective is more frequent in reported speech in translated English (Olohan and Baker, 2000).
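
The that-connective observation can be approximated with a simple count. The sketch below is only an illustration and not the procedure used by Olohan and Baker: it assumes a hypothetical list of forms of the verb "say" and measures how often they are immediately followed by "that". It cannot tell the connective from the demonstrative ("He said that."), so a real study would need tagged or parsed data.

```python
import re

# Hypothetical, minimal list of reporting-verb forms (assumption: only "say").
REPORTING = r"(?:say|says|said|saying)"

def that_rate(text):
    """Share of reporting-verb occurrences immediately followed by 'that'."""
    verbs = re.findall(rf"\b{REPORTING}\b", text, flags=re.IGNORECASE)
    with_that = re.findall(rf"\b{REPORTING}\s+that\b", text, flags=re.IGNORECASE)
    return len(with_that) / len(verbs) if verbs else 0.0

# Invented toy sentences, not corpus data.
sample = ("She said that the design was bold. "
          "He said the plan failed. "
          "They say that it works.")
rate = that_rate(sample)
print(f"that-rate: {rate:.2f}")
```

Comparing this rate between a translated and a non-translated corpus would give a first, rough indication of explicitation.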

Translation Universals (3)


Simplification: the language of translations is assumed to be lexically and syntactically simpler than that of non-translated target language texts.
Narrower range of vocabulary: lower type-token ratio.
Lower level of information load: lower lexical density.
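
Both simplification measures are easy to compute. The following is a minimal sketch, assuming a naive regular-expression tokenizer for English and a hypothetical, very short function-word list; a real study would rely on a tagger or a full list. Note that the type-token ratio depends on text length, so it should only be compared across samples of equal size.

```python
import re

# Hypothetical, deliberately tiny list of English function words (assumption).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "or", "is", "are",
    "was", "were", "it", "that", "this", "with", "by", "for", "as", "at",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def lexical_density(tokens):
    """Share of tokens that are content (non-function) words."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

# Invented toy sentence, not corpus data.
sample = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)
density = lexical_density(tokens)
print(f"TTR = {ttr:.3f}, lexical density = {density:.3f}")
```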

Translation Universals (4)


Normalization: exaggeration of typical features of the target language. Translations tend to be more unmarked and conventional, less creative, more conservative.
Conventionalization of metaphors and idioms.
Dialectal and colloquial expressions are less frequent.
Lexical choice of standard translation (Gellerstam, 1986).

Translation Universals (5)


Interference from the source text and language (Toury, 1995; Mauranen, 2000). It can occur at the morphological, lexical and syntactic levels, among others.
Unique items hypothesis (Tirkkonen-Condit, 2002): translated texts manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents. (Simplification? Normalization?)

Translation Universals (6)


However,
The as yet relatively small amount of research into potential translation universals has produced contradictory results, which seems to suggest that a search for real, unrestricted universals in the field of translation might turn out to be unsuccessful.
Puurtinen (2003: 403)

Empirical Methods in TS (1)


Laviosa-Braithwaite (1996): study of the linguistic nature of English translated text in a subsection of the English Comparable Corpus (ECC).
Øverås (1998): investigation of explicitation in translational English and translational Norwegian.
Olohan and Baker (2000): testing of the explicitation hypothesis based on the omission and inclusion of the reporting that in translational and original English.

Empirical Methods in TS (2)


Borin and Prütz (2001): comparison of original newspaper articles in British and American English with articles translated from Swedish into English, using POS n-grams.
Puurtinen (2003): research into potential features of translationese in a corpus of Finnish translations of children's books.

Empirical Methods in TS (3)


Baroni and Bernardini (2006): application of supervised machine learning techniques (SVMs) to detect translationese on two monolingual corpora of translated and original Italian texts.

Empirical Methods in TS (4)


Rayson et al. (2008): a descriptive study of translationese comparing the frequencies of keywords, key word classes (POS) and key semantic tags in original Chinese, translated English and edited translated English corpora.
Tirkkonen-Condit (2002): Translationese - a myth or an empirical fact? Human translators could not reliably identify whether a text was translated or not.

Theoretical Framework
Crossroads of Corpus Linguistics, Translation Studies and Computational Linguistics
Empirical research in which corpora are the main source of data and of hypotheses (Laviosa-Braithwaite, 1996; Olohan and Baker, 2000, etc.).
It tries to validate the existence of translationese and to define the linguistic properties of translated language as a product (Gellerstam, 1986; Baker, 1993, etc.).
Use of Computational Linguistics techniques such as information extraction and machine learning algorithms (Kindermann et al., 2003; Baroni and Bernardini, 2006).

Hypotheses
1. Translationese exists and it is observable across languages.
2. This fact can be demonstrated with empirical methods applied to corpora in different languages.

Methodology (1)
Preliminary Study
Two monolingual comparable corpora of original and translated Catalan texts on art and architecture, 300,000 tokens each.

Corpus Building
Corpus compilation.
Tokenization, tagging and parsing with CatCG (Alsina, Badia et al. 2002).

Corpus Exploitation
Exploitation with Wordsmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists).
Implementation of scripts to extract collocations and POS n-grams with Python and NLTK.
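
As an illustration of the POS n-gram extraction step, here is a minimal sketch that uses only the Python standard library so that it runs without a tagger model. It assumes the sentences are already POS-tagged; the tagset and the toy sentences are hypothetical and are not CatCG output. NLTK's `nltk.util.ngrams` offers an equivalent n-gram helper.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Return all contiguous n-grams of POS tags as tuples."""
    return list(zip(*(tags[i:] for i in range(n))))

def relative_freqs(tagged_sentences, n):
    """Relative frequency of each POS n-gram over tagged sentences.

    N-grams are collected within sentences and never cross a boundary.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        counts.update(pos_ngrams(tags, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

# Hand-tagged toy input with an invented tagset (assumption, not corpus data).
tagged = [
    [("In", "PREP"), ("the", "DET"), ("museum", "NOUN"),
     ("hangs", "VERB"), ("a", "DET"), ("painting", "NOUN")],
    [("The", "DET"), ("architect", "NOUN"), ("designed", "VERB"),
     ("the", "DET"), ("facade", "NOUN")],
]
bigrams = relative_freqs(tagged, 2)
for gram, freq in sorted(bigrams.items(), key=lambda kv: -kv[1]):
    print(gram, round(freq, 3))
```

The resulting relative frequencies can serve directly as features when comparing the translated and original corpora.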

Implementation of a Machine Learning System


Machine learning techniques (SVMs) to automatically classify texts as translated or non-translated.
Training on one part of the corpus and testing on another (Weka software).
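
To make the classification idea concrete, here is a minimal sketch of a linear SVM in plain Python. It is not the Weka setup planned above: it minimises the L2-regularised hinge loss with a short run of batch subgradient descent, and the two-dimensional feature vectors (imagined as relative frequencies of two POS n-grams) are invented toy values. A real experiment would use a tested SVM implementation and far more features.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=300):
    """Batch subgradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: labels, +1 (translated) or -1 (original).
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # example lies inside the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - eta * g for wj, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (translated) or -1 (original) for feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Invented toy data: [freq of n-gram A, freq of n-gram B] per text.
X = [[0.30, 0.10], [0.28, 0.12], [0.32, 0.09],   # "translated"
     [0.10, 0.30], [0.12, 0.28], [0.09, 0.31]]   # "original"
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(predict(w, b, [0.29, 0.11]), predict(w, b, [0.11, 0.29]))
```

The held-out vectors in the last line stand in for the test portion of the corpus; accuracy on such unseen texts is what the learnability argument relies on.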

Methodology (2)
Main experiment
Corpus Building
Corpus compilation (Spanish, French, English, German).
Tokenization, tagging and parsing.

Corpus Exploitation
Exploitation with Wordsmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists).
Implementation of scripts to extract collocations and POS n-grams with Python and NLTK.

Implementation of a Machine Learning System


Machine learning techniques (SVMs) to automatically classify texts as translated or non-translated.
Training on one part of the corpus and testing on another (Weka software).

Working Plan

Commented Bibliography (1)


Baker, M. (1995). Corpora in Translation Studies: An Overview and Some Suggestions for Future Research. Target 7, 2: 223-243.
Definition of a new type of corpora: monolingual comparable corpora in order to effect a shift away from comparing either ST with TT or language A with language B to comparing text production per se with translation. Type-token ratio, lexical density measures.

Borin, L. and Prütz, K. (2001). Through a Glass Darkly: Part-of-speech Distribution in Original and Translated Text, in Computational Linguistics in the Netherlands 2000, 30-44.
Comparison of POS n-grams in order to determine whether there are significant syntactic differences between original and translated language. Overuse in translated English of preposition-initial sentences and sentence-initial adverbs.

Commented Bibliography (2)


Kindermann et al. (2003). Authorship attribution with support vector machines. Applied Intelligence 19, 109-123.
Different statistical techniques for authorship attribution are described: the log-likelihood ratio statistic, naive Bayesian probabilistic classifiers, multi-layer perceptrons, k-nearest neighbour classification (kNN), Support Vector Machines (SVMs), etc. SVMs achieve better results than other classifiers in authorship attribution: they are fast and accept a large number of features as input.

Commented Bibliography (3)


Baroni, M. and Bernardini, S. (2006). A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text. Literary and Linguistic Computing 21(3): 259-274.
A new explicit criterion to prove the existence of translationese: learnability by a machine. SVMs allow the use of a large number of features. The application of SVMs achieves better results than professional human translators. Their results show that translations are recognizable on purely grammatical/syntactic grounds (function word distributions and shallow syntactic patterns).

Commented Bibliography (4)


Tirkkonen-Condit, S. (2002). Translationese - a Myth or an Empirical Fact? Target 14(2): 207-20.
The hypothesis of translationese is, at the least, controversial, whereas the unique items hypothesis may better describe the translated or non-translated nature of a text. Translated texts manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents.
