Spotting Translationese (Pau Giménez Thesis Proposal)

Spotting Translationese: An Empirical Approach
Pau Giménez Flores
Supervisors: Carme Colominas and Toni Badia
Universitat Pompeu Fabra
Contents
1. Translationese
2. Goals
3. Translation Universals
4. Empirical Methods in Translation Studies
5. Theoretical Framework
6. Hypotheses
7. Methodology
8. Working Plan
9. Commented Bibliography
Translationese
A product of the incompetence of the translator (translation errors):
unusual distribution of features is clearly a result of the translator's inexperience or lack of competence in the target language (Baker, 1998: 248)
Translation-specific language or dialect, without any negative connotations (translation universals):

Third code which arises out of the bilateral consideration of the matrix and target codes: it is, in a sense, a sub-code of each of the codes involved (Frawley, 1984: 168).
Translationese: set of linguistic features of translated texts which are different both from the source language and the target language (Gellerstam, 1986).
Goals
Main goal: validating the hypothesis of translationese empirically.
Capturing the linguistic properties of translationese in observable and refutable facts.
Automatically detecting and classifying translated vs. non-translated texts based on their syntactic and lexical properties.
Translation Universals (1)

Features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems (Baker, 1993: 243)

Explicitation or explicitness: translations tend to be more explicit than source texts.
Repetition of redundant grammatical items (e.g. prepositions).
The optional that-connective is more frequent in reported speech in translated English (Olohan and Baker, 2000).

Simplification: the language of translations is assumed to be lexically and syntactically simpler than that of non-translated target language texts.
Narrower range of vocabulary: lower type-token ratio.
Lower level of information load: lower lexical density.
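The two simplification measures named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function-word list is a tiny hypothetical stand-in (a real study would rely on POS tags to separate content words from function words), and the sample sentence is invented. Type-token ratio is sensitive to text length, so it is only comparable across samples of similar size.

```python
import re

# Hypothetical, deliberately tiny function-word list (assumption for this sketch).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "or",
    "that", "is", "was", "it", "by", "for", "with",
}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are content words (not in the function-word list)."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)

# Invented sample sentence, for illustration only.
sample = "The translator said that the text was the text of the author"
tokens = tokenize(sample)
print(round(type_token_ratio(tokens), 3))
print(round(lexical_density(tokens), 3))
```

Under the simplification hypothesis, both values would be expected to be lower for the translated corpus than for the comparable original corpus.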
Translation Universals (4)

Normalization: exaggeration of typical features of the target language. Translations tend to be more unmarked and conventional, less creative, more conservative.
Conventionalization of metaphors and idioms.
Dialectal and colloquial expressions less frequent.
Lexical choice of standard translation (Gellerstam, 1986).

Interference from the source text and language (Toury, 1995; Mauranen, 2000). It can occur at the morphological, lexical and syntactic levels, among others.
Unique items hypothesis (Tirkkonen-Condit, 2002): translated texts manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents (Simplification, Normalization?)

However,
The as yet relatively small amount of research into potential translation universals has produced contradictory results, which seems to suggest that a search for real, unrestricted universals in the field of translation might turn out to be unsuccessful.
Puurtinen (2003: 403)
Empirical Methods in TS (1)

Laviosa-Braithwaite (1996): study of the linguistic nature of English translated text in a subsection of the English Comparable Corpus (ECC).
Øverås (1998): investigation of explicitation in translational English and translational Norwegian.
Olohan and Baker (2000): testing of the explicitation hypothesis based on the omission and inclusion of the reporting that in translational and original English.

Borin and Prütz (2001): comparison, by means of POS n-grams, of original newspaper articles in British and American English with articles translated from Swedish into English.
Puurtinen (2003): research on potential features of translationese in a corpus of Finnish translations of children's books.

Baroni and Bernardini (2006): application of supervised machine learning techniques (SVMs) to detect translationese on two monolingual corpora of translated and original Italian texts.

Rayson et al. (2008): a descriptive study of translationese comparing the frequencies of keywords, key word classes (POS) and key semantic tags in original Chinese, translated English and edited translated English corpora.
Tirkkonen-Condit (2002): Translationese - a myth or an empirical fact? Human translators were not good at identifying whether a text was translated or not.
Theoretical Framework
Crossroads of Corpus Linguistics, Translation Studies and Computational Linguistics
It is empirical research in which corpora are the main source of data and of hypotheses (Laviosa-Braithwaite, 1996; Olohan and Baker, 2000, etc.)
It tries to validate the existence of translationese and to define the linguistic properties of translated language as a product (Gellerstam, 1986; Baker, 1993, etc.)
Use of Computational Linguistics techniques such as information extraction and machine learning algorithms (Kindermann et al., 2003; Baroni and Bernardini, 2006)
Hypotheses
1. Translationese exists and it is observable across languages.
2. This fact can be demonstrated with empirical methods applied to corpora in different languages.
Methodology (1)
Preliminary Study
Two monolingual comparable corpora of original and translated Catalan texts on art and architecture, 300,000 tokens each.
Corpus Building
Corpus compilation
Tokenization, tagging and parsing with CatCG (Alsina, Badia et al. 2002)
Corpus Exploitation
Exploitation with WordSmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists)
Implementation of scripts to extract collocations and POS n-grams with Python and NLTK
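The POS n-gram extraction step could look roughly like the sketch below. It uses only the Python standard library so that it stays self-contained (NLTK's `nltk.util.ngrams` offers comparable n-gram generation). The tag names, the `<s>` sentence-start marker and the two toy corpora are assumptions made for illustration, not the output of CatCG.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Return all n-grams (as tuples) over a sequence of POS tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def relative_frequencies(tagged_sentences, n):
    """Relative frequency of each POS n-gram in a list of tagged sentences.

    Each sentence is a list of (word, tag) pairs. A "<s>" marker is prepended
    so that sentence-initial patterns can be counted as well.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = ["<s>"] + [tag for _word, tag in sentence]
        counts.update(pos_ngrams(tags, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

# Invented toy data with hypothetical tags, for illustration only.
original = [[("The", "DET"), ("house", "NOUN"), ("stands", "VERB")]]
translated = [
    [("In", "ADP"), ("the", "DET"), ("house", "NOUN")],
    [("However", "ADV"), ("it", "PRON"), ("stands", "VERB")],
]

orig_freqs = relative_frequencies(original, 2)
trans_freqs = relative_frequencies(translated, 2)
# Compare how often a sentence starts with a preposition in each corpus.
print(orig_freqs.get(("<s>", "ADP"), 0.0), trans_freqs.get(("<s>", "ADP"), 0.0))
```

Relative frequencies of this kind can be compared across the two corpora directly, or passed on as features to a classifier.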
Implementation of a Machine Learning System

Machine Learning techniques (SVMs) in order to automatically classify texts as translated or non-translated.
Training on a subset of the corpus and testing (Weka software).
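To make the classification idea concrete, here is a minimal sketch of a linear SVM trained with a Pegasos-style stochastic subgradient method. It is not the Weka setup named in the proposal: it is a hand-written stand-in in plain Python, and the two-dimensional feature vectors (imagined as standardized scores for two linguistic features) and the labels are invented for illustration.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM by stochastic subgradient descent on the hinge loss.

    X: list of feature vectors; y: labels, +1 (translated) or -1 (original).
    Returns the weight vector, with the bias term as its last component.
    """
    rng = random.Random(seed)
    # Append a constant 1.0 to every vector so the bias is learned as a weight.
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (regularization) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and move towards examples that violate the margin.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 (translated) or -1 (original) for a feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Invented, standardized toy features; +1 = translated, -1 = original.
X_train = [[1.0, -1.0], [1.2, -0.8], [0.8, -1.1],
           [-1.0, 1.0], [-1.1, 0.9], [-0.9, 1.2]]
y_train = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
print(predict(w, [1.1, -0.9]), predict(w, [-1.0, 1.1]))
```

In a real experiment the held-out test portion of the corpus, not the training texts, would be used to estimate accuracy.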
Methodology (2)
Main experiment
Corpus Building
Corpus compilation (Spanish, French, English, German)
Tokenization, tagging and parsing
Corpus Exploitation
Exploitation with WordSmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists)
Implementation of scripts to extract collocations and POS n-grams with Python and NLTK
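For the collocation part of this step, one common option is to rank adjacent word pairs by pointwise mutual information (PMI). The proposal does not say which association measure the scripts use, so PMI is an assumption here, as are the `min_count` threshold and the invented token sequence.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Pairs seen fewer than min_count times are skipped, since PMI
    overrates rare events. Returns (pair, score) tuples, best first.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented token sequence, for illustration only.
text = "gothic cathedral in the old town and the gothic cathedral near the old town"
ranked = bigram_pmi(text.split())
for pair, score in ranked:
    print(pair, round(score, 2))
```

The same function can be run separately on each language's translated and original corpora so that the resulting collocation lists can be compared.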
Implementation of a Machine Learning System

Machine Learning techniques (SVMs) in order to automatically classify texts as translated or non-translated.
Training on a subset of the corpus and testing (Weka software).
Working Plan
Commented Bibliography (1)

Baker, M. (1995). Corpora in Translation Studies: An Overview and Some Suggestions for Future Research. Target 7, 2: 223-243.
Definition of a new type of corpora: monolingual comparable corpora in order to effect a shift away from comparing either ST with TT or language A with language B to comparing text production per se with translation. Type-token ratio, lexical density measures.
Borin, L. and Prütz, K. (2001). Through a Glass Darkly: Part-of-speech Distribution in Original and Translated Text, in Computational Linguistics in the Netherlands 2000, 30-44.
Comparison of POS n-grams in order to determine whether there are significant syntactic differences between original and translated language. Overuse in translated English of preposition-initial sentences and sentence-initial adverbs.

Kindermann et al. (2003). Authorship attribution with support vector machines. Applied Intelligence 19, 109-123.
Different statistical techniques for authorship attribution are described: the log-likelihood ratio statistic, naive Bayesian probabilistic classifiers, multi-layer perceptrons, k-nearest neighbour classification (kNN), Support Vector Machines (SVMs), etc. SVMs achieve better results than other classifiers in author attribution: they are fast and allow a great number of features as input.

Baroni, M. and Bernardini, S. (2006). A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text. Literary and Linguistic Computing 21(3): 259-274.
A new explicit criterion to prove the existence of translationese: learnability by a machine. SVMs allow the use of a large number of features. The application of SVMs achieves better results than professional human translators. Their results show that translations are recognizable on purely grammatical/syntactic grounds (function word distribution and shallow syntactic patterns).

Tirkkonen-Condit, S. (2002). Translationese - a Myth or an Empirical Fact? Target 14(2): 207-220.
The hypothesis of translationese is, at least, controversial, whereas the unique items hypothesis can describe in a better way the translated or non-translated nature of a text. Translated texts manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents.
