Pau Giménez Flores
Supervisors: Carme Colominas and Toni Badia
Universitat Pompeu Fabra
Content
1. Translationese
2. Goals
3. Translation Universals
4. Empirical Methods in Translation Studies
5. Theoretical Framework
6. Hypotheses
7. Methodology
8. Working Plan
9. Commented Bibliography
Translationese
A product of the translator's incompetence (translation errors):

"unusual distribution of features is clearly a result of the translator's inexperience or lack of competence in the target language" (Baker, 1998: 248)
Translationese: set of linguistic features of translated texts which are different both from the source language and the target language (Gellerstam, 1986).
Goals
Main goal: validating the hypothesis of translationese empirically.
Capturing the linguistic properties of translationese in observable and refutable facts.
Automatically detecting and classifying translated vs. non-translated texts based on their syntactic and lexical properties.
Theoretical Framework
Crossroad of Corpus Linguistics, Translation Studies and Computational Linguistics
It is empirical research where corpora are the main source of data and of hypotheses (Laviosa-Braithwaite, 1996; Olohan and Baker, 2000, etc.).
It tries to validate the existence of translationese and to define the linguistic properties of translated language as a product (Gellerstam, 1986; Baker, 1993, etc.).
Use of Computational Linguistics techniques such as information extraction and machine learning algorithms (Kindermann et al., 2003; Baroni and Bernardini, 2006).
Hypotheses
1. Translationese exists and is observable across languages.
2. This can be demonstrated with empirical methods applied to corpora in different languages.
Methodology (1)
Preliminary Study
Two monolingual comparable corpora of original and translated Catalan texts on art and architecture, 300,000 tokens each.

Corpus Building
Corpus compilation
Tokenization, tagging and parsing with CatCG (Alsina, Badia et al., 2002)
Corpus Exploitation
Exploitation with WordSmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists)
Implementation of scripts to extract collocations and POS n-grams with Python and NLTK
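The collocation and POS n-gram extraction step above can be sketched with NLTK. This is a minimal illustration, not the project's actual scripts: the tiny tagged sample stands in for real CatCG output, and the tag names are assumed simplified labels.

```python
# Minimal sketch: collocation and POS n-gram extraction with NLTK.
# Assumes the corpus is already tokenized and POS-tagged; the sample
# below is hypothetical data standing in for real tagger output.
from collections import Counter

from nltk import bigrams
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical tagged data: list of (token, POS) pairs.
tagged = [("the", "DT"), ("old", "JJ"), ("house", "NN"),
          ("of", "IN"), ("the", "DT"), ("architect", "NN")]

tokens = [tok for tok, _ in tagged]
tags = [tag for _, tag in tagged]

# POS bigram frequencies: a surface profile of the text's syntax.
pos_bigrams = Counter(bigrams(tags))

# Word collocations ranked by pointwise mutual information.
finder = BigramCollocationFinder.from_words(tokens)
collocations = finder.nbest(BigramAssocMeasures.pmi, 3)

print(pos_bigrams.most_common(2))
print(collocations)
```

On a real corpus the same POS bigram counts, tabulated separately for the original and the translated subcorpus, are what the comparison in the main experiment operates on.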
Methodology (2)
Main experiment
Corpus Building
Corpus compilation (Spanish, French, English, German)
Tokenization, tagging and parsing
Corpus Exploitation
Exploitation with WordSmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists)
Implementation of scripts to extract collocations and POS n-grams with Python and NLTK
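Two of the measures listed above, type-token ratio and lexical density, are simple enough to sketch directly. The content-word tag set below is an assumption (simplified tags), and the tagged sample is invented for illustration.

```python
# Minimal sketch: type-token ratio and lexical density from a tagged text.
# Hypothetical (token, POS) pairs; the tag set is a simplified assumption.
tagged = [("the", "DT"), ("translated", "JJ"), ("text", "NN"),
          ("shows", "VB"), ("the", "DT"), ("lower", "JJ"),
          ("type", "NN"), ("token", "NN"), ("ratio", "NN")]

tokens = [tok.lower() for tok, _ in tagged]

# Type-token ratio: distinct word forms over running words.
ttr = len(set(tokens)) / len(tokens)

# Lexical density: content words (nouns, verbs, adjectives, adverbs)
# as a share of all words.
CONTENT_TAGS = {"NN", "VB", "JJ", "RB"}
density = sum(1 for _, tag in tagged if tag in CONTENT_TAGS) / len(tagged)

print(round(ttr, 2), round(density, 2))
```

Lower type-token ratio and lower lexical density in the translated subcorpus would be consistent with the simplification universal the study tests.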
Working Plan
Commented Bibliography

Borin, L. and Prütz, K. (2001). "Through a Glass Darkly: Part-of-speech Distribution in Original and Translated Text", in Computational Linguistics in the Netherlands 2000, 30-44.
Comparison of POS n-grams in order to determine whether there are significant syntactic differences between original and translated language. Finds overuse in translated English of preposition-initial sentences and sentence-initial adverbs.
Statistical techniques for authorship attribution are described: the log-likelihood ratio statistic, naïve Bayesian probabilistic classifiers, multi-layer perceptrons, k-nearest neighbour classification (kNN), Support Vector Machines (SVMs), etc. SVMs achieve better results than the other classifiers in authorship attribution: they are fast and allow a great number of features as input.
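The classification goal of the thesis, separating translated from original text with an SVM over POS n-gram features, can be sketched as below. This assumes scikit-learn, which the proposal does not name; the toy POS-tag "documents" and labels are invented for illustration.

```python
# Sketch: translated-vs-original classification with a linear SVM over
# POS bigram counts. Assumes scikit-learn; data is hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical POS-tag sequences, one per document chunk.
docs = ["DT JJ NN IN DT NN", "IN DT NN DT NN VB",
        "DT NN VB DT JJ NN", "IN NN DT VB JJ RB"]
labels = ["original", "translated", "original", "translated"]

# POS bigram counts as features: the kind of surface syntactic
# evidence the POS n-gram studies above rely on.
vectorizer = CountVectorizer(ngram_range=(2, 2), token_pattern=r"\S+")
X = vectorizer.fit_transform(docs)

clf = LinearSVC()
clf.fit(X, labels)

prediction = clf.predict(vectorizer.transform(["IN DT NN VB DT NN"]))
print(prediction[0])
```

In the real experiment each "document" would be a tagged chunk of the comparable corpus, and accuracy would be estimated with held-out data rather than read off the training set.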