
Peter Sipes Typology

Pronouns
Abstract
Languages throughout the world encode much information in their pronouns. There are many commonalities: all languages in the sample have first and second person as well as singular and plural number. There is also a stunning array of diversity: languages can mark dual and paucal number, gender, inclusivity and formality, to name a few common distinctions.

Sampling problem
The sample set for this paper has thirty-three languages. Thirty of them are drawn from twenty-two or twenty-three language families, depending on whether Japanese and Turkish are related to each other. One language is classed as an isolate. The remaining two are creoles, both of which draw influence from Romance languages. This distribution presents two major problems with the data set. The first is a problem of language relations, and the second is one of geographic distribution.

Of the thirty-three languages, eight (24%) come from two language families. Five of the thirty-three are Afroasiatic languages (Maltese, Arabic, Hebrew, Ngas and Tamazight Berber), although this fact is somewhat hidden by calling Hebrew and Maltese Semitic. In fact, one could argue that Maltese is the dialect of Arabic spoken in Malta after a few centuries of contact with Italian and, more recently, English (also one of the data points). If one does make this argument, Arabic could be considered to have two entries under different names. The same problem exists with the three Austronesian languages (Bouma Fijian, Indonesian and Tinrin), and is likewise obscured by calling Tinrin a Melanesian language. It would have been better to have each family represented by one language, or to use some other numerically balanced scheme.

Another problem is the geographical distribution of the languages. Even if Hebrew and Arabic were not related, they are spoken in the same area of the world. There are bilingual speakers of the two languages, and intense language contact can have effects: for example, the Balkan sprachbund. Furthermore, Arabic is spoken across a wide geographic area, which could bring it into contact with other languages on the fringes of its native range (I hesitate to say exactly which languages, though Tamazight Berber and Turkish are candidates). It has also been in contact for centuries with Swahili as a trade language and with (minimally) Turkish and Indonesian as a liturgical language.

By far the worst problem of the sample set is the presence of English. Its distribution as a native language is global and overlaps with six (18%) of the languages on the list: Garrwa in Australia; Aleut, Chinook, Apache, Tonkawa and Shoshone in anglophone North America. Additionally, English is an official, though not necessarily native, language where another five to seven are spoken: Daga and Yimas (if on the Papua New Guinea side of the island), Nama (South Africa), Swahili (Kenya and Tanzania), Ngas (Nigeria), Tamil (India) and Maltese (Malta). English has also been in heavy contact with many languages of the world through its high numbers of second-language speakers in places as diverse as Japan and Turkey.

Given the wide spread of languages like Arabic and English, it could be difficult to weed out their influence on other members of the list. One way to do that would be to balance the number of languages from each region. To reiterate, the sample could have been improved by balancing it with regard to both language families and geography. That said, pronouns are a closed class in many languages, so the data set is good enough for some rough-and-ready analysis.
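The family imbalance described above, and the one-language-per-family remedy, can be sketched in a few lines of Python. This is only an illustration over a subset of the sample: the Afroasiatic and Austronesian memberships are the ones named in the text, the other four family labels are annotations added for the sketch, and the rule for choosing a representative (alphabetically first) is arbitrary.

```python
from collections import defaultdict

# Family labels for a subset of the sample. The Afroasiatic and Austronesian
# memberships come from the text; the remaining labels are annotations added
# for this sketch only.
FAMILY = {
    "Maltese": "Afroasiatic",
    "Arabic": "Afroasiatic",
    "Hebrew": "Afroasiatic",
    "Ngas": "Afroasiatic",
    "Tamazight Berber": "Afroasiatic",
    "Bouma Fijian": "Austronesian",
    "Indonesian": "Austronesian",
    "Tinrin": "Austronesian",
    "English": "Indo-European",
    "Finnish": "Uralic",
    "Tamil": "Dravidian",
    "Georgian": "Kartvelian",
}


def group_by_family(family_of):
    """Map each family to the list of its languages."""
    groups = defaultdict(list)
    for language, family in family_of.items():
        groups[family].append(language)
    return dict(groups)


def one_per_family(family_of):
    """Keep one language per family (alphabetically first, an arbitrary rule)."""
    groups = group_by_family(family_of)
    return {family: min(langs) for family, langs in groups.items()}


groups = group_by_family(FAMILY)
overrepresented = {f: len(ls) for f, ls in groups.items() if len(ls) > 1}
balanced = one_per_family(FAMILY)

print(overrepresented)  # families contributing more than one language
print(balanced)         # one representative per family
```

Any other selection rule (random choice, best-documented language) would slot into `one_per_family` in place of `min`.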

Pronoun typology
Across languages, pronouns show a stunning diversity of encoded information. To make the information easier to digest, it can be found in spreadsheet form at http://bit.ly/SP1YxH. I will also break the discussion into several sections. Where there is no feature, I leave the cell blank. Where a feature is present at all, I mark it X. Where there would seem to be a logical gap that is not filled, I mark the cell with a separate symbol. Unless otherwise stated, that symbol will be treated as a blank.

Simplicity of structure
The simplest typical arrangement of pronouns is to have one pronoun each for first, second and third person in both singular and plural (a total of six), with no other sort of marking or alternate forms. Five languages (15%) in the data set (Georgian, Papiamentu, Haitian, Daga and Finnish) share this arrangement. If languages that have alternate forms with no apparent explanation are added, Swahili and Crow join the list to make seven of the thirty-three languages (21%). What is interesting about these languages is that two are creoles (Haitian and Papiamentu), one is a lingua franca (Swahili) and two are spoken in areas of high language diversity (Georgian and Daga). This would seem to suggest that creoles and languages used by higher proportions of non-native speakers tend toward simpler pronoun systems. (Of course this also ignores the existence of case.) Finnish would seem to be the outlier here, though it is well known for having many cases. Since case is somewhat ignored in the data set, I would suggest that the apparent simplicity of Finnish in the data is an illusion caused by missing case data.

Person
Languages like to distinguish between the speaker and hearer. In fact, all languages in the data set do so. Not all languages have a third person. Without getting into major discussion (that belongs to other sections, and it can be hard to disentangle person from the other aspects of pronouns), if there is a breakdown (e.g. missing case in Shoshone) or an addition (e.g. gender marking in English), it tends to happen on the third person. Having a fourth person is highly marked (6%).
Person: 1 (33) = 2 (33) > 3 (30) > 4 (2)
All languages distinguish between first and second person.
Virtually all languages also distinguish third person (91%).
If a pronoun is somehow deficient in case, that deficiency is in the 3rd person (7% of languages that mark 3rd person: Garrwa and Shoshone).
Fourth person is rare and highly marked (6%).

Number
All thirty-three languages (100%) make some sort of distinction for number: singular and plural.
Twelve of them (36%) make a dual distinction at some point in their structure, and two of the languages that mark dual also mark paucal (6% of the total, 17% of dual-marking languages). Though it is not grammatical number per se, one language (3%), Apache, makes a distinction for distributivity in its basic pronoun set.

It is possible that there is some sample trouble in this matter, as five of the dual-marking languages are spoken somewhere in North America (Aleut, Apache, Chinook, Shoshone and Tonkawa). It is possible that they form a chain of dual-marking languages. On the other hand, it would take quite a just-so story to give languages as widely separated as Apache (SW United States) and Aleut (Alaska) contact-based interference.
Number: singular (33) = plural (33) > dual (12) > paucal (2) > distributive (1)
Languages always mark singular vs. plural.
There is never paucal without dual.
The ratio of paucal-marking languages to dual-marking languages is about half that of dual-marking languages to all languages within the data set.
If a language marks dual or paucal, second person is always so marked.
If a language marks dual or paucal, it is highly likely that first person is so marked (92% of dual markers, 100% of paucal markers).
If a language marks dual or paucal, it is quite likely that 3rd person is so marked (83% of dual markers, 50% of paucal markers).
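A generalization such as "never paucal without dual" is an implication that can be checked mechanically against a feature table. The following is a minimal sketch, assuming each language is stored as a set of number features. The four "Lang" entries are hypothetical and are not rows from the data set; only the three counts at the bottom come from the text.

```python
def violations(features_by_language, if_feature, then_feature):
    """Return the languages that have if_feature but lack then_feature."""
    return sorted(
        lang
        for lang, feats in features_by_language.items()
        if if_feature in feats and then_feature not in feats
    )


# Hypothetical languages for illustration, not rows from the data set.
toy = {
    "Lang A": {"singular", "plural"},
    "Lang B": {"singular", "plural", "dual"},
    "Lang C": {"singular", "plural", "dual", "paucal"},
    "Lang D": {"singular", "plural", "paucal"},  # would break the generalization
}

print(violations(toy, "paucal", "dual"))  # ['Lang D']

# Counts reported in the text: 33 languages, 12 mark dual, 2 mark paucal.
TOTAL, DUAL, PAUCAL = 33, 12, 2
dual_share = DUAL / TOTAL        # dual markers among all languages
paucal_of_dual = PAUCAL / DUAL   # paucal markers among dual markers
print(f"{dual_share:.0%} {paucal_of_dual:.0%}")  # 36% 17%
```

An empty list from `violations` on the real spreadsheet would confirm the implication for the sample; it says nothing about languages outside it.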

Gender marking
Marking for gender on pronouns is a somewhat marked phenomenon, but it presents a special problem of bias within the data set. Ten of the thirty-three languages in the data set have any sort of gender marking (30%); half of those ten are Afroasiatic. Gender is sufficiently marked that languages that do mark it on pronouns tend not to mark it on all pronouns. It is also interesting to note that marking for gender that matches sex (masculine for male and feminine for female) is typical. English stands out as the only language in the data set that marks the neuter gender at any point in its pronoun system.
Gender marking by gender: None (23) > f (10) > m (9) > n (1)

Gender marking by person and number
            1st person   2nd person   3rd person
Singular    2            5            10
Dual        1            2            2
Plural      1            5            5
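How much one family weighs on a count like the third person singular figure can be worked out with a little arithmetic. This sketch uses the numbers reported in this section (ten of thirty-three languages mark gender on third person, and all five Afroasiatic languages in the sample are among them); the function name and its parameters are made up for the illustration.

```python
def consolidate(total, markers, family_size, family_markers):
    """Collapse one family to a single entry and recount.

    The collapsed family counts as one marker if any member marks the feature.
    Returns (markers_after, total_after).
    """
    total_after = total - family_size + 1
    markers_after = markers - family_markers + (1 if family_markers > 0 else 0)
    return markers_after, total_after


before = 10 / 33
markers, total = consolidate(total=33, markers=10, family_size=5, family_markers=5)

print(f"{before:.0%}")                             # 30%
print(markers, total, f"{markers / total:.0%}")    # 6 29 21%
```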

However, this whole discussion is riddled with the sampling problem mentioned in the first part of this document. Neither of the two languages that mark gender on first person is Afroasiatic (0%). Four of the five languages that mark gender on second person are Afroasiatic (80%). Of the languages that mark gender on third person in any way, five of the ten are Afroasiatic (50%). Within the third person, Afroasiatic accounts for 60% of gender-marking languages in the plural, 50% in the dual and 50% in the singular. That said, the trend is clear. Afroasiatic languages, at least within the data set, tend to mark for gender, especially on second person. As mentioned in the person section, third person gets the marking when marking is not uniform. Consolidating the Afroasiatic languages to one entry, six of the twenty-nine languages put some sort of gender marking on 3rd person pronouns (21%). While lower than the initially proposed 30%, this is of approximately the same frequency.
If a language marks for gender, 3rd person (singular or plural) is where it will be marked.
If a language marks gender on 2nd person, it also marks it on 3rd person.
If a language marks gender on 1st person, it also marks it on 3rd person.
If a language marks gender on the plural for any given person, it also marks it on the singular of the same person.
If a language marks for gender, it is not likely to mark for exclusivity (2 of 10).

Inclusivity
Marking for inclusivity/exclusivity on a pronoun is as marked as marking for gender. Ten of the thirty-three languages mark exclusivity (30%). An eleventh, Japanese, also marks for inclusivity, but in an unusual way; as a result, Japanese will be ignored for much of this section. For the sake of simplicity, unless otherwise noted, inclusivity will stand for any sort of inclusivity/exclusivity marking.

One problem that cropped up in the data set is that first person singular was not consistently treated when a language marked inclusivity in first person plural. As a result, some languages had a form assigned to 1s.ex. The problem, of course, is that first person singular is neither inclusive nor exclusive, on account of its number. Kahlna alone marks any singular pronoun for inclusivity and exclusivity. Likewise, languages disprefer marking inclusivity on third person. Tinrin alone marks for inclusivity on third person pronouns.

Japanese is an odd case. In the first person plural, there is a form marked as in-group. While this is not strictly a marker of inclusivity, it would seem to suggest that there is some awareness among speakers of in-group and out-group. Given its presence in first person plural, I wonder if it might not be somewhat inclusive in nature. To be conservative, though, Japanese is not included in any of the counts in this section.

Inclusivity/exclusivity marking by person and number
            1st person   2nd person   3rd person
Singular    1            1            0
Dual        5            1            1
Plural      9            2            1

If a language marks for inclusivity, it is quite likely to mark it in first person plural (90%).
If a language marks for inclusivity on dual pronouns, it also marks it on plural pronouns.
If a language marks for exclusivity, it is not likely to mark for gender too (2 of 10).

Formality/politeness/relational marking
Six of the thirty-three languages (18%) make any sort of formality/politeness/relational distinction. For the sake of brevity, I will refer to this throughout as formality. When a language makes some sort of formality distinction, second person singular is marked. It is likely (67%) that first person singular will be marked as well. It is possible that the other four pronoun types will be marked, but that is much more unusual (33% or less). Interestingly, four of the six languages making formality distinctions are neighbors in the Asian Pacific area: Mandarin, Japanese, Vietnamese and Indonesian. Additionally, Japanese and Turkish may be related, so that five of the six formality-marking languages have some sort of connection, which again demonstrates the need for proper sample selection.
Formality marking: 2s (6) > 1s (4) > 3s (2) = 3p (2) > 1p (1) = 2p (1)

Grab bag features
This is an informal grouping of features present on the third person pronoun of Lyele. In addition to a generic third person pronoun for singular and plural, Lyele also marks for mass, human, augment and dimin on the third person singular. This multiplication of features fits in nicely with the earlier discussion of person: if something weird is going to happen outside of formality and inclusivity marking, it will happen on the third person.
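The ranked summaries used throughout this section, such as the formality ranking 2s (6) > 1s (4) > 3s (2) = 3p (2) > 1p (1) = 2p (1), can be generated directly from counts, which avoids transcription slips when the spreadsheet changes. A minimal sketch, assuming the counts are held in a dictionary; the function name `hierarchy` is invented for this illustration.

```python
from itertools import groupby


def hierarchy(counts):
    """Render counts as 'a (n) = b (n) > c (m)', highest count first.

    Categories with equal counts are joined by '=' and keep their input order.
    """
    # sorted() is stable, so ties stay in the order they were given.
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    tiers = []
    for _, tier in groupby(ordered, key=lambda item: item[1]):
        tiers.append(" = ".join(f"{name} ({n})" for name, n in tier))
    return " > ".join(tiers)


# Counts as reported in the text.
formality = {"2s": 6, "1s": 4, "3s": 2, "3p": 2, "1p": 1, "2p": 1}
person = {"1": 33, "2": 33, "3": 30, "4": 2}

print("Formality marking:", hierarchy(formality))
print("Person:", hierarchy(person))
```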

Conclusions
Languages use pronouns in a variety of ways. They all distinguish person and number: in particular, speaker and hearer, and singular and plural. While there is quite a degree of diversity, languages tend to mark inclusivity on first person and formality on second person. Third person is, to a degree, the wild card of pronouns. If something marked is going to happen on the pronoun beyond inclusivity or formality, it happens on the third person; this is the case for eleven of the thirty-three languages (33%). In those cases, it is particularly likely to be gender marking, missing cases, or missing pronouns.
