Sprachen, Karten und Sprachkarten

Travelling words of the Neolithic

The spread of people does not necessarily imply language expansion, but – as I probably wrote before – I am still waiting to see a real-world example of when this does not happen. In fact, it is even easier for a language to spread in the absence of migrations, or with only a few people involved. That is why the Neolithic is such a compelling case for the farming/language dispersal hypothesis, but the exact language [family] that could have made its way to Europe from the Near East is still a controversial subject (although I have already made clear that, from my point of view, Indo-European is definitely not a candidate). I have written enough about all that, but one interesting piece of evidence remains to be examined: Wanderwörter.

Wanderers from the New World

When a new product is adopted for which there is no precedent in one’s culture and language, it is natural that a word for it is also borrowed from the language of the “donors”. The clearest example is the adoption by Europeans, after 1492, of a number of products native to the Americas. Maize, which became one of the most successful native crops grown outside of the New World, had its name borrowed from Spanish maíz, which is, in turn, derived from Taino mahis (from Proto-Arawak *mariki). The Taino were the first groups encountered by Columbus, and a number of such words derive from their vocabulary (tobacco and potato, two extremely popular items in Europe nowadays, are examples of that). More specialised products originating in the Americas have also been incorporated together with their native names: from the Aztecs, the Nahuatl word chocolatl gave birth to English chocolate and so many similar words across Europe (avocado and tomato are also Nahuatl words). The name of the plant from which chocolate is made, cacao, ultimately derives from Maya kakaw.

Not everything is straightforward borrowing. Sometimes, instead of adopting a foreign word, existing words for a similar thing have their meaning shifted to designate a new product. Corn (with a solid Indo-European etymology, cf. Latin granum) is another word for maize, and was originally used to designate any small grain cereal. My favourite example is the English word pineapple. Somehow, this combination of “pine” and “apple” was thought sufficient to describe the exotic fruit, rather than adopting the Tupi word, as the French did (ananas).

The reader will excuse the fact that my examples above are not really Wanderwörter in the strict sense of the term. Wanderwörter are words so widely diffused, in so many similar forms, that it is virtually impossible to know where they originated. In the aforementioned cases, the direction of borrowing is very clear. Let us go back to the Near East, the problem of the Neolithic language[s], and that old acquaintance of this blog, where most things started: eme-ĝir a.k.a. Sumerian.

Did it all begin in Sumeria?

Recently, an interesting paper came to my attention. Published by Blažek and Boisson in 1992, the article tracks a series of terms supposedly diffused from Sumerian to languages as far east as Indonesia. All the terms pertain to the Neolithic way of life and include tools and verbs related to farming. There are a few convincing cases, such as niĝ₂-ĝal₂ “sickle”. This word appears in Akkadian as ingallu (or nigallu), which might at first sight appear to be a direct borrowing from Sumerian, except that we have the Arabic root njl meaning specifically “to reap cereals”, which might indicate that this word existed in Proto-Semitic. It could even be traced back to Proto-Afro-Asiatic, based on forms like Proto-West Chadic *nVgal-at- “sickle” (or this could just attest a very widespread wandering). Blažek and Boisson cite further potential cognates, like Sanskrit laṅgala “plough”, with no obvious Indo-European etymology.

The whole point of the article is that widespread borrowings like the previous one would originate ultimately from Sumerian. In the case of “sickle”, that is supported by the fact that the first element, niĝ₂, means “thing” in Sumerian and is quite a productive prefix in that language (although we could be dealing with reanalysis of a borrowed word). The main difficulty with such a hypothesis is that Sumerian (which started to be recorded ca. 3200 BC) is too late to correspond to the initial spread of the Neolithic, and is therefore an unlikely candidate for the source language of “Neolithic Wanderwörter”. And let us not forget that Sumerian might actually be a Bronze Age newcomer to southern Mesopotamia, as suggested by its purported substratum.

ngr

If cases such as the above are really due to borrowing and not just coincidence, then I find it more compelling to believe that Sumerian (wherever it originated) borrowed its Neolithic vocabulary from the same source as the other languages did. In fact, the purported substratum in the Sumerian language appears in words related to basic agricultural activities, not to professions of a highly specialised urban society (e.g. “scribe”, dub-sar, which has a perfect Sumerian etymology). Also, there are multiple layers of borrowing. One of my favourite examples is the word for “carpenter” (see the figure above). This is clearly shared between Sumerian and Afro-Asiatic, appearing nowadays in the very common Arabic surname najjar, but to reconstruct it at the level of Proto-Afro-Asiatic is to presume the existence of specialised professionals during the Neolithic, which is unlikely (though not impossible). Most probably, this word was diffused during the Bronze Age.

Of wheat, barley and bread

Just as in the case of the transmission to European languages of indigenous names for cultivated plants that only existed in the New World, I believe a productive approach would be to track the potential diffusion of plant vocabulary outside of the “core” Neolithic area. This is a broad region that extends from the Levant, through southern Anatolia, to northern Mesopotamia. There is mounting genetic evidence for the domestication of einkorn and emmer wheat near the Karacadağ mountains of Turkey, but for cases like barley, the scenario is more complicated, possibly with multiple domestication events across the Near East. Interestingly, the areas of wild distribution of the progenitors of the Neolithic founder crops overlap in an area between southeastern Anatolia and northern Mesopotamia (shaded in dark red in the map further below).

sum_agr-01 — Some words for cultivated plants in Sumerian.

If the Near East was indeed a linguistic mosaic during Neolithic times, it was the inhabitants of that particular region who must have given their neighbours (if such borrowings really happened) the words for cultivated plants. We will look in vain in the earliest recorded languages (Egyptian and Sumerian) for a source of such words – they are too far in space and time from the Neolithic epicentre – but small clues to early borrowings may be hidden in their vocabulary. In fact, we can identify an extremely widespread word in both of them.

egypt_agr-01 — Some words for cultivated plants in Egyptian and Coptic.

It appears in Sumerian as še “barley; grain” and in Egyptian as swt “wheat” (which, in fact, could be a plural from st, the final t being a feminine ending). I am convinced that this is the same root as Proto-Afro-Asiatic *ŝi/uʕ(Vʕ)- and, surely derived from that, *ŝVʕVm- (Orel & Stolbova give *soʕ- and *siʕüm-) meaning “a cereal”. This is reconstructed based on attestations like the Cushitic *SuH- meaning “barley” or “corn”. Among the Semitic languages, cases like the Akkadian šeʔu “barley” are complicated, since borrowing from Sumerian cannot be ruled out. However, words like Arabic seem to confirm the antiquity of this root. Perhaps it was Sumerian that borrowed from the Semitic languages? As we will see, this is a difficult question, and it seems that the word is actually a Wanderwort.

map_agri

The best evidence of the Neolithic age of this root could be the Greek σῖτος or σιτίον, meaning “grain”, a word of obscure origins with no Indo-European etymology. This word already appears in Mycenaean times (si-to). It could ultimately derive from the Pre-Greek substratum and, consequently, be traced back to Pre-Bronze Age times.

It would not be a surprise to find that this word was diffused from the Southeastern Anatolian/Northern Mesopotamian cradle of plant domestication, as it also appears in the not-so-distant Caucasian languages. Starostin reconstructs *śwĭʔē “a kind of cereal”. In fact, there are numerous words for cereals (including wheat, oats and barley) in Proto-North-Caucasian, which raises the same issue as the amount of supposedly agricultural vocabulary in Proto-Afro-Asiatic: in both cases, it is certainly not in agreement with the age of the proto-language! Perhaps we are dealing with very ancient borrowings. In the case of North Caucasian, that particular root seems to be well attested, e.g. Avar š:ʷají “small chaff” and Lak ši “millet”.

Another potential Neolithic Wanderwort appears in the Sumerian gar “bread”. Could it perhaps be related to Proto-Indo-European *ĝ(h)er- “grain, corn”? In that case, instead of borrowing by one of the daughter languages, the term would have been borrowed before the Indo-European expansion (similar to other borrowings of Neolithic vocabulary like PIE *bhar- “barley” and *tawr- “bull” from the Semitic languages). But it might already have been present on the continent, as hinted by Basque garia (Proto-Basque *gali) “wheat”. The Afro-Asiatic languages also have plenty of similar examples, and one cannot ignore Proto-North-Caucasian *ɢōlʔe “wheat”. In fact, it is interesting that both this root and the previous one appear together in the (proto-)languages that are closer to the Neolithic core – North Caucasian and Sumerian. If this is a real phenomenon and not a series of coincidences, I would presume that both borrowed the terms from the same source language, spoken ca. 11,500 years ago somewhere in the north of the Fertile Crescent. Hopefully, other bits of the language of these first farmers could be preserved in undetected Wanderwörter across Western Eurasia and North Africa.

Essays on the history of writing II: The isolates of the Fertile Crescent

One can often find in the literature the idea that continents like South America or Africa differ fundamentally from Eurasia in terms of historical language expansions, and that one does not find in those continents a similar phenomenon to Indo-European, which covers nearly all of Europe and a considerable part of Asia. As I was thinking about that, and as I was writing the post about Mediterranean isolate languages, I realised something. If Indo-European has been expanding for the last 5,500 years or so, then it was only relatively recently in its history that it came to dominate all of Europe… or almost all, since Basque is still spoken today.

Etruscan was alive and well in (early) Roman times, as were the Rhaetic languages of the Alps and the unclassified languages of the Iberian Peninsula, including Aquitanian (probably related to Basque). Minoan probably continued to be spoken in Crete after the adoption of Greek, developing into the Eteocretan language. Not to mention the Lemnian and Eteocypriot languages (by the way, I completely forgot about the latter in that post). And these are just the ones that left some written record. Who knows how many languages were spoken in Northern Europe at the same time? The point is: if we could map the distribution of language families in Europe 2,500 years ago, I am sure the map would be a lot more colourful.

me_isol — Historical and modern isolated languages and small families in Mesopotamia, Anatolia and the Caucasus.

Of course, as I wrote before, there might have been one or two large families that expanded with the Neolithic in Europe. That is the nature of spread zones. But what about the area where agriculture originated? In line with what we saw in the Americas, areas of ancient cultural developments in the Old World should also be mosaics with high linguistic diversity. In the map above, I am showing some of the unclassified languages, isolated languages or small families of the Fertile Crescent (plus the languages of the Caucasus). Since Crete and Cyprus are still in the range of the map, maybe I should have included Minoan and Eteocypriot, but let’s focus on the Mesopotamia/Anatolia/Caucasus corridor for now.

There is no doubt that the cereals, pulses and animals that were the foundation of the Neolithic in Western Eurasia and Northern Africa were domesticated in this region, most likely in the highlands along the Levant/Anatolia border. Was the spread of farming and pastoralism in the Fertile Crescent accompanied by the diffusion of a single language family, perhaps coinciding with the PPNA/PPNB cultures (or interaction spheres) that dominated much of the region between 11,500 and 8,000 years ago? I do not think so. In later times, most of the area shown in the map above would be covered by only two families: Indo-European (with Hittite and Luwian in Anatolia, Armenian in the Caucasus, and the Indo-Iranian languages to the East, including Persian) and Afro-Asiatic (namely Akkadian in Mesopotamia and all the other Semitic languages spoken from the Levant to Arabia). As in the European case, languages like Hattic, Urartian and Sumerian, which survived for some time while Indo-European and Afro-Asiatic were “conquering” the Near East, could be remnants of previous expansions, possibly since Neolithic times (although there are good Chalcolithic/Bronze Age candidates, like the Maykop culture in the Caucasus and the Ubaid culture in Mesopotamia…). But the fact is that they are too diverse to fit into a hypothetical single family.

Sumerian… and Euphratean?

game_ur — The Royal Game of Ur. Like writing, board games were another Sumerian invention.

I have written enough about Sumerian in a few previous posts, so I will not elaborate on it anymore except for two crucial questions: the autochthonous origins of this mysterious language and the supposed Euphratean substratum. I have come across an extremely interesting paper that advanced the idea of Sumerian as a creole language born in the multicultural environment of Southern Mesopotamia. Among other things, the argument is that archaic cuneiform signs were constructed through a logic similar to modern creoles. For example, consider the signs meaning “head”, “mouth” and “ration”. The last two are formed by slight modifications of the first. Specifically, the last one is a combination of “head” and “bread”, and it can be argued that basic Sumerian nouns are very few, most of the others being derived in such a way. However, the actual readings of those signs are sag, ka and gu respectively, so the creole-like derivation of nouns is mostly a phenomenon of the writing system, not of the language itself.

More interesting is the idea that an even earlier language preceded Sumerian(s) in Southern Mesopotamia. Many toponyms do not have a Sumerian etymology, and the phonetic reading of many cuneiform signs is of obscure origins. Most of the signs gained their phonetic values from the rebus principle, but some do not work that way. For example, the sign for “bird” is read ḫu when used phonetically (instead of mušen), and the sign for “fish” is read ḫa (instead of ku). Some words, with a CV₁C(C)V₂C structure, do not fit the typical Sumerian pattern. This is particularly the case of words for professions, like adgub “reed weaver”, sipad “shepherd” or engar “ploughman”. Many refer to farming or herding activities. Thus it is possible that Sumerians borrowed some specialised vocabulary from a society that was already well established in Southern Mesopotamia, but the theory remains speculative, and we know nothing else about this hypothetical language except for the few words in the alleged Sumerian substratum.

Elamite

Spoken to the east of Mesopotamia, Elamite was the language of Susa, capital of a mighty state contemporary with the Sumerians, and later incorporated into the Persian empire. The Elamites got the idea of writing from the Sumerians, but developed their own “archaic” cuneiform signs that we cannot read phonetically (although we know something of the structure and possible content of the texts). Presumably, they recorded the same language that would later be written with the Assyrian cuneiform syllabary.

elamite_behistun-01 — “And thus says Darius, the king: Within these lands, whosoever was a friend have I protected…”

We have a large corpus for Elamite, and the language is understood relatively well. That is thanks to a number of bilingual inscriptions, the longest of which is the famous Behistun Inscription, where Darius I, king of Persia, announces his lineage and deeds in three languages: Persian, Elamite and Babylonian. Despite some efforts to connect Elamite with other language families, especially with the Dravidian languages of India, it remains an isolate. There are also interesting parallels with Afro-Asiatic, e.g. elti “eye” and kassu “horn” (although the latter seems more like a Wanderwort). George Starostin has a paper with a review of such hypotheses, as well as a comparative 100-word list between Elamite and major language families.

Gutian and Kassite

Much less is known about the languages of two other neighbours of Mesopotamia, the Gutians and the Kassites, both of which inhabited the vicinity of the Zagros Mountains and invaded Mesopotamia to install their own dynasties of rulers. The Gutians reigned for a few generations ca. 2100-2000 BC after the collapse of the Akkadian empire. The names of their rulers are virtually all that is known of their language. Despite some comparisons with Tocharian (a very divergent Indo-European language once spoken near the border with China), the names of the Gutian kings mentioned in the Sumerian king list do not reflect any known family in the region. As for the Kassites, they conquered Mesopotamia after 1500 BC, and their language is marginally better attested, since a few Kassite-Akkadian glossaries were compiled by ancient scribes. Needless to say, the known Kassite words do not resemble Sumerian, Elamite, Akkadian, Hurrian or any other language of the region.

kassite-01 — Above, some of the Gutian names from the Sumerian king list in the Weld-Blundell prism. Below, part of the Kassite-Akkadian glossary of the Hormuzd Rassam tablet.

Hurro-Urartian

irbuni-01 — “Argishti, son of Menua, built this temple and this fortress, called Irbuni…”

In the 15th century BC, Hurrian was the language of the powerful kingdom of Mitanni to the north of Mesopotamia. Together with the language of their neighbours, the kingdom of Urartu, near modern Armenia, they form the Hurro-Urartian family. Inscriptions in Hurrian and Urartian are written in the Assyrian cuneiform syllabary. They are found, for example, in the Fortress of Erebuni, modern Yerevan, announcing its foundation by the king Argishti I. The inscription names the fortress Ir-bu-u-ni, which is the origin of modern Armenian Երեւան Yerevan, a name that has survived for 2,800 years!

The most convincing proposed genetic affiliation of Hurro-Urartian is that it would be related to the Caucasian languages. The geographic location of Hurro-Urartian, its phonology, and a few cognates speak in favour of that connection. For example, Hurrian words consist predominantly of a (C)V(C) structure (pa- “to build”, ḫa- “name”, un- “to come”, ar- “to give” etc.), adhering to the typical pattern of the Northern Caucasus. The shortness of the words is compensated by a seemingly complex consonant inventory. Some Caucasian languages are famous for having very few vowels, but 50 or 60 different consonants (remember this theory linking phonology and climate?). In line with that, it appears that the Assyrian cuneiform syllabary was ill-suited for rendering the Hurrian language. From what we can reconstruct, scribes had to resort to signs like pi and ip to represent -w-, -v-, -f- and combinations thereof. Moreover, the personal pronouns bear a remarkable resemblance to the Northern Caucasian sets. Compare, for instance, Hurrian 1sg. iša-/šo- and 2sg. fe- with Kabardian 1sg. sa, 2sg. wa and 2pl. fa.

Hattic and Kaskian

The supporters of the Anatolian hypothesis of Indo-European origins forget that the Hittites were newcomers to the region. Their predecessors in Anatolia spoke an unrelated language, conventionally called Hattic. Very little is known of the language, as bits of it were only recorded by the Hittites. Unsurprisingly, the hypothesis of a Caucasian connection has been proposed by the Russian school. Comparisons with the hypothetical Sino-Caucasian family (including North Caucasian, Sino-Tibetan and Yeniseian) show some interesting cognates, but the validity of Sino-Caucasian is what is disputed in the first place. The Kaskian language later spoken on the northern coast of Anatolia was presumably related to Hattic, and could be the language of the descendants of the first settlers dislodged by the Hittites.

Deep connections? Not quite

As I was first planning this post, I believed there should be some deep relationship between all the “colours” in the map at the beginning of the post. That was the logical conclusion: these were islands of (a) previous expansion(s) later blurred by Indo-European and Afro-Asiatic. Reality is not so simple: let’s have a look at the basic vocabulary of some of those languages:

sumerian_elamite-01 — * The reconstructions in bold are Starostin’s Proto-North-Caucasian. The ones not in bold are reconstructed for the Proto-Northeast-Caucasian level only.

A few isolated resemblances can be found here and there (e.g. “eye” in Elamite and Proto-North-Caucasian, “tongue” in Sumerian and Proto-North-Caucasian), but these might as well be due to chance. The words for “horn” in Elamite, Hattic and Caucasian might be related. To that we must add Proto-Indo-European *k’era(w)- and Proto-Afro-Asiatic *ḳar-. This means the resemblance in the table above is not unique to those languages, and we might in fact be dealing with a Wanderwort.

Perhaps we should not give so much weight to vocabulary. As I explained previously, there are some intriguing morphological similarities between various Eurasian isolates located thousands of kilometres apart. Curiously enough, the same does not apply to this group of languages, which are relatively close in terms of geographical distance. Let’s review the crucial features that distinguish Eurasian isolates from the large families that surround them: 1. a predominance of prefixes; 2. ergative alignment when marking the pronouns in the verb; 3. possessive pronouns identical to one of the sets used with the verbs; and 4. complex “chains” preceding the verb root. In the table below, I show a quick comparison between the Mesopotamian-Anatolian isolates and Kabardian, a good representative of the Northwest Caucasian family.

ela_hur-01

Personal pronoun prefixes are marked in red, and suffixes in blue, as usual. Kabardian is the only language that actually conforms perfectly to the aforementioned pattern. Sumerian comes close, especially in relation to the verbal chain, but even it employs a good number of suffixes. Elamite is not even an ergative language, and Hurrian is as prolific in the use of suffixes as a Turkic or Uralic language. Finally, there is almost no resemblance in the actual pronoun particles, except perhaps between Hurrian and the Caucasian languages (more evident with the independent pronouns, as I said above).

What conclusions can we draw from all this? I would like to end with a very simple idea: before the Bronze Age, when large-scale warfare – propelled by better weapons but also by the horse – became a major factor in population expansions, the Fertile Crescent was a linguistic mosaic with higher population densities than its surroundings and a long history of ancient cultural innovations, including agriculture. Languages expanded from it, not into it. It is pointless to look for the origins of Indo-European in the first farmers of Anatolia, who spoke Hattic before the Hittites arrived. Neither could the Neolithic peoples of the Levant have spoken Afro-Asiatic, since its only Eurasian branch (Semitic) reached the area in relatively recent times. We will never know the language of PPNA, and there might have been many. Perhaps the dwellers of Çatalhöyük and worshippers at Göbekli Tepe spoke an ancestor of Hattic, or perhaps it was yet another language that contributed to the huge diversity of this ancient cultural mosaic.

catalhoyukbull

Essays on the history of writing: Afro-Asiatic roots in Egyptian

As I have mentioned previously, Afro-Asiatic is an extremely important language family. Not only because it includes two of the earliest languages ever recorded in writing – Egyptian and Akkadian – but also for a reason somewhat related to that: it is old, very old, possibly twice as old as Indo-European (unless you favour the Anatolian hypothesis). Yet, Afro-Asiatic is recognisable as a family and it can be reconstructed, giving hope to the supporters of long-range comparisons in a number of other cases worldwide.

afrasia-01 — (Modern) extent of the Afro-Asiatic languages and key languages that have ancient written records (expressed in k = thousands of years before present). Berber and South Arabian are rather groups of languages than single languages, but are so named here for convenience.

It could be argued that such recognition is only due to the fact that we have records of some of its branches for nearly five thousand years. Well, Indo-European also has a long (though not as long) tradition of written languages, and the fact that we can compare Latin, Greek and Sanskrit (not to mention Hittite) greatly facilitates the work of reconstruction. It was the similarity between those ancient languages that led to the recognition of Indo-European as a family, but no one would seriously doubt that one could arrive at the same conclusion, or that Proto-Indo-European could be reconstructed, based solely on the modern languages. As for Afro-Asiatic, the same holds true: even if it started to split around 10 thousand years ago or even earlier, we would still recognise it as a family solely based on languages spoken today.

egyptian_akkadian-01

The antiquity of Afro-Asiatic cannot be questioned: the Egyptian language is well attested from a bit less than five thousand years ago (or earlier if you consider Predynastic seals, although they cannot be read with certainty in the way later inscriptions can). Akkadian, the second oldest Afro-Asiatic language ever recorded, starts to be written around 4500 years before present. Yet, the differences between those languages are so large that they must have begun to diverge millennia before that. In the table to the left you can see around 20 words of basic vocabulary in Egyptian and Akkadian. Only two of those (highlighted in red) are cognates, descending from a common root in Proto-Afro-Asiatic (be aware that the words for ‘two’, sn and šina, and for ‘to give’, rdj and nadānu, are not related, despite the superficial resemblance).

The spread of Afro-Asiatic has been thought to be intimately connected with the dispersal of farming, and its age is certainly consistent with that. Interestingly, the modern distribution of the family roughly coincides with the distribution of Y-chromosome haplogroup E1b1b (a.k.a. E-M35), which is also assumed to have arrived in Europe during the Neolithic (though it is not really as frequent in archaeological samples as G2a, and a lot of the presence of E-M35 in the Iberian peninsula could be explained by recent gene flow from North Africa). Recent genetic research has shown that individuals from the earliest culture of settled hunter-gatherers of the Levant, called Natufian, belonged to this haplogroup. Natufians lived around 12500 years ago in the key region of western Eurasia where the domestication of a number of cereals and animals took place, and they were followed, ca. 11500 years ago, by the first Neolithic cultures of the Near East – samples of which were also found to belong to haplogroup E. On the other hand, the spread of E-M35 could have happened long before the Neolithic, and to assume that its carriers expanded from the Levant to Africa, largely replacing the local populations, simply does not agree with the archaeological record. Furthermore, it would imply a Western Eurasian Urheimat for the Afro-Asiatic family, whereas the highest diversity within the family is undoubtedly found in Africa.

Thus, it is reasonable to situate Proto-Afro-Asiatic somewhere in Eastern Africa around 10 millennia ago or more, but whether its expansion has anything to do with the spread of farming and E-M35 is uncertain. The latter two might be connected in the Levant and in Europe, but I would argue that European Neolithic farmers almost certainly did not speak an Afro-Asiatic language. Rather, it seems that the big expansion of Semitic – the only Afro-Asiatic branch in Eurasia – was, like Indo-European, a Bronze Age phenomenon (more about that in the future…).

P.S.: As I was finishing writing this post, this new paper was published showing a genetic affinity between ancient Egyptians and modern Near Eastern/Anatolian populations, in contrast with modern Egyptians, who have a higher contribution from Sub-Saharan Africa. The individuals that were sequenced for the Y-chromosome belonged, unsurprisingly, to haplogroups J and E1b1b.

In any case, this will be a light, almost recreational post, showing Afro-Asiatic roots that made it into Egyptian and, later, Coptic basic vocabulary. The comparison with Akkadian shown above is somewhat unfair, since many basic Egyptian words actually have cognates in Akkadian and many other Afro-Asiatic languages, even though the semantic correspondence is not exact. We will see how a language family can still be recognised even after ten thousand years (let’s say nine and a half, since Coptic is no longer spoken except as a liturgical language).

NOTE: I offer below a few examples of languages from each branch of Afro-Asiatic. The complete etymologies are available on Sergei Starostin’s site. The Proto-Afro-Asiatic forms (PAA) given at the end of each etymology are taken from the reconstructions of Militarev and Stolbova on that website. When the form originally reconstructed by Orel and Stolbova differs, I note it in parentheses (OS). Just for the fun of it, some words are accompanied by examples from hieratic papyri, with the respective hieroglyphic equivalents and transliteration.

Man and his Occupations

sn *san “brother”. In Coptic, ⲥⲟⲛ. A very common Egyptian word, with cognates in quite a few Afro-Asiatic branches. In the Chadic languages, for example, we find Cagu šǝn, Dangla sino, and others. The Cushitic languages have Beja saan, Bilin šan and Gawwada aššinko – the latter meaning “nephew”. PAA *san-/sin-.

two_brothers — “Once there were two brothers…” (From the Tale of the Two Brothers, Papyrus D’Orbiney)

Parts of the Human Body

ꜥn “eye”. This is an Old Egyptian word, later replaced by the usual jrt (which also has an Afro-Asiatic etymology). Derived from this obsolete root is the verb ꜥn “to glance”. A number of words with the phonetic combination ꜥn have as a complement. The obvious cognate is the word for “eye” in the Semitic languages: Akkadian īnu, Arabic عين, Hebrew עין, Amharic አይን etc. In the Chadic languages, the cognate for this word appears in the verb “to see”, as in Bole (closely related to Hausa) ‘inn-. Among the Omotic languages, a very divergent branch of Afro-Asiatic, we have Bench an “eye”. PAA *ʕayVn-.

jrt *jārat “eye”. The usual word in Coptic is ⲃⲁⲗ, but ⲉⲓⲁ and ⲉⲓⲉⲣ persisted in prefixes. This has cognates in the word for “eye” among many Chadic languages: Zaar yīr, Musgu arai, Mubi irin etc. Among the Cushitic languages, we find Beja iray- “to see” and Iraqw ara “eye”. PAA *ʔir-.

fnd “nose”. This root is not so well attested in other Afro-Asiatic branches, but it is there. In Chadic, it appears in words for “hole” or “mouth”, as in Sura fuŋ and Pa’a vingi, respectively. In the Cushitic languages, we have a possible cognate in Beja gunuf “nose”. The final d /dʒ/ in Egyptian must have been palatalised from /g/. PAA *fung- (OS *funVg-).

“The nose is blocked, it cannot breathe…” (from the Precepts of Ptah-Hotep, Papyrus Prisse)

ns *nīs “tongue”. In Coptic, ⲗⲁⲥ. This is a widespread root in Afro-Asiatic. In Semitic, as noted in the table at the beginning of the post, we have Akkadian lišānu and, among modern languages, Arabic لسان, Hebrew לשון, Amharic መላስ etc. Berber languages have ilǝs or ils. In the Chadic branch, we have cognates in many languages, as in Angas leus, Musgu εlεsi, Dangla lēse etc. Egyptian did not have an independent glyph for /l/, although this phoneme must have been present, given the evidence from Coptic. Cognates of words with /l/ in other Afro-Asiatic languages were written with n, r or j in Egyptian. PAA *lis- (OS *les-).

ts “tooth”. There are not many cognates for this word, evidence being mostly limited to the Cushitic branch, with Beja koos “tooth” and Qwadza koʔosiko “molar”. As in the case of “nose”, the initial t /tʃ/ must have been palatalised from the original /k/. PAA *kV(ʔ)Vs- (OS *kos-).

jb *jib “heart”. This is a beautiful etymology, with cognates in many branches of Afro-Asiatic. The Semitic languages are an obvious example, with Akkadian libbu, Arabic لب, Hebrew לב, Amharic ልብ etc. In the Chadic languages, this root is represented by forms like Kilba libibi, Musgoy lib (meaning “belly”) and Mokilko ʔulbo. The Cushitic branch has Afar lubbi, Somali laab and Sidamo lubbo – the latter meaning “soul”. Finally, Omotic (this very divergent branch of Afro-Asiatic!) can be included in this etymology, with Anfillo yiboo “heart”. PAA *libb-/lubb- (OS *lib-/lub-).

“The good conduct of his heart and his tongue…” (from the Precepts of Ptah-Hotep, Papyrus Prisse)

qs “bone”. In Coptic, ⲕⲁⲥ. This word is not well attested in Semitic, with only a few examples like Arabic قص “chest (bone)”. In the Berber languages, however, we find a good cognate in the usual word for “bone”, iɣǝs or similar variants. The Chadic languages are also well represented, with Hausa k’ašii, Musgu kεskεε, Dangla kaaso and many others, all meaning “bone”. In Cushitic, a few examples exist, like Warazi mik’eče. Finally, among the Omotic languages, we may cite Nao k’us. In summary, this word has cognates in all Afro-Asiatic branches. PAA *ḳ(ʷ)as- (OS *ḳaċ-).

Sky, Earth, Water

mw *māw “water”. In Coptic, ⲙⲟⲟⲩ. This is, of course, a very basic word, and it has obvious cognates in other Afro-Asiatic languages. In fact, words for “water” with the general form mV- are widespread in other families, as exemplified by a famous Nostratic etymology. But let’s stick to Afro-Asiatic. Curiously, though, it is attested in fewer branches than one might expect. Among the Semitic languages, we have Akkadian mū, Arabic ماء, Hebrew מים etc. Cognates in the Chadic languages are not many: we have Guruntum ma, Gude maʔine, among a few others. In the Cushitic languages, cognates of this word appear in Beja muʔ “liquid”, Iraqw maʔay and Dahalo maʔa. PAA *maʔ-.

“She did not pour water into his hands…” (from the Tale of the Two Brothers, Papyrus D’Orbiney)

nf “breath, wind”. In Coptic, ⲛⲓϥⲉ. Perhaps this should have been included among the parts of the human body, especially given the meaning this root acquired in some Afro-Asiatic branches. The cognates of Egyptian nf mean “nose” in the Semitic languages: Akkadian appu, Arabic أنف, Hebrew אף etc. In Chadic languages, it came to mean “to breathe” or “life”, as in Daba nip and Tera nifi respectively. The Cushitic languages show similar semantic shifts. Thus, we have Beja nifi “to blow”, Saho naf “to breathe; soul”, Afar neef “face”, and Somali naf “soul, life”. Finally, the Omotic languages can be included in this etymology, e.g. Kafa naf “to blow, swell”. PAA *(ʔa-)naf-/(ʔa-)nif- (OS *naf-).

jmnt “west”. In Coptic, ⲉⲙⲛⲧ. The West is not just a cardinal direction in Egyptian: it is also the land of the dead, where Osiris reigns. This word also happens to have an interesting Afro-Asiatic etymology. In the Semitic languages, the cognates mean “right” or “right hand”: Akkadian imnu, Arabic يمنى, Hebrew ימין etc. In Hausa, the most widely spoken Chadic language, the meaning is closer to Egyptian: yammaa “westward”. The trick is to remember that, if you face the South rather than the North, the West will be on your right-hand side! PAA *yamin-.

Adjectives

km “black”. In Coptic, ⲕⲁⲙⲉ. This is an important word, since it is the root of the name for Egypt itself in the ancient language: kmt *kūmat (ⲕⲏⲙⲉ in Coptic), the “black land”, a reference to the fertile soil in the Nile floodplain in contrast with the surrounding desert. Unfortunately, it does not have many cognates in other Afro-Asiatic languages. Evidence is limited to the Chadic and Cushitic languages. In the Chadic branch, semantics have changed a little bit, as in Buduma kaimē “shadow” and Awiya kəmən “evening”. In the Cushitic languages, correspondence is more obvious, with Gawwada kumma “black”. PAA *kum-.

“And the stream carried it to Egypt…” (from the Tale of the Two Brothers, Papyrus D’Orbiney)

w3d *wāʀid “green”. In Coptic, ⲟⲩⲱⲧ. This etymology is somewhat ambiguous, but I will assume that the best cognates are in the words for “green” in the Semitic and Berber languages. In the Semitic branch, we have Akkadian warqu, Hebrew ירוק etc. Among the Berber languages, we have awraɣ, irwaɣ and related forms, sometimes meaning “yellow”. Presumably, as in a few other roots listed above, there was a palatalisation q > d in Egyptian. PAA *wVraḳ- (OS *wVriḳ-).

qbb *qabab “cool” (actually a verb, “to be cool”). In Coptic, ⲕⲃⲟ. The only cognates are found in the Cushitic languages, as in Somali qabow “cold”. Nevertheless, I thought I should include it! PAA *ḳab-.

Verbs

mwt “to die”. In Coptic, ⲙⲟⲩ. Unlike the previous one, this is a very common and widespread root – and “to die” seems like an appropriate way to finish this list! In the Semitic languages, cognates include Akkadian mātu, Arabic مات, Hebrew מת, Amharic ሞተ etc. Berber, in turn, has əmmət. Among the Chadic languages, we can cite Hausa mutu, Buduma matte, Dangla mate, and many others. Cushitic languages also have cognates for this root, as exemplified by Somali mōd “death”. PAA *mawVt- (OS *mawut-).

“Death is reached…” (from the Precepts of Ptah-Hotep, Papyrus Prisse)

Spread Zones: Europe and the Neolithic Survivors

I have been away for too long, due to other writing activities that kept me extremely busy. In spite of that long gap, this will not be a new cycle of posts, but the end (at least momentarily) of a series of previous topics. I will, however, touch on a variety of subjects and express my opinion about the most contentious question of European linguistics and archaeology – the timing of the spread of Indo-European languages. My opinion on this problem has been decided (for now) by the evidence provided by Basque and the isolated languages of the Mediterranean.

From the now classic Guns, Germs and Steel to the more recent Prisoners of Geography, the idea that the terrain inhabited by a group largely determines the events of their history has become widespread among the general public. Although reality is much more intricate than that, the same principle is the basis of mosaic vs. spread zone models: large navigable rivers and extensive flat plains will tend to experience wave after wave of language replacement. In contrast, mountainous areas, islands and other inaccessible terrains tend to be refugia where millennia of uninterrupted development result in a myriad of languages disconnected from large families. Obviously there are exceptions – as in the Andean case that I examined previously.

In any case, it is undeniable (as Russia well knows) that from the Asian steppe to Central Europe there is really no geographical barrier – as long as one keeps south of the Urals. The diffusion of Indo-European languages into Europe did not need to overcome all that distance, moving only from the surroundings of the Caspian Sea to the West (assuming a Yamna origin!). Curiously, the Neolithic diffusion followed a different path altogether, from the Levant, through Anatolia, to the Mediterranean, resembling part of the (much later) Silk Road. The big question is whether these two events (Neolithic and Indo-European) coincide, as originally proposed by Colin Renfrew. I assumed this theory to be pretty much dead, at least among linguists, but I was mistaken (see Gray and Atkinson’s paper and, more recently, Bouckaert and colleagues’). Fortunately, genetics has been playing a major role in redefining our views about ancient migrations, and I became convinced that genetic evidence does not support a Neolithic age for Indo-European. Nevertheless, given the difficult association between genes and languages, I am sure the matter will continue to be hotly debated for years to come.

Map of the 5th principal component of 94 genes in Europe (from History and Geography of Human Genes)

The Pre-Indo-European languages of Europe (and their speakers) offer formidable clues to the problem. I have written about the isolates of Eurasia in a previous post, but let us explore some more facts about the last speakers of a Pre-Indo-European language: the Basques. Even before the modern DNA studies, the Basques were known to be different from their neighbours. For example, using only blood types, it was noticed that Basques were predominantly O and had the highest incidence of Rh- in Europe. With the first DNA analyses of a large number of European populations, it became clear that Basques were indeed genetically distinct. I have previously shown some maps of principal component analyses, and here I reproduce the map of the 5th principal component for Europe, based on the 94 genes analysed by Cavalli-Sforza and colleagues. This principal component peaks at the Basque country, which is at the opposite extreme from most of Northern/Central Europe and the Balkans, although with some similarity to the remainder of the Iberian Peninsula.

Keeping in mind that the first principal component shows a gradient from Greece to the northwest (Neolithic?) and the third principal component radiates from the north of the Black Sea (Bronze Age/Yamna?), I believe there are only two ways Cavalli-Sforza’s data can be interpreted:

The Basques are Paleolithic/Mesolithic “survivors” (ergo Neolithic migrants spread the Indo-European languages);
They are Neolithic “survivors” (ergo Indo-European languages arrived during the Bronze Age).

By the way, I am assuming here that the massive genetic legacy of the Neolithic and Bronze Age expansions (see below) must have had a linguistic correlate. In theory, one can imagine a situation in which large numbers of migrants arrive in a region, become culturally dominant and have children with locals without causing language replacement, but I would like to see real-world examples of that. In fact, it seems that one only needs a small number of “conquering” migrants with minimal genetic impact to change the language of whole regions (that is the case in some Latin American countries and, further back in time, was the case of Hungary).

Fortunately, we have advanced much since Cavalli-Sforza and colleagues’ original work. For example, it is clear now that the Neolithic expansion in Europe did involve population movement, and that migrations from Anatolia were indeed the source. In my opinion, the most significant genetic piece of evidence is the discovery that modern Basques are the closest living population to Neolithic skeletons from the same region. Sardinians were also found to be very close and, in fact, Sardinians appear in every study cited here as the closest modern match to Neolithic DNA samples. Their status as a “relic” population in Europe due to isolation was noticed long ago, when Cavalli-Sforza left Sardinians out of the PCA due to their singularity within Europe. The genetic similarity between Sardinians/Basques and Neolithic samples deserves special attention.


One important aspect that has been taken into consideration recently is the distribution of haplogroups in Europe. Unlike the autosomal data referred to above, which tells us about the admixture in the ancestry of an individual, Y-chromosome and mtDNA haplogroups are specific mutations that are passed down over generations and preserve the histories of migrations of particular male and female lineages. The Y-chromosome haplogroups most closely associated with the Neolithic are E1b1b and G2a. Both of them are rare in modern Europe as a whole, but G2a was the dominant haplogroup among Neolithic farmers of Central Europe, France and Spain. Both E1b and G originate outside of Europe, in the Levant and the Caucasus respectively, and must have been brought to Europe by Neolithic migrants. I will not reproduce the beautiful maps from Eupedia, but in the map above I highlight the areas where E1b and G are most common nowadays. The Mediterranean (including Sardinia) and the Balkans are modern refugia of those two haplogroups, but what happened to Central Europe? In fact, paternal lineages from most of Central and Western Europe were later replaced by haplogroups R1a and R1b, carried by Bronze Age migrants (confirmed by the fact that Yamna samples were recently shown to belong to R1b). Surprisingly, most modern Basques actually belong to haplogroup R1b (maybe “bottleneck effects” could easily lead to the replacement of Y-chromosome lineages over a few generations in such an isolated population?).

In summary, the Neolithic expansion in Europe involved considerable population movements, the genetic signature of which can still be seen and is most noticeable in the Mediterranean and the Balkans. These areas were somewhat less impacted by the later Bronze Age migrations, which also considerably changed the genetic make-up of Central and Western Europe. Since there is no other major language expansion after Indo-European that could correspond to those Bronze Age migrations, the Indo-European expansion must have occurred during the Bronze Age. Some supporters of the Anatolian hypothesis do not deny that fact, arguing that several waves of Indo-European expansion could have occurred. Applying Occam’s razor, I would immediately discard this explanation as being too complicated. The question then is: would the Neolithic expansion also have involved the diffusion of a single, widespread language family?

Pre-Indo-European languages in relation to the major cultural traditions of the Neolithic.

In the map above, I am showing some of the main Neolithic cultures of Europe in relation to the isolated languages that we know about. The Linearbandkeramik (LBK) took advantage of the plains that extend from Ukraine to France – a huge spread zone – and it is difficult not to imagine that it was accompanied by the diffusion of a single language [family] around 7000 years ago. At the same time, the Cardial/Impressed culture spread along the Mediterranean shores. Both ultimately stem from the Greek Neolithic (with clear origins in Anatolia), but seem to be local developments. Whether both involved the expansion of the same language [family], I will not dare to speculate, as there is no clue to what the LBK people spoke (perhaps a trace survives in the substrate of the Germanic languages). But could there be a connection among the other Pre-Indo-European languages recorded in Southern Europe?

That Basque was part of a larger language family in the past is well accepted, but what was its extent? Some have suggested the widespread occurrence of Basque-like elements in the toponymy of Europe: e.g. the connection between Val d’Aran in Spain, Arundel in England and Ahrntal in the Alps would be the Basque word aran “valley”. There have also been some attempts to connect Basque to the extinct Pre-Indo-European language of Sardinia – which we can call “Paleo-Sardinian” but is also known as “Nuraghian”. Beyond genetic isolation, the linguistic isolation of Sardinia is clear even in (relatively) recent times: for example, whereas the other Romance languages turned Classical Latin C before front vowels into /tʃ/, /ʃ/ or /s/, in Sardinian it is still pronounced as a hard /k/. As for the Paleo-Sardinian language, we have no record of it except for words that entered modern Sardinian or the toponyms on the island. It is on the basis of those that Blasco Ferrer proposes parallels with Basque – e.g. the triad Lur-beltz, Lur-gorri and Lur-zuri, meaning “black”, “red” and “white earth” respectively, which appears on the island as Duru-nele, Lúr-kuri and Lu-tzurró.

Among the well-known Pre-Indo-European isolates is the Etruscan language. There is a possibility that it is actually connected to Rhaetic, a language or group of languages preserved on a few inscriptions around the Alps. Given the small corpus, it is unlikely that it will ever be “deciphered”, but formally there are some resemblances with Etruscan – e.g. a common ending -ce or -ke that, in the latter, marks the past tense. Another candidate to form a family together with Etruscan is Lemnian, attested in a few inscriptions on the Greek island of Lemnos. If Etruscan, Rhaetic and Lemnian are indeed part of a single family, later fragmented by the spread of Indo-European, then we are potentially dealing with yet another group of Neolithic “relics”. In that case, the location of those languages in relation to the Cardial/Impressed culture and to the modern distribution of Y-chromosome haplogroups E1b1b and G2a, as can be seen in the maps above, starts to make sense.

Finally, let us consider the Pre-Indo-European languages (directly or indirectly attested) from the region closest to the origins of the Neolithic. One of the most interesting things about the Greek language is that it absorbed a large amount of non-Indo-European vocabulary. These are words that have no cognates in other Indo-European languages and are also easy to spot based on their distinctive phonology/morphology. Interestingly, they tend to be cultural items – words like σῦκον “fig”, ἔλαιον “olive”, θάλασσα “sea” and βασιλεύς “king” (in Linear B qa-si-re-u). It is not impossible that, like the Linear B syllabary, the substratum in Greek might be related to the language once spoken in Crete and partly preserved in the Linear A script. This language, which we may call Minoan, is probably never going to be fully deciphered given the small size of the corpus and the fact that most of it consists of accounting tablets full of personal names and toponyms. Names of products are written with logograms, so it is impossible to know how they were pronounced. A few transaction words (see below) and inflected forms (like the famous ja-sa-sa-ra-me that I mentioned previously) offer a window into Minoan, but the known vocabulary is so small that the language is destined to remain unclassified.

In summary, it is likely that all these languages are remnants of a once widespread Neolithic family (or families), diffused together with the Cardial/Impressed culture along the Mediterranean. We simply do not have enough material to show how they are related, although a convincing case can be made for Etruscan, Lemnian and Rhaetic. Basque could be related to this phenomenon or be part of a different family spread further west, perhaps in association with the Megalithic traditions of the Atlantic Neolithic. It is almost certain that the LBK expansion brought a single language or language family to their vast territory, but whether it was also related to the Mediterranean variants is impossible to tell.

Example of a Linear A tablet, now at the museum of Heraklion. Logograms are transcribed with Latin words (vir for man, fic for fig). Most of the words spelled with the syllabary are personal and place names. In this tablet, you can see some of the exceptions: ki-ki-na seems to be an adjective describing the figs, or maybe it is the word for fig itself repeated after the logogram. Transaction words usually appear as headers or at the end of the tablet. In this example, a-du and ki-ro could mean something like “balance”. Ku-ro is followed by a number that is the sum of all previous items and must mean “total”.

Spread Zones: South America

As the reader might have guessed by now, one of the theories that I find most attractive in linguistic geography is the opposition between spread and mosaic zones. In the form developed by Nichols, this theory posits that some areas – in general, broad plains with few natural barriers – are prone to the dispersal of single languages or families, whereas other regions (islands, mountain chains) tend to accumulate isolates and small families. Not only specific languages but also linguistic features stretch over the spread zones. Just look at Eurasia and you will see layer upon layer of prehistoric (and more recent) spreads. There is a long East-West axis that served as a corridor for the Indo-European, Uralic, Turkic and Mongolic languages, pretty much along the same steppe that brought Attila and Genghis Khan to the doors of Europe. All the language families in this area also share similar features – as opposed, for example, to the languages of the Caucasus or to the Sino-Tibetan family. The big question is: what are the archaeological correlates of a spread zone? The answer is in the horizon!

Archaeological horizons are geographically extensive phenomena – a ceramic style, a particular form of architecture, or iconography – that spread very fast but were often relatively short-lived. If we are trying to identify the periods when some large language families reached their territorial zenith, then we must look for archaeological horizons rather than for very localised cultures. Moreover, it is likely that cultural horizons correspond to linguistic spread zones, whereas mosaic zones should be characterised by a myriad of local traditions. Curiously, in South America, the spread zones are very different from the Eurasian case – partly due to the absence of extensive grassy plains with no obstacle to movement (the Argentinian pampas being the obvious exception, although they are not as important a communication route as the Eurasian steppe that connects the continent’s long East-West axis). Instead, the South American spread zones are large river systems (the Amazon and La Plata basins) and the Andean cordillera. Yes, mountains are spread zones in South America!

On the left, linguistic mosaic zones (green) and spread zones (pink) in South America. On the right, the major archaeological horizons and some localised cultures.

On the Andean side of the debate, the most important question pertains to when the largest language families in the region – Quechua and Aymara – were diffused. Once thought to be closely related in a “Quechumaran” family, the resemblance between the two is now known to be the result of extensive contact. Quechua is much more widespread than Aymara, partly because it was adopted as the language of the Inca empire (although it was already used over a broad territory even before that), but both functioned as linguae francae in different parts of the Central Andes. As correctly pointed out by Paul Heggarty, Quechua and Aymara must have spread during the Early and Middle Horizons of Andean Prehistory (the Late Horizon corresponds to the Inca expansion, but Quechua was already widespread by then). These are periods when shared architecture and/or iconography diffused throughout the coast and the cordillera – presumably, hand in hand with new ideas and, of course, languages. The Andean Horizons are separated by periods of fragmentation, called the intermediate periods, when local cultures flourished.

How can a mountain chain like the Andes have served as a spread zone for languages in the past? One possible explanation is that the population in the whole “spine” of South America has always been connected through river valleys that often link the highlands to the coast. Some of them are extensive, like the Callejón de Huaylas (near Chavín), a major trade route throughout Andean prehistory.

The Early Horizon, around 900 B.C., is the time of Chavín de Huántar – a labyrinthine pilgrimage and ceremonial centre in the highlands once thought to be the “mother culture” of the Andes, but now known to be just one manifestation of a broadly shared tradition. For Heggarty, this is when Aymara started to spread. The reasoning behind that is that there are “islands” of Aymara, as well as Aymara place names, in what is now a “sea” of Quechua (including in the region around Chavín), i.e. Aymara must have spread first. I have some doubts about this hypothesis, especially because the family seems a bit shallow (the two other languages, Jaqaru and Kawki, spoken in central Peru, are not that different from the southern Aymara varieties). However, I admit that it is the most parsimonious explanation. This leaves us with the Middle Horizon as the time when Quechua first expanded. Starting around A.D. 600, the Middle Horizon is the time when two major centres made their influence felt over a wide sphere in the Andes: to the North, Wari, and to the South, Tiwanaku. Although the latter is usually associated with Aymara in most people’s imagination, we know that this language only arrived recently in the region. Could the original language of Tiwanaku have been Quechua? The linguistic diversity in this area points to other possibilities, including Uru, Chipaya and Puquina. Finally, with the onset of the Late Horizon by A.D. 1450, the Incas continued to propel the expansion of Quechua, as did the Spanish after the conquest. What is important to keep in mind is that there are clear correlates of a linguistic spread zone in the archaeological horizons of the Andes.

The famous Shipibo art style looks like a direct descendant of the Amazon Polychrome Tradition. But does that tell us anything about languages in the past?

Let’s move now to the East, to the lowlands of South America. Here, there are two related phenomena whose continental scale grants them the status of archaeological horizons: along the Amazon river, the Polychrome Tradition; along the other large rivers and the Atlantic coast, the (distantly) related Tupi-Guarani Tradition. Starting around 500 B.C., the Tupi-Guarani Tradition spread from southwestern Amazonia, bringing with it virtually identically decorated ceramics, among other traits. The distribution of the sites and their chronology coincide almost perfectly with the historical extent and depth of the Tupi-Guarani language family. Thus, unlike in the Andean case, here we know exactly with which languages to associate an archaeological culture. The issue with the Amazon Polychrome Tradition is not so simple. Starting around A.D. 1000 or maybe a few centuries earlier, this tradition rapidly spread over the basin. Some suggest it might also be associated with Tupi-Guarani languages, based on the fact that the historical Kokama and Omagua, who spoke languages of that family, were still producing similar designs during colonial times. I would really like that to be true, as it would confirm the spread-zone nature of the main South American waterway. However, other groups have produced similar pottery up to the present, like the Shipibo-Conibo, who are Pano-speakers. Of course, the adoption of the pan-Amazonian polychrome designs does not necessarily imply the diffusion of a single language. Still, because the Kokama and Omagua languages apparently result from an ancient Tupi-Guarani-based lingua franca, I would take that as a serious hypothesis.


Following the tradition of showing a little bit of the languages in question, the examples above offer a brief comparison of some widely spaced Quechua and Tupi-Guarani languages. I have selected sentences with similar structures, though I could hardly find identical ones. The Quechua of Pastaza and Napo, spoken in the Ecuadorian Amazon, and the Quechua of Santiago del Estero, spoken in northwestern Argentina, are separated by over 3,000 km. Kayabi, spoken near the Xingu river in the southern Amazon, and Mbyá, spoken in southern Brazil and neighbouring Argentina and Paraguay, are separated by nearly 2,000 km. Yet the similarities are striking. Moreover, although the two families are completely distinct in terms of morphology (as expected from the west-east divide in South America), they tend to have simpler phoneme inventories and “easier”, more regular structures than many languages that remained in the mosaic zones (see a few examples of isolates and other South American families here). Quechua, in particular, has a regularity reminiscent of an artificial language. I would tentatively suggest that this is a characteristic of languages that succeed in spreading across the spread zones.

What about the archaeological correlates of linguistic mosaic zones? As I said in a previous post, these tend to be areas rich in localised archaeological cultures. I have highlighted a few of them in the map at the beginning of this post. Northwestern and southwestern Amazonia are typical mosaic zones. Recently, the southern fringe of the Amazon has been shown to hide a myriad of archaeological sites with earthworks, ditches, enclosures and roads. Among these are the Geoglyphs of the state of Acre and the “garden cities” of the Upper Xingu. The Llanos de Mojos, about which I wrote previously, is another example. However, these are very different archaeological cultures, and even within one of these regions there is enormous variety. Therefore, they are perfect correlates of linguistic mosaic zones: no single ceramic style or other material trait is shared across these areas, and this material diversity is matched by their linguistic diversity.

Satellite imagery with added plans of a Geoglyph site (left) and one of the “garden cities” of the Upper Xingu (right).

A genetic detour

This will be a short “detour” about a topic that might have crossed some readers’ minds after my latest series of posts on the Americas: can external evidence help decide which linguistic classification is more likely? Genetics is a discipline that often uses linguistic data (in the case of the Americas, it is usually Greenberg’s classification). Of course, a straightforward relationship between genes and languages would imply that the movement of people is the only mechanism responsible for language change, which is definitely not the case. However, genetics can offer insights – after all, if one can prove that some people did move, well, that must have had some linguistic consequence. I have here and there touched on some of these issues, but now I want to explore the genetic data more seriously.

About twenty years ago, Luca Cavalli-Sforza and other Italian geneticists published the venerable History and Geography of Human Genes, summarising decades of research all over the world. Beyond the presentation of phylogenetic trees for all human populations and a summary of the history of human colonisation of the globe, the book offered an innovative perspective with the use of principal component analysis (PCA) to shed light on finer scale movements within continents. While we could analyse map after map of the distribution of individual genes, what PCA does is to summarise the main trends in all that variability into just a few maps (for example, if there are several genes that show more or less the same distribution, they can be transformed into a single component).
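The idea can be illustrated with a toy example. The sketch below, in Python, uses invented allele frequencies (five hypothetical populations and three hypothetical genes that all follow the same cline), centres them, and finds the first principal component by power iteration on the gene-by-gene covariance matrix. It is a minimal illustration of PCA under those assumptions, not a reconstruction of Cavalli-Sforza's actual procedure; a real analysis would use a full eigendecomposition from a numerical library to obtain every component, not just the first.

```python
import math

# Toy sketch: PCA on invented allele frequencies (rows = hypothetical
# populations along a cline, columns = hypothetical genes). Not real data.


def center(data):
    """Subtract each gene's mean frequency (column mean) from every population."""
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(len(data[0]))]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Gene-by-gene sample covariance matrix of the centred data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(cov, iterations=200):
    """Largest eigenvalue of cov and its eigenvector, via power iteration."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))  # Rayleigh quotient v.Cv
    return eigenvalue, v


def first_pc(data):
    """Scores of each population on PC1 and the share of variance PC1 explains."""
    centered = center(data)
    cov = covariance(centered)
    eigenvalue, loadings = leading_eigenvector(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace = sum of all eigenvalues
    scores = [sum(x * l for x, l in zip(row, loadings)) for row in centered]
    return scores, eigenvalue / total_variance


# Invented frequencies of three genes that all vary along the same cline.
freqs = [
    [0.20, 0.80, 0.30],
    [0.36, 0.66, 0.41],
    [0.50, 0.56, 0.49],
    [0.64, 0.43, 0.61],
    [0.80, 0.30, 0.70],
]

scores, explained = first_pc(freqs)
print(f"PC1 explains {explained:.1%} of the variance")
print("PC1 scores:", [round(s, 2) for s in scores])
```

Because the three invented genes vary together, a single component absorbs almost all of the variance, and the populations' scores on it simply line up along the cline. Plotting such scores geographically is, roughly speaking, what turns many individual gene maps into a single PC map.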

Some of the principal component (PC) maps of Europe correlate with known or hypothetical migrations in the past, and a lot of them are repeated in the more popular Genes, Peoples, and Languages. For example, Cavalli-Sforza found that the first PC – which explains 28% of the total variance – peaked in the Near East and decreased towards north-western Europe. What better proof that the Neolithic expansion involved migrations, not just diffusion of ideas? Such results have now been confirmed with ancient DNA from the Neolithic settlers themselves. The third PC, which explains about 10% of the variance in Europe, peaks north of the Black Sea and decreases towards the west. This can be seen as proof that the spread of Bronze Age cultures such as the Beaker and Corded Ware, thought to originate from the Yamnaya of the Russian steppe, again involved a wave of migrants. DNA from ancient skeletons has once more confirmed those results. The tricky part is that both population waves could be associated with the spread of Indo-European languages, so it becomes difficult to decide between Renfrew’s Neolithic scenario and Anthony’s steppe scenario.

Well, I have an answer to that question (I will explain one day, but here is a clue: the Basques are the key). The important thing to keep in mind is that genes reveal whether or not migrations took place, and thus have immense potential to unravel historical linguistic questions. Let us now have a look at the Americas – they turn out to be quite homogeneous genetically (in comparison to Eurasia) – and therein lies the problem.

A forest of genes in South America

Phylogenetic (or neighbour-joining) trees are an intuitive way of showing how populations branched out based on their genetic distances. Because new analyses are conducted all the time and new results are constantly being published, we often don’t have a single tree, but a forest of possible phylogenies! For the Americas, they can vary a lot from the earliest to the latest research. Because the trees are usually colour-coded according to linguistic phyla (taken from Greenberg) in almost all publications, let’s do the same thing here and see how well my own classification fits the genetic data.
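Neighbour-joining (Saitou and Nei, 1987) is one common way of turning a matrix of genetic distances into such a tree. The sketch below is a plain-Python implementation under simplifying assumptions. The four "populations" and their distances are invented, and the distances are made perfectly additive so that the method recovers the tree exactly. Internal nodes receive hypothetical labels (node0, node1, …). Real genetic distances are noisy and never perfectly additive, which is one reason different marker sets can yield different trees.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Neighbour-joining on a symmetric distance table.

    names: taxon labels; dist: {(a, b): distance} for every unordered pair.
    Returns an unrooted tree as an adjacency dict {node: {neighbour: length}}.
    Internal nodes get hypothetical labels node0, node1, ...
    """
    d = {}
    for (a, b), v in dist.items():
        d[a, b] = d[b, a] = float(v)
    active = list(names)
    tree = {name: {} for name in active}

    def link(a, b, length):
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length

    counter = 0
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[i, k] for k in active if k != i) for i in active}
        # Q criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = f"node{counter}"
        counter += 1
        length_i = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        link(u, i, length_i)
        link(u, j, d[i, j] - length_i)
        # Distances from the new internal node to every remaining node.
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    link(a, b, d[a, b])
    return tree


def path_length(tree, start, goal):
    """Sum of branch lengths along the unique path between two nodes."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, acc = stack.pop()
        if node == goal:
            return acc
        for nb, length in tree[node].items():
            if nb != parent:
                stack.append((nb, node, acc + length))
    raise ValueError("nodes are not connected")


# Invented, perfectly additive distances between four hypothetical populations.
toy = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 5,
       ("B", "C"): 8, ("B", "D"): 6, ("C", "D"): 6}
tree = neighbour_joining(["A", "B", "C", "D"], toy)
for (a, b), v in toy.items():
    print(a, b, v, path_length(tree, a, b))
```

Each printed line shows the input distance next to the same distance measured along the reconstructed tree. With noisy, real-world distances the two would no longer agree exactly. Small changes in the matrix can then also change which pair minimises the Q criterion first, and with it the branching order.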

The first tree shown here was taken from The history and geography…, first published in 1994. It is based on 60 to 70 markers. Cavalli-Sforza and his colleagues present different trees depending on whether North or South America is included (I have used the second). One of the things we can immediately see is that there is no correlation between proposed language phyla and genetic proximity. In fact, there is no correlation with well-established, smaller families either. The Parakanã, a Tupi population of eastern Amazonia, is the first to split, instead of grouping with the other Tupi speakers. The Mapuche are almost in an isolated branch instead of appearing close to other Andean and Patagonian peoples. Even worse, the Quechua are shown as being closer to the Maya than to their Aymara neighbours! I believe the problem in this tree is the intrinsic homogeneity of South American populations, coupled with the small number of markers analysed. However, when the whole continent is considered (and groups from the same language family are averaged), the tree is insightful: 1. the Eskimo are closer to Siberian populations; 2. the Na-Dené from the Northwest Coast (but not those further south, like the Navajo and Apache) are a bit more distant, but in the same cluster; 3. all the other populations form a separate branch. Needless to say, this was seen as confirmation of the theory of three waves of migration – one of them giving rise to the Na-Dené languages, whose distinctiveness and purported links to Siberia are still given serious consideration.

Now let’s examine a more recent tree. This one was produced by Wang and colleagues in 2007 and takes into consideration over 600 markers. I am only showing the South American portion of the tree. Overall, the conclusions of this study support not three waves of migrants, but a single founding population: all Native American peoples grouped closer to each other than to the Siberians. As for linguistic correlates, it seems that a little more structure is emerging: Andean-Patagonian speakers are the first to branch off, with Quechua and Aymara groups being particularly close genetically. Strangely, some Central and North American populations appear in the following branch. Finally, we have a cluster for Chibcha speakers and another for my purported Chaco-Amazonian phylum. Arawak speakers are distributed across both, something I will comment on below. Two things are worth noting here: 1. as suggested by Wang’s study, the outlier position of Andean-Patagonian might indicate a Pacific coast colonisation route for South America; 2. Chibcha speakers are closer to Amazonian populations, which would invalidate my claim that Chibcha belongs to the western group (assuming that genetic distance equals linguistic distance).

And now for the most recent tree, published in 2012 by Reich and colleagues in Nature. This time, over 300,000 markers were used. This impressive study confirms some of Cavalli-Sforza’s early conclusions about the continent as a whole, but presents a different picture of South America. Eskimos appear closer to Siberian populations than to the rest of the Americas; Na-Dené speakers, however, are not clustered with them, but form a highly divergent branch on the American side of the tree. The authors support the idea of at least three gene flows from Siberia to the Americas, the last two giving rise to the Eskimo and Na-Dené groups. As for South America, we see a structure similar to that of the previous tree, but with significant changes: 1. all the Central American groups are now on a branch of their own, instead of being mixed with South America; 2. Andean-Patagonian speakers are closer to Chaco-Amazonian ones, with Chibcha splitting earlier than both. In any case, if this is reflected in the language phyla, Chibcha should perhaps belong to neither the western nor the eastern South American branch, but rather form a distinct branch connecting South and Central/North America. Arawak speakers in the latest tree appear in different branches, and Arawak is the only family that never shows a clear genetic correlate. I think this is extremely interesting, as it may confirm the suggestion that Arawak languages spread through trade rather than migration.

Why are the trees so different in this “genetic forest”? Well, maybe it’s difficult to arrive at a consensus because Amerind populations are genetically so close. For example, most people know that nearly all Native Americans, with the exception of some North American groups, have blood type O. This is because of a population bottleneck, as not many people crossed Beringia to populate the New World – and that happened relatively recently, so there hasn’t been much time for internal drift. The estimates for the founder population are usually not more than a few hundred people, and even if the initial separation of this population from the other Siberians might have occurred some 20-30 thousand years ago, the “bottleneck” for the colonisation of the Americas proper is estimated on genetic grounds at between 15 and 18 thousand years ago (with later migrations of the Eskimo and, possibly, the Na-Dené speakers). Moreover, Wang and colleagues find that a coastal route of colonisation of the Americas better explains certain patterns in the data, a conclusion reinforced by the fact that ice sheets were blocking the inland route in North America before 13,000 years ago. A quick advance of the first settlers along the Pacific agrees with the early dates of some archaeological sites like Monte Verde in Chile. Needless to say, it also explains the linguistic divide between the western and eastern parts of the continent.
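The point about drift can be made slightly more explicit with the textbook Wright-Fisher expectation. This formula is standard population genetics and is not taken from the studies discussed here. In an isolated population of constant effective size Ne, the expected fixation index after t generations of pure drift is F_t = 1 - (1 - 1/(2Ne))^t. The sketch below assumes a 25-year generation time and arbitrary values of Ne, purely for illustration. It ignores mutation, migration and selection, and is not a model of the actual colonisation of the Americas.

```python
def drift_divergence(ne, generations):
    """Expected fixation index F_t = 1 - (1 - 1/(2*Ne))**t for an isolated
    population of constant effective size Ne under pure drift
    (no mutation, migration or selection)."""
    return 1 - (1 - 1 / (2 * ne)) ** generations


# Hypothetical scenario: a 25-year generation time (an assumption) and a few
# arbitrary effective sizes, compared over 15,000 and 50,000 years.
GENERATION_YEARS = 25
for ne in (500, 2000, 10000):
    for years in (15000, 50000):
        t = years // GENERATION_YEARS
        print(f"Ne={ne:>5}, {years:>6} years: F = {drift_divergence(ne, t):.3f}")
```

For any given Ne, differentiation grows with elapsed time and shrinks as populations get larger. This is the sense in which a recent colonisation leaves "not much time for internal drift".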

Maps of the principal components 1, 2 and 3 in Cavalli-Sforza, Menozzi and Piazza’s analysis of 66 Native American genetic markers. These three PCs explain around 53% of the total genetic variability in the continent. Gradient is from red to blue in all cases, but the direction is not particularly important.

Let’s now have a quick look at the maps of the principal components in the Americas. The first PC has a very regular North-South gradient in North America. Cavalli-Sforza and colleagues interpret it as summarising the main differences between the Eskimo and (Canadian) Na-Dené, on the one hand, and all other Native Americans, on the other. In South America, there is little variation in this PC, but it does show a West-East gradient. Maybe that is related to the colonisation along the Pacific coast, coupled with a relative isolation between the highland and the lowland groups. The second PC shows several trends. In North America, it again summarises the difference between the northernmost groups (Eskimo and NW Na-Dené) and the remaining peoples. A little dot in the U.S. Southwest may relate to the local Na-Dené speakers, the Apache and Navajo. The peak in the American Northeast is interpreted by Cavalli-Sforza as relating to European admixture (which is documented by other means). In South America, I believe this PC shows essentially the same trend as the previous one, but in more detail: the Central Andes, together with the northern coast of South America, are at one extreme, whereas the region of the lower Amazon is at the other. In my view, this shows perfectly the colonisation of South America along the Pacific coast, another entry along the Atlantic, and the genetic isolation between Andean and Amazonian populations. Finally, the third PC is a bit more difficult to interpret. It forms a mainly West-East gradient. In North America, the Eskimo and Na-Dené are again well differentiated, including the outlying position of the Apache and Navajo (in the little orange dot). In South America, there is a peak in the Mapuche area of Chile, decreasing towards the Guianas. Cavalli-Sforza sees that as evidence of African admixture in the latter. Possibly this PC summarises the distinction between (southern) Andean and Patagonian peoples on one side, and the remaining groups on the other.

Amerind from a South American perspective: part III

It is time to examine some of the vocabulary that, I believe, provides good evidence for the deep relationship of all South American language families. I have amassed this dataset over the last months and, although I am convinced of its plausibility, I will not passionately defend the validity of these etymologies. The items that follow meet two important criteria: they are basic vocabulary (less prone to borrowing), and they are compared between proto-languages (not picked and chosen from hundreds of individual languages). Even so, the sound correspondences are not completely regular – even though I have paid as much attention to regularity as possible. It should be noted that, even in some well-established language families, violations of perfectly regular correspondences are not always a reason to discard cognates. I will not attempt reconstructions, but entries in the AED that are more or less equivalent to the etymologies below will be noted.

Abbreviations

And. = Andean; Ama. = Amazonian; PQ = Proto-Quechua; PAy = Proto-Aymara; Kun. = Kunza; Map. = Mapudungun; PTp = Proto-Tupi; PMJ = Proto-Macro-Jê; PK = Proto-Karib; PP = Proto-Pano; PAr = Proto-Arawak; PTk = Proto-Tukano; PG = Proto-Guaykuru; All. = Allentiac.

Body parts

1. EYE : And. PQ *ɲawi, PAy *najra; Ama. PMJ *nʌm, PK *ənu-ru, All. new. || AED 242 *(i)to(ʔ), 243 *tene ~ *tele. || Comments: PK also has *əne ‘to see’, which can derive from the same root. The reconstruction of PMJ would usually have *ⁿd-. Such initial clusters, which also appear in proto-Tupi reconstructions (e.g. *ⁿp-), are an artefact of the phonology of modern Amazonian languages. These languages have no phonetic distinction between voiced consonants and nasals, the second being the realisation before nasalised vowels. For instance, the word for ‘eye’ in many Northern Jê languages is tɔ < *dɔ < *ndɔ < *nɔ. Thus, there is no need to postulate an initial nasal + occlusive cluster to explain why nasal consonants in one language correspond to occlusives in another. I will adopt the procedure of reconstructing nasals, and never initial clusters, for PMJ and PTp, given the Andean cognates where nasals regularly appear. The Aymara form has a suffix -ra that is common to other body parts (ampara ‘hand’, laχra ‘tongue’, maqhura ‘testicles’). The similar suffix in PK is probably unrelated, as it appears in all types of nouns. PK also exhibits an extra initial vowel, a phenomenon that occurs in several other cognates.

2. HAND; TO TAKE : And. PQ *maki; Ama. PTp *po ( ~ *m-), PMJ *mo, PK *amo-rɨ, PP *mɨ-kɨr ~ *mɨ-βɨ, PAr *ama ‘to bring’, All. mameje- ~ mamjek- ‘to take; to bring’. || AED 370 *ma-n ~ *ma-k ~ *ma-r. || Comments: This was probably a CV root to which different suffixes were added, as the AED reconstruction correctly captures. This is evident in the case of PP, where two different roots for ‘hand’ can be reconstructed, one with the suffix *-kɨr (also appearing in Quechua?), the other with *-βɨ. As for the PTp and PMJ words, the same observation regarding initial nasals can be repeated from the previous etymology. PK again displays an extra initial vowel. The main problem with this root is that it is of a CV type starting with an unmarked consonant, so the probability of chance resemblance is quite high.

3. HAND, ARM; TO TAKE : And. PQ *apa– ‘to carry, bring’, PAy *ampara; Ama. PMJ *paC, PAr *po ‘to give’, All. lpuɨ. || Comments: Possibly this root can be conflated with the previous one, but I have chosen not to do so, given that this set consistently exhibits unvoiced occlusives, whereas the one above contained nasals in the same position. I am uncertain about the inclusion of Allentiac here instead of in the preceding etymology. The initial cluster lp- is uncommon but not unparalleled in that language, e.g. lka: ‘one’. The weakness of this root is the same as for the preceding one: CV words with unmarked consonants are the usual suspects for chance resemblance.

Nature

4. EGG : And. Kun. qoro, Map. kuram; Ama. PMJ *ŋrɛC. || Comments: A very characteristic trait of Macro-Jê is the presence of initial *Cr- clusters whose origin is not clear. Although this is the reconstruction for PMJ, many daughter languages (Maxakali, Rikbaktsa) realise this cluster as CVt- or CVr-. In some modern Jê languages like Xerente, we can observe transitions of the pattern *par > *para > pra ‘foot’. This is a clue that PMJ *Cr- might actually be derived from an earlier *CVr-. The Andean cognates in this set (see also 8. STONE) confirm that possibility, further suggesting (based on Kunza) that the contraction in PMJ might have occurred when the vowels in both syllables were identical.

5. LEAF, TREE, BUSH : And. PQ *satʂa ‘forest, tree, bush’; Ama. PMJ *ʃ-ɔj ‘leaf’. || Comments: In spite of being attested in only two families, and even though the semantic connection is not perfect, this etymology illustrates an important aspect of the comparison between Andean and Amazonian roots, in particular those of the ‘Neo-Amazonian’ core. As I mentioned previously, some PMJ and PTp nouns are distinguished by the presence of a ‘detachable’ initial *ʃ- and *tʲ-, respectively. This initial consonant is lenited or disappears when preceded by a possessive pronoun or when in a genitive construction with another noun. This set, and a few others below, show that this initial consonant regularly corresponds to *s- (and sometimes *ʃ-) in PQ. The situation in PAy is a bit more complicated, possibly involving a correspondence with *h- (also found in PP). Medial *-tʂ- in PQ regularly corresponds to *-j in PMJ (see also 11. WATER below).

6. LOUSE, NIT : And. PAy *k’utʂi, Kun. qeʧir ~ qiʧe; Ama. PMJ *ŋot, PG *aq’ete. || AED 294 *k’^wit’ || Comments: The entry in the AED includes only Quechua, but the form cited is in fact a loan from Aymara. Otherwise, it cites interesting parallels in North America. The Kunza word is very close to Aymara, except for the initial q-, so borrowing is unlikely. The Andean *k’- / q- has an interesting parallel in PG *-q’-, but otherwise I am not certain about including the latter. PMJ *ŋ- also corresponds to Kun. q- in 4. EGG, so we can postulate that an initial velar/uvular nasal was lost in the Andean branch but preserved in the Amazonian one. The final *-t in PMJ is a problem, since we would expect *-j.

7. ROOT : And. PQ *sapi; Ama. PTp *tʲ-apo. || AED 600 *tap || Comments: Another case of little attestation, but with good phonological correspondence. PP *tapon cannot be related to this set, as initial *h- (and possibly medial *-β-) would be expected for reasons given below. It is probably an ancient Tupi loan. Otherwise, PQ *s- is in good correspondence with the ‘detachable’ PTp *tʲ- / PMJ *ʃ- (as in 5. LEAF).

8. STONE : And. PAy *qala, Map. kura; Ama. PMJ *kraC, PTk *k’ɯ̃ta. || AED 371 *k^wele ?, 719 *k’at^la, 722 *got. || Comments: The AED etymology 719 is not too different from this one, except that the Tukano cognates are relegated to a separate entry (722). That might be right in the end, as I am not too confident about the correspondence between PTk *-t- and the remaining forms. One difference from the AED is the inclusion, in the latter, of Kawesqar kiella ~ čella, a word that I could not find in the wordlists available to me (at least not with the meaning of ‘stone’). Kawesqar does, however, have kjesáu and qalqajésqe meaning ‘stone’, and these might be related to the proposed etymology after all. In any case, I am not using that particular family in my comparisons. Another unfortunate difficulty is explaining why Map. would have -r- where Aymara has -l-, since both phonemes are available to the former (unlike the Amazonian languages, which normally only have r). I thought of a solution when I noticed that PAy *-l- corresponds to PQ *-r- in loanwords, but soon realised that in these cases the direction of borrowing was from Aymara to Quechua (apparently PQ didn’t even have *l). Apart from that, this etymology confirms the rule of 4. EGG about the formation of PMJ *Cr-.

9. WATER; TO DRINK : And. PQ *jaku; Ama. PK *woku-ru ‘to drink’, PP *waka, PTk *okko. || AED 852 *aq’^wa / *uq’^wa. || Comments: As in the case of AED 370 *ma-n ~ *ma-k ~ *ma-r ‘hand, give, measure’, this root is supposedly found in Eurasia and could go back to ‘Proto-World’. Just think of Italian mano and acqua! Although such interesting ‘cognates’ are probably nothing more than coincidences, the potential cognates in this entry, all occurring in South America, should be taken seriously. Greenberg and Ruhlen include in this root some Macro-Jê words for ‘water’ and ‘to drink’, though I think they belong to a different etymology altogether (see below). The PQ *j- is difficult to explain, but a similar alternation in the Andean languages could explain PQ *ɲawi vs PAy *najra.

10. WATER, RAIN, RIVER : And. PQ *maju ‘river’, PAy *uma- ‘water; to drink’, Map. mawən ‘rain’; Ama. PTp *(ã)mãn ‘rain’. || AED 853 *man. || Comments: Not much to comment here, except that the AED gives some extremely interesting extra-South American parallels, and many South American forms that I haven’t considered here (e.g. Yanomami ma: ‘rain’).

11. WATER, RIVER, LAKE : And. PQ *qutʂa ‘lake’, Map. ko; Ama. PTp *k’ɨ, PMJ *koj ‘river’, All. kaha. || AED 857 *k^wati. || Comments: Ideally there should be more examples of PQ *q- corresponding to PTp *k’-, but the correspondence of PAy *q- with PTk *k’- in 8. STONE means that this is not implausible. In any case, the replacement of q by k or a variant is expected both in the core Amazonian languages and in Huarpe, which lack q (as, in fact, does Mapudungun). The correspondence of PQ *-tʂ- with PMJ *-j also occurs in 5. LEAF.

Verbs

12. TO COME, TO ARRIVE : And. Kun. t’e-; Ama. PMJ *tε̃C, PK *(w-)ətepɨ, PG *t’ek. || AED 325 *tem || Comments: As I noted for some of the first etymologies, roots of the CV type are not usually good evidence of genetic relationship, given the high likelihood of chance resemblance. In this case, the final consonant of PMJ was most likely *-m, as suggested by Proto-Jê *tẽm and Rikbaktsa tama. This is in good agreement with the PK form (which also includes the usual extra vowel at the beginning), but not with PG or Kunza. The latter lacks a final consonant altogether. Apart from that, the use of ejectives in both Kunza and PG is an interesting parallel.

13. TO GO, TO COME, TO WALK : And. PQ *-mu- ‘translocative verbal suffix’, PAy *maja-, Map. amu- ~ miaw-; Ama. PMJ *mɔ̃ŋ, PK *(w-)əməkɨ, All. majek-. || AED 324 *min ~ *man. || Comments: This should really be considered a CVC root. The final consonant was most likely **-ŋ, lost or lenited in Quechua, Aymara, Mapudungun and Allentiac. It is common for final nasals in PMJ to correspond to unvoiced plosives in PK, and for PK roots to end in vowels, hence *-kɨ (see the previous root for the same changes). The PQ root is not a verb per se, but a very productive suffix that changes the direction of movement, e.g. apay ‘to carry, take’, apamuy ‘to bring’. It is present in *ʃamu- ‘to come’. For the correspondence with PAy *-j-, see also 1. EYE.

Other words

14. NAME : And. PQ *ʃuti ( ~ *s-); Ama. PTp *tʲ-et, PMJ *ʃ-(ij)it, PK *ətetɨ, PP *harɨ, All. hene. || Comments: This is my favourite etymology. It is a cultural term, it has a wide distribution, and it illustrates a series of regular sound correspondences. The ‘detachable’ initial consonants of PTp and PMJ are by now well known. They correspond to PQ *s-, which is one possible reconstruction in this case, the other being *ʃ-. This is one of those tricky words with nearly identical reconstructions in PQ and PAy for which we do not know the direction of borrowing. This would be crucial to know, because Aymara has only *ʃuti. Anyway, in all such cases, I only show the PQ reconstruction. In PK, *(V)t- is the usual correspondence (PK likes to add vowels to the beginning and end of words). In PP, *h- is the regular equivalent in this set, but I do not have enough examples of a medial *-t- appearing as *-r-, so I will give it the benefit of the doubt.

15. NO, NEGATION : And. PQ *ama; Ama. PTp *ãm, PP *(ja)ma. || Comments: Surprisingly absent from the AED.

16. PATH, ROAD; TO WALK : And. PQ *puri- ‘to walk’; Ama. PTp *(a)pe, PP *βaʔi, PTk *ma, PAr *apu. || AED 595 *p’en. || Comments: Another unfortunate CV root. The *-ri- in the PQ verb is probably a crystallised inchoative suffix. This root has an interesting Maya parallel, as noted in the AED.

17. TWO : And. PAy *paja, Kun. p’oja, Map. epu; Ama. PAr *api. || AED 821 *(ne-)pale, 825 *pit. || Comments: There are 12 reconstructions for ‘two’ in the AED, which suggests that the speakers of the proto-Amerind language were obsessed with arithmetic. Possibly PP *raβɨt fits here, but we would have to explain the initial *r-.
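Several entries above (2, 3, 12 and 16) are flagged as CV roots with unmarked consonants. A back-of-the-envelope way to see why this matters is to compute the probability that two unrelated languages produce the same CV root purely by chance. That probability is the sum, over segments, of the product of their frequencies in each language. The inventories and frequencies below are hypothetical. Treating consonant and vowel choices as independent is also a simplification.

```python
def cv_match_probability(cons_a, vows_a, cons_b=None, vows_b=None):
    """Probability that two independently drawn CV roots are identical.

    Each argument maps a segment to its relative frequency (summing to 1).
    If the second language's tables are omitted, both languages are assumed
    to share the same distributions. Consonant and vowel choices are treated
    as independent, which is a simplification.
    """
    cons_b = cons_a if cons_b is None else cons_b
    vows_b = vows_a if vows_b is None else vows_b
    p_cons = sum(p * cons_b.get(c, 0.0) for c, p in cons_a.items())
    p_vows = sum(p * vows_b.get(v, 0.0) for v, p in vows_a.items())
    return p_cons * p_vows


def chance_hits(p, meanings):
    """Expected number of chance matches, and P(at least one), over a wordlist."""
    return p * meanings, 1 - (1 - p) ** meanings


# Hypothetical inventories: 15 consonants and 5 vowels.
vowels = {v: 1 / 5 for v in "aeiou"}
uniform = {c: 1 / 15 for c in "ptkmnsrlwjhbdgq"}

# Same inventory, but four unmarked consonants carry 80% of the frequency mass.
frequent = "mpkt"
rest = [c for c in uniform if c not in frequent]
skewed = {c: 0.2 for c in frequent}
skewed.update({c: 0.2 / len(rest) for c in rest})

p_uniform = cv_match_probability(uniform, vowels)
p_skewed = cv_match_probability(skewed, vowels)
for label, p in (("uniform", p_uniform), ("skewed", p_skewed)):
    expected, at_least_one = chance_hits(p, meanings=100)
    print(f"{label}: P(match)={p:.4f}, expected chance matches in 100 meanings="
          f"{expected:.2f}, P(at least one)={at_least_one:.2f}")
```

Concentrating the frequency mass on a few common consonants more than doubles the chance of an accidental match. Comparing a long list of meanings makes at least one spurious "cognate" likely either way. Allowing loose semantic matches multiplies the comparisons and makes this worse.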

References

Corrêa-da-Silva, B. 2010. Mawé/Awetí/Tupí-Guaraní: relações linguísticas e implicações históricas.
Emlen, N. (forthcoming). Perspectives on the Quechua-Aymara contact relationship and the lexicon and phonology of Pre-Proto-Aymara.
Gildea, S.; Payne, D. 2007. Is Greenberg’s “Macro-Carib” viable?
Nikulin, A. 2015. On the genetic unity of Jê-Tupí-Karib.
Oliveira, S. 2014. Contribuições para a reconstrução do Protopáno.
Payne, D. 1991. A classification of Maipuran (Arawakan) languages based on shared lexical retentions. In: Derbyshire, D.; Pullum, G. (Eds.). Handbook of Amazonian languages, v. 3. Berlin: Mouton de Gruyter, p. 355-499.

Appendix: establishing sound correspondences

In establishing regular sound correspondences between South American proto-languages, the Neo-Amazonian core (Tupi, Macro-Jê, Karib) is extremely useful, due to certain initial consonants that have been shown to reveal perfect cognates. PTp words starting with *tʲ- regularly correspond to PMJ *ʃ- and PK *(V)t-. In the first two, these are the famous detachable consonants: they disappear or change to *j- (*ɲ- before nasals) in genitive constructions. A good comparison of basic vocabulary between the three proto-languages, where this rule of correspondence is confirmed, can be found here. If one can find regular correspondences to this set in other (proto-)languages, we would be one step closer to establishing more cognates and proving the deep genetic relationships of most South American languages. It so happens that there are regular correspondences in PQ and PP. Consider the following examples:

PQ = Proto-Quechua; PTp = Proto-Tupi; PMJ = Proto-Macro-Jê; PK = Proto-Karib; PP = Proto-Pano.

It is clear that in Proto-Quechua the Neo-Amazonian trio *tʲ- : *ʃ- : *(V)t- corresponds to *s-. There are certainly more examples, with less strict semantic correspondence, than the ones I have selected. In Proto-Pano, there are even more cognates, with a perfect correspondence to *h-. This allows us to identify previously undetected loans, such as the word for ‘root’ *tapon in PP. I had once thought that this was a nice-looking cognate of PTp *tʲ-apo and PQ *sapi, but now I think it was probably borrowed from Tupi, given that everywhere else PTp *tʲ- corresponds to PP *h-. As for Aymara, there are certain difficulties, as the number of cognates I could establish is not large enough to prove (or disprove) regularity. My current hypothesis is that the most likely correspondence is PAy *h-, showing a development similar to that in PP. The main reasons for this are the following potential cognates:


Now, PMJ has an interesting feature: many body parts include a prefix for ‘flesh’ *ʃ-ĩt. This appears, for instance, in *ʃ-ĩ-krãj ‘knee’ and *ʃ-ĩ-pV ‘ear’. This may be the case in the example for ‘nose’, shown above. The word for ‘flesh’ proper has a perfect cognate in PTp *tʲ-ẽt. Although ‘flesh’ itself has no likely cognates in PQ, the word for ‘nose’ might preserve a prefix similar to that in PMJ. I had once considered that the potential PQ correspondence to the Neo-Amazonian detachable consonants was *r-, as in the word *rinri ‘ear’, precisely because of the ‘flesh’ prefix, and because I saw that particular word as cognate to PAy *hinʧu. However, all the evidence shown above points to PQ *s- as a better candidate. Therefore, I currently see PQ *sinqa ‘nose’ as somehow related to the Amazonian ‘meat’ and ‘nose’ words, and I assume the first part of the word to be cognate to the first half of PAy *hinʧu ‘ear’. An even better potential cognate to the Amazonian words is PAy *hanʧi ‘meat, flesh, skin’, although, admittedly, the two proposals are not mutually exclusive.
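The correspondence rule stated in this appendix (PTp *tʲ- : PMJ *ʃ- : PK *(V)t- : PQ *s- : PP *h-) can be checked mechanically against each cognate set. This is also how a form such as PP *tapon stands out as irregular. The sketch below encodes the rule as regular expressions. The forms are the reconstructions quoted in the text, with asterisks dropped and the *s- variant chosen for PQ ‘name’. The function name `check_set` is hypothetical. It only checks initial consonants, so passing it is necessary but far from sufficient for cognacy.

```python
import re

# Initial-consonant correspondence for the 'detachable' series, as stated in
# the text: PTp *tʲ- : PMJ *ʃ- : PK *(V)t- : PQ *s- : PP *h-.
EXPECTED_INITIAL = {
    "PTp": r"tʲ",
    "PMJ": r"ʃ",
    "PK": r"[aeiouəɨ]?t",  # optional prothetic vowel, then t
    "PQ": r"s",
    "PP": r"h",
}


def check_set(cognates, rules=EXPECTED_INITIAL):
    """Return (sorted) the languages whose form does NOT begin as the rule
    predicts. Languages without a rule are ignored."""
    return sorted(lang for lang, form in cognates.items()
                  if lang in rules and not re.match(rules[lang], form))


# Cognate sets quoted in the text (asterisks dropped).
sets = {
    "NAME": {"PQ": "suti", "PTp": "tʲ-et", "PMJ": "ʃ-it", "PK": "ətetɨ", "PP": "harɨ"},
    "ROOT": {"PQ": "sapi", "PTp": "tʲ-apo", "PP": "tapon"},
    "LEAF": {"PQ": "satʂa", "PMJ": "ʃ-ɔj"},
}

for gloss, forms in sets.items():
    print(gloss, check_set(forms) or "regular")
```

Running it reports ‘name’ and ‘leaf’ as regular. It flags only PP in ‘root’, in line with the conclusion that *tapon is probably a Tupi loan.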

Amerind from a South American perspective: Part II

Continuing on the subject of South American languages, possible long-range relationships between them, and whether such relationships can enlighten us about the viability of the Amerind hypothesis, I would like to propose a tentative ‘macro’-classification. This will be quite different both from that of the splitters (who don’t propose macro-families at all) and from that of the lumpers (by which I mean Greenberg’s classification, so often repeated in the scientific literature).

Readers may have seen the map on the right in a previous post. Greenberg’s classification of the South American languages divided them into broad phyla such as “Andean-Chibchan-Paezan” and “Ge-Pano-Carib”. Ideally, such divisions should be based on shared innovations and vocabulary, but that is not always the case (in fact, it seems that Greenberg’s classifications were ready in his mind before he started looking at the data). Now, I believe a much more accurate, “data-driven” scheme can be devised if the following criteria are taken into account: 1. similarity in morphology; 2. personal pronouns; 3. shared retentions or innovations in basic vocabulary. Let us examine each in turn.

Prefixing vs Suffixing

First of all, the similarities in morphology. I have written a whole post about the differences between “Andean” and “Amazonian” languages, stressing, among other things, how the former tend to be suffixing whereas the latter typically employ more prefixes. I will not repeat the evidence here, but surely one could argue that this only reflects areal features. However, the argument could also be turned the other way round, i.e. grammar is hardly ever borrowed. Otherwise, how can we explain anomalies such as Ket and its complex verb chain in the middle of a sea of typically Eurasian languages (Indo-European, Uralic, Tungusic) with which it has probably been in contact for a long time? A language’s morphology is not immutable (remember the example of Egyptian?), but it might tell a lot about its historical relationships. After all, was it not at the very origin of the recognition of the Indo-European and Afro-Asiatic families?

N : M vs I : A

Shared personal pronouns are usually considered strong evidence that languages are related. At the same time, they can deceive, since all languages tend to use unmarked phonemes for pronouns (e.g. n, m, k, t…) and, therefore, similarities may arise by coincidence. Nevertheless, as I have been repeating for a while, the question then is: shouldn’t the distribution be random? That is, why would some languages choose a set of unmarked consonants (m : t in Eurasia) while others choose a distinct set (n : m in the Americas)? That is when genetic explanations are more likely. The mention of the controversial n : m series was not fortuitous, as it is quite productive in the classification of South American languages: most families exhibit this ‘pan-American’ pair, but not all! Consider the following example:


A quick look at the evidence above, even without prior knowledge of the geographical distribution of the languages, would lead to their classification in two, or maybe three, groups. Most of them use the pan-American n : m series for the 1st and 2nd persons. However, the Andean languages – Quechua, Aymara, Mapudungun – add a suffix to the 1st person (*-qa, *-ja, -ʧe) and a prefix to the 2nd person (*qa-, *xu-, ej-). This, of course, occurs only in the independent form of the pronouns; in Mapudungun, the 1st and 2nd person verbal suffixes are -n and -mi. The other languages highlighted in blue are Amazonian (or intermediates between highlands and lowlands, in the case of Chibcha). They use the n : m series in a monosyllabic or prefixed form. Pano and Tukano have particularly similar forms. In contrast, other lowland languages appear to have replaced the pan-American pronouns by a vowel pair. This pair, which is i : a in the case of Macro-Jê and Guaykuru, also appears in the Maya languages.

Does that pattern have any implications for classification, or is it just a coincidence? As I have mentioned before, the Macro-Jê and Tupi languages have long been thought to form a macro-family, together with Karib. It could be suggested that the Guaykuru languages are part of the same group, though a bit removed (this is confirmed through shared vocabulary, as I will show below). Look at a map and you will see that these languages occupy a huge part of eastern South America. They appear to have originated in SW Amazonia. The other Amazonian languages, the ones that preserved the n : m pair, have a more restricted distribution in NW Amazonia (except for Arawak). Thus, we can start to envisage a distinction between the Andean/Patagonian phylum and two separate Amazonian phyla, one of which is characterised by widespread geographical distribution and the i : a innovation in the personal pronouns.
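The question raised above, whether the distribution of n : m should not be random, can be framed as a simple null model. Assume, purely for illustration, that each family independently picks two distinct consonants for the 1st and 2nd persons from a small set of unmarked options (here four: n, m, k, t). Assume also a hypothetical set of 12 families. The binomial tail then gives the chance of seeing the same ordered pair in at least a given number of them. Independence is precisely what inheritance and contact violate, so the model only shows how unlikely a widespread shared pair would be without either.

```python
from fractions import Fraction
from math import comb


def pair_probability(k):
    """Chance that one family picks a specific ordered pair (e.g. n for 1st
    person, m for 2nd) when drawing two distinct consonants uniformly from k."""
    return Fraction(1, k * (k - 1))


def at_least(successes, trials, p):
    """Exact binomial tail P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, x) * p**x * (1 - p) ** (trials - x)
               for x in range(successes, trials + 1))


p = pair_probability(4)  # hypothetical choice among n, m, k, t
FAMILIES = 12            # hypothetical number of independent families
print("chance per family:", p)
for shared in (2, 4, 8):
    tail = at_least(shared, FAMILIES, p)
    print(f"P(at least {shared} of {FAMILIES} families share it) = {float(tail):.2e}")
```

Exact fractions are used so that the very small tail probabilities are not distorted by rounding before they are printed.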

LAQ’^w vs NENE

Shared vocabulary is the last type of evidence, and the most prone to borrowing. That is why I insist it must be basic vocabulary with close semantic correspondence. This is a major problem with some of the etymologies in the AED, as in the case of the famous *t’ina ~ *t’ana ~ *t’una, where any relative (or, in fact, any human being!) can be compared to any other. This problem exists even in works like Starostin’s Altaic etymological dictionary. By contrast, one of the reasons why I think the Dené-Yeniseian hypothesis is plausible is that items of basic vocabulary, with close meanings and regular correspondences, connect the two families.

Shared innovations in basic vocabulary can help distinguish subgroups. Let us take as an example the Romance languages. Even though they are all derived from Vulgar Latin, we can show that some are closer relatives than others, and that is reflected in vocabulary: for “leg”, Portuguese and Spanish have perna and pierna respectively, whereas French and Italian have jambe and gamba. For “head”, we have cabeça and cabeza against tête and testa. For “morning”, manhã and mañana against matin and mattina, and so on. Needless to say, Portuguese and Spanish are closer to each other than either is to French or Italian.
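The logic of the Romance example can be sketched as a simple count of shared cognate classes. The words are the ones cited above. The class labels (A, B) are arbitrary tags meaning "descends from the same source word". The scoring is a hypothetical illustration rather than a real subgrouping method, since on its own it does not distinguish shared innovations from shared retentions.

```python
from itertools import combinations

# Words cited in the text, each tagged with an arbitrary cognate-class label:
# words sharing a label descend from the same source word.
lexicon = {
    "leg":     {"Portuguese": ("perna", "A"), "Spanish": ("pierna", "A"),
                "French": ("jambe", "B"), "Italian": ("gamba", "B")},
    "head":    {"Portuguese": ("cabeça", "A"), "Spanish": ("cabeza", "A"),
                "French": ("tête", "B"), "Italian": ("testa", "B")},
    "morning": {"Portuguese": ("manhã", "A"), "Spanish": ("mañana", "A"),
                "French": ("matin", "B"), "Italian": ("mattina", "B")},
}
languages = ["Portuguese", "Spanish", "French", "Italian"]


def shared_classes(lang_a, lang_b):
    """Number of meanings for which two languages use the same cognate class."""
    return sum(forms[lang_a][1] == forms[lang_b][1] for forms in lexicon.values())


scores = {pair: shared_classes(*pair) for pair in combinations(languages, 2)}
for (a, b), n in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{a}-{b}: {n}/{len(lexicon)}")
```

Portuguese and Spanish agree in all three meanings, as do French and Italian, while every cross pair scores zero. This reproduces the grouping described above.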

After scrutinising wordlists of South American languages (admittedly not the best way to do historical linguistics), I believe I have identified six very stable words that are particularly useful in distinguishing subgroups. Half of them are body parts; the others are a simple noun, a verb and a numeral: ashes, foot, to sleep, tongue, tooth, two.

[Table: six basic vocabulary items compared across thirteen South American (proto-)languages]

The table above shows these six words in thirteen languages (or, rather, proto-languages in most cases). I have highlighted, in red and blue, contrasting pairs of words that could be derived from common roots, given their phonological similarity. Gaps occur either because the words are not reconstructible or because I do not have a complete wordlist at my disposal. Some of the “cognates” are only tentative. The possible relationship of PQ *qaʎu and Kaw. qala- to forms such as Kun. lassi has been explained previously. PQ also has *ʎaqwa- ‘to lick’, so metathesis cannot be ruled out. The PTp word for ‘tongue’, despite the resemblance, cannot be related to that set, as I will show in the next post. What is clear from the comparison above is that some languages appear all in red, others all in blue, and some fall in between. This is exemplified by the title of this section, which refers to two words for ‘tongue’ (or ‘to lick’) found in the AED, the first of which appears in the Andes, the second in the lowlands. At one extreme, in red, we have the core Andean/Patagonian languages. At the other, in blue, we have the core Amazonian languages – Tupi, Macro-Jê and Karib – confirming the evidence from personal pronouns. Some Amazonian languages fall in between: Arawak and Pano. They are also the ones that preserved the pan-American n : m pronouns, also present in the Andean languages. I don’t know enough about the Tukano languages to risk a classification with a number of words as limited as in the table above, but other vocabulary evidence (plus the pronouns) would place them not far from Pano and Arawak. The usefulness of such an approach, as opposed to mere geographical convenience, is that some surprises may emerge: Allentiac, a Huarpean language of NW Argentina, is completely Amazonian in its basic vocabulary.

[Map: proposed macro-relationships between the South American language families]

I end this post with a map that reflects my current view of the macro-relationships between the South American families. The primordial division, about which I’ve written before, is that between “Andean” and “Amazonian” languages. To put it in more neutral terms (since there are “Amazonian”-type languages well outside of that basin), we can call them “Highland” and “Lowland” or, even better, “Western” and “Eastern” South American languages – the option adopted in the map. There is a core group of “Andean-Patagonian” languages that includes Quechua, Aymara, Mapudungun, Kaweskar and Chon. Possibly Chimu (Mochica) falls in that group, but I do not have enough material at my disposal to ascertain that. Chibcha is probably part of the Western division, though it displays some Amazonian features. I cannot specify the position of isolates like Kunza or small families like Puelche within the group, but they are definitely Western.

Among the Eastern languages, the situation is complex. There is a core group of three families – Macro-Jê, Tupi and Karib – whose deep relationship is given as certain. I like to call them the “Neo-Amazonian” languages, given their relatively recent spread from an Amazonian homeland over enormous areas of the South American continent. Pronominal and vocabulary evidence show that Guaykuru (and almost certainly Mataco, though I do not have the data with me) is not far removed from that group. In the map, I show all of them under the label of “Chaco-Amazonian”. They contrast with languages that did not spread as much, remaining in their original homelands since early times. On the one hand, we have the Amazonian families Pano and Tukano, whose deep relationship I take as a serious hypothesis. We can call them “Palaeo-Amazonian” families. On the other, we have those that seem to occupy a similar historical position in the Chaco (or on its borders), such as Nambikwara and Zamuco. I propose to call them “Palaeo-Chacoan”. Many small families and isolates (Huarpe, Witoto etc.) are Eastern languages, but their relationship with the others is less certain. Finally, Arawak occupies a strange position. I used to think it had to be related to the ‘core’ Amazonian languages, but after staring at the pronominal, vocabulary and morphological evidence for a long time, I am sure it is far removed even within the Eastern group. Its likely origin in western Amazonia, close to the Palaeo-Amazonian languages, would support the view that it is indeed an ancient split.

Amerind from a South American perspective: Part I

Expanding on the previous topics, I would like to dedicate a series of (possibly three) posts to the problem of classifying South American languages into broader groups. This is partly due to my own recent efforts to compare well-established proto-languages (and a few isolates) on that continent, but it will also illustrate some of the fundamental challenges of the Amerind hypothesis as a whole.

First of all, a personal note: it has been said that Amerind – pretty much like Nostratic or Dené-Caucasian – is a matter of faith rather than science. Although I do not entirely agree with that, I thought I should clearly state my credo: I believe that all languages of South America are related, and, on a larger scale, that most languages of the Americas are. But what does that mean exactly? Because you cannot disprove a relationship between two languages, the burden of proof is on whoever proposes it. Unless you believe in polygenesis (which is not my case), all languages ultimately derive from the same ancestor, and saying that two of them are ‘related’ only means that they are more closely related to each other than to any others. Most importantly, if you want to do serious research instead of delving into unsystematic speculation, the proposed relationship must be demonstrable through valid linguistic methods – regular sound correspondences, shared basic vocabulary and grammar etc. Even if all languages derive from a common source, not all can be demonstrably related, because the enormous time-scale has obliterated the most distant relationships (‘proto-world / proto-sapiens’ claims notwithstanding!).

Given those considerations, here is how I should properly state my position: 1. all South American language families are more closely related to each other than to, say, North American ones; and 2. these relationships are demonstrable because 3. the time elapsed since their divergence was not long enough to obliterate them.

The first point is the most important but, unfortunately, the weakest of them until a better understanding of the peopling of South America is achieved. I have previously mentioned the archaeological confusion that reigns over the Late Pleistocene-Early Holocene of the continent, the evidence of 14,000–18,000-year-old occupations, and the genetic clues of an early, non-Mongoloid migration. Whether or not such early migrations were dead ends (leaving no linguistic imprint in South America), the fact is that there does not appear to be any extraneous influence on the continent after 12,000 years ago, and at least macro-families dating to that time should be recognisable.

Etymologies versus lookalikes

In this first post, I will only illustrate the importance of using valid methods in long-range comparison by examining some entries in the Amerind Etymological Dictionary (AED). I will use as examples the Tupi and Macro-Jê families, with which I have some familiarity. There are 253 Macro-Jê and 114 Tupi etymologies in the dictionary. I will focus on those entries of (relatively) basic vocabulary with reflexes in both families. Because there are also a lot of those, I have restricted this post to the first ten entries with those characteristics. Fortunately, they are very representative of the whole: very few of them are sound, and the others help to illustrate the problems that permeate the AED, namely: 1) arbitrary segmentation; 2) inclusion of words that are not widespread within a family (and whose antiquity is thus questionable); and 3) splitting a well-established cognate set of a language family into multiple ‘Amerind’ etymologies.

Before we start, one thing must be very clear. The AED is a bona fide work that demanded much time and energy from its authors, bearing witness to an erudition and panoptic perspective that I do not claim to possess. Some of the entries may one day be proven to be valid etymologies. Others may reveal interesting long-range loanwords and thus shed light on the prehistory of the continent. Overall, however, the AED is similar to other ‘etymological’ dictionaries whose value is questioned by specialists, such as the Altaic or North Caucasian dictionaries of Starostin and others. The mistakes found there are as embarrassing as the ones in the AED, as reviews have pointed out. The authors of such dictionaries are undoubtedly competent, intelligent linguists whose work demanded much research and time. However, contrary to their claims, the fact that such works have been written does not prove the validity of the proposed macro-families. It only proves one thing: that, with enough research and time, long etymological dictionaries can be written connecting any two language families of the world.

Now, for the etymologies.

[Note: PTG = Proto-Tupi-Guarani; PJ = Proto-Jê; PT = Proto-Tupi; PMJ = Proto-Macro-Jê. The spellings in the excerpts are exactly as given in the AED. When I cite specific proto-words, I use the reconstructions of Correa da Silva 2010 (PT) and Nikulin 2015 (PMJ). I have occasionally modified them where I disagree.]

3. ABOVE₃ Equatorial: Tupi: Chiripa rakã ‘head’. Macro-Ge: Caraja: Javaje rahah ‘head’. COMMENTS: Chiripa is a language belonging to the southernmost division of Tupi-Guarani, only one of the branches of the Tupi family, but is very representative of that branch. The PTG form was *ʔa-kaŋ; the form cited in the AED includes a “detachable r-” about which I wrote briefly in a previous post. The PTG word, in turn, derives from a combination of two PT roots: *ʔa ‘head’ and *kãŋ ‘bone’. As for Macro-Jê, the Karajá word ra (the longer Javaé form must have been chosen in order to better resemble the Tupi one) has a well-known etymology within the family, being cognate with e.g. PJ *krã and Krenak krεn, all of which go back to PMJ *krãj. PT *ʔa ‘head’ and PMJ *krãj ‘head’ do not appear to be cognates, even if some of the daughter languages eventually developed forms that look alike by pure chance. As is not unusual in the AED, some reflexes of PMJ *krãj ‘head’ appear in a different etymology altogether (4. ABOVE₄).

25. ARM₁ Equatorial: Proto-Tupi *po ‘hand’, Tupi po, Guarani po, Guayaqui i-pa, Kaapor n-po, Cocama puwa, Kepkeriwat ba. Macro-Ge: Chiquito i-pa, Erikbaktsa -čipa, Proto-Ge *pa, Guato (ma-)po, Kaingang: Apucarana pe, Tibagi pen, Opaie (či-)pe. COMMENTS: This is, in essence, a valid etymology. PT actually has two possible reconstructions, *po and *mo. The entry in the AED conflates two distinct PMJ roots, *mo ‘hand’ and *paC ‘arm’, only the first of which is cognate with the PT root. The first also has clear Panoan cognates that have been ignored in the dictionary’s entry (they have been included in a different etymology, 26. ARM₂). The second has some obvious Andean cognates that have also been missed. I will elaborate on this cognate set in the third post of this series, so I will refrain from further comments now, except for noting that Guató (even if this particular word is probably cognate) is an isolate, as there is no compelling evidence for its classification as Macro-Jê.

57. BELLY₂ Equatorial: Tupi: Shipaya parua ‘belly’, Arikem pera ‘navel’, Uainuma punua ‘navel’, purua ‘navel’, etc. Macro-Ge: Bororo: Umotina upuru ‘thorax’, Fulnio epatio ‘upper abdomen’, Ge: Apinage pitãn ‘body’, Crengez patu ‘belly, chest’, Kaingang: Serra do Chagu (idfe-)paro ‘chest’, Puri: Coroado puara ‘chest’. COMMENTS: the Tupi part does not seem to be intrinsically wrong, though I have not seen a reconstruction for such a cognate set in the recent literature. As for Macro-Jê, there are a few problems. The words cited for the Northern Jê languages are definitely not cognates, since Krenyê patu goes back to PJ *tu(m) ‘belly’ with the 1st person plural possessive pa-. Funnily enough, the same word, without the prefix, appears in a different etymology (59. BELLY₄). For the Kaingang word, the AED cites a variant from a rather obscure dialect, exposing a methodological flaw that, unfortunately, is typical of the work as a whole: the word meaning ‘chest’ is exactly the one that was put in parentheses as if it were a sort of unnecessary prefix! idfe- is really ĩɲ ɸe ‘my chest’ (I could not find a meaning for paro). Kaingang ɸ traces back to PJ *s-, so this word is not cognate with the other Jê words. This leaves us with the Umutina and Coroado forms (I doubt Fulniô epatio is related). If these two go back to some PMJ root, and the same is true for the Tupi words in this entry, then we might have a valid cognate – the many mistakes in the AED notwithstanding.

73. BLACK₄ Equatorial: Tupi: Manitsawa diadia. Macro-Ge: Caraja uitira ‘green, blue’, Fulnio čičia ‘black’, Ge: Krenje teted ‘green’, Crengez ntetete ‘green’, Kaingang: Dalbergia čɨ ‘dark brown’, Kamakan: Kamakan hittu ‘green’, Cotoxo itiɬ ‘green’. COMMENTS: Manitsawa is a language of the Juruna branch, which also includes e.g. Xipaya tinikĩ ‘black’. Other Tupi languages like Wayoro have forms such as tiktik ‘black’ that appear to be a better fit for this etymology. In any case, none of these words can be traced back to PT as they are not widespread in the family, being restricted to particular branches. As for Macro-Jê, Krenyê is strangely cited twice (with two different spellings for the language name). It is not possible to reconstruct PJ ‘green’, but the Kaingang word that belongs here is ku-tɨ ‘dark’, from PJ *tɨk ‘black’ (ku- is a very productive prefix in the Jê languages). The other Macro-Jê words cited might indeed be cognates going back to PMJ. However, because a parallel word cannot be reconstructed for PT, the comparison between the families is not convincing.

94. BREAST₁ Equatorial: Tupi: Proto-Tupi *kam. Macro-Ge: Botocudo kuã ‘inside’, kuaŋ ‘belly’, Ge: Cayapo kamaŋ ‘inside’, Krenje kamã, Kaingang: Tibagi ka ‘inside’, kan ‘inside’, Palmas kamme ‘inside’, Mashakali: Mashakali, Capoxo it-kematan ‘inside’, Macuni i-kematahi ‘inside’, Patasho e-kæp ‘inside’. COMMENTS: this is a valid etymology with a couple of mistakes. The PT form should rather be *ŋãm. As for Macro-Jê, the probable match is PMJ *kɤp (~ -ε-) ‘breast’ that can be reconstructed based on PJ *kʌ and Maxakali kεp. As with ‘hand’, I will write more about this in the future, so I will refrain from further comments now.

102. BURN₃ Equatorial: Proto-Tupi-Guarani *apɨ. Macro-Ge: Botocudo pek, Karaho puk, Erikbaktsa okpog(-maha), Yabuti: Arikapu pikö ‘fire’, Mashubi piku ‘fire’. COMMENTS: a rare case where the AED has a plausible etymology with almost no mistakes. We can go further back in time from PTG, as this root can be traced all the way to PT *pɨk’ ‘to burn’. The Macro-Jê set is correct and allows the reconstruction of PMJ *pok ‘to burn’. The only exception is Jabuti: the Proto-Jabuti reconstruction should be *pi-ʧə, from PMJ *ʃɯm ‘fire’, a different etymology altogether.

103. BURN₃ Equatorial: Tupi: Sanamaica kaːi ‘fire’. Macro-Ge: Proto-Ge *ku-zɨ ‘fire’, Patasho köa ‘fire’, Macuni kö ‘fire’, Mashakali ko ‘fire’, Kapasho ka ‘fire’. COMMENTS: this etymology illustrates the fundamental problem of arbitrary segmentation in the AED. Sanamaica is a Mondé dialect, a group of languages where forms such as kãj or kãːj ‘to burn’ appear. They are cognate with PTG *kaj ‘to burn’, and thus we can reconstruct something like PT *kãj. The problem is the Macro-Jê set in this entry: the relevant part of the PJ word is *-zɨ (rather *-sɨ in modern reconstructions), the *ku- prefix being very common in the Jê languages. Although forms with a similar prefix appear in other Macro-Jê languages (e.g. Karajá hε-kɔ-dɨ), others include a different prefix (e.g. Ofaye ĩ-ʃɨw or the Jabuti word cited above). The PMJ reconstruction is *ʃɯm ‘fire’, an unlikely cognate of the PT root *kaj ‘to burn’.

150. COME₁ Equatorial: Tupi: Arikem an ‘go’. Macro-Ge: Botocudo nĩ, Caraja anakre, Kamakan: Meniens ni (imperative), Mashakali: Mashakali nũn, Patasho nanæ. COMMENTS: the only Tupi language cited is Arikem. I could not find a reconstruction for ‘to go’ in PT that could result in this word. As for Macro-Jê, there are two well-known PMJ roots meaning ‘to come, to go, to walk’: *tε̃C and *mɔ̃ŋ. The first seems to underlie the Karajá word included in the entry (which is really a construction with the root r-a- ‘to go’ in the future tense, with the typical suffix -kre). This etymology is thus unconvincing.

157. COOK₃ Equatorial: Tupi: Arua kaʔin ‘fire’. Macro-Ge: Ge: Piokobye kaho ‘fire’, Aponegricran koxʔho ‘fire’, Mehin kühü ‘fire’, Taje, Purekamekran kuhü ‘fire’, Karaho, Apinage kukuvu ‘fire’, Ramkokamekran kuxu ‘fire’. COMMENTS: this is one of those etymologies in the AED that is just funny to look at. A few lines above, I was analysing precisely the same cognate set, originating from PT *kãj ‘fire’ on the one hand, and from PMJ *ʃɯm ‘fire’ on the other. In this entry, all the Jê forms cited derive from PJ *ku-sɨ, which includes a prefix, hence the fortuitous resemblance with the Tupi forms. Somehow, for the authors of the AED, the same reconstructed proto-word in a family can derive from two different ‘proto-Amerind’ roots. Or rather not, because they do not care about reconstructions. This situation can be found repeatedly in the AED.

178. DIE₁ Equatorial: Proto-Tupi-Guarani *manõ, Oyampi mahẽ ‘dream’. Macro-Ge: Mashakali: Mashakali, Monosho monon ‘sleep’, Macuni moñung ‘sleep’, Capoxo, Kumanasho mono ‘sleep’, Opaie moye ‘die’, Puri: Coropo mamnon ‘sleep’. COMMENTS: the PTG root cannot be traced back to PT, as it is not found in any other branch. Two roots for ‘to die’ can be tentatively reconstructed for PT: *pap and *eʔã. For ‘to sleep’ we can reconstruct *kjet. As for the Macro-Jê part of the etymology, this is one of the most obvious cognates across the family. In the AED entry, the PJ root *j-õt has been left out, maybe because it does not resemble the others very well. Other cognates that were ignored are Karajá õrõ, Rikbaktsa uru and Proto-Jabuti *nũto. The Ofayé word for ‘to die’ is not a cognate (but jõr ‘to sleep’ is). The mo- prefix is not found outside of the easternmost languages such as Maxakali. Thus, the PMJ reconstruction should be *ʃ-ɔ̃t [I will explain the ‘detachable’ ʃ- soon], unrelated to the PTG root presented in the AED or to any of the PT roots that I mentioned above.

Conclusion

Out of the ten etymologies, only two did not present major problems, and that is judging only from the point of view of the two families (Tupi and Macro-Jê) analysed. Although Greenberg once argued that the inclusion of many families would create some sort of ‘random error’, I believe it only multiplies the problems if a careful family-by-family evaluation is not accomplished. Because this procedure was not followed, the AED does not present etymologies – words that can be carefully traced all the way back to a common ancestor – but rather ‘lookalikes’.

Eurasia-America connections

In the previous post, when I commented on the Amerind hypothesis, I called attention to the fact that most languages in the Americas have similar structures. For example, except for some families clustered on the Pacific side of the continent, they are characterised by: 1) a tendency to use prefixes instead of suffixes; 2) a split ergative alignment for marking the persons in the verbs; and 3) two sets of verbal pronoun prefixes, one of which also functions as possessives. Although language structure seems to be more conservative than vocabulary, providing a good estimate of genetic relationships, these features admittedly could have been diffused over several millennia, or arrived at independently. Or could they? I believe the widespread shared morphology of the American languages is no coincidence, because in Eurasia this pattern is the exception rather than the rule. It can be found in ‘islands’ across the continent, coinciding with many language isolates and small families.

The Islands of Eurasia

Languages in red share similar structures. Unlike most of the widely dispersed Eurasian families (Indo-European, Uralic, Turkic etc.), the isolates and small families pinpointed in the map share many characteristics found in the American languages. Could this be an ancient pattern in the Old World that was wiped out by later language spreads?

Among the languages that preserve structures similar to those of the Amerind languages are Basque (but only in vestigial form), the several Caucasian families, Ket, Burushaski, Kusunda, Ainu, and the Chukchi-Kamchatka languages. In the past, Sumerian exhibited similar features. Let us look at some examples in these languages:

[Figure: verbal and possessive affixes in selected Eurasian languages]

As in the Amerind cases, of course, most of these languages also make use of suffixes to varying degrees. It is very suggestive that Chukchi, which is geographically closer to the Americas, shows the same pattern that we saw in the previous post in some Amerind languages: in the transitive verbs, prefixes mark the agent and suffixes mark the patient; suffixes are also used to mark the subject of intransitive verbs, or adjectives in this case (e.g. “you were quick” – remember how adjectives can function as verbs in the Amerind languages?). Curiously, Basque and Burushaski have the inverse situation: prefixes for objects, suffixes for subjects. In all cases, except Chukchi and Basque, the prefix set is also used for marking possessive pronouns.

The ‘flexion’ of the word *asasarame* in Linear A.

I have not included Sumerian examples above because this language will be treated separately below – as is appropriate for the oldest language recorded in writing by mankind. Before that, I would like to point out that not all isolates or small families (living or extinct) in Eurasia follow the ‘Amerind’ pattern. There is no evidence, for example, that Etruscan was predominantly prefixing. I did, however, include Minoan (the language spelled in Linear A) – due to the alternation of a/ja in the a-sa-sa-ra-me paradigm. Such paradigms are similar to those noticed in Linear B by Kober, which were so fundamental for the decipherment by Ventris. A-sa-sa-ra-me occurs frequently in libation formulae written in Linear A and, if this is indeed a word that can be ‘inflected’, would receive the prefix j- and the suffix -ana. This is weak evidence, of course, especially given that the word is interpreted as the name of a goddess (not a verb, for example). Moreover, the possibility that Minoan was an Afro-Asiatic language (where affixes like j- and -ana would fit perfectly) should not be ruled out, as the reading of ku-ro as ‘all’ would support, cf. Akkadian kalû, Arabic kull (but another word of caution here: it does not seem that the word can be reconstructed for Proto-Afro-Asiatic). Although Afro-Asiatic is a geographically vast language family, most of its branches are restricted to North Africa, and quite understandably it does not exhibit the typical Eurasian structures. All in all, I leave Minoan here as a curiosity.

The Nature of Sumerian

Let us briefly examine the grammar of Sumerian with respect to the two main points reviewed above: personal pronoun affixes and verb conjugation. First, unlike all of the previous examples, Sumerian possessive pronouns are suffixed, rather than prefixed to their nouns. The first and second persons sg. are respectively -ĝu and -zu, whereas the first person pl. is marked as -me. These have interesting parallels in Eurasia (m : z could be related to the m : t set, and ĝu, if pronounced /ŋu/, would even have a Sino-Tibetan parallel), but these are probably superficial resemblances. The third person is marked -ani if animate and -bi if inanimate.

In the examples, I am using both cuneiform and earlier, more linear monumental-style signs, as they were compiled from different sources (this may be covered in a future series of posts about the development of writing).

It is in the field of verb conjugation that Sumerian becomes really interesting – and where it shows some resemblance to languages like Ket or the Na-Dené family. The feature that distinguishes these languages is usually called the ‘verbal chain’, which is nothing more than a sequence of affixes both preceding and following the verb root. In the case of Sumerian, the affixes of the verbal chain convey information not only about the agent and patient of the action, but also cross-reference other components of the sentence in different cases.

Let us take, for example, what is probably the first verbal construction encountered by the student of Sumerian: mu-na-DU₃ ‘he has built’ (Hayes’ manual has a good share of “munadus” in the first chapters!). This appears in a number of dedicatory stelae stating how a temple was built by a king for some deity (E₂-a-ni mu-na-DU₃ ‘his house he has built’, see E₂-a-ni in the previous figure). Such a simple word actually conveys a lot of information. First, the prefix mu- is of uncertain meaning, but is one of the mandatory conjugation prefixes, used before case cross-referencing. I.e., there are a number of affixes that reference previous words in different cases: in this case, -na- means that one of the arguments of the sentence is in the dative case (the full sentence would be ‘to him he has built’). Finally, the -n- marks the 3rd person animate subject (though it is frequently omitted in the cuneiform). In this example and the others included in the figure above, I followed my usual scheme of highlighting the prefixes in red and suffixes in blue.

If the same verb were in the first person, it would be mu-DU₃-en ‘I have built’ – that is because the 1st person is marked as a suffix in the verbal chain. This is illustrated by another example above, ma-ra-DU₃-e(n) ‘I shall build’. The elements are the same, except that mu- changes its vowel due to harmony with the following syllable, -ra-. This, again, cross-references the dative (after all, I shall build your house for you). A last example with the same verb illustrates nominalisation with -a: i-n-DU₃-a ‘he who has built’. The conjugation prefix in this situation is ĩ-, not mu-, since there is no case cross-referencing. This prefix is also found in the next example, for which I chose an intransitive verb: im-ma-ĝen ‘he went’ (underlying ĩ-ba- affected by nasalisation). The prefix -ba- is in complementary distribution with mu-, referring to inanimate subjects.

The last example in the figure above shows a number of affixes in a relatively complex sentence, nu-mu-e-SUM-mu-un-ze-en /nu-mu-e-sum-enzen/ ‘you have not given it to me’. The first prefix, nu-, is the negation, followed by the now familiar -mu-. Before and after the root SUM ‘to give’, we find -e- and -enzen marking the 2nd person plural, whereas the 1st person is not really referenced.

Conclusion: on long-range comparison

The Sumerian verb chain is not typical of the ‘spread zone’ Eurasian language families – Indo-European, Uralic, Turkic etc. It does, however, have parallels among the ‘residual’ families or isolates: in the figure at the beginning of this post you can see some similar forms in Ket and Adyghe. Does that mean that those languages are related? Not at all (well, in a way, I believe that all languages are genetically related; the question is whether the relationship is recent enough to be demonstrable). Basque and Sumerian have already been the victims of too many unlikely comparisons. On the other hand, some of the isolates in the map above have indeed been suggested by serious linguists to be related to languages far, far away.

Kusunda, for example, has been hypothesised by Merritt Ruhlen and others to be related to languages of Papua New Guinea, which is not entirely absurd given the genetic ties of southeast Asia with Melanesia. The linguistic evidence is based on pronominal sets and a few vocabulary items. Although the former are indeed suggestive (especially as pronouns tend to be retained longer than vocabulary), the lexical evidence seems to have been assembled in the typical ‘look alike’ fashion that we find, for instance, in the Amerind etymologies. If Kusunda is related to the ‘Indo-Pacific’ languages, this should be an extremely ancient relationship, over 50,000 years old… one wonders how such a relationship would still be noticeable today when the languages of Papua New Guinea themselves defy a classification into fewer than 30 families or so!

The Yeniseian family, of which Ket is the last remnant, has in turn been proposed to be related to the Na-Dené family of North America. Although the idea is not new (check this 1998 PNAS paper by Ruhlen on the subject), it is in the form proposed by Edward Vajda that it has recently received some acceptance (it was reviewed by Jared Diamond in Nature). The linguistic evidence is mainly based on the resemblance of the ‘verbal chain’ and other shared paradigms in Yeniseian and Na-Dené, but also on a small but significant part of the vocabulary, which shows regular sound correspondences even in items of the basic lexicon. Non-linguistic evidence from genetics is weak and, I must say, archaeologically the hypothesis lacks a clear correlate: the ones that have been presented, such as the interaction with the arctic small tool tradition, fail to convince me (though I am no specialist in the archaeology of North America, let alone Siberia).

Whether or not we accept those extra-continental relationships, what must be clear is that the common patterns found in the ‘islands’ of Eurasia do not prove genetic relationship between them, but possibly show a typology that was widespread – maybe through continuous interaction or ‘punctuated equilibrium’ – before the expansions of the major families of the continent. The language(s) of the first (and later?) migrants to the New World shared those features, and that is why they are so common in the Americas, whereas in Eurasia they were wiped out by later language spreads.

An illusion? The dangers of convergence

As usual, in this final note, let me play the Devil’s advocate: it is quite possible that the shared morphology of the languages analysed here simply developed independently over time. This is not just a hypothesis, but a plain fact in the history of some languages: Egyptian, for example, was an agglutinating language with suffixes for the verbs (in most tenses) during its ‘classic’ period, Middle Egyptian. Thus, the verb ‘to hear’ in the perfect was conjugated sdm.n=f (probably pronounced /sadímnaf/) ‘he heard’, where -n- marks the perfect and -f is the 3rd person masculine. However, in the later stages of the language, this form was replaced by a construction with an auxiliary verb (as happened in a number of western European languages – I have heard, ich habe gehört, yo he escuchado…) from the verb ‘to do’. Thus, we have Late Egyptian jr=f sdm ‘he heard’ (literally ‘he did a hearing’). Finally, in Coptic, the latest stage of Egyptian, the auxiliary became bound to the root of the verb, creating a sort of ‘verbal chain’ – ‘he heard’ (a-f-sôtm PST-3sg-hear). If this whole cycle happened over some 4,000 years in the history of the Egyptian language, why would the morphology of Sumerian, Ket, Adyghe etc. have remained intact over tens of thousands of years?