Thursday, July 24, 2008

Language Families: Eurasia (I)

I'm treating Europe and Asia here as a single unit for two reasons. First, the differences among the various regions of Asia are in many respects comparable to, or even greater than, the differences between Europe and the adjacent parts of Asia, so it seems more convenient to think of a single whole called "Eurasia" with various geographic, ecological, and cultural regions such as "Europe", "the Middle East", "the Indian Subcontinent", "Southeast Asia", and so on. Second, many of the language families of Eurasia straddle the geopolitical boundary between Europe and Asia, so it makes sense to examine them together.
Starting with the Middle East, we have the spillover of the Semitic branch of Afro-Asiatic described in the last post. Two non-Afro-Asiatic language families are also well represented in this region: Turkic and Indo-European.
The Turkic languages form a group of closely related languages stretching from Anatolia through Central Asia to Siberia and Northeast Asia. The best-known and most widely spoken member is Turkish; others include Azeri, Turkmen, Uzbek, Kazakh, Kyrgyz, and Uyghur (the language of the Xinjiang Uyghur Autonomous Region in western China). Minority Turkic languages are found in parts of southeastern Europe, Iran, Afghanistan, and the Russian Federation. Past Turkic-speaking peoples include the Seljuqs, Bulgars, and Pechenegs, who menaced the medieval Greek-speaking Byzantine Empire (as did the Avars, whose language may also have been Turkic); the Tatars and Khazars, both medieval steppe cultures; and the Timurid and Mughal dynasties, both of Turkic origin, which ruled parts of Central Asia, Iran, and India during the late medieval and early modern periods. The Huns, a confederation of steppe tribes that invaded the Roman Empire during the 5th century AD, were likely Turkic-speaking as well, though they probably also included non-Turkic members.
The Indo-European languages are found throughout most of Europe, into the Caucasus region, and from there through Iran and Afghanistan into the northern half of the Indian Subcontinent. There are nine living branches of Indo-European: Indo-Iranian, which includes the Indo-Aryan languages Hindi, Urdu, Bengali, Panjabi, Nepali, Sinhalese, and others, as well as the Iranian languages Persian, Tajik, Pashto, Kurdish, Baluchi, and Ossetian; Armenian, which forms its own independent branch; Greek, also an independent branch, which includes Modern Greek and once included Ancient Greek and probably ancient Macedonian; Albanian, a third independent branch, which may have included Classical-era Balkan languages such as Illyrian, Thracian, and Dacian; the Slavic languages of eastern Europe, which extend across large areas of northern Asia and include Russian, Ukrainian, Polish, Czech, Serbo-Croatian, and Bulgarian, among others; the Baltic languages, which today include only Latvian and Lithuanian; the Germanic languages of northern Europe, including German, English, Dutch, Swedish, and Icelandic, among others; the Italic languages, once represented by Latin and neighboring ancient languages such as Oscan and Umbrian, and today by the Latin-descended Romance languages, including Spanish, Portuguese, French, Italian, Romanian, and Catalan; and finally the Celtic languages, today represented by Irish, Scottish Gaelic, Welsh, and the Breton language of France.
Many well-known peoples of antiquity and the early Middle Ages spoke Indo-European languages: the Britons, Gauls, Celtiberians, and Galatians spoke Celtic languages, while the Visigoths, Ostrogoths, Vandals, and Norse spoke Germanic languages; the Phrygians of Anatolia spoke Phrygian, an extinct branch of Indo-European variously thought to be closest to Greek or to Armenian; the language of the Bronze Age Hittite civilization belonged to the most divergent branch of Indo-European, Anatolian (now extinct), which also included Luwian, possibly the language of Troy; and ancient peoples of western and Central Asia such as the Medes, the Parthians, the Bactrians, the Khwarezmians, the Scythians, and the Alans spoke Indo-Iranian languages, while the rulers of the Mitanni kingdom, whose subjects spoke Hurrian, left traces of an Indo-Aryan language. The 1st-millennium AD Tocharian culture of the Tarim Basin in northwestern China was also Indo-European-speaking.
Several Indo-European languages have religious importance. Sanskrit, an ancient Indo-Aryan (and thus Indo-Iranian) language, is the sacred language of Hinduism and its scriptures, while the closely related Pali was the first (and in many communities, still the principal) language of the Buddhist scriptures. Likewise, Jain scriptures are found in a variety of Indo-Aryan languages, including the Prakrits, ancient vernaculars closely related to Sanskrit, and Panjabi, together with related northern Indian vernaculars, is the principal language of the Sikh scriptures. The Zoroastrian scriptures were first set down in the Iranian language Avestan, and much of the Bábí and Bahá'í scripture was revealed in Persian (with much of the rest in Arabic). Outside the Indo-Iranian branch, Ancient Greek was one of the earliest liturgical languages of Christianity, soon followed by Latin; Latin remains the official liturgical language of the Roman Catholic Church, while Greek and the Slavic language known as Old Church Slavonic are liturgical languages of Eastern Orthodox Christianity, and Classical Armenian is the liturgical language of the Armenian Apostolic Church.
Within Europe, particularly in pre-Christian times, a number of non-Indo-European languages were spoken. These likely included the so-called "Pelasgian" language(s) of pre-Hellenic Greece and Crete (probably including the Minoan language), though some linguists interpret the Pelasgian languages as simply non-Greek Indo-European tongues. Also present was a small language family called "Tyrsenian" or "Tyrrhenian", which may have united the Lemnian language of the Aegean, the pre-Roman Etruscan language of central Italy, and the Rhaetic language of the Alps. In Britain, the ancient Pictish language has sometimes been considered a non-Indo-European language that survived the initial Celtic migrations to the island, but many linguists interpret Pictish as Celtic. The pre-Roman Elymian and Sicanian languages of Sicily are also often considered non-Indo-European. In the Iberian Peninsula, several non-Indo-European languages were spoken before the Roman conquest, including Tartessian, Iberian, and Aquitanian. Little is known of the first two, but Aquitanian is widely considered to be the ancestor, or at least a relative of the ancestor, of modern Basque, the only surviving pre-Indo-European language of western Europe. Basque has proven to be an enigma for linguists. It is classified as an isolate: that is, no genetic relationship with any known language or language family has yet been demonstrated. Hypotheses abound, however; among the most widely discussed are proposed links to languages spoken in the Caucasus Mountains, thousands of miles to the east, though none has been demonstrated conclusively.
Three ancient languages of the Middle East and India deserve mention here. The first is Sumerian, which, together with Egyptian, is among the earliest languages anywhere in the world to be recorded in writing. Like Basque, Sumerian is classified as a language isolate. Proposals have attempted to link Sumerian with virtually every language family in Eurasia, but none has withstood careful scrutiny. Another Middle Eastern language generally classified as an isolate is Elamite, spoken in ancient southwestern Iran; as with Sumerian, linguists have sought relatives of Elamite among various Eurasian languages without success, although a proposed link to the Dravidian languages of southern India has gained the most attention. Finally, the language of the enigmatic Indus Valley Civilization remains unclassified: since the script used by this culture has not yet been deciphered, no definite conclusions can be drawn. Various linguists have conjectured that the Indus Valley language was an early Dravidian language, an early Indo-Aryan language akin to Sanskrit, or a member of the Austro-Asiatic family (today found mostly in Southeast Asia), but most withhold judgment until the script can be deciphered.
In the Caucasus Mountains, three small language families can be found: Northwest Caucasian, Northeast Caucasian, and South Caucasian (often called Kartvelian). The Northwest Caucasian family includes Circassian (with its varieties Adyghe and Kabardian) and Abkhaz, all minority languages within the Russian Federation or Georgia; the Northeast Caucasian languages include Chechen and Ingush, each with its own autonomous republic within the Russian Federation, as well as many languages of the Russian republic of Dagestan, including Lak, Dargwa, Khinalug, the Andi languages, the Tsez languages, and the Lezgic languages; and the South Caucasian or Kartvelian family includes, as its principal member, Georgian, the language of the Republic of Georgia, as well as several minority languages of Georgia and Turkey. Georgian also serves as the liturgical language of the Georgian Orthodox Church. Two languages of the ancient Middle East, Hurrian and Urartian, have been proposed by some linguists to be related to the Northeast Caucasian family, though this link remains controversial.
The Uralic languages are scattered through central and northeastern Europe and into northwestern Asia. The largest member of the family is Hungarian, spoken in central Europe and surrounded by Indo-European languages; the other two Uralic languages with national status are Finnish and Estonian, spoken east of the Baltic Sea. Finnish and Estonian belong to the Finnic branch, along with various minority languages of northwestern Russia such as Karelian, Votic, and Ingrian. The minority Sami languages of northern Scandinavia, formerly known as Lappish, are also Uralic; they form their own branch, which is often grouped with Finnic. Hungarian and two languages of western Siberia, Khanty and Mansi, are classified together in the Ugric branch of Uralic, while the Samoyedic languages of the Arctic coast of Siberia are considered the most divergent branch. Other Uralic languages, spoken in the Volga river basin and the Ural Mountains, include the Mordvinic languages (Erzya and Moksha), Mari, Udmurt, and Komi; these have traditionally been grouped with Finnic and Sami, though this higher-level subgrouping is debated. A small language group called Yukaghir, spoken in a remote area of northeastern Siberia near the Arctic coast, has also been connected to Uralic by some linguists, but this link has not yet been widely accepted.
Moving further across Central Asia, we find the small Mongolic language family, centered on Mongolia, with outliers near the Caspian Sea and in Afghanistan. The family includes Middle Mongol, the language of Genghis Khan and the Mongol Empire, and the later literary language, Classical Mongolian, as well as Modern Mongolian, Buryat, the Kalmyk language of southwestern Russia, and the Moghol language of Afghanistan, among others.
Spreading from northeastern China into eastern and northeastern Siberia are the Tungusic languages (also known as Manchu-Tungus). Historically, Manchu was the principal language of the family, being the language of the Manchu people and of the ruling house of the Qing dynasty, China's last imperial dynasty. Modern Tungusic languages are found in scattered parts of northern China and the eastern regions of the Russian Federation; the family includes such languages as Evenki, Oroqen, Nanai, Udege, and Xibe. Jurchen, the language of China's 12th-13th century Jin dynasty, was also a Tungusic language.
The Turkic, Mongolic, and Tungusic families are considered by a number of linguists to belong together in a larger family known as Altaic. Evidence for this grouping includes similarities in morphology among the three families, as well as shared features such as vowel harmony and the absence of grammatical gender; cognates among the three families have been proposed as well. However, many linguists attribute the similarities among the supposed Altaic languages to prolonged areal contact rather than shared descent. As the evidence for Altaic has not yet been accepted by the linguistic community as a whole, it remains, for the time being, a (plausible) hypothesis.

Tuesday, July 22, 2008

Language Families: Africa

To give an idea of the massive amount of linguistic diversity in the world, I thought I'd give a tour of the various language families into which the world's languages can be divided. As I mentioned in an earlier post, linguists classify languages by genetic relationship: language A and language B can be shown to be genetically related if they share a common ancestral language C from which both evolved.
I'll begin with Africa, the continent where modern humans originated and from which they spread across the globe. Through research in the 1950s and 1960s, the American linguist Joseph Greenberg developed the classification of African languages that is still the most widely used in the linguistic community today. He divided the languages of mainland Africa into four distinct language families: Khoisan, Niger-Congo, Nilo-Saharan, and Afro-Asiatic.
The Khoisan languages are today spoken in southern Africa, mostly in Namibia and Botswana; a few, such as Hadza and Sandawe, are spoken in Tanzania in eastern Africa. Their speakers include the San (formerly known as "Bushmen") and the Khoikhoi. The Khoisan languages are well known for their click consonants, which neighboring non-Khoisan languages such as Xhosa and Zulu have borrowed. The language of the San people featured in the film "The Gods Must Be Crazy" was a Khoisan language. Many linguists today doubt that Khoisan is a single genetic family, rather than several families and isolates that happen to share click consonants.
The Niger-Congo languages cover most of the remainder of sub-Saharan Africa, from Senegal across West Africa to Kenya, all of equatorial Africa, and down to South Africa. The largest subgroup of Niger-Congo is Bantu, which covers all of central and most of southern Africa. Well-known Bantu languages include Kiswahili, Lingala, isiZulu, Sindebele, Chichewa, Kinyarwanda, and KiKongo. Non-Bantu Niger-Congo languages include Ewe, Yoruba, Igbo, Wolof, and the Mande languages, all spoken in West Africa between Senegal and Cameroon. Most Niger-Congo languages are tonal, and many have complex systems of noun classes.
The Nilo-Saharan languages are spoken in an arc through north-central Africa from Kenya in the east to Mali in the west. Major languages of this family include Kanuri, spoken around Lake Chad, and Songhay, spoken along the Niger River in Mali and Niger; Maasai, spoken in southern Kenya and northern Tanzania; Dinka, spoken in southern Sudan; and Fur, the language of Darfur, Sudan. Old Nubian, a medieval written language of the Nile valley in today's Sudan, also belonged to this family, and Nubian languages are still spoken there. Nilo-Saharan is widely used as a working classification, but it is less securely established than the other African families, and some linguists propose that its sub-branches be treated as separate language families in their own right.
Finally, the Afro-Asiatic language family covers all of North Africa as well as the Horn of Africa (Ethiopia, Eritrea, and Somalia). The family also has members in western Asia, hence its name. Six sub-families are generally recognized: Omotic, Cushitic, Chadic, Berber, Egyptian, and Semitic. The Omotic languages are spoken by tribal communities in southern Ethiopia and were probably the first group to diverge from the ancestral Proto-Afro-Asiatic language. The Cushitic languages are found in Ethiopia, Somalia, and neighboring countries; Oromo and Somali are the most widely spoken members. The Chadic languages are spoken in northern Nigeria, southern Niger, and adjacent regions; the principal member is Hausa, an important trade language across much of West Africa. The Berber languages were spoken throughout the Sahara and northwestern Africa before the Arab conquest; today they are found in scattered areas of Morocco, Algeria, Mali, Niger, and other nations. Tuareg is perhaps the best-known member. The Egyptian branch consists of the ancient Egyptian language, whose last stage, Coptic, survives as the liturgical language of Egypt's Coptic Christian community. Finally, the Semitic branch, the largest of the family by number of speakers, is found across North Africa and into the Middle East. The most important member is Arabic, including its many vernacular varieties; Hebrew and Modern Aramaic are other modern members, as are Amharic, Tigre, and Tigrinya, the principal Semitic languages of Ethiopia and Eritrea. Many well-known civilizations of antiquity spoke Semitic languages, including the Akkadians (among them the Babylonians and Assyrians), the Canaanites, the Phoenicians, and the Aramaeans. Three Semitic languages remain important religious languages today: Hebrew (Jewish scriptures), Aramaic (Jewish and Christian scriptures), and Arabic (Islamic scriptures).
One of the distinguishing features of the Afro-Asiatic languages, and particularly of the Semitic languages, is root-and-pattern (template) morphology: roots of (usually) three consonants carrying a general meaning are combined with patterns of vowels and affixes carrying specific grammatical functions, producing much of the vocabulary of the language. An example from Arabic is the root K-T-B, with the general meaning "write"; some of its derivatives include kitaab "book", kataba "to write", maktuub "written", maktabat "library", iktitaab "subscription", etc. Another root, S-L-M, yields 'aslama "to submit", islaam "submission", muslim "one who submits", and salaam "peace".
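The root-and-pattern system is mechanical enough to model in a few lines. The Python sketch below is a toy illustration only: it uses the simplified transliteration from this post, and the pattern notation (an uppercase C marking each slot for a root consonant) and the function name apply_pattern are my own assumptions, not a standard linguistic tool. Real Arabic morphology involves much more (weak roots, assimilation, case endings), all of which this ignores.

```python
# Toy model of Semitic root-and-pattern morphology (simplified transliteration).
# Assumption (hypothetical notation): in each pattern string, an uppercase "C"
# marks a slot for the next root consonant; every other character is copied as-is.


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the C-slots of `pattern` with the consonants of `root`, in order."""
    slots = pattern.count("C")
    if slots != len(root):
        raise ValueError(f"pattern {pattern!r} needs {slots} consonants, got {len(root)}")
    consonants = iter(root)
    # Replace each "C" with the next root consonant; keep vowels and affixes.
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


# Hypothetical patterns chosen to reproduce the examples given in the text.
KTB_PATTERNS = ["CiCaaC", "CaCaCa", "maCCuuC", "maCCaCat", "iCtiCaaC"]
SLM_PATTERNS = ["'aCCaCa", "iCCaaC", "muCCiC", "CaCaaC"]

if __name__ == "__main__":
    for root, patterns in (("ktb", KTB_PATTERNS), ("slm", SLM_PATTERNS)):
        forms = [apply_pattern(root, p) for p in patterns]
        print(root.upper(), "->", ", ".join(forms))
```

Running it prints "KTB -> kitaab, kataba, maktuub, maktabat, iktitaab" and "SLM -> 'aslama, islaam, muslim, salaam". Keeping roots and patterns as separate lists mirrors the point of the paragraph: the same pattern can be reused with a different root, so apply_pattern("slm", "CaCaaC") gives salaam just as the K-T-B root fills its own patterns.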
Some linguists have proposed that the Nilo-Saharan and Niger-Congo language families are related at a more remote level, while Afro-Asiatic has been proposed to be more closely related to language families in Eurasia. Khoisan, however, is distinct from most other language families, and is sometimes thought to be a survival of one of the first groups of languages to split from the ancestral human language (if indeed there was a single ancestral human language).