What is the CECD? 
The CECD is an AHRC-funded research group dedicated to examining the evolutionary underpinnings of human cultural behaviour, past and present.

Phase 2: Theme B - Cultural and linguistic diversity: Project B008
Direct methods for language phylogeny

SUPERVISOR: Steele (with Sluckin, Phase 1 PI)

James Steele (AHRC CECD, Institute of Archaeology, University College London)
Tim Sluckin (School of Mathematics, University of Southampton)

PROJECT FUNDING: Cards Against Humanity

The reconstruction of species and subspecies phylogenies using computational, statistical and graph-based techniques has led to speculation that similar techniques could be applied to other systems in which homology implies shared ancestry. The analogy between language evolution and species evolution is not new: recognizing a single language within a group of similar speakers resembles recognizing a species within a group of genetically related individuals. In this analogy, the criterion of intercomprehensibility in language maps onto that of mating compatibility in biology. An additional complication for languages is that horizontal transmission processes, such as borrowing, are known to play a non-negligible role.
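As a rough illustration of the kind of distance-based tree reconstruction used in biological phylogenetics, the sketch below implements average-linkage clustering (UPGMA) on a small distance matrix. It is not the method of this project, which the text does not specify. The taxon labels A to D and all distances are invented purely for illustration, and the function name `upgma` is hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: list of taxon names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the tree as nested tuples, e.g. (("A", "B"), ("C", "D")).
    """
    # Each current cluster is keyed by its subtree; the value is its size.
    sizes = {label: 1 for label in labels}
    d = dict(dist)
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Invented toy distances between four placeholder taxa (assumption, not data).
toy_distances = {
    frozenset(pair): value
    for pair, value in {
        ("A", "B"): 2.0,
        ("C", "D"): 4.0,
        ("A", "C"): 8.0,
        ("A", "D"): 8.0,
        ("B", "C"): 8.0,
        ("B", "D"): 8.0,
    }.items()
}

tree = upgma(["A", "B", "C", "D"], toy_distances)
print(tree)
```

A strictly tree-shaped result like this cannot represent horizontal transmission, which is one reason the language case is harder than the biological one.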

The shared history of the Indo-European languages has long been known(1). The first attempts to reconstruct language history with more precise taxonomic methods date from the early 1950s and have come to be known as glottochronology(2). The technique centred on the rate at which words change between related languages. As a scientific tool, however, it was largely unsuccessful, because the rate of lexical change appears not to be constant(3). More recently, glottochronological methods have made a comeback, aided by sophisticated computation. These newer studies reconstructing language phylogenies date from the mid-1990s and seem to be more successful(4). In particular, some fundamental questions, for example about the development of the Indo-European family tree, seem much closer to resolution following these studies.
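The classical glottochronological calculation can be sketched as follows. It uses the usual textbook formula t = ln(c) / (2 ln(r)), where c is the fraction of cognates two languages share on a standard word list, r is the assumed fraction of words retained per millennium, and t is the estimated time since divergence in millennia. The default r = 0.86 is a commonly quoted value for the 100-item list and is an assumption here, as are the function and parameter names.

```python
import math


def divergence_time_millennia(shared_fraction, retention_rate=0.86):
    """Estimate time since two languages split, in millennia.

    Uses t = ln(c) / (2 ln(r)), which assumes that both languages lose
    vocabulary independently at the same constant rate.
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_fraction) / (2.0 * math.log(retention_rate))


# The same observed cognate share gives different dates under different
# assumed retention rates (both rates here are illustrative assumptions).
for rate in (0.86, 0.81):
    print(rate, round(divergence_time_millennia(0.5, rate), 2))
```

Because the estimate depends directly on r, any variation in the true rate of lexical change between languages or over time carries straight through to the inferred date, which is the weakness noted above.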

A crucial first step in any taxonomic endeavour is defining suitable traits. Most recent studies of language combine morphological, lexical and phonetic criteria, but these require a good deal of preprocessing of the basic language data. In this study we propose to use inputs from spoken language directly, with minimal preprocessing. This avoids the bias introduced in the transfer between spoken and written language. Of course, relevant semantic data are thereby explicitly ignored, but the hope is that, as with genetic phylogeny, essentially different data will lead to compatible family trees. Even if this is not the case, we hope that new insights in historical anthropology will follow.

(1) See e.g. F. Bodmer, The Loom of Language (George Allen and Unwin, London 1944).
(2) M. Swadesh, Lexico-statistic dating of prehistoric ethnic contacts, Proc. Am. Phil. Soc. 96, 453-463 (1952).
(3) K. Bergsland and H. Vogt, On the validity of glottochronology, Current Anthropology 3, 115-153 (1962).
(4) See e.g. T. Warnow, Mathematical approaches to comparative linguistics, PNAS 94, 6585-6590 (1997); D. Ringe et al., Indo-European and computational cladistics, Trans. Philol. Soc. 100, 59-129 (2002); K. Rexova, D. Frynta and J. Zrzavy, Cladistical analysis of languages, Cladistics 19, 120-127 (2003); R.D. Gray and Q.D. Atkinson, Language tree divergence times support the Anatolian theory of Indo-European origins, Nature 426, 435-439 (2003).

Project Suspended