We have typed 275 men from five populations in Algeria, Tunisia, and Egypt with a set of 119 binary markers and 15 microsatellites from the Y chromosome, and we have analyzed the results together with published data from Moroccan populations. North African Y-chromosomal diversity is geographically structured and fits the pattern expected under an isolation-by-distance model. Autocorrelation analyses reveal an east-west cline of genetic variation that extends into the Middle East and is compatible with a hypothesis of demic expansion. This expansion must have involved relatively small numbers of Y chromosomes to account for the reduction in gene diversity towards the West that accompanied the frequency increase of Y haplogroup E3b2, but gene flow must have been maintained to explain the observed pattern of isolation-by-distance. Since the estimates of the times to the most recent common ancestor (TMRCAs) of the most common haplogroups are quite recent, we suggest that the North African pattern of Y-chromosomal variation is largely of Neolithic origin. Thus, we propose that the Neolithic transition in this part of the world was accompanied by demic diffusion of Afro-Asiatic–speaking pastoralists from the Middle East.
Many studies of African genetic diversity have concentrated on sub-Saharan and northeastern Africa, the most likely source region and corridor to the rest of the world (Tishkoff and Williams 2002). North Africa, however, may have followed a distinct evolutionary direction and requires further investigation. Genetic studies of this area, performed using classical markers, have revealed an agreement between genetic and geographic distances (Cavalli-Sforza et al. 1994) and a predominantly east-west structure to the genetic variation (Bosch et al. 1997). A compilation of 185 mtDNAs sampled across North Africa showed (1) that about half of the lineages belonged to the L haplogroups otherwise observed mainly in sub-Saharan Africa and (2) that most of the rest fell into haplogroup U6 (Salas et al. 2002), which perhaps originated in the Near East and spread into North Africa ~30 thousand years (KY) ago (KYA) (Maca-Meyer et al. 2003). Y-chromosomal studies are potentially highly informative about the origin of male-specific lineages, because of the detailed haplotypes that can be obtained and their high geographical specificity (Jobling and Tyler-Smith 2003), but previous studies have been restricted to limited regions of North Africa (Bosch et al. 1999, 2001; Flores et al. 2001; Manni et al. 2002; Luis et al. 2004). Together, these genetic analyses highlighted the similarity between northeastern Africa and the Middle East and the clear genetic differentiation between northwestern Africa and both sub-Saharan Africa and Europe, including Iberia. The Sahara and Mediterranean, despite the narrow width of the Strait of Gibraltar, seem to have acted as effective long-term barriers to Y-chromosomal gene flow.
To provide a more complete description of the North African pattern of Y-chromosomal variation, we have analyzed five additional populations: Algerian Arabs, Algerian Berbers, Tunisians, and North and South Egyptians (table 1). Binary polymorphisms (Underhill et al. 2000), including 12f2 (Casanova et al. 1985), were typed in the hierarchical fashion described elsewhere (Rosser et al. 2000; Paracchini et al. 2002), allowing the allelic states at 119 markers defining 117 haplogroups to be measured or inferred from the Y phylogeny (fig. 1A). In the North African sample, 30 binary markers were found to be polymorphic, identifying 23 different haplogroups (fig. 1A) (table A1 [online only]). Phylogenetically related haplogroups were classified into clusters, the frequencies of which are shown schematically in fig. 1B. With the existing data from Morocco (Bosch et al. 2001), the combined set now spans the northern part of the continent. In addition, samples from southern Europe, the Middle East, and sub-Saharan Africa were included in some analyses (Semino et al. 2000; Underhill et al. 2000; Cruciani et al. 2002). Our results reveal four main conclusions about the male-lineage variation in North Africa.
First, as shown in fig. 1B, the lineages that are most prevalent in North Africa are distinct from those in the regions to the immediate north and south: Europe and sub-Saharan Africa. This is illustrated by even a cursory examination of the commonest haplogroups: E3b2 is the most common haplogroup in North Africa, forming 42% of the combined sample. In contrast, R1b made up 55% of a mixed European sample (Underhill et al. 2000) and was even higher (77%) in the Iberian sample examined by Bosch et al. (2001), whereas E3a predominates in many sub-Saharan areas, being present at 64% in a pooled sample (Underhill et al. 2000; Cruciani et al. 2002). Such a finding is not surprising, in the light of the earlier genetic studies, but has an important implication: despite haplogroups shared at low frequency, suggesting limited gene flow, North African populations have a genetic history largely distinct from both Europe and sub-Saharan Africa over the timescales needed for the Y-chromosomal differentiation to develop.
Second, just two haplogroups predominate within North Africa, together making up almost two-thirds of the male lineages: E3b2 and J* (42% and 20%, respectively). E3b2 is rare outside North Africa (Cruciani et al. 2004; Semino et al. 2004 and references therein), and is otherwise known only from Mali, Niger, and Sudan to the immediate south, and the Near East and Southern Europe at very low frequencies. Haplogroup J reaches its highest frequencies in the Middle East (Semino et al. 2004 and references therein), whereas the J-276 lineage (equivalent to J* here) is most frequent in Palestinian Arabs and Bedouins. Lineages can rise to high frequency because of biological selection, social selection, and/or neutral drift. There is a suggestion that weak negative selection due to partial deletion of genes needed for spermatogenesis could act on both E3b2 and J (Repping et al. 2003), but this would tend to decrease their frequency, and there is no evidence for positive selection. It therefore seems likely that their increase was due to drift despite any negative selection, implying that male effective population size has been small. Indeed, gene diversity values increase along a longitudinal axis from west to east (fig. 2), and much of this variation is accounted for by haplogroup E3b2, which decreases in frequency in a corresponding fashion from ~76% in the Saharawis in Morocco to ~10% in Egypt (fig. 2). The same haplogroup has increased in frequency in many different populations within North Africa, so there must have been gene flow between them.
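As an illustration of how a gene diversity value of the kind discussed above can be computed from haplogroup assignments, the following minimal Python sketch assumes Nei's unbiased estimator, h = n/(n-1) × (1 - Σp_i²). The text does not state which estimator was used, and the function name and the two toy samples are hypothetical; they are not data from this study.

```python
from collections import Counter


def gene_diversity(haplogroups):
    """Nei's unbiased gene diversity for a list of haplogroup labels.

    Assumption: h = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the
    sample frequency of haplogroup i and n is the number of chromosomes.
    """
    n = len(haplogroups)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy samples (invented for illustration only):
# one dominated by a single haplogroup, one more even.
toy_west = ["E3b2"] * 8 + ["J*"] * 2
toy_east = ["E3b2"] * 1 + ["J*"] * 3 + ["R1b"] * 3 + ["E3a"] * 3

print(gene_diversity(toy_west), gene_diversity(toy_east))
```

A sample dominated by one haplogroup yields a lower value than a sample with several haplogroups at similar frequencies, which is the sense in which a frequency increase of E3b2 reduces gene diversity.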
Third, there is strong geographical structure to the Y-chromosomal variation within the region. A high and significant correlation is observed between genetic and geographical distances (r=0.55, P<.0005). Multidimensional scaling (MDS) analysis of genetic distances (Slatkin 1995) based on pairwise ΦST estimates (calculated using the program Arlequin) between 17 of the samples in fig. 1B showed a close correspondence with their relative geographical locations (fig. 3). Indeed, the positions of the samples in the MDS plot describe a latitudinal axis, from North Africa and the Middle East in the upper part to Central and southern Africa in the lower part. Furthermore, the pattern of genetic affinities among the North African samples parallels the west-east orientation quite precisely, from Morocco on the left-hand side to Egypt and the Middle East on the right. Spatial autocorrelation analysis (by AIDA; Bertorelle and Barbujani 1995) shows a clinal pattern of variation, more marked when Middle Eastern samples are included (fig. 4A and 4B). Haplogroup E3b2 itself shows a significant correlogram in a SAAP analysis (Sokal and Oden 1978) (fig. 4C). Furthermore, diversity within this haplogroup, measured using 15 Y-STRs (Thomas et al. 1999; Ayub et al. 2000), declines substantially towards the west (table A2 [online only]). These findings, together with the gene diversity pattern described above, are consistent with the hypothesis of a demic expansion from the Middle East.
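A correlation between a genetic and a geographical distance matrix is commonly assessed with a Mantel permutation test. The text does not name the test behind the reported r and P values, so the following Python sketch is an assumption about the general approach, not a reconstruction of the analysis. The function names and the five-point toy transect are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(m):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation P value).

    Population labels of the genetic matrix are shuffled jointly over
    rows and columns; the observed matrix counts as one permutation.
    """
    n = len(gen)
    geo_vec = upper_triangle(geo)
    r_obs = pearson(upper_triangle(gen), geo_vec)
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 1
    for _ in range(permutations):
        rng.shuffle(idx)
        permuted = [[gen[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical toy transect: five populations on a line, with genetic
# distance proportional to geographical distance (invented values).
coords = [0, 1, 2, 3, 4]
toy_geo = [[abs(a - b) for b in coords] for a in coords]
toy_gen = [[0.1 * abs(a - b) for b in coords] for a in coords]
r, p = mantel(toy_gen, toy_geo)
print(r, p)
```

Shuffling whole population labels, rather than individual matrix cells, preserves the dependence among distances that share a population, which is why the test is run on permuted rows and columns.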
Fourth, the time depth associated with the most common Y-chromosomal haplogroups in North Africa is shallow. Y-STR data (15 loci) were obtained for 256 Y chromosomes and revealed 201 different haplotypes (table A3 [online only]). Of these, only 16 were observed in more than one individual, but two were particularly frequent: one was present in 24 chromosomes from the Algerian Arab, Tunisian, and northern Egyptian populations, belonging, with one exception, to haplogroup E3b2*(xE3b2a); the second haplotype (observed in nine Tunisians) was associated with haplogroup J*. STR variability was used to estimate the TMRCA of North African chromosomes from individual haplogroups with the program BATWING (Wilson and Balding 1998), using either 15 loci (table A4 [online only]) or, to incorporate the Moroccan data (Bosch et al. 2001), 8 loci (table 2). The TMRCA of haplogroup E3b2 was estimated to be ~4.2 KY (95% CI 2.8–6.0 KY), using the mutation rate measured in father-son pairs (Kayser et al. 2000) and assuming 30 years per generation, or 6.9 KY (95% CI 5.9–8.2 KY) using the deduced “effective” mutation rate calibrated by historical events (Zhivotovsky et al. 2004) (table 2). The times for haplogroup J, the second-most-common haplogroup observed in North Africa (6.8 KY, 95% CI 4.4–11.1 KY; or 7.9 KY, 95% CI 6.6–9.1 KY), were also quite recent (table 2), supporting the idea of a recent demographic event. A network (Bandelt et al. 1999) of the E3b2*(xE3b2a) chromosomes, calculated using the program NETWORK, based on eight loci, showed a widespread high-frequency central haplotype (32%) and a starlike structure (fig. A1 [online only]). The Moroccan samples display low variability, and their chromosomes often occupy more-peripheral positions in the network. These findings together support our second conclusion, that genetic drift must have shaped the North African Y-chromosomal landscape.
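Because the first E3b2 estimate depends on the assumed generation time, it can help to express it in generations. The short Python sketch below only rearranges the figures quoted above (4.2 KY, 95% CI 2.8–6.0 KY, at 30 years per generation); the function name is hypothetical, and the sketch does not reproduce the BATWING analysis itself.

```python
def years_to_generations(years, years_per_generation=30):
    """Convert a time in years to generations.

    The default of 30 years per generation is the value assumed in the text.
    """
    return years / years_per_generation


# E3b2 TMRCA under the father-son mutation rate: point estimate and 95% CI, in years.
for label, years in (("estimate", 4200), ("lower", 2800), ("upper", 6000)):
    print(label, round(years_to_generations(years), 1))
```

Under this assumption, the point estimate corresponds to 140 generations, with an interval of roughly 93 to 200 generations; a shorter assumed generation time would scale the times in years down proportionally.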
Which historical or prehistorical demographic processes could explain the characteristics of the variation of Y-chromosomal lineages in North Africa? The current physical barriers, the Mediterranean Sea to the north and Sahara Desert to the south, could have provided genetic barriers leading to the separate evolutionary paths of the regions, although for the Sahara, episodes of more favorable climatic conditions could have relaxed this barrier at times, particularly during some intervals between ~10 KYA and ~5 KYA (Muzzolini 1993). There is no evident reason why it should have acted as a strong genetic barrier at such times, so, if there was substantial gene flow, the genetic differentiation between North and sub-Saharan Africa may postdate this period. A clinal pattern of haplogroup variation like the one we observe can be expected from an east-to-west population expansion, and the finding of lower E3b2 STR variation in the west than in central North Africa (table A2 [online only]), accompanied by a substantial increase in frequency of this haplogroup, is most readily explained by expansion into virtually uninhabited terrain by populations experiencing increasing drift (Barbujani et al. 1994).
The current distributions of the haplogroups can suggest geographical origins, and their TMRCAs provide some constraints on the times of their spread. The M35 lineage (see the phylogeny in fig. 1A for marker locations) is thought to have arisen in East Africa, on the basis of its high frequency and diversity there (Cruciani et al. 2004; Semino et al. 2004), and to have given rise to M81 in North Africa. The TMRCAs for E3b (8.3 KY, 95% CI 5.2–12.4 KY; or 14.4 KY, 95% CI 9.3–19.3 KY; table 2) and E3b2 (2.8–8.2 KY) should thus bracket the spread of E3b2 in North Africa. These times contrast sharply with estimates of 53 ± 21 KYA for the M35 lineage and 32 ± 11 KYA for the M81 lineage, by use of a constant-sized population model, or 30 ± 6 and 19 ± 4 KYA, respectively, by use of an expanding population model (Bosch et al. 2001). They are, however, more in accordance with times of 26.5 KYA (without a useful CI) for the M215 mutation (intermediate between M35 and M96 in the phylogeny; see fig. 1A) and 5.6 KYA for M81 (Cruciani et al. 2004) or of 29.2 ± 4.1 KYA for M35 and 8.6 ± 2.3 KYA for M81 (Semino et al. 2004). An origin for haplogroup J in the Middle East has been proposed (Semino et al. 2004 and references therein); the TMRCA of the J-M267 branch, found in both the Middle East and North Africa (and including our J* chromosomes), was estimated at 24.1 ± 9.4 KY and must predate its spread. This is consistent with the 95% CI of 4.4–11.1 KY for our TMRCA estimate for the North African chromosomes. Thus, although Moroccan Y lineages were interpreted as having a predominantly Upper Paleolithic origin from East Africa (Bosch et al. 2001), according to our TMRCA estimates, no populations within the North African samples analyzed here have a substantial Paleolithic contribution.
Early Neolithic sites are documented in the eastern part of North Africa and later ones in the west, which would be compatible with an east-to-west movement at this time, and this is also the case for the Arab expansion. Historical records of the Arab conquest, however, suggest that its demographic impact must have been limited (McEvedy 1980). In addition, genetic evidence shows that E3b2 is rare in the Middle East (Semino et al. 2004), making the Arabs an unlikely source for this frequent North African lineage. Parallel analyses of North Africa and Southern Europe have revealed strikingly similar patterns of Y-chromosome variation, which would support a scenario in which the Neolithic expansion, originating in the Middle East, branched into two flows separated by the geographical barrier of the Mediterranean Sea. Indeed, as in North Africa, Y-chromosome variability in Southern Europe is clinal, gene diversity decreases from east to west, and genetic distances between North Africa and Southern Europe increase in a regular fashion from the Middle East toward the west (results not shown). Under the hypothesis of a Neolithic demic expansion from the Middle East, the likely origin of E3b in East Africa could indicate either a local contribution to the North African Neolithic transition (Barker 2003) or an earlier migration into the Fertile Crescent, preceding the expansion back into Africa.
In conclusion, we propose that the Y-chromosomal genetic structure observed in North Africa is mainly the result of an expansion of early food-producing societies. Moreover, following Arioti and Oxby (1997), we speculate that the economy of those societies relied initially more on herding than on agriculture, because pastoral economies probably supported lower numbers of individuals, thus favoring genetic drift, and showed more mobility than agriculturalists, thus allowing gene flow. Some authors believe that language families are unlikely to be >10 KY old and that their diffusion was associated with the diffusion of agriculture (Diamond and Bellwood 2003). Since most of the languages spoken in North Africa and in nearby parts of Asia belong to the Afro-Asiatic family (Ruhlen 1991), this expansion could have involved people speaking a proto–Afro-Asiatic language. These people could have carried, among others, the E3b and J lineages, after which the M81 mutation arose within North Africa and expanded along with the Neolithic population into an environment containing few humans.