The complicated history of Y chromosome E3b…

This seems to confuse a lot of people, me included.

Phylogeographic Analysis of Haplogroup E3b (E-M215) Y Chromosomes Reveals Multiple Migratory Events Within and Out Of Africa

We explored the phylogeography of human Y-chromosomal haplogroup E3b by analyzing 3,401 individuals from five continents. Our data refine the phylogeny of the entire haplogroup, which appears as a collection of lineages with very different evolutionary histories, and reveal signatures of several distinct processes of migrations and/or recurrent gene flow that occurred in Africa and western Eurasia over the past 25,000 years. In Europe, the overall frequency pattern of haplogroup E-M78 does not support the hypothesis of a uniform spread of people from a single parental Near Eastern population. The distribution of E-M81 chromosomes in Africa closely matches the present area of distribution of Berber-speaking populations on the continent, suggesting a close haplogroup–ethnic group parallelism. E-M34 chromosomes were more likely introduced in Ethiopia from the Near East. In conclusion, the present study shows that earlier work based on fewer Y-chromosome markers led to rather simple historical interpretations and highlights the fact that many population-genetic analyses are not robust to a poorly resolved phylogeny.

The human Y-chromosome haplogroup E is characterized by the mutations SRY4064, M96, and P29, on a background defined by the insertion of an Alu element (YAP+) (Y Chromosome Consortium 2002; Jobling and Tyler-Smith 2003). Two of the three branches of haplogroup E, the major clades E1 and E2, have been observed almost exclusively on the African continent, where their distribution has been analyzed in detail (Underhill et al. 2000; Cruciani et al. 2002). The third branch, the clade E3, defined by the mutation P2, is the only one that has also been observed in Europe and in western Asia, where it has generally been found at frequencies <25% (Hammer et al. 2000, 2001; Semino et al. 2000; Scozzari et al. 2001; Cinnioğlu et al. 2004).
On the basis of the previously published phylogeny (Y Chromosome Consortium 2002; Jobling and Tyler-Smith 2003), the mutations M2/P1/M180, on the one hand, and M35/M215, on the other, further subdivide E3 in two monophyletic haplogroups: E3a and E3b. Both haplogroups are frequent in Africa (Underhill et al. 2000; Cruciani et al. 2002), although, to date, only E3b has also been observed in Europe (Semino et al. 2000) and western Asia (Underhill et al. 2000; Cinnioğlu et al. 2004). Recently, it has been proposed that E3b originated in sub-Saharan Africa and expanded into the Near East and northern Africa at the end of the Pleistocene (Underhill et al. 2001). E3b lineages would have then been introduced from the Near East into southern Europe by immigrant farmers, during the Neolithic expansion (Hammer et al. 1998; Semino et al. 2000; Underhill et al. 2001).

The three main subclades of haplogroup E3b (E-M78, E-M81, and E-M34) and the paragroup E-M35* are not homogeneously distributed on the African continent: E-M78 has been observed in both northern and eastern Africa, E-M81 is restricted to northern Africa, E-M34 is common only in eastern Africa, and E-M35* is shared by eastern and southern Africans (Cruciani et al. 2002). Given the strong geographic structuring observed for the four subsets of E3b within Africa, it is possible that different E3b lineages also have different frequency profiles in western Eurasia and that the evolutionary events underlying the introduction of E3b chromosomes in this area from Africa were not as simple (Rosser et al. 2000; Richards et al. 2002; Jobling and Tyler-Smith 2003) as previously proposed (Hammer et al. 1998; Semino et al. 2000; Underhill et al. 2001).

In the present study, we address the question of the origin and dispersal of haplogroup E3b subclades within and outside of Africa by analyzing 3,401 individuals from five continents. These include 1,510 individuals analyzed here for the first time for Y-chromosome markers (see also footnotes “b,” “c,” and “d” of table 1).

 All of the subjects were typed for the YAP polymorphism (Hammer and Horai 1995), and those who were YAP+ (haplogroup DE) were analyzed for the SRY4064 (Whitfield et al. 1995), M35, and M215 mutations (Underhill et al. 2000, 2001). Two subjects were found to carry the derived state at M215 and the ancestral state at M35. This modifies the topology of the E3 branch of the tree and the nomenclature of the corresponding haplogroups, as shown in figure 1 (note that “E3b” now refers to all haplogroups with the M215 derived state). Five hundred fifteen haplogroup E3b subjects were identified and further analyzed for the biallelic markers M34, M78, M81, M123, M281 (Underhill et al. 2000; Semino et al. 2002), and V6. The new V6 biallelic marker was discovered in the present survey in the TBL1Y gene by denaturing high-performance liquid chromatography analysis (primer sequences available on request). This marker identifies a subset of chromosomes previously assigned to E-M35* and now classified as “E3b1e” (fig. 1). No individual was found to carry the M281 mutation. We further typed 509 of the 515 E3b subjects for seven GATA STR (A7.1, A7.2, and A10 [White et al. 1999]; DYS19, DYS391, and DYS393 [Roewer et al. 1992, 1996]; and DYS439 [Ayub et al. 2000]) and four CA dinucleotide repeat (YCAIIa, YCAIIb, DYS413a, and DYS413b [Mathias et al. 1994; Malaspina et al. 1997]) polymorphisms. Both tetra- and dinucleotide microsatellites were used to reconstruct haplogroup-specific networks, through use of reduced-median and median-joining procedures (Bandelt et al. 1995, 1999). The seven tetranucleotide repeat polymorphisms were also used for the estimation of the time to the most recent common ancestor (TMRCA) (Goldstein et al. 1995; Slatkin 1995; Thomas et al. 1998) and the time since two populations split from a common ancestor (TD estimator [Zhivotovsky et al. 2004]). 
For four of the tetranucleotide loci here used, locus-specific mutation rates based on father-son transmissions (μi) are not available (Kayser et al. 2000). Since both TMRCA and TD estimations critically depend on the unknown parameter μi, we used the averaged effective mutation rate described by Zhivotovsky et al. (2004), which is based on a list of markers close to the one used here. CIs for the TMRCA were obtained as described by Scozzari et al. (2001). It should be noted that uncertainties in the mutation rate, in the shape of the genealogy, and in the mutation process would increase the CIs. Since any two chromosomes sampled from two populations have a TMRCA older than the split between populations, and since we considered as null the variance of the ancestral population at the time of its splitting, the figures reported here for the TD estimator represent upper bounds. In all of the analyses, except the networks, the YCAIIa, YCAIIb, DYS413a, and DYS413b dinucleotide repeats were not considered, since univocal assignment of phenotypic patterns to allelic series could not be obtained.
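As a rough illustration of the TMRCA idea described above (not the authors' actual procedure), one common approach takes the average squared difference (ASD) in repeat number between each sampled chromosome and an assumed ancestral haplotype, then divides it by the mutation rate. In this sketch the ancestral haplotype is assumed to be the modal haplotype, and the function names, the toy haplotypes, and the rate passed in are all hypothetical; the text does not state the numerical rate it used.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent allele at each locus (assumed here to be the ancestral state)."""
    return tuple(Counter(locus).most_common(1)[0][0] for locus in zip(*haplotypes))


def asd_to_root(haplotypes, root):
    """Average squared repeat-count difference from the root, averaged over loci."""
    total = sum((a - r) ** 2 for h in haplotypes for a, r in zip(h, root))
    return total / (len(haplotypes) * len(root))


def tmrca_years(haplotypes, rate_per_generation, years_per_generation, root=None):
    """Under a stepwise mutation model, expected ASD to the root = rate * generations."""
    if root is None:
        root = modal_haplotype(haplotypes)
    generations = asd_to_root(haplotypes, root) / rate_per_generation
    return generations * years_per_generation


# Toy data: four chromosomes typed at two tetranucleotide loci (hypothetical values).
sample = [(14, 10), (14, 10), (15, 10), (14, 11)]
# Hypothetical rate and generation time, chosen only to make the arithmetic easy.
estimate = tmrca_years(sample, rate_per_generation=0.001, years_per_generation=25)
print(round(estimate))
```

Because the estimate scales inversely with the rate, any uncertainty in the mutation rate carries straight through to the date, which is why the text stresses that the CIs would widen.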

Figure 1
Phylogenetic tree of haplogroup E3b. Markers typed in this study are in boldface letters. Haplogroups are designated according to the Y Chromosome Consortium (2002) and Jobling and Tyler-Smith (2003), by subclade and also by mutation …
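To make the marker hierarchy concrete, here is a minimal sketch of how a set of derived markers maps to a mutation-based haplogroup label. It covers only the relationships spelled out in the text (M35 below M215; M78, M81, M123, and V6 below M35; M34 below M123). M281 is left out because its position is not described here, and the `PARENT` table and `classify` function are hypothetical names, not anything from the paper.

```python
# Parent marker for each marker in this part of the tree (None marks the root, M215).
PARENT = {
    "M215": None,
    "M35": "M215",
    "M78": "M35",
    "M81": "M35",
    "M123": "M35",
    "M34": "M123",
    "V6": "M35",
}


def classify(derived):
    """Return a label such as 'E-M78' or 'E-M35*' for a set of derived markers."""
    if "M215" not in derived:
        return "not E3b"
    node = "M215"
    while True:
        # Derived markers that sit directly below the current node.
        children = [m for m, p in PARENT.items() if p == node and m in derived]
        if len(children) > 1:
            raise ValueError(f"conflicting derived markers below {node}: {children}")
        if not children:
            break
        node = children[0]
    # A '*' marks a paragroup: derived at this marker, ancestral at all typed sub-markers.
    has_submarkers = any(p == node for p in PARENT.values())
    return f"E-{node}*" if has_submarkers else f"E-{node}"


print(classify({"M215"}))                        # derived at M215, ancestral at M35
print(classify({"M215", "M35", "M123", "M34"}))
```

The first call corresponds to the two subjects who were derived at M215 but ancestral at M35, which is what forced the change in tree topology.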

We obtained an estimate of 25.6 thousand years (ky) (95% CI 24.3–27.4 ky) for the TMRCA of the 509 haplogroup E3b chromosomes, which is close to the 30±6 ky estimate for the age of the M35 mutation reported by Bosch et al. (2001) using a different method. Several observations point to eastern Africa as the homeland for haplogroup E3b—that is, it had (1) the highest number of different E3b clades (table 1), (2) a high frequency of this haplogroup and a high microsatellite diversity, and, finally, (3) the exclusive presence of the undifferentiated E3b* paragroup.

Our data show that haplogroup E3b appears as a collection of subclades with very different evolutionary histories. Haplogroup E-M78 was observed over a wide area, including eastern (21.5%) and northern (18.5%) Africa, the Near East (5.8%), and Europe (7.2%), where it represents by far the most common E3b subhaplogroup. The high frequency of this clade (table 1) and its high microsatellite diversity suggest that it originated in eastern Africa, 23.2 ky ago (95% CI 21.1–25.4 ky). The network of the E-M78 chromosomes reveals a strong geographic structuring, since each of the clusters α, β, and γ (fig. 2B) reaches high frequencies in only one of the regions analyzed. Cluster α is largely characterized by the otherwise rare nine-repeat allele at A7.1 (we found only 3 such alleles out of 800 E[xE3b1] chromosomes analyzed [present study; R.S., unpublished data]), often associated with the uncommon DYS413 24/23 pattern and its one-step neighbors. When compared with the other clusters in the network, it displays marked starlike features, with three central haplotypes accounting for 26% of the entire cluster. This cluster is very common in the Balkans (with frequencies of 20%–32%), and its frequencies decline toward western (7.0% in continental Italy, 7.4% in Sicily, 1.1% in Sardinia, 4.3% in Corsica, 3.0% in France, and 2.2% in Iberia) and northeastern (2.6%) Europe. In the Near East, this cluster is essentially limited to Turkey (3.4%). The relatively high frequency of DYS413 24/23 haplogroup E chromosomes in Greece (A.N., unpublished data) suggests that cluster α of the E-M78 haplogroup is common in the Aegean area, too.

Figure 2
Microsatellite networks of E3b haplogroups. A, E-M35*. B, E-M78. C, E-M81. D, E-M34. Reduced-median and median-joining procedures (Bandelt et al. 1995, 1999) were applied sequentially. A haplogroup-specific weight proportional to …

Cluster β, characterized by the DYS413 23/21 pattern and the rare 10-repeat allele at DYS439, is common in northwestern Africa (14.0%), representing 80% of E-M78 chromosomes in that area. Outside this region, E-M78β was observed only in five European subjects.

All of the chromosomes in cluster γ (fig. 2B) are identified by the rare short 11-repeat allele at the DYS19 STR locus. We did not find this allele in >2,000 Y(xE-M78) chromosomes analyzed (present study; R.S., unpublished data), and it is reported in only 9 of 13,447 subjects analyzed for this marker in the European Y-STR reference database (Y-STR Haplotype Reference Database Web site). The cluster E-M78γ was found in eastern Africa at an average frequency of 17.7%, with the highest frequencies in the three Cushitic-speaking groups: the Borana from Kenya (71.4%), the Oromo from Ethiopia (32.0%), and the Somali (52.2%). Outside of eastern Africa, it was found only in two subjects from Egypt (3.6%) and in one Arab from Morocco.

The fourth cluster (cluster δ in fig. 2B) is present, albeit at low frequencies, in all of the regions analyzed (4.0% in eastern and northern Africa, 3.3% in the Near East, and 1.5% in Europe) and shows a notable microsatellite differentiation (fig. 2B). The two E-M78 chromosomes found in Pakistan, at the eastern borders of the area of dispersal of haplogroup E3b, also belong to cluster δ. On the basis of these data, we suggest that cluster δ was involved in a first dispersal or dispersals of E-M78 chromosomes from eastern Africa into northern Africa and the Near East. Time-of-divergence estimates for E-M78δ chromosomes suggest a relatively great antiquity (14.7±2.7 ky) for the separation of eastern Africans from the other populations. A later range expansion from the Near East or, possibly, from northern Africa would have introduced E-M78 cluster δ into Europe. However, given the low frequencies of E-M78δ, it seems to have contributed only marginally to the shaping of the present E-M78 frequency distribution in Africa and western Eurasia. Indeed, later (and previously undetected) demographic population expansions involving clusters α in Europe (TMRCA 7.8 ky; 95% CI 6.3–9.2 ky), β in northwestern Africa (5.2 ky; 95% CI 3.2–7.5 ky), and γ in eastern Africa (9.6 ky; 95% CI 7.2–12.9 ky) should be considered the main contributors to the relatively high frequency of haplogroup E-M78 in the surveyed area.
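The time-of-divergence figure can be read against the TD estimator mentioned in the methods. A hedged sketch, assuming the usual form T_D = (D1 − 2·V0) / (2·w), where D1 is the mean squared repeat-count difference between chromosomes drawn from the two populations, V0 is the variance in the ancestral population, and w is the effective mutation rate. The function names and toy haplotypes are hypothetical, and the result comes out in whatever time unit w is expressed in.

```python
def d1(pop_a, pop_b):
    """Mean squared repeat-count difference over all cross-population pairs and loci."""
    n_loci = len(pop_a[0])
    total = sum(
        (x - y) ** 2
        for hap_a in pop_a
        for hap_b in pop_b
        for x, y in zip(hap_a, hap_b)
    )
    return total / (len(pop_a) * len(pop_b) * n_loci)


def divergence_time(pop_a, pop_b, w, v0=0.0):
    """T_D = (D1 - 2*V0) / (2*w); with v0 = 0 this is an upper bound."""
    return (d1(pop_a, pop_b) - 2 * v0) / (2 * w)


# Toy haplotypes at two loci for two populations (hypothetical values).
east_africa = [(14, 10), (14, 10)]
elsewhere = [(15, 10), (16, 10)]

upper_bound = divergence_time(east_africa, elsewhere, w=0.001)
with_ancestral_variance = divergence_time(east_africa, elsewhere, w=0.001, v0=0.25)
print(upper_bound > with_ancestral_variance)
```

Setting V0 to zero, as the authors did, can only inflate the estimate, which is why they describe their TD figures as upper bounds.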

The present distributions of these clusters also suggest episodes of range expansions. Although E-M78β and E-M78γ show only modest levels of gene flow (from northern Africa to Europe and from eastern to northern Africa, respectively), the clinal frequency distribution of E-M78α within Europe testifies to important dispersal(s), most likely Neolithic or post-Neolithic. These took place from the Balkans, where the highest frequencies are observed, in all directions, as far as Iberia to the west and, most likely, also to Turkey to the southeast. Thus, it appears that, in Europe, the overall frequency pattern of the haplogroup E-M78, the most frequent E3b haplogroup in this region, is mostly contributed by a new molecular type that distinguishes it from the aboriginal E3b chromosomes from the Near East. These data are hard to reconcile with the hypothesis of a uniform spread of a single Near Eastern gene pool into southeastern Europe. On the other hand, they might be consistent with either a small-scale leapfrog migration from Anatolia into southeastern Europe at the beginning of the Neolithic or with an expansion of indigenous people in southeastern Europe in response to the arrival of the Neolithic cultural package. At the present level of phylogenetic resolution, it is difficult to distinguish between these possibilities.

E-M81 is very common in northwestern Africa, with frequencies as high as 80% (Bosch et al. 2001; Cruciani et al. 2002; present study), but its frequency sharply declines on the continent toward the east, and the haplogroup is not found in sub-Saharan Africa. The distribution of E-M81 chromosomes in Africa closely matches the present area of distribution of Berber-speaking populations on the continent, suggesting a close haplogroup–ethnic group parallelism: in northwestern Africa, the lowest frequencies for this haplogroup have been reported in two Arab-speaking Moroccan populations (31% and 52% vs. 65%–80% in six Berber speaking groups from Morocco and Algeria [Bosch et al. 2001; Cruciani et al. 2002; present study]); in Egypt, where Berbers are restricted to a few villages, E-M81 is rare (1.9%), and the southernmost finding of E-M81 chromosomes on the continent is that here reported in the Tuareg from Niger (9.1%), who also speak a Berber language. Outside of Africa, E-M81 has been observed in all the six Iberian populations surveyed, with frequencies in the range of 1.6%–4.0% in northern Portuguese, southern Spaniards, Asturians, and Basques; 12.2% in southern Portuguese; and 41.1% in the Pasiegos from Cantabria. It has been suggested (Bosch et al. 2001) that recent gene flow may have brought E3b chromosomes from northwestern Africa into Iberia, as a consequence of the Islamic occupation of the peninsula, and that such gene flow left only a minor contribution to the current Iberian Y-chromosome pool. The relatively young TMRCA of 5.6 ky (95% CI 4.6–6.3 ky) that we estimated for haplogroup E-M81 and the lack of differentiation between European and African haplotypes in the network of E-M81 (fig. 2C) support the hypothesis of recent gene flow between northwestern Africa and Iberia. In this context, our data refine the conclusions of Bosch et al. (2001) in two ways. 
First, not all of the E3b chromosomes in Iberia can be regarded as a signature of African gene flow into the peninsula: in our data set, 8 of 15 E-M78 chromosomes belong to cluster α, denoting gene flow from mainland Europe (see above). Second, and more importantly, the degree of the African contribution is highly variable across different Iberian populations: the proportion of haplogroup E chromosomes of African origin (E[xE3b], E-M35*, and E-M81) was <5% in three Spanish locations; 10.0% and 14.2% in northern and southern Portugal, respectively; and >40% in the Pasiegos (table 1). A relatively high frequency of E-M81 in a different sample of Pasiegos (18%) and non-Pasiegos Cantabrians (17%) has also recently been reported (Maca-Meyer et al. 2003). Such differences in the relative African contribution to the male gene pool of different Iberian populations may reflect, at least in part, the different durations of Islamic influence and introgression in different parts of the peninsula, as well as drift/founder effects for the small Pasiegos group.

The E-M123 clade was found in Ethiopia (11.2%), the Near East (3.7%), Europe (1.7%), and northern Africa (0.9%). In our data set, all the E-M123 chromosomes also carry the M34 mutation (E-M34), with the exception of one E-M123* subject from Bulgaria. This paragroup has been previously reported only in one individual from Central Asia (Underhill et al. 2000). Although the frequency distribution of E-M34 could suggest that eastern Africa was the place in which the haplogroup arose, two observations point to a Near Eastern origin: (1) Within eastern Africa, the haplogroup appears to be restricted to Ethiopia, since it has not been observed in either neighboring Somalia or Kenya (present study) or Sudan (Underhill et al. 2000). By contrast, E-M34 chromosomes have been found in a large majority of the populations from the Near East so far analyzed (Underhill et al. 2000; Cinnioğlu et al. 2004; Semino et al. 2004 [in this issue]; present study). (2) E-M34 chromosomes from Ethiopia show lower variances than those from the Near East and appear closely related in the E-M34 network (fig. 2D). If our interpretation is correct, E-M34 chromosomes could have been introduced into Ethiopia from the Near East. The high frequency of E-M34 observed for some of the Ethiopian populations could be the consequence of subsequent genetic drift, which can also explain the lower frequencies (2.3% [Underhill et al. 2000] and 4.0% [Semino et al. 2002]) reported for two large independent samples of Ethiopians. From the Near East, E-M34 chromosomes could also have been introduced into Europe, possibly by Neolithic farmers, but the paucity of E-M34 chromosomes in southeastern Europe (Semino et al. 2004 [in this issue]; present study) weakens this hypothesis. Indeed, as for E-M78δ chromosomes, introduction of E-M34 from Africa directly to southern-central Europe cannot be excluded at the present.

Haplogroup E-V6 was observed only in eastern Africa (8.9% in Ethiopia, with a single occurrence in both Somalia and Kenya), further testifying to the richness of E3b lineages in this region. Although no clear inferences can be drawn on the basis of the current E-V6 frequency distribution data, the V6 polymorphism may prove to be a useful marker for future microevolutionary studies in eastern Africa.

The paragroup E-M35* has been observed at high frequencies in both eastern (10.5%) and southern (15.2%) Africa, with rare occurrences in northern Africa and Europe (0.4% and 0.5%, respectively). The paragroup has a high microsatellite allele variance (0.63), comparable to that of the whole set of E3b(xE3b1*) chromosomes (0.53), suggesting that E-M35* is a collection of several lineages whose relationships to other E3b haplogroups remain to be established. Nevertheless, the observed distribution of E-M35* can shed light on the history of peopling of Africa. For example, we found E-M35* and E-M78 chromosomes in Bantu-speaking populations from Kenya (14.3%) but not in those living in central Africa (Cruciani et al. 2002), the area in which the Bantu expansion originated (Vansina 1984). In agreement with mtDNA data (Salas et al. 2002), this finding suggests a relevant contribution of eastern African peoples to the gene pool of the eastern Bantu. Also, the extensive interpopulation E-M35* microsatellite diversity (fig. 2A) between Ethiopians and Khoisan indicates that eastern Africans and Khoisan have been separated for a considerable period of time, as has been suggested elsewhere (Scozzari et al. 1999; Cruciani et al. 2002; Semino et al. 2002).
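For readers wondering what a "microsatellite allele variance" is, a minimal sketch follows, assuming it means the variance in repeat number at each locus averaged across loci. The paper does not spell out the exact estimator, so the use of the unbiased sample variance is an assumption, and the haplotypes are made up.

```python
from statistics import mean, variance


def mean_allele_variance(haplotypes):
    """Sample variance of repeat counts at each locus, averaged over loci."""
    return mean(variance(locus) for locus in zip(*haplotypes))


# Three toy chromosomes typed at two loci (hypothetical values).
toy = [(14, 10), (15, 10), (16, 10)]
print(mean_allele_variance(toy))
```

A paragroup that is really several unrelated lineages will tend to show a high value here, since its members have had longer to drift apart in repeat number.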

In conclusion, we detected the signatures of several distinct processes of migration and/or recurrent gene flow associated with the dispersal of haplogroup E3b lineages. Early events involved the dispersal of E-M78δ chromosomes from eastern Africa into and out of Africa, as well as the introduction of the E-M34 subclade into Africa from the Near East. Later events involved short-range migrations within Africa (E-M78γ and E-V6) and from northern Africa into Europe (E-M81 and E-M78β), as well as an important range expansion from the Balkans to western and southern-central Europe (E-M78α). This latter expansion was the main contributor to the present distribution of E3b chromosomes in Europe.


So.. E3b1 went both up and down the Nile, mutated, and went in all sorts of directions. In a nutshell. I’d just like to comment that E3b1 is so widely spread it has no racial affiliations, being found all over Caucasian Southern Europe and Caucasian North Africa, as well as in Ethiopia and the Near East. Anyone using E3b1 to claim ‘black african’ ancestry needs to..

  • Find out which clade it is.
  • Do some reading on the Berbers if it’s E3b1b (M81), the clade whose distribution tracks Berber-speaking populations. Berbers aren’t usually black, and are mostly of Caucasian ancestry, going back about 20,000 years.
  • Do some reading on the spread of Y chromosomes in the Neolithic.

All the links you’ll never need on E3b.

My own page..
