Revisiting India's Harappan civilization, including new answers to where the Indo-European languages came from.

By Eric Vandenbroeck 2 June 2018

We know that the Indus River Valley civilization lasted for around 2,000 years and extended from what is today northeast Afghanistan to Pakistan and northwest India. According to scientists at the Indian Institute of Technology in Kharagpur (IIT-Kgp), the civilization was wiped out 4,350 years ago by a 900-year-long drought. Evidence gathered during their study also puts to rest the widely accepted theory that this drought lasted only about 200 years. In addition, a recent massive DNA study by 95 scientists shows how migrants into India from the west and north contributed to local DNA, which aligns with recent analyses indicating that Indo-European languages also entered the subcontinent from the northwest.

Just as the Nazis once did, Hindu nationalists today sometimes construct their own version of this history. From a scientific point of view, the genetic formation of Central and South Asian populations has been the subject of intense scrutiny.

It is generally accepted that both South Asia and Europe were affected by two successive migrations. The first, from the Near East after around nine thousand years ago, brought farmers who mixed with local hunter-gatherers. The second, from the steppe after around five thousand years ago, brought pastoralists who probably spoke Indo-European languages and who mixed with the local farmers they encountered along the way. Further mixtures of these already-mixed groups then formed two gradients of ancestry: one in Europe and one in India.

The population of India and the surrounding region is thus the outcome of mixture between two highly differentiated populations, the so-called Ancestral North Indians and Ancestral South Indians, who before their mixture were as different from each other as Europeans and East Asians are today.

According to this model, the Ancestral North Indians are related to Europeans, Central Asians, Near Easterners, and people of the Caucasus, whereas the Ancestral South Indians descend from a population not related to any present-day population outside India. As a result, everyone in mainland India today is a mix, albeit in different proportions, of ancestry related to West Eurasians and ancestry more closely related to diverse East Asian and South Asian populations. No group in India can claim genetic purity.

Groups in India that speak Indo-European languages typically have more Ancestral North Indian ancestry than those speaking Dravidian languages, who have more Ancestral South Indian ancestry. This suggests that the Ancestral North Indians (henceforth ANI) probably spread Indo-European languages, while the Ancestral South Indians (henceforth ASI) spread Dravidian languages.

As the above-quoted article suggests, other researchers, including archaeologist Colin Renfrew of the University of Cambridge in the United Kingdom, had argued that the earlier Anatolian farmers were the original Proto-Indo-European speakers. The new data “make a strong case” for the Yamnaya as carriers of Indo-European languages, Renfrew says, but he still thinks Anatolian farmers could have spoken the earliest language in that family. Yet this latter theory may have been disproven by recent evidence from the palatial archives of the ancient city of Ebla in Syria, which suggests that an Indo-European language was already spoken in modern-day Turkey in the 25th century BCE.

That said, I think more research is needed to support this claim, i.e. that Anatolian speakers were already present in modern-day Turkey in the first half of the 3rd millennium BCE. If future research firmly establishes that the onomastic evidence cited by Kroonen, Barjamovic & Peyrot is really Anatolian, then the Indo-Hittite hypothesis (an alternative name for Indo-European, used by some historical linguists who believe that the Anatolian languages broke off well before any further divisions within Proto-Indo-European) will be vindicated, and the following claim by Kroonen, Barjamovic & Peyrot will prove true: “[S]ince the onomastic evidence from Armi is contemporaneous with the Yamnaya culture (3000-2400 BCE), a scenario in which the Anatolian Indo-European language was linguistically derived from Indo-European speakers originating in this culture can be rejected. This important result offers new support for the Indo-Hittite hypothesis (see above) and strengthens the case for an Indo-Hittite speaking ancestral population from which both Proto-Anatolian and residual Proto-Indo-European split off no later than the 4th millennium BCE” (p. 7).

David Anthony, an anthropologist at Hartwick College, has pointed out that language shifts generally flow in the direction of groups with higher economic status, more political power, and higher prestige. And in the most brutal situations, he says, they flow in the direction of the people who survived.

We know that groups of traditionally higher social status in the Indian caste system typically have a higher proportion of ANI ancestry than those of traditionally lower social status, even within the same state of India where everyone speaks the same language. For example, Brahmins, the priestly caste, tend to have more ANI ancestry than the groups they live among, even those speaking the same language. Although there are groups in India that are exceptions to these patterns, including well-documented cases where whole groups have shifted social status, the findings are statistically clear and suggest that ANI/ASI mixture in ancient India occurred in the context of social stratification.

The genetic data from Indians today also reveal something about the history of differences in social power between men and women. Around 20 to 40 percent of Indian men and around 30 to 50 percent of eastern European men carry a Y-chromosome type that, based on the density of mutations separating the people who carry it, descends from a single male ancestor who lived between about sixty-eight hundred and forty-eight hundred years ago.(1) The only possible explanation for this is major migration between West Eurasia and India in the Bronze Age or afterward. In contrast, the mitochondrial DNA types found in India, which are passed down along the female line, are almost entirely restricted to India, suggesting that they may nearly all have come from the ASI, even in the north. Males with this Y-chromosome type were extraordinarily successful at leaving offspring, while female immigrants made far less of a genetic contribution.

The discrepancy between the Y-chromosome and mitochondrial DNA patterns initially confused historians.(2) But a possible explanation is that most of the ANI genetic input into India came from males. This pattern of sex-asymmetric population mixture is disturbingly familiar. Consider African Americans: the approximately 20 percent of their ancestry that comes from Europeans derives in an almost four-to-one ratio from the male side.(3) Or consider Latinos from Colombia: the approximately 80 percent of their ancestry that comes from Europeans derives in an even more unbalanced way from males (a fifty-to-one ratio).(4) The common thread is that males from populations with more power tend to pair with females from populations with less. It is remarkable that genetic data can reveal such profound information about the social nature of past events.

Most if not all Indian groups that have been analyzed have ANI/ASI mixture dates between four thousand and two thousand years ago, with Indo-European-speaking groups having more recent mixture dates on average than Dravidian-speaking groups. The older dates in Dravidian speakers make sense once we recognize that the present-day locations of people do not necessarily reflect their past locations. Suppose that the first round of mixture in India happened in the north close to four thousand years ago and was followed by subsequent waves of mixture in northern India, as previously established populations and people with much more West Eurasian ancestry came into contact repeatedly along a boundary zone. The people who were the products of the first mixtures in northern India could plausibly, over thousands of years, have mixed with or migrated to southern India, so the dates in southern Indians today would be those of the first round of mixture. Later waves of mixture of West Eurasian-related people into northern Indian groups would then make the average date of mixture estimated in northern Indians today more recent than in southern Indians.

A hard look at the genetic data supports the theory of multiple waves of ANI-related mixture into the north. Interspersed among the short stretches of ANI-derived DNA found in northern Indians are quite long stretches of ANI-derived DNA, which must reflect recent mixture with people of little or no ASI ancestry.(5)
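Why short stretches point to old mixture and long stretches to recent mixture can be sketched with a standard simplification, which is an assumption here and not the specific dating method used in the cited study. After a single pulse of mixture g generations ago, in which a fraction m of the ancestry came from one source, recombination keeps cutting that source's DNA into smaller pieces, and the pieces end up with an average length of roughly 1/(g·(1 − m)) Morgans. The 28-year generation time and the 50 percent mixture fraction in the sketch are illustrative assumptions, not values from the article.

```python
def mean_tract_length_cm(generations: float, source_fraction: float) -> float:
    """Approximate mean length (in centiMorgans) of DNA stretches inherited
    from one source population after a single pulse of mixture.

    Simplified pulse model (an assumption): a stretch from the source ends
    wherever a crossover switches to the other source, which happens at a
    rate of generations * (1 - source_fraction) per Morgan. The mean length
    is the inverse of that rate; 1 Morgan = 100 cM.
    """
    if generations <= 0 or not 0 <= source_fraction < 1:
        raise ValueError("need generations > 0 and 0 <= source_fraction < 1")
    return 100.0 / (generations * (1.0 - source_fraction))


# Assumed generation time, used only to convert years into generations.
YEARS_PER_GENERATION = 28

for years_ago in (4000, 2000, 300):
    g = years_ago / YEARS_PER_GENERATION
    # Assume, for illustration only, that half the ancestry came from the ANI-like source.
    length = mean_tract_length_cm(g, 0.5)
    print(f"mixture {years_ago:>4} years ago (~{g:.0f} generations): "
          f"mean stretch ~ {length:.1f} cM")
```

Under these assumptions a mixture pulse a few centuries old leaves stretches more than ten times longer than one about four thousand years old, which is the kind of contrast that lets long stretches be read as evidence of a later wave.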

The patterns are consistent with the hypothesis that all of the ANI/ASI mixture in the history of some present-day Indian groups happened within the last four thousand years. This means that the population structure of India before around four thousand years ago was profoundly different from what it is today. Before then, there were unmixed populations; afterward, a convulsive mixture swept through India, affecting nearly every group.

So around the time the Indus Civilization collapsed, there was a profound mixture of populations that had previously been segregated. Today in India, people speaking different languages and coming from different social strata have different proportions of ANI ancestry, and this ANI ancestry derives more from males than from females. This pattern is exactly what one would expect if an Indo-European-speaking people took the reins of political and social power after four thousand years ago and mixed with the local peoples in a stratified society, with males from the groups in power having more success in finding mates than those from disenfranchised groups.

One can also conclude that early farmers of the Near East were related to people living today, and that present-day Europeans have a strong genetic affinity to early farmers from Anatolia, consistent with a migration of Anatolian farmers into Europe after nine thousand years ago. Present-day people from India have a strong affinity to ancient Iranian farmers, suggesting that the eastward expansion of Near Eastern farming to the Indus Valley after nine thousand years ago had as important an impact on the population of India as the Anatolian farmers had on Europe.(6)

Present-day people in India also have strong genetic affinities to ancient steppe pastoralists. This situation is reminiscent of what has been found in Europe, where today’s populations are a mixture not just of indigenous hunter-gatherers and migrant farmers, but also of a third major group with an origin in the steppe.

How could the genetic evidence of an Iranian farming expansion into India be reconciled with the evidence of steppe expansions? To gain some insight, Iosif Lazaridis in David Reich's laboratory wrote down mathematical models for present-day Indian groups as mixtures of populations related to Little Andaman Islanders, ancient Iranian farmers, and ancient steppe peoples. What he found is that almost every group in India has ancestry related to all three of these populations. Moreover, people descended from Iranian farmers made a major impact on India twice, admixing into both the ANI and the ASI.

Nick Patterson proposed a major revision to the working model for deep Indian history. We can presume that the ANI was a mixture of about 50 percent steppe ancestry, distantly related to the Yamnaya, and 50 percent Iranian farmer-related ancestry from the groups the steppe people encountered as they expanded south. The ASI was also mixed: a fusion of a population descended from earlier farmers expanding out of Iran (around 25 percent of their ancestry) and previously established local hunter-gatherers of South Asia (around 75 percent). So the ASI were probably not the previously established hunter-gatherer population of India and may instead have been the people responsible for spreading Near Eastern agriculture across South Asia. Given the high correlation of ASI ancestry with Dravidian languages, it seems likely that the formation of the ASI was also the process that spread Dravidian languages.
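The proportions in this revised model combine by simple weighted averaging. The sketch below assumes the rounded figures given in the text (ANI about 50 percent steppe and 50 percent Iranian farmer-related; ASI about 25 percent Iranian farmer-related and 75 percent South Asian hunter-gatherer) and treats a present-day group as a plain two-way ANI/ASI mixture. The ANI fractions in the loop are hypothetical inputs chosen only to show the arithmetic, not measured values for any real group.

```python
from typing import Dict

# Rounded proportions from the revised model described in the text.
ANI = {"steppe": 0.50, "iranian_farmer": 0.50, "hunter_gatherer": 0.00}
ASI = {"steppe": 0.00, "iranian_farmer": 0.25, "hunter_gatherer": 0.75}


def deep_ancestry(ani_fraction: float) -> Dict[str, float]:
    """Split a group's ancestry into the three deep sources, assuming the
    group is a simple mixture of ANI (weight ani_fraction) and ASI."""
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    asi_fraction = 1.0 - ani_fraction
    # Each deep source contributes through both ANI and ASI, weighted by how
    # much of the group's ancestry comes from each.
    return {
        source: ani_fraction * ANI[source] + asi_fraction * ASI[source]
        for source in ANI
    }


# Hypothetical ANI fractions, used only to illustrate the arithmetic.
for a in (0.2, 0.5, 0.8):
    parts = deep_ancestry(a)
    print(a, {name: round(share, 3) for name, share in parts.items()})
```

Because both ANI and ASI carry Iranian farmer-related ancestry, every group on this simplified model has at least a quarter of its ancestry from that source, which is one way to see what "a major impact on India twice" amounts to.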

These results reveal a remarkably parallel tale of the prehistory of two similarly sized subcontinents of Eurasia: Europe and India. In both regions, farmers migrating from the core region of the Near East after nine thousand years ago (in Europe from Anatolia, and in India from Iran) brought a transformative new technology and interbred with the previously established hunter-gatherer populations to form new mixed groups between nine thousand and four thousand years ago. Both subcontinents were then affected by a second major migration with an origin in the steppe, in which Yamnaya pastoralists speaking an Indo-European language mixed with the farming populations they encountered along the way, forming in Europe the peoples associated with the Corded Ware culture and in India, eventually, the ANI. These populations of mixed steppe and farmer ancestry then mixed with the previously established farmers of their respective regions, forming the gradients of ancestry we see in both subcontinents today.

The Yamnaya, who the genetic data show were closely related to the source of the steppe ancestry in both India and Europe, are obvious candidates for spreading Indo-European languages to both these subcontinents of Eurasia. Analysis of population history in India provides an additional line of evidence. The Indian Cline was modeled as a simple mixture of two ancestral populations, the ANI and ASI. But when each Indian Cline group was tested in turn for whether it fits this model, some groups did not fit, in the sense of having a higher ratio of steppe-related to Iranian farmer-related ancestry than expected. All of these groups are in the Brahmin varna, whose traditional role in society is as priests and custodians of the ancient texts written in the Indo-European Sanskrit language, even though Brahmins made up only about 10 percent of the groups analyzed. A natural explanation is that the ANI were not a homogeneous population when they mixed with the ASI, but instead contained socially distinct subgroups with characteristic ratios of steppe to Iranian-related ancestry. The custodians of Indo-European language and culture were the ones with relatively more steppe ancestry, and because of the extraordinary strength of the caste system in preserving ancestry and social roles over generations, the ancient substructure of the ANI is still evident in some of today’s Brahmins thousands of years later. This finding provides yet another line of evidence for the steppe hypothesis, showing that not just Indo-European languages, but also Indo-European culture, as reflected in the religion preserved over thousands of years by Brahmin priests, was likely spread by peoples whose ancestors originated in the steppe.
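The logic of the misfit can be illustrated with the same rounded proportions from the revised ANI/ASI model (an assumption; this only sketches the reasoning and is not the statistical test used in the underlying study). If a group is a simple ANI/ASI mixture with ANI fraction a, its steppe share is 0.5a and its Iranian farmer-related share is 0.5a + 0.25(1 − a), so the ratio between them is fixed by a alone. The function name and the sample values of a are hypothetical.

```python
def expected_steppe_to_iranian_ratio(ani_fraction: float) -> float:
    """Steppe : Iranian-farmer ancestry ratio predicted for a simple ANI/ASI
    mixture, assuming ANI = 50% steppe + 50% Iranian farmer and
    ASI = 25% Iranian farmer + 75% hunter-gatherer (rounded figures)."""
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    steppe = 0.5 * ani_fraction
    iranian = 0.5 * ani_fraction + 0.25 * (1.0 - ani_fraction)
    # Algebraically this simplifies to 2a / (1 + a), which never exceeds 1.
    return steppe / iranian


# Hypothetical ANI fractions, used only to show how the predicted ratio grows.
for a in (0.25, 0.5, 0.75, 1.0):
    print(f"ANI fraction {a:.2f}: predicted steppe/Iranian ratio "
          f"{expected_steppe_to_iranian_ratio(a):.2f}")
```

On this simplified model, a group's steppe-to-Iranian ratio is tied to its overall ANI level, so a group whose ratio sits clearly above the value predicted for its ANI fraction cannot be a plain two-way mixture of a single homogeneous ANI and the ASI. That is the sense in which the Brahmin groups failed to fit.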

According to a recent book by David Reich, three very different possibilities are still on the table. One is that the people of the Indus Valley Civilization were largely unmixed descendants of the first Iranian-related farmers of the region and spoke an early Dravidian language. A second is that they were the ASI, already a mix of people related to Iranian farmers and South Asian hunter-gatherers, in which case they would also probably have spoken a Dravidian language. A third is that they were the ANI, already a mix of steppe and Iranian farmer-related ancestry, and thus would instead likely have spoken an Indo-European language.(6)

The image above, from a recent meta-analysis of Indus Valley Civilization data, details settlement patterns during the ancient Indus period and just after, when a host of factors, possibly including climate change, seem to have contributed to a reallocation of populations between types of settlement.

Another recent study makes suggestions regarding the Indus and the Bactria-Margiana Archaeological Complex.(7)

These scenarios have very different implications, but with ancient DNA, this and other great mysteries of the Indian past will no doubt be resolved before too long. A good example of such future possibilities, deserving an article all by itself, can be found here. Another (not yet peer-reviewed) article once more suggests that the first Indo-European language might indeed have arisen south of the Caucasus Mountains, spreading to other parts of Europe and Asia only as people migrated north from this region. The findings are currently available on bioRxiv; for a comment, see here.

The influence of the Indo-European languages across the wider region can be seen on the following map:


1. P. A. Underhill et al., “The Phylogenetic and Geographic Structure of Y-Chromosome Haplogroup R1a,” European Journal of Human Genetics 23 (2015): 124–31.

2. S. Perur, “The Origins of Indians: What Our Genes Are Telling Us,” Fountain Ink, December 3, 2013.

3. K. Bryc et al., “The Genetic Ancestry of African Americans, Latinos, and European Americans Across the United States,” American Journal of Human Genetics 96 (2015): 37–53.

4. L. G. Carvajal-Carmona et al., “Strong Amerind/White Sex Bias and a Possible Sephardic Contribution Among the Founders of a Population in Northwest Colombia,” American Journal of Human Genetics 67 (2000): 1287–95; G. Bedoya et al., “Admixture Dynamics in Hispanics: A Shift in the Nuclear Genetic Ancestry of a South American Population Isolate,” Proceedings of the National Academy of Sciences of the U.S.A. 103 (2006): 7234–39.

5. P. Moorjani et al., “Genetic Evidence for Recent Population Mixture in India,” American Journal of Human Genetics 93 (2013): 422–38.

6. David Reich, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past (2018). Note, however, that a critique of Reich is presented here.

7. On how, among other things, the later frontier between Bactria and Sogdiana changed between the Iron Age and the Kushan period, see also.