Yes, and have you seen the k5 light green component as well? There are no dips or spikes in its diversity across the geographic spectrum.
Which means the migration/spread of this component took place more than 12,500 years ago.
With the k5 light green and the k6 dark green (discussed already), we cover almost the entire gene pool of South Asia.
Yes, a very minor presence. Have you compared these components against the green ones? Here's what the paper says:
What does it mean?
Simply put - Green ones are older than blue ones and that the green ones have been having long term local genetic evolution in South Asia before blues.
Another thing about the blue components (from the paper):
Lastly, while the blue ones appear only scantly in South Asia, the green ones (dominant in South Asia) are found with reasonably strong frequency even in the regions of the blue components, that is, the Caucasus, Central Asia, the Middle East and Europe.
And we know when the light green spread, right? More than 12,500 years ago.
Regards,
Virendra
I think there could be an issue with interpretation.
This is what the paper says:
Our simulations show that differences in haplotype diversity between source and recipient populations can be detected even for migration events that occurred 500 generations ago (∼12,500 years ago assuming one generation to be 25 years).
The way I see it, their simulations are good for detecting migrations up to 12,500 years ago. I would not conclude that their simulations (based on current or recent datasets) necessarily represent migrations that took place 12,500 years ago. The figure is a limit on how far back the method can see, not a date for any particular event.
Secondly, yes, the green portion is large in India, and with more intra-haplotype diversity there, it would indicate migration out of India. By the same token, the large blue portions (two shades of blue, actually) outside India and the small portions in India indicate migration of Russians and Caucasians into India.
Thirdly, Principal Component (PC) Analysis takes n-dimensional datapoints and fixes the first axis along the direction of greatest variance (1st PC), then takes the direction of next greatest variance that is orthogonal to the 1st PC (2nd PC). If we look at the two PCs individually, we will see that, as two examples, (1) East Asians are close to the Europeans, and (2) Caucasians are close to the Gujaratis. Of course, when we reduce an n-dimensional dataset to two dimensions, we are expected to see such results. Such results are valuable, but not comprehensive. One reason why people do PCA and stick to 1 or 2 dimensions is that it is easy to fit a Normal Distribution in 1 or 2 dimensions. In 3 or more, we have the curse of dimensionality; also, in higher dimensions the tails of a Gaussian hold more of the probability mass than the region around the mode, and this is still a matter of research.
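To make the PCA point concrete, here is a rough sketch in plain Python. The data are made up (three random axes with different spreads, nothing to do with the paper's dataset), and the helper names `covariance`, `top_component` and so on are my own. It finds the 1st PC by power iteration, removes it, finds the 2nd PC, and then checks how much of the total variance the two PCs keep. Whatever is left over is what a 2-D plot cannot show.

```python
import math
import random

def covariance(rows):
    """Sample covariance matrix of rows (one row per individual)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_component(cov, iters=500):
    """Power iteration: returns (variance, unit direction) of the dominant axis."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, mat_vec(cov, v)), v

# Made-up 3-D data: large spread on axis 0, medium on axis 1, small on axis 2.
rng = random.Random(0)
rows = [[rng.gauss(0, 3), rng.gauss(0, 2), rng.gauss(0, 0.5)] for _ in range(200)]

cov = covariance(rows)
var1, pc1 = top_component(cov)
# Remove the 1st PC from the covariance so the next dominant direction is the 2nd PC.
deflated = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(3)] for i in range(3)]
var2, pc2 = top_component(deflated)

total_var = sum(cov[i][i] for i in range(3))
explained = (var1 + var2) / total_var
print("PC1 . PC2 =", round(dot(pc1, pc2), 9))          # should be close to zero
print("share of variance kept by two PCs =", round(explained, 3))
```

With real genotype data the number of dimensions is far larger than three, so the share of variance left outside the first two PCs can be much bigger than in this toy case.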
Finally, I have to disagree with this conclusion:
Simply put - Green ones are older than blue ones and that the green ones have been having long term local genetic evolution in South Asia before blues.
No, it does not prove that either blue or green is any older than the other, and neither does the paper say so. The only thing we know is that the dataset is reliable for conclusions about migrations dating back up to 12,500 years, and this is not haplotype specific.
Worth Mentioning:
There is one method to compute distances or similarities between multi-dimensional datapoints without reducing them to only two components, and that is the Mahalanobis Distance, invented by Prasanta Chandra Mahalanobis, a statistician, anthropologist, and founder of the ISI (Indian Statistical Institute).
The advantage of the Mahalanobis Distance is that it takes all the dimensions into account: even if two dimensions are the most significant individually, the combined effect of the other dimensions might actually outweigh the contributions of the two major dimensions when we compare two datasets.
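Here is a small sketch of that effect in plain Python. The covariance matrix and the two "population" vectors are invented for illustration, and `solve` and `mahalanobis` are my own helper names. The squared distance is (x - y)^T S^-1 (x - y), where S is the covariance matrix. Two axes have large variance (the "major" dimensions) and two have small variance. The two points differ modestly on every axis, yet most of the distance comes from the minor axes.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^-1 (x - y)), without forming the inverse explicitly."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, z)))

# Hypothetical covariance: variances 9 and 4 on the major axes, 0.25 on the minor ones.
cov = [[9.0, 0.0, 0.0, 0.0],
       [0.0, 4.0, 0.0, 0.0],
       [0.0, 0.0, 0.25, 0.0],
       [0.0, 0.0, 0.0, 0.25]]
pop_a = [0.0, 0.0, 0.0, 0.0]
pop_b = [1.0, 0.0, 0.5, 0.5]

d_full = mahalanobis(pop_a, pop_b, cov)
# Same comparison using only the two major dimensions.
cov_major = [row[:2] for row in cov[:2]]
d_major = mahalanobis(pop_a[:2], pop_b[:2], cov_major)

print("all four dimensions:", round(d_full, 4))
print("two major dimensions only:", round(d_major, 4))
```

In this toy setup the two major dimensions contribute only 1/9 to the squared distance, while the two minor ones contribute 1 each, so a comparison restricted to the major dimensions would badly understate how far apart the two points are.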