The Unz Review • An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog


Pakistani honor guard

A few days ago I suggested that Dr. Daniel MacArthur might have South Asian ancestry. When confronted with a surprise, the best option is to stick with your prior assumption unless the surprise is powerful enough for you to “update” your model. After a few more days of analysis I will update: I do think Dan MacArthur has South Asian ancestry. Dienekes dug further and noticed hallmarks of “Ancestral South Indian” ancestry along roughly the first two-thirds of chromosome 10. Remember, though, that this genomic region is only half South Asian: of Dan’s two copies of chromosome 10, one carries the South Asian segment and the other is European.

In any case, some people have asked: perhaps MacArthur has Romani heritage? I’m skeptical of this, partly because:

1) there weren’t that many Romani in Britain in the 19th century

2) the British Romani are already very highly admixed

Another friend, himself a population genomicist, was skeptical that such a long segment could have survived unbroken by recombination over the generations. My only moderately informed answer is this: we’d notice only the long segments, because a very small region of ‘exotic’ ancestry embedded within the dominant ancestral component probably wouldn’t show up in some of these tests (or we’d assume it was noise). Dan has another segment of South Asian ancestry, but it is much smaller. There may be other such regions that we could find with better reference populations.
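To put rough numbers on the recombination concern, here is a minimal sketch under a deliberately simple model, which is my assumption rather than anything from the analysis above: a single admixture pulse g generations ago, crossovers along the line of descent arriving as a Poisson process at g per Morgan (so tract lengths are roughly exponential with mean 100/g cM), and chromosome ends ignored. Treating ~80 Mb as ~80 cM uses the genome-wide rule of thumb of about 1 cM per Mb, which varies a lot locally. The function names are hypothetical.

```python
import math


def mean_tract_length_cm(generations: int) -> float:
    """Mean ancestry tract length (cM) after `generations` of recombination,
    assuming tract lengths are exponential with rate `generations` per Morgan."""
    return 100.0 / generations


def prob_tract_at_least(length_cm: float, generations: int) -> float:
    """P(a single tract is >= length_cm) under the same exponential model."""
    return math.exp(-generations * length_cm / 100.0)


if __name__ == "__main__":
    g = 6  # hypothetical: about six generations back, roughly the mid-19th century
    segment_cm = 80.0  # hypothetical: ~80 Mb treated as ~80 cM at ~1 cM/Mb
    print(f"mean tract length: {mean_tract_length_cm(g):.1f} cM")
    print(f"P(tract >= {segment_cm:.0f} cM): {prob_tract_at_least(segment_cm, g):.4f}")
```

Under these assumptions a single tract that long is rare (under 1%) but not impossible, which fits both the skepticism and the point that only the longest segments get noticed.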

Here’s what I tentatively want to do with Dan’s data next. First, take the 80 Mb or so that has South Asian ancestry and phase it. That way I’d have a South Asian chromosome and a European one, and could look for matches to the South Asian one alone. Being busy, I didn’t have time to do this. What I did have time for was narrowing the chromosomal region under consideration and running an IBS distance analysis against a private data set I have. This is a crude analysis, but not always an uninformative one. Looking at the relationships, I now conclude that Dan MacArthur probably does not have Romani ancestry. Why? Because the Romani are of Northwest Indian heritage, and MacArthur’s match pattern using the diploid genotype (i.e., South Asian + European) does not fit what I would expect from such a combination.
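The post doesn’t say which tool computed the IBS distances. As a sketch of the usual definition (my assumption, not necessarily what was run here): with genotypes coded as 0, 1 or 2 copies of an allele, two samples share 2 - |a - b| alleles identical by state at each SNP, so the distance is the mean of |a - b|/2 over SNPs called in both. The standardized column in the table looks consistent with rescaling each distance so the most distant sample scores 100, and `rank_references` does the same; the function names and toy genotypes are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # allele count 0, 1 or 2; None means a missing call


def ibs_distance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """1 - (mean IBS allele sharing / 2) over SNPs called in both samples."""
    diffs = [abs(x - y) for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("no SNPs called in both samples")
    return sum(diffs) / (2 * len(diffs))


def rank_references(
    target: Sequence[Genotype],
    references: Sequence[Tuple[str, Sequence[Genotype]]],
) -> List[Tuple[str, float, float]]:
    """(label, distance, standardized) rows sorted nearest-first; the
    standardized value rescales distances so the farthest reference is 100."""
    dists = [(label, ibs_distance(target, geno)) for label, geno in references]
    d_max = max(d for _, d in dists)
    scale = 100.0 / d_max if d_max > 0 else 0.0
    return sorted(((label, d, d * scale) for label, d in dists), key=lambda row: row[1])


if __name__ == "__main__":
    dan = [0, 1, 2, None, 1]  # toy genotypes, not real data
    refs = [("CEU", [0, 1, 2, 2, 1]), ("GIH", [1, 1, 2, 0, 0]), ("YRI", [2, 0, 0, 1, 2])]
    for label, d, std in rank_references(dan, refs):
        print(f"{label}\t{d:.3f}\t{std:.1f}")
```

Because the distance averages over the diploid genotype, a sample with one South Asian and one European copy lands between the two source populations, which is what the match-pattern argument relies on.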

The full table is below. To me, the fact that he has so many matches with Northwest Indian populations is evidence that his South Asian ancestry was not Northwest Indian; otherwise he would be matching Utah whites (the CEU samples) more often. Rather, someone with a mix of more conventional South Asian ancestry and European ancestry often resembles some of the less South Asian populations of South Asia (e.g., the Brahui) in these crude measures. In fact, one of the closest matches to Dan’s IBS profile is my own mother. She is a rather vanilla ethnic Bengali, so I think there is a strong chance that his Indian ancestry is similar. This weak genetic evidence isn’t really my primary reason, though. The British East India Company operated out of Bengal for much of its history, and there are simply a lot of Bengalis.

There’s a lot more that can be done here. Since I don’t have time, here’s the pedigree file if anyone wants to play with it (Dan is DGM001).

Population IBS distance from Dan Standardized distance (most distant sample = 100)
Brahui 0.253 81.268
Burusho 0.257 82.736
Razib’s Mother 0.258 82.783
CEU 0.258 82.993
Burusho 0.258 83.024
CEU 0.26 83.547
Sakilli 0.26 83.555
Brahui 0.261 83.831
Brahui 0.261 83.857
GIH 0.261 83.955
CEU 0.261 83.972
CEU 0.261 83.985
CEU 0.262 84.043
North Kannadi 0.262 84.169
CEU 0.262 84.207
CEU 0.262 84.318
CEU 0.262 84.33
CEU 0.263 84.391
Paniya 0.263 84.408
CEU 0.263 84.437
CEU 0.263 84.445
CEU 0.263 84.488
CEU 0.263 84.606
CEU 0.263 84.609
CEU 0.264 84.691
Brahui 0.264 84.709
CEU 0.264 84.752
CEU 0.264 84.764
Brahui 0.264 84.822
GIH 0.264 84.826
Burusho 0.264 84.841
CEU 0.264 84.898
CEU 0.264 84.975
North Kannadi 0.264 84.992
CEU 0.265 85.087
Paniya 0.265 85.212
CEU 0.265 85.226
CEU 0.265 85.25
CEU 0.265 85.25
CEU 0.265 85.278
CEU 0.265 85.299
North Kannadi 0.265 85.3
Burusho 0.265 85.309
Burusho 0.266 85.328
CEU 0.266 85.363
CEU 0.266 85.409
North Kannadi 0.266 85.412
CEU 0.266 85.436
Burusho 0.266 85.446
Bene Israel 0.266 85.508
CEU 0.266 85.521
GIH 0.266 85.618
GIH 0.267 85.661
CEU 0.267 85.696
CEU 0.267 85.722
CEU 0.267 85.732
Brahui 0.267 85.777
GIH 0.267 85.793
CEU 0.267 85.799
CEU 0.267 85.816
Cochin Jews 0.267 85.85
CEU 0.267 85.943
Brahui 0.268 85.996
CEU 0.268 86.005
Cochin Jews 0.268 86.011
CEU 0.268 86.08
CEU 0.268 86.115
CEU 0.268 86.18
GIH 0.268 86.229
Cochin Jews 0.268 86.234
CEU 0.268 86.244
Burusho 0.268 86.265
CEU 0.268 86.277
CEU 0.268 86.278
CEU 0.269 86.288
CEU 0.269 86.291
CEU 0.269 86.318
CEU 0.269 86.325
CEU 0.269 86.326
GIH 0.269 86.327
CEU 0.269 86.329
CEU 0.269 86.354
CEU 0.269 86.387
CEU 0.269 86.463
CEU 0.269 86.515
CEU 0.269 86.517
CEU 0.269 86.55
CEU 0.27 86.609
Paniya 0.27 86.682
CEU 0.27 86.687
CEU 0.27 86.696
CEU 0.27 86.717
CEU 0.27 86.733
Sakilli 0.27 86.74
CEU 0.27 86.866
Malayan 0.27 86.879
North Kannadi 0.27 86.883
CEU 0.271 86.937
Brahui 0.271 86.952
Burusho 0.271 86.956
CEU 0.271 86.957
CEU 0.271 86.977
North Kannadi 0.271 86.995
GIH 0.271 87.018
CEU 0.271 87.042
CEU 0.271 87.066
CEU 0.271 87.07
Brahui 0.271 87.09
Bene Israel 0.271 87.094
Sakilli 0.271 87.141
CEU 0.271 87.2
CEU 0.271 87.24
North Kannadi 0.272 87.253
CEU 0.272 87.297
Burusho 0.272 87.307
CEU 0.272 87.327
GIH 0.272 87.353
CEU 0.272 87.355
Cochin Jews 0.272 87.381
CEU 0.272 87.384
CEU 0.272 87.5
CEU 0.272 87.535
CEU 0.273 87.594
Malayan 0.273 87.676
CEU 0.273 87.702
CEU 0.273 87.741
Burusho 0.273 87.806
CEU 0.273 87.846
Cambodians 0.274 87.932
North Kannadi 0.274 87.951
CEU 0.274 87.951
Burusho 0.274 88.03
CEU 0.274 88.047
CEU 0.274 88.081
CEU 0.274 88.089
CEU 0.274 88.101
CEU 0.274 88.179
CEU 0.274 88.19
North Kannadi 0.275 88.243
CEU 0.275 88.32
GIH 0.275 88.325
CEU 0.275 88.349
Brahui 0.275 88.393
CEU 0.275 88.402
CEU 0.275 88.457
Bene Israel 0.276 88.552
CEU 0.276 88.577
CEU 0.276 88.603
CEU 0.276 88.647
CEU 0.276 88.7
CEU 0.276 88.729
CEU 0.276 88.814
CEU 0.276 88.85
Brahui 0.276 88.855
CEU 0.277 88.923
GIH 0.277 88.99
Paniya 0.277 89.082
CEU 0.277 89.118
CEU 0.277 89.15
CEU 0.277 89.151
CEU 0.277 89.17
CEU 0.278 89.184
Cambodians 0.278 89.208
Cambodians 0.278 89.233
Cambodians 0.278 89.383
CEU 0.278 89.45
CEU 0.278 89.493
Cambodians 0.279 89.522
CEU 0.279 89.595
CEU 0.279 89.679
CEU 0.279 89.753
CEU 0.279 89.762
CEU 0.279 89.807
Cambodians 0.28 89.942
GIH 0.28 90.085
CEU 0.281 90.178
Brahui 0.281 90.364
Cambodians 0.282 90.543
Cambodians 0.282 90.559
Cambodians 0.282 90.77
Cambodians 0.283 90.898
CEU 0.283 90.956
CEU 0.284 91.316
CHD 0.289 92.952
Sakilli 0.29 93.103
Bene Israel 0.29 93.122
CHD 0.291 93.619
CHD 0.291 93.663
CHD 0.293 94.125
CHD 0.293 94.248
CHD 0.294 94.451
CHD 0.294 94.629
CHD 0.296 94.965
CHD 0.296 95.279
Yorubas 0.297 95.298
CHD 0.297 95.368
CHD 0.297 95.438
CHD 0.297 95.441
Yorubas 0.297 95.567
CHD 0.298 95.678
CHD 0.298 95.828
CHD 0.299 96.032
CHD 0.299 96.127
CHD 0.3 96.349
CHD 0.3 96.403
CHD 0.3 96.443
CHD 0.3 96.508
CHD 0.3 96.523
CHD 0.3 96.533
CHD 0.301 96.575
CHD 0.301 96.598
CHD 0.301 96.624
CHD 0.301 96.625
CHD 0.301 96.738
CHD 0.301 96.758
CHD 0.301 96.869
Yorubas 0.302 97.106
CHD 0.303 97.37
CHD 0.303 97.41
Yorubas 0.304 97.681
CHD 0.304 97.713
CHD 0.304 97.747
Yorubas 0.304 97.829
CHD 0.304 97.838
CHD 0.305 98.106
CHD 0.306 98.309
Yorubas 0.307 98.499
CHD 0.307 98.546
CHD 0.307 98.547
CHD 0.307 98.606
CHD 0.307 98.764
CHD 0.307 98.78
CHD 0.307 98.803
Yorubas 0.308 98.947
Yorubas 0.308 99.03
Yorubas 0.309 99.411
Yorubas 0.309 99.417
CHD 0.309 99.452
CHD 0.31 99.624
Yorubas 0.311 100
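The linked pedigree file isn’t reproduced here. Assuming it is a standard PLINK .ped file (six leading columns: family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two allele columns per SNP, with 0 marking a missing allele) and that DGM001 is Dan’s individual ID, something like the following would pull out his genotypes. SNP names and positions live in the companion .map file, which this sketch doesn’t read; the function name and sample lines are hypothetical.

```python
from typing import Iterable, List, Tuple


def read_ped_individual(lines: Iterable[str], individual_id: str) -> List[Tuple[str, str]]:
    """Return one individual's allele pairs from PLINK .ped lines."""
    for line in lines:
        fields = line.split()
        # Column 2 (index 1) of a .ped line is the individual ID.
        if len(fields) < 6 or fields[1] != individual_id:
            continue
        alleles = fields[6:]  # skip FID, IID, father, mother, sex, phenotype
        if len(alleles) % 2:
            raise ValueError(f"odd number of allele columns for {individual_id}")
        return [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    raise KeyError(individual_id)


if __name__ == "__main__":
    sample = [
        "FAM1 DGM001 0 0 1 -9 A G C C 0 0",  # toy line; "0 0" is a missing call
        "FAM2 OTHER1 0 0 2 -9 A A T C G G",
    ]
    print(read_ped_individual(sample, "DGM001"))
```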
 

My initial inclination in this post was to discuss a recent ordering snafu that left many of my friends quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature, I felt I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense given their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me, these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms toward assigning ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).
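The inconsistency is simple arithmetic. A child’s expected ancestry fraction is the mean of the two parents’ fractions, since each parent transmits half the genome. The other parent’s ancestry isn’t given in the text, so this minimal sketch takes the most extreme case (100% European) as an upper bound; the realized value scatters around this expectation, and the function name is mine.

```python
def expected_child_fraction(parent1: float, parent2: float) -> float:
    """Expected ancestry fraction in a child: each parent transmits half its genome."""
    return (parent1 + parent2) / 2


# ~60% European for the parent in the text; 100% for the other parent is an
# assumed worst case, giving the largest European fraction the child could expect.
upper_bound = expected_child_fraction(0.60, 1.00)
print(f"expected upper bound: {upper_bound:.0%}")  # expected upper bound: 80%
```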

Here’s my daughter’s new chromosome painting:

One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But most of the customers are white, and presumably the reference populations they used (which come from customers) are also white. Though there are plenty of public-domain non-white data sets they could have used, I assume they’d prefer to eat their own dog food and use their own data in this case. That’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. It’s not telling me, as a South Asian, very much, but it’s not telling me ludicrous things anymore either!

As for what it leaves out, I am curious why this new feature rates my family as only ~3% East Asian when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is an artifact rather than real admixture. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians, so the signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a higher value for my mother, and that holds here too. It’s just that the values are implausibly low.

In any case, that’s not the strangest thing I saw. I was clicking through the profiles of people with whom I had “shared” genomes, and I stumbled upon this:

As you can guess from the screenshot, this is Daniel MacArthur’s profile. According to it, ~25% of chromosome 10 is South Asian! At first blush this seemed totally nonsensical to me, so I clicked through other profiles of people of similar Northern European background…and I didn’t see anything equivalent.

What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations: the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD), the last basically serving as a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she was typed on the same Illumina chip and is of similar ethnic background (i.e., very white). Note that only 28,000 SNPs remained in the merged data set, but 10,000 SNPs are usually more than sufficient for model-based clustering of intercontinental-scale variation.
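Merging a consumer genotype with HapMap panels means keeping only SNPs typed in every data set, which is why so few survive on a single chromosome. A minimal sketch, assuming each data set has already been loaded into a dict mapping rsID to (chromosome, genotype); real merges also have to reconcile strand and allele coding, which this skips. The function name and toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Dataset = Dict[str, Tuple[str, str]]  # rsID -> (chromosome, genotype string)


def merge_on_shared_snps(
    datasets: List[Dataset], chromosome: str = "10"
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Keep SNPs present in every data set and on the requested chromosome."""
    shared = set.intersection(*(set(d) for d in datasets))
    # Sorted by rsID for reproducibility (not by physical position).
    keep = sorted(rs for rs in shared if all(d[rs][0] == chromosome for d in datasets))
    return keep, [{rs: d[rs][1] for rs in keep} for d in datasets]


if __name__ == "__main__":
    personal = {"rs1": ("10", "AG"), "rs2": ("10", "CC"), "rs3": ("2", "TT")}
    hapmap_ceu = {"rs1": ("10", "AA"), "rs3": ("2", "TC"), "rs4": ("10", "GG")}
    snps, merged = merge_on_shared_snps([personal, hapmap_ceu])
    print(snps, merged)
```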

I did two things:

1) I ran ADMIXTURE at K = 3, unsupervised

2) I ran an MDS to visualize the genetic variation in multiple dimensions
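The post doesn’t say which software produced the MDS. For illustration only, here is classical (Torgerson) MDS on a pairwise distance matrix such as IBS distances: square the distances, double-center them, and take the leading eigenvectors scaled by the square roots of their eigenvalues. It uses plain power iteration so it needs no third-party packages (with NumPy you would use `numpy.linalg.eigh` instead) and assumes the leading eigenvalues are positive, as they are for Euclidean distances. Names and the toy points are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def classical_mds(dist: Matrix, dims: int = 2, iters: int = 1000) -> Matrix:
    """Classical MDS coordinates (n x dims) from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, with J the centering matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the current leading eigenvector of B.
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no remaining positive structure; leave zeros
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


if __name__ == "__main__":
    pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]  # toy points
    d = [[math.dist(p, q) for q in pts] for p in pts]
    for row in classical_mds(d):
        print([round(x, 3) for x in row])
```

The toy points are truly two-dimensional, so two MDS dimensions reproduce their pairwise distances up to rotation and reflection; with real genotype data the first two dimensions capture only part of the variation.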

Before I go on, here is what I found: these methods supported the inference from 23andMe. On chromosome 10, Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the average (median) values in tabular format, with MacArthur and his wife presented for comparison.

ADMIXTURE results for chromosome 10
Population K1 (South Asian) K2 (East Asian) K3 (European)
CEU 0.04 0.02 0.93
GIH 0.87 0.05 0.08
CHD 0.01 0.97 0.01
Daniel MacArthur 0.29 0.07 0.64
Ilana Fisher 0.01 0.06 0.94
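The medians in the table could be computed along these lines, assuming ADMIXTURE’s usual .Q output (one row per individual, K space-separated ancestry proportions, in the same order as the input samples) and a separately supplied list of population labels, since the .Q file doesn’t carry them. The function name and toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence


def median_ancestry_by_population(
    q_lines: Sequence[str], populations: Sequence[str]
) -> Dict[str, List[float]]:
    """Per-population median of each ADMIXTURE cluster proportion."""
    if len(q_lines) != len(populations):
        raise ValueError("need exactly one population label per .Q row")
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for line, pop in zip(q_lines, populations):
        grouped[pop].append([float(x) for x in line.split()])
    # zip(*rows) turns per-individual rows into per-cluster columns.
    return {pop: [median(col) for col in zip(*rows)] for pop, rows in grouped.items()}


if __name__ == "__main__":
    q = ["0.10 0.00 0.90", "0.05 0.05 0.90", "0.00 0.10 0.90", "0.85 0.05 0.10"]
    labels = ["CEU", "CEU", "CEU", "GIH"]
    print(median_ancestry_by_population(q, labels))
```

Medians rather than means keep a few unusual individuals, like the CEU samples with elevated South Asian values, from pulling a population’s summary around.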

You probably want a distribution. None of the non-founder CEU individuals went above 20% South Asian. It did surprise me, though, that a few were that high, which makes it more plausible to me that MacArthur’s results on chromosome 10 are a fluke:
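One simple way to say how unusual Dan’s ~0.29 is against a reference distribution such as the CEU values is to count how many reference individuals are at least that high. This sketch uses the common (count + 1) / (n + 1) convention so the estimate is never exactly zero; the reference values below are made up for illustration and are not the CEU results.

```python
from typing import Sequence


def empirical_tail(value: float, reference: Sequence[float]) -> float:
    """Smoothed fraction of reference values >= value: (count + 1) / (n + 1)."""
    exceed = sum(1 for r in reference if r >= value)
    return (exceed + 1) / (len(reference) + 1)


if __name__ == "__main__":
    toy_ceu = [0.00, 0.01, 0.03, 0.05, 0.08, 0.12, 0.17, 0.19]  # made-up values
    print(f"{empirical_tail(0.29, toy_ceu):.3f}")  # 0.111
```

When no reference individual reaches the observed value, the estimate is just 1/(n + 1), so a small reference panel limits how extreme any single result can look.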

And here’s the MDS with the two largest dimensions:

Again, it’s evident that Dan’s chromosome 10 is shifted toward South Asians. If I had more time right now, I’d probably extract that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t, so I checked the results from the Interpretome instead. I cranked up the settings to reduce the noise, so that it would only spit out the most robust and significant results. As you can see, chromosome 10 again comes up as the one that isn’t quite like the others.

Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall, his parents are immigrants from the United Kingdom, and it isn’t unheard of for white Britons to have South Asian ancestry dating back to the 19th century. To be totally honest, though, I’m rather agnostic about all this right now. This genotype has been “out” for years, so how is it that no one has noticed this peculiarity? Perhaps everyone was looking at the genome-wide average, and this just doesn’t rise to the level of notice. What I really want to do is look at the distribution across all chromosomes and see how Daniel MacArthur’s chromosome 10 stacks up. It might be a random act of nature yet.

Also, I should add that ~1.5% South Asian ancestry would be consistent with one of MacArthur’s great-great-great-great-grandparents being Indian. Assuming 25-year generation times, that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that the real date of admixture needs to be pushed one generation back or one generation forward.
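The arithmetic behind the ~1.5% figure: a single ancestor g generations back contributes 1/2^g of the genome in expectation, and 1/2^6 ≈ 1.56%, six generations being a great-great-great-great-grandparent. A minimal sketch using the 25-year generation time from the text; the helper names are mine, and the result is only an expectation around which the real share varies widely.

```python
import math


def expected_share(generations: int) -> float:
    """Expected genome share from one ancestor `generations` back (halves each generation)."""
    return 0.5 ** generations


def generations_for_share(share: float) -> int:
    """Generation count whose expected single-ancestor share is nearest on a log2 scale."""
    return round(math.log2(1 / share))


def ancestor_label(generations: int) -> str:
    """1 -> parent, 2 -> grandparent, 3 -> great-grandparent, and so on."""
    if generations == 1:
        return "parent"
    return "great-" * (generations - 2) + "grandparent"


if __name__ == "__main__":
    g = generations_for_share(0.015)  # ~1.5% South Asian, from the text
    print(g, ancestor_label(g), f"{expected_share(g):.2%}", f"~{g * 25} years back")
```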

 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"