Written by: Stephen Hsu
Primary Source: Information Processing
To quote James Lee, the first author listed below: “Shock and Awe” for those who doubt that cognitive ability is influenced by genetic variants.
See work from a year ago: ~100 hits from 300k individuals. Now ~600 hits from 750k. (SNPs associated with EA are likely to also be associated with cognitive ability — see figure at link above.)
47th Behavior Genetics Annual Meeting, Oslo, Norway
Genetic factors are estimated to account for at least 20% of the variation across individuals for educational attainment (Rietveld et al., 2013). The results of the latest GWAS for educational attainment identified 74 genome-wide significant loci for educational attainment (Okbay et al., 2016). Here, in one of the largest GWAS to date, we increase our sample to nearly 750,000 individuals, and we identify over 600 genome-wide significant loci associated with the number of years of schooling completed. Note that at the time of presentation, we will likely have updated our meta-analysis to include over 1,000,000 individuals.
In this presentation, I will focus on the biological implications of the GWAS results. At the time of writing, 1,656 genes are significantly prioritized, a more than 10-fold increase since our previous report (Okbay et al., 2016). The newly significant results reinforce the biological theme of prenatal brain development and also bring to the foreground new themes that shed light on the biological underpinnings of cognitive performance and other traits affecting educational attainment.
James Lee (University of Minnesota – Twin Cities), Aysu Okbay (Free University Amsterdam), Robbee Wedow (University of Colorado – Boulder), Edward Kong (Harvard University), Patrick Turley (Broad Institute of MIT and Harvard), Meghan Zacher (Harvard University), Kevin Thom (New York University), Anh Tuan Nguyen Viet (University of Southern California), Omeed Maghzian (Harvard University, NBER), Richard Karlsson Linnér (Vrije Universiteit Amsterdam), Matthew Robinson (The University of Queensland), Social Science Genetic Association Consortium (NA), Peter Visscher (The University of Queensland), Daniel Benjamin (University of Southern California), David Cesarini (New York University)
Note that the data here have only been analyzed using summary statistics from each sub-cohort. More powerful methods may soon become available:
One of the difficulties in genomics is that when DNA donors are consented for a study, the agreements generally do not allow sharing (aggregation) of genomic data across multiple studies. This leads to isolated silos of data that can’t be fully shared. However, computations can be performed on one silo at a time, with the results (“summary statistics”) shared within a larger collaboration. Most of the leading GWAS collaborations (e.g., GIANT for height, SSGAC for cognitive ability) rely on shared statistics. Simple regression analysis (one SNP at a time) can be conducted using just summary statistics, but more sophisticated algorithms cannot. These more sophisticated methods can generate a better phenotype predictor, using less data, than a SNP by SNP analysis.
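To make the "summary statistics" idea concrete, here is a minimal sketch of how per-silo results for a single SNP can be pooled without sharing any genotypes. It uses standard inverse-variance-weighted (fixed-effect) meta-analysis, which is one common way to do this; it is not claimed to be the exact pipeline used by any particular consortium. The function name and the cohort numbers are hypothetical.

```python
import math

def meta_analyze(cohort_stats):
    """Pool one SNP's per-cohort (beta, standard_error) pairs.

    Each cohort runs its own single-SNP regression and shares only the
    estimated effect and its standard error. Cohorts are weighted by
    1 / se^2, so larger (more precise) cohorts count for more.
    """
    weights = [1.0 / se ** 2 for _, se in cohort_stats]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, cohort_stats)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value; erfc keeps precision for very small p-values.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value

if __name__ == "__main__":
    # Hypothetical summary statistics from three silos (illustration only).
    cohorts = [(0.020, 0.010), (0.015, 0.008), (0.025, 0.012)]
    beta, se, z, p_value = meta_analyze(cohorts)
    print(f"pooled beta={beta:.4f} se={se:.4f} z={z:.2f} p={p_value:.2e}")
```

The pooled standard error is smaller than that of any single cohort, which is why combining silos yields more genome-wide significant hits. Joint models that fit many SNPs at once generally need more than these per-SNP numbers, which is the limitation described above.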
A successful implementation like the one described at the link above could produce many (several times!) more hits and significantly more variance accounted for by corresponding predictors. Stay tuned!
Note Added: I’m getting lots of questions about how to interpret these results, so here are some comments.
1. I predicted ~10k variants would account for most of the heritability due to common SNPs (i.e., about 50% of total variance; allowing a predictor which correlates ~0.7 with actual cognitive ability). The rate of discovery of genome-wide significant hits and corresponding variance accounted for seems consistent with this prediction. Genetic associations are most easily discovered for variants which are common (e.g., have a minor allele frequency of ~0.5, not 0.05) and have large effect sizes. But alleles with this combination of properties are rare. As statistical power increases, one starts to discover more and more variants of lower frequency and/or smaller effect size. A reasonable guess at the genetic architecture suggests a higher density of such variants, and is consistent with an accelerating rate of discovery of SNP hits (~100 hits from 300k individuals, ~600 hits from 750k). There are more efficient methods that, I believe, would discover nearly all the variants given a sample size of ~1M well-phenotyped individuals. But these methods require more than just summary statistics.
I made a similar prediction of ~10k variants for height, and our (unpublished) genomic prediction results make me fairly confident that this will turn out to be correct. We now have moderately good height predictors and they are getting better very fast. That ~10k variants will turn out to be responsible for most of the variation in cognitive ability is still at a somewhat lower confidence level.
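The arithmetic behind comment 1 can be sketched in a few lines. A predictor capturing 50% of the variance correlates sqrt(0.5) ≈ 0.71 with the trait. For discovery, a standard approximation is that a SNP with minor allele frequency p and per-allele effect beta (in trait SD units) explains 2p(1-p)·beta² of the variance, and its test statistic is roughly normal with mean sqrt(N × variance explained). The function names and the effect size of 0.02 SD are assumptions chosen for illustration, not values from the study.

```python
import math
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def predictor_correlation(variance_explained):
    """Correlation between predictor and trait, given variance captured."""
    return math.sqrt(variance_explained)

def snp_variance_explained(maf, beta):
    """Variance explained by one additive SNP (trait in SD units)."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2

def detection_power(n, maf, beta, alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a two-sided single-SNP z-test at level alpha.

    Normal approximation, reasonable when the SNP explains a tiny
    fraction of the variance.
    """
    norm = NormalDist()
    z_crit = -norm.inv_cdf(alpha / 2.0)
    shift = math.sqrt(n * snp_variance_explained(maf, beta))
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

if __name__ == "__main__":
    print(f"correlation at 50% variance: {predictor_correlation(0.5):.3f}")
    for n in (300_000, 750_000):
        for maf in (0.5, 0.05):
            power = detection_power(n, maf, beta=0.02)  # hypothetical effect
            print(f"N={n:>7} MAF={maf:<4} power={power:.3f}")
```

Under these assumed numbers, the common variant is detected almost surely at either sample size, while the rarer variant only becomes detectable as N grows. That is the sense in which larger samples open up a denser region of the genetic architecture.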
2. People are still confused about how many + variants above the mean in the population are required to make a “genius” (or super-genius). I managed to compress the explanation enough to fit in a tweet:
Flip coin 10000 times. 5000 + sqrt(10000)/2 = 5050 heads is +1SD outcome. 5100 is +2SD, etc. sqrt(N) << N for N large. Binomial~Normal dist.
You can see that even if cognitive ability is controlled by ~10k variants, flipping only ~100 of them is enough to cause a big difference in actual intelligence. Flipping a few hundred could get us to super-geniuses beyond anything in human history.
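The coin-flip numbers in the tweet can be checked directly. This is a minimal sketch of the idealized model only: N independent variants, each "+" with probability 0.5 and each contributing equally, so the count of "+" variants is binomial with standard deviation sqrt(N)/2. Real variants differ in frequency and effect size, and the shifts below are in units of the SD of the "+" count, not of the measured trait. The function names are hypothetical.

```python
import math

def heads_at_sd(n_flips, n_sd, p=0.5):
    """Number of heads lying n_sd standard deviations above the mean."""
    mean = n_flips * p
    sd = math.sqrt(n_flips * p * (1.0 - p))
    return mean + n_sd * sd

def sd_shift(n_variants, n_flipped, p=0.5):
    """Shift, in SDs of the '+' count, from switching n_flipped variants
    from '-' to '+' in the idealized equal-effect model."""
    sd = math.sqrt(n_variants * p * (1.0 - p))
    return n_flipped / sd

if __name__ == "__main__":
    print(heads_at_sd(10_000, 1))  # 5050.0
    print(heads_at_sd(10_000, 2))  # 5100.0
    print(sd_shift(10_000, 100))   # 2.0
```

Because the SD grows like sqrt(N) while the number of variants grows like N, a change of ~100 out of ~10,000 already corresponds to two standard deviations in this model.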
3. If you read press accounts related to our creation of the BGI Cognitive Genomics Lab back in 2011 (at that time there were zero genome-wide significant alleles associated with intelligence), you can find quotes from genomics “experts” asserting that mankind would never discover the genetic architecture of cognitive ability. (Such quotes are easy to obtain even today!) A Bayesian update given what is known in 2017 would call into question the competence of these “experts”! ;-)