No genomic dark matter

Written by: Stephen Hsu

Primary source: Information Processing

Let me put it very simply: there is NO genomic “dark matter” or “missing heritability.” It’s merely a matter of sample size (statistical power) to identify the specific variants that account for the total expected heritability. The paper below (see also “HaploSNPs and missing heritability”) suggests that essentially all of the expected heritability can be accounted for once both rare (minor allele frequency, MAF < 0.01) and common SNPs are taken into account. I suspect the small remaining gap in heritability is accounted for by nonlinear effects.

We don’t yet know which specific variants are responsible for, e.g., population variation in height, but we expect that they can be found given sufficient statistical power. See “Genetic architecture and predictive modeling of quantitative traits.”
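To make “a matter of sample size” concrete, here is a back-of-envelope power calculation. It is a minimal sketch using the standard approximation for a single-variant association test (noncentrality n·q²/(1 − q²) for a variant explaining a fraction q² of trait variance) and the conventional genome-wide threshold of 5 × 10^−8; neither is taken from the paper, and the function names `gwas_power` and `required_n` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def gwas_power(n: int, q2: float, alpha: float = 5e-8) -> float:
    """Approximate power of a single-variant association test.

    n: number of unrelated individuals
    q2: fraction of trait variance explained by the variant
    alpha: significance threshold (5e-8 is the conventional genome-wide level)

    The 1-df chi-square statistic has noncentrality NCP = n * q2 / (1 - q2);
    equivalently, the test z-score is approximately Normal(sqrt(NCP), 1).
    """
    shift = math.sqrt(n * q2 / (1.0 - q2))
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def required_n(q2: float, power: float = 0.8, alpha: float = 5e-8) -> int:
    """Sample size reaching the target power (ignoring the negligible opposite tail)."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_pow = STD_NORMAL.inv_cdf(power)
    return math.ceil((z_crit + z_pow) ** 2 * (1.0 - q2) / q2)


if __name__ == "__main__":
    q2 = 1e-4  # hypothetical variant explaining 0.01% of trait variance
    print(f"power at n = 44,126: {gwas_power(44_126, q2):.4f}")
    print(f"n for 80% power:     {required_n(q2):,}")
```

Under these assumptions, a variant explaining 0.01% of the variance would be detected with well under 1% probability in a sample the size of the paper’s (44,126 people), but with 80% probability in roughly 400,000. The variant is no less real in the smaller sample; it is simply underpowered.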

Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index

Nature Genetics (2015) doi:10.1038/ng.3390

We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ~97% and ~68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ~17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60–70% for height and 30–40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.
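The abstract only names GREML-LDMS. As I understand the method (the details are in the paper’s Online Methods, not in this excerpt), it builds genetic relationship matrices (GRMs) from standardized genotypes, one per stratum of variants grouped by LD score and MAF, and then fits all of them jointly by REML. The sketch below shows only the MAF-stratified GRM construction on simulated genotypes. The LD-score stratification and the REML fit are omitted, the bin edges are illustrative, and all function names are hypothetical.

```python
import math
import random


def simulate_genotypes(n, m, seed=0):
    """Hypothetical genotypes: allele counts 0/1/2 drawn under Hardy-Weinberg proportions."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.005, 0.5) for _ in range(m)]
    # Each genotype is the sum of two independent allele draws (bool + bool -> int).
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]


def standardize(genotypes):
    """Return standardized SNP columns z_ij = (x_ij - 2p_j) / sqrt(2p_j(1 - p_j))
    and the sample MAF of each kept SNP.

    Monomorphic SNPs (p = 0 or 1) are dropped because they carry no information.
    """
    n = len(genotypes)
    columns, mafs = [], []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)  # sample allele frequency
        if p in (0.0, 1.0):
            continue
        sd = math.sqrt(2.0 * p * (1.0 - p))
        columns.append([(x - 2.0 * p) / sd for x in col])
        mafs.append(min(p, 1.0 - p))
    return columns, mafs


def grm(columns, snp_idx):
    """GRM A_ik = (1/|S|) * sum over j in S of z_ij * z_kj, for the SNP subset S."""
    n = len(columns[0])
    m = len(snp_idx)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            s = sum(columns[j][i] * columns[j][k] for j in snp_idx) / m
            a[i][k] = a[k][i] = s
    return a


def maf_stratified_grms(columns, mafs, bounds=(0.0, 0.01, 0.1, 0.5)):
    """One GRM per MAF bin (lo, hi]; the bin edges here are illustrative only."""
    grms = {}
    for lo, hi in zip(bounds, bounds[1:]):
        idx = [j for j, q in enumerate(mafs) if lo < q <= hi]
        if idx:
            grms[(lo, hi)] = grm(columns, idx)
    return grms


if __name__ == "__main__":
    columns, mafs = standardize(simulate_genotypes(n=40, m=200))
    for (lo, hi), a in maf_stratified_grms(columns, mafs).items():
        count = sum(lo < q <= hi for q in mafs)
        print(f"MAF bin ({lo}, {hi}]: {count} SNPs, GRM of size {len(a)} x {len(a)}")
```

Giving each stratum its own GRM lets rare and common variants receive separate variance components, rather than forcing one shared relationship between effect size and MAF.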

From the paper:

… Under a model of neutral evolution, most variants segregating in the population are rare, whereas most genetic variation underlying traits is due to common variants [18]. The neutral evolutionary model predicts that the cumulative contribution of variants with MAF ≤ θ to the total genetic variance is linearly proportional to θ, where θ is a MAF threshold. However, our observed results for height strongly deviated from this model (Fig. 4a), suggesting that height-associated variants have been under natural selection. Such deviation would be even stronger with whole-genome sequencing data because variation at rare sequence variants is less well captured by 1000 Genomes Project imputation than that at common variants (Fig. 3 and Supplementary Fig. 4). … Equivalently, the neutral evolutionary model also predicts that variance explained is uniformly distributed as a function of MAF [18], such that the variance explained by variants with MAF ≤ 0.1 equals that of variants with MAF > 0.4. However, we observed that, although the variance explained per variant (defined as the variance explained divided by m, with m being the number of variants) for rare variants was much smaller than that for common variants for both height and BMI (Supplementary Fig. 8), the variants with MAF ≤ 0.1 in total explained a significantly larger proportion of variance than those with MAF > 0.4 (21.0% versus 8.8%, P_difference = 9.2 × 10^−7) for height (Fig. 4b and Supplementary Table 3), consistent with height-associated variants being under selection.
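The neutral-model prediction in the excerpt can be checked with a short calculation. The sketch assumes the textbook neutral folded site frequency spectrum, in which the density of variants at minor allele frequency q is proportional to 1/q + 1/(1 − q), and assumes that each variant contributes 2q(1 − q)β² to the variance with a mean squared effect β² that does not depend on q. The product (1/q + 1/(1 − q)) · 2q(1 − q) = 2 is constant, so variance is uniform in MAF over (0, 0.5] and the cumulative share up to θ is 2θ. The lower cutoff `Q_MIN` is a hypothetical choice needed because the variant count diverges as q → 0; the function names are also hypothetical.

```python
Q_MIN = 1e-4  # hypothetical lower MAF cutoff; the variant count diverges as q -> 0


def variant_density(q):
    """Relative number of segregating variants at minor allele frequency q under the
    neutral model (folded site frequency spectrum, proportional to 1/q + 1/(1 - q))."""
    return 1.0 / q + 1.0 / (1.0 - q)


def variance_density(q):
    """Genetic variance contributed at frequency q: variant count times 2q(1 - q),
    with a mean squared effect assumed not to depend on q (neutrality)."""
    return variant_density(q) * 2.0 * q * (1.0 - q)


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def share(f, lo, hi):
    """Fraction of the integral of f over [Q_MIN, 0.5] that lies in (lo, hi]."""
    return integrate(f, max(lo, Q_MIN), hi) / integrate(f, Q_MIN, 0.5)


if __name__ == "__main__":
    print(f"variants with MAF <= 0.1:  {share(variant_density, 0.0, 0.1):.3f} of all variants")
    print(f"variance from MAF <= 0.1:  {share(variance_density, 0.0, 0.1):.3f}")
    print(f"variance from MAF > 0.4:   {share(variance_density, 0.4, 0.5):.3f}")
```

Under these assumptions, variants with MAF ≤ 0.1 make up roughly three quarters of all variants above the cutoff, yet they carry only about 20% of the genetic variance, the same share as variants with MAF > 0.4. The height estimates in the excerpt (21.0% versus 8.8%) break that equality, which is the deviation the authors attribute to selection.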

… Theoretical studies on variation in complex traits based on models of natural selection suggest that rare variants only explain a substantial amount of variance under strong assumptions about the relationship between effect size and selection strength [19–21]. We performed genome-wide association analyses for height and BMI in the combined data set (Online Methods) and found that the minor alleles of variants with lower MAF tended to have stronger and negative effects on height and stronger but positive effects on BMI (Fig. 4c). The correlation between minor allele effect and MAF was highly significant for both height (P < 1.0 × 10^−6) and BMI (P = 8.0 × 10^−5) and was even stronger for both traits in the data from the latest GIANT Consortium meta-analyses [5, 22] (Fig. 4d); these correlations were not driven by population stratification (Supplementary Fig. 10). All these results suggest that height- and BMI-associated variants have been under selection. These results are consistent with the hypothesis that new mutations that decrease height or increase obesity tend to be deleterious to fitness and are hence kept at low frequencies in the population by purifying selection.
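The excerpt does not say how its correlation P values were computed. One simple way to test for a correlation between minor-allele effect and MAF is a Pearson correlation with a permutation P value, sketched below on hypothetical summary statistics in which rarer minor alleles are deliberately given more negative effects (the height-like pattern). The data are synthetic and are not the paper’s results. All function names are hypothetical. Real variants are correlated through LD, so a naive permutation test like this one treats them as more independent than they are.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm=200, seed=1):
    """Two-sided permutation P value: shuffling y breaks any link with x."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def toy_summary_stats(n_variants=1000, seed=0):
    """Hypothetical per-variant (MAF, minor-allele effect) pairs with a planted trend:
    the rarer the minor allele, the more negative its mean effect on the trait."""
    rng = random.Random(seed)
    mafs, effects = [], []
    for _ in range(n_variants):
        q = rng.uniform(0.001, 0.5)
        mafs.append(q)
        effects.append(-0.05 * (1.0 - 2.0 * q) + rng.gauss(0.0, 0.02))
    return mafs, effects


if __name__ == "__main__":
    mafs, effects = toy_summary_stats()
    print(f"r = {pearson(mafs, effects):.3f}, permutation P = {permutation_p(mafs, effects):.4f}")
```

A positive correlation here means that lower-MAF minor alleles tend to lower the trait, as reported for height. The BMI pattern in the excerpt would show up as a negative correlation. With 200 permutations the smallest attainable P value is 1/201, so much larger permutation counts would be needed to resolve values like those quoted in the paper.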

See also “Deleterious variants affecting traits that have been under selection are rare and of small effect.” The results above support my conjecture from several years ago.

Stephen Hsu
Stephen Hsu is vice president for Research and Graduate Studies at Michigan State University. He also serves as scientific adviser to BGI (formerly Beijing Genomics Institute) and as a member of its Cognitive Genomics Lab. Hsu’s primary work has been in applications of quantum field theory, particularly to problems in quantum chromodynamics, dark energy, black holes, entropy bounds, and particle physics beyond the standard model. He has also made contributions to genomics and bioinformatics, the theory of modern finance, and encryption and information security. He founded two Silicon Valley companies: SafeWeb, a pioneer in SSL VPN (Secure Sockets Layer Virtual Private Network) appliances, which was acquired by Symantec in 2003, and Robot Genius Inc., which developed anti-malware technologies. Hsu has given invited research seminars and colloquia at leading research universities and laboratories around the world.