Sparsity estimates for complex traits

Written by: Stephen Hsu

Primary Source: Information Processing

Note the estimate of few to ten thousand causal SNP variants, consistent with my estimates for height and cognitive ability.

Sparsity (number of causal variants), along with heritability, determines the amount of data necessary to “solve” a specific trait. See Genetic architecture and predictive modeling of quantitative traits.

T1D looks like it could be cracked with only a limited amount of data.

Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model

PLoS Genet 11(4): e1004969. doi:10.1371/journal.pgen.1004969

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Welcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

Table S5 below gives estimates of sparsity for various disease conditions.


Coronary Artery Disease CAD
Type 1 diabetes T1D
Type 2 diabetes T2D
Crohn’s disease CD
Hypertension HT
Bipolar disorder BD
Rheumatoid arthritis RA

The following two tabs change content below.
Stephen Hsu
Stephen Hsu is vice president for Research and Graduate Studies at Michigan State University. He also serves as scientific adviser to BGI (formerly Beijing Genomics Institute) and as a member of its Cognitive Genomics Lab. Hsu’s primary work has been in applications of quantum field theory, particularly to problems in quantum chromodynamics, dark energy, black holes, entropy bounds, and particle physics beyond the standard model. He has also made contributions to genomics and bioinformatics, the theory of modern finance, and in encryption and information security. Founder of two Silicon Valley companies—SafeWeb, a pioneer in SSL VPN (Secure Sockets Layer Virtual Private Networks) appliances, which was acquired by Symantec in 2003, and Robot Genius Inc., which developed anti-malware technologies—Hsu has given invited research seminars and colloquia at leading research universities and laboratories around the world.
Stephen Hsu

Latest posts by Stephen Hsu (see all)