Written by: Stephen Hsu
Primary Source: Information Processing
As an experiment, I recorded this video using slides from a talk I gave last week at NIH. I will be giving similar talks later this spring/summer at Human Longevity Inc. and BGI. What these institutions have in common is that all three are on the road to accumulating a million human genomes. Who will get there first?
Recording the video was easy using Keynote, although it’s a bit odd to talk to yourself for an hour. I recommend that everyone do this, in order to reach a much larger audience than can fit in a lecture hall :-)
Genetic architecture and predictive modeling of quantitative traits
I discuss the application of Compressed Sensing (L1-penalized optimization, or LASSO) to genomic prediction. I show that matrices composed of human genomes are good compressed sensors, and that LASSO applied to genomic prediction exhibits a phase transition as the sample size is varied. When the sample size crosses the phase boundary, complete identification of the subspace of causal variants becomes possible. For typical traits of interest (e.g., with heritability ~ 0.5), the phase boundary occurs at N ~ 30s, where N is the sample size and s (sparsity) is the number of causal variants. I give some estimates of the sparsity associated with complex traits such as height and cognitive ability, which suggest s ~ 10k. In practical terms, these results imply that powerful genomic prediction will be possible for many complex traits once ~ 1 million genotypes are available for analysis.
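To get a feel for how LASSO recovery depends on sample size, here is a minimal simulation sketch. It is not the analysis from the talk, only a toy under stated assumptions: simulated independent SNPs (no linkage disequilibrium), allele frequencies drawn uniformly from 0.1 to 0.5, causal effects of ±1, heritability 0.5, and a textbook penalty scale that uses the known noise level. The solver is plain iterative soft thresholding (ISTA), chosen so the example needs only NumPy. The function names `simulate`, `lasso_ista`, and `relative_error`, and the small sizes p = 1000 and s = 5, are hypothetical choices for illustration.

```python
import numpy as np


def simulate(n, p, s, h2, rng):
    """Simulate standardized genotypes and a sparse additive trait (toy model)."""
    maf = rng.uniform(0.1, 0.5, size=p)              # assumed allele frequencies
    g = rng.binomial(2, maf, size=(n, p))            # 0/1/2 allele counts, independent SNPs
    x = (g - 2 * maf) / np.sqrt(2 * maf * (1 - maf)) # standardize with population mean and sd
    beta = np.zeros(p)
    causal = rng.choice(p, size=s, replace=False)    # s causal variants
    beta[causal] = rng.choice([-1.0, 1.0], size=s)   # assumed effect sizes of +/-1
    noise_sd = np.sqrt(s * (1 - h2) / h2)            # genetic variance is about s, so this sets h2
    y = x @ beta + rng.normal(0.0, noise_sd, size=n)
    return x, y, beta, noise_sd


def lasso_ista(x, y, lam, n_iter=500):
    """Minimize (1/2n)||y - x b||^2 + lam * ||b||_1 by iterative soft thresholding."""
    n, p = x.shape
    step = n / np.linalg.norm(x, 2) ** 2             # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = x.T @ (x @ b - y) / n                 # gradient of the squared-error term
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b


def relative_error(n, p=1000, s=5, h2=0.5, seed=0):
    """Relative L2 error of the LASSO estimate for one simulated data set."""
    rng = np.random.default_rng(seed)
    x, y, beta, noise_sd = simulate(n, p, s, h2, rng)
    lam = noise_sd * np.sqrt(2 * np.log(p) / n)      # assumed penalty; uses the known noise level
    b = lasso_ista(x, y, lam)
    return np.linalg.norm(b - beta) / np.linalg.norm(beta)


if __name__ == "__main__":
    # Sweep the sample size; with s = 5, N ~ 30s corresponds to N ~ 150.
    for n in (20, 50, 100, 150, 300, 500):
        print(n, round(relative_error(n), 3))
```

The error should be close to 1 when N is far below the boundary and fall substantially as N grows past a few tens times s, though the exact values depend on the seed and on the penalty. With noise at heritability 0.5, the change is smoother in a toy problem this small than the sharp boundary discussed in the talk. Real genotype matrices also have correlated columns, which this sketch ignores.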