Written by: Stephen Hsu
Primary Source: Information Processing
Highly recommended talk by David Balding on modern approaches to heritability, relatedness, etc. in statistical genetics. (I listened at 1.5x normal speed, which worked for me.)
MLPM (Machine Learning for Personalized Medicine) Summer School 2015
Monday 21st of September
Heritability-based models for prediction of complex traits
by David Balding
Complex trait genetics has been revolutionised over the past 5 years by developments related to the concept of heritability. Heritability is the fraction of phenotypic variation that can be attributed to genetic mechanisms (mostly we focus on narrow-sense heritability, which considers only additive genetic effects). Since we cannot identify and measure the causal genetic mechanisms, a traditional approach has been to use pedigree relatedness as a proxy for the sharing of causal alleles between individuals. Pedigree relatedness even came to be seen as central to the concept of heritability, which perhaps explains why it was not until 2010 that it became widely appreciated that genome-wide genetic markers (SNPs) offered at least a “noisy” way to directly measure causal alleles, and hence a new approach to assessing heritability. This approach is “noisy” because SNPs generally only tag causal variants imperfectly, depending on SNP density and linkage disequilibrium, and many SNPs may tag little or no causal variation. So genome-wide SNP-based heritability estimates are difficult to interpret, but they can provide a lower bound which was enough to show that SNPs usually tag much more causal variation than can be attributed to genome-wide significant SNPs. Another big step forward has been that heritability can be attributed to different genes, genomic regions or functional classes, and for many phenotypes it is found to be widely dispersed across the genome, with relatively little concentration in coding regions. Further, heritability has become a unit of common currency for gene-based tests and meta-analysis. I will review the ideas and the underlying mathematical models, and present some recent results.
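To make the idea of measuring relatedness directly from SNPs more concrete, here is a minimal Python/NumPy sketch of the standardized-genotype relationship matrix commonly used in GCTA-style analyses. It assumes genotypes are coded as 0/1/2 copies of a reference allele. Each SNP is standardized by its sample allele frequency p, and the products are averaged over SNPs. The function names `standardized_grm` and `snp_contribution` are hypothetical, and GCTA itself uses a different estimator for the diagonal (self-relatedness) entries. The helper shows how much a single SNP adds to a pair's relatedness for given genotypes, including the case where both individuals are heterozygous, (1,1).

```python
import numpy as np


def standardized_grm(genotypes: np.ndarray) -> np.ndarray:
    """SNP relatedness for an (n_individuals, n_snps) matrix of 0/1/2 allele counts.

    Each SNP i is standardized with its sample allele frequency p_i,
        z = (x - 2 p_i) / sqrt(2 p_i (1 - p_i)),
    and relatedness is averaged over the M retained SNPs: A = Z Z^T / M.
    (Simplified sketch: GCTA estimates the diagonal differently.)
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    polymorphic = (p > 0.0) & (p < 1.0)  # monomorphic SNPs have zero variance
    x, p = x[:, polymorphic], p[polymorphic]
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / x.shape[1]


def snp_contribution(g1: int, g2: int, p: float) -> float:
    """Contribution of one SNP (allele frequency p) to the relatedness of two individuals."""
    return (g1 - 2.0 * p) * (g2 - 2.0 * p) / (2.0 * p * (1.0 - p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.05, 0.5, size=2000)         # hypothetical allele frequencies
    geno = rng.binomial(2, freqs, size=(50, 2000))    # unrelated individuals under HWE
    A = standardized_grm(geno)
    print(A.shape, round(float(np.trace(A)) / 50, 2))  # mean self-relatedness, close to 1
    # A pair of heterozygotes at a SNP with p = 0.5 contributes nothing...
    print(snp_contribution(1, 1, 0.5))
    # ...but contributes strongly when the allele is rare.
    print(round(snp_contribution(1, 1, 0.1), 3))
```

Under this weighting a pair of heterozygotes contributes (1 - 2p)^2 / (2p(1 - p)). That is zero at p = 0.5 and large when the allele is rare, so the choice of weighting for heterozygous pairs can noticeably affect the resulting relatedness estimates.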
1. He notes that after a few hundred years, it’s highly likely that a given descendant carries no actual DNA from a specific ancestor (e.g., most descendants of Shakespeare alive today have none of his DNA).
2. @18min or so: a request to Chris Chang to add a modified definition of SNP relatedness to PLINK (i.e., a new flag), with a different weighting for the case where both individuals are heterozygous (genotypes (1,1)) ;-)
3. @29min or so: finally, a discussion of systematic errors in GCTA arising from the linkage disequilibrium (LD) properties of causal variants. As I wrote in an earlier post:
I’ve always felt that the real weakness of GCTA is the assumption of random effects. A consequence of this assumption is that if the true causal variants are atypical (e.g., in terms of linkage disequilibrium) among common SNPs, the results could be biased. It is impossible to evaluate this uncertainty at the moment because we do not yet know the (full) genetic architectures of any complex traits.
4. @35min: again, T1D (type 1 diabetes) stands out in terms of genetic architecture.
5. @47min: predictive correlations of almost 0.6 for T1D
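The Shakespeare example in item 1 can be checked with a rough back-of-the-envelope model. Below is a minimal sketch that uses a commonly used Poisson approximation. An ancestor g generations back passes down, on average, about (22 + 33(g - 1)) / 2^(g - 1) autosomal segments, assuming 22 autosomes and roughly 33 crossovers per meiosis. The chance of inheriting no segment at all is then about exp(-expected segments). The parameter values, the 25-years-per-generation conversion, and the function names are all assumptions for illustration, and sex chromosomes are ignored.

```python
import math


def expected_segments(generations: int,
                      n_chromosomes: int = 22,
                      crossovers_per_meiosis: float = 33.0) -> float:
    """Approximate expected number of autosomal segments inherited from one
    ancestor `generations` meioses back (assumed parameter values)."""
    if generations < 1:
        raise ValueError("generations must be >= 1")
    # Each extra meiosis adds ~33 breakpoints; each segment survives with prob 1/2 per meiosis.
    return (n_chromosomes + crossovers_per_meiosis * (generations - 1)) / 2 ** (generations - 1)


def prob_no_dna(generations: int,
                n_chromosomes: int = 22,
                crossovers_per_meiosis: float = 33.0) -> float:
    """Poisson approximation: P(zero inherited segments) = exp(-expected segments)."""
    return math.exp(-expected_segments(generations, n_chromosomes, crossovers_per_meiosis))


if __name__ == "__main__":
    years_per_generation = 25  # assumption
    for years in (100, 200, 300, 400):
        g = years // years_per_generation
        print(f"{years} years ~ {g} generations: P(no DNA) ~ {prob_no_dna(g):.3f}")
```

Under these assumptions a descendant 12 generations down has roughly an 83% chance of carrying none of a given ancestor's autosomal DNA. At 16 generations (about 400 years, roughly back to Shakespeare) the chance is about 98%.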