Written by: Stephen Hsu
Primary Source: Information Processing
By constructing haplotypes from adjacent SNPs, the authors arrive at a superior set of genetic variables with which to compute genetic similarity. These haplotypes tag rare variants and seem to recover a significant chunk of the heritability not accounted for by common SNPs.
See also ref 32: Yang, J. et al. Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index. Nature Genetics, submitted.
Haplotypes of common SNPs can explain missing heritability of complex diseases (http://dx.doi.org/10.1101/022418)
While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h2), recent work has shown that more heritability is explained by all genotyped SNPs (hg2). However, much of the heritability is still missing (hg2 < h2). For example, for schizophrenia, h2 is estimated at 0.7-0.8 but hg2 is estimated at ~0.3. Efforts at increasing coverage through accurately imputed variants have yielded only small increases in the heritability explained, and poorly imputed variants can lead to assay artifacts for case-control traits. We propose to estimate the heritability explained by a set of haplotype variants (haploSNPs) constructed directly from the study sample (hhap2). Our method constructs a set of haplotypes from phased genotypes by extending shared haplotypes subject to the 4-gamete test. In a large schizophrenia data set (PGC2-SCZ), haploSNPs with MAF > 0.1% explained substantially more phenotypic variance (hhap2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (hg2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between hhap2 and hg2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.
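As a rough back-of-the-envelope check on the abstract's numbers, the sketch below computes what fraction of the twin-study heritability (h2 = 0.7-0.8) each PGC2-SCZ estimate would account for. The function name fraction_explained is made up for illustration, and the calculation ignores the quoted standard errors and any differences in how the estimates were obtained, so treat the ratios as indicative only.

```python
# Point estimates quoted in the abstract (standard errors ignored here).
H2_TWIN_RANGE = (0.7, 0.8)   # narrow-sense heritability of schizophrenia
HG2 = 0.32                   # explained by genotyped SNPs (PGC2-SCZ)
HHAP2 = 0.64                 # explained by haploSNPs with MAF > 0.1% (PGC2-SCZ)


def fraction_explained(estimate: float, h2: float) -> float:
    """Share of the twin-study heritability h2 covered by an estimate."""
    if h2 <= 0:
        raise ValueError("h2 must be positive")
    return estimate / h2


if __name__ == "__main__":
    for h2 in H2_TWIN_RANGE:
        snp_share = fraction_explained(HG2, h2)
        hap_share = fraction_explained(HHAP2, h2)
        print(f"h2 = {h2:.1f}: SNPs cover {snp_share:.0%}, "
              f"haploSNPs cover {hap_share:.0%}, "
              f"still missing {1 - hap_share:.0%}")
```

On these point estimates, genotyped SNPs cover under half of h2 while haploSNPs cover most of it, which is the sense in which the haploSNP estimate is "much closer" to the twin-study figure.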
The excerpt below is my response to an excellent comment by Gwern:
Your summary is correct, AFAIU. Below is a bit more detail about the 4-gamete test, which differentiates between a recombination event (which breaks the haploblock for descendants of that individual; recombination = scrambling due to sexual reproduction) and a simple mutation at that locus. The goal is to impute identical blocks of DNA that are tagged by SNPs on standard chips.
Algorithm to generate haploSNPs
… Given two alleles at the haploSNPs and two at the mismatch SNP, a maximum of four possible allelic combinations can be observed. If all four combinations are observed, this indicates that a recombination event is required to explain the mismatch, and the haploSNP will be terminated. If, however, only three combinations are observed, the mismatch may be explained by a mutation on the shared haplotype background. These mismatches are ignored and the haploSNP is extended further. We note that this approach can produce a very large number of haploSNPs and very long haploSNPs that could tag signals of cryptic relatedness. …
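To make the 4-gamete test concrete, here is a minimal sketch of the check described in the excerpt. It is not the authors' implementation. It assumes phased chromosomes coded as 0/1 at both sites, with one vector marking which chromosomes carry the haploSNP and another giving the allele at the mismatch SNP; the function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def observed_gametes(haplosnp: Sequence[int],
                     mismatch_snp: Sequence[int]) -> Set[Tuple[int, int]]:
    """Distinct (haploSNP allele, mismatch-SNP allele) pairs seen across
    phased chromosomes. Both inputs are assumed to be 0/1 coded."""
    if len(haplosnp) != len(mismatch_snp):
        raise ValueError("inputs must cover the same chromosomes")
    return set(zip(haplosnp, mismatch_snp))


def requires_recombination(haplosnp: Sequence[int],
                           mismatch_snp: Sequence[int]) -> bool:
    """4-gamete test: True if all four allelic combinations are observed,
    i.e. a recombination event is needed to explain the mismatch."""
    return len(observed_gametes(haplosnp, mismatch_snp)) == 4


if __name__ == "__main__":
    carriers = [0, 0, 1, 1, 1, 0]

    # Only three combinations observed: consistent with a mutation on the
    # shared haplotype background, so the haploSNP would be extended.
    mutation_like = [0, 0, 0, 1, 0, 0]
    print(requires_recombination(carriers, mutation_like))

    # All four combinations observed: the haploSNP would be terminated.
    recombination_like = [0, 1, 0, 1, 1, 0]
    print(requires_recombination(carriers, recombination_like))
```

The set of observed pairs is all the test needs: three or fewer combinations means the mismatch is ignored and extension continues, while four means the haploSNP is terminated at that site.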
>> This estimated heritability is much closer to the full-strength twin study estimates, showing that a lot of the ‘missing’ heritability is lurking in the rarer SNPs <<
This was already suspected by some researchers (including me), but the haploSNP results provide support for the hypothesis. It means that, e.g., with whole genomes we could potentially recover nearly all the predictive power implied by classical h2 estimates …