Written by: Stephen Hsu
Primary Source: Information Processing
GCTA is a statistical method for estimating the heritability of a complex trait using (phenotype | genotype) data from unrelated individuals. It has been applied to many human phenotypes, including disease conditions and behavioral traits. GCTA results tend to be consistent with earlier twin and family studies of heritability, and suggest that significant heritability is due to common genetic variants that will be identified in the future through increased statistical power (sample size).
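To make the idea concrete, here is a minimal sketch of estimating SNP heritability from a genetic relationship matrix (GRM) built on unrelated individuals. It is not GCTA itself: GCTA fits the variance components by REML, whereas this sketch swaps in the simpler Haseman-Elston regression, which regresses products of standardized phenotypes on off-diagonal GRM entries. The data are simulated, and the function names, sample sizes, and the true heritability of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_unrelated_sample(n=2000, m=500, h2=0.5, seed=0):
    """Simulate standardized genotypes and a phenotype with heritability h2.

    Hypothetical toy setup: independent SNPs, every SNP causal, effects drawn
    from a single normal distribution (the random-effects assumption).
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)            # allele frequencies
    genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
    # Standardize each SNP column to mean 0 and variance 1.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    effects = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    y = z @ effects + noise
    return z, y


def haseman_elston_h2(z, y):
    """Estimate heritability by Haseman-Elston regression (not REML).

    Regresses y_i * y_j on the GRM entry A_ij over all pairs i < j;
    the slope is the heritability estimate.
    """
    n, m = z.shape
    grm = z @ z.T / m                                # genetic relationship matrix
    y = (y - y.mean()) / y.std()                     # standardized phenotype
    upper = np.triu_indices(n, k=1)                  # distinct pairs only
    a = grm[upper]
    yy = np.outer(y, y)[upper]
    a_centered = a - a.mean()
    return float(np.sum(a_centered * (yy - yy.mean())) / np.sum(a_centered ** 2))


if __name__ == "__main__":
    z, y = simulate_unrelated_sample()
    print(f"estimated h2: {haseman_elston_h2(z, y):.3f}")
```

In this toy setting the estimate should land near the simulated value because the simulation satisfies the model's own assumptions; the disputes discussed below concern what happens when real data do not.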
A recent PNAS paper by researchers at Stanford claims to identify many problems with GCTA. The conclusions of this paper have been hotly contested by the GCTA authors and others.
Earlier post (January 1, 2016) on the PNAS paper "Limitations of GCTA as a solution to the missing heritability problem." (See also: many posts on this blog which mention GCTA.)
Detailed comments and analysis here and here by Sasha Gusev. Gusev claims that the problems identified in Figs. 4 and 7 result from an incorrect calculation of the standard error (Fig. 4) and a failure to exclude related individuals in the Framingham data (Fig. 7).
GCTA authors Visscher, Yang, et al. respond to the PNAS paper; they accept none of the criticisms (February 13, 2016 bioRxiv).
PNAS authors reply to Visscher, Yang, et al. comments (February 16, 2016 bioRxiv). They claim that relatedness thresholding used with GCTA analysis is flawed and that residual standard errors are much larger than claimed.
Gamazon and Park (February 18, 2016 bioRxiv) question spectral analysis and random matrix theory results in the PNAS paper. (I believe this is the first critique which looks at the mathematics of the PNAS paper, as opposed to simulation results.)
This dispute shows the utility of blogs (Gusev) and bioRxiv for rapid scientific discussion. Some of the commentaries listed above are 20+ pages long with figures and equations. This discussion would not have been possible (or would have taken months or years) in a journal setting.
The next step should be a mini-workshop conducted online, with each group allowed 30 min to present their results, followed by questions :-)
I’ve always felt that the real weakness of GCTA is the assumption of random effects. A consequence of this assumption is that if the true causal variants are atypical among common SNPs (e.g., in their linkage disequilibrium properties), the results could be biased. It is impossible to evaluate this uncertainty at the moment because we do not yet know the genetic architecture of any complex trait. See "Why does GCTA work?" for more discussion and a link to work by Lee and Chow examining this issue.
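For reference, the random-effects assumption can be written out as follows. This is a sketch of the standard linear mixed model formulation; the symbols are my own notation rather than anything taken from the papers above: W is the matrix of standardized genotypes at M SNPs, u the vector of per-SNP effects, X any fixed-effect covariates, and A the genetic relationship matrix.

```latex
\begin{aligned}
y &= X\beta + Wu + \varepsilon,
  \qquad u \sim N(0,\ \sigma_u^2 I),
  \qquad \varepsilon \sim N(0,\ \sigma_e^2 I) \\
\operatorname{Var}(y) &= A\,\sigma_g^2 + I\,\sigma_e^2,
  \qquad A = \frac{W W^{\top}}{M},
  \qquad \sigma_g^2 = M \sigma_u^2 \\
h^2_{\mathrm{SNP}} &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
\end{aligned}
```

The key point is that every SNP effect is drawn from the same distribution, independent of that SNP's allele frequency or local linkage disequilibrium. If the true causal variants differ systematically from the typical SNP in W, then A misrepresents the genetic similarity that matters for the trait, and the estimate of the genetic variance can be biased.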
Recently, a promising new method (Heritability Estimates from Summary Statistics) has been proposed which does not make assumptions about the effect size distribution — it uses GWAS estimates of effect size to directly estimate variance accounted for by each region of the genome. The initial application of this method also suggests significant heritability due to common variants.