Written by: Stephen Hsu
Primary Source: Information Processing, 4/1/19.
In this exclusive interview, Stephen Hsu (Michigan State University and co-founder of Genomic Prediction) discusses the application of polygenic risk scores (PRS) for complex traits in pre-implantation genetic screening. Interview conducted by Julianna LeMieux (GEN).
GEN: What motivated you to start Genomic Prediction?
STEVE HSU: It has a very long history. Laurent Tellier is the CEO and we’ve known each other since 2010. We’d been working on the background science of how to use machine learning to look at lots of genomes and then learn to predict phenotypes from that information.
We were betting on the continuing decline in cost for genotyping, and it paid off because now there are millions of genotypes available for analysis. We’d always thought that one of the best and earliest applications of this would be embryo selection because we can help families have a healthy child.
GEN: How did you first get interested in genomics in general, given your educational background in physics?
HSU: I was interested in genetics and evolution, molecular biology, since I was a kid. I grew up in the ’70s and ’80s and already at that time there was a lot of attention focused on the molecular biology revolution, recombinant DNA. We were always told physics is a very mature subject and biology is the subject of the future and it will just explode eventually with these new molecular techniques.
When I got to college and took some classes in molecular biology, I realized that a lot of the deep questions (like how do you actually decipher a genome and figure out which pieces of the genetic code have direct consequences in phenotypes or complex traits?) would not be answerable with the technology of that time. So I put it aside and did theoretical physics, but got re-interested around the time I met Laurent. I became aware of the super-exponential cost curve for genotyping, sequencing in particular. I realized, if this continues for another ten years or so, we’re going to be able to answer all these interesting questions I’ve been thinking about since I was a kid.
GEN: How do you generate a polygenic risk score for different diseases? Of the eight diseases listed on the Genomic Prediction website, are those diseases that your lab has basically generated that data for?
HSU: Many of them were produced by my research group, but the current best-performing breast cancer predictor actually comes from a large international consortium that works on breast cancer…
We use the same data that people would use for GWAS [genome-wide association studies]. For example, we might have 200,000 controls and 20,000 or 30,000 cases, people in their 50s and 60s who are old enough that they would have been diagnosed with diabetes (or something) if they had it. The algorithm knows which ones are the cases and which are the controls, and it also has about 1 million SNPs from each person, typically what you get from an Affymetrix or an Illumina array.
It is a learning algorithm that tries to tune its internal model so that it best predicts whether someone is actually a case or a control. There’s a bunch of fancy math involved in this—a high-dimensional optimization. You are basically finding the model that best predicts the data.
It is different from GWAS because GWAS is very simple: you look at a particular gene or SNP and you ask, is there statistical evidence that this particular SNP is associated with whether you have diabetes? You get a yes/no answer. If the p-value is significant enough, then you say we found a hit.
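To make the one-SNP-at-a-time test concrete, here is a minimal sketch in Python. It is an illustration, not the pipeline used by any particular study: it assumes a simple Pearson chi-square test on allele counts, the allele counts are made up, and the 5e-8 cutoff is the conventional genome-wide significance threshold rather than anything stated in the interview.

```python
import math

# Conventional genome-wide significance cutoff (assumption; not from the interview).
GENOME_WIDE_P = 5e-8


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). Assumes every row and column total is positive.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def is_hit(p_value, threshold=GENOME_WIDE_P):
    """The yes/no answer: does this single SNP pass the significance cutoff?"""
    return p_value < threshold


# Hypothetical allele counts for one SNP: strongly enriched in cases.
chi2, p = allelic_test(case_alt=700, case_ref=300, ctrl_alt=300, ctrl_ref=700)
# Hypothetical allele counts for another SNP: identical in cases and controls.
null_chi2, null_p = allelic_test(case_alt=50, case_ref=50, ctrl_alt=50, ctrl_ref=50)

print(is_hit(p), is_hit(null_p))
```

Each SNP is tested in isolation, so running this a million times gives a million separate yes/no answers rather than a single model of risk.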
That problem mathematically is very different from the problem we solved. We are actually doing an optimization in a million-dimensional space to find simultaneously all the SNPs that should be activated in our predictor. This is all in the technical weeds but it is just different mathematics…
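A toy version of this kind of joint optimization can be sketched as follows. The interview does not name the algorithm, so everything here is an assumption for illustration: the sketch uses L1-penalized (lasso) least squares fitted by coordinate descent, a continuous trait instead of case/control labels, 20 simulated SNPs instead of a million, and two planted causal SNPs with made-up effect sizes. The function names (`fit_sparse_predictor`, `simulate`) are hypothetical.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_sparse_predictor(genotypes, phenotype, lam, sweeps=60):
    """Lasso by coordinate descent: minimize (1/2n)||y - Xw||^2 + lam * ||w||_1.

    All SNP weights are fitted against one shared residual, so each SNP is
    judged in the context of every other SNP rather than in isolation.
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    cols = [[row[j] - col_means[j] for row in genotypes] for j in range(p)]
    col_var = [sum(v * v for v in col) / n for col in cols]
    y_mean = sum(phenotype) / n
    residual = [y - y_mean for y in phenotype]
    weights = [0.0] * p

    for _ in range(sweeps):
        for j in range(p):
            if col_var[j] == 0.0:
                continue  # monomorphic SNP carries no information
            col = cols[j]
            # Covariance of SNP j with the residual, adding back its own contribution.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_var[j] * weights[j]
            new_w = soft_threshold(rho, lam) / col_var[j]
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_w
    return weights


def simulate(n=300, p=20, seed=0):
    """Simulated 0/1/2 genotypes and a trait driven by two planted SNPs (made-up effects)."""
    rng = random.Random(seed)
    true_effects = {3: 1.0, 17: -0.8}
    genotypes = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(p)]
                 for _ in range(n)]
    phenotype = [sum(b * row[j] for j, b in true_effects.items()) + rng.gauss(0, 0.1)
                 for row in genotypes]
    return genotypes, phenotype


genotypes, phenotype = simulate()
weights = fit_sparse_predictor(genotypes, phenotype, lam=0.05)
selected = [j for j, w in enumerate(weights) if w != 0.0]  # SNPs "activated" in the predictor
print(selected)
```

With a signal this strong and so little noise, the fit should activate only the two planted SNPs and leave every other weight at exactly zero. The contrast with a per-SNP test is that the set of active SNPs and their weights come out of one optimization, and the resulting weighted sum of genotypes is the predictor.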
We think we can actually predict risk by doing this high-dimensional optimization. Initially, people just thought we were crazy. We wrote theoretical papers predicting how much data you would need to be able to accurately predict height or something like that. … [ AND THOSE PREDICTIONS WERE CORRECT … ]