prcr update

The R package for person-oriented analysis (prcr) has been updated (it’s now version 0.1.4). In particular, it was not clear how to use the profile assignments (i.e., which cluster each response is in) in subsequent analyses. So the update now returns two different representations of the profile assignments: …

History of Bayesian Neural Networks

This talk gives the history of neural networks in the framework of Bayesian inference. Deep learning is (so far) quite empirical in nature: things work, but we lack a good theoretical framework for understanding why or even how. The Bayesian approach offers some progress in these directions, and also toward quantifying prediction uncertainty. I was …

Announcing clustRcompaR v.0.1.0

Alex Lishinski and I worked on an R package over the last year or so. We are excited that it’s now available on CRAN. You can install the package using install.packages("clustRcompaR") (only needed the first time) and load it (more on its two functions below) using library(clustRcompaR). Here’s a description: Provides an interface …

Can Life emerge spontaneously?

It would be nice if we knew where we came from. Sure, Darwin’s insight that we are the product of an ongoing process that creates new and meaningful solutions to surviving in complex and unpredictable environments is great and all. But it requires three sine qua non ingredients: inheritance, variation, and differential selection. Three does …

Speed, Balding, et al.: “for a wide range of traits, common SNPs tag a greater fraction of causal variation than is currently appreciated”

I recently blogged about a nice lecture by David Balding at the 2015 MLPM (Machine Learning for Personalized Medicine) Summer School: Machine Learning for Personalized Medicine: Heritability-based models for prediction of complex traits. In that talk he discussed some results concerning heritability estimation and potential improvements over GCTA. A new preprint on bioRxiv has the …

Some R Resources

(Should I have spelled the last word in the title “ResouRces” or “resouRces”? The R community has a bit of a fascination about capitalizing the letter “r” as often as possible.) Anyway, getting down to business, I thought I’d post links to a few resources related to the R statistical language/system/ecology that I think may …

Machine Learning for Personalized Medicine: Heritability-based models for prediction of complex traits (David Balding)

Highly recommended talk by David Balding on modern approaches to heritability, relatedness, etc. in statistical genetics. (I listened at 1.5x normal speed, which worked for me.) MLPM (Machine Learning for Personalized Medicine) Summer School 2015, Monday 21st of September: “Heritability-based models for prediction of complex traits” by David Balding. Complex trait genetics has been revolutionised …

Over- and Underfitting

I just read a nice post by Jean-François Puget, suitable for readers not terribly familiar with the subject, on overfitting in machine learning. I was going to leave a comment mentioning a couple of things, and then decided that with minimal padding I could make it long enough to be a blog post. I agree …

$1.2 trillion college loan bubble?

See also “When everyone goes to college: a lesson from S. Korea.” Returns to a “college education” are highly dependent on the intrinsic cognitive ability and work ethic of the individual. WSJ: “College Loan Glut Worries Policy Makers.” The U.S. government over the last 15 years made a trillion-dollar investment to improve the nation’s workforce, …

University quality and global rankings

The paper below is one of the best I’ve seen on university rankings. Yes, there is a univariate factor one might characterize as “university quality” that correlates across multiple measures. As I have long suspected, the THE (Times Higher Education) and QS rankings, which are partially survey/reputation based, are biased …

Coin Flipping

I don’t recall the details, but in a group conversation recently someone brought up the fact that if you flip a fair coin repeatedly until you encounter a particular pattern, the expected number of tosses needed to get HH is greater than the expected number to get HT (H and T denoting head and tail …
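
The claim can be checked with a short Python sketch. This is an illustration only, not necessarily the approach the post takes: the function names are hypothetical, and the exact value comes from the standard overlap formula for a fair coin (add 2^k for every k where the first k symbols of the pattern equal its last k symbols).

```python
import random


def expected_tosses(pattern: str) -> int:
    """Exact expected number of fair-coin tosses until `pattern` first appears.

    Overlap formula: add 2**k for every k such that the first k symbols
    of the pattern equal its last k symbols.
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[n - k:])


def simulate_tosses(pattern: str, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent = ""  # the most recent len(pattern) tosses
        tosses = 0
        while recent != pattern:
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / trials


if __name__ == "__main__":
    for pattern in ("HH", "HT"):
        print(pattern, expected_tosses(pattern), round(simulate_tosses(pattern), 2))
```

The exact values are 6 tosses for HH and 4 for HT. HH is slower because a tail after the first head sends you back to the start, whereas when waiting for HT an extra head still leaves you one toss away.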

Genetic ancestry and brain morphology

Population structure — i.e., distribution of gene variants by ancestral group — is reflected in brain morphology, as measured using MRI. Brain morphology measurements can be used to predict ancestry. Strictly speaking, the data only show correlation, not genetic causation, but the most plausible interpretation is that genetic differences are causing morphological differences. One could …

GCTA, Missing Heritability, and All That

Bioinformaticist E. Stovner asked about a recent PNAS paper which is critical of GCTA. My comments are below. It’s a shame that we don’t have a better online platform (e.g., like Quora or StackOverflow) for discussing scientific papers. This would allow the authors of a paper to communicate directly with interested readers, immediately after the paper …

On Statistics, Reporting and Bacon

I’ve previously ranted about the need for a “journalistic analytics” college major, to help with reporting (and editing) news containing statistical analysis. Today I read an otherwise well written article that inadvertently demonstrates how easy it is for even seasoned reporters to slip up. The cover story of the November 9 issue of Time magazine, …

David Donoho interview at HKUST

A long interview with Stanford professor David Donoho (academic web page) at the IAS at HKUST. Donoho was a pioneer in thinking about sparsity in high dimensional statistical problems. The motivation for this came from real world problems in geosciences (oil exploration), encountered in Texas when he was still a student. Geophysicists were using Compressed …

Regression Via Pseudoinverse

In my last post (OLS Oddities), I mentioned that OLS linear regression could be done with multicollinear data using the Moore-Penrose pseudoinverse. I want to tidy up one small loose end. Specifically, let X be the matrix of predictor observations (including a column of ones if a constant term is desired), let y be a vector of …
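
For orientation, the pseudoinverse approach can be summarized in one line. The notation here is an assumption rather than taken from the post: X is the matrix of predictor observations, y is the vector of responses, X^+ is the Moore-Penrose pseudoinverse of X, and \hat{\beta} is the fitted coefficient vector.

```latex
\hat{\beta} = X^{+} y,
\qquad
X^{+} = (X^{\top} X)^{-1} X^{\top} \quad \text{when } X \text{ has full column rank.}
```

With multicollinear data, X^T X is singular and the second expression is unavailable, but X^+ still exists, and X^+ y is the least-squares solution of minimum Euclidean norm.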

OLS Oddities

During a couple of the lectures in the Machine Learning MOOC offered by Prof. Andrew Ng of Stanford University, I came across two statements about ordinary least squares linear regression (henceforth OLS) that surprised me. Given that I taught regression for years, I was surprised that I could be surprised (meta-surprised?), but these two facts …

Producing Reproducible R Code

A tip in the Google+ Statistics and R community led me to the reprex package for R. Quoting the author (Professor Jennifer Bryan, University of British Columbia), the purpose of reprex is to [r]ender reproducible example code to Markdown suitable for use in code-oriented websites, such as StackOverflow.com or GitHub. Much has been written about …

Expert Prediction: hard and soft

Jason Zweig writes about Philip Tetlock’s Good Judgment Project below. See also Expert Predictions, Perils of Prediction, and this podcast talk by Tetlock. A quick summary: good amateurs (i.e., smart people who think probabilistically and are well read) typically perform as well as or better than area experts (e.g., PhDs in Social Science, History, Government; …

Colleges ranked by Nobel, Fields, Turing and National Academies output

This Quartz article describes Jonathan Wai’s research on the rate at which different universities produce alumni who make great contributions to science, technology, medicine, and mathematics. I think the most striking result is the range of outcomes: the top school outperforms good state flagships (R1 …

More Shiny Hacks

In a previous entry, I posted code for a hack I came up with to add vertical scrolling to the sidebar of a web-based application I’m developing in Shiny (using shinydashboard). Since then, I’ve bumped into two more issues, leading to two more hacks that I’ll describe here. First, I should point out that I’m using …

One Hundred Years of Statistical Developments in Animal Breeding

This nice review gives a history of the last 100 years in statistical genetics as applied to animal breeding (via Andrew Gelman). One Hundred Years of Statistical Developments in Animal Breeding (Annu. Rev. Anim. Biosci. 2015. 3:19–56 DOI:10.1146/annurev-animal-022114-110733) Statistical methodology has played a key role in scientific animal breeding. Approximately one hundred years of statistical …

Sparsity estimates for complex traits

Note the estimate of few to ten thousand causal SNP variants, consistent with my estimates for height and cognitive ability. Sparsity (number of causal variants), along with heritability, determines the amount of data necessary to “solve” a specific trait. See Genetic architecture and predictive modeling of quantitative traits. T1D looks like it could be cracked …

Decision Analytics and Teacher Qualifications

Disclaimers: This is a post about statistics versus decision analytics, not a prescription for improving the educational system in the United States (or anywhere else, for that matter). tl;dr. The genesis of today’s post is a blog entry I read on Spartan Ideas titled “Is Michigan Turning Away Good Teachers?” (Spartan Ideas is a “metablog”, curated …
