How Do You Feel When Something Fails To Replicate?

Written by: Brent Donnellan

Primary Source: The Trait-State Continuum

Short Answer: I don’t know, I don’t care.

There is an ongoing discussion about the health of psychological science and the relative merits of different practices that could improve research. This productive discussion occasionally spawns a parallel conversation about the “psychology of the replicators” or an extended meditation on their motives, emotions, and intentions. Unfortunately, I think that parallel conversation is largely counter-productive. Why? We have limited insight into what goes on inside the minds of others. More importantly, feelings have no bearing on the validity of any result. I am a big fan of this line from Kimble (1994, p. 257): “How you feel about a finding has no bearing on its truth.”

A few people seem to think that replicators are predisposed to feeling ebullient (gleeful?) when they encounter failures to replicate. This is not my reaction. My initial response is fairly geeky: my impulse is to calculate the effect size estimate and its precision for the new study and compare them with those of the original study. I do not get too invested when a small N replication fails to duplicate a large N original study. I am more interested when a large N replication fails to duplicate a small N original study.
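The comparison described above can be sketched for the common case where both studies report a Pearson correlation. This is a minimal Python sketch, not the author's own procedure: it assumes bivariate normality and uses the standard Fisher z approximation, in which the standard error of z is 1 / sqrt(n - 3). The function names and example numbers are hypothetical. The point it illustrates is that precision depends on N, so a large N study yields a much narrower interval than a small N one.

```python
import math


def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> tuple:
    """Approximate 95% CI for a Pearson r using the Fisher z transform.

    Assumes bivariate normality; the standard error of z is 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                # transform r to Fisher's z
    se = 1.0 / math.sqrt(n - 3)      # precision depends only on sample size
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)  # back to r


def compare_studies(original: tuple, replication: tuple) -> dict:
    """Summarize (r, n) pairs for a hypothetical original study and replication."""
    summary = {}
    for label, (r, n) in (("original", original), ("replication", replication)):
        lo, hi = correlation_ci(r, n)
        summary[label] = {"r": r, "n": n, "ci": (lo, hi), "width": hi - lo}
    return summary


if __name__ == "__main__":
    # Hypothetical numbers: a small-N original and a large-N replication.
    result = compare_studies(original=(0.50, 50), replication=(0.05, 1000))
    for label, s in result.items():
        lo, hi = s["ci"]
        print(f"{label:<12} r = {s['r']:+.2f}  n = {s['n']:>4}  "
              f"95% CI [{lo:+.2f}, {hi:+.2f}]  width = {s['width']:.2f}")
```

With these made-up inputs, the replication's interval is far narrower than the original's and excludes the original point estimate, which is the large N replication, small N original situation the paragraph singles out.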

To provide context for interpreting the new evidence, I then check whether the design was difficult to implement or fairly straightforward. This helps me anticipate the reactions of people who will argue that the replicators lacked the skill and expertise to conduct the study or that their motivations influenced the outcome. The often vague “lack of expertise” and “ill-intentioned” arguments are more persuasive when critics offer a plausible account of how these factors might have biased a particular replication effort. This would be akin to offering an alternative theory of the crime in legal proceedings. In many cases, it seems unlikely that these factors are especially relevant. For example, a few people claimed that we lacked the expertise to conduct survey studies of showering and loneliness, but these critics offered no well-defined explanation for our particular results beyond some low-level mud-slinging. A failure to detect an effect is not prima facie evidence of a lack of expertise.

After this largely intellectual exercise is concluded, I might experience a change in mood or some sort of emotional reaction. Most often this amounts to disappointment about the quality of the initial study and some anxiety about the state of affairs in the field (especially if the original study was of the small N, big effect size variety). A larger N study holds more weight than a smaller N study, so my degree of worry scales with the sample size of the replication. Of course, single studies are just data points that should end up as grist for the meta-analytic mill. So I might also feel some anticipation about what yet another replication attempt will show.
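The idea that a larger N study holds more weight can be illustrated with a minimal fixed-effect, inverse-variance pooling sketch in Python, again assuming each study reports a Pearson r. On the Fisher z scale each study's sampling variance is roughly 1 / (n - 3), so its weight is n - 3. This is one common pooling choice, not a method the post specifies; random-effects models (for example, DerSimonian-Laird) are a frequent alternative when true effects may vary across studies. The function name and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(studies, z_crit: float = 1.96):
    """Pool (r, n) pairs with an inverse-variance fixed-effect model on Fisher z.

    Each study's weight is 1 / var(z) = n - 3. Returns the pooled r and an
    approximate 95% confidence interval, both back-transformed to r.
    """
    if not studies:
        raise ValueError("need at least one study")
    total_weight = 0.0
    weighted_sum = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("each n must be greater than 3")
        w = n - 3                        # inverse of the Fisher z variance
        total_weight += w
        weighted_sum += w * math.atanh(r)
    z_bar = weighted_sum / total_weight  # weighted mean on the z scale
    se = 1.0 / math.sqrt(total_weight)   # standard error of the pooled z
    ci = (math.tanh(z_bar - z_crit * se), math.tanh(z_bar + z_crit * se))
    return math.tanh(z_bar), ci


if __name__ == "__main__":
    # Hypothetical: a small original study plus one large replication.
    pooled_r, (lo, hi) = fixed_effect_meta([(0.50, 53), (0.00, 1003)])
    print(f"pooled r = {pooled_r:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
```

With these made-up inputs the pooled estimate lands close to the large replication's r, because that study carries 1000 of the 1050 total weight units.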

Other people might have different modal emotional reactions. But does it matter?  And does it have anything at all to do with the underlying science or the interpretation of the replication?  My answers are No, No, and No. I think the important issues are the relevant facts – the respective sample sizes, effect size estimates, and procedures.

Brent Donnellan
Brent Donnellan, Associate Professor, MSU Department of Psychology, an expert in personality and personality disorders, temperament, and relationships, and Trait-State Continuum editor.