The Cost of Open Science

Written by: C. Titus Brown

Primary Source: Living in an Ivory Basement

I just finished reading The Immortal Life of Henrietta Lacks, an excellent book about the HeLa cell line cultured from cancerous cells taken from Henrietta Lacks. In addition to raising some really interesting and astonishing questions about the appropriate (mis)use of patients’ tissue samples, a section about George Gey caught my eye and made me think about some of the challenges associated with open science.

George Gey was the immensely creative scientist who developed a host of cell-culture techniques, some of which he applied to the cells taken from Henrietta Lacks. The quick spread of HeLa cells amongst scientists, which helped drive their adoption as a standard biological reagent across biomedical science, was due largely to his decision to make them freely available upon request. Since the HeLa cells were self-replicating, and, moreover, easy to culture, they were quickly passed from scientist to scientist. This let HeLa be used (for one example) to help develop the polio vaccine.

George Gey never really reaped any rewards from HeLa, other than some reputational rewards. In the Immortal Life book, Rebecca Skloot says that he somewhat regretted his generosity, as he watched other scientists publish experiments that he’d casually done in his own lab and never bothered to publish. Yet it is without a doubt that his decision to make the HeLa cells freely available to others advanced science tremendously.

This kind of thing is one of the hidden “costs” of open science (the cost of pushing science forward as a whole, sometimes at the expense of one’s own career), and it is something I’m watching happen with some of our software approaches. I’m being asked with increasing frequency to review papers that extend some of our approaches (especially diginorm), and while it’s really exciting to see people building on our work, it’s also bittersweet, because we could have done some of this stuff quite easily ourselves, and (the dogma goes) more publications are better. If we’d sat on our code and eked as many publications as possible out of it, we’d probably be in a better position with respect to a monopoly on certain kinds of sequence analysis. But it’s also clear that (IMNSHO) some sequence analysis would be harder if we had; the most obvious example is Trinity’s inclusion of a diginorm-inspired approach. I think we did the right thing, but it’s hard to convey this to the people in charge of convincing MSU to retain me, and I’m not sure too many granting agencies care, either.

The obvious conflict here is that the incentives in science careers reward one kind of action (selfishness), while progress in science itself depends on a different kind of action (a certain amount of selflessness). We open science advocates really need to figure out how to incentivize sharing better at both the institutional level and the granting level. Lots is being done in this area, including the recent expansion of NSF-style CVs to include the broader category of research outputs, but more needs to be done. Until it becomes an obvious career win to share, most people won’t.

The hidden cost of potential commercialization

While we’re on the topic of sharing, let’s talk about the Bayh-Dole Act, which encourages recipients of federal funding to pursue commercialization of inventions. This is probably a naive statement on my part, and I welcome corrections, but I think this is probably the most damaging legislative act ever perpetrated on open science specifically, and one of the most damaging to progress in science generally.

Why? Because everyone thinks they can get rich off of their research work, so they pursue their stovepiped research instead of broadly sharing it. This hits both experimentalists and computationalists. One friend who works at a small biotech firm tells me that they have tried and failed to get access to mouse lines to test a potential therapy; the other researchers simply won’t share, and the cost of hammering out a formal legal agreement about potential commercialization profits is higher than the expected payoff for any given therapy. Plenty of people seem to think they’re going to get rich off of their software, and either delay opening it up or bar commercial use without an explicit license; more damaging, they block remixing of the software and algorithms with other systems or approaches so as to retain intellectual ownership.

I’m sure there are realms where this makes financial sense, but I betcha it also blocks a lot of forward progress by limiting the relevance of individual research.

And remember, if George Gey had restricted distribution of HeLa cells, it is inarguable that science would have been impeded.

The central paradox of open science

What I find most frustrating about open science is that while most funding organizations are coming to see it as a way to better leverage their funding, and the Internet provides an increasingly excellent environment for collaborating on and remixing research, it’s still not making it down to the institutional level. There is virtually no pressure, no interest, and no action on opening up the products of research at the university level. I can’t tell why, exactly, except that it seems to be driven by a general lack of incentives — i.e. virtually no one sees it to be in their interest. I can’t tell if this is short-term thinking, or rational economic thinking, or what.

In the specific realm of biology and software, I think there’s a strong argument to be made that the future belongs to those who try to build good software. I hope so. But I’m getting tired of the slow pace, and I’m not sure how to accelerate things; discussion and ideas are welcome here. (I hope to have some good news on this front in a few weeks, BTW.)

I can tell you that my career has already been immeasurably improved by my openness, including posting our software, writing blogs, and engaging with people on Twitter. But I don’t know how to convey this as a systematic approach, and I’m not sure it is something that’s viable for many kinds of research.

I would welcome a stronger conclusion, but this is all I have. More thoughts welcome ;).


C. Titus Brown
C. Titus Brown is an assistant professor in the Department of Computer Science and Engineering and the Department of Microbiology and Molecular Genetics. He earned his PhD ('06) in developmental molecular biology from the California Institute of Technology. Brown is director of the laboratory for Genomics, Evolution, and Development (GED) at Michigan State University. He is a member of the Python Software Foundation and an active contributor to the open source software community. His research interests include computational biology, bioinformatics, open source software development, and software engineering.