Is “Scientific Data” ever-finer salami-slicing, or is it reducing time to data publication?

Written by: C. Titus Brown

Primary Source: Living in an Ivory Basement

I just read “Scientific Data – ultimate salami slicing publishing”, in which Pedro Beltrao argues that Nature’s new journal is simply another venue for them to suck money out of scientists. Maybe. But I’m strongly considering sending a lot of stuff there, and I really think Pedro is missing something very important.

(Yes, it is rare that I have such a puzzled and strong reaction to a critique of a Nature endeavor. ;)

Pedro is missing the idea that this is publication of data, in a peer-reviewed journal. For those of us trying to push open data, this is incredibly important; there are few such journals out there that let me argue to my evaluators that I am doing something more significant than posting unreviewed tarballs on figshare. I’ve mostly been eyeing SIGS (Standards in Genomic Sciences), which was co-founded by a colleague of mine in MMG at MSU; Scientific Data seems like a great addition to the field. It would be great to have more peer-reviewed journals that ensure that metadata standards are in place for the data and that basic correctness is there, and that are not venues for scientific discussion of analyses. That is unlike e.g. PLoS One, which wants analysis (and where reviewers will happily argue about the quality and correctness of your analysis and interpretation).

To see the effect of having such data-focused journals, consider the below process:

  1. Gather -omic data, do assembly, do basic assembly quality metrics, publish. (~1 month from gathering data to submission for publication.)
  2. Spend ~1-3 years developing methods that do better, more sensitive analysis of the data published in number 1. (1-3 years from gathering data to submission of methods publication.)
  3. Spend ~1-5 years doing scientific analysis of the data in conjunction with other project goals. Submit. (1-5 years from gathering data to submission of scientific analysis for publication.)

Eliminate the first step because there’s no peer-reviewed place to post data, and you won’t have much of my data until 3-5 years in: bad for science, bad for citations, bad bad bad. Basically, serious methods development and scientific analysis are much slower than the initial data analysis.

Reducing the time to first open data would be great.

And that’s why the above process is what we’re hoping to try out in my lab. It is particularly important for the case of collaborators who are not 100% supportive of my “radical” ideas about open data, and want a citation handle that is peer-reviewed (and goes in that place on CVs).

Bottom line: if I can publish the data in a peer-reviewed place sooner, that makes it available to all, sooner; and that’s good, for me and for them. If your argument is that non-peer-reviewed pubs should be counted, well, I’m all for that, and someday I hope that’s the standard. It’s not now, and this is the world people are getting hired and promoted in.

In 10 years, if our problem is that we don’t want to pay $1000 for publishing data, well, hopefully other venues will have emerged, or figshare will be well known and/or connected with peer review, or whatever. For now, journals like Scientific Data and Standards in Genomic Sciences seem like welcome stopgaps to this particular faculty member.

Data journals: for now, they’re more than just salami slicing.


C. Titus Brown
C. Titus Brown is an assistant professor in the Department of Computer Science and Engineering and the Department of Microbiology and Molecular Genetics. He earned his PhD ('06) in developmental molecular biology from the California Institute of Technology. Brown is director of the laboratory for Genomics, Evolution, and Development (GED) at Michigan State University. He is a member of the Python Software Foundation and an active contributor to the open source software community. His research interests include computational biology, bioinformatics, open source software development, and software engineering.