Annotating papers with pipeline steps – suggestions?

Written by: C. Titus Brown

Primary Source: Living in an Ivory Basement

A few months ago, I wrote a short description of how we make our papers replicable in the lab. One problem with this process is that for complex pipelines, it’s not always obvious how to connect a number in the paper to the steps in the pipeline that produced it — there are lots of files and outputs involved in all of this. You can sometimes figure it out by looking at the numbers and correlating with what’s in the paper, and/or trying to understand the process from the bottom up, but that’s quite difficult. Even my students and I (who can meet in person) sometimes have trouble tracking things down quickly.

Now, on my latest paper, I’ve started annotating results in the paper source code with Makefile targets. For example, if you are interested in how we got these results, you can use the comment in the LaTeX file to go straight to the associated target in the Makefile.

That’s pretty convenient, but then I got to thinking — how do we communicate this to the reader more directly? Not everyone wants to go to the github repo, read the LaTeX source, and then go find the relevant target in the Makefile (and “not everyone” is a bit of an understatement :). But there’s no reason we couldn’t link directly to the Makefile target in the PDF, is there? And, separately, right now it is a reasonably large burden to copy the results from the output of the scripts into the LaTeX file. Surely there’s a way to get the computer to do this, right?

So, everyone — two questions!

First, assuming that I’m going the nouveau traditional publishing route of producing a PDF for people to download from a journal site, is there a good, or suggested, or even better yet journal-supported way to link directly to the actual computational methods? (Yes, yes, I know it’s suboptimal to publish into PDFs. Let me know when you’ve got that figured out, and I’ll be happy to try it out. ’til then kthxbye.) I’d like to avoid intermediaries like individual DOIs for each method, if at all possible; direct links to github FTW.

Ideally, it would make it through the publishing process. I could, of course, make it available as a preprint, and point interested people at that.
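One way the first question could be approached, as a rough sketch rather than anything a journal supports: GitHub file views accept a `#L<n>` line anchor, so a script can look up the line where a Makefile rule starts and build a link to it. The function name, the example Makefile, and the `USER/REPO/COMMIT` URL below are all hypothetical placeholders, and the matching is deliberately simple (one target per rule line).

```python
import re


def makefile_target_url(makefile_text, target, blob_url):
    """Return blob_url plus a #L<n> anchor for the line that defines `target`.

    Simplifying assumption: the rule line starts with the target name
    followed by a colon. Rules listing several targets on one line and
    pattern rules are not handled.
    """
    # (?!=) skips variable assignments such as "name := value".
    pattern = re.compile(r"^" + re.escape(target) + r"\s*:(?!=)")
    for lineno, line in enumerate(makefile_text.splitlines(), start=1):
        if pattern.match(line):
            return f"{blob_url}#L{lineno}"
    raise KeyError(f"no rule found for target {target!r}")


# Hypothetical Makefile contents and repository URL, for illustration only.
example_makefile = (
    "all: results.txt\n"
    "\n"
    "results.txt: data.fa\n"
    "\t./count.py data.fa > results.txt\n"
)
example_url = makefile_target_url(
    example_makefile,
    "results.txt",
    "https://github.com/USER/REPO/blob/COMMIT/Makefile",
)
```

Pinning the URL to a specific commit rather than a branch name would keep the line number from drifting as the Makefile changes. The resulting URL could then go into the paper as an ordinary hyperlink; whether it survives typesetting at the journal is a separate matter.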

Second, what’s a good way to convey results from a script into a LaTeX document (or, more generally, any text document)? With LaTeX I could make use of the ‘include’ directive; or I could do something with templating; or…?
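To make the "include" idea concrete, here is a minimal sketch, assuming the pipeline can hand its numbers to a small Python step as a dictionary. It turns each result into a LaTeX `\newcommand`, so the paper refers to a named macro instead of a hand-copied number. The function name and the result names (`totalReads`, `percentMapped`) are made up for illustration.

```python
import re


def results_to_latex(results):
    """Render {macro_name: value} as one LaTeX \\newcommand line per result.

    Macro names must be ASCII letters only, since LaTeX control words
    cannot contain digits or underscores. Values are inserted as-is, so a
    value containing characters such as % or _ would need escaping first.
    """
    lines = []
    for name, value in sorted(results.items()):
        if not re.fullmatch(r"[A-Za-z]+", name):
            raise ValueError(f"not a valid LaTeX macro name: {name!r}")
        lines.append(f"\\newcommand{{\\{name}}}{{{value}}}")
    return "\n".join(lines) + "\n"


# Hypothetical results, for illustration only.
example_tex = results_to_latex({"totalReads": 1234, "percentMapped": "87.5"})
```

If that text were written to a file such as `results.tex` by a Makefile target, the paper could pull it in with `\input{results}` in the preamble and then write `\totalReads` wherever the number appears. Rerunning the pipeline would then update the paper without any copying by hand.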

Well, and third, is there any (good, scriptable) way to do both (1) and (2) at the same time?

Anyway, hit me with your wisdom; would love to hear that there’s already a good solution!

–titus

C. Titus Brown
C. Titus Brown is an assistant professor in the Department of Computer Science and Engineering and the Department of Microbiology and Molecular Genetics. He earned his PhD ('06) in developmental molecular biology from the California Institute of Technology. Brown is director of the laboratory for Genomics, Evolution, and Development (GED) at Michigan State University. He is a member of the Python Software Foundation and an active contributor to the open source software community. His research interests include computational biology, bioinformatics, open source software development, and software engineering.