# Sharing “Implicit” Test Problems

The topic of reproducible research is garnering a lot of attention these days. I’m not sure there is a 100% agreed upon, specific, detailed definition of it, and I do think it’s likely to be somewhat dependent on the type of research, but for purposes of this post the Wikipedia definition (previous link) works for …

More

Since I use the Linux Mint operating system, the obvious (if not only) choice for a LaTeX distribution is TeX Live. (If you are not familiar with, or are not interested in, the LaTeX typesetting system, you have already read too far in this post.) On Mint, Ubuntu and other Debian-type operating systems, you typically …

More

# Support for Benders Decomposition in CPLEX

As of version 12.7, CPLEX now has built-in support for Benders decomposition. For details on that (and other changes to CPLEX), I suggest you look at this post on J-F Puget’s blog and Xavier Nodet’s related slide show. [Update 12/7/16: There is additional information about the Benders support in a presentation by IBM’s Andrea Tramontani …

More

# Interactive R Plots with GGPlot2 and Plotly

I refactored a recent Shiny project, using Hadley Wickham’s ggplot2 library to produce high quality plots. One particular feature the project requires is the ability to hover over a plot and get information about the nearest point (generally referred to as “hover text” or a “tool tip”). There are multiple ways to turn static ggplots …

More

# Surviving an nVidia Driver Update

Scenario: I’m running Linux Mint 17.3 Rebecca (based on Ubuntu 14.04) on a PC with a GeForce 6150SE nForce 430 graphics card. My desktop environment is Cinnamon. The graphics card is a bit long in the tooth, but it’s been running fine with the supported nVidia proprietary driver for quite some time. Unfortunately, having no …

More

# Hovering Over Shiny Plots

I’m following up on yesterday’s post, “Formatting in a Shiny App“. One of the features I added to my Shiny application was the ability to identify a point in a plot by hovering over it. Since I wanted to do this in several different plots, and did not want to reproduce the logic each time, …

More

# Formatting in a Shiny App

I’ve been updating a Shiny (web-based interactive R) application, during the course of which I needed to make a couple of cosmetic fixes. Both proved to be oddly difficult. Extensive use of Google (I think I melted one of their cloud servers) eventually turned up enough clues to get both done. I’m going to record …

More

# Some R Resources

(Should I have spelled the last word in the title “ResouRces” or “resouRces”? The R community has a bit of a fascination about capitalizing the letter “r” as often as possible.) Anyway, getting down to business, I thought I’d post links to a few resources related to the R statistical language/system/ecology that I think may …

More

I recently picked up a pair of Bluetooth headphones (Mixcder ShareMe 7) for use with my laptop (which runs Linux Mint). Getting them to connect properly was a bit of an adventure. After I had things (mostly) sorted out, I decided to script the steps necessary to get them working so that I could just …

More

# Finding the Kernel of a Matrix

I’m working on an optimization problem (coding in Java) in which, should various celestial bodies align the wrong way, I may need to compute the rank of a real matrix and, if it’s less than full rank, a basis for its kernel. (Actually, I could get by with just one nonzero vector in the kernel, …

More

# MathJax Whiplash

Technology is continuously evolving, and for the most part that’s good. Every now and then, though, the evolution starts to look like a random mutation … the kind that results in an apocalyptic virus, or mutants with superpowers, or something else that is much more appealing as a plot device in a movie or TV …

More

# Looking at Mormon use of Twitter during #ldsconf

My first introduction to Twitter was in a class on the intersection of technology and Mormonism that I took from David Wiley at Brigham Young University. During the class, David encouraged us to try experiencing the sessions of the upcoming semi-annual LDS General Conference in a new way: by following the #ldsconf hashtag. My very …

More

# Port Forwarding in Mythbuntu

My MythTV setup recently broke in an “interesting” way, and the fix I came up with is a bit involved … so be warned, what follows is a major kludge. Setup Before explaining the pathology (and fix), I need to describe the setup. I have Mythbuntu installed on a PC whose only function is to …

More

A couple of weeks ago, I spent some time discussing how to use R and Web scraping to retrieve information on Twitter users’ locations, as stored in their profiles. I’ve since updated the code to scrape not only locations, but names, descriptions, locations, personal websites, join dates, number of tweets, number of users following, number …

More

# Tabulating Prediction Intervals in R

I just wrapped up (knock on wood!) a coding project using R and Shiny. (Shiny, while way cool, is incidental to this post.) It was a favor for a friend, something she intends to use teaching an online course. Two of the tasks, while fairly mundane, generated code that was just barely obscure enough to …

More

# Selecting the Least Qualifying Index

Something along the following lines cropped up recently, regarding a discrete optimization model. Suppose that we have a collection of binary variables $x_i \in B, \, i \in 1,\dots,N$ in an optimization model, where $B=\{0, 1\}$. The values of the $x_i$ will of course be dictated by the combination of the constraints and objective function. …

More

# Alternative Versions of R

Fair warning: most of this post is specific to Linux users, and in fact to users of Debian-based distributions (e.g., Debian, Ubuntu or Mint). The first section, however, may be of interest to R users on any platform. An alternative to “official” R By “official” R, I mean the version of R issued by the …

More

# The Monty Hall Evolver

The Monty Hall problem is very famous (Wikipedia, NYT). It is so famous because it so easily fools almost everyone the first time they hear about it, including people with doctorate degrees in various STEM fields. There are three doors. Behind one is a big prize, a car, and behind the two others are goats. …

More

# Updated Java Utilities for CPLEX and CP Optimizer

I just finished adding a feature to a utility library I use in Java projects that employ either CPLEX or CP Optimizer. In addition, I moved the files to a new home. The library is free to use under the Eclipse Public License 1.0. The code is mentioned in previous posts, so I’ll just quickly …

More

# An SSH Glitch

Something weird happened with SSH today, and I’m documenting it here in case it happens again. I was minding my own business, doing some coding, on a project that is under version control using Git. After committing some changes, I was ready to push them up to the remote (a GitLab server here at Michigan …

More

# Coding for kids

I’ve been trying to get my kids interested in coding. I found this nice game called Lightbot, in which one writes simple programs that control the discrete movements of a bot. It’s very intuitive and in just one morning my kids learned quite a bit about the idea of an algorithm and the notion of …

More

# Thunar Slow-down Fixed

My laptop is not exactly a screamer, but it’s adequate for my purposes. I run Linux Mint 17 on it (Xfce desktop), which uses Thunar as its file manager. Not too long ago, I installed the RabbitVCS version control tools, including several plugins for Thunar needed to integrate the two. Lately, Thunar has been incredibly …

More

# Parsing Months in R

As part of a recent analytics project, I needed to convert strings containing (English) names of months to the corresponding cardinal values (1 for January, …, 12 for December). The strings came from a CSV file, and were translated by R to a factor when the file was read. The factor had more than 12 …

More

# Python usage survey 2014

Remember that Python usage survey that went around the interwebs late last year? Well, the results are finally out and I’ve visualized them below for your perusal. This survey has been running for two years now (2013-2014), so where we have data for both years, I’ve charted the results so we can see the changes …

More

# RStudio Git Support

One of the assignments in the R Programming MOOC (offered by Johns Hopkins University on Coursera) requires the student to set up and utilize a (free) Git version control repository on GitHub. I use Git (on other sites) for other things, so I thought this would be no big deal. I created an account on …

More

# Estimate whether your sequencing has saturated your sample to a given coverage

This recipe provides a time-efficient way to determine whether you’ve saturated your sequencing depth, i.e. how much new information is likely to arrive with your next set of sequencing reads. It does so by using digital normalization to generate a “collector’s curve” of information collection. Uses for this recipe include evaluating whether or not you …

More

# Estimating (meta)genome size from shotgun data

This is a recipe that provides a time- and memory- efficient way to loosely estimate the likely size of your assembled genome or metagenome from the raw reads alone. It does so by using digital normalization to assess the size of the coverage-saturated de Bruijn assembly graph given the reads provided by you. It does …

More

# Downsampling shotgun reads to a given average coverage (assembly-free)

The below is a recipe for subsetting a high-coverage data set to a given average coverage. This differs from digital normalization because the relative abundances of reads should be maintained — what changes is the average coverage across all the reads. Uses for this recipe include subsampling reads from a super-high coverage data set for …

More

# Extracting shotgun reads based on coverage in the data set (assembly-free)

In recent days, we’ve gotten several requests, including two or three on the khmer mailing list, for ways to extract shotgun reads based on their coverage with respect to the reference. This is fairly easy if you have an assembled genome, but what if you want to avoid doing an assembly? khmer can do this …

More

# Updated Benders Example

Two years ago, I posted an example of how to implement Benders decomposition in CPLEX using the Java API. At the time, I believe the current version of CPLEX was 12.4; as of this writing, it is 12.6.0.1. Around version 12.5, IBM refactored the Java API for CPLEX and, in the process, made one or …

More

# Replication and quality in science – another interview

Nik Sultana, a postdoc in Cambridge, asked me some questions via e-mail, and I asked him if it would be OK for me to publish them on my blog. He said yes, so here you go! How is the quality of scientific software measured? Is there a “bug index”, where software loses points if it’s …

More

# How to make beautiful data visualizations in Python with matplotlib

It’s been well over a year since I wrote my last tutorial, so I figure I’m overdue. This time, I’m going to focus on how you can make beautiful data visualizations in Python with matplotlib. There are already tons of tutorials on how to make basic plots in matplotlib. There’s even a huge example plot …

More

# Turning Bounds into Constraints in CPLEX

I had to delve into the CPLEX documentation today, and found something I had not seen before. As part of a (Java) program I’m writing, I need to use the conflict refiner to track down which upper and lower bounds on variables take a role in making a linear program infeasible. Of course, I could change the …

More

# Being a release manager for khmer

We just released khmer v1.1, a minor version update from khmer v1.0.1 (minor version update:220 commits, 370 files changed. Cancel that — _I_ just released khmer, because I’m the release manager for v1.1! As part of an effort to find holes in our documentation, “surface” any problematic assumptions we’re making, and generally increase the bus factor of the khmer project, …

More

# Learning R

I have recently dedicated myself to learning R, a programming language and environment for focusing largely on statistical analysis and computing. The benefit of using R over other statistical computing packages is that it is free, open-source, and has a hugely active community around its use.  R can be used cross-platform  (PCs, Macs, and Linux) …

More

# A khmer mini-Hackathon: Introducing scientists to testing and code review

As part of the 2-day Mozilla Science Labs hackathon in late July, the khmer project will be providing a “mentored open source contributathon” experience. This will provide an opportunity for people interested in trying out our instance of the “github flow” model, in which contributions are submitted for review using a pull request. Since our …

More

# Trying (and failing?) to build a Scalable CountMin Sketch

tl;dr? I played around with building a CountMin Sketch that is dynamic in size, based on a scalable Bloom Filter approach. I’m not sure it worked. Thoughts, suggestions, help? Bloom Filters In our research, we’ve made some hay using Bloom filters. They’re remarkably easy to implement; I’ve talked about them a couple of times on …

More

# Some thoughts on research coding and Stupidity Driven Development

I’m on a European trip that involves several plane flights accompanied by long airport stays, and I just used some of that time to do a bit of tedious coding on khmer. The coding I did was to add proper exception handling to khmer’s internal file loading routines (see the pull request). The old behavior …

More

# Programming Language Breakdown for the HealthCare.gov Website

Late last year, the NY Times released an article quoting a specialist working on the HealthCare.gov web site: According to one specialist, the Web site contains about 500 million lines of software code. By comparison, a large bank’s computer system is typically about one-fifth that size. This astronomically large number became the subject of intense …

More

# A Java Slider/Text Combo

A few years back I was coding (in Java, of course) the <shudder>GUI</shudder> for a research program. I needed to provide controls that would let a user specify priorities (0-100) scale for various things. Two possibilities occurred to me, with pretty much diametrically opposed strengths and weaknesses. Sliders have a few virtues. Grabbing and yanking …

More

# Setting CPLEX Parameters in Java Revisited

A bit more than a year and a half ago, I wrote some Java code to facilitate setting parameters for the CPLEX optimizer using their Concert API. Since then, I’ve added support for their CP Optimizer, and IBM has refactored the handling of parameters in CPLEX, necessitating an update to my code. This post (which …

More

# CP Optimizer, Java and NetBeans

After years of coding CPLEX applications in Java, I’ve just started working with CP Optimizer (the IBM/ILOG constraint programming solver) … and it did not take me long to run into problems. As with CPLEX, you access CP Optimizer from Java through the Concert API. As always, I am using the NetBeans IDE to do …

More