Written by: Paul Rubin

Primary Source: OR in an OB World

I don’t recall the details, but in a group conversation recently someone brought up the fact that if you flip a fair coin repeatedly until you encounter a particular pattern, the expected number of tosses needed to get HH is greater than the expected number to get HT (H and T denoting head and tail respectively). Not long after that conversation, I read a reference to the same issue in a blog post.

There’s a bit of confusion around this, and I think it has to do with the experiment not being clearly posed. In this post, I will exhaust all options (and likely all readers). Before diving into variations, I will make two standing assumptions regarding the tossing of a pair of coins. The first is that the coins are distinguishable (there is a first coin and a second coin), so that HT and TH are distinct outcomes. The second is that if the “game” ends due to the result of the first of the two coins, the second coin still counts as a toss. To give a concrete example, suppose that we toss our pair of coins twice, looking for two consecutive heads, and observe the following: TH, HT. The first coin in the second toss completed the target sequence (HH), but we will still count this as a total of four coin flips rather than calling it three (ignoring the final T).

The key difference among variations in this “game” will be memory: do prior outcomes affect the chances of “winning” (finding the target pattern) in the next attempt? Let’s start with totally independent trials.

### Independent Trials

I flip two coins (or one coin twice), and either find my pattern or not. If I do not find my pattern, I erase any record of what pattern I did find and try again.

Under these rules, the number of failed attempts (“attempt” meaning two coin flips, either sequential or simultaneous) until the target is found follows a negative binomial distribution. Unfortunately, the Wikipedia entry defines this in terms of the number of successes until a specified number of failures, which is the reverse of our viewpoint. So we will temporarily define *not* finding the target pattern as a “success” and finding the target pattern as a “success”. Regardless of whether the target is HH or HT, there is a 0.25 probability of any attempt being a “failure”. Since we stop on the first “failure”, the expected number of “successes” is $$\frac{p}{1-p} = \frac{0.75}{1-0.75} = 3.$$In the notation of the Wikipedia entry, \(r=1\) is the number of “failures” (times we find the target) needed to stop the process and \(p=0.75\) is the probability of “success” (not finding the target) on a single pair of tosses. The key takeaway is that both HH and HT have an expectation of three non-matches before the first match, or four attempts total to get a match. (Reminder: This is an expected value. Your mileage may vary.)

### Dependent Trials

Now let’s change the game by remembering the last result we got, with an option to include it in the pattern. In this version, tossing one coin repeatedly versus tossing a pair a coins will differ, so I’ll start with a single coin. To illustrate the change, suppose our target pattern is HH and our first two tosses result in TH. Previously, we had to start from scratch, and only HH on our next two tosses would win. Now we can win with HT as well, since the second and third tosses combine to make the pattern: T**H** | **H**T. This also illustrates the difference between tossing a single coin repeatedly versus a pair of coins. With a pair of coins, that would count as four tosses; with a single coin, we would stop on the second H (three tosses) and never see that final T. (Tossing a single coin but requiring an even number of tosses is no different than tossing a pair of coins.)

#### Flipping One Coin

We can model our single toss process as a stationary (discrete-time) Markov chain. The state of the system will be the last result we saw: H, T or D (meaning “done”). State D is an *absorbing state*: when you’re done, you’re done forever. The game has the “memoryless” property that only the current state (most recent single coin flip) has any effect on what happens next. The *transition probability matrix* (row = current state, column = next state) looks like this:

H | T | D | |

H | 0 | 0.5 | 0.5 |

T | 0.5 | 0.5 | 0 |

D | 0 | 0 | 1 |

We have neither a head nor tail at the outset, which is functionally equivalent to starting in state T: it will take a minimum of two tosses to “win”. The quantity we seek is known as the *expected* *first passage time* from state T (start) to state D (done). Sadly, the Wikipedia entry skips over first passage times, and a Google search found lots of PDFs and one video but no helpful web pages. Key fact: if \(p_{ij}\) denotes the transition probability from state \(i\) to state \(j\) and \(F_{ij}\) denotes the first passage time from \(i\) to \(j\) (the number of transitions required, starting in state \(i[latex], until state [latex]j\) is first encountered), then for \(i\neq j\)

$$E[F_{ij}]=1+\sum_{k\neq j}p_{ik}E[F_{kj}].$$

I’m deliberately glossing over a bunch of stuff (transient versus recurrent states, communicating classes, …). You can read up on it if you don’t trust me.

If we solve these equations using the above matrix (reminder: HH is the target), we get $$E[F_{TD}] = 6$$(and \(E[F_{HD}] = 4\)).

Next, let’s change the target pattern from HH to HT. This changes the transition matrix to:

H | T | D | |

H | 0.5 | 0 | 0.5 |

T | 0.5 | 0.5 | 0 |

D | 0 | 0 | 1 |

Solve for first passage times using the revised matrix, and we get $$E[F_{TD}] = 4$$(and \(E[F_{HD}]=2\)).

Conclusion: Tossing one coin continuously, we need an average of four tosses to find HT but an average of six to find HH.

#### Flipping a Pair of Coins

Now suppose we flip two coins at a time. Again, we start with HH as the target. Our transition probability matrix is now the following:

H | T | D | |

H | 0.25 | 0.25 | 0.5 |

T | 0.25 | 0.5 | 0.25 |

D | 0 | 0 | 1 |

To clarify the transition probabilities, if my last toss (of two coins) ended with a head, HH or HT gives me a “win”, TH keeps me in state H, and TT sends me to state T. If my last toss ended in a tail, HH gives me a “win”, HT or TT keeps me in state T, and TH sends me to state H. The expected first passage time T to D is $$E[F_{TD}] = 3.2;$$but since each transition involves tossing two coins, that means I need an average of 6.4 tosses to win. That slightly exceeds the 6 we had for the single coin version, which makes sense; we “waste” some tosses when the first coin completes the pattern and the second coin proves unnecessary.

For the HT target, the transition probability matrix is:

H | T | D | |

H | 0.25 | 0 | 0.75 |

T | 0.5 | 0.25 | 0.25 |

D | 0 | 0 | 1 |

From state H, HT, TH and TT are all “wins”, while HH keeps me in state H. From state T, only HT is a “win”; HH and TH send me to state H, and TT keeps me in state T. We find$$E[F_{TD}] = \frac{20}{9} = 2.2222…$$and so an average of 4.444… coin tosses are needed to find the HT pattern.

Which pretty much pounded the daylights out of that subject. :-) If you want to experiment with this, I’ve posted some R code (which uses the markovchain library package).

#### Paul Rubin

#### Latest posts by Paul Rubin (see all)

- Stepwise Regression Code Revisited - November 12, 2018
- Pseudocode in LyX - October 30, 2018
- B.S.-ing Precisely - October 29, 2018