I am glad you asked.
This may not be the most “elementary” way to calculate the expectation, but it is instructive and gives a bit of an insight as to what is going on. In order to calculate the expected length of a string, we need to count how many there are. The claim is that a string that ends with the first occurrence of “HTH” has the following structure:
- zero or more occurrences of “T”; followed by
- zero or more occurrences of “H”, “HTT”, “HTTT”, etc.; followed by
- HTH
For example, if the total length is 3 there is only HTH
Length 4: HHTH, THTH
Length 5: TTHTH, THHTH, HHHTH
and so on.
A generating function which counts these is given by
(1+x+x^2+x^3+\cdots)(1+x+x^2+2x^3+\cdots)\cdot x^3
i.e. from
(1+\mathtt{T}+\mathtt{T}^2+\cdots)(1+\mathtt{H} + \mathtt{HH} + \mathtt{HHH} + \mathtt{HTT} + \cdots)\mathtt{HTH}
Doing some algebra, this equals
\frac{1}{1-x}\cdot \frac{1}{1-x\bigl(\frac{1}{1-x}-x\bigr)}\cdot x^3 = \frac{x^3}{1-2x+x^2-x^3}.
More generally, if you specify your target word (here “HTH”) and consider the structure of strings that end with that word but have no other occurrences of the target word, there is a trick by Guibas and Odlyzko that avoids explicitly calculating the regular expression or the corresponding finite automaton: let S be the language of words that do not contain the target word, and T be the language of words where it occurs only at the end. Appending a single letter to a word in S results in a nonempty word in S or T, and appending the target word to a word in S results in either a word that contains the target at or near the end (you could have overlaps between the beginning and the end of the target word).
In the case of HTH, this yields the equations S(x)+T(x) = 1+2xS(x) and x^3S(x)=T(x)(1+x^2) which we can solve for T(x). More generally, you get
T(x) = \frac{x^k}{x^k+(1-2x)c(x)}
where c(x) is the polynomial whose coefficient of x^i is 1 whenever shifting the target to the right i letters overlaps.
Bottom line is, if you can count the matching strings and compute the generating function, it is just a bit more algebra to calculate the expected time until your pattern occurs: compute xT'(x) and substitute x=1/2. Because if T(x) = a_0 + a_1x + a_2x^2+\cdots then T'(x) = a_1 + 2a_2x+3a_3x^2+\cdots.
NB that for good sequences like HHT you have to wait on average 8, but with a pessimal choice like HHH you have to wait 14.