Here’s one way to think about it (re-expressing the observations that Dr. Strangelove made):
Any point within a triangle can be uniquely expressed as some weighted average of its vertices: if the triangle’s vertices are A, B, and C, then any point within it is of the form aA + bB + cC for some unique weights a, b, and c which sum to 1.
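To make the weighted-average picture concrete, here is a minimal Python sketch; the vertex coordinates and the function name are just illustrative choices of mine, not anything from the discussion above:

```python
# Vertices of an example triangle (arbitrary coordinates, chosen for illustration).
A = (0.0, 0.0)
B = (1.0, 0.0)
C = (0.5, 0.866)

def barycentric_point(a, b, c):
    """Return the point aA + bB + cC for weights a, b, c summing to 1."""
    assert abs(a + b + c - 1.0) < 1e-9
    x = a * A[0] + b * B[0] + c * C[0]
    y = a * A[1] + b * B[1] + c * C[1]
    return (x, y)

barycentric_point(1, 0, 0)        # all weight on A: the vertex A itself
barycentric_point(1/3, 1/3, 1/3)  # equal weights: the centroid
```

Putting weight 1 on a single vertex recovers that vertex, and equal weights give the centroid, which matches the intuition that the weights say how strongly the point leans toward each corner.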
When we split our large ABC triangle up into four sub-triangles in the Sierpinski way, the sub-triangle containing A consists of those points which give more than half their weight to A, and similarly for the sub-triangles containing B and C. The remaining, central sub-triangle (the one we exclude from the Sierpinski fractal) is where all three individual weights are below 1/2.
In other words, if we look at some point’s weights a, b, and c in binary (so we have a = 0.a[sub]1[/sub]a[sub]2[/sub]a[sub]3[/sub]…, b = 0.b[sub]1[/sub]b[sub]2[/sub]b[sub]3[/sub]…, and similarly for c), since these weights sum to 1, at most one of a[sub]1[/sub], b[sub]1[/sub], and c[sub]1[/sub] is 1. If all three are 0, the point being described lies in the excluded central triangle and is not part of the Sierpinski fractal. Otherwise, knowing which of these is 1 tells us whether we are in the sub-triangle containing A, B, or C, while the remaining bits 0.a[sub]2[/sub]a[sub]3[/sub]a[sub]4[/sub]…, 0.b[sub]2[/sub]b[sub]3[/sub]b[sub]4[/sub]…, and 0.c[sub]2[/sub]c[sub]3[/sub]c[sub]4[/sub]… describe how we are situated within that sub-triangle (note that these will also sum to 1).
Accordingly, a point is in the Sierpinski fractal just in case, when we write its weights a, b, and c in binary, we find that there is no bit-position at which they are all simultaneously 0 (rather, at each bit-position, precisely one of them is 1 and the others are 0).
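This bitwise criterion can be checked directly. Here is a sketch using exact rational arithmetic (the function name and the cutoff of 20 bits are my choices); note that it uses the terminating expansion for dyadic weights, so boundary points whose membership depends on choosing an expansion like 0.0111… over 0.1000… will be reported as excluded:

```python
from fractions import Fraction

def in_sierpinski(a, b, c, n_bits=20):
    """Check, to n_bits of precision, that at every bit position exactly
    one of the binary expansions of the weights a, b, c has a 1 there."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    assert a + b + c == 1
    for _ in range(n_bits):
        # Doubling shifts the binary point right; the integer part is the next bit.
        a, b, c = 2 * a, 2 * b, 2 * c
        if sum(w >= 1 for w in (a, b, c)) != 1:
            return False
        a, b, c = a % 1, b % 1, c % 1  # keep only the remaining fractional bits
    return True

in_sierpinski(Fraction(2, 3), Fraction(1, 3), 0)               # True: bits alternate 10, 01, 00
in_sierpinski(Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))  # False: the centroid's first bits are all 0
```

The second example is exactly the earlier observation: the centroid gives each vertex weight 1/3 = 0.0101…, so all three first bits are 0 and the point sits in the excluded central triangle.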
Note what happens to these binary expansions when you take the half-and-half average of some arbitrary point aA + bB + cC with one of A, B, or C: it’s as though you stick the bit 1 in front of one of the weights a, b, and c [according as to whether it was A, B, or C you were moving towards], and you stick the bit 0 in front of the other two.
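In weight terms, moving halfway toward A sends (a, b, c) to ((1 + a)/2, b/2, c/2), which is exactly the bit-prepending just described. A small sketch (the function name is mine, not from the post):

```python
def halfway_toward(vertex, weights):
    """Average the point with the given barycentric weights against vertex
    number `vertex` (0 = A, 1 = B, 2 = C): the chosen vertex's weight gets
    a 1 bit prepended to its binary expansion, the other two get a 0 bit."""
    return tuple((w + (i == vertex)) / 2 for i, w in enumerate(weights))

# (0.25, 0.5, 0.25) is binary (0.01, 0.10, 0.01); prepending bits 1, 0, 0
# gives (0.101, 0.010, 0.001), i.e. (0.625, 0.25, 0.125):
halfway_toward(0, (0.25, 0.5, 0.25))
```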
Now, if we assume lower-order bits are visually insignificant, so that only the first N bits of the weights matter, then we will find that when we go to draw a point, the only things that matter are the last N vertices randomly chosen (the last vertex chosen determining the first bit of our weights [one weight receiving here a bit 1 and the others receiving bit 0, according to which vertex was chosen], the second-to-last vertex chosen determining the second bit of our weights in the same way, and so on).
In this way, we find that at each moment, out of the roughly 3[sup]N[/sup] possible points in the Sierpinski fractal (up to the indistinguishability of our not caring about weights beyond the first N bits), we pick one at random. There will be some correlation between the point chosen at one moment and the point chosen in the very next iteration, but there will be no correlation between the point chosen at one moment and the point chosen N or more iterations later. Accordingly, we expect that, almost certainly, over time we will end up choosing each of these points about equally often, thus drawing out the Sierpinski fractal to suitable resolution.
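Putting the pieces together, the random process being described is the familiar "chaos game". A minimal sketch of it in terms of weights (assuming, per the setup above, that vertices are chosen uniformly at random; the starting point is an arbitrary choice of mine):

```python
import random

def chaos_game(steps, rng=random):
    """Repeatedly pick a random vertex and move halfway toward it, tracking
    the point's barycentric weights. After the first N steps, the first N
    bits of the weights depend only on the last N vertices chosen."""
    weights = (1/3, 1/3, 1/3)  # any starting point works; its bits get pushed out
    points = []
    for _ in range(steps):
        v = rng.randrange(3)   # choose A, B, or C uniformly at random
        weights = tuple((w + (i == v)) / 2 for i, w in enumerate(weights))
        points.append(weights)
    return points
```

Converting each weight triple to coordinates via aA + bB + cC and plotting them draws out the fractal; the first several points still remember the arbitrary starting point and are often discarded before plotting.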