One problem that I see with that algorithm, ccwaterback, is that it’s a linear algorithm. Let’s say that it takes a minute for someone to get woken up, find their way to the telephone, and become cogent enough to call the next person on the list. For the group of 60 that Chronos was talking about, the last person to be called would be called 30 minutes after the first person got the chain started.
I kind of figured it was too simple to be efficient.
But I was also thinking there will be at least 2 people that will see the lights before they are called, that would give you at least 2 “seeds” into the process. With 60-some people on the list, wouldn’t that be a reasonable assumption?
Multiple people seeing the aurora is actually one of the failure cases we have to be careful of. In the simple linear method, each person absolutely must keep calling until he gets in touch with someone, to ensure that the chain is unbroken. But anyone who’s already received a call is going to head outside immediately after making his calls, and will therefore presumably be unavailable. So if two people separated on the list both see the aurora independently (that is, before they get the call), then it would be quite easy for one or more people to end up trying to call into a dead zone of people who have already gotten word. Since such a dead zone will grow as fast as folks can make the phone calls, it’s quite possible that the last person before the dead zone could end up trying to call every single person on the list, which could easily take an hour or two.
And I think that we can take it for granted that anyone with an old list is going to cause problems, but I don’t think there’s any way around that. The very best case scenario there is that it’s obvious that that person has an old list, and is therefore a known dead node. But it’s still a dead node, and would be dealt with in the same manner as any other dead node (due to a person having moved, or not being home, or already being outside to see the aurora, or any other reason).
Maybe invest in an automatic dialer? Once someone calls the automatic dialer, it calls everyone else.

But yeah, I see the major problem here. If I call person X and they don’t answer, I have no way of telling whether they have received the call or not, and thereby, no way of telling if they have made a call. The only way I see to clear this up is require everyone to have a cell phone, and to carry it with them after they receive the call.
In the graph I outlined above, you can proceed as follows:
- The first person to see the aurora calls everyone adjacent to him.
- Each of them calls everyone adjacent to him except the person who called them.
- Anyone who’s already called everyone they need to stops answering the phone.
If you go with a rectangular grid, here are some ways to gain efficiency:
-
For a population x, choose the grid’s dimensions m and n to be even numbers as close as possible to SQRT(x). For example, with 35 people, rather than do 5x7, do a 6x6 with a missing square; the call-time formulae at the bottom of this post reveal why.
-
If m or n is strictly greater than 5, the first call the originator makes should be to his “opposite”, who sits INT(m/2) squares horizontally away and INT(n/2) squares vertically away. This is the person who would otherwise be the last to know. The first recipient is surrounded by un-contacted people and acts just as the originator, doubling the speed of propagation at a minor overhead cost.
The refined ruleset I’ve come up with is as follows:
(0) If you are the originator, call your opposite.
(1) If you are called from above, call down; if you are called from below, call up. If you made or received any calls in (0), call up and down. If a call up or down does not go through, call left and inform them that they must complete your call up or down (as applicable) before executing (2).
(2) If you are called from the left, call right; if you are called from the right, call left. If you made or received any calls for step (1), call left and right. If a call left or right does not go through, call the next person your contact would have called.
On an optimized unbroken grid of m x n, you can call everyone in a number of “rounds” equal to
INT(m/2) + INT(n/2) + 1
and reduce your call time further to
INT(m/4) + INT(n/4) + 2
if you make use of the “opposite” in step (0). I estimated that it was inefficient to do so for m or n less than 6. You pay pretty dearly for the first call, but reap nice rewards when you get up around x=100, where your optimum contact time drops from 11 rounds to 8 rounds.
…you can address “critical failures” and collisions by adding a last rule:
(3) If you talk to everyone you were supposed to call – even if they call you before you manage to call them – you’re done. Go outside!
Because every chain of calls has at least one other chain heading towards it in the opposite direction, you will get one collision in every row – two if you are using the opposite as step (0). A collision tells you that your row (or your half of your row) is complete and that the wave can propagate no further in the direction you are pushing it.
Because there are so many collisions (redundant positive contacts) going on, I can’t see how anyone able to receive a call could get left out (false negative contacts). Can any of you see a flaw in this arrangement?
Well, if you need to worry about multiple originators, but don’t want people needing to call too many other people, things get more tricky. I think that any kind of adjacency map can lead to a partitioning of the set into two groups that cannot reach one-another. There has to be an indeterminate number of people a given person needs to contact, otherwise there is a chance that a valid person will never be contacted. If you have a good idea about the reliability of different people, though, you can arrange the contact order such that you have a pretty good upper bound on how many folks a given person needs to contact before they give up.
If you want a hard upper bound on the number of people that a given person needs to try to contact, I think that toroidal connectivity is the way to go. Just set up a grid of people, with folks connected to the four people directly adjacent to them, wrapping the top around to the bottom and the left over to the right, too, so that everyone has exactly four neighbors. If you’re one of the originators of the aurora sighting, you try to contact the four people adjacent to you, and if you are woken up by one of your grid-neighbors, you try to contact your other three neighbors. After you’ve done that, you’re free to go out.
It’s really easy to add and subtract people from the list. If you’re adding a person to the list, put them in a partially filled row and column. If it’s already a perfect rectangle, start a new row or column. If someone leaves, fill in the hole from a partially-filled row/column that is the only partially-filled row or is the only partially-filled column. There’s a small degeneracy when a row or a column has exactly one or two people in it. If exactly one person is in a row or column, they only have two neighbors, and if exactly two people are in a row or column they only have three neighbors. There are a couple of ways to deal with those situations, but I think that the simplest ways are to either adjust the grid dimensions, or to give two people with four neighbors the additional onus of temporarily having an additional neighbor if that isn’t workable. For an example of the latter:
A B C D E
F G H I J
K L M N O
P Q R
Here, A is adjacent to B, F, E, and P; G is adjacent to B, F, H, and L; N is adjacent to I, M, O, and D, etc… Let’s say that person H can no longer be part of the group. There are 2 partially filled columns but only 1 partially filled row. So take one person out of that row, say, R, and put them in H’s old position:
A B C D E
F G R I J
K L M N O
P Q
Now P and Q only have 3 neighbors each – Q is adjacent to P, B, and L, and P is adjacent to Q, K, and A. So, arbitrarily pick 2 people who are adjacent to neither, maybe I and O, and connect one of them to P and the other to Q. If P then left or had to replace someone, then connect both I and O to Q. But aside from those corner-cases, it’s really easy for people to see exactly who they have to contact once they get called, and even in those corner-cases, it only requires a little bit more thought for 3 or 4 folks in the group. You could even write a script that creates a list for each person of people that they need to contact.
The flaw in this method, as with any other adjacency scheme, is that it’s highly dependant on the connectivity of failed nodes. There is the trivial case where one person’s four neighbors are all unavailable, but you can segment the group into larger sets with less than half of the whole group unavailable. For instance, for the grid:
A B C D E F
G H I J K L
M N O P Q R
S T U V W X
Let’s say that people A, E, H, L, O, Q, T and V are unavailable. Not that many, only 8 out of 24. But any remaining person who sees an aurora will only be able to spread information to half of the available group. On the other hand, if you make dimensions of each side more squarelike:
A B C D E
F G H I J
K L M N O
P Q R S T
U V W X
It’s a little bit more difficult to have large disjoint sets. With different subsets of 10 people, you can partition the remainder into two sets of 7 people each.
Granted. However, our current method, which does have an indeterminate number of calls, is still fallible, due to the fallibility of the individual humans. Based on experience, I suspect that an error-correcting system would be more reliable, even if it can be broken by unavailable nodes (remember, this is not a purely theoretical exercise).
It looks like ultrafilter, Punoqllads, and Jurph are all proposing solutions similar to my toroidal idea, with the differences mainly lying in the robustness and in the number of calls necessary for each person. If I’m understanding correctly, Jurph’s method includes only as much redundancy as is necessary, introduced on the fly. ultrafilter’s network seems a bit sparse to me; while it’s constructed to be guaranteed to survive the loss of any single node, it wouldn’t necessarily survive the loss of two nodes, and it also has an assymetry, in that some people will have more neighbors than others. Punoqllads’ method seems to differ from my proposal only in using a square grid, rather than hexagonal: While this will probably increase robustness, it’s at the expense of an extra call for each person, and I’m not certain that the benefit justifies the extra cost.
In a system designed to work around “dead” nodes, simply include a “do not call after” time in the name-cell of anyone so-inclined, and ensure that the ones with the earliest bedtimes are all grouped in such a way that they don’t jeopardize a night owl. For example, in my method, you choose your grid to have even side lengths – but you could make everyone with a 10pm curfew occupy one column, and even head the column with “early birds - no calls after 10”. Since none of them will be the originator, just make sure that the people who end up next to the early birds on the chart know to treat their neighbor as a dead node after 10pm.
Well, you can sort of intuitively grasp that each person available gets called under my system, and if that’s your main measure of success, then you shouldn’t choose a method that has any probability of failure, as long as it’s provable that a method exists that can’t fail. In general, examine cases where
- The first handful of people the originator tries to call are dead nodes
- An entire geometric group are dead nodes (rows, columns, hexagons)
- A live node is surrounded by dead nodes
…and if your system doesn’t fail then, it’s probably pretty robust. I’ve never seen the aurora, but I always thought of it as somewhat transient and unpredictable, so I wanted my system to be fast. Perhaps instead of judging a system by its probability of success (to my mind, a binary criteria) you should judge by its performance degradation as it loses nodes. Rigorous calculation of this would, I think, be a job for Monte Carlo. 
I’ve done the math on how fast my system is when everything goes according to plan, but I also looked at a worst-case scenario of five early failures (the originator’s first five calls) that can nearly double the time it takes to call sixty people. Sheer bad luck, but I’ve been evaluating this for use as my office’s emergency phone tree, too, and in an emergency, “bad luck” can manifest any number of ways – e.g., when it snows at 2pm on a Friday in DC, anyone with kids is unreachable (or if they are reachable, they can’t call out) because they’re rushing to get kids out of day care and outside the beltway. In my office, that means an emergency drill might have to rely on 50% availability.
My suggestion of a square grid wasn’t so much for the robustness of the algorithm, but due to a concern that the concept of hexagons might be beyond the ken of astrophysicists.
Seriously, though, using a square grid wasn’t about robustness but about the ease of use, both for printing out and for adding and removing users, which you said were also concerns.
Speaking of which, I think I have a way of adding and removing users from a grid that doesn’t require that connectivity kluge when some rows are particular lengths. The idea is to make the grid such that there is never more than one hole in any row or in any column. Where there is a hole, the lines of communication of the person above and the person below cross over the lines of communication of the person on the right and the person on the left. The price you pay is that when adding a user to a grid that is a fully filled-in rectangle, you need to alter many users’ communication lines by pulling appart the current rectangle into two stairsteped triangles.
So, adding a user U to a fully populated 4x5 grid would look like:
before: after:
A B C D E + B C D E
F G H I J A + H I J
K L M N O F G + N O
P Q R S T K L M + T
P Q R S U
So, for this example, A was initially adjacent to B, E, F, and P, but after the split, A is adjacent to F, H, J, and P. T was initially adjacent to E, O, P, and S, but after the split is adjacent to K, M, O, and U.
Adding a user U to a fully populated 4x4 grid would look like:
before: after:
A B C D A B C D
E F G H + F G H
I J K L E + K L
M N O P I J + P
M N O U
An easy way to deal with certain users not wanting to be woken up after 11pm could be to have two separate pages, one with a grid for before 11pm, and one for after 11pm. That would probably work for any scheme, actually.
On average, you should find that most people have about the same number of neighbors. But you’re right–it’s not guaranteed to survive the loss of two vertices.
The ideal solution is to send email, and let anyone who’s interested monitor their account. Why don’t you want to do that?
I’m going to bet that even among astrophysicists, there aren’t very many people who leave their e-mail client running with a “new mail” sound that’s loud enough to be heard throughout the house. There are probably also some people for whom the aurora is a novelty – people who would not mind being awakened from a deep sleep by a phone call, but who would probably grow tired of being awakened by a SPAM message coming in at 2am.
Also, most webmail clients don’t have system hooks that can play a sound, so those people would need to sit in front of their computers every night waiting for the sign. Personally, I’d rather stand outside every night watching for the aurora.
Let me throw out another idea. You could have some phone line owned by the physics department hooked up to a computer after hours. If they department has some kind of voice mail system, the line could be one particular extension that a caller could reach. The computer would have some setup where a person calling in can set off a chain reaction where the computer would call up each person in the list. It could have different sets of people to call up depending upon what time of day it was.
I’m betting there’s some commercial products out there that do something like this, but I think there’s a better route to take. The computer science department at your university probably has a software engineering course, where students group into teams and work on projects submitted by outside groups. This sort of course is always looking for outside projects to hand to the students, where it won’t be a big problem if the students are unable to fully implement the project before the semester is over. This sort of project seems to me to be an ideal candidate for such a course.
I’ll ask around what people think of the idea of an automatic caller. I have the hunch that if it were desireable, it would already have been implemented, but then again, maybe that’s just what everyone else was thinking. One problem which occurs to me is that with the current system, callers can add comments: “It’s mostly cloudy, but you can see lights poking through the gaps”, or “This is a bright one; you might be able to get good photos”. I suppose that an auto-dialer could be configured to let the initiator add a voice message, though. Another concern is security: If we have an auto-dialler which can be initiated by a phone call, we wouldn’t want any old Joe Schmo to be able to start it. This would presumably mean either verification of originating phone number, or some sort of password once you reached the dialler.