Graph theory: Efficient, robust calling tree

In my physics department, we have an aurora calling tree. Basically, the idea is that anyone who sees the Northern Lights will call other people, who then call yet others, etc., so that everyone can see them, too. Ideally, one wants such a system to have several qualities:[ol]
[li]It should be efficient: We want everyone to get word quickly.[/li][li]It should be robust: If a person isn’t home, or doesn’t have his copy of the list, others should still be able to get the word.[/li][li]It should not make things too difficult for any particular person.[/li][li]It should be startable by any person on the list.[/li][li]It should be simple: The entire list and instructions should fit onto two sides of a sheet of paper, and be useable by people woken up at 2 in the morning.[/li][/ol] Currently, the system we use has six circles. The first person to see the aurora calls one person on each circle, and then calling proceeds clockwise around each circle. If you can’t get in touch with someone, then you call the next person around the circle until you get someone. I think we can do better: Under our current system, the first person needs to call at least six people, which is a bit of a burden to place on any given person, especially the starter (you don’t want to discourage starting the list). Each person must call an indeterminate number of others, since you don’t know who’s going to be home. And if there’s a break anywhere, that can potentially leave almost an entire circle unaware.

My first thought was to arrange the people on the vertices of a regular dodecahedron, with the first person calling all three of his neighbors. Each other person would then call his two neighbors who did not call him. Since each person would be calling two others, and could be called by three others, defects would tend to heal themselves, and if you didn’t get in touch with one or both of your contacts, you could just stop calling, confident that news would probably still reach the downstream nodes via some other route. But, of course, this would only work if there were exactly 20 names on the list, which there are not.

My next thought was a hexagonal grid, with periodic boundary conditions. This has the same advantages as the dodecahedron, but it’s much more scalable (it can be used for any “rectangular” number times 4), and it’s much easier to represent on a flat sheet of paper. But this still isn’t perfect; what if the number of names can’t be fitted exactly onto such a grid? Distort the grid to close up the gaps? Just leave those nodes “dead”?

I now have another potential idea: Arrange all the names on a single cyclic list of N names, and have each person call both the next person on the list, and the person M nodes downstream (ideally, M should be close to sqrt(N) and relatively prime to N). This is simpler than the hex grid, and scales easily to any arbitrary number of names, but I’m afraid that it might not be as robust as the hex grid, since each person can only be called by two others, not three others. Is this assessment correct?

To further complicate matters, some people who sign up for the calling tree don’t want to be woken up after 11:00 PM. Currently, this is accomplished by two cicles just for early people; if you see the aurora after 11:00, you just don’t call those lists. How could this be simply accomplished for my tree methods?

Or is there some other, yet better, method that I’m overlooking? And is there some well-established method (beyond Monte Carlo testing) to assess the probability of failure (defined as a case where a person is available to be called, but is not) for such calling trees? Any other suggestions?

That’s not really a tree that you’re describing–a tree has a unique path between any two vertices, which is not a property you want here.

Basically, you want a graph with no cut edges (i.e., removing a single edge doesn’t change the number of connected components. I don’t know of any way to generate such graphs, but it’s pretty easy to check for that property. Need details?

Sorry, make that cut vertices. Same idea.

Assuming you just want to solve this particular problem, you need an SMS Group.
Everyone is part of the same group.
Anyone can initiate and send an SMS to the group.
If you don’t want to be woken up, then turn your phone off.
Everyone else should get the SMS at the same time.

That would require everyone to own a device capable of receiving an SMS.

Why not make sure everyone has a cell phone that receives email and simply do an email blast.

Because not everyone has a cell phone. Everyone does have e-mail, and we do send messages over the e-mail system, but those typically aren’t read until the next morning (which is generally too late). We need a system which can get people out of bed, which means a ringing phone.

I recognize that “tree” is not the most accurate term here; I use it because that’s the term currently used (the papers handed out are labelled “Aurora Calling Tree” on the top). A better term would probably be “Aurora Calling Network”, and if I end up coordinating it, I might just re-name it that. But names are, of course, irrelevant.

And while I do want a graph with no cut vertices, that’s not the total extent of what I want. It’s fairly easy to show that none of the three ideas I mentioned has any cut vertices. Ideally, though, what I want is a network which has a high probability for remaining connected even when a larger number of vertices are cut. We also want a graph that’s efficient: Approximately, this means that it has a low diameter and a small number of edges per node. I do realize that the requirements are somewhat vague, since I haven’t defined “small number of edges”, “low diameter”, nor “large number of cut vertices”, but it’s hard to pin down optimal values.

Because phones are the medium the problem was posed in. If they all had shortwave radios, we could rig an antenna to emit a high squeal based on aurora radiation and everyone would be awakened whenever the aurora was in the sky… but that’s not the question.

Try unwrapping your dodecahedron or icosahedron and just using the equatorial panels. Draw two parallel lines and then divide the band into equilateral triangles. At each vertex, place a name. The parallel lines “wrap” so nobody is on the end.



<--A---B---C---B--> 
   \  /\  /\  /  
    \/  \/  \/
<---B---C---C---->


Adam calls Betty, Bobby, and Bridget; each of them has a chart, so information flows from just the call (especially if everyone passes on the originator’s name). If Adam calls Betty, she knows that Bobby and Bridget will get calls soon. She knows that her first responsibility is to call Charlie and Clara. Bridget will also be calling Charlie (and Calvin); Bobby will be calling Calvin and Clara. Each person makes two calls and should receive two calls. The originator (Adam) places four calls, however, and receives none unless you go back through the tree to inform everyone of who did and didn’t hear the message. I haven’t looked at all the configurations, but there might be a situation in which some poor bastard gets called four times.

Because the message travels around the loop in both directions at once, and because each pulse traveling around the parallel hoops gets synchronized at each step, I believe that it’s self-healing.

The rules are simple:
0) Call everyone adjacent to you (four calls) to start the chain. Make sure to tell them who you are.

  1. When you receive a call, ask “who started the chain”?
  2. If the call came from the above your name, call both people below you. If the call is coming from below, call the people above you. Tell them where the call came from so they can evaluate which direction it’s going.
  3. If you don’t get a second call from the same direction as the first, call your last link to verify that they got the message or aren’t home.
  4. If neither of your calls from (3) are complete, call one person who they would have called to keep your side of the chain alive.

Some of those steps can be eliminated to reduce redundancy, but I think that will get everyone contacted quickly.

I imagine the most efficient method would be some sort of central repository. If we assume that most people have always-on broadband, what you could do is set up a webpage which could keep track of who to dial. The first person to see the Aurora could log on an “initiate calling tree” which would then randomly shuffle the list of names (with parameters such of time of day and the like) and present the first name on the list. Each subsequent webpage request would present the next name to be dialled until the names ran out. This way, each person is guarenteed to be dialled only once and people can dial as many or as few people as they feel like.

Here’s two methods for your consideration.

#1: List the names in some order. The top few should be those you think are the most available and trustworthy.

When someone spots the event, they start at the top of the list and call until they get a “hit”. That person then calls starting with the name below him until he gets a hit and so forth.

#2: Pick out about four “head honchos” who are likely to be available. Divide up the other persons in four-person groups.

The spotter will call the head honchos until he gets a hit, then he will call the other three in his own group. Only the head guys will need the complete listings, everybody else will know the top list group and their own.

Except for the head guys, the rules are simple. If someone in your group calls you, do nothing. If someone outside your group calls, you are to call the other three in your group.

I won’t go thru possible procedures of how the head honchos might divide up the groups etc, except to say they contact someone in each of the groups.

Without any fancy topology (I wouldn’t know the proper topological terms anyway), I see three important properties:

A) “symmetric” vs “assymetric”. In a symmetric system, links are bidirectional: you call the same people who would call you. This reduces the number of calls actually made. Anyone who calls you need not be called. I didn’t look into assymmetic networks because I didn’t see any advantages.

B) "linkage - how many nodes does each node connect to? (in an assymetric system, the number of outgoing nodes determines the total workload of the system, and the number of incoming nodes determines its failure tolerance)

c) “Shortest Alternate Path” [SAP - I’m making that term up] isn’t a figure of merit per se, but it’s measure of how much impact a dead node will have on the next generation of transfer. SAP=1 is a definite weakness, and SAP=2 might not be optimal, but the effect falls off very rapidly at SAP>2.

A triangular grid fares poorly because any two connected nodes will share a third contact (SAP=1), If You are supposed to call Abe, Bob and Charley, then Abe will always be on either Bob or Charley’s list, too, which magnifies the effect of his absence in the vulnerable early stages.

Your hexagonal solution is a good “3-link solution”. The SAP=5, so the no two adjacent nodes share a common contact. To “wrap the map”, Any two “incomplete nodes” (less than two links on the initial planar hexagonal map) can be linked to each other, as long as they don’t already share another contact (i.e. “no triangles are allowed on the final map” )

For a network failure to occur on a hexagonal map, there have to be 2K nonresponding nodes, where K is the “generation” (i.e. the initial spotter makes the K=1 calls; his contacts in turn make the K=2 calls; etc.) ANY system is vulnerable if all the initial spotter’s links are unavailable at k=1. To avoid this risk, give all members a shared list of 3 “emergency contacts” chosen for their willingness and typical availability.

Unless your observations are geographically dependent, try to make your network NOT correspond to geography, or it will be more vulnerable to physical issues (e.g. a failed phone trunk or a blizzard over a mountain range) A good network may route around eventually, but geographically random links route around physical obstacles more quickly.

I found Jurph’s map very clever, but a 4-link system should give substantially higher reliability than a 3-link, and his scheme can actually fail with just 4 absences, no matter how large the network is: a “break” occurs when there are two linked absences (one on the upper line, and one on the lower) and any two breaks will cut off the entire region of the network between them.

Contrast this with other 4-link systems:
An NxM rectangular grid (where N<M) wrapped around a sphere requires at least N non-responses for a network failure; since N>2 for any nontrivial number of nodes, this is a better solution.

Better yet: an NxM grid wrapped around a Torus requires at least N absences to create a break, and 2N absences (two breaks) for the network to fail. Since N can be maximized by making N=M (as close as possible), a rectangular grid with X nodes can’t fail with less than 2*sqrt(x) nonresponders. For X=16, this network can’t fail until half the individual nodes fail! Even for X=100, the network can’t fail until 20% of the nodes fail to respond.

For Large X, I am fairly sure that you can substantially improve on the rectangular grid wrapped around a torus by assigning 4 links at random, and avoiding SAP=1 (or maybe SAP=2) but the mathematics and proof are beyond me, and may be probablistic. Who needs that?

Here is a useful robust, exponential procedure:

Before anyone calls, you have a wrap-around alphabetical list (a loop), where you are number 0, and everyone else has a number which is how far away from you they are on the loop, the shortest way.

Someone calls you, and gives you a name on the list, and you check his number, and it is N.

We have:

You, 1, 2, … N
or
N, … , 2, 1, You

Now you start calling N and work your way back towards yourself until you reach someone, say N-2. Now you divide N-2 by two and give N-2 that name (call this person M).

We have:

N, N-1, N-2, …, M, M-1, … , 2, 1, You

Then you call 1, 2, etc. until you reach someone. You give whoever you reach the name of M-1.

If you call all the way from N to You, or from You to M-1, and no one is home, that’s ok, you’re done.

Both of the people you call do the same thing.

This is pretty much the minimum number of calls to assure exponential expansion of the phone tree without overlap.

I wrote that it takes 2K nonresponders for a hexagonal grid network to fail in the Kth generation, but it’s 3K.

Wow, lots of ideas, here. I don’t think Jurph’s idea, as-is, will quite work. The actual list has about 60 people, which means that even with no dead nodes, such a trellis-work wraparound would require at least 15 calls to reach the person opposite the originator, which is a bit slow, and the number of people reached as a function of time grows only linearly. aahala’s method would be even slower, and would require each person to call an indeterminate number of others, so I don’t see any advantage over the current system.

Almost everyone on the list is in the same city, so geography isn’t particularly relevant. There are actually some advantages, here: For instance, three nodes on the list will be graduate student offices, and a person in any office is likely to inform the other offices on the way out the door, adding some free redundancy (the phone from one office can’t be easily heard in another, so these three offices can’t be considered as a single node). To make the best use of this, I imagine that the offices should be spaced relatively far apart on the graph.

Any scheme can potentially fail if all of the starter’s contacts are unavailable, of course, but I presume that everyone’s smart enough that even without instructions, in that case they’d pick someone else to call and let that person act as starter. More worrisome is the possibility that the starter is isolated in some small disconnected region (larger than one node, but potentially only, say, two nodes), in which case the tree would fail and the starter would not know to attempt to fix it. For the hexagonal grid, this could occur with as few as 4 dead nodes, regardless of the total size of the grid. Likewise, for any locally rectangular grid, you could have a silent failure with 6 dead nodes, regardless of size.

I’m not sure I entirely follow jawdirk’s proposal, but as-is, it’s definitely no good. It provides no overlap, and therefore maximizes efficiency, but we want some overlap for redundancy.

Here’s an algorithm you can use to generate a pretty good efficient calling network.

Start with K[sub]n[/sub], the complete graph on n vertices. Pick an edge at random. If removing that edge doesn’t create any cut vertices, remove it. Otherwise, leave it alone. Keep doing this until none of the edges can be removed without creating a cut vertex.

You can use the algorithm presented here to find cut vertices fairly quickly.

On the contrary, the practice of calling until you reach someone takes care of redundancy. There is no set calling tree, so no there is no possibility of anyone being isolated.

To describe it more simplisticly: Each caller keeps calling until he reaches two people, and then divides his list of people to reach between them. The details I provided just explain how to do it with a minimum of confusion.

The redundancy is that if you can’t find even one person, you have to call your whole list, and if you can’t find two people, then you have to call half of your list. Fortunately, since it is best-case exponential, your list should be very short in a hurry.

The only way this can fail is if you call someone up, give them their list, they say, “yeah, I’m on it.” and then they go back to sleep. But we’re talking about the Northern Lights here. Who could possibly flake in a matter of such gravity?

Hey all – thanks for the critiques. I’ve been kicking myself all day for going with something so vulnerable. Why not do 6 columns of 10 names, using the following ruleset:

  1. If you are the originator, you are “on the rail” (your column will have more responsibility than the others). You will call everyone adjacent to you, in the order prescribed by the rules.
  2. If you receive a call from above, call down; if from below, call up. You are on the rail. Call left and right after you place your vertical call.
  3. If you call up or down but receive no answer, inform the next person you call (left or right) that they are the new rail.
  4. If you receive a call from left, call right; if from right, call left. You are not on the rail unless otherwise informed.
  5. If a call placed left or right fails, skip them and call the person they would have called.
  6. The grid “wraps around” vertically and horizontally! If you are in the left-most cell and can’t call left, call the right-most cell in your row. If you are in the top row and can’t call up, call the bottom person in your column.

For 60 people arranged in six rails of ten, this reaches the last person in 8 calling cycles. A hole on the rail column hurts you, but not badly, since the chain shunts around to one side. Several holes in a diagonal line can force a worst-case scenario where the rail continues to shunt around, but I think it’s still very difficult to isolate people.

Bonus: a simple grid in Excel (with the rules printed below) is all you need. You can post the grid as a webpage for those who don’t want to keep a hardcopy by the phone.

My idea is to use a two-dimensional grid. It’s actually very similar to what I believe you’re describing is your current system. When a person O first starts the chain, they start calling people in column 1 until they reach someone, call them S[sub]1[/sub]. O can then go on their merry way. S[sub]1[/sub] then continues calling people in column 1 until they hit the bottom or they reach someone, call them T[sub]1[/sub]. T[sub]1[/sub] continues calling down that column until they hit the bottom or reach someone. If they reach someone, that person becomes the new T[sub]1[/sub] and the old T[sub]1[/sub] can go on their merry way.

S[sub]1[/sub] isn’t yet finished, though. They start calling people in column 2 until they reach someone, who will be S[sub]2[/sub]. At that point, S[sub]1[/sub] will then be free to go on their way. If, for some reason, no person in that column could be found, S[sub]1[/sub] would then look in the next column for S[sub]3[/sub]

S[sub]i[/sub] (i = 2, 3, 4, etc.) will then have a job similar to S[sub]1[/sub], calling people in their column until they find a person, T[sub]i[/sub], who will work similarly to T[sub]1[/sub]. S[sub]i[/sub] will then check in the next column to find S[sub]i + 1[/sub]. For a slight optimization, the person O can be passed along by the S people until they hit the column that O is in, and then O is passed along by the T people, so they will know not to bother calling O.

I believe that the optimal way to arrange K people is, for where N is ceiling(sqrt(K)), in an N x N-1 grid, with a final row of K + N - N[sup]2[/sup] people. I believe also, that the best way of determining which people should be in which positions is to put the most “responsible” (I don’t like using that label, but I can’t think of a better one at the moment) people in the first row, the next most-responsible people in the third row, the next set in the fifth row, etc. Once the lowest odd-numbered row is reached, wrap around to the second row, then the fourth row, etc.

The above algorithm will result in O usually calling only 1 person, the S people calling 2 or 3 people, and the T people calling 1 or 2 people. The amount of time for the final person to be called, assuming average time to call someone A, would be about A * (2N - 2).

I tried originally to make the algorithm position-neutral, so that the person in the lower left-hand corner wouldn’t feel so bad about being left out, which essentially made the grid a torus, with O becoming a new S[sub]1[/sub] at the top left of the grid. It made things a little more tricky for me to explain without confusing people. You could probably work out the details if you really thought it important to group morale to avoid a “low totem-pole” position on the list.

Essentially, every person would have the same instructions but a list with themselves at the upper-left-hand corner. The name of O would be passed along to everyone, so each person would know what row/column to stop at, as well as to increase the fame of person O, to boot. The optimal positioning of people in the grid would be to place the most “responisble” people in a checkerboard-like fashion.

Oh, and another problem with the position-neutral algorithm is that I think that each member having the latest grid of people might be much more important. One person having an old list of people might result in one person being left out on a given night. With everyone having a different list, one person having an old list might cause bigger problems. I’m not sure about that, though – it might have about the same consequences.

I knew I’d screw something up.

In the position-dependant scheme(i.e., one global list that everyone uses), one person having an old list might cause someone to be skipped on a given night.

I’m thinking just have a list of everyone’s name and phone number, and distribute that list. If person C sees the lights and has not received a call or a message, they call person D and person B. Person D then calls person E (forward progression), and person B calls person A (backward progression). Person D and person B will know which way to progress by checking whether the caller’s name is before or after their name on the list. The list will work in a wrap-around manner.

The logic would go like this:

If I have a message, don’t call anyone.

If I receive a call AND I don’t have a message AND I have not called anyone, then perform CALL_LOGIC.

If I see the lights AND I don’t have a message AND I have not received a call, then call people before me AND after me on the list until I have made a successful call in both directions.


CALL_LOGIC:

If the caller is AFTER me on the list, call the person BEFORE me on the list, leaving a message for people that don’t answer and continue calling the next person BEFORE me on the list until I have a successful call.

If the caller is BEFORE me on the list, call the person AFTER me on the list, leaving a message for people that don’t answer and continue calling the next person AFTER me on the list until I have a successful call.


(This is assuming everyone can receive phone messages, and that everyone will do their duty to call according to the logic.)