How to find the depth of hierarchical data in an adjacency list?

I have a MySQL database with an adjacency table, and the data has a variable-depth hierarchy. Is there a way using only SQL to find the depth of each leaf? In my searches on the web, the only answer to “how do I find the depth” that I’ve seen is “consider a nested set”, but I can not change the DB , so I can’t go to a nested set model.

Any help would be appreciated. Thanks

There’s no guaranteed way in pure SQL, but you can fake it by doing a number of self joins. For example, if you know that your adjacency list will never be deeper than ten, do a series of ten self-joins on the table, matching child to parent. Here’s an example:

Suppose we have a table with the following data:



mysql> select * from testfoo;
+--------+-------+
| parent | child |
+--------+-------+
|   NULL |     1 | 
|      1 |     2 | 
|      2 |     3 | 
|      3 |     4 | 
|      1 |     5 | 
|      5 |     6 | 
+--------+-------+
6 rows in set (0.00 sec)


We want to find the depth of child = 4:

We have a high degree of confidence that our depth will never be higher than five levels, so we can do a query like this:



select p1.parent, p2.parent, p3.parent, p4.parent, p5.parent 
  from testfoo p1 
  left join testfoo p2 on (p1.parent = p2.child ) 
  left join testfoo p3 on (p2.parent = p3.child ) 
  left join testfoo p4 on (p3.parent = p4.child ) 
  left join testfoo p5 on (p4.parent = p5.child ) 
where p1.child = 4;


And that gets as the result:



+--------+--------+--------+--------+--------+
| parent | parent | parent | parent | parent |
+--------+--------+--------+--------+--------+
|      3 |      2 |      1 |   NULL |   NULL | 
+--------+--------+--------+--------+--------+
1 row in set (0.00 sec)


And we can calculate the depth by counting the number of NULLs we get back.

This is horribly inefficient, especially if the adjacency list is big or very deep, and it’s not a perfect solution since you still have to count the NULLs that you get back to determine the depth. Also, it won’t work for arbitrary depth; there’s no recursion in pure SQL so you’re stuck with a finite number of self joins.

Actually, it just occurred to me that if you wrap that whole thing in a subselect, you can just count the incidence of true values in an outer select to get the depth.



select if( l1, 1, 0 ) + if( l2, 1, 0 ) + if (l3, 1, 0) + if ( l4, 1, 0 ) + if ( l5, 1, 0 ) as depth 
  from ( select p1.parent as l1, p2.parent as l2, p3.parent as l3, p4.parent as l4, p5.parent as l5 
         from testfoo p1 
         left join testfoo p2 on (p1.parent = p2.child ) 
         left join testfoo p3 on (p2.parent = p3.child ) 
         left join testfoo p4 on (p3.parent = p4.child ) 
         left join testfoo p5 on (p4.parent = p5.child ) 
        where p1.child = 4 
    ) as list;


which gives us



+-------+
| depth |
+-------+
|     3 | 
+-------+
1 row in set (0.00 sec)


Shazam.

Again, we’re limited to a finite depth (five in this case) but the pattern can be extended as long as you can tolerate such shenanigans.

Sweet. Thanks for the assist.

If you have a potentially large depth (e.g. 1000) that you have to deal with, you can do the calculation in O(log n) steps with some secondary tables. Something like:



create table edge_depth0 as
select parent, child, 1 as depth from edges;

create table edge_depth_join as
select d0.parent, d1.child, d0.depth + d1.depth as depth
from edge_depth0 d0 join edge_depth0 d1
on d0.child = d1.parent;

create table edge_depth1 as
select parent, child, max(depth) as depth
from
(
  select parent, child, depth
  from edge_depth0
  union
  select parent, child, depth
  from edge_depth_join
)
group by parent, child;


and then ping-ponging between edge_depth0 and edge_depth1 like so:



truncate table edge_depth_join;

insert into edge_depth_join (parent, child, depth)
select d0.parent, d1.child, d0.depth + d1.depth
from edge_depth1 d0 join edge_depth1 d1
on d0.child = d1.parent;

truncate table edge_depth0;

insert into edge_depth0 (parent, child, depth)
select parent, child, max(depth)
from
(
  select parent, child, depth
  from edge_depth1
  union
  select parent, child, depth
  from edge_depth_join
)
group by parent, child;

truncate table edge_depth_join;

insert into edge_depth_join (parent, child, depth)
select d0.parent, d1.child, d0.depth + d1.depth
from edge_depth0 d0 join edge_depth0 d1
on d0.child = d1.parent;

truncate table edge_depth1;

insert into edge_depth1 (parent, child, depth)
select parent, child, max(depth)
from
(
  select parent, child, depth
  from edge_depth0
  union
  select parent, child, depth
  from edge_depth_join
)
group by parent, child;


Once you find that the number of rows in edge_depth0 is equal to the number of rows in edge_depth1, you can stop ping-ponging. Then, to get the depth of every node you then do the query:



create table node_depth as
select child as node, max(depth) as depth
from edge_depth1
group by child
union
select e0.parent as node, 0 as depth
from edges e0 left join edges e1
on e0.parent = e1.child
where e1.child is null;


Note that the second part of the union is to get the root of the tree or trees in the data structure, which themselves might be leaves if they have no children. Finally, if all you want are just the depths of the leaves in the data – say, sorted by node – you do:



select n.*
from node_depth left join edges e
on n.node = e.parent
where e.parent is null
order by node;


Note that while the number of steps is O(log n), the data size is O(n[sup]2[/sup]).