is there a tool that tells you "why" a particular tuple is not in SQL query result set?

let’s say a query is “supposed” to return a tuple with some identifiable unique id, but in reality it does not. I think the question “why this tuple is not returned” can be, at least in some cases, reformulated as “how does tuple so-and-so not match the given query - e.g. which condition is false, or which subquery is returning no records etc”. I suspect that this is an overly specific case, and that there are also more general ones, some of which may be cleverly reformulated and others may hit the SAT problem and at best could only be approximately heuristically detectable as such.

Anyway, so is there a tool out there that will handle at least the simple cases of this general database programming related user interface issue? Waah, I don’t wanna figure things out by myself, I wanna get computer mommy do it for me. Waah…

What database are you using? There are query analysis tools that can break down complex queries into individual parts that you can test, and most databases have an “explain” command (it may be called something else) that will tell you what indexes and tables will be queried in what order. Usually these are used for optimizing queries.

I am using SQL Server. Sometimes I also use MySQL.

friedo, are there screenshots or puff pieces out there illustrating the power of these query subdivision tools you mention?

I agree that this sounds like a bandaid solution to some of the specific cases of the more general problem I am pointing out. E.g. if just a single unique id is sufficient to describe the missing tuple, manually rewriting the query to use that id and then testing it part by part to see where it all breaks down is not that hard. If the tuple can only be uniquely identified by a subset of its columns, it starts to suck more.

Hmm, and dealing with a stored procedure worth of queries in this case is a separate issue entirely, and an interesting one at that.

ok, update. The particular problem I had got resolved without any query analysis, just by pasting the view into query window and then randomly deleting various of its conditions to see when the expected id finally gets returned. That, of course, is the dumbest possible way of doing this computationally (applicability of SAT complexity is fairly obvious) but in practice the query was sufficiently simple for this to work. Nice thing to know, in the absence of better tools.

Actually, for debugging why the results you are getting are not what you are expecting, I’ve never heard of any way besides some variant of the one you used. I don’t see how an explain plan would help – obviously the main purpose of an explain is for performance optimization, not for analyzing query correctness.

If someone can give me an example of how looking at a query plan can give clues to why some row is not showing up when I think it should, I’d be very happy to see that. I could see how it might show you what tables it’s looking at and possibly what filters it’s using on each one, but I don’t see this helping you in any but the most simple cases.

that would be my suspicion as well.

But what seems to be missing is potential use of the missing tuple’s other columns’ values, if I happen to know them (and chances are I know at least some of them just from running manual queries on underlying tables). If there were an automated tool to insert a whole bunch of these values seamlessly into the query, and then check which one of the conditions ends up failing (that would require extracting/generating a bunch of subqueries and then running them), that would collapse the dimensionality of the underlying SAT and possibly just give an instant answer e.g. “ok, you don’t get it because so-and-so condition in this subquery is never true, so the input to inner join is always empty”

This is the way I would use. If I get an issue ticket saying “I went to the Current Accounts page and looked for the Capital Maintenance Expenses Account and it wasn’t there”, I would probably break down the query like this, possibly finding out something like “It isn’t there because the query is buggy. The query improperly filters out accounts with no activity because it is INNER JOINing with the AccountTransaction table rather than LEFT OUTER JOINing with it.” I have seen an analogous issue in a real system.

I have used query explainers to track down things like a missing join where I thought there was one (due to a mile-long list of badly written implicit joins, for example.)