Imagine you are a grad student working on a “make better SQL parser errors” thesis. You start out with the observation that the errors being reported right now pretty much suck. “Syntax error near token so-and-so” - not nearly as informative as what we are used to seeing from a Java compiler's syntax check. So it would be nice if the parser had a grasp of the issue closer to the human level and could describe errors in human-understandable terms. Right now it seems to just bail out early at the first unexpected token.
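For a concrete taste of the status quo, here is what SQLite (through Python's built-in sqlite3 module) says about a misspelled keyword. This is just an illustrative sketch of the kind of terse message the hypothetical thesis would improve on, not anything tied to a particular engine:

```python
import sqlite3

# Deliberately misspell the SELECT keyword ("SELEC") and see
# what the parser reports.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
try:
    conn.execute("SELEC x FROM t")
except sqlite3.OperationalError as e:
    # SQLite points at the offending token and says "syntax error",
    # with no hint that SELECT was probably intended.
    print(e)
```

The engine knows exactly where it stopped, but not *why* a human made the mistake - which is the gap the thesis would target.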
So presumably, for this project it would be nice to have a corpus of erroneous queries. A corpus of valid, legit queries (say, for validating a parser) is a separate issue in itself - I am not even sure where that could be obtained. But where do you get the erroneous queries?
Well, an obvious answer would be “log the actual queries typed by human users into an RDBMS while writing their programs”. It seems natural to try to understand human-relevant mistakes by studying the sorts of errors real-life humans actually make. But if you are that grad student, where do you go to get a corpus of logged queries from an RDBMS server with enough human users to accumulate lots of queries, some of them erroneous?