distribution of prime numbers in made-up data

Hopefully the title is helpful.

A thought occurred to me as I was explaining Benford’s law to my wife (she’s a teacher and she was marking seven-year-olds’ maths work).

Would it also be true to say that, in a made-up set of data of large enough size (where someone is just plucking numbers out of thin air), there would likely be more prime numbers than if it were a real-world data set?

My reasoning is that in the real world you’d get more numbers that were the product of a multiplication, and as this isn’t the case for primes, perhaps that would be a tell-tale for book-cooking and the like.

Intuitively this feels like it would be so, and more so for the larger primes, but my google-fu has proved weak on this and my maths skills are not impressive.

So…great dopers, have I stumbled on a great mathematical insight? Am I basically an undiscovered Laplace? Or has this one been done to death and I’m rather late to the party?

It seems like this is something that is amenable to experiment.

But the kinds of datasets where Benford’s law applies aren’t necessarily the results of “multiplying things together”. They include things like “populations of towns” and “street addresses”, which are not likely to be impoverished of primes by construction. Certainly some things would be prime-deficient by construction, like areas of rooms in square feet, but not as wide a variety of things as Benford’s law applies to.

That being said, if by “someone is just plucking numbers out of thin air”, you mean an actual person picking “random” numbers, those numbers tend to end in 3 and 7 a lot more frequently than truly random numbers, so one would expect more than the usual number of primes just by reducing the multiples of 5.
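To put a rough number on the last-digit point, here is a minimal sketch that counts what share of the integers below 1000 are prime, grouped by final digit. The cutoff of 1000 and the function names are my own choices for illustration, not anything from the thread.

```python
from collections import Counter

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def prime_fraction_by_last_digit(limit):
    """Share of the integers 1..limit-1 that are prime, grouped by last digit."""
    totals = Counter()
    primes = Counter()
    for n in range(1, limit):
        d = n % 10
        totals[d] += 1
        if is_prime(n):
            primes[d] += 1
    return {d: primes[d] / totals[d] for d in range(10)}

# Assumed cutoff: 1000 is arbitrary, picked to keep the run instant.
fractions = prime_fraction_by_last_digit(1000)
for d in range(10):
    print(d, round(fractions[d], 3))
```

Numbers ending in 0, 4, 6 or 8 are never prime, and those ending in 2 or 5 are prime only for 2 and 5 themselves, so any habit that favours endings of 3 and 7 shifts picks into the classes where all the other primes live.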

It’s not obvious to me that your proposed “too many prime numbers” law is correct.
It’s totally unrelated to Benford’s Law (which has long fascinated me, and which is essentially a formalization of the idea that one big boulder can be broken up into a lot of small rocks and pebbles, if you think about it). That “made up” numbers don’t follow the expected statistics is, indeed, a way to uncover such fraud, as has been shown with uses of Benford Probabilities, as well as simple statistics of appearance.

But it’s not clear to me why the fraction of prime numbers would be a better indicator of falsehood than any other measure.

Let’s say you wanted to forge a table of, say, the lengths of rivers. Certainly more of those lengths would be composites than primes, since in any real such table that would be the case, because composite numbers greatly outnumber primes. And if there were more primes than you’d expect, that might send up a red flag. But most people aren’t familiar with which numbers are prime once you get past a hundred, and aren’t any more likely to pick a prime number than primes would be to appear in a real table.
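For a sense of how heavily composites outnumber primes, here is a small sketch that counts primes below a few cutoffs with a sieve of Eratosthenes. The cutoffs and the function name are just illustrative assumptions.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: return every prime p with p < limit."""
    if limit < 3:
        return []
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, limit, p):
                sieve[multiple] = False
    return [n for n, flag in enumerate(sieve) if flag]

# Assumed cutoffs, chosen only to show the trend.
for limit in (100, 1_000, 10_000):
    count = len(primes_below(limit))
    print(f"below {limit}: {count} primes, {count / limit:.1%} of the range")
```

The counts are 25, 168 and 1229, so the share of primes keeps thinning out as the numbers get bigger, and a table of larger values should contain proportionally fewer of them.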

On the other hand, people would be likely to make the distribution of initial digits roughly equal, because that “looks more natural”. The kicker, of course, is that it isn’t – there ought to be a preponderance of 1’s as the first digit (almost a third of the time), and very few 7’s, 8’s, and 9’s – that follows from Benford’s law. So Benford’s law would be very useful in detecting fraud, while Prime numbers would be pretty useless.
Even if you were constructing a “random” table of double-digit numbers, I can’t see any reason that people would preferentially select more prime numbers than chance alone would allow (and in this situation people would more likely know which ones were prime). But, again, the Benford Probabilities would probably tip you to the imposter.
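The first-digit shares mentioned above come from the standard Benford formula, P(d) = log10(1 + 1/d). Here is a minimal sketch that prints the expected shares next to the observed ones for a test sequence. The powers of 2 are only a stand-in I picked because they are known to follow the law closely; a real check would use the table under suspicion.

```python
import math
from collections import Counter

def benford_probability(d):
    """Expected share of values whose leading digit is d (1..9) under Benford's law."""
    return math.log10(1 + 1 / d)

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_shares(values):
    """Observed share of each leading digit 1..9 in a list of positive integers."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Stand-in data (an assumption for illustration): the first 1000 powers of 2.
sample = [2 ** n for n in range(1, 1001)]
observed = leading_digit_shares(sample)
for d in range(1, 10):
    print(d, f"expected {benford_probability(d):.3f}", f"observed {observed[d]:.3f}")
```

The expected share for a leading 1 is about 0.301, and it drops below 0.06 for 7, 8 and 9. A forger who spreads first digits evenly would put each near 0.111, which is the mismatch a Benford check looks for.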

A slight hijack, but yes, you can tell if data are made up by careful analysis of the distributions. I’ve done it myself when I had an undergrad who made up their senior thesis data. It was an easy catch (took me all of 15 minutes) and they were expelled from the honors program, and had their graduation delayed for a year.

I would have expelled the student, but it wasn’t my call.

Not true. Most “real-world” numbers are real numbers (i.e. not integers). When you’re multiplying real numbers and rounding them, you can end up with prime numbers.
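This is easy to try out. The sketch below compares products of two whole numbers against rounded products of two real numbers. The range 2 to 99, the sample size, the seed and the function names are all assumptions I made up for the experiment.

```python
import random

def is_prime(n):
    """Trial division; adequate for products below 10,000."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def prime_share(values):
    """Fraction of the given integers that are prime."""
    return sum(is_prime(v) for v in values) / len(values)

rng = random.Random(0)   # fixed seed so the run is repeatable
N = 10_000               # assumed sample size

# Both factors are integers >= 2, so every product is composite.
int_products = [rng.randint(2, 99) * rng.randint(2, 99) for _ in range(N)]

# Real-valued factors, multiplied and then rounded to the nearest integer.
rounded_products = [round(rng.uniform(2, 99) * rng.uniform(2, 99)) for _ in range(N)]

print("integer products, prime share:", prime_share(int_products))
print("rounded real products, prime share:", prime_share(rounded_products))
```

The integer products can never be prime, while the rounded real products land on primes about as often as any other integers of similar size would, so multiplication only suppresses primes when the inputs are whole numbers.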