Here’s the problem with counting the number of penalties to analyze the bias of the refs: you’re missing the critical half of the data. The data you need is how many penalties were actually committed versus how many were called.
As an example, two teams play. Each has 10 penalties called against them for a similar number of yards. The refs must be unbiased and got this one right, eh? They’ve called each side equally.
Except what if one team actually committed 2 penalties and the other committed 20? Now suddenly the “balanced” 10 penalties on each side is not even remotely fair - one team was heavily favored by the refs (only being called for half the penalties they committed) and the other team was screwed by the refs (who called mostly phantom penalties against them).
So what you would need to determine bias is a data set that we don’t have - we’d need to know what perfect refs would’ve called versus what our fallible refs actually did call.
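To put numbers on the two-team example above, here’s a rough sketch in Python. The function name and the breakdown categories are made up for illustration, and it makes a generous assumption: every call that could have matched a real infraction did, so the phantom and missed counts are the minimum possible.

```python
def officiating_breakdown(committed, called):
    """Split one team's penalties into correct calls, missed calls, and phantom calls.

    Assumption (hypothetical, best case): as many calls as possible line up
    with real infractions, so correct = min(committed, called).
    """
    correct = min(committed, called)
    missed = committed - correct   # real infractions that drew no flag
    phantom = called - correct     # flags with no real infraction behind them
    return {"correct": correct, "missed": missed, "phantom": phantom}


# Both teams were flagged 10 times, but committed very different numbers.
team_a = officiating_breakdown(committed=2, called=10)
team_b = officiating_breakdown(committed=20, called=10)

print("Team A:", team_a)
print("Team B:", team_b)
```

Same 10 flags on the stat sheet for both teams, but at least 8 of Team A’s were phantom calls and Team B got away with at least 10 infractions. The called count alone can’t tell those two situations apart.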
In order to objectively gather that data, you’d have to get someone knowledgeable in the rules to go back and view every game, coming up with an objective-as-possible penalty list vs the actual penalties called. And obviously we don’t have that.
But, to a very flawed extent, that’s what you can do on your own when you watch a lot of games closely. You’re building your own mental count of what you see vs what actually gets called, and seeing what patterns emerge.
This is fraught with biases of its own - you’ll be partisan about your own team and against teams you dislike - so it’s certainly a flawed method of analysis. But a flawed process is not automatically a worthless one. If there actually were an imbalance in officiating, this is how people would notice it, and they would likely be called partisan and dismissed even if it were a real effect.
But let’s take two example teams that I have no particular bias towards or against. In my estimation, based on having watched dozens of games, the Patriots get more than their fair share of favorable officiating, and the Lions tend to receive unfavorable officiating. Do I have the hard data to back me up? No, there’s no PFF-like ref evaluation service publicly available. It’s just something learned from observation over time, and given that I don’t care if either team wins or loses, I have no particular reason to skew those observations. My being unable to prove this point is not the same as it not being real - because the default assumption otherwise is that the refs operate perfectly without any real biases of any sort, which is itself pretty unlikely.