I know this is mainly a general information forum and I’m asking what is likely a highly technical question, so maybe I just need direction to a forum where these kinds of questions are more typical. But this is the place for me to get answers to things that I can’t manage to Google, so I’ll start with the place that I know.
I’m working on a system for predicting the outcome of sporting/gaming events between two individual competitors, where the result is a win for one and a loss for the other and no other information is available about how that outcome occurred. I have implemented the Glicko2 system to rate these players. It provides a point estimate of each player’s rating and a variance of that estimate, from which one derives the probability of a player winning a match against another player. This probability is my “prior”.
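For concreteness, here is a minimal sketch of how such a prior can be computed from two ratings and their rating deviations. It uses the expected-outcome formula from Glickman's Glicko description on the familiar 1500-centered rating scale; the post does not say which exact formula is used, so treat this as an assumption. The names `g` and `win_probability` are illustrative.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant

def g(rd):
    """Attenuation factor: approaches 0 as the rating deviation grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)

def win_probability(r_a, rd_a, r_b, rd_b):
    """Prior probability that player A beats player B.

    Assumes ratings and deviations are on the Glicko rating scale
    (not the internal Glicko-2 mu/phi scale).
    """
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    exponent = -g(combined_rd) * (r_a - r_b) / 400.0
    return 1.0 / (1.0 + 10.0**exponent)

# Hypothetical players: A rated 1600 (RD 50), B rated 1500 (RD 80).
prior = win_probability(1600, 50, 1500, 80)
```

Larger rating deviations pull the probability toward 0.5, which is how the uncertainty in the ratings enters the prior.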
There is also a head-to-head history between these players. Sometimes it corresponds well with their ratings at the times they met previously, other times it does not, and I figure that there should be a way of updating the probabilities that each player wins based on the additional information of their head-to-head results. We would have from Bayes’ formula:
W = win, L = loss
D = head-to-head data
P(W|D) = P(D|W) * P(W) / (P(D|W) * P(W) + P(D|L) * P(L))
P(W) and P(L) are just the prior probabilities. In order to use the head-to-head data, we need to figure out P(D|W) (and P(D|L)), which is the probability of getting the head-to-head data given that the player will actually win (lose) the next match. In the probability text I had from school a decade ago, every Bayes’ formula problem of this kind simply gives this probability. But how do I figure it out in this case?
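The formula itself is mechanical once the two likelihoods are known. The sketch below only shows that arithmetic; the likelihood values passed in are placeholders, since how to obtain P(D|W) and P(D|L) is exactly the open question. The function name `posterior_win` is illustrative.

```python
def posterior_win(prior_win, lik_given_win, lik_given_loss):
    """Bayes' formula for P(W|D) with two exhaustive outcomes, W and L.

    prior_win      -- P(W); P(L) is taken to be 1 - P(W)
    lik_given_win  -- P(D|W), assumed to be supplied by the caller
    lik_given_loss -- P(D|L), assumed to be supplied by the caller
    """
    prior_loss = 1.0 - prior_win
    numerator = lik_given_win * prior_win
    denominator = numerator + lik_given_loss * prior_loss
    if denominator == 0.0:
        raise ValueError("the data has zero probability under both outcomes")
    return numerator / denominator

# Placeholder likelihoods, not derived from any real data.
updated = posterior_win(prior_win=0.5, lik_given_win=0.2, lik_given_loss=0.1)
```

If the two likelihoods are equal, the data carries no information and the posterior equals the prior.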
I know the system’s estimate of the probability of each player having won each of their previous matches, so I have an expected number of HTH wins to compare with the actual number of HTH wins, as well as the variance of the distribution of the number of HTH wins. With these values I need to find the probability that the HTH data would occur given that the next match goes a certain way. I can see how this could be calculable in theory, but I have absolutely no idea how to actually calculate it; I don’t remember that much probability theory, and this may have gone beyond what I learned anyway. I’m guessing you have to integrate something against something else, but I’m at a complete loss for details.
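The expected number of HTH wins, its variance, and the full distribution of the win count can be sketched as follows. This assumes the past matches are independent given the system's pre-match probabilities, which the post does not state. The function names and sample probabilities are illustrative.

```python
import math

def hth_summary(pre_match_probs, actual_wins):
    """Expected wins, variance, and z-score of the actual win count.

    pre_match_probs -- the system's win probability for the player
                       before each past head-to-head match
    """
    expected = sum(pre_match_probs)
    variance = sum(p * (1.0 - p) for p in pre_match_probs)
    z = (actual_wins - expected) / math.sqrt(variance) if variance > 0 else 0.0
    return expected, variance, z

def hth_win_distribution(pre_match_probs):
    """Exact probability of each possible win total (0..n), built up
    one match at a time. Relies on the independence assumption above."""
    dist = [1.0]  # before any match: 0 wins with probability 1
    for p in pre_match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            nxt[wins] += mass * (1.0 - p)   # this match was lost
            nxt[wins + 1] += mass * p       # this match was won
        dist = nxt
    return dist

# Hypothetical history: four past meetings, three of them won.
probs = [0.55, 0.60, 0.48, 0.52]
expected, variance, z = hth_summary(probs, actual_wins=3)
dist = hth_win_distribution(probs)
```

`dist[k]` is then the probability of exactly `k` HTH wins under the ratings alone, which is one way to judge how surprising the observed record is.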
I’ve tried to do some research on Bayesian methods, but I haven’t found anything that I could tell for certain was applicable to the situation I was in. To be sure, there is plenty out there regarding Bayesian inference, but if there’s anything regarding how to solve this particular problem it appears to be over my head, and I’m hoping someone here can offer me a bit more guidance as to how to approach this problem or what examples exist somewhere online that are similar enough to what I’m doing to provide the framework for me to make these calculations.
–
Without using Bayesian inference at all, I decided to model the change in probability as a change in each player’s rating. The change equals the standard deviation of that player’s rating (from the Glicko system), multiplied by the number of standard deviations (of the distribution of the head-to-head results) by which the actual number of head-to-head wins differs from the expected number, further multiplied by a coefficient. I then vary that coefficient to determine which value gives the best historical results. I have no idea if this is remotely statistically valid, but I was able to get the system to make slightly better historical predictions than it made without considering the HTH data. While thinking about how I might justify that calculation, I realized that I should be doing some Bayesian reasoning, and that is how I got stuck on the problem presented above.
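The heuristic described above can be sketched roughly as below. The sign convention (the player who over-performed in HTH moves up, the opponent moves down) and the name `adjusted_ratings` are assumptions, since the post does not spell out the direction of each shift.

```python
def adjusted_ratings(r_a, rd_a, r_b, rd_b, z, k):
    """Shift both ratings by k * z * (own rating deviation).

    z -- standard deviations by which player A's actual HTH wins
         exceed the expected number (negative if A under-performed)
    k -- tuning coefficient, to be chosen by checking past predictions
    Assumed sign convention: A moves by +shift, B by -shift.
    """
    shift_a = k * z * rd_a
    shift_b = k * z * rd_b
    return r_a + shift_a, r_b - shift_b

# Hypothetical values: A over-performed by 1.2 standard deviations.
new_a, new_b = adjusted_ratings(1600, 50, 1500, 80, z=1.2, k=0.25)
```

The adjusted ratings would then be fed back into the usual win-probability formula, and `k = 0` reproduces the unadjusted prediction.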