Can anyone explain to me what log likelihood is? And can you please do it like I’m a complete idiot and have a less than basic understanding of statistics?

thanks in advance.

Log likelihood is, in essence, the logarithm of the probability of your sample data given the parameters of your model.

Suppose that you’re assuming that your sample x_1, x_2, …, x_n are independent and identically distributed with density f(x | p), where p is the vector of parameters for that density. The likelihood of that sample is f(x_1 | p) * f(x_2 | p) * … * f(x_n | p). Then the log likelihood is log(f(x_1 | p)) + log(f(x_2 | p)) + … + log(f(x_n | p)).

There are a couple of reasons for taking logs: a sum is often easier to work with than a product, especially for normal/exponential family distributions, and the product that goes into computing the likelihood can be too small for a computer to represent (it underflows to zero).
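To make the "too small for computers" point concrete, here is a rough Python sketch. It assumes a normal density with mean 0 and standard deviation 1, and the sample values are made up purely for illustration; the function names are mine, not from any library.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(sample, mu, sigma):
    """Product of the densities: f(x_1 | p) * f(x_2 | p) * ... * f(x_n | p)."""
    result = 1.0
    for x in sample:
        result *= normal_pdf(x, mu, sigma)
    return result

def log_likelihood(sample, mu, sigma):
    """Sum of the log densities, written out directly so no tiny product is ever formed."""
    log_norm = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - log_norm for x in sample)

# Made-up sample (an assumption for this example only).
small = [0.3, -1.2, 0.8, 1.5, -0.4]
big = small * 400  # the same values repeated, 2000 points in total

print("5 points:   ", likelihood(small, 0.0, 1.0), log_likelihood(small, 0.0, 1.0))
print("2000 points:", likelihood(big, 0.0, 1.0), log_likelihood(big, 0.0, 1.0))
```

With only five points both versions agree once you take the log of the product. With 2000 points the plain product falls below what a double-precision float can hold and comes out as zero, while the sum of logs is still an ordinary number.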

Does that help?

This made me immediately think of how information is measured in a data sequence… in which case “log likelihood” is just a way of compositing multiple probabilities into numbers easier for humans to work with.

Wikipedia kicks you to Likelihood-ratio test. Which, upon skimming, seems exactly what it’s used to do.

I appreciate the effort, but this is still way over my head. I am trying to help someone understand what it means, but the only math class I took in college was math 101 (i.e., math for history majors). I guess what I’m saying is that if you could explain it to me like I’m 8 years old then it would be a major help.

Thanks.

It’s a simple measure of how probable your data is given your model. Turned around, it tells you how well a given model is supported by your data.

To better tune the answer, could you tell us which of these terms are okay to use? (Saying “pretend I’m an 8th grader” doesn’t really help us much.)

- random
- logarithm
- product
- probability
- probability distribution
- independent (in the context of random numbers)
- normal (or Gaussian) distribution
- the notation “f(x)”
- the notation “f(x;p)”

In the meantime, I will assume only “probability” and “product” and “logarithm”.

Say you roll two dice ten times and get this sequence of values (the sum of the dice pips) : *8 10 8 7 8 9 8 5 7 9*. Assuming fair dice, the probability of getting a total of 2 on a roll is 1/36. For 3 it’s 2/36=1/18. For 7 it’s 6/36=1/6.

The “likelihood” of your data set, assuming fair dice, is the probability of observing the data set. In this (and most) cases, the probability of observing the data set is the product of the probabilities of observing the individual data points. So:

likelihood = (probability of getting 8) x (probability of getting 10) x ( etc… )

likelihood = (5/36) x (3/36) x ( etc… )

The “log(likelihood)” is the logarithm of this:

log(likelihood) = log(5/36) + log(3/36) + ( etc… )

Notice the effect of the logarithm: the different data points show up in terms that are added together. A sum of many numbers is much easier to deal with computationally than a product of many numbers. Simplifying further:

log(likelihood) = log(5) + log(3) + ( eight more terms ) - 10 log(36)

The likelihood (and thus log(likelihood) ) you calculate for a data set depends on the assumptions you put in. And, you can test how well different sets of assumptions fare against one another by seeing which ones have the largest likelihoods.

What if I told you that one of the dice *might* actually have the numbers “1,1,2,3,4,5” on its six sides instead of “1,2,3,4,5,6”? The above data set is still perfectly possible. You could calculate the likelihood of your data under this new scenario and see if it is higher or lower than the “fair dice” scenario. Here are the probabilities under both scenarios:

```
probabilities:
            fair     a bad
outcome     dice     die
   2        1/36     2/36
   3        2/36     3/36
   4        3/36     4/36
   5        4/36     5/36
   6        5/36     6/36
   7        6/36     6/36
   8        5/36     4/36
   9        4/36     3/36
  10        3/36     2/36
  11        2/36     1/36
  12        1/36     0/36
```
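If you want to check the table rather than take it on faith, it can be rebuilt by listing all 36 face combinations. This is just a sketch in Python; `sum_probabilities` is a helper name I made up, and the "bad" column assumes one fair die plus the 1,1,2,3,4,5 die.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_probabilities(die_a, die_b):
    """Probability of each total when rolling die_a and die_b, as exact fractions."""
    counts = Counter(a + b for a, b in product(die_a, die_b))
    total = len(die_a) * len(die_b)
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

fair_die = [1, 2, 3, 4, 5, 6]
bad_die = [1, 1, 2, 3, 4, 5]  # the suspect die from the post

fair = sum_probabilities(fair_die, fair_die)
bad = sum_probabilities(fair_die, bad_die)

print("outcome   fair    bad")
for outcome in range(2, 13):
    p_fair = fair.get(outcome, Fraction(0))
    p_bad = bad.get(outcome, Fraction(0))
    # Fraction reduces automatically, so rescale to 36ths to match the table.
    print(f"{outcome:>7}  {int(p_fair * 36):>2}/36  {int(p_bad * 36):>2}/36")
```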

So…

Fair scenario:

log(likelihood) = log(5) + log(3) + log(5) + log(6) + log(5) + log(4) + log(5) + log(4) + log(6) + log(4) - 10 log(36)

log(likelihood) = -8.92754126

Unfair scenario:

log(likelihood) = log(4) + log(2) + log(4) + log(6) + log(4) + log(3) + log(4) + log(5) + log(6) + log(3) - 10 log(36)

log(likelihood) = -9.64424003

The one with the bigger likelihood (or log(likelihood)) is the more favored scenario according to the data. In this case, the “fair” scenario is favored (since -8.9 > -9.6). Don’t let the negative numbers throw you off – that’s just the logarithm doing its thing. The logs above are base-10, so to convert back to plain old likelihood:

likelihood = 10^log(likelihood)

fair: 1.18157 x 10^-9

unfair: 2.26861 x 10^-10

The “likelihood” is a very small number since the probability of getting any particular set of ten outcomes is very low. But regardless of the small scale of the numbers, the fair scenario is about 5.2 times more likely than the unfair scenario.
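For anyone who wants to redo the arithmetic, here is a small Python sketch that recomputes both scenarios straight from the die faces. It uses base-10 logarithms, which is what the figure for the fair scenario corresponds to; the helper names are my own.

```python
import math
from collections import Counter
from itertools import product

def sum_probabilities(die_a, die_b):
    """Probability of each total when rolling die_a and die_b."""
    counts = Counter(a + b for a, b in product(die_a, die_b))
    total = len(die_a) * len(die_b)
    return {s: c / total for s, c in counts.items()}

def log10_likelihood(rolls, probs):
    """Sum of base-10 logs of the probability of each observed total."""
    # A single impossible roll makes the whole data set impossible.
    if any(probs.get(r, 0.0) == 0.0 for r in rolls):
        return -math.inf
    return sum(math.log10(probs[r]) for r in rolls)

rolls = [8, 10, 8, 7, 8, 9, 8, 5, 7, 9]
fair_die = [1, 2, 3, 4, 5, 6]
bad_die = [1, 1, 2, 3, 4, 5]

ll_fair = log10_likelihood(rolls, sum_probabilities(fair_die, fair_die))
ll_bad = log10_likelihood(rolls, sum_probabilities(fair_die, bad_die))

# A difference of logs is the log of the ratio of likelihoods.
ratio = 10 ** (ll_fair - ll_bad)

print(f"fair:   log10(likelihood) = {ll_fair:.8f}, likelihood = {10 ** ll_fair:.6g}")
print(f"unfair: log10(likelihood) = {ll_bad:.8f}, likelihood = {10 ** ll_bad:.6g}")
print(f"fair is {ratio:.2f} times more likely than unfair")
```

Note that the comparison never needs the tiny likelihoods themselves: subtracting the two log values is enough to get the ratio.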

How’s this? I don’t know in what context you’re dealing with log likelihood so I’m not sure if it’ll help. I hope it does.

In ‘regular’ (OLS) linear regression models, there are a couple of ways to gauge how well a model predicts values on a dependent variable for a set of cases. One of the most often used indicators is R^2, which is interpreted as the ratio of explained variance to all the variance present in the dependent variable.

There are also other regression models in which the prediction made is not an actual value (the model predicts that I make 80,000 USD annually, for instance) but rather a chance of an event occurring (the model predicts that the chance of me voting for party A is 75 per cent … and thus it predicts that I will vote for party A). In such a regression analysis, known as logistic regression, there is not much of a point in assessing variance because there is no meaningful variance: I either vote for party A or I don’t, I can’t half vote for party A.

As a result, there is no way to calculate R^2 and interpret it meaningfully. Instead, statisticians have generated other ways to assess the extent to which a model (i.e. a set of independent variables hypothesized to affect the dependent variable) succeeds in generating correct predictions. One of these ways is log-likelihood (LL), which includes the predicted probability of an event happening in a certain case and the actual outcome in that case. LL is typically multiplied by -2 to make it possible to interpret it like Chi-Square. Another way is to simply calculate the percentage of correctly predicted cases. Finally, there have been approximations of R^2 (such as Cox and Snell, and Nagelkerke) called pseudo-R^2. These R^2s are based on log-likelihood.
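Here is a rough Python sketch of the quantities mentioned above. The outcomes and predicted probabilities are invented for illustration (in practice the probabilities would come from a fitted logistic regression), and the pseudo-R^2 formulas are the usual textbook definitions of Cox and Snell and Nagelkerke, which compare the model against a "null" model that predicts the overall base rate for everyone.

```python
import math

def log_likelihood(y_true, p_pred):
    """Sum over cases of log(p) if the event happened, log(1 - p) if it did not."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(y_true, p_pred))

# Hypothetical data: 1 = voted for party A, 0 = did not.
y = [1, 0, 1, 1, 0, 1, 0, 1]
# Hypothetical predicted chances of voting for party A (not from a real model).
p_model = [0.75, 0.20, 0.90, 0.60, 0.35, 0.80, 0.55, 0.70]

n = len(y)
base_rate = sum(y) / n

ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, [base_rate] * n)  # null model: same chance for everyone

minus_2ll = -2.0 * ll_model

# Percentage correctly predicted, calling it "votes for A" when p >= 0.5.
pct_correct = sum((p >= 0.5) == (yi == 1) for yi, p in zip(y, p_model)) / n

# Pseudo-R^2 measures built from the two log-likelihoods.
cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

print(f"LL = {ll_model:.4f}, -2LL = {minus_2ll:.4f}")
print(f"correctly predicted: {pct_correct:.1%}")
print(f"Cox and Snell R^2 = {cox_snell:.4f}, Nagelkerke R^2 = {nagelkerke:.4f}")
```

The closer the predicted chances are to what actually happened, the closer LL gets to zero and the smaller -2LL becomes.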

Long story short: log likelihood is a measure used to assess the explanatory strength of a model in logistic regression.