Statistical validity of soccer scoring

This is my first post to the message board, and I have searched the archives.

Basically, what I’m looking for here is someone to provide a sound statistical argument that the scoring in soccer is almost useless in determining which team is better. I don’t care whether soccer is a good sport or not, and I don’t care whether it’s fun or not fun to watch because of the low scores. I’m only interested in whether the scores can be a meaningful indicator of which team is better.

I really know very little about soccer, but my thinking goes basically like this: each scoring drive in soccer has a small chance of success. In order for a team’s actual score to represent their ability, there has to be a large number of scoring drives, but in soccer there usually isn’t a large number of scoring drives. And this is exacerbated if you further expect the score to indicate the better team. Now, in addition to needing a large number of drives to even accurately represent a team’s ability, what you really need is enough drives to reflect one team’s slight edge over the other, which dramatically drives up the number of scoring drives needed for an accurate result.

Suppose we had a special coin that would only show heads in 10% of tosses and another that would show heads in 20% of tosses. Believe me, I’m no stat major and I labored over this quite a bit, but my results show that if you flipped each coin 5 times, the 20% coin would have only about a 49% chance of yielding a higher number of heads than the 10% coin. This is to say that the 20% coin would be more likely to lose or tie than to win the game of five tosses.
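For anyone who wants to sanity-check the 49% figure without trusting anyone's algebra, here is a minimal simulation sketch in Python. It assumes every toss is independent and that a "game" is simply five tosses per coin; the function names, trial count, and seed are illustrative choices, not anything from the post.

```python
import random

def heads(p, tosses, rng):
    """Count heads in `tosses` independent flips of a coin with P(heads) = p."""
    return sum(rng.random() < p for _ in range(tosses))

def simulate(trials=100_000, tosses=5, p_weak=0.10, p_strong=0.20, seed=1):
    """Estimate (win, tie, loss) frequencies for the 20% coin by simulation."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    win = tie = 0
    for _ in range(trials):
        weak = heads(p_weak, tosses, rng)
        strong = heads(p_strong, tosses, rng)
        if strong > weak:
            win += 1
        elif strong == weak:
            tie += 1
    loss = trials - win - tie
    return win / trials, tie / trials, loss / trials

win, tie, loss = simulate()
print(f"20% coin ahead: {win:.3f}  tied: {tie:.3f}  behind: {loss:.3f}")
```

With enough trials the first number should settle near 0.49, in line with the estimate above, with ties making up most of the remainder.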

Now if the two coins were instead two soccer teams, and each team gets to make 5 scoring drives during the game, then we essentially have the coin toss scenario above. We can say that the team that is twice as good as the other team is still not even likely to win. A tie or a loss is actually more likely for the vastly better team than a win.

Besides the basic soundness of my coin toss probability math (and I implore people who don’t have a solid background in math to resist the urge to throw up some simple calculations, because probabilities are beguilingly complicated), we’d need to define “scoring drive” and address what the real-world average number of “scoring drives” is in a soccer game, and what the real-world average probabilities are to score before we could really work this out. But all this should be more-or-less available in the annals of soccer stats if they’re anything like baseball or basketball stats.

My gut feel is that my assertion will be borne out and here’s why: soccer is the only international sport. If soccer’s scoring more accurately represented the better team, there wouldn’t be so many competitive countries. There would be a handful of “superpower” teams and everyone else wouldn’t have a chance, ever. But in soccer it seems like any country can have a shot at the big dance. Why? Because it’s pretty much a lottery or a crapshoot and the scoring guarantees that. A team has to be enormously better than another team to have a reasonably sure shot of winning.

If soccer’s scoring more accurately reflected the better team, we’d see scores more like 15-2 instead of this perpetual (and highly suspicious) 0-1. The little countries wouldn’t be able to compete and we’d see a lot less soccer rioting. I mean, who wants to riot when your team loses by 10 or 20 points instead of 1 point?

So, come on stat-boys-and-girls. Enlighten us!

By definition scoring in soccer is how to determine which team is better. If you want 15 to 4 scores you need a different sport.

I would tend to think the percentage is much lower than you imply, and that the spread between teams is much higher. For instance, Canada may score .25% of the time and Brazil may be 2%. Over a game (which has much more than 5 scoring chances BTW), it is very likely that Canada will not score. It is likely that Brazil will. Thus 1-0.

I think you’re confusing the excitement of a cup competition like the latter stages of the World Cup (a one-match knockout tournament) with a league season lasting 8 plus months.

Cups are about the romance, the excitement, the uncertainty of Davids and Goliaths battling for 90 minutes (or however long it takes). That has little in common with a regular season.

kdonn, I know nothing about soccer but I think your premise is sound – and nicely articulated in your somewhat longish post.

Welcome to the SDMB, BTW.

I don’t know if this helps or not, but one of the most hilarious, exasperating matches I ever saw was France v. Paraguay in 1998. France was on its way to winning the Cup that year, but Paraguay had an ace in the hole–a penalty-shot kicker who rarely missed.

So Paraguay tried like hell to ride out the entire game on defense in order to cash in on the assured one goal in the penalty kick-off. Any time France looked like they had a break, someone would fake an injury or kick the ball out of play. They lasted through all of regulation play and far into the sudden death period, but France finally stuck one in to win it with a few minutes of play remaining.

I mention this because it seems to slightly contradict your theory, kdonn. Paraguay knew that France was the better team, and attempted to compensate by keeping most of their players on the defensive, with the knowledge that the advantage would shift to them once normal play ended. In that single example, Paraguay still lost, 0-1.

But the score belies the complex and insanely boring strategy which was employed to defeat those odds, and which very nearly succeeded. The coins you’re tossing are thinking very hard about which way they want to land.

The playing-for-penalties tactic is certainly not unknown. Red Star Belgrade’s awful defensive performance against Olympique Marseille in the 1991 European Cup was purely designed to take the game to penalties. Argentina’s tactics in the 1990 World Cup final against West Germany were purely designed to bring the game to penalties, where Argentina’s penalty-king 'keeper Sergio Goycochea could do his thing.

The thing is that THERE ARE “superpower teams” and not all the teams are on the same level. The teams in the World Cup are not all superpowers, but they do include the superpowers of soccer/football. Can anyone have a shot? No. Look at China, Tunisia, Saudi Arabia, etc. Remember the game between Australia and American Samoa (31-0)? The score can reflect the level of play between two countries or teams. Look at the score of Germany vs. Saudi Arabia (8-0) and compare it to Germany vs. Ireland (1-1). Of course, a score such as the former can tell you that Saudi Arabia is a very weak team whereas Ireland is not. Now, there are more stats you have to consider in order to appreciate the level of play, and they are: How did the scoring progress (first half vs. second half)? For example, Senegal vs. Denmark 1:1 (1:0). What was the % of possession? Shots on goal? Corner kicks? % of possession in the attacking half? etc.

Well, 8-0s do occur; ask Saudi Arabia (I am not sure a riot ensued :wink:). The reason we don’t see more 15-2 scores is that more countries are reaching a good level of play relative to the traditional superpowers.

I guess what I am trying to say is that most sports have a certain amount of luck involved, but it is the good teams/athletes that win by skill.

XicanoreX

Your analysis is flawed. Scoring in soccer is not a random event like a coin toss, and the probabilities are meaningless, since they only represent the results of the past, but not the future.

For example, teams don’t have the same number of scoring drives in a game, as in your example. The better team will have more in most games. With more drives, their chances of scoring are better.

And nothing in sport is mathematically certain. You can run simulations in any sport for years and never come up with the same results as real life. I have, for instance, baseball simulation software on my computer. I can run a simulation of the 6th game of the 1986 World Series thousands of times, and it’s hardly likely any of the results will ever match the actual game.

When dealing with sports, the math describes the past, but cannot predict the future. A team may usually only convert on .05% of its goal attempts; but in any one game, it will convert on, say, 25%. The probability over the course of a season or tournament is meaningless when applied to any one game.

Finally, all sports have their upsets. While the better team usually wins, there are always times when they’re off their game, or run into a streak of bad luck (their best player breaking a leg, for instance, or a ball making a bad bounce).

His claims do have validity though. What he is essentially saying is that the statistical noise is drowning out the signal when it comes to scoring. There is a statistically significant chance that a not-so-good team will get a goal while a very-good team will not, thus giving the game to the not-so-good team.

The World Cup, unlike nearly every other major sporting event in the world, just doesn’t have surprise winners. That is, surprise winners of the championship.

There are surprises in the early rounds and this year’s tournament has four teams in the quarterfinals that have never made it that far before.

Still there have only been seven nations that have won the Cup. And six times the home nation has won (Uruguay 1930, Italy 1934, England 1966, Germany 1974, Argentina 1978, France 1998).

Also in the top leagues in the world, only a few teams win the league championships. Very rarely does a team come from the bottom half of the standings in one season to win the next year.

Another reason why you don’t see huge margins in soccer games is that there is often little reason for a team that is ahead by a comfortable margin, say 3-0, to try to score more. It’s safer to defend.

Sometimes teams may need to add to their goal difference in which case they will push ahead to score more.

Yep, I’m mostly with you on this one, kdonn. I’m not sure your whole analysis is perfect, but the underlying point, namely that in a single game of soccer there would seem to be far too much of a chance that the better team will not win, is one that I have often wondered about.

I’d really like to see someone who is confident on statistics do an analysis of the situation.

However, I have the following comments. Firstly, I think you can throw out the whole concept of “scoring drives” and just do your analysis on the simple fact of how often a team scores a goal in a game.

Secondly, as others have said, your analysis is particularly relevant in competitions where there are few games and a large element of knockout, because then, by “fluking” a small number of wins, a poorer team can advance significantly. But in most of the big competitions (the English competitions, for example) the season is endless and everyone plays everyone else numerous times, so there is in fact a chance for the laws of chance to make the most of small advantages. And sure enough, in those competitions a handful of clearly better teams do stand out over the course of a season.

Thirdly, one factor that is not accounted for in your statistics is the change in tactics that results from scoring. Soccer is a game in which it is much easier to close down one’s defence than it is to successfully attack. Therefore, once a team is in the lead, they will often choose to defend, and not try to score. The result is that scorelines often look very close (0-1, 2-1 etc) but are not necessarily reflective of the winner’s real ability advantage: the winner may well have deliberately only attacked hard until they had a one goal advantage.

Welcome to the board, kdonn and I hope your future queries are all as interesting and thoughtful as this one.

Oh, and RealityChuck I think you are seriously missing the point here. I think you would agree that (indeed it seems to be the thrust of your post that) a team may well score more or less in a given game than a past average would predict, right?

The thrust of kdonn’s OP is precisely that in soccer, because scoring rates are so low, there is a high chance that a team will win (or lose) in a given game despite what long term past averages would predict. So you are actually in agreement with him.

And further, RealityChuck, your understanding of statistics, and your charming belief that they do nothing to predict the future but are relevant only to the past surely warms the hearts of all casino owners everywhere.

Crap, my last post got lost. I’ll try to summarize my arguments:

I understand your points, but I’m not sure they’re terribly valid. Soccer, like most sports, has statistical variance. Sometimes favorites lose. Compared to a game like baseball, though, I would argue that soccer has fewer “statistical anomalies.”

If you pick a baseball game at random, the result from ONE game is rarely a good indicator of who the better team is. Bad teams win a lot of games in baseball. I’d say the worst team in baseball has a better chance of beating the best team on any given day than the same situation in soccer. (Save for the amazing Senegal win.)

If baseball had the same format as the World Cup, then I can easily envisage the Cubs taking home the title. Or the Red Sox. Or anyone. If soccer had a best-of-5 or best-of-7 structure, then I’d doubt you’d see Senegal taking the series from France. (But you never know.)
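The best-of-5 or best-of-7 idea can be put into rough numbers. The sketch below is only an illustration under loudly stated assumptions: the per-game win/tie/loss probabilities are assumed inputs (roughly what the OP's 10%-vs-20% five-toss coin model gives the stronger side, not real soccer data), games are treated as independent, and "taking the series" is taken to mean finishing with more wins than losses. The function name `series_edge` is hypothetical.

```python
def series_edge(games, win, tie, loss):
    """Probability the stronger side finishes `games` independent meetings
    with more wins than losses, given its per-game probabilities."""
    # dist maps (wins minus losses so far) -> probability of that margin
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for diff, prob in dist.items():
            for step, q in ((1, win), (0, tie), (-1, loss)):
                nxt[diff + step] = nxt.get(diff + step, 0.0) + prob * q
        dist = nxt
    return sum(prob for diff, prob in dist.items() if diff > 0)

# ASSUMED per-game probabilities for the stronger side (coin-model values)
WIN, TIE, LOSS = 0.4875, 0.3432, 0.1693

for games in (1, 5, 7):
    print(games, round(series_edge(games, WIN, TIE, LOSS), 3))
```

Under these assumptions a single game is close to a coin flip for the stronger side, while a longer series tilts noticeably its way, which is the intuition behind doubting a series upset.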

As has been explained, there ARE powerhouses in international football. Brazil, France, Germany, Argentina, Spain, to name a few. A couple of them got knocked out (yes, it was surprising,) but you still have most of the usual suspects going through to the quarters. Brazil, Spain, England and perhaps Germany. (OK, England may not be traditionally a World Cup powerhouse, but they play at a fine level.)

It’s not neat or statistically accurate, but you can get a ballpark (no pun intended) idea of statistical variance by looking at betting lines. There are clearer favorites in soccer than in baseball. You will generally make more money on an underdog winning in soccer than in baseball because it happens less often. Usually, there’s more risk (hence higher reward) in betting Senegal over France than, say, the Brewers over the Red Sox. Hell, look at last year. Statistically the Mariners were the best team, with a significantly higher winning percentage than any other team in baseball. Nobody else was over .600. The Mariners were .716!!!
But no title to them.

Having a look at the betting merely tells you what the betting public thinks a team’s chances of winning are. It is in no way an indication of the actual probability. The way bets work is that they are structured so that no matter who wins, the guy who takes the bet makes money.

Yes, Shalmanese, I know. That’s why I qualified my statement saying it’s not a statistically accurate way to look at the question, but, in my opinion, a decent general assumption. I would reckon that it IS indeed a good general indicator of actual probability. Sports that are known for having unpredictable results in the long run tend to have closer betting lines than sports where favorites tend to win significantly more often than they lose.

Also, not all odds are structured with betting balanced on both sides. Read this article.

Vegas oddsmakers are probably one of the better ways to figure out who is statistically the better team.

I don’t know where you get 5 scoring drives from. As much as you can break down a “continuous” sport like soccer, I would say there are a few dozen drives a game.
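To see what a few dozen drives would do to the OP's coin model, here is a rough sketch. It keeps the OP's 10% and 20% per-drive scoring chances (real per-drive rates are not given anywhere in this thread) and assumes both teams get the same number of independent drives; the drive counts tried are arbitrary.

```python
from math import comb

def pmf(n, p):
    """Binomial distribution: probability of k goals from n drives."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def strong_wins(drives, p_weak=0.10, p_strong=0.20):
    """Probability the stronger team scores strictly more goals."""
    weak, strong = pmf(drives, p_weak), pmf(drives, p_strong)
    below = 0.0  # running P(weak team scored fewer than s goals)
    total = 0.0
    for s in range(drives + 1):
        total += strong[s] * below
        below += weak[s]
    return total

for drives in (5, 12, 24, 36):
    print(drives, round(strong_wins(drives), 3))
```

In this toy model the stronger side's chance of an outright win climbs as the number of drives grows, so the five-drive figure understates things if a game really has a few dozen.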

There is some soundness to your basic premise; I’ve thought of it myself. If the bad team gets lucky and scores in the first 10 minutes, it is comparatively easy for them to pack the defense and play for a 1-0 win.

Note that all your criticisms apply almost as well, maybe even better, to American football, where there is a distinct, small number of scoring chances. In football, the better team usually wins, but there are plenty of upsets. Were my beloved Patriots really the best team, or did we just get lucky? No one can say for sure; that’s part of the fun of sports.

Here’s the math my friend the high power statistical consultant came up with (took him all of 2 minutes, the jerk):

I got a probability of 0.83067 (83%) – subject to typos in Excel, but it seems right.

Basically, I did it by brute force. That is, I figured out the probability distribution of getting 0 through 5 heads for coin A, and then the same for B:

Heads   Coin A    Coin B
5       0.00001   0.00032
4       0.00045   0.00640
3       0.00810   0.05120
2       0.07290   0.20480
1       0.32805   0.40960
0       0.59049   0.32768

Then, since the results for Coin A and Coin B are independent events, you can multiply probabilities to find out the probability of various combinations.

e.g. P(5 heads for B, 5 heads for A) = 0.00001*0.00032.

Then add up all of the probabilities for the events in which B has as many or more heads than A!

So even in your example with only 5 scoring chances, the better team wins or ties 83% of the time.
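The spreadsheet arithmetic above is easy to redo in a few lines. This sketch rebuilds the two distributions from the binomial formula (taking Coin A as the 10% coin and Coin B as the 20% coin, which is what the table values imply) and splits the total into outright wins and ties for Coin B; the variable names are illustrative.

```python
from math import comb

TOSSES = 5

# Probability of k heads in five tosses, for k = 0..5
coin_a = [comb(TOSSES, k) * 0.1**k * 0.9**(TOSSES - k) for k in range(TOSSES + 1)]
coin_b = [comb(TOSSES, k) * 0.2**k * 0.8**(TOSSES - k) for k in range(TOSSES + 1)]

# The coins are independent, so joint probabilities are simple products
b_ahead = sum(coin_b[b] * coin_a[a] for b in range(TOSSES + 1) for a in range(b))
tied = sum(coin_a[k] * coin_b[k] for k in range(TOSSES + 1))
a_ahead = 1 - b_ahead - tied

print(f"B ahead: {b_ahead:.5f}")
print(f"tied:    {tied:.5f}")
print(f"A ahead: {a_ahead:.5f}")
print(f"B ahead or tied: {b_ahead + tied:.5f}")
```

"B ahead or tied" comes to about 0.83067, matching the consultant's figure, while "B ahead" on its own is about 0.487, which is the OP's 49%. The two numbers answer slightly different questions: the 83% counts a tie as a success for the better coin.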