I was good at mathematics in school. Basic calculations have always come easily to me, and in high school I had a pretty firm grasp of more complicated geometry, calculus, etc. But I’m a historian now, and I haven’t done any formal math study or advanced calculations since I left school.
Since arriving in the US, I’ve become a big baseball fan, and I’ve also become interested in the statistics of the game. Not just ERA, Batting Average, and Slugging Percentage, but the more esoteric stuff done by the sabermetrics crowd on websites like Baseball Prospectus. But I’ve never actually done any of the math myself; I just read what other folks do and absorb their (usually very good) explanations of what their calculations can tell us. I have no idea how to do a regression analysis, and while I understand why standard deviation is important, I’m still not sure how to read an SD and judge significance from it.
Anyway, I’ve decided to try to get a little better at some of this stuff, so I’ve got a copy of Statistics for Dummies and a few websites, and I’m learning.
The thing I’m working on now, inspired by a post on a baseball blog I read, is trying to see whether there’s any correlation between a team’s September performance and its performance in the playoffs. I have gathered the September winning percentage and the playoff winning percentage for every playoff team since 1976. That gives me 30 seasons of data (1994 is missing due to the strike) and a total of 172 playoff teams (8 teams per year in the Wild Card era; 4 teams per year before that, except for the 1981 split season, which had 8).
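The team count can be checked with a quick tally. This is a minimal sketch that assumes the 30 seasons run from 1976 through 2006 (31 calendar years minus 1994) and that the only pre-1995 exception to the four-team format is 1981, when the strike-split season sent 8 teams to the playoffs; the function name is made up for illustration.

```python
# Hypothetical tally: assumes the 30 seasons are 1976 through 2006, with the
# strike-cancelled 1994 postseason excluded.
def playoff_teams(year):
    """Number of playoff teams in a season (assumed format rules)."""
    if year >= 1995:   # wild card era: 8 teams
        return 8
    if year == 1981:   # strike-split season: 8 teams
        return 8
    return 4           # two division winners per league

seasons = [y for y in range(1976, 2007) if y != 1994]
total = sum(playoff_teams(y) for y in seasons)
print(len(seasons), total)  # 30 172
```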
The thing is, though, I’m still not clear on when different types of correlation are appropriate. To measure the correlation between these two stats, I used the Pearson product-moment correlation. Based on the reading I’ve done, it seems appropriate, but I’m not completely confident. Is this an appropriate measure to use in this case? If not, why not? And what should I use instead?
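For reference, Pearson’s r is the covariance of two variables divided by the product of their standard deviations, so it measures how well a straight line describes the relationship between two numeric measures such as September and playoff winning percentage. Here is a minimal from-scratch sketch in Python; the five data points are made up for illustration and are not from the real data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy numbers, NOT the real 1976-2006 data.
sept_pct = [0.600, 0.550, 0.640, 0.520, 0.580]
playoff_pct = [0.500, 0.636, 0.400, 0.571, 0.545]
print(round(pearson_r(sept_pct, playoff_pct), 3))
```

On Python 3.10 or later, statistics.correlation(sept_pct, playoff_pct) from the standard library computes the same Pearson coefficient.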
So far, the correlation I’ve derived is between September Win % and Playoff Win %. That seems fairly straightforward, because I’m comparing similar numbers (both are winning percentages). The Pearson correlation I’ve arrived at for these numbers is 0.018, which is basically no correlation, positive or negative.
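One standard way to judge significance for a correlation like this is the t-test for Pearson’s r, which has n − 2 degrees of freedom. Plugging in r = 0.018 and n = 172 gives the worked check below. It assumes the 172 team-seasons can be treated as independent observations, which is only approximately true (the same franchise appears in several seasons, and within a season one playoff team’s wins are another team’s losses).

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.018\sqrt{170}}{\sqrt{1-0.018^{2}}}
  \approx \frac{0.018 \times 13.04}{0.9998}
  \approx 0.235,
\qquad r^{2} = 0.018^{2} \approx 0.0003
```

With 170 degrees of freedom, |t| would need to exceed roughly 1.97 to be significant at the usual two-sided 5% level, so 0.235 is nowhere close. The r² of about 0.0003 means September winning percentage would account for only about 0.03% of the variation in playoff winning percentage in this sample.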
I also have another question, about determining the correlation between September wins and playoff success, where “success” is defined as winning the World Series. How would I do this? Can I assign “success” a binary value, where Winning the World Series = 1 and Not Winning the World Series = 0? This one has me confused.
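A 1/0 coding like this is a standard approach: Pearson’s r computed between a continuous variable and a 0/1 variable is known as the point-biserial correlation. The sketch below uses made-up toy data and computes it two ways, directly with the Pearson formula and from the difference between the winners’ and non-winners’ mean September winning percentage, to show that they agree. The function names are hypothetical. Logistic regression is another common way to model a yes/no outcome like this; it is not shown here.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(values, won):
    """Point-biserial r from group means; `won` holds 0/1 flags."""
    group1 = [v for v, w in zip(values, won) if w == 1]
    group0 = [v for v, w in zip(values, won) if w == 0]
    n = len(values)
    s_n = statistics.pstdev(values)  # population SD (divides by n)
    p = len(group1) / n              # share of 1s
    q = len(group0) / n              # share of 0s
    diff = statistics.mean(group1) - statistics.mean(group0)
    return diff / s_n * math.sqrt(p * q)

# Hypothetical toy data, not real seasons: September win % and a
# 1/0 flag for "won the World Series".
sept_pct = [0.600, 0.550, 0.640, 0.520, 0.580, 0.610, 0.570, 0.530]
won_ws = [1, 0, 0, 0, 1, 0, 0, 0]

r_direct = pearson_r(sept_pct, won_ws)
r_groups = point_biserial(sept_pct, won_ws)
print(round(r_direct, 3), round(r_groups, 3))  # should print the same value twice
```

The group-means version makes the interpretation concrete: the coefficient is positive when World Series winners had higher September winning percentages on average than the other playoff teams, and negative when they had lower ones.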
Any help you can give my overtaxed stats brain would be appreciated.