Advice on how to learn statistics

Right now, I’m sitting in a biology lab, poring over a bunch of old data sets. I find myself using a big pile of statistical tests that might as well be black magic to me, because that’s what my boss told me to do. Wilcoxon signed-rank test? Weibull distribution fitting? Sure, if you say so.
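To make one of those tests a little less like black magic, here is a minimal Python sketch of the statistic behind the Wilcoxon signed-rank test for paired data. It assumes the usual textbook conventions: zero differences are dropped, and tied absolute differences get their average rank. The function names are made up for illustration. It stops at the statistic W+. Turning W+ into a p-value needs tables, exact enumeration or a normal approximation, which a package such as R's wilcox.test handles for you.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def signed_rank_statistic(x, y):
    """Return (W+, n): the sum of ranks of positive differences x - y,
    and the number of non-zero differences actually used."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus, len(diffs)


print(signed_rank_statistic([5, 7, 8, 4, 6], [3, 7, 5, 6, 2]))
```

With those toy numbers, one pair is tied and dropped. The two differences of size 2 share rank 1.5, and W+ plus W- always adds up to n(n+1)/2, which is a handy sanity check.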

I have a vague idea of when a t-test is appropriate and what it will tell me, and I’ve taken a class that covered a bit of combinatorics, but that’s the limit of my knowledge.
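For reference, here is roughly what a two-sample t-test computes, as a small Python sketch. It uses Welch's version, which does not assume equal variances. That is a choice made for this example; the classic Student's t-test pools the variances instead. The function name is hypothetical. The p-value would come from a t distribution with df degrees of freedom, which Python's standard library doesn't provide, so that step is left out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


print(welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

The statistic is just "difference in means, measured in standard errors." Most of the rest of the test is bookkeeping about which reference distribution to compare that number against.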

Clearly, I should try to figure out what’s going on. I didn’t take a stats course in undergrad, and now I think I have to fill that gap before I move on to grad school. I’m working full time, which really rules out the possibility of taking a normal class. I’d also like to end up with a fairly deep understanding… someday at least. Any good recommendations on how and where I can start learning statistics? I know that a few big universities have been posting lectures online, but I haven’t found any for stats. Is there a good readable textbook that I can use to teach myself?

The only thing I remember about my stat class in undergrad was that it was incredibly hard: I never got above a 60 on any test and still got an A- thanks to the curve. When I needed to implement the Mann-Whitney U test for a software project, Wikipedia helped somewhat, but I had to search through a bunch of other sites and correspond with a statistics professor to finally get it working just right for every size of data set. It sounds like you’re gonna have to invest in an undergrad stat book and work through the text and the problems if you want broader knowledge.
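The "every size of data set" part is where the Mann-Whitney U test gets fiddly. Here is a minimal Python sketch, assuming the common textbook definitions. U counts the pairs where the x value beats the y value, with ties counting one half. The p-value uses the large-sample normal approximation with no tie or continuity correction. Both function names are hypothetical. For small samples you'd want the exact distribution of U instead, and heavy ties call for a corrected variance. Those are exactly the cases that make a home-grown version hard to get right.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: count pairs (xi, yj) with xi > yj; ties count 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def normal_approx_p(x, y):
    """Two-sided p-value from the large-sample normal approximation
    (no tie correction, no continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n1 * n2 / 2                                  # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # SD of U under H0, no ties
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))           # equals 2 * (1 - Phi(|z|))
```

A quick sanity check is that the U for x and the U for y always add up to len(x) * len(y).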

60% of it is 90% mental.

There’s a Teaching Company course on the topic, but I don’t think it goes into sufficient depth for what you’re interested in.

Currently about 90% of my job duties require statistics. Most of what I know is completely self-taught (I took a course in undergrad, not in grad school). If you have an inclination towards problem-solving and you’re not intimidated by numbers, then you can too.

I don’t have a basic “bible” to recommend, unfortunately. I have some advanced books that I regularly consult, but usually when I’m trying to determine a statistical approach for a specific problem, I go to journal articles and look at what others have done. I think that’s what most biologists do. My recommendation for any budding scientist: immerse yourself in the literature. Learn how people design experiments, become familiar with the statistical tests they use, and then learn how to interpret their results. Whenever you come across a term you don’t know, even if it’s as basic as “normal distribution”, look it up. Eventually you’ll develop a knowledge base that will make the whole thing less intimidating.

I don’t know what type of biologist you want to be, but if you’re planning on going into ecology, I’d highly recommend taking a stats class. Of all the biological sciences, ecology is the most challenging when it comes to statistical analysis. I didn’t take one in grad school and I regret it. If you can take a good stats class and simultaneously learn a stats package like SAS or R (which is open source, and therefore free!), you can make yourself a very marketable biologist.

Poke around Amazon for books with titles like “Biostatistics” or “Statistics for Biologists” or the like, and ask the higher-ups in your lab for recommendations. If you want to go the more theoretical route, which is a reasonable thing to do, Wackerly et al. is a good undergraduate level textbook, but you’ll need to know some calculus and linear algebra to really get anywhere with it.

ETA: This book looks good, and it’s relatively cheap. You should check it out.

My undergrad degree is in applied math, including stats.

A couple of years ago, I had the enlightening experience of assisting with the stats part of my gf’s social science PhD. It was part of a much larger, ongoing study, so much of the stats work was pre-ordained, and not very deep or insightful IMHO.

I read a lot of papers in a lot of fields. There is no end to statistical favorites in every field or sub-field, and much of the time there is no reason for them other than “We’ve always done it that way”.

If you have a field in mind, I would suggest asking the professors in the department, or reviewing the literature in the field to see what might be expected of you. Unless you are going to be good enough in your field to think outside the box and get away with it :), you run a high risk of being ignored or shunned, because your readership (small by nature) does not understand the math either, no matter their background.

As a student, you can get darn good prices on good software. JMP comes to mind; it turned out to be pretty good for descriptive stats, and maybe more in some areas. We used SAS for the most part, provided by her department.

For my own work, I am partial to R. SAS and SPSS are throwbacks to the late 70s/early 80s when each was new. I remember doing SAS and SPSS on punchcards in college. Use any other software that old do you?

You can find tutorials for almost any technique in any of the major stats packages, and many minor ones, via Google with no trouble: both how to use the package for input/output, etc., and how to do various statistical techniques.

Edit: there are some damn fine statistics courses online now. Pretty much all colleges, especially the great ones, have syllabi and course materials available for free. Google MIT OpenCourseWare and see what they and their partner schools have available for you…

The Cartoon Guide to Statistics by Larry Gonick is a very good introductory book, and a relatively easy read (as you might guess). You will want to follow up with a more substantial book, but it will get you started in a non-intimidating fashion. The Cartoon Guide is often used in graduate programs as a precursor to a student’s first statistics course, so it should be a good fit for your needs.

I find that this resource is usually very helpful, although the different chapters vary in quality. Also, although I’m aware of SPSS’s shortcomings, I do find that occasionally its documentation can be pretty enlightening. I’m not familiar with any other statistics packages, so I couldn’t say whether their documentation is any good.

Here’s another vote for R. It’s free, which is a nice bonus, and if you look through the CRAN website, there’s almost certainly someone who has put together a decent package to help you carry out the analysis you want, though the quality of the documentation varies. I usually manage my data in Excel (which is much easier to use and less buggy than SPSS or any other software I have experience with, although JMP has some nice features too), and once it’s roughly formatted the way I want, I run it through R for whatever analysis I’m looking for.

I’ve never been capable of using SPSS for more than a half an hour straight without wanting to punch everyone that works for that company in the kidney.

Thanks for the suggestions. So far, it looks like the MIT open course is the best way to start… a bit more structured, at least, than grabbing a random textbook and diving in.

Any other suggestions on good readable textbooks? Ideally, I’d like a book that explains the derivations of various tests and distributions, rather than simply telling me to use some formula in some particular situation.

ETA: I’m using JMP at the moment, because that’s what the department has, and it has nice automated tools that fit some of our data perfectly.

On further investigation of recommended books…

I have to say that I bristle at the title “Statistics for Terrified Biologists”. Dammit, I’m not afraid of math! I took up to diff eq because I wanted to, and terrified a few classmates when I tried to bring it into my bio classes. I missed out on stats in undergrad mostly because the class had a reputation for being rather easy and boring.

Grr. Biologists can do math! At least, some of us…

I don’t remember MIT having any statistics classes on OCW, or at least not any that cover applications. If you want a deep understanding someday, you’ll need the theory, but I don’t know that it’s the best place to start.

If you want a deep understanding of statistics, you will need math beyond the calculus sequence and differential equations. At a minimum, you’ll need multivariable calculus, linear algebra and probability. It’d be very nice if you had a background in real analysis, and being familiar with complex analysis and abstract algebra can’t hurt. If you don’t have any experience with proof-based math classes, you will have trouble picking those up on your own.

Do you have a graduate school picked out yet, or is that more of a future plan? If the latter, you might consider looking for schools that have programs in statistics or biostatistics and planning on picking up a master’s degree in one of those fields. That’s more or less the royal road to the level of understanding you want, and having the credentials certainly can’t hurt you.

The textbook by Wackerly et al. that I mentioned earlier is a good place to start for probability and basic mathematical statistics. Additionally, while JMP is a very nice package, you should become familiar with R.