The Straight Dope

#1
12-02-2009, 03:12 PM
mischievous
Charter Member
 
Join Date: Mar 2001
Posts: 1,162
Fairly basic statistics question.

Okay, so as a professional biologist it pains me to admit this, but I know little to nothing about statistics. I can do a chi-squared analysis on Mendelian ratios and a Mann-Whitney U test for a single parameter, but that's about it.

So I currently have a data set with two parameters (age of the mouse embryo and length of a particular tissue) for three different mutant genotypes. When I do a scatter plot of the data, it looks like this. The different colors represent different genotypes. (I had Excel add "trend lines", whatever the hell those are.)

So, to my eye, there is no difference between the different genotypes (colors), but I need someone to point me towards the right test to use to actually get a p-value out of that subjective judgment. Anyone want to lend a hand?

Thanks,
mischievous
#2
12-02-2009, 03:30 PM
cjepson
Guest
 
Join Date: Oct 2007
I assume you're trying to determine whether the association between age and tissue length differs across the three genotypes. If that's correct, you want a regression model (probably a linear regression) in which the dependent variable (the thing you are trying to predict) is presumably tissue length, and the predictor variables are age, genotype, and the age-by-genotype interaction. (The interaction is the effect you are interested in.) The one slightly tricky part is that genotype has to be represented by a set of two indicator, or "dummy", variables, which means the interaction will also be represented by two variables (age multiplied by each dummy). It might be better to find someone who can walk you through that in person. (You could avoid that problem by running three separate models, each comparing one pair of genotypes: A vs. B, B vs. C, and A vs. C. That's not the standard way of doing it, though, because it raises the issue of what is called "multiple comparisons". When you do that, you generally correct for it by using a more stringent significance criterion than you otherwise would.)
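The dummy-variable setup described above can be sketched in Python with NumPy. This is a minimal sketch, not any particular package's method: the genotype labels "A", "B" and "C", the noise-free measurements, and the helper names `design_matrix` and `fit_ols` are all hypothetical. Genotype "A" is treated as the reference, so it gets no dummy; "B" and "C" each get one 0/1 dummy plus one age-by-dummy interaction column, and the model is fit by ordinary least squares.

```python
import numpy as np


def design_matrix(age, genotype, levels):
    """Build a regression design matrix with reference (dummy) coding.

    levels[0] is the reference genotype and gets no column of its own.
    Columns: intercept, age, one 0/1 dummy per other genotype, then one
    age * dummy interaction per other genotype.
    """
    age = np.asarray(age, dtype=float)
    genotype = np.asarray(genotype)
    dummies = {g: (genotype == g).astype(float) for g in levels[1:]}
    cols = [np.ones_like(age), age]
    names = ["intercept", "age"]
    for g, d in dummies.items():
        cols.append(d)
        names.append(f"genotype[{g}]")
    for g, d in dummies.items():
        cols.append(age * d)
        names.append(f"age:genotype[{g}]")
    return np.column_stack(cols), names


def fit_ols(X, y):
    """Ordinary least-squares coefficients (no standard errors or p-values)."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Hypothetical, noise-free data: three genotypes measured at ages 10-14.
true_lines = {"A": (1.0, 2.0), "B": (1.5, 2.0), "C": (1.0, 2.5)}  # (intercept, slope)
age = [10, 11, 12, 13, 14] * 3
genotype = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
length = [true_lines[g][0] + true_lines[g][1] * a for a, g in zip(age, genotype)]

X, names = design_matrix(age, genotype, levels=["A", "B", "C"])
coef = fit_ols(X, length)
for name, c in zip(names, coef):
    print(f"{name:>18s} {c: .3f}")
```

With these made-up lines the fit recovers intercept 1, age slope 2, a genotype[B] offset of 0.5 and an age:genotype[C] slope difference of 0.5, with the other two terms at zero. For real data you would want standard errors and p-values from a stats package (for example, the statsmodels formula `length ~ age * C(genotype)`) rather than this bare fit.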

Hope this helps...
#3
12-02-2009, 03:42 PM
mischievous
Charter Member
 
Join Date: Mar 2001
Posts: 1,162
Actually, the thing I'm trying to establish is whether genotype makes a difference, i.e. whether the length of the tissue is longer or grows faster in some mutants than in others.

I think (if I understand it correctly) that linear regression will tell me how fast the tissue is growing in each genotype (i.e. the slope of the line), but will not tell me if the lines for each genotype are the same. Is that correct?
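What is described here, one regression line per genotype, can be sketched with NumPy's `np.polyfit` (with degree 1 it returns the slope first, then the intercept). The data and the helper name `per_genotype_lines` are hypothetical. The sketch gives three slopes and three intercepts but, as the question suspects, no test of whether they differ.

```python
import numpy as np


def per_genotype_lines(age, length, genotype):
    """Fit a separate straight line to each genotype.

    Returns {genotype: (intercept, slope)}. This only describes each group;
    it does not test whether the lines differ from one another.
    """
    age = np.asarray(age, dtype=float)
    length = np.asarray(length, dtype=float)
    genotype = np.asarray(genotype)
    lines = {}
    for g in np.unique(genotype):
        mask = genotype == g
        # polyfit with degree 1 returns coefficients highest power first.
        slope, intercept = np.polyfit(age[mask], length[mask], 1)
        lines[str(g)] = (float(intercept), float(slope))
    return lines


# Hypothetical, noise-free data for three genotypes.
true_lines = {"A": (1.0, 2.0), "B": (1.5, 2.0), "C": (1.0, 2.5)}  # (intercept, slope)
age = [10, 11, 12, 13, 14] * 3
genotype = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
length = [true_lines[g][0] + true_lines[g][1] * a for a, g in zip(age, genotype)]

for g, (b0, b1) in per_genotype_lines(age, length, genotype).items():
    print(f"{g}: length = {b0:.2f} + {b1:.2f} * age")
```

Eyeballing these three slopes is the judgment that the interaction terms in cjepson's combined model turn into a formal test: in that model, each non-reference genotype's slope equals the age coefficient plus its interaction coefficient.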
#4
12-02-2009, 03:57 PM
cjepson
Guest

Join Date: Oct 2007
Quote:
Originally Posted by mischievous
Actually, the thing I'm trying to establish is whether genotype makes a difference, i.e. whether the length of the tissue is longer or grows faster in some mutants than in others.

I think (if I understand it correctly) that linear regression will tell me how fast the tissue is growing in each genotype (i.e. the slope of the line), but will not tell me if the lines for each genotype are the same. Is that correct?
If you fit the linear regression model I outlined (i.e., with the genotype-by-age interaction included, as well as the main effects of genotype and age), then the main effect of genotype will tell you whether tissue length is greater for one genotype than another, and the interaction effect will tell you whether the speed of growth (i.e., the slope relating tissue length to age) is greater for one genotype than another. One caveat: with the interaction in the model, the genotype main effect is the difference in length at age zero, so center age at its mean before fitting if you want it to mean the difference at an average age.

Last edited by cjepson; 12-02-2009 at 03:58 PM.
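One way to get a single p-value for "does genotype make a difference at all" out of this model is a partial F-test comparing it with a model that has age alone. The full model adds two genotype dummies and two interaction terms, so the test has 4 numerator degrees of freedom:

F = [(RSS_reduced - RSS_full) / (p_full - p_reduced)] / [RSS_full / (n - p_full)]

The sketch below is NumPy-only and uses simulated data; the helper names `rss` and `genotype_f_test` are hypothetical. The p-value is the upper tail of an F distribution with (df1, df2) degrees of freedom, which SciPy provides as `scipy.stats.f.sf(F, df1, df2)`; it is left out here to keep the sketch dependency-light.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def genotype_f_test(age, length, genotype, levels):
    """Partial F-test: age-only model vs. age + genotype + age:genotype.

    levels[0] is the reference genotype. Returns (F, df1, df2).
    """
    age = np.asarray(age, dtype=float)
    y = np.asarray(length, dtype=float)
    genotype = np.asarray(genotype)
    dummies = [(genotype == g).astype(float) for g in levels[1:]]
    X_reduced = np.column_stack([np.ones_like(age), age])
    X_full = np.column_stack([X_reduced] + dummies + [age * d for d in dummies])
    n, p_full = X_full.shape
    df1 = p_full - X_reduced.shape[1]  # extra parameters in the full model
    df2 = n - p_full                   # residual degrees of freedom
    rss_reduced, rss_full = rss(X_reduced, y), rss(X_full, y)
    F = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return F, df1, df2


# Simulated data: 5 ages x 3 genotypes, small measurement noise.
rng = np.random.default_rng(0)
age = np.tile(np.arange(10, 15), 3).astype(float)
genotype = np.repeat(["A", "B", "C"], 5)
noise = rng.normal(0.0, 0.1, size=age.size)

same = 1.0 + 2.0 * age + noise  # genotype has no effect
lines = {"A": (1.0, 2.0), "B": (1.5, 2.0), "C": (1.0, 2.5)}
differ = np.array([lines[g][0] + lines[g][1] * a for a, g in zip(age, genotype)]) + noise

F_same, df1, df2 = genotype_f_test(age, same, genotype, ["A", "B", "C"])
F_differ, _, _ = genotype_f_test(age, differ, genotype, ["A", "B", "C"])
print(f"df = ({df1}, {df2}); F same = {F_same:.2f}; F differ = {F_differ:.1f}")
```

Testing only the interaction (do the slopes differ?) works the same way, with a reduced model that keeps the two genotype dummies but drops the two interaction columns, giving 2 numerator degrees of freedom.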
#5
12-02-2009, 05:00 PM
ultrafilter
Guest
 
Join Date: May 2001
cjepson's suggestion to use dummy variables is the simplest way to approach this, but you can't do multiple regression in Excel. What else do you have access to?
#6
12-02-2009, 05:08 PM
mischievous
Charter Member
 
Join Date: Mar 2001
Posts: 1,162
I'm not sure - I don't even know what to look for. I work at the NIH, which has site licenses to a fair amount of software, if you could suggest some names I could go looking for.
#7
12-02-2009, 05:35 PM
footballisplayedwithyourfeet
Guest
 
Join Date: Oct 2008
Look for Stata, SPSS, SAS, or R. The problem is that you need to know what you are doing (i.e., know the software). For the record, I would also try the proposed regression model; just make sure you have one dummy fewer than you have categories (so in your case only 2 dummies, plus their interaction terms). The result will tell you how each dummy-coded category does compared to the one you didn't include (the reference category). This also means that one model will not tell you whether the two categories that you did give a dummy differ from each other. To find that out, you need to run another model in which one of the other categories is the reference (so it gets no dummy). I must say that at a glance there seems to be little difference... but if your sample is large enough there might still be significant results.
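The point about the reference category can be checked numerically. In this NumPy sketch (simulated data; `fit_with_reference` is a hypothetical helper), the same model is fit twice, once with genotype A as the reference and once with B. Because the two fits are reparameterizations of the same model, the B-reference coefficient for C equals the difference between the C and B coefficients in the A-reference fit; what the refit adds is a standard error and p-value reported directly for that comparison.

```python
import numpy as np


def fit_with_reference(age, length, genotype, reference):
    """Fit length ~ age * genotype with `reference` as the baseline genotype.

    Returns {term name: coefficient}.
    """
    age = np.asarray(age, dtype=float)
    y = np.asarray(length, dtype=float)
    genotype = np.asarray(genotype)
    others = [g for g in sorted(set(genotype.tolist())) if g != reference]
    cols, names = [np.ones_like(age), age], ["intercept", "age"]
    for g in others:
        d = (genotype == g).astype(float)
        cols += [d, age * d]
        names += [f"genotype[{g}]", f"age:genotype[{g}]"]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return dict(zip(names, coef.tolist()))


# Simulated data (hypothetical): 5 ages x 3 genotypes with small noise.
rng = np.random.default_rng(1)
age = np.tile(np.arange(10, 15), 3).astype(float)
genotype = np.repeat(["A", "B", "C"], 5)
lines = {"A": (1.0, 2.0), "B": (1.5, 2.0), "C": (1.0, 2.5)}
length = np.array([lines[g][0] + lines[g][1] * a for a, g in zip(age, genotype)])
length += rng.normal(0.0, 0.1, size=length.size)

ref_a = fit_with_reference(age, length, genotype, reference="A")
ref_b = fit_with_reference(age, length, genotype, reference="B")

# C vs. B, read directly from the B-reference fit...
print("C - B offset (B as reference):", round(ref_b["genotype[C]"], 4))
# ...equals the difference of the two dummy coefficients in the A-reference fit.
print("C - B offset (A as reference):", round(ref_a["genotype[C]"] - ref_a["genotype[B]"], 4))
```

Changing the reference therefore never changes the fitted lines, only which comparisons the coefficient table reports directly.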


PS: I think I once heard somebody talk about doing regressions in Excel, so it might be possible; don't ask me how, though.
#8
12-02-2009, 06:16 PM
xash
Ogministrator
Moderator
 
Join Date: Jan 2001
Location: Palo Alto, CA
Posts: 4,133
Quote:
Originally Posted by mischievous
I'm not sure - I don't even know what to look for. I work at the NIH, which has site licenses to a fair amount of software, if you could suggest some names I could go looking for.
Look for SPSS or JMP.

ETA: SPSS has a free trial download.

Last edited by xash; 12-02-2009 at 06:18 PM.
#9
12-02-2009, 06:21 PM
ultrafilter
Guest
 
Join Date: May 2001
Quote:
Originally Posted by mischievous
I'm not sure - I don't even know what to look for. I work at the NIH, which has site licenses to a fair amount of software, if you could suggest some names I could go looking for.
We academic statisticians use R. There's a fairly steep learning curve, but it's extremely powerful and infinitely extensible.
#10
12-02-2009, 06:23 PM
footballisplayedwithyourfeet
Guest
 
Join Date: Oct 2008
Quote:
Originally Posted by ultrafilter
We academic statisticians use R. There's a fairly steep learning curve, but it's extremely powerful and infinitely extensible.
Don't forget to mention it's open source. You can get it for free anywhere and everywhere.
#11
12-02-2009, 08:42 PM
mischievous
Charter Member
 
Join Date: Mar 2001
Posts: 1,162
Okay, tomorrow I'll go looking for some software. I'm sure I'll have a million questions once I get that far.

Dammit, isn't there an easy way?
#12
12-02-2009, 09:00 PM
thelurkinghorror
Guest
 
Join Date: Jun 2006
You might have trouble looking for new versions of SPSS. It's called PASW now.
#13
12-02-2009, 09:16 PM
CookingWithGas
Charter Member
 
Join Date: Mar 1999
Location: Tysons Corner, VA, USA
Posts: 9,775
Quote:
Originally Posted by ultrafilter
cjepson's suggestion to use dummy variables is the simplest way to approach this, but you can't do multiple regression in Excel. What else do you have access to?
You can do multiple regression in Excel, unless I misunderstand what you mean (multiple independent variables, one dependent variable, right?). I did this in a forecasting course I took a few years back but haven't used it since. There is a Regression tool in the Analysis ToolPak, which ships with Excel but isn't installed by default. It's more powerful than the TREND function and will give you a sheet with all the parameters for the model, like R². I think you can even do multiple regression with TREND if you set the columns up right. But frankly this is a little like removing a screw with a pair of pliers.
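For the Excel route, "setting the columns up right" means creating the dummy and interaction columns yourself before running the Analysis ToolPak's Regression tool (or LINEST, which also accepts several X columns). This Python sketch writes such a table as CSV text; the column names, the toy measurements, and the helpers `regression_columns` and `to_csv` are hypothetical. Genotype A is the reference, so it gets no column, and the predictor columns sit side by side because the Regression tool takes its X input as a single range.

```python
import csv
import io


def regression_columns(age, length, genotype, levels):
    """Return (header, rows) with dummy and interaction columns spelled out.

    levels[0] is the reference genotype and gets no column of its own.
    The Y column comes first; all X columns follow side by side.
    """
    others = levels[1:]
    header = ["length", "age"]
    header += [f"is_{g}" for g in others]
    header += [f"age_x_{g}" for g in others]
    rows = []
    for a, y, g in zip(age, length, genotype):
        dummies = [1 if g == other else 0 for other in others]
        rows.append([y, a] + dummies + [a * d for d in dummies])
    return header, rows


def to_csv(header, rows):
    """Render the table as CSV text (paste into Excel or save as .csv)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical toy measurements: three ages per genotype.
age = [10, 11, 12] * 3
length = [5.1, 6.0, 7.2, 5.4, 6.3, 7.1, 5.0, 6.6, 8.1]
genotype = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

header, rows = regression_columns(age, length, genotype, levels=["A", "B", "C"])
print(to_csv(header, rows))
```

In the Regression dialog, the `length` column would be the Y range and the five columns to its right the X range, with Labels ticked because the first row is a header.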
#14
12-03-2009, 09:02 AM
mischievous
Charter Member
 
Join Date: Mar 2001
Posts: 1,162
Okay, well, I'm an idiot. It turns out that my facility has full-time statistics support.

I have an appointment in an hour with a statistician who says he'll lead me through the process step-by-step. Bless him.

Thanks for all of the ideas, guys, and I'll keep them in mind for the next time I run into trouble.
All times are GMT -5.
Copyright 2013 Sun-Times Media, LLC.