Any Affymetrix genechip biomedical experts on the SDMBs?
The company I work for is embarking on a new project in the exciting field of oligonucleotide array/genechip analysis. In particular, we are looking at gene expression during specific disease states in an attempt to develop diagnostic tests*.
Already, before we’ve even started, we are running into problems. One of the first steps in the process is to convert the chip’s probe intensities into gene expression measures. Using the chip we have in mind, each raw probe intensity file is a little over 30 megs. We would like to be able to analyze 200 to 400 files at a time. For these calculations, we are attempting to use the open source R Project in combination with code developed by the fine folks at Bioconductor. On a Win32 terminal server with two gigs of RAM, I managed to process 38 files before I hit serious memory limitations and the machine locked up. So here are my questions:
1.) We would like to get a 64-bit Linux workstation to avoid the 4 gig RAM limit of a 32-bit environment. R, though, is written and compiled for a 32-bit target. Will I be able to compile it from source for a 64-bit platform? If so, what will be its limitations? It’s not optimized for 64-bit, so I won’t get the speed, but will I get the increased memory?
2.) R requires both a C and a Fortran compiler. Getting a 64-bit version of GCC is easy, but what about a Fortran compiler? I believe I’ll need F2C or F77.
3.) With 20+ gigs of RAM, does anybody know how many files we’ll be able to analyze? The relationship between memory usage and number of files is not quite linear.
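For question 3, here is the naive baseline I'm starting from: a straight linear extrapolation from my one data point (38 files on the two gig machine). It's a quick Python sketch rather than R, the function name is made up for illustration, and it assumes memory scales linearly with file count, which I already know isn't quite true, so treat the result as a rough guess and not a prediction.

```python
# Naive linear extrapolation from a single observed run.
# Assumption (labeled loudly): memory use grows linearly with the number of
# files. The post says the real relationship is "not quite linear", so this
# is only a ballpark figure.

def files_that_fit(ram_gb, files_observed=38, ram_observed_gb=2.0):
    """Scale the observed run (38 files on a 2 GB machine) to ram_gb of RAM."""
    return int(ram_gb * files_observed / ram_observed_gb)

if __name__ == "__main__":
    for ram in (2, 4, 20):
        print(f"{ram} GB -> roughly {files_that_fit(ram)} files")
```

By that arithmetic 20 gigs lands right around the top of the 200 to 400 file range we want, which is why the exact shape of the curve matters to us.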
I’ve asked other bizarre questions here, so what the hell, I’m giving it a shot.
*My apologies for that crappy summary explanation, I’m a programmer, not a microbiologist. I further apologize for the long, long question.