Implementing C Code in R/S Plus

Any stats or math geek dopers out there? This is a very strange and specific question.

My boss has given me the wonderful task of writing some C code and implementing it in R (the GNU version of the S programming language) in both Linux and the Win32 environments.

R link --> http://www.r-project.org/

I’m quite a newbie when it comes to both C and R but I’m afraid my boss likes to say things like “you’re a smart kid, you’ll figure it out”. LOL.

My biggest obstacle so far has been calling the compiled C code in R. Take this C code, for example, to calculate a simple Mean:

#include "Average.h"

void average(float indata, unsigned long n, float *average)
{
unsigned long j;

    for (*average=0.0, j=1; j<=n; j++);
    {
      *average = *average + indata[j];
    }
    *average = *average/n;

}

With header file Average.h:

void average(float indata, unsigned long n, float *average);

I compiled this using GCC in Linux to generate my shared library with:

gcc -fPIC -c Average.c
ld -shared -soname Average.so.1 -o Average.so.1.0 -lc Average.o

Now, I fire up R and load the library:

> dyn.load("/home/lars/c/Average.so.1.0")

You then have to “map” the function so:

> testRfunc <- function(a,b)
.C("average", as.double(a), as.integer(length(a)), as.double(average))

When I then call testRfunc I get a “Segmentation Fault” which shuts down R. Ughh…

Does anyone have any idea what I’m doing wrong?

Sorry for the lengthy post and obscure cry for help…

I’m a C geek, not a stats or “R” geek, but I think I can help.

First of all, your loop index should probably range from 0 to n-1, not 1 to n. Arrays in C are traditionally, almost invariably, zero-indexed. Theoretically yours might not be though; there are (abhorrent) tricks for making them one-indexed. You should check the R documentation for the protocol it uses. The code should then look like this:



*average = 0.0;
for (j = 0;  j < n;  j++)
{
  *average += indata[j];
}
*average /= n;


Here’s another important correction: note that the semi-colon after the closing right parenthesis should be removed. Putting it there has the effect of replacing your loop body with the empty statement, and then executing the code in braces exactly once after the loop is over. Worse yet, the value of j at that time will be n+1, which is almost certainly out of the array’s valid range.
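To make the stray semicolon concrete, here is a small sketch that counts how often the braced block actually runs. The function name stray_semicolon_demo and its output parameters are made up for illustration, and it deliberately avoids touching any array so it is safe to run.

```c
/* Hypothetical demo: shows what a ';' right after the for(...) header does.
   runs   receives how many times the braced block executed.
   j_seen receives the value of j inside that block. */
void stray_semicolon_demo(unsigned long n, unsigned long *runs, unsigned long *j_seen)
{
    unsigned long j;

    *runs = 0;
    for (j = 1; j <= n; j++);   /* stray semicolon: the loop body is the empty statement */
    {
        /* This block is not part of the loop. It executes once, after the loop ends. */
        (*runs)++;
        *j_seen = j;            /* j has already advanced to n + 1 */
    }
}
```

Called with n = 5, the block runs once and sees j equal to 6, which is why indexing the array with it reads past the end.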

You might also want to verify that the pointer average is not NULL when this function is called. If it is NULL, that would also result in the crash you describe.

And, as a subjective matter of style, you could make use of the “+=” and “/=” assignment operators (as I did above) to condense the code somewhat. This is entirely optional however.

Hope this helps.

Sorry for posting again so soon.

For “zero-indexed” and “one-indexed” in my post, read “zero-based” and “one-based”. I had a little jargon hiccup there.

Thanks Bytegeist, I was hoping someone would take a look at the C for me.

I recompiled and still have the same error in R. I’m convinced it’s not the C code but my implementation in R. Back to google…

I’m in a hurry, so I didn’t really look at yours, but I copied and pasted something I wrote up once for people interested in this. It uses Borland for an IDE and S-Plus, so make the appropriate adjustments for what you’re doing.

One thing I’d check is whether the “float” in your C/C++ code is the same size as R’s “double”. A float is usually 4 bytes and a double 8, and a mismatch like that could cause a seg fault.

Also, I might try changing the indata to indata* and handle it that way. That’s just a guess. I don’t think Splus is too flexible about that.

Also, you might need the “extern” definition.

Also, you might want your function to return a double instead of a void. I don’t know if R will know how to handle that.
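For what it’s worth, here is a minimal sketch of how the averaging routine might look with those checks applied. It assumes R’s .C interface, which passes every argument as a pointer (numeric vectors as double *, integer vectors as int *) and hands results back through those pointers, so the sketch stays void. The name average_for_R and the guard against NULL or empty input are my own additions, not anything from the original code.

```c
#include <stddef.h>

/* Sketch of a mean routine shaped for R's .C() calling convention.
   Assumed R-side call (names are illustrative):
     .C("average_for_R", as.double(x), as.integer(length(x)), result = double(1))$result */
void average_for_R(double *indata, int *n, double *result)
{
    int j;
    double sum = 0.0;

    if (result == NULL) {
        return;                      /* nowhere to store the answer */
    }
    if (indata == NULL || n == NULL || *n <= 0) {
        *result = 0.0;               /* arbitrary choice for empty input */
        return;
    }
    for (j = 0; j < *n; j++) {       /* zero-based indexing, no stray semicolon */
        sum += indata[j];
    }
    *result = sum / (double)(*n);
}
```

The length comes in as int * rather than unsigned long because as.integer() on the R side produces a C int.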

here are the instructions. . .

  1. In Borland, select File -> New -> “DLL Wizard”.

  2. In the .cpp file, use, for example:
    extern "C" void __declspec(dllexport) foo(double *x, double *y);
    extern "C" void __declspec(dllexport) foo(double *x, double *y) { *y = <result based on the input 'x'>; }

  3. Build or make without running (CTRL-F9) or (Project -> Build). You should have filename.dll at this point.

  4. In S+, call dll.load("$PATH/filename.dll")

  5. Then call, for example, .C("_foo",as.double(5),x=double(1)). x will contain your result, or do .C("_foo",as.double(5),x=double(1))$x to constrain the output just to what you want.

  6. dll.unload("$PATH/filename.dll") unloads the dll and allows you to recompile.
    (hope this helps)

Trunk, you are my hero. Your advice was dead-on.

Glad to be of assistance.

That’s real tricky stuff. I think I spent half a week or a week figuring it out once. Google is of little help and the S Programming guide wasn’t too useful either. It’s a son of a bitch to do debugging and, as you said, every time something goes wrong R or S-Plus just shuts down so it’s time consuming to work with.

If you’re doing any bigger projects with it, I recommend just making a little console app or something, do your debugging in there, and then make the appropriate changes to call it from “R” because you can’t step into the code when you call it from “R”.
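A rough sketch of that kind of stand-alone check, assuming the routine uses the pointer-only signature that .C would call. The names mean_routine and run_checks are hypothetical; in a console app you would call run_checks() from main() and step through it in the debugger.

```c
#include <stdio.h>

/* Routine under test, written with the pointer arguments .C() would pass. */
void mean_routine(double *x, int *n, double *out)
{
    int j;
    double sum = 0.0;

    for (j = 0; j < *n; j++) {
        sum += x[j];
    }
    *out = (*n > 0) ? sum / (double)(*n) : 0.0;
}

/* Hypothetical check runner: feeds known data to the routine exactly the way
   R would (everything by pointer) and returns the number of failed checks. */
int run_checks(void)
{
    double data[] = {2.0, 4.0, 6.0, 8.0};
    int n = 4;
    double out = 0.0;
    int failures = 0;

    mean_routine(data, &n, &out);
    if (out < 5.0 - 1e-12 || out > 5.0 + 1e-12) {
        printf("mean check failed: got %f, expected 5.0\n", out);
        failures++;
    }
    return failures;
}
```

Once the checks pass here, the same source file can be built as a shared library and loaded into R without changes to the routine itself.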

IMO, “R” is totally slick. It’s got almost all of the useful functionality of S-plus with 0% of the baggage and 0% of the cost.

I hate to turn this into Trunk and me talking back and forth but…

… one more question:

Can you pass a matrix into C from R/S?

I’m guessing your C function declaration becomes something like…

void functionname(double **arrayin, unsigned int *firstslenght, double **arrayout)

In this case I’d be passing a multidimensional array in, performing calculations and then passing a multidimensional array out. Can R handle that?
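One possible shape for this, sketched under the assumption that the matrix goes through .C as a single flat double * rather than a double **. R stores matrices in column-major order, so element (i, j) of an nrow x ncol matrix sits at index i + j * nrow. The function name scale_matrix and its parameters are made up for illustration.

```c
/* Sketch: read a matrix from R as a flat column-major array, scale every
   element, and write the result into a second flat array of the same size.
   Assumed R-side call (names are illustrative):
     res <- .C("scale_matrix", as.double(m), as.integer(nrow(m)),
               as.integer(ncol(m)), as.double(2), out = double(length(m)))$out
     matrix(res, nrow = nrow(m))   # reshape the flat result back into a matrix */
void scale_matrix(double *m_in, int *nrow, int *ncol, double *factor, double *m_out)
{
    int i, j;

    for (j = 0; j < *ncol; j++) {          /* walk column by column */
        for (i = 0; i < *nrow; i++) {
            int k = i + j * (*nrow);       /* column-major offset of element (i, j) */
            m_out[k] = (*factor) * m_in[k];
        }
    }
}
```

The dimensions travel as two separate int * arguments because the flat array carries no shape information of its own.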

BTW, Trunk, don’t you love Baltimore this time of year?