C Programming - Iterate Through Array

larsenmtl · September 25, 2007, 4:01pm

Let's say I have the following function:


float foo (float *input){
  int i;
  float rv;
  for (i=0; i<SIZE_OF_ARRAY; ++i){
    if (*input > THRESH){
      rv=*input;
      break;
    }
    ++input;
  }
  return(rv)
}

Am I passing in a pointer to the first item in the array input? So that ++input moves to the next item?

Would it be valid to rewrite it as:


float foo (float *input){
  int i;
  float rv;
  for (i=0; i<SIZE_OF_ARRAY-1; i++){
    if (input[i] > THRESH){
      rv=input[i];
      break;
    }
  }
  return(rv)
}

If so, is one style preferred over the other?

Thanks,

LarsenMTL

Jas09 · September 25, 2007, 4:14pm

Yes, they are equivalent. Array[i] is the same as *(Array+i) in most (all?) instances. Arrays are passed to functions in C as a pointer to the first element.

I prefer to use indices (Array[i]) because it makes it explicit that you are dealing with an array - and it doesn’t freak out programmers not comfortable with pointers that might read/edit/maintain your code. It also maintains the input parameter (so you could make it const or use it later in your function if you wanted).
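
A minimal sketch of the equivalence described above. The helper name subscript_matches_pointer is hypothetical and not from the thread; it only compares addresses, so it works for any contents of the array.

```c
#include <stddef.h>

/* Returns 1 if &a[i] and a + i are the same address for every i < n.
 * The C language defines a[i] as *(a + i), so this should always hold. */
int subscript_matches_pointer(const float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (&a[i] != a + i) {
            return 0;
        }
    }
    return 1;
}
```

Calling it with an array name, as in subscript_matches_pointer(data, 4), passes a pointer to data[0], which is the "pointer to the first element" behavior mentioned above.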

larsenmtl · September 25, 2007, 4:15pm

Thanks Jas09!

Mijin · September 25, 2007, 4:17pm

Both will work fine in this case (although you need to add a semicolon to the return statements).

As to which is better, I would personally prefer the second syntax, because it is clearer. Modifying arguments to a function looks odd, and might lead another programmer (or you, at a later date) to think that perhaps you intended the function to modify external data in the calling function (e.g. write out to *input), as is commonly done when a function needs to return more than one variable (and the programmer doesn’t want to package them in one struct).

Furthermore, the first syntax is error-prone because you might use the same pointer in a later part of the function, and it will be pointing at the end of the array.
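
One way to keep the pointer-walking style without the hazard described above is to advance a local copy of the pointer. This is only a sketch: the function name first_above_or_first, the explicit length n, and the fallback to the first element are assumptions made for illustration.

```c
#include <stddef.h>

/* Returns the first element greater than thresh, or input[0] if there is none.
 * Assumes n >= 1. */
float first_above_or_first(const float *input, size_t n, float thresh)
{
    const float *cursor = input;   /* walk a copy, not the parameter itself */
    for (size_t i = 0; i < n; ++i, ++cursor) {
        if (*cursor > thresh) {
            return *cursor;
        }
    }
    /* input was never advanced, so it still points at the first element.
     * Had the loop used ++input, this read would be one past the end. */
    return input[0];
}
```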

On edit: I see several people beat me to it…

Capt.Ridley_s_Shooting_Party · September 25, 2007, 4:20pm

See questions 6.3 and 6.4 here for a technical discussion.

Pleonast · September 25, 2007, 5:03pm

I see two potential bugs.

First, the float rv is not initialized. If none of the values is greater than the threshold, the function will return an indeterminate value. This is usually undesirable behavior.

Second, in the second code block, the final element of the array is not checked. Do you really intend “i<SIZE_OF_ARRAY-1” instead of “i<SIZE_OF_ARRAY” or “i<=SIZE_OF_ARRAY-1”?

Also, in the second block, I’d define the function as “float foo(const float *input)” to make clear that you are not modifying the array.

In almost all circumstances, I’d prefer the second style, since it is much more clear what the intent of the function is. However, the pointer method will often be fractionally faster than the array dereferencing (since the array dereference will require a pointer addition, unless the compiler is really good). Unless you’re calling this function literally billions of times, it’s not worth the obscuration.
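
Putting the fixes from this thread together, the indexed version might look like the sketch below. The values of SIZE_OF_ARRAY and THRESH are assumed (the thread never gives them), and NOT_FOUND is a hypothetical sentinel; it is only unambiguous here because every real match is greater than a positive THRESH.

```c
#include <stddef.h>

#define SIZE_OF_ARRAY 5        /* assumed value, not from the thread */
#define THRESH 10.0f           /* assumed value, not from the thread */
#define NOT_FOUND (-1.0f)      /* hypothetical "no match" sentinel */

/* Returns the first element greater than THRESH, or NOT_FOUND. */
float foo(const float *input)
{
    float rv = NOT_FOUND;      /* initialized, so the no-match case is defined */
    for (size_t i = 0; i < SIZE_OF_ARRAY; ++i) {   /* i < N checks every element */
        if (input[i] > THRESH) {
            rv = input[i];
            break;
        }
    }
    return rv;                 /* note the semicolon */
}
```

A caller would compare the result against NOT_FOUND before using it. Returning a found/not-found flag and writing the value through an output pointer is another common design if no value can safely serve as a sentinel.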

Sage_Rat · September 25, 2007, 5:25pm

The second style is preferred, but as Pleonast points out it will most likely be minimally slower for arcane reasons that would make sense if you knew assembler.

I’d just point out, though, that if your objective is to find the point at which you exceed a value, you’d do better to do a binary search if speed is of the essence.
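
A rough sketch of the binary search idea, assuming the array is sorted in ascending order (the search is not valid otherwise). The name first_above and the convention of returning n for "no match" are assumptions for illustration.

```c
#include <stddef.h>

/* Returns the index of the first element greater than thresh in an
 * ascending-sorted array, or n if no element exceeds thresh. */
size_t first_above(const float *sorted, size_t n, float thresh)
{
    size_t lo = 0;
    size_t hi = n;                         /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (sorted[mid] > thresh) {
            hi = mid;                      /* answer is at mid or to its left */
        } else {
            lo = mid + 1;                  /* answer is strictly to the right */
        }
    }
    return lo;
}
```

This takes on the order of log2(n) comparisons instead of up to n for the linear scan.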

larsenmtl · September 25, 2007, 5:40pm

Thanks all! My examples above aren’t “real” code, just enough to let me get my question across.

Thanks for the bug hunting. In my real code, the compiler complained about the first one but I would have missed that second one (I meant to use <=).

Binary search requires your array to be ordered, doesn’t it?

Trunk · September 25, 2007, 6:07pm

It’s just sort of puzzling what you’re trying to do if the array isn’t sorted.

That doesn’t change the answer to your question.

Sage_Rat · September 25, 2007, 7:44pm

It would need to be ordered for iteration to work as well.

Sage_Rat · September 25, 2007, 11:12pm

There’s no reason to subtract the one. You’re better off to just use <.

Punoqllads · September 25, 2007, 11:18pm

It depends upon what the routine is supposed to be doing. If it’s supposed to be returning the smallest value that is above a threshold then you’re correct.
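
To illustrate the distinction, here is a sketch of a routine that returns the smallest value above a threshold in an unsorted array, as opposed to the first one encountered. The name smallest_above, the explicit length, and the output-pointer convention are all assumptions for illustration.

```c
#include <stddef.h>

/* Writes the smallest element greater than thresh to *out and returns 1,
 * or returns 0 and leaves *out untouched if no element exceeds thresh.
 * Works on unsorted input because it examines every element. */
int smallest_above(const float *a, size_t n, float thresh, float *out)
{
    int found = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > thresh && (!found || a[i] < *out)) {
            *out = a[i];
            found = 1;
        }
    }
    return found;
}
```

For the unsorted data {7, 3, 9, 5} and a threshold of 4, a first-match loop stops at 7, while this routine reports 5.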

ultrafilter · September 26, 2007, 12:45am

This is the SDMB. We eat/sleep/breathe/live arcane crap.

The notation A[i] is equivalent to *(A + i*sizeof(A[0])), treating A as a raw byte address. Going through and doing that computation eats a couple of CPU cycles per array reference. Most of the time you just don’t care, but if you’re writing a real-time system or iterating over every element of a really big array, it could matter.
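
A small sketch of that address computation. In C source the scaling by the element size is implicit (A + i already steps by whole elements), so the explicit multiplication only shows up when the pointer is viewed as a byte address, as done here with a cast to char pointer. The helper name is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if the address of a[i] equals the base address plus
 * i * sizeof(a[0]) bytes. */
int address_is_base_plus_scaled_index(const float *a, size_t i)
{
    const char *base = (const char *)a;            /* view the array as bytes */
    const char *elem = (const char *)&a[i];
    return elem == base + i * sizeof(a[0]);
}
```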

There’ve been a lot of good reasons given to prefer this style over using pointer arithmetic. It might also be worth noting that random access gets to be a little trickier if you’re not using the subscript notation.

Punoqllads · September 26, 2007, 12:54am

If they’re really that interested in performance, they should be compiling with optimization turned on, which ought to result in essentially identical performance. In fact, using less-obfuscated code could make the code easier to optimize, resulting in better performance.

MilTan · September 26, 2007, 4:27am

Correct. To be more precise, the compiler should perform some sort of strength reduction on the code snippet that ultrafilter posted. That would replace the expensive multiply with an increment that executes on each iteration of the loop – effectively making the second variant do exactly what the first variant does.
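
As an illustration of what strength reduction amounts to, the two hypothetical functions below compute the same sum. The second is written by hand the way an optimizer might transform the first; whether a given compiler actually does this depends on the compiler and its flags, so treat it as a sketch rather than a guarantee.

```c
#include <stddef.h>

/* Indexed form: conceptually computes a + i * sizeof(float) on each pass. */
float sum_indexed(const float *a, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}

/* Strength-reduced form: the per-iteration multiply is replaced by
 * stepping a pointer forward one element at a time. */
float sum_stepped(const float *a, size_t n)
{
    float total = 0.0f;
    const float *end = a + n;              /* one past the last element */
    for (const float *p = a; p != end; ++p) {
        total += *p;
    }
    return total;
}
```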

Sage_Rat · September 26, 2007, 1:57pm

Multiplication in C isn’t exactly the same as index offsetting in x86 assembly, if I recall correctly. There isn’t really any way to write an assembler equivalent in C beyond “A[i]”.

The following is about this behavior with regard to the LEA instruction, but the same holds for MOV. The key point is that indexing is almost always an addition and a shift (not multiplication), and doing it as an index off of a register only takes one clock cycle, whereas running a shift and add as separate ops will take 2 or more cycles.

Agner Fog:

27.1 LEA instruction (all processors)

The LEA instruction is useful for many purposes because it can do a shift, two additions, and a move in just one instruction taking one clock cycle.

Example:

LEA EAX,[EBX+8*ECX-1000]

is much faster than

MOV EAX,ECX / SHL EAX,3 / ADD EAX,EBX / SUB EAX,1000

The LEA instruction can also be used to do an add or shift without changing the flags. The source and destination need not have the same word size, so LEA EAX,[BX] is a possible replacement for MOVZX EAX,BX, although suboptimal on most processors.

You must be aware, however, that the LEA instruction will suffer an AGI stall on the PPlain and PMMX if it uses a base or index register which has been written to in the preceding clock cycle.

Since the LEA instruction is pairable in the v-pipe on PPlain and PMMX and shift instructions are not, you may use LEA as a substitute for a SHL by 1, 2, or 3 if you want the instruction to execute in the V-pipe.

The 32 bit processors have no documented addressing mode with a scaled index register and nothing else, so an instruction like LEA EAX,[EAX*2] is actually coded as LEA EAX,[EAX*2+00000000] with an immediate displacement of 4 bytes. You may reduce the instruction size by instead writing LEA EAX,[EAX+EAX] or even better ADD EAX,EAX. The latter code cannot have an AGI delay in PPlain and PMMX. If you happen to have a register which is zero (like a loop counter after a loop), then you may use it as a base register to reduce the code size:

LEA EAX,[EBX*4] ; 7 bytes
LEA EAX,[ECX+EBX*4] ; 3 bytes