Why is "i" chosen in programming loops?

If you think that is traumatic, try debugging a multi-thousand-line, 30-year-old program written in Fortran IV that has a massive single common block and hundreds of variables all with names that go:
i, ii, iii, iiii, j, jj, jjj, jjjj, etc etc.

Why is it that thousands of programmers never thought that you were allowed to use letters other than the one the name started with, or that perhaps, even in eight characters, you could be meaningful?

If I could, I would hunt down the author of this program and kill him slowly over a low heat. However, I think he is already gone. There are, needless to say, no comments in the code, nor is there any documentation.

It didn’t and couldn’t have originated with Fortran, because it showed up far, far earlier than that. Yes, Fortran assumed that variables starting with i through n were integers, but it assumed that because that was consistent with the convention already in use by mathematicians. And mathematicians used that convention because “i” stands for either “integer” or “index”, and “n” stands for “number” or “natural number”.
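For anyone who never met it, the implicit-typing rule is small enough to sketch in a few lines of Python. The function name `implicit_type` is made up for illustration; only the I-through-N rule itself comes from Fortran.

```python
def implicit_type(name: str) -> str:
    """Return the default Fortran type for an undeclared variable name.

    Names starting with I, J, K, L, M, or N default to INTEGER;
    every other starting letter defaults to REAL.
    """
    first = name.strip().upper()[0]
    return "INTEGER" if "I" <= first <= "N" else "REAL"


for name in ("I", "JJ", "N", "MASS", "X", "TOTAL"):
    print(name, implicit_type(name))
```

Note that a name like MASS lands in the integer range purely because of its first letter, which is part of why so many old programs stuck to short I, J, K style names for counters.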

You think that’s bad? Try maintaining a program that existed only in the form of an OCR of a printout of its original version, and both the original softcopy and the printout have been lost to history, and with the added bonus of a text format conversion somewhere along the line that stripped out most of the “redundant” whitespace. Is that a variable named IO, or is it maybe I0, or is it actually the number 10? Is that number at the start of a line a continuation indicator, or a line number? And even in the cases where a typo was obvious once noticed (clearly, “MA5S” should actually be “MASS”), do you know you’ve caught every single one of those obvious typos? Now combine this with a boss who determines whether the code is correct or not based on whether it produces correct results, and who knows what the correct results are because that’s what the code produces.

Column 7 was the start of the code. 1-5 was for the statement number (long before the days of textual labels for statements). 6 was the continuation column (non-blank means this card is a continuation of the preceding one). 73-80 (or was it 76-80? I forget) was ignored and normally used for sequence numbering your cards (so you could run them through a card sorter if you were a klutz and dropped your cards).
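As a rough sketch of that card layout, here is how one might slice an 80-column line in Python. The function name `split_card` and the sample card are invented for illustration, and the continuation test simply follows the description above (any non-blank character in column 6).

```python
def split_card(card: str) -> dict:
    """Split one fixed-form Fortran source line into its column fields."""
    card = card.ljust(80)[:80]             # pad or trim to a full 80-column card
    return {
        "label": card[0:5].strip(),        # columns 1-5: statement number
        "continuation": card[5] != " ",    # column 6: non-blank means continuation
        "statement": card[6:72].rstrip(),  # columns 7-72: the code itself
        "sequence": card[72:80].strip(),   # columns 73-80: sequence number field
    }


# Hypothetical card: statement 100, code starting in column 7, sequence number at the end.
card = "  100 X = X + 1.0".ljust(72) + "00001230"
fields = split_card(card)
print(fields)
```

This is also why stripped whitespace is so destructive for this kind of source: the meaning of a character depends on which column it sits in.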

The use of “x” for a counter variable is a BASIC convention (or at least I’ve seen it in most documentation of BASIC interpreters and tutorials). It does make more sense than using “i” or “j”, at least in modern interpreted scripting languages like Matlab, Octave, and Python where the built-in complex type is expressed as “<real> + <imag>i” or “<real> + <imag>j”, or C where the Complex data type is “<real> + <imag> * I”. Elsewhere i or j is often used by convention to denote values on the imaginary axis, e.g. i = Complex(0,1). Julia eliminates this ambiguity for its built-in complex type by making it “<real> + <imag>im” so it is clearer.
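To make the Python case concrete, here is a small sketch. It only shows Python's behavior; the variable names `z` and `w` are arbitrary. In Python the imaginary unit is a suffix on a numeric literal, so a counter named `j` is still an ordinary variable, and the clash is one of readability more than of parsing.

```python
# Python spells the imaginary unit as a "j" suffix on a numeric literal.
z = 3 + 4j
print(z.real, z.imag)   # 3.0 4.0
print(abs(z))           # 5.0

# A loop counter named j is an ordinary variable, separate from the literal.
for j in range(3):
    w = 2 + 1j * j      # "1j" is the literal, "j" is the counter
    print(j, w)
```

Whether a line like `2 + 1j * j` is something you want to read at 2 in the morning is another matter.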

Francis Vaughn is correct that the variable in a “for” loop is a counter, not an index. An index implies a sequence which organizes an array or vector structure. The counter in a for loop is just a holder for iteration purposes as previously discussed. It may be used in indexing an array structure within the loop which is ordered in sequence, but it is just a counter.

As a general practice in modern languages where there aren’t restrictions on the length of variable names or the size of the human readable code, single-letter variable names should be avoided like the plague and variable names should be reasonably descriptive, albeit not to the point where you have names like arrayForComputingCrossCorrelationCoefficientsBetweenAgeAndDensity (a genuine example from a recent bit of hacktastic code I had to rewrite). Generally for generic iterators I’ll use something like itrA or itrOuter, which is reasonably short and allows you to nest loops with clearly identifiable iterators (e.g. itrB, itrC, et cetera or itrInner).
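A minimal sketch of that naming scheme in Python, using the itrOuter and itrInner names from above. The `grid` data is invented for illustration.

```python
grid = [[1, 2, 3], [4, 5, 6]]

total = 0
for itrOuter in range(len(grid)):                # walks the rows
    for itrInner in range(len(grid[itrOuter])):  # walks the columns within a row
        total += grid[itrOuter][itrInner]

print(total)  # 21
```

Even in a loop this small, the names say which counter belongs to which nesting level.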

Welcome to the world of legacy Fortran code, where engineers not only use similar-looking variable names and fail to annotate code, but also don’t provide any test cases to validate the code or any interface specification, so you are basically presented with a black box that you have to run a parametric input test on to see where it breaks. “But we’ve been using this code for fifty years to predict solar incidence and atmospheric absorptivity.” Yes, and it gives the same wrong answers every time above 70 degrees longitude and between March 13 and July 8, and nobody has a clue as to why. For the love of all that is good and holy, let me just rewrite it directly in Python instead of having to wrap it with code that has to interface with its user prompt input mode and hack correction factors to make the answer right.

I hate legacy code with a burning fury. If it isn’t a well-validated standard library or doesn’t come with an interface spec and description of the algorithm, rewrite it. It will save no end of trouble in the long run.

Stranger

For some reason, I remember always using “T” as the variable in VIC-20 and C64 BASIC programs, but looking through the programming manuals they came with, there was no convention whatsoever that I can find. Any letter seems to be fair game. (I suspect now that the “T” convention I’m thinking of is for empty FOR loops that were just there for a short pause, in which case the “T” could stand for “time.”)

Don’t forget its COBOL roots:
I-INDEX-COUNTER-VARIABLE-THAT-GETS-INCREMENTED-EACH-TIME-THROUGH-THIS-LOOP

Yep, T was typically used in artificial delay loops, where counting up to 1000 took about 1 second.

Other fun, somewhat relevant facts about Commodore BASIC V2:
[ul]
[li]Only the first 2 letters of a variable name were stored. So MISSILECOUNT=MISSIONNUMBER3 was interpreted as MI=MI3.[/li]
[li]For added versatility and ease of use, every variable was floating-point, unless you used the % suffix to indicate a 16-bit signed integer (or $ for string).[/li]
[li]Variable declarations were not allowed. However, performance was slightly improved if you determined which variables were the most often used and made sure they were mentioned first in the code.[/li]
[li]All arithmetic was performed in floating-point, even if adding 1 to an integer. So integer variables were actually a bit slower than floating-point variables.[/li]
[li]TI and TI$ were reserved variables containing a system tick count. So nothing in your program could be called TIMER or TIP$.[/li]
[li]The only loop instruction was FOR, and you were not allowed to use an integer variable for a counter. So FOR I = -15 TO 45 used a floating-point variable called I.[/li]
[/ul]
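The two-letter rule is easy to model. The sketch below is a simplification under stated assumptions: `stored_name` and `RESERVED` are names invented here, the type suffix is assumed to be the last character, and the interpreter's habit of tokenizing keywords embedded in names is ignored entirely.

```python
def stored_name(name: str) -> str:
    """Approximate how Commodore BASIC V2 keyed a variable name.

    Simplification: keeps the first two characters plus any type suffix
    (% or $) and ignores the interpreter's keyword tokenizing.
    """
    name = name.upper()
    suffix = name[-1] if name[-1] in "%$" else ""
    base = name[:-1] if suffix else name
    return base[:2] + suffix


# Reserved names mentioned in the list above.
RESERVED = {"TI", "TI$"}

for name in ("MISSILECOUNT", "LIVES", "LIMIT", "TIMER", "TIP$"):
    key = stored_name(name)
    note = " (collides with a reserved variable)" if key in RESERVED else ""
    print(name, "->", key + note)
```

LIVES and LIMIT (hypothetical names) both come out as LI, so assigning to one silently overwrites the other.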

Ooo…I remember most of those, except for #3 (most often used variables, earlier in the code), and I had never thought about that last point. I guess it makes sense as you’re not using the % and you can add a decimal STEP value (like 0.3) and have it count by .3s (which apparently introduces one of those floating point weirdnesses when your numbers start to end with .0000000001 or .000000002 or whatever.) I never tried using I% in a FOR loop and, just trying it, it throws a ?SYNTAX ERROR at me.
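That floating-point weirdness is easy to reproduce on a modern machine. A hedged sketch: Python uses 64-bit doubles, not Commodore's own floating-point format, so the exact digits will differ from what a C64 prints, but the mechanism (a decimal step that has no exact binary representation, added over and over) is the same. The function name `count_with_step` is invented here, and a step of 0.1 is used because its behavior in doubles is well known.

```python
def count_with_step(start: float, stop: float, step: float) -> list:
    """Mimic FOR X = start TO stop STEP step by repeated addition."""
    values = []
    x = start
    while x <= stop:
        values.append(x)
        x += step
    return values


values = count_with_step(0.0, 1.0, 0.1)
print(values)
print(values[-1] == 1.0)  # False
```

The last value comes out a hair under 1.0 instead of exactly 1.0, which is the same kind of trailing-digit drift described above.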

In mathematics, the integers are often denoted I (the alternative is Z from the German word Zahlen for numbers). Then an element of I is usually denoted i. You need another integer, call it j. Then k, then l, m, n. It does not continue since o is too readily confused with 0. Since mathematicians were the early programmers, it is almost certain that it started there.

I would accept that welcome graciously, except that I left that world two years ago, and good riddance.

And I will confess to sometimes using a single-letter variable name as a loop index, but only in very local contexts: For instance,


for(i=0;i<size;i++) array[i] = 0;

to initialize an array. I’ll occasionally do it for longer blocks than that, too, but never more than a half-page or so, so anywhere that i actually shows up, you can see at a glance what it means.

The Scala school of thought on this is that most of the time loop counter variables don’t actually mean anything in the problem space – they’re a means to an end, like applying a function to every point in the collection, or combining all the elements in some way, or selecting a subset of the elements. So it’s no wonder they are often given meaningless, throw-away names, since they don’t actually “mean” anything.

Scala has things like collection.map(_*2) or collection.filter(_%2 == 0) or collection.foldLeft(0)(_+_) to describe the higher-order operation without requiring a counter or iterator explicitly. (Other languages do some of this elimination, too, but Scala does it very comprehensively.)
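For comparison, here is a rough Python equivalent of those three operations. This is a sketch of the same idea, not a translation of any particular Scala program. The `collection` data is invented, and Python still needs a throwaway name (`x` here) for the element, although no counter or index appears anywhere.

```python
from functools import reduce
from operator import add

collection = [1, 2, 3, 4, 5, 6]

doubled = [x * 2 for x in collection]          # map: double every element
evens = [x for x in collection if x % 2 == 0]  # filter: keep the even elements
total = reduce(add, collection, 0)             # left fold: sum, starting from 0

print(doubled)  # [2, 4, 6, 8, 10, 12]
print(evens)    # [2, 4, 6]
print(total)    # 21
```

In each case the code names the operation being done to the collection, and the question of what to call the loop counter never comes up.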

That is how I understand it, too. Mathematical formulas have some very old conventions:

[ul]
[li]a,b,c,… are constants.[/li]
[li]i,j,k,… are indices.[/li]
[li]n,m,l,… are “sizes of things”[/li]
[li]x,y,z,… are variables[/li]
[/ul]

FORTRAN conventions reflect the older math/physics usage. It’s called “FORmula TRANslation” for a reason.

I’m fond of the idea that “i” stands for “index” in mathematical notation. But where is the actual evidence for this? Can anyone trace back the development of this convention through historical sources?

I think it’s far easier to prove that “i as an index” is a convention much older than FORTRAN, than it is to actually prove that “i” was selected to stand for the English word “index” in that older convention.

Here is a paper from 1904 with lots of subscripts, and lo-and-behold, the first one is “i”, the second is “k” (presumably it would have been “j” if it was more visually distinct).

Right; to clarify, what I’m looking for is evidence that “i” was selected to stand for the word “index”. As you note, there is an abundance of mathematical writing using “i” as the dummy variable in contexts such as discrete summation long before Fortran.

That may be, but I’d argue that it was FORTRAN that drove the convention into the mainstream. I agree with beowulff and with Senegoid’s expanded commentary on that. It became so ingrained in me to start any loop in FORTRAN with “DO statement_number I=1,count” that I always thought of “I” as the obvious name for the loop index, not because of the word “index” but because that’s how FORTRAN encouraged its use. “J” would be the index of any inner loop, etc. Most of the time loop indexes were throwaway variables so it was common to use the single letters reserved for the first letters of integers, and “I” was always the obvious first choice.

My first language was PL1. I learned to use J and K for my loop counters. I still use them in COBOL. My count is WS-REC-CNT or WS-EMP-CNT. Simple For loops are J and K. A lot of my loops are for tables, and J and K are subscripts in that loop.

WS means it’s a variable in working storage. A standard naming convention in my old shop.

Whatever you use, be consistent. Wiping out the contents of an important counter can be a tough bug to find. Pick something and use it in For Loops in every program you write.

For the OP. I absolutely would *never* use I as a variable. Looks too much like a 1.

Remember we were debugging these programs at 2 in the morning. Beeper goes off. You call OPS and learn Payroll has aborted. Your butt is on the line to fix it. You want code written as simple as possible.

Being on call is what eventually made me leave CS. 5 years of call was enough for me. I like my 8 to 5 job in computer support.

We are talking here about habits acquired circa FORTRAN IV and machines like the IBM 7040 and 7090. The standard IBM line printer fonts at the time had enormous serifs on the “I” and it was clearly distinguishable from “1”. Similar care was taken to distinguish the letter “O” from zero, the letter being a distinct rounded square.

Back then all our programs were in uppercase. Our line printer only supported uppercase. Uppercase bands were cheaper than a printer band that supported upper and lower. I was also told the multicase bands would break quicker.

I still code in uppercase. After 20 plus years lower case looks weird and it distracts me trying to read a compiled listing.

If you think that is traumatic, . . .

. . . try debugging the multi-ten-thousand line FORTRAN IV compiler, written in utterly, totally, and completely unstructured assembly language, all of whose variables and branch labels are named like A001, A002, … , B001, B002, with no particularly obvious pattern, and whose every third or fourth line (so it sometimes seemed) was a conditional branch to someplace a dozen or a hundred pages away.

That was, in large part, my first programming job.

Our thinking was that this compiler must have originally been written in absolute machine language (the original Chippewa Falls compiler) and then translated (possibly by some automated process) into assembly language. A great many of the textual messages (pages and pages of error message definitions, for example) were coded in octal.

That seemed, at least, like a plausible theory. This was, after all, the FORTRAN compiler for the CDC 6400/6600/7600 machines, where legend had it that the operating system itself was written in absolute machine language by Seymour Cray.