Quick C question on string literals

I wouldn’t call it bizarre to have a constant value stored only once. I’d assume every* compiler does that for every* language.

  • OK “almost every”.

I disagree. Say you have the instruction

li $t0, 5

This is essentially just putting “101” inline in the program code and moving 101 directly into the register.

Or you could have 5 stored somewhere else, in which case suddenly you get

la $t0, five

In which case five refers to the ADDRESS where you stored 5, and then you have to do what amounts to a pointer dereference on the address five, and then move the value stored there into the register. Compare this to simply moving the right-most n bits of the instruction into the register. The only way it’s not bizarre is that with the inline method you can only load constants wordSize - (#opcode bits) - (#register code bits) bits long, whereas with the slower address method you can load constants up to the register size.

No, actually, it is bizarre; most compilers for most languages do numeric constants inline, right into the machine code. This is both faster (saves memory accesses, frees up cache, often allows smaller opcodes which also frees up cache) and not, you know, completely insane.
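
To put the inline-versus-address distinction in C terms, here is a minimal sketch. The function and variable names are made up for illustration, and what instructions a given compiler actually emits is up to that compiler; the `volatile` qualifier is only there to force a genuine memory read in the second case.

```c
/* Hypothetical names throughout; a sketch, not any particular compiler's output. */

/* A constant kept in memory. `volatile` forces a real load on each read,
   so the compiler cannot fold the value into the instruction stream. */
static const volatile int five_in_memory = 5;

/* The literal 5 here is typically encoded as an immediate operand,
   the C-level analogue of `li $t0, 5`. */
int add_five_inline(int x)
{
    return x + 5;
}

/* Here the 5 has to be fetched through its address first,
   the analogue of `la $t0, five` followed by a load. */
int add_five_from_memory(int x)
{
    const volatile int *p = &five_in_memory;
    return x + *p;
}
```

Both functions compute the same result; the difference is only in where the 5 lives and how it reaches the register.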

OK. But then why does Fortran (at least through Fortran 77) behave this way? It doesn’t sound like it’s just one or two outlier compilers or systems, but rather typical behavior for Fortran (again, at least through Fortran 77).

It’s in the language spec, so it’s not up to the compiler designer. It probably seemed like a good idea at the time in the meeting where they decided this; somebody probably brought up some obscure scenario where this would be necessary or more elegant, and everybody went “oh, yeah!” and then made it part of the spec. The other possibility is that it was decided first that all formal parameters be passed by reference, not by value, so rather than change the spec to allow primitives to behave differently, they wrote a clause in there to the effect of “yes, even integers” because they thought it would make things easier on the compiler designer.

This is what happened: In FORTRAN, everything is passed by reference* and they didn’t exclude integer literals from that dictate.

*(As opposed to how it works in C, where everything is passed by value and you have to explicitly create references (usually called pointers) which are then, you guessed it, passed by value. Don’t ask about call-by-name semantics unless you want your brain to melt a little bit; nobody uses it anymore.)
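
For anyone who wants the C side of that footnote spelled out, a small sketch follows. The function names are invented for the example.

```c
/* Hypothetical helper names; a sketch of C's call-by-value rule. */

/* Receives a COPY of the caller's int. Assigning to it changes
   only the copy, so the caller never sees the 7. */
void set_by_value(int n)
{
    n = 7;
    (void)n; /* silence "set but not used" warnings */
}

/* Receives a COPY of a pointer. The pointer itself is passed by value,
   but writing through it changes the caller's object. */
void set_by_pointer(int *n)
{
    *n = 7;
}
```

Calling `set_by_value(a)` leaves `a` alone; calling `set_by_pointer(&a)` changes it. FORTRAN's convention corresponds to the second form being applied to every argument automatically.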

Re: The several posts just above, about how FORTRAN did things.

Much of the early FORTRAN design (and other languages of the day) had to do with the realities of how computers were in those days. Machines were small and slow; a fully loaded IBM 1620, for example, had 100K bytes (equivalent) of memory. You couldn’t fit a whole lot of compiler in there together with all the parsing tables, etc. This was partly why they didn’t have elaborate control structures, much beyond simple do-loops, which could require you to compile your way through a whole lot of source code before you could close off your matching braces. Do-loops were much simpler. ETA: The designers DID have the idea for more elaborate structures, and considered them, but it wasn’t practical to have them.

Also, machines of the day didn’t have any inherent support for stacks. So people didn’t have much incentive to even think about doing much with stacks, like passing parameters or putting local variables there. It was too inefficient to have to generate explicit code to do all the management of that. ALL local variables in subroutines were static, as were constants too.

Some early machines had a single machine instruction (or a simple sequence of two instructions) that would do a three-way branch according to whether a number was negative, zero, or positive. Hence, FORTRAN has a branching statement that does exactly that.
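
The statement in question is the arithmetic IF, written `IF (expr) s1, s2, s3`, which jumps to one of three statement labels. A rough C rendering of its selection logic, assuming a made-up function name and plain integers standing in for the labels:

```c
/* Hypothetical sketch of FORTRAN's arithmetic IF: IF (expr) s1, s2, s3.
   Returns the label the statement would branch to. */
int arithmetic_if(int value, int neg_label, int zero_label, int pos_label)
{
    if (value < 0) {
        return neg_label;   /* expression negative: first label */
    }
    if (value == 0) {
        return zero_label;  /* expression zero: second label */
    }
    return pos_label;       /* expression positive: third label */
}
```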

All variables were passed to subroutines by reference, and that meant all constants had to be passed that way too, so the subroutine could access its parameters in a uniform way. This created an opportunity for a very obscure bug that was quite common:



C... MAIN PROGRAM:
       J = 3
       CALL FOO(5)
       K = J + 5
       PRINT 10, K
10     FORMAT (I5) 
       STOP
       END

C... SUBROUTINE FOO MODIFIES ITS ARGUMENT:
       SUBROUTINE FOO ( MYARG )
       MYARG = 7
       RETURN
       END


Printed output: 10

Because the compiler put a 5 into the constants table, then passed a pointer to that into the subroutine, then the subroutine changed the value to 7 (yes, right there in the constants table), and all subsequent uses of the constant 5 in the calling program were now using 7 instead. This happened with floating-point numbers too, perhaps even more so than with integers. Yes, I debugged many a user’s program that was doing stuff like this. There was nothing like read-only segments in those days to prevent that.
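
For readers more at home in C, the same mechanism can be imitated by hand. This is only a simulation under my own assumptions: the `constants_table` array and the function names are invented stand-ins for what the FORTRAN compiler did behind the scenes.

```c
/* Hypothetical simulation of the old FORTRAN constants table. */

/* Slot 0 holds the literal 5 used by the "main program". It is writable,
   just as the real table was before read-only segments existed. */
static int constants_table[] = { 5 };

/* SUBROUTINE FOO: assigns 7 to whatever its argument refers to. */
void foo(int *myarg)
{
    *myarg = 7;
}

/* The main program: J = 3, CALL FOO(5), K = J + 5. */
int run_main_program(void)
{
    int j = 3;
    foo(&constants_table[0]);       /* the "5" is passed by reference */
    int k = j + constants_table[0]; /* every later "5" reads the same slot */
    return k;
}
```

`run_main_program()` returns 10, matching the printed output above, because the slot that used to hold 5 now holds 7.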

Later FORTRAN compilers, like f77, took up the habit of passing constants by loading the constant into a register, then pushing it onto the stack, then passing a reference to that instead. That way, the subroutine could munge it without affecting the constant back in the calling code.
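
In C terms, that later strategy amounts to copying the constant into a scratch location and passing the address of the copy. A sketch, again with invented names and no claim to match any specific compiler’s code generation:

```c
/* Hypothetical sketch of the copy-to-temporary approach. */

/* Same misbehaving subroutine: overwrites its argument with 7. */
void foo_sets_seven(int *myarg)
{
    *myarg = 7;
}

/* J = 3, CALL FOO(5), K = J + 5, but FOO only ever sees a copy of the 5. */
int run_with_temporary(void)
{
    int j = 3;
    int temp = 5;          /* fresh copy of the constant on the stack */
    foo_sets_seven(&temp); /* the subroutine can munge the copy freely */
    return j + 5;          /* the caller's 5 is untouched */
}
```

Here `run_with_temporary()` returns 8, which is what the programmer presumably expected in the first place.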

Language and compiler design was primitive then (compared to later years), but still, the designers weren’t totally naive. They knowingly made many concessions to the limitations of hardware of the day. Many of the original designers of FORTRAN were also on the Algol committee, where they designed a much more evolved language (and also made many horrendous design blunders), just waiting for a machine powerful enough to actually host an Algol compiler.

We came across this issue recently (we are writing a C compiler at work). Strangely, the standard seems to be silent on what happens here. Note that in C, unlike in C++, main can be recursive. But the C standard also places some strict guarantees on what form the standard parameters to main can take, so it appears to suggest that when performing a recursive main call, the compiler has to enforce the condition that argv and argc have not changed from the initial call to main made by the runtime. That is a pretty bizarre restriction for a recursive function call (i.e. the recursive main either never terminates, or it is recursing on some global variable, rather than its parameters). Yet the standard also says both parameters are modifiable. So, who knows?

The standard places restrictions on the possible signatures of ‘main’, but I don’t think it requires (implicitly or explicitly) that the parameters hold fixed values for all calls within a program. ‘Main’ behaves like an ordinary function, aside from its special status as the program’s entry point.

Nope, the standard explicitly puts constraints on the values that argc and argv can take. Namely, argc can never be negative, argv[argc] is a null pointer, if argc is greater than zero then argv[0] is the name of the program, etc. Now, if you call main recursively, what happens if you set argc to -1 in the recursive call? Further, what happens if you modify the first element of argv to be something other than the program’s name? Are these defined, or not?

EDIT: I’m not disputing that argc and argv can be modified. It explicitly says that they can in Section 2.1.2.2 of the standard. I’m stating that this clause causes something of a problem when you combine it with the fact that main can be recursive and the standard appears to place some constraints on what values argc and argv can take.
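
To make the question concrete, the constraints listed above can be written as a checker. This is my own sketch with a made-up function name; it only covers the conditions that can be tested mechanically (a program cannot verify that argv[0] really is the program’s name), and it takes no position on whether the standard intends them to hold for recursive calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker for the startup guarantees on main's parameters:
   argc is non-negative, argv[argc] is a null pointer, and
   argv[0] .. argv[argc-1] all point to strings. */
bool satisfies_startup_constraints(int argc, char **argv)
{
    if (argc < 0 || argv == NULL) {
        return false;
    }
    if (argv[argc] != NULL) {
        return false; /* the array must be null-terminated at index argc */
    }
    for (int i = 0; i < argc; i++) {
        if (argv[i] == NULL) {
            return false; /* every counted argument must be a string */
        }
    }
    return true;
}
```

A recursive call such as `main(-1, argv)` would fail this check, which is exactly the case I’m asking about.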

Right. But surely to God, those requirements are just for the runtime environment, when it makes its initial call to ‘main’. They shouldn’t apply to any subsequent calls made by the program itself.

But I am only going by what reason tells me. I don’t have a copy of the actual Standard handy. (Standards are not strictly required to be reasonable.)