Everything I try (&id, strduping into a local and the address of that) has some major issues with the ids randomly changing (or rather the pointer pointing to something different), since it either ends up being a pointer to a local variable or a mutable global variable. I simply cannot figure out what I should use as a pointer here. Or if I need some different representation to store this.
I forgot how much of a pain working with pointers can be sometimes >.<.
I’m talking about the more general claim that a string literal exists for the life of the program and has a location in memory that does not change with scope. Indeed, it would be undeniably daft to write a compiler that went through the trouble of moving the memory location of string literals during execution (since they have to be stored somewhere for the life of the program), but that is very different from saying that the language specifies that the pointer must be valid outside of the string literal’s nominal scope.
In other words: if I wrote a compiler that, at runtime, stored the OP’s “hi” string in a sequestered piece of RAM until the execution stack reached foo(), at which point it allocated some new memory, wrote “hi\0” there, returned the pointer requested inside of foo(), and de-allocated that new memory once foo() exited, would my compiler be in violation of the C standard? It isn’t clear to me that it would.
To be sure, this would be a ridiculous way to implement string literals, and no compiler does this in practice. But I haven’t yet found a statement in the standard that forbids such an implementation.
Anybody have access to a DEC VAX or MicroVAX – or more to the point, DEC’s C language reference manual? There was an especially good section on how to read and write complicated C declarations, and how to use the variables thus defined. It would certainly have answered this kind of question.
There are several online articles that purport to cover the same sort of thing. Here are two that I just found by googling “writing complex c declarations”; if you google this yourself, you can find others as well.
(I haven’t read those yet, just skimmed them, so don’t know how well-written they are.)
They work through examples, detailed step-by-step, like this:
char *(*(**foo[][8])())[] ; // huh ?????
One warning I would add though: Several such explications I have seen recommend the “left-right” (or “right-left”) rule, which I have seen in several variations, but as best as I have ever been able to figure, that’s all baloney.
Anyway, read these, and you will either be a better expert at reading, writing, and using complicated declarations than you ever imagined, or you will be more hopelessly confuzled than ever.
I think that’s covered by the fact that string literals are compile-time tokens, just like constants. A string literal is effectively a constant numerical memory address. The compiler therefore couldn’t change the value (i.e., memory address) of “foo” in
Identity is one. Are two string literals defined in different functions the same string if they have the same string contents?
What if foo and bar were compiled in different compilation units? Any difference if the units were statically or dynamically linked?
#include <stdio.h>
char* randomGlobal1;
char* randomGlobal2;
void foo(){
    randomGlobal1 = "hi";
}
void bar(){
    randomGlobal2 = "hi";
}
int main(){
    foo();
    bar();
    if (randomGlobal1 == randomGlobal2) {
        printf("The same\n");
    } else {
        printf("Not the same\n");
    }
    return 0;
}
What if all we ever did was use the strings as components in string operations? The compiler would probably be allowed to create code that built the data on the fly, possibly as manifest constants in the executable code.
This has already been addressed. The standard explicitly states that this is unspecified: “It is unspecified whether these arrays are distinct…” The compiler is therefore free to store them once or to store them multiply; but portable code can’t count on a particular behaviour.
I’m pretty sure that the result of that comparison is undefined, Francis. I don’t have chapter and verse.
I do know that some compilers (Borland C++ 5.0 being one I have on hand) have a compiler switch called “Duplicate strings merged” that does a check on string literals and combines them if they are identical. So depending on how you set this flag on this compiler the behavior of your test program will change.
As to the OP, your example is fine with the one modification (which you already made) that the global should be a pointer to const char.
FWIW, I get a segfault if I try to do something like
char* test = "hi";
test[1] = 'b';
Interestingly, here’s the whole program:
#include <stdio.h>
int main( void ){
    printf("hi");
    char* test = "hi";
    test[1] = 'b';
    printf("%s",test);
    printf("hi");
    return 0;
}
The output?
> ./a.out
Segmentation Fault
It doesn’t even seem to get to line 3. The output is not:
hiSegmentation Fault
Nope, just segfaults before it even prints the first thing. This is with gcc, mind you. I know it doesn’t answer anything about the exact specification of the C language, just a silly test of one implementation.
This is because standard file I/O is normally buffered. The “hi” is buffered for output at some later time (when the next newline is seen, most likely), but since you segfault right away, you never get to a point where the buffered text would be flushed.
I don’t see how that’s possible, you can’t even do that in assembly.
Oh, sure, if you muck around with the program memory you could foreseeably do something like
li $t0, 5
Where the binary 101 gets redefined to 111 (7) at that specific instruction, and now $t0 holds 7 instead of the expected 5. But unless your Fortran compiler does some REALLY bizarre flyweighting for integers, even mucking about with the binary will never completely eliminate 5 unless you’re REALLY thorough about it, in which case you’re not redefining 5 so much as ensuring the program never contains 5 in the binary (and being exceptionally thorough, meaning scrubbing all instances where 5 could even be the result of an expression would likely also require running a memory editor in the background while your program is running).
I mean, I’m not saying that mucking with the program won’t redefine 5 for a WHILE up to and including the whole program depending on how big it is, since nobody is going to reload 5 into a register every time they use it (it’ll likely be stored in a register and then reused until 5 is unneeded, in which case it’ll be reloaded at a later date), so it could very well redefine 5 in a very specific, limited, local scope with undefined behavior as to when it resumes being valued at 101, and even then, it’s unlikely doing something like that will redefine 5 in the context of an expression like “6-1” unless you redefine 6 or 1 as well.
It’s explained (among other places) in this Everything2 article. I think I first read about it, though, fifteen or twenty years ago—maybe in the Jargon file or on FidoNet.