Quick C question on string literals

Jragon · October 24, 2011, 4:24am

So I got myself into a kind of ridiculous problem with pointers that I really should have foreseen…

Let’s say I have some basic struct

typedef struct ex {
char** id;
} ex;

id is a char** because I need a variable length string.

So then there’s some function

void func(char* id){
ex* exampleStruct = &randomGlobalTableOfTheseStructsSomewhere[address];

exampleStruct->id = {what, exactly?}
}

Everything I try (&id, strduping into a local and the address of that) has some major issues with the ids randomly changing (or rather the pointer pointing to something different), since it either ends up being a pointer to a local variable or a mutable global variable. I simply cannot figure out what I should use as a pointer here. Or if I need some different representation to store this.

I forgot how much of a pain working with pointers can be sometimes >.<.

Jragon · October 24, 2011, 4:52am

Never mind, I completely confused myself as to how pointers work… >.<

I got it, I think.
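For anyone who lands here with the same problem, a minimal sketch of one common fix, assuming the struct can be changed: a plain char* is enough for a variable-length string, and the struct keeps its own heap copy so it never points at a local or at a caller's buffer. The name ex_set_id is made up for this sketch and is not from the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the struct: char* instead of char**. */
typedef struct ex {
    char *id;   /* heap-allocated copy owned by the struct; must start out NULL */
} ex;

/* Store a private copy of id in e. Returns 0 on success, -1 if malloc fails. */
int ex_set_id(ex *e, const char *id)
{
    size_t n = strlen(id) + 1;      /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, id, n);
    free(e->id);                    /* drop any previous id; free(NULL) is a no-op */
    e->id = copy;
    return 0;
}
```

Because the struct owns the copy, whoever tears down the table is responsible for calling free on each id.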

Pasta · October 24, 2011, 5:50am

I’m talking about the more general claim that a string literal exists for the life of the program and has a location in memory that does not change with scope. Indeed, it would be undeniably daft to write a compiler that went through the trouble of moving the memory location of string literals during execution (since they have to be stored somewhere for the life of the program), but that is very different from saying that the language specifies that the pointer must be valid outside of the string literal’s nominal scope.

In other words: if I wrote a compiler that, at runtime, stored the OP’s “hi” string in a sequestered piece of RAM until the execution stack reached foo(), at which point it allocated some new memory, wrote “hi\0” there, returned the pointer requested inside of foo(), and de-allocated that new memory once foo() exited, would my compiler be in violation of the C standard? It isn’t clear to me that it would.

To be sure, this would be a ridiculous way to implement string literals, and no compiler does this in practice. But I haven’t yet found a statement in the standard that forbids such an implementation.

Senegoid · October 24, 2011, 7:46am

Jragon:

So I got myself into a kind of ridiculous problem with pointers that I really should have foreseen…

Let’s say I have some basic struct

typedef struct ex {
char** id;
} ex;

id is a char** because I need a variable length string.

So then there’s some function

void func(char* id){
ex* exampleStruct = &randomGlobalTableOfTheseStructsSomewhere[address];

exampleStruct->id = {what, exactly?}
}

Everything I try (&id, strduping into a local and the address of that) has some major issues with the ids randomly changing (or rather the pointer pointing to something different), since it either ends up being a pointer to a local variable or a mutable global variable. I simply cannot figure out what I should use as a pointer here. Or if I need some different representation to store this.

I forgot how much of a pain working with pointers can be sometimes >.<.

Anybody have access to a DEC VAX or MicroVAX – or more to the point, DEC’s C language reference manual? There was an especially good section on how to read and write complicated C declarations, and how to use the variables thus defined. It would certainly have answered this kind of question.

There are several online articles that purport to explain the same sort of thing. Here are two that I just found by googling “writing complex c declarations”; if you google this yourself, you can find others as well.

http://www.codeproject.com/KB/cpp/complex_declarations.aspx

(I haven’t read those yet, just skimmed them, so don’t know how well-written they are.)

They work through examples, detailed step-by-step, like this:



char *(*(**foo[][8])())[] ;    // huh ?????
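Read from the identifier outwards, that declaration says: foo is an array of arrays of 8 pointers to pointers to functions returning a pointer to an array of pointers to char. A sketch that rebuilds the same type from typedefs may make this easier to check; the names StrArray, Getter and GetterPP are invented for the illustration. The second extern line is the check: a C compiler should reject it if the two declarations of foo did not describe compatible types.

```c
/* Build the type up one layer at a time (all typedef names are hypothetical). */
typedef char *StrArray[];      /* array of pointer to char (size left open) */
typedef StrArray *Getter();    /* function returning pointer to StrArray */
typedef Getter **GetterPP;     /* pointer to pointer to Getter */

/* The original one-line declaration ... */
extern char *(*(**foo[][8])())[];

/* ... and the typedef version, redeclaring the same object. */
extern GetterPP foo[][8];
```

Nothing here defines foo; both lines are declarations only, so the snippet compiles on its own but does not allocate any storage.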

One warning I would add though: Several such explications I have seen recommend the “left-right” (or “right-left”) rule, which I have seen in several variations, but as best as I have ever been able to figure, that’s all baloney.

Anyway, read these, and you will either be a better expert at reading, writing, and using complicated declarations than you ever imagined, or you will be more hopelessly confuzled than ever.

psychonaut · October 24, 2011, 7:56am

With respect, you are wrong. The standards explicitly state that both the argv pointers and the strings they point to are modifiable:
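As far as I can tell the relevant wording is in the section on program startup (5.1.2.2.1 in C99), which says that argc, argv and the strings pointed to by the argv array shall be modifiable by the program. A small sketch of what that permits, using a made-up helper upcase_args that rewrites each argument string in place:

```c
#include <ctype.h>

/* Hypothetical helper: upper-case every argument after the program name,
   writing directly into the strings that argv points to. */
void upcase_args(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        for (char *p = argv[i]; *p != '\0'; p++) {
            /* cast to unsigned char: toupper must not be given a negative value */
            *p = (char)toupper((unsigned char)*p);
        }
    }
}
```

It would be called as upcase_args(argc, argv) at the top of main. Note that it only overwrites characters and never writes past the end of an argument.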

psychonaut · October 24, 2011, 8:08am

I think that’s covered by the fact that string literals are compile-time tokens, just like constants. A string literal is effectively a constant numerical memory address. The compiler therefore couldn’t change the value (i.e., memory address) of “foo” in


char *s = "foo";

any more than it could change the value of 5 in


int i = 5;

Senegoid · October 24, 2011, 8:14am

Oops, I didn’t know that. Thanks.

That said, I still think of doing stuff like that as a horrendously bad idea in general.

ZenBeam · October 24, 2011, 12:09pm

Thanks.

What a loser language. In Fortran you can change the value of 5.

Francis_Vaughan · October 24, 2011, 2:25pm

There are all sorts of other fun issues.

Identity is one. Are two string literals defined in different functions the same string if they have the same string contents?

What if foo and bar were compiled in different compilation units? Any difference if the units were statically or dynamically linked?



#include <stdio.h>
char* randomGlobal1;
char* randomGlobal2;

void foo(){
    randomGlobal1 = "hi";
}

void bar(){
    randomGlobal2 = "hi";
}

int main(){
    foo();
    bar();
    if (randomGlobal1 == randomGlobal2) {
        printf("The same\n");
    } else {
        printf("Not the same\n");
    }
    return 0;
}

What if all we ever did was use the strings as components in string operations? The compiler would probably be allowed to create code that built the data on the fly, possibly as manifest constants in the executable code.

Bytegeist · October 24, 2011, 2:51pm

A good modern resource for that can be found here (cdecl.org).

psychonaut · October 24, 2011, 2:51pm

This has already been addressed. The standard explicitly states that this is unspecified: “It is unspecified whether these arrays are distinct…” The compiler is therefore free to store them once or to store them multiply; but portable code can’t count on a particular behaviour.
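In practice this means portable code should compare string contents rather than addresses. A minimal sketch of the two kinds of comparison; the function names same_text and same_object are invented for the example:

```c
#include <string.h>

/* Nonzero if a and b hold the same characters, wherever they are stored. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Nonzero if a and b point at the same location. For two identical string
   literals the result depends on whether the compiler merged them. */
int same_object(const char *a, const char *b)
{
    return a == b;
}
```

With the two globals from Francis_Vaughan's program, same_text would report a match on any conforming compiler, while same_object could go either way.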

Jas09 · October 24, 2011, 2:52pm

I’m pretty sure that the result of that comparison is undefined, Francis. I don’t have chapter and verse.

I do know that some compilers (Borland C++ 5.0 being one I have on hand) have a compiler switch called “Duplicate strings merged” that does a check on string literals and combines them if they are identical. So depending on how you set this flag on this compiler the behavior of your test program will change.

As to the OP, your example is fine with the one modification (which you already made) that the global should be a pointer to const char.

psychonaut · October 24, 2011, 3:37pm

Unspecified, not undefined. It is an important difference.

Jas09 · October 24, 2011, 3:50pm

:smack: At one point I actually knew those differences in minute detail… I think I’m thankful I no longer do…

Omphaloskeptic · October 24, 2011, 4:52pm

Pasta:

I’m talking about the more general claim that a string literal exists for the life of the program and has a location in memory that does not change with scope. Indeed, it would be undeniably daft to write a compiler that went through the trouble of moving the memory location of string literals during execution (since they have to be stored somewhere for the life of the program), but that is very different from saying that the language specifies that the pointer must be valid outside of the string literal’s nominal scope.

In other words: if I wrote a compiler that, at runtime, stored the OP’s “hi” string in a sequestered piece of RAM until the execution stack reached foo(), at which point it allocated some new memory, wrote “hi\0” there, returned the pointer requested inside of foo(), and de-allocated that new memory once foo() exited, would my compiler be in violation of the C standard? It isn’t clear to me that it would.

To be sure, this would be a ridiculous way to implement string literals, and no compiler does this in practice. But I haven’t yet found a statement in the standard that forbids such an implementation.

I think this is addressed by the requirement that

which means that

(Wording has changed in more recent standards but the meaning seems to be the same.)
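As I read C99, section 6.4.5 says a string literal is used to initialize an array of static storage duration, and section 6.2.4 says an object with static storage duration has a lifetime that is the entire execution of the program. A short sketch of the consequence, with made-up function names: a pointer to a literal can be handed out of a function and used after that function has returned.

```c
#include <string.h>

/* Returns a pointer to a string literal. The array behind the literal has
   static storage duration, so the pointer stays valid after the call. */
const char *greeting(void)
{
    return "hi";
}

/* Uses the pointer after greeting() has already returned. */
size_t greeting_length(void)
{
    const char *p = greeting();
    return strlen(p);   /* reads the literal's array, which still exists */
}
```

The same would not be true of a pointer to a local char array, which has automatic storage duration and ends with the call.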

Jragon · October 24, 2011, 5:38pm

FWIW, I get a segfault if I try to do something like

char* test = "hi";
test[1] = 'b';

Interestingly, here’s the whole program:



#include <stdio.h>

int main( void ){
  printf("hi");
  char* test = "hi";
  test[1] = 'b';
  printf("%s",test);
  printf("hi");
  return 0;
}

The output?
> ./a.out
Segmentation Fault

You’d expect it to print the first “hi” and only then crash on the assignment, giving:

hiSegmentation Fault

Nope, just segfaults before it even prints the first thing. This is with gcc, mind you. I know it doesn’t answer anything about the exact specification of the C language, just a silly test of one implementation.
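For reference, attempting to modify a string literal is undefined behaviour in C, and the segfault is what you typically get when the compiler places literals in read-only memory. A sketch of the usual workaround, assuming a writable copy is all that is needed: initialize a char array from the literal instead of pointing at it. The function name make_hb is made up for the example.

```c
#include <string.h>

/* Copies the modified text into out, which must have room for 3 bytes. */
void make_hb(char out[3])
{
    char test[] = "hi";   /* array initialised from the literal: a writable copy */
    test[1] = 'b';        /* fine: changes the local array, not the literal */
    memcpy(out, test, sizeof test);   /* sizeof test is 3, including the '\0' */
}
```

Declaring pointers to literals as const char* has the complementary effect: the compiler then rejects the assignment instead of letting it fail at run time.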

Bytegeist · October 24, 2011, 5:57pm

This is because standard file I/O is normally buffered. The “hi” is buffered for output at some later time (when the next newline is seen, most likely), but since you segfault right away, you never get to a point where the buffered text would be flushed.
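A minimal sketch of how to see the output before a crash, assuming the only goal is debugging: flush the stream explicitly right after writing. The helper name print_now is invented here; fputs and fflush are the standard library calls.

```c
#include <stdio.h>

/* Writes msg to stream and pushes it out of the stdio buffer immediately.
   Returns 0 on success, EOF on failure. */
int print_now(FILE *stream, const char *msg)
{
    if (fputs(msg, stream) == EOF)
        return EOF;
    return fflush(stream);
}
```

Replacing the first printf("hi") with print_now(stdout, "hi") should make the text appear before the fault. Writing debug output to stderr, which is not fully buffered, is another common choice.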

Jragon · October 24, 2011, 6:49pm

I don’t see how that’s possible, you can’t even do that in assembly.

Oh, sure, if you muck around with the program memory you could foreseeably do something like

li $t0, 5

Where the binary 101 gets redefined to 111 (7) at that specific instruction, and now $t0 holds 7 instead of the expected 5. But unless your Fortran compiler does some REALLY bizarre flyweighting for integers, even mucking about with the binary will never completely eliminate 5 unless you’re REALLY thorough about it, in which case you’re not redefining 5 so much as ensuring the program never contains 5 in the binary (and being exceptionally thorough, meaning scrubbing all instances where 5 could even be the result of an expression would likely also require running a memory editor in the background while your program is running).

I mean, I’m not saying that mucking with the program won’t redefine 5 for a WHILE up to and including the whole program depending on how big it is, since nobody is going to reload 5 into a register every time they use it (it’ll likely be stored in a register and then reused until 5 is unneeded, in which case it’ll be reloaded at a later date), so it could very well redefine 5 in a very specific, limited, local scope with undefined behavior as to when it resumes being valued at 101, and even then, it’s unlikely doing something like that will redefine 5 in the context of an expression like “6-1” unless you redefine 6 or 1 as well.

psychonaut · October 24, 2011, 6:59pm

It’s explained (among other places) in this Everything2 article. I think I first read about it, though, fifteen or twenty years ago—maybe in the Jargon file or on FidoNet.

Jragon · October 24, 2011, 7:05pm

Ah, so they DO do bizarre flyweighting for integer constants.
