Quick C question on string literals

I’m a bit rusty on my C, but I need to know this: are string literals in the data section header (or at least somewhere else that’s safe for pointers), or in the stack frame of a given function? (Pretend I have prototypes and other things to make this compile; it’s just meant to be a quick example.)



char* randomGlobal;

void main(){
foo();
bar();
}

void foo(){
randomGlobal = "hi";
}

void bar(){
printf("%s",randomGlobal);
}

Is this valid? Also important: if it is a valid, non-dangling pointer construct, do I have to free it manually, or is the memory for “hi” auto-reclaimed at program termination? I really should know this, but I’m doubting myself and I don’t want it to blow up in my face later. (I don’t need this exact construct; the real code is a bit more complex, but something similar would likely end up happening as a side effect, and there’s unfortunately very little wiggle room to restructure the code.)

Well, void main isn’t valid to begin with (it needs an int return value, and to return an int), but what you’re doing is valid.

Your mental model is screwy, though: String constants live in their own RAM, which is read-only (or so you must assume (really, it usually is)) and exists for the lifetime of the program. Since it’s pretty much created at compile-time, though, it can’t leak: The size never changes. (Linking, loading, and binary formats would require a post of their own. This explanation is sufficient.)

When you assign to your pointer, what you’re doing is making the pointer point to the string constant, which is valid.

In addition to the problem of the return value of main(), your function definitions need to specify parameter lists. Merely saying “foo()” is not acceptable for a definition; if the function takes no arguments you need to indicate this as “foo(void)”. Generally speaking, the only valid definitions for main() are “int main(void) {…}” and “int main(int argc, char *argv[]) {…}”.

On my compiler, foo() and void main() are both valid. I know C syntax technically mandates those forms, but my compiler seems to treat void main as “int main” ending with “return 0”, and foo() as foo( void ).

Also, Derleth – isn’t the string literal in the immutable RAM exactly what I was talking about with the data header? In MIPS asm:

.data
str: .asciiz "hi"

Would be addressed in the RAM (I’m fairly sure).

Edit: And to be less mysterious, the compiler I’m talking about is the latest version of gcc. Seriously, try the program:



void main(){

}


It will compile and run just fine. I usually do it right for style reasons, but I was just whipping up a quick tiny example, so I didn’t really feel a need to be all exact and proper style-like.

You should probably learn to use gcc better, so it will catch more errors for you.

Data header… right. That’s what I meant. I didn’t notice that part of the OP. What happens is that all of your string constants are placed into your program’s binary as dictated by whatever binary format your compiler generates, and when the program is run they’re put into RAM in some system-dependent location via some system-dependent process such that your program can read them but not write to them.

They may not stay in RAM, of course, given how common virtual memory and swap are these days, and if you’re writing code destined for an embedded system, constants are often placed in ROM instead of RAM, but these trivialities do not bother the C programmer. We have pointer arithmetic to do and heaps to manage!

This is with -Wall… at least on my system.

That’s a really bad way to program something. You are setting a global to something that is local to foo(), then trying to use it after foo() exits. In general, that’s a great way to make a program crash. In this specific case, though, since “hi” is a string constant, it’s still going to be valid even after foo() exits, so the program won’t crash, but it’s still definitely not a good way of doing things. If you want your string constant to be accessed globally, it should be defined globally, if for no other reason than to make it clear for program-maintenance purposes that it is global.

Something like this would be better (this changes only the string and doesn’t include the changes the previous posters have commented on). From a program-maintenance point of view, this makes it much clearer that the string constant is used globally and isn’t just local to the function foo().



char* randomGlobal;
char* constGlobal = "hi";

void main(){
foo();
bar();
}

void foo(){
randomGlobal = constGlobal;
}

void bar(){
printf("%s",randomGlobal);
}

In the real world, the constant string would often be included in a resource file, if for no other reason than it could be easily modified to handle translation to another language.

As for memory, even when a program leaks memory like a sieve, that memory is freed when the program exits (on most modern operating systems, at least). One notable exception to this though is if you create shared memory, since this won’t be automatically freed by the OS when the program terminates.

I eventually settled on using a const char* const, yes. I figured it safer in the long run.

For the curious, the problem is that I’m actually working with flex and bison, and I have absolutely no idea what the code they generate looks like. So if I happened to bubble up a string literal (with YYSTYPE as char*, of course) using $$, I wasn’t sure what it would look like in the generated code, and I wanted to be sure that returning a literal would not kill me, at the very least, since I think $$ actually denotes the return value of a given nonterminal*. Normally, if I were actually writing the code, I would use a const char* const, hence why I wasn’t actually sure whether or not a literal would work. I just wasn’t sure exactly how bison handled it (variable? return value?).

  • It’s functionally equivalent to a return value, I’m just not sure of the actual implementation.

-ansi (together with -pedantic) is a good idea as well.

Well, yes, but I already knew it was improper ANSI C syntax :p. Either way, I promise that when I’m actually writing a program I declare my mains as ints and my no-args as voids like a good little coder boy, it was just a silly example so I got sloppy and lazy.

Languages have gotten so cluttered up with rules “for your own good” that they’re a pain in the neck to code. (Java being the worst of them lately.) What happened to the good old days when coding was easy and uncomplicated and SOMEBODY ELSE got to do the debugging? :stuck_out_tongue:

Is the latest generation of programmers generally familiar with that 1982 classic, Real Programmers Don’t Use PASCAL by Ed Post? If you ain’t, y’oughta be.

Does anyone have a cite that says this behavior is specified by the language? I took a quick look and couldn’t find anything. To be sure, a compiler would have to go out of its way to implement string literals any other way, so I don’t doubt that this always works in practice, but I am curious if it is actually specified in the language that local string literals survive outside of their nominal scope.

Are you talking about Derleth’s claim that “string constants live in their own RAM, which is read-only (or so you must assume (really, it usually is)) and exists for the lifetime of the program”? If so, he’s pretty much correct. The C standard says of string literals, “It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.” This means that they may or may not be read-only, but for portability you ought to treat them as such. The standard also claims that string literals are tokens, just like numerical constants are. If you want a cite, consult the sections “Lexical elements” and “String literals” of your favourite C standard (C89, C99, or the draft of the upcoming C1X). In C99 and the latest C1X draft, those sections are 6.4 and 6.4.5, respectively.

The advantage to being unwriteable, of course, is that they can be shared among several executing instances of the same program. Most compilers only started doing this two decades ago or so; earlier some of us used Bill Joy’s `xstr’ program to achieve read-only strings, e.g. in a system where dozens of instances of a terminal interface ran, with hundreds of twisty messages all very similar.

OP’s program would have compiled warning-free in those days since the `void’ keyword didn’t exist. Some people used it … but via
#define void int
:smack:

When that was posted on Usenet, someone responded, in effect, that Real Programmers don’t program in Fortran either.

What about parameters passed to a program? In

int main(int argc, char *argv[]) {…}

Can argv be treated as a variable, and its value be altered, or is it more like a string literal? I did this once in the past (and documented that I had no idea if it was legal), and it worked, but I have no idea if it’s non-portable.

What do you mean by “altered”? I mean, last I checked, function parameters were variables and you could change them. But that’s only local to the function body. When you return to the caller, the old value is still there. I.e., this:
{…
int x = 1;
foo(x);

}

void foo(int x)
{
x++;
}

In the first block, x is always 1. In the function foo, the parameter ‘x’ (which is really a variable in foo that had 1 put into it) becomes 2 partway through.

BTW, I quickly scanned both “real programmer” articles. So did they mention that the only reason a real programmer wouldn’t walk through 10 feet of snow, uphill both ways, is that he actually lives in his cubicle the whole time? :smiley:

Modifying the contents of argv, as in

*argv[1] = 's';

Assume argv has at least 2 elements.

Referring to the “Real Programmers Don’t Use Pascal” article:

Ah, I vaguely recall that I might have seen that too.

Yes, I coded for a machine like that too – The Univac SS-90. It was a decimal machine, with a bizarre BCD encoding (called bi-quinary, in which each digit was encoded as four bits having place values of 1, 2, 4, 5), with memory on a rotating drum. Total memory was 5000 words of 10 decimal digits each.

All the components were discrete parts, all wire-wrapped. We had some EE students who kept the thing running, and a complete set of logic diagrams. They designed several new instructions for the computer, which they hand-wired into it. We called this “Hands-In Programming”.

In the earliest conception of C, which was essentially a sort of “high-level assembly language”, this was certainly “legal” and doable. Computers didn’t always have the “read only segments” that everyone talks about, like modern machines do.

That never meant for a minute that it was a good idea, or that any such code was ever portable.

What they mean when they say the arguments are variables that can be altered is this:
argv itself is, effectively, a local auto variable, and you can certainly alter it with a statement like:

argv = (pointer to somewhere else);

whereupon it no longer points to the original argument array at all. This does NOT mean that you can necessarily modify any of that data that the original argv pointed to – even if, in fact, you might have been able to get away with it.

OTOH, you can certainly modify the contents of a writable array, given only a pointer to the array, provided that the program was designed so that you can, and this is a perfectly normal thing to do. But probably not with argv.