Why are assignments in conditionals legal in C?

I don’t know, when I learned that Java had a Boolean type, I was excited, because it meant that I could do things like implement a Sieve of Eratosthenes using only a single bit for each number. And then I learned that the type was implemented using 32 bits of storage to do the work of a single bit. Yeah, yeah, I know that memory inefficiency isn’t really a problem any more, but that sort of gratuitous wastefulness still bugs me.

You say it’s gratuitously wasteful, but I would challenge you to create an implementation that uses less CPU and less memory.

If you store the flags as bits you need to do one of the following:
1 - Store a mask used to extract each bit (more memory)
2 - Dynamically calculate the mask each time (more CPU)
3 - Some combo of #1 and #2 that shifts the balance between memory and CPU
Either way, you can easily do bit operations for your Sieve; you just have to do some masking to get and set (rough sketch below).
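
Here is a rough sketch of option #2 in C (the names and the limit are just placeholders): the mask is recomputed from the bit index on every access, so the only extra memory is the packed array itself.

/* Sieve of Eratosthenes with one bit per number: a set bit means "composite". */
#include <stdio.h>
#include <stdint.h>

#define LIMIT 1000u                          /* arbitrary limit for the sketch */

static uint32_t bits[(LIMIT + 31) / 32];     /* 32 flags packed per word */

static int  is_composite(unsigned n)  { return (bits[n / 32] >> (n % 32)) & 1u; }
static void set_composite(unsigned n) { bits[n / 32] |= 1u << (n % 32); }

int main(void)
{
    for (unsigned p = 2; p * p < LIMIT; p++)
        if (!is_composite(p))
            for (unsigned m = p * p; m < LIMIT; m += p)
                set_composite(m);

    for (unsigned p = 2; p < LIMIT; p++)
        if (!is_composite(p))
            printf("%u\n", p);
    return 0;
}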

Or just use a BitSet to have this already pre-implemented for you :slight_smile: (at least with one particular space/CPU compromise).

The main issue is that CPUs cannot address single bits (and that is generally a good thing), so the booleans have to be packed into something. JVMs are really centred around 32-bit slots, and if you only have a single boolean, it’s not like you can use the rest of the slot for something else without a lot of song and dance.

To be clear, when I said this, I was mimicking the voice of a crusty old programmer from the ’80s, the type who would use early C. I don’t actually believe that about the utility of booleans.

Rereading it, I’m not quite sure that distinction came across in text. :slight_smile:

For clarity, I like to see an actual comparison even when an assignment is done. For example, I prefer this:

/* good */ while ( (line = readLine()) != null)

as opposed to this:

/* bad */ while (line = readLine())

Whenever I see just an assignment in a conditional, I have to take a bit of extra time to figure out whether the programmer made a mistake or whether that’s what was meant. Always writing the comparison explicitly eliminates that ambiguity.

Computers start to get slower once you start playing with word alignment. I’ve been working with a lot of libraries lately that intentionally pad structs and enums with arbitrary values to make sure they’re a multiple of 32 bits. It’s probably not actually worth it unless you’re working with some embedded device that expects 32-bit-multiple packets of data or something, especially since most languages (including C) will usually secretly pad things for you unless you enable certain flags or preprocessor directives. But it is certainly true that the processor’s word size is important, and it’s largely why booleans are usually the same size as an int.
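
For example (the exact sizes are implementation-defined, but this is what you typically get on a 32- or 64-bit target, with a made-up struct):

struct sample {
    char tag;      /* 1 byte                                              */
                   /* 3 bytes of invisible padding are inserted here so   */
                   /* that 'value' starts on a 4-byte boundary            */
    int  value;    /* 4 bytes                                             */
};
/* sizeof(struct sample) is typically 8, not 5, and the compiler did it   */
/* without being asked; you need #pragma pack or similar to stop it.      */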

And as for uses for = in conditionals, there’s always the unholy clone of strcpy:

while(*dst++=*src++);

This has to be the single worst line of code in computer programming history. It’s definitely unholy. But I’m not sure why you call it a “clone”. It’s right there in K&R. (At least in their first edition. It may have been censored in later editions.) At least put some white space around the = ferchrissake.

In a single line, it’s a terrible example of the so-called “structured programming” we were supposed to be doing in those days. It sort of looks structured, but it really isn’t at all, and the unstructuredness of it is totally disguised.

When I first saw that, I had to stare at it for about ten minutes to figure out how it worked, and when I figured it out, it was a major :smack: (There. One of only two smilies that I will use any more.)

Challenge quiz for C programmers: What do you think was so hard to understand about that one line of code? (Clearly enough, it moves a sequence of bytes from src to dst. That isn’t the problem.) And why is it unstructured, even though it looks so nicely structured?

I don’t know if this counts as a “structured vs. unstructured” distinction, but the reason I don’t like it is that it depends on arcane knowledge about when the ++ takes effect. It also (though this is a lesser consideration) relies on the fact that C strings are null-terminated, and that the value of the null is zero (i.e., false).
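
For anyone still squinting at the one-liner, here is one way to write it out longhand (my expansion, not K&R’s):

/* while (*dst++ = *src++);  written the long way */
void copy_string(char *dst, const char *src)
{
    for (;;) {
        char c = *src;   /* read the current source byte                    */
        *dst = c;        /* copy it across (eventually including the NUL)   */
        src++;           /* both pointers advance only after the accesses,  */
        dst++;           /* which is all the postfix ++ means               */
        if (c == '\0')   /* the value of the assignment is the byte just    */
            break;       /* copied, so the loop stops after copying the NUL */
    }
}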

One other reason was that optimizing compilers were relatively primitive back then. By knowing the architecture you were writing for, you could write your C code in a way that was easy to optimize on that platform while still getting the benefits of using a high-level language.

I certainly do claim that liberty! These are breathtakingly bad style, especially the first one. They’re very clever, and compact, but offer absolutely no advantage over more expanded forms which are easier to read and verify for correctness. It takes a couple of close readings, for example, to see that the first expression could be more clearly written as:

z = min((unsigned)(g - w), (unsigned)l)

I even checked to make sure, using a reasonable definition of min:

#define min(a,b) \
       ({ __typeof__ (a) _a = (a); \
          __typeof__ (b) _b = (b); \
          _a < _b ? _a : _b; })


When compiled with gcc using -O1, both versions generate identical assembly:

	subl	%esi, %edi
	cmpl	%edi, %edx
	movl	%edi, %eax
	cmovbe	%edx, %eax

That said, I agree that crippling a language for the purpose of attempting to prohibit bad style is a mistake. There are lots of legitimate uses for the expression value of an assignment that are not hideous affronts to the craft. :slight_smile:
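
The classic legitimate one, for instance, copies input to output and reads perfectly well once you know the idiom:

/* assumes <stdio.h>; c is an int (not a char) so that EOF is representable */
int c;
while ((c = getchar()) != EOF)   /* the assignment's value is what gets compared to EOF */
    putchar(c);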

Hey, at least it has a comment.

Of course, null-terminated ASCII strings are the source of another giant batch of pains-in-the-ass that C’s favouring of them forces us to deal with.

Google “C sequence points” for more info than you’ll ever want to know about this subject. However, the question isn’t whether it’s deterministic but whether the semantics are defined. If the semantics are undefined, the results could be deterministic but unpredictable without testing, or they could be nondeterministic. (A compiler is allowed to do anything when semantics are undefined, including being nondeterministic – though that’s a lot harder than it sounds – or crashing, or singing the Star Spangled Banner.)

One of the problems with C is that order of evaluation was left unspecified for many operators (on purpose, to allow the compiler to make the code fast) and as a result, many syntactically valid expressions had undefined semantics.

A lot of that has since been nailed down using the concept of sequence points, which pin down the order in which things have to happen while still allowing maximum flexibility for optimizers to rearrange code without changing the semantics. Only a true nerd could love it.
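
The canonical sort of example (undefined, not merely unspecified):

int i = 0;
int j = i++ + i++;   /* i is modified twice with no intervening sequence point, */
                     /* so the standard washes its hands: any value of j, or a  */
                     /* crash, is a conforming result                           */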

That has to be one of the more significant exaggerations in computer programming history. As a counterexample to the superlative claim, consider a “shortcut” that was occasionally seen in VAX code when implementing an oft-seen predicate like:

/* don’t process unless ptr is non-null and points to a non-zero */
if (ptr && *ptr)
      process(ptr);

Because a VAX OS always had zero at address zero, a microsecond(*) could be saved by writing
if (*ptr)
      process(ptr);

(* - Yes, microsecond, not nanosecond. Computers were slow back in the Jurassic.)

Anyway, one needn’t defend obfuscated code to argue that the simplicity and regularity of C is a virtue. At least it’s straightforward to figure out what a line of code does. It’s not always so easy with languages rich in “user-friendly” complexity.

What? No love for Duff’s device:
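
(Quoting the usual published form from memory: a copy loop unrolled eight ways, with the switch jumping into the middle of the do/while. The destination is a memory-mapped output register, which is why 'to' never gets incremented.)

void send(short *to, short *from, int count)   /* count assumed to be > 0 */
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}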

Not particularly; I prefer to let optimizers do my loop unrolling for me :p.

One fun case of this is the Go compiler’s handling of its map type (a fast hashmap). The range operator lets you iterate over key/value pairs in a loop. The spec specifically says this order isn’t guaranteed, and the compiler reserves the right to return the key/value pairs of a ranged-over hashmap in any order. Early on, people figured out the deterministic order of return values in the then-current implementation and were relying on it very, very heavily. Heavily enough that it threatened the freedom of the developers to change the internal implementation.

So the compiler now deliberately randomizes the return order whenever range is called on a map. I mean, it’s still deterministic in the sense that I doubt they’re using /dev/random or anything, but for all intents and purposes it’s enforced nondeterminism.

(Note that this only applies to maps, whose return order the spec leaves unguaranteed. Slices (lists) and arrays have a deterministic iteration order, starting at 0 and ending at [length of list - 1].)

The only other case of nondeterminism I know of is across compilers rather than within a single environment. I read the blog for the game Banished, and the developer uses a deterministic random map generator (a given seed will always produce the same map). He used a fixed seed for the tutorial maps, but on some platforms the map came out wrong and things were being generated in the middle of a lake. The culprit was the C++ code calling a constructor-like function along the lines of Vector3(xCoord, yCoord). Since the generation was supposed to be deterministically random, he had written something like:

Vector3 position = GenerateMap(rng.RandFloat(), rng.RandFloat());

Now, what’s wrong with that? Nothing fixes the order of the two calls: arguments to a function are evaluated in an unspecified order, so on some platforms it works fine, and on others the two RandFloat() calls run in the opposite order and you get your coordinates swapped. Apparently it failed on the same compiler depending on whether he had his debug flags on or not, so the bug disappeared every time he tried to debug it.
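
The boring fix is to force the order yourself with named temporaries (using the same names as his snippet, and assuming RandFloat() returns a float):

float x = rng.RandFloat();               /* first draw, guaranteed               */
float y = rng.RandFloat();               /* second draw, guaranteed              */
Vector3 position = GenerateMap(x, y);    /* the order is no longer up to the     */
                                         /* compiler's argument evaluation       */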

I really wish SQL implementations would do that for queries that don’t have an “ORDER BY”, for the same reason. If you don’t explicitly tell SQL how to order a result, no order is guaranteed; I can’t count the number of buggy programs I’ve seen as a result of this.

If I ran the universe (or at least a group that implements SQL), I’d have it purposefully return the results out of order maybe 1-in-25 query runs, often enough that developers would notice the missing “ORDER BY” while writing their queries.

A succinct example of why C++ is not for the faint of heart. :smiley:

Its rules are tricky enough that even the experts seem unable to get them right consistently. For the longest time, as I recall, Google’s official C++ style guide mandated the inclusion of an explicit virtual destructor in any class containing other virtual methods, ostensibly to guard against the possibility of undefined behavior when operator delete is invoked through a base-class pointer to an object of a derived class – without that vtable entry for the destructor, you’re hosed.

Unfortunately for Google, that’s not the whole picture. The standard unambiguously requires the base class to have a virtual destructor whenever such an invocation of delete occurs, regardless of the presence or absence of a vtable. Any code that contains “delete foo”, where the type of foo is Base* and it points to an object of type Derived, where Derived : Base and Base does not declare virtual ~Base(), cannot possibly be valid C++ and is liable to succumb to nasal demons at any time.
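
A minimal sketch of the trap, using my own made-up names rather than anything from Google’s codebase:

#include <string>

struct Base {
    void do_stuff() {}     // no virtual functions at all, hence no vtable,
};                         // and no virtual destructor either

struct Derived : Base {
    std::string name;      // Derived has state of its own that needs destroying
};

int main()
{
    Base *p = new Derived;
    delete p;              // undefined behavior: Base has no virtual destructor,
                           // so 'name' may simply never be destroyed, or worse
}

// The cure is "virtual ~Base() = default;" (or an empty virtual ~Base() {})
// the moment anything will ever be deleted through a Base*.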

Or hell, just look at the relatively recent upset over “const” and “mutable” in C++11 – the standards committee itself made serious, fundamental changes to the meaning of two of the oldest C++ keywords, changes that not even they noticed or understood until well after the standard had been published.

C++ is a fickle bitch. If you think C is bad…