I would describe myself as having a “low-level DIY bent” as well. The beauty of C++ is that it is very expressive and yet lets you write code where you know exactly what is going on at the instruction or byte level. Here are four examples of things that, as far as I know, simply could not be implemented in anything but C++. These are real things, in code shipping to hundreds of millions of customers, that I implemented within the last year.
Compile-time string scrambling
We sometimes have a need to put debug switches, messages and the like in shipping binaries. These may have strings that we don’t really want to appear in plaintext. It’s not that they’re a security issue, but we just don’t want to make it easy for people to get into trouble, and some of the strings could be taken out of context. So we like to scramble some of the more sensitive ones. Not hard encryption, just something that makes people work a little if they want to open the hood.
We had been doing this in an ad hoc fashion: some people XORed a constant into each byte of an array, some ran an offline script, some converted bytes by hand, and so on. All pretty dumb. Instead, I wrote a library that scrambles strings at compile time using templates. In the codebase, the string appears totally normal and in plaintext, but it never appears in the binary. Instead, it’s encapsulated in a function that descrambles on demand. It’s all completely automatic and only requires the user to wrap the string in a single macro.
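To give a flavor of the idea, here is a minimal sketch. It is not the shipping library: it uses C++14 `constexpr` functions rather than the template machinery described above, and the key, the `mix` function, and the names `Scrambled`, `scramble` and `SCRAMBLED` are all made up for illustration. Whether the plaintext really stays out of the binary depends on the compiler evaluating everything at compile time, which is why the macro forces the result into a `constexpr` variable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical key. XOR with a position-dependent byte is obfuscation, not encryption.
constexpr unsigned char kKey = 0x5A;

// XOR is its own inverse, so the same function scrambles and descrambles.
constexpr char mix(char c, std::size_t i) {
    return static_cast<char>(static_cast<unsigned char>(c) ^
                             static_cast<unsigned char>(kKey + i * 7u));
}

template <std::size_t N>
struct Scrambled {
    char bytes[N];  // scrambled characters, terminator included

    // Recover a single character (also usable in compile-time checks).
    constexpr char at(std::size_t i) const { return mix(bytes[i], i); }

    // Rebuild the plaintext at run time, only when it is actually needed.
    std::string reveal() const {
        std::string out;
        out.reserve(N - 1);
        for (std::size_t i = 0; i + 1 < N; ++i) out.push_back(at(i));
        return out;
    }
};

// Scramble a string literal; intended to be evaluated at compile time.
template <std::size_t N>
constexpr Scrambled<N> scramble(const char (&text)[N]) {
    Scrambled<N> result{};
    for (std::size_t i = 0; i < N; ++i) result.bytes[i] = mix(text[i], i);
    return result;
}

// The constexpr variable forces compile-time scrambling, so only the
// scrambled bytes need to be stored in the binary.
#define SCRAMBLED(literal)                            \
    ([]() -> std::string {                            \
        constexpr auto scrambled = scramble(literal); \
        return scrambled.reveal();                    \
    }())
```

With this sketch, `std::string msg = SCRAMBLED("enable verbose logging");` reads like a plain string in the source, while the stored bytes are the XORed ones.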
Compile-time hashing
Similar to the above, we sometimes need to hash strings such that the strings never actually appear in the binary. Unlike the scrambler, these are totally irreversible. People had been using offline tools to generate the hashes, pasting the result into the code, but this was both a pain in the ass and error prone. So I implemented the same hash routine using templates. Again, nothing offline was required, and the input string never appears in the binary in any form. This could be used on the crappiest microcontroller because only the final bytes are ever stored.
As it turned out, implementing this turned up several bugs: due to capitalization and some other issues, a few of the existing hand-pasted hashes were simply wrong. With the new system the hash is always computed from the string in the source, so it’s impossible to ever have the wrong hash.
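Here is a rough sketch of what compile-time hashing can look like. I’m using 32-bit FNV-1a as a stand-in, since the actual hash routine isn’t named here, and C++14 `constexpr` instead of the template-based version described above. The names `fnv1a`, `hash_literal`, `HASH` and `is_verbose_switch` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// 32-bit FNV-1a, standing in for whatever hash the real system uses.
constexpr std::uint32_t fnv1a(const char* s, std::size_t length) {
    std::uint32_t h = 0x811c9dc5u;              // FNV offset basis
    for (std::size_t i = 0; i < length; ++i) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 0x01000193u;                       // FNV prime
    }
    return h;
}

// Hash a string literal, excluding its terminating '\0'.
template <std::size_t N>
constexpr std::uint32_t hash_literal(const char (&s)[N]) {
    return fnv1a(s, N - 1);
}

// Passing the hash through a template argument forces compile-time
// evaluation, so only the 32-bit result has to exist in the binary.
#define HASH(literal) \
    (std::integral_constant<std::uint32_t, hash_literal(literal)>::value)

// Hypothetical use: compare a hash computed at run time against a constant.
constexpr bool is_verbose_switch(std::uint32_t runtime_hash) {
    return runtime_hash == HASH("verbose");
}
```

Note that `HASH("Debug")` and `HASH("debug")` are different values, which is exactly the kind of capitalization slip that is easy to make when pasting hashes by hand.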
Automatic argument packing
We have a multiprocessor system that communicates through a shared memory space. One processor is just the normal CPU; the other is a massively parallel processor. However, each of the individual processors in it is simple and only has a few hundred bytes of fast “RAM” (actually registers).
We wanted to send messages, sorta like remote procedure calls, from the parallel processor to the host. But the message buffer is of limited size and has some restrictions like alignment. How to enforce these on the parallel processor at minimum cost?
Again using templates (variadic this time), I was able to write a library that packs an arbitrary set of function arguments together and writes them out to the shared memory. It does all of the alignment computation and bounds checking at compile time: if you overflow the buffer on a given call, the build will fail. And it’s impossible to ever have a misaligned argument. At the same time, the resulting code is as minimal as it can possibly be: a series of memory writes at different offsets. So it’s both provably optimal and completely safe.
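The general shape of this can be sketched as follows. This is a simplified illustration rather than the real library: the 16-byte `kBufferSize` and the names `align_up`, `packed_size`, `write_at` and `pack` are assumptions, and it targets an ordinary host compiler (C++14) instead of the parallel processor’s toolchain.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr std::size_t kBufferSize = 16;  // hypothetical message buffer size

// Round offset up to the next multiple of alignment.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Bytes needed to store one value of each type in Ts, in order, with each
// value placed at an offset aligned for its type.
template <typename... Ts>
constexpr std::size_t packed_size() {
    const std::size_t sizes[]  = {sizeof(Ts)..., 0};   // trailing entry keeps
    const std::size_t aligns[] = {alignof(Ts)..., 1};  // the arrays non-empty
    std::size_t offset = 0;
    for (std::size_t i = 0; i < sizeof...(Ts); ++i)
        offset = align_up(offset, aligns[i]) + sizes[i];
    return offset;
}

// Base case: no arguments left to write.
template <std::size_t Offset>
void write_at(unsigned char*) {}

// Write the first argument at its aligned offset, then recurse. Offset is a
// template parameter, so every destination offset is a compile-time constant.
template <std::size_t Offset, typename T, typename... Rest>
void write_at(unsigned char* buffer, const T& first, const Rest&... rest) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "arguments must be plain data");
    constexpr std::size_t aligned = align_up(Offset, alignof(T));
    std::memcpy(buffer + aligned, &first, sizeof(T));
    write_at<aligned + sizeof(T)>(buffer, rest...);
}

// Pack all arguments into buffer, which must hold kBufferSize bytes and be
// aligned for the strictest argument type. Returns the number of bytes used.
template <typename... Ts>
std::size_t pack(unsigned char* buffer, const Ts&... args) {
    static_assert(packed_size<Ts...>() <= kBufferSize,
                  "arguments overflow the message buffer");
    write_at<0>(buffer, args...);
    return packed_size<Ts...>();
}
```

A call such as `pack(buffer, std::uint8_t{1}, std::uint32_t{2})` uses 8 bytes and compiles, while passing five `std::uint32_t` values (20 bytes) trips the `static_assert` and breaks the build, which is the behavior described above.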
Automatic variable-length structure packing
Similar to the above, but this time we’re sending full data packets between devices, each of which consists of a series of C-like structs (which may be variable length), and again with alignment constraints.
I wrote a (templated again!) library to generate the proper size and offsets automatically given the requested structs. Performance, while also optimal, was actually not the primary concern here. Instead, type safety was the goal. Once the buffer was created, you could then request a pointer to one of the substructs contained in it, and it would give you back a typed pointer to that struct. The client did not have to do any casting at all; all of the type handling was done by the library, again at compile time.
The previous code had all kinds of nasty pointer math, casting back and forth between byte pointers and typed pointers, and was very fragile and error prone. The new code is trivial, and if someone adds a new member to the packet, everything else “just works,” shifting and realigning the other members as necessary, as well as computing the necessary buffer size.
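A stripped-down sketch of the typed-pointer idea looks something like this. It only handles fixed-size structs; the variable-length case, where offsets depend on run-time counts, is left out. `IndexOf`, `Layout`, `Packet` and the three example structs are hypothetical names for illustration, not the real library’s API.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Position of T in the list Ts...; asking for a type that is not in the
// list fails to compile because the primary template is never defined.
template <typename T, typename... Ts>
struct IndexOf;
template <typename T, typename... Rest>
struct IndexOf<T, T, Rest...> : std::integral_constant<std::size_t, 0> {};
template <typename T, typename U, typename... Rest>
struct IndexOf<T, U, Rest...>
    : std::integral_constant<std::size_t, 1 + IndexOf<T, Rest...>::value> {};

template <typename... Ts>
struct Layout {
    // Offset of member number index; index == sizeof...(Ts) gives total size.
    static constexpr std::size_t offset_at(std::size_t index) {
        const std::size_t sizes[]  = {sizeof(Ts)..., 0};
        const std::size_t aligns[] = {alignof(Ts)..., 1};
        std::size_t offset = 0;
        for (std::size_t i = 0; i < index; ++i)
            offset = align_up(offset, aligns[i]) + sizes[i];
        return align_up(offset, aligns[index]);
    }
};

template <typename... Ts>
class Packet {
    static_assert(sizeof...(Ts) > 0, "a packet needs at least one member");
public:
    static constexpr std::size_t size() {
        return Layout<Ts...>::offset_at(sizeof...(Ts));
    }
    template <typename T>
    static constexpr std::size_t offset_of() {
        return Layout<Ts...>::offset_at(IndexOf<T, Ts...>::value);
    }

    Packet() {
        // Value-initialize every member in place inside the byte buffer.
        using expand = int[];
        (void)expand{0, (static_cast<void>(::new (bytes_ + offset_of<Ts>()) Ts()), 0)...};
    }

    // Typed access: the caller never casts or computes an offset.
    template <typename T>
    T* get() { return reinterpret_cast<T*>(bytes_ + offset_of<T>()); }

private:
    alignas(std::max_align_t) unsigned char bytes_[Layout<Ts...>::offset_at(sizeof...(Ts))];
};

// Hypothetical packet members.
struct Header  { std::uint8_t  version; };
struct Payload { std::uint32_t words[3]; };
struct Trailer { std::uint16_t checksum; };
using ExamplePacket = Packet<Header, Payload, Trailer>;
```

With this, `packet.get<Payload>()->words[0] = 42;` needs no cast, and inserting another struct into the `Packet<...>` list moves the later members and grows the buffer without touching any client code.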
The common element here is that a huge amount of work is done at compile time. This has two big advantages: it’s fast (and provably so, because I can inspect the generated assembly to see how it compiled); and it’s very safe, because it makes a huge class of errors impossible. There are languages, like C, that have the same low-level determinism as C++. And there are some, like C#, that have the same degree (or more) of expressiveness. And some that have the same degree of type safety. But I don’t know of anything else that gives you all of these together.
Some of the things above could have been implemented using generated code. We’re no strangers to that, either: using a script to generate C or C++ in exactly the form you want. That works, but it increases the complexity of the build system and probably means there’s another language to know. It’s nicer when you can do everything in one environment.