Tricky C/C++ Preprocessor Question

I still don’t quite understand this. Yes, initializers of static or file-scope variables are executed before the main program starts. But those need to have a unique name at file scope. For static variables in functions, you could have one identifier for each function. There is no way the macros could enforce this, besides using __LINE__ or whatever. But then you could never go back and reference the identifiers you have created in a useful way. And, IMO, there is no meaningful information you could extract from those initializers. Also, the order in which they are called is not defined. And if you use the macro on “auto” variables that are created on the stack, not only is their initializer not called before execution of the main program, but keeping their addresses and dereferencing them is inherently unsafe.

I don’t want to be unfriendly, but it seems to me that the OP wants to implement some sort of homebrew debugger. This will almost never work; a custom-made tracer is often used, but the debugger that is supplied with your compiler is practically always superior to your own efforts. Debuggers use all sorts of (possibly undocumented) hooks, and they usually dig deep into the runtime system.

Also, Omphaloskeptic, I think this doesn’t work as you suggested:

#define TEST(a,b) TestObj a(b)
TEST(x,3); // expands to TestObj x(3);

It does not expand to TestObj x(3); it expands to TestObj a(3).

MaxTheVool, so this is what you want (it uses the solution from John T. Conklin):



#include <iostream>
#include <map>
#include <string>

#define CONCAT(a,b)                 a ## b
#define REGISTER_VARIABLE2(a,b,c)   register_variable_t CONCAT(unique,c)(a,b)
#define REGISTER_VARIABLE(a,b)      REGISTER_VARIABLE2(a,b,__LINE__)

typedef std::map< std::string, int * > database_map_t;
typedef std::map< std::string, int * >::const_iterator const_iterator_t;

database_map_t database;
static int i0 = 12;
static int i1 = 13;
static int i2 = 14;

class register_variable_t
{
    public:
        register_variable_t( std::string key, int * value )
        {
            database[key] = value;
            return;
        }
}; 
static REGISTER_VARIABLE( "x0", &i0 );
static REGISTER_VARIABLE( "x1", &i1 );
static REGISTER_VARIABLE( "x2", &i2 );

int main()
{
    // Print every registered name together with the current value it points at.
    for ( const_iterator_t iterator = database.begin(); iterator != database.end(); iterator++ )
    {
        std::cout << iterator->first << " = " << *iterator->second << std::endl;
    }

    return 0;
}

Output:



x0 = 12
x1 = 13
x2 = 14

The trick is to make sure that the debug-variable database was constructed before the variables were registered. I don’t think there is a way to guarantee this.

I agree, those identifiers are not very easy to reference later; but it sounds like he’s basically creating objects to use some side effects of the constructor, which sounds like it should work. (I doubt if it’s the best way to do it; I’m not disagreeing with you about that.)

Maybe. It’s not clear enough to me exactly what the OP is trying to do to state an opinion on this. I guess I could see wanting to do something like this to automate some debugging tasks, but I don’t know much about debuggers.

Really? It expands to “TestObj x(3);” for me (verified with g++), which seems correct. Shouldn’t the preprocessor replace both the “a” and “b” tokens in the macro expansion text, with the provided strings “x” and “3”?

Just cause it works in g++ doesn’t mean it’ll work in all compilers.

I agree, we need more information from the OP to provide a solution. Why are you trying to do this?

It’s exactly correct that I’m using this method to generate a kind of “homebrewed” debugger. Why? Because I work for a game development company. All of the programmers here have, and can use, debuggers. However, all of the designers and artists do NOT. And it’s nice to be able to expose certain variables to them for tuning purposes.
Actually, I’ve used this method three times now for three very different purposes. This most recent time (which actually wasn’t the debugging one) was the first time when it was necessary to have more than one call to the macro at file scope, and thus, the first time that the uniqueness of identifiers became an issue.
Here’s another example of a situation when a technique like this was useful: Many games have a dynamic object system in which many different object types (Player, Gun, Bullet, SpecialEffect) all derive from the same base class, and an object factory creates, deletes, and updates these objects. For various reasons, it’s extremely useful to have a list of all possible object types, and be able to refer to objects, and create objects, via an int type, i.e., ObjectFactory::CreateObject(int iObjectType, void *pInitData). Thus, we want to be able to assign all the object types numbers, starting with zero, so player = 0, gun = 1, bullet = 2, etc.

The obvious way to do that is just to have some central file, ObjectTypes.h, which includes a big enumerated array listing all object types. However, this means that adding and removing new object types requires touching a central .h file, which means that (due to dependencies), all files in the project must recompile, plus if you’re using source control software, multiple programmers might want to have ObjectTypes.h checked out simultaneously, etc.

Using the macro-constructor method, you can have a line up near the top of player.cpp which says REGISTER_WITH_OBJECT_FACTORY(Player, gPlayerClassIndex), which will register the player class with the object factory class, and set the variable gPlayerClassIndex to whatever unique value means player. Thus, object types can be added and deleted without any central list of object types existing at all.

(One drawback is that instead of saying
if (pObject->GetObjectType() == OBJECT_PLAYER)
you say
if (pObject->GetObjectType() == gPlayerClassIndex)

that is, you’re comparing a member variable to a global variable, rather than comparing it to an immediate value, so it presumably executes in a slightly less efficient fashion… although you could probably get around that, too, with a special release mode build in which gPlayerClassIndex is actually an immediate value)

(Actually, using this method to register things with an object factory class has a few other tricky details involving constructors and so forth which I’m skipping for the moment).

Why can’t you use dynamic_cast to do this?

if ( dynamic_cast<Player*>(pObject) )

Good question. Primarily because, to be honest, I haven’t the faintest idea what a dynamic_cast is. (Note my post a while back in which I admitted that I learned C++ from C in a rather ad hoc fashion).

(Max does some quick googling)

Hmmm… so how does dynamic_cast work, internally? By comparing equality of function tables? I suspect that’s going to be quite inefficient, execution-wise, which might explain why no one in the game industry (at least, no one I’ve ever worked with, over several different projects at three different companies) seems to use it.

After some more googling, it appears that dynamic_cast is quite slow, when compared to yet another thing I’ve never seen used (how embarrassing!) (but I suppose we’re here to fight ignorance) called typeid.

But, after said googling, I kind of suspect that either of those methods is going to be slower than what I was talking about here, and in games, performance is paramount.

In any case, there are also a bunch of other reasons why it’s nice to have int-to-objecttype-and-back conversion… for instance, looping through all object types, having arrays of data indexed by object type, having a world-building tool which can place objects and use an int to specify what type of object they are, etc.

It definitely should, as I understand it. The preprocessor should split tokens the same way a C compiler does.

Hang on, my quick googling was a bit too quick. The actual answer is that it’s because we’re not using RTTI, because of the attendant performance issues.

Well, yes, I know that. I was using g++ as a sanity check to verify what I already thought I knew. So do you think that after

#define TEST(a,b) TestObj a(b)

the macro

TEST(x,3);

will expand as “TestObj a(3);”, or as “TestObj x(3);”?

I was under the impression that all instances of identifier tokens appearing in the macro replacement list were replaced with the corresponding replacement texts. In this case the identifier tokens are “a” and “b”; in the macro call the replacement texts are “x” and “3” (respectively), giving “TestObj x(3);” after expansion. Can you explain why this shouldn’t be the case?

Max, I am not sure if you caught the warning in a few of the previous posts. Be very careful when you rely on static variable construction. I don’t believe that the C++ standard defines any rules governing the order in which static variables have their constructors called. Compiler vendors are free to implement this behavior however they want.

In your example above, the static variable in player.cpp might be constructed before the object-factory variable is constructed. This means player.cpp will be calling the register function of an un-constructed object-factory. The kicker is that it might work fine until you change the link order in the makefile or if you use a different compiler. This might re-order static variable construction and start blowing things up.

If you decide to go ahead, you might be able to mitigate the problem by attempting to detect if the object-factory is already constructed when the register function gets called. You can use magic numbers to minimize the risks.

Compiler bugs.

That’s kind of astonishing. If you’re not even enabling RTTI, you’ve left the land of standard C++.

Have you profiled RTTI use? Or is this just antiquated paranoia?

Indeed, this is one of the dark corners of C++, see the FAQ.

From the standard, 3.6.2/1

Because you’re using a user-defined constructor, it’s dynamic initialization AFAICT. So you’re guaranteed the objects will be initialized in the order they’re declared in a single translation unit (cpp file) but there are no guarantees across multiple translation units.

Agreed. Although I suspect that static variables (i.e., “static int x=3”) will always be initialized before class constructors are run, and I’m pretty sure that’s what the snippet that emarkp quoted means. As long as that’s true, it’s pretty easy to set up some safeguards to make sure the classes get initialized in the correct order.
As for RTTI, as I said, I’ve worked in C++ across multiple projects at 3 different game companies, all of which have employed people more knowledgeable in C++ than I am, and no one has used it. Take that for what it’s worth.

In general, console games tend not to have very deep or complex struct inheritance trees… maybe the general feeling is that it’s rarely needed, so why waste the overhead in time and space?

Or, maybe they don’t know about it. I work with a lot of very knowledgeable people, and only a handful of us know our way around the C++ casting system.

It’s definitely worth a look, just to see how the performance is.

I think the cost of RTTI is more memory than CPU and this is roughly demonstrated by the homebrewed RTTI mentioned at the top.

RTTI should add:

  1. Memory: A type-identifier per class.
  2. Memory: A pointer to the type-identifier per instance of an object.
  3. CPU: One or more member functions called per dynamic_cast<> to look-up the type-identifier and compare it. I guess the number of functions would be proportional to the depth of the class hierarchy.

The homebrewed version has all of these features. Although since they are not a generic solution, they can be optimized a bit (using an int as the type-identifier and assuming a hierarchy depth of one).

I googled “RTTI overhead” and found this gaming-oriented discussion of RTTI and exceptions. There are tons of responses and I didn’t read through them. I think RTTI gets lumped in with exception-handling since it became prevalent around the same time.

I agree with that, but why aren’t they simply initialized in main()?

I have 7 years of professional C++ experience. My previous job was writing highly optimized (meaning careful algorithm and data structure design as well as Quantify/VTune runs) internal CAD software at Intel. My current job is writing physics simulations for radiation treatment software. It’s extremely resource-intensive, and there’s a lot of market pressure to make it as fast as possible.

RTTI has no detectable impact on the runtime.

The fact that you’re not even familiar with what dynamic_cast is doesn’t speak highly of your knowledge of C++. I strongly suggest you read Stroustrup’s 3rd Edition of The C++ Programming Language. Accelerated C++ may be a good one as well (though I haven’t read it). Then read Effective C++ and More Effective C++. I highly recommend Herb Sutter’s Exceptional C++ series as well, or you can read some of the origins of those books (though the books are more up to date, these articles began as challenges on comp.lang.c++.moderated: Sutter prepared the questions and answers and created some good discussion in the group, and now they’re available in book form).

If others at your workplace aren’t familiar with dynamic_cast, I strongly suggest this regimen for them as well. The language is your tool, and if you keep using a hammer to drive a screw you’ll just end up frustrated.

It’s a premature optimization based on empirical data roughly a decade old. Throwing it out means you lose a very useful language feature and have to go into dark corners of the language, possibly for no measurable benefit. What’s the cost to development time for all of that?

Because to do so would require that any time a new object type is added, a line is added to main(), which makes it harder to keep one’s work localized, requires programmers to fight to check out a central file, etc.

No argument there. Like I said, I learned it piecemeal while transitioning from C. What seem to be the important parts of it (classes and inheritance) I have learned and use. But there are other parts (RTTI, exception handling) which remain a mystery to me.

Wait a minute. You mean there are static initializers in multiple files? Well, your order-of-operations is completely unknown then.

Do you really add initialization that frequently? Do you know about the PImpl idiom for source code firewalling (which decouples a lot of #include dependency)? Do you really use a revision control system that doesn’t allow merging?

I strongly suspect that you have many incorrect ideas about the portions you believe you understand. Check out some of the books I listed, even if you just page through them at the local bookstore. You might be surprised at how useful it is to learn to better use your tools.