So I am trying to recommend a best practice here. I have a vague feeling that using this technique to reduce recompilation is just maintaining a practice that was only relevant when compilers were much slower, or that modern compiler optimizations are smart enough to save on recompilation even with #includes or something.
Is this (using forward declarations instead of #includes) still a worthwhile practice?
Forward declarations are absolutely a best practice and I’m flabbergasted that Google’s style guide suggests otherwise. At my previous employer, one of the senior people would periodically go through the (very large) C++ codebase and replace unnecessary #includes with forward declarations. It was incredibly tedious work, but he was able to wring impressive compile time improvements by doing this. C++ compilers are not fast because the language is not a simple one. Unnecessary #includes in header files can very quickly add significant time to the overall compile time as the same class definition is unnecessarily compiled over and over and over again by files that never needed to use the class in the first place.
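To make the technique concrete, here's a minimal sketch (the names widget.h, Widget and Foo are hypothetical): a header that only holds a pointer to Foo can forward-declare it instead of pulling in foo.h, so editing foo.h no longer forces every file that includes widget.h to recompile.
// widget.h
// #include "foo.h"        // not needed here: it would drag foo.h into every including file
class Foo;                  // a forward declaration is enough for pointers and references

class Widget {
public:
    void attach(Foo* foo);
private:
    Foo* foo_ = nullptr;
};

// widget.cpp
#include "widget.h"
#include "foo.h"            // the full definition is only needed where Foo's members are used

void Widget::attach(Foo* foo) { foo_ = foo; }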
Although I’m sure there are cases where replacing #includes with forward declarations is worth it for compiler performance reasons, doing so by default seems like premature optimization. For most code, compilation time doesn’t dominate, and spending developer time making things compile faster is often a waste.
But note that style guides aren’t necessarily about efficiency, they’re about conformity and ease of reading. And style is subjective, so you’re not necessarily going to get good answers from the peanut gallery here. The important thing is that your whole team uses a consistent style. So, if you’re working at Google, or with a codebase that uses that style, use it.
Sometimes the preferred style is not the most efficient one. That’s ok. Saving humans time in understanding code because it’s all the same style is well worth a few trillion extra clock cycles at compile time.
Because, of course, you’ve got a nightly autobuild system doing your compiling for you anyway, right? So the only time you’ve got to wait for a file to be recompiled because the header changed is when you changed that header in your working checkout. If you don’t already have this, the gains from a few minutes spent writing a cron script are going to dwarf what you’d get from going through all your files and changing them to forward declarations. Pick the low-hanging performance fruit first.
There are more reasons to do this besides compile speed. If you’re using a pointer to an object as an opaque token, it’s good to hide all implementation details of the class. For example:
#include <memory>   // needed for std::shared_ptr

class Foo;          // forward declaration only: Foo stays opaque to callers
std::shared_ptr<Foo> getFoo();
bool runFoo(std::shared_ptr<Foo> foo);
In a case like this, it ensures that the calling code doesn’t inappropriately access the members of the Foo object returned by getFoo. Only the code that really needs to use the internals of Foo has to include foo.h.
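For completeness, a minimal sketch of the other side, under the assumption that the full definition lives in foo.h and that Foo has a default constructor and some member to call (the run() used here is hypothetical):
// foo.cpp -- the one place that needs Foo's full definition
#include "foo.h"       // defines class Foo
#include <memory>

std::shared_ptr<Foo> getFoo() {
    return std::make_shared<Foo>();
}

bool runFoo(std::shared_ptr<Foo> foo) {
    return foo && foo->run();   // run() is a hypothetical member of Foo
}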
A few hours? You must be joking. For a C++ codebase large enough that this would be worth the time to set up, integrating this kind of thing into your build system is a large investment. I’m not saying it wouldn’t be worthwhile, but this is not something you can hack together overnight.
We do it better than nightly – we have builds running after each check-in on various platforms. But this would still help developers doing local builds, in principle.
Still, I’m leaning towards not doing this as a matter of policy. In most cases the benefit in build times is slight compared to the cost in confusion.
As always, readability can be improved by useful comments. Something like
// #include "Foo.h"
class Foo;
makes clear what the intent of the code is, while avoiding the unnecessary dependency. Also, there are sometimes “chicken-and-egg” problems where classes refer to each other, and they can’t simply include each other’s headers without a forward declaration somewhere.
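A minimal sketch of that situation, with hypothetical classes Parent and Child that each point at the other:
// parent.h
#pragma once
class Child;                 // forward declaration breaks the include cycle
class Parent {
    Child* child = nullptr;
};

// child.h
#pragma once
class Parent;                // likewise, no need to include parent.h
class Child {
    Parent* parent = nullptr;
};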
That said, I rarely see forward declarations. Instead, there are small interface classes and header files that take care of things like that, while the complicated headers are used only by the code that implements the complicated stuff. If you’re inclined to use a forward declaration, it may mean you should be writing a small “helper” header instead, which may be nothing more than a forward declaration.
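The standard library does exactly this with <iosfwd>, which forward-declares the stream classes. A hypothetical project equivalent (the names foo_fwd.h, Foo and FooFactory are made up) might be nothing more than:
// foo_fwd.h -- tiny "helper" header in the spirit of <iosfwd>
#pragma once

class Foo;
class FooFactory;

// Headers that only pass these types around by pointer or reference
// include foo_fwd.h; implementation files include the full foo.h.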
I’m all for efficiency, but with today’s modern hardware, how much time is really saved by reducing compile time? It certainly takes a lot of time to go through existing production code and make these changes.
It takes a significant amount of time to build a system that compiles the objects for every active branch and stores them where every developer’s build can find them, to set up a way for old objects to be discarded (without opening up race conditions that lead to failed builds), and then to extend your build system to determine, with a 0% false-positive rate, whether a cached object can be used, taking into account compiler versions, target architecture and version, compiler flags and header files. Then you need to prove that this system actually leads to decreased build times for normal developer use cases. This may require restructuring your code and/or build system. Reducing header file dependencies using forward declarations can be critical to such a scheme, as unnecessarily included files will cause your caching system to incorrectly conclude that files need to be rebuilt whenever such headers are changed.
Furthermore, as your code base grows and you scale your developer team up, you’ll find your system needing to support more and more development branches in parallel. Reducing compile times will reduce the rate at which you have to add more build machines to support those branches, and it also helps people who want to test out wide-ranging changes like new compilers, new compiler flags or new build targets.
At the job I’m speaking of, the C++ codebase easily numbered in the millions of lines; compiling the lot could take 40 minutes. Multiply that by 100-200 developers, all of whom work on 2-5 branches each, and getting that 40 minutes down to 30 or 20 is a massive saving for the organization, both in expensive developer time and in sheer build hardware requirements.
With that much code, do the 100-200 developers have to compile the entire thing (millions of lines) each time? It seems like there are more efficient ways to do development than making a one-line code change and compiling millions of lines of code, even if the 40 minutes were reduced to 20.
And then there is Software QA, who would have to test the entire thing again, and perhaps develop new tests for the legacy production code. Because you never know what could be broken or what impact it might have elsewhere.
Well yeah, that’s a completely necessary technique to prevent compile errors. The problem comes when you get hundreds or even thousands of source files (.cpp) all indirectly including the same header, despite none of them actually needing the definition. This can add up surprisingly quickly, especially given the multiplicative effect that can arise: if Foo.cpp includes A.h, which includes 3 header files, and those 3 header files each include 3 more, you very quickly find yourself including a truly enormous amount of unnecessary definitions in every file.
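A small hypothetical illustration of how the numbers multiply (A.h and the three-headers-each figure are from the example above; the rest of the names are made up):
// Foo.cpp
#include "A.h"   // A.h includes B.h, C.h and D.h;
                 // each of those includes three more headers,
                 // so this single #include pulls 1 + 3 + 9 = 13 headers
                 // into a translation unit that may only need A's declarations.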
Yes, those are all problems when you have a large code base, but there’s really no getting around it. Large, complicated products will have large and complicated codebases.
And certainly improving the speed of the overall build was not the only initiative the company ever took to make it easier for developers to work with the code. There was an excellent automated system-test infrastructure available to everyone. Unit testing was used extensively. Jenkins was used to build feature branches and automatically run both the unit tests and a subset of the automated tests. I myself, along with a colleague, rewrote the build system from the ground up to make it possible to build only a specified subset of the system (this brought build times for common operations down to about 5-10 minutes). But in the end you need to be sure that you haven’t broken the build entirely because some component you’ve never heard of depended on an API that you modified, and the only way to do that is to compile everything. Now, often a developer would offload that responsibility to Jenkins, but there was real value in reducing build times to slow the growth in demand for new build servers in the build farm.
It’s the conditional #ifdefs in the .h files that Google are worried about.
Forward declarations are all right when you’re writing code for a single platform, but Google are writing it for many platforms. They want to know that the platform-specific stuff (as selected by all those #ifdefs) didn’t break the build for any other platform. An edit to the .h might have wrongly moved something into, or out of, a platform-specific section.
Also, the substance of the object might be changed by the .h …
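A hypothetical sketch of the kind of header being described, where the object’s layout depends on which platform’s #ifdef branch is active (the names and members here are invented for illustration):
// foo.h
class Foo {
public:
    int value() const;
private:
#ifdef _WIN32
    void* native_handle;   // only present in Windows builds
#else
    int fd;                // a POSIX file descriptor on other platforms
#endif
};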
If you use a forward declaration, you have to lock the .h so it is never changed.
It’s plausible to forward-declare trivial objects in textbook examples… in real-life apps, it’s a different story.
I have no experience with C++ and my professional codebases never exceeded about 100K lines. Nor did we have to deal with multi-platform issues. So I’m asking questions here, not asserting a position.
We used C# & the MS .Net platform. One of our standards was that we compiled our standard support stuff into various “golden” dlls that were strongly named & strongly versioned. All code and all build scripts referenced the golden dlls as a run-time shared dll (an assembly, in .Net speak), not as compile-time source. So the IDE saw the golden dlls’ exports, not the source code. As did the build system. As did the install package system. As did the actual run-time environment, whether it was test, QA, or production.
The intent was two-fold. A minor benefit was not wasting cycles recompiling support code or standard libraries. The main benefit was we wanted exactly one version of these support dlls out in the field, and did not want each build by each dev or build-box to have its own private copy that should be functionally the same, but might not be.
This did create versioned dependencies that had to be managed at the time of creating install packages. And at the point of managing the versions of every artifact deployed to each and every production system, both ours and the customers.
But ISTM you’re going to have to control and solve that versioning issue someplace up and down your chain unless 100% of your final output is a free-standing executable which depends on exactly zero external run-time libraries and interfaces only directly to the OS and nothing else. And even then when you get failure reports from the field you’ll be wondering exactly what specific compilation of the code was running on that box at that moment.
For the folks working the big complicated projects, what am I missing? Were we doing it the 1980s way or the toy-scale way or were we smartly leveraging one of the inherent features of the dev / build / runtime environment we had?
If you have a system with well-defined API boundaries, and your APIs very rarely change, then that kind of setup can work really well.
If you have a huge existing codebase with poorly defined API boundaries, and you want the freedom to be able to more easily make sweeping changes across the codebase, then establishing such a build process will be difficult, and once it’s in place you may find that it hinders your developers more than it helps.
The only reason this is a question at all is because C++ is a wreck of a language, and doesn’t support proper interface specification. Real languages have the ability to define an interface as a separate entity, and usually allow it to be pre-compiled into a memory-mappable symbol table, which costs approximately nothing to use during compilation. (For instance, VMS supported this in about 1982 on languages DEC supplied compilers for.)
C++ just has idiotic textual includes. And it exploits them mercilessly to attempt to approximate all manner of language abstractions and features, and does them all about as well as the language name suggests. So you pay the cost of compiling thousands of lines of idiotic templated cruft every time. (And you can play my favourite game - see how many hundreds of lines of compilation errors you can create with the smallest typo.)
A forward declaration is simply a matter of providing a name to an otherwise opaque reference. You may as well declare it as a char *, except that you can’t because you need two pointers inside there.
The whole argument about build environments would have been solved with the choice of an actually advanced language.
I recently had occasion to go back to a pile of C++ I had been working on a while ago. Lordy, it is astounding how poor productivity becomes wading through the mire and idiocy of that language.
Honestly, I’ve found that “curmudgeon mode” usually goes more like, “You and your fancy languages who do all the work for you! You aren’t a real programmer until you’ve programmed in C++ (or for the extra-curmudgeonly, C or Fortran) which is the language God programmed the universe in. That fancy feature you think you need? Just use these ten thousand lines of preprocessor macros and you’re good! Garbage collection is for people who forget to delete their pointers: You know, idiots!”