So I am trying to recommend a best practice here. I have a vague feeling that using this technique to reduce recompilation is just maintaining a practice that was only relevant when compilers were much slower, or that modern compiler optimizations are smart enough to save on recompilation even with #includes or something.
Is this (using forward declarations instead of #includes) still a worthwhile practice?
Forward declarations are absolutely a best practice and I’m flabbergasted that Google’s style guide suggests otherwise. At my previous employer, one of the senior people would periodically go through the (very large) C++ codebase and replace unnecessary #includes with forward declarations. It was incredibly tedious work, but he was able to wring impressive compile time improvements by doing this. C++ compilers are not fast because the language is not a simple one. Unnecessary #includes in header files can very quickly add significant time to the overall compile time as the same class definition is unnecessarily compiled over and over and over again by files that never needed to use the class in the first place.
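To make the technique concrete, here's a minimal sketch (the names widget.h, Widget and Foo are hypothetical): a header that only holds a pointer to Foo can forward-declare it instead of pulling in foo.h, so editing foo.h no longer forces every file that includes widget.h to recompile.
// widget.h
// #include "foo.h"        // not needed here: it would drag foo.h into every including file
class Foo;                  // a forward declaration is enough for pointers and references

class Widget {
public:
    void attach(Foo* foo);
private:
    Foo* foo_ = nullptr;
};

// widget.cpp
#include "widget.h"
#include "foo.h"            // the full definition is only needed where Foo's members are used

void Widget::attach(Foo* foo) { foo_ = foo; }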
Although I’m sure there are cases where replacing #includes with forward declarations is worth it for compiler performance reasons, doing so by default seems like premature optimization. For most code, compilation time doesn’t dominate, and spending developer time making things compile faster is often a waste.
But note that style guides aren’t necessarily about efficiency, they’re about conformity and ease of reading. And style is subjective, so you’re not necessarily going to get good answers from the peanut gallery here. The important thing is that your whole team uses a consistent style. So, if you’re working at Google, or with a codebase that uses that style, use it.
Sometimes the preferred style is not the most efficient one. That’s ok. Saving humans time in understanding code because it’s all the same style is well worth a few trillion extra clock cycles at compile time.
Because, of course, you’ve got a nightly autobuild system doing your compiling for you anyway, right? So the only time you’ve got to wait for a file to be recompiled because the header changed is when you changed that header in your working checkout. If you don’t already have this, the gains from a few minutes spent writing a cron script are going to dwarf what you’d get from going through all your files and changing them to forward declarations. Pick the low-hanging performance fruit first.
There are more reasons to do this besides compile speed. If you’re using a pointer to an object as an opaque token, it’s good to hide all implementation details of the class. For example:
#include <memory>   // needed for std::shared_ptr

class Foo;          // forward declaration only: Foo stays opaque to callers
std::shared_ptr<Foo> getFoo();
bool runFoo(std::shared_ptr<Foo> foo);
In a case like this, it ensures that the calling code doesn’t inappropriately access the members of the Foo object returned by getFoo. Only the code that really needs to use the internals of Foo has to include foo.h.
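For completeness, a minimal sketch of the other side, under the assumption that the full definition lives in foo.h and that Foo has a default constructor and some member to call (the run() used here is hypothetical):
// foo.cpp -- the one place that needs Foo's full definition
#include "foo.h"       // defines class Foo
#include <memory>

std::shared_ptr<Foo> getFoo() {
    return std::make_shared<Foo>();
}

bool runFoo(std::shared_ptr<Foo> foo) {
    return foo && foo->run();   // run() is a hypothetical member of Foo
}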
A few hours? You must be joking. For a C++ codebase large enough that this would be worth the time to set up, integrating this kind of thing into your build system is a large investment. I’m not saying it wouldn’t be worthwhile, but this is not something you can hack together overnight.
We do it better than nightly – we have builds running after each check-in on various platforms. But this would still help developers doing local builds, in principle.
Still, I’m leaning towards not doing this as a matter of policy. In most cases the benefit in build times is slight compared to the cost in confusion.
As always, readability can be improved by useful comments. Something like
// #include "Foo.h"
class Foo;
makes clear what the intent of the code is, while avoiding the unnecessary dependency. Also, there are sometimes “chicken-and-egg” problems where classes refer to each other, and they can’t simply include each other’s headers without a forward declaration somewhere.
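A minimal sketch of that situation, with hypothetical classes Parent and Child that each point at the other:
// parent.h
#pragma once
class Child;                 // forward declaration breaks the include cycle
class Parent {
    Child* child = nullptr;
};

// child.h
#pragma once
class Parent;                // likewise, no need to include parent.h
class Child {
    Parent* parent = nullptr;
};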
That said, I rarely see forward declarations. Instead, there are small interface classes and header files that take care of things like that, while the complicated headers are used only by the code that implements the complicated stuff. If you’re inclined to use a forward declaration, it may mean you should be writing a small “helper” header instead, which may be nothing more than a forward declaration.
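The standard library does exactly this with <iosfwd>, which forward-declares the stream classes. A hypothetical project equivalent (the names foo_fwd.h, Foo and FooFactory are made up) might be nothing more than:
// foo_fwd.h -- tiny "helper" header in the spirit of <iosfwd>
#pragma once

class Foo;
class FooFactory;

// Headers that only pass these types around by pointer or reference
// include foo_fwd.h; implementation files include the full foo.h.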
I’m all for efficiency, but with today’s modern hardware, how much time is really saved by reducing compile time? It certainly takes a lot of time to go through existing production code and make these changes.
It takes a significant amount of time to build a system that compiles the objects for every active branch and stores them where every developer’s build can find them, to set up a way for old objects to be discarded (without opening up race conditions that lead to failed builds), and then to extend your build system to determine, with a 0% false-positive rate, whether a cached object can be used, taking into account compiler versions, target architecture and version, compiler flags and header files. Then you need to prove that this system actually leads to decreased build times for normal developer use cases. This may require restructuring your code and/or build system. Reducing header file dependencies using forward declarations can be critical to such a scheme, as unnecessarily included files will cause your caching system to incorrectly conclude that files need to be rebuilt whenever such headers are changed.
Furthermore, as your code base grows and you scale your developer team up, you’ll find your system needing to support more and more development branches in parallel. Reducing compile times will reduce the rate at which you have to add more build machines to support those branches, and it also helps people who want to test out wide-ranging changes like new compilers, new compiler flags or new build targets.
At the job I’m speaking of, the C++ codebase easily numbered in the millions of lines; compiling the lot could take 40 minutes. Multiply that by 100-200 developers, all of whom work on 2-5 branches each, and getting that 40 minutes down to 30 or 20 is a massive saving for the organization, both in expensive developer time and in sheer build hardware requirements.
With that much code, do the 100-200 developers have to compile the entire thing (millions of lines) each time? It seems like there are more efficient ways to do development than making a one-line code change and compiling millions of lines of code, even if the 40 minutes were reduced to 20.
And then there is Software QA, who would have to test the entire thing again, and perhaps develop new tests for the legacy production code. Because you never know what could be broken or what impact it might have elsewhere.
Well yeah, that’s a completely necessary technique to prevent compile errors. The problem comes when you get hundreds or even thousands of source files (.cpp) all indirectly including the same header, despite none of them actually needing the definition. This can add up surprisingly quickly, especially given the multiplicative effect that can arise: if Foo.cpp includes A.h, which includes 3 header files, and those 3 header files each include 3 more, you very quickly find yourself including a truly enormous amount of unnecessary definitions in every file.
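A small hypothetical illustration of how the numbers multiply (A.h and the three-headers-each figure are from the example above; the rest of the names are made up):
// Foo.cpp
#include "A.h"   // A.h includes B.h, C.h and D.h;
                 // each of those includes three more headers,
                 // so this single #include pulls 1 + 3 + 9 = 13 headers
                 // into a translation unit that may only need A's declarations.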
Yes, those are all problems when you have a large code base, but there’s really no getting around it. Large, complicated products will have large and complicated codebases.
And certainly improving the speed of the overall build was not the only initiative the company ever took to make it easier for developers to work with the code. There was an excellent automated system-test infrastructure available to everyone. Unit testing was used extensively. Jenkins was used to build feature branches and automatically run both the unit tests and a subset of the automated tests. I myself, along with a colleague, rewrote the build system from the ground up to make it possible to build only a specified subset of the system (this brought build times for common operations down to about 5-10 minutes). But in the end you need to be sure that you haven’t broken the build entirely because some component you’ve never heard of depended on an API that you modified, and the only way to do that is to compile everything. Now, often a developer would offload that responsibility to Jenkins, but there was real value in reducing build times to slow the growth in demand for new build servers in the build farm.
It’s the conditional #ifdefs in the .h files that Google are worried about.
Forward declarations are all right when you’re writing code for a single platform, but Google are writing it for many platforms. They want to know that the platform-specific stuff (as selected by all those #ifdefs) didn’t break the build for any other platform. An edit to the .h might have wrongly moved something into, or out of, a platform-specific section.
Also, the substance of the object might be changed by the .h …
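A hypothetical sketch of the kind of header being described, where the object’s layout depends on which platform’s #ifdef branch is active (the names and members here are invented for illustration):
// foo.h
class Foo {
public:
    int value() const;
private:
#ifdef _WIN32
    void* native_handle;   // only present in Windows builds
#else
    int fd;                // a POSIX file descriptor on other platforms
#endif
};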
If you use a forward declaration, you have to lock the .h so it is never changed.
It’s plausible to forward-declare trivial objects in textbook examples… in real-life apps, it’s a different story.
I have no experience with C++ and my professional codebases never exceeded about 100K lines. Nor did we have to deal with multi-platform issues. So I’m asking questions here, not asserting a position.
We used C# & the MS .Net platform. One of our standards was that we compiled our standard support stuff into various “golden” dlls that were strongly named & strongly versioned. All code and all build scripts referenced the golden dlls as a run-time shared dll (an assembly, in .Net speak), not as compile-time source. So the IDE saw the golden dlls’ exports, not the source code. As did the build system. As did the install package system. As did the actual run-time environment, whether it was test, QA, or production.
The intent was two-fold. A minor benefit was not wasting cycles recompiling support code or standard libraries. The main benefit was we wanted exactly one version of these support dlls out in the field, and did not want each build by each dev or build-box to have its own private copy that should be functionally the same, but might not be.
This did create versioned dependencies that had to be managed at the time of creating install packages. And at the point of managing the versions of every artifact deployed to each and every production system, both ours and the customers.
But ISTM you’re going to have to control and solve that versioning issue someplace up and down your chain unless 100% of your final output is a free-standing executable which depends on exactly zero external run-time libraries and interfaces only directly to the OS and nothing else. And even then when you get failure reports from the field you’ll be wondering exactly what specific compilation of the code was running on that box at that moment.
For the folks working the big complicated projects, what am I missing? Were we doing it the 1980s way or the toy-scale way or were we smartly leveraging one of the inherent features of the dev / build / runtime environment we had?
If you have a system with well-defined API boundaries, and your APIs very rarely change, then that kind of setup can work really well.
If you have a huge existing codebase with poorly defined API boundaries, and you want the freedom to be able to more easily make sweeping changes across the codebase, then establishing such a build process will be difficult, and once it’s in place you may find that it hinders your developers more than it helps.
The only reason this is a question at all is because C++ is a wreck of a language, and doesn’t support proper interface specification. Real languages have the ability to define an interface as a separate entity, and usually allow it to be pre-compiled into a memory-mappable symbol table, which costs approximately nothing to use during compilation. (For instance, VMS supported this in about 1982 on languages DEC supplied compilers for.)
C++ just has idiotic textual includes. And it exploits them mercilessly to attempt to approximate all manner of language abstractions and features, and does them all about as well as the language name suggests. So you pay the cost of compiling thousands of lines of idiotic templated cruft every time. (And you can play my favourite game - see how many hundreds of lines of compilation errors you can create with the smallest typo.)
A forward declaration is simply a matter of providing a name to an otherwise opaque reference. You may as well declare it as a char *, except that you can’t because you need two pointers inside there.
The whole argument about build environments would have been solved with the choice of an actually advanced language.
I recently had occasion to go back to a pile of C++ I had been working on a while ago. Lordy, it is astounding how poor productivity becomes wading through the mire and idiocy of that language.
Honestly, I’ve found that “curmudgeon mode” usually goes more like, “You and your fancy languages who do all the work for you! You aren’t a real programmer until you’ve programmed in C++ (or for the extra-curmudgeonly, C or Fortran) which is the language God programmed the universe in. That fancy feature you think you need? Just use these ten thousand lines of preprocessor macros and you’re good! Garbage collection is for people who forget to delete their pointers: You know, idiots!”