C++ source code organization problems
One of my past software projects was a program that processed digital images recorded by satellite instruments, such as those on the MeteoSat and ERS satellites of the European Space Agency.
When I started working on that software, I received from the main company a document that described how to organize the C++ source code. The document itself wasn't bad; it was reasonably detailed and clean. But the way of organizing source code that it described was full of flaws.
That company used both C++ and Java, and it was trying to build something like Java packages in the C++ code base as well. Its developers or designers devised a way to organize C++ source code with Java packages in mind, but the result wasn't very good, for several reasons.
Let's see how that company organized its C++ source code:
- Every class had to be declared in a separate header file, and the name of the header file had to be the same as the name of the class.
- Classes were assigned to abstract packages, and the name of each class had to start with the package name: the package name, followed by an underscore, followed by the inner name of the class. In this way the class name told which package the class belonged to and, since the file name was the same as the class name, the files were assigned to packages too.
- Each package had to have a special header, called <package_name>_int.h, which contained include directives for all the public headers of that package and for the package_int.h files of all the other packages used by that package.
- Each package had to have a special header, called <package_name>_glob.h, which contained the global definitions of that package, included all the header files of the package, and also included the package_int.h file of the same package. Every cpp file of the package had to include package_glob.h at the beginning.
- The public headers of all the packages had to be in the same directory, while the private headers and the cpp files had to be in separate subdirectories, each named after its package.
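To make the rules concrete, here is a minimal single-file sketch of what they imply for a small package. The package name `img`, the classes `img_Pixel` and `img_Reader`, the constant `img_DefaultValue`, the other package `geo`, and the helper `packageOf` are all hypothetical names invented for illustration; the comments mark where each separate file would begin under the scheme.

```cpp
#include <cassert>
#include <string>

// ---- file: img_Pixel.h (public header, shared include directory) ----
class img_Pixel {
public:
    explicit img_Pixel(int value) : value_(value) {}
    int value() const { return value_; }
private:
    int value_;
};

// ---- file: img_Reader.h (public header, shared include directory) ----
class img_Reader {
public:
    img_Pixel read() const;
};

// ---- file: img_int.h (package interface header) ----
// #include "img_Pixel.h"
// #include "img_Reader.h"
// #include "geo_int.h"   // interface header of another (hypothetical) package

// ---- file: img_glob.h (package global header) ----
// #include "img_int.h"   // plus every private header of the package
const int img_DefaultValue = 42;   // a "global definition" of the package

// ---- file: img/img_Reader.cpp (package subdirectory) ----
// #include "img_glob.h"  // mandatory first include of every cpp file
img_Pixel img_Reader::read() const { return img_Pixel(img_DefaultValue); }

// Hypothetical helper: recovers the package name from a class or file name,
// which works because the prefix rule encodes the package in the name.
std::string packageOf(const std::string& name) {
    const std::string::size_type pos = name.find('_');
    return pos == std::string::npos ? std::string() : name.substr(0, pos);
}

// Small self-check of the naming convention.
void checkNaming() {
    assert(packageOf("img_Reader") == "img");
    assert(packageOf("img_Pixel.h") == "img");
}
```

Note that `packageOf` needs only the name, not the directory, to find the package of a class or file.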
The following diagram summarizes this way of organizing the code: the ellipses represent files, the large rectangles represent packages, and the arrows represent include directives.
These rules look coherent and useful at first, but on closer inspection they turn out to be awkward, useless, incomplete, and complicated, with disastrous consequences for the complexity of the build process. Let's look at some of their shortcomings:
- Having a separate header for each class is often excessive, because there are often groups of classes that are closely related to each other and can't be used separately. Separate headers do keep each header short and improve readability, but when some class declarations are very short, like some class templates or derived classes that differ only slightly from their base class, it may be preferable to keep them in the same header file. In Java a source file contains not only the class declaration but also its implementation, so a separate file for each class is more suitable there. Furthermore, in Java each source file is compiled into a class file with the same name; this doesn't happen in C++.
- The use of package prefixes is good, and it also allows classes with the same inner name to exist in different packages. However, the file name is enough to tell which package a file belongs to, so there is no need to keep the files of each package in a separate directory.
- The package_int.h file includes all the public header files of the package, so if any public header of the package is modified, all the packages that use that package in any way have to be recompiled, even if they don't use the modified header at all.
- The package_glob.h file includes every header of the package and is included by every cpp file of the package, so a change in any header causes the recompilation of all the cpp files of the package, even those that do not depend on the modified header.
- Some of the files of a package are inside a directory with the same name as the package, but not all of them, so that directory is useless when the package has to be copied or updated as a whole. When I asked a technician what the advantage was of keeping the cpp files of a package in a separate directory while most of the header files of the same package were stored elsewhere, the technician told me it was so that DLLs could later be made from each package. The only real reason to put the cpp files in a separate directory is to let a compiler command line pick up all the cpp files in that directory with a wildcard. However, since the file names carry the package name as a prefix, the cpp files of a package can also be found with a wildcard expression that begins with that prefix. Having the cpp files inside the package directory and the public headers outside it makes copying the entire package very difficult with common file system operations. If one wants to separate the files of different packages, then all the files of a package have to go in its own directory. Furthermore, in C++ the difference between cpp and h files is small, because class declarations can contain inline member functions, which are actual implementation, and template declarations contain both the declaration and the implementation, i.e. statements.
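The recompilation cost described in the points above can be illustrated with a small model of the include graph. This is only a sketch under stated assumptions: the file names (`img_Pixel.h`, `img_Reader.h`, `img_Cache.h` and their cpp files) are hypothetical, and `directScheme` is simply a generic "include only what you use" layout for comparison, not the organization from the guidelines or any specific alternative.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each file to the files it includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if `file` includes `header` directly or transitively.
bool dependsOn(const IncludeGraph& g, const std::string& file,
               const std::string& header, std::set<std::string>& visited) {
    if (!visited.insert(file).second) return false;  // already explored
    const auto it = g.find(file);
    if (it == g.end()) return false;                 // file includes nothing
    for (const auto& inc : it->second) {
        if (inc == header || dependsOn(g, inc, header, visited)) return true;
    }
    return false;
}

// Lists the cpp files that must be recompiled when `header` changes.
std::vector<std::string> toRecompile(const IncludeGraph& g,
                                     const std::string& header) {
    std::vector<std::string> result;
    for (const auto& entry : g) {
        const std::string& file = entry.first;
        const bool isCpp = file.size() > 4 &&
                           file.compare(file.size() - 4, 4, ".cpp") == 0;
        std::set<std::string> visited;
        if (isCpp && dependsOn(g, file, header, visited)) result.push_back(file);
    }
    return result;
}

// Scheme from the guidelines: every cpp file includes img_glob.h,
// which in turn pulls in every header of the package.
IncludeGraph umbrellaScheme() {
    return {
        {"img_glob.h", {"img_int.h", "img_Cache.h"}},
        {"img_int.h", {"img_Pixel.h", "img_Reader.h"}},
        {"img_Pixel.cpp", {"img_glob.h"}},
        {"img_Reader.cpp", {"img_glob.h"}},
        {"img_Cache.cpp", {"img_glob.h"}},
    };
}

// Comparison layout: each file includes only the headers it really uses.
IncludeGraph directScheme() {
    return {
        {"img_Reader.h", {"img_Pixel.h"}},
        {"img_Pixel.cpp", {"img_Pixel.h"}},
        {"img_Reader.cpp", {"img_Reader.h"}},
        {"img_Cache.cpp", {"img_Cache.h"}},
    };
}

// Self-check: a change to the private header img_Cache.h rebuilds every
// cpp file under the umbrella scheme, but only one under the direct scheme.
void checkRecompilation() {
    assert(toRecompile(umbrellaScheme(), "img_Cache.h").size() == 3);
    assert(toRecompile(directScheme(), "img_Cache.h").size() == 1);
}
```

In this model the umbrella headers make every cpp file of the package depend on every header, which is exactly why a single change triggers a full rebuild of the package.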
This rigid and simplistic way of organizing the code also runs into problems caused by the actual dependencies between code elements: you can't simply put the include directives for all the headers of a package in a separate header file without looking at the real dependencies. For example, if a class B needs the declaration of a class A in order to compile, then the header of A must come before the header of B in the list. This wasn't mentioned at all in those code organization guidelines. They neglected it because the Java compiler resolves such ordering problems by itself, whereas the C++ compiler can't, since it reads each source file only once, from top to bottom.
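The ordering constraint can be seen in a few lines. In this sketch the member functions `id` and `idOf` are hypothetical, added only to give the classes something to do. The definition of `A` has to appear before `B`, because `B` derives from it; swapping the two class definitions makes the code fail to compile. A forward declaration is enough only where a class is used by reference or pointer.

```cpp
#include <cassert>

// Forward declaration: enough to declare references or pointers to B.
class B;

// A must be completely defined before B, because B derives from A.
// In terms of headers: A's header must be included before B's header.
class A {
public:
    virtual ~A() = default;
    virtual int id() const { return 1; }
    int idOf(const B& b) const;  // needs only the forward declaration of B
};

class B : public A {
public:
    int id() const override { return 2; }
};

// This definition calls a member of B, so it must come after B is complete.
inline int A::idOf(const B& b) const { return b.id(); }

// Small self-check of the behavior.
inline void checkOrder() {
    A a;
    B b;
    assert(a.id() == 1);
    assert(a.idOf(b) == 2);
}
```

A single umbrella header that lists the package headers in arbitrary order ignores exactly this kind of constraint.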
For another project I developed a new way of organizing a C++ code base, still using the concept of packages but reducing the complexity of the build process almost to the minimum. In my experience this organization is much more convenient. It provides:
- a natural package structure for classes and files
- a simple way to manage package properties
- a clear way of marking classes and files as belonging to a package
- minimal dependencies among source files
- a minimal number of files that need to be recompiled when a header file is modified
- minimal dependencies among packages
- faster builds
- a robust set of rules for checking that the organization is respected
- the possibility of making both DLLs and static libraries from the same set of source files
- simple ways to find the files of a package
- simple include directives inside source files
Furthermore, conformance to this solution can easily be checked in the code, and it is possible to build tools that retrieve information about the packages or check the files against these rules.
When a code base is very large, build times become quite long and the machine running the compiler is never fast enough, so one has to minimize the number of files that must be recompiled when something changes in a header file.
This solution is described thoroughly in my book "C++ Physical Design", which also addresses other important problems of C++ development and proposes solutions to them. You can download some free chapters from here.