Skip to content

5. The Compilation Process

Solutions to the Exercises

Alternative Include Mechanisms

C++ provides two different ways of including a header file in a source file. To include the C++ standard library for handling input and output on the terminal, we have used the following #include directive with angular brackets <...>.

#include <iostream>

But to get the declaration of the function gauss_sum in the last project, we have chosen to use another variant with double quotes "...".

#include "gauss_sum.hpp"

Both of these variants are very similar. If the given file could be found, they take the content of it and insert it at the exact position of the #include directive. The subtle difference of these two variants is the search algorithm that has to look for the given file. By using tags <...> in the form #include <...>, the algorithm tries to find the given file in one of the standard include paths of the compiler. By using double quotes "...", the algorithm will first try to find the file in the current directory with respect to the source file and if it could not find the given file, it looks it up in the standard include paths.

For this course, we do not want to think about two different alternatives for including files. As a consequence, in the future, we will always use #include directives with angular brackets. This will force us to specify a file path relative to some standard include path of a project. When reading source files, this makes it clearer to us what the specific dependencies of a source file are and where those files can be found. Furthermore, this inclusion scheme generalizes the usage of your own code in form of a library. Shortly, it will be explained how to add more standard include paths to the compiler.

Please note, the file extension of a header or source file is not important for the C++ compiler and can be changed arbitrarily. The standard file iostream even is a C++ header file with no additional extension. For the beginning, this may be too much freedom left to the programmer. Hence, we will stick to a typical standard naming scheme. Every source file will get the .cpp extension and every header file will get the .hpp extension.

Adding a Factorial Function

Implementing a function, that computes the factorial of a natural number, should be done the same way we have implemented the functions gauss_sum and gauss_sum_eq. Therefore we will start with the declaration in the file gauss_sum.hpp. The factorial of given natural number is again a natural number. Accordingly, the declaration should look like the following.

// gauss_sum.hpp
int gauss_sum(int n);
int gauss_sum_eq(int n);
int factorial(int n);

Now we copy the declaration into the gauss_sum.cpp source file and will add the implementation. Please note, the computation of the factorial function is extremely similar to the computation of the Gauss sum. The only difference is that instead of an addition with a starting value of zero we have to use a multiplication with a starting value of one.

// gauss_sum.cpp
int gauss_sum(int n) {
  int result = 0;
  for (int i = 1; i <= n; ++i) {
    result = result + i;
  }
  return result;
}

int gauss_sum_eq(int n) {  //
  return n * (n + 1) / 2;
}

int factorial(int n){
  int result = 1;
  for (int i = 1; i <= n; ++i) {
    result = result * i;
  }
  return result;
}

To completely fulfill the task of the exercise, we have to call the function in the main.cpp source file and output its computation to the terminal.

// main.cpp
#include <iostream>
#include "gauss_sum.hpp"

using namespace std;

int main() {
  cout << "Give me a number: \n";
  int n;
  cin >> n;

  for (int i = 1; i <= n; ++i) {
    cout << i << "! = " << factorial(i) << '\n';
  }
}

Compilation Process and Stages

Scheme of Compiler Process

Click on the image to get the PDF.

For efficiently dealing with multiple files in a project, we will now take a deeper look into the actual compilation process and the different stages that are involved when we are calling the compiler on the command line. The figure shown above demonstrates the compilation process for a project with two source files. Possibly, they are both including some other header files, but as we have said, header files are not compiled but only included.

The first stage of the compiler process is called the preprocessor. The preprocessor acts without knowledge of the C++ syntax and only interchanges text strings based on the directives beginning with #. For us, this means that every #include directive is processed by the preprocessor. For every of those directives, the preprocessor simply includes the content of the given file without knowing if it is syntactically correct or not.

After a source file has been preprocessed, the actual compiling of a source file will be done independently of every other source file. Therefore, after this step, external structures or functions which were not defined inside the current file are not yet resolved and kept as so-called symbols. The resulting files are typically called object files and already contain most of the binary code, the final executable will consist of, together with a few symbols to externally defined functions and structures.

The last stage, called the linker, then resolves all symbols left and creates the actual binary file. Please note, that only the linker needs all object files to put them together in one larger binary file. The other stages can be done in parallel.

Setting Include Paths

Using two different inclusion schemes, has proven to be error-prone in my opinion. Hence, we will strive for simplicity and only want to use the library scheme.

#include "gauss_sum.hpp"
#include <gauss_sum.hpp>

To validly transform the first directive to the second, we have to add the current path of the project directory to the standard include paths of the compiler when compiling the source files. For GCC (the C++ compiler suite that we are currently using), this can be done on the command line by appending the path to the -I flag. Because we do not use any subfolders in our project, we want to add the current directory where we are calling the compiler. Please remember, the current directory is typically referenced by a single dot ..

$ g++ -o gauss main.cpp gauss_sum.cpp -I.

From now on, every path given in an include directive has to be relative to the project root and not to the file itself.

Errors in Different Stages

Based on the knowledge of the different stages involved in the compiling process, it is now possible to analyze error messages more accurately when we as programmers have done mistakes. By determining which of the stages throws an error, it is more likely to know or at least closer guess where the origin of the respective error lies.

Manual Compiling Process

We can also get a deeper understanding of the compiling process by manually comprehending it stages on the command line. First, we will only run the preprocessor and the compiling stage for every source file to generate the respective object files by providing the flag -c.

$ ls
gauss_sum.hpp  gauss_sum.cpp  main.cpp
$ g++ -c main.cpp -I.
$ g++ -c gauss_sum.cpp
$ ls 
gauss_sum.hpp  gauss_sum.cpp  gauss_sum.o  main.cpp  main.o

To assemble the object files in a binary file, we the call the following command.

$ g++ -o gauss main.o gauss_sum.o
$ ls
gauss  gauss_sum.hpp  gauss_sum.cpp  gauss_sum.o  main.cpp  main.o
$ ./gauss
...

This process takes up a lot more work on the command line and it makes no real sense to do this kind of thing manually for small projects. But for very large projects where compiling all source files can take a few minutes up to hours, it would not be feasible to always call the simpler command g++ -o gauss main.cpp gauss_sum.cpp -I. because this would always force the compiler to recompile every source file. With the process given above, if we would change a source file, let us say gauss_sum.cpp, we would only have to recompile the gauss_sum.cpp file and then link everything again into an executable. For large projects, this idea is able to tremendously speed up the compiling process.

$ g++ -c gauss_sum.cpp
$ g++ -o gauss main.o gauss_sum.o

But even then this process is a cumbersome task. We will later see how in this case build systems can simplify our life.

Last but not least, for the more experienced reader, we can of course tweak the above compiling process by using wildcards in the command line. If you do not want to use external build systems or you have no idea about them, this variant may take you further ahead the road. However, you should have really strong reasons not to use a build system.

$ g++ -c *.cpp -I.
$ g++ -o gauss *.o
$ # Make a change to 'gauss_sum.cpp' file
$ g++ -c gauss_sum.cpp
$ g++ -o gauss *.o

Last update: October 22, 2020