In C++, it’s very easy to end up with a project that takes a very long time to build. My personal record is a unit test (in the form of a single .cpp file) that took 4.5 minutes to compile.
Fortunately, compilation speed can be debugged. When compiling a single file, you need to pass the -ftime-trace flag:
```
-ftime-trace
    Turn on time profiler. Generates JSON file based on output filename.
    Results can be analyzed with chrome://tracing or Speedscope App for
    flamegraph visualization.
-ftime-trace-granularity=<arg>
    Minimum time granularity (in microseconds) traced by time profiler
```
The command could look like this:
```
clang++ main.cpp -c -ftime-trace -ftime-trace-granularity=50
```
The resulting main.json file can be visualized in 🔬Speedscope. (There is a GIF in its GitHub repo showing an example visualization.)
What the compiler does:
- ⚙️ Before starting compilation, if the -ftime-trace flag is set, clang calls the llvm::timeTraceProfilerInitialize function.
- ⚙️ This function initializes an llvm::TimeTraceProfiler object.
- ⚙️ When an event starts, the llvm::TimeTraceProfiler::begin method is called to remember the start time.
- ⚙️ When the event ends, the llvm::TimeTraceProfiler::end method is called to complete the event record.
- ⚙️ As you can see from the code, a stack is used because events are nested inside each other (for example, there can be a “parse class” event inside a “file compilation” event).
- ⚙️ After the file is compiled, the llvm::TimeTraceProfiler::write method is called to produce the JSON file.
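Putting the pieces from this list together, here is a minimal sketch of such a profiler. This is not the actual LLVM code (that lives in llvm/lib/Support/TimeProfiler.cpp): it is just an illustration of the same idea, with a stack for nested events, a duration filter (the granularity, explained below), and a dump in roughly the Chrome trace event format that -ftime-trace produces.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Toy version of the mechanism described above: begin() pushes an event onto a
// stack, end() pops it and keeps it only if it lasted long enough, write()
// dumps everything as Chrome-trace-style JSON.
class TinyTimeTraceProfiler {
  using Clock = std::chrono::steady_clock;

  struct Entry {             // an event that has started but not yet ended
    std::string Name;
    Clock::time_point Start;
  };
  struct Event {             // a finished event that passed the duration filter
    std::string Name;
    long long StartUs;
    long long DurationUs;
  };

  std::vector<Entry> Stack;  // in-flight events are nested, hence a stack
  std::vector<Event> Events;
  Clock::time_point ProfileStart = Clock::now();
  long long GranularityUs;

public:
  explicit TinyTimeTraceProfiler(long long GranularityUs = 500)
      : GranularityUs(GranularityUs) {}

  void begin(std::string Name) {
    Stack.push_back({std::move(Name), Clock::now()});
  }

  void end() {
    using namespace std::chrono;
    Entry E = std::move(Stack.back());
    Stack.pop_back();
    long long Dur = duration_cast<microseconds>(Clock::now() - E.Start).count();
    if (Dur < GranularityUs)   // too short: the event is not recorded at all
      return;
    long long Start = duration_cast<microseconds>(E.Start - ProfileStart).count();
    Events.push_back({std::move(E.Name), Start, Dur});
  }

  void write(std::FILE *Out) const {
    std::fprintf(Out, "{\"traceEvents\":[");
    for (std::size_t I = 0; I < Events.size(); ++I)
      std::fprintf(Out,
                   "%s{\"ph\":\"X\",\"pid\":1,\"tid\":0,\"name\":\"%s\","
                   "\"ts\":%lld,\"dur\":%lld}",
                   I ? "," : "", Events[I].Name.c_str(), Events[I].StartUs,
                   Events[I].DurationUs);
    std::fprintf(Out, "]}\n");
  }
};

int main() {
  TinyTimeTraceProfiler Profiler(/*GranularityUs=*/50);
  Profiler.begin("Frontend");
  Profiler.begin("ParseClass");                                // nested event
  std::this_thread::sleep_for(std::chrono::milliseconds(1));   // pretend work
  Profiler.end();   // ParseClass
  Profiler.end();   // Frontend
  Profiler.write(stdout);
}
```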
By default, the -ftime-trace-granularity parameter equals 500 (500 microseconds). Not all events are recorded, only sufficiently “long” ones that lasted more than 500µs; here is the code that checks it.
In the code, these methods are not called “manually”: the standard RAII idiom is used in the form of the llvm::TimeTraceScope structure. As you can see, the event “begins” when the constructor is called and “ends” when the destructor is called. (If compilation was invoked without the -ftime-trace flag, this object does nothing.)
Here is an example: this is how the time for template instantiation (which happens after the file is parsed) is measured: PerformPendingInstantiations.
While templates are being instantiated, all sorts of “nested” events are recorded, for example, InstantiateFunction.
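For comparison, here is roughly what that pattern looks like in code that uses LLVM’s API from llvm/Support/TimeProfiler.h directly. This is a standalone sketch, not clang’s actual source: the scope names simply mirror the clang events mentioned above, the names “demo”/“demo.json” and instantiateTemplates() are made up for the example, and exact function signatures may differ between LLVM versions.

```cpp
// Sketch of using LLVM's time profiler directly (link against LLVMSupport).
// TimeTraceScope and the timeTraceProfiler* functions are the real API, but
// check your LLVM version's TimeProfiler.h for the exact signatures.
#include "llvm/Support/TimeProfiler.h"
#include "llvm/Support/raw_ostream.h"

#include <system_error>

static void instantiateTemplates() {
  // Constructor = event begins, destructor = event ends (RAII).
  llvm::TimeTraceScope Scope("PerformPendingInstantiations");

  for (int I = 0; I < 3; ++I) {
    // Nested scopes become child events on the flame graph.
    llvm::TimeTraceScope Inner("InstantiateFunction");
    // ... real work would happen here ...
  }
}

int main() {
  // Granularity in microseconds, the same knob as -ftime-trace-granularity;
  // 0 keeps every event, however short.
  llvm::timeTraceProfilerInitialize(/*TimeTraceGranularity=*/0, "demo");

  instantiateTemplates();

  std::error_code EC;
  llvm::raw_fd_ostream OS("demo.json", EC);
  if (!EC)
    llvm::timeTraceProfilerWrite(OS);
  llvm::timeTraceProfilerCleanup();
}
```

Opening the resulting demo.json in Speedscope should show the same kind of flame graph, just for your own code.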
That’s how the compiler supports flame graphs in a simple way 🙂
In my experience of observing compilation speed, the compiler “frontend” (which parses a file into an AST) takes 3-20 times longer than the “backend” (which translates the AST into LLVM IR, runs optimizations, and emits the binary). The main reason for this imbalance is the huge size of the source file after all the #includes are expanded (this is true for almost all modern C++ projects).
Based on this data, it becomes clear what needs to be fixed to speed up compilation. However, speeding up the compilation is a completely different story…