Home How compilers implement NRVO?
Post
Cancel

How compilers implement NRVO?

The meaning of NRVO

If you don’t know what NRVO means, there is a nice documentation on cppreference.

The NRVO (Named Return Value Optimization) is a copy elision optimization. Copy elision is a situation where the code doesn’t call copy/move constructors because the object is being created in the place where it was supposed to be.

The RVO (Return Value Optimization) is a simpler optimization:

1
2
3
4
std::string foo1() {
    return std::string("bar");
}
std::string f = foo1();

The compiler finds out that there is no need in calling copy/move ctor, it just needs to allocate memory for std::string f on the stack and build the object in-place.

Even if the copy/move ctor has side effects, it won’t be called anyway. The same applies if the ctor was = delete or was declared without definition or is in the private section of the class - it doesn’t affect the RVO.

The RVO is an old optimization, every compiler is capable to do it, and since C++17 it is mandatory (in certain circumstances).

The NRVO optimization is a bit harder, but is not mandatory - the compiler might generate a sub-optimal code. This is a rule - if every return xxx; inside the function returns the same xxx, then the compiler may not generate calls to a copy/move ctor:

1
2
3
4
5
6
7
8
9
10
std::string foo2() {
    std::string y = "I'm a redundant string";
    std::string x = "sample text";
    if (x.size() % 2 == 0) {
        return x;
    }
    x += "xxx";
    return x;
}
std::string f = foo2(); // no copy/move was involved here

As we can see, every return returns the same value, therefore NRVO is legal. If we had a return y; somewhere or a return std::move(x);, then NRVO was not legal.

How Clang calculates whether the NRVO is legal?

Clang holds the “stack of scopes” while parsing the source file. A scope can be nested in another scope, so scopes form a “scope tree”.

There is a dedicated scope for every function, class, if-expression, for-loop, and so on. The most high-level scope is the translation unit’s scope.

A function’s scope has one of the three states related to the NRVO:

  1. There is no NRVO candidate variable.
  2. There is an NRVO candidate variable (the scope holds a pointer to it).
  3. There is more than one NRVO candidates -> NRVO is prohibited.

When the scope is fully parsed, it notifies the parent scope about its NRVO state. If after parsing the scope it comes out that there is exactly one NRVO candidate, then the NRVO works. If there is more than one NRVO candidates, then the NRVO doesn’t work.

There is the if constexpr expression since C++17, and Clang struggles to have the optimal NRVO within these expressions. The NRVO is calculated correctly for simple cases where the if-constexpr’s body is discarded because of a false compile-time value:

1
2
3
4
5
6
7
8
9
template<bool B>
std::string foo() {
    std::string y = "y";
    std::string x = "x";
    if constexpr (1 + 2 == 4) {
        return y;
    }
    return x;
}

The NRVO also calculates correctly when the body doesn’t get discarded:

1
2
3
4
5
6
7
8
9
10
template<bool B>
std::string foo() {
    std::string y = "y";
    std::string x = "x";

    if constexpr (1 + 2 == 3) {
        return y;
    }
    return x;
}

But Clang struggles with this code:

1
2
3
4
5
6
7
8
9
template<bool B>
std::string foo() {
    std::string y = "y";
    std::string x = "x";
    if constexpr (B) {
        return y;
    }
    return x;
}

The NRVO is calculated when analyzing scopes, but not particular function’s instantiations. So Clang has to treat context-dependent if constexprs as if their body will not be discarded. Therefore we get an optimal code for foo<true> and a suboptimal code for foo<false>.

Link to godbolt (code’s author is Antony Polukhin, 2021)

I have reviewed the Clang’s code and in my opinion it’s almost impossible to refactor the NRVO calculation from analyzing scopes to analyzing the AST (Abstract Syntax Tree), it’s better to spend effort to something more important. After all, the NRVO is not a mandatory optimization, so people are okay with suboptimal code for if constexprs.

This post is licensed under CC BY 4.0 by the author.

Alternative operator representations

Immutable classes for API data (a practical advice)