
char, but not char 🤔

When you explore a file format, you may come across file signatures. A file signature is a sequence of bytes that is used to identify or verify the content of a file. It often takes the form of a few “magic bytes” (unique to the format) at the beginning of the file.

There is a list of file signatures for some formats on Wikipedia. The bytes at the beginning of the file are often human-readable, meaning that they encode Latin letters, for example Rar!, LZIP, OggS.
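To make the idea concrete, here is a minimal sketch of a byte-by-byte signature check. The helper name `HasSignature` and its parameters are hypothetical, chosen only for illustration; the sample bytes are the start of a classic RAR header.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if the first `n` bytes of `data`
// are equal to the first `n` characters of `magic`.
constexpr bool HasSignature(const unsigned char* data, std::size_t size,
                            const char* magic, std::size_t n) {
    if (size < n) {
        return false;  // buffer is too short to hold the signature
    }
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] != static_cast<unsigned char>(magic[i])) {
            return false;
        }
    }
    return true;
}

// First bytes of a RAR archive: 'R' 'a' 'r' '!' 0x1A 0x07 0x00.
static constexpr unsigned char kRarHeader[] = {0x52, 0x61, 0x72, 0x21,
                                               0x1A, 0x07, 0x00};

static_assert(HasSignature(kRarHeader, sizeof(kRarHeader), "Rar!", 4),
              "the buffer starts with the RAR signature");
```

This version compares bytes one at a time, so it does not depend on how an integer is laid out in memory.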

Let’s look at an example class that takes 4 bytes (packed into an int32_t) and can check whether they are the file signature of a RAR-compressed file:

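#include <cstdint>

class BinarySignature {
public:
    BinarySignature(std::int32_t value)
        : Value_{value}
    {}

    // Returns the raw 4-byte signature as an integer.
    std::int32_t AsInt() const {
        return Value_;
    }

    // Compares against a multi-character literal;
    // its numerical value is implementation-defined.
    bool IsRar() const {
        return Value_ == 'Rar!';
    }

private:
    std::int32_t Value_;
};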

Do you see something creepy? Yes, this compares an int with a multi-character literal!

Value_ == 'Rar!'

The 'X' literal is supported everywhere and has type char.

The 'XXXXX' literal has type int, but it is only conditionally supported: the compiler has the right to reject literals of this kind. Its numerical value is also implementation-defined, that is, it’s up to the compiler to decide which int value to generate from the literal.
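The difference in types can be checked at compile time. The sketch below assumes a compiler that accepts multi-character literals; GCC, for instance, compiles it but may print a -Wmultichar warning.

```cpp
#include <type_traits>

// A single-character literal has type char in C++.
static_assert(std::is_same<decltype('R'), char>::value,
              "'R' is a char");

// A multi-character literal (where the compiler supports it) has type int.
static_assert(std::is_same<decltype('Rar!'), int>::value,
              "'Rar!' is an int");

// The sizes differ accordingly.
static_assert(sizeof('R') == 1, "a char occupies one byte");
static_assert(sizeof('Rar!') == sizeof(int), "same size as an int");
```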

Most C++ compilers support multi-character literals and build the int value by treating the characters as consecutive bytes of that int. GCC and Clang, for example, put the first character into the most significant byte, so 'Rar!' becomes 0x52617221.
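If you would rather not depend on implementation-defined behaviour, the same packing can be written out by hand. This is only a sketch: `MakeTag` and `TagFromBytes` are hypothetical helpers, not part of the class above, and they assume the "first character in the most significant byte" order described for GCC and Clang.

```cpp
#include <cstdint>

// Hypothetical helper: packs four bytes into an int32_t,
// with the first byte in the most significant position.
constexpr std::int32_t MakeTag(unsigned char a, unsigned char b,
                               unsigned char c, unsigned char d) {
    return static_cast<std::int32_t>(
        (static_cast<std::uint32_t>(a) << 24) |
        (static_cast<std::uint32_t>(b) << 16) |
        (static_cast<std::uint32_t>(c) << 8) |
        static_cast<std::uint32_t>(d));
}

// Hypothetical helper: builds the tag from the first four bytes of a buffer
// in file order, regardless of the endianness of the machine.
constexpr std::int32_t TagFromBytes(const unsigned char* p) {
    return MakeTag(p[0], p[1], p[2], p[3]);
}

// In ASCII: 'R' = 0x52, 'a' = 0x61, 'r' = 0x72, '!' = 0x21.
static_assert(MakeTag('R', 'a', 'r', '!') == 0x52617221,
              "first character ends up in the most significant byte");
```

Note that copying the first four file bytes straight into an int32_t on a little-endian machine gives the reversed order (0x21726152), so the value handed to a comparison like the one above has to be assembled in the same order the compiler uses for the literal.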

Link to godbolt.

So now you know about this C++ feature and its cool use case: working with human-readable signatures in binary files. 🙂

This post is licensed under CC BY 4.0 by the author.
