
The compiled Qt translation file (*.qm) is generated by Qt Linguist and
holds all the translation data that a Qt application can use for a
single language.

I have written a Kaitai Struct YAML schema for this format.

Conventions used in this document

-   When left unspecified, a number is signed.
-   When left unspecified, a string is encoded in UTF-8.
-   Strings with a defined size may contain null bytes. Strings without
    a defined size are null-terminated.

Structure

The file starts with 16 bytes of a magic header, then is structured in
blocks. The number of blocks is not stored anywhere; it can only be
determined by reading blocks until the end of the file.

    +-------+---------+---------+- - -+---------+
    | Magic | Block 1 | Block 2 |     | Block N |
    +-------+---------+---------+- - -+---------+

The magic header is as follows:

    3C B8 64 18 CA EF 9C 95 CD 21 1C BF 60 A1 BD DD
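
As an illustration, a reader could verify the magic header before doing
anything else. This is only a minimal sketch; the names QM_MAGIC and
has_qm_magic are made up for this example.

```python
# The 16 magic bytes listed above.
QM_MAGIC = bytes.fromhex("3CB86418CAEF9C95CD211CBF60A1BDDD")


def has_qm_magic(data: bytes) -> bool:
    """Return True if data begins with the 16-byte QM magic header."""
    return data[:len(QM_MAGIC)] == QM_MAGIC
```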

Block

    "Block:"
    +-----+--------------+----------------+
    | Tag | Block length | Block contents |
    +-----+--------------+----------------+

Tag (unsigned byte)

    One of the following:

    0x2F
        Contexts block

    0x42
        Hashes block

    0x69
        Messages block

    0x88
        Numerus Rules block

    0x96
        Dependencies block

    0xA7
        Language block

Block length (unsigned int32)
    The size of the block’s contents, measured in bytes.

Block contents
    The contents of each block depend on the tag.

Each block tag should appear at most once in a single file. There
should always be a Hashes block.
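
The block layout above could be walked with a loop like the following
sketch. It assumes that multi-byte integers are stored big-endian, which
this document does not state explicitly; the function name read_blocks
and the names in BLOCK_NAMES are hypothetical.

```python
import struct

MAGIC_SIZE = 16

# Block tags as listed above; the names are only labels for this sketch.
BLOCK_NAMES = {
    0x2F: "contexts",
    0x42: "hashes",
    0x69: "messages",
    0x88: "numerus_rules",
    0x96: "dependencies",
    0xA7: "language",
}


def read_blocks(data: bytes) -> dict[str, bytes]:
    """Split a QM file into its blocks, keyed by block name."""
    blocks = {}
    pos = MAGIC_SIZE  # skip the magic header
    while pos < len(data):
        # Unsigned byte tag, then unsigned int32 length (big-endian assumed).
        tag, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        name = BLOCK_NAMES.get(tag, f"unknown_0x{tag:02X}")
        blocks[name] = data[pos:pos + length]
        pos += length
    return blocks
```

A caller would typically check that "hashes" is present in the result
before trying to look up any message.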

Contexts block

When the QM file has been generated using lrelease -compress, the
messages in the file are compressed by their common prefixes: their
hash, their hash and context, or their hash, context and source text.
The context prefix will be stored in a hash table in the Contexts block,
and the context and source text will only be mentioned in the attributes
of the first message that has this context or source text.

This block cannot exceed 131072 bytes in size; if this limit is
exceeded, lrelease behaves as if -compress was not set and the contexts
are saved in the Messages block.

    "Block contents (Contexts block):"
    +------------+--------------+
    | Hash table | Context pool |
    +------------+--------------+

Hash table

The hash table maps a hash of the context to an offset where the context
might be found in the context pool.

    "Hash table:"
    +--------+----------+----------+- - -+----------+
    | Length | Offset 1 | Offset 2 |     | Offset N |
    +--------+----------+----------+- - -+----------+

Length (unsigned int16)
    Length of the hash table.

Offset (unsigned int16)
    Offset, in bytes, within the context pool, at which the search for
    the context’s string should start. Note that the context string
    will probably not be found exactly at this offset, but further
    along in the pool. All offsets should be multiples of 2. An offset
    of 0 means this hash does not exist in this file.

Note that the hash table may have more entries than there are contexts,
in which case many offsets are set to zero.
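
A possible way to split the Contexts block into its hash table and
context pool is sketched below. It assumes big-endian integers and
interprets Length as the number of offsets rather than a byte count;
both are assumptions, and read_context_hash_table is a hypothetical
name.

```python
import struct


def read_context_hash_table(contents: bytes) -> tuple[list[int], bytes]:
    """Return (offsets, context_pool) from the contents of a Contexts block."""
    # Length: unsigned int16, assumed to be the number of offsets.
    (length,) = struct.unpack_from(">H", contents, 0)
    # Each offset is an unsigned int16; 0 marks an unused slot.
    offsets = list(struct.unpack_from(f">{length}H", contents, 2))
    # Everything after the hash table is the context pool.
    pool = contents[2 + 2 * length:]
    return offsets, pool
```

Counting the non-zero entries of the returned list gives the number of
hash table slots that are actually in use.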

Context pool

    "Context pool:"
    +--------+-----------+-----------+- - -+-----------+
    | 0x0000 | Context 1 | Context 2 |     | Context N |
    +--------+-----------+-----------+- - -+-----------+

As offset 0 in the hash table means that the context does not exist in
this file, the context at offset 0 in the context pool is set to 0x0000.

Context

    "Context:"
    +--------+---------+---------+
    | Length | Context | Padding |
    +--------+---------+---------+

Length (unsigned byte)
    The length of the context string in bytes.

Context (string)
    The context name, truncated to at most 255 bytes, since the length
    is stored in a single byte.

Padding (optional unsigned byte)
    An extra null byte (0x00) may be added to ensure the total size of
    this context entry is a multiple of 2.

Hashes block

The Hashes block links hashes of source text and comment strings to
pointers to messages.

    "Block contents (Hashes block):"
    +--------+--------+- - -+--------+
    | Hash 1 | Hash 2 |     | Hash N |
    +--------+--------+- - -+--------+

Hash

    "Hash:"
    +------+--------+
    | Hash | Offset |
    +------+--------+

Hash (unsigned int32)
    A hash of the bytes represented by the concatenation of the source
    text and of the comment strings of a single message. This can be
    used for faster lookup of translations, since the source text and
    comment are defined in the source code.

Offset (unsigned int32)
    Offset, in bytes, of the start of the message designated by this
    hash, starting from the beginning of the contents of the Messages
    block.
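
The Hashes block can be read as a flat list of pairs, as in this sketch.
Big-endian integers are assumed, and the names read_hashes and
find_offsets are made up for the example. The hash function itself is
not covered here.

```python
import struct


def read_hashes(contents: bytes) -> list[tuple[int, int]]:
    """Return the (hash, offset) pairs stored in a Hashes block."""
    # Each entry is two unsigned int32 values (big-endian assumed).
    return list(struct.iter_unpack(">II", contents))


def find_offsets(entries: list[tuple[int, int]], wanted: int) -> list[int]:
    """Return the Messages block offsets of every entry with this hash."""
    # Several messages may share a hash, so all candidates are returned.
    return [offset for value, offset in entries if value == wanted]
```

Each returned offset is a position inside the contents of the Messages
block at which a message can be read.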

Messages block

    "Block contents (Messages block):"
    +-------------+-------------+- - -+-------------+
    | Attribute 1 | Attribute 2 |     | Attribute N |
    +-------------+-------------+- - -+-------------+

There is no exact structure for a message: attributes should be read
into a list until an End attribute is reached, meaning all the
attributes in this list are part of the message.

Messages are usually looked up using the Hashes block first, rather than
reading through the Messages block sequentially.

Attribute

Attributes have no official name; they have been named attributes as
they are the various properties of a message.

    "Attribute:"
    +-----+--------------------+
    | Tag | Attribute contents |
    +-----+--------------------+

Tag (unsigned byte)

    One of the following:

    1.  End
    2.  Source text (UTF-16)
    3.  Translation
    4.  Context (UTF-16)
    5.  Hash (obsolete)
    6.  Source text
    7.  Context
    8.  Comment
    9.  Unknown (obsolete)

Attribute contents
    The contents of each attribute depend on the tag.

-   There should only be one Comment attribute.
-   There should only be one of either a Context or a Context (UTF-16)
    attribute.
-   There should only be one of either a Source text or a Source text
    (UTF-16) attribute.
-   There may be zero or more Translation attributes.
-   There must be one End attribute.
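
Reading one message then amounts to looping over attributes until the
End tag, using the attribute layouts given in the sections below. The
following is a sketch only: it assumes big-endian integers and
big-endian UTF-16, neither of which is stated in this document, and the
constant and function names are hypothetical.

```python
import struct

# Tag values follow the numbered list above.
(END, SOURCE_TEXT_16, TRANSLATION, CONTEXT_16, HASH,
 SOURCE_TEXT, CONTEXT, COMMENT, UNKNOWN) = range(1, 10)

UTF16_TAGS = {SOURCE_TEXT_16, TRANSLATION, CONTEXT_16}
UTF8_TAGS = {SOURCE_TEXT, CONTEXT, COMMENT}


def read_message(messages: bytes, offset: int):
    """Read attributes from offset until End; return (attributes, next_offset)."""
    attributes = []
    pos = offset
    while True:
        tag = messages[pos]
        pos += 1
        if tag == END:
            return attributes, pos
        if tag in UTF16_TAGS:
            # Signed int32 length; a negative value means an empty string.
            (length,) = struct.unpack_from(">i", messages, pos)
            pos += 4
            length = max(length, 0)
            value = messages[pos:pos + length].decode("utf-16-be")
            pos += length
        elif tag in UTF8_TAGS:
            # Unsigned int32 length followed by that many bytes.
            (length,) = struct.unpack_from(">I", messages, pos)
            pos += 4
            value = messages[pos:pos + length].decode("utf-8")
            pos += length
        elif tag == HASH:
            # Obsolete hash: a single unsigned int32.
            (value,) = struct.unpack_from(">I", messages, pos)
            pos += 4
        elif tag == UNKNOWN:
            # Unknown obsolete attribute: a single byte.
            value = messages[pos]
            pos += 1
        else:
            raise ValueError(f"unknown attribute tag {tag}")
        attributes.append((tag, value))
```

The offset passed in would normally come from the Hashes block; the
returned next_offset allows reading the following message when going
through the Messages block sequentially.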

End attribute

Attributes with the End tag signify the end of the message. They have no
contents.

Source text (UTF-16) attribute

    "Attribute contents (Source text (UTF-16) attribute):"
    +--------+-------------+
    | Length | Source text |
    +--------+-------------+

Length (int32)
    Length of the string, in bytes. Should always be a multiple of 2,
    unless it is negative, which indicates an empty string.

Source text (UTF-16 string)
    The source text for this message. If the translations are ID-based,
    this will be the ID of this translation, and the context and comment
    will always be empty.

Translation attribute

    "Attribute contents (Translation attribute):"
    +--------+-------------+
    | Length | Translation |
    +--------+-------------+

Length (int32)
    Length of the string, in bytes. Should always be a multiple of 2,
    unless it is negative, which indicates an empty string.

Translation (UTF-16 string)
    The translated text for this message.

Context (UTF-16) attribute

    "Attribute contents (Context (UTF-16) attribute):"
    +--------+---------+
    | Length | Context |
    +--------+---------+

Length (int32)
    Length of the string, in bytes. Should always be a multiple of 2,
    unless it is negative, which indicates an empty string.

Context (UTF-16 string)
    Name of the context in which this message appears. This is usually a
    Qt class name.

Hash attribute

    "Attribute contents (Hash attribute):"
    +------+
    | Hash |
    +------+

Hash (unsigned int32)
    Hash of the message. This is now only stored in the separate Hashes
    block.

Source text attribute

    "Attribute contents (Source text attribute):"
    +--------+-------------+
    | Length | Source text |
    +--------+-------------+

Length (unsigned int32)
    Length, in bytes, of the source text.

Source text (string)
    The source text for this message. If the translations are ID-based,
    this will be the ID of this translation, and the context and comment
    will always be empty.

Context attribute

    "Attribute contents (Context attribute):"
    +--------+---------+
    | Length | Context |
    +--------+---------+

Length (unsigned int32)
    Length, in bytes, of the context.

Context (string)
    Name of the context in which this message appears. This is usually a
    Qt class name.

Comment attribute

    "Attribute contents (Comment attribute):"
    +--------+---------+
    | Length | Comment |
    +--------+---------+

Length (unsigned int32)
    Length, in bytes, of the comment.

Comment (string)
    A comment left by the developer on this message, meant for
    disambiguation.

https://doc.qt.io/qt-6/i18n-source-translation.html#disambiguation

Unknown obsolete attribute

    "Attribute contents (Unknown obsolete attribute):"
    +------+
    | Byte |
    +------+

Byte (unknown, 1 byte)
    No definition known.

This attribute is not found in Qt 2.1.1, and can be found in Qt 2.2.0 as
“Obsolete 1”. It is now known as “Obsolete 2”, because the Hash
attribute became “Obsolete 1”. I cannot find any version between those
two version numbers that could tell what this attribute was for.

Numerus Rules block

Defines the rules for automatic pluralization of names in the
translation language.

    "Block contents (Numerus Rules block):"
    +------------------+------------------+- - -+------------------+
    | Rule component 1 | Rule component 2 |     | Rule component N |
    +------------------+------------------+- - -+------------------+

Rule component (unsigned byte)

    Either an integer, an arithmetic operator with optional flags, a
    logical operator or a rule separator.

    The following arithmetic operators are defined:

    0x01
        Equality operator. Followed by one integer X, means “the value
        is equal to X”.

    0x02
        Less than operator. Followed by one integer X, means “the value
        is less than X”.

    0x03
        Less than or equal operator. Followed by one integer X, means
        “the value is less than or equal to X”.

    0x04
        Between operator. Followed by two integers X and Y, means “the
        value is between X and Y”.

    The following flags can be applied to the arithmetic operators:

    0x08
        Not.

    0x10
        Modulo 10. Get the remainder of the division of the value by 10
        before applying the operator.

    0x20
        Modulo 100. Get the remainder of the division of the value by
        100 before applying the operator.

    0x40
        Leading 1000. Meaning is unclear.

    The following logical operators are defined:

    0xFD
        And.

    0xFE
        Or.

    The logical operators apply in their order of definition; “A and B
    or C and D” means “((A and B) or C) and D”.

    Finally, the rule separator is defined:

    0xFF
        New rule.
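
To make the byte values above more concrete, the sketch below splits a
Numerus Rules block into rules and decodes each operator byte into its
operator, flags and operands. It only disassembles the bytes and does
not evaluate the rules. It assumes that the operator is held in the low
three bits (mask 0x07), which is implied by the flag values but not
stated here; all names are hypothetical.

```python
OPERATORS = {0x01: "eq", 0x02: "lt", 0x03: "le", 0x04: "between"}
FLAGS = {0x08: "not", 0x10: "mod10", 0x20: "mod100", 0x40: "lead1000"}
LOGICAL = {0xFD: "and", 0xFE: "or"}
NEW_RULE = 0xFF


def disassemble_rules(rules: bytes) -> list[list]:
    """Return one list per rule, holding conditions and logical operators."""
    result = [[]]
    i = 0
    while i < len(rules):
        byte = rules[i]
        i += 1
        if byte == NEW_RULE:
            result.append([])  # start the next rule
        elif byte in LOGICAL:
            result[-1].append(LOGICAL[byte])
        else:
            # Low three bits: operator (assumed mask); higher bits: flags.
            op = OPERATORS[byte & 0x07]
            flags = [name for bit, name in FLAGS.items() if byte & bit]
            # "between" takes two integer operands, the others take one.
            count = 2 if op == "between" else 1
            operands = list(rules[i:i + count])
            i += count
            result[-1].append((op, flags, operands))
    return result
```

For example, the bytes 11 01 FD 29 0B would decode to a single rule
reading “value modulo 10 equals 1, and not (value modulo 100 equals
11)”.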

The numerus rules are applied to a numeric value to determine whether
the name associated with this value should be pluralized. Each rule is
applied one after the other, and maps to a different pluralization form.
The number of pluralization forms depends on the language.

With N pluralization forms, there should be N-1 rules. If the first rule
matches, then the second pluralization form is picked. If the second
rule matches, then the third pluralization form is picked. If no rule
matches, then the first pluralization form is picked – defined as the
singular form.

Dependencies block

    "Block contents (Dependencies block):"
    +--------------+--------------+- - -+--------------+
    | Dependency 1 | Dependency 2 |     | Dependency N |
    +--------------+--------------+- - -+--------------+

Dependency (string)
    The name of a file in the same directory as this one that this file
    depends on.

Language block

    "Block contents (Language block):"
    +---------------+
    | Language code |
    +---------------+

Language code (string)
    Holds the language code of the translation file.

References

-   Qt Linguist Manual
-   Writing Source Code for Translation
-   Source code of the QM reader and writer of Qt Linguist
-   Source code of the QTranslator, which reads from QM files to perform
    the translations within apps
-   The Qt archive, to examine the sources of older versions of Qt. Try
    reading src/corelib/kernel/qtranslator.cpp or
    tools/linguist/shared/qm.cpp.