The compiled Qt translation file (*.qm) is generated by Qt Linguist and holds all the translation data that a Qt application can use for a single language. I have written a Kaitai Struct YAML schema for this format.

Conventions used in this document

- When left unspecified, a number is signed.
- When left unspecified, a string is encoded in UTF-8.
- Strings with a defined size may contain null bytes. Strings without a defined size are null-terminated.

Structure

The file starts with a 16-byte magic header, followed by a sequence of blocks. The number of blocks is not stored anywhere; it can only be determined by reading blocks until the end of the file.

    +-------+---------+---------+- - -+---------+
    | Magic | Block 1 | Block 2 |     | Block N |
    +-------+---------+---------+- - -+---------+

The magic header is as follows:

    3C B8 64 18 CA EF 9C 95 CD 21 1C BF 60 A1 BD DD

Block

"Block:"

    +-----+--------------+----------------+
    | Tag | Block length | Block contents |
    +-----+--------------+----------------+

Tag (unsigned byte)
  One of the following:
  - 0x2F Contexts block
  - 0x42 Hashes block
  - 0x69 Messages block
  - 0x88 Numerus Rules block
  - 0x96 Dependencies block
  - 0xA7 Language block

Block length (unsigned int32)
  The size of the block’s contents, measured in bytes.

Block contents
  The contents of each block depend on the tag. There should only be one of each block tag in a single file. There should always be a Hashes block.

Contexts block

When the QM file has been generated using lrelease -compress, the messages in the file are compressed by their common prefixes: their hash, their hash and context, or their hash, context and source text. The context prefix is stored in a hash table in the Contexts block, and the context and source text are only mentioned in the attributes of the first message that has this context or source text. This block cannot exceed 131072 bytes in size; if this limit is exceeded, lrelease acts as if -compress was not set and the contexts are saved in the Messages block.
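Stepping back to the top-level layout described under Structure and Block, the magic check and the block walk could look roughly like the Python sketch here. It assumes big-endian byte order for the block length, which this document does not state, and the names `QM_MAGIC`, `BLOCK_TAGS` and `read_blocks` are made up for the example.

```python
import struct

# Magic header bytes, copied from the Structure section.
QM_MAGIC = bytes.fromhex("3CB86418CAEF9C95CD211CBF60A1BDDD")

# Block tags, copied from the Block section.
BLOCK_TAGS = {
    0x2F: "contexts",
    0x42: "hashes",
    0x69: "messages",
    0x88: "numerus_rules",
    0x96: "dependencies",
    0xA7: "language",
}


def read_blocks(data: bytes) -> dict:
    """Return a mapping of block name to raw block contents."""
    if data[:16] != QM_MAGIC:
        raise ValueError("not a QM file: bad magic header")
    blocks = {}
    pos = 16
    # There is no block count: keep reading until the end of the file.
    while pos < len(data):
        if pos + 5 > len(data):
            raise ValueError("truncated block header")
        # Assumption: the unsigned int32 length is big-endian.
        tag, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        if pos + length > len(data):
            raise ValueError("block contents run past the end of the file")
        name = BLOCK_TAGS.get(tag, f"unknown_0x{tag:02X}")
        blocks[name] = data[pos:pos + length]
        pos += length
    return blocks
```

Each block's contents are returned as raw bytes so that the block-specific layouts can be decoded separately.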
"Block contents (Contexts block):" +------------+--------------+ | Hash table | Context pool | +------------+--------------+ Hash table The hash table maps a hash of the context to an offset where the context might be found in the context pool. "Hash table:" +--------+----------+----------+- - -+----------+ | Length | Offset 1 | Offset 2 | | Offset N | +--------+----------+----------+- - -+----------+ Length (unsigned int16) Length of the hash table. Offset (unsigned int16) Offset, in bytes, within the context pool, where the context’s string should be seeked. Note that the context string probably will not be found at this offset; it will be found further away. All offsets should be multiples of 2. An offset of 0 means this hash does not exist in this file. Note that the hash table’s size may exceed the actual amount of contexts, resulting in many offsets being set to zero. Context pool "Context pool:" +--------+-----------+-----------+- - -+-----------+ | 0x0000 | Context 1 | Context 2 | | Context N | +--------+-----------+-----------+- - -+-----------+ As offset 0 in the hash table means that the context does not exist in this file, the context at offset 0 in the context pool is set to 0x0000. Context "Context:" +--------+---------+---------+ | Length | Context | Padding | +--------+---------+---------+ Length (unsigned byte) The length of the context string in bytes. Context (string) The context name, truncated to up to 255 characters. Padding (optional unsigned byte) An extra null byte (0x00) may be added to ensure the size of this whole block is a multiple of 2. Hashes block The Hashes block links hashes of source text and comment strings to pointers to messages. 
"Block contents (Hashes block):" +--------+--------+- - -+--------+ | Hash 1 | Hash 2 | | Hash N | +--------+--------+- - -+--------+ Hash "Hash:" +------+--------+ | Hash | Offset | +------+--------+ Hash (unsigned int32) A hash of the bytes represented by the concatenation of the source text and of the comment strings of a single message. This can be used for faster lookup of translations, since the source text and comment are defined in the source code. Offset (unsigned int32) Offset, in bytes, of the start of the message designated by this hash, starting from the beginning of the contents of the Messages block. Messages block "Block contents (Messages block):" +-------------+-------------+- - -+-------------+ | Attribute 1 | Attribute 2 | | Attribute N | +-------------+-------------+- - -+-------------+ There is no exact structure for a message: attributes should be read into a list until an End attribute is reached, meaning all the attributes in this list are part of the message. Messages are usually looked up using the Hashes block first, rather than reading through the Messages block sequentially. Attribute Attributes have no official name; they have been named attributes as they are the various properties of a message. "Attribute:" +-----+--------------------+ | Tag | Attribute contents | +-----+--------------------+ Tag (unsigned byte) One of the following: 1. End 2. Source text (UTF-16) 3. Translation 4. Context (UTF-16) 5. Hash (obsolete) 6. Source text 7. Context 8. Comment 9. Unknown (obsolete) Attribute contents The contents of each attribute depend on the tag. - There should only be one Comment attribute. - There should only be one of either a Context or a Context (UTF-16) attribute. - There should only be one of either a Source text or a Source text (UTF-16) attribute. - There may be zero or more Translation attributes. - There must be one End attribute. End attribute Attributes with the End tag signify the end of the message. They have no contents. 
Source text (UTF-16) attribute

"Attribute contents (Source text (UTF-16) attribute):"

    +--------+-------------+
    | Length | Source text |
    +--------+-------------+

Length (int32)
  Length of the string, in bytes. Should always be a multiple of 2, unless it is negative, which indicates an empty string.

Source text (UTF-16 string)
  The source text for this message. If the translations are ID-based, this will be the ID of this translation, and the context and comment will always be empty.

Translation attribute

"Attribute contents (Translation attribute):"

    +--------+-------------+
    | Length | Translation |
    +--------+-------------+

Length (int32)
  Length of the string, in bytes. Should always be a multiple of 2, unless it is negative, which indicates an empty string.

Translation (UTF-16 string)
  The translated text for this message.

Context (UTF-16) attribute

"Attribute contents (Context (UTF-16) attribute):"

    +--------+---------+
    | Length | Context |
    +--------+---------+

Length (int32)
  Length of the string, in bytes. Should always be a multiple of 2, unless it is negative, which indicates an empty string.

Context (UTF-16 string)
  Name of the context in which this message appears. This is usually a Qt class name.

Hash attribute

"Attribute contents (Hash attribute):"

    +------+
    | Hash |
    +------+

Hash (unsigned int32)
  Hash of the message. This is now only stored in the separate Hashes block.

Source text attribute

"Attribute contents (Source text attribute):"

    +--------+-------------+
    | Length | Source text |
    +--------+-------------+

Length (unsigned int32)
  Length, in bytes, of the source text.

Source text (string)
  The source text for this message. If the translations are ID-based, this will be the ID of this translation, and the context and comment will always be empty.

Context attribute

"Attribute contents (Context attribute):"

    +--------+---------+
    | Length | Context |
    +--------+---------+

Length (unsigned int32)
  Length, in bytes, of the context.
Context (string)
  Name of the context in which this message appears. This is usually a Qt class name.

Comment attribute

"Attribute contents (Comment attribute):"

    +--------+---------+
    | Length | Comment |
    +--------+---------+

Length (unsigned int32)
  Length, in bytes, of the comment.

Comment (string)
  A comment left by the developer on this message, meant for disambiguation.
  https://doc.qt.io/qt-6/i18n-source-translation.html#disambiguation

Unknown obsolete attribute

"Attribute contents (Unknown obsolete attribute):"

    +------+
    | Byte |
    +------+

Byte (unknown, 1 byte)
  No definition known. This attribute is not found in Qt 2.1.1, and can be found in Qt 2.2.0 as “Obsolete 1”. It is now known as “Obsolete 2”, because the Hash attribute became “Obsolete 1”. I cannot find any versions between those two version numbers that could tell what this attribute was for.

Numerus Rules block

Defines the rules for automatic pluralization of names in the translation language.

"Block contents (Numerus Rules block):"

    +------------------+------------------+- - -+------------------+
    | Rule component 1 | Rule component 2 |     | Rule component N |
    +------------------+------------------+- - -+------------------+

Rule component (unsigned byte)
  Either an integer, an arithmetic operator with optional flags, a logical operator or a rule separator.

The following arithmetic operators are defined:

- 0x01 Equality operator. Followed by one integer X, means “the value is equal to X”.
- 0x02 Less than operator. Followed by one integer X, means “the value is less than X”.
- 0x03 Less than or equal operator. Followed by one integer X, means “the value is less than or equal to X”.
- 0x04 Between operator. Followed by two integers X and Y, means “the value is between X and Y”.

The following flags can be applied to the arithmetic operators:

- 0x08 Not.
- 0x10 Modulo 10. Get the remainder of the division of the value by 10 before applying the operator.
- 0x20 Modulo 100.
Get the remainder of the division of the value by 100 before applying the operator.
- 0x40 Leading 1000. Meaning is unclear.

The following logical operators are defined:

- 0xFD And.
- 0xFE Or.

The logical operators are not simply applied from left to right: in Qt's evaluation code, And binds more tightly than Or, so “A and B or C and D” means “(A and B) or (C and D)”.

Finally, the rule separator is defined:

- 0xFF New rule.

The numerus rules are applied to a numeric value to determine which pluralization form should be used for the name associated with this value. The rules are tried one after the other, and each rule maps to a different pluralization form. The number of pluralization forms depends on the language. With N pluralization forms, there should be N-1 rules. If the first rule matches, the first pluralization form is picked. If the second rule matches, the second pluralization form is picked. If no rule matches, the last pluralization form is picked. For English, for example, the single rule “the value is equal to 1” selects the singular form, and every other value falls through to the plural form.

Dependencies block

"Block contents (Dependencies block):"

    +--------------+--------------+- - -+--------------+
    | Dependency 1 | Dependency 2 |     | Dependency N |
    +--------------+--------------+- - -+--------------+

Dependency (string)
  The name of a file, located in the same directory as this one, that this file depends on.

Language block

"Block contents (Language block):"

    +---------------+
    | Language code |
    +---------------+

Language code (string)
  Holds the language code of the translation file.

References

- Qt Linguist Manual
- Writing Source Code for Translation
- Source code of the QM reader and writer of Qt Linguist
- Source code of the QTranslator, which reads from QM files to perform the translations within apps
- The Qt archive, to examine the sources of older versions of Qt. Try reading src/corelib/kernel/qtranslator.cpp or tools/linguist/shared/qm.cpp.