- Status
- Introduction
- Normative References
- MXF-DIGEST calculation
- Equivalence
- MXF-DIGEST URN
- Bibliography
This DRAFT memo describes a proposed method for use by MXF applications. At this time the memo is open for comment. If the proposal is adopted by a sufficiently large subset of the MXF community, it will then be submitted to SMPTE for publication as a Standard (ST) engineering document. Substantial changes to this memo may occur during this process, and implementors are cautioned against making permanent implementation or deployment decisions based on its current contents.
This document and associated reference implementation are hosted at https://github.com/cinecert/mxf-digest
SMPTE ST 377-1 Material Exchange Format (MXF) files are often very large (comprising tens or even hundreds of gigabytes), and transferring such files or even transiting them over a system bus to a CPU requires non-trivial resources. In addition, many file-based applications use message digest algorithms to uniquely identify files. The intersection of these two properties is a source of friction in many workflows, both because digesting files after writing them is resource intensive, and because the input to digest algorithms commonly utilized by these systems is inherently serial in nature while parallel processing of file contents is increasingly common. What is needed is a method of digesting large MXF files that supports both parallel calculation and out-of-order calculation.
The proposed algorithm works with any serial-input digest by using the KLV sub-structure of an MXF file as a natural segmentation layer upon which to calculate an ordered set of digests. The digest values in this ordered set can then be the subject of another digest, finally producing an identifier that is appropriately unique within the scope of the chosen digest algorithm. For maximum interoperability a single digest algorithm should be chosen for use by this process, and so this proposal selects SHA-512.
The resulting digest value must have a canonical encoding to promote interoperable use across applications. This proposal defines a URN encoding which employs a Base58 alphabet chosen for brevity and lack of conflict with URI special characters.
SMPTE ST 336:2007 — Data Encoding Protocol Using Key-Length-Value
SMPTE ST 377-1:2011 — Material Exchange Format (MXF) — File Format Specification
SMPTE ST 2029:2009 — Uniform Resource Names for SMPTE Resources
SMPTE ST 2114:2017 — Unique Digital Media Identifier (C4 ID)
IETF RFC 5234 — Augmented BNF for Syntax Specifications: ABNF
The primitive message digest algorithm (the Primitive Digest) shall be SHA-512 as defined in ISO/IEC 10118-3.
Run-in, as defined in ST 377-1, Sec. 6.5, "Run-In Sequence", shall not be contributed to the digest. If present, run-in bytes shall be skipped before digest calculation begins.
Note to IMF implementors: There is no run-in in IMF. Use of MXF run-in is disallowed by the IMF Essence Component. This provision of the MXF-DIGEST process exists for maximum compatibility with other MXF applications.
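The run-in provision above can be illustrated with a minimal Python sketch. It is not part of this memo and rests on assumptions that should be checked against ST 377-1: that the run-in is at most 65535 bytes long, and that the end of the run-in is found by searching for the first 11 bytes of the Partition Pack key. The names `PARTITION_PACK_KEY_PREFIX`, `MAX_RUN_IN_LENGTH`, and `find_run_in_length` are hypothetical.

```python
# Assumption: the first 11 bytes shared by all MXF Partition Pack keys.
PARTITION_PACK_KEY_PREFIX = bytes.fromhex("060e2b34020501010d0102")

# Assumption: ST 377-1 limits the run-in to fewer than 65536 bytes.
MAX_RUN_IN_LENGTH = 65535


def find_run_in_length(head: bytes) -> int:
    """Return the number of run-in bytes to skip before digesting.

    `head` is the beginning of the file (about 64 KiB is enough).
    A return value of 0 means the file has no run-in.
    """
    search_end = MAX_RUN_IN_LENGTH + len(PARTITION_PACK_KEY_PREFIX)
    offset = head.find(PARTITION_PACK_KEY_PREFIX, 0, search_end)
    if offset < 0:
        raise ValueError("no Partition Pack key found; not an MXF file?")
    return offset
```

A caller would seek to the returned offset and begin KLV processing there, so that no run-in byte is contributed to any packet digest.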
The algorithm shall operate on complete KLV packets, as defined in SMPTE ST 336, which shall be digested in their original form. Expansion or translation of BER-encoded Key or Value Length fields shall not be performed.
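To illustrate why the length field must be digested as stored, the following Python sketch decodes a BER length only to find where the packet ends, and then returns the packet bytes untouched. It is an illustration under assumptions, not a normative parser: keys are assumed to be 16 bytes long, indefinite-length BER is rejected, and the helper names `decode_ber_length` and `klv_packet_bytes` are hypothetical.

```python
from typing import Tuple

KEY_LENGTH = 16  # assumption: every key is a 16-byte SMPTE Universal Label


def decode_ber_length(buf: bytes, offset: int) -> Tuple[int, int]:
    """Return (value_length, length_field_size) for the BER length at offset."""
    first = buf[offset]
    if first < 0x80:
        # Short form: the single byte is the length itself.
        return first, 1
    count = first & 0x7F  # long form: number of length bytes that follow
    if count == 0:
        raise ValueError("indefinite-length BER is not handled in this sketch")
    extra = buf[offset + 1 : offset + 1 + count]
    if len(extra) != count:
        raise ValueError("truncated BER length field")
    return int.from_bytes(extra, "big"), 1 + count


def klv_packet_bytes(buf: bytes, offset: int = 0) -> bytes:
    """Return the complete KLV packet starting at offset, byte for byte."""
    value_length, field_size = decode_ber_length(buf, offset + KEY_LENGTH)
    end = offset + KEY_LENGTH + field_size + value_length
    if end > len(buf):
        raise ValueError("truncated KLV packet")
    # The key and length field are returned exactly as stored: no re-encoding.
    return buf[offset:end]
```

Two packets carrying the same key and value but different length encodings (for example `03` versus `83 00 00 03`) are different byte strings and therefore produce different packet digests, which is why translation of the length field is prohibited.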
- Establish an empty list of digest values
- For each KLV packet in the file:
  - Instantiate a fresh Primitive Digest context (a packet digest)
  - Update the packet digest context with all of the bytes comprising the KLV packet
  - Finalize the packet digest context and append its value to the list of digest values
- Instantiate a fresh Primitive Digest context (the sequence digest)
- For each packet digest in the list of digest values:
  - Update the sequence digest context with the big-endian integer octets of the packet digest
- Finalize the sequence digest
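The steps above can be sketched in Python using `hashlib`. This is a minimal illustration rather than the reference implementation hosted with this memo. It assumes that the stream is already positioned after any run-in, that keys are 16 bytes long, and that the 64 bytes returned by `hashlib.sha512().digest()` are the big-endian integer octets of the digest. The names `iter_klv_packets` and `mxf_sequence_digest` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List

KEY_LENGTH = 16  # assumption: every key is a 16-byte SMPTE Universal Label


def iter_klv_packets(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each KLV packet as the exact bytes stored in the stream."""
    while True:
        key = stream.read(KEY_LENGTH)
        if not key:
            return  # clean end of file
        first = stream.read(1)
        if len(key) != KEY_LENGTH or not first:
            raise EOFError("truncated KLV packet")
        if first[0] < 0x80:
            # Short-form BER length: one byte.
            length_field, value_length = first, first[0]
        else:
            # Long-form BER length: low 7 bits give the count of length bytes.
            count = first[0] & 0x7F
            extra = stream.read(count)
            if count == 0 or len(extra) != count:
                raise ValueError("unsupported or truncated BER length")
            length_field = first + extra
            value_length = int.from_bytes(extra, "big")
        value = stream.read(value_length)
        if len(value) != value_length:
            raise EOFError("truncated KLV value")
        yield key + length_field + value


def mxf_sequence_digest(stream: BinaryIO) -> bytes:
    """Return the 64-byte sequence digest for the KLV packets in stream."""
    packet_digests: List[bytes] = []  # the empty list of digest values
    for packet in iter_klv_packets(stream):
        packet_ctx = hashlib.sha512()  # fresh context per packet
        packet_ctx.update(packet)
        packet_digests.append(packet_ctx.digest())
    sequence_ctx = hashlib.sha512()  # fresh context for the sequence digest
    for digest in packet_digests:
        sequence_ctx.update(digest)
    return sequence_ctx.digest()
```

This sketch reads each packet fully into memory for clarity. A practical implementation would more likely feed large essence packets to the packet digest context in fixed-size chunks, which yields the same packet digest.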
The MXF-DIGEST value is created by encoding the sequence digest value as a URN item of the form urn:smpte:mxf-digest:<b58-digits>, where mxf-digest is a registered NSS as defined in this document, and <b58-digits> is the Base58 encoding of the big-endian integer octets of the sequence digest. The Base58 encoding shall be interpreted as defined in SMPTE ST 2114, Sec. 5.1 "C4 Base58".
urn:smpte:mxf-digest:64pMA4dgr8iLqAEkiMpfJv2JHLubLY9wpDUeAcr3pto3gKGsszyCqr9ofBk668EJrVNagTW7WujyYZV9YEUqCRGE
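The URN encoding can be sketched in Python as follows. The sketch makes two assumptions that should be verified against SMPTE ST 2114: that the Base58 alphabet is the B58-DIGIT set of this memo in ascending ASCII order, and that short results are left-padded to 88 digits with the zero digit `1`. The names `B58_ALPHABET` and `mxf_digest_urn` are hypothetical.

```python
# Assumption: digit values 0..57 map to this alphabet, as in C4 Base58.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

B58_DIGEST_DIGITS = 88  # enough Base58 digits to hold any 512-bit value


def mxf_digest_urn(sequence_digest: bytes) -> str:
    """Encode a 64-byte sequence digest as an MXF-DIGEST URN."""
    if len(sequence_digest) != 64:
        raise ValueError("expected a 64-byte SHA-512 digest")
    number = int.from_bytes(sequence_digest, "big")  # big-endian integer
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(B58_ALPHABET[remainder])
    # Digits were produced least significant first, so reverse them,
    # then pad on the left with the zero digit (assumed padding rule).
    b58 = "".join(reversed(digits)).rjust(B58_DIGEST_DIGITS, B58_ALPHABET[0])
    return "urn:smpte:mxf-digest:" + b58
```

Because 58^87 < 2^512 < 58^88, a 512-bit value never needs more than 88 Base58 digits, so the padded result always has exactly 88 digits after the `urn:smpte:mxf-digest:` prefix.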
While the normative algorithm processes the KLV packets in order, it should be noted that the packet digest values may be calculated in any order, at any time, so long as they are contributed to the sequence digest completely and in the same order in which the respective KLV packets appear in the MXF file. For a given list of KLV packets, any out-of-order calculation of MXF-DIGEST that produces a value equal to that of the normative algorithm presented above in "MXF-DIGEST calculation" is in compliance with this memo.
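The equivalence described here can be demonstrated with a short Python sketch. It assumes the KLV packets are already available as a list of byte strings. The function names are hypothetical, and the thread pool is only one possible way to compute packet digests concurrently.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Sequence


def packet_digest(packet: bytes) -> bytes:
    """SHA-512 digest of one complete KLV packet."""
    return hashlib.sha512(packet).digest()


def sequence_digest(packet_digests: Iterable[bytes]) -> bytes:
    """Digest the packet digests, which must arrive in file order."""
    ctx = hashlib.sha512()
    for digest in packet_digests:
        ctx.update(digest)
    return ctx.digest()


def digest_in_order(packets: Sequence[bytes]) -> bytes:
    """The in-order calculation, one packet at a time."""
    return sequence_digest(packet_digest(p) for p in packets)


def digest_in_parallel(packets: Sequence[bytes], workers: int = 4) -> bytes:
    """Compute packet digests concurrently; map() returns them in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sequence_digest(list(pool.map(packet_digest, packets)))


def digest_out_of_order(packets: Sequence[bytes]) -> bytes:
    """Compute packet digests last to first, then contribute them in file order."""
    by_index: Dict[int, bytes] = {}
    for index in reversed(range(len(packets))):
        by_index[index] = packet_digest(packets[index])
    return sequence_digest(by_index[i] for i in range(len(packets)))
```

All three functions return the same value for the same list of packets, because only the order of contribution to the sequence digest affects the result.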
The NID of an MXF-DIGEST URN shall be smpte, as defined in SMPTE ST 2029.
The NSS of an MXF-DIGEST URN shall begin with mxf-digest:. The identifier structure for the MXF-DIGEST subnamespace (MXF-DIGEST-NSS), described using IETF RFC 5234 (ABNF), shall be:
MXF-DIGEST-NSS = "smpte:mxf-digest:" MXF-DIGEST
MXF-DIGEST = 88*B58-DIGIT
B58-DIGIT = %x31-39 / ; 1-9
            %x41-48 / ; A-H
            %x4a-4e / ; J-N
            %x50-5a / ; P-Z
            %x61-6b / ; a-k
            %x6d-7a   ; m-z
The Base58 digits in the URN representation of an MXF-DIGEST shall be the Base58 representation of the Sequence Digest. Lexical equivalence of MXF-DIGEST URN values shall be determined by an exact string match that is case-sensitive for B58-DIGIT characters.
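A syntax check and the lexical equivalence rule can be sketched in Python with a regular expression that mirrors the grammar above, including its `88*` repetition (88 or more digits). The names `MXF_DIGEST_URN_RE`, `is_mxf_digest_urn`, and `urns_equivalent` are hypothetical, and treating the `urn:smpte:mxf-digest:` prefix as case-sensitive is a simplifying assumption of this sketch.

```python
import re

# Character class equivalent to B58-DIGIT: 1-9, A-H, J-N, P-Z, a-k, m-z.
MXF_DIGEST_URN_RE = re.compile(
    r"urn:smpte:mxf-digest:[1-9A-HJ-NP-Za-km-z]{88,}"
)


def is_mxf_digest_urn(text: str) -> bool:
    """Return True if text matches the MXF-DIGEST URN syntax."""
    return MXF_DIGEST_URN_RE.fullmatch(text) is not None


def urns_equivalent(left: str, right: str) -> bool:
    """Lexical equivalence as an exact, case-sensitive string match."""
    return is_mxf_digest_urn(left) and is_mxf_digest_urn(right) and left == right
```

With these definitions the example URN shown earlier in this memo is accepted, while a copy with one digit removed, or with a digit replaced by `0`, `O`, `I`, or `l`, is rejected.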
SMPTE ST 2067-5:2013 — Interoperable Master Format — Essence Component