A typical CMAF chunk header for one sample is around 104 bytes.
With sample-level objects (every video frame, every audio frame is one MoQ object), that overhead becomes a meaningful share of the wire cost — about 25 % for low-bitrate audio, smaller for video where the media bitrate still dominates.
LOCMAF observes that consecutive moof boxes are almost identical and delta-encodes them.
For clear audio and cbcs audio the delta moof shrinks to two bytes, and the same approach extends to DRM-protected video.
It targets the same low-latency CMAF-chunk workload as LL-HLS Parts and DASH Chunked CMAF —
with the per-chunk moof overhead compressed away on the MoQ wire.
The natural mapping is one MoQ object per sample and one MoQ group per random-access interval: each group starts at a random-access point (IDR for video). Inside a group, objects are delivered in order, and that ordering is what makes the delta stream possible. Audio groups are aligned to video groups, so a single tune-in picks up both media clocks at once.
Consecutive moof boxes differ only in a handful of fields. LOCMAF compresses in two stages:
Fields equal to their defaults in trex are omitted entirely.
base_media_decode_time is derived implicitly from the previous moof's sample durations.
The mdat box header never goes on the wire — its length is implied by the MoQ object length.
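The two implicit fields above can be rebuilt on the receiver with simple arithmetic. The following is a minimal sketch, not the reference decoder: the `DeltaState` type and both function names are hypothetical, and it assumes the omitted mdat header is the compact 8-byte form.

```go
package main

import "fmt"

// DeltaState is a hypothetical decoder-side record of the previous moof.
type DeltaState struct {
	BaseMediaDecodeTime uint64   // tfdt value of the previous fragment
	SampleDurations     []uint32 // per-sample durations of the previous fragment
}

// NextBaseMediaDecodeTime derives the tfdt of the next fragment by adding
// the previous fragment's sample durations to its base_media_decode_time.
func NextBaseMediaDecodeTime(prev DeltaState) uint64 {
	next := prev.BaseMediaDecodeTime
	for _, d := range prev.SampleDurations {
		next += uint64(d)
	}
	return next
}

// MdatBoxSize rebuilds the size field of the omitted mdat box header from
// the number of payload bytes left in the MoQ object.
// Assumption: the compact 8-byte header (4-byte size + 4-byte type).
func MdatBoxSize(payloadLen int) uint32 {
	const mdatHeaderLen = 8
	return uint32(payloadLen + mdatHeaderLen)
}

func main() {
	prev := DeltaState{BaseMediaDecodeTime: 48000, SampleDurations: []uint32{1024}}
	fmt.Println(NextBaseMediaDecodeTime(prev)) // 49024
	fmt.Println(MdatBoxSize(341))              // 349
}
```

Because each value depends on the previous object, this only works while objects inside a group arrive in order.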
Measured against assets/test10s/video_400kbps_avc.mp4 — a 250-fragment AVC group with one sample per fragment.
go run ./cmd/locmaf roundtrip -verbose
The CMSF catalog reports the wire bitrate including framing. The moof saving per object is essentially constant (~100 B → 2 B, about 99.6 B/object), so the percentage saving grows as the track's bitrate shrinks; audio gains the most.
| track | sample | CMAF | LOCMAF | saved |
|---|---|---|---|---|
| audio_128kbps_aac | 128.0 kbps | 171.5 kbps | 131.9 kbps | 23.1 % |
| video_400kbps_avc | 373.2 kbps | 396.4 kbps | 376.5 kbps | 5.0 % |
The 128 kbps AAC track lands within ~3% of the raw sample bitrate —
the remaining LOCMAF overhead is ~2 B/object × ~47 obj/s ≈ 0.75 kbps plus the MoQ object framing.
Full per-track table (Opus, AC-3, HEVC, 600 / 900 kbps video) is in docs/LOCMAF.md.
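The arithmetic behind these figures is easy to reproduce. The sketch below uses only numbers from the table above; the function names are hypothetical, and the 48000/1024 object rate assumes one 1024-sample AAC frame per object at 48 kHz.

```go
package main

import "fmt"

// OverheadKbps converts a per-object header cost into a bitrate.
func OverheadKbps(bytesPerObject, objectsPerSecond float64) float64 {
	return bytesPerObject * 8 * objectsPerSecond / 1000
}

// SavedPercent is the relative reduction from the CMAF to the LOCMAF bitrate.
func SavedPercent(cmafKbps, locmafKbps float64) float64 {
	return (cmafKbps - locmafKbps) / cmafKbps * 100
}

func main() {
	aacObjectsPerSecond := 48000.0 / 1024.0 // about 47 objects per second

	fmt.Printf("%.2f kbps\n", OverheadKbps(2, aacObjectsPerSecond)) // 0.75 kbps
	fmt.Printf("%.1f %%\n", SavedPercent(171.5, 131.9))             // 23.1 %
	fmt.Printf("%.1f %%\n", SavedPercent(396.4, 376.5))             // 5.0 %
}
```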
| ID | Symbol | Object kind |
|---|---|---|
| 23 | LocmafFullHeader | full chunk (styp / prft / emsg / moof + mdat) |
| 25 | LocmafDeltaHeader | delta chunk (prft / emsg / moof + mdat) |
From v0.2 the moov box is no longer compressed.
The CMAF Header is delivered as raw, uncompressed bytes via the
MSF / CMSF catalog (the same initData bytes a plain
cmaf packaging track uses), so v0.2 no longer needs the
v0.1 MoovHeader = 21 object kind. Compressing the init data would only save a
one-time-per-track cost, so LOCMAF concentrates on the per-chunk overhead.
Values 23 and 25 sit in the public MOQT object-kind ID space and
need an IANA registration — requested by the IETF LOCMAF draft
(draft-einarsson-moq-locmaf).
Unknown header_id values are logged and skipped using
properties_length, so the format extends without breaking
older decoders.
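A decoder's skip-unknown path might look roughly like the sketch below. It is not the reference implementation and rests on two assumptions that the text here does not confirm: that `properties_length` is itself a varint placed directly after `header_id`, and that both use the QUIC variable-length integer encoding (RFC 9000, section 16). The `Object` type and `ParseObject` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	LocmafFullHeader  = 23
	LocmafDeltaHeader = 25
)

var errShort = errors.New("locmaf: truncated object")

// readVarint decodes a QUIC-style variable-length integer and returns the
// value plus the number of bytes consumed.
func readVarint(b []byte) (uint64, int, error) {
	if len(b) == 0 {
		return 0, 0, errShort
	}
	n := 1 << (b[0] >> 6) // top two bits select 1, 2, 4 or 8 bytes
	if len(b) < n {
		return 0, 0, errShort
	}
	v := uint64(b[0] & 0x3f)
	for _, c := range b[1:n] {
		v = v<<8 | uint64(c)
	}
	return v, n, nil
}

// Object is one MoQ object split into header ID, properties and payload.
type Object struct {
	HeaderID   uint64
	Properties []byte
	Payload    []byte
}

// ParseObject splits an object and reports whether its header_id is known.
// Assumed layout: header_id, properties_length, properties, payload.
func ParseObject(b []byte) (obj Object, known bool, err error) {
	id, n, err := readVarint(b)
	if err != nil {
		return obj, false, err
	}
	b = b[n:]
	plen, n, err := readVarint(b)
	if err != nil {
		return obj, false, err
	}
	b = b[n:]
	if uint64(len(b)) < plen {
		return obj, false, errShort
	}
	obj = Object{HeaderID: id, Properties: b[:plen], Payload: b[plen:]}
	known = id == LocmafFullHeader || id == LocmafDeltaHeader
	return obj, known, nil
}

func main() {
	// Unknown header_id 63 with two property bytes and a 3-byte payload.
	raw := []byte{63, 2, 0xaa, 0xbb, 1, 2, 3}
	obj, known, err := ParseObject(raw)
	if err != nil {
		panic(err)
	}
	if !known {
		fmt.Printf("skipping unknown header_id %d (%d property bytes)\n",
			obj.HeaderID, len(obj.Properties))
	}
}
```

The point of the length prefix is that a decoder can step past properties it cannot interpret without knowing their internal layout.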
Two-byte LOCMAF framing carries one whole moof worth of information.
base_media_decode_time advances implicitly; per-sample fields are unchanged from the previous moof.
The single sample's size is calculated from the total object size.
The 2-byte steady state holds for clear content and for cbcs audio.
Sub-sample encryption on video adds the subsample map per fragment (~6 B steady state),
and cenc additionally carries the per-sample IV — see the
DRM section below for the measured impact.
LOCMAF's primary use case is low-latency DRM-protected streaming over MoQ.
The encrypted mdat bytes, per-sample IVs, subsample byte ranges,
and the tenc defaults (including default_KID) are carried verbatim —
so the standard CDM / MSE / EME path on the receiver works as if the content had arrived as plain CMAF.
LOCMAF is invisible to the player.
CMSF carries the DRM description at the catalog level; LOCMAF doesn't replace it.
A contentProtections array is referenced by per-track
contentProtectionRefIDs, mirroring DASH-IF IOP 6.
{
  "contentProtections": [{
    "refID": "widevine",
    "defaultKID": ["abcdef0123456789abcdef0123456789"],
    "scheme": "cbcs",
    "drmSystem": {
      "systemID": "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed",
      "laURL": { "url": "https://lic.example.com/widevine", "type": "POST" },
      "pssh": "base64-pssh-box"
    }
  }],
  "tracks": [{
    "name": "video_400kbps_avc_drm",
    "packaging": "locmaf",
    "locmafVersion": "0.1",
    "contentProtectionRefIDs": ["widevine", "playready", "fairplay"],
    "initData": "..."
  }]
}
The receiver picks the first refID whose drmSystem.systemID matches a CDM it can talk to,
then uses the named pssh / laURL to set up the MediaKeySession exactly as for a plain CMSF stream.
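The selection rule can be sketched as follows. The JSON field names come from the catalog example above; the Go types, `PickProtection`, and the `supported` set are hypothetical, and the sketch assumes "first" means the order of the track's contentProtectionRefIDs. The PlayReady entry in the sample is added only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type DRMSystem struct {
	SystemID string `json:"systemID"`
	PSSH     string `json:"pssh"`
}

type ContentProtection struct {
	RefID     string    `json:"refID"`
	Scheme    string    `json:"scheme"`
	DRMSystem DRMSystem `json:"drmSystem"`
}

type Track struct {
	Name   string   `json:"name"`
	RefIDs []string `json:"contentProtectionRefIDs"`
}

type Catalog struct {
	ContentProtections []ContentProtection `json:"contentProtections"`
	Tracks             []Track             `json:"tracks"`
}

// PickProtection walks the track's refIDs in order and returns the first
// entry whose systemID is in the set of DRM systems the receiver supports.
func PickProtection(c Catalog, t Track, supported map[string]bool) (ContentProtection, bool) {
	byRef := make(map[string]ContentProtection, len(c.ContentProtections))
	for _, cp := range c.ContentProtections {
		byRef[cp.RefID] = cp
	}
	for _, ref := range t.RefIDs {
		if cp, ok := byRef[ref]; ok && supported[cp.DRMSystem.SystemID] {
			return cp, true
		}
	}
	return ContentProtection{}, false
}

const sample = `{
  "contentProtections": [
    {"refID": "widevine", "scheme": "cbcs",
     "drmSystem": {"systemID": "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed", "pssh": "base64-pssh-box"}},
    {"refID": "playready", "scheme": "cbcs",
     "drmSystem": {"systemID": "9a04f079-9840-4286-ab92-e65be0885f95", "pssh": "base64-pssh-box"}}
  ],
  "tracks": [{"name": "video_400kbps_avc_drm",
              "contentProtectionRefIDs": ["widevine", "playready", "fairplay"]}]
}`

func main() {
	var c Catalog
	if err := json.Unmarshal([]byte(sample), &c); err != nil {
		panic(err)
	}
	// Pretend this receiver only has a Widevine CDM.
	supported := map[string]bool{"edef8ba9-79d6-4ace-a3c8-27dcd51d21ed": true}
	if cp, ok := PickProtection(c, c.Tracks[0], supported); ok {
		fmt.Println(cp.RefID, cp.Scheme) // widevine cbcs
	}
}
```

A refID with no matching contentProtections entry (fairplay in the sample) is simply passed over.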
cenc vs cbcs on the wire
| scheme | per-sample IV | subsample map | extra delta-moof cost |
|---|---|---|---|
| cenc | per-sample, 8 or 16 B | ~3 B/subsample (video) | ~16 B IV + subsample bytes |
| cbcs | constant IV in tenc | ~3 B/subsample (video) | subsample bytes only |
Audio uses full-sample encryption under both schemes, so audio under cbcs costs the same on the LOCMAF wire as clear audio —
the constant IV lives once in the moov and no per-fragment encryption signalling is needed.
Video carries the subsample map under both schemes.
| track | scheme | CMAF | LOCMAF | saved |
|---|---|---|---|---|
| audio_128kbps_aac | cbcs | 191.4 kbps | 131.9 kbps | 31.1 % |
| audio_128kbps_aac | cenc | 197.4 kbps | 138.6 kbps | 29.8 % |
| video_400kbps_avc | cbcs | 408.8 kbps | 378.5 kbps | 7.4 % |
| video_400kbps_avc | cenc | 412.0 kbps | 382.1 kbps | 7.3 % |
LOCMAF saves more relative to CMAF on DRM-protected content than on clear content —
the CMAF moof grows under encryption (extra senc / saio / saiz) while LOCMAF only emits what it actually needs.
Full per-track tables (Opus, AC-3, HEVC, 600 / 900 kbps) are in docs/LOCMAF.md.
The 2-byte steady state requires every frame to have an integer duration in the chosen media timescale:
| stream | timescale | ticks per frame |
|---|---|---|
| 48 kHz AAC | 48 000 | 1 024 |
| 60000/1001 fps video (NTSC) | 60 000 | 1 001 |
With a mismatched timescale each frame drifts ±1 tick, and the per-sample duration array has to be sent every fragment — the 2-byte steady-state delta moof is no longer achievable.
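Whether a timescale gives an integer frame duration can be checked before encoding. The sketch below is illustrative only; `TicksPerFrame` is a hypothetical helper that takes the frame duration as a rational number of seconds (durNum/durDen).

```go
package main

import "fmt"

// TicksPerFrame returns the frame duration in timescale ticks for a frame
// lasting durNum/durDen seconds, and whether that duration is an integer.
func TicksPerFrame(timescale, durNum, durDen uint64) (ticks uint64, exact bool) {
	n := timescale * durNum
	return n / durDen, n%durDen == 0
}

func main() {
	fmt.Println(TicksPerFrame(48000, 1024, 48000)) // 1024 true  (48 kHz AAC)
	fmt.Println(TicksPerFrame(60000, 1001, 60000)) // 1001 true  (60000/1001 fps)
	fmt.Println(TicksPerFrame(90000, 1001, 60000)) // 1501 false (1501.5 ticks)
}
```

In the last case the true duration is 1501.5 ticks, so frames would have to alternate between 1501 and 1502 and the duration array could no longer be left out.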
CMAF (ISO/IEC 23000-19) tightens ISOBMFF: each CMAF Track carries exactly one media track,
each CMAF Chunk is one moof followed by exactly one mdat
(§7.3.3.2), and each traf contains exactly one trun (Table 4).
These restrictions are what make LOCMAF's "one MoQ object = one moof + one mdat" mapping unambiguous:
each object is a varint header_id, a properties block, and then the mdat payload.
General fragmented MP4 may contain multiple traf / trun boxes per moof,
multiple mdat boxes per fragment, or multiple tracks multiplexed into one file.
LOCMAF does not address those layouts directly. Source content must be CMAF-conformant
(or trivially repackaged into CMAF) before LOCMAF encoding.
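A cheap pre-flight check for the top-level layout can be sketched as below. It walks ISOBMFF box headers (32-bit size plus four-character type, with the standard size 0 and size 1 special cases) and accepts a chunk only if it is one moof followed by one mdat, optionally preceded by styp, prft, or emsg boxes. It is a partial check under those assumptions: it does not look inside the moof to count traf or trun boxes, and the function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// topLevelBoxes lists the four-character types of the top-level boxes in b.
func topLevelBoxes(b []byte) ([]string, error) {
	var types []string
	for len(b) > 0 {
		if len(b) < 8 {
			return nil, errors.New("truncated box header")
		}
		size := uint64(binary.BigEndian.Uint32(b[:4]))
		hdr := uint64(8)
		switch size {
		case 0: // box extends to the end of the data
			size = uint64(len(b))
		case 1: // 64-bit largesize follows the type
			if len(b) < 16 {
				return nil, errors.New("truncated largesize")
			}
			size = binary.BigEndian.Uint64(b[8:16])
			hdr = 16
		}
		if size < hdr || size > uint64(len(b)) {
			return nil, errors.New("bad box size")
		}
		types = append(types, string(b[4:8]))
		b = b[size:]
	}
	return types, nil
}

// IsSingleMoofMdat reports whether the chunk is exactly one moof followed by
// exactly one mdat, ignoring leading styp / prft / emsg boxes.
func IsSingleMoofMdat(chunk []byte) (bool, error) {
	types, err := topLevelBoxes(chunk)
	if err != nil {
		return false, err
	}
	i := 0
	for i < len(types) && (types[i] == "styp" || types[i] == "prft" || types[i] == "emsg") {
		i++
	}
	rest := types[i:]
	return len(rest) == 2 && rest[0] == "moof" && rest[1] == "mdat", nil
}

// box builds an empty box of the given type, for demonstration only.
func box(typ string, payloadLen int) []byte {
	b := make([]byte, 8+payloadLen)
	binary.BigEndian.PutUint32(b, uint32(len(b)))
	copy(b[4:], typ)
	return b
}

func main() {
	good := append(box("moof", 16), box("mdat", 4)...)
	bad := append(append(box("moof", 16), box("mdat", 4)...), box("mdat", 4)...)
	fmt.Println(IsSingleMoofMdat(good)) // true <nil>
	fmt.Println(IsSingleMoofMdat(bad))  // false <nil>
}
```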
LOCMAF was developed as part of Hugo Björs's KTH MSc thesis, Efficient DRM in MoQ using Low Overhead CMAF (2026), under the supervision of Torbjörn Einarsson at Eyevinn Technology.
The format is intended as a packaging encoding in the CMSF catalog and complements the Low Overhead Container (LOC) defined by the MoQ working group.
The wire format is at locmafVersion "0.1", advertised in the
CMSF catalog Track entry whenever packaging == "locmaf". Receivers
compare against their highest supported version and fall back if the encoder is
ahead — this covers behavioural changes inside an existing object kind, which the
header-ID skip-unknown rule on its own can't detect.
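The receiver-side version check could be as small as the sketch below. It assumes locmafVersion is always a dotted "major.minor" pair of decimal integers, which this text does not state, and `CanDecode` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor" string such as "0.1".
func parseVersion(s string) (major, minor int, err error) {
	majStr, minStr, ok := strings.Cut(s, ".")
	if !ok {
		return 0, 0, fmt.Errorf("locmafVersion %q is not major.minor", s)
	}
	if major, err = strconv.Atoi(majStr); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(minStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// CanDecode reports whether the track's version is at or below the highest
// version this receiver supports. A false result means "fall back".
func CanDecode(track, supported string) (bool, error) {
	tMaj, tMin, err := parseVersion(track)
	if err != nil {
		return false, err
	}
	sMaj, sMin, err := parseVersion(supported)
	if err != nil {
		return false, err
	}
	return tMaj < sMaj || (tMaj == sMaj && tMin <= sMin), nil
}

func main() {
	fmt.Println(CanDecode("0.1", "0.2")) // true <nil>
	fmt.Println(CanDecode("0.3", "0.2")) // false <nil>
}
```

Comparing the parsed integers rather than the raw strings keeps "0.10" ordered after "0.9".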
The format may still evolve based on feedback, measurement on additional codecs,
and the trajectory of CMSF and MOQT in the IETF. The reference encoder and decoder
live in Eyevinn/moqlivemock and the browser player in
Eyevinn/warp-player.