LOCMAF — Low Overhead CMAF for MOQ
A new, compact wire format for streaming CMAF, including DRM-protected content, over MoQ Transport. Sample-level objects carry a delta moof as small as 2 bytes and are reconstructed into functionally lossless CMAF chunks at the receiver.
The shape of the compression

Big chunks → tiny deltas → identical big chunks

A typical CMAF chunk header for one sample is about 104 bytes. With sample-level objects (every video frame and every audio frame is one MoQ object), that overhead becomes a meaningful share of the wire cost: about 25% for low-bitrate audio, and less for video, where the media bitrate still dominates. LOCMAF observes that consecutive moof boxes are almost identical and delta-encodes them. For clear content and for cbcs audio the steady-state delta moof shrinks to two bytes, and the same approach extends to DRM-protected video. LOCMAF targets the same low-latency CMAF-chunk workload as LL-HLS Parts and DASH Chunked CMAF, but compresses the per-chunk moof overhead away on the MoQ wire.

A row of CMAF chunks compressed into delta blocks on the wire, then decompressed back to identical CMAF chunks.
Sender packages big CMAF chunks. The wire carries tiny deltas. The receiver rebuilds byte-identical chunks for the MSE player.
MoQ ↔ CMAF

One group per segment, one object per chunk

The natural mapping. Each MoQ group starts at a random-access point (IDR for video). Inside a group, objects are delivered in order, and that ordering is what makes the delta stream possible. Audio groups are aligned to video groups so a tune-in operation can deliver both clocks at once.
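The mapping can be sketched in a few lines of Go, the language of the reference tooling. This is a minimal sketch, not the moqlivemock code: the Chunk and Location types and mapChunks are hypothetical, and the packager is assumed to already mark where each segment starts.

```go
package main

import "fmt"

// Chunk is a hypothetical, minimal description of one CMAF chunk.
type Chunk struct {
	IsSync        bool // first sample is a random-access point (IDR for video)
	StartsSegment bool // the packager starts a new CMAF segment here
}

// Location is the MoQ (group, object) pair a chunk is published under.
type Location struct {
	Group, Object uint64
}

// mapChunks opens a new MoQ group at every segment start and numbers the
// objects 0, 1, 2, ... inside each group, so in-group order is decode order.
// A group must open on a sync sample, otherwise tune-in is impossible.
func mapChunks(chunks []Chunk) ([]Location, error) {
	locs := make([]Location, 0, len(chunks))
	var group, object uint64
	for i, c := range chunks {
		if i > 0 && c.StartsSegment {
			group++
			object = 0
		}
		if object == 0 && !c.IsSync {
			return nil, fmt.Errorf("chunk %d opens group %d without a sync sample", i, group)
		}
		locs = append(locs, Location{Group: group, Object: object})
		object++
	}
	return locs, nil
}

func main() {
	idr := Chunk{IsSync: true, StartsSegment: true}
	p := Chunk{} // non-sync frame inside a segment
	locs, err := mapChunks([]Chunk{idr, p, p, idr, p})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(locs) // [{0 0} {0 1} {0 2} {1 0} {1 1}]
}
```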

CMAF segments above mapped one-to-one onto MoQ groups, with each chunk mapped to one MoQ object.
Inside a moof

Most fields are predictable or constant

Consecutive moof boxes differ only in a handful of fields. LOCMAF compresses in two stages:

base_media_decode_time is derived implicitly from the previous moof's sample durations. The mdat box header never goes on the wire — its length is implied by the MoQ object length.
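Both derivations are simple on the receiver side. A minimal Go sketch, assuming a 32-bit mdat size field (payloads under 4 GiB); nextBaseMediaDecodeTime and mdatBox are hypothetical names, not functions from the reference code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nextBaseMediaDecodeTime derives the next moof's base_media_decode_time:
// the previous value plus the sum of the previous moof's sample durations.
func nextBaseMediaDecodeTime(prevBMDT uint64, prevDurations []uint32) uint64 {
	t := prevBMDT
	for _, d := range prevDurations {
		t += uint64(d)
	}
	return t
}

// mdatBox restores the 8-byte mdat header (32-bit size, then "mdat") in
// front of the raw payload. The payload length is what remains of the MoQ
// object after the LOCMAF framing; payloads of 4 GiB or more would need the
// 64-bit largesize form, which this sketch does not handle.
func mdatBox(payload []byte) []byte {
	box := make([]byte, 8, 8+len(payload))
	binary.BigEndian.PutUint32(box[0:4], uint32(8+len(payload)))
	copy(box[4:8], "mdat")
	return append(box, payload...)
}

func main() {
	// Two AAC frames of 1024 ticks each at timescale 48000.
	fmt.Println(nextBaseMediaDecodeTime(48000, []uint32{1024, 1024})) // 50048
	fmt.Printf("% x\n", mdatBox([]byte{0xde, 0xad})[:8])             // 00 00 00 0a 6d 64 61 74
}
```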

Anatomy of a source moof showing which fields LOCMAF omits, derives, or emits as deltas.
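To make the omit / derive / delta split concrete, here is a hedged Go sketch of the decision a delta encoder faces for a one-sample moof. SampleFields and fieldsToSend are hypothetical and cover only a few trun-level fields; the actual LOCMAF field set and its encoding are defined in the specification.

```go
package main

import "fmt"

// SampleFields is an illustrative subset of the fields of a one-sample moof.
// It is not LOCMAF's actual field list.
type SampleFields struct {
	BaseMediaDecodeTime uint64
	Duration            uint32
	Size                uint32
	Flags               uint32
	CompositionOffset   int32
}

// fieldsToSend lists what a delta encoder would still emit for cur, given prev:
//   - base_media_decode_time is derived as prev BMDT + prev duration and is
//     sent only when the source breaks that rule (e.g. a timeline gap);
//   - Size is implied by the MoQ object length and never sent;
//   - the remaining fields are sent only when they differ from prev.
func fieldsToSend(prev, cur SampleFields) []string {
	var out []string
	if cur.BaseMediaDecodeTime != prev.BaseMediaDecodeTime+uint64(prev.Duration) {
		out = append(out, "base_media_decode_time")
	}
	if cur.Duration != prev.Duration {
		out = append(out, "sample_duration")
	}
	if cur.Flags != prev.Flags {
		out = append(out, "sample_flags")
	}
	if cur.CompositionOffset != prev.CompositionOffset {
		out = append(out, "composition_time_offset")
	}
	return out
}

func main() {
	const syncFlags, nonSyncFlags = 0x02000000, 0x01010000 // common trun sample_flags values
	idr := SampleFields{BaseMediaDecodeTime: 0, Duration: 512, Size: 9000, Flags: syncFlags}
	p1 := SampleFields{BaseMediaDecodeTime: 512, Duration: 512, Size: 1200, Flags: nonSyncFlags}
	p2 := SampleFields{BaseMediaDecodeTime: 1024, Duration: 512, Size: 1350, Flags: nonSyncFlags}
	fmt.Println(fieldsToSend(idr, p1)) // [sample_flags]: the IDR-to-P transition
	fmt.Println(fieldsToSend(p1, p2))  // []: steady state, nothing beyond the framing
}
```

The IDR-to-P case is why a group starts with a short transition delta before settling into the steady state.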
Empirical results

~2.3% of the CMAF moof bytes

Measured against assets/test10s/video_400kbps_avc.mp4 — a 250-fragment AVC group with one sample per fragment.

CMAF moof total: 26 040 B (~104 B per fragment × 250)
LOCMAF object total: 592 B (19 B full header + 249 deltas averaging 2.3 B)
Compression ratio: 44 : 1 (on the moof headers themselves)
Bar chart comparing 26 040 bytes of CMAF moof overhead to 592 bytes of LOCMAF wire bytes.
Re-runnable via go run ./cmd/locmaf roundtrip -verbose.
Timeline of one MoQ group: a full moof followed by a brief IDR-transition delta and then a long run of 2-byte steady-state delta moofs.
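The headline figures follow from the two totals. This small Go check recomputes them, assuming the group is one full header followed by 249 delta objects; moofStats is a hypothetical helper, not part of the locmaf command.

```go
package main

import "fmt"

// moofStats recomputes the headline numbers for one group from its totals,
// assuming one full header followed by fragments-1 delta objects.
func moofStats(cmafMoof, locmafTotal, fullHeader float64, fragments int) (avgDelta, pct, ratio float64) {
	avgDelta = (locmafTotal - fullHeader) / float64(fragments-1)
	pct = 100 * locmafTotal / cmafMoof
	ratio = cmafMoof / locmafTotal
	return avgDelta, pct, ratio
}

func main() {
	avg, pct, ratio := moofStats(26040, 592, 19, 250)
	fmt.Printf("CMAF moof per fragment: %.1f B\n", 26040.0/250) // 104.2 B
	fmt.Printf("average delta: %.1f B\n", avg)                   // 2.3 B
	fmt.Printf("LOCMAF/CMAF moof bytes: %.1f%%\n", pct)          // 2.3%
	fmt.Printf("compression ratio: %.0f : 1\n", ratio)           // 44 : 1
}
```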

Per-track wire bitrate

The CMSF catalog reports the wire bitrate including framing. The per-object moof saving is essentially constant (a ~100 B moof shrinks to ~2 B of framing, saving ≈ 99.6 B per object), so the percentage saving grows as the track's bitrate shrinks, and audio gains the most.

track | media sample | CMAF wire | LOCMAF wire | saved
audio_128kbps_aac | 128.0 kbps | 171.5 kbps | 131.9 kbps | 23.1 %
video_400kbps_avc | 373.2 kbps | 396.4 kbps | 376.5 kbps | 5.0 %

The 128 kbps AAC track lands within ~3% of the raw sample bitrate — the remaining LOCMAF overhead is ~2 B/object × ~47 obj/s ≈ 0.75 kbps plus the MoQ object framing. Full per-track table (Opus, AC-3, HEVC, 600 / 900 kbps video) is in docs/LOCMAF.md.
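The arithmetic behind these percentages fits in a few lines. A sketch, assuming one object per 1024-sample AAC frame at 48 kHz (46.875 objects/s); wireKbps and savedPct are hypothetical helper names.

```go
package main

import "fmt"

// wireKbps adds a fixed per-object framing cost to a media bitrate.
func wireKbps(sampleKbps, objPerSec, overheadBytes float64) float64 {
	return sampleKbps + overheadBytes*8*objPerSec/1000
}

// savedPct is the relative saving of the LOCMAF wire bitrate over CMAF.
func savedPct(cmafKbps, locmafKbps float64) float64 {
	return 100 * (cmafKbps - locmafKbps) / cmafKbps
}

func main() {
	aacObjPerSec := 48000.0 / 1024 // one object per AAC frame: 46.875/s
	fmt.Printf("2 B/object framing: %.2f kbps\n", wireKbps(0, aacObjPerSec, 2)) // 0.75 kbps
	fmt.Printf("audio saved: %.1f %%\n", savedPct(171.5, 131.9))                  // 23.1 %
	fmt.Printf("video saved: %.1f %%\n", savedPct(396.4, 376.5))                  // 5.0 %
}
```

Because the framing term depends only on the object rate, the same absolute saving is a larger fraction of a low-bitrate track.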

Wire format

One framing for every object kind

LOCMAF object framing: header_id varint, properties_length varint, properties block, mdat raw payload.

Top-level object IDs

ID | Symbol | Object kind
23 | LocmafFullHeader | full chunk (styp / prft / emsg / moof + mdat)
25 | LocmafDeltaHeader | delta chunk (prft / emsg / moof + mdat)

From v0.2 the moov box is no longer compressed. The CMAF Header is delivered as raw, uncompressed bytes via the MSF / CMSF catalog (the same initData bytes a plain cmaf packaging track uses), so v0.2 no longer needs the v0.1 MoovHeader = 21 object kind. The header is sent only once per track, so compressing it saves little; LOCMAF concentrates on the per-chunk overhead.

Values 23 and 25 sit in the public MOQT object-kind ID space and need an IANA registration — requested by the IETF LOCMAF draft (draft-einarsson-moq-locmaf). Unknown header_id values are logged and skipped using properties_length, so the format extends without breaking older decoders.
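A receiver-side sketch of the framing and the skip-unknown rule. It assumes the varints are QUIC variable-length integers as used throughout MOQT (consistent with IDs 23 and 25 fitting in one byte). The type and function names are hypothetical, and the properties block itself is not decoded.

```go
package main

import (
	"errors"
	"fmt"
)

// Top-level object kinds from the table above.
const (
	locmafFullHeader  = 23
	locmafDeltaHeader = 25
)

// readVarint decodes a QUIC variable-length integer (RFC 9000, Section 16).
// ASSUMPTION: LOCMAF's varints use this MOQT/QUIC encoding.
func readVarint(b []byte) (v uint64, n int, err error) {
	if len(b) == 0 {
		return 0, 0, errors.New("varint: empty input")
	}
	n = 1 << (b[0] >> 6) // two-bit prefix selects 1, 2, 4 or 8 bytes
	if len(b) < n {
		return 0, 0, errors.New("varint: truncated")
	}
	v = uint64(b[0] & 0x3f)
	for i := 1; i < n; i++ {
		v = v<<8 | uint64(b[i])
	}
	return v, n, nil
}

// Object is the parsed LOCMAF framing of one MoQ object.
type Object struct {
	HeaderID   uint64
	Properties []byte
	Payload    []byte // raw mdat payload: everything after the properties
}

// parseObject splits an object into header_id, properties block and payload.
func parseObject(obj []byte) (Object, error) {
	id, n1, err := readVarint(obj)
	if err != nil {
		return Object{}, err
	}
	plen, n2, err := readVarint(obj[n1:])
	if err != nil {
		return Object{}, err
	}
	start := uint64(n1 + n2)
	if plen > uint64(len(obj))-start {
		return Object{}, errors.New("properties_length exceeds object")
	}
	end := start + plen
	return Object{HeaderID: id, Properties: obj[start:end], Payload: obj[end:]}, nil
}

func main() {
	for _, obj := range [][]byte{
		{25, 0, 0xaa, 0xbb},         // steady-state delta: 2 B framing + payload
		{0x40, 0x63, 1, 0x00, 0xcc}, // header_id 99 (unknown), 1-byte properties
	} {
		o, err := parseObject(obj)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		switch o.HeaderID {
		case locmafFullHeader, locmafDeltaHeader:
			fmt.Printf("id=%d props=%d B payload=%d B\n", o.HeaderID, len(o.Properties), len(o.Payload))
		default:
			// Skip-unknown: log it; properties_length let us step over its properties.
			fmt.Printf("unknown header_id %d, skipped\n", o.HeaderID)
		}
	}
}
```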

Steady-state delta moof in two bytes

MoofDeltaHeader = 25 (1 B) | properties_length = 0 (1 B) | mdat raw payload … (N B)
LOCMAF framing (2 B)  ·  mdat data

The two bytes of LOCMAF framing carry a whole moof's worth of information: base_media_decode_time advances implicitly, the per-sample fields are unchanged from the previous moof, and the size of the single sample is computed from the total object size.
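For the steady state, rebuilding the moof reduces to two updates. A minimal sketch, assuming a one-sample fragment and single-byte varints; MoofState and applySteadyStateDelta are hypothetical names, and serializing the state back into moof bytes is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// MoofState is hypothetical receiver-side state for the previous
// one-sample moof of a track.
type MoofState struct {
	BaseMediaDecodeTime uint64
	SampleDuration      uint32
	SampleFlags         uint32
	SampleSize          uint32
}

// applySteadyStateDelta handles only the two-byte case: header_id = 25 and
// properties_length = 0, both single-byte varints. Every per-sample field is
// carried over from prev except the two that are implicit.
func applySteadyStateDelta(prev MoofState, obj []byte) (MoofState, error) {
	if len(obj) < 2 || obj[0] != 25 || obj[1] != 0 {
		return MoofState{}, errors.New("not a two-byte steady-state delta")
	}
	next := prev
	next.BaseMediaDecodeTime = prev.BaseMediaDecodeTime + uint64(prev.SampleDuration)
	next.SampleSize = uint32(len(obj) - 2) // the rest of the object is the sample
	return next, nil
}

func main() {
	prev := MoofState{BaseMediaDecodeTime: 48000, SampleDuration: 1024, SampleFlags: 0x02000000, SampleSize: 371}
	obj := append([]byte{25, 0}, make([]byte, 342)...) // 2 B framing + 342 B AAC frame
	next, err := applySteadyStateDelta(prev, obj)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(next.BaseMediaDecodeTime, next.SampleDuration, next.SampleSize) // 49024 1024 342
}
```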

The 2-byte steady state holds for clear content and for cbcs audio. Sub-sample encryption on video adds the subsample map per fragment (~6 B steady state), and cenc additionally carries the per-sample IV — see the DRM section below for the measured impact.

Designed for protected content

DRM is end-to-end transparent

LOCMAF's primary use case is low-latency DRM-protected streaming over MoQ. The encrypted mdat bytes, per-sample IVs, subsample byte ranges, and the tenc defaults (including default_KID) are carried verbatim — so the standard CDM / MSE / EME path on the receiver works as if the content had arrived as plain CMAF. LOCMAF is invisible to the player.

Catalog DRM signalling

CMSF carries the DRM description at the catalog level; LOCMAF doesn't replace it. A contentProtections array is referenced by per-track contentProtectionRefIDs, mirroring DASH-IF IOP 6.

{
  "contentProtections": [{
    "refID": "widevine",
    "defaultKID": ["abcdef0123456789abcdef0123456789"],
    "scheme": "cbcs",
    "drmSystem": {
      "systemID": "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed",
      "laURL": { "url": "https://lic.example.com/widevine", "type": "POST" },
      "pssh": "base64-pssh-box"
    }
  }],
  "tracks": [{
    "name": "video_400kbps_avc_drm",
    "packaging": "locmaf",
    "locmafVersion": "0.1",
    "contentProtectionRefIDs": ["widevine"],
    "initData": "..."
  }]
}

The receiver picks the first refID whose drmSystem.systemID matches a CDM it can talk to, then uses the named pssh / laURL to set up the MediaKeySession exactly as for a plain CMSF stream.
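The selection rule can be sketched with Go's encoding/json. The structs model only the catalog fields shown above, the supported map stands in for whatever CDM probing the player actually does, and the second (PlayReady) entry is illustrative. pickProtection is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type DRMSystem struct {
	SystemID string `json:"systemID"`
	LAURL    struct {
		URL  string `json:"url"`
		Type string `json:"type"`
	} `json:"laURL"`
	PSSH string `json:"pssh"`
}

type ContentProtection struct {
	RefID     string    `json:"refID"`
	Scheme    string    `json:"scheme"`
	DRMSystem DRMSystem `json:"drmSystem"`
}

type Track struct {
	Name                    string   `json:"name"`
	ContentProtectionRefIDs []string `json:"contentProtectionRefIDs"`
}

type Catalog struct {
	ContentProtections []ContentProtection `json:"contentProtections"`
	Tracks             []Track             `json:"tracks"`
}

// pickProtection walks the track's refIDs in order and returns the first
// protection whose systemID (compared case-insensitively) is in supported.
func pickProtection(cat Catalog, tr Track, supported map[string]bool) (ContentProtection, bool) {
	byRef := make(map[string]ContentProtection, len(cat.ContentProtections))
	for _, cp := range cat.ContentProtections {
		byRef[cp.RefID] = cp
	}
	for _, ref := range tr.ContentProtectionRefIDs {
		if cp, ok := byRef[ref]; ok && supported[strings.ToLower(cp.DRMSystem.SystemID)] {
			return cp, true
		}
	}
	return ContentProtection{}, false
}

const playReady = "9a04f079-9840-4286-ab92-e65be0885f95"

func main() {
	raw := `{
	  "contentProtections": [
	    {"refID": "widevine", "scheme": "cbcs",
	     "drmSystem": {"systemID": "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"}},
	    {"refID": "playready", "scheme": "cbcs",
	     "drmSystem": {"systemID": "9a04f079-9840-4286-ab92-e65be0885f95"}}
	  ],
	  "tracks": [{"name": "video_400kbps_avc_drm",
	              "contentProtectionRefIDs": ["widevine", "playready"]}]
	}`
	var cat Catalog
	if err := json.Unmarshal([]byte(raw), &cat); err != nil {
		fmt.Println("bad catalog:", err)
		return
	}
	// A receiver whose only CDM is PlayReady skips the Widevine entry.
	cp, ok := pickProtection(cat, cat.Tracks[0], map[string]bool{playReady: true})
	fmt.Println(cp.RefID, ok) // playready true
}
```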

cenc vs cbcs on the wire

scheme | IV | subsample map | extra delta-moof cost
cenc | per-sample, 8 or 16 B | ~3 B/subsample (video) | ~16 B IV + subsample bytes
cbcs | constant IV in tenc | ~3 B/subsample (video) | subsample bytes only

Audio uses full-sample encryption under both schemes, so audio under cbcs costs the same on the LOCMAF wire as clear audio — the constant IV lives once in the moov and no per-fragment encryption signalling is needed. Video carries the subsample map under both schemes.
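The table translates into a back-of-the-envelope per-fragment estimate. A Go sketch, assuming one sample per fragment and taking the approximate ~3 B per subsample entry at face value; extraDeltaBytes is a hypothetical helper, not part of any LOCMAF code base.

```go
package main

import "fmt"

// extraDeltaBytes roughly estimates the encryption signalling a steady-state
// delta moof adds per fragment, using the approximate costs from the table:
//   - video: about 3 B per subsample entry under both cenc and cbcs;
//   - cenc: the per-sample IV (8 or 16 B) on top;
//   - cbcs: the IV is constant in tenc, so nothing extra.
// Audio is full-sample encrypted, so it has no subsample map.
func extraDeltaBytes(scheme string, ivSize, subsamples int, video bool) int {
	extra := 0
	if video {
		extra += 3 * subsamples
	}
	if scheme == "cenc" {
		extra += ivSize
	}
	return extra
}

func main() {
	fmt.Println(extraDeltaBytes("cbcs", 0, 0, false))  // 0: same as clear audio
	fmt.Println(extraDeltaBytes("cbcs", 0, 2, true))   // 6
	fmt.Println(extraDeltaBytes("cenc", 16, 2, true))  // 22
	fmt.Println(extraDeltaBytes("cenc", 16, 0, false)) // 16
}
```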

Bitrate impact under DRM

track | scheme | CMAF wire | LOCMAF wire | saved
audio_128kbps_aac | cbcs | 191.4 kbps | 131.9 kbps | 31.1 %
audio_128kbps_aac | cenc | 197.4 kbps | 138.6 kbps | 29.8 %
video_400kbps_avc | cbcs | 408.8 kbps | 378.5 kbps | 7.4 %
video_400kbps_avc | cenc | 412.0 kbps | 382.1 kbps | 7.3 %

LOCMAF saves more relative to CMAF on DRM-protected content than on clear content — the CMAF moof grows under encryption (extra senc / saio / saiz) while LOCMAF only emits what it actually needs. Full per-track tables (Opus, AC-3, HEVC, 600 / 900 kbps) are in docs/LOCMAF.md.

Source-side requirements

Commensurate timescales

The 2-byte steady state requires every frame to have an integer duration in the chosen media timescale:

stream | timescale | ticks per frame
48 kHz AAC | 48 000 | 1 024
60000/1001 fps video (NTSC) | 60 000 | 1 001

With a mismatched timescale each frame drifts ±1 tick, and the per-sample duration array has to be sent every fragment — the 2-byte steady-state delta moof is no longer achievable.
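A source configuration can be checked quickly. In this Go sketch the frame duration is a rational number of seconds num/den (1024/48000 for an AAC frame, 1001/60000 for 59.94 fps video). The 90 kHz case is an illustrative mismatch, not one measured in the project, and commensurate and frameTicks are hypothetical names.

```go
package main

import "fmt"

// commensurate reports whether a frame lasting num/den seconds is an exact
// integer number of ticks at the given timescale.
func commensurate(timescale, num, den uint64) bool {
	return timescale*num%den == 0
}

// frameTicks returns the durations, in ticks, of the first n frames when
// frame start times are truncated to the tick grid. A non-integer ratio
// shows up as durations that vary by one tick.
func frameTicks(timescale, num, den uint64, n int) []uint64 {
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		start := uint64(i) * timescale * num / den
		end := uint64(i+1) * timescale * num / den
		out[i] = end - start
	}
	return out
}

func main() {
	fmt.Println(commensurate(48000, 1024, 48000), frameTicks(48000, 1024, 48000, 3)) // true [1024 1024 1024]
	fmt.Println(commensurate(60000, 1001, 60000), frameTicks(60000, 1001, 60000, 3)) // true [1001 1001 1001]
	fmt.Println(commensurate(90000, 1001, 60000), frameTicks(90000, 1001, 60000, 4)) // false [1501 1502 1501 1502]
}
```

With 59.94 fps at 90 kHz every frame lasts 1501.5 ticks, so the durations alternate and would have to be sent in every fragment.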

Scope

CMAF-shaped MP4 only

CMAF (ISO/IEC 23000-19) tightens ISOBMFF: each CMAF Track carries exactly one media track, each CMAF Chunk is one moof followed by exactly one mdat (§7.3.3.2), and each traf contains exactly one trun (Table 4). These restrictions are what make LOCMAF's "one MoQ object = one moof + one mdat" mapping unambiguous — a varint header_id, a properties block, and the rest of the object is mdat payload.

General fragmented MP4 may contain multiple traf / trun boxes per moof, multiple mdat boxes per fragment, or multiple tracks multiplexed into one file. LOCMAF does not address those layouts directly. Source content must be CMAF-conformant (or trivially repackaged into CMAF) before LOCMAF encoding.
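A packaging pipeline could reject non-CMAF-shaped chunks before LOCMAF encoding. This Go sketch checks only the box-level constraints named above: optional styp / prft / emsg, then one moof holding one traf with one trun, then one mdat. It is a hypothetical helper that handles only 32-bit box sizes, not a CMAF conformance checker.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type box struct {
	typ  string
	body []byte
}

// parseBoxes splits b into consecutive ISOBMFF boxes. Only 32-bit sizes are
// accepted (size 0 "to end" and size 1 "largesize" are rejected).
func parseBoxes(b []byte) ([]box, error) {
	var out []box
	for len(b) > 0 {
		if len(b) < 8 {
			return nil, errors.New("truncated box header")
		}
		size := binary.BigEndian.Uint32(b[:4])
		if size < 8 || uint64(size) > uint64(len(b)) {
			return nil, fmt.Errorf("unsupported or bad box size %d", size)
		}
		out = append(out, box{typ: string(b[4:8]), body: b[8:size]})
		b = b[size:]
	}
	return out, nil
}

// children returns the child boxes of body with the given type.
func children(body []byte, typ string) ([]box, error) {
	all, err := parseBoxes(body)
	var out []box
	for _, c := range all {
		if c.typ == typ {
			out = append(out, c)
		}
	}
	return out, err
}

// checkChunk verifies the chunk shape LOCMAF relies on.
func checkChunk(chunk []byte) error {
	top, err := parseBoxes(chunk)
	if err != nil {
		return err
	}
	n := len(top)
	if n < 2 || top[n-2].typ != "moof" || top[n-1].typ != "mdat" {
		return errors.New("chunk must end with one moof followed by one mdat")
	}
	for _, b := range top[:n-2] {
		if b.typ != "styp" && b.typ != "prft" && b.typ != "emsg" {
			return fmt.Errorf("unexpected %q before moof", b.typ)
		}
	}
	trafs, err := children(top[n-2].body, "traf")
	if err != nil || len(trafs) != 1 {
		return errors.New("moof must hold exactly one well-formed traf")
	}
	truns, err := children(trafs[0].body, "trun")
	if err != nil || len(truns) != 1 {
		return errors.New("traf must hold exactly one trun")
	}
	return nil
}

// cat concatenates byte slices; mkBox wraps the parts in a box header.
func cat(parts ...[]byte) []byte {
	var out []byte
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func mkBox(typ string, parts ...[]byte) []byte {
	body := cat(parts...)
	b := make([]byte, 8, 8+len(body))
	binary.BigEndian.PutUint32(b, uint32(8+len(body)))
	copy(b[4:], typ)
	return append(b, body...)
}

func main() {
	moof := mkBox("moof", mkBox("mfhd"), mkBox("traf", mkBox("tfhd"), mkBox("trun")))
	fmt.Println(checkChunk(cat(mkBox("styp"), moof, mkBox("mdat", []byte{1, 2, 3})))) // <nil>
	fmt.Println(checkChunk(cat(moof, mkBox("mdat"), mkBox("mdat"))))                  // error
}
```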

Resources

Specification, code, and reading

Origin

LOCMAF was developed as part of Hugo Björs's KTH MSc thesis, Efficient DRM in MoQ using Low Overhead CMAF (2026), under the supervision of Torbjörn Einarsson at Eyevinn Technology.

The format is intended as a packaging encoding in the CMSF catalog and complements the Low Overhead Media Container (LOC) defined by the MoQ working group.

Status

The wire format is at locmafVersion "0.1", advertised in the CMSF catalog Track entry whenever packaging == "locmaf". Receivers compare against their highest supported version and fall back if the encoder is ahead — this covers behavioural changes inside an existing object kind, which the header-ID skip-unknown rule on its own can't detect.
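A sketch of that comparison, assuming locmafVersion is always a numeric "major.minor" string and that a receiver can decode any version up to its highest supported one. canDecode and parseVersion are hypothetical names; what to fall back to (for instance a track with plain cmaf packaging, if the catalog offers one) is left to the player.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor" locmafVersion string.
func parseVersion(s string) (major, minor int, err error) {
	parts := strings.Split(s, ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("locmafVersion %q: want major.minor", s)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// canDecode reports whether a receiver whose highest supported version is
// highestSupported can decode a track advertising advertised. false means
// the encoder is ahead and the receiver should fall back.
func canDecode(advertised, highestSupported string) (bool, error) {
	am, an, err := parseVersion(advertised)
	if err != nil {
		return false, err
	}
	sm, sn, err := parseVersion(highestSupported)
	if err != nil {
		return false, err
	}
	if am != sm {
		return am < sm, nil
	}
	return an <= sn, nil
}

func main() {
	for _, adv := range []string{"0.1", "0.2"} {
		ok, err := canDecode(adv, "0.1")
		fmt.Println(adv, ok, err) // 0.1 true <nil>, then 0.2 false <nil>
	}
}
```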

The format may still evolve based on feedback, measurement on additional codecs, and the trajectory of CMSF and MOQT in the IETF. The reference encoder and decoder live in Eyevinn/moqlivemock and the browser player in Eyevinn/warp-player.