LOCMAF — Low Overhead CMAF for MOQ

The shape of the compression

Big chunks → tiny deltas → identical big chunks

A typical CMAF chunk header for one sample is about 104 bytes. With sample-level objects (every video frame and every audio frame is one MoQ object), that overhead becomes a meaningful share of the wire cost: about 25 % for low-bitrate audio, and less for video, where the media bitrate still dominates. LOCMAF observes that consecutive moof boxes are almost identical and delta-encodes them. For clear content and cbcs audio the delta moof shrinks to two bytes, and the same approach extends to DRM-protected video. It targets the same low-latency CMAF-chunk workload as LL-HLS Parts and DASH Chunked CMAF, with the per-chunk moof overhead compressed away on the MoQ wire.

A row of CMAF chunks compressed into delta blocks on the wire, then decompressed back to identical CMAF chunks. The sender packages big CMAF chunks, the wire carries tiny deltas, and the receiver rebuilds byte-identical chunks for the MSE player.

MoQ ↔ CMAF

One group per segment, one object per chunk

The natural mapping. Each MoQ group starts at a random-access point (IDR for video). Inside a group, objects are delivered in order, and that ordering is what makes the delta stream possible. Audio groups are aligned to video groups so a tune-in operation can deliver both clocks at once.

CMAF segments above mapped one-to-one onto MoQ groups, with each chunk mapped to one MoQ object.

Inside a moof

Most fields are predictable or constant

Consecutive moof boxes differ only in a handful of fields. LOCMAF compresses in two stages:

tfhd against trex defaults — values that already match the moov's trex are omitted entirely.
Delta encoding within a group — the first moof per group is full; every subsequent moof carries only the fields whose value changed since the previous moof.
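
The two stages above can be sketched in a few lines. This is a minimal illustration, not the reference codec: the helper names and the field values are hypothetical, fields are modelled as a plain dictionary, and the sketch assumes the set of fields does not shrink within a group.

```python
def strip_trex_defaults(tfhd: dict, trex: dict) -> dict:
    """Stage 1: drop tfhd values that already match the trex defaults."""
    return {k: v for k, v in tfhd.items() if k not in trex or trex[k] != v}


def encode_delta(prev: dict, cur: dict) -> dict:
    """Stage 2: keep only the fields whose value changed since the previous moof."""
    return {k: v for k, v in cur.items() if k not in prev or prev[k] != v}


def apply_delta(prev: dict, delta: dict) -> dict:
    """Receiver side: overlay the changed fields on the previous moof's state."""
    return {**prev, **delta}


# Hypothetical field values for three consecutive moofs in one group.
trex = {"default_sample_duration": 1024, "default_sample_flags": 0}
moofs = [
    {"default_sample_duration": 1024, "default_sample_flags": 0x02000000},
    {"default_sample_duration": 1024, "default_sample_flags": 0x01010000},
    {"default_sample_duration": 1024, "default_sample_flags": 0x01010000},
]

full = strip_trex_defaults(moofs[0], trex)  # first moof of the group
deltas = [encode_delta(a, b) for a, b in zip(moofs, moofs[1:])]

# Round trip: rebuild every moof's field set from the full header plus deltas.
state = {**trex, **full}
rebuilt = [state]
for d in deltas:
    state = apply_delta(state, d)
    rebuilt.append(state)
```

In this toy run the second moof carries one changed field and the third carries an empty delta, which is the shape of the steady state: once nothing changes, nothing is sent.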

base_media_decode_time is derived implicitly from the previous moof's sample durations; a timeline discontinuity (splice, capture gap) re-anchors with a new full header. The mdat box header never goes on the wire — its length is implied by the MoQ object length.
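
As a rough sketch of the implicit timeline, the receiver can predict the next base_media_decode_time from the previous moof, and the sender can fall back to a full header when the prediction would be wrong. The function names are hypothetical and the decision rule is an assumption based on the paragraph above, not a quote from the draft.

```python
from typing import Optional, Sequence


def expected_bmdt(prev_bmdt: int, prev_durations: Sequence[int]) -> int:
    """Decode time the receiver derives for the next moof."""
    return prev_bmdt + sum(prev_durations)


def header_kind(actual_bmdt: int, prev_bmdt: Optional[int],
                prev_durations: Sequence[int]) -> str:
    """Return 'full' at group start or on a discontinuity, else 'delta'."""
    if prev_bmdt is None:
        return "full"  # first moof of the group
    if actual_bmdt != expected_bmdt(prev_bmdt, prev_durations):
        return "full"  # splice or capture gap: re-anchor the timeline
    return "delta"


# 48 kHz AAC with one 1024-tick sample per moof.
print(header_kind(1024, 0, [1024]))     # contiguous timeline
print(header_kind(4096, 1024, [1024]))  # gap in the timeline
```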

Anatomy of a source moof showing which fields LOCMAF omits, derives, or emits as deltas.

Empirical results

~2.3 % of the CMAF moof bytes

Measured against assets/test10s/video_400kbps_avc.mp4 — a 250-fragment AVC group with one sample per fragment.

metric	value	note
CMAF moof total	26 040 B	~104 B per fragment × 250
LOCMAF object total	592 B	19 B full + 2.3 B average delta
Compression ratio	44 : 1	on the moof headers themselves

Bar chart comparing 26 040 bytes of CMAF moof overhead to 592 bytes of LOCMAF wire bytes. — Measured with the v0.2 implementation; v0.3 keeps the same 2-byte steady state.

Timeline of one MoQ group: a full moof followed by a brief IDR-transition delta and then a long run of 2-byte steady-state delta moofs.
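
The headline figures can be recomputed from the measured totals. The snippet below is only arithmetic on the numbers reported in this section; the variable names are made up for the example.

```python
cmaf_moof_total = 26_040  # bytes, measured CMAF moof total
locmaf_total = 592        # bytes, measured LOCMAF header total
fragments = 250
full_header = 19          # bytes, first moof of the group
avg_delta = 2.3           # bytes, average over the remaining moofs

per_fragment = cmaf_moof_total / fragments                   # about 104 B
estimated_total = full_header + (fragments - 1) * avg_delta  # about 592 B
share = 100 * locmaf_total / cmaf_moof_total                 # about 2.3 %
ratio = cmaf_moof_total / locmaf_total                       # about 44

print(f"{per_fragment:.1f} B/fragment, {estimated_total:.0f} B total, "
      f"{share:.1f} %, {ratio:.0f} : 1")
```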

Per-track wire bitrate

The CMSF catalog reports the wire bitrate including framing. The moof saving per object is essentially constant (~104 B → ~2 B, roughly 100 B/object), so the percentage saving grows as the track's bitrate shrinks, and audio gains the most.

track	sample bitrate	CMAF wire bitrate	LOCMAF wire bitrate	saved
`audio_128kbps_aac`	128.0 kbps	171.5 kbps	131.9 kbps	23.1 %
`video_400kbps_avc`	373.2 kbps	396.4 kbps	376.5 kbps	5.0 %

The 128 kbps AAC track lands within ~3% of the raw sample bitrate — the remaining LOCMAF overhead is ~2 B/object × ~47 obj/s ≈ 0.75 kbps plus the MoQ object framing.
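
The percentages and the residual header cost follow directly from the table. The sketch below redoes that arithmetic; it assumes the AAC track is 48 kHz with one 1024-sample frame per object (which matches the ~47 obj/s quoted above), and the function names are hypothetical.

```python
def saved_percent(cmaf_kbps: float, locmaf_kbps: float) -> float:
    """Relative wire-bitrate saving of LOCMAF over CMAF, in percent."""
    return 100 * (cmaf_kbps - locmaf_kbps) / cmaf_kbps


def overhead_kbps(bytes_per_object: float, objects_per_second: float) -> float:
    """Bitrate consumed by a fixed per-object header."""
    return bytes_per_object * 8 * objects_per_second / 1000


audio_saved = saved_percent(171.5, 131.9)
video_saved = saved_percent(396.4, 376.5)

# Assumption: 48 kHz AAC, one 1024-sample frame per MoQ object.
aac_objects_per_second = 48_000 / 1024
aac_header_kbps = overhead_kbps(2, aac_objects_per_second)

print(f"audio saved {audio_saved:.1f} %, video saved {video_saved:.1f} %, "
      f"2-byte header costs {aac_header_kbps:.2f} kbps")
```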

Packaging

One framing for every object kind

LOCMAF object framing: element_type vi64, properties_length vi64, properties block, mdat raw payload.

An element sequence

Every MoQ object is zero or more genBox elements, exactly one moof header (full or delta), and the raw mdat payload. Each element starts with an element_type vi64:

element_type	Symbol	Meaning
1	genBox	one generic pre-moof box (styp / prft / emsg / uuid), carried verbatim
2	locmafFullHeader	full moof header (absolute encoding)
3	locmafDeltaHeader	delta moof header (in-group deltas)

The CMAF Header (init) is delivered verbatim via the MSF / CMSF catalog — the same init-data entry a plain cmaf track uses, so the two packagings of one source can share it. Init is a one-time-per-track cost; LOCMAF compresses the per-chunk overhead.

No codepoints leave the LOCMAF payload, so nothing needs an IANA registry: the catalog's packaging: "locmaf" declaration scopes the bytes before any object arrives. Extension happens through new genBox box types and new field IDs (unknown field IDs are skipped via the parity rule); the element types themselves are fixed per locmafVersion.

Steady-state delta moof in two bytes

field	size
locmafDeltaHeader = 3	1 B
properties_length = 0	1 B
mdat raw payload …	N B

The first two rows are the LOCMAF framing (2 B); the rest is mdat data.

The two-byte LOCMAF framing carries one whole moof's worth of information. base_media_decode_time advances implicitly, and per-sample fields are unchanged from the previous moof. The size of the single sample is calculated from the total object size.
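
To make the last point concrete, here is a small sketch of how a receiver could recover the sample size and rebuild the mdat box from a steady-state object. It assumes the two framing values occupy one byte each, as described above, and relies on the standard ISOBMFF box header (32-bit size followed by the four-character type). The function name is hypothetical, and rebuilding the moof itself is out of scope here.

```python
import struct
from typing import Tuple


def rebuild_mdat(obj: bytes) -> Tuple[int, bytes]:
    """Return (sample_size, mdat box) for a 2-byte steady-state object."""
    if obj[:2] != b"\x03\x00":
        raise ValueError("not a steady-state delta object")
    payload = obj[2:]
    sample_size = len(payload)  # single sample: everything after the framing
    # The mdat header never travels on the wire; recreate it locally.
    header = struct.pack(">I4s", 8 + sample_size, b"mdat")
    return sample_size, header + payload


size, mdat = rebuild_mdat(b"\x03\x00" + bytes(341))
print(size, len(mdat), mdat[4:8])
```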

The 2-byte steady state holds for clear content and for cbcs audio. Sub-sample encryption on video adds the subsample map per fragment (~6 B steady state), and cenc additionally carries the per-sample IV — see the DRM section below for the measured impact.

Designed for protected content

DRM is end-to-end transparent

LOCMAF's primary use case is low-latency DRM-protected streaming over MoQ. The encrypted mdat bytes, per-sample IVs, subsample byte ranges, and the tenc defaults (including default_KID) are carried verbatim — so the standard CDM / MSE / EME path on the receiver works as if the content had arrived as plain CMAF. LOCMAF is invisible to the player.

Catalog DRM signaling

CMSF carries the DRM description at the catalog level; LOCMAF doesn't replace it. A contentProtections array is referenced by per-track contentProtectionRefIDs, mirroring DASH-IF IOP 6.

{
  "contentProtections": [{
    "refID": "widevine",
    "defaultKID": ["abcdef0123456789abcdef0123456789"],
    "scheme": "cbcs",
    "drmSystem": {
      "systemID": "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed",
      "laURL": { "url": "https://lic.example.com/widevine", "type": "POST" },
      "pssh": "base64-pssh-box"
    }
  }],
  "initDataList": [{ "id": "v1init", "type": "base64", "data": "..." }],
  "tracks": [{
    "name": "video_400kbps_avc_drm",
    "packaging": "locmaf",
    "locmafVersion": "0.3",
    "contentProtectionRefIDs": ["widevine", "playready", "fairplay"],
    "initRef": "v1init"
  }]
}

The receiver picks the first refID whose drmSystem.systemID matches a CDM it can talk to, then uses the named pssh / laURL to set up the MediaKeySession exactly as for a plain CMSF stream.
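
The selection rule can be sketched as follows. The function name and the `supported_system_ids` parameter are hypothetical; the catalog keys are the ones from the example above. RefIDs with no matching contentProtections entry are simply skipped, which matters for the abridged example, where only the widevine entry is spelled out.

```python
from typing import Optional, Set


def pick_protection(catalog: dict, track: dict,
                    supported_system_ids: Set[str]) -> Optional[dict]:
    """Return the first referenced contentProtection the local CDMs can handle."""
    by_ref = {cp["refID"]: cp for cp in catalog.get("contentProtections", [])}
    for ref_id in track.get("contentProtectionRefIDs", []):
        cp = by_ref.get(ref_id)
        if cp and cp["drmSystem"]["systemID"].lower() in supported_system_ids:
            return cp
    return None


catalog = {
    "contentProtections": [{
        "refID": "widevine",
        "scheme": "cbcs",
        "drmSystem": {
            "systemID": "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed",
            "laURL": {"url": "https://lic.example.com/widevine", "type": "POST"},
        },
    }],
    "tracks": [{
        "name": "video_400kbps_avc_drm",
        "contentProtectionRefIDs": ["widevine", "playready", "fairplay"],
    }],
}

widevine_only = {"edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"}
chosen = pick_protection(catalog, catalog["tracks"][0], widevine_only)
print(chosen["refID"] if chosen else "no usable DRM system")
```

The chosen entry then supplies the pssh and laURL for the MediaKeySession setup.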

`cenc` vs `cbcs` on the wire

scheme	per-sample IV	subsample map	extra delta-moof cost
`cenc`	per-sample, 8 or 16 B	~3 B/subsample (video)	~16 B IV + subsample bytes
`cbcs`	constant IV in `tenc`	~3 B/subsample (video)	subsample bytes only

Audio uses full-sample encryption under both schemes, so audio under cbcs costs the same on the LOCMAF wire as clear audio — the constant IV lives once in the moov and no per-fragment encryption signaling is needed. Video carries the subsample map under both schemes.

Bitrate impact under DRM

track	scheme	CMAF wire bitrate	LOCMAF wire bitrate	saved
`audio_128kbps_aac`	`cbcs`	191.4 kbps	131.9 kbps	31.1 %
`audio_128kbps_aac`	`cenc`	197.4 kbps	138.6 kbps	29.8 %
`video_400kbps_avc`	`cbcs`	408.8 kbps	378.5 kbps	7.4 %
`video_400kbps_avc`	`cenc`	412.0 kbps	382.1 kbps	7.3 %

LOCMAF saves more relative to CMAF on DRM-protected content than on clear content — the CMAF moof grows under encryption (extra senc / saio / saiz) while LOCMAF only emits what it actually needs. The receiver regenerates senc, saiz, and saio canonically, and the scheme is carried agnostically — cbc1 and cens ride the same fields.

Source-side requirements

Commensurate timescales

The 2-byte steady state requires every frame to have an integer duration in the chosen media timescale:

stream	timescale	ticks per frame
48 kHz AAC	48 000	1 024
60000/1001 fps video (NTSC)	60 000	1 001

With a mismatched timescale each frame drifts ±1 tick, and the per-sample duration array has to be sent every fragment — the 2-byte steady-state delta moof is no longer achievable.
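
A quick way to test a candidate timescale is to compute the exact frame duration as a fraction and check that it is a whole number of ticks. The helper names below are hypothetical, and the 90 000 timescale is just an illustrative mismatch.

```python
from fractions import Fraction


def ticks_per_frame(timescale: int, frame_rate: Fraction) -> Fraction:
    """Exact frame duration expressed in timescale ticks."""
    return Fraction(timescale) / frame_rate


def is_commensurate(timescale: int, frame_rate: Fraction) -> bool:
    """True when every frame lasts a whole number of ticks."""
    return ticks_per_frame(timescale, frame_rate).denominator == 1


aac_48k = Fraction(48_000, 1024)  # AAC frames per second at 48 kHz
ntsc_60 = Fraction(60_000, 1001)  # 59.94 fps video

for name, timescale, rate in [
    ("48 kHz AAC", 48_000, aac_48k),
    ("59.94 fps video", 60_000, ntsc_60),
    ("59.94 fps video", 90_000, ntsc_60),  # mismatched: 1501.5 ticks per frame
]:
    print(name, timescale, ticks_per_frame(timescale, rate),
          is_commensurate(timescale, rate))
```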

Scope

CMAF-shaped MP4 only

CMAF (ISO/IEC 23000-19) tightens ISOBMFF: each CMAF Track carries exactly one media track, each CMAF Chunk is one moof followed by exactly one mdat (§7.3.3.2), and each traf contains exactly one trun (Table 4). These restrictions are what make LOCMAF's "one MoQ object = one moof + one mdat" mapping unambiguous — a short element sequence, and the rest of the object is mdat payload.

General fragmented MP4 may contain multiple traf / trun boxes per moof, multiple mdat boxes per fragment, or multiple tracks multiplexed into one file. LOCMAF does not address those layouts directly. Source content must be CMAF-conformant (or trivially repackaged into CMAF) before LOCMAF encoding.
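
As a rough pre-flight check, a source file can be scanned for the "one moof, then exactly one mdat" shape before encoding. The sketch below only walks top-level boxes using the standard ISOBMFF size and type header; it does not look inside the moof, so it cannot confirm the one-traf and one-trun restrictions, and it is not a substitute for a real CMAF conformance tool. All function names are hypothetical.

```python
import struct
from typing import Iterator


def top_level_box_types(data: bytes) -> Iterator[bytes]:
    """Yield the four-character type of each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size == 1:  # 64-bit largesize follows the type
            (size,) = struct.unpack_from(">Q", data, pos + 8)
        elif size == 0:  # box extends to the end of the data
            size = len(data) - pos
        if size < 8 or pos + size > len(data):
            raise ValueError(f"bad box size at offset {pos}")
        yield box_type
        pos += size


def has_one_mdat_per_moof(data: bytes) -> bool:
    """True when moof and mdat boxes strictly alternate, starting with moof."""
    kinds = [t for t in top_level_box_types(data) if t in (b"moof", b"mdat")]
    pairs = zip(kinds[0::2], kinds[1::2])
    return len(kinds) % 2 == 0 and all(p == (b"moof", b"mdat") for p in pairs)


def box(box_type: bytes, body: bytes = b"") -> bytes:
    """Build a minimal box for the demo below."""
    return struct.pack(">I4s", 8 + len(body), box_type) + body


good = box(b"styp") + box(b"moof") + box(b"mdat", b"abc")
bad = box(b"moof") + box(b"mdat", b"a") + box(b"mdat", b"b")
print(has_one_mdat_per_moof(good), has_one_mdat_per_moof(bad))
```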

Resources

Specification, code, and reading

In your browser: Conformance checker. Drop a .locmaf file to verify conformance, or a fragmented CMAF file to see it round-trip through LOCMAF. Everything runs client-side via WebAssembly; nothing is uploaded.
Presentation: LOCMAF slide deck. A MARP slide deck summarizing the LOCMAF design.
Reference codec: Eyevinn/locmaf. The Go reference codec, the locmaf CLI (align, pack, dump, verify), and a byte-pinned corpus of golden conformance vectors.
Reference impl: Eyevinn/moqlivemock. Go MoQ server and subscriber with the LOCMAF encoder and decoder.
Browser player: Eyevinn/warp-player. Browser MoQ player with LOCMAF and EME / DRM support.
Live demo: moqlivemock.demo.osaas.io. Public demo of the LOCMAF + DRM stack streaming over MOQT.
Spec: draft-einarsson-moq-locmaf. The LOCMAF Internet-Draft: object encoding, CMSF catalog signaling, and canonical reconstruction.
Spec: draft-ietf-moq-cmsf. CMAF MoQ Streaming Format, the catalog LOCMAF rides in.
Spec: draft-ietf-moq-loc. Low Overhead Container, LOCMAF's complementary format for non-CMAF media.
Spec: draft-ietf-moq-transport. Media over QUIC Transport, the underlying transport.
Related: draft-lcurley-compressed-mp4. Compressed MP4, generic ISOBMFF box-header compression; cuts per-fragment overhead from ~100 to ~20 bytes.

Origin

LOCMAF was developed as part of Hugo Björs's KTH MSc thesis, Efficient DRM in MoQ using Low Overhead CMAF (2026), under the supervision of Torbjörn Einarsson at Eyevinn Technology.

The format is intended as a packaging encoding in the CMSF catalog and complements the Low Overhead Container (LOC) defined by the MoQ working group.

Status

The packaging is at locmafVersion "0.3", specified in draft-einarsson-moq-locmaf and advertised in the CMSF catalog Track entry whenever packaging == "locmaf". Following the June 2026 MoQ interim, LOCMAF is being folded into CMSF as a packaging mode rather than proceeding as a standalone draft.

The reference encoder and decoder live in Eyevinn/moqlivemock and the browser player in Eyevinn/warp-player; both currently implement v0.2, with the v0.3 update in progress.