Appending Videos in JavaScript with MediaSource Buffers

Intro
- Link to All Code Demos
Getting Dummy Data Ready
- Loading Pre-Generated Dummy Data
General Approach
Multi-File Support
- Source Buffer Modes: Segments vs Sequence
- Better Option: Streaming Formats
Research and FAQ
- Existing Tutorials and Examples
- Issues I Ran Into

Intro

In my previous post about using raw data in JavaScript, I alluded to the concept of dynamically loading data into video elements using the newer Media Source Extensions API (aka MSE) and its collection of interfaces, buffers, and sources.

I just finished brute-forcing my way through some of the basics (and I do mean basics) of MSE, and wanted to share my findings on dynamically loading data into videos, specifically by using the MediaSource interface, SourceBuffer, and the appendBuffer method. MSE is a really complicated topic, and distinctly separate from my normal area of programming, so my examples are going to be a lot more basic than what I normally post. Codecs and math-heavy coding are just not my bread-and-butter 🤷‍♂️.

Anyways, I’m posting this because although my examples are simple, there is not nearly enough info out there on the web on how to even start with MSE / MediaSource. I’m hoping this helps someone.

Link to All Code Demos

I have coded several fully-functional and well-commented examples that use the approaches from this post to dynamically load video. These will be discussed further on, but if you want to jump right to the source code, you can find it at:

github.com/joshuatz/mediasource-append-examples

Getting Dummy Data Ready

First things first: I need some video clips that are going to work with MediaSource (see the “Research and FAQ” section for more info on file formats). For my demo, I’m going to stick with WebM as the container, and VP9 as the codec. I could use FFmpeg to convert the videos I want to use to the right format directly, but in my case, I used Shotcut to handle both the trimming and conversion process.

If you don’t have your own files to work with, I compiled a list of resources I found as I worked on this.

💡 I was a little lost on the state of video support across different browsers, until I found this awesome writeup by @Vestride: Encoding Video for the Web. Mozilla also has a great guide. General idea is AVC-H.264/MP4+AAC offers widest compatibility, but is commercially licensed, whereas a better option with less support is VP8 or VP9 with Opus audio.

Loading Pre-Generated Dummy Data

I know that I’m going to need the video files that I want to load accessible from within my JavaScript code. This leaves me with two main options.

The first is pretty standard. Upload the video clips somewhere (or find already hosted clips), and from JavaScript, use fetch() to retrieve them via network request and get the binary blob.
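As a rough sketch of that first option, something like the following would fetch a hosted clip and hand back an ArrayBuffer, which is a form that appendBuffer accepts. The helper name fetchClipAsArrayBuffer and the optional fetchImpl parameter are hypothetical, made up purely for illustration.

```javascript
// Hypothetical helper: fetch a hosted clip and return its raw bytes.
// `fetchImpl` defaults to the global fetch; it is only a parameter so the
// function can be exercised outside of a browser with a stand-in.
async function fetchClipAsArrayBuffer(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // arrayBuffer() resolves with the binary body of the response
  return response.arrayBuffer();
}
```

In a browser, this would be called as fetchClipAsArrayBuffer(someUrl), where someUrl points at wherever the clip happens to be hosted.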

The other approach is something I want to do for fun, and to reuse some concepts from my last post on binary formats. I want to store my video clips, directly in JavaScript!

To do this, first we need to get the binary data into a “stringified” format. You can’t just open a video file with Notepad and copy and paste the text into a JS file. However, Base64 encoding is a fast and easy way to store binary data:

base64 --wrap=0 sample_vid.webm > sample_vid-base64.txt

If you don’t have access to a local base64 converter, you can also use an online converter tool.

Now, I can store the base64 string directly in JS:

const vidClip = `GkXfo59...`;

However, when I’m ready to append the clip into a source buffer with MediaSource, I’m going to have to convert it back into a binary blob (more on this later).
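For reference, here is a minimal sketch of that conversion, assuming the standard atob() function is available (it is in browsers and in recent Node versions). The name base64ToArrayBuffer is just a placeholder for this illustration.

```javascript
// Hypothetical helper: decode a Base64 string into an ArrayBuffer.
function base64ToArrayBuffer(base64Str) {
  // atob() yields a "binary string": one character per decoded byte
  const binaryStr = atob(base64Str);
  const bytes = new Uint8Array(binaryStr.length);
  for (let i = 0; i < binaryStr.length; i++) {
    bytes[i] = binaryStr.charCodeAt(i);
  }
  return bytes.buffer;
}

// WebM files start with the EBML magic number, 0x1A45DFA3, which is why
// Base64-encoded WebM strings tend to begin with "GkXfo"
const header = new Uint8Array(base64ToArrayBuffer('GkXfow=='));
console.log(Array.from(header, (b) => b.toString(16))); // → ['1a', '45', 'df', 'a3']
```

The returned ArrayBuffer can be passed to appendBuffer the same way as one retrieved over the network.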

General Approach

For dynamically loading raw video data into a <video> element using the MediaSource API, the primary method we need to be focused on is the sourceBuffer.appendBuffer(buffer) method. This takes a chunk of raw data (as an ArrayBuffer, or a typed array view such as a Uint8Array) and appends it to an existing SourceBuffer instance.

However, to actually get to the point where we can use this to append to a video, there is a bunch of setup necessary, since our SourceBuffer should belong to a MediaSource instance, which in turn should be connected to a <video> through an Object URL.

The general steps to put it all together look something like this:

Get a reference to a <video> element (in the DOM)
Create a new instance of the MediaSource interface (const mediaSource = new MediaSource())
Create an Object URL that points to that raw source. Point the src attribute of the video element to it
- Example: videoElem.src = URL.createObjectURL(mediaSource)
Attach a listener to the mediaSource for the sourceopen event (or define as the callback property onsourceopen)
- We need this to fire before we can attach a buffer
- You can also check the mediaSource.readyState property, to see if it === 'open'
Once sourceopen has fired, create an instance of the SourceBuffer interface, which will hold data, and attach it to mediaSource with the addSourceBuffer method:
- Example: const sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vp9,opus"');
Now that we have the buffer, we can start loading data with the sourceBuffer.appendBuffer(arrayBuffer) method
Every time that we append a buffer, we also need to listen for the updateend event that will fire after the browser is done with the operation
- If we have more chunks to append, we can start a new append operation, and repeat the listener cycle over again
- If we don’t have any more chunks, we need to close out / signal the end of the stream with the mediaSource.endOfStream() method. We also might want to call videoElement.play() at this point, if the video is not set to autoplay (note: unless playback is triggered by a user interaction, the video generally must be muted or else this will fail due to autoplay rules).
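The steps above can be sketched roughly as follows. This is a simplified illustration rather than the exact code from my demos; the function names appendChunksInOrder and loadIntoVideo are hypothetical, and chunks is assumed to be an array of ArrayBuffers that are already in playback order.

```javascript
// Append each chunk in order, waiting for `updateend` before starting the
// next append, then signal the end of the stream.
function appendChunksInOrder(mediaSource, sourceBuffer, chunks) {
  return new Promise((resolve, reject) => {
    const queue = [...chunks];
    const appendNext = () => {
      if (queue.length === 0) {
        sourceBuffer.removeEventListener('updateend', appendNext);
        // endOfStream() is only valid while the source is still open
        if (mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
        resolve();
        return;
      }
      try {
        sourceBuffer.appendBuffer(queue.shift());
      } catch (err) {
        sourceBuffer.removeEventListener('updateend', appendNext);
        reject(err);
      }
    };
    sourceBuffer.addEventListener('updateend', appendNext);
    appendNext();
  });
}

// Browser-only wiring: MediaSource -> Object URL -> <video> -> SourceBuffer
function loadIntoVideo(videoElem, chunks, mimeType) {
  const mediaSource = new MediaSource();
  videoElem.src = URL.createObjectURL(mediaSource);
  return new Promise((resolve, reject) => {
    mediaSource.addEventListener(
      'sourceopen',
      () => {
        try {
          const sourceBuffer = mediaSource.addSourceBuffer(mimeType);
          appendChunksInOrder(mediaSource, sourceBuffer, chunks).then(resolve, reject);
        } catch (err) {
          // e.g. the MIME type / codec string is not supported
          reject(err);
        }
      },
      { once: true }
    );
  });
}
```

Usage would look something like loadIntoVideo(videoElem, [arrayBuffer], 'video/webm; codecs="vp9,opus"'), followed by a call to videoElem.play() once the returned promise resolves.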

– These exact steps are used in my “Standard” approach example


Multi-File Support

In discussing multi-file support, it is important to make a clear distinction between multi-file video loading vs streamed “chunks”, as these often get confused. A single video file (sample.webm) can be split into multiple chunks (e.g. ArrayBuffers), but you can also have multiple video files (sample_a.webm, sample_b.webm), which each get their own buffer (and/or are split into even smaller chunks for each file).

You can use appendBuffer either for data chunks of a single video file, or for entire disparate video files.

This section is going to be discussing multi-file appends, as opposed to single-file chunk appends, as there are some special caveats that apply when appending completely separate files.

For chunks of the same file, the standard approach above for loading buffers should work just fine (as a basic example).
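To make the “chunks of a single file” case concrete, here is a small sketch that slices one ArrayBuffer into fixed-size pieces. The helper name splitIntoChunks and the idea of using an arbitrary chunk size are assumptions for illustration; as far as I can tell, the pieces do not need to line up with any boundaries inside the file, as long as they are appended in their original order.

```javascript
// Hypothetical helper: split one file's bytes into fixed-size chunks.
function splitIntoChunks(arrayBuffer, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new RangeError('chunkSize must be a positive number');
  }
  const chunks = [];
  for (let offset = 0; offset < arrayBuffer.byteLength; offset += chunkSize) {
    // slice() copies the byte range [offset, offset + chunkSize),
    // clamped to the end of the buffer for the final chunk
    chunks.push(arrayBuffer.slice(offset, offset + chunkSize));
  }
  return chunks;
}
```

Each resulting chunk would then be appended one at a time, waiting for updateend between appends.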

Source Buffer Modes: Segments vs Sequence

One of the primary caveats that apply to multi-file appends is how they are handled depending on the sourceBuffer.mode property (spec).

In sequence mode, you are forcing new appends to be treated as adjacent to the previous, regardless of timestamps held within the files themselves. This works to our advantage with multiple file appends, but would be bad if we were appending chunks in a random order and relying on their internal timestamps for placement.
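Switching a buffer into sequence mode is just a property assignment. A minimal sketch, assuming mediaSource has already fired sourceopen (the helper name addSequenceModeBuffer is hypothetical):

```javascript
// Hypothetical helper: create a SourceBuffer that ignores the timestamps
// inside each append and places new data directly after the previous append.
function addSequenceModeBuffer(mediaSource, mimeType) {
  const sourceBuffer = mediaSource.addSourceBuffer(mimeType);
  // The mode can only be changed while the buffer is not mid-append
  sourceBuffer.mode = 'sequence';
  return sourceBuffer;
}
```

After this, whole files can be appended one after another, with the usual wait for updateend between appends.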

– Sequence Mode Example

In sequence mode, you will also see this warning in Chrome when using it with multiple files:

Warning: using MSE 'sequence' AppendMode for a SourceBuffer with multiple tracks may cause loss of track synchronization. In some cases, buffered range gaps and playback stalls can occur. It is recommended to instead use 'segments' mode for a multitrack SourceBuffer.

In addition to the likelihood of sync bugs that the above warning is pointing out, there is also a chance that support for using sequence mode with multiple files might be deprecated in Chrome. Their recommendation would be to switch to segments.

In segments mode, the internal timestamps of the content of the buffer determine placement; this means that you can append chunks of a video in any order you want, as long as they contain “coded frames” with timestamps. This is great for handling chunks passed over a network connection, as you cannot guarantee the order in which they will be returned from the server.

However, this works to our disadvantage for multiple-file appends, as relying on internal timestamps for placement makes no sense when the chunks are entirely separate files. The files do not know about each other when they are encoded (why would they?), so their timestamps only describe their own content, not their relation to other files on the timeline. In practice, this usually means that, without tweaking your code, if you append multiple files with mode set to segments, they will overwrite each other in the buffer, and you end up with only one video getting played back.

So, to use segments mode with multiple files, you need to manage the timing offset between file appends yourself. My segments mode demo shows this in action, using sourceBuffer.timestampOffset to shift where the next append call will place data on the timeline.
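One way to manage that offset is sketched below. It assumes the end of the last buffered range is a good enough starting point for the next file, which may not hold for every set of input files; tracking each file's known duration would be an alternative. The names getBufferedEnd and appendFilesBackToBack are hypothetical and this is not a copy of the demo code.

```javascript
// End time (in seconds) of the last buffered range, or 0 if nothing is buffered.
function getBufferedEnd(sourceBuffer) {
  const ranges = sourceBuffer.buffered;
  return ranges.length > 0 ? ranges.end(ranges.length - 1) : 0;
}

// Append whole files one after another in segments mode, shifting each
// file's internal timestamps so it starts where the previous one ended.
function appendFilesBackToBack(mediaSource, sourceBuffer, files) {
  sourceBuffer.mode = 'segments';
  const queue = [...files];
  return new Promise((resolve, reject) => {
    const appendNext = () => {
      if (queue.length === 0) {
        sourceBuffer.removeEventListener('updateend', appendNext);
        if (mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
        resolve();
        return;
      }
      try {
        // Must be set between appends, never while the buffer is updating
        sourceBuffer.timestampOffset = getBufferedEnd(sourceBuffer);
        sourceBuffer.appendBuffer(queue.shift());
      } catch (err) {
        sourceBuffer.removeEventListener('updateend', appendNext);
        reject(err);
      }
    };
    sourceBuffer.addEventListener('updateend', appendNext);
    appendNext();
  });
}
```

Without the timestampOffset line, each file would be placed according to its own internal timestamps, which is what leads to the files overwriting each other.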

– Segments Mode Example

Better Option: Streaming Formats

For live video feeds, or just a really robust video loading approach that can handle things like adaptive bitrate streaming, resolution switching, etc. – you probably want to switch to using a streaming video format paired with a well-supported player component.

The two most popular streaming formats are DASH (aka MPEG-DASH, Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming). There are tons of differences between these two formats, better explained by those more qualified to do so than myself. However, the general gist is that DASH is newer, but gaining in native adoption and power, whereas HLS is older and not growing as fast, but has a lot of legacy support.

These are some good starting places for learning more:

MDN: Live Streaming Web Audio and Video
Cloudflare: HLS vs DASH
Eleven-Labs: DASH & HLS

In general, these streaming formats are complicated to implement by hand, so you would want to use them with a well-known and supported player or library, such as:

Multi-format players, with extra features:
- Google’s Shaka Player
  - Supports both DASH and HLS, plus tons of other features
- Bitmovin Player (Commercial)
Format specific
- Video-Dev: hls.js
- Dash Industry Forum: dash.js

There are also paid platforms out there that handle even more of the process, such as automatic transcoding of a single high-resolution input file into multiple bitrates and resolutions, and player embed code generation.

Research and FAQ

These are notes that I jotted down while learning about MSE and trying to build out my example demos.

Important questions

MediaSource vs MediaStream
- Good answer: https://stackoverflow.com/questions/51843518/mediasource-vs-mediastream-in-javascript
Specs to know about
- w3c/media-source aka MSE (Media Source Extensions API)
What can make up a MediaSource stream? What MIME types?
- https://w3c.github.io/media-source/byte-stream-format-registry.html
  - For video
    - video/webm (https://w3c.github.io/media-source/webm-byte-stream-format.html)
      - vorbis
      - opus
      - vp8
      - vp9
      - vp90...
    - video/mp4
    - video/mp2t
What does the MediaSource pipeline look like?
- https://w3c.github.io/media-source/pipeline_model.svg
How do you format the data that you feed into the buffer (e.g. through appendBuffer())?
- See: https://w3c.github.io/media-source/index.html#byte-stream-formats
How does WebM fit into adaptive streaming?
- See: http://wiki.webmproject.org/adaptive-streaming. It is often used with MPEG-DASH.

Existing Tutorials and Examples

Issues I Ran Into

Uncaught DOMException: Failed to execute 'appendBuffer' on 'SourceBuffer': This SourceBuffer has been removed from the parent media source
- I ran into this error in Chromium, but not in Firefox. It can very often (and in my case too) be traced to mismatched codecs (Firefox was more forgiving than Chromium in this instance)
- For example, when I saw this error, I ended up going to the Media tab of dev tools in Chrome, then the player I was interested in, and then Messages; I finally saw: “Audio stream codec opus doesn't match SourceBuffer codecs.” Aha!
  - In this case, the buffer I was trying to append did correctly use the opus codec, and opus is a supported audio type to bundle with VP9. However, I forgot to declare it as part of the addSourceBuffer call
  - I had to change mediaSource.addSourceBuffer('video/webm; codecs="vp9"') to mediaSource.addSourceBuffer('video/webm; codecs="vp9,opus"')
- I think Firefox will automatically detect codecs from the data you append, whereas Chromium requires explicit declaration (just a guess)
- If you want to check if a mimetype is supported, you can query it with MediaSource.isTypeSupported(mimeTypeStr), like MediaSource.isTypeSupported('video/webm; codecs="vp9,opus"')
Gapless sequential playback (mode = sequence) does not work in Chrome (stalls), but works in Firefox just fine (for some files, see issue below this one)
- If you are using multi-track appends (e.g. separate files as opposed to chunks / stream), this is a complex and apparently buggy issue with Chromium (media-source/#190)
  - Chrome now even logs a warning to that effect: Warning: using MSE 'sequence' AppendMode for a SourceBuffer with multiple tracks may cause loss of track synchronization. In some cases, buffered range gaps and playback stalls can occur. It is recommended to instead use 'segments' mode for a multitrack SourceBuffer.
- See my notes under “Segments vs Sequence”
In Firefox, multiple appends with separate files (or even chunks) seem extremely buggy depending on the input files – with certain input files, I’m seeing extremely frequent random stalls with zero logged errors. I’m wondering if it is super picky about conformance to codecs. Or it could be evicting buffer segments way faster than it should be.
- This issue seems highly related
- I don’t think it is just me; although this issue was closed, it doesn’t appear fixed in my version of Firefox (which is actually a lot newer than when that issue was posted and then marked as fixed).
- Same issue across stable, beta, and nightly builds
- RESOLUTION: At least in my demos, switching over all the input files from VP9/Opus to VP8/Vorbis fixed the segments mode demo. It could be that the files I originally picked just happened to be malformed VP9, but it also seems suspect that it kept happening with so many different VP9 files…
- Also, should be noted that although Firefox does not have the awesome chrome://media-internals/ that Chromium browsers have (or the media DevTools tab), there is the Devtools Media Panel extension, which can be used with Firefox Nightly
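
Related to the codec mismatch issue above, here is a small sketch of checking MIME type support up front with MediaSource.isTypeSupported, before ever calling addSourceBuffer. The helper name pickSupportedType and the injectable isSupported parameter are hypothetical; the candidate strings in the usage note are just the ones mentioned in this post.

```javascript
// Hypothetical helper: return the first MIME type string the browser claims
// to support for MSE, or null if none of the candidates are supported.
// `isSupported` defaults to MediaSource.isTypeSupported and is only a
// parameter so the function can be tried out with a stand-in.
function pickSupportedType(
  candidates,
  isSupported = (type) => MediaSource.isTypeSupported(type)
) {
  const match = candidates.find((type) => isSupported(type));
  return match === undefined ? null : match;
}
```

For example, pickSupportedType(['video/webm; codecs="vp9,opus"', 'video/webm; codecs="vp8,vorbis"']) would return whichever of the two the current browser reports support for first. Note that this only checks what the browser can handle; the codec string still has to match what is actually inside the files being appended.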