Understanding how video files work

Storing a video inside a file is much more challenging than storing text or HTML. The volume of data "describing" a video is several orders of magnitude larger than for text, so it must be optimised and compressed to make the video file usable. Video data is usually described as a sequence of images, each of which includes the color of every pixel in the image. This sequence is typically compressed by recording only the changes from one frame to the next, which reduces the storage and bandwidth that video data needs.
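To make the idea of "recording only the changes from one frame to the next" concrete, here is a minimal Python sketch. It is a toy, not how a real codec works: frames are short lists of grey values, and the function names are hypothetical ones chosen for this illustration.

```python
def encode_deltas(frames):
    """Keep the first frame whole, then store only the pixels that changed."""
    first = list(frames[0])
    deltas = []
    previous = first
    for frame in frames[1:]:
        # Record (pixel index, new value) for every pixel that differs.
        changes = [
            (i, new)
            for i, (old, new) in enumerate(zip(previous, frame))
            if old != new
        ]
        deltas.append(changes)
        previous = frame
    return first, deltas


def decode_deltas(first, deltas):
    """Rebuild every frame by replaying the recorded changes in order."""
    frames = [list(first)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, value in changes:
            frame[i] = value
        frames.append(frame)
    return frames


# Three tiny 6-pixel "frames": a single bright pixel moves to the right.
frames = [
    [0, 0, 9, 0, 0, 0],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 9, 0],
]
first, deltas = encode_deltas(frames)
```

Each later frame is stored as just two changes instead of six pixel values. Real video codecs rely on far more elaborate techniques, but the saving comes from the same observation: consecutive frames are mostly identical.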

Video usually includes one or more audio tracks, and the video and audio tracks need to stay in sync to be played together. Videos can also include subtitles and other metadata, which use completely different formats but need to be in sync as well.

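We can recap by saying that video files have two basic requirements:

  1. Hold several elements (video, audio, subtitles, metadata) in one file and keep them in sync
  2. Compress each of those elements so that the file stays usable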

These elements are usually called streams or tracks.

Containers

To solve the first requirement, video files are defined as containers. This means that a video file can contain other elements (video tracks, audio tracks, metadata, ...) and there are many container formats, for example: MP4, MKV, MOV, etc.

So when we commonly say that a video is in MP4 format, we are referring to its container format.
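One way to picture a container is as a small data structure: a format name plus a list of streams. The sketch below is only a mental model, and the class and field names are hypothetical. Real container formats also store timing, indexes, and metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    kind: str   # "video", "audio", "subtitle", ...
    codec: str  # how this stream is encoded, e.g. "h264"


@dataclass
class Container:
    format: str  # e.g. "mp4" or "mkv"
    streams: list = field(default_factory=list)

    def streams_of(self, kind):
        """Return only the streams of one kind, in their original order."""
        return [s for s in self.streams if s.kind == kind]


# An MP4 file holding one video track and two audio tracks.
movie = Container("mp4", [
    Stream("video", "h264"),
    Stream("audio", "aac"),
    Stream("audio", "mp3"),
])
```

Here movie.format is the "MP4" people usually mean when they name a video format, while each stream carries its own encoding.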

Codecs

To solve the second requirement, each element inside the container is encoded (that is, compressed) in a particular way. The algorithm that encodes and decodes an element is called a codec.

Some codecs provide better compression (smaller files) at the cost of losing some quality (lossy codecs), while others compress the video or audio much less but do not lose any quality (lossless codecs). Some codecs can be lossy or lossless depending on the settings used. In each case, we will use the codec that best fits our needs.

Examples of codecs are H.264 and H.265 for video, and AAC, MP3 and FLAC for audio.
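The difference between lossless and lossy compression can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: zlib stands in for a lossless codec, and rounding each sample down to a multiple of 16 stands in for the detail a lossy codec throws away. Neither is an actual audio or video codec.

```python
import zlib

# Some made-up 8-bit samples, repeated to give the compressor something to do.
samples = bytes([12, 13, 15, 200, 201, 203, 90, 91] * 100)

# Lossless: decompressing gives back exactly the bytes that went in.
lossless = zlib.compress(samples)
restored = zlib.decompress(lossless)

# Lossy (toy): discard fine detail before compressing.
# The discarded detail cannot be recovered afterwards.
quantized = bytes((s // 16) * 16 for s in samples)
lossy = zlib.compress(quantized)
approx = zlib.decompress(lossy)

# Largest difference between an original sample and its lossy version.
max_error = max(abs(a - b) for a, b in zip(samples, approx))
```

After the lossless round trip, restored is identical to samples. After the lossy one, approx is only close to samples, with every value off by less than 16.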

Modifying a video file

Now that we know that a video file is a container that uses different codecs to compress its elements or streams, we can understand the steps needed to edit a video file:

  1. Demuxing: Open the container and extract the different streams
  2. Decoding: Decompress the stream we want to modify
  3. Editing: Apply any changes to the stream
  4. Encoding: Compress the stream again
  5. Muxing: Packing the streams into the container file
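The five steps above can be sketched with toy stand-ins. In this hypothetical example the "container" is a plain dictionary, zlib plays the role of the codec, and the "edit" just upper-cases some bytes. None of this is a real media API, it only shows the order of the operations.

```python
import zlib


def demux(container):
    """Step 1: pull the individual streams out of the container."""
    return dict(container["streams"])


def decode(data):
    """Step 2: decompress one stream."""
    return zlib.decompress(data)


def edit(raw):
    """Step 3: stand-in edit that changes the raw data."""
    return raw.upper()


def encode(raw):
    """Step 4: compress the edited stream again."""
    return zlib.compress(raw)


def mux(fmt, streams):
    """Step 5: pack the streams back into a container."""
    return {"format": fmt, "streams": streams}


source = mux("toy", {"video": encode(b"frames"), "audio": encode(b"samples")})

streams = demux(source)                # 1. Demuxing
raw_audio = decode(streams["audio"])   # 2. Decoding
edited = edit(raw_audio)               # 3. Editing
streams["audio"] = encode(edited)      # 4. Encoding
result = mux("toy", streams)           # 5. Muxing
```

Only the audio stream is decoded and re-encoded. The video stream passes from the source container to the result untouched, which is why editing one track does not have to affect the others.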

Of course, in this process we can decide not to compress and pack the streams in the formats they originally had, for example if we convert an MP4 file to MKV, or if we change the audio encoding of an MP4 file from MP3 to AAC.

The process of changing the container format from one to another is called transmuxing.

The process of changing the encoding format from one to another is called transcoding.
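As a rough illustration, here is how the two operations might look as FFmpeg command lines built from Python. The file names are placeholders, the helper functions are hypothetical, and the commands are only constructed here, not executed.

```python
def transmux_command(source, target):
    """Change the container only: "-c copy" copies the selected streams
    without re-encoding them."""
    return ["ffmpeg", "-i", source, "-c", "copy", target]


def transcode_audio_command(source, target, audio_codec="aac"):
    """Copy the video stream as-is and re-encode only the audio."""
    return ["ffmpeg", "-i", source, "-c:v", "copy", "-c:a", audio_codec, target]


# MP4 to MKV without touching the encoded data (transmuxing).
to_mkv = transmux_command("input.mp4", "output.mkv")

# Same container, but the audio is re-encoded to AAC (transcoding).
to_aac = transcode_audio_command("input.mp4", "output.mp4")
```

If FFmpeg is installed, either list could be passed to subprocess.run(command, check=True). Transmuxing is usually much faster than transcoding because nothing is decoded or re-encoded.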

As you continue working with video technology, you will find these concepts again and again, because they underpin how videos are created, edited, and shared.

Video compatibility

Not every container can hold streams encoded in any format. Some containers expect their streams to use a specific codec, or at least one from a short list. One of the most versatile container formats is MKV.
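A compatibility check of this kind can be sketched as a simple lookup. The table below is deliberately incomplete and is an assumption made for illustration: the WebM entry reflects the codecs that format is specified for, the MP4 entry lists only a few common choices, and MKV is simplified to "accepts anything". The function name is hypothetical.

```python
# Partial, illustrative list of codecs known to work in each container.
KNOWN_GOOD = {
    "webm": {"vp8", "vp9", "av1", "vorbis", "opus"},
    "mp4": {"h264", "h265", "av1", "aac", "mp3"},
}


def is_known_good(container, codec):
    """Return True when the pair appears in this sketch's table.

    False only means "not listed here", not "impossible".
    """
    if container == "mkv":
        # Simplification: MKV accepts almost any codec.
        return True
    return codec in KNOWN_GOOD.get(container, set())
```

For example, is_known_good("webm", "h264") is False, which matches the fact that WebM is restricted to a short list of codecs, while the same H.264 stream is fine in MP4 or MKV.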

In the same way, not all video players can play all video files. This is because the video player must know both how to open the container and how to decode each media element inside it. One of the most versatile video players is VLC.

In many cases, we must make sure that our video files are compatible with the most common video players, for example QuickTime (on macOS) or the native HTML5 video player built into browsers (Chrome, Edge, Safari, Firefox). To achieve this, in some cases, we will need to add specific options.
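The text does not say which options are meant, so the following is only a sketch of options that are commonly used with FFmpeg for this purpose. It assumes an FFmpeg build that includes the libx264 encoder. The file names and the helper function are placeholders.

```python
def web_friendly_command(source, target):
    """Build an FFmpeg command aimed at broad player compatibility."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",          # encode the video as H.264
        "-pix_fmt", "yuv420p",      # pixel format that most players can decode
        "-c:a", "aac",              # encode the audio as AAC
        "-movflags", "+faststart",  # put the index at the start of the MP4 file
        target,
    ]


command = web_friendly_command("input.mov", "output.mp4")
```

The yuv420p pixel format avoids H.264 variants that some players reject, and +faststart lets a browser begin playback before the whole file has been downloaded.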

Comparing video container formats

Each container format supports a different set of codecs and features.

Source: Wikipedia

So it has been 12 years since the release of the most recent major container format (WebM), and about 20 years since MP4 and MKV were launched. Codecs, in contrast, have kept improving much more recently, with updates still being released regularly.

What is the best codec compression?

This is a hard question to answer, as it depends on your use case.

The combination of MP4 and H.264 is currently the most common video configuration. Most web streaming sites use it because it is widely compatible with most operating systems and devices. H.264 is also used in cable TV broadcasting and on Blu-ray discs.

In terms of compression, H.265 and VP9 provide much better results (up to a 50% improvement) but do not have the same level of compatibility as H.264.
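To get a feel for what a 50% improvement means in practice, here is a back-of-the-envelope calculation. It assumes a constant bitrate, reads "50% improvement" as "half the bitrate for similar quality", and uses a made-up figure of 5 Mbit/s for the H.264 encode.

```python
def file_size_mb(bitrate_mbps, duration_s):
    """Size in megabytes at a constant bitrate: Mbit/s * seconds / 8."""
    return bitrate_mbps * duration_s / 8


duration = 10 * 60  # a 10-minute clip, in seconds

h264_size = file_size_mb(5, duration)        # hypothetical 5 Mbit/s encode
h265_size = file_size_mb(5 * 0.5, duration)  # same clip at half the bitrate
```

Under these assumptions the H.264 file is 375 MB and the H.265 file is 187.5 MB. Actual savings depend heavily on the content and on the encoder settings.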

Self-evaluation

In FFmpeg, what does 0:a refer to?

In FFmpeg, what does -map 1:v:0 refer to?

Published on February 5th, 2022

Author
David Bosch
David is an experienced engineer, now collaborating with JSON2Video.