In this lesson I will go through different ways of adding or merging audio and video using FFMPEG. The examples are chosen to be as easy to understand as possible, even though more concise FFMPEG commands may exist.

Adding audio to video is a common task in video automation processes, and something that is very easy to do with FFMPEG.

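In all examples I will assume you start from two files: video1.mp4, a 60-second video with its own soundtrack (background music), and audio1.mp3, a 32-second voice-over.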

Add audio to video with FFMPEG

Merge audio and video

One common and very simple operation is to merge audio and video with FFMPEG. If you have a video file with no audio and a separate audio file, you can simply do:

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -map 0:v -map 1:a \
    -y output.mp4 

This command merges video1.mp4's video stream (-map 0:v) and audio1.mp3's audio stream (-map 1:a) into a final output.mp4.
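If you run this merge on many files, it can help to wrap it in a shell function. The sketch below assumes a POSIX shell; the names merge_av and run and the DRY_RUN variable are hypothetical helpers for this example, not part of FFMPEG. With DRY_RUN=1 the function prints the FFMPEG command instead of executing it, which is a cheap way to check the arguments first.

```shell
#!/bin/sh
# Hypothetical helper: run the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# merge_av VIDEO AUDIO OUTPUT
# Copies the video stream of VIDEO (no re-encoding) and takes the audio
# stream of AUDIO, like the command above.
merge_av() {
    run ffmpeg -i "$1" -i "$2" -c:v copy -map 0:v -map 1:a -y "$3"
}

# Usage:
#   merge_av video1.mp4 audio1.mp3 output.mp4
#   DRY_RUN=1 merge_av video1.mp4 audio1.mp3 output.mp4   # only prints
```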

To learn more about how to map the streams, read the Mapping streams with FFMPEG lesson.

Replace audio in video

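In this example, the same command as above replaces the audio track of the video file with the audio track of the audio file. Because -map 0:v selects only the video stream of input #0, the original soundtrack is discarded, and the resulting video has the video track of video1.mp4 and the audio track of audio1.mp3: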

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -map 0:v -map 1:a \
    -y output.mp4 

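In this example I'm adding the audio to an MP4 video, but you can apply the same approach to other video formats, both as input and as output, as long as the output container supports the copied video codec (otherwise, drop -c:v copy and let FFMPEG re-encode the video).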

The input video (60s) is longer than the input audio (32s), and the resulting video is as long as the longest input (in this case, the video, 60s).
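Whether the output ends up as long as the video or as the audio depends on the two input durations. A minimal sketch for checking them in advance, assuming a POSIX shell with awk and ffprobe (the probing tool from the FFMPEG project); media_duration and is_longer are hypothetical helper names:

```shell
#!/bin/sh
# Print the container duration of a media file, in seconds (needs ffprobe).
media_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Exit status 0 when duration $1 is longer than duration $2.
# awk does the comparison because POSIX sh has no floating-point arithmetic.
is_longer() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) > (b + 0)) }'
}

# Usage:
#   if is_longer "$(media_duration video1.mp4)" "$(media_duration audio1.mp3)"; then
#       echo "The video is longer: the output keeps the full video length."
#   fi
```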

Shorten the video to the audio length

If we want the video to be trimmed to the duration of the audio (the shorter of the two inputs), we must add the -shortest parameter:

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -map 0:v -map 1:a \
    -shortest \
    -y output.mp4 

Combine two audio inputs into one video

Simple mix of two audios into a video

Now, I will combine both audio tracks into one, so the voice will play over the music.

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -filter_complex " \
        [0:a][1:a] amix=inputs=2:duration=longest [audio_out] \
        " \
    -map 0:v -map "[audio_out]" \
    -y output.mp4 

To mix the audio streams I'm using an FFMPEG filter, amix. I'm telling amix to accept 2 inputs (0:a and 1:a) and combine them into an output stream called audio_out, with the duration of the longest input.

Afterwards, I'm mapping the video stream 0:v and the mixed audio stream audio_out into the final output video.

As we are not modifying the video stream, we can use -c:v copy to copy the track from the source to the output as is, avoiding re-encoding and speeding up the process.
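amix is not limited to two inputs. The sketch below, assuming a POSIX shell, builds the same filtergraph for any number of inputs; amix_graph is a hypothetical helper, and it assumes every input, including the video, has an audio stream.

```shell
#!/bin/sh
# amix_graph N
# Prints a filtergraph that mixes the audio of inputs 0..N-1 into [audio_out].
amix_graph() {
    n=$1
    labels=""
    i=0
    while [ "$i" -lt "$n" ]; do
        labels="${labels}[${i}:a]"
        i=$((i + 1))
    done
    printf '%s amix=inputs=%s:duration=longest [audio_out]\n' "$labels" "$n"
}

# Usage with a third audio input:
#   ffmpeg -i video1.mp4 -i audio1.mp3 -i audio2.mp3 -c:v copy \
#       -filter_complex "$(amix_graph 3)" \
#       -map 0:v -map "[audio_out]" -y output.mp4
```

With N=2 the function reproduces exactly the filtergraph used in the command above.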

Mix two audio inputs into a video adjusting volume

Same case as above, but now I will reduce the music volume to emphasize the voice:

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -filter_complex " \
        [0:a] volume=0.5 [music]; \
        [music][1:a] amix=inputs=2:duration=longest [audio_out] \
        " \
    -map 0:v -map "[audio_out]" \
    -y output.mp4

The first filter takes the audio of input #0 (0:a), halves its volume and stores it in a temporary stream called music. The second filter takes music and the audio of audio1.mp3 (1:a) and mixes them together.
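The volume filter also accepts decibel values; the FFMPEG filter documentation lists volume=0.5 and volume=-6.0206dB as equivalent ways to halve the volume. A sketch, assuming a POSIX shell, that makes the music level a parameter; music_under_voice is a hypothetical helper name:

```shell
#!/bin/sh
# music_under_voice VOLUME
# Input 0 is the music (the video's soundtrack), input 1 the voice-over.
# VOLUME goes straight to the volume filter: a factor such as 0.5 or a
# decibel value such as -6dB.
music_under_voice() {
    printf '[0:a] volume=%s [music]; [music][1:a] amix=inputs=2:duration=longest [audio_out]\n' "$1"
}

# Usage:
#   ffmpeg -i video1.mp4 -i audio1.mp3 -c:v copy \
#       -filter_complex "$(music_under_voice -6dB)" \
#       -map 0:v -map "[audio_out]" -y output.mp4
```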

Mix two audio inputs into a video with delay

In the examples above, the voice starts at the very beginning. If I want to delay the start of the voice-over audio, I need to add a delay:

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -filter_complex " \
        [1:a] adelay=2100|2100 [voice]; \
        [0:a][voice] amix=inputs=2:duration=longest [audio_out] \
        " \
    -map 0:v -map "[audio_out]" \
    -y output.mp4

To do this, I just need to add an adelay filter before amix, specifying the delay (in milliseconds) I want to add. The two values 2100|2100 mean 2,100 milliseconds for each channel (left and right) of the stereo voice-over.
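adelay expects one value per channel, and channels without a value are not delayed, so a 6-channel input would need six values. A small sketch, assuming a POSIX shell, that repeats the same delay for every channel; adelay_arg is a hypothetical helper, and you can read the real channel count of a file with ffprobe -show_entries stream=channels.

```shell
#!/bin/sh
# adelay_arg MILLISECONDS CHANNELS
# Prints an adelay filter that delays every channel by the same amount.
adelay_arg() {
    ms=$1
    channels=$2
    delays=$ms
    i=1
    while [ "$i" -lt "$channels" ]; do
        delays="${delays}|${ms}"
        i=$((i + 1))
    done
    printf 'adelay=%s\n' "$delays"
}

# Usage (2,100 ms on a stereo voice-over):
#   ffmpeg -i video1.mp4 -i audio1.mp3 -c:v copy \
#       -filter_complex "[1:a] $(adelay_arg 2100 2) [voice]; [0:a][voice] amix=inputs=2:duration=longest [audio_out]" \
#       -map 0:v -map "[audio_out]" -y output.mp4
```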

Merge two audio tracks into a multi-channel track

FFMPEG has three different but similar filters for combining audio streams: amix, amerge and join.

We used amix above to mix the 2 audio tracks into one, so the voice-over plays along with the background music. By contrast, amerge and join combine the 2 audio tracks into one multi-channel track: for example, two mono tracks can become the left and right channels of a stereo track.

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -filter_complex " \
        [0:a][1:a] amerge=inputs=2 [audio_out] \
        " \
    -map 0:v -map "[audio_out]" \
    -y output.mp4

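In short, amix adds the inputs together into the same channels, while amerge and join place the input channels side by side, so merging two stereo inputs with amerge produces a 4-channel track. join additionally lets you set the output channel_layout and choose which input channel goes to each output channel. Read the FFMPEG documentation for the details.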
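As an illustration of join, the sketch below puts the first channel of the video's soundtrack on the left and the first channel of audio1.mp3 on the right of one stereo track, using join's map option (input_idx.in_channel-out_channel). It assumes a POSIX shell; lr_split, run and DRY_RUN are hypothetical names for this example.

```shell
#!/bin/sh
# Hypothetical helper: run the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# lr_split VIDEO AUDIO OUTPUT
# join builds one stereo track: channel 0 of input 0 becomes FL (left) and
# channel 0 of input 1 becomes FR (right). Other input channels are not used.
lr_split() {
    run ffmpeg -i "$1" -i "$2" -c:v copy \
        -filter_complex "[0:a][1:a] join=inputs=2:channel_layout=stereo:map=0.0-FL|1.0-FR [audio_out]" \
        -map 0:v -map "[audio_out]" -y "$3"
}

# Usage:
#   lr_split video1.mp4 audio1.mp3 output.mp4
```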

Add a silent audio stream to video

If we have a video file with only a video stream, or one with a soundtrack that we want to replace with an empty audio track, we can do this:

ffmpeg \
    -i video1.mp4 -f lavfi -i anullsrc \
    -c:v copy \
    -shortest \
    -map 0:v -map 1:a \
    -y output.mp4 

The -f lavfi -i anullsrc input generates a virtual source of silent audio with infinite length. That is why it's important to specify -shortest to limit the output duration to the duration of the video stream; without it, FFMPEG would keep writing an endless output file.
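anullsrc defaults to stereo at 44,100 Hz. If the silent track must match other files, for example before concatenating them, you can set channel_layout and sample_rate explicitly. A sketch assuming a POSIX shell; add_silence, run and DRY_RUN are hypothetical names, and the 48,000 Hz default is an assumption of this example:

```shell
#!/bin/sh
# Hypothetical helper: run the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# add_silence VIDEO OUTPUT [SAMPLE_RATE]
# Adds a silent stereo track; SAMPLE_RATE defaults to 48000 (an assumption).
add_silence() {
    rate=${3:-48000}
    run ffmpeg -i "$1" -f lavfi -i "anullsrc=channel_layout=stereo:sample_rate=${rate}" \
        -c:v copy -shortest -map 0:v -map 1:a -y "$2"
}

# Usage:
#   add_silence video1.mp4 output.mp4          # 48 kHz stereo silence
#   add_silence video1.mp4 output.mp4 44100    # 44.1 kHz stereo silence
```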

Add audio to video at specific time

If you want to add an audio stream at a specific time in the video, you can use the adelay filter as well:

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -filter_complex " \
        [1:a] adelay=10000|10000 [voice]; \
        [0:a][voice] amix=inputs=2:duration=longest [audio_out] \
        " \
    -map 0:v -map "[audio_out]" \
    -y output.mp4

In the example above, the audio is added at the 10th second of the video (10,000 milliseconds).
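Computing the milliseconds by hand gets error-prone for longer offsets. A sketch, assuming a POSIX shell with awk, that converts a timestamp such as 00:01:30.5 into the millisecond value adelay expects; ts_to_ms is a hypothetical helper name:

```shell
#!/bin/sh
# ts_to_ms TIMESTAMP
# Converts SS, MM:SS or HH:MM:SS (seconds may have decimals) to milliseconds.
ts_to_ms() {
    printf '%s\n' "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i   # accumulate seconds
        printf "%d\n", s * 1000 + 0.5               # round to whole ms
    }'
}

# Usage:
#   ms=$(ts_to_ms 00:00:10)    # 10000
#   ffmpeg -i video1.mp4 -i audio1.mp3 -c:v copy \
#       -filter_complex "[1:a] adelay=${ms}|${ms} [voice]; [0:a][voice] amix=inputs=2:duration=longest [audio_out]" \
#       -map 0:v -map "[audio_out]" -y output.mp4
```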

Add multiple audio tracks to a video

Add a second audio track to video

In this example, I will add the audio1.mp3 as a second audio track to the video file.


The output file will have 2 different audio tracks that can be played separately. If you use a player like VLC, you can choose which track to play.

The command below is very similar to previous examples with the difference that we replace -map 0:v with -map 0.

This slight change means that instead of "mapping" only the video track of input #0 (-map 0:v), we are mapping all tracks of input #0 (-map 0), so the output has 2 tracks coming from input #0 (video and audio) and 1 track coming from input #1.

ffmpeg \
    -i video1.mp4 -i audio1.mp3 \
    -c:v copy \
    -map 0 -map 1:a \
    -y output.mp4 
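To confirm the result, you can count the audio streams in the output with ffprobe. A minimal sketch, assuming a POSIX shell; count_audio_streams is a hypothetical helper name. For the command above it should print 2, assuming video1.mp4 has exactly one audio track.

```shell
#!/bin/sh
# count_audio_streams FILE
# Prints the number of audio streams in FILE (needs ffprobe).
# ffprobe prints one line (the stream index) per audio stream; awk counts them.
count_audio_streams() {
    ffprobe -v error -select_streams a -show_entries stream=index \
        -of csv=p=0 "$1" | awk 'END { print NR }'
}

# Usage:
#   count_audio_streams output.mp4
```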


Loop audio to the length of video

If we want to loop an audio stream over a video, we can use the -stream_loop -1 parameter. It is an input option, so it must be placed before the -i of the input it applies to:

ffmpeg \
    -i video1.mp4 -stream_loop -1 -i audio2.mp3 \
    -c:v copy \
    -shortest \
    -map 0:v -map 1:a \
    -y output.mp4

The -stream_loop -1 parameter loops the audio2.mp3 stream infinitely. Consequently, we must specify the -shortest parameter to limit the output to the video duration.

Create a multilingual video file

If you want to distribute a single video file in multiple languages, you must use a multi-track video file. Each language is stored in a different audio track (stream), and the user can choose which language to hear.


In the example below, I'm creating a video file with 3 different languages: English, Italian and Spanish.

ffmpeg \
    -i video1.mp4 -i english.mp3 -i italian.mp3 -i spanish.mp3 \
    -c:v copy \
    -map 0:v -map 1:a:0 -map 2:a:0 -map 3:a:0 \
    -y output.mp4

The result is a video with 3 audio streams, one for each language.
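Players list these tracks more clearly when each one carries a language tag, which FFMPEG sets with per-stream metadata such as -metadata:s:a:0 language=eng. The sketch below, assuming a POSIX shell, builds the command above for any number of language/file pairs and tags each audio stream; multilang, run and DRY_RUN are hypothetical names, and the codes are ISO 639-2 (eng, ita, spa).

```shell
#!/bin/sh
# Hypothetical helper: run the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# multilang VIDEO OUTPUT LANG1 AUDIO1 [LANG2 AUDIO2 ...]
# LANG is an ISO 639-2 code such as eng, ita or spa (no spaces allowed).
multilang() {
    if [ "$#" -lt 4 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "usage: multilang VIDEO OUTPUT LANG AUDIO [LANG AUDIO ...]" >&2
        return 1
    fi
    video=$1
    output=$2
    shift 2
    opts=""                  # -map/-metadata options, built as a plain string
    count=$(( $# / 2 ))
    i=0
    while [ "$i" -lt "$count" ]; do
        lang=$1
        file=$2
        shift 2
        # Audio file i is input i+1; tag output audio stream i with its language.
        opts="$opts -map $((i + 1)):a:0 -metadata:s:a:$i language=$lang"
        # Rotate: move the file to the end of "$@" as "-i FILE".
        set -- "$@" -i "$file"
        i=$((i + 1))
    done
    # "$@" now holds only "-i FILE" pairs; $opts is left unquoted on purpose
    # so that it splits into separate arguments.
    run ffmpeg -i "$video" "$@" -c:v copy -map 0:v $opts -y "$output"
}

# Usage:
#   multilang video1.mp4 output.mp4 eng english.mp3 ita italian.mp3 spa spanish.mp3
```

Building the input list in the positional parameters keeps file names with spaces intact without relying on bash arrays.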


Final thoughts on adding audio to video with FFMPEG

You can face several situations when adding audio to video: completely replacing the audio track, mixing an audio track over the existing audio, or adding an additional audio stream to a multi-track video.

In this lesson I summarized the most common use cases with clear and simple examples that you can easily reuse in your projects.

If you have any use case you would like me to add, please drop us a message.


Published on January 9th, 2022

Author
David Bosch
David is an experienced engineer, now collaborating with JSON2Video.