Introduction
To start, I'm quite pissed off. Sometimes I just want to record some audio from multiple devices in sync, losslessly. There are quite a few uses for this. In my case it's gameplay with microphones and Discord audio, all separate but in sync. Back in the day, software like Dxtory had the functionality to record multiple audio tracks in perfect sync. But it requires a game to be open, and it doesn't work with modern games. In fact, it struggles with Windows 11. There are other solutions, like OBS. But, again, these include a video track. What if I just want the audio in perfect sync?
GeForce Experience provides minor multitrack support. But it does it with AAC and only allows one additional track. Its options are very limited, to the point that I complain about it in "Grind Series: Quantity without compromising Quality". Audacity was called a "Multitrack recorder" at one point. But ironically, given the name, it doesn't support recording from multiple sources in sync either. What powers it definitely can, though: FFmpeg. So let's take a dive into how to record multiple audio tracks on Windows in perfect sync.
The absolute basics
On Windows, if you want to record audio with FFmpeg, you must use a DirectShow device (dshow). This includes a whole bunch of devices system-wide: virtual devices, your microphone, etc.
Getting the names of your devices
You must know the names of your audio devices to interact with dshow. You can get them via:
ffmpeg -f dshow -list_devices true -i dummy
Here's some example output (with alternative names and video devices omitted for clarity):
[dshow @ 000002b0f35c0dc0] "マイク (HyperX QuadCast S)" (audio)
[dshow @ 000002b0f35c0dc0] "Game Capture 4K60 Pro MK.2 Audio" (audio)
[dshow @ 000002b0f35c0dc0] "virtual-audio-capturer" (audio)
[dshow @ 000002b0f35c0dc0] "Discord VC IN (Virtual Audio Cable)" (audio)
[dshow @ 000002b0f35c0dc0] "マイク (BLUE Yeti PRO)" (audio)
[dshow @ 000002b0f35c0dc0] "Game VC IN (Virtual Audio Cable)" (audio)
[dshow @ 000002b0f35c0dc0] "Game Capture 4K60 Pro MK.2 Audio (Game Capture 4K60 Pro MK.2)" (audio)
[dshow @ 000002b0f35c0dc0] "マイク (Realtek High Definition Audio)" (audio)
[dshow @ 000002b0f35c0dc0] "16 Channel System IN (Virtual Audio Cable)" (audio)
The most basic example: Recording 1 audio source
Most devices these days run at 48 kHz, or 48000 Hz. So, to record some audio, the most basic way is:
ffmpeg -f dshow -sample_rate 48000 -i audio="マイク (HyperX QuadCast S)" -c copy "out.mka"
This gives an out.mka file that has some recorded input from my microphone, listed up above as a device.
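As a small aside (not required for anything later), if you just want a quick test clip to confirm the device name works, you can cap the recording duration with -t:
ffmpeg -f dshow -sample_rate 48000 -i audio="マイク (HyperX QuadCast S)" -t 10 -c copy "test.mka"
Ten seconds is plenty to open the file and make sure audio actually landed in it.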
Recording 2 devices at once (but...?)
You are able to stack multiple of these to record multiple devices at once. So if I wanted to record my microphone and Discord VC:
ffmpeg \
-f dshow -sample_rate 48000 -i audio="マイク (HyperX QuadCast S)" \
-f dshow -sample_rate 48000 -i audio="Discord VC IN (Virtual Audio Cable)" \
-map 0 -map 1 -c copy "out.mka"
The out.mka here will have both audio tracks separated. You can import them into Audacity and play around with them.
There is a catch to this though. When recording from multiple devices, dshow brings the devices up one after another, so each additional channel starts later than the previous one. This means that each channel is delayed. The more tracks you add, the more laggy it gets. It's a mess honestly.
This normally wouldn't be a problem, especially if the lag were minor. But the delay is entire seconds, and the more devices you add, the worse it gets. Sure, there are ways to compensate for this manually: you could realign the tracks in Audacity, then export and recreate the MKA file. But I don't want that. The whole point of this is automation. So, let's get to making it automatic.
Some observations
It pissed me off, so of course I reverse-engineered how it worked, barely.
When dealing with audio devices (on Windows or Linux), they have an epoch timer: a clock counted from the moment the machine started. This is used to try to keep all of the devices in sync. I noticed it while recording 4 audio devices with the -copyts flag on. Here's the command for recording all 4 devices:
# Change this to what devices you want to record
DEV_0="16 Channel System IN (Virtual Audio Cable)"
DEV_1="マイク (HyperX QuadCast S)"
DEV_2="Discord VC IN (Virtual Audio Cable)"
DEV_3="Game VC IN (Virtual Audio Cable)"
# Record all 4 simultaneously
ffmpeg \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_0}" \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_1}" \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_2}" \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_3}" \
-map 0 -map 1 -map 2 -map 3 \
-c copy -copyts \
"tmp.mka"
And here is the selected output with the epoch timer showing:
Duration: N/A, start: 13189.398000, bitrate: 1536 kb/s
Duration: N/A, start: 13190.725000, bitrate: 1536 kb/s
Duration: N/A, start: 13191.592000, bitrate: 1536 kb/s
Duration: N/A, start: 13192.437000, bitrate: 1536 kb/s
The differences in the start times here just so happen to line up exactly with how the tracks should be shifted for perfect synchronisation. This is easily tested by opening Audacity and sliding the tracks (or generating silence at the start of the respective track). It goes from incredibly far off to literally perfect. It's beautiful. But I'm not about to do this by hand every single time I record. That's stupid. So let's resort to automation.
Because I am going to automate this, the useful part is that you can get the start timestamps of all tracks via ffprobe, included in the FFmpeg suite:
UNIX> ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka"
13189.398000
13190.725000
13191.592000
13192.437000
If you compute the delta between each track's start time and the first one's, then delay each track by that amount, it is possible to force synchronisation between all of the channels. The consequence is that it's a two-step procedure: you must record first, then you must "process" the audio. It's worth it for zero lag, but I'm not going to push it on you, because it chews up your disk space.
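To make the arithmetic concrete: the second track above starts 13190.725 − 13189.398 ≈ 1.327 s after the first, so it needs roughly 1327 ms of compensation. If you want to eyeball all the deltas at once, a throwaway one-liner (not part of the final scripts) does the trick:
ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka" | awk 'NR==1{first=$1} { printf("%d\n", ($1 - first) * 1000) }'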
Taking action
With the observation out of the way, now it's time to write up some code to take care of this mess. We're going to hit this problem with some pure Bash. So get ready.
Computing delay
Assume the ffprobe output shown directly above. The delay of the first track is 0 milliseconds; the delays of the other tracks are computed relative to the first track's epoch timer.
I=0
INITIAL="0"
DELTA="0"
DELAYS="$(ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka")"
for DELAY in $DELAYS; do
    if [ $I -eq 0 ]; then
        # First track
        # Assume a delay of 0
        INITIAL="$DELAY"
        DELTA="0"
    else
        # Not first track
        # Compute distance between Nth time and first time
        DELTA="$(echo "$INITIAL $DELAY" | awk '{ printf("%d", ($2 - $1) * 1000); }')"
    fi
    # Print the delay in milliseconds
    echo "$DELTA"
    let "I++"
done
This will go through the output line-by-line and print the delays in milliseconds. The reason I want milliseconds is because there is an FFmpeg filter called adelay which takes milliseconds. As expected, the output is:
0
1327
2194
3039
Applying delay via FFmpeg
Perfect. Next, we need to construct the filter that will go through all 4 tracks and apply the delay at the start. That filter would look something like this:
[0:a:0]adelay=delays=0:all=1[ch0];
[0:a:1]adelay=delays=1327:all=1[ch1];
[0:a:2]adelay=delays=2194:all=1[ch2];
[0:a:3]adelay=delays=3039:all=1[ch3]
I multi-lined it to make it look pretty. The final command should look like this:
ffmpeg \
-i "out.mka" \
-filter_complex '[0:a:0]adelay=delays=0:all=1[ch0];[0:a:1]adelay=delays=1327:all=1[ch1];[0:a:2]adelay=delays=2194:all=1[ch2];[0:a:3]adelay=delays=3039:all=1[ch3]' \
-map '[ch0]' \
-map '[ch1]' \
-map '[ch2]' \
-map '[ch3]' \
-c:a flac -compression_level 12 \
"fixed.mka"
Computing and applying delay simultaneously
Both steps up above can be combined into one single step if you don't mind a little eval. This also means it supports any number of tracks, compared to the 1, 2, or 4 demonstrated up above.
I=0
INITIAL="0"
FILTER=""
MAP=""
DELAYS="$(ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka")"
for DELAY in $DELAYS; do
    if [ $I -eq 0 ]; then
        # First track: zero delay, start the filter and map strings
        INITIAL="$DELAY"
        DELTA="0"
        MAP="-map \"[ch${I}]\""
        FILTER="[0:a:${I}]adelay=delays=${DELTA}:all=1[ch${I}]"
    else
        # Every other track: delay relative to the first track's start time
        DELTA="$(echo "$INITIAL $DELAY" | awk '{ printf("%d", ($2 - $1) * 1000); }')"
        MAP="${MAP} -map \"[ch${I}]\""
        FILTER="${FILTER};[0:a:${I}]adelay=delays=${DELTA}:all=1[ch${I}]"
    fi
    let "I++"
done
eval "ffmpeg -i \"out.mka\" -filter_complex \"${FILTER}\" ${MAP} -c:a flac -compression_level 12 \"fixed.mka\""
As an added bonus, it will comb through the file and losslessly compress it at the maximum compression setting. Because this step doesn't have to happen in real time, we can let the CPU take more time and give it some breathing room to compress efficiently and produce an optimal file, which ends up around 5x smaller than the original recording.
Goodies
I have a script attached which does the recording process, followed by the delay processing. That script is mka_record.sh. Additionally, I have another script which will process the audio track if step 2 somehow failed while running the first script (e.g. out of disk space or machine crash). That is mka_repair.sh. You may find those here:
Obviously, you will have to edit these scripts yourself to make sure they work with your setup. List your devices and change the DEV_ variables accordingly. These heavily rely on the MKA container to work properly. And I recommend you keep it in an MKA container until it is fully processed, because you get corruption-resistance. I once had my machine crash while recording, and I was able to get the audio back just because it was recorded in MKA.
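On the subject of that crash: if a recording ever gets cut off, a plain stream copy is usually enough to salvage whatever made it to disk (a hedged suggestion; broken.mka is just a placeholder name for the damaged file):
ffmpeg -i "broken.mka" -map 0 -c copy "recovered.mka"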
Conclusion
This one was a short one. I hope it was useful. For the record, while this works on Windows only, some small modifications can be made to make it work on Linux as well (see the sketch below). For Mac, I wouldn't vouch for FFmpeg, since the AVFoundation implementation has been broken since FFmpeg 4.3, with crackling audio everywhere; there I'd suggest SoX instead. Maybe I'll write up something on that soon.
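For the Linux sketch: this is a rough, untested outline assuming PulseAudio. The source names below are made up for illustration; the real ones come from pactl list short sources. The ffprobe/adelay processing step afterwards stays exactly the same:
ffmpeg \
-f pulse -sample_rate 48000 -i "alsa_input.usb-HyperX_QuadCast_S.analog-stereo" \
-f pulse -sample_rate 48000 -i "discord_vc.monitor" \
-map 0 -map 1 -c copy "out.mka"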
Hope it helps with content creation. And happy holidays. :)