Introduction
To start, I'm quite pissed off. Sometimes I just want to record some audio from multiple devices in sync, losslessly. There are quite a few uses for this. In my case it's gameplay with microphones and Discord audio, all separate but in sync. Back in the day, software like Dxtory had the functionality to record multiple audio tracks in perfect sync. But it requires a game to be open, and it doesn't work with modern games. In fact, it struggles with Windows 11. There are other solutions, like OBS. But, again, these include a video track. What if I just want the audio in perfect sync?
GeForce Experience provides minor multitrack support. But it does it with AAC and only allows one additional track. Its options are very limited, to the point that I complain about it in "Grind Series: Quantity without compromising Quality". Audacity was called a "Multitrack recorder" at one point. But ironically, given the name, it doesn't support recording from multiple sources in sync either. What powers it definitely can, though: FFmpeg. So let's take a dive into how to record multiple audio tracks on Windows in perfect sync.
The absolute basics
On Windows, if you want to record audio with FFmpeg, you must use a DirectShow device (dshow). This includes a whole bunch of devices system-wide: virtual devices, your microphone, etc.
Getting the names of your devices
You must know the names of your audio devices to interact with dshow. You can get them via:
ffmpeg -f dshow -list_devices true -i dummy
Here's some example output (with alternative names and video devices omitted for clarity):
[dshow @ 000002b0f35c0dc0] "マイク (HyperX QuadCast S)" (audio)
[dshow @ 000002b0f35c0dc0] "Game Capture 4K60 Pro MK.2 Audio" (audio)
[dshow @ 000002b0f35c0dc0] "virtual-audio-capturer" (audio)
[dshow @ 000002b0f35c0dc0] "Discord VC IN (Virtual Audio Cable)" (audio)
[dshow @ 000002b0f35c0dc0] "マイク (BLUE Yeti PRO)" (audio)
[dshow @ 000002b0f35c0dc0] "Game VC IN (Virtual Audio Cable)" (audio)
[dshow @ 000002b0f35c0dc0] "Game Capture 4K60 Pro MK.2 Audio (Game Capture 4K60 Pro MK.2)" (audio)
[dshow @ 000002b0f35c0dc0] "マイク (Realtek High Definition Audio)" (audio)
[dshow @ 000002b0f35c0dc0] "16 Channel System IN (Virtual Audio Cable)" (audio)
The most basic example: Recording 1 audio source
Most devices these days run at 48 kHz, or 48000 Hz. So, to record some audio, the most basic way is:
ffmpeg -f dshow -sample_rate 48000 -i audio="マイク (HyperX QuadCast S)" -c copy "out.mka"
This gives an out.mka file that has some recorded input from my microphone, listed up above as a device.
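As a small aside (not required for anything later), if you just want a quick test clip to confirm the device name works, you can cap the recording duration with -t:
ffmpeg -f dshow -sample_rate 48000 -i audio="マイク (HyperX QuadCast S)" -t 10 -c copy "test.mka"
Ten seconds is plenty to open the file and make sure audio actually landed in it.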
Recording 2 devices at once (but...?)
You are able to stack multiple of these to record multiple devices at once. So if I wanted to record my microphone and Discord VC:
ffmpeg \
-f dshow -sample_rate 48000 -i audio="マイク (HyperX QuadCast S)" \
-f dshow -sample_rate 48000 -i audio="Discord VC IN (Virtual Audio Cable)" \
-map 0 -map 1 -c copy "out.mka"
The out.mka here will have both audio tracks separated. You can import them into Audacity and play around with them.
There is a catch to this though. When recording from multiple devices, dshow brings the devices up one after another, so each additional channel starts later than the previous one. This means that each channel is delayed. The more tracks you add, the more laggy it gets. It's a mess honestly.
This normally wouldn't be a problem, especially if the lag were minor. But the delay is entire seconds, and the more devices you add, the worse it gets. Sure, there are ways to compensate for this manually: you could realign the tracks in Audacity, then export and recreate the MKA file. But I don't want that. The whole point of this is automation. So, let's get to making it automatic.
Some observations
It pissed me off, so of course I reverse-engineered how it worked, barely.
When dealing with audio devices (on Windows or Linux), they have an epoch timer: a clock counted from the moment the machine started. This is used to try to keep all of the devices in sync. I noticed it while recording 4 audio devices with the -copyts flag on. Here's the command for recording all 4 devices:
# Change this to what devices you want to record
DEV_0="16 Channel System IN (Virtual Audio Cable)"
DEV_1="マイク (HyperX QuadCast S)"
DEV_2="Discord VC IN (Virtual Audio Cable)"
DEV_3="Game VC IN (Virtual Audio Cable)"
# Record all 4 simultaneously
ffmpeg \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_0}" \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_1}" \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_2}" \
-f dshow -sample_rate 48000 -rtbufsize 1M -i audio="${DEV_3}" \
-map 0 -map 1 -map 2 -map 3 \
-c copy -copyts \
"tmp.mka"
And here is the selected output with the epoch timer showing:
Duration: N/A, start: 13189.398000, bitrate: 1536 kb/s
Duration: N/A, start: 13190.725000, bitrate: 1536 kb/s
Duration: N/A, start: 13191.592000, bitrate: 1536 kb/s
Duration: N/A, start: 13192.437000, bitrate: 1536 kb/s
The differences in the start times here just so happen to line up exactly with how the tracks should be shifted for perfect synchronisation. This is easily tested by opening Audacity and sliding the tracks (or generating silence at the start of the respective track). It goes from incredibly far off to literally perfect. It's beautiful. But I'm not about to do this by hand every single time I record. That's stupid. So let's resort to automation.
Because I am going to automate this, the useful part is that you can get the start timestamps of all tracks via ffprobe, included in the FFmpeg suite:
UNIX> ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka"
13189.398000
13190.725000
13191.592000
13192.437000
If you compute the delta between each track's start time and the first one's, then delay each track by that amount, it is possible to force synchronisation between all of the channels. The consequence is that it's a two-step procedure: you must record first, then you must "process" the audio. It's worth it for zero lag, but I'm not going to push it on you, because it chews up your disk space.
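To make the arithmetic concrete: the second track above starts 13190.725 − 13189.398 ≈ 1.327 s after the first, so it needs roughly 1327 ms of compensation. If you want to eyeball all the deltas at once, a throwaway one-liner (not part of the final scripts) does the trick:
ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka" | awk 'NR==1{first=$1} { printf("%d\n", ($1 - first) * 1000) }'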
Taking action
With the observation out of the way, now it's time to write up some code to take care of this mess. We're going to hit this problem with some pure Bash. So get ready.
Computing delay
Assume the ffprobe output shown directly above. The delay of the first track is 0 milliseconds; the delays of the other tracks are computed relative to the first track's epoch timer.
I=0
INITIAL="0"
DELTA="0"
DELAYS="$(ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka")"
for DELAY in $DELAYS; do
    if [ $I -eq 0 ]; then
        # First track
        # Assume a delay of 0
        INITIAL="$DELAY"
        DELTA="0"
    else
        # Not first track
        # Compute distance between Nth time and first time
        DELTA="$(echo "$INITIAL $DELAY" | awk '{ printf("%d", ($2 - $1) * 1000); }')"
    fi
    # Print the delay in milliseconds
    echo "$DELTA"
    let "I++"
done
This will go through the output line-by-line and print the delays in milliseconds. The reason I want milliseconds is because there is an FFmpeg filter called adelay which takes milliseconds. As expected, the output is:
0
1327
2194
3039
Applying delay via FFmpeg
Perfect. Next, we need to construct the filter that will go through all 4 tracks and apply the delay at the start. That filter would look something like this:
[0:a:0]adelay=delays=0:all=1[ch0];
[0:a:1]adelay=delays=1327:all=1[ch1];
[0:a:2]adelay=delays=2194:all=1[ch2];
[0:a:3]adelay=delays=3039:all=1[ch3]
I multi-lined it to make it look pretty. The final command should look like this:
ffmpeg \
-i "out.mka" \
-filter_complex '[0:a:0]adelay=delays=0:all=1[ch0];[0:a:1]adelay=delays=1327:all=1[ch1];[0:a:2]adelay=delays=2194:all=1[ch2];[0:a:3]adelay=delays=3039:all=1[ch3]' \
-map '[ch0]' \
-map '[ch1]' \
-map '[ch2]' \
-map '[ch3]' \
-c:a flac -compression_level 12 \
"fixed.mka"
Computing and applying delay simultaneously
Both steps up above can be combined into one single step if you don't mind a little eval. This also means it supports any number of tracks, compared to the 1, 2, or 4 demonstrated up above.
I=0
INITIAL="0"
FILTER=""
MAP=""
DELAYS="$(ffprobe -loglevel quiet -select_streams a -show_entries stream=start_time -of csv=p=0 -i "out.mka")"
for DELAY in $DELAYS; do
    if [ $I -eq 0 ]; then
        # First track: zero delay, start the filter and map strings
        INITIAL="$DELAY"
        DELTA="0"
        MAP="-map \"[ch${I}]\""
        FILTER="[0:a:${I}]adelay=delays=${DELTA}:all=1[ch${I}]"
    else
        # Every other track: delay relative to the first track's start time
        DELTA="$(echo "$INITIAL $DELAY" | awk '{ printf("%d", ($2 - $1) * 1000); }')"
        MAP="${MAP} -map \"[ch${I}]\""
        FILTER="${FILTER};[0:a:${I}]adelay=delays=${DELTA}:all=1[ch${I}]"
    fi
    let "I++"
done
eval "ffmpeg -i \"out.mka\" -filter_complex \"${FILTER}\" ${MAP} -c:a flac -compression_level 12 \"fixed.mka\""
As an added bonus, it will comb through the file and losslessly compress it at the maximum compression setting. Because this step doesn't have to happen in real time, we can let the CPU take more time and give it some breathing room to compress efficiently and produce an optimal file, which ends up around 5x smaller than the original recording.
Goodies
I have a script attached which does the recording process, followed by the delay processing. That script is mka_record.sh. Additionally, I have another script which will process the audio track if step 2 somehow failed while running the first script (e.g. out of disk space or machine crash). That is mka_repair.sh. You may find those here:
Obviously, you will have to edit these scripts yourself to make sure they work with your setup. List your devices and change the DEV_ variables accordingly. These heavily rely on the MKA container to work properly. And I recommend you keep it in an MKA container until it is fully processed, because you get corruption-resistance. I once had my machine crash while recording, and I was able to get the audio back just because it was recorded in MKA.
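On the subject of that crash: if a recording ever gets cut off, a plain stream copy is usually enough to salvage whatever made it to disk (a hedged suggestion; broken.mka is just a placeholder name for the damaged file):
ffmpeg -i "broken.mka" -map 0 -c copy "recovered.mka"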
Conclusion
This one was a short one. I hope it was useful. For the record, while this works on Windows only, some small modifications can be made to make it work on Linux as well (see the sketch below). For Mac, I wouldn't vouch for FFmpeg, since the AVFoundation implementation has been broken since FFmpeg 4.3, with crackling audio everywhere; there I'd suggest SoX instead. Maybe I'll write up something on that soon.
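For the Linux sketch: this is a rough, untested outline assuming PulseAudio. The source names below are made up for illustration; the real ones come from pactl list short sources. The ffprobe/adelay processing step afterwards stays exactly the same:
ffmpeg \
-f pulse -sample_rate 48000 -i "alsa_input.usb-HyperX_QuadCast_S.analog-stereo" \
-f pulse -sample_rate 48000 -i "discord_vc.monitor" \
-map 0 -map 1 -c copy "out.mka"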
Hope it helps with content creation. And happy holidays. :)