Synchronizing & replacing audio in MPEG-4 video

2013-06-04 18:36:46

I occasionally need to replace the audio stream in a video file (usually some variant of MPEG-4) with a higher-quality audio source while retaining sync, particularly lip sync.

I've come up with what seems like a good workflow: synchronise the soundboard audio with the original video's low-quality audio, encode it to AAC, and remultiplex it into an MPEG-4 file (e.g. .mp4 or .m4v) that carries the high-quality audio instead of the original. The result has played fully synchronised on every platform I've thrown it at. The longest file I've tried so far stays in sync for its full 59-minute duration.

I've made some notes, primarily for myself and my preferences, but I thought this might be useful for anyone searching the forums or internet for ways to do this in future.

It should also be workable with other devices, such as an old Nikon camera that produces MOV files with motion-JPEG video and 8-bit mono sound at something odd like a 7980 Hz sampling rate. Simply use Handbrake to convert both the video and the low-quality sound (using x264 and FAAC, say), then replace the synchronized LQ sound with better sound. I find that many platforms won't play these MOV-mJPEG videos anyway (Quicktime excepted), so I usually end up using Handbrake to convert them to MPEG-4 AVC (H.264 via x264) plus AAC audio, which makes a compatible video file that looks as good but takes up a fraction of the space.

High quality doesn't always mean hi-fi, like soundboard audio at 48 kHz/24-bit. Sometimes it can be a medium sampling rate for speech, recorded at close quarters rather than with room reverberation or background noise, and thus offering improved intelligibility. A simple smartphone with a PCM recording application like AndRecorder for Android (which works happily at 22050 Hz/16-bit mono on my low-end phone) might be used with a wired headset microphone clipped to a tie or a lapel to produce decent sound in a video interview, for example, while recording the video and low-quality sound on a separate digital camera or DV recorder. The more professional approach might be a dedicated audio field recorder like a Zoom H1. In either case, if the separate audio source is better than that captured by the video recording device, it can be synced using this method.

My aim is to preserve quality or quality per bitrate, especially in sound (video isn't all that great anyway), and to use free software, mostly cross-platform.

I also want to produce files that Just Work on any platform without taking up undue bitrate for the quality of video present. That's why I prefer Main profile H.264 video at constant frame rate with LC-AAC audio (96 kbps CVBR is the sweet spot for the qaac encoder, where LC-AAC is still a little better than HE-AAC).

I aim to produce two files from each video:

a near-maximum-quality file for simple transcoding to DVD
a low-bitrate, good-quality file for internet distribution or mobile devices

Given that my source video is poor (10.4 fps, 320x240 pixels, 3GP format):
The near-maximum-quality version runs at 737 kbps video + 327 kbps audio = 1065 kbps total (~450 MB/hr).
The low-bitrate version at Constant Quality RF=25.0 runs at only 108 kbps video + 84 kbps audio = 193 kbps total (~80 MB/hr).
(CQ RF=23.0 runs at 158 kbps video + 84 kbps audio = 243 kbps total (~100 MB/hr).)
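
For anyone wanting to estimate file sizes at other bitrates, here is a minimal Python sketch of the arithmetic behind the per-hour figures. It assumes 1 kbps = 1000 bits per second, reports sizes in MiB (1024 x 1024 bytes) and ignores container overhead. The function name `mib_per_hour` is just an illustrative choice.

```python
def mib_per_hour(total_kbps: float) -> float:
    """Approximate size of one hour of media at the given total bitrate.

    Assumes 1 kbps = 1000 bit/s and ignores container overhead.
    """
    bytes_per_hour = total_kbps * 1000 / 8 * 3600
    return bytes_per_hour / (1024 * 1024)


# Total bitrates (video + audio) quoted in the text above.
quoted_totals = [
    ("near-maximum quality", 1065),
    ("low bitrate, RF=25.0", 193),
    ("low bitrate, RF=23.0", 243),
]

for label, kbps in quoted_totals:
    print(f"{label}: {kbps} kbps is about {mib_per_hour(kbps):.0f} MiB/hr")
```

The results land close to the rounded per-hour sizes quoted above; the small differences come from rounding and from whichever definition of "MB" is used.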

I've noted command-line equivalents that would work outside of Windows-only tools, so this guide could also be useful on Mac or Linux. Of course, I'm not recommending this as the best or only way of doing things, but as a workflow that gives me what I need without too many arduous steps. I also list some of the problems that led me to this method. For example, I tried to demux the raw video and audio streams but couldn't retain sync, possibly because of an inconsistent frame rate; keeping the video in its MP4/3GP container until the audio is switched over seems to retain proper timing.

My brief summary of the workflow and the tools I've chosen is reproduced here, as well as on page 2 of the linked files shared in my MediaFire Audio folder. Fuller details are provided in the OpenDocument Presentation (81 KiB; LibreOffice Impress or OpenOffice Impress will open it, and MS PowerPoint should too, though I can't test that). A PDF download of the same presentation (176 KiB) is also provided.

One of the key points is how easily this method can provide a synchronised audio stream in Audacity, with multiple points of visual confirmation.
The other key point is keeping the video stream in its original MP4 container to retain accurate timing until the moment the audio streams are switched.
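
The alignment in this workflow is done by eye in Audacity. As an optional cross-check (an automated alternative, not part of the method described here), a brute-force cross-correlation can estimate how far one track trails the other. This is a minimal pure-Python sketch on toy data; the function names, the 1000 Hz envelope rate and the sample values are all assumptions made for illustration.

```python
def estimate_lag(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means the shared event occurs later in `other`,
    so `other` would need shifting earlier by that many samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_value * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_ms(lag_samples, sample_rate_hz):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate_hz


# Toy amplitude envelopes at a hypothetical 1000 Hz: the same "handclap"
# shape, arriving 3 samples later in the HQ track than in the LQ track.
clap = [1.0, 0.6, 0.3]
lq_envelope = [0.0] * 5 + clap + [0.0] * 12
hq_envelope = [0.0] * 8 + clap + [0.0] * 9

lag = estimate_lag(lq_envelope, hq_envelope, max_lag=6)
print(f"HQ trails LQ by {lag} samples ({lag_to_ms(lag, 1000):.1f} ms)")
```

On real recordings you would feed in downsampled envelopes around the clap rather than full-rate audio, since this brute-force loop is slow on long inputs. If the residual offset comes out within the 10-40 ms range or so, the manual alignment is probably good enough for lip sync.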

Full workflow in brief

1. Obtain a video file in .mp4, .m4v or .3gp format, usually directly from the device. For files like .avi or .mov without MPEG-4 compatible content, use Handbrake to transcode the video using x264, say, and to transcode the original LQ audio (don't worry too much about quality) to enable easy sync.
2. Open the HQ audio (preferably lossless, such as .flac or .aup) in Audacity. Check that the Project Rate is good (e.g. 44100 or 48000 Hz).
3. Drag the video file into Audacity to decode the LQ audio beside the HQ audio track.
4. Synchronize the audio at an obvious point (e.g. handclap, clapperboard, first note or beat of a certain song), preferably near the middle of the LQ audio. Hint: use the ↔ Time-shift Tool to adjust it. Use Spectral View (e.g. vertical zoom to 0-3.5 kHz) to help line up accurately to within 10-40 ms or so.
5. Select precisely from the start to the end of the LQ audio track and trim the HQ track to the selection. Close the LQ track and edit the HQ track as desired (fades, envelope, gain, subtle peak-limiting or dynamic compression etc., possibly using foobar2000 for ReplayGain (Apply gain, allow clipping) & Advanced Limiter during Convert...).
6. Encode the edited audio to AAC, e.g. an HQstream.m4a file, via the qaac encoder. Consider both a low-bitrate version to accompany low-rate CQ CFR video for internet/mobile devices and a high-bitrate version with the original video as a source for DVD transcode.
7. Open YAMB & “Create MP4 file with multiple video, audio, etc... streams”. Open the original .mp4, .m4v or .3gp video file and deselect its audio stream. Add HQstream.m4a and select its audio. Choose an output file name (e.g. Video0123_HQaudio_96kCVBR.mp4) and hit Next.

   N.B. YAMB indicates that the encoded AAC is a little longer than the AMR (possibly no gapless metadata) but it still lip-syncs just fine.
8. Optionally convert the video using Handbrake, using AAC Passthru to preserve the audio.
9. Test the files from 7. or 8. in Quicktime, Windows Media Player, VLC Player, digital TV via USB flash drive, smartphone, tablet etc. as desired. LC-AAC audio and x264 video with Constant Framerate should be widely compatible and retain audio sync in all players, in my experience.
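
For Mac or Linux users, or anyone who prefers scripting, the encode, remux and optional conversion steps above might be sketched on the command line roughly as follows. This Python snippet only builds and prints the commands. It assumes qaac, MP4Box (the tool that YAMB is a front end for) and HandBrakeCLI are installed and on the PATH, that the edited audio was exported from Audacity as a WAV file, and that the flags shown behave the same way in your versions. The input file names Video0123.3gp and HQstream.wav and the function names are hypothetical.

```python
import shlex


def qaac_encode_cmd(wav_in, m4a_out, cvbr_kbps=96):
    """qaac: encode to LC-AAC in constrained VBR mode at the given bitrate."""
    return ["qaac", "--cvbr", str(cvbr_kbps), wav_in, "-o", m4a_out]


def mp4box_remux_cmd(video_in, audio_in, mp4_out):
    """MP4Box: take only the video track from one file and only the audio
    track from the other, writing a new MP4 without re-encoding."""
    return [
        "MP4Box",
        "-add", f"{video_in}#video",
        "-add", f"{audio_in}#audio",
        "-new", mp4_out,
    ]


def handbrake_cmd(mp4_in, mp4_out, rf=25.0):
    """HandBrakeCLI: x264 constant quality, constant frame rate,
    with the AAC audio passed through untouched."""
    return [
        "HandBrakeCLI", "-i", mp4_in, "-o", mp4_out,
        "-e", "x264", "-q", str(rf), "--cfr",
        "-E", "copy:aac",
    ]


commands = [
    qaac_encode_cmd("HQstream.wav", "HQstream.m4a"),
    mp4box_remux_cmd("Video0123.3gp", "HQstream.m4a",
                     "Video0123_HQaudio_96kCVBR.mp4"),
    handbrake_cmd("Video0123_HQaudio_96kCVBR.mp4",
                  "Video0123_HQaudio_RF25.mp4"),
]

for cmd in commands:
    # Print only; pass each list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

The remux command leaves the video in its original container timing and only swaps the audio track, which is the point of the workflow. Check each tool's own help output before relying on these flags, as options have changed between releases.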

If you have any good suggestions (or suggested alternative tools for Mac/Linux) feel free to chime in.
