Audio codec specifications for different applications
ljudofil
post Dec 9 2012, 15:52
Post #1
Hi!

I am trying to learn more about codecs in different situations, and what specifications could be necessary for different applications.

An example would be voice conferencing, where I know it must be a positive factor to have as low latency as possible. But what other specifications matter in that case?

I understand that different applications need different specifications, but at the moment I don't know much about what broadcast, mobile phones, conferencing or web audio need.


I am studying the Opus codec now, and it seems promising as a new and upcoming one.

/Thanks
Dynamic
post Dec 10 2012, 10:56
Post #2
Look through the Opus (audio format) page on Wikipedia.

In particular, to sort out the Citation Needed tags I dug out a number of references, mostly from open journals found via Google Scholar, to back up the numbers used, especially in the Quality comparison and Low-latency performance sections (rather than referencing only other Wikipedia articles, where much the same numbers are quoted). There are Wikipedia articles on both Latency (engineering) and Latency (audio), linked from various parts of the Opus article.

It's the total latency (often the round-trip latency) that matters, and there are some things that can be done to eke out the latency a little. For example, in the demanding case of networked music performance or jamming, there's some evidence that delaying the sound of one's own instrument very slightly in an in-ear monitor, while transmitting it to the co-performers as soon as possible, can help accommodate a slightly longer round-trip delay.
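To get a feel for where the milliseconds go, here's a rough one-way/round-trip budget sketch. The component figures are hypothetical round numbers picked purely for illustration, not from any spec or measurement:

```python
# Illustrative one-way latency budget for a VoIP / jam-session path.
# All values below are made-up round numbers; substitute figures
# measured for your own chain.
components_ms = {
    "capture buffer": 5.0,
    "codec algorithmic delay": 20.0,   # frame duration + look-ahead
    "network propagation + queueing": 15.0,
    "receiver jitter buffer": 20.0,
    "decode + playback buffer": 5.0,
}

one_way_ms = sum(components_ms.values())
round_trip_ms = 2 * one_way_ms

print(f"one-way:    {one_way_ms:.1f} ms")
print(f"round-trip: {round_trip_ms:.1f} ms")
```

With these (invented) numbers the round trip already exceeds 100 ms, which shows why every component, not just the codec, has to be trimmed for live music use.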

There's also a range of info about Broadcast applications, for example lip sync.

As for other specs, a brief summary of what I've gathered and can remember:
Broadcast: in studio, keep total latency minimal (less than around 10 ms). Outside the studio, lip-sync error relative to video is supposed to be no more than about 45 ms, but the combined latency of the audio/video pair can be enormous.
Audio in broadcast is traditionally sampled at 48 kHz, which finds its way into DAB radio etc.

8 kHz sampling is used in POTS telephony, mobile phones and most VoIP, with algorithmic delay typically below 20 ms. In addition to 8 kHz, 16 kHz (wideband) and occasionally 32 kHz (super-wideband) are used in conferencing and web audio. Low-latency video encoding is also necessary for video telephony and video conferencing.
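As a back-of-envelope check, a codec's algorithmic delay is roughly its frame duration plus its encoder look-ahead. The 6.5 ms look-ahead below is the figure commonly cited for Opus, and the frame sizes are the ones Opus supports; treat the arithmetic as a sketch rather than a spec:

```python
# Back-of-envelope algorithmic delay for Opus-style framing:
# delay ≈ frame duration + encoder look-ahead.
LOOKAHEAD_MS = 6.5          # commonly cited Opus (CELT) look-ahead
frame_sizes_ms = [2.5, 5, 10, 20, 40, 60]  # frame sizes Opus supports

for frame in frame_sizes_ms:
    delay = frame + LOOKAHEAD_MS
    print(f"{frame:>5} ms frame -> {delay:.1f} ms algorithmic delay")
```

So the common 20 ms frame lands at about 26.5 ms of algorithmic delay, comfortably inside the sub-10-ms-per-component budgets only when shorter frames are used.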

Knowledge of the link characteristics can become important. For example, IP has variable packet delay and no guarantee of delivery, so choosing a jitter-buffer delay is a balance between latency and effective packet loss: if the packet-delay histogram shows a peaked statistical distribution with tails, how long you wait before playing out the sound determines what proportion of packets arrive too late to contribute to what you play back, which essentially counts as packet loss for live playback. If you're also recording the stream - e.g. a conference live chat to be played later as a podcast or radio show - you could potentially save the late packets that the live participants didn't hear, making the podcast better quality than the live experience. (I don't think this is done yet, especially as chat-level latency already allows very low loss. It could be extra useful at the bleeding edge of low latency, such as live music performance, though if that's to be kept for posterity, the sources would be better stored losslessly on each participant's computer and mixed offline later without transcoding.)
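The latency-vs-loss trade-off above can be simulated directly: given a sample of one-way packet delays, the fraction arriving after the chosen playout deadline is effectively lost for live listening. The Gaussian delay distribution and its parameters here are made up for illustration; real delay histograms usually have heavier tails:

```python
import random

# Simulate one-way packet delays (hypothetical distribution: mean 40 ms,
# std dev 10 ms) and measure how many miss a given playout deadline.
random.seed(1)
delays_ms = [random.gauss(mu=40, sigma=10) for _ in range(10_000)]

def late_fraction(delays, buffer_ms):
    """Fraction of packets arriving after the playout deadline."""
    return sum(d > buffer_ms for d in delays) / len(delays)

for buf in (40, 50, 60, 70):
    print(f"buffer {buf} ms -> {late_fraction(delays_ms, buf):.1%} late")
```

Each extra 10 ms of buffer cuts the late-packet rate sharply, which is exactly the curve a jitter-buffer designer is trading against end-to-end latency.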

ATM and STM channels are expensive but dedicated links with reliable delivery, bandwidth and delay, and can be used for studio-to-studio international link-ups - usually for two-way TV conversations: video+audio one way, audio-only back from the studio to the interviewee or journalist - though I believe they are often booked months in advance. Ham radio links also have predictable delay.

ADSL can operate in high- or low-latency modes (not at the user's choice - mine has just improved to Fast Path, ~10 ms, though at the risk of greater packet loss), so 10 to 25 ms of latency may well be present in each participant's local-loop broadband connection.

There's also latency in sound cards, software mixers and DSP effects, some of which use look-ahead buffers. For that reason, some people use low-latency audio systems like JACK (the JACK Audio Connection Kit, and the netjack package built around it) to reduce latency, especially in network jam sessions.
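Sound-card buffering latency in a JACK-style setup is easy to estimate: frames per period times the number of periods, divided by the sample rate. The period/period-count values below are typical configuration choices, not measurements from any particular system:

```python
# Sound-card buffering latency:
# latency ≈ frames per period × number of periods ÷ sample rate.
def buffer_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Estimated buffering latency in milliseconds."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz

# Typical settings at 48 kHz (values chosen for illustration):
print(buffer_latency_ms(256, 2, 48_000))  # a common default
print(buffer_latency_ms(64, 2, 48_000))   # an aggressive low-latency setting
```

Halving the period size halves this latency, at the cost of more frequent interrupts and a higher risk of buffer underruns (audible dropouts).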