general purpose audio codec?
dragontiger
post Dec 28 2010, 01:32
Post #1





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



Hi all,

Is there any general purpose codec that would work well on all kinds of audio data? For example - if you were recording your daily activities, then your data would consist of speech segments, environmental audio segments (from your surroundings) and so on. Is there an appropriate codec to use for such kind of data?

The closest to this I found was Siren 22 by Polycom. However, it's closed-source. :(

Thanks in advance for answering the question.
Roseval
post Dec 28 2010, 01:58
Post #2





Group: Members
Posts: 482
Joined: 26-March 08
Member No.: 52303



I’m inclined to say almost any codec will do.
Basically, it is about dynamic range and frequency range.
16 bits give you 96 dB of dynamic range and 24 bits as much as 144 dB (probably more than your gear can cope with).
A 44.1 kHz sample rate allows for a frequency range up to almost 22 kHz.
Standard Red Book audio (16 bits, 44.1 kHz sample rate) fulfills our needs pretty well.
You might use any format like WAV, FLAC, etc. to cover this.
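The figures above follow from two standard rules of thumb for linear PCM: roughly 6.02 dB of dynamic range per bit, and a usable bandwidth of half the sample rate. A minimal sketch of that arithmetic (the function names are made up for illustration):

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20.0 * math.log10(2 ** bits)


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2.0


for bits in (16, 24):
    print(f"{bits}-bit PCM: about {dynamic_range_db(bits):.1f} dB")
print(f"44.1 kHz sampling: up to {nyquist_hz(44100):.0f} Hz")
```

These are theoretical ceilings; the microphone and preamp noise floor will usually set the real limit.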

I do think that what you want (probably recording with realistic results) is not really tied to a specific codec.



--------------------
TheWellTemperedComputer.com
dragontiger
post Dec 28 2010, 02:43
Post #3





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



Thanks for the reply.

I would like to explain the scenario, however. Suppose, as I mentioned earlier, you are constantly recording your surroundings 24/7. WAV and FLAC won't work here because of their enormous storage requirements (due to their lossless nature, of course). In such a case, Ogg Vorbis or MP3 can do pretty well at bitrates close to 64 kbps (more than enough). However, these codecs are primarily targeted at music rather than general audio. They will work well on speech and environmental audio, but they cannot offer the high compression ratios that are possible with Speex or something similar. So, given the nature of the recordings (speech and surroundings), I can save more storage if I can intelligently identify the audio source and apply the appropriate compression technique.

So my question is: is there any codec that takes the nature of the audio into account and then decides which technique to use?
saratoga
post Dec 28 2010, 09:06
Post #4





Group: Members
Posts: 4923
Joined: 2-September 02
Member No.: 3264



Are you carrying around a PC all day to do this, or an embedded recorder? Because unless you have a laptop with a 12 hour battery, many of the formats you mention probably aren't an option anyway. You'd run out of battery life long before you ran out of storage.

If you're just rigging a portable voice recorder, I would just use MP3 or WavPack lossy. They're lightweight enough that you could reasonably encode them all day without draining the battery of a typical hacked-up recorder/MP3 player, and they compress quite well (about 0.5 to 1.5 GB per day).

(That said, in theory WMA can do this by switching between WMA Voice and WMA Pro, but not much supports it, and of course you would have to purchase software from MS to do this.)
dragontiger
post Dec 29 2010, 05:24
Post #5





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



The idea is to build a recorder that would last up to 24 hours and require less storage (part of my research work). So, I was thinking of writing an encoder that switches between Vorbis and Speex and puts both streams in the same Ogg container. Before I undertake this huge task, I just wanted to make sure that there's no other codec that can do this.

I am a bit apprehensive about MP3 because of all the patent-related issues, and also because at lower bitrates (40-64 kbps) Vorbis tends to perform better. LossyWav won't give me the compression ratios that I require.

Thanks for the reply :)
saratoga
post Dec 29 2010, 05:57
Post #6





Group: Members
Posts: 4923
Joined: 2-September 02
Member No.: 3264



QUOTE (dragontiger @ Dec 28 2010, 23:24) *
The idea is to build a recorder that would last up to 24 hours and require less storage (part of my research work).


24 hours is not very much data. Why exactly do you need compression at all?

QUOTE (dragontiger @ Dec 28 2010, 23:24) *
LossyWav won't give me the compression ratios that I require.


LossyWav isn't the same thing as WavPack lossy. I was suggesting the latter since it's very light on battery to encode, but it sounds like you're using a PC anyway, so that doesn't really matter.
dragontiger
post Dec 29 2010, 07:44
Post #7





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



Sorry for leaving out details - I'm planning a small, wearable device powered by a couple of coin cells, so the stress is on ultra-low-power hardware design. The larger the flash memory used to store the data, the more power is consumed (1 GB of flash memory fits within the power budget). Hence the need for compression.
Also, keeping the Vorbis or MP3 encoder on for all the data is more than necessary, since most of my data is going to be speech or silence, for which I can use Speex (lower complexity and higher compression ratio). I still want to use Vorbis/MP3 for compressing the environmental sounds, so I can't rule them out fully either.
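A 1 GB flash budget and a 24-hour recording window together fix the highest average bitrate the recorder can afford. A rough sketch of that calculation, assuming 1 GB means 10^9 bytes and ignoring filesystem and container overhead (the function name is hypothetical):

```python
def max_bitrate_kbps(capacity_bytes: int, hours: float) -> float:
    """Highest average bitrate (kbps, 1 kbps = 1000 bit/s) that fits the capacity."""
    seconds = hours * 3600
    return capacity_bytes * 8 / seconds / 1000


budget = max_bitrate_kbps(10**9, 24)
print(f"1 GB over 24 h allows about {budget:.1f} kbps on average")
```

Under these assumptions the ceiling is a little over 90 kbps, so a 64 kbps stream would fit and anything much above that would not.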
dragontiger
post Dec 29 2010, 07:51
Post #8





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



WavPack sounds like a good idea. I think I will look into it further.
Roseval
post Dec 29 2010, 10:54
Post #9





Group: Members
Posts: 482
Joined: 26-March 08
Member No.: 52303



You might have a look at AAC: http://en.wikipedia.org/wiki/Advanced_Audio_Coding


--------------------
TheWellTemperedComputer.com
odyssey
post Dec 29 2010, 11:11
Post #10





Group: Members
Posts: 2296
Joined: 18-May 03
From: Denmark
Member No.: 6695



24 hours of uncompressed mono audio is approx. 8 GB. Flash in this range is really cheap. If you are going to use just a few cell batteries, I wouldn't expect you to get far with any kind of lossy encoding.
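The 8 GB estimate can be checked with a quick calculation. The post does not state a sample rate or bit depth, so the sketch below assumes CD-style 16-bit, 44.1 kHz mono and adds two lower rates for comparison:

```python
def pcm_bytes(sample_rate_hz: int, bits: int, channels: int, hours: int) -> int:
    """Size in bytes of raw linear PCM for the given duration."""
    seconds = hours * 3600
    return sample_rate_hz * bits * channels * seconds // 8


for rate in (44100, 22050, 8000):
    size = pcm_bytes(rate, 16, 1, 24)
    print(f"{rate} Hz, 16-bit mono, 24 h: {size / 1e9:.2f} GB")
```

At 44.1 kHz this comes to about 7.6 GB, which matches the rough figure; halving the sample rate halves the storage.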

Regarding the choice of codec, if you decide to compress anyway, I would choose LAME MP3 any day for any kind of content. It has been optimized so well that it encodes almost everything you throw at it without artifacts. Look at it this way: the music it is supposed to encode transparently is the hardest job for it. It should be no problem at all for it to encode environmental audio.

This post has been edited by odyssey: Dec 29 2010, 11:13


--------------------
Can't wait for a HD-AAC encoder :P
Remedial Sound
post Dec 29 2010, 16:40
Post #11





Group: Members
Posts: 505
Joined: 5-January 06
From: Dublin
Member No.: 26898



I'll second the recommendation for LAME MP3, as it has universal hardware/software support. For the purposes you describe it'd probably make the most sense to use the Voice "preset" command line recommended in the LAME wiki, which forces mono and uses ABR:

CODE
--abr 56 -mm


In my experience this works quite well for audiobooks (I haven't done any field recording, though). You can even reduce the ABR bitrate, and LAME will reduce the sample rate as needed.

You might also want to consider trying VBR (-V 9) with forced mono and perhaps a forced sample rate. While I've never tried this, in theory VBR will handle long passages of (near-)silence better (i.e., using fewer bits and saving them for the more complex stuff).
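For scripted encoding, the two variants above could be assembled as argument lists. This is only a sketch: the file names are placeholders, the helper names are made up, and the 22.05 kHz resample value is an arbitrary example rather than a recommendation. It uses the spaced form `-m m`, which selects the same mono mode as `-mm`.

```python
import shlex
from typing import List, Optional


def lame_abr_voice(src: str, dst: str, abr_kbps: int = 56) -> List[str]:
    """ABR encode with forced mono, as in the voice setting quoted above."""
    return ["lame", "--abr", str(abr_kbps), "-m", "m", src, dst]


def lame_vbr_mono(src: str, dst: str, quality: int = 9,
                  resample_khz: Optional[float] = None) -> List[str]:
    """VBR encode with forced mono and an optional forced output sample rate."""
    cmd = ["lame", "-V", str(quality), "-m", "m"]
    if resample_khz is not None:
        cmd += ["--resample", str(resample_khz)]  # LAME expects kHz here
    return cmd + [src, dst]


print(shlex.join(lame_abr_voice("in.wav", "out.mp3")))
print(shlex.join(lame_vbr_mono("in.wav", "out.mp3", resample_khz=22.05)))
```

Either list can be handed to `subprocess.run(cmd, check=True)` on a machine where `lame` is installed.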

HTH
saratoga
post Dec 29 2010, 17:17
Post #12





Group: Members
Posts: 4923
Joined: 2-September 02
Member No.: 3264



QUOTE (dragontiger @ Dec 29 2010, 01:44) *
Sorry for leaving out details - I'm planning a small, wearable device powered by a couple of coin cells, so the stress is on ultra-low-power hardware design. The larger the flash memory used to store the data, the more power is consumed (1 GB of flash memory fits within the power budget).


That's not really how it works. You'll use a little bit of power writing out data to flash, but it's tiny compared to the power needed to actually compress something. If this thing really needs to run off a couple of coin cells for 24 hours, things like MP3, Vorbis, etc. are out of the question. Look into WavPack, but even then you're probably going to miss your power budget by a lot.

PCM is almost certainly your best bet. Buffer a couple seconds of it to DRAM, clock up your flash, burst write your buffer, and then clock down the flash chip.
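The buffer-then-burst idea can be illustrated with a small software model. It is only a sketch: the `flush` callback is a hypothetical stand-in for "clock up the flash, write the block, clock down again", and the 8 kHz 16-bit mono format and 10 ms frames are assumptions chosen to keep the numbers small.

```python
from typing import Callable, List


class BurstBuffer:
    """Accumulates PCM in RAM and hands it to `flush` in fixed-size bursts."""

    def __init__(self, capacity_bytes: int, flush: Callable[[bytes], None]) -> None:
        self._buf = bytearray()
        self._capacity = capacity_bytes
        self._flush = flush

    def push(self, pcm: bytes) -> None:
        self._buf += pcm
        # Emit one burst each time a full buffer's worth has accumulated.
        while len(self._buf) >= self._capacity:
            chunk = bytes(self._buf[:self._capacity])
            del self._buf[:self._capacity]
            self._flush(chunk)

    def close(self) -> None:
        # Write out whatever is left at the end of the recording.
        if self._buf:
            self._flush(bytes(self._buf))
            self._buf.clear()


writes: List[bytes] = []
bytes_per_second = 8000 * 2                      # assumed 8 kHz, 16-bit mono
buf = BurstBuffer(2 * bytes_per_second, writes.append)
for _ in range(500):                             # 500 frames of 10 ms = 5 s
    buf.push(bytes(160))
buf.close()
print([len(w) for w in writes])
```

The storage device is only touched once per full buffer, so it can stay in its low-power state the rest of the time.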

QUOTE (dragontiger @ Dec 29 2010, 01:44) *
Also, keeping the Vorbis or MP3 encoder on for all the data is more than necessary, since most of my data is going to be speech or silence, for which I can use Speex (lower complexity and higher compression ratio). I still want to use Vorbis/MP3 for compressing the environmental sounds, so I can't rule them out fully either.


Generally, anything running off a couple of batteries for more than a few hours has no FPU, and thus you won't be encoding Vorbis unless you're writing your own integer Vorbis encoder. Have you figured out which codecs are even possible to run on your hardware? And how big a battery pack you'll need to do it? IMO it's not really worthwhile asking about all these codecs if you can't actually run them.
Notat
post Dec 29 2010, 18:31
Post #13





Group: Members
Posts: 581
Joined: 17-August 09
Member No.: 72373



It does take a non-trivial amount of power and time to write flash. Whether you'll have net power savings by compressing depends on the complexity of the compression and resultant bit rate. Power-wise, you may be better off with a low complexity and/or low bit rate encoder. Of course, you may find the sound quality unacceptable. Have a look at CELT and Speex.
Rotareneg
post Dec 29 2010, 18:38
Post #14





Group: Members
Posts: 194
Joined: 18-March 05
From: Non-Euclidean
Member No.: 20701



You might consider just using run-length and Huffman coding, which would cut down the size of the raw data without using much processor power.

Also, choose your bit depth and sampling rate accordingly, the bit depth being probably the easiest to trim back.
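As an illustration of the run-length part of this suggestion, here is a minimal sketch that collapses runs of identical samples into (value, count) pairs. It assumes silence has already been gated to exact zeros, which raw microphone data will not be without some thresholding, and it leaves out the Huffman stage entirely.

```python
from typing import List, Tuple


def rle_encode(samples: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal samples into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the sample sequence."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


samples = [0] * 1000 + [3, -2, 5] + [0] * 500
runs = rle_encode(samples)
print(len(samples), "samples ->", len(runs), "runs")
```

Long silent stretches shrink to a single pair, while non-repeating audio gains nothing, which is why the scheme pays off mainly when much of the recording is silence.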

This post has been edited by Rotareneg: Dec 29 2010, 18:47
SebastianG
post Dec 29 2010, 18:46
Post #15





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



Maybe SBC is an option. This is Bluetooth's primary general purpose audio codec. It's quite simple/fast and gives you a quality per bit ratio that is probably similar to MPEG Layer 1. For example, you could use a sampling rate of 32 kHz and a data rate of about 128 kbps (for one channel). You'll find an SBC implementation in the libbluez source code.
saratoga
post Dec 29 2010, 19:56
Post #16





Group: Members
Posts: 4923
Joined: 2-September 02
Member No.: 3264



QUOTE (SebastianG @ Dec 29 2010, 12:46) *
Maybe SBC is an option. This is Bluetooth's primary general purpose audio codec. It's quite simple/fast and gives you a quality per bit ratio that is probably similar to MPEG Layer 1. For example, you could use a sampling rate of 32 kHz and a data rate of about 128 kbps (for one channel). You'll find an SBC implementation in the libbluez source code.


I was thinking of something like 10-bit ADPCM @ 22 kHz. That's only 215 kbps, and the compression is much more power efficient than doing a subband decomposition.
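The bitrate quoted here is just bits per sample times sample rate. A short sketch, assuming "22 kHz" means 22,050 Hz and a single channel:

```python
def bitrate_bps(bits_per_sample: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw bitrate of a fixed-width sample stream in bit/s."""
    return bits_per_sample * sample_rate_hz * channels


adpcm = bitrate_bps(10, 22050)
bytes_per_day = adpcm * 86400 // 8

print(f"{adpcm} bit/s = {adpcm / 1000:.1f} kbps (x1000) or {adpcm / 1024:.1f} kbps (x1024)")
print(f"24 h at that rate: {bytes_per_day / 1e9:.2f} GB")
```

The 215 figure corresponds to dividing by 1024; with the more common decimal kilobit it is about 220 kbps, or roughly 2.4 GB per day.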
knutinh
post Dec 29 2010, 20:32
Post #17





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



Do you expect to do any AGC to compensate for the large variations in loudness typically encountered?

-k
dragontiger
post Dec 29 2010, 23:42
Post #18





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



Thank you everyone for all the suggestions. I have definitely got a few things to think about before I start.

On the other hand, just out of curiosity, would it actually make any sense to write a new codec (not exactly new, taking the important properties of the good codecs out there and integrating them) that can handle multiple sources of audio with ease, be power-efficient and portable-device friendly (all of these at low bitrates, 4-8 kbps for speech, 40-64 kbps for the rest)? Just a thought.
saratoga
post Dec 30 2010, 00:00
Post #19





Group: Members
Posts: 4923
Joined: 2-September 02
Member No.: 3264



QUOTE (dragontiger @ Dec 29 2010, 17:42) *
Thank you everyone for all the suggestions. I have definitely got a few things to think about before I start.


Out of curiosity which CPU were you going to use?

QUOTE (dragontiger @ Dec 29 2010, 17:42) *
On the other hand, just out of curiosity, would it actually make any sense to write a new codec (not exactly new, taking the important properties of the good codecs out there and integrating them) that can handle multiple sources of audio with ease, be power-efficient and portable-device friendly (all of these at low bitrates, 4-8 kbps for speech, 40-64 kbps for the rest)? Just a thought.


MS did that with the WMA9 family, so I guess it made sense to them. I've never actually seen anyone use it though, so I think in practice it hasn't been too popular. I think part of it is that voice codecs tend to be under completely different restrictions than audio codecs in most situations, so it's difficult to combine the two without giving up too much (in terms of latency, packet size, CPU power, memory, etc.).
dragontiger
post Dec 30 2010, 00:07
Post #20





Group: Members
Posts: 7
Joined: 28-December 10
Member No.: 86853



The plan is not to use a CPU, but develop an ASIC for better power efficiency. And yes, since there is no FPU, I have started writing an integer vorbis encoder (from your earlier post).

This post has been edited by dragontiger: Dec 30 2010, 00:07
C.R.Helmrich
post Dec 30 2010, 00:07
Post #21





Group: Developer
Posts: 686
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



QUOTE (dragontiger @ Dec 28 2010, 03:43) *
Thanks for the reply.

I would like to explain the scenario, however. Suppose, as I mentioned earlier, you are constantly recording your surroundings 24/7. WAV and FLAC won't work here because of their enormous storage requirements (due to their lossless nature, of course). In such a case, Ogg Vorbis or MP3 can do pretty well at bitrates close to 64 kbps (more than enough). However, these codecs are primarily targeted at music rather than general audio. They will work well on speech and environmental audio, but they cannot offer the high compression ratios that are possible with Speex or something similar. So, given the nature of the recordings (speech and surroundings), I can save more storage if I can intelligently identify the audio source and apply the appropriate compression technique.

So my question is: is there any codec that takes the nature of the audio into account and then decides which technique to use?


This is precisely what the following upcoming audio coding standard will be for. We presently call it "Unified speech and audio coder", but that name will probably change. At high bit rates (32 kbps per channel and more), it's quite similar to HE-AAC, so for now I recommend using that.

www.gel.usherbrooke.ca/gournay/documents/publications/AES126_...pdf

Chris

This post has been edited by C.R.Helmrich: Dec 30 2010, 00:14


--------------------
If I don't reply to your reply, it means I agree with you.
saratoga
post Dec 30 2010, 00:15
Post #22





Group: Members
Posts: 4923
Joined: 2-September 02
Member No.: 3264



QUOTE (dragontiger @ Dec 29 2010, 18:07) *
The plan is not to use a CPU, but develop an ASIC for better power efficiency.


A couple points:

1) Generally to have an ASIC fabricated you need to order thousands of units. Are you planning to order that many devices?
2) There are a lot of commercially available ASICs that can do what you need without the enormous cost of fabricating a custom part.
3) Your ASIC will probably be based on some kind of CPU or DSP internally, unless you're really going to try to lay out the logic directly to encode a file, which I think would be staggeringly difficult. Have you thought about which kind you will use?

QUOTE (dragontiger @ Dec 29 2010, 18:07) *
And yes, since there is no FPU, I have started writing an integer vorbis encoder (from your earlier post).


I think this is probably not going to be worthwhile because of the power requirements. You should probably pick a format that's well suited to what you want to do, and I don't think that's going to be Vorbis, or any perceptual codec for that matter. It's probably going to be some kind of PCM variant, so that you can keep the processing on your device to an absolute minimum.
C.R.Helmrich
post May 19 2011, 17:33
Post #23





Group: Developer
Posts: 686
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



In case anyone is interested in technical details (and has the time and financial resources):

At next week's IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011) in Prague, I and a colleague of mine will be presenting some of the recent work that was accepted for integration into the "unified speech and audio codec" I mentioned above, namely:

Stereo coding at bit rates of ~96 kb/s and higher:
AASP-P10.4: EFFICIENT TRANSFORM CODING OF TWO-CHANNEL AUDIO SIGNALS BY MEANS OF COMPLEX-VALUED STEREO PREDICTION

Design of the arithmetic coder replacing Huffman coding:
AASP-P10.3: EFFICIENT CONTEXT ADAPTIVE ENTROPY CODING FOR REAL-TIME APPLICATIONS

See www.cmsworldwide.com/ICASSP2011/Papers/PublicSessionIndex3.asp?Sessionid=1034 for details. One detail which is probably interesting for some forum members: to demonstrate the advantage that the new stereo coding tool has over traditional tools, we conducted a formal blind test including two items from HA: BerlinDrug and Waiting. :)

Chris


--------------------
If I don't reply to your reply, it means I agree with you.
IgorC
post May 20 2011, 06:00
Post #24





Group: Members
Posts: 1556
Joined: 3-January 05
From: ARG/RUS
Member No.: 18803



Chris,

It's a surprise that USAC will have improved efficiency at 96 kbps and higher. Until now, all signs indicated that it would be another low-bitrate codec and that LC-AAC would still be used for >80 kbps.
Looking at the list of new coding techniques, there is a good chance that USAC will offer a substantial improvement in coding efficiency over LC-AAC.

This post has been edited by IgorC: May 20 2011, 06:01
DonP
post May 20 2011, 10:31
Post #25





Group: Members (Donating)
Posts: 1471
Joined: 11-February 03
From: Vermont
Member No.: 4955



QUOTE (saratoga @ Dec 29 2010, 18:15) *
QUOTE (dragontiger @ Dec 29 2010, 18:07) *
The plan is not to use a CPU, but develop an ASIC for better power efficiency.


A couple points:

1) Generally to have an ASIC fabricated you need to order thousands of units. Are you planning to order that many devices?


I've worked on a few designs where the volume was in the hundreds. In those cases the chips were going into a small quantity of very expensive products like satellites or CAT scanners.

You can also have chips made through MOSIS, which combines multiple designs/customers on one mask set so you aren't fighting the economics of making whole wafers of one design. The smallest standard quantity I saw listed is 40.
