enhanced aac+ to aac lc

Hi everyone, has anybody used the enhanced aac+ code (TS 26.410) to encode/decode AAC-LC only, without SBR?

I modified the enhanced aac+ encoder so that it produces no SBR data. When the decoder decodes the new .3gp file, no SBR is decoded, but the output sample rate is half that of the original. This is because the AAC-LC core works at half the sample rate while SBR works at the original sample rate; in the decoder it is the SBR stage that brings the signal back up to the original rate, and now there is no SBR.
Does anyone know how to make the decoder output sample rate match that of the original input file? I tried modifying the encoder to encode at the original sample rate, but then there is no sound. If I use the half-rate .3gp file and modify the decoder to change the sample rate and frame size, the output sounds strange...
Or is it OK if I upsample the new decoder output file (at half the sample rate) in MATLAB?

Thanks in advance.

Reply #1
What you see is to be expected. AFAIK, the only way to decode the LC part is at the half sampling rate because the inverse transform will only produce the number of samples that went into the original MDCT. (Out of interest, the old mp3pro files used MP3 at half the sampling rate, i.e. MPEG-2 layer 3, plus SBR for the upper half and a normal MP3 decoder without MP3Pro features would only produce the lower half).

I believe you need to resample (and I think that's what an AAC+ decoder does), though you'll probably find that an app like foobar2000 (Windows) or SoX (command-line, multiplatform) does a good job of resampling, and much faster than MATLAB, last I heard.
Dynamic – the artist formerly known as DickD

Reply #2


Thank you, Dynamic. You mean I cannot modify the C code to work at the original sample rate, and can only upsample the decoder's output file?
You are right: the SBR part of the enhanced aac+ decoder does the upsampling, using a QMF filter bank.
About resampling software, how about Adobe Audition? Which one is better?
I used the 'resample' command in MATLAB, but I am not sure about the 'order' value; if the order is not right, the quality may not be good.

Reply #3


As far as I know, Neroaac/dec and fhgaacenc/dec can produce a .wav file at the original sample rate directly from the decoder in LC mode (without SBR). I am curious how they do that. Do you think they also include some resampling code written in C?

Reply #4
I guess we need to inspect the source code of an open source decoder to find out. I can't see how mathematically they could generate with an iMDCT containing twice as many samples as 'frequency' bins (positive and negative 'frequencies' both count towards the total) provided in the transform domain. Therefore, I'd assume it must be done after converting to the time domain (i.e. by upsampling with appropriate anti-alias filtering).

I'd like to be enlightened by anyone who knows for sure.
Dynamic – the artist formerly known as DickD

Reply #5


I hope someone can read this post.

I don't know how to write upsampling with filtering in C, so maybe I can only use some software to convert the sample rate.
About sample rate conversion software, how about Adobe Audition? Which one do you think is better?

Reply #6
SoX is a free open source project on Sourceforge with very good resampling, so you could use it externally or adapt their code providing you comply with their license.

Also tools like ffmpeg might have useful source code.
Dynamic – the artist formerly known as DickD


Reply #8

When you said SoX, do you mean this: http://sourceforge.net/projects/sox/?source=directory
or
http://www.hydrogenaudio.org/forums/index....showtopic=67373 (connected to foobar2000)?
Are they the same thing?


Reply #10

Hi Dynamic, in the decoder's 'spline_resampler.c' there are:
static const float a_22_16 = 0.3f;    /* resample from 22.05 kHz -> 16 kHz */
static const float b_22_16 = 1.3f;

static const float a_24_16 = 0.24f;    /* resample from 24 kHz -> 16 kHz */
static const float b_24_16 = 1.24f;

static const float a_22_8 = 0.06f;    /* resample from 22 kHz -> 8 kHz */
static const float b_22_8 = 1.06f;

static const float a_24_8 = 0.05f;    /* resample from 24 kHz -> 8 kHz */
static const float b_24_8 = 1.05f;

I guess maybe they add the iirFilterCoeff_a/b values for resampling from e.g. 24 kHz -> 48 kHz (upsampling back to the original rate), but I'm not sure. I don't know where these values come from. Do you know how to determine these iirFilterCoeff_a/b values?

By the way, do you know how to get a mono decoder output file? I set the CT mono debug mode, but every time the program reaches
interleaveSamples(&TimeDataFloat[0],&TimeDataFloat[frameSize],pTimeDataPcm,frameSize,&numChannels);
'numChannels' changes from 1 to 2. If I force numChannels to stay at 1, I get mono output, but it sounds totally wrong... Thanks.

Reply #11
That's from the 3GPP code, isn't it? I think the intention of spline_resampler.c is to allow handsets whose DACs don't support the rate of a received file to downsample to a supported rate.

As you want to upsample, I'm not sure this code is directly applicable, but I haven't looked into it.

With downsampling, you need to filter first to remove frequencies above the Nyquist limit, then interpolate to the new, lower sampling rate.
With upsampling, you need to interpolate to the new higher sampling rate then filter afterwards to remove any frequencies that have been introduced above the Nyquist limit of the lower rate.
In theory, whichever way you're going between the same pair of sampling rates (up or down), the cut-off frequency should be the same, assuming an ideal filter.
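
As a rough illustration of the upsampling recipe above (interpolate to the higher rate, then filter), here is a minimal C sketch of 2x upsampling: zeros are inserted between the input samples and the result is run through a short low-pass FIR. The names `upsample2x` and `up2_kernel` are made up for this example and do not come from the 3GPP code. The 7-tap kernel is the simple cubic (4-point Lagrange) half-band interpolator, chosen only because it is easy to follow; it leaves noticeably more residual imaging than the long filters in SoX or the speex resampler, so treat it as a teaching aid rather than something to ship.

```c
#include <stddef.h>

/* 7-tap half-band kernel with a gain of 2: the centre tap passes the
   original samples through unchanged, the outer taps compute the midpoints
   by cubic (4-point Lagrange) interpolation. */
#define UP2_TAPS 7
static const float up2_kernel[UP2_TAPS] = {
    -0.0625f, 0.0f, 0.5625f, 1.0f, 0.5625f, 0.0f, -0.0625f
};

/* Upsample in[0..n_in) by 2 into out[0..2*n_in); returns samples written.
   Samples before in[0] are treated as silence, and the output lags the
   input by UP2_TAPS/2 = 3 output samples. */
size_t upsample2x(const float *in, size_t n_in, float *out)
{
    for (size_t m = 0; m < 2 * n_in; m++) {
        float acc = 0.0f;
        for (size_t n = 0; n < UP2_TAPS && n <= m; n++) {
            size_t j = m - n;       /* index into the zero-stuffed signal */
            if ((j & 1u) == 0)      /* odd positions hold the stuffed zeros */
                acc += up2_kernel[n] * in[j / 2];
        }
        out[m] = acc;
    }
    return 2 * n_in;
}
```

For a decoder producing 24 kHz frames, calling `upsample2x(frame, frameSize, out)` would give `2 * frameSize` samples at 48 kHz. To run it frame by frame without clicks at the frame boundaries you would also need to carry the last few input samples over from one call to the next, which this sketch leaves out.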

The speex resampler code looks useful: it seems to have a liberal license, works for arbitrary rates, and implements upsampling intelligently. It recognises that the filter design for downsampling must ensure good attenuation at and above the Nyquist limit, whereas for upsampling the signal already has little content very close to the Nyquist limit and none above it, until your chosen method of interpolation introduces spectral images (aliasing) there. So it can be more relaxed about the attenuation close to the limit and preserve audio frequencies better by choosing a slightly higher cut-off frequency when upsampling than when downsampling. It also offers fixed-point and floating-point versions, which you can choose between depending on your hardware, and I believe it has been tested when compiled for numerous popular platforms (certainly the Opus source code, which includes the same resampler, was tested very widely prior to IETF standardization).

The speex one calculates the sinc function on the fly and calculates the cut-off mathematically, but has a number of Kaiser window functions pre-calculated in the source code, and it includes some values for adjusting the filter cut-off frequency for upsampling versus downsampling.

It can essentially be treated as a black box that just does the job without having to understand how.

(P.S. That's the right SoX project you linked to a few posts above, and their resampling code has been implemented in a fb2k plugin which you mentioned in the other link. Some people get very picky about inaudible differences that can show up on graphs, where SoX resampler performs very well. I doubt there's an audible difference from fb2k's PPHS resampler or speex's for normal sampling rates. I guess there's a modest chance of slight audibility when upsampling from very low sample rates such as 8kHz.)
Dynamic – the artist formerly known as DickD

Reply #12

Yes, that is from the 3GPP code.
Thank you so much, Dynamic; I learned a lot from your reply.
How about the 'mono' question: do you know how to get a mono decoder output file?
I set the CT mono debug mode, but every time the program reaches
'interleaveSamples(&TimeDataFloat[0],&TimeDataFloat[frameSize],pTimeDataPcm,frameSize,&numChannels);'
'numChannels' changes from 1 to 2. If I keep numChannels at 1, I get mono output, but it sounds totally wrong...
I chose 96 kbps mono. I am confused.
Looking forward to your reply.

Reply #13


I thought the whole idea of the special mono mode was this:

If the original encode was stereo, the AAC-LC part (low frequencies) must be decoded as stereo and downmixed to mono. However, you can save computational resources with the SBR layer by downmixing the components before conducting the band replication.

Pure speculation, but I wonder if forcing numChannels to 1 overrides the initial stereo decode of the LC layer (whose stereo information could be encoded in various ways, such as L-R or M-S stereo), so that you sometimes get the M and sometimes get the L, for example, depending on what was chosen for each frame.

I'm not really familiar with the 3GPP code to tell what the problem is. If you've already stripped away the SBR layer, it might be totally useless to specify mono decoding, as it only does anything different in the SBR layer (which you've discarded), and you're best to simply decode as stereo and downmix using any of the usual formulae such as mono=(L+R)/2 or mono=(L+R)/sqrt(2).
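
For what it's worth, the first of those downmix formulae, mono=(L+R)/2, can be sketched in C roughly as follows. This assumes interleaved 16-bit PCM (L0 R0 L1 R1 ...), and `downmix_to_mono` is a name invented for this example rather than anything in the 3GPP code.

```c
#include <stddef.h>
#include <stdint.h>

/* Downmix interleaved 16-bit stereo (L0 R0 L1 R1 ...) to mono using
   mono = (L + R) / 2. n_frames is the number of stereo sample pairs. */
void downmix_to_mono(const int16_t *stereo, size_t n_frames, int16_t *mono)
{
    for (size_t i = 0; i < n_frames; i++) {
        /* Sum in 32 bits so that L + R cannot overflow 16 bits. */
        int32_t sum = (int32_t)stereo[2 * i] + (int32_t)stereo[2 * i + 1];
        /* The average of two 16-bit values always fits in 16 bits. */
        mono[i] = (int16_t)(sum / 2);
    }
}
```

With the (L+R)/sqrt(2) variant the result can exceed the 16-bit range, so it would need to be clipped before the cast.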
Dynamic – the artist formerly known as DickD

Reply #14

Thanks, Dynamic.
I'm sorry, I forgot to mention that in what I tried the original encode is mono and the decoder debug mode is also mono, so there is no need to downmix.

Reply #15
So that's an interleave function?

Why do you need to interleave mono data? You should only need to interleave multiple channels. Are you perhaps forcing it to interleave mono data as if it were stereo, thus ending up with a stereo file of half the duration, perhaps?
Dynamic – the artist formerly known as DickD

Reply #16

The original input file to the encoder is also a mono file; there is nothing stereo involved.
Actually I don't know why it does the interleave. The number of channels changes from 1 to 2 after running 'interleaveSamples', and if I comment out 'interleaveSamples' there is no sound.
From the standard I understand that I can't encode/decode in mono mode if the bitrate is <= 44 kbps, so I tried 96 kbps.

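    /* interleave time samples */
  interleaveSamples(&TimeDataFloat[0],&TimeDataFloat[frameSize],pTimeDataPcm,frameSize,&numChannels);

/* Converts float samples to 16-bit PCM and interleaves them as L R L R ...
   The output is always two channels: mono input is duplicated to both. */
static void
interleaveSamples(float *pTimeCh0,
                  float *pTimeCh1,
                  short *pTimeOut,
                  int frameSize,
                  int *channels)
{
  int i;

  for (i=0; i<frameSize; i++)
  {
    *pTimeOut++ = (short) *pTimeCh0++;

    if(*channels == 2) {
      *pTimeOut++ = (short) *pTimeCh1++;
    }
    else {
      /* mono: copy the sample just written into the second channel */
      *pTimeOut = *(pTimeOut-1);
      pTimeOut++;
    }
  }
  /* the caller is told the output is stereo, whatever the input was */
  *channels = 2;
}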
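
Reading the quoted interleaveSamples(), the mono case is written out as two identical channels and the channel count is then set to 2, so the PCM buffer always holds 2 * frameSize samples. If the channel count is forced back to 1 afterwards while that buffer is still written in full, the file would contain every sample twice, which could explain the wrong sound. A possible mono-preserving variant is sketched below. It is only a sketch: `interleaveSamplesKeepMono` and `clip_to_short` are hypothetical names, not part of the 3GPP code, and it assumes the caller then writes exactly the returned number of samples and puts the matching channel count in the output file header.

```c
#include <stddef.h>

/* Clamp a float sample to the 16-bit PCM range before the cast; converting
   an out-of-range float to short is undefined behaviour in C. */
static short clip_to_short(float x)
{
    if (x > 32767.0f) return 32767;
    if (x < -32768.0f) return -32768;
    return (short)x;
}

/* Hypothetical variant of interleaveSamples(): writes frameSize * channels
   samples and leaves the channel count alone. Returns samples written.
   pTimeCh1 is only read when channels == 2. */
size_t interleaveSamplesKeepMono(const float *pTimeCh0, const float *pTimeCh1,
                                 short *pTimeOut, int frameSize, int channels)
{
    size_t written = 0;
    for (int i = 0; i < frameSize; i++) {
        pTimeOut[written++] = clip_to_short(pTimeCh0[i]);
        if (channels == 2)
            pTimeOut[written++] = clip_to_short(pTimeCh1[i]);
    }
    return written;
}
```

Whether this is enough depends on how the rest of the decoder's main loop sizes its buffers and writes the output file, which is not shown in this thread.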