Psychoacoustics in MPEG-4 AAC.

Started by Ramya Desai July 14, 2003
Hi all,

A good day to all. 

I want to know about psychoacoustic models in MPEG-4 AAC. If any of you know of
links to papers, please forward them to me. I would be thankful.

Regards,

Ramya.
	
Hi,

the models used by the ISO code are usually described quite
well in the standard. However, these models are not optimized
to work very well. :-)

In principle, the psychoacoustic model for AAC can be similar to,
or the same as, the psychoacoustic model for MP3 (apart from the
block length etc.). A Google search and a look at CiteSeer for
psychoacoustic models will give you some first pointers. There
are also quite a few AES papers from recent years regarding AAC
encoding.

Best regards,
Alexander

	-- 
dipl. ing.
alexander lerch

zplane.development
http://www.zplane.de
holsteinische str. 39-42
D-12161 berlin
 fon: +49.30.854 09 15.0
 fax: +49.30.854 09 15.5
	
Hello,

 The psychoacoustic model is described well in the standard, but it omits exactly
what is needed to give good encoder quality. So if you follow only the model
explained in the standard, you will end up with poor quality (it will seldom
cross 2 on the MOS scale for most of the SQAM items). Some places where the
standard doesn't say anything are:

1) On the window switching mechanism: it says that the PE should be greater than
an implementation-dependent constant but doesn't say what the value is. If you
make the value lower, the quality improves for some SQAM items, but it results
in unnecessary switching in some cases.

2) MPEG-4 has window shape switching as well. But the standard doesn't say when
to switch to KBD and when to stay with sine windows.

3) MPEG-4 has a short window grouping mechanism. But how to group, and on what
basis?

4) On which scale factor bands one should apply MS, IS, dual or PNS also decides
the quality, and the standard doesn't address this.

5) Compared to MP3, MPEG-4 has the ability to enable MS/IS on a per scale factor
band basis. How should this be done?

6) PNS does deteriorate the quality if it is chosen for anything other than
noise. But how to rate the content of a scale factor band as noise?
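
As a rough illustration of point 1, the sketch below computes a perceptual entropy of the form PE = -sum(width_b * log10(nb_b / (e_b + 1))), which is how the PE of the informative ISO psychoacoustic model is usually summarised (check it against your copy of the standard), and compares it with a switching constant. The function names and the value passed as pe_switch are hypothetical; the constant is exactly the implementation-dependent tuning parameter discussed above, so the number is only a placeholder.

```python
import math


def perceptual_entropy(energy, threshold, width):
    """Sum -width[b] * log10(threshold[b] / (energy[b] + 1)) over all partitions.

    energy, threshold and width are per-partition sequences of equal length;
    every threshold must be greater than zero.
    """
    pe = 0.0
    for e, nb, w in zip(energy, threshold, width):
        pe -= w * math.log10(nb / (e + 1.0))
    return pe


def choose_block_type(pe, pe_switch):
    """Request short blocks when PE exceeds the (assumed, tunable) constant."""
    return "short" if pe > pe_switch else "long"


if __name__ == "__main__":
    width = [4, 4, 8]                 # partition widths in spectral lines
    threshold = [100.0, 100.0, 100.0]  # masking thresholds (made-up values)
    quiet = [99.0, 99.0, 99.0]         # energy close to the threshold
    loud = [9999.0, 9999.0, 9999.0]    # energy far above the threshold
    for name, energy in (("quiet", quiet), ("loud", loud)):
        pe = perceptual_entropy(energy, threshold, width)
        print(name, pe, choose_block_type(pe, pe_switch=10.0))
```

Lowering pe_switch makes the encoder switch more eagerly, which matches the trade-off described in point 1: better handling of some items, unnecessary switching on others.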

 Encoder quality depends on all these factors, and the people who made the
standard will never spill the beans. If anyone has some input on this, please do
share. Yesterday I posted the same questions, but I think by mistake they went
only to Mr. Mihir. He did give some very useful pointers. Thanks very much,
Mihir. I am attaching his mail here for reference. Some good AES papers are
listed in it.

Cheers,
 Tony
	-----Original Message-----
From: Mody, Mihir [mailto:mihir@mihi...] 
Sent: Sunday, July 13, 2003 11:51 AM
To: Tony Francis
Subject: RE: MPEG-4 AAC LC Books

Hi Tony,

1) The paper is useful for understanding the principles of PNS. If you are really
interested in the implementation on the encoder side, please refer to the
following paper: D. Schulz, "Improving Audio Codecs by Noise Substitution",
Journal of the AES, Volume 44, Number 7/8, July/August 1996, pp. 593-598.
2) The following papers will be useful for making MS/IS switching decisions:
a) J. Herre, E. Eberlein and K. Brandenburg, "Combined Stereo Coding",
93rd AES Convention, San Francisco, 1992
b) Johnston and Ferreira, "Sum-Difference Stereo Transform Coding",
Proc. IEEE ICASSP (1992), pp. 569-571
c) J. Herre, K. Brandenburg and D. Lederer, "Intensity Stereo Coding",
96th AES Convention, Amsterdam, preprint 3799 (1994)
3) This is one way to do transient detection. If the psychoacoustic model is not
very good, then the PE calculation won't be good either. I would suggest trying
other methods, e.g. time-domain energy etc.
4) Yes, there is software named OPERA from the German company OPTICOM. It uses
this model to give a score for a given signal and is useful for encoder
development.
	One more thing I would like to add is that there is no single way of doing
things in any module of an audio encoder. An encoder needs a lot of tuning for a
given method, as well as for finding the right algorithm for the switching
module.
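
The time-domain energy method mentioned in point 3 can be sketched roughly as follows. This is a minimal illustration, not any particular encoder's detector: the frame is split into sub-blocks, and a transient is flagged when one sub-block's energy jumps well above the preceding one. The names detect_transient, n_sub, ratio and floor are hypothetical, and the default values are placeholders that would need tuning.

```python
import math


def sub_block_energies(frame, n_sub):
    """Energy of each of n_sub equal-length sub-blocks (extra samples are ignored)."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]


def detect_transient(frame, prev_energy, n_sub=8, ratio=10.0, floor=1e-9):
    """Return (is_transient, last_sub_block_energy).

    prev_energy is the energy of the last sub-block of the previous frame, so
    attacks at the very start of the frame are caught too. floor avoids
    comparing against zero after digital silence.
    """
    hit = False
    ref = prev_energy
    for e in sub_block_energies(frame, n_sub):
        if e > ratio * max(ref, floor):
            hit = True
        ref = e
    return hit, ref


if __name__ == "__main__":
    steady = [math.sin(2.0 * math.pi * n / 16.0) for n in range(1024)]
    attack = [0.01 * x for x in steady[:512]] + steady[512:]
    print(detect_transient(steady, prev_energy=64.0))
    print(detect_transient(attack, prev_energy=0.0064))
```

Because it does not depend on the psychoacoustic model at all, a detector of this kind keeps working even while the PE calculation is still being tuned.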

Regards,
Mihir
	-----Original Message-----
From: Tony Francis [mailto:Tony@Tony...]
Sent: Sunday, July 13, 2003 12:26 AM
To: Mody, Mihir
Subject: Re: MPEG-4 AAC LC Books
	Hello Mihir,

 I have some questions.

 1) How useful is the paper J. Herre, D. Schulz: "Extending the MPEG-4
AAC Codec by Perceptual Noise Substitution", 104th AES Convention,
Amsterdam 1998, Preprint 4720? Does it reveal anything related to the
implementation of PNS from an encoder point of view? How can you
select which scale factor bands to code using the PNS coding block?
To be specific, is there any information on how to classify the
content of a scale factor band as noise?

2) The same goes for IS coding. From MPEG-2 NBC on (compared to
MP3), IS/MS can be enabled per scale factor band. How do I decide
where to start IS coding from, and on what basis can I selectively
switch IS/MS on and off?

3) Switching from long to short blocks depends on the calculated PE.
But at what value should one switch to short blocks? The standard
describes it as an implementation constant. Is it also a variable? If
I make the value of the constant very low, I get good quality for
some audio test vectors, but when I do a single-tone analysis, it
gives a bad result.

4) Is anyone using ITU-R BS.1387 for subjective audio analysis?
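
To make question 2 concrete, here is a minimal sketch of a per-band M/S decision. The criterion is an assumption chosen for illustration and is not taken from the standard or from the papers cited in this thread: a band is coded as M/S when the weaker of the mid/side signals carries much less energy than the weaker of left/right, i.e. when the two channels are strongly correlated in that band. The names choose_ms_per_band, band_offsets and ratio are hypothetical, and ratio=0.1 is a placeholder.

```python
def band_energy(x):
    """Sum of squared spectral coefficients."""
    return sum(v * v for v in x)


def choose_ms_per_band(left, right, band_offsets, ratio=0.1):
    """Return one flag per scale factor band: True means code the band as M/S.

    left/right are spectral coefficients of the two channels; band_offsets
    lists the first line of each band plus one past the last line.
    """
    flags = []
    for lo, hi in zip(band_offsets[:-1], band_offsets[1:]):
        l, r = left[lo:hi], right[lo:hi]
        mid = [(a + b) / 2.0 for a, b in zip(l, r)]
        side = [(a - b) / 2.0 for a, b in zip(l, r)]
        weakest_lr = min(band_energy(l), band_energy(r))
        weakest_ms = min(band_energy(mid), band_energy(side))
        flags.append(weakest_ms < ratio * weakest_lr)
    return flags


if __name__ == "__main__":
    left = [1.0, 2.0, 3.0, 4.0, 1.0, 0.0, -1.0, 0.0]
    right = [1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 0.0, -1.0]
    # Band 0: identical channels. Band 1: same energy but uncorrelated.
    print(choose_ms_per_band(left, right, band_offsets=[0, 4, 8]))
```

A production encoder would more likely base this decision on masking thresholds or an estimated bit cost than on raw energies; the sketch only shows where a per-band switch sits.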

 Thanks in advance for any help.
cheers,
Tony
	
Hi,

I would differentiate between the psychoacoustic model itself
and the tools like TNS, LTP, PNS, Window Switching, Block
Switching, Stereo Tools, Prediction, Noiseless Coding etc.,
even if data from the psychoacoustic model can be used for
controlling some of them.

The most critical parts of an AAC (and MP3) encoder are the
psychoacoustic model and the quantizer. Once you have these
working fairly well, it is time to implement the tools to
enhance quality further.

The psychoacoustic models used by many of the available
encoders are based in principle on the ISO model. However, just
using the standard code will not result in good quality (and
that is not the intention of the code).

To get a feeling for where to tune, a look at some open source
projects can be helpful. The most prominent projects would be
LAME and FAAC.

Note that in the MPEG general audio standards the encoder is
never standardized, only its output bitstream. The part
describing the encoder is only informative, not normative. So
you are allowed to use completely different models to achieve
other/better results. In this way, quality improvements are
still possible after the standard has been published.
	Tony Francis wrote:
> Hello,
>
> The psychoacoustic model is described well in the standard, but it
> omits exactly what is needed to give good encoder quality. So if you
> follow only the model explained in the standard, you will end up with
> poor quality (it will seldom cross 2 on the MOS scale for most of the
> SQAM items). Some places where the standard doesn't say anything are:
>
> 1) On the window switching mechanism: it says that the PE should be
> greater than an implementation-dependent constant but doesn't say what
> the value is. If you make the value lower, the quality improves for
> some SQAM items, but it results in unnecessary switching in some cases.

I would use other criteria, not only the PE.

>
> 2) MPEG-4 has window shape switching as well. But the standard doesn't
> say when to switch to KBD and when to stay with sine windows.

That's indeed a little bit tricky. It will lead to quality
improvements in a few cases, but could lead to worse quality
in other cases.

>
> 3) MPEG-4 has a short window grouping mechanism. But how to group, and
> on what basis?
>
> 4) On which scale factor bands one should apply MS, IS, dual or PNS
> also decides the quality, and the standard doesn't address this.
>
> 5) Compared to MP3, MPEG-4 has the ability to enable MS/IS on a per
> scale factor band basis. How should this be done?
>
> 6) PNS does deteriorate the quality if it is chosen for anything other
> than noise. But how to rate the content of a scale factor band as noise?

Prediction-based approaches work well in most cases.
You already have a (very trivial) tonality measure when
calculating the unpredictability.
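
A rough sketch of that unpredictability measure, in the form used by the informative ISO psychoacoustic model: magnitude and phase of each spectral line are extrapolated linearly from the two previous blocks, and the prediction error is normalised to the range 0..1. Values near 0 indicate tonal content, values near 1 noise-like content. The helper is_noise_like and its threshold of 0.5 are assumptions added for illustration, not something given in the standard or in this thread.

```python
import cmath


def unpredictability(cur, prev1, prev2, eps=1e-12):
    """Per-line unpredictability from three consecutive complex spectra.

    cur is the current block, prev1 the previous one, prev2 the one before.
    """
    out = []
    for x, x1, x2 in zip(cur, prev1, prev2):
        r_pred = 2.0 * abs(x1) - abs(x2)                    # predicted magnitude
        f_pred = 2.0 * cmath.phase(x1) - cmath.phase(x2)    # predicted phase
        x_pred = cmath.rect(r_pred, f_pred)
        out.append(abs(x - x_pred) / (abs(x) + abs(r_pred) + eps))
    return out


def is_noise_like(c_band, threshold=0.5):
    """Hypothetical PNS criterion: mean unpredictability of the band is high."""
    return sum(c_band) / len(c_band) > threshold


if __name__ == "__main__":
    # Line 0: steady tone with constant phase advance. Line 1: erratic values.
    cur = [cmath.rect(1.0, 0.9), cmath.rect(1.0, 2.0)]
    prev1 = [cmath.rect(1.0, 0.6), cmath.rect(0.2, -1.0)]
    prev2 = [cmath.rect(1.0, 0.3), cmath.rect(1.5, 0.5)]
    print(unpredictability(cur, prev1, prev2))
```

In practice the per-line values would be energy-weighted and averaged per scale factor band before any PNS decision, and the decision would be checked by listening, since PNS on tonal content is clearly audible.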

>
> Encoder quality depends on all these factors, and the people who made
> the standard will never spill the beans. If anyone has some input on
> this, please do share. Yesterday I posted the same questions, but I
> think by mistake they went only to Mr. Mihir. He did give some very
> useful pointers. Thanks very much, Mihir. I am attaching his mail here
> for reference. Some good AES papers are listed in it.
>

Right :-)
This is why there are not so many AAC encoders on the market.
But it wouldn't be fun if all was standardized, would it?

ITU-R BS.3187 can be useful in codec development to see audio
examples with bad quality and evaluate recent code changes
with a larger set of the sample database. However, I would not
overestimate such objective tools.

Best regards,
Alexander

	
Hello,
 I do agree that the critical parts of an encoder are the psychoacoustic model
and the quantizer. But how well they work is something that shows up when tuning
for a particular platform. I have both working well but am not getting the
desired quality because of the areas I mentioned earlier.

 You pointed out that you would use other criteria in addition to the PE. Could
you please elaborate a bit? (The same goes for window shape as well :-)

 In fact, not everything can be standardised, at least on the encoder side.
After all, it is an effort worth years :-)

 And I think what you meant is ITU-R BS.1387 instead of ITU-R BS.3187.

 Thanks a lot for the informative reply.
Regards,
Tony Francis

	
Hi Tony,

Tony Francis wrote:
> Hello,
> I do agree that the critical parts of an encoder are the psychoacoustic
> model and the quantizer. But how well they work is something that shows
> up when tuning for a particular platform. I have both working well but am
> not getting the desired quality because of the areas I mentioned earlier.
>
> You pointed out that you would use other criteria in addition to the PE.
> Could you please elaborate a bit? (The same goes for window shape as well :-)


Well, I am not going to tell you our secrets :-). But since
the main reason for block switching is to avoid pre-echoes, and
pre-echoes are most noticeable in the case of loud transients,
some form of transient detection could be helpful.

I would not overestimate the influence of window shape
switching on the quality.

> In fact, not everything can be standardised, at least on the encoder
> side. After all, it is an effort worth years :-)
>
> And I think what you meant is ITU-R BS.1387 instead of ITU-R BS.3187.

Right, sorry for that typo.

Best regards,
Alexander Lerch
