Harmonic Noise Shaping Filter

Started by BhanuPrakash July 18, 2003
Hi Group,

What is a harmonic noise shaping filter? What does it actually do?

I came across the following statement in one speech codec: "In order to
improve the quality of the encoded speech, a harmonic noise shaping
filter is constructed."

How does this improve the quality?

TIA,
BP$



All,

I have the same question. Several months ago, I fixed a long-standing bug
in the harmonic noise shaping filter function in our G.723.1 coder. I thought I
had improved the performance, so I asked the QA guys to test the fixed version
against the older one. The measured PSQM score didn't show any improvement, and
listening directly to the decoded voice doesn't reveal a difference either.

I would also appreciate it if somebody could explain more about the harmonic
noise shaping filter.

Lijun

_____________________________________
Note: If you do a simple "reply" with your email client, only the author of this
message will receive your answer. You need to do a "reply all" if you want your
answer to be distributed to the entire group.

_____________________________________
About this discussion group:

Archives: http://www.yahoogroups.com/group/speechcoding

Other DSP-Related Groups: http://www.dsprelated.com




Hi all!

The harmonic noise shaping filter and the formant perceptual weighting
filter are both based on the same principle: they try to minimise the noise
in the "spectral peaks". (The spectral peaks can be formant peaks as well as
pitch harmonic peaks.)

The motivation for using these filters comes from the masking property
of the ear: if the noise level is below a particular threshold determined by
the energy of the speech signal, the ear cannot perceive it, because the
signal tends to "mask" it.

The overall masking threshold for a given speech segment follows the peaks
and valleys of the speech spectrum. If a speech coder can push the noise
below the masking threshold function at all frequencies, the coded speech
will be perceptually noise-free. However, at low bit rates it is difficult
to push the noise below the threshold in both the "peaks" and the "valleys"
of the speech spectrum.

So the strategy adopted is to preserve the spectral peaks and sacrifice the
valleys. In other words, during encoding, noise spectral shaping is done in
such a way that the noise components around the spectral peaks are below the
masking threshold while the noise components in the valley regions are not.

Hence, a "harmonic noise shaping filter" would attenuate noise at the pitch
harmonic PEAKS (and the "formant perceptual weighting filter" would
attenuate noise at the formant PEAKS).

In doing so, the noise in the "valleys" may exceed the threshold, and hence
most of the perceived noise comes from the spectral valleys, including the
valleys between pitch harmonics. This noise is taken care of by the
"postfilter" at the decoder end. The postfilter simply attenuates the
frequency components between the pitch harmonics and between the formants,
which contain the unwanted noise. (These are better known as the "long-term"
and "short-term" postfilters respectively.)
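
To make the long-term postfilter idea a bit more concrete, here is a minimal Python sketch. It assumes the simplest possible comb, H(z) = (1 + g z^-L) / (1 + g). The gain g, the lag L, the 8 kHz sampling rate and the function name are illustrative assumptions, not values taken from any standard.

```python
import numpy as np

def comb_postfilter_gain(freq_hz, lag, g, fs=8000.0):
    """Magnitude of the assumed comb H(z) = (1 + g z^-lag) / (1 + g) at freq_hz."""
    w = 2.0 * np.pi * freq_hz / fs          # radians per sample
    return abs(1.0 + g * np.exp(-1j * w * lag)) / (1.0 + g)

fs = 8000.0
lag = 80          # hypothetical pitch lag in samples (100 Hz pitch at 8 kHz)
g = 0.5           # hypothetical postfilter gain
f0 = fs / lag     # pitch frequency in Hz

harmonic_gain = comb_postfilter_gain(3.0 * f0, lag, g, fs)  # on the 3rd harmonic
valley_gain = comb_postfilter_gain(3.5 * f0, lag, g, fs)    # midway to the 4th

print(f"gain on a harmonic:     {harmonic_gain:.3f}")
print(f"gain between harmonics: {valley_gain:.3f}")
```

On a harmonic the gain is 1, and midway between harmonics it drops to (1 - g) / (1 + g), so whatever sits between the pitch harmonics is attenuated while the harmonics pass through.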

So ultimately you get more or less acceptable quantisation noise in the
spectral peaks as well as in the valleys.

Hope that explains the concept behind the HNS (and FPW) filters. For more
details, please go through this wonderful paper:
http://scl.ece.ucsb.edu/pubs/pubs_E/e95_1.pdf

best regards,
Sameer.



Hi,

I would like to add a bit to the already good explanation from Sameer.

To reiterate what Sameer said:

The strategy adopted is to preserve the spectral peaks and sacrifice the
valleys. In other words, during encoding, noise spectral shaping is done in such
a way that the noise components around the spectral peaks are below the masking
threshold while the noise components in the valley regions are not.

The reason for this is that the basic LPC difference equation has no zeros
(remember that we assume the order of the denominator, LPCORDER, is high
compared to that of the numerator). It is an all-pole filter.

That's why the LPC filter can represent peaks more closely than valleys.
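
One way to see the peaks-versus-valleys point is a small numerical sketch. It assumes a single hypothetical resonance, i.e. a second-order all-pole section 1/A(z) with a pole pair at radius r and angle theta. Both values are made up for illustration and are not taken from any codec.

```python
import numpy as np

r, theta = 0.95, np.pi / 4            # hypothetical pole radius and angle
w = np.linspace(0.0, np.pi, 2048)     # frequency grid, 0..pi rad/sample
zinv = np.exp(-1j * w)

# A(z) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2, so 1/A(z) has poles at r*exp(+-j*theta)
A = 1.0 - 2.0 * r * np.cos(theta) * zinv + (r ** 2) * zinv ** 2

allpole_db = -20.0 * np.log10(np.abs(A))      # |1/A| in dB
peak_db = allpole_db.max()                    # height of the resonance peak
floor_db = allpole_db.min()                   # deepest point of the response

# On the unit circle |A| can never exceed (1 + r)^2, which bounds the valley depth
bound_db = -20.0 * np.log10((1.0 + r) ** 2)

print(f"peak: {peak_db:.1f} dB, deepest valley: {floor_db:.1f} dB, bound: {bound_db:.1f} dB")
```

The peak rises well above 20 dB as the poles approach the unit circle, while the deepest point of this section can never fall below about -11.6 dB. Poles can build sharp peaks but only shallow valleys, which is why an all-pole model fits the peaks better.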

Cheers,

Arvind




Hi Arvind,

Thanks a lot for the follow-up!
But I am reluctant to say that the harmonic noise shaping (HNS) and formant
perceptual weighting (FPW) filters have much to do with the LPC filter. The
FPW filter is certainly derived from the LPC filter, but it contains both
poles and zeros. Similarly, the transfer function of the HNS filter
contains both poles and zeros.
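
As a rough sketch of how an FPW filter ends up with both poles and zeros, the snippet below builds W(z) = A(z/g1) / A(z/g2) from a made-up second-order A(z). The form of W(z) and the values g1 = 0.9, g2 = 0.5 are the ones usually quoted for G.723.1, but treat them as assumptions and check the standard. The test resonance and the helper names are hypothetical.

```python
import numpy as np

def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = sum_k a[k] z^-k with a[0] = 1."""
    a = np.asarray(a, dtype=float)
    return a * gamma ** np.arange(len(a))

def freq_response(b, a, w):
    """Evaluate B(z)/A(z) at z = exp(jw); b and a hold coefficients of z^-k."""
    zinv = np.exp(-1j * w)
    return np.polyval(b[::-1], zinv) / np.polyval(a[::-1], zinv)

r, theta = 0.95, np.pi / 4                      # hypothetical formant resonance
a = [1.0, -2.0 * r * np.cos(theta), r * r]      # A(z) with roots at r*exp(+-j*theta)

gamma1, gamma2 = 0.9, 0.5                       # assumed weighting factors
num = bandwidth_expand(a, gamma1)               # numerator   A(z/gamma1): the zeros
den = bandwidth_expand(a, gamma2)               # denominator A(z/gamma2): the poles

zeros = np.roots(num)                           # radius gamma1 * r
poles = np.roots(den)                           # radius gamma2 * r

gain_at_formant = abs(freq_response(num, den, theta))
gain_far_away = abs(freq_response(num, den, np.pi))

print("zero radii:", np.abs(zeros), "pole radii:", np.abs(poles))
print(f"|W| at the formant: {gain_at_formant:.3f}, far from it: {gain_far_away:.3f}")
```

The zeros sit at the formant angle with radius g1 * r and the poles at the same angle with radius g2 * r. Because the zeros are closer to the unit circle than the poles, the response of W(z) dips at the formant frequency and rises away from it.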

Anyway, to borrow the explanation from the paper by Chen and Gersho: "in most
cases, lowering noise components at certain frequencies can only be achieved
at the price of increased noise components at other frequencies. Therefore at
very low encoding rates when the average level of coding noise is quite
high, it is very difficult, if not impossible, to force noise below the
threshold at all frequencies. The situation is similar to stepping on a
balloon: when we use noise spectral shaping to reduce the noise components in
the spectral valley regions, the noise components near formants will exceed
the threshold; on the other hand, if we reduce the noise near formants, the
noise in valley regions will exceed the threshold."

Effectively, that is the difficulty involved in the filter design, and that's
why the strategy is to preserve the peaks and sacrifice the valleys.
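
The balloon picture can be stated as a formula under one assumption: the noise shaping filter W(z) is monic (leading coefficient 1) with all of its zeros z_i and poles p_k strictly inside the unit circle. The symbols W, z_i and p_k are introduced here for illustration only. In that case the log-magnitude response averages to zero over frequency:

```latex
W(z) = \frac{\prod_i \left(1 - z_i z^{-1}\right)}{\prod_k \left(1 - p_k z^{-1}\right)},
\qquad |z_i| < 1,\ |p_k| < 1
\quad\Longrightarrow\quad
\frac{1}{2\pi} \int_{-\pi}^{\pi} \ln \left| W(e^{j\omega}) \right| \, d\omega = 0 .
```

So on a dB scale, any area by which such a filter pushes the response down at some frequencies has to reappear as an equal area pushed up at other frequencies. Shaping moves the noise around; it does not lower its average level.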
Sameer


Hi Ilya,

I checked the G.723.1 standard just now. What you say about the HNS filter
being a classical LTP is indeed right.

It is interesting to note that the simple LTP can also serve as the harmonic
noise shaper. After a little thought, however, this appears logical, and I
thought I should share this small explanation with all.

It is well understood that long-term periodicity in the time domain (the
"pitch") manifests itself as the pitch harmonic peaks in the frequency domain.
The job of the LTP is to remove this long-term periodicity (the long-term
"redundancy").
So once you pass the input speech through the LTP, what is left is the speech
without any pitch component. In the frequency domain, this is reflected as
"removal of the pitch harmonics", i.e. the spectrum tends to get
"flattened".

In other words, the frequency response of the LTP is the inverse of the
harmonic structure of the input speech spectrum. (If you want the spectrum to
get flattened, it is obvious that your filter's response should have valleys
where the input signal has peaks and vice versa.)

Now let us look at what we want from the harmonic noise shaper: its main
function is to attenuate noise in the spectral peaks. So its frequency response
should have (a) valleys where the speech spectrum has harmonic peaks and
(b) peaks at the valleys between the speech harmonic peaks.

That is nothing but the frequency response of the LTP!
Thus the LTP can indeed be used as an HNS filter.
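
The shape described above can be checked numerically with a minimal Python sketch. It assumes the single-tap form P(z) = 1 - beta * z^-L for the LTP/HNS filter. The lag L, the gain beta, the 8 kHz sampling rate and the function name are illustrative assumptions rather than values from the standard.

```python
import numpy as np

def hns_gain(freq_hz, lag, beta, fs=8000.0):
    """Magnitude of the assumed single-tap filter P(z) = 1 - beta * z^-lag."""
    w = 2.0 * np.pi * freq_hz / fs          # radians per sample
    return abs(1.0 - beta * np.exp(-1j * w * lag))

fs = 8000.0
lag = 40          # hypothetical pitch lag in samples (200 Hz pitch at 8 kHz)
beta = 0.3        # hypothetical filter gain, 0 < beta < 1
f0 = fs / lag     # pitch frequency in Hz

gain_on_harmonic = hns_gain(2.0 * f0, lag, beta, fs)   # equals 1 - beta
gain_between = hns_gain(2.5 * f0, lag, beta, fs)       # equals 1 + beta

print(f"|P| on a pitch harmonic:  {gain_on_harmonic:.3f}")
print(f"|P| between two harmonics: {gain_between:.3f}")
```

The response is 1 - beta at every multiple of the pitch frequency and 1 + beta midway between them, i.e. valleys on the harmonics and peaks in between, which is the comb shape asked for in (a) and (b).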

As far as the confusion between the HNS filter and the adaptive codebook is
concerned, the HNS filter in G.723.1 is a simple "first order" filter. The
adaptive codebook, on the other hand, uses a fifth-order filter for better
prediction. So the two are different. (Probably the first-order filter is
quite sufficient for the purpose of speech enhancement using HNS.)

regards,
Sameer.

-----Original Message-----
From: Ilya Druker [mailto:]
Sent: Sunday, August 17, 2003 3:07 PM
To:
Subject: [speechcoding] Re: Harmonic Noise shaping Filter

"Harmonic noise shaping filter" is just a beautiful name for the classic
long-term prediction (LTP) filter. But the LTP filter is a special case of
the adaptive codebook. In G.723.1 the harmonic noise shaping filter is
followed by the adaptive codebook in the analysis-by-synthesis quantization of
subframes. What I cannot understand is why G.723.1 needs BOTH harmonic
noise shaping AND an adaptive codebook. Isn't that redundant?

Thanks, Ilya Druker
