Hi group, what is a harmonic noise shaping filter, and what does it actually do? I came across the following statement in a description of a speech codec: "In order to improve the quality of the encoded speech, a harmonic noise shaping filter is constructed." How does this improve the quality? TIA, BP$
Harmonic Noise shaping Filter
Started by ●July 18, 2003
Reply by ●August 11, 2003
All, I have the same question. Several months ago, I fixed a long-standing bug in the harmonic noise shaping filter function in our G.723.1 coder. I thought I had improved the performance, so I asked the QA team to test the fixed version against the older one. The measured PSQM score didn't show any improvement, and listening directly to the decoded speech didn't reveal a difference either. I would also appreciate it if somebody could explain more about the harmonic noise shaping filter. Lijun
Reply by ●August 12, 2003
Hi all!

The harmonic noise shaping (HNS) filter and the formant perceptual weighting (FPW) filter are both based on the same principle: they try to minimise the noise in the spectral peaks. (The spectral peaks can be formant peaks as well as pitch harmonic peaks.)

The motivation for these filters comes from the masking property of the ear: if the noise level is below a certain threshold, which depends on the energy of the speech signal, the ear cannot perceive the noise because the signal tends to mask it.

The overall masking threshold for a given speech segment follows the peaks and valleys of the speech spectrum. If a speech coder can push the noise below the masking threshold at all frequencies, the coded speech will be perceptually noise-free. At low bit rates, however, it is difficult to push the noise below the threshold in both the peaks and the valleys of the speech spectrum.

So the strategy adopted is to preserve the spectral peaks and sacrifice the valleys. In other words, during encoding, noise spectral shaping is done in such a way that the noise components around the spectral peaks are below the masking threshold while the noise components in the valley regions are not.

Hence, a harmonic noise shaping filter attenuates noise at the pitch harmonic peaks, and the formant perceptual weighting filter attenuates noise at the formant peaks.

In doing so, the noise in the valleys may exceed the threshold, so most of the perceived noise comes from the spectral valleys, including the valleys between pitch harmonics. This noise is taken care of by the postfilter at the decoder. The postfilter simply attenuates the frequency components between the pitch harmonics and between the formants, which contain the unwanted noise. (These stages are better known as the "long term" and "short term" postfilters, respectively.)

So ultimately you get more or less acceptable quantisation noise in the spectral peaks as well as in the valleys.
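The long-term postfilter idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple feed-forward comb, y[n] = (x[n] + g*x[n-L]) / (1 + g), which is not taken from any particular standard, and the names comb_postfilter, lag and gain as well as the numeric values are hypothetical.

```python
import numpy as np

def comb_postfilter(x, lag, gain):
    """Assumed comb: y[n] = (x[n] + gain * x[n - lag]) / (1 + gain), x[n] = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    delayed = np.concatenate([np.zeros(lag), x[:-lag]])
    return (x + gain * delayed) / (1.0 + gain)

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

fs = 8000.0   # sampling rate in Hz (illustrative)
lag = 40      # pitch lag in samples, i.e. a 200 Hz pitch at 8 kHz
gain = 0.5    # comb gain (illustrative)
n = np.arange(800)

on_harmonic = np.sin(2.0 * np.pi * 200.0 * n / fs)  # tone on a pitch harmonic
between = np.sin(2.0 * np.pi * 300.0 * n / fs)      # tone midway between harmonics

# Skip the first `lag` samples so only the steady state is measured.
ratio_on = rms(comb_postfilter(on_harmonic, lag, gain)[lag:]) / rms(on_harmonic[lag:])
ratio_between = rms(comb_postfilter(between, lag, gain)[lag:]) / rms(between[lag:])

print(f"gain on harmonic:      {ratio_on:.3f}")
print(f"gain between harmonics: {ratio_between:.3f}")
```

With these values the tone on a harmonic passes unchanged, while the tone midway between two harmonics is scaled by (1 - g) / (1 + g) = 1/3, which is the "attenuate between the pitch harmonics" behaviour.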
Hope that explains the concept behind the HNS (and FPW) filters. For more details, please go through this wonderful paper: http://scl.ece.ucsb.edu/pubs/pubs_E/e95_1.pdf

Best regards, Sameer.
Reply by ●August 13, 2003
Hi, I would like to add a bit to the already good explanation from Sameer. To reiterate what Sameer said: the strategy adopted is to preserve the spectral peaks and sacrifice the valleys. In other words, during encoding, noise spectral shaping is done in such a way that the noise components around the spectral peaks are below the masking threshold while the noise components in the valley regions are not.

The reason for this is that the basic LPC difference equation has no zeros (remember, we assume the order of the denominator, LPCORDER, is high compared to that of the numerator); it is an all-pole filter. That is why the LPC filter can represent peaks more closely than valleys.

Cheers, Arvind
Reply by ●August 13, 2003
Hi Arvind, thanks a lot for the follow-up! But I am reluctant to say that the harmonic noise shaping (HNS) and formant perceptual weighting (FPW) filters have much to do with the LPC filter. The FPW filter is certainly derived from the LPC filter, but it contains zeros as well as poles. Similarly, the transfer function of the HNS filter contains both poles and zeros.

Anyway, to borrow the explanation from the paper by Chen and Gersho: "in most cases, lowering noise components at certain frequencies can only be achieved at the price of increased noise components at other frequencies. Therefore at very low encoding rates when the average level of coding noise is quite high, it is very difficult, if not impossible, to force noise below the threshold at all frequencies. The situation is similar to stepping on a balloon: when we use noise spectral shaping to reduce the noise components in the spectral valley regions, the noise components near formants will exceed the threshold; on the other hand, if we reduce the noise near formants, the noise in valley regions will exceed the threshold"

Effectively, that is the difficulty involved in the filter design, and that is why the strategy is to preserve the peaks and sacrifice the valleys.

Sameer
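To illustrate the point above that the FPW filter has zeros as well as poles, here is a minimal Python sketch. It assumes the commonly used form W(z) = A(z/gamma1) / A(z/gamma2), which is not spelled out in this thread; the second-order A(z), the values gamma1 = 0.9 and gamma2 = 0.5, and the helper name bandwidth_expand are all illustrative assumptions, and sign conventions for A(z) vary between codec descriptions.

```python
import numpy as np

def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = sum_k a[k] * z^-k with a[0] = 1."""
    a = np.asarray(a, dtype=float)
    return a * gamma ** np.arange(len(a))

# Hypothetical 2nd-order A(z) whose roots sit at radius 0.95 (one sharp resonance).
r, theta = 0.95, np.pi / 4
a = np.array([1.0, -2.0 * r * np.cos(theta), r * r])

gamma1, gamma2 = 0.9, 0.5              # illustrative weighting factors
num = bandwidth_expand(a, gamma1)      # numerator A(z/gamma1): gives the zeros of W(z)
den = bandwidth_expand(a, gamma2)      # denominator A(z/gamma2): gives the poles of W(z)

zeros = np.roots(num)
poles = np.roots(den)

print("zero radii:", np.abs(zeros))    # roots of A(z) pulled in by gamma1
print("pole radii:", np.abs(poles))    # roots of A(z) pulled in by gamma2
```

Under these assumptions the zeros and poles of W(z) lie at the same angles as the roots of A(z), scaled towards the origin by gamma1 and gamma2 respectively, so the weighting filter is derived from the LPC polynomial without being all-pole.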
Reply by ●August 18, 2003
Hi Ilya, I just checked the G.723.1 standard, and what you say about the HNS filter being a classical LTP is indeed right. It is interesting to note that the simple LTP can also serve as the harmonic noise shaper. On a little reflection this appears logical, and I thought I should share this small explanation with everyone.

It is well understood that long-term periodicity in the time domain (the "pitch") manifests itself as pitch harmonic peaks in the frequency domain. The job of the LTP is to remove this long-term periodicity (the long-term "redundancy"). So once you pass the input speech through the LTP, what is left is the speech without its pitch component. In the frequency domain, this is reflected as removal of the pitch harmonics, i.e. the spectrum tends to get flattened. In other words, the frequency response of the LTP is roughly the inverse of the harmonic structure of the input speech spectrum. (If you want the spectrum to be flattened, your filter's response must have valleys where the input signal has peaks, and vice versa.)

Now let us look at what we want from the harmonic noise shaper: its main function is to attenuate noise in the spectral peaks. So its frequency response should have (a) valleys where the speech spectrum has harmonic peaks and (b) peaks at the valleys between the speech harmonic peaks. That is nothing but the frequency response of the LTP! Thus the LTP can indeed be used as an HNS filter.

As far as the confusion between the HNS filter and the adaptive codebook is concerned, the HNS filter in G.723.1 is a simple first-order filter. The adaptive codebook, on the other hand, uses a fifth-order filter for better prediction. So the two are different. (Probably the first-order filter is quite sufficient for the purpose of speech enhancement using HNS.)

Regards, Sameer.
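The argument above can be checked numerically. The following is a minimal Python sketch, assuming a first-order filter of the form P(z) = 1 - beta * z^-L; the function name ltp_magnitude and the values of lag, beta and fs are illustrative assumptions, not taken from the standard.

```python
import numpy as np

def ltp_magnitude(freq_hz, lag, beta, fs=8000.0):
    """Magnitude response of the assumed filter P(z) = 1 - beta * z^-lag."""
    w = 2.0 * np.pi * np.asarray(freq_hz, dtype=float) / fs
    return np.abs(1.0 - beta * np.exp(-1j * w * lag))

fs = 8000.0   # sampling rate in Hz (illustrative)
lag = 40      # pitch lag in samples, i.e. pitch harmonics at multiples of 200 Hz
beta = 0.5    # filter gain (illustrative)

freqs = np.arange(0.0, 1001.0, 50.0)   # evaluate every 50 Hz up to 1 kHz
mag = ltp_magnitude(freqs, lag, beta, fs)

valleys = freqs[np.isclose(mag, mag.min())]   # where the response is lowest
peaks = freqs[np.isclose(mag, mag.max())]     # where the response is highest

print("valleys at (Hz):", valleys)
print("peaks at (Hz):  ", peaks)
```

On this grid the valleys fall on the multiples of 200 Hz (the pitch harmonics), where the magnitude is 1 - beta, and the peaks fall midway between them, where it is 1 + beta.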
-----Original Message----- From: Ilya Druker Sent: Sunday, August 17, 2003 3:07 PM Subject: [speechcoding] Re: Harmonic Noise shaping Filter

The Harmonic Noise Shaping filter is just a beautiful name for the classic Long-Term Prediction (LTP) filter. But the LTP filter is a special case of the adaptive codebook. In G.723.1, the Harmonic Noise Shaping filter is followed by the adaptive codebook in the analysis-by-synthesis quantization of subframes. What I cannot understand is why G.723.1 needs BOTH Harmonic Noise Shaping AND the adaptive codebook. Isn't that redundant?

Thanks, Ilya Druker