
Pitch shifting question

Started by H December 23, 2005
Vladimir Vassilevsky wrote:
> >>> If you want to shift just the pitch of speech signals, while
> >>> maintaining all the rest unchanged (i.e. time, speed, formants etc.),
> >>> then you need to decompose speech into LPC & excitation, apply pitch
> >>> shifting to the excitation, and synthesize back to generate the new
> >>> speech.
> >>>
> >> This applies to speech only.
> >
> > Not really, it applies to any signal that exhibits periodicity (and
> > spectral envelope), and provides the capability of time-varying tracking
> > of the signal's characteristics.
>
> What if there is more than one source of periodicity, with different
> periods? What if no clear periodicity can be derived?
> LPC + pitch assumes the structure of human speech.
>
> >> The more general method to make the pitch or the speed change is by
> >> the use of a filterbank. The signal is extrapolated by repetition or
> >> truncated separately in each subband.
> >
> > That method is not "more general"; rather the reverse. Unlike the
> > proposed method of LPC+excitation decomposition, the suggested
> > filter-bank method "discretizes" the spectrum into fixed bands and is
> > not optimized to the time-varying characteristics of the input signal.
>
> Once there is no clear periodicity, then the only way to make the signal
> "longer" or "shorter" is to extrapolate it by continuation of the
> subframe (in each band) or truncate it. The subband processing
> moderates the edge effects and prevents the spillage of the
> interpolation artifacts into the different subbands.
However, if there were multiple exciter sources with differing periods and envelopes, wouldn't mixing together overtones that happen to land in the same subband but come from different exciters produce some time-domain spillage and artifacts instead? IMHO. YMMV.

-- rhn A.T nicholson d.O.t C-o-M
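For concreteness, the LPC-plus-excitation idea quoted above might be sketched in Python roughly as follows. This is only my reading of it, not Vladimir's implementation: the function names, the frame length and LPC order, the absence of analysis windowing and overlap, and the omission of any duration correction are all simplifications of mine (NumPy/SciPy assumed).

import numpy as np
from scipy.signal import lfilter, resample

def lpc_coeffs(frame, order):
    # Autocorrelation-method LPC via the Yule-Walker equations.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))          # A(z) = 1 - sum_k a_k z^-k

def lpc_pitch_shift(x, ratio, order=16, frame_len=512):
    # Shift pitch by `ratio` (>1 raises it).  Per frame: estimate the
    # spectral-envelope filter A(z), whiten the frame to get the excitation,
    # resample the excitation (this is the pitch shift), then re-apply
    # 1/A(z) so the formants stay roughly where they were.  Note the output
    # is shorter or longer by `ratio`; a real system would follow with a
    # time-scale modification step on the excitation to keep the duration.
    out = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len].astype(float)
        a = lpc_coeffs(frame, order)
        excitation = lfilter(a, [1.0], frame)               # prediction residual
        shifted = resample(excitation, int(round(frame_len / ratio)))
        out.append(lfilter([1.0], a, shifted))              # re-impose envelope
    return np.concatenate(out) if out else x.astype(float)

Called as, say, lpc_pitch_shift(x, 2 ** (3 / 12.0)), it would move the excitation up three semitones while leaving the LPC envelope, and hence the formants, roughly in place; the output is correspondingly shorter, which is where the time-scale correction would have to come in.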
>>>> If you want to shift just the pitch of speech signals, while
>>>> maintaining all the rest unchanged (i.e. time, speed, formants
>>>> etc.), then you need to decompose speech into LPC & excitation,
>>>> apply pitch shifting to the excitation, and synthesize back to
>>>> generate the new speech.
>>>>
>>> This applies to speech only.
>>
>> Not really, it applies to any signal that exhibits periodicity (and
>> spectral envelope), and provides the capability of time-varying
>> tracking of the signal's characteristics.
>
> What if there is more than one source of periodicity, with different
> periods? What if no clear periodicity can be derived?
> LPC + pitch assumes the structure of human speech.
>
>>> The more general method to make the pitch or the speed change is by
>>> the use of a filterbank. The signal is extrapolated by repetition or
>>> truncated separately in each subband.
>>
>> That method is not "more general"; rather the reverse. Unlike the
>> proposed method of LPC+excitation decomposition, the suggested
>> filter-bank method "discretizes" the spectrum into fixed bands and is
>> not optimized to the time-varying characteristics of the input signal.
>
> Once there is no clear periodicity,
If there's no periodicity, or at least pseudo-periodicity, then there's no "pitch", right? And the question was about pitch shifting, wasn't it?
> then the only way to make the signal
Don't be so sure about "the only way"; there are several ways of doing that. It may very well be the only way you happened to know...
> "longer" or "shorter" is to extrapolate it by continuation of the > subframe (in the each band) or truncate it. The subband processing > moderates the edge effects and prevents the spillage of the > interpolation artifacts into the different subbands.
Not really; one may take advantage of the temporal characteristics of the signal, and the corresponding perceptual properties of the auditory system. Subband processing is just one way, and certainly not the best way, of doing that.

NS
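For comparison, the subband repeat/truncate scheme being debated could be sketched along the following lines, again only as my own rough reading of it: the log-spaced band edges, the Butterworth filters, the subframe and hop sizes, and the plain overlap-add stretch per band are illustrative assumptions, not anything the posters specified (NumPy/SciPy assumed).

import numpy as np
from scipy.signal import butter, sosfiltfilt

def ola_stretch(band, stretch, frame_len=1024):
    # Lengthen (stretch > 1) or shorten one subband by overlap-adding
    # windowed subframes at a wider or narrower synthesis hop.
    ha = frame_len // 2                          # analysis hop
    hs = int(round(ha * stretch))                # synthesis hop
    win = np.hanning(frame_len)
    n_frames = 1 + max(0, (len(band) - frame_len) // ha)
    out = np.zeros(hs * (n_frames - 1) + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = band[i * ha:i * ha + frame_len]
        seg = np.pad(seg, (0, frame_len - len(seg))) * win
        out[i * hs:i * hs + frame_len] += seg
        norm[i * hs:i * hs + frame_len] += win
    return out / np.where(norm > 1e-3, norm, 1.0)   # undo window-sum ripple

def subband_time_stretch(x, fs, stretch=1.5, n_bands=6):
    # Split into log-spaced bands, stretch each band independently, and sum.
    # Confining the repetition artifacts to their own band is the point
    # argued for above; whether that beats other methods is the debate.
    edges = np.geomspace(60.0, 0.45 * fs, n_bands + 1)
    y = None
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x.astype(float))
        stretched = ola_stretch(band, stretch)
        y = stretched if y is None else y + stretched
    return y

The contested point is visible here: stretching each band independently keeps the extrapolation artifacts confined to that band, but it does nothing to separate components from different sources that happen to share a band, which is exactly rhn's objection above.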