DSPRelated.com
Forums

Logarithmic Interpolation

Started by Michel Rouzic April 14, 2006
Rune Allnor wrote:
> Michel Rouzic wrote:
> > Rune Allnor wrote:
> > > Michel Rouzic wrote:
> > > > Martin Eisenberg wrote:
> > > > > Michel Rouzic wrote:
> > > > > > I didn't wanna talk about this too much, because I'd rather realize by myself if it's a dumb idea than be told, but my plan is to *try* using that to perform pitch shift/time stretch.
> > > > > >
> > > > > > My basic idea is: get the FFT of the signal you want to shift, interpolate the FFT logarithmically, shift the contents towards the left or the right depending on what you want to obtain, interpolate back to linear form, IFFT. Of course, I'm definitely not sure and quite sceptical about the odds of obtaining the wanted result, but I think it's worth trying.
> > > > >
> > > > > Since adding to log frequency is the same as multiplying linear frequency, you can get the same thing with just one interpolator, used to resample the scaled spectrum.
> > > >
> > > > Oh yeah, that's right, but if I do that, wouldn't the result be just like interpolating the signal in the time domain?
> > >
> > > Nope. Interpolation (i.e. sampling the spectrum on a denser grid) corresponds to elongating the duration in the time domain.
> >
> > What do you mean by elongating the duration? Do you mean making it longer while keeping it at the same frequencies, or not?
>
> If you (in linear scale) interpolate to find new spectrum coefficients at half the bin widths, you ought to get your original signal sampled at the original frequency, but zero-padded to twice the length/number of samples.
>
> > > You indicate above that this is an academic exercise, and as such it makes perfect sense. However, I think you are doing things a bit too complicated. Why not check the properties of spectrum interpolation etc. in linear scale first, and switch to logarithmic after you've got some more experience?
> >
> > Sure, but in the first place I had some trouble seeing how it could be performed in the linear scale, and actually, I have quite some trouble successfully implementing it.
>
> Sure. But doing these things in logarithmic scale doesn't make it easier.
Not only would it not be easier, but I would get much the same result, right?
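Rune's claim in the exchange above is easy to check numerically. He says that computing spectrum coefficients at half the bin spacing just returns the original signal, zero-padded to twice its length. The sketch below is a minimal check in plain Python. It assumes an arbitrary 8-sample test signal. `dtft` and `idft` are illustrative helpers, written as O(N²) sums for clarity; they are not a library API.

```python
import cmath
import math

def dtft(x, w):
    """Evaluate the DTFT of the finite sequence x at radian frequency w."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))

def idft(X):
    """Plain O(N^2) inverse DFT (illustrative helper, not a library call)."""
    n_pts = len(X)
    return [sum(c * cmath.exp(2j * math.pi * k * n / n_pts) for k, c in enumerate(X)) / n_pts
            for n in range(n_pts)]

x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]   # arbitrary test signal (assumed)
N = len(x)

# Spectrum on the usual DFT grid (bin spacing 2*pi/N) ...
X = [dtft(x, 2 * math.pi * k / N) for k in range(N)]
# ... and the same spectrum "interpolated" onto a grid with half the spacing (pi/N).
Y = [dtft(x, math.pi * m / N) for m in range(2 * N)]

# Every second interpolated value is one of the original bins ...
print(all(abs(Y[2 * k] - X[k]) < 1e-9 for k in range(N)))
# ... and transforming back gives x followed by N zeros: longer, same frequencies.
y = idft(Y)
print([round(v.real, 6) for v in y])
```

The new odd-indexed samples add spectral detail but no new frequencies. This fits what Michel observed: interpolating a whole-signal spectrum by itself only changes the length.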
> When something doesn't quite work, is it because the general idea is wrong, or because something went wrong with the linear-to-logarithmic conversion? It can be very hard to tell.
>
> > The problem is, no matter how I try to put it, I end up with a resampled version of my original signal plus zero-padding.
>
> Seems reasonable, see above.
>
> > Let's see, if I want to move everything up by one octave, I must multiply every frequency by two: frequency 0.01 becomes 0.02, frequency 0.1 becomes 0.2, etc. So basically, it's all about interpolating the signal in the frequency domain by a factor of 2, which is zero-padding in the time domain, and then getting rid of the upper half of the spectrum so that I get my frequency multiplication right.
>
> Aha... you want to do a frequency shift?
I don't think it can be called that. IIRC, a frequency shift is when you add a certain amount of Hz to every frequency, whereas what I'm looking for is pitch shift, which is multiplying every frequency by a certain ratio.
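A toy harmonic signal makes the distinction concrete. This is a minimal sketch with idealized complex exponentials placed exactly on bins 3, 6 and 9 of a 64-point frame; that placement is an assumption, chosen so the bins come out exact. Mixing with a complex exponential adds a constant to every bin, which is a frequency shift. Reading the signal twice as fast multiplies every bin by 2, which is a pitch shift by one octave. `tone` and `occupied_bins` are hypothetical helpers.

```python
import cmath
import math

N = 64
harmonics = [3, 6, 9]   # assumed toy "note": harmonics in ratio 1:2:3

def tone(bins, n):
    """Sum of complex exponentials, one per DFT bin in `bins` (hypothetical helper)."""
    return sum(cmath.exp(2j * math.pi * k * n / N) for k in bins)

def occupied_bins(x, tol=1e-6):
    """Indices of DFT bins with non-negligible energy (plain O(N^2) DFT)."""
    out = []
    for k in range(N):
        Xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        if abs(Xk) > tol:
            out.append(k)
    return out

x = [tone(harmonics, n) for n in range(N)]

# Frequency shift: mix with a complex exponential, so every bin moves up by 3.
shift = 3
x_fs = [x[n] * cmath.exp(2j * math.pi * shift * n / N) for n in range(N)]

# Pitch shift by an octave: read the (periodic) signal twice as fast, so every bin doubles.
x_ps = [x[(2 * n) % N] for n in range(N)]

print(occupied_bins(x))     # [3, 6, 9]
print(occupied_bins(x_fs))  # [6, 9, 12]  -> no longer in ratio 1:2:3
print(occupied_bins(x_ps))  # [6, 12, 18] -> still in ratio 1:2:3
```

In this sketch the octave shift is obtained purely by resampling. That only works cleanly because the toy signal is periodic in the frame. For a general signal, reading it twice as fast also halves its duration, which is the resampling effect discussed in the thread.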
> It is not obvious how to do that. In radio, this is usually done as Amplitude Modulation, AM, where you multiply (mix) with a sinusoid at the carrier frequency.
Yup, you add a certain amount of Hz to your signal.
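With a real cosine carrier, as in ordinary AM, each component at f produces both f + fc and |f - fc|. When the carrier is not well above the signal band, the lower sidebands fold back through 0 Hz and land among the other components. The sketch below is a minimal illustration. All frequencies are assumed to be exact bins of a 64-point frame, and the helper names are hypothetical.

```python
import cmath
import math

N = 64

def positive_bins(x, tol=1e-6):
    """Non-negligible DFT bins in 0..N/2 of a real signal (plain O(N^2) DFT)."""
    out = []
    for k in range(N // 2 + 1):
        Xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        if abs(Xk) > tol:
            out.append(k)
    return out

def cosines(bins):
    """Sum of unit cosines at the given (assumed) bin frequencies."""
    return [sum(math.cos(2 * math.pi * k * n / N) for k in bins) for n in range(N)]

message = cosines([2, 5])   # assumed "baseband" signal: components at bins 2 and 5

# Carrier far above the message band: clean lower and upper sidebands.
high = [m * c for m, c in zip(message, cosines([20]))]
# Carrier inside the message band: lower sidebands fold through 0 Hz.
low = [m * c for m, c in zip(message, cosines([3]))]

print(positive_bins(high))  # [15, 18, 22, 25]
print(positive_bins(low))   # [1, 2, 5, 8]
```

In the second case, one product term even lands on bin 5, where the message already had a component. AM adds (and subtracts) a fixed number of Hz. It does not scale the frequencies.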
> Now, in AM the carrier is usually a lot higher than the bandwidth of the baseband signal. In your case, you get some sort of aliasing that ought to cause lots of problems.
>
> If you haven't done so already, have a look at a text on AM modulation.
>
> > If I got it right, it means that all my original idea would do is zero-pad and decimate, all in the time domain, right?
>
> I don't know. I didn't notice that frequency shift thing until right now.
>
> > > Somehow, I think there is a bit of confusion about what effects are caused by interpolating the spectrum, and what effects are caused by working in logarithmic scale.
> >
> > Yeah, I'm definitely confused, and right now I have no idea how to obtain something different than zero-padding and decimation in the time domain out of frequency-domain interpolation.
>
> The zero padding is as expected, the resampling... well, I don't know.
Well, the resampling is because I did it, and the reason I did it was to see what I would get by making every frequency get multiplied by 2.0.

Sounds like there is no way you can possibly get to pitch shift, no matter what you do with a simple FFT, logarithmic scaling or not. My idea for logarithmic scaling and then shifting came from a program I wrote that makes a log-scaled spectrogram and turns it back into sound. Shifting the spectrogram vertically results in pitch shifting. However, due to the great loss of information during the spectrogram analysis, it's not a viable way to obtain high-quality pitch shifting.

Fortunately, I have a new idea to test. This time it consists of analyzing the sound into a two-dimensional complex array (in rectangular form, not polar) obtained by STFT. Since you can get back to the exact original signal from what's stored in that array, interpolating along either axis might give interesting results. I guess I'll try that idea tomorrow. It shouldn't be too hard to implement, since it's essentially the same principle as STFT-based spectrography, the main difference being that the whole complex information is kept instead of just the magnitude.
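Michel's premise is that the complex (rectangular) STFT array allows exact reconstruction. That holds when the window and hop are chosen so that the overlapped windows sum to a constant. The sketch below is minimal and rests on assumed parameters: 16-sample frames, a hop of 8, and a periodic Hann window, which sums to exactly 1 at 50% overlap. Rows of `S` are the time axis and columns are the frequency axis. All names are illustrative.

```python
import cmath
import math

N = 16          # frame length (assumed)
H = N // 2      # hop size: 50 % overlap (assumed)
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann

def dft(frame):
    return [sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(frame))
            for k in range(N)]

def idft(spec):
    return [sum(c * cmath.exp(2j * math.pi * k * n / N) for k, c in enumerate(spec)) / N
            for n in range(N)]

def stft(x):
    """Rows = frames (time axis), columns = bins (frequency axis), complex values.
    Assumes len(x) is a multiple of H so every sample is covered by two frames."""
    padded = [0.0] * H + list(x) + [0.0] * H
    return [dft([padded[t + n] * window[n] for n in range(N)])
            for t in range(0, len(padded) - N + 1, H)]

def istft(frames, length):
    """Overlap-add the inverse DFTs. Periodic Hann windows shifted by N/2 sum
    to exactly 1, so no extra normalisation is needed."""
    out = [0.0] * (length + 2 * H)
    for m, spec in enumerate(frames):
        for n, v in enumerate(idft(spec)):
            out[m * H + n] += v.real
    return out[H:H + length]

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]  # test signal (assumed)
S = stft(x)
y = istft(S, len(x))
print(len(S), max(abs(a - b) for a, b in zip(x, y)))
```

This only establishes the round trip. Any interpolation applied to `S` before `istft` is where the real difficulty starts.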
Michel Rouzic wrote:
> Rune Allnor wrote:
> > Michel Rouzic wrote:
> > > Sure, but in the first place I had some trouble seeing how it could be performed in the linear scale, and actually, I have quite some trouble successfully implementing it.
> >
> > Sure. But doing these things in logarithmic scale doesn't make it easier.
>
> Not only would it not be easier, but I would get much the same result, right?
I have never tried these things, neither in linear nor in log scale. The naive answer would be "yes, both approaches should give similar answers", but there might be some detail that changes that.

...
> > > Let's see, if I want to move everything up by one octave, I must multiply every frequency by two: frequency 0.01 becomes 0.02, frequency 0.1 becomes 0.2, etc. So basically, it's all about interpolating the signal in the frequency domain by a factor of 2, which is zero-padding in the time domain, and then getting rid of the upper half of the spectrum so that I get my frequency multiplication right.
> >
> > Aha... you want to do a frequency shift?
>
> I don't think it can be called that. IIRC, a frequency shift is when you add a certain amount of Hz to every frequency, whereas what I'm looking for is pitch shift, which is multiplying every frequency by a certain ratio.
Yep, you are right. Now it all makes sense.

Well, we are back to these questions about the application. I don't know how a pitch-shifting device to "tune" a recording of a musical instrument would work, but to "tune" the human voice, there is a different approach that might be easier to implement.

A popular model for the human vocal system is that the vocal cords act like a "repeated pulse source" that feeds a pulse train into the vocal tract, which in turn acts as a filter to shape the sound. The period between the pulses emitted by the vocal cords determines the pitch.

You can (formally) separate the pulses and the filter response by using a cepstrum. If you manage to do that, you can impose a different pitch and then re-synthesize the signal. Actually, this just might work with musical instruments as well.

Now, before you start fiddling with the cepstrum, be warned that the cepstrum is a numerical nightmare that is far from stable and not anywhere near guaranteed to work.

...
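The sketch below is a toy version of the source-filter separation Rune describes, using the real cepstrum (inverse DFT of the log magnitude spectrum). Everything in it is an assumption for illustration: a synthetic pulse train with a 32-sample period, a two-tap "vocal tract" filter, and a 256-point frame. The pulses decay slightly on purpose. An exactly periodic pulse train of this length has a DFT that is zero between its harmonics, and log(0) is undefined, which is a small taste of the numerical trouble Rune warns about. The sketch only finds the period; it does not impose a new pitch or resynthesize.

```python
import cmath
import math

N = 256
PERIOD = 32     # assumed pulse spacing in samples (the "pitch period" to recover)
DECAY = 0.9     # assumed: slight decay keeps the spectrum free of exact zeros

# Source: decaying pulse train (stand-in for the vocal cords).
source = [0.0] * N
for m in range(N // PERIOD):
    source[m * PERIOD] = DECAY ** m

# Filter: a short, hypothetical "vocal tract" impulse response.
h = [1.0, 0.5]
voiced = [sum(h[i] * source[n - i] for i in range(len(h)) if n - i >= 0) for n in range(N)]

def dft(x, sign=-1):
    """Plain O(N^2) DFT; sign=+1 gives the (unscaled) inverse."""
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * n / N) for n, v in enumerate(x))
            for k in range(N)]

# Real cepstrum: inverse DFT of the log magnitude spectrum.
log_mag = [math.log(abs(X)) for X in dft(voiced)]
cepstrum = [c.real / N for c in dft(log_mag, sign=+1)]

# The filter's contribution dies out quickly with quefrency, while the pulse
# train shows up as a peak at its period. Skip the lowest quefrency bins.
lo, hi = 8, N // 2
peak = max(range(lo, hi), key=lambda q: cepstrum[q])
print(peak)   # 32
```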
> > > Yeah, I'm definitely confused, and right now I have no idea how to obtain something different than zero-padding and decimation in the time domain out of frequency-domain interpolation.
> >
> > The zero padding is as expected, the resampling... well, I don't know.
>
> Well, the resampling is because I did it, and the reason I did it was to see what I would get by making every frequency get multiplied by 2.0.
If pitch shifting was your objective, that seems sensible.
> Sounds like there is no way you can possibly get to pitch shift, no matter what you do with a simple FFT, logarithmic scaling or not. My idea for logarithmic scaling and then shifting came from a program I wrote that makes a log-scaled spectrogram and turns it back into sound; shifting the spectrogram vertically results in pitch shifting. However, due to the great loss of information during the spectrogram analysis, it's not a viable way to obtain high-quality pitch shifting.
OK... did you use the spectrogram as the starting point for this exercise? Did you compute it yourself or did you use a canned routine to do that?

There are at least two potential problems with the spectrogram. First, spectrograms don't contain phase information, so they are not an exact representation of the signal. Second, it is not obvious how to mount all the shifted spectra back together even if you managed to do things correctly with each individual spectrum.
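Rune's first point can be made concrete. A spectrogram frame keeps |X[k]| and throws away the phase, and many different waveforms share the same magnitudes. The sketch below is minimal and uses an arbitrary 8-sample "transient" (an assumption for illustration). It resynthesizes from the magnitudes alone with zero phase, which is one arbitrary phase choice among many. The result has an identical magnitude spectrum but a different waveform.

```python
import cmath
import math

def dft(x, sign=-1):
    """Plain O(N^2) DFT; sign=+1 gives the (unscaled) inverse."""
    n_pts = len(x)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * n / n_pts) for n, v in enumerate(x))
            for k in range(n_pts)]

x = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0]   # assumed short, asymmetric transient
X = dft(x)

# Keep only the magnitudes (what a spectrogram frame keeps) and discard the phase.
Y = [abs(c) for c in X]
y = [c.real / len(Y) for c in dft(Y, sign=+1)]   # zero-phase resynthesis

same_magnitudes = all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(dft(y), X))
waveform_error = max(abs(a - b) for a, b in zip(x, y))
print(same_magnitudes, waveform_error)
```

The zero-phase version comes out as a symmetric pulse centred on sample 0, while the original is one-sided. Both have exactly the same magnitude spectrum, so the magnitudes alone cannot say which one was recorded.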
> Fortunately, I have a new idea to test. This time it consists of analyzing the sound into a two-dimensional complex array (in rectangular form, not polar) obtained by STFT. Based on the idea that you can get back to the exact original signal from what's stored in that array, interpolating along either axis might give interesting results. I guess I'll try that idea tomorrow; it shouldn't be too hard to implement, since it's essentially the same principle as STFT-based spectrography, the main difference being keeping the whole complex information instead of just computing the magnitude.
Sounds like a nice project.

Rune
Rune Allnor wrote:
> Michel Rouzic wrote:
> > Rune Allnor wrote:
> > > Michel Rouzic wrote:
> > > > Sure, but in the first place I had some trouble seeing how it could be performed in the linear scale, and actually, I have quite some trouble successfully implementing it.
> > >
> > > Sure. But doing these things in logarithmic scale doesn't make it easier.
> >
> > Not only would it not be easier, but I would get much the same result, right?
>
> I have never tried these things, neither in linear nor in log scale. The naive answer would be "yes, both approaches should give similar answers", but there might be some detail that changes that.
>
> ...
>
> > > > Let's see, if I want to move everything up by one octave, I must multiply every frequency by two: frequency 0.01 becomes 0.02, frequency 0.1 becomes 0.2, etc. So basically, it's all about interpolating the signal in the frequency domain by a factor of 2, which is zero-padding in the time domain, and then getting rid of the upper half of the spectrum so that I get my frequency multiplication right.
> > >
> > > Aha... you want to do a frequency shift?
> >
> > I don't think it can be called that. IIRC, a frequency shift is when you add a certain amount of Hz to every frequency, whereas what I'm looking for is pitch shift, which is multiplying every frequency by a certain ratio.
>
> Yep, you are right. Now it all makes sense.
>
> Well, we are back to these questions about the application. I don't know how a pitch-shifting device to "tune" a recording of a musical instrument would work, but to "tune" the human voice, there is a different approach that might be easier to implement.
>
> A popular model for the human vocal system is that the vocal cords act like a "repeated pulse source" that feeds a pulse train into the vocal tract, which in turn acts as a filter to shape the sound. The period between the pulses emitted by the vocal cords determines the pitch.
>
> You can (formally) separate the pulses and the filter response by using a cepstrum. If you manage to do that, you can impose a different pitch and then re-synthesize the signal. Actually, this just might work with musical instruments as well.
>
> Now, before you start fiddling with the cepstrum, be warned that the cepstrum is a numerical nightmare that is far from stable and not anywhere near guaranteed to work.
>
> ...
>
> > > > Yeah, I'm definitely confused, and right now I have no idea how to obtain something different than zero-padding and decimation in the time domain out of frequency-domain interpolation.
> > >
> > > The zero padding is as expected, the resampling... well, I don't know.
> >
> > Well, the resampling is because I did it, and the reason I did it was to see what I would get by making every frequency get multiplied by 2.0.
>
> If pitch shifting was your objective, that seems sensible.
>
> > Sounds like there is no way you can possibly get to pitch shift, no matter what you do with a simple FFT, logarithmic scaling or not. My idea for logarithmic scaling and then shifting came from a program I wrote that makes a log-scaled spectrogram and turns it back into sound; shifting the spectrogram vertically results in pitch shifting. However, due to the great loss of information during the spectrogram analysis, it's not a viable way to obtain high-quality pitch shifting.
>
> OK... did you use the spectrogram as the starting point for this exercise? Did you compute it yourself or did you use a canned routine to do that?
Let's say that the spectrogram thing made me want to achieve better pitch shifting. Since shifting the log-scaled spectrogram vertically was how I would do it there, the first thing I could think of was shifting a log-scaled FFT. And yes, I wrote the spectrogram program myself (except for the FFT part, which I left to FFTW). Actually, that's what got me started in DSP almost one year ago.
> There are at least two potential problems with the spectrogram. First, spectrograms don't contain phase information, so they are not an exact representation of the signal.
Indeed, that's why I can't possibly get a perfect result out of my log-scaled spectrogram, but I do get an interesting result nonetheless.
> Second, it is not obvious how to mount all the shifted spectra back together even if you managed to do things correctly with each individual spectrum.
It's not that hard. The spectrogram is obtained from a logarithmically spaced filter bank, and the envelope of each frequency band is computed to get the spectrogram. Turning it back into sound mainly consists of modulating white noise, filtered by the same filter bank, with the envelope of each matching band.
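Shifting a log-spaced spectrogram vertically gives pitch shifting because of the observation earlier in the thread: adding to log frequency is the same as multiplying linear frequency. With band centres f_i = f_0 · r^i, moving every row up by s bands maps f_i to f_{i+s} = f_i · r^s. The sketch below is minimal and uses assumed parameters: 12 bands per octave starting at 110 Hz, 48 bands in total. `shift_rows` is a hypothetical helper. The noise-carrier resynthesis Michel describes is not shown.

```python
import math

F_MIN = 110.0            # lowest band centre in Hz (assumed)
BANDS_PER_OCTAVE = 12    # one band per semitone (assumed)
N_BANDS = 48             # four octaves (assumed)

ratio = 2.0 ** (1.0 / BANDS_PER_OCTAVE)
centres = [F_MIN * ratio ** i for i in range(N_BANDS)]

def shift_rows(envelopes, s):
    """Hypothetical helper: move every band envelope up by s rows (down if s < 0).
    Row i of `envelopes` belongs to band centre centres[i]; rows shifted in
    from outside the bank are filled with silence."""
    n, width = len(envelopes), len(envelopes[0])
    return [list(envelopes[i - s]) if 0 <= i - s < n else [0.0] * width
            for i in range(n)]

# Toy log-spectrogram: a single band (centre 220 Hz) with a decaying envelope.
env = [[0.0] * 4 for _ in range(N_BANDS)]
env[12] = [1.0, 0.5, 0.25, 0.0]

up_octave = shift_rows(env, BANDS_PER_OCTAVE)   # 12 rows -> every frequency x 2
up_fifth = shift_rows(env, 7)                   # 7 rows -> every frequency x 2**(7/12)

print(round(centres[12], 3), round(centres[24], 3), round(centres[19], 3))
print(up_octave[24], up_fifth[19])
```

Because the shift is a whole number of rows, the achievable ratios are limited to powers of r. Finer ratios would need interpolation between rows.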
> > Fortunately, I have a new idea to test. This time it consists of analyzing the sound into a two-dimensional complex array (in rectangular form, not polar) obtained by STFT. Based on the idea that you can get back to the exact original signal from what's stored in that array, interpolating along either axis might give interesting results. I guess I'll try that idea tomorrow; it shouldn't be too hard to implement, since it's essentially the same principle as STFT-based spectrography, the main difference being keeping the whole complex information instead of just computing the magnitude.
>
> Sounds like a nice project.
Yup. Just a little correction: I said that interpolating along either axis might work, but interpolating along the frequency axis actually won't, since it will only give me what I already had before: zero-padding. I only hope doing it along the time axis will be fine.
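Spreading STFT frames further apart along the time axis is time stretching. The magnitudes can be reused, but the phases cannot. Each bin's phase has to keep advancing at that bin's actual frequency, or the overlapping frames interfere. The standard fix is the phase vocoder. The sketch below is minimal and uses assumed parameters: a Hann window, 128-point frames, an analysis hop of 16, and a synthesis hop of `rate` times that. As a simplification, it reuses the analysis frames at a new spacing rather than interpolating new frames between them. The helper names are illustrative.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def ifft(X):
    n = len(X)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in X])]

def princarg(phi):
    """Wrap a phase to [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def time_stretch(x, rate, n_fft=128, hop_a=16):
    """Phase-vocoder time stretch (illustrative sketch): output is about `rate`
    times longer with pitch unchanged. The synthesis hop is rounded to an
    integer, so the actual stretch is hop_s / hop_a."""
    hop_s = int(round(rate * hop_a))
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    omega = [2 * math.pi * k / n_fft for k in range(n_fft)]  # bin centre freqs (rad/sample)
    n_frames = (len(x) - n_fft) // hop_a + 1
    out = [0.0] * ((n_frames - 1) * hop_s + n_fft)
    norm = [0.0] * len(out)
    prev_phase, synth_phase = None, None
    for m in range(n_frames):
        X = fft([x[m * hop_a + n] * win[n] for n in range(n_fft)])
        phase = [cmath.phase(c) for c in X]
        if prev_phase is None:
            synth_phase = phase[:]                   # first frame: keep analysis phases
        else:
            for k in range(n_fft):
                # Deviation from the expected advance gives the bin's true frequency ...
                dev = princarg(phase[k] - prev_phase[k] - omega[k] * hop_a)
                true_freq = omega[k] + dev / hop_a
                # ... which is then advanced over the (longer) synthesis hop.
                synth_phase[k] += true_freq * hop_s
        prev_phase = phase
        y = ifft([abs(X[k]) * cmath.exp(1j * synth_phase[k]) for k in range(n_fft)])
        for n in range(n_fft):                       # windowed overlap-add
            out[m * hop_s + n] += y[n].real * win[n]
            norm[m * hop_s + n] += win[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

x = [math.sin(2 * math.pi * 5 * n / 128) for n in range(1024)]   # bin-centred test tone
y = time_stretch(x, 2.0)
print(len(x), len(y))
```

In principle, a pitch shift by a ratio r could then be built by stretching by r and resampling the result back to the original length, which multiplies every frequency by r. How good that sounds on real audio is a separate question.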
Rune Allnor wrote:

   ...

> Sounds like a nice project.
It would seem so to me too if it weren't already a standard part of the audiodiddler's repertoire. With so many wonderful techniques left to be invented, the only reason I can see for revisiting old ones is improving them. "I can do it better" is a fine motivation, but it implies that the old, inferior way is known. Even the best see further by standing on the shoulders of giants. One who is too proud to climb doesn't get much of a view.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Michel Rouzic wrote:

> I don't think it can be called that. IIRC, a frequency shift is when you add a certain amount of Hz to every frequency, whereas what I'm looking for is pitch shift, which is multiplying every frequency by a certain ratio.
Michel, the common term for what you are trying to do is pitch or frequency scaling. A search on "pitch scaling" and "frequency scaling" will turn up a good deal of research and publication on the subject. It's not an easy problem, as you are finding out. :-)

The easiest is the phase vocoder approach, which is well described by Stephan Bernsee at http://www.dspdimension.com/, but that approach, at least for audio, doesn't sound very good. Most digital audio workstation (DAW) software these days, and plugins like AutoTune and Celemony, contain better implementations, but they are proprietary because it's a valuable process when done well.

Bob
-- 
"Things should be described as simply as possible, but no simpler."
A. Einstein
Jerry Avins wrote:
> Rune Allnor wrote:
>
> > ...
>
> > Sounds like a nice project.
>
> It would seem so to me too if it weren't already a standard part of the audiodiddler's repertoire. With so many wonderful techniques left to be invented, the only reason I can see for revisiting old ones is improving them. "I can do it better" is a fine motivation, but it implies that the old, inferior way is known. Even the best see further by standing on the shoulders of giants. One who is too proud to climb doesn't get much of a view.
I will gladly benefit from the previous experience of others, but I tried googling "audiodiddler" and it returned 0 results. Can you post a link, please?
Jerry Avins wrote:
> Rune Allnor wrote:
>
> > ...
>
> > Sounds like a nice project.
>
> It would seem so to me too if it weren't already a standard part of the audiodiddler's repertoire. With so many wonderful techniques left to be invented, the only reason I can see for revisiting old ones is improving them. "I can do it better" is a fine motivation, but it implies that the old, inferior way is known. Even the best see further by standing on the shoulders of giants. One who is too proud to climb doesn't get much of a view.
Well yes, you are right. There is no need to do this if we live by the dogmatic time == $$ and only want to do new stuff. On the other hand, these sorts of projects can be very good education. The trick is to know when to leave it and go on with other stuff.

Rune
Jerry Avins wrote:
> Rune Allnor wrote:
> ...
> > Sounds like a nice project.
>
> It would seem so to me too if it weren't already a standard part of the audiodiddler's repertoire. With so many wonderful techniques left to be invented, the only reason I can see for revisiting old ones is improving them.
Some of the most interesting educational projects (self- or class-assigned) are to try to (re)invent something not previously described in the textbook. They might best be considered "finger exercises". One tests one's knowledge and creativity by seeing how close one's solutions are to the existing generic methods. It's like closing the textbook halfway through a proof or description and seeing if you can finish it yourself, rather than just memorizing the text so you can regurgitate it upon stimulus.

I like this practice because it often helps me learn to recognize by experience which kinds of solution techniques won't work (e.g. often the first few I try... :), as well as alternative methods which sometimes actually do work in some limited problem domains.

IMHO. YMMV.
-- 
rhn A.T nicholson d.0.t C-o-M
Ron N. wrote:
> Jerry Avins wrote:
>
>>Rune Allnor wrote:
>> ...
>>
>>>Sounds like a nice project.
>>
>>It would seem so to me too if it weren't already a standard part of the audiodiddler's repertoire. With so many wonderful techniques left to be invented, the only reason I can see for revisiting old ones is improving them.
>
>
> Some of the most interesting educational projects (self- or class-assigned) are to try to (re)invent something not previously described in the textbook. They might best be considered "finger exercises". One tests one's knowledge and creativity by seeing how close one's solutions are to the existing generic methods. It's like closing the textbook halfway through a proof or description and seeing if you can finish it yourself, rather than just memorizing the text so you can regurgitate it upon stimulus.
>
> I like this practice because it often helps me learn to recognize by experience which kinds of solution techniques won't work (e.g. often the first few I try... :), as well as alternative methods which sometimes actually do work in some limited problem domains.
I agree. But there is usually the goal of doing it more cleverly; at lower cost; in a more readily understandable or explainable way; in a way that fits better than what can be bought; in a way that embodies a sentiment; OR without the hassle of having to look it up. I've infringed more patents than I know (and lost opportunities to patent things) because it's often easier to just do something than it is to research how it's normally done.

I have made my own toys and artifacts, from a string-pulled dump truck made from wooden cheese boxes and checkers, through a Hi-Fi system that I wanted but couldn't afford (my introduction to electronics), a pair of silver earrings for a friend's sweet-sixteen party gift and, years later, a ring for our wedding, to custom furniture for our home. My daughter has the plush-lined cedar jewelry box I made for my mother one rainy day in the country. Believe me, I understand pride of authorship.

Michel seems to be on a different track. Maybe I'm wrong.

Jerry
-- 
Engineering is the art of making what you want from things you can get.