Rune Allnor wrote:
> Michel Rouzic skrev:
> > Rune Allnor wrote:
> > > Michel Rouzic skrev:
>
> > > > sure, but in the first place I had some trouble seeing how it could be
> > > > performed in the linear scale, and actually, I have quite some trouble
> > > > successfully implementing it.
> > >
> > > Sure. But doing these things in logarithmic scale doesn't make
> > > it any easier.
> >
> > not only would it not be easier, I would get much the same result,
> > right?
>
> I have never tried these things, in either linear or log scale.
> The naive answer would be "yes, both approaches should give
> similar answers", but there might be some detail that changes that.
>
> ...
> > > > let's see,
> > > > if I want to move everything up by one octave, I must multiply
> > > > every frequency by two: frequency 0.01 becomes 0.02, frequency 0.1
> > > > becomes 0.2, etc. So basically it's all about interpolating the
> > > > signal in the frequency domain by a factor of 2, which is zero-padding
> > > > in the time domain, and then getting rid of the upper half of the
> > > > spectrum so that I get my frequency multiplication right.
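A quick numerical check of the duality invoked above (my own sketch, not from the thread): zero-padding in time interpolates the spectrum, in the sense that every original DFT bin reappears at an even-numbered bin of the longer transform.

```python
import numpy as np

# Sketch: zero-padding in time vs. spectral interpolation.
N = 64
t = np.arange(N)
x = np.sin(2 * np.pi * 5 * t / N)             # a 5-cycle sine in the window

X = np.fft.fft(x)                             # N-point spectrum
X_padded = np.fft.fft(np.concatenate([x, np.zeros(N)]))   # zero-pad to 2N

# Bin k of the original spectrum equals bin 2k of the padded one.
print(np.allclose(X, X_padded[::2]))          # True
```

Note this only interleaves new bins between the old ones; it does not by itself move any component to twice its frequency.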
> > >
> > > Aha... you want to do a frequency shift?
> >
> > Can't be called that, I think. IIRC, frequency shift is when you add a
> > certain number of Hz to every frequency, whereas what I'm looking for
> > is pitch shift, which is multiplying every frequency by a certain ratio.
>
> Yep, you are right. Now it all makes sense.
>
> Well, we are back to these questions about the application. I don't
> know how a pitch shifting device to "tune" a recording of a musical
> instrument would work, but to "tune" the human voice, there is a
> different approach that might be easier to implement.
>
> A popular model of the human vocal system is that the vocal cords
> act like a repeated pulse source that feeds a pulse train into
> the vocal tract, which in turn acts as a filter to shape the sound.
> The period between the pulses emitted by the vocal cords determines
> the pitch.
>
> You can (formally) separate the pulses and the filter response by
> using a cepstrum. If you manage to do that, you can impose a different
> pitch and then re-synthesize the signal. Actually, this just might
> work with musical instruments as well.
>
> Now, before you start fiddling with the cepstrum, be warned that the
> cepstrum is a numerical nightmare: it is far from stable and nowhere
> near guaranteed to work.
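A hedged sketch of the source/filter separation described above: a pulse train (the "vocal cords") run through a crude one-pole "vocal tract" filter, then a real cepstrum whose largest peak away from quefrency zero should land at the pitch period. All names and parameters are my own illustration, not from the thread.

```python
import numpy as np

fs = 8000
period = 100                                  # pitch period in samples (80 Hz)
n = np.arange(4096)
pulses = (n % period == 0).astype(float)      # repeated pulse source

tract = 0.95 ** n                             # impulse response of a one-pole filter
x = np.convolve(pulses, tract)[: len(n)]      # excitation convolved with filter

log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)   # +eps: this log is exactly
cepstrum = np.fft.irfft(log_mag)                   # the numerical trouble spot

# The filter's cepstrum decays quickly, so search above the low-quefrency
# region for the pitch peak.
est = 50 + np.argmax(cepstrum[50 : len(cepstrum) // 2])
print(est)   # should land at, or very near, period = 100
```

This illustrates why the warning matters: the log of a spectrum with deep nulls is ill-conditioned, which is what makes cepstral processing fragile in practice.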
>
> ...
> > > > yeah, I'm definitely confused, and right now I have no idea how to
> > > > obtain anything other than zero-padding and decimation in the time
> > > > domain out of frequency-domain interpolation.
> > >
> > > The zero padding is as expected, the resampling... well, I don't know.
> >
> > well, the resampling is there because I did it, and the reason I did
> > it was to check what multiplying every frequency by 2.0 would
> > actually get me.
>
> If pitch shifting was your objective, that seems sensible.
>
> > Sounds like there is no way you can possibly get to pitch shift no
> > matter what you do with a simple FFT, logarithmic scaling or not. My
> > idea of logarithmic scaling and then shifting came from a program I
> > wrote that makes a log-scaled spectrogram and turns it back into a
> > sound; shifting the spectrogram vertically would result in pitch
> > shifting. Only, due to the great loss of information during the
> > spectrogram analysis, it's not a viable way to obtain high quality
> > pitch shifting.
>
> OK... did you use the spectrogram as the starting point for this
> exercise? Did you compute it yourself or did you use a canned
> routine to do that?
>
> There are at least two potential problems with the spectrogram.
> First, spectrograms don't contain phase information, so they are
> not an exact representation of the signal. Second, it is not obvious
> how to mount all the shifted spectra back together even if you
> managed to do things correctly with each individual spectrum.
>
> > fortunately, I have a new idea to test. It consists this time of
> > analyzing the sound into a two-dimensional complex array (in
> > rectangular form, not polar) obtained by STFT, based on the idea that
> > you can get back to the exact original signal from what's stored in
> > that array; interpolating along either axis might give interesting
> > results. I guess I'll try that idea tomorrow; it shouldn't be too
> > hard to implement, as it's much the same principle as STFT-based
> > spectrography, the main difference being keeping the whole complex
> > information instead of just computing the magnitude.
>
> Sounds like a nice project.
>
> Rune
Well, I just had a new idea and I'd appreciate criticism before I
start implementing it.

Here it is. One thing about phase vocoding, if I got it right, is that
it tries to determine something like the best window size matching a
repetitive pattern in the signal, in order to diminish the "smearing",
or something like that. I hardly manage to explain it, and I forgot
where I read about it, but anyway, it made me think: if you "cut" the
signal into slim slices, I mean if you use a filter bank whose filters
have quite a narrow bandwidth, then each "slice" basically looks like a
bunch of waves, all of the same length but not the same height.
That may be slightly oversimplistic, but my basic idea is there: such
"slices" would be easy to make longer, for example by taking each wave
and repeating it twice (or, even better, by interpolating new waves
from the height of the wave on the left and the height of the wave on
the right). In this example you'd then have each slice successfully
pitch-shifted, right? (or rather time-stretched) And since adding all
the original slices together would bring you back to the original
signal, adding all the time-stretched slices together would give you a
nicely time-stretched signal.

Is there a major flaw in my idea, or is it worth exploring/testing?
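A toy check of the "slim slice" premise (my own sketch; the signal and band edges are invented): isolating one narrow band of a two-tone signal with an FFT mask yields a train of waves of identical length, which is the regular structure the per-slice stretching relies on.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                       # one second of signal
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

X = np.fft.rfft(x)                           # 1 Hz bins for a 1 s signal
mask = np.zeros_like(X)
band = slice(430, 451)                       # narrow band around 440 Hz
mask[band] = X[band]
slice_440 = np.fft.irfft(mask)

# 440 Hz falls exactly on a bin, so the slice is the 440 Hz component
# alone: every "wave" in it has the same length.
print(np.allclose(slice_440, np.sin(2 * np.pi * 440 * t)))   # True
```

Summing band slices that tile all the bins gives back `x` exactly, matching the "adding all the original slices together" claim; the open question is what repeating or interpolating waves does when a band contains more than one component or a sliding envelope.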