I've read Dan Ellis's article "Model-Based Monaural Source Separation
Using a Vector-Quantized Phase-Vocoder Representation".
My problem is that I want to get a good estimate of the phase
information after doing some manipulation on the magnitude of the
short-time Fourier transform (STFT) of some speech.
Dan Ellis quantizes (via vector quantization) the "instantaneous
frequency" at spectral peaks and accumulates it over frames to get an
estimate of the phase, much like a phase vocoder does.
My question is: won't the phase vocoder only give good estimates of the
phase in regions where the STFT magnitude is fairly constant over
several frames/analysis windows? If so, it will not give a good
estimate unless I am in a very clean, voiced region of the speech.
Has anyone here worked with this kind of problem? I'm having some
difficulty figuring out exactly what Dan Ellis is doing, and there
does not seem to be any further information on this method of
estimating phase from quantized phase information.
Any help would be appreciated!