Hi,

I've read Dan Ellis's article "Model-Based Monaural Source Separation Using a Vector-Quantized Phase-Vocoder Representation" (http://www.ee.columbia.edu/~dpwe/pubs/EllisW06-pvocvq.pdf) with great interest.

My problem is that I want a good estimate of the phase after manipulating the magnitude of the short-time Fourier transform (STFT) of some speech. Dan Ellis vector-quantizes the "instantaneous frequency" of the spectral peaks and accumulates it over time to get an estimate of the phase, much like a phase vocoder does.

My question is: won't the phase vocoder only give good phase estimates in regions where the STFT magnitude is fairly constant over several frames (analysis windows)? In other words, won't the estimate be poor unless I am in a cleanly voiced region of the speech and have a high-resolution STFT?

Has anyone here worked on this kind of problem? I'm having some difficulty figuring out exactly what Dan Ellis is doing, and there does not seem to be any further information on this method of estimating phase from quantized phase information.

Any help would be appreciated!

Sincerely,
Soren
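
P.S. To make the question concrete, this is roughly what I understand by "accumulating the instantaneous frequency". It is only a minimal sketch of the textbook phase-vocoder phase update in Python/NumPy, not the method from the paper: there is no vector quantization and no peak picking here, and the function names, the Hann window and the parameters n_fft and hop are my own assumptions.

```python
import numpy as np

def princarg(x):
    """Wrap phase values into [-pi, pi)."""
    return (x + np.pi) % (2.0 * np.pi) - np.pi

def stft(x, n_fft, hop):
    """Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def instantaneous_frequency(phase, n_fft, hop):
    """Per-bin instantaneous frequency in radians per sample.

    Uses the usual phase-vocoder estimate: bin centre frequency plus the
    wrapped deviation of the measured phase advance from the expected one.
    """
    omega = 2.0 * np.pi * np.arange(phase.shape[1]) / n_fft  # bin centres
    dphi = np.diff(phase, axis=0)                # measured phase advance
    deviation = princarg(dphi - omega * hop)     # wrapped deviation
    inst = omega + deviation / hop
    # The first frame has no predecessor, so fall back to the bin centres.
    return np.vstack([omega[None, :], inst])

def accumulate_phase(inst_freq, hop, phase0):
    """Rebuild phase by integrating instantaneous frequency frame by frame."""
    phase = np.empty_like(inst_freq)
    phase[0] = phase0
    for t in range(1, len(inst_freq)):
        phase[t] = phase[t - 1] + hop * inst_freq[t]
    return phase
```

With unquantized instantaneous frequencies this just reproduces the original phase (modulo 2*pi), and for a steady sinusoid the estimate at the peak bin sits at the true frequency. My worry is what happens once the frequencies are quantized and the magnitudes change from frame to frame.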
Phase Vocoder and Vector Quantization
Started by ●June 28, 2006