
Synthesis from a spectrogram

Started by Michel Rouzic April 6, 2006
I made a program that makes a spectrogram out of a sound (with log base
2 frequency), outputs it as an image file, and can then read that image
back in to turn it into a sound. However, I'm running into a problem.

In order to make the spectrogram, I bandpass the original sound many
times with an array of filters whose frequency and width vary
logarithmically, and take the envelope of each band. In order to turn
the spectrogram (the image) back into a sound, I generate white noise,
filter it in exactly the same way as I filtered the original sound, and
multiply each of these bands by the envelope from the image that
matches that band.
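The analysis/resynthesis loop described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the Butterworth bandpass bank, the Hilbert-transform envelope, and the band spacing are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def log_band_edges(f_lo, f_hi, bands_per_octave):
    """Logarithmically spaced band edges between f_lo and f_hi."""
    n = int(np.ceil(bands_per_octave * np.log2(f_hi / f_lo)))
    return f_lo * 2.0 ** (np.arange(n + 1) / bands_per_octave)

def analyze(x, fs, edges):
    """Bandpass x in each log-spaced band and keep only the envelope."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envs.append(np.abs(hilbert(band)))  # amplitude envelope per band
    return np.array(envs)                   # rows of the "spectrogram"

def resynthesize(envs, fs, edges, seed=0):
    """Filter white noise with the same bank, scale by the envelopes, sum."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(envs.shape[1])
    y = np.zeros(envs.shape[1])
    for env, (lo, hi) in zip(envs, zip(edges[:-1], edges[1:])):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        y += env * sosfiltfilt(sos, noise)
    return y
```

With `bands_per_octave=12` each band is one semitone wide, matching the setup discussed below; only the envelopes survive the analysis step, which is where the phase information is lost.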

While the sound is recognizably reconstructed, the result is quite
noisy, the main problem being that there is noise in areas where there
should be none. In other words, I'm not sure that this technique
(generating noise and filtering it) is the best way, but I cannot think
of any other. Any ideas?

Michel Rouzic wrote:
> I made a program that makes a spectrogram out of a sound [snip] Any ideas?
I would proceed as follows: make a filter bank such that you can reconstruct the original signal from the filter bank output with the appropriate synthesis filters. Wouldn't that be great? If you don't modify the signal in the frequency domain, the output signal is _exactly_ the same as the input signal!

Now Michel, this has of course all been done before. It's called the Wavelet Transform. Look for a two-channel orthogonal or biorthogonal WT to get the log-base-2 frequency division.

I'm sure the code is out there ...

(use Mulder's voice to read this last sentence :-)
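For illustration, the simplest two-channel orthogonal case is the Haar filter bank (a sketch only; any orthogonal or biorthogonal wavelet pair has the same perfect-reconstruction property). Analysis splits the signal into downsampled lowpass and highpass halves; synthesis reconstructs the input exactly, and cascading the lowpass branch gives the dyadic (log base 2) frequency division.

```python
import numpy as np

def haar_analysis(x):
    """One level of the orthogonal Haar filter bank (even-length input)."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(0.5)
    approx = s * (x[0::2] + x[1::2])  # lowpass branch, downsampled by 2
    detail = s * (x[0::2] - x[1::2])  # highpass branch, downsampled by 2
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert the analysis step exactly (perfect reconstruction)."""
    s = np.sqrt(0.5)
    x = np.empty(2 * len(approx))
    x[0::2] = s * (approx + detail)
    x[1::2] = s * (approx - detail)
    return x
```

If both the approximation and detail coefficients are kept unmodified, `haar_synthesis(*haar_analysis(x))` returns `x` to machine precision.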
Andor wrote:
> I would proceed as follows: Make a filter bank such that you can reconstruct
> the original signal from the filter bank output with the appropriate
> synthesis filters. [snip] Look for a two-channel orthogonal or biorthogonal
> WT to get the log2 base frequency division. [snip]
Wait, there's something I don't understand here. I've read about wavelet transforms since you posted, and it sounds somewhat close to what I do, but there's a problem: in order to perform the inverse wavelet transform, you're supposed to use the bands obtained by filtering and downsampling (the downsampling is a detail, though). What I'm trying to say is that I don't keep all that; I only keep the envelope of each band. So I still don't understand how I can reconstruct my signal with only that..
Michel Rouzic wrote:

> [snip] I don't keep all that, since I only keep the envelope for each band.
> So I still don't understand how I reconstruct my signal with only that..
Of course, if you only keep the envelope of the filter bank output, then you lose information, and you won't be able to reconstruct your signal. However, I don't know what you hope to gain from keeping just the envelope instead of the signed signal. If you define envelope(x) = abs(x), then this kind of irreversible data compression saves you only one bit of memory per sample ...
Andor wrote:
> [snip] If you define envelope(x) = abs(x), then this kind of irreversible
> data compression saves you only one bit of memory per sample ...
This is definitely not about compression, far from it. It's about editing (editing the spectrogram, i.e. the image, instead of directly editing the sound, and I really hope you're not wondering "why would anyone want to do that?").

BTW, to me envelope(x) != abs(x), but rather envelope(x) = downsample(abs(upwards_frequency_shift(upsample(x)))). So no, I'm not hoping to gain anything by keeping just the envelope, but as you can guess, since the goal is to produce a spectrogram, I can't leave the filtered signals as they are. I'm not trying to get back the exact original signal, just something close enough.

I'm already getting interesting results by using white noise, but I don't think this is good enough, and I can hardly think of any alternative. Of course, the way I get the spectrogram itself could be changed, but it would still have to be a spectrogram, and therefore I think it will still have to be made of envelopes.
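For reference, the envelope of a bandpass signal is often taken as the magnitude of the analytic signal. This is a Hilbert-transform sketch of that common definition; it may differ in detail from the shift-and-rectify scheme described above, which it is only meant to approximate.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x):
    """Amplitude envelope: magnitude of the analytic signal of x."""
    return np.abs(hilbert(x))
```

For an amplitude-modulated carrier whose modulator is much slower than the carrier, this recovers the modulator itself.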
Michel Rouzic wrote:
> I made a program that makes a spectrogram out of a sound [snip] Any ideas?
You have a lossy process (your filter bands are probably way too wide and throw away all phase information as well), and are trying to reconstruct lost data with white noise, so of course there will be noise where there wasn't any in the original signal.

One possibility is to inject less noise into your reconstruction process, perhaps by using narrower filters for the reconstruction than for the input, and tuning those filters away from where you are finding the most noise in the result. You still won't get the original signal back, but you might get less noise in the missing portions.

Do you use the same noise for all the filter outputs or a different noise vector for each one? I wonder whether that makes a difference...

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
Ron N. wrote:
> [snip]
>
> You have a lossy process (your filter bands are probably way too wide and
> throw away all phase information as well), and are trying to reconstruct
> lost data with white noise, so of course there will be noise where there
> wasn't in the original signal.
My bands are probably way too wide... as I use them they are one semitone wide. They could be made narrower; is a semitone too wide?
> One possibility is to inject less noise into your reconstruction
> process, perhaps by using narrower filters for the reconstruction
> than the input, and tuning those filters away from where you are
> finding the most noise in the result. You still won't get the original
> signal back, but you might get less noise with the missing portions.
Narrower filters? If I do that, I'm afraid I'll get regular gaps in my output signal's frequency response, and I don't want that (it really won't sound good).
> Do you use the same noise for all the filter outputs or a
> different noise vector for each one? I wonder whether that
> makes a difference...
Yes, I use the same noise. I might try using a new noise every time, but in my humble opinion it won't change a lot.

BTW, I realized that the unexpected noise comes mainly from hand-made spectrograms. I'd have to investigate, but it might have to do with envelopes containing higher frequency components than they should, although I thought I did something to prevent that: for each band, I multiplied the envelope by the white noise and then bandpassed the result, hoping it would also make the envelope of the newly created band smoother in the lowest frequencies. I'll have to test whether that's right or wrong.

Anyway, do you have any idea that would let me avoid using white noise at all?
Michel Rouzic wrote:
> BTW, I realized that the unexpected noise comes mainly from hand-made
> spectrograms. I'd have to investigate, but it might have to do with
> envelopes containing higher frequency components than they should,
> although I thought I did something to prevent that. For each band, I
> multiplied the envelope by the white noise then bandpassed it, hoping
> it would also make the envelope of the newly created band smoother in
> the lowest frequencies. I'll have to test whether it's right or wrong.
No wait, it's not that at all. It's my envelope interpolation that introduces the ripples. I need to switch to another interpolation, something like convolution with a Gaussian function. Still, I'd like something better than using white noise.
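A Gaussian smoothing pass like the one suggested might look like this (a sketch only; the kernel radius of four standard deviations and the unit-sum normalization are assumptions, not part of the thread):

```python
import numpy as np

def gaussian_smooth(env, sigma):
    """Smooth an envelope by convolving with a normalized Gaussian kernel."""
    radius = int(np.ceil(4 * sigma))           # truncate the kernel at 4 sigma
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                     # unit DC gain: constants pass through
    return np.convolve(env, kernel, mode="same")
```

Because the kernel's frequency response falls off as a Gaussian, high-frequency ripple in the interpolated envelope is strongly attenuated while the slow envelope shape is preserved.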