
Synthesis from a spectrogram

Started by Michel Rouzic April 6, 2006
I made a program that makes a spectrogram out of a sound (with log base
2 frequency), outputs it as an image file, and can then read that image
back in to turn it into a sound. However, I'm running into a problem.

In order to make the spectrogram, I bandpass the original sound many
times with an array of filters whose frequency and width vary
logarithmically, and take the envelope of each band. In order to turn
the spectrogram (the image) back into a sound, I generate white noise,
filter it in exactly the same way as I filtered the original sound, and
multiply each of these bands by the envelope from the image that
matches that band.
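The analysis/resynthesis loop described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the Butterworth bandpass bank, the Hilbert-transform envelope, and the band spacing are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def log_band_edges(f_lo, f_hi, bands_per_octave):
    """Logarithmically spaced band edges between f_lo and f_hi."""
    n = int(np.ceil(bands_per_octave * np.log2(f_hi / f_lo)))
    return f_lo * 2.0 ** (np.arange(n + 1) / bands_per_octave)

def analyze(x, fs, edges):
    """Bandpass x in each log-spaced band and keep only the envelope."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envs.append(np.abs(hilbert(band)))  # amplitude envelope per band
    return np.array(envs)                   # rows of the "spectrogram"

def resynthesize(envs, fs, edges, seed=0):
    """Filter white noise with the same bank, scale by the envelopes, sum."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(envs.shape[1])
    y = np.zeros(envs.shape[1])
    for env, (lo, hi) in zip(envs, zip(edges[:-1], edges[1:])):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        y += env * sosfiltfilt(sos, noise)
    return y
```

With `bands_per_octave=12` each band is one semitone wide, matching the setup discussed below; only the envelopes survive the analysis step, which is where the phase information is lost.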

While the sound is recognizably reconstructed, the result is quite
noisy, the main problem being that there is noise in areas where there
should be none. In other words, I'm not sure that this technique
(generating noise and filtering it) is the best way, but I cannot think
of any other. Any ideas?

Michel Rouzic wrote:
> I made a program that makes a spectrogram out of a sound [snip] Any ideas?
I would proceed as follows: make a filter bank such that you can reconstruct the original signal from the filter bank output with the appropriate synthesis filters. Wouldn't that be great? If you don't modify the signal in the frequency domain, the output signal is _exactly_ the same as the input signal!

Now Michel, this has of course all been done before. It's called the Wavelet Transform. Look for a two-channel orthogonal or biorthogonal WT to get the log-base-2 frequency division.

I'm sure the code is out there ...

(use Mulder's voice to read this last sentence :-)
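For illustration, the simplest two-channel orthogonal case is the Haar filter bank (a sketch only; any orthogonal or biorthogonal wavelet pair has the same perfect-reconstruction property). Analysis splits the signal into downsampled lowpass and highpass halves; synthesis reconstructs the input exactly, and cascading the lowpass branch gives the dyadic (log base 2) frequency division.

```python
import numpy as np

def haar_analysis(x):
    """One level of the orthogonal Haar filter bank (even-length input)."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(0.5)
    approx = s * (x[0::2] + x[1::2])  # lowpass branch, downsampled by 2
    detail = s * (x[0::2] - x[1::2])  # highpass branch, downsampled by 2
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert the analysis step exactly (perfect reconstruction)."""
    s = np.sqrt(0.5)
    x = np.empty(2 * len(approx))
    x[0::2] = s * (approx + detail)
    x[1::2] = s * (approx - detail)
    return x
```

If both the approximation and detail coefficients are kept unmodified, `haar_synthesis(*haar_analysis(x))` returns `x` to machine precision.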
Andor wrote:
> I would proceed as follows: Make a filter bank such that you can reconstruct
> the original signal from the filter bank output with the appropriate
> synthesis filters. [snip] Look for a two-channel orthogonal or biorthogonal
> WT to get the log2 base frequency division. [snip]
Wait, there's something I don't understand here. I've read about wavelet transforms since you posted, and it sounds somewhat close to what I do, but there's a problem: in order to perform the inverse wavelet transform, you're supposed to use the bands obtained by filtering and downsampling (the downsampling is a detail, though). What I'm trying to say is that I don't keep all that; I only keep the envelope of each band. So I still don't understand how I can reconstruct my signal with only that..
Michel Rouzic wrote:

> [snip] I don't keep all that, since I only keep the envelope for each band.
> So I still don't understand how I reconstruct my signal with only that..
Of course, if you only keep the envelope of the filter bank output, then you lose information, and you won't be able to reconstruct your signal. However, I don't know what you hope to gain from keeping just the envelope instead of the signed signal. If you define envelope(x) = abs(x), then this kind of irreversible data compression saves you only one bit of memory per sample ...
Andor wrote:
> [snip] If you define envelope(x) = abs(x), then this kind of irreversible
> data compression saves you only one bit of memory per sample ...
This is definitely not about compression, far from it. It's about editing (editing the spectrogram, i.e. the image, instead of directly editing the sound, and I really hope you're not wondering "why would anyone want to do that?").

BTW, to me envelope(x) != abs(x), but rather envelope(x) = downsample(abs(upwards_frequency_shift(upsample(x)))). So no, I'm not hoping to gain anything by keeping just the envelope, but as you can guess, since the goal is to produce a spectrogram, I can't leave the filtered signals as they are. I'm not trying to get back the exact original signal, just something close enough.

I'm already getting interesting results by using white noise, but I don't think this is good enough, and I can hardly think of any alternative. Of course, the way I get the spectrogram itself could be changed, but it would still have to be a spectrogram, and therefore I think it will still have to be made of envelopes.
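For reference, the envelope of a bandpass signal is often taken as the magnitude of the analytic signal. This is a Hilbert-transform sketch of that common definition; it may differ in detail from the shift-and-rectify scheme described above, which it is only meant to approximate.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x):
    """Amplitude envelope: magnitude of the analytic signal of x."""
    return np.abs(hilbert(x))
```

For an amplitude-modulated carrier whose modulator is much slower than the carrier, this recovers the modulator itself.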
Michel Rouzic wrote:
> I made a program that makes a spectrogram out of a sound [snip] Any ideas?
You have a lossy process (your filter bands are probably way too wide and throw away all phase information as well), and are trying to reconstruct lost data with white noise, so of course there will be noise where there wasn't any in the original signal.

One possibility is to inject less noise into your reconstruction process, perhaps by using narrower filters for the reconstruction than for the input, and tuning those filters away from where you are finding the most noise in the result. You still won't get the original signal back, but you might get less noise in the missing portions.

Do you use the same noise for all the filter outputs or a different noise vector for each one? I wonder whether that makes a difference...

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
Ron N. wrote:
> [snip]
>
> You have a lossy process (your filter bands are probably way too wide and
> throw away all phase information as well), and are trying to reconstruct
> lost data with white noise, so of course there will be noise where there
> wasn't in the original signal.
My bands are probably way too wide... as I use them they are one semitone wide. They could be made narrower; is a semitone too wide?
> One possibility is to inject less noise into your reconstruction
> process, perhaps by using narrower filters for the reconstruction
> than the input, and tuning those filters away from where you are
> finding the most noise in the result. You still won't get the original
> signal back, but you might get less noise with the missing portions.
Narrower filters? If I do that, I'm afraid I'll get regular gaps in my output signal's frequency response, and I don't want that (it really won't sound good).
> Do you use the same noise for all the filter outputs or a
> different noise vector for each one? I wonder whether that
> makes a difference...
Yes, I use the same noise. I might try using a new noise every time, but in my humble opinion it won't change a lot.

BTW, I realized that the unexpected noise comes mainly from hand-made spectrograms. I'd have to investigate, but it might have to do with envelopes containing higher frequency components than they should, although I thought I did something to prevent that: for each band, I multiplied the envelope by the white noise and then bandpassed the result, hoping it would also make the envelope of the newly created band smoother in the lowest frequencies. I'll have to test whether that's right or wrong.

Anyway, do you have any idea that would let me avoid using white noise at all?
Michel Rouzic wrote:
> BTW, I realized that the unexpected noise comes mainly from hand-made
> spectrograms. I'd have to investigate, but it might have to do with
> envelopes containing higher frequency components than they should,
> although I thought I did something to prevent that. For each band, I
> multiplied the envelope by the white noise then bandpassed it, hoping
> it would also make the envelope of the newly created band smoother in
> the lowest frequencies. I'll have to test whether it's right or wrong.
No wait, it's not that at all. It's my envelope interpolation that introduces the ripples. I need to switch to another interpolation, something like convolution with a Gaussian function. Still, I'd like something better than using white noise.
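A Gaussian smoothing pass like the one suggested might look like this (a sketch only; the kernel radius of four standard deviations and the unit-sum normalization are assumptions, not part of the thread):

```python
import numpy as np

def gaussian_smooth(env, sigma):
    """Smooth an envelope by convolving with a normalized Gaussian kernel."""
    radius = int(np.ceil(4 * sigma))           # truncate the kernel at 4 sigma
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                     # unit DC gain: constants pass through
    return np.convolve(env, kernel, mode="same")
```

Because the kernel's frequency response falls off as a Gaussian, high-frequency ripple in the interpolated envelope is strongly attenuated while the slow envelope shape is preserved.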