DSPRelated.com
Forums

Spectral Analysis & Resynthesis of Audio Signals.. Anyone else into this? Applications/Tools that reliably produce good results?

Started by maxplanck May 29, 2008
I've been extremely interested in Spectral Analysis & Resynthesis of Audio
Signals for as long as I've known of it.  The idea of understanding and
manipulating a timbre in terms of basic, intuitive building blocks seems
like the most useful tool imaginable for sound design.

There are a number of computer applications which combine spectral
analysis, resynthesis, and spectral manipulation.  My favorite so far is
SPEAR.  

Usually the results of spectral analysis followed by resynthesis sound
similar to the original signal, but not quite identical.  The replica
usually sounds like it's lost some high frequency content, sounds a bit
"muddy" for lack of a better term.  Even if the analysis parameters are
tweaked extensively, it seems impossible in most cases to produce a replica
that is audibly indistinguishable from the original, and the replica always
seems to sound "inferior" to the original from an aesthetic perspective.

Having studied the practical issues surrounding the Fourier Transform, I
realize that there are Heisenberg-ish issues related to frequency and time
resolution which likely contribute to imperfections in the replica (unless
clever ways to get around or sufficiently minimize this have been devised
which I don't know about).  
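To make that time/frequency trade-off concrete, here is a rough NumPy sketch (illustrative only, not tied to SPEAR or any particular tool; all frequencies and window sizes are made up): two tones 40 Hz apart merge into a single spectral lobe under a 256-point window, but are cleanly separated under a 4096-point window.

```python
import numpy as np

def dip_ratio(x, nfft, f_lo, f_hi, sr):
    """Magnitude at the midpoint between two tones, relative to the
    spectral maximum: near 1 means the tones merged into one lobe,
    near 0 means they are clearly resolved."""
    mag = np.abs(np.fft.rfft(x[:nfft] * np.hanning(nfft)))
    mid_bin = int(round((f_lo + f_hi) / 2 * nfft / sr))
    return mag[mid_bin] / mag.max()

sr = 8000
tc = 127.5 / sr                       # centre of the short analysis frame
t = np.arange(sr) / sr - tc           # one second of signal
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1040 * t)

r_short = dip_ratio(x, 256, 1000, 1040, sr)   # short window: no dip, merged
r_long = dip_ratio(x, 4096, 1000, 1040, sr)   # long window: deep dip, resolved
```

The short window gives good time localisation but cannot tell the two tones apart; the long window resolves them but smears any change over half a second of signal.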

Understanding the basic concept employed in the partial tracking
algorithms used by SPEAR et al (FT content below a certain amplitude is
ignored), and also knowing that the high frequencies present in audio
signals tend to have relatively low amplitudes, I can see how these two
issues together could be responsible for poor replication in the high
frequency region.
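The amplitude-threshold idea can be sketched as a naive single-frame peak picker (this is not SPEAR's actual algorithm; the thresholds and test frequencies here are made up for illustration): a weak high partial survives or disappears depending on where the threshold sits.

```python
import numpy as np

def pick_partials(x, sr, thresh_db):
    """Crude single-frame peak picking: return the frequencies (Hz) of
    spectral peaks whose level is within `thresh_db` of the strongest peak."""
    n = len(x)
    mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    floor = mag.max() * 10 ** (thresh_db / 20.0)
    peaks = []
    for k in range(1, len(mag) - 1):
        if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
            peaks.append(k * sr / n)
    return peaks

sr = 8000
t = np.arange(4096) / sr
# a strong partial at 250 Hz plus a weak (about -50 dB) partial at 3000 Hz
x = np.sin(2 * np.pi * 250 * t) + 0.003 * np.sin(2 * np.pi * 3000 * t)

print(pick_partials(x, sr, thresh_db=-40))   # weak high partial is discarded
print(pick_partials(x, sr, thresh_db=-60))   # weak high partial is kept
```

With the -40 dB threshold the -50 dB partial at 3000 Hz is thrown away, which is exactly the kind of high-frequency loss described above.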


Does anyone know of better software or methods for spectral
analysis/resynthesis of audio signals?

Does anyone know if improvements in spectral analysis algorithms beyond
what is currently employed in SPEAR et al can be expected?  

Are there insurmountable mathematical limitations which will prevent
sufficiently accurate spectral modeling such that the model is audibly
indistinguishable from the original?

maxplanck wrote:

> Does anyone know of better software or methods for spectral
> analysis/resynthesis of audio signals?
Shameless personal plug (but it seems appropriate): http://arss.sf.net (the Analysis & Resynthesis Sound Spectrograph). It has its limitations (mainly the "Heisenberg-ish issues" you were referring to), but I guess it's about as good as it gets when it comes to analysing a sound into a spectrogram and resynthesising it. I don't know much about SPEAR, but I seem to understand that it's more about modifying an STFT rather than going the whole mile the way the ARSS does, that is, synthesising from a sole bitmap image.
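The "synthesising from a bitmap" idea can be sketched very crudely (this is not the ARSS algorithm, just a naive sine-bank reading of a magnitude-only image, with made-up frequencies and sizes): each row of the image drives one oscillator whose amplitude follows the pixel values.

```python
import numpy as np

def image_to_sound(img, freqs, sr, frame_len):
    """Treat each row of `img` as the time-varying amplitude of one sine
    oscillator at the corresponding frequency; magnitudes only, with a
    free-running phase.  Each pixel's amplitude is held for one frame."""
    n_frames = img.shape[1]
    out = np.zeros(n_frames * frame_len)
    t = np.arange(len(out)) / sr
    for row, f in zip(img, freqs):
        env = np.repeat(row, frame_len)          # step-hold amplitude envelope
        out += env * np.sin(2 * np.pi * f * t)
    return out / len(freqs)

sr = 8000
freqs = [440.0, 880.0]
img = np.array([[1.0, 1.0, 1.0, 1.0],    # 440 Hz sounds the whole time
                [0.0, 0.0, 1.0, 1.0]])   # 880 Hz enters halfway through
y = image_to_sound(img, freqs, sr, frame_len=512)
```

A real tool would at least interpolate the envelopes between frames and be smarter about phase; this is only meant to show the direction image → sound.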
> Does anyone know if improvements in spectral analysis algorithms beyond
> what is currently employed in SPEAR et al can be expected?
Well, there are super-resolution techniques, such as the HHT, which I believe are mostly fairly recent. I intend to investigate such techniques soon, because it's pretty annoying to have to choose between time and frequency resolution.
Do you know about this?  It's apparently an improvement over the STFT
which increases time/frequency resolution:

http://en.wikipedia.org/wiki/Reassignment_method

Here are pictures of STFT spectrograms vs. reassigned spectrograms; the
increase in sharpness is quite remarkable:

http://www.cerlsoundgroup.org/Kelly/timefrequency.html


Sadly, I'm fairly sure that SPEAR uses the Reassignment Method (someone
please correct me if I'm wrong), and although it works fairly well, it
nevertheless almost always seems to produce a slightly "inferior" replica
of the original sound.

I'm wondering if we can expect improvements beyond what's been achieved
using the Reassignment Method.. or is there some mathematical proof that
this is about as close as we can get?
From your site:
"The original idea that prompted me to create this project was that if we
can synthesise a spectrogram into a sound, then one could very well learn
how to create all types of sounds by learning how to do so from studying
spectrograms of recorded sounds"

This is exactly what I'm interested in!  It makes sense.. sound design
from the fundamental building blocks, learn how to build with these blocks
by studying analyzed sounds.  If only there were an analysis method which
produced spectral models that were audibly indistinguishable from the
original sounds!  Without that I'm reluctant to start, I feel that it would
be like groping in the dark.. trial and error in a realm of infinite
possibilities, trying to stumble upon a way to recreate what is missing
from the model.. not likely to happen, unless it's something obvious.

Your HAL 9000 spectrogram/photoshop example is quite inspirational.  

I think that the Reassignment Method may be one of the ultra high
resolution methods you mentioned.
maxplanck wrote:

> This is exactly what I'm interested in! It makes sense.. sound design
> from the fundamental building blocks, learn how to build with these blocks
> by studying analyzed sounds. If only there were an analysis method which
> produced spectral models that were audibly indistinguishable from the
> original sounds! Without that I'm reluctant to start, I feel that it would
> be like groping in the dark.. trial and error in a realm of infinite
> possibilities, trying to stumble upon a way to recreate what is missing
> from the model.. not likely to happen, unless it's something obvious.
>
> Your HAL 9000 spectrogram/photoshop example is quite inspirational.
Well, even if such programs as the ARSS were able to achieve such high resolution that the resynthesised sound was indistinguishable from the original, it still wouldn't help so much if you wanted to create sounds on your own.

If your goal is to reproduce speech, quite like I did with the aforementioned HAL 9000 Photoshop example, you don't need much more resolution than what the traditional spectrography techniques can provide. Speech seems to be basically made of blobs of noise (the higher frequency components) and a bunch of parallel curves (which you usually find lower in the spectrogram). Which means you don't need to see much more than you already can to figure out precisely what "makes" speech. Just choose examples that are as high-pitched as possible; you'll get more resolution out of those.

On a side note, in case you intend to create speech from scratch just by putting phonemes together: I asked about it on this very newsgroup a few months ago and was informed that it isn't quite that simple, i.e. you can't just stick a bunch of unrelated phonemes together and expect it to work well. I haven't experimented with speech much though.

If you're interested in other things than just speech, well, some things will work great and some not so much, but the realm of possibilities we can explore right now with the tools available is quite vast. It's just better to keep a distance from bass sounds for now, though (because of resolution issues).
> I think that the Reassignment Method may be one of the ultra high
> resolution methods you mentioned.
Yup, it is indeed. I for one intend to explore a technique that I've only thought up myself and never yet read about. I'm quite convinced that it should work great, but the only time I talked about it, it got shot down, and that discouraged me from exploring it any further for some time.
On May 29, 9:56 pm, Michel Rouzic <Michel0...@yahoo.fr> wrote:

> maxplanck wrote:
> > ...
> > I think that the Reassignment Method may be one of the ultra high
> > resolution methods you mentioned.
> Yup, it is indeed.
> ...
In what sense of 'resolution' is the reassignment method an 'ultra high resolution' method, as opposed to the many other STFT based methods?

Dale B. Dalrymple
http://dbdimages.com
-Michel, unfortunately bass sounds are the ones that I'm most interested
in!  I'm not very interested in speech.

If the algorithm does not reproduce the bass sounds that I'm interested in
such that they are audibly indistinguishable from the originals, then it is
of limited use; I'm sure you understand what I'm talking about.  With an
audibly distinguishable reproduction, it's difficult or impossible to
determine what needs to be added to the model to make it audibly
indistinguishable from the original, or what, if anything, should be
removed from the model to bring it closer to the original.

But just seeing someone draw a spectral model by hand which sounds close
to a real world signal is inspirational!  I've analyzed and resynthesized
signals using oscillator banks, but visually memorizing patterns and
drawing them is the next step which I have not tried yet because of my
dissatisfaction with the analysis algorithms that I've been working with.



-Dale, here are pics of STFT Spectrograms vs Reassignment Method
Spectrograms

http://www.cerlsoundgroup.org/Kelly/timefrequency.html


The Reassignment Method uses phase to resolve/minimize bin leakage (or
something like that; I haven't read/thought about it carefully yet).  This
Wikipedia article gives a general description:

http://en.wikipedia.org/wiki/Reassignment_method

And I *think* this tutorial from dspdimension.com describes the method a
bit, in the section titled "From phase to frequency":

http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
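The "phase to frequency" trick can be sketched like this (a minimal, single-peak version in NumPy, not taken from that tutorial's code; all parameter values are arbitrary): the phase advance between two frames a known hop apart pins the frequency down far more precisely than the bin spacing.

```python
import numpy as np

def pv_freq(x, n0, nfft, hop, sr):
    """Refine the frequency of the strongest bin using the phase
    difference between two frames spaced `hop` samples apart."""
    w = np.hanning(nfft)
    X1 = np.fft.rfft(x[n0:n0 + nfft] * w)
    X2 = np.fft.rfft(x[n0 + hop:n0 + hop + nfft] * w)
    k = int(np.argmax(np.abs(X1)))                 # coarse peak bin
    dphi = np.angle(X2[k]) - np.angle(X1[k])       # measured phase advance
    expected = 2 * np.pi * k * hop / nfft          # advance if exactly on-bin
    # wrap the deviation into (-pi, pi] and turn it into a fractional bin
    dev = (dphi - expected + np.pi) % (2 * np.pi) - np.pi
    return (k + dev * nfft / (2 * np.pi * hop)) * sr / nfft

sr = 8000
f0 = 1003.7                             # deliberately between two FFT bins
t = np.arange(2 * sr) / sr
x = np.sin(2 * np.pi * f0 * t)

print(pv_freq(x, 0, 1024, 256, sr))     # close to 1003.7 Hz, even though the
                                        # bin spacing is 8000/1024 ~ 7.8 Hz
```

Note that this only works unambiguously when the true frequency is within about nfft/(2*hop) bins of the peak bin, and it assumes a single stationary sinusoid dominates that bin.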
On May 30, 12:19 am, dbd <d...@ieee.org> wrote:
> On May 29, 9:56 pm, Michel Rouzic <Michel0...@yahoo.fr> wrote:
> > maxplanck wrote:
> > > ...
> > > I think that the Reassignment Method may be one of the ultra high
> > > resolution methods you mentioned.
> > Yup, it is indeed.
> > ...
>
> In what sense of 'resolution' is the reassignment method an
> 'ultra high resolution' method, as opposed to the many other
> STFT based methods?
Reassignment is a form of STFT method, where reassignment uses multiple differently shaped windows per frame instead of just one. Using two different windows per frame allows differencing the phase of the halves of the frame FFTs in a similar manner to using multiple (two) shorter STFT frames with something like phase vocoder analysis.

If the two halves of the frame differ in spectrum, both reassignment and phase vocoder methods can produce bogus frequency 'resolution', unless one "knows" a priori that the two halves of the spectrum are related and consistent. Any information gain is related to the correctness of that assumption.

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
On May 30, 11:49 am, "maxplanck" <erik.bo...@comcast.net> wrote:
> ...
> -Dale, here are pics of STFT Spectrograms vs Reassignment Method
> Spectrograms
>
> http://www.cerlsoundgroup.org/Kelly/timefrequency.html
>
> The Reassignment Method uses phase to resolve/minimize bin leakage (or
> something like that, haven't read/thought carefully about it yet). This
> wikipedia article gives a general description,
>
> http://en.wikipedia.org/wiki/Reassignment_method
>
> And I *think* this tutorial from dspdimension.com describes the method a
> bit in the section titled "From phase to frequency"
>
> http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
maxplanck

Thanks, I'm familiar with the reassignment method. I didn't ask what it is. I asked:

In what sense of 'resolution' is the reassignment method an 'ultra high resolution' method, as opposed to the many other STFT based methods?

Dale B. Dalrymple
http://dbdimages.com
On May 30, 4:25 pm, Ron N <ron.nichol...@gmail.com> wrote:

> ...
> Reassignment is a form of STFT method, where reassignment
> uses multiple differently shaped windows per frame
> instead of just one. Using two different windows per frame
> allows differencing the phase of the halves of the frame FFTs
> in a similar manner to using multiple (two) shorter STFT
> frames with something like phase vocoder analysis.
The phase vocoder computes the phase of bins of STFTs calculated with different centers, and derives a frequency correction from the phase difference and the distance between the centers. Frequency reassignment calculates a frequency correction from the ratio of the STFTs of the same data with different windows. The similarity seems to be limited to both methods using two STFTs and two window applications.
> If the two halves of the frame differ in spectrum, both
> reassignment and phase vocoder methods can produce bogus
> frequency 'resolution', unless one "knows" a priori that
> the two halves of the spectrum are related and consistent.
> Any information gain is related to the correctness of that
> assumption.
> ...
> rhn A.T nicholson d.0.t C-o-M
Reassignment doesn't have two halves. The methods do share the assumption of stationarity.

But the question I asked was: in what sense of 'resolution' is the reassignment method an 'ultra high resolution' method, as opposed to the many other STFT based methods? Are reassignment results from proper data any different than the results from algorithms using peak picking and interpolation?

"A Unified Theory of Time-Frequency Reassignment" by Fitz and Fulop is available at:

http://www.eecs.wsu.edu/~kfitz/fitz05unified.html

In figure 3 on page 13 there is an example of how reassignment results match the peaks of a STFT.

Dale B. Dalrymple
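For what it's worth, the frequency-correction-from-a-ratio-of-two-STFTs idea discussed above can be sketched in a few lines (a single-frame toy using a Hann window and its time derivative; not taken from SPEAR or any particular implementation, and test values are made up):

```python
import numpy as np

def reassigned_freq(x, nfft, sr):
    """Reassigned frequency of the strongest bin, from the ratio of two
    FFTs of the same frame: one under a Hann window h, one under its
    time derivative dh.  For a pure tone, Im(Xdh/Xh) equals the offset
    between the bin frequency and the tone's true frequency."""
    n = np.arange(nfft)
    h = 0.5 * (1.0 - np.cos(2 * np.pi * n / nfft))        # periodic Hann
    dh = (np.pi / nfft) * np.sin(2 * np.pi * n / nfft)    # d/dn of the above
    Xh = np.fft.fft(x[:nfft] * h)
    Xdh = np.fft.fft(x[:nfft] * dh)
    k = int(np.argmax(np.abs(Xh[:nfft // 2])))            # coarse peak bin
    w_k = 2 * np.pi * k / nfft                            # bin freq, rad/sample
    w_hat = w_k - np.imag(Xdh[k] / Xh[k])                 # reassigned frequency
    return w_hat * sr / (2 * np.pi)

sr = 8000
f0 = 1003.7                          # deliberately between two FFT bins
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * f0 * t)

print(reassigned_freq(x, 1024, sr))  # close to 1003.7 Hz, from one frame
```

Which also illustrates Dale's point: for a clean stationary sinusoid this lands on essentially the same answer that peak picking plus interpolation would give; the correction is only as trustworthy as the stationarity assumption behind it.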