DSPRelated.com
Forums

Spectral Analysis & Resynthesis of Audio Signals.. Anyone else into this? Applications/Tools that reliably produce good results?

Started by maxplanck May 29, 2008
I've been extremely interested in Spectral Analysis & Resynthesis of Audio
Signals for as long as I've known of it.  The idea of understanding and
manipulating a timbre in terms of basic, intuitive building blocks seems
like the most useful tool imaginable for sound design.

There are a number of computer applications which combine spectral
analysis, resynthesis, and spectral manipulation.  My favorite so far is
SPEAR.  

Usually the results of spectral analysis followed by resynthesis sound
similar to the original signal, but not quite identical.  The replica
usually sounds like it's lost some high frequency content, sounds a bit
"muddy" for lack of a better term.  Even if the analysis parameters are
tweaked extensively, it seems impossible in most cases to produce a replica
that is audibly indistinguishable from the original, and the replica always
seems to sound "inferior" to the original from an aesthetic perspective.

Having studied the practical issues surrounding the Fourier Transform, I
realize that there are Heisenberg-ish issues related to frequency and time
resolution which likely contribute to imperfections in the replica (unless
clever ways to get around or sufficiently minimize this have been devised
which I don't know about).  
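To make that time/frequency trade-off concrete, here is a rough NumPy sketch (illustrative only, not tied to SPEAR or any particular tool; all frequencies and window sizes are made up): two tones 40 Hz apart merge into a single spectral lobe under a 256-point window, but are cleanly separated under a 4096-point window.

```python
import numpy as np

def dip_ratio(x, nfft, f_lo, f_hi, sr):
    """Magnitude at the midpoint between two tones, relative to the
    spectral maximum: near 1 means the tones merged into one lobe,
    near 0 means they are clearly resolved."""
    mag = np.abs(np.fft.rfft(x[:nfft] * np.hanning(nfft)))
    mid_bin = int(round((f_lo + f_hi) / 2 * nfft / sr))
    return mag[mid_bin] / mag.max()

sr = 8000
tc = 127.5 / sr                       # centre of the short analysis frame
t = np.arange(sr) / sr - tc           # one second of signal
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1040 * t)

r_short = dip_ratio(x, 256, 1000, 1040, sr)   # short window: no dip, merged
r_long = dip_ratio(x, 4096, 1000, 1040, sr)   # long window: deep dip, resolved
```

The short window gives good time localisation but cannot tell the two tones apart; the long window resolves them but smears any change over half a second of signal.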

Understanding the basic concept employed in the partial tracking
algorithms used by SPEAR et al (FT content below a certain amplitude is
ignored), and also knowing that the high frequencies present in audio
signals tend to have relatively low amplitudes, I can see how these two
issues together could be responsible for poor replication in the high
frequency region.
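The amplitude-threshold idea can be sketched as a naive single-frame peak picker (this is not SPEAR's actual algorithm; the thresholds and test frequencies here are made up for illustration): a weak high partial survives or disappears depending on where the threshold sits.

```python
import numpy as np

def pick_partials(x, sr, thresh_db):
    """Crude single-frame peak picking: return the frequencies (Hz) of
    spectral peaks whose level is within `thresh_db` of the strongest peak."""
    n = len(x)
    mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    floor = mag.max() * 10 ** (thresh_db / 20.0)
    peaks = []
    for k in range(1, len(mag) - 1):
        if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
            peaks.append(k * sr / n)
    return peaks

sr = 8000
t = np.arange(4096) / sr
# a strong partial at 250 Hz plus a weak (about -50 dB) partial at 3000 Hz
x = np.sin(2 * np.pi * 250 * t) + 0.003 * np.sin(2 * np.pi * 3000 * t)

print(pick_partials(x, sr, thresh_db=-40))   # weak high partial is discarded
print(pick_partials(x, sr, thresh_db=-60))   # weak high partial is kept
```

With the -40 dB threshold the -50 dB partial at 3000 Hz is thrown away, which is exactly the kind of high-frequency loss described above.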


Does anyone know of better software or methods for spectral
analysis/resynthesis of audio signals?

Does anyone know if improvements in spectral analysis algorithms beyond
what is currently employed in SPEAR et al can be expected?  

Are there insurmountable mathematical limitations which will prevent
sufficiently accurate spectral modeling such that the model is audibly
indistinguishable from the original?

maxplanck wrote:

> Does anyone know of better software or methods for spectral
> analysis/resynthesis of audio signals?
Shameless personal plug (but it seems appropriate): http://arss.sf.net (the Analysis & Resynthesis Sound Spectrograph). It has its limitations (mainly the "Heisenberg-ish issues" you were referring to), but I guess it's about as good as it gets when it comes to analysing a sound into a spectrogram and resynthesising it. I don't know much about SPEAR, but I seem to understand that it's more about modifying an STFT rather than going the whole mile the way the ARSS does, that is, synthesising from a sole bitmap image.
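The "synthesising from a bitmap" idea can be sketched very crudely (this is not the ARSS algorithm, just a naive sine-bank reading of a magnitude-only image, with made-up frequencies and sizes): each row of the image drives one oscillator whose amplitude follows the pixel values.

```python
import numpy as np

def image_to_sound(img, freqs, sr, frame_len):
    """Treat each row of `img` as the time-varying amplitude of one sine
    oscillator at the corresponding frequency; magnitudes only, with a
    free-running phase.  Each pixel's amplitude is held for one frame."""
    n_frames = img.shape[1]
    out = np.zeros(n_frames * frame_len)
    t = np.arange(len(out)) / sr
    for row, f in zip(img, freqs):
        env = np.repeat(row, frame_len)          # step-hold amplitude envelope
        out += env * np.sin(2 * np.pi * f * t)
    return out / len(freqs)

sr = 8000
freqs = [440.0, 880.0]
img = np.array([[1.0, 1.0, 1.0, 1.0],    # 440 Hz sounds the whole time
                [0.0, 0.0, 1.0, 1.0]])   # 880 Hz enters halfway through
y = image_to_sound(img, freqs, sr, frame_len=512)
```

A real tool would at least interpolate the envelopes between frames and be smarter about phase; this is only meant to show the direction image → sound.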
> Does anyone know if improvements in spectral analysis algorithms beyond
> what is currently employed in SPEAR et al can be expected?
Well, there are super-resolution techniques, such as the HHT, which I believe are mostly fairly recent. I intend to investigate such techniques soon, because it's pretty annoying to have to choose between time and frequency resolution.
Do you know about this?  It's apparently an improvement over the STFT
which increases time/frequency resolution:

http://en.wikipedia.org/wiki/Reassignment_method

Here are pictures of STFT spectrograms vs. reassigned spectrograms; the
increase in sharpness is quite remarkable:

http://www.cerlsoundgroup.org/Kelly/timefrequency.html


Sadly, I'm fairly sure that SPEAR uses the Reassignment Method (someone
please correct me if I'm wrong), and although it works fairly well, it
nevertheless almost always seems to produce a slightly "inferior" replica
of the original sound.

I'm wondering if we can expect improvements beyond what's been achieved
using the Reassignment Method.. or is there some mathematical proof that
this is about as close as we can get?
From your site:
"The original idea that prompted me to create this project was that if we
can synthesise a spectrogram into a sound, then one could very well learn
how to create all types of sounds by learning how to do so from studying
spectrograms of recorded sounds"

This is exactly what I'm interested in!  It makes sense.. sound design
from the fundamental building blocks, learn how to build with these blocks
by studying analyzed sounds.  If only there were an analysis method which
produced spectral models that were audibly indistinguishable from the
original sounds!  Without that I'm reluctant to start, I feel that it would
be like groping in the dark.. trial and error in a realm of infinite
possibilities, trying to stumble upon a way to recreate what is missing
from the model.. not likely to happen, unless it's something obvious.

Your HAL 9000 spectrogram/photoshop example is quite inspirational.  

I think that the Reassignment Method may be one of the ultra high
resolution methods you mentioned.
maxplanck wrote:

> This is exactly what I'm interested in! It makes sense.. sound design
> from the fundamental building blocks, learn how to build with these blocks
> by studying analyzed sounds. If only there were an analysis method which
> produced spectral models that were audibly indistinguishable from the
> original sounds! Without that I'm reluctant to start, I feel that it would
> be like groping in the dark.. trial and error in a realm of infinite
> possibilities, trying to stumble upon a way to recreate what is missing
> from the model.. not likely to happen, unless it's something obvious.
>
> Your HAL 9000 spectrogram/photoshop example is quite inspirational.
Well, even if such programs as the ARSS were able to achieve such high resolution that the resynthesised sound was indistinguishable from the original, it still wouldn't help so much if you wanted to create sounds on your own.

If your goal is to reproduce speech, quite like I did with the aforementioned HAL 9000 Photoshop example, you don't need much more resolution than what the traditional spectrography techniques can provide. Speech seems to be basically made of blobs of noise (the higher frequency components) and a bunch of parallel curves (which you usually find lower in the spectrogram). Which means you don't need to see much more than you already can to figure out precisely what "makes" speech. Just choose examples that are as high-pitched as possible; you'll get more resolution out of those.

On a side note, in case you intend to create speech from scratch just by putting phonemes together: I asked about it on this very newsgroup a few months ago and was informed that it isn't quite that simple, i.e. you can't just stick a bunch of unrelated phonemes together and expect it to work well. I haven't experimented with speech much though.

If you're interested in other things than just speech, well, some things will work great and some not so much, but the realm of possibilities we can explore right now with the tools available is quite vast. It's just better to keep a distance from bass sounds for now, though (because of resolution issues).
> I think that the Reassignment Method may be one of the ultra high
> resolution methods you mentioned.
Yup, it is indeed. I for one intend to explore a technique that I've only thought up myself and never yet read about. I'm quite convinced that it should work great, but the only time I talked about it, it got shot down, and that discouraged me from exploring it any further for some time.
On May 29, 9:56 pm, Michel Rouzic <Michel0...@yahoo.fr> wrote:

> maxplanck wrote:
> > ...
> > I think that the Reassignment Method may be one of the ultra high
> > resolution methods you mentioned.
> Yup, it is indeed.
> ...
In what sense of 'resolution' is the reassignment method an 'ultra high resolution' method, as opposed to the many other STFT based methods?

Dale B. Dalrymple
http://dbdimages.com
-Michel, unfortunately bass sounds are the ones that I'm most interested
in!  I'm not very interested in speech.

If the algorithm does not reproduce the bass sounds that I'm interested in
such that they are audibly indistinguishable from the originals, then it is
of limited use; I'm sure you understand what I'm talking about.  With an
audibly distinguishable reproduction, it's difficult or impossible to
determine what needs to be added to the model to make it audibly
indistinguishable from the original, or what, if anything, should be
removed from the model to bring it closer to the original.

But just seeing someone draw a spectral model by hand which sounds close
to a real world signal is inspirational!  I've analyzed and resynthesized
signals using oscillator banks, but visually memorizing patterns and
drawing them is the next step which I have not tried yet because of my
dissatisfaction with the analysis algorithms that I've been working with.



-Dale, here are pics of STFT Spectrograms vs Reassignment Method
Spectrograms

http://www.cerlsoundgroup.org/Kelly/timefrequency.html


The Reassignment Method uses phase to resolve/minimize bin leakage (or
something like that; I haven't read/thought about it carefully yet).  This
Wikipedia article gives a general description:

http://en.wikipedia.org/wiki/Reassignment_method

And I *think* this tutorial from dspdimension.com describes the method a
bit, in the section titled "From phase to frequency":

http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
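The "phase to frequency" trick can be sketched like this (a minimal, single-peak version in NumPy, not taken from that tutorial's code; all parameter values are arbitrary): the phase advance between two frames a known hop apart pins the frequency down far more precisely than the bin spacing.

```python
import numpy as np

def pv_freq(x, n0, nfft, hop, sr):
    """Refine the frequency of the strongest bin using the phase
    difference between two frames spaced `hop` samples apart."""
    w = np.hanning(nfft)
    X1 = np.fft.rfft(x[n0:n0 + nfft] * w)
    X2 = np.fft.rfft(x[n0 + hop:n0 + hop + nfft] * w)
    k = int(np.argmax(np.abs(X1)))                 # coarse peak bin
    dphi = np.angle(X2[k]) - np.angle(X1[k])       # measured phase advance
    expected = 2 * np.pi * k * hop / nfft          # advance if exactly on-bin
    # wrap the deviation into (-pi, pi] and turn it into a fractional bin
    dev = (dphi - expected + np.pi) % (2 * np.pi) - np.pi
    return (k + dev * nfft / (2 * np.pi * hop)) * sr / nfft

sr = 8000
f0 = 1003.7                             # deliberately between two FFT bins
t = np.arange(2 * sr) / sr
x = np.sin(2 * np.pi * f0 * t)

print(pv_freq(x, 0, 1024, 256, sr))     # close to 1003.7 Hz, even though the
                                        # bin spacing is 8000/1024 ~ 7.8 Hz
```

Note that this only works unambiguously when the true frequency is within about nfft/(2*hop) bins of the peak bin, and it assumes a single stationary sinusoid dominates that bin.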
On May 30, 12:19 am, dbd <d...@ieee.org> wrote:
> On May 29, 9:56 pm, Michel Rouzic <Michel0...@yahoo.fr> wrote:
> > maxplanck wrote:
> > > ...
> > > I think that the Reassignment Method may be one of the ultra high
> > > resolution methods you mentioned.
> > Yup, it is indeed.
> > ...
>
> In what sense of 'resolution' is the reassignment method an
> 'ultra high resolution' method, as opposed to the many other
> STFT based methods?
Reassignment is a form of STFT method, where reassignment uses multiple differently shaped windows per frame instead of just one. Using two different windows per frame allows differencing the phase of the halves of the frame FFTs in a similar manner to using multiple (two) shorter STFT frames with something like phase vocoder analysis.

If the two halves of the frame differ in spectrum, both reassignment and phase vocoder methods can produce bogus frequency 'resolution', unless one "knows" a priori that the two halves of the spectrum are related and consistent. Any information gain is related to the correctness of that assumption.

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
On May 30, 11:49 am, "maxplanck" <erik.bo...@comcast.net> wrote:
> ...
> -Dale, here are pics of STFT Spectrograms vs Reassignment Method
> Spectrograms
>
> http://www.cerlsoundgroup.org/Kelly/timefrequency.html
>
> The Reassignment Method uses phase to resolve/minimize bin leakage (or
> something like that, haven't read/thought carefully about it yet). This
> wikipedia article gives a general description,
>
> http://en.wikipedia.org/wiki/Reassignment_method
>
> And I *think* this tutorial from dspdimension.com describes the method a
> bit in the section titled "From phase to frequency"
>
> http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
maxplanck

Thanks, I'm familiar with the reassignment method. I didn't ask what it is. I asked:

In what sense of 'resolution' is the reassignment method an 'ultra high resolution' method, as opposed to the many other STFT based methods?

Dale B. Dalrymple
http://dbdimages.com
On May 30, 4:25 pm, Ron N <ron.nichol...@gmail.com> wrote:

> ...
> Reassignment is a form of STFT method, where reassignment
> uses multiple differently shaped windows per frame
> instead of just one. Using two different windows per frame
> allows differencing the phase of the halves of the frame FFTs
> in a similar manner to using multiple (two) shorter STFT
> frames with something like phase vocoder analysis.
The phase vocoder computes the phase of bins of STFTs calculated with different centers, and derives a frequency correction from the phase difference and the distance between the centers. Frequency reassignment calculates a frequency correction from the ratio of the STFTs of the same data with different windows. The similarity seems to be limited to both methods using two STFTs and two window applications.
> If the two halves of the frame differ in spectrum, both
> reassignment and phase vocoder methods can produce bogus
> frequency 'resolution', unless one "knows" a priori that
> the two halves of the spectrum are related and consistent.
> Any information gain is related to the correctness of that
> assumption.
> ...
> rhn A.T nicholson d.0.t C-o-M
Reassignment doesn't have two halves. The methods do share the assumption of stationarity.

But the question I asked was: in what sense of 'resolution' is the reassignment method an 'ultra high resolution' method, as opposed to the many other STFT based methods? Are reassignment results from proper data any different than the results from algorithms using peak picking and interpolation?

"A Unified Theory of Time-Frequency Reassignment" by Fitz and Fulop is available at:

http://www.eecs.wsu.edu/~kfitz/fitz05unified.html

In figure 3 on page 13 there is an example of how reassignment results match the peaks of a STFT.

Dale B. Dalrymple
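For what it's worth, the frequency-correction-from-a-ratio-of-two-STFTs idea discussed above can be sketched in a few lines (a single-frame toy using a Hann window and its time derivative; not taken from SPEAR or any particular implementation, and test values are made up):

```python
import numpy as np

def reassigned_freq(x, nfft, sr):
    """Reassigned frequency of the strongest bin, from the ratio of two
    FFTs of the same frame: one under a Hann window h, one under its
    time derivative dh.  For a pure tone, Im(Xdh/Xh) equals the offset
    between the bin frequency and the tone's true frequency."""
    n = np.arange(nfft)
    h = 0.5 * (1.0 - np.cos(2 * np.pi * n / nfft))        # periodic Hann
    dh = (np.pi / nfft) * np.sin(2 * np.pi * n / nfft)    # d/dn of the above
    Xh = np.fft.fft(x[:nfft] * h)
    Xdh = np.fft.fft(x[:nfft] * dh)
    k = int(np.argmax(np.abs(Xh[:nfft // 2])))            # coarse peak bin
    w_k = 2 * np.pi * k / nfft                            # bin freq, rad/sample
    w_hat = w_k - np.imag(Xdh[k] / Xh[k])                 # reassigned frequency
    return w_hat * sr / (2 * np.pi)

sr = 8000
f0 = 1003.7                          # deliberately between two FFT bins
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * f0 * t)

print(reassigned_freq(x, 1024, sr))  # close to 1003.7 Hz, from one frame
```

Which also illustrates Dale's point: for a clean stationary sinusoid this lands on essentially the same answer that peak picking plus interpolation would give; the correction is only as trustworthy as the stationarity assumption behind it.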