On Wed, 16 Dec 2015 12:13:53 -0800 (PST), dbd <d.dalrymple@sbcglobal.net> wrote:
>On Friday, December 11, 2015 at 9:07:09 PM UTC-8, Max wrote:
>> The S-transform sounds like a cross between wavelets and STFTs. That
>> should be a good start. The link above didn't work though: 403 error,
>> which I assume means that authorization is required.
>
>The url is a free journal. I haven't paid for access. Perhaps your ISP has been blacklisted.
>A simple google finds the original S-Transform paper:
>
>Stockwell, RG, L Mansinha, and RP Lowe (1996). Localization of the complex spectrum: the S transform, IEEE Transactions on Signal Processing 44 (4), p 998-1001.
>
>Available (today at least) at:
>http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.1500&rep=rep1&type=pdf
>
>Some other general papers that might be of interest are:
>
>E. Sejdić, I. Djurović, J. Jiang, "Time-frequency feature representation using energy concentration: An overview of recent advances," Digital Signal Processing, vol. 19, no. 1, pp. 153-183, January 2009.
>http://www.imedlab.org/publications/tfr%20review.pdf
>
>R. A. Brown and R. Frayne, "A fast discrete S-transform for biomedical signal processing", University of Calgary Seaman Family MR Research Centre Foothills Medical Centre, Canada.
>http://www.ncbi.nlm.nih.gov/pubmed/19163232
>
>Dale B. Dalrymple

Hi Dale,

Persistent '403' at the original URL when I tried it last, but I just tried again, and it worked. Thanks for the links to other papers as well. I imagine they'll be challenging, but hopefully there is a worthwhile avenue, especially in that I've been exploring 'feature extraction' for audio, which could make use of wavelet-like analysis in places.
Adaptive window size for STFT
Started by ●December 9, 2015
Reply by ●December 16, 2015
Reply by ●December 16, 2015
Thanks for the references, Wolfgang. Lots of studying to do now!

On Tue, 15 Dec 2015 00:04:39 +0100, Wolfgang Sörgel <wsoergel_news@go4more.de> wrote:
>There are methods, developed for low-delay noise reduction in audio:
>
>You could look up the STFT window switching by Mauler & Martin:
>
>http://www.asp.eurasipjournals.com/content/pdf/1687-6180-2009-469480.pdf
>
>Another method, which also allows frequency selective resolution:
>
>http://ftp.esat.kuleuven.be/pub/SISTA/ida/reports/13-228.pdf
>
>Generally: These methods do work. In practice, restrictions may arise if
>the time-frequency analysis-synthesis is not only for a single purpose,
>such as noise reduction, but rather more objectives apply. Will a
>variable scheme be suitable for all applications?
>
>On 12/09/2015 05:15 PM, Max wrote:
>> I'm interested in finding out more about refining time-resolution by
>> use of multiple window widths for STFT's. In other words, wider
>> windows for lower frequencies, narrower for better time-resolution for
>> highs.
>>
>> Any recommendations for starting points?
>>
>> Also, are there any pitfalls to watch for? It seems an obvious thing
>> to do, but most of the time, when I see mention of STFT's, one window
>> size is used for all frequencies.
>>
Reply by ●December 19, 2015
On Friday, December 11, 2015 at 11:07:09 PM UTC-6, Max wrote:
> On Wed, 9 Dec 2015 15:22:05 -0800 (PST), dbd wrote:
> >On Wednesday, December 9, 2015 at 8:15:16 AM UTC-8, Max wrote:
> >> I'm interested in finding out more about refining time-resolution by
> >> use of multiple window widths for STFT's. In other words, wider
> >> windows for lower frequencies, narrower for better time-resolution for
> >> highs.
> >>
> >> Any recommendations for starting points?
> >>
> >> Also, are there any pitfalls to watch for? It seems an obvious thing
> >> to do, but most of the time, when I see mention of STFT's, one window
> >> size is used for all frequencies.
> >
> >A commonly used and well supported approach is the S-Transform.
> >
> >Stockwell, RG, L Mansinha, and RP Lowe (1996). Localization of the complex spectrum: the S transform, IEEE Transactions on Signal Processing 44 (4), p 998-1001.
> >Abstract
> >The S transform, which is introduced in this correspondence,
> >is an extension of the ideas of the continuous wavelet transform (CWT)
> >and is based on a moving and scalable localizing Gaussian window. It
> >is shown here to have some desirable characteristics that are absent in
> >the continuous wavelet transform. The S transform is unique in that
> >it provides frequency-dependent resolution while maintaining a direct
> >relationship with the Fourier spectrum. These advantages of the S
> >transform are due to the fact that the modulating sinusoids are fixed
> >with respect to the time axis, whereas the localizing scalable Gaussian
> >window dilates and translates.
> >
> >A simpler discussion:
> >
> >International Journal of Signal Processing, Image Processing and Pattern Recognition Vol.6, No.5 (2013), pp.245
> >
> >Time-frequency Analysis Based on the S-transform
> >Lin Yun*, Xu Xiaochun, Li Bin and Pang Jinfeng
> >College of Information and Communication Engineering
> >Harbin Engineering University
> >Harbin, China
> >
> >Abstract
> >S-transform is a new time-frequency analysis method, which is deduced from short-time Fourier transform and continue Wavelet transform. It has much better performance than traditional time-frequency method. Therefore, in this paper, the basic principle of is briefly introduced and the relationships between is analyzed by theoretical derivation. According to the simulation experiments, the time-frequency space characteristics of short-time Fourier transform, Wigner-Ville distribution and S-transform are contrasted. As the results shown, the window of S-transform has a progressive frequency dependent resolution. So the S-transform has a great flexibility and utility in the processing of non-stationary signal.
> >Compare with the time-frequency spectrum of three different analysis methods under various noise conditions, it is obvious that S-transform has much better anti-noise performance than that of traditional methods for non-stationary signal processing. Based on the superior time-frequency resolution, the S-transform spectrum can be used to describe the structure of incoming signal effectively
> >
> >At:
> >http://www.sersc.org/journals/IJSIP/vol6_no5/22.pdf
> >
> >Dale B. Dalrymple
>
> Hi Dale,
>
> The S-transform sounds like a cross between wavelets and STFTs. That
> should be a good start. The link above didn't work though: 403 error,
> which I assume means that authorization is required.

Any transform with a window size that proportionally tracks the frequency is essentially a wavelet transform.
The only question that distinguishes one from another is what inversion formula is used and what "admissibility" conditions there are for inversion.

The S-transform actually uses the same *forward* transform as the wavelet transform (that is, when the former is suitably generalized to allow for windowing functions other than the gaussian). It differs from the wavelet transform only in (a) the inverse transform formula and (b) the precondition for inversion.

There is a slightly different transform which I've used (and stumbled upon) that ALSO uses the same forward transform but has a notable advantage in that the inverse transform is "just add up the (complex) components" -- in the TIME domain(!) I don't need to emphasize the obvious advantages this has for doing manipulations and analysis with the spectrum. One of the biggest advantages is that you can freely relocate any part of the spectrum anywhere vertically -- thus making relocation by "instantaneous frequency" an order of magnitude simpler.

The admissibility conditions for both the S-transform and this (unnamed) latter one are quite lenient, and both allow for gaussian (or gaussian on a log-frequency scale) or cosine windows.

I demonstrate the latter here (where the transform was actually done in the time domain!), in a video where I misnamed it the S-transform before getting fully up to speed on the issue:
https://www.youtube.com/watch?v=itUSUau6DJM

I have source code (as I alluded to or mentioned a few articles back) for doing all the above.

WtoB -- STFT with fixed window size. dB or linear amplitudes, graph size specifiable, hop and DFT size specifiable (i.e. bristling with options almost as badly as ffmpeg is). Basically this is like a cross-modal "ffmpeg Sound.wav Graph.bmp" conversion -- except that ffmpeg does not cross over between sound and graphs (yet!)

BtoW -- a peak finder that is almost good enough to be an inverse to WtoB (hence the name).

EnScale -- the routine that produces graphs like those seen in the YouTube video on the last (unnamed) transform. Purely time domain. Phases are displayed color-coded, and the time resolution can go WAY up (like 5280 pixels/second).

DeScale -- the "add 'em up" inverse for EnScale. Needs high enough time resolution to see the phase. Does not yet incorporate any of the tricks used in BtoW, though.

ARSS (my version) -- does logarithmic scale transforms, just like the original article is seeking out. Can do linear frequency scale or hybrid linear/log. I set it up to show dB or linear amplitude output. Essentially this is an S-transform with a cosine window on a log frequency scale.

The source code for all the above is VERY SMALL (< 1000 lines each). A few extra files are needed to resolve internal dependencies -- but there are NO external dependencies. (ARSS originally had a dependency on the huge FFTW library, which you can always put back if you want to change the code, though it's not worth it.)

The code -- all in C -- is available by e-mail. Just drop me a note. I'll try to get it wrapped and cleaned up a bit and maybe add in a few notes on the algorithms and analyses.
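As a rough illustration of the forward transform discussed above (a Gaussian window whose width shrinks in proportion to 1/frequency, evaluated directly in the time domain), here is a minimal Python sketch. It is not the EnScale or ARSS code, which is in C and not shown in this thread; the function name `s_transform_point`, the test signal, and the sample rate are all assumptions made for the example. The window follows the Gaussian form in the Stockwell abstract quoted earlier, with standard deviation 1/|f| seconds.

```python
import cmath
import math


def s_transform_point(x, fs, tau, f):
    """One S-transform sample at time tau (seconds) and frequency f (Hz, nonzero).

    Direct time-domain sum: the signal is weighted by a unit-area Gaussian
    centred at tau whose standard deviation is 1/|f| seconds, then correlated
    with a complex sinusoid that is fixed with respect to the time axis.
    """
    total = 0j
    for n, xn in enumerate(x):
        t = n / fs
        window = abs(f) / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * ((tau - t) * f) ** 2)
        total += xn * window * cmath.exp(-2j * math.pi * f * t)
    return total / fs  # multiply by dt = 1/fs to approximate the integral


# Hypothetical test signal: one second of a 50 Hz cosine sampled at 1 kHz.
fs = 1000.0
x = [math.cos(2.0 * math.pi * 50.0 * n / fs) for n in range(1000)]

on_tone = abs(s_transform_point(x, fs, 0.5, 50.0))    # analysis frequency on the tone
off_tone = abs(s_transform_point(x, fs, 0.5, 120.0))  # analysis frequency well away from it
```

Because the window at 120 Hz is narrower in time than the one at 50 Hz, it is correspondingly wider in frequency; that is the frequency-dependent resolution the thread is about. A practical implementation would compute whole rows at once (usually via the FFT) rather than one point per call.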
Reply by ●December 28, 2015
On Wednesday, December 16, 2015 at 9:47:39 PM UTC-6, Max wrote:
> On Wed, 16 Dec 2015 12:13:53 -0800 (PST), dbd wrote:
>
> >On Friday, December 11, 2015 at 9:07:09 PM UTC-8, Max wrote:
> >> The S-transform sounds like a cross between wavelets and STFTs. That
> >> should be a good start. The link above didn't work though: 403 error,
> >> which I assume means that authorization is required.
> >
> >The url is a free journal. I haven't paid for access. Perhaps your ISP has been blacklisted.
> >A simple google finds the original S-Transform paper:
> >
> >Stockwell, RG, L Mansinha, and RP Lowe (1996). Localization of the complex spectrum: the S transform, IEEE Transactions on Signal Processing 44 (4), p 998-1001.
> >
> >Available (today at least) at:
> >http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.1500&rep=rep1&type=pdf
> >
> >Some other general papers that might be of interest are:
> >
> >E. Sejdić, I. Djurović, J. Jiang, "Time-frequency feature representation using energy concentration: An overview of recent advances," Digital Signal Processing, vol. 19, no. 1, pp. 153-183, January 2009.
> >http://www.imedlab.org/publications/tfr%20review.pdf
> >
> >R. A. Brown and R. Frayne, "A fast discrete S-transform for biomedical signal processing", University of Calgary Seaman Family MR Research Centre Foothills Medical Centre, Canada.
> >http://www.ncbi.nlm.nih.gov/pubmed/19163232
> >
> >Dale B. Dalrymple
>
> Hi Dale,
>
> Persistent '403' at the original URL when I tried it last, but I just
> tried again, and it worked. Thanks for the links to other papers as
> well. I imagine they'll be challenging, but hopefully there is a
> worthwhile avenue, especially in that I've been exploring 'feature
> extraction' for audio, which could make use of wavelet-like analysis
> in places.

A good followup to these that expands on my mention a couple articles up about relocation is

Estimating and Interpreting the Instantaneous Frequency of a Signal
Part 1: Fundamentals
Boualem Boashash, Senior Member, IEEE
Proceedings of the IEEE, Volume 80, No. 4, April 1992, pp. 520-538.

which can be found on the web as well by Google search.

Relocation undoes the effects of "spectral leakage" by making effective use of the phase. This goes particularly well with any scale-covariant transform (i.e. any of the transforms, like continuous wavelet, S-transform, or the unnamed one I mentioned before, that amount to the use of a logarithmic scale for frequency).

I'm not all too keen on the Stockwell articles, because of their gaps and conclusions that go astray (so I rewrote them, ramped up the figures to colored figures, and added in the missing talking points). One of the biggest missing features is the absence of any energy integral! In fact, there is one that also happens to directly tie into the "instantaneous frequency" issue raised in Stockwell's papers. It has the curious form that the total energy counts negative frequencies as negative, and the formula looks like one seen in quantum theory. (A non-issue if you restrict yourself to the analytic signal and positive frequencies.)

The BtoW program I mentioned does rudimentary relocation by locating frequencies at peaks in the DFT spectrum (and using quadratic fitting to estimate the peak). The results are quite good and (contrary to what I said in my last two articles) they DO have the ability to produce white noise. (Although I'm not sure why, since WtoB was originally meant to be a noise-eliminating program!)

A demo of what you can do -- this comes straight off the spectrograph depicted in the video ... with a 2-3-4 harmonic keying to add a Rock/House music chord effect.

Experiment: House/Techno by 100% spectrographic production
https://www.youtube.com/watch?v=aWpAWXmMbqI

If you want PDFs for the rewrites I mentioned above of the Stockwell papers, let me know. The "Parseval" energy integral is worth seeing if nothing more.

All the transforms, by the way, can be combined within a SINGLE unifying framework by the integrals

   f(q,p) = integral f(t) (g_p(t - q) 1^{p(t - q)})* dt
   f(t) = integral f(q,p) g^p(t - q) 1^{p(t - q)} dq dp

where
   1^x stands for exp(2 pi i x)
   the analysis windowing functions are g_p(t - q)
   the synthesis functions are g^p(t - q)
   either or both may be complex
   ()* = complex conjugate
   the precondition for the inversion formula to work is
      integral g_p(t - q)* g^p(t - q) dp = 1

Windowed Fourier Transforms (or STFT) have functions g_p(Q) = g(Q) and g^p(Q) = h(Q) that are independent of the frequency and ... consequently ... windowing sizes independent of frequency p.

The Wavelet and S transforms have g_p(Q) = g(pQ) |p|^A, where the power A can be chosen in various ways, and g^p(Q) = h(pQ) |p|^{1-A}. Depending on how h is chosen you get (generalized) S or wavelet.

It is possible to set h(Q) so that the inverse amounts to just adding up p's:

   f(q) = integral f(q,p) mu(p) dp

where mu(p) determines whether you have a log or linear axis.
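The post above mentions that BtoW locates frequencies at peaks in the DFT spectrum and uses quadratic fitting to estimate the peak. BtoW's source is not shown here, so the following is only a sketch of the general technique (a parabola through the peak bin and its two neighbours), written in Python with the standard library. The names `dft_magnitudes` and `quadratic_peak`, the Hann window, and the test tone at 10.3 bins are assumptions for the example.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the first half of the DFT of x (direct O(N^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
        for m in range(n // 2)
    ]


def quadratic_peak(mags, m):
    """Fit a parabola through bins m-1, m, m+1.

    Returns (fractional bin position of the vertex, height at the vertex).
    """
    a, b, c = mags[m - 1], mags[m], mags[m + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:          # flat neighbourhood: nothing to refine
        return float(m), b
    offset = 0.5 * (a - c) / denom
    return m + offset, b - 0.25 * (a - c) * offset


# Hypothetical test: a Hann-windowed sinusoid lying between bins 10 and 11.
n = 256
true_bin = 10.3
x = [
    (0.5 - 0.5 * math.cos(2.0 * math.pi * k / n)) * math.cos(2.0 * math.pi * true_bin * k / n)
    for k in range(n)
]
mags = dft_magnitudes(x)
m = max(range(1, len(mags) - 1), key=lambda i: mags[i])  # coarse peak bin
est_bin, est_height = quadratic_peak(mags, m)            # refined estimate
```

Fitting the parabola to log magnitudes (dB) instead of linear magnitudes usually reduces the bias further, since a windowed peak is closer to parabolic on a log scale. Phase-based relocation as described in the Boashash reference is a separate and more accurate technique that this sketch does not attempt.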
Reply by ●December 29, 2015
On Wednesday, December 16, 2015 at 9:47:39 PM UTC-6, Max wrote:
> especially in that I've been exploring 'feature
> extraction' for audio, which could make use of wavelet-like analysis
> in places.

You'll notice in the YouTube demo I posted a link to yesterday, the spectrum had broad triangular ghost-like lines in it producing the Alien sounds. The lines are actually *added*, which means (since the spectrum is on a dB scale) that the respective spectra are being multiplied; i.e. the respective sounds undergo *convolution*.

The same thing applies to voice: the distinctive quality of a voice is, for the most part, convolved with the distinctive languagey-part of the speech. The latter you can produce in isolation as a "white noise response" (i.e. whisper the sound). Then convolve it with the voice and you get the speech. The components on a dB spectrum are simply added. So working backwards, you'd end up finding the best-fitting voice pattern to graft onto the language part and separate the two.

The combination method will even work if the "voice" is music! (The music starts to talk.) I ran a test on this earlier today and it sounds spooky. I'll post a demo when I get time.

About feature extraction in general: you'll see a demo of this in my video
https://www.youtube.com/watch?v=Qok1QLMhrxc
where I separate the control voice (in The Outer Limits intro) from the background beacon tone.

Most of what you're seeking to do in this respect is *independent* of what time-frequency analysis method you use, so the main topic of the article actually is a separate issue.

One of the reasons I posted the references about Instantaneous Frequency hinges on this very question (and it is why it's an important staple in DSP, as well as an issue Stockwell raised prominently in his papers): sound components take the form of lines in a spectrum. So focusing everything in the spectrum down to its "chirp lines" (i.e. undoing the effects of both spectral leakage and its *temporal* analogue) will yield the components.

The method I used in the above video was simply to do factor analysis on the spectrum! More precisely, singular value decomposition on the matrix making up the spectrum. The program DeLayer.c (that I mentioned a few articles back, which works with the output of my other programs BtoW and WtoB) will do least-squares factor analysis. If you use enough factors, I found that the *individual notes* in a music source will tend to be extracted, among other things.

Technically, least-squares-based methods are not appropriate for amplitude spectrographs since amplitudes are non-negative, though they might work on spectrographs where you have complex amplitudes available (i.e. amplitude + phase).

To do this problem right requires going over to CONVEX optimization and convex programming, which feature extraction is the prime exemplar of.

The convex programming problem, given an amplitude spectrograph A(q,p), is

   minimize |A(q,p) - sum_f L_f(q) R_f(p)|^2

subject to the convex constraints

   A(q,p) >= 0
   L_f(q) >= 0 (the time series for factor f)
   R_f(p) >= 0 (the spectrum for factor f)

This extracts a "sound/music" score from a spectrum.

Having said what I just said: least squares analysis WILL extract factors, but it tends to pair off opposing factors, like this

   L(q) = L_1(q) + L_2(q), R(p) = R_1(p) + R_2(p)

where the pairs (L_1,R_1) and (L_2,R_2) are non-zero over non-overlapping regions in the (q,p) time-frequency plane.
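To make the least-squares factor analysis above a little more concrete, here is a small Python sketch that pulls out one factor at a time from a matrix A(q,p) by power iteration and then subtracts it (deflation). This is a stand-in chosen for brevity, not the method used in DeLayer.c, whose source is not shown in the thread; the names `rank1_factor` and `subtract_factor` and the 2x2 test matrix are assumptions for the example. The leading factor it finds is the same one a singular value decomposition would give.

```python
import math


def rank1_factor(A, iters=100):
    """Best least-squares rank-1 fit A[q][p] ~ L[q] * R[p], by power iteration.

    L has unit length; R carries the scale (the singular value).
    """
    rows, cols = len(A), len(A[0])
    R = [1.0 + 0.1 * p for p in range(cols)]  # arbitrary non-symmetric start
    L = [0.0] * rows
    for _ in range(iters):
        L = [sum(A[q][p] * R[p] for p in range(cols)) for q in range(rows)]
        norm = math.sqrt(sum(v * v for v in L))
        if norm == 0.0:      # nothing left to extract
            return [0.0] * rows, [0.0] * cols
        L = [v / norm for v in L]
        R = [sum(A[q][p] * L[q] for q in range(rows)) for p in range(cols)]
    return L, R


def subtract_factor(A, L, R):
    """Residual after removing the rank-1 term L[q] * R[p] from A."""
    return [[A[q][p] - L[q] * R[p] for p in range(len(R))] for q in range(len(L))]


# Hypothetical non-negative "spectrograph" with two overlapping components.
A = [[2.0, 1.0],
     [1.0, 2.0]]
L1, R1 = rank1_factor(A)
residual = subtract_factor(A, L1, R1)
L2, R2 = rank1_factor(residual)
```

In this toy case the second factor comes out with entries of opposite sign even though every entry of A is non-negative, which illustrates why plain least squares is an awkward fit for amplitude spectrographs and why the non-negativity constraints matter.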
Reply by ●December 29, 2015
On Tuesday, December 29, 2015 at 7:29:27 PM UTC-6, federat...@netzero.com wrote:
> To do this problem right requires going over to CONVEX optimization and convex programming which feature extraction is the prime exemplar of.
>
> The convex programming problem, given an amplitude spectrograph A(q,p) is minimize |A(q,p) - sum_f L_f(q) R_f(p)|^2 subject to the convex constraints
> A(q,p) >= 0
> L_f(q) >= 0 (the time series for factor f)
> R_f(p) >= 0 (the spectrum for factor f)
> This extracts a "sound/music" score from a spectrum.
>
> Having said what I just said: least squares analysis WILL extract factors, but it tends to pair off opposing factors, like this
> L(q) = L_1(q) + L_2(q), R(p) = R_1(p) + R_2(p)
> where the pairs (L_1,R_1) and (L_2,R_2) are non-zero over non-overlapping regions in the (q,p) time-frequency plane.

I forgot: a reference to search for

Convex Optimization (730 pages, PDF)
Stephen Boyd
Lieven Vandenberghe

should be accessible in PDF form via web search (which is where I got it). The problem I alluded to was discussed in greater generality in Chapter 4. Methods and algorithms are in Chapters 9-11.

Convex programming, though a form of (the generally intractable) NON-linear programming, has the same time-saving efficiencies as linear programming, and so is tractable. The convex programming problem I posed will tend to lock onto the naturally occurring vertical and horizontal features in the spectrum (the snaps and tones).
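For readers who want to experiment with the constrained problem quoted above, here is a small Python sketch. One caveat worth stating plainly: with both L and R unknown, the objective is convex in L for fixed R and convex in R for fixed L, but not in both at once, so in practice it is usually attacked by alternating between the two. The sketch uses the Lee-Seung multiplicative update rules for non-negative matrix factorization, which is a swapped-in technique and not something taken from the poster's programs or from the Boyd and Vandenberghe text. The names `nmf`, `matmul`, `frob_error` and the toy "tone plus click" matrix are assumptions for the example.

```python
import random


def matmul(L, R):
    """Product of a rows-by-k matrix L and a k-by-cols matrix R."""
    return [[sum(L[q][f] * R[f][p] for f in range(len(R))) for p in range(len(R[0]))]
            for q in range(len(L))]


def frob_error(A, L, R):
    """Sum of squared differences between A and L*R."""
    LR = matmul(L, R)
    return sum((A[q][p] - LR[q][p]) ** 2 for q in range(len(A)) for p in range(len(A[0])))


def nmf(A, k, iters=500, seed=0, eps=1e-12):
    """Non-negative factors L (time series) and R (spectra) with A ~ L*R."""
    rng = random.Random(seed)
    rows, cols = len(A), len(A[0])
    L = [[rng.random() + 0.1 for _ in range(k)] for _ in range(rows)]
    R = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(k)]
    for _ in range(iters):
        LR = matmul(L, R)  # update R with L held fixed
        for f in range(k):
            for p in range(cols):
                num = sum(L[q][f] * A[q][p] for q in range(rows))
                den = sum(L[q][f] * LR[q][p] for q in range(rows)) + eps
                R[f][p] *= num / den
        LR = matmul(L, R)  # update L with R held fixed
        for q in range(rows):
            for f in range(k):
                num = sum(A[q][p] * R[f][p] for p in range(cols))
                den = sum(LR[q][p] * R[f][p] for p in range(cols)) + eps
                L[q][f] *= num / den
    return L, R


# Toy spectrograph (rows = time q, columns = frequency p):
# a steady tone in the middle frequency bin plus a broadband click at q = 2.
A = [[0.0, 3.0, 0.0],
     [0.0, 3.0, 0.0],
     [2.0, 5.0, 2.0],
     [0.0, 3.0, 0.0]]
L0, R0 = nmf(A, 2, iters=0)   # the random starting point
L, R = nmf(A, 2)              # after the alternating updates
```

Since the updates only ever multiply by non-negative ratios, the factors stay non-negative throughout. The result depends on the starting point, so different seeds can give different (locally optimal) splits of the same spectrum.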
Reply by ●January 3, 2016
On Mon, 28 Dec 2015 17:59:24 -0800 (PST), federation2005@netzero.com wrote:
>A good followup to these that expands on my mention a couple articles up about relocation is
>
>Estimating and Interpreting the Instantaneous Frequency of a Signal
>Part 1: Fundamentals
>Boualem Boashash, Senior Member, IEEE
>Proceedings of the IEEE, Volume 80, No. 4, April 1992, pp. 520-538.
...

Hey, Federation2005: Before this scrolls away, I wanted to let you know that I did see your posts, and I appreciate the effort. I'll be looking into the papers and other comments.
Reply by ●January 6, 2016
On Sunday, January 3, 2016 at 9:09:24 AM UTC-6, Max wrote:
> Hey, Federation2005: Before this scrolls away, I wanted to let you
> know that I did see your posts, and I appreciate the effort. I'll be
> looking into the papers and other comments.

Here's the experiment I was referring to, where I added the spectrographs (in dB = multiplication in linear amplitudes). Posted on New Year's Eve of course. :)

Experiment with Cyborg House:
https://www.youtube.com/watch?v=715CLhWO8H4&feature=youtu.be

It DOES work if you attach the voiceless sounds to an actual voice, though it produces a buzzing quality (a common problem, by the way, with voice synthesis routines that use Hidden Markov Models). To switch genders I end up pushing the voice part up an octave, but also the language part up, say, 1/4 octave.

If you want the software I referenced to experiment around with (and to modify and expand to your heart's content), let me know. The same goes for the factor analysis routine (DeLayer.c), which does singular value decomposition factor analysis on the matrix representing a spectrograph.
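The remark that adding spectrographs in dB amounts to multiplying linear amplitudes, and hence to convolving the sounds, can be checked numerically on a tiny example. The Python sketch below is only an illustration: the two four-sample sequences and the helper names `dft`, `circular_convolve` and `to_db` are made up for the purpose, and it uses circular convolution so that the DFT identity holds exactly.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a short sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


def to_db(amplitude):
    """Amplitude ratio expressed in decibels."""
    return 20.0 * math.log10(amplitude)


# Hypothetical stand-ins for the "voice" and the "language" response.
voice = [1.0, 2.0, 0.0, 0.0]
language = [1.0, 0.5, 0.0, 0.0]

speech = circular_convolve(voice, language)

db_voice = [to_db(abs(v)) for v in dft(voice)]
db_language = [to_db(abs(v)) for v in dft(language)]
db_speech = [to_db(abs(v)) for v in dft(speech)]

# Adding the two dB spectra bin by bin should reproduce the dB spectrum
# of the convolved signal.
db_sum = [a + b for a, b in zip(db_voice, db_language)]
```

The identity only covers magnitudes; a spectrograph-domain edit like the one described in the post still has to choose phases somehow when turning the summed picture back into sound.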






