Reply by robert bristow-johnson ● June 1, 2013
On May 30, 3:36 pm, Alfred Einstead <federation2...@netzero.com>
wrote:
> I will follow up on the note below with some You Tube demos, along
> with an outline of the analysis method used to recover the "Holy
> Grail", as it were (i.e. clean separation of acoustic signals into
> their natural "chirp" components, in the same way that a person
> decomposes the sounds when hearing them).
>
> From 2012 Dec 15: http://groups.google.com/group/comp.dsp/msg/fb55797342ea084e
>
> > The hybrid of the time-frequency and time-scale are the S-transforms.
> > They have some rather unusual and extreme properties that even the
> > research literature doesn't (yet) know of.
okay, on that post you (if you're the same as Mark) said:
> The S-transform fixes the problem with g. In its absolutely most
> general form it is defined by
> f_S(q, p) = integral f(t) (|p| g(p(t - q)))* dt
> and has inverse
> f(t) = integral f_S(q, p) 1^{p(t - q)} dq dp
not sure what the 1^some_power means.
q is delay (or shift in t) and p is scale. g(t) is a normalized
window kernel.
how is this different from a continuous wavelet transform?
r b-j
Reply by dszabo ● May 30, 2013
Pretty pictures. Have you transcribed anything with it yet?
Reply by Alfred Einstead ● May 30, 2013
I will follow up on the note below with some You Tube demos, along
with an outline of the analysis method used to recover the "Holy
Grail", as it were (i.e. clean separation of acoustic signals into
their natural "chirp" components, in the same way that a person
decomposes the sounds when hearing them).
From 2012 Dec 15
http://groups.google.com/group/comp.dsp/msg/fb55797342ea084e
> The hybrid of the time-frequency and time-scale are the S-transforms.
> They have some rather unusual and extreme properties that even the
> research literature doesn't (yet) know of.
> I'll do a quick run up to that, since I have an interest in it right
> now, because of the above-mentioned "unusual and heretofore unknown
> properties." Among other things, it recovers the concept (well-known
> to physicists) of "instantaneous frequency" and it leads directly to a
> *non-linear* transform that removes the problem of spectral leakage.
(The property that's not "well-known", BTW, was not the "instantaneous
frequency" concept, but the fact that the S-transform has a version of
the Parseval Theorem -- but that's another issue I won't be discussing
here.)
> It's better to map the amplitude as brightness and the phase as color.
> Then you'll end up seeing some rather interesting (and revealing)
> patterns. Colorizing the transform shows the first signs of the
> emergence of the Holy Grail that I'm leading up to.
You can see this, for instance, in the following You Tube video, where I
transcribe a drum solo at 1/4 speed with a (really really sloppily
coded & slightly modified) version of the S-transform.
http://www.youtube.com/watch?v=6orozX1GD1w
What's modified in the transform is that I normalise the phase of the
transform so that the following property holds:
* Tones transform to tones with the same phase and frequency.
The phase is color-coded, and the one thing that stands out is that it
is *almost independent* of the frequency at which the transform tunes
in. You can see clearly-distinguishable groups once the phase is
brought out in this way. What particularly stands out is the real-time
tracking of the instantaneous frequency, which can be
directly seen by counting the number of phase cycles per time unit
(the units in the video are 1 pixel = 1/5280 second). The bass drum,
for instance, drops down from 85 Hz to 60 Hz, and you can even see the
echo resonating in real time.
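Counting phase cycles per time unit is just the instantaneous-frequency
definition, nu = (1/2pi) d(phase)/dt. A minimal numeric sketch of that
reading (the 85 -> 60 Hz sweep and the 5280 samples/second rate are taken
from the description above; the FFT-built analytic signal is a stand-in
for whatever phase estimator the video actually uses):

```python
import numpy as np

fs = 5280.0                        # samples/s, matching 1 pixel = 1/5280 s
t = np.arange(0, 0.5, 1 / fs)
f_inst = 85.0 - 50.0 * t           # sweep from 85 Hz down to 60 Hz
phase = 2 * np.pi * np.cumsum(f_inst) / fs
x = np.cos(phase)

# analytic signal via the FFT (zero out the negative frequencies)
X = np.fft.fft(x)
n = len(x)
X[n // 2 + 1:] = 0
X[1:n // 2] *= 2
xa = np.fft.ifft(X)

# IF = (1/2pi) d(phase)/dt -- "phase cycles per time unit"
nu = np.diff(np.unwrap(np.angle(xa))) * fs / (2 * np.pi)
# interior samples track f_inst; the ends show edge artifacts
```

The interior of `nu` follows the sweep closely; only the first and last
few samples are corrupted by the FFT's implicit periodicity.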
The separation is generally true for any acoustic or wave phenomenon
where separate (possibly variable-frequency) components are added in --
as long as you don't have too many crowded together on the same
signal.
> Finally, this leads up to the Holy Grail. Since the natural frequency
> of this component is n(q, p), then it is just as natural to redraw the
> spectrograph by moving this amplitude up from frequency p to frequency n.
This is seen in the second You Tube video:
http://www.youtube.com/watch?v=itUSUau6DJM
which shows the same drum solo, first at 100% speed, then 25% speed
with the frequencies localized -- and then does the same thing for a
15 second segment of the "2001 theme" (at 1/4 speed, for 1 minute)
along with a House music segment that has both voice and electronica
in it.
The one thing that stands out the most is that the spectrum is
SINGULAR. It is virtually all concentrated on a web of "chirp lines".
The only reason you see the extra smoky residue of lines is because I
enhanced the brightness from the raw readout to better show the
intensities. The original was virtually all black, except on the chirp
lines themselves.
The phase color-coding is averaged for the higher frequencies, so it
shows up as white. The averaging is done by taking the brightness
equal to the time-average amplitude, and the color saturation to the
average coherence of the signal over the time interval. Low
frequencies are coherent, while high frequencies show up as grey
shades.
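The "average coherence" used for the saturation can be read as the vector
strength of the phase over the averaging window -- a small hypothetical
sketch (the |mean of e^(i*phase)| formula is my assumption about what
"coherence" means here, not something stated above):

```python
import numpy as np

def coherence(phases):
    """Vector strength: 1 for a locked phase, near 0 for a scrambled one."""
    return np.abs(np.mean(np.exp(1j * np.asarray(phases))))

rng = np.random.default_rng(0)
locked = np.full(1000, 0.7)                  # low-frequency bin: steady phase
scrambled = rng.uniform(0, 2 * np.pi, 1000)  # high-frequency bin: random phase
# saturation ~ coherence: the locked bin keeps its color,
# the scrambled bin washes out toward grey
```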
The coding is sloppy, because it is only meant to be a proof of
concept. In particular, different phase-estimating methods were used
in the different segments as the basis for determining the
instantaneous frequency. For the House Music segment, for instance,
you see quantization of the "chirp lines", an artifact of the finite
time step I used in estimating the phase time derivative.
(And, of course, I can't let the house music demo go without doing a
shameless plug for some of the experimentation I've been doing with
spectrographic-based remixing and combined biological-machine voice
synthesis :)
The Beast Stomp:
https://www.youtube.com/watch?v=FrnuJp9eoRw
)
The ideal form of the analysis that results in the clean separation
can be done as follows. The "ideal" part of the analysis is the color
coding of the phase. A method is required to both do and undo the
phase-averaging that is necessary when going beyond the resolution of
the graph.
(a) First get a rough separation of frequencies (e.g. the first You
Tube video), so as to separate out the components at least to some
degree.
Any transform may be used as long as it (i) is approximately scale-
invariant and (ii) preserves the phase of monochromatic tones.
The forward transform converts a signal X into a complex spectrum
Y(q,p) parametrized by time (q) and frequency (p).
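A minimal discrete sketch of such a forward transform, using the
f_S(q, p) = integral f(t) (|p| g(p(t - q)))* dt form quoted earlier in
the thread, with a Morlet-style g as the window kernel (my choice of
kernel -- it satisfies the scale-invariance and tone-phase requirements,
but it is not necessarily the one used in the videos):

```python
import numpy as np

def s_forward(f, t, q_grid, p_grid, g):
    """Discrete f_S(q, p) = sum_t f(t) * conj(|p| g(p (t - q))) * dt."""
    dt = t[1] - t[0]
    F = np.empty((len(q_grid), len(p_grid)), dtype=complex)
    for i, q in enumerate(q_grid):
        for j, p in enumerate(p_grid):
            F[i, j] = np.sum(f * np.conj(np.abs(p) * g(p * (t - q)))) * dt
    return F

# Morlet-style kernel: unit-width Gaussian envelope times a unit-frequency tone
g = lambda u: np.exp(-0.5 * u ** 2) * np.exp(2j * np.pi * u)

t = np.linspace(0.0, 2.0, 4000)
f = np.cos(2 * np.pi * 5.0 * t)           # 5 Hz test tone
p_grid = np.linspace(1.0, 10.0, 50)
F = s_forward(f, t, np.array([1.0]), p_grid, g)
peak_p = p_grid[np.argmax(np.abs(F[0]))]  # sits near p = 5, the tone frequency
```

With this kernel the magnitude of F peaks at the tone frequency and its
phase at the peak matches the tone's phase at t = q, which is the
tone-preservation requirement (ii).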
(b) Second, localize the spectrum to its instantaneous frequency,
which is determined by the phase (e.g. the second You Tube video).
IF is the time derivative of the phase. The IF (nu) is determined for
each frequency bin (p) at each time (q). One way to estimate it is to
multiply the transform by the frequency before carrying out the
transform, to obtain the time derivative directly. That's something
I haven't yet
tested.
(c) Add the complex amplitude Y(q,p) to Y_loc(q,nu(q,p)).
So, the contribution that originally went into the p-bin gets moved
into the nu(q,p)-bin. All this is added up.
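Steps (b) and (c) can be sketched with an ordinary windowed DFT standing
in for the step-(a) transform (my substitution -- not the transform used
in the videos), using the finite-difference phase estimate mentioned
above: two frames one sample apart give the per-bin phase time
derivative, and each bin's magnitude is then moved to its nu-bin:

```python
import numpy as np

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = np.cos(2 * np.pi * 70.0 * t)            # test tone at 70 Hz

N = 256                                      # frame length (arbitrary choice)
w = np.hanning(N)

def frame_dft(center):
    """Windowed DFT frame Y(q, p) centered at sample index `center`."""
    seg = x[center - N // 2 : center + N // 2] * w
    return np.fft.rfft(seg)

q = 400
Y0 = frame_dft(q)
Y1 = frame_dft(q + 1)                        # one sample later

# step (b): IF nu(q, p) from the one-sample phase difference per bin
dphi = np.angle(Y1 * np.conj(Y0))            # wrapped phase increment
nu = dphi * fs / (2 * np.pi)                 # Hz

# step (c): add each bin's magnitude into the nearest nu(q, p)-bin
bins = np.fft.rfftfreq(N, 1 / fs)
Y_loc = np.zeros_like(bins)
for p in range(len(bins)):
    k = np.argmin(np.abs(bins - nu[p]))
    Y_loc[k] += np.abs(Y0[p])
peak = bins[np.argmax(Y_loc)]                # concentrates near 70 Hz
```

The point of the sketch is that the reassigned energy piles up near the
true frequency even though the raw window spreads it over several bins;
steps (d) and (e) would then invert Y_loc on a per-nu basis.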
(d) Perform an inverse transform on Y_loc on a per-frequency basis to
obtain a voice X_nu for each frequency nu.
The result is a set of voices that are displayed concurrently on
separate tracks to make a spectrogram (or scalogram). The frequencies
for the "voices" can be freely chosen, provided the right
normalizations are used -- in particular, so as to make property (e)
valid.
(e) The original signal is simply the sum X = sum_nu X_nu.
This is where the "ideal" part of the analysis enters the picture. To
get an exact reproduction requires that the exact phase be retained
in the graph itself, or represented by some other (usable) means.
Otherwise, phase estimation is going to have to be carried out. The
primary issue here is that the chirp lines may cross bins, yet you
want the phase to remain continuous on each chirp line. This
requires using some kind of left-to-right algorithm to come up with
the best estimate for the phase in each bin, based on the phases (and
intensities) of the nearest frequency bins in the immediately
preceding time step.
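That left-to-right estimate might look like the following sketch -- one
hypothetical reading, in which each bin continues the intensity-weighted
phase of its nearest neighbors from the preceding time step, advanced by
that bin's instantaneous frequency (the +/-1-bin neighborhood and the
weighting scheme are my choices, not anything established above):

```python
import numpy as np

def track_phase(mag, nu, frame_rate):
    """Left-to-right phase estimate over a (time x bin) grid.

    mag[t, p]   -- intensity in bin p at time step t
    nu[t, p]    -- instantaneous frequency (Hz) assigned to bin p at step t
    frame_rate  -- time steps per second
    """
    T, P = mag.shape
    phase = np.zeros((T, P))
    for ts in range(1, T):
        for p in range(P):
            lo, hi = max(0, p - 1), min(P, p + 2)
            # intensity-weighted circular mean of the neighboring phases
            # in the immediately preceding time step
            prev = np.angle(np.sum(mag[ts - 1, lo:hi]
                                   * np.exp(1j * phase[ts - 1, lo:hi])))
            # advance by this bin's IF so tones stay phase-continuous
            phase[ts, p] = prev + 2 * np.pi * nu[ts, p] / frame_rate
    return phase
```

For a steady tone (constant nu across bins) this reduces to a linear
phase advance, i.e. the phase stays continuous along the chirp line even
as its energy drifts across bins.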