Hello!
I'm really sorry this reply comes so late; I did not realise this
newsgroup would move so fast. Well, better late than never.
Ron N. wrote:
> Bob Monsen wrote:
> > So, not being an expert, perhaps I'm missing some crucial point here,
> > but why can't he just use an overlapping (or maybe even non-overlapping)
> > set of sequential FFTs to track the frequency changes?
>
> It looks like it's the standard problem of resolution in frequency
> versus resolution in time. To get more resolution in frequency from
> an FFT requires a longer FFT window, which, without zero padding,
> results in less resolution in time, and/or might even require a
> buffer longer than the sound of interest which would introduce
> outside time-domain interference.
Yes, that's exactly what I meant. If I want more accuracy in frequency,
I need a longer window, and resolution in time decreases. A few points
about this project that I could not make clear enough:
1) Yes, I have the whole choir on one track. Or actually two, but it
being in stereo does not help at all. So this is a real challenge. I
already made the recording (over a year ago), so now I'm just wondering
how to get the best results out of it. I did ponder quite a bit about
whether to use a multi-track recording system but then
I decided against it. As Richard Dobson wonderfully said, "a capella
choral singing is an extra-ordinary socio-acoustic phenomenon". So I
thought that if I took the choir into a studio and tried to record
voices separately, I would lose the basic essence of the phenomenon I'm
trying to study. I mean the situation would be so far from a real
choir-singing event that it would not be worth very much to me. I also
thought about things being easier if I had 4 or 8 singers instead of
35, but that kind of "barbershop" music is not what I'm interested in.
I'm fascinated by the way 35 singers can work like a single
instrument without anyone [inside] ever thinking about how 35 throats,
70 ears and 35 brains can possibly do anything together.
2) Our FFT method so far consists of taking a series of 0.2 sec
slices (8820 samples) at 0.1 sec intervals (50% overlap), applying a
Hann window, zero-padding with a million zeros and plotting the
resulting amplitude against frequency. We have also made 3D plots of
all the slices (time, freq, amp) and it works nicely, but now I'm more
interested in getting the actual data out of there.
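For concreteness, here is a rough numpy sketch of that slicing scheme
(our actual code is in Matlab; the two-partial test signal here just
stands in for the recording):

```python
import numpy as np

fs = 44100
frame_len = 8820            # 0.2 s slices
hop = 4410                  # 0.1 s steps -> 50% overlap
nfft = 2**20                # roughly "a million zeros" of padding

# a synthetic stand-in for the recording: two steady partials
t = np.arange(fs) / fs
x = np.sin(2*np.pi*330*t) + 0.5*np.sin(2*np.pi*110*t)

win = np.hanning(frame_len)
freqs = np.fft.rfftfreq(nfft, 1/fs)

peaks = []
for start in range(0, len(x) - frame_len + 1, hop):
    spec = np.abs(np.fft.rfft(x[start:start+frame_len] * win, n=nfft))
    peaks.append(freqs[np.argmax(spec)])   # strongest frequency per slice
# each slice's peak comes out near 330 Hz here
```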
3) The autocorrelation thingy we tried didn't work because it seemed to
be highly vulnerable to noise. We knew that it was only supposed to
work if I had voices on separate tracks, but we thought we might check
if we could use very heavy filtering and isolate the fundamentals that
we are trying to measure (measure, not "find" since I know what's being
sung there, so I know with 5% accuracy where those fundamentals are).
So before trying any filtering we built an autocorrelation system and
tested it with a 440 Hz Matlab-made sine waveform. It was correct and
accurate. Then we thought to test the extremely noisy scenario; we made
a waveform that was the sum of 400 Hz and 500 Hz sines with equal amplitudes
and fed it to the beast. As a result we got a very accurate, nice and
clean 441.xxx Hz. I'm not sure of the actual result, but something like
that. What scared us off was the fact that the result was as clean as
it would have been if there had been a single noiseless wave of that
frequency. All the noise was just gone. Nothing suggested that anything
was wrong with the result. I know our noise example was an extreme one,
but it revealed that power at frequencies other than the one being
measured will pull the result towards the noise, and that's
the last thing we want. Is this how autocorrelation is supposed
to work, or did we do something wrong? I have some real noise issues
there too, because it was not a studio recording, not least a DC hum
in the left channel, but I think I'll handle that one by filtering
out everything below 55 Hz (there's nothing I'm interested in down
there).
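For anyone wanting to reproduce the experiment, here is a numpy sketch
of a plain autocorrelation pitch picker (not necessarily identical to
our Matlab code; the exact number it settles on depends on the lag
range searched):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs                 # 1 s of signal
x = np.sin(2*np.pi*400*t) + np.sin(2*np.pi*500*t)

# plain autocorrelation, computed via FFT
nsamp = len(x)
X = np.fft.rfft(x, n=2*nsamp)
r = np.fft.irfft(np.abs(X)**2)[:nsamp]

# search only lags near the pitch we "know" within ~5%, here 350-550 Hz
lo, hi = int(fs/550), int(fs/350)
lag = lo + np.argmax(r[lo:hi])
estimate = fs / lag
# estimate is a single clean number between the two tones (~455 Hz
# here); nothing in it hints that the input was 400 Hz + 500 Hz
```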
> In music, the strongest frequency present might be an overtone of
> the musical pitch. Autocorrelation can help determine if this is
> the situation. But if one already knows the approximate pitch, one
> can measure the dominant frequency and divide that down to get more
> precise pitch information.
I wonder how often that situation occurs. I'm not sure whether you mean
a situation where a person singing alone (or many people singing the
same note) produces a sound in which an overtone is dominant, or one
where the dominance comes from different voices' overtones boosting
each other, for example two people singing a fifth apart at 200 and
300 Hz with their common overtone at 600 Hz becoming dominant.
The situation I'm worried about is when an upper voice is singing a
pitch that is simultaneously an overtone of a lower voice. Let's say
the basses sing 110 Hz (A) and the altos sing 330 Hz (the E an octave
and a fifth higher). And let's say the tuning is not perfect. Now there
would be a peak at 330 Hz that is the sum of altos and basses. How much
would the basses' overtone affect the peak at 330 Hz? Let's say the
basses sing 108 Hz, making their 3rd partial 324 Hz. If I tried to
study this with a non-zero-padded FFT, I would get such a wide "hill"
from the altos' fundamental (330 Hz) that quite a lot of power would
also show at 324 Hz. Now if we add the power from the basses to that,
I believe 324 Hz would become the peak instead. I'm hoping to tackle
that by zero-padding (making the peaks steeper), but I'm not sure if it
works. And it would be nasty towards the singers to assume that "if the
upper pitch is in perfect harmony, it must be an overtone" :>.
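To make the worry concrete, here is a numpy toy version of that
scenario (the 0.33 amplitude for the basses' partial, and aligning the
phases at the window centre, are my own arbitrary choices):

```python
import numpy as np

fs = 44100
N = 8820                          # 0.2 s analysis window
t = np.arange(N)/fs - 0.1         # time axis centred on the window
# hypothetical mix: altos' fundamental at 330 Hz plus the basses'
# third partial at 324 Hz (basses at 108 Hz), in phase at the centre
x = np.cos(2*np.pi*330*t) + 0.33*np.cos(2*np.pi*324*t)

nfft = 2**20                      # heavy zero-padding
spec = np.abs(np.fft.rfft(x*np.hanning(N), n=nfft))
freqs = np.fft.rfftfreq(nfft, 1/fs)
band = (freqs > 310) & (freqs < 350)
peak = freqs[band][np.argmax(spec[band])]
# peak lands near 329 Hz here: the two components share one Hann main
# lobe (~20 Hz wide for a 0.2 s window), so zero-padding only locates
# the blended maximum more precisely, it cannot split them; how far
# the peak is pulled depends on relative amplitude and phase
```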
Ron Nicholson wrote:
> As you found, zero-padding and using a long fft, although a very
> accurate method of interpolating frequency, will not show fine
> detail in the frequency envelope. However frequency is the
> derivative of phase. So what I might try is a technique from
> phase vocoding. Use overlapped successive short fft's and
> compare the phase changes in the nearest bin of interest with
> what would be the phase change represented by the overlap
> offset. Plot that phase difference. The slope of the plot will
> represent the frequency offset from the fft bin center, and any
> curvature in the plot will represent a change in frequency.
>
> This could work with fft windows as short as maybe a dozen
> cycles or less of the dominant frequency (which itself may be
> an overtone of the fundamental pitch), so you can get much
> better time resolution. For 330 Hz, maybe try 75% overlapped
> windows as short as maybe 1024 samples of 44.1 KHz.
Wow.. would this also work in my case, where all the voices are on the
same track, or only for individual voices? I have not yet figured out
what you mean by this, but I will; anyway, understanding the answer is
the asker's responsibility. One key question: what do you mean by
"nearest bin of interest"? (Or rather, what does a "bin" mean in this
context?) A stupid question perhaps, but I only know as many things
about signal processing as I have come across in this project, and
also it was my friend who wrote the actual code, although I know what
the code does. I can read it but not yet write it :)
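Trying to spell the idea out for myself, here is my tentative numpy
sketch of it; the 333 Hz test tone and the parameters are my own
choices, so corrections are welcome:

```python
import numpy as np

fs = 44100
N = 1024                           # short window, as Ron suggests
hop = 256                          # 75% overlap
t = np.arange(fs) / fs
x = np.sin(2*np.pi*333*t)          # a steady test tone near 330 Hz

# my guess: a "bin" is one of the discrete frequency slots an N-point
# FFT produces, spaced fs/N (about 43 Hz here) apart
k = round(333 * N / fs)            # index of the bin nearest 333 Hz
bin_hz = k * fs / N
w = np.hanning(N)

phases = []
for s in range(0, len(x) - N + 1, hop):
    X = np.fft.rfft(x[s:s+N] * w)
    phases.append(np.angle(X[k]))

est = []
for p0, p1 in zip(phases[:-1], phases[1:]):
    expected = 2*np.pi*bin_hz*hop/fs          # phase advance if the tone
    d = (p1 - p0) - expected                  #   sat exactly on the bin
    d = (d + np.pi) % (2*np.pi) - np.pi       # wrap to [-pi, pi)
    est.append(bin_hz + d*fs/(2*np.pi*hop))   # refined frequency
# est stays near 333 Hz for every hop, far finer than the 43 Hz
# bin spacing
```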
Richard Dobson wrote:
> Excellent though Matlab may be for audio analysis, you may find that the
> advanced tools designed specifically for analysing musical audio may suit your
> purposes better. For example, the most comprehensive system around at the moment
> seems to be the CLAM suite of tools (all GPL with sources, but full binary
> installer-based packages are now available) from University Pompeu Fabra:
Thanks a lot!! I will check that out thoroughly. And thank you for the
insights on more musical matters! I hadn't realised that singers
approach tones from below, although I have practical experience of the
matter. I mean, when you mentioned it I realised that's just what I
do. I have a pianist's background, so I tend to consider harmony a
"vertical" thing. When I sing in a choir, I don't think very much about
successive intervals being pure, but instead compare my voice with the
other simultaneously sounding voices. I certainly don't assume that
equal temperament is the basis for choral tuning; rather, by
researching the practice I'd like to find out what the amateur reality
of tuning is, and what hidden ideal of tuning the singers are trying
to achieve, if there is one. There is one paradox that I'm
particularly interested in, and this goes way off the topic of DSP:
Let's say the choir sings, in C major, the following typical cadence
with very slow chords: C, F, dm7, G7, C, or in other words
I - IV - II7 - V7 - I. The F major is tuned so that F and C make a
perfect 5th and the A is a pure major 3rd above F (a 4:5:6 major
chord). In dm7 the D is introduced: it goes a perfect 5th below the A
while the other notes remain untouched. Now we move to G7, where the
singers of the D hold their pitch from the previous chord. The G is
tuned from that D, and in doing so it ends up a syntonic comma (80:81)
lower than a pure 5th above the original C. I know there are solutions
to this, but what I find weird about the phenomenon is that everybody
is singing in perfect harmony and yet our tonic is falling. I'd like
to know how this thing is dealt with "in real life".
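To make sure I'm not miscounting the comma, here is the arithmetic in
exact fractions (a Python sketch; pitches as ratios relative to the
original C):

```python
from fractions import Fraction as Fr

C = Fr(1)              # the original tonic
F = C * Fr(4, 3)       # IV: F a pure 4th above C (so F-C is a pure 5th)
A = F * Fr(5, 4)       # A a pure major 3rd above F (4:5:6 chord)
D = A * Fr(2, 3)       # dm7: D tuned a pure 5th below the A
G = D * Fr(4, 3)       # G7: G tuned from the held D (a pure 4th above)
C2 = G * Fr(4, 3)      # I: the new C, a pure 4th above G
drift = C2 / (2 * C)   # compare with the octave above the old tonic
# drift == Fraction(80, 81): the tonic has fallen by a syntonic comma
```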
I must sadly confess (to Ron N. and Robert Bristow-Johnson) that I
didn't actually understand (yet) much of your later conversation, and
therefore I can't yet comment on it, although I would very much
like to.
Still one more question: it's usually recommended to zero-pad by
something like the sample's own length of zeros. What effect does
outrageous zero-padding (a million zeros after an 8820-sample slice)
have on the FFT's reliability? Would it ruin our results? It certainly
makes the results look more precise, but is that just adding decimals
to a guess, or is it really more accurate?
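To illustrate what I mean by "adding decimals", here is a numpy toy
case with a single isolated, noise-free partial at 327.3 Hz; whether
this carries over to our noisy multi-voice recording is exactly my
question:

```python
import numpy as np

fs, N = 44100, 8820
t = np.arange(N) / fs
x = np.sin(2*np.pi*327.3*t) * np.hanning(N)   # one isolated partial

results = []
for nfft in (N, 4*N, 2**20):
    freqs = np.fft.rfftfreq(nfft, 1/fs)
    results.append(freqs[np.argmax(np.abs(np.fft.rfft(x, n=nfft)))])
# results comes out about [325.0, 327.5, 327.3]: each padding step
# interpolates the same windowed spectrum more finely, so for an
# isolated noise-free peak the estimate genuinely converges on 327.3 Hz
```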
Thanks very much to all of you who replied!
Erkki Nurmi