I am analyzing sounds of daily activities recorded by a smartphone. For example walking, getting up from bed, falling, running etc.. (one at the time).
Let’s take one them. My goal is to
- convert from time domain to frequency domain
- divide it in bins of 10 Hz wide
- calculate the average FT magnitude in each of these bins.
My questions are:
- Which FFT Magnitude spectrum should I use between the following graphs?
- How should proceed to divide my signal (99144 total samples with 44100 sample rate) in 10 Hz bins? -> Here the frequencies of the single-sided magnitude are 0 – 22000 Hz -> Which means I should have 2200 bins of 10 Hz
-> Should I just create a new array and put the average mean of all values between 0 – 10 Hz, 10 – 20 Hz and so on?
[signal,fs]=audioread(filename); % fs = 44100
N = length(signal); % N = 94144
%% Single-sided magnitude spectrum with frequency axis in Hertz
% Each bin frequency is separated by fs/N Hertz.
X_mags = abs(fft(signal));
bin_vals = [0 : N-1];
fax_Hz = bin_vals*fs/N; % conversion Hz - Bins
N_2 = ceil(N/2);
figure;
subplot(4,1,1);
plot(fax_Hz(1:N_2), X_mags(1:N_2))
xlabel('Frequency (Hz)')
ylabel('Magnitude');
title('Single-sided Magnitude spectrum (Hertz)');
axis tight
%% Single-sided magnitiude spectrum in decibels and Hertz
X_mags = abs(fft(signal));
bin_vals = [0 : N-1];
fax_Hz = bin_vals*fs/N;
N_2 = ceil(N/2);
subplot(4,1,2);
plot(fax_Hz(1:N_2), 10*log10(X_mags(1:N_2)))
xlabel('Frequency (Hz)')
ylabel('Magnitude (dB)');
title('Single-sided Magnitude spectrum (Hertz)');
axis tight
%% Single-Sided Amplitude Spectrum
NFFT = 2^nextpow2(N);
Y_NFFT = fft(signal,NFFT)/N;
f_NFFT = fs/2*linspace(0,1,NFFT/2+1);
subplot(4,1,3);
plot(f_NFFT,2*abs(Y_NFFT(1:NFFT/2+1)))
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('|Y(f)|')
%% Pwelch defaut values
subplot(4,1,4);
pwelch(signal,[],[],[],fs);
title(sprintf('Pwelch defaut values'));
Just a couple of observations.
1) It is clear from the spectra shown, that the recordings were actually captured at 32 kHz sampling before being converted to 44.1 kHz. The half sampling cutoff is very evident at just over 15 kHz.
2) At 44.1 kHz sampling, you will need 2205 bins between DC and Fs/2 to obtain 10 Hz resolution. This is nit-picky, but 2205 instead of 2200.
3) To get a better average over the entire length of the data, you can break the recording into blocks of 2205 samples and do a 2205 point DFT for each block. Then you can average the results of each DFT per bin. Since you only care about Mag, you can average the Mags of the respective bins.
4) I wonder what you are trying to achieve. The spectra for the activities you describe vary significantly over time. You may want to run a spectrogram to see how the frequency content changes with time.
5) If you are only doing this with MatLab, then look at the usage of the pwelch power spectral density function. You can specify the DFT size and the overlap of the sample blocks. Pwelch is - virtually by definition - an averaging spectral estimator.
6) Getting to you question, I think that dB spectra provide better visualization of the dynamic range of the signal at different frequencies.
>>Kaz said
>>you say you want to "calculate the average FFT magnitude in each of these >>bins". but each bin will have one value.
My question wasn't very clear, sorry. I meant that I want to have my Frequencies (less than 22kHz) divided into bins 10 Hz wide.
Does this code achieves my goal?
%% Pwelch 4096 ( as 44100/4096 = 10.7 Hz)
len = 4096;
subplot(4,1,4);
pwelch(signal,len,[],len,fs);
title(sprintf('Pwelch Hamming Window Size :: %d',len));
>>dgshaw6 said
>>1) It is clear from the spectra shown, that the recordings were actually
>>captured at 32 kHz sampling before being converted to 44.1 kHz. The
half >>sampling cutoff is very evident at just over 15 kHz.
Could you please elaborate? The recording has been done using a smartphone (iPhone) and the file .m4a is directly opened in Matlab without prior processing. Is that normal? I won't use frequencies > 22KHz anyways.
>>dgshaw6 said
>>3) To get a better average over the entire length of the data, you can
break >>the recording into blocks of 2205 samples and do a 2205 point DFT
for each >>block. Then you can average the results of each DFT per bin.
Since you only >>care about Mag, you can average the Mags of the
respective bins.
Thanks. So before using fft function. I should break my 94144 samples in blocks of 2205 and do something like that?
NFFT = 2^nextpow2(2205);
Y_NFFT = fft(signal,NFFT)/N;
>>dgshaw6 said
>>4) I wonder what you are trying to achieve. The spectra for the
activities >>you describe vary significantly over time. You may want to
run a spectrogram >>to see how the frequency content changes with time.
I am trying to apply Machine Learning techniques to see if I can distinguish between the type of sounds (walking, running etc..) I have many other features generated but I also want to have "The average FT magnitude in each 10 Hz bin" as a feature for my Learning Algorithm.
>>5) If you are only doing this with MatLab, then look at the usage of the
>>pwelch power spectral density function. You can specify the DFT size
and >>the overlap of the sample blocks. Pwelch is - virtually by
definition - an >>averaging spectral estimator.
Thanks. The code I provided above is in the right direction to achieve my goal?
Thank you very much for your help and please bear with me any obvious question.
if your Fs = 44100 Hz and you want fft resolution of 10 Hz then use fft resolution N = 44100/10 = 4410.
You can then take just 4410 samples of signal or split the stream into segments (+ overlap) each = 4410 samples and directly average the bins (fft output).
FFT covers frequencies from -Fs/2 to +Fs/2 so you don't need to worry about single sided spectrum.
You better average the amplitude values from each bin directly(not the dB), then you can convert the mean to power in dB:
fft_mean = mean(ffts1,ffts2,ffts3,...);
plot(20*log10(f,abs(fft_mean))); %in dB
Thank you very much! It's more clear now. But instead of using fft function (and then calculate the mean), Can I just use pwelch as suggested by dgshaw6?
The following code seems to work fine for me and giving me 2206 bins of 10Hz. Is that correct?
% N = 94144
% fs = 44100
segmentLength = 4410; % window size
overlappedSamples = 2205; % 50% overlap
figure;
[pxx3,f3] = pwelch(signal,segmentLength,overlappedSamples,segmentLength,fs);
% Use segmentLength DFT points so that 100 Hz falls directly on a DFT bin.
pwelch(signal,segmentLength,overlappedSamples,segmentLength,fs);
% plot(f3,10*log10(pxx3))
yes pwelch can do it all nicely but is not intuitive.
Great! Thanks! So my code with pwelch is correct ?
Last question please: In my code, What’s the difference between the vector 'pxx3' and '10*log10(pxx3)' ? (I am assuming pwelch function is plotting 10*log10(pxx3)).
pwelch outputs power estimate (i.e. square of abs of fft). To convert power to db use 10*log10. If it was abs(fft) to db use 20*log10
Thank you very much.
I am not sure I have fully understand. So here Val1 and Val2 will output the same thing?
% N = 94144
% fs = 44100
segmentLength = N; % window size
[pxx,f] = pwelch(signal,segmentLength,[],segmentLength,fs);
%pwelch(data,segmentLength,overlappedSamples,segmentLength,fs);
Y = fft(signal);
Y_mag = abs(Y);
Val1 = 20*log10(Y_mag);
Val2 = 10*log10(pxx);
pwelch does not just use one fft, it applies several ffts , windowing, scaling and averaging.If you do all that you may get equality.
Here is example equality of fft & pwelch:
N = 94144;
x = randn(1,N);
pwr1 = pwelch(x,ones(1,N),0,N,1,'twosided');
y = fft(x);
pwr2 = abs(y).^2;
plot(pwr1);
hold
plot(pwr2*max(pwr1)/max(pwr2),'r--');
I like the way you answer. Very clear and to the point! Thank you very much my problem is solved!
If I may add another question (related to this one) : As stated above, my goal is to use Machine Learning techniques to try to distinguish between different sound activities (walking, running etc.).
Among the features I use for every audio file, is the vector containing the magnitutude of every 10 Hz bins. Is taking the values returned by 10*log10(pxx) of pwelch is okay or should I do further processing (for example detrending the original signal?)
You are best to research that and tell us. Certainly if you can identify the frequency content (power and its bin location) you can see if there is some pattern emerging to characterise each case based on such power distribution. You may also include phase information as it is there ready if you do fft (rather than pwelch).
Hi,
Sorry to take so much time to get back to you.
By the time of this writing, you have many of your questions answered, and it sounds like you have the tools needed to get you work done.
I do owe you a couple of answers still.
1) You have said that you took the recording file directly from the iPhone. My guess is that the .m4a file is encoding the audio using a starting sampling rate of 32 kHz, and then somewhere the recording is resampled to 44.1 kHz. This also surprises me, because most cell phones are converging on 48 kHz as the default sampling rate for the audio platform.
In 3) of my discussion, I said that you should break the data up into 2205 sample blocks. As @kaz has pointed out, I was incorrect, and 4410 is the correct block size. However, this means that you need to use the DFT size of 4410 samples, so that your equations:
NFFT = 2^nextpow2(2205);
Y_NFFT = fft(signal,NFFT)/N;
are not in fact correct. However, with you subsequent discussion with @kaz, you have arrived at exactly what I would have recommended by using the pwelch with 4410 blocks and 2205 overlap.
@kaz has suggested that pwelch is not intuitive, and I agree, but it is based on the DFT of the auto-correlation function, which - in my mind - helps to remove the local - in time - issues with the pure DFT techniques that have been discussed here.
I hope that things are working well now.
David
Thank you very much! Changing pwelch parameters got me thinking about some questions and I want to make sure that all is clear in my mind (my code is below).
1/ I have Fs=44100 and I am using pwelch with 4410 DFT points that’s why I have FT resolutions of 10 Hz (44100/10 = 4410 points). Pwelch returns a vector of 2206 values each (2206 bins).
Each value of this vector represents the average result of every 10 Hz bin. Am I correct?
2/ 4410 DFT Size -> Means that I break the recording (N = 94144 samples) into blocks of 4410 samples right? Does this mean I have segments of 4410 samples?
3/ At this point, DFT Size have nothing to do with windowing right?
4/ In my pwelch code below, I also use Hamming Window (default) of size 4410 with 50% overlap. Now, my big confusion, this have nothing to do with the DFT Size right?
Apologize in advance for any possible obvious explanation that I missed
My code
% N = 94144
% fs = 44100
[data,fs]=audioread(filename);
segmentLength = 4410; % window size
overlappedSamples = 2205; % 50% overlap
[pxx3,f3] = pwelch(data,segmentLength,overlappedSamples,segmentLength,fs); % in this case pxx3 returns vector of 2206 values
% second parameter: the length of the Hamming window to be used
% fourth parameter: number of FFT points to be used
1) Yes each bin is 10 Hz DC, 10Hz, ... Fs/2. DC and Fs/2 will have real content only.
2) Yes. The pwelch function will pull blocks of data in that are 4410 samples each. With overlap of 2205 that means that half of the previous block and a new set of data of 2205 will be used for the next block. For you purposes, you could probably use 0 overlap just fine too.
3) In the pwelch function, I believe that the second parameter can be a specification for a window function. By providing a single integer, you are effectively choosing a rectangular window of 4410 samples. The FFT length is defined in the fourth parameter which you have also selected as 4410. I have not experimented with choosing a different FFT length from the window size, so I don't know what will happen in pwelch if you try to do that.
4) I am a little confused, and I have not read the documentation recently, but I thought that to use a window you have to provide the vector for it like pwelch(signal, hann(segmentLength), ...) to force the use of a window on the data. I could be wrong, but I thought that a scalar implied a rectangular window. Sorry I don't have a better handle on that.
David
Thank you very much. It's very clear for the questions 1, 2 and 3.
I am still quite confused for the last question. I have already read the documentation of pwelch and you are right! You need to provide the vector of the window. But if you provide only the size (which I have done in my case), a Hamming window of that size is used by default. So in my case :
pwelch(data,segmentLength,overlappedSamples,segmentLength,fs);
First param: signal
Second param: the length of the Hamming window to be used (I use 4410)
Third param: the overalapping window (I use 2205 -> 50%)
Fourth param: Number of FFT points to be used (I use 4410 too).
Last param: fs
Reading the litterature, it's often suggested to use FFT points and Window size to be the same, and 50% overlap in general is good choice.
Now, I understand very well why I use 4410 FFT points thanks to you and @Kaz but I am still confused about the size of the Hamming Window and the overlap. The Window size is another concept that has nothing to do with the DFT Size right? In my case, I am using 4410 Window, but If I change it, what's the main consequence that I will notice in the pwelch output?
(Because the way I understand windowing, is segmenting the total samples to frames of smaller number of samples with %overlapping (data to be used again etc..) but this happens to be the same definition of DFT size?!)
Apologize in advance for any possible obvious explanation that I missed
Thank you very much
Hi,
Thanks for the correction to my understanding of the pwelch function. It has caused me to go back and reconsider some of the work I have done using spectrogram which uses the same technique. I have been looking at OFDM signals recently, and keeping the bins isolated is essential.
In order to use a rectangular window then, I will have to use ones(segmentLength) instead of the constant that I thought was the right thing to do.
The best authority on widows and their effect on frequency domain processing is fred harris at the UCSD. He has written many papers and some text books on the subject of frequency domain transforms and processing.
The most significant thing about using a hamming window is that a single frequency input - that would normally line up with the center if a single bin - will be "spread" over 2 adjacent bins as well. The center bin will be larger and the one on each side will appear to have content even when none is there. If memory serves me right, these adjacent bins will be 3 dB down from the center one.
Using more "complicated" windows creates more spreading than just two adjacent bins. Blackman-harris covers at least 5 bins.
The reason for using a 50% overlap with the conventional hamming window is that the emphasis of any given portion of the input data will be the same. Remember that the window is effectively a single negative raised cosine cycle. If we use a 50% overlap, then the two adjacent windows will sum to a constant over the effected samples.
Thanks again for setting me straight.
David
Thank you very very much for all your answers. This was extremely helpful for me and I am glad that my questions helped you in some way.
you say you want to "calculate the average FFT magnitude in each of these bins". but each bin will have one value.
So I assume you want to average over several FFT frames. In that case you just average the amplitude of outputs bins.
pwelch averages one frame by segmenting it, see help pwelch