Technical discussions related to Audio Signal Processing (digital effects, acoustics, noise reduction, musical signal processing, etc).
Hi everyone!

I'm a beginner in audio DSP applications. I'm a student working in the field of sound quality, and in this context I'm facing the following issue: I'd like to impoverish the frequency content of some 4-second recorded sounds (fs = 44100 Hz, 16-bit), i.e. increase the frequency step Df of their FFT spectrum (for example 2 Hz instead of the initial 0.25 Hz), while keeping the same sampling rate so that I can recompute the modified sound at the end.

I've thought about the following process:
- first, apply the FFT and keep complex samples 0 to N/2;
- second, keep only samples 0,4,8...N/2 to obtain a coarser Df equal to 2 Hz (which amounts to decimation);
- third, linearly interpolate the real and imaginary parts of the complex spectrum to recover the initial Df of 0.25 Hz;
- then, rebuild the entire FFT spectrum by flipping and conjugating samples 1 to N/2-1 and concatenating them to the previous samples 0 to N/2;
- finally, recompute the modified sound by performing the iFFT on the rebuilt spectrum and keeping the real part of the result.

I've tried this process but it doesn't seem to work: the resulting sound is really distorted (it seems that modulation and maybe aliasing occur in the time domain).

I'm stuck on this problem. Can anyone please help me find where my reasoning goes wrong, and tell me how to process the sound properly to reach the expected result, if that is possible?

Thanks a lot.

Skoobi.
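For anyone who wants to reproduce the effect, here is a minimal sketch of the procedure described above, written in Python with the standard library only (it is not the original Matlab). The names `dft`, `idft` and `coarsen_spectrum`, the naive O(N^2) transforms and the tiny test signal (one impulse, N = 64, step = 8) are assumptions chosen to keep the example small. With step = 8 it corresponds to the 0.25 Hz to 2 Hz example.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (fine for small N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def coarsen_spectrum(x, step):
    """Keep every step-th bin of the half spectrum, linearly interpolate
    the bins in between, rebuild the conjugate-symmetric half and invert."""
    n = len(x)
    half = n // 2
    assert n % 2 == 0 and half % step == 0, "sketch assumes N/2 is a multiple of step"
    X = dft(x)[:half + 1]                      # bins 0 .. N/2
    Y = []
    for k in range(half + 1):
        lo = (k // step) * step                # nearest kept bin at or below k
        hi = min(lo + step, half)              # next kept bin
        frac = (k - lo) / step
        # interpolating a complex value interpolates real and imaginary parts
        Y.append((1 - frac) * X[lo] + frac * X[hi])
    # bins N/2+1 .. N-1 are the conjugates of bins N/2-1 .. 1
    full = Y + [Y[k].conjugate() for k in range(half - 1, 0, -1)]
    return [v.real for v in idft(full)]

if __name__ == "__main__":
    n, step = 64, 8
    x = [0.0] * n
    x[3] = 1.0                                 # a single click at sample 3
    y = coarsen_spectrum(x, step)
    for t, v in enumerate(y):
        if abs(v) > 0.01:
            print(t, round(v, 3))
```

In this toy case the single click comes back as a train of clicks spaced N/step samples apart, with the original one reduced in level. Keeping only every step-th bin folds the signal in time, and the linear interpolation only weights those copies rather than removing them, which would be consistent with the modulation-like distortion reported above.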
On Sunday 01 October 2006 12:56, arnaud_trolle wrote:
> I'd like to impoverish the frequency content of some 4-seconds
> recorded sounds (fs=44100hz, quality 16 bits), i.e. increase the
> frequency step Df of their FFT spectrum (for example 2Hz instead of
> 0.25Hz initially), while keeping the same sampling rate to recompute
> at the end the modified sound.

Hi Scoobi,

Fs=44100Hz allows frequency content up to almost 22,05kHz.
Downsampling 0,25Hz->2Hz takes this border down to 22,05kHz/8=2,75kHz.
When upsampling again, you have lost 7/8 of your signal, and you get intense aliasing from above 2,75kHz. This is what you hear.

You'll have to suppress all frequency content above (and including) 2,75kHz. There's no way around this.
Because the frequencies above 2,75kHz don't contribute real signal content, there's no loss anyway, so the only issue is that you have to apply a filter (which costs you processing time / memory).

That's the reason why 44,1kHz is used for Fs: the aliasing above 22,05kHz is out of the range of what people (usually) can hear, so there's no need for high-Q antialiasing filters.

Bernhard
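The arithmetic in this reply can be checked with a few lines of Python. The sketch assumes a plain time-domain decimation by 8 with no anti-alias filter, which is the situation the reply describes; the function name `folded_frequency` and the three test tones are made up for the example.

```python
def folded_frequency(f_hz, fs_hz):
    """Frequency at which a real tone of f_hz shows up after sampling at fs_hz."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)

fs = 44100.0
factor = 8
fs_low = fs / factor        # sampling rate after keeping every 8th sample
nyquist_low = fs_low / 2    # highest frequency that survives unaliased

print("reduced rate:", fs_low, "Hz, new limit:", nyquist_low, "Hz")
for tone in (1000.0, 3000.0, 5000.0):
    print(tone, "Hz ->", folded_frequency(tone, fs_low), "Hz")
```

The exact limit is 22050/8 = 2756.25 Hz, which the reply rounds to 2,75kHz. A tone below that limit keeps its frequency, while a tone above it is folded back into the 0 to 2756.25 Hz range.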
Bernhard-

> Fs=44100Hz allows freq content up to almost 22,05kHz.
> Downsampling 0,25Hz->2Hz takes this border down to 22,05kHz/8=2,75kHz.
> When upsampling again, you have lost 7/8 of your signal, and you
> get intense aliasing from above 2,75kHz. This is what you hear.
> You'll have to suppress all frequency content above (and including
> 2,75kHz).
> There's no way around this.
> Because the frequencies above 2,75kHz don't contribute real signal
> content,
> there's no loss anyway, so the only issue is that you have to apply a
> filter (which costs you processing time / memory).
>
> That's the reason why 44,1kHz is used for Fs: the aliasing above 22,05kHz
> is out of the range of what people (usually) can hear, so there's no
> need for high Q antialiasing filters.

It should be made clear that *some* anti-alias filter is still needed, but with less demanding characteristics (higher cut-off frequency, slower roll-off, which you described as 'Q') as sampling rate increases beyond human hearing.

Whatever the sampling process -- sigma-delta converter, successive-approximation converter, etc -- if the input transducer can support a higher range than 1/2 the sampling rate, then the system must include some type of anti-alias filter in the "analog" domain; i.e. prior to sampling.

-Jeff
On Wednesday 04 October 2006 18:51, Jeff Brower wrote:
> Bernhard-
>
> > Fs=44100Hz allows freq content up to almost 22,05kHz.
> > Downsampling 0,25Hz->2Hz takes this border down to 22,05kHz/8=2,75kHz.
> > When upsampling again, you have lost 7/8 of your signal, and you
> > get intense aliasing from above 2,75kHz. This is what you hear.
> > You'll have to suppress all frequency content above (and including
> > 2,75kHz).
> > There's no way around this.
> > Because the frequencies above 2,75kHz don't contribute real signal
> > content,
> > there's no loss anyway, so the only issue is that you have to apply a
> > filter (which costs you processing time / memory).
> >
> > That's the reason why 44,1kHz is used for Fs: the aliasing above 22,05kHz
> > is out of the range of what people (usually) can hear, so there's no
> > need for high Q antialiasing filters.
>
> It should be made clear that *some* anti-alias filter is still needed, but
> with less demanding characteristics (higher cut-off frequency, slower
> roll-off, which you described as 'Q') as sampling rate increases beyond
> human hearing.
>
> Whatever the sampling process -- sigma-delta converter,
> successive-approximation converter, etc -- if the input transducer can
> support a higher range than 1/2 the sampling rate, then the system must
> include some type of anti-alias filter in the "analog" domain; i.e. prior
> to sampling.
>
> -Jeff

Hi Jeff,

thanks for pointing this out. I agree with most of what you say. My intention, however, was to keep things as simple as possible.

Besides, a broad variety of sigma-delta ADCs have the antialiasing filter built in. In combination with a common analog front end, including an op-amp based decoupling stage and a preamplifier which cannot handle frequencies much higher than 20kHz, an explicit anti-aliasing filter is dispensable. Therefore I decided not to mention this in my post.
Nevertheless, you're right from the theoretical point of view, and without thorough investigation of the individual case it's not legitimate to ignore this ...

Bernhard
On Wednesday 04 October 2006 18:38, Arnaud Trollé wrote:
> ...
> FFTYreshape=ReFFTYreshape+i*ImFFTYreshape;
> Ymodif=real(ifft(cat(1,FFTYreshape,fliplr(conj(FFTYreshape(2:end-1,:))')')));
> wavwrite(Ymodif/(1.001*max(max(Ymodif))),44100,16,'ModifiedSample.wav');
>
> Can you confirm me with these additive elements that what you've mentioned
> still holds ? Sorry if my speech appears a bit confused or aberrant, I'm
> not still used to DSP mechanisms but I'm working towards it :-).
>
> Thanks,
>
> Scoobi.

Sorry, Scoobi, that I cannot directly relate to your Matlab code. However, I'll try to answer from the pragmatic point of view.

The way to go is usually this:
1) There's information in every sample of your stream.
2) There may be redundancy in the stream. Examples:
   - a passage of silence
   - every other sample can be deduced from its neighbors by (linear|cubic) interpolation
   - speech must be recognizable, but redundancy is in the quality
   - only low frequencies are used
3) If there is no redundancy, you cannot reduce sample rate or stored information.
4) If there is redundancy, you must classify it. Find a mathematical algorithm (or describe in words) which redundancy you want to remove.
5) Find an approach to remove it.
6) Think of better approaches (better compression, quicker execution, less memory expense, ..., already available and well established).
7) Decide and implement it.

Most important here is the 2nd item. You can only compress your data stream if there is redundancy. Compression implies that you lose information.

There are lots of compression algorithms which can be used - think of Dolby as a nice analog compression method, or just a low pass filter.

Picking out every Nth sample of a stream is an easy method which is very much like low pass filtering. If you turn down the treble control on your stereo, which removes the high frequencies, that's the effect! Decide if this is what you want.
Restoring the removed samples by any algorithm will work perfectly if you know the mathematical description of the data stream and can calculate every sample. Usually you don't know this, because your music is (almost) arbitrarily distributing the samples over the stream. In this case, the restoration introduces errors.

It astonished me to realize that one of the best restoration practices is to just insert zeros instead of other interpolated values, because they don't introduce additional errors. However, you need a low-pass filter after your decompression stage. If it is perfectly adjusted, your music has exactly the same sound as with the treble control turned down, which is the best you can achieve...

Other methods may apply in addition: think of a surveillance microphone. Recording only has to be activated when a certain sound level is reached, and long stretches of silence need not be sampled. Storing just a number which indicates the duration of the silence might be enough. This might reduce the stream by a huge factor. Such things depend mainly on your application.

Back to your issue:
Probably you'll achieve a good result by just removing samples and later adding zeros instead. Google for "zero padding" or such things, and you'll find lots of information there.
Your FFT method will certainly not produce better results, except in very special situations.
And be aware that you lose more information/quality if you remove more samples. Matlab might give you the means to check this out.

Bernhard
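The "remove samples, insert zeros, then low-pass" idea can be sketched as follows, again in standard-library Python rather than Matlab. Everything specific here is an assumption made for illustration: the function names, the Hamming-windowed sinc filter with 8 lobes per side, and the test tone. The sketch also assumes the input contains nothing above the reduced Nyquist limit; otherwise a low-pass filter would be needed before dropping samples as well.

```python
import math

def decimate(x, factor):
    """Keep every factor-th sample (no filtering, input must be band-limited)."""
    return x[::factor]

def zero_stuff(d, factor):
    """Put factor-1 zeros after each sample to return to the original rate."""
    u = [0.0] * (len(d) * factor)
    u[::factor] = d
    return u

def lowpass_taps(factor, lobes=8):
    """Hamming-windowed sinc with cut-off at the reduced Nyquist limit.
    The passband gain is about `factor`, which makes up for the inserted zeros."""
    m = lobes * factor
    taps = []
    for k in range(2 * m + 1):
        t = (k - m) / factor
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (2 * m))
        taps.append(sinc * window)
    return taps

def restore(d, factor):
    """Zero-stuff, then low-pass; the filter delay is compensated."""
    u = zero_stuff(d, factor)
    h = lowpass_taps(factor)
    m = len(h) // 2
    y = []
    for n in range(len(u)):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n + m - k
            if 0 <= i < len(u):
                acc += hk * u[i]
        y.append(acc)
    return y

if __name__ == "__main__":
    factor = 4
    x = [math.sin(2 * math.pi * 0.01 * n) for n in range(400)]  # slow tone
    y = restore(decimate(x, factor), factor)
    worst = max(abs(a - b) for a, b in zip(x[40:360], y[40:360]))
    print("largest error away from the edges:", worst)
```

For a tone well below the reduced limit, the restored signal should stay close to the original away from the edges of the buffer. Content above that limit cannot be brought back this way, which matches the point made above that quality drops as more samples are removed.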