DSPRelated.com
Forums

Changing the sampling rate of an audio signal.

Started by ma August 5, 2005
ma wrote:
> Thanks,
>
> Interesting code. But I couldn't find any mathematical model on how they are
> doing this. Why they need a whole file to do this? Why small chunk of data
> generate some effect?
>
> Where can I find some mathematical modeling?
The Secret Rabbit Code web pages do state that the rabbit is a
re-implementation of the Julius O. Smith algorithm. It is described here:

http://ccrma-www.stanford.edu/~jos/resample/

Erik
--
Erik de Castro Lopo  nospam@mega-nerd.com  (Yes it's valid)
"... new TV ad for Microsoft's Internet Explorer e-mail program which uses
the musical theme of the 'Confutatis Maledictis' from Mozart's Requiem.
'Where do you want to go today?' is the cheery line on the screen, while the
chorus sings 'Confutatis maledictis, flammis acribus addictis', which
translates to 'The damned and accursed are convicted to the flames of hell.'"
The most understandable conceptual explanation I've come across on the
web is here:

http://dspguru.com/info/faqs/mrfaq.htm

In a nutshell, changing the sampling rate by an integer factor of N:

Interpolation:
1. Upsample: Insert N-1 0's between samples
2. Low-pass filter below original fs/2

Decimation:
1. Low-pass filter below desired fs/2
2. Downsample: Skip samples (keep only every Nth sample, and disregard
N-1 samples in between)
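Those two recipes can be sketched in a few lines of NumPy. This is a rough illustration, not production code: the function names are mine, and a simple Hamming-windowed-sinc lowpass stands in for whatever filter you would actually design to meet your specs.

```python
import numpy as np

def lowpass_fir(cutoff, num_taps=101):
    """Windowed-sinc lowpass FIR; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n)   # ideal lowpass impulse response
    h *= np.hamming(num_taps)                  # taper to tame the truncation ripple
    return h / h.sum()                         # normalize for unity gain at DC

def upsample(x, N):
    """Interpolate by integer N: zero-stuff, then lowpass below the old fs/2."""
    y = np.zeros(len(x) * N)
    y[::N] = x                                 # step 1: insert N-1 zeros between samples
    h = lowpass_fir(0.5 / N)                   # step 2: old Nyquist, in the new rate
    return N * np.convolve(y, h, mode='same')  # gain of N restores the amplitude

def downsample(x, N):
    """Decimate by integer N: lowpass below the new fs/2, then keep every Nth."""
    h = lowpass_fir(0.5 / N)                   # step 1: filter below desired fs/2
    return np.convolve(x, h, mode='same')[::N] # step 2: discard N-1 of every N samples
```

For a non-integer ratio L/M you would upsample by L and downsample by M, combining the two lowpass filters into one at the lower of the two cutoffs.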

ma wrote:
> Hello,
>
> I want to change the sample rate of an audio signal. Where can I find
> some information about it? Some mathematical model and some source code
> would be perfect.
Tim Wescott wrote:
> The underlying technique is called polyphase filtering.
Actually, "polyphase filtering" usually means a fixed number of phases.
The Julius O. Smith technique can actually handle irrational ratios
between the input and output sample rates and therefore has, in effect,
an infinite number of phases.

Erik
--
Erik de Castro Lopo  nospam@mega-nerd.com  (Yes it's valid)
"Traditional capital was stuck in a company's bank account or investments.
It could not walk away in disgust. Human capital has free will. It can
walk out the door; traditional capital cannot."
in article 42F3F246.8A4E6A23@mega-nerd.com, Erik de Castro Lopo at
nospam@mega-nerd.com wrote on 08/05/2005 19:12:

> Tim Wescott wrote:
>>
>> The underlying technique is called polyphase filtering.
>
> Actually, "polyphase filtering" usually means a fixed number of
> phases. The Julius O. Smith technique can actually handle irrational
> ratios between input and output sample rate and therefore has an
> infinite number of phases.
but it still is not inaccurate to use the term "polyphase filtering" for
naming the underlying technique.  so Smith and Gossett linearly interpolate
(or use some other low-order spline) between these polyphases to resample at
any arbitrary time - it's essentially a way to get a more accurate
representation of that windowed sinc or whatever the LPF impulse response is.

i *really* have a lot of respect for JOS (and consider him to be a sorta
friend, but i only see him at the AES conventions), but i think the idea to
interpolate between the discrete and finite polyphases is incremental.  i am
surprised at how many folks refer to Smith & Gossett as the seminal paper on
this.  Rabiner, in the 70's, might be the seminal author.

--

r b-j    rbj@audioimagination.com

"Imagination is more important than knowledge."
Thanks to everyone that replied.

It really helped me refresh my signal-processing knowledge and also add a
lot to it.

After reading most of the documents that were suggested as good reading, I
grasped some of the technique, but I now have new questions:



1- In the Rabbit source code, it is mentioned that if I use a small chunk of
data for the transformation then I will get odd sound effects. Why does this
happen? I need to change the sampling rate of an online system, which means
I only have access to a limited number of samples (small chunks of data).
How can I make the system not generate odd sound effects?



2- What is the delay in this process? Since I want to change the sampling
rate online, if the delay is very high I may get an echo effect or a similar
problem. Any idea how to solve this?



3- Are these algorithms fast enough to run in real time on a PC?



Best regards



"ma" <ma@nowhere.com> wrote in message 
news:URGIe.104715$Pf3.31857@fe2.news.blueyonder.co.uk...
> Hello,
>
> I want to change the sample rate of an audio signal. Where can I find
> some information about it? Some mathematical model and some source code
> would be perfect.
>
> What I want to do is:
>
> I have a device that can play an audio signal with a sampling rate of
> 8 KSPS. If I want to play a signal that is sampled at, say, 44 KSPS, I
> need to change the sampling rate. How can I do it efficiently and
> mathematically correctly (no aliasing)?
>
> On the other hand, this device can generate an audio signal with a
> sampling rate of 8 KSPS. How can I change the sampling rate to, say,
> 44 KSPS?
>
> I am working on Windows, so any help from the Windows OS can be used.
>
> Any suggestion on where I may start would be very much appreciated.
>
> Best regards
"ma" <ma@nowhere.com> schrieb im Newsbeitrag
news:URGIe.104715$Pf3.31857@fe2.news.blueyonder.co.uk...
> Hello,
>
> I want to change the sample rate of an audio signal. Where can I find
> some information about it? Some mathematical model and some source code
> would be perfect.
>
> What I want to do is:
>
> I have a device that can play an audio signal with a sampling rate of
> 8 KSPS. If I want to play a signal that is sampled at, say, 44 KSPS, I
> need to change the sampling rate. How can I do it efficiently and
> mathematically correctly (no aliasing)?
>
> On the other hand, this device can generate an audio signal with a
> sampling rate of 8 KSPS. How can I change the sampling rate to, say,
> 44 KSPS?
>
> I am working on Windows, so any help from the Windows OS can be used.
>
> Any suggestion on where I may start would be very much appreciated.
>
> Best regards
In a DSP book there was the following recipe (I think): take a DFT of 44100
input samples, store them in an array of 48000 (complex) bins, fill the
remaining bins with zeroes (inserted in the middle of the array, around
Nyquist, so that conjugate symmetry is preserved for a real signal), then
take a 48000-point inverse DFT and you get 48000 output samples covering
the same time span.
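That recipe can be sketched as follows. This is my own NumPy illustration, not the book's code; note that working blockwise like this without overlap-add is exactly what produces the boundary artifacts asked about in question 1, since each block is implicitly treated as periodic.

```python
import numpy as np

def fft_resample(x, n_out):
    """Resample a real block from len(x) samples to n_out samples by
    zero-padding (or truncating) the spectrum symmetrically around Nyquist."""
    n_in = len(x)
    X = np.fft.fft(x)
    Y = np.zeros(n_out, dtype=complex)
    k = min(n_in, n_out) // 2          # positive-frequency bins to keep
    Y[:k] = X[:k]                      # DC and positive frequencies
    Y[-k:] = X[-k:]                    # mirrored negative frequencies
    # rescale so amplitudes survive the change of transform length
    return np.fft.ifft(Y).real * (n_out / n_in)
```

Because the zeros go in the middle of the array, a tone at bin m of the input lands at bin m of the output, i.e. at the same absolute frequency relative to the (new) sampling rate times block length.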
In article <gh%Ie.118455$Pf3.114536@fe2.news.blueyonder.co.uk>,
ma <ma@nowhere.com> wrote:
> 1- ... it is mentioned that if I use small chunk of data
> for transformation then I will have odd sound effects. Why is this
> happening? I need to change the sampling rate of an online system which
> means I only have access to some limited samples (small chunk of data).
> How can I make the system not to generate odd sound effects?
Typical polyphase resampling write-ups only mention linear-phase, symmetric
interpolation filters. You can also try minimum-phase filters, or even
one-sided interpolators near the sample-buffer boundaries.
> 2- What is the delay in this process? Since I want to change the sampling
> rate online, if the delay is very high, I may get some echo effect or
> similar problem. Any idea of how to solve this problem?
The delay depends on the width and symmetry of your interpolator; you can
always use a shorter or non-symmetric (minimum-phase or one-sided)
interpolator, trading off against your various quality metrics.
> 3- Are these algorithms fast enough to do processing on real-time on a PC?
Depends on the algorithm and the PC. I've written code which recalculated
each raised-cosine-windowed sinc coefficient, for dozens of taps, for
real-time non-rational audio resampling, using just standard sin/cos math
library calls on a current PowerMac. With various precalculated table-lookup
optimizations, you can of course get more performance if needed. IMHO. YMMV.

--
Ron Nicholson    rhn AT nicholson DOT com    http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.
1. Why do you represent yourself as an escapee from the
school playground?

Airy has not been around for months and yet you are
repeatedly making rather silly and infantile remarks
in his direction.

If you don't understand the points that Airy makes, then
admit it, or better, keep quiet. As it is, dealing with
technical points in an international public forum such as this
one by just bad-mouthing your opponents makes you appear to
be a charlatan peddling some snake-oil elixir.

2. Talking of snake-oil, where have your Diracian impulses come from?

The Diracian has some interesting properties - zero width,
area of unity, infinite sum of all possible cosines, a height
which is not discussed but which appears to be greater in magnitude
than a Bristow-Johnson ego (if that were indeed possible). In the
systems that you deal with, what experimental evidence do you
have that there are pulses with the attributes of Diracians upon
which to base your theory?

3. Talking of snake-oil, where did the factor of "T" come from in your
opening lines?

Consider a 16-bit ADC capable of 100 MSPS. In the first
instance we'll use it to sample a geophysical signal band-limited
to 300 Hz, sampling at 1 kHz, with suitable analogue instrumentation
to match the input signal to the full range of the ADC. If we now
keep the circuit the same, but sample at 65.536 MHz, your claimed
factor of "T" would result in the 16-bit range being compressed down
to one bit. We know this doesn't happen: there will be more samples,
but they'll still be of the same magnitude and 16-bit range.


robert bristow-johnson wrote:
> in article VVLIe.108880$Pf3.64017@fe2.news.blueyonder.co.uk, ma at
> ma@nowhere.com wrote on 08/05/2005 11:51:
>
>> Where can I find some mathematical modeling? I need to know the theory and
>> then use the code as a learning practice.
>
> you want theory?  i'll give you theory.
>
> below is about as fundamental for the theory as you can get.  it really is
> just a result or application of the Nyquist/Shannon Sampling and
> Reconstruction Theorem.  imagine outputting your signal, at the original
> sampling rate, to an ideal D/A converter and then resampling that analog
> output signal with an ideal A/D converter (running at the new sampling
> rate).  now convert that imagination to mathematics (see below).
>
> i could have just referred to the post (responding to the infamous
> "Airy-head") but Google is messing up the ASCII spacing and you don't wanna
> read what Airy had to say, anyway.  so it's reposted below.
>
> --
>
> r b-j    rbj@audioimagination.com
>
> "Imagination is more important than knowledge."
>
>
> (a monospaced font needed to view the following correctly)
>
> ____________________________________________________________________
>
>       Bandlimited Interpolation, Sample Rate Conversion, or
>          Fractional-Sample Delay from the P.O.V. of the
>        Nyquist/Shannon Sampling and Reconstruction Theorem
> ____________________________________________________________________
>
> Here is the mathematical expression of the sampling and reconstruction
> theorem:
>
>              x(t)*q(t) = T*SUM{x[k]*d(t-k*T)}     .------.
>   x(t)--->(*)------------------------------------>| H(f) |---> x(t)
>            ^                                      '------'
>            |
>            |                +inf
>            '---- q(t) = T *  SUM{ d(t - k*T) }
>                             k=-inf
>
> where:  d(t) = "dirac" impulse function
> and        T = 1/Fs = sampling period
>           Fs = sampling frequency
>
>                +inf
>   q(t) = T *   SUM{ d(t - k*T) }   is the "sampling function", is periodic
>               k=-inf               with period T, and can be expressed as
>                                    a Fourier series.
>
> It turns out that ALL of the Fourier coefficients are equal to 1.
>
>            +inf
>   q(t) =   SUM{ c[n]*exp(j*2*pi*n/T*t) }
>           n=-inf
>
> where c[n] is the Fourier series coefficient:
>
>                 t0+T
>   c[n] = 1/T * integral{ q(t) * exp(-j*2*pi*n/T*t) dt}
>                 t0
>
> and t0 can be any time.  We choose t0 to be -T/2.
>
>                 T/2              +inf
>   c[n] = 1/T * integral{ T * SUM{ d(t - k*T) } * exp(-j*2*pi*n/T*t) dt}
>                -T/2            k=-inf
>
>            T/2
>        = integral{ d(t - 0*T) * exp(-j*2*pi*n/T*t) dt}
>           -T/2
>
>            T/2
>        = integral{ d(t) * exp(-j*2*pi*n/T*t) dt}
>           -T/2
>
>        = exp(-j*2*pi*n/T*0)
>
>        = 1
>
> So all Fourier series coefficients for the periodic q(t) are 1.
> This results in another valid expression for q(t):
>
>            +inf
>   q(t) =   SUM{ exp(j*2*pi*n*Fs*t) } .
>           n=-inf
>
> The sampled signal looks like:
>
>                           +inf
>   x(t)*q(t) = x(t) * T *  SUM{ d(t - k*T) }
>                          k=-inf
>
>                     +inf
>             = T *   SUM{ x(t) * d(t - k*T) }
>                    k=-inf
>
>                     +inf
>             = T *   SUM{ x(k*T) * d(t - k*T) }
>                    k=-inf
>
>                     +inf
>             = T *   SUM{ x[k] * d(t - k*T) }
>                    k=-inf
>
> where x[k] =def x(k*T) by definition.  (We are using the notation
> that x[] is a discrete sequence and x() is a continuous function.)
> BTW, if you were to compute the Laplace Transform of the sampled
> signal x(t)*q(t), you would get the Z Transform of x[k] scaled by
> the constant T.
>
> An alternative (and equally important) representation of the sampled
> signal is:
>
>                        +inf
>   x(t)*q(t) = x(t) *   SUM{ exp(j*2*pi*n*Fs*t) }
>                       n=-inf
>
>                +inf
>             =  SUM{ x(t) * exp(j*2*pi*n*Fs*t) }
>               n=-inf
>
> Defining the spectrum of x(t) as its Fourier Transform:
>
>                                   +inf
>   X(f) =def FT{ x(t) } =def  integral{ x(t)*exp(-j*2*pi*f*t) dt}
>                                  -inf
>
> Then using the frequency shifting property of the Fourier Transform,
>
>   FT{ x(t)*exp(j*2*pi*f0*t) } = X(f - f0)
>
> results in
>
>                       +inf
>   FT{ x(t)*q(t) } =   SUM{ X(f - n*Fs) }
>                      n=-inf
>
> This says, what we all know, that the spectrum of our signal
> being sampled is shifted and repeated forever at multiples
> of the sampling frequency, Fs.  If x(t) or X(f) is bandlimited
> to B (i.e. X(f) = 0 for all |f| > B) AND if there is no
> overlap of the tails of adjacent images of X(f), that is:
>
>   right tail of nth image of X(f)  <  left tail of (n+1)th image
>
>   n*Fs + B  <  (n+1)*Fs - B  =  n*Fs + Fs - B
>
>   B  <  Fs - B
>
>   2*B  <  Fs = 1/T
>
> then we ought to be able to reconstruct X(f) (and also x(t)) by
> low pass filtering out all of the images of X(f).  To do that,
> Fs > 2*B (to prevent overlap) and H(f) must be:
>
>          { 1    for |f| <  Fs/2 = 1/(2*T)
>   H(f) = {
>          { 0    for |f| >= Fs/2 = 1/(2*T)
>
> This is the "sampling" half of the Nyquist/Shannon sampling and
> reconstruction theorem.  It says that the sampling frequency, Fs,
> must be strictly greater than twice the bandwidth, B, of the
> continuous-time signal, x(t), for no information to be lost (or
> "aliased").  The "reconstruction" half follows:
>
> The impulse response of the reconstruction LPF, H(f), is the
> inverse Fourier Transform of H(f), called h(t):
>
>                               +inf
>   h(t) = iFT{ H(f) } =   integral{ H(f)*exp(+j*2*pi*f*t) df}
>                              -inf
>
>            1/(2*T)
>        = integral{ exp(+j*2*pi*f*t) df}
>          -1/(2*T)
>
>        = (exp(+j*2*pi/(2*T)*t) - exp(+j*2*pi/(-2*T)*t))/(j*2*pi*t)
>
>        = ( exp(+j*pi/T*t) - exp(-j*pi/T*t) )/(j*2*pi*t)
>
>        = sin((pi/T)*t)/(pi*t)
>
>        = (1/T)*sin(pi*t/T)/(pi*t/T)
>
>        = (1/T)*sinc(t/T)
>
> where  sinc(u) =def sin(pi*u)/(pi*u)
>
> The input to the LPF is x(t)*q(t) = T*SUM{x[k]*d(t - k*T)} .
> Each d(t - k*T) impulse generates its own impulse response and
> since the LPF is linear and time invariant, all we have to do
> is add up the time-shifted impulse responses weighted by their
> coefficients, x[k].  The T and 1/T kill each other off.
>
> The output of the LPF is
>
>           +inf                             +inf
>   x(t) =  SUM{ x[k]*sinc((t-k*T)/T) }  =   SUM{ x[k]*sinc(t/T - k) }
>          k=-inf                           k=-inf
>
> This equation tells us explicitly how to reconstruct our
> sampled, bandlimited input signal from the samples.  This is
> the "reconstruction" half of the Nyquist/Shannon sampling and
> reconstruction theorem.
>
>
> Interpolation, Fractional Delay, Sample Rate Conversion, etc:
>
> When interpolating (whether for fractional sample delay or
> for doing sample rate conversion), you must evaluate this
> equation for times that might not be integer multiples of your
> sampling period, T.
>
> The sinc(t/T) function is one for t = 0 and zero for t = k*T
> for k = nonzero integer.  This means that if your new
> sample time, t, happens to land exactly on an old sample
> time, k*T, only that sample (and none of the neighbors)
> contributes to the output sample and the output sample
> is equal to the input sample.  Only in the case where the
> output sample is in between input samples do the neighbors
> contribute to the calculation.
>
> Since the sinc() function goes on forever to +/- infinity,
> it must be truncated somewhere to be of practical use.
> Truncating is actually applying the rectangular window (the
> worst kind) so it is advantageous to window down the sinc()
> function gradually using something like a Hamming or Kaiser
> window.  In my experience, with a good window, you'll need to
> keep the domain of the windowed sinc() function from -16 to +16
> (a 32 tap FIR) and sample it 16384 times in that region (that
> would be 16384/32 = 512 polyphases or "fractional delays" if
> linear interpolation is subsequently used to get in between
> neighboring polyphases).
>
> Now we simply split the time, t, into an integer part, m, and
> fractional part, a, and substitute:
>
>                  +inf
>   x((m+a)*T) =   SUM{ x[k]*sinc((m+a)*T/T - k) }
>                 k=-inf
>
>                  +inf
>               =  SUM{ x[k]*sinc(a + m - k) }
>                 k=-inf
>
> where:  t = (m + a)*T
> and     m = floor(t/T)
>         a = fract(t/T) = t/T - floor(t/T)
>
> t-T < m*T <= t  which means a is always nonnegative and less
> than 1:  0 <= a < 1
>
> The limits of the sum can be bumped around a bit without changing
> it and by substituting k <- k+m :
>
>                  +inf
>   x((m+a)*T) =   SUM{ x[m+k] * sinc(a - k) }
>                 k=-inf
>
> Now, an approximation is made by truncating the summation to a
> finite amount (which is effectively windowing):
>
>                  +16
>   x((m+a)*T) =   SUM{ x[m+k] * sinc(a-k) * w((a-k)/16) }
>                 k=-15
>
> w(u) is the window function.  A Hamming window would be:
>
>          { 0.54 + 0.46*cos(pi*u)    for |u| <  1
>   w(u) = {
>          { 0                        for |u| >= 1
>
> A Kaiser window would be better.
>
> This requires two 32 point FIR computations to calculate two
> adjacent interpolated samples (these are linearly interpolated
> to get the final output sample).  Since the sinc(u)*w(u)
> function is symmetrical, that means 8193 numbers stored somewhere
> in memory.  When interpolating, the integer part of t/T (or "m")
> determines which 32 adjacent samples of x[k] to use; the
> fractional part of t/T (or "a") determines which set of 32 windowed
> sinc() coefficients to use to combine the 32 samples.
>
> ____________________________________________________________________
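The last few equations of that derivation translate almost directly into code. The sketch below is my own NumPy illustration (not r b-j's or Nicholson's actual code): it computes the Hamming-windowed sinc taps on the fly for each output sample, in the spirit of Nicholson's approach, instead of storing the 512-phase table, so it handles arbitrary (even irrational) ratios. The edge-clamping policy is a simplification.

```python
import numpy as np

HALF = 16  # half-width of the windowed sinc, matching the +/-16 range above

def hamming_window(u):
    """w(u) = 0.54 + 0.46*cos(pi*u) for |u| < 1, else 0."""
    return np.where(np.abs(u) < 1, 0.54 + 0.46 * np.cos(np.pi * u), 0.0)

def interpolate_at(x, t):
    """Evaluate the bandlimited reconstruction of x[] at fractional index t,
    using a sinc truncated (windowed) to +/-HALF samples.  When t is an
    integer, the taps collapse to a unit impulse and x[t] comes back exactly."""
    m = int(np.floor(t))                       # integer part
    a = t - m                                  # fractional part, 0 <= a < 1
    k = np.arange(-HALF + 1, HALF + 1)         # k = -15 .. +16, a 32-tap FIR
    idx = np.clip(m + k, 0, len(x) - 1)        # crude edge policy: clamp
    taps = np.sinc(a - k) * hamming_window((a - k) / HALF)
    return np.dot(x[idx], taps)

def resample(x, ratio):
    """Resample x by an arbitrary ratio fs_out/fs_in, one output at a time."""
    n_out = int(len(x) * ratio)
    return np.array([interpolate_at(x, n / ratio) for n in range(n_out)])
```

A Kaiser window and a precomputed coefficient table (with linear interpolation between neighboring polyphases, as described above) would be the usual production refinements.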
A good question indeed!

And a question that the so-called experts and authors in this
field fail to answer time and time again.

The Diracian, or Unit Impulse, is a very good mathematical tool
for analysing the response of systems once a mathematical model of
those systems has been produced.

It is, however, a poor mathematical claim to make that such
impulses are found to be part of a system when neither the
area nor the magnitude of such impulses are found anywhere in such
systems.

To those who ask, "Who cares? I get good results." , I suggest that
their approach is unscientific and compares to the religious
loonies who sacrifice goats and virgins to stop the Sun falling
out of the sky and justify the continuing practice by the Sun
remaining in the sky.

So.....is the world of DSP a world of scientific men, or is
it a world of snake-oil charlatans and of religious loonies?

Where do these Diracian impulses come from?

Airy R.Bean wrote:

> 2. Talking of snake-oil, where have your Diracian impulses come from?
>
> The Diracian has some interesting properties - zero width,
> area of unity, infinite sum of all possible cosines, a height
> which is not discussed but which appears to be greater in magnitude
> than a Bristow-Johnson ego (if that were indeed possible). In the
> systems that you deal with, what experimental evidence do you
> have that there are pulses with the attributes of Diracians upon
> which to base your theory?
Polymath wrote:
> A good question indeed!
>
> And a question that the so-called experts and authors in this
> field fail to answer time and time again.
>
> The Diracian, or Unit Impulse, is a very good mathematical tool
> for analysing the response of systems once a mathematical model of
> those systems has been produced.
>
> It is, however, a poor mathematical claim to make that such
> impulses are found to be part of a system when neither the
> area nor the magnitude of such impulses are found anywhere in such
> systems.
>
> To those who ask, "Who cares? I get good results.", I suggest that
> their approach is unscientific and compares to the religious
> loonies who sacrifice goats and virgins to stop the Sun falling
> out of the sky and justify the continuing practice by the Sun
> remaining in the sky.
>
> So..... is the world of DSP a world of scientific men, or is
> it a world of snake-oil charlatans and of religious loonies?
>
> Where do these Diracian impulses come from?
They're all the same, so they come right off dirac.

--
john