I have a system with a tuner chip and a DSP chip, each clocked by its own independent crystal. The tuner delivers baseband samples to the DSP; the tuner is the master and the DSP is the slave. The DSP demodulates the baseband samples, decodes the audio, and streams the resulting audio out.
If the tuner crystal drifts, the DSP audio streaming has to follow that drift; otherwise the sample rates no longer match and a buffer overflow or underrun occurs.
How do I design a control system that maps a digital baseband frame of duration T ms to audio and compensates for the drift?
As far as I can tell, digital radio standards such as DAB broadcast no timing information and have no metadata fields for it. The only way is to find the symbol boundary and timestamp it: since the tuner feeds the baseband samples to the DSP, any variation in the tuner clock shows up in the timing of the symbols seen by the DSP. The plan is to monitor this difference and adjust the audio clock in the DSP, which the DSP hardware allows.
This is what I'm planning to implement; please see the attached figure.
T0 - timestamp of RF sample buffer 1
T1 - timestamp of RF sample buffer 2
T = T1 - T0 (duration of the RF buffer)
t0 - timestamp of audio sample buffer 1
t1 - timestamp of audio sample buffer 2
t = t1 - t0 (duration of the audio buffer)
Assume the audio playback buffer and the RF capture buffer have the same nominal duration. Then
Error = (T - t)
This error is minimized by the audio control loop.
[Figure: Audio Control Loop]
Please send your suggestions and feedback on this approach.
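As a rough sketch of the error term above, assuming every timestamp is read from one free-running timer on the DSP: the struct, the function names, and the tick units are hypothetical and only illustrate Error = T - t.

```c
#include <stdint.h>

/* Hypothetical state: the previous timestamps (T0 and t0), in timer ticks. */
typedef struct {
    int64_t rf_prev;     /* T0: timestamp of the previous RF buffer */
    int64_t audio_prev;  /* t0: timestamp of the previous audio buffer */
} drift_state_t;

/* Returns Error = T - t in ticks, where T = T1 - T0 and t = t1 - t0,
 * then stores the new timestamps for the next call. */
int64_t drift_error_ticks(drift_state_t *s, int64_t rf_now, int64_t audio_now)
{
    int64_t T = rf_now - s->rf_prev;
    int64_t t = audio_now - s->audio_prev;
    s->rf_prev = rf_now;
    s->audio_prev = audio_now;
    return T - t;
}

/* Expresses the error relative to the nominal buffer duration, in ppm. */
double drift_error_ppm(int64_t error_ticks, int64_t nominal_ticks)
{
    return 1e6 * (double)error_ticks / (double)nominal_ticks;
}
```

A single-buffer error is likely to be noisy, so in practice it would probably be averaged or filtered before it drives the audio clock.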
One way to handle audio passing between two systems with two different clocks is to use an ASRC (asynchronous sample rate converter) in one path. There are multiple ways to use an ASRC. One option is to put a jitter-buffer-like buffer in the same path as the ASRC and keep monitoring the buffer level to decide whether to increase or decrease the ASRC output sample count. Hope it helps.
Yes, an ASRC is required, but how do I accurately estimate the drift and find the SRC ratio?
There is no need to estimate the instantaneous drift. It is fine as long as the effect of the drift is averaged out in the right way.
For example, consider the following case:
1. the input sampling rate is 48000 Hz
2. the output sampling rate is also 48000 Hz
3. the input DMA block size and the output DMA block size are both 128 samples
4. the jitter buffer size is 1280 samples, with the lower watermark at 20% and the upper watermark at 80%, i.e. low_flag is set when the buffer level goes below 256 samples and high_flag is set when the buffer level goes above 1024 samples.
The input loop (for example):
Input.get(&pInBuffer); // get a pointer to nInCount = 128 samples from the DMA buffer
nOutCount = calc_OutSamples(....); // compute the number of samples to produce (by interpolation) from the nInCount input samples; this function uses low_flag and high_flag
ASRC.process(pInBuffer, nInCount, pAsrcBuffer, nOutCount); // apply the ASRC to produce nOutCount samples from nInCount samples
JitterBuffer.add(pAsrcBuffer, nOutCount); // move the result to the jitter buffer
The output loop can simply consume from the JitterBuffer.
The above sequence is driven by the input loop. Equivalent sequences can be driven by the output loop/ISR.
This is just one way to handle it... hope it conveys the approach.
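One possible shape for calc_OutSamples on the input side, as a sketch only: the struct, the function names, and the per-block `step` are assumptions for illustration, and the watermarks follow the 20%/80% example above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical jitter-buffer bookkeeping; only the fill level matters here. */
typedef struct {
    size_t capacity;  /* total size in samples, e.g. 1280 */
    size_t level;     /* samples currently buffered */
} jitter_state_t;

/* low_flag: level below 20% of capacity (256 of 1280). */
bool jitter_low(const jitter_state_t *jb)
{
    return jb->level < jb->capacity / 5;
}

/* high_flag: level above 80% of capacity (1024 of 1280). */
bool jitter_high(const jitter_state_t *jb)
{
    return jb->level > (jb->capacity * 4) / 5;
}

/* Number of samples the ASRC should produce from n_in input samples.
 * 'step' is an assumed per-block correction, not a value from the thread. */
size_t calc_out_samples(const jitter_state_t *jb, size_t n_in, size_t step)
{
    if (jitter_low(jb))
        return n_in + step;              /* buffer draining: create samples */
    if (jitter_high(jb) && n_in > step)
        return n_in - step;              /* buffer filling: drop samples */
    return n_in;                         /* inside the band: 1:1 */
}
```

With a 1280-sample buffer and 128-sample blocks, a level of 200 gives 129 output samples for `step = 1`, a level of 1100 gives 127, and anything between the watermarks gives 128.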
Suppose I move this to the output side.
First I wait for the jitter buffer to be half full, i.e. I wait for
5 audio packets, which fill the jitter buffer with 128 x 5 = 640 samples. Then I start the output-side DMA. After every packet is sent out, the DMA raises an interrupt, and in the ISR callback I check the buffer level. If the sample rates are not the same, the buffer will build up or under-run, depending on the direction of the drift.
Now my question: if I detect that the jitter buffer has reached the lower or higher watermark, how do I decide how many samples to output? Won't that cause an audio artifact?
How does the "calc_OutSamples(....);" function work functionally?
Before I answer your questions, let's assume an API for the ASRC call.
int asrc_resample(int *p_in, int *p_out, uint n_in, uint n_out);
>> how to decide on how many samples to output ?
Since you've decided to move the ASRC to the output side, I'm assuming you plan to call the ASRC after the buffering. In that case it is better to keep n_out constant and vary n_in according to the buffer level.
So keep n_out constant, equal to (or a multiple of) your output DMA buffer size, and vary n_in depending on the buffer level.
Here again there is more than one way, and they may not all give the same result. One scheme which I feel is good for CD-quality rendering is:
1. when the buffer level is lower than watermarkL, make n_in = 0.9*n_out
2. when the buffer level is higher than watermarkH, make n_in = 1.1*n_out
3. else n_in = n_out
and call the ASRC API in all three cases (!), including the 1:1 case.
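The three cases can be sketched as a small selection function. This is only an illustration: the function name is hypothetical, and rounding 0.9*n_out and 1.1*n_out to the nearest whole sample is an assumption, since the scheme above does not say how fractional counts are handled.

```c
#include <stddef.h>

/* Picks n_in for a fixed n_out from the jitter-buffer level.
 * Integer arithmetic with round-to-nearest is an assumption. */
size_t select_n_in(size_t level, size_t watermark_l, size_t watermark_h,
                   size_t n_out)
{
    if (level < watermark_l)
        return (n_out * 9 + 5) / 10;   /* case 1: n_in = 0.9 * n_out */
    if (level > watermark_h)
        return (n_out * 11 + 5) / 10;  /* case 2: n_in = 1.1 * n_out */
    return n_out;                      /* case 3: n_in = n_out */
}
```

The result would be passed as n_in to asrc_resample while n_out stays fixed at the output DMA block size. For n_out = 128 this gives 115, 141, or 128 input samples.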
10% interpolation without affecting audio quality can be realized with nominal MIPS. A good paper by Dr. Paul describes the theory and implementation aspects: http://www.analog.com/media/en/technical-documenta...
>> Won't that cause an audio artifact ?
No. 10% interpolation will not degrade audio quality in practical cases. If you want more accuracy, go for higher-order interpolation and longer FIR filters.
>> How does the "calc_OutSamples(....); " function work functionally ?
Already explained above.
Hope it helps.
Thanks for the detailed explanation. I would like to go into some finer details, as I think they are worth discussing.
When the level reaches the lower or higher watermark, will it keep oscillating between case 1 or 2 and case 3?
How can I test this by artificially creating an audio drift? Is there any method to create drift for testing, rather than running for long hours?
The best way is to use an Audio Precision analyzer, which can be set up to give you signals with jittery clocks.
There are many alternatives, such as:
a. Perform upsampling and downsampling once in a while so that the average rate stays constant.
b. Put a test ASRC ahead of the actual ASRC to increase or decrease the number of samples.
c. Or buy a cheap DVD player and use its S/PDIF output in master mode. The real test!
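For option b, a bench-test sketch of how a test stage might decide its per-block sample count for a chosen drift in ppm. Everything here (names, the ppm parameter, the remainder bookkeeping) is hypothetical; it only computes the counts and leaves the actual resampling to the test ASRC.

```c
#include <stdint.h>

/* Hypothetical drift generator state: carry-over in millionths of a sample. */
typedef struct {
    int64_t rem;
} drift_gen_t;

/* Returns how many samples the test stage should emit for this block so
 * that the long-run average is nominal * (1 + ppm / 1e6).
 * Assumes ppm is greater than -1000000. */
uint32_t drift_gen_next(drift_gen_t *g, uint32_t nominal, int32_t ppm)
{
    int64_t total = (int64_t)nominal * (1000000 + ppm) + g->rem;
    uint32_t n = (uint32_t)(total / 1000000);  /* whole samples this block */
    g->rem = total % 1000000;                  /* fraction carried forward */
    return n;
}
```

With nominal = 128 and ppm = 1000, most blocks return 128 and every so often one returns 129, so 1000 blocks add up to 128128 samples. An exaggerated ppm value makes the jitter buffer hit its watermarks in seconds instead of hours.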
The rest I leave to your imagination... all the best.
As you know, my primary aim is to apply this to a broadcast receiver, where the audio is received over the air, the channel decoder demodulates the digital radio symbols and extracts the compressed audio data, and an audio decoder then decodes it. After decoding, the uncompressed audio has to be played out.
I would like to implement the jitter buffer using timestamps. So my question is where to do the timestamping. A timestamp taken after audio decoding will be jittery because of the variable audio content and compression effects. The best place therefore seems to be the RF input, when the baseband samples are buffered by DMA; that timestamp can then be used for clock recovery when sending out the audio.
Any suggestions on this approach? Please refer to the attached figure in my first post.
The ASRC approach we have discussed so far performs asynchronous resampling to compensate for time-base changes. The scheme is generic and applies to any sample producer and consumer that run on different clocks. I think the broadcast receiver you described also fits into this category.
A time-base change leads to buffer overflow or underflow, which eventually causes dropped samples and hence a glitch in the output audio. The ASRC approach creates or destroys samples to compensate for the extra or missing samples in the buffer. It does not mandate the use of timestamps.
The ASRC ratio can be decided from the buffer level, the timestamp 'trend', or any other information you want to track.
Suppose I put a timestamp on each input-side DMA buffer and, on the output side, compare it with the current time to find the delta between arrival and departure. That delta should remain constant as long as there is no drift.
Now consider the scenario where I need to do 10% interpolation because the jitter buffer has gone below the lower watermark due to drift. Which input-side timestamp should I use? Do I need to interpolate the time as well?
Though I haven't done it myself, one option for using the arrival timestamps to get a more accurate resampling factor is to fit the incoming timestamps against the outgoing (local) timestamps with a straight line, y = mx + c. You can then use m as the current resampling factor,
where x and y are the numbers corresponding to the timestamps.
You can use a first-order regression fit to find m over a window of r frames, each of (local) duration dt. Update the regression fit continuously, or refresh it after every n frames.
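A minimal sketch of the first-order fit, assuming x holds the local timestamps and y the incoming ones for the last r frames. The function name is hypothetical, and whether m or 1/m is the ratio to hand to the ASRC depends on how the ASRC defines its ratio.

```c
#include <stddef.h>

/* Least-squares slope m of y = m*x + c over n points.
 * Returns 1.0 (no correction) when the slope cannot be determined. */
double fit_slope(const double *x, const double *y, size_t n)
{
    if (n < 2)
        return 1.0;

    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        sxx += dx * dx;                 /* spread of local timestamps */
        sxy += dx * (y[i] - mean_y);    /* covariance with incoming ones */
    }
    if (sxx == 0.0)
        return 1.0;                     /* all x equal: slope undefined */
    return sxy / sxx;
}
```

Subtracting the means before accumulating keeps the sums small, which helps when the timestamps are large tick counts.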
Hope it helps.
Assume my input and output DMA sizes are 128 samples each and the resampling happens on the input side as well. If the jitter buffer hits the high watermark, the output side is slower, so more has to be consumed from the input,
i.e. n_in = 1.1*n_out,
but since the input DMA delivers only 128 samples per DMA-complete callback, the resampler has to wait for the next DMA callback to get 128 + 12.8 samples. This causes a delay of 2.67 ms (one 128-sample block at 48 kHz). How should this varying input requirement be handled?
Ben, I hope you have resolved the generic aspects at your end. You can also drop a mail to email@example.com in case you need to take this further on specifics.
Tune the tunable clock by comparing the input position in the buffer(s) with the take-out position (i.e. calculate the free space). Ideally the distance between them should be half the total buffer space.
In some cases (when DMA or other hardware controls the buffer I/O) I have more than two buffers and work out the free buffer count. If it is low or high, I change the clock.
Joakim / OZ1DUG
I've done something similar for genlocked video, although in that case we were strictly locked at the output end and varied the timing of a camera at the source end. In the middle was a buffer, and we servoed the fill level of that buffer to 50%.
You can model the buffer fill level as an integrating process. Depending on your needs and the amount of variability you have to contend with, you can implement a simple proportional controller and accept that the fill level won't sit at exactly 50% when there's a timing mismatch, or you can use a proportional-integral controller and nail down the long-term fill level at exactly 50%.
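A minimal sketch of such a fill-level servo, assuming the fill level is sampled once per block as a fraction between 0 and 1. The struct, the gains, and the output clamp are hypothetical and would need tuning for a real system; setting ki to 0 gives the plain proportional variant.

```c
/* Hypothetical PI controller that servos the buffer fill level to 50%. */
typedef struct {
    double kp;          /* proportional gain (assumed, needs tuning) */
    double ki;          /* integral gain (0 gives a pure P controller) */
    double integral;    /* accumulated fill error */
    double max_adjust;  /* clamp on the correction, in the caller's units */
} fill_pi_t;

/* fill_fraction: current fill level, 0.0 (empty) to 1.0 (full).
 * Returns a correction that is positive when the buffer is fuller than
 * half, meaning the consumer should speed up or the producer slow down. */
double fill_pi_update(fill_pi_t *c, double fill_fraction)
{
    double err = fill_fraction - 0.5;
    double integral = c->integral + err;
    double out = c->kp * err + c->ki * integral;

    if (out > c->max_adjust)
        out = c->max_adjust;       /* saturated: hold the integrator */
    else if (out < -c->max_adjust)
        out = -c->max_adjust;      /* saturated: hold the integrator */
    else
        c->integral = integral;    /* accumulate only while unsaturated */
    return out;
}
```

The correction can drive either a tunable audio clock or an ASRC ratio. Holding the integrator while the output is clamped is a simple guard against wind-up during long underrun or overflow episodes.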