DSPRelated.com
Forums

VOIP Clock rate adaption

Started by Rocky August 4, 2007
Hi All,

In the ISDN environment (T1, E1) there is a 'master' clock and every
satellite device derives its clocking from the next device up the
hierarchy so that the entire network is synchronised.

When using VOIP the clock link is broken. Party A and party B may have
the same nominal sample rate, but because they have independent clock
sources the actual sample rates may vary by a small amount.

If data is sent from A to B eventually there will be a buffer overflow
or a buffer underrun. Since the buffer size in the proposed system is
256 samples at 8000 Hz sample rate giving some 30 msec of buffering I
wondered if anyone could propose an effective sample rate adaption
scheme that could be used.

Or suggest any other method of handling the problem.

Rocky
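
To put rough numbers on the question above: the sketch below estimates how long a half-full 256-sample buffer at 8000 Hz lasts before it overflows or underruns for a given clock mismatch. The mismatch values (10, 100 and 1000 ppm) are illustrative assumptions, not figures from the thread, and the function name is made up for this example.

```python
def seconds_until_slip(buffer_samples, sample_rate_hz, mismatch_ppm):
    """Time for a half-full buffer to overflow or underrun.

    mismatch_ppm is the assumed frequency difference between the two
    free-running clocks, in parts per million.
    """
    headroom = buffer_samples / 2                      # margin in either direction
    drift_per_second = sample_rate_hz * mismatch_ppm * 1e-6
    return headroom / drift_per_second


buffer_ms = 256 / 8000 * 1000                          # total buffer depth in ms
print(f"buffer depth: {buffer_ms:.0f} ms")

for ppm in (10, 100, 1000):                            # assumed mismatch values
    t = seconds_until_slip(256, 8000, ppm)
    print(f"{ppm:5d} ppm -> buffer lasts about {t:.0f} s")
```

Network jitter is ignored here; it eats into the same headroom, so the real margin is smaller.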

256 samples is a really short buffer for VoIP. You must expect a very low level of jitter. Is this a pure LAN environment? Those are usually the only situations where 30 ms has a hope of working consistently.

Various rate adaption strategies are used for VoIP. In the normal case, the jitter buffer will need to be able to grow to much more than 30 ms, yet keeping the buffer large badly affects perceived call quality, since large interactive latency is bad. This means the buffers are usually dynamic and need to adjust the flow of samples continuously. The simplistic flow adjustment method is to drop or insert samples in the quiet periods. The smarter method is to dynamically adjust the playback rate using a constant-pitch rate-adjusting algorithm, like PSOLA or PICOLA.

It's an interesting area, and one still subject to interesting research.

Regards,
Steve
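
The "simplistic" method mentioned above (drop or insert a sample in a quiet period) might look roughly like the following. This is only a sketch: the energy window of 8 samples and the function names are assumptions made for illustration, and a real implementation would more likely use a proper voice activity detector than a sliding energy sum.

```python
def quietest_index(frame, window=8):
    """Index of the centre of the lowest-energy window in the frame."""
    best_start, best_energy = 0, None
    for start in range(len(frame) - window + 1):
        energy = sum(s * s for s in frame[start:start + window])
        if best_energy is None or energy < best_energy:
            best_start, best_energy = start, energy
    return best_start + window // 2


def adjust_frame(frame, delta, window=8):
    """Return a copy of the frame one sample shorter (delta < 0),
    one sample longer (delta > 0) or unchanged (delta == 0).

    The change is made where the signal is quietest, so the
    discontinuity is as small as possible.
    """
    if delta == 0 or len(frame) < window:
        return list(frame)
    i = quietest_index(frame, window)
    if delta < 0:
        return list(frame[:i]) + list(frame[i + 1:])   # drop sample i
    return list(frame[:i + 1]) + list(frame[i:])       # repeat sample i


# A loud-quiet-loud frame: the edit lands inside the quiet stretch.
frame = [100] * 20 + [0] * 10 + [100] * 20
shorter = adjust_frame(frame, -1)
longer = adjust_frame(frame, +1)
print(len(frame), len(shorter), len(longer))
```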
The application is envisaged to be a dedicated replacement for an E1 card on Asterisk, so it would be a dedicated LAN card feeding one or two of these 32-port units.

We currently have an E1 channel bank but have found that some of the available E1 cards for Asterisk will lock up to the extent that a power-down reset is required to get them up again. Also, the E1 cards are relatively expensive, so we are exploring the idea of using a LAN card instead of the E1 card and creating an IAX2 to PCM highway.

I was thinking of using a scheme where we could change the PCM clock by up to 0.1%, possibly using only up to 30 channels, and add or delete a clock per frame, i.e. some frames with 255 or 257 bits. This would be controlled by attempting to keep the buffers (the ones in use!) half full.

Maybe each channel needs to be handled separately. What if the Asterisk unit is just a gateway?

Thanks for the input, Steve.

Rocky
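
One way to picture the scheme described above is a small controller that picks the length of each PCM frame from the buffer level. This is a sketch under assumptions, not a tested design: an E1 frame is 256 bits (32 timeslots of 8 bits), so stretching or shrinking one frame in every four changes the average clock by 1/1024, just under the 0.1% limit. The deadband of 16 samples and the class name are invented for the example, and it assumes a frame that is one bit period shorter finishes sooner and therefore drains the buffer faster.

```python
NOMINAL_BITS = 256   # one E1 frame: 32 timeslots x 8 bits
ADJUST_EVERY = 4     # at most 1 adjusted frame in 4 -> 1/1024, about 0.098 %


class FrameLengthController:
    """Chooses 255, 256 or 257 bit periods per frame to hold the buffer half full."""

    def __init__(self, buffer_size=256, deadband=16):
        self.target = buffer_size // 2   # aim for half full
        self.deadband = deadband         # assumed tolerance, in samples
        self.cooldown = 0                # nominal frames still owed after an adjustment

    def next_frame_bits(self, fill):
        """fill = samples currently waiting in the playout buffer."""
        if self.cooldown > 0:
            self.cooldown -= 1
            return NOMINAL_BITS
        error = fill - self.target
        if error > self.deadband:        # too full: shorter frame, consume faster
            self.cooldown = ADJUST_EVERY - 1
            return NOMINAL_BITS - 1
        if error < -self.deadband:       # too empty: longer frame, consume slower
            self.cooldown = ADJUST_EVERY - 1
            return NOMINAL_BITS + 1
        return NOMINAL_BITS


ctrl = FrameLengthController()
too_full = [ctrl.next_frame_bits(200) for _ in range(8)]
print(too_full)
```

Because all 30 channels share one PCM clock, a controller like this can only track a single far-end clock, which is the concern raised above about the gateway case.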

Rocky wrote:
> When using VOIP the clock link is broken. Party A and party B may have
> the same nominal sample rate, but because they have independent clock
> sources the actual sample rates may vary by a small amount.
The typical approach is to track the rate mismatch until the difference reaches +/- 1 sample, and then just drop or repeat a sample. If this does not happen very often, the effect is barely audible. Of course, interpolation would be better, but the simple +/- 1 sample slip is good enough.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
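
A minimal sketch of the tracking step described above, assuming the mismatch is already known in ppm. In a real system it would have to be estimated, for example from the buffer fill level over time. The class name and the sign convention are choices made for this example only.

```python
class SlipTracker:
    """Accumulates clock drift and reports when a whole sample must be
    dropped or repeated."""

    def __init__(self, mismatch_ppm):
        # Positive ppm: the sender's clock is fast, so samples pile up locally.
        self.drift_per_sample = mismatch_ppm * 1e-6
        self.accumulated = 0.0

    def step(self):
        """Call once per played sample.

        Returns -1 (drop one sample), +1 (repeat one sample) or 0.
        """
        self.accumulated += self.drift_per_sample
        if self.accumulated >= 1.0:
            self.accumulated -= 1.0
            return -1
        if self.accumulated <= -1.0:
            self.accumulated += 1.0
            return +1
        return 0


# 125 ppm at 8000 Hz is about one slip per second; count the slips
# over 12.5 seconds' worth of samples.
tracker = SlipTracker(125)
drops = sum(1 for _ in range(100_000) if tracker.step() == -1)
print(drops)
```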
Inserting or dropping a sample generally causes a nasty click. The rates are likely to be sufficiently different that these clicks might happen once a second, though they will generally be less frequent than that. In telephony this is generally considered an unacceptable solution.

Steve
Would it be reasonable to do minimal buffering for the VoIP TX path so as to allocate more RAM to the RX path, and let the remote receiver 'worry' about the rate adaption for TX, taking care only of the locally received data?

Rocky
Steve Underwood wrote:
> Inserting or dropping a sample generally causes a nasty click. The rates
> are likely to be sufficiently different that these clicks might happen
> once a second, though will generally be less frequent than that. In
> telephony this is generally considered an unacceptable solution.
I believe SDH interfaces between telcos do allow such slips for plesiochronous operation. But telcos use very accurate clocks, so I guess such slips are very rare (I'm not sure how rare).

Regards
--
Adrian Hey
Rocky wrote:
> We currently have an E1 channel bank but have found that some of the
> available E1 cards for Asterisk will lock up to the extent that a
> power-down reset is required to get them up again. Also, the E1 cards
> are relatively expensive, so we are exploring the idea of using a LAN
> card instead of the E1 card and creating an IAX2 to PCM highway.
>
> I was thinking of using a scheme where we could change the PCM clock
> by up to 0.1%, possibly using only up to 30 channels, and add or delete
> a clock per frame, i.e. some frames with 255 or 257 bits. This would
> be controlled by attempting to keep the buffers (the ones in use!)
> half full.
>
> Maybe each channel needs to be handled separately. What if the
> Asterisk unit is just a gateway?
I dunno about Asterisk, but there are products that seem to be doing what you want (AFAICT) available off the shelf. A quick Google reveals this:
http://www.ghipsystems.com/en/Products/IPM-en/IPM-en.html
but I'm sure there are others.

If you want to do it yourself it shouldn't be too hard. I've done something similar on Blackfin and I also designed it to cope with +/- 0.1% clock frequency errors, which seems generous even for really cheap, inaccurate crystals.

ADI have useful app notes about this kind of thing. A software implementation should easily be able to deal with 30 channels on one DSP. My implementation worked out at about 9 MIPS per channel, but that was bidirectional 16 kHz <-> 48 kHz asynchronous resampling. If you only need it to work at 8 kHz on the incoming network side, I guess you will need fewer MIPS.

I was fortunate enough to be dealing with TDM on both sides, and in this case you can derive all the timing information you need by time-tagging the DMA interrupts and doing a few calculations. I think you could do something similar, but with a network interface on one side you need to deal with much higher jitter in arrival times (of course).

BTW, if you're using E1 then I guess you're using the G.711 codec. If so, you may also want to consider implementing the packet loss concealment appendix:
http://www.itu.int/rec/T-REC-G.711-199909-I!AppI/en

I'm not sure about using exotic constant-pitch time stretching or compressing algorithms. It seems like a lot of work to me. If you find your jitter buffer getting dangerously near empty or over-full, I would have thought it would be OK to temporarily allow a largish variation in your synthesised resampling clock (but don't have sudden step changes in frequency). For example, +/- 5% would probably sound OK for speech and allow you to catch up or slow down at 50 ms/s (just guessing here, I've never actually tried it).

Regards
--
Adrian Hey
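
The last suggestion above (let the resampling clock wander by up to +/- 5%, but never in sudden steps) could be sketched as below. Everything beyond the 5% figure is an assumption for illustration: the gain, the per-update slew limit and the function names are invented, and plain linear interpolation stands in for whatever asynchronous resampling filter a real design would use. At the 5% limit the buffer level changes by 50 ms for every second of audio, which matches the 50 ms/s figure in the post.

```python
MAX_DEVIATION = 0.05   # +/- 5 %, the figure floated in the post
MAX_STEP = 0.001       # hypothetical slew limit per update, chosen arbitrarily


def next_ratio(current_ratio, fill, target_fill, gain=0.0005):
    """Move the playback-rate ratio toward what the buffer level asks for.

    ratio > 1 reads input faster than nominal (drains an over-full buffer).
    The change per call is limited so the rate never jumps.
    """
    wanted = 1.0 + gain * (fill - target_fill)
    wanted = max(1.0 - MAX_DEVIATION, min(1.0 + MAX_DEVIATION, wanted))
    step = max(-MAX_STEP, min(MAX_STEP, wanted - current_ratio))
    return current_ratio + step


def resample_linear(samples, ratio):
    """Read `samples` at `ratio` input samples per output sample,
    using linear interpolation between neighbours."""
    out = []
    pos = 0.0
    while pos <= len(samples) - 1:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out


# With a badly over-full buffer the ratio ramps up gradually to the 5 % limit.
ratio = 1.0
for _ in range(100):
    ratio = next_ratio(ratio, fill=1000, target_fill=128)
print(round(ratio, 3))
```

Note the report further down the thread that DTMF detection failed while such a system was adapting, so the rate change may need to be held off during signalling tones.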
I worked on a system that did just that, but unfortunately it could not decode DTMF when the rate was adapting.

John
Adrian Hey wrote:
> I believe SDH interfaces between telcos do allow such slips for
> plesiochronous operation. But telcos use very accurate clocks
> so I guess such slips are very rare (I'm not sure how rare).
Public exchanges use rubidium clocks, so slips are pretty rare. When you try allowing slips with a PBX it sounds bad. Also, for a pure PSTN application, regular slips would prevent almost any modem from working over the path.

Steve
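
For a feel of "pretty rare" versus "sounds bad", the sketch below computes the worst-case interval between one-sample slips for two free-running clocks. The accuracy figures are assumptions, not values from the thread: 1 part in 10^11 is a figure commonly quoted for primary reference clocks, and 50 ppm is a plausible tolerance for an inexpensive crystal.

```python
def seconds_between_slips(sample_rate_hz, accuracy_a, accuracy_b):
    """Worst-case interval between one-sample slips for two free-running
    clocks with the given fractional accuracies (e.g. 50e-6 for 50 ppm)."""
    worst_case_offset = accuracy_a + accuracy_b   # clocks off in opposite directions
    return 1.0 / (sample_rate_hz * worst_case_offset)


# Assumed accuracies, see the note above.
telco_days = seconds_between_slips(8000, 1e-11, 1e-11) / 86400
crystal_seconds = seconds_between_slips(8000, 50e-6, 50e-6)

print(f"reference clocks: one slip every {telco_days:.0f} days or so")
print(f"cheap crystals:   one slip every {crystal_seconds:.2f} s")
```

Under these assumptions the cheap-crystal case lands close to the "once a second" click rate mentioned earlier in the thread.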