DSPRelated.com
Forums

Low latency DSP on PC question

Started by TrissT February 26, 2009
Hi guys, I hope someone can help me.

I am developing an application on PC which needs low latency DSP, using
C++.  The filters form part of a Windows application, which is part of a
machine control system.

As FIR filters seem to be higher latency, I am using IIR lowpass filters,
with some success, but have the issue that the higher the filter order, the
higher the latency gets.  The system has a low sample rate, so latency is
an important issue.

I notice from reading this forum that there is a form of IIR filter called
'parallel form' where the filter is built from summing parallel stages
rather than cascading.

Can someone point me at the design techniques for these? They look at
first sight to be just what I need.

Will they really have lower latency than the 'normal' cascaded stages, for
a given filter order?

Thanks,
TrissT


There is no reason that latency should increase as the IIR filter order
goes up. If you have a cascade of biquads, you just need to execute them
from left to right, so that the input to every filter is the most recently
calculated output from the previous filter. Executing from right to left
gives you an extra z^-1 between each stage, which is undesirable.

I always thought the parallel approach was a bit weird; for example, if you
are trying to achieve a -100 dB stopband, you end up adding two large
quantities together that are supposed to cancel to the -100 dB level.
Sounds like coefficient quantization could ruin your whole day, but I could
be wrong because I have never used them.

I believe the Matlab fdatool will give you the coefficients, if you want
them.

Bob Adams
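To make the left-to-right versus right-to-left point concrete, here is a minimal C++ sketch. It is not code from anyone in this thread: the `Biquad` struct, the transposed Direct Form II layout, and the names `cascade_forward` and `cascade_reverse` are all assumptions chosen for illustration. The forward version feeds each stage the output just computed by its predecessor; the reverse version feeds each stage the predecessor's output from the previous sample period, which is where the extra z^-1 per stage boundary comes from.

```cpp
#include <array>
#include <cstddef>

// One second-order section, transposed Direct Form II (an illustrative choice).
// H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)
struct Biquad {
    double b0, b1, b2, a1, a2;
    double s1, s2;  // filter state

    Biquad(double b0_, double b1_, double b2_, double a1_, double a2_)
        : b0(b0_), b1(b1_), b2(b2_), a1(a1_), a2(a2_), s1(0.0), s2(0.0) {}

    // The output depends on the current input sample: no added sample delay.
    double process(double x) {
        const double y = b0 * x + s1;
        s1 = b1 * x - a1 * y + s2;
        s2 = b2 * x - a2 * y;
        return y;
    }
};

// Left to right: every stage consumes the output its predecessor has just
// produced, so the cascade adds no delay beyond the transfer function itself.
template <std::size_t N>
double cascade_forward(std::array<Biquad, N>& stages, double x) {
    for (Biquad& stage : stages) {
        x = stage.process(x);
    }
    return x;
}

// Right to left: every stage consumes its predecessor's output from the
// PREVIOUS call (kept in held[]), which inserts one z^-1 per stage boundary.
template <std::size_t N>
double cascade_reverse(std::array<Biquad, N>& stages,
                       std::array<double, N>& held, double x) {
    static_assert(N > 0, "need at least one stage");
    for (std::size_t i = N; i-- > 1;) {
        held[i] = stages[i].process(held[i - 1]);  // stale input
    }
    held[0] = stages[0].process(x);
    return held[N - 1];
}
```

With three pass-through sections (b0 = 1, everything else 0), a unit impulse comes out of `cascade_forward` on the same call, but out of `cascade_reverse` two calls later, i.e. N-1 samples late.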
By the way, I am assuming that when you say "latency" you do not mean "peak
group delay of the filter". They are two very different things.

Bob
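The distinction can be checked numerically. The sketch below is only an illustration under stated assumptions: `Section`, `response`, and `group_delay_samples` are hypothetical names, and the group delay is estimated with a central difference of the phase rather than a closed-form expression. Group delay is a property of the transfer function H(z), so a cascade and a parallel realization of the same H(z) share it; it is separate from any extra samples of delay introduced by how the stages are executed.

```cpp
#include <complex>
#include <vector>

// Coefficients of one second-order section:
// H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)
struct Section {
    double b0, b1, b2, a1, a2;
};

// Frequency response of a cascade of sections at w radians per sample.
std::complex<double> response(const std::vector<Section>& sos, double w) {
    const std::complex<double> z1 = std::polar(1.0, -w);  // z^-1 = e^{-jw}
    const std::complex<double> z2 = z1 * z1;              // z^-2
    std::complex<double> h(1.0, 0.0);
    for (const Section& s : sos) {
        h *= (s.b0 + s.b1 * z1 + s.b2 * z2) / (1.0 + s.a1 * z1 + s.a2 * z2);
    }
    return h;
}

// Group delay in samples, -d(phase)/dw, via a central difference.
// Taking arg() of the ratio avoids phase-unwrapping problems for small step.
double group_delay_samples(const std::vector<Section>& sos, double w,
                           double step = 1e-6) {
    const std::complex<double> ratio =
        response(sos, w + step) / response(sos, w - step);
    return -std::arg(ratio) / (2.0 * step);
}
```

As a sanity check, a pure one-sample delay (b1 = 1, everything else 0) should report a group delay of 1 sample at any frequency, and the one-pole lowpass H(z) = 0.5 / (1 - 0.5 z^-1) should report 1 sample at DC, since its DC group delay is a/(1-a) with a = 0.5.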
The word I've heard is that the latency of the OS -- indeterminate at that!
-- will swamp any latency of calculation. Windows is hardly a real-time OS.

Jerry
--
Engineering is the art of making what you want from things you can get.
On Feb 26, 6:18 am, "TrissT" <tr...@ymir.net> wrote:
> Hi guys, I hope someone can help me.
>
> I am developing an application on PC which needs low latency DSP, using
> C++. The filters form part of Windows application, which is part of a
> machine control system.
If you are trying to develop a real-time control system where a Windows PC
application is part of the control, then you should give up now.

Windows is not a real-time operating system. It's a pre-emptive operating
system. Because of pre-emption, your Windows app can be interrupted at any
time by the kernel, for any amount of time, and not receive a time-slice
again until much later. Typically, you should expect your Windows app to be
pre-empted for anywhere from tens of milliseconds to hundreds of
milliseconds, depending on the configuration of each individual machine
(global settings such as "server" vs "desktop" mode), the current workload
of the machine at each instant (such as the number and CPU requirements of
other apps running on the machine), and other factors too numerous to
mention or to control over any significant deployment.

Some Windows OSs are real-time, such as Windows CE. A control system using
Windows CE might be possible. But you mentioned PCs, so you probably are
not targeting Windows CE.

On the other hand, if you are not using the app as part of a real-time
control loop, then Windows apps are perfectly capable of post-processing
activity, such as monitoring activity to sound alarms and the like if
things look wrong. In such scenarios, there is not a need for real-time.
Malachy Moses wrote:
> If you are trying to develop a real-time control system where a
> Windows PC application is part of the control, then you should give up
> now.
>
> Windows is not a real-time operating system. It's a pre-emptive
> operating system.
> [...]
It may be possible to assign one core of a dual-core processor to the
real-time task alone. That might bear looking into.

Jerry
--
Engineering is the art of making what you want from things you can get.
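For anyone who wants to try the one-core idea, a rough C++ sketch follows. The function name `pin_current_thread_to_one_core` is hypothetical; the OS calls it uses are `GetProcessAffinityMask` and `SetThreadAffinityMask` on Windows and `sched_getaffinity` and `sched_setaffinity` on Linux. Note the limit of what it does: it restricts the calling thread to a single core, but it does not stop the scheduler from running other threads on that core, so on its own it does not give the task the core "alone".

```cpp
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Restricts the calling thread to the first core the process may use.
// Returns the chosen core index, or -1 on failure or unsupported platforms.
int pin_current_thread_to_one_core() {
#if defined(_WIN32)
    DWORD_PTR process_mask = 0;
    DWORD_PTR system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask)) {
        return -1;
    }
    for (int core = 0; core < static_cast<int>(sizeof(DWORD_PTR) * 8); ++core) {
        const DWORD_PTR bit = static_cast<DWORD_PTR>(1) << core;
        if (process_mask & bit) {
            // SetThreadAffinityMask returns 0 on failure.
            if (SetThreadAffinityMask(GetCurrentThread(), bit) == 0) {
                return -1;
            }
            return core;
        }
    }
    return -1;
#elif defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int core = 0; core < CPU_SETSIZE; ++core) {
        if (CPU_ISSET(core, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(core, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0) {
                return -1;
            }
            return core;
        }
    }
    return -1;
#else
    return -1;  // no portable affinity API assumed here
#endif
}
```

Calling this at the start of the filter thread would be the natural place. Whether it helps in practice depends on what else the machine is doing, so it is worth measuring before and after.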
>It may be possible to assign one core of a dual-core processor to the
>real-time task alone. That might bear looking into.
There are hard real time extensions to Linux which work pretty well.
However "hard" is a pretty vague term in an era where cores rely on caches
to a huge extent. A small action by another activity on the machine, that
trashes the cache, can have a huge impact on the hard real time activity.
Assigning the real time activity to a specific core only partially
alleviates that. The L1 caches may be private to a core, but L2 may not be,
and L3 certainly won't.

Regards,
Steve
"steveu" <steveu@coppice.org> wrote in message
news:ydSdnbN2LJO1-DrUnZ2dnUVZ_v_inZ2d@giganews.com...

> There are hard real time extensions to Linux which work pretty well.
> However "hard" is a pretty vague term in an era where cores rely on caches
> to a huge extent.

"Hard" doesn't imply "fast" or "instant". Hard realtime only means that
missing the deadlines is unacceptable.

> A small action by another activity on the machine, that
> trashes the cache, can have a huge impact on the hard real time activity.

The deep pipelines and branch predictors contribute, too. However, that
kind of indeterminism is in the range of tens of nanoseconds.

> Assigning the real time activity to a specific core only partially
> alleviates that.

So, assign the real time activity to PIC or AVR. Timing is exact and
predictable.

> The L1 caches may be private to a core, but L2 may not be,
> and L3 certainly won't.

SDRAM page misses or occasional refresh cycles create a lot of jitter in
the 100ns range.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
>"steveu" <steveu@coppice.org> wrote in message
>news:ydSdnbN2LJO1-DrUnZ2dnUVZ_v_inZ2d@giganews.com...
>
>> There are hard real time extensions to Linux which work pretty well.
>> However "hard" is a pretty vague term in an era where cores rely on caches
>> to a huge extent.
>
>"Hard" doesn't imply "fast" or "instant". Hard realtime only means that
>missing the deadlines is unacceptable.
Of course, but it is becoming increasingly difficult to know if you can meet those not especially fast hard constraints when the equipment is so very non-deterministic. Analysis no longer gets you very far, as the worst case is extremely arcane. Testing cannot prove you will always meet the constraints. You end up with vague things like "the worst I ever saw was X, or make sure I have a bit more than X available and hope that's safe". Time was when rigorous hard real time analysis was a very practical thing.
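The "worst I ever saw was X" style of measurement can be sketched in a few lines of portable C++. This is an assumed illustration, not a tool anyone in the thread mentions: `worst_observed_lateness` is a hypothetical name, and it only records how late a periodic wake-up was relative to its requested deadline. As noted above, the number it returns is a lower bound on the true worst case, not proof that deadlines will always be met.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Wakes up `iterations` times at a fixed period and returns the largest
// observed lateness (actual wake-up time minus requested wake-up time).
// Only what was observed in this run; the true worst case may be larger.
std::chrono::nanoseconds worst_observed_lateness(
        int iterations, std::chrono::microseconds period) {
    using clock = std::chrono::steady_clock;
    std::chrono::nanoseconds worst(0);
    clock::time_point deadline = clock::now();
    for (int i = 0; i < iterations; ++i) {
        // Advance the absolute deadline so that lateness does not accumulate.
        deadline += std::chrono::duration_cast<clock::duration>(period);
        std::this_thread::sleep_until(deadline);
        const std::chrono::nanoseconds late =
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                clock::now() - deadline);
        worst = std::max(worst, late);
    }
    return worst;
}
```

Running it for a long time at the control loop's sample period, with the machine under a realistic load, gives a feel for the scheduling jitter on a particular installation. A different load or configuration can produce a very different figure.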
>> A small action by another activity on the machine, that
>> trashes the cache, can have a huge impact on the hard real time activity.
>
>The deep pipelines and branch predictors contribute, too. However that kind
>of indeterminism is in the range of tens of nanoseconds.
Cache activity can affect things by an order of magnitude.
>> Assigning the real time activity to a specific core only partially
>> alleviates that.
>
>So, assign the real time activity to PIC or AVR. Timing is exact and
>predictable.
Right. However, most MCUs just a little bit pokier than those are starting
to place reliance on cache, and determinism is lost. It would be nice if
some of these things could be put in a mode where they are a lot slower,
but all the statistically based performance features are disabled.
>> The L1 caches may be private to a core, but L2 may not be,
>> and L3 certainly won't.
>
>SDRAM page misses or occasional refresh cycles create a lot of jitter in the
>100ns range.
Almost everything being done to soup up the average performance of modern
machines is having a negative influence on the determinism needed by hard
real time activities.

Steve
"steveu" <steveu@coppice.org> writes:

> Of course, but it is becoming increasingly difficult to know if you can
> meet those not especially fast hard constraints when the equipment is so
> very non-deterministic. Analysis no longer gets you very far, as the worst
> case is extremely arcane. Testing cannot prove you will always meet the
> constraints. You end up with vague things like "the worst I ever saw was X,
> or make sure I have a bit more than X available and hope that's safe". Time
> was when rigorous hard real time analysis was a very practical thing.

This lot have a statistical approach to the problem based on instrumenting
your application code and putting it through its paces. I've never used the
tool, just aware of its existence :)

http://www.rapitasystems.com/

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html