DSPRelated.com
Forums

FIR and IIR Digital Filters in Gawk or Awk

Started by Todd Rose November 1, 2006
I am a newbie to both Awk and DSP, so forgive my ignorance.

I can and normally do filter data with Matlab, Mathematica, C, Fortran,
Excel, etc., but I would like something I can do on the command line in a
language that is flexible and easy to use (meant for the job more than
un-cryptic syntax).  I have had some good luck using gawk one-liners.  I
know I should RTFM or read http://www.oreilly.com/catalog/awkprog3/ as Ed
Morton seems to frequently suggest, but I can't think of an elegant way to
do this and I was hoping one of you all could help.

Is there a slick way to implement digital filters in gawk or awk?  It seems
like awk would be a natural at this, but I can't find anything when I Google
it (gawk awk filter FIR IIR and combinations thereof).  Obviously, I could
program awk using arrays like one would in any other language, but I would
prefer a more elegant solution that uses the natural features of awk (even
if getline is involved).

I really want an "2*n+1" (odd) centered point moving average (FIR) filter,
so no need to worry about fancy filter coefficients or using the previous
filtered values (IIR).  Ideally, it would read in a two column text file,
possibly with a commented header and footer, with the left column "time" and
the right the signal I want to filter, and write out the filtered data in
the identical format.  Note that I usually have one cycle or period of
multicycle averaged data (~1000 points) and it would be useful if the filter
wrapped around at the ends rather than starting or ending with zeros or the
beginning or ending values.  (I can't think of how to do this without
processing the file twice, but that would be O.K. if the code is more
elegant.)

For the awk experts who may be unfamiliar with DSP, the moving average (see
http://www.dspguide.com/ch15.htm for why it is optimal in some cases) can be
implemented recursively (http://www.dspguide.com/ch15/5.htm) which may be
useful for awk implementation.

The next thing I would like to have is Savitzky-Golay filtering, which is a
generalized centered moving average

Example weights are:

{0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \
-0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \
0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \
0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \
0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \
-0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \
0.04576659038901602`}

The center, 0.18750841297617446`, weight corresponds to "present" time a[i].

Next problem would be IIR where you use the raw data, x, and the previous
filtered data points, y,  to compute the new points.  Wrapping the periodic
data makes that one hard.  I think one would need an algorithm to start it
up then iterate to converge on the coupled "y" values.

Any help or tips on how to handle particular aspects would be appreciated.

Regards,
Todd Rose


Todd Rose wrote:
> I am a newbie to both Awk and DSP, so forgive my ignorance. > > I can and normally do filter data with Matlab, Mathematica, C, Fortran, > Excel, etc., but I would like something I can do on the command line in a > language that is flexible and easy to use (meant for the job more than > un-cryptic syntax). I have had some good luck using gawk one-liners. I > know I should RTFM or read http://www.oreilly.com/catalog/awkprog3/ as Ed > Morton seems to frequently suggest, but I can't think of an elegant way to > do this and I was hoping one of you all could help. > > Is there a slick way to implement digital filters in gawk or awk? It seems > like awk would be a natural at this, but I can't find anything when I Google > it (gawk awk filter FIR IIR and combinations thereof). Obviously, I could > program awk using arrays like one would in any other language, but I would > prefer a more elegant solution that uses the natural features of awk (even > if getline is involved). > > I really want an "2*n+1" (odd) centered point moving average (FIR) filter, > so no need to worry about fancy filter coefficients or using the previous > filtered values (IIR). Ideally, it would read in a two column text file, > possibly with a commented header and footer, with the left column "time" and > the right the signal I want to filter, and write out the filtered data in > the identical format. Note that I usually have one cycle or period of > multicycle averaged data (~1000 points) and it would be useful if the filter > wrapped around at the ends rather than starting or ending with zeros or the > beginning or ending values. (I can't think of how to do this without > processing the file twice, but that would be O.K. if the code is more > elegant.) > > For the awk experts who may be unfamiliar with DSP, the moving average (see > http://www.dspguide.com/ch15.htm for why it is optimal in some cases) can be > implemented recursively (http://www.dspguide.com/ch15/5.htm) which may be > useful for awk implementation. > > The next thing I would like to have is Savitzky-Golay filtering, which is a > generalized centered moving average > > Example weights are: > > {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \ > -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \ > 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \ > 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \ > 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \ > -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \ > 0.04576659038901602`} > > The center, 0.18750841297617446`, weight corresponds to "present" time a[i]. > > Next problem would be IIR where you use the raw data, x, and the previous > filtered data points, y, to compute the new points. Wrapping the periodic > data makes that one hard. I think one would need an algorithm to start it > up then iterate to converge on the coupled "y" values. > > Any help or tips on how to handle particular aspects would be appreciated. > > Regards, > Todd Rose
>From p.20 of The AWK Manual": If you find yourself writing awk scripts
of more than, say, a few hundred lines, you might consider using a different programming language." With that being said, wouldnt the following implement a simple FIR filter: cat foobar.txt | awk 'BEGIN{a0=0;a1=1;a2=2} { print $1, $2*a0+x1*a1+x2*a2; x2=x1; x1=$2; }' If your files are only a couple thousand lines, then why not just use C++ and the read the whole file into memory and do with it what you will? C++ has pretty nice stdin/stdout functions. Then you could use it like cat foobar.txt | firFilter coeffs.txt > fubar.txt
Rutiger,

Thanks.  This is great.  I was looking for a one-liner (even if it is a
pretty long line).  Like you said, if it is too long, you might be using the
wrong tool.

If I do it in C and pipe it, then I have to maintain it for the group with
only a few able or willing to program in C, and I would rather have them be
able to modify it to suit and improve it themselves.  (I also have some
other gawk tools I would like them to be able to use.  I plan on giving them
a little primer course -- once I am competent myself -- and let them loose.)

It seems to shift the data when I test it.  Any idea how to realign it so
than it is centered?

y[i]=a*x[i-1]+b*x[i]+c*x[i+1]

Any idea how to wrap it around?

Here is what I would like from a 3 pt centered moving average filter:

x    y    filtered y

      1 1 7.666667
      2 2 2
      3 3 3
      4 4 4
      5 5 5
      6 6 6
      7 7 7
      8 8 8
      9 9 9
      10 10 10
      11 11 11
      12 12 12
      13 13 13
      14 14 14
      15 15 15
      16 16 16
      17 17 17
      18 18 18
      19 19 19
      20 20 13.33333


Here is what I got with

gawk 'BEGIN{a0=1/3;a1=1/3;a2=1/3} { print $1,$2*a0+x1*a1+x2*a2; x2=x1;
x1=$2; }'

      1 1 0.333333
      2 2 1
      3 3 2
      4 4 3
      5 5 4
      6 6 5
      7 7 6
      8 8 7
      9 9 8
      10 10 9
      11 11 10
      12 12 11
      13 13 12
      14 14 13
      15 15 14
      16 16 15
      17 17 16
      18 18 17
      19 19 18
      20 20 19


Todd

"rutiger" <rutiger_bishop@yahoo.com> wrote in message
news:1162398163.225461.21210@m73g2000cwd.googlegroups.com...
> From p.20 of The AWK Manual": If you find yourself writing awk scripts > of more than, say, a few hundred lines, you might consider using a > different programming language." > > With that being said, wouldnt the following implement a simple FIR > filter: > > cat foobar.txt | awk 'BEGIN{a0=0;a1=1;a2=2} { print $1, > $2*a0+x1*a1+x2*a2; x2=x1; x1=$2; }' > > If your files are only a couple thousand lines, then why not just use > C++ and the read the whole file into memory and do with it what you > will? C++ has pretty nice stdin/stdout functions. Then you could use it > like > > cat foobar.txt | firFilter coeffs.txt > fubar.txt >
Todd Rose wrote:

> Is there a slick way to implement digital filters in gawk or awk? It seems > like awk would be a natural at this, but I can't find anything when I Google > it (gawk awk filter FIR IIR and combinations thereof). Obviously, I could > program awk using arrays like one would in any other language, but I would > prefer a more elegant solution that uses the natural features of awk (even > if getline is involved).
In the late 1990s, I wrote this one: BEGIN { if(ARGC<=1) { print "DIGIFILT - Digitale Filterung von diskreten Zeitreihen" >> "/dev/stderr" print "IN: Zeitreihen in Spalten" >> "/dev/stderr" print " jeder Zeitpunkt eine Zeile; jeder Kanal eine Spalte" >> "/dev/stderr" print "OUT: Zeitreihe; Format wie oben" >> "/dev/stderr" print "ALG: Prozedur 'filter' aus MATLAB Stud. Ed. Seite 274" >> "/dev/stderr" print "PARA: Name einer Datei mit Filter-Koeffizienten." >> "/dev/stderr" print " Das korrekte Format der Koeff.-Datei wird aus MATLAB heraus mit" >> "/dev/stderr" print " 'save mot8hz.flt a b /ascii /double' erzeugt." >> "/dev/stderr" print "Aufruf:" >> "/dev/stderr" print " gawk -f digifilt.awk diff.flt < data > filtereddata" >> "/dev/stderr" print "Beispiel: Butterworth 4. Ordnung, fs = 128 Hz, Cutoff 8 Hz" >> "/dev/stderr" print " 1.00000000000000e+000 -3.18063854887472e+000 3.86119434899422e+000 -2.11215535511097e+000 4.38265142261980e-001" >> "/dev/stderr" print " 4.16599204406620e-004 1.66639681762559e-003 2.49959522644039e-003 1.66639681762559e-003 4.16599204406731e-004" >> "/dev/stderr" print "Juergen Kahrs, 12.1.93" >> "/dev/stderr" print "Beispiel: Differentiator Beispiel: Integrator" >> "/dev/stderr" print " 1 0 1 0 rekursiver Zweig" >> "/dev/stderr" print " 1 -1 1 1 nichtrekursiver Zweig" >> "/dev/stderr" print "Juergen Kahrs, 27.3.97" >> "/dev/stderr" exit } coefffile=ARGV[1]; ARGV[1] = "" getline AStr < coefffile; getline BStr < coefffile na=split(AStr, a); nb=split(BStr, b); if (na>nb) {n=na-1} else {n=nb-1} for (i=1; i <=(n+1); i++) printf("a%d = %10.5f\t\tb%d = %10.5f\t\n", i, a[i], i, b[i]) >> "/dev/stderr" } { for(i=1; i<=NF; i++) { x[n, i] = $i y[n, i] = b[1] * x[n,i] for(j=2; j<=(n+1); j++) y[n, i] = y[n, i] + b[j] * x[n-j+1, i] - a[j] * y[n-j+1, i] printf(" %f", y[n, i]) for (j=0; j<=n; j++) { x[j, i] = x[j+1, i]; y[j, i] = y[j+1, i]; } } print "" } Comments are in German, but the script can easily be adapted to all kinds of digital filters. You only have to write a parameter file and pass the name of the parameter file to the gawk interpreter. When writing this script, I had a look at the Matlab Student Edition, which came with a wonderful manual.
> The next thing I would like to have is Savitzky-Golay filtering, which is a > generalized centered moving average > > Example weights are: > > {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \ > -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \ > 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \ > 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \ > 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \ > -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \ > 0.04576659038901602`}
You probably know the "Numerical Recipes" books: http://www.nrbook.com/a/bookcpdf.html Savitzky-Golay Smoothing Filters http://www.nrbook.com/a/bookcpdf/c14-8.pdf It is surprisingly simple to port the C routines to AWK. In the 1990s, I did this for some matrix routines and I still use some of these routines (for example linear regression with calculation of significance levels).
Hi Todd,

On Wed, 01 Nov 2006 13:49:48 -0500, Todd Rose wrote:

> Any idea how to wrap it around? > > Here is what I would like from a 3 pt centered moving average filter: > > x y filtered y > > 1 1 7.666667 > 2 2 2 > 3 3 3 > 4 4 4 > 5 5 5
[snip] That goal is (IMO) incompatible with the line-by-line approach that you would naturally take in awk (yes, I do DSP in awk myself, for small values of DSP). You have a requirement that the whole file be known before the filtering starts (or at least the last line, which amounts to the same thing in a stream-processing paradigm). You probably can do this in awk, but you'll have to do it by using arrays, and at that point the point of using awk is lost. I recommend that you abandon awk for this particular problem and look instead at numerical python (or even perfectly ordinary python: it'll work, but it will be a bit slower for large data files, and won't have the nice matrix arithmetic features of numpy). It might even come close to being a one-liner: you can get numpy to suck a whole file of data into a vector or array with a single statement. Most of what you like about Matlab is available in python. Cheers, -- Andrew
Todd Rose wrote:
> Rutiger, > > Thanks. This is great. I was looking for a one-liner (even if it is a > pretty long line). Like you said, if it is too long, you might be using the > wrong tool. > > If I do it in C and pipe it, then I have to maintain it for the group with > only a few able or willing to program in C, and I would rather have them be > able to modify it to suit and improve it themselves. (I also have some > other gawk tools I would like them to be able to use. I plan on giving them > a little primer course -- once I am competent myself -- and let them loose.) > > It seems to shift the data when I test it. Any idea how to realign it so > than it is centered?
The data is 'shifted' beacuse that is how the moving average filter works. The output of a true convolution is L+P-1 samples long, where L is the length of the input vector and P is the length of the filter. You'll have to use an END block to implement the remaing P-1 outputs. To realign the data in this particular case you can just discard the first and last outputs from the filter.
rutiger wrote:
> Todd Rose wrote: >> Rutiger, >> >> Thanks. This is great. I was looking for a one-liner (even if it is a >> pretty long line). Like you said, if it is too long, you might be using the >> wrong tool. >> >> If I do it in C and pipe it, then I have to maintain it for the group with >> only a few able or willing to program in C, and I would rather have them be >> able to modify it to suit and improve it themselves. (I also have some >> other gawk tools I would like them to be able to use. I plan on giving them >> a little primer course -- once I am competent myself -- and let them loose.) >> >> It seems to shift the data when I test it. Any idea how to realign it so >> than it is centered? > > The data is 'shifted' beacuse that is how the moving average filter > works. The output of a true convolution is L+P-1 samples long, where L > is the length of the input vector and P is the length of the filter. > You'll have to use an END block to implement the remaing P-1 outputs. > To realign the data in this particular case you can just discard the > first and last outputs from the filter.
That's all true, but it might be a little obscure at first. Maybe putting it another way would explain more clearly. You can look at the filter's output as the average of (the average of the first and last samples), (the average of the next two in from the ends), (and so on until you reach the middle). Each of those pairs applies to the time of the middle sample, so the result is delayed relative to the first sample. That's explains what happens when the thing does when the pipeline is full, but there are anomalies at both ends, when it's filling and when it's emptying. To the extent that the pipeline isn't full, the result is compromised. Discarding the worst half at each end realigns the two streams (and makes them the same length). Jerry -- "The rights of the best of men are secured only as the rights of the vilest and most abhorrent are protected." - Chief Justice Charles Evans Hughes, 1927 &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
Rutiger,

Your filter worked great.  As you pointed out, it did just what it was
supposed to do and it was very elegant.  Thanks for posting it.  Sorry if I
didn't seem appreciative.

Jerry,

That is a clearer explanation of the filter.  The pipeline analogy seems
apt.  What we usually do is add a copy of the cycle to front and the back
and filter it forward and backward with an elliptic filter.  Then we chop
off the end cycles and use that.  I had written a simple Excel worksheet and
just wrapped the 5 point centered moving averages around the ends.  This
doesn't work smoothly with awk with one pass through the file, and does seem
to be out of the spirit of awk.  I was just hoping there was a slick way to
take advantage of the recurrence forms (the next sum can be formed by just
adding one additional point to front and subtracting one from the back and
leaving the middle alone.)  This is sort of a mini pipeline with the length
of the filter rather than the length of the time series.

It seems to me that the elementary theory and way of thinking of DSP is in
the real time filtering application where you have no knowledge of future
time, rather than postprocessing a single cycle of data where you know all
time effectively.  (At least, that is how it seems to my naive eye trying to
learn from the Matlab Signal Processing Toolbox documentation or the
"Digital Filter Designer's Handbook" I picked up for $0.50 at the library
book sale.)  Everything is up-winded --there is no central difference, like
in finite difference theory (e.g., in CFD).  You open up a whole new can of
worms when you are free from the constraints of time's causal arrow.  You
can use centered moving filters like the moving average or Savitzky-Golay on
one pass and they don't shift the data, or you can pass the data back
through the filter backwards.  It is also impossible to wrap the ends around
to take advantage of a cylindrical (or toroidal) phase space when you don't
know future time events, but for postprocessing this is not a problem
theoretically.  It might be clumsy to implement in awk though.  I want to
exploit every advantage I have to extract information out of the signal.

Todd


J&#4294967295;rgen,

Thanks much.  This worked very well on my data even with the sample filter
you provided.  I translated the comments to English using BabbleFish.  Here
is the script with English comments and the .bat file I used to drive it.
(I needed to strip out the first column of time data from the time series,
filter it forward and backward, and then put it back together again for
input into my application and for plotting it.)

=== digifilt.awk ===

BEGIN {
  if(ARGC<=1) {
    print "DIGIFILT - digital filtering of discrete time series" >>
"/dev/stderr"
    print "IN:  Time series (data only) in columns " >> "/dev/stderr"
    print "     each sample time a line; each channel a column " >>
"/dev/stderr "
    print "OUT: Time series; Format like above " >> "/dev/stderr"
    print "ALG: Procedure 'filter' from MATLAB Stud. OD. Page 274" >>
"/dev/stderr "
    print "PARA: Name of a file with filter coefficients." >> "/dev/stderr"
    print "     the correct format of the coefficient file mot8hz.flt for a
b is output from MATLAB using" >> "/dev/stderr"
    print "     'save mot8hz.flt a b /ascii /double ' produced." >>
"/dev/stderr"
    print "call:" >> "/dev/stderr"
    print "gawk -f digifilt.awk diff.flt < data > filtereddata" >>
"/dev/stderr"
    print "Example: Butterworth 4. Order, fs = 128 Hz, cut-off 8 Hz " >>
"/dev/stderr"
    print " 1.00000000000000e+000 -3.18063854887472e+000
3.86119434899422e+000 -2.11215535511097e+000 4.38265142261980e-001 " >>
"/dev/stderr"
    print " 4.16599204406620e-004  1.66639681762559e-003
2.49959522644039e-003  1.66639681762559e-003 4.16599204406731e-004 " >>
"/dev/stderr"
    print "Juergen Kahrs, 12.1.93 " >> "/dev/stderr"
    print "Example: Differentiator               Example: Integrator " >>
"/dev/stderr"
    print " 1 0                                   1 0            recursive
branch " >> "/dev/stderr"
    print " 1 -1                                  1 1
not-recursive branch " >> "/dev/stderr"
    print "Juergen Kahrs, 27.3.97" >> "/dev/stderr"
    exit
  }
  coefffile=ARGV[1]; ARGV[1] = ""
  getline AStr < coefffile; getline BStr < coefffile
  na=split(AStr, a); nb=split(BStr, b);
  if (na>nb) {n=na-1} else {n=nb-1}
  for (i=1; i <=(n+1); i++)
    printf("a%d = %10.5f\t\tb%d = %10.5f\t\n", i, a[i], i, b[i]) >>
"/dev/stderr"

}

=== digifilt.bat ===

@echo off
rem gawk -f digifilt.awk diff.flt < data > filtereddata
rem split file into sample time and data (for one column of data only for
now)
rem TODO: add multiple columns of data -- digifilt.awk can handle multiple
columns
rem TODO: combine into one gawk one-liner
rem TODO: add front and rear repeats of the data to remove the end effects
rem (or really move them to the ends of the added cycles and then strip tham
back off
rem to leave the good filtfilt()ed middle time series.
gawk '{print $2}' Cylinder_1_single_raw.dat > dummycol2.dat
gawk '{print $1}' Cylinder_1_single_raw.dat > dummytime.dat
rem gawk -f digifilt.awk mot8hz.flt < dummycol2.dat > dummyfilt.dat
rem digifilt.awk is the gawk script
rem mot8hz.flt is the filter coefficients "a" followed by "b" rows (not
columns)
rem dummyfilt.dat is the output to pass back into tac then filter then tac
again
rem filter one direction
rem mot8hz as a two row "a" and "b" filter coefficient file as output from
Matlab
gawk -f digifilt.awk mot8hz.flt < dummycol2.dat > dummyfilt.dat
rem tac myfile | gawk '/log end/{f=1}f;f&&/log start/{exit}' | tac
rem filter backwards to remove delay (like filtfilt() in Matlab)
rem tac reverses the order of the file
tac dummyfilt.dat | gawk -f digifilt.awk mot8hz.flt | tac > dummyfiltrev.dat
rem this puts the time and filtfilt data back together as two columns (note
that the calling order is data then time
rem to get output as time column then data column
rem gawk 'NR==FNR{b[++n]=$0;next}FNR==n{t=","}{print $0 t, b[FNR]}' fileB
fileA
gawk 'NR==FNR{b[++n]=$0;next}FNR==n{t=","}{print $0 t, b[FNR]}'
dummyfiltrev.dat dummytime.dat > Cylinder_1_single_but.dat

=== end ===

Todd

"J&#4294967295;rgen Kahrs" <Juergen.KahrsDELETETHIS@vr-web.de> wrote in message
news:4qsc2cFog0j7U1@individual.net...
> Todd Rose wrote: > > > Is there a slick way to implement digital filters in gawk or awk? It
seems
> > like awk would be a natural at this, but I can't find anything when I
Google
> > it (gawk awk filter FIR IIR and combinations thereof). Obviously, I
could
> > program awk using arrays like one would in any other language, but I
would
> > prefer a more elegant solution that uses the natural features of awk
(even
> > if getline is involved). > > In the late 1990s, I wrote this one: > > BEGIN { > if(ARGC<=1) { > print "DIGIFILT - Digitale Filterung von diskreten Zeitreihen" >>
"/dev/stderr"
> print "IN: Zeitreihen in Spalten" >> "/dev/stderr" > print " jeder Zeitpunkt eine Zeile; jeder Kanal eine Spalte" >>
"/dev/stderr"
> print "OUT: Zeitreihe; Format wie oben" >> "/dev/stderr" > print "ALG: Prozedur 'filter' aus MATLAB Stud. Ed. Seite 274" >>
"/dev/stderr"
> print "PARA: Name einer Datei mit Filter-Koeffizienten." >>
"/dev/stderr"
> print " Das korrekte Format der Koeff.-Datei wird aus MATLAB
heraus mit" >> "/dev/stderr"
> print " 'save mot8hz.flt a b /ascii /double' erzeugt." >>
"/dev/stderr"
> print "Aufruf:" >> "/dev/stderr" > print " gawk -f digifilt.awk diff.flt < data > filtereddata" >>
"/dev/stderr"
> print "Beispiel: Butterworth 4. Ordnung, fs = 128 Hz, Cutoff 8 Hz" >>
"/dev/stderr"
> print " 1.00000000000000e+000 -3.18063854887472e+000
3.86119434899422e+000 -2.11215535511097e+000 4.38265142261980e-001" >> "/dev/stderr"
> print " 4.16599204406620e-004 1.66639681762559e-003
2.49959522644039e-003 1.66639681762559e-003 4.16599204406731e-004" >> "/dev/stderr"
> print "Juergen Kahrs, 12.1.93" >> "/dev/stderr" > print "Beispiel: Differentiator Beispiel: Integrator" >>
"/dev/stderr"
> print " 1 0 1 0
rekursiver Zweig" >> "/dev/stderr"
> print " 1 -1 1 1
nichtrekursiver Zweig" >> "/dev/stderr"
> print "Juergen Kahrs, 27.3.97" >> "/dev/stderr" > exit > } > coefffile=ARGV[1]; ARGV[1] = "" > getline AStr < coefffile; getline BStr < coefffile > na=split(AStr, a); nb=split(BStr, b); > if (na>nb) {n=na-1} else {n=nb-1} > for (i=1; i <=(n+1); i++) > printf("a%d = %10.5f\t\tb%d = %10.5f\t\n", i, a[i], i, b[i]) >>
"/dev/stderr"
> } > > { for(i=1; i<=NF; i++) { > x[n, i] = $i > y[n, i] = b[1] * x[n,i] > for(j=2; j<=(n+1); j++) y[n, i] = y[n, i] + b[j] * x[n-j+1, i] - a[j]
* y[n-j+1, i]
> printf(" %f", y[n, i]) > for (j=0; j<=n; j++) { > x[j, i] = x[j+1, i]; > y[j, i] = y[j+1, i]; > } } > print "" > } > > Comments are in German, but the script can easily > be adapted to all kinds of digital filters. You > only have to write a parameter file and pass the > name of the parameter file to the gawk interpreter. > When writing this script, I had a look at the > Matlab Student Edition, which came with a wonderful > manual. > > > The next thing I would like to have is Savitzky-Golay filtering, which
is a
> > generalized centered moving average > > > > Example weights are: > > > > {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \ > > -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \ > > 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \ > > 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \ > > 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \ > > -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \ > > 0.04576659038901602`} > > You probably know the "Numerical Recipes" books: > > http://www.nrbook.com/a/bookcpdf.html > Savitzky-Golay Smoothing Filters > http://www.nrbook.com/a/bookcpdf/c14-8.pdf > > It is surprisingly simple to port the C routines > to AWK. In the 1990s, I did this for some matrix > routines and I still use some of these routines > (for example linear regression with calculation > of significance levels). >
Andrew,

Thank you for your response.  It does seem a bit out of character with awk.

I have been looking at numpy to do this and everything else I do with
Matlab.  I seem to need to do a lot of text processing and accessing Excel
files in addition to numerical computing.  This is not fun for an old
Fortran programmer.  I have done it in Matlab, but it was painful.  I'm not
sure that numpy will be lot simpler, but it looks to have some promise.
Strangely enough, I find that the greater structure typing required by
python makes it easier to do the vectorized operations without so much trial
error (when I forget whether I need x' or x, etc., and I have to do little
test scripts).

What my job is (and I would guess a lot of engineers jobs are) is to take a
bunch of data from different sources, combine it into a usable structure,
then do some (sometimes heavy) numerical computations on it and output it as
either input files for another simulation code, or as plots or tables for a
written report or presentation.  Programming all this seems a bit much to
some people (the ones who don't have open up a 100 Excel files, transpose
them, and save them as .csv without forgetting any of them), but I find it
reduces errors in the final product, is a lot more fun for me, and with the
usual reruns to look at one little change or with your new corrected input
data, it saves a bunch of time.

Despite all my fiddling around with learning python (and ruby and java), it
has been gawk that has been the easiest to get working on the simpler
things.  (I have done all the above with Matlab and other than producing
.pdf output (which might be easy, too) I think python will be the total
package.  (I do like ruby, but I'll need to wait a while for numrb or scirb
to be what numpy and scipy are now.)

Todd
"Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message
news:pan.2006.11.01.23.29.38.258445@areilly.bpc-users.org...
> I recommend that you abandon awk for this particular problem and look > instead at numerical python (or even perfectly ordinary python: it'll > work, but it will be a bit slower for large data files, and won't have the > nice matrix arithmetic features of numpy). It might even come close to > being a one-liner: you can get numpy to suck a whole file of data into a > vector or array with a single statement. Most of what you like about > Matlab is available in python.