I am a newbie to both Awk and DSP, so forgive my ignorance. I can and normally do filter data with Matlab, Mathematica, C, Fortran, Excel, etc., but I would like something I can do on the command line in a language that is flexible and easy to use (meant for the job more than un-cryptic syntax). I have had some good luck using gawk one-liners. I know I should RTFM or read http://www.oreilly.com/catalog/awkprog3/ as Ed Morton seems to frequently suggest, but I can't think of an elegant way to do this and I was hoping one of you all could help. Is there a slick way to implement digital filters in gawk or awk? It seems like awk would be a natural at this, but I can't find anything when I Google it (gawk awk filter FIR IIR and combinations thereof). Obviously, I could program awk using arrays like one would in any other language, but I would prefer a more elegant solution that uses the natural features of awk (even if getline is involved). I really want an "2*n+1" (odd) centered point moving average (FIR) filter, so no need to worry about fancy filter coefficients or using the previous filtered values (IIR). Ideally, it would read in a two column text file, possibly with a commented header and footer, with the left column "time" and the right the signal I want to filter, and write out the filtered data in the identical format. Note that I usually have one cycle or period of multicycle averaged data (~1000 points) and it would be useful if the filter wrapped around at the ends rather than starting or ending with zeros or the beginning or ending values. (I can't think of how to do this without processing the file twice, but that would be O.K. if the code is more elegant.) For the awk experts who may be unfamiliar with DSP, the moving average (see http://www.dspguide.com/ch15.htm for why it is optimal in some cases) can be implemented recursively (http://www.dspguide.com/ch15/5.htm) which may be useful for awk implementation. The next thing I would like to have is Savitzky-Golay filtering, which is a generalized centered moving average Example weights are: {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \ -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \ 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \ 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \ 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \ -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \ 0.04576659038901602`} The center, 0.18750841297617446`, weight corresponds to "present" time a[i]. Next problem would be IIR where you use the raw data, x, and the previous filtered data points, y, to compute the new points. Wrapping the periodic data makes that one hard. I think one would need an algorithm to start it up then iterate to converge on the coupled "y" values. Any help or tips on how to handle particular aspects would be appreciated. Regards, Todd Rose
FIR and IIR Digital Filters in Gawk or Awk
Started by ●November 1, 2006
Reply by ●November 1, 20062006-11-01
Todd Rose wrote:> I am a newbie to both Awk and DSP, so forgive my ignorance. > > I can and normally do filter data with Matlab, Mathematica, C, Fortran, > Excel, etc., but I would like something I can do on the command line in a > language that is flexible and easy to use (meant for the job more than > un-cryptic syntax). I have had some good luck using gawk one-liners. I > know I should RTFM or read http://www.oreilly.com/catalog/awkprog3/ as Ed > Morton seems to frequently suggest, but I can't think of an elegant way to > do this and I was hoping one of you all could help. > > Is there a slick way to implement digital filters in gawk or awk? It seems > like awk would be a natural at this, but I can't find anything when I Google > it (gawk awk filter FIR IIR and combinations thereof). Obviously, I could > program awk using arrays like one would in any other language, but I would > prefer a more elegant solution that uses the natural features of awk (even > if getline is involved). > > I really want an "2*n+1" (odd) centered point moving average (FIR) filter, > so no need to worry about fancy filter coefficients or using the previous > filtered values (IIR). Ideally, it would read in a two column text file, > possibly with a commented header and footer, with the left column "time" and > the right the signal I want to filter, and write out the filtered data in > the identical format. Note that I usually have one cycle or period of > multicycle averaged data (~1000 points) and it would be useful if the filter > wrapped around at the ends rather than starting or ending with zeros or the > beginning or ending values. (I can't think of how to do this without > processing the file twice, but that would be O.K. if the code is more > elegant.) > > For the awk experts who may be unfamiliar with DSP, the moving average (see > http://www.dspguide.com/ch15.htm for why it is optimal in some cases) can be > implemented recursively (http://www.dspguide.com/ch15/5.htm) which may be > useful for awk implementation. > > The next thing I would like to have is Savitzky-Golay filtering, which is a > generalized centered moving average > > Example weights are: > > {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \ > -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \ > 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \ > 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \ > 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \ > -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \ > 0.04576659038901602`} > > The center, 0.18750841297617446`, weight corresponds to "present" time a[i]. > > Next problem would be IIR where you use the raw data, x, and the previous > filtered data points, y, to compute the new points. Wrapping the periodic > data makes that one hard. I think one would need an algorithm to start it > up then iterate to converge on the coupled "y" values. > > Any help or tips on how to handle particular aspects would be appreciated. > > Regards, > Todd Rose>From p.20 of The AWK Manual": If you find yourself writing awk scriptsof more than, say, a few hundred lines, you might consider using a different programming language." With that being said, wouldnt the following implement a simple FIR filter: cat foobar.txt | awk 'BEGIN{a0=0;a1=1;a2=2} { print $1, $2*a0+x1*a1+x2*a2; x2=x1; x1=$2; }' If your files are only a couple thousand lines, then why not just use C++ and the read the whole file into memory and do with it what you will? C++ has pretty nice stdin/stdout functions. Then you could use it like cat foobar.txt | firFilter coeffs.txt > fubar.txt
Reply by ●November 1, 20062006-11-01
Rutiger,
Thanks. This is great. I was looking for a one-liner (even if it is a
pretty long line). Like you said, if it is too long, you might be using the
wrong tool.
If I do it in C and pipe it, then I have to maintain it for the group with
only a few able or willing to program in C, and I would rather have them be
able to modify it to suit and improve it themselves. (I also have some
other gawk tools I would like them to be able to use. I plan on giving them
a little primer course -- once I am competent myself -- and let them loose.)
It seems to shift the data when I test it. Any idea how to realign it so
than it is centered?
y[i]=a*x[i-1]+b*x[i]+c*x[i+1]
Any idea how to wrap it around?
Here is what I would like from a 3 pt centered moving average filter:
x y filtered y
1 1 7.666667
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 11 11
12 12 12
13 13 13
14 14 14
15 15 15
16 16 16
17 17 17
18 18 18
19 19 19
20 20 13.33333
Here is what I got with
gawk 'BEGIN{a0=1/3;a1=1/3;a2=1/3} { print $1,$2*a0+x1*a1+x2*a2; x2=x1;
x1=$2; }'
1 1 0.333333
2 2 1
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
11 11 10
12 12 11
13 13 12
14 14 13
15 15 14
16 16 15
17 17 16
18 18 17
19 19 18
20 20 19
Todd
"rutiger" <rutiger_bishop@yahoo.com> wrote in message
news:1162398163.225461.21210@m73g2000cwd.googlegroups.com...
> From p.20 of The AWK Manual": If you find yourself writing awk scripts
> of more than, say, a few hundred lines, you might consider using a
> different programming language."
>
> With that being said, wouldnt the following implement a simple FIR
> filter:
>
> cat foobar.txt | awk 'BEGIN{a0=0;a1=1;a2=2} { print $1,
> $2*a0+x1*a1+x2*a2; x2=x1; x1=$2; }'
>
> If your files are only a couple thousand lines, then why not just use
> C++ and the read the whole file into memory and do with it what you
> will? C++ has pretty nice stdin/stdout functions. Then you could use it
> like
>
> cat foobar.txt | firFilter coeffs.txt > fubar.txt
>
Reply by ●November 1, 20062006-11-01
Todd Rose wrote:> Is there a slick way to implement digital filters in gawk or awk? It seems > like awk would be a natural at this, but I can't find anything when I Google > it (gawk awk filter FIR IIR and combinations thereof). Obviously, I could > program awk using arrays like one would in any other language, but I would > prefer a more elegant solution that uses the natural features of awk (even > if getline is involved).In the late 1990s, I wrote this one: BEGIN { if(ARGC<=1) { print "DIGIFILT - Digitale Filterung von diskreten Zeitreihen" >> "/dev/stderr" print "IN: Zeitreihen in Spalten" >> "/dev/stderr" print " jeder Zeitpunkt eine Zeile; jeder Kanal eine Spalte" >> "/dev/stderr" print "OUT: Zeitreihe; Format wie oben" >> "/dev/stderr" print "ALG: Prozedur 'filter' aus MATLAB Stud. Ed. Seite 274" >> "/dev/stderr" print "PARA: Name einer Datei mit Filter-Koeffizienten." >> "/dev/stderr" print " Das korrekte Format der Koeff.-Datei wird aus MATLAB heraus mit" >> "/dev/stderr" print " 'save mot8hz.flt a b /ascii /double' erzeugt." >> "/dev/stderr" print "Aufruf:" >> "/dev/stderr" print " gawk -f digifilt.awk diff.flt < data > filtereddata" >> "/dev/stderr" print "Beispiel: Butterworth 4. Ordnung, fs = 128 Hz, Cutoff 8 Hz" >> "/dev/stderr" print " 1.00000000000000e+000 -3.18063854887472e+000 3.86119434899422e+000 -2.11215535511097e+000 4.38265142261980e-001" >> "/dev/stderr" print " 4.16599204406620e-004 1.66639681762559e-003 2.49959522644039e-003 1.66639681762559e-003 4.16599204406731e-004" >> "/dev/stderr" print "Juergen Kahrs, 12.1.93" >> "/dev/stderr" print "Beispiel: Differentiator Beispiel: Integrator" >> "/dev/stderr" print " 1 0 1 0 rekursiver Zweig" >> "/dev/stderr" print " 1 -1 1 1 nichtrekursiver Zweig" >> "/dev/stderr" print "Juergen Kahrs, 27.3.97" >> "/dev/stderr" exit } coefffile=ARGV[1]; ARGV[1] = "" getline AStr < coefffile; getline BStr < coefffile na=split(AStr, a); nb=split(BStr, b); if (na>nb) {n=na-1} else {n=nb-1} for (i=1; i <=(n+1); i++) printf("a%d = %10.5f\t\tb%d = %10.5f\t\n", i, a[i], i, b[i]) >> "/dev/stderr" } { for(i=1; i<=NF; i++) { x[n, i] = $i y[n, i] = b[1] * x[n,i] for(j=2; j<=(n+1); j++) y[n, i] = y[n, i] + b[j] * x[n-j+1, i] - a[j] * y[n-j+1, i] printf(" %f", y[n, i]) for (j=0; j<=n; j++) { x[j, i] = x[j+1, i]; y[j, i] = y[j+1, i]; } } print "" } Comments are in German, but the script can easily be adapted to all kinds of digital filters. You only have to write a parameter file and pass the name of the parameter file to the gawk interpreter. When writing this script, I had a look at the Matlab Student Edition, which came with a wonderful manual.> The next thing I would like to have is Savitzky-Golay filtering, which is a > generalized centered moving average > > Example weights are: > > {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \ > -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \ > 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \ > 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \ > 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \ > -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \ > 0.04576659038901602`}You probably know the "Numerical Recipes" books: http://www.nrbook.com/a/bookcpdf.html Savitzky-Golay Smoothing Filters http://www.nrbook.com/a/bookcpdf/c14-8.pdf It is surprisingly simple to port the C routines to AWK. In the 1990s, I did this for some matrix routines and I still use some of these routines (for example linear regression with calculation of significance levels).
Reply by ●November 1, 20062006-11-01
Hi Todd, On Wed, 01 Nov 2006 13:49:48 -0500, Todd Rose wrote:> Any idea how to wrap it around? > > Here is what I would like from a 3 pt centered moving average filter: > > x y filtered y > > 1 1 7.666667 > 2 2 2 > 3 3 3 > 4 4 4 > 5 5 5[snip] That goal is (IMO) incompatible with the line-by-line approach that you would naturally take in awk (yes, I do DSP in awk myself, for small values of DSP). You have a requirement that the whole file be known before the filtering starts (or at least the last line, which amounts to the same thing in a stream-processing paradigm). You probably can do this in awk, but you'll have to do it by using arrays, and at that point the point of using awk is lost. I recommend that you abandon awk for this particular problem and look instead at numerical python (or even perfectly ordinary python: it'll work, but it will be a bit slower for large data files, and won't have the nice matrix arithmetic features of numpy). It might even come close to being a one-liner: you can get numpy to suck a whole file of data into a vector or array with a single statement. Most of what you like about Matlab is available in python. Cheers, -- Andrew
Reply by ●November 2, 20062006-11-02
Todd Rose wrote:> Rutiger, > > Thanks. This is great. I was looking for a one-liner (even if it is a > pretty long line). Like you said, if it is too long, you might be using the > wrong tool. > > If I do it in C and pipe it, then I have to maintain it for the group with > only a few able or willing to program in C, and I would rather have them be > able to modify it to suit and improve it themselves. (I also have some > other gawk tools I would like them to be able to use. I plan on giving them > a little primer course -- once I am competent myself -- and let them loose.) > > It seems to shift the data when I test it. Any idea how to realign it so > than it is centered?The data is 'shifted' beacuse that is how the moving average filter works. The output of a true convolution is L+P-1 samples long, where L is the length of the input vector and P is the length of the filter. You'll have to use an END block to implement the remaing P-1 outputs. To realign the data in this particular case you can just discard the first and last outputs from the filter.
Reply by ●November 3, 20062006-11-03
rutiger wrote:> Todd Rose wrote: >> Rutiger, >> >> Thanks. This is great. I was looking for a one-liner (even if it is a >> pretty long line). Like you said, if it is too long, you might be using the >> wrong tool. >> >> If I do it in C and pipe it, then I have to maintain it for the group with >> only a few able or willing to program in C, and I would rather have them be >> able to modify it to suit and improve it themselves. (I also have some >> other gawk tools I would like them to be able to use. I plan on giving them >> a little primer course -- once I am competent myself -- and let them loose.) >> >> It seems to shift the data when I test it. Any idea how to realign it so >> than it is centered? > > The data is 'shifted' beacuse that is how the moving average filter > works. The output of a true convolution is L+P-1 samples long, where L > is the length of the input vector and P is the length of the filter. > You'll have to use an END block to implement the remaing P-1 outputs. > To realign the data in this particular case you can just discard the > first and last outputs from the filter.That's all true, but it might be a little obscure at first. Maybe putting it another way would explain more clearly. You can look at the filter's output as the average of (the average of the first and last samples), (the average of the next two in from the ends), (and so on until you reach the middle). Each of those pairs applies to the time of the middle sample, so the result is delayed relative to the first sample. That's explains what happens when the thing does when the pipeline is full, but there are anomalies at both ends, when it's filling and when it's emptying. To the extent that the pipeline isn't full, the result is compromised. Discarding the worst half at each end realigns the two streams (and makes them the same length). Jerry -- "The rights of the best of men are secured only as the rights of the vilest and most abhorrent are protected." - Chief Justice Charles Evans Hughes, 1927 ���������������������������������������������������������������������
Reply by ●November 5, 20062006-11-05
Rutiger, Your filter worked great. As you pointed out, it did just what it was supposed to do and it was very elegant. Thanks for posting it. Sorry if I didn't seem appreciative. Jerry, That is a clearer explanation of the filter. The pipeline analogy seems apt. What we usually do is add a copy of the cycle to front and the back and filter it forward and backward with an elliptic filter. Then we chop off the end cycles and use that. I had written a simple Excel worksheet and just wrapped the 5 point centered moving averages around the ends. This doesn't work smoothly with awk with one pass through the file, and does seem to be out of the spirit of awk. I was just hoping there was a slick way to take advantage of the recurrence forms (the next sum can be formed by just adding one additional point to front and subtracting one from the back and leaving the middle alone.) This is sort of a mini pipeline with the length of the filter rather than the length of the time series. It seems to me that the elementary theory and way of thinking of DSP is in the real time filtering application where you have no knowledge of future time, rather than postprocessing a single cycle of data where you know all time effectively. (At least, that is how it seems to my naive eye trying to learn from the Matlab Signal Processing Toolbox documentation or the "Digital Filter Designer's Handbook" I picked up for $0.50 at the library book sale.) Everything is up-winded --there is no central difference, like in finite difference theory (e.g., in CFD). You open up a whole new can of worms when you are free from the constraints of time's causal arrow. You can use centered moving filters like the moving average or Savitzky-Golay on one pass and they don't shift the data, or you can pass the data back through the filter backwards. It is also impossible to wrap the ends around to take advantage of a cylindrical (or toroidal) phase space when you don't know future time events, but for postprocessing this is not a problem theoretically. It might be clumsy to implement in awk though. I want to exploit every advantage I have to extract information out of the signal. Todd
Reply by ●November 5, 20062006-11-05
J�rgen,
Thanks much. This worked very well on my data even with the sample filter
you provided. I translated the comments to English using BabbleFish. Here
is the script with English comments and the .bat file I used to drive it.
(I needed to strip out the first column of time data from the time series,
filter it forward and backward, and then put it back together again for
input into my application and for plotting it.)
=== digifilt.awk ===
BEGIN {
if(ARGC<=1) {
print "DIGIFILT - digital filtering of discrete time series" >>
"/dev/stderr"
print "IN: Time series (data only) in columns " >> "/dev/stderr"
print " each sample time a line; each channel a column " >>
"/dev/stderr "
print "OUT: Time series; Format like above " >> "/dev/stderr"
print "ALG: Procedure 'filter' from MATLAB Stud. OD. Page 274" >>
"/dev/stderr "
print "PARA: Name of a file with filter coefficients." >> "/dev/stderr"
print " the correct format of the coefficient file mot8hz.flt for a
b is output from MATLAB using" >> "/dev/stderr"
print " 'save mot8hz.flt a b /ascii /double ' produced." >>
"/dev/stderr"
print "call:" >> "/dev/stderr"
print "gawk -f digifilt.awk diff.flt < data > filtereddata" >>
"/dev/stderr"
print "Example: Butterworth 4. Order, fs = 128 Hz, cut-off 8 Hz " >>
"/dev/stderr"
print " 1.00000000000000e+000 -3.18063854887472e+000
3.86119434899422e+000 -2.11215535511097e+000 4.38265142261980e-001 " >>
"/dev/stderr"
print " 4.16599204406620e-004 1.66639681762559e-003
2.49959522644039e-003 1.66639681762559e-003 4.16599204406731e-004 " >>
"/dev/stderr"
print "Juergen Kahrs, 12.1.93 " >> "/dev/stderr"
print "Example: Differentiator Example: Integrator " >>
"/dev/stderr"
print " 1 0 1 0 recursive
branch " >> "/dev/stderr"
print " 1 -1 1 1
not-recursive branch " >> "/dev/stderr"
print "Juergen Kahrs, 27.3.97" >> "/dev/stderr"
exit
}
coefffile=ARGV[1]; ARGV[1] = ""
getline AStr < coefffile; getline BStr < coefffile
na=split(AStr, a); nb=split(BStr, b);
if (na>nb) {n=na-1} else {n=nb-1}
for (i=1; i <=(n+1); i++)
printf("a%d = %10.5f\t\tb%d = %10.5f\t\n", i, a[i], i, b[i]) >>
"/dev/stderr"
}
=== digifilt.bat ===
@echo off
rem gawk -f digifilt.awk diff.flt < data > filtereddata
rem split file into sample time and data (for one column of data only for
now)
rem TODO: add multiple columns of data -- digifilt.awk can handle multiple
columns
rem TODO: combine into one gawk one-liner
rem TODO: add front and rear repeats of the data to remove the end effects
rem (or really move them to the ends of the added cycles and then strip tham
back off
rem to leave the good filtfilt()ed middle time series.
gawk '{print $2}' Cylinder_1_single_raw.dat > dummycol2.dat
gawk '{print $1}' Cylinder_1_single_raw.dat > dummytime.dat
rem gawk -f digifilt.awk mot8hz.flt < dummycol2.dat > dummyfilt.dat
rem digifilt.awk is the gawk script
rem mot8hz.flt is the filter coefficients "a" followed by "b" rows (not
columns)
rem dummyfilt.dat is the output to pass back into tac then filter then tac
again
rem filter one direction
rem mot8hz as a two row "a" and "b" filter coefficient file as output from
Matlab
gawk -f digifilt.awk mot8hz.flt < dummycol2.dat > dummyfilt.dat
rem tac myfile | gawk '/log end/{f=1}f;f&&/log start/{exit}' | tac
rem filter backwards to remove delay (like filtfilt() in Matlab)
rem tac reverses the order of the file
tac dummyfilt.dat | gawk -f digifilt.awk mot8hz.flt | tac > dummyfiltrev.dat
rem this puts the time and filtfilt data back together as two columns (note
that the calling order is data then time
rem to get output as time column then data column
rem gawk 'NR==FNR{b[++n]=$0;next}FNR==n{t=","}{print $0 t, b[FNR]}' fileB
fileA
gawk 'NR==FNR{b[++n]=$0;next}FNR==n{t=","}{print $0 t, b[FNR]}'
dummyfiltrev.dat dummytime.dat > Cylinder_1_single_but.dat
=== end ===
Todd
"J�rgen Kahrs" <Juergen.KahrsDELETETHIS@vr-web.de> wrote in message
news:4qsc2cFog0j7U1@individual.net...
> Todd Rose wrote:
>
> > Is there a slick way to implement digital filters in gawk or awk? It
seems
> > like awk would be a natural at this, but I can't find anything when I
Google
> > it (gawk awk filter FIR IIR and combinations thereof). Obviously, I
could
> > program awk using arrays like one would in any other language, but I
would
> > prefer a more elegant solution that uses the natural features of awk
(even
> > if getline is involved).
>
> In the late 1990s, I wrote this one:
>
> BEGIN {
> if(ARGC<=1) {
> print "DIGIFILT - Digitale Filterung von diskreten Zeitreihen" >>
"/dev/stderr"
> print "IN: Zeitreihen in Spalten" >> "/dev/stderr"
> print " jeder Zeitpunkt eine Zeile; jeder Kanal eine Spalte" >>
"/dev/stderr"
> print "OUT: Zeitreihe; Format wie oben" >> "/dev/stderr"
> print "ALG: Prozedur 'filter' aus MATLAB Stud. Ed. Seite 274" >>
"/dev/stderr"
> print "PARA: Name einer Datei mit Filter-Koeffizienten." >>
"/dev/stderr"
> print " Das korrekte Format der Koeff.-Datei wird aus MATLAB
heraus mit" >> "/dev/stderr"
> print " 'save mot8hz.flt a b /ascii /double' erzeugt." >>
"/dev/stderr"
> print "Aufruf:" >> "/dev/stderr"
> print " gawk -f digifilt.awk diff.flt < data > filtereddata" >>
"/dev/stderr"
> print "Beispiel: Butterworth 4. Ordnung, fs = 128 Hz, Cutoff 8 Hz" >>
"/dev/stderr"
> print " 1.00000000000000e+000 -3.18063854887472e+000
3.86119434899422e+000 -2.11215535511097e+000 4.38265142261980e-001" >>
"/dev/stderr"
> print " 4.16599204406620e-004 1.66639681762559e-003
2.49959522644039e-003 1.66639681762559e-003 4.16599204406731e-004" >>
"/dev/stderr"
> print "Juergen Kahrs, 12.1.93" >> "/dev/stderr"
> print "Beispiel: Differentiator Beispiel: Integrator" >>
"/dev/stderr"
> print " 1 0 1 0
rekursiver Zweig" >> "/dev/stderr"
> print " 1 -1 1 1
nichtrekursiver Zweig" >> "/dev/stderr"
> print "Juergen Kahrs, 27.3.97" >> "/dev/stderr"
> exit
> }
> coefffile=ARGV[1]; ARGV[1] = ""
> getline AStr < coefffile; getline BStr < coefffile
> na=split(AStr, a); nb=split(BStr, b);
> if (na>nb) {n=na-1} else {n=nb-1}
> for (i=1; i <=(n+1); i++)
> printf("a%d = %10.5f\t\tb%d = %10.5f\t\n", i, a[i], i, b[i]) >>
"/dev/stderr"
> }
>
> { for(i=1; i<=NF; i++) {
> x[n, i] = $i
> y[n, i] = b[1] * x[n,i]
> for(j=2; j<=(n+1); j++) y[n, i] = y[n, i] + b[j] * x[n-j+1, i] - a[j]
* y[n-j+1, i]
> printf(" %f", y[n, i])
> for (j=0; j<=n; j++) {
> x[j, i] = x[j+1, i];
> y[j, i] = y[j+1, i];
> } }
> print ""
> }
>
> Comments are in German, but the script can easily
> be adapted to all kinds of digital filters. You
> only have to write a parameter file and pass the
> name of the parameter file to the gawk interpreter.
> When writing this script, I had a look at the
> Matlab Student Edition, which came with a wonderful
> manual.
>
> > The next thing I would like to have is Savitzky-Golay filtering, which
is a
> > generalized centered moving average
> >
> > Example weights are:
> >
> > {0.04576659038901602`, -0.034324942791762014`, -0.056535199892313905`, \
> > -0.03903620944945484`, 0.0024229371382420246`, 0.054516085610445554`, \
> > 0.10634001884506664`, 0.1494144568582582`, 0.17768205680441512`, \
> > 0.18750841297617446`, 0.17768205680441512`, 0.1494144568582582`, \
> > 0.10634001884506664`, 0.054516085610445554`, 0.0024229371382420246`, \
> > -0.03903620944945484`, -0.056535199892313905`, -0.034324942791762014`, \
> > 0.04576659038901602`}
>
> You probably know the "Numerical Recipes" books:
>
> http://www.nrbook.com/a/bookcpdf.html
> Savitzky-Golay Smoothing Filters
> http://www.nrbook.com/a/bookcpdf/c14-8.pdf
>
> It is surprisingly simple to port the C routines
> to AWK. In the 1990s, I did this for some matrix
> routines and I still use some of these routines
> (for example linear regression with calculation
> of significance levels).
>
Reply by ●November 5, 20062006-11-05
Andrew, Thank you for your response. It does seem a bit out of character with awk. I have been looking at numpy to do this and everything else I do with Matlab. I seem to need to do a lot of text processing and accessing Excel files in addition to numerical computing. This is not fun for an old Fortran programmer. I have done it in Matlab, but it was painful. I'm not sure that numpy will be lot simpler, but it looks to have some promise. Strangely enough, I find that the greater structure typing required by python makes it easier to do the vectorized operations without so much trial error (when I forget whether I need x' or x, etc., and I have to do little test scripts). What my job is (and I would guess a lot of engineers jobs are) is to take a bunch of data from different sources, combine it into a usable structure, then do some (sometimes heavy) numerical computations on it and output it as either input files for another simulation code, or as plots or tables for a written report or presentation. Programming all this seems a bit much to some people (the ones who don't have open up a 100 Excel files, transpose them, and save them as .csv without forgetting any of them), but I find it reduces errors in the final product, is a lot more fun for me, and with the usual reruns to look at one little change or with your new corrected input data, it saves a bunch of time. Despite all my fiddling around with learning python (and ruby and java), it has been gawk that has been the easiest to get working on the simpler things. (I have done all the above with Matlab and other than producing .pdf output (which might be easy, too) I think python will be the total package. (I do like ruby, but I'll need to wait a while for numrb or scirb to be what numpy and scipy are now.) Todd "Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message news:pan.2006.11.01.23.29.38.258445@areilly.bpc-users.org...> I recommend that you abandon awk for this particular problem and look > instead at numerical python (or even perfectly ordinary python: it'll > work, but it will be a bit slower for large data files, and won't have the > nice matrix arithmetic features of numpy). It might even come close to > being a one-liner: you can get numpy to suck a whole file of data into a > vector or array with a single statement. Most of what you like about > Matlab is available in python.






