I am designing an FIR filter for up-conversion in an FPGA. The input to the FIR filter is 16 bits wide (-32768 to 32767) and the maximum coefficient value is 32767. This produces a 32-bit result at the accumulator, which is rounded to 16 bits before feeding the next filter. During verification, I see that the output is almost half the input value (even though both input and output are 16 bits). Since I have multiple filters cascaded, the output level diminishes at each stage, depending on the rounding. With lower-amplitude data at the input, I see no signal at the final filter output because each filter rounds its output.
It looks like I am missing something in the rounding. What should I take care of when rounding? Any reference would be helpful.
Hello srid. Perhaps the material at the following web page will be of some small value to you:
For an interpolating CIC, each comb stage requires the input width plus one extra bit, while the bit growth at each integrator depends on the stage, and the final integrator gain = (R*M)^N / R.
Thus, with an input of width B, the maximum bit width = ceil(log2((R*M)^N / R) + B), where:
R = upsampling rate
M = delay stages
N = number of integrator or comb stages.
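As a quick check of the bit-width formula in this post, here is a minimal Python sketch. It reads the final integrator gain as (R*M)^N / R. The function name cic_interp_max_bits and the example values are assumptions chosen for illustration; they are not from the thread.

```python
import math

def cic_interp_max_bits(R, M, N, B):
    """Maximum register width of an interpolating CIC.

    R: upsampling rate, M: delay stages, N: number of stages, B: input width.
    """
    gain = (R * M) ** N / R              # gain at the final integrator
    return math.ceil(math.log2(gain) + B)

# Hypothetical example: interpolate by 8, M = 1, 4 stages, 16-bit input.
print(cic_interp_max_bits(R=8, M=1, N=4, B=16))
```

For these assumed values the gain is 8^4 / 8 = 512, i.e. 9 bits of growth on top of the 16-bit input.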
Normally truncation or rounding should NOT be applied until final output.
The overall DC gain = ((R*M)^N)/R, which can be adjusted as required at the final truncation.
Register pruning as suggested by Rick is really not worth the effort in FPGAs as they have hundreds of thousands of registers.
As Kaz pointed out, the FPGA has plenty of registers. The CIC filter is implemented with full precision at each stage and the rounding is done at the final stage. My current implementation doesn't account for the gain reduction caused by the interpolation rate, which is something I still need to do.
srid, don't get it wrong. The interpolation effect on gain relates to classic use of FIR as interpolator by zero insertion. It does not apply to CIC. I have posted details of CIC interpolator bitwidth and dc gain.
Hi kaz. Thanks for your good advice.
What is the DC gain of your filter, and the maximum gain? Worst-case maximum gain is the sum of the absolute values of the coefficients.
The DC gain is ~1 (0.9999, the sum of all the coefficients). The maximum gain is 65535 (the sum of the absolute 16-bit coefficient values).
If the coeff sum is 2^16 and you truncate 16 LSBs off the sum, then that should be OK, unless you are upsampling, in which case you need to increase the gain by the upsampling factor.
I am assuming a low-pass filter.
Thanks Kaz. It is a low-pass filter. Do you mean increasing the gain by multiplying the "output" by the interpolation factor?
No need to re-multiply. If you truncate 15 LSBs (instead of 16) it gives a gain of 2.
In this case you need to discard one MSB and may need clipping, but you can pre-compute that for the worst case.
The CIC filter is unique and has its own rules of gain control. There is plenty of literature around.
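To make the truncation arithmetic concrete, here is a small Python sketch. It assumes the coefficients were pre-scaled so that they sum to exactly 2^16 and that upsampling is done by zero insertion with a factor of 2. The function name dc_gain and these values are illustrative assumptions.

```python
def dc_gain(coeff_sum, shift, upsample=1):
    """DC gain of an integer FIR whose accumulator drops `shift` LSBs.

    Zero insertion by a factor `upsample` reduces the DC gain by that factor.
    """
    return coeff_sum / (1 << shift) / upsample

COEFF_SUM = 1 << 16   # assumed: integer coefficients sum to 2^16

print(dc_gain(COEFF_SUM, 16))              # drop 16 LSBs, no upsampling
print(dc_gain(COEFF_SUM, 15))              # drop 15 LSBs, no upsampling
print(dc_gain(COEFF_SUM, 15, upsample=2))  # drop 15 LSBs, interpolate by 2
```

Dropping one fewer LSB doubles the gain, which is what compensates for the loss from interpolating by 2 under these assumptions.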
Thanks Kaz !
No matter what, I'd test it with a single sample. And those single samples could be of various amplitudes. It won't tell you about summations and rounding but about the integrity of the implementation - since you know what the answer should be. And, it should tell you what the scaling is.
Therefore, your results are correct.
dudelsound,
If you truncate (n) LSBs from a data bus you are dividing its value by 2^n.
That applies always and irrespective of any other issues.
Yes, of course, that's what I'm saying. You must not truncate 16 bits but 15.
You truncate as required by how you pre-scaled the coeffs. This is unrelated to the issue of 16 x 16 => 31 bits. In fact it should be 32 bits, since the product of the two most negative values, -32768 * -32768, requires 32 bits. It is a narrow case and can be avoided, but it does lead to overflow if 31 bits are used without accounting for it.
Again, that's all clear. It is just a common mistake to think that since we multiplied two 16-bit values, we just take the upper 16 bits of the 32-bit result and are done.
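The corner case mentioned here can be checked directly. The following Python sketch uses a hypothetical helper, fits_signed, to test whether a value is representable in a given two's complement width.

```python
def fits_signed(value, bits):
    """True if `value` is representable in two's complement with `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

INT16_MIN = -32768
INT16_MAX = 32767

worst = INT16_MIN * INT16_MIN    # 2**30, the largest 16 x 16 signed product
typical = INT16_MIN * INT16_MAX  # the most negative product

print(worst, fits_signed(worst, 31), fits_signed(worst, 32))
print(typical, fits_signed(typical, 31))
```

Only the single product -32768 * -32768 needs the 32nd bit; every other 16 x 16 signed product fits in 31 bits.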
And since the question was about a mysterious 6dB loss I thought this common mistake might be worth mentioning. Naturally, everything works well if you "truncate as required", but as things were obviously not well, I thought "not truncating as required" might be the problem.
thanks dudelsound, got your perspective.
srid, though your post mixes FIR (coeffs) and CIC, I guess you are using both.
Regarding FIR scaling (without upsampling by zero insertion), you say your coeff sum is ~1 before scaling, that after scaling the max coeff is 32767, and later you say the absolute sum is 65535. If the sum of the SIGNED coeffs is 65535 then truncating 16 bits should give unity DC gain (well, almost). The sum of the ABSOLUTE coeff values (i.e. ignoring sign) indicates the maximum possible output value for the worst-case input pattern. That is when the input values cancel the sign of each and every coeff, so that all products add up as all positive or all negative.
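The difference between the signed sum and the absolute sum can be illustrated with a short Python sketch. The coefficient values below are made up for illustration and are not srid's filter; the function names are likewise assumptions.

```python
def dc_sum(coeffs):
    """Signed sum of coefficients: sets the DC gain before descaling."""
    return sum(coeffs)

def worst_case_sum(coeffs):
    """Sum of absolute values: sets the worst-case peak gain before descaling."""
    return sum(abs(c) for c in coeffs)

def worst_case_input(coeffs, full_scale=32767):
    """Input samples whose signs match the coefficients they multiply."""
    return [full_scale if c >= 0 else -full_scale for c in coeffs]

coeffs = [-3000, 9000, 32767, 9000, -3000]   # hypothetical low-pass taps

x = worst_case_input(coeffs)
peak = sum(c * s for c, s in zip(coeffs, x))  # all products are non-negative

print(dc_sum(coeffs), worst_case_sum(coeffs), peak)
```

If the worst-case sum exceeds the descaling factor, a full-scale input of this pattern will overflow the output width, which is where clipping or an extra bit comes in.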
Yes Kaz. I use both FIR and CIC filters.
For the FIR filter, the coeff sum before scaling is ~1. The max coeff is 32767, the sum of the signed coeffs is 65535 and the absolute sum of the coeffs is 108423.
>> If sum of signed coeffs is 65535 then truncating 16 bits should give dc unity (well almost).
Well, with the 16 bits truncated, I don't see unity gain.
possibility 1: your observation is wrong
possibility 2: your conclusion is wrong
possibility 3: your platform is doing something wrong
Are you checking in simulation or diving straight into hardware?
Testing in simulation. I might be doing something wrong, will look at them more closely. Thank you for the help !
As suggested by Fred, one easy quick test (but partial) is to inject a single non-zero sample, e.g. your maximum of +32767 for a 16-bit signed input, followed by zeros. Then you should get your coeffs at the output scaled by 32767/2^16 ~= one half.
This is an impulse response test, but scaled, and it checks the multipliers.
Then you can inject the same input for all samples to check the adders (DC response), and you should get the cumulative sum (cumsum) of the coeffs, scaled the same way.
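Both tests are easy to try in a software model before going to RTL simulation. Below is a minimal Python sketch of a direct-form integer FIR with a truncating output stage. The coefficient set h is hypothetical (chosen so that it sums to exactly 2^16), and the function name fir is an assumption.

```python
def fir(x, h, shift):
    """Direct-form integer FIR; the accumulator is truncated by `shift` LSBs."""
    out = []
    for n in range(len(x)):
        acc = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        out.append(acc >> shift)
    return out

h = [4096, 16384, 24576, 16384, 4096]   # hypothetical taps, sum = 2**16
N_SAMPLES = 8

impulse = [32767] + [0] * (N_SAMPLES - 1)   # one full-scale sample, then zeros
step = [32767] * N_SAMPLES                  # the same value on every sample

impulse_out = fir(impulse, h, 16)
step_out = fir(step, h, 16)

print(impulse_out)   # roughly h / 2
print(step_out)      # running sum of h, scaled; settles at full scale
```

With this coefficient set the impulse output is roughly half of each tap value, yet the step output settles at 32767, i.e. the input level. The "half" in the impulse test comes from the input being about 2^15 while the descaling is 2^16; it is not a gain error.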
With the impulse scaled to the max value (+32767), I get output = coeff values / 2. That's the reason I posted the question in the first place, to understand why. For a unity-gain filter, I was expecting the output to be equal to the coeff values.
so your conclusion is wrong...
Makes sense, Kaz. Which is the better approach to keep the gain ~1: scaling up the coefficients, or rounding off 15 bits and taking care of overflow?
Once your coeff sum is 1, any pre-scaling of the coeffs by 2^n should be followed by descaling the final sum by 2^n. Using a power of 2 is convenient because it avoids dividers.
What (n) should be is a matter of resolution and available resources.
Moreover, you may change the gain to 2, or to less or more, by changing the scaling/descaling ratio.
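The pre-scale/descale pairing can be sketched in a few lines of Python. The taps, n = 15 and the helper names quantize and descale are assumptions for illustration. The descale step here rounds half up by adding half an LSB before the shift, which is one common choice and not necessarily what the original design uses.

```python
def quantize(coeffs, n):
    """Pre-scale floating-point coefficients by 2**n and round to integers."""
    return [round(c * (1 << n)) for c in coeffs]

def descale(acc, n):
    """Divide the accumulator by 2**n, rounding half up (add half an LSB, shift)."""
    return (acc + (1 << (n - 1))) >> n

h_float = [0.0625, 0.25, 0.375, 0.25, 0.0625]   # hypothetical taps, sum = 1
n = 15
h_int = quantize(h_float, n)

x = 1000                  # constant (DC) input level
acc = sum(h_int) * x      # steady-state accumulator value for a DC input
print(descale(acc, n))    # back at the input level: unity DC gain

small = 49152             # 0.75 of an output LSB when n = 16
print(small >> 16, descale(small, 16))   # truncation vs. rounding
```

The last line shows why plain truncation can make low-level signals disappear across cascaded stages: a value of 0.75 LSB truncates to 0 but rounds to 1.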