Hi,

I have implemented 1D convolution in many ways before, in the
time-domain, by FFT, or by FFT with overlap-add, but I have never did
that in 2D until now.

I tried to but I have a problem that I just cannot solve. The vertical
convolution works properly, but the problem with the horizontal
convolution is that the last kernel_width-1 bins stay to 0. I'm pretty
sure that I have made a dumb mistake somewhere, but as hard as I tried
I couldn't spot it, so if I'm posting this here it's out of despair.

Context : the array ***image represents an image, the first dimension
being the colour layer, the two others being respectively the X and Y
axis. This image has been zero-stuffed in a previous function in order
to be filtered by the array *kernel, which represents a one dimensional
windowed-sinc function.

int8_t ***convolution(int8_t ***image, uint16_t image_width, uint16_t
image_height, int8_t *kernel, uint16_t kernel_width, uint16_t
*output_width, uint16_t *output_height, char sinc_scale)
{
	int8_t ***output, *line;
	uint16_t ic, ix, iy, ik;

	*output_width = image_width+kernel_width-1;
	*output_height = image_height+kernel_width-1;

	output = malloc (4 * sizeof(int8_t **));	//allocation of the
zerostuffed result
	for (ic=0; ic<4; ic++)
	{
		output[ic] = malloc (*output_width * sizeof(int8_t *));
		for (ix=0; ix<*output_width; ix++)
			output[ic][ix] = calloc (*output_height, sizeof(int8_t));
	}

	line = malloc (*output_width * sizeof(int8_t));	//allocation of the
horizontally convoluted result (one line)

	for (ic=0; ic<4; ic++)
	{
		for (iy=0; iy<image_height; iy++)
		{
			for (ix=0; ix<image_width; ix++)
			{
				for (ik=0; ik<kernel_width; ik++)
				{
					line[ix+ik] += ( (int16_t) (image[ic][ix][iy] * kernel[ik])) >>
8-sinc_scale;	//horizontal convolution
				}
			}
			for (ix=0; ix<image_width; ix++)
			{
				for (ik=0; ik<kernel_width; ik++)
				{
					output[ic][ix][iy+ik] += ( (int16_t) (line[ix] * kernel[ik])) >>
8-sinc_scale;	//vertical convolution
				}
				line[ix]=0;
			}
		}
	}
	free(line);
	return output;
}

Thanks in advance