Rune and Robert, thanks again. I haven't read your latest posts in full yet. I am posting to share comments from Dan Ellis about my original post. (By the way, his web page at www.ee.columbia.edu/~dpwe has an interesting collection of audio-related software and lecture slides.)

From: Dan Ellis
To: David Gelbart

David -

The point you raise is subtle and valid, but I don't believe it would have much effect on a task like speech recognition. The difference between circular and linear convolution depends on how far the convolution of the two sequences extends beyond the basic FFT frame length. If both signals are length N, the time-aliasing involved in circular convolution could have a large impact. If, on the other hand, one of the signals is only K << N points long, then over N-K+1 points of the resulting convolution, linear and circular convolution will match, and the discrepancy is limited to K-1 points.

Now, a short impulse response corresponds to a smooth frequency response (think, for instance, about windowing a longer time response, which corresponds to convolving (smoothing) the spectrum with a broad window). Thus multiplying the frequency response by a smooth function will be equivalent to convolving with a short impulse response (not always, but potentially, and particularly if the signal is zero phase, i.e. the multiplication is by a pure-real, smooth function). A 256-point spectrum projected onto 13 Mel cepstra will mostly be very smooth (except perhaps in the lowest frequency bands), so the effective impulse response is short.

The other way to think about it is to consider applying a linear filter to a continuous signal, and how that would play out in the MFCCs. The obvious difference between linear and circular convolution here is the small spill-over from the points in one frame into the next frame, due to the time extent of the impulse response.
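To make the K-1 point discrepancy concrete, here is a minimal sketch (my own illustration, not from Dan's message) assuming NumPy. The sizes N = 256 and K = 8 and the random test signals are arbitrary choices. It filters one frame by multiplying N-point FFTs bin by bin (circular convolution) and compares the result with linear convolution truncated to the frame length:

```python
import numpy as np

# Illustrative sizes (assumptions): N = FFT frame length,
# K = length of a short impulse response, K << N.
N, K = 256, 8
rng = np.random.default_rng(0)

# Positive-valued test signals, so the wrap-around error is never
# accidentally zero (purely to make the comparison unambiguous).
x = rng.uniform(0.5, 1.5, N)   # one frame of signal
h = rng.uniform(0.5, 1.5, K)   # short impulse response

# Circular convolution: scale the N-point FFT bins (fft zero-pads h to N).
y_circ = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real

# Linear convolution, truncated to the frame for comparison.
y_lin = np.convolve(x, h)[:N]

match = np.abs(y_circ - y_lin) < 1e-9
# Samples K-1 .. N-1 agree (N-K+1 points); samples 0 .. K-2 differ,
# because the tail of the linear convolution wraps around onto them.
print("matching points:", match.sum())            # 249 (= N-K+1)
print("first matching index:", np.argmax(match))  # 7 (= K-1)
```

Making K larger relative to N widens the corrupted region at the start of the frame, which is the "longer impulse response" case Dan says is unlikely for an MFCC-domain filter.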
So if you have one frame with energy and the next frame is exactly zero, applying a (causal) filter will spill some energy into the otherwise silent frame. But in the absence of this kind of situation (where the energy of the spillover is significant compared to the energy within the frame), the effect of the linear filtering on the spectrum (and hence the MFCCs) is pretty much what you expect: a frequency-dependent gain. Again, the difference between linear convolution and scaling FFT bins grows as the equivalent impulse response becomes longer relative to the FFT frame size. But I think this is rare or unlikely for a filter defined in the MFCC domain, because of the spectral smoothness implicit in a low-dimensional representation.

DAn.

[I then asked Dan whether he felt the same way about LDMN as about CMS.]

From: Dan Ellis
To: David Gelbart

LDMN is removing the mean of each of 513 (or whatever) log spectral channels? In theory that could result in an effective filter that would differ significantly from linear filtering. But again, it's quite unlikely that a real signal would have such rapidly varying statistics as to result in such a filter. I think you can get around this by zero-padding all your frames prior to analysis. That doesn't force the effective filter to be at most N/2 points long, but I doubt it would be any longer.
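Here is a minimal sketch of the zero-padding idea, assuming NumPy and assuming "zero-padding" means each frame of N/2 samples is padded to an N-point FFT (one common reading; the message doesn't spell it out). The frame size, filter lengths, and random data are illustrative, and `filter_in_fft_domain` is a hypothetical helper name. As long as the effective impulse response has at most N/2 + 1 taps, the full linear convolution fits inside the padded frame, so scaling the FFT bins reproduces it exactly:

```python
import numpy as np

# Illustrative sizes (assumptions): each frame holds L = N/2 real samples
# and is zero-padded to an N-point FFT.
N = 512
L = N // 2
rng = np.random.default_rng(1)
frame = rng.standard_normal(L)

def filter_in_fft_domain(frame, h, n_fft):
    """Hypothetical helper: scale the bins of a zero-padded frame by H."""
    X = np.fft.rfft(frame, n_fft)   # rfft zero-pads the frame to n_fft
    H = np.fft.rfft(h, n_fft)       # frequency response of the filter
    return np.fft.irfft(X * H, n_fft)

# Impulse response with N/2 + 1 taps: the linear convolution has length
# L + (N/2 + 1) - 1 = N, so nothing wraps around.
h_short = rng.standard_normal(N // 2 + 1)
y_fft = filter_in_fft_domain(frame, h_short, N)
y_lin = np.convolve(frame, h_short)
print(np.allclose(y_fft, y_lin))  # True

# A longer impulse response no longer fits, and its tail wraps around
# onto the start of the padded frame.
h_long = rng.standard_normal(N)
y_fft_long = filter_in_fft_domain(frame, h_long, N)
y_lin_long = np.convolve(frame, h_long)
print(np.allclose(y_fft_long, y_lin_long[:N]))
```

The second comparison fails because samples beyond index N-1 of the linear convolution are folded back onto the first part of the frame. This is the case Dan doubts would arise from a per-channel log-spectral mean removal on real signals.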