Reply by Bob Masta, February 12, 2015
On Tue, 10 Feb 2015 22:36:01 -0800 (PST), sash236@gmail.com
wrote:

>I am capturing human speech on the web.
>I find there are examples that people are sampling at 44.1 and 48kHz.
>All of them are also stereo.
>
>If the sole purpose of capturing the sound is for extracting features, what might be the minimum or optimal sampling rate?
That depends a lot on what features you are trying to extract. Consider that "telephone quality" is 300-3000 Hz, implying that an 8000 Hz sample rate would be fine.
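To make the arithmetic behind that concrete, here is a minimal sketch of the Nyquist check, assuming the highest frequency of interest is the 3000 Hz top of the telephone band. The function names are hypothetical, chosen only for illustration.

```python
def min_sample_rate(highest_freq_hz):
    """Nyquist limit: the sample rate must exceed twice the highest frequency."""
    return 2.0 * highest_freq_hz


def rate_is_sufficient(sample_rate_hz, highest_freq_hz):
    """True if sample_rate_hz is strictly above the Nyquist limit."""
    return sample_rate_hz > min_sample_rate(highest_freq_hz)


# Assumption: features of interest lie within the telephone band (top edge 3000 Hz).
TELEPHONE_BAND_TOP_HZ = 3000.0

print(min_sample_rate(TELEPHONE_BAND_TOP_HZ))           # 6000.0
print(rate_is_sufficient(8000, TELEPHONE_BAND_TOP_HZ))  # True
```

The margin between 6000 and 8000 Hz leaves room for a practical anti-aliasing filter. If the audio was captured at 44.1 or 48 kHz, it needs to be low-pass filtered before being reduced to a lower rate; simply dropping samples would alias the higher-frequency content into the band being analyzed.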
>Is there any value in Stereo signal? Am I correct that the left and right speech samples are identical in stereo - so just ignoring the buffer of one channel make it into a mono as a way to convert it?
They'd be identical if they were recorded from a single mono mic, which is likely the case. But if they were recorded from a stereo pair of mics, they might show slight differences if the mics were not equally distant from the person speaking. Either way, you should be OK using just one channel.

The usual way to convert stereo to mono is to add the two channels together, then divide by two (to prevent clipping, assuming that either or both could be more than half of full-scale). But if you really did have true stereo, and the mics were at significantly different distances, then this averaging process could cause phase cancellation that might have some effect on the analysis results. (Unlikely in most real-world cases, unless you are looking at really high speech harmonics.)

Best regards,

Bob Masta

DAQARTA v7.60
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, Pitch Track, Pitch-to-MIDI
FREE Signal Generator, DaqMusiq generator
Science with your sound card!
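For reference, the two conversions described in the reply above (keeping one channel, or averaging both) can be sketched as follows. This is only an illustration under stated assumptions: the samples are taken to be signed 16-bit integers in an interleaved buffer `[L0, R0, L1, R1, ...]`, and the function names are hypothetical.

```python
def left_channel_only(interleaved):
    """Keep the left channel and discard the right."""
    return list(interleaved[0::2])


def stereo_to_mono(interleaved):
    """Average each left/right pair: add the two channels, then divide by two.

    Dividing by two keeps the result within the original sample range,
    so the sum cannot clip. Floor division is used, so odd sums round
    toward negative infinity.
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("interleaved stereo buffer must have an even length")
    left = interleaved[0::2]
    right = interleaved[1::2]
    return [(l + r) // 2 for l, r in zip(left, right)]


# Assumed layout: interleaved signed 16-bit samples, L first.
samples = [1000, 1000, 32767, 32767, -32768, -32768, 20000, -20000]

print(left_channel_only(samples))  # [1000, 32767, -32768, 20000]
print(stereo_to_mono(samples))     # [1000, 32767, -32768, 0]
```

The last pair shows the cancellation risk in its most extreme form: when the two channels are equal in magnitude but opposite in polarity, the average is zero, whereas taking one channel keeps the sample intact.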
Reply by February 11, 2015
I am capturing human speech on the web.
I find there are examples that people are sampling at 44.1 and 48kHz.
All of them are also stereo.

If the sole purpose of capturing the sound is for extracting features, what might be the minimum or optimal sampling rate?

Is there any value in Stereo signal?  Am I correct that the left and right speech samples are identical in stereo - so just ignoring the buffer of one channel make it into a mono as a way to convert it?