On Tue, 10 Feb 2015 22:36:01 -0800 (PST), sash236@gmail.com
wrote:

>I am capturing human speech on the web.
>I find there are examples that people are sampling at 44.1 and 48kHz.
>All of them are also stereo.
>
>If the sole purpose of capturing the sound is for extracting features, what might be the minimum or optimal sampling rate?

That depends a lot on what features you are trying to
extract.  Consider that "telephone quality" is roughly
300-3000 Hz.  Since the sample rate needs to be at least
twice the highest frequency of interest, that implies an
8000 Hz sample rate would be fine.
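
As a rough sketch of that arithmetic (Python; the function
names here are just made up for illustration, not from any
library), this shows the minimum rate for a given top
frequency, and the up/down ratio you would need to get from
a 44.1 or 48 kHz capture to 8 kHz:

```python
from fractions import Fraction

def nyquist_rate(highest_hz):
    """Minimum sample rate (Hz) needed to represent content up to highest_hz."""
    return 2 * highest_hz

def resample_ratio(source_hz, target_hz):
    """Return (up, down) integers such that target = source * up / down."""
    ratio = Fraction(target_hz, source_hz)  # reduced to lowest terms
    return ratio.numerator, ratio.denominator

# Telephone-band speech topping out at 3000 Hz needs at least 6000 Hz,
# so the standard 8000 Hz rate leaves some margin.
print(nyquist_rate(3000))

# 48 kHz to 8 kHz is a plain divide-by-6; 44.1 kHz needs a rational ratio.
print(resample_ratio(48000, 8000))
print(resample_ratio(44100, 8000))
```

Note that this only gives the ratio.  An actual rate
conversion also needs a low-pass filter ahead of the
decimation step, or anything above half the new rate will
alias down into the speech band.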

>Is there any value in Stereo signal?  Am I correct that the left and right speech samples are identical in stereo - so just ignoring the buffer of one channel make it into a mono as a way to convert it?

They'd be identical if they were recorded from a single mono
mic, which is likely the case.  But if they were recorded
from a stereo pair of mics they might show slight
differences if they were not equally distant from the person
speaking.   Either way, you should be OK using just one
channel.

The usual way to convert stereo to mono is to add the two
channels together, then divide by two (to prevent clipping,
assuming that either or both could be more than half of
full-scale).  But if you really did have true stereo, and
the mics were at significantly different distances, then
this averaging process could cause phase cancellation that
might have some effect on the analysis results.  (Unlikely
in most real-world cases, unless you are looking at really
high speech harmonics.)
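
Here is a minimal sketch of both approaches in Python.  It
assumes the capture buffer holds interleaved signed integer
samples (L, R, L, R, ...), which is a common layout but may
not match what your capture API gives you, and the function
names are my own:

```python
def take_channel(interleaved, channel=0):
    """Keep one channel of an interleaved stereo buffer (0 = left, 1 = right)."""
    return interleaved[channel::2]

def stereo_to_mono(interleaved):
    """Average left and right so the sum cannot exceed full scale."""
    left = interleaved[0::2]
    right = interleaved[1::2]
    # Integer floor division keeps the result in the original sample range.
    return [(l + r) // 2 for l, r in zip(left, right)]

# Two stereo frames: (100, 200) and (-50, 50)
frames = [100, 200, -50, 50]
print(take_channel(frames))    # left channel only
print(stereo_to_mono(frames))  # averaged

# Both channels at 16-bit full scale still fit after averaging.
print(stereo_to_mono([32767, 32767]))

# Worst-case cancellation: the right channel is the inverted left channel.
print(stereo_to_mono([1000, -1000, -2000, 2000]))
```

The last case is the extreme version of the cancellation
mentioned above: two channels exactly out of phase average
to silence, whereas taking a single channel avoids the issue.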

Best regards,

Bob Masta

              DAQARTA  v7.60
   Data AcQuisition And Real-Time Analysis
              www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
 Frequency Counter, Pitch Track, Pitch-to-MIDI 
   FREE Signal Generator, DaqMusiq generator    
          Science with your sound card!
