Echo Cancellation & Environment Calibration

Started by jeffhcliu June 4, 2007
Hi everyone!

Please allow me to start off by saying that I have very little
knowledge of DSP and echo cancellation, so please forgive me if I
sound ignorant. I have tried to read up on both subjects, but I am
still unsure where to start for what I want to do.

So far, I have implemented an automatic music composition
application in Visual C++ Express Edition that generates MIDI music
(a hardware/software synthesizer might be incorporated later on). It
composes music derived from visual data it receives through a webcam,
and it is also able to detect human faces and general movements.

What I intend to do now is to gather audio input, using a microphone,
from the same physical environment (e.g. a room or hall) that the
program gets its visual information from. Ideally, it should be able
to determine the voice properties of people who speak within the
visual frame. That is, it detects mouth and jaw movements to assess
whether a person in view is speaking; it then processes the audio
input (especially the human voice frequency range?) to "confirm"
that there is a human voice in that data. If that is the case, the
program would then extract the properties of that voice (by using an FFT?).
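One way to sketch the audio "confirm" step is to measure what fraction of a short frame's spectral energy falls inside a speech band, and to accept the frame only when the video cue also says someone is talking. This is a minimal sketch, not a tested voice detector: the 300 to 3400 Hz band (the traditional telephone band), the 0.6 threshold, and the names `bandEnergyRatio` and `confirmSpeech` are all assumptions for illustration, and a naive DFT stands in for a real FFT library so the example has no dependencies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fraction of a frame's spectral energy that falls inside [loHz, hiHz].
// Uses a naive O(N^2) DFT for clarity; a real program would use an FFT library.
// The default band (300-3400 Hz, the telephone band) is an assumption.
double bandEnergyRatio(const std::vector<double>& frame, double sampleRate,
                       double loHz = 300.0, double hiHz = 3400.0) {
    assert(!frame.empty() && sampleRate > 0.0);
    const double pi = 3.14159265358979323846;
    const std::size_t n = frame.size();
    double inBand = 0.0, total = 0.0;
    // Positive-frequency bins 1..n/2 only; the DC bin (k = 0) is skipped.
    for (std::size_t k = 1; k <= n / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * k * t / n;
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        const double power = re * re + im * im;
        const double freq = k * sampleRate / n;  // centre frequency of bin k
        total += power;
        if (freq >= loHz && freq <= hiHz) inBand += power;
    }
    return total > 0.0 ? inBand / total : 0.0;
}

// Hypothetical decision rule: speech is "confirmed" only if the mouth/jaw
// detector fires AND most of the audio energy lies in the speech band.
bool confirmSpeech(bool mouthMoving, const std::vector<double>& frame,
                   double sampleRate, double threshold = 0.6) {
    return mouthMoving && bandEnergyRatio(frame, sampleRate) > threshold;
}
```

Because the generated music will also put energy into this band, a check like this is only meaningful on the microphone signal after the music has been removed (or while the music is quiet); on its own it cannot tell a voice from an instrument playing in the same range.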

Echo cancellation will have to be involved because the music generated
by the program would very likely be picked up by the microphone. Any
suggestions on how to combat this? The main reason I started my own
thread (hopefully, this will become a thread :) ) is that I could not
find material that deals with my scenario (maybe I'm just being
greedy), at least not in a way that I can relate to and understand.

You see, because it is composition software, I would know the rate at
which the music is going to be output and probably heard by the
microphone. Together with the jaw/mouth movement detection mechanism,
this makes me think that what is required might be slightly different
from the general echo cancellation methods, and that it might not need
to be as "powerful".
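Since the program produces the music itself, the usual acoustic echo cancellation setup applies: the exact signal sent to the speakers serves as a reference, an adaptive filter learns the speaker-to-room-to-microphone path, and the filtered reference is subtracted from the microphone signal. Below is a minimal normalized LMS (NLMS) sketch of that idea. The class name, the step size, and the `adapt` flag (which the mouth-movement detector could hypothetically drive as a crude double-talk detector, freezing the filter while someone speaks) are assumptions, not a tested design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized LMS adaptive filter modelling the speaker-to-microphone path.
// reference = audio the program itself plays; mic = what the microphone hears
// (the room-filtered music plus any voices).
class NlmsEchoCanceller {
public:
    NlmsEchoCanceller(std::size_t taps, double stepSize, double eps = 1e-6)
        : w_(taps, 0.0), x_(taps, 0.0), mu_(stepSize), eps_(eps) {
        assert(taps > 0 && stepSize > 0.0 && stepSize < 2.0);  // NLMS stability range
    }

    // Processes one sample and returns the echo-reduced signal e = mic - estimate.
    // When adapt is false (hypothetically: the video says someone is talking),
    // the weights are frozen so the voice does not corrupt the echo model.
    double process(double reference, double mic, bool adapt = true) {
        // Shift the reference history; x_[0] holds the newest sample.
        for (std::size_t i = x_.size() - 1; i > 0; --i) x_[i] = x_[i - 1];
        x_[0] = reference;

        double estimate = 0.0, energy = 0.0;
        for (std::size_t i = 0; i < w_.size(); ++i) {
            estimate += w_[i] * x_[i];  // predicted echo
            energy += x_[i] * x_[i];    // input power used for normalization
        }
        const double error = mic - estimate;
        if (adapt) {
            const double g = mu_ * error / (energy + eps_);
            for (std::size_t i = 0; i < w_.size(); ++i) w_[i] += g * x_[i];
        }
        return error;
    }

    // After convergence these weights approximate the room's impulse response.
    const std::vector<double>& weights() const { return w_; }

private:
    std::vector<double> w_, x_;
    double mu_, eps_;
};
```

The reference must be the rendered audio samples actually played, not the MIDI events, so with an external hardware synthesizer its output would have to be recorded back in. The filter also needs enough taps to cover the room's echo tail: at an 8 kHz sample rate, a 100 ms tail already needs 800 taps, which is why practical cancellers usually run block or frequency-domain variants of this update.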

What I mean by "environment calibration" is that, from what I
understand, audio calibration will be necessary at the start of the
program in order to find out the lag/distance from the microphone to the
speaker as well as the acoustic properties of the environment (unit
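For the lag part of the calibration, a common approach is to play a known wideband test signal (e.g. noise) at start-up, record it, and take the lag that maximizes the cross-correlation between what was played and what was recorded. The function names, the brute-force search, and the 343 m/s speed of sound below are assumptions for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the lag (in samples) that maximizes
// sum_n played[n] * recorded[n + lag], searching lags 0..maxLag.
std::size_t estimateLag(const std::vector<double>& played,
                        const std::vector<double>& recorded,
                        std::size_t maxLag) {
    assert(!played.empty() && !recorded.empty());
    std::size_t bestLag = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t lag = 0; lag <= maxLag; ++lag) {
        double score = 0.0;
        // Correlate only over the part where both signals overlap.
        for (std::size_t n = 0; n < played.size() && n + lag < recorded.size(); ++n)
            score += played[n] * recorded[n + lag];
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}

// Converts a lag to an approximate distance, assuming the whole lag is
// acoustic travel time and sound travels at roughly 343 m/s.
double lagToMeters(std::size_t lagSamples, double sampleRate) {
    return static_cast<double>(lagSamples) / sampleRate * 343.0;
}
```

In practice the measured lag also includes the sound card's input and output buffering, so converting it to a distance is only meaningful after that latency has been measured separately (for example, with a cable loopback). For long test signals, an FFT-based correlation would be much faster than this brute-force loop.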

I guess the two main questions are:

1. Echo cancellation in the context of my composer software
2. Environment audio calibration to find lag and acoustic properties

Thank you for bearing with me through such a long post. Any
advice/suggestions regarding the two questions will be greatly
appreciated. Thank you.