I am currently writing a Matlab implementation of the G.729 codec, and
right now the output of my decoder sounds far from normal. I have a
couple of questions that I hope someone on this list can help me with:
1. In the decoder, the first pass at synthesis (i.e. without
post-processing) seems to simply interpolate the past excitation vector
(u) at the transmitted pitch delay. If the past excitation vector is
initialized to zero, then in the first subframe of speech the excitation
can only come from the fixed codebook. If this is true, how is voicing
introduced into the speech in later subframes?
2. How does this algorithm differentiate between voiced and unvoiced frames?
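To make question 1 concrete, here is a minimal sketch of how I currently
understand the excitation being built for one subframe. It is written in
Python rather than Matlab only to keep it short, and it is a
simplification, not the reference decoder: it assumes an integer pitch
delay (no fractional interpolation), a 40-sample subframe, and a
143-sample history buffer. The names build_excitation, PAST_LEN and
SUBFRAME_LEN are my own and do not come from the standard.

```python
SUBFRAME_LEN = 40   # assumed subframe length in samples
PAST_LEN = 143      # assumed history length (hypothetical buffer size)


def build_excitation(past_u, pitch_delay, pitch_gain, code_vec, code_gain):
    """Build one subframe of excitation; integer pitch delay only (assumption).

    Returns (u, new_past), where u is the new excitation and new_past is the
    history buffer updated with u, kept at the same length as past_u.
    """
    history = list(past_u)
    start = len(history)
    if not 0 < pitch_delay <= start:
        raise ValueError("pitch_delay must lie within the history buffer")

    # Adaptive-codebook vector: the past excitation read back pitch_delay
    # samples earlier. When the delay is shorter than the subframe, the
    # samples built so far in this subframe are reused.
    v = []
    for n in range(SUBFRAME_LEN):
        idx = start + n - pitch_delay
        v.append(history[idx] if idx < start else v[idx - start])

    # Total excitation: scaled adaptive part plus scaled fixed-codebook part.
    u = [pitch_gain * v[n] + code_gain * code_vec[n]
         for n in range(SUBFRAME_LEN)]

    new_past = (history + u)[-start:]
    return u, new_past


# Subframe 1: history is all zeros, so only the fixed codebook contributes.
past = [0.0] * PAST_LEN
pulse = [0.0] * SUBFRAME_LEN
pulse[5] = 1.0
u1, past = build_excitation(past, 50, 0.8, pulse, 1.0)

# Subframe 2: no fixed-codebook energy at all, yet the pulse from
# subframe 1 reappears 50 samples later, scaled by the pitch gain.
silence = [0.0] * SUBFRAME_LEN
u2, past = build_excitation(past, 50, 0.8, silence, 0.0)

print([n for n, x in enumerate(u1) if x != 0.0])
print([n for n, x in enumerate(u2) if x != 0.0])
```

If this sketch is right, the fixed-codebook pulses from the first subframe
are fed back through the history buffer, and a pitch gain close to 1 would
make them repeat at the pitch period. Please correct me if the real
decoder behaves differently.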
If anyone could help me out with these questions, I'd be most grateful.