What is the flaw in my understanding?

Started by December 14, 2004
```Dear all,

****************************************************
Any shred of knowledge on this will help me out
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I am working on a module in which I have to mix two (audio/speech) files.
It looks simple: add the corresponding samples of the two audio files and
then write the result into the mixed file.

The problem is that if I simply add the two audio files sample by sample,
the sum may overflow the sample range. So I decided to divide each sample
by two, add the halves, and write that into the file.

What I observed is that the resulting mixed WAV file has low volume. This
is expected: dividing each sample by two decreases the amplitude level.

So I took another way to mix the audio files.

Let the two signals be A and B, with a sample range of 0 to 255.

Y = A + B - A * B / 255

where Y is the resulting signal, which contains both A and B. Merging two
audio streams into a single stream by this method solves the problem of
overflow and information loss to an extent.

If the range of 8-bit sampling is -127 to 128:

If both A and B are negative       Y = A + B - (A * B / (-127))
Else                               Y = A + B - A * B / 128

For an n-bit sampled audio signal:

If both A and B are negative       Y = A + B - (A * B / (-2 pow(n-1) - 1))
Else                               Y = A + B - (A * B / (2 pow(n-1)))

Applying the above approach, I get good sound quality when mixing two audio
signals. But as I increase the number of files to be mixed, I hear some
sort of disturbance (noise). That is, as the number of files increases,
the disturbance in the mixed audio file also increases.

What is the reason behind this? Is there some underlying hardware problem,
or does the quality of the sound depend on the recording device?

I would like to hear your views on this.

Personally, I think it may be due to the following factors:

1: Digital computation error
http://www.filter-solutions.com/quant.html

2: An aggressive increase in the amplitude of the mixed file as the number
of audio files goes up. That is, the more files there are, the more the
resulting values of the mixed audio file tend towards the upper limit,
32767, for positive samples, and towards -32768 when the samples are
negative. (Here I am talking about 16-bit audio data recorded at an 8 kHz
sampling rate.)

So is there any other approach by which I can assure myself that the mixed
audio data is noise free? (At most I have to mix 10 audio files.)

One more query: what is the reason behind the distortion when a recording
is made at a low level and the same file is then played back? Is there any
distortion in it? In my perception there is distortion between recording
and playback of the same audio file, for which I state my views below.
(Correct me wherever I am wrong.)

Explanation 1-->

Even with a good A/D-D/A converter for recording and playing back the
audio files, distortion comes into the picture. We know that digital
recording is extremely accurate due to its high signal-to-noise ratio
(S/N). At low levels, digital is actually better than analog, due to its
90 dB dynamic range. The best we can get from phonograph records
(recording and playing software/device) is around 60 dB; more precisely,
around 40 dB.

We can hear a range of 120-plus dB. This is why recordings use a lot of
compression (compressor: an electronic device that quickly turns the
volume up when the music/speech is soft and quickly turns it down when it
is loud).

Here the compressor comes into the picture, and the term "quickly" means
some loss of digital data at both ends (high and low), since low-level
ambient detail is completely stripped by digitizing when we record at a
low level.

So digitizing a low-level signal loses relevant information, which results
in distortion.

Note:
Sound cards use A/D and D/A converters, which depend on the sampling
frequency. The exact sampling frequency is not guaranteed to be the same
on different sound cards; it may vary, by a very small amount. This may
also cause distortion at low levels.

Explanation 2-->

Now suppose we record audio on one system (recording device) with the
volume control set to a low level, so that the sound is recorded at the
lowest recording level. If this recorded file is played back on another
system with the volume control at the same low setting, it will play
without distortion.

If there is a difference between the volume control setting at which the
file was recorded and the one at which it is played back, the result is
some sort of distortion.

Note:

If the volume control settings for recording and playback differ, there
will be distortion. So for low-level recording and listening, some
distortion will be heard if we play the low-level recording on another
system at a very high level.

Explanation 3-->

Some software and hardware use the concept of normalisation in their
algorithms. Some normalisers are basically "volume expanders" and some are
"limiters". They stretch the dynamic range of the material: the low sounds
in the original remain at their original level, the loudest sounds are
raised to the peak level permitted by the recording process, and whatever
lies in between is raised proportionately (adaptive increase). This also
distorts the original recorded sound. To hear the soft parts of the audio
file we have to increase the volume, so the enhanced signal is played
louder as well, causing distortion.

Note:
Sound recorded at a low level using normalisation can also cause
distortion. Very loud music and speech are mostly recorded with a
compressor/expander algorithm that uses normalisation.

One more thing: what are the lower and upper limits for recording 16-bit
data at an 8 kHz sampling frequency so that there is no noise between the
recorded and the played-back audio file?

Any shred of knowledge on this will help me out

Regards
Ranjeet
```
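
The unsigned 8-bit form of the formula in the post above can be written out as a small sketch. The function name `op_mix` is hypothetical, and the use of integer (floor) division is an assumption, since the post does not say how the division is rounded. It shows that the result stays in range but is not a plain sum.

```python
def op_mix(a, b):
    """Mix two unsigned 8-bit samples (0..255) as Y = A + B - A*B/255.

    Integer division is an assumption; the post does not specify rounding.
    """
    return a + b - (a * b) // 255


# The result always stays inside 0..255, so it cannot overflow.
in_range = all(0 <= op_mix(a, b) <= 255
               for a in range(256) for b in range(256))
print(in_range)          # True

# But it is not a plain sum: the A*B term bends the result.
print(op_mix(100, 100))  # 161, whereas 100 + 100 = 200
print(op_mix(0, 77))     # 77, a zero-valued sample leaves the other unchanged
```

Because the output depends on the product of the two inputs, the mapping is nonlinear in the samples.
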
```"ranjeet" <ranjeet.gupta@gmail.com> wrote in message
>
>  [Snip]
>   Let the two signal be A and B respectively, the range is between 0 and 255.
>
>    Y = A  +  B - A * B / 255
>
> [Snip]

I do not understand what you are trying to do here, I have not seen the
approach before.  But I can tell you that multiplication in time is
equivalent to convolution in frequency so the spectra of signal Y(z)
contains the spectra of A(z) and B(z) Plus the convolution of A(z)(*)B(z)
which will add noise to the final result.  The more of these signals you mix
in this manner the more noise you are going to add.

Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will
overflow, but at the cost of overall (on average) smaller signals.  In
making scaling decisions to prevent overflow, one approach is to think of
the signals as random and look at their PDFs.  What is the probability that
A + B will be greater than 255?  Then make a tradeoff between nice large
robust signals and the probability that every once in a while a signal may
be clipped, and choose a scale factor somewhere between 1 (highest
probability of overflow) and 1/2 (no probability of overflow).

Also, it helps to saturate on overflow (rather than wrap around) so that the
overflow only appears as a slight distortion (not a terribly wrong answer
with the wrong sign).

-Shawn Steenhagen

```
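
The difference between wrapping and saturating on overflow, as mentioned in the post above, can be illustrated with a minimal sketch for 16-bit signed samples. The function names `add_wrap` and `add_saturate` are hypothetical, and the 16-bit width is an assumption taken from the original question.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def add_wrap(a, b):
    """Add two 16-bit samples with two's-complement wrap-around."""
    s = (a + b) & 0xFFFF                      # keep the low 16 bits
    return s - 0x10000 if s >= 0x8000 else s  # reinterpret as signed


def add_saturate(a, b):
    """Add two 16-bit samples and clamp the sum to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(add_wrap(30000, 10000))      # -25536: large error, wrong sign
print(add_saturate(30000, 10000))  # 32767: clipped, but close to the true sum
```

With saturation the error is bounded by the amount of overshoot, which is why it tends to sound like mild clipping rather than a loud click.
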
```"Shawn Steenhagen" <shawn.NSsteenhagen@NSappliedsignalprocessing.com> wrote in
message news:SmJvd.731$qQ4.531@fe03.lga...
>
> [Snip]
>
> Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will
> overflow, but at the cost of overall (on average) smaller signals.  In
> making scaling decisions to prevent overflow, one approach is to think of
> the signals as random and look at their PDFs.  What is the probability that
> A + B will be greater than 255?  Then make a tradeoff between nice large
> robust signals and the probability that every once in a while a signal may
> be clipped, and choose a scale factor somewhere between 1 (highest
> probability of overflow) and 1/2 (no probability of overflow).
>
> [Snip]

Regarding the scaling, since you are working with files, you may be able to
analyze the results after mixing and then apply an appropriate scaling factor to
maximize peak value but avoid clipping.  This is usually called normalization.
One simple approach would be to use a conservative scaling factor to guarantee
overflow will not occur and, as you are mixing the files, keep a running tab on
the maximum value you ever encounter.  When finished, find the scaling factor G
= full_scale/max_value, where full_scale is the maximum number your wave format
can handle (probably 2^15 - 1 for 16-bit signed).  Then multiply the mixed
result file by G.  The result should be a file whose maximum output level is as
large as possible without clipping.

```
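
The peak-normalization procedure described in the post above might look roughly like this. It is a sketch under assumptions: the samples are already loaded as lists of 16-bit integers, the intermediate mix is held in floating point rather than written to a file, and the function name `mix_and_normalize` is hypothetical.

```python
def mix_and_normalize(a, b, full_scale=32767):
    """Mix two sample lists with 1/2 scaling, then scale the peak to full_scale."""
    mixed = []
    peak = 0.0
    for x, y in zip(a, b):
        m = 0.5 * (x + y)          # conservative scaling, cannot overflow
        mixed.append(m)
        peak = max(peak, abs(m))   # running tab on the maximum magnitude
    if peak == 0.0:
        return [0] * len(mixed)    # silence: nothing to normalize
    gain = full_scale / peak       # G = full_scale / max_value
    return [int(round(m * gain)) for m in mixed]


out = mix_and_normalize([1000, -2000, 3000], [500, 500, -1000])
print(out)  # the largest magnitude in the result is 32767
```

The same gain is applied to every sample, so the operation stays linear and only the overall level changes.
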
```"ranjeet" <ranjeet.gupta@gmail.com> wrote in message
> [Snip]
>
> I am working on a module in which I have to mix two (audio/speech) files.
> It looks simple: add the corresponding samples of the two audio files and
> then write the result into the mixed file.
>
> The problem is that if I simply add the two audio files sample by sample,
> the sum may overflow the sample range. So I decided to divide each sample
> by two, add the halves, and write that into the file.
>
> What I observed is that the resulting mixed WAV file has low volume. This
> is expected: dividing each sample by two decreases the amplitude level.

The objective is to add the two files together.  So far, so good.

You didn't say how the files themselves were scaled in the first place - but
it appears that their volume is adequate.  Is that right?

If you add two uncorrelated files together for mixing purposes then it may
well be similar to adding two noise records together.  The resulting RMS
amplitude increases by a factor of sqrt(2), not 2.  So, perhaps you'd do
better to divide each by sqrt(2).  Some amount of clipping is likely but may
be acceptable.  Obviously what you do is dependent on your implementation
and the tools that are available.

Fred

```
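
The sqrt(2) figure in the post above can be checked numerically. This is a sketch that assumes the two files behave like independent Gaussian noise, which real speech only approximates. The helper `rms` and the variable names are hypothetical.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Uncorrelated signals: the RMS level of the sum grows by about sqrt(2).
ratio_uncorrelated = rms([x + y for x, y in zip(a, b)]) / rms(a)

# Fully correlated signals (a signal added to itself): the level doubles.
ratio_correlated = rms([x + x for x in a]) / rms(a)

print(ratio_uncorrelated, ratio_correlated)
```

Note that this is a statement about average level only. Individual peaks of the sum can still reach twice full scale, which is why some clipping remains possible after dividing by sqrt(2).
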
```Hi Ranjeet,

if you distort your signal you get distortion. It's as simple as that.

I'm not quite sure how I should read your formulas. For example when
you write "Y = A + B - (A * B  /  (-2 pow(n-1) - 1))" what is the *
supposed to mean? Convolution? It can't be multiplication, because you
write "-2 pow(n-1)" which contains an explicit multiplication that you
don't write using '*'. And what is the last '1' standing for?

In general, if you use a nonlinear process for mixing your signals
(which is how I *think* I can interpret your description) you are
distorting the shape of their waveforms which will add distortion
noise. The more signals you mix in this manner (and the more
non-linearly you scale them) the more noise will be introduced.

As others have already said you need to scale the N signals by 1/N in
the worst case, and if you start out with 8 bit signals you're losing a
lot of information in the process. I would recommend you convert your
signals to floating point first and do the mixing there.
You can then scale the sum later as you see fit, or better yet,
normalize so your output signal fits into the target wordlength.
--
Stephan M. Bernsee
http://www.dspdimension.com

```
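
Mixing in floating point with 1/N scaling, as suggested in the post above, could be sketched as follows. The function name `mix_float` and the choice of mapping integers to the range -1.0 to 1.0 are assumptions made for illustration, not something the thread prescribes.

```python
def mix_float(signals, bits=16):
    """Mix N integer sample lists in floating point with 1/N scaling."""
    full_scale = 2 ** (bits - 1)             # 32768 for 16-bit audio
    n = len(signals)
    length = min(len(s) for s in signals)    # stop at the shortest input
    out = []
    for i in range(length):
        # Convert each sample to a float in [-1.0, 1.0) and accumulate.
        acc = sum(s[i] / full_scale for s in signals)
        acc /= n                             # worst-case 1/N scaling
        q = int(round(acc * full_scale))     # back to the integer range
        out.append(max(-full_scale, min(full_scale - 1, q)))
    return out


mixed = mix_float([[32767, -32768, 0],
                   [32767, -32768, 100],
                   [32767, -32768, 200]])
print(mixed)  # [32767, -32768, 100]
```

Keeping the accumulator in floating point means rounding to the target word length happens only once, at the end, instead of once per input file.
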
```On 2004-12-15 07:47:21 +0100, Stephan M. Bernsee <spam@dspdimension.com> said:

> I'm not quite sure how I should read your formulas. [...] And what is
> the last '1' standing for?

Ah, looks like my news reader is ballsing up the formula. When I look
at it through Google groups I see that there's a minus before the '1'.

In my news reader there isn't, because you didn't use a minus but an Em
dash...!
Nevermind.
--
Stephan M. Bernsee
http://www.dspdimension.com

```
```
ranjeet wrote:

>   I am working on a module in which I have to mix two (audio/speech) files.
>   It looks simple: add the corresponding samples of the two audio files and
>   then write the result into the mixed file.

>   The problem is that if I simply add the two audio files sample by sample,
>   the sum may overflow the sample range. So I decided to divide each sample
>   by two, add the halves, and write that into the file.

You should add them together with one extra bit available, and then
divide by two.  The difference is in rounding.

>   What I observed is that the resulting mixed WAV file has low volume. This
>   is expected: dividing each sample by two decreases the amplitude level.
>
>   So I took another way to mix the audio files.
>
>   Let the two signals be A and B, with a sample range of 0 to 255.
>
>    Y = A + B - A * B / 255

(snip)

Don't do that.  A*B is the equivalent of a modulator, with the
right sign convention a balanced modulator, but not what you want
when adding signals.   This is the term that creates intermodulation
distortion in audio signals.

-- glen

```
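
Both points in the post above can be illustrated with a short sketch. The first pair of functions shows the rounding difference between halving before and after the addition. The second pair checks numerically that a product of two tones equals a pair of tones at the difference and sum frequencies, which is the intermodulation term. All function names and the 440 Hz / 1000 Hz test tones are hypothetical choices.

```python
import math


def halve_then_add(a, b):
    """Divide each sample by two first; each division rounds separately."""
    return a // 2 + b // 2


def add_then_halve(a, b):
    """Add with an extra bit of headroom, then divide once."""
    return (a + b) // 2   # Python ints do not overflow; // rounds down


print(halve_then_add(3, 5), add_then_halve(3, 5))  # 3 4


def product(f1, f2, t):
    """The A*B term for two sine tones at f1 and f2 Hz."""
    return math.sin(2 * math.pi * f1 * t) * math.sin(2 * math.pi * f2 * t)


def sidebands(f1, f2, t):
    """The same value written as tones at (f1 - f2) and (f1 + f2) Hz."""
    return 0.5 * (math.cos(2 * math.pi * (f1 - f2) * t)
                  - math.cos(2 * math.pi * (f1 + f2) * t))


# Compare over one second of samples at 8 kHz.
max_diff = max(abs(product(440, 1000, k / 8000) - sidebands(440, 1000, k / 8000))
               for k in range(8000))
print(max_diff < 1e-9)  # True
```

The sum and difference frequencies are present in neither input, so the listener hears them as added distortion.
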