What is the flaw in my understanding?

Started by ranjeet December 14, 2004
Dear all,

  Any light you can shed on this will help me out.
  
  I am working on a module that has to mix two (audio/speech) files. It looks
  simple: add the corresponding samples of the two audio files and write the
  result to the mixed file.

  The problem is that if I simply add the two files sample by sample, the sum
  may overflow the sample range. So I decided to divide each sample by two,
  then add the halves and write the result to the file.

  What I observed is that the resulting mixed WAV file has low volume. This is
  to be expected: dividing each sample by two lowers the amplitude.

  So I took another approach to mixing the audio files.

  Let the two signals be A and B, each in the range 0 to 255.

   Y = A + B - A * B / 255

  Here Y is the resulting signal, containing both A and B. Merging two audio
  streams into a single stream this way solves the problem of overflow and
  information loss to an extent.

  If the range of 8-bit sampling is -127 to 128:

  If both A and B are negative       Y = A + B - (A * B / (-127))
  Else                               Y = A + B - A * B / 128

  For an n-bit sampled audio signal:

  If both A and B are negative       Y = A + B - (A * B / (-2 pow(n-1) - 1))
  Else                               Y = A + B - (A * B / (2 pow(n-1)))

  Applying the above approach, I get good sound quality when mixing two audio
  signals. But as I increase the number of files to be mixed, I hear some sort
  of disturbance (noise) in the result, and the disturbance grows with the
  number of files.

  What is the reason behind this? Is there some underlying hardware problem,
  or does the quality depend on the recording device?

  I would like to hear your views on this.

  Personally, I think it may be due to the following factors:

  1: Digital computation error
     http://www.filter-solutions.com/quant.html

  2: An aggressive increase in the amplitude of the mixed file as the number
     of audio files grows. The more files are mixed, the more the resulting
     values tend towards the top of the range, i.e. towards 32767 for positive
     samples, and towards -32768 when the samples being mixed are negative.
     (Here I am talking about 16-bit audio data sampled at 8 kHz.)

  So is there any other approach by which I can assure myself that the mixed
  audio data is noise free? (At most I have to mix 10 audio files.)

   One more query: what is the reason behind the distortion when a recording
   is made at a low level and the same file is then played back? Is there any
   distortion in it? My perception is that there is distortion between the
   recording and the playback of the same audio file, and I state my views
   below. (Correct me wherever I am wrong.)

Explanation 1-->

    Even with a good A/D-D/A converter for recording and playback, distortion
    comes into the picture. We know that digital recording is extremely
    accurate because of its high signal-to-noise ratio (S/N). At low levels,
    digital is actually better than analog because of its 90 dB dynamic range.
    The best we can get from phonograph records (recording and playing
    software/device) is around 60 dB; a more realistic figure is around 40 dB.

    We can hear a range of 120-plus dB. This is why recordings use a lot of
    compression (compressor: an electronic device that quickly turns the
    volume up when the music/speech is soft and quickly turns it down when it
    is loud).

    Here the compressor comes into the picture, and the word "quickly" implies
    some loss of digital data at both ends (high and low), since low-level
    surrounding detail is completely stripped by digitizing when we record at
    a low level.

    So digitizing a low-level signal loses relevant information, which results
    in distortion.

Note :
    Sound cards use A/D and D/A converters, which depend on the sampling
    frequency. The exact sampling frequency is not guaranteed to be the same
    on different sound cards; it may vary by a very small amount. This can
    also cause distortion at low levels.

Explanation 2-->

    Now suppose we record audio on one system (recording device) with the
    volume control at its lowest setting. If the recorded file is played back
    on another system with the volume control also at its lowest setting, and
    the setting is not changed, it will play without distortion.

    If the volume setting at playback differs from the setting at which the
    file was recorded, the result will be some sort of distortion.

Note :

    If the recording and playback volume settings differ, there will also be
    distortion. So a file recorded at a low level will show some distortion if
    it is played on another system at a very high level.


Explanation 3-->

     Some software and hardware use normalisation in their algorithms. Some
     normalisers are basically "volume expanders" and some are "limiters".
     They stretch the dynamic range of the material: the low sounds in the
     original stay at their original level, the loudest sounds are raised to
     the peak level permitted by the recording process, and whatever lies in
     between is raised proportionately (adaptive increase). This also distorts
     the originally recorded sound. To hear the softer parts of the audio file
     we have to turn the volume up, so the enhanced signal is played louder as
     well, causing distortion.

Note:
      Sound recorded at a low level with normalisation can therefore also
      show distortion. Very loud music and speech are mostly recorded with a
      compressor/expander algorithm that uses normalisation.


  One more thing: what are the lower and upper level limits for recording
  16-bit data at an 8 kHz sampling frequency so that there is no noise between
  the recorded and the played-back audio file?

  Any light you can shed on this will help me out.
  Thanks in advance.

Regards
Ranjeet
"ranjeet" <ranjeet.gupta@gmail.com> wrote in message
news:77c88a3b.0412141310.1397d0ff@posting.google.com...
> [Snip]
> Let the two signals be A and B, each in the range 0 to 255.
>
> Y = A + B - A * B / 255
>
> [Snip]
I do not understand what you are trying to do here; I have not seen this approach before. But I can tell you that multiplication in time is equivalent to convolution in frequency, so the spectrum of the signal Y(z) contains the spectra of A(z) and B(z) plus the convolution A(z)(*)B(z), which will add noise to the final result. The more of these signals you mix in this manner, the more noise you are going to add.

Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will overflow, but at the cost of overall (on average) smaller signals. In making scaling decisions to prevent overflow, one approach is to think of the signals as random and look at their pdfs. What is the probability that A + B will be greater than 255? Then make a tradeoff between nice, large, robust signals and the probability that every once in a while a signal may be clipped, and choose a scale factor somewhere between 1 (highest probability of overflow) and 1/2 (no probability of overflow).

Also, it helps to saturate on overflow (rather than wrap around) so that the overflow only appears as a slight distortion (not a terribly wrong answer with the wrong sign).

-Shawn Steenhagen
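To illustrate the point about saturating rather than wrapping, here is a minimal Python sketch. It assumes 16-bit signed samples; the helper names `wrap16` and `saturate16` are made up for this illustration and do not come from the thread.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Emulate two's-complement wraparound of a 16-bit signed register."""
    return (x + 32768) % 65536 - 32768

def saturate16(x):
    """Clamp to the 16-bit signed range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, x))

a, b = 30000, 10000          # two loud positive samples
total = a + b                # 40000, outside the 16-bit range

wrapped = wrap16(total)      # large negative value: wrong sign
clipped = saturate16(total)  # pinned at full scale: mild distortion

print(wrapped, clipped)
```

The wrapped result lands far from the true sum and with the opposite sign, which is heard as a loud click, while the saturated result is only flattened at full scale.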
"Shawn Steenhagen" <shawn.NSsteenhagen@NSappliedsignalprocessing.com> wrote in
message news:SmJvd.731$qQ4.531@fe03.lga...
> [Snip]
Regarding the scaling, since you are working with files, you may be able to analyze the results after mixing and then apply an appropriate scaling factor to maximize peak value but avoid clipping. This is usually called normalization. One simple approach would be to use a conservative scaling factor to guarantee overflow will not occur and, as you are mixing the files, keep a running tab on the maximum value you ever encounter. When finished, find the scaling factor G = full_scale/max_value, where full_scale is the maximum number your wave format can handle (probably 2^15 - 1 for 16-bit signed). Then multiply the mixed result file by G. The result should be a file whose maximum output level is as large as possible without clipping.
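The normalization step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tracks are equal-length lists of integer samples, the output is 16-bit signed, and the function name `mix_and_normalize` is hypothetical. Instead of a conservative pre-scaling factor it relies on Python integers not overflowing, which plays the role of a wide accumulator.

```python
FULL_SCALE = 32767  # 2**15 - 1, assuming 16-bit signed output

def mix_and_normalize(tracks):
    """Sum equal-length tracks sample by sample, then rescale to full scale."""
    # Python ints do not overflow, so the raw sum acts as a wide accumulator.
    mixed = [sum(samples) for samples in zip(*tracks)]
    peak = max((abs(s) for s in mixed), default=0)
    if peak == 0:
        return mixed              # silence: nothing to scale
    gain = FULL_SCALE / peak      # G = full_scale / max_value
    return [int(round(s * gain)) for s in mixed]

track_a = [1000, -2000, 3000, -4000]
track_b = [500, 1500, -2500, -3500]
out = mix_and_normalize([track_a, track_b])
print(out)
```

Because the gain is derived from the largest value actually present in the mix, the loudest output sample sits exactly at full scale and nothing clips, whatever the number of input tracks.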
"ranjeet" <ranjeet.gupta@gmail.com> wrote in message 
news:77c88a3b.0412141310.1397d0ff@posting.google.com...
> I am working on a module that has to mix two (audio/speech) files. It looks
> simple: add the corresponding samples of the two audio files and write the
> result to the mixed file.
>
> The problem is that if I simply add the two files sample by sample, the sum
> may overflow the sample range. So I decided to divide each sample by two,
> then add the halves and write the result to the file.
>
> What I observed is that the resulting mixed WAV file has low volume. This is
> to be expected: dividing each sample by two lowers the amplitude.
The objective is to add the two files together. So far, so good.

You didn't say how the files themselves were scaled in the first place, but it appears that their volume is adequate. Is that right?

If you add two uncorrelated files together for mixing purposes, then it may well be similar to adding two noise records together. The resulting amplitude increases by sqrt(2), not 2. So perhaps you'd do better to divide each by sqrt(2). Some amount of clipping is likely but may be acceptable.

Obviously what you do is dependent on your implementation and the tools that are available.

Fred
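The sqrt(2) figure can be checked numerically. The sketch below is an illustration only: it stands in for the two audio files with independent uniform random noise (an assumption, since real speech files are only approximately uncorrelated) and compares the RMS level of the sum with the RMS level of one input.

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

rng = random.Random(1234)          # fixed seed so the run is repeatable
n = 20000
a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # independent of a

summed = [x + y for x, y in zip(a, b)]
ratio = rms(summed) / rms(a)

print(f"RMS growth when adding two uncorrelated signals: {ratio:.3f}")
print(f"sqrt(2) = {math.sqrt(2):.3f}")
```

Note that this concerns the average level. Individual peaks of the sum can still reach twice the input peak, which is why some clipping remains possible after dividing by sqrt(2).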
Hi Ranjeet,

if you distort your signal you get distortion. It's as simple as that.

I'm not quite sure how I should read your formulas. For example when 
you write "Y = A + B (A * B / (-2 pow(n-1) 1))" what is the *
supposed to mean? Convolution? It can't be multiplication, because you 
write "-2 pow(n-1)" which contains an explicit multiplication that you 
don't write using '*'. And what is the last '1' standing for?

In general, if you use a nonlinear process for mixing your signals 
(which is how I *think* I can interpret your description) you are 
distorting the shape of their waveforms which will add distortion 
noise. The more signals you mix in this manner (and the more 
non-linearly you scale them) the more noise will be introduced.

As others have already said you need to scale the N signals by 1/N in 
the worst case, and if you start out with 8 bit signals you're losing a 
lot of information in the process. I would recommend you convert your 
signals to floating point first and do the mixing there.
You can then scale the sum later as you see fit, or better yet, 
normalize so your output signal fits into the target wordlength.
-- 
Stephan M. Bernsee
http://www.dspdimension.com

On 2004-12-15 07:47:21 +0100, Stephan M. Bernsee <spam@dspdimension.com> said:

> I'm not quite sure how I should read your formulas. [...] And what is
> the last '1' standing for?

Ah, looks like my news reader is ballsing up the formula. When I look at it through Google Groups I see that there's a minus before the '1'. In my news reader there isn't, because you didn't use a minus but an em dash...! Never mind.
-- 
Stephan M. Bernsee
http://www.dspdimension.com

ranjeet wrote:

> I am working on a module that has to mix two (audio/speech) files. It looks
> simple: add the corresponding samples of the two audio files and write the
> result to the mixed file.
>
> The problem is that if I simply add the two files sample by sample, the sum
> may overflow the sample range. So I decided to divide each sample by two,
> then add the halves and write the result to the file.
You should add them together with one extra bit available, and then divide by two. The difference is in rounding.
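The rounding difference can be seen with a small sketch, assuming unsigned 8-bit samples and truncating integer division; the function names are invented for this example.

```python
def halve_then_add(a, b):
    """Divide each sample by two first, then add (loses a bit per input)."""
    return a // 2 + b // 2

def add_then_halve(a, b):
    """Add with headroom for one extra bit, then divide by two."""
    return (a + b) // 2

# Compare the two orderings over every pair of 8-bit sample values.
mismatches = sum(
    1
    for a in range(256)
    for b in range(256)
    if halve_then_add(a, b) != add_then_halve(a, b)
)

print(halve_then_add(3, 5), add_then_halve(3, 5))
print(mismatches, "of", 256 * 256, "input pairs differ")
```

The two orderings disagree whenever both samples are odd, which is one quarter of all input pairs: halving first throws away the low bit of each input before it can contribute to the sum.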
> What I observed is that the resulting mixed WAV file has low volume. This is
> to be expected: dividing each sample by two lowers the amplitude.
>
> So I took another approach to mixing the audio files.
>
> Let the two signals be A and B, each in the range 0 to 255.
>
> Y = A + B - A * B / 255
(snip)

Don't do that. A*B is the equivalent of a modulator (with the right sign convention, a balanced modulator), but not what you want when adding signals. This is the term that creates intermodulation distortion in audio signals.

-- glen
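The intermodulation effect of the A*B term can be made visible with a small numerical experiment. The sketch below is an assumption-laden illustration, not anyone's production code: it uses two pure tones in the unsigned 0..255 range, picks arbitrary frequencies of 50 and 70 cycles per analysis window, and measures single DFT bins directly using only the standard library.

```python
import cmath
import math

N = 1000                 # samples in the analysis window
F1, F2 = 50, 70          # tone frequencies in cycles per window

def tone(cycles):
    """Unsigned 8-bit style tone: values stay within 0..255."""
    return [127.5 + 100.0 * math.sin(2 * math.pi * cycles * n / N)
            for n in range(N)]

def bin_amplitude(signal, k):
    """Amplitude of the component at k cycles per window (single DFT bin)."""
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / N)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / N

a, b = tone(F1), tone(F2)
linear = [x + y for x, y in zip(a, b)]                        # plain sum
product_mix = [x + y - x * y / 255.0 for x, y in zip(a, b)]   # Y = A + B - A*B/255

# Columns: bin, amplitude in the plain sum, amplitude in the product mix.
for k in (F1, F2, F2 - F1, F1 + F2):
    print(k, round(bin_amplitude(linear, k), 3),
          round(bin_amplitude(product_mix, k), 3))
```

In the plain sum, only the two original tones are present. In the product mix, new components appear at the difference and sum frequencies (20 and 120 cycles per window here), and the original tones come out attenuated. With more than two inputs, every pair contributes such products, which is consistent with the noise growing as more files are mixed.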