The PARSHL Program
This appendix is adapted from the original paper describing the PARSHL program [271] for sinusoidal modeling of audio. While many of the main points are summarized elsewhere in the text, the PARSHL paper is included here as a source of more detailed info on carrying out elementary sinusoidal modeling of sound based on the STFT.
As mentioned in §G.7.1, the phase vocoder was a widely used analysis tool for additive synthesis starting in the 1970s. A difficulty with the phase vocoder, as traditionally implemented, is that it uses a fixed uniform filter bank. While this works well for periodic signals, it is relatively inconvenient for inharmonic signals. An ``inharmonic phase vocoder'' called PARSHLH.1 was developed in 1985 to address this problem in the context of piano signal modeling [271]. PARSHL worked by tracking peaks in the short-time Fourier transform (STFT), thereby synthesizing an adaptive inharmonic FIR filter bank, replacing the fixed uniform filter bank of the vocoder. In other respects, PARSHL could be regarded as a phase-vocoder analysis program.
The PARSHL program converted an STFT to a set of amplitude and frequency envelopes for inharmonic, quasi-sinusoidal-sum signals. Only the most prominent peaks in the spectrum of the input signal were tracked. For quasi harmonic sounds, such as the piano, the amplitudes and frequencies were sampled approximately once per period of the lowest frequency in the analysis band. For resynthesis, PARSHL supported both additive synthesis [233] using an oscillator bank and overlap-add reconstruction from the STFT, or both.
PARSHL followed the amplitude, frequency, and phaseH.2 of the most prominent peaks over time in a series of spectra, computed using the Fast Fourier Transform (FFT) The synthesis part of the program used the analysis parameters, or their modification, to generate a sinewave in the output for each peak track found.
The steps carried out by PARSHL were as follows:
- Compute the STFT
using the frame size, window
type, FFT size, and hop size specified by the user.
- Compute the squared magnitude spectrum in dB
(
).
- Find the bin numbers (frequency samples) of the spectral peaks.
Parabolic interpolation is used to refine the peak location
estimates. Three spectral samples (in dB) consisting of the local peak
in the FFT and the samples on either side of it suffice to determine
the parabola used.
- The magnitude and phase of each peak is calculated from the
maximum of the parabola determined in the previous step. The parabola
is evaluated separately on the real and imaginary parts of the
spectrum to provide a complex interpolated spectrum value.
- Each peak is assigned to a frequency track by matching the
peaks of the previous frame with the current one. These tracks can be
``started up,'' ``turned-off'' or ``turned-on'' at any frame by
ramping in amplitude from or toward 0
.
- Arbitrary modifications can be applied to the analysis
parameters before resynthesis.
- If additive synthesis is requested, a sinewave is generated for
each frequency track, and all are summed into an output buffer. The
instantaneous amplitude, frequency, and phase for each sinewave are
calculated by interpolating the values from frame to frame. The
length of the output buffer is equal to the hop size
which is
typically some fraction of the window length
.
- Repeat from step 1, advancing
samples each iteration until
the end of the input sound is reached.
The following sections provide further details:
Choice of Hop Size
A question related to the STFT analysis window is the hop size , i.e., how much we can advance the analysis time origin from frame to frame. This depends very much on the purposes of the analysis. In general, more overlap will give more analysis points and therefore smoother results across time, but the computational expense is proportionately greater. For purposes of spectrogram display or additive synthesis parameter extraction, a conservative constraint is to require that the analysis window overlap-add to a constant at the chosen hop size:
where denotes the FFT window, and is the hop size in samples. This constant overlap-add (COLA) constraint ensures that the successive frames will overlap in time in such a way that all data are weighted equally.
The COLA constraint can be overly conservative for steady-state signals. For additive synthesis purposes, it is more efficient and still effective to increase the hop size to the number of samples over which the spectrum is not changing appreciably. In the case of the steady-state portion of piano tones, the hop size appears to be limited by the fastest amplitude envelope ``beat'' frequency caused by mistuning strings on one key or by overlapping partials from different keys.
For certain window types (such as sum-of-cosine windows, as discussed in Chapter 3), there exist perfect overlap factors in the sense of (H.1). For example, a rectangular window can hop by , where is any positive integer, and a Hanning or Hamming window can use any hop size of the form . For the Kaiser window, in contrast, there is no perfect hop size other than .
The COLA criterion for windows and their hop sizes is not the best perspective to take when overlap-add synthesis is being constructed from the modified spectra [7]. As discussed in Chapter 9, the hop size is the decimation factor applied to each FFT filter-bank output, and the window is the envelope of each filter's impulse response. The decimation by causes aliasing, and the frame rate is equal to twice the ``folding frequency'' of this aliasing. Consequently, to minimize aliasing, the choice of hop size should be such that the folding frequency exceeds the ``cut-off frequency'' of the window. The cut-off frequency of a window can be defined as the frequency above which the window transform magnitude is less than or equal to the worst-case side-lobe level. For convenience, we typically use the frequency of the first zero-crossing beyond the main lobe as the definition of cut-off frequency. Following this rule yields overlap for the rectangular window, overlap for Hamming and Hanning windows, and (5/6) overlap for Blackman windows. The hop size usable with a Kaiser window is determined by its design parameters (principally, the desired time-bandwidth product of the window, or, the ``beta'' parameter) [115].
One may wonder what happens to aliasing in the perfect-reconstruction case in which (H.1) is satisfied. The answer is that aliasing does occur in the individual filter-bank outputs, but this aliasing is canceled in the reconstruction by overlap-add if there were no modifications to the STFT. For a general discussion of aliasing cancellation in decimated filter banks, see Chapter 11 (especially §11.4.5) and/or [287].
Filling the FFT Input Buffer (Step 2)
The FFT size is normally chosen to be the first power of two that is at least twice the window length , with the difference filled with zeros (``zero-padded''). The reason for increasing the FFT size and filling in with zeros is that zero-padding in the time domain corresponds to interpolation in the frequency domain, and interpolating the spectrum is useful in various ways. First, the problem of finding spectral peaks which are not exact bin frequencies is made easier when the spectrum is more densely sampled. Second, plots of the magnitude of the more smoothly sampled spectrum are less likely to confuse the untrained eye. (Only signals truly periodic in samples should not be zero-padded. They should also be windowed only by the Rectangular window.) Third, for overlap-add synthesis from spectral modifications, the zero-padding allows for multiplicative modification in the frequency domain (convolutional modification in the time domain) without time aliasing in the inverse FFT. The length of the allowed convolution in the time domain (the impulse response of the effective digital filter) equals the number of extra zeros (plus one) in the zero padding.
If is the number of samples in the main lobe when the zero-padding factor is 1 ( ), then a zero-padding factor of gives samples for the same main lobe (and same main-lobe bandwidth). The zero-padding (interpolation) factor should be large enough to enable accurate estimation of the true maximum of the main lobe after it has been frequency shifted by some arbitrary amount equal to the frequency of a sinusoidal component in the input signal. We have determined by computational search that, for a rectangularly windowed sinusoid (of any frequency), quadratic frequency interpolation (using the three highest bins) yields at least (of the distance from the sinc peak to the first zero-crossing) accuracy if the zero-padding factor is 5 or higher.
As mentioned in the previous section, we facilitate phase detection by using a zero-phase window, i.e., the windowed data (using an odd length window) is centered about the time origin. A zero-centered, length data frame appears in the length FFT input buffer as shown in Fig.H.1c. The first samples of the windowed data, the ``negative-time'' portion, will be stored at the end of the buffer, from sample to , and the remaining samples, the zero- and ``positive-time'' portion, will be stored starting at the beginning of the buffer, from sample 0 to . Thus, all zero padding occurs in the middle of the FFT input buffer.
Peak Detection (Steps 3 and 4)
Due to the sampled nature of spectra obtained using the STFT, each peak (location and height) found by finding the maximum-magnitude frequency bin is only accurate to within half a bin. A bin represents a frequency interval of Hz, where is the FFT size. Zero-padding increases the number of FFT bins per Hz and thus increases the accuracy of the simple peak detection. However, to obtain frequency accuracy on the level of of the distance from a sinc maximum to its first zero crossing (in the case of a rectangular window), the zero-padding factor required is . (Note that with no zero padding, the STFT analysis parameters are typically arranged so that the distance from the sinc peak to its first zero-crossing is equal to the fundamental frequency of a harmonic sound. Under these conditions, of this interval is equal to the relative accuracy in the fundamental frequency measurement. Thus, this is a realistic specification in view of pitch discrimination accuracy.) Since we would nominally take two periods into the data frame (for a Rectangular window), a Hz sinusoid at a sampling rate of KHz would have a period of samples, so that the FFT size would have to exceed one million. A more efficient spectral interpolation scheme is to zero-pad only enough so that quadratic (or other simple) spectral interpolation, using only bins immediately surrounding the maximum-magnitude bin, suffices to refine the estimate to accuracy. PARSHL uses a parabolic interpolator which fits a parabola through the highest three samples of a peak to estimate the true peak location and height (cf. Fig.H.2).
We have seen that each sinusoid appears as a shifted window transform which is a sinc-like function. A robust method for estimating peak frequency with very high accuracy would be to fit a window transform to the sampled spectral peaks by cross-correlating the whole window transform with the entire spectrum and taking and interpolated peak location in the cross-correlation function as the frequency estimate. This method offers much greater immunity to noise and interference from other signal components.
To describe the parabolic interpolation strategy, let's define a coordinate system centered at , where is the bin number of the spectral magnitude maximum, i.e., for all . An example is shown in Figure 4. We desire a general parabola of the form
(H.2) |
such that , , and , where , , and are the values of the three highest samples:
(H.3) | |||
(H.4) | |||
(H.5) |
We have found empirically that the frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. An interesting open question is what is the optimum nonlinear compression of the magnitude spectrum when quadratically interpolating it to estimate peak locations.
Solving for the parabola peak location , we get
(H.6) |
and the estimate of the true peak location (in bins) will be
(H.7) |
and the peak frequency in Hz is . Using , the peak height estimate is then
(H.8) |
The magnitude spectrum is used to find , but can be computed separately for the real and imaginary parts of the complex spectrum to yield a complex-valued peak estimate (magnitude and phase).
Once an interpolated peak location has been found, the entire local maximum in the spectrum is removed. This allows the same algorithm to be used for the next peak. This peak detection and deletion process is continued until the maximum number of peaks specified by the user is found.
Peak Matching (Step 5)
The peak detection process returns the prominent peaks in a given frame sorted by frequency. The next step is to assign some subset of these peaks to oscillator trajectories, which is done by the peak matching algorithm. If the number of spectral peaks were constant with slowly changing amplitudes and frequencies along the sound, this task would be straightforward. However, it is not always immediately obvious how to connect the spectral peaks of one frame with those of the next.
To describe the peak matching process, let's assume that the frequency tracks were initialized at frame and we are currently at frame . Suppose that at frame the frequency values for the tracks are , and that we want to match them to the peaks, with frequencies , of frame .
Each track looks for its peak in frame by finding the one which is closest in frequency to its current value. The th track claims frequency for which is minimum. The change in frequency must be less than a specified maximum , which can be a frequency-dependent limit (e.g., linear, corresponding to a relative frequency change limit). The possible situations are as follows:
12pt (1) If a match is found inside the maximum change limit, the track is continued (unless there is a conflict to resolve, as described below).
12pt (2) If no match is made, it is assumed that the track with frequency must ``turn off'' entering frame , and is matched to itself with zero magnitude. Since oscillator amplitudes are linearly ramped from one the frame to the next, the terminating track will ramp to zero over the duration of one frame hop. This track will still exist (at zero amplitude), and if it ever finds a frame with a spectral peak within its capture range , it will ``turned back on,'' ramping its amplitude up to the newly detected value. It is sometimes necessary to introduce some hysteresis into the turning on and off process in order to prevent ``burbling'' of the tracks whose peaks sometimes make the cut and sometimes don't. Normally this problem can be avoided by searching for many more spectral peaks than there are oscillators to allocate.
12pt (3) If a track finds a match which has already been claimed by another track, we give the peak to the track which is closest in frequency, and the ``losing'' track looks for another match. If the current track loses the conflict, it simply picks the best available non-conflicting peak. If the current track wins the conflict, it calls the assignment procedure recursively on behalf of the dislodged track. When the dislodged track finds the same peak and wants to claim it, it will see there is a conflict which it loses and will move on. This process is repeated for each track, solving conflicts recursively, until all existing tracks are matched or ``turned-off''.
After each track has extended itself forward in time or turned off, the peaks of frame which have not been used are considered to be new trajectories and a new track is ``started-up'' for each one of them up to the maximum number of oscillators specified (which again should be less than the number of spectral peaks detected). The new oscillator tracks are started at frame with zero magnitude and ramp to the correct amplitude at the current frame .
Once the program has finished, the peak trajectories for a sound look as in Fig.H.4.
Parameter Modifications (Step 6)
The possibilities that STFT techniques offer for modifying the analysis results before resynthesis have an enormous number of musical applications. Quatieri and McAulay [222] give a good discussion of some useful modifications for speech applications. By scaling and/or resampling the amplitude and the frequency trajectories, a host of sound transformations can be accomplished.
Time-scale modifications can be accomplished by resampling the amplitude, frequency, and phase trajectories. This can be done simply by changing the hop size in the resynthesis (although for best results the hop size should change adaptively, avoiding time-scale modifications during voice consonants or attacks, for example). This has the effect of slowing down or speeding up the sound while maintaining pitch and formant structure. Obviously this can also be done for a time-varying modification by having a time-varying hop size . However, due to the sinusoidal representation, when a considerable time stretch is done in a ``noisy'' part of a sound, the individual sinewaves start to be heard and the noise-like quality is lost.
Frequency transformations, with or without time scaling, are also possible. A simple one is to scale the frequencies to alter pitch and formant structure together. A more powerful class of spectral modifications comes about by decoupling the sinusoidal frequencies (which convey pitch and inharmonicity information) from the spectral envelope (which conveys formant structure so important to speech perception and timbre). By measuring the formant envelope of a harmonic spectrum (e.g., by drawing straight lines or splines across the tops of the sinusoidal peaks in the spectrum and then smoothing), modifications can be introduced which only alter the pitch or only alter the formants. Other ways to measure formant envelopes include cepstral smoothing [198] and the fitting of low-order LPC models to the inverse FFT of the squared magnitude of the spectrum [157]. By modulating the flattened (by dividing out the formant envelope) spectrum of one sound by the formant-envelope of a second sound, ``cross-synthesis'' is obtained. Much more complex modifications are possible.
Not all spectral modifications are ``legal,'' however. As mentioned earlier, multiplicative modifications (simple filtering, equalization, etc.) are straightforward; we simply zero-pad sufficiently to accommodate spreading in time due to convolution. It is also possible to approximate nonlinear functions of the spectrum in terms of polynomial expansions (which are purely multiplicative). When using data derived filters, such as measured formant envelopes, it is a good idea to smooth the spectral envelopes sufficiently that their inverse FFT is shorter in duration than the amount of zero-padding provided. One way to monitor time-aliasing distortion is to measure the signal energy at the midpoint of the inverse-FFT output buffer, relative to the total energy in the buffer, just before adding it to the final outgoing overlap-add reconstruction; little relative energy in the ``maximum-positive'' and ``minimum negative'' time regions indicates little time aliasing. The general problem to avoid here is drastic spectral modifications which correspond to long filters in the time domain for which insufficient zero-padding has been provided. An inverse FFT of the spectral modification function will show its time duration and indicate zero-padding requirements. The general rule (worth remembering in any audio filtering context) is ``be gentle in the frequency domain.''
Synthesis (Step 7)
The analysis portion of PARSHL returns a set of amplitudes , frequencies , and phases , for each frame index , with a ``triad'' ( ) for each track . From this analysis data the program has the option of generating a synthetic sound.
The synthesis is done one frame at a time. The frame at hop , specifies the synthesis buffer
(H.9) |
where is the number of tracks present at frame ; ; and is the length of the synthesis buffer (without any time scaling , the analysis hop size). To avoid ``clicks'' at the frame boundaries, the parameters ( ) are smoothly interpolated from frame to frame.
The parameter interpolation across time used in PARSHL is the same as that used by McAulay and Quatieri [174]. Let ( ) and ( ) denote the sets of parameters at frames and for the th frequency track. They are taken to represent the state of the signal at time 0 (the left endpoint) of the frame.
The instantaneous amplitude is easily obtained by linear interpolation,
(H.10) |
where is the time sample into the th frame.
Frequency and phase values are tied together (frequency is the phase derivative), and they both control the instantaneous phase . Given that four variables are affecting the instantaneous phase: , and , we need at least three degrees of freedom for its control, while linear interpolation only gives one. Therefore, we need at least a cubic polynomial as interpolation function, of the form
(H.11) |
We will not go into the details of solving this equation since McAulay and Quatieri [174] go through every step. We will simply state the result:
(H.12) |
where and can be calculated using the end conditions at the frame boundaries,
(H.13) | |||
(H.14) |
This will give a set of interpolating functions depending on the value of , among which we have to select the ``maximally smooth'' one. This can be done by choosing to be the integer closest to , where is [174, Eq.(36)]
(H.15) |
and finally, the synthesis equation turns into
(H.16) |
which smoothly goes from frame to frame and where each sinusoid accounts for both the rapid phase changes (frequency) and the slowly varying phase changes.
Figure H.5 shows the result of the analysis/synthesis process using phase information and applied to a piano tone.
Magnitude-only Analysis/Synthesis
A traditional result of sound perception is that the ear is sensitive principally to the short-time spectral magnitude and not to the phase, provided phase continuity is maintained. Our experience has been that this may or may not be true depending on the application, and in §H.0.9 we will discuss it. Obviously if the phase information is discarded, the analysis, the modification, and the resynthesis processes are simplified enormously. Thus we will use the magnitude-only option of the program whenever the application allows it.
In the peak detection process we calculate the magnitude and phase of each peak by using the complex spectrum. Once we decide to discard the phase information there is no need for complex spectra and we simply can calculate the magnitude of the peak by doing the parabolic interpolation directly on the log magnitude spectrum.
The synthesis also becomes easier; there is no need for a cubic function to interpolate the instantaneous phase. The phase will be a function of the instantaneous frequency and the only condition is phase continuity at the frame boundaries. Therefore, the frequency can be linearly interpolated from frame to frame, like the amplitude. Without phase matching the synthesized waveform will look very different from the original (Fig.H.5), but the sound quality for many applications will be perceptually the same.
Preprocessing
The task of the program can be simplified and the analysis/synthesis results improved if the sound input is appropriately manipulated before running the program.
Most important is to equalize the input signal. This controls what it means to find spectral peaks in order of decreasing magnitude. Equalization can be accomplished in many ways and here we present some alternatives.
12pt (1) A good equalization strategy for audio applications is to weight the incoming spectrum by the inverse of the equal-loudness contour for hearing at some nominal listening level (e.g., dB). This makes spectral magnitude ordering closer to perceptual audibility ordering.
12pt (2) For more analytical work, the spectrum can be equalized to provide all partials at nearly the same amplitude (e.g., the asymptotic roll-off of all natural spectra can be eliminated). In this case, the peak finder is most likely to find and track all of the partials.
12pt (3) A good equalization for noise-reduction applications is to ``flatten'' the noise floor. This option is useful when it is desired to set a fixed (frequency-independent) track rejection threshold just above the noise level.
12pt (4) A fourth option is to perform adaptive equalization of types (2) or (3) above. That is, equalize each spectrum independently, or compute the equalization as a function of a weighted average of the most recent power spectrum (FFT squared magnitude) estimates.
Apart from equalization, another preprocessing strategy which has proven very useful is to reverse the sound in time. The attack of most sounds is quite ``noisy'' and PARSHL has a hard time finding the relevant partials in such a complex spectrum. Once the sound is reversed the program will encounter the end of the sound first, and since in most instrumental sounds this is a very stable part, the program will find a very clear definition of the partials. When the program gets to the sound attack, it will already be tracking the main partials. Since PARSHL has a fixed number of oscillators which can be allocated to discovered tracks, and since each track which disappears removes its associated oscillator from the scene forever,H.3 analyzing the sound tail to head tends to allocate the oscillators to the most important frequency tracks first.
Applications
The simplest application of PARSHL is as an analysis tool since we can get a very good picture of the evolution of the sound in time by looking at the amplitude, frequency and phase trajectories. The tracking characteristics of the technique yield more accurate amplitudes and frequencies than if the analysis were done with an equally spaced bank of filters (the traditional STFT implementation).
In speech applications, the most common use of the STFT is for data-reduction. With a set of amplitude, frequency and phase functions we can get a very accurate resynthesis of many sounds with much less information than for the original sampled sounds. From our work it is still not clear how important is the phase information in the case of resynthesis without modifications, but McAulay and Quatieri [174] have shown the importance of phase in the case of speech resynthesis.
Some of the most interesting musical applications of the STFT techniques are given by their ability to separate temporal from spectral information, and, within each spectrum, pitch and harmonicity from formant information. In §H.0.5, Parameter Modifications, we discussed some of them, such as time scaling and pitch transposition. But this group of applications has a lot of possibilities that still need to be carefully explored. From the few experiments we have done to date, the tools presented give good results in situations where less flexible implementations do not, namely, when the input sound has inharmonic spectra and/or rapid frequency changes.
The main characteristic that differentiates this model from the traditional ones is the selectivity of spectral information and the phase tracking. This opens up new applications that are worth our attention. One of them is the use of additive synthesis in conjunction with other synthesis techniques. Since the program allows tracking of specific spectral components of a sound, we have the flexibility of synthesizing only part of a sound with additive, synthesis, leaving the rest for some other technique. For example, Serra [247] has used this program in conjunction with LPC techniques to model bar percussion instruments, and Marks and Polito [163] have modeled piano tones by using it in conjunction with FM synthesis [38]. David Jaffe has had good success with birdsong, and Rachel Boughton used PARSHL to create abstractions of ocean sounds.
One of the problems encountered when using several techniques to synthesize the same sound is the difficulty of creating the perceptual fusion of the two synthesis components. By using phase information we have the possibility of matching the phases of the additive synthesis part to the rest of the sound (independently of what technique was used to generate it). This provides improved signal ``splicing'' capability, allowing very fast cross-fades (e.g., over one frame period).
PARSHL was originally written to properly analyze the steady state of piano sounds; it did not address modeling the attack of the piano sound for purposes of resynthesis. The phase tracking was primarily motivated by the idea of splicing the real attack (sampled waveform) to its synthesized steady state. It is well known that additive synthesis techniques have a very hard time synthesizing attacks, both due to their fast transition and their ``noisy'' characteristics. The problem is made more difficult by the fact that we are very sensitive to the quality of a sound's attack. For plucked or struck strings, if we are able to splice two or three periods, or a few milliseconds, of the original sound into our synthesized version the quality can improve considerably, retaining a large data-reduction factor and the possibility of manipulating the synthesis part. When this is attempted without the phase information, the splice, even if we do a smooth cross-fade over a number of samples, can be very noticeable. By simply adding the phase data the task becomes comparatively easy, and the splice is much closer to inaudible.
Conclusions
In this appendix, an analysis/synthesis technique based on a sinusoidal representation was presented that has proven to be very appropriate for signals which are well characterized as a sum of inharmonic sinusoids with slowly varying amplitudes and frequencies. The previously used harmonic vocoder techniques have been relatively unwieldy in the inharmonic case, and less robust even in the harmonic case. PARSHL obtains the sinusoidal representation of the input sound by tracking the amplitude, frequency, and phase of the most prominent peaks in a series of spectra computed using a Cooley-Tukey Fast Fourier Transform (FFT) of successive, overlapping, windowed data frames, taken over the duration of a sound. We have mentioned some of the musical applications of this sinusoidal representation.
Continuing the work with this analysis/synthesis technique we are implementing PARSHL on a Lisp Machine with an attached FPS AP120B array processor. We plan to study further its sound transformation possibilities and the use of PARSHL in conjunction with other analysis/synthesis techniques such as Linear Predictive Coding (LPC) [162].
The basic ``FFT processor'' at the heart of PARSHL provides a ready point of departure for many other STFT applications such as FIR filtering, speech coding, noise reduction, adaptive equalization, cross-synthesis, and many more. The basic parameter trade-offs discussed in this appendix are universal across all of these applications.
Although PARSHL was designed to analyze piano recordings, it has proven very successful in extracting additive synthesis parameters for radically inharmonic sounds. It provides interesting effects when made to extract peak trajectories in signals which are not describable as sums of sinusoids (such as noise or ocean recordings). PARSHL has even demonstrated that speech can be intelligible after reducing it to only the three strongest sinusoidal components.
The surprising success of additive synthesis from spectral peaks suggests a close connection with audio perception. Perhaps timbre perception is based on data reduction in the brain similar to that carried out by PARSHL. This data reduction goes beyond what is provided by critical-band masking. Perhaps a higher-level theory of ``timbral masking'' or ``main feature dominance'' is appropriate, wherein the principal spectral features serve to define the timbre, masking lower-level (though unmasked) structure. The lower-level features would have to be restricted to qualitatively similar behavior in order that they be ``implied'' by the louder features. Another point of view is that the spectral peaks are analogous to the outlines of figures in a picture--they capture enough of the perceptual cues to trigger the proper percept; memory itself may then serve to fill in the implied spectral features (at least for a time).
Techniques such as PARSHL provide a powerful analysis tool toward extracting signal parameters matched to the characteristics of hearing. Such an approach is perhaps the best single way to obtain cost-effective, analysis-based synthesis of any sound.
Acknowledgments
The authors of [271] thank Dynacord, Inc., for supporting the development of the first version of PARSHL in the summer of 1985. We also wish to acknowledge the valuable contributions of Gerold Schrutz (of Dynacord) during that time.
Software Listing
The software listing below is in the SAIL programming language (generally a superset of ALGOL). Since procedures are defined before they are used, the listing starts out defining basic utilities, progressing to higher levels until finally the top level is reached. As a result, it is usually easier start read the top level procedure at the end first, and then work backwards to fill in the details.
The SAIL keyboard had various non-standard characters which are spelled out here using TeX equivalents such as &alpha#alpha; and &beta#beta;.
COMMENT Track instantaneous amplitudes and frequencies of multiple sinusoids; COMMENT Side-result: FFT-based filtering and/or inharmonic additive synthesis; COMMENT Completed in the summer of 1985 by Julius O. Smith III BEGIN "PARSHL" REQUIRE "{}<>" DELIMITERS; DEFINE #="comment",CrLf="'15&'12",tab={""&'11},Sam16="4",Cr="'15", ALT={'175&""},Thru={step 1 until}; REQUIRE "JOSLIB.REQ[LIB,JOS]" SOURCE!FILE; REQUIRE "REQUIRING UDP2:SIGLIB.REL[SIG,MUS] LIBRARY" MESSAGE; REQUIRE "UDP2:SIGLIB.REL[SIG,MUS]" LIBRARY; REQUIRE "JAMLIB.REL[SUB,SYS]" LIBRARY; REQUIRE "TRMTYP.SAI[LIB,JOS]" SOURCE!FILE; REQUIRE "DISPLA.REQ[LIB,JOS]" SOURCE!FILE; EXTERNAL PROCEDURE TRPINI(INTEGER CODE); # Enable floating point exception code; EXTERNAL PROCEDURE SUPCT; # Disable \alpha-I,\alpha-R,\alpha-T,\alpha-L etc; REQUIRE "RECORD.REQ[LIB,JOS]" SOURCE!FILE; # REQUIRE "MYIO.REQ[LIB,JOS]" SOURCE!FILE; COMMENT Declare File IO stuff; REQUIRE "FLTIO.REQ[LIB,JOS]" SOURCE!FILE; COMMENT Declare GetFlt; EXTERNAL BOOLEAN PROCEDURE AinInt(REFERENCE INTEGER Val; STRING Name); EXTERNAL BOOLEAN PROCEDURE AinReal(REFERENCE REAL Val; STRING Name); EXTERNAL STRING PROCEDURE Cvfs(REAL Val); EXTERNAL SIMPLE BOOLEAN PROCEDURE FNparse( STRING Arg; REFERENCE STRING Device; REFERENCE STRING Filename); EXTERNAL STRING PROCEDURE DEV(STRING Name); COMMENT Return DEVICE part of filename; EXTERNAL STRING PROCEDURE NAME(STRING Name); COMMENT Return NAME part of filename; EXTERNAL STRING PROCEDURE EXT(STRING Name); COMMENT Return extension part of filename; EXTERNAL STRING PROCEDURE PPN(STRING Name); COMMENT Return PPN part of filename; COMMENT SOUND IO DECLARATIONS; REQUIRE "SYS:DEFAUL.HDR" SOURCE!FILE; # REQUIRE "RHEAD.REL[SYS,MUS]" LOAD!MODULE; # REQUIRE "WHEAD.REL[SYS,MUS]" LOAD!MODULE; EXTERNAL BOOLEAN PROCEDURE SANDI( INTEGER Chan, StSamp, Nsamps; REFERENCE REAL ARRAY X; INTEGER FilSamps, Headed, FilPak, Xpack(-1)); EXTERNAL BOOLEAN PROCEDURE SANDO( INTEGER Chan, StSamp, Nsamps; REFERENCE REAL ARRAY X; REFERENCE INTEGER ARRAY Hist; REFERENCE INTEGER FilSamps; INTEGER Headed, FilPak, Xpack(-1)); # ReadH (rp, hdbuf, Fail, true if don't want printout); EXTERNAL PROCEDURE ReadH ( RECORD!POINTER (File) Rp; REFERENCE REAL ARRAY Hdbuf; REFERENCE INTEGER FAIL; BOOLEAN Silence (FALSE)); # WriteH (head, clock, pack, #chans, #ticks, maxamp, cstr); EXTERNAL PROCEDURE WriteH ( REFERENCE REAL ARRAY Head; REAL Clock; INTEGER Pack, #chans, #ticks; REAL Maxamp; STRING Cstr ); INTEGER ARRAY OutH,AmpH,FrqH[0:128]; REAL ARRAY Head[1:128]; INTEGER Nread,Brk; REAL Maxamp; BOOLEAN FAIL,Eof; STRING Ifile,Idev,Ofile,Odev; STRING AmpFile,AmpDev,FrqFile,FrqDev,Tstr; RECORD!POINTER (File) InFptr, OutFptr; RECORD!POINTER (File) AmpFptr, FrqFptr; DEFINE InF(x) = {File:x[InFptr]}, OutF(x) = {File:x[OutFptr]}; DEFINE AmpF(x) = {File:x[AmpFptr]}, FrqF(x) = {File:x[FrqFptr]}; COMMENT Filter input; # REQUIRE "JOSLIB.REQ[LIB,JOS]" SOURCE!FILE; # REQUIRE "RECORD.REQ[LIB,JOS]" SOURCE!FILE; # REQUIRE "FLTIO.REQ[LIB,JOS]" SOURCE!FILE; COMMENT Declare GetFlt; DEFINE MaxCoeffs = "8192"; COMMENT Maximum number of filter coefficients; STRING Ffile; INTEGER Ni,No; REAL ARRAY Ic,Oc[1:MaxCoeffs]; INTEGER Nfft,Nspec,Nh,Nx,Nhop,Ndec,i,Nhops; INTERNAL INTEGER Trace; DEFINE ShowSpectra="(Trace LAND 1)"; DEFINE TDpyFFA = " IF ShowSpectra THEN DpyFFA "; DEFINE ShowTime="(Trace LAND 2)"; DEFINE TDpyEd = " IF ShowTime THEN DpyEd "; DEFINE ShowPeakFinder="(Trace LAND 4)"; DEFINE Debug1="(Trace LAND 8)", Debug2="(Trace LAND 16)", Debug3="(Trace LAND 32)"; BOOLEAN HaveOfile,HaveIfile,HaveFfile,HavePack; STRING TmpStr; EXTERNAL INTEGER !SKIP!; IFC FALSE THENC # SAIL Esc-I interrupt facility; SIMPLE PROCEDURE EscI; # This is called upon <esc>I interrupt; START!CODE "EscI" TLZE '10,'400000; # If sign bit is on; MOVN '10,'10; # Convert sign-magnitude to 2' comp; MOVEM '10, Trace; # Save Ac 10; CALLI 0, '400024; # DISMIS (return to user level); END "EscI"; PROCEDURE Int!init; BEGIN EXTERNAL INTEGER JOBAPR; # JOB DATA AREA user interrupt routine pointer; JOBAPR <- LOCATION(EscI); ENABLE(15); # Enable user interrupt handling; Trace <- 0; END; COMMENT REQUIRE Int!Init INITIALIZATION; COMMENT rmoved by BIL because he thinks this is extremely dangerous -- you are depending on getting only esc-i interrupts and nothing else. The SAIL runtime world that tries to handle arithmetic exceptions (for example) depends on JOBAPR pointing to some procedure that can handle such interrupts (UDP2:TRIGS[NEW,SAI]/4P and UDP2:NWORLD[NEW,SAI]/31,32P); ELSEC IFC NOT DECLARATION(GOGTAB) THENC EXTERNAL INTEGER ARRAY GOGTAB[0:'300]; ENDC SIMPLE PROCEDURE Esci; Trace <- GOGTAB['200]; PROCEDURE Int!Init; BEGIN ENABLE(15); INTMAP(15,Esci,0); Trace <- 0; END; REQUIRE Int!Init INITIALIZATION; ENDC # Global declarations; BOOLEAN DoFlt,DoSynth,DoOut,SwapOut; # Flags for type of function wanted; REAL MinSep,Thresh,Hyst,DFmax1,DFmax2,DAmax,SigScl,Fc1,Fc2,Fs,DBscl; INTEGER MaxLins,MaxOscs,Nframes,Npartials,Frame,MinWid; BOOLEAN Quiet,InstantRise; COMMENT DerivedName - Generate name from root name and code string; STRING PROCEDURE DerivedName(STRING GivenName, CodeString); BEGIN "DerivedName" INTEGER i,j,ln,lc; STRING Gname,DevStr; STRING RootName; # REQUIRE "JOSLIB.REQ[LIB,JOS]" SOURCE!FILE; EXTERNAL STRING PROCEDURE DEV(STRING Name); COMMENT Return DEVICE part of filename; EXTERNAL STRING PROCEDURE NAME(STRING Name); COMMENT Return NAME part of filename; EXTERNAL STRING PROCEDURE EXT(STRING Name); COMMENT Return extension part of filename; EXTERNAL STRING PROCEDURE PPN(STRING Name); COMMENT Return PPN part of filename; Gname <- NAME(GivenName); ln <- LENGTH(Gname); lc <- LENGTH(CodeString); IF lc+ln LEQ 6 THEN RootName <- Gname&CodeString ELSE RootName <- Gname[1 to 6-lc]&CodeString; IF EQU(RootName,Gname) THEN BEGIN # Gak! Must generate a new name!; PRINT(" DerivedName got a name collision",CrLf); IF ln>1 THEN RootName <- Gname[1 to 6-lc-1]&CodeString&"2" ELSE RootName <- Gname[1 to 6-lc]&CodeString[1 to lc-1]&"2"; IF EQU(RootName,Gname) THEN RootName <- RootName[1 to 5]&"3"; END; DevStr <- Dev(GivenName); IF DevStr NEQ NULL THEN DevStr <- DevStr&":"; RETURN(DevStr&RootName&Ext(GivenName)&Ppn(GivenName)); END "DerivedName"; COMMENT Fprint - Print documentation from DSK; # Put this in JOSLIB; PROCEDURE Fprint(STRING Fname); COMMENT Print contents of file in fname; BEGIN STRING Help,Ttystr; INTEGER Fchan,Brk,Eof,Bt; BOOLEAN FAIL; DEFINE FF={('14&"")}; OPEN(Fchan <- GETCHAN,"DSK",0,2,0,2000,Brk,Eof); COMMENT Ascii input; LOOKUP(Fchan,Fname,FAIL); IF FAIL THEN USERERR(0,1," Can't find HELP file "&Fname); IF FAIL THEN RETURN; SETBREAK(Bt <- GETBREAK,FF,NULL,"sin"); DO BEGIN "pages" Help <- INPUT(Fchan,Bt); IF NOT EQU(Help[1 FOR 7],"COMMENT") THEN BEGIN MyOutDpy(Help,-3,-1); MyOutDpy ("Type RETURN to continue - ALT to return to command level",31,-1); MyOutDpy (Fname,33,-1); Ttystr <- INCHWL; Break!N; IF Ttystr="H" OR Ttystr = "?" THEN MyOutDpy(" This IS help!!",32,-1); IF !skip!="H" OR !skip! = "?" THEN MyOutDpy(" This IS help!!",32,-1); IF !skip! = ALT THEN BEGIN PRINT(CrLf); DONE "pages"; END; END; END "pages" UNTIL Eof; Relbreak(Bt); RELEASE(Fchan); END; COMMENT Help - Command Summaries; RECURSIVE PROCEDURE HELP(STRING Topic(NULL)); BEGIN "help" Break!N; IF EQU(Topic,"?") THEN BEGIN Fprint("PARSHL.JOS[MUS,DOC]"); PRINT(CrLf,CrLf," Type \alpha-? for command summary",CrLf); # PRINT(" Type <command!letter>\alpha-? for individual command description",CrLf); PRINT(" Type ?\alpha-? or ?,?<RETURN> for full documentation",CrLf,CrLf); RETURN; END; IF EQU(Topic,"TopLevel") THEN MyOutDpy(" Outer Level Command Summary [s> => string, [n> => integer, [d> => real number, \alpha- => CONTROL, and \beta- => META ALT - Exit PARSHL. ?\alpha-? - Print complete documentation. (Also ?,?[CR> does this.) [n>\alpha-W - Analysis Window (1=rect,2=triang,3:5=Hamming,6=Kaiser,7=Cheb) [n>\alpha-T - FFT length (must be a power of 2). [n>\alpha-D - Number of samples of input signal per FFT frame. [n>\alpha-H - Number of samples of input signal to advance between FFT's. [n>\alpha-L - Length of filter impulse response (if known). [n>\alpha-C - Decimation factor (1 => no decimation, 2 => every other sample, etc.) [s>\alpha-I - Input sound file. [s>\alpha-O - Output sound file. [n>\alpha-P - Output sound file packing mode. (\alpha-P with no arg explains codes.) [s>\alpha-F - Input filter file. ; - Comment. [ESC>[n>I codes: (n can be any sum of the following): [n> display(s) enabled --- ------------------ 0 none (type [ESC>I to turn off all running displays) 1 time buffers ([ESC>1I) 2 spectral buffers ([ESC>2I) 4 peak-finder 8 debug level 1 16 debug level 2 32 debug level 3 ------------------------------------------------------------------------ ",-3,-1); IF EQU(Topic,"SubLevel") THEN MyOutDpy(" Inner Level Command Summary [s] => string, [n] => integer, [d] => real number, \alpha- => CONTROL, and \beta- => META ALT - Exit to Outer level. ?\alpha-? - Print complete documentation. (Also ?,?[CR] does this.) [d]\alpha-M - Minimum spacing between partials in Hz [n]\alpha-N - Maximum number of partials to track [n]\alpha-O - Maximum number of oscillators to use [d]\alpha-F - First (lowest) partial frequency allowed (Hz) [d]\alpha-L - Last (highest) partial frequency allowed (Hz) [d]\alpha-T - Peak detection threshold in dB [d]\alpha-H - Spectral wiggle tolerance in dB [d]\alpha-D - Maximum allowed frequency deviation per frame in Hz at LOW frequency [d]\alpha-U - Maximum allowed frequency deviation per frame in Hz at HIGH frequency [n]\alpha-B - Buffer display code. If negative, all time buffers are displayed. \alpha-S - Show spectra of filter input and output on each FFT hop. (\beta-S disables.) ; - Comment. ------------------------------------------------------------------------ ",-3,-1); END "help"; COMMENT DpyFFA - Display interleaved dB spectra as computed by FFA in SIGLIB; PROCEDURE DpyFFA(REAL ARRAY S; INTEGER Nfft; STRING T; REAL Fs(1)); COMMENT For displaying interleaved dB spectra as computed by FFA in SIGLIB; BEGIN "DpyFFA" INTEGER i,Nspec; REAL ARRAY Spec[1:(Nfft DIV 2)+1]; Nspec <- (Nfft DIV 2) + 1; FOR i <- 1 STEP 1 UNTIL Nspec DO Spec[i] <- (10.0/LOG(10.0))*LOG( (S[2*i-1]^2+S[2*i]^2) MAX 1.0@-35); DpyEd(Spec,Nspec,T,"MAGNITUDE (DB)",0,Fs/2); END "DpyFFA"; COMMENT Qinterp - Quadratic interpolation; INTERNAL SIMPLE REAL PROCEDURE Qinterp(REAL Ym1,Y0,Yp1; BOOLEAN InteriorX(TRUE)) ; COMMENT Fit a parabola Y[X] = A*X^2+B, through the three equally spaced Y-values Ym1 = Y[-1], Y0 = Y[0], and Yp1=Y[1], and return the X-value where the extremum is attained. For example, if -1 is returned, then Y[-1] is the point of zero slope in the parabola through the three points. If InteriorX is TRUE then if the extremum lies outside the interval [-1,1], a warning is printed and the returned value is clipped to lie at either 1 or -1 as appropriate. ; BEGIN "Qinterp" REAL X; X <- (Yp1 - Ym1)/(2*(2*Y0 - Yp1 - Ym1)); IF InteriorX AND (ABS(X)>1) THEN PRINT(" Qinterp: Clipping analytic extremum to ", X <- (IF X>0 THEN 1 ELSE -1),CrLf); IF Debug3 THEN PRINT("Qinterp: Given Y's ",Ym1," ",Y0," ",Yp1, ", extremum is at X = ",X,CRLF); RETURN(X); END "Qinterp"; COMMENT FindPeaks - Generic peak finder; IFC NOT DECLARATION(IntNoSet) THENC DEFINE IntNoSet={(1 LSH 35)}; IFC NOT DECLARATION(RealNoSet) THENC DEFINE RealNoSet={(-1.0@35)}; INTEGER PROCEDURE FindPeaks(REAL ARRAY X,Peaks,PeakLocs; REAL Thresh(RealNoSet),Hyst(RealNoSet); INTEGER MaxPeaks(IntNoSet),MinWidth(IntNoSet),BlankWidth(IntNoSet), I1(IntNoSet),I2(IntNoSet)); COMMENT Find amplitudes and indices of all peaks in X = array of values to be searched for peaks. (It is overwritten.) Peaks[1:MaxPeaks] = peak amplitudes. PeakLocs[1:MaxPeaks] = peak indices. Thresh = Threshold below which no peaks are to be sought Hyst = Wiggles less than Hyst in magnitude are ignored. MaxPeaks = Maximum number of peaks to be found (starting w largest) MinWidth = Minimum peak width in indices, after blanking. Narrower peaks are removed and ignored. BlankWidth = Minimum blanking interval in indeces. If = IntNoSet, then blank to nearest local minimum to within Hyst interval. Blanking is done before width is measured for efficiency reasons (would rather measure width first). i1,i2 = index boundaries for the search within X. The return value is the number of peaks actually found. ; BEGIN "FindPeaks" INTEGER i,j,Npeaks,NdxReach,Owid,Odig,Peak,Poff,PLoff,M1,M2,Nfound,Lb,Ub; REAL Xmax,Xmin; GETFORMAT(Owid,Odig); SETFORMAT(0,5); NdxReach <- (IF (BlankWidth NEQ IntNoSet) THEN (BlankWidth-1)/2 MAX 1 ELSE 0); Poff <- ARRINFO(Peaks,1)-1; PLoff <- ARRINFO(PeakLocs,1)-1; Npeaks <- (IF MaxPeaks NEQ IntNoSet THEN MaxPeaks ELSE ARRINFO(Peaks,2)-Poff); Npeaks <- Npeaks MIN (ARRINFO(Peaks,2)-Poff) MIN (ARRINFO(PeakLocs,2)-PLoff); Lb <- ARRINFO(X,1); Ub <- ARRINFO(X,2); IF I2=IntNoSet THEN I2 <- ARRINFO(X,2); IF I1=IntNoSet THEN I1 <- ARRINFO(X,1); I1 <- (Lb MAX I1 MIN Ub); I2 <- (I1 MAX I2 MIN Ub); ArrMin(Xmin,X,I1,I2); ArrMax(Xmax,X,I1,I2); IF Thresh=RealNoSet THEN Thresh <- Xmin; IF Hyst=RealNoSet THEN Hyst <- (Xmax-Xmin)/100; M1 <- M2 <- 0; COMMENT [m1,m2] = index interval of last peak; Nfound <- 0; FOR Peak <- 1 STEP 1 UNTIL Npeaks DO BEGIN "fp" INTEGER MaxLoc,TmpI; REAL MaxVal,ClobVal,TmpR; IF M1=I1 AND M2=I2 THEN DONE "fp"; MaxLoc <- ArrMax(MaxVal,X,I1,I2); IF MaxVal<Thresh THEN DONE "fp"; Nfound <- Nfound + 1; PeakLocs[Nfound+PLoff] <- MaxLoc + Qinterp(X[(MaxLoc-1) MAX I1],MaxVal,X[(MaxLoc+1) MIN I2]); IF MaxLoc=I1 AND Debug1 THEN PRINT("*** Peak is at right of search interval for peak ",Nfound," ***",CrLf); IF MaxLoc=I2 AND Debug1 THEN PRINT("*** Peak is at left of search interval for peak ",Nfound," ***",CrLf); Peaks[Nfound+Poff] <- MaxVal; COMMENT Now slice off peak so we don't find it again; M1 <- (MaxLoc-NdxReach) MAX I1; M2 <- (MaxLoc+NdxReach) MIN I2; ArrMin(ClobVal,X,M1,M2); TmpR <- X[M1]; WHILE M1>I1 AND TmpR+Hyst GEQ X[M1-1] DO BEGIN TmpR <- TmpR MIN X[M1-1]; M1 <- M1-1 END; ClobVal <- ClobVal MIN TmpR; TmpR <- X[M2]; WHILE M2<I2 AND TmpR+Hyst GEQ X[M2+1] DO BEGIN TmpR <- TmpR MIN X[M2+1]; M2 <- M2+1 END; ClobVal <- ClobVal MIN TmpR; FOR i <- M1 STEP 1 UNTIL M2 DO X[i] <- ClobVal; IF (M2-M1+1 < MinWidth) OR MaxLoc=I1 OR MaxLoc=I2 # JOS/10/25/85; THEN BEGIN "Throw it back" Nfound <- Nfound-1; CONTINUE "fp"; END "Throw it back"; IF ShowPeakFinder THEN DpyEd(X,I2-ARRINFO(X,1)+1,"X remains after peak "& CVS(Nfound)&" at i="&CVS(MaxLoc)); END "fp"; SETFORMAT(Owid,Odig); RETURN(Nfound); END "FindPeaks"; COMMENT FindPartials - Find all significant spectral lines for current frame; PROCEDURE FindPartials(REFERENCE INTEGER Npartials; REAL ARRAY XmagDB,LinAmp,LinFrq; REAL Fs,MinSep,Thresh,Hyst; INTEGER Nfft,Frame,MinWid; REAL Fc1(RealNoSet),Fc2(RealNoSet)); COMMENT Find amplitudes (dB) and frequencies (Hz) of all partials in XmagDB. Npartials = the maximum number of partials to find. It is set to the actual number of partials found on return. XmagDB[1:Nfft/2+1] = DB magnitude spectrum LinAmp[1:Nharms,1:Nframes] = Partial amplitudes LinFrq[1:Nharms,1:Nframes] = Partial frequencies Fs = sampling rate (Hz) MinSep = minimum line spacing (Hz) Thresh,Hyst are args to FindPeaks Nfft = Size of FFT used in computing XmagDB Frame = current time frame number MinWid = minimum acceptable peak width in FFT bins. Fc1,Fc2 = lower and upper frequency bounds in Hz. No partials will be allowed outside this interval. ; BEGIN "FindPartials" REQUIRE "HACKS.REL[SUB,SYS]" LIBRARY; EXTERNAL SIMPLE PROCEDURE QISort(REAL ARRAY A,Data; INTEGER LB,UB); INTEGER i,j,Nspec,BinInt,BinReach,Owid,Odig,Partial,MinWid; REAL ARRAY Xsave[1:Nspec <- Nfft/2+1]; GETFORMAT(Owid,Odig); SETFORMAT(0,7); BinInt <- (MinSep/Fs)*Nfft+.5; # Blanking interval for spectral peak (in bins); IF Frame=1 THEN PRINT(CrLf," Blanking interval is ",BinInt," bins",CrLf); ARRTRAN(Xsave,XmagDB); # Save input array; Npartials <- FindPeaks(XmagDB,LinAmp,LinFrq,Thresh,Hyst, Npartials,MinWid,BinInt,((1+Nfft*Fc1/Fs) MAX 1),((1+Nfft*Fc2/Fs) MIN Nspec)); ARRTRAN(XmagDB,Xsave); # Sort for ascending frequency (for convenience only); IF Npartials>0 THEN QIsort(LinFrq,LinAmp,1,Npartials); IF Debug2 THEN PRINT(CrLf); FOR Partial <- 1 STEP 1 UNTIL Npartials DO BEGIN "fp" LinFrq[Partial] <- Fs*(LinFrq[Partial]-1)/Nfft; IF Debug2 GEQ Partial THEN PRINT("Frame=",Frame," Amp(dB)=",LinAmp[Partial], " Freq(Hz)=",LinFrq[Partial],"=",Partial,"*",MinSep,"*", LinFrq[Partial]/(Partial*MinSep),CrLf); END "fp"; IF ShowPeakFinder THEN BEGIN "ShowHarms" REAL ARRAY CFrqs[1:Nspec]; REAL Val; Val <- LinAmp[1]; j <- 1; FOR Partial <- 1 STEP 1 UNTIL Npartials DO BEGIN FOR i <- j STEP 1 UNTIL Nfft*(LinFrq[Partial]/Fs) DO CFrqs[i] <- Val; j <- i; Val <- LinAmp[Partial]; END; Dpy2(XmagDB,CFrqs,i, "Centers of pitch-wide search regions for frame "&CVS(Frame), "MAGNITUDE (DB)",0,Fs/2,-30,30); END "ShowHarms"; SETFORMAT(Owid,Odig); END "FindPartials"; COMMENT OutPartials - Write partial amplitudes and frequencies to disk; PROCEDURE OutPartials(INTEGER Npartials,Nframes; REAL ARRAY Amps,Frqs; REAL Fs,Thresh); BEGIN "OutPartials" INTEGER N,AmpP,FrqP; N <- Npartials*Nframes; Sando(AmpF(Channel),AmpP,N,Amps,AmpH,AmpP,TRUE,AmpF(Pack)); Sando(FrqF(Channel),FrqP,N,Frqs,FrqH,FrqP,TRUE,FrqF(Pack)); Sando(AmpF(Channel),0,0,Amps,AmpH,AmpP,TRUE,AmpF(Pack)); Sando(FrqF(Channel),0,0,Frqs,FrqH,FrqP,TRUE,FrqF(Pack)); END "OutPartials"; COMMENT Decimate - Downsample by integer factors; INTEGER PROCEDURE Decimate(REAL ARRAY A; INTEGER N,M,I(0)); COMMENT Downsample array A[1:N] by M. I nonzero means initialize; COMMENT Return the number of samples produced; BEGIN "Decimate" OWN INTEGER P; INTEGER j; IF M LEQ 1 THEN RETURN(N); IF I NEQ 0 OR P LEQ 0 THEN P <- 1; j <- 0; FOR i <- P STEP M UNTIL N DO A[j <- j+1] <- A[i]; P <- i-N; RETURN(j); END "Decimate"; COMMENT GetWin - Compute overlap-add analysis window; INTEGER WinType; DEFINE Rectangular="1", Triangular="2", Hamming="3", GenHamming="4", Hanning="5", Kaiser="6", Chebyshev = "7", Nwins = "7"; PRELOAD!WITH "Rectangular", "Triangular", "Hamming", "GenHamming", "Hanning", "Kaiser", "Chebyshev"; STRING ARRAY WinNames[1:Nwins]; DEFINE WinStr = "WinNames[WinType]"; PROCEDURE GetWin(REAL ARRAY W; INTEGER Wtype,Nw; REAL P3(-1.),P4(-1.)); COMMENT Generate analysis window taking special care to ensure that it will overlap-add to unity in the case of a Hamming window with hopsize = Nw/(2K) Wtype Window ----- ------ 1 Rectangular 2 Triangular 3 Hamming 4 Generalized Hamming 5 Hanning 6 Kaiser 7 Chebyshev ; BEGIN "GetWin" # REQUIRE "SIGLIB.REL[SUB,SYS]" LIBRARY; EXTERNAL PROCEDURE !WINFLT(REAL ARRAY H; REFERENCE INTEGER NH; REFERENCE INTEGER WINTYP; REFERENCE INTEGER BNDTYP; REAL ARRAY P); REAL ARRAY WinPars[1:4]; WinPars[1] <- 0; # Lower cutoff frequency; WinPars[2] <- 0; # Upper cutoff frequency; IF WinType=Kaiser AND P3<0 THEN DO BEGIN IF NOT AinReal(P3 <- 60,"Kaiser stop-band rejection in DB") THEN RETURN; IF P3 < 0 THEN PRINT("Stop-Band rejection must be POSITIVE dB...like 60",CRLF); END UNTIL P3 GEQ 0; IF WinType=GenHamming AND P3<0 THEN DO IF NOT AinReal(P3 <- 0.54,"Alpha (0:4)") THEN RETURN UNTIL (0 LEQ P3 LEQ 4); IF WinType=Chebyshev AND P4 LEQ 0 THEN DO BEGIN "GetCheb" IF NOT AinReal(P3, "Chebyshev stop-band rejection in DB (or minus transition width in Hz/Srate)") THEN RETURN; IF P3<0 THEN BEGIN P4 <- P3; P3 <- 0; END ELSE P4 <- 0; END "GetCheb" UNTIL (0 LEQ P4<1/2 AND 0 LEQ P3); WinPars[3] <- P3; WinPars[4] <- P4; IF (Nw MOD 2) = 1 THEN BEGIN REAL ARRAY TmpBuf[1:2*Nw+1]; !WinFlt(TmpBuf,Nw,Wtype,1,WinPars); # Analysis window = lowpass, Fc=0; ARRTRAN(W,TmpBuf); IF Wtype>Rectangular THEN W[Nw] <- 0; # For odd lengths, last sample zeroed; END ELSE BEGIN REAL ARRAY TmpBuf[1:2*Nw+1]; INTEGER i; !WinFlt(TmpBuf,i <- 2*Nw+1,Wtype,1,WinPars); FOR i <- 1 STEP 1 UNTIL Nw DO W[i] <- TmpBuf[2*i]; END; BEGIN "nrmlz" # REQUIRE "JOSLIB.REQ[LIB,JOS]" SOURCE!FILE; EXTERNAL REAL PROCEDURE MaxArr(INTEGER n; REAL ARRAY y); INTEGER i; REAL Wmax,Wscl; Wmax <- MaxArr(Nw,W); Wscl <- 1.0/Wmax; FOR i <- 1 STEP 1 UNTIL Nw DO W[i] <- W[i]*Wscl; END "nrmlz"; Tdpyed(W,Nw,"GETWIN: Window returned"); END "GetWin"; BOOLEAN PROCEDURE GetIfile(STRING IfileTry); BEGIN "GetIfile" DO BEGIN FNparse(IfileTry,Idev,Ifile); PRINT(Tab,"Reading file ",Idev,":",Ifile,Crlf); RELEASE(InF(Channel)); OPEN(InF(Channel) <- GETCHAN,Idev,'17,0,0,200,Brk,Eof); LOOKUP(InF(Channel),Ifile,FAIL); IF FAIL THEN BEGIN PRINT("File not found",CrLf); PRINT("Input sound file: "); IfileTry <- INCHWL; IF !SKIP!=ALT THEN RETURN(FALSE); END ELSE BEGIN ReadH(InFptr,Head,FAIL,Quiet); # Read and print header; InF(Name) <- Ifile; END END UNTIL NOT FAIL; RETURN(TRUE); END "GetIfile"; BOOLEAN PROCEDURE GetOfile(STRING OfileTry); BEGIN "GetOfile" DO BEGIN Ofile <- OfileTry; FNparse(OfileTry,Odev,Ofile); PRINT(Tab,"Writing file ",Odev,":",Ofile,Crlf); RELEASE(OutF(Channel)); OPEN(OutF(Channel) <- GETCHAN,Odev,'17,0,0,200,Brk,Eof); ENTER(OutF(Channel),Ofile,FAIL); IF FAIL THEN BEGIN PRINT("Can't ENTER output file",CrLf); PRINT("Output sound file: "); OfileTry <- INCHWL; IF !SKIP!=ALT THEN RETURN(FALSE); END ELSE HaveOfile <- TRUE; END UNTIL NOT FAIL; RETURN(TRUE); END "GetOfile"; COMMENT Get user input parameters; SUPCT; # Disable \alpha-IC,\alpha-T,\alpha-L; TTYUP(TRUE); # Convert to upper case on input; TrpIni('26); # all except integer overflow (1) and real uflow ('10); # The following is apparently too severe: TRPINI(-1); # Enable decent floating point exception code; Find!Type; # Get terminal characteristics; Quiet <- (IF dpytype=ddtype THEN FALSE ELSE TRUE); PRINT(CrLf,"PARSHL: ", COMPILER!BANNER[LENGTH(SCANC(COMPILER!BANNER,Tab,"","sinz"))+11 FOR 17],CrLf); InFptr <- NEW!RECORD (File); OutFptr <- NEW!RECORD (File); AmpFptr <- NEW!RECORD (File); FrqFptr <- NEW!RECORD (File); BEGIN "GetPars" REQUIRE "{}<>" DELIMITERS; DEFINE #="comment",thru={ step 1 until },crlf={('15)&('12)},ALT={'175&""},CR={('15)&""}; # REQUIRE "JOSLIB.REQ[LIB,JOS]" SOURCE!FILE; EXTERNAL PROCEDURE READ_COMMAND(STRING Prompt;REFERENCE STRING Cbits,Arg2,Arg1,Cmd); EXTERNAL PROCEDURE SUPCT; COMMENT Inhibit activation on \alpha-T,\alpha-L,\alpha-B - the SUPCT bit; EXTERNAL STRING PROCEDURE Cvbs(BOOLEAN B); EXTERNAL STRING PROCEDURE Cvfs(REAL r); # STRING Ffile,Ifile,Ofile; # INTEGER Nfft,Nspec,Nh,Nx,Nhop,Ndec; STRING Prompt; IF Nfft LEQ 0 THEN # the test is for saving defaults across <CALL> START; BEGIN "defaults" Nfft <- 1024; Nx <- 676; Nh <- 0; Nhop <- Nx/2; Ndec <- 1; WinType <- Hamming; Trace <- 0; DoFlt <- FALSE; DoSynth <- TRUE; SwapOut <- FALSE; InstantRise <- FALSE; END "defaults"; Idev <- Odev <- "UDP2"; Ifile <- "PC3R.SND[XF,JOS]"; Ffile <- "TEST.FLT"; Ofile <- DerivedName(Ifile,"S"); HaveIfile <- HaveFfile <- HaveOfile <- HavePack <- FALSE; WHILE TRUE DO BEGIN "OmniLoop" WHILE TRUE DO BEGIN "OuterParameters" STRING Bucky,Arg2, Arg1,Cmd; INTEGER Boolhak,Brk; COMMENT Enforce parameter consistency constraints; WinType <- (1 MAX WinType MIN Nwins); Nx <- Nx MAX 1; IF Nfft < (i <- 2**(1+(i <- LOG(Nfft)/LOG(2)))) THEN PRINT(" Transform size increased to next power of 2 = ",Nfft <- i,CrLf); IF Nh > Nfft THEN PRINT(" Transform size increased to ", Nfft <- 2**(1+(i <- LOG(Nh+Nx)/LOG(2))),CrLf); IF SwapOut THEN DoFlt <- DoSynth <- FALSE; IF SwapOut THEN BEGIN OutF(Pack) <- 3; HavePack <- TRUE END; IF DoFlt AND (Nh+Nx GEQ Nfft) THEN PRINT("*** FILTER WILL TIME ALIAS ***",CrLf, " (FFT size should be at least frame size plus filter length - 1.)",CrLf, " To avoid time aliasing, set FFT size to at least ",Nh+Nx-1,CrLf, " or reduce data frame size and/or filter length",CrLf); DoOut <- (DoFlt OR DoSynth OR SwapOut); IF NOT DoOut THEN PRINT(" Warning: No output signal will be generated",CrLf, "Only amplitude and frequency envelopes will be computed.",CrLf); IF NOT (DoFlt OR DoSynth) THEN Ndec <- 1; Prompt <- Crlf& "Window("&CVS(WinType)&"="&WinStr& ") TransformSize("&CVS(Nfft)& ") DataLength("&CVS(Nx)& ") HopSize("&CVS(Nhop)& ")"&CRLF&" "& (IF DoFlt THEN "LengthFilter("&CVS(Nh)&") " ELSE NULL)& "Input("&Idev&":"&Ifile& ") Output("&Odev&":"&Ofile& ") PackingOut("&CVS(OutF(Pack))& ") "&(IF DoFlt THEN "Filter("&Ffile&")" ELSE NULL)&CrLf&" "& (IF (DoFlt OR DoSynth) THEN "Compression("&CVS(Ndec)&") " ELSE NULL)& "AdditiveSynthesis("&CVBS(DoSynth)& ") UseFilter("&CVBS(DoFlt)& ") XchangeOut("&CVBS(SwapOut)& ") or ?:"; COMMENT General command is "arg2,arg1,cmd" or "arg2,arg1 \alpha-cmd" or "arg2,arg1 \beta-cmd" or "arg2,arg1 \alpha-\beta-cmd"; READ_COMMAND (Prompt,Bucky,Arg2,Arg1,Cmd); COMMENT Allow boolean B to be set TRUE with \alpha-B or B<return> or FALSE with \beta-B or <any arg>,B<return>; Boolhak <- INTSCAN(Bucky,0); IF Boolhak \leq 1 THEN Boolhak <- 0; CASE Cmd OF BEGIN "SetParameters" ["?"] HELP(IF Arg1 THEN Arg1[1 FOR 1] ELSE "TopLevel"); ["W"] WinType <- INTSCAN(Arg1,Brk); ["T"] Nfft <- INTSCAN(Arg1,Brk); ["D"] Nx <- INTSCAN(Arg1,Brk); ["H"] Nhop <- INTSCAN(Arg1,Brk); ["L"] Nh <- INTSCAN(Arg1,Brk); ["C"] Ndec <- INTSCAN(Arg1,Brk); ["A"] DoSynth <- NOT (Arg1 + Boolhak); ["U"] DoFlt <- NOT (Arg1 + Boolhak); ["X"] SwapOut <- NOT (Arg1 + Boolhak); ["Z"] Trace <- INTSCAN(Arg1,Brk); ["I"] IF (HaveIfile <- GetIfile(IF Arg1 NEQ NULL THEN Arg1 ELSE Ifile)) THEN BEGIN IF NOT HaveOfile THEN Ofile <- DerivedName(Ifile,"S"); IF NOT HavePack THEN BEGIN OutF(Pack) <- (IF SwapOut THEN 3 ELSE InF(Pack)); HavePack <- TRUE; END; END; ["O"] HaveOfile <- GetOfile(IF Arg1 NEQ NULL THEN Arg1 ELSE Ofile); ["P"] IF Arg1 NEQ NULL THEN OutF(Pack) <- INTSCAN(Arg1,0) ELSE BEGIN PRINT(CrLf,"Packing codes are (0=12b, 1=18b, 3=FP, 4=16b SAM, 5=20b SAM)",CrLf, "Output packing code:"); Arg1 <- INCHWL; IF !SKIP!=ALT THEN CONTINUE "OmniLoop"; OutF(Pack) <- (IF Arg1=NULL THEN Sam16 ELSE INTSCAN(Arg1,0)); PRINT("Packing code set to ",OutF(Pack),CrLf); HavePack <- TRUE; END; ["F"] BEGIN Ffile <- Arg1&".FLT"; IF NOT (HaveFfile <- GetFlt(Ni,No,Ic,Oc,Ffile,FALSE,FALSE)) THEN CONTINUE "OmniLoop"; IF No>1 THEN PRINT(" Recursive part of filter ignored.",CRLF); Nh <- Ni; END; [";"] ; COMMENT For comments in command lines; [ALT] ["E"] CALL(0,"EXIT"); [CR] DONE "OuterParameters"; ELSE PRINT(" what?",Crlf) END "SetParameters"; END "OuterParameters"; IF NOT (HaveIfile OR (HaveIfile <- GetIfile(Ifile))) THEN CALL(0,"EXIT"); IF NOT HaveOfile THEN Ofile <- DerivedName(Ifile,"S"); IF NOT HavePack THEN BEGIN OutF(Pack) <- (IF SwapOut THEN 3 ELSE InF(Pack)); HavePack <- TRUE; END; IF InF(#chans)>1 THEN BEGIN PRINT(Crlf,"Sorry, can only do 1-channel files."); HaveIfile <- FALSE; CONTINUE "OmniLoop"; END; IF NOT (HaveOfile OR NOT DoOut OR (HaveOfile <- GetOfile(Ofile))) THEN CALL(0,"EXIT"); IF DoFlt AND NOT HaveFfile THEN DO BEGIN IF NOT (HaveFfile <- GetFlt(Ni,No,Ic,Oc,Ffile,FALSE,FALSE)) THEN CALL(0,"EXIT"); IF No>1 THEN PRINT(" Recursive part of filter ignored.",CRLF); Nh <- Ni; END UNTIL HaveFfile; Nframes <- (InF(#samps)-Nx)/Nhop + 1; # Number of frames to process; TmpStr <- "PARSHL: Input file was "&Idev&":"&Ifile&CrLf&Tab& (IF DoFlt THEN "Filter="&Ffile&CrLf&Tab ELSE NULL)&" "& (IF DoSynth THEN " AS" ELSE NULL)& " Nframes="&CVS(Nframes)& " Nfft="&CVS(Nfft)& " Window="&WinStr& " Nframe="&CVS(Nx)& " Nhop="&CVS(Nhop)& " Compression="&CVS(Ndec)&CrLf&Tab& "NOT FINISHED"&CrLf& "(+)"&Tab&InF(Text)&Crlf; IF DoOut THEN BEGIN "SUOutSound" OutH[0] <- 0; OutF(Dev ) <- ODev; OutF(Name ) <- Ofile; OutF(Clock ) <- InF(Clock)/Ndec; OutF(#chans) <- InF(#chans); OutF(#ticks) <- InF(#ticks)*Ndec; OutF(#samps) <- 0; OutF(Text ) <- TmpStr; WriteH(Head,OutF(Clock),OutF(Pack),OutF(#chans),OutF(#ticks),0,OutF(Text)); USETO(OutF(Channel),1); ARRYOUT(OutF(Channel),Head[1],128); END "SUOutSound"; BEGIN "SUAmps" DO BEGIN "GAF" Tstr <- DerivedName((IF HaveOfile THEN Odev ELSE Idev)&":"& (IF HaveOfile THEN Ofile ELSE Ifile),"A"); FNparse(Tstr,AmpDev <- NULL,AmpFile); PRINT(CrLf,"Output amplitude envelopes file (=",AmpDev,":",AmpFile,"): "); Tstr <- INCHWL; IF !SKIP!=ALT THEN CALL(0,"EXIT"); FNparse(Tstr,AmpDev,AmpFile); PRINT(Tab,"Writing file ",AmpDev,":",AmpFile,Crlf); OPEN(AmpF(Channel) <- GETCHAN,AmpDev,'17,0,0,200,Brk,Eof); ENTER(AmpF(Channel),AmpFile,FAIL); IF FAIL THEN PRINT("Can't ENTER amplitudes output file",CrLf); END "GAF" UNTIL NOT FAIL; AmpH[0] <- 0; AmpF(Dev ) <- AmpDev; AmpF(Name ) <- AmpFile; AmpF(Clock ) <- InF(Clock)/Ndec; AmpF(#chans) <- InF(#chans); AmpF(#ticks) <- InF(#ticks)*Ndec; AmpF(#samps) <- 0; AmpF(Pack ) <- 3; AmpF(Text ) <- TmpStr; WriteH(Head,AmpF(Clock),AmpF(Pack),AmpF(#chans),AmpF(#ticks),0,AmpF(Text)); # Block size parameter (not yet supported by WriteH); Head[8] <- MEMORY[LOCATION(Nframes),REAL]; USETO(AmpF(Channel),1); ARRYOUT(AmpF(Channel),Head[1],128); END "SUAmps"; BEGIN "SUFrqs" DO BEGIN "GFF" EXTERNAL STRING PROCEDURE NAME(STRING Name); COMMENT Return NAME part of filename; EXTERNAL STRING PROCEDURE EXT(STRING Name); COMMENT Return extension part of filename; EXTERNAL STRING PROCEDURE PPN(STRING Name); COMMENT Return PPN part of filename; Tstr <- Name(AmpFile); Tstr <- Tstr[1 TO LENGTH(Tstr)-1]; # Strip off trailing "A"; Tstr <- Tstr&Ext(AmpFile)&Ppn(AmpFile); Tstr <- DerivedName(AmpDev&":"&Tstr,"F"); FNparse(Tstr,FrqDev <- NULL,FrqFile); PRINT(CrLf,"Output frequency envelopes file (=",FrqDev,":",FrqFile,"): "); Tstr <- INCHWL; IF !SKIP!=ALT THEN CALL(0,"EXIT"); FNparse(Tstr,FrqDev,FrqFile); PRINT(Tab,"Writing file ",FrqDev,":",FrqFile,Crlf); OPEN(FrqF(Channel) <- GETCHAN,FrqDev,'17,0,0,200,Brk,Eof); ENTER(FrqF(Channel),FrqFile,FAIL); IF FAIL THEN PRINT("Can't ENTER Frq output file",CrLf) END "GFF" UNTIL NOT FAIL; FrqH[0] <- 0; FrqF(Dev ) <- FrqDev; FrqF(Name ) <- FrqFile; FrqF(Clock ) <- InF(Clock)/Ndec; FrqF(#chans) <- InF(#chans); FrqF(#ticks) <- InF(#ticks)*Ndec; FrqF(Pack ) <- 3; FrqF(#samps) <- 0; FrqF(Text ) <- TmpStr; WriteH(Head,FrqF(Clock),FrqF(Pack),FrqF(#chans),FrqF(#ticks),0,FrqF(Text)); # Block size parameter (not yet supported by WriteH); Head[8] <- MEMORY[LOCATION(Nframes),REAL]; USETO(FrqF(Channel),1); ARRYOUT(FrqF(Channel),Head[1],128); END "SUFrqs"; CALL (InF(Channel), "SHOWIT"); # Simulate an <esc>2x for the input channel; SETFORMAT(0,5); # For buffer begin-time print-out and squelched frqs; Nspec <- (Nfft DIV 2) + 1; Maxamp <- 0; Nhops <- 0; # Counts up to Nframes unless aborted with ESC-I; # changed from 2 to 4, XJS 3.12.87; SigScl <- 4/Nx; # Guess for Hamming window, 50% overlap; REQUIRE CrLf&" IS SIGSCL CORRECT?? (Synth scaling)"&CrLf MESSAGE; COMMENT Additional input parameters for PARSHL analysis; IF MaxLins LEQ 0 THEN BEGIN "Idefaults" MinSep <- InF(Clock)/(Nx/4); # This is right when there are 4 period per frame; # Each valid peak should be (Nfft/Nx)*4 bins wide (4 for Hamming); MinWid <- 2.0*Nfft/Nx+1; # Half of expected width plus 1 (side-lobes rejected); MaxLins <- 60 MIN 0.5*InF(Clock)/MinSep; MaxOscs <- 40 MIN 0.5*InF(Clock)/MinSep; Fc1 <- InF(Clock)/1000; Fc2 <- InF(Clock)/2; Thresh <- -30; Hyst <- 0.1; # DAmax <- 10; # Disabled; DFmax1 <- MinSep/2; DFmax2 <- DFmax1; END "Idefaults"; WHILE TRUE DO BEGIN "InnerParameters" STRING Bucky,Arg2, Arg1,Cmd; INTEGER Boolhak,Brk; MinSep <- (0 MAX MinSep MIN InF(Clock)/4); MinWid <- (0 MAX MinWid MIN Nspec); MaxLins <- (1 MAX MaxLins MIN Nspec); MaxOscs <- (1 MAX MaxOscs MIN MaxLins); Thresh <- (-760 MAX Thresh MIN 760); Hyst <- (Hyst MAX 0); DFmax1 <- (0 MAX DFmax1 MIN InF(Clock)/2); DFmax2 <- (0 MAX DFmax2 MIN InF(Clock)/2); Prompt <- Crlf& "MinSpacing("&Cvfs(MinSep)&"Hz"& ") BinMin("&CVS(MinWid)& ") Nlines("&CVS(MaxLins)& ") Oscs("&CVS(MaxOscs)& ") InstantRise("&Cvbs(InstantRise)&")"&CRLF& " FirstFrq("&Cvfs(Fc1)& ") LastFrq("&Cvfs(Fc2)& ") Thresh("&Cvfs(Thresh)&"dB"&")"&CRLF& " Hyst("&Cvfs(Hyst)&"dB"& ") DFmax("&Cvfs(DFmax1)&"Hz"& ") UltDFmax("&Cvfs(DFmax2)&"Hz"& ") or ?:"; READ_COMMAND (Prompt,Bucky,Arg2,Arg1,Cmd); Boolhak <- INTSCAN(Bucky,0); IF Boolhak \leq 1 THEN Boolhak <- 0; CASE Cmd OF BEGIN "SetParameters" ["?"] HELP(IF Arg1 THEN Arg1[1 FOR 1] ELSE "SubLevel"); ["M"] MinSep <- REALSCAN(Arg1,Brk) MAX 0.0; ["N"] MaxLins <- INTSCAN(Arg1,Brk) MAX 1; ["O"] MaxOscs <- INTSCAN(Arg1,Brk) MAX 1; ["F"] Fc1 <- REALSCAN(Arg1,Brk) MAX 0.0; ["L"] Fc2 <- REALSCAN(Arg1,Brk) MIN InF(Clock)/2; ["T"] Thresh <- REALSCAN(Arg1,Brk); ["I"] InstantRise <- NOT (Arg1 + Boolhak); ["H"] Hyst <- REALSCAN(Arg1,Brk) MAX 0.0; ["D"] DFmax1 <- DFmax2 <- (REALSCAN(Arg1,Brk) MAX 0.0); ["U"] DFmax2 <- REALSCAN(Arg1,Brk) MAX 0.0; ["B"] MinWid <- INTSCAN(Arg1,Brk) MAX 0; ["Z"] Trace <- INTSCAN(Arg1,Brk); [";"] ; COMMENT For comments in command lines; [ALT] ["E"] CONTINUE "OmniLoop"; [CR] DONE "InnerParameters"; ELSE PRINT(" hmmm?",Crlf) END "SetParameters"; END "InnerParameters"; DONE "OmniLoop"; END "OmniLoop"; END "GetPars"; COMMENT Allocation of analysis data structures; BEGIN "ALAR" # Allocate arrays; # REQUIRE "SIGLIB.REL[SUB,SYS]" LIBRARY; EXTERNAL PROCEDURE !FFA(REAL ARRAY B; REFERENCE INTEGER NFFT); EXTERNAL PROCEDURE !FFS(REAL ARRAY B; REFERENCE INTEGER NFFT); INTEGER Yp,Xp,Bp,Nout,i; REAL ARRAY X[1:Nfft+2],H[1:Nfft+2],XmagDB[1:Nspec], OutBuf[1:2*Nfft],WinBuf[1:Nx]; REAL ARRAY Amps,Frqs[1:MaxLins,1:Nframes]; # Output database; INTEGER ARRAY Nactive[1:Nframes]; # No. active lines each frame; # Data structures for additive synthesis oscillators. (Use explained next page); INTEGER CurOsc,CurLin,Nlins,Noscs,PrvNoscs; INTEGER ARRAY OscOfLin[1:MaxLins]; # Osc no. assigned to each Lin; INTEGER ARRAY PrvLinOfOsc[1:MaxOscs]; # Lin no. assigned to each osc; INTEGER ARRAY LinOfOsc[1:MaxOscs]; # PrvLinOfOsc for next frame; INTEGER ARRAY OscPhs[1:MaxOscs]; # Current phases (sum of frequency); REAL ARRAY PrvOscAmp[1:MaxOscs]; # Current amplitudes (Linar ramps to targets); REAL ARRAY PrvOscFrq[1:MaxOscs]; # Current frequency (Linar ramp to that of Lin); REAL ARRAY OscAmp[1:MaxOscs]; # Target amplitude; REAL ARRAY OscFrq[1:MaxOscs]; # Target frequency; REAL ARRAY LinAmpDB[1:MaxLins]; # Target amplitudes of some osc in dB; REAL ARRAY LinAmp[1:MaxLins]; # Target amplitudes of some osc; REAL ARRAY LinFrq[1:MaxLins]; # Current instantaneous frequency; # REAL ARRAY LinPhs[1:MaxLins]; # Phase is the first thing thrown away; COMMENT Oscillator allocation utilities; DEFINE UDTrace = " (ABS(Trace)>2) "; DEFINE PrvOscOn(x) = { (PrvLinOfOsc[x] > 0) }, PrvOscFree(x) = { (PrvLinOfOsc[x] = 0) }, PrvOscSquelched(x) = { (PrvLinOfOsc[x] < 0) }, OscOn(x) = { (LinOfOsc[x] > 0) }, OscFree(x) = { (LinOfOsc[x] = 0) }, OscSquelched(x) = { (LinOfOsc[x] < 0) }, StopOsc(x) = { BEGIN LinOfOsc[x] <- -ABS(PrvLinOfOsc[x]); IF UDtrace THEN PRINT (" Osc ",x," turned off.",CrLf) END }, LinFree(x) = { (OscOfLin[x] = 0) }, LinClaimed(x) = { (OscOfLin[x] > 0) }; INTEGER PROCEDURE NxtPrvOscOn(INTEGER CurOsc); COMMENT Return next active oscillator after CurOsc, with wrap-around. Return 0 if no active oscillators; BEGIN "NxtPrvOscOn" INTEGER cnt,Found; Found <- 0; FOR cnt <- 1 Thru MaxOscs DO BEGIN CurOsc <- CurOsc+1; IF CurOsc>MaxOscs THEN CurOsc <- CurOsc-MaxOscs; IF PrvOscOn(CurOsc) THEN BEGIN Found <- CurOsc; DONE END; IF UDtrace THEN PRINT (" Skipping inactive Prv Osc number ",CurOsc,CrLf); END; IF UDtrace THEN PRINT (" Next active Prv Osc is number ",CurOsc,CrLf); IF Found=0 THEN PRINT(CrLf,"*!*!* Next active Prv Osc not found!",CrLf); RETURN(Found) END "NxtPrvOscOn"; INTEGER PROCEDURE NxtOscOn(INTEGER CurOsc); COMMENT Return next active oscillator after CurOsc, with wrap-around. Return 0 if no active oscillators; BEGIN "NxtOscOn" INTEGER cnt,Found; Found <- 0; FOR cnt <- 1 Thru MaxOscs DO BEGIN CurOsc <- CurOsc+1; IF CurOsc>MaxOscs THEN CurOsc <- CurOsc-MaxOscs; IF OscOn(CurOsc) THEN BEGIN Found <- CurOsc; DONE END; IF UDtrace THEN PRINT (" Skipping inactive Osc number ",CurOsc,CrLf); END; IF UDtrace THEN PRINT (" Next active Osc is number ",CurOsc,CrLf); IF Found=0 THEN PRINT(CrLf,"*!*!* Next active osc not found!",CrLf); RETURN(Found) END "NxtOscOn"; INTEGER PROCEDURE NxtOscFree(INTEGER CurOsc); COMMENT Return next free oscillator after CurOsc, with wrap-around. Return 0 if no free oscillators; BEGIN "NxtOscFree" INTEGER cnt,Found; Found <- 0; FOR cnt <- 1 Thru MaxOscs DO BEGIN CurOsc <- CurOsc+1; IF CurOsc>MaxOscs THEN CurOsc <- CurOsc-MaxOscs; IF OscFree(CurOsc) THEN BEGIN Found <- CurOsc; DONE END; IF UDtrace THEN PRINT (" Skipping non-free Osc number ",CurOsc,CrLf); END; IF UDtrace THEN PRINT (" Next free Osc is number ",CurOsc,CrLf); IF Found=0 THEN PRINT(CrLf,"*!*!* Next free osc not found!",CrLf); RETURN(Found) END "NxtOscFree"; INTEGER PROCEDURE NxtOscEnding(INTEGER CurOsc); COMMENT Return next Ending oscillator after CurOsc, with wrap-around. Return 0 if no Ending oscillators; BEGIN "NxtOscEnding" INTEGER cnt,Found; Found <- 0; FOR cnt <- 1 Thru MaxOscs DO BEGIN CurOsc <- CurOsc+1; IF CurOsc>MaxOscs THEN CurOsc <- CurOsc-MaxOscs; IF OscSquelched(CurOsc) THEN BEGIN Found <- CurOsc; DONE END; IF UDtrace THEN PRINT (" Skipping non-Ending Osc number ",CurOsc,CrLf); END; IF UDtrace THEN PRINT (" Next Ending Osc is number ",CurOsc,CrLf); IF Found=0 THEN PRINT(CrLf,"*!*!* Next Ending osc not found!",CrLf); RETURN(Found) END "NxtOscEnding"; INTEGER PROCEDURE ClosestOscFree(REAL NewFrq); COMMENT Return free oscillator with frequency closest to NewFrq. Return 0 if no free oscillators; BEGIN "ClosestOscFree" INTEGER Cnt,Found; REAL Dis,BestDis; Found <- NxtOscFree(0); IF Found=0 THEN BEGIN PRINT(CrLf,"*!*!* Next free osc not found!",CrLf); RETURN(Found) END; BestDis <- 1.0@38; Found <- 0; FOR cnt <- 1 Thru MaxOscs DO BEGIN IF OscFree(Cnt) THEN IF LinOfOsc[Cnt] NEQ 0 THEN IF (Dis <- ABS(NewFrq-LinFrq[ABS(LinOfOsc[Cnt])]))<BestDis THEN BEGIN Found <- Cnt; BestDis <- Dis; END; END; IF Found=0 THEN Found <- NxtOscFree(0) # None were ever used before; ELSE IF UDtrace THEN BEGIN PRINT (" Desired frq = ",NewFrq,CrLf); PRINT (" Closest frq = ",LinFrq[ABS(LinOfOsc[Found])],CrLf); PRINT (" Closest free osc is number ",Found,CrLf); END; RETURN(Found) END "ClosestOscFree"; COMMENT CkFrqs is OBSOLETE (replaced by GetClosestFrq); INTEGER PROCEDURE CkFrqs(INTEGER CurOsc; REAL DFlim); # Search for a line within specs, starting from where last one was. We assume partials are always quick-sorted by frequency so that usually the answer is a next-door neighbor ; BEGIN "CkFrqs" INTEGER LstLin; REAL TF,PF,DF,t; CurOsc <- NxtPrvOscOn(CurOsc); IF CurOsc LEQ 0 THEN RETURN(CurOsc); PF <- OscFrq[CurOsc]; # Former target frequency of oscillator CurOsc; LstLin <- PrvLinOfOsc[CurOsc] MIN Nlins; # Line this osc had last frame; IF LstLin<1 THEN BEGIN PRINT(" *** Nlins=0? ***"); RETURN(CurOsc) END; TF <- LinFrq[LstLin]; # Target frequency using last frame assignment; DF <- ABS(TF-PF); # Absolute change in frequency; IF UDtrace THEN PRINT( " Prv Frq for Osc ",CurOsc," was Line ",LstLin,"=",PF,CrLf, " Trg Frq for Osc ",CurOsc," on same Line ",LstLin,"=",TF,CrLf, " Del Frq for Osc ",CurOsc," would then be ",100*DF,"%",CrLf); IF DF > DFlim THEN BEGIN "DFerror" INTEGER Found; IF UDtrace THEN PRINT( "$$$ Frequency tolerance exceeded for Osc ",CurOsc," Frame ",Frame,CrLf); Found <- 0; DEFINE FrqInRange = { ABS(LinFrq[CurLin]-PF) LEQ DFlim }; DEFINE GotIt = { BEGIN Found <- CurLin; IF Trace THEN PRINT(" Found = ",Found,CrLf); DONE END }; FOR CurLin <- LstLin-1 STEP -1 UNTIL 1 DO IF FrqInRange THEN Gotit; IF Found=0 THEN FOR CurLin <- LstLin+1 STEP 1 UNTIL Nlins DO IF FrqInRange THEN Gotit; IF Found=0 THEN BEGIN "LineLost" PRINT(" LINE AT ",PF,"Hz LOST!",CrLf); StopOsc(CurOsc); END "LineLost" ELSE IF OscOfLin[Found] NEQ 0 THEN BEGIN "ohno" # Serious problem. The fix is to go find the other nearby line; PRINT("*!*!* LINE ",Found, " HAS CAPTURED OSCILLATORS ",OscOfLin[Found]," AND ",CurOsc,CrLf, " I HAVE TO DELETE ONE OSCILLATOR AND PROBABLY LOSE A NEARBY LINE",CrLf); StopOsc(CurOsc); END "ohno" ELSE BEGIN "LineFound" LinOfOsc[CurOsc] <- Found; OscOfLin[Found] <- CurOsc; Noscs <- Noscs+1; IF UDtrace THEN PRINT( " *New* line number is ",LinOfOsc[CurOsc], "=",LinFrq[LinOfOsc[CurOsc]], " hopefully closer to PF=",PF,CrLf) END "LineFound" END "DFerror" ELSE BEGIN "DFok" LinOfOsc[CurOsc] <- LstLin; OscOfLin[LstLin] <- CurOsc; Noscs <- Noscs+1; IF UDtrace THEN PRINT(" # SAME # line number used for osc ",CurOsc,CrLf) END "DFok"; RETURN(CurOsc); END "CkFrqs"; RECURSIVE PROCEDURE GetClosestFrq(INTEGER CurOsc; REAL DFmax); # Go through all active lines and find the one closest (BestFrq) to the previous frequency (PF) of oscillator CurOsc. If the difference between BestFrq and PF exceeds the frequency change limit (DFmax), the oscillator is turned off. (If it is already off it stays off.) On the other hand, if ABS(BestFrq-PF)<DFmax, a check is made to see if any other running osc has claimed this line already. If so, it is determined who is closer, and the loser is turned off. If the DFmax test passes and there are no collisions, the line is claimed by placing its number in LinOfOsc[CurOsc]. The osc is then running because LinOfOsc[CurOsc] will then be positive. ; BEGIN "GetClosestFrq" INTEGER j,BestLin,CurLin; REAL PF,CurDist,MinDist,BestFrq; # IF NOT PrvOscOn(CurOsc) THEN PRINT(" *** GetClosestFrq: Called on squelched osc ",CurOsc," ***",CrLf); PF <- OscFrq[CurOsc]; # Former target frequency of oscillator CurOsc; IF UDtrace THEN PRINT(" Prv frq for osc ",CurOsc," = ",PF,CrLf); BestLin <- 0; MinDist <- 1.0@38; DEFINE Dist(x,y) = { ABS((x)-(y)) }; FOR CurLin <- 1 STEP 1 UNTIL Nlins DO IF MinDist>(CurDist <- Dist(PF,LinFrq[CurLin])) THEN IF LinFree(CurLin) THEN BEGIN "PotentiallyMine" MinDist <- CurDist; BestLin <- CurLin END "PotentiallyMine" ELSE BEGIN "PotentiallyHis" INTEGER OtherOsc; REAL CurFrq; CurFrq <- LinFrq[CurLin]; OtherOsc <- OscOfLin[CurLin]; # Osc who has already claimed this line; IF CurDist<Dist(CurFrq,OscFrq[OtherOsc]) THEN BEGIN "HeCantHaveIt" MinDist <- CurDist; BestLin <- CurLin; IF UDtrace THEN PRINT(" GetClosestFrq Recursing on bumped osc ",OtherOsc,CrLf); OscOfLin[CurLin] <- CurOsc; # Tentatively claim this while other guy fishes; GetClosestFrq(OtherOsc,DFmax); # Go fish, OtherOsc; OscOfLin[CurLin] <- 0; # Unclaim; END "HeCantHaveIt" END "PotentiallyHis"; IF BestLin LEQ 0 THEN BEGIN "SureIsCrowded" PRINT(" *!*!* Osc ",CurOsc," could find no best line!!",CrLf); StopOsc(CurOsc); RETURN END "SureIsCrowded"; # Now BestLin points to the best line for osc CurOsc with all contention resolved; BestFrq <- LinFrq[BestLin]; IF UDtrace THEN PRINT(" Bst frq for osc ",CurOsc," = ",BestFrq,CrLf); IF MinDist > DFmax THEN BEGIN "DFerror" # And after all that!; IF UDtrace THEN PRINT("$$$ Frequency tolerance exceeded for Osc ",CurOsc,CrLf); StopOsc(CurOsc); # If already stopped, StopOsc is a noop; END "DFerror" ELSE BEGIN "TakeIt" LinOfOsc[CurOsc] <- BestLin; OscOfLin[BestLin] <- CurOsc; IF UDtrace THEN PRINT(" Osc ",CurOsc," grabs line ",BestLin,CrLf); END "TakeIt"; RETURN; END "GetClosestFrq"; COMMENT UpdateMap - Assign oscillators target values to partials. UpdateMap figures out which spectral lines go with which running oscillators. It makes extensive reference to the data structure of the previous page. (The routine is declared here to avoid a long argument list.) FindPartials prepares a list of partial amplitudes and frequencies active in the current frame. Since the partials (or lines) are found in order of decreasing amplitude, they can get arbitrarily reordered from frame to frame. It helps to sort by frequency the partials. However, there is still the problem of partials appearing and disappearing. This version of UpdateMap assumes lines to be sorted by ascending frequency. Here is a strategy: Initialization is trivial, so assume we are into the analysis at time Frame. Suppose there are Noscs active oscillators. Then (1) Each active oscillator looks for its line by finding the one which is closest in frequency. If it cannot find its line then it assumes the line disappeared and puts itself into the ramp-off state. (The previous amplitude tells whether this is a reasonable assumption, but the information is not being put to use here.) When an oscillator finds its line, it marks the line as taken. This prevents crossing partials from putting two oscillators on the same line and losing the other one. Thus, an oscillator must find its line among those not already taken. A count is made of the total number of line-to-osc assignments. There cannot be more successfully assigned oscs than than lines because of the "taken" mark interlock. However, there can be lines left over after all bands have been searched. (2) If there are one or more new lines, a pass is made through the Lines-in-Use array, and an oscillator is dispatched onto each line found not to be taken. An oscillator marks a line as taken by storing its id in an array. Therefore, if a collision occurs, the oscillator on the closest track can be given the line. In the current implementation, only a warning is issued. Each oscillator has 3 states: On, off, stopping. On means that the oscillator is to ramp from its previous amplitude to the current amplitude over Nhop samples. Off means no output. Stopping means the oscillator is ramping to zero. If we can ramp to zero from any amplitude in Nhop samples, we only need On and Off states (Stopping is then equivalent to Off with a previous amplitude > 0). However, it is also possible that ramps should comprise several hops when going from an arbitrary amplitude to 0. A hop is typically on the order of two periods of sound. JOS 14-JUN-85 : After getting the above working, I think the osc-to-line assignment strategy should be changed so that an oscillator is assigned a fixed frequency band. That way, an oscillator will not jump from line to line as we see now. This wastes oscillators (because bands with no lines are generating zeros) but consolidation can take place later. For now, I think it is best to have convenient AMPS and FRQS functions which are easy to look at. When an oscillator loses its line and gets dispatched on another line far away in frequency, it makes the AMPS and FRQS arrays unintelligible. Thus, UpDateMap is based on these changes. DAJ mentions what if a line is crossing back and forth between two bands? Instead, how about doing the previous version, but when assigning free oscs to new lines, find the osc with closest previous frequency. A losing case here is when a new line appears close to (but below) another line and captures the other lines osc. JOS 15-JUN-85 : After talking with Gerold, it was decided that we will do the following allocation strategy: * The input sound is reversed in time to allow postponement of the attack analysis until the end. * Working backwards through the sound, we dispatch a new oscillator on each new sinusoidal line which appears. Once committed to a frequency, an oscillator cannot be reassigned to a new frequency except by tracking valid glissandos. * For each frame, each oscillator finds the line which is closest to its current frequency. The current frequency is normally the frequency of the line in the previous frame used by this oscillator, except when the oscillator is off. A stopped osc remembers its last valid frequency as its current frequency. The difference between the closest line frequency and the current frequency are compared to a maximum-change limit. If this limit is exceeded the oscillator is turned off (or left off). Otherwise, a check is made to see if the best line has been claimed by another oscillator already in the current frame. If not, the osc claims the line and we go process the next osc. If there is a collision, we give the line to the osc which has to move the shortest distance in frequency to reach the new line. The osc which does not get the new line (in a collision) is turned off and its "current frequency" is rolled back to what it was before it claimed the new line. *** NOTE *** This is not optimal. It may be that OSC 1 claims LINE K+1 instead of LINE K which it normally claims. OSC 2 comes along and takes back LINE K. OSC 1 is turned off, and nobody can claim LINE K which is still there. Ideally, if an OSC is bumped from a line it claimed because another OSC came along later with a better fit to it, the bumped osc should get another pass to look for a "consolation line". [JOS: Implemented 15-JUN-85 9:21] REMARK: When the frequency deviation exceeds the allowed level, it is normally because the line for that oscillator dropped out. However, we can also get shaken loose by excessive frequency vibrato. In this case one oscillator turns off and another will hopefully start up elsewhere. 8-JUL-85 Consider allowing Oscs to find their peaks in DBspec instead of LinFrqs. Operation would be exactly as now, except the raw DB mag spectrum is searched for a local max nearest the current osc frq. Marking of a peak as "taken" is as before but now using the FindPeak method of peak removal. Every osc must be allowed to search the raw initial spectrum. GRS has suggested doing groups reduction on the mag^2 spectrum (summing power within a group and letting center frq of the group stand for line frq for the group). This way, PARSHL runs as if there were only 16 or so lines in the spectrum. Each group becomes a "virtual line" with its own AMP and FRQ. ; COMMENT (UpdateMap) - Initialization; PROCEDURE UpdateMap; COMMENT Arguments (See "Allocation of analysis data structures" above); BEGIN "UpdateMap" INTEGER i,Nadd,Owid,Odig; STRING CrLfd; REAL PROCEDURE DFmax(INTEGER CurOsc); # Return maximum allowed glissando for oscillator CurOsc; BEGIN "DFmax" OWN BOOLEAN ARRAY Inited[1:1]; # Array so <CALL>START<CR> will clear it; OWN REAL Alpha, Beta, DFmaxr; IF NOT Inited[1] THEN BEGIN "DFinit" REAL f1,f2; INTEGER kase; Inited[1] <- TRUE; f1 <- Fc1 MAX 0; f2 <- Fc2 MIN Fs/2; IF DFmax1=DFmax2 THEN kase <- 3 ELSE IF f1=0 THEN kase <- 2 ELSE kase <- 1; Alpha <- (CASE kase OF (0,(DFmax2/f2-DFmax1/f1)/(f2-f1),(DFmax2-DFmax1)/(f2-f1),0)); Beta <- (CASE kase OF (0,(DFmax1*f2/f1-DFmax2*f1/f2)/(f2-f1),(DFmax1*f2-DFmax2*f1)/(f2-f1),DFmax1)); END "DFinit"; DFmaxr <- Alpha*OscFrq[CurOsc]+Beta; IF Debug3 THEN PRINT(" Computed DFmax for OSC ",CurOsc," at freq ",OscFrq[CurOsc]," is ",DFmaxr,CrLf); RETURN(DFmaxr); END "DFmax"; GETFORMAT(Owid,Odig); IF Frame LEQ 1 THEN BEGIN "initialize" Noscs <- Nactive[Frame] MIN MaxOscs; Nlins <- Nactive[Frame] MIN MaxLins; FOR CurOsc <- 1 STEP 1 UNTIL Noscs DO BEGIN LinOfOsc[CurOsc] <- (CurLin <- CurOsc); PrvLinOfOsc[CurOsc] <- LinOfOsc[CurOsc]; # For next entry; OscOfLin[CurLin] <- CurOsc; OscPhs[CurOsc] <- 0; OscAmp[CurOsc] <- LinAmp[CurLin]; PrvOscAmp[CurOsc] <- (IF InstantRise THEN LinAmp[CurLin] ELSE 0); PrvOscFrq[CurOsc] <- LinFrq[CurLin]; OscFrq[CurOsc] <- LinFrq[CurLin]; END; PRINT(" Initialization of ",Noscs," oscillators to the tallest of ",Nlins," lines.",CrLf); RETURN END "initialize"; SETFORMAT(0,7); COMMENT (UpdateMap) - Association of spectral lines to oscs; Nlins <- Nactive[Frame]; # Number of Lins currently active; REQUIRE " PrvLinOfOsc only used for trace messages?" MESSAGE; ARRTRAN(PrvLinOfOsc,LinOfOsc); ARRCLR(LinOfOsc); ARRCLR(OscOfLin); IF UDtrace THEN BEGIN PRINT(CrLf,"------- UpdateMap entry on frame ",Frame," --------", CrLf, " Number of lines = ",Nlins," ... Number of oscs = ",Noscs,CrLf, " PrvLinOfOsc follows:",CrLf); FOR i <- 1 Thru MaxOscs DO PRINT(" ",PrvLinOfOsc[i]); END; FOR CurOsc <- 1 STEP 1 UNTIL Noscs DO GetClosestFrq(CurOsc,DFmax(CurOsc)); # Allocate oscillators to new lines, if any; Nadd <- (Nlins MIN MaxOscs)-Noscs; IF Nadd>0 THEN BEGIN "Add" PRINT("$$$ Adding ",Nadd," oscillator(s) $$$",CrLf); CurLin <- 0; FOR CurOsc <- Noscs+1 Thru Noscs+Nadd DO BEGIN "AddOsc" DO CurLin <- CurLin+1 UNTIL OscOfLin[CurLin]=0; ; # Find lin with no osc; OscOfLin[CurLin] <- CurOsc; LinOfOsc[CurOsc] <- CurLin; IF UDtrace THEN PRINT(" Osc ",CurOsc," dispatched to ",LinFrq[CurLin]," Hz",CrLf); END "AddOsc"; Noscs <- Noscs+Nadd; END "Add"; IF UDtrace THEN BEGIN PRINT(CrLf," <- *^Y <- *^YUpdateMap exit on frame ",Frame,CrLf, " PrvLinOfOsc follows:",CrLf); FOR CurOsc <- 1 Thru MaxOscs DO PRINT(" ",PrvLinOfOsc[CurOsc]); PRINT(CrLf," LinOfOsc follows:",CrLf); FOR CurOsc <- 1 Thru MaxOscs DO PRINT(" ",LinOfOsc[CurOsc]); PRINT(CrLf," OscOfLin follows:",CrLf); FOR CurLin <- 1 Thru MaxOscs DO PRINT(" ",OscOfLin[CurLin]); END; COMMENT (UpdateMap) - Update new amp/frq target; # Update target values for synthesis up to latest spectral frame; FOR CurOsc <- 1 Thru MaxOscs DO BEGIN "OutAF" PrvOscFrq[CurOsc] <- OscFrq[CurOsc]; IF OscOn(CurOsc) THEN OscFrq[CurOsc] <- LinFrq[LinOfOsc[CurOsc]]; PrvOscAmp[CurOsc] <- OscAmp[CurOsc]; OscAmp[CurOsc] <- (IF OscOn(CurOsc) THEN LinAmp[LinOfOsc[CurOsc]] ELSE 0); IF PrvOscAmp[CurOsc]=0 THEN PrvOscFrq[CurOsc] <- OscFrq[CurOsc]; END "OutAF"; CrLfd <- FALSE; DEFINE CkCrLf={ (IF NOT CrLfd THEN CrLf ELSE NULL) }; # Gad, what an awkward thing; FOR CurOsc <- 1 STEP 1 UNTIL Noscs DO BEGIN IF PrvOscFree(CurOsc) AND OscOn(CurOsc) THEN PRINT(CkCrLf," $$$ Osc ",CurOsc," *starting* on frequency ",OscFrq[CurOsc]," $$$",Crlfd <- CrLf); IF PrvOscSquelched(CurOsc) AND OscOn(CurOsc) THEN PRINT(CkCrLf," $$$ Osc ",CurOsc," *revived* on frequency ",OscFrq[CurOsc]," $$$",CrLfd <- CrLf); IF OscSquelched(CurOsc) AND PrvOscOn(CurOsc) THEN PRINT(CkCrLf," $$$ Osc ",CurOsc," *squelched* on frequency ",OscFrq[CurOsc]," $$$",CrLfd <- CrLf); END; SETFORMAT(Owid,Odig); END "UpdateMap"; COMMENT Synthesize - Use database to run oscillators up to latest targets; DEFINE UseFixedPoint = "FALSE"; # Set true for fast execution; DEFINE SinSiz = "512"; # No. wds in Sine table for synthesis; IFC UseFixedPoint THENC INTEGER ELSEC REAL ENDC ARRAY SinBuf[0:SinSiz-1]; # Sine table; REAL Mag; # Mag * Frq = Sine table increment; PROCEDURE Synthesize(INTEGER Nhop,Bp,Fs,Frame); # Crank out MaxOscs sinusoids from amplitude PrvOscAmp[i] to OscAmp[i], and frequency PrvOscFrq[i] to OscFrq[i] (integrating OscPhs[i]). Add result to OutBuf[Bp+1:Bp+Nhop]. Only Noscs oscs will be active, but there are also some Ending ones which have to be ramped to 0. Hence a loop from 1 to MaxOscs is required. At present, there is no way to specify skipping the first N partials in the synthesis. Since the oscs are not necessarily sorted by frequency, this requires a little fooling around. See the QuickSort (QIsort) in FindPartials for ideas. I think I would sort the oscillator numbers according to increasing frequency, and then testing OscOn I would skip over the first active oscillators in the loop to MaxOscs below. ; REQUIRE CrLf&" Need code to skip 1st N partials here "&CrLf MESSAGE; BEGIN "Synthesize" INTEGER IntIdx,Zosc,Nsamp,FpSinSiz; IFC UseFixedPoint THENC REQUIRE " Using fixed-point synthesis" MESSAGE; INTEGER ARRAY SynBuf[1:Nhop]; # Synthesis output buffer (for speed); INTEGER Y1,Y2,Y, Remain,Amp,Damp,Inc,Dinc,Phs; # Set up fixed-point number system (for maximum execution speed); DEFINE Nbits="14"; # No. bits to right of binary point; # Number of oscs summable is 2^(36-2*Nbits); DEFINE One = "(1 LSH Nbits)"; # A "1" in fixed-point; DEFINE R2Fp(x) = {((x) * One)}; # REAL to Fixed-Point (Fp), no round; DEFINE Fp2I(x) = {((x) ASH -Nbits)}; # Fp to INTEGER, no round; DEFINE Frac(x) = {((x) LAND (One-1))}; # Mask off fractional part; DEFINE FpFp(x,y)={((x)*(y) ASH -Nbits)}; # Fixed-point times fixed-point; ELSEC REQUIRE " Using floating-point synthesis"&CrLf MESSAGE; REAL ARRAY SynBuf[1:Nhop]; # Synthesis output buffer (for speed); REAL Y1,Y2,Y, Remain,Amp,Damp,Inc,Dinc,Phs; INTEGER PROCEDURE Floor(REAL x); RETURN(x); DEFINE One = "1"; DEFINE R2Fp(x) = {(x)}; DEFINE Fp2I(x) = { Floor(x) }; DEFINE Frac(x) = { (X-Floor(x)) }; DEFINE FpFp(x,y)={((x)*(y))}; ENDC IF Frame LEQ 1 THEN BEGIN "InitSine" REAL Pi,Dang; Pi <- 4*ATAN(1); Dang <- 2*Pi/SinSiz; FOR i <- 0 STEP 1 UNTIL SinSiz-1 DO SinBuf[i] <- R2Fp(SIN(i*Dang)); Mag <- SinSiz/Fs; END "InitSine"; FOR Zosc <- 1 STEP 1 UNTIL MaxOscs DO BEGIN "OscLoop" Amp <- R2Fp(PrvOscAmp[Zosc]); # Current amplitude (linear); Damp <- R2Fp(OscAmp[Zosc]-PrvOscAmp[Zosc])/Nhop; # Per-sample increment; IF Amp=Damp=0 # Only Noscs oscs are active; THEN CONTINUE "OscLoop"; # so avoid inactive ones; Inc <- R2Fp(Mag*PrvOscfrq[Zosc]); # Current table increment; Dinc <- R2Fp(Mag*(OscFrq[Zosc]-PrvOscfrq[Zosc])/Nhop); # Per-sample step; Phs <- OscPhs[Zosc]; # Current oscillator phase; FOR Nsamp <- 1 STEP 1 UNTIL Nhop DO # Synthesize one sinusoid; BEGIN "SynLoop" IntIdx <- Fp2I(Phs); # Sine table address; Remain <- Frac(Phs); # Linear interpolation amount; DEFINE SineMask = "(SinSiz-1)"; # For power-of-2 table length only; Y1 <- SinBuf[IntIdx LAND SineMask]; Y2 <- SinBuf[(IntIdx+1) LAND SineMask]; Y <- Y1 + FpFp(Remain,(Y2 - Y1)); SynBuf[Nsamp] <- SynBuf[Nsamp] + FpFp(Amp,Y); Amp <- Amp + Damp; Inc <- Inc + Dinc; Phs <- Phs + Inc; END "SynLoop"; FpSinSiz <- SinSiz*One; Phs <- (IF Phs GEQ FpSinSiz THEN Phs-FpSinSiz ELSE IF Phs<0 THEN Phs+FpSinSiz ELSE Phs); OscPhs[Zosc] <- Phs; END; FOR Nsamp <- Bp+1 STEP 1 UNTIL Bp+Nhop DO OutBuf[Nsamp] <- OutBuf[Nsamp] + (SynBuf[Nsamp-Bp]/One); # No rounding; END "Synthesize"; COMMENT Processing; GetWin(WinBuf,WinType,Nx); # Load window buffer; IF DoFlt THEN BEGIN FOR i <- 1 STEP 1 UNTIL Nh DO H[i] <- IC[i]; # Load filter into 1st part of buffer; TDpyEd(H,Nh,"Filter Impulse response"); !FFA(H,Nfft); # FFT filter impulse response; TDpyFFA(H,Nfft,"Filter spectrum",InF(Clock)); END; DBscl <- (10.0/LOG(10.0)); # Incredibly, this is not performed at compile time; Fs <- InF(Clock); Xp <- Bp <- Yp <- 0; # Input, OLA, and output sample pointers; PRINT("At time: "); DO BEGIN "RI" INTEGER i; PRINT(Cvfs(Xp/Fs)," "); ARRCLR(X); IF Xp+Nx>InF(#samps) THEN Nx <- InF(#samps) - Xp; Sandi(InF(Channel),Xp,Nx,X,InF(#samps),TRUE,InF(Pack)); Xp <- Xp + Nhop; Nhops <- Nhops + 1; Frame <- Xp/Nhop; Tdpyed(X,Nx,"Input frame "&CVS(Frame)); IF WinType>1 THEN FOR i <- 1 STEP 1 UNTIL Nx DO X[i] <- X[i] * WinBuf[i]; Tdpyed(X,Nx,"Windowed input frame "&CVS(Frame)); !FFA(X, Nfft); TDpyFFA(X,Nfft,"Windowed input frame spectrum",Fs); FOR i <- 1 STEP 2 UNTIL Nfft+1 DO BEGIN "DoFilter" INTEGER Ip1; REAL Xi,Hi,Xip1,Hip1; Ip1 <- i+1; Xi <- X[i]; Hi <- H[i]; Xip1 <- X[Ip1]; Hip1 <- H[Ip1]; XmagDB[(i+1) % 2] <- DBscl*LOG(Xi*Xi+Xip1*Xip1 MAX 1.0@-20); IF DoFlt THEN BEGIN X[i] <- Xi*Hi - Xip1*Hip1; X[Ip1] <- Xi*Hip1 + Xip1*Hi; END END "DoFilter"; IF SwapOut THEN BEGIN REAL ARRAY Xout[1:Nspec]; ARRTRAN(Xout,XmagDB); # Xout[1:257] gets clobbered if packing mode not 3; Tdpyed(Xout,Nspec,"*** OUTPUT DB (SWAP) SPECTRUM ***"); IF Sando(OutF(Channel),Yp,Nspec,Xout,OutH,Yp,TRUE,OutF(Pack)) THEN PRINT(" *** Sando unhappy ***",CrLf); PRINT(" * "); END; FindPartials(Nactive[Frame] <- MaxLins,XmagDB,LinAmpDB,LinFrq,Fs,MinSep, Thresh,Hyst,Nfft,Frame,MinWid,Fc1,Fc2); FOR i <- 1 Thru Nactive[Frame] DO LinAmp[i] <- 10^(LinAmpDB[i]/20)*SigScl; UpdateMap; # Unravel partials into consistent tracks (for synthsis); FOR CurOsc <- 1 Thru MaxOscs DO BEGIN "AmpsFrqs" Amps[CurOsc,Frame] <- OscAmp[CurOsc]; Frqs[CurOsc,Frame] <- OscFrq[CurOsc]; END "AmpsFrqs"; # TDpyFFA(X,Nfft,"Filtered windowed input frame spectrum",Fs); IF DoFlt THEN BEGIN "ola" !FFS(X, Nfft); # Inverse FFT (unnecessarily slow); Tdpyed(X,Nx+Nh,"Filtered windowed input frame"); FOR i <- Bp+1 STEP 1 UNTIL Bp+Nfft DO OutBuf[i] <- OutBuf[i] + X[i-Bp]; # For hack; END "ola"; IF DoSynth THEN BEGIN "callsynth" Tdpyed(OutBuf,Bp+Nhop,"OLA buffer BEFORE additive synthesis"); Synthesize(Nhop,Bp,Fs,Frame); # Generate synthesis up to sample Bp+Nhop in OutBuf; Tdpyed(OutBuf,Bp+Nhop,"OLA buffer AFTER additive synthesis"); END "callsynth"; IF NOT SwapOut THEN BEGIN Bp <- Bp + Nhop; IF Bp GEQ Nfft THEN IF DoOut THEN BEGIN "Wout" Tdpyed(OutBuf,Nfft,"*** OUTPUT OLA buffer BEFORE decimation"); Nout <- Decimate(OutBuf,Nfft,Ndec); # Downsample; Tdpyed(OutBuf,Nout,"*** OUTPUT OLA buffer AFTER decimation"); Sando(OutF(Channel),Yp,Nout,OutBuf,OutH,Yp,TRUE,OutF(Pack)); FOR i <- 1 STEP 1 UNTIL Nfft DO OutBuf[i] <- OutBuf[i+Nfft]; FOR i <- Nfft+1 STEP 1 UNTIL 2*Nfft DO OutBuf[i] <- 0; Bp <- Bp - Nfft; PRINT(" * "); END "Wout" END END "RI" UNTIL Xp GEQ InF(#samps) OR Trace=-999 OR Frame GEQ Nframes; TmpStr <- "PARSHL: From "&Idev&":"&Ifile&"; "& (IF DoFlt THEN "Filter="&Ffile ELSE NULL)&CrLf&Tab& (IF DoSynth THEN " AS" ELSE NULL)& " Nhops="&CVS(Nhops)& " Window="&WinStr& " Nfft="&CVS(Nfft)& " Nframe="&CVS(Nx)& " Nhop="&CVS(Nhop)& " Compression="&CVS(Ndec)&CrLf&Tab& " MinSpace="&Cvfs(MinSep)&"Hz"& " Lines="&CVS(MaxLins)& " Oscs="&CVS(MaxOscs)& " FirstFrq="&Cvfs(Fc1)&"Hz"& " LastFrq="&Cvfs(Fc2)&"Hz"&CrLf&Tab& " Thresh="&Cvfs(Thresh)&"dB"& " Hyst="&Cvfs(Hyst)&"dB"& " DFmax1="&Cvfs(DFmax1)&"Hz"& " UltDFmax="&Cvfs(DFmax2)&"Hz"&CrLf& " (+)"&Tab&InF(Text)&Crlf; IF DoOut THEN BEGIN "flush" Nout <- Decimate(OutBuf,Bp+Nh,Ndec); # Downsample; Sando(OutF(Channel),Yp,Nout,OutBuf,OutH,Yp,TRUE,OutF(Pack)); # Last piece; Sando(OutF(Channel),0,0,X,OutH,Yp,TRUE,OutF(Pack)); # Flush output buffer; WriteH(Head, OutF(Clock), OutF(Pack), OutF(#chans), OutF(#ticks), Maxamp, TmpStr); USETO (OutF(Channel), 1); ARRYOUT (OutF(Channel), Head [1], 128); RELEASE(InF(Channel)); RELEASE(OutF(Channel)); END "flush"; OutPartials(MaxOscs,Nframes,Amps,Frqs,Fs,Thresh); WriteH(Head, AmpF(Clock), AmpF(Pack), AmpF(#chans), AmpF(#ticks), Maxamp, TmpStr); USETO(AmpF(Channel),1); ARRYOUT(AmpF(Channel),Head[1],128); RELEASE(AmpF(Channel)); WriteH(Head, FrqF(Clock), FrqF(Pack), FrqF(#chans), FrqF(#ticks), Maxamp, TmpStr); USETO(FrqF(Channel),1); ARRYOUT(FrqF(Channel),Head[1],128); RELEASE(FrqF(Channel)); IF Trace THEN BEGIN REAL ARRAY TmpRA[1:Nframes]; FOR i <- 1 STEP 1 UNTIL Nframes DO TmpRA[i] <- Nactive[i]; DpyEd(TmpRA,Nframes,"Number of active partials versus frame number") END; END "ALAR"; PRINT("R R;REDUCE to convert Amplitudes and Frequencies to SEG functions",CrLf); PRINT("R R;SND2MF to convert Amplitudes and Frequencies to MERGE file format",CrLf); END "PARSHL".
Bibliography
- 1
-
M. Abe and J. O. Smith, ``Design criteria for simple sinusoidal parameter
estimation based on quadratic interpolation of FFT magnitude peaks,'' Audio Engineering Society Convention, San Francisco, 2004,
Preprint 6256. - 2
- J. Abel, ``private communication,'' 2006.
- 3
-
J. S. Abel, ``Expressions relating frequency, critical-band rate, and critical
bandwidth,'' 1997,
submitted for publication. - 4
- E. Aboutanios and B. Mulgrew, ``Iterative frequency estimation by interpolation on fourier coefficients,'' IEEE Transactions on Signal Processing, vol. 53, pp. 1237-1242, Apr. 2005.
- 5
-
M. Abramowitz and I. A. Stegun, eds., Handbook of Mathematical Functions,
New York: Dover, 1965. - 6
-
M. Ali, Adaptive Signal Representation with Application in Audio Coding,
PhD thesis, University of Minnesota, Mar. 1996. - 7
- J. B. Allen, ``Short term spectral analysis, synthesis, and modification by discrete Fourier transform,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-25, pp. 235-238, June 1977.
- 8
- J. B. Allen, ``Application of the short-time Fourier transform to speech processing and spectral analysis,'' Proc. IEEE ICASSP-82, pp. 1012-1015, 1982.
- 9
- J. B. Allen and L. R. Rabiner, ``A unified approach to short-time Fourier analysis and synthesis,'' Proc. IEEE, vol. 65, pp. 1558-1564, Nov. 1977.
- 10
-
X. Amatriain, J. Bonada, A. Loscos, and X. Serra, ``Spectral processing,'' in
DAFX - Digital Audio Effects (U. Zölzer, ed.), pp. 373-438, West
Sussex, England: John Wiley and Sons, LTD, 2002,
http://www.dafx.de/. - 11
- B. S. Atal and L. S. Hanauer, ``Speech analysis and synthesis by linear prediction of the speech wave,'' Journal of the Acoustical Society of America, vol. 50, pp. 637-655, 1971.
- 12
- F. Auger and P. Flandrin, ``Improving the readibility of time-frequency and time-scale representations by the reassignment method,'' IEEE Transactions on Signal Processing, vol. 43, pp. 1068-1089, May 1995.
- 13
- L. C. Barbosa, ``A maximum-energy-concentration spectral window,'' IBM Journal of Research and Development, vol. 30, pp. 321-325, May 1986.
- 14
-
M. Bellanger, ``Improved design of long FIR filters using the frequency
masking technique,'' in Proceedings of the International Conference on Acoustics, Speech, and
Signal Processing, Atlanta, (New York), IEEE Press, May 1996,
Paper DSP1.1. - 15
-
L. L. Beranek, Acoustics,
http://asa.aip.org/publications.html: American Institute of Physics, for the Acoustical Society of America, 1986,
1st ed. 1954. - 16
-
M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and
Standards,
Boston: Kluwer Academic Publishers, 2003. - 17
-
M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagin, H. Fuchs,
M. Dietz, J. Herre, G. Davidson, and Y. Oikawa, ``ISO / IEC MPEG-2 advanced
audio coding,'' Audio Engineering Society Convention, vol. Preprint 4382,
Nov. 1996,
36 pages. See also ISO/IEC International Standard IS 13818-7 entitled ``MPEG-2 Advanced Audio Coding,'' April, 1997. - 18
-
L. Bosse, ``An experimental high fidelity perceptual audio coder,'' tech. rep.,
Elec. Engineering Dept., Stanford University (CCRMA), Mar. 1998,
Music 420 Project Report, https://ccrma.stanford.edu/~jos/bosse/. - 19
-
R. C. Boulanger, ed., The Csound Book: Perspectives in Software
Synthesis, Sound Design, Signal Processing, and Programming,
MIT Press, Mar. 2000. - 20
-
J. Bouvrie and T. Ezzat, ``An incremental algorithm for signal reconstruction
from short-time fourier transform magnitude,'' in INTERSPEECH 2006 --
ICSLP, 2006,
http://cbcl.mit.edu/publications/ps/signalrec_ICSLP06.pdf. - 21
-
G. E. P. Box and G. M. Jenkins, Time Series Analysis,
San Francisco: Holden Day, 1976. - 22
-
S. Boyd and L. Vandenberghe, Convex Optimization,
Cambridge University Press, Feb. 2004,
http://www.stanford.edu/~boyd/cvxbook/. - 23
-
R. Bracewell, The Fourier Transform and its Applications,
New York: McGraw-Hill, 1965. - 24
- K. Brandenburg, ``Perceptual coding of high quality digital audio,'' in Applications of DSP to Audio & Acoustics (M. Kahrs and K. Brandenburg, eds.), pp. 39-83, Boston/Dordrecht/London: Kluwer Academic Publishers, 1998.
- 25
- K. Brandenburg and M. Bosi, ``Overview of MPEG audio: Current and future standards for low-bit-rate audio coding,'' Journal of the Audio Engineering Society, vol. 45, pp. 4-21, Jan./Feb. 1997.
- 26
-
A. S. Bregman, Auditory Scene Analysis: the Perceptual Organization of
Sound,
Cambridge, MA: MIT Press, 1990. - 27
-
R. Bristow-Johnson, ``Wavetable synthesis 101, a fundamental perspective,''
presented at 101st AES Convention, no. Preprint 4400, 1996,
http://www.musicdsp.org/files/Wavetable-101.pdf. - 28
- R. Bristow-Johnson, ``Tutorial on floating-point versus fixed-point,'' Audio Engineering Society Convention, Oct. 2008.
- 29
- J. C. Brown, ``Calculation of a constant Q spectral transform,'' Journal of the Acoustical Society of America, vol. 89, no. 1, pp. 425-434, 1991.
- 30
- J. C. Brown and M. S. Puckette, ``An efficient algorithm for the calculation of a constant Q transform,'' Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2698-2701, 1992.
- 31
- T. Brown and M. M. Wang, ``An iterative algorithm for single-frequency estimation,'' IEEE Transactions on Signal Processing, vol. 50, no. 11, pp. 2671-2682, 1993.
- 32
-
C. S. Burrus, ``Chebyshev or equal ripple error approximation filters,'' tech.
rep., Connexions, Nov. 2008,
http://cnx.org/content/m16895/latest/. - 33
- R. J. Cassidy and J. O. Smith III, ``Efficient time-varying loudness estimation via the hopping Goertzel DFT,'' in Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS-2007), Aug. 2007.
- 34
- C. Chafe, D. Jaffe, K. Kashima, B. Mont-Reynaud, and J. O. Smith, ``Techniques for note identification in polyphonic music,'' in Proceedings of the 1985 International Computer Music Conference, Vancouver, Computer Music Association, searchable at http://quod.lib.umich.edu/i/icmc/, 1985.
- 35
-
H. Chamberlin, Musical Applications of Microprocessors,
New Jersey: Hayden Book Co., Inc., 1980. - 36
-
D. C. Champeney, A Handbook of Fourier Theorems,
Cambridge University Press, 1987. - 37
-
D. G. Childers, ed., Modern Spectrum Analysis,
New York: IEEE Press, 1978. - 38
-
J. M. Chowning, ``The synthesis of complex audio spectra by means of frequency
modulation,'' Journal of the Audio Engineering Society, vol. 21, no. 7,
pp. 526-534, 1973,
reprinted in [236]. - 39
- J. M. Chowning, ``Frequency modulation synthesis of the singing voice,'' in Current Directions in Computer Music Research (M. V. Mathews and J. R. Pierce, eds.), pp. 57-63, Cambridge, MA: MIT Press, 1989.
- 40
- J. M. Chowning, ``private communication,'' 2006.
- 41
-
J. M. Chowning and D. Bristow, FM Theory and Applications,
Tokyo: Yamaha Corporation, 1986. - 42
-
R. V. Churchill, Complex Variables and Applications,
New York: McGraw-Hill, 1960. - 43
-
L. Cohen, Time-Frequency Analysis,
Englewood Cliffs, NJ: Prentice-Hall, 1995. - 44
- B. Conolly and I. J. Good, ``A table of discrete Fourier transform pairs,'' SIAM J. Applied Math, vol. 32, no. 4, pp. 810-822, 1977.
- 45
-
P. Cook and G. Scavone, Synthesis Tool Kit in C++, Version 4,
https://ccrma.stanford.edu/software/stk/, 2010,
see also https://ccrma.stanford.edu/~jos/stkintro/. - 46
-
P. R. Cook, Identification of Control Parameters in an Articulatory Vocal
Tract Model, with Applications to the Synthesis of Singing,
PhD thesis, Elec. Engineering Dept., Stanford University (CCRMA), Dec. 1990,
http: //www.cs.princeton.edu/~prc/. - 47
-
P. R. Cook, Real Sound Synthesis for Interactive Applications,
A. K. Peters, L.T.D., 2002. - 48
-
T. Cover and J. Thomas, Elements of Information Theory,
New York: John Wiley and Sons, Inc., 1991. - 49
- R. Crochiere, ``A weighted overlap-add method of short-time Fourier analysis/synthesis,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-28, pp. 99-102, Feb 1980.
- 50
-
R. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing,
Englewood Cliffs, NJ: Prentice-Hall, 1983. - 51
- A. Croisiere, D. Esteban, and C. Galand, ``Perfect channel splitting by use of interpolation/decimation/tree decomposition techniques,'' in Proc. Int. Symp. Info., Circuits and Systems, Patras, Greece, 1976.
- 52
- O. Darrigol, ``The acoustic origins of harmonic analysis,'' Archive for History of the Exact Sciences, vol. 61, July 2007.
- 53
- G. Davidson, L. Fielder, and M. Antill, ``Low-complexity transform coder for satellite link applications,'' Audio Engineering Society Convention, vol. Preprint 2966, no. Session-Paper no. F-I-6, pp. 1-22, 1990.
- 54
-
A. de Cheveigné, ``Pitch perception models -- a historical review,'' in Proceedings of the 18th International Conference on Acoustics (ICA-04), Kyoto, Japan, p. Tu2.H.2,
2004,
http://recherche.ircam.fr/equipes/pcm/cheveign/pss/2004_ICA.pdf. - 55
- A. de Cheveigne and H. Kawahara, ``YIN, a fundamental frequency estimator for speech and music,'' Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002.
- 56
-
J. R. Deller Jr., J. G. Proakis, and J. H. Hansen, Discrete-Time
Processing of Speech Signals,
New York: Macmillan, 1993. - 57
-
J. R. Deller Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time
Processing of Speech Signals,
New York: John Wiley and Sons, Inc., Mar. 2001. - 58
- P. Depalle and T. Hélie, ``Extraction of spectral peak parameters using a short-time Fourier-transform modeling and no-sidelobe windows,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 1997.
- 59
-
P. A. M. Dirac, The Principles of Quantum Mechanics, Fourth Edition,
New York: Oxford University Press, 1958-2001. - 60
-
C. Dodge and T. A. Jerse, Computer Music: Synthesis, Composition, and
Performance,
New York: Wadsworth Publication Co., 1997. - 61
- C. L. Dolph, ``A current distribution for broadside arrays which optimizes the relationship between beam width and side-lobe level,'' Proceedings of the IRE, vol. 34, pp. 335-348, 1946.
- 62
- M. Dolson, ``The phase vocoder: A tutorial,'' Computer Music Journal, vol. 10, no. 4, pp. 14-27, 1986.
- 63
- S. Dostrovsky, ``Early vibration theory: Physics and music in the seventeenth century,'' Archive for History of the Exact Sciences, vol. 14, no. 3, 1975.
- 64
- B. Doval and X. Rodet, ``Estimation of fundamental frequency of musical sound signals,'' in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Toronto, (New York), pp. 3657-3660, vol. 5, IEEE Press, April 14-17, 1991.
- 65
- B. Doval and X. Rodet, ``Fundamental frequency estimation using a new harmonic matching method,'' in Proceedings of the 1991 International Computer Music Conference, Montréal, pp. 555-558, Computer Music Association, searchable at http://quod.lib.umich.edu/i/icmc/, 1991.
- 66
-
DSP Committee, ed., Programs for Digital Signal Processing,
New York: IEEE Press, 1979. - 67
-
DSP Committee, ed., Programs for Digital Signal Processing II,
New York: IEEE Press, 1979. - 68
-
H. W. Dudley, ``The vocoder,'' Bell Labs Rec., vol. 18, pp. 122-126,
1939,
Reprinted in [243, pp. 347-351]. - 69
- B. Eaglestone and S. Oates, ``Analytical tools for group additive synthesis,'' in Proceedings of the 1990 International Computer Music Conference, Glasgow, Computer Music Association, searchable at http://quod.lib.umich.edu/i/icmc/, 1990.
- 70
-
D. P. W. Ellis, ``A phase vocoder in Matlab,'' 2002,
Web resource. - 71
-
H. Fastl and E. Zwicker, Psychoacoustics: Facts and Models,
Berlin: Springer Verlag, 2006,
third edition, 462pp., CD-ROM. - 72
- J. A. Fessler and B. P. Sutton, ``Nonuniform fast Fourier transforms using min-max interpolation,'' IEEE Transactions on Signal Processing, vol. 51, pp. 560-574, Feb. 2003.
- 73
- K. Fitz and L. Haken, ``On the use of time-frequency reassignment in additive sound modeling,'' Journal of the Audio Engineering Society, vol. 50, pp. 879-893, Nov. 2002.
- 74
-
J. L. Flanagan, Speech Analysis, Synthesis, and Perception,
New York: Springer Verlag, 1972. - 75
-
J. L. Flanagan, ``Voices of men and machines,'' Journal of the Acoustical
Society of America, vol. 51, pp. 1375-1387, Mar. 1972,
Reprinted in [243, pp. 4-16]. - 76
-
J. L. Flanagan and R. M. Golden, ``Phase vocoder,'' Bell System Technical
Journal, vol. 45, pp. 1493-1509, Nov. 1966,
Reprinted in [243, pp. 388-404]. - 77
-
N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, 2nd
Edition,
New York: Springer Verlag, 1998. - 78
-
A. Franck, Efficient Algorithms for Arbitrary Sample Rate Conversion with
Application to Wave Field Synthesis,
PhD thesis, Technischen Universität Ilmeneau, Nov. 2011. - 79
-
D. K. Frederick and A. B. Carlson, Linear Systems in Communication and
Control,
New York: John Wiley and Sons, Inc., 1971. - 80
-
M. Frigo and S. G. Johnson, ``FFTW: An adaptive software architecture for the
FFT,'' in Proceedings of the International Conference on Acoustics, Speech, and Signal
Processing, Seattle, vol. 3, (New York), pp. 1381-1384, IEEE Press, 1998,
http://www.fftw.org/. - 81
- S. A. Fulop and K. Fitz, ``A spectrogram for the twenty-first century,'' Acoustics Today, pp. 26-33, July 2006.
- 82
- T. J. Gardner and M. O. Magnesco, ``Sparse time-frequency representations,'' Proc. National Academy of Sciences of the United States of America, vol. 103, April 2006.
- 83
- E. B. George and M. J. T. Smith, ``A new speech coding model based on least-squares sinusoidal representation,'' Proc. IEEE ICASSP-87, vol. 3, pp. 1641-1644, 1987.
- 84
- E. B. George and M. J. T. Smith, ``Analysis-by-synthesis/Overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones,'' Journal of the Audio Engineering Society, vol. 40, no. 6, pp. 497-516, 1992.
- 85
-
A. Gerzso, ``Density of spectral components: Preliminary experiments,'' tech.
rep., IRCAM, 1978,
Report no. 31/80, abstract and ordering information on-line at http://mediatheque.ircam.fr/articles/index-e.html. - 86
-
P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization,
New York: Academic Press, 1981. - 87
- B. R. Glasberg and B. C. J. Moore, ``Derivation of auditory filter shapes from notched-noise data,'' Hearing Research, vol. 47, pp. 103-138, 1990.
- 88
- B. R. Glasberg and B. C. J. Moore, ``A model of loudness applicable to time-varying sounds,'' Journal of the Audio Engineering Society, vol. 50, pp. 331-342, May 2002.
- 89
-
S. Golden, ``Exponential polynomial signals: Estimation, analysis, and
applications,'' 1995,
citeseer.nj.nec.com/golden95exponential.html. - 90
-
S. Golden and B. Friedlander, ``Estimation and statistical analysis of
exponential polynomial signals.''
citeseer.nj.nec.com/golden98estimation.html. - 91
-
S. Golden and B. Friedlander, ``Maximum likelihood estimation, analysis, and
applications of exponential polynomial signals.''
citeseer.nj.nec.com/golden99maximum.html. - 92
-
G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd Edition,
Baltimore: The Johns Hopkins University Press, 1996. - 93
-
M. Goodwin and A. Kogon, ``Overlap-add synthesis of nonstationary sinusoids,''
in Proceedings of the 1995 International Computer Music Conference, Banff, Computer Music
Association, searchable at
http://quod.lib.umich.edu/i/icmc/,
1995,
citeseer.nj.nec.com/goodwin95overlapadd.html. - 94
- M. Goodwin and A. Kogon, ``Overlap-add synthesis of nonstationary sinusoids,'' in International Computer Music Conference, 1995.
- 95
-
R. M. Gray and L. D. Davisson, An Introduction to Statistical Signal
Processing,
Cambridge University Press, 2003,
http://www-ee.stanford.edu/~gray/sp.pdf. - 96
- D. D. Greenwood, ``A cochlear frequency-position function for several species--29 years later,'' Journal of the Acoustical Society of America, vol. 87, pp. 2592-605, June 1990.
- 97
-
J. Grey, An Exploration of Musical Timbre,
PhD thesis, Stanford University, Feb. 1975,
available as CCRMA Technical Report STAN-M-2, Music Dept., Stanford University. - 98
- D. W. Griffin and J. S. Lim, ``Signal estimation from modified short-time Fourier transform,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-32, pp. 236-243, Apr 1984.
- 99
- D. W. Griffin and J. S. Lim, ``Multiband-excitation vocoder,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-36, pp. 236-243, Feb 1988.
- 100
-
D. Halliday, R. Resnick, and J. Walker, Extended, Fundamentals of Physics,
6th Edition,
New York: John Wiley and Sons, Inc., 2000. - 101
- F. J. Harris, ``On the use of windows for harmonic analysis with the discrete Fourier transform,'' Proceedings of the IEEE, vol. 66, pp. 51-83, Jan 1978.
- 102
-
W. Hartmann, Signals, Sound, and Sensation,
New York: AIP Press, 1997,
647 pp., 221 illustrations, hardcover. - 103
- M. H. Hayes, J. S. Lim, and A. V. Oppenheim, ``Signal reconstruction from phase or magnitude,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-28, pp. 672-680, Dec 1980.
- 104
-
D. Hejna and B. R. Musicus, ``The solafs time-scale modification algorithm,''
tech. rep., BBN Technical Report, July 1991,
http://www.ee.columbia.edu/ dpwe/papers/HejMus91-solafs.pdf. - 105
- H. D. Helms, ``Digital filters with equiripple or minimax responses,'' IEEE Transactions on Audio and Electroacoustics, vol. 19, pp. 87-94, Mar. 1971.
- 106
-
W. J. Hess, Algorithms and Devices for Pitch Determination of
Speech-Signals,
Berlin: Springer-Verlag, 1983. - 107
- A. Horner, J. Beauchamp, and L. Haken, ``Methods for multiple wavetable synthesis of musical instrument tones,'' Journal of the Audio Engineering Society, vol. 41, pp. 336-356, May 1993.
- 108
-
P. Huang, Artificial Reverberation using the Digital Waveguide Mesh,
PhD thesis, Music Department, Stanford University,
in preparation. - 109
-
B. Hutchins, Musical Engineer's Handbook,
Ithaca, New York: Electronotes, 1975. - 110
- T. Irino and H. Kawahara, ``Signal reconstruction from modified auditory wavelet transform,'' IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3549-3554, 1993.
- 111
- T. Irino and R. D. Patterson, ``A time-domain, level-dependent auditory filter: The gammachirp,'' Journal of the Acoustical Society of America, vol. 101, pp. 412-419, 1997.
- 112
- T. Irino and M. Unoki, ``A time-varying analysis/synthesis auditory filterbank using the gammachirp,'' in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Seattle, (New York), IEEE Press, 1998.
- 113
-
ISE/IEC JTC 1/SC 29/WG 11, ISO/IEC 11172-3: Information Technology -
Coding of Moving Pictures and Associated Audio for Digital Storage Media at
up to about 1.5 Mbit/s - Part 3: Audio,
Motion Picture Experts Group, 1993. - 114
-
T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation,
Englewood Cliffs, NJ: Prentice-Hall, Apr. 2000. - 115
-
J. F. Kaiser, ``Using the
window function,'' IEEE Transactions on
Circuits and Systems--I: Fundamental Theory and Applications, pp. 20-23,
April 22-25 1974,
Reprinted in [67], pp. 123-126. - 116
- L. J. Karam and J. H. McClellan, ``Complex Chebyshev approximation for FIR filter design,'' IEEE Transactions on Circuits and Systems--II: Analog and Digital Signal Processing, vol. 42, no. 3, pp. 207-216, 1995.
- 117
-
L. J. Karam and J. H. McClellan, ``Design of optimal digital FIR filters with
arbitrary magnitude and phase responses,'' in ISCAS-96, 1996,
http://www.eas.asu.edu/~karam/papers/iscas96_km.html. - 118
- M. Karjalainen, ``A new auditory model for the evaluation of sound quality of audio systems,'' in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Tampa, Florida, (New York), pp. 608-611, IEEE Press, 1985.
- 119
- M. Karjalainen and J. O. Smith, ``Body modeling techniques for string instrument synthesis,'' in Proceedings of the 1996 International Computer Music Conference, Hong Kong, pp. 232-239, Computer Music Association, searchable at http://quod.lib.umich.edu/i/icmc/, Aug. 1996.
- 120
-
S. M. Kay, Modern Spectral Estimation,
Englewood Cliffs, NJ: Prentice-Hall, 1988. - 121
-
S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I:
Estimation Theory,
Englewood Cliffs, NJ: Prentice-Hall, 1993. - 122
- S. Kay, ``A fast and accurate single frequency estimator,'' IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, pp. 1987-1990, Dec. 1989.
- 123
- Y. Kim and J. O. Smith, ``A speech feature based on bark frequency warping - the non-uniform linear prediction (nlp) cepstrum,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 1999.
- 124
- A. Klapuri, ``Multipitch estimation and sound separation by the spectral smoothness principle,'' in IEEE International Conference on Acoustics, Speech, and Signal Processing, (Salt Lake City, Utah, USA), May 2001.
- 125
-
A. Klapuri, ``Automatic transcription of music,'' in Proceedings of the Stockholm
Musical Acoustics Conference (SMAC-03),
http://www.speech.kth.se/smac03/,
(Stockholm), Royal Swedish Academy of Music, Aug. 2003,
also http://www.cs.tut.fi/~klap/iiro/index.html. - 126
- A. Klapuri, ``Multiple fundamental frequency estimation by harmonicity and spectral smoothness,'' IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804-816, 2003.
- 127
-
A. Klapuri, ``A perceptually motivated multiple-F0 estimation method,'' in
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 2005,
http://www.cs.tut.fi/sgn/arg/klap/waspaa2005.pdf. - 128
- D. Klatt, ``Software for a cascade/parallel formant synthesizer,'' Journal of the Acoustical Society of America, vol. 67, pp. 13-33, 1980.
- 129
- D. Klatt, ``Review of text-to-speech conversion for english,'' Journal of the Acoustical Society of America, vol. 82, pp. 737-793, Sept. 1987.
- 130
- P. Kleczkowski, ``Group additive synthesis,'' Computer Music Journal, vol. 13, no. 1, pp. 12-20, 1989.
- 131
-
D. E. Knuth, The Art of Computer Programming, Volumes 1-3,
Reading MA: Addison-Wesley, 1998,
First published in 1973. - 132
- W. Koenig, H. K. Dunn, and L. Y. Lacy, ``The sound spectrograph,'' Journal of the Acoustical Society of America, vol. 18, no. 1, pp. 19-49, 1946.
- 133
- M. Z. Komodromos, S. F. Russel, and P. T. P. Tang, ``Design of FIR Hilbert transformers and differentiators in the complex domain,'' IEEE Transactions on Circuits and Systems--II: Analog and Digital Signal Processing, vol. 45, pp. 64-67, Jan 1998.
- 134
- P. J. Kootsookos and R. C. Williamson, ``FIR approximation of fractional sample delay systems,'' IEEE Transactions on Circuits and Systems--II: Analog and Digital Signal Processing, vol. 43, pp. 40-48, Feb 1996.
- 135
- T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine, ``Splitting the Unit Delay--Tools for Fractional Delay Filter Design,'' IEEE Signal Processing Magazine, vol. 13, pp. 30-60, Jan. 1996.
- 136
- H. Landau and H. Pollak, ``Prolate-spheroidal wave functions, Fourier analysis and uncertainty--ii,'' Bell System Technical Journal, vol. 40, pp. 65-84, Jan. 1961.
- 137
- J. Laroche, ``The use of the matrix pencil method for the spectrum analysis of musical signals,'' Journal of the Acoustical Society of America, vol. 94, pp. 1958-1965, Oct. 1993.
- 138
- J. Laroche, ``Time and pitch scale modification of audio signals,'' in Applications of DSP to Audio & Acoustics (M. Kahrs and K. Brandenburg, eds.), pp. 279-309, Boston/Dordrecht/London: Kluwer Academic Publishers, 1998.
- 139
- J. Laroche, ``Synthesis of sinusoids via non-overlapping inverse fourier transform,'' IEEE Transactions on Speech and Audio Processing, vol. 8, pp. 471-477, July 2000.
- 140
- J. Laroche and M. Dolson, ``About this phasiness business,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 1997.
- 141
- J. Laroche and M. Dolson, ``Improved phase vocoder time-scale modification of audio,'' IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 323-32, 1999.
- 142
- J. Laroche and M. Dolson, ``New phase-vocoder techniques for pitch-shifting, harmonizing, and other exotic effects,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), pp. 91-94, IEEE Press, Oct. 17-20, 1999.
- 143
- J. Laroche and M. Dolson, ``New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing, and other exotic audio modifications,'' Journal of the Audio Engineering Society, vol. 47, pp. 928-936, Nov. 1999.
- 144
-
S. N. Levine and J. O. Smith, ``A sines+transients+noise audio representation
for data compression and time/pitch-scale modifications,'' Audio Engineering
Society Convention, vol. Session on Analysis and Synthesis of Sound,
no. preprint 4781, 1998,
https://ccrma.stanford.edu/~scottl/papers.html. - 145
- S. N. Levine and J. O. Smith, ``Improvements to the switched parametric and transform audio coder,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 1999.
- 146
- S. N. Levine and J. O. Smith, ``A compact and malleable sines+transients+noise model for sound,'' in Analysis, Synthesis, and Perception of Musical Sounds: The Sound of Music (J. Beauchamp, ed.), Berlin: Springer-Verlag, 2006.
- 147
-
S. N. Levine, T. S. Verma, and J. O. Smith, ``Alias-free multiresolution
sinusoidal modeling for polyphonic, wideband audio,'' in Proceedings of the IEEE
Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY,
(New York), IEEE Press, Oct. 1997,
https://ccrma.stanford.edu/~scottl/papers.html. - 148
-
S. N. Levine, T. S. Verma, and J. O. Smith, ``Multiresolution sinusoidal
modeling for wideband audio with modifications,'' in Proceedings of the International
Conference on Acoustics, Speech, and Signal Processing, Seattle, (New York),
IEEE Press, 1998,
https://ccrma.stanford.edu/~scottl/papers.html. - 149
-
S. N. Levine, Audio Representations for Data Compression and Compressed
Domain Processing,
PhD thesis, Electrical Engineering Dept., Stanford University (CCRMA), Dec. 1998,
https://ccrma.stanford.edu/~scottl/thesis.html. - 150
-
M. J. Lighthill, Introduction to Fourier Analysis,
Cambridge University Press, Jan. 1958. - 151
-
L. Ljung and T. L.Soderstrom, ``The Steiglitz-McBride algorithm
revisited--convergence analysis and accuracy aspects,'' IEEE
Transactions on Automatic Control, vol. 26, pp. 712-717, June 1981,
See also the function stmcb() in the Matlab Signal Processing Toolbox. - 152
-
L. Ljung and T. L. Soderstrom, Theory and Practice of Recursive
Identification,
Cambridge, MA: MIT Press, 1983. - 153
-
M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret, ``Applications of
second-order cone programming,'' Linear Algebra and its Applications,
vol. 284, pp. 193-228, Nov. 1998,
special issue on Linear Algebra in Control, Signals and Image Processing. http://www.stanford.edu/~boyd/socp.html. - 154
-
H.-L. Lu, Toward a High-Quality Singing Synthesizer with Vocal Texture
Control,
PhD thesis, Elec. Engineering Dept., Stanford University (CCRMA), July 2002,
https://ccrma.stanford.edu/~vickylu/thesis/. - 155
- P. Lynch, ``The Dolph-Chebyshev window: A simple optimal filter,'' America Meteorological Society Journal of the Online, vol. 125, pp. 655-660, Apr. 1997.
- 156
-
R. Lyons, Understanding Digital Signal Processing,
Upper Saddle River, NJ: Prentice Hall PTR, 2001. - 157
- J. Makhoul, ``Linear prediction: A tutorial review,'' Proceedings of the IEEE, vol. 63, pp. 561-580, Apr. 1975.
- 158
- D. Malah and J. L. Flanagan, ``Frequency scaling of speech signals by transform techniques,'' Bell System Technical Journal, vol. 60, pp. 2107-2156, Nov 1981.
- 159
- H. S. Malvar, ``Lapped transforms for efficient transform/subband coding,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. 38, pp. 969-978, June 1990.
- 160
-
H. S. Malvar, Signal Processing with Lapped Transforms,
Boston: Artech House, 1992. - 161
- H. S. Malvar and D. H. Staelin, ``The lot: Transform coding without blocking effects,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. 37, pp. 553-559, Apr 1989.
- 162
-
J. D. Markel and A. H. Gray, Linear Prediction of Speech,
New York: Springer Verlag, 1976. - 163
- J. Marks and J. Polito, ``Modeling piano tones,'' in Proceedings of the 1986 International Computer Music Conference, The Hague, Computer Music Association, searchable at http://quod.lib.umich.edu/i/icmc/, 1986.
- 164
- J. S. Marques and L. B. Almeida, ``Frequency-varying sinusoidal modeling of speech,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. 37, pp. 763-765, May 1989.
- 165
- D. C. Massie, ``Wavetable sampling synthesis,'' in Applications of DSP to Audio & Acoustics (M. Kahrs and K. Brandenburg, eds.), pp. 311-341, Boston/Dordrecht/London: Kluwer Academic Publishers, 1998.
- 166
- M. V. Mathews, ``The digital computer as a musical instrument,'' Science, vol. 142, no. 11, pp. 553-557, 1963.
- 167
-
M. V. Mathews, The Technology of Computer Music,
Cambridge, MA: MIT Press, 1969. - 168
-
I. G. Mattingly, ``Speech synthesis for phonetic and phonological models,'' in
Current Trends in Linguistics (T. A. Sebeok, ed.), vol. 12,
pp. 2451-2487, The Hague: Mouton, 1974,
http://www.mindspring.com/~ssshp/ssshp_cd/im_home.htm. - 169
- R. J. McAulay and T. F. Quatieri, ``Speech analysis-synthesis based on a sinusoidal representation,'' Tech. Rep. 693, Lincoln Laboratory, MIT, 1985.
- 170
- R. J. McAulay and T. F. Quatieri, ``Phase coherence in speech reconstruction for enhancement and coding applications,'' Proc. IEEE ICASSP-89, vol. 1, pp. 207-210, 1989.
- 171
- R. J. McAulay and T. F. Quatieri, ``Pitch estimation and voicing detection based on a sinusoidal speech model,'' Proc. IEEE ICASSP-90, vol. 1, pp. 249-252, 1990.
- 172
- R. J. McAulay and T. F. Quatieri, ``Sine-wave phase coding at low data rates,'' Proc. IEEE ICASSP-91, vol. 1, pp. 577-580, 1991.
- 173
- R. J. McAulay and T. F. Quatieri, ``Mid-rate coding based on a sinusoidal representation of speech,'' Proc. IEEE ICASSP-85, vol. 3, pp. 945-948, 1985.
- 174
- R. J. McAulay and T. F. Quatieri, ``Speech analysis/synthesis based on a sinusoidal representation,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-34, pp. 744-754, Aug 1986.
- 175
-
J. H. McClellan, R. W. Schafer, and M. A. Yoder, DSP First: A Multimedia
Approach,
Englewood Cliffs, NJ: Prentice-Hall, 1998,
Tk5102.M388. - 176
-
J. H. McClellan, T. W. Parks, and L. R. Rabiner, ``A computer program for
designing optimum FIR linear phase digital filters,'' IEEE Transactions on
Audio and Electroacoustics, vol. 21, pp. 506-526, Dec. 1973,
program also contained in [224]. - 177
- B. C. J. Moore and B. R. Glasberg, ``A revision of Zwicker's loudness model,'' Acta Acustica, vol. 82, pp. 335-345, 1996.
- 178
- B. C. J. Moore, ``Distribution of auditory-filter bandwidths at 2 kHz in young normal listeners,'' Journal of the Acoustical Society of America, vol. 81, pp. 1633-1635, May 1987.
- 179
-
B. C. J. Moore, An Introduction to the Psychology of Hearing,
New York: Academic Press, 1997. - 180
- B. C. J. Moore and B. R. Glasberg, ``Suggested formulae for calculating auditory-filter bandwidths and excitation patterns,'' Journal of the Acoustical Society of America, vol. 74, pp. 750-753, Sept. 1983.
- 181
- B. C. J. Moore, R. W. Peters, and B. R. Glasberg, ``Auditory filter shapes at low center frequencies,'' Journal of the Acoustical Society of America, vol. 88, pp. 132-140, July 1990.
- 182
- B. C. J. Moore, B. R. Glasberg, and T. Baer, ``A model for the prediction of thresholds, loudness, and partial loudness,'' Journal of the Audio Engineering Society, vol. 45, pp. 224-249, Apr. 1997.
- 183
-
F. R. Moore, Elements of Computer Music,
Englewood Cliffs, NJ: Prentice Hall, 1990. - 184
- J. A. Moorer, ``The hetrodyne filter as a tool for analysis of transient waveforms,'' Tech. Rep. AIM-208, Stanford Artificial Intelligence Laboratory, Computer Schience Dept., Stanford University, July 1973.
- 185
-
J. A. Moorer, On the Segmentation and Analysis of Continuous Musical Sound
by Digital Computer,
PhD thesis, Computer Science Dept., Stanford University, 1975,
CCRMA Technical ReportSTAN-M-3. - 186
- J. A. Moorer, ``Signal processing aspects of computer music--a survey,'' Computer Music Journal, vol. 1, no. 1, pp. 4-37, 1977.
- 187
- J. A. Moorer, ``The use of the phase vocoder in computer music applications,'' Journal of the Audio Engineering Society, vol. 26, pp. 42-45, Jan./Feb. 1978.
- 188
- J. A. Moorer and M. Berger, ``Linear-phase bandsplitting: Theory and applications,'' Journal of the Audio Engineering Society, vol. 34, pp. 143-152, Mar. 1986.
- 189
- M. Mueller, D. P. W. Ellis, A. Klapuri, and G. Richard, ``Signal processing for music analysis,'' IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1088-1110, 2011.
- 190
- W. A. Munson and H. C. Montgomery, ``A speech analyzer and synthesizer,'' Journal of the Acoustical Society of America, vol. 22, no. 5, pp. 678-678, 1950.
- 191
- S. H. Nawab and T. F. Quatieri, ``Short-time Fourier transform,'' in Adanced topics in signal processing (J. S. Lim and A. V. Oppenheim, eds.), ch. 6, Prentice-Hall, 1988.
- 192
- S. Nawab, T. Quatieri, and J. Lim, ``Signal reconstruction from short-time Fourier transform magnitude,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-31, pp. 986-998, Aug 1983.
- 193
- A. Ng and A. Horner, ``Iterative combinatorial basis spectra in wavetable matching,'' Journal of the Audio Engineering Society, vol. 50, no. 12, pp. 1054-1063, 2002.
- 194
- Nguyen, ``Near-perfect-reconstruction pseudo-qmf banks,'' IEEE Transactions on Signal Processing, vol. 43, pp. 65-76, Jan. 1994.
- 195
- P. Noll, ``Digital audio coding for visual communications,'' Proc. IEEE, vol. 83, pp. 925-943, June 1995.
- 196
- A. H. Nuttall, ``Some windows with very good sidelobe behavior,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-29, pp. 84-91, Feb 1981.
- 197
-
J. C. O'Neill, P. Flandrin, and W. C. Karl, ``Sparse representations with
chirplets via maximum likelihood estimation.''
citeseer.nj.nec.com/389174.html. - 198
-
A. V. Oppenheim and R. W. Schafer, Digital Signal Processing,
Englewood Cliffs, NJ: Prentice-Hall, 1975. - 199
-
D. O'Shaughnessy, Speech Communication,
Reading MA: Addison-Wesley, 1987. - 200
- T. Painter and A. Spanias, ``Perceptual coding of digital audio,'' Proceedings of the IEEE, vol. 88, pp. 451-513, Apr. 2000.
- 201
-
A. Papoulis, Probability, Random Variables, and Stochastic Processes,
New York: McGraw-Hill, 1965. - 202
-
A. Papoulis, Signal Analysis,
New York: McGraw-Hill, 1977. - 203
-
A. Papoulis, Probability, Random Variables, and Stochastic Processes,
New York: McGraw-Hill, 1984. - 204
-
T. W. Parks and C. S. Burrus, Digital Filter Design,
New York: John Wiley and Sons, Inc., June 1987,
contains FORTRAN software listings. - 205
- R. D. Patterson, ``Auditory filter shapes derived with noise stimuli,'' Journal of the Acoustical Society of America, vol. 76, pp. 640-654, Mar. 1982.
- 206
- R. D. Patterson and G. B. Henning, ``Stimulus variability and auditory filter shape,'' Journal of the Acoustical Society of America, vol. 62, pp. 649-663, Sept. 1977.
- 207
- R. D. Patterson, I. Nimmo-Smith, D. L. Weber, and R. Milroy, ``The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold,'' Journal of the Acoustical Society of America, vol. 72, pp. 1788-1803, Dec. 1982.
- 208
- R. D. Patterson, M. Allerhand, and C. Giguere, ``Time-domain modelling of peripheral auditory processing: A modular architecture and software platform,'' Journal of the Acoustical Society of America, vol. 98, pp. 1890-1894, 1995.
- 209
-
V. Peltonen, J. Tuomi, A. Klapuri, J. Huopaniemi, and T. Sorsa, ``Computational
auditory scene recognition,'' Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing, 2002,
http://www.cs.tut.fi/~klap/iiro/index.html. - 210
-
E. Peterson and F. S. Cooper, ``Peak picker: A bandwidth compression device,''
Journal of the Acoustical Society of America, vol. 29, p. 777, June 1957,
See also the section entitled ``Peak Picking'' in [245, pp. 357]. - 211
-
R. Plomp, Aspects of Tone Sensation,
New York: Academic Press, 1976. - 212
- M. R. Portnoff, ``Implementation of the digital phase vocoder using the fast Fourier transform,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-24, pp. 243-248, June 1976.
- 213
- M. R. Portnoff, ``Time-frequency representation of digital signals and systems based on short-time Fourier analysis,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-28, pp. 55-69, Feb 1980.
- 214
- J. P. Princen and A. B. Bradley, ``Analysis/synthesis filter bank design based on time domain aliasing cancellation,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. 34, pp. 1153-1161, Oct. 1986.
- 215
- M. Puckette, ``Phase-locked vocoder,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), pp. Session 9a, Paper 3, IEEE Press, Oct. 18, 1995.
- 216
-
M. Puckette, Theory and Techniques of Electronic Music,
http://crca.ucsd.edu/~msp/techniques/latest/book-html/book.html, 2003. - 217
- H. Purnhagen and N. Meine, ``Hiln--the MPEG-4 parametric audio coding tools,'' in IEEE Proc. ISCAS 2000, Geneva, vol. 3, pp. 201-204, May 2000.
- 218
-
W. Putnam and J. O. Smith, ``Design of fractional delay filters using convex
optimization,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to
Audio and Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 1997,
https://ccrma.stanford.edu/~jos/resample/optfir.pdf. - 219
- T. Quatieri, S. H. Nawab, and J. S. Lim, ``Frequency sampling of the short-time Fourier-transform magnitude for signal reconstruction,'' Journal of the Optical Society of America, vol. 73, pp. 1523-1526, Nov 1983.
- 220
- T. F. Quatieri and T. E. Hanna, ``Time-scale modification with inconsistent constraints,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), pp. Session 10, Paper 2, IEEE Press, Oct. 18, 1995.
- 221
- T. F. Quatieri and R. J. McAulay, ``Magnitude-only reconstruction using a sinusoidal speech model,'' Proc. IEEE ICASSP-84, vol. 2, pp. 27.6.1-27.6.4, 1984.
- 222
- T. F. Quatieri and R. J. McAulay, ``Speech transformations based on a sinusoidal representation,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-34, pp. 1449-1464, Dec 1986.
- 223
- T. F. Quatieri and R. J. McAulay, ``Audio signal processing based on sinusoidal analysis/synthesis,'' in Applications of DSP to Audio & Acoustics (M. Kahrs and K. Brandenburg, eds.), pp. 343-416, Boston/Dordrecht/London: Kluwer Academic Publishers, 1998.
- 224
-
L. R. Rabiner and B. Gold, Theory and Application of Digital Signal
Processing,
Prentice-Hall, 1975. - 225
-
L. R. Rabiner and C. M. Rader, eds., Digital Signal Processing,
New York: IEEE Press, 1972. - 226
-
L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals,
Prentice-Hall, 1978. - 227
-
L. Rabiner, Fundamentals of Speech Recognition,
Prentice Hall, Apr. 1993. - 228
- A. Reilly, G. Frazer, and B. Boashash, ``Analytic signal generation--tips and traps,'' IEEE Transactions on Signal Processing, vol. 42, Nov 1994.
- 229
-
J. R. Rice, The Approximation of Functions, Volume I,
Reading MA: Addison-Wesley, 1964. - 230
- D. C. Rife and R. R. Boorstyn, ``Single-tone parameter estimation from discrete-time observations,'' IEEE Transactions on Information Theory, vol. IT-20, pp. 591-598, Sep. 1974.
- 231
- D. C. Rife and R. R. Boorstyn, ``Multiple tone parameter estimation from discrete-time observations,'' Bell System Technical Journal, vol. 55, pp. 1389-1410, Nov. 1976.
- 232
-
J.-C. Risset, ``Computer music experiments, 1964--
,'' Computer Music Journal, vol. 9, no. 1, pp. 11-18, 1985,
Reprinted in [234]. - 233
- J.-C. Risset and M. V. Mathews, ``Analysis of musical-instrument tones,'' Physics Today, vol. 22, no. 2, pp. 23-30, 1969.
- 234
-
C. Roads, ed., The Music Machine,
Cambridge, MA: MIT Press, 1989. - 235
-
C. Roads, The Computer Music Tutorial,
Cambridge, MA: MIT Press, 1996. - 236
-
C. Roads and J. Strawn, eds., Foundations of Computer Music,
Cambridge, MA: MIT Press, 1985. - 237
-
C. Roads, S. T. Pope, A. Piccialli, and G. De Poli, eds., Musical Signal
Processing,
Netherlands: Swets and Zietlinger, 1997. - 238
- A. Röbel, ``A new approach to transient processing in the phase vocoder,'' in Proceedings of the Conference on Digital Audio Effects (DAFx-03), Queen Mary, University of London, 2003.
- 239
-
X. Rodet and P. Depalle, ``Spectral envelopes and inverse FFT synthesis,''
Proc. 93rd Convention of the Audio Engineering Society, San Francisco,
1992,
Preprint 3393 (H-3). - 240
-
S. Roucos and A. M. Wilgus, ``High quality time scale modification for
speech,'' in Proceedings of the International Conference on Acoustics, Speech, and Signal
Processing, Tampa, Florida, pp. 493-496, Mar. 1985,
Introduces the SOLA (Synchronous OverLap-Add) technique for time scale modification. - 241
-
J. Salsman, ``Matlab programming/phase vocoder and encoder,'' 2007,
http://en.wikibooks.org/wiki/MATLAB_Programming/Phase_Vocoder_and_Encoder. - 242
- L. Savioja, V. Välimäki, and J. O. Smith, ``Audio signal processing using graphics processing units,'' Journal of the Audio Engineering Society, vol. 59, pp. 3-19, Jan. 2011.
- 243
-
R. W. Schafer and J. D. Markel, eds., Speech Analysis,
New York: IEEE Press, 1979. - 244
-
C. Schörkhuber and A. Klapuri, ``Constant-Q transform toolbox for music
processing,'' in Proc. 7th Sound and Music Conf (SMC-2010), 2010,
http://www.elec.qmul.ac.uk/people/anssik/cqt/smc2010.pdf. - 245
-
M. R. Schroeder, ``Vocoders: Analysis and synthesis of speech (a review of 30
years of applied speech research),'' Proceedings of the IEEE, vol. 54,
pp. 720-734, May 1966,
Reprinted in [243, pp. 352-366]. - 246
-
X. Serra, A System For Sound Analysis Transformation Synthesis Based on a
Deterministic Plus Stochastic Decomposition,
PhD thesis, Stanford University, 1989. - 247
- X. Serra, ``A computer model for bar percussion instruments,'' in Proceedings of the 1986 International Computer Music Conference, The Hague, pp. 257-262, Computer Music Association, searchable at http://quod.lib.umich.edu/i/icmc/, 1986.
- 248
- X. Serra, ``Musical sound modeling with sinusoids plus noise,'' in Musical Signal Processing (C. Roads, S. Pope, A. Picialli, and G. De Poli, eds.), pp. 91-122, The Netherlands: Swets & Zeitlinger Publishers, June 1997.
- 249
-
X. Serra and J. O. Smith, ``Spectral modeling synthesis: A sound
analysis/synthesis system based on a deterministic plus stochastic
decomposition,'' Computer Music Journal, vol. 14, no. 4,
pp. 12-24, 1990,
Software and recent papers URL: //www.iua.upf.es/~xserra/. - 250
- X. Serra and J. O. Smith, ``Sound-sheet examples for a sound analysis/synthesis system based on a deterministic plus stochastic decomposition,'' Computer Music Journal, vol. 15, pp. 86-87, Spring 1991.
- 251
- M. J. Shailer, B. C. J. Moore, B. R. Glasberg, and N. Watson, ``Auditory filter shapes at 8 and 10 kHz,'' Journal of the Acoustical Society of America, vol. 88, pp. 141-148, July 1990.
- 252
-
L. L. Sharf, Statistical Signal Processing, Detection, Estimation, and
Time Series Analysis,
Reading MA: Addison-Wesley, 1991. - 253
- G. Shi, X. Xie, X. Chen, and W. Zhong, ``Recent advances and new design method in nonuniform filter banks,'' in Proceedings of the IEEE International Conference on Comm., Circ. & Sys., vol. 1, pp. 211-215, June 2006.
- 254
-
D. J. Shpak and A. Antoniou, ``A generalized Remez method for the design of
FIR digital filters,'' IEEE Transactions on Circuits and Systems,
vol. CAS-37, pp. 161-174, Feb. 1990,
basis of firgr in the Matlab Filter Design Toolbox. - 255
-
M. Slaney, ``An efficient implementation of the Patterson-Holdsworth auditory
filter bank,'' Tech. Rep. 35, Apple Computer, Inc., 1993,
http://www.slaney.org/malcolm/apple/tr35/PattersonsEar.pdf. - 256
- D. Slepian and H. Pollak, ``Prolate-spheroidal wave functions, Fourier analysis and uncertainty--i,'' Bell System Technical Journal, vol. 40, pp. 34-64, Jan. 1961.
- 257
- A. Smirnov, ``Proto musique concrète--Russian futurism in the 10s and 20s, and early ideas of sonic art and the art of noises,'' in Inventionen 98: 50 Jahre Musique Concrète, Jagerstrasse 23, D-10117 Berlin: DAAD, 1998.
- 258
-
J. O. Smith, Techniques for Digital Filter Design and System
Identification with Application to the Violin,
PhD thesis, Elec. Engineering Dept., Stanford University (CCRMA), June 1983,
CCRMA Technical Report STAN-M-14, https://ccrma.stanford.edu/STANM/stanms/stanm14/. - 259
-
J. O. Smith, ``Physical modeling using digital waveguides,'' Computer Music Journal, vol. 16, no. 4, pp. 74-91, 1992,
special issue: Physical Modeling of Musical Instruments, Part I. https://ccrma.stanford.edu/~jos/pmudw/. - 260
-
J. O. Smith, Introduction to Matlab and Octave,
https://ccrma.stanford.edu/~jos/matlab/, 2003. - 261
-
J. O. Smith, Introduction to Digital Filters,
https://ccrma.stanford.edu/~jos/filters05/, September 2005. - 262
-
J. O. Smith, Physical Audio Signal Processing: Digital Waveguide Modeling
of Musical Instruments and Audio Effects,
https://ccrma.stanford.edu/~jos/pasp/, Dec. 2005. - 263
-
J. O. Smith, Introduction to Digital Filters with Audio Applications,
https://ccrma.stanford.edu/~jos/filters/, Sept. 2007,
online book. - 264
-
J. O. Smith, Mathematics of the Discrete Fourier Transform (DFT), with
Audio Applications, Second Edition,
https://ccrma.stanford.edu/~jos/mdft/, Apr. 2007,
online book. - 265
- J. O. Smith, ``Audio FFT filter banks,'' in Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), Como, Italy, September 1-4, Sept. 2009.
- 266
-
J. O. Smith, Physical Audio Signal Processing,
https://ccrma.stanford.edu/~jos/pasp/, Dec. 2010,
online book. - 267
-
J. O. Smith, Physical Audio Signal Processing,
https://ccrma.stanford.edu/~jos/pasp/, Dec. 2010,
W3K Publishing, online book. - 268
-
J. O. Smith and J. S. Abel, ``The Bark bilinear transform,'' in Proceedings of the
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz,
NY, (New York), IEEE Press, Oct. 1995,
Session 8, paper 6, 4 pages. https://ccrma.stanford.edu/~jos/gz/bbtmh.tgz. - 269
- J. O. Smith and J. S. Abel, ``Bark and ERB bilinear transforms,'' IEEE Transactions on Speech and Audio Processing, pp. 697-708, Nov. 1999.
- 270
-
J. O. Smith and P. Gossett, ``A flexible sampling-rate conversion method,'' in
Proc. 1984 Int. Conf. Acoustics, Speech, and Signal Processing
(ICASSP-84), San Diego, vol. 2, (New York), pp. 19.4.1-19.4.2, IEEE
Press, Mar. 1984,
expanded tutorial and associated free software available at the Digital Audio Resampling Home Page: https://ccrma.stanford.edu/~jos/resample/. - 271
-
J. O. Smith and X. Serra, ``PARSHL: A program for the analysis/synthesis of
inharmonic sounds based on a sinusoidal representation,'' in Proceedings of the
1987 International Computer Music Conference, Champaign-Urbana, Computer Music
Association, searchable at
http://quod.lib.umich.edu/i/icmc/,
1987,
Also available as Stanford Music Department Technical Report STAN-M-43. Expanded version available on-line at https://ccrma.stanford.edu/~jos/parshl/. - 272
- M. J. T. Smith and T. P. Barnwell, ``A unifying framework for analysis/synthesis systems based on maximally decimated filter banks,'' in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Tampa, Florida, (New York), pp. 521-524, IEEE Press, 1985.
- 273
-
A. Spanias, T. Painter, and V. Atti, Audio Signal Processing and Coding,
New York: John Wiley and Sons, Inc., 2007. - 274
- T. Sporer and K. Brandenburg, ``Constraints of filter banks used for perceptual measurement,'' Journal of the Audio Engineering Society, vol. 43, pp. 107-116, Mar. 1995.
- 275
-
K. Steiglitz, A Digital Signal Processing Primer with Applications to
Audio and Computer Music,
Reading MA: Addison-Wesley, 1996. - 276
-
S. S. Stevens and H. Davis, Hearing: Its Psychology and Physiology,
American Inst. of Physics, for the Acoustical Society of America, 1983,
copy of original 1938 edition, http://asa.aip.org/publications.html. - 277
- T. G. Stockham, ``High speed convolution and correlation,'' Proceedings of the April 26-28 Joint Computer Conference (AFIPS-66, Spring), pp. 129-133, 1966.
- 278
- H. W. Strube, ``Linear prediction on a warped frequency scale,'' Journal of the Acoustical Society of America, vol. 68, no. 4, pp. 1071-1076, 1980.
- 279
-
R. D. Strum and D. E. Kirk, First Principles of Discrete Systems and
Digital Signal Processing,
Reading MA: Addison-Wesley, 1988. - 280
- R. K. C. Tan and A. H. J. Lin, ``A time-scale modification algorithm based on the subband time-domain technique for broad-band signal applications,'' Journal of the Audio Engineering Society, vol. 46, pp. 437-449, May 2000.
- 281
- Tech. staff, ``ADEC subroutine description,'' Tech. Rep. 13201, General Electric Co., Syracuse, NY, June 1977.
- 282
-
H. Thornburg, On the Detection and Modeling of Transient Audio Signals
with Prior Information,
PhD thesis, Elec. Engineering Dept., Stanford University (CCRMA), September 2005. - 283
-
J. Treichler, ``Notes on the design of optimal fir filters,'' tech. rep.,
Connexions, Sept. 2009,
http://cnx.org/content/col10553/latest/. - 284
- J. M. Tribolet and R. E. Crochiere, ``Frequency-domain coding of speech,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-27, pp. 512-530, Oct. 1979.
- 285
- C. Truesdell, ``The rational mechanics of flexible or elastic bodies, 1638ÂÂ1788,'' in Leonardi Euleri Opera Omnia, Ser. 2, XI part 2, 1960.
- 286
- M. Unser, ``Splines: A perfect fit for signal and image processing,'' IEEE Signal Processing Magazine, vol. 16, pp. 22-38, Nov. 1999.
- 287
-
P. P. Vaidyanathan, Multirate Systems and Filter Banks,
Prentice-Hall, 1993. - 288
-
V. Välimäki and T. Tolonen, ``Development and calibration of a guitar
synthesizer,'' Journal of the Audio Engineering Society, vol. 46, Sept. 1998,
Proceedings of the 103rd Convention of the Audio Engineering Society, New York, Sept. 1997. - 289
-
(Various), Prime-Factor FFT Algorithm,
http://www.wikipedia.org/wiki/Prime-factor_FFT_algorithm, 2003. - 290
-
T. Verma, A Perceptually Based Audio Signal Model With Application to
Scalable Audio Compression,
PhD thesis, Elec. Engineering Dept., Stanford University (CCRMA), 2000. - 291
-
M. Vetterli and J. Kovacevic, Wavelets and Subband Coding,
Englewood Cliffs, NJ: Prentice-Hall, 1995. - 292
-
H. L. F. von Helmholtz, Die Lehre von den Tonempfindungen als
Physiologische Grundlage für die Theorie der Musik,
Braunschweig: F. Vieweg und Sohn, 1863. - 293
-
H. L. F. von Helmholtz, On the Sensations of Tone as a Physiological Basis
for the Theory of Music,
New York: Dover, 1954,
English translation of 1863 (German) edition. - 294
- R. F. Voss and J. Clarke, ```1/f noise' in music: Music from 1/f noise,'' Journal of the Acoustical Society of America, vol. 63, pp. 258-263, Jan. 1978.
- 295
- A. Wang, ``Instantaneous and frequency-warped techniques for source separation and signal parametrization,'' in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, (New York), IEEE Press, Oct. 1995.
- 296
-
P. D. Welch, ``The use of fast Fourier transforms for the estimation of power
spectra: A method based on time averaging over short modified periodograms,''
IEEE Transactions on Audio and Electroacoustics, vol. 15, pp. 70-73, 1967,
reprinted in [37] and [225]. - 297
- J. D. Wise, J. R. Caprio, and T. W. Parks, ``Maximum likelihood pitch estimation,'' IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-24, pp. 418-423, Oct. 1976.
- 298
- B. A. Wright, ``Auditory filter asymmetry at 2000 Hz in 80 normal-hearing ears,'' Journal of the Acoustical Society of America, vol. 100, pp. 1717-1721, Sept. 1996.
- 299
- M. Wright and J. O. Smith, ``Open-source matlab tools for interpolation of SDIF sinusoidal synthesis,'' in Proceedings of the 2005 International Computer Music Conference, Sept., Barcelona, Spain, pp. 632-635, Sept. 5-9 2005.
- 300
-
D. T. Yeh, Digital Implementation of Musical Distortion Circuits by
Analysis and Simulation,
PhD thesis, CCRMA, Elec. Engineering Dept., Stanford University (CCRMA), June 2009,
https://ccrma.stanford.edu/~dtyeh/. - 301
- M. M. J. Yekta, ``Equivalence of the Lagrange interpolator for uniformly sampled signals and the scaled binomially windowed shifted sinc function,'' Digital Signal Processing, vol. 19, pp. 838-842, Sept. 2009.
- 302
- Z. Zijing and Y. Yun, ``A simple design method for nonuniform cosine modulated filter banks,'' in IEEE International Symposium on Microwave, Antenna, Prop., & EMC Tech. for Wireless Comm., pp. 1052-1055, 2007.
- 303
-
U. Zölzer, ed., DAFX--Digital Audio Effects,
West Sussex, England: John Wiley and Sons, LTD, 2002,
http://www.dafx.de/. - 304
-
E. Zwicker and H. Fastl, Psychoacoustics, Facts and Models,
Berlin: Springer Verlag, 1990,
see also later 1999 edition. - 305
-
E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models,
Berlin: Springer Verlag, 1999,
second updated edition, 80pp., CD-ROM/softcover. - 306
- E. Zwicker and B. Scharf, ``A model of loudness summation,'' Psych. Rev., vol. 72, pp. 3-26, 1965.
- 307
- E. Zwicker and E. Terhardt, ``Analytical expressions for critical band rate and critical bandwidth as a function of frequency,'' Journal of the Acoustical Society of America, vol. 68, pp. 1523-1525, 1980.
Index for this Document
- filter design
- optimal least-squares impulse response : 5.3
- absolutely integrable : 3.2.1
- acyclic convolution : 3.3.5
- acyclic FFT convolution : 9.1.2
- additive synthesis : 6 | 11.4.1 | 20 | 20.8 | 21.6
- admissibility condition, wavelets : 12.9.1.6
- alias component matrix : 12.3.8
- aliased sinc function : 4.1
- aliasing cancellation : 12.3
- aliasing components : 3.3.12
- aliasing theorem for the DTFT : 3.3.12
- aliasing, time domain : 9.1.4.3
- allpass filter : 12.5.1
- amplitude envelope : 11.4 | 20.10.1
- analysis modulation matrix : 12.3.8
- analytic signal : 5.6 | 5.6.1.1 | 6.1 | 20.10.1
- applications of the STFT : 11
- asinc function : 4.1
- audio
- filter banks : 11.7
- spectrogram : 8.3
- spectrogram hop size : 8.3.2.1
- spectrogram : 8.3
- auditory filter bank : 8.3.1
- auditory filter shape : 8.3.3.4
- autocorrelation : 3.3.7
- autocorrelation computation : 7.9
- autocorrelation function : 16.2.3
- autocorrelation method of linear prediction : 11.3.2.2
- bandpass filter : 5.6
- Bark frequency scale : 18.1
- Bark warping : 18.3
- Bartlett window : 4.5
- baseband signal : 10.1.2
- basis signals : 12.9.1
- bias : 6.6.2
- biased autocorrelation : 7
- biased sample autocorrelation : 7.6
- bilinear transform : 18.2
- bin number : 8.1.3
- Blackman window : 4.3.1
- Blackman window matlab example : 19.1.1
- Blackman-Harris window : 4.3.4
- family : 4.3
- bounded variation : 15.18
- breakpoints : 20.10.1.2
- brown noise : 7.14
- Burg's method : 11.3.2.2
- central limit theorem : 17.9.1
- cepstral windowing : 11.3.1
- cepstrum : 5.8
- cepstrum, causal : 5.9
- channel vocoder : 20.5
- characteristic function : 17.12.4
- Chebyshev bandpass filter design : 5.5.2.2
- Chebyshev FIR filters : 5.10.2
- Chebyshev optimal windows : 4.13.2
- Chebyshev polynomials : 4.10.4.1
- Chebyshev window : 4.10
- by linear programming : 4.13
- chirp signal : 10.2.1 | 11.6
- chirp, Gaussian-windowed : 11.6
- chirplet : 11.6 | 11.6
- chirplet frequency estimation : 11.6.3.1
- chirplet signal modeling : 20.8.2
- chirplet, spectrum : 11.6.1
- circular convolution : 9.1
- coherent addition of signals : 7.15
- COLA (constant overlap-add) : 8.1.1
- COLA constraint : 9.2.1
- COLA constraint, frequency domain : 9.3.2
- COLA dual : 9.3
- colored noise : 7.14
- complex demodulation : 10.3.2
- complex signal modulation : 10.3.2
- compression : 12
- confidence interval : 16.3.3
- confidence level : 16.3.3
- conjugate quadrature filters : 12.3.7
- constant overlap-add : 21.1
- constant overlap-add window : 8.1.1 | 9
- constant-overlap-add : 9.2.1
- constant-Q filter banks : 11.7.1
- constant-Q Fourier transform : 12.9.1.6
- continuous Fourier theorems : 3.4 | 15
- continuous wavelet transform : 12.9.1.6
- convolution : 3.3.5 | 9.1
- acyclic : 9.1.2
- acyclic in matlab : 9.1.2.1
- continuous time : 15.7
- cyclic : 9.1.1
- cyclic, or circular : 9.1
- FFT overlap-add in matlab : 9.2.5
- FFT, overlap-add : 9.2
- in matlab : 9.1.3
- short signals : 9.1
- acyclic in matlab : 9.1.2.1
- convolution theorem : 3.3.5 | 3.3.5 | 15.7
- correlation : 3.3.6
- correlation analysis : 16.2
- correlation theorem : 3.3.6 | 3.3.7
- covariance : 7.4
- covariance lattice methods : 11.3.2.2
- covariance method, linear prediction : 11.3.2.2
- critical band of hearing : 8.3.2
- critical downsampling : 20.10.1.2
- cross-correlation : 16.2.1
- cross-power spectral density : 16.2.2 | 16.2.2
- cross-synthesis : 11.2
- cubic phase interpolation : 11.4.2.1
- cubic splines : 5.7
- cut-off frequency : 5.1
- cycles per second : 15.1
- cyclic autocorrelation : 7.8
- cyclic convolution : 9.1
- cyclic FFT convolution : 9.1.1
- dc sampling filter : 10.3.1
- decimation operator : 12.1.2
- deconvolution : 9.1.2
- delta function : 15.10
- demodulation, complex : 10.3.2
- demos : 11.9
- denoising : 7.1.1
- deterministic : 6.7.2
- deterministic part : 11.4.3.2
- detrend : 7.9
- DFT filter bank : 10.3 | 10.3.4.2
- differentiation theorem : 15.2 | 15.18
- differentiation theorem dual, DTFT : 3.3.13
- differentiation theorem dual, FT : 15.3
- digital filter design : see filter designtextbf
- digital prolate spheroidal sequence : 4.8
- window : 4.8
- Dirichlet function : 4.1
- discrete time Fourier transform (DTFT) : 3.1
- discrete wavelet filterbank : 12.9.1.8 | 12.9.1.8
- discrete wavelet transform : 12.9.1.7
- discrete-time Fourier transform : see DTFTtextbf
- Dolph window : 4.10
- Dolph-Chebyshev window : 4.10
- comparison to Hamming : 4.10.3
- definition : 4.10.4.2
- length computation : 4.10.4.4
- main-lobe width : 4.10.4.3
- theory : 4.10.4
- definition : 4.10.4.2
- double-factorial : 17.12.3
- downsampling : 3.3.12
- downsampling (decimation) operator : 12.1.2
- DPSS : see digital prolate spheroidal sequencetextbf
- DPSS window : 4.8
- DTFT definition : 3.3
- DTFT Fourier theorems : 3.3
- aliasing theorem : 3.3.12
- convolution theorem : 3.3.5
- correlation theorem : 3.3.6
- downsampling theorem : 3.3.12
- energy theorem : 3.3.8
- even symmetry : 3.3.3.2
- linearity : 3.3.1
- power theorem : 3.3.8
- real signals : 3.3.3.1
- repeat operator : 3.3.10
- repeat theorem : 3.3.11
- scaling operator : 3.3.10
- scaling theorem : 3.3.11
- shift theorem : 3.3.4
- stretch operator : 3.3.9
- stretch theorem : 3.3.11
- symmetry : 3.3.3
- time reversal : 3.3.2
- convolution theorem : 3.3.5
- duality, Fourier : 9.3 | 10.5
- Durbin recursion : 11.3.2.3
- dyadic filter bank : 11.7.1 | 12.9.1.9
- dyadic wavelet filter bank : 12.9.1.9
- effective length of a window : 6.5.3
- energy theorem : 3.3.8
- ensemble average : 16.1.6
- entropy : 17.11.1 | 17.11.1
- envelope break-points : 11.4.2.1
- envelope follower : 8.3.3.7 | 20.10.1
- equation error : 18.3.1
- equiripple : 5.3.1
- equivalent rectangular bandwidth : 18.5
- ergodic : 16.1.6
- estimator variance : 16.3.3
- excitation pattern : 8.3.1 | 8.3.2 | 8.3.3.2
- expected value : 16.1.6 | 16.1.6 | 16.3
- exponential polynomial signal : 11.6
- exponential window : 4.6
- extended lapped transforms : 12.7.2
- extremal frequencies : 5.10.3
- F0 : see fundamental frequencytextbf
- F0 estimation : 11.1
- fast Fourier transform : see FFTtextbf
- FBS : see filter-bank summationtextbf
- FFT
- fftshift utility in matlab : 3.5.4.1
- filter
- filter bank
- auditory : 8.3.1
- DFT : 10.3
- discrete wavelet : 12.9.1.8
- FFT based : 11.7
- Haar : 12.3.3
- multirate : 12
- paraunitary : 12.5
- perfect reconstruction : 12.3
- polyphase : 12.1.3
- Princen-Bradley : 12.7.2
- pseudo-QMF : 12.7.1
- wavelet : 12.9
- DFT : 10.3
- filter design : 5
- frequency-sampling method : 5.4
| 5.6.2.3
- Hilbert transform filter : 5.6 | 5.6.2
- least-squares, linear-phase : 5.10.3
- linear programming : 4.13
- nonlinear-phase : 5.10.6
- nonparametric : 5.6.3
- optimal methods : 5.10
- specifications : 5.2
- window method : 5.5
- window method example : 5.5.2
- Hilbert transform filter : 5.6 | 5.6.2
- filter-bank interpretation of STFT : 10
- filtered white noise : 7.14 | 7.14
- finite support : 7.6
- FIR (finite impulse response) filter : 5.5
- FIR filter design : see filter designtextbf
- first-order moment : 17.12.1
- flip operator : 15.8
- floor function : 7.13
- FM : see frequency modulationtextbf
- formants : 8.2.1
- Fourier dual : 3.5 | 10.5
- Fourier theorems
- continuous time : 3.4
| 15
- discrete time : 3.3
- DTFT : see DTFT Fourier theoremstextbf
- differentiation dual : 3.3.13
- FT
- differentiation dual : 15.3
- discrete time : 3.3
- Fourier theorems (continuous time)
- convolution theorem : 15.7
- differentiation : 15.2
- flip theorem : 15.8
- Gaussian pulse : 15.11
- impulse train : 15.14
- modulation theorem : 15.6
- power theorem : 15.9
- rectangular pulse : 15.12
- sampling theorem : 15.16
- scaling or similarity : 15.4
- shift theorem : 15.5
- uncertainty principle : 15.17
- differentiation : 15.2
- Fourier transform
- frame (of data) : 8.1.2
- frequency envelopes : 20.10
- frequency modulation : 20.9
- brass synthesis : 20.9.2
- modulation index : 20.9.1
- operator : 20.9.3
- spectra : 20.9.1
- synthesis : 20.9
- voice synthesis : 20.9.3
- modulation index : 20.9.1
- frequency resolution : 6.4.1 | 6.5.2
- frequency sampling method for FIR digital filter design : 5.4
- frequency scaling : 11.5 | 11.5
- frequency trajectories : 11.4.2.3
- frequency warping
- fundamental frequency estimation : 11.1
- Gaussian : 17
- characteristic function : 17.12.5
- closure under convolution : 17.3
- closure under multiplication : 17.2
- complex integral : 17.7
- distribution : 17.11.3.3
- Fourier transform : 17.8
- integral : 17.6.1
- maximum entropy property : 17.11
- moments : 17.12
- probability density : 17.10
- pulse : 15.11
- random variable, closed under addition : 17.13
- window : 4.11 | 17.1
- window transform : 4.11.2
- closure under convolution : 17.3
- Gaussian-windowed chirp : 11.6
- generalized function : 15.10
- generalized Hamming window : 4.2
- generalized Hamming window family : 4.2.5
- generalized STFT : 12.9.1.11
- geometric signal theory : 12.9.1
- Gibbs phenomenon : 4.1.1
- glossary of notation : 14
- graphic equalizer : 5.7 | 9.3.3
- graphical convolution : 9.1
- group-additive synthesis : 20.8.4.2
- Haar filter bank : 12.3.3
- Hamming window : 4.2.3
- comparison to Chebyshev : 4.10.3
- Hammond organ : 20.4
- Hann window : 4.2.1 | 4.2.1
- Hann-Poisson window : 4.7
- hanning window : 4.2.1
- harmonic : 6.5.3
- harmonic comb : 11.1.2
- Heisenberg uncertainty principle : 15.17.1
- Hermitian : 3.3.3.1
- Hermitian spectrum : 5.6
- heterodyne-comb : 20.11.1
- Hilbert space : 12.9.1
- Hilbert transform : 5.6.1
- Hilbert transform filter design : 5.6
- Hilbert transform kernel : 5.6.1.1 | 5.6.1.1
- history of spectral modeling : 20
- hop size : 7.12 | 8.3.2.1 | 9.2.1
- hop size (STFT) : 8.1.3
- ideal lowpass filter : 5.5
- identity system : 20.10.1.3
- impulse train : 15.14
- impulse, continuous time : 15.10
- impulse, sinc : 15.13
- independent events : 16.1.2 | 16.3.1
- independent random variables : 16.3.1
- inner product : 3.3.8 | 15.9
- innovations sequence : 11.3.2
- instantaneous amplitude : 20.10.1
- instantaneous frequency : 20.10.1
- instantaneous loudness : 8.3.2
- instantaneous phase : 20.10.1
- interpolation
- interpolation kernel : 3.5.2 | 8.3.3.3
- inverse filter : 11.3.2
- inverse-FFT synthesis : 20.8.1 | 20.11.3
- Kaiser window : 4.9
- beta parameter : 4.9.1
- Kaiser-Bessel window : 4.9
- lagged product : 7.4
- Laurent expansion : 5.9 | 5.9
- least squares estimation : 6.7.1
- likelihood function : 6.7.3
- linear inequality constraints : 5.10.4
- linear least squares : 6.7.1.1
- linear objective : 5.10.4
- linear phase : 9.1.4.2
- linear phase term : 3.3.4
- linear prediction
- autocorrelation method : 11.3.2.2
- covariance method : 11.3.2.2
- peak sensitivity : 11.3.2.1
- spectral envelope : 11.3.2
- covariance method : 11.3.2.2
- linear programming : 4.13 | 4.13.1 | 5.10.4
- linearity of the DTFT : 3.3.1
- lossless filter : 12.5.1
- lossless filter examples : 12.5.2
- lossless transfer function matrix : 12.5.1
- loudness : 8.3 | 8.3.1