## Optimization of Synthesis Oversampled Complex Filter Banks

An important issue with oversampled FIR analysis filter banks (FBs) is to determine inverse synthesis FBs, when they exist. Given any complex oversampled FIR analysis FB, we first provide an algorithm to determine whether there exists an inverse FIR synthesis system. We also provide a method to ensure the Hermitian symmetry property on the synthesis side, which is serviceable to processing real-valued signals. As an invertible analysis scheme corresponds to a redundant decomposition, there is no unique inverse FB. Given a particular solution, we parameterize the whole family of inverses through a null space projection. The resulting reduced parameter set simplifies design procedures, since the perfect reconstruction constrained optimization problem is recast as an unconstrained optimization problem. The design of optimized synthesis FBs based on time or frequency localization criteria is then investigated, using a simple yet efficient gradient algorithm.

## Hybrid Floating Point Technique Yields 1.2 Gigasample Per Second 32 to 2048 point Floating Point FFT in a single FPGA

Hardware Digital Signal Processing, especially hardware targeted to FPGAs, has traditionally been done using fixed point arithmetic, mainly due to the high cost associated with implementing floating point arithmetic. That cost comes in the form of increased circuit complexity. The increase circuit complexity usually also degrades maximum clock performance. Certain applications demand the dynamic range offered by floating point hardware, and yet require the speeds and circuit density usually associated with fixed point hardware. The Fourier transform is one DSP building block that frequently requires floating point dynamic range. Textbook construction of a pipelined floating point FFT engine capable of continuous input entails dozens of floating point adders and multipliers. The complexity of those circuits quickly exceeds the resources available on a single FPGA. This paper describes a technique that is a hybrid of fixed point and floating point operations designed to significantly reduce the overhead for floating point. The results are illustrated with an FFT processor that performs 32, 64, 128, 256, 512, 1024 and 2048 point Fourier transforms with IEEE single precision floating point inputs and outputs. The design achieves sufficient density to realize a continuous complex data rate of 1.2 Gigasamples per second data throughput using a single Virtex4-SX55-10 device.

## Efficient Digital Fiilters

What would you do in the following situation? Let ’ s say you are diagnosing a DSP system problem in the field. You have your trusty laptop with your development system and an emulator. You figure out that there was a problem with the system specifications and a symmetric FIR filter in the software won ’ t do the job; it needs reduced passband ripple, or maybe more stopband attenuation. You then realize you don ’ t have any filter design software on the laptop, and the customer is getting angry. The answer is easy: You can take the existing filter and sharpen it. Simply stated, filter sharpening is a technique for creating a new filter from an old one [1] – [3] . While the technique is almost 30 years old, it is not generally known by DSP engineers nor is it mentioned in most DSP textbooks.

## An application of neural networks to adaptive playout delay in VoIP

The statistical nature of data traffic and the dynamic routing techniques employed in IP networks results in a varying network delay (jitter) experienced by the individual IP packets which form a VoIP flow. As a result voice packets generated at successive and periodic intervals at a source will typically be buffered at the receiver prior to playback in order to smooth out the jitter. However, the additional delay introduced by the playout buffer degrades the quality of service. Thus, the ability to forecast the jitter is an integral part of selecting an appropriate buffer size. This paper compares several neural network based models for adaptive playout buffer selection and in particular a novel combined wavelet transform/neural network approach is proposed. The effectiveness of these algorithms is evaluated using recorded VoIP traces by comparing the buffering delay and the packet loss ratios for each technique. In addition, an output speech signal is reconstructed based on the packet loss information for each algorithm and the perceptual quality of the speech is then estimated using the PESQ MOS algorithm. Simulation results indicate that proposed Haar-Wavelets-Packet MLP and Statistical-Model MLP adaptive scheduling schemes offer superior performance.

## HIERARCHICAL MOTION ESTIMATION FOR EMBEDDED OBJECT TRACKING

This paper presents an algorithm developed to provide automatic motion detection and object tracking embedded within intelligent CCTV systems. The algorithm development focuses on techniques which provide an efficient embedded systems implementation with the ability to target both FPGA and DSP devices. During algorithm development constraints on hardware implementation have been fully considered resulting in an algorithm which, when targeted at current FPGA devices, will take full advantage of the DSP resource commonly provided in such devices. The hierarchical structure of the proposed algorithm provides the system with a multi-level motion estimation process allowing low resolution estimation for motion detection and further higher resolution stages for motion estimation. An initial MATLAB prototype has demonstrated this algorithm capable of object motion estimation while compensating for camera motion, allowing a moving object to be tracked by a moving camera.

## An FPGA Implementation of Hierarchical Motion Estimation for Embedded Oject Tracking

This paper presents the hardware implementation of an algorithm developed to provide automatic motion detection and object tracking functionality embedded within intelligent CCTV systems. The implementation is targeted at an Altera Stratix FPGA making full use of the dedicated DSP resource. The Altera Nios embedded processor provides a platform for the tracking control loop and generic Pan Tilt Zoom camera interface. This paper details the explicit functional stages of the algorithm that lend themselves to an optimised pipelined hardware implementation. This implementation provides maximum data throughput, providing real-time operation of the described algorithm, and enables a moving camera to track a moving object in real time.

## Hidden Markov Model based recognition of musical pattern in South Indian Classical Music

Automatic recognition of musical patterns plays a crucial part in Musicological and Ethno musicological research and can become an indispensable tool for the search and comparison of music extracts within a large multimedia database. This paper finds an efficient method for recognizing isolated musical patterns in a monophonic environment, using Hidden Markov Model. Each pattern, to be recognized, is converted into a sequence of frequency jumps by means of a fundamental frequency tracking algorithm, followed by a quantizer. The resulting sequence of frequency jumps is presented to the input of the recognizer which use Hidden Markov Model. The main characteristic of Hidden Markov Model is that it utilizes the stochastic information from the musical frame to recognize the pattern. The methodology is tested in the context of South Indian Classical Music, which exhibits certain characteristics that make the classification task harder, when compared with Western musical tradition. Recognition of 100% has been obtained for the six typical music pattern used in practise. South Indian classical instrument, flute is used for the whole experiment.

## Design and implementation of odd-order wave digital lattice lowpass filters, from specifications to Motorol DSP56307EVM module

This thesis is dedicated to applying and developing explicit formulas for the design and implementation of odd-order lattice Lowpass wave digital filters (WDFs) on a Digital Signal Processor (DSP), such as a Motorola DSP56307EVM (Evaluation Module). The direct design method of Gazsi for filter types such as Butterworfh, Chebyshev, inverse Chebyshev, and Cauer (Elliptic) provides a straightforward method for calculating the coefficients without an extensive knowledge of digital signal processing. A program package to design and implement odd-order WDFs, including detailed procedures and examples, is presented in this thesis and includes not only the calculations of the coefficients, but also the simulation on a MATLAB platform and an implementation on a Motorola DSP56307EVM board. It is very quick, effective and convenient to obtain the coefficients when the user enters a few parameters according to the general specifications; to verify the characteristics of the designed filter; to simulate the filter on the MATLAB platform; to implement the filter on the DSP board; and to compare the results between the simulation and the implementation.

## Image Analysis Using a Dual-Tree M-Band Wavelet Transform

We propose a 2D generalization to the M-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets. We particularly address (i) the construction of the dual basis and (ii) the resulting directional analysis. We also revisit the necessary pre-processing stage in the M-band case. While several reconstructions are possible because of the redundancy of the representation, we propose a new optimal signal reconstruction technique, which minimizes potential estimation errors. The effectiveness of the proposed M- band decomposition is demonstrated via denoising comparisons on several image types (natural, texture, seismics), with various M-band wavelets and thresholding strategies. Signicant improvements in terms of both overall noise reduction and direction preservation are observed.

## Energy Profiling of DSP Applications, A Case Study of an Intelligent ECG Monitor

Proper balance of power and performance for optimum system organization requires precise profiling of the power consumption of different hardware subsystems as well as software functions. Moreover, power consumption of mobile systems is even more important, since the battery is a large portion of the overall size and weight of the system. Average power consumption is only a crude estimate of power requirements and battery life; a much better estimate can be made using dynamic power consumption. Dynamic power consumption is a function of the execution profile of the given application running on specific hardware platform. In this paper we introduce a new environment for energy profiling of DSP applications. The environment consists of a JTAG emulator, a high-resolution HP 3583A multimeter and a workstation that controls devices and stores the traces. We use Texas Instruments’ Real Time Data Exchange mechanism (RTDXÔ) to generate an execution profile and custom procedures for energy profile data acquisition using GPIB interface. We developed custom procedures to correlate and analyze both energy and execution profiles. The environment allows us to improve the system power consumption through changes in software organization and to measure real battery life for the given hardware, software and battery configuration. As a case study, we present the analysis of a real-time portable ECG monitor implemented using a Texas Instruments TMS320C5410-100 processor board, and a Del Mar PWA ECG Amplifier.

## Update To: A Wide-Notch Comb Filter

This article presents alternatives to the wide-notch comb filter described in Reference [1].

## Filter a Rectangular Pulse with no Ringing

To filter a rectangular pulse without any ringing, there is only one requirement on the filter coefficients: they must all be positive. However, if we want the leading and trailing edge of the pulse to be symmetrical, then the coefficients must be symmetrical. What we are describing is basically a window function.

## LOW-RESOURCE DELAYLESS SUBBAND ADAPTIVE FILTER USING WEIGHTED OVERLAP-ADD

A delayless structure targeted for low-resource implementation is proposed to eliminate filterbank processing delays in subband adaptive filters (SAFs). Rather than using direct IFFT or polyphase filterbanks to transform the SAFs back into the time-domain, the proposed method utilizes a weighted overlap-add (WOLA) synthesis. Low-resource real-time implementations are targeted and as such do not involve long (as long as the echo plant) FFT or IFFT operations. Also, the proposed approach facilitates time distribution of the adaptive filter reconstruction calculations crucial for efficient real-time and hardware implementation. The method is implemented on an oversampled WOLA filterbank employed as part of an echo cancellation application. Evaluation results demonstrate that the proposed implementation outperforms conventional SAF systems since the signals used in actual adaptive filtering are not distorted by filterbank aliasing. The method is a good match for partial update adaptive algorithms since segments of the time-domain adaptive filter are sequentially reconstructed and updated.

## EFFICIENT MAPPING OF ADVANCED SIGNAL PROCESSING ALGORITHMS ON MULTI-PROCESSOR ARCHITECTURES

Modern microprocessor technology is migrating from simply increasing clock speeds on a single processor to placing multiple processors on a die to increase throughput and power performance in every generation. To utilize the potential of such a system, signal processing algorithms have to be efficiently parallelized so that the load can be distributed evenly among the multiple processing units. In this paper, we study several advanced deterministic and stochastic signal processing algorithms and their computation using multiple processing units. Specifically, we consider two commonly used time-frequency signal representations, the short-time Fourier transform and the Wigner distribution, and we demonstrate their parallelization with low communication overhead. We also consider sequential Monte Carlo estimation techniques such as particle filtering, and we demonstrate that its multiple processor implementation requires large data exchanges and thus a high communication overhead. We propose a modified mapping scheme that reduces this overhead at the expense of a slight loss in accuracy, and we evaluate the performance of the scheme for a state estimation problem with respect to accuracy and scalability.

## Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor

The Division of Computer Engineering at Linköping's university is currently researching the possibility to create a highly parallel DSP platform, that can keep up with the computational needs of upcoming standards for various applications, at low cost and low power consumption. The architecture is called ePUMA and it combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor will act as the main processor for general tasks and execution control, while the co-processors will accelerate computing intensive and parallel DSP kernels.This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse, that run on a single co-processor. The kernels will then be evaluated to find possible problems with the co-processors' microarchitecture and suggest solutions to the problems that might exist. The evaluation shows that the performance potential is very good, but a few problems have been identified, that causes significant overhead in the kernels. Pipeline mismatches, that occurs due to different pipeline lengths for different instructions, causes pipeline hazards and the current solution to this, doesn't allow effective use of the pipeline. In some cases, the single port memories will cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. Also, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slow.

## Auditory Component Analysis Using Perceptual Pattern Recognition to Identify and Extract Independent Components From an Auditory Scene

The cocktail party effect, our ability to separate a sound source from a multitude of other sources, has been researched in detail over the past few decades, and many investigators have tried to model this on computers. Two of the major research areas currently being evaluated for the so-called sound source separation problem are Auditory Scene Analysis (Bregman 1990) and a class of statistical analysis techniques known as Independent Component Analysis (Hyvärinen 2001). This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly, measures features of the resulting tracks, and finally separates sounds statistically by matching feature sets and making the output streams statistically independent. Artificial and acoustical mixes of sounds are used to evaluate the signal-to-noise ratio where the signal is the desired source and the noise is comprised of all other sources. The proposed system is found to successfully separate audio streams. The amount of separation is inversely proportional to the amount of reverberation present.

## Optimization of Synthesis Oversampled Complex Filter Banks

An important issue with oversampled FIR analysis filter banks (FBs) is to determine inverse synthesis FBs, when they exist. Given any complex oversampled FIR analysis FB, we first provide an algorithm to determine whether there exists an inverse FIR synthesis system. We also provide a method to ensure the Hermitian symmetry property on the synthesis side, which is serviceable to processing real-valued signals. As an invertible analysis scheme corresponds to a redundant decomposition, there is no unique inverse FB. Given a particular solution, we parameterize the whole family of inverses through a null space projection. The resulting reduced parameter set simplifies design procedures, since the perfect reconstruction constrained optimization problem is recast as an unconstrained optimization problem. The design of optimized synthesis FBs based on time or frequency localization criteria is then investigated, using a simple yet efficient gradient algorithm.