A NEW PARALLEL IMPLEMENTATION FOR PARTICLE FILTERS AND ITS APPLICATION TO ADAPTIVE WAVEFORM DESIGN
Sequential Monte Carlo particle filters (PFs) are useful for estimating nonlinear non-Gaussian dynamic system parameters. As these algorithms are recursive, their real-time implementation can be computationally complex. In this paper, we analyze the bottlenecks in existing parallel PF algorithms, and we propose a new approach that integrates parallel PFs with independent Metropolis-Hastings (PPF-IMH) algorithms to improve root mean-squared estimation error performance. We implement the new PPF-IMH algorithm on a Xilinx Virtex-5 field programmable gate array (FPGA) platform. For a onedimensional problem and using 1,000 particles, the PPF-IMH architecture with four processing elements utilizes less than 5% Virtex-5 FPGA resources and takes 5.85 μs for one iteration. The algorithm performance is also demonstrated when designing the waveform for an agile sensing application.
Digital Signal Processor Fundamentals and System Design
Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years and have largely contributed to the evolution towards digital technology of many accelerator systems, such as machine protection, diagnostics and control of beams, power supply and motors. This paper aims at familiarising the reader with DSP fundamentals, namely DSP characteristics and processing development. Several DSP examples are given, in particular on Texas Instruments DSPs, as they are used in the DSP laboratory companion of the lectures this paper is based upon. The typical system design flow is described; common difficulties, problems and choices faced by DSP developers are outlined; and hints are given on the best solution.
Filter a Rectangular Pulse with no Ringing
To filter a rectangular pulse without any ringing, there is only one requirement on the filter coefficients: they must all be positive. However, if we want the leading and trailing edge of the pulse to be symmetrical, then the coefficients must be symmetrical. What we are describing is basically a window function.
An application of neural networks to adaptive playout delay in VoIP
The statistical nature of data traffic and the dynamic routing techniques employed in IP networks results in a varying network delay (jitter) experienced by the individual IP packets which form a VoIP flow. As a result voice packets generated at successive and periodic intervals at a source will typically be buffered at the receiver prior to playback in order to smooth out the jitter. However, the additional delay introduced by the playout buffer degrades the quality of service. Thus, the ability to forecast the jitter is an integral part of selecting an appropriate buffer size. This paper compares several neural network based models for adaptive playout buffer selection and in particular a novel combined wavelet transform/neural network approach is proposed. The effectiveness of these algorithms is evaluated using recorded VoIP traces by comparing the buffering delay and the packet loss ratios for each technique. In addition, an output speech signal is reconstructed based on the packet loss information for each algorithm and the perceptual quality of the speech is then estimated using the PESQ MOS algorithm. Simulation results indicate that proposed Haar-Wavelets-Packet MLP and Statistical-Model MLP adaptive scheduling schemes offer superior performance.
Code Acquisition using Smart Antennas with Adaptive Filtering Scheme for DS-CDMA Systems
Pseudo-noise (PN) code synchronizer is an essential element of direct-sequence code division multiple access (DS-CDMA) system because data transmission is possible only after the receiver accurately synchronizes the locally generated PN code with the incoming PN code. The code synchronization is processed in two steps, acquisition and tracking, to estimate the delay offset between the two codes. Recently, the adaptive LMS filtering scheme has been proposed for performing both code acquisition and tracking with the identical structure, where the LMS algorithm is used to adjust the FIR filter taps to search for the value of delay-offset adaptively. A decision device is employed in the adaptive LMS filtering scheme as a decision variable to indicate code synchronization, hence it plays an important role for the performance of mean acquisition time (MAT). In this thesis, only code acquisition is considered. In this thesis, a new decision device, referred to as the weight vector square norm (WVSN) test method, is devised associated with the adaptive LMS filtering scheme for code acquisition in DS-CDMA system. The system probabilities of the proposed scheme are derived for evaluating MAT. Numerical analyses and simulation results verify that the performance of the proposed scheme, in terms of detection probability and MAT, is superior to the conventional scheme with mean-squared error (MSE) test method, especially when the signal-to-interference-plus-noise ratio (SINR) is relatively low. Furthermore, an efficient and joint-adaptation code acquisition scheme, i.e., a smart antenna coupled with the proposed adaptive LMS filtering scheme with the WVSN test method, is devised for applying to a base station, where all antenna elements are employed during PN code acquisition. This new scheme is a process of PN code acquisition and the weight coefficients of smart antenna jointly and adaptively. Numerical analyses and simulation results demonstrate that the performance of the proposed scheme with five antenna elements, in terms of the output SINR, the detection probability and the MAT, can be improved by around 7 dB, compared to the one with single antenna case.
Real Time Implementation of Multi-Level Perfect Signal Reconstruction Filter Bank
Discrete Wavelet Transform (DWT) is an efficient tool for signal and image processing applications which has been utilized for perfect signal reconstruction. In this paper, twenty seven optimum combinations of three different wavelet filter types, three different filter reconstruction levels and three different kinds of signal for multi-level perfect reconstruction filter bank were implemented in MATLAB/Simulink. All the filters for different wavelet types were designed using Filter Design Analysis (FDA) and Wavelet toolbox. Signal to Noise Ratio (SNR) was calculated for each combination. Combination with best SNR was then implemented on TMS320C6713 DSP kit. Real time testing of perfect reconstruction on DSP kit was then carried out by two different methods. Experimental results accede with theory and simulations.
BLAS Comparison on FPGA, CPU and GPU
High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.
Energy Profiling of DSP Applications, A Case Study of an Intelligent ECG Monitor
Proper balance of power and performance for optimum system organization requires precise profiling of the power consumption of different hardware subsystems as well as software functions. Moreover, power consumption of mobile systems is even more important, since the battery is a large portion of the overall size and weight of the system. Average power consumption is only a crude estimate of power requirements and battery life; a much better estimate can be made using dynamic power consumption. Dynamic power consumption is a function of the execution profile of the given application running on specific hardware platform. In this paper we introduce a new environment for energy profiling of DSP applications. The environment consists of a JTAG emulator, a high-resolution HP 3583A multimeter and a workstation that controls devices and stores the traces. We use Texas Instruments’ Real Time Data Exchange mechanism (RTDXÔ) to generate an execution profile and custom procedures for energy profile data acquisition using GPIB interface. We developed custom procedures to correlate and analyze both energy and execution profiles. The environment allows us to improve the system power consumption through changes in software organization and to measure real battery life for the given hardware, software and battery configuration. As a case study, we present the analysis of a real-time portable ECG monitor implemented using a Texas Instruments TMS320C5410-100 processor board, and a Del Mar PWA ECG Amplifier.
An FPGA Implementation of Hierarchical Motion Estimation for Embedded Oject Tracking
This paper presents the hardware implementation of an algorithm developed to provide automatic motion detection and object tracking functionality embedded within intelligent CCTV systems. The implementation is targeted at an Altera Stratix FPGA making full use of the dedicated DSP resource. The Altera Nios embedded processor provides a platform for the tracking control loop and generic Pan Tilt Zoom camera interface. This paper details the explicit functional stages of the algorithm that lend themselves to an optimised pipelined hardware implementation. This implementation provides maximum data throughput, providing real-time operation of the described algorithm, and enables a moving camera to track a moving object in real time.
Optimization of Synthesis Oversampled Complex Filter Banks
An important issue with oversampled FIR analysis filter banks (FBs) is to determine inverse synthesis FBs, when they exist. Given any complex oversampled FIR analysis FB, we first provide an algorithm to determine whether there exists an inverse FIR synthesis system. We also provide a method to ensure the Hermitian symmetry property on the synthesis side, which is serviceable to processing real-valued signals. As an invertible analysis scheme corresponds to a redundant decomposition, there is no unique inverse FB. Given a particular solution, we parameterize the whole family of inverses through a null space projection. The resulting reduced parameter set simplifies design procedures, since the perfect reconstruction constrained optimization problem is recast as an unconstrained optimization problem. The design of optimized synthesis FBs based on time or frequency localization criteria is then investigated, using a simple yet efficient gradient algorithm.
Hybrid Floating Point Technique Yields 1.2 Gigasample Per Second 32 to 2048 point Floating Point FFT in a single FPGA
Hardware Digital Signal Processing, especially hardware targeted to FPGAs, has traditionally been done using fixed point arithmetic, mainly due to the high cost associated with implementing floating point arithmetic. That cost comes in the form of increased circuit complexity. The increase circuit complexity usually also degrades maximum clock performance. Certain applications demand the dynamic range offered by floating point hardware, and yet require the speeds and circuit density usually associated with fixed point hardware. The Fourier transform is one DSP building block that frequently requires floating point dynamic range. Textbook construction of a pipelined floating point FFT engine capable of continuous input entails dozens of floating point adders and multipliers. The complexity of those circuits quickly exceeds the resources available on a single FPGA. This paper describes a technique that is a hybrid of fixed point and floating point operations designed to significantly reduce the overhead for floating point. The results are illustrated with an FFT processor that performs 32, 64, 128, 256, 512, 1024 and 2048 point Fourier transforms with IEEE single precision floating point inputs and outputs. The design achieves sufficient density to realize a continuous complex data rate of 1.2 Gigasamples per second data throughput using a single Virtex4-SX55-10 device.
Introduction to Digital Signal Processing
Nice slides introducing Digital Signal Processing.
Efficient Digital Fiilters
What would you do in the following situation? Let ’ s say you are diagnosing a DSP system problem in the field. You have your trusty laptop with your development system and an emulator. You figure out that there was a problem with the system specifications and a symmetric FIR filter in the software won ’ t do the job; it needs reduced passband ripple, or maybe more stopband attenuation. You then realize you don ’ t have any filter design software on the laptop, and the customer is getting angry. The answer is easy: You can take the existing filter and sharpen it. Simply stated, filter sharpening is a technique for creating a new filter from an old one [1] – [3] . While the technique is almost 30 years old, it is not generally known by DSP engineers nor is it mentioned in most DSP textbooks.
An application of neural networks to adaptive playout delay in VoIP
The statistical nature of data traffic and the dynamic routing techniques employed in IP networks results in a varying network delay (jitter) experienced by the individual IP packets which form a VoIP flow. As a result voice packets generated at successive and periodic intervals at a source will typically be buffered at the receiver prior to playback in order to smooth out the jitter. However, the additional delay introduced by the playout buffer degrades the quality of service. Thus, the ability to forecast the jitter is an integral part of selecting an appropriate buffer size. This paper compares several neural network based models for adaptive playout buffer selection and in particular a novel combined wavelet transform/neural network approach is proposed. The effectiveness of these algorithms is evaluated using recorded VoIP traces by comparing the buffering delay and the packet loss ratios for each technique. In addition, an output speech signal is reconstructed based on the packet loss information for each algorithm and the perceptual quality of the speech is then estimated using the PESQ MOS algorithm. Simulation results indicate that proposed Haar-Wavelets-Packet MLP and Statistical-Model MLP adaptive scheduling schemes offer superior performance.
HIERARCHICAL MOTION ESTIMATION FOR EMBEDDED OBJECT TRACKING
This paper presents an algorithm developed to provide automatic motion detection and object tracking embedded within intelligent CCTV systems. The algorithm development focuses on techniques which provide an efficient embedded systems implementation with the ability to target both FPGA and DSP devices. During algorithm development constraints on hardware implementation have been fully considered resulting in an algorithm which, when targeted at current FPGA devices, will take full advantage of the DSP resource commonly provided in such devices. The hierarchical structure of the proposed algorithm provides the system with a multi-level motion estimation process allowing low resolution estimation for motion detection and further higher resolution stages for motion estimation. An initial MATLAB prototype has demonstrated this algorithm capable of object motion estimation while compensating for camera motion, allowing a moving object to be tracked by a moving camera.
An FPGA Implementation of Hierarchical Motion Estimation for Embedded Oject Tracking
This paper presents the hardware implementation of an algorithm developed to provide automatic motion detection and object tracking functionality embedded within intelligent CCTV systems. The implementation is targeted at an Altera Stratix FPGA making full use of the dedicated DSP resource. The Altera Nios embedded processor provides a platform for the tracking control loop and generic Pan Tilt Zoom camera interface. This paper details the explicit functional stages of the algorithm that lend themselves to an optimised pipelined hardware implementation. This implementation provides maximum data throughput, providing real-time operation of the described algorithm, and enables a moving camera to track a moving object in real time.
Hidden Markov Model based recognition of musical pattern in South Indian Classical Music
Automatic recognition of musical patterns plays a crucial part in Musicological and Ethno musicological research and can become an indispensable tool for the search and comparison of music extracts within a large multimedia database. This paper finds an efficient method for recognizing isolated musical patterns in a monophonic environment, using Hidden Markov Model. Each pattern, to be recognized, is converted into a sequence of frequency jumps by means of a fundamental frequency tracking algorithm, followed by a quantizer. The resulting sequence of frequency jumps is presented to the input of the recognizer which use Hidden Markov Model. The main characteristic of Hidden Markov Model is that it utilizes the stochastic information from the musical frame to recognize the pattern. The methodology is tested in the context of South Indian Classical Music, which exhibits certain characteristics that make the classification task harder, when compared with Western musical tradition. Recognition of 100% has been obtained for the six typical music pattern used in practise. South Indian classical instrument, flute is used for the whole experiment.
Design and implementation of odd-order wave digital lattice lowpass filters, from specifications to Motorol DSP56307EVM module
This thesis is dedicated to applying and developing explicit formulas for the design and implementation of odd-order lattice Lowpass wave digital filters (WDFs) on a Digital Signal Processor (DSP), such as a Motorola DSP56307EVM (Evaluation Module). The direct design method of Gazsi for filter types such as Butterworfh, Chebyshev, inverse Chebyshev, and Cauer (Elliptic) provides a straightforward method for calculating the coefficients without an extensive knowledge of digital signal processing. A program package to design and implement odd-order WDFs, including detailed procedures and examples, is presented in this thesis and includes not only the calculations of the coefficients, but also the simulation on a MATLAB platform and an implementation on a Motorola DSP56307EVM board. It is very quick, effective and convenient to obtain the coefficients when the user enters a few parameters according to the general specifications; to verify the characteristics of the designed filter; to simulate the filter on the MATLAB platform; to implement the filter on the DSP board; and to compare the results between the simulation and the implementation.
Image Analysis Using a Dual-Tree M-Band Wavelet Transform
We propose a 2D generalization to the M-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets. We particularly address (i) the construction of the dual basis and (ii) the resulting directional analysis. We also revisit the necessary pre-processing stage in the M-band case. While several reconstructions are possible because of the redundancy of the representation, we propose a new optimal signal reconstruction technique, which minimizes potential estimation errors. The effectiveness of the proposed M- band decomposition is demonstrated via denoising comparisons on several image types (natural, texture, seismics), with various M-band wavelets and thresholding strategies. Signicant improvements in terms of both overall noise reduction and direction preservation are observed.
Peak-to-Average Power Ratio and CCDF
Peak to Average Power Ratio (PAPR) is often used to characterize digitally modulated signals. One example application is setting the level of the signal in a digital modulator. Knowing PAPR allows setting the average power to a level that is just low enough to minimize clipping.
Complex Down-Conversion Amplitude Loss
This article illustrates the signal amplitude loss inherent in a traditional complex down-conversion system. (In the literature of signal processing, complex down-conversion is also called "quadrature demodulation.")
BLAS Comparison on FPGA, CPU and GPU
High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.
Real Time Implementation of Multi-Level Perfect Signal Reconstruction Filter Bank
Discrete Wavelet Transform (DWT) is an efficient tool for signal and image processing applications which has been utilized for perfect signal reconstruction. In this paper, twenty seven optimum combinations of three different wavelet filter types, three different filter reconstruction levels and three different kinds of signal for multi-level perfect reconstruction filter bank were implemented in MATLAB/Simulink. All the filters for different wavelet types were designed using Filter Design Analysis (FDA) and Wavelet toolbox. Signal to Noise Ratio (SNR) was calculated for each combination. Combination with best SNR was then implemented on TMS320C6713 DSP kit. Real time testing of perfect reconstruction on DSP kit was then carried out by two different methods. Experimental results accede with theory and simulations.
Auditory Component Analysis Using Perceptual Pattern Recognition to Identify and Extract Independent Components From an Auditory Scene
The cocktail party effect, our ability to separate a sound source from a multitude of other sources, has been researched in detail over the past few decades, and many investigators have tried to model this on computers. Two of the major research areas currently being evaluated for the so-called sound source separation problem are Auditory Scene Analysis (Bregman 1990) and a class of statistical analysis techniques known as Independent Component Analysis (Hyvärinen 2001). This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly, measures features of the resulting tracks, and finally separates sounds statistically by matching feature sets and making the output streams statistically independent. Artificial and acoustical mixes of sounds are used to evaluate the signal-to-noise ratio where the signal is the desired source and the noise is comprised of all other sources. The proposed system is found to successfully separate audio streams. The amount of separation is inversely proportional to the amount of reverberation present.
A Nonlinear Stein Based Estimator for Multichannel Image Denoising
The use of multicomponent images has become widespread with the improvement of multisensor systems having increased spatial and spectral resolutions. However, the observed images are often corrupted by an additive Gaussian noise. In this paper, we are interested in multichannel image denoising based on a multiscale representation of the images. A multivariate statistical approach is adopted to take into account both the spatial and the inter-component correlations existing between the different wavelet subbands. More precisely, we propose a new parametric nonlinear estimator which generalizes many reported denoising methods. The derivation of the optimal parameters is achieved by applying Stein’s principle in the multivariate case. Experiments performed on multispectral remote sensing images clearly indicate that our method outperforms conventional wavelet denoising techniques.
Region based Active Contour Segmentation
In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.
LOW-RESOURCE DELAYLESS SUBBAND ADAPTIVE FILTER USING WEIGHTED OVERLAP-ADD
A delayless structure targeted for low-resource implementation is proposed to eliminate filterbank processing delays in subband adaptive filters (SAFs). Rather than using direct IFFT or polyphase filterbanks to transform the SAFs back into the time-domain, the proposed method utilizes a weighted overlap-add (WOLA) synthesis. Low-resource real-time implementations are targeted and as such do not involve long (as long as the echo plant) FFT or IFFT operations. Also, the proposed approach facilitates time distribution of the adaptive filter reconstruction calculations crucial for efficient real-time and hardware implementation. The method is implemented on an oversampled WOLA filterbank employed as part of an echo cancellation application. Evaluation results demonstrate that the proposed implementation outperforms conventional SAF systems since the signals used in actual adaptive filtering are not distorted by filterbank aliasing. The method is a good match for partial update adaptive algorithms since segments of the time-domain adaptive filter are sequentially reconstructed and updated.
A pole-zero placement technique for designing second-order IIR parametric equalizer filters
A new procedure is presented for designing second-order parametric equalizer filters. In contrast to the traditional approach, in which the design is based on a bilinear transform of an analog filter, the presented procedure allows for designing the filter directly in the digital domain. A rather intuitive technique known as pole-zero placement, is treated here in a quantitative way. It is shown that by making some meaningful approximations, a set of relatively simple design equations can be obtained. Design examples of both notch and resonance filters are included to illustrate the performance of the proposed method, and to compare with state-of-the-art solutions.






