## OPTIMAL DESIGN OF DIGITAL EQUIVALENTS TO ANALOG FILTERS

●4 commentsThe proposed optimal algorithm for the digitizing of analog filters is based on two existing filter design methods: the extended window design (EWD) and the matched–pole (MP) frequency sampling design. The latter is closely related to the filter design with iterative weighted least squares (WLS). The optimization is performed with an original MP design that yields an equiripple digitizing error. Then, a drastic reduction of the digitizing error is achieved through the introduction of a fractional time shift that minimizes the magnitude of the equiripple error within a given frequency interval. The optimal parameters thus obtained can be used to generate the EWD equations, together with a variable fractional delay output, as described in an earlier paper. Finally, in contrast to the WLS procedure, which relies on a “good guess” of the weighting function, the MP optimization is straightforward.

## A NEW PARALLEL IMPLEMENTATION FOR PARTICLE FILTERS AND ITS APPLICATION TO ADAPTIVE WAVEFORM DESIGN

Sequential Monte Carlo particle ﬁlters (PFs) are useful for estimating nonlinear non-Gaussian dynamic system parameters. As these algorithms are recursive, their real-time implementation can be computationally complex. In this paper, we analyze the bottlenecks in existing parallel PF algorithms, and we propose a new approach that integrates parallel PFs with independent Metropolis-Hastings (PPF-IMH) algorithms to improve root mean-squared estimation error performance. We implement the new PPF-IMH algorithm on a Xilinx Virtex-5 ﬁeld programmable gate array (FPGA) platform. For a onedimensional problem and using 1,000 particles, the PPF-IMH architecture with four processing elements utilizes less than 5% Virtex-5 FPGA resources and takes 5.85 μs for one iteration. The algorithm performance is also demonstrated when designing the waveform for an agile sensing application.

## A pole-zero placement technique for designing second-order IIR parametric equalizer filters

A new procedure is presented for designing second-order parametric equalizer filters. In contrast to the traditional approach, in which the design is based on a bilinear transform of an analog filter, the presented procedure allows for designing the filter directly in the digital domain. A rather intuitive technique known as pole-zero placement, is treated here in a quantitative way. It is shown that by making some meaningful approximations, a set of relatively simple design equations can be obtained. Design examples of both notch and resonance filters are included to illustrate the performance of the proposed method, and to compare with state-of-the-art solutions.

## Adaptive distributed noise reduction for speech enhancement in wireless acoustic sensor networks

An adaptive distributed noise reduction algorithm for speech enhancement is considered, which operates in a wireless acoustic sensor network where each node collects multiple microphone signals. In previous work, it was shown theoretically that for a stationary scenario, the algorithm provides the same signal estimators as the centralized multi-channel Wiener filter, while significantly compressing the data that is transmitted between the nodes. Here, we present simulation results of a fully adaptive implementation of the algorithm, in a non-stationary acoustic scenario with a moving speaker and two babble noise sources. The algorithm is implemented using a weighted overlap-add technique to reduce the overall input-output delay. It is demonstrated that good results can be obtained by estimating the required signal statistics with a long-term forgetting factor without downdating, even though the signal statistics change along with the iterative filter updates. It is also demonstrated that simultaneous node updating provides a significantly smoother and faster tracking performance compared to sequential node updating.

## EFFICIENT MAPPING OF ADVANCED SIGNAL PROCESSING ALGORITHMS ON MULTI-PROCESSOR ARCHITECTURES

●2 commentsModern microprocessor technology is migrating from simply increasing clock speeds on a single processor to placing multiple processors on a die to increase throughput and power performance in every generation. To utilize the potential of such a system, signal processing algorithms have to be efficiently parallelized so that the load can be distributed evenly among the multiple processing units. In this paper, we study several advanced deterministic and stochastic signal processing algorithms and their computation using multiple processing units. Specifically, we consider two commonly used time-frequency signal representations, the short-time Fourier transform and the Wigner distribution, and we demonstrate their parallelization with low communication overhead. We also consider sequential Monte Carlo estimation techniques such as particle filtering, and we demonstrate that its multiple processor implementation requires large data exchanges and thus a high communication overhead. We propose a modified mapping scheme that reduces this overhead at the expense of a slight loss in accuracy, and we evaluate the performance of the scheme for a state estimation problem with respect to accuracy and scalability.

## Closing the gap: CPU and FPGA Trends in sustainable floating-point BLAS performance

Field programmable gate arrays (FPGAs) have long been an attractive alternative to microprocessors for computing tasks — as long as floating-point arithmetic is not required. Fueled by the advance of Moore’s Law, FPGAs are rapidly reaching sufficient densities to enhance peak floating-point performance as well. The question, however, is how much of this peak performance can be sustained. This paper examines three of the basic linear algebra subroutine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply. A comparison of microprocessors, FPGAs, and Reconfigurable Computing platforms is performed for each operation. The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs. This analysis considers the historical context of the last six years and is extrapolated for the next six years.

## BLAS Comparison on FPGA, CPU and GPU

High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.

## Biosignal processing challenges in emotion recognition for adaptive learning

●16 commentsUser-centered computer based learning is an emerging field of interdisciplinary research. Research in diverse areas such as psychology, computer science, neuroscience and signal processing is making contributions to take this field to the next level. Learning systems built using contributions from these fields could be used in actual training and education instead of just laboratory proof-of-concept. One of the important advances in this research is the detection and assessment of the cognitive and emotional state of the learner using such systems. This capability moves development beyond the use of traditional user performance metrics to include system intelligence measures that are based on current theories in neuroscience. These advances are of paramount importance in the success and wide spread use of learning systems that are automated and intelligent. Emotion is considered an important aspect of how learning occurs, and yet estimating it and making adaptive adjustments are not part of most learning systems. In this research we focus on one specific aspect of constructing an adaptive and intelligent learning system, that is, estimation of the emotion of the learner as he/she is using the automated training system. The challenge starts with the definition of the emotion and the utility of it in human life. The next challenge is to measure the co-varying factors of the emotions in a non-invasive way, and find consistent features from these measures that are valid across wide population. In this research we use four physiological sensors that are non-invasive, and establish a methodology of utilizing the data from these sensors using different signal processing tools. A validated set of visual stimuli used worldwide in the research of emotion and attention, called International Affective Picture System (IAPS), is used. A dataset is collected from the sensors in an experiment designed to elicit emotions from these validated visual stimuli. We describe a novel wavelet method to calculate hemispheric asymmetry metric using electroencephalography data. This method is tested against typically used power spectral density method. We show overall improvement in accuracy in classifying specific emotions using the novel method. We also show distinctions between different discrete emotions from the autonomic nervous system activity using electrocardiography, electrodermal activity and pupil diameter changes. Findings from different features from these sensors are used to give guidelines to use each of the individual sensors in the adaptive learning environment.

## Gauss-Newton Based Learning for Fully Recurrent Neural Networks

●4 commentsThe thesis discusses a novel off-line and on-line learning approach for Fully Recurrent Neural Networks (FRNNs). The most popular algorithm for training FRNNs, the Real Time Recurrent Learning (RTRL) algorithm, employs the gradient descent technique for finding the optimum weight vectors in the recurrent neural network. Within the framework of the research presented, a new off-line and on-line variation of RTRL is presented, that is based on the Gauss-Newton method. The method itself is an approximate Newton’s method tailored to the specific optimization problem, (non-linear least squares), which aims to speed up the process of FRNN training. The new approach stands as a robust and effective compromise between the original gradient-based RTRL (low computational complexity, slow convergence) and Newton-based variants of RTRL (high computational complexity, fast convergence). By gathering information over time in order to form Gauss-Newton search vectors, the new learning algorithm, GN-RTRL, is capable of converging faster to a better quality solution than the original algorithm. Experimental results reflect these qualities of GN-RTRL, as well as the fact that GN-RTRL may have in practice lower computational cost in comparison, again, to the original RTRL.

## Wavelet Denoising for TDR Dynamic Range Improvement

A technique is presented for removing large amounts of noise present in time-domain-reflectometry (TDR) waveforms to increase the dynamic range of TDR waveforms and TDR based s-parameter measurements.

## Music Signal Processing

Chapter 12 of the book "Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications" - Musical Instruments - A Review of Basic Physics of Sound - Music Signal Features and Models - Ear: Hearing of Sounds - Psychoacoustics of Hearing - Music Compression - High Quality Music Coding: MPEG - Stereo Music - Music Recognition

## Optimization of Audio Processing algorithms (Reverb) on ARMv6 family of processors

Audio processing algorithms are increasingly used in cell phones and today’s customers are placing more demands on cell phones. Feature phones, once the advent of mobile phone technology, nowadays do more than just providing the user with MP3 play back or advanced audio effects. These features have become an integral part of medium as well as low-end phones. On the other hand, there is also an endeavor to include as improved quality as possible into products to compete in market and satisfy users’ needs. Tackling the above requirements has been partly satisfied by the advance in hardware design and manufacturing technology. However, as new hardware emerges into market the need for competence to write efficient software and exploit the new features thoroughly and effectively arises. Even though compilers are also keeping up with the new tide space for hand optimized code still exist. Wrapped in the above goal, an effort was made in this thesis to partly cover the competence requirement at Multimedia Section (part of Ericsson Mobile Platforms) to develope optimized code for new processors. Forging persistently ahead with new products, EMP has always incorporated the latest technology into its products among which ARMv6 family of processors has the main central processing role in a number of upcoming products. To fully exploit latest features provided by ARMv6, it was required to probe its new instruction set among which new media processing instructions are of outmost importance. In order to execute DSP-intensive algorithms (e.g. Audio Processing algorithms) efficiently, the implementation should be done in low-level code applying available instruction set. Meanwhile, ARMv6 comes with a number of new features in comparison with its predecessors. SIMD (Single Instruction Multiple Data) and VFP (Vector Floating Point) are the most prominent media processing improvements in ARMv6. Aligned with thesis goals and guidelines, Reverb algorithm which is among one of the most complicated audio features on a hand-held devices was probed. Consequently, its kernel parts were identified and implementation was done both in fixed-point and floating-point using the available resources on hardware. Besides execution time and amount of code memory for each part were measured and provided in tables and charts for comparison purposes. Conclusions were finally drawn based on developed code’s efficiency over ARM compiler’s as well as existing code already developed and tailored to ARMv5 processors. The main criteria for optimization was the execution time. Moreover, quantization effect due to limited precision fixed-point arithmetic was formulated and its effect on quality was elaborated. The outcomes, clearly indicate that hand optimization of kernel parts are superior to Compiler optimized alternative both from the point of code memory as well as execution time. The results also confirmed the presumption that hand optimized code using new instruction set can improve efficiency by an average 25%-50% depending on the algorithm structure and its interaction with other parts of audio effect. Despite its many draw backs, fixed-point implementation remains yet to be the dominant implementation for majority of DSP algorithms on low-power devices.

## Fundamentals of the DFT (fft) Algorithms

In this article, a physical explanation of the fundamentals of the DFT (fft) algorithms is presented in terms of waveform decomposition. After reading the article and trying the examples, the reader is expected to gain a clear understanding of the basics of the mysterious DFT (fft) algorithms.

## A pole-zero placement technique for designing second-order IIR parametric equalizer filters

A new procedure is presented for designing second-order parametric equalizer filters. In contrast to the traditional approach, in which the design is based on a bilinear transform of an analog filter, the presented procedure allows for designing the filter directly in the digital domain. A rather intuitive technique known as pole-zero placement, is treated here in a quantitative way. It is shown that by making some meaningful approximations, a set of relatively simple design equations can be obtained. Design examples of both notch and resonance filters are included to illustrate the performance of the proposed method, and to compare with state-of-the-art solutions.

## Evaluation of Image Warping Algorithms for Implementation in FPGA

The target of this master thesis is to evaluate the Image Warping technique and propose a possible design for an implementation in FPGA. The Image Warping is widely used in the image processing for image correction and rectification. A DSP is a usual choice for implantation of the image processing algorithms, but to decrease a cost of the target system it was proposed to use an FPGA for implementation. In this work a different Image Warping methods was evaluated in terms of performance, produced image quality, complexity and design size. Also, considering that it is not only Image Warping algorithm which will be implemented on the target system, it was important to estimate a possible memory bandwidth used by the proposed design. The evaluation was done by implemented a C-model of the proposed design with a finite datapath to simulate hardware implementation as close as possible.

## Digital Signal Processing Maths

●1 commentModern digital signal processing makes use of a variety of mathematical techniques. These techniques are used to design and understand efficient filters for data processing and control.

## EngD thesis: Reduced-Complexity Signal Detection in Digital Communications Receivers

The Author began this Engineering Doctorate (EngD) in Autumn 1999, whilst already in full-time employment as a DSP software engineer at Nortel Networks, Harlow. This EngD comprises a set of three projects. The first project was focused on the development of dual-tone multi-frequency (DTMF) signal detection software. DTMF signals are currently used for operating menu-driven services such as voice-mail, telephone banking and share-dealing. The need for detection software in a packet networking environment exists because such signals become degraded when they travel through speech compression circuits. In addition, if they appear as echoes on a telephone line, they can affect the operation of echo cancellation systems. In this thesis a number of DSP algorithms are discussed where fast detection and minimum complexity are key characteristics. A key contribution here was the development of a novel detection algorithm based on the extraction of parameters that model the DTMF signal. The thesis reports a method combining parameter extraction with the technique of maximum likelihood to perform DTMF detection, resulting in very short time-frames when compared to standard approaches. Reducing the complexity of detection techniques is also important in today’s communication systems. To this end a key contribution here was the development of the ‘split Goertzel algorithm’, which permitted an overlapping of analysis windows without the need for reprocessing input data. Besides being applied to voice-band signals, such as in the case of DTMF, the Author also had the opportunity during the EngD to apply the skills and knowledge acquired in signal processing to higher-rate data-streams. This involved work concerning the equalisation of channel distortion commonly found in wireless communication systems. This covers two projects, with the first being conducted at Verticalband Ltd. (now no longer operational) in the area of digital television receivers. In this part of the thesis a real-time DSP implementation is discussed for enhancing a simulation system developed for wireless channels. A number of channel equalisation approaches are studied. The work concludes that for high-rate signals, non-linear algorithms have the best error rate performance. On the basis of the studies carried out, the thesis considers development and implementation issues of designs based on the decision feedback equaliser. The thesis reports on a software design which applies the method of least squares to carry out filter coefficient adaptation. The Verticalband studies reported lead on to further research within the context of channel equalisation, in the context of the detection of data in multiple-input multiple-output (MIMO) wireless local area network (WLAN) systems. This work was undertaken at Philips Research in Eindhoven, The Netherlands. The thesis discusses implementation scenarios of multi-element antenna arrays that aim to provide bit-rates orders of magnitude higher than today’s WLAN offerings, as those required by emerging standards such as 802.11n. The complexity of optimal detection techniques, such as maximum likelihood, scales exponentially with the number of transmit antennas. This translates to high processing costs and power consumption, rendering such techniques unsuitable for use in battery-powered devices. The initial main contribution here was the analysis of the complexity of algorithms whose performance had already been tested, such as the non-linear maximum likelihood approach and also less complex methods utilising linear filtering. This resulted in the development of new formulae to predict processing costs of algorithms based on the number of transmit and receive antennas. Other key contributions were defining a method to reduce the complexity of matrix inversion when using the Moore-Penrose pseudo-inverse, and the application of matrix decomposition to obviate the need for costly matrix inversion at all. Some on-going research into sub-optimal detection is also discussed, which describes methods to reduce the search-space for the maximum likelihood algorithm.

## A Subspace Based Approach to the Design, Implementation and Validation of Algorithms for Active Vibration Isolation Control

Vibration isolation endeavors to reduce the transmission of vibration energy from one structure (the source) to another (the receiver), to prevent undesirable phenomena such as sound radiation. A well-known method for achieving this is passive vibration isolation (PVI). In the case of PVI, mounts are used - consisting of springs and dampers - to connect the vibrating source to the receiver. The stiffness of the mount determines the fundamental resonance frequency of the mounted system and vibrations with a frequency higher than the fundamental resonance frequency are attenuated. Unfortunately, however, other design requirements (such as static stability) often impose a minimum allowable stiffness, thus limiting the achievable vibration isolation by passive means. A more promising method for vibration isolation is hybrid vibration isolation control. This entails that, in addition to PVI, an active vibration isolation control (AVIC) system is used with sensors, actuators and a control system that compensates for vibrations in the lower frequency range. Here, the use of a special form of AVIC using statically determinate stiff mounts is proposed. The mounts establish a statically determinate system of high stiffness connections in the actuated directions and of low stiffness connections in the unactuated directions. The latter ensures PVI in the unactuated directions. This approach is called statically determinate AVIC (SD-AVIC). The aim of the control system is to produce antidisturbance forces that counteract the disturbance forces stemming from the source. Using this approach, the vibration energy transfer from the source to the receiver is blocked in the mount due to the anti-forces. This thesis deals with the design of controllers generating the anti-forces by applying techniques that are commonly used in the field of signal processing. The control approaches - that are model-based - are both adaptive and fixed gain and feedforward and feedback oriented. The control approaches are validated using two experimental vibration isolation setups: a single reference single actuator single error sensor (SR-SISO) setup and a single reference input multiple actuator input multiple error sensor output (SR-MIMO) setup. Finding a plant model can be a problem. This is solved by using a black-box modelling strategy. The plants are identified using subspace model identification. It is shown that accurate linear models can be found in a straightforward manner by using small batches of recorded (sampled) time-domain data only. Based on the identified models, controllers are designed, implemented and validated. Due to resonance in mechanical structures, adaptive SD-AVIC systems are often hampered by slow convergence of the controller coefficients. In general, it is desirable that the SD-AVIC system yields fast optimum performance after it is switched on. To achieve this result and speed up the convergence of the adaptive controller coefficients, the so-called inverse outer factor model is included in the adaptive control scheme. The inner/outer factorization, that has to be performed to obtain the inverse outer factor model, is completely determined in state space to enable a numerically robust computation. The inverse outer factor model is also incorporated in the control scheme as a state space model. It is found that fast adaptation of the controller coefficients is possible. Controllers are designed, implemented and validated to suppress both narrowband and broadband disturbances. Scalar regularization is used to prevent actuator saturation and an unstable closed loop. In order to reduce the computational load of the controllers, several steps are taken including controller order reduction and implementation of lower order models. It is found that in all experiments the simulation and real-time results correspond closely for both the fixed gain and adaptive control situation. On the SR-SISO setup, reductions up to 5.0 dB are established in real-time for suppressing a broadband disturbance output (0-2 kHz) using feedback-control. On the SR-MIMO vibration isolation setup, using feedforward-control reductions of broadband disturbances (0-1 kHz) of 9.4 dB are established in real-time. Using feedback-control, reductions are established up to 3.5 dB in real-time (0-1 kHz). In case of the SR-MIMO setup, the values for the reduction are obtained by averaging the reductions obtained in all sensor outputs. The results pave the way for the next generation of algorithms for SD-AVIC.

## Hybrid Floating Point Technique Yields 1.2 Gigasample Per Second 32 to 2048 point Floating Point FFT in a single FPGA

Hardware Digital Signal Processing, especially hardware targeted to FPGAs, has traditionally been done using fixed point arithmetic, mainly due to the high cost associated with implementing floating point arithmetic. That cost comes in the form of increased circuit complexity. The increase circuit complexity usually also degrades maximum clock performance. Certain applications demand the dynamic range offered by floating point hardware, and yet require the speeds and circuit density usually associated with fixed point hardware. The Fourier transform is one DSP building block that frequently requires floating point dynamic range. Textbook construction of a pipelined floating point FFT engine capable of continuous input entails dozens of floating point adders and multipliers. The complexity of those circuits quickly exceeds the resources available on a single FPGA. This paper describes a technique that is a hybrid of fixed point and floating point operations designed to significantly reduce the overhead for floating point. The results are illustrated with an FFT processor that performs 32, 64, 128, 256, 512, 1024 and 2048 point Fourier transforms with IEEE single precision floating point inputs and outputs. The design achieves sufficient density to realize a continuous complex data rate of 1.2 Gigasamples per second data throughput using a single Virtex4-SX55-10 device.

## Audio Time-Scale Modification

Audio time-scale modification is an audio effect that alters the duration of an audio signal without affecting its perceived local pitch and timbral characteristics. There are two broad categories of time-scale modification algorithms, time-domain and frequency-domain. The computationally efficient time-domain techniques produce high quality results for single pitched signals such as speech, but do not cope well with more complex signals such as polyphonic music. The less efficient frequencydomain techniques have proven to be more robust and produce high quality results for a variety of signals; however they introduce a reverberant artefact into the output. This dissertation focuses on incorporating aspects of time-domain techniques into frequency-domain techniques in an attempt to reduce the presence of the reverberant artefact and improve upon computational demands. From a review of prior work it was found that there are a number of time-domain algorithms available and that the choice of algorithm parameters varies considerably in the literature. This finding prompted an investigation into the effects of the choice of parameters and a comparison of the various techniques employed in terms of computational requirements and output quality. The investigation resulted in the derivation of an efficient and flexible parameter set for use within time-domain implementations. Of the available frequency-domain approaches the phase vocoder and timedomain/ subband techniques offer an efficiency and robustness advantage over sinusoidal modelling and iterative phase update techniques, and as such were identified as suitable candidates for the provision of a framework for further investigation. Following from this observation, improvements in the quality produced by time-domain/subband techniques are realised through the use of a bark based subband partitioning approach and effective subband synchronisation techniques. In addition, computational and output quality improvements within a phase vocoder implementation are achieved by taking advantage of a certain level of flexibility in the choice of phase within such an implementation. The phase flexibility established is used to push or pull phase values into a phase coherent state. Further improvements are realised by incorporating features of time-domain algorithms into the system in order to provide a ‘good’ initial set of phase estimates; the transition to ‘perfect’ phase coherence is significantly reduced through this scheme, thereby improving the overall output quality produced. The result is a robust and efficient time-scale modification algorithm which draws upon various aspects of a number of general approaches to time-scale modification.