## Digital Signal Processor Fundamentals and System Design

Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years and have largely contributed to the evolution towards digital technology of many accelerator systems, such as machine protection, diagnostics and control of beams, power supply and motors. This paper aims at familiarising the reader with DSP fundamentals, namely DSP characteristics and processing development. Several DSP examples are given, in particular on Texas Instruments DSPs, as they are used in the DSP laboratory companion of the lectures this paper is based upon. The typical system design flow is described; common difficulties, problems and choices faced by DSP developers are outlined; and hints are given on the best solution.

## Novel Method of Showing Frequency Transients in the Fourier Transform and it’s Application in Time-Frequency Analysis

Fourier Transform in the frequency domain is modified to also analyse frequency transients i.e. changes in the frequency spectrum with time variable of any order. This is analytically, a very useful tool as there are many problems where frequency variation with time has to be analyzed e.g. Doppler shift, Light through different mediums in time and space. Numerical calculations are usually done for such problems when needed. Here, Fourier transform is analyzed to incorporate more variables that simultaneously do the Time lag-Frequency Analysis (TLFA) from Fourier Transform by changing the Fourier Operator. Also, the Frequency Derivative Analysis (FDA) of any order can be analyzed from Fourier Transform. Validity of the operator is examined using Eigen value analysis and operator algebra.

## STUDY OF DIGITAL MODULATION TECHNIQUES

Modulation is the process of facilitating the transfer of information over a medium. Typically the objective of a digital communication system is to transport digital data between two or more nodes. In radio communications this is usually achieved by adjusting a physical characteristic of a sinusoidal carrier, either the frequency, phase, amplitude or a combination thereof . This is performed in real systems with a modulator at the transmitting end to impose the physical change to the carrier and a demodulator at the receiving end to detect the resultant modulation on reception. Hence, modulation can be objectively defined as the process of converting information so that it can be successfully sent through a medium. This thesis deals with the current digital modulation techniques used in industry. Also, the thesis examines the qualitative and quantitative criteria used in selection of one modulation technique over the other. All the experiments, and realted data collected were obtained using MATLAB and SIMULINK

## Region based Active Contour Segmentation

In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.

## LOW-RESOURCE DELAYLESS SUBBAND ADAPTIVE FILTER USING WEIGHTED OVERLAP-ADD

A delayless structure targeted for low-resource implementation is proposed to eliminate filterbank processing delays in subband adaptive filters (SAFs). Rather than using direct IFFT or polyphase filterbanks to transform the SAFs back into the time-domain, the proposed method utilizes a weighted overlap-add (WOLA) synthesis. Low-resource real-time implementations are targeted and as such do not involve long (as long as the echo plant) FFT or IFFT operations. Also, the proposed approach facilitates time distribution of the adaptive filter reconstruction calculations crucial for efficient real-time and hardware implementation. The method is implemented on an oversampled WOLA filterbank employed as part of an echo cancellation application. Evaluation results demonstrate that the proposed implementation outperforms conventional SAF systems since the signals used in actual adaptive filtering are not distorted by filterbank aliasing. The method is a good match for partial update adaptive algorithms since segments of the time-domain adaptive filter are sequentially reconstructed and updated.

## OPTIMAL DESIGN OF DIGITAL EQUIVALENTS TO ANALOG FILTERS

The proposed optimal algorithm for the digitizing of analog filters is based on two existing filter design methods: the extended window design (EWD) and the matched–pole (MP) frequency sampling design. The latter is closely related to the filter design with iterative weighted least squares (WLS). The optimization is performed with an original MP design that yields an equiripple digitizing error. Then, a drastic reduction of the digitizing error is achieved through the introduction of a fractional time shift that minimizes the magnitude of the equiripple error within a given frequency interval. The optimal parameters thus obtained can be used to generate the EWD equations, together with a variable fractional delay output, as described in an earlier paper. Finally, in contrast to the WLS procedure, which relies on a “good guess” of the weighting function, the MP optimization is straightforward.

## A NEW PARALLEL IMPLEMENTATION FOR PARTICLE FILTERS AND ITS APPLICATION TO ADAPTIVE WAVEFORM DESIGN

Sequential Monte Carlo particle ﬁlters (PFs) are useful for estimating nonlinear non-Gaussian dynamic system parameters. As these algorithms are recursive, their real-time implementation can be computationally complex. In this paper, we analyze the bottlenecks in existing parallel PF algorithms, and we propose a new approach that integrates parallel PFs with independent Metropolis-Hastings (PPF-IMH) algorithms to improve root mean-squared estimation error performance. We implement the new PPF-IMH algorithm on a Xilinx Virtex-5 ﬁeld programmable gate array (FPGA) platform. For a onedimensional problem and using 1,000 particles, the PPF-IMH architecture with four processing elements utilizes less than 5% Virtex-5 FPGA resources and takes 5.85 μs for one iteration. The algorithm performance is also demonstrated when designing the waveform for an agile sensing application.

## A pole-zero placement technique for designing second-order IIR parametric equalizer filters

A new procedure is presented for designing second-order parametric equalizer filters. In contrast to the traditional approach, in which the design is based on a bilinear transform of an analog filter, the presented procedure allows for designing the filter directly in the digital domain. A rather intuitive technique known as pole-zero placement, is treated here in a quantitative way. It is shown that by making some meaningful approximations, a set of relatively simple design equations can be obtained. Design examples of both notch and resonance filters are included to illustrate the performance of the proposed method, and to compare with state-of-the-art solutions.

## Adaptive distributed noise reduction for speech enhancement in wireless acoustic sensor networks

An adaptive distributed noise reduction algorithm for speech enhancement is considered, which operates in a wireless acoustic sensor network where each node collects multiple microphone signals. In previous work, it was shown theoretically that for a stationary scenario, the algorithm provides the same signal estimators as the centralized multi-channel Wiener filter, while significantly compressing the data that is transmitted between the nodes. Here, we present simulation results of a fully adaptive implementation of the algorithm, in a non-stationary acoustic scenario with a moving speaker and two babble noise sources. The algorithm is implemented using a weighted overlap-add technique to reduce the overall input-output delay. It is demonstrated that good results can be obtained by estimating the required signal statistics with a long-term forgetting factor without downdating, even though the signal statistics change along with the iterative filter updates. It is also demonstrated that simultaneous node updating provides a significantly smoother and faster tracking performance compared to sequential node updating.

## EFFICIENT MAPPING OF ADVANCED SIGNAL PROCESSING ALGORITHMS ON MULTI-PROCESSOR ARCHITECTURES

Modern microprocessor technology is migrating from simply increasing clock speeds on a single processor to placing multiple processors on a die to increase throughput and power performance in every generation. To utilize the potential of such a system, signal processing algorithms have to be efficiently parallelized so that the load can be distributed evenly among the multiple processing units. In this paper, we study several advanced deterministic and stochastic signal processing algorithms and their computation using multiple processing units. Specifically, we consider two commonly used time-frequency signal representations, the short-time Fourier transform and the Wigner distribution, and we demonstrate their parallelization with low communication overhead. We also consider sequential Monte Carlo estimation techniques such as particle filtering, and we demonstrate that its multiple processor implementation requires large data exchanges and thus a high communication overhead. We propose a modified mapping scheme that reduces this overhead at the expense of a slight loss in accuracy, and we evaluate the performance of the scheme for a state estimation problem with respect to accuracy and scalability.

## Decimator Image Response

This article presents a way to compute and plot the image response of a decimator. I'm defining the image response as the unwanted spectrum of the impulse response after downsampling, relative to the desired passband response.

## Towards Efﬁcient and Robust Automatic Speech Recognition: Decoding Techniques and Discriminative Training

Automatic speech recognition has been widely studied and is already being applied in everyday use. Nevertheless, the recognition performance is still a bottleneck in many practical applications of large vocabulary continuous speech recognition. Either the recognition speed is not sufﬁcient, or the errors in the recognition result limit the applications. This thesis studies two aspects of speech recognition, decoding and training of acoustic models, to improve speech recognition performance in different conditions.

## STUDY OF DIGITAL MODULATION TECHNIQUES

Modulation is the process of facilitating the transfer of information over a medium. Typically the objective of a digital communication system is to transport digital data between two or more nodes. In radio communications this is usually achieved by adjusting a physical characteristic of a sinusoidal carrier, either the frequency, phase, amplitude or a combination thereof . This is performed in real systems with a modulator at the transmitting end to impose the physical change to the carrier and a demodulator at the receiving end to detect the resultant modulation on reception. Hence, modulation can be objectively defined as the process of converting information so that it can be successfully sent through a medium. This thesis deals with the current digital modulation techniques used in industry. Also, the thesis examines the qualitative and quantitative criteria used in selection of one modulation technique over the other. All the experiments, and realted data collected were obtained using MATLAB and SIMULINK

## Region based Active Contour Segmentation

In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.

## BLAS Comparison on FPGA, CPU and GPU

High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.

## FUZZY LOGIC BASED CONVOLUTIONAL DECODER FOR USE IN MOBILE TELEPHONE SYSTEMS

Efficient convolutional coding and decoding algorithms are most crucial to successful operation of wireless communication systems in order to achieve high quality of service by reducing the overall bit error rate performance. A widely applied and well evaluated scheme for error correction purposes is well known as Viterbi algorithm [7]. Although the Viterbi algorithm has very good error correcting characteristics, computational effort required remains high. In this paper a novel approach is discussed introducing a convolutional decoder design based on fuzzy logic. A simplified version of this fuzzy based decoder is examined with respect to bit error rate (BER) performance. It can be shown that the fuzzy based convolutional decoder here proposed considerably reduces computational effort with only minor BER performance degradation when compared to the classical Viterbi approach.

## Development of a real time test platform for motor drive algorithms

In this thesis a real time test platform for a permanent magnet synchronous motor is developed. The implemented algorithm is Field Oriented Control (FOC) and it is implemented on a Texas Instruments TMS320F2808 Digital Signal Processor (DSP). The platform is developed in a rapid prototyping approach using Matlab/Simulink and the Real Time Workshop (RTW) packages.With this software the control algorithm and its interface to different DSP modules, such as A/D converter and PWM module, is constructed as a Simulink block scheme. The blocks used come from ordinary Simulink libraries and libraries provided by the RTW packages. From the Simulink block scheme Matlab can auto generate embedded C code adapted for different embedded targets, in this case the 2808 DSP.The developed real time test platform is also a Simulink model, though different from the algorithm model. When the start simulation command is given in the platform model a Graphical User Interface is loaded which lets the user specify motor parameters and certain algorithm parameters. Once the parameters are chosen RTW generates code from the algorithm model, loads it into the DSP and runs the generated program. From the platform model it is possible to set the reference speed of the motor in real time and monitor/log motor parameters such as actual speed and stator currents.

## Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor

The Division of Computer Engineering at Linköping's university is currently researching the possibility to create a highly parallel DSP platform, that can keep up with the computational needs of upcoming standards for various applications, at low cost and low power consumption. The architecture is called ePUMA and it combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor will act as the main processor for general tasks and execution control, while the co-processors will accelerate computing intensive and parallel DSP kernels.This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse, that run on a single co-processor. The kernels will then be evaluated to find possible problems with the co-processors' microarchitecture and suggest solutions to the problems that might exist. The evaluation shows that the performance potential is very good, but a few problems have been identified, that causes significant overhead in the kernels. Pipeline mismatches, that occurs due to different pipeline lengths for different instructions, causes pipeline hazards and the current solution to this, doesn't allow effective use of the pipeline. In some cases, the single port memories will cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. Also, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slow.

## EngD thesis: Reduced-Complexity Signal Detection in Digital Communications Receivers

The Author began this Engineering Doctorate (EngD) in Autumn 1999, whilst already in full-time employment as a DSP software engineer at Nortel Networks, Harlow. This EngD comprises a set of three projects. The first project was focused on the development of dual-tone multi-frequency (DTMF) signal detection software. DTMF signals are currently used for operating menu-driven services such as voice-mail, telephone banking and share-dealing. The need for detection software in a packet networking environment exists because such signals become degraded when they travel through speech compression circuits. In addition, if they appear as echoes on a telephone line, they can affect the operation of echo cancellation systems. In this thesis a number of DSP algorithms are discussed where fast detection and minimum complexity are key characteristics. A key contribution here was the development of a novel detection algorithm based on the extraction of parameters that model the DTMF signal. The thesis reports a method combining parameter extraction with the technique of maximum likelihood to perform DTMF detection, resulting in very short time-frames when compared to standard approaches. Reducing the complexity of detection techniques is also important in today’s communication systems. To this end a key contribution here was the development of the ‘split Goertzel algorithm’, which permitted an overlapping of analysis windows without the need for reprocessing input data. Besides being applied to voice-band signals, such as in the case of DTMF, the Author also had the opportunity during the EngD to apply the skills and knowledge acquired in signal processing to higher-rate data-streams. This involved work concerning the equalisation of channel distortion commonly found in wireless communication systems. This covers two projects, with the first being conducted at Verticalband Ltd. (now no longer operational) in the area of digital television receivers. In this part of the thesis a real-time DSP implementation is discussed for enhancing a simulation system developed for wireless channels. A number of channel equalisation approaches are studied. The work concludes that for high-rate signals, non-linear algorithms have the best error rate performance. On the basis of the studies carried out, the thesis considers development and implementation issues of designs based on the decision feedback equaliser. The thesis reports on a software design which applies the method of least squares to carry out filter coefficient adaptation. The Verticalband studies reported lead on to further research within the context of channel equalisation, in the context of the detection of data in multiple-input multiple-output (MIMO) wireless local area network (WLAN) systems. This work was undertaken at Philips Research in Eindhoven, The Netherlands. The thesis discusses implementation scenarios of multi-element antenna arrays that aim to provide bit-rates orders of magnitude higher than today’s WLAN offerings, as those required by emerging standards such as 802.11n. The complexity of optimal detection techniques, such as maximum likelihood, scales exponentially with the number of transmit antennas. This translates to high processing costs and power consumption, rendering such techniques unsuitable for use in battery-powered devices. The initial main contribution here was the analysis of the complexity of algorithms whose performance had already been tested, such as the non-linear maximum likelihood approach and also less complex methods utilising linear filtering. This resulted in the development of new formulae to predict processing costs of algorithms based on the number of transmit and receive antennas. Other key contributions were defining a method to reduce the complexity of matrix inversion when using the Moore-Penrose pseudo-inverse, and the application of matrix decomposition to obviate the need for costly matrix inversion at all. Some on-going research into sub-optimal detection is also discussed, which describes methods to reduce the search-space for the maximum likelihood algorithm.