## Digital Signal Processor Fundamentals and System Design

Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years and have largely contributed to the evolution towards digital technology of many accelerator systems, such as machine protection, diagnostics and control of beams, power supply and motors. This paper aims at familiarising the reader with DSP fundamentals, namely DSP characteristics and processing development. Several DSP examples are given, in particular on Texas Instruments DSPs, as they are used in the DSP laboratory companion of the lectures this paper is based upon. The typical system design flow is described; common difficulties, problems and choices faced by DSP developers are outlined; and hints are given on the best solution.

## STUDY OF DIGITAL MODULATION TECHNIQUES

Modulation is the process of facilitating the transfer of information over a medium. Typically the objective of a digital communication system is to transport digital data between two or more nodes. In radio communications this is usually achieved by adjusting a physical characteristic of a sinusoidal carrier, either the frequency, phase, amplitude or a combination thereof . This is performed in real systems with a modulator at the transmitting end to impose the physical change to the carrier and a demodulator at the receiving end to detect the resultant modulation on reception. Hence, modulation can be objectively defined as the process of converting information so that it can be successfully sent through a medium. This thesis deals with the current digital modulation techniques used in industry. Also, the thesis examines the qualitative and quantitative criteria used in selection of one modulation technique over the other. All the experiments, and realted data collected were obtained using MATLAB and SIMULINK

## Region based Active Contour Segmentation

In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.

## LOW-RESOURCE DELAYLESS SUBBAND ADAPTIVE FILTER USING WEIGHTED OVERLAP-ADD

A delayless structure targeted for low-resource implementation is proposed to eliminate filterbank processing delays in subband adaptive filters (SAFs). Rather than using direct IFFT or polyphase filterbanks to transform the SAFs back into the time-domain, the proposed method utilizes a weighted overlap-add (WOLA) synthesis. Low-resource real-time implementations are targeted and as such do not involve long (as long as the echo plant) FFT or IFFT operations. Also, the proposed approach facilitates time distribution of the adaptive filter reconstruction calculations crucial for efficient real-time and hardware implementation. The method is implemented on an oversampled WOLA filterbank employed as part of an echo cancellation application. Evaluation results demonstrate that the proposed implementation outperforms conventional SAF systems since the signals used in actual adaptive filtering are not distorted by filterbank aliasing. The method is a good match for partial update adaptive algorithms since segments of the time-domain adaptive filter are sequentially reconstructed and updated.

## OPTIMAL DESIGN OF DIGITAL EQUIVALENTS TO ANALOG FILTERS

The proposed optimal algorithm for the digitizing of analog filters is based on two existing filter design methods: the extended window design (EWD) and the matched–pole (MP) frequency sampling design. The latter is closely related to the filter design with iterative weighted least squares (WLS). The optimization is performed with an original MP design that yields an equiripple digitizing error. Then, a drastic reduction of the digitizing error is achieved through the introduction of a fractional time shift that minimizes the magnitude of the equiripple error within a given frequency interval. The optimal parameters thus obtained can be used to generate the EWD equations, together with a variable fractional delay output, as described in an earlier paper. Finally, in contrast to the WLS procedure, which relies on a “good guess” of the weighting function, the MP optimization is straightforward.

## A NEW PARALLEL IMPLEMENTATION FOR PARTICLE FILTERS AND ITS APPLICATION TO ADAPTIVE WAVEFORM DESIGN

Sequential Monte Carlo particle ﬁlters (PFs) are useful for estimating nonlinear non-Gaussian dynamic system parameters. As these algorithms are recursive, their real-time implementation can be computationally complex. In this paper, we analyze the bottlenecks in existing parallel PF algorithms, and we propose a new approach that integrates parallel PFs with independent Metropolis-Hastings (PPF-IMH) algorithms to improve root mean-squared estimation error performance. We implement the new PPF-IMH algorithm on a Xilinx Virtex-5 ﬁeld programmable gate array (FPGA) platform. For a onedimensional problem and using 1,000 particles, the PPF-IMH architecture with four processing elements utilizes less than 5% Virtex-5 FPGA resources and takes 5.85 μs for one iteration. The algorithm performance is also demonstrated when designing the waveform for an agile sensing application.

## A pole-zero placement technique for designing second-order IIR parametric equalizer filters

A new procedure is presented for designing second-order parametric equalizer filters. In contrast to the traditional approach, in which the design is based on a bilinear transform of an analog filter, the presented procedure allows for designing the filter directly in the digital domain. A rather intuitive technique known as pole-zero placement, is treated here in a quantitative way. It is shown that by making some meaningful approximations, a set of relatively simple design equations can be obtained. Design examples of both notch and resonance filters are included to illustrate the performance of the proposed method, and to compare with state-of-the-art solutions.

## Adaptive distributed noise reduction for speech enhancement in wireless acoustic sensor networks

An adaptive distributed noise reduction algorithm for speech enhancement is considered, which operates in a wireless acoustic sensor network where each node collects multiple microphone signals. In previous work, it was shown theoretically that for a stationary scenario, the algorithm provides the same signal estimators as the centralized multi-channel Wiener filter, while significantly compressing the data that is transmitted between the nodes. Here, we present simulation results of a fully adaptive implementation of the algorithm, in a non-stationary acoustic scenario with a moving speaker and two babble noise sources. The algorithm is implemented using a weighted overlap-add technique to reduce the overall input-output delay. It is demonstrated that good results can be obtained by estimating the required signal statistics with a long-term forgetting factor without downdating, even though the signal statistics change along with the iterative filter updates. It is also demonstrated that simultaneous node updating provides a significantly smoother and faster tracking performance compared to sequential node updating.

## EFFICIENT MAPPING OF ADVANCED SIGNAL PROCESSING ALGORITHMS ON MULTI-PROCESSOR ARCHITECTURES

Modern microprocessor technology is migrating from simply increasing clock speeds on a single processor to placing multiple processors on a die to increase throughput and power performance in every generation. To utilize the potential of such a system, signal processing algorithms have to be efficiently parallelized so that the load can be distributed evenly among the multiple processing units. In this paper, we study several advanced deterministic and stochastic signal processing algorithms and their computation using multiple processing units. Specifically, we consider two commonly used time-frequency signal representations, the short-time Fourier transform and the Wigner distribution, and we demonstrate their parallelization with low communication overhead. We also consider sequential Monte Carlo estimation techniques such as particle filtering, and we demonstrate that its multiple processor implementation requires large data exchanges and thus a high communication overhead. We propose a modified mapping scheme that reduces this overhead at the expense of a slight loss in accuracy, and we evaluate the performance of the scheme for a state estimation problem with respect to accuracy and scalability.

## Closing the gap: CPU and FPGA Trends in sustainable floating-point BLAS performance

Field programmable gate arrays (FPGAs) have long been an attractive alternative to microprocessors for computing tasks — as long as floating-point arithmetic is not required. Fueled by the advance of Moore’s Law, FPGAs are rapidly reaching sufficient densities to enhance peak floating-point performance as well. The question, however, is how much of this peak performance can be sustained. This paper examines three of the basic linear algebra subroutine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply. A comparison of microprocessors, FPGAs, and Reconfigurable Computing platforms is performed for each operation. The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs. This analysis considers the historical context of the last six years and is extrapolated for the next six years.

## Algorithms and tools for automatic generation of DSP hardware structures

The increased complexity of Digital Signal Processing (DSP) algorithms demands for the development of more complex and more eﬃcient hardware structures. The work presented herein describes the core components for the development of a tool capable of automatic generation of eﬃcient hardware structures, therefore facilitating developers work. It comprises algorithms and techniques for i) balancing the paths in a graph, ii) scheduling of operations to functional units, iii) allocating registers and iv) generating the VHDL code. Results show that the developed techniques are capable of generating the hardware structure of typical DSP algorithms represented in data-ﬂow graphs with over 2,000 nodes in around 200 ms, scaling to 80,000 nodes in about 214 s. Within the developed techniques, solving the scheduling problem is one of the most complex tasks: it is a NP-complete problem and directly inﬂuences the number of functional units and registers required. Therefore, experimental analysis was made on scheduling algorithms for time-constrained problems. Results show that simple list-based algorithms are more eﬃcient in large problems than more complex algorithms: they run faster and tend to require less functional units.

## Voice Codec for Floating Point Processor

As part of an ongoing project at the department of electrical engineering, ISY, at Linköping University, a voice decoder using floating point formats has been the focus of this master thesis. Previous work has been done developing an mp3-decoder using the floating point formats. All is expected to be implemented on a single DSP.The ever present desire to make things smaller, more efficient and less power consuming are the main reasons for this master thesis regarding the use of a floating point format instead of the traditional integer format in a GSM codec. The idea with the low precision floating point format is to be able to reduce the size of the memory. This in turn reduces the size of the total chip area needed and also decreases the power consumption.One main question is if this can be done with the floating point format without losing too much sound quality of the speech. When using the integer format, one can represent every value in the range depending on how many bits are being used. When using a floating point format you can represent larger values using fewer bits compared to the integer format but you lose representation of some values and have to round the values off.From the tests that have been made with the decoder during this thesis, it has been found that the audible difference between the two formats is very small and can hardly be heard, if at all. The rounding seems to have very little effect on the quality of the sound and the implementation of the codec has succeeded in reproducing similar sound quality to the GSM standard decoder.

## Implementation of Algorithms on FPGAs

This thesis describes how an algorithm is transferred from a digital signal processor to an embedded microprocessor in an FPGA using C to hardware program from Altera. Saab Avitronics develops the secondary high lift control system for the Boeing 787 aircraft. The high lift system consists of electric motors controlling the trailing edge wing flaps and the leading edge wing slats. The high lift motors manage to control the Boeing 787 aircraft with full power even if half of each motor’s stators are damaged. The motor is a PMDC brushless motor which is controlled by an advanced algorithm. The algorithm needs to be calculated by a fast special digital signal processor. In this thesis I have tested if the algorithm can be transferred to an FPGA and still manage the time and safety demands. This was done by transferring an already working algorithm from the digital signal processor to an FPGA. The idea was to put the algorithm in an embedded NIOS II microprocessor and speed up the bottlenecks with Altera’s C to hardware program. The study shows that the C-code needs to be optimized for C to hardware to manage the up speeding part, as the tests showed that the calculation time for the algorithm actually became longer with C to hardware. This thesis also shows that it is highly probable to use an FPGA equipped with Altera’s NIOS II safety critical microprocessor instead of a digital signal processor to control the electrical high lift motors in the Boeing 787 aircraft.

## A DGPS/Radiobeacon Receiver for Minimum Shift Keying with Soft Decision Capabilities

The Global Positioning System (GPS) is now in operation, and many improvements to its performance are being sought. One such improvement is Differential GPS (DGPS), where known errors in the GPS broadcast are identified and the corrections broadcast to the end user. One implementation of DGPS being considered is the use of coastal marine radio direction finding (RDF) radiobeacons in the 285-325kHz band as transmitters for the DGPS broadcast. The normal RDF beacon signal consists of a continuous carrier on a one kilohertz boundary plus a Morse-code identification signal 1025Hz above the carrier. In the DGPS/radiobeacon implementation proposed for the US coastal regions, the differential data link signal uses minimum shift keying (MSK) at a data rate of 25, 50, 100, 200 or 400 baud (the exact baud rat has not yet been decided). This MSK signal is centered between the RDF beacon carrier and identification signal. At the frequencies that these radiobeacons are operated, the prevailing atmospheric noise is both non-Gaussian and very strong. This noise characteristic makes the design of a long-range data link difficult. One solution that has been proposed is the use of forward error correction (FEC) coding of the data. The performance of FEC decoders can be improved by the used of a soft decision receiver, which delivers both bit decisions and information about the validity of the bit decisions. This work describes the design of a radio receiver for DGPS/Radiobeacon servics which is capable of reception of 400 baud MSK in the DGPS/Radiobeacon band. The receiver is designed to be easily augmented to provide soft decisions and easily modified to recieve MSK at data rates of 25 to 400 baud. The radio is a microprocessor controlled dual conversion superheterodyne with an audio frequency of 1kHz. The demodulator runs on the same microprocessor that controls the radio. The weak-signal performance of the demodulator is very good: the Eb/No vs. bit error rate performance of the demodulator is only a couple of dB worse than the theoretical performance of differential phase-shift keying. The radio has a noise floor of -114dBm referenced to it's 500Hz wide audio bandwidth and a 3rd order intermodulation intercept of +7dBm for a dynamic range of 83dB. This work concludes with a thumbnail analysis of the operations needed to implement a soft bit decision estimator, and some suggestions for the implementation of said soft bit decision estimator.

## Teaching MODEM Concepts and Design Procedure with MATLAB Simulations

MATLAB simulation is used as the primary tool to illustrate concepts, to validate MODEM designs, and to vent' operation of the subsystems employed in DSP based transmitters and receivers presented in a pair of classes on MODEM Design and Digital Receiver Design. The whole gamut of subsystems found in conventional and experimental modem designs are simulated and assembled to form a full end-to-end simulation of an operating MODEM. This paper describes the philosophy used to guide class involvement and assess the experience and the learning value to student participants.

## The Risk In Using Frequency Domain Curves To Evaluate Digital Integrator Performance

This article shows the danger in evaluating the performance of a digital integration network based solely on its frequency response curve. If you plan on implementing a digital integrator in your signal processing work I recommend you continue reading this article.

## Implementing Simultaneous Digital Differentiation, Hilbert Transformation, and Half-Band Filtering

Recently I've been thinking about digital differentiator and Hilbert transformer implementations and I've developed a processing scheme that may be of interest to the readers here on dsprelated.com.

## Digital Image Processing Using LabView

Digital Image processing is a topic of great relevance for practically any project, either for basic arrays of photodetectors or complex robotic systems using artificial vision. It is an interesting topic that offers to multimodal systems the capacity to see and understand their environment in order to interact in a natural and more efficient way. The development of new equipment for high speed image acquisition and with higher resolutions requires a significant effort to develop techniques that process the images in a more efficient way. Besides, medical applications use new image modalities and need algorithms for the interpretation of these images as well as for the registration and fusion of the different modalities, so that the image processing is a productive area for the development of multidisciplinary applications. The aim of this chapter is to present different digital image processing algorithms using LabView and IMAQ vision toolbox. IMAQ vision toolbox presents a complete set of digital image processing and acquisition functions that improve the efficiency of the projects and reduce the programming effort of the users obtaining better results in shorter time. Therefore, the IMAQ vision toolbox of LabView is an interesting tool to analyze in detail and through this chapter it will be presented different theories about digital image processing and different applications in the field of image acquisition, image transformations. This chapter includes in first place the image acquisition and some of the most common operations that can be locally or globally applied, the statistical information generated by the image in a histogram is commented later. Finally, the use of tools allowing to segment or filtrate the image are described making special emphasis in the algorithms of pattern recognition and matching template.

## Region based Active Contour Segmentation

In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.