Implementation of Algorithms on FPGAs
This thesis describes how an algorithm is transferred from a digital signal processor to an embedded microprocessor in an FPGA using C to hardware program from Altera. Saab Avitronics develops the secondary high lift control system for the Boeing 787 aircraft. The high lift system consists of electric motors controlling the trailing edge wing flaps and the leading edge wing slats. The high lift motors manage to control the Boeing 787 aircraft with full power even if half of each motor’s stators are damaged. The motor is a PMDC brushless motor which is controlled by an advanced algorithm. The algorithm needs to be calculated by a fast special digital signal processor. In this thesis I have tested if the algorithm can be transferred to an FPGA and still manage the time and safety demands. This was done by transferring an already working algorithm from the digital signal processor to an FPGA. The idea was to put the algorithm in an embedded NIOS II microprocessor and speed up the bottlenecks with Altera’s C to hardware program. The study shows that the C-code needs to be optimized for C to hardware to manage the up speeding part, as the tests showed that the calculation time for the algorithm actually became longer with C to hardware. This thesis also shows that it is highly probable to use an FPGA equipped with Altera’s NIOS II safety critical microprocessor instead of a digital signal processor to control the electrical high lift motors in the Boeing 787 aircraft.
DSP Memory Management in a Third Generation High Performance Base Station
Most of the tasks in a mobile cellular network base station are performed with programmable digital signal processors. Their memory spaces and management features are very limited. The buffering requirements in the base station can have large instantaneous variations during the simultaneous transmission of burst' data on multiple channels to multiple users. In particular the high bit-rates of the Wideband Code Division Multiple Access data transfer evolution High Speed Downlink Packet Access create very high demands for buffering. The fragmentation of the buffer memory is a threat. It causes a gradual decrease in performance, which is critical in a long running process like the base station. The amount of fragmentation is different with different memory management methods. In this work the features and applicability of different memory management methods for signal processors used in the base stations of third generation cellular networks have been studied. Software based memory management includes a high amount of conditional branches. The signal processor, which is optimized for highly parallel sequential computing, executes conditional branches very badly when compared to microcontrollers and general-purpose processors. The memory management methods are first studied in theory and then experimentally. In the experiments two different memory management methods were analyzed. The memory managers were loaded with a synthetic workload program that simulates multi-user high bit-rate data transmissions in the base station. The performances of the memory managers were measured in terms of fragmentation, execution time and memory utilization. The experiments confirmed the information gained from the theoretical studies that different memory management methods are usually optimized for a certain feature. The experiments showed that a simple method is fast to execute and works well with small and intermediate loads. When the load is increased the performance decreases. The second, more complex, measured method was found to require more computing, but to be capable of using the memory space assigned to it more effectively.
Real Time Implementation of Multi-Level Perfect Signal Reconstruction Filter Bank
Discrete Wavelet Transform (DWT) is an efficient tool for signal and image processing applications which has been utilized for perfect signal reconstruction. In this paper, twenty seven optimum combinations of three different wavelet filter types, three different filter reconstruction levels and three different kinds of signal for multi-level perfect reconstruction filter bank were implemented in MATLAB/Simulink. All the filters for different wavelet types were designed using Filter Design Analysis (FDA) and Wavelet toolbox. Signal to Noise Ratio (SNR) was calculated for each combination. Combination with best SNR was then implemented on TMS320C6713 DSP kit. Real time testing of perfect reconstruction on DSP kit was then carried out by two different methods. Experimental results accede with theory and simulations.
BLAS Comparison on FPGA, CPU and GPU
High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.
Energy Profiling of DSP Applications, A Case Study of an Intelligent ECG Monitor
Proper balance of power and performance for optimum system organization requires precise profiling of the power consumption of different hardware subsystems as well as software functions. Moreover, power consumption of mobile systems is even more important, since the battery is a large portion of the overall size and weight of the system. Average power consumption is only a crude estimate of power requirements and battery life; a much better estimate can be made using dynamic power consumption. Dynamic power consumption is a function of the execution profile of the given application running on specific hardware platform. In this paper we introduce a new environment for energy profiling of DSP applications. The environment consists of a JTAG emulator, a high-resolution HP 3583A multimeter and a workstation that controls devices and stores the traces. We use Texas Instruments’ Real Time Data Exchange mechanism (RTDXÔ) to generate an execution profile and custom procedures for energy profile data acquisition using GPIB interface. We developed custom procedures to correlate and analyze both energy and execution profiles. The environment allows us to improve the system power consumption through changes in software organization and to measure real battery life for the given hardware, software and battery configuration. As a case study, we present the analysis of a real-time portable ECG monitor implemented using a Texas Instruments TMS320C5410-100 processor board, and a Del Mar PWA ECG Amplifier.
Real-time Motion Picture Restoration
Through age or misuse, motion picture films can develop damage in the form of dirt or scratches which detract from the quality of the film. Removal of these artifacts is a worthwhile process as it makes the films more visually attractive and extends the life of the material. In this thesis, various methods for detecting and concealing the effects of film damage are described. Appropriate algorithms are selected for implementation of a system, based on a TMS320C80 video processor, which can remove the effects of film defects using digital processing. The restoration process operates in real-time at video frame rates (30 frames per second). Details of the software implementation of this system are presented along with results from processing damaged film material. The effects of damage are significantly reduced after processing.
An FPGA Implementation of Hierarchical Motion Estimation for Embedded Oject Tracking
This paper presents the hardware implementation of an algorithm developed to provide automatic motion detection and object tracking functionality embedded within intelligent CCTV systems. The implementation is targeted at an Altera Stratix FPGA making full use of the dedicated DSP resource. The Altera Nios embedded processor provides a platform for the tracking control loop and generic Pan Tilt Zoom camera interface. This paper details the explicit functional stages of the algorithm that lend themselves to an optimised pipelined hardware implementation. This implementation provides maximum data throughput, providing real-time operation of the described algorithm, and enables a moving camera to track a moving object in real time.
FUZZY LOGIC BASED CONVOLUTIONAL DECODER FOR USE IN MOBILE TELEPHONE SYSTEMS
Efficient convolutional coding and decoding algorithms are most crucial to successful operation of wireless communication systems in order to achieve high quality of service by reducing the overall bit error rate performance. A widely applied and well evaluated scheme for error correction purposes is well known as Viterbi algorithm [7]. Although the Viterbi algorithm has very good error correcting characteristics, computational effort required remains high. In this paper a novel approach is discussed introducing a convolutional decoder design based on fuzzy logic. A simplified version of this fuzzy based decoder is examined with respect to bit error rate (BER) performance. It can be shown that the fuzzy based convolutional decoder here proposed considerably reduces computational effort with only minor BER performance degradation when compared to the classical Viterbi approach.
Biosignal processing challenges in emotion recognition for adaptive learning
User-centered computer based learning is an emerging field of interdisciplinary research. Research in diverse areas such as psychology, computer science, neuroscience and signal processing is making contributions to take this field to the next level. Learning systems built using contributions from these fields could be used in actual training and education instead of just laboratory proof-of-concept. One of the important advances in this research is the detection and assessment of the cognitive and emotional state of the learner using such systems. This capability moves development beyond the use of traditional user performance metrics to include system intelligence measures that are based on current theories in neuroscience. These advances are of paramount importance in the success and wide spread use of learning systems that are automated and intelligent. Emotion is considered an important aspect of how learning occurs, and yet estimating it and making adaptive adjustments are not part of most learning systems. In this research we focus on one specific aspect of constructing an adaptive and intelligent learning system, that is, estimation of the emotion of the learner as he/she is using the automated training system. The challenge starts with the definition of the emotion and the utility of it in human life. The next challenge is to measure the co-varying factors of the emotions in a non-invasive way, and find consistent features from these measures that are valid across wide population. In this research we use four physiological sensors that are non-invasive, and establish a methodology of utilizing the data from these sensors using different signal processing tools. A validated set of visual stimuli used worldwide in the research of emotion and attention, called International Affective Picture System (IAPS), is used. A dataset is collected from the sensors in an experiment designed to elicit emotions from these validated visual stimuli. We describe a novel wavelet method to calculate hemispheric asymmetry metric using electroencephalography data. This method is tested against typically used power spectral density method. We show overall improvement in accuracy in classifying specific emotions using the novel method. We also show distinctions between different discrete emotions from the autonomic nervous system activity using electrocardiography, electrodermal activity and pupil diameter changes. Findings from different features from these sensors are used to give guidelines to use each of the individual sensors in the adaptive learning environment.
HIERARCHICAL MOTION ESTIMATION FOR EMBEDDED OBJECT TRACKING
This paper presents an algorithm developed to provide automatic motion detection and object tracking embedded within intelligent CCTV systems. The algorithm development focuses on techniques which provide an efficient embedded systems implementation with the ability to target both FPGA and DSP devices. During algorithm development constraints on hardware implementation have been fully considered resulting in an algorithm which, when targeted at current FPGA devices, will take full advantage of the DSP resource commonly provided in such devices. The hierarchical structure of the proposed algorithm provides the system with a multi-level motion estimation process allowing low resolution estimation for motion detection and further higher resolution stages for motion estimation. An initial MATLAB prototype has demonstrated this algorithm capable of object motion estimation while compensating for camera motion, allowing a moving object to be tracked by a moving camera.
An Advanced Signal Processing Toolkit for Java applications
The aim of this study is to examine the capability, performance, and relevance of a signal processing toolkit in Java, a programming language for Web-based applications. Due to the simplicity, ease and application use of the toolkit and with the advanced Internet technologies such as Remote Method Invocation (RMI), a spectral estimation applet has been created in the Java environment. This toolkit also provides an interactive and visual approach in understanding the various theoretical concepts of spectral estimation and shows the need to create more application applets to better understand the various concepts of signal and image processing. This study also focuses on creating a Java toolkit for embedded systems, such as Personal Digital Assistants (PDAs), embedded Java board, and supporting integer precision, and utilizing COordinate Rotation DIgital Computer (CORDIC) algorithm, both aimed to provide good performance in resource-limited environments. The results show a feasibility and necessity of developing a standardized Application Programming Interface (API) for the fixed-point signal processing library.
A Two-Level Reconfigurable Cell Array for Digital Signal Processing
Reconfigurable hardware has become an attractive option for implementing digital signal processing, especially in systems that require both high performance and flexibility. This thesis presents a novel two-level reconfigurable architecture targeted toward systems with these requirements. The architecture supports a large orthogonal design space whereby designers can customize the word length, amount of parallelism, number of functional units, and functional unit connectivity to meet the needs of the application. On the upper level, algorithms are mapped onto an array of 4-bit cells and a hierarchical interconnection fabric. The interconnection structure contains a mesh of 4-bit busses for local data transfer, as well as an H-tree for communicating results between functional units. On the lower level, each cell contains a small matrix of elements that collectively implement all necessary operations. The matrix of elements has only two configurations: one optimized for mathematical functions such as multiply-accumulates, and the other optimized for memory operations. The system also contains pipeline latches to maximize clock rate and throughput. Circuit simulations indicate that the architecture achieves a clock frequency of 200 MHz in a modest 0.25-μm CMOS technology. An initial prototype of the reconfigurable cell has been fabricated in 0.5-μm CMOS and tested for functionality. The estimated execution time for a 16-bit, 256-point Fast Fourier Transform shows a speedup ranging from 1.6 to 14 compared to contemporary digital signal processors.
Blind Adaptive Dereverberation of Speech Signals Using a Microphone Array
In this thesis, we present a blind adaptive speech dereverberation method based on the use of a reduced mutually referenced equalizers (RMRE) criterion. The method is based on the idea of the inversion of single-input multiple-output FIR linear systems, and as such requires the use of multiple microphones. However, unlike many traditional microphone array methods, there is no need for a specific array configuration or geometry. The RMRE method finds a subset of equalizers for a given delay in a single step, without the need for the typical channel estimation step. This makes the method practical in terms of implementation and avoids the pitfalls of the more complicated two step dereverberation approach, typical in many inversion methods. Additionally, only the second-order statistics of the signals recorded by the microphones are used, without the need for utilizing higher-order statistics information typically needed when the channsls have a nonminimum phase response, as is the case with room impulse responses. We present simulations and experimental results that demonstrate the applicability of the method when the input is speech, and show that in the noiseless case, perfect dereverberation can be achieved. We also evaluate its performance in the presence of noise, and we present a possible way to modify the proposed RMRE to work for very low SNR values. We also explore the problems when model-order mismatches are present, and demonstrate that the under-modeling of the channel impulse responses order can be combated by increasing the number of microphones. For order over-estimation, we will show that RMRE can handle such errors with no modification.
Automated Accident Detection in Intersections Via Digital Audio Signal Processing
The aim of this thesis is to design a system for automated accident detection in intersections. The input to the system is a three-second audio signal. The system can be operated in two modes: two-class and multi-class. The output of the two-class system is a label of “crash” or “non-crash”. In the multi-class system, the output is the label of “crash” or various non-crash incidents including “pile drive”, “brake”, and “normal-traffic” sounds. The system designed has three main steps in processing the input audio signal. They are: feature extraction, feature optimization and classification. Five different methods of feature extraction are investigated and compared; they are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstrum transform and Mel frequency cepstral transform. Linear discriminant analysis (LDA) is used to optimize the features obtained in the feature extraction stage by linearly combining the features using different weights. Three types of statistical classifiers are investigated and compared: the nearest neighbor, nearest mean, and maximum likelihood methods. Data collected from Jackson, MS and Starkville, MS and the crash signals obtained from Texas Transportation Institute crash test facility are used to train and test the designed system. The results showed that the wavelet based feature extraction method with LDA and maximum likelihood classifier is the optimum design. This wavelet-based system is computationally inexpensive compared to other methods. The system produced classification accuracies of 95% to 100% when the input signal has a signal-to-noise-ratio of at least 0 decibels. These results show that the system is capable of effectively classifying “crash” or “non-crash” on a given input audio signal.
Ignal Enhancement Using Time-Frequency Based Denoising
This thesis investigates and compares time and wavelet-domain denoising techniques where received signals contain broadband noise. We consider how time and wavelet-domain denoising schemes and their combinations compare in the mean squared error sense. This work applies Wiener prediction and Median filtering as they do not require any prior signal knowledge. In the wavelet-domain we use soft or hard thresholding on the detail coefficients. In addition, we explore the effect of these wavelet-domain thresholding techniques on the coefficients associated with cycle-spinning and the newly proposed recursive cycle-spinning scheme. Finally, we note that thresholding does not make an attempt to de-noise coefficients that remain after thresholding; therefore we apply time domain techniques to the remaining detail coefficients from the first level of decomposition in an attempt to de-noise them further prior to reconstruction. This thesis applies and compares these techniques using a mean squared error criterion to identify the best performing in a robust test signal environment. We find that soft thresholding with Stein’s Unbiased Risk Estimate (SURE) thresholding produces the best mean squared error results in each test case and that the addition of Wiener prediction to the first level of decomposition coefficients leads to a slightly enhanced performance. Finally, we illustrate the effects of denoising algorithms on longer data segments.
Efficient Signal Processing Techniques for Future Wireless Communications Systems
Wireless communications systems are evolving to be more diverse in use and more ubiquitous in nature. It is of fundamental importance that we consume the resources available in such systems, i.e., bandwidth and energy, to preserve room for more users and to preserve longevity. Signal processing can greatly help us achieve this. In this thesis we consider improving the utility of resources available in wireless communications systems. The basic obstacle for most wireless communications systems is the multipath channel that causes intersymbol interference. Channel estimation is a crucial step for recovering the transmitted symbols. Moreover, as more devices are equipped with wireless capabilities, the bandwidth becomes scarce and it is important to allow more than one device or more than one user to use the same frequency range or the same channel. However, this introduces multiuser interference, which is again eliminated only if the channel is known. Furthermore, most wireless systems are battery powered, at least at the transmitter end. Hence it is crucial that energy consumption is minimized to preserve the longevity of the system. The contribution of this thesis is three fold: (i) We propose novel bandwidth efficient blind channel estimation algorithms for single input multiple output systems, and for multiuser OFDM systems. The former exploits cyclostationarity inherent in communications signals. The latter exploits the structure introduced to the transmitted signal via precoding. We consider design of such precoders by optimizing performance metrics such as the bit error rate and signal to interference plus noise ratio. (ii) In the multiuser systems case, we propose a novel cooperative OFDM system and show that, when users face significantly different channel conditions, cooperation can improve the performance of all the cooperating users. (iii) We consider energy efficient training based system estimation in large MIMO systems. The goal there is to minimize energy consumption both in transmission of training symbols and in performing computations. We show that by using a divide and conquer strategy in selecting the active set of transmitters and receivers, it is possible to minimize energy consumption without degrading the accuracy of the channel estimate.
Image Analysis Using a Dual-Tree M-Band Wavelet Transform
We propose a 2D generalization to the M-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets. We particularly address (i) the construction of the dual basis and (ii) the resulting directional analysis. We also revisit the necessary pre-processing stage in the M-band case. While several reconstructions are possible because of the redundancy of the representation, we propose a new optimal signal reconstruction technique, which minimizes potential estimation errors. The effectiveness of the proposed M- band decomposition is demonstrated via denoising comparisons on several image types (natural, texture, seismics), with various M-band wavelets and thresholding strategies. Signicant improvements in terms of both overall noise reduction and direction preservation are observed.
Noise covariance properties in Dual-Tree Wavelet Decompositions
Dual-tree wavelet decompositions have recently gained much popularity, mainly due to their ability to provide an accurate directional analysis of images combined with a reduced redundancy. When the decomposition of a random process is performed – which occurs in particular when an additive noise is corrupting the signal to be analyzed – it is useful to characterize the statistical properties of the dual-tree wavelet coefficients of this process. As dual-tree decompositions constitute overcomplete frame expansions, correlation structures are introduced among the coefficients, even when a white noise is analyzed. In this paper, we show that it is possible to provide an accurate description of the covariance properties of the dual-tree coefficients of a wide-sense stationary process. The expressions of the (cross-) covariance sequences of the coefficients are derived in the one and two-dimensional cases. Asymptotic results are also provided, allowing to predict the behaviour of the second-order moments for large lag values or at coarse resolution. In addition, the crosscorrelations between the primal and dual wavelets, which play a primary role in our theoretical analysis, are calculated for a number of classical wavelet families. Simulation results are finally provided to validate these results.
A Nonlinear Stein Based Estimator for Multichannel Image Denoising
The use of multicomponent images has become widespread with the improvement of multisensor systems having increased spatial and spectral resolutions. However, the observed images are often corrupted by an additive Gaussian noise. In this paper, we are interested in multichannel image denoising based on a multiscale representation of the images. A multivariate statistical approach is adopted to take into account both the spatial and the inter-component correlations existing between the different wavelet subbands. More precisely, we propose a new parametric nonlinear estimator which generalizes many reported denoising methods. The derivation of the optimal parameters is achieved by applying Stein’s principle in the multivariate case. Experiments performed on multispectral remote sensing images clearly indicate that our method outperforms conventional wavelet denoising techniques.
Active control of automobile cabin noise with conventional and advanced speakers
Recently much research has focused on the control of enclosed sound fields, particularly in automobiles. Both Active Noise Control (ANC) and Active Structural Acoustic Control (ASAC) techniques are being applied to problems stemming from power train noise and road noise (noise due to the interaction of the tires with the surface of the road). Due to the low frequency characteristics of these noise problems, large acoustic sources are required to obtain efficient control of the sound field. This creates demand in the automobile industry for compact lightweight sources. This work is concerned with the application of active control to power train noise, as well as road noise in the interior cabin of a sport utility vehicle using advanced, compact lightweight piezoelectric acoustic sources. First, a test structure approximately the same size as the automobile was built to study the principles of active noise control in a cavity. A finite element model of the cavity was created in order to optimize the positions of the error sensors and the control sources. Experimental work was performed with the optimized actuator and sensor locations in order to validate the model, and draw conclusions regarding the conditions to obtain global control of the sound field. Second, a broad-band feedforward filtered-X LMS algorithm was used to control power train noise. Preliminary power train noise tests were conducted using arrangements of four microphones and up to four commercially available speakers for control. Attenuation of seven decibel (dB) at the error sensors was measured in the 40-500 Hz frequency band. The dimensions of the zone of quiet generated by the control were measured, and show that noise reductions were obtained for a large volume surrounding the error sensors. Next, advanced speakers were implemented for active control of power train noise. The results obtained with different arrangements of these speakers were very similar to those obtained with the commercially-available speakers. These advanced speakers use piezoelectric devices to induce the displacement of a speaker membrane, which radiates sound. Their lighter weight and compact dimensions are a significant advantage over conventional speakers, for their application in automobile. Third, preliminary results were obtained for active control of road noise. The controller used an optimized set of four reference signals to control the noise at one error sensor using one control source. Two sets of tests were conducted. The first set of tests was performed on a dynamometer, which simulates the effects of the road on the tires. The second set of tests was performed on a rough road. Reduction of two to four decibel of the sound pressure level at the error sensor was obtained between 100 and 200 Hz.
Auditory Component Analysis Using Perceptual Pattern Recognition to Identify and Extract Independent Components From an Auditory Scene
The cocktail party effect, our ability to separate a sound source from a multitude of other sources, has been researched in detail over the past few decades, and many investigators have tried to model this on computers. Two of the major research areas currently being evaluated for the so-called sound source separation problem are Auditory Scene Analysis (Bregman 1990) and a class of statistical analysis techniques known as Independent Component Analysis (Hyvärinen 2001). This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly, measures features of the resulting tracks, and finally separates sounds statistically by matching feature sets and making the output streams statistically independent. Artificial and acoustical mixes of sounds are used to evaluate the signal-to-noise ratio where the signal is the desired source and the noise is comprised of all other sources. The proposed system is found to successfully separate audio streams. The amount of separation is inversely proportional to the amount of reverberation present.
Restoration of Nonlinearly Distorted Optical Soundtracks Using Regularized Inverse Characteristics
This dissertation is concerned with the possibilities of restoration of degraded film-sound. The sound-quality of old films are often not acceptable, which means that the sound is so noisy and distorted that the listener have to take strong efforts to understand the conversations in the film. In this case the film cannot give artistic enjoyment to the listener. This is the reason that several old films cannot be presented in movies or television. The quality of these films can be improved by digital restoration techniques. Since we do not have access to the original signal, only the distorted one, therefore we cannot adjust recording parameters or recording techniques. The only possibility is to post-compensate the signal to produce a better estimate about the undistorted, noiseless signal. In this dissertation new methods are proposed for fast and efficient restoration of nonlinear distortions in the optically recorded film soundtracks. First the nonlinear models and nonlinear restoration techniques are surveyed and the ill-posedness of nonlinear post-compensation (the extreme sensitivity to noise) is explained. The effects and sources of linear and nonlinear distortions at optical soundtracks are also described. A new method is proposed to overcome the ill-posedness of the restoration problem and to get an optimal result. The effectiveness of the algorithm is proven by simulations and restoration of real film-sound signals.
Ignal Enhancement Using Time-Frequency Based Denoising
This thesis investigates and compares time and wavelet-domain denoising techniques where received signals contain broadband noise. We consider how time and wavelet-domain denoising schemes and their combinations compare in the mean squared error sense. This work applies Wiener prediction and Median filtering as they do not require any prior signal knowledge. In the wavelet-domain we use soft or hard thresholding on the detail coefficients. In addition, we explore the effect of these wavelet-domain thresholding techniques on the coefficients associated with cycle-spinning and the newly proposed recursive cycle-spinning scheme. Finally, we note that thresholding does not make an attempt to de-noise coefficients that remain after thresholding; therefore we apply time domain techniques to the remaining detail coefficients from the first level of decomposition in an attempt to de-noise them further prior to reconstruction. This thesis applies and compares these techniques using a mean squared error criterion to identify the best performing in a robust test signal environment. We find that soft thresholding with Stein’s Unbiased Risk Estimate (SURE) thresholding produces the best mean squared error results in each test case and that the addition of Wiener prediction to the first level of decomposition coefficients leads to a slightly enhanced performance. Finally, we illustrate the effects of denoising algorithms on longer data segments.
Image Analysis Using a Dual-Tree M-Band Wavelet Transform
We propose a 2D generalization to the M-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets. We particularly address (i) the construction of the dual basis and (ii) the resulting directional analysis. We also revisit the necessary pre-processing stage in the M-band case. While several reconstructions are possible because of the redundancy of the representation, we propose a new optimal signal reconstruction technique, which minimizes potential estimation errors. The effectiveness of the proposed M- band decomposition is demonstrated via denoising comparisons on several image types (natural, texture, seismics), with various M-band wavelets and thresholding strategies. Signicant improvements in terms of both overall noise reduction and direction preservation are observed.
Voice Activity Detection. Fundamentals and Speech Recognition System Robustness
An important drawback affecting most of the speech processing systems is the environmental noise and its harmful effect on the system performance. Examples of such systems are the new wireless communications voice services or digital hearing aid devices. In speech recognition, there are still technical barriers inhibiting such systems from meeting the demands of modern applications. Numerous noise reduction techniques have been developed to palliate the effect of the noise on the system performance and often require an estimate of the noise statistics obtained by means of a precise voice activity detector (VAD). Speech/non-speech detection is an unsolved problem in speech processing and affects numerous applications including robust speech recognition, discontinuous transmission, real-time speech transmission on the Internet or combined noise reduction and echo cancellation schemes in the context of telephony. The speech/non-speech classification task is not as trivial as it appears, and most of the VAD algorithms fail when the level of background noise increases. During the last decade, numerous researchers have developed different strategies for detecting speech on a noisy signal and have evaluated the influence of the VAD effectiveness on the performance of speech processing systems. Most of the approaches have focussed on the development of robust algorithms with special attention being paid to the derivation and study of noise robust features and decision rules. The different VAD methods include those based on energy thresholds, pitch detection, spectrum analysis, zero-crossing rate, periodicity measure, higher order statistics in the LPC residual domain or combinations of different features. This chapter shows a comprehensive approximation to the main challenges in voice activity detection, the different solutions that have been reported in a complete review of the state of the art and the evaluation frameworks that are normally used. The application of VADs for speech coding, speech enhancement and robust speech recognition systems is shown and discussed. Three different VAD methods are described and compared to standardized and recently reported strategies by assessing the speech/non-speech discrimination accuracy and the robustness of speech recognition systems.
A pole-zero placement technique for designing second-order IIR parametric equalizer filters
A new procedure is presented for designing second-order parametric equalizer filters. In contrast to the traditional approach, in which the design is based on a bilinear transform of an analog filter, the presented procedure allows for designing the filter directly in the digital domain. A rather intuitive technique known as pole-zero placement, is treated here in a quantitative way. It is shown that by making some meaningful approximations, a set of relatively simple design equations can be obtained. Design examples of both notch and resonance filters are included to illustrate the performance of the proposed method, and to compare with state-of-the-art solutions.
EFFICIENT MAPPING OF ADVANCED SIGNAL PROCESSING ALGORITHMS ON MULTI-PROCESSOR ARCHITECTURES
Modern microprocessor technology is migrating from simply increasing clock speeds on a single processor to placing multiple processors on a die to increase throughput and power performance in every generation. To utilize the potential of such a system, signal processing algorithms have to be efficiently parallelized so that the load can be distributed evenly among the multiple processing units. In this paper, we study several advanced deterministic and stochastic signal processing algorithms and their computation using multiple processing units. Specifically, we consider two commonly used time-frequency signal representations, the short-time Fourier transform and the Wigner distribution, and we demonstrate their parallelization with low communication overhead. We also consider sequential Monte Carlo estimation techniques such as particle filtering, and we demonstrate that its multiple processor implementation requires large data exchanges and thus a high communication overhead. We propose a modified mapping scheme that reduces this overhead at the expense of a slight loss in accuracy, and we evaluate the performance of the scheme for a state estimation problem with respect to accuracy and scalability.
Auditory System for a Mobile Robot
The auditory system of living creatures provides useful information about the world, such as the location and interpretation of sound sources. For humans, it means to be able to focus one's attention on events, such as a phone ringing, a vehicle honking, a person taking, etc. For those who do not suffer from hearing impairments, it is hard to imagine a day without being able to hear, especially in a very dynamic and unpredictable world. Mobile robots would also benefit greatly from having auditory capabilities. In this thesis, we propose an artificial auditory system that gives a robot the ability to locate and track sounds, as well as to separate simultaneous sound sources and recognising simultaneous speech. We demonstrate that it is possible to implement these capabilities using an array of microphones, without trying to imitate the human auditory system. The sound source localisation and tracking algorithm uses a steered beamformer to locate sources, which are then tracked using a multi-source particle filter. Separation of simultaneous sound sources is achieved using a variant of the Geometric Source Separation (GSS) algorithm, combined with a multisource post-filter that further reduces noise, interference and reverberation. Speech recognition is performed on separated sources, either directly or by using Missing Feature Theory (MFT) to estimate the reliability of the speech features. The results obtained show that it is possible to track up to four simultaneous sound sources, even in noisy and reverberant environments. Real-time control of the robot following a sound source is also demonstrated. The sound source separation approach we propose is able to achieve a 13.7 dB improvement in signal-to-noise ratio compared to a single microphone when three speakers are present. In these conditions, the system demonstrates more than 80% accuracy on digit recognition, higher than most human listeners could obtain in our small case study when recognising only one of these sources. All these new capabilities will allow humans to interact more naturally with a mobile robot in real life settings.
An application of neural networks to adaptive playout delay in VoIP
The statistical nature of data traffic and the dynamic routing techniques employed in IP networks results in a varying network delay (jitter) experienced by the individual IP packets which form a VoIP flow. As a result voice packets generated at successive and periodic intervals at a source will typically be buffered at the receiver prior to playback in order to smooth out the jitter. However, the additional delay introduced by the playout buffer degrades the quality of service. Thus, the ability to forecast the jitter is an integral part of selecting an appropriate buffer size. This paper compares several neural network based models for adaptive playout buffer selection and in particular a novel combined wavelet transform/neural network approach is proposed. The effectiveness of these algorithms is evaluated using recorded VoIP traces by comparing the buffering delay and the packet loss ratios for each technique. In addition, an output speech signal is reconstructed based on the packet loss information for each algorithm and the perceptual quality of the speech is then estimated using the PESQ MOS algorithm. Simulation results indicate that proposed Haar-Wavelets-Packet MLP and Statistical-Model MLP adaptive scheduling schemes offer superior performance.
HIERARCHICAL MOTION ESTIMATION FOR EMBEDDED OBJECT TRACKING
This paper presents an algorithm developed to provide automatic motion detection and object tracking embedded within intelligent CCTV systems. The algorithm development focuses on techniques which provide an efficient embedded systems implementation with the ability to target both FPGA and DSP devices. During algorithm development constraints on hardware implementation have been fully considered resulting in an algorithm which, when targeted at current FPGA devices, will take full advantage of the DSP resource commonly provided in such devices. The hierarchical structure of the proposed algorithm provides the system with a multi-level motion estimation process allowing low resolution estimation for motion detection and further higher resolution stages for motion estimation. An initial MATLAB prototype has demonstrated this algorithm capable of object motion estimation while compensating for camera motion, allowing a moving object to be tracked by a moving camera.






