Correcting an Important Goertzel Filter Misconception
This article illustrates the signal amplitude loss inherent in a traditional complex down-conversion system. (In the literature of signal processing, complex down-conversion is also called "quadrature demodulation.")
I recently learned an interesting rule of thumb regarding the use of an amplifier to drive the input of an analog to digital converter (ADC). The rule of thumb describes how to specify the maximum allowable noise power of the amplifier.
Towards Efﬁcient and Robust Automatic Speech Recognition: Decoding Techniques and Discriminative Training●1 comment
Automatic speech recognition has been widely studied and is already being applied in everyday use. Nevertheless, the recognition performance is still a bottleneck in many practical applications of large vocabulary continuous speech recognition. Either the recognition speed is not sufﬁcient, or the errors in the recognition result limit the applications. This thesis studies two aspects of speech recognition, decoding and training of acoustic models, to improve speech recognition performance in different conditions.
This article surveys the theory of compressive sensing, also known as compressed sensing or CS, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition.
Chapter 1 of the book: "Compressed Sensing: Theory and Applications".
Chapter 1 of the book: Real-Time Digital Signal Processing: Fundamentals, Implementations and Applications, 3rd Edition
This document describes a straightforward method to significantly reduce the number of necessary multiplies per input sample of traditional IIR lowpass and highpass digital filters.
In digital signal processing (DSP) we're all familiar with the processes of bandpass sampling an analog bandpass signal and downsampling a digital bandpass signal. The overall spectral behavior of those operations are well-documented. However, mathematical expressions for computing the translated frequency of individual spectral components, after bandpass sampling or downsampling, are not available in the standard DSP textbooks. This document explains how to compute the frequencies of translated spectral components and provide the desired equations in the hope that they are of use to you.
Recently I've been reading papers on underwater acoustic communications systems and this caused me to investigate the frequency-domain effects of time-reversal of time-domain sequences. I created this article because there is so little coverage of this topic in the literature of DSP.
Peak to Average Power Ratio (PAPR) is often used to characterize digitally modulated signals. One example application is setting the level of the signal in a digital modulator. Knowing PAPR allows setting the average power to a level that is just low enough to minimize clipping.
This book provides an applications-oriented introduction to digital signal processing written primarily for electrical engineering undergraduates. Practicing engineers and graduate students may also ﬁnd it useful as a ﬁrst text on the subject.
In this article, we develop complex-baseband models for several signal impairments: interfering carrier, multipath, phase noise, and Gaussian noise. To provide concrete examples, we'll apply the impairments to a QAM system. The impairment models are Matlab functions that each use at most seven lines of code. Although our example system is QAM, the models can be used for any complex-baseband signal.
An important drawback affecting most of the speech processing systems is the environmental noise and its harmful effect on the system performance. Examples of such systems are the new wireless communications voice services or digital hearing aid devices. In speech recognition, there are still technical barriers inhibiting such systems from meeting the demands of modern applications. Numerous noise reduction techniques have been developed to palliate the effect of the noise on the system performance and often require an estimate of the noise statistics obtained by means of a precise voice activity detector (VAD). Speech/non-speech detection is an unsolved problem in speech processing and affects numerous applications including robust speech recognition, discontinuous transmission, real-time speech transmission on the Internet or combined noise reduction and echo cancellation schemes in the context of telephony. The speech/non-speech classification task is not as trivial as it appears, and most of the VAD algorithms fail when the level of background noise increases. During the last decade, numerous researchers have developed different strategies for detecting speech on a noisy signal and have evaluated the influence of the VAD effectiveness on the performance of speech processing systems. Most of the approaches have focussed on the development of robust algorithms with special attention being paid to the derivation and study of noise robust features and decision rules. The different VAD methods include those based on energy thresholds, pitch detection, spectrum analysis, zero-crossing rate, periodicity measure, higher order statistics in the LPC residual domain or combinations of different features. This chapter shows a comprehensive approximation to the main challenges in voice activity detection, the different solutions that have been reported in a complete review of the state of the art and the evaluation frameworks that are normally used. The application of VADs for speech coding, speech enhancement and robust speech recognition systems is shown and discussed. Three different VAD methods are described and compared to standardized and recently reported strategies by assessing the speech/non-speech discrimination accuracy and the robustness of speech recognition systems.
The aim of this project consists in the FPGA design and implementation of a transmitter and receiver (Tx/Rx) multicarrier system such the Orthogonal Frequency Division Multiplexing (OFDM). This Tx/Rx OFDM subsystem is capable to deal with with different M-QAM modulations and is implemented in a digital signal processor (DSP-FPGA). The implementation of the Tx/Rx subsystem has been carried out in a FPGA using both System Generator visual programming running over Matlab/Simulink, and the Xilinx ISE program which uses VHDL language. This project is divided into four chapters, each one with a concrete objective. The first chapter is a brief introduction to the digital signal processor used, a field-programmable gate array (FPGA), and to the VHDL programming language. The second chapter is an overview on OFDM, its main advantages and disadvantages in front of previous systems, and a brief description of the different blocks composing the OFDM system. Chapter three provides the implementation details for each of these blocks, and also there is a brief explanation on the theory behind each of the OFDM blocks to provide a better comprehension on its implementation. The fourth chapter is focused, on the one hand, in showing the results of the Matlab/Simulink simulations for the different simulation schemes used and, on the other hand, to show the experimental results obtained using the FPGA to generate the OFDM signal at baseband and then upconverted at the frequency of 3,5 GHz. Finally the conclusions regarding the whole Tx/Rx design and implementation of the OFDM subsystem are given.
A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds●3 comments
Endowing machines with sensing capabilities similar to those of humans is a prevalent quest in engineering and computer science. In the pursuit of making computers sense their surroundings, a huge effort has been conducted to allow machines and computers to acquire, process, analyze and understand their environment in a human-like way. Focusing on the sense of hearing, the ability of computers to sense their acoustic environment as humans do goes by the name of machine hearing. To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In this paper, we present an up-to-date review of the most relevant audio feature extraction techniques developed to analyze the most usual audio signals: speech, music and environmental sounds. Besides revisiting classic approaches for completeness, we include the latest advances in the field based on new domains of analysis together with novel bio-inspired proposals. These approaches are described following a taxonomy that organizes them according to their physical or perceptual basis, being subsequently divided depending on the domain of computation (time, frequency, wavelet, image-based, cepstral, or other domains). The description of the approaches is accompanied with recent examples of their application to machine hearing related problems.
Chapter 1 of the book: "Compressed Sensing: Theory and Applications".