Chapter 12 of the book "Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications" - Musical Instruments - A Review of Basic Physics of Sound - Music Signal Features and Models - Ear: Hearing of Sounds - Psychoacoustics of Hearing - Music Compression - High Quality Music Coding: MPEG - Stereo Music - Music Recognition
This report covers a master thesis in embedded systems, the goal of which was to investigate the high speed data collection capabilities with a Blackfin DSP. Basic theory about sampling and noise is covered briefly from a practical point of view. The theory is intended to be useful for those diving into a ADC datasheet for the first time. After an investigation of the delimiting factors, suitable components were selected and a prototype ADC PCB was designed from scratch. The goal is to design a general low noise data collecting unit compatible with the Blackfin DSP. Finally simple DSP software is designed to prove that DSP can handle such a high datastream.Testing the ADC card with the target Blackfin platform indicates thatthe analog parts indeed works. An analog bandwidth of over 10MHz ismeasured at a resolution exceeding 10 bits with respect to noise. The digital parts intended to interleave the two channels digital streams into one Blackfin unit did not work as intended. Only one channel is supported as of now. The report contains suggestions for future work in this area.
Today, a high noise level is considered a problem in many working environments. The main reason is that it contributes to stress and fatigue. Traditional methods using passive noise control is only practicable for high frequencies. As a complement to passive noise control, active noise control (ANC) can be used to reduce low frequency noise. The main idea of ANC is to use destructive interference of waves to cancel disturbing noises. The purpose of this thesis is to design and implement an ANC system in the driver's cabin of a Valmet 890 forest machine. The engine boom is one of the most disturbing noises and therefore the main subjective for the ANC system to suppress. The ANC system is implemented on a Texas Instrument DSP development starter kit. Different FxLMS algorithms are evaluated with feedback and feedforward configurations. The results indicate that an ANC system significantly reduces the sound pressure level (SPL) in the cabin. Best performance of the evaluated systems is achieved for the feedforward FxLMS system. For a commonly used engine speed of 1500 rpm, the SPL is reduced with 17 dB. The results show fast enough convergence and global suppression of low frequency noise.
This master thesis consists of implementation and evaluation of an AEC, Acoustic Echo Canceller, algorithm in a floating-point architecture. The most important question this thesis will try to answer is to determine benefits or drawbacks of using a floating-point architecture, relative a fixed-point architecture, to do AEC. In a telephony system there is two common forms of echo, line echo and acoustic echo. Acoustic echo is introduced by sound emanating from a loudspeaker, e.g. in a handsfree or speakerphone, being picked up by a microphone and then sent back to the source. The problem with this feedback is that the far-end speaker will hear one, or multiple, time-delayed version(s) of her own speech. This time-delayed version of speech is usually perceived as both confusing and annoying unless removed by the use of AEC. In this master thesis the performance of a floating-point version of a normalized least-mean-square AEC algorithm was evaluated in an environment designed and implemented to approximate live telephony calls. An instruction-set simulator and assembler available at the initiation of this master thesis were extended to enable; zero-overhead loops, modular addressing, post-increment of registers and register-write forwarding. With these improvements a bit-true assembly version was implemented capable of real-time AEC requiring 15 million instructions per second. A solution using as few as eight mantissa bits, in an external format used when storing data in memory, was found to have an insignificant effect on the selected AEC implementation’s performance. Due to the relatively low memory requirement of the selected AEC algorithm, the use of a small external format has a minor effect on the required memory size. In total this indicates that the possible reduction of the memory requirement and related energy consumption, does not justify the added complexity and energy consumption of using a floating-point architecture for the selected algorithm. Use of a floating-point format can still be advantageous in speech-related signal processing when the introduced time delay by a subband, or a similar frequency domain, solution is unacceptable. Speech algorithms that have high memory use and small introduced delay requirements are a good candidate for a floating-point digital signal processor architecture.
Ogg Vorbis is a fairly new and growing audio format, often used for online distribution of music and internet radio stations for streaming audio. It is considered to be better than MP3 in both quality and compression and in the same league as for example AAC. In contrast with many other formats, like MP3 and AAC, Ogg Vorbis is patent and royalty free. The purpose of this thesis project was to investigate how the C6416 DSP processor and a Stratix II FPGA could be connected to each other and work together as co-processors and using an Ogg Vorbis decoder as implementation example. A fixed-point decoder called Tremor (developed by Xiph.Org the creator of the Vorbis I specification), has been ported to the DSP processor and an Ogg Vorbis player has been developed. Tremor was profiled before performing the software / hardware partitioning to decide what parts of the source code of Tremor that should be implemented in the FPGA to off-load and accelerate the DSP.
In this thesis a real time test platform for a permanent magnet synchronous motor is developed. The implemented algorithm is Field Oriented Control (FOC) and it is implemented on a Texas Instruments TMS320F2808 Digital Signal Processor (DSP). The platform is developed in a rapid prototyping approach using Matlab/Simulink and the Real Time Workshop (RTW) packages.With this software the control algorithm and its interface to different DSP modules, such as A/D converter and PWM module, is constructed as a Simulink block scheme. The blocks used come from ordinary Simulink libraries and libraries provided by the RTW packages. From the Simulink block scheme Matlab can auto generate embedded C code adapted for different embedded targets, in this case the 2808 DSP.The developed real time test platform is also a Simulink model, though different from the algorithm model. When the start simulation command is given in the platform model a Graphical User Interface is loaded which lets the user specify motor parameters and certain algorithm parameters. Once the parameters are chosen RTW generates code from the algorithm model, loads it into the DSP and runs the generated program. From the platform model it is possible to set the reference speed of the motor in real time and monitor/log motor parameters such as actual speed and stator currents.
Audio processing algorithms are increasingly used in cell phones and today’s customers are placing more demands on cell phones. Feature phones, once the advent of mobile phone technology, nowadays do more than just providing the user with MP3 play back or advanced audio effects. These features have become an integral part of medium as well as low-end phones. On the other hand, there is also an endeavor to include as improved quality as possible into products to compete in market and satisfy users’ needs. Tackling the above requirements has been partly satisfied by the advance in hardware design and manufacturing technology. However, as new hardware emerges into market the need for competence to write efficient software and exploit the new features thoroughly and effectively arises. Even though compilers are also keeping up with the new tide space for hand optimized code still exist. Wrapped in the above goal, an effort was made in this thesis to partly cover the competence requirement at Multimedia Section (part of Ericsson Mobile Platforms) to develope optimized code for new processors. Forging persistently ahead with new products, EMP has always incorporated the latest technology into its products among which ARMv6 family of processors has the main central processing role in a number of upcoming products. To fully exploit latest features provided by ARMv6, it was required to probe its new instruction set among which new media processing instructions are of outmost importance. In order to execute DSP-intensive algorithms (e.g. Audio Processing algorithms) efficiently, the implementation should be done in low-level code applying available instruction set. Meanwhile, ARMv6 comes with a number of new features in comparison with its predecessors. SIMD (Single Instruction Multiple Data) and VFP (Vector Floating Point) are the most prominent media processing improvements in ARMv6. Aligned with thesis goals and guidelines, Reverb algorithm which is among one of the most complicated audio features on a hand-held devices was probed. Consequently, its kernel parts were identified and implementation was done both in fixed-point and floating-point using the available resources on hardware. Besides execution time and amount of code memory for each part were measured and provided in tables and charts for comparison purposes. Conclusions were finally drawn based on developed code’s efficiency over ARM compiler’s as well as existing code already developed and tailored to ARMv5 processors. The main criteria for optimization was the execution time. Moreover, quantization effect due to limited precision fixed-point arithmetic was formulated and its effect on quality was elaborated. The outcomes, clearly indicate that hand optimization of kernel parts are superior to Compiler optimized alternative both from the point of code memory as well as execution time. The results also confirmed the presumption that hand optimized code using new instruction set can improve efficiency by an average 25%-50% depending on the algorithm structure and its interaction with other parts of audio effect. Despite its many draw backs, fixed-point implementation remains yet to be the dominant implementation for majority of DSP algorithms on low-power devices.
The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for Parallel Computing in DSP, which can provides parallel memory addressing efficiently with minimum latency. The parallel programming more efficient by using the parallel addressing generator for parallel vector memory (PVM) proposed in this thesis. However, without hiding complexities by cache, the cost of programming is high. To minimize the programming cost, automatic parallel memory address generation is needed to hide the complexities of memory access. This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, match vector addressing requirements extracted from the behaviour model to a prepared parallel memory addressing template, in order to supply data in parallel from the main memory to the on-chip vector memory. According to the template and usage of the main and on-chip parallel vector memory, models for data pre-allocation and permutation in scratch pad memories of ASIP can be decided and configured. By exposing the parallel memory access of source code, the memory access flow graph (MFG) will be generated. Then MFG will be used combined with hardware information to match templates in the template library. When it is matched with one template, suited permutation equation will be gained, and the permutation table that include target addresses for data pre-allocation and permutation is created. Thus it is possible to automatically generate memory address for parallel memory accesses. A tool for achieving the goal mentioned above is created, Permutator, which is implemented in C++ combined with XML. Memory access coding template is selected, as a result that permutation formulas are specified. And then PVM address table could be generated to make the data pre-allocation, so that efficient parallel memory access is possible. The result shows that the memory access complexities is hiden by using Permutator, so that the programming cost is reduced.It works well in the context that each algorithm with its related hardware information is corresponding to a template case, so that extra memory cost is eliminated.
As part of an ongoing project at the department of electrical engineering, ISY, at Linköping University, a voice decoder using floating point formats has been the focus of this master thesis. Previous work has been done developing an mp3-decoder using the floating point formats. All is expected to be implemented on a single DSP.The ever present desire to make things smaller, more efficient and less power consuming are the main reasons for this master thesis regarding the use of a floating point format instead of the traditional integer format in a GSM codec. The idea with the low precision floating point format is to be able to reduce the size of the memory. This in turn reduces the size of the total chip area needed and also decreases the power consumption.One main question is if this can be done with the floating point format without losing too much sound quality of the speech. When using the integer format, one can represent every value in the range depending on how many bits are being used. When using a floating point format you can represent larger values using fewer bits compared to the integer format but you lose representation of some values and have to round the values off.From the tests that have been made with the decoder during this thesis, it has been found that the audible difference between the two formats is very small and can hardly be heard, if at all. The rounding seems to have very little effect on the quality of the sound and the implementation of the codec has succeeded in reproducing similar sound quality to the GSM standard decoder.
The target of this master thesis is to evaluate the Image Warping technique and propose a possible design for an implementation in FPGA. The Image Warping is widely used in the image processing for image correction and rectification. A DSP is a usual choice for implantation of the image processing algorithms, but to decrease a cost of the target system it was proposed to use an FPGA for implementation. In this work a different Image Warping methods was evaluated in terms of performance, produced image quality, complexity and design size. Also, considering that it is not only Image Warping algorithm which will be implemented on the target system, it was important to estimate a possible memory bandwidth used by the proposed design. The evaluation was done by implemented a C-model of the proposed design with a finite datapath to simulate hardware implementation as close as possible.
This thesis investigates and compares time and wavelet-domain denoising techniques where received signals contain broadband noise. We consider how time and wavelet-domain denoising schemes and their combinations compare in the mean squared error sense. This work applies Wiener prediction and Median filtering as they do not require any prior signal knowledge. In the wavelet-domain we use soft or hard thresholding on the detail coefficients. In addition, we explore the effect of these wavelet-domain thresholding techniques on the coefficients associated with cycle-spinning and the newly proposed recursive cycle-spinning scheme. Finally, we note that thresholding does not make an attempt to de-noise coefficients that remain after thresholding; therefore we apply time domain techniques to the remaining detail coefficients from the first level of decomposition in an attempt to de-noise them further prior to reconstruction. This thesis applies and compares these techniques using a mean squared error criterion to identify the best performing in a robust test signal environment. We find that soft thresholding with Stein’s Unbiased Risk Estimate (SURE) thresholding produces the best mean squared error results in each test case and that the addition of Wiener prediction to the first level of decomposition coefficients leads to a slightly enhanced performance. Finally, we illustrate the effects of denoising algorithms on longer data segments.
We propose a 2D generalization to the M-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets. We particularly address (i) the construction of the dual basis and (ii) the resulting directional analysis. We also revisit the necessary pre-processing stage in the M-band case. While several reconstructions are possible because of the redundancy of the representation, we propose a new optimal signal reconstruction technique, which minimizes potential estimation errors. The effectiveness of the proposed M- band decomposition is demonstrated via denoising comparisons on several image types (natural, texture, seismics), with various M-band wavelets and thresholding strategies. Signicant improvements in terms of both overall noise reduction and direction preservation are observed.
The fields of neurobiology and electrical engineering have come together to pursue an integrated Brain-Machine Interface (BMI). Signal processing methods are used to find mapping algorithms between motor cortex neural firing rate and hand position. This cognitive extension could help patients with quadriplegia regain some independence using a thought-controlled robot arm. Current signal processing methods to achieve realtime neural-to-motor translation involve large, multi-processor systems to produce motor control parameters. Eventually, software running in a portable signal processing system is needed to allow for the patient to have the BMI in a backpack or attached to a wheelchair. This thesis presents a DSP-Based Computational Engine for a Brain-Machine Interface. The development of a DSP Board based on the Texas Instruments TMS320VC33 DSP will be presented, along with implementations of two digital filters and their training methods: 1) FIR trained with Normalized Least Mean Square Adaptive Filter (NLMS) and 2) Recurrent Multi-Layer Perceptron (RMLP) trained with Real-Time Recurrent Learning (RTRL). The requirements of the DSP Board, component selection and integration, and control software are discussed. The DSP implementations of the digital filters are presented, along with performance and timing analysis in real data collected from an Owl Monkey at Duke University. The weights of the FIR-NLMS filter converged similarly on the DSP as they did in MATLAB. Likewise, the weights of the RMLP-RTRL filter converged similarly on the DSP as they did using the Backpropagation Through Time method in NeuroSolutions. The custom DSP Board and two digital algorithms implemented in this thesis create a starting point for an integrated, portable, real-time signal processing solution for a Brain-Machine Interface.
Proper balance of power and performance for optimum system organization requires precise profiling of the power consumption of different hardware subsystems as well as software functions. Moreover, power consumption of mobile systems is even more important, since the battery is a large portion of the overall size and weight of the system. Average power consumption is only a crude estimate of power requirements and battery life; a much better estimate can be made using dynamic power consumption. Dynamic power consumption is a function of the execution profile of the given application running on specific hardware platform. In this paper we introduce a new environment for energy profiling of DSP applications. The environment consists of a JTAG emulator, a high-resolution HP 3583A multimeter and a workstation that controls devices and stores the traces. We use Texas Instruments’ Real Time Data Exchange mechanism (RTDXÔ) to generate an execution profile and custom procedures for energy profile data acquisition using GPIB interface. We developed custom procedures to correlate and analyze both energy and execution profiles. The environment allows us to improve the system power consumption through changes in software organization and to measure real battery life for the given hardware, software and battery configuration. As a case study, we present the analysis of a real-time portable ECG monitor implemented using a Texas Instruments TMS320C5410-100 processor board, and a Del Mar PWA ECG Amplifier.
In 1960, R.E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem. Since that time, due in large part to advances in digital computing, the Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation.
The success of multicarrier modulation in the form of OFDM in radio channels illuminates a path one could take towards high-rate underwater acoustic communications, and recently there are intensive investigations on underwater OFDM. In this paper, we implement the acoustic OFDM transmitter and receiver design of [4, 5] on a TMS320C6713 DSP board. We analyze the workload and identify the most time-consuming operations. Based on the workload analysis, we tune the algorithms and optimize the code to substantially reduce the synchronization time to 0.2 seconds and the processing time of one OFDM block to 1.7 seconds on a DSP processor at 225 MHz. This experimentation provides guidelines on our future work to reduce the per-block processing time to be less than the block duration of 0.23 seconds for real time operations.
The Transform-Domain LMS Algorithm (Narayan, 1983) is studied in the context of an acoustic system identification problem. The power estimator in this two-stage digital filter is shown to affect the achievable rates and depths of convergence significantly. Preferred values for the two tracking parameters, $\beta$ and $\mu,$ are determined. Dynamic Step-size Initialization is proposed to improve early convergence by accelerating the rate at which true power measurements replace (arbitrary) initial values. Later, linear estimators are shown to be sub-optimal, particularly where the spectral distribution of the reference changes rapidly. A simple non-linear Peak Window Power Estimator which eliminates these problems is described. It will be shown to improve the tracking rates and misadjustment simultaneously. The benefits of these methods are demonstrated using FIR sequences representative of typical acoustic environments and using recordings from a commercial telephone set. The proposed structures surpass theexisting algorithms consistently under all circumstances tested.
MATLAB simulation is used as the primary tool to illustrate concepts, to validate MODEM designs, and to vent' operation of the subsystems employed in DSP based transmitters and receivers presented in a pair of classes on MODEM Design and Digital Receiver Design. The whole gamut of subsystems found in conventional and experimental modem designs are simulated and assembled to form a full end-to-end simulation of an operating MODEM. This paper describes the philosophy used to guide class involvement and assess the experience and the learning value to student participants.
The design of conventional sensors is based primarily on the Shannon-Nyquist sampling theorem, which states that a signal of bandwidth W Hz is fully determined by its discrete-time samples provided the sampling rate exceeds 2W samples per second. For discrete-time signals, the Shannon-Nyquist theorem has a very simple interpretation: the number of data samples must be at least as large as the dimensionality of the signal being sampled and recovered. This important result enables signal processing in the discrete-time domain without any loss of information. However, in an increasing number of applications, the Shannon-Nyquist sampling theorem dictates an unnecessary and often prohibitively high sampling rate. (See Box 1 for a derivation of the Nyquist rate of a time-varying scene.) As a motivating example, the high resolution of the image sensor hardware in modern cameras reflects the large amount of data sensed to capture an image. A 10-megapixel camera, in effect, takes 10 million measurements of the scene. Yet, almost immediately after acquisition, redundancies in the image are exploited to compress the acquired data significantly, often at compression ratios of 100:1 for visualization and even higher for detection and classification tasks. This example suggests immense wastage in the overall design of conventional cameras.