The increased complexity of Digital Signal Processing (DSP) algorithms demands for the development of more complex and more eﬃcient hardware structures. The work presented herein describes the core components for the development of a tool capable of automatic generation of eﬃcient hardware structures, therefore facilitating developers work. It comprises algorithms and techniques for i) balancing the paths in a graph, ii) scheduling of operations to functional units, iii) allocating registers and iv) generating the VHDL code. Results show that the developed techniques are capable of generating the hardware structure of typical DSP algorithms represented in data-ﬂow graphs with over 2,000 nodes in around 200 ms, scaling to 80,000 nodes in about 214 s. Within the developed techniques, solving the scheduling problem is one of the most complex tasks: it is a NP-complete problem and directly inﬂuences the number of functional units and registers required. Therefore, experimental analysis was made on scheduling algorithms for time-constrained problems. Results show that simple list-based algorithms are more eﬃcient in large problems than more complex algorithms: they run faster and tend to require less functional units.
Chapter 12 of the book "Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications" - Musical Instruments - A Review of Basic Physics of Sound - Music Signal Features and Models - Ear: Hearing of Sounds - Psychoacoustics of Hearing - Music Compression - High Quality Music Coding: MPEG - Stereo Music - Music Recognition
This report covers a master thesis in embedded systems, the goal of which was to investigate the high speed data collection capabilities with a Blackfin DSP. Basic theory about sampling and noise is covered briefly from a practical point of view. The theory is intended to be useful for those diving into a ADC datasheet for the first time. After an investigation of the delimiting factors, suitable components were selected and a prototype ADC PCB was designed from scratch. The goal is to design a general low noise data collecting unit compatible with the Blackfin DSP. Finally simple DSP software is designed to prove that DSP can handle such a high datastream.Testing the ADC card with the target Blackfin platform indicates thatthe analog parts indeed works. An analog bandwidth of over 10MHz ismeasured at a resolution exceeding 10 bits with respect to noise. The digital parts intended to interleave the two channels digital streams into one Blackfin unit did not work as intended. Only one channel is supported as of now. The report contains suggestions for future work in this area.
Today, a high noise level is considered a problem in many working environments. The main reason is that it contributes to stress and fatigue. Traditional methods using passive noise control is only practicable for high frequencies. As a complement to passive noise control, active noise control (ANC) can be used to reduce low frequency noise. The main idea of ANC is to use destructive interference of waves to cancel disturbing noises. The purpose of this thesis is to design and implement an ANC system in the driver's cabin of a Valmet 890 forest machine. The engine boom is one of the most disturbing noises and therefore the main subjective for the ANC system to suppress. The ANC system is implemented on a Texas Instrument DSP development starter kit. Different FxLMS algorithms are evaluated with feedback and feedforward configurations. The results indicate that an ANC system significantly reduces the sound pressure level (SPL) in the cabin. Best performance of the evaluated systems is achieved for the feedforward FxLMS system. For a commonly used engine speed of 1500 rpm, the SPL is reduced with 17 dB. The results show fast enough convergence and global suppression of low frequency noise.
This master thesis consists of implementation and evaluation of an AEC, Acoustic Echo Canceller, algorithm in a floating-point architecture. The most important question this thesis will try to answer is to determine benefits or drawbacks of using a floating-point architecture, relative a fixed-point architecture, to do AEC. In a telephony system there is two common forms of echo, line echo and acoustic echo. Acoustic echo is introduced by sound emanating from a loudspeaker, e.g. in a handsfree or speakerphone, being picked up by a microphone and then sent back to the source. The problem with this feedback is that the far-end speaker will hear one, or multiple, time-delayed version(s) of her own speech. This time-delayed version of speech is usually perceived as both confusing and annoying unless removed by the use of AEC. In this master thesis the performance of a floating-point version of a normalized least-mean-square AEC algorithm was evaluated in an environment designed and implemented to approximate live telephony calls. An instruction-set simulator and assembler available at the initiation of this master thesis were extended to enable; zero-overhead loops, modular addressing, post-increment of registers and register-write forwarding. With these improvements a bit-true assembly version was implemented capable of real-time AEC requiring 15 million instructions per second. A solution using as few as eight mantissa bits, in an external format used when storing data in memory, was found to have an insignificant effect on the selected AEC implementation’s performance. Due to the relatively low memory requirement of the selected AEC algorithm, the use of a small external format has a minor effect on the required memory size. In total this indicates that the possible reduction of the memory requirement and related energy consumption, does not justify the added complexity and energy consumption of using a floating-point architecture for the selected algorithm. Use of a floating-point format can still be advantageous in speech-related signal processing when the introduced time delay by a subband, or a similar frequency domain, solution is unacceptable. Speech algorithms that have high memory use and small introduced delay requirements are a good candidate for a floating-point digital signal processor architecture.
Ogg Vorbis is a fairly new and growing audio format, often used for online distribution of music and internet radio stations for streaming audio. It is considered to be better than MP3 in both quality and compression and in the same league as for example AAC. In contrast with many other formats, like MP3 and AAC, Ogg Vorbis is patent and royalty free. The purpose of this thesis project was to investigate how the C6416 DSP processor and a Stratix II FPGA could be connected to each other and work together as co-processors and using an Ogg Vorbis decoder as implementation example. A fixed-point decoder called Tremor (developed by Xiph.Org the creator of the Vorbis I specification), has been ported to the DSP processor and an Ogg Vorbis player has been developed. Tremor was profiled before performing the software / hardware partitioning to decide what parts of the source code of Tremor that should be implemented in the FPGA to off-load and accelerate the DSP.
In this thesis a real time test platform for a permanent magnet synchronous motor is developed. The implemented algorithm is Field Oriented Control (FOC) and it is implemented on a Texas Instruments TMS320F2808 Digital Signal Processor (DSP). The platform is developed in a rapid prototyping approach using Matlab/Simulink and the Real Time Workshop (RTW) packages.With this software the control algorithm and its interface to different DSP modules, such as A/D converter and PWM module, is constructed as a Simulink block scheme. The blocks used come from ordinary Simulink libraries and libraries provided by the RTW packages. From the Simulink block scheme Matlab can auto generate embedded C code adapted for different embedded targets, in this case the 2808 DSP.The developed real time test platform is also a Simulink model, though different from the algorithm model. When the start simulation command is given in the platform model a Graphical User Interface is loaded which lets the user specify motor parameters and certain algorithm parameters. Once the parameters are chosen RTW generates code from the algorithm model, loads it into the DSP and runs the generated program. From the platform model it is possible to set the reference speed of the motor in real time and monitor/log motor parameters such as actual speed and stator currents.
Audio processing algorithms are increasingly used in cell phones and today’s customers are placing more demands on cell phones. Feature phones, once the advent of mobile phone technology, nowadays do more than just providing the user with MP3 play back or advanced audio effects. These features have become an integral part of medium as well as low-end phones. On the other hand, there is also an endeavor to include as improved quality as possible into products to compete in market and satisfy users’ needs. Tackling the above requirements has been partly satisfied by the advance in hardware design and manufacturing technology. However, as new hardware emerges into market the need for competence to write efficient software and exploit the new features thoroughly and effectively arises. Even though compilers are also keeping up with the new tide space for hand optimized code still exist. Wrapped in the above goal, an effort was made in this thesis to partly cover the competence requirement at Multimedia Section (part of Ericsson Mobile Platforms) to develope optimized code for new processors. Forging persistently ahead with new products, EMP has always incorporated the latest technology into its products among which ARMv6 family of processors has the main central processing role in a number of upcoming products. To fully exploit latest features provided by ARMv6, it was required to probe its new instruction set among which new media processing instructions are of outmost importance. In order to execute DSP-intensive algorithms (e.g. Audio Processing algorithms) efficiently, the implementation should be done in low-level code applying available instruction set. Meanwhile, ARMv6 comes with a number of new features in comparison with its predecessors. SIMD (Single Instruction Multiple Data) and VFP (Vector Floating Point) are the most prominent media processing improvements in ARMv6. Aligned with thesis goals and guidelines, Reverb algorithm which is among one of the most complicated audio features on a hand-held devices was probed. Consequently, its kernel parts were identified and implementation was done both in fixed-point and floating-point using the available resources on hardware. Besides execution time and amount of code memory for each part were measured and provided in tables and charts for comparison purposes. Conclusions were finally drawn based on developed code’s efficiency over ARM compiler’s as well as existing code already developed and tailored to ARMv5 processors. The main criteria for optimization was the execution time. Moreover, quantization effect due to limited precision fixed-point arithmetic was formulated and its effect on quality was elaborated. The outcomes, clearly indicate that hand optimization of kernel parts are superior to Compiler optimized alternative both from the point of code memory as well as execution time. The results also confirmed the presumption that hand optimized code using new instruction set can improve efficiency by an average 25%-50% depending on the algorithm structure and its interaction with other parts of audio effect. Despite its many draw backs, fixed-point implementation remains yet to be the dominant implementation for majority of DSP algorithms on low-power devices.
The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for Parallel Computing in DSP, which can provides parallel memory addressing efficiently with minimum latency. The parallel programming more efficient by using the parallel addressing generator for parallel vector memory (PVM) proposed in this thesis. However, without hiding complexities by cache, the cost of programming is high. To minimize the programming cost, automatic parallel memory address generation is needed to hide the complexities of memory access. This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, match vector addressing requirements extracted from the behaviour model to a prepared parallel memory addressing template, in order to supply data in parallel from the main memory to the on-chip vector memory. According to the template and usage of the main and on-chip parallel vector memory, models for data pre-allocation and permutation in scratch pad memories of ASIP can be decided and configured. By exposing the parallel memory access of source code, the memory access flow graph (MFG) will be generated. Then MFG will be used combined with hardware information to match templates in the template library. When it is matched with one template, suited permutation equation will be gained, and the permutation table that include target addresses for data pre-allocation and permutation is created. Thus it is possible to automatically generate memory address for parallel memory accesses. A tool for achieving the goal mentioned above is created, Permutator, which is implemented in C++ combined with XML. Memory access coding template is selected, as a result that permutation formulas are specified. And then PVM address table could be generated to make the data pre-allocation, so that efficient parallel memory access is possible. The result shows that the memory access complexities is hiden by using Permutator, so that the programming cost is reduced.It works well in the context that each algorithm with its related hardware information is corresponding to a template case, so that extra memory cost is eliminated.
As part of an ongoing project at the department of electrical engineering, ISY, at Linköping University, a voice decoder using floating point formats has been the focus of this master thesis. Previous work has been done developing an mp3-decoder using the floating point formats. All is expected to be implemented on a single DSP.The ever present desire to make things smaller, more efficient and less power consuming are the main reasons for this master thesis regarding the use of a floating point format instead of the traditional integer format in a GSM codec. The idea with the low precision floating point format is to be able to reduce the size of the memory. This in turn reduces the size of the total chip area needed and also decreases the power consumption.One main question is if this can be done with the floating point format without losing too much sound quality of the speech. When using the integer format, one can represent every value in the range depending on how many bits are being used. When using a floating point format you can represent larger values using fewer bits compared to the integer format but you lose representation of some values and have to round the values off.From the tests that have been made with the decoder during this thesis, it has been found that the audible difference between the two formats is very small and can hardly be heard, if at all. The rounding seems to have very little effect on the quality of the sound and the implementation of the codec has succeeded in reproducing similar sound quality to the GSM standard decoder.
The proposed optimal algorithm for the digitizing of analog filters is based on two existing filter design methods: the extended window design (EWD) and the matched–pole (MP) frequency sampling design. The latter is closely related to the filter design with iterative weighted least squares (WLS). The optimization is performed with an original MP design that yields an equiripple digitizing error. Then, a drastic reduction of the digitizing error is achieved through the introduction of a fractional time shift that minimizes the magnitude of the equiripple error within a given frequency interval. The optimal parameters thus obtained can be used to generate the EWD equations, together with a variable fractional delay output, as described in an earlier paper. Finally, in contrast to the WLS procedure, which relies on a “good guess” of the weighting function, the MP optimization is straightforward.
The thesis discusses a novel off-line and on-line learning approach for Fully Recurrent Neural Networks (FRNNs). The most popular algorithm for training FRNNs, the Real Time Recurrent Learning (RTRL) algorithm, employs the gradient descent technique for finding the optimum weight vectors in the recurrent neural network. Within the framework of the research presented, a new off-line and on-line variation of RTRL is presented, that is based on the Gauss-Newton method. The method itself is an approximate Newton’s method tailored to the specific optimization problem, (non-linear least squares), which aims to speed up the process of FRNN training. The new approach stands as a robust and effective compromise between the original gradient-based RTRL (low computational complexity, slow convergence) and Newton-based variants of RTRL (high computational complexity, fast convergence). By gathering information over time in order to form Gauss-Newton search vectors, the new learning algorithm, GN-RTRL, is capable of converging faster to a better quality solution than the original algorithm. Experimental results reflect these qualities of GN-RTRL, as well as the fact that GN-RTRL may have in practice lower computational cost in comparison, again, to the original RTRL.
In the signals and systems course and in the first course in digital signal processing, a signal is, most often, characterized by its amplitude spectrum in the frequency-domain and its amplitude profile in the time-domain. So much a student gets used to this type of characterization, that the student finds it difficult to appreciate, when encountered in the ensuing statistical signal processing course, the fact that a signal can also be characterized by its autocorrelation function in the time-domain and the corresponding power spectrum in the frequency-domain and that the amplitude characterization is not available. In this article, the characterization of a signal by its autocorrelation function in the time-domain and the corresponding power spectrum in the frequency-domain is described. Cross-correlation of two signals is also presented.
The auditory system of living creatures provides useful information about the world, such as the location and interpretation of sound sources. For humans, it means to be able to focus one's attention on events, such as a phone ringing, a vehicle honking, a person taking, etc. For those who do not suffer from hearing impairments, it is hard to imagine a day without being able to hear, especially in a very dynamic and unpredictable world. Mobile robots would also benefit greatly from having auditory capabilities. In this thesis, we propose an artificial auditory system that gives a robot the ability to locate and track sounds, as well as to separate simultaneous sound sources and recognising simultaneous speech. We demonstrate that it is possible to implement these capabilities using an array of microphones, without trying to imitate the human auditory system. The sound source localisation and tracking algorithm uses a steered beamformer to locate sources, which are then tracked using a multi-source particle filter. Separation of simultaneous sound sources is achieved using a variant of the Geometric Source Separation (GSS) algorithm, combined with a multisource post-filter that further reduces noise, interference and reverberation. Speech recognition is performed on separated sources, either directly or by using Missing Feature Theory (MFT) to estimate the reliability of the speech features. The results obtained show that it is possible to track up to four simultaneous sound sources, even in noisy and reverberant environments. Real-time control of the robot following a sound source is also demonstrated. The sound source separation approach we propose is able to achieve a 13.7 dB improvement in signal-to-noise ratio compared to a single microphone when three speakers are present. In these conditions, the system demonstrates more than 80% accuracy on digit recognition, higher than most human listeners could obtain in our small case study when recognising only one of these sources. All these new capabilities will allow humans to interact more naturally with a mobile robot in real life settings.
In this article, a physical explanation of the fundamentals of the DFT (fft) algorithms is presented in terms of waveform decomposition. After reading the article and trying the examples, the reader is expected to gain a clear understanding of the basics of the mysterious DFT (fft) algorithms.
This paper presents the hardware implementation of an algorithm developed to provide automatic motion detection and object tracking functionality embedded within intelligent CCTV systems. The implementation is targeted at an Altera Stratix FPGA making full use of the dedicated DSP resource. The Altera Nios embedded processor provides a platform for the tracking control loop and generic Pan Tilt Zoom camera interface. This paper details the explicit functional stages of the algorithm that lend themselves to an optimised pipelined hardware implementation. This implementation provides maximum data throughput, providing real-time operation of the described algorithm, and enables a moving camera to track a moving object in real time.
The Global Positioning System (GPS) is now in operation, and many improvements to its performance are being sought. One such improvement is Differential GPS (DGPS), where known errors in the GPS broadcast are identified and the corrections broadcast to the end user. One implementation of DGPS being considered is the use of coastal marine radio direction finding (RDF) radiobeacons in the 285-325kHz band as transmitters for the DGPS broadcast. The normal RDF beacon signal consists of a continuous carrier on a one kilohertz boundary plus a Morse-code identification signal 1025Hz above the carrier. In the DGPS/radiobeacon implementation proposed for the US coastal regions, the differential data link signal uses minimum shift keying (MSK) at a data rate of 25, 50, 100, 200 or 400 baud (the exact baud rat has not yet been decided). This MSK signal is centered between the RDF beacon carrier and identification signal. At the frequencies that these radiobeacons are operated, the prevailing atmospheric noise is both non-Gaussian and very strong. This noise characteristic makes the design of a long-range data link difficult. One solution that has been proposed is the use of forward error correction (FEC) coding of the data. The performance of FEC decoders can be improved by the used of a soft decision receiver, which delivers both bit decisions and information about the validity of the bit decisions. This work describes the design of a radio receiver for DGPS/Radiobeacon servics which is capable of reception of 400 baud MSK in the DGPS/Radiobeacon band. The receiver is designed to be easily augmented to provide soft decisions and easily modified to recieve MSK at data rates of 25 to 400 baud. The radio is a microprocessor controlled dual conversion superheterodyne with an audio frequency of 1kHz. The demodulator runs on the same microprocessor that controls the radio. The weak-signal performance of the demodulator is very good: the Eb/No vs. bit error rate performance of the demodulator is only a couple of dB worse than the theoretical performance of differential phase-shift keying. The radio has a noise floor of -114dBm referenced to it's 500Hz wide audio bandwidth and a 3rd order intermodulation intercept of +7dBm for a dynamic range of 83dB. This work concludes with a thumbnail analysis of the operations needed to implement a soft bit decision estimator, and some suggestions for the implementation of said soft bit decision estimator.
IMPLEMENTATION OF PERIODOGRAM SMOOTHING OF NOISYIMPLEMENTATION OF PERIODOGRAM SMOOTHING OF NOISY SIGNALS USING TMS320C6713 DSK
Periodogram Smoothing is a technique of power spectrum estimation. The discrete Fourier transform of a digital signal simply resolves the frequency components. The algorithm is implemented on Texas Instruments’ TMS320C6713 DSP Starter Kit (DSK). This is a 32-bit floating-point digital signal processor running at 225 MHz. The programs are basically written in the C programming language. However, those sections of code which are time-critical and memory-critical are written in assembly language of C6713. A MATLAB™ graphical user interface is also provided. The MATLAB™ program calls C programs loaded in Code Composer Studio (CCS). The C programs in turn call the assembly programs when required.
Automatic recognition of musical patterns plays a crucial part in Musicological and Ethno musicological research and can become an indispensable tool for the search and comparison of music extracts within a large multimedia database. This paper finds an efficient method for recognizing isolated musical patterns in a monophonic environment, using Hidden Markov Model. Each pattern, to be recognized, is converted into a sequence of frequency jumps by means of a fundamental frequency tracking algorithm, followed by a quantizer. The resulting sequence of frequency jumps is presented to the input of the recognizer which use Hidden Markov Model. The main characteristic of Hidden Markov Model is that it utilizes the stochastic information from the musical frame to recognize the pattern. The methodology is tested in the context of South Indian Classical Music, which exhibits certain characteristics that make the classification task harder, when compared with Western musical tradition. Recognition of 100% has been obtained for the six typical music pattern used in practise. South Indian classical instrument, flute is used for the whole experiment.
IS-95 is the present U.S. 2nd generation CDMA standard. Currently, the 2nd generation CDMA phones are produced by Qualcomm. Texas Instruments (TI) has ASIC design for Viterbi Decoder on C54x. Several of the components in the forward link process are also implemented in hardware. However, having to design a specific hardware for a particular application is expensive and time consuming. Thus, the possibility of the alternative implementations is of great interest to both customers and TI itself. This research has achieved in successful implementation of IS-95 entirely in software on TI fixed-point DSP TMS320C6201, and met the real time constraint. IS-95 system, the industrial standard for CDMA, is a very complicated system and extremely computationally demanding. The transmission rate for an IS-95 system is 1.2288 Mcps. This research project includes all the major components of the demodulation process for the forward link system: PN Descrambling, Walsh Despreading, Phase Correction & Maximal Ratio Combining, Deinterleaver, Digital Automatic Gain Control, and Viterbi Deccc:r. The entire demodulation process is done completely in C. That makes it a very attractive alternative implementation in the future applications. It is well known that ASIC design is not only expensive and but also time consuming, programming in assembly is easier and cheaper, but programming in C is a much easier and efficient way out, in particular, for general computer engineers. During the whole process, efforts have been devoted on developing various specific techniques to optimize the design for all the components involved. These developments are successfully achieved by making the best use of the following techniques: to simplify the algorithms first before programming, to look for regularity in the problem, to work toward the Compiler's full efficiency, and to use C intrinsics whenever possible. All these attributes together make the implementation scheme great for DSP applications. The benchmark results compare very well to the TI-internal hand scheduled assembly performance of the same type of decoders. The estimated percentage usage of all the components (excluding PN) is only 21.18% of the total CPU cycles available (4,000 K), which is very efficient and impressive.