In this thesis a real time test platform for a permanent magnet synchronous motor is developed. The implemented algorithm is Field Oriented Control (FOC) and it is implemented on a Texas Instruments TMS320F2808 Digital Signal Processor (DSP). The platform is developed in a rapid prototyping approach using Matlab/Simulink and the Real Time Workshop (RTW) packages.With this software the control algorithm and its interface to different DSP modules, such as A/D converter and PWM module, is constructed as a Simulink block scheme. The blocks used come from ordinary Simulink libraries and libraries provided by the RTW packages. From the Simulink block scheme Matlab can auto generate embedded C code adapted for different embedded targets, in this case the 2808 DSP.The developed real time test platform is also a Simulink model, though different from the algorithm model. When the start simulation command is given in the platform model a Graphical User Interface is loaded which lets the user specify motor parameters and certain algorithm parameters. Once the parameters are chosen RTW generates code from the algorithm model, loads it into the DSP and runs the generated program. From the platform model it is possible to set the reference speed of the motor in real time and monitor/log motor parameters such as actual speed and stator currents.
Audio processing algorithms are increasingly used in cell phones and today’s customers are placing more demands on cell phones. Feature phones, once the advent of mobile phone technology, nowadays do more than just providing the user with MP3 play back or advanced audio effects. These features have become an integral part of medium as well as low-end phones. On the other hand, there is also an endeavor to include as improved quality as possible into products to compete in market and satisfy users’ needs. Tackling the above requirements has been partly satisfied by the advance in hardware design and manufacturing technology. However, as new hardware emerges into market the need for competence to write efficient software and exploit the new features thoroughly and effectively arises. Even though compilers are also keeping up with the new tide space for hand optimized code still exist. Wrapped in the above goal, an effort was made in this thesis to partly cover the competence requirement at Multimedia Section (part of Ericsson Mobile Platforms) to develope optimized code for new processors. Forging persistently ahead with new products, EMP has always incorporated the latest technology into its products among which ARMv6 family of processors has the main central processing role in a number of upcoming products. To fully exploit latest features provided by ARMv6, it was required to probe its new instruction set among which new media processing instructions are of outmost importance. In order to execute DSP-intensive algorithms (e.g. Audio Processing algorithms) efficiently, the implementation should be done in low-level code applying available instruction set. Meanwhile, ARMv6 comes with a number of new features in comparison with its predecessors. SIMD (Single Instruction Multiple Data) and VFP (Vector Floating Point) are the most prominent media processing improvements in ARMv6. Aligned with thesis goals and guidelines, Reverb algorithm which is among one of the most complicated audio features on a hand-held devices was probed. Consequently, its kernel parts were identified and implementation was done both in fixed-point and floating-point using the available resources on hardware. Besides execution time and amount of code memory for each part were measured and provided in tables and charts for comparison purposes. Conclusions were finally drawn based on developed code’s efficiency over ARM compiler’s as well as existing code already developed and tailored to ARMv5 processors. The main criteria for optimization was the execution time. Moreover, quantization effect due to limited precision fixed-point arithmetic was formulated and its effect on quality was elaborated. The outcomes, clearly indicate that hand optimization of kernel parts are superior to Compiler optimized alternative both from the point of code memory as well as execution time. The results also confirmed the presumption that hand optimized code using new instruction set can improve efficiency by an average 25%-50% depending on the algorithm structure and its interaction with other parts of audio effect. Despite its many draw backs, fixed-point implementation remains yet to be the dominant implementation for majority of DSP algorithms on low-power devices.
The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for Parallel Computing in DSP, which can provides parallel memory addressing efficiently with minimum latency. The parallel programming more efficient by using the parallel addressing generator for parallel vector memory (PVM) proposed in this thesis. However, without hiding complexities by cache, the cost of programming is high. To minimize the programming cost, automatic parallel memory address generation is needed to hide the complexities of memory access. This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, match vector addressing requirements extracted from the behaviour model to a prepared parallel memory addressing template, in order to supply data in parallel from the main memory to the on-chip vector memory. According to the template and usage of the main and on-chip parallel vector memory, models for data pre-allocation and permutation in scratch pad memories of ASIP can be decided and configured. By exposing the parallel memory access of source code, the memory access flow graph (MFG) will be generated. Then MFG will be used combined with hardware information to match templates in the template library. When it is matched with one template, suited permutation equation will be gained, and the permutation table that include target addresses for data pre-allocation and permutation is created. Thus it is possible to automatically generate memory address for parallel memory accesses. A tool for achieving the goal mentioned above is created, Permutator, which is implemented in C++ combined with XML. Memory access coding template is selected, as a result that permutation formulas are specified. And then PVM address table could be generated to make the data pre-allocation, so that efficient parallel memory access is possible. The result shows that the memory access complexities is hiden by using Permutator, so that the programming cost is reduced.It works well in the context that each algorithm with its related hardware information is corresponding to a template case, so that extra memory cost is eliminated.
As part of an ongoing project at the department of electrical engineering, ISY, at Linköping University, a voice decoder using floating point formats has been the focus of this master thesis. Previous work has been done developing an mp3-decoder using the floating point formats. All is expected to be implemented on a single DSP.The ever present desire to make things smaller, more efficient and less power consuming are the main reasons for this master thesis regarding the use of a floating point format instead of the traditional integer format in a GSM codec. The idea with the low precision floating point format is to be able to reduce the size of the memory. This in turn reduces the size of the total chip area needed and also decreases the power consumption.One main question is if this can be done with the floating point format without losing too much sound quality of the speech. When using the integer format, one can represent every value in the range depending on how many bits are being used. When using a floating point format you can represent larger values using fewer bits compared to the integer format but you lose representation of some values and have to round the values off.From the tests that have been made with the decoder during this thesis, it has been found that the audible difference between the two formats is very small and can hardly be heard, if at all. The rounding seems to have very little effect on the quality of the sound and the implementation of the codec has succeeded in reproducing similar sound quality to the GSM standard decoder.
The target of this master thesis is to evaluate the Image Warping technique and propose a possible design for an implementation in FPGA. The Image Warping is widely used in the image processing for image correction and rectification. A DSP is a usual choice for implantation of the image processing algorithms, but to decrease a cost of the target system it was proposed to use an FPGA for implementation. In this work a different Image Warping methods was evaluated in terms of performance, produced image quality, complexity and design size. Also, considering that it is not only Image Warping algorithm which will be implemented on the target system, it was important to estimate a possible memory bandwidth used by the proposed design. The evaluation was done by implemented a C-model of the proposed design with a finite datapath to simulate hardware implementation as close as possible.
Benchmarking of DSP kernel algorithms was conducted in the thesis on a DSP processor for teaching in the course TESA26 in the department of Electrical Engineering. It includes benchmarking on cycle count and memory usage. The goal of the thesis is to evaluate the quality of a single MAC DSP instruction set and provide suggestions for further improvement in instruction set architecture accordingly. The scope of the thesis is limited to benchmark the processor only based on assembly coding. The quality check of compiler is not included. The method of the benchmarking was proposed by BDTI, Berkeley Design Technology Incorporations, which is the general methodology used in world wide DSP industry. Proposals on assembly instruction set improvements include the enhancement of FFT and DCT. The cycle cost of the new FFT benchmark based on the proposal was XX% lower, showing that the proposal was right and qualified. Results also show that the proposal promotes the cycle cost score for matrix computing, especially matrix multiplication. The benchmark results were compared with general scores of single MAC DSP processors offered by BDTI.
This thesis is about implementing the functions for reciprocal, square root, inverse square root and logarithms on a DSP platform. A multi-core DSP platform that consists of one master processor core and several SIMD coprocessor cores is currently being designed by a team at the Computer Engineering Department of Linköping University. The SIMD coprocessors’ arithmetic logic unit (ALU) has 16 multipliers to support vector multiplication instructions. By efficiently using the 16 multipliers, it is possible to evaluate polynomials very fast. The ALU does not have (hardware) support for floating point arithmetic, so the challenge is to get good precision by using fixed point arithmetic. Precise and fast solutions to implement the mathematical functions are found by converting the fixed point input to a soft floating point format before polynomial approximation, choosing a polynomial based on an error analysis of the polynomial approximation, and using Newton-Raphson or Goldschmidt iterations to improve the precision of the polynomial approximations. Finally, suggestions are made of changes and additions to the instruction set architecture, in order to make the implementations faster, by efficiently using the currently existing hardware.
This Master thesis describes the benchmarking of a DSP processor. Benchmarking means measuring the performance in some way. In this report, we have focused on the number of instruction cycles needed to execute certain algorithms. The algorithms we have used in the benchmark are all very common in signal processing today. The results we have reached in this thesis have been compared to benchmarks for other processors, performed by Berkeley Design Technology, Inc. The algorithms were programmed in assembly code and then executed on the instruction set simulator. After that, we proposed changes to the instruction set, with the aim to reduce the execution time for the algorithms. The results from the benchmark show that our processor is at the same level as the ones tested by BDTI. Probably would a more experienced programmer be able to reduce the cycle count even more, especially for some of the more complex benchmarks.
Most of the tasks in a mobile cellular network base station are performed with programmable digital signal processors. Their memory spaces and management features are very limited. The buffering requirements in the base station can have large instantaneous variations during the simultaneous transmission of burst' data on multiple channels to multiple users. In particular the high bit-rates of the Wideband Code Division Multiple Access data transfer evolution High Speed Downlink Packet Access create very high demands for buffering. The fragmentation of the buffer memory is a threat. It causes a gradual decrease in performance, which is critical in a long running process like the base station. The amount of fragmentation is different with different memory management methods. In this work the features and applicability of different memory management methods for signal processors used in the base stations of third generation cellular networks have been studied. Software based memory management includes a high amount of conditional branches. The signal processor, which is optimized for highly parallel sequential computing, executes conditional branches very badly when compared to microcontrollers and general-purpose processors. The memory management methods are first studied in theory and then experimentally. In the experiments two different memory management methods were analyzed. The memory managers were loaded with a synthetic workload program that simulates multi-user high bit-rate data transmissions in the base station. The performances of the memory managers were measured in terms of fragmentation, execution time and memory utilization. The experiments confirmed the information gained from the theoretical studies that different memory management methods are usually optimized for a certain feature. The experiments showed that a simple method is fast to execute and works well with small and intermediate loads. When the load is increased the performance decreases. The second, more complex, measured method was found to require more computing, but to be capable of using the memory space assigned to it more effectively.
802.16e provides specifications for non line of sight, mobile wireless communications in the frequency range of 2-6 GHz. It is well implemented by using OFDMA as its physical layer scheme. The OFDM symbol time (sT) is to be selected depending on the channel conditions, available bandwidth and, simulations provide a means of selecting right values of sTin different channel conditions. Additionally it has been shown that certain values of sT outperform others in all conditions, thus invalidating their use. Moreover, a solution proposed by INTEL is also analyzed. One of the major requirements of OFDM is high synchronization. Detecting the timing offset of a new mobile user, entering the network, which is not time aligned using cross-correlation and ‘auto-correlation’ in time domain and cross-correlation in frequency domain at the base station has been simulated. Results point that the processing load can be significantly reduced by using frequency domain correlation of the received data or by using ‘auto-correlation’ followed by cross-correlation on localized data. The use of adaptive antenna system in 802.16e improves the system performance, where beamforming is implemented in the direction of desired user. Capon’s method and MUSIC method have been simulated to compute the direction of arrival for OFDMA uplink. A new user, while in the ranging process, transmits data with unknown time offset and unknown direction. The thesis describes the procedure to find the two unknown one after another.
The thesis discusses a novel off-line and on-line learning approach for Fully Recurrent Neural Networks (FRNNs). The most popular algorithm for training FRNNs, the Real Time Recurrent Learning (RTRL) algorithm, employs the gradient descent technique for finding the optimum weight vectors in the recurrent neural network. Within the framework of the research presented, a new off-line and on-line variation of RTRL is presented, that is based on the Gauss-Newton method. The method itself is an approximate Newton’s method tailored to the specific optimization problem, (non-linear least squares), which aims to speed up the process of FRNN training. The new approach stands as a robust and effective compromise between the original gradient-based RTRL (low computational complexity, slow convergence) and Newton-based variants of RTRL (high computational complexity, fast convergence). By gathering information over time in order to form Gauss-Newton search vectors, the new learning algorithm, GN-RTRL, is capable of converging faster to a better quality solution than the original algorithm. Experimental results reflect these qualities of GN-RTRL, as well as the fact that GN-RTRL may have in practice lower computational cost in comparison, again, to the original RTRL.
In this thesis, the design of a music synthesizer implementing the Scalable Polyphony-MIDI soundset on a low cost DSP system is presented. First, the SP-MIDI standard and the target DSP platform are presented followed by review of commonly used synthesis techniques and their applicability to systems with limited computational and memory resources. Next, various oscillator and ﬁlter algorithms used in digital subtractive synthesis are reviewed in detail. Special attention is given to the aliasing problem caused by discontinuities in classical waveforms, such as sawtooth and pulse waves and existing methods for bandlimited waveform synthesis are presented. This is followed by review of established structures for computationally efﬁcient time-varying ﬁlters. A novel digital structure is presented that decouples the cutoff and resonance controls. The new structure is based on the analog Korg MS-20 lowpass ﬁlter and is computationally very efﬁcient and well suited for implementation on low bitdepth architectures. Finally, implementation issues are discussed with emphasis on the Differentiated Parabole Wave oscillator and MS-20 ﬁlter structures and the effects of limited computational capability and low bitdepth. This is followed by designs for several example instruments.
This Master Thesis presents the design of the core of a fixed point general purpose multimedia DSP processor (MDSP) and its instruction set. This processor employs parallel processing techniques and specialized addressing models to speed up the processing of multimedia applications. The MDSP has a dual MAC structure with one enhanced MAC that provides a SIMD, Single Instruction Multiple Data, unit consisting of four parallel data paths that are optimized for accelerating multimedia applications. The SIMD unit performs four multimedia-oriented 16-bit operations every clock cycle. This accelerates computationally intensive procedures such as video and audio decoding. The MDSP uses a memory bank of four memories to provide multiple accesses of source data each clock cycle.
One of the major threats to wireless communications is jamming. Many anti-jamming techniques have been presented in the past. However most of them are based on the precondition that the communicating devices have a pre-shared secret that can be used to synchronize the anti-jamming scheme. E.g. for frequency hopping the secret could be used to derive the hopping sequence and for direct sequence spread spectrum the secret is used to derive the spreading codes. But how can the devices bootstrap a jamming-resistant communication without having a pre-shared secret? Christina Popper and Mario Strasser propose as scheme for Uncoordinated Frequency Hopping (UFH) and Uncoordinated Direct Sequence Spread Spectrum (UDSSS) in their papers  and  respectively. The goal of my project was an implementation of Uncoordinated Direct Sequence Spread Spectrum (UDSSS) using Software Dened Radios. The First version should serve as an easy to use and extendable proof of conceptfor the proposed scheme.
The increased complexity of Digital Signal Processing (DSP) algorithms demands for the development of more complex and more eﬃcient hardware structures. The work presented herein describes the core components for the development of a tool capable of automatic generation of eﬃcient hardware structures, therefore facilitating developers work. It comprises algorithms and techniques for i) balancing the paths in a graph, ii) scheduling of operations to functional units, iii) allocating registers and iv) generating the VHDL code. Results show that the developed techniques are capable of generating the hardware structure of typical DSP algorithms represented in data-ﬂow graphs with over 2,000 nodes in around 200 ms, scaling to 80,000 nodes in about 214 s. Within the developed techniques, solving the scheduling problem is one of the most complex tasks: it is a NP-complete problem and directly inﬂuences the number of functional units and registers required. Therefore, experimental analysis was made on scheduling algorithms for time-constrained problems. Results show that simple list-based algorithms are more eﬃcient in large problems than more complex algorithms: they run faster and tend to require less functional units.
Today, a high noise level is considered a problem in many working environments. The main reason is that it contributes to stress and fatigue. Traditional methods using passive noise control is only practicable for high frequencies. As a complement to passive noise control, active noise control (ANC) can be used to reduce low frequency noise. The main idea of ANC is to use destructive interference of waves to cancel disturbing noises. The purpose of this thesis is to design and implement an ANC system in the driver's cabin of a Valmet 890 forest machine. The engine boom is one of the most disturbing noises and therefore the main subjective for the ANC system to suppress. The ANC system is implemented on a Texas Instrument DSP development starter kit. Different FxLMS algorithms are evaluated with feedback and feedforward configurations. The results indicate that an ANC system significantly reduces the sound pressure level (SPL) in the cabin. Best performance of the evaluated systems is achieved for the feedforward FxLMS system. For a commonly used engine speed of 1500 rpm, the SPL is reduced with 17 dB. The results show fast enough convergence and global suppression of low frequency noise.
This master thesis consists of implementation and evaluation of an AEC, Acoustic Echo Canceller, algorithm in a floating-point architecture. The most important question this thesis will try to answer is to determine benefits or drawbacks of using a floating-point architecture, relative a fixed-point architecture, to do AEC. In a telephony system there is two common forms of echo, line echo and acoustic echo. Acoustic echo is introduced by sound emanating from a loudspeaker, e.g. in a handsfree or speakerphone, being picked up by a microphone and then sent back to the source. The problem with this feedback is that the far-end speaker will hear one, or multiple, time-delayed version(s) of her own speech. This time-delayed version of speech is usually perceived as both confusing and annoying unless removed by the use of AEC. In this master thesis the performance of a floating-point version of a normalized least-mean-square AEC algorithm was evaluated in an environment designed and implemented to approximate live telephony calls. An instruction-set simulator and assembler available at the initiation of this master thesis were extended to enable; zero-overhead loops, modular addressing, post-increment of registers and register-write forwarding. With these improvements a bit-true assembly version was implemented capable of real-time AEC requiring 15 million instructions per second. A solution using as few as eight mantissa bits, in an external format used when storing data in memory, was found to have an insignificant effect on the selected AEC implementation’s performance. Due to the relatively low memory requirement of the selected AEC algorithm, the use of a small external format has a minor effect on the required memory size. In total this indicates that the possible reduction of the memory requirement and related energy consumption, does not justify the added complexity and energy consumption of using a floating-point architecture for the selected algorithm. Use of a floating-point format can still be advantageous in speech-related signal processing when the introduced time delay by a subband, or a similar frequency domain, solution is unacceptable. Speech algorithms that have high memory use and small introduced delay requirements are a good candidate for a floating-point digital signal processor architecture.
Ogg Vorbis is a fairly new and growing audio format, often used for online distribution of music and internet radio stations for streaming audio. It is considered to be better than MP3 in both quality and compression and in the same league as for example AAC. In contrast with many other formats, like MP3 and AAC, Ogg Vorbis is patent and royalty free. The purpose of this thesis project was to investigate how the C6416 DSP processor and a Stratix II FPGA could be connected to each other and work together as co-processors and using an Ogg Vorbis decoder as implementation example. A fixed-point decoder called Tremor (developed by Xiph.Org the creator of the Vorbis I specification), has been ported to the DSP processor and an Ogg Vorbis player has been developed. Tremor was profiled before performing the software / hardware partitioning to decide what parts of the source code of Tremor that should be implemented in the FPGA to off-load and accelerate the DSP.
A Prototype Laboratory Environment for Digital Signal Processing Using Simulink and a Texas Instrument DSP Device
Normally, when a model is designed from building blocks in Simulink, the simulation is performed within the Simulink environment. A test of the design in a real-time environment requires that source code is generated, compiled and downloaded to the target hardware. As a first attempt to bridge this software gap, this thesis describes and evaluates a prototype laboratory environment, which directly links Simulink to a Texas Instrument DSP device. The prototype system converts graphical models and makes available various real-time signal processing algorithms, such as adders, delays, FFTs, IIR filters and multipliers. Future work is to consider modification of the prototype to allow for feedback in the graphical models and to find an efficient way of handling signal processing algorithms where variable buffer lengths are required.
The aim of this study is to examine the capability, performance, and relevance of a signal processing toolkit in Java, a programming language for Web-based applications. Due to the simplicity, ease and application use of the toolkit and with the advanced Internet technologies such as Remote Method Invocation (RMI), a spectral estimation applet has been created in the Java environment. This toolkit also provides an interactive and visual approach in understanding the various theoretical concepts of spectral estimation and shows the need to create more application applets to better understand the various concepts of signal and image processing. This study also focuses on creating a Java toolkit for embedded systems, such as Personal Digital Assistants (PDAs), embedded Java board, and supporting integer precision, and utilizing COordinate Rotation DIgital Computer (CORDIC) algorithm, both aimed to provide good performance in resource-limited environments. The results show a feasibility and necessity of developing a standardized Application Programming Interface (API) for the fixed-point signal processing library.