This white paper discusses the very large FFT (VLFFT) demo, which implements one-dimensional, complex, single-precision floating-point FFTs of 16K to 1024K samples on 1, 2, 4, and 8 cores of TI’s TMS320C6678, an 8-core fixed- and floating-point DSP. The demo shows the capabilities of the C66x DSP core and the ability of the architecture to support parallelization across multiple cores, with performance gains proportional to the number of cores added. The FFT was chosen as the algorithm for this demo because FFTs are common signal processing building blocks in applications such as medical imaging, communications, military and commercial radar, and electronic warfare (jammers and anti-jammers). The 1024K-sample FFT is shown to take only 6.4 ms when the algorithm is run on all eight DSP cores of the TMS320C6678 device at 1 GHz.
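To illustrate why a large one-dimensional FFT lends itself to multicore execution, the following is a minimal sketch of one common approach: factoring the length as N = N1 × N2 and computing the transform as two stages of smaller, mutually independent FFTs with a twiddle multiplication in between. It is an assumption made for illustration, not a description of how the VLFFT demo partitions its work. The function names `fft_radix2` and `fft_two_stage` are hypothetical, and the code is plain Python rather than C66x code.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_two_stage(x, n1, n2):
    """Length n1*n2 FFT built from smaller FFTs (illustrative sketch).

    Assumes n1 and n2 are powers of two. Each list comprehension or loop
    over independent small FFTs marks work that could, in principle, be
    split across cores.
    """
    n = n1 * n2
    assert len(x) == n

    # Stage 1: n2 independent length-n1 FFTs over strided input samples.
    stage1 = [fft_radix2(x[c::n2]) for c in range(n2)]

    # Twiddle multiplication between the two stages.
    for c in range(n2):
        for k1 in range(n1):
            stage1[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Stage 2: n1 independent length-n2 FFTs, written to output index
    # k1 + n1 * k2.
    out = [0j] * n
    for k1 in range(n1):
        col = fft_radix2([stage1[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]
    return out
```

In this sketch the small FFTs within each stage do not depend on one another, which is the property that allows the work to be divided among cores. A real multicore implementation would also have to manage data movement between stages, which the sketch does not model.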
Introduced by Texas Instruments over thirty years ago, the digital signal processor (DSP) has evolved in its implementation from a standalone processor to a multicore processing element, and its range of applications has continued to grow. The breadth of software development tools for the DSP has also expanded to accommodate diverse sets of programmers. From small, low-power, yet “smart” devices with applications such as voice and image recognition, to multicore, high-performance compute platforms performing real-time data analytics, the opportunities to benefit from the low-power processing efficiency of DSPs are nearly endless. The TI DSP has benefited from a distinctive evolution of its tool suite, which makes it easy and effective for the general programmer and the signal processing expert alike to develop application code quickly. This paper addresses how TI DSP users can achieve the high performance afforded by the TI DSP architecture in an efficient, easy-to-use development environment.