DSPRelated.com Forums

Communication between ARM and DSP

Started by alvin6 (9 replies)

Hi all,

I am a newbie to DSP. Our company now wants to develop a product that combines ARM and DSP. At first, we were thinking about using TI's AM57x or KeyStone series SoC. However, after checking the performance benchmark of TI's IPC (inter-processor communication) (http://software-dl.ti.com/processor-sdk-rtos/esd/d...), we found that the throughput is low and the CPU utilization is relatively high. So our director wants to find a communication method (like PCIe or another parallel interface) that can achieve high throughput between ARM and DSP (at least 20 MB/s, ideally 50 MB/s). Any ideas about communication between ARM and DSP would be appreciated. The ARM and DSP do not need to be on the same SoC.


Thanks.

Reply by jmford94, December 15, 2020

The link you shared is for a benchmark that copies data through the Linux kernel into userspace. You will get many times more performance if you move data between the DSP core and the ARM core inside or below the Linux kernel. The AM57x series has many high-speed ports and memory subsystems to support fast data transfer, so it really should be able to do whatever you want it to do, but as Chuck says below, you should start with the problem and work toward a solution rather than jamming a solution into the problem.


Reply by jbrower, December 15, 2020

Alvin-

Based on this:

https://e2e.ti.com/support/processors/f/791/t/7747...

I'd say you have valid concerns, not to mention time spent debugging.

Unfortunately, it seems the only KeyStone way to move data between C66x and Arm cores is via shared memory. TI should have connected the PCIe interface to the Arm CorePac (as well as the C66x CorePac), which would emulate a typical server architecture, where Arm cores run Linux and C66x cores run on a PCIe card. This would work equally well for embedded systems: an FPGA could serve as the "PCIe bus" for both sides, with an intermediary buffer. That would have allowed throughput well over 200 MB/s for a 1x PCIe bus.
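As a rough check on the x1 figure, the raw PCIe bandwidth per lane follows from the transfer rate and the line coding. The following is a minimal sketch (the function name is hypothetical), counting 1 MB as 10^6 bytes and ignoring TLP/DLLP packet overhead, so achievable throughput will be somewhat lower than these numbers:

```c
#include <assert.h>

/* Raw PCIe bandwidth per direction in MB/s (1 MB = 10^6 bytes), after
 * line coding but before packet (TLP/DLLP) overhead. */
static double pcie_raw_mbytes_per_s(int gen, int lanes)
{
    assert(gen >= 1 && gen <= 3 && lanes >= 1);
    double gtps;    /* giga-transfers per second per lane */
    double coding;  /* fraction of line bits that carry data */
    switch (gen) {
    case 1:  gtps = 2.5; coding = 8.0 / 10.0;    break; /* 8b/10b */
    case 2:  gtps = 5.0; coding = 8.0 / 10.0;    break; /* 8b/10b */
    default: gtps = 8.0; coding = 128.0 / 130.0; break; /* 128b/130b */
    }
    /* bits/s of payload -> bytes/s -> MB/s, then scale by lane count */
    return gtps * 1e9 * coding / 8.0 / 1e6 * lanes;
}
```

For Gen1 x1 this gives 250 MB/s per direction before packet overhead; Gen2 doubles that, and Gen3 x1 comes out just under 985 MB/s.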

Unfortunately, TI did the KeyStone II designs in 2014, and since then their DSP roadmap has gone in the tank. Their inability to adapt to servers and to the server architecture standardized by the likes of Intel, Nvidia, Mellanox, etc., was at the root of the tanking.

On the positive side for a TI approach, if you could combine a suitable TI DSP with an FPGA containing Arm core(s), you would then benefit from solid TI build and debug tools, an existing code/algorithm base, and super reliable chips. A key question then would be which DSP, and what TI says about its remaining lifespan.

-Jeff

Reply by rbj, December 15, 2020

Not sure your company wants to consider it, but Analog Devices has a relatively new fire-breathing SoC that has an ARM and two SHARC DSPs on one chip. But I dunno how much processing you need to do. Maybe relatively little.

Reply by Slartibartfast, December 15, 2020

DMA to shared memory?


Reply by rrlagic, December 15, 2020

When it is an SoC, shared memory, of course.

When it is not, still DMA, into a dedicated buffer over a high-speed link like PCIe.
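The shared-memory approach can be sketched as a single-producer/single-consumer ring placed in a region both cores can see. This is a minimal sketch, assuming the DSP produces and the ARM consumes, and assuming both sides access the ring through coherent memory. On parts where the DSP is not cache-coherent with the ARM, you would additionally need explicit cache writeback/invalidate around each slot and index update, and C11 atomics alone would not be enough. All names (`shm_ring`, `ring_push`, `ring_pop`, the slot sizes) are hypothetical; on real hardware a DMA engine could fill the slot instead of `memcpy`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u    /* power of two, so index wraparound stays correct */
#define SLOT_BYTES 256u

/* Hypothetical layout of a ring in a shared DDR region. Only the producer
 * writes `head`; only the consumer writes `tail`. */
struct shm_ring {
    _Atomic uint32_t head;  /* next slot the producer will fill */
    _Atomic uint32_t tail;  /* next slot the consumer will drain */
    uint8_t slot[RING_SLOTS][SLOT_BYTES];
};

/* Producer side (e.g. DSP). Returns 1 on success, 0 if the ring is full. */
static int ring_push(struct shm_ring *r, const void *src, size_t len)
{
    assert(len <= SLOT_BYTES);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return 0;
    memcpy(r->slot[head % RING_SLOTS], src, len);
    /* Release: slot contents become visible before the new head value. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side (e.g. ARM). Returns 1 on success, 0 if the ring is empty. */
static int ring_pop(struct shm_ring *r, void *dst, size_t len)
{
    assert(len <= SLOT_BYTES);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    memcpy(dst, r->slot[tail % RING_SLOTS], len);
    /* Release: we are done reading the slot before handing it back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

Because each index has exactly one writer, the two sides never need a lock; they only ever wait on each other when the ring is full or empty.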

Reply by Bob11, December 15, 2020

If the DSP can be implemented in firmware blocks, there's always the Xilinx Zynq product line.

Reply by ChuckMcM, December 15, 2020

Like others here, I'd say a known way to do this is to get an FPGA that incorporates an ARM hard core and DSP blocks in its FPGA resources. The Xilinx Zynq UltraScale+ MPSoC and RFSoC both fit the bill, and I've been building radios with both of them.

When doing it that way, the simplest technique is to just share memory and use the built-in AXI DMA engine to move data into and out of the DSP's and/or ARM's memory. We can basically saturate the ARM's memory bus doing this if we're not careful to leave some cycles for it (even with each subsystem having its own DDR memory).
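At the register level, driving the Xilinx AXI DMA in simple (Direct Register) mode looks roughly like the sketch below. The register offsets and bit positions are taken from Xilinx's AXI DMA product guide (PG021) and should be checked against the guide for your IP version; 32-bit addressing is assumed, so the `*_MSB` address registers are ignored. The helper names are hypothetical; in practice you would more likely use Xilinx's bare-metal `XAxiDma` driver or a Linux DMA engine driver. `base` is assumed to point at the DMA's AXI-Lite register window (for example via `mmap` on a UIO device), and the addresses passed in must be physical addresses of DMA-capable buffers, not user-space pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets of AXI DMA registers in Direct Register mode (per PG021). */
#define MM2S_DMACR   0x00u  /* memory-to-stream control */
#define MM2S_DMASR   0x04u  /* memory-to-stream status */
#define MM2S_SA      0x18u  /* source address */
#define MM2S_LENGTH  0x28u  /* bytes to send; writing it starts the transfer */
#define S2MM_DMACR   0x30u  /* stream-to-memory control */
#define S2MM_DMASR   0x34u  /* stream-to-memory status */
#define S2MM_DA      0x48u  /* destination address */
#define S2MM_LENGTH  0x58u  /* buffer size; writing it arms the transfer */

#define DMACR_RS     (1u << 0)  /* run/stop */
#define DMASR_IDLE   (1u << 1)  /* set when the simple-mode transfer is done */

static void reg_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    base[off / 4u] = val;
}

static uint32_t reg_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4u];
}

/* Hypothetical helper: buffer in DDR -> AXI stream into the fabric DSP. */
static void axi_dma_send(volatile uint32_t *base, uint32_t phys_src, uint32_t nbytes)
{
    assert(nbytes > 0);
    reg_write(base, MM2S_DMACR, DMACR_RS);  /* start the channel */
    reg_write(base, MM2S_SA, phys_src);
    reg_write(base, MM2S_LENGTH, nbytes);   /* length write kicks it off */
}

/* Hypothetical helper: AXI stream from the fabric DSP -> buffer in DDR. */
static void axi_dma_recv(volatile uint32_t *base, uint32_t phys_dst, uint32_t nbytes)
{
    assert(nbytes > 0);
    reg_write(base, S2MM_DMACR, DMACR_RS);
    reg_write(base, S2MM_DA, phys_dst);
    reg_write(base, S2MM_LENGTH, nbytes);
}

/* Poll for completion; an interrupt-driven design would use IOC instead. */
static int axi_dma_send_idle(volatile uint32_t *base)
{
    return (reg_read(base, MM2S_DMASR) & DMASR_IDLE) != 0;
}

static int axi_dma_recv_idle(volatile uint32_t *base)
{
    return (reg_read(base, S2MM_DMASR) & DMASR_IDLE) != 0;
}
```

If the ARM caches the destination buffer, it has to invalidate that range before reading what the DMA wrote.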

Xilinx also has IP blocks for a high-speed serial link protocol called Aurora, which you can use to move data from one FPGA to another (for example, if you have one FPGA running the DACs and ADCs and another doing the compute/DSP).

Other than that, not many ARM chips have external memory buses, so they will be limited by whatever I/O buses they do expose. The NXP i.MX processors have PCIe (1.x) available, which can get you about 50 MB/s (roughly 400 Mb/s of payload).

All told, though, it seems a kind of backwards way to come at an engineering problem. The more typical path is to start with the problem, then extrapolate the requirements, and then create the design space.


What kind of "DSP" are you doing? Audio processing? RF processing? Vision processing? Each brings its own set of requirements to the party.

Reply by alvin6, December 15, 2020

Hi ChuckMcM,


Thanks for your help and kind reply. Your answer really gives me some ideas and inspiration. 


Basically, I am an embedded Linux engineer who doesn't know much about DSP/FPGA, and I'm currently helping our DSP engineer decide on the hardware architecture. Our company's business is power quality monitoring and analysis.

I definitely agree with you that the requirements should always come first, and the technical details after.

In our last-generation product, we connected ADI's BF609 to an ARM core over SPI (which only gives a couple of MB per second). Now our company wants to upgrade it by adding more channels, increasing the sampling rate, recording the details of some transient events, etc.
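Following ChuckMcM's advice to start from the requirements, a back-of-envelope budget shows which links are even candidates. The sketch below is hypothetical: the channel count, sample rate, sample width, SPI clock and 20% margin are illustrative assumptions, not figures from this thread, and it ignores framing, timestamps and the transient-capture bursts mentioned above, which can push the peak rate well above the average.

```c
#include <assert.h>
#include <stdint.h>

/* Sustained raw-sample bandwidth in bytes per second. */
static uint64_t sample_stream_bytes_per_s(uint32_t channels,
                                          uint32_t sample_rate_hz,
                                          uint32_t bytes_per_sample)
{
    assert(channels > 0 && sample_rate_hz > 0 && bytes_per_sample > 0);
    return (uint64_t)channels * sample_rate_hz * bytes_per_sample;
}

/* Upper bound for a single-data-line SPI link: one bit per clock, no gaps. */
static uint64_t spi_bytes_per_s(uint32_t sclk_hz)
{
    return sclk_hz / 8u;
}

/* Does `link` carry `need` with at least `margin_pct` percent headroom? */
static int link_fits(uint64_t need, uint64_t link, uint32_t margin_pct)
{
    return need * (100u + margin_pct) <= link * 100u;
}
```

With an assumed 16 channels at 256 kS/s and 4-byte samples, the stream needs 16,384,000 B/s (about 16.4 MB/s). An SPI link clocked at 25 MHz tops out at 3,125,000 B/s, while a link sustaining the 20 MB/s minimum from the original question would carry it with 20% headroom.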


Really appreciate your help. 


Reply by ChuckMcM, December 15, 2020

Well, I can recommend the ADALM-PLUTO SDR from ADI. It's inexpensive ($150 list, $99 educational pricing), has a bunch of educational material around DSP processing with MATLAB support and FPGAs, and has a Zynq (ARM core) tied to an ADI ADC and DAC. It's probably the simplest pre-packaged way to dive in. (Hmm, I just checked and it is now $249 qty 1; check with your sales rep, they may be able to get you one for free if you're evaluating the AD9363 part.)