joggingsong@gmail.com wrote:
> I don't know about the TI C6000; if someone knows the access latency of
> its L1 data cache/SRAM, please tell me.

On the C6000 architecture, the L1 cache/SRAM is single-cycle access.

Hi,
VLIW architectures are now chosen for powerful DSPs such as the C6000
from TI, the MSC81xx from Freescale, and the Blackfin from Analog
Devices. From papers and books, I have learned that register allocation
and instruction scheduling are two important phases in an optimizing
compiler for exploiting greater instruction-level parallelism (ILP).
Register allocation tries to reduce spill code, but on the Blackfin and
MSC81xx the stack is in cache or L1 SRAM, whose access latency is 1 cycle.
I don't know about the TI C6000; if someone knows the access latency of
its L1 data cache/SRAM, please tell me.
I think spill code will not lead to degradation of performance. From the
book Computer Architecture: A Quantitative Approach, I understand that
the ISA is not very important. So in my opinion, trying to use SIMD and
special instructions, which are hard to express in the C language, is
the right way to improve performance. Other improvements should be made
in the C language itself: I can modify the code to have a new
implementation architecture that reduces memory access.
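
To make the last point concrete, here is a rough sketch of the kind of
C-level restructuring I mean. The function names and the scaled-sum
computation are made up for illustration and are not taken from any real
code base. The first version stores every intermediate result to a
temporary buffer and reads it back; the second fuses the two loops so the
intermediate value stays in a register. Both assume the result fits in
32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-pass version (hypothetical example): n extra stores to tmp[]
 * followed by n extra loads from it. */
int32_t scaled_sum_two_pass(const int16_t *x, int32_t *tmp, size_t n,
                            int16_t gain)
{
    assert(x != NULL && tmp != NULL);

    for (size_t i = 0; i < n; i++)
        tmp[i] = (int32_t)x[i] * gain;   /* store intermediate to memory */

    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += tmp[i];                   /* load it back again */

    return acc;
}

/* Fused version: same result, but the product is consumed immediately,
 * so the only memory traffic is the read of x[]. */
int32_t scaled_sum_fused(const int16_t *x, size_t n, int16_t gain)
{
    assert(x != NULL);

    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * gain;     /* intermediate stays in a register */

    return acc;
}
```

Whether the fused loop is actually faster on a given DSP depends on the
compiler and the memory system, so it would need to be measured on the
target.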

Best Regards
Jogging