I am performing frame-by-frame autocorrelation of a 1D signal in LabVIEW. My frame size is 250K samples, and each sample is 16 bits. LabVIEW takes about 200 ms to perform this operation, and I need to use a DSP processor to reduce this time. I don't have much experience with DSP processors, so I need guidance on selecting the right processor for this task and on how much improvement in timing I can expect. Any help in this regard is highly appreciated.
Firstly, I would like to offer my deepest condolences that you have to work with LabVIEW. In my opinion, it is quite nice for connecting simple pieces of hardware together, and ridiculously bad for doing just about anything else.
Before considering a DSP processor, are you certain that your existing implementation is well optimised? Autocorrelations can be computed extremely efficiently using FFTs. Sorry if this is obvious, but it's an important thing to check first.
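To put rough numbers on that for the 250K-sample frame in the question, here is a back-of-the-envelope sketch in Python. The figure of about 5*M*log2(M) flops per FFT is a textbook rule of thumb that I am assuming, the function names are made up for illustration, and operation counts are not the same thing as timings.

```python
def direct_macs(n):
    """Multiply-accumulates needed to compute all n non-negative lags directly."""
    # Lag k needs (n - k) products, so the total is n + (n - 1) + ... + 1.
    return n * (n + 1) // 2

def fft_size(n):
    """Smallest power of two that holds the (2n - 1)-point linear autocorrelation."""
    return 1 << (2 * n - 2).bit_length()

def fft_flops(n):
    """Rough flop count for one forward and one inverse FFT of the padded frame."""
    m = fft_size(n)
    log2_m = m.bit_length() - 1
    # Assumption: about 5 * m * log2(m) flops per complex radix-2 FFT.
    return 2 * 5 * m * log2_m

n = 250_000  # frame length from the question
print(fft_size(n))     # 524288
print(direct_macs(n))  # 31250125000
print(fft_flops(n))    # 99614720
```

Under these assumptions the FFT route needs roughly 300 times fewer arithmetic operations when every lag is required. If only a handful of lags are needed, the direct sum may already be cheap enough.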
As a baseline, I would throw together a simple auto-correlator in a half-decent programming language of your choice (e.g. C++), using an FFT library (FFTW is okay and free; faster alternatives exist) and import it to LabVIEW as a DLL. Only if that's not fast enough would I consider going down the DSP route.
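A minimal sketch of what such a baseline computes, written in pure Python so that it runs without any libraries. The toy radix-2 `fft` below only stands in for a real library such as FFTW and is far too slow for 250K-sample frames; all function names are hypothetical.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def autocorr_fft(x):
    """Linear autocorrelation r[k] = sum_i x[i] * x[i + k] for k = 0 .. len(x) - 1."""
    n = len(x)
    # Zero-pad to a power of two >= 2n - 1 so the circular result does not wrap around.
    m = 1 << (2 * n - 2).bit_length()
    spectrum = fft([complex(v) for v in x] + [0j] * (m - n))
    power = [abs(s) ** 2 for s in spectrum]   # power spectrum |X|^2
    r = fft(power, inverse=True)              # inverse transform, still unscaled
    return [r[k].real / m for k in range(n)]  # apply the 1/m scaling here

def autocorr_direct(x):
    """Reference O(n^2) implementation used to check the FFT version."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]

x = [1, 2, 3, 4]
print(autocorr_direct(x))                      # [30, 20, 11, 4]
print([round(v, 6) for v in autocorr_fft(x)])  # same values, up to rounding
```

The structure carries over to C++: pad, forward transform, multiply each bin by its complex conjugate, inverse transform, scale. For real input, a real-to-complex transform would roughly halve the work.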
Also, if you happen to have an FPGA already in your system, then you might also consider using a small part of that instead of getting a DSP. Otherwise, a DSP is a very good tool for the job. I'm no expert with DSPs, so my best guess would be that you would probably be looking at a moderate to enormous speed-up depending on whether you want to spend a moderate or enormous sum of money on it.
I believe that @weetabixharry has good reason to point you to some alternatives before going directly to a microprocessor unit (MPU) / microcontroller unit (MCU) solution.
I only recently started working with DSPs and MCUs, so I can't offer much insight, but if you do end up going this way, I believe you should pick a part whose manufacturer provides good tools and support. Most probably the speed-up you get with the vast majority of today's MPUs/MCUs will be (as weetabixharry mentions) at least moderate. This means that you will most probably want a solution that is easy to implement as well as computationally efficient. Embedded programming can become cumbersome if you don't have the right tools (most probably the right "hardware abstractions").
I would definitely consider the first alternative weetabixharry suggests (the C/C++ solution); it is quick to try and might even do the job. Also, if you do have an FPGA, then by all means use it; it will most probably end up even faster than the MPU/MCU alternative.
I am not sure I have helped here, but I hope you'll find the way that suits you best.
One more thing to consider is the precision of the results you will get. Most probably in LabVIEW you can work in 64-bit, but on an MCU/MPU you will have to "put some extra effort" to achieve the same precision. Nevertheless, many manufacturers provide optimized DSP routines for their products.
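As a rough illustration of why word length matters here, this small Python sketch works out how wide an integer accumulator must be for the worst case in the question (250K samples of 16 bits each). The function name is hypothetical, and the sketch assumes signed samples and a direct sum-of-products computation.

```python
def accumulator_bits(sample_bits, n_samples):
    """Signed bits needed to hold a sum of n_samples products of two signed samples."""
    max_mag = 1 << (sample_bits - 1)           # largest magnitude, 32768 for 16 bits
    worst_sum = n_samples * max_mag * max_mag  # every product at its maximum
    return worst_sum.bit_length() + 1          # +1 for the sign bit

print(accumulator_bits(16, 250_000))  # 49
```

So the zero-lag sum can need 49 bits in the worst case: a 32-bit accumulator can overflow, while a 64-bit integer is safe. A double-precision float, with its 53-bit significand, can also hold such sums exactly, but single precision cannot.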
I second weetabixharry's take on LabVIEW; we have a large amount of LabVIEW code here at Raytheon, and it is a nightmare to maintain. I wasn't even aware that you could do stuff like autocorrelation in LabVIEW, but I suppose if that's what you have available, you make it work.
I also second the opinions that jumping into hardware could get complex, expensive, and time-consuming in a hurry. As a hardware guy with over 30 years' experience, I myself would hesitate to take the plunge, although probably mostly because of the software that would need to be written to make things work.
Good advice from the others.
If you're using LabVIEW and testing/simulating solely with PCs at this point, then weetabixharry's suggestion is very good. Searching for:
shows lots of examples. Find one that works; it should be very fast. My guess is 10x faster than LabVIEW if you have 4 or more cores in your PC.
A brief survey of LabVIEW's multithreading (multi-core) capabilities suggests it does nothing to multithread a single block. It seems to run diagrams, and possibly blocks, concurrently where possible, but a single autocorr block is probably stuck on one core.
If you really need real-time operation, you may try a DSP-processor-based solution. But unless you need that, I suggest using Octave/Matlab or Python-based analysis/solutions, and then C/C++-based solutions.