Reply by Martin Strubel November 27, 2012
Hi there,

the reply might come a bit late, but anyhow: I haven't done any high-speed
JPEG encoding on the BF561, but I did MJPEG of PAL YUV420 at 10 fps (with
only the 2D DCT asm-coded) plus a rather naive Bayer conversion, in uClinux
on the single-core Blackfins. The Bayer conversion actually burns many
cycles. If you have a video source that already delivers some YUV format,
as a few CMOS sensors do, you'll likely reach the 25 fps with a dual core.
To be on the safe side, you could put the JPEG encoding into an FPGA.
That works rather well. Some more information here:
http://tech.section5.ch/news/?p=219
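To see why even a "stupid" Bayer conversion eats cycles, here is a minimal nearest-neighbour demosaic sketch (plain Python, assuming an RGGB pattern; the function name and frame layout are illustrative, not from any Blackfin code):

```python
# Minimal sketch: nearest-neighbour demosaic of an RGGB Bayer frame.
# Every output pixel needs three reads plus index arithmetic, which is
# why a naive per-pixel loop is expensive on a DSP.

def demosaic_rggb(bayer, width, height):
    """bayer: flat list of sensor values in RGGB order; returns RGB rows."""
    rgb = []
    for y in range(height):
        row = []
        for x in range(width):
            # Snap to the top-left of the 2x2 RGGB cell containing (x, y).
            cx, cy = x & ~1, y & ~1
            r = bayer[cy * width + cx]            # R at (even, even)
            g = bayer[cy * width + cx + 1]        # G at (odd, even)
            b = bayer[(cy + 1) * width + cx + 1]  # B at (odd, odd)
            row.append((r, g, b))
        rgb.append(row)
    return rgb

# Tiny 2x2 frame: a single RGGB cell.
frame = demosaic_rggb([10, 20, 30, 40], 2, 2)
```

A real implementation would at least interpolate the greens, doubling the per-pixel work again, which is why pushing this step into an FPGA or onto the sensor pays off.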

Cheers,

- Martin


>> Well, ADI's numbers certainly point to it being feasible. No matter what
>> idea you choose as a candidate, I think it's a Really Good Idea to buy an
>> eval kit and try it out before you start laying out a board
>
> Oh, absolutely. Diving into a new (at least to me - I think I understand
> the theory, but theory and practice are two different things :)
> technology, I like to have a "known-good" position on at least something.
> That gives me a way to build the knowledge on a firm foundation - start
> off simple with the appropriate "hello, world" code, and iterate through
> increasing levels of complexity until you get a solution.
>
> There's actually a newer Blackfin board that seems to be "the future",
> but since it doesn't come with video-in onboard, I'm sticking to the one
> that does. Reducing risk is something of a mantra for me :) Not being on
> the bleeding edge while learning stuff is generally a good idea too.
>
>> -- and make sure that you've identified as many possible sources of
>> slowness as you can, and have as many of them in there (or at least
>> accounted for) as you can
>
> Yeah, this is where the uncertainty lies. They actually have an eval kit
> that does everything I could ask for
> (http://www.analog.com/en/evaluation/bf561-ezlite/eb.html): video-in
> routed to the DSP, lots of I/O (albeit on prototyping-unfriendly
> connectors) and sufficient onboard RAM. This isn't a business idea, it's
> an attempt to reincarnate a (really expensive) solution that my company
> (before I sold it :) used to supply about a decade ago. Since it's just a
> hobby project, I'd like to have a *bit* of certainty before I splash the
> cash :)
>
> I've been trying to join Analog Devices' forum to ask pertinent questions
> there, but after typing in your details they try to send an email to
> confirm your email address, and that email is never sent (or at least it
> never arrives). Thus no ability to log into their forum. Thus
> frustration.
>
> I've sent emails to their web-team, but we'll see if that garners any
> reaction.
Reply by September 10, 2012
On Monday, September 10, 2012 9:46:54 AM UTC-7, Tim Wescott wrote:
> Well, ADI's numbers certainly point to it being feasible. No matter what
> idea you choose as a candidate, I think it's a Really Good Idea to buy an
> eval kit and try it out before you start laying out a board

Oh, absolutely. Diving into a new (at least to me - I think I understand the
theory, but theory and practice are two different things :) technology, I
like to have a "known-good" position on at least something. That gives me a
way to build the knowledge on a firm foundation - start off simple with the
appropriate "hello, world" code, and iterate through increasing levels of
complexity until you get a solution.

There's actually a newer Blackfin board that seems to be "the future", but
since it doesn't come with video-in onboard, I'm sticking to the one that
does. Reducing risk is something of a mantra for me :) Not being on the
bleeding edge while learning stuff is generally a good idea too.

> -- and make sure that you've identified as many possible sources of
> slowness as you can, and have as many of them in there (or at least
> accounted for) as you can

Yeah, this is where the uncertainty lies. They actually have an eval kit
that does everything I could ask for
(http://www.analog.com/en/evaluation/bf561-ezlite/eb.html): video-in routed
to the DSP, lots of I/O (albeit on prototyping-unfriendly connectors) and
sufficient onboard RAM. This isn't a business idea, it's an attempt to
reincarnate a (really expensive) solution that my company (before I sold it
:) used to supply about a decade ago. Since it's just a hobby project, I'd
like to have a *bit* of certainty before I splash the cash :)

I've been trying to join Analog Devices' forum to ask pertinent questions
there, but after typing in your details they try to send an email to confirm
your email address, and that email is never sent (or at least it never
arrives). Thus no ability to log into their forum. Thus frustration.

I've sent emails to their web-team, but we'll see if that garners any
reaction.
Reply by Tim Wescott September 10, 2012
On Sun, 09 Sep 2012 14:33:17 -0700, krudthebarbarian wrote:

> On Sunday, September 9, 2012 1:46:32 PM UTC-7, Tim Wescott wrote:
>> On Sun, 09 Sep 2012 12:29:22 -0700, krudthebarbarian wrote:
>>
>>> The ADI site does say the memory setup is optimal, but with the sizes
>>> of images they're using, they must be storing them in SDRAM, so it
>>> must be streaming them from SDRAM to L1/L2 and out again after
>>> operating on them. In other words, I don't think the ADI setup is
>>> *too* fake.
>>>
>>> So, where's the beef ? And, at the end of the day, is it possible to
>>> do this on the Blackfin ?
>>
>> Whatever ADI is quoting, it's very likely a core algorithm, not close
>> to everything you need.
>
> At the end of the day, I don't really care if I get JPEG files out, as
> long as there's a relatively painless way to make them into JPEGs.
>
>> So it may be that getting things decoded into fast memory (which is way
>> easy to write to) may not be the real bottleneck, or may be only one of
>> the bottlenecks to getting things decoded and out onto a network.
>
> Right, but the 92ms figure above is for memory->memory too, although to
> be fair he doesn't say which Blackfin he's using. ADI are using a
> slower/less capable Blackfin than the one I had in mind to use (they are
> using a single-core '548, I'm looking at the dual-core '561). The
> network stuff is an additional overhead (although I don't think it'll be
> too onerous).
>
> I was also trying to think how I could set up a JPEG compression test to
> be, ahem, advantageous to the marketing people. Given that they're
> quoting a final JPEG file size, and results for various JPEG compression
> ratios in terms of DSP cycles, I'm having trouble thinking that it's
> anything other than it appears to be. Which is why I'm asking, of course
> :)
>
> The idea is to have multiple video feeds streaming information into a
> central server. I'm using single-frame compression (JPEG, maybe
> JPEG 2000) because of the unique nature of the video sources - I'll be
> back-stepping and overwriting individual frames with new data very
> frequently but still want to play back arbitrary sequences of frames
> afterwards.
>
> So, I have to have compression done at the video source (or I'll
> overload the network as well as put too large a load on the server to do
> the compression), but on playback, it'll be a web-browser / QuickTime
> interface, so if I can easily transform the data to an MJPEG stream (or
> similar), I'll be fine. If that means prepending a header, appending a
> footer or whatever to every frame, I'm fine with that. I don't have to
> have everything perfectly standard at the instant it's compressed, is
> what I'm trying to say.
>
>> Can you take a canned set of 30 jpeg images of the size you want and
>> pump them out over your network at the speed you want? How much of the
>> processor is left over while you're doing this for decoding?
>
> Yeah, the JPEG compression brings them down to ~60k each. A 100Mbit
> network can easily handle that load. I was figuring that if one of the
> DSP cores is doing the JPEG compression, the other could do the network
> management - a simple producer/consumer queue would handle that simply
> enough. On the Mac, it doesn't even blip the processor to do this :) On
> the DSP, I was figuring I could use the ENC28J60 and then it's just a
> matter of sending data via SPI. Depending on how I can arrange it, I
> might be able to do this via DMA and not even bother the DSP itself.
>
> Of course, it'd be nice to know it was at least feasible before plonking
> down the $500 for the evaluation kit :)
Well, ADI's numbers certainly point to it being feasible. No matter what
idea you choose as a candidate, I think it's a Really Good Idea to buy an
eval kit and try it out before you start laying out a board -- and make sure
that you've identified as many possible sources of slowness as you can, and
have as many of them in there (or at least accounted for) as you can.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
Reply by September 9, 2012
On Sunday, September 9, 2012 1:46:32 PM UTC-7, Tim Wescott wrote:
> On Sun, 09 Sep 2012 12:29:22 -0700, krudthebarbarian wrote:
>
>> The ADI site does say the memory setup is optimal, but with the sizes
>> of images they're using, they must be storing them in SDRAM, so it must
>> be streaming them from SDRAM to L1/L2 and out again after operating on
>> them. In other words, I don't think the ADI setup is *too* fake.
>>
>> So, where's the beef ? And, at the end of the day, is it possible to do
>> this on the Blackfin ?
>
> Whatever ADI is quoting, it's very likely a core algorithm, not close to
> everything you need.

At the end of the day, I don't really care if I get JPEG files out, as long
as there's a relatively painless way to make them into JPEGs.

> So it may be that getting things decoded into fast memory (which is way
> easy to write to) may not be the real bottleneck, or may be only one of
> the bottlenecks to getting things decoded and out onto a network.

Right, but the 92ms figure above is for memory->memory too, although to be
fair he doesn't say which Blackfin he's using. ADI are using a slower/less
capable Blackfin than the one I had in mind to use (they are using a
single-core '548, I'm looking at the dual-core '561). The network stuff is
an additional overhead (although I don't think it'll be too onerous).

I was also trying to think how I could set up a JPEG compression test to be,
ahem, advantageous to the marketing people. Given that they're quoting a
final JPEG file size, and results for various JPEG compression ratios in
terms of DSP cycles, I'm having trouble thinking that it's anything other
than it appears to be. Which is why I'm asking, of course :)

The idea is to have multiple video feeds streaming information into a
central server. I'm using single-frame compression (JPEG, maybe JPEG 2000)
because of the unique nature of the video sources - I'll be back-stepping
and overwriting individual frames with new data very frequently but still
want to play back arbitrary sequences of frames afterwards.

So, I have to have compression done at the video source (or I'll overload
the network as well as put too large a load on the server to do the
compression), but on playback, it'll be a web-browser / QuickTime interface,
so if I can easily transform the data to an MJPEG stream (or similar), I'll
be fine. If that means prepending a header, appending a footer or whatever
to every frame, I'm fine with that. I don't have to have everything
perfectly standard at the instant it's compressed, is what I'm trying to
say.
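The "prepend a header to every frame" idea really is about all an MJPEG-over-HTTP stream amounts to: browsers render a multipart/x-mixed-replace response with image/jpeg parts as live video. A minimal sketch (boundary token and frame source are placeholders, not from any standard player):

```python
# Sketch: wrapping ready-made JPEG frames as one chunk of an
# MJPEG-over-HTTP (multipart/x-mixed-replace) stream. Each frame gets
# only a small textual header prepended - no re-encoding needed.

BOUNDARY = b"frameboundary"  # arbitrary boundary token (assumption)

def mjpeg_part(jpeg_bytes):
    """Return one multipart chunk for a single JPEG frame."""
    header = (b"--" + BOUNDARY + b"\r\n"
              b"Content-Type: image/jpeg\r\n"
              b"Content-Length: " + str(len(jpeg_bytes)).encode()
              + b"\r\n\r\n")
    return header + jpeg_bytes + b"\r\n"

# Stand-in bytes for a real JPEG (SOI ... EOI markers).
chunk = mjpeg_part(b"\xff\xd8...\xff\xd9")
```

The server side would send a `Content-Type: multipart/x-mixed-replace; boundary=frameboundary` response header once, then emit one such chunk per frame, which fits the "not perfectly standard at compression time" constraint nicely.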
> Can you take a canned set of 30 jpeg images of the size you want and
> pump them out over your network at the speed you want? How much of the
> processor is left over while you're doing this for decoding?

Yeah, the JPEG compression brings them down to ~60k each. A 100Mbit network
can easily handle that load. I was figuring that if one of the DSP cores is
doing the JPEG compression, the other could do the network management - a
simple producer/consumer queue would handle that simply enough. On the Mac,
it doesn't even blip the processor to do this :) On the DSP, I was figuring
I could use the ENC28J60 and then it's just a matter of sending data via
SPI. Depending on how I can arrange it, I might be able to do this via DMA
and not even bother the DSP itself.

Of course, it'd be nice to know it was at least feasible before plonking
down the $500 for the evaluation kit :)

Simon
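A quick back-of-envelope check of the bandwidth claim above (frame size and rate taken from the post):

```python
# Network budget for ~60k JPEG frames at 30 fps.
frame_bytes = 60_000        # ~60k per compressed frame
fps = 30
bits_per_sec = frame_bytes * 8 * fps
mbit = bits_per_sec / 1e6   # payload rate in Mbit/s
```

That works out to 14.4 Mbit/s of payload: trivial for a 100Mbit LAN, as stated. One caveat worth flagging, though: the ENC28J60 is a 10Base-T part, so at these numbers the ENC28J60's own link (not the LAN) would be the bottleneck; a 100Mbit-capable MAC/PHY would be needed at full rate.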
Reply by Tim Wescott September 9, 2012
On Sun, 09 Sep 2012 12:29:22 -0700, krudthebarbarian wrote:

> So, I've been looking at a few ways of creating a module that does
> "video-in" -> "convert to stream of individual JPEG images" -> "send
> over a network". One way might be to use a Blackfin DSP.
>
> I'm having some difficulty trying to figure out whether the DSP would be
> up to the job though. On the ADI site, they seem to be quoting ~31
> cycles/pixel
> (http://www.analog.com/en/processors-dsp/blackfin/bf_jpeg_motion-jpeg/products/product.html)
> for quality 60, or ~15 million cycles for an SD video frame (or
> 25ms/frame). That means at 30fps I'd need 450MHz, which is comfortably
> within the 600MHz budget of the part (and in fact there are two DSPs on
> board, so hey, this ought to be simple, right ? Right ??)
>
> Then, on the other hand I see people who've been doing this a lot longer
> than I (because this is my first foray into DSPs) getting a 752x512 (not
> a million miles away from SD video) frame out in 92ms. What ? That's
> only ~10 fps, and basically means it's worthless to me.
>
> The ADI site does say the memory setup is optimal, but with the sizes of
> images they're using, they must be storing them in SDRAM, so it must be
> streaming them from SDRAM to L1/L2 and out again after operating on
> them. In other words, I don't think the ADI setup is *too* fake.
>
> So, where's the beef ? And, at the end of the day, is it possible to do
> this on the Blackfin ?
>
> I've looked at FPGAs (I have some experience there), at XMOS chips to
> parallelise the problem, and at a honking ARM chip (they're actually
> pretty fast these days, and I've worked with the NEON stuff before).
> Quite apart from the attraction of learning a new tool (the DSP), it
> seems there's less "support stuff" needed for the Blackfin, and it's in
> a fairly friendly package (it's a BGA, but only the outer two rows are
> used).
>
> Any help gratefully appreciated :)
>
> Simon
Whatever ADI is quoting, it's very likely a core algorithm, not close to
everything you need. So it may be that getting things decoded into fast
memory (which is way easy to write to) may not be the real bottleneck, or
may be only one of the bottlenecks to getting things decoded and out onto a
network.

Can you take a canned set of 30 jpeg images of the size you want and pump
them out over your network at the speed you want? How much of the processor
is left over while you're doing this for decoding?

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
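The canned-frame test suggested here is easy to prototype on a desktop before any hardware arrives. A minimal sketch (a local socketpair stands in for the real network path; buffer sizes are illustrative):

```python
# Sketch of the canned-frame throughput test: push N pre-encoded "JPEG"
# buffers through a socket and measure the achieved frame rate.
import socket
import threading
import time

def drain(sock, total):
    """Consume `total` bytes from the receiving end."""
    seen = 0
    while seen < total:
        seen += len(sock.recv(65536))

def measure_fps(frame, n_frames):
    a, b = socket.socketpair()
    t = threading.Thread(target=drain, args=(b, len(frame) * n_frames))
    t.start()
    start = time.monotonic()
    for _ in range(n_frames):
        a.sendall(frame)        # one "JPEG" per iteration
    t.join()
    elapsed = time.monotonic() - start
    a.close(); b.close()
    return n_frames / elapsed

fps = measure_fps(b"\xff" * 60_000, 30)   # thirty ~60k frames
```

Pointing the sender at a real socket on the target board, while watching CPU load, answers both of the questions posed above in one experiment.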
Reply by September 9, 2012
So, I've been looking at a few ways of creating a module that does
"video-in" -> "convert to stream of individual JPEG images" -> "send over a
network". One way might be to use a Blackfin DSP.

I'm having some difficulty trying to figure out whether the DSP would be up
to the job though. On the ADI site, they seem to be quoting ~31 cycles/pixel
(http://www.analog.com/en/processors-dsp/blackfin/bf_jpeg_motion-jpeg/products/product.html)
for quality 60, or ~15 million cycles for an SD video frame (or 25ms/frame).
That means at 30fps I'd need 450MHz, which is comfortably within the 600MHz
budget of the part (and in fact there are two DSPs on board, so hey, this
ought to be simple, right ? Right ??)
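The budget above checks out arithmetically; spelling it out (assuming 720x576 as the SD frame size, which the post doesn't state):

```python
# Re-deriving the cycle budget from ADI's ~31 cycles/pixel figure.
width, height = 720, 576          # assumed SD frame size
cycles_per_pixel = 31
cycles_per_frame = width * height * cycles_per_pixel  # ~12.9 Mcycles
fps = 30
mhz_needed = cycles_per_frame * fps / 1e6             # ~386 MHz
```

Using ADI's rounder "~15 million cycles/frame" figure instead gives the 450MHz quoted in the post; either way the core algorithm fits in a 600MHz part, before memory and I/O overhead.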

Then, on the other hand I see people who've been doing this a lot longer
than I (because this is my first foray into DSPs) getting a 752x512 (not a
million miles away from SD video) frame out in 92ms. What ? That's only ~10
fps, and basically means it's worthless to me.

The ADI site does say the memory setup is optimal, but with the sizes of
images they're using, they must be storing them in SDRAM, so it must be
streaming them from SDRAM to L1/L2 and out again after operating on them. In
other words, I don't think the ADI setup is *too* fake.

So, where's the beef ? And, at the end of the day, is it possible to do this
on the Blackfin ?

I've looked at FPGAs (I have some experience there), at XMOS chips to
parallelise the problem, and at a honking ARM chip (they're actually pretty
fast these days, and I've worked with the NEON stuff before). Quite apart
from the attraction of learning a new tool (the DSP), it seems there's less
"support stuff" needed for the Blackfin, and it's in a fairly friendly
package (it's a BGA, but only the outer two rows are used).

Any help gratefully appreciated :)

Simon