Fast histogram algorithm
Started by ●November 2, 2004

Hi,

Could somebody please point me to papers, tutorials, websites, etc. about fast algorithms for histogram calculation?

I need to classify 120000 values, distributed into 6400 classes (or bins). The particular thing here is that the classes are not uniformly spaced. The step is not calculated as (MaxValue-MinValue)/NumberOfClasses; I am given the edges of every bin (or class) instead.

So, the way I am doing it now is by scanning every sample over all of the classes (bins) to check which of them the sample belongs to. This kind of algorithm is rather simplistic, but it works. The BIG drawback is that it takes too much time to execute: around 16 minutes on an ADSP-21065L, coded in C (without optimization), an execution time which I cannot tolerate.

Before going for optimization flags on the compiler, or doing assembly coding, I'd like to have your suggestions and advice on a more efficient algorithm.

Regards,

JaaC
Reply by ●November 2, 2004
I'm not sure if this is enough info to help you out. As I read it, you have a bunch of data that needs to be classified according to whether or not one particular datum (sample) belongs to a particular class (bin), where each class (bin) can span several sample values, and each class may have a different span of samples that it encompasses. Is that correct?

Can't you just create a mapping function that directly produces the class number from the actual sample value? Rather than checking the 6400 classes' boundaries with an if statement inside a loop, try to set up a mapping function that creates the bin number by taking the sample value through a mapping function, quantized to an integer bin number...

I don't know if this applies (or is possible) in your case, but from what you say it should be feasible...

--
Stephan M. Bernsee
http://www.dspdimension.com
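Stephan's mapping idea can be sketched in C roughly as follows: precompute a small uniform-grid table that maps a quantized sample value to a starting bin, then walk at most a few edges forward. All names, sizes, and the edge layout here are illustrative assumptions for the sketch, not code from the thread:

```c
#include <assert.h>

/* Illustrative sketch of a mapping-table bin lookup for non-uniform bins.
 * NUM_BINS would be 6400 in the original problem; MAP_SIZE trades table
 * memory against how far the final linear walk may have to go. */

#define NUM_BINS 8
#define MAP_SIZE 64

/* edges[i]..edges[i+1] bounds bin i; edges has NUM_BINS+1 ascending entries.
 * map[k] holds the first bin that could contain values in uniform cell k. */
static void build_map(const float *edges, int *map, float lo, float hi)
{
    float step = (hi - lo) / MAP_SIZE;
    int bin = 0;
    for (int k = 0; k < MAP_SIZE; k++) {
        float cell_lo = lo + k * step;
        /* advance to the first bin whose upper edge exceeds the cell start */
        while (bin < NUM_BINS - 1 && edges[bin + 1] <= cell_lo)
            bin++;
        map[k] = bin;
    }
}

static int map_to_bin(float x, const float *edges, const int *map,
                      float lo, float hi)
{
    /* quantize the sample onto the uniform grid, clamped to the table */
    int k = (int)((x - lo) / (hi - lo) * MAP_SIZE);
    if (k < 0) k = 0;
    if (k >= MAP_SIZE) k = MAP_SIZE - 1;
    int bin = map[k];
    /* short linear walk from the table's starting guess */
    while (bin < NUM_BINS - 1 && x >= edges[bin + 1])
        bin++;
    return bin;
}
```

This keeps per-sample work near constant as long as no uniform cell spans many bins, which is the concern Randy raises below about table size for float data.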
Reply by ●November 2, 2004
jaime.aranguren@ieee.org (Jaime Andres Aranguren Cardona) writes:

> Could somebody please point me to papers, tutorials, websites, etc.
> about fast algorithms for histogram calculation?
> <snip>

Hi Jaime,

An obvious thing to try is table-driven mapping, but this isn't practical if you have floating-point data, and even with integers the table is big. I'm thinking with a 21065 you're talking float.

What about sorting the data first using a fast sort algorithm?

--
%  Randy Yates          % "Maybe one day I'll feel her cold embrace,
%% Fuquay-Varina, NC    %  and kiss her interface,
%%% 919-577-9882        %  til then, I'll leave her alone."
%%%% <yates@ieee.org>   % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
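A middle ground between the full mapping table and a full sort, which nobody in the thread spells out explicitly, is to binary-search the (already sorted) edge array for each sample: about 13 comparisons per sample for 6400 bins instead of up to 6400. This C sketch and its names are purely illustrative:

```c
#include <assert.h>

/* Illustrative binary search over sorted bin edges.
 * edges has nbins+1 ascending entries; returns i such that
 * edges[i] <= x < edges[i+1], clamped to the outermost bins. */
static int edge_search(float x, const float *edges, int nbins)
{
    int lo = 0, hi = nbins;          /* search over edge indices */
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (x < edges[mid])
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}
```

Unlike the mapping table, this needs no extra memory and no assumptions about the block's min/max range, which addresses Jaime's later concern about the range changing between blocks.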
Reply by ●November 2, 2004
"Jaime Andres Aranguren Cardona" <jaime.aranguren@ieee.org> wrote in message news:14a86f87.0411020421.3bc955b@posting.google.com...

> Could somebody please point me to papers, tutorials, websites, etc.
> about fast algorithms for histogram calculation?
> <snip>

Hi Jaime,

You've probably thought of this already, but would a fast sort help? I know there are a number of resources for performing a fast sort; then you can just identify the difference in the index where you cross your next bin boundary.

Best of Luck - Mike
Reply by ●November 2, 2004
Randy Yates wrote:

> An obvious thing to try is table-driven mapping, but this isn't practical
> if you have floating point data, and even with integer the table is big.
> I'm thinking with a 21065 you're talking float.
>
> What about sorting the data first using a fast sort algorithm?

And if you can't sort all 120000 at once, bring it in in chunks small enough to sort.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
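Tim's chunking suggestion works because bin counts simply add up across chunks: each chunk can be classified separately into one shared counts array. A minimal sketch, where `classify()` stands in for whatever per-sample lookup is chosen (the names and the linear-scan stand-in are assumptions for illustration):

```c
#include <assert.h>

/* Illustrative chunked accumulation: counts persist across chunks, so only
 * one chunk of samples needs to be resident (and sortable) at a time. */

#define NBINS 4   /* 6400 in the original problem */

/* edges has NBINS+1 ascending entries; linear-scan stand-in classifier */
static int classify(float x, const float *edges)
{
    int bin = 0;
    while (bin < NBINS - 1 && x >= edges[bin + 1])
        bin++;
    return bin;
}

static void accumulate_chunk(const float *chunk, int n,
                             const float *edges, int *counts)
{
    for (int i = 0; i < n; i++)
        counts[classify(chunk[i], edges)]++;  /* add into the shared totals */
}
```

Calling `accumulate_chunk` once per block of samples yields the same histogram as processing all 120000 values in one pass.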
Reply by ●November 3, 2004
Hi Stephan,

> I'm not sure if this is enough info to help you out. As I read it, you
> have a bunch of data that needs to be classified ... Each class may have
> a different span of samples that it encompasses. Is that correct?

Absolutely!

> Can't you just create a mapping function that directly creates the
> class number from the actual sample value?
> <snip>
> I don't know if this applies (or is possible) in your case, but from
> what you say it should be feasible...

Could be... will think about it.

Regards,

JaaC
Reply by ●November 3, 2004
> Hi Jaime,
>
> An obvious thing to try is table-driven mapping, but this isn't practical
> if you have floating point data, and even with integer the table is big.
> I'm thinking with a 21065 you're talking float.

Hi Randy,

Yes, I'm using float with the '065L. One concern with table mapping is that the range, or in other words the maximum and minimum values of the 120000 samples, is not the same for every block of 120000 samples... So, a different table for every block of data? Any further idea is very welcome.

> What about sorting the data first using a fast sort algorithm?

Yes, I've been thinking about it. I'm also thinking about the best way to take advantage of the sorted version of the data buffer. The first approach to come to my mind, maybe the simplest one, would be to use a "while" loop instead of a "for" loop... This way, the classification of the lower-valued samples would certainly take a short time, and as the values become higher it would take somewhat longer, but definitely less than it takes with the current approach, in which the scanning is performed even if the data sample has already found the bin (or class) it belongs to.

Any additional ideas, please? Thanks a lot in advance,

JaaC
Reply by ●November 3, 2004
> > An obvious thing to try is table-driven mapping, but this isn't practical
> > if you have floating point data, and even with integer the table is big.
> <snip>
>
> And if you can't sort all 120000 at once, bring it in in chunks small
> enough to sort.

Hi Tim,

I will take your suggestion into account.

Thanks,

JaaC
Reply by ●November 3, 2004
> Hi Jaime,
> You've probably thought of this already but would a fast sort help? I
> know there are a number of resources for performing fast sort, then you can
> just identify the difference in the index where you cross your next bin
> boundary?
> Best of Luck - Mike

Hi Mike,

Let me check if I fully understood your suggestion:

1. Sort the input data.
2. Start scanning the sorted data to find up to which index (in the sorted data buffer) the values are less than the first upper limit of the classes; that tells me how many samples belong to the first class (or bin).
3. Repeat until all data is scanned. But I wouldn't need to start over from the beginning, just continue from the next "unclassified" (but sorted) sample to find up to which index the samples belong to the class (bin).

Is this what you meant? Any additional ideas?

Regards and TIA,

JaaC
Reply by ●November 3, 2004
"Jaime Andres Aranguren Cardona" <jaime.aranguren@ieee.org> wrote in message news:14a86f87.0411030342.4e81ad30@posting.google.com...

> > Hi Jaime,
> > You've probably thought of this already but would a fast sort help?
> <snip>
> Is this what you meant? Any additional idea?

Yes, that's as far as I went. Using a binary tree sort (or quicksort), anything you can get your hands on easily must be faster than the 16 minutes you were quoting, and I got the impression you were looking for a quick, low-effort way to improve your histogram generation times rather than something that was as fast as you could get. You can pull free source code in C from several web sources and should not need to worry about compile optimisation to get comfortably into the 'a few seconds' range, if you've got enough programme space and memory.

Best of Luck - Mike
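The scheme Mike and Jaime converge on (sort, then walk samples and edges together so neither is visited twice) can be sketched in C like this, using the standard library's `qsort`; the function names are illustrative, not code from either poster:

```c
#include <stdlib.h>
#include <assert.h>

/* Illustrative sort-then-scan histogram: O(n log n) for the sort plus one
 * merged pass over samples and edges, instead of O(n * nbins). */

static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* edges: nbins+1 ascending entries; counts: nbins entries, zeroed here.
 * Out-of-range samples fall into the first or last bin. */
static void histogram_sorted(float *data, int n,
                             const float *edges, int nbins, int *counts)
{
    qsort(data, n, sizeof(float), cmp_float);
    for (int i = 0; i < nbins; i++)
        counts[i] = 0;
    int bin = 0;
    for (int i = 0; i < n; i++) {
        /* the bin pointer never moves backward over sorted data */
        while (bin < nbins - 1 && data[i] >= edges[bin + 1])
            bin++;
        counts[bin]++;
    }
}
```

Across the whole pass the inner `while` advances the bin pointer at most nbins times in total, which is the point of Jaime's "continue from the next unclassified sample" step 3 above.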






