DSPRelated.com
Forums

Vector quantisation for clustering

Started by smuglr October 25, 2005
I am hoping to identify clusters within a set of vectors using vector
quantisation. I have only just started looking into vector quantisation so
I'd like to ask a couple of questions. 

I'm implementing the LBG algorithm in Scilab, and it has just occurred to
me - what if I want to identify five clusters - or three? The LGB seems to
work only for 2^x number of clusters am I right? Perhaps this is blatantly
obvious, but I failed to notice until now.

Can anyone point me towards another algorithm which will perform
clustering for an arbitrary number of sets? My set of data is likely to be
in two to four dimensions and I'd like to split the data into anywhere from
2 to 6 subsets.

Cheers,
Dougie



		
This message was sent using the Comp.DSP web interface on
www.DSPRelated.com
smuglr wrote:
> I am hoping to identify clusters within a set of vectors using vector > quantisation. I have only just started looking into vector quantisation so > I'd like to ask a couple of questions. > > I'm implementing the LBG algorithm in Scilab, and it has just occurred to > me - what if I want to identify five clusters - or three? The LGB seems to > work only for 2^x number of clusters am I right? Perhaps this is blatantly > obvious, but I failed to notice until now. > > Can anyone point me towards another algorithm which will perform > clustering for an arbitrary number of sets? My set of data is likely to be > in two to four dimensions and I'd like to split the data into anywhere from > 2 to 6 subsets.
Why would you say that LBG only works for 2^N?? Where could you possibly have read that? How does: Assume N initial centroids Loop: For each centroid, determine the set of points that are closer to this centroid than to any other centroid For each set of points, determine the new centroid Until ( .... whatever suitable stop condition ....) Restrict you to work with a power of 2? That for the purpose of data compression, it *only makes sense* to use a number that is a power of 2, that's a different thing. The algorithm works for *any* number that is less than or equal to the number of training points (although in practice, the number should be much less than the number of training points). HTH, Carlos --
>Why would you say that LBG only works for 2^N?? Where could >you possibly have read that? > >How does: > >Assume N initial centroids >Loop: > For each centroid, determine the set of points that > are closer to this centroid than to any other centroid > > For each set of points, determine the new centroid >Until ( .... whatever suitable stop condition ....) > >Restrict you to work with a power of 2? > >Carlos >--
Ahhh -*moment of clarity*- the implementation I was following at http://www.data-compression.com/vq.shtml uses the splitting method to iteratively build an initial codebook which is then optimised by LBG, I thought the whole algorithm was the LBG method. I see what I need to look into now. Many thanks for pointing this out, Dougie