A Simple Spectral Clustering Example

xiaoxiao 2024-07-12

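Clustering is a common unsupervised learning method whose goal is to extract class labels from raw, unlabeled data. Its simplest representative is K-means clustering. A simple example follows: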

n=300; c=3; t=randperm(n);
% three Gaussian blobs centered at (-2,0), (0,4) and (2,0)
x=[randn(1,n/3)-2 randn(1,n/3) randn(1,n/3)+2;
   randn(1,n/3) randn(1,n/3)+4 randn(1,n/3)]';
m=x(t(1:c),:);                 % initial centers: c random samples
x2=sum(x.^2,2); s0(1:c,1)=inf;
for o=1:1000
  m2=sum(m.^2,2);
  % squared distances (c-by-n); y is the index of the nearest center
  [d,y]=min(repmat(m2,1,n)+repmat(x2',c,1)-2*m*x');
  for j=1:c
    m(j,:)=mean(x(y==j,:),1);  % update the center
    s(j,1)=mean(d(y==j));      % mean within-cluster distance
  end
  if norm(s-s0)<0.001, break, end
  s0=s;
end
figure(1); clf; hold on;
plot(x(y==1,1),x(y==1,2),'bo');
plot(x(y==2,1),x(y==2,2),'rx');
plot(x(y==3,1),x(y==3,2),'gv');
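The `min(...)` line inside the loop relies on expanding the squared Euclidean distance, so that all distances between samples and centers come out of a single matrix product. As a sketch, writing x_i for the i-th sample, m_j for the j-th center and y_i for the assigned label (this notation is introduced here for illustration), one iteration computes:

```latex
\[
\|x_i - m_j\|^2 = \|x_i\|^2 + \|m_j\|^2 - 2\, m_j^\top x_i ,
\qquad
y_i = \arg\min_{j \in \{1,\dots,c\}} \|x_i - m_j\|^2 ,
\qquad
m_j \leftarrow \frac{1}{|\{i : y_i = j\}|} \sum_{i \,:\, y_i = j} x_i .
\]
```

In the code, `x2` and `m2` hold the terms \|x_i\|^2 and \|m_j\|^2, and `2*m*x'` is the cross term.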

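Standard K-means can only handle linearly separable clustering problems, because it uses the Euclidean distance as its assignment criterion. For nonlinear problems we can use a kernel mapping and replace the Euclidean distance with one computed from inner products between samples. However, the final result of this approach depends strongly on the choice of initial values, and the dependence becomes especially pronounced when the feature space determined by the kernel function is high-dimensional. Applying dimensionality reduction addresses this problem, and the resulting method is called spectral clustering.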
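To make the kernel approach mentioned above concrete: assuming a feature map φ with kernel K(x, x') = ⟨φ(x), φ(x')⟩, and writing C_j for the set of samples currently assigned to cluster j (these symbols are assumptions introduced here, not taken from the original), the squared distance from a sample to a cluster mean in feature space can be evaluated from kernel values alone:

```latex
\[
\Bigl\| \phi(x) - \frac{1}{|C_j|} \sum_{i \in C_j} \phi(x_i) \Bigr\|^2
= K(x, x)
- \frac{2}{|C_j|} \sum_{i \in C_j} K(x, x_i)
+ \frac{1}{|C_j|^2} \sum_{i \in C_j} \sum_{i' \in C_j} K(x_i, x_{i'}) .
\]
```

The feature map φ never has to be computed explicitly. Only the n-by-n matrix of kernel values is needed.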

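The basic procedure of spectral clustering is to reduce the dimensionality of the original data with locality preserving projection (the code below uses its graph-Laplacian form on a k-nearest-neighbor similarity graph) and then apply K-means directly to the embedded points. A simple example follows: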

n=500; c=2; k=10; t=randperm(n);
% two interleaved spirals with uniform noise, then centered
a=linspace(0,2*pi,n/2)';
x=[a.*cos(a), a.*sin(a); (a+pi).*cos(a), (a+pi).*sin(a)];
x=x+rand(n,2); x=x-repmat(mean(x),[n,1]);
x2=sum(x.^2,2);
d=repmat(x2,1,n)+repmat(x2',n,1)-2*x*(x');  % pairwise squared distances
[p,i]=sort(d);
W=sparse(d<=ones(n,1)*p(k+1,:));  % k-nearest-neighbor graph
W=double((W+W')~=0);              % symmetrize
D=diag(sum(W,2)); L=D-W;          % degree matrix and graph Laplacian
[z,v]=eigs(L,D,c-1,'sm');         % embedding from L*z = lambda*D*z
% K-means on the embedded points z
m=z(t(1:c),:); z2=sum(z.^2,2); s0(1:c,1)=inf;
for o=1:1000
  m2=sum(m.^2,2);
  [u,y]=min(repmat(m2,1,n)+repmat(z2',c,1)-2*m*(z'));
  for j=1:c
    m(j,:)=mean(z(y==j,:),1);
    s(j,1)=mean(u(y==j));         % mean distance to the center in the embedding
  end
  if norm(s-s0)<0.001, break, end
  s0=s;
end
figure(1); clf; hold on; axis([-10 10 -10 10])
plot(x(y==1,1),x(y==1,2),'bo');
plot(x(y==2,1),x(y==2,2),'rx');
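The dimensionality-reduction step in the code above can be summarized as follows. This is a sketch of what the `W`, `D`, `L` and `eigs` lines compute, where kNN(x_j) denotes the k nearest neighbors of x_j (a notation introduced here for illustration):

```latex
\[
W_{ij} =
\begin{cases}
1 & \text{if } x_i \in \mathrm{kNN}(x_j) \text{ or } x_j \in \mathrm{kNN}(x_i), \\
0 & \text{otherwise,}
\end{cases}
\qquad
D_{ii} = \sum_{j=1}^{n} W_{ij},
\qquad
L = D - W,
\]
\[
L z = \lambda D z ,
\qquad
z^\top L z = \frac{1}{2} \sum_{i,j=1}^{n} W_{ij}\, (z_i - z_j)^2 .
\]
```

The quadratic form shows why eigenvectors with small eigenvalues are useful here: they give nearly equal coordinates to points connected in the neighbor graph, so each spiral collapses toward a single value that K-means can then separate. The code also counts every point as its own neighbor, which adds 1 to each degree in D but leaves L unchanged.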
