kNN Code Implementation

xiaoxiao, 2022-07-13

Contents

Principle
Code
Related test code

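Principle

How kNN works: there is a sample data set (the training set), and every sample in it has a label. Given a new, unlabeled data point, the algorithm compares each of its features with the corresponding features of the samples by computing the Euclidean distance. It then takes the k samples most similar to the new point (the k nearest ones) and picks the label by majority vote among them.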
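The two steps described above can be written out as formulas. This is only a sketch in notation that the original post does not use: the symbols $x$ (new point), $x_i$ and $y_i$ (the i-th sample and its label), $n$ (number of features), and $N_k(x)$ (the set of indices of the k nearest samples) are assumptions introduced here for illustration.

```latex
% Euclidean distance between the new point x and the i-th sample x_i
d(x, x_i) = \sqrt{\sum_{j=1}^{n} \left( x_j - x_{i,j} \right)^2}

% Majority vote among the k nearest samples N_k(x)
\hat{y} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbf{1}\left[ y_i = c \right]
```

For example, for $x = (1, 1)$ and the sample $(0, 0.1)$ the distance is $\sqrt{1^2 + 0.9^2} = \sqrt{1.81} \approx 1.345$.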

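Code

import numpy as np
import operator


def createDataSet():
    # Build the sample matrix: one row per sample, one column per feature
    matrix = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
    # Class label of each row
    classVector = ['A', 'A', 'B', 'B']
    return matrix, classVector


matrix, classVector = createDataSet()


# The algorithm, wrapped in a function.
# inX is the point to classify; k is the number of nearest samples that vote.
def classify(inX, matrix, classVector, k):
    # matrix.shape[0] is the number of rows, matrix.shape[1] the number of columns
    dataSetSize = matrix.shape[0]
    diffmat = np.tile(inX, (dataSetSize, 1)) - matrix  # difference
    sqDiffMat = diffmat ** 2  # square
    sqDistances = sqDiffMat.sum(axis=1)  # sum of squares per row
    distance = sqDistances ** 0.5  # distance from each sample to the unknown point
    # argsort orders the elements of distance from smallest to largest and
    # returns their indices. Note that every distance has been computed here,
    # and the indices are ordered from nearest to farthest.
    sortDistIndicies = distance.argsort()
    print(sortDistIndicies)
    # Dictionary of votes such as {'A': 2, 'B': 1}:
    # the keys A, B are class labels and the values 2, 1 are vote counts
    classCount = {}
    for i in range(k):
        voteLabel = classVector[sortDistIndicies[i]]
        # dict.get(key, 0) returns 0 when the key does not exist yet, so the
        # counts build up to {'A': 2, 'B': 1}. This is a very handy idiom.
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # itemgetter(1) sorts by the counts 2, 1 in {'A': 2, 'B': 1};
    # itemgetter(0) would sort by the labels A, B instead
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    # sortedClassCount is a list of (label, count) tuples
    print(sortedClassCount[0][0])
    return sortedClassCount[0][0]


classify([1, 1], matrix, classVector, 3)

# Summary
# All distances are computed and sorted from smallest to largest;
# k only selects the k shortest ones.
# The votes are counted in a dictionary and sorted from largest to smallest,
# so the first entry is the label with the most votes.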
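As a point of comparison, the same steps can be sketched more compactly. This is not the post's implementation: it swaps `np.tile` for NumPy broadcasting and the hand-written dictionary count for `collections.Counter`, and the function name `knn_predict` and its parameter names are hypothetical, chosen only for this illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(query, samples, labels, k):
    """Return the majority label among the k samples nearest to query."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Broadcasting subtracts query from every row, so np.tile is not needed
    distances = np.sqrt(((samples - query) ** 2).sum(axis=1))
    # Indices of the k nearest samples, nearest first
    nearest = np.argsort(distances)[:k]
    # Count the labels of those samples and take the most common one
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


samples = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']
print(knn_predict([1, 1], samples, labels, 3))  # A
```

Unlike a function that only prints, this sketch returns the label, which makes it easier to test or reuse.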

Related test code

import numpy as np
import operator

matrix = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
# matrix.shape[0] is the number of rows, matrix.shape[1] the number of columns
dataSetSize = matrix.shape[0]
classVector = ['A', 'A', 'B', 'B']
print(matrix)
print(dataSetSize)

# How np.tile works and what it produces
diffMat = np.tile([1, 1], (dataSetSize, 1))
diffMat1 = np.tile([1, 1], (dataSetSize, 2))
print(diffMat)
# [[1 1]
#  [1 1]
#  [1 1]
#  [1 1]]
print(diffMat1)
# [[1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]]

# Difference
diffMat = np.tile([1, 1], (dataSetSize, 1)) - matrix
print(diffMat)
# [[ 0.  -0.1]
#  [ 0.   0. ]
#  [ 1.   1. ]
#  [ 1.   0.9]]

# Square
sqDiffMat = diffMat ** 2
print(sqDiffMat)
# [[0.   0.01]
#  [0.   0.  ]
#  [1.   1.  ]
#  [1.   0.81]]

# Sum
sqDistances = sqDiffMat.sum(axis=1)  # sum across each row
print(sqDistances)  # [0.01 0.   2.   1.81]
sqDistances1 = sqDiffMat.sum(axis=0)  # sum down each column
print(sqDistances1)  # [2.   1.82]
distance = sqDistances ** 0.5
print(distance)  # [0.1        0.         1.41421356 1.3453624 ]

# Sort by index
sortDistIndicies = distance.argsort()
print(sortDistIndicies)  # [1 0 3 2]
# distance: [0.1  0.  1.41421356  1.3453624]
# label:      A   A       B           B
# index:      0   1       2           3    -> sorted by distance, the indices are [1 0 3 2]

# Another example:
# 1. Define an array:
#
# import numpy as np
# x = np.array([1, 4, 3, -1, 6, 9])
#
# 2. Now look at what argsort() does:
#
# x.argsort()
# The output is y = array([3, 0, 2, 1, 4, 5]).
#
# argsort() orders the elements of x from smallest to largest and returns their
# indices as y. For example, x[3] = -1 is the smallest, so y[0] = 3;
# x[5] = 9 is the largest, so y[5] = 5.

voteLabel = classVector[sortDistIndicies[0]]
voteLabel1 = classVector[sortDistIndicies[1]]
voteLabel2 = classVector[sortDistIndicies[2]]
print(voteLabel)
print(voteLabel1)
print(voteLabel2)

classCount = {}
classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

# The dict get method returns the value for the given key.
# Example (Python 2 print syntax):
# dict = {'Name': 'Zara', 'Age': 27}
#
# print "Value : %s" % dict.get('Age')
# print "Value : %s" % dict.get('Sex', "Never")
# Output of the example above:
#
# Value : 27
# Value : Never

print(classCount[voteLabel])
print(classCount.items())

# Example of items(): https://www.runoob.com/python/att-dictionary-items.html
# (Python 2 print syntax)
# #!/usr/bin/python
# # coding=utf-8
#
# dict = {'Google': 'www.google.com', 'Runoob': 'www.runoob.com', 'taobao': 'www.taobao.com'}
#
# print "Dictionary values : %s" % dict.items()
#
# # Iterate over the dictionary items
# for key, values in dict.items():
#     print key, values
# Output of the example above:
#
# Dictionary values : [('Google', 'www.google.com'), ('taobao', 'www.taobao.com'), ('Runoob', 'www.runoob.com')]
# Google www.google.com
# taobao www.taobao.com
# Runoob www.runoob.com

# Testing the idiom classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
dic = {}
dic1 = {}
dic['A'] = dic.get("A", 0) + 1
print(dic)  # {'A': 1}
dic['A'] = dic.get("A", 0) + 1
print(dic)  # {'A': 2}
dic['A'] = dic.get("A", 0) + 1
print(dic)  # {'A': 3}
dic1['A'] = dic1.get("A", 0)  # without the +1, the default 0 is stored
print(dic1)  # {'A': 0}
print(dic1['A'])  # 0
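Two side notes on the steps tested above, as a hedged sketch rather than part of the original test code. First, subtracting with NumPy broadcasting gives the same differences as the `np.tile` version. Second, since only the k nearest samples are needed, `np.argpartition` can select them without fully sorting every distance. The variable names `query`, `tiled_diff`, `broadcast_diff`, `k_nearest_sorted`, and `k_nearest_partitioned` are assumptions made for this illustration.

```python
import numpy as np

matrix = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
query = np.array([1.0, 1.0])
k = 3

# np.tile repeats query once per row; broadcasting does the same implicitly
tiled_diff = np.tile(query, (matrix.shape[0], 1)) - matrix
broadcast_diff = query - matrix
same_diff = np.array_equal(tiled_diff, broadcast_diff)
print(same_diff)  # True

distance = np.sqrt((broadcast_diff ** 2).sum(axis=1))

# Full sort, then keep the first k indices (nearest first)
k_nearest_sorted = distance.argsort()[:k]
# Partial selection: the first k entries are the k smallest, in no particular order
k_nearest_partitioned = np.argpartition(distance, k - 1)[:k]

print(sorted(k_nearest_sorted.tolist()))       # [0, 1, 3]
print(sorted(k_nearest_partitioned.tolist()))  # [0, 1, 3]
```

For four samples the difference is negligible; the partial selection only matters when the sample set is large.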
