7. (4) Logistic Regression: Binary Classification for Predicting Breast Cancer Data

    xiaoxiao2025-07-15  7


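    Breast cancer dataset download: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

    The downloaded file has a .data extension; rename it to .csv to view its contents. The last column is the target value: 2 means benign (no cancer) and 4 means malignant (cancer).

    The dataset from the official site has no column names, so we have to add them ourselves. The code shows how.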
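As a minimal sketch of the two points above (adding column names and reading the 2/4 labels), the snippet below parses a few rows with pandas. The three rows are illustrative values written in the file's format, not rows quoted from the real file. `na_values='?'` is an alternative to the `replace` call used in the main code, and the `malignant` column is a hypothetical helper name.

```python
import io

import pandas as pd

# Column names, since the raw file has no header row
column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
          'Uniformity of Cell Shape', 'Marginal Adhesion',
          'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
          'Normal Nucleoli', 'Mitoses', 'Class']

# Illustrative rows in the file's format (assumed values); '?' marks a missing entry
raw = io.StringIO(
    "1000001,5,1,1,1,2,1,3,1,1,2\n"
    "1000002,8,10,10,8,7,10,9,7,1,4\n"
    "1000003,6,6,6,9,6,?,7,8,1,2\n"
)

# names= supplies the header; na_values='?' turns '?' into NaN while parsing
data = pd.read_csv(raw, names=column, na_values='?')
data = data.dropna()

# Hypothetical helper column: True for class 4 (malignant), False for class 2 (benign)
data = data.assign(malignant=data['Class'] == 4)

print(data[['Sample code number', 'Class', 'malignant']])
```

The row containing `?` is dropped, which mirrors the `dropna()` step in the main code.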

    The full code is as follows:

    # -*- coding: utf-8 -*-
    """
    Created on Sun May 26 21:34:29 2019

    @author: sun
    """
    import joblib  # sklearn.externals.joblib was removed in newer scikit-learn releases
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler


    def logistic():
        """
        Binary classification with logistic regression to predict cancer
        (based on cell attribute features)
        """
        # Build the column label names, 11 in total
        column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
                  'Uniformity of Cell Shape', 'Marginal Adhesion',
                  'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
                  'Normal Nucleoli', 'Mitoses', 'Class']

        # Read the data
        data = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\逻辑回归二分类\乳腺癌分类数据.csv", engine='python', names=column)
        # print(data)

        # Handle missing values: '?' marks a missing entry
        data = data.replace(to_replace='?', value=np.nan)
        data = data.dropna()

        # Split the data: columns 2 to 10 (column[1:10]) are the 9 features,
        # column 11 (column[10]) is the target; column 1 is only a sample ID
        x_train, x_test, y_train, y_test = train_test_split(
            data[column[1:10]], data[column[10]], test_size=0.25)

        # Standardize: fit on the training set only, then apply to the test set
        std = StandardScaler()
        x_train = std.fit_transform(x_train)
        x_test = std.transform(x_test)

        # Logistic regression prediction
        lg = LogisticRegression(C=1.0)
        lg.fit(x_train, y_train)
        # print(lg.coef_)
        y_predict = lg.predict(x_test)
        # print("Accuracy:", lg.score(x_test, y_test))
        # print("Recall:", classification_report(y_test, y_predict, labels=[2, 4], target_names=["benign", "malignant"]))

        # Save the trained model
        joblib.dump(lg, "./lg.pkl")

        # Load the model and predict on our own data
        model = joblib.load("./lg.pkl")

        # Read the data to predict; it must be in the same format as the training file.
        # Same procedure as in parts 7(1) and 7(2) of this series
        data = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\逻辑回归二分类\输入预测乳腺癌数据.csv", engine='python', names=column)
        xx_test = data[column[1:10]]  # the 9 feature columns (columns 2 to 10)
        xx_test = std.transform(xx_test)  # reuse the scaler fitted on the training set
        yy_predict = model.predict(xx_test)
        print("Predictions from the saved model:", yy_predict)


    if __name__ == "__main__":
        logistic()
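The script above reuses the in-memory scaler `std` when it scores new data, so the saved `lg.pkl` alone is not enough to predict in a separate session. One possible way around this, sketched here and not part of the original code, is to wrap the scaler and the classifier in a scikit-learn pipeline and save that single object. The synthetic data from `make_classification` and the file name `lg_pipeline.pkl` are assumptions for illustration only.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 9 cell features (assumed data, not the real dataset)
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The scaler is fitted inside the pipeline, so it is saved together with the model
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
pipe.fit(x_train, y_train)

# Hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "lg_pipeline.pkl")
joblib.dump(pipe, path)

# After loading, raw (unscaled) features can be passed in directly
model = joblib.load(path)
y_predict = model.predict(x_test)
print("Predictions from the saved pipeline:", y_predict[:6])
```

With this layout, a later script only needs `joblib.load` and `predict`; the standardization learned from the training set is applied automatically.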

    Output:

    Tested on six samples, the classification results are reasonably good.
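To put a number on how well six samples were classified, predictions can be compared against known labels. The sketch below assumes six hypothetical true labels and predictions (they are not the results from this post) and reports accuracy plus recall for the malignant class.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels for six samples: 2 = benign, 4 = malignant (not real results)
y_true = [2, 2, 4, 4, 2, 4]
y_pred = [2, 2, 4, 2, 2, 4]

accuracy = accuracy_score(y_true, y_pred)
# pos_label=4 asks for recall on the malignant class
recall = recall_score(y_true, y_pred, pos_label=4)

print("Accuracy:", accuracy)
print("Malignant recall:", recall)
```

Recall on class 4 is worth checking alongside accuracy, since it counts how many of the actual cancer cases the model caught.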

    All done.
