VII. (4) Logistic regression: binary classification to predict breast cancer
Breast cancer dataset download link: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data
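The downloaded file has a .data extension. It is plain comma-separated text, so you can rename it to .csv and open it to inspect the contents. The last column is the target value (Class): 2 means benign and 4 means malignant. Missing values in the Bare Nuclei column are marked with '?'.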
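Because the target takes only two values, it may help to see what the classifier does with the labels 2 and 4. The sketch below uses synthetic data; the arrays X and y are made up for illustration and are not taken from the dataset. scikit-learn sorts the labels, so 4 becomes the positive class. It models P(Class = 4 | x) = 1 / (1 + e^(-(w·x + b))) and predicts 4 when that probability exceeds 0.5, which is the same as w·x + b > 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical): labels coded 2 / 4 like the Class column
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 4, 2)

clf = LogisticRegression(C=1.0).fit(X, y)

# classes_ is sorted, so classes_[1] == 4 is treated as the positive class
print(clf.classes_)  # [2 4]

# Linear score z = w . x + b, identical to decision_function
z = X @ clf.coef_.ravel() + clf.intercept_[0]
assert np.allclose(z, clf.decision_function(X))

# Sigmoid of the score gives P(Class = 4 | x), the second column of predict_proba
p_malignant = 1.0 / (1.0 + np.exp(-z))
assert np.allclose(p_malignant, clf.predict_proba(X)[:, 1])

# predict() returns 4 where z > 0 (probability > 0.5), otherwise 2
manual = np.where(z > 0, clf.classes_[1], clf.classes_[0])
assert np.array_equal(manual, clf.predict(X))
```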
The dataset from the official site has no column names, so we have to add them ourselves. The code includes this step.
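Here is a small sketch of the column-naming step, assuming the file is read with pandas. `names=` supplies the missing header. `na_values='?'` turns the '?' markers into NaN during parsing, which is an alternative to the `replace` call used in the code. The three rows are invented in the same layout and are not real records from the dataset.

```python
import io

import pandas as pd

column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
          'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
          'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']

# Made-up rows in the same header-less, comma-separated layout as the UCI file
raw = io.StringIO(
    "1000001,5,1,1,1,2,1,3,1,1,2\n"
    "1000002,8,10,10,8,7,10,9,7,1,4\n"
    "1000003,1,1,1,1,2,?,3,1,1,2\n"
)

# names= provides the header; na_values='?' parses '?' as NaN
data = pd.read_csv(raw, names=column, na_values='?')
print(data['Bare Nuclei'].dtype)  # float64, because the column contains NaN

# Drop rows with missing values, as the article's code does
data = data.dropna()
print(data.shape)  # (2, 11)
```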
The code is as follows:
"""
Created on Sun May 26 21:34:29 2019
@author: sun
"""
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package
import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def logistic():
    """
    Binary logistic regression for cancer prediction (based on cell attribute features)
    """
    # The UCI file has no header row, so we supply the column names ourselves
    column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
              'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
              'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
    data = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\逻辑回归二分类\乳腺癌分类数据.csv",
                       engine='python', names=column)

    # Missing values are marked with '?': convert them to NaN, then drop those rows
    data = data.replace(to_replace='?', value=np.nan)
    data = data.dropna()

    # Features are columns 1-9 (the sample ID is skipped); the target is Class (2 or 4)
    x_train, x_test, y_train, y_test = train_test_split(
        data[column[1:10]], data[column[10]], test_size=0.25)

    # Standardize: fit the scaler on the training set only, then apply it to the test set
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # Logistic regression; C is the inverse of the regularization strength
    lg = LogisticRegression(C=1.0)
    lg.fit(x_train, y_train)
    y_predict = lg.predict(x_test)
    print(classification_report(y_test, y_predict, labels=[2, 4],
                                target_names=["benign", "malignant"]))

    # Save the trained model to disk, then load it back
    joblib.dump(lg, "./lg.pkl")
    model = joblib.load("./lg.pkl")

    # Predict new samples; they must be scaled with the same fitted scaler
    data = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\逻辑回归二分类\输入预测乳腺癌数据.csv",
                       engine='python', names=column)
    xx_test = data[column[1:10]]
    xx_test = std.transform(xx_test)
    yy_predict = model.predict(xx_test)
    print("Predictions from the saved model:", yy_predict)


if __name__ == "__main__":
    logistic()
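The code above saves only `lg`. The fitted `StandardScaler` exists only in memory, so a model loaded in a later session could not reproduce the training-set scaling. One way around this is to wrap both steps in a scikit-learn `Pipeline` and save that single object. This is an alternative to the article's approach, not what its code does. The minimal sketch below uses synthetic stand-in data, and the file name `lg_pipeline.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 9 integer features in 1..10, labels 2 / 4 like the Class column
rng = np.random.default_rng(42)
X = rng.integers(1, 11, size=(300, 9)).astype(float)
y = np.where(X[:, :3].mean(axis=1) > 5.5, 4, 2)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaler and classifier are fitted together, so the saved object carries
# the training-set mean/std along with the coefficients
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
model.fit(x_train, y_train)

path = os.path.join(tempfile.mkdtemp(), "lg_pipeline.pkl")  # hypothetical file name
joblib.dump(model, path)
loaded = joblib.load(path)

# The reloaded pipeline scales and predicts raw (unscaled) rows in one call
assert np.array_equal(loaded.predict(x_test), model.predict(x_test))
print(loaded.predict(x_test[:6]))
```

With this design, new samples are passed in unscaled, and there is no separate `std` object to keep in sync with the saved model.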
Output:
On the six test samples, the classification results look reasonably good.
Done.
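Checking six samples gives only a rough impression. The sketch below shows a more systematic evaluation with `classification_report`. The CSV files above are not available here, so it uses scikit-learn's bundled `load_breast_cancer` data instead. That is the related Wisconsin Diagnostic dataset (569 samples, 30 features, target 0 = malignant, 1 = benign), not the file used in this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled Wisconsin Diagnostic data: a stand-in, NOT the CSV used in the article.
# Here target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
model.fit(x_train, y_train)
y_predict = model.predict(x_test)

print("accuracy:", model.score(x_test, y_test))
# Recall on "malignant" is the share of actual cancers the model caught
print(classification_report(y_test, y_predict, target_names=["malignant", "benign"]))
report = classification_report(y_test, y_predict,
                               target_names=["malignant", "benign"], output_dict=True)
```

For the article's data, where the labels are 2 and 4, the equivalent call is `classification_report(y_test, y_predict, labels=[2, 4], target_names=["benign", "malignant"])`.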
When reposting, please credit the original URL: https://yun.8miu.com/read-139393.html