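6. (1) Analyzing Titanic passenger data with decision trees and random forests.
Dataset download: https://pan.baidu.com/s/1g76H1913c5vYK1z02Ba_5w (password: yj1y). Save the file in CSV format.
This post follows related algorithm articles found online and runs a prediction analysis on this classic dataset, as a way to learn the key points of both algorithms. The CSV layout is shown in the figure. Four features are selected: pclass, age, room (read as the `floor` column in the code below), and sex.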
The code is as follows:
"""
Created on Wed May 22 13:42:53 2019
@author: sun
"""
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz


def decision():
    """
    Predict Titanic passenger survival with a decision tree and a random forest.
    :return: None
    """
    titan = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\决策树和随机森林\尼克号.csv", engine='python')

    # Select features and target; copy so the fills below do not act on a view of titan
    x = titan[['pclass', 'age', 'sex', 'floor']].copy()
    y = titan['survived']

    # Fill missing values: mean age for 'age', 0 for 'floor'
    x['age'] = x['age'].fillna(x['age'].mean())
    x['floor'] = x['floor'].fillna(value=0)
    print(x)

    # Hold out 25% of the rows for testing
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

    # One-hot encode the string-valued features (named vec to avoid shadowing the built-in dict)
    vec = DictVectorizer(sparse=False)
    x_train = vec.fit_transform(x_train.to_dict(orient="records"))
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    print(vec.get_feature_names_out())
    x_test = vec.transform(x_test.to_dict(orient="records"))
    print(x_train)

    # Decision tree
    dec = DecisionTreeClassifier(max_depth=8)
    dec.fit(x_train, y_train)
    print("Decision tree accuracy:", dec.score(x_test, y_test))

    # Export the tree; the labels must follow the column order printed above
    export_graphviz(dec, out_file="./尼克号tree.dot",
                    feature_names=['age', 'floor', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'female', 'male'])

    # Random forest with default hyperparameters
    rf = RandomForestClassifier(n_jobs=-1)
    rf.fit(x_train, y_train)
    print("Random forest accuracy:", rf.score(x_test, y_test))
    print("With more data, random forest accuracy tends to improve")

    # Grid search over the number of trees and the maximum depth, with 2-fold cross-validation
    param = {"n_estimators": [100, 200, 300, 500, 800, 1200], "max_depth": [5, 8, 15, 25, 30]}
    gc = GridSearchCV(rf, param_grid=param, cv=2)
    gc.fit(x_train, y_train)
    print("Grid search accuracy:", gc.score(x_test, y_test))
    print("Best parameters selected:", gc.best_params_)
    return None


if __name__ == "__main__":
    decision()
The output is shown in the figure:
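Visualizing the decision tree in ./尼克号tree.dot requires Graphviz:
1. Download Graphviz: pick the first download option and click Next through the installer.
2. Once it is installed, open it, browse to ./尼克号tree.dot, and run it.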
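Graphviz also ships a `dot` command-line tool, so the rendering can be scripted instead of done by hand. The sketch below is one possible way to do that, assuming `dot` is on your PATH; the helper names `dot_command` and `render_dot` and the output file name `tree.png` are made up for this example.

```python
import shutil
import subprocess


def dot_command(dot_path, png_path):
    """Build the Graphviz command line that renders a .dot file to PNG."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_dot(dot_path, png_path):
    """Render dot_path to png_path; return False if `dot` is not installed."""
    if shutil.which("dot") is None:
        return False
    # check=True raises CalledProcessError if Graphviz reports a failure
    subprocess.run(dot_command(dot_path, png_path), check=True)
    return True
```

For example, `render_dot("./尼克号tree.dot", "./tree.png")` would write the image next to the script once the .dot file has been exported.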
That's it, all done.