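7. (2) Linear regression: the normal equation (basic linear regression) compared with SVM and random forest.
This post compares how well three models (linear regression, SVM, and random forest) predict house prices on the Boston housing dataset, i.e., their accuracy.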
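Because the title refers to the normal equation, here is a minimal sketch of that closed form, θ = (XᵀX)⁻¹Xᵀy with an intercept column prepended to X. It checks the result against scikit-learn's `LinearRegression`, which uses a least-squares solver rather than literally inverting XᵀX but reaches the same solution when XᵀX is invertible. The helper name `normal_equation` and the synthetic data are assumptions for illustration, not the author's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def normal_equation(X, y):
    """Solve (Xb^T Xb) theta = Xb^T y, where Xb is X with a leading column of ones."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column for the intercept
    # Solving the linear system is numerically safer than computing the inverse
    theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
    return theta  # theta[0] is the intercept, theta[1:] are the coefficients


# Hypothetical data: y = 4 + 1.5*x0 - 2*x1 + 0.5*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

theta = normal_equation(X, y)
lr = LinearRegression().fit(X, y)
print(theta[0], lr.intercept_)
print(theta[1:], lr.coef_)
```

With 13 features and a few hundred rows, as in the Boston data, the closed form is cheap; iterative solvers such as gradient descent mainly pay off when the feature count is large.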
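Step 1: Visualize the fits of the three models (only the first is linear; SVR with its default RBF kernel and random forest are not)
The complete code is as follows: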
"""
Created on Sun May 26 13:06:39 2019
@author: sun
"""
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager

# SimHei font, for rendering Chinese text in plot labels (Windows font path)
my_font = font_manager.FontProperties(fname=r"\Windows\Fonts\simhei.ttf")

data = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\线性回归\波士顿原数据.csv", engine='python')
# Column names as they appear in the author's CSV file
x = data[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratto', 'b', 'lstat']]
y = data['medv']

# Keep the single feature with the highest univariate F-score against medv
selector = SelectKBest(f_regression, k=1)  # renamed so the SelectKBest class is not shadowed
bestFeature = selector.fit_transform(x, y.values.ravel())
print(selector.get_support())          # boolean mask over the 13 columns
a = x.columns[selector.get_support()]  # name of the selected column
print(a)

# Use the selected feature (lstat) as a 2-D feature matrix and medv as a 1-D target;
# a pandas Series has no .reshape(), so go through .values / double brackets
x = data[['lstat']].values
y = data['medv'].values


def plot_scatter(x, y, R=None):
    """Scatter the data in blue and, if given, the model predictions R in red."""
    plt.scatter(x, y, s=32, marker='o', facecolors='blue')
    if R is not None:
        plt.scatter(x, R, color='red', linewidth=0.5)
    plt.xlabel("lstat")
    plt.ylabel("medv")
    plt.show()


plot_scatter(x, y)

# Linear regression (the normalize= argument was removed in scikit-learn 1.2;
# for ordinary least squares it does not change the predictions anyway)
regressor = LinearRegression().fit(x, y)
plot_scatter(x, y, regressor.predict(x))

# Support vector regression with the default RBF kernel
regressor = SVR().fit(x, y)
plot_scatter(x, y, regressor.predict(x))

# Random forest regression
regressor = RandomForestRegressor().fit(x, y)
plot_scatter(x, y, regressor.predict(x))
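`SelectKBest(f_regression, k=1)` scores each column with a univariate F-test and keeps the top one. For `f_regression`, the F statistic is computed from the Pearson correlation r between the column and the target as F = r² / (1 − r²) · (n − 2), so k=1 simply keeps the column most strongly correlated (in absolute value) with medv. The code above then uses lstat, presumably because that is what the selector returned on the author's data. The sketch below checks the formula on hypothetical synthetic data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical data: the target depends strongly on column 2 and weakly on column 0
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 2] + 0.3 * X[:, 0] + rng.normal(size=n)

F, p = f_regression(X, y)

# Recompute F from each column's Pearson correlation with y
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
F_manual = r ** 2 / (1 - r ** 2) * (n - 2)

selector = SelectKBest(f_regression, k=1).fit(X, y)
print(F, F_manual)
print(selector.get_support())  # mask marks the column with the largest F
```

Because the score only looks at one feature at a time, it cannot detect features that matter only in combination; it is a quick filter, not a model-based selection.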
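The results are as follows:
From the plots above, the random forest predictions follow the data most closely. Next, we repeat the steps of Section 7 (1), this time testing the Boston dataset with random forest regression.
The complete code is as follows: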
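The plots in Step 1 show each model's predictions on the same points it was trained on, so a random forest's close fit partly reflects memorizing the training data. A fairer comparison scores each model on a held-out split. The sketch below assumes `x` and `y` are shaped as in Step 1 (a 2-D feature array and a 1-D target); the function name `compare_models` and the synthetic data are hypothetical stand-ins, not Boston results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def compare_models(x, y, seed=0):
    """Fit the three models on a training split; return (train R^2, test R^2) per model."""
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.25, random_state=seed)
    models = {
        "LinearRegression": LinearRegression(),
        "SVR": SVR(),
        "RandomForest": RandomForestRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_tr, y_tr)
        scores[name] = (r2_score(y_tr, model.predict(x_tr)),
                        r2_score(y_te, model.predict(x_te)))
    return scores


# Hypothetical stand-in for x = data[['lstat']].values, y = data['medv'].values
rng = np.random.default_rng(0)
x = rng.uniform(2, 35, size=(300, 1))
y = 40 * np.exp(-x[:, 0] / 12) + 5 + rng.normal(scale=2, size=300)

for name, (train_r2, test_r2) in compare_models(x, y).items():
    print(f"{name}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

A large gap between train and test R² for one model is the usual sign of overfitting; with real data, `sklearn.model_selection.cross_val_score` averages over several splits instead of relying on one.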
"""
Created on Sat May 25 19:28:12 2019
@author: sun
"""
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor

# Column names as they appear in the author's CSV files
features = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratto', 'b', 'lstat']


def mylinear():
    """
    Predict house prices with random forest regression
    (the function name is kept from the earlier linear-regression version).
    :return: None
    """
    titan = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\线性回归\波士顿原数据.csv", engine='python')
    data = titan[features]
    target = titan['medv']

    # Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.25)

    # Standardize the features
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)

    # Standardize the target; StandardScaler needs 2-D input, and a Series has no .reshape()
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.values.reshape(-1, 1))
    y_test = std_y.transform(y_test.values.reshape(-1, 1))

    # Random forest regression; ravel() because the estimator expects a 1-D target
    rf = RandomForestRegressor()
    rf.fit(x_train, y_train.ravel())
    # predict() returns a 1-D array; reshape it before undoing the target scaling
    y_rf_predict = std_y.inverse_transform(rf.predict(x_test).reshape(-1, 1))
    print("Random forest predicted price of each house in the test set:", y_rf_predict)
    print("Random forest mean squared error:",
          mean_squared_error(std_y.inverse_transform(y_test), y_rf_predict))

    # Save the model, reload it, and predict on new data (std_x/std_y are reused from above)
    joblib.dump(rf, "./rf.pkl")
    model = joblib.load("./rf.pkl")
    titan2 = pd.read_csv(r"C:\Users\sun\Desktop\论文\算法代码\线性回归\填入数据预测房价.csv", engine='python')
    xx_test = std_x.transform(titan2[features])
    yy_predict = std_y.inverse_transform(model.predict(xx_test).reshape(-1, 1))
    print("Predictions from the saved model:", yy_predict)


if __name__ == "__main__":
    mylinear()
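In the script above only the forest is saved to `rf.pkl`; the fitted `std_x` and `std_y` scalers live in memory, so a separate script loading the model could not reproduce the scaling. One option, sketched here as an alternative rather than the author's method, is to wrap everything in a single estimator: `make_pipeline` scales X and `TransformedTargetRegressor` scales y and undoes it on predict. Scaling is not actually needed for tree-based models (their splits are unaffected by per-feature rescaling), but it is kept to mirror the original. The data and file path are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X is scaled inside the pipeline; y is scaled by the wrapper and inverse-transformed on predict
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), RandomForestRegressor(random_state=0)),
    transformer=StandardScaler(),
)

# Hypothetical data standing in for the 13 Boston feature columns and medv
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = 20 + 5 * X[:, 5] - 3 * X[:, 12] + rng.normal(size=200)
model.fit(X, y)

# One file now holds both scalers and the fitted forest
path = os.path.join(tempfile.mkdtemp(), "rf.pkl")
joblib.dump(model, path)
loaded = joblib.load(path)
pred = loaded.predict(X[:5])  # 1-D predictions, already back in the original y units
print(pred)
```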
Output of the run (random forest gives comparatively good predictions):
Done.
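The reported `mean_squared_error` is the average of the squared prediction errors, so it is in squared target units (medv is in thousands of dollars in the standard Boston data). Taking the square root gives the RMSE, which is in the same units as medv and easier to read. A small worked check with hypothetical values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted medv values (in $1000s), for illustration only
y_true = np.array([24.0, 21.5, 34.5])
y_pred = np.array([25.0, 20.5, 31.5])

errors = y_true - y_pred    # [-1.0, 1.0, 3.0]
mse = np.mean(errors ** 2)  # (1 + 1 + 9) / 3 = 11/3, about 3.667
rmse = np.sqrt(mse)         # about 1.915, in the same units as medv
print(mse, mean_squared_error(y_true, y_pred), rmse)
```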
When reposting, please cite the original address: https://yun.8miu.com/read-136015.html