Hung-yi Lee (李宏毅), Third Session: Task3

xiaoxiao2022-07-06 241

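Assignment 1: Predicting PM2.5. In this assignment we use gradient descent to predict PM2.5 values.

hw1 requirements:
1. Python 3.5+ is required.
2. Only (1) numpy, (2) scipy, and (3) pandas may be used.
3. Implement linear regression by hand using gradient descent.
4. Beat the public simple baseline.
5. For those who want to load a saved model instead of running the whole training process: upload your training code and name it train.py. It only needs to contain the gradient descent code.

hw_best requirements:
1. Python 3.5+ is required.
2. Any library may be used.
3. Reach the higher score you selected on Kaggle.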
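Requirement 3 of hw1 (hand-written linear regression with gradient descent, numpy only) can be illustrated with a minimal sketch. The function name fit_linear_gd, the learning rate, the iteration count, and the synthetic data are all assumptions made for illustration; none of them come from the assignment.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, n_iter=500):
    """Fit y ~ [1, X] @ w with batch gradient descent on the half-MSE cost."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a bias column of ones
    w = np.zeros(Xb.shape[1])
    costs = []
    for _ in range(n_iter):
        error = Xb @ w - y                 # prediction minus target
        costs.append(float(error @ error) / (2 * m))
        grad = Xb.T @ error / m            # gradient of the cost w.r.t. w
        w -= lr * grad                     # step against the gradient
    return w, costs

# Synthetic, noise-free data (made up): y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]
w, costs = fit_linear_gd(X, y)
print(w)   # should be close to [2, 3, -1]
```

Here the features are already on a unit scale, which is why a learning rate of 0.1 converges; on raw PM2.5 readings the inputs would normally be normalized first.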

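Data description: this assignment uses observation records from the Fengyuan (豐原) station, split into a train set and a test set. The train set holds all data from the first 20 days of each month; the test set is sampled from the station's remaining data.

train.csv: hourly meteorological data for the first 20 days of each month (18 measurements per hour), covering 12 months.

test.csv: each sample is a run of 10 consecutive hours drawn from the remaining data. All observations from the first nine hours are the features, and the PM2.5 value of the tenth hour is the answer. There are 240 distinct test samples in total; predict the PM2.5 value of each from its features.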
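The test.csv layout (nine hours of features, the tenth hour as the answer) suggests cutting the training series into samples of the same shape. The following is a minimal sketch for a single measurement such as PM2.5; the helper name make_windows and the toy series are hypothetical.

```python
import numpy as np

def make_windows(series, n_features=9):
    """Split a 1-D hourly series into (features, target) pairs.

    Row j of X holds hours j .. j+n_features-1, and y[j] is the hour
    that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - n_features           # number of complete windows
    X = np.stack([series[j:j + n_features] for j in range(n)])
    y = series[n_features:]
    return X, y

hours = np.arange(12)                      # stand-in for 12 hourly readings
X, y = make_windows(hours)
print(X.shape, y.shape)                    # (3, 9) (3,)
```

Note that sliding one window over a whole year of concatenated readings also produces windows that straddle the gap between day 20 of one month and day 1 of the next; windowing each month separately would avoid that.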

After finishing, consult the following materials:
Sample_code: https://ntumlta.github.io/2017fall-ml-hw1/code.html
Supplementary_Slide: https://docs.google.com/presentation/d/1WwIQAVI0RRA6tpcieynPVoYDuMmuVKGvVNF_DSKIiDI/edit#slide=id.g1ef6d808f1_2_0
For the answers, see answer.csv.

# coding=utf-8
# Version: python3.6.0
# Tools: Pycharm 2017.3.2
__date__ = '2019/5/22 13:51'
__author__ = 'ranchunfu'

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Keep only the PM2.5 rows of the training data and drop the three leading label columns.
train = pd.read_csv("./Dataset/train.csv")
train = train[train['observation'] == 'PM2.5']
train = train.iloc[:, 3:]
train = np.array(train, dtype='float32')
# Flatten all days into one long hourly series of shape (1, n_hours).
train = train.reshape(1, train.shape[0] * train.shape[1])
PM = train

# Normalize the data (approach borrowed from 追风者)
PM_mean = int(PM.mean())
PM_theta = int(PM.var() ** 0.5)
PM = (PM - PM_mean) / PM_theta

np.random.seed(3)
W = np.random.randn(1, 10) * 0.01   # W[0, 0] acts as the bias term
# b = np.zeros((1,1))

# Forward pass and gradient descent
costs = []
lean_rate = 0.1
m = PM.shape[1] - 9   # number of 9-hour windows followed by a target hour
for i in range(150):
    cost = 0
    grad = 0
    for j in range(m):
        x = np.array(PM[:, j:j+9])
        x = np.insert(x, 0, 1).reshape(10, 1)   # prepend 1 for the bias
        error = PM[:, j+9] - np.dot(W, x)       # target minus prediction, shape (1, 1)
        cost += float(error[0, 0] ** 2)
        grad += error * x.T
    cost = cost / (2 * m)
    costs.append(cost)
    dW = grad / m
    if i % 10 == 0:
        print(cost)
    # error is (target - prediction), so adding dW moves W down the cost gradient
    W = W + lean_rate * dW

plt.plot(costs)
plt.xlabel("num of iter")
plt.ylabel("cost")
plt.title("learn = 0.1")
plt.show()

# Process the test data
# The column is addressed as 'AMB_TEMP', presumably because test(1).csv has no
# header row and pandas takes its first row as the header.
test = pd.read_csv("./Dataset/test(1).csv")
test = test[test['AMB_TEMP'] == 'PM2.5']
test = test.iloc[:, 2:]
test = np.array(test, dtype='float32')
test = (test - PM_mean) / PM_theta            # same normalization as the training data
test = np.insert(test, 0, 1, axis=1).T        # prepend the bias column, then transpose to (10, 240)
test_pred = np.dot(W, test)                   # forward pass
test_pred = test_pred * PM_theta + PM_mean    # back to the original PM2.5 scale

np.set_printoptions(precision=3)
np.set_printoptions(suppress=True)
# print(test_pred)

answer = pd.read_csv("answer.csv")
answer = answer["value"].values
answer = answer.reshape(1, 240)
print(np.sum((test_pred - answer) ** 2) / 240)   # mean squared error
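For reference, the training loop in the script corresponds to the following cost and update rule. The symbols are chosen here for illustration: x^{(j)} is the 10-element column vector (a leading 1 followed by nine normalized PM2.5 values), y^{(j)} is the normalized PM2.5 value of the following hour, m is the number of windows, and \eta is the learning rate (lean_rate, 0.1 in the script).

```latex
\[
\begin{aligned}
e^{(j)} &= y^{(j)} - W x^{(j)} \\
J(W) &= \frac{1}{2m} \sum_{j=1}^{m} \bigl(e^{(j)}\bigr)^{2} \\
\nabla_W J &= -\frac{1}{m} \sum_{j=1}^{m} e^{(j)} \, x^{(j)\top} \\
W &\leftarrow W - \eta \, \nabla_W J
   = W + \frac{\eta}{m} \sum_{j=1}^{m} e^{(j)} \, x^{(j)\top}
\end{aligned}
\]
```

Because the error is defined as target minus prediction, the gradient carries a minus sign, which is why the script adds lean_rate * dW to W rather than subtracting it.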

Cost curve

Predicted values

Evaluation metric
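The metric printed at the end of the script is the mean squared error over the 240 test samples. A small sketch of that computation follows, with a root-mean-squared variant added because it is expressed in the same units as PM2.5; the function names mse and rmse and the sample numbers are illustrative only.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two equally sized arrays."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same number of elements")
    return float(np.mean((pred - target) ** 2))

def rmse(pred, target):
    """Square root of the mean squared error."""
    return mse(pred, target) ** 0.5

# Made-up numbers: only the last prediction is off, by 2
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))    # 4/3
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Flattening both inputs lets the same helper accept the (1, 240) arrays used in the script as well as plain lists.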
