Table of Contents
4.1 Building your Deep Neural Network: Step by Step
1 - Packages
2 - Outline of the Assignment
3 - Initialization
3.1 - 2-layer Neural Network
3.2 - L-layer Neural Network
4 - Forward propagation module
4.1 - Linear Forward
4.2 - Linear-Activation Forward
4.3 - L-Layer Model
5 - Cost function
6 - Backward propagation module
6.1 - Linear backward
6.2 - Linear-Activation backward
6.3 - L-Model Backward
6.4 - Update Parameters
4.2 Deep Neural Network for Image Classification: Application
1 - Packages
2 - Dataset
3 - Architecture of your model
3.1 - 2-layer neural network
3.2 - L-layer deep neural network
3.3 - General methodology
4 - Two-layer neural network
5 - L-layer Neural Network
6 - Results Analysis
7 - Test with your own image (optional/ungraded exercise)
Notation:

- Superscript $[l]$ denotes a quantity associated with the $l$-th layer. Example: $a^{[L]}$ is the $L$-th layer activation; $W^{[L]}$ and $b^{[L]}$ are the $L$-th layer parameters.
- Superscript $(i)$ denotes a quantity associated with the $i$-th example. Example: $x^{(i)}$ is the $i$-th training example.
- Subscript $i$ denotes the $i$-th entry of a vector. Example: $a^{[l]}_i$ denotes the $i$-th entry of the $l$-th layer's activations.
First, run the cell below to import all the packages you will need for this assignment:
- numpy is the fundamental package for scientific computing with Python.
- matplotlib is a library for plotting graphs in Python.
- dnn_utils provides some necessary functions.
- testCases provides some test cases to assess the correctness of your functions.
- np.random.seed(1) is used to keep all the random function calls consistent.

```python
import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)
```
To build the neural network, you will implement several helper functions. These helper functions will be used in the next part of the assignment to build a two-layer neural network and an L-layer neural network. Here is the outline of the assignment:
1. Initialize the parameters for a two-layer network and for an L-layer neural network.
2. Implement the forward propagation module (the purple part of the figure):
   - Complete the LINEAR part of a layer's forward propagation step (resulting in $Z^{[l]}$).
   - The ACTIVATION functions (relu/sigmoid) are provided.
   - Combine the previous two steps into a new [LINEAR→ACTIVATION] forward function.
   - Stack the [LINEAR→RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR→SIGMOID] at the end (for the final layer L). This gives you a new L_model_forward function.
3. Compute the loss.
4. Implement the backward propagation module (the red part of the figure):
   - Complete the LINEAR part of a layer's backward propagation step.
   - The gradients of the ACTIVATION functions (relu_backward/sigmoid_backward) are provided.
   - Combine the previous two steps into a new [LINEAR→ACTIVATION] backward function.
   - Stack the [LINEAR→RELU] backward function L-1 times and add a [LINEAR→SIGMOID] backward step, giving you a new L_model_backward function.
5. Finally, update the parameters.
Note: for every forward function there is a corresponding backward function. That is why at every step of the forward module you store some values in a cache; in the backward propagation module you will use these cached values to compute the gradients.
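To make the cache convention concrete, here is a tiny self-contained illustration (not part of the graded assignment code; the toy shapes are arbitrary):

```python
import numpy as np

# Toy shapes just for illustration: 3 inputs, 2 units in the layer, 4 examples.
A_prev = np.random.randn(3, 4)
W = np.random.randn(2, 3)
b = np.zeros((2, 1))

Z = np.dot(W, A_prev) + b                   # linear step
linear_cache = (A_prev, W, b)               # what linear_backward will need
activation_cache = Z                        # what relu_backward / sigmoid_backward will need
cache = (linear_cache, activation_cache)    # one such tuple per layer, collected in `caches`
```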
You will write two helper functions to initialize the parameters of your model. The first function initializes the parameters of a two-layer neural network; the second initializes the parameters of an L-layer neural network.
Exercise: Create and initialize the parameters of the two-layer neural network.
Instructions:

- The model's structure is LINEAR → RELU → LINEAR → SIGMOID.
- Use random initialization for the weight matrices: np.random.randn(shape) * 0.01 with the correct shape.
- Use zero initialization for the biases: np.zeros(shape).

```python
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros(shape=(n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros(shape=(n_y, 1))
```

Here $n^{[l]}$ is the number of units in layer $l$. For example, if the input $X$ has shape $(12288, 209)$, i.e. 209 examples, then $W^{[1]}$ has shape $(n^{[1]}, 12288)$ and $b^{[1]}$ has shape $(n^{[1]}, 1)$; in general $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$ and $b^{[l]}$ has shape $(n^{[l]}, 1)$.
Exercise: Implement initialization for an L-layer neural network.
Instructions:

- The model's structure is [LINEAR → RELU] × (L-1) → LINEAR → SIGMOID, i.e. L-1 layers using a ReLU activation, followed by an output layer with a sigmoid activation.
- Use random initialization for the weight matrices: np.random.randn(shape) * 0.01.
- Use zero initialization for the biases: np.zeros(shape).
- Store the number of units of each layer in the variable layer_dims. For example, in the "Planar Data classification model", layer_dims was [2, 4, 1]: two input units, one hidden layer with four hidden units, and an output layer with one output unit. This means W1 has shape (4, 2), b1 has shape (4, 1), W2 has shape (1, 4), and b2 has shape (1, 1).

```python
np.random.seed(3)
parameters = {}
L = len(layer_dims)  # number of layers in the network

for l in range(1, L):
    parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
    parameters['b' + str(l)] = np.zeros(shape=(layer_dims[l], 1))
```
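As a quick sanity check, the loop above can be wrapped into the initialize_parameters_deep function that the model code uses later; a minimal sketch, with [5, 4, 3] chosen only as an example layer_dims:

```python
import numpy as np

def initialize_parameters_deep(layer_dims):
    """Initialize W1, b1, ..., WL, bL for the layer sizes given in layer_dims."""
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters_deep([5, 4, 3])
for name, value in params.items():
    print(name, value.shape)   # W1 (4, 5), b1 (4, 1), W2 (3, 4), b2 (3, 1)
```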
First, implement three functions:
- LINEAR
- LINEAR → ACTIVATION, where ACTIVATION is either ReLU or sigmoid
- [LINEAR → RELU] × (L-1) → LINEAR → SIGMOID (the whole model)

The linear forward module computes the following equation:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$.
Exercise: Build the linear part of forward propagation.
```python
Z = np.dot(W, A) + b
```

Two activation functions, sigmoid and ReLU, are provided. Both return two values: the activation value "A" and a "cache" containing "Z" (the value that will be fed into the corresponding backward function).
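For reference, a minimal sketch of what the complete linear_forward could look like, following the cache convention described above (the docstring wording is my own):

```python
def linear_forward(A, W, b):
    """Linear part of a layer's forward propagation.

    A -- activations from the previous layer (or the input X): (size of previous layer, number of examples)
    W -- weight matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of current layer, 1)

    Returns Z and a cache of (A, W, b) for the backward pass.
    """
    Z = np.dot(W, A) + b
    cache = (A, W, b)
    return Z, cache
```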
Exercise: Implement the forward propagation of the LINEAR→ACTIVATION layer. The mathematical relation is $A^{[l]} = g(Z^{[l]}) = g(W^{[l]} A^{[l-1]} + b^{[l]})$, where the activation $g$ is sigmoid or relu. Use linear_forward() and the correct activation function.
if activation == "sigmoid": # Inputs: "A_prev, W, b". Outputs: "A, activation_cache". Z, linear_cache = linear_forward(A_prev, W, b) A, activation_cache = sigmoid(Z) elif activation == "relu": # Inputs: "A_prev, W, b". Outputs: "A, activation_cache". Z, linear_cache = linear_forward(A_prev, W, b) A, activation_cache = relu(Z)为了更方便地实现L层神经网络,需要一个函数来复制上一步(linear_activation_forward和RELU)L-1次,然后紧接着是linear_activation_forward和SIGMOID。
Exercise: Implement the forward propagation of the model described above.
Instructions: in the code below, the variable AL denotes $A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$, i.e. the output of the final sigmoid layer (sometimes also written $\hat{Y}$).
```python
caches = []
A = X
L = len(parameters) // 2   # number of layers in the neural network

# Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
for l in range(1, L):
    A_prev = A
    A, cache = linear_activation_forward(A_prev,
                                         parameters['W' + str(l)],
                                         parameters['b' + str(l)],
                                         "relu")
    caches.append(cache)

# Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
AL, cache = linear_activation_forward(A,
                                      parameters['W' + str(L)],
                                      parameters['b' + str(L)],
                                      "sigmoid")
caches.append(cache)
```
Having implemented forward propagation, and before moving on to backward propagation, you need to compute the cost in order to check whether your model is actually learning.
Exercise: Compute the cross-entropy cost $J$, using the following formula:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[L](i)}\right) \right)$$
```python
cost = -np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
```
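A minimal sketch of how that one-liner could be wrapped into the compute_cost function used later by the models (the np.squeeze call just turns a (1, 1) array into a scalar):

```python
def compute_cost(AL, Y):
    """Cross-entropy cost J.

    AL -- vector of predicted probabilities, shape (1, number of examples)
    Y  -- vector of true labels (0 or 1), shape (1, number of examples)
    """
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    cost = np.squeeze(cost)   # e.g. turns np.array([[17.]]) into 17.
    return cost
```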
Just like forward propagation, you will build backward propagation in three steps:
- LINEAR backward
- LINEAR → ACTIVATION backward, where ACTIVATION computes the derivative of either the ReLU or the sigmoid activation
- [LINEAR → RELU] × (L-1) → LINEAR → SIGMOID backward (the whole model)

For layer $l$, the linear part is $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).

Suppose you have already computed the derivative $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$. Use it to compute the three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$:

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]}$$
Exercise: Use the three formulas above to implement linear_backward().
```python
dW = np.dot(dZ, np.transpose(A_prev)) / m
db = np.sum(dZ, axis=1, keepdims=True) / m   # average over the examples, one value per unit
dA_prev = np.dot(np.transpose(W), dZ)
```

Two backward functions are provided, sigmoid_backward and relu_backward. Each returns dZ, computing

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$

where $g'$ is the derivative of the corresponding activation function.
Exercise: Implement the backward propagation of the LINEAR→ACTIVATION layer.
if activation == "relu": dZ = relu_backward(dA, activation_cache) dA_prev, dW, db = linear_backward(dZ, linear_cache) elif activation == "sigmoid": dZ = sigmoid_backward(dA, activation_cache) dA_prev, dW, db = linear_backward(dZ, linear_cache)接下来实现整个网络的反向传播,当实现L_model_forward函数时,在每一次迭代中,都储存了一个包含(X,W,b,Z)的cache,在反向传播模块会用到这些值来计算梯度,因此,在L_model_backward函数中,会从L层开始反向迭代所有隐藏层,每一步都会用到缓存值来反向传播通过第l层:
Initializing backpropagation: the output of the network is $A^{[L]} = \sigma(Z^{[L]})$, so your code needs to compute $dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}}$. Use the following formula:
```python
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))  # derivative of cost with respect to AL
```

You can then use this post-activation gradient dAL to keep going backward. As shown in the figure above, you can now feed dAL into the LINEAR→SIGMOID backward function. After that, use a for loop to iterate through all the other layers with the LINEAR→RELU backward function. Store each dA, dW and db in the grads dictionary, using the following convention:

$$grads["dW" + str(l)] = dW^{[l]}$$

For example, for $l = 3$ this stores $dW^{[3]}$ in grads["dW3"].
Exercise: Implement backpropagation for the [LINEAR → RELU] × (L-1) → LINEAR → SIGMOID model.
```python
grads = {}
L = len(caches)            # the number of layers
m = AL.shape[1]
Y = Y.reshape(AL.shape)    # after this line, Y is the same shape as AL

# Initializing the backpropagation
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

# Lth layer (SIGMOID -> LINEAR) gradients.
# Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]".
current_cache = caches[L-1]
grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = \
    linear_activation_backward(dAL, current_cache, "sigmoid")

for l in reversed(range(L - 1)):
    # lth layer: (RELU -> LINEAR) gradients.
    # Inputs: "grads["dA" + str(l + 2)], caches".
    # Outputs: "grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]".
    current_cache = caches[l]
    grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)] = \
        linear_activation_backward(grads["dA" + str(l + 2)], current_cache, "relu")
```

Finally, update the parameters of the model using gradient descent:
$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$$

$$b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$

where $\alpha$ is the learning rate. After computing the updated parameters, store them in the parameters dictionary.
Exercise: Implement update_parameters() to update your parameters using gradient descent.
```python
L = len(parameters) // 2   # number of layers in the neural network

# Update rule for each parameter. Use a for loop.
for l in range(L):
    parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
    parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]
```
You will use the same cat dataset as in the previous assignment, where the model reached 70% test accuracy.
Problem statement: you are given a dataset ("data.h5") containing:
- a training set of m_train images labelled as cat (1) or non-cat (0)
- a test set of m_test images labelled as cat or non-cat
- each image of shape (num_px, num_px, 3), i.e. three channels (RGB)

```python
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

# Explore your dataset
m_train = train_x_orig.shape[0]
num_px = train_x_orig.shape[1]
m_test = test_x_orig.shape[0]

print ("Number of training examples: " + str(m_train))
print ("Number of testing examples: " + str(m_test))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_x_orig shape: " + str(train_x_orig.shape))
print ("train_y shape: " + str(train_y.shape))
print ("test_x_orig shape: " + str(test_x_orig.shape))
print ("test_y shape: " + str(test_y.shape))
```

```
Number of training examples: 209
Number of testing examples: 50
Each image is of size: (64, 64, 3)
train_x_orig shape: (209, 64, 64, 3)
train_y shape: (1, 209)
test_x_orig shape: (50, 64, 64, 3)
test_y shape: (1, 50)
```

Here is one of the images:
```python
# Example of a picture
index = 50
plt.imshow(train_x_orig[index])
print ("y = " + str(train_y[0, index]) + ". It's a " + classes[train_y[0, index]].decode("utf-8") + " picture.")
```

```
y = 1. It's a cat picture.
```

First, reshape and standardize the images:
```python
# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

# Standardize data to have feature values between 0 and 1.
train_x = train_x_flatten / 255.
test_x = test_x_flatten / 255.

print ("train_x's shape: " + str(train_x.shape))
print ("test_x's shape: " + str(test_x.shape))
```

```
train_x's shape: (12288, 209)
test_x's shape: (12288, 50)
```
You will build two different models: a two-layer neural network and an L-layer deep neural network.
The general methodology for building the models:
1. Initialize the parameters / define the hyperparameters.
2. Loop for num_iterations:
   - forward propagation
   - compute the cost function
   - backward propagation
   - update the parameters
3. Use the trained parameters to predict labels.
Question: use the helper functions implemented in the previous part to build a two-layer neural network with the structure LINEAR → RELU → LINEAR → SIGMOID. The functions you will need are:
```python
def initialize_parameters(n_x, n_h, n_y):
    ...
    return parameters
def linear_activation_forward(A_prev, W, b, activation):
    ...
    return A, cache
def compute_cost(AL, Y):
    ...
    return cost
def linear_activation_backward(dA, cache, activation):
    ...
    return dA_prev, dW, db
def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters
```

Define the constants:
```python
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288     # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)
```

The two-layer neural network:
```python
# GRADED FUNCTION: two_layer_model

def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """

    np.random.seed(1)
    grads = {}
    costs = []                  # to keep track of the cost
    m = X.shape[1]              # number of examples
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
    ### START CODE HERE ### (≈ 1 line of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    ### END CODE HERE ###

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2".
        ### START CODE HERE ### (≈ 2 lines of code)
        A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
        A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")
        ### END CODE HERE ###

        # Compute cost
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(A2, Y)
        ### END CODE HERE ###

        # Initializing backward propagation
        dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        ### START CODE HERE ### (≈ 2 lines of code)
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")
        ### END CODE HERE ###

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        ### START CODE HERE ### (approx. 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```

Train the model:
```python
parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y), num_iterations=2500, print_cost=True)
```

Accuracy on the training set:
```python
predictions_train = predict(train_x, train_y, parameters)
```

```
Accuracy: 1.0
```

Accuracy on the test set:
```python
predictions_test = predict(test_x, test_y, parameters)
```

```
Accuracy: 0.72
```
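predict comes from the assignment's utility file and is not implemented in these notes. A minimal sketch of what such a helper could look like, assuming the L_model_forward above and a 0.5 decision threshold:

```python
def predict(X, y, parameters):
    """Predict 0/1 labels with the trained model and print the accuracy (a sketch)."""
    m = X.shape[1]
    probas, _ = L_model_forward(X, parameters)   # forward pass through the whole network
    p = (probas > 0.5).astype(int)               # threshold the sigmoid output
    print("Accuracy: " + str(np.sum(p == y) / m))
    return p
```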
Question: use the helper functions implemented previously to build an L-layer neural network with the structure [LINEAR → RELU] × (L-1) → LINEAR → SIGMOID. The functions you will need are:
```python
def initialize_parameters_deep(layer_dims):
    ...
    return parameters
def L_model_forward(X, parameters):
    ...
    return AL, caches
def compute_cost(AL, Y):
    ...
    return cost
def L_model_backward(AL, Y, caches):
    ...
    return grads
def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters
```

Define the constants:
```python
### CONSTANTS ###
layers_dims = [12288, 20, 7, 5, 1]   # 5-layer model
```

The L-layer neural network:
```python
# GRADED FUNCTION: L_layer_model

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements an L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- input data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    np.random.seed(1)
    costs = []   # keep track of cost

    # Parameters initialization.
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```

Train the model:
```python
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations=2500, print_cost=True)
```

Accuracy on the training set:
```python
pred_train = predict(train_x, train_y, parameters)
```

```
Accuracy: 0.9856459330143541
```

Accuracy on the test set:
```python
pred_test = predict(test_x, test_y, parameters)
```

```
Accuracy: 0.8
```
Some of the misclassified images:
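The assignment's utility file provides print_mislabeled_images(classes, test_x, test_y, pred_test) for this; below is a rough sketch of how such a helper could be written (the layout details are assumptions):

```python
def print_mislabeled_images(classes, X, y, p):
    """Plot the images where the prediction p disagrees with the true label y (a sketch)."""
    a = p + y
    mislabeled_indices = np.asarray(np.where(a == 1))   # entries where exactly one of p, y equals 1
    num_images = len(mislabeled_indices[0])
    plt.rcParams['figure.figsize'] = (40.0, 40.0)       # enlarge the figure
    for i in range(num_images):
        index = mislabeled_indices[1][i]
        plt.subplot(2, num_images, i + 1)
        plt.imshow(X[:, index].reshape(64, 64, 3), interpolation='nearest')
        plt.axis('off')
        plt.title("Prediction: " + classes[int(p[0, index])].decode("utf-8")
                  + " \n Class: " + classes[int(y[0, index])].decode("utf-8"))
    plt.show()

print_mislabeled_images(classes, test_x, test_y, pred_test)
```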
