The Classic CNN Architecture LeNet-5 and Parameter Tuning

xiaoxiao2022-07-06 207


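Created Wed 2019-05-22 19:18. Updated Tue 2019-06-18 17:15:54 (the update modifies the basic framework of the LeNet-5 network and annotates the dimensions of each parameter in the convolution process).

Deep learning and machine learning tend to come across as unfathomable. In computer vision, though, the research problem mostly comes down to classifying and recognizing images or video frames, and the problem counts as solved once the error stays within an acceptable bound. Compared with other image classification algorithms, neural networks classify well, which has made them the default choice.

The predecessor of the neural network is the perceptron. A single-layer perceptron can only handle basic tasks; in particular, it cannot compute XOR. Researchers therefore looked for ways to let computers tackle more complex tasks, and neural networks loosely modeled on the human brain were proposed. Because these networks were hard to interpret and training did not produce the expected results, the field went quiet for a while. It revived once error backpropagation became widely known, largely through the 1986 work of Rumelhart, Hinton, and Williams.

[Figure: perceptron model]

The history of deep learning is actually quite interesting. Anyone curious about deep learning or machine learning will find plenty of introductions online.

Back to the topic of this post: the LeNet-5 network. LeNet-5 was proposed in the 1990s by Professor LeCun, originally for machine recognition of handwritten characters for banks. It did much to stimulate research on neural networks, and on convolutional neural networks (CNNs) in particular. If you are reading this post you probably already know what a CNN is, so that is not repeated here. The original LeNet-5 paper is not covered in detail either, since many posts already do that, they largely agree with each other, and they do it reasonably well.

For this post it is enough to say that the network, as implemented here, has five layers. The first layer is convolution, activation, and max pooling; the second layer is convolution, activation, and pooling; the last three layers are fully connected. The implementation uses TensorFlow to classify the MNIST handwritten digit dataset and reports the final classification result. The focus is on what each parameter in LeNet-5 does and how to tune the parameters of the whole network.

First, a quick review of convolution and pooling.

1. Convolution

Suppose the input is

    a = tf.constant([[1., 2., 2., 1.], [3., 2., 1., 1.], [2., 2., 1., 3.], [2., 1., 2., 1.]])

that is, the 4×4 matrix

    1 2 2 1
    3 2 1 1
    2 2 1 3
    2 1 2 1

It is then reshaped into the 4-D layout that conv2d expects:

    b = tf.reshape(a, [1, 4, 4, 1])

The four dimensions are [batch, height, width, channels]. The reshape only adds a batch axis and a channel axis; the 16 values keep their order, so this is not a transpose of a.

Next, define a 2×2 filter of ones. conv2d expects the filter shape as [height, width, in_channels, out_channels], so for a one-channel input and one output channel the shape is [2, 2, 1, 1] (not [1, 2, 2, 1]):

    filter = tf.ones([2, 2, 1, 1])

The convolution multiplies each 2×2 window of the input element-wise with the filter and sums the products. Note that the 4-D tensor b, not the 2-D a, must be passed in:

    conv = tf.nn.conv2d(b, filter, strides=[1, 1, 1, 1], padding='VALID')

With 'VALID' padding the result is the 3×3 matrix

    8 7 5
    9 6 6
    7 6 7

Two things to note. First, the padding argument: with padding='SAME' (readers can try this themselves) the input is zero-padded at the border, so with stride 1 the output keeps the 4×4 spatial size. Second, the values being convolved must be floating point, not int.

2. Pooling

Continuing with the tensor b:

    pool = tf.nn.max_pool(b, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='VALID')

Each output value is the maximum of a 2×2 window, so the result after max pooling is

    3 2 2
    3 2 3
    2 2 3

(If conv2d and max_pool are unfamiliar, the official API documentation or other posts are worth a look.)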
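The convolution and pooling arithmetic on the 4×4 example can be checked by hand without TensorFlow. The following is a minimal pure-Python sketch, not the TensorFlow implementation: the helper names conv2d_valid and max_pool_valid are made up for illustration, and the sketch assumes a single channel, a 2×2 window of ones, stride 1, and no padding (the 'VALID' case).

```python
def conv2d_valid(image, kernel):
    """Multiply each window element-wise with `kernel` and sum (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_valid(image, size=2, stride=1):
    """Take the maximum of each size x size window (no padding)."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(image[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# The same 4x4 input as in the text, as a plain nested list.
a = [[1., 2., 2., 1.],
     [3., 2., 1., 1.],
     [2., 2., 1., 3.],
     [2., 1., 2., 1.]]
ones_2x2 = [[1., 1.], [1., 1.]]

conv = conv2d_valid(a, ones_2x2)
pool = max_pool_valid(a, size=2, stride=1)
print(conv)  # [[8.0, 7.0, 5.0], [9.0, 6.0, 6.0], [7.0, 6.0, 7.0]]
print(pool)  # [[3.0, 2.0, 2.0], [3.0, 2.0, 3.0], [2.0, 2.0, 3.0]]
```

Both results are 3×3 because a 2×2 window fits into a 4×4 input in three positions along each axis when no padding is added.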

Complete implementation:

    # Created Tue 2019-05-21 14:23, finished 2019-05-21 16:07
    # LeNet-5 on MNIST (TensorFlow 1.x)
    import tensorflow as tf
    import tensorflow.examples.tutorials.mnist.input_data as input_data

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    # Note: MNIST images are 28*28 (the LeNet-5 paper uses 32*32 inputs).

    # InteractiveSession installs itself as the default session, so ops can be
    # evaluated while the graph is still being built. The alternative is to
    # build every op first and run them inside `with tf.Session() as sess:`.
    sess = tf.InteractiveSession()

    # Placeholders hold no data themselves; values are fed in at run time.
    # None leaves the batch dimension (number of rows) unspecified.
    x = tf.placeholder('float', shape=[None, 28*28])
    # One-hot labels: MNIST has ten digit classes, so each row has ten columns.
    y_ = tf.placeholder('float', shape=[None, 10])

    # Reshape to [batch, 28, 28, 1]; -1 lets TensorFlow infer the batch size.
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    def weights(shape):
        # Generate CNN weights from a truncated normal distribution.
        initial = tf.truncated_normal(shape, stddev=0.2)
        return tf.Variable(initial)

    def bias(shape):
        # Generate CNN biases, initialized to a small positive constant.
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    def conv2d(x, W):
        # Wrapper around tf.nn.conv2d: stride 1, 'SAME' padding.
        return tf.nn.conv2d(input=x, filter=W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        # Wrapper around tf.nn.max_pool: 2x2 window, stride 2.
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # 1st layer: conv + relu + max_pool
    w_conv1 = weights([5, 5, 1, 6])
    b_conv1 = bias([6])
    h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)  # avg_pool could replace max_pool_2x2

    # 2nd layer: conv + relu + max_pool
    w_conv2 = weights([5, 5, 6, 16])
    b_conv2 = bias([16])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*16])

    # Layers 3 to 5: three fully connected layers
    w_fc1 = weights([7*7*16, 120])
    b_fc1 = bias([120])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)  # tanh() or sigmoid() also possible

    w_fc2 = weights([120, 84])
    b_fc2 = bias([84])
    h_fc2 = tf.nn.relu(tf.matmul(h_fc1, w_fc2) + b_fc2)

    w_fc3 = weights([84, 10])
    b_fc3 = bias([10])
    h_fc3 = tf.nn.softmax(tf.matmul(h_fc2, w_fc3) + b_fc3)

    cross_entropy = -tf.reduce_sum(y_ * tf.log(h_fc3))
    train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(h_fc3, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))

    sess.run(tf.global_variables_initializer())  # initialize all variables

    for i in range(2000):
        batch = mnist.train.next_batch(60)
        if i % 100 == 0:  # report training accuracy every 100 steps
            train_accuracy = accuracy.eval(session=sess, feed_dict={x: batch[0], y_: batch[1]})
            print('step {}, training accuracy: {}'.format(i, train_accuracy))
        train_step.run(session=sess, feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy: {}'.format(accuracy.eval(session=sess, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))
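The listing above flattens the second pooling output to 7*7*16 values, and that number follows from the padding rule. The sketch below traces the spatial size through the two convolution and pooling stages and counts trainable parameters. It is an illustration only: the helper names (out_size, trace_sizes, conv_params, fc_params) are hypothetical, the size rules are the usual 'SAME' and 'VALID' formulas (ceil(n / stride) and ceil((n - k + 1) / stride)), and the 32×32 'VALID' trace is included only for comparison with the input size mentioned in the listing's comment.

```python
import math


def out_size(n, k, stride, padding):
    """Spatial output size for input size n, window k, under 'SAME' or 'VALID'."""
    if padding == 'SAME':
        return math.ceil(n / stride)
    if padding == 'VALID':
        return math.ceil((n - k + 1) / stride)
    raise ValueError('unknown padding: %r' % padding)


def trace_sizes(size, padding):
    """Follow the spatial size through conv1, pool1, conv2, pool2."""
    layers = [('conv1', 5, 1), ('pool1', 2, 2), ('conv2', 5, 1), ('pool2', 2, 2)]
    sizes = []
    for name, k, stride in layers:
        size = out_size(size, k, stride, padding)
        sizes.append((name, size))
    return sizes


def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights of a fully connected layer plus one bias per output unit."""
    return n_in * n_out + n_out


same_trace = trace_sizes(28, 'SAME')    # the listing: 28x28 input, 'SAME' padding
valid_trace = trace_sizes(32, 'VALID')  # comparison: 32x32 input, no padding

flat = same_trace[-1][1] ** 2 * 16      # 7 * 7 * 16 values enter the first FC layer
param_counts = {
    'conv1': conv_params(5, 1, 6),
    'conv2': conv_params(5, 6, 16),
    'fc1': fc_params(flat, 120),
    'fc2': fc_params(120, 84),
    'fc3': fc_params(84, 10),
}
total_params = sum(param_counts.values())

print(same_trace)    # [('conv1', 28), ('pool1', 14), ('conv2', 14), ('pool2', 7)]
print(valid_trace)   # [('conv1', 28), ('pool1', 14), ('conv2', 10), ('pool2', 5)]
print(flat, total_params)  # 784 107786
```

Most of the parameters sit in the first fully connected layer (94200 of 107786), so when tuning this network, changes to the flattened size or to the 120-unit layer affect the model size far more than changes to the convolution filters.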

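Update:

The update modifies the basic LeNet-5 framework: a third convolutional layer is added, the second pooling step is moved to after that layer, and an extra fully connected layer (84 to 42 units) is inserted before the output. The tensor dimensions after each convolution and pooling step are annotated in the comments. The modified code is as follows: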

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('MNIST/', one_hot=True)
    sess = tf.InteractiveSession()

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    def weight_variable(shape):
        initial = tf.random_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        # Default stddev of tf.random_normal is 1.0.
        initial = tf.random_normal(shape=shape)
        return tf.Variable(initial)

    def conv2d(x, w):
        return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    W_conv1 = weight_variable([5, 5, 1, 6])
    b_conv1 = bias_variable([6])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # 28*28*6
    h_pool1 = max_pool_2x2(h_conv1)                           # 14*14*6

    W_conv2 = weight_variable([5, 5, 6, 16])
    b_conv2 = bias_variable([16])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # 14*14*16

    # Added convolutional layer; the pooling step is moved to after this layer.
    W_conv3 = weight_variable([5, 5, 16, 16])
    b_conv3 = bias_variable([16])
    h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3) + b_conv3)  # 14*14*16
    h_pool3 = max_pool_2x2(h_conv3)                           # 7*7*16

    # Flatten the feature maps so they can be fed to the fully connected layers.
    h_pool2_flat = tf.reshape(h_pool3, [-1, 7*7*16])
    # h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*16])

    W_fc1 = weight_variable([7*7*16, 120])
    b_fc1 = bias_variable([120])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    W_fc2 = weight_variable([120, 84])
    b_fc2 = bias_variable([84])
    h_fc2 = tf.nn.relu(tf.matmul(h_fc1, W_fc2) + b_fc2)

    # Added fully connected layer that reduces the dimension further.
    W_fc3 = weight_variable([84, 42])
    b_fc3 = bias_variable([42])
    h_fc3 = tf.nn.relu(tf.matmul(h_fc2, W_fc3) + b_fc3)

    W_fc4 = weight_variable([42, 10])
    b_fc4 = bias_variable([10])
    y_conv = tf.nn.softmax(tf.matmul(h_fc3, W_fc4) + b_fc4)

    cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    tf.global_variables_initializer().run()

    for i in range(1500):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
            print('steps: %d, training accuracy: %g\n' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    # The source listing is cut off in the middle of this statement; the part
    # after the format string is filled in by analogy with the first listing.
    print('test accuracy: %g\n' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
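One tuning point that applies to both listings: the loss is written as -sum(y * log(softmax(z))), which takes the log of a probability that can underflow to exactly 0. In TensorFlow that produces -inf, and 0 * -inf gives NaN. The sketch below shows the same cross-entropy computed directly from the logits with the log-sum-exp trick. It is a pure-Python illustration with made-up function names, not code from the original post; in TensorFlow 1.x, tf.nn.softmax_cross_entropy_with_logits_v2 applied to the pre-softmax values is one built-in way to get this behaviour.

```python
import math


def naive_cross_entropy(logits, one_hot):
    """-sum(y * log(softmax(z))), written the same way as in the listings."""
    total = sum(math.exp(z) for z in logits)
    probs = [math.exp(z) / total for z in logits]
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


def stable_cross_entropy(logits, one_hot):
    """Same quantity, using log(softmax(z_k)) = z_k - logsumexp(z)."""
    m = max(logits)  # shift by the maximum so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for y, z in zip(one_hot, logits))


# For moderate logits the two versions agree.
moderate = [2.0, 1.0, 0.1]
label = [1, 0, 0]
print(naive_cross_entropy(moderate, label))
print(stable_cross_entropy(moderate, label))

# For extreme logits the true class gets probability 0.0 after underflow.
# Plain Python raises ValueError on log(0.0); TensorFlow would return NaN.
extreme = [0.0, -800.0]
extreme_label = [0, 1]
try:
    naive_cross_entropy(extreme, extreme_label)
    naive_failed = False
except ValueError:
    naive_failed = True
print(naive_failed)                                  # True
print(stable_cross_entropy(extreme, extreme_label))  # 800.0
```

If training accuracy in either listing suddenly collapses to chance level, a NaN loss from this log(0) case is one possible cause worth ruling out.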

When reposting, please cite the original URL: https://yun.8miu.com/read-26028.html
