**1. Implementing the classic LeNet-5 network; 2. Modifying LeNet-5, with the rationale for each change**
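Created: Wed, 2019-05-22 19:18. Updated: Tue, 2019-06-18 17:15:54 (added a modification of the basic LeNet-5 architecture and annotated the dimensions of each parameter in the convolution steps).

Deep learning and machine learning have long struck people as unfathomable. For computer vision, though, the research problem is currently quite concrete: classify or recognize videos (frame information) or images, and keep the error within an acceptable range. Compared with other image classification algorithms, neural networks classify well, which has made them the well-deserved first choice.

The predecessor of the neural network is the perceptron, which could only handle basic tasks and, notably, could not compute XOR. Researchers therefore looked for ways to let computers handle more complex tasks, and neural networks modeled on the human brain were proposed. Because these networks were hard to interpret and training did not produce the expected results, the field went quiet for a while. Once error backpropagation was brought to neural network training, work in which Professor Hinton played a leading role, neural networks took on a new lease of life.

[Figure: perceptron model]

The history of deep learning is actually quite interesting. If you are curious about deep learning or machine learning, there are plenty of introductions online that you can look up yourself. Back to the topic of this post: LeNet-5. Professor LeCun proposed LeNet-5 in the 1990s, when it was used only for machine recognition of handwritten characters in banking. The network directly spurred research on neural networks, convolutional neural networks (CNNs) in particular. If you are reading this post you presumably already know what a CNN is, so I will not repeat that here. Nor will I walk through the original LeNet-5 paper: many posts already cover it, they largely agree, and they do a decent job. I will only note that the LeNet-5 network used here has 5 layers: layer 1 is convolution, activation, and max pooling; layer 2 is convolution, activation, and pooling; the last three are fully connected layers. Using TensorFlow, this post classifies the MNIST handwritten digit dataset and reports the final classification result. The focus is on what each parameter in LeNet-5 does and how to adjust the parameters of the whole network.

First, a recap of the convolution and pooling operations.

1. Convolution

Suppose the input is a = tf.constant([[1., 2., 2., 1.], [3., 2., 1., 1.], [2., 2., 1., 3.], [2., 1., 2., 1.]]), a 4×4 matrix. Reshape it with b = tf.reshape(a, [1, 4, 4, 1]). The reshape does not reorder or transpose the values; it only adds a batch dimension and a channel dimension, giving the [batch, height, width, channels] layout that conv2d expects. Let the filter be filter = tf.ones([2, 2, 1, 1]), a 2×2 window of ones in [height, width, in_channels, out_channels] layout. Convolution multiplies corresponding elements and sums the products. With conv = tf.nn.conv2d(b, filter, strides = [1, 1, 1, 1], padding = 'VALID'), the result is the 3×3 matrix [[8, 7, 5], [9, 6, 6], [7, 6, 7]] (returned with shape [1, 3, 3, 1]).

Note the role of padding. If it is changed to 'SAME' (readers can try this themselves), the input is padded with zeros at the border so that, at stride 1, the output keeps the input's 4×4 size. Also, the values taking part in the convolution must be float, not int.

2. Pooling

Using the same tensor b, pool = tf.nn.max_pool(b, ksize = [1, 2, 2, 1], strides = [1, 1, 1, 1], padding = 'VALID') gives, after max pooling, the 3×3 matrix [[3, 2, 2], [3, 2, 3], [2, 2, 3]]. (If conv2d and max_pool are unfamiliar, the official API documentation or other posts are worth a look.)

Full implementation: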
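The convolution and pooling arithmetic above can be checked by hand without TensorFlow. The following is only a minimal sketch in plain Python: the helper names `conv2d_valid` and `max_pool_valid` are made up for illustration and are not TensorFlow functions, and they work on a single 2-D channel rather than the 4-D tensors that `tf.nn.conv2d` and `tf.nn.max_pool` take.

```python
# Pure-Python sketch of 'VALID' convolution and max pooling on one 2-D channel.
# conv2d_valid and max_pool_valid are illustrative helpers, not TensorFlow APIs.

def conv2d_valid(image, kernel, stride=1):
    """Slide kernel over image with no padding and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_valid(image, size=2, stride=1):
    """Take the maximum of each size x size window, with no padding."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# The same 4x4 input and 2x2 all-ones filter as in the walkthrough.
a = [
    [1.0, 2.0, 2.0, 1.0],
    [3.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 1.0, 3.0],
    [2.0, 1.0, 2.0, 1.0],
]
ones_2x2 = [[1.0, 1.0], [1.0, 1.0]]

conv_result = conv2d_valid(a, ones_2x2)
pool_result = max_pool_valid(a, size=2, stride=1)

print(conv_result)  # [[8.0, 7.0, 5.0], [9.0, 6.0, 6.0], [7.0, 6.0, 7.0]]
print(pool_result)  # [[3.0, 2.0, 2.0], [3.0, 2.0, 3.0], [2.0, 2.0, 3.0]]
```

Both outputs are 3×3 because a 2×2 window at stride 1 fits into a 4×4 input in 4 - 2 + 1 = 3 positions per axis. The TensorFlow listing for the whole network follows.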
    # Created Tue 2019-05-21 14:23, LeNet-5
    # Finished 2019-05-21 16:07
    # Written against the TensorFlow 1.x API.
    import tensorflow as tf
    import tensorflow.examples.tutorials.mnist.input_data as input_data

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    # Note: MNIST images are 28*28 (the LeNet-5 paper uses 32*32 inputs).

    # InteractiveSession installs itself as the default session, so ops can be
    # run while the graph is still being built.
    sess = tf.InteractiveSession()
    # with tf.Session() as sess:  # alternative: build every op first, then run the graph

    # Placeholders only reserve input slots; values are supplied through feed_dict.
    # None leaves the batch size (number of rows) unspecified.
    x = tf.placeholder('float', shape=[None, 28*28])
    # Labels are one-hot rows with ten columns, one per digit.
    y_ = tf.placeholder('float', shape=[None, 10])

    # Reshape to [batch, 28, 28, 1]: -1 lets TensorFlow infer the batch size,
    # and the final 1 is the single grayscale channel.
    x_image = tf.reshape(x, [-1, 28, 28, 1])


    def weights(shape):
        # Generate the weights of the CNN.
        initial = tf.truncated_normal(shape, stddev=0.2)
        return tf.Variable(initial)


    def bias(shape):
        # Generate the biases of the CNN.
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)


    def conv2d(x, W):
        # Wrapper around tf.nn.conv2d with stride 1 and 'SAME' padding.
        return tf.nn.conv2d(input=x, filter=W, strides=[1, 1, 1, 1], padding='SAME')


    def max_pool_2x2(x):
        # Wrapper around tf.nn.max_pool with a 2x2 window and stride 2.
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


    # 1st layer: conv + relu + max_pool
    w_conv1 = weights([5, 5, 1, 6])
    b_conv1 = bias([6])
    h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)  # avg_pool could replace max_pool_2x2 here

    # 2nd layer: conv + relu + max_pool
    w_conv2 = weights([5, 5, 6, 16])
    b_conv2 = bias([16])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*16])

    # Layers 3 to 5: three fully connected layers
    w_fc1 = weights([7*7*16, 120])
    b_fc1 = bias([120])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)  # tanh() or sigmoid() also possible

    w_fc2 = weights([120, 84])
    b_fc2 = bias([84])
    h_fc2 = tf.nn.relu(tf.matmul(h_fc1, w_fc2) + b_fc2)

    w_fc3 = weights([84, 10])
    b_fc3 = bias([10])
    h_fc3 = tf.nn.softmax(tf.matmul(h_fc2, w_fc3) + b_fc3)

    # Cross-entropy summed over the batch. tf.log yields -inf if a predicted
    # probability reaches exactly 0.
    cross_entropy = -tf.reduce_sum(y_ * tf.log(h_fc3))
    train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(h_fc3, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))

    sess.run(tf.global_variables_initializer())  # initialize all variables

    for i in range(2000):
        batch = mnist.train.next_batch(60)
        if i % 100 == 0:
            # Report accuracy on the current training batch every 100 steps.
            train_accuracy = accuracy.eval(session=sess, feed_dict={x: batch[0], y_: batch[1]})
            print('step {}, training accuracy: {}'.format(i, train_accuracy))
        train_step.run(session=sess, feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy: {}'.format(accuracy.eval(
        session=sess, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))

Update contents:
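The update is mostly about how tensor dimensions change from layer to layer, so it may help to trace them first. The following is a rough sketch in plain Python, assuming the usual rule for 'SAME' padding in TensorFlow (output size = ceil(input size / stride)). The helper names `same_conv`, `same_pool`, and `trace` are hypothetical and exist only for this illustration.

```python
import math

# Hypothetical helpers for tracing (height, width, channels) through
# 'SAME'-padded layers; they are not part of TensorFlow.

def same_conv(shape, out_channels, stride=1):
    """Shape after a 'SAME' convolution: spatial size is ceil(size / stride)."""
    h, w, _ = shape
    return (math.ceil(h / stride), math.ceil(w / stride), out_channels)


def same_pool(shape, stride=2):
    """Shape after 'SAME' pooling: channels unchanged, spatial size ceil(size / stride)."""
    h, w, c = shape
    return (math.ceil(h / stride), math.ceil(w / stride), c)


def trace(layers, shape=(28, 28, 1)):
    """Return the list of shapes seen while applying layers to a 28x28x1 input."""
    shapes = [shape]
    for kind, arg in layers:
        shape = same_conv(shape, arg) if kind == "conv" else same_pool(shape, arg)
        shapes.append(shape)
    return shapes


# Original network: conv(6) -> pool -> conv(16) -> pool
original = [("conv", 6), ("pool", 2), ("conv", 16), ("pool", 2)]
# Modified network: conv(6) -> pool -> conv(16) -> conv(16) -> pool
modified = [("conv", 6), ("pool", 2), ("conv", 16), ("conv", 16), ("pool", 2)]

original_shapes = trace(original)
modified_shapes = trace(modified)
flat_size = math.prod(modified_shapes[-1])

print(original_shapes)  # [(28, 28, 1), (28, 28, 6), (14, 14, 6), (14, 14, 16), (7, 7, 16)]
print(modified_shapes)  # [(28, 28, 1), (28, 28, 6), (14, 14, 6), (14, 14, 16), (14, 14, 16), (7, 7, 16)]
print(flat_size)        # 784
```

Both variants end at 7×7×16 before flattening, which is why the first fully connected layer has the same input width, 7*7*16, in the original and in the modified listing.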
The modified code is as follows:
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('MNIST/', one_hot = True)
    sess = tf.InteractiveSession()

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    x_image = tf.reshape(x, [-1, 28, 28, 1])


    def weight_variable(shape):
        initial = tf.random_normal(shape, stddev = 0.1)
        return tf.Variable(initial)


    def bias_variable(shape):
        initial = tf.random_normal(shape = shape)
        return tf.Variable(initial)


    def conv2d(x, w):
        return tf.nn.conv2d(x, w, strides = [1, 1, 1, 1], padding = 'SAME')


    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')


    W_conv1 = weight_variable([5, 5, 1, 6])
    b_conv1 = bias_variable([6])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # 28*28*6
    h_pool1 = max_pool_2x2(h_conv1)  # 14*14*6

    W_conv2 = weight_variable([5, 5, 6, 16])
    b_conv2 = bias_variable([16])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # 14*14*16

    # Add one more convolutional layer and move the pooling step to after it.
    W_conv3 = weight_variable([5, 5, 16, 16])
    b_conv3 = bias_variable([16])
    h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3) + b_conv3)  # 14*14*16
    h_pool3 = max_pool_2x2(h_conv3)  # 7*7*16

    # Flatten the feature maps so they have the right dimensions for the
    # fully connected layers.
    h_pool2_flat = tf.reshape(h_pool3, [-1, 7*7*16])
    # h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*16])

    W_fc1 = weight_variable([7*7*16, 120])
    b_fc1 = bias_variable([120])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    W_fc2 = weight_variable([120, 84])
    b_fc2 = bias_variable([84])
    h_fc2 = tf.nn.relu(tf.matmul(h_fc1, W_fc2) + b_fc2)

    # Add one more fully connected layer to reduce the dimensionality further.
    W_fc3 = weight_variable([84, 42])
    b_fc3 = bias_variable([42])
    h_fc3 = tf.nn.relu(tf.matmul(h_fc2, W_fc3) + b_fc3)

    W_fc4 = weight_variable([42, 10])
    b_fc4 = bias_variable([10])
    y_conv = tf.nn.softmax(tf.matmul(h_fc3, W_fc4) + b_fc4)

    cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    tf.global_variables_initializer().run()

    for i in range(1500):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1]})
            print('steps: %d, training accuracy: %g\n' %(i, train_accuracy))
        train_step.run(feed_dict = {x: batch[0], y_: batch[1]})

    print('test accuracy: %g\n'