This article uses TensorFlow to implement classification of the MNIST handwritten-digit dataset with a recurrent neural network (RNN).
1. Hyperparameter definitions
```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Hyperparameter definitions
LEARNING_RATE = 0.001   # learning rate for the optimizer
TRAINING_ITER = 10000   # number of training iterations
BATCH_SIZE = 128        # mini-batch size
INPUT_DIMS = 28         # dimension of the input at each time step
HIDDEN_DIMS = 128       # dimension of the RNN cell state (hidden units)
CLASSES_NUM = 10        # number of output classes
TIME_STEPS = 28         # length of the input sequence
```

Of the hyperparameters above:
INPUT_DIMS : the dimension of the input at each step; here it is 28, i.e. the length of one column of the image
HIDDEN_DIMS : the dimension of the state inside the RNN cell, i.e. the number of hidden units
CLASSES_NUM : the number of classification results; there are 10 classes in total
TIME_STEPS : the length of the whole sequence, set here to the number of rows of each image (a short sketch of this image-to-sequence mapping follows this list)
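To make the dimensions concrete, here is a minimal sketch (not part of the original code) of how one flattened 784-pixel MNIST image is interpreted as a sequence that the RNN reads row by row:

```python
import numpy as np

TIME_STEPS, INPUT_DIMS = 28, 28

image = np.random.rand(784)                       # stands in for one flattened MNIST image
sequence = image.reshape(TIME_STEPS, INPUT_DIMS)  # 28 time steps, 28 features per step

print(sequence.shape)  # (28, 28): the RNN consumes one 28-dim vector per time step
```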
2. Network structure
This part implements the forward propagation of the RNN.
```python
# Initialization helpers for weights and biases, reused several times below
def weight_init(shape):
    return tf.Variable(tf.random_normal(shape=shape, stddev=0.1))

def bias_init(shape):
    # Initialize the bias to a small but non-zero value
    return tf.Variable(tf.zeros(shape=shape) + 0.01)

# Build the network structure and perform forward propagation
def rnn_inference(x):
    weight_xa = weight_init([INPUT_DIMS, HIDDEN_DIMS])
    bias_xa = bias_init([HIDDEN_DIMS])
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])

    x = tf.reshape(x, [-1, INPUT_DIMS])
    x_to_a = tf.matmul(x, weight_xa) + bias_xa
    x_to_a = tf.reshape(x_to_a, [-1, TIME_STEPS, HIDDEN_DIMS])

    rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=HIDDEN_DIMS)
    # state_init = rnn_cell.zero_state()  # explicit state initialization is optional
    output, last_state = tf.nn.dynamic_rnn(rnn_cell, x_to_a, dtype=tf.float32)

    # Use the output of the last time step for the classification
    logits = tf.nn.softmax(tf.matmul(output[:, -1, :], weight_ay) + bias_ay, 1)
    return logits
```

Note that the RNN cell shares the same weights and biases across all time steps, much like the convolution kernels in a CNN, so we only need to define weight_xa, bias_xa, weight_ay and bias_ay. From the principle of recurrent neural networks, the state update is:
$$a^{\langle t \rangle} = \tanh\big(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a\big)$$

In this example the dimension of the state is chosen to be HIDDEN_DIMS. The weights split into $W_{aa}$ and $W_{ax}$; we only have to define $W_{ax}$ (weight_xa) and $b_a$ (bias_xa) ourselves, because, as I understand it, TensorFlow's RNN cell already creates and updates its internal variables. So we reshape the input x to [-1, INPUT_DIMS], compute $x W_{ax} + b_a$, and then reshape the result into the input format expected by TensorFlow's RNN cell, i.e. [batch_size, TIME_STEPS, HIDDEN_DIMS]. For details of TensorFlow's rnn_cell, dynamic_rnn() and rnn_cell.call(), please refer to the official documentation.

The output is:

$$\hat{y} = \mathrm{softmax}\big(W_{ya}\, a^{\langle T \rangle} + b_y\big)$$

where $a^{\langle T \rangle}$ is the state at the last time step, i.e. output[:, -1, :] in the code above. A small sketch of this recurrence follows.
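As an illustration of the two equations above, here is a minimal NumPy sketch of the recurrence for a single image; the variable names and random initial values are purely illustrative and not part of the original implementation:

```python
import numpy as np

# Shapes follow the hyperparameters of this article (illustrative only).
INPUT_DIMS, HIDDEN_DIMS, CLASSES_NUM, TIME_STEPS = 28, 128, 10, 28

W_ax = np.random.randn(INPUT_DIMS, HIDDEN_DIMS) * 0.1
W_aa = np.random.randn(HIDDEN_DIMS, HIDDEN_DIMS) * 0.1
b_a  = np.zeros(HIDDEN_DIMS)
W_ya = np.random.randn(HIDDEN_DIMS, CLASSES_NUM) * 0.1
b_y  = np.zeros(CLASSES_NUM)

x_seq = np.random.rand(TIME_STEPS, INPUT_DIMS)  # one image as a sequence
a = np.zeros(HIDDEN_DIMS)                       # initial state

for t in range(TIME_STEPS):
    # state update: a_t = tanh(W_aa a_{t-1} + W_ax x_t + b_a)
    a = np.tanh(a @ W_aa + x_seq[t] @ W_ax + b_a)

# output from the last state: y_hat = softmax(W_ya a_T + b_y)
scores = a @ W_ya + b_y
y_hat = np.exp(scores) / np.sum(np.exp(scores))
print(y_hat.shape)  # (10,)
```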
3. Training
```python
# Training function
def train():
    # Load the data
    mnist_data = input_data.read_data_sets("MNIST_data", one_hot=True)
    print(mnist_data.train.images.shape)

    # Placeholders for the inputs and the training labels
    x_in = tf.placeholder(dtype=tf.float32, shape=[None, 784], name="x_inputs")
    y_label = tf.placeholder(dtype=tf.float32, shape=[None, CLASSES_NUM], name="y_labels")

    y_out = rnn_inference(x_in)

    # Cross-entropy loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_label, logits=y_out))
    correct_pred = tf.equal(tf.argmax(y_label, 1), tf.argmax(y_out, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Optimize with Adam
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(loss)

    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)
        for i in range(TRAINING_ITER):
            batch_x, batch_label = mnist_data.train.next_batch(BATCH_SIZE)
            sess.run(optimizer, feed_dict={x_in: batch_x, y_label: batch_label})
            if i % 100 == 0:
                print("steps {0} , loss : {1} , accuracy : {2}".format(
                    str(i),
                    str(loss.eval(feed_dict={x_in: batch_x, y_label: batch_label})),
                    str(accuracy.eval(feed_dict={x_in: mnist_data.test.images,
                                                 y_label: mnist_data.test.labels}))))
```
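The original post does not show how train() is invoked; a standard entry point would presumably look like this:

```python
# Assumed entry point (not shown in the original article): start training
# when the script is run directly.
if __name__ == "__main__":
    train()
```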
4. Training results
```
steps 0 , loss : 2.299897 , accuracy : 0.1486
steps 100 , loss : 1.7092526 , accuracy : 0.7546
steps 200 , loss : 1.5534915 , accuracy : 0.9055
steps 300 , loss : 1.5672985 , accuracy : 0.9162
steps 400 , loss : 1.5232389 , accuracy : 0.9295
steps 500 , loss : 1.4930712 , accuracy : 0.9482
steps 600 , loss : 1.520176 , accuracy : 0.9547
steps 700 , loss : 1.497025 , accuracy : 0.9545
steps 800 , loss : 1.4983183 , accuracy : 0.9562
steps 900 , loss : 1.5114753 , accuracy : 0.9644
steps 1000 , loss : 1.498097 , accuracy : 0.9687
steps 1100 , loss : 1.4864463 , accuracy : 0.9672
steps 1200 , loss : 1.5137687 , accuracy : 0.9637
steps 1300 , loss : 1.4695554 , accuracy : 0.9712
steps 1400 , loss : 1.4954376 , accuracy : 0.9706
steps 1500 , loss : 1.4733143 , accuracy : 0.9724
steps 1600 , loss : 1.4818668 , accuracy : 0.9722
steps 1700 , loss : 1.4771671 , accuracy : 0.9669
steps 1800 , loss : 1.4712167 , accuracy : 0.9741
steps 1900 , loss : 1.4700866 , accuracy : 0.9742
steps 2000 , loss : 1.4893398 , accuracy : 0.9721
steps 2100 , loss : 1.48666 , accuracy : 0.9742
steps 2200 , loss : 1.4840587 , accuracy : 0.9748
steps 2300 , loss : 1.488863 , accuracy : 0.9765
steps 2400 , loss : 1.4716485 , accuracy : 0.9735
steps 2500 , loss : 1.4644128 , accuracy : 0.9749

Process finished with exit code -1
```

5. Conclusion
From the results we can see that convergence is fast: after roughly 1500 gradient-descent steps the test accuracy already reaches about 97%. However, after the full 10000 iterations the final accuracy only reaches roughly 98%, which is still noticeably below the ~99.2% achievable with a CNN. Possible improvements: 1) replace the basic RNN cell with an LSTM (long short-term memory) cell, as shown in the appendix; 2) use a multi-layer RNN, i.e. a deep recurrent network, sketched below.
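As a minimal sketch of the second suggestion, several basic cells can be stacked with tf.nn.rnn_cell.MultiRNNCell (TensorFlow 1.x); the number of layers and the way the stacked cell is dropped into rnn_inference are my own assumptions for illustration, not code from the original article:

```python
# Sketch of a multi-layer (deep) RNN, assuming the same hyperparameters as above.
# NUM_LAYERS is an assumption chosen for illustration.
NUM_LAYERS = 2

def deep_rnn_cell():
    # Build a fresh cell per layer and stack them into one multi-layer cell
    cells = [tf.nn.rnn_cell.BasicRNNCell(num_units=HIDDEN_DIMS) for _ in range(NUM_LAYERS)]
    return tf.nn.rnn_cell.MultiRNNCell(cells)

# Inside rnn_inference(), the single cell could then be replaced by the stacked cell:
#   rnn_cell = deep_rnn_cell()
#   output, last_state = tf.nn.dynamic_rnn(rnn_cell, x_to_a, dtype=tf.float32)
```

Each layer must be a separate cell instance, which is why the list comprehension creates a new BasicRNNCell per layer rather than reusing one object.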
Appendix
Using an LSTM
```python
def lstm_inference(x):
    weight_xa = weight_init([INPUT_DIMS, HIDDEN_DIMS])
    bias_xa = bias_init([HIDDEN_DIMS])
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])

    x = tf.reshape(x, [-1, INPUT_DIMS])
    x_in = tf.matmul(x, weight_xa) + bias_xa
    x_in = tf.reshape(x_in, [-1, TIME_STEPS, HIDDEN_DIMS])

    # Define the LSTM cell
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN_DIMS)
    output, last_state = tf.nn.dynamic_rnn(lstm_cell, x_in, dtype=tf.float32)

    # Take the output of the last time step
    logits = tf.nn.softmax(tf.matmul(output[:, -1, :], weight_ay) + bias_ay, 1)
    return logits
```

To train with the LSTM version, simply call lstm_inference instead of rnn_inference inside train().

If there are any mistakes, please point them out. Many thanks.