Table of Contents

- Building the network structure
- Training the network
- Preparing the training data (same as the previous post)
- Starting training
- Plotting the loss
- Using the network for prediction
- Trying importance sampling
Building the network structure
```python
import torch
import torch.nn as nn

class GenerateRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(GenerateRNN, self).__init__()
        self.hidden_size = hidden_size
        self.c2h = nn.Linear(n_categories, hidden_size)  # category -> hidden
        self.i2h = nn.Linear(input_size, hidden_size)    # input letter -> hidden
        self.h2h = nn.Linear(hidden_size, hidden_size)   # previous hidden -> hidden
        self.activation = nn.Tanh()
        self.h2o = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(0.2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, category, input, hidden):
        c2h = self.c2h(category)
        i2h = self.i2h(input)
        h2h = self.h2h(hidden)
        # The new hidden state combines the category, the current letter,
        # and the previous hidden state.
        hidden = self.activation(c2h + i2h + h2h)
        dropout = self.dropout(self.h2o(hidden))
        output = self.softmax(dropout)
        return output, hidden

    def initHidden(self, is_cuda=True):
        if is_cuda:
            return torch.zeros(1, self.hidden_size).cuda()
        else:
            return torch.zeros(1, self.hidden_size)
```
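Before training, a quick sanity check of the shapes flowing through one step can help. This is a minimal sketch, assuming `n_categories = 18` and `n_letters = 58` from the previous post's data preparation:

```python
# Shape check for one forward step (sketch; n_categories and n_letters
# are assumptions taken from the previous post).
n_categories, n_letters = 18, 58

rnn = GenerateRNN(n_letters, 128, n_letters)
category = torch.zeros(1, n_categories)   # one-hot language vector
letter = torch.zeros(1, n_letters)        # one-hot letter vector
hidden = rnn.initHidden(is_cuda=False)

output, hidden = rnn(category, letter, hidden)
print(output.shape, hidden.shape)         # torch.Size([1, 58]) torch.Size([1, 128])
```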
Training the network
Preparing the training data (same as the previous post)
```
all_categories
['Greek', 'Korean', 'Arabic', 'English', 'Chinese', 'Czech', 'Vietnamese',
 'German', 'Polish', 'Japanese', 'Dutch', 'Italian', 'Irish', 'Portuguese',
 'Russian', 'French', 'Spanish', 'Scottish']
```
```
category_lines['Chinese']
['Ang', 'AuYong', 'Bai', 'Ban', 'Bao', 'Bei', 'Bian', 'Bui', 'Cai', 'Cao',
 'Cen', 'Chai', 'Chaim', 'Chan', 'Chang', 'Chao', 'Che', 'Chen', 'Cheng',
 'Cheung', 'Chew', 'Chieu', 'Chin', 'Chong', 'Chou', 'Chu', 'Cui', 'Dai',
 'Deng', 'Ding', 'Dong', 'Dou', 'Duan', 'Eng', 'Fan', 'Fei', 'Feng', 'Foong',
 'Fung', 'Gan', 'Gauk', 'Geng', 'Gim', 'Gok', 'Gong', 'Guan', 'Guang', 'Guo',
 'Gwock', 'Han', 'Hang', 'Hao', 'Hew', 'Hiu', 'Hong', 'Hor', 'Hsiao', 'Hua',
 'Huan', 'Huang', 'Hui', 'Huie', 'Huo', 'Jia', 'Jiang', 'Jin', 'Jing', 'Joe',
 'Kang', 'Kau', 'Khoo', 'Khu', 'Kong', 'Koo', 'Kwan', 'Kwei', 'Kwong', 'Lai',
 'Lam', 'Lang', 'Lau', 'Law', 'Lew', 'Lian', 'Liao', 'Lim', 'Lin', 'Ling',
 'Liu', 'Loh', 'Long', 'Loong', 'Luo', 'Mah', 'Mai', 'Mak', 'Mao', 'Mar',
 'Mei', 'Meng', 'Miao', 'Min', 'Ming', 'Moy', 'Mui', 'Nie', 'Niu', 'OuYang',
 'OwYang', 'Pan', 'Pang', 'Pei', 'Peng', 'Ping', 'Qian', 'Qin', 'Qiu', 'Quan',
 'Que', 'Ran', 'Rao', 'Rong', 'Ruan', 'Sam', 'Seah', 'See ', 'Seow', 'Seto',
 'Sha', 'Shan', 'Shang', 'Shao', 'Shaw', 'She', 'Shen', 'Sheng', 'Shi', 'Shu',
 'Shuai', 'Shui', 'Shum', 'Siew', 'Siu', 'Song', 'Sum', 'Sun', 'Sze ', 'Tan',
 'Tang', 'Tao', 'Teng', 'Teoh', 'Thean', 'Thian', 'Thien', 'Tian', 'Tong',
 'Tow', 'Tsang', 'Tse', 'Tsen', 'Tso', 'Tze', 'Wan', 'Wang', 'Wei', 'Wen',
 'Weng', 'Won', 'Wong', 'Woo', 'Xiang', 'Xiao', 'Xie', 'Xing', 'Xue', 'Xun',
 'Yan', 'Yang', 'Yao', 'Yap', 'Yau', 'Yee', 'Yep', 'Yim', 'Yin', 'Ying',
 'Yong', 'You', 'Yuan', 'Zang', 'Zeng', 'Zha', 'Zhan', 'Zhang', 'Zhao',
 'Zhen', 'Zheng', 'Zhong', 'Zhou', 'Zhu', 'Zhuo', 'Zong', 'Zou', 'Bing',
 'Chi', 'Chu', 'Cong', 'Cuan', 'Dan', 'Fei', 'Feng', 'Gai', 'Gao', 'Gou',
 'Guan', 'Gui', 'Guo', 'Hong', 'Hou', 'Huan', 'Jian', 'Jiao', 'Jin', 'Jiu',
 'Juan', 'Jue', 'Kan', 'Kuai', 'Kuang', 'Kui', 'Lao', 'Liang', 'Lu', 'Luo',
 'Man', 'Nao', 'Pian', 'Qiao', 'Qing', 'Qiu', 'Rang', 'Rui', 'She', 'Shi',
 'Shuo', 'Sui', 'Tai', 'Wan', 'Wei', 'Xian', 'Xie', 'Xin', 'Xing', 'Xiong',
 'Xuan', 'Yan', 'Yin', 'Ying', 'Yuan', 'Yue', 'Yun', 'Zha', 'Zhai', 'Zhang',
 'Zhi', 'Zhuan', 'Zhui']
```
The extra index `n_letters - 1` is used to represent the end-of-string terminator; the last real character in `all_letters` is at index 56:

```
all_letters[56]
"'"
```
```python
import random

def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

def randomTrainingPair():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    return category, line
```
```python
def categoryTensor(category):
    # One-hot vector of shape (1, n_categories) marking the language.
    li = all_categories.index(category)
    tensor = torch.zeros(1, n_categories)
    tensor[0][li] = 1
    if is_cuda:
        tensor = tensor.cuda()
    return tensor

def inputTensor(line):
    # One-hot matrix of shape (len(line), 1, n_letters), one row per letter.
    tensor = torch.zeros(len(line), 1, n_letters)
    for li in range(len(line)):
        letter = line[li]
        tensor[li][0][all_letters.find(letter)] = 1
    if is_cuda:
        tensor = tensor.cuda()
    return tensor

def targetTensor(line):
    # Target indices: each letter's successor, ending with the EOS index.
    letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
    letter_indexes.append(n_letters - 1)  # EOS
    tensor = torch.LongTensor(letter_indexes)
    if is_cuda:
        tensor = tensor.cuda()
    return tensor
```
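To make the encodings concrete, here is what the three tensors look like for one name. This is a sketch, assuming `all_letters = string.ascii_letters + " .,;'"` from the previous post, so `n_letters = 57 + 1 = 58` with index 57 reserved for EOS:

```python
# Concrete check of the three encodings (sketch; all_letters is an
# assumption taken from the previous post's data preparation).
import string

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters) + 1      # +1 for the EOS marker
n_categories = len(all_categories)    # 18
is_cuda = False

print(categoryTensor('Chinese').shape)  # torch.Size([1, 18])
print(inputTensor('Chen').shape)        # torch.Size([4, 1, 58])
print(targetTensor('Chen'))             # tensor([ 7,  4, 13, 57]): 'h', 'e', 'n', EOS
```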
Randomly generating a training example
```python
'''
Generate the training data, where:
  category_tensor    - a one-hot vector describing the language
  input_line_tensor  - a one-hot matrix describing the input name
  target_line_tensor - an index vector describing the answer; each index
                       can be seen as a class label
'''
def randomTrainingExample():
    category, line = randomTrainingPair()
    category_tensor = categoryTensor(category)
    input_line_tensor = inputTensor(line)
    target_line_tensor = targetTensor(line)
    return category_tensor, input_line_tensor, target_line_tensor
```
Starting training
```python
criterion = nn.NLLLoss()
learning_rate = 0.0005

def train(category_tensor, input_line_tensor, target_line_tensor):
    target_line_tensor.unsqueeze_(-1)
    hidden = rnn.initHidden()
    rnn.zero_grad()
    loss = 0
    # Feed the name one letter at a time and accumulate the loss.
    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
        l = criterion(output, target_line_tensor[i])
        loss += l
    loss.backward()
    # Manual SGD step; parameters without gradients are skipped.
    for p in rnn.parameters():
        if hasattr(p.grad, "data"):
            p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item() / input_line_tensor.size(0)
```
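The manual parameter update above is plain SGD written by hand. For reference, an equivalent sketch using `torch.optim` (`train_with_optim` is a hypothetical name, not part of the original code):

```python
# Equivalent training step with torch.optim (sketch): SGD with the same
# learning rate replaces the manual p.data.add_ loop.
import torch.optim as optim

optimizer = optim.SGD(rnn.parameters(), lr=learning_rate)

def train_with_optim(category_tensor, input_line_tensor, target_line_tensor):
    target_line_tensor.unsqueeze_(-1)
    hidden = rnn.initHidden()
    optimizer.zero_grad()
    loss = 0
    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
        loss += criterion(output, target_line_tensor[i])
    loss.backward()
    optimizer.step()
    return output, loss.item() / input_line_tensor.size(0)
```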
```python
import time
import math

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)
```
```python
rnn = GenerateRNN(n_letters, 128, n_letters)
rnn = rnn.cuda() if is_cuda else rnn

n_iters = 100000
print_every = 5000
plot_every = 500
all_losses = []
total_loss = 0

start = time.time()
for iter in range(1, n_iters + 1):
    output, loss = train(*randomTrainingExample())
    total_loss += loss
    if iter % print_every == 0:
        print('%s (%d %d%%) %.4f' % (timeSince(start), iter,
                                     iter / n_iters * 100, total_loss / iter))
    if iter % plot_every == 0:
        all_losses.append(total_loss / iter)
```
```
0m 17s (5000 5%) 3.1435
0m 35s (10000 10%) 3.0208
0m 52s (15000 15%) 2.9548
1m 10s (20000 20%) 2.9098
1m 27s (25000 25%) 2.8753
1m 45s (30000 30%) 2.8430
2m 2s (35000 35%) 2.8165
2m 21s (40000 40%) 2.7949
2m 38s (45000 45%) 2.7756
2m 55s (50000 50%) 2.7574
3m 12s (55000 55%) 2.7420
3m 30s (60000 60%) 2.7273
3m 48s (65000 65%) 2.7144
4m 6s (70000 70%) 2.7016
4m 23s (75000 75%) 2.6902
4m 40s (80000 80%) 2.6800
4m 58s (85000 85%) 2.6693
5m 16s (90000 90%) 2.6585
5m 35s (95000 95%) 2.6489
5m 54s (100000 100%) 2.6394
```
Plotting the loss
```python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)
```
Using the network for prediction
```python
max_length = 20

def sample(category, start_letter='A'):
    with torch.no_grad():
        category_tensor = categoryTensor(category)
        input = inputTensor(start_letter)
        hidden = rnn.initHidden()
        output_name = start_letter
        for i in range(max_length):
            output, hidden = rnn(category_tensor, input[0], hidden)
            # Greedy decoding: always take the most likely next letter.
            topv, topi = output.topk(1)
            topi = topi[0][0]
            if topi == n_letters - 1:
                # The EOS index was predicted: the name is finished.
                break
            else:
                letter = all_letters[topi]
                output_name += letter
            input = inputTensor(letter)
        return output_name

def samples(category, start_letters='ABC'):
    for start_letter in start_letters:
        print(sample(category, start_letter))
    print("\n")
```
```python
samples('Russian', 'CCZZYY')
samples('German', 'CCZZYY')
samples('Spanish', 'CCZZYY')
samples('Chinese', 'CCZZYY')
```
```
Charinov
Cakin
Zakin
Zherinov
Youlovev
Yulovevk

Chien
Cherer
Zamerr
Zaner
Yaner
Yanter

Care
Carera
Zoura
Zales
Youle
Yaner

Can
Chan
Zhang
Zan
Yai
Yai
```
```python
samples('Chinese', 'WWWWWWWW')
```

```
Wai
Wan
Wan
Wan
Wang
Win
Wan
Wan
```
Notice that the same input produces different outputs. This is caused by dropout, which is still active during sampling because the model is never switched to evaluation mode.
```python
rnn.dropout = nn.Dropout(0)
samples('Chinese', 'WWWWWWWW')
```

```
Wan
Wan
Wan
Wan
Wan
Wan
Wan
Wan
```
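Replacing the layer works, but the standard way to disable dropout at inference time is to switch the whole model to evaluation mode. A minimal sketch:

```python
rnn.eval()                      # puts Dropout (and similar layers) in inference mode
samples('Chinese', 'WWWWWWWW')  # deterministic for the same input
rnn.train()                     # switch back before any further training
```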
Trying importance sampling
Importance sampling here means that, when generating each letter of a name, instead of the usual approach of always outputting the letter with the highest predicted probability, we choose a letter according to the output probabilities. For example, given the letter C, suppose the model predicts the next letter to be one of [a, o, e] with probabilities [0.7, 0.2, 0.1]. The ordinary approach simply picks a, the most probable letter; with importance sampling we pick a with probability 70%, o with probability 20%, and e with probability 10%, so the final choice has some randomness.
Tip: using the model's raw probability values directly is awkward, because with 50-odd characters each individual output probability is quite small. A more sensible approach is to choose only among the 3 or 5 letters with the highest probabilities, remembering to renormalize those probabilities (e.g. with softmax) so that they sum to 1.
Tip: an even simpler approach is to use the model's output probabilities only for ranking, and then choose with fixed probabilities: for example, pick the most probable letter with probability 0.5, the second with 0.3, and the third with 0.2, i.e. just use [0.5, 0.3, 0.2] directly (a sketch of this follows below).
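A sketch of that second tip; `pick_by_rank` and the `[0.5, 0.3, 0.2]` values are illustrative assumptions, not part of the original code:

```python
# Rank-based choice (sketch): fixed probabilities over the top-3 letters,
# using the model output only to rank candidates.
import numpy as np

def pick_by_rank(output, rank_probs=(0.5, 0.3, 0.2)):
    # output: LogSoftmax scores of shape (1, n_letters)
    _, topi = output.topk(len(rank_probs))                 # indices of the k best letters
    choice = np.random.choice(len(rank_probs), p=list(rank_probs))
    return topi[0][choice].item()
```

The full sampling function below implements the first tip instead: it keeps the top 3 model probabilities and renormalizes them with softmax.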
```python
import numpy as np

max_length = 20

def sample(category, start_letter='A'):
    with torch.no_grad():
        category_tensor = categoryTensor(category)
        input = inputTensor(start_letter)
        hidden = rnn.initHidden()
        output_name = start_letter
        for i in range(max_length):
            output, hidden = rnn(category_tensor, input[0], hidden)
            output = torch.exp(output)  # LogSoftmax scores -> probabilities
            # Keep only the three most likely letters and renormalize.
            topv, topi = output.topk(3)
            topi = topi[0]
            r = random.random()
            topv = torch.nn.functional.softmax(topv[0], dim=0)
            # Pick one of the three according to the renormalized probabilities.
            if r < topv[0]:
                topi = topi[0]
            elif r < topv[0] + topv[1]:
                topi = topi[1]
            else:
                topi = topi[2]
            if topi == n_letters - 1:
                break
            else:
                letter = all_letters[topi]
                output_name += letter
            input = inputTensor(letter)
        return output_name

def samples(category, start_letters='ABC'):
    for start_letter in start_letters:
        print(sample(category, start_letter))
    print("\n")
```