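This public account was founded by graduate students from Columbia, Tsinghua, Fudan, and USTC. It mainly covers quick readings of recent top AI conference papers, PyTorch programming, and financial big data and quantitative investing.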
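TCN stands for temporal convolutional network, a recent architecture that can be applied to time series prediction. This installment walks through the final training code.
Paper: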
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
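Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Code walkthrough and results
The code walkthrough and results are split into the following parts:
Part 1
Part 2
Part 3: training
The main section and results
The code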
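# Imports used by the code in this part
import math
import time
import torch
import torch.nn as nn
import torch.optim as optim

model = TCN(emsize, n_words, num_chans, dropout=dropout,
            emb_dropout=emb_dropout, kernel_size=k_size, tied_weights=tied)
if cuda:
    model.cuda()

# May use adaptive softmax to speed up training
# Cross-entropy loss
criterion = nn.CrossEntropyLoss()

# getattr(object, name[, default]) returns the value of an attribute:
#   object:  the object to look up
#   name:    a string, the attribute name
#   default: the value returned when the attribute is missing; if it is
#            omitted, a missing attribute raises AttributeError
# Here it fetches the optimizer class named by `optimization` from optim.
# model.parameters() supplies the parameters the optimizer will update.
optimizer = getattr(optim, optimization)(model.parameters(), lr=lr)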
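To make the `getattr` lookup above concrete, here is a minimal sketch in plain Python. The `fake_optim` namespace and its `SGD` and `Adam` classes are hypothetical stand-ins for `torch.optim`, used only so the example runs without PyTorch; the point is the lookup-by-name pattern, not the optimizers themselves.

```python
from types import SimpleNamespace


class SGD:
    """Hypothetical stand-in for an optimizer class (not torch.optim.SGD)."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr


class Adam(SGD):
    """Second hypothetical optimizer, so there is more than one name to pick."""


# Hypothetical namespace playing the role of the `optim` module.
fake_optim = SimpleNamespace(SGD=SGD, Adam=Adam)

optimization = "Adam"  # would normally come from a command-line option
optimizer_cls = getattr(fake_optim, optimization)  # look the class up by name
optimizer = optimizer_cls([0.1, 0.2], lr=4.0)      # then instantiate it

# With a default, a missing attribute returns the default instead of raising.
missing = getattr(fake_optim, "NoSuchOptimizer", None)

# Without a default, a missing attribute raises AttributeError.
try:
    getattr(fake_optim, "NoSuchOptimizer")
    raised = False
except AttributeError:
    raised = True
```

Selecting the class by string keeps the training script free of an if/else chain over optimizer names, at the cost of failing only at run time when the name is misspelled.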
def evaluate(data_source):
    model.eval()  # evaluation mode: disables dropout
    total_loss = 0.0
    processed_data_size = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for i in range(0, data_source.size(1) - 1, validseqlen):
            if i + seq_len - validseqlen >= data_source.size(1) - 1:
                continue
            # Fetch one window of data and its targets
            data, targets = get_batch(data_source, i, seq_len, evaluation=True)
            # Forward pass
            output = model(data)

            # Discard the effective history, just like in training.
            # eff_history = total sequence length - valid sequence length
            eff_history = seq_len - validseqlen
            final_output = output[:, eff_history:].contiguous().view(-1, n_words)
            final_target = targets[:, eff_history:].contiguous().view(-1)

            loss = criterion(final_output, final_target)

            # Note that we don't add TAR loss here.
            # Accumulate the loss weighted by the number of scored time steps.
            total_loss += (data.size(1) - eff_history) * loss.item()
            processed_data_size += data.size(1) - eff_history
    return total_loss / processed_data_size
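The windowing in `evaluate` can be checked without PyTorch. The sketch below is an illustration under one assumption: that `get_batch` (not shown in this part) returns the columns from `i` up to `i + seq_len`, truncated at the end of the data. The helper names `scored_spans` and `weighted_mean_loss` are hypothetical. The first function lists which positions contribute to the loss in each window; the second mirrors the weighted average that `evaluate` returns.

```python
import math


def scored_spans(total_len, seq_len, validseqlen):
    """Return (start, end) pairs of the positions scored in each window."""
    eff_history = seq_len - validseqlen
    if eff_history < 0:
        raise ValueError("validseqlen must not exceed seq_len")
    spans = []
    for i in range(0, total_len - 1, validseqlen):
        if i + eff_history >= total_len - 1:
            continue  # nothing left to score in this window
        # Assumption: the batch is truncated at the end of the data.
        window_len = min(seq_len, total_len - 1 - i)
        # The first eff_history steps only provide context and are not scored.
        spans.append((i + eff_history, i + window_len))
    return spans


def weighted_mean_loss(window_losses):
    """Average per-window mean losses, weighted by scored steps per window."""
    total_loss = 0.0
    processed = 0
    for n_scored, mean_loss in window_losses:
        total_loss += n_scored * mean_loss
        processed += n_scored
    return total_loss / processed


spans = scored_spans(total_len=21, seq_len=8, validseqlen=4)
avg = weighted_mean_loss([(4, 2.0), (4, 3.0), (2, 1.0)])
ppl = math.exp(avg)  # perplexity is the exponential of the mean loss
```

With these made-up numbers the scored spans are (4, 8), (8, 12), (12, 16), and (16, 20): stepping by `validseqlen` makes them tile the sequence without overlap, so every position after the first `eff_history` steps is scored exactly once with a full history behind it.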
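def train():
    # Turn on training mode which enables dropout.
    global train_data
    model.train()
    total_loss = 0.0
    start_time = time.time()
    for batch_idx, i in enumerate(range(0, train_data.size(1) - 1, validseqlen)):
        if i + seq_len - validseqlen >= train_data.size(1) - 1:
            continue
        data, targets = get_batch(train_data, i, seq_len)
        optimizer.zero_grad()  # reset gradients from the previous step
        output = model(data)

        # Discard the effective history part
        eff_history = seq_len - validseqlen
        if eff_history < 0:
            raise ValueError("Valid sequence length must be smaller than sequence length!")
        # contiguous() is needed because view() requires a tensor whose memory is one block
        final_target = targets[:, eff_history:].contiguous().view(-1)
        final_output = output[:, eff_history:].contiguous().view(-1, n_words)
        loss = criterion(final_output, final_target)  # cross-entropy loss

        # backward() takes no arguments here; it computes the gradients by backpropagation
        loss.backward()
        if clip > 0:
            # Gradient clipping: arguments are (parameters, max gradient norm, norm_type=2);
            # the L2 norm is the default
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        # step() updates all parameters
        optimizer.step()

        total_loss += loss.item()

        if batch_idx % log_interval == 0 and batch_idx > 0:
            cur_loss = total_loss / log_interval
            elapsed = time.time() - start_time
            # Log progress; see the figure later in this article for sample output
            print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.5f} | ms/batch {:5.5f} | '
                  'loss {:5.2f} | ppl {:8.2f}'.format(
                      epoch, batch_idx, train_data.size(1) // validseqlen, lr,
                      elapsed * 1000 / log_interval, cur_loss, math.exp(cur_loss)))
            # Start a new logging interval
            total_loss = 0.0
            start_time = time.time()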
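The gradient clipping step is easier to follow with numbers. Below is a minimal pure-Python sketch of norm-based clipping; it illustrates the idea and is not PyTorch's implementation. The function name `clip_by_global_norm`, the list-of-lists gradient layout, and the `eps` constant are assumptions made for the example.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so that their combined L2 norm is at most max_norm.

    `grads` is a list of gradient "tensors", each a flat list of floats.
    Returns the rescaled gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Only ever shrink: the scale factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm


# Two one-element "tensors" with a combined norm of 5.0 get scaled down.
clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)

# Gradients already inside the limit are left as they are.
unchanged, _ = clip_by_global_norm([[0.3], [0.4]], max_norm=1.0)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is limited, which helps keep a single bad batch from producing a very large step.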
if __name__ == "__main__":
    best_vloss = 1e8

    # At any point you can hit Ctrl + C to break out of training early.
    try:
        all_vloss = []
        for epoch in range(1, epochs + 1):
            epoch_start_time = time.time()
            train()
            val_loss = evaluate(val_data)
            test_loss = evaluate(test_data)

            print('-' * 89)
            print('| end of epoch {:3d} | time: {:5.2f}s | valid loss {:5.2f} | '
                  'valid ppl {:8.2f}'.format(epoch, (time.time() - epoch_start_time),
                                             val_loss, math.exp(val_loss)))
            print('| end of epoch {:3d} | time: {:5.2f}s | test loss {:5.2f} | '
                  'test ppl {:8.2f}'.format(epoch, (time.time() - epoch_start_time),
                                            test_loss, math.exp(test_loss)))
            print('-' * 89)

            # Save the model if the validation loss is the best we've seen so far.
            if val_loss < best_vloss:
                with open("model.pt", 'wb') as f:
                    print('Save model!\n')
                    torch.save(model, f)
                best_vloss = val_loss

            # Anneal the learning rate if the validation loss plateaus
            if epoch > 5 and val_loss >= max(all_vloss[-5:]):
                lr = lr / 2.
                for param_group in optimizer.param_groups:
                    param_group['lr'] = lr
            all_vloss.append(val_loss)

    except KeyboardInterrupt:
        print('-' * 89)
        print('Exiting from training early')

    # Load the best saved model.
    # Recent PyTorch releases may need torch.load(f, weights_only=False)
    # to load a whole pickled model like this one.
    with open("model.pt", 'rb') as f:
        model = torch.load(f)

    # Run on test data.
    test_loss = evaluate(test_data)
    print('=' * 89)
    print('| End of training | test loss {:5.2f} | test ppl {:8.2f}'.format(
        test_loss, math.exp(test_loss)))
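The learning rate annealing rule in the main loop can be traced on its own. The sketch below replays the same condition on a list of validation losses; the function name `anneal_schedule` and the loss values are made up for illustration and are not results from the paper or the code.

```python
def anneal_schedule(val_losses, lr):
    """Replay the halving rule and return the learning rate after each epoch."""
    all_vloss = []
    lrs = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        # Halve lr when the new loss is no better than the worst of the last five.
        if epoch > 5 and val_loss >= max(all_vloss[-5:]):
            lr = lr / 2.0
        all_vloss.append(val_loss)
        lrs.append(lr)
    return lrs


# Made-up validation losses: steady improvement, then a jump at epoch 7.
lrs = anneal_schedule([5.0, 4.8, 4.7, 4.6, 4.5, 4.4, 4.9, 4.3], lr=4.0)
```

In this trace the rate stays at 4.0 through epoch 6 and drops to 2.0 at epoch 7, where the loss of 4.9 is at least as large as the worst of the previous five (4.8). Comparing against the maximum of the recent window rather than the best loss makes the rule lenient: a small fluctuation does not trigger a halving, only a clear regression does.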