Building a neural network with PyTorch

0. Prepare and preprocess the data

1. Build the network structure

https://www.cnblogs.com/tian777/p/15341522.html

import torch
import torch.nn as nn

# `hybird` is the base matching model defined elsewhere in the project;
# it provides ch_matching_model, en_matching_model, FFN1, FFN2, relu and dropout.
class pointwise_hybird_contrasive(hybird):
    def __init__(self, config_roberta, path, num):
        super(pointwise_hybird_contrasive, self).__init__(config_roberta, path, num)

    def forward(self, input_ids, input_mask, segment_ids, all_en_query, all_en_ans):
        # Encode the Chinese and English query/answer pairs with the two matching models
        ch_match_embedding = self.ch_matching_model(input_ids, input_mask, segment_ids)
        en_match_embedding = self.en_matching_model(all_en_query, all_en_ans)

        # Concatenate the two representations and score them with a two-layer feed-forward head
        hybird_represent = torch.cat([ch_match_embedding, en_match_embedding], dim=1)
        output = self.FFN2(self.relu(self.FFN1(self.dropout(hybird_represent))))
        return output

    def loss(self, predict, target):
        # CrossEntropyLoss expects raw logits; softmax is applied internally
        cross_entropy = torch.nn.CrossEntropyLoss()
        return cross_entropy(predict, target)

    def predict(self, output):
        # Convert logits to probabilities and return the probability of the positive class
        softmax = nn.Softmax(dim=1)
        y_pred_prob, y_pred = torch.max(softmax(output.data), 1)
        y_pred_prob = y_pred_prob.cpu().numpy()
        for i in range(len(y_pred_prob)):
            if not y_pred[i]:
                y_pred_prob[i] = 1 - y_pred_prob[i]
        return y_pred_prob

nn.Module

https://www.cnblogs.com/tian777/p/15341522.html

1 init

2 forward

3 loss

A summary of PyTorch's cross-entropy loss functions and how to use them

https://blog.csdn.net/comway_Li/article/details/121490170
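As a quick, hedged illustration of the relationship covered in the post above: nn.CrossEntropyLoss applied to raw logits is equivalent to nn.LogSoftmax followed by nn.NLLLoss (the tensor shapes here are toy values, not from the original post):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # batch of 4 samples, 3 classes (toy data)
target = torch.tensor([0, 2, 1, 2])   # class indices

# CrossEntropyLoss takes raw logits and applies log-softmax internally
ce = nn.CrossEntropyLoss()(logits, target)

# Equivalent decomposition: log-softmax followed by negative log-likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

print(torch.allclose(ce, nll))        # True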

L2 and L1 regularization

https://blog.csdn.net/guyuealian/article/details/88426648

Optimizers implement L2 regularization out of the box via weight decay. From the source-code docstring: weight_decay (:obj:`float`, optional, defaults to 0):
Weight decay (L2 penalty)

param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
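Since weight_decay only gives an L2 penalty, an L1 penalty has to be added to the loss by hand. A minimal sketch, assuming the grouped parameters above are passed to torch.optim.AdamW (the optimizer choice, l1_lambda value, and the criterion/model/inputs/targets names are illustrative placeholders, not from the original post):

import torch

optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=2e-5)

l1_lambda = 1e-5  # illustrative strength of the L1 penalty
loss = criterion(model(inputs), targets)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = loss + l1_lambda * l1_penalty   # L1 term added manually to the training loss
loss.backward()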

4 predict

2. Build the training framework

a. Data loader

Dataset/TensorDataset -> Sampler -> DataLoader (see the sketch after the links below)

https://zhuanlan.zhihu.com/p/337850513#

https://blog.csdn.net/ljp1919/article/details/116484330

https://blog.csdn.net/qq_39507748/article/details/105385709
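A minimal sketch of that pipeline, using toy tensors (the shapes and batch size are made up for illustration):

import torch
from torch.utils.data import TensorDataset, RandomSampler, DataLoader

# Toy features and labels standing in for real preprocessed data
features = torch.randn(100, 16)
labels = torch.randint(0, 2, (100,))

dataset = TensorDataset(features, labels)        # wraps tensors and indexes them together
sampler = RandomSampler(dataset)                 # decides the order samples are drawn in
loader = DataLoader(dataset, sampler=sampler, batch_size=8)

for batch_features, batch_labels in loader:
    pass  # each iteration yields one shuffled mini-batch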

b. Optimizer

https://pytorch.org/docs/stable/optim.html

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
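In the second form, model.base's parameters use the default learning rate of 1e-2 while model.classifier's parameters override it with 1e-3; the momentum of 0.9 applies to both parameter groups.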

c. Training

optimizer.zero_grad() clears the gradients, loss.backward() backpropagates, optimizer.step() updates the parameters (see the loop sketch after the link below)

https://blog.csdn.net/PanYHHH/article/details/107361827
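A minimal training-loop sketch showing where those three calls sit (model, criterion, optimizer, loader and num_epochs stand for the objects set up in the earlier steps):

model.train()
for epoch in range(num_epochs):
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()                     # clear gradients from the previous step
        output = model(batch_features)
        loss = criterion(output, batch_labels)
        loss.backward()                           # backpropagate
        optimizer.step()                          # update parameters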

d. Validation

with torch.no_grad()

Used during validation and testing: it noticeably reduces GPU memory usage because no computation graph is kept (a sketch follows the links below)

https://wstchhwp.blog.csdn.net/article/details/108405102

https://blog.csdn.net/weixin_44134757/article/details/105775027
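A minimal sketch of an evaluation pass, assuming a val_loader built the same way as the training loader:

model.eval()                     # switch off dropout / use running BatchNorm statistics
correct, total = 0, 0
with torch.no_grad():            # no computation graph is built, saving memory
    for batch_features, batch_labels in val_loader:
        output = model(batch_features)
        pred = output.argmax(dim=1)
        correct += (pred == batch_labels).sum().item()
        total += batch_labels.size(0)
print('accuracy:', correct / total)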

e. Evaluation metrics

f. Model saving

https://blog.csdn.net/m0_37605642/article/details/120325062

https://blog.csdn.net/weixin_41278720/article/details/80759933
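A minimal sketch of the two common approaches (the file names are illustrative):

# Recommended: save only the parameters (state_dict)
torch.save(model.state_dict(), 'model_params.pth')
model.load_state_dict(torch.load('model_params.pth'))

# Alternative: save the whole model object (ties the file to the class definition)
torch.save(model, 'model_full.pth')
model = torch.load('model_full.pth')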

g. Visualization

https://blog.csdn.net/Wenyuanbo/article/details/118937790

3. Prediction

Load the model, feed in the data, and call the network.
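A minimal sketch of that step, reusing the class defined above (the constructor arguments follow its signature; the checkpoint file name and the input tensors are placeholders):

model = pointwise_hybird_contrasive(config_roberta, path, num)
model.load_state_dict(torch.load('model_params.pth'))
model.eval()

with torch.no_grad():
    output = model(input_ids, input_mask, segment_ids, all_en_query, all_en_ans)
    probs = model.predict(output)   # probability of the positive class per sample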

References

https://blog.csdn.net/qq_45847624/article/details/114885655

Pre-Training with Whole Word Masking for Chinese BERT

BERT-wwm-ext

wwm: whole word masking

ext: we also use extended training data (mark with ext in the model name)

Pretraining

1. Change the masking strategy

Whole Word Masking (wwm)

cws: Chinese Word Segmentation

The papers compare four masking strategies.
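A toy illustration of the difference between the original character-level masking and whole word masking for Chinese; the segmentation is hard-coded here, whereas in practice it would come from a CWS tool, and the 15% rate is only the usual BERT convention:

import random

# A sentence already segmented into words by a CWS tool (hard-coded for illustration)
words = ['使用', '语言', '模型', '来', '预测', '下一个', '词']

# Character-level masking (original BERT): each character is masked independently
chars = [c for w in words for c in w]
char_masked = [c if random.random() > 0.15 else '[MASK]' for c in chars]

# Whole word masking: if a word is chosen, all of its characters are masked together
wwm_masked = []
for w in words:
    if random.random() <= 0.15:
        wwm_masked.extend(['[MASK]'] * len(w))
    else:
        wwm_masked.extend(list(w))

print(''.join(char_masked))
print(''.join(wwm_masked))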

References

Pre-Training with Whole Word Masking for Chinese BERT

https://arxiv.org/abs/1906.08101v3

Revisiting Pre-trained Models for Chinese Natural Language Processing

https://arxiv.org/abs/2004.13922

GitHub: https://hub.fastgit.org/ymcui/Chinese-BERT-wwm
