I am fine-tuning hfl/chinese-roberta-wwm-ext-large on a downstream task and found that the MLM loss is over 300 and keeps rising. I tested a few masked-sentence tasks with several models, and only hfl/chinese-roberta-wwm-ext-large behaves abnormally; the results are as follows. For the test I used TFBertForMaskedLM from transformers; the specific code is as follows: ...
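The poster's test code and results are elided above ("..."). As a point of reference only, here is a minimal sketch of this kind of masked-sentence check with TFBertForMaskedLM; it is not the original code, and the example sentence and top-5 readout are illustrative assumptions.

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

# Checkpoint under discussion; swap in "hfl/chinese-roberta-wwm-ext" to compare models.
model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = TFBertForMaskedLM.from_pretrained(model_name)

# Illustrative sentence (not from the original post): "The weather today is really [MASK]."
text = "今天天气真[MASK]。"
inputs = tokenizer(text, return_tensors="tf")
logits = model(inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the [MASK] position and print the 5 highest-scoring tokens for it.
mask_pos = int(tf.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0][0])
top5 = tf.math.top_k(logits[0, mask_pos], k=5).indices.numpy()
print(tokenizer.convert_ids_to_tokens(top5))
```

If only the -large checkpoint returns nonsense here while the other checkpoints behave, the problem is likely in the checkpoint or in how it is loaded rather than in the test harness itself.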
hfl_chinese-roberta-wwm-ext.zip (2023-12-04, 364.18 MB). Documentation: Please use 'Bert' related functions to load this model! Chinese BERT with Whole Word Masking. For further accelerating Chinese natural language processing, we provide Chinese pre-trained BERT with Whole Word Masking. ...
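Since the documentation insists on loading the checkpoint through the 'Bert' classes, and the question is about an MLM loss in the hundreds, a quick sanity check is to read the masked-LM loss directly from TFBertForMaskedLM by passing labels. The sketch below assumes a recent transformers TF API, a toy sentence, and one manually masked position; even random predictions over the ~21k-token vocabulary average roughly ln(21128) ≈ 10 per masked token, so a value in the hundreds suggests the weights or inputs are not what the MLM head expects.

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = TFBertForMaskedLM.from_pretrained(model_name)

# Illustrative sentence; mask a single position and supervise only that token.
text = "今天天气真好"
enc = tokenizer(text, return_tensors="tf")
input_ids = enc["input_ids"].numpy()

labels = -100 * tf.ones_like(enc["input_ids"]).numpy()  # -100 = ignored by the loss
mask_pos = 3                                            # arbitrary in-sentence position
labels[0, mask_pos] = input_ids[0, mask_pos]            # true token id as the target
input_ids[0, mask_pos] = tokenizer.mask_token_id        # replace it with [MASK]

outputs = model(input_ids=tf.constant(input_ids),
                attention_mask=enc["attention_mask"],
                labels=tf.constant(labels))
print(float(tf.reduce_mean(outputs.loss)))  # mean cross-entropy over the masked token
```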
config.json (781 B), excerpt (26 keys in total): { "_name_or_path": "hfl/chinese-roberta-wwm-ext-large", "architectures": ["BertForMaskedLM"], ... }
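The config excerpt above shows why the documentation says to use the 'Bert' classes: despite the "roberta" in its name, the checkpoint records a BERT architecture (BertForMaskedLM). A short way to confirm this locally, as a sketch:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
print(config.model_type)     # "bert" -> load with Bert*/TFBert* classes
print(config.architectures)  # ["BertForMaskedLM"]
```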
For Chinese tasks, the teacher models are RoBERTa-wwm-ext and Electra-base released by the Joint Laboratory of HIT and iFLYTEK Research. We have tested different student models. To compare with public results, the student models are built with standard transformer blocks except for BiGRU, which is a ...
This is a re-trained 3-layer RoBERTa-wwm-ext model. Pre-Training with Whole Word Masking for Chinese BERT, by Yiming Cui, Wanxiang Che, Ting...