I am fine-tuning hfl/chinese-roberta-wwm-ext-large on a downstream task and the mlm_loss starts above 300 and keeps rising. I then ran a few masked-sentence fill-in tests and found that only hfl/chinese-roberta-wwm-ext-large misbehaves; the results are as follows. The test uses TFBertForMaskedLM from transformers; the specific code ...
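Since the test code is cut off above, here is a minimal sketch of the kind of masked-sentence spot check described, not the poster's original code. It assumes the checkpoint is loaded with the Bert-flavored classes (BertTokenizer / TFBertForMaskedLM) rather than any Roberta classes, and the example sentence is made up:

```python
# Minimal sketch of the masked-LM spot check; assumes TensorFlow + transformers.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = BertTokenizer.from_pretrained(model_name)
# add from_pt=True if only PyTorch weights are available for this checkpoint
model = TFBertForMaskedLM.from_pretrained(model_name)

inputs = tokenizer("今天天气很[MASK]。", return_tensors="tf")
logits = model(**inputs).logits

# Locate the [MASK] position and print the top-5 predicted tokens for it.
mask_pos = int(tf.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0][0])
top_ids = tf.math.top_k(logits[0, mask_pos], k=5).indices.numpy().tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

If a checkpoint loaded this way still produces nonsense while the base (non-large) model does not, that would point at the weights themselves rather than the fine-tuning code.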
Model card excerpt: "Please use 'Bert' related functions to load this model! Chinese BERT with Whole Word Masking. For further accelerating Chinese natural language processing, we provide Chinese pre-trained BERT with Whole Word Masking. ..."
config.json of hfl/chinese-roberta-wwm-ext-large (26 keys in total; excerpt):
{
  "_name_or_path": "hfl/chinese-roberta-wwm-ext-large",
  "architectures": ["BertForMaskedLM"],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "directionality": ...
}
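The config above confirms the architecture is BertForMaskedLM. As a separate sanity check on the reported mlm_loss of 300+, one thing worth verifying (an assumption on my part, since the fine-tuning code is not shown) is that label positions other than the masked tokens are set to -100, which the transformers MaskedLM loss ignores. A sketch using DataCollatorForLanguageModeling:

```python
# Hedged sketch of how mlm_loss is usually computed with TFBertForMaskedLM;
# the collator, sentences, and masking rate are assumptions, not the poster's
# actual fine-tuning setup.
import tensorflow as tf
from transformers import (
    BertTokenizer,
    DataCollatorForLanguageModeling,
    TFBertForMaskedLM,
)

model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = TFBertForMaskedLM.from_pretrained(model_name)

# The collator masks ~15% of tokens and sets labels to -100 everywhere else,
# so the loss is averaged only over masked positions.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15, return_tensors="tf"
)

examples = [tokenizer(t) for t in ["今天天气很好。", "我喜欢自然语言处理。"]]
batch = collator(examples)

outputs = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["labels"],
)
# For a healthy pretrained checkpoint this should be far below the 300+ reported.
print(float(tf.reduce_mean(outputs.loss)))
```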