Solved. I modified BERT's optimization file as follows:

tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)
# (added different learning rates; clipping is applied only to the low-learning-rate part)
(grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)
new_grads = []
for i in range(len(tvars)):
    grad = grads[i]
    var ...
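
Since the snippet above is cut off, here is a rough framework-free sketch of the general idea (two learning rates, with global-norm clipping applied only to the low-learning-rate group). It is not the code from the post. The `bert/` name prefix, the two learning-rate values, and the function `sgd_step` are all assumptions made up for illustration, and it uses a plain SGD update rather than the Adam-style optimizer BERT ships with.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is <= clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Same rule as tf.clip_by_global_norm: scale = clip_norm / max(global_norm, clip_norm)
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads]


def sgd_step(variables, grads, low_lr=1e-5, high_lr=1e-3,
             low_lr_prefix="bert/", clip_norm=1.0):
    """One plain-SGD step with two learning rates (hypothetical helper).

    variables, grads: dicts mapping a variable name to a list of floats.
    Variables whose name starts with low_lr_prefix (an assumed naming scheme)
    use low_lr and have their gradients clipped; all others use high_lr unclipped.
    """
    low_names = [n for n in variables if n.startswith(low_lr_prefix)]
    clipped = clip_by_global_norm([grads[n] for n in low_names], clip_norm)
    clipped_by_name = dict(zip(low_names, clipped))

    updated = {}
    for name, value in variables.items():
        if name in clipped_by_name:
            grad, lr = clipped_by_name[name], low_lr
        else:
            grad, lr = grads[name], high_lr
        updated[name] = [v - lr * g for v, g in zip(value, grad)]
    return updated


if __name__ == "__main__":
    variables = {"bert/w": [1.0, 1.0], "head/w": [1.0]}
    grads = {"bert/w": [3.0, 4.0], "head/w": [10.0]}
    print(sgd_step(variables, grads, low_lr=0.1, high_lr=0.5))
```

In the TensorFlow version the same split would presumably be made by checking each variable's name inside the `for` loop, but how the original post does it is not visible in the truncated code.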