We choose the General Language Understanding Evaluation (GLUE) benchmark [119] to assess the general language representation capabilities retained by KEPLMs. GLUE measures the performance of language models across nine natural language understanding tasks. It ...