🔥 When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? [2024-ICSE]
🔥 Exploring Parameter-Efficient Fine-Tuning of Large Language Model on Automated Program Repair [2024-ASE]
Exploring the Potential of Conversational Test Suite Based Program Repair on SWE...
natural language understanding and generation tasks with zero-shot learning, few-shot learning, or fine-tuning. We trained the model, which has 10 billion parameters, on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-...
Finally, we conduct comparative experiments using four LLMs (i.e., CodeBERT, AthenaTest, StarCoder, and CodeLlama-7B) to assess the impact of noise on test generation performance. The results show that filtering out noise positively influences the models' test generation ability.
In this paper, we propose a single repair engine that leverages a large language model trained on code (LLMC) to perform multilingual repair. We select Codex by OpenAI as the LLMC. Our system, RING, shows that repair is nearly generation and exploits Codex's few-shot learning ...
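RING's actual prompts and pipeline aren't reproduced above; the following is only a rough sketch of what prompt-based few-shot repair looks like, assuming the legacy OpenAI completions SDK (openai<1.0) and a Codex-family model. The `repair_prompt` helper and the (buggy, fixed) example pairs are illustrative, not RING's implementation.

```python
# A minimal sketch of few-shot, prompt-based repair ("repair is nearly
# generation"): show the model buggy->fixed pairs, then ask it to complete
# the fix for a new buggy line. All names here are illustrative.
import openai  # legacy SDK (openai<1.0) assumed

FEW_SHOT_EXAMPLES = [
    ("print(x", "print(x)"),
    ("if x = 1: pass", "if x == 1: pass"),
]

def repair_prompt(buggy_line: str) -> str:
    parts = ["### Fix the buggy line"]
    for buggy, fixed in FEW_SHOT_EXAMPLES:
        parts.append(f"Buggy: {buggy}\nFixed: {fixed}")
    parts.append(f"Buggy: {buggy_line}\nFixed:")
    return "\n\n".join(parts)

response = openai.Completion.create(
    model="code-davinci-002",      # Codex-era completion model
    prompt=repair_prompt("return x +"),
    max_tokens=64,
    temperature=0.0,
    stop=["\n\n"],
)
print(response["choices"][0]["text"].strip())
```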
There’s been a fair amount of research on how to augment LLMs with structured code graphs. GraphCodeBERT is a pretty good example from back in 2021, focusing on the data-flow graph. A quote from the authors: “We further show that the model prefers structure-level attentions over token-...”
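For a quick feel of the encoder side, here's a minimal sketch using the public microsoft/graphcodebert-base checkpoint from the Hugging Face Hub. Note that the paper's full setup also feeds data-flow-graph nodes and a structure-aware attention mask, which this sketch omits.

```python
# Encode a code snippet with GraphCodeBERT as a plain RoBERTa-style encoder.
# The data-flow inputs from the paper's preprocessing are omitted here.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")
model = AutoModel.from_pretrained("microsoft/graphcodebert-base")

code = "def max(a, b): return a if a > b else b"
inputs = tokenizer(code, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```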
CodeBERT, Fine-tune, Adam, Dropout, Residual, MSE, LSTM, GRU, CNN, RNN, Vanilla RNN, ConvS2S, ByteNet, Neural GPU, End-to-End Memory Networks, Additive Attention, Dot-Product Attention, Scaled Dot-Product Attention, Multi-Head Attention, Cross-Attention, Self-Attention, Intra-Attention, Inter-...
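Most of the attention variants in that list share one kernel; for reference, a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V:

```python
# Scaled dot-product attention. Multi-head, self-, and cross-attention
# all reuse this kernel and differ only in where Q, K, and V come from.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n_q, d_v)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```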
Fix Bugs with Transformer through a Neural-symbolic Edit Grammar [2022-arXiv] (code: no)
Grammar-based Patches Generation for Automated Program Repair [2021-ACL-IJCNLP] (code: no)
Leveraging Causal Inference for Explainable Automatic Program Repair [2022-arXiv] (code: no)
CodeBERT: A Pre-trained Model for Programming and Natural Languages [2020-EMNLP] ...
Self-training can address the labeled-data scarcity issue by leveraging large-scale unlabeled data in addition to labeled data; it is one of the mature paradigms in semi-supervised learning. However, standard self-training may generate too much noise, inevitably degrading the model perfo...
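As a concrete reference, here's a minimal self-training loop using scikit-learn's SelfTrainingClassifier; the dataset and the high confidence threshold (one standard way to limit pseudo-label noise) are illustrative choices.

```python
# Standard self-training: fit on labeled data, pseudo-label confident
# unlabeled samples, and refit. scikit-learn marks unlabeled rows with -1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1000, random_state=0)
rng = np.random.RandomState(0)
y_partial = np.where(rng.rand(len(y)) < 0.9, -1, y)  # hide 90% of labels

# A high threshold keeps only confident pseudo-labels, limiting the
# noise that degrades standard self-training.
clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.95)
clf.fit(X, y_partial)
print(clf.score(X, y))
```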
```python
# HugIE demo (HugNLP): information extraction over Chinese news text.
from applications.information_extraction.HugIE.api_test import HugIEAPI

model_type = "bert"
hugie_model_name_or_path = "wjn1996/wjn1996-hugnlp-hugie-large-zh"
hugie = HugIEAPI(model_type, hugie_model_name_or_path)

# News snippet (truncated in the source): "CNR, Beijing, Feb 23 -- according
# to the China Earthquake Networks Center, a magnitude-7.2 earthquake struck
# Tajikistan at 08:37 on Feb 23; the focal depth..."
text = "央广网北京2月23日消息 据中国地震台网正式测定,2月23日8时37分在塔吉克斯坦发生7.2级地震,震源..."
```