基准数据(Benchmark Dataset)对一个领域的发展至关重要。例如,ImageNet(斯坦福大学)极大地推动了计算机视觉领域的发展,类似包含多种任务的 GLUE(纽约大学)和 XGLUE(https://microsoft.github.io/XGLUE/,微软亚洲研究院自然语言计算组)数据集在自然语言处理领域也产生了非常深远的影响。近年来,统计机器学习算法,尤其是...
基准数据(Benchmark Dataset)对一个领域的发展至关重要。例如,ImageNet(斯坦福大学)极大地推动了计算机视觉领域的发展,类似包含多种任务的 GLUE(纽约大学)和 XGLUE(https://microsoft.github.io/XGLUE/,微软亚洲研究院自然语言计算组)数据集在自然语言处理领域也产生了非常深远的影响。近年来,统计机器学习算法,尤其是...
Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of...
A dataset between Java and C# is newly created. Code search (CodeSearchNet, AdvTest; CodeSearchNet, WebQueryTest). ). A model is given the task of measuring semantic similarity between text and code. In the retrieval scenario, a test set is newly created where function names and variables...
To address this, researchers from Microsoft Research Asia, Developer Division, and Bing introduce CodeXGLUE, a benchmark dataset and open challenge for code intelligence. It includes a collection of code intelligence tasks and a platform for model evaluation and comparison. CodeXGLUE stands for General...
论文笔记 | CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation,程序员大本营,技术文章内容聚合第一站。
基准数据(Benchmark Dataset)对一个领域的发展至关重要。例如,ImageNet(斯坦福大学)极大地推动了计算机视觉领域的发展,类似包含多种任务的 GLUE(纽约大学)和 XGLUE(https://microsoft.github.io/XGLUE/,微软亚洲研究院自然语言计算组)数据集在自然语言处理领域也产生了非常深远的影响。近年来,统计机器学习算法,尤其是...
基准数据(Benchmark Dataset)对一个领域的发展至关重要。例如,ImageNet(斯坦福大学)极大地推动了计算机视觉领域的发展,类似包含多种任务的 GLUE(纽约大学)和 XGLUE(https://microsoft.github.io/XGLUE/,微软亚洲研究院自然语言计算组)数据集在自然语言处理领域也产生了非常深远的影响。近年来,统计机器学习算法,尤其是...
CodeXGLUE includes six existing code intelligence datasets — BigCloneBench, POJ-104, Defects4J, Bugs2Fix, CONCODE, and CodeSearchNet — but also newly introduced datasets that are highlighted in the table above. Below, we elaborate on the task definition for each task and dataset. ...
Code repair (Bugs2Fix). A model is tasked with trying to automatically refine the code, which could be buggy or complex. An existing dataset is included. Text-to-code generation (CONCODE). A model is given the task to generate code given natural language description. An existing dataset is...