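Seq2seq models from neural machine translation (NMT) have achieved SOTA results on many tasks in this field. This paper proposes the code2seq model, which uses the syntactic structure of programming languages to encode source code. The model extracts a subset of paths from the AST of a code snippet, encodes them with an LSTM, and then uses attention to generate the target sequence. The model was evaluated on 2 tasks, 2 programming languages, and 4 datasets, where it did better than all previous models...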
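The path-extraction step described above can be sketched roughly as follows. This is only an illustration, not the code2seq implementation: it swaps in Python's built-in `ast` module and Python source in place of the project's own extractors, and the names `extract_paths`, `path_between`, `max_paths`, and `seed` are hypothetical. Each result is a triple of left leaf token, node-type path through the lowest common ancestor, and right leaf token; the LSTM encoding and attention steps are not shown.

```python
import ast
import random
from itertools import combinations


def path_between(left_chain, right_chain):
    """Node-type path: up from the left leaf to the lowest common ancestor, then down to the right leaf."""
    k = 0
    # Both chains start at the same root node, so k ends up >= 1.
    while (k < min(len(left_chain), len(right_chain))
           and left_chain[k] is right_chain[k]):
        k += 1
    nodes = list(reversed(left_chain[k:])) + [left_chain[k - 1]] + right_chain[k:]
    return tuple(type(n).__name__ for n in nodes)


def extract_paths(source, max_paths=None, seed=0):
    """Return (left_token, path, right_token) triples for pairs of AST leaves.

    Assumption: only names and constants count as leaves in this sketch.
    """
    tree = ast.parse(source)
    leaves = []  # each entry: (token, chain of nodes from the root to the leaf)

    def visit(node, ancestors):
        chain = ancestors + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, chain))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), chain))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, chain)

    visit(tree, [])
    paths = [(lt, path_between(lc, rc), rt)
             for (lt, lc), (rt, rc) in combinations(leaves, 2)]
    # Keep only a subset of paths when a cap is given (hypothetical parameter).
    if max_paths is not None and len(paths) > max_paths:
        paths = random.Random(seed).sample(paths, max_paths)
    return paths


for left, path, right in extract_paths("x = a + 1"):
    print(left, " ".join(path), right)
```

Calling `extract_paths(source, max_paths=2)` would keep a random subset of two paths, which mirrors the idea of using only part of the AST paths as input.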
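Under code there are three important subfolders: configs; notebooks (holds the source code: preparation downloads and does the initial data processing, and code2seq is the main project code. In the figure above, the .jupyter files are the original files from GitHub, and the .py files are ones I made by copying that code into empty Python files, because it has to run on a server); and src (utility code, imported at the very top of the code2seq code). Processing the data (preparation file) # download the data...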
(NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present CODE2SEQ: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the...
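Using the model shown in the figure above, the authors borrow the NMT paradigm to translate code into a generated sequence. The decoder's initial state is set to the mean of all AST paths; the order of the paths is not taken into account here. Skipping the experimental setup... To summarize: this paper was probably one of the earlier ones to incorporate AST path information, which is somewhat similar to meta-paths. Core idea: for two pieces of code with the same functionality, however the way they are written changes, what determines this co...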
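A minimal sketch of why a mean over path representations ignores path order, assuming each AST path has already been encoded as a fixed-size vector. The helper `mean_vector` and the toy numbers are hypothetical and are not taken from the code2seq code.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (plain Python lists)."""
    if not vectors:
        raise ValueError("need at least one path vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy path encodings; in a real model these would come from the path encoder.
path_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]

# Hypothetical decoder initial state: the mean of all path vectors.
h0 = mean_vector(path_vectors)

# Feeding the same paths in a different order gives the same initial state.
h0_reordered = mean_vector(list(reversed(path_vectors)))
print(h0, h0_reordered)
```

Because addition is commutative, any permutation of the paths yields the same starting state, so the set of paths is treated as unordered at this point.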
python3 code2seq.py --load models/java-large-model/model_iter52.release --test data/java-large/java-large.test.c2s
While evaluating, a file named "log.txt" is written to the same dir as the saved models, with each test example name and the model's prediction....
git clone https://github.com/tech-srl/code2seq
cd code2seq
Step 1: Creating a new dataset from Java sources
To obtain a preprocessed dataset to train a network on, you can either download our preprocessed dataset, or create a new dataset from Java source files. ...
In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), ...
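Paper link: [1808.01400] code2seq: Generating Sequences from Structured Representations of Code (arxiv.org) Goal: convert a code snippet into a vector so that downstream tasks can use it. The method performs well on code summarization, code documentation generation, code retrieval, and similar tasks. Idea: it is likewise an encoder-decoder model; the difference is that the code snippet is converted into an AST, and the paths of the AST are then used as the...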
Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code" - code2seq/model.py at master · tech-srl/code2seq